Paper ID: 2405.05885

VLM-Auto: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes

Ziang Guo, Zakhar Yagudin, Artem Lykov, Mikhail Konenkov, Dzmitry Tsetserukou

Recent research on Large Language Models for autonomous driving shows promise in planning and control. However, high computational demands and hallucinations still challenge accurate trajectory prediction and control signal generation. Deterministic algorithms offer reliability but lack adaptability to complex driving scenarios and struggle with context and uncertainty. To address these problems, we propose VLM-Auto, a novel autonomous driving assistant system that empowers autonomous vehicles with adjustable driving behaviors based on an understanding of road scenes. We present a pipeline that integrates the CARLA simulator with Robot Operating System 2 (ROS2) to verify the effectiveness of our system; it runs on a single NVIDIA 4090 24 GB GPU and exploits the textual output capacity of the Visual Language Model (VLM). In addition, we contribute a dataset of images and corresponding prompts for fine-tuning the VLM module of our system. In CARLA experiments, our system achieved $97.82\%$ average precision across 5 types of labels in our dataset. On a real-world driving dataset, our system achieved $96.97\%$ prediction accuracy in night and gloomy scenes. Our VLM-Auto dataset will be released at this https URL.
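
The abstract describes a CARLA-plus-ROS2 pipeline in which the VLM consumes road-scene images and emits textual driving-behavior output for the vehicle. The paper's implementation details are not given here, so the following is only a minimal sketch of what such a ROS2 bridge node could look like; the topic names and the `run_vlm_inference` helper are hypothetical placeholders for illustration, not the authors' API.

```python
# Minimal sketch of a ROS2 node bridging camera images to VLM text output.
# Assumptions (not from the paper): the topic names and run_vlm_inference()
# are hypothetical stand-ins for the fine-tuned VLM module.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from std_msgs.msg import String


def run_vlm_inference(image_msg: Image) -> str:
    """Hypothetical placeholder for the fine-tuned VLM; returns a textual
    driving-behavior suggestion for the given road-scene image."""
    return "reduce speed, increase following distance"  # placeholder output


class VLMAutoBridge(Node):
    def __init__(self):
        super().__init__("vlm_auto_bridge")
        # Subscribe to a front-camera stream (topic name assumed for illustration,
        # e.g. as published by the CARLA ROS bridge).
        self.subscription = self.create_subscription(
            Image, "/carla/ego_vehicle/rgb_front/image", self.on_image, 10
        )
        # Publish the VLM's textual behavior suggestion for downstream control.
        self.publisher = self.create_publisher(String, "/vlm_auto/behavior", 10)

    def on_image(self, msg: Image) -> None:
        behavior = String()
        behavior.data = run_vlm_inference(msg)
        self.publisher.publish(behavior)


def main():
    rclpy.init()
    node = VLMAutoBridge()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```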

Submitted: May 9, 2024