A Survey of Vision Transformers in Autonomous Driving: Current Trends and Future Directions
Abstract
This survey explores the adaptation of Vision Transformer models to Autonomous Driving, a transition inspired by their success in Natural Language Processing. Transformers are gaining traction in computer vision: they surpass traditional Recurrent Neural Networks in tasks such as sequential image processing and outperform Convolutional Neural Networks at capturing global context, as evidenced in complex scene recognition. These capabilities are crucial in Autonomous Driving, where dynamic visual scenes must be processed in real time. Our survey provides a comprehensive overview of Vision Transformer applications in Autonomous Driving, focusing on foundational concepts such as self-attention, multi-head attention, and the encoder-decoder architecture. We cover applications in object detection, segmentation, pedestrian detection, lane detection, and more, comparing their architectural merits and limitations. The survey concludes with future research directions, highlighting the growing role of Vision Transformers in Autonomous Driving.
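The abstract names self-attention as a foundational concept behind Vision Transformers. As a minimal, dependency-free sketch (not taken from the surveyed paper), scaled dot-product self-attention over a set of patch embeddings can be written as: each token's query is compared against all keys, the scores are softmax-normalized, and the output is the resulting weighted sum of values. Here the query, key, and value vectors are simply the inputs themselves (no learned projections), which is a simplifying assumption for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention: softmax(QK^T / sqrt(d)) V.

    tokens: list of equal-length vectors (e.g. flattened image patches).
    Simplification: Q = K = V = tokens, i.e. no learned projection matrices.
    """
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        # Similarity of this query against every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        # Weighted sum of value vectors
        outputs.append([sum(w * v[j] for w, v in zip(weights, tokens))
                        for j in range(d)])
    return outputs

# Toy example: three 4-dimensional "patch" embeddings
patches = [[1.0, 0.0, 0.0, 1.0],
           [0.0, 1.0, 1.0, 0.0],
           [1.0, 1.0, 0.0, 0.0]]
attended = self_attention(patches)
print(len(attended), len(attended[0]))  # 3 4
```

Because every token attends to every other token, the output for each patch mixes in global context, which is the property the abstract contrasts with the local receptive fields of Convolutional Neural Networks.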
BibTeX
@article{LaiDang2024,
  author  = {Lai-Dang, Quoc-Vinh},
  journal = {arXiv preprint arXiv:2403.07542},
  title   = {A Survey of Vision Transformers in Autonomous Driving: Current Trends and Future Directions},
  year    = {2024},
}