A Survey of Vision Transformers in Autonomous Driving: Current Trends and Future Directions

Quoc-Vinh Lai-Dang

2024 | arXiv.org | Cited 14 times

Abstract

This survey explores the adaptation of visual transformer models in Autonomous Driving, a transition inspired by their success in Natural Language Processing. Surpassing traditional Recurrent Neural Networks in tasks like sequential image processing and outperforming Convolutional Neural Networks in global context capture, as evidenced in complex scene recognition, Transformers are gaining traction in computer vision. These capabilities are crucial in Autonomous Driving for real-time, dynamic visual scene processing. Our survey provides a comprehensive overview of Vision Transformer applications in Autonomous Driving, focusing on foundational concepts such as self-attention, multi-head attention, and encoder-decoder architecture. We cover applications in object detection, segmentation, pedestrian detection, lane detection, and more, comparing their architectural merits and limitations. The survey concludes with future research directions, highlighting the growing role of Vision Transformers in Autonomous Driving.

Cited in this thesis

Literature Survey

Frequently Cited Together

Identification of the Species of Origin for Meat Products by Rapid Evaporative I... | Balog 2016 | 1 chapter
Fishers' preference for mobile traceability platform: challenges in achieving a ... | Untal 2025 | 1 chapter
Automatic design of convolutional neural network architectures under resource co... | Li 2021 | 1 chapter
Unlocking the combined impact of microplastics and emerging contaminants on fish... | Wu 2025 | 1 chapter
Microplastic contamination in wild freshwater fish: global trends, challenges an... | de Araujo 2025 | 1 chapter
Adaptive mixtures of local experts | Jacobs 1991 | 1 chapter

BibTeX

@article{LaiDang2024,
  author = {Lai-Dang, Quoc-Vinh},
  journal = {arXiv preprint arXiv:2403.07542},
  title = {A survey of vision transformers in autonomous driving: Current trends and future directions},
  year = {2024},
}