This section presents a systematic review of key perception tasks in vision-based autonomous driving, examining the evolution and real-world deployment of detection algorithms, from classical computer vision to state-of-the-art deep learning, with a dedicated focus on FPGA-accelerated implementations. The extensive variability in algorithms, datasets, platforms, and evaluation metrics observed in the existing literature is highlighted to guide the development of more reliable and efficient next-generation autonomous driving systems.
2.2. Classical Computer Vision Approaches
Early implementations predominantly relied on traditional computer vision techniques coupled with lightweight machine-learning models, primarily due to the computational constraints of embedded platforms. The Histogram of Oriented Gradients (HOG) feature descriptor combined with Support Vector Machines (SVMs) emerged as the predominant classical approach, offering a reduced memory footprint and lower computational requirements that make it well suited for embedded deployment.
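To make this classical pipeline concrete, the following minimal C++ sketch runs HOG + SVM pedestrian detection with OpenCV's stock people detector. It is an illustrative software baseline only, not a reproduction of any of the FPGA designs discussed below; the input path and the detection parameters (window stride, padding, pyramid scale step) are placeholder assumptions.

// Minimal HOG + SVM pedestrian-detection sketch using OpenCV's built-in
// people detector (illustrative software baseline only).
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/objdetect.hpp>
#include <iostream>
#include <vector>

int main() {
    cv::Mat img = cv::imread("sample.jpg");  // placeholder input path
    if (img.empty()) {
        std::cerr << "Could not read image\n";
        return 1;
    }

    // HOG descriptor with the default 64x128 pedestrian window, paired with
    // OpenCV's pretrained linear SVM coefficients.
    cv::HOGDescriptor hog;
    hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());

    // Sliding-window, multi-scale detection; stride and scale step trade
    // accuracy against runtime (values here are placeholders).
    std::vector<cv::Rect> detections;
    hog.detectMultiScale(img, detections,
                         0.0,               // SVM decision threshold
                         cv::Size(8, 8),    // window stride
                         cv::Size(16, 16),  // padding
                         1.05,              // image pyramid scale step
                         2.0);              // detection grouping threshold

    for (const cv::Rect& r : detections)
        cv::rectangle(img, r, cv::Scalar(0, 255, 0), 2);
    std::cout << "Detections: " << detections.size() << "\n";
    return 0;
}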
Several pioneering works demonstrated the effectiveness of these classical approaches on FPGA platforms. Martelli et al. (2011) presented a fast FPGA-based architecture for pedestrian detection using covariance matrices, achieving 132 fps on a Xilinx Virtex-6 LX240T FPGA by exploiting the symmetry of second-order integrals and tensor computation parallelism to minimize latency [13,14]. Lin et al. (2008) developed a fuzzy-logic PID controller in VHDL for vehicle collision avoidance, demonstrating the application of traditional control methods on FPGA platforms for real-time applications [15]. The HOG + SVM combination proved particularly successful for FPGA implementations due to its computational characteristics. Suleiman and Sze (2016) presented an energy-efficient hardware implementation of HOG-based object detection, achieving 1080p processing at 60 fps with multiscale support on 45 nm SOI CMOS ASIC technology, consuming an average power of 69 mW while maintaining high detection accuracy [14,16]. Building on this foundation, Meus et al. (2017) implemented a HOG + SVM pipeline on a Xilinx Zynq SoC using a hardware-software codesign approach in which the ARM processor handled detection and tracking while the FPGA accelerated the HOG and SVM processing, achieving 60 fps with an energy efficiency of 3.95 GOPS/W [14,17]. In addition, Nazir et al. (2018) demonstrated the feasibility of traditional methods on low-cost embedded platforms by implementing HOG and SVM on the Raspberry Pi 3 and Odroid C2, achieving 5–7 fps [14,18]. Borrego-Carazo et al. (2020) emphasized that SVMs are particularly well suited for resource-constrained hardware owing to their lightweight inference following offline training [14].
A comprehensive analysis by Lin (2023) reviewed FPGA-based HOG-SVM pedestrian-detection methods, confirming that current implementations typically achieve detection accuracy exceeding 95% and detection speeds above 30 FPS on the INRIA dataset, reinforcing the continued relevance of classical approaches alongside modern deep-learning methods [19]. Advanced HOG implementations have achieved remarkable efficiency improvements: a novel low-resource hardware implementation processes approximately 0.933 pixels per clock cycle while maintaining 91.79% accuracy on the INRIA dataset and 98.49% on the MIT dataset, with only minimal accuracy degradation (1.2% and 0.11%, respectively, i.e., <2% absolute) compared to the original HOG algorithm [20].
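To put the 0.933 pixels-per-clock figure into perspective, assume, purely for illustration, a 100 MHz processing clock (the actual operating frequency of the design in [20] may differ): 0.933 pixels/cycle × 100 MHz ≈ 93.3 Mpixels/s, which corresponds to roughly 300 frames per second for 640 × 480 images.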
2.3. Deep Learning Revolution
The emergence of deep neural networks (DNNs), particularly Convolutional Neural Networks (CNNs), has substantially improved object-detection accuracy. However, their computational intensity presents significant challenges for real-time deployment on resource-constrained FPGAs, necessitating various optimization strategies. Two-stage detectors, including R-CNN, Fast R-CNN, Faster R-CNN, and Mask R-CNN, employ a sequential approach in which region proposals are generated and then classified and localized. While achieving superior accuracy, their two-stage architecture and larger model sizes result in higher latency, making deployment on embedded FPGA platforms challenging. Conversely, one-stage detectors, such as YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector), directly predict bounding boxes and classes from feature maps, offering significantly higher detection speeds suitable for real-time applications.
Early deep-learning implementations on FPGAs focused on compressed and lightweight architectures to address resource constraints. Fan et al. (2018) developed a real-time object-detection accelerator using compressed SSDLite on FPGA, achieving a throughput of 65 FPS [21]. Takasaki et al. (2021) implemented a road-marking detector using Binarized Neural Networks (BNNs) on the Programmable Logic (PL) of an Ultra96-V2 board, achieving a per-frame processing time of 0.0054059 s (approximately 185 FPS) and demonstrating ultra-low-latency processing for autonomous driving applications [22]. Surapally et al. (2022) evaluated Quantized Neural Networks (QNNs) for object detection using a Tiny YOLO variant on an AMD-Xilinx PYNQ-Z2 board, achieving a 50× speedup compared to software implementations [23]. Additionally, Talib et al. (2022) conducted a comparative analysis of CNN, QNN, and BNN implementations on ZYNQ FPGAs, identifying the CNN's superior accuracy and memory efficiency for object-detection tasks [24]. Recent advances have introduced sophisticated real-time hazard-detection systems specifically designed for autonomous vehicles. Zhou et al. (2025) achieved 92.3% mAP with 8 ms latency for detecting pedestrians, vehicles, and obstacles, using an attention-based dynamic CNN with DVFS optimization that consumes only 35 W on an FPGA, compared to 250 W on a GPU, while processing at 125 FPS in the CNN inference stage [25]. This represents a significant advance in power efficiency for safety-critical applications.
Hybrid approaches combining traditional and deep-learning methods have also shown promise. Hamdaoui et al. (2022) proposed an optimized hardware vision system combining HOG, Particle Swarm Optimization (PSO), and SVM on a Virtex-7 FPGA for vehicle detection, achieving 97.84% accuracy on the KITTI dataset with 1.483 ms latency [26]. Kojima (2022) developed an autonomous robot car using a Xilinx SoC FPGA (Ultra96 board with XCZU3EG) for real-time image processing with YOLOv3-tiny via the Xilinx DPU IP, achieving approximately 3 FPS and highlighting the challenges of real-time processing for larger input sizes [27].
Advanced optimization techniques have enabled more sophisticated implementations. Zhai et al. (2023) implemented YOLOv3 and YOLOv3-tiny for vehicle detection and tracking on Zynq-7000 FPGAs, achieving significant model-size reduction (up to 98.2%) through dynamic-threshold structured pruning and 16-bit fixed-point quantization, alongside hardware optimizations including memory interlayer multiplexing and Winograd algorithms; the YOLOv3-tiny model reached 91.65 fps at 12.51 W power consumption, demonstrating high cost efficiency [28]. Baczmanski et al. (2023) implemented the MultiTaskV3 detection-segmentation network on an AMD Xilinx Kria KV260 SoC FPGA, achieving over 97% mAP for detection and above 90% mIoU for segmentation while consuming approximately 5 W at 4.85 FPS [29]. Anupreetham et al. (2023) presented an end-to-end fully pipelined FPGA-based object-detection system accelerating SSD-MobileNet-V1, achieving a very high throughput of 2167 FPS with 2.13 ms latency while maintaining 22.8 mAP on an Intel Stratix 10 FPGA through a novel pipelined Non-Maximum Suppression (NMS) algorithm that eliminates sequential dependencies [30]. Specialized applications have emerged for military surveillance, with Vasavi et al. (2024) achieving 93% accuracy for tank and APC detection (75.4% AP for tanks, 83.0% AP for APCs) using Mask R-CNN with a ResNet50+FPN backbone on ZYBO Z7-10 ZYNQ-7000 FPGA platforms [31]. Advanced optimization strategies demonstrate remarkable performance improvements, with Jeyalakshmi et al. (2025) achieving a 1600× speedup over a software implementation using VGG16 with quantization-aware training and the FINN framework on Pynq-Z2 boards for obstacle-avoidance systems [32].
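For context on the NMS bottleneck that the pipelined design of Anupreetham et al. (2023) addresses, the following minimal C++ sketch shows conventional greedy NMS: each candidate box is compared against every box already kept, and that loop-carried dependency is what limits straightforward hardware pipelining. The Box structure and the 0.5 IoU threshold are illustrative assumptions, not details of the cited accelerator.

// Conventional greedy NMS sketch (illustrative baseline only; not the
// pipelined algorithm of [30]).
#include <algorithm>
#include <vector>

struct Box { float x1, y1, x2, y2, score; };

static float iou(const Box& a, const Box& b) {
    float ix1 = std::max(a.x1, b.x1), iy1 = std::max(a.y1, b.y1);
    float ix2 = std::min(a.x2, b.x2), iy2 = std::min(a.y2, b.y2);
    float iw = std::max(0.0f, ix2 - ix1), ih = std::max(0.0f, iy2 - iy1);
    float inter = iw * ih;
    float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
    float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
    return inter / (areaA + areaB - inter);
}

// Each candidate is checked against all previously kept boxes, creating the
// sequential dependency that pipelined NMS variants seek to remove.
std::vector<Box> greedyNms(std::vector<Box> boxes, float iouThresh = 0.5f) {
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> kept;
    for (const Box& cand : boxes) {
        bool suppressed = false;
        for (const Box& k : kept) {
            if (iou(cand, k) > iouThresh) { suppressed = true; break; }
        }
        if (!suppressed) kept.push_back(cand);
    }
    return kept;
}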
Recent implementations have focused on achieving higher accuracy while maintaining real-time performance. Guerrouj et al. (2023) explored YOLOv4 acceleration on Intel Arria 10 FPGAs for autonomous driving, focusing on a General Matrix Multiplication (GEMM) implementation and achieving competitive mAP on the KITTI (up to 89.40%) and Self Driving Car datasets with a 38 ms runtime on KITTI [33]. Ali et al. (2023) integrated YOLOv5-based object detection into an ADAS framework on a DE10-Nano board, achieving 55 FPS for single-channel processing [34]. Al Amin et al. (2024) developed an FPGA-based real-time object detection and classification system using YOLOv3-tiny on a Xilinx Kria KV260, achieving 15 FPS for HD video streaming with 99% accuracy while consuming only 3.5 W [35]. Power optimization techniques for FPGAs in autonomous vehicles have also achieved remarkable efficiency gains, with Kalaiselvi et al. (2025) reducing power consumption by 65.9% while maintaining 91.7% lane-detection accuracy, demonstrating the potential of dynamic voltage and frequency scaling for energy-efficient autonomous systems [36].
Multi-task learning systems now integrate multiple perception tasks on single FPGA platforms, with Tatar et al. (2024) implementing real-time multi-task deep neural networks on an MPSoC-FPGA that process five ADAS functions at 22.45 FPS with only 6.920 W power consumption [37]. The field has also witnessed significant developments in specialized vehicle-detection applications. Vaithianathan (2024) presented a methodology for real-time object detection and recognition in FPGA-based autonomous driving systems, integrating deep-learning models with FPGA hardware acceleration to achieve the low latency and high precision required for safe navigation [38]. Mani et al. (2024) developed a high-accuracy FPGA-based system specifically designed for emergency vehicle classification, achieving 99.87% accuracy with a ResNet50-MOP-CB network architecture and demonstrating the versatility of FPGA platforms for specialized vehicle-classification tasks critical to autonomous driving [39]. Advanced driver assistance systems with real-time image processing on custom Xilinx DPUs achieve 22.15 FPS with 57.76% segmentation mIoU while consuming only 7.19 W, showcasing the potential of multi-task learning on embedded platforms [40].
Efficient FPGA-based embedded vision platforms achieve 361.8 GOPS/W energy efficiency for mobile-robot applications, with Yang et al. (2024) demonstrating superior performance for autonomous mobile robots through accumulation-as-convolution packing techniques [41]. Innovative approaches incorporating alternative sensing modalities have also emerged. Izquierdo et al. (2024) introduced an acoustic pedestrian-detection system using MEMS acoustic arrays, in which the FPGA handles sensor acquisition while the processing algorithms detect pedestrians in real time in urban environments, demonstrating the expanding scope of FPGA applications beyond traditional vision-based detection [42]. Cambuim et al. (2022) developed an FPGA-based pedestrian-detection system specifically designed for collision prediction, emphasizing the safety-critical aspects of real-time detection in autonomous vehicles [43]. Advanced sensor-fusion approaches achieve 97% accuracy with a 0.421 ms prediction time using mmWave radar data processed on FPGA platforms, demonstrating the potential of multimodal perception systems [44].
2.4. Optimization Strategies
Optimization techniques for FPGA-based implementations encompass both model compression and hardware-oriented design methodologies, which are crucial for fitting large DNNs onto limited FPGA resources while maintaining real-time performance. Model compression techniques aim to reduce the computational and memory requirements of neural networks. Quantization converts floating-point weights and activations to lower-bit fixed-point representations (e.g., 1-bit, 8-bit, or 16-bit), significantly reducing the memory footprint and computational requirements with minimal accuracy degradation. For instance, Sim-YOLOv2 with 1-bit weights and 3–6-bit activations achieved a 31× model-size reduction with a 10.15% accuracy loss [45]. Pruning eliminates redundant connections or neurons (unstructured pruning) or entire filters/channels (structured pruning), thereby reducing the number of parameters and computations; structured pruning is preferred for hardware efficiency because it maintains regular computational patterns [45]. Fully connected layers, which typically contain numerous parameters and require frequent off-chip memory access, can be replaced with pooling layers or removed entirely to reduce memory bandwidth and improve inference speed [45]. Recent work by Emmanuel et al. (2024) on optimizing resource utilization and power efficiency in FPGA-accelerated YOLOv8 object detection achieves a 9.2% resource reduction with 9.342 W power consumption using Vivado High-Level Synthesis tools on the Xilinx ZYNQ-7 ZC706 platform, demonstrating continued advances in optimization techniques [46].
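As a concrete illustration of the quantization step described above, the following C++ sketch performs symmetric 8-bit post-training quantization of a weight tensor and the corresponding dequantization. The per-tensor scale rule (mapping the largest-magnitude weight to ±127) is a common convention assumed here for illustration, not a procedure taken from any cited work.

// Symmetric 8-bit post-training quantization sketch (illustrative only).
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Choose a per-tensor scale so the largest-magnitude weight maps to +/-127.
float computeScale(const std::vector<float>& w) {
    float maxAbs = 0.0f;
    for (float v : w) maxAbs = std::max(maxAbs, std::fabs(v));
    return maxAbs > 0.0f ? maxAbs / 127.0f : 1.0f;
}

// Quantize: q = round(w / scale), clamped to the int8 range [-127, 127].
std::vector<int8_t> quantize(const std::vector<float>& w, float scale) {
    std::vector<int8_t> q(w.size());
    for (std::size_t i = 0; i < w.size(); ++i) {
        int v = static_cast<int>(std::lround(w[i] / scale));
        q[i] = static_cast<int8_t>(std::clamp(v, -127, 127));
    }
    return q;
}

// Dequantize: the real value is approximated as q * scale.
std::vector<float> dequantize(const std::vector<int8_t>& q, float scale) {
    std::vector<float> w(q.size());
    for (std::size_t i = 0; i < q.size(); ++i) w[i] = q[i] * scale;
    return w;
}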
Hardware-oriented design methods focus on maximizing computational parallelism and efficient data flow within the FPGA architecture. Pipelining overlaps execution stages so that multiple data elements are processed concurrently, improving throughput. Line buffering manages and reuses on-chip memory for feature-map operations, reducing off-chip memory accesses. Loop unrolling, tiling, and reordering increase parallelism by processing multiple data elements or iterations simultaneously while optimizing memory access patterns and data reuse [45]. Fused-layer architectures combine the operations of adjacent layers, such as convolution and batch normalization, or multiple convolution layers, to reduce off-chip data transfers and intermediate memory storage. Fast convolution algorithms such as Winograd and FFT reduce the number of multiplications, significantly improving computational efficiency for convolutional layers.
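To show how pipelining, line buffering, and loop unrolling map onto an FPGA design flow, the following HLS-style C++ sketch implements a 3 × 3 convolution that reads each input pixel once, buffers two image rows on chip, and fully unrolls the multiply-accumulate loop. The frame dimensions, data types, and pragma placement are illustrative assumptions rather than a reproduction of any cited accelerator.

// HLS-style 3x3 convolution sketch with line buffering, pipelining, and
// loop unrolling (illustrative; dimensions and pragmas are placeholders).
#include <cstdint>

constexpr int WIDTH  = 640;   // assumed frame width
constexpr int HEIGHT = 480;   // assumed frame height

void conv3x3(const uint8_t in[HEIGHT][WIDTH],
             int16_t out[HEIGHT][WIDTH],
             const int8_t kernel[3][3]) {
    // Two on-chip line buffers hold the previous rows so each input pixel
    // is fetched from external memory only once.
    static uint8_t lineBuf[2][WIDTH];
    uint8_t window[3][3];  // 3x3 sliding window over the image

    for (int y = 0; y < HEIGHT; ++y) {
        for (int x = 0; x < WIDTH; ++x) {
#pragma HLS PIPELINE II=1  // target one processed pixel per clock cycle
            // Shift the window left and insert the new column taken from
            // the line buffers and the incoming pixel.
            for (int i = 0; i < 3; ++i)
                for (int j = 0; j < 2; ++j)
                    window[i][j] = window[i][j + 1];
            window[0][2] = lineBuf[0][x];
            window[1][2] = lineBuf[1][x];
            window[2][2] = in[y][x];

            // Update the line buffers for the next row.
            lineBuf[0][x] = lineBuf[1][x];
            lineBuf[1][x] = in[y][x];

            // Fully unrolled multiply-accumulate over the 3x3 window.
            int16_t acc = 0;
            for (int i = 0; i < 3; ++i) {
#pragma HLS UNROLL
                for (int j = 0; j < 3; ++j) {
#pragma HLS UNROLL
                    acc += static_cast<int16_t>(window[i][j]) * kernel[i][j];
                }
            }
            // Write the result centered on (y-1, x-1); border pixels are
            // not produced in this simplified sketch.
            if (y >= 2 && x >= 2)
                out[y - 1][x - 1] = acc;
        }
    }
}

In a typical Vitis HLS flow, the PIPELINE directive targets an initiation interval of one pixel per cycle, and an ARRAY_PARTITION directive on the window and kernel arrays would usually be added so that all nine multiplications can be issued in parallel.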
2.6. Comparative Performance Insights
Figure 2 and Figure 3 display a comprehensive visualization of FPGA-based vehicle and pedestrian-detection performance across the different methodologies. The analysis reveals a significant performance evolution from classical to modern approaches. Modern deep-learning implementations demonstrate remarkable performance improvements, with the SSD-MobileNet-V1 implementation by Anupreetham et al. (2023) achieving an exceptional 2167 FPS through highly efficient hardware acceleration and a fully pipelined architecture [30]. The recent hazard-detection system by Zhou et al. (2025) achieves a high throughput of 125 FPS with 92.3% mAP while consuming only 35 W, representing a roughly 7× power reduction compared to a GPU implementation [25]. Similarly, the BNN road-marking detector by Takasaki et al. (2021) achieves approximately 185 FPS for smaller image sizes [22].
By comparison, the traditional HOG + SVM-based methods of Suleiman and Sze (2016) and Meus et al. (2017) achieve 60 FPS [16,17], while the optimized HOG implementation by He et al. (2024) now reaches 0.933 pixels per clock cycle with minimal resource consumption [20]. The pruned and quantized YOLOv3-tiny implementation by Zhai et al. (2023) demonstrates a commendable 91.65 FPS, showcasing the benefits of model compression and hardware optimizations [28]. Lower performance is observed on general-purpose embedded boards, such as the Raspberry Pi 3/Odroid C2 (5–7 FPS) used by Nazir et al. (2018) [18] and the unoptimized YOLOv3-tiny on the Ultra96 (3 FPS) by Kojima (2022) [27], emphasizing the importance of dedicated FPGA acceleration.
The latest implementations demonstrate continued advancement in performance metrics. The emergency vehicle classification system proposed by Mani et al. (2024) achieves an exceptional 99.87% accuracy, one of the highest accuracy rates reported for specialized vehicle-detection tasks [39]. The military vehicle classification system by Vasavi et al. (2024) achieves 93% accuracy with a specialized Mask R-CNN implementation [31], while mmWave radar integration by Mohan et al. (2025) achieves 97% accuracy with an ultra-low 0.421 ms prediction time [44]. Vaithianathan's (2024) framework demonstrates superior power efficiency, inference latency, and detection precision compared to conventional CPU and GPU implementations [38]. Power consumption analysis reveals significant improvements in energy efficiency across recent FPGA implementations. While the dedicated 45 nm SOI HOG + SVM ASIC by Suleiman and Sze (2016) already operated within a 69 mW power budget for a comparatively simple classical pipeline [16], modern FPGA-based deep-learning solutions demonstrate substantial energy-efficiency gains while executing far more complex models. Power optimization techniques proposed by Kalaiselvi et al. (2025) achieve a 65.9% power reduction (from 313.74 mW to 106.98 mW) while maintaining detection accuracy [36].
Multi-task systems achieve very high efficiency, with the implementation by Tatar et al. (2024) consuming only 6.920 W while processing five ADAS functions at 22.45 FPS [37]. The YOLOv3-tiny implementation by Al Amin et al. (2024) operates at a remarkably low 3.5 W [35], while MultiTaskV3 by Baczmanski et al. (2023) consumes approximately 5 W [29]. Even the high-performance pruned and quantized YOLOv3-tiny by Zhai et al. (2023) maintains a reasonable power consumption of 12.51 W despite achieving higher processing speeds [28].
Advanced embedded vision platforms achieve an exceptional energy efficiency of 361.8 GOPS/W for mobile-robot applications [41], while the transfer-learning implementation by Jeyalakshmi et al. (2025) demonstrates a very high 1600× speedup over a software implementation [32]. In terms of overall performance, FPGAs generally offer superior energy efficiency (GOPS/W) compared to GPUs, especially for inference tasks, owing to their customizability and support for optimized data precision. While GPUs often provide higher peak throughput (GFLOPS) for floating-point operations and possess richer development ecosystems, FPGAs achieve competitive throughput for specific tasks with significantly lower power consumption and latency, particularly in streaming-data applications where direct peripheral connections are beneficial [45].
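As a simple worked comparison of these efficiency figures, consider the hazard-detection system of Zhou et al. (2025): at its reported 125 FPS, the 35 W FPGA implementation corresponds to roughly 0.28 J per processed frame, whereas a 250 W GPU sustaining the same frame rate (an assumption made here purely for illustration) would consume about 2 J per frame.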
This trend highlights the inherent customizability of FPGAs, which enables highly optimized designs that achieve real-time performance with substantially reduced power footprints, making them well suited to power-constrained autonomous vehicle applications. The evolution from classical HOG + SVM approaches to modern deep-learning implementations demonstrates significant improvements in both processing speed and energy efficiency. While traditional methods provided a solid foundation at around 60 FPS, contemporary implementations utilizing model compression, hardware-specific optimizations, and advanced architectures have achieved remarkable throughput improvements, with some designs exceeding 2000 FPS. The combination of model compression strategies, such as quantization and pruning, with hardware-specific optimizations, including pipelining and memory management, continues to advance the state of the art in FPGA-based vehicle and pedestrian-detection systems.
However, the analysis reveals significant inconsistencies in performance metrics across studies. Some implementations report accuracy, others precision or mean Average Precision (mAP), while certain entries lack critical data such as power consumption, rendering direct comparisons challenging. For instance, latency values range widely due to differences in input image resolutions and FPGA architectures, while throughput metrics vary based on optimization techniques like pipelining or quantization. Power consumption data are frequently absent or reported under varying conditions, complicating energy efficiency assessments. These disparities underscore the need for a standardized evaluation framework to ensure fair and reproducible comparisons.
2.7. Cross-Task Analysis and Trends
The diversity of FPGA-accelerated vision tasks in autonomous driving—ranging from vehicle and pedestrian detection to traffic sign recognition (TSR), traffic light detection (TLD), and lane detection—presents both opportunities and challenges. This subsection synthesizes cross-task patterns, addressing the variability in algorithms, datasets, and hardware to extract design insights without proposing a validation framework.
Cross-task trends show that classical methods such as HOG-SVM excel in low-resource scenarios, achieving >95% accuracy at 30+ FPS (e.g., on INRIA and GTSRB) [48], but falter under appearance variability (occlusion, lighting). Deep learning (e.g., YOLO) offers greater robustness (98% mAP in TLD under occlusion) [21] at 5–10 W and 70–80% LUT utilization, with an energy efficiency of 5–7 GOPS/W on the KV260 versus 3–4 GOPS/W for classical methods [17].
Shared challenges include dataset fragmentation (e.g., GTSRB for TSR, TuSimple for lanes) [4] and real-time constraints (<10 ms, <5 W) [3]. Classical methods suit low-latency edge cases, while DL generalizes across tasks, favoring one-stage detectors (YOLO, SSD) [21] over two-stage detectors (Faster R-CNN) [13] because of their lower complexity. A case study integrating TSR and lane detection on a Xilinx Zynq, using YOLOv5 (98% accuracy on TT100K) [49] and SCNN for lanes [50], achieves 50 FPS at 4 W. This fusion improves occlusion handling but drops to 92% accuracy in low-contrast scenarios while using 75% of the LUTs, highlighting the need for standardized metrics [3]. Future designs should blend classical and DL methods, exploring sensor fusion [16] and explainable AI (XAI) for transparency [17], to guide cohesive FPGA systems.