Article

LNT-YOLO: A Lightweight Nighttime Traffic Light Detection Model

by Syahrul Munir 1,2,† and Huei-Yung Lin 3,*,†

1 College of Electrical Engineering and Computer Science, National Taipei University of Technology, Taipei 10608, Taiwan
2 Department of Physics, Universitas Pembangunan Nasional Veteran Jawa Timur, Surabaya 60294, Indonesia
3 Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei 10608, Taiwan
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Smart Cities 2025, 8(3), 95; https://doi.org/10.3390/smartcities8030095
Submission received: 12 April 2025 / Revised: 29 May 2025 / Accepted: 2 June 2025 / Published: 6 June 2025


Highlights

What are the main findings?
  • The integration of low-level feature enhancement, the SEAM module, and the HSM-EIoU loss significantly improves the detection of small and poorly defined objects: it enriches spatial and channel-wise feature representation, strengthens the model's focus on challenging samples, and provides more robust bounding-box regression, leading to higher detection accuracy and faster convergence.
  • A new, comprehensive dataset specifically designed for nighttime traffic light detection named TN-TLD is introduced, providing a valuable resource for evaluating and improving object detection models under low-light conditions.
What is the implication of the main findings?
  • The advancements in feature representation and loss function design can be applied to enhance object detection systems in smart city applications, improving safety and efficiency in urban environments.
  • The dataset serves as a benchmark for future research in this particular domain.

Abstract

Autonomous vehicles are one of the key components of smart mobility that leverage innovative technology to navigate and operate safely in urban environments. Traffic light detection (TLD) systems, as a key part of autonomous vehicles, play a central role in navigation during challenging traffic scenarios. Nighttime driving poses significant challenges for autonomous vehicle navigation, particularly with regard to the accuracy of TLD systems. Existing TLD methodologies frequently encounter difficulties under low-light conditions due to factors such as variable illumination, occlusion, and the presence of distracting light sources. Moreover, most recent works have focused only on daytime scenarios, often overlooking the significantly increased risk and complexity associated with nighttime driving. To address these critical issues, this paper introduces a novel approach for nighttime traffic light detection using the LNT-YOLO model, which is based on the YOLOv7-tiny framework. LNT-YOLO incorporates enhancements specifically designed to improve the detection of small and poorly illuminated traffic signals. Low-level feature information is utilized to recover the small-object features that are lost in the pyramid structure of the YOLOv7-tiny neck. A novel SEAM attention module is proposed to refine features representing both spatial and channel information by leveraging the Simple Attention Module (SimAM) and the Efficient Channel Attention (ECA) mechanism. The HSM-EIoU loss function is also proposed to accurately detect small traffic lights by amplifying the loss for hard-sample objects. In response to the limited availability of datasets for nighttime traffic light detection, this paper also presents the TN-TLD dataset. This newly curated dataset comprises carefully annotated images from real-world nighttime driving scenarios, featuring both circular and arrow traffic signals. Experimental results demonstrate that the proposed model achieves high accuracy in recognizing traffic lights in the TN-TLD dataset and in the publicly available LISA dataset. The LNT-YOLO model outperforms the original YOLOv7-tiny model and other state-of-the-art object detection models in mAP performance by 13.7% to 26.2% on the TN-TLD dataset and by 9.5% to 24.5% on the LISA dataset. These results underscore the model's feasibility and robustness compared to other state-of-the-art object detection models. The source code and dataset will be available through the GitHub repository.

1. Introduction

As one of the pillars of smart cities, smart mobility focuses on improving the transportation system by integrating innovative technologies [1]. Smart mobility is designed to reduce travel times, minimize environmental damage, improve quality of life, and increase safety [2,3,4]. Intelligent transportation systems (ITSs) play a critical role in achieving these goals. They could help minimize idle vehicle time by approximately 25%, reduce carbon emissions by around 15–20%, and prevent accidents by around 20% [5]. As one of the pioneering smart city technologies, ITSs have been implemented in many cities worldwide. The next phase of ITS development will be driven by the rise of connected and autonomous vehicles, which, when integrated with other sensing technologies, will facilitate the adoption of driverless vehicles and transform the way we travel [6]. In recent years, autonomous vehicle technology has advanced significantly, establishing itself as a fundamental component of smart mobility that has the potential to revolutionize transportation [7]. Initially, autonomous vehicle research was predominantly focused on facilitating autonomous driving within controlled environments, such as highways. This early focus emphasized the development of systems capable of navigating predictable high-speed roads with minimal environmental complexity. However, contemporary self-driving systems have evolved considerably, demonstrating the ability to navigate diverse and challenging traffic scenarios, including the intricate conditions of urban environments [8]. Numerous automotive manufacturers, suppliers, and technology firms are actively developing autonomous vehicles capable of traveling long distances without human intervention, and there are also some efforts being made by the government to regulate the use of these autonomous vehicles. This widespread investment underscores the growing potential of autonomous driving technology.
For autonomous vehicles to operate effectively on various types of roads, particularly in urban environments, precise detection and interpretation of the status and location of traffic lights are essential. Achieving this requires overcoming several challenges through technological innovation. These challenges include varying lighting conditions, occluded traffic lights, limited resolution for distant signals, and motion blur resulting from high-speed travel [9]. Furthermore, until Car-to-Infrastructure (C2I) communication is universally implemented, autonomous vehicles must coexist with traditional vehicles, necessitating the accurate detection of traffic lights originally designed for human drivers.
Traffic light detection (TLD) systems are crucial not only for the safe and efficient operation of autonomous vehicles, but they also offer significant potential benefits for individuals with visual impairments. However, designing reliable TLD systems presents substantial challenges, requiring solutions that are highly reliable, effective, stable, and capable of real-time operation. The inherent diversity of traffic lights, which vary in size, type, and illumination, further complicates this task. Existing research has focused predominantly on daytime scenarios, often overlooking the significantly increased risk and complexity associated with nighttime driving. Nighttime car accidents exhibit considerably higher incidence and fatality rates compared to daytime accidents, underscoring the critical need for improved nighttime TLD capabilities. This study directly addresses this gap by focusing on the development of enhanced TLD algorithms specifically designed for nighttime conditions, encompassing both arrow and circular traffic signals.
Understanding the complexities of nighttime traffic light detection (TLD) is essential, as these differ significantly from daytime scenarios, as illustrated in Figure 1. The images showcase the unique challenge of nighttime traffic light detection under low light levels compared to daytime. The presence of various light sources, such as vehicle taillights and streetlights, adds to the challenge because they can mimic traffic lights. In addition, the halo effect, characterized by a luminous ring around traffic lights, further complicates detection. Moreover, TLD systems must address issues such as color tone shifts, occlusion, and incomplete shapes, as highlighted in previous studies [9]. These factors contribute to the complexity of nighttime TLD, underscoring the need for robust algorithms capable of distinguishing traffic lights from other light sources under low-light conditions.
Currently, there are no publicly available datasets that specifically focus on night driving. The closest is the LISA dataset [9], which includes both daytime and nighttime driving data. However, in this dataset, the number of nighttime driving images is insufficient compared to the daytime images. Therefore, there is an urgent need for a publicly available dataset that focuses on nighttime driving.
Traffic light detection has advanced significantly, with numerous techniques proposed by researchers. These methods can be broadly classified into traditional algorithms that utilize image processing and machine learning, and those based on deep learning [10]. Earlier research relied heavily on computer vision techniques. However, in recent years, the use of deep neural networks has increased substantially. This approach has demonstrated excellent performance in various domains, including classification and detection. Some state-of-the-art detectors include the single-shot detector (SSD) [11], the faster region-based convolutional neural network (Faster R-CNN) [12], RetinaNet [13], and the You Only Look Once (YOLO) series [14].
This paper proposes an end-to-end detection model that processes the input image and produces the detection results of bounding-box prediction and class probability without the need for intermediate components to improve the accuracy and efficiency of TLD systems. The structure of the LNT-YOLO model is based on the YOLOv7 framework, using low-level features, integrating an attention mechanism, and modifying the loss function for accurate and efficient traffic light detection. The primary objective is to enhance the ability to accurately recognize and respond to traffic signals in nighttime conditions by identifying crucial features and their inter-relationships within images.
The contributions of this paper are outlined as follows:
  • Developing a lightweight traffic light detection model for nighttime driving scenes, named LNT-YOLO. By incorporating low-level features, an attention mechanism, and an EIoU loss function, the model effectively identifies small objects under low-illumination conditions.
  • Introducing a new publicly available dataset, TN-TLD (Taiwan Nighttime Traffic Light Dataset), which includes object-level annotations for training and testing nighttime traffic light detection models. This dataset covers all types of traffic light shapes, including circular and arrow lights.
  • Demonstrating the effectiveness of LNT-YOLO through a series of experiments conducted on both the TN-TLD dataset and the publicly available LISA dataset. This paper also compares its performance quantitatively and qualitatively with existing state-of-the-art (SOTA) methods. These results underscore the superiority of our model over previous approaches.
This paper is organized as follows. In Section 2, related work on general traffic light detection and particular nighttime traffic light detection is studied. Section 3 explains LNT-YOLO as a proposed method. In Section 4, the experimental results are presented. Lastly, Section 5 concludes this work.

2. Related Work

Navigation is a fundamental aspect of the design of autonomous driving systems, with precise traffic light detection (TLD) being crucial for effective vehicle navigation. Traditional approaches typically employ computer vision and image processing techniques for recognition [15,16]. These approaches are vulnerable to variations in distance, camera placement, lighting conditions, and vehicle computational power [17]. Relying on manually set parameters based on heuristic methods is not suitable for complex real-world scenarios and is labor-intensive. As a result, recent research has focused on leveraging deep learning techniques to develop data-driven models.
Advances in deep learning techniques have significantly improved the effectiveness of TLD systems. Several studies have employed two-stage methods for traffic light detection. For instance, Ouyang et al. [17] and Masaki et al. [18] used heuristic modules and semantic segmentation followed by CNN classifiers to improve accuracy and efficiency. Although these methods show some improvement compared to traditional approaches, they may suffer from higher computational costs and complexity.
Although not strictly one- or two-stage in the traditional sense, some models combine different architectures to achieve their goals. For example, Ou et al. [19] employed a combination of a Transformer and a ResNet CNN to improve safety in autonomous and manual driving by reducing red-light violations. A similar work by Yang et al. [20] introduces RC-DINO, a comprehensive approach for improving the recognition of traffic lights in autonomous vehicles with a combination of ResNeSt50, CBAM, and DINO. Although effective, these approaches can be considered multistage, since a CNN performs feature extraction and a transformer encoder–decoder carries out the prediction task. Similarly, Greer et al. [21] adopted a deformable DETR transformer with a salience-sensitive focal loss, improving the recall of critical traffic lights in various driving scenarios. This method also involves an encoder–decoder structure, making it a multistage process.
To mitigate these problems, one-stage methods have also been explored. Unlike two-stage detectors, one-stage methods perform object localization and classification in a single forward pass of the neural network. One-stage methods, such as those using the YOLO series, have been particularly effective. Wang et al. [22] improved YOLOv4 by addressing the sensitivity and detection precision of small objects. Li et al. [23] enhanced YOLOv5 with a coordinate attention layer and a weighted bidirectional feature pyramid. These methods offer real-time performance, but may still struggle in low-light conditions during nighttime.
Nighttime traffic light detection poses unique challenges due to low-light conditions and the need for highly sensitive image sensors. Haltakov et al. [24] and Diaz et al. [25] proposed methods that adapt to both daytime and nighttime scenarios, but these methods may not be optimized for the specific challenges of nighttime detection. Wang et al. [26] used a dual-channel method to detect traffic lights under varying illumination conditions, which is a step towards addressing nighttime detection, but it still has limitations in terms of accuracy and computational efficiency. Zhou et al. [27] introduced KCS-YOLO, which improved detection under poor visibility conditions, but their method required preprocessing images with a dark-channel-prior dehazing technique, which could add computational overhead.
Although various approaches have been explored to improve traffic light detection, each has its strengths and limitations. Two-stage methods offer high accuracy but at the cost of computational efficiency, while one-stage methods like YOLO provide real-time performance but may struggle with small objects in low-light conditions. The need for a lightweight, efficient, and accurate model remains a significant challenge, particularly for nighttime traffic light detection. Most of these models are too large and consume a significant amount of computational resources, making them less suitable for real-time applications in resource-constrained environments. For example, RC-DINO in [20] takes around 47 M parameters and 130 GFLOPs, while the enhanced YOLOv5 model in [23] takes around 7 M parameters and 16 GFLOPs, and KCS-YOLO in [27] takes around 7.4 M parameters and 20.7 GFLOPs. This is particularly problematic for autonomous vehicles, where efficient and accurate traffic light detection is critical. In addition to that, only a few of them focused on nighttime images compared to daytime images. Given these limitations, the main objective of this work is to develop a more lightweight traffic light detection model that maintains high accuracy while reducing computational load in night conditions. Our approach aims to leverage efficient architectures and novel attention mechanisms to achieve real-time performance without compromising detection accuracy. By designing a lightweight model, we aim to address the gap between accuracy and efficiency, making traffic light detection more feasible and reliable in real-world autonomous driving scenarios.

3. Proposed Method

This study focuses on the need to improve the accuracy of recognition at night, when numerous challenges arise. In this research, a novel vision-based traffic light detection system is proposed to address this particular problem. Figure 2 shows the proposed LNT-YOLO framework based on the YOLOv7-tiny model. YOLOv7 is chosen as the base model because other works have already shown that it delivers excellent performance in detecting small objects in several domains, such as traffic signs [28], floating waste objects [29], aerial-view objects [30,31], and people at long distances [32]. The lightweight version, YOLOv7-tiny, significantly reduces the number of parameters and computation while maintaining similar accuracy, making it suitable for applications with limited computational resources.
The baseline framework of YOLOv7-tiny can be divided into three parts: backbone, neck, and head. The backbone of our LNT-YOLO model is based on the ELAN module from the YOLOv7-tiny framework. This module utilizes the CBL (Convolutional Block Layer) base convolution module for feature extraction. The backbone of the original YOLOv7-tiny is retained in our proposed model to leverage its efficient feature extraction capabilities. The neck component of our model employs SPPCSPC (Spatial Pyramid Pooling with Convolutional Block Layer and Channel-Split Convolutional Block Layer) and ELAN modules to aggregate image features. A key innovation in our model is the introduction of the Simple Efficient Attention Module (SEAM) as a bridge connecting the backbone to the neck. This novel SEAM module is proposed by combining the Simple Attention Module (SimAM) [33] and the Efficient Channel Attention (ECA) mechanism [34] to improve the representation of features of small objects, such as traffic lights, which are often characterized by subtle visual cues and limited spatial extent. In addition, a new layer is introduced to combine low-level features with deep-level features. This approach improves the detection of small objects while maintaining the performance for medium and large objects. At the head of our model, standard convolutional layers are used to adjust the channels of the output features for prediction and output, similar to the baseline YOLOv7-tiny model. The detection of traffic lights at night is challenging due to low-light conditions and the small size of traffic lights, which can lead to inaccurate classification and imprecise localization. To address these issues, we use the HSM (Hard Sample Mining) loss function in conjunction with the EIoU (Efficient Intersection over Union) loss function. This combination helps distinguish traffic lights from the background and improves the precision of bounding-box coordinates, even in low-resolution or blurry images. By integrating the SEAM module within the neck of the network and adding a low-level feature extraction layer, we enhance the model's ability to extract relevant features from nighttime traffic light images. Modifying the original loss function to the HSM-EIoU loss function further improves the model's performance in accurately detecting and localizing traffic lights under challenging nighttime conditions.
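For orientation, the sketch below illustrates the data flow just described: backbone features pass through SEAM bridges, are fused in the neck, and are fed to the detection heads. It is a minimal PyTorch-style skeleton in which the backbone, neck, and head modules are placeholders for the corresponding YOLOv7-tiny components, not the actual implementation.

```python
import torch.nn as nn

class LNTYOLOSketch(nn.Module):
    """Illustrative data flow only: backbone -> SEAM bridges -> neck -> heads."""
    def __init__(self, backbone, seam_blocks, neck, heads):
        super().__init__()
        self.backbone = backbone                       # YOLOv7-tiny ELAN backbone (unchanged)
        self.seam_blocks = nn.ModuleList(seam_blocks)  # one SEAM per bridged scale
        self.neck = neck                               # SPPCSPC + ELAN feature aggregation
        self.heads = nn.ModuleList(heads)              # original three YOLOv7-tiny heads

    def forward(self, x):
        # Backbone is assumed to return a list of multi-scale feature maps
        feats = self.backbone(x)
        # Refine each bridged feature map with SEAM before fusion in the neck
        feats = [seam(f) for seam, f in zip(self.seam_blocks, feats)]
        # Neck performs the (four-scale) fusion of shallow and deep features
        fused = self.neck(feats)
        # Each head predicts boxes, objectness, and class probabilities at its scale
        return [head(f) for head, f in zip(self.heads, fused)]
```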

3.1. Low-Level Feature Enhancement

Traffic lights often appear at a distance and only occupy a small portion of the image. At night, recognizing these distant traffic lights becomes even more challenging for cameras. Although YOLOv7-tiny is effective in various applications, it struggles to detect small objects due to its reliance on a convolution-based feature extraction approach. As the network depth increases, the resolution of the feature map progressively decreases. Additionally, pooling layers and convolutional kernels further diminish the small features of the targets. Consequently, these factors may result in the loss of small-object feature details during feature extraction.
To address this issue, LNT-YOLO incorporates a low-level feature detection layer, represented in Figure 2 by the box with the orange dashed line. The initial three-scale feature fusion in YOLOv7-tiny is expanded into a four-scale feature fusion. The feature detection layer with a size of 80 × 80 is duplicated and subsequently merged with the original 80 × 80 layer. This method helps the network retain detailed image features, improving its ability to detect smaller targets while preserving the original YOLOv7-tiny head structure.
Furthermore, the pyramid structure, depicted in the neck section of Figure 2, effectively combines shallow and deep features at different resolutions. This integration not only improves small-target detection but also maintains accuracy for medium and large targets. As a result, this refinement is expected to significantly enhance the model's detection performance.
Although our low-level feature enhancement strategy may appear similar to the small-object detection design in KCS-YOLO [27], there are important architectural differences that establish the novelty of our approach. First, KCS-YOLO is built on the YOLOv5n architecture, whereas our approach uses YOLOv7-tiny, which differs significantly in its backbone structure, neck design, and feature propagation strategy. Second, KCS-YOLO introduces a new modified head structure tailored for shallow feature enhancement. In contrast, our method retains all three original detection heads of YOLOv7-tiny (for small, medium, and large objects), preserving its multiscale prediction capability. We enhance small-object detection by introducing low-level feature maps into higher-level features within the neck, allowing the network to recover fine-grained details often lost in the downsampling process. This cross-scale fusion enriches the input to the detection heads without altering their structure, enabling better recognition of small and dim traffic lights during nighttime scenarios. These architectural distinctions contribute to both the novelty and effectiveness of our approach for nighttime traffic light detection.
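To make the cross-scale fusion concrete, the PyTorch sketch below projects a shallow, high-resolution feature map, resizes it to the 80 × 80 grid, and concatenates it with the existing neck feature before a fusion convolution. The channel widths, resizing mode, and fusion block are assumptions chosen for illustration; the exact routing inside LNT-YOLO follows Figure 2.

```python
import torch
import torch.nn as nn

class LowLevelFusion(nn.Module):
    """Sketch: fuse a shallow (high-resolution) feature map into an 80x80 neck feature
    so that fine details of small traffic lights are preserved."""
    def __init__(self, low_ch, neck_ch, out_ch):
        super().__init__()
        self.project = nn.Conv2d(low_ch, neck_ch, kernel_size=1)   # align channel count
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * neck_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True),
        )

    def forward(self, low_feat, neck_feat):
        # Resize the shallow feature to the spatial size of the neck feature (e.g., 80x80)
        low = self.project(low_feat)
        low = nn.functional.interpolate(low, size=neck_feat.shape[-2:], mode="nearest")
        return self.fuse(torch.cat([low, neck_feat], dim=1))

# Toy usage with illustrative tensor sizes: a 160x160 shallow map fused into an 80x80 neck map
if __name__ == "__main__":
    low = torch.randn(1, 64, 160, 160)
    neck = torch.randn(1, 128, 80, 80)
    print(LowLevelFusion(64, 128, 128)(low, neck).shape)  # torch.Size([1, 128, 80, 80])
```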

3.2. Simple Efficient Attention Module

To enhance the representation of features within the YOLOv7-tiny architecture, this work introduces a new feature refinement module called the Simple Efficient Attention Module (SEAM). This module sequentially applies the Simple Attention Module (SimAM) [33] and the Efficient Channel Attention Mechanism (ECA) [34], and it is positioned before the concatenation layer in the backbone network. Both of them are used because of their fast and lightweight properties compared to other attention modules, as claimed in their respective works. While SimAM refines spatial relationships within feature maps, ECA selectively emphasizes informative channels. This combined module generates an output that is then combined with the preceding feature map, effectively enriching the feature representation with contextual spatial and channel-wise information. The proposed module is named the Simple Efficient Attention Module (SEAM), as depicted in Figure 3.
SimAM, rooted in neuroscientific principles, represents a novel approach to attention in convolutional neural networks. Unlike other attention-based modules that use one-dimensional channel attention or two-dimensional spatial attention, SimAM does not require traditional pooling operations. Instead, it assigns weights based on an energy function derived from neuroscience theory and the principle of linear differentiability. This unique approach results in a parameter-free three-dimensional weighted attention mechanism that avoids the introduction of additional trainable parameters.
The energy function of SimAM is as follows:
$E_t(w_t, b_t, y, x_i) = \frac{1}{M-1}\sum_{i=1}^{M-1}\left(-1 - (w_t^{T} x_i + b_t)\right)^2 + \left(1 - (w_t^{T} t + b_t)\right)^2 + \lambda w_t^2,$
where $t$ and $x_i$ refer to the target neuron and the other neurons, respectively, within a single channel of the input feature $X \in \mathbb{R}^{H \times W \times C}$. The index $i$ spans the spatial dimension, and $M = H \times W$ represents the total number of neurons in that channel. $w_t$ and $b_t$ denote the weight and bias, respectively. Equation (1) has a fast closed-form solution with respect to $w_t$ and $b_t$, which can be obtained analytically as
$w_t = -\frac{2(x_t - \mu_t)}{(x_t - \mu_t)^2 + 2\sigma_t^2 + 2\lambda}$
$b_t = -\frac{1}{2}(x_t + \mu_t) w_t$
where
$\mu_t = \frac{1}{M-1}\sum_{i=1}^{M-1} x_i$
$\sigma_t^2 = \frac{1}{M-1}\sum_{i=1}^{M-1} (x_i - \mu_t)^2.$
Hence, the minimum energy can be calculated using the following equation:
$E_t^{*} = \frac{4(\hat{\sigma}^2 + \lambda)}{(x_t - \hat{\mu})^2 + 2\hat{\sigma}^2 + 2\lambda},$
where the importance of each neuron is given by $1/E_t^{*}$. Unlike the original SimAM, this work retains this importance value without applying the sigmoid activation function.
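A minimal PyTorch sketch of this parameter-free weighting, following the closed-form minimum energy above and omitting the sigmoid as described; the regularization value $\lambda = 10^{-4}$ is the default from the original SimAM work and is assumed here.

```python
import torch

def simam(x: torch.Tensor, lam: float = 1e-4) -> torch.Tensor:
    """Parameter-free SimAM weighting of a feature map x of shape (B, C, H, W)."""
    n = x.shape[2] * x.shape[3] - 1                     # M - 1 neurons per channel
    mu = x.mean(dim=(2, 3), keepdim=True)               # per-channel spatial mean
    d = (x - mu).pow(2)                                 # (x_t - mu)^2 for every neuron
    var = d.sum(dim=(2, 3), keepdim=True) / n           # per-channel variance sigma^2
    importance = d / (4 * (var + lam)) + 0.5            # equals 1 / E_t^*
    return x * importance                               # re-weight features, no sigmoid

# Toy usage
if __name__ == "__main__":
    print(simam(torch.randn(1, 8, 16, 16)).shape)       # torch.Size([1, 8, 16, 16])
```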
The ECA module employs a channel attention mechanism characterized by its efficiency and avoidance of dimensionality reduction. Unlike global attention methods that utilize fully connected layers to reduce dimensionality, ECA directly models channel interdependencies through a one-dimensional convolution. This direct approach preserves the channel-wise granularity of information, enhancing the model’s capacity to capture subtle feature variations.
Formally, consider $X \in \mathbb{R}^{H \times W \times C}$ as the input feature map, with $H$, $W$, and $C$ representing the height, width, and number of channels, respectively. The ECA module operates on the channel dimension, processing each spatial location independently. The input feature map is first processed by global average pooling, which compresses the spatial dimensions into a single value per channel and forms a 1D channel descriptor. The global spatial information for each channel is thus obtained by averaging over the spatial dimensions of the input feature map, calculated as
$s_c = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} X_{c,i,j},$
where $s_c$ denotes the global spatial information for channel $c$. This yields a vector $s \in \mathbb{R}^{C}$ that captures the spatial data for all channels. Then, ECA-Net performs a 1D convolution on the vector $s$ as follows:
$\hat{\beta} = \mathrm{Conv1D}(s),$
where $\hat{\beta} \in \mathbb{R}^{C}$ is the result of the convolution. In the original ECA module, the size $k$ of the 1D convolution kernel is chosen adaptively. However, to simplify computation, this work fixes $k = 3$. In addition, as with the preceding SimAM module, the resulting channel weights are retained without applying the sigmoid activation function.
In LNT-YOLO, SEAM is strategically placed between the backbone and neck, processing feature maps before further refinement. This integration improves small-object representation, which is crucial for traffic lights characterized by subtle cues and limited spatial extent. The combination of the parameter-free SimAM module and the lightweight ECA module in SEAM ensures that it does not significantly increase the computational cost or the number of parameters in the already lightweight YOLOv7-tiny model. The improved feature representation facilitated by SEAM is expected to lead to more accurate and robust detection of small objects such as traffic lights.
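The sketch below combines the two mechanisms in the order described: SimAM spatial weighting followed by a $k = 3$ ECA channel convolution, both without sigmoid. The residual addition of the refined output to the incoming feature map reflects our reading of how the result is "combined with the preceding feature map" and should be treated as an assumption.

```python
import torch
import torch.nn as nn

class SEAM(nn.Module):
    """Sketch of the Simple Efficient Attention Module: SimAM then ECA (k = 3),
    both without sigmoid, with a residual combination (assumed)."""
    def __init__(self, lam: float = 1e-4, k: int = 3):
        super().__init__()
        self.lam = lam
        self.conv1d = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SimAM: parameter-free 3D weighting from the closed-form minimum energy
        n = x.shape[2] * x.shape[3] - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        var = d.sum(dim=(2, 3), keepdim=True) / n
        y = x * (d / (4 * (var + self.lam)) + 0.5)       # 1 / E_t^*, no sigmoid

        # ECA: global average pooling, then a 1D convolution across channels
        s = y.mean(dim=(2, 3))                           # (B, C) channel descriptor
        beta = self.conv1d(s.unsqueeze(1)).squeeze(1)    # (B, C) channel weights, no sigmoid
        y = y * beta.unsqueeze(-1).unsqueeze(-1)

        return x + y                                     # combine with preceding feature map

# Toy usage
if __name__ == "__main__":
    print(SEAM()(torch.randn(2, 64, 80, 80)).shape)      # torch.Size([2, 64, 80, 80])
```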

3.3. HSM-EIoU Loss

In object detection tasks, the loss function typically comprises three components: regression loss for bounding boxes, confidence prediction loss, and classification loss. YOLOv7, in its original configuration, utilizes focal loss for classification and Complete Intersection over Union (CIoU) loss for bounding-box regression. Focal loss addresses class imbalance by reducing the weight of easily classified samples, focusing more on difficult ones. Meanwhile, CIoU loss optimizes localization by considering overlap area, center-point distance, and bounding-box aspect ratio. However, these loss functions have limitations when applied to the challenging task of detecting nighttime traffic lights. Low-light conditions and the small size of traffic lights often lead to inaccurate classification (due to difficulty distinguishing traffic lights from the background) and imprecise localization (due to CIoU’s sensitivity to small variations in bounding-box coordinates, which are amplified in low-resolution or blurry images). These limitations result in reduced detection accuracy and missed detections, particularly for smaller or partially obscured traffic lights.
To address these shortcomings, this work proposes a modified loss function architecture. Specifically, we replace the focal loss with the Hard Sample Mining (HSM) loss and the CIoU loss with the Efficient Intersection Over Union (EIoU) loss. The HSM loss, proposed by [31], dynamically adjusts the weighting of samples based on their difficulty, improving the model’s focus on challenging instances. This could help the model to achieve a balance between positive and negative samples and easy and hard samples, as claimed in their work. The EIoU loss, proposed by [35], offers a more robust and efficient approach to bounding-box regression, which is particularly beneficial for small and poorly defined objects. EIoU loss is chosen because the original CIoU loss used by the baseline model suffers from slow convergence and inaccurate positioning in the optimization process of the bounding-box regression model [36].
Unlike binary cross-entropy (BCE) loss, which primarily focuses on suppression, HSM loss emphasizes both the amplification and suppression of samples. Initially, it uses the same modulation factor as focal loss to amplify the loss for hard samples while suppressing it for easy ones. Additionally, a lower limit is established in the modulation factor to prevent the exponentially decreasing decay factor from adversely affecting the model’s learning ability once it reaches a threshold for easy samples. This ensures a balanced and effective training process.
$\mathrm{HSM\ Loss} = -\left(2^{\gamma}(1 - p_t)^{\gamma} + 0.1\right)\log(p_t)$
Figure 4 illustrates the differences in classification loss among traditional binary cross-entropy (BCE) loss, focal loss of YOLOv7, and HSM loss. It is evident that focal loss consistently yields smaller values than cross-entropy loss because it suppresses both hard and easy samples. The HSM loss introduced in this study attempts to balance hard and easy samples. It amplifies the loss when the confidence probability is below 0.5 to enhance the detection of hard-to-detect objects, while it also suppresses the loss when the confidence probability exceeds 0.5 to mitigate the detection’s overfitting of easier objects. It can be seen that when confidence exceeds 0.5, the loss resembles focal loss. However, when confidence falls below 0.5, it closely resembles cross-entropy loss but with significant variation, facilitating better extraction of hard samples.
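A small PyTorch sketch of the HSM classification term as written in the equation above; the focusing parameter $\gamma = 2$ is a common focal-loss default and is an assumption here, not a value taken from the cited work.

```python
import torch

def hsm_loss(p_t: torch.Tensor, gamma: float = 2.0, floor: float = 0.1) -> torch.Tensor:
    """Hard Sample Mining loss: the modulation (2(1 - p_t))^gamma amplifies hard samples
    (p_t < 0.5) and suppresses easy ones, while the 0.1 floor keeps the loss from
    decaying to zero for very easy samples."""
    p_t = p_t.clamp(min=1e-7, max=1.0)                   # numerical safety for the log
    modulation = (2.0 * (1.0 - p_t)).pow(gamma) + floor
    return -(modulation * torch.log(p_t))

# Example: a hard sample (p_t = 0.2) is penalized far more than an easy one (p_t = 0.9)
if __name__ == "__main__":
    print(hsm_loss(torch.tensor([0.2, 0.5, 0.9])))
```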
To address the challenges inherent in small-object detection, particularly the need for precise localization and rapid convergence during training, the EIoU loss function is integrated into the LNT-YOLO model. Small objects often present limited visual information, making accurate bounding-box regression crucial but difficult to achieve. EIoU's superior ability to regress bounding-box coordinates, considering both center distance and aspect ratio discrepancies, directly addresses this challenge. Unlike CIoU, which may struggle with the aspect ratio of small-object bounding boxes, EIoU's comprehensive approach leads to improved localization accuracy for these subtle targets while simultaneously offering faster convergence compared to traditional IoU-based losses. This combination of improved localization precision and efficient training makes EIoU a highly suitable choice to enhance the detection performance of the proposed model, especially with respect to small objects. EIoU is defined by the following equations:
$L_{EIoU} = L_{IoU} + L_{dis} + L_{asp}$
$L_{IoU} = 1 - \mathrm{IoU}$
$L_{dis} = \frac{\rho^2(b, b^{gt})}{(w^c)^2 + (h^c)^2}$
$L_{asp} = \frac{\rho^2(w, w^{gt})}{(w^c)^2} + \frac{\rho^2(h, h^{gt})}{(h^c)^2}$
where $\rho(\cdot,\cdot)$ denotes the Euclidean distance, $b$ and $b^{gt}$ are the centers of the predicted and ground-truth boxes, $w$, $h$, $w^{gt}$, and $h^{gt}$ are their widths and heights, and $w^c$ and $h^c$ are the dimensions of the smallest box that encloses the two boxes. The loss function is segmented into three parts: IoU loss $L_{IoU}$, distance loss $L_{dis}$, and aspect ratio loss $L_{asp}$. This structure helps maintain the advantageous properties of the CIoU loss. Furthermore, EIoU minimizes the discrepancy between the height and width of the target and anchor boxes, leading to faster convergence and better localization accuracy.
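For concreteness, a self-contained PyTorch sketch of the EIoU loss for boxes in (x1, y1, x2, y2) format, following the four equations above (IoU, center-distance, and width/height terms):

```python
import torch

def eiou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """EIoU loss for boxes in (x1, y1, x2, y2) format."""
    # Intersection and union
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    w_p, h_p = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w_g, h_g = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    union = w_p * h_p + w_g * h_g - inter + eps
    iou = inter / union

    # Smallest enclosing box dimensions (w^c, h^c)
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])

    # Center-distance term L_dis
    cx_p, cy_p = (pred[..., 0] + pred[..., 2]) / 2, (pred[..., 1] + pred[..., 3]) / 2
    cx_g, cy_g = (target[..., 0] + target[..., 2]) / 2, (target[..., 1] + target[..., 3]) / 2
    l_dis = ((cx_p - cx_g) ** 2 + (cy_p - cy_g) ** 2) / (cw ** 2 + ch ** 2 + eps)

    # Width/height (aspect) term L_asp
    l_asp = (w_p - w_g) ** 2 / (cw ** 2 + eps) + (h_p - h_g) ** 2 / (ch ** 2 + eps)

    return (1.0 - iou) + l_dis + l_asp

# Example: a slightly shifted prediction against its ground-truth box
if __name__ == "__main__":
    print(eiou_loss(torch.tensor([[10., 10., 30., 50.]]), torch.tensor([[12., 12., 32., 54.]])))
```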

4. Experimental Results

The proposed LNT-YOLO model is evaluated on the TN-TLD dataset, and its performance is compared with other related works. Moreover, this paper analyzes the impact of the low-level feature enhancement, the SEAM attention module, and the HSM-EIoU loss on the LNT-YOLO method, as well as its runtime.

4.1. TN-TLD Dataset

Currently, there is no publicly available dataset specifically designed for the detection of traffic lights at night. To address this gap, the TN-TLD dataset was developed, which captures nighttime driving scenarios in Taiwan. This dataset features scenes from the Chiayi, Taiwan, road network, covering a wide range of traffic intersections, various types of traffic lights, and a mix of urban and rural environments. The camera used for recording was positioned at the front of the driver’s seat inside the vehicle, providing a perspective aligned with that of a driver.
The dataset consists of five recorded scenes, each approximately five minutes in duration, with meticulous manual labeling of bounding boxes in each frame. Each image in the TN-TLD dataset was carefully examined, and bounding boxes were manually drawn to tightly fit around the traffic lights to minimize background space. Consequently, the dataset comprises a total of 36,050 images, offering a comprehensive and detailed resource for research on nighttime traffic light detection. It includes a diverse set of images captured under various conditions, such as different times and different road sections, to ensure that the dataset is representative of real-world scenarios. The substantial size and wide range of variability in the dataset make it an excellent resource for achieving generalization in the training process. For training and validation purposes, three of the recorded scenes are allocated to these tasks, while the remaining scenes are reserved exclusively for testing. This division ensures robust model evaluation and validation in a diverse range of nighttime driving scenarios.
Bounding boxes were used to label each traffic light around the illuminated light. The traffic lights are categorized into six classes based on three different colors and four different shapes: circle red light (referred to as "red"), circle yellow light (referred to as "yellow"), circle green light (referred to as "green"), left arrow green light (referred to as "left"), straight arrow green light (referred to as "straight"), and right arrow green light (referred to as "right"). A detailed overview of the TN-TLD dataset is presented in Table 1. This table lists the number of samples for each class and illustrates the class balance across the different phases of model evaluation. The TN-TLD dataset, along with the LNT-YOLO model code, is publicly accessible via GitHub.
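For readers who want to consume the annotations programmatically, the hypothetical helper below assumes the common YOLO text format (class id plus normalized center coordinates, width, and height per line); the class names come from the list above, but the class-ID ordering and file layout are assumptions and may differ from the released data.

```python
from pathlib import Path

# Assumed class-ID ordering; the released TN-TLD files may use a different one.
TN_TLD_CLASSES = ["red", "yellow", "green", "left", "straight", "right"]

def load_labels(label_file: str):
    """Return a list of (class_name, x_center, y_center, width, height) tuples
    from a YOLO-style label file (all coordinates normalized to [0, 1])."""
    boxes = []
    for line in Path(label_file).read_text().splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue                      # skip malformed lines
        cls_id, xc, yc, w, h = int(parts[0]), *map(float, parts[1:])
        boxes.append((TN_TLD_CLASSES[cls_id], xc, yc, w, h))
    return boxes
```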

4.2. Evaluation Criteria

This work uses mean average precision (mAP) as the metric to evaluate the performance of our detector model. The mAP is calculated as the average of the average precision (AP) values across all classes, providing a proportional measure of the model's overall accuracy. The equation is as follows:
$mAP = \frac{1}{n}\sum_{k=1}^{n} AP_k$
The mAP@0.5 metric measures the average precision for all classes with an Intersection over Union (IoU) value of at least 0.5. On the other hand, mAP@0.5:0.95 evaluates the average precision by considering IoU values from 0.5 to 0.95 and then averaging these results. These metrics are prevalent in assessing the effectiveness of object detection algorithms, especially within the YOLO framework.
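As a simple illustration of these metrics, the snippet below averages per-class AP values at one IoU threshold and then averages those results over several thresholds to obtain mAP@0.5:0.95; the AP values themselves (and the toy numbers in the example) are assumed to be computed elsewhere.

```python
import numpy as np

def mean_average_precision(ap_per_class: dict) -> float:
    """mAP at a single IoU threshold: the mean of per-class AP values."""
    return float(np.mean(list(ap_per_class.values())))

def map_50_95(ap_per_class_per_iou: dict) -> float:
    """mAP@0.5:0.95: average the single-threshold mAP over IoU = 0.50, 0.55, ..., 0.95."""
    return float(np.mean([mean_average_precision(ap) for ap in ap_per_class_per_iou.values()]))

# Toy example with made-up AP values for two classes at two IoU thresholds
if __name__ == "__main__":
    print(map_50_95({0.50: {"red": 0.8, "green": 0.9}, 0.95: {"red": 0.3, "green": 0.4}}))
```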

4.3. Implementation Details

The experimental methodology employed for this project focused on evaluating the model’s performance and establishing robust training procedures. Two different mAP values are used, as mentioned in the previous subsection, to evaluate the model’s performance.
The TN-TLD dataset is utilized to perform the experiments for training, validating, and testing the model. The input image resolution for training is set to 1280 × 1280 pixels, and the model is trained for 300 epochs. Data augmentation techniques are used to mitigate the class imbalance in the training set. HSV augmentation, image translation, image scaling, image mosaic, and paste-in augmentation, which are already integrated into the YOLOv7 codebase, are chosen to improve the results. This work uses the same hyperparameters as YOLOv7, and the YOLOv7-tiny pre-trained weights provided by the original authors are utilized.

4.4. Results

4.4.1. Comparisons with Other SOTA Detectors

To evaluate the effectiveness of the proposed nighttime object detection model, experiments were conducted using the TN-TLD dataset.
The LNT-YOLO model was compared against other state-of-the-art (SOTA) deep learning-based object detection algorithms. Object detection models that use YOLO as their baseline architecture were selected for comparison with our proposed method. Among these, the variants with the lowest parameter counts were chosen to match the size of the LNT-YOLO model. For a fair comparison, each model employed its pre-trained weights, which were trained on the COCO dataset, while maintaining its unique augmentations.
Table 2 shows a comprehensive overview of the mAP, parameter count, and GFLOPs of the different object detection models. Bold text indicates the best value among the compared models. The results demonstrate the superior performance of the proposed LNT-YOLO model compared to several SOTA models in the challenging task of detecting nighttime traffic lights. It achieves improvements of more than 20% in mAP@0.5 and more than 13% in mAP@0.5:0.95 compared to other works. The LNT-YOLO model achieved the highest mean average precision (mAP) scores of 0.781 at an IoU threshold of 0.5 and 0.423 at IoU thresholds ranging from 0.5 to 0.95. These results underscore the effectiveness of the enhancements incorporated into LNT-YOLO, including the addition of low-level features and the SEAM module, which collectively contribute to improved detection performance in low-light conditions. The use of the HSM-EIoU loss function also plays a crucial role in achieving this balance, as it improves the model's focus on challenging samples and enhances localization accuracy, particularly in the presence of small and poorly defined objects.
In terms of model complexity, LNT-YOLO maintains a competitive parameter count of approximately 6.4 M, slightly higher than the baseline YOLOv7-tiny and DEYO-N. Despite this, LNT-YOLO significantly outperforms these models in mAP metrics, indicating more efficient use of parameters to enhance feature representation and detection accuracy. The integration of low-level feature addition allows LNT-YOLO to capture finer details necessary for detecting small objects, like traffic lights, while SEAM enhances the model’s attention mechanism without adding extra parameters. Furthermore, the computational efficiency of LNT-YOLO is demonstrated by its GFLOP count of 14.9, which remains reasonable given the substantial improvements in detection accuracy. Although KCS-YOLO achieves a lower parameter count, it does not match the mAP performance of LNT-YOLO, underscoring the importance of the strategic enhancements in the proposed model.
Table 3 presents a comparison of the proposed LNT-YOLO model against the same SOTA object detection models on the LISA dataset. Bold text indicates the best values among the compared models. The inclusion of daytime and nighttime traffic light images in the LISA dataset allows for a more robust evaluation of the model's generalization capabilities beyond the nighttime scenarios that are the main focus of this work.
The results demonstrate that our proposed approach significantly outperforms the other models in terms of mean average precision (mAP) on this combined daytime and nighttime traffic light detection dataset. It achieves improvements of more than 15% in mAP@0.5 and more than 9% in mAP@0.5:0.95 compared to other works. Specifically, LNT-YOLO achieves an mAP@0.5 of 0.325 and an mAP@0.5:0.95 of 0.138, representing substantial improvements over the best-performing competitor, DEYO-N (mAP@0.5 = 0.282, mAP@0.5:0.95 = 0.12). This superior performance across both mAP metrics indicates that LNT-YOLO achieves higher accuracy in detecting traffic lights, even at higher Intersection over Union (IoU) thresholds, suggesting better localization and classification capabilities.
The improved performance of LNT-YOLO is particularly notable given the inclusion of daytime images in the LISA dataset. Although this work primarily focused on nighttime scenarios, the strong performance of LNT-YOLO on LISA suggests a greater degree of generalization. This implies that the model’s architecture and training strategy are less susceptible to variations in lighting conditions, making it a more robust and reliable solution for real-world traffic light detection applications where diverse lighting conditions are inevitable.

4.4.2. Ablation Study

This section provides a detailed examination of the enhancements introduced in the LNT-YOLO model, specifically focusing on the integration of low-level feature addition, SEAM, and the HSM+EIoU loss functions. Through a series of ablation studies, we aim to evaluate the contribution of each component to the model’s performance, particularly in the context of nighttime traffic light detection. The results presented in Table 4 offer insights into the individual and combined effects of each modification, providing valuable information on their contributions to the overall effectiveness of the model. The table details the results of systematically adding a low-level feature (LLF), SEAM module, and incorporating an HSM-EIoU loss function in the model and assessing the corresponding changes in performance metrics to display the contributions of each technique on the overall effectiveness of the proposed approach.
The baseline model, with a mean average precision (mAP) of 0.642 at an IoU threshold of 0.5 and 0.335 across IoU thresholds from 0.5 to 0.95, serves as a reference point to assess the impact of each enhancement. By systematically integrating each component, we observed significant improvements in detection performance, underscoring the effectiveness of the proposed approach.
The introduction of low-level feature (LLF) addition results in a notable increase in mAP of 11% to 0.717 at an IoU of 0.5 and of 14% to 0.383 across IoU thresholds from 0.5 to 0.95. This improvement highlights the critical role of enhanced feature representation in capturing the subtle details necessary for the accurate detection of small objects, such as traffic lights, under challenging conditions. LLF addition effectively enriches the model's feature extraction capabilities with only a 2% increase in parameters and a 9% increase in computational load, providing a more detailed and robust representation of the input images.
Further enhancement is achieved by incorporating the SEAM module, which raises the mAP by 13% and 23% to 0.726 and 0.417, respectively. The addition of SEAM demonstrates the advantage of an efficient attention mechanism that improves the model's capability to focus on key features while increasing the parameter count by 7% and the GFLOPs by 14% relative to the baseline. The integration of this module illustrates its contribution to refining the feature map and improving overall detection accuracy.
Lastly, integrating the HSM-EIoU loss to form the complete LNT-YOLO model yields an mAP of 0.748 at IoU 0.5, a 21% increase over the baseline, and 0.461 across IoU thresholds from 0.5 to 0.95, a 26% increase. These results underscore the synergistic effects of combining low-level feature addition, SEAM, and the HSM-EIoU loss function. Collectively, these enhancements provide a substantial boost in performance while keeping the parameter count (a 7% increase) and computational load (a 23% increase) manageable compared to the baseline, validating the effectiveness of the proposed approach in advancing nighttime traffic light detection capabilities.

4.4.3. Discussions

The performance of the LNT-YOLO model across the different classes of traffic lights is detailed in Table 5. The evaluation metrics, including precision, recall, and mean average precision (mAP), offer clear insight into the proposed model's ability to detect the various traffic light categories and its overall effectiveness.
It can be seen from Table 5 that LNT-YOLO demonstrates a commendable balance between precision and recall, and also achieves high performance with regard to the mAP. The detection of red lights and green lights exhibits excellent performance, with an mAP@0.5 higher than 0.8 and an mAP@0.5:0.95 higher than 0.45, underscoring the effectiveness of the model in identifying these common traffic light colors. However, the yellow class presents distinct challenges, with low precision (0.0784) but high recall (0.719). This discrepancy indicates that while the model detects most yellow lights, it frequently generates false positives. The variability in lighting conditions, the smaller number of yellow light samples (as can be seen in Table 1), and the presence of streetlights with a similar yellowish hue all contribute to this problem, highlighting the complexities involved in detecting traffic lights under varying environmental conditions.
Directional signals, such as left, straight, and right, also display varied results. The model achieves excellent results for right-turning traffic lights, with high precision and a recall value greater than 0.9, as well as very high mAP values of 0.995 and 0.452. In contrast, left-turn lights show poor recall (0.37) and an mAP of less than 0.5, which points to a detection weakness. This may stem from their relative rarity in the dataset, leading to insufficient representation during training. Straight traffic lights exhibit a good balance between precision and recall, indicating a more consistent detection capability.
The TN-TLD dataset, while instrumental in training and evaluating the model, underscores the challenges associated with the limited availability of nighttime traffic light detection datasets. This scarcity affects the training process by limiting the diversity of samples available for model learning, particularly for less common classes such as yellow and left-turn lights. Consequently, the model’s ability to generalize across different lighting conditions and traffic scenarios is constrained, necessitating further research and dataset expansion to enhance detection performance.
To visually assess the improvement in detection performance of our enhanced model, sample images were selected from the test set of the TN-TLD dataset for comparison. Figure 5 displays the detection results for the different models listed in Table 2. Each row represents a different model, and each column shows a different scenario from the TN-TLD dataset.
The evaluation in all three scenarios highlights the strengths and challenges of each model in detecting traffic lights under varying conditions. In Scenario 1, which involves two green lights at an intersection, LNT-YOLO and YOLOv10s demonstrate their ability to detect both signals with a confidence value greater than 0.8, as supported by the high mAP of the green light class in Table 5. This indicates their effectiveness in handling multiple signals in close proximity. The other models perform less well: the YOLOv7-tiny model shows lower confidence, the KCS-YOLO model misclassifies one of the green traffic lights as a straight traffic light, and the DEYO-n model fails to detect one of the green traffic lights. In Scenario 2, where a red light and a green left arrow are present, LNT-YOLO again detects both signals with a confidence value greater than 0.6. This reinforces that although the left-arrow class has fewer training images, LNT-YOLO's design mitigates this to some extent. Despite this, the LNT-YOLO model still makes a minor classification error on one of the traffic lights. This scenario also reveals the common challenge of distinguishing traffic signals from other light sources, which was an issue for KCS-YOLO. The YOLOv7-tiny model detects both traffic lights correctly but suffers from a low confidence score compared to LNT-YOLO. Meanwhile, the DEYO-n model misses one of the traffic lights and the YOLOv10s model misses both. Scenario 3, which features a green light at a far distance, further underscores these difficulties. LNT-YOLO manages to detect the straight light with more than 0.5 confidence, despite its difficulty in detecting an even smaller traffic light object. This supports the mAP@0.5:0.95 values in Table 5, which show that LNT-YOLO maintains high performance across IoU thresholds. However, YOLOv10s, despite detecting the signal with high confidence, also misclassifies a road light, highlighting a consistent issue across models. Meanwhile, KCS-YOLO and DEYO-n cannot detect the green traffic light. In general, LNT-YOLO exhibits strong detection capabilities in all scenarios, particularly in capturing multiple signals, compared to the other SOTA models. Some SOTA models fail to detect traffic lights altogether or incorrectly identify other light sources, such as road lights, as traffic lights. This highlights the superior performance of LNT-YOLO in accurately detecting traffic lights in various scenarios.

5. Conclusions

In this paper, LNT-YOLO is introduced as a traffic light detection model specifically designed to recognize traffic lights during nighttime conditions. Although existing research focuses primarily on traffic light detection in daytime images, the proposed model addresses the challenges of small-object detection and low-light conditions prevalent in nighttime scenes. This work achieves this by enhancing the original YOLOv7-tiny model with low-level feature enhancement, a novel attention module, and the proposed HSM-EIoU loss function. Additionally, this work presents the TN-TLD dataset, a novel large-scale dataset with object-level annotations collected from natural nighttime driving scenes in Taiwan, providing rich and diverse data for training and evaluation purposes. Through extensive experiments conducted on the TN-TLD dataset, the effectiveness of the LNT-YOLO model is demonstrated. Comparisons with other SOTA models highlight the superior performance of the LNT-YOLO model in detecting traffic lights at night. Furthermore, ablation studies highlight the substantial impact of each proposed module and technique in boosting the effectiveness of the model. Both quantitative and qualitative evaluations demonstrate that our model outperforms current approaches. In general, our research presents a comprehensive solution for nighttime traffic light detection, addressing critical challenges in small-object detection and low-light conditions, thereby advancing the state of the art in this field. While LNT-YOLO demonstrates strong performance in nighttime traffic light detection, several avenues remain for future exploration. We aim to further improve detection performance by addressing the issue of class imbalance to improve the accuracy of less frequently occurring traffic light types. Additionally, incorporating image preprocessing techniques may help improve detection robustness under challenging nighttime conditions. Finally, we plan to explore the deployment of LNT-YOLO in real-time applications, evaluating its performance on real on-board hardware.

Author Contributions

Conceptualization, S.M. and H.-Y.L.; methodology, S.M. and H.-Y.L.; software, S.M.; validation, S.M.; formal analysis, S.M.; investigation, S.M.; resources, H.-Y.L.; data curation, S.M.; writing—original draft preparation, S.M.; writing—review and editing, H.-Y.L.; visualization, S.M.; supervision, H.-Y.L.; project administration, H.-Y.L.; funding acquisition, H.-Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original data presented in the study are openly available in LNT-YOLO at https://github.com/esmunir/LNT-YOLO (accessed on 11 April 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
TLD: Traffic Light Detection
C2I: Car-to-Infrastructure
CNN: Convolutional Neural Network
SOTA: State of the Art
SimAM: Simple Attention Module
ECA: Efficient Channel Attention Mechanism
CIoU: Complete Intersection over Union
EIoU: Efficient Intersection over Union
HSM: Hard Sample Mining
GFLOPs: Giga Floating-Point Operations Per Second

References

  1. Belaïd, F.; Amine, R.; Massie, C. Smart Cities Initiatives and Perspectives in the MENA Region and Saudi Arabia. In Smart Cities: Social and Environmental Challenges and Opportunities for Local Authorities; Belaïd, F., Arora, A., Eds.; Springer International Publishing: Cham, Switzerland, 2024; pp. 295–313. [Google Scholar] [CrossRef]
  2. Benevolo, C.; Dameri, R.P.; D’Auria, B. Smart Mobility in Smart City. In Proceedings of the Empowering Organizations; Torre, T., Braccini, A.M., Spinelli, R., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 13–28. [Google Scholar]
  3. Tomaszewska, E.J.; Florea, A. Urban smart mobility in the scientific literature—Bibliometric analysis. Eng. Manag. Prod. Serv. 2018, 10, 41–56. [Google Scholar] [CrossRef]
  4. Müller-Eie, D.; Kosmidis, I. Sustainable mobility in smart cities: A document study of mobility initiatives of mid-sized Nordic smart cities. Eur. Transp. Res. Rev. 2023, 15, 36. [Google Scholar] [CrossRef]
  5. Elassy, M.; Al-Hattab, M.; Takruri, M.; Badawi, S. Intelligent transportation systems for sustainable smart cities. Transp. Eng. 2024, 16, 100252. [Google Scholar] [CrossRef]
  6. Menouar, H.; Guvenc, I.; Akkaya, K.; Uluagac, A.S.; Kadri, A.; Tuncer, A. UAV-Enabled Intelligent Transportation Systems for the Smart City: Applications and Challenges. IEEE Commun. Mag. 2017, 55, 22–28. [Google Scholar] [CrossRef]
  7. Wolniak, R. Smart mobility in a smart city concept. Sci. Pap. Silesian Univ. Technol. Organ. Manag. Ser. 2023, 2023, 679–692. [Google Scholar] [CrossRef]
  8. Suganuma, N.; Yoneda, K. Current status and issues of traffic light recognition technology in Autonomous Driving System. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 2022, 105, 763–769. [Google Scholar] [CrossRef]
  9. Jensen, M.B.; Philipsen, M.P.; Møgelmose, A.; Moeslund, T.B.; Trivedi, M.M. Vision for looking at traffic lights: Issues, survey, and perspectives. IEEE Trans. Intell. Transp. Syst. 2016, 17, 1800–1815. [Google Scholar] [CrossRef]
  10. Xiang, N.; Cao, Z.; Wang, Y.; Jia, Q. A real-time vehicle traffic light detection algorithm based on modified YOLOv3. In Proceedings of the 2021 IEEE 4th International Conference on Electronics Technology (ICET), Chengdu, China, 7–10 May 2021; pp. 844–850. [Google Scholar]
  11. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
  12. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef]
  13. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  14. Wang, J.; Wu, J.; Wu, J.; Wang, J.; Wang, J. YOLOv7 Optimization Model Based on Attention Mechanism Applied in Dense Scenes. Appl. Sci. 2023, 13, 9173. [Google Scholar] [CrossRef]
  15. Alam, A.; Jaffery, Z.A. A Vision-Based System for Traffic Light Detection. In Applications of Artificial Intelligence Techniques in Engineering; Malik, H., Srivastava, S., Sood, Y.R., Ahmad, A., Eds.; Series Title: Advances in Intelligent Systems and Computing; Springer: Singapore, 2019; Volume 698, pp. 333–343. [Google Scholar] [CrossRef]
  16. Yu, G.; Lei, A.; Li, H.; Wang, Y.; Wang, Z.; Hu, C. A Real-Time Traffic Light Detection Algorithm Based on Adaptive Edge Information. In Proceedings of the Intelligent and Connected Vehicles Symposium, Kunshan, China, 14–15 August 2018; p. 2018-01-1620. [Google Scholar] [CrossRef]
  17. Ouyang, Z.; Niu, J.; Liu, Y.; Guizani, M. Deep CNN-based real-time traffic light detector for self-driving vehicles. IEEE Trans. Mob. Comput. 2019, 19, 300–313. [Google Scholar] [CrossRef]
  18. Masaki, S.; Hirakawa, T.; Yamashita, T.; Fujiyoshi, H. Distant traffic light recognition using semantic segmentation. Transp. Res. Rec. 2021, 2675, 97–103. [Google Scholar] [CrossRef]
  19. Ou, Y.; Sun, Y.; Yu, X.; Yun, L. Traffic signal light recognition based on transformer. In Proceedings of the International Conference on Computer Engineering and Networks, Haikou, China, 4–7 November 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 1354–1361. [Google Scholar]
  20. Yang, L.; He, Z.; Zhao, X.; Fang, S.; Yuan, J.; He, Y.; Li, S.; Liu, S. A Deep Learning Method for Traffic Light Status Recognition. J. Intell. Connect. Veh. 2023, 6, 173–182. [Google Scholar] [CrossRef]
  21. Greer, R.; Gopalkrishnan, A.; Landgren, J.; Rakla, L.; Gopalan, A.; Trivedi, M. Robust traffic light detection using salience-sensitive loss: Computational framework and evaluations. In Proceedings of the 2023 IEEE Intelligent Vehicles Symposium (IV), Anchorage, AK, USA, 4–7 June 2023; pp. 1–7. [Google Scholar]
  22. Wang, Q.; Zhang, Q.; Liang, X.; Wang, Y.; Zhou, C.; Mikulovich, V.I. Traffic lights detection and recognition method based on the improved YOLOv4 algorithm. Sensors 2021, 22, 200. [Google Scholar] [CrossRef] [PubMed]
  23. Li, Z.; Zhang, W.; Yang, X. An Enhanced Deep Learning Model for Obstacle and Traffic Light Detection Based on YOLOv5. Electronics 2023, 12, 2228. [Google Scholar] [CrossRef]
  24. Haltakov, V.; Mayr, J.; Unger, C.; Ilic, S. Semantic segmentation based traffic light detection at day and at night. In Proceedings of the Pattern Recognition: 37th German Conference, GCPR 2015, Aachen, Germany, 7–10 October 2015; Proceedings 37. Springer: Berlin/Heidelberg, Germany, 2015; pp. 446–457. [Google Scholar]
  25. Diaz-Cabrera, M.; Cerri, P.; Medici, P. Robust real-time traffic light detection and distance estimation using a single camera. Expert Syst. Appl. 2015, 42, 3911–3923. [Google Scholar] [CrossRef]
  26. Wang, J.G.; Zhou, L.B. Traffic light recognition with high dynamic range imaging and deep learning. IEEE Trans. Intell. Transp. Syst. 2018, 20, 1341–1352. [Google Scholar] [CrossRef]
  27. Zhou, Q.; Zhang, D.; Liu, H.; He, Y. KCS-YOLO: An Improved Algorithm for Traffic Light Detection under Low Visibility Conditions. Machines 2024, 12, 557. [Google Scholar] [CrossRef]
  28. Li, S.; Wang, S.; Wang, P. A Small Object Detection Algorithm for Traffic Signs Based on Improved YOLOv7. Sensors 2023, 23, 7145. [Google Scholar] [CrossRef]
  29. Li, K.; Wang, Y.; Hu, Z. Improved YOLOv7 for Small Object Detection Algorithm Based on Attention and Dynamic Convolution. Appl. Sci. 2023, 13, 9316. [Google Scholar] [CrossRef]
  30. Zhang, L.; Xiong, N.; Pan, X.; Yue, X.; Wu, P.; Guo, C. Improved Object Detection Method Utilizing YOLOv7-Tiny for Unmanned Aerial Vehicle Photographic Imagery. Algorithms 2023, 16, 520. [Google Scholar] [CrossRef]
  31. Hu, S.; Zhao, F.; Lu, H.; Deng, Y.; Du, J.; Shen, X. Improving YOLOv7-Tiny for Infrared and Visible Light Image Object Detection on Drones. Remote Sens. 2023, 15, 3214. [Google Scholar] [CrossRef]
  32. Tang, F.; Yang, F.; Tian, X. Long-Distance Person Detection Based on YOLOv7. Electronics 2023, 12, 1502. [Google Scholar] [CrossRef]
  33. Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; Meila, M., Zhang, T., Eds.; Volume 139, pp. 11863–11874. [Google Scholar]
  34. Zhang, J.; Wei, X.; Zhang, L.; Yu, L.; Chen, Y.; Tu, M. YOLO v7-ECA-PConv-NWD Detects Defective Insulators on Transmission Lines. Electronics 2023, 12, 3969. [Google Scholar] [CrossRef]
  35. Zhang, Y.F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and Efficient IOU Loss for Accurate Bounding Box Regression. arXiv 2022, arXiv:2101.08158. [Google Scholar] [CrossRef]
  36. Deepti Raj, G.; Prabadevi, B. MoL-YOLOv7: Streamlining Industrial Defect Detection With an Optimized YOLOv7 Approach. IEEE Access 2024, 12, 117090–117101. [Google Scholar] [CrossRef]
  37. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458. [Google Scholar]
  38. Ouyang, H. DEYO: DETR with YOLO for End-to-End Object Detection. arXiv 2024, arXiv:2402.16370. [Google Scholar]
Figure 1. Illustration of a challenging nighttime scene for traffic light detection.
Figure 2. Detailed view of the LNT-YOLO framework.
Figure 3. Proposed SEAM module.
Figure 4. Comparison between BCE loss (blue line), focal loss (red line), and the proposed HSM loss function (green line).
Figure 5. Comparison of detection results for different models across three scenarios.
Table 1. Class distribution of the TN-TLD dataset across training, validation, and testing.
PurposeImagesRedYellowGreenLeftStraightRight
Training9679668799439159026662164
Validation2447172519697925650515
Testing14,850632425636121001264953
Table 2. Comparison of the performance metrics of the proposed LNT-YOLO model with SOTA models on the TN-TLD dataset.
Model               mAP@0.5   mAP@0.5:0.95   Params   GFLOPS
YOLOv7-tiny [14]    0.642     0.335          6 M      13.1
YOLOv10s [37]       0.616     0.365          8 M      24.5
DEYO-N [38]         0.649     0.364          5.7 M    10.2
KCS-YOLO [27]       0.632     0.372          1.8 M    -
LNT-YOLO (Ours)     0.781     0.423          6.4 M    14.9
Bold values highlight the best results for each respective metric.
Table 3. Comparison of the performance metrics of the proposed LNT-YOLO model with SOTA models on the LISA dataset.
Model               mAP@0.5   mAP@0.5:0.95
YOLOv7-tiny [14]    0.261     0.126
YOLOv10s [37]       0.235     0.115
DEYO-N [38]         0.282     0.114
KCS-YOLO [27]       0.255     0.12
LNT-YOLO (Ours)     0.325     0.138
Bold values highlight the best results for each respective metric.
Table 4. Ablation studies results.
Model                   mAP@0.5   mAP@0.5:0.95   Params   GFLOPS
Baseline                0.642     0.335          6 M      13.1
Baseline + LLF          0.717     0.383          6.1 M    14.3
Baseline + LLF + SEAM   0.731     0.413          6.4 M    14.9
LNT-YOLO (Ours)         0.781     0.423          6.4 M    14.9
Bold values highlight the best results for each respective metric.
Table 5. Performance evaluation of the LNT-YOLO model across all traffic light classes.
Class      Precision   Recall   mAP@0.5   mAP@0.5:0.95
all        0.678       0.744    0.781     0.423
Red        0.755       0.868    0.884     0.459
Yellow     0.0784      0.719    0.674     0.482
Green      0.873       0.729    0.832     0.441
Left       0.77        0.37     0.478     0.234
Straight   0.667       0.782    0.821     0.469
Right      0.921       0.997    0.995     0.452
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
