1. Introduction
The transportation infrastructure in China has undergone significant expansion in recent years, particularly in highway and tunnel construction. Statistical data reveal a consistent increase in total highway and tunnel mileage, where tunnel construction plays a pivotal role in enhancing road network efficiency by improving technical specifications, reducing travel distances, and increasing transport capacity [
1]. However, this rapid expansion has been accompanied by an increased frequency of road tunnel fire incidents. Road tunnel fires present unique challenges compared to conventional fire scenarios, characterized by their rapid development and propagation within confined spaces [
2]. These incidents often lead to severe consequences, including potential explosions of transported goods and the generation of high-temperature toxic fumes. Such characteristics significantly increase the risks to both infrastructure and human safety. Consequently, the development of reliable fire detection systems for tunnels has become crucial for effective fire prevention and control. Advanced detection mechanisms not only enhance the safety of tunnel operations but also contribute to the overall reliability of highway transportation systems, making them an essential component of modern infrastructure management [
3].
The conventional approach to tunnel fire detection relies on sensors to identify smoke, temperature fluctuations, and gas compounds generated by fires. However, this method exhibits several limitations, including restricted detection area coverage, susceptibility to environmental factors that compromise detection speed and accuracy, and an inability to provide critical information regarding fire size, location, and progression [
4]. With the rapid advancements in machine vision and deep learning, target detection algorithms utilizing convolutional neural networks (CNNs) have emerged as a pivotal tool for identifying flames or smoke in tunnel surveillance imagery [
5]. These algorithms offer numerous advantages, such as enhanced intelligence, cost-effective deployment, simplified operation and maintenance, and ease of functional expansion. Moreover, they enable the precise localization of tunnel fires, facilitating more timely and accurate detection and early warning systems [
6].
Recent advancements in deep learning (DL)-based fire detection algorithms have demonstrated significant progress. Boroujeni et al. [
7] proposed a novel DL framework utilizing improved conditional generative adversarial networks (IC-GANs) to convert RGB images into infrared images, enabling the capture of temperature information from forest fires using unmanned aerial vehicles (UAVs) for wildland fire management. Cao et al. [
8] introduced an enhanced forest fire detection algorithm based on YOLOv5, incorporating a global attention mechanism, BiFPN, and a reparameterized convolution module, which improved detection accuracy by 3.8%. Similarly, Wang et al. [
9] developed a YOLO-based forest fire detection model by designing the LEIEM information extraction module and the DF dynamic fusion module for YOLOv8, achieving a 2.9% increase in detection accuracy.
Despite these advancements, the high computational demands of deep learning result in large model sizes and complexity, posing challenges for deployment on resource-constrained mobile devices. To address this, researchers have focused on developing lightweight algorithms. Xiao et al. [
10] proposed EMG-YOLO, a fire detection algorithm tailored for embedded devices, which integrates a multiscale attention module (MAM) and an efficient multiscale convolution module (EMCM). By pruning YOLOv8n using a thinning algorithm, the improved algorithm reduces the number of parameters by 53.5% and the computational load by 49.8%. Zhou et al. [
11] introduced a lightweight fire detection model for UAV-based wildfire monitoring, leveraging MobileNetV3 as the backbone network for YOLOv5 and employing semi-supervised knowledge distillation (SSLD) for training, resulting in a compact model size of 6.3 MB. Additionally, Wang et al. [
12] proposed an improved YOLOv5-based forest fire detection model, incorporating DSConv, C2f-Light, and C3CIB modules, which enhanced inference speed by 19.3%.
While the aforementioned methods have optimized fire detection algorithms to some extent, achieving a balance between detection accuracy and computational resource consumption remains challenging, hindering their practical application. Furthermore, research on fire detection algorithms tailored for highway tunnel scenarios is limited. The presence of lighting and moving vehicles in such environments often degrades the detection accuracy of existing algorithms. To address these issues, this paper enhances the YOLOv8s baseline model for fire detection in road tunnel scenarios and proposes a lightweight fire detection algorithm, FIRE-YOLOv8s. The primary contributions of this work are as follows:
- (1)
A novel feature extraction module, P-C2f, is designed to extract fire features from images. The ADown module is employed for downsampling, significantly reducing the model’s parameter count and computational requirements, making it suitable for deployment on robotic platforms. The feature fusion network is redesigned using the CCFF module to further lightweight the model and enable effective multi-scale feature fusion. The dynamic head detection head is introduced to enhance the model’s capability to detect multi-scale fire targets in the complex environments of highway tunnels, effectively reducing missed detections and false alarms.
- (2)
Ablation and comparison experiments were conducted on a self-constructed tunnel fire dataset, and generalization experiments were performed on a public dataset. The results demonstrate that, compared to the baseline model, FIRE-YOLOv8s reduces the number of parameters and the computational load by 52.2% and 47.9%, respectively, while decreasing the model size by 50%. Additionally, the mAP@0.5 improves by 1.7%, achieving higher detection accuracy alongside lightweighting and exhibiting superior generalization capabilities.
- (3)
To validate the effectiveness of the proposed algorithm, a comprehensive experimental evaluation was conducted. A fire simulation scenario was constructed in a tunnel laboratory, and the algorithm was deployed on a tunnel emergency firefighting robot platform. Experimental results demonstrate that the proposed algorithm achieves a detection frame rate of 28.13 FPS on the robot platform, exhibiting robust detection performance for multi-scale fire targets. These findings indicate that the algorithm meets the engineering application requirements for highway tunnel scenarios.
The remainder of this paper is organized as follows:
Section 2 presents a comprehensive review of related work.
Section 3 begins with an overview of the evolution of the YOLO family of algorithms, including the rationale for selecting YOLOv8s as the baseline model, followed by a detailed introduction to the proposed FIRE-YOLOv8s model.
Section 4 elaborates on the experimental design, case studies, and a discussion of the results. Finally,
Section 5 concludes the paper and outlines potential directions for future research.
3. Materials and Methods
3.1. Introduction of YOLOv8s
The core idea of the YOLO-series algorithms is to treat target detection as a single regression problem, directly mapping image pixels to bounding box coordinates and class probabilities. This approach results in a relatively simple model structure with fewer parameters, enabling high-speed target detection while maintaining high accuracy. As a result, YOLO algorithms have been widely adopted in real-time processing scenarios [
28]. Since the introduction of YOLOv1, the series has evolved through continuous improvements, culminating in YOLOv11.
YOLOv1, the pioneering model in the series, was the first to frame target detection as a single regression problem, enabling end-to-end training. YOLOv2 built upon YOLOv1 by incorporating batch normalization, multi-scale training, and anchor boxes, significantly enhancing model performance. YOLOv3 introduced a deeper network and multi-scale prediction to improve the detection of objects of varying sizes. YOLOv5 gained widespread adoption, particularly in industrial applications, due to its lightweight design. YOLOv7 further advanced the series with an updated network architecture, optimizing both inference speed and accuracy.
YOLOv8 (2023) represents a convergence and refinement of previous YOLO models. For the first time, Ultralytics designed it as a versatile algorithmic framework library, supporting not only the YOLO family but also non-YOLO models and a wide range of vision tasks, including classification, segmentation, and pose estimation, making it highly extensible. YOLOv10 (2024) and YOLOv11 (2024) further optimized the network structure based on YOLOv8, achieving higher detection accuracy while maintaining high-speed operation. However, these advancements come with increased model complexity.
YOLOv8 represents one of the latest advancements in the YOLO series of target detection algorithms, integrating the strengths of its predecessors while introducing new features and improvements. It offers a range of model sizes (n/s/m/l/x) based on scaling factors, catering to diverse application requirements. Given its scalability, ease of development and deployment, and consideration of the hardware limitations of emergency tunnel firefighting robots, YOLOv8s is selected as the baseline model. YOLOv8s strikes an optimal balance between model size and accuracy, making it well suited for real-time applications.
The architecture of YOLOv8s extends the network structure of YOLOv5 and comprises three main components: the backbone, neck, and head. The backbone and neck components incorporate the ELAN design concept from YOLOv7. The backbone is primarily responsible for feature extraction and introduces a novel C2f module (CSPLayer_2Conv), which leverages residual connections and a bottleneck structure to enhance feature extraction performance. The neck component focuses on multi-scale feature fusion, utilizing the PAN-FPN (path aggregation network with feature pyramid network) architecture to effectively aggregate and fuse features across different scales. The head component processes the feature maps and generates prediction results. It employs a decoupled head structure, which includes three detection heads. This design allows for greater flexibility in adapting to feature maps of varying sizes, thereby improving detection accuracy.
3.2. Improvement of YOLOv8s Model: FIRE-YOLOv8s
In this paper, the final optimized network model is named FIRE-YOLOv8s, and its architecture is illustrated in
Figure 1. FIRE-YOLOv8s retains the three primary components: backbone, neck, and head. The backbone incorporates the enhanced P-C2f module for feature extraction and introduces the ADown module for downsampling, significantly reducing the model’s parameter count and computational requirements. The neck employs a redesigned feature fusion network structure with additional layer connections, which not only decreases the computational load but also enhances the efficiency of multi-scale feature fusion. The head utilizes the dynamic head structure, integrating multiple attention mechanisms to improve the model’s detection performance in complex backgrounds.
3.3. P-C2f Module
The C2f module in YOLOv8s comprises a DarknetBottleneck module, a 1 × 1 convolutional kernel, and a residual connection. This module integrates high-level features and contextual information, enabling richer gradient flow but at the cost of increased computational complexity. To make the improved model more lightweight than the original, and thus suitable for deployment on small embedded devices such as robots, while leaving headroom for subsequent accuracy-oriented enhancements, the backbone feature extraction network must be optimized for lightweighting.
Existing studies often employ lightweight convolutions, such as depthwise convolution (DWConv) [
29] and group convolution (GConv) [
30], to replace standard convolutions for feature extraction. However, these methods can lead to increased memory access, thereby prolonging computation time. To address this issue, this paper proposes a novel feature extraction module, P-C2f, utilizing partial convolution (PConv) [
31]. The structure of P-C2f is illustrated in
Figure 2.
The P-C2f module consists of n PConv convolutions and two 1 × 1 convolution kernels. The feature map is initially processed by a 1 × 1 convolution kernel to reduce the number of input channels. Feature information is then extracted using multiple PConvs. Residual connections directly add the inputs to the outputs, forming a cross-layer connection that enables a more accurate representation of deeper features. Finally, the number of channels in the feature map is restored by another 1 × 1 convolution kernel. Unlike conventional convolution, PConv exploits the redundancy in the feature map and applies conventional convolution to only a portion of the input channels, leaving the remaining channels unchanged. This significantly reduces computational redundancy and memory accesses; its working principle is shown in
Figure 3. For a given feature map input $I \in \mathbb{R}^{c \times h \times w}$, the computational cost (FLOPs) and memory access (MAC) of a single conventional convolution are calculated as shown in Equations (1) and (2):

$$\mathrm{FLOPs} = h \times w \times k^{2} \times c^{2} \tag{1}$$

$$\mathrm{MAC} = h \times w \times 2c + k^{2} \times c^{2} \approx h \times w \times 2c \tag{2}$$

where h and w are the height and width of the feature map, k is the kernel size, and c is the number of channels. PConv applies the convolution to only $c_p$ of the c input channels, so its cost scales with $c_p$ rather than c. When the channel ratio of partial convolution to conventional convolution satisfies $c_p / c = 1/4$, the computational cost of PConv is reduced to 1/16 of that of conventional convolution, while the memory access requirement is only 1/4 of that of conventional convolution.
In this study, we propose a novel feature extraction module, termed P-C2f, which leverages PConv to enhance computational efficiency. The proposed module significantly reduces both the computational complexity and the number of model parameters. However, since PConv performs feature extraction only on a subset of input channels, it may introduce a certain degree of degradation in detection accuracy.
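To make the partial-convolution idea concrete, the following is a minimal PyTorch sketch of a P-C2f-style block, assuming a PConv split ratio of 1/4 and illustrative layer sizes; it mirrors the structure described above (1 × 1 reduction, n PConv blocks with residual connections, 1 × 1 restoration) rather than reproducing the exact implementation.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: apply a 3x3 conv to only the first c_p channels,
    leaving the remaining channels untouched (illustrative sketch)."""
    def __init__(self, channels: int, ratio: float = 0.25):
        super().__init__()
        self.cp = max(1, int(channels * ratio))       # channels actually convolved
        self.conv = nn.Conv2d(self.cp, self.cp, kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x[:, :self.cp], x[:, self.cp:]       # split along the channel dimension
        return torch.cat((self.conv(x1), x2), dim=1)  # untouched channels pass through

class PC2f(nn.Module):
    """P-C2f-style block: 1x1 reduce -> n PConv blocks with residual adds -> 1x1 restore."""
    def __init__(self, c_in: int, c_out: int, n: int = 2, hidden_ratio: float = 0.5):
        super().__init__()
        c_hidden = int(c_out * hidden_ratio)
        self.reduce = nn.Conv2d(c_in, c_hidden, kernel_size=1, bias=False)
        self.blocks = nn.ModuleList(PConv(c_hidden) for _ in range(n))
        self.restore = nn.Conv2d(c_hidden, c_out, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.reduce(x)
        for blk in self.blocks:
            y = y + blk(y)                            # cross-layer residual connection
        return self.restore(y)

if __name__ == "__main__":
    feat = torch.randn(1, 64, 80, 80)
    print(PC2f(64, 64)(feat).shape)                   # torch.Size([1, 64, 80, 80])
```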
3.4. ADown Module
In the YOLOv8s feature extraction network, the downsampling operation is primarily performed using a conventional convolution module, which consists of a 3 × 3 convolution kernel, batch normalization, and a SiLU activation function. This module reduces the spatial dimensions of the feature maps. However, due to the information bottleneck phenomenon, the input feature map may lose a significant amount of information during the layer-by-layer feature extraction and spatial transformation processes. This information loss leads to the diminished salience or dilution of features corresponding to small flame targets during downsampling, making it challenging for the model to effectively recognize these targets in subsequent layers. Addressing the information loss problem is particularly critical for lightweight algorithms, which are typically under-parameterized and prone to substantial information loss during the feed-forward process.
To mitigate these issues, this study introduces the ADown module as a replacement for the conventional convolution module in downsampling. The ADown module, proposed by Wang et al. [
32], is a lightweight convolutional module designed to enhance inter-layer information interactions through a branching architecture. It effectively aggregates network information and reduces information loss during propagation. As illustrated in
Figure 4, the input feature map first undergoes average pooling (AvgPool) to capture global information and reduce the feature map size. Subsequently, the feature map is split into two sub-feature maps along the channel dimension. One sub-feature map is processed using a 3 × 3 convolutional kernel, while the other undergoes maximum pooling (MaxPool2d) followed by a 1 × 1 convolutional kernel. Finally, the outputs of the two branches are concatenated to produce the final output. This hybrid pooling strategy is designed to extract important information, discard less relevant details, reduce computational costs, and retain critical features. Notably, average pooling is more sensitive to global information, whereas maximum pooling excels at capturing finer details.
In this study, the ADown module is integrated into the downsampling operation of YOLOv8s. This integration reduces the model’s parameter count and computational load while maintaining accuracy by effectively decreasing the data size of the input feature maps. Furthermore, in deep networks, the ADown downsampling mechanism facilitates the abstraction of high-level features from low-level details, thereby preserving as much image information as possible and mitigating significant information loss.
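The following is a minimal PyTorch sketch of an ADown-style downsampling block based on the description above (average pooling, channel split, a stride-2 3 × 3 convolution branch, and a max-pooling plus 1 × 1 convolution branch); the specific kernel sizes, strides, and channel splits are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_silu(c_in, c_out, k, s, p):
    """Conv + BatchNorm + SiLU, the standard YOLO convolution block."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, p, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )

class ADown(nn.Module):
    """ADown-style downsampling: avg-pool, split channels, then a stride-2 3x3 conv
    branch and a max-pool + 1x1 conv branch, concatenated (illustrative sketch)."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        half_out = c_out // 2
        self.branch_conv = conv_bn_silu(c_in // 2, half_out, k=3, s=2, p=1)
        self.branch_pool = conv_bn_silu(c_in // 2, half_out, k=1, s=1, p=0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.avg_pool2d(x, kernel_size=2, stride=1, padding=0)  # gather global context
        x1, x2 = x.chunk(2, dim=1)                               # split along channels
        x1 = self.branch_conv(x1)                                # 3x3 conv, stride 2
        x2 = F.max_pool2d(x2, kernel_size=3, stride=2, padding=1)
        x2 = self.branch_pool(x2)                                # 1x1 conv after max-pool
        return torch.cat((x1, x2), dim=1)

if __name__ == "__main__":
    print(ADown(128, 256)(torch.randn(1, 128, 80, 80)).shape)  # torch.Size([1, 256, 40, 40])
```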
3.5. Enhancement of the Feature Fusion Network Using the CCFF Module
The neck component of YOLOv8s plays a critical role in the model, as it processes and fuses feature maps from the backbone at multiple scales to capture feature information across different resolutions. The neck adopts the PAN-FPN architecture [
33], which primarily consists of multiple C2f modules, regular convolutional modules, and top-down and bottom-up sampling and connection operations. However, this architecture tends to lose some important feature information obtained from the backbone during the feature fusion process, resulting in suboptimal fusion quality. To address these limitations and further reduce the model’s complexity, this study redesigns the feature fusion network by incorporating the lightweight cross-scale feature fusion (CCFF) module.
The CCFF module, proposed by Zhao et al. [
34], is a lightweight cross-scale feature fusion module based on convolutional neural networks (CNNs). As illustrated in
Figure 5, the CCFF module consists of up-sampling (U), down-sampling (D), and a fusion block. The fusion block employs a 1 × 1 convolutional kernel and RepConv [
35] with residual connections to fuse features from two adjacent scales into a new feature representation. The key advantage of CCFF lies in its use of lightweight operations for cross-scale feature fusion. Specifically, it utilizes a 1 × 1 convolutional kernel to reduce the resolution of feature maps, rearranges pixels to adjust the size and structure of the feature maps, and retains essential details without directly processing high-resolution features. This approach significantly reduces computational requirements. Additionally, the fusion block, integrated into the fusion path, leverages residual connections and convolutional operations to enhance the model’s feature representation capability, particularly for small target detection. By effectively fusing multi-level feature information and minimizing the interference of redundant features, the CCFF module improves both the efficiency and the accuracy of the feature fusion process.
In this study, the neck feature fusion network is redesigned by incorporating the CCFF module, splitting the fusion block, and introducing the P-C2f and ADown modules. This reconstruction significantly reduces the computational cost and parameter count of the model while maintaining its performance. The structure of the redesigned network is illustrated in
Figure 6. Specifically, feature maps P3, P4, and P5 at different scales undergo resolution reduction via a 1 × 1 convolutional kernel. Subsequently, through a series of up-sampling, down-sampling, and convolutional layers, the feature representation capability is enhanced, enabling effective horizontal fusion of multi-scale feature maps. Finally, the processed feature maps (T1, T2, T3) are forwarded to the detection head for prediction. By reconfiguring the feature fusion network, the inter-layer connections are strengthened, and the efficiency of feature fusion is improved. This design leverages the spatial information from high-resolution feature maps and the rich semantic information from low-resolution feature maps, thereby enhancing the model’s ability to detect small targets. Furthermore, the redesigned network is more lightweight, with reduced computational and parametric complexity compared to the original architecture.
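As an illustration of the fusion block described above, the sketch below fuses two adjacent-scale feature maps (already projected to the same channel count and resolution) with 1 × 1 projections, simplified RepConv blocks, and a residual connection; it is a hedged approximation of the CCFF fusion block, not the exact CCFF implementation.

```python
import torch
import torch.nn as nn

class RepConvBlock(nn.Module):
    """Simplified RepConv: parallel 3x3 and 1x1 branches summed at training time
    (re-parameterizable into a single 3x3 conv for deployment)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.conv1 = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.SiLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv3(x) + self.conv1(x)))

class FusionBlock(nn.Module):
    """CCFF-style fusion of two adjacent-scale feature maps resized to the same
    resolution: concat -> two 1x1 projections -> RepConv path -> residual add."""
    def __init__(self, channels: int, num_rep: int = 2):
        super().__init__()
        self.proj_a = nn.Conv2d(2 * channels, channels, 1, bias=False)  # main path
        self.proj_b = nn.Conv2d(2 * channels, channels, 1, bias=False)  # shortcut path
        self.rep = nn.Sequential(*[RepConvBlock(channels) for _ in range(num_rep)])

    def forward(self, hi_res: torch.Tensor, lo_res_upsampled: torch.Tensor) -> torch.Tensor:
        x = torch.cat((hi_res, lo_res_upsampled), dim=1)    # cross-scale concatenation
        return self.rep(self.proj_a(x)) + self.proj_b(x)    # fused features + residual

if __name__ == "__main__":
    p4 = torch.randn(1, 256, 40, 40)
    p5_up = torch.randn(1, 256, 40, 40)       # e.g., P5 after 1x1 conv and 2x upsampling
    print(FusionBlock(256)(p4, p5_up).shape)  # torch.Size([1, 256, 40, 40])
```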
3.6. Dynamic Head
The head component of YOLOv8s is responsible for processing feature maps derived from the three layers of the neck network to generate the model’s outputs. It comprises a 3 × 3 convolution, a 1 × 1 convolution kernel, and a loss function. However, in tunnel fire scenarios, multiple fire targets often appear simultaneously at varying scales, and factors such as tunnel lighting, dust, and vehicle taillights can introduce significant interference in fire detection. Additionally, the shape, rotation angle, and position of fire targets in captured images can vary considerably when an emergency tunnel firefighting robot is in motion. To address these challenges and enhance the accuracy of fire target localization, this study proposes the integration of the dynamic head (DyHead) [
36] as a replacement for the original detector head in YOLOv8s, with further optimization to adapt it for fire detection tasks. The structure of DyHead is illustrated in
Figure 7. DyHead coherently integrates three attention mechanisms and deploys them across the three dimensions of the input feature map
I (i.e., scale, spatial, and channel). This design enables more effective fusion of contextual information and improved recognition of multi-scale targets. The mathematical formulation of DyHead can be expressed as follows:

$$W(I) = \pi_C\big(\pi_S\big(\pi_L(I) \cdot I\big) \cdot I\big) \cdot I \tag{3}$$

where $\pi_L(\cdot)$, $\pi_S(\cdot)$, and $\pi_C(\cdot)$ represent three distinct attention functions applied to the scale (level), spatial, and channel dimensions, respectively. Specifically, the scale-aware attention module comprises global pooling, a 1 × 1 convolution, a ReLU activation function, and a hard sigmoid activation function. The spatially aware attention module primarily incorporates deformable convolution V2, which includes offset learning and feature amplitude modulation. The task-aware attention module consists of a fully connected layer, a ReLU activation function, and a normalization operation. When a feature map passes through the scale-aware attention module, it becomes more sensitive to scale variations in the target. Subsequently, after processing by the spatially aware attention module, the feature map becomes sparser and focuses on discriminative spatial locations of the target. Finally, the task-aware attention module refines the feature map with task-specific activations, enabling it to adapt to different downstream tasks.
In this study, we propose the integration of the DyHead detection module as a replacement for the conventional detection head in tunnel fire detection systems. The DyHead module enhances the model’s capability to process multi-scale target features while simultaneously improving task-specific and spatial awareness. This enhancement significantly boosts the model’s performance in fire detection within the complex environments of road tunnels, leading to improved accuracy and robustness. Nevertheless, the implementation of multiple attention mechanisms within DyHead inevitably increases the computational overhead of the model to some extent.
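A simplified single-level PyTorch sketch of the nested scale/spatial/channel attention described above is shown below; the real DyHead operates on stacked pyramid levels and uses deformable convolution v2 for the spatial term, which is replaced here by a plain 3 × 3 convolution for brevity, and all layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class SimplifiedDyHeadBlock(nn.Module):
    """Nested attention pi_C(pi_S(pi_L(I) * I) * I) * I applied to a single level.
    The spatial term is a plain 3x3 conv here; the original uses deformable conv v2."""
    def __init__(self, channels: int):
        super().__init__()
        # Scale-aware attention: global pool -> 1x1 conv -> ReLU -> hard sigmoid
        self.scale_attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Hardsigmoid(),
        )
        # Spatial-aware attention (placeholder for DCNv2 with offsets and modulation)
        self.spatial = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Task-aware attention: per-channel re-weighting from globally pooled features
        self.task_attn = nn.Sequential(
            nn.Linear(channels, channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 4, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.scale_attn(x) * x              # scale-aware weighting
        x = self.spatial(x)                     # spatial term (simplified)
        w = self.task_attn(x.mean(dim=(2, 3)))  # task-aware channel weights
        return w[:, :, None, None] * x          # channel-wise re-weighting

if __name__ == "__main__":
    print(SimplifiedDyHeadBlock(256)(torch.randn(1, 256, 20, 20)).shape)
```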
4. Experiments and Analysis
4.1. Dataset
Due to the absence of publicly available datasets for tunnel fire detection, this study utilizes a proprietary dataset constructed by China Merchants Chongqing Transportation Research and Design Institute Co., Ltd., located in Chongqing, China. The dataset comprises multiple sources, including: (1) online images of actual tunnel fire incidents, (2) video frames extracted from tunnel fire drills conducted in various cities across China, and (3) experimental fire scenarios captured in the State Key Tunnel Laboratory and the Huayan Tunnel of Chongqing Municipality. To enhance the dataset’s diversity and robustness, additional highway fire images were incorporated, and artificial variations were introduced through brightness adjustments and blurring noise to simulate challenging illumination conditions in tunnel environments.
The final dataset consists of approximately 6000 images, exclusively containing fire-related instances. These images were systematically partitioned into training (70%), validation (10%), and test sets (20%) to ensure proper model evaluation. The dataset encompasses diverse environmental conditions, including varying fire scales (large and small) and complex scenarios with vehicle occlusions. For precise annotation, fire regions in all images were manually labeled using Labelme3.16.2 software, with annotations initially stored in XML format and subsequently converted to TXT format for compatibility with our detection framework. Representative samples from the dataset are illustrated in
Figure 8.
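For reference, the sketch below shows one way the XML-to-TXT conversion mentioned above could be performed, assuming Pascal-VOC-style XML annotations and a single fire class (index 0); the tag names and file paths are illustrative and may differ from the actual annotation files.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def voc_xml_to_yolo_txt(xml_path: str, txt_path: str, class_id: int = 0) -> None:
    """Convert one Pascal-VOC-style XML annotation to a YOLO TXT label file:
    each line is 'class x_center y_center width height', normalized to [0, 1]."""
    root = ET.parse(xml_path).getroot()
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)

    lines = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
        xc = (xmin + xmax) / 2.0 / img_w          # normalized box center x
        yc = (ymin + ymax) / 2.0 / img_h          # normalized box center y
        w = (xmax - xmin) / img_w                 # normalized box width
        h = (ymax - ymin) / img_h                 # normalized box height
        lines.append(f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")

    Path(txt_path).write_text("\n".join(lines) + "\n")

if __name__ == "__main__":
    # Hypothetical paths for illustration only.
    voc_xml_to_yolo_txt("annotations/fire_0001.xml", "labels/fire_0001.txt")
```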
4.2. Experimental Environment
To validate the effectiveness of the proposed algorithm, we constructed a dedicated experimental platform. The experimental setup comprises a Windows 10 Professional 64-bit operating system, with detailed hardware specifications and software configurations systematically documented in
Table 1. For model training and algorithm implementation, we employed the PyTorch 2.2.2 deep learning framework, which provides comprehensive support for neural network development and optimization.
To maintain consistency across experiments, we adopted identical hyperparameter configurations for all training processes. The specific settings were as follows: the model was trained for 253 epochs with a batch size of 128, utilizing the Stochastic Gradient Descent (SGD) optimization algorithm. The initial learning rate was set to 0.01, accompanied by a weight decay coefficient of 0.0005 to prevent overfitting. All input images were uniformly resized to 640 × 640 pixels to ensure dimensional consistency. To guarantee the fairness and reliability of experimental comparisons, we intentionally avoided using any pre-trained weights from official sources, ensuring that all networks were trained from scratch under identical initialization conditions.
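For illustration, these hyperparameters map onto the Ultralytics training API roughly as follows; the model and dataset YAML paths are placeholders, not the project's actual configuration files.

```python
from ultralytics import YOLO

# Hypothetical paths; the actual model/dataset configuration files are project-specific.
model = YOLO("fire-yolov8s.yaml")   # custom model definition, trained from scratch

model.train(
    data="tunnel_fire.yaml",        # dataset description (train/val/test splits, class names)
    epochs=253,
    batch=128,
    imgsz=640,
    optimizer="SGD",
    lr0=0.01,                       # initial learning rate
    weight_decay=0.0005,
    pretrained=False,               # no official pre-trained weights, per the setup above
)
```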
4.3. Model Evaluation Indicators
The model’s performance was comprehensively evaluated using multiple quantitative metrics, including precision (P), recall (R), F1 score, mean average precision at 0.5 IoU threshold (mAP@0.5), model size (weight size), computational complexity measured in floating point operations (FLOPs), parameter count (Params), and inference speed measured in frames per second (FPS). These metrics were calculated according to the following formulations:

$$P = \frac{TP}{TP + FP} \tag{4}$$

$$R = \frac{TP}{TP + FN} \tag{5}$$

$$F1 = \frac{2 \times P \times R}{P + R} \tag{6}$$

$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i \tag{7}$$

where true positive (TP) represents the number of correctly identified positive samples, false positive (FP) indicates the number of negative samples incorrectly classified as positive, and false negative (FN) corresponds to the number of undetected positive samples. AP_i is the average precision of category i, obtained as the area under its precision–recall curve. Additionally, N denotes the total number of distinct categories within the dataset.
Precision (P) quantifies the proportion of correctly identified positive instances among all predicted positive samples, reflecting the model’s capability to accurately distinguish fire occurrences from non-fire scenarios. Recall (R) measures the ratio of correctly detected positive samples to the total number of actual positive instances, indicating the model’s effectiveness in identifying fire incidents. The F1 score, serving as the harmonic mean of precision and recall, provides a balanced evaluation metric, where higher values indicate more robust model performance. The mean average precision at 0.5 IoU threshold (mAP@0.5) integrates precision and recall across all N categories in the dataset, offering a comprehensive assessment of the model’s detection capability. Model size (weight size) reflects the memory footprint of the model, with smaller values indicating more efficient memory utilization. Computational complexity, measured in floating point operations (FLOPs), represents the model’s computational requirements, where lower FLOP values correspond to reduced computational overhead. The parameter count (Params) indicates the model’s architectural complexity, with fewer parameters generally suggesting a more lightweight architecture. Finally, inference speed, measured in frames per second (FPS), characterizes the model’s real-time processing capability, where higher FPS values denote faster detection performance.
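As a small worked example of the precision, recall, and F1 formulations above, the sketch below computes P, R, and F1 from raw detection counts; mAP@0.5 additionally requires ranking detections by confidence and integrating the precision–recall curve, which is handled by the evaluation framework.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from detection counts."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1

if __name__ == "__main__":
    # Illustrative counts: 90 correct detections, 10 false alarms, 30 missed fires.
    p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
    print(f"P={p:.3f}  R={r:.3f}  F1={f1:.3f}")  # P=0.900  R=0.750  F1=0.818
```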
4.4. Ablation Experiment
To systematically validate the effectiveness of our proposed improvement strategy, we conducted comprehensive ablation studies to evaluate the individual and combined contributions of each enhancement to the YOLOv8s model’s performance. The experimental design incorporates four key modifications: (1) implementation of a novel feature extraction module (P-C2f) for enhanced feature representation, (2) adoption of the ADown module for optimized downsampling operations, (3) integration of the CCFF module to refine the neck architecture, and (4) utilization of the dynamic head detector for final prediction processing and output generation. The experimental outcomes, presented in
Table 2, employ the symbol “✓” to denote the incorporation of each respective improvement method.
As evidenced by the experimental results in
Table 2, compared to the original YOLOv8s, the introduction of the P-C2f module reduced mAP@0.5 by 0.5%, but it also decreased the number of parameters by 32.4% and the computational load by 30.7%. This indicates that P-C2f significantly reduces model complexity while achieving lightweighting. However, since PConv extracts features from only a subset of input channels, it results in a slight degradation in accuracy and average precision. After incorporating the ADown module for downsampling, mAP@0.5 improved by 1%. This suggests that ADown mitigates information loss to some extent, and its lightweight design further reduces the model’s parameters and computational requirements. The use of the CCFF module to enhance the neck structure improved accuracy by 0.2% and recall by 0.3%. This indicates that the improved neck network enhances the model’s ability to detect small targets while further reducing parameters and computational load, providing room for subsequent precision improvements. The introduction of the dynamic head led to a 1.6% increase in mAP@0.5, a 2% improvement in accuracy, and a 1.7% boost in recall, with only a minor increase in parameters and computational cost. This demonstrates that dynamic head enhances the model’s ability to detect multi-scale fire targets in complex highway tunnel environments through the deployment of multiple attention mechanisms.
By incrementally adding the proposed improvement strategies, the experiments demonstrated that each enhancement contributed to the model’s performance. Specifically, P-C2f, ADown, and CCFF effectively reduced the model’s parameters and computational complexity, while the dynamic head significantly improved accuracy, recall, and average precision. Compared to YOLOv8s, FIRE-YOLOv8s reduced Params and FLOPs by 52.2% and 47.9%, respectively, while improving mAP@0.5 by 1.7%. These results indicate that the four improvement strategies successfully achieve lightweighting without compromising detection accuracy, making the model more suitable for resource-constrained tunnel scenarios.
To visualize the impact of the improvement strategies on model accuracy,
Figure 9 presents the mAP@0.5 curves for the ablation experiments involving a series of models. As illustrated in the figure, the final improved model achieves a higher mAP@0.5 compared to the other models after approximately 150 epochs.
4.5. Comparison Experiment
To further evaluate the performance of FIRE-YOLOv8s, comparative experiments were conducted under identical experimental conditions with other mainstream models. The models included in the experiments were the two-stage model Faster R-CNN, the single-stage model SSD, lightweight YOLO series models (YOLOv5s, YOLOv6s, YOLOv7-tiny, YOLOv8s, YOLOv10s, and YOLOv11s), the proposed FIRE-YOLOv8s, and tunnel fire detection algorithms from the literature [
24,
25]. The experimental results are summarized in
Table 3.
As shown in
Table 3, FIRE-YOLOv8s achieved an accuracy of 78.3% and an mAP@0.5 of 64.5%, significantly outperforming Faster R-CNN and SSD, as well as other lightweight YOLO series models. These results indicate that FIRE-YOLOv8s is highly suitable for accurate fire detection tasks in tunnel environments. To visualize its performance,
Figure 10 compares the mAP@0.5 curves of FIRE-YOLOv8s with other models, demonstrating that FIRE-YOLOv8s consistently achieves higher average accuracy.
In terms of model lightweighting, computational complexity, and size, FIRE-YOLOv8s exhibited superior performance. It had the smallest memory usage (10.3 MB), the fewest parameters (5.3 M), and a computational load of 14.9 GFLOPs, which was only slightly higher than that of YOLOv7-tiny. However, FIRE-YOLOv8s outperformed YOLOv7-tiny in both average precision and accuracy. These attributes enable FIRE-YOLOv8s to significantly reduce computational complexity and model size, making it highly suitable for deployment on robotic platforms.
FIRE-YOLOv8s achieved a detection speed of 120.7 FPS, meeting real-time detection requirements and surpassing most models in speed, with only SSD being faster. However, SSD’s average accuracy was considerably lower than that of FIRE-YOLOv8s, highlighting the latter’s superior balance of speed and precision.
When compared to other tunnel fire detection algorithms, FIRE-YOLOv8s demonstrates clear advantages. For instance, the method proposed in [
25] enhances YOLOv5’s accuracy by incorporating the CBAM attention mechanism and RFB module without increasing computational load. However, it employs a multi-model predictive inference approach, resulting in a larger model size, slower detection speed, and lower accuracy compared to FIRE-YOLOv8s. Similarly, the approach in [
24] uses K-means to recalculate anchor box sizes, integrates the CBAM module and gradient equalization mechanism into YOLOv5s, and combines it with the SRGAN model. While this improves small target detection accuracy, it also increases computational load and degrades real-time performance.
Comparative experimental results indicate that FIRE-YOLOv8s offers significant advantages in detection accuracy and lightweighting compared to both mainstream lightweight target detection algorithms and state-of-the-art tunnel fire detection algorithms. Its ability to balance accuracy, computational efficiency, and real-time performance makes it a highly effective solution for fire detection in tunnel environments.
4.6. Visualization and Analysis of Tunnel Fire Detection
To evaluate the detection performance of FIRE-YOLOv8s more comprehensively, a subset of images from the dataset was selected for comparative analysis with several mainstream models. These images included scenarios with regular fire targets, small targets, and multiple targets.
The detection results for a single regular-sized fire target are shown in
Figure 11. Most models accurately detected the fire, with only YOLOv7-tiny and Faster R-CNN producing false detections. Among the models, FIRE-YOLOv8s and YOLOv5s demonstrated higher overall detection accuracy.
The detection results for a small fire target are shown in
Figure 12. YOLOv8s and SSD exhibited significant missed detections, while Faster R-CNN, YOLOv6s, and YOLOv7-tiny produced varying degrees of misdetections, mistakenly identifying vehicle taillights as fires. Faster R-CNN performed the worst, with overlapping bounding boxes. In contrast, FIRE-YOLOv8s, YOLOv5s, and YOLOv11s performed well, with no missed detections or false alarms. Notably, FIRE-YOLOv8s predicted bounding boxes more accurately than YOLOv5s and YOLOv11s, achieving approximately 70% precision.
The detection results for multiple fire targets are shown in
Figure 13. FIRE-YOLOv8s effectively identified all targets with high accuracy. YOLOv8s and SSD exhibited varying degrees of under-detection, YOLOv5s and YOLOv11s produced overlapping bounding boxes, and Faster R-CNN and YOLOv6s generated some misdetections. Although YOLOv7-tiny detected all targets, its accuracy was low. Overall, FIRE-YOLOv8s outperformed the other models, producing more accurate bounding boxes in multi-target scenarios.
The combined experimental results across different scenarios demonstrate that FIRE-YOLOv8s achieves higher detection accuracy, lower rates of missed and false detections, and stronger multi-scale target detection capabilities compared to other models. In conclusion, the visualization experiments validate the effectiveness of the proposed improvements, significantly enhancing the model’s ability to detect tunnel fire targets.
4.7. Generalization Experiments
To further evaluate the performance of FIRE-YOLOv8s, the publicly available dataset M4SFWD [
37] was selected for comparison with the aforementioned models. M4SFWD utilizes Unreal Engine 5 to simulate real forest fire scenarios with aerial views, covering eight different terrains, four weather conditions (sunny, foggy, rainy, snowy), and three times of day (morning, evening, night). Each scene is configured with three scenarios: no target, single target, and multiple targets, aiming to create a comprehensive forest fire dataset. The dataset includes 3974 images, divided into training, validation, and test sets in a 7:1:2 ratio. The experimental parameters were consistent with those described in
Section 4.2.
The experimental results, as shown in
Table 4, demonstrate that FIRE-YOLOv8s achieved an accuracy of 89.7% and an mAP@0.5 of 75.2% on the M4SFWD dataset. This represents a 3.3% improvement over the baseline model YOLOv8s and significantly outperformed other models. The number of parameters, computational load, and model size remained largely unchanged, while the detection speed was only slightly lower than that of SSD. These results indicate that FIRE-YOLOv8s maintains strong generalization capabilities and can be effectively applied to fire detection in diverse scenarios beyond tunnels.
4.8. Algorithm Validation and Application
4.8.1. Introduction to Robotics Platform
The overall structure and control system of the robot are illustrated in
Figure 14. The hardware control scheme consists of three main components: an upper computer, an industrial computer, and a lower-level controller.
The lower-level controller is based on the STM32F407VGT6 microcontroller and is responsible for managing the electrical drive control module (e.g., servo motors, LEDs, fire extinguishing equipment) and the data acquisition module (e.g., RFID module, motor encoder, temperature and humidity sensors). The industrial computer communicates with the lower-level controller via an RS-485 bus, enabling manual control of motor operations and real-time acquisition of motor speed and position data. Additionally, the industrial computer interfaces with a dual-spectrum camera and an edge AI device. The edge AI device processes video image data in real time by deploying the fire detection algorithm, enabling efficient and accurate fire detection. The upper computer communicates with the industrial computer wirelessly and provides a user interface (UI) for real-time monitoring of tunnel video images and robot position coordinates.
This integrated control system ensures efficient operation, real-time data processing, and effective fire detection capabilities, making the robot well suited for tunnel emergency scenarios.
4.8.2. Algorithm Deployment Experiment
To validate the computational efficiency and practical applicability of the proposed algorithm on resource-constrained embedded systems, we implemented the algorithm on a rail-mounted tunnel firefighting robot platform. This robotic system, operating along the tunnel ceiling, performs real-time fire monitoring through its integrated vision system. The experimental platform utilizes an NVIDIA Jetson Xavier NX module, offering a computational capacity of 21 TOPS, with the following technical specifications:
Hardware configuration:
Software environment:
This experimental setup was specifically designed to evaluate the algorithm’s performance under realistic operational constraints, particularly in terms of computational efficiency and real-time processing capabilities on embedded hardware platforms.
The implementation process begins with the deployment of FIRE-YOLOv8s on the NVIDIA Jetson Xavier NX embedded platform. The trained model files are converted into TensorRT-compatible format using the tensorrtx tool, enabling inference acceleration through TensorRT optimization. Additionally, model quantization with FP16 precision is implemented to enhance computational efficiency. This configuration allows real-time processing of video streams captured by the robot’s integrated camera system for fire detection.
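For readers reproducing the FP16 TensorRT conversion, the sketch below shows an alternative route via the Ultralytics export API (the deployment above uses the tensorrtx tool); the weight path is a placeholder.

```python
from ultralytics import YOLO

# Hypothetical weight path for the trained FIRE-YOLOv8s model.
model = YOLO("runs/detect/train/weights/best.pt")

# Export to a TensorRT engine with FP16 quantization for the Jetson Xavier NX GPU.
engine_path = model.export(format="engine", half=True, imgsz=640, device=0)

# The resulting .engine file can then be loaded for accelerated inference,
# e.g. YOLO(engine_path).predict(source=0) on the robot's camera stream.
print("TensorRT engine written to:", engine_path)
```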
Experimental validation was conducted in the State Key Tunnel Laboratory of China Merchants Chongqing Transportation Research and Design Institute, with the experimental setup illustrated in
Figure 15. The evaluation results demonstrate that the proposed algorithm achieved:
- Real-time tunnel fire detection capability;
- Robust multi-scale fire target recognition;
- A zero false alarm rate, with no target omissions;
- A sustained detection frame rate of 28.13 FPS.
These experimental outcomes, with representative results shown in
Figure 16, confirm the algorithm’s strong applicability for tunnel fire detection scenarios under computational resource constraints. The system’s performance metrics validate its effectiveness in real-time operational environments.
5. Conclusions
This paper proposes a lightweight tunnel fire detection algorithm, FIRE-YOLOv8s, to address the challenges of high computational complexity and low accuracy in existing fire detection algorithms when applied to highway tunnel scenarios. The algorithm enhances the YOLOv8s model by improving its backbone, neck, and head components: a novel feature extraction module (P-C2f) is introduced to significantly reduce the number of parameters and computational complexity, the lightweight ADown module is employed to minimize information loss while reducing computational load, the CCFF module is used to improve feature fusion efficiency and further lightweight the model, and the dynamic head detection head, combined with attention mechanisms, enhances the model’s performance in complex tunnel environments.
The experimental results demonstrate that FIRE-YOLOv8s achieves a 1.7% improvement in average accuracy (mAP@0.5), reduces the number of parameters by 52.2%, decreases computational load by 47.9%, and reduces the model size to 50% of the original YOLOv8s. Compared to other lightweight models, FIRE-YOLOv8s achieves the highest detection accuracy with the smallest model size and the fewest parameters, while maintaining computational requirements only slightly higher than those of YOLOv7-tiny. These results highlight the algorithm’s ability to balance detection accuracy and computational efficiency, making it highly suitable for real-time fire detection in tunnel scenarios.
Despite its advantages, FIRE-YOLOv8s has areas for improvement. The P-C2f module’s feature extraction from only a subset of input channels results in a slight decrease in accuracy and average precision. The deployment of multiple attention mechanisms in the dynamic head increases the computational load. Additionally, the lack of publicly available tunnel fire datasets restricts the validation of the model’s generalization ability. Future work will focus on optimizing computational resource consumption while maintaining detection accuracy, expanding the dataset to include more diverse tunnel fire images, and improving the model’s generalization and stability. Integrating the algorithm with fire extinguishing equipment on tunnel emergency firefighting robots could enable precise fire suppression, providing a more reliable and efficient solution for tunnel safety operations.