Article

YOLOv11-CHBG: A Lightweight Fire Detection Model

Yushuang Jiang, Peisheng Liu, Yunping Han and Bei Xiao
College of Artificial Intelligence and Software Engineering, Liaoning Petrochemical University, Fushun 113001, China
* Author to whom correspondence should be addressed.
Fire 2025, 8(9), 338; https://doi.org/10.3390/fire8090338
Submission received: 7 July 2025 / Revised: 13 August 2025 / Accepted: 15 August 2025 / Published: 24 August 2025

Abstract

Fire is a disaster that seriously threatens people's lives. Because fires occur suddenly and spread quickly, especially in densely populated places or areas that are difficult to evacuate quickly, they often cause major property damage and seriously endanger personal safety. It is therefore necessary to detect fires accurately and promptly and to issue early warnings. This study introduces YOLOv11-CHBG, a novel detection model designed to identify flames and smoke. On the basis of YOLOv11, the C3K2-HFERB module is used in the backbone, the BiAdaGLSA module is proposed in the neck, and the SEAM attention mechanism is added to the detection head, making the model more lightweight and offering potential support for fire rescue efforts. The experimental results show that the proposed model achieves a mean average precision (mAP@0.5) of 78.4% on the D-Fire dataset, with a 30.8% reduction in parameters compared to YOLOv11. The model achieves a lightweight design, enhancing its value for real-time fire and smoke detection, and it provides a research basis for detecting fires earlier, preventing their spread, and reducing the harm they cause.

1. Introduction

As a sudden disaster, fire often poses a serious threat to people’s lives and property. The latest figures from China’s National Fire and Rescue Administration reveal that a total of 908,000 fires were reported by fire and rescue teams throughout the nation in 2024, resulting in 2001 deaths, 2665 injuries, and direct economic losses of 7.74 billion yuan [1]. These shocking figures highlight the urgency of fire prevention and control; thus, it is important to research efficient and accurate fire smoke detection technology. In the early stage of a fire, efficient flame and smoke detection can identify and give accurate warnings in time, saving time for emergency response. This technology can not only greatly improve the response speed of fire prevention and control, but also effectively reduce the casualty rate and property loss, and play an irreplaceable role in maintaining public safety. Therefore, conducting in-depth research on fire smoke detection technology and continuously improving detection accuracy is of great significance for safeguarding the lives and property of the people.
The detection of fire and smoke has always been a key area of research, and with the advancement of technology, detection techniques have continuously improved, from initial sensor-based detection to intelligent image recognition. In recent years, with the emergence of machine learning and deep learning algorithms, fire and smoke detection technology has continued to advance. Sensor-based detection primarily relies on physical characteristics such as the light intensity of flames, smoke concentration, and infrared thermal imaging, and detection with such sensors is easily affected by environmental factors. Park et al. [2,3,4,5] conducted fire detection based on the physical characteristics of flames and fire scenes by analyzing the flame and the on-site environment. These studies used traditional image processing methods, which lack generalization ability and may not effectively address the challenges posed by different types of fires or smoke and by complex environments. Later, with the emergence of machine learning, computer vision-based detection methods became increasingly popular for fire detection. Qi et al. [6,7,8,9,10,11] (see Table 1) improved fire detection classification and reduced false alarm rates by combining machine learning, deep learning, and computer vision methods.
With the advancement of deep learning technology, the YOLO series of algorithms, as the main family of one-stage object detectors, has become the most widely used approach for flame and smoke detection by virtue of its real-time performance and detection accuracy. Hoang et al. [12] proposed a hyperparameter tuning method based on Bayesian optimization to improve the performance of YOLO models in fire detection; by automatically selecting hyperparameters, the method improves detection accuracy and efficiency, and the optimized model performs well in fire identification, providing an effective solution for real-time fire monitoring systems. Dalal et al. [13] proposed a hybrid feature extraction method combining local binary patterns (LBP) and a convolutional neural network (CNN) and integrated it into the YOLO-v5 model for fire and smoke detection under different environmental conditions in smart cities. Saydirasulovich et al. [14] improved the lightweight YOLOv5s model by combining depthwise separable convolution and an attention mechanism to improve the early detection of agricultural fires; the results show that the model can effectively detect early agricultural fires and reduce losses. Chen et al. [15], building on the PP-YOLO network structure, introduced a stronger feature extraction module and an improved loss function, which improved the model's ability to identify small targets and flames in complex scenarios, effectively reduced false and missed detections, and provided a feasible solution for early fire warning. Deng et al. [16,17,18,19,20,21] (see Table 2) optimized the network architecture of YOLO series algorithms to enhance fire and smoke detection performance. By introducing novel modules and mechanisms, such as a dual-channel bottleneck structure, a small-target detection strategy, and an efficient feature extraction and aggregation mechanism, these models improve the recognition of flames and smoke while reducing both False Positive and False Negative rates; the improved algorithms also demonstrate enhanced adaptability and practicality across diverse environments, emphasize real-time performance and practicability, and enable the rapid and accurate detection of fire and smoke in actual monitoring systems, providing robust technical support for early disaster warning. Talaat et al. [22] proposed an improved fire detection method for smart city environments that enhances the base YOLO-v8 model with an attention mechanism and optimized feature fusion, aiming to significantly improve the accuracy, real-time performance, and robustness of fire detection in complex urban scenarios and ultimately achieving high-precision real-time fire warning. Deng et al. [23] proposed the lightweight AH-YOLO model, based on adaptive receptive field modules and hierarchical feature fusion, to realize the real-time remote monitoring of aircraft hangars. To overcome the limitations of existing forest fire detection algorithms in complex environments, such as insufficient feature extraction, high computational cost, and difficulty of deployment, Zhou et al. [24] introduced spatial multi-scale feature fusion and hybrid pooling modules and proposed the YOLOv8n-SMMP forest fire detection model.
The UAV-based approach of Didis et al. [25] can analyze video in real time to detect early fire risks and is suitable for large-scale rapid inspection, but signal blind spots may lead to missed detections; in contrast, the YOLOv11-CHBG model in this study is better suited to long-term monitoring tasks. Chaikarnjanakit et al. [26] proposed a two-stage YOLO smoke detection model and combined it with a dynamic risk warning mechanism to design a dynamic smoke warning system, reducing the false alarm rate through two-level false alarm filtering.
Although existing fire detection methods are relatively well-established, achieving high-precision real-time detection remains challenging due to the rapidly changing dynamics and blurred boundaries characteristic of flames and smoke. The real-time performance of a model is primarily determined by its parameter count and computational complexity: generally, fewer parameters and lower computational requirements lead to faster inference speeds and stronger real-time capability. Therefore, when designing flame and smoke detection models, it is essential to prioritize two key optimizations while maintaining detection accuracy: first, model compression to enhance real-time detection performance, and second, enhancing feature representation capabilities to accurately capture the characteristics of fuzzy boundaries and ensure robust detection. This dual-pronged strategy, balancing lightweight design with feature enhancement, is critical for improving fire detection model performance.
In this study, a lightweight YOLOv11-CHBG model is proposed based on YOLOv11. The main contributions are as follows:
  • Firstly, in view of the unclear boundaries of flame and smoke targets and the requirement for a lightweight design, the C3K2 module in YOLOv11n is improved with the lightweight High-Frequency Enhanced Residual Block (HFERB), which strengthens the extraction of edge information from feature maps and effectively reduces the number of model parameters and computations.
  • Secondly, based on Global-to-Local Spatial Aggregation (GLSA), an Adaptive Global–Local Spatial Aggregation (AdaGLSA) module is proposed, which ensures high detection performance while reducing the number of parameters and the computational complexity of the model.
  • Finally, the SEAM attention mechanism is introduced into the head network of the model, which effectively handles cases where flames or smoke are partially occluded by other objects. The proposed model is lighter than the original model and can accurately identify flames and smoke.

2. Methods

2.1. YOLOv11-CHBG

As shown in Figure 1, this study first uses a lightweight High-Frequency Enhanced Residual Block (HFERB) to improve the C3K2 module in the YOLOv11 backbone network. The improved module explicitly enhances high-frequency information before feature fusion and strengthens the model's response to edge features, which significantly alleviates the problem of weak target edge features in flame and smoke detection at a small computational cost. Secondly, the Adaptive Global–Local Spatial Aggregation module is introduced in the neck: the Bidirectional Feature Pyramid Network (BiFPN) reduces the number of model parameters, and AdaGLSA further enhances the global and local characteristics of the features, so that the smoke diffusion range and flame edge features in the fire scene are captured better and fire targets at different scales are detected more reliably. Finally, the self-ensembling attention mechanism (SEAM) is used to improve the head network structure, which helps the model handle targets, i.e., flames and smoke, that are occluded by other objects in the dataset and compensates for the accuracy loss caused by the lightweight improvements.

2.1.1. C3k2-HFERB

The C3K2 module is a feature extraction module designed by YOLOv11 as an improvement on the traditional C3 module. The C3 module combines variable convolution kernels and a channel separation strategy to enhance feature extraction in complex backgrounds. The C3K2 module divides the input features into two parts for processing: one part is passed directly through ordinary convolution operations to preserve shallow features, ensuring that the model captures fine details, while the other part uses multiple bottleneck structures or C3K structures for deep feature extraction to further mine the deep information in the feature map and improve the model's ability to understand complex features. The C3K2 module also adopts techniques such as group convolution and channel compression, which allow effective lightweight designs; it can maintain high accuracy while reducing the number of parameters and computations, achieving a balance between performance and efficiency.
Thanks to its parallel branch design, variable convolution kernels, and lightweight optimization, the C3K2 module significantly improves the feature learning performance and computational efficiency of the YOLOv11 model and is one of the key factors behind YOLOv11's excellent performance in object detection. However, because the detection targets in flame and smoke detection change dynamically and their boundaries are difficult to demarcate, the model's ability to capture high-frequency information needs to be strengthened. Given the lightweight requirements, this study uses the lightweight HFERB module to improve the C3K2 module; by enhancing high-frequency information, the model can better deal with complex backgrounds and small-target detection, improving feature extraction while maintaining computational efficiency.
High-Frequency Enhanced Residual Blocks (HFERBs) [27] can enhance detailed information, especially in complex backgrounds such as flames and smoke and small target detection. HFERBs consist of two branches: the Local Feature Extraction (LFE) branch and the High-Frequency Enhancement (HFE) branch. Among them, the High-Frequency Enhancement branch is the main branch of the HFERB, which extracts high-frequency information through maximum pooling and enhances the detailed information of the feature map, such as the edges and texture features of fire and smoke, so as to improve the detection effect. At the same time, the maximum pooling layer can also reduce the spatial resolution of the feature map, which can reduce the computational cost and memory consumption.
As shown in Figure 2, HFERB divides the input features $F_{in} \in \mathbb{R}^{H \times W \times C}$ into two parts along the channel dimension, which are then fed into two branches, the LFE branch and the HFE branch:
$F_{in}^{LFE},\ F_{in}^{HFE} = \mathrm{Split}(F_{in})$, (1)
where $F_{in}^{LFE}, F_{in}^{HFE} \in \mathbb{R}^{H \times W \times C/2}$ in Equation (1) represent the inputs of the LFE and HFE branches.
For the LFE branch, shown on the left of Figure 2, a 3 × 3 convolutional layer followed by the GELU activation function extracts local features; this branch mainly captures the smoother regional features in the image:
$F_{in}^{LFE} = f_a\left(\mathrm{Conv}_{3\times 3}(F_{in}^{LFE})\right)$, (2)
where $\mathrm{Conv}_{3\times 3}(\cdot)$ represents the convolutional layer and $f_a(\cdot)$ represents the GELU activation function.
For the HFE branch, shown on the right of Figure 2, the high-frequency information is first extracted from the input feature $F_{in}^{HFE}$ by a max pooling layer, and the high-frequency features are then enhanced by a 1 × 1 convolutional layer and a GELU activation function:
$F_{in}^{HFE} = f_a\left(\mathrm{Conv}_{1\times 1}(\mathrm{MaxPooling}(F_{in}^{HFE}))\right)$, (3)
where $\mathrm{Conv}_{1\times 1}(\cdot)$ represents the convolutional layer and $\mathrm{MaxPooling}(\cdot)$ represents max pooling. This branch explicitly extracts and enhances the edge and texture features of the image, effectively improving the model's ability to capture the subtle visual cues of initial flame and smoke regions with blurred boundaries and small sizes, laying the foundation for the subsequent identification of small targets.
Finally, the outputs of the two branches are concatenated: a 1 × 1 convolutional layer first fuses the high-level feature information and restores the original channel count, and a skip connection is then introduced to keep training stable, adding the fused output to the input to form a complete residual block. The whole process can be described as follows:
$X_H = \mathrm{Conv}_{1\times 1}\left(\mathrm{Concat}(F_{in}^{LFE}, F_{in}^{HFE})\right) + F_{in}$, (4)
where $\mathrm{Concat}(\cdot)$ represents the concatenation operation and $\mathrm{Conv}_{1\times 1}(\cdot)$ represents the convolutional layer.
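To make the data flow of Equations (1)-(4) concrete, the following PyTorch sketch implements the two-branch block. It is a minimal illustration rather than the authors' released code; in particular, the stride-1 max pooling (kept so that the HFE output matches the LFE output spatially for concatenation) and the 3 × 3 pooling window are assumptions.

```python
import torch
import torch.nn as nn

class HFERB(nn.Module):
    """Minimal sketch of a High-Frequency Enhanced Residual Block (Eqs. (1)-(4))."""
    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        # LFE branch: 3x3 convolution + GELU on one half of the channels (Eq. (2))
        self.lfe = nn.Sequential(nn.Conv2d(half, half, 3, padding=1), nn.GELU())
        # HFE branch: max pooling (stride 1 assumed, so H x W is preserved) + 1x1 conv + GELU (Eq. (3))
        self.hfe = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(half, half, 1),
            nn.GELU(),
        )
        # 1x1 convolution that fuses both branches back to the original channel count (Eq. (4))
        self.fuse = nn.Conv2d(2 * half, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        lfe_in, hfe_in = torch.chunk(x, 2, dim=1)             # Eq. (1): channel split
        out = torch.cat([self.lfe(lfe_in), self.hfe(hfe_in)], dim=1)
        return self.fuse(out) + x                             # fuse + residual connection

if __name__ == "__main__":
    print(HFERB(64)(torch.randn(1, 64, 80, 80)).shape)        # torch.Size([1, 64, 80, 80])
```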
Because flames and smoke lack obvious edge features, this study introduces HFERB into the residual branch of C3K2 (Figure 3) so that high-frequency information is explicitly enhanced before feature fusion, making the improved module better suited to real-time object detection and to flame and smoke detection in complex environments. The C3K2 module already enhances feature extraction through its multi-scale convolutional kernels and parallel branch design; integrating the HFERB module further strengthens the capture of high-frequency information, enabling the model to understand the image content more comprehensively. At the same time, the residual connection of the HFERB maintains training stability, and its lightweight characteristics do not significantly increase the computational burden, which makes YOLOv11 more accurate in locating and recognizing targets while maintaining efficient computation.

2.1.2. BiAdaGLSA

In order to better capture flames and smoke and their diffusion range, and to further improve the neck's feature extraction ability for flames and smoke, this study combines the Global-to-Local Spatial Aggregation (GLSA) module [28] with the Bidirectional Feature Pyramid Network (BiFPN) to form the BiAdaGLSA module. BiFPN handles multi-scale targets effectively through top-down and bottom-up feature fusion. However, the scale of flames and smoke in fire scenarios varies greatly, and BiFPN alone cannot fully cover features at all scales. The proposed AdaGLSA further enhances the global and local characteristics of the features, facilitating better capture of the various kinds of information in the fire scene and improving the model's detection ability across different target scales.
The Adaptive Global–Local Spatial Aggregation (AdaGLSA) module consists of the Global Spatial Attention (GSA) module and the Local Spatial Attention (LSA) module, as shown in Figure 4. The input features with 64 channels are split along the channel dimension into two groups, which are fed into the GSA module and the LSA module, and the outputs of the two modules are finally fused and output through a 1 × 1 convolutional layer.
The formula is expressed as follows:
$F_i^1,\ F_i^2 = \mathrm{Split}(F_i)$, (5)
$F_i' = C_{1\times 1}\left(\mathrm{add}\left(Gsa(F_i^1), Lsa(F_i^2)\right)\right)$, (6)
where $F_i\ (i \in \{2,3,4\})$ is the input feature, $F_i^1, F_i^2\ (i \in \{2,3,4\})$ are the input features of the GSA module and the LSA module, $F_i' \in \mathbb{R}^{\frac{H}{8} \times \frac{W}{8} \times 32}$ is the output feature, $Gsa$ represents the GSA, and $Lsa$ represents the LSA. The GSA module mainly captures the diffusion trend and approximate extent of flames and smoke and their interaction with the background. The LSA module focuses on flame edges, smoke texture, and subtle changes in the smoke. This separation strategy avoids the conflicts or information dilution that can occur when a single feature map has to learn global and local information at the same time, ensuring that both types of information have independent computing paths for in-depth mining.
Since the diffusion path of fire and smoke is irregular and difficult to predict, the GSA module predicts the extent of diffusion by focusing on long-range relationships between pixels in the spatial domain:
$Att_G(F_i^1) = \mathrm{Softmax}\left(\mathrm{Transpose}\left(C_{1\times 1}(F_i^1)\right)\right)$, (7)
$Gsa(F_i^1) = \mathrm{MLP}\left(Att_G(F_i^1) \otimes F_i^1\right) + F_i^1$, (8)
where $Att_G$ represents the global attention map, $C_{1\times 1}$ represents a 1 × 1 convolution, the symbol $\otimes$ represents matrix multiplication, and $\mathrm{MLP}$ includes two fully connected layers, a ReLU activation, and a normalization layer.
The Local Spatial Attention (LSA) module extracts local features from regions of interest in the spatial dimension of a given feature map through cascaded convolutions and residual connections. This local focus strengthens the response to small targets whose flame and smoke edges are otherwise difficult to capture, such as small initial flames that can easily be missed, making them easier to detect in complex backgrounds:
$Att_L(F_i^2) = \sigma\left(C_{1\times 1}\left(F_c(F_i^2)\right) + F_i^2\right)$, (9)
$Lsa(F_i^2) = Att_L(F_i^2) \odot F_i^2 + F_i^2$, (10)
where $F_c$ represents a sequence of three 1 × 1 convolutional layers followed by a 3 × 3 depthwise convolutional layer, $Att_L$ represents the local attention map, $\sigma(\cdot)$ represents the Sigmoid activation function, and the symbol $\odot$ represents element-wise multiplication.
Through the collaboration between GSA and LSA, the AdaGLSA module can simultaneously capture long-range dependencies and local detail information, and it effectively aggregates local spatial information with few parameters. In fire detection, the proposed module can more accurately identify flames and smoke: LSA supplies local detail on whether a given edge belongs to a flame or smoke, while GSA makes the predicted diffusion range more accurate. This meets both the requirement of predicting flame and smoke diffusion ranges and the requirement of keeping the model lightweight.
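A minimal PyTorch sketch of the split, GSA/LSA, and fuse flow in Equations (5)-(10) is given below. It is an illustrative reconstruction, not the released implementation; details such as the MLP width, the use of GroupNorm as the unspecified normalization layer, and broadcasting the global context back onto the feature map are assumptions.

```python
import torch
import torch.nn as nn

class GSA(nn.Module):
    """Global Spatial Attention branch: long-range context over spatial positions (Eqs. (7)-(8), sketch)."""
    def __init__(self, c: int):
        super().__init__()
        self.score = nn.Conv2d(c, 1, 1)                        # 1x1 conv producing per-pixel attention logits
        self.mlp = nn.Sequential(                              # two "fully connected" layers as 1x1 convs
            nn.Conv2d(c, c, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 1), nn.GroupNorm(1, c),            # GroupNorm assumed for the norm layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        att = self.score(x).flatten(2).softmax(dim=-1)         # (B, 1, HW): softmax over spatial positions
        ctx = torch.bmm(x.flatten(2), att.transpose(1, 2))     # (B, C, 1): attention-weighted global context
        ctx = self.mlp(ctx.view(b, c, 1, 1))                   # refine the global descriptor
        return ctx + x                                         # broadcast the context back onto the map

class LSA(nn.Module):
    """Local Spatial Attention branch: edges and fine texture (Eqs. (9)-(10), sketch)."""
    def __init__(self, c: int):
        super().__init__()
        self.fc = nn.Sequential(                               # F_c: three 1x1 convs + one 3x3 depthwise conv
            nn.Conv2d(c, c, 1), nn.Conv2d(c, c, 1), nn.Conv2d(c, c, 1),
            nn.Conv2d(c, c, 3, padding=1, groups=c),
        )
        self.proj = nn.Conv2d(c, c, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        att = torch.sigmoid(self.proj(self.fc(x)) + x)         # Eq. (9)
        return att * x + x                                     # Eq. (10): element-wise re-weighting + residual

class AdaGLSA(nn.Module):
    """Channel split -> parallel GSA / LSA -> add -> 1x1 conv fuse (Eqs. (5)-(6), sketch)."""
    def __init__(self, c_in: int = 64, c_out: int = 32):
        super().__init__()
        half = c_in // 2
        self.gsa, self.lsa = GSA(half), LSA(half)
        self.fuse = nn.Conv2d(half, c_out, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.chunk(x, 2, dim=1)                      # Eq. (5): split 64 channels into 2 x 32
        return self.fuse(self.gsa(x1) + self.lsa(x2))          # Eq. (6)

if __name__ == "__main__":
    print(AdaGLSA()(torch.randn(1, 64, 40, 40)).shape)         # torch.Size([1, 32, 40, 40])
```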

2.1.3. SEAMHead

Observation of the dataset shows that the environment during a fire is complex and the background is cluttered, with many detection targets obscured by trees, buildings, and other objects. Experiments also showed that the lightweight improvements to the model cause some loss of accuracy. To optimize the accuracy of the model and further reduce its parameter count, this study improves the detection head with the SEAM attention module. The improved detection head enhances the response to the unobscured parts of flames and smoke; when small flames or smoke are partially obscured by branches, buildings, or other objects, the module effectively strengthens the response to the visible part, reduces the missed detection rate, and thereby improves detection accuracy.
As shown in Figure 5, SEAM [29] first uses three CSMM (Mixed Channel and Space Module) branches operating on patches of different sizes to obtain multi-scale features through convolution, then downsamples the resulting features using average pooling, and finally expands the number of channels through a fully connected network to strengthen the important features. The right side of the figure shows the detailed structure of CSMM, which uses multi-scale features and depthwise separable convolutions to learn the relationship between spatial positions and channels effectively. The input is first encoded by the Embedding Layer and processed by the GELU activation function and Batch Normalization, and multi-level features are then further extracted through Depthwise Convolution and Pointwise Convolution to enhance the representation ability. The resulting feature map is converted into attention weights and used to weight the original feature map. The improved detection head pays more attention to the key areas of detection, which is a significant advantage when detection targets are occluded in complex scenes.
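The following loose sketch illustrates the channel-and-spatial mixing idea described above (patch embedding, depthwise and pointwise convolution, pooling, and a small fully connected network that produces attention weights). The patch sizes, channel widths, and the sigmoid re-weighting are simplifying assumptions and do not reproduce the exact SEAM implementation of [29].

```python
import torch
import torch.nn as nn

class CSMMBranch(nn.Module):
    """One channel-and-spatial mixing branch: patch embedding -> depthwise conv -> pointwise conv."""
    def __init__(self, c: int, patch: int):
        super().__init__()
        self.embed = nn.Sequential(nn.Conv2d(c, c, patch, stride=patch), nn.GELU(), nn.BatchNorm2d(c))
        self.dw = nn.Sequential(nn.Conv2d(c, c, 3, padding=1, groups=c), nn.GELU(), nn.BatchNorm2d(c))
        self.pw = nn.Sequential(nn.Conv2d(c, c, 1), nn.GELU(), nn.BatchNorm2d(c))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pw(self.dw(self.embed(x)))

class SEAMLikeAttention(nn.Module):
    """Sketch: multi-scale CSMM branches -> average pooling -> small FC network -> channel weights."""
    def __init__(self, c: int, patches=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(CSMMBranch(c, p) for p in patches)
        self.fc = nn.Sequential(nn.Linear(c, c // 2), nn.ReLU(inplace=True), nn.Linear(c // 2, c))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pooled = sum(b(x).mean(dim=(2, 3)) for b in self.branches)  # pool each branch, sum over scales
        weights = torch.sigmoid(self.fc(pooled))[:, :, None, None]  # per-channel attention weights
        return x * weights                                          # re-weight the original feature map

if __name__ == "__main__":
    print(SEAMLikeAttention(64)(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```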

3. Results

3.1. Datasets

In this study, a total of 21,527 images from the public D-Fire dataset [30] were used. The dataset has two label types, Fire and Smoke, with a total of 14,692 fire labels and 11,865 smoke labels. Some of the annotated data are shown in Figure 6. The details of the D-Fire dataset are shown in Table 3.
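For reference, D-Fire ships YOLO-format text annotations (one "class x_center y_center width height" line per object, normalized to the image size). The short script below is a sketch that assumes this layout, a class-id-to-name mapping taken from the dataset's own documentation, and a hypothetical local directory path; it simply tallies instances per class so that counts like those in Table 3 can be checked.

```python
from collections import Counter
from pathlib import Path

def count_labels(label_dir: str) -> Counter:
    """Count object instances per class id across YOLO-format .txt label files."""
    counts = Counter()
    for txt in Path(label_dir).glob("*.txt"):
        for line in txt.read_text().splitlines():
            if line.strip():                       # empty files correspond to images with no fire/smoke
                counts[int(line.split()[0])] += 1
    return counts

if __name__ == "__main__":
    # hypothetical local path; adjust to wherever the D-Fire labels are unpacked
    print(count_labels("datasets/dfire/train/labels"))
```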

3.2. Experiment Setup

3.2.1. Experimental Environment

The experiments were conducted on the Windows 11 operating system with Python 3.12.3, an NVIDIA GeForce RTX 3060 GPU, CUDA 12.1, and the deep learning framework PyTorch 2.3.0.

3.2.2. Experimental Parameters

The parameters of the experimental process are shown in Table 4.
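For reproducibility, the settings in Table 4 map directly onto the Ultralytics training interface. The call below is only a sketch: the model configuration file yolov11-chbg.yaml and the dataset file dfire.yaml are placeholders for files the reader would have to create, and the 300 "iterations" in Table 4 are interpreted here as training epochs.

```python
from ultralytics import YOLO

# Placeholder YAML describing the modified architecture (C3K2-HFERB, BiAdaGLSA, SEAM head).
model = YOLO("yolov11-chbg.yaml")

# Hyperparameters from Table 4: 640x640 input, lr 0.01, weight decay 0.0005, batch 8, 300 epochs, SGD.
model.train(
    data="dfire.yaml",      # placeholder dataset config pointing at the D-Fire images/labels
    imgsz=640,
    epochs=300,
    batch=8,
    lr0=0.01,
    weight_decay=0.0005,
    optimizer="SGD",
)
```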

3.3. Evaluation Metrics

In this study, the number of model parameters (Params), the computational cost in giga floating-point operations (GFLOPs), the Frames Per Second (FPS), the mean average precision (mAP), and the precision–recall (PR) curve were selected as evaluation indicators to compare the performance of different models. The calculation formulas are as follows:
$P = \dfrac{TP}{TP + FP}$, (11)
$R = \dfrac{TP}{TP + FN}$, (12)
$AP = \displaystyle\int_0^1 P(R)\,\mathrm{d}R$, (13)
$mAP = \dfrac{1}{n}\displaystyle\sum_{i=1}^{n} AP_i$, (14)
where TP stands for True Positive, meaning the sample is actually positive and the predicted result is also positive; FP represents False Positive, indicating that the sample is actually negative but the predicted result is positive; and FN is False Negative, meaning the sample is positive but the predicted result is negative. Precision measures the proportion of samples predicted as positive that are actually positive, while Recall measures the proportion of actual positive samples that are correctly predicted as positive. The PR curve plots P against R and reflects the balance between accuracy and coverage. AP is calculated for a single target category and reflects the combined Precision and Recall performance of the model at different confidence thresholds, and mAP is the average of the APs across categories. For mAP@0.5, the AP of each category is calculated by considering a prediction correct only if the IoU between the predicted box and the ground-truth box is at least 0.5, and the APs of all categories are then averaged; mAP@0.5 is commonly used to compare model detection accuracy.
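As a concrete illustration of the formulas above, the following NumPy sketch computes precision, recall, AP (by simple numerical integration of a sampled PR curve, rather than any benchmark-specific interpolation scheme), and mAP.

```python
import numpy as np

def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """P = TP / (TP + FP), R = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    """AP = area under the precision-recall curve, here via trapezoidal integration."""
    order = np.argsort(recall)
    return float(np.trapz(precision[order], recall[order]))

def mean_average_precision(ap_per_class: list[float]) -> float:
    """mAP = mean of the per-class AP values."""
    return float(np.mean(ap_per_class))

if __name__ == "__main__":
    p, r = precision_recall(tp=90, fp=10, fn=8)
    ap = average_precision(np.array([0.2, 0.5, 0.9]), np.array([1.0, 0.8, 0.6]))
    print(round(p, 3), round(r, 3), round(ap, 3), mean_average_precision([ap, 0.7]))
```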
Params is the total number of parameters, which directly affects the complexity and training time of the model and is used to measure its overall size and complexity. The computational cost (GFLOPs) is the number of billions of floating-point operations required for one forward pass and is used to measure the actual computational complexity of the model. FPS is the number of image frames processed per second and is used to measure the real-time performance of the object detection model.
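Parameter count and FPS can be measured directly on a PyTorch model, as in the generic sketch below; the warm-up count, number of timed runs, and input size are arbitrary choices, and GFLOPs would require an additional profiling tool (e.g., a FLOP-counting library), which is not shown.

```python
import time
import torch
import torch.nn as nn

def count_parameters(model: nn.Module) -> float:
    """Total number of parameters, reported in millions."""
    return sum(p.numel() for p in model.parameters()) / 1e6

@torch.no_grad()
def measure_fps(model: nn.Module, imgsz: int = 640, runs: int = 100) -> float:
    """Average frames per second for single-image inference on the current device."""
    model.eval()
    x = torch.randn(1, 3, imgsz, imgsz)
    for _ in range(10):                     # warm-up iterations
        model(x)
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    return runs / (time.perf_counter() - start)

if __name__ == "__main__":
    demo = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 2, 1))
    print(f"{count_parameters(demo):.3f} M params, {measure_fps(demo, imgsz=320):.1f} FPS")
```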

3.4. Comparative Experiments

In order to demonstrate the performance advantages of the proposed model more intuitively, the current mainstream models YOLOv8n, YOLOv10n, and YOLOv12, which are widely used in real-time object detection, were selected for comparison, together with three fire detection models that use the same dataset. The experimental data are shown in Table 5.
Experimental comparison reveals that although YOLOv12 maintains a stable recall rate and achieves the highest detection accuracy (mAP@0.5), the model proposed in this study has a significantly lower parameter count and computational cost, resulting in a lighter architecture. Ref. [31] proposes an improved fire detection model based on YOLOv5s; although it does not report parameter counts or computational cost, the proposed model holds a clear advantage over it in detection accuracy. Ref. [32] proposed a spatiotemporal attention conversion mechanism to improve a YOLOv5-based backbone network; compared with it, the model proposed in this study improves mAP@0.5 by 0.2 percentage points, reduces the number of parameters by 5.1 M, and reduces GFLOPs by 13.3. Ref. [33] proposed a multi-scale convolutional attention (MSCA) mechanism and constructed the YOLO11s-MSCA model based on it. The experimental results show that although the FPS of the YOLOv11-CHBG model is not the highest among all models, it exceeds 30, which meets the real-time requirements of object detection. Overall, the comparison demonstrates that the proposed model significantly outperforms the other models in parameter efficiency and computational cost, and it offers substantial advantages in lightweight design, real-time performance, and detection accuracy, effectively meeting the practical requirements of flame and smoke detection.
In this study, by comparing BiFPN, AdaGLSA, BiAdaGLSA, and the baseline YOLOv11 model, it is concluded that BiAdaGLSA reduces the complexity of the model while maintaining its detection accuracy and therefore gives the best overall result. The specific experimental data are shown in Table 6:
Experiment 1 is the baseline YOLOv11 experiment used to establish baseline performance. Experiment 2 incorporates the BiFPN module, achieving an mAP@0.5 of 77.7% with 2.0 M parameters and 6.4 GFLOPs. Experiment 3 introduces the Adaptive Global–Local Spatial Aggregation (AdaGLSA) module alone to improve the neck's feature extraction ability, achieving an mAP@0.5 of 78% with 3.7 M parameters and 8.5 GFLOPs; its accuracy is higher than that of BiFPN, but it is heavier in both parameters and computation. Experiment 4 combines the BiFPN and AdaGLSA modules (BiAdaGLSA). This hybrid approach achieves the highest mAP@0.5 among the improved necks, 78.2%, with 2.1 M parameters and 6.9 GFLOPs.

3.5. Ablation Experiments

The ablation experiment can verify the effectiveness of each module added to the model, and a total of six sets of ablation experiments were carried out on the improved modules. The specific experimental data are shown in Table 7.
Experiment 1 serves as the baseline YOLOv11 experiment to establish comparative performance. Experiment 2 incorporates the C3K2-HFERB module, achieving an mAP@0.5 of 78.3% with 2.5 M parameters. Experiment 3 uses the BiAdaGLSA module; while it reduces the parameter count to 2.1 M, the mAP@0.5 is 78.2%, and the computational cost increases due to the added attention mechanism. Experiment 4 enhances the model head with SEAM. This approach achieves a higher mAP@0.5 of 78.7% with 2.5 M parameters, while also reducing both parameters and computational cost compared with the baseline. Experiment 5 combines the C3K2-HFERB and BiAdaGLSA modules; this integration reduces the parameter count to 1.9 M with an mAP@0.5 of 78.2%, indicating a slight decrease in detection accuracy alongside a significant reduction in model complexity. Experiment 6 evaluates the proposed YOLOv11-CHBG model. Compared with the baseline YOLOv11 (Experiment 1), it maintains the same detection accuracy (mAP@0.5) while reducing model parameters by 30.8% and computational cost by 9.2%. These results demonstrate that the YOLOv11-CHBG model can accurately identify flames and smoke in fire scenes, meeting the requirements of daily fire detection tasks.

3.6. Visualization of Experimental Results

In this section, we present and analyze the training curves and the confusion matrix of the training results and visualize the detection results.

3.6.1. Experimental Results

Figure 7 shows the training monitoring curves of the model proposed in this study. The training loss continues to decrease as the number of training epochs increases, indicating that the learning of bounding box regression and classification gradually improves and the fit becomes steadily better. The validation loss also shows a downward trend and basically stabilizes in the later stage, indicating that the model gradually learns effective features on the validation set without obvious overfitting. The experimental indexes P and R rise rapidly at the beginning of training and then level off, indicating that the detection ability of the model on the validation set gradually reaches a stable state. Both mAP50 and mAP50-95 increase and stabilize as training progresses, showing that the performance of the model improves and the overall detection effect gradually converges. In general, the convergence trend of the training process is good, and the model is continuously optimized during training with steadily improving performance.
Figure 8 shows that the model correctly identifies 750 background samples and handles normal scenes reliably, but there is room for improvement in the detection of flames and smoke. Although the model correctly identified 1250 flames, as many as 1500 were mistaken for smoke, which in actual monitoring scenarios could delay alarms. The problem in smoke identification is more serious: 306 smoke instances are misjudged as background and 1911 as flames, giving a very low accuracy, which may be due to light smoke, similarity to the background, or missed detections caused by insufficient samples. Combined with the flames misjudged as smoke, this indicates confusion between the two categories, and the model still needs to be improved.

3.6.2. Training Experimental Result

Figure 9 shows the results of the experimental verification using the improved model. It can be seen that the model accurately identifies and classifies flames and smoke at the fire scene and can distinguish them from visually similar objects, such as strong sunlight and clouds.

4. Discussion

Although the fire detection model presented in this study has achieved significant results, several aspects warrant further improvement. In fire scenarios, flames and smoke initially often appear as small targets, and the existing models exhibit limitations in detecting such small objects. Furthermore, the model’s detection accuracy requires enhancement. Analysis revealed classification errors within the dataset used in this study; consequently, the next step involves improving dataset quality to boost model accuracy and strengthen its ability to detect flames and smoke. Finally, the model’s validation is currently confined to laboratory settings using standard datasets and has not been deployed in real-world scenarios. This limitation raises concerns about performance stability under the complex disturbances inherent in actual fire environments. The lack of real-world testing also hinders the evaluation of long-term operational reliability on edge devices and prevents the quantification of environmental noise’s impact on False Positive rates—factors that could directly affect the application’s reliability and safety. Addressing these areas for improvement holds the potential to significantly increase the model’s effectiveness in early fire warning systems, thereby providing stronger technical support for safeguarding public safety and reducing fire-related losses.

5. Conclusions

In this study, an efficient flame and smoke detection model was designed based on the advanced YOLOv11 object detection algorithm. While YOLOv11 offers strong real-time performance and high detection accuracy, its application to fire detection still needs further optimization. To this end, YOLOv11 was improved and optimized in this study. The backbone network was adjusted and the C3K2 module was improved with HFERB to enhance the model's ability to extract flame and smoke features. At the same time, in view of the particular characteristics of flames and smoke, the BiAdaGLSA module was proposed, which enables the model to focus more on the key feature areas of flames and smoke and greatly reduces the number of parameters and the computational complexity. In addition, the detection head was improved with the SEAM attention mechanism so that the model can better adapt to multi-scale object detection in fire scenes and further improve detection performance. After these optimizations, the fire detection model constructed in this study is significantly improved in terms of lightweight design and real-time performance for flame and smoke detection. The experimental results show that the improved model maintains high recall and accuracy in a variety of complex scenarios: the recall rate for flames exceeds 90%, the mAP@0.5 reaches 78.4%, and the parameter count is reduced by 30.8%. The model can process more than 30 frames of image data per second, effectively satisfying real-time fire detection requirements.

Author Contributions

Conceptualization, Y.J.; methodology, software, validation, formal analysis, investigation, resources, and data curation; writing—original draft preparation, P.L. and Y.H.; writing—review and editing, Y.J.; visualization, B.X.; supervision, P.L.; project administration, P.L.; funding acquisition, Y.J., P.L., Y.H. and B.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Liaoning Provincial Department of Education University Basic Research Project (LJ212410148034), the Liaoning Provincial Department of Education Xingliao Talents Program Project (XLYC1907166), and the Liaoning Provincial Department of Education Scientific Research Fund (L2019027).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Huang, Z. There are Five Main Characteristics of National Fires in 2024, in National Fire and Rescue Bureau. Available online: https://www.china-news-online.com/lang/English/4251186.html (accessed on 26 May 2025).
  2. Park, S.; Han, K.W.; Lee, K. A study on fire detection technology through spectrum analysis of smoke particles. In Proceedings of the 2020 International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Republic of Korea, 21–23 October 2020; IEEE: New York, NY, USA, 2020; pp. 1563–1565. [Google Scholar]
  3. Xavier, K.L.B.L.; Nanayakkara, V.K. Development of an early fire detection technique using a passive infrared sensor and deep neural networks. Fire Technol. 2022, 58, 3529–3552. [Google Scholar] [CrossRef]
  4. Celik, T. Fast and efficient method for fire detection using image processing. ETRI J. 2010, 32, 881–890. [Google Scholar] [CrossRef]
  5. Qiu, X.; Xi, T.; Sun, D.; Zhang, E.; Li, C.; Peng, Y.; Wei, J.; Wang, G. Fire detection algorithm combined with image processing and flame emission spectroscopy. Fire Technol. 2018, 54, 1249–1263. [Google Scholar] [CrossRef]
  6. Qi, X.; Ebert, J. A computer vision based method for fire detection in color videos. Int. J. Imaging 2009, 2, 22–34. [Google Scholar]
  7. Liu, C.-B.; Ahuja, N. Vision based fire detection. In Proceedings of the 17th International Conference on Pattern Recognition, Cambridge, UK, 26–26 August 2004; IEEE: New York, NY, USA, 2004; Volume 4, pp. 134–137. [Google Scholar]
  8. Khan, R.A.; Hussain, A.; Bajwa, U.I.; Raza, R.H.; Anwar, M.W. Fire and smoke detection using capsule network. Fire Technol. 2023, 59, 581–594. [Google Scholar] [CrossRef]
  9. Zhao, H.; Jin, J.; Liu, Y.; Guo, Y.; Shen, Y. FSDF: A high-performance fire detection framework. Expert. Syst. Appl. 2024, 238, 121665. [Google Scholar] [CrossRef]
  10. Geetha, S.; Abhishek, C.; Akshayanat, C. Machine vision based fire detection techniques: A survey. Fire Technol. 2021, 57, 591–623. [Google Scholar] [CrossRef]
  11. Sarikaya Basturk, N. Forest fire detection in aerial vehicle videos using a deep ensemble neural network model. Aircr. Eng. Aerosp. Technol. 2023, 95, 1257–1267. [Google Scholar] [CrossRef]
  12. Hoang, V.-H.; Lee, J.W.; Park, C.-S. Enhancing Fire Detection with YOLO Models: A Bayesian Hyperparameter Tuning Approach. Comput. Mater. Contin. 2025, 83, 4097. [Google Scholar] [CrossRef]
  13. Dalal, S.; Lilhore, U.K.; Radulescu, M.; Simaiya, S.; Jaglan, V.; Sharma, A. A hybrid LBP-CNN with YOLO-v5-based fire and smoke detection model in various environmental conditions for environmental sustainability in smart city. Environ. Sci. Pollut. Res. 2024, 31, 1–18. [Google Scholar] [CrossRef]
  14. Saydirasulovich, S.N.; Umirzakova, S.; Nabijon Azamatovich, A.; Mukhamadiev, S.; Temirov, Z.; Abdusalomov, A.; Cho, Y.I. Lightweight YOLOv5s Model for Early Detection of Agricultural Fires. Fire 2025, 8, 187. [Google Scholar] [CrossRef]
  15. Chen, C.; Yu, J.; Lin, Y.; Lai, F.; Zheng, G.; Lin, Y. Fire detection based on improved PP-YOLO. Signal Image Video Process. 2023, 17, 1061–1067. [Google Scholar] [CrossRef]
  16. Deng, L.; Zhou, J.; Liu, Q. Fire and smoke detection algorithm based on improved YOLOv8. J. Tsinghua Univ. (Sci. Technol.) 2025, 65, 681–689. [Google Scholar]
  17. He, Y.; Hu, J.; Zeng, M.; Qian, Y.; Zhang, R. DCGC-YOLO: The efficient dual-channel bottleneck structure YOLO detection algorithm for fire detection. IEEE Access 2024, 12, 65254–65265. [Google Scholar] [CrossRef]
  18. Zhao, L.; Zhi, L.; Zhao, C.; Zheng, W. Fire-YOLO: A small target object detection method for fire inspection. Sustainability 2022, 14, 4930. [Google Scholar] [CrossRef]
  19. Wang, D.; Qian, Y.; Lu, J.; Wang, P.; Yang, D.; Yan, T. Ea-yolo: Efficient extraction and aggregation mechanism of YOLO for fire detection. Multimed. Syst. 2024, 30, 287. [Google Scholar] [CrossRef]
  20. Zhang, Z.; Tan, L.; Robert, T.L.K. An improved fire and smoke detection method based on YOLOv8n for smart factories. Sensors 2024, 24, 4786. [Google Scholar] [CrossRef]
  21. Ishtiaq, M.; Won, J.-U. YOLO-SIFD: YOLO with Sliced Inference and Fractal Dimension Analysis for Improved Fire and Smoke Detection. Comput. Mater. Contin. 2025, 82, 5343. [Google Scholar] [CrossRef]
  22. Talaat, F.M.; ZainEldin, H. An improved fire detection approach based on YOLO-v8 for smart cities. Neural Comput. Appl. 2023, 35, 20939–20954. [Google Scholar] [CrossRef]
  23. Deng, L.; Wang, Z.; Liu, Q. AH-YOLO: An Improved YOLOv8-Based Lightweight Model for Fire Detection in Aircraft Hangars. Fire 2025, 8, 199. [Google Scholar] [CrossRef]
  24. Zhou, N.; Gao, D.; Zhu, Z. YOLOv8n-SMMP: A Lightweight YOLO Forest Fire Detection Model. Fire 2025, 8, 183. [Google Scholar] [CrossRef]
  25. Didis, H.M.; Adibeli, F.; Boz, I.; Azdavay, N.S. Integrating UAVs and YOLO Deep Learning for Early-Stage Forest Fire Detection. In Proceedings of the International Trend of Tech Symposium, İstanbul, Türkiye, 7 December 2024; SETSCI-Conference Proceedings. 2024; pp. 12–17. [Google Scholar]
  26. Chaikarnjanakit, T.; Kongprawechnon, W.; Karnjana, J. Real-Time Wildfire-Prone Area Monitoring and Early Warning System Based on Two-Stage YOLO-Based Smoke Detection Model. In Proceedings of the International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making, Hanoi, Vietnam, 5–7 November 2025; Springer: Berlin/Heidelberg, Germany, 2025; pp. 320–331. [Google Scholar]
  27. Li, A.; Zhang, L.; Liu, Y.; Zhu, C. Feature modulation transformer: Cross-refinement of global representation via high-frequency prior for image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 12514–12524. [Google Scholar]
  28. Tang, F.; Xu, Z.; Huang, Q.; Wang, J.; Hou, X.; Su, J.; Liu, J. DuAT: Dual-aggregation transformer network for medical image segmentation. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Beijing, China, 13–15 October 2023; pp. 343–356. [Google Scholar]
  29. Yu, Z.; Huang, H.; Chen, W.; Su, Y.; Liu, Y.; Wang, X. Yolo-facev2: A scale and occlusion aware face detector. Pattern Recognit. 2024, 155, 110714. [Google Scholar] [CrossRef]
  30. de Venancio, P.V.A.; Lisboa, A.C.; Barbosa, A.V. An automatic fire detection system based on deep convolutional neural networks for low-power, resource-constrained devices. Neural Comput. Appl. 2022, 34, 15349–15368. [Google Scholar] [CrossRef]
  31. Zhang, X.; Wang, Y.; Wu, S.; Sun, B. An improved lightweight fire detection algorithm based on cascade sparse query. Opto-Electron. Eng. 2023, 50, 230216. [Google Scholar]
  32. Lv, K.; Wu, R.; Chen, S.; Lan, P. CCi-YOLOv8n: Enhanced Fire Detection with CARAFE and Context-Guided Modules. In Proceedings of the International Conference on Intelligent Computing, Ningbo, China, 26–29 July 2025; Springer: Berlin/Heidelberg, Germany, 2025; pp. 128–140. [Google Scholar]
  33. Li, Y.; Nie, L.; Zhou, F.; Liu, Y.; Fu, H.; Chen, N.; Dai, Q.; Wang, L. Improving Fire and Smoke Detection with You Only Look Once 11 and Multi-Scale Convolutional Attention. Fire 2025, 8, 165. [Google Scholar] [CrossRef]
Figure 1. YOLOv11-CHBG.
Figure 2. High-Frequency Enhanced Residual Block (HFERB).
Figure 3. C3K2-HFERB module.
Figure 4. Adaptive Global–Local Spatial Aggregation (AdaGLSA) module.
Figure 5. SEAMHead module. The left shows the overall structure of SEAM, and the right side of the figure shows the CSMM (Mixed Channel and Space Module) structure.
Figure 6. Datasets. The top shows sunlight and clouds that are easily mixed, and the bottom shows images of flames and smoke.
Figure 7. Visualization diagram. The parts in the figure are box_loss, cls_loss, dfl_loss, Precision, Recall, val/box_loss, val/cls_loss, val/dfl_loss, mAP50, and mAP50-95.
Figure 8. Confusion Matrix.
Figure 9. Training results.
Table 1. Summary of the referenced machine learning and computer vision fire detection studies [6,7,8,9,10,11].
  • Ref. [6]. Technology: mainly uses color feature analysis and dynamic flame recognition. Contribution: a purely visual, low-computational-cost fire detection framework. Outcome: on standard test videos, a detection accuracy of 89% at up to 30 FPS.
  • Ref. [7]. Technology: analyzes color, texture, and motion characteristics in videos. Contribution: a representative study of the early application of computer vision to fire detection. Outcome: on standard test videos, an accuracy of 85% at up to 25 FPS on a regular PC.
  • Ref. [8]. Technology: extracts multi-level features of smoke and flames with deep learning models to achieve end-to-end fire identification. Contribution: capsule networks are applied to fire/smoke detection for the first time. Outcome: 96.8% accuracy on the test set and the ability to distinguish smoke from clouds and steam.
  • Ref. [9]. Technology: combines deep learning methods with multimodal data fusion (e.g., visual, infrared, or sensor data). Contribution: establishes the first high-performance FSDF framework for fire detection that balances real-time operation and high accuracy. Outcome: 98.2% accuracy on benchmark datasets such as FireNet and BoFire.
  • Ref. [10]. Technology: comprehensively reviews machine vision-based fire detection methods from 2000 to 2020, covering traditional image processing and deep learning. Contribution: proposes a three-way "illumination–occlusion–dynamic interference" challenge model and quantifies how different technologies respond to each challenge. Outcome: six major directions for future research are proposed.
  • Ref. [11]. Technology: an end-to-end deep learning framework that integrates multiple pre-trained CNN models and temporal modeling. Contribution: effectively addresses core challenges such as small object detection and complex background interference. Outcome: 96.2% accuracy on a drone-view forest fire video dataset.
Table 2. Summary of the referenced YOLO-based fire detection improvements [16,17,18,19,20,21].
  • Ref. [16]. Improvement: in YOLOv8, a fire flow kinematics model is embedded in the deep learning framework for the first time and a dynamic detection head is introduced. Dataset: self-managed FireSmoke-3K test set. mAP@0.5, Params: 98.6%.
  • Ref. [17]. Improvement: the standard bottleneck in the YOLOv5 backbone is replaced with a DCGC bottleneck and channel attention (SE Block) is added. Dataset: datasets such as FireNet and BoFire. mAP@0.5, Params: 96.3%, 2.7 M.
  • Ref. [18]. Improvement: based on YOLOv4, a multi-scale feature enhancement module and a dynamic receptive field mechanism are introduced to enhance the feature fusion network, and the pyramid pooling and loss functions are improved. Dataset: self-managed Fire-Small dataset. mAP@0.5, Params: 94.7%, 3.8 M.
  • Ref. [19]. Improvement: based on YOLOv8n, an efficient feature extraction and aggregation mechanism is proposed by replacing convolutions and adding cross-layer dense connections. Dataset: datasets such as FireNet and BoFire. mAP@0.5, Params: 96.8%, 3.6 MB.
  • Ref. [20]. Improvement: based on YOLOv8n, a lightweight Transformer module and a dynamic threshold strategy are introduced. Dataset: industrial test set. mAP@0.5, Params: 97.2%, 2.1 MB.
  • Ref. [21]. Improvement: slice inference and fractal dimension analysis are combined to improve the detection accuracy of small targets and complex objects. Dataset: self-managed FIRE-SMOKE-25 dataset. mAP@0.5, Params: 84.9%, 6.9 M.
Table 3. D-Fire dataset details.

| Parameter | Value | Parameter | Value |
| --- | --- | --- | --- |
| Train sets: Test sets | 8:2 | Fire | 1164 |
| Train sets | 17,221 | Smoke | 5867 |
| Test sets | 4306 | Fire and Smoke | 4658 |
| Total sets | 21,527 | Without | 9838 |
Table 4. Training parameters.

| Parameter | Value |
| --- | --- |
| Input size | 640 × 640 |
| Initial learning rate | 0.01 |
| Weight decay | 0.0005 |
| Batch size | 8 |
| Iterations | 300 |
| Optimizer | SGD |
Table 5. Comparative experiments.

| Model | P/% | R/% | mAP@0.5/% | Params/M | GFLOPs | FPS |
| --- | --- | --- | --- | --- | --- | --- |
| YOLOv8n | 96.3 | 92 | 77.9 | 3.2 | 8.7 | 69 |
| YOLOv10n | 99.4 | 91 | 76.7 | 2.7 | 8.6 | 72 |
| YOLOv12 | 93.8 | 91 | 78.8 | 2.6 | 6.5 | 92 |
| YOLOv11 | 93.4 | 92 | 78.4 | 2.6 | 6.5 | 90 |
| Ref. [31] | 87.68 | 53.35 | 71.15 | - | - | - |
| Ref. [32] | - | - | 78.2 | 6.9 | 19.1 | - |
| Ref. [33] | 79.7 | 69.8 | 77.9 | - | 22.5 | - |
| YOLOv11-CHBG | 92.2 | 92 | 78.4 | 1.8 | 5.8 | 51 |
Table 6. Neck comparison experiment.

| | Neck | P/% | R/% | mAP@0.5/% | Params/M | GFLOPs |
| --- | --- | --- | --- | --- | --- | --- |
| Experiment 1 | YOLOv11 | 93.4 | 92 | 78.4 | 2.6 | 6.5 |
| Experiment 2 | BiFPN | 92.9 | 91 | 77.7 | 2.0 | 6.4 |
| Experiment 3 | AdaGLSA | 93.9 | 90 | 78 | 3.7 | 8.5 |
| Experiment 4 | BiAdaGLSA | 92.8 | 90 | 78.2 | 2.1 | 6.9 |
Table 7. Ablation experiment table.

| | C3K2-HFERB | BiAdaGLSA | SEAM | P/% | R/% | mAP@0.5/% | Params/M | GFLOPs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Experiment 1 | | | | 93.4 | 92 | 78.4 | 2.6 | 6.5 |
| Experiment 2 | √ | | | 93.5 | 92 | 78.3 | 2.5 | 6.1 |
| Experiment 3 | | √ | | 92.8 | 90 | 78.2 | 2.1 | 6.9 |
| Experiment 4 | | | √ | 92.8 | 93 | 78.7 | 2.5 | 6 |
| Experiment 5 | √ | √ | | 91.8 | 91 | 78.2 | 1.9 | 6.4 |
| Experiment 6 | √ | √ | √ | 92.2 | 92 | 78.4 | 1.8 | 5.8 |
Note: √ in the table indicates the use of the module.