A Lightweight Fire Detection Algorithm Based on the Improved YOLOv8 Model

Abstract: Fire detection is easily affected by environmental factors, and the accuracy of flame and smoke detection remains relatively low at the incipient stage of a fire. To address these issues, a fire detection algorithm named GCM-YOLO is proposed. First, GhostNet is introduced to optimize the backbone network, making the model lightweight without sacrificing accuracy. Second, the upsampling module is rebuilt with content-aware feature reassembly to improve the model's detail capture and information fusion. Finally, a mixed local channel attention mechanism is incorporated in the neck to strengthen the model's handling of complex scenes. Experimental results show that, compared with the baseline YOLOv8n model, GCM-YOLO increases the mAP@0.5 in fire detection by 1.2%, while the number of parameters and the model size decrease by 38.3% and 34.9%, respectively. GCM-YOLO thus raises fire detection accuracy while reducing the computational burden and is suitable for deployment in practical scenarios such as mobile terminals.


Introduction
Timely detection and intervention during the incipient stages of a fire are of paramount significance in restricting its propagation and minimizing casualties and property damage [1][2][3][4]. Typically, early fire detection relies on smoke sensors such as photoelectric and ionization smoke detectors [5,6]. Nevertheless, these sensors are susceptible to environmental conditions and demonstrate low monitoring efficiency, restricting their efficacy in detecting faint smoke or concealed flames in the initial phases of a fire.
Advancements in traditional image processing techniques have improved fire detection accuracy, particularly through the analysis of color, texture, and shape features [7]. For instance, Singh et al. [8] utilized the YCbCr color model to detect fire regions in video frames, distinguishing flame pixels from other high-intensity pixels using specific parameters of the YCbCr color space, and proposed an effective method for detecting flame regions. Xiong et al. [9] proposed a novel superpixel synthesis algorithm and enhanced existing horizon detection algorithms by employing support vector machines for superpixel classification. However, these methods often struggle with variations in lighting conditions and background interference [10], limiting their effectiveness in complex environments. In recent years, object detection methods have emerged as promising approaches for fire and smoke detection. Farasin [11] proposed a deep learning-based "Double-Step U-Net" that combines classification and regression algorithms to predict the damage/severity levels of sub-regions within affected areas from satellite images taken after a single fire incident. Bahhar et al. [12] presented an efficacious wildfire and smoke detection model integrating the YOLO architecture with a voting ensemble of convolutional neural network (CNN) architectures; the model operates in two stages for the classification and detection of fire and smoke. Kim et al. [13] proposed a domain-independent fire detection algorithm based on the YOLOv5 framework, integrating linear attention and a Gated Temporal Pool (GTP) to extract the spatial and temporal features of fires. Yang et al. [14] proposed a lightweight fire detection algorithm based on YOLOv5, incorporating Ghost modules and the CA attention mechanism, alongside feature fusion weight parameters derived from the Path Aggregation Network structure, to enhance detection speed.
Currently, fire detection algorithms often require additional network layers to enhance accuracy [15], which demands substantial computational resources and time. Moreover, their robustness and real-time performance in complex environments still need improvement. To address these problems, this paper proposes GCM-YOLO, a lightweight fire detection network based on an improved feature extraction network and attention mechanism. First, GhostNet is introduced to optimize the backbone network, reducing the number of parameters and the computational cost of the model without losing accuracy. Second, the CARAFE upsampling module is added to enhance feature information extraction, allowing the model to better adapt to targets of different scales. Finally, the mixed local channel attention mechanism is added at the neck to improve detection accuracy in complex environments.

Improved YOLOv8 Model
YOLOv8 (You Only Look Once version 8) is a real-time object detection model based on a deep convolutional neural network, released by Ultralytics. Renowned for its performance, versatility, and efficiency, the model incorporates several key architectural innovations. Its backbone employs the C2f structure, based on the CSPNet framework, which facilitates the extraction and prioritization of crucial information by mapping channel features to focal features. In the neck, YOLOv8 integrates the PANet structure, enabling the seamless transfer and aggregation of information through multiple pathways and effectively integrating global and local features. The detection head adopts an anchor-free methodology, predicting target location and size directly from the feature map without reliance on fixed anchor box scales and aspect ratios. Furthermore, the loss function incorporates the Task-Aligned Assigner positive sample allocation strategy along with Distribution Focal Loss [16], which effectively alleviates sample imbalance and category distribution disparities.
Current fire detection algorithms typically demand substantial computational resources to process complex image data and model structures, relying on high-performance devices such as GPUs. This paper therefore puts forward a lightweight fire detection network based on an improved YOLOv8 model, named GCM-YOLO. The structure of the improved model is shown in Figure 1.

GhostNet
Traditional convolutional modules use fixed-size kernels in each layer, leading to a fixed receptive field and a large number of parameters, which limits model efficiency. To reduce training time and parameter count while maintaining accuracy and improving performance on low-computation edge devices, this paper incorporates Ghost Convolution and the Ghost Bottleneck from GhostNet into YOLOv8 [17]. These modifications aim to boost computational efficiency and detection accuracy.
The core innovation of GhostNet is GhostConv. Traditional residual blocks produce feature maps containing many similar features, which need not all be obtained through full convolutions; such similar features can be generated by more efficient operations. GhostConv first compresses channels using standard convolutions to obtain intrinsic feature maps, requiring fewer convolution kernels than conventional convolutions and thereby reducing computational cost. It then generates Ghost feature maps from each intrinsic feature map through grouped convolutions. Finally, the feature maps from the standard convolutions and the grouped convolutions are concatenated to form the output. The structure of GhostConv is shown in Figure 2.
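As a rough illustration of why GhostConv saves parameters, the sketch below compares the parameter count of a standard convolution with a GhostConv-style layer that produces half of its output maps via cheap depthwise operations. The ratio s = 2 and the cheap-kernel size d = 3 are illustrative assumptions, not values taken from this paper:

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def ghost_conv_params(c_in, c_out, k, d=3, s=2):
    """Approximate parameters of a GhostConv-style layer.

    A primary convolution produces c_out / s intrinsic feature maps;
    cheap depthwise d x d operations generate the remaining ghost maps,
    and the two sets are concatenated to form the c_out output channels.
    """
    intrinsic = c_out // s                  # maps from the primary convolution
    primary = c_in * intrinsic * k * k      # standard convolution parameters
    cheap = intrinsic * (s - 1) * d * d     # one depthwise kernel per ghost map
    return primary + cheap

# Example: a 3x3 layer mapping 128 -> 256 channels
standard = conv_params(128, 256, 3)
ghost = ghost_conv_params(128, 256, 3)
print(standard, ghost, round(standard / ghost, 1))  # 294912 148608 2.0
```

With s = 2 the parameter count is roughly halved, which is the source of the lightweighting effect described above.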

The Ghost Bottleneck, constructed by stacking GhostConv layers in a ResNet-like fashion, replaces the C2f module in YOLOv8. It consists mainly of an expansion layer and a compression layer. The first Ghost module acts as the expansion layer: it generates some feature maps with standard convolution and then expands them into additional Ghost feature maps through linear transformations, increasing the feature dimensions. The second Ghost module reduces the feature dimensions. In addition, a downsampling shortcut path comprising depthwise separable convolutions and standard convolution layers is introduced, completing the Ghost Bottleneck structure, which is depicted in Figure 3.




The Content-Aware ReAssembly of Features Module
The structural diagram of the CARAFE (Content-Aware ReAssembly of Features) module is depicted in Figure 4. CARAFE addresses the deficiencies of traditional upsampling methods in preserving fine details and reconstructing semantic information. By reassembling low-resolution feature maps to generate high-resolution equivalents, it effectively mitigates these shortcomings. Its adaptive convolutional kernel mechanism enables flexible adaptation to various scenes and scales during feature reconstruction, keeping the model lightweight while enhancing its generalization capability.

To address challenges such as insufficient capture of fire and smoke details, a narrow perception range, and underutilization of fire and smoke information in complex backgrounds, CARAFE upsampling is introduced into the YOLOv8 model. This enhancement improves the upsampling module in the original feature pyramid structure, thereby strengthening the model's ability to extract fire and smoke features.
The CARAFE upsampling process comprises two modules: the upsampling kernel prediction module and the content-aware feature reassembly module. In the kernel prediction module, the channels of the input low-resolution feature map of size H × W are first compressed to C_m to reduce the model's parameter count. A convolutional layer with kernel size k_encoder × k_encoder then processes the compressed feature map, taking C_m input channels and producing σ²k_up² output channels, so that the predicted upsampling kernels have the size σH × σW × k_up². To ensure feature balance and stability during reassembly, the softmax function normalizes the weights of each upsampling kernel at every position, converting the original scores to non-negative values that sum to 1. In the feature reassembly module, a region of size k_up × k_up centered on the position in the input feature map corresponding to each target position is extracted; the predicted upsampling kernel at that point is then applied to the extracted region via a dot product. Mapping the results back yields an output feature map of size σH × σW × C.
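The reassembly step described above can be sketched as follows. This is an illustrative, unoptimized NumPy version that takes pre-predicted kernels as input (the kernel prediction network itself is omitted); it is not the implementation used in GCM-YOLO:

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def carafe(features, kernels, sigma, k_up):
    """Content-aware reassembly of features (illustrative sketch).

    features: (H, W, C) low-resolution feature map
    kernels:  (sigma*H, sigma*W, k_up*k_up) predicted upsampling kernels
    Returns an upsampled (sigma*H, sigma*W, C) feature map.
    """
    H, W, C = features.shape
    r = k_up // 2
    padded = np.pad(features, ((r, r), (r, r), (0, 0)), mode="edge")
    # softmax-normalize each position's kernel so its weights sum to 1
    kernels = softmax(kernels, axis=-1)
    out = np.zeros((sigma * H, sigma * W, C))
    for i in range(sigma * H):
        for j in range(sigma * W):
            src_i, src_j = i // sigma, j // sigma   # map back to the low-res grid
            region = padded[src_i:src_i + k_up, src_j:src_j + k_up, :]
            w = kernels[i, j].reshape(k_up, k_up, 1)
            out[i, j] = (region * w).sum(axis=(0, 1))  # dot-product reassembly
    return out
```

Because each kernel's weights sum to 1, a constant feature map is upsampled to the same constant, which is the stability property the softmax normalization provides.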

Mixed Local Channel Attention
To further improve detection accuracy in complex environments, the MLCA (Mixed Local Channel Attention) mechanism is incorporated in the neck of the network. MLCA combines channel attention and spatial attention with both local and global information through a multi-level feature fusion mechanism, allowing features at different scales to be integrated more effectively. By allocating attention resources to the most informative features, it enhances the weights of crucial flame and smoke features and suppresses irrelevant or noisy background features, without additional computational cost.
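As a rough intuition for mixing local and global channel information, the toy sketch below derives channel weights from both a global average-pooled descriptor and local average-pooled patches and blends them. All design choices here (sigmoid gating, equal 0.5 mixing, the patch size) are illustrative simplifications and do not reproduce the published MLCA module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mixed_local_channel_attention(x, patch=2):
    """Toy sketch of mixing local and global channel attention.

    x: (H, W, C) feature map with H and W divisible by `patch`.
    Channel weights come from a global average-pooled vector and from
    local average-pooled patches, then are averaged, so the attention
    reflects both global context and local detail.
    """
    H, W, C = x.shape
    global_desc = x.mean(axis=(0, 1))                     # (C,) global descriptor
    # local descriptors: average pool over patch x patch cells
    local = x.reshape(H // patch, patch, W // patch, patch, C).mean(axis=(1, 3))
    local_w = sigmoid(local)                              # (H/p, W/p, C) local weights
    global_w = sigmoid(global_desc)                       # (C,) global weights
    # upsample local weights back to (H, W, C) and mix with global weights
    local_up = np.repeat(np.repeat(local_w, patch, axis=0), patch, axis=1)
    weights = 0.5 * (local_up + global_w)                 # simple equal-weight mix
    return x * weights
```

Since the gated weights lie in (0, 1), the mechanism rescales rather than amplifies features, suppressing less informative responses.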

Materials
Due to the current absence of authoritative publicly available datasets on fire and smoke, this study collected on-site images of fire incidents to create its own experimental dataset. The images were sourced from publicly available datasets, internet images, and screenshots from internet videos. Previously unlabeled images were annotated with the LabelImg v1.8.1 object detection annotation tool, focusing on two labels: fire and smoke. To enrich the dataset, several data augmentation techniques were employed, including rotation, grayscale conversion, random scaling, and the addition of Gaussian noise. The final dataset comprises 8751 images of fire and smoke, captured in a wide range of environments such as indoor settings, forests, residential buildings, and low-light conditions. Selected examples from the dataset are shown in Figure 6.
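The augmentation pipeline described above (rotation, grayscale conversion, random scaling, Gaussian noise) could be sketched roughly as follows. The parameter ranges, such as the noise standard deviation and the scale-factor range, are illustrative assumptions rather than values reported in the paper:

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(img):
    """Illustrative versions of the augmentations used to enrich the dataset.

    img: (H, W, 3) uint8 RGB image. Returns a list of augmented copies.
    """
    out = []
    out.append(np.rot90(img, k=int(rng.integers(1, 4))))           # rotation
    gray = (img @ np.array([0.299, 0.587, 0.114])).astype(np.uint8)
    out.append(np.stack([gray] * 3, axis=-1))                      # grayscale conversion
    noisy = img.astype(np.float64) + rng.normal(0, 10, img.shape)  # Gaussian noise
    out.append(np.clip(noisy, 0, 255).astype(np.uint8))
    s = rng.uniform(0.5, 1.5)                                      # random scaling
    idx_h = (np.arange(int(img.shape[0] * s)) / s).astype(int)
    idx_w = (np.arange(int(img.shape[1] * s)) / s).astype(int)
    out.append(img[idx_h][:, idx_w])                               # nearest-neighbour resize
    return out
```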


Training Equipment and Parameter Setting
The experiment was conducted on a 64-bit Windows 10 (version 22H2) operating system, using an NVIDIA Quadro P6000 GPU (NVIDIA, Santa Clara, CA, USA) with 24 GB of VRAM. Python 3.11.5 was the programming language used, and GPU acceleration was achieved with CUDA v11.8. Training was performed with the PyTorch 2.1.1 deep learning framework. The dataset was divided into training, validation, and test sets in an 8:1:1 ratio. Key model parameters are listed in Table 1.
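An 8:1:1 split of the kind described can be sketched as follows; the file-name pattern is hypothetical:

```python
import random

def split_dataset(paths, seed=0):
    """Shuffle image paths and split them into train/val/test at an 8:1:1 ratio."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)      # deterministic shuffle for reproducibility
    n = len(paths)
    n_train, n_val = int(n * 0.8), int(n * 0.1)
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])        # remainder goes to the test set

# With the 8751 images reported for this dataset:
train, val, test = split_dataset([f"img_{i}.jpg" for i in range(8751)])
print(len(train), len(val), len(test))  # 7000 875 876
```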

Evaluation Indicators
In order to assess the detection performance of the GCM-YOLO model, two evaluation indexes, namely recall and mean average precision (mAP), are utilized.
The recall rate indicates the proportion of samples correctly recognized as positive among all samples that are actually positive; it evaluates the model's ability to recognize positive cases. The calculation formula is presented in Equation (1).

Recall = TP / (TP + FN) (1)

In the above formula, TP represents True Positives and FN represents False Negatives. Average precision (AP) is the area under the precision-recall curve (PR curve) and reflects the average precision of the model across different recall rates.

The precision-recall curve of each category is calculated from the model's prediction results, and the area under the curve gives the average precision of that category. These per-category values are then averaged to obtain the overall mAP, as shown in Equations (2) and (3):

AP = ∫₀¹ P(R) dR (2)

mAP = (1/N) Σᵢ₌₁ᴺ APᵢ (3)

where N is the number of categories.
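The three metrics can be computed as in the following sketch, which uses the trapezoidal rule for the area under the PR curve (a simplification relative to the interpolated AP computation used by typical detection toolkits):

```python
def recall(tp, fn):
    """Equation (1): proportion of actual positives that were detected."""
    return tp / (tp + fn)

def average_precision(precisions, recalls):
    """Equation (2): area under the precision-recall curve (trapezoidal rule)."""
    pts = sorted(zip(recalls, precisions))      # order points by increasing recall
    area = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area

def mean_average_precision(aps):
    """Equation (3): mean of the per-class average precisions."""
    return sum(aps) / len(aps)

# Hypothetical numbers, for illustration only
r = recall(tp=76, fn=24)
ap_fire = average_precision([1.0, 0.9, 0.8], [0.0, 0.5, 1.0])
print(r, round(ap_fire, 3), round(mean_average_precision([ap_fire, 0.8]), 3))
```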
The evaluation metrics also include measures such as the amount of model computations (FLOPs), the number of model parameters (parameters), and model size.The amount of computation refers to the floating-point operations required for the forward propagation process within a given model; it serves as an indicator for assessing computational complexity.The number of parameters denotes the total trainable parameters within a given model, while model size refers to the storage space required for storing said models, typically measured in megabytes (MB).
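A minimal sketch of how these quantities relate for a convolutional layer, assuming 32-bit floating-point weights (deployed models are often stored at lower precision, which is one reason reported model sizes can be smaller than an fp32 estimate):

```python
def layer_params(c_in, c_out, k, bias=False):
    """Trainable parameters of one k x k convolutional layer."""
    return c_in * c_out * k * k + (c_out if bias else 0)

def model_size_mb(num_params, bytes_per_param=4):
    """Model size in MB, assuming 4 bytes per parameter (fp32)."""
    return num_params * bytes_per_param / (1024 ** 2)

# Toy two-layer example (channel counts are illustrative)
p = layer_params(3, 16, 3) + layer_params(16, 32, 3)
print(p, round(model_size_mb(p), 3))  # 5040 0.019
```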

Comparison of Ablation Experiments
To systematically analyze and validate the influence of each constituent part of the model on its final performance, ablation experiments were conducted on a custom fire dataset. Each experiment maintained the same training environment and related training parameters. Refer to Table 2 for detailed results; "√" means that an improvement module has been added to the model, and "-" means that it has not.
As shown in Table 2, incorporating the GhostConv and Ghost Bottleneck structures from GhostNet into the YOLOv8n baseline model optimizes the feature extraction network, significantly reducing the number of parameters and the computational load. Compared to the baseline model, the parameters and FLOPs are reduced by 2.8 M and 2.5 G, respectively, while the mAP@0.5 (mean average precision at an IoU threshold of 0.5) decreases by only 0.1 percentage points. Adding the CARAFE upsampling module to the baseline model increases attention to global feature information, improving the mAP@0.5 by 1.0%. Model 3 introduces the MLCA attention mechanism into the baseline model, enhancing the network's perception of both local and global information, which boosts the mAP@0.5 by 0.3% without additional computational cost. Model 4, which combines the GhostNet and CARAFE modules, reduces the computational load while enhancing feature resampling effectiveness, achieving an mAP@0.5 of 82.3%; it shows a 38.3% reduction in parameters and a 34.6% reduction in FLOPs compared to the baseline model. Finally, the GCM-YOLO model integrates GhostNet, CARAFE, and MLCA, reaching an mAP@0.5 of 82.9% and a recall rate of 76.9% while keeping parameters and FLOPs low, demonstrating the best overall performance. The synergistic interaction of the three modules maximizes detection effectiveness while minimizing computational resource consumption.

Comparison Experiment
To validate the lightweight effectiveness of the GhostNet backbone, comparative experiments were conducted with the mainstream lightweight networks FasterNet and HGNetv2. The results are presented in Table 3. As shown in the table, introducing FasterNet as the backbone significantly reduces the number of parameters and FLOPs; however, this reduction comes at the cost of less effective feature extraction than the baseline model, resulting in a 0.5 percentage point decrease in mAP@0.5. Using HGNetv2 (Hierarchical Geometry Network version 2) as the backbone also reduces parameters and FLOPs relative to the baseline, but the lightweighting gains are less pronounced than with GhostNet. Incorporating GhostNet as the backbone markedly reduces both the parameter count and the computational complexity while incurring only a minimal loss in accuracy, demonstrating superior lightweight performance.
To further validate the effectiveness of the MLCA attention mechanism in fire detection, we conducted comparative experiments with other mainstream attention mechanisms on our custom fire dataset. The results are shown in Table 4. The YOLOv8n-MLCA model achieves the largest improvement in mean average precision (mAP) over the baseline model. Introducing the SEAttention (Squeeze-and-Excitation Attention) module into the baseline model improves the mAP@0.5 to 81.9%, but the recall rate decreases by 0.6%; SEAttention uses global average pooling to capture global features, which weakens the model's ability to capture detailed features. The CBAM (Convolutional Block Attention Module) includes a dual attention mechanism that computes both channel and spatial attention, resulting in a more complex processing flow. The SimAM (Simple Attention Module) simplifies the attention mechanism by combining linear transformation and dot product operations, avoiding the softmax operation typically used to compute attention weights; however, this simplification can lead to improper weight allocation when handling small targets. MLCA, on the other hand, employs a multi-level feature fusion mechanism, allowing more effective integration of features at different scales; compared to single-level attention mechanisms, it captures and utilizes feature information more comprehensively. The comparison results indicate that the MLCA module performs best in improving detection accuracy and recall, meeting the requirements of practical applications.
To further validate the performance of the GCM-YOLO model in fire detection, comparative analyses were conducted with different versions of the YOLO model on the custom dataset. The results are shown in Table 5. Both YOLOv8n and YOLOv8s achieve higher average precision than YOLOv5. Although YOLOv8s is slightly more accurate, its parameter count and model size are roughly three times those of YOLOv8n, limiting its suitability for real-time detection in resource-constrained environments. In contrast, the proposed GCM-YOLO model shows clear advantages across all metrics, achieving an mAP@0.5 of 82.9% and a recall rate of 76.9% while maintaining the lowest parameter count (1.9 M), FLOPs (5.3 G), and smallest model size (4.1 MB). These results indicate that the enhanced algorithm improves accuracy, detection efficiency, and compactness for fire detection, making it well suited to deployment on devices with limited computational resources.

Results and Analysis
Comparing the GCM-YOLO model with the YOLOv8 baseline network, there is an improvement of 1.2% in the mAP@0.5 and 1.1% in the recall rate.At the same time, the parameter count and model size are reduced by 38.3% and 34.9%, respectively, which indicates that the enhanced model effectively reduces resource consumption while maintaining high accuracy.
During the model training process, as shown in Figure 7a, the mAP@0.5 of GCM-YOLO remains stable and consistently higher than that of the baseline YOLOv8n model after 100 epochs, demonstrating superior stability and robustness during training. Figure 7b compares the loss function curves of the two models.
The GCM-YOLO model demonstrates robust detection performance across diverse environments such as indoor, road, and complex backgrounds, as illustrated in Figure 8. Across various scenarios, it shows an improved ability to recognize flames and smoke, underscoring its adaptability and effectiveness in different settings. Figure 8a highlights the model's improved smoke detection accuracy in situations where smoke blends with road-like backgrounds, as well as its ability to detect smaller flame targets. In Figure 8b, the improvements are evident in reduced false detections, particularly in dimly lit conditions. Figure 8c further showcases GCM-YOLO's enhanced detection accuracy for flames and smoke amidst complex backgrounds. As shown in Figure 8d, when the image contains artificial light sources such as lamps, the YOLOv8n model mistakenly identifies the light as flames, whereas the improved model significantly reduces the false detection rate under artificial light interference, enhancing overall robustness.


Conclusions
This study confronts the challenges of existing fire detection systems, which often lack robustness in complex environments and have difficulty detecting fires accurately at an early stage. Building on the YOLOv8 baseline model, we propose the GCM-YOLO lightweight fire detection algorithm. GCM-YOLO enhances the YOLOv8n backbone by integrating GhostNet, introducing the CARAFE upsampling module, and incorporating the MLCA attention mechanism to improve model precision. GhostNet is a lightweight neural network architecture: it reduces computational cost by employing a small number of actual convolutional kernels to generate intrinsic feature maps and then uses economical linear operations to generate additional feature maps, showing a clear advantage in model lightweighting over the FasterNet and HGNetv2 networks. The CARAFE upsampling module generates reassembly weights from the content of the input feature maps, allowing the upsampled feature maps to retain more detail and thereby improving detection accuracy. The MLCA attention mechanism integrates channel attention, spatial attention, and local and global information; it distributes attention resources more effectively, enhancing the weights of crucial features and suppressing irrelevant or noisy ones, and achieves the best accuracy among the compared SEAttention, CBAM, and SimAM attention mechanisms. Experimental results show that GCM-YOLO attains an mAP@0.5 of 82.9%, a 1.2% increase in average accuracy over YOLOv8n, while reducing the model parameters and FLOPs by 38.3% and 34.6%, respectively. With its improved efficiency and accuracy, GCM-YOLO is well suited to deployment in resource-constrained environments and has considerable practical application across diverse scenarios.
This study contributes new knowledge to decision-making in the supervision of security and fire protection systems, introducing feature extraction methods and attention mechanisms that are practically feasible in resource-constrained environments. Nevertheless, although the model performs well under resource constraints, real-time performance is a key metric in practical applications, and the model's inference speed on different hardware platforms needs further assessment and optimization. Future research should focus on the reliability and security of the developed algorithms and of the modernized security system as a whole; comprehensive reliability tests and security evaluations will ensure the robustness of GCM-YOLO in real-world applications.


Table 3. Results of comparison experiment with different lightweight backbone networks.

Table 4. Results of comparison experiment with different attention mechanisms.

Table 5. Results of comparison experiment with different networks.