An Improved Forest Smoke Detection Model Based on YOLOv8

Abstract: This study centers on leveraging smoke detection for preemptive forest fire warning. Owing to the inherent ambiguity and uncertainty in smoke characteristics, existing smoke detection algorithms suffer from reduced detection accuracy, elevated false alarm rates, and missed detections. To resolve these issues, this paper employs an efficient YOLOv8 network and integrates three novel detection modules for enhancement: the edge feature enhancement module, designed to identify ambiguous smoke features, alongside the multi-feature extraction module and the global feature enhancement module, targeting the detection of uncertain smoke features. These modifications improve the accuracy of smoke area identification while notably lowering the rates of false alarms and missed detections. Meanwhile, a large forest smoke dataset is created in this paper, which includes not only smoke images with normal forest backgrounds but also a considerable quantity of smoke images with complex backgrounds to enhance the algorithm's robustness. The proposed algorithm achieves an AP of 79.1%, 79.2%, and 93.8% on the self-made dataset, XJTU-RS, and USTC-RF, respectively. These results surpass those obtained by the current state-of-the-art improved smoke detection algorithms based on target detection and neural networks.


Introduction
Two primary means of early forest fire warning are fire detection and smoke detection. Compared with fire detection, smoke detection can ensure the safety of firefighters and minimize the loss of forest and firefighting resources to a greater extent. Smoke detection technology mainly includes smoke recognition, segmentation, detection, and concentration estimation. Smoke recognition cannot pinpoint the exact location of smoke, and smoke segmentation and smoke concentration estimation fall short of meeting the real-time demands of smoke detection. Therefore, this article uses smoke detection technology, which can quickly identify the exact whereabouts of smoke instances, has a wide working range, and meets the requirements for timely rescue and loss reduction in early forest fire warning. Existing smoke detection methods predominantly encompass manual detection, sensor-based detection, and machine vision detection; manual detection is costly and the sensor detection range is narrow, so this study employs the machine vision approach. Machine vision techniques for smoke detection can be classified into conventional manual feature selection and deep learning methodologies.
Conventional manual feature selection methods usually detect smoke by manual selection or threshold setting based on one or more smoke features such as frequency domain, color, shape, texture, and time. Toreyin et al. [1][2][3] presented a smoke and flame detection method utilizing frequency domain features. Besbes et al. [4] introduced a smoke detection approach utilizing color features. Gomes et al. [5] presented a smoke detection approach integrating various features including color, frequency domain, and a convolutional neural network. However, while demonstrating superior performance on datasets from CCTV and UAV, its partially synthetic dataset prevents direct evaluation for natural forest surveillance. Li et al. [39] introduced a wildfire smoke detection algorithm based on the 3D-PFCN network. Despite identifying early-stage smoke, the algorithm exhibited a high false alarm rate and limitations in detecting slow-moving smoke in complex environments. Cao et al. [40] proposed a smoke source prediction and detection method based on the EFFNet network. Despite incorporating a temporal module for enhanced spatiotemporal feature representation, the algorithm faced challenges detecting smoke in complex environments.
In machine vision-based smoke detection, some methods leverage 2D convolution to extract spatial features for smoke detection, while others utilize 3D convolution to capture spatiotemporal features. Furthermore, a hybrid network integrating CNN with a temporal modeling module can be employed for extracting spatiotemporal features in smoke detection. Lin et al. [41] proposed a smoke detection algorithm based on a Fast RCNN and 3D CNN network, achieving commendable network performance. Nevertheless, the algorithm comes with high hardware requirements and computational complexity and falls short of delivering satisfactory real-time results. Donahue et al. [42] introduced a visual recognition and description method employing a CNN + LSTM network. This method utilizes CNN to extract spatial features and LSTM to integrate temporal features of moving targets, making it applicable in smoke video recognition. Wang et al. [43] presented a novel framework for video action recognition based on the TSN network, providing an efficient and effective solution with potential application in smoke video recognition. Zhou et al. [44] proposed a framework based on the TRN-Multiscale network for capturing temporal relationships over multiple time scales, which holds promise for use in smoke video recognition. Lin et al. [45] introduced video understanding based on the TSM network, a consideration in smoke video recognition. However, forest surveillance cameras, typically positioned on watchtowers, observe forest fires from a distance, where smoke moves slowly. Consequently, CNN and hybrid networks struggle to extract evident movement information from smoke. This limitation gives rise to overfitting in existing smoke detection algorithms based on the spatial and temporal features of smoke. Therefore, this paper adopts a smoke detection algorithm centered on smoke spatial features.
To address the aforementioned challenges, this study proposes MGE-YOLOv8 based on an improved YOLOv8. The improvements involve integrating three new modules: multi-feature extraction, global feature enhancement, and edge feature enhancement. The multi-feature extraction module and edge feature enhancement module enhance the network's feature extraction abilities, boosting smoke detection accuracy and reducing false alarm rates. Additionally, the global feature enhancement module extracts structural smoke information globally, aiding the network in accurately identifying smoke areas and minimizing alarm omission. For effective model performance, neural networks require adequate real forest scene smoke images as datasets. However, available datasets like USTC-RF [46] and XJTU-RS [32] are limited. The USTC-RF dataset lacks semantic meaning and is unsuitable for real forest fire safety monitoring, while the XJTU-RS dataset, with only 6845 smoke images, lacks the generalizability and complexity needed for smoke detection in intricate forest backgrounds. To address this, this paper combines existing real forest smoke datasets and actual forest scene images provided by forest security companies to create a comprehensive dataset called RSF. This dataset comprises 15,373 forest fire smoke images, including those from both normal and complex scenes, serving as a practical resource for forest security monitoring.
In summary, the network architecture proposed in this paper makes the following four contributions: (1) To detect the uncertain features of smoke, this study introduces a multi-feature extraction module that combines three operations, standard convolution, deformable convolution, and involution, achieving adaptive spatial aggregation and adaptive weight assignment. This makes it possible to extract smoke features locally, globally, and variably and facilitates differentiation between smoke plumes and smoke-like objects, thereby enhancing smoke detection accuracy and minimizing false alarms. (2) To accurately identify the smoke region, this paper proposes a global feature enhancement module, which extracts the structural information of smoke from a global perspective, making the extraction of the smoke region more comprehensive and thus reducing missed detections. (3) To detect the ambiguity of smoke, this paper proposes an edge feature enhancement module, which reduces smoke edge noise and strengthens smoke feature extraction to enhance smoke detection accuracy and decrease false alarms. (4) Since a dataset's quality significantly impacts the efficacy of deep learning algorithms, this paper produces a large forest fire smoke dataset that contains smoke images not only in normal scenes but also with complex backgrounds; compared with existing forest fire smoke datasets, the proposed dataset has better practical value.

Datasets
This study employs three distinct datasets to showcase the superior performance of the introduced algorithm. As shown in Figure 1a, USTC-RF, developed by Zhang et al. [46], focuses on synthetic smoke in expansive forest settings. This dataset comprises 12,620 forest smoke images synthesized by extracting smoke plume features from 2800 genuine smoke images and embedding them randomly into forest background images. As shown in Figure 1b, XJTU-RS, introduced by Wang et al. [32], caters to real smoke scenarios within broader real-world applications. It was curated from two benchmark datasets, CVPR [47] and USTC [46]. Recognizing the limitations of existing datasets in simulating forest fire warnings due to their inadequacy in capturing smoke in forest environments, a novel dataset, RSF, is proposed in this study, as shown in Figure 1c. The RSF dataset amalgamates 13,675 forest scenario images, incorporating ambiguous and uncertain smoke characteristics from current public smoke datasets, and an additional 1698 images sourced from a security company. The detailed training configurations for the different datasets are shown in Table 1.
The smoke images of the above three datasets are shown in Figure 1.

The Proposed Network Architecture
The vague and uncertain nature of early smoke characteristics in forest fires poses challenges for existing smoke detection algorithms in accurately identifying and pinpointing smoke locations. This paper introduces a target detection network, MGE-YOLOv8, to address this issue, as illustrated in Figure 2.

Since YOLOv8 demonstrates superior performance in target detection, this paper extends its application to smoke detection. In addition, this paper adds a new MEM (multi-feature extraction module), GEM (global feature enhancement module), and EEM (edge feature enhancement module) to the YOLOv8 network based on the ambiguity and uncertainty characteristics of smoke. To detect the uncertain features of smoke, the MEM splices the local, non-rigid, and global features of smoke through the residual concatenation of standard convolution, deformable convolution, and involution, which enhances the multi-feature representation of smoke. To augment the global dependence of the high-level features, this paper proposes the GEM, which strengthens the structural information of the high-level features through a spatial self-attention mechanism and a channel self-attention mechanism connected serially. Finally, to detect the ambiguous features of smoke, this study introduces the EEM, which enhances the smoke boundary information by suppressing smoke edge noise through a median filter, while a different convolutional feature extraction process enhances the smoke feature representation capability. The introduction of these modules enables the smoke detection algorithm presented in this study to accurately identify a variety of changing smoke features and ambiguous smoke features, which improves the precision of smoke detection in the early stage of forest fires and reduces both the false alarm rate and missed alarms.

Multi-Feature Extraction Module
Smoke uncertainty characteristics mainly come from two aspects. First, smoke is a non-rigid object without a fixed geometric shape, so the morphology of smoke is uncertain, leading to the diminished precision of existing smoke detection algorithms. Second, many objects in the background of a smoke image are similar to smoke, such as clouds, light, fog, and haze, so the classification of smoke is uncertain, leading to a high rate of false positives in existing smoke detection algorithms. The primary cause of the low accuracy and elevated false alarm rate in existing smoke detection algorithms lies in the predominant use of various forms of standard convolution within neural networks for feature extraction. Standard convolution's fixed geometric mechanism fails to capture the deformable geometrical attributes inherent in non-rigid objects like smoke. Moreover, the weight-sharing property of standard convolution struggles to discern effectively between smoke plumes and smoke-like objects. Consequently, existing neural networks extract a single feature and cannot represent multiple features adequately. To address the aforementioned issues, this study proposes an MEM in which standard convolution, deformable convolution, and involution are spliced in parallel with a residual connection structure. The detailed network architecture is shown in Figure 2b, while an extensive architectural analysis is presented in Section 3.2.
The standard convolution (Conv), as the base operator of a neural network, has spatial invariance and channel specificity, i.e., the parameters of the convolution kernel are shared spatially and not shared across channels; the specific network structure is shown in Figure 2b. The standard convolution effectively captures local image features but lacks adaptability to varying visual patterns across spatial locations, thereby restricting the convolutional kernel's receptive field. Assuming a given input feature map X, the standard convolution output feature map Y_s can be computed using the following equation:

Y_s(p) = \sum_{k=1}^{K} w_s(p_k) \cdot X(p + p_k)

where K is the number of convolutional kernel parameters, p_k is the position corresponding to different convolutional kernel parameters, w_s denotes the learnable parameters of the standard convolution, and p is the position corresponding to each pixel point in the feature map. Unlike standard convolution, involution (IVC) [48] possesses spatial specificity and channel invariance characteristics. The network structure is depicted in Figure 2b. Involution can aggregate context over a broader spatial extent and can adaptively assign weights to different pixel points on the feature map, thus highlighting the most informative visual elements in the spatial domain and distinguishing smoke plumes from smoke-like objects well. The feature map Y_i output by involution can be computed using the subsequent equations:

Y_i(p) = \sum_{k=1}^{K} H(p)_k \cdot X(p + p_k), \qquad H(p) = W_1 \, \sigma\!\big(W_0 \, X(p)\big)

where W_0 \in R^{(C/r) \times C} and W_1 \in R^{(K \cdot G) \times (C/r)} are both linear transformation matrices, C is the number of input channels, G \ll C is the number of convolution kernels shared by all the channels, and r is the channel reduction ratio. The two linear matrix transformations can adaptively compute the weight parameters at different locations of the convolution kernel according to different inputs.
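The spatial specificity of involution can be illustrated with a minimal numpy sketch (a single group, ReLU standing in for the nonlinearity σ, and plain matrices standing in for the learned transforms W_0 and W_1; the actual module in [48] adds normalization and operates on batched tensors):

```python
import numpy as np

def involution_2d(x, w0, w1, k=3):
    """Toy single-group involution on a C x H x W feature map.

    At each spatial position p, a k*k kernel is generated from the feature
    vector at p via two linear maps (w0 reduces channels, w1 expands to k*k),
    and the same kernel is then applied across all channels -- spatial
    specificity, channel invariance (the opposite of standard convolution).
    """
    c, h, w = x.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    y = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            feat = x[:, i, j]                       # C-dim vector at position p
            kernel = w1 @ np.maximum(w0 @ feat, 0)  # adaptive k*k weights H(p)
            patch = xp[:, i:i + k, j:j + k]         # C x k x k neighborhood
            y[:, i, j] = (patch * kernel.reshape(1, k, k)).sum(axis=(1, 2))
    return y
```

Because the kernel at each position is generated from that position's own feature vector, the aggregation weights vary spatially while being shared across channels.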
Deformable convolution (DCN) [49] can adaptively adjust the sampling offsets and modulation scalars according to the input data to achieve adaptive spatial aggregation; the detailed network structure is illustrated in Figure 2b. DCN can capture the non-rigid features of the image well, but the region covered by deformable convolution is sometimes larger than the region where the target is located and is thus prone to erroneous detection results. The feature map Y_d output by deformable convolution can be computed using the subsequent equation:

Y_d(p) = \sum_{k=1}^{K} w_d(p_k) \cdot X(p + p_k + \Delta p_k) \cdot \Delta m_k

where \Delta p_k and \Delta m_k are obtained by convolution computation on the original input feature map. This convolution layer maintains the same spatial resolution as the input feature map, and its output has 3K channels, of which 2K channels correspond to the x-axis and y-axis offsets of each convolution kernel parameter and the last K channels correspond to the modulation scalars obtained through a sigmoid layer for each convolution kernel parameter; w_d denotes the learnable parameters of the deformable convolution. Since \Delta p_k is usually fractional, the pixel value of X(p + p_k + \Delta p_k) is computed by bilinear interpolation:

X(p) = \sum_{q} G(q, p) \cdot X(q)

where q enumerates the spatial positions in the input feature map involved in the computation, p is the fractional position, and G(\cdot,\cdot) is the bilinear interpolation kernel.
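The bilinear interpolation step can be sketched directly from the formula above: the kernel G(q, p) factorizes into two one-dimensional hat functions, so only the four integer neighbors of a fractional position receive nonzero weight.

```python
import numpy as np

def bilinear_sample(x, py, px):
    """Sample a feature map x (H x W) at a fractional position (py, px).

    Implements x(p) = sum_q G(q, p) * x(q), with
    G(q, p) = g(qy, py) * g(qx, px) and g(a, b) = max(0, 1 - |a - b|),
    so only the 4 integer neighbors of p contribute.
    """
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    val = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < h and 0 <= qx < w:
                g = max(0.0, 1 - abs(qy - py)) * max(0.0, 1 - abs(qx - px))
                val += g * x[qy, qx]
    return val
```

At integer positions the hat functions collapse to a single weight of 1, so the exact pixel value is recovered; at half-integer positions the four neighbors are averaged.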
The standard convolution within the DCN and IVC modules is just a simple 1 × 1 convolution used to align the dimensions of parameters. For a given input feature map X, the output feature map f_out obtained by the MEM splices the three branch outputs through a residual connection:

f_{out} = X + \mathrm{Concat}(Y_s, Y_d, Y_i)

where the number of Bottleneck block repetitions follows the configuration shown in Figure 2b.
The MEM proposed in this paper combines the advantages of various convolutions, extracting the local and global features of the image well and achieving adaptive spatial aggregation and adaptive weight assignment.
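As a rough illustration of the parallel splice with a residual connection (the branch internals and the 1 × 1 projection are placeholders, not the paper's exact layers):

```python
import numpy as np

def mem_block(x, branches):
    """Sketch of the MEM splice on a C x H x W map: run the feature-extraction
    branches (standing in for Conv, DCN, and involution) in parallel,
    concatenate along channels, project back to C channels, and add a
    residual connection."""
    outs = [b(x) for b in branches]        # parallel multi-feature extraction
    fused = np.concatenate(outs, axis=0)   # channel-wise splice
    c = x.shape[0]
    # stand-in for the 1x1 projection that restores the channel count
    proj = fused.reshape(len(branches), c, *x.shape[1:]).mean(axis=0)
    return x + proj                        # residual connection
```

With identity branches the block reduces to doubling the input, which makes the residual path easy to verify in isolation.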

Global Feature Enhancement Module
As the number of neural network layers increases, the high-level feature map offers rich semantic information beneficial for accurate classification but lacks the structural information essential for precise localization in the regression network. Therefore, this paper proposes a GEM to enhance the structural information of the high-level feature map; the specific network structure is depicted in Figure 2a. GEM mines the structural information of the smoke in the feature map from a global perspective for better feature learning. Specifically, for each feature location in the high-level feature map, in order to densely capture global structural information and local appearance information, this paper splices together the pairwise correlations of each feature point with all feature locations as well as the feature itself and learns attention along the spatial and channel dimensions [50], thus not only achieving semantic enhancement of clustering-like information but also augmenting the structural information of the target.
In this study, we first introduce the channel attention calculation. Assume that the given input feature map f_in outputs X \in R^{C \times H \times W} after convolutional computation, where C, H, and W correspond to the channels, height, and width of the input feature map. Each channel of X is a W × H-dimensional vector, and all the channels are stacked into a graph G_c with a total of C nodes, each node denoted as x_i. The correlation between two nodes i and j is denoted as r^c_{i,j} and is computed using the dot product of the affinity matrix. To maintain the structural details of the extracted features, this study adopts the bi-directional pairwise correlations of each node to represent its correlation vector:

r^c_{i,j} = \phi_c(x_i)^{T} \, \delta_c(x_j), \qquad r_i = [r^c_{i,1}, \dots, r^c_{i,C};\ r^c_{1,i}, \dots, r^c_{C,i}]

where \phi_c and \delta_c are both modular functions consisting of 1 × 1 spatial convolution, batch normalization, and an activation function, and r_i \in R^{2C}. To make the proposed channel attention focus not only on the correlation vector of each feature point but also on the feature itself, this paper splices the input feature map with its correlation vectors and finally adopts the sigmoid function to calculate the channel attention a_c:

a_c = \mathrm{Sigmoid}\big(\beta_c([\mathrm{GAP}_s(X);\ \alpha_c(r)])\big)

where \alpha_c and \beta_c are both modular functions consisting of 1 × 1 spatial convolution, batch normalization, and activation functions, and GAP_s is global average pooling along the spatial dimension. The computation of spatial attention is similar to that of channel attention and operates on the output feature map X^* \in R^{C \times H \times W} computed by multiplying a_c and X. Each spatial pixel has a C-dimensional vector, and all the pixels in space are stacked into a graph G_s with a total of A = W × H nodes. The correlation between every two nodes i and j is denoted as r^s_{i,j}, and the bi-directional pairwise correlation of each node represents its correlation vector:

r^s_{i,j} = \phi_s(x_i)^{T} \, \delta_s(x_j), \qquad r_i = [r^s_{i,1}, \dots, r^s_{i,A};\ r^s_{1,i}, \dots, r^s_{A,i}]

where \phi_s and \delta_s are both modular functions consisting of 1 × 1 spatial convolution, batch normalization, and activation functions, and r_i \in R^{2A}. Similar to the channel attention, the spatial attention is computed as follows:

a_s = \mathrm{Sigmoid}\big(\beta_s([\mathrm{GAP}_c(X^*);\ \alpha_s(r)])\big)

where \alpha_s and \beta_s are both modular functions consisting of 1 × 1 spatial convolution, batch normalization, and activation functions, and GAP_c is global average pooling along the channel dimension.
Combining the above formulas, for the input feature map f_in, the output f_out computed by the GEM is:

f_{out} = a_s \otimes (a_c \otimes X)

where \otimes denotes element-wise multiplication with broadcasting. The GEM introduced in this study enhances the network's attention towards the structural information of smoke targets. Consequently, the entire neural network identifies smoke areas more precisely, thereby reducing instances of missed detection.
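The serial channel-then-spatial gating can be sketched as follows; this toy version keeps only the pooling-and-sigmoid skeleton and omits the pairwise correlation vectors and the learned 1 × 1 convolutions described above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gem_gate(x):
    """Minimal serial gating on a C x H x W map: a channel attention vector
    (from spatial average pooling, GAP_s) rescales channels, then a spatial
    attention map (from channel average pooling, GAP_c, on the refined
    features X*) rescales positions."""
    a_c = sigmoid(x.mean(axis=(1, 2)))   # GAP_s -> channel attention, shape (C,)
    x_star = x * a_c[:, None, None]      # channel-refined features X*
    a_s = sigmoid(x_star.mean(axis=0))   # GAP_c -> spatial attention, shape (H, W)
    return x_star * a_s[None, :, :]
```

The output keeps the input shape; every position is scaled by a value in (0, 1) derived jointly from its channel and spatial context.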

Edge Feature Enhancement Module
The ambiguous features of smoke mainly come from the interference of noise and bad weather. Noise, heavy fog, bright light, and other conditions can damage the edges of the smoke image, thus affecting the classification and localization results for the smoke region. To attenuate image edge noise and enhance the discrimination between smoke plumes and smoke-like objects, this paper proposes an EEM, which mainly consists of a median filter [51] and an enhanced convolution, as shown in Figure 2a. The median filter is highly responsive to edge information within the image, effectively eliminating noise while preserving edge integrity. The enhanced convolution consists of standard convolution, deformable convolution, and involution connected in parallel, which improves the extraction of smoke features and thus helps distinguish between smoke plumes and smoke-like objects. For the input feature map f_eem, the output feature map y_eem computed by the EEM is:

y_{eem} = \gamma_2(\gamma_1(f_{eem}))

where \gamma_1 represents the combined operation of a median filter of size 3 × 3 and an enhanced convolution, and \gamma_2 represents the combined operation of a median filter of size 5 × 5 and an enhanced convolution.
The median filter suggested in this study can effectively remove the noise in the feature mapping without significantly degrading the clarity of the image, and the enhanced convolution enriches the edge information of the smoke image and enhances the distinction between smoke plumes and smoke-like objects.
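A plain k × k median filter, as used inside the EEM, can be written in a few lines of numpy:

```python
import numpy as np

def median_filter(img, k=3):
    """Apply a k x k median filter with edge replication: each output pixel
    is the median of its k x k neighborhood, which removes impulse
    (salt-and-pepper) noise while preserving edges better than a mean
    filter would."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(p[i:i + k, j:j + k])
    return out
```

A single bright spike is replaced by its neighborhood median, while pixels away from a step edge keep their value, which is why median filtering suppresses noise without blurring smoke boundaries.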

Results
This section initially outlines the experimental details. Subsequently, to showcase the superior performance of the algorithm introduced in this study, the following analyses are conducted: architectural analysis of the MEM, ablation experiments, comparisons with target detection-based improved smoke detection algorithms and neural network-based improved smoke detection algorithms, visualization analysis, and real application analysis.

Experimental Details
In this study, the experimental platform consisted of a personal desktop computer running the Ubuntu operating system. The hardware configuration included an AMD Ryzen 9 5900X 12-core processor and an NVIDIA GeForce RTX 3090 GPU, with the PyTorch framework. Image inputs were uniformly resized to 640 × 640. Data augmentation primarily employed Mosaic and fliplr strategies. The activation function used was the Sigmoid-weighted Linear Unit (SiLU). The batch size was set to 8, and the model was trained using stochastic gradient descent (SGD) with a momentum of 0.937. The initial learning rate was 0.01, and the learning rate decay coefficient was 0.0005. Since many similar objects always interfere with smoke images in forest scenes, such as fog, clouds, light changes, and white roofs or forest paths, the false alarm rate (FAR) is a vital evaluation index for smoke detection algorithms. Therefore, besides the typical average precision AP (IoU = 0.50:0.95), AP_50 (IoU = 0.50), and average recall AR, this paper also uses the false alarm rate FAR as an evaluation index.
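For concreteness, a toy computation of the evaluation indices from detection counts (the FAR formula here, FP / (TP + FP), is an assumed definition, since the section does not restate it):

```python
def detection_metrics(tp, fp, fn):
    """Compute precision, recall, and false alarm rate from true positive,
    false positive, and false negative detection counts.

    FAR is taken here as FP / (TP + FP), i.e. the fraction of reported
    smoke detections that are false alarms -- an assumed definition.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    far = fp / (tp + fp) if tp + fp else 0.0
    return precision, recall, far
```

AP then summarizes precision over the recall range at the stated IoU thresholds; the counts above are the per-threshold building blocks.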

Architectural Analysis of MEM
Currently, the mainstream feature extraction operators include standard convolution, deformable convolution, and involution. To better extract smoke features, a total of six fusion strategies are proposed for multi-feature extraction; the specific network architectures are illustrated in Figure 3, and the performance comparison of the various strategies is presented in Table 2.
The six fusion strategies proposed in this paper are the Conv module only; the DCN module only; the IVC module only; the combination of the Conv and DCN modules; the combination of the Conv and IVC modules; and the combination of the Conv, DCN, and IVC modules, where Conv is a combination of standard convolution, batch normalization, and an activation function, DCN is a combination of standard convolution, deformable convolution, batch normalization, and an activation function, and IVC is a combination of standard convolution, involution, batch normalization, and an activation function. The standard convolution within the DCN and IVC modules is just a simple 1 × 1 convolution used to align the dimensions of parameters.

From the data analysis in Table 2, it is evident that compared with the baseline network, using only the DCN module improves the AP of the network by 2.1%, while using the combination of the Conv and DCN modules enhances the AP by 3.5%. The combined module performs 1.4% better than the separate module alone, which shows that the DCN operation can significantly enhance the accuracy of the network and that combining multiple modules can further improve precision. This is mainly because the introduction of deformable convolution makes the network more accurate in extracting the non-rigid features of smoke, and the extraction of multiple features significantly enhances the feature extraction ability of the network, thus improving detection accuracy. Similarly, Table 2 shows that compared with the baseline network, using only the IVC module reduces the FAR of the network by 1.3%, while using the combination of the Conv and IVC modules reduces the FAR by 1.6%. The combined module performs 0.3% better than the individual module alone, which shows that the IVC operation can notably decrease the false alarm rate of the network and that combining multiple modules reduces it further. This is mainly because the IVC module can extract global features over a wider range and adaptively assign weights to each pixel, which helps the network differentiate between smoke plumes and smoke-like objects, thus decreasing the false alarm rate. Finally, the combination of the three modules achieves the best performance in all the metrics, so the MEM introduced in this paper adopts the combination of the three modules.

Ablation Experiments
Compared to the baseline algorithm, the advantages of this paper's algorithm mainly come from the improvements brought by the MEM, GEM, and EEM. Table 3 shows the ablation analysis of the proposed MGE-YOLOv8 on the RSF dataset with different module combinations. Considering that early forest fire warning imposes strict time requirements, Table 3 also includes a comparative analysis of inference time and GFLOPs.

MEM
Experiments 2 to 4 show that adding any single module to the baseline network yields a significant increase in average precision together with a substantial reduction in the false alarm rate. Taking MEM as the reference, comparing experiments 3 and 5, 4 and 7, and 6 and 8 shows that incorporating the MEM module increases the AP of the algorithm by 2.8%, 2%, and 2.3%, respectively, while decreasing the FAR by 1.3%, 0.9%, and 0.8%, respectively. These comparisons verify that the MEM module combines the advantages of several convolutions to extract the local, non-rigid, and global features of the feature map, thus improving the detection average precision and recall while decreasing the false alarm rate. The gains in average precision and recall come mainly from the standard and deformable convolution operations, while the reduction in the false alarm rate comes primarily from the involution operation.

GEM
Taking GEM as the reference, comparing experiments 2 and 5, 4 and 6, and 7 and 8 shows that incorporating the GEM module increases the AP of the algorithm by 1.4%, 1.2%, and 1.5%, respectively, while decreasing the FAR by 0.2%, 0.3%, and 0.2%, respectively. These comparisons verify that the GEM module extracts the structural information of the feature map from a global perspective and enriches the semantic and spatial information of the high-level features, thereby significantly improving the algorithm's average precision and recall, slightly reducing its false alarm rate, and markedly reducing missed alarms, as described in Sections 3.6 and 3.7.
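GEM's exact architecture is given in Figure 2; purely as an illustration of the idea of enriching high-level features with global structural information, here is a generic global-context attention block (not the authors' module) in which a single softmax attention map pools the whole feature map into a global descriptor that is added back to every position:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def global_context(x, w_att, w_val):
    """Generic global-context block: every position contributes to one
    global descriptor via a softmax spatial attention map, and the
    transformed descriptor is broadcast back as a residual.
    x: (C, H, W); w_att: (1, C); w_val: (C, C)."""
    C, H, W = x.shape
    flat = x.reshape(C, H * W)
    att = softmax(w_att @ flat)               # (1, H*W) spatial attention
    context = flat @ att.ravel()              # (C,) global descriptor
    return x + (w_val @ context)[:, None, None]  # broadcast residual
```

Because the descriptor summarizes the entire map, every output position carries some global structural information, which is the property the text attributes to GEM.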

EEM
Taking EEM as the reference, comparing experiments 3 and 6, 2 and 7, and 5 and 8 shows that incorporating the EEM module increases the AP of the algorithm by 0.8%, 0.2%, and 0.3%, respectively, while decreasing the FAR by 1.0%, 0.5%, and 0.5%, respectively. These comparisons verify that the EEM module improves the extraction of the fuzzy features of smoke, which helps differentiate smoke plumes from smoke-like objects, thus significantly reducing the false alarm rate while slightly improving the average precision and recall of the algorithm.
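The pairwise experiment comparisons in this and the two preceding subsections can be reproduced mechanically. The per-configuration AP values below are reconstructed from the deltas quoted in the text, taking the baseline as 74.9 (the final 79.1 minus the 4.2-point overall gain reported in the Conclusions) and assuming experiments 2 to 4 are the single-module configurations; they are a consistency check on the quoted numbers, not Table 3 itself.

```python
# AP per ablation configuration, reconstructed from the deltas in the text
# (assumed experiment numbering: 1 = baseline, 2-4 = single modules,
# 5-7 = pairs, 8 = all three modules).
ap = {
    frozenset(): 74.9,                       # experiment 1: baseline
    frozenset({"MEM"}): 77.4,                # experiment 2
    frozenset({"GEM"}): 76.0,                # experiment 3
    frozenset({"EEM"}): 75.6,                # experiment 4
    frozenset({"MEM", "GEM"}): 78.8,         # experiment 5
    frozenset({"GEM", "EEM"}): 76.8,         # experiment 6
    frozenset({"MEM", "EEM"}): 77.6,         # experiment 7
    frozenset({"MEM", "GEM", "EEM"}): 79.1,  # experiment 8
}

def module_gains(ap, module):
    """AP gain from adding `module` to every configuration lacking it."""
    return {tuple(sorted(cfg)): round(ap[cfg | {module}] - ap[cfg], 2)
            for cfg in ap if module not in cfg}
```

For example, `module_gains(ap, "MEM")` reproduces the 2.8, 2.0, and 2.3-point gains quoted for the MEM comparisons above.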

Evaluation Time
Because early forest fire warning has strict time requirements, this paper takes inference time as an evaluation index. The inference times of the baseline and the proposed algorithm are 4.51 ms and 4.86 ms, respectively. Although the proposed algorithm's inference time increases slightly over the baseline, its processing speed is still almost four times 25 fps, which comfortably achieves real-time operation and buys firefighters more time to contain losses. In conclusion, the MGE-YOLOv8 proposed in this paper substantially enhances the average precision and recall of the algorithm while meeting real-time requirements, and it significantly reduces the false alarm rate and the occurrence of missed alarms, thus improving the efficiency of security personnel.
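The real-time margin can be checked with two lines of arithmetic; note that the "almost four times 25 fps" figure matches the end-to-end 87.682 FPS reported in the Conclusions, whereas the 4.86 ms inference time alone would imply roughly 205 images per second.

```python
# Throughput implied by the timings reported in the paper.
latency_ms = 4.86          # per-image inference time of MGE-YOLOv8
end_to_end_fps = 87.682    # overall FPS reported in the Conclusions
realtime_fps = 25.0        # common video frame rate used as the bar

raw_fps = 1000.0 / latency_ms            # throughput from inference alone
margin = end_to_end_fps / realtime_fps   # multiple of the real-time bar
```

The gap between the raw and end-to-end figures presumably reflects pre/post-processing such as NMS, which the paper does not break out separately.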

Comparisons with Target Detection-Based Improved Smoke Detection Algorithms
In this section, we compare the introduced algorithm with 12 state-of-the-art target detection-based improved smoke detection algorithms on the RSF, XJTU-RS, and USTC-RF datasets, as presented in Table 4. As the XJTU-RS and USTC-RF datasets feature single backgrounds with minimal interference from similar objects, this paper does not include comparative analyses of false alarm rates for these datasets. The smoke images selected for the self-made dataset possess uncertainty and ambiguity, leading to lower performance indices for the algorithm proposed herein; conversely, the algorithm achieves higher performance indices on the USTC-RF dataset. The EfficientDet model is simple, but its average precision is low, which is fatal for a forest fire warning task with potentially devastating losses. The YOLOX algorithm offers good real-time performance but has a high false alarm rate, and frequent false alarms turn a smoke detection system into a nuisance with no practical application value. Compared with the other smoke detection algorithms, the SASC-YOLOX algorithm achieves higher detection accuracy, faster processing speed, and a lower false alarm rate on the RSF dataset; still, compared with the MGE-YOLOv8 algorithm introduced in this paper, its AP50 is 2.4% lower and its false alarm rate is 0.9% higher, mainly because the proposed algorithm improves the network's ability to extract features. The GEM attention module presented in this paper performs significantly better than CBAM [34]; although its processing speed is relatively slow, it is still almost four times the real-time frame rate and thus provides good real-time processing. In summary, the algorithm introduced in this study delivers good network performance, and the MEM, GEM, and EEM modules are simple; they can be inserted into any network architecture with good generalization.

Comparisons with Neural Network-Based Improved Smoke Detection Algorithms
To comprehensively analyze the superior performance of the introduced algorithm, this study compared six state-of-the-art neural network-based improved smoke detection algorithms with the proposed algorithm on the RSF, XJTU-RS, and USTC-RF datasets. The experimental comparison results are presented in Table 5; its rows, in the original column order (the last column was not reported), are:

[28]             0.876   0.99    0.896   -
W-Net [29]       0.77    0.986   0.807   -
STCNet [30]      0.709   0.979   0.755   -
MVMNet [31]      0.888   0.99    0.907   -
SASC-YOLOX [32]  0.921   0.99    0.939   -
MGE-YOLOv8       0.938   0.99    0.941   -

In Table 5, DCNN, Deep CNN, W-Net, and STCNet employ deep neural networks as the foundational framework for smoke detection. These methods exhibit comparable performance, with lower average precision and higher false alarm rates. Conversely, MVMNet and SASC-YOLOX utilize the one-stage YOLO network architecture for target detection, resulting in superior network performance compared to the deep neural networks. Hence, this paper adopts the latest YOLOv8 network architecture as the foundational framework, demonstrating the best performance across all evaluation metrics. Furthermore, except for the algorithm proposed in this paper, all other methods rely on standard convolution for smoke detection. This limits the network to extracting a single type of feature and inadequately represents the non-rigid features of smoke. Consequently, detection accuracy for smoke characteristics is reduced, and the false alarm rate rises because smoke plumes cannot be effectively differentiated from smoke-like objects.
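A short script makes the "best across all evaluation metrics" claim checkable from the rows embedded in Table 5; the method name of the first row and the column headers did not survive into this text, so the tuples below simply preserve the original column order.

```python
# Rows recovered from Table 5 (three metric columns, original order).
table5 = {
    "[28]":            (0.876, 0.990, 0.896),
    "W-Net [29]":      (0.770, 0.986, 0.807),
    "STCNet [30]":     (0.709, 0.979, 0.755),
    "MVMNet [31]":     (0.888, 0.990, 0.907),
    "SASC-YOLOX [32]": (0.921, 0.990, 0.939),
    "MGE-YOLOv8":      (0.938, 0.990, 0.941),
}

def best_per_column(table):
    """List, for each metric column, the method(s) achieving its maximum."""
    cols = list(zip(*table.values()))
    return [[m for m, v in table.items() if v[i] == max(col)]
            for i, col in enumerate(cols)]
```

MGE-YOLOv8 appears in the winning set of every column; it is the sole winner of the first and third columns, and it ties several methods on the second, where 0.99 is shared.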

Visualization Analysis
To effectively demonstrate the efficacy of the algorithm introduced in this paper, a visualization analysis is conducted on different datasets. Specifically, the model proposed in this study is trained and subsequently tested on the RSF, XJTU-RS, and USTC-RF datasets. Figure 4 presents the visual analysis proposed in this paper, encompassing bounding box visualization and Grad-CAM [52] visualization. The Grad-CAM visualization is primarily employed on the RSF dataset, offering a more intuitive judgment of how the introduced algorithm improves on the baseline in mitigating the false alarm rate. On the other hand, bounding box visualization is predominantly used on the XJTU-RS and USTC-RF datasets, facilitating a more intuitive assessment of the improvement in confidence values brought about by the proposed algorithm compared to the baseline.
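Grad-CAM [52] itself reduces to a short computation, sketched below in its standard formulation (channel weights from spatially averaged gradients, then a ReLU over the weighted sum of activation maps); this is the published method, not something specific to MGE-YOLOv8.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Core Grad-CAM step. activations, gradients: (C, H, W) arrays from
    a chosen convolutional layer; returns a [0, 1] heatmap of shape (H, W)."""
    weights = gradients.mean(axis=(1, 2))   # (C,) spatially averaged grads
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    if cam.max() > 0:
        cam /= cam.max()                    # rescale to [0, 1] for display
    return cam
```

Overlaying this heatmap on the input image is what produces the highlighted smoke regions compared in the figures.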
In Figure 5a, the RSF dataset displays smoke images with various interferences: the first column involves light interference, the second and fifth columns exhibit similar-color interference (pavement, roof), and the third and fourth columns showcase diverse cloud interference. The proposed algorithm accurately identifies smoke regions, excluding light, similar-color, and cloud interference from the highlighted boxes. In contrast, the baseline method incorrectly categorizes smoke-like objects as smoke, resulting in a higher false alarm rate. This discrepancy is primarily due to the MEM and EEM proposed in this paper, which enable adaptive spatial aggregation and weight assignment; consequently, these modules enhance smoke edge feature extraction, significantly improving accuracy and reducing false alarms. Moreover, the sixth to eighth column images depict typical forest smoke images. The Grad-CAM visualization highlights that the proposed algorithm identifies smoke regions more accurately than the baseline. This enhancement is attributed to the GEM proposed in this paper, which effectively captures global structural smoke information, thus mitigating missed alarms.

This study employed bounding box visualization for comparison on the XJTU-RS and USTC-RF datasets, which are characterized by a single background without interference from similar smoke-like objects. In Figure 5b, considerable enhancement is observed in the confidence level of the bounding box detection on the XJTU-RS dataset. This improvement is primarily attributed to the multi-feature enhancement module introduced in this study, which significantly enhances the detection precision. The proposed algorithm accurately identifies the smoke region in the fifth column image, primarily because of the GEM's superior ability to capture global structural smoke detail. In Figure 5c, a slight elevation in the confidence level for bounding box detection is noticed on the USTC-RF dataset. This marginal improvement stems from the dataset's straightforward semantic information and uniform smoke shapes, which allow various network models to perform well.

Real Applications Analysis
In order to verify the practicality of the algorithm introduced in this study, a set of smoke images was downloaded from the Internet, and both the baseline network and the network presented in this paper were used to perform target detection on them. Because there are no labels for these downloaded smoke images, this paper only carries out bounding box visualization on this set of data, as shown in Figure 5.
In Figure 5a, the presented algorithm accurately identifies the smoke region and effectively excludes interference from clouds and similar-colored objects, thus notably reducing the false alarm rate in practical application scenarios. Figure 5b demonstrates the algorithm's comprehensive smoke region identification, significantly mitigating missed alarms in real-world applications. Additionally, Figure 5c highlights the clear advantage of the proposed algorithm's detection-frame confidence over the baseline algorithm in various scenarios, affirming the reliability of the presented algorithm's detection accuracy in practical applications.


Discussion
Forest fires present a significant threat to both natural resources and human life. Accurate and prompt forest fire detection is paramount for minimizing the associated losses. This paper suggests employing smoke detection as a method for early forest fire warning, recognizing that smoke often serves as the initial indicator of a fire and is observable from a distance. Nevertheless, existing smoke detection methods grapple with two primary challenges: low accuracy and a high false alarm rate. Low accuracy results in missed alarms, posing a potentially fatal risk for early forest fire warning. Conversely, a high false alarm rate generates nuisance alarms that provide little value, particularly when interfering targets are present. Addressing these challenges is crucial for enhancing the effectiveness of early forest fire warning systems.
The low accuracy and high false alarm rate of current smoke detection methods primarily stem from the feature extraction network's inaccurate extraction of discriminative smoke features, which degrades overall performance. Forest fire smoke exhibits both uncertainty and ambiguity. The uncertainty features of smoke originate from two sources. Firstly, smoke is a non-rigid object with no fixed geometric shape, which reduces the accuracy of existing smoke detection algorithms. Secondly, smoke images often contain many background objects resembling smoke, introducing uncertainty into smoke classification and resulting in a high false alarm rate. The ambiguity feature of smoke mainly arises from noise and adverse weather conditions, whose interference damages the edges of smoke images and, in turn, degrades the classification and localization of the smoke region. Addressing these challenges is crucial for improving the performance of smoke detection algorithms. Existing neural network-based smoke detection algorithms, including DCNN, Deep CNN, W-Net, STCNet, MVMNet, and SASC-YOLOX, predominantly utilize standard convolution for feature extraction. Despite incorporating various attention mechanisms, their detection performance improves only marginally. The fixed structure of standard convolution prevents the effective extraction of deformable smoke features, limiting overall detection accuracy. Furthermore, the spatial specificity and channel invariance of standard convolution lead the network to assign identical weights to similar features during extraction, which hinders the effective distinction between smoke and smoke-like objects. Fast R-CNN demonstrates the highest detection accuracy among current target detection-based improved smoke detection algorithms. However, it exhibits the highest GFLOPs and a poor FPS, rendering it unsuitable for the real-time requirements of early forest fire warning. On the other hand, EfficientDet has the fewest model parameters and GFLOPs, offering better real-time performance, but it suffers from lower detection accuracy. Within the YOLO series, YOLOv8 has the fewest model parameters and the highest AP and FPS. Other detectors, such as SSD, RetinaNet-50, and RetinaNet-101, have more model parameters and higher GFLOPs, and thus exhibit reduced real-time performance and less satisfactory detection accuracy. Consequently, this paper adopts YOLOv8 as the foundational network framework.
To accurately detect smoke's uncertain and fuzzy features, this paper introduces three novel modules, MEM, GEM, and EEM, into the original YOLOv8 network. MEM strategically combines the strengths of different convolutions to extract the shape of smoke adaptively and assign weights to individual pixels. GEM is designed to capture the global features of smoke, enhancing the identification of smoke regions. EEM reduces smoke edge noise in the feature maps and improves the discrimination of smoke plumes from smoke-like objects through enhanced convolution. This integrated approach enhances detection accuracy and reduces false alarms. Additionally, this paper creates a large-scale real-scenario forest fire smoke dataset to facilitate direct application in forest fire monitoring, ensuring better model robustness and generalization. The experimental findings affirm that the proposed MGE-YOLOv8 algorithm is well suited for direct application in early forest fire warning systems. In practical scenarios, the algorithm can be deployed on fire monitoring towers, forest stations, industrial parks, or city perimeters for real-time monitoring of potential fires and timely warnings. The algorithm is also adaptable to surveillance devices such as UAVs for monitoring expansive areas and issuing prompt warnings. Tailoring these deployment strategies to specific forest terrains and requirements is essential to improve the sensitivity and accuracy of fire warnings. The extensive coverage, rapid detection speed, and cost-effectiveness of vision-based detection devices facilitate widespread adoption. However, it is crucial to acknowledge the limitations of the proposed algorithm: although it performs real-time detection and early warning effectively during daylight hours, its efficacy diminishes in low-light or nighttime environments. Future enhancements should prioritize optimizing the dataset, expanding nighttime images of natural scenes, or improving nighttime image preprocessing. These efforts will enable the algorithm to achieve all-weather real-time detection and early warning of forest fires.

Conclusions
To overcome the challenges of low accuracy and a high false alarm rate in forest fire smoke detection, this paper presents the MGE-YOLOv8 network architecture as an enhancement of the original YOLOv8 network. The network incorporates several crucial modifications. First, the MEM module replaces the original feature extraction module to strengthen the network's ability to extract discriminative smoke features. Next, the GEM module is introduced at the end of the feature extraction network to enhance the global dependence of the high-level features. Finally, the EEM module is added before the prediction network to reduce edge noise in smoke without compromising image clarity. Experimental results demonstrate that the proposed algorithm achieves a 4.2% improvement in AP and a 3.9% increase in AR compared to the baseline network on the RSF dataset. Moreover, it reduces the false alarm rate by 2.1% while maintaining an operational speed of 87.682 FPS and a computational cost of 33.682 GFLOPs. These results highlight the exceptional detection capability of the algorithm.

Forests 2024, 21

Figure 2. The whole network architecture of the proposed MGE-YOLOv8. (a) Overall network architecture and the GEM and EEM module architectures; (b) MEM module architecture and the various convolutional module architectures.

Figure 5. Comparison of experimental results between MGE-YOLOv8 and YOLOv8 in real applications. (a) Smoke images where false alarms are reduced; (b) smoke images where missed alarms are reduced; (c) smoke images where confidence values increase.

Table 4 .
Comparative performances of the proposed and target detection-based improved smoke detection algorithms.

Table 5 .
Comparative performances of the proposed and neural network-based improved smoke detection algorithms.