1. Introduction
Wildfires are among the most devastating natural disasters, causing extensive environmental, economic, and social harm [
1]. The early detection of wildfire smoke is critical for mitigating these impacts, as it provides the first opportunity for intervention before fires become uncontrollable. As climate change accelerates, the frequency, scale, and intensity of wildfires have been increasing globally, making it imperative to develop advanced and reliable early detection systems. One of the earliest indicators of a wildfire is the production of smoke, with video monitoring being the predominant method for detecting such early signs [
2]. However, in the initial stages of a wildfire, the smoke typically occupies only a small portion of the monitoring image, making the prompt detection of these small smoke targets vital for preventing the escalation of wildfire events [
3]. During this phase, incomplete combustion of fuel generates substantial smoke characterized by distinct texture and diffusion patterns [
4]. Efficient smoke detection can provide firefighters with critical time to swiftly extinguish fires, thereby making the development of effective early wildfire smoke detection methods an urgent and essential goal in wildfire monitoring.
Traditional wildfire detection methods, including satellite imagery, thermal sensors, and manual monitoring, often fail to meet the requirements for real-time detection and high accuracy. While satellite systems provide broad coverage, they are limited by temporal resolution and are less effective in detecting small-scale smoke in its early stages [
5]. Ground-based systems, relying on traditional computer vision techniques, often face issues such as poor generalization to diverse environmental conditions and difficulty in adapting to dynamic and complex backgrounds [
6]. Deep learning-based object detection models have shown great promise in addressing these limitations due to their ability to automatically extract features and perform real-time detections [
7]. Some researchers have employed deep learning methods for wildfire smoke detection and have achieved certain successes [
8,
9,
10,
11]. However, early wildfire smoke detection still faces significant challenges, including the detection of small and dispersed smoke targets [
12], high false alarm rates caused by interference from smoke-like objects such as clouds and fog [
13], and the limited availability of annotated datasets for training robust models [
14]. These challenges hinder the development of reliable and effective wildfire monitoring systems.
This study aims to address these challenges by proposing a novel detection framework, MEDA-YOLO, which enhances the YOLOv5 architecture through the integration of an Efficient Channel and Dilated Convolution Spatial Attention (EDA) mechanism [
15]. This hybrid attention module enables the model to focus on smoke-specific features while suppressing irrelevant information, such as clouds and fog. Additionally, to overcome the scarcity of training data, a diverse and realistic dataset, Smoke-Exp, was created by combining real-world images with synthetic data generated using Cycle-GAN.
The primary contributions of this study are as follows:
Creation of the Smoke-Exp Dataset: A novel dataset comprising real-world and synthetic wildfire smoke images generated using Cycle-GAN, which provides diverse and realistic training scenarios for small-scale smoke detection models.
Enhanced Detection Framework: An enhanced YOLOv5-based detection model incorporating a 4× downsampling detection head (Detect-Tiny) and optimized feature extraction techniques, significantly improving sensitivity to small smoke targets and reducing missed detections.
Hybrid Domain Attention Mechanism: A hybrid domain attention mechanism, Efficient Channel and Dilated Convolution Spatial Attention (EDA), is integrated into M-YOLO, enabling the model to filter out irrelevant information and focus on smoke-specific features, even in complex environmental conditions.
Comprehensive Evaluation: We conducted extensive experiments to demonstrate the effectiveness of our approach, achieving significant improvements over existing models in terms of accuracy and robustness, making it more applicable in real-world scenarios.
This paper is organized as follows:
Section 1 introduces the background and significance of this study, highlighting the key contributions.
Section 2 reviews related work, summarizing research progress in wildfire smoke detection and identifying the limitations of existing methods.
Section 3 details the construction of the Smoke-Exp dataset.
Section 4 presents the proposed MEDA-YOLO model, describing its architecture and innovations.
Section 5 provides the experimental results and an in-depth analysis of the model’s performance.
Section 6 discusses the advantages of the proposed method and explores potential applications of the EDA mechanism beyond wildfire detection.
Section 7 concludes the paper by summarizing the key findings and contributions of this work.
3. Construction of the Early Wildfire Smoke Dataset
Deep learning is a data-driven machine learning approach. The performance and generalization ability of deep learning models heavily depend on the quality of the dataset, as it enables the model to learn richer and more effective features, thereby improving detection accuracy and robustness [
36]. Consequently, constructing a high-quality wildfire smoke dataset is a critical step in the early detection of wildfire smoke. Currently, widely used public wildfire datasets include those provided by the Computer Vision and Pattern Recognition Laboratory at Keimyung University in South Korea [
37], the dataset from Bilkent University in Turkey [
38], the dataset compiled by Professor Yuan’s team [
39], and the dataset produced and publicly released by the State Key Laboratory of Fire Science at the University of Science and Technology of China [
40]. However, these public datasets generally suffer from limitations such as small sample sizes, limited background diversity, and insufficient complexity, rendering them less suitable for the specific research objectives of this study. Furthermore, there is a notable lack of datasets specifically designed for early wildfire smoke imagery. To address these gaps, this study introduces a new wildfire smoke dataset, named Smoke-Raw.
The Smoke-Raw dataset was curated by selecting content from 32 wildfire and smoke videos, along with over 1000 high-definition images collected from various sources. Through techniques such as video frame extraction and image filtering, a total of 3616 video frame images were obtained. Approximately 90% of these images are ideal for this study, featuring small-target wildfire smoke within large scenes viewed from long distances. The remaining images include other types of smoke and flames, such as close-range smoke.
To further enhance the background diversity of small-scale wildfire smoke in the Smoke-Raw dataset, we implemented a data augmentation method based on the Cycle-Generative Adversarial Network (Cycle-GAN) [
41]. Specifically, the original dataset was divided into two domains: spring-summer and autumn-winter. The well-trained Cycle-GAN model was then used to transform images from one domain to the other, generating a substantial number of synthetic smoke images that closely resemble realistic seasonal background variations. These augmented images were subsequently incorporated into the Smoke-Raw dataset, resulting in an expanded version named Smoke-Exp.
Figure 1 illustrates the structure of the Cycle-GAN. In this model, X and Y represent two distinct image domains. D_X and D_Y are the discriminators for the two domains, respectively. These discriminators are composed of multiple residual blocks, convolutional layers, the Leaky ReLU activation function, and fully connected layers, and they are used to determine whether an image belongs to its corresponding domain. G_A is the generator that transforms images from domain X to domain Y, while G_B is the generator that performs the reverse transformation, from domain Y to domain X. Each generator consists of an encoder, a transformer, and a decoder, enabling effective transformation between the two domains.
When training Cycle-GAN, the process begins by randomly selecting an original image x from domain X and an image y from domain Y. These images are passed through the generators G_A and G_B, respectively, producing the corresponding transformed images G_A(x) and G_B(y). Subsequently, G_A(x) and G_B(y) are passed through the generators G_B and G_A again to obtain the reconstructed images G_B(G_A(x)) and G_A(G_B(y)). Through this process, the generators G_A and G_B learn the mappings from domain X to domain Y and vice versa. To ensure the effectiveness of these mappings, a cycle consistency loss is introduced. This loss measures the discrepancy between the original image x in domain X and its reconstruction G_B(G_A(x)) after being mapped to domain Y as G_A(x) and back to domain X. Similarly, it measures the discrepancy between the original image y in domain Y and its reconstruction G_A(G_B(y)) after being mapped to domain X as G_B(y) and back to domain Y. This cycle consistency loss guides the network towards convergence and facilitates the accurate transfer of image styles between the two domains [42].
The cycle consistency loss is expressed in Equation (1):

$L_{cyc}(G_A, G_B) = \mathbb{E}_{x \sim p_{data}(x)}[\lVert G_B(G_A(x)) - x \rVert_1] + \mathbb{E}_{y \sim p_{data}(y)}[\lVert G_A(G_B(y)) - y \rVert_1]$  (1)

where $\lVert G_B(G_A(x)) - x \rVert_1$ represents the L1 norm between the image regenerated through generator G_B and the original real data x, and the corresponding term for domain Y can be analogized.
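To make Equation (1) concrete, the following is a minimal PyTorch sketch of the cycle consistency term; G_A and G_B stand for the two generators described above, and the cycle weight lambda_cyc = 10 is a common Cycle-GAN default rather than a value reported in this paper.

```python
import torch.nn as nn

l1 = nn.L1Loss()

def cycle_consistency_loss(G_A, G_B, real_x, real_y, lambda_cyc=10.0):
    """Cycle consistency loss of Equation (1).

    G_A maps domain X (spring-summer) to domain Y (autumn-winter);
    G_B maps domain Y back to domain X. real_x and real_y are image batches.
    lambda_cyc = 10 is an assumed weight, a common Cycle-GAN default.
    """
    fake_y = G_A(real_x)   # x -> G_A(x)
    fake_x = G_B(real_y)   # y -> G_B(y)
    rec_x = G_B(fake_y)    # x -> G_A(x) -> G_B(G_A(x))
    rec_y = G_A(fake_x)    # y -> G_B(y) -> G_A(G_B(y))
    # L1 distance between each original image and its reconstruction
    return lambda_cyc * (l1(rec_x, real_x) + l1(rec_y, real_y))
```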
The hyperparameter settings used for training Cycle-GAN in this paper were as follows: the number of input and output channels was 3, the discriminator network was set to “basic”, the generator network was set to “resnet_9blocks”, the batch size was 8, the number of epochs was 500, the initial learning rate was 0.0002, and the learning rate was decayed according to the “linear” strategy every 50 epochs. The hardware included an Intel Core i7-6700HQ processor (Intel Corporation, Santa Clara, California, United States, obtained from online retail channels) and an NVIDIA GeForce RTX 2080 Ti graphics card (NVIDIA Corporation, Santa Clara, California, United States, obtained from online retail channels, connected through an ASUS XG STATION PRO Thunderbolt 3 GPU dock), with PyTorch 1.7.1 as the framework and Python 3.7.
After training the Cycle-GAN model, two style transfer model files were saved: GA.pth, which transfers the style from spring-summer to autumn-winter, and GB.pth, which transfers the style from autumn-winter to spring-summer. These models were used to modify the seasonal background style of smoke images in the Smoke-Raw dataset, enabling data augmentation. Following the Cycle-GAN-based data augmentation, 2400 high-quality and relevant images were selected from the generated outputs and added to the Smoke-Raw dataset, resulting in the creation of the Smoke-Exp dataset. The Smoke-Exp dataset comprises a total of 6016 images. Examples of the generated images are shown in
Figure 2.
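As a rough illustration of the augmentation step, the sketch below applies a saved generator checkpoint (e.g., GA.pth) to a folder of Smoke-Raw images. The generator class, the 256 × 256 resolution, and the [-1, 1] normalization are assumptions about the training setup, not details taken from the paper.

```python
import torch
from pathlib import Path
from PIL import Image
from torchvision import transforms
from torchvision.utils import save_image

# Assumed preprocessing: resize to the Cycle-GAN training resolution and
# normalize to [-1, 1], mirroring common Cycle-GAN pipelines.
to_tensor = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

def augment_folder(generator, weights_path, src_dir, dst_dir, device="cuda"):
    """Apply a trained style-transfer generator (e.g., GA.pth) to every image
    in src_dir and write the synthetic images to dst_dir.
    `generator` is assumed to be the same nn.Module used during training."""
    generator.load_state_dict(torch.load(weights_path, map_location=device))
    generator.to(device).eval()
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    with torch.no_grad():
        for img_path in sorted(Path(src_dir).glob("*.jpg")):
            x = to_tensor(Image.open(img_path).convert("RGB")).unsqueeze(0).to(device)
            fake = generator(x)
            # Map the output back from [-1, 1] to [0, 1] before saving.
            save_image(fake * 0.5 + 0.5, Path(dst_dir) / img_path.name)
```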
Subsequently, the LabelImg tool [
43] was utilized to annotate the wildfire smoke objects in the images with bounding boxes. Once the annotation process for all images was completed, the data were meticulously organized to ensure rigorous experimentation and ease of data handling. The images, annotation files, segmentation categories, and other related files were formatted and stored according to the PASCAL VOC standard dataset format [
44].
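For reference, a small sketch of reading one PASCAL VOC annotation file produced by LabelImg is shown below; the file name and directory layout are placeholders.

```python
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_path):
    """Parse a PASCAL VOC annotation file and return (class, box) pairs.
    Boxes follow the VOC convention: (xmin, ymin, xmax, ymax) in pixels."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall("object"):
        name = obj.find("name").text            # e.g., "smoke"
        bb = obj.find("bndbox")
        box = tuple(int(float(bb.find(k).text))
                    for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, box))
    return boxes

# Example: read_voc_boxes("Annotations/smoke_000001.xml")
```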
4. Early Wildfire Smoke Detection Method Based on EDA
Early or long-distance wildfire smoke often appears as small targets in images captured by electronic watchtowers. Timely detection and warning of such smoke are critical for preventing wildfire incidents. However, detecting this early smoke is challenging due to several factors: it typically occupies a small number of pixels, exhibits low resolution in the smoke area, and has blurry texture and contour features. Additionally, variations in the height and width of smoke, along with interference from clouds and fog in real forest environments, can lead to false alarms. To enhance the detection accuracy for these early small smoke targets, YOLOv5 [
45] was selected as the base model and adapted for small wildfire smoke detection. We propose an early wildfire smoke detection algorithm that integrates a hybrid domain attention mechanism, Efficient Channel and Dilated Convolution Spatial Attention (EDA). While more advanced YOLO algorithms exist, they incorporate a large number of general object detection methods that have not been validated for wildfire smoke detection. Therefore, we chose the more classic and mature YOLOv5 as the base model for its robustness and suitability for this specific application.
4.1. Multi-Scale Small Smoke Target Detection Algorithm M-YOLO
Detecting small smoke targets is a critical and challenging task in wildfire smoke detection [
46]. Small wildfire smoke targets occupy relatively few pixels, exhibit low resolution in the smoke area, and have indistinct texture and contour features. This makes it difficult for deep learning models to effectively extract relevant features, rendering them highly susceptible to background interference and noise. During image feature extraction in YOLO, as the network deepens, the resolution of the feature maps decreases, and the local receptive field (the original image area corresponding to each pixel) expands. Consequently, deeper feature maps capture more global information, while shallower feature maps retain positional and detailed information.
To address the limitations of single-scale feature detection for small objects, researchers have proposed multi-scale feature fusion [
47]. By integrating shallow and deep features, it becomes possible to leverage the positional, detailed, and semantic information contained in each feature map to predict object locations and categories across multiple scales. Studies have shown that multi-scale detection methods significantly improve algorithm performance. YOLOv5 achieves multi-scale object detection by employing three types of downsampling: 8×, 16×, and 32×, generating feature maps of different resolutions. For example, with an input image size of 640 × 640 × 3, the corresponding feature map sizes are 80 × 80, 40 × 40, and 20 × 20.
Figure 3a illustrates the YOLOv5 network structure. Each feature map output layer contains 255 channels, encoding information such as object category, confidence, and bounding box coordinates [
48]. The network uses anchors at each scale to predict target positions and sizes, assigning larger targets to the low-resolution (32× downsampling) feature maps and smaller targets to the high-resolution (8× downsampling) feature maps.
To enhance the detection of small-scale wildfire smoke, a multi-scale improvement strategy was proposed. In addition to the existing three-scale detection heads in YOLOv5, a new detection head sensitive to small targets, with a minimum size of 4 × 4 pixels, was introduced. This improvement allows the network to detect early-stage wildfire smoke with a minimum resolution of 16 pixels. The enhanced model, named M-YOLO, incorporates an additional detection head, Detect-Tiny, which performs 4× downsampling. Furthermore, the feature fusion method at the Neck was adjusted. The improved multi-scale YOLO wildfire smoke detection model includes four additional modules in the upper Neck and introduces the Detect-Tiny head in the upper Prediction layer, as illustrated in
Figure 3b.
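To make the effect of the additional 4× downsampling head concrete, the short sketch below lists the prediction grid produced at each stride for a 640 × 640 input; it only illustrates the scale arithmetic, not the actual M-YOLO code.

```python
def prediction_grids(img_size=640, strides=(4, 8, 16, 32)):
    """Grid resolution and the smallest square target that fills one cell
    at each detection scale. Stride 4 is the added Detect-Tiny head."""
    for s in strides:
        g = img_size // s
        print(f"stride {s:>2}: grid {g} x {g}, one cell covers {s} x {s} px "
              f"({s * s} pixels)")

prediction_grids()
# stride  4: grid 160 x 160, one cell covers 4 x 4 px (16 pixels)
# stride  8: grid 80 x 80, one cell covers 8 x 8 px (64 pixels)
# ...
```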
The wildfire smoke detection model based on M-YOLO achieves two key objectives, addressing critical challenges in early wildfire smoke detection:
Enhancing network depth and width: By increasing the depth and width of the network, the model is able to learn more comprehensive and multi-level feature information, which enables it to capture both low-level details and high-level semantic representations of smoke targets. This enhancement improves the model’s performance in detecting objects across multiple scales, especially in complex environments, allowing it to better differentiate smoke from surrounding background noise and achieve more accurate detections in diverse conditions.
Improving sensitivity to small smoke targets: The model increases its sensitivity to small smoke targets, allowing it to capture more detailed information on these smaller targets. This enhancement significantly improves the detection of small smoke targets, even under conditions of low visual quality.
4.2. Construction of Hybrid Domain Attention Mechanism EDA
To enhance the network’s ability to detect smoke and suppress interference from clouds, fog, and irrelevant background elements, a hybrid domain attention mechanism called EDA was developed. This mechanism integrates an efficient channel attention mechanism with a spatial attention mechanism incorporating dilated convolutions. By simultaneously focusing on both the spatial and channel dimensions, the EDA mechanism reduces the influence of irrelevant information and strengthens the representation of smoke-related features in the network. Consequently, it improves the detection performance for wildfire smoke targets.
4.2.1. Efficient Channel Attention (ECA)
The Efficient Channel Attention (ECA) mechanism [
49] is widely used in computer vision to dynamically adjust the weights of different feature channels based on the input image’s content. This adjustment enhances the feature representation and robustness by highlighting meaningful information and guiding the network’s focus toward target objects [
50]. ECA, a form of channel attention mechanism, employs a one-dimensional convolutional layer whose kernel size adapts to the number of feature map channels. The ECA module’s schematic is illustrated in
Figure 4.
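As a reference, the following is a minimal PyTorch sketch of an ECA block following the published ECA-Net formulation (global average pooling followed by a 1-D convolution across channels); the adaptive kernel-size constants (gamma = 2, b = 1) are the defaults from that paper and are assumptions here, since this paper does not state them.

```python
import math
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention with an adaptive 1-D kernel size."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        # Kernel size grows with log2(channels), rounded to the nearest odd value.
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                               # x: (B, C, H, W)
        y = self.pool(x)                                # (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(-1, -2))  # 1-D conv across channels
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))
        return x * y                                    # channel-wise reweighting
```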
4.2.2. Dilated Convolutional Spatial Attention (DCSA)
Dilated convolution [
51] is a convolutional variant that expands the receptive field by introducing gaps between kernel elements without increasing the kernel size or the number of parameters. The dilation factor d controls the size of these gaps: d = 1 corresponds to standard convolution, while d > 1 enlarges the receptive field. The effective kernel size of a dilated convolution is computed as shown in Equation (2):

$k' = k + (k - 1)(d - 1)$  (2)

where k is the original kernel size, d is the dilation factor, and k' represents the effective kernel size after dilation.
The receptive field is the region of the input image that affects a given point in a feature map of the convolutional neural network. Its size depends on the kernel size and stride. By increasing the dilation rate while keeping the kernel size and stride fixed, the receptive field grows rapidly, allowing richer feature map information to be captured. The receptive field size of a given layer is computed as shown in Equation (3):

$RF_{l} = RF_{l-1} + (k' - 1) \times s$  (3)

where $RF_{l}$ is the receptive field size of the current layer, $RF_{l-1}$ is the receptive field size of the previous layer, $k'$ is the effective kernel size, and $s$ is the stride of the convolution.
The spatial attention mechanism assigns weights to each spatial position based on correlations within the feature map, highlighting target features and suppressing background noise. In the DCSA module, the 7 × 7 convolution of the standard spatial attention mechanism is replaced with a dilated convolution, reducing the computational cost while maintaining the receptive field and limiting the parameter growth introduced by the attention mechanism [51]. A 3 × 3 dilated convolution with a dilation factor of d = 3 is used, providing an effective receptive field equivalent to a 7 × 7 kernel.
Figure 5 illustrates the DCSA structure.
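The following is a minimal sketch of the DCSA idea under the assumption that it follows the standard (CBAM-style) spatial attention layout, with the usual 7 × 7 convolution replaced by a 3 × 3 convolution with dilation 3; the padding and other details are assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class DCSA(nn.Module):
    """Dilated Convolutional Spatial Attention: a CBAM-style spatial
    attention map computed with a 3x3 dilated convolution (d = 3)."""
    def __init__(self, dilation=3):
        super().__init__()
        # padding = dilation keeps the spatial size unchanged for a 3x3 kernel.
        self.conv = nn.Conv2d(2, 1, kernel_size=3,
                              padding=dilation, dilation=dilation, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                               # x: (B, C, H, W)
        avg_map = torch.mean(x, dim=1, keepdim=True)    # (B, 1, H, W)
        max_map, _ = torch.max(x, dim=1, keepdim=True)  # (B, 1, H, W)
        attn = self.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn                                 # spatial reweighting
```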
4.2.3. Hybrid Domain Attention Mechanism (EDA)
The hybrid domain attention (Efficient Channel and Dilated Convolution Spatial Attention, EDA) module combines the ECA and DCSA modules. Its structure is shown in
Figure 6.
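Combining the two sketches above gives a rough illustration of the EDA block. The paper does not state the ordering of the two sub-modules, so applying channel attention before spatial attention (as in CBAM) is an assumption.

```python
import torch.nn as nn

class EDA(nn.Module):
    """Hybrid domain attention: ECA (channel) followed by DCSA (spatial)."""
    def __init__(self, channels, dilation=3):
        super().__init__()
        self.channel_attention = ECA(channels)    # ECA from the sketch above
        self.spatial_attention = DCSA(dilation)   # DCSA from the sketch above

    def forward(self, x):
        x = self.channel_attention(x)   # reweight channels first
        x = self.spatial_attention(x)   # then reweight spatial positions
        return x
```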
4.3. Early Wildfire Smoke Detection Method MEDA-YOLO
The EDA module applies attention weighting across both the spatial and channel dimensions, emphasizing critical feature information within the feature map. The EDA module is integrated into the M-YOLO network, resulting in a new model, MEDA-YOLO. MEDA-YOLO incorporates the EDA module into the backbone of M-YOLO, a deep CNN designed to extract multi-scale image features and fuse them into a global feature vector. The backbone primarily consists of CBS, C3, and SPPF modules. CBS comprises a convolution module (Conv), a batch normalization module, and the SiLU activation function. C3 consists of CBS modules and bottleneck layers and is responsible for feature fusion within the network. The EDA module is embedded in the C3 structure to enhance feature extraction in both the spatial and channel dimensions, as shown in
Figure 7.
4.4. Loss Function of MEDA-YOLO
The MEDA-YOLO model compares predicted results with ground truth labels to optimize its performance. The loss function measures the discrepancy between predictions and labels, guiding the model’s learning. It consists of three components: the bounding box loss $L_{box}$, the confidence loss $L_{obj}$, and the classification loss $L_{cls}$.
Bounding box loss $L_{box}$: This loss uses the Generalized Intersection over Union (GIoU) function for bounding box regression. The overlap between predicted and ground truth boxes is calculated for each anchor box, as shown in Equation (4):

$L_{box} = 1 - GIoU$  (4)

where GIoU stands for Generalized Intersection over Union, an improved metric over IoU that takes into account the area of the smallest convex hull enclosing the two boxes. The formula for GIoU is given by Equation (5):

$GIoU = IoU - \frac{|C| - |B \cup B^{gt}|}{|C|}$  (5)

where IoU stands for Intersection over Union, the ratio of the intersection area of two boxes to their union area; C represents the minimum convex hull that contains both boxes; B refers to the predicted bounding box of the target; and $B^{gt}$ is the target’s ground truth box.
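For reference, the sketch below is a generic implementation of Equation (5) for axis-aligned boxes in (x1, y1, x2, y2) format; it is not the implementation used inside YOLOv5.

```python
import torch

def giou(box_pred, box_gt, eps=1e-7):
    """Generalized IoU for boxes in (x1, y1, x2, y2) format, shape (N, 4)."""
    # Intersection rectangle
    x1 = torch.max(box_pred[:, 0], box_gt[:, 0])
    y1 = torch.max(box_pred[:, 1], box_gt[:, 1])
    x2 = torch.min(box_pred[:, 2], box_gt[:, 2])
    y2 = torch.min(box_pred[:, 3], box_gt[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)

    area_p = (box_pred[:, 2] - box_pred[:, 0]) * (box_pred[:, 3] - box_pred[:, 1])
    area_g = (box_gt[:, 2] - box_gt[:, 0]) * (box_gt[:, 3] - box_gt[:, 1])
    union = area_p + area_g - inter
    iou = inter / (union + eps)

    # Smallest enclosing (convex hull) box C
    cw = torch.max(box_pred[:, 2], box_gt[:, 2]) - torch.min(box_pred[:, 0], box_gt[:, 0])
    ch = torch.max(box_pred[:, 3], box_gt[:, 3]) - torch.min(box_pred[:, 1], box_gt[:, 1])
    c_area = cw * ch
    return iou - (c_area - union) / (c_area + eps)   # Equation (5)

# Bounding box loss of Equation (4): L_box = 1 - giou(pred, gt)
```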
Confidence loss $L_{obj}$: This refers to the error between the confidence scores of the predicted bounding boxes and the true labels during training. MEDA-YOLO uses the Binary Cross-Entropy (BCE) loss function to compute the confidence loss, as shown in Equation (6):

$L_{obj} = -[y \log p + (1 - y) \log(1 - p)]$  (6)

where $p$ is the predicted value and $y$ is the true label.
Classification loss $L_{cls}$: This measures the difference between the predicted class probabilities and the true labels. As with the confidence loss, MEDA-YOLO employs the BCE loss function to compute this loss for each anchor box, as shown in Equation (7):

$L_{cls} = -\sum_{c}[y_c \log p_c + (1 - y_c) \log(1 - p_c)]$  (7)

where $p_c$ and $y_c$ are the predicted probability and true label for class c, respectively.
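For completeness, the sketch below illustrates the BCE-based confidence and classification terms of Equations (6) and (7) using PyTorch's built-in binary cross-entropy on logits; the reduction and target construction are assumptions.

```python
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()   # combines sigmoid with binary cross-entropy

def confidence_and_class_loss(obj_logits, obj_targets, cls_logits, cls_targets):
    """Equation (6): objectness/confidence BCE; Equation (7): per-class BCE.
    obj_targets is 1 where an anchor is matched to a smoke box, 0 otherwise;
    cls_targets is the one-hot class label for matched anchors."""
    l_obj = bce(obj_logits, obj_targets)
    l_cls = bce(cls_logits, cls_targets)
    return l_obj, l_cls
```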
5. Experiments and Results
To evaluate the performance of the proposed method, experiments were conducted on the self-constructed Smoke-Exp dataset. The dataset was divided into training, validation, and test sets in an 8:1:1 ratio.
5.1. Experiment Setup
The annotated Smoke-Exp data described in Section 3 were used for the experiments. Comparative tests between the baseline model and the proposed algorithm were conducted under identical conditions, including the same training and test sets. The hyperparameters were kept consistent: the input image size was 640 × 640, the batch size was 4, the initial learning rate was 0.0001, and the learning rate was multiplied by 0.95 every five epochs. The hardware included an Intel Core i7-6700HQ processor and an NVIDIA GeForce RTX 2080 Ti graphics card, with PyTorch 1.7.1 as the framework and Python 3.7.
5.2. Evaluation Metrics
The evaluation metrics for this experiment include precision (P), recall (R), mean average precision (mAP), and frames per second (FPS). Precision and recall are calculated using Equations (8) and (9):

$P = \frac{TP}{TP + FP}$  (8)

$R = \frac{TP}{TP + FN}$  (9)

Precision (P): measures the ratio of correctly identified positive targets (TP) to all predicted positives (TP + FP).
Recall (R): measures the ratio of correctly identified positive targets (TP) to all actual positives (TP + FN).
The average precision (AP) for each class is calculated from precision (P) and recall (R), and the mean average precision (mAP) is the average of the AP values over all classes, as shown in Equations (10) and (11). In this study, the IoU threshold used for the mAP calculation is set to 0.5.

$AP = \int_{0}^{1} P(R)\,dR$  (10)

$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$  (11)

where $N$ represents the total number of classes in the dataset and $AP_i$ represents the average precision of class $i$. Since the only detection class in this study is smoke, the mAP equals the AP at the same threshold.
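The metric definitions can be summarized in a few lines. The sketch below assumes that detection counts (TP, FP, FN) have already been matched at an IoU threshold of 0.5 and that AP is obtained from a precision-recall curve by numerical integration; real evaluation code typically uses an interpolated PR curve.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Equations (8) and (9)."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    return p, r

def average_precision(recalls, precisions):
    """Equation (10): area under the precision-recall curve.
    `recalls` must be sorted in ascending order."""
    return float(np.trapz(precisions, recalls))

def mean_average_precision(ap_per_class):
    """Equation (11): with a single 'smoke' class, mAP equals that class's AP."""
    return float(np.mean(ap_per_class))
```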
5.3. Dataset Augmentation Experiment
To assess the effectiveness of the Cycle-GAN-based data augmentation method, YOLOv5s was trained four times on both the Smoke-Raw and Smoke-Exp datasets. All experiments were conducted with the same settings, utilizing early stopping mechanisms. For simplicity, the experiments on the Smoke-Raw and Smoke-Exp datasets are referred to as YOLO-Raw and YOLO-Exp, respectively. The results are presented in
Table 1.
The results demonstrate that dataset augmentation significantly improves model performance. The mAP on Smoke-Raw was 92.40%, which increased by 2.27% after applying Cycle-GAN augmentation. Similarly, both precision and recall improved, validating the efficacy of Cycle-GAN augmentation in enhancing model generalization.
5.4. EDA Embedding Position Experiment
To determine the optimal embedding position for the EDA module, we conducted experiments by integrating the EDA module into different parts of the M-YOLO network.
MEDA-YOLO-Backbone: The EDA module was embedded into the Backbone.
MEDA-YOLO-Neck: The EDA module was embedded into the Neck.
MEDA-YOLO-Output: The EDA module was embedded into the Output.
These experiments were designed to compare the effectiveness of different embedding positions and identify the most suitable configuration for enhancing the model’s performance. The results of these experiments are summarized in
Table 2, providing insights into the impact of the EDA module’s placement on detection accuracy and efficiency.
The results demonstrate that mAP improved across all embedding positions. Among the different configurations, MEDA-YOLO-Backbone, which integrates the EDA hybrid domain attention mechanism into the Backbone, achieved the highest mAP of 97.58%, while YOLOv5 had the lowest at 95.42%. MEDA-YOLO-Backbone also attained the highest precision and recall. Additionally, the precision of the entire MEDA-YOLO series was consistently higher than that of M-YOLO, indicating that the EDA module effectively guides the network to focus on smoke details and reduces the miss rate by adjusting channel and spatial attention on the feature map. Moreover, the addition of a small object detection layer significantly improved the model’s ability to detect small objects, leading to higher accuracy and recall compared to YOLOv5s.
In terms of FPS, adding the small object detection layer significantly decreases the FPS, while the EDA attention mechanism moderately mitigates this reduction. MEDA-YOLO-Neck achieves the highest FPS because the EDA attention mechanism is applied between the Concat and C3 layers of PAN. This allows the EDA module to adjust the channel and spatial attention of only the multi-scale fusion features obtained after Concat, without altering the feature maps at every scale. As a result, the computational load and number of parameters of the EDA module are reduced, enhancing the model’s running speed.
Figure 8 illustrates the four training metrics: mAP@0.5, mAP@0.95, precision, and recall for the five models. Overall, all models demonstrate strong performance in detection accuracy, precision, and recall, with MEDA-YOLO-Backbone exhibiting the best performance, while YOLOv5s shows the weakest results.
Additionally, we conducted experiments and analyzed the loss variation of the algorithm.
Figure 9 illustrates the bounding box loss curves across epochs for the five models during training. Bounding box loss measures the difference between the predicted and actual bounding box positions, with lower values indicating higher localization accuracy. In
Figure 9a, YOLOv5s exhibits the lowest curve, with a loss value of 0.02017, indicating the highest localization accuracy on the training set. In comparison, the curves for M-YOLO, MEDA-YOLO-Backbone, MEDA-YOLO-Neck, and MEDA-YOLO-Output are slightly higher, though the differences are minimal.
Figure 9b shows the bounding box loss on the validation set, where M-YOLO achieves the lowest loss value of 0.02665, followed by MEDA-YOLO-Neck at 0.02874, with the latter showing the smoothest curve. YOLOv5s has the highest loss value of 0.03110. Overall, all models demonstrate strong convergence during training, as their bounding box losses consistently decrease with increasing epochs, eventually stabilizing at relatively low levels. This suggests that all models successfully fit the data without signs of overfitting or underfitting.
Objectness loss is a key metric for evaluating how accurately YOLO models predict the presence of targets. Lower objectness loss indicates more accurate confidence predictions. To compare the performance of YOLOv5s with the other models based on M-YOLO and MEDA-YOLO, we plotted the objectness loss curves for both the training set (
Figure 10a) and the validation set (
Figure 10b) and analyzed the results, highlighting the strengths and weaknesses of each model.
Figure 10a illustrates that, on the training set, YOLOv5s has the highest curve, indicating the largest objectness loss. In contrast, the curves for M-YOLO, MEDA-YOLO-Backbone, MEDA-YOLO-Neck, and MEDA-YOLO-Output are slightly lower, demonstrating that these models can accurately identify small smoke targets and exhibit comparable detection performance. During training, the objectness loss of all models decreases steadily with the number of training epochs, eventually stabilizing at low values, indicating that the models fit the data well and converge efficiently. Notably, during the first six epochs, the objectness loss for all models except YOLOv5s rises briefly before declining. This may be attributed to the use of a small object detection layer, which takes some time to adapt to the data distribution and might introduce minor noise or redundant information.
Figure 10b presents the objectness loss curves for the validation set, where YOLOv5s again exhibits the highest curve with slight oscillations, signifying lower detection accuracy and weaker generalization. The other four models maintain low and smooth objectness loss curves. After 100 epochs, the difference between their loss values is minimal, demonstrating good classification performance and strong generalization ability. From
Figure 10, it is clear that YOLOv5s performs the worst, while the other improved models demonstrate superior performance, characterized by lower loss, higher accuracy, and stronger generalization.
These experimental results confirm that all models effectively minimize the loss function during training, achieving good convergence. The EDA module, when embedded in the Backbone, delivers the best performance. This configuration enables deep feature extraction and residual learning while adjusting feature importance across dimensions. It guides the network to focus on smoke details and reduces the influence of irrelevant information such as clouds and fog, thereby significantly lowering the false detection rate.
MEDA-YOLO-Backbone is therefore adopted as the final MEDA-YOLO algorithm in the remainder of this paper.
5.5. Model Comparison Experiment
To validate the effectiveness of the proposed model, performance comparisons were conducted between our method and several others, including the single-stage YOLOv5s, M-YOLO, YOLOv7 [
52], and YOLOv8 [
53] models, as well as the two-stage Faster R-CNN [
54] model and the Transformer-based RT-DETR [
55] model. The results of these comparisons are presented in
Table 3. Compared to the YOLOv5s model, MEDA-YOLO demonstrates a slight reduction in FPS by 3 frames but achieves notable improvements in precision, recall, and mAP, with increases of 4.51%, 4.37%, and 2.16%, respectively.
In terms of the mAP metric, the proposed M-YOLO model achieves a score of 96.74%, representing a 1.32% improvement over the original YOLOv5s model and a 3.26% improvement over the Faster R-CNN model. This demonstrates that the multi-scale enhancement strategy employed in this study significantly improves the model’s detection accuracy. Moreover, MEDA-YOLO achieves the highest mAP value of 97.58%, marking a 0.84% improvement over M-YOLO. Regarding precision and recall, the M-YOLO model shows a 2.41% increase in precision and a 3.56% increase in recall compared to YOLOv5s. This suggests that the inclusion of the Detect-Tiny head for small object detection effectively enhances the model’s capability to detect small smoke targets, reducing missed detections and false alarms. Although the Transformer-based RT-DETR model achieves the highest precision and recall scores, the differences between MEDA-YOLO and RT-DETR in these metrics are minimal. In terms of FPS, the addition of the Detect-Tiny head slightly reduces the detection speed of the MEDA-YOLO model to 75 FPS, which is 3 FPS lower than YOLOv5s. However, this remains a high speed that meets the demands of real-time detection. Both YOLOv8 and YOLOv5 achieve similar speeds of 78 FPS. By contrast, RT-DETR runs at only 19 FPS, a substantial drop relative to MEDA-YOLO owing to its large number of parameters. The Faster R-CNN model, with its complex network structure and two-stage detection process, is the slowest at only 9 FPS, making it unsuitable for real-time detection.
To comprehensively and accurately evaluate the performance of the wildfire smoke detection model, test images featuring large smoke targets were included. The qualitative analysis images in this section are categorized into two types:
Forest Background Images: These include smoke-like targets such as haze, clouds or fog.
Small-Scale Wildfire Smoke Images: These are captured in the early stages of a wildfire or from a distance.
These images were obtained from real forest monitoring cameras and therefore closely reflect actual detection environments, encompassing the common difficulties and challenges of wildfire detection; they are crucial for verifying the robustness and generalization ability of the detection model.
Figure 11 shows the results of different models for detecting small smoke objects.
It can be observed that all six models successfully detect small smoke objects in the images. Comparing the results in
Figure 11b,e,f, MEDA-YOLO demonstrates higher detection confidence than both M-YOLO and YOLOv5s, indicating that the improved network is more sensitive to detecting small smoke objects. For the small smoke objects in the first column of the images, MEDA-YOLO outperforms YOLOv7 and YOLOv8, while the detection confidence for large smoke objects remains comparable among the three models. Notably, Faster R-CNN, as shown in
Figure 11a, exhibits the highest prediction confidence, but it has the lowest mAP among the quantitative metrics. Despite this, it performs well in detecting small smoke objects during qualitative analysis. However, both Faster R-CNN in
Figure 11a and YOLOv5s in
Figure 11b exhibit false detections of a tent in the bottom left corner of the images. In contrast, MEDA-YOLO in
Figure 11f successfully avoids such false detections.
The detection results for smoke-like objects are presented in
Figure 12. It is evident that the first five networks—Faster R-CNN, YOLOv5s, YOLOv7, YOLOv8, and M-YOLO—exhibit a relatively high number of false detections for cloud-like and fog-like smoke objects.
In
Figure 12a, Faster R-CNN generates the highest number of incorrect prediction boxes in the left image, with one instance even mistakenly detecting the forest background as smoke. It also exhibits the highest false detection confidence in the right image, indicating that Faster R-CNN has the weakest performance in recognizing smoke and smoke-like targets, resulting in the highest false detection rate. In
Figure 12b, YOLOv5s, due to its smaller model size, fails to capture sufficient detailed features, leading to numerous incorrect prediction boxes for clouds and fog. This suggests a limitation in its ability to distinguish smoke from similar-looking objects.
Figure 12c demonstrates that YOLOv7 produces many incorrect prediction boxes for cloud and fog images in the first column, highlighting its poor performance in suppressing interference from these elements. Similarly,
Figure 12d shows YOLOv8 also generates false detections for clouds and fog.
In
Figure 12e, M-YOLO shows improved performance with only three incorrect prediction boxes in the left image of clouds and fog and no false detections in the right image. This improvement is attributed to M-YOLO’s enhanced multi-scale detection capabilities, which expand the feature fusion network scale by four times, reduce local receptive fields, and better capture smoke details in small targets. Despite this progress, further enhancement is still possible.
Figure 12f illustrates that MEDA-YOLO, incorporating the EDA attention mechanism, demonstrates superior discrimination capabilities for cloud and fog images. It virtually eliminates false alarms for these images, indicating a significant improvement in distinguishing smoke from interfering elements.
6. Discussion
Wildfire smoke detection is an essential task for early wildfire management, and deep learning models have become a cornerstone for addressing this challenge. While existing methods have made significant strides in improving detection accuracy, they often struggle with small smoke targets and interference from environmental factors and complex backgrounds. In this study, we proposed MEDA-YOLO, a novel model that incorporates the Efficient Channel and Dilated Convolution Spatial Attention (EDA) mechanism into an enhanced YOLO framework to address these challenges. MEDA-YOLO significantly improves the detection of small smoke targets, which is critical in the early stages of wildfire detection, where smoke often occupies only a few pixels in the image.
Beyond wildfire smoke detection, the proposed MEDA-YOLO model has broader potential applications in fields such as drone-based aerial imagery and satellite remote sensing. In both aerial and satellite imagery, small targets are often present against complex and dynamic backgrounds, similar to the conditions encountered in wildfire detection. The ability to detect small objects amidst interference makes MEDA-YOLO highly applicable in these domains. This cross-domain applicability highlights the versatility of the model and adds to its overall contribution to the field.
Despite these advancements, the computational demands of the EDA mechanism and the added detection head slightly reduce inference speed, posing challenges for deployment on resource-constrained edge devices. Moreover, while the Smoke-Exp dataset, enriched with synthetic data from Cycle-GAN, provides diverse training scenarios, there remains a need for further validation using larger real-world datasets to ensure robustness across diverse conditions. These limitations highlight the importance of developing lightweight and efficient detection frameworks to facilitate widespread deployment in real-time wildfire monitoring systems.
7. Conclusions
This study presents MEDA-YOLO, a novel wildfire smoke detection model designed to address key challenges such as detecting small smoke targets and minimizing false alarms in complex environmental conditions. By integrating the Efficient Channel and Dilated Convolution Spatial Attention (EDA) mechanism into an enhanced YOLO framework, MEDA-YOLO significantly improves detection accuracy and robustness. Experimental results demonstrate that MEDA-YOLO achieves a mean Average Precision (mAP) of 97.58% on the Smoke-Exp dataset, outperforming YOLOv5, YOLOv8, and M-YOLO by 2.16%, 1.24%, and 0.84%, respectively, while maintaining real-time detection speed.
Another contribution of this research is the creation of the Smoke-Exp dataset, which combines 3616 real-world wildfire smoke images with 2400 synthetic images generated using Cycle-GAN. This diverse dataset enables the model to generalize effectively across various wildfire detection scenarios, providing a strong foundation for training and evaluation. Additionally, the introduction of the Detect-Tiny head in M-YOLO enhances the network’s sensitivity to small smoke targets, addressing a key limitation in traditional object detection methods.
While MEDA-YOLO demonstrates superior performance and meets the requirements for real-time detection, it has only been tested on a PC and has not yet been deployed on resource-constrained edge devices. This limitation highlights an important direction for future work. Specifically, optimizing the model for deployment on edge devices, such as mobile or embedded systems, remains a key challenge. Future research will focus on enhancing the model’s computational efficiency and reducing its resource consumption, ensuring that it can be effectively deployed in real-time, low-latency environments. Additionally, expanding the dataset with more real-world images will further improve the model’s robustness and adaptability across diverse operational conditions.