Gardens Fire Detection Based on the Symmetrical SSS-YOLOv8 Network

Liu, Bo; Wang, Junhua; An, Qing; Wan, Yanglu; Zhou, Jianing; Chen, Xijiang

doi:10.3390/sym17081269

by Bo Liu ¹, Junhua Wang ², Qing An ³, Yanglu Wan ³,*, Jianing Zhou ¹ and Xijiang Chen ⁴

¹ School of Landscape and Horticulture, Wuhan University of Bioengineering, Wuhan 430415, China

² Science and Technology Department, GongQing Institute of Science and Technology, Jiujiang 332020, China

³ Hubei Engineering Research Center for BDS-Cloud High-Precision Deformation Monitoring, Artificial Intelligence School, Wuchang University of Technology, Wuhan 430223, China

⁴ School of Safety Science and Emergency Management, Wuhan University of Technology, Wuhan 430070, China

* Author to whom correspondence should be addressed.

Symmetry 2025, 17(8), 1269; https://doi.org/10.3390/sym17081269

Submission received: 2 July 2025 / Revised: 17 July 2025 / Accepted: 5 August 2025 / Published: 8 August 2025

(This article belongs to the Section Computer)

Abstract

Fire detection primarily relies on sensors such as smoke detectors, heat detectors, and flame detectors. However, due to cost constraints, it is impractical to deploy the large number of sensors needed for fire detection in outdoor gardens and landscapes. To address this challenge, and to improve fire detection accuracy in gardens while keeping the model lightweight, this paper proposes an improved symmetry SSS-YOLOv8 model for lightweight fire detection in garden video surveillance. Firstly, ShuffleNetV2 blocks and an SPDConv layer are used to preserve flame or smoke information, combined with the Conv_Maxpool layer to reduce computational complexity. Subsequently, the SE module is introduced into the backbone feature extraction network to enhance features specific to fire and smoke. ShuffleNetV2 and the SE module are configured into a symmetric local network structure to enhance the extraction of flame or smoke features. Finally, WIoU is introduced as the bounding box regression loss function to further ensure the detection performance of the symmetry SSS-YOLOv8 model. Experimental results demonstrate that the improved symmetry SSS-YOLOv8 model achieves precision and recall both exceeding 0.70 for garden flame and smoke detection. Compared to the YOLOv8n model, it exhibits a 2.1 percentage point increase in mAP, while its parameter count is only 1.99 M, 65.7% of that of the original model. The proposed model demonstrates superior detection accuracy for garden fires compared to other YOLO series models of the same type, as well as to SSD and Faster R-CNN models.

Keywords: fire detection; YOLOv8; object detection; convolution

1. Introduction

Forests and gardens, serving as crucial ecological barriers and recreational spaces, hold immense ecological, economic, and social significance in terms of fire prevention and control. Once a garden fire occurs, it spreads rapidly, is difficult to extinguish, and can readily cause irreversible ecological damage alongside loss of life and property. Therefore, early, precise, and efficient garden fire detection technology constitutes a critical link in disaster prevention and loss mitigation. In recent years, computer vision methods based on deep learning, particularly object detection models represented by You Only Look Once (YOLO) [1], Single Shot MultiBox Detector (SSD) [2], and Faster R-CNN [3], have demonstrated significant potential in the field of flame and smoke detection due to their powerful feature extraction and target localization capabilities. These methods are gradually replacing traditional inefficient approaches reliant on sensors or manual observation.

However, applying advanced object detection models to real-time monitoring scenarios for garden fires, particularly when deploying them on resource-constrained edge computing devices (such as drones and intelligent surveillance cameras), faces significant challenges. These challenges include balancing accuracy and speed, capturing fine-grained features of flames or smoke, and the adaptability of bounding box regression [4]. To address these challenges and achieve a lightweight, high-precision, real-time garden fire detection model, this paper proposes an improved symmetry SSS-YOLOv8 lightweight garden fire detection model. This model replaces the standard convolution structure with symmetrically arranged ShuffleNetV2 blocks and a Space-to-Depth Convolution (SPDConv) layer, combined with a carefully designed Conv_Maxpool strategy. This combination effectively preserves fine-grained spatial information, such as flame edges and smoke textures, while significantly reducing computational complexity. Within the backbone feature extraction network, we introduce the Squeeze-and-Excitation (SE) attention module. This module adaptively learns the importance weights of different feature channels, selectively enhancing those channels highly correlated with fire (flames and smoke). This improves the model’s ability to discern fire features and enhances its robustness. In addition, we replace the traditional Complete Intersection over Union (CIoU) loss function with Wise-IoU (WIoU) and its dynamic focusing mechanism. WIoU further boosts the model’s detection performance by dynamically adjusting its focus between high-quality and low-quality samples.

2. Related Work

Deep learning has garnered extensive attention and interest from various fields in recent years [5]. Numerous scholars have begun leveraging deep learning for classification, segmentation, and recognition tasks across different targets [6]. Within the field of fire detection, alongside the continuous advancement of deep learning, researchers have conducted substantial research on image-based fire detection. Relatively simple image analysis-based fire detection primarily relies on image classification. The objective of this approach is to perform semantic classification of fire images [7] and assign appropriate labels [8]. Beyond this, object localization models can be employed to generate bounding boxes pinpointing fires. When these two processes—classification and localization—are integrated, they constitute the mainstream object detection models prevalent today. Currently, deep learning-based fire detection models are primarily categorized into two-stage and one-stage detectors. Examples include two-stage models such as R-CNN and Faster R-CNN. One-stage models encompass YOLO, CenterNet, SSD, and EfficientDet, among others.

2.1. Two-Stage Models

Two-stage models, primarily represented by the R-CNN model, first generate a set of region proposals and then perform bounding-box regression and classification on these regions [9]. Zhang et al. [10] employed Faster R-CNN to detect forest fire smoke, circumventing the complex manual feature extraction process inherent in traditional video-based smoke detection methods. However, feature fusion based on Feature Pyramid Networks (FPN) is susceptible to information attenuation, and cross-scale fusion can introduce aliasing effects. The inherent complexity of these models often results in slower detection speeds. Khan and Khan [11] proposed the FFireNet model, which builds on the pre-trained convolutional base of MobileNetV2 and adds a fully connected layer, transforming the classification of forest fire images into the extraction of their symmetrical features. James et al. [12] compared the performance of Faster R-CNN Inception V2 and SSD MobileNet V2 for fire detection. They found that Faster R-CNN Inception V2 achieved higher fire detection accuracy than SSD MobileNet V2. Their results suggest that a vision-based Faster R-CNN Inception V2 system could be integrated with existing indoor fire safety systems, leveraging initial data collection to enhance building fire detection. To maximize the mapping of flame and smoke features, Maroua et al. [13] proposed a hybrid feature extractor combining VGG19 and Xception, utilized within the Faster R-CNN framework. By fusing multi-scale features to enhance the model’s representational capacity, this approach outperformed traditional single-backbone networks, ensuring the extraction of the maximum number of flame or smoke feature maps for localization and classification tasks. A limitation of this method is the lack of an explicit description of the feature fusion strategy and, crucially, the absence of reported inference speed (frames per second, FPS). This omission makes it impossible to ascertain whether the model meets the requirements for real-time fire detection in garden environments. In the early stages of a fire, flames often appear as small targets. Furthermore, haze in gardens and chimney smoke can exhibit characteristics similar to fire flames and smoke, potentially degrading detection performance. To address this challenge, Swin Transformer was integrated with Faster R-CNN, serving as its backbone network for fire detection. This integration leverages the hierarchical attention mechanism of the Swin Transformer and the shift-based window multi-head self-attention (SW-MSA) to enhance the detection capabilities for small objects and dynamic scenes [14].

2.2. One-Stage Models

The two-stage fire detection models tend to be computationally complex, leading to slower detection speeds that often fail to meet the real-time requirements for forest fire monitoring. Consequently, research on smoke and fire detection algorithms based on object detection models has shifted towards efficient end-to-end, one-stage object detection algorithms. Prominent one-stage detection methods include the YOLO, CenterNet, SSD, and EfficientDet series.

Rahul et al. [15] conducted an analysis of garden fire detection using ResNet-50, VGG-16, and DenseNet-121 models. Input images were resized to a width of 224 pixels and augmented using techniques like cropping and flipping. Experimental results indicated that ResNet-50 outperformed VGG-16 and DenseNet-121 for garden fire detection, and the SGD optimizer proved more suitable than Adam for this task. To enhance the performance of general object detection methods specifically for flame detection, Fernandes et al. [16] modified the EfficientDet model, proposing the h-EfficientDet model for garden fire detection. This model replaced the swish activation function with its hard swish counterpart and integrated it with the efficient feature fusion system BiFPN (Bidirectional Feature Pyramid Network). This approach yielded a marginal improvement in fire detection accuracy. Currently, one-stage fire detection models are predominantly based on the YOLO series. For instance, Jiao et al. [17] utilized the YOLOv3-tiny model for fire detection in forest images captured by drones. This network employed ResNet and Darknet-19 as its backbone to extract optimal feature sets. Other researchers optimized the baseline YOLOv3 model, proposing the SRN-YOLO model [18], which incorporates a Sparse Residual Network (SRN) to enable precise forest fire identification through a more efficient network architecture. Fan and Pei [19] detected forest fires by modifying the lightweight network structure YOLOv4-Light. For the feature extraction network, MobileNet replaced the standard YOLOv4 backbone, and depth-wise separable convolutions substituted the original convolutions in the Path Aggregation Network (PANet), resulting in enhanced fire prediction performance. Sun et al. [20] proposed a YOLOv5-based fire detection system. This system applied augmentation techniques to fire images, such as adjusting brightness and exposure, adding noise, and cropping, and leveraged the FireNet and FLAME aerial image datasets for fire detection. Mohamed and Moulay [21] introduced fine-tuned YOLOv7 and YOLOv8 fire detection models. This network utilized CSPDarknet53 and C2f modules in its backbone to optimize gradient flow, alongside a Spatial Pyramid Pooling Fast (SPPF) layer to boost computational efficiency. Subsequently, researchers have conducted comparative analyses of YOLOv9, YOLOv10, and YOLOv11 for smoke and fire detection [22]. YOLOv9 integrates a Generalized Efficient Layer Aggregation Network (GELAN), improving multi-scale feature aggregation and gradient flow while minimizing computational costs. YOLOv10 employs an updated version of CSPNet, enhancing gradient flow, reducing computational redundancy, and achieving precise feature extraction. A key improvement in YOLOv11 is the adoption of the C3k2 block, which replaces the C2f block used in earlier versions. Research findings demonstrate the effectiveness of compact YOLO models (YOLOv9t, YOLOv10n, and YOLOv11n) in fire and smoke detection tasks, with all three models progressively enhancing fire detection capabilities.

3. Proposed Method

YOLOv8 offers a range of model variants of different scales, including YOLOv8n (nano), YOLOv8s (small), YOLOv8m (medium), YOLOv8l (large), and YOLOv8x (extra large). Building on this family, this paper improves the YOLOv8 model and constructs a garden fire detection system by combining it with the cameras installed in the garden, as shown in Figure 1.

In this paper, we selected YOLOv8n as the base model and improved it in the following three respects:

Lightweight Design: A lightweight design was implemented for the Backbone. The C2f modules were replaced with lightweight symmetry ShuffleNetV2 modules, which include symmetry ShuffleNetv2_B, symmetry ShuffleNetv2_U, and symmetry SE. Additionally, the first two convolutional layers and the final convolutional layer within the feature extraction network were replaced with Conv_Maxpool layers and an SPDConv layer, respectively.
SE module: The SE module models dynamic inter-channel relationships and adaptively recalibrates channel-wise feature responses to enhance representational power. We therefore propose a repeated symmetric deployment strategy for the SE module, which enhances the model’s capability to focus on flame and smoke characteristics.
WIoU: The original Complete IoU (CIoU) loss function was replaced with WIoU. This substitution addresses two key limitations: (1) it mitigates the convergence slowdown caused by the aspect ratio penalty of the CIoU loss, which prevents the width and height of bounding boxes from being adjusted simultaneously during training, and (2) it effectively reduces the adverse impact of low-quality examples on the model’s generalization ability.

To enhance the model’s ability to extract flame and smoke features, this paper applies the ShuffleNetv2_B, SE, and ShuffleNetv2_U modules repeatedly and symmetrically in the backbone network. These symmetric ShuffleNetV2 and SE enhancements are integrated with SPDConv to construct the symmetric SSS-YOLOv8 network, as depicted in Figure 2.

3.1. Lightweight Module SSS-Neck

Within the SSS-Neck, the first two convolutional layers are streamlined into a more efficient structure termed Conv_Maxpool. This module sequentially comprises a Convolutional layer (Conv2d), a Batch Normalization (BN) layer, a ReLU activation function, and a Maxpool2d layer. Within the Conv_Maxpool module, the initial convolutional structure first performs a convolution operation on the input fire and smoke image, producing the initial feature map. This feature map is then passed to the subsequent Maxpool2d layer, which performs downsampling. This downsampling operation further refines and enhances the extraction of salient feature information. Compared to convolutional layers, the Maxpool2d structure not only accomplishes the downsampling operation but also reduces computational load. The schematic architecture of the Conv_Maxpool module incorporating this Maxpool2d layer is illustrated in Figure 3. We replaced the C2f module in the Backbone of the YOLOv8n model with the standard building blocks and downsampling module of ShuffleNetV2 [23], as shown in Figure 4. This architectural modification reduces computational complexity while maintaining feature extraction capability, aligning with our lightweight design objectives for fire and smoke detection.
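To make the downsampling role of the Conv_Maxpool module concrete, the following minimal sketch tracks the feature-map size through one Conv2d, BN, ReLU, Maxpool2d stage. The kernel, stride, and padding values are assumptions chosen for illustration, since they are not stated here, and `conv_out` and `conv_maxpool_shape` are hypothetical helper names.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_maxpool_shape(h, w, c_out, conv=(3, 1, 1), pool=(2, 2, 0)):
    """Shape after Conv2d -> BN -> ReLU -> MaxPool2d.

    conv and pool are (kernel, stride, padding) tuples. The defaults are
    assumed values, not settings taken from the paper. BN and ReLU leave
    the shape unchanged, so only the two windowed layers matter.
    """
    k, s, p = conv
    h, w = conv_out(h, k, s, p), conv_out(w, k, s, p)
    k, s, p = pool
    h, w = conv_out(h, k, s, p), conv_out(w, k, s, p)
    return c_out, h, w


print(conv_maxpool_shape(640, 640, c_out=16))  # (16, 320, 320)
```

Under these assumed settings the pooling layer, which has no learnable weights, performs the halving of the spatial resolution.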

The basic module of ShuffleNetV2 first performs channel split, dividing the input feature information into two branches. One branch performs identity mapping, and the other branch performs convolution operation. Then, the feature information of the two branches is concatenated. Finally, channel shuffle is carried out to ensure the full fusion of feature information, as shown in Figure 5a. The downsampling module of ShuffleNetV2 is shown in Figure 5b. After removing channel splitting, each branch is downsampled with a stride of 2, and then the feature maps are concatenated. Finally, channel shuffling is performed. After processing through this downsampling module, feature channels are doubled, feature map size is halved, while preserving feature information integrity. After feature extraction by the ShuffleNetV2 basic module, an SPDConv layer composed of a space-to-depth (SPD) spatial expansion layer and a non-strided convolution layer is embedded [24]. The SPD layer performs downsampling on smoke or flame features extracted by ShuffleNetV2, while preserving all feature information in the channel dimension, thereby preventing feature loss. The non-strided convolution performs feature extraction without reducing feature map dimensions, thereby further mitigating fine-grained information loss in flame or smoke images and ensuring model accuracy.
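The channel split, concatenation, and channel shuffle steps of the basic module can be sketched as follows. This is an illustrative sketch only: channel labels stand in for whole feature maps, the convolution branch is replaced by a placeholder, and `channel_split` and `channel_shuffle` are hypothetical helper names.

```python
def channel_split(channels):
    """Split a list of channels into two equal halves (identity / conv branch)."""
    half = len(channels) // 2
    return channels[:half], channels[half:]


def channel_shuffle(channels, groups=2):
    """Interleave channels across groups: view as (groups, n), transpose, flatten."""
    n = len(channels) // groups
    if n * groups != len(channels):
        raise ValueError("channel count must be divisible by groups")
    return [channels[g * n + i] for i in range(n) for g in range(groups)]


left, right = channel_split(["a0", "a1", "b0", "b1"])
right = [c + "'" for c in right]   # placeholder for the convolution branch
merged = left + right              # concatenation of the two branches
print(channel_shuffle(merged))     # ['a0', "b0'", 'a1', "b1'"]
```

After the shuffle, channels from the identity branch and the convolution branch alternate, so the next block's split hands each branch a mix of both.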

As illustrated in Figure 6 for a scale factor of 2, the input feature map X with dimensions S × S × C_1 is reshaped by the SPD layer. The resulting sub-feature maps possess dimensions of (S/scale, S/scale, C_1), representing a down-sampled version of the original input feature map X. When the scale factor is 2, four sub-feature maps are generated. These sub-feature maps are concatenated along the channel dimension, yielding a feature map with halved spatial dimensions and quadrupled channel depth: (S/2, S/2, 4C_1). This feature map is subsequently fed into a non-strided convolutional layer containing C_2 filters, yielding an output feature map with dimensions (S/2, S/2, C_2).
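The space-to-depth rearrangement can be illustrated with a small pure-Python sketch. It is a minimal example under stated assumptions: the feature map is a nested list indexed as `x[row][col][channel]`, the order in which the sub-feature maps are concatenated is one common convention rather than a detail given in the text, and `space_to_depth` is a hypothetical helper name.

```python
def space_to_depth(x, scale=2):
    """Rearrange an S x S x C map into (S/scale) x (S/scale) x (scale^2 * C).

    Every input value is kept; spatial positions are moved into the
    channel dimension instead of being discarded.
    """
    h, w = len(x), len(x[0])
    if h % scale or w % scale:
        raise ValueError("spatial size must be divisible by scale")
    out = []
    for i in range(0, h, scale):
        row = []
        for j in range(0, w, scale):
            pixel = []
            # Sub-map (di, dj) contributes x[i + di][j + dj]; the loop order
            # fixes the (assumed) channel order of the concatenation.
            for dj in range(scale):
                for di in range(scale):
                    pixel.extend(x[i + di][j + dj])
            row.append(pixel)
        out.append(row)
    return out


# 4 x 4 x 1 input whose single channel holds the values 0..15.
x = [[[4 * r + c] for c in range(4)] for r in range(4)]
y = space_to_depth(x)
print(len(y), len(y[0]), len(y[0][0]))  # 2 2 4
print(y[0][0])                          # [0, 4, 1, 5]
```

The output holds all sixteen input values in a 2 × 2 × 4 layout, which is the lossless downsampling that the subsequent non-strided convolution operates on.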

3.2. Integration of the SE Module

The SE module is a classical and effective channel attention mechanism. This module establishes a dynamic inter-channel dependency model, regulates the contribution weight of each channel, and effectively enhances feature representation capabilities, as illustrated in Figure 7. The module primarily consists of two operations, Squeeze and Excitation, followed by a channel-wise rescaling. Squeeze applies Global Average Pooling (GAP) to the input feature map U (with dimensions H × W × C). This compresses the spatial dimensions of the feature map, producing a channel descriptor z (with dimensions 1 × 1 × C). This descriptor z can be viewed as a statistical summary of the global information for each channel’s feature map, expressed by Equation (1).

z_{c} = F_{sq}(u_{c}) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_{c}(i, j)

(1)

Building upon the channel descriptor z obtained from the Squeeze operation, the Excitation step learns adaptive weights (also referred to as activation values) for each channel. This process is implemented through two fully connected (FC) layers. The first FC layer performs dimensionality reduction (using a reduction ratio r) followed by a ReLU activation function, while the second FC layer restores the original channel dimension C. Finally, a Sigmoid activation function is applied to generate final channel-wise weight vector s (with dimensions 1 × 1 × C), as described in Equation (2).

s = F_{ex}(z, W) = \sigma(W_{2} \delta(W_{1} z))

(2)

Finally, the channel-wise weights s obtained from the Excitation operation are multiplied element-wise with the corresponding channels of the original input feature map U. This produces the final output feature map, which has been adaptively weighted by the channel attention mechanism, as shown in Equation (3).

\tilde{x}_{c} = F_{scale}(u_{c}, s_{c}) = s_{c} \cdot u_{c}

(3)

One prominent advantage of the SE attention mechanism is its high computational efficiency. Compared with more complex attention mechanisms, the SE block is simple in design and has low computational overhead. Therefore, in this paper, the SE attention module is inserted into the YOLOv8n model. Integrating the SE module into YOLOv8n necessitates careful consideration of both the insertion locations and the specific implementation details. Guided by an analysis of the YOLOv8n architecture and the characteristics of the fire detection task, this study selects two critical C2f layers within the Backbone network of YOLOv8n as the insertion points for the SE modules. These layers, model.4 and model.6, are critical feature extraction layers in the backbone and typically handle features at distinct scales. In the modified Backbone, the SE module is mainly inserted after the ShuffleNetV2 blocks and after the SPDConv convolutional layer to strengthen feature extraction. The SE modules are integrated at the outputs of these layers, applying channel attention immediately following the primary feature extraction process.

In the practical implementation, the SE module implements the channel attention mechanism through the following steps: First, GAP is applied to the input features, compressing the spatial dimensions to 1 × 1. The module employs the default channel reduction ratio (r) of 16. This means the first 1 × 1 convolutional layer compresses the channel dimension to 1/16 of its original size. Subsequently, a second 1 × 1 convolutional layer restores the channel dimension to its original width. The entire processing flow incorporates ReLU and Sigmoid activation functions to learn non-linear interdependencies between channels and generate normalized channel-wise weighting coefficients. Finally, a channel-wise multiplication operation integrates the learned weights with the original feature maps, achieving adaptive feature recalibration. This mechanism effectively enhances the responses of fire-relevant feature channels while suppressing useless features, thereby significantly improving the model’s detection accuracy. Through this mechanism, the SE module effectively enhances YOLOv8n’s ability to extract salient features in fire scenes, particularly for targets like smoke and flames, which exhibit distinctive channel-wise characteristics.
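The squeeze, excitation, and scale steps of Equations (1)–(3) can be traced on a toy input. The sketch below is illustrative only: it uses plain nested lists instead of tensors, omits biases, and uses hand-picked rather than learned weights with a single hidden unit; `se_block` is a hypothetical helper name. On a 1 × 1 spatial map a 1 × 1 convolution acts as a fully connected layer, so the two descriptions of the excitation step coincide.

```python
import math


def se_block(u, w1, w2):
    """Squeeze-and-Excitation on u[c][i][j] (C channels of H x W values).

    w1 has shape (C/r) x C and w2 has shape C x (C/r); biases are omitted.
    """
    # Squeeze, Eq. (1): global average pooling per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in u]
    # Excitation, Eq. (2): s = sigmoid(W2 * relu(W1 * z)).
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
         for row in w2]
    # Scale, Eq. (3): reweight every channel by its weight s_c.
    out = [[[sc * v for v in row] for row in ch] for ch, sc in zip(u, s)]
    return out, s


# Two 2 x 2 channels; weights are chosen by hand for a checkable result.
u = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
w1 = [[1.0, -1.0]]   # hidden = relu(1*1 - 1*2) = 0
w2 = [[1.0], [1.0]]  # both logits are 0, so both weights are 0.5
out, s = se_block(u, w1, w2)
print(s)             # [0.5, 0.5]
print(out[1][0])     # [1.0, 1.0]
```

In a trained network the weights differ per channel, so fire-relevant channels receive values of s closer to 1 and less informative channels are attenuated.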

3.3. Selection of Loss Function

In the task of object detection, the IoU [25] metric is commonly adopted to quantify the similarity between predicted bounding boxes and ground truth bounding boxes. A higher IoU value represents a better prediction result. In training, IoU is usually taken as the basis of the bounding box loss function. For a predicted bounding box A and a ground truth bounding box B, IoU is the ratio of the area of their intersection to the area of their union, as shown in Equation (4):

IoU = \frac{|A \cap B|}{|A \cup B|}

(4)

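L_{IoU} denotes the IoU loss between the predicted bounding box and the ground truth bounding box. Its geometric interpretation is visualized in Figure 8, and its mathematical formulation is given in Equation (5), where W_{i} and H_{i} are the width and height of the intersection region, w and h are those of the predicted box, and W_{gt} and H_{gt} are those of the ground truth box.

L_{IoU} = 1 - \frac{W_{i} H_{i}}{w h + W_{gt} H_{gt} - W_{i} H_{i}}

(5)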
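Equation (5) can be evaluated directly from box coordinates. The following minimal sketch assumes boxes are given as (x1, y1, x2, y2) corner coordinates with positive area; `iou_loss` is a hypothetical helper name.

```python
def iou_loss(pred, gt):
    """L_IoU = 1 - IoU for boxes given as (x1, y1, x2, y2) corners."""
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    w_gt, h_gt = gt[2] - gt[0], gt[3] - gt[1]
    # Width and height of the intersection rectangle (zero if disjoint).
    w_i = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    h_i = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    union = w * h + w_gt * h_gt - w_i * h_i
    return 1.0 - (w_i * h_i) / union


print(iou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 - 1/7, about 0.857
print(iou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0
```

For disjoint boxes the loss saturates at 1 regardless of how far apart the boxes are, which is one reason distance-aware variants of the IoU loss are used in practice.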

While YOLOv8n adopts CIoU [26] as its default bounding box loss, this proves suboptimal for fire imagery. In fire image datasets, significant scale variations exist between flames and smoke. When using CIoU as the bounding box loss function during backpropagation, the aspect ratio penalty term restricts simultaneous optimization of width and height dimensions in predicted boxes. This constraint adversely affects model convergence for multi-scale fire objects, motivating our proposed WIoU [27] alternative. The WIoU loss function synthesizes advantages from CIoU and GIoU, establishing a novel loss formulation. It employs IoU as the core similarity metric, while integrating multi-dimensional factors including bounding box dimensions and center-point distance to deliver enhanced similarity assessment. This optimized approach demonstrates measurable improvements in object detection performance, significantly boosting localization accuracy while accelerating processing speed.
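The WIoU formula is not spelled out above. As a minimal sketch, assuming the first version of the loss described in [27], the IoU loss is multiplied by a distance-based factor exp(d² / (W_g² + H_g²)), where d is the distance between box centres and W_g and H_g are the width and height of the smallest enclosing box. The dynamic focusing coefficient of later versions is deliberately omitted, and `wiou_v1` is a hypothetical helper name.

```python
import math


def wiou_v1(pred, gt):
    """Distance-weighted IoU loss for (x1, y1, x2, y2) boxes (assumed v1 form)."""
    w_i = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    h_i = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    l_iou = 1.0 - (w_i * h_i) / (area_p + area_g - w_i * h_i)
    # Offsets between the two box centres.
    dx = (pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2
    # Smallest enclosing box; in training its size is treated as a constant
    # (no gradient flows through it).
    w_g = max(pred[2], gt[2]) - min(pred[0], gt[0])
    h_g = max(pred[3], gt[3]) - min(pred[1], gt[1])
    r = math.exp((dx * dx + dy * dy) / (w_g * w_g + h_g * h_g))
    return r * l_iou


print(round(wiou_v1((0, 0, 2, 2), (1, 1, 3, 3)), 4))
```

Because the factor is at least 1 and grows with centre distance, poorly localized boxes are penalized more strongly than under the plain IoU loss, while a perfectly aligned box still yields zero loss.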

3.4. The Grad-CAM Algorithm Visualizes the Heatmap Features of the Model

To gain insight into the behavior of the proposed model, the Grad-CAM algorithm is used to visualize its heat map features. First, gradients are computed with respect to the feature maps to obtain gradient maps. Next, these gradient maps are normalized. Finally, the normalized gradient maps are multiplied element-wise with the original feature maps to generate the final heat map. A comparison of Figure 9a,b reveals that the original YOLOv8n model produces dispersed feature responses and performs weakly in small-object feature extraction. In contrast, the proposed symmetry SSS-YOLOv8 model focuses on critical flame and smoke features while suppressing irrelevant background information, and it markedly improves the ability to detect and characterize small targets.
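For reference, the commonly used Grad-CAM formulation can be sketched as follows. This is the standard variant, in which each channel is weighted by the spatial mean of its gradient, and it may differ in detail from the normalization described above. The activations and gradients are assumed to be given as nested lists, and `grad_cam` is a hypothetical helper name.

```python
def grad_cam(activations, gradients):
    """Combine feature maps A[k][i][j] with gradients dY/dA[k][i][j] into a heat map."""
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weight: spatial mean of the gradient for that channel.
    alphas = [sum(map(sum, g)) / (h * w) for g in gradients]
    # Weighted sum over channels followed by ReLU.
    cam = [[max(0.0, sum(a * act[i][j] for a, act in zip(alphas, activations)))
            for j in range(w)] for i in range(h)]
    # Rescale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = max(max(row) for row in cam)
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


acts = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [0.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(acts, grads))  # [[0.5, 0.0], [0.0, 1.0]]
```

Locations where positively weighted channels respond strongly appear hot, while locations dominated by negatively weighted channels are clipped to zero by the ReLU.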

4. Experiment Analysis

To validate the performance of the proposed network model in garden fire and smoke detection, we conducted monitoring of various types of fires and compared the monitoring results with those obtained from other network models. All models in this experiment were trained on a Windows 11 system. The software environment was established using Python 3.13.2 and the PyTorch 2.7.0+xpu framework. Regarding hardware configuration, the experiments utilized an Intel(R) Core(TM) Ultra 7 255H CPU @ 2.00GHz, an NVIDIA GeForce RTX 3050 Ti GPU, and CUDA version 11.8.

4.1. Dataset

This article collected and integrated a total of 21,521 images from multiple online platforms such as Google, GitHub, and the Chinese Software Developer Network (CSDN), covering different types of fire or smoke scenes. The data come mainly from three sources. Public fire datasets: datasets such as “Fire Detection” and “Smoke Detection” on the Kaggle platform, which have been initially labeled and organized [21]. Open-source project datasets: datasets attached to fire detection projects on open-source platforms such as GitHub, which usually have high annotation quality and scenario diversity [28]. Video frame extraction: key frames extracted from fire-related videos at specific time intervals; this subset effectively captures the dynamic progression of fire development. The dataset covers multiple fire scenarios, including but not limited to building fires in gardens and landscapes, forest fires in parks and gardens, and vehicle fires. The image resolution is mainly concentrated between 640 × 480 and 1920 × 1080 to balance the efficiency of model training against detection accuracy. As illustrated in Figure 10, sample images from the dataset showcase fire and smoke states across various scenarios. These examples highlight the distinct visual characteristics of fire and smoke under diverse environments and varying lighting conditions.

To meet the requirements for training, validation, and testing of deep learning models, this study partitioned the collected dataset using a rational scheme. The specific partitioning ratios follow common standards in the machine learning field, namely, a training set/validation set/test set ratio of approximately 65:15:20. The final partitioned dataset consists of Training Set: 14,122 images (accounting for 65.6%); Validation Set: 3099 images (accounting for 14.4%); and Test Set: 4300 images (accounting for 20.0%). The final partitioned fire dataset consists of Training Set: 5445 images; Validation Set: 1195; and Test Set: 1659. The final partitioned smoke dataset consists of Training Set: 990 images; Validation Set: 218; and Test Set: 302. This partitioning strategy ensures sufficient training data while retaining adequate validation and test samples for robust model performance evaluation. Dataset annotation involved utilizing a pre-trained object detection model to generate initial annotations for the images. This process produced bounding box coordinates for two target classes: smoke and fire. The annotation information is stored in the YOLO format. Each image corresponds to a single text (.txt) file containing the object class label and normalized bounding box coordinates.
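The partition percentages and the label format described above can be checked with a short sketch. In the YOLO text format each line holds a class index followed by the normalized centre coordinates, width, and height of one box. The mapping of class indices to names (`("smoke", "fire")`) is an assumption made for illustration, and `parse_yolo_line` is a hypothetical helper name.

```python
def parse_yolo_line(line, img_w, img_h, names=("smoke", "fire")):
    """Convert one 'class xc yc w h' label line to a pixel-space corner box."""
    cls, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    box = (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)
    return names[int(cls)], box  # class order is assumed, not from the paper


print(parse_yolo_line("1 0.5 0.5 0.25 0.5", img_w=640, img_h=480))
# ('fire', (240.0, 120.0, 400.0, 360.0))

# Share of each partition in the 21,521-image dataset.
splits = {"train": 14122, "val": 3099, "test": 4300}
total = sum(splits.values())
for name, count in splits.items():
    print(name, round(100 * count / total, 1))  # 65.6, 14.4, 20.0
```

Because the coordinates are normalized, the same label file remains valid when an image is resized for training.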

In this paper, the dataset contains two categories: smoke and fire. To enhance the generalization ability of the model, various data augmentation operations were applied to the training data. Data augmentation not only effectively expands the number of training samples but also simulates fire scenarios in various complex environments, improving the adaptability of the model in practical applications. The main data augmentation strategies adopted in this study are as follows: (1) Color space transformation: the Hue, Saturation, and Value parameters of the image are adjusted to simulate fire scenes under different lighting conditions and weather environments. (2) Geometric transformations: random horizontal flipping (probability 0.5), random vertical flipping (probability 0.3), and random rotation (±15 degrees). These transformations enhance the model’s ability to recognize fire from different directions and angles, which is particularly significant for scenarios where the installation angles of surveillance cameras are not fixed. (3) Mosaic augmentation: four different images are concatenated in a 2 × 2 grid to form a new training sample. This method not only increases the context information of the image but also effectively increases the number of small-target samples, improving the model’s detection ability for small fire sources and long-distance smoke. (4) MixUp: two images are blended in a certain proportion (α = 0.2–0.8) to generate a new training sample. This method can smooth the decision boundary, improve the robustness of the model, and reduce the risk of overfitting. These data augmentation strategies are applied dynamically during training. Different augmentation methods are randomly applied to the images in each training batch, which greatly enriches the diversity of the training data and improves detection performance under complex backgrounds.
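Two of these augmentations are simple enough to sketch directly. The example below is illustrative rather than the training pipeline itself: images are tiny grayscale nested lists, boxes use the normalized (xc, yc, w, h) form, `lam` plays the role of the blending proportion, and `hflip_box` and `mixup` are hypothetical helper names.

```python
def hflip_box(box):
    """Mirror a normalized YOLO box (xc, yc, w, h) about the vertical axis."""
    xc, yc, w, h = box
    return (1.0 - xc, yc, w, h)


def mixup(img_a, img_b, lam):
    """Blend two equally sized grayscale images pixel by pixel."""
    return [[lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]


print(hflip_box((0.25, 0.5, 0.2, 0.4)))              # (0.75, 0.5, 0.2, 0.4)
print(mixup([[0.0, 100.0]], [[200.0, 100.0]], 0.5))  # [[100.0, 100.0]]
```

A horizontal flip only changes the x coordinate of each box centre, so labels can be updated without touching the box size. For MixUp in detection, a common choice is to keep the boxes of both source images in the blended sample.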

4.2. Evaluation Metrics

To comprehensively assess the performance of different models, we adopted multiple evaluation metrics as detailed in Table 1. These metrics rigorously gauge the detection capability of models from diverse perspectives, providing an objective reflection of their practical performance in fire and smoke detection tasks.

In fire and smoke detection, mAP@0.5 serves as the fundamental metric for evaluating overall model performance. A higher mAP@0.5 value indicates the model’s capability to accurately localize and classify both smoke and flame targets within fire scenarios, which is critical for early-stage fire warning systems. mAP@0.5:0.95 exhibits greater sensitivity to bounding box localization precision. For fire detection tasks, high-precision target localization supports accurate estimation of fire extent and progression dynamics, thereby informing critical fire management decisions. While mAP@0.5:0.95 values are typically lower than mAP@0.5 due to stricter evaluation criteria, the magnitude of disparity between these metrics reflects the model’s capability in fine-grained localization accuracy.

Precision is the ratio of correctly predicted positive observations to the total predicted positive observations, as shown in Equation (6).

\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (6)

Recall is the ratio of correctly predicted positive observations to all actual positive instances, as shown in Equation (7).

\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (7)

F1-score represents a harmonic mean of Precision and Recall, as shown in Equation (8).

\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (8)
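As a worked step, substituting the definitions of Precision and Recall into the F1-score gives an equivalent form in terms of the raw counts, where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives. This form is convenient when only detection counts are reported.

```latex
\begin{aligned}
\mathrm{F1\text{-}score}
  &= \frac{2 \cdot \frac{TP}{TP + FP} \cdot \frac{TP}{TP + FN}}
          {\frac{TP}{TP + FP} + \frac{TP}{TP + FN}} \\
  &= \frac{2\,TP^{2}}{TP\,(TP + FN) + TP\,(TP + FP)} \\
  &= \frac{2\,TP}{2\,TP + FP + FN}
\end{aligned}
```

The second line follows from multiplying the numerator and denominator by (TP + FP)(TP + FN).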

4.3. Ablation Experiment

To rigorously evaluate the impact of individual modules on model performance, this section conducts comprehensive ablation experiments. The proposed modules—SSS-Neck, SE attention, and WIoU loss—were sequentially integrated into the backbone network to validate the efficacy of the architectural enhancements. All ablation studies utilized YOLOv8n as the baseline backbone architecture. Experimental results demonstrating the performance of different module configurations on both the validation and test sets are systematically presented in Table 2.

As Table 2 shows, every model performs worse on the test set than on the validation set, which is common for machine learning models. In particular, the mAP@0.5:0.95 metric drops more sharply on the test set, indicating that precise localization poses the greater generalization challenge. Compared with the baseline YOLOv8n, the model with the SE attention mechanism improved on all metrics, with mAP@0.5 rising by 1.2 percentage points on the validation set and 0.6 percentage points on the test set. This indicates that the SE attention mechanism effectively enhances the model’s ability to recognize the key features of the fire scene. After adding the SSS-Neck module, mAP@0.5 and mAP@0.5:0.95 decreased by at most 0.2 percentage points, while the parameter count fell from 3.03 M to 1.96 M, 64.7% of the original. This is because the constructed backbone feature extraction network, when it extracts flame and smoke features again, neither generates a large amount of redundant information nor loses feature information, which further demonstrates the effectiveness of the lightweight improvement strategy. When the SSS-Neck and SE modules were introduced together and the bounding box loss function was replaced with WIoU, the model’s mAP@0.5 and mAP@0.5:0.95 both reached the highest values in the ablation experiments with only 1.99 M parameters, close to the 1.96 M of the SSS-Neck-only variant and well below the 3.03 M of the baseline. In summary, SSS-Neck reduces the number of model parameters, SE secures the detection accuracy, and the WIoU bounding box loss reduces the training loss value, so each of the improved modules plays its intended role.
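The percentages quoted for the ablation study can be recomputed from Table 2. The short Python sketch below does this with the table values copied in by hand; the helper names are illustrative only.

```python
# Values copied from Table 2, in the column order:
# (mAP@0.5 val, mAP@0.5:0.95 val, mAP@0.5 test, mAP@0.5:0.95 test, parameters in M)
TABLE_2 = {
    "YOLOv8n":          (0.762, 0.550, 0.723, 0.445, 3.03),
    "YOLOv8n+SE":       (0.774, 0.562, 0.729, 0.454, 3.03),
    "YOLOv8n+SSS-Neck": (0.761, 0.550, 0.721, 0.444, 1.96),
    "YOLOv8n+WIoU":     (0.769, 0.559, 0.727, 0.449, 3.03),
    "SSS-YOLOv8n":      (0.783, 0.573, 0.736, 0.464, 1.99),
}

def param_ratio_percent(model, baseline="YOLOv8n"):
    """Parameter count of `model` as a percentage of the baseline."""
    return round(100.0 * TABLE_2[model][4] / TABLE_2[baseline][4], 1)

def gain_points(model, column, baseline="YOLOv8n"):
    """Change in a metric column relative to the baseline, in percentage points."""
    return round(100.0 * (TABLE_2[model][column] - TABLE_2[baseline][column]), 1)

neck_ratio = param_ratio_percent("YOLOv8n+SSS-Neck")  # 64.7
full_ratio = param_ratio_percent("SSS-YOLOv8n")       # 65.7
se_val_gain = gain_points("YOLOv8n+SE", 0)            # 1.2
se_test_gain = gain_points("YOLOv8n+SE", 2)           # 0.6
full_val_gain = gain_points("SSS-YOLOv8n", 0)         # 2.1
```

The 65.7% ratio and the 2.1 percentage point validation gain of the full model agree with the figures stated in the abstract.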

To analyze the performance of different modules in different categories (smoke and flame), the precision, recall, and F1-score of different modules were calculated, and the results are shown in Table 3.

Table 3 shows that all models perform better on the smoke category than on the flame category. A likely reason is that flames take diverse forms and colors at different combustion stages and with different burning materials, which makes them harder to identify. The modules also affect the two categories differently. Among the single-module variants, the SE attention mechanism yields the largest improvement for both smoke and flame, and WIoU also improves smoke and flame detection to a certain extent. When SE, SSS-Neck, and WIoU are combined, smoke and flame detection improve further relative to the baseline YOLOv8n model. For the smoke category, precision increased by 2.94 percentage points, recall by 1.87 percentage points, and F1-score by 2.34 percentage points. For the flame category, precision increased by 2.62 percentage points, recall by 1.9 percentage points, and F1-score by 2.23 percentage points.

4.4. Comparison of Different Attention Mechanisms

To empirically demonstrate the rationale for incorporating the SE attention mechanism, this study conducts a comparative analysis between SE and Convolutional Block Attention Module (CBAM) attention mechanisms. During experimentation, both attention modules were embedded in identical positions within the baseline network architecture, with all other parameters strictly maintained to ensure a fair comparison. Performance evaluation focused specifically on fire detection accuracy, with comprehensive results detailed in Table 4.

Table 4 shows that the two attention mechanisms affect flame detection differently. The SE attention mechanism improves the flame category, with precision increasing by 1.75 percentage points and recall by 1.6 percentage points at no increase in parameters or model size. This indicates that the SE attention mechanism is well suited to enhancing the detection of targets with fuzzy boundaries and changing morphological characteristics. The CBAM attention mechanism, by contrast, not only failed to improve detection accuracy but also reduced precision by 1.7 percentage points and recall by 4.2 percentage points, while slightly increasing the parameter count (3.12 M) and model size (6.5 MB). A possible explanation is that the dual attention mechanism of CBAM increases model complexity, leading to overfitting or optimization difficulties under the current training configuration.

4.5. Comparison Experiment

To study the differences in fire detection performance between the symmetry SSS-YOLOv8n model and other models, performance comparison tests were conducted against YOLOv3-Tiny [29], YOLOv5s [30], YOLOv7-Tiny [27], YOLOv8n [31], YOLOv8n-World [32], YOLOv9-Tiny [33], and the recent YOLOv8-FEP [28]. The results, which also include YOLOv11n, are shown in Table 5.

Table 5 shows that the symmetry SSS-YOLOv8n model achieves the highest precision (79.9%) of all compared models, and its mAP50 (78.3%) is exceeded only by that of YOLOv11n (78.7%). Its recall (70.1%) is higher than that of the YOLOv5s, YOLOv3-Tiny, YOLOv7-Tiny, YOLOv8n, YOLOv8n-World, and YOLOv9-Tiny models, and slightly lower than that of the YOLOv8-FEP model (70.8%). Compared with the latest model, YOLOv11n, the symmetry SSS-YOLOv8n model has slightly higher precision and slightly lower recall and mAP50. However, it has fewer parameters than YOLOv11n, which is mainly attributable to the lightweight SSS-Neck module. The FPS of the symmetry SSS-YOLOv8n model (226) is comparable to that of YOLOv11n, YOLOv8-FEP, YOLOv8n-World, and YOLOv8n, and all five are much faster than YOLOv9-Tiny, YOLOv7-Tiny, YOLOv5s, and YOLOv3-Tiny. With an FPS above 200, the symmetry SSS-YOLOv8n model meets the requirements of real-time detection.
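To relate the FPS figures in Table 5 to a real-time budget, the following Python sketch converts FPS into per-frame latency. The 25 frames-per-second camera rate and the `max_streams` helper are assumptions introduced here for illustration; the estimate ignores video decoding, pre-processing, and batching overhead, so it is only a rough upper bound.

```python
# FPS values copied from Table 5.
TABLE_5_FPS = {
    "YOLOv3-Tiny": 69,
    "YOLOv5s": 67,
    "YOLOv7-Tiny": 97,
    "YOLOv8n": 230,
    "YOLOv8n-World": 223,
    "YOLOv9-Tiny": 93,
    "YOLOv8-FEP": 213,
    "YOLOv11n": 253,
    "Symmetry SSS-YOLOv8n": 226,
}

def frame_latency_ms(fps):
    """Average processing time per frame in milliseconds."""
    return 1000.0 / fps

def max_streams(fps, camera_fps=25.0):
    """Rough upper bound on camera streams one detector could keep up with.

    ASSUMPTION: each camera delivers `camera_fps` frames per second and all
    other processing costs are ignored.
    """
    return int(fps // camera_fps)

above_200 = sorted(name for name, fps in TABLE_5_FPS.items() if fps > 200)
sss_fps = TABLE_5_FPS["Symmetry SSS-YOLOv8n"]
sss_latency = frame_latency_ms(sss_fps)   # about 4.42 ms per frame
sss_streams = max_streams(sss_fps)        # 9 streams at 25 fps each
```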

Meanwhile, the speed of the symmetry SSS-YOLOv8 model on the GPU is significantly faster than that of YOLOv11n. It is indicated that the symmetry SSS-YOLOv8n model is more suitable for scenarios where there is a certain balance requirement between accuracy and speed, such as the safety alarm system for fire detection.

Garden fires may involve a single fire point or multiple fire points, so the proposed symmetry SSS-YOLOv8n model was evaluated on both cases and its detection results were compared with those of other models. First, it was compared with other models of the same type in the YOLO series on the detection of individual fire points in gardens. The detection results are shown in Figure 11.

Figure 11 shows that the symmetry SSS-YOLOv8n model and YOLOv11n produce similar results for a single ignition point, with the detection accuracy of both models exceeding 0.8. The detection precision of YOLOv11n is slightly higher than that of the symmetry SSS-YOLOv8n model for one of the fire points and slightly lower for the other. The detection precision of YOLOv8-FEP also exceeds 0.8 for both fires. In contrast, the detection precision of both YOLOv8n-World and YOLOv9-Tiny for these two garden fire points did not exceed 0.8. Although the fire intensity in the first garden scenario was lower than that in the second, the detection accuracy of the symmetry SSS-YOLOv8n, YOLOv8-FEP, YOLOv9-Tiny, and YOLOv8n-World models was consistently higher for the fire point in the first garden than for that in the second. The main reason is that the ignition point in the first garden was closer to the surveillance camera, which yielded clearer flame imagery.

Second, the proposed symmetry SSS-YOLOv8n model was compared with the benchmark model YOLOv8n of the same type, as well as with models of different types, SSD [34] and Faster_RCNN [35], on the detection of multiple fire points and smoke in gardens, as shown in Figure 12.

As Figure 12 shows, this garden contains multiple fire points that produce a large amount of smoke. All four models detected the multiple fire points as well as multiple smoke regions. For these four models, the detection precision for smoke is significantly higher than that for flame. The main reason is that the fire points produced a large amount of thick smoke, which created a distinct difference between the smoke and non-smoke areas of the image. Figure 12a,b shows that the detection precision of the SSD model for flame is similar to that of the Faster_RCNN model, both below 0.9, and that the detection precision of these two models for flame and smoke is lower than that of the YOLOv8n and symmetry SSS-YOLOv8n models. The detection precision of the benchmark model YOLOv8n for smoke exceeds 0.9, while its detection precision for flame is significantly lower than that for smoke. The detection precision of the symmetry SSS-YOLOv8n model exceeds 0.9 for both flame and smoke and is significantly better than that of the other three models.

In addition, we conducted a detailed performance comparison of the proposed symmetry SSS-YOLOv8n model with SSD and Faster_RCNN, using metrics such as mAP50, recall, precision, and F1-score. Because the task involves the two categories Flame and Smoke, any detection labeled Flame or Smoke is counted as a fire. The resulting detection counts of the different models are shown in Table 6.

The recall, precision, and F1-score of each model were computed from the counts in Table 6 and are listed, together with the mAP@0.5 of each model, in Table 7.

Table 7 shows that the mAP50 of the symmetry SSS-YOLOv8n model is higher than that of the Faster_RCNN and SSD models. Similarly, the precision, recall, and F1-score of the symmetry SSS-YOLOv8n model are all higher than those of the Faster_RCNN and SSD models. This indicates that the overall fire detection performance of the symmetry SSS-YOLOv8n model constructed in this paper is superior to that of the Faster_RCNN and SSD models.
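The step from the counts in Table 6 to the precision, recall, and F1-score columns of Table 7 follows Equations (6)–(8) directly. A minimal Python sketch, with the counts copied from Table 6 and an illustrative helper name, reproduces those columns:

```python
# (TP, FP, FN) counts copied from Table 6.
TABLE_6 = {
    "SSD": (1160, 394, 518),
    "Faster_RCNN": (1197, 357, 481),
    "Symmetry SSS-YOLOv8n": (1322, 332, 356),
}

def detection_metrics(tp, fp, fn):
    """Return (precision, recall, F1-score) in percent, rounded to one decimal."""
    precision = tp / (tp + fp)                           # Equation (6)
    recall = tp / (tp + fn)                              # Equation (7)
    f1 = 2 * precision * recall / (precision + recall)   # Equation (8)
    return tuple(round(100.0 * value, 1) for value in (precision, recall, f1))

table_7 = {name: detection_metrics(*counts) for name, counts in TABLE_6.items()}

# TP + FN is the number of ground-truth fire instances, so it should be the
# same for every model evaluated on the same test images.
ground_truth_totals = {tp + fn for tp, _, fn in TABLE_6.values()}
```

The computed values match Table 7, and TP + FN equals 1678 for all three models, which is consistent with the models having been evaluated on the same set of ground-truth instances.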

5. Application

Since fires may occur during either daytime or nighttime, and the clarity of flames and smoke captured by cameras differs significantly between these conditions, experiments were conducted to validate the 24-h monitoring capability of the proposed method for garden fires. Video stream frames containing fire imagery from both daytime and nighttime scenarios were detected using the proposed approach. The detection results are presented in Figure 13.

Figure 13 shows that the proposed method can detect fires both during the day and at night. In Figure 13a, the detection results for the two daytime fire points are 0.61 and 0.64, respectively, whereas in Figure 13b the nighttime detection result is 0.81. The detection accuracy of the proposed method is therefore significantly lower for daytime fires than for nighttime fires. The main reason is that small-scale fire sources in daytime are affected by ambient illumination and tree occlusion, which makes flame features less discernible in the video frames. In contrast, flames appear more distinctly in nighttime images, as Figure 13b demonstrates.

To further validate the feasibility of the application of the method proposed in this paper, a surveillance camera was deployed in a corner of the garden. Within its field of view, controlled fire ignition of trees was performed. Thirty video frames capturing both the initial and advanced stages of flame combustion were extracted from the video stream, as illustrated in Figure 14.

The video stream frame images were detected by using the method in this paper, and the detection results are shown in Table 8.

As presented in Table 8, the proposed method successfully detected nearly all flame images. The two undetected instances were attributed to the fire progression reaching the decay stage of combustion, characterized by minimal flame signatures that presented significant challenges for recognition. Furthermore, the total fire duration was approximately 2 min and 30 s. Crucially, the proposed system generated an early warning at 2 min and 22 s into the event. Significantly, this warning was triggered during the initial combustion phase, thereby maximizing the time available for fire suppression efforts.

6. Conclusions

Based on the YOLOv8n model, this paper proposes an improved lightweight network model, symmetry SSS-YOLOv8, for detecting garden flames and smoke, enabling real-time fire detection and alarm based on garden cameras. The proposed network replaces the backbone feature extraction network of the base model with the lightweight ShuffleNetV2 module, introduces the SE attention mechanism into the backbone network, and replaces the bounding box loss function with the WIoU loss. The main conclusions are as follows:

First, the improved SSS-Neck lightweight module was introduced in YOLOv8n, and the SSS-Neck module was used to replace the main feature extraction module of the original model. Secondly, the integration of the lightweight Squeeze-and-Excitation (SE) attention mechanism enabled the model to maintain high accuracy while substantially reducing computational complexity. Finally, the adoption of the WIoU loss function significantly accelerated model convergence and substantially reduced the loss value. Ablation studies demonstrate that the constructed symmetry SSS-YOLOv8 model not only reduces the parameter count but also ensures robust detection precision.

Comparative experiments demonstrate that the proposed symmetry SSS-YOLOv8 model achieves the highest precision among the compared YOLO-series models for garden fire detection, and a recall higher than all of them except YOLOv8-FEP and YOLOv11n. According to Table 5, it surpasses the baseline YOLOv8n model by 2.6 percentage points in precision and 1.9 percentage points in recall, and the weakest compared model, YOLOv3-Tiny, by 16.5 percentage points in precision and 7.3 percentage points in recall. Furthermore, in terms of both detection precision and parameter count, the proposed symmetry SSS-YOLOv8 model outperforms other established models such as SSD and Faster R-CNN.

The proposed symmetry SSS-YOLOv8 model achieves dual detection capability for both flames and smoke, demonstrating robustness to variations in smoke motion patterns. Furthermore, it effectively handles both single and multiple instances of fire or smoke sources. Crucially, the detection results for all targets are presented concurrently within the output, significantly enhancing overall fire detection efficiency.

The symmetry SSS-YOLOv8 model proposed in this study achieves effective recognition of both flames and smoke, while exhibiting a small model size and low computational complexity. This enables real-time fire detection utilizing multiple surveillance cameras deployed throughout gardens. However, the current research primarily focused on optimizing the lightweight design for single-camera fire detection systems. Consequently, future work should explore the extension of this framework for real-time fire monitoring using multiple synchronized cameras.

Author Contributions

Conceptualization, B.L. and Y.W.; methodology, J.W.; software, Q.A. and J.Z.; validation, X.C.; formal analysis, B.L.; investigation, J.Z.; resources, Y.W.; data curation, J.W. and Q.A.; writing—original draft preparation, B.L.; writing—review and editing, Y.W.; funding acquisition, Q.A. and Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Hubei Engineering Research Center for BDS-Cloud High-Precision Deformation Monitoring Open Funding (No. HBBDGJ202507Y; HBBDGJ202511Y; HBBDGJ202502Z) and in part by the National Natural Science Foundation of China (No. 62377037).

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Yuan, Z.; Tang, X.; Ning, H.; Yang, Z. LW-YOLO: Lightweight Deep Learning Model for Fast and Precise Defect Detection in Printed Circuit Boards. Symmetry 2024, 16, 418.
2. Huang, Z.J.; Sui, B.W.; Wang, S.T.; Zhang, Y.Y. The video fire detection and location method onboard ship based on an improved MF-SSD deep learning algorithm. Basic Clin. Pharmacol. Toxicol. 2020, 126, 127–128.
3. Wei, X.; Wu, Y.; Dong, F.; Zhang, J.; Sun, S. Developing an Image Manipulation Detection Algorithm Based on Edge Detection and Faster R-CNN. Symmetry 2019, 11, 1223.
4. Sun, B.; Bi, K.; Wang, Q. YOLOv7-FIRE: A tiny-fire identification and detection method applied on UAV. AIMS Math. 2024, 9, 10775–10801.
5. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
6. Fang, Y.; Pan, X.; Shen, H.-B. Recent Deep Learning Methodology Development for RNA–RNA Interaction Prediction. Symmetry 2022, 14, 1302.
7. Wu, X.W.; Sahoo, D.; Hoi, S.C.H. Recent advances in deep learning for object detection. Neurocomputing 2020, 396, 39–64.
8. Kaur, J.; Singh, W. Tools, techniques, datasets and application areas for object detection in an image: A review. Multimed. Tools Appl. 2022, 81, 38297–38351.
9. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
10. Zhang, Q.X.; Lin, G.H.; Zhang, Y.M.; Xu, G.; Wang, J.J. Wildland forest fire smoke detection based on faster R-CNN using synthetic smoke Images. Procedia Eng. 2018, 211, 441–446.
11. Khan, S.; Khan, A. FFireNet: Deep Learning Based Forest Fire Classification and Detection in Smart Cities. Symmetry 2022, 14, 2155.
12. Pincott, J.; Tien, P.W.; Wei, S.; Calautit, J.K. Indoor fire detection utilizing computer vision-based strategies. J. Build. Eng. 2022, 61, 105154.
13. Cheknane, M.; Bendouma, T.; Boudouh, S.S. Advancing fire detection: Two-stage deep learning with hybrid feature extraction using faster R-CNN approach. Signal Image Video Process. 2024, 18, 5503–5510.
14. Zaman, K.; Sun, Z.; Shah, S.M.; Shoaib, M.; Pei, L.; Hussain, A. Driver Emotions Recognition Based on Improved Faster R-CNN and Neural Architectural Search Network. Symmetry 2022, 14, 687.
15. Rahul, M.; Shiva Saketh, K.; Sanjeet, A.; Srinivas Naik, N. Early detection of forest fire using deep learning. In Proceedings of the IEEE Region 10 Annual International Conference, Osaka, Japan, 16–19 November 2020; pp. 1136–1140.
16. Fernandes, A.M.; Utkin, A.B.; Chaves, P. Automatic early detection of wildfire smoke with visible-light cameras and EfficientDet. J. Fire Sci. 2023, 41, 122–135.
17. Jiao, Z.T.; Zhang, Y.M.; Xin, J.; Mu, L.X.; Yi, Y.M.; Liu, H.; Liu, D. A deep learning based forest fire detection approach using Uav and Yolov3. In Proceedings of the International Conference on Industrial Artificial Intelligence, Shenyang, China, 23–27 July 2019; pp. 1–5.
18. Gao, S.S.; Chu, M.H.; Zhang, L. A detection network for small defects of steel surface based on YOLOv7. Digit. Signal Process. 2024, 149, 104484.
19. Fan, R.X.; Pei, M.T. Lightweight forest fire detection based on deep learning. In Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, MLSP. Electr Network, Gold Coast, Australia, 25–28 October 2021; pp. 2–7.
20. Sun, Z.K.; Xu, R.Z.; Zheng, X.W.; Zhang, L.F.; Zhang, Y. A forest fire detection method based on improved YOLOv5. Signal Image Video Process. 2025, 19, 136.
21. Chetoui, M.; Akhloufi, M.A. Fire and Smoke Detection Using Fine-Tuned YOLOv8 and YOLOv7 Deep Models. Fire 2024, 7, 135.
22. Alkhammash, E.H. A Comparative Analysis of YOLOv9, YOLOv10, YOLOv11 for Smoke and Fire Detection. Fire 2025, 8, 26.
23. Zhang, L.L.; Jiang, Y.; Sun, Y.P.; Zhang, Y.; Wang, Z. Improvements based on shuffleNetV2 model for bird identification. IEEE Access 2023, 11, 101823–101832.
24. Sunkara, R.; Luo, T. No more strided convolutions or pooling: A new CNN building block for low-resolution images and small objects. In Proceedings of the Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2022, PT III, Grenoble, France, 19–23 September 2023; Volume 13715, pp. 443–459.
25. Su, K.K.; Cao, L.H.; Zhao, B.T.; Li, N.; Wu, D.; Han, X.Y. N-IoU: Better IoU-based bounding box regression loss for object detection. Neural Comput. Appl. 2024, 36, 3049–3063.
26. Zhao, Z.Q.; Wang, J.; Zhao, H. Research on apple recognition algorithm in complex orchard environment based on deep learning. Sensors 2023, 23, 5425.
27. Tong, Z.J.; Chen, Y.H.; Xu, Z.W.; Yu, R. Wise-IoU: Bounding box regression loss with dynamic focusing mechanism. arXiv 2023, arXiv:2301.10051.
28. Zhang, T.X.; Wang, F.W.; Wang, W.M.; Zhao, Q.H.; Ning, W.J.; Wu, H.D. Research on Fire Smoke Detection Algorithm Based on Improved YOLOv8. IEEE Access 2024, 12, 117354–117362.
29. Pan, X.; Yang, T.Y.; Xiao, Y.F.; Yao, H.C.; Adeli, H. Vision-based real-time structural vibration measurement through deep-learning-based detection and tracking methods. Eng. Struct. 2023, 281, 115676.
30. Sun, C.Y.; Chen, Y.J.; Xiao, C.; You, L.X.; Li, R.Z. YOLOv5s-DSD: An Improved Aerial Image Detection Algorithm Based on YOLOv5s. Sensors 2023, 23, 6905.
31. Wang, Z.; Hua, Z.X.; Wen, Y.C.; Zhang, S.J.; Xu, X.S.; Song, H.B. E-YOLO: Recognition of estrus cow based on improved YOLOv8n model. Expert Syst. Appl. 2024, 238, 17.
32. Cheng, T.H.; Song, L.; Ge, Y.X.; Liu, W.Y.; Wang, X.G.; Shan, Y. YOLO-World: Real-Time Open-Vocabulary Object Detection. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 16901–16911.
33. Castro-Bello, M.; Roman-Padilla, D.B.; Morales-Morales, C.; Campos-Francisco, W.; Marmolejo-Vega, C.V.; Marmolejo-Duarte, C.; Evangelista-Alcocer, Y.; Gutiérrez-Valencia, D.E. Convolutional Neural Network Models in Municipal Solid Waste Classification: Towards Sustainable Management. Sustainability 2025, 17, 3523.
34. Zhan, H.W.; Pei, X.Y.; Zhang, T.H.; Zhang, L.Q. Research on flame detection method based on improved SSD algorithm. J. Intell. Fuzzy Syst. 2023, 45, 6501–6512.
35. Pan, J.; Ou, X.M.; Xu, L. A Collaborative Region Detection and Grading Framework for Forest Fire Smoke Using Weakly Supervised Fine Segmentation and Lightweight Faster-RCNN. Forests 2021, 12, 768.

Figure 1. Flow chart of the garden fire detection based on symmetry SSS-YOLOv8n.

Figure 2. Symmetry SSS-YOLOv8 model structure.

Figure 3. Conv_maxpool.

Figure 4. Replacement of the C2f module.

Figure 5. ShuffleNetV2 network structure: (a) Basic unit, (b) Downsampling unit.

Figure 6. Structure of SPDConv.

Figure 7. SE module.

Figure 8. Intersection over Union (IoU) for predictive and real frames.

Figure 9. Comparison of the heat map feature visualizations of the symmetry SSS-YOLOv8n model constructed in this paper and the original YOLOv8n model.

Figure 10. Partial dataset image.

Figure 11. Detection results of individual fire points in gardens of different models of the YOLO series.

Figure 12. Detection results of multiple fire points in the garden by the symmetry SSS-YOLOv8n model and the benchmark models YOLOv8n, SSD, and Faster_RCNN.

Figure 13. Daytime and Nighttime Garden Fire Detection Results: (a) Daytime, Fire one: 0.61, Fire two: 0.64, (b) Nighttime, Fire: 0.81.

Figure 14. Thirty flame images in the video stream frame.

Table 1. Model evaluation metrics.

Evaluation Metrics	Role
mAP@0.5	The mean Average Precision (mAP) calculated at a single IoU threshold of 0.5.
mAP@0.5:0.95	The mAP averaged over multiple IoU thresholds ranging from 0.5 to 0.95 in increments of 0.05.
Precision	The ratio of correctly predicted positive observations to the total predicted positive observations.
Recall	The ratio of correctly predicted positive observations to all actual positive instances.
F1-score	A harmonic mean of Precision and Recall.

Table 2. Ablation experiment results based on mAP analysis.

Model	mAP@0.5 (Val)	mAP@0.5:0.95 (Val)	mAP@0.5 (Test)	mAP@0.5:0.95 (Test)	Parameters/M
YOLOv8n	0.762	0.550	0.723	0.445	3.03
YOLOv8n+SE	0.774	0.562	0.729	0.454	3.03
YOLOv8n+SSS-Neck	0.761	0.550	0.721	0.444	1.96
YOLOv8n+WIoU	0.769	0.559	0.727	0.449	3.03
SSS-YOLOv8n	0.783	0.573	0.736	0.464	1.99

Table 3. Ablation experiment based on precision, recall, and F1-score.

Model	Category	Precision	Recall	F1-Score
YOLOv8n	Smoke	0.8195	0.7025	0.7565
YOLOv8n	Flame	0.7725	0.6820	0.7244
YOLOv8n+SE	Smoke	0.8390	0.7135	0.7712
YOLOv8n+SE	Flame	0.7900	0.6980	0.7412
YOLOv8n+SSS-Neck	Smoke	0.8196	0.7024	0.7565
YOLOv8n+SSS-Neck	Flame	0.7727	0.6813	0.7241
YOLOv8n+WIoU	Smoke	0.8212	0.7130	0.7633
YOLOv8n+WIoU	Flame	0.7824	0.6945	0.7358
Symmetry SSS-YOLOv8n	Smoke	0.8489	0.7212	0.7799
Symmetry SSS-YOLOv8n	Flame	0.7987	0.7010	0.7467

Table 4. Comparison results with different attention mechanisms.

Models	Precision	Recall	Parameters/M	Model Size/MB
YOLOv8n	0.7725	0.6820	3.03	6.3
YOLOv8n+CBAM	0.7555	0.6400	3.12	6.5
YOLOv8n+SE	0.7900	0.6980	3.03	6.3

Table 5. Results of comparative experiments.

Models	Precision(%)	Recall(%)	mAP50(%)	FPS
YOLOv3-Tiny	63.4	62.8	63.5	69
YOLOv5s	74.5	65.7	72.9	67
YOLOv7-Tiny	71.1	66.9	71.4	97
YOLOv8n	77.3	68.2	74.8	230
YOLOv8n-World	77.9	68.8	75.9	223
YOLOv9-Tiny	78.1	67.8	77.2	93
YOLOv8-FEP	78.8	70.8	77.9	213
YOLOv11n	79.5	71.8	78.7	253
Symmetry SSS-YOLOv8n	79.9	70.1	78.3	226

Table 6. TP, FP, and FN of different models.

Models	SSD	Faster_RCNN	Symmetry SSS-YOLOv8n
TP	1160	1197	1322
FP	394	357	332
FN	518	481	356

Table 7. Results of detailed performance comparison.

Models	mAP@0.5(%)	Precision(%)	Recall(%)	F1-Score(%)
SSD	72.9	74.6	69.1	71.8
Faster_RCNN	74.4	77.0	71.3	74.1
Symmetry SSS-YOLOv8n	78.3	79.9	78.8	79.4

Table 8. Detection results.

Total	TP	FN	Recall
30	28	2	93.3%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
