Article

YOLOv8n-SMMP: A Lightweight YOLO Forest Fire Detection Model

College of Information Science and Technology and Artificial Intelligence, Nanjing Forestry University, Nanjing 210037, China
* Author to whom correspondence should be addressed.
Fire 2025, 8(5), 183; https://doi.org/10.3390/fire8050183
Submission received: 9 April 2025 / Revised: 30 April 2025 / Accepted: 2 May 2025 / Published: 3 May 2025
(This article belongs to the Special Issue Intelligent Forest Fire Prediction and Detection)

Abstract

Global warming has driven a marked increase in forest fire occurrences, underscoring the critical need for timely and accurate detection to mitigate fire-related losses. Existing forest fire detection algorithms face limitations in capturing flame and smoke features in complex natural environments, coupled with high computational complexity and inadequate lightweight design for practical deployment. To address these challenges, this paper proposes an enhanced forest fire detection model, YOLOv8n-SMMP (SlimNeck–MCA–MPDIoU–Pruned), based on the YOLO framework. Key innovations include the following: introducing the SlimNeck solution to streamline the neck network by replacing conventional convolutions with Group Shuffling Convolution (GSConv) and replacing the Cross-convolution with 2 filters (C2f) module with the lightweight VoV-based Group Shuffling Cross-Stage Partial Network (VoV-GSCSP) feature extraction module; integrating the Multi-dimensional Collaborative Attention (MCA) mechanism between the neck and head networks to enhance focus on fire-related regions; adopting the Minimum Point Distance Intersection over Union (MPDIoU) loss function to optimize bounding box regression during training; and implementing selective channel pruning tailored to the modified network architecture. The experimental results reveal that, relative to the baseline model, the optimized lightweight model achieves a 3.3% improvement in detection accuracy (mAP@0.5), reduces the parameter count by 31%, and cuts computational overhead by 33%. These advancements underscore the model's superior performance in real-time forest fire detection, outperforming other mainstream lightweight YOLO models in both accuracy and efficiency.

1. Introduction

Forest ecosystems, characterized by rich biodiversity, play a critical role in soil and water conservation as well as the ecological cycle of the Earth’s environment [1]. Additionally, forests are indispensable to human society’s daily production activities, making forest preservation vital for sustainable development [2]. Traditional forest fire detection methods, such as ground patrols [3], observation towers [4], remote video surveillance [5], UAV inspections [6,7], remote sensing [8,9], and sensor-based monitoring [10,11], often face challenges such as high maintenance costs, slow response times, and reliance on specialized training, hindering their ability to achieve cost-effective and timely fire alarms.
Advancements in computer vision have enabled more intuitive approaches for forest fire monitoring. Vision-based detection systems deploy specialized cameras in protected forest areas to analyze real-time video streams through image processing techniques. Conventional methods rely on manually designed features, involving preprocessing, feature extraction, and classification to identify regions potentially containing smoke or flames. For instance, Lin et al. [12] proposed a multi-stage decision strategy for intelligent fire image analysis in video sequences. Wei et al. [13] measured the match between extracted semantic features and the visual information of unknown-class samples to capture the overall semantics of a video.
Since 2012, the rapid development of deep learning in computer vision has revolutionized traditional video-based fire detection. Object detection algorithms, categorized into two-stage and one-stage methods, have emerged as powerful tools. Two-stage methods, exemplified by R-CNN [14], Fast R-CNN [15], Faster R-CNN [16], and Mask R-CNN [17], first generate candidate regions and then classify them. In contrast, one-stage methods, such as SSD [18] and the YOLO (You Only Look Once) series (including YOLOv1 [19], YOLOv2 [20], YOLOv3 [21], YOLOv4 [22], YOLOv5 [23], YOLOv6 [24], YOLOv7 [25], YOLOv8 [26], YOLOv9 [27], YOLOv10 [28], and YOLOv11 [29]), predict bounding boxes and class probabilities simultaneously, offering superior speed. Deep learning-based approaches have been widely adopted in forest fire detection research [30]. Frizzi et al. [31] directly used convolutional neural networks as flame detectors to extract flame features in images. Bohush et al. [32] proposed an algorithm for detecting smoke in videos with distinct temporal features. Peng et al. [33] enhanced SqueezeNet to identify smoke regions, and Sarikaya et al. [34] fused outputs from four CNNs for flame detection in aerial imagery. Recent studies have focused on optimizing YOLO variants: Hou et al. [35] improved infrared target detection using YOLOv5-s; Wang et al. [36] proposed YOLO-LFD for forest fire detection; Liu et al. [37] refined YOLOv7-tiny for higher accuracy; Wang et al. [38] proposed a new model for forest fire detection by improving the YOLOv8 network structure. Yun et al. [39] reduced parameters in YOLOv8. Wang et al. [40] further advanced the field with DSS-YOLO. Forest fires frequently occur in remote areas with unstable network connectivity, which necessitates real-time detection on edge devices. These devices are resource-constrained systems, such as embedded or IoT devices, that possess limited computational power and storage capacity. Deploying bulky models on such devices leads to latency and inefficiency. Model compression refers to techniques that reduce model size and computational complexity while preserving accuracy, such as pruning [41], quantization [42,43], and knowledge distillation [44].
Despite progress, challenges persist: incomplete forest fire datasets, difficulties in detecting dynamic smoke and flames, limited model generalization, insufficient localization accuracy, and slow processing speeds. To address these issues, this paper proposes YOLOv8n-SMMP, an enhanced forest fire detection model based on YOLOv8. The contributions are summarized as follows:
  • The neck network is optimized with a SlimNeck lightweight design: GSConv and the VoV-GSCSP module replace standard convolutions and the C2f module, reducing computational overhead while maintaining performance;
  • The MCA attention module is embedded between the neck and head networks, enhancing focus on critical regions containing flames or smoke;
  • During training, the Complete Intersection over Union (CIoU) loss function is replaced with MPDIoU, which simplifies bounding box regression through a more geometrically intuitive formulation and reduces computational complexity;
  • A selective pruning strategy tailored to the lightweight network structure compresses model parameters and computations significantly without compromising accuracy.
These innovations collectively improve detection precision, accelerate processing, and facilitate deployment on edge devices, advancing early wildfire warning capabilities.

2. Materials and Methods

2.1. Datasets

In this study, a comprehensive forest fire dataset was constructed by integrating multiple sources, including manually curated images from publicly available datasets on platforms such as Kaggle [45,46]; additional flame and smoke images were collected through online searches and field photography. The final dataset comprises 2603 images captured under diverse lighting and weather conditions, with annotations focusing on visible flame edges and smoke diffusion regions. All annotations were performed locally using the LabelMe software [47], ensuring precise manual labeling of fire and smoke targets. Figure 1 presents a detailed statistical analysis of the dataset.
For experimental validation, the dataset was randomly partitioned into training (2083 images), validation (260 images), and test (260 images) sets at an 8:1:1 ratio. This dataset’s diversity and meticulous annotation framework ensure robust training conditions, addressing key challenges in forest fire detection, such as scale variability, spatial coverage, and small-target recognition, thereby advancing the practical utility of early-warning systems.
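For reproducibility, the snippet below sketches one way to perform such an 8:1:1 random split. The directory layout, file extension, and random seed are illustrative assumptions rather than details taken from the paper.

```python
import random
import shutil
from pathlib import Path

def split_dataset(image_dir, out_dir, ratios=(0.8, 0.1, 0.1), seed=42):
    """Randomly split images into train/val/test folders at an 8:1:1 ratio."""
    images = sorted(Path(image_dir).glob("*.jpg"))   # assumed file extension
    random.Random(seed).shuffle(images)
    n = len(images)
    n_train, n_val = int(ratios[0] * n), int(ratios[1] * n)
    splits = {
        "train": images[:n_train],
        "val": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],
    }
    for name, files in splits.items():
        dst = Path(out_dir) / name
        dst.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy(f, dst / f.name)   # annotation files would be copied alongside
    return {k: len(v) for k, v in splits.items()}
```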

2.2. Optimization of YOLOv8 Model

2.2.1. Baseline Model Selection and Architecture

Forest fires frequently occur in remote regions with unstable network connectivity, rendering cloud-based data processing impractical due to transmission latency [48]. This necessitates the deployment of real-time fire detection models on edge devices, which are resource-constrained systems characterized by limited computational power, storage, and energy budgets. Direct deployment of uncompressed deep learning models on such devices risks failure due to excessive parameter sizes or computational loads, leading to operational inefficiencies and storage constraints. Thus, model lightweighting becomes imperative to meet the real-time demands of forest fire detection systems. As a single-stage object detection framework, YOLO offers significant speed advantages, making it ideal for real-time applications. Building upon the design principles of YOLOv3 and YOLOv5, YOLOv8 introduces model clusters of varying sizes to accommodate diverse detection requirements. To address lightweight requirements for edge deployment, this study selects YOLOv8n, the smallest and most computationally efficient variant in the YOLOv8 series, as the baseline model for optimization.
Figure 2 illustrates the YOLOv8 architecture, which comprises three core components. The backbone network, constructed using Conv–Batch normalization–SiLU (CBS) modules, is a vertical structure responsible for extracting multi-scale features from input images. The CBS module is highlighted in the top subfigure of Figure 2 with the green background, and it integrates three sequential layers: a convolutional layer (Conv), batch normalization (BN), and a Sigmoid-Weighted Linear Unit (SiLU) activation. The SiLU function ensures smooth gradient flow, mitigating vanishing gradient issues during training. Notably, the C2f module replaces YOLOv5’s C3 structure, optimizing gradient propagation through multi-branch residual connections. The lower subfigure in Figure 2 details the bottleneck structure within the C2f module. Additionally, the Spatial Pyramid Pooling Fast (SPPF) module enhances multi-scale feature fusion, improving feature representation capabilities. The neck network employs a hierarchical Bidirectional Feature Pyramid Network with Path Aggregation Network (BiFPN-PAN) architecture. This design facilitates bidirectional cross-scale feature interaction through top-down and bottom-up pathways, effectively fusing high-level semantic information with low-level spatial details to enhance robustness in multi-target detection. The head network adopts YOLOX’s decoupled head design [49], separating classification and regression tasks. Parallel prediction layers process targets across varying scales, significantly improving convergence efficiency and detection accuracy.
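As a point of reference for the CBS building block described above, here is a minimal PyTorch sketch of a Conv-BatchNorm-SiLU module. The class name and default kernel size are ours, and the real YOLOv8 implementation adds autopadding and layer-fusion logic not shown here.

```python
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + SiLU, the basic block of the YOLOv8 backbone."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                              padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()   # smooth activation, helps gradient flow

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```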

2.2.2. Introduce the SlimNeck Solution

The SlimNeck structure represents a novel network optimization technique [50], distinguished by its unique architectural design. By integrating GSConv and the efficient cross-stage partial network module VoV-GSCSP, SlimNeck achieves high-performance feature fusion while significantly reducing model complexity.
Figure 3 illustrates the structure of the VoV-GSCSP module, the core component of the SlimNeck solution. The VoV-GSCSP module's workflow proceeds as follows: The input feature with c1 channels is split into two branches. The main branch reduces dimensionality through a 1 × 1 convolution to c2/2 channels and sequentially passes through multiple GSBottleneck modules, while the shortcut branch undergoes direct 1 × 1 convolution. The outputs from both branches are concatenated along the channel dimension, followed by a final 1 × 1 convolution to adjust the channel count to the target c2. This design maintains model performance while reducing computational and parametric costs, enabling efficient feature extraction. The GSConv employed in VoV-GSCSP is a lightweight convolution method requiring approximately half the computations of standard convolution (SC). Through a channel shuffling strategy, GSConv retains richer semantic information, enhancing the expressive power of flame and smoke features while minimizing computational costs.
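To illustrate the GSConv idea of mixing a standard convolution with a cheap depthwise convolution and then shuffling channels, the following PyTorch sketch may help. The kernel sizes, the two-group shuffle, and the class layout are assumptions based on the SlimNeck description [50], not the authors' exact code.

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """Half the output channels come from a standard conv, half from a depthwise
    conv on top of it; the two halves are concatenated and channel-shuffled."""
    def __init__(self, c1, c2, k=1, s=1):
        super().__init__()
        c_ = c2 // 2
        self.dense = nn.Sequential(
            nn.Conv2d(c1, c_, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_), nn.SiLU())
        self.cheap = nn.Sequential(   # depthwise conv: very few FLOPs
            nn.Conv2d(c_, c_, 5, 1, 2, groups=c_, bias=False),
            nn.BatchNorm2d(c_), nn.SiLU())

    def forward(self, x):
        y1 = self.dense(x)
        y2 = self.cheap(y1)
        y = torch.cat((y1, y2), dim=1)            # (B, c2, H, W)
        b, c, h, w = y.shape                      # shuffle: interleave the two halves
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
```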
As illustrated in Figure 4, the SlimNeck architecture integrates the VoV-GSCSP module with GSConv and GSBottleneck components to form an efficient feature fusion network. The adoption of this lightweight solution enhances the YOLOv8n model’s detection accuracy for real-time forest fire target identification while simultaneously reducing model parameters and improving inference speed. By leveraging collaborative interactions among these modules, the architecture achieves a balance between computational efficiency and feature representation robustness, addressing critical challenges in edge-based wildfire detection systems.

2.2.3. Integrated MCA Attention Mechanism

In forest fire detection, the high similarity between flames, smoke, and the background, along with issues like target occlusion and low contrast interference, complicates accurate identification. The attention mechanism enhances the model's focus on fire points through dynamic weight distribution, suppresses other interference, and improves detection accuracy. MCA is a high-performance, lightweight attention module [51]. Its three-branch parallel architecture enables complementary feature interaction across channel (C), height (H), and width (W) dimensions. The core lies in its unique design that simultaneously learns complementary attention in these three dimensions, which boosts the model's spatial feature perception with almost no extra computation. Its formal mathematical expression is given in Equation (1), where $X \in \mathbb{R}^{C \times H \times W}$ is the input feature tensor, and $F_W$, $F_H$, and $F_C$ represent the attention functions in the width, height, and channel dimensions, respectively.
$\mathrm{MCA}(X) = \dfrac{1}{3}\left(F_W(X) + F_H(X) + F_C(X)\right)$    (1)
As shown in Figure 5, the MCA structure has three branches. The top-most width dimension branch uses tensor reshaping to focus on horizontal spread features like flames. The middle height dimension branch, similar to the width branch, captures vertical features such as smoke diffusion. The bottom channel dimension branch keeps the original channel structure. MCA’s attention mechanism has two key parts: squeeze and excitation transformations. The squeeze transformation adaptively fuses global average and standard deviation pooling features, enhancing feature discrimination. The excitation transformation then determines channel interaction coverage, focusing on large-size features and boosting small-size receptive fields, generating precise attention weights. At the far right of Figure 5, the three branches’ outputs are calibrated by attention weights and aggregated by simple averaging. This adaptively adjusts feature map weights, improving target localization and recognition.
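The sketch below gives a minimal PyTorch rendering of the three-branch averaging in Equation (1). It is only an illustration of the structure described here, not the authors' MCA implementation: the equally weighted fusion of average and standard-deviation pooling, the 1D-convolution excitation, and the class name MCASketch are simplifying assumptions.

```python
import torch
import torch.nn as nn

class MCASketch(nn.Module):
    """Width, height, and channel attention branches averaged as in Eq. (1)."""
    def __init__(self, k=3):
        super().__init__()
        # one lightweight excitation conv per branch
        self.excite = nn.ModuleList(
            nn.Conv1d(1, 1, k, padding=k // 2, bias=False) for _ in range(3))

    def _branch(self, x, conv):
        # x: (B, D, *, *) where dimension D plays the channel role
        flat = x.flatten(2)                                   # (B, D, N)
        s = 0.5 * (flat.mean(dim=2) + flat.std(dim=2))        # squeeze: avg + std pooling
        w = torch.sigmoid(conv(s.unsqueeze(1))).squeeze(1)    # excitation weights
        return x * w[:, :, None, None]

    def forward(self, x):                                     # x: (B, C, H, W)
        f_c = self._branch(x, self.excite[0])
        f_h = self._branch(x.permute(0, 2, 1, 3), self.excite[1]).permute(0, 2, 1, 3)
        f_w = self._branch(x.permute(0, 3, 2, 1), self.excite[2]).permute(0, 3, 2, 1)
        return (f_w + f_h + f_c) / 3.0                        # Equation (1)
```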
In our improved model, the MCA module is placed between the neck and head networks. Data pass through the MCA layer, enhancing feature representation with multi-scale context aggregation. This provides richer semantics with minimal computational cost, making the model focus more on key flame and smoke areas, improving detection accuracy, especially for small targets.

2.2.4. Introduced the MPDIoU Loss Function

MPDIoU is a novel approach for comparing bounding box similarities [52]. It transforms the loss calculation during model training into minimizing the distance between the model-inferred bounding box and the real labeled one, guiding faster model convergence. In the forest fire detection model, MPDIoU loss optimizes bounding box regression, enabling quicker convergence to accurate detection results and enhancing the model's accuracy for flame and smoke targets. The MPDIoU calculation proceeds as follows.
As shown in Figure 6, suppose there are two bounding boxes, Box1 (the predicted box) and Box2 (the true box), whose coordinates are as follows:
Box1: top-left corner $(x_1, y_1)$, bottom-right corner $(x_2, y_2)$; Box2: top-left corner $(x_1^{gt}, y_1^{gt})$, bottom-right corner $(x_2^{gt}, y_2^{gt})$. The MPDIoU calculation formula is shown in Equation (2):
$I = \max\left(0, \min(x_2, x_2^{gt}) - \max(x_1, x_1^{gt})\right) \times \max\left(0, \min(y_2, y_2^{gt}) - \max(y_1, y_1^{gt})\right)$
$U = (x_2 - x_1)(y_2 - y_1) + (x_2^{gt} - x_1^{gt})(y_2^{gt} - y_1^{gt}) - I$
$\mathrm{IoU} = \dfrac{I}{U}$
$d_1^2 = (x_1 - x_1^{gt})^2 + (y_1 - y_1^{gt})^2$
$d_2^2 = (x_2 - x_2^{gt})^2 + (y_2 - y_2^{gt})^2$
$\mathrm{MPDIoU} = \mathrm{IoU} - \dfrac{d_1^2}{w^2 + h^2} - \dfrac{d_2^2}{w^2 + h^2}$
$L_{\mathrm{MPDIoU}} = 1 - \mathrm{MPDIoU}$    (2)
where $w$ and $h$ denote the width and height of the input image [52].
MPDIoU offers a more precise loss metric than traditional IoU and its variants. By considering geometric factors through the coordinates of the upper-left and lower-right corners, MPDIoU simplifies the loss function’s calculation, reduces computational complexity, and enhances the model’s detection accuracy for fire and smoke targets. In this study, replacing the CIoU loss with MPDIoU during training improves bounding box regression, providing a clearer geometric interpretation.
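For readers who want to reproduce the regression loss, the following is a small PyTorch sketch of Equation (2). It is an illustrative implementation, not code from the paper; the function name mpdiou_loss and the assumption that boxes arrive as (x1, y1, x2, y2) tensors are ours, with w and h taken as the input image width and height as in the MPDIoU formulation [52].

```python
import torch

def mpdiou_loss(pred, target, img_w, img_h, eps=1e-7):
    """MPDIoU loss per Equation (2); boxes are (N, 4) tensors in
    (x1, y1, x2, y2) form, img_w/img_h are the input image size."""
    # plain IoU
    iw = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(min=0)
    ih = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(min=0)
    inter = iw * ih
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # squared distances between matching top-left and bottom-right corners
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2
    diag = img_w ** 2 + img_h ** 2
    mpdiou = iou - d1 / diag - d2 / diag
    return 1.0 - mpdiou          # L_MPDIoU, one value per box pair
```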

2.2.5. Improved YOLO Network Model YOLOv8n-SMMP

The architecture of the enhanced YOLOv8n-SMMP model is depicted in Figure 7. The lightweight neck network incorporates the SlimNeck framework, implemented through GSConv and VoV-GSCSP modules. GSConv replaces conventional convolution layers, reducing computational complexity and parameter redundancy. The VoV-GSCSP module substitutes the original C2f structure for feature extraction, maintaining detection accuracy while minimizing inference latency. Furthermore, the MCA mechanism is integrated between the neck and head networks. This module enhances feature representation by aggregating contextual information across channel, height, and width dimensions, significantly improving detection precision for flame and smoke targets with negligible computational overhead, particularly small-scale fire points. The synergistic integration of these components enables the model to capture richer spatial-semantic features, achieving a balance between efficiency and accuracy in real-time wildfire detection scenarios.

2.2.6. Pruning Algorithm Design

Model compression technology is crucial for deploying complex models on edge devices, as it reduces computational and storage demands, enabling efficient model operation. In the context of forest fire detection, this efficiency allows for quicker identification of fires, providing critical time savings for firefighting efforts. To further meet the real-time requirements of forest fire detection and adapt to edge devices with low computing power and low power consumption, this paper designs an improved channel pruning technique to compress the model further.
Channel pruning, a structured approach to model compression [53], leverages the scaling factor γ from the BN layer to identify and remove redundant channels. By converting the channel selection process into an optimizable sparsity problem, this method effectively streamlines the model. A key advantage of channel pruning is its ability to maintain the integrity of the model’s structure during the pruning process, avoiding the need for extensive architectural changes. Additionally, it does not rely on specialized hardware or complex software support, making it a simple and flexible solution that significantly reduces the technical and implementation costs associated with model lightweighting. Figure 8 is a schematic diagram of channel pruning. The process involves evaluating each channel’s contribution using its associated γ value. Channels with γ values below a certain threshold are deemed redundant and are removed, simplifying the model’s architecture. This approach not only reduces the model’s computational load but also ensures that the model’s performance remains intact, making it particularly suitable for resource-constrained environments.
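As a rough illustration of the γ-based selection described above, the sketch below ranks the channels behind a BatchNorm layer by the absolute value of their scaling factors and splits them at a threshold. The function name and the simple thresholding rule are assumptions for illustration; network slimming additionally trains with a sparsity penalty on γ, which is omitted here.

```python
import torch
import torch.nn as nn

def select_channels_by_gamma(bn: nn.BatchNorm2d, threshold: float):
    """Network-slimming style selection: channels whose BN scaling factor
    gamma is at or below the threshold are marked for removal."""
    gamma = bn.weight.detach().abs()
    keep = torch.nonzero(gamma > threshold).flatten().tolist()
    prune = torch.nonzero(gamma <= threshold).flatten().tolist()
    return keep, prune
```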
The process of the pruning algorithm designed in this paper is shown in Figure 9, which evaluates the importance of channels based on the L1 norm, removes redundant channels of the model, and retains key features through an iterative pruning strategy. The model proposed in this paper mainly consists of three core components. The backbone network acts as the “eyes” of the model and extracts basic visual features such as edges and textures from the input image through stacked convolutional layers. The neck network serves as a “bridge” between feature extraction and decision-making. It merges features from different levels (e.g., fine details from shallow layers and semantic patterns from deep layers) to enhance contextual understanding. The head network plays the role of the “brain” and makes the final decision. It uses the fused features from the neck to classify flame or smoke targets and locates their positions through bounding boxes. A selective pruning strategy is tailored to the structural characteristics of the improved model, focusing on convolutional layers in the Conv, C2f, SPPF, and Detect modules:
  • Backbone pruning: Redundant standard convolutional layers in repetitive CBS blocks are pruned without compromising feature extraction. For C2f modules, the output channels of the cv1 convolution in bottleneck layers are retained, while cv2 convolutional layers are pruned. Both cv1 and cv2 layers in SPPF modules are pruned;
  • Neck network pruning: A dependency graph is constructed to ensure channel alignment for cross-layer concatenation operations in GSConv and VoV-GSCSP modules, maintaining feature fusion consistency. The MCA attention layer between the neck and head networks is updated to preserve channel coherence, ensuring post-pruning functionality;
  • Head network pruning: Parallel convolutional layers in classification and regression heads are pruned synchronously to maintain task decoupling.
In addition, a dependency graph is constructed to keep the channel dimensions of the model's cross-layer connections consistent after pruning, and the MCA attention layer receives special handling so that pruning does not break the attention mechanism.
The pruning process designed in this paper is divided into three stages. First, a dependency graph is constructed to describe the topological relationships between the layers of the model and to ensure the structural integrity of the model after pruning. The dependency graph module from the torch_pruning library is utilized to automatically analyze the topological relationships between the layers and guarantee channel dimension consistency in cross-layer connections after pruning. Then, the importance of each channel is evaluated. The $L_1$ norm is selected as the channel importance index in this paper, and its calculation formula is shown in Equation (3).
$S_c = \sum_{k=1}^{K} \sum_{h=1}^{H} \sum_{w=1}^{W} \left| W_{k,h,w}^{c} \right|$    (3)
Here, $W_{k,h,w}^{c}$ denotes the weight tensor of the $c$-th channel, and $K$, $H$, and $W$ are the number of convolution kernels, the kernel height, and the kernel width, respectively. A lower channel importance score $S_c$ indicates that the channel is more redundant. Finally, iterative pruning and recovery are carried out. To avoid a sudden drop in accuracy caused by excessive pruning, a gradual pruning strategy is adopted, and the model is fine-tuned after each round of pruning. During fine-tuning, part of the backbone is frozen and only the neck and head are trained; when the pruned model is saved after each round, the channel dimensions of the attention layer are updated so that the attention mechanism remains intact.
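A minimal PyTorch sketch of the channel scoring in Equation (3) is given below; it scores each output channel of a convolution by the L1 norm of its weights and returns the lowest-scoring indices for one pruning round. The function names and the prune_ratio parameter are illustrative assumptions; the dependency-graph propagation and fine-tuning steps handled by torch_pruning in the paper are omitted.

```python
import torch
import torch.nn as nn

def channel_l1_scores(conv: nn.Conv2d) -> torch.Tensor:
    """Equation (3): L1 norm of each output channel's filter weights."""
    # conv.weight has shape (out_channels, in_channels, kH, kW)
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

def least_important_channels(conv: nn.Conv2d, prune_ratio: float):
    """Indices of the lowest-scoring channels to drop in one pruning round."""
    scores = channel_l1_scores(conv)
    n_prune = int(prune_ratio * scores.numel())
    return torch.argsort(scores)[:n_prune].tolist()
```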

3. Results and Discussion

3.1. Evaluation Metrics

To verify the performance of the model improved in this paper, the following metrics are used for evaluation.
Mean average precision (mAP) [54] is a crucial metric for evaluating a model’s overall target detection accuracy across different categories. Its calculation formula is shown in Equation (4). In this formula, TP denotes the number of regions correctly identified by the model as containing flame or smoke targets. FP refers to the number of regions incorrectly identified as containing these targets when they are not present. FN indicates the number of regions with actual flame or smoke targets that the model fails to detect. C represents the number of target categories the model detects; in this study, C is two, corresponding to smoke and fire. mAP@0.5 specifically denotes the model’s average detection accuracy for smoke and fire when the Intersection over Union (IoU) threshold is set at 0.5.
$P = \dfrac{TP}{TP + FP}$
$R = \dfrac{TP}{TP + FN}$
$AP = \int_{0}^{1} P(R)\, dR$
$mAP = \dfrac{1}{C} \sum_{i=1}^{C} AP_i$    (4)
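The sketch below shows how these quantities can be computed once detections have been matched to ground truth. It is an illustration rather than a full mAP@0.5 evaluator, which would also sweep confidence thresholds and assign TP/FP by IoU ≥ 0.5; the function names are ours.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision and recall from detection counts, as in Equation (4)."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    return p, r

def average_precision(recalls, precisions):
    """AP as the area under the precision-recall curve (Equation (4))."""
    order = np.argsort(recalls)
    return float(np.trapz(np.asarray(precisions)[order], np.asarray(recalls)[order]))

def mean_ap(ap_per_class):
    """mAP over C classes; here C = 2 (fire and smoke)."""
    return sum(ap_per_class) / len(ap_per_class)
```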
Frames per second (FPS) is used in object detection tasks as a measure of the inference speed of the model, which indicates how many images the model can detect per second. Its calculation formula is shown in Equation (5), where $N$ is the total number of test images and $t_i$ represents the inference time for the model to process the $i$-th image.
$FPS = \dfrac{N}{\sum_{i=1}^{N} t_i}$    (5)
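A straightforward timing sketch of Equation (5) follows; the helper name is ours, and any callable detector can stand in for model.

```python
import time

def measure_fps(model, images):
    """FPS per Equation (5): number of images over total inference time."""
    total = 0.0
    n = 0
    for img in images:
        t0 = time.perf_counter()
        _ = model(img)                      # single-image inference
        total += time.perf_counter() - t0
        n += 1
    return n / total
```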
Parameter count (Params) refers to the total number of trainable parameters in a model, serving as a measure of spatial complexity and reflecting the storage requirements during model inference.
Computational complexity quantifies the arithmetic operations required for model inference, specifically represented as the total floating-point operations (FLOPs). In this study, Giga FLOPs (GFLOPs)—billions of floating-point operations—are adopted as the standardized unit to evaluate computational workload during model inference experiments.

3.2. Experimental Environment and Parameter Setting

3.2.1. Experimental Environment

The main configuration of the training environment used in this paper is shown in Table 1, and NVIDIA's parallel computing platform CUDA is used to accelerate model training on the GPU.

3.2.2. Experimental Parameter Setting

Figure 10 presents the loss curves during YOLOv8n-SMMP model training, with training epochs on the x-axis and loss values on the y-axis. By 100 epochs, the training losses have largely converged, and continued training risks overfitting. The bounding box loss and classification loss on the validation set almost fully converge at 100 epochs, and the bounding box loss shows an increasing trend afterwards. The distribution focal loss reaches its lowest point at about 70 epochs, and further training enlarges the model's classification loss. Because continued training may therefore lead to overfitting, training is stopped at 100 epochs, and the subsequent experiments under different conditions are likewise run for 100 epochs. Training phase: imgsz = 640, batch size = 16, epochs = 100; fine-tuning stage: imgsz = 640, batch size = 8, epochs = 50, EMA decay rate = 0.9999.
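For orientation, the following sketch shows how these settings map onto the Ultralytics training API. The dataset YAML path, model configuration name, and pruned-checkpoint path are illustrative assumptions, and the EMA decay is managed internally by the framework rather than passed as an argument here.

```python
from ultralytics import YOLO

# Training phase: imgsz = 640, batch size = 16, 100 epochs
model = YOLO("yolov8n.yaml")                 # baseline architecture (assumed config name)
model.train(data="forest_fire.yaml", imgsz=640, batch=16, epochs=100)

# Fine-tuning after a pruning round: imgsz = 640, batch size = 8, 50 epochs
pruned = YOLO("runs/prune/pruned.pt")        # hypothetical pruned checkpoint
pruned.train(data="forest_fire.yaml", imgsz=640, batch=8, epochs=50)
```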

3.3. Experimental Results

3.3.1. Ablation Experiment

To verify the effectiveness of the improved modules presented in this paper, multiple experiments were conducted on the same dataset. As summarized in Table 2, the lightweight YOLOv8n-SMMP model, incorporating all enhancements, achieves a comprehensive detection accuracy (mAP@0.5) of 67.5% for flame and smoke targets, with 2.08M parameters and 5.4 GFLOPs computational complexity. The model attains a real-time inference speed of 82.6 FPS, fulfilling the stringent requirements of forest fire detection tasks. Compared to the baseline YOLOv8n, the optimized model demonstrates a 3.3% improvement in mAP@0.5, alongside a 31% reduction in parameters (0.93M fewer) and a 33% decrease in computational load (2.7 GFLOPs reduction).
Specifically, the individual improvement points of the YOLO model proposed in this paper play distinct roles, and the combination of these improvements demonstrates a synergistic optimization effect that surpasses the simple additive effect of single-point usage.
When the MCA attention mechanism is added alone, the detection accuracy metric mAP@0.5 increases by 1.7%, with almost no change in the number of parameters and computational complexity, though the model's inference speed decreases. This indicates that incorporating the MCA mechanism between the neck and head networks effectively enhances the model's detection accuracy for flame and smoke targets. The feature map from the neck network, after processing by the MCA module, decomposes fused features into width, height, and channel dimensions. Width-based processing enables the model to more accurately capture key horizontal features, aligning with the horizontal dynamics of forest fire spread and improving the accuracy of identifying dynamically changing flame targets. Height-based branch processing increases the model's sensitivity to vertical smoke diffusion in images. Combining this with unchanged channel dimension features retains original semantic information, enriching the feature map and improving detection accuracy. The decrease in inference speed is due to the MCA mechanism's additional computation from three-branch processing, leading to a slight slowdown.
Following the introduction of the SlimNeck lightweight scheme, the ablation data show that the model's parameter count is reduced by approximately 5% and the computational load decreases by 0.7 GFLOPs. This suggests that the lightweight module effectively reduces the structural complexity of the model by optimizing the convolutional computation process. At the same time, both detection accuracy and speed improve slightly, implying that SlimNeck preserves the baseline detection accuracy while GSConv lowers the complexity of convolutional computation through its mixed feature output. Moreover, the VoV-GSCSP module, a high-performance lightweight cross-stage partial network, strengthens the model's feature extraction capability, leading to a 0.4% improvement in comprehensive detection accuracy for flame and smoke targets. These findings indicate that this approach can holistically enhance model performance and contribute positively to both detection accuracy and inference speed.
Replacing YOLOv8n’s loss function from CIoU to MPDIoU alone increases the model’s comprehensive detection accuracy mAP@0.5 by 2.5%, showing MPDIoU’s effectiveness in improving detection accuracy for flame and smoke targets. The number of model parameters and computational complexity remain almost unchanged, with the model’s detection speed increasing by 0.7 FPS, indicating MPDIoU enhances accuracy without added complexity and slightly improves processing speed through simplified loss calculations.
The improved channel pruning algorithm, when applied to YOLOv8n, reduces model parameters from 3.01 M to 2.16 M and computational complexity from 8.1 GFLOPs to 5.5 GFLOPs, while boosting inference speed from 62.3 FPS to 76.1 FPS and increasing detection accuracy to 64.7%. The results show the proposed channel pruning technique significantly reduces parameters and computation, maintaining detection accuracy.
In summary, combining SlimNeck, MCA, MPDIoU, and the improved channel pruning technology reduces YOLOv8n’s parameters by 31% and computation by 33% compared to the original model, while improving detection accuracy. This significantly enhances the YOLO model’s real-time detection capability on edge devices, meeting practical forest fire detection requirements.

3.3.2. Comparative Experiment

To quantify the superiority of the proposed YOLOv8n-SMMP model in real-time forest fire detection, we conducted a comprehensive evaluation against mainstream lightweight YOLO series, including YOLOv7-tiny, YOLOv8n, YOLOv8s, and YOLOv11n. The generalization capacity of the model was rigorously assessed through hold-out validation, cross-model comparisons, and multi-metric consistency analysis. The dataset, comprising 2603 images with diverse environmental and lighting conditions, was partitioned into training, validation, and test sets. This stratification ensured that the model’s performance on unseen data could be objectively evaluated.
As shown in Table 3, YOLOv8n-SMMP outperformed all baseline models, achieving the highest mAP@0.5 score of 67.5% while maintaining the lowest parameter count (2.08M) and computational complexity (5.4 GFLOPs). Compared to the original YOLOv8n, the optimized model demonstrated a 3.3% improvement in detection accuracy, alongside a 31% reduction in parameters and a 33% decrease in computational load.
To demonstrate the enhanced recognition capabilities of the YOLOv8n-SMMP model compared to the original YOLOv8n in forest fire real-time detection tasks, this study conducted comprehensive comparative experiments between the improved and original models. Figure 11 presents the precision–recall curves for the original YOLOv8n and the improved YOLOv8n-SMMP models. The improved model demonstrates enhanced detection accuracy for both flame and smoke targets, with the mean average precision at 0.5 IoU (mAP@0.5) increasing by 3% for flame targets and 3.4% for smoke targets, representing an overall improvement of 3.3% in comprehensive detection accuracy. The precision–recall curves further highlight the model's enhanced generalization, reflecting robust performance across varying target scales and environmental conditions.
Figure 12 illustrates the F1 curves for both models. The improved model achieves an F1 score of 0.66 at a confidence threshold of 0.285, marking a 0.06 increase in F1 score under nearly identical confidence conditions. This improvement underscores the model's balanced precision–recall trade-off, even under dynamic wildfire scenarios.
Figure 13 and Figure 14 present a qualitative comparison of detection performance between the baseline YOLOv8n and the optimized YOLOv8n-SMMP model on identical forest fire validation images. The enhanced model demonstrates superior precision in identifying smoke and flame targets, particularly for small-scale fire points, with significantly higher confidence scores than the baseline. For instance, in the 12th image of Figure 13, the baseline model generated a false positive detection (fire: 0.4) for a flame-like region in the upper-left corner due to insufficient attention to contextual features. In contrast, YOLOv8n-SMMP accurately localized both flames and smoke in this region by leveraging the MCA mechanism’s ability to suppress environmental interference through cross-dimensional feature discrimination in the 12th image of Figure 14. Notably, in the 15th image of Figure 14, the improved model correctly distinguished clouds from smoke by analyzing texture patterns, achieving a smoke confidence score of 0.6 compared to the baseline’s 0.5. This result highlights the model’s enhanced environmental robustness, enabled by the MCA module’s capacity to capture discriminative spatial and channel-wise features.
While the optimized model exhibits marked improvements, certain detections under low-contrast conditions display reduced confidence scores (e.g., fire: 0.3, smoke: 0.3–0.4). For example, in the upper-left region of Figure 14, the model’s detection accuracy for large-scale smoke decreased due to blurred feature boundaries caused by atmospheric scattering. Future work will address this limitation by integrating adaptive receptive field modules to enhance large-scale feature representation.
Despite these edge cases, YOLOv8n-SMMP maintains robust performance across diverse scenarios. As evidenced in Table 3, the model consistently detects both expansive fire zones and subtle smoke plumes, outperforming other lightweight YOLO variants by 3.3–5.1 percentage points in mAP@0.5 for forest fire detection tasks.
To further emphasize the strengths of YOLOv8n-SMMP, we compare it with two recent and representative forest fire detection models: YOLO-LFD [36] and DSS-YOLO [40]. Figure 15 illustrates the comparative experimental results of the three models in terms of four key parameters. DSS-YOLO introduces dual-stream attention and bi-directional feature fusion to enhance detection accuracy, achieving an mAP@0.5 of 66.2% with 3.2M parameters and 7.9 GFLOPs. However, its structural complexity increases inference latency, limiting suitability for edge deployment. In contrast, YOLOv8n-SMMP achieves 67.5% mAP@0.5 with only 2.08M parameters and 5.4 GFLOPs, offering a better trade-off between precision and efficiency. Similarly, YOLO-LFD leverages Ghost modules and a simplified PANet structure to compress the model and achieves 65.3% mAP@0.5 with 2.42M parameters. However, YOLOv8n-SMMP outperforms it in both accuracy and real-time speed (82.6 FPS vs. ~70 FPS). These results demonstrate that YOLOv8n-SMMP achieves state-of-the-art detection performance while significantly reducing model size and computational burden, thus better satisfying the demands of real-world forest fire monitoring on low-power embedded devices.

4. Conclusions

This study addresses the critical challenge of balancing detection accuracy and real-time performance in forest fire monitoring by introducing YOLOv8n-SMMP, an optimized lightweight model based on the YOLO framework. Architectural enhancements, including the SlimNeck module for efficient feature fusion, the MCA mechanism for spatial-channel focus, and the MPDIoU loss function for precise bounding box regression, collectively improve the model’s detection precision while reducing computational demands. Compared to the baseline YOLOv8n, the proposed model achieves a 3.3% increase in mAP@0.5 alongside a 31% reduction in parameters and a 33% reduction in computational complexity (from 8.1 to 5.4 GFLOPs). With an inference speed of 82.6 frames per second, the model demonstrates practical viability for deployment on edge devices such as drones and surveillance cameras, enhancing early wildfire detection capabilities.
The study acknowledges several limitations that warrant further investigation. First, the model exhibits reduced accuracy in detecting low-contrast smoke under complex environmental conditions, such as haze or overlapping cloud textures. Future efforts will focus on integrating adaptive contrast enhancement algorithms to improve feature discrimination in such scenarios. Second, while the model excels in controlled experiments, its performance in real-world fire scenarios remains untested. Planned field deployments will assess its stability and generalization across diverse terrains and weather conditions, with operational data guiding iterative refinements. Third, the current pruning strategy employs fixed rates, which may compromise shallow features critical for small-target detection. Future work will explore dynamic pruning mechanisms combined with knowledge distillation to achieve higher compression ratios without sacrificing accuracy.

Author Contributions

Conceptualization, N.Z.; software, N.Z.; supervision, Z.Z.; visualization, N.Z. and D.G.; writing–original draft, N.Z.; writing—review and editing, N.Z. and D.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Frontier Technologies R&D Program of Jiangsu, grant number BF2024060.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets analyzed in this study are not publicly available due to privacy and ethical restrictions. De-identified data can be made available upon reasonable request from the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
YOLOv8n-SMMP: YOLOv8n-SlimNeck–MCA–MPDIoU–Pruned
MCA: Multi-dimensional Collaborative Attention
GSConv: Group Shuffling Convolution
VoV-GSCSP: VoV-based GSConv Cross-Stage Partial Network
MPDIoU: Minimum Point Distance Intersection over Union
CBS: Conv–Batch normalization–SiLU module
C2f: Cross-convolution with 2 filters
GFLOPs: Giga floating-point operations

References

  1. Anderegg, W.R.L.; Trugman, A.T.; Badgley, G.; Anderson, C.M.; Bartuska, A.; Ciais, P.; Cullenward, D.; Field, C.B.; Freeman, J.; Goetz, S.J.; et al. Climate-driven risks to the climate mitigation potential of forests. Science 2020, 368, eaaz7005. [Google Scholar] [CrossRef] [PubMed]
  2. Gao, Y.; Huang, W.; Xu, R.; Gasevic, D.; Liu, Y.; Yu, W.; Yu, P.; Yue, X.; Zhou, G.; Zhang, Y.; et al. Association between long-term exposure to wildfire-related PM2.5 and mortality: A longitudinal analysis of the UK Biobank. J. Hazard. Mater. 2023, 457, 131779. [Google Scholar] [CrossRef] [PubMed]
  3. Alkhatib, A.A.A. A Review on Forest Fire Detection Techniques. Int. J. Distrib. Sens. Netw. 2014, 10, 597368. [Google Scholar] [CrossRef]
  4. Bao, S.; Xiao, N.; Lai, Z.; Zhang, H.; Kim, C. Optimizing watchtower locations for forest fire monitoring using location models. Fire Saf. J. 2015, 71, 100–109. [Google Scholar] [CrossRef]
  5. Dang-Ngoc, H.; Nguyen-Trung, H. Aerial Forest Fire Surveillance—Evaluation of Forest Fire Detection Model using Aerial Videos. In Proceedings of the 2019 International Conference on Advanced Technologies for Communications (ATC), Hanoi, Vietnam, 17–19 October 2019; pp. 142–148. [Google Scholar]
  6. Sherstjuk, V.; Zharikova, M.; Sokol, I. Forest Fire Monitoring System Based on UAV Team, Remote Sensing, and Image Processing. In Proceedings of the 2018 IEEE Second International Conference on Data Stream Mining & Processing (DSMP), Lviv, Ukraine, 21–25 August 2018; pp. 590–594. [Google Scholar]
  7. Fralenko, V.P. Neural Network Methods for Detecting Wild Forest Fires. Sci. Tech. Inf. Process. 2024, 51, 497–505. [Google Scholar] [CrossRef]
  8. Kim, Y.; Kim, B.-R.; Park, S. Synergistic use of multi-satellite remote sensing to detect forest fires: A case study in South Korea. Remote Sens. Lett. 2023, 14, 491–502. [Google Scholar] [CrossRef]
  9. Güney, C.O.; Mert, A.; Gülsoy, S. Assessing fire severity in Turkey’s forest ecosystems using spectral indices from satellite images. J. For. Res. 2023, 34, 1747–1761. [Google Scholar] [CrossRef]
  10. Zhang, J.; Li, W.; Yin, Z.; Liu, S.; Guo, X. Forest fire detection system based on wireless sensor network. In Proceedings of the 2009 4th IEEE Conference on Industrial Electronics and Applications, Xi’an, China, 25–27 May 2009; pp. 520–523. [Google Scholar]
  11. Sridhar, P.; Thangavel, S.K.; Parameswaran, L.; Oruganti, V.R.M. Fire Sensor and Surveillance Camera-Based GTCNN for Fire Detection System. IEEE Sens. J. 2023, 23, 7626–7633. [Google Scholar] [CrossRef]
  12. Lin, M.X.; Chen, W.L.; Liu, B.S.; Hao, L.N. An Intelligent Fire-Detection Method Based on Image Processing. Adv. Eng. Forum 2011, 2–3, 172–175. [Google Scholar] [CrossRef]
  13. Wei, R.; Yan, R.; Qu, H.; Li, X.; Ye, Q.; Fu, L. SVMFN-FSAR: Semantic-Guided Video Multimodal Fusion Network for Few-Shot Action Recognition. Big Data Min. Anal. 2025, 8, 534–550. [Google Scholar] [CrossRef]
  14. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014. [Google Scholar] [CrossRef]
  15. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV 2015), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  16. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  17. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar] [CrossRef]
  18. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2016; Volume 9905, pp. 21–37. ISBN 978-3-319-46447-3. [Google Scholar]
  19. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar] [CrossRef]
  20. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525. [Google Scholar]
  21. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar] [CrossRef]
  22. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar] [CrossRef]
  23. Khanam, R.; Hussain, M. What is YOLOv5: A deep look into the internal features of the popular object detector. arXiv 2024, arXiv:2407.20892. [Google Scholar] [CrossRef]
  24. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv 2022, arXiv:2209.02976. [Google Scholar] [CrossRef]
  25. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696. [Google Scholar] [CrossRef]
  26. Varghese, R.; Sambath, M. YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; pp. 1–6. [Google Scholar]
  27. Wang, C.-Y.; Yeh, I.-H.; Mark Liao, H.-Y. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. In Proceedings of the Computer Vision—ECCV 2024; Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G., Eds.; Springer Nature: Cham, Switzerland, 2025; pp. 1–21. [Google Scholar]
  28. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011. [Google Scholar]
  29. Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar] [CrossRef]
  30. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  31. Frizzi, S.; Bouchouicha, M.; Ginoux, J.; Moreau, E.; Sayadi, M. Convolutional neural network for smoke and fire semantic segmentation. IET Image Process. 2021, 15, 634–647. [Google Scholar] [CrossRef]
  32. Bohush, R.P.; Ablameyko, S.V. Algorithm for forest fire smoke detection in video. J. Belarusian State Univ. Math. Inform. 2021, 1, 91–101. [Google Scholar] [CrossRef]
  33. Peng, Y.; Wang, Y. Real-time forest smoke detection using hand-designed features and deep learning. Comput. Electron. Agric. 2019, 167, 105029. [Google Scholar] [CrossRef]
  34. Basturk, N.S. Forest fire detection in aerial vehicle videos using a deep ensemble neural network model. Aircr. Eng. Aerosp. Technol. 2023, 95, 1257–1267. [Google Scholar] [CrossRef]
  35. Hou, Z.; Yang, C.; Sun, Y.; Ma, S.; Yang, X.; Fan, J. An object detection algorithm based on infrared-visible dual modal feature fusion. Infrared Phys. Technol. 2024, 137, 105107. [Google Scholar] [CrossRef]
  36. Wang, H.; Zhang, Y.; Zhu, C. YOLO-LFD: A Lightweight and Fast Model for Forest Fire Detection. Comput. Mater. Contin. 2025, 82, 3399–3417. [Google Scholar] [CrossRef]
  37. Liu, H.; Zhu, J.; Xu, Y.; Xie, L. Mcan-YOLO: An Improved Forest Fire and Smoke Detection Model Based on YOLOv7. Forests 2024, 15, 1781. [Google Scholar] [CrossRef]
  38. Wang, Z.; Xu, L.; Chen, Z. FFD-YOLO: A modified YOLOv8 architecture for forest fire detection. Signal Image Video Process. 2025, 19, 265. [Google Scholar] [CrossRef]
  39. Yun, B.; Zheng, Y.; Lin, Z.; Li, T. FFYOLO: A Lightweight Forest Fire Detection Model Based on YOLOv8. Fire 2024, 7, 93. [Google Scholar] [CrossRef]
  40. Wang, H.; Fu, X.; Yu, Z.; Zeng, Z. DSS-YOLO: An improved lightweight real-time fire detection model based on YOLOv8. Sci. Rep. 2025, 15, 8963. [Google Scholar] [CrossRef]
  41. Han, S.; Mao, H.; Dally, W.J. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv 2016, arXiv:1510.00149. [Google Scholar] [CrossRef]
  42. Courbariaux, M.; Bengio, Y.; David, J.-P. BinaryConnect: Training Deep Neural Networks with binary weights during propagations. Adv. Neural Inf. Process. Syst. 2015, 28, 3123–3131. [Google Scholar]
  43. Rastegari, M.; Ordonez, V.; Redmon, J.; Farhadi, A. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2016; Volume 9908, pp. 525–542. ISBN 978-3-319-46492-3. [Google Scholar]
  44. Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
  45. Fire Detection Dataset. Available online: https://www.kaggle.com/datasets/atulyakumar98/test-dataset (accessed on 2 April 2025).
  46. Shamsoshoara, A.; Afghah, F.; Razi, A.; Zheng, L.; Fulé, P. Aerial Images for Pile Fire Detection Using Drones (UAVs) 2020; IEEE DataPort: Porto, Portugal, 2020. [Google Scholar] [CrossRef]
  47. Russell, B.C.; Torralba, A.; Murphy, K.P.; Freeman, W.T. LabelMe: A Database and Web-Based Tool for Image Annotation. Int. J. Comput. Vis. 2008, 77, 157–173. [Google Scholar] [CrossRef]
  48. Wu, H.; Wu, D.; Zhao, J. An intelligent fire detection approach through cameras based on computer vision methods. Process Saf. Environ. Prot. 2019, 127, 245–256. [Google Scholar] [CrossRef]
  49. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar] [CrossRef]
  50. Li, H.; Li, J.; Wei, H.; Liu, Z.; Zhan, Z.; Ren, Q. Slim-neck by GSConv: A lightweight-design for real-time detector architectures. J. Real-Time Image Process. 2024, 21, 62. [Google Scholar] [CrossRef]
  51. Yu, Y.; Zhang, Y.; Cheng, Z.; Song, Z.; Tang, C. MCA: Multidimensional collaborative attention in deep convolutional neural networks for image recognition. Eng. Appl. Artif. Intell. 2023, 126, 107079. [Google Scholar] [CrossRef]
  52. Ma, S.; Xu, Y. MPDIoU: A Loss for Efficient and Accurate Bounding Box Regression. arXiv 2023, arXiv:2307.07662. [Google Scholar] [CrossRef]
  53. Liu, Z.; Li, J.; Shen, Z.; Huang, G.; Yan, S.; Zhang, C. Learning Efficient Convolutional Networks through Network Slimming. arXiv 2017, arXiv:1708.06519. [Google Scholar] [CrossRef]
  54. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
Figure 1. Data distribution. (a) Darker blue represents the flame label, and lighter blue represents the smoke label; the dataset contains over 4000 flame samples and approximately 2000 smoke samples, with smoke targets representing a smaller proportion; (b) the training set exhibits a predominance of small targets, complemented by multi-scale smoke data, which enhances the model's generalization capability across varying fire scenarios; (c) the spatial distribution of bounding box center points demonstrates balanced coverage across the entire image space, validating the dataset's comprehensive spatial representation; (d) over 50% of targets exhibit aspect ratios below 16%, a feature that improves the model's sensitivity to small fire points, critical for early wildfire detection in practical applications.
Figure 2. The YOLOv8 architecture consists of three parts: the backbone on the left for feature extraction, the neck in the middle for multi-scale fusion, and the head on the right for classification and regression. The orange circles labeled “c” in the figure represent the concatenation (Concat) operation for feature aggregation, orange circles labeled “a” represent the residual join (Add) operation, and the large arrow to the right means “represent”.
Figure 3. Lightweight high-performance cross-stage partial network VoV-GSCSP structure.
Figure 4. SlimNeck architecture.
Figure 5. MCA attention mechanism, which processes feature interactions across channel (C), height (H), and width (W) dimensions through three parallel branches. This architecture enhances spatial feature perception by adaptively aggregating global and local contextual information, while minimizing computational overhead.
Figure 6. Minimum Point Distance Intersection over Union (MPDIoU) computation example. The green bounding box (Box1, predicted) and blue bounding box (Box2, ground truth) illustrate the geometric relationship for MPDIoU calculation.
Figure 7. YOLOv8n-SMMP network framework, illustrating the optimized architecture integrating SlimNeck, MCA, and lightweight feature fusion module VoV-GSCSP for efficient and precise forest fire detection.
Figure 8. Schematic diagram of channel pruning, where the blue plane represents the channels whose value of the scaling factor γ is greater than the preset threshold constant c during training, and the orange plane represents the channels whose scaling factor γ is less than the threshold.
Figure 9. Structured pruning framework, demonstrating the iterative channel evaluation, dependency-aware pruning, and layer-specific optimization for maintaining model performance. The dotted arrows indicate how the relevant parts are handled in the pruning strategy.
Figure 10. Loss plot during YOLOv8n-SMMP model training. (ac) Loss curves of the training set during training, and (df) loss curves of the validation set during training.
Figure 11. Comparison of precision–recall (P–R) curves before and after improvement. (a) P–R curve of YOLOv8n model's detection effect on two types of targets: fire and smoke; (b) P–R curves of YOLOv8n-SMMP.
Figure 12. Comparison of F1 curves before and after improvement. (a) F1 curve of YOLOv8n model’s detection effect on two types of targets: fire and smoke; (b) F1 curves of YOLOv8n-SMMP.
Figure 13. Detection results of YOLOv8n model on part of the forest fire test set images.
Figure 14. The detection results of the improved YOLOv8n-SMMP model on part of the forest fire test set images in this paper.
Figure 15. Comparative performance analysis of the proposed YOLOv8n-SMMP against state-of-the-art lightweight forest fire detection models YOLO-LFD [36] and DSS-YOLO [40] across four critical metrics: (a) mean average precision (mAP@0.5), (b) model size (Params), (c) computational complexity (GFLOPs), and (d) inference speed (FPS).
Table 1. Experimental environment configuration.
Experimental Environment | Type
CPU | Intel Core i7-11700
GPU | NVIDIA GeForce GTX 4080
Memory | 24 GB
Operating system | Linux Ubuntu 20.04
Deep learning framework | PyTorch 1.11
Expansion pack | CUDA 11.3, cuDNN 8.0.4, OpenCV 4.6.0.6, Torch_Pruning, etc.
IDE | PyCharm
Table 2. Results of ablation experiments.
YOLOv8n | MCA | SlimNeck | MPDIoU | Prune | mAP@0.5 (%) | Params/10⁶ | GFLOPs | FPS
✓ |   |   |   |   | 64.2 | 3.01 | 8.1 | 62.3
✓ | ✓ |   |   |   | 65.9 | 3.06 | 8.1 | 60.8
✓ |   | ✓ |   |   | 64.6 | 2.82 | 7.4 | 63.5
✓ |   |   | ✓ |   | 66.7 | 3.01 | 8.1 | 63.0
✓ |   |   |   | ✓ | 64.7 | 2.16 | 5.5 | 76.1
✓ | ✓ | ✓ |   |   | 66.8 | 2.88 | 7.4 | 62.1
✓ | ✓ | ✓ | ✓ |   | 67.0 | 2.88 | 7.4 | 69.1
✓ | ✓ | ✓ | ✓ | ✓ | 67.5 | 2.08 | 5.4 | 82.6
Table 3. Results of comparative experiments.
Model | mAP@0.5 (%) | Params/10⁶ | GFLOPs
YOLOv8n [26] | 64.2 | 3.01 | 8.1
YOLOv8s [26] | 63.6 | 11.2 | 28.8
YOLOv7-tiny [25] | 62.4 | 6.02 | 13.2
YOLOv11n [29] | 63.7 | 2.6 | 6.6
YOLOv8n-SMMP | 67.5 | 2.08 | 5.4
