RLFNet: A Real-Time Lightweight Network for Forest Fire Detection on Edge Devices

Huang, Zhengshen; Kou, Weili; Zheng, Chen; Di, Guangzhi; Zhang, Qixing; Ma, Chenhao

doi:10.3390/rs18101543

by Zhengshen Huang ¹, Weili Kou ¹,*, Chen Zheng ², Guangzhi Di ¹, Qixing Zhang ³ and Chenhao Ma ¹

¹ College of Big Data and Intelligent Engineering, Southwest Forestry University, Kunming 650224, China

² School of Mathematics and Statistics, Henan University, Kaifeng 745004, China

³ State Key Laboratory of Fire Science, University of Science and Technology of China, Hefei 230027, China

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(10), 1543; https://doi.org/10.3390/rs18101543

Submission received: 3 March 2026 / Revised: 17 April 2026 / Accepted: 11 May 2026 / Published: 13 May 2026

(This article belongs to the Special Issue Forest Fire Monitoring Using Remotely Sensed Imagery)

Highlights

What are the main findings?

RLFNet, a real-time lightweight network, is well suited to deployment on edge devices for real-time forest fire detection. On our self-constructed dataset, it achieves an mAP50 of 76.5% at 224.8 FPS with only 1.9 million parameters and 5.0 GFLOPs, striking a strong balance between accuracy and efficiency.
The model improves detection accuracy and inference speed while reducing parameter count and computational complexity by integrating three purpose-built modules: a Parallel Multi-Scale Extraction Block (PMEB), a Bidirectional Cross Fusion Module (BCFM), and a Faster Inference Detection Head (FIDH). In addition, a pruning strategy is applied to further compress the model.

What are the implications of the main findings?

The proposed model provides a practical, deployable solution for real-time forest fire detection on resource-constrained edge devices (e.g., UAVs, robots, and cameras), ensuring high detection accuracy and robust performance in complex forest environments.
The results demonstrate that the collaborative lightweight design of the backbone, neck, and head networks, combined with adaptive pruning, effectively resolves the trade-off between high accuracy and computational efficiency seen in existing models.

Abstract

Forest fires cause severe ecological and economic losses, so timely and accurate detection is crucial for effective prevention and control. Edge devices equipped with intelligent algorithms can detect forest fires in real time. Current deep learning algorithms can achieve high accuracy, but they are ill-suited to edge devices because they require substantial computing resources. To address this issue, this study proposes a real-time lightweight forest fire detection network (RLFNet), built on YOLOv11n, with three key enhancements to the backbone, neck, and head: (1) a Parallel Multi-Scale Extraction Block (PMEB) improves C3k2 with a dual-branch parallel strategy to enhance multi-scale feature extraction efficiency; (2) a Bidirectional Cross Fusion Module (BCFM) replaces simple Concat with a context-aware cross-gating mechanism to suppress background noise and reduce false alarms; and (3) a Faster Inference Detection Head (FIDH) leverages structural re-parameterization and group normalization to boost inference efficiency while reducing parameters. In addition, a Layer-Adaptive Magnitude-based Pruning (LAMP) strategy is applied to further improve the model’s computational efficiency. Experimental results on the self-constructed Diverse Fire Scenarios (DFS) dataset demonstrate that RLFNet reduces parameters and GFLOPs by 25.2% and 20.6%, respectively, boosts mAP50 by 5.3%, and achieves an inference speed of 225 FPS, attaining the best accuracy and speed among the compared models. Validation on a public remote sensing dataset further confirms its strong generalization. These results indicate that RLFNet provides an efficient, lightweight solution for real-time forest fire detection on edge devices.

Keywords:

forest fire detection; edge devices; real-time and lightweight network; PMEB; BCFM; FIDH

1. Introduction

In recent decades, the frequency and intensity of forest fires have continued to increase, posing significant threats to ecosystems, the carbon cycle, and public safety [1,2]. Effective forest fire prevention and control therefore require real-time, accurate detection technologies, as undetected fires may rapidly escalate into large-scale disasters [3]. Edge devices such as unmanned aerial vehicles (UAVs) [4], robots [5], and cameras [6] have been widely adopted because of their high mobility and responsiveness, and because they can detect forest fires in real time when equipped with artificial intelligence algorithms. However, edge devices are often deployed in remote, mountainous forest areas with complex environments, where they face significant constraints in computing resources, memory, power budgets, and network stability [7,8]. Accordingly, a lightweight detection model is urgently needed to enable reliable, accurate, and continuous real-time forest fire detection [9].

Early forest fire detection methods deployed on edge devices primarily relied on traditional machine learning algorithms, such as Random Forests [10] and Support Vector Machines [11]. These methods depended heavily on manually designed features, making it difficult for them to adapt to the diverse appearances of smoke and flames and to complex backgrounds. With rapid advances in deep learning, deep learning models have become the dominant approach to forest fire detection. Their key advantage is that they learn highly discriminative feature representations end to end, more effectively than conventional methods, thereby enabling timely and reliable early warning. Convolutional Neural Networks (CNNs) and Transformers are the two primary deep learning architectures employed for forest fire detection. CNN-based detectors include Faster R-CNN [12], SSD [13], and RetinaNet [14], while models such as IFS-DETR [15] and FTA-DETR [16] are built upon the Transformer architecture. Although these detectors perform robustly across a variety of fire detection tasks, their high computational complexity often compromises efficiency and inference speed, making them less suitable for resource-constrained edge devices and real-time detection scenarios.

Since the emergence of the YOLO family, YOLO-based detectors have gradually become a major research focus in object detection due to their strong real-time performance and promising accuracy [17,18,19,20]. This advantage has also led to their increasingly widespread adoption in forest fire detection, significantly improving detection accuracy [21,22,23,24]. However, many improvements built upon the YOLO architecture often increase the parameter count and computational complexity, thereby undermining the feasibility of deployment on edge devices. Given the limited computational capacity of edge devices and the substantial computational overhead incurred by existing high-accuracy YOLO variants, there is an urgent need to develop more lightweight models to better meet the requirements of edge device deployment.

To meet the requirements of diverse application scenarios, several lightweight YOLO variants have been developed. For example, a YOLOv5-based benthic species recognition method reduces computational cost and model size by 66.0% and 40.5%, respectively [25]; YOLOv8-MDN-Tiny decreases parameter count and memory usage by 90.1% and 88.9%, improving passion fruit disease detection on handheld devices [26]; and Ji et al. [27] target UAV-based welding defect detection, achieving 93.7% accuracy while reducing parameters by 21.6% and outperforming mainstream methods. Forest fire scenarios impose stricter demands on real-time performance, model compactness, and detection accuracy. To address these demands, Zhou and Jiang [28] redesigned the C3k2 structure with the FasterBlock module, which stacks multi-scale convolutions serially in a single path. This reduces parameters by 11.2% and improves suitability for resource-limited devices; however, the serial design also restricts multi-scale feature extraction efficiency and sacrifices detection accuracy. Zhu et al. [29] proposed the GERB module for the neck network. The module first expands the input features via a 1 × 1 convolution, splits them into a transformation branch and an identity branch, and introduces RepConv to enrich the feature representation; finally, the two branches are simply concatenated for output. Because this lightweight design relies solely on simple concatenation, it lacks contextual collaborative modeling and thus struggles to suppress background interference, making it prone to false positives. Yu et al. [30] adopted compound scaling and a bidirectional feature pyramid to construct a lightweight detection head, scaling all model dimensions simultaneously with a single scaling factor $\phi$ to improve accuracy while controlling computational cost. However, although this detection head improved inference speed, it reached only 27.2 FPS, which can hardly meet the strict real-time requirements of edge devices in forest fire scenarios.

Beyond architectural modifications to the YOLO backbone, neck, and head, network pruning provides another practical route to model compression [31]. In particular, layer-wise sparsity pruning has been widely adopted for its favorable trade-off between sparsity and detection accuracy [32,33,34]. However, many existing pruning approaches rely on heuristic rules and extensive hyperparameter tuning, which increases optimization costs and may compromise generalization and deployment robustness. Lee et al. [35] proposed Layer-Adaptive Magnitude-based Pruning (LAMP), which allocates layer-wise sparsity by minimizing $\ell_{2}$ distortion and avoids subjective tuning. This strategy reduces redundant parameters and computation while preserving detection performance, making it well suited to real-time deployment on edge devices.

Although deep learning has greatly improved the accuracy of forest fire detection, high-accuracy models typically incur substantial computational overhead, making them difficult to deploy effectively on resource-constrained edge devices. Existing lightweight models largely rely on redesigning the YOLO architecture, while pruning strategies are less frequently adopted. Therefore, achieving an effective balance between detection accuracy and efficiency in forest fire scenarios remains challenging.

To address the challenges mentioned above, this study proposes a real-time lightweight fire detection network (RLFNet) for forest fire detection on edge devices. Built upon YOLOv11, RLFNet introduces systematic lightweight improvements to the backbone, neck, and head, and further applies the LAMP strategy for holistic optimization. The main contributions and innovations of this study are as follows:

A Diverse Fire Scenarios (DFS) dataset is constructed to cover various fire types, viewpoints, and environmental conditions. It mitigates common limitations of public datasets such as limited data volume, scenario diversity, and unreliable annotations, thereby enhancing model robustness and generalization in forest fire scenarios.
A Parallel Multi-Scale Extraction Block (PMEB) is proposed, which uses a channel-grouping strategy to retain a low-cost branch and perform parallel multi-kernel convolutions on grouped channels. This enhances multi-scale representation at low overhead and avoids the feature-extraction constraints of lightweight designs that stack kernels serially in a single branch.
A Bidirectional Cross Fusion Module (BCFM) is presented, which overcomes the inherent limitations of conventional feature concatenation with a context-aware cross-gating mechanism that achieves complementary cross-stage channel fusion, thereby significantly enhancing robustness against background interference.
A Faster Inference Detection Head (FIDH) is devised, which enhances localization accuracy through structural re-parameterization and incorporates group normalization to stabilize small-batch real-time inference, thereby improving the model’s inference efficiency and stability on edge devices.

The rest of this study is organized as follows: Section 2 introduces the materials and methods, including a detailed explanation of each module; Section 3 presents the experimental setup, ablation study, comparison results, and visualization; Section 4 discusses the limitations of this study and outlines future work; Section 5 summarizes the study.

2. Materials and Methods

2.1. Overall Framework of RLFNet for Forest Fire Detection

The overall workflow of the proposed method is illustrated in Figure 1a. First, the DFS dataset is constructed by collecting diverse fire scenarios, which improves the model’s robustness and generalization in complex forest fire environments. Next, targeting remote and complex forest environments, this study proposes RLFNet, a lightweight detection network tailored for edge device deployment that reduces computational cost while maintaining high detection accuracy. The LAMP strategy is then applied to RLFNet to adaptively prune redundant parameters across layers while preserving a compact architecture, improving its practicality and deployability on resource-constrained platforms. Finally, detection examples on forest fire images demonstrate that RLFNet has strong potential for deployment on UAVs and other edge devices. As shown in Figure 1b, RLFNet lies in the upper-left region, indicating that it delivers high accuracy and fast inference under a low computational budget and highlighting its efficient lightweight design. The corresponding metrics are summarized in Table 1. In all subsequent tables, the optimal value of each evaluation metric is marked in bold.

2.2. The Diverse Fire Scenarios (DFS) Dataset Construction

High-quality datasets are a crucial foundation for advancing deep learning-based UAV forest fire detection research [45,46]. To this end, the Diverse Fire Scenarios (DFS) dataset is constructed with a custom web crawler to collect fire-related images and videos from various public online sources. A standardized preprocessing pipeline is adopted to guarantee data quality and diversity.

Specifically, because flames and smoke in forest fire scenarios exhibit rapid morphological changes and motion blur, one frame is extracted from every three frames of video data. This interval reduces the redundancy caused by excessive inter-frame similarity at smaller sampling intervals while avoiding the loss of key information at larger ones. Next, in the image screening stage, the structural similarity index (SSIM) is adopted for image-level deduplication. To balance the removal of redundant data against the preservation of sample diversity, image pairs with an SSIM value greater than 0.4 are treated as highly similar, and only one representative image of each such pair is retained. Corrupted files and low-quality samples are then manually inspected and removed to ensure data reliability and reduce sampling bias. After cleaning, all remaining images are annotated according to uniform criteria into two categories: fire and smoke. The final DFS dataset contains 5005 images, which are divided into training, validation, and test sets at a ratio of 7:2:1 to ensure fair and reliable experimental evaluation.
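The preprocessing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: frames and images are assumed to be grayscale NumPy arrays of equal size, SSIM is computed globally over the whole image rather than with the usual sliding local window, deduplication is greedy in input order, and the 7:2:1 split uses a random shuffle with a hypothetical fixed seed. All function names are hypothetical.

```python
import random

import numpy as np


def sample_frames(frames, step=3):
    """Keep one frame out of every `step` consecutive frames (step=3 in the text)."""
    return frames[::step]


def global_ssim(a, b, data_range=255.0):
    """Simplified SSIM over the whole image (no sliding window).

    Standard SSIM averages this statistic over local windows; the global
    form here is an illustrative simplification.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2)
    return num / den


def deduplicate(images, threshold=0.4):
    """Greedy deduplication: keep an image only if its SSIM with every
    previously kept image is <= threshold (pairs above it count as highly similar)."""
    kept = []
    for img in images:
        if all(global_ssim(img, k) <= threshold for k in kept):
            kept.append(img)
    return kept


def split_7_2_1(items, seed=0):
    """Shuffle, then split into train/val/test at 7:2:1 (integer floor for train and val)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 7 // 10
    n_val = len(items) * 2 // 10
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]
```

With 5005 images, this rounding yields 3503, 1001, and 501 images; the text does not state how the split was rounded, so other choices are equally plausible.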

To quantify the scale distribution of the targets more accurately, normalized width and height statistics are computed for the 10,062 annotated boxes in the DFS dataset. As shown in Figure 2, more than 70% of the target boxes fall within a normalized width range of 0.0–0.6 and a normalized height range of 0.4–0.8. In this region, box height generally exceeds box width, which is consistent with the vertically elongated morphology of flames and smoke. Further statistics show that the mean aspect ratio of all target boxes is 0.72 and the median is 0.68; the closeness of these two values suggests that the aspect-ratio distribution is not dominated by extreme outliers. Meanwhile, the DFS dataset covers targets across diverse width and height scales, which better matches detection requirements in complex scenarios and provides sufficient, reliable samples for robust detection in complicated forest fire environments.
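The box statistics can be reproduced along the following lines. The sketch assumes YOLO-format labels (`class cx cy w h`, normalized to [0, 1]) and defines the aspect ratio as width divided by height, which is consistent with a mean below 1 for vertically elongated targets; both are assumptions, since the text states neither the label format nor the ratio's definition. Function names are hypothetical.

```python
import statistics


def parse_yolo_line(line):
    """Parse one YOLO label line 'cls cx cy w h' into (cls, w, h)."""
    cls, _cx, _cy, w, h = line.split()
    return int(cls), float(w), float(h)


def aspect_ratio_stats(boxes):
    """Mean and median of width / height over (w, h) pairs with h > 0."""
    ratios = [w / h for w, h in boxes if h > 0]
    return statistics.mean(ratios), statistics.median(ratios)


def fraction_in_region(boxes, w_range=(0.0, 0.6), h_range=(0.4, 0.8)):
    """Share of boxes whose normalized width and height both fall in the given ranges."""
    inside = sum(
        1 for w, h in boxes
        if w_range[0] <= w <= w_range[1] and h_range[0] <= h <= h_range[1]
    )
    return inside / len(boxes)
```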

Sample instances of the DFS dataset are illustrated in Figure 3, including forest fire and non-forest fire scenarios such as urban areas, roads, and open fields. Both scene types exhibit rich semantic variations including multi-scale targets and complex background interference, which support comprehensive feature learning of fire and smoke for the model.

2.3. Overview of RLFNet

The overall architecture of RLFNet is illustrated in Figure 4. This study improves the original C3k2 modules with PMEB to enhance multi-scale feature representation while reducing parameter overhead. Then, BCFM is introduced to effectively fuse shallow spatial details with deep semantic information, alleviating the insufficient information coordination caused by conventional concat-based fusion. Finally, the original detection head is redesigned as FIDH to improve localization accuracy and enhance real-time inference performance on edge devices.

2.4. Parallel Multi-Scale Extraction Block (PMEB)

In forest fire detection, flame and smoke targets exhibit pronounced multi-scale variability, which poses significant challenges to accurate detection, particularly under the strict computational constraints of edge device platforms. To address the inefficiency of conventional single-branch and multi-kernel architectures in multi-scale forest fire detection, this study proposes a Parallel Multi-Scale Extraction Block (PMEB), which reorganizes feature channels into kernel-specific subspaces. This design significantly reduces the number of parameters while improving detection accuracy. Figure 5 presents the structure of PMEB, which is described in detail below.

Given an input feature map $X \in R^{B \times C \times H \times W}$ (where B is the batch size, C the number of channels, and H and W the height and width of X), PMEB first applies a channel-splitting-driven feature decoupling strategy, which explicitly divides the input feature into a low-cost base feature branch and a feature branch that is highly sensitive to scale variations:

X_{cheap}, X_{group} = Split (X, [\frac{C}{2}, \frac{C}{2}], \dim = 1)

(1)

Unlike traditional multi-scale methods that directly stack multi-branch or multi-kernel convolutions in a shared feature space, this design starts at the level of feature organization and recasts multi-scale modeling as a structured, channel-level division of labor. Here, $X_{cheap}$ serves as a lightweight pass-through branch that preserves basic semantic information and controls the overall computational overhead, while $X_{group}$ is reserved for subsequent multi-scale feature extraction, improving the model’s ability to extract scale-sensitive features without significantly increasing the number of parameters. The scale-sensitive branch $X_{group}$ is then explicitly rearranged into a structured form suitable for parallel multi-scale processing, with dimensions $(B, C_{g}, H, W, G)$, where G denotes the number of convolution kernel groups and $C_{g} = \frac{C}{2G}$. This rearrangement is not merely a tensor transformation but a structured reorganization of feature channels according to scale requirements, which allocates independent, non-interfering channel subspaces to different convolution kernels. Formally, the reorganized feature representation can be expressed as:

X_{group} = [X_{1}, X_{2}, \dots, X_{G}], X_{i} \in R^{B \times C_{g} \times H \times W}

(2)

Through kernel-specific channel partitioning, each scale operates exclusively on its corresponding channel subspace, thereby avoiding cross-scale feature interference and significantly reducing computational redundancy. Each channel subgroup $X_{i}$ is then processed by a convolution with its corresponding kernel size $k_{i}$:

Y_{i} = {Conv}_{k_{i}} (X_{i}), i = 1, \dots, G

(3)

Specifically, the $3 \times 3$ convolution branch focuses on capturing fine-grained textures and irregular flame boundaries, which helps reduce missed detections of small fire targets, while the $5 \times 5$ convolution branch enlarges the receptive field to enhance the perception of low-contrast, diffuse smoke regions. Because convolutions at different scales are confined to independent channel subspaces, multi-scale features are extracted in parallel without additional branch overhead, effectively exploiting scale complementarity. The outputs of all scale-specific branches are then stacked into a tensor of shape $(G, B, C_{g}, H, W)$ and rearranged to $(B, C_{g}, G, H, W)$, forming a unified multi-scale feature representation:

X_{ms} = Concat (Y_{1}, Y_{2}, \dots, Y_{G})

(4)

The resulting multi-scale feature $X_{ms}$ is concatenated with the lightweight bypass branch $X_{cheap}$:

X_{cat} = Concat (X_{cheap}, X_{ms})

(5)

A $1 \times 1$ convolution is then applied to fuse the complementary multi-scale features. While keeping the overall structure compact, this fusion step effectively integrates information across scales at low parameter and computational cost. The overall mapping function of the PMEB module can therefore be expressed as:

X_{out} = {Conv}_{1 \times 1} (Concat (X_{cheap}, \{{Conv}_{k_{i}} (X_{i})\}_{i = 1}^{G}))

(6)

The performance advantage of PMEB mainly stems from its channel-splitting strategy, which decouples scale-sensitive features from low-cost base features and restricts multi-scale convolutions to kernel-specific channel subspaces for parallel execution. Benefiting from this design, PMEB enhances the complementary scale information of fire and smoke targets without introducing additional branch overhead, thereby achieving more accurate and stable forest fire detection performance while maintaining a low computational cost and effectively meeting the lightweight and real-time deployment requirements of edge devices.
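A minimal NumPy sketch of the PMEB forward pass (Eqs. (1) to (6)) is given below to make the tensor bookkeeping concrete. It is not the authors' implementation: it assumes G = 2 groups with 3 × 3 and 5 × 5 kernels, contiguous channel grouping, per-group convolutions that keep $C_{g}$ output channels, zero "same" padding, and no bias, normalization, or activation. The helper names are hypothetical.

```python
import numpy as np


def conv2d_same(x, w):
    """Stride-1 convolution with zero 'same' padding (cross-correlation, as in CNN libraries).

    x: (B, C_in, H, W); w: (C_out, C_in, k, k) with odd k.
    """
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)))
    b, _, h, wd = x.shape
    out = np.zeros((b, w.shape[0], h, wd))
    for i in range(k):
        for j in range(k):
            patch = xp[:, :, i:i + h, j:j + wd]
            out += np.einsum("bchw,oc->bohw", patch, w[:, :, i, j])
    return out


def pmeb_forward(x, group_weights, fuse_weight):
    """x: (B, C, H, W); group_weights[i]: (C_g, C_g, k_i, k_i); fuse_weight: (C_out, C, 1, 1)."""
    c = x.shape[1]
    x_cheap, x_group = x[:, : c // 2], x[:, c // 2:]                    # Eq. (1)
    groups = np.split(x_group, len(group_weights), axis=1)               # Eq. (2): C_g = C / (2G)
    ys = [conv2d_same(xi, wi) for xi, wi in zip(groups, group_weights)]  # Eq. (3)
    x_ms = np.concatenate(ys, axis=1)                                    # Eq. (4)
    x_cat = np.concatenate([x_cheap, x_ms], axis=1)                      # Eq. (5)
    return conv2d_same(x_cat, fuse_weight)                               # Eq. (6): 1x1 fusion


def pmeb_params(c, kernels=(3, 5), c_out=None):
    """Weight count of the sketch above (biases ignored)."""
    c_out = c if c_out is None else c_out
    cg = c // (2 * len(kernels))
    return sum(cg * cg * k * k for k in kernels) + c * c_out
```

With C = 64, `pmeb_params(64)` gives 16·16·9 + 16·16·25 + 64·64 = 12,800 weights, versus 64·64·9 = 36,864 for a single dense 3 × 3 convolution of the same width. This comparison concerns only the sketch, not the parameter counts reported for RLFNet.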

2.5. Bidirectional Cross Fusion Module (BCFM)

Feature fusion in many conventional CNN-based detectors is still implemented with simple channel concatenation, which overlooks the relative importance of features from different stages and limits effective coordination between shallow spatial details and deep semantic representations. Under complex forest fire conditions, this limitation often leads to feature fusion bias and increases the risk of false detections. To address this issue, a Bidirectional Cross Fusion Module (BCFM) is proposed. The module enhances cross-stage information interaction through a context-aware cross-gating mechanism, enabling adaptive fusion of complementary features, highlighting key fire-related regions, and suppressing complex background interference. The detailed design of the module is described below.

As illustrated in Figure 6, the input of BCFM consists of two feature maps extracted from different network stages: a shallow feature map $X_{0} \in R^{C_{1} \times H \times W}$, which primarily preserves fine-grained spatial structure such as flame boundaries and smoke contours, and a deep feature map $X_{1} \in R^{C_{2} \times H \times W}$, which encodes high-level semantic attributes and the global distribution characteristics of flames and smoke. These two representations are strongly complementary in semantic level and information emphasis. When the channel dimensions of the two inputs differ ($C_{1} \neq C_{2}$), BCFM first applies a lightweight $1 \times 1$ convolution to the shallow feature map for channel alignment (otherwise, $X_{0}^{'} = X_{0}$):

X_{0}^{'} = f_{1 \times 1} (X_{0}), X_{0}^{'} \in R^{C_{2} \times H \times W}

(7)

This operation serves solely for feature-space alignment and introduces no additional semantic interference, ensuring that subsequent cross-stage interactions take place in a unified feature space. The aligned shallow feature $X_{0}^{'}$ and the deep feature $X_{1}$ are then concatenated along the channel dimension to form a joint feature representation:

X_{cat} = [X_{0}^{'} ‖ X_{1}] \in R^{2 C_{2} \times H \times W}

(8)

In contrast to traditional approaches that directly treat concatenated features as the final fusion output, BCFM regards the joint feature as an intermediate representation for cross-stage information interaction. Building on this design, an adaptive reweighting strategy driven by global channel-wise context is applied to the joint feature $X_{cat}$ to mitigate the fusion bias caused by uniformly weighting heterogeneous features. This strategy captures channel-wise dependencies and dynamically modulates the response strength of different features. First, global average pooling compresses each channel c of $X_{cat}$ into a context descriptor $z_{c}$:

z_{c} = \frac{1}{H \times W} \sum_{i = 1}^{H} \sum_{j = 1}^{W} X_{cat} (c, i, j)

(9)

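The resulting channel-wise context descriptor $z = [z_{1}, \dots, z_{2C_{2}}]$ is then transformed by two learnable mappings $W_{1}$ and $W_{2}$ with nonlinear activations $\delta$ and $\sigma$ (in squeeze-and-excitation-style designs, typically ReLU and sigmoid, respectively) to generate the channel weight vector: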

s = σ (W_{2} δ (W_{1} z)) \in R^{2 C_{2}}

(10)

This weight vector reflects the relative importance of different channels within the joint feature and is applied to perform adaptive reweighting:

{\tilde{X}}_{cat} = s ⊙ X_{cat}

(11)

Through this process, salient information within the joint feature is selectively enhanced, while redundant or interfering responses are effectively suppressed, providing a more stable and discriminative feature basis for subsequent cross-stage information interaction.

Distinct from existing attention-based fusion methods, BCFM does not treat the reweighted feature $\tilde{X}_{cat}$ as the final fusion output. Instead, it further decomposes the reweighted feature along the channel dimension into two complementary attention subspaces:

{\tilde{X}}_{cat} = [S_{0} ‖ S_{1}], S_{0}, S_{1} \in R^{C_{2} \times H \times W}

(12)

Here, $S_{0}$ and $S_{1}$ correspond to the contextual guidance weights associated with the shallow and deep features, respectively. Based on these attention subspaces, BCFM establishes an explicit bidirectional cross fusion mechanism between shallow and deep features, formulated as follows:

{\hat{X}}_{0} = X_{0}^{'} + (X_{1} ⊙ S_{1})

(13)

{\hat{X}}_{1} = X_{1} + (X_{0}^{'} ⊙ S_{0})

(14)

where $X_{1} \odot S_{1}$ denotes deep semantics guiding the shallow features to supply their missing semantic context, and $X_{0}^{'} \odot S_{0}$ indicates that shallow details, filtered through weight selection, reversely enhance the deep features so that they retain crucial shape boundaries. The final fused feature is obtained through concatenation:

X_{out} = [{\hat{X}}_{0} ‖ {\hat{X}}_{1}] \in R^{2 C_{2} \times H \times W}

(15)

BCFM uses a context-aware cross-gating mechanism to overcome the limitation of unidirectional semantic propagation in traditional feature pyramid architectures. This design enables shallow details and deep semantic features to form a complementary, mutually constraining relationship within a single module. Specifically, deep features provide stable global semantic context that suppresses background interference such as clouds, haze, and illumination variations, while shallow structural details enhance the spatial discriminability of deep features, reducing confusion between flame boundaries and fire-like textures. These properties make BCFM particularly suitable for robust forest fire detection in complex scenarios.
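The following NumPy sketch traces Eqs. (7) to (15) for a single image to show how the joint feature is reweighted, split, and cross-gated. Assumptions not fixed by the text: $\delta$ is ReLU and $\sigma$ is sigmoid, the hidden width of $W_{1}$ and $W_{2}$ is a hypothetical bottleneck size, biases are omitted, and the gated term in Eq. (13) is added to the channel-aligned shallow feature $X_{0}^{'}$ so that shapes agree when $C_{1} \neq C_{2}$. The function names are hypothetical.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def bcfm_forward(x0, x1, w_align, w1, w2):
    """x0: (C1, H, W) shallow map; x1: (C2, H, W) deep map.

    w_align: (C2, C1) weights of the 1x1 alignment conv, or None when C1 == C2.
    w1: (hidden, 2*C2) and w2: (2*C2, hidden) form the channel-weighting mapping.
    """
    x0a = x0 if w_align is None else np.einsum("oc,chw->ohw", w_align, x0)  # Eq. (7)
    x_cat = np.concatenate([x0a, x1], axis=0)        # Eq. (8): (2*C2, H, W)
    z = x_cat.mean(axis=(1, 2))                      # Eq. (9): global average pooling
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))        # Eq. (10), delta = ReLU (assumed)
    x_tilde = s[:, None, None] * x_cat               # Eq. (11)
    c2 = x1.shape[0]
    s0, s1 = x_tilde[:c2], x_tilde[c2:]              # Eq. (12)
    x0_hat = x0a + x1 * s1                           # Eq. (13), on the aligned shallow map
    x1_hat = x1 + x0a * s0                           # Eq. (14)
    return np.concatenate([x0_hat, x1_hat], axis=0)  # Eq. (15): (2*C2, H, W)
```

Setting $W_{2} = 0$ makes every channel weight $\sigma(0) = 0.5$, which is a convenient way to check the cross terms of Eqs. (13) and (14) by hand.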

2.6. Faster Inference Detection Head (FIDH)

Although the original detection head of YOLOv11 adopts a decoupled Depthwise Convolution (DWConv) structure [47] to reduce computational complexity to some extent, it still suffers from redundant inference computation and insufficient localization accuracy in UAV-based forest fire detection scenarios. To address these limitations, as illustrated in Figure 7, a forest-fire-specific Faster Inference Detection Head (FIDH) is designed. The proposed FIDH further enhances robustness against complex forest backgrounds, improves the discriminability and localization accuracy of flames and smoke, and simultaneously ensures real-time inference capability on edge devices.

During training, the Diverse Branch Block (DBB) enriches the feature space by aggregating four parallel branches, enabling the network to capture more fine-grained edge information across different receptive fields. Through structural re-parameterization, these branches are mathematically fused into an equivalent single convolution for inference, preserving an efficient single-path implementation. This design improves detection accuracy without introducing additional runtime overhead. Specifically, let $I \in R^{C \times H \times W}$ denote the input tensor and $O_{DBB\text{-}train} \in R^{D \times H \times W}$ the fused feature output of the DBB during training. The fusion process is given by:

O_{DBB\text{-}train} = σ (\sum_{b = 1}^{4} {Branch}_{b} (I))

(16)

where ${Branch}_{b}$ denotes the b-th parallel branch (standard convolution, average pooling, $1 \times 1$ convolution, and $1 \times 1$ + $K \times K$ convolution branches), and $\sigma$ is the activation function (SiLU in this study). The re-parameterized DBB output is fed into the Convolution with Group Normalization (Conv-GN) module for channel compression and feature normalization. Its $1 \times 1$ convolution can be expressed as:

I_{Conv - GN} = I \otimes F_{1 \times 1}

(17)

where $F_{1 \times 1} \in R^{D^{'} \times D \times 1 \times 1}$ denotes the $1 \times 1$ convolutional kernel used for lightweight channel fusion. Group normalization (GN) is then applied to the output of this $1 \times 1$ convolution. In forest fire detection tasks, the large size of input images makes large-batch training memory-intensive. Unlike Batch Normalization (BN), GN is independent of batch size and maintains more stable behavior under the small-batch conditions commonly encountered on edge device platforms, which benefits stable and efficient inference. Finally, the processed feature map is fed into the decoupled output layer and undergoes scale-adaptive adjustment via the Scale module.
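To illustrate why GN does not depend on batch size, the sketch below applies a $1 \times 1$ convolution (Eq. (17)) followed by group normalization. It assumes a caller-chosen group count, omits the learnable affine parameters and any activation, and is not the authors' Conv-GN code; `conv1x1_gn` is a hypothetical name.

```python
import numpy as np


def conv1x1_gn(x, f, num_groups, eps=1e-5):
    """x: (B, D, H, W); f: (D', D) 1x1 kernel. Returns the GN-normalized (B, D', H, W) map."""
    y = np.einsum("oc,bchw->bohw", f, x)            # Eq. (17): 1x1 channel fusion
    b, c, h, w = y.shape
    g = y.reshape(b, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)    # statistics per sample and per group,
    var = g.var(axis=(2, 3, 4), keepdims=True)      # never pooled across the batch
    return ((g - mean) / np.sqrt(var + eps)).reshape(b, c, h, w)
```

Because each sample is normalized with its own group statistics, normalizing one image alone gives the same result as normalizing it inside a larger batch. BN in training mode, by contrast, pools statistics over the batch, so its output for one image depends on the other images in that batch.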

The FIDH enriches the feature space during training through the DBB, which enhances the representational capacity of convolution during inference. Meanwhile, the Conv-GN module leverages group normalization to maintain stable feature responses under the small-batch inference conditions commonly encountered on edge device platforms. This design reduces computational complexity while preserving effective sensitivity to flame and smoke targets, thereby meeting the real-time detection requirements in complex aerial scenarios.
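The statement that the four DBB branches collapse into one convolution follows from linearity: before the activation in Eq. (16), each branch is a linear map of I, so their kernels can be summed. The NumPy sketch below checks this numerically for a $K \times K$ convolution, a $1 \times 1$ convolution, $K \times K$ average pooling (stride 1, zero padding counted in the average), and a $1 \times 1$ → $K \times K$ sequence. It is a simplification rather than the DBB reference code: the per-branch BN layers of the original DBB and all biases are omitted, equal input and output channels are assumed so that pooling can be written as a kernel, and the function names are hypothetical.

```python
import numpy as np


def conv2d_same(x, w):
    """Stride-1 convolution with zero 'same' padding; x: (B, C_in, H, W), w: (C_out, C_in, k, k)."""
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)))
    b, _, h, wd = x.shape
    out = np.zeros((b, w.shape[0], h, wd))
    for i in range(k):
        for j in range(k):
            out += np.einsum("bchw,oc->bohw", xp[:, :, i:i + h, j:j + wd], w[:, :, i, j])
    return out


def avg_pool_same(x, k):
    """Stride-1 k x k average pooling with zero padding counted in the average."""
    p = k // 2
    xp = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)))
    h, wd = x.shape[2:]
    return sum(xp[:, :, i:i + h, j:j + wd] for i in range(k) for j in range(k)) / (k * k)


def merge_branches(w_kxk, w_1x1, w_seq1, w_seq2):
    """Fold the four training-time branches into one K x K kernel for inference.

    w_kxk: (C, C, K, K); w_1x1: (C, C, 1, 1); w_seq1: (M, C, 1, 1); w_seq2: (C, M, K, K).
    """
    c, _, k, _ = w_kxk.shape
    merged = w_kxk.copy()
    merged[:, :, k // 2, k // 2] += w_1x1[:, :, 0, 0]   # 1x1 kernel sits at the centre
    for ch in range(c):
        merged[ch, ch] += 1.0 / (k * k)                  # average pooling as a fixed kernel
    merged += np.einsum("omij,mc->ocij", w_seq2, w_seq1[:, :, 0, 0])  # compose 1x1 then KxK
    return merged
```

At inference, only `conv2d_same(x, merged)` is evaluated, which is the single-path convolution described above; the training-time branches are no longer needed.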

2.7. The LAMP Strategy

Despite the architectural efficiency of PMEB, BCFM, and FIDH, some computational redundancy remains during inference. Therefore, the LAMP strategy is further introduced to adaptively prune low-contribution parameters, further reducing computation and memory costs and improving real-time deployability on resource-constrained edge device platforms. The method is based on the LAMP score, which is calculated as follows:

\text{score}(u; W^{(i)}) := \frac{(W^{(i)}[u])^{2}}{\sum_{v \geq u} (W^{(i)}[v])^{2}}

(18)

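where W^{(i)} denotes the weight matrix of the i-th layer (flattened into a vector), whose entries are indexed in ascending order of magnitude, so that u < v implies |W^{(i)}[u]| ≤ |W^{(i)}[v]|; u and v are index variables, and \sum_{v \geq u} indicates summation over all indices v from the current index u to the end. The formula therefore divides the squared weight at index u by the sum of the squared weights at index u and at all larger-magnitude positions in the same layer, which serves as a measure of the significance of that position. Because the score is normalized within each layer, scores from different layers are directly comparable, and pruning can be performed globally by removing the lowest-scoring weights across the whole network.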
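As a small worked example of Eq. (18), assume two hypothetical layers whose weights, sorted by ascending magnitude as LAMP requires, have magnitudes 1, 2, 3 and 0.1, 0.2, respectively:

\text{score}(1; W^{(1)}) = \frac{1^{2}}{1^{2} + 2^{2} + 3^{2}} = \frac{1}{14} \approx 0.071, \quad \text{score}(2; W^{(1)}) = \frac{2^{2}}{2^{2} + 3^{2}} = \frac{4}{13} \approx 0.308, \quad \text{score}(3; W^{(1)}) = \frac{3^{2}}{3^{2}} = 1

\text{score}(1; W^{(2)}) = \frac{0.1^{2}}{0.1^{2} + 0.2^{2}} = 0.2, \quad \text{score}(2; W^{(2)}) = \frac{0.2^{2}}{0.2^{2}} = 1

Pruning the two lowest-scoring weights globally therefore removes the weight of magnitude 1 (score ≈ 0.071) and the weight of magnitude 0.1 (score 0.2), while the weight of magnitude 0.2 survives because it is the largest in its layer. The largest weight of every layer always scores 1, so the score accounts for each layer's own weight distribution instead of applying a single global magnitude threshold, under which the weights of magnitude 0.1 and 0.2 would be removed first.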

To facilitate a clearer understanding of the overall LAMP strategy, the complete pruning procedure is illustrated in Figure 8.

First, the LAMP score of every weight in the input network is calculated, and the lowest-scoring weights are pruned. Then, we check whether the global sparsity constraint is satisfied. If not, the score calculation and pruning loop is repeated. If it is satisfied, we decide whether to iterate for further optimization: if yes, we return to the score calculation stage; if no, we proceed to the next step. Finally, the pruned model is fine-tuned to recover its accuracy, forming a complete pruning pipeline from input to optimized output.
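The loop in Figure 8 can be sketched in pure Python as follows. Two assumptions are worth stating up front: pruning here is unstructured (individual weights are zeroed, as in the original LAMP formulation of Lee et al. [35]), whereas Section 3.3 describes channels being removed from RLFNet; and the linear sparsity schedule and the `finetune` hook are hypothetical placeholders rather than the authors' settings.

```python
def lamp_scores(weights):
    """LAMP score (Eq. 18) of every weight in one layer, returned in the original order."""
    # Indices sorted by ascending magnitude, so "v >= u" means "at least as large as u".
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    scores = [0.0] * len(weights)
    suffix = 0.0  # running sum of squared weights from position u to the end
    for idx in reversed(order):
        suffix += weights[idx] ** 2
        scores[idx] = weights[idx] ** 2 / suffix if suffix > 0 else 0.0
    return scores


def lamp_prune(layers, target_sparsity):
    """Zero the globally lowest-scoring weights until `target_sparsity` of all weights is pruned."""
    ranked = sorted(
        (score, li, wi)
        for li, w in enumerate(layers)
        for wi, score in enumerate(lamp_scores(w))
    )
    n_prune = int(round(target_sparsity * len(ranked)))
    pruned = [list(w) for w in layers]
    for _, li, wi in ranked[:n_prune]:
        pruned[li][wi] = 0.0
    return pruned


def iterative_lamp(layers, target_sparsity, steps, finetune=lambda w: w):
    """Raise sparsity in `steps` equal increments, then fine-tune once.
    `finetune` is a hypothetical placeholder for accuracy-recovery training."""
    for k in range(1, steps + 1):
        # Already-zeroed weights score 0, so they stay among the pruned set.
        layers = lamp_prune(layers, target_sparsity * k / steps)
    return finetune(layers)
```

With the layers from the worked example, `lamp_prune([[3.0, -1.0, 2.0], [0.1, 0.2]], 0.4)` returns `[[3.0, 0.0, 2.0], [0.0, 0.2]]`: the 0.2 weight survives because it is the largest in its layer.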

This process removes low-impact parameters and redundant connections, enabling the model to focus on highly informative features. By adaptively retaining critical weights across layers, the LAMP strategy compresses the model effectively while preserving, or even slightly improving, detection performance.

2.8. Evaluation Metrics

To evaluate the performance of the proposed model, the PASCAL VOC [48] evaluation criteria were adopted, as they are widely used standards in object detection. These criteria use mean Average Precision (mAP) as the key performance indicator, which is computed from the model’s Precision and Recall as follows:

\text{Precision} = \frac{TP}{TP + FP}

(19)

\text{Recall} = \frac{TP}{TP + FN}

(20)

AP = \int_{0}^{1} P(r)\, dr

(21)

mAP = \frac{1}{C} \sum_{i = 1}^{C} AP_{i}

(22)

In this study, the proposed forest fire detection model is assessed with multiple performance metrics. True Positives (TPs) denote correctly identified fire instances, False Positives (FPs) represent non-fire regions misclassified as fire, and False Negatives (FNs) indicate missed fire instances. Precision (P) measures the proportion of detections that are correct, while Recall (R) reflects the proportion of actual fires that are detected. In Equation (21), P(r) denotes precision as a function of recall r; in Equation (22), C is the number of object categories and AP_i is the AP of the i-th category, so the mAP provides an overall evaluation across categories. mAP50 denotes the mAP at an Intersection over Union (IoU) threshold of 0.5, and mAP50-95 averages the mAP over IoU thresholds from 0.5 to 0.95 in steps of 0.05. Beyond accuracy, model complexity and efficiency are assessed through the number of parameters and GFLOPs, with higher GFLOPs indicating greater computational cost. In addition, frames per second (FPS) is used to evaluate real-time capability, which is crucial for fire detection systems requiring rapid response.
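Equations (19)–(22) can be sketched in pure Python as follows. This is a minimal illustration, assuming that each detection has already been matched to ground truth (the IoU test is done upstream and summarized as a boolean `is_tp` flag) and using all-point interpolation of the precision–recall curve, the variant adopted in PASCAL VOC from 2010 onward; the function names are hypothetical.

```python
def precision(tp, fp):
    """Eq. (19); defined as 0 when there are no detections."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Eq. (20); defined as 0 when there are no ground-truth objects."""
    return tp / (tp + fn) if tp + fn else 0.0


def pr_curve(scores, is_tp, num_gt):
    """Precision and recall after each detection, taken in descending confidence order."""
    tp = fp = 0
    recalls, precisions = [], []
    for i in sorted(range(len(scores)), key=lambda k: -scores[k]):
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(recall(tp, num_gt - tp))  # FN = ground truths not yet found
        precisions.append(precision(tp, fp))
    return recalls, precisions


def average_precision(recalls, precisions):
    """Eq. (21) with all-point interpolation: area under the precision envelope."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Envelope: precision at recall r is the best precision at any recall >= r.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(aps):
    """Eq. (22): unweighted mean of the per-class AP values."""
    return sum(aps) / len(aps)
```

For three detections with confidences 0.9, 0.8, and 0.7, of which the first and third are true positives against two ground-truth objects, the curve has recall points 0.5, 0.5, 1.0 and the AP is 5/6. mAP50-95 repeats this computation at each IoU threshold and averages the results.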

3. Experiments and Results

3.1. Experimental Setup

All experiments in this study are conducted on a computational platform configured with an AMD Ryzen 9 7945HX CPU (with integrated Radeon Graphics, 2.50 GHz). To handle computationally intensive tasks efficiently, the platform is further equipped with an NVIDIA GeForce RTX 4060 GPU (8 GB memory) and 32 GB of RAM. For model training, SGD is selected as the optimizer, with the key settings as follows: a batch size of 16, a learning rate of 0.01, and 200 training epochs. All input images are uniformly resized to 640 × 640 pixels prior to training. This experimental setup provides a consistent foundation for validating the effectiveness of the proposed methods.

Notably, all comparison models are trained under the same experimental environment and identical hyperparameter settings as the proposed RLFNet to ensure fair and objective performance comparison.

3.2. Model Performance Evaluation

3.2.1. Comparison Experiments with Different Models

As shown in Table 1, RLFNet is compared with mainstream object detection models, including lightweight variants of the YOLO series, RetinaNet, and RT-DETR, across three core dimensions: detection accuracy, model efficiency, and inference speed. The suffixes “F” and “S” in the metrics denote fire and smoke targets, respectively. With only 1.9M parameters and 5.0 GFLOPs, RLFNet achieves the highest mAP50 (76.5%) and mAP50-95 (53.3%) among all compared models, demonstrating strong detection capability for multi-scale fire and smoke targets. It also retains clear advantages over the recent YOLOv13n on several core detection metrics, and its inference speed of 224.8 FPS exceeds that of every compared model, highlighting its real-time response capability. In summary, RLFNet achieves a favorable balance among low resource consumption, high inference speed, and detection accuracy, supporting its practical value for real-time fire detection on edge devices.

3.2.2. Ablation Experiment

Ablations on the proposed methods. This study conducts systematic ablation experiments to rigorously verify the effectiveness of each proposed module and their synergistic effects, and performs independent complexity analysis on each module to validate the rationality of the lightweight design.

Experimental results are shown in Table 2. Introducing PMEB alone substantially reduces the number of parameters while improving Precision, mAP50, and FPS, confirming that channel grouping with parallel multi-kernel convolution makes multi-scale feature extraction more efficient. Introducing BCFM alone slightly increases the parameters and computational complexity but yields the highest Precision (78.2%), indicating that it helps suppress background interference in complex scenes. Introducing FIDH alone increases the inference speed by 24.1% to 202.6 FPS and improves mAP50 by 1.1%, reflecting a favorable balance between localization accuracy and inference efficiency.

Although Recall decreases slightly when each module is used individually, integrating all three modules raises Recall to the best value of 68.4%. Compared with the baseline model, mAP50 improves by 3.5%, the parameters and computational complexity are reduced by 3.9% and 9.5%, respectively, and FPS increases by 31.7%. These results validate the effectiveness of multi-module collaborative optimization, which gives the model a better balance among detection accuracy, real-time performance, and lightweight design.

Ablation experiment results on different kernel settings in PMEB. To verify the kernel combination selected for PMEB, this study compares combinations of 1 × 1, 3 × 3, and 5 × 5 kernels, i.e., [1, 3], [1, 5], and [3, 5]. As shown in Table 3, the [1, 3] combination improves compactness and inference speed, but the small kernels restrict the receptive field and weaken the model’s ability to capture multi-scale object features, which reduces accuracy. The [1, 5] combination enlarges the receptive field to some extent, but the extraction of detailed and global features remains unbalanced, so the accuracy gain is limited. The [3, 5] combination achieves the best balance between accuracy and efficiency, indicating that it best matches the feature extraction requirements of the model; it is therefore adopted in PMEB.

3.3. Experimental Results with Different Pruning Ratios

To explore the sensitivity of RLFNet to the LAMP strategy, the model is gradually compressed by varying the pruning rate. The experimental hyperparameters, including the number of pruning iterations and fine-tuning epochs, are determined through a series of exploratory experiments and conventional empirical settings widely adopted in network pruning, ensuring effective recovery of the detection performance of the pruned network.

As shown in Table 4, the accuracy decreases gently when the pruning rate ranges from 1.1 to 1.4, but drops sharply beyond 2.0. This phenomenon reflects the uneven distribution of feature redundancy across different layers of the model, and also validates the idea of layer-adaptive pruning in LAMP: within a low pruning range, most removed channels are redundant shallow-layer channels with little impact on model performance, while excessive pruning damages critical sensitive layers, leading to a rapid decline in accuracy. Therefore, 1.1 is finally selected as the pruning rate in this paper, enabling RLFNet to achieve the optimal balance between detection accuracy and efficiency. The above results demonstrate that structural optimization combined with a reasonable pruning strategy can realize efficient forest fire detection without sacrificing accuracy.

3.4. Experimental Visualization on the DFS Dataset

As shown in Figure 9, the top row presents the detection results of the baseline YOLOv11n, and the bottom row shows those of RLFNet. In the forest fire scenario (a), YOLOv11n yields more false positives, whereas RLFNet also detects a small fire spot that YOLOv11n misses. In the multi-scale scenario (b), RLFNet covers targets of different sizes and boundary regions with high confidence. In the strongly disturbed scenario (c), RLFNet does not confuse flame-like bright lights with actual fires, demonstrating better robustness to interference. Overall, RLFNet localizes fires more reliably under scale variation, background clutter, and small targets, delivering higher accuracy in complex forest environments.

Visualization of Heatmap Experiments

Grad-CAM is used to visualize the feature responses of RLFNet and YOLO series models and to assess their discriminative ability in fire detection. As shown in Figure 10, YOLOv5n and YOLOv10n exhibit dispersed attention with significant background interference, especially in smoke-filled scenes, so their focus on fire regions is limited. YOLOv8n and YOLOv11n show improved but unstable attention, with hotspots occasionally spreading into non-fire areas. In contrast, RLFNet shows concentrated and stable attention across all scenes, suppressing background noise and highlighting fire-related regions, and it maintains this focus even under complex backgrounds and smoke interference. These results, consistent with the quantitative experiments, indicate that RLFNet offers stronger feature extraction and anti-interference capability than mainstream YOLO models and underscore its practical value in complex forest fire environments.
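Grad-CAM weights each feature map A^k of a chosen layer by the spatial average of the gradient of a target score with respect to that map, sums the weighted maps, and applies a ReLU. The pure-Python sketch below implements only this combination step and assumes that the activations and gradients have already been extracted; for a detector, which score is back-propagated (for example, a box's class confidence) is a choice the text does not specify. The final division by the peak value, which scales the map to [0, 1] for display, is a common convention rather than a setting stated by the authors, and the function name is hypothetical.

```python
def grad_cam(activations, gradients):
    """activations, gradients: lists of K feature maps, each an H x W list of lists."""
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for a_k, g_k in zip(activations, gradients):
        # Channel weight alpha_k: global-average-pooled gradient of the target score.
        alpha = sum(sum(row) for row in g_k) / (h * w)
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * a_k[i][j]
    # ReLU keeps only regions that increase the target score.
    cam = [[max(v, 0.0) for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    # Scale to [0, 1] for display as a heatmap (all zeros if nothing is positive).
    return [[v / peak if peak > 0 else 0.0 for v in row] for row in cam]
```

A map whose gradients are negative everywhere contributes negatively, so regions it activates are suppressed by the ReLU; this is why Grad-CAM highlights only evidence in favor of the target.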

3.5. Generalization Experiment

Collecting large-scale, annotated forest fire imagery is costly, risky, and difficult. To address this, we use the publicly available M⁴SFWD dataset [49], a multi-faceted synthetic dataset for remote sensing forest wildfire detection. It provides diverse simulated fire scenarios across terrains, climates, time periods, illumination conditions, and fire scales, thus supporting a systematic evaluation of model robustness and generalization ability.

As shown in Table 5, RLFNet delivers the most balanced overall performance among the lightweight models in the generalization experiments. With only 1.9M parameters and 5.0 GFLOPs, it achieves the highest mAP50 (87.2%) while maintaining relatively high Precision and Recall. Its FPS (312.5) also ranks first among all compared models. Overall, RLFNet retains a favorable balance between detection accuracy, inference efficiency, and model complexity in the generalization experiments, making it well suited to resource-constrained scenarios such as UAV-based forest fire detection.

In Figure 11, the left and right sides display the results of RLFNet and YOLOv11n, respectively. The scene contains complex elements such as blurred boundaries and overlapping targets. YOLOv11n produces bounding boxes with relatively high confidence, whereas RLFNet separates smoke from the background in the blurred edge regions and accurately captures indistinct smoke. The heatmaps further show that RLFNet focuses on key fire areas and suppresses background interference. This supports the model’s generalization ability, as well as its adaptability and accuracy in complex forest environments.

3.6. Inference Performance Analysis Based on PyTorch v.2.12.0 and TensorRT

To validate the lightweight design and efficiency of RLFNet in practical applications, the model is deployed through a three-step TensorRT pipeline. First, after training, the model with its best weights is exported to ONNX format using the Ultralytics API v.8.3.12 (opset = 12, dynamic = True). Second, the ONNX file is parsed with NVIDIA TensorRT’s OnnxParser v.1.22.0 (NVIDIA, Santa Clara, CA, USA) and compiled into an optimized engine file (.engine) with a 1 GB workspace and FP16 precision for GPU acceleration; the input size is fixed at (1, 3, 640, 640) to ensure stable real-time inference. Finally, deployment is implemented with PyCUDA v.2026.1 and the TensorRT Runtime API, covering preprocessing, GPU memory allocation, inference execution, and post-processing. FPS is averaged over 100 runs to obtain a stable estimate. After TensorRT deployment, RLFNet reaches 415.3 FPS, approximately 29% faster than YOLOv11n (321.2 FPS) and approximately 4.2× and 3.6× faster than RetinaNet (98.7 FPS) and RT-DETR (115.7 FPS), respectively. These results confirm the strong real-time inference capability of RLFNet and indicate its potential for resource-constrained edge devices.
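The FPS figures above are averages over repeated runs. A minimal timing harness for this kind of measurement is sketched below in pure Python. Here `infer` stands for any zero-argument callable that runs one forward pass (for example, a hypothetical wrapper around a TensorRT execution context), and the 10 warm-up runs are an assumption, not a setting reported by the authors. The helper `speedup` reproduces the ratios quoted in the text.

```python
import time


def measure_fps(infer, n_runs=100, n_warmup=10):
    """Average throughput of a zero-argument inference callable over n_runs calls."""
    for _ in range(n_warmup):  # warm-up calls are excluded from timing
        infer()
    start = time.perf_counter()
    for _ in range(n_runs):
        infer()
    elapsed = time.perf_counter() - start
    return n_runs / elapsed if elapsed > 0 else float("inf")


def speedup(fps_new, fps_ref):
    """How many times faster fps_new is than fps_ref."""
    return fps_new / fps_ref


# Ratios reported in this section: 415.3 FPS vs. YOLOv11n, RetinaNet, and RT-DETR.
for name, ref in [("YOLOv11n", 321.2), ("RetinaNet", 98.7), ("RT-DETR", 115.7)]:
    print(f"{name}: {speedup(415.3, ref):.2f}x")
```

For GPU inference, the callable should block until the device has finished (for example, by synchronizing the CUDA stream before returning); otherwise asynchronous kernel launches make the measured FPS optimistic.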

4. Discussion

To meet the lightweight requirements for deployable forest fire detection on edge devices, a real-time lightweight forest fire detection network, termed RLFNet, is proposed. Experimental results show that, compared with the YOLOv11n baseline, RLFNet improves mAP50 by 5.3% on the self-constructed dataset while reducing the number of parameters and GFLOPs by 25.2% and 20.6%, respectively, and reaching an inference speed of 225 FPS. RLFNet also shows robust generalization on the public remote-sensing wildfire dataset M⁴SFWD. These results indicate that RLFNet achieves a better balance between accuracy and efficiency, validating its effectiveness and deployment potential for real-time forest fire detection on resource-constrained edge devices.

The superior accuracy–efficiency balance of RLFNet is mainly attributed to three key modules introduced on top of YOLOv11n: PMEB, BCFM, and FIDH. Specifically, (1) PMEB separates low-cost basic features from scale-sensitive features and performs convolutions with different kernel sizes in parallel within specific channel subspaces, thereby avoiding the computational redundancy caused by traditional multi-branch designs and stacked multi-scale kernels [24,28,50] and improving the efficiency of multi-scale feature extraction; (2) BCFM uses a context-aware cross-gating mechanism to adaptively enhance fused features and highlight fine-grained discriminative contextual information, overcoming the limitations of simple concatenation-based fusion [51,52,53,54], thereby strengthening responses in fire-related regions, suppressing background interference, and reducing false alarms; (3) FIDH uses structural re-parameterization to pair a richer training-time structure with a compact inference-time structure, and combines this with GN to improve performance stability under small-batch training. To some extent, this addresses the decline in localization efficiency caused by the large number of parameters in existing detection heads [55,56,57], effectively balancing inference speed and localization accuracy.

In addition, this study adopts LAMP as a lightweight strategy beyond structural design. Many methods reduce computational cost only by modifying YOLO modules [17,20,21,23,58,59,60]; such structural optimizations often fail to fully remove internal parameter redundancy, leaving limited room for further efficiency gains in edge deployment scenarios. In contrast, LAMP further compresses redundancy through parameter sparsification. As shown in Table 4, the pruned RLFNet achieves a lower parameter count and computational complexity with further improved detection accuracy, demonstrating the effectiveness of the proposed pruning strategy. The key reason is that LAMP adaptively allocates sparsity according to inter-layer sensitivity, pruning critical layers conservatively while imposing stronger sparsity on redundant layers; this yields a more robust accuracy–efficiency trade-off and reduces hyperparameter tuning costs.

Although encouraging results have been achieved, this study still has several limitations. First, in complex backgrounds, overlapping boundaries between flames and smoke may still lead to imprecise localization. On the DFS test set, approximately 52% of localization errors with IoU < 0.5 are caused by blurred fire and smoke boundaries and their overlapping distribution, which degrade the positioning accuracy of the detection boxes. In future work, edge-aware loss functions or contour refinement mechanisms could be introduced to enhance boundary representation. Second, early-stage smoke often occupies only a few pixels in an image; such targets are tiny, have weak features, are easily confused with the background, and are prone to partial occlusion in complex environments, which makes them the main challenge in forest fire detection. On the DFS test set, the miss rate for small fire targets (e.g., width < 32 pixels) reaches 18.3%. Therefore, more accurate small-object detection strategies will be explored to address missed detections and occlusion and to enable effective identification of early-stage fires, which is crucial for reducing ecological damage and improving the efficiency of emergency response. Third, more comprehensive comparisons with popular lightweight general-purpose object detection models and dedicated fire detection models will be conducted to further verify the advancement and practicality of RLFNet. Finally, these experiments were conducted on an RTX 4060 GPU without actual embedded deployment; the TensorRT acceleration pipeline in Section 3.6 should nevertheless facilitate future deployment on NVIDIA edge devices such as the Jetson Nano. Real-world deployment and field tests in real forest scenarios will also be explored in future work.

Future work can be pursued in three further directions. First, NMS-free detection paradigms can be explored to remove non-maximum suppression from post-processing, for example by introducing a dual label assignment strategy that combines one-to-one and one-to-many matching during training, thereby further reducing inference latency. Second, the cross-domain generalization ability of the model can be further improved through domain adaptation. This includes training on richer synthetic data and expanding the training corpus with more forest fire images collected by UAVs, robots, and surveillance equipment, so as to better cover diverse real-world scenarios. Specific data augmentation strategies, such as using cross-seasonal forest fire data and simulated scene data, can also be adopted to enhance the model’s adaptability to different scene distributions. Finally, multimodal fusion methods can be investigated, such as combining visible-light images with near-infrared cues that reflect smoke absorption and scattering characteristics, to improve the model’s overall perception of complex forest fire scenes.

5. Conclusions

This study proposes a real-time lightweight forest fire detection network (RLFNet), which adopts a lightweight design for the backbone, neck, and detection head of YOLOv11 and is further optimized with the LAMP strategy, making it more suitable for deployment on edge devices. Experiments on our self-constructed dataset show that RLFNet improves mAP50 by 5.3% over the baseline model while reducing parameters and GFLOPs by 25.2% and 20.6%, respectively, and achieving the fastest inference speed of 225 FPS. Overall, RLFNet outperforms competing methods and achieves a more effective balance between accuracy and efficiency. Furthermore, generalization experiments on the public remote-sensing wildfire dataset M⁴SFWD show that RLFNet also achieves the highest mAP50 and FPS among all compared models, indicating strong generalization capability. With TensorRT acceleration, the inference throughput reaches 415 FPS, confirming its suitability for real-time detection under strict computational budgets.

Overall, RLFNet demonstrates strong engineering application potential on edge platforms such as UAVs, robots, and surveillance cameras. Its high efficiency is expected to allow high-precision fire detection to run stably over long periods on UAVs or field monitoring devices with limited computing resources and power supply, which is of great practical significance for early fire warning, ecological protection, and economic loss reduction.

Author Contributions

Z.H.: Writing—original draft, Data curation, Methodology, Validation, Visualization. W.K.: Writing—review and editing, Conceptualization, Funding acquisition, Supervision. C.Z.: Writing—review and editing, Formal analysis, Methodology. G.D.: Conceptualization, Methodology, Supervision. Q.Z.: Writing—review and editing, Validation, Visualization. C.M.: Formal analysis, Visualization. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China [grant number: 32260391]; Yunnan International Joint Laboratory of Natural Rubber Intelligent Monitor and Digital Applications [grant number: 202403AP140001]; Yunnan International Joint R&D Center (China-Malaysia) for Digital Monitoring, Management and Applications of Nature Reserves [grant number: 202503AP140040]; Yunnan Fundamental Research Projects [grant number: 2018FG001-059]; and the Xingdian Talent Support Program [grant number: XDYC-CYCX-2024-0021]. Additional support was provided by the Academician Li Wei Workstation of Yunnan Province (grant number: 202505AF350082).

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Cunningham, C.X.; Williamson, G.J.; Bowman, D.M.J.S. Increasing frequency and intensity of the most extreme wildfires on Earth. Nat. Ecol. Evol. 2024, 8, 1420–1425.
Zhao, J.; Zheng, B.; Ciais, P.; Chen, Y.; Gasser, T.; Canadell, J.G.; Zhang, L.; Zhang, Q. Global warming amplifies wildfire health burden and reshapes inequality. Nature 2025, 647, 928–934.
Ma, Y.; Zang, E.; Liu, Y.; Wei, J.; Lu, Y.; Krumholz, H.M.; Bell, M.L.; Chen, K. Long-term exposure to wildland fire smoke PM2.5 and mortality in the contiguous United States. Proc. Natl. Acad. Sci. USA 2024, 121, e2403960121.
Akhloufi, M.A.; Couturier, A.; Castro, N.A. Unmanned Aerial Vehicles for Wildland Fires: Sensing, Perception, Cooperation and Assistance. Drones 2021, 5, 15.
Gromek, P.; Lowe, T. Ground Robot Technologies in Wildfire Risk Reduction: The Viewpoint of the Fire Service. Prog. Disaster Sci. 2025, 26, 100435.
Govil, K.; Welch, M.L.; Ball, J.T.; Pennypacker, C.R. Preliminary Results from a Wildfire Detection System Using Deep Learning on Remote Camera Images. Remote Sens. 2020, 12, 166.
Carrillo, C.; Margalef, T.; Espinosa, A.; Cortés, A. Edge Computing Driven Forest Fire Spread Simulation: An Energy-Aware Study. J. Comput. Sci. 2025, 88, 102605.
Shi, W.; Cao, J.; Zhang, Q.; Li, Y.; Xu, L. Edge Computing: Vision and Challenges. IEEE Internet Things J. 2016, 3, 637–646.
Boroujeni, S.P.H.; Razi, A.; Khoshdel, S.; Afghah, F.; Coen, J.L.; O’Neill, L.; Fule, P.; Watts, A.; Kokolakis, N.M.T.; Vamvoudakis, K.G. A comprehensive survey of research towards AI-enabled unmanned aerial systems in pre-, active-, and post-wildfire management. Inf. Fusion 2024, 108, 102369.
Ko, B.; Kwak, J.Y.; Nam, J.Y. Wildfire smoke detection using temporospatial features and random forest classifiers. Opt. Eng. 2012, 51, 017208.
Kim, O.; Kang, D.J. Fire detection system using random forest classification for image sequences of complex background. Opt. Eng. 2013, 52, 067202.
Zhang, Q.; Lin, G.; Zhang, Y.; Xu, G.; Wang, J. Wildland Forest Fire Smoke Detection Based on Faster R-CNN using Synthetic Smoke Images. Procedia Eng. 2018, 211, 441–446.
Zhang, D.; Zhang, Y.; Dong, L.; Ruan, S.; Liu, Z. Deep learning-based forest fire detection using an improved SSD algorithm with CBAM. PLoS ONE 2025, 20, e0333574.
Vieira, P. A Deep Learning Based Object Identification System for Forest Fire Detection. Fire 2021, 4, 75.
Chen, J.; Han, H.; Liu, M.; Su, P.; Chen, X. IFS-DETR: A real-time industrial fire smoke detection algorithm based on an end-to-end structured network. Measurement 2025, 241, 115660.
Zheng, H.; Wang, G.; Xiao, D.; Liu, H.; Hu, X. FTA-DETR: An efficient and precise fire detection framework based on an end-to-end architecture applicable to embedded platforms. Expert Syst. Appl. 2024, 248, 123394.
Chen, Y.; Yuan, X.; Wang, J.; Wu, R.; Li, X.; Hou, Q. YOLO-MS: Rethinking multi-scale representation learning for real-time object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 4240–4252.
Xiao, Y.; Xu, T.; Xin, Y.; Li, J. FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection. Proc. AAAI Conf. Artif. Intell. 2025, 39, 8673–8681.
Wang, C.; Sun, P.; Yang, C.; Teng, X.; Wang, R. AMSA-YOLO: Real-time Object Detection with Adaptive Multi-Scale Attention Mechanism. Neural Netw. 2026, 197, 108545.
Shi, C.; Chen, Y.; Zhang, C.; Chang, D.G.; Chen, Y.J.; Wang, Q. ICSD-YOLO: Intelligent Detection for Real-time Industrial Field Safety. Expert Syst. Appl. 2025, 307, 130994.
Hu, Y.; Zhan, J.; Zhou, G.; Chen, A.; Cai, W.; Guo, K.; Hu, Y.; Li, L. Fast forest fire smoke detection using MVMNet. Knowl.-Based Syst. 2022, 241, 108219.
Li, R.; Hu, Y.; Li, L.; Guan, R.; Yang, R.; Zhan, J.; Cai, W.; Wang, Y.; Xu, H.; Li, L. SMWE-GFPNNet: A high-precision and robust method for forest fire smoke detection. Knowl.-Based Syst. 2024, 289, 111528.
Yan, C.; Wang, J. MAG-FSNet: A High-Precision Robust Forest Fire Smoke Detection Model Integrating Local Features and Global Information. Measurement 2025, 247, 116813.
Wang, L.; Guo, L.; Li, H.; He, B.; Yang, J.; Huang, Y. Enhanced forest fire detection via dynamic multiscale fusion and contextual partial cross features. Eng. Appl. Artif. Intell. 2025, 162, 112531.
Zhang, L.; Fan, J.; Qiu, Y.; Jiang, Z.; Hu, Q.; Xing, B.; Xu, J. Marine zoobenthos recognition algorithm based on improved lightweight YOLOv5. Ecol. Inform. 2024, 80, 102467.
Chen, D.; Lin, F.; Lu, C.; Zhuang, J.; Su, H.; Zhang, D.; He, J. YOLOv8-MDN-Tiny: A Lightweight Model for Multi-Scale Disease Detection of Postharvest Golden Passion Fruit. Postharvest Biol. Technol. 2025, 219, 113281.
Ji, W.; Liu, S.; Deng, L.; Li, J.; Liu, Y.; Xiong, Z. WDI-YOLO: A lightweight steel bridge weld defect detection algorithm using UAV images. J. Constr. Steel Res. 2025, 235, 109833.
Zhou, K.; Jiang, S. Forest Fire Detection Algorithm Based on Improved YOLOv11n. Sensors 2025, 25, 2989.
Zhu, H.; Ling, W.; Yan, H.; Zhong, X.; Liao, F. YOLO-MP: A lightweight forest fire detection model. Ecol. Inform. 2025, 92, 103516.
Yu, Q.; Liu, H.; Liu, W.; Wang, Y.; Kuang, W.; Hu, H.; Ouyang, Z.; Hu, W. LFNet: An end-to-end lightweight model for real-time fire detection in embedded firefighting systems. AIP Adv. 2025, 15, 115125.
Zhou, N.; Gao, D.; Zhu, Z. YOLOv8n-SMMP: A Lightweight YOLO Forest Fire Detection Model. Fire 2025, 8, 183.
Gadhikar, A.H.; Mukherjee, S.; Burkholz, R. Why random pruning is all we need to start sparse. Int. Conf. Mach. Learn. 2023, 202, 10542–10570.
Chen, Z.; Qiu, G.; Li, P.; Zhu, L.; Yang, X.; Sheng, B. Mngnas: Distilling adaptive combination of multiple searched networks for one-shot neural architecture search. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 13489–13508.
Sun, W.; Zhou, A.; Stuijk, S.; Wijnhoven, R.G.J.; Nelson, A.; Li, H.; Corporaal, H. DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense neural networks. Adv. Neural Inf. Process. Syst. 2021, 34, 20721–20732.
Lee, J.; Park, S.; Mo, S.; Ahn, S.; Shin, J. Layer-adaptive sparsity for the Magnitude-based Pruning. arXiv 2021, arXiv:2010.07611.
Jocher, G. YOLOv5 by Ultralytics. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 13 November 2025).
Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2023; pp. 7464–7475.
Jocher, G.; Qiu, J.; Chaurasia, A. Ultralytics YOLOv8, 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 13 November 2025).
Chen, H.; Chen, K.; Ding, G.; Han, J.; Lin, Z.; Liu, L.; Wang, A. YOLOv10: Real-Time End-to-End Object Detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011.
Ultralytics. YOLOv11: Real-Time Object Detection Model. 2024. Available online: https://github.com/ultralytics/ultralytics (accessed on 13 November 2025).
Tian, Y.; Ye, Q.; Doermann, D. YOLOv12: Attention-Centric Real-Time Object Detectors. arXiv 2025, arXiv:2502.12524.
Lei, M.; Li, S.; Wu, Y.; Hu, H.; Zhou, Y.; Zheng, X.; Ding, G.; Du, S.; Wu, Z.; Gao, Y. YOLOv13: Real-time object detection with hypergraph-enhanced adaptive visual perception. arXiv 2025, arXiv:2506.17733.
Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV); IEEE: Piscataway, NJ, USA, 2017; pp. 2980–2988.
Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs beat YOLOs on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 16965–16974.
Wang, M.; Yue, P.; Jiang, L.; Yu, D.; Tuo, T.; Li, J. An open flame and smoke detection dataset for deep learning in remote sensing based fire detection. Geo-Spat. Inf. Sci. 2025, 28, 511–526.
Pesonen, J.; Raita-Hakola, A.M.; Joutsalainen, J.; Hakala, T.; Akhtar, W.; Koivumäki, N.; Markelin, L.; Suomalainen, J.; de Oliveira, R.A.; Pölönen, I.; et al. Boreal Forest Fire: UAV-collected wildfire detection and smoke segmentation dataset. Sci. Data 2025, 12, 1419.
Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
Wang, G.; Li, H.; Li, P.; Lang, X.; Feng, Y.; Ding, Z.; Xie, S. M4SFWD: A Multi-Faceted synthetic dataset for remote sensing forest wildfires detection. Expert Syst. Appl. 2024, 248, 123489.
Jiang, Y.; Meng, X.; Wang, J. SFGI-YOLO: A Multi-Scale Detection Method for Early Forest Fire Smoke Using an Extended Receptive Field. Forests 2025, 16, 1345.
Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 June 2017.
Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar] [CrossRef]
Ghiasi, G.; Lin, T.Y.; Le, Q.V. NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar] [CrossRef]
He, Y.; Sahma, A.; He, X.; Wu, R.; Zhang, R. FireNet: A lightweight and efficient multi-scenario fire object detector. Remote Sens. 2024, 16, 4112. [Google Scholar] [CrossRef]
Wang, Q.; Guan, S.; Lyu, S.; Cheng, G. Enhancing drone-based fire detection with flame-specific attention and optimized feature fusion. Int. J. Appl. Earth Obs. Geoinform. 2025, 142, 104655. [Google Scholar] [CrossRef]
Lin, Z.; Yun, B.; Zheng, Y. LD-YOLO: A lightweight dynamic forest fire and smoke detection model with dysample and spatial context awareness module. Forests 2024, 15, 1630. [Google Scholar] [CrossRef]
Li, J.; Zhou, G.; Chen, A.; Wang, Y.; Jiang, J.; Hu, Y.; Lu, C. Adaptive linear feature-reuse network for rapid forest fire smoke detection model. Ecol. Inform. 2022, 68, 101584. [Google Scholar] [CrossRef]
Han, Y.; Duan, B.; Guan, R.; Yang, G.; Zhen, Z. LUFFD-YOLO: A lightweight model for UAV remote sensing forest fire detection based on attention mechanism and multi-level feature fusion. Remote Sens. 2024, 16, 2177. [Google Scholar] [CrossRef]
Chen, G.; Cheng, R.; Lin, X.; Jiao, W.; Bai, D.; Lin, H. LMDFS: A lightweight model for detecting forest fire smoke in UAV images based on YOLOv7. Remote Sens. 2023, 15, 3790. [Google Scholar] [CrossRef]

Figure 1. (a) End-to-end workflow from DFS construction to RLFNet training, pruning, and UAV-based detection. (b) Comparison of RLFNet with lightweight models on DFS (x: parameters, y: mAP50; bubble size: GFLOPs; color: FPS).

Figure 2. Width–height distribution of instance boxes in the DFS dataset.

Figure 3. Sample images from the DFS dataset.

Figure 4. The overall architecture of the proposed RLFNet.

Figure 5. The structure of the Parallel Multi-Scale Extraction Block (PMEB).

Figure 6. The structure of the Bidirectional Cross Fusion Module (BCFM).

Figure 7. The structure of the Faster Inference Detection Head (FIDH).

Figure 8. Schematic diagram of the LAMP pruning procedure.

Figure 9. Visual comparison of detection results on the DFS dataset. Top: YOLOv11n; bottom: RLFNet. (a) Forest fire scenario; (b) multi-scale scenario; (c) strongly disturbed scenario.

Figure 10. Grad-CAM visualization results for real forest fire detection.

Figure 11. Visualization results on the M⁴SFWD dataset: (a) detection boxes and heatmaps produced by RLFNet; (b) the corresponding results of YOLOv11n.
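Figures 10 and 11 rely on Grad-CAM heatmaps. The captions do not state which layer or which detection score was used, so the following is only a minimal NumPy sketch of the standard Grad-CAM weighting. It assumes the feature maps of one chosen layer and the gradients of a scalar target score with respect to them have already been extracted (for example, with framework hooks). The function name `grad_cam` is hypothetical, and upsampling the map to image resolution is omitted.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap for one layer (hypothetical helper, not the authors' code).

    activations: (C, H, W) feature maps A^k of the chosen layer.
    gradients:   (C, H, W) gradients dy/dA^k of the target score y.
    Returns an (H, W) map scaled to [0, 1].
    """
    if activations.shape != gradients.shape or activations.ndim != 3:
        raise ValueError("expected matching (C, H, W) arrays")
    # Channel weights alpha_k: global-average-pooled gradients
    alpha = gradients.mean(axis=(1, 2))
    # Weighted sum over channels, then ReLU keeps only positively contributing regions
    cam = np.maximum(np.tensordot(alpha, activations, axes=1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

For a detector, the target score y would typically be the confidence of one predicted box or class; that choice determines which regions light up.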

Table 1. Comparison of RLFNet with other object detection models.

Model	mAP50 (%)	mAP50 (F)	mAP50 (S)	mAP50-95 (%)	mAP50-95 (F)	mAP50-95 (S)	Params (M)	GFLOPs	FPS
YOLOv5n [36]	69.2	80.2	59.1	43.4	51.7	36.1	2.6	7.1	172.3
YOLOv7-Tiny [37]	69.9	77.4	62.3	41.2	45.7	36.6	6.0	13.0	95.2
YOLOv8n [38]	72.1	78.6	65.5	49.3	54.1	44.6	2.7	6.8	202.1
YOLOv10n [39]	71.1	77.2	65.1	43.2	45.0	41.4	2.7	8.2	189.8
YOLOv11n [40]	71.2	78.7	64.1	49.5	54.2	44.9	2.6	6.3	163.3
YOLOv12n [41]	71.7	77.8	65.5	46.6	48.8	44.5	2.5	5.8	178.0
YOLOv13n [42]	74.9	81.4	68.5	53.1	56.6	49.6	2.5	6.1	146.1
RetinaNet [43]	67.1	68.5	65.6	44.6	48.9	42.2	36.4	163.9	14.2
RT-DETR [44]	67.9	74.3	61.5	45.8	49.2	42.5	32.0	103.4	31.2
Ours	76.5	81.5	71.4	53.3	55.8	50.9	1.9	5.0	224.8
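Table 1 reports mAP50 (IoU threshold 0.5) and mAP50-95 (AP averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05). The sketch below shows how these metrics are conventionally computed, using greedy confidence-ordered matching and VOC-style all-point interpolated AP. It is an illustration under assumptions, not the evaluator used for the table: COCO-style tools typically sample precision at 101 recall points instead, so this code will not reproduce Table 1 exactly. All names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corner format
Pred = Tuple[str, float, Box]            # (image_id, confidence, box)

# COCO-style IoU thresholds 0.50, 0.55, ..., 0.95 used for mAP50-95
COCO_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: Sequence[Pred], gts: Dict[str, List[Box]],
                      iou_thr: float = 0.5) -> float:
    """All-point interpolated AP for one class at one IoU threshold."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp_flags: List[int] = []
    # Greedy matching: highest-confidence predictions claim ground truths first
    for img, _, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best:
                best, best_j = overlap, j
        if best >= iou_thr and not used[img][best_j]:
            used[img][best_j] = True
            tp_flags.append(1)
        else:
            tp_flags.append(0)  # duplicate, poor localization, or no object
    # Cumulative recall and precision along the ranked list
    rec, prec, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        rec.append(tp / n_gt)
        prec.append(tp / k)
    # Precision envelope (monotonically non-increasing), then area under it
    mrec = [0.0] + rec + [1.0]
    mpre = [0.0] + prec + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_ap(preds_by_class: Dict[str, List[Pred]],
            gts_by_class: Dict[str, Dict[str, List[Box]]],
            thresholds: Sequence[float]) -> float:
    """Mean of per-class AP over all classes and all given IoU thresholds."""
    aps = [average_precision(preds_by_class.get(c, []), gts_by_class[c], t)
           for c in gts_by_class for t in thresholds]
    return sum(aps) / len(aps)
```

Passing `thresholds=[0.5]` to `mean_ap` gives an mAP50-style value, and passing `COCO_THRESHOLDS` gives an mAP50-95-style value; restricting `gts_by_class` to a single class gives per-class figures.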

Table 2. Ablation results for the YOLOv11n-based model.

PMEB	BCFM	FIDH	mAP50 (%)	Precision (%)	Recall (%)	Params (M)	GFLOPs	FPS
–	–	–	71.2	74.7	68.1	2.58	6.3	163.3
✓	–	–	71.8	77.3	66.1	2.49	6.2	186.5
–	✓	–	72.4	78.2	66.9	2.74	6.5	169.1
–	–	✓	72.3	76.2	68.0	2.42	5.6	202.6
✓	✓	–	73.6	74.8	67.9	2.66	6.4	183.5
✓	–	✓	72.6	77.7	68.2	2.32	5.5	222.2
–	✓	✓	73.8	77.4	67.8	2.58	5.8	174.9
✓	✓	✓	74.7	77.9	68.4	2.48	5.7	215.1

Note: The symbol ✓ indicates that the corresponding module is included in the model.
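The ablation rows are easiest to read as differences against the YOLOv11n baseline. The sketch below recomputes those differences for the first and last rows of Table 2 and converts FPS into an implied mean per-image latency. The latency conversion assumes FPS was measured one image at a time, which this section does not state; the helper names are hypothetical.

```python
from typing import Dict

# Values copied from Table 2: baseline (no modules) and full model (PMEB + BCFM + FIDH)
BASELINE = {"mAP50": 71.2, "params_m": 2.58, "gflops": 6.3, "fps": 163.3}
FULL = {"mAP50": 74.7, "params_m": 2.48, "gflops": 5.7, "fps": 215.1}


def relative_change(new: float, old: float) -> float:
    """Signed relative change in percent."""
    return (new - old) / old * 100.0


def latency_ms(fps: float) -> float:
    """Mean per-image latency implied by a throughput figure (batch size 1 assumed)."""
    return 1000.0 / fps


def summarize(base: Dict[str, float], model: Dict[str, float]) -> dict:
    return {
        "delta_mAP50_points": round(model["mAP50"] - base["mAP50"], 1),
        "params_change_pct": round(relative_change(model["params_m"], base["params_m"]), 1),
        "gflops_change_pct": round(relative_change(model["gflops"], base["gflops"]), 1),
        "fps_change_pct": round(relative_change(model["fps"], base["fps"]), 1),
        "latency_ms": (round(latency_ms(base["fps"]), 2), round(latency_ms(model["fps"]), 2)),
    }


if __name__ == "__main__":
    print(summarize(BASELINE, FULL))
```

For these two rows the summary is +3.5 mAP50 points, -3.9% parameters, -9.5% GFLOPs and +31.7% FPS, which corresponds to roughly 6.12 ms versus 4.65 ms per image under the batch-size assumption.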

Table 3. Ablation results for different kernel-size combinations in PMEB.

Kernel Size Combination	mAP50 (%)	Precision (%)	Recall (%)	Parameters (M)	GFLOPs	FPS
[1, 3]	71.1	76.8	65.2	2.47	6.19	189.2
[1, 5]	71.3	77.0	65.6	2.48	6.20	187.8
[3, 5] (Ours)	71.8	77.3	66.1	2.49	6.21	186.5

Table 4. The impact of different pruning ratios on RLFNet’s performance.

Pruning Ratio	mAP50 (%)	Precision (%)	Recall (%)	Parameters (M)	GFLOPs	FPS
None	74.7	77.9	68.4	2.6	5.7	215.14
1.1 (final)	76.5	77.4	70.4	1.9	5.0	224.8
1.2	71.9	74.4	65.8	1.7	4.5	246.3
1.3	71.8	75.2	68.4	1.6	4.1	272.8
1.4	70.6	75.9	65.5	1.5	3.7	290.0
1.5	70.1	73.0	63.2	1.4	3.4	296.6
2.0	66.8	72.4	60.1	1.2	2.3	311.3
2.5	35.3	52.9	33.0	1.1	1.7	323.1
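Table 4 varies the pruning ratio applied through LAMP (layer-adaptive magnitude-based pruning; Lee et al. in the reference list above). As a minimal sketch of that paper's scoring rule, the code below computes unstructured, per-weight LAMP scores and prunes globally to a target sparsity. This is a simplification under stated assumptions: the falling GFLOPs and rising FPS in Table 4 suggest that RLFNet removes whole structures such as channels, which this sketch does not do, and because Table 4 lists ratios above 1, its "pruning ratio" is not the sparsity fraction used here; its exact definition is not given in this section. The function names are hypothetical.

```python
from typing import Dict, List


def lamp_scores(weights: List[float]) -> List[float]:
    """LAMP score of each weight in one layer (per Lee et al.).

    With weights indexed in ascending order of |w|:
        score(u) = w_u^2 / sum_{v >= u} w_v^2
    Scores are returned in the original order of `weights`.
    """
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    sq = [weights[i] ** 2 for i in order]
    scores = [0.0] * len(weights)
    tail = 0.0
    # Walk from the largest magnitude downwards, accumulating the suffix sum
    for rank in range(len(order) - 1, -1, -1):
        tail += sq[rank]
        scores[order[rank]] = sq[rank] / tail if tail > 0 else 0.0
    return scores


def global_lamp_masks(layers: Dict[str, List[float]],
                      sparsity: float) -> Dict[str, List[bool]]:
    """Keep-masks after removing a global fraction `sparsity` of weights by LAMP score."""
    scored = [(s, name, i)
              for name, w in layers.items()
              for i, s in enumerate(lamp_scores(w))]
    n_prune = int(round(sparsity * len(scored)))
    scored.sort(key=lambda t: t[0])  # lowest scores are pruned first
    masks = {name: [True] * len(w) for name, w in layers.items()}
    for _, name, i in scored[:n_prune]:
        masks[name][i] = False
    return masks
```

Because the largest weight in every layer scores exactly 1, LAMP never empties a layer. For example, with layers `{"a": [0.1, 0.2], "b": [5.0, 6.0, 7.0]}` at 40% sparsity, plain global magnitude pruning would remove both weights of `a`, whereas `global_lamp_masks` removes `0.1` from `a` and `5.0` from `b`.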

Table 5. Comparison of RLFNet with other object detection models on the M⁴SFWD dataset.

Model	mAP50 (%)	Precision (%)	Recall (%)	Parameters (M)	GFLOPs	FPS
YOLOv5n	86.2	84.1	81.1	2.7	7.1	268.1
YOLOv8n	86.8	85.1	79.9	3.0	6.8	292.6
YOLOv10n	85.5	82.8	78.0	2.3	6.5	277.8
YOLOv11n	87.0	83.1	81.6	2.6	6.3	270.3
YOLOv12n	86.5	83.8	81.1	2.5	5.8	252.7
YOLOv13n	86.9	84.2	81.5	2.5	6.1	281.9
RLFNet	87.2	83.3	81.8	1.9	5.0	312.5


MDPI and ACS Style

Huang, Z.; Kou, W.; Zheng, C.; Di, G.; Zhang, Q.; Ma, C. RLFNet: A Real-Time Lightweight Network for Forest Fire Detection on Edge Devices. Remote Sens. 2026, 18, 1543. https://doi.org/10.3390/rs18101543

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
