1. Introduction
Wildfires, commonly referred to as bushfires in Australia, are unplanned fires that spread through wildland areas, causing significant environmental damage, destroying property, endangering lives, and contributing to air pollution. The necessity for early wildfire detection is both urgent and complex, influenced by the potential for exponential fire spread under critical conditions [1]. Early warning systems reduce risks to residents and responders by providing time for evacuation and strategic planning, and they save millions in firefighting and recovery costs by enabling early fire containment [2]. Remote sensing (RS) platforms, including satellites, manned aircraft, and uncrewed aerial systems (UASs), play a vital role in providing on-demand information for areas that are hard to reach or pose risks to humans, such as regions prone to wildfires [3]. UASs demonstrate distinct advantages in wildfire management, outperforming other RS approaches by delivering high-resolution spectral and structural data with the requisite temporal and spatial resolution for effective monitoring [4].
Artificial intelligence (AI) in UAS-based RS involves analyzing data collected by UASs to perform tasks such as object detection and tracking, and semantic segmentation. This technology enhances environmental monitoring across various domains, including agriculture [5], biosecurity [6], search and rescue [7], and disaster management [8]. Machine learning (ML) is a core component of AI, providing the ability to learn from data. Deep learning (DL) is a sophisticated branch of ML, excelling in handling complex datasets through deep neural networks. Recent studies show how advanced DL models have greatly enhanced the precision of wildfire detection systems, outperforming conventional ML techniques [4]. DL features a range of architectures such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), transformers, and generative adversarial networks (GANs) [9]. In detection tasks, CNNs are particularly effective at capturing hierarchical local features, while vision transformers (ViTs) have gained popularity for their strong performance on large datasets, given sufficient computational resources [10,11]. For the development and evaluation of these advanced models, a variety of datasets have been utilized, which are essential resources for improving DL approaches to wildfire detection.
Recent CNN-based wildfire detection studies have utilized various datasets, demonstrating notable performance when trained and tested on high-performance computing systems [4].
Table 1 summarizes recent studies conducted on real-time wildfire detection using UAS imagery. Object detection is more commonly applied than semantic segmentation due to its faster deployment capabilities, while accuracy and speed vary depending on the dataset, number of target classes, and the hardware used for deployment. Jiao et al. [12] developed a large variant of the You Only Look Once version 3 (YOLOv3) model, which enables real-time fire detection by capturing images via UASs and transmitting them to ground stations for processing. The use of large AI models enhances prediction accuracy and performance, which is critical for timely wildfire response. However, transmitting and processing images in wildland environments is challenging due to latency and limited connectivity caused by sparse cell tower coverage or limited bandwidth [13]. Meanwhile, running DL models on resource-constrained computers onboard UASs presents significant challenges due to their limited computational power and memory, which restricts the complexity of deployable models. Lightweight models, however, improve UAS autonomy and operational effectiveness by optimizing the trade-off between detection performance and speed in wildfire scenarios. Techniques such as knowledge distillation (KD) [14] transfer learned knowledge from a large, high-performing model to a smaller model, approximating the original performance while significantly reducing computational complexity and making the smaller model suitable for deployment in resource-constrained environments [15].
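As an illustration of the KD objective referred to above, the classical soft-label distillation loss of Hinton et al. [14] can be sketched in a few lines of Python. The function names, temperature value, and logits below are illustrative assumptions for exposition, not the configuration used in any specific study.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature softens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy between the softened teacher and student distributions.

    Scaled by T^2 so gradient magnitudes stay comparable across temperatures,
    following Hinton et al.'s formulation.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    ce = -sum(p * math.log(q + 1e-12) for p, q in zip(p_teacher, p_student))
    return (temperature ** 2) * ce
```

The loss is minimized when the student reproduces the teacher's softened class probabilities, which is how the smaller model approximates the larger model's behaviour at a fraction of the computational cost.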
The scarcity of labelled UAS-acquired wildfire datasets poses a significant challenge for training robust models. Most studies have combined UAS-acquired datasets with publicly available wildfire datasets not originally captured by UASs [15,16,17,18], largely because of the limited availability of labelled datasets [19]. This is a common challenge in RS and AI, driven by the time-consuming nature of annotation and the need for domain expertise [20]. Using wildfire images from non-UAS sources captured at ground level with handheld devices helps overcome the lack of labelled UAS-acquired data and enhances model generalization through diverse scenarios. However, differences in viewpoint, resolution, and labelling standards can introduce a domain gap, potentially reducing the accuracy of UAS-based wildfire detection models [21].
Wildfire detection systems primarily target flame, smoke, or a combination of both classes. Smoke detection often acts as an early indicator of wildfire activity, as smoke rises above the canopy and can be observed from a significant distance before flames become visible. This makes smoke a critical early warning signal in wildfire detection efforts. Research by Barmpoutis et al. [22] and Wang et al. [15], for instance, reported that smoke detection demonstrated suboptimal performance, primarily due to the prevalence of false-positive results. Environmental conditions such as fog and low-lying clouds, which can visually mimic smoke in UAS imagery, frequently resulted in misclassifications, thereby diminishing the accuracy of automated smoke detection systems [23,24]. Consequently, while smoke remains an important early indicator, differentiating smoke from analogous atmospheric phenomena continues to pose a substantial challenge.
In response to challenges in UAS-based wildfire detection, particularly the domain gap introduced by non-UAS datasets and the difficulty of distinguishing smoke from similar atmospheric phenomena, we introduced plume detection as an additional key indicator, alongside smoke and flame, to increase confidence and reduce false alarms. A plume is typically characterized by a vertically rising column of smoke that extends above the forest canopy, making it visually distinct during the initial stages of a fire. It appears as a structured, coherent, and elongated column, generally upright but often tilted under wind influence. The plume originates from a narrow base at the ignition point and widens with height, forming a recognizable geometry. Its texture commonly shows smooth, layered, or rolling billows with a sharper boundary than diffused smoke or fog, enabling clearer discrimination in aerial imagery. The boundary and structure of the plume generally diffuse beyond a certain distance from the source, which is useful for defining plume height and annotating it for machine learning. Studies have explored smoke plume height detection using satellite-based systems [25]. However, UASs can further enhance this capability by capturing high-resolution, near-real-time imagery at lower altitudes. These capabilities facilitate the detection of finer plume structures, particularly for small-scale fires in the early wildfire stage, where satellite imagery may be limited by spatial resolution constraints, elevated data acquisition costs, and delays in retrieving data for real-time decision making. We conducted a controlled fire experiment in Canungra, QLD, Australia to capture a domain-specific imagery dataset [26] from deciduous vegetation-dominated wildland using a UAS. The images were manually annotated for three classes: smoke, plume, and flame.
Instead of benchmarking smoke detection against other architectures, this study focuses on evaluating how the inclusion of plume as a distinct visual indicator, alongside smoke and flame, improves the overall reliability of wildfire detection, particularly under foggy conditions. We implemented a teacher-student framework using KD [14], where a fine-tuned teacher model generated pseudo-labels and transferred knowledge to lightweight student models, which were subsequently optimized for real-time applications. Grounding DINO [27], a powerful open-set object detection model that combines language-guided localization with the DINO [28] (Detection Transformer [29] with improved training) architecture, was used as the teacher model. YOLO Nano variants, including YOLO version 5 (YOLOv5n), version 8 (YOLOv8n), and version 11 (YOLOv11n), were chosen as student models due to their speed and lightweight design, making them suitable for deployment on resource-constrained hardware devices. The acquired dataset [26] was augmented with the Flame2 dataset [30] to enhance model generalization. The Flame2 dataset comprises RGB images collected by UAS during a prescribed fire in an open canopy setting in Northern Arizona in 2021. A portion of the Flame2 dataset was manually annotated and used to fine-tune the teacher model. The fine-tuned teacher model was then used to generate pseudo-labels for the entire dataset across the three classes. The combined dataset, which included both manual and pseudo-labels from wildlands with varying dominant vegetation types, was used to train the student models. This approach ensures consistency and accuracy in labelling while minimizing the domain gap between the datasets. By investigating the plume alongside traditional indicators such as smoke and flame, and leveraging UAS-specific datasets, this study aims to improve the accuracy, reliability, and scalability of automated wildfire detection systems, particularly in identifying early warning signals under challenging environmental conditions.
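The pseudo-labelling step in a pipeline of this kind, keeping only the teacher's confident detections and writing them in the annotation format the YOLO students consume, can be sketched as follows. The confidence threshold, class-ID mapping, and function name are illustrative assumptions rather than the exact values used in this study; only the YOLO text format (normalized class, centre, width, height) is standard.

```python
def to_yolo_pseudo_labels(detections, img_w, img_h, conf_threshold=0.5,
                          class_ids=None):
    """Convert teacher detections (x1, y1, x2, y2, label, score) into
    YOLO-format pseudo-label lines, keeping only confident predictions."""
    if class_ids is None:
        class_ids = {"smoke": 0, "plume": 1, "flame": 2}
    lines = []
    for x1, y1, x2, y2, label, score in detections:
        if score < conf_threshold or label not in class_ids:
            continue  # discard low-confidence or unknown-class boxes
        # YOLO format: class_id, centre x/y and width/height, all normalized
        cx = (x1 + x2) / 2 / img_w
        cy = (y1 + y2) / 2 / img_h
        w = (x2 - x1) / img_w
        h = (y2 - y1) / img_h
        lines.append(f"{class_ids[label]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```

Filtering on the teacher's confidence before training the students is what keeps pseudo-labelling from amplifying the teacher's own mistakes.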
4. Discussion
This study evaluated the effectiveness of incorporating plume as a distinct classification class alongside smoke and flame to improve real-time wildfire detection using UAS imagery under foggy conditions, and demonstrated that few-shot learning enables effective fine-tuning with a limited number of annotated examples [43]. The results, presented in Table 14, indicate that this multi-class classification approach significantly improves the accuracy of early-stage wildfire detection. The performance of the three YOLO Nano variants on a dataset comprising fog and wildfire images underscores the importance of distinguishing between smoke and plume. Notably, plume detection consistently outperformed smoke detection across all models, achieving higher accuracy (80.1% to 87.5%) and F1-scores (76.1% to 85.7%), whereas smoke detection showed lower accuracy (approximately 58.8%) and a modest F1-score of around 70%. This finding highlights the well-documented difficulty of differentiating smoke from fog in low-visibility environments, an issue frequently noted in previous optical sensing studies [15,22,44].
While all models demonstrated perfect recall (100%) for smoke detection, they exhibited low precision (55%), indicating that although they successfully detected all wildfire instances, they produced a significant number of false positives in fog images. This limitation is especially evident in the right panel of Figure 14, which shows how fog and smoke can share low opacity and horizontally spreading features, making them difficult to separate visually. The lower recall for plume detection (63.2–75%) suggests that some actual plumes were missed, indicating a less comprehensive capture of plume instances compared to smoke. Despite one misclassification, the inclusion of the plume class proved beneficial in reducing confusion with fog, as its distinct vertical geometry and higher columnar density made it more discriminative than diffuse smoke (Figure 15a), with the models achieving high F1-scores that indicate reliable plume identification and minimal false positives under foggy conditions.
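The interplay between the recall and precision figures above and the resulting F1-score follows directly from the standard definitions; with perfect recall and 55% precision, F1 works out to roughly 0.71, consistent with the smoke results reported. A minimal sketch (the counts in the example are illustrative, not the study's actual detection counts):

```python
def precision_recall_f1(tp, fp, fn):
    """Detection metrics from true-positive, false-positive, and
    false-negative counts; returns (precision, recall, f1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 55 true positives, 45 false positives, no missed instances:
# precision 0.55, recall 1.0, F1 about 0.71
p, r, f = precision_recall_f1(55, 45, 0)
```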
From the labelled details presented in Table 4 and Table 6, it was observed that over half of the images annotated for smoke also included plume labels, highlighting the frequent presence of plumes in wildfire imagery, particularly during the early ignition phase. By differentiating plume from diffuse smoke and fog, the model was able to focus on unique visual features such as vertical ascent, which are less prone to misclassification. This strategy contributed to the notably high F1-score for plume detection, especially with YOLOv11n. The superior performance is primarily attributed to the C2PSA (Cross Stage Partial with Spatial Attention) module [39], which enhances the model's ability to focus on specific spatial regions of interest. It allows the model to prioritize the structural coherence of plumes, such as their vertical ascent, effectively suppressing false positives caused by the diffuse and unstructured nature of fog. Flame detection outperformed plume detection across all models but was less useful in the early stages of wildfires, as flames were often hidden by canopy cover [45]. By incorporating plume detection, the model uses a feature that appears early in a wildfire and is visually distinct from fog, thereby improving overall detection reliability.
Recent studies have primarily employed YOLO variants and other compact architectures to detect smoke and flame, often using combined classes such as “smoke + flame.” Our study advances this approach by introducing the plume class as a separate category, enhancing the model’s ability to capture early and distinguishable wildfire indicators. The mAP@0.5 of 0.721 achieved by YOLOv8n and YOLOv5n in this study is competitive with prior research. Akhloufi et al. [16] reported a slightly higher mAP@0.5 of 79.84% using YOLOv3-tiny, which classified flame, smoke, and a combined “smoke + flame” class. In contrast, our model introduces a novel plume class, which may increase detection complexity due to its diverse visual characteristics. Wang et al. [15] achieved a lower mAP of 0.661 using YOLOv4-MobileNetV3, while their KD-based pruned model exhibited faster prediction speeds with an mAP of 0.631, indicating that our approach outperforms this lightweight architecture despite handling an additional class. Xiong et al. [18] reported AP@0.5 values of 0.86 for flame and 0.71 for smoke (estimated mAP: 0.785). When limited to these two classes, our YOLOv8n achieves an mAP@0.5 of 0.815, surpassing their performance. The integration of manual and pseudo-labels in our study also significantly enhanced labelling efficiency.
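The mAP estimate quoted for Xiong et al. [18] is simply the unweighted mean of the reported per-class AP@0.5 values, as the short sketch below makes explicit:

```python
def mean_average_precision(ap_per_class):
    """mAP@0.5 as the unweighted mean of per-class AP@0.5 values."""
    return sum(ap_per_class.values()) / len(ap_per_class)

# (0.86 + 0.71) / 2 = 0.785, the estimate cited above
estimate = mean_average_precision({"flame": 0.86, "smoke": 0.71})
```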
Smoke achieved the highest IoU because its diffuse but relatively uniform appearance creates more consistent and clearly identifiable boundaries across images, allowing both annotators and the model to localize smoke regions more reliably. Compared to the other classes, the lower detection performance for plumes during model development is likely influenced by the difficulty of consistently annotating their boundaries. As vertically rising columns of smoke, plumes often lack distinct edges, making it difficult to determine where bounding boxes should end. This subjectivity introduces annotation variability, resulting in inconsistencies in ground-truth labels that reduce the IoU and overall accuracy. In addition to vertical ascent, plumes also exhibit higher columnar density, stronger structural coherence, and more stable upward motion compared to diffuse smoke or horizontally spreading fog. Flames, while smaller and more sensitive to bounding-box errors [46], performed better, likely due to their bright, well-defined shapes that enable more consistent annotation. These class-specific differences underscore the importance of improved annotation strategies, particularly for diffuse or irregular targets like plumes. On the deployment side, platform performance also plays a key role in real-time detection. The Jetson Orin Nano demonstrated significantly higher frame rates than the OAK-D, highlighting the advantage of GPU-enabled hardware for processing complex models at the edge and offering practical insights into the trade-offs between speed, power efficiency, and model complexity in wildfire monitoring applications.
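The sensitivity of small flame boxes to localization error noted above follows directly from the IoU definition: a fixed pixel shift costs a small box proportionally far more overlap than a large one. A minimal sketch for axis-aligned boxes (the coordinates and shift in the example are illustrative):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

For instance, a 5-pixel horizontal shift drops the IoU of a 20 × 20 box to 0.6, while the same shift leaves a 100 × 100 box at roughly 0.90, which is why compact, well-defined flame boxes tolerate small annotation disagreements poorly.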
This study did not analyze the full temporal progression of the burning process (flame–plume–smoke transitions) during the controlled pile burns, as the experimental setup was designed solely for data acquisition to support real-time detection rather than fire behaviour characterization. The dataset used in this study [26] was collected during controlled burns and is limited to images captured under daylight conditions. While this provides a consistent and manageable environment for model development, it limits the model’s generalizability. Smoke and plume characteristics such as colour and density can vary significantly with fuel type [47], combustion temperature [48], and environmental conditions such as wind direction [49]. These variations make detection more challenging in diverse contexts and may reduce the model’s performance in foggy conditions, during nighttime, or across different forest types. While RGB cameras are cost-effective, their performance is limited at night due to low illumination, reduced contrast, and increased noise. This study did not assess nighttime performance, which remains a key limitation. Future work should consider fusing RGB with thermal imagery, as thermal sensors can detect heat signatures independent of lighting, offering a promising solution for improving nighttime wildfire detection reliability [50]. In addition, factors such as the camera’s intrinsic resolution, the fixed gimbal orientation, the flight altitudes used during data acquisition, and the variability in wind conditions impose further constraints on the dataset. These acquisition parameters may not fully capture the diversity of imaging geometries and environmental conditions encountered in real wildfire settings and therefore represent potential sources of bias that should be recognized as a limitation of this study.
The inclusion of the plume class, despite its relatively lower detection performance, marks a significant step forward in wildfire detection by providing crucial early warning cues, especially in foggy conditions. To further improve detection accuracy, integrating multi-modal data can be beneficial; although RGB cameras offer a more cost-effective option than thermal imagery, the model’s performance at night remains untested. Moreover, the adoption of KD techniques can make dataset creation more scalable by automating and accelerating the annotation of large volumes of images, reducing manual effort while ensuring labelling consistency. By utilizing smoke, plume, and flame classes, this study offers a solid framework for addressing fog-related challenges in wildfire detection and enhances the timeliness of response by evaluating deployment speeds across two different edge processing devices.