A Unified Deep Learning Framework for Biomass Burning Plume Detection and Domain-Adaptive PM₁ Estimation

Li, Peimeng; Guo, Hongyu

doi:10.3390/su18105138

by Peimeng Li and Hongyu Guo *

Guangdong Provincial Key Laboratory of Environmental Pollution Control and Remediation Technology, School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China

* Author to whom correspondence should be addressed.

Sustainability 2026, 18(10), 5138; https://doi.org/10.3390/su18105138

Submission received: 10 April 2026 / Revised: 14 May 2026 / Accepted: 16 May 2026 / Published: 20 May 2026

(This article belongs to the Special Issue Air Quality Characterisation and Modelling—2nd Edition)

Abstract

Biomass burning is a major source of atmospheric pollution. However, rapid and quantitative assessment of particulate matter in smoke plumes remains challenging, owing to the physical uncertainties, limited coverage, and labor-intensive quality control of conventional monitoring approaches. Existing image-based deep learning methods typically address either smoke detection or air quality assessment separately. To address this gap, we develop a Unified Smoke Detection and Aerosol Estimation Framework (SDAF), a three-stage deep learning approach evaluated using a smoke-rich airborne dataset. The framework integrates smoke localization with PM₁ estimation by combining a YOLOv11-based detector with an optimized convolutional neural network. The model achieves high accuracy under in-plume conditions (R² of 0.985). However, its performance degrades under out-of-plume conditions due to substantial differences in visual features between the two domains. Consequently, direct cross-domain transfer performs poorly, whereas region of interest (ROI)-level fine-tuning substantially improves performance for out-of-plume images (R² of 0.621). Despite these promising results, fundamental limitations remain. Image-based PM₁ estimation is intrinsically ill-posed due to the non-unique mapping between visual observations and particle mass. Overall, the framework enables an integrated workflow from smoke localization to quantitative PM₁ estimation using image data alone, offering a scalable solution for biomass burning monitoring and air quality assessment while highlighting the fundamentally indirect nature of image-based PM₁ inference relative to spatially resolved retrievals.

Keywords: biomass burning; image-based deep learning; smoke detection; PM estimation

1. Introduction

Biomass burning represents a major source of atmospheric pollutants and encompasses both natural and anthropogenic processes, including wildfires and open burning of agricultural residues. Emissions from biomass burning include particulate matter (PM), greenhouse gases, and a wide range of reactive gaseous species, which collectively influence regional air quality, perturb the Earth-atmosphere radiation balance, and impose stress on ecosystems [1,2,3,4]. These impacts are expected to intensify under climate change, as increasing frequency of extreme heat and drought events promotes more frequent and severe fires. By the end of this century, biomass burning is projected to contribute over 50% of PM₂.₅ across the continental United States, potentially offsetting gains from anthropogenic emission reductions [5].

Biomass burning emissions exhibit strong spatial and temporal variability and can trigger severe, short-term air pollution episodes [6]. Since 2016, large wildfires have significantly increased PM₂.₅ in the western and midwestern US [7]. In China, open-field agricultural burning dominates emissions, with high-emission zones aligned with major crop areas [8,9,10]. Such characteristics highlight the need for timely and effective monitoring of burning activities. Among various observable indicators, smoke is the most direct and visually discernible manifestation of biomass burning, making it a practical target for monitoring and early warning. In addition to indicating fire occurrence, the plume also carries information on the intensity of pollutant emissions, particularly PM, providing the potential for simultaneous detection and air quality assessment.

However, existing monitoring approaches remain limited in both early detection and quantitative characterization. Conventional ground-based monitoring networks are often costly to deploy and maintain, and their sparse spatial coverage limits their ability to effectively capture large-scale biomass burning events in remote regions, as well as sporadic, small-scale burning activities in suburban areas. In addition, manual observations and fixed camera systems are further constrained by high labor demands and sensitivity to environmental conditions [11]. Satellite remote sensing enables large-scale observation but is constrained by cloud interference, overflight timing, and challenges in capturing rapidly evolving fire behavior, particularly nighttime activity. In addition, although high-resolution sensors such as Landsat provide relatively fine spatial detail, many widely used fire-monitoring products rely on coarser-resolution observations that have limited capability for detecting small or early-stage fires [12,13,14].

Estimating PM concentrations from smoke images requires linking visual features to aerosol properties. From a physical perspective, aerosol particles influence image formation through light extinction, with Mie scattering governing interactions for particles comparable to visible wavelengths [15,16,17,18]. Variations in particle concentration, size distribution, and refractive index affect image brightness, contrast, and texture. For instance, Saide et al. [19] reported that aerosol mass extinction efficiency (MEE), largely dominated by scattering, can increase by a factor of 2–3 as wildfire smoke ages due to changes in particle size and refractive index. However, this relationship is inherently nonlinear and depends on multiple interacting factors, including illumination conditions, plume morphology, particle size distribution (e.g., coarse-mode fraction) and background complexity. These complexities make explicit analytical modeling impractical and motivate the use of data-driven approaches [20].

As a prerequisite for quantitative estimation, accurate detection and localization of smoke regions are essential to isolate plume-related visual features from complex backgrounds. Recent advances in deep learning (DL) have enabled significant progress in image-based smoke detection. Object detection architectures, such as You Only Look Once (YOLO), Single Shot MultiBox Detector (SSD), Faster Region-based Convolutional Neural Network (Faster R-CNN), and Detection Transformer (DETR), which incorporates transformer-based global feature modeling, have demonstrated strong capability in identifying smoke and localizing its spatial extent across diverse imaging conditions [21,22,23,24,25,26,27]. Among these, the YOLO family of algorithms achieves a favorable balance between detection accuracy and computational efficiency, making it well suited for real-time monitoring applications [28,29].

Building upon detection, estimating PM concentrations from smoke imagery remains substantially more challenging. Both machine learning (ML) and DL approaches provide powerful tools for capturing the nonlinear relationships between image features and aerosol properties. Traditional ML models (e.g., support vector machines (SVMs), random forests (RFs), and Bayesian networks (BNs)) are effective for specific tasks but are limited by their reliance on externally prescribed features and prior domain knowledge in representing complex atmospheric processes [30,31,32,33,34]. In contrast, DL models are capable of learning hierarchical feature representations from image data and have demonstrated strong performance in environmental prediction tasks [30,32]. However, despite these advantages, both ML and DL approaches primarily rely on statistical correlations rather than explicitly incorporating physical principles, which limits their interpretability. In practice, the robustness of learning-based models can be enhanced under challenging data conditions, such as limited samples or noisy observations, through strategies such as data augmentation and ensemble learning, which are widely adopted to improve generalization performance [35]. Nevertheless, existing image-based PM estimation studies have been developed primarily for urban pollution scenarios and rely on data sources such as surveillance or satellite imagery [36,37], with limited focus on biomass burning plumes.

To address these limitations, this study develops a unified DL framework for simultaneous smoke plume detection and PM₁ mass concentration estimation from visual observations, enabling an integrated processing workflow from smoke localization to quantitative assessment. By incorporating region of interest (ROI) extraction of smoke plume areas and transfer learning—where knowledge learned from in-plume images is adapted to improve PM₁ estimation for out-of-plume images—the framework is designed to handle domain discrepancies between different imaging conditions. The model supports both aerial and ground-based panoramic imagery, enhancing its applicability across diverse monitoring scenarios. This integrated approach provides a scalable and cost-effective solution for biomass burning monitoring, with potential applications in air quality management, early warning systems, and environmental policy support.

2. Methods

2.1. Dataset

Most publicly available datasets related to smoke images are designed primarily for smoke detection, with limited consideration given to the association between smoke images and corresponding PM concentrations. The lack of datasets that simultaneously support both smoke detection and quantitative PM estimation poses a significant challenge for the development of predictive algorithms. In this study, the dataset was obtained from the Fire Influence on Regional to Global Environments and Air Quality (FIREX-AQ) aircraft campaign [38], which was conducted in the summer of 2019 over the continental US to investigate the impacts of wildfires and other forms of biomass burning, including prescribed and agricultural residue burning, on regional-to-global air quality and climate. The campaign employed state-of-the-art airborne measurement instruments, including a highly customized University of Colorado (CU) high-resolution time-of-flight aerosol mass spectrometer (HR-ToF-AMS; hereafter referred to as AMS; Aerodyne Research Inc., Billerica, MA, USA) for high-resolution in situ PM₁ measurements, complemented by smoke videos captured by external cameras mounted on the DC8 aircraft.

The AMS measured the non-refractory chemical composition of PM₁ at a temporal resolution of 1 or 5 Hz [39,40]. The total PM₁ mass concentration was then derived as the sum of the measured non-refractory species (including organics, sulfate, nitrate, ammonium, chloride, and potassium), which together constitute the particle mass reported in this study. Potassium, a non-routine species measured by the AMS, contributed less than 1% of PM₁ mass in wildfire smoke but up to 7% in agricultural fire plumes. PM₁ concentrations are reported at standard temperature (273 K) and pressure (1013 mbar) (STP). The data processing details, including the averaging procedure, are described separately in Section 2.1.2 and Section 2.1.3 for the two types of smoke plume images. Biomass burning aerosols are dominated by submicron particles in both number and mass, while coarse particles contribute only a minor fraction in number but can account for a non-negligible portion of mass [41,42,43]. Large ash particles, for example, are a known product of biomass burning. In the FIREX-AQ campaign, fine ash particles constitute approximately 8% and 5% of biomass burning smoke, while ash particles larger than 10 µm are mostly deposited near the source due to rapid gravitational settling [44]. Thus, submicron mass is generally representative of smoke, and these coordinated observations provide a robust link between smoke visual characteristics and PM₁ mass concentrations.
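As a compact restatement of the PM₁ definition above, the label used in this study can be written as the sum of the measured species. The second line is an assumption added for illustration: it shows the usual ideal-gas conversion from ambient to standard conditions, which the text does not spell out, and the symbols C, T_amb, and P_amb are introduced here rather than taken from the paper.

```latex
\begin{align}
\mathrm{PM}_1 &= C_{\mathrm{Org}} + C_{\mathrm{SO_4}} + C_{\mathrm{NO_3}}
               + C_{\mathrm{NH_4}} + C_{\mathrm{Chl}} + C_{\mathrm{K}} \\
% Assumed ideal-gas conversion to STP (273 K, 1013 mbar); not stated in the text.
C_{\mathrm{STP}} &= C_{\mathrm{amb}} \,
                    \frac{T_{\mathrm{amb}}}{273\ \mathrm{K}} \,
                    \frac{1013\ \mathrm{mbar}}{P_{\mathrm{amb}}}
\end{align}
```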

The RGB images are extracted from aerial video footage captured by the forward-looking camera on the DC8 aircraft (3840 × 2160 resolution, 30 frames per second) and segmented into three complementary subsets, each tailored to a specific modeling objective. The smoke detection dataset contains images with annotated smoke regions. For PM₁ estimation, image samples are further categorized based on acquisition position. In-plume (InP) images are characterized by smoke occupying most of the field of view, whereas out-of-plume (OutP) images are characterized by partial smoke coverage and more complex scene backgrounds. InP images provide a more reliable training dataset for PM estimation, whereas OutP images represent more realistic scenarios. Table 1 summarizes the main characteristics of InP and OutP images. The three subsets are described in detail below.

2.1.1. Smoke Detection Dataset

The FIREX-AQ Detection (FIREX-AQ-D) dataset consists of smoke images randomly extracted from flight segments prior to plume penetration to ensure diverse and representative smoke appearances across different fire events. All images were manually annotated using LabelImg (available at https://github.com/tzutalin/labelImg, accessed on 15 September 2025), with smoke categories and bounding boxes saved in YOLO format (Figure S1). The initial dataset comprised 1504 smoke images containing visible smoke regions and was expanded to 4512 images through a series of augmentation strategies, including random adjustments to brightness and contrast, blurring, horizontal flipping, and rotation. The bounding boxes of the smoke annotations were transformed accordingly to preserve spatial consistency. These procedures were intended to mitigate overfitting and improve the model’s adaptability to variations in illumination conditions, image quality, and viewing perspectives.
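To illustrate what transforming the bounding boxes "accordingly" involves, the following minimal sketch handles the horizontal-flip case for YOLO-format labels (class, x-center, y-center, width, height, all normalized to the image size). The function names and the toy image are hypothetical and are not taken from the authors' code; rotation would need a more involved box update and is not shown.

```python
from typing import List, Tuple

# A YOLO-format box: (class_id, x_center, y_center, width, height),
# with all coordinates normalized to [0, 1] by the image size.
YoloBox = Tuple[int, float, float, float, float]


def hflip_image(pixels: List[List[int]]) -> List[List[int]]:
    """Mirror an image (given as a list of pixel rows) left to right."""
    return [row[::-1] for row in pixels]


def hflip_boxes(boxes: List[YoloBox]) -> List[YoloBox]:
    """Mirror YOLO boxes so they match a horizontally flipped image.

    Only the x-center changes; y-center, width, and height are preserved.
    """
    return [(c, 1.0 - x, y, w, h) for (c, x, y, w, h) in boxes]


# Illustrative values only: one smoke box centered in the left half.
boxes = [(0, 0.25, 0.40, 0.30, 0.20)]
flipped = hflip_boxes(boxes)
```

Brightness, contrast, and blur changes leave the labels untouched, so only geometric augmentations need a paired box update of this kind.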

2.1.2. In-Plume PM Estimation Dataset

The FIREX-AQ in-plume (FIREX-AQ-InP) dataset consists of images captured as the aircraft traversed large wildfire smoke plumes, providing an internal plume perspective. As the aircraft advanced rapidly through smoke with visually varying characteristics, PM₁ concentrations were averaged over 10 s intervals following each image acquisition. The preprocessing and annotation procedure for InP data, including AMS-based temporal averaging and label assignment, is illustrated in Figure S2.
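The label assignment described above can be sketched as a windowed mean over the AMS time series. This is a minimal illustration under stated assumptions: the window is taken as half-open, starting at the image timestamp, and the function name, the data layout, and the synthetic series are hypothetical rather than the authors' implementation.

```python
from typing import Optional, Sequence, Tuple


def mean_pm1_after(
    image_time_s: float,
    ams_samples: Sequence[Tuple[float, float]],
    window_s: float = 10.0,
) -> Optional[float]:
    """Average PM1 over [image_time_s, image_time_s + window_s).

    ams_samples holds (timestamp_s, pm1) pairs. Returns None when no
    sample falls inside the window.
    """
    values = [
        pm1
        for t, pm1 in ams_samples
        if image_time_s <= t < image_time_s + window_s
    ]
    if not values:
        return None
    return sum(values) / len(values)


# Synthetic 1 Hz series (illustrative only): PM1 rises by 10 per second.
ams = [(float(t), 100.0 + 10.0 * t) for t in range(30)]
label = mean_pm1_after(5.0, ams)
```

Returning None for an empty window makes it easy to drop frames that have no matching AMS data instead of assigning them a spurious label.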

The dataset includes 8344 smoke images, each labeled with a PM₁ mass concentration obtained from the AMS measurements, ranging from 2 to 5249 µg/sm³. Since the aircraft was located inside the smoke plume during image capture, the resulting images exhibit relatively homogeneous smoke features, with smoke occupying most of the image frame (Figure S3).

To ensure coverage across the full concentration range, low-concentration images, including cases where smoke is barely visible (e.g., Figure S3a), are included to provide training data at the lower end of the distribution. These samples serve as boundary-constrained observations that improve model stability in low-PM₁ regimes and reduce extrapolation uncertainty near the lower limit of the concentration range. Although low-concentration cases exhibit weak visual smoke cues, they still preserve subtle aerosol-related patterns such as faint scattering and slight background attenuation. Rather than introducing ambiguity, these samples contribute to a more continuous and well-conditioned mapping between visual features and PM₁ concentration across the full dynamic range. This design is particularly important given the wide dynamic range of PM₁ concentrations in biomass burning plumes.

Such imaging conditions emphasize smoke-related visual cues and facilitate the learning of quantitative mappings between image features and PM₁ concentrations. This dataset was subsequently used to develop and optimize a convolutional neural network (CNN)-based PM₁ concentration estimation model, where image features are directly regressed to particle mass concentrations. Within this modeling framework, the FIREX-AQ-InP dataset serves as the source domain, providing a pre-trained foundation for subsequent cross-domain transfer learning.

2.1.3. Out-of-Plume PM Estimation Dataset

The FIREX-AQ out-of-plume (FIREX-AQ-OutP) dataset consists of panoramic images captured by low-altitude aircraft cruising outside the agricultural fire smoke plumes, offering an unobstructed perspective of smoke morphology and spatial distribution. These images were primarily captured within one minute prior to the aircraft’s penetration of the smoke plume, with snapshots taken at 10 s intervals (illustrated in Figure S4). Depending on the variability in smoke dimensions and the aircraft flight path, the number of images collected per smoke event ranged from two to five. In total, 996 OutP images were obtained. FIREX-AQ-OutP is similar to FIREX-AQ-D. However, the former was extracted at fixed time intervals ahead of plume interception, whereas the latter was randomly sampled from flight segments. As a result, a small subset of images (55 out of 996, ~5.5%) overlaps between the two datasets. Each OutP image is labeled with a PM₁ concentration, defined as the mean concentration measured during the subsequent plume penetration, ranging from 9 to 3729 µg/sm³. These labels represent integrated plume conditions along the flight path rather than pixel-level correspondence, introducing inherent spatial mismatch and uncertainty.

This limitation is not confined to the OutP dataset and also applies to the InP dataset. However, because InP images are sampled within wildfire plumes, their heterogeneity is expected to be lower than that of the full plume in the OutP dataset, where processes such as dilution and evolution exert a stronger influence. Accordingly, the estimation task for InP and OutP images is formulated as learning an approximate statistical mapping between plume appearance and in situ PM₁ observations, rather than a strictly spatially consistent retrieval.

Substantial differences exist between the FIREX-AQ-InP and FIREX-AQ-OutP datasets in terms of image feature distributions, smoke morphology, background complexity, and acquisition conditions. InP images capture an in-plume viewing perspective, with relatively homogeneous smoke appearance and limited background variability. In contrast, OutP images provide an external panoramic view with complex backgrounds, including sky, clouds, terrain, and multiple smoke sources (Figure S5), and show the complete spatial structure of smoke plumes under realistic monitoring conditions. This pronounced discrepancy introduces a clear domain shift between the two datasets, thereby establishing a suitable experimental setting for evaluating transfer learning strategies. In addition to concentration labels, the OutP dataset includes manually annotated ground-truth bounding boxes in YOLO format that define the spatial extent of smoke regions. These annotations are used as reference standards for evaluating smoke detection and ROI extraction performance, rather than as manual inputs during model inference. Collectively, the FIREX-AQ dataset enables the joint investigation of smoke detection and PM₁ concentration estimation under realistic and heterogeneous observation scenarios.

To construct training and evaluation subsets, images were split at the image level rather than the event level. Consequently, frames extracted from the same smoke event may appear in both training and testing sets. Given that the data originate from continuous airborne video sequences with temporally correlated sampling, this strategy may introduce sample dependencies. However, because of aircraft motion and the temporal spacing between frames, InP and OutP images vary substantially in viewpoint, plume morphology, and acquisition conditions, so each frame still provides non-identical visual information (Figures S2 and S4). This setup is intended to evaluate the model’s ability to learn robust feature–concentration mappings under realistic observational variability. In contrast, while an event-level splitting strategy reduces the risk of data leakage, it would also substantially decrease the number of available training samples given the limited number of independent plume events (~100) [38], thereby reducing data diversity and potentially hindering the model’s ability to learn stable feature–concentration relationships.
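For contrast with the image-level split used in this study, the event-level alternative discussed above can be sketched as follows. This is an illustration of the general technique only; the function name, the `(image_id, event_id)` layout, and the synthetic identifiers are assumptions, and the study itself did not split data this way.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def event_level_split(
    samples: Sequence[Tuple[str, str]],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Split (image_id, event_id) pairs so that no event spans both sets."""
    by_event: Dict[str, List[str]] = defaultdict(list)
    for image_id, event_id in samples:
        by_event[event_id].append(image_id)

    # Shuffle whole events, not individual frames.
    events = sorted(by_event)
    random.Random(seed).shuffle(events)
    n_test = max(1, round(len(events) * test_fraction))
    test_events = set(events[:n_test])

    train = [i for e in events if e not in test_events for i in by_event[e]]
    test = [i for e in events if e in test_events for i in by_event[e]]
    return train, test


# Synthetic example: 10 events with 4 frames each (illustrative only).
samples = [(f"img{e}_{k}", f"event{e}") for e in range(10) for k in range(4)]
train_ids, test_ids = event_level_split(samples)
```

Because whole events are held out, temporally correlated frames cannot leak across the split, at the cost of fewer independent training events, which is the trade-off noted in the text.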

2.2. Deep Learning-Based Framework

An integrated image-based DL framework is developed for smoke detection and particle mass concentration prediction. The workflow is organized into three sequential stages.

In Stage 1, YOLO is trained and optimized using the FIREX-AQ-D dataset to perform smoke detection.

In Stage 2, multiple CNN architectures are evaluated using the FIREX-AQ-InP dataset for smoke-related PM₁ estimation, and the optimal model is further refined to capture both intensity and spatial characteristics of smoke.

In Stage 3, the optimized models from the previous stages are integrated to enable smoke localization and PM₁ estimation in FIREX-AQ-OutP images. Transfer learning [45,46] is employed to adapt the estimation model from the InP domain to OutP images, accounting for differences in imaging conditions, viewing geometry, and scene complexity between datasets, while reducing the need for large, labeled samples in the target domain. This strategy enables robust cross-domain generalization and improves model applicability under real-world conditions.

During inference, smoke regions are first detected, and PM₁ concentrations are subsequently estimated from the detected ROIs. This three-stage design enables a structured workflow from smoke localization to quantitative PM₁ estimation and facilitates systematic evaluation of domain adaptation strategies.
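The detect-then-estimate inference flow described above can be sketched with placeholder components. The detector and regressor below are toy stand-ins (a brightness threshold and a scaled mean) chosen only so that the example runs; they are assumptions for illustration and do not represent YOLOv11 or the PM₁ estimation model.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]        # grayscale rows, stand-in for an RGB frame
Box = Tuple[int, int, int, int]  # pixel box: (x_min, y_min, x_max, y_max)


def crop(image: Image, box: Box) -> Image:
    """Cut the region of interest out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def estimate_pm1(
    image: Image,
    detect: Callable[[Image], List[Box]],
    regress: Callable[[Image], float],
) -> List[Tuple[Box, float]]:
    """Run detection first, then regress one value per detected ROI."""
    return [(box, regress(crop(image, box))) for box in detect(image)]


def toy_detect(image: Image) -> List[Box]:
    """Placeholder detector: one box around all pixels brighter than 0.5."""
    hits = [(x, y) for y, row in enumerate(image)
            for x, v in enumerate(row) if v > 0.5]
    if not hits:
        return []
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return [(min(xs), min(ys), max(xs) + 1, max(ys) + 1)]


def toy_regress(roi: Image) -> float:
    """Placeholder regressor: mean ROI brightness scaled to a fake value."""
    pixels = [v for row in roi for v in row]
    return 1000.0 * sum(pixels) / len(pixels)


frame = [[0.0] * 4 for _ in range(4)]
for y in (1, 2):
    for x in (1, 2):
        frame[y][x] = 1.0
results = estimate_pm1(frame, toy_detect, toy_regress)
```

An image with no detections yields an empty result, which mirrors why missed detections reduce the number of samples available for downstream estimation.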

2.2.1. Smoke Detection Model

YOLOv11-n (hereafter referred to as YOLOv11) is adopted as the smoke detection model [47]. It consists of three principal components: a backbone network for feature extraction, a neck module for multi-scale feature fusion, and a detection head for bounding box regression and classification [48]. This architecture achieves an effective balance between inference efficiency and detection accuracy, which is essential for near-real-time smoke monitoring. Detailed model configuration and training settings are provided in the Supplementary Materials (Text S1). The model is designed to localize smoke regions in OutP images and generate bounding boxes that define ROIs for the subsequent concentration estimation task. The FIREX-AQ-D dataset was split randomly into training, validation, and test sets with a ratio of 8:1:1. Detection performance was evaluated during both training and testing phases to ensure reliable smoke localization. Accurate ROI extraction at this stage is critical, as detection performance directly affects the accuracy of downstream PM concentration estimation in OutP images.
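As an illustration of how a detected bounding box defines an ROI, the sketch below converts a normalized YOLO-style box into clamped pixel coordinates for cropping. The function name and the example box are hypothetical; the 3840 × 2160 frame size is the camera resolution reported in Section 2.1, and the rounding and clamping choices are assumptions.

```python
from typing import Tuple


def yolo_to_pixel_box(
    x_center: float,
    y_center: float,
    width: float,
    height: float,
    image_w: int,
    image_h: int,
) -> Tuple[int, int, int, int]:
    """Convert a normalized YOLO box to a pixel box clamped to the image.

    Returns (x_min, y_min, x_max, y_max) in pixels.
    """
    x_min = round((x_center - width / 2) * image_w)
    y_min = round((y_center - height / 2) * image_h)
    x_max = round((x_center + width / 2) * image_w)
    y_max = round((y_center + height / 2) * image_h)
    # Clamp so that boxes touching the border never index outside the frame.
    return (
        max(0, x_min),
        max(0, y_min),
        min(image_w, x_max),
        min(image_h, y_max),
    )


# Illustrative box: centered, a quarter of the width and half of the height.
roi = yolo_to_pixel_box(0.5, 0.5, 0.25, 0.5, image_w=3840, image_h=2160)
```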

2.2.2. In-Plume PM Estimation Model

To establish a robust PM₁ mass concentration estimation model, systematic comparative experiments were conducted using the FIREX-AQ-InP dataset. Multiple representative CNN architectures were evaluated, including ResNet-18 [49], ResNet-CBAM [50], DenseNet-121 [51], MobileNet-V3 [52], EfficientNet-B0 [53], and ConvNeXt-Tiny [54]. These models span a range of network depths, parameter scales, and architectural design philosophies, enabling a comprehensive assessment of prediction capability under InP image conditions. The FIREX-AQ-InP dataset was partitioned into training, validation, and test sets using an 8:1:1 split. Each CNN-based model was trained to learn a regression-based mapping between smoke image features and corresponding PM₁ concentrations. Model performance was evaluated on both training and test sets, and the architecture demonstrating the best overall estimation performance was selected as the Smoke PM Estimation Model (SPEM). SPEM is implemented using an optimized EfficientNet-B0-based architecture (EfficientNet-B0-Smokehead), designed to capture both the intensity and spatial distribution characteristics of smoke relevant to PM retrieval and to serve as the pretrained foundation for subsequent transfer learning to OutP images. Detailed information on the training configuration of the PM₁ estimation model is provided in the Supplementary Materials (Text S2).

2.2.3. Unified Smoke Detection and Aerosol Estimation Framework (SDAF)

Building upon the smoke detection model (Section 2.2.1) and the PM estimation model (Section 2.2.2), a Unified Smoke Detection and Aerosol Estimation Framework (SDAF) is developed to integrate these components into a coherent framework for OutP images. The SDAF enables automated processing from smoke region identification to quantitative PM₁ estimation within a three-stage processing pipeline.

A pronounced domain shift exists between the InP and OutP datasets due to differences in viewing geometry, background complexity, and smoke appearance. To address this challenge, the in-plume pretrained Smoke PM Estimation Model (hereafter referred to as InP-SPEM) is adapted to OutP images via transfer learning. All strategies initialize the model using pretrained InP-SPEM weights. The fine-tuning strategies (M3 and M4) then perform full-parameter fine-tuning, allowing complete adaptation of feature representations to the target domain while preserving low-level visual features. Four transfer learning strategies are proposed to systematically evaluate the role of domain adaptation under this cross-domain scenario, as detailed below.

M1 (OutP full images) applies InP-SPEM to full OutP images without any domain adaptation. The pretrained model is used for forward inference on OutP images, and performance metrics are computed to assess the feasibility of direct transfer.
M2 (OutP ROIs) employs InP-SPEM, without fine-tuning, to estimate concentration from ROIs extracted from OutP images, rather than from the entire images. This approach is designed to evaluate the predictive benefits of incorporating ROI-based features.
M3 (OutP full-image fine-tuning) performs fine-tuning of InP-SPEM on OutP raw images. The model is initialized from pretrained InP-SPEM weights and trained on the OutP dataset with an 8:2 train-validation split. A reduced learning rate is adopted to ensure stable adaptation while preserving pretrained representations. The model with the highest validation R² is selected. This setting evaluates global domain adaptation under full-scene context.
M4 (OutP ROI-level fine-tuning) integrates YOLOv11-based ROI extraction with an ROI-level fine-tuning strategy. Both training and testing are conducted on ROI-cropped samples to ensure consistency between training and inference distributions. The model is also initialized from pretrained InP-SPEM weights and undergoes full-parameter fine-tuning, enabling joint adaptation of the feature extraction and regression components under localized smoke regions.

In summary, these strategies provide a progressive evaluation of domain adaptation: M1 serves as the baseline, M2 improves input feature representation through ROI extraction, M3 enhances model adaptability via full-image fine-tuning, and M4 integrates both approaches to achieve synergistic optimization of feature representation and model parameters (Figure 1).
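The four strategies differ along two axes: the input (full image or detected ROI) and whether InP-SPEM is fine-tuned on OutP data. The sketch below simply encodes that 2 × 2 design as data; the class and field names are hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Strategy:
    name: str
    input_type: str   # "full" image or detector-cropped "roi"
    fine_tuned: bool  # whether InP-SPEM weights are updated on OutP data


STRATEGIES: Tuple[Strategy, ...] = (
    Strategy("M1", "full", False),  # direct transfer baseline
    Strategy("M2", "roi", False),   # ROI input, no adaptation
    Strategy("M3", "full", True),   # full-image fine-tuning
    Strategy("M4", "roi", True),    # ROI-level fine-tuning
)


def needs_detector(strategy: Strategy) -> bool:
    """ROI-based strategies depend on the smoke detector's bounding boxes."""
    return strategy.input_type == "roi"


roi_strategies = [s.name for s in STRATEGIES if needs_detector(s)]
adapted_strategies = [s.name for s in STRATEGIES if s.fine_tuned]
```

Laying the design out this way makes the pairwise comparisons explicit: M1 against M2 isolates the effect of ROI extraction, and M1 against M3 isolates the effect of fine-tuning.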

2.3. Model Evaluation Metrics

2.3.1. Evaluation Metrics for Smoke Detection

The performance of the YOLOv11 smoke detection model was evaluated using standard object detection metrics, including precision, recall, and mean Average Precision (mAP) [55,56]. These metrics collectively characterize the trade-off between detection accuracy and completeness. Precision quantifies the proportion of correctly identified smoke regions among all regions predicted as smoke, thereby reflecting the reliability of positive detections, and is calculated as shown in Equation (S1) in the Supplementary Materials. Recall quantifies the proportion of actual smoke regions that are correctly detected, thereby indicating the model’s sensitivity to smoke presence (Equation (S2)).

To evaluate detection performance, mAP is adopted as the primary metric, defined as the mean of Average Precision (AP) across categories, where AP corresponds to the area under the precision–recall curve. Higher mAP values indicate better detection performance across varying recall levels. As mAP can be computed under different intersection-over-union (IoU) thresholds, this study employs mAP@0.5 (mAP50), in which an IoU threshold of 0.5 is used to define true positive (TP) detections. This metric is widely used in smoke detection studies and provides a balanced assessment of localization accuracy and detection robustness.
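The quantities behind these detection metrics can be sketched in a few lines. The code uses the standard definitions of IoU, precision (TP / (TP + FP)), and recall (TP / (TP + FN)); the exact forms of Equations (S1) and (S2) are in the Supplementary Materials and are assumed here to match. Function names and example boxes are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """A prediction counts as a TP when its IoU meets the threshold."""
    return iou(pred, truth) >= threshold


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall from detection counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Illustrative boxes: the prediction covers the top half of the truth box.
overlap = iou((0.0, 0.0, 4.0, 4.0), (0.0, 0.0, 4.0, 2.0))
```

AP is then the area under the precision–recall curve obtained by sweeping the detection confidence threshold, which this sketch does not implement.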

2.3.2. Evaluation Metrics for PM Estimation

The performance of PM concentration estimation models was assessed using four complementary regression metrics: regression slope, the coefficient of determination (R²), root mean squared error (RMSE), and mean absolute error (MAE) [57,58]. Together, these metrics provide a comprehensive assessment of estimation accuracy, error stability, and robustness across different concentration ranges. R² evaluates the proportion of variance in observed PM₁ concentrations that can be explained by the model. RMSE reflects the overall magnitude of estimation error and assigns greater weight to larger deviations, making it sensitive to extreme errors. In contrast, MAE represents the average absolute difference between predicted and measured concentrations and provides a more robust indication of typical model error, as it is less influenced by outliers. These metrics are calculated as shown in Equations (S3)–(S5).
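A minimal sketch of the four regression metrics follows. Two points are assumptions, since Equations (S3)–(S5) are not reproduced in the main text: R² is computed as 1 − SS_res/SS_tot (rather than as a squared correlation), and the slope is the ordinary least-squares slope of predicted against observed values. The function name and the sample values are illustrative only.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    observed: Sequence[float], predicted: Sequence[float]
) -> Dict[str, float]:
    """Slope, R2, RMSE, and MAE for paired observed/predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    residuals = [p - o for o, p in zip(observed, predicted)]

    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    cov = sum(
        (o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)
    )
    return {
        "slope": cov / ss_tot,            # OLS slope of predicted on observed
        "r2": 1.0 - ss_res / ss_tot,      # coefficient of determination
        "rmse": math.sqrt(ss_res / n),    # penalizes large errors more
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Illustrative values only (not from the study).
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 37.0]
metrics = regression_metrics(obs, pred)
```

In this toy case the larger residuals (±3) raise RMSE above MAE, which is the sensitivity to extreme errors described in the text.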

2.4. Model Training Hardware

All experiments were conducted using the PyTorch DL framework (version 2.9.0) under Python (version 3.11.13), with CUDA (version 12.8) acceleration enabled. Model training and evaluation were conducted on a workstation equipped with an NVIDIA GeForce RTX 5060 Laptop GPU and an AMD Ryzen 9 8945HX processor with Radeon Graphics. The FIREX-AQ-D dataset was used exclusively for training and evaluating the smoke detection model, whereas the FIREX-AQ-InP and FIREX-AQ-OutP datasets were utilized for PM₁ concentration estimation and transfer learning experiments. This configuration ensured consistency between dataset design and task-specific modeling objectives.

3. Results and Discussion

This section presents the optimization and systematic evaluation of the proposed framework. First, we examine the smoke detection performance of the YOLOv11 model. Subsequently, we compare and optimize the PM₁ concentration estimation capabilities of multiple CNN architectures using InP images. Finally, we apply the unified SDAF to OutP images to assess its effectiveness in cross-domain PM₁ concentration estimation. Through progressive comparison and refinement, we identify an effective transfer learning strategy for image-level PM₁ estimation.

3.1. Smoke Detection

The YOLOv11 model achieves stable and robust smoke detection performance on the FIREX-AQ-D dataset (423/452, 93.6%). As shown in Figure S6, all training loss components, including box regression loss, classification loss, and distribution focal loss, decrease rapidly and subsequently converge. The corresponding validation losses follow a similar trend, indicating consistent learning behavior and no evident signs of overfitting.

Detection performance improves rapidly during early training and stabilizes at high levels, with both precision and recall reaching strong and consistent values. The steady increase and eventual plateau of mAP50, the mean average precision evaluated at an intersection-over-union (IoU) threshold of 0.5, further indicate balanced detection performance. The precision–recall curve (Figure S7) is smooth and concentrated toward the upper-right region, indicating high precision across a broad range of recall levels. The mAP50 reaches 0.932, confirming low rates of both missed detections and false positives and demonstrating reliable smoke localization. Such performance is comparable with recent YOLO-based smoke detection studies reporting mAP50 values of about 0.902 [59] and 0.981 [21]. However, these results are not strictly comparable due to differences in model architectures and dataset characteristics, including scene complexity and environmental conditions such as haze and cloud interference. Such performance provides a solid foundation for subsequent ROI extraction and downstream PM₁ concentration estimation.
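
The mAP50 metric counts a predicted box as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. The sketch below illustrates this matching criterion for axis-aligned boxes; the function names and the box coordinates are hypothetical and do not come from the detection pipeline used in this study.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (empty if the boxes do not overlap).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, truth_box, iou_threshold=0.5):
    """Matching rule behind mAP50: IoU at or above the threshold."""
    return box_iou(pred_box, truth_box) >= iou_threshold


# Hypothetical boxes in pixel coordinates.
truth = (0, 0, 10, 10)
loose_prediction = (5, 0, 15, 10)   # IoU = 50 / 150, below 0.5
tight_prediction = (2, 0, 12, 10)   # IoU = 80 / 120, above 0.5
```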

To further improve detection performance under OutP conditions, the pretrained YOLOv11 was fine-tuned on the FIREX-AQ-OutP dataset (Figure S8): 200 images were randomly selected from the OutP dataset for model training, while the remaining 796 images were used as an independent test set. The detection rate increased from 69.1% (696/996 images) for the baseline model to 77.8% (619/796 images) for the fine-tuned model, a nominal gain of 8.7 percentage points. Because the baseline model is evaluated on the full OutP dataset (996 images), whereas the fine-tuned model is evaluated on the held-out subset, the two detection rates are computed under different evaluation settings and should not be compared directly. With this caveat, the increase suggests that domain-specific fine-tuning enhances the model’s sensitivity to complex plume structures and its robustness to variable smoke appearances. By reducing false negatives, the fine-tuned model not only preserves more valid samples but also supplies more reliable ROIs for downstream PM concentration estimation, which is expected to improve the stability and accuracy of the overall framework.

3.2. In-Plume PM Estimation

Among the evaluated models, EfficientNet-B0 and MobileNet-V3 demonstrate the most balanced performance for PM₁ concentration estimation using InP images (Figure 2). During training (Figure 2a), they achieve the highest coefficients of determination (R² = 0.855 and 0.828, respectively), indicating superior fitting capability, whereas ConvNeXt-Tiny triggers early stopping at an earlier stage, reflecting limited fitting efficiency.

On the test set (Figure 2b), EfficientNet-B0 and MobileNet-V3 also achieve the highest R² values, with slopes closest to unity (1.013 and 0.921, respectively), indicating reliable predictions with minimal bias. Among them, EfficientNet-B0 further exhibits lower RMSE and MAE, reflecting reduced estimation error and improved overall accuracy. In contrast, ConvNeXt-Tiny shows weaker generalization with lower R² and higher RMSE and MAE. ResNet-CBAM and DenseNet-121 achieve relatively low MAE but display slightly lower R², indicating weaker correlation. Overall, EfficientNet-B0 provides the most balanced performance in terms of correlation strength and error magnitude on the test set.

These results indicate that lightweight architectures with efficient feature extraction can effectively capture the dominant visual characteristics of smoke under relatively homogeneous conditions.

However, further analysis suggests that PM₁ estimation from smoke images is not solely governed by global intensity features but is also strongly influenced by the spatial distribution and variability of smoke. Standard architectures relying on global average pooling (GAP) summarize feature maps by averaging across spatial dimensions, thereby discarding information on spatial feature distribution and potentially obscuring smoke spatial heterogeneity. In addition, optimization based solely on mean squared error does not explicitly constrain correlation strength or scale consistency, which may introduce systematic bias in regression outputs.
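
A toy example makes the limitation of GAP concrete. The two hypothetical single-channel feature maps below have the same spatial mean, so GAP cannot tell them apart, whereas a spread statistic such as the standard deviation separates the uniform map from the patchy one. The values are invented for illustration, and the use of the population standard deviation is an assumption.

```python
import math


def mean_and_std_pool(feature_map):
    """Return (mean, population standard deviation) of a 2-D feature map."""
    values = [v for row in feature_map for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


# Hypothetical activations: same average intensity, different spatial spread.
uniform_plume = [[0.5, 0.5], [0.5, 0.5]]
patchy_plume = [[1.0, 0.0], [0.0, 1.0]]

uniform_mean, uniform_std = mean_and_std_pool(uniform_plume)
patchy_mean, patchy_std = mean_and_std_pool(patchy_plume)
```

Both maps pool to a mean of 0.5, but their standard deviations are 0.0 and 0.5, respectively, so concatenating the two statistics preserves information that averaging alone discards.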

Accordingly, the architecture of SPEM described in Section 2.2.2 was designed to address these limitations. Specifically, the EfficientNet-B0-Smokehead architecture is developed to explicitly incorporate both intensity and spatial variability into the estimation process. This architecture incorporates three targeted enhancements, as detailed below. Firstly, at the feature aggregation stage, a dual-aggregation strategy combining GAP and standard deviation pooling was employed to capture both global intensity and spatial variability. Secondly, the regression head was redesigned as a two-layer structure incorporating Linear, LayerNorm, Gaussian Error Linear Unit (GELU) [60], and Dropout operations, thereby improving regression stability and representational capacity. Thirdly, a composite loss function, RatioCorrLoss, was introduced to jointly optimize mean squared error, correlation, and scale ratio, enabling simultaneous improvement in accuracy, correlation strength, and scale consistency. Details of the loss function design are provided in the Supplementary Information (Text S3).
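
To illustrate how a loss can combine the three objectives named above, the following sketch adds a mean squared error term, a correlation term (1 − Pearson r), and a scale-ratio term that penalizes a mismatch between the spread of predictions and targets. This is not the RatioCorrLoss defined in Text S3: the form of each term, the choice of the standard-deviation ratio as the scale measure, the weights, and the function name are all assumptions made for illustration.

```python
import math


def ratio_corr_loss_sketch(pred, target, w_mse=1.0, w_corr=1.0,
                           w_ratio=1.0, eps=1e-8):
    """Hypothetical composite loss: MSE + (1 - Pearson r) + scale-ratio term."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred) / n)
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in target) / n)
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target)) / n
    # Correlation term: zero when predictions and targets are perfectly aligned.
    corr_term = 1.0 - cov / (std_p * std_t + eps)
    # Assumed scale term: zero when the two spreads match.
    ratio_term = (std_p / (std_t + eps) - 1.0) ** 2
    return w_mse * mse + w_corr * corr_term + w_ratio * ratio_term


# Illustrative values only: perfectly correlated, but doubled in scale.
loss_doubled = ratio_corr_loss_sketch([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
```

In this example the correlation term is essentially zero, so the loss is driven by the MSE and scale-ratio terms. Those two terms target the scale consistency that a correlation objective alone would leave unconstrained.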

These modifications lead to substantial improvements in estimation performance relative to the baseline model (Figure 3): R² increases from 0.880 to 0.985, RMSE declines from 299.14 to 104.07 µg/sm³, and MAE decreases from 181.41 to 53.71 µg/sm³, indicating a substantial reduction in both overall deviation and typical prediction error magnitude. It should be noted that the reported improvement reflects the combined effect of multiple architectural and training modifications introduced as an integrated design, rather than the contribution of any single component. The absence of a dedicated ablation study limits the ability to attribute performance gains to individual components.

A limitation of the image extraction strategy is that the dataset was not split at the wildfire-event level, potentially introducing data leakage. However, the risk is expected to be limited because smoke visual characteristics evolve continuously during aircraft passage through wildfire plumes over sampling periods lasting several minutes (Figure S2). Moreover, plume-level splitting within individual wildfire events is impractical in this study, as it would substantially reduce the number of available images while failing to represent the inherent spatial heterogeneity of smoke plumes.

Given that the PM₁ labels represent plume conditions sampled along the aircraft flight path rather than spatially explicit two-dimensional image observations, the achieved performance is notable and suggests that the model successfully captures meaningful statistical relationships between smoke appearance and PM₁ concentration.

Collectively, these results demonstrate that incorporating spatial variability and multi-objective optimization is essential for accurately capturing the relationship between smoke appearance and PM₁ concentration under controlled, homogeneous conditions. The optimized model therefore provides a robust foundation for subsequent PM₁ concentration estimation tasks using OutP images.

3.3. Out-of-Plume PM Estimation

Transfer learning plays a decisive role in cross-domain PM₁ estimation performance (Figure 4). Strategies incorporating parameter adaptation (M3 and M4) consistently outperform those without adaptation (M1 and M2), demonstrating that domain alignment is essential for reliable generalization from InP to OutP images (Figure 5).

Methods without parameter adaptation (M1 and M2) fail to generalize to OutP data, as evidenced by negative R² values. Unlike ordinary least-squares regression evaluated on training data, negative R² values on independent test datasets indicate that prediction errors exceed those of a simple mean-value predictor, reflecting poor generalization under substantial domain shift. This failure reflects substantial discrepancies in feature distribution, smoke morphology, and background complexity between the source and target domains. Although ROI-based input refinement in M2 provides moderate improvement (R² increase of 0.24, with reduced RMSE and MAE), persistent bias in medium- to high-concentration ranges indicates that input-level optimization alone is insufficient to resolve domain mismatch.
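
A short numerical example shows how a negative R² can arise on independent test data when R² is computed as 1 − SS_res/SS_tot. The values below are invented: the hypothetical model ranks the samples correctly but carries a constant offset, so its squared error exceeds that of a predictor that always returns the observed mean.

```python
def r_squared(observed, predicted):
    """Coefficient of determination computed as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not data from this study).
observed = [100.0, 200.0, 300.0]
mean_predictor = [200.0, 200.0, 200.0]   # always predicts the observed mean
biased_model = [400.0, 500.0, 600.0]     # correct ordering, constant +300 offset

r2_mean = r_squared(observed, mean_predictor)
r2_biased = r_squared(observed, biased_model)
```

Here the mean predictor yields R² = 0, while the offset model yields R² = −12.5 even though its predictions are perfectly correlated with the observations.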

In contrast, strategies with parameter adaptation (M3 and M4) substantially improve estimation performance, confirming the critical role of model fine-tuning in mitigating domain shift. M3 achieves the highest predictive accuracy (slope = 0.814, R² = 0.696), demonstrating that aligning model parameters with the target domain is highly effective. In terms of error magnitude, M3 yields reduced RMSE (303.08 µg/sm³) and MAE (211.03 µg/sm³), indicating enhanced predictive accuracy and robustness, reflected in improved control of both extreme deviations and mean errors relative to non-adapted strategies. Under the current evaluation setting, M3 represents the most accurate regression strategy, but its reliance on full-image inputs limits practical applicability, as non-smoke regions dominate large portions of the OutP image and provide little relevant information for PM₁ estimation.

By integrating smoke detection with ROI-level fine-tuning, M4 provides a unified framework that jointly optimizes input representation and model parameters. Although its regression accuracy is slightly lower than that of M3, it offers a more practical and deployable solution by aligning both data representation and model learning with real-world observation conditions (Figure 6). In terms of computational efficiency, the inference time of the proposed three-stage framework is approximately 0.169 s per image (batch size = 1) on a consumer-grade personal computer equipped with an NVIDIA RTX 5060 GPU. The reported inference time includes both the smoke detection and PM₁ estimation stages, indicating that the method can support near-real-time processing for smoke monitoring applications. Overall, M3 prioritizes predictive accuracy under controlled evaluation conditions, whereas M4 prioritizes operational feasibility in practical deployment scenarios, where smoke regions must first be identified. This comparison underscores an inherent trade-off between optimal regression performance under idealized inputs and robustness under realistic, detection-constrained scenarios.

The reported estimation performance should be interpreted with caution because the training and testing subsets were constructed using image-level splitting rather than event-level separation. As a result, residual correlations may persist among samples acquired within the same smoke event, potentially leading to optimistic performance estimates. Although the continuously evolving visual characteristics of smoke during aircraft traversal reduce the likelihood of identical samples, further evaluation using event-level splitting and more diverse independent plume events will be necessary to more rigorously assess model generalization capability.
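
For reference, event-level separation can be sketched as follows: each image carries an event identifier, and all images from a given event are assigned to the same subset. This is an illustration of the alternative discussed above, not the splitting procedure used in this study; the function name, the event identifiers, and the test fraction are hypothetical.

```python
import random


def split_by_event(samples, test_fraction=0.2, seed=0):
    """Split (image_id, event_id) pairs so that no event spans both subsets."""
    events = sorted({event for _, event in samples})
    rng = random.Random(seed)
    rng.shuffle(events)
    # Hold out whole events; keep at least one event for testing.
    n_test = max(1, round(len(events) * test_fraction))
    test_events = set(events[:n_test])
    train = [s for s in samples if s[1] not in test_events]
    test = [s for s in samples if s[1] in test_events]
    return train, test


# Hypothetical dataset: 5 smoke events with 4 images each.
samples = [(f"img_{e}_{i}", f"event_{e}") for e in range(5) for i in range(4)]
train_set, test_set = split_by_event(samples)
```

With few events, holding out whole events leaves a small and less diverse test set, which is the practical constraint noted above.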

Compared with M2 and M3, M4 uniquely combines ROI-based input constraint with full-parameter fine-tuning. Unlike M2, which refines input representation without model updating, and M3, which performs global adaptation on full images, M4 restricts learning to smoke-relevant regions while updating all model parameters. This enables M4 to reduce background interference while maintaining adaptation capability, leading to improved spatial consistency of learned features.
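
The ROI-based input constraint amounts to cropping each image to the detected smoke bounding box before it is passed to the estimation model. The sketch below shows such a crop on a nested-list image with clamping to the image bounds; the function name, the box convention (pixel coordinates with exclusive maxima), and the sample image are assumptions, and the preprocessing used in this study may differ.

```python
def crop_roi(image, box):
    """Crop a row-major image (list of rows) to box = (x_min, y_min, x_max, y_max)."""
    height = len(image)
    width = len(image[0]) if height else 0
    x_min, y_min, x_max, y_max = (int(round(v)) for v in box)
    # Clamp the detected box to the image bounds.
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    if x_min >= x_max or y_min >= y_max:
        return []  # box lies entirely outside the image
    return [row[x_min:x_max] for row in image[y_min:y_max]]


# Hypothetical 4 x 4 image whose pixel value encodes (row, column) as 10*r + c.
image = [[10 * r + c for c in range(4)] for r in range(4)]
roi = crop_roi(image, (1, 1, 3, 3))
```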

Despite these improvements, estimation errors remain strongly ROI-dependent (Figure S9). Underestimation typically occurs when the aircraft traverses highly heterogeneous plume cores or when dense plume regions are only partially captured within the detected ROIs, causing the image representation to be diluted by adjacent low-intensity regions. Conversely, overestimation tends to arise when visually prominent plume structures are not fully representative of the aerosol conditions sampled along the aircraft flight path, particularly for optically thin or peripheral smoke regions. The extreme-case examples presented in Figure S9 illustrate representative failure cases, providing direct empirical evidence that concentration-dependent errors are closely linked to spatial inconsistencies between detected ROIs and the plume regions sampled along the aircraft trajectory.

These patterns reveal a key limitation of image-based estimation, arising from the difficulty of aligning two-dimensional visual observations with one-dimensional sampling paths within heterogeneous plume structures. In addition, variations in viewing geometry, plume morphology, and illumination conditions introduce substantial variability in image appearance that is not uniquely related to PM₁ concentration. Even with ROI-based processing, the extracted regions remain only an approximation of the true sampled plume segment, and uncertainties in detection boundaries and plume structure can lead to spatial misalignment.

Furthermore, the relationship between visual features and PM₁ concentration is inherently nonlinear and influenced by aerosol optical properties, including particle size distribution, composition, and scattering effects. Label uncertainty arising from flight-path-based measurements and potential temporal mismatch also contributes to estimation error. As a result, improvements in detection accuracy do not necessarily translate into proportional gains in estimation accuracy, particularly for extreme cases.

These findings collectively point to a fundamental challenge: interpreting image-based PM₁ estimates requires accounting for inherent geometric and sampling inconsistencies. The problem is ill-posed by nature, as one-dimensional measurements are used to approximate two-dimensional visual information for heterogeneous three-dimensional plume structures.

4. Summary

Biomass burning is a major and persistent source of atmospheric pollution, exerting significant impacts on air quality, climate, and ecosystems. Developing scalable and cost-effective monitoring approaches is therefore critical. In this study, we propose an image-based DL framework that integrates smoke detection and PM₁ concentration estimation from aerial imagery, enabling a sequential inference from smoke localization to quantitative mass estimation. The learned mapping between image features and PM₁ concentration is broadly consistent with plume behavior, as regions with higher optical thickness and more spatially concentrated smoke structures are generally associated with increased aerosol loading and enhanced light extinction.

The optimized CNN model, InP-SPEM, demonstrates high accuracy in estimating PM₁ concentrations under InP conditions (slope = 0.969, R² = 0.985, RMSE = 104.47 µg/sm³, MAE = 53.71 µg/sm³). This improvement arises from the combined effect of multiple architectural and training modifications designed to enhance feature representation and regression stability, rather than a single design factor. However, significant performance degradation is observed when directly applied to OutP images, highlighting the impact of domain shift. Incorporating ROI extraction with transfer learning effectively mitigates this issue, and comparative evaluation indicates that domain adaptation and input consistency are critical for cross-domain generalization, with fine-tuning substantially improving performance.

The integrated approach (M4), combining YOLO-based smoke detection with ROI-level fine-tuning, demonstrates relatively stable performance under the current evaluation setting and provides a practical integrated workflow for cross-domain PM₁ estimation (slope = 0.645, R² = 0.621). Although the alternative M3 approach yields higher statistical accuracy using full-scene OutP images, its applicability is fundamentally limited because non-smoke regions dominate large portions of the input and provide limited information for plume-specific concentration estimation. As a result, M4 represents the more practical and deployable solution under real-world observation conditions, highlighting a trade-off between optimal regression performance under idealized inputs and robustness under detection-constrained scenarios.

Despite these advances, several limitations remain that warrant further consideration. Image-based PM estimation is intrinsically ill-posed, as the mapping between two-dimensional visual features and one-dimensional particle mass measurements is non-unique and influenced by plume heterogeneity, aerosol properties, viewing geometry, illumination conditions, and background complexity. Consequently, PM₁ estimation from InP and OutP images is inherently approximate, as the labels are derived from flight-path-based measurements rather than direct spatial correspondence with image content. Additional uncertainty arises from the use of flight-path-averaged PM₁ labels, for which temporal averaging and misalignment between image acquisition and in situ sampling can degrade label quality and affect model performance. This mismatch between images and PM₁ labels imposes a fundamental limitation on the framework, whereby the model learns a statistical relationship between plume appearance and PM₁ concentration rather than performing a spatially explicit or pixel-aligned retrieval. Moreover, the evaluation relies on a single dataset, reflecting the scarcity of benchmark datasets that pair in situ PM measurements with smoke imagery, thereby limiting assessment across diverse fire conditions. Because the datasets were partitioned at the image rather than event level, related samples from the same smoke event may occur in both training and testing subsets. Consequently, the reported performance may not fully represent independent event-level predictive capability. The framework is also restricted to submicron particles and has not yet been extended to other particulate fractions, notably in scenarios where coarse-mode particles contribute substantially to total mass.

Addressing these limitations requires the integration of physical understanding with data-driven approaches, the development of larger and more diverse datasets, and improvements in detection robustness. Future work will also explore event-level data splitting and validation using more independent plume events to enable more statistically rigorous assessment of model generalization, although this will require substantially larger datasets. Further efforts include extending the framework to multiple particulate metrics, incorporating low-light, nighttime, and ground-based observations, and conducting sensitivity analyses to better quantify the influence of key factors such as plume density, image resolution, and ROI selection. Validation across diverse geographic regions and fire environments is also needed to assess generalizability. Extending the framework to other pollutants is valuable but would require approaches that can establish relationships between image features and visually faint air pollutants.

More broadly, this work provides a scalable and practical pathway for image-based air quality monitoring, advancing biomass burning observation from proof-of-concept toward operational deployment. The proposed framework highlights both the potential and the fundamental challenges of inferring particulate concentrations from visual data, and establishes a foundation for future developments in real-time smoke monitoring, emission estimation, and data-driven atmospheric observation systems.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/su18105138/s1, Text S1: YOLOv11 smoke detection model configuration; Text S2: CNN-based PM₁ estimation model configuration; Text S3: Loss Function Design for EfficientNet-B0-Smokehead (RatioCorrLoss); Figure S1: Representative examples from the FIREX-AQ-D dataset; Figure S2: Examples of data preprocessing and annotation for the FIREX-AQ-InP dataset; Figure S3: Representative InP images from the FIREX-AQ-InP dataset; Figure S4: Examples of data preprocessing and annotation for the FIREX-AQ-OutP dataset; Figure S5: Representative OutP images from the FIREX-AQ-OutP dataset; Figure S6: Training and validation performance of YOLOv11 in terms of loss components and evaluation metrics; Figure S7: Precision–recall (PR) curve of YOLOv11 on the FIREX-AQ-D dataset; Figure S8: Comparison of smoke detection performance between the baseline and fine-tuned YOLOv11 model on the FIREX-AQ-OutP dataset; Figure S9: Extreme cases from the FIREX-AQ-OutP dataset evaluated using M4; Equations (S1)–(S5): Evaluation Metric Formulas.

Author Contributions

Conceptualization, H.G.; methodology, P.L. and H.G.; data curation, P.L. and H.G.; writing—original draft preparation, P.L.; writing—review and editing, P.L. and H.G.; funding acquisition: H.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China [grant number 21EAA01082], the Guangzhou Municipal Science and Technology Bureau Basic and Applied Basic Research Foundation (Young Doctor “Qihang” Program) [grant number 2024A04J459], and the Sun Yat-sen University Fundamental Research Fund [grant number 23hytd015]. Work by H.G. at the University of Colorado Boulder as part of the CU HR-AMS team was supported by NASA grants [grant numbers 80NSSC18K0630, 80NSSC19K0124, and 80NSSC21K1451].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

FIREX-AQ data are publicly available at the NASA LaRC archive (https://doi.org/10.5067/SUBORBITAL/FIREXAQ2019/DATA001, accessed on 15 September 2025). The processed fire plume data used in this study are available from the corresponding author upon reasonable request.

Acknowledgments

We gratefully acknowledge the FIREX-AQ crews, pilots, and support staff for making the field measurements possible. We further extend our sincere appreciation to the CU HR-AMS team, particularly Pedro Campuzano-Jost and Jose L. Jimenez, for acquiring and processing the AMS data used in this study, as well as for their valuable suggestions regarding the paper preparation and data analysis. During the preparation of this manuscript, the authors used ChatGPT (5.3) and DeepSeek (3.2) exclusively for linguistic improvements. The authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Yao, W.; Zhao, Y.; Chen, R.; Wang, M.; Song, W.; Yu, D. Emissions of Toxic Substances from Biomass Burning: A Review of Methods and Technical Influencing Factors. Processes 2023, 11, 853.
Johnston, H.J.; Mueller, W.; Steinle, S.; Vardoulakis, S.; Tantrakarnapa, K.; Loh, M.; Cherrie, J.W. How Harmful Is Particulate Matter Emitted from Biomass Burning? A Thailand Perspective. Curr. Pollut. Rep. 2019, 5, 353–377.
Andreae, M.O. Emission of trace gases and aerosols from biomass burning—An updated assessment. Atmos. Chem. Phys. 2019, 19, 8523–8546.
Zhang, Q.; Wang, Y.; Xiao, Q.; Geng, G.; Davis, S.J.; Liu, X.; Yang, J.; Liu, J.; Huang, W.; He, C.; et al. Long-range PM₂.₅ pollution and health impacts from the 2023 Canadian wildfires. Nature 2025, 645, 672–678.
Ford, B.; Val Martin, M.; Zelasky, S.E.; Fischer, E.V.; Anenberg, S.C.; Heald, C.L.; Pierce, J.R. Future Fire Impacts on Smoke Concentrations, Visibility, and Health in the Contiguous United States. GeoHealth 2018, 2, 229–247.
Brown, H.; Liu, X.; Pokhrel, R.; Murphy, S.; Lu, Z.; Saleh, R.; Mielonen, T.; Kokkola, H.; Bergman, T.; Myhre, G.; et al. Biomass burning aerosols in most climate models are too absorbing. Nat. Commun. 2021, 12, 277.
Burke, M.; Childs, M.L.; de la Cuesta, B.; Qiu, M.; Li, J.; Gould, C.F.; Heft-Neal, S.; Wara, M. The contribution of wildfire to PM₂.₅ trends in the USA. Nature 2023, 622, 761–766.
Zheng, Y.; Han, Z.; Wu, J.; Li, J. Spatial Distribution and Temporal Variation of Biomass Burning Emissions and Intercomparison of Biomass Burning Emission Inventories over China. Clim. Environ. Res. 2023, 28, 495–508.
Cheng, Y.; Yu, Q.; Liu, J.; Cao, X.; Zhong, Y.; Du, Z.; Liang, L.; Geng, G.; Ma, W.; Qi, H.; et al. Dramatic changes in Harbin aerosol during 2018–2020: The roles of open burning policy and secondary aerosol formation. Atmos. Chem. Phys. 2021, 21, 15199–15211.
Van Der Werf, G.R.; Randerson, J.T.; Giglio, L.; van Leeuwen, T.T.; Chen, Y.; Rogers, B.M.; Mu, M.; van Marle, M.J.E.; Morton, D.C.; Collatz, G.J.; et al. Global fire emissions estimates during 1997–2016. Earth Syst. Sci. Data 2017, 9, 697–720.
Park, G.; Lee, Y. Wildfire Smoke Detection Enhanced by Image Augmentation with StyleGAN2-ADA for YOLOv8 and RT-DETR Models. Fire 2024, 7, 369.
Wiggins, E.B.; Soja, A.J.; Gargulinski, E.; Halliday, H.S.; Pierce, R.B.; Schmidt, C.C.; Nowak, J.B.; DiGangi, J.P.; Diskin, G.S.; Katich, J.M.; et al. High Temporal Resolution Satellite Observations of Fire Radiative Power Reveal Link Between Fire Behavior and Aerosol and Gas Emissions. Geophys. Res. Lett. 2020, 47, e2020GL090707.
Chen, Y.; Morton, D.C.; Randerson, J.T. Remote sensing for wildfire monitoring: Insights into burned area, emissions, and fire dynamics. One Earth 2024, 7, 1022–1028.
Schroeder, W.; Oliva, P.; Giglio, L.; Quayle, B.; Lorenz, E.; Morelli, F. Active fire detection using Landsat-8/OLI data. Remote Sens. Environ. 2016, 185, 210–220.
Moore, R.H.; Wiggins, E.B.; Ahern, A.T.; Zimmerman, S.; Montgomery, L.; Campuzano Jost, P.; Robinson, C.E.; Ziemba, L.D.; Winstead, E.L.; Anderson, B.E.; et al. Sizing response of the Ultra-High Sensitivity Aerosol Spectrometer (UHSAS) and Laser Aerosol Spectrometer (LAS) to changes in submicron aerosol composition and refractive index. Atmos. Meas. Tech. 2021, 14, 4517–4542.
Li, T.; Shen, H.; Yuan, Q.; Zhang, X.; Zhang, L. Estimating Ground-Level PM₂.₅ by Fusing Satellite and Station Observations: A Geo-Intelligent Deep Learning Approach. Geophys. Res. Lett. 2017, 44, 11985–11993.
Van Donkelaar, A.; Martin, R.V.; Brauer, M.; Kahn, R.; Levy, R.; Verduzco, C.; Villeneuve, P.J. Global Estimates of Ambient Fine Particulate Matter Concentrations from Satellite-Based Aerosol Optical Depth: Development and Application. Environ. Health Perspect. 2010, 118, 847–855.
Sanford, T.J.; Murphy, D.M.; Thomson, D.S.; Fox, R.W. Albedo Measurements and Optical Sizing of Single Aerosol Particles. Aerosol Sci. Technol. 2008, 42, 958–969.
Saide, P.E.; Thapa, L.H.; Ye, X.; Pagonis, D.; Campuzano-Jost, P.; Guo, H.; Schuneman, M.L.; Jimenez, J.L.; Moore, R.; Wiggins, E.; et al. Understanding the Evolution of Smoke Mass Extinction Efficiency Using Field Campaign Measurements. Geophys. Res. Lett. 2022, 49, e2022GL099175.
Di, Q.; Amini, H.; Shi, L.; Kloog, I.; Silvern, R.; Kelly, J.; Sabath, M.B.; Choirat, C.; Koutrakis, P.; Lyapustin, A.; et al. An ensemble-based model of PM₂.₅ concentration across the contiguous United States with high spatiotemporal resolution. Environ. Int. 2019, 130, 104909.
Gonçalves, L.A.O.; Ghali, R.; Akhloufi, M.A. YOLO-Based Models for Smoke and Wildfire Detection in Ground and Aerial Images. Fire 2024, 7, 140.
Ding, Y.; Wang, M.; Fu, Y.; Wang, Q. Forest Smoke-Fire Net (FSF Net): A Wildfire Smoke Detection Model That Combines MODIS Remote Sensing Images with Regional Dynamic Brightness Temperature Thresholds. Forests 2024, 15, 839.
Yang, H.; Wang, J.; Wang, J. Efficient Detection of Forest Fire Smoke in UAV Aerial Imagery Based on an Improved Yolov5 Model and Transfer Learning. Remote Sens. 2023, 15, 5527.
Hong, R.; Wang, X.; Fang, Y.; Wang, H.; Wang, C.; Wang, H. Yolo-Light: Remote Straw-Burning Smoke Detection Based on Depthwise Separable Convolution and Channel Attention Mechanisms. Appl. Sci. 2023, 13, 5690.
Bahhar, C.; Ksibi, A.; Ayadi, M.; Jamjoom, M.M.; Ullah, Z.; Soufiene, B.O.; Sakli, H. Wildfire and Smoke Detection Using Staged YOLO Model and Ensemble CNN. Electronics 2023, 12, 228.
Zheng, X.; Chen, F.; Lou, L.; Cheng, P.; Huang, Y. Real-Time Detection of Full-Scale Forest Fire Smoke Based on Deep Convolution Neural Network. Remote Sens. 2022, 14, 536.
Pan, J.; Ou, X.; Xu, L. A Collaborative Region Detection and Grading Framework for Forest Fire Smoke Using Weakly Supervised Fine Segmentation and Lightweight Faster-RCNN. Forests 2021, 12, 768.
Sharma, A.; Kumar, V.; Longchamps, L. Comparative performance of YOLOv8, YOLOv9, YOLOv10, YOLOv11 and Faster R-CNN models for detection of multiple weed species. Smart Agric. Technol. 2024, 9, 100648.
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
Chakma, A.; Vizena, B.; Cao, T.; Lin, J.; Zhang, J. Image-based air quality analysis using deep convolutional neural network. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3949–3952.
Sharma, G.; Khurana, S.; Saina, N.; Shivansh; Gupta, G. Comparative Analysis of Machine Learning Techniques in Air Quality Index (AQI) prediction in smart cities. Int. J. Syst. Assur. Eng. Manag. 2024, 15, 3060–3075.
Zhang, Q.; Fu, F.; Tian, R. A deep learning and image-based model for air quality estimation. Sci. Total Environ. 2020, 724, 138178.
Zimmerman, N.; Presto, A.A.; Kumar, S.P.N.; Gu, J.; Hauryliuk, A.; Robinson, E.S.; Robinson, A.L. A machine learning calibration model using random forests to improve sensor performance for lower-cost air quality monitoring. Atmos. Meas. Tech. 2018, 11, 291–313.
Wang, W.; Men, C.; Lu, W. Online prediction model based on support vector machine. Neurocomputing 2008, 71, 550–558.
Zhang, X.; Xia, Y.; Zhang, C.; Liu, B.; Wang, C.; Fang, H.; Wang, J. A prediction model for dyke-dam piping based on data augmentation and interpretable ensemble learning. Eng. Fail. Anal. 2025, 182, 110174.
Suh, K.-S.; Min, B.-I.; Yang, B.-M.; Kim, S.; Park, K.; Kim, J. Machine learning method using camera image patterns for predictions of particulate matter concentrations. Atmos. Pollut. Res. 2022, 13, 101325.
Liu, Y.; Nie, J.; Li, X.; Ahmed, S.H.; Lim, W.Y.B.; Miao, C. Federated Learning in the Sky: Aerial-Ground Air Quality Sensing Framework with UAV Swarms. IEEE Internet Things J. 2021, 8, 9827–9837.
Warneke, C.; Schwarz, J.P.; Dibb, J.; Kalashnikova, O.; Frost, G.; Al-Saad, J.; Brown, S.S.; Brewer, W.A.; Soja, A.; Seidel, F.C.; et al. Fire Influence on Regional to Global Environments and Air Quality (FIREX-AQ). J. Geophys. Res. Atmos. 2023, 128, e2022JD037758.
Pagonis, D.; Campuzano-Jost, P.; Guo, H.; Day, D.A.; Schueneman, M.K.; Brown, W.L.; Nault, B.A.; Stark, H.; Siemens, K.; Laskin, A.; et al. Airborne extractive electrospray mass spectrometry measurements of the chemical composition of organic aerosol. Atmos. Meas. Tech. 2021, 14, 1545–1559.
Pagonis, D.; Selimovic, V.; Campuzano-Jost, P.; Guo, H.; Day, D.A.; Schueneman, M.K.; Nault, B.A.; Coggon, M.M.; DiGangi, J.P.; Diskin, G.S.; et al. Impact of Biomass Burning Organic Aerosol Volatility on Smoke Concentrations Downwind of Fires. Environ. Sci. Technol. 2023, 57, 17011–17021.
Dollner, M.; Gasteiger, J.; Schöberl, M.; Gattringer, A.; Beres, N.D.; Bui, T.P.; Diskin, G.; Weinzierl, B. The Cloud Indicator: A novel algorithm for automatic detection and classification of clouds using airborne in situ observations. Atmos. Res. 2024, 308, 107504.
June, N.A.; Hodshire, A.L.; Wiggins, E.B.; Winstead, E.L.; Robinson, C.E.; Thornhill, K.L.; Sanchez, K.J.; Moore, R.H.; Pagonis, D.; Guo, H.; et al. Aerosol size distribution changes in FIREX-AQ biomass burning plumes: The impact of plume concentration on coagulation and OA condensation/evaporation. Atmos. Chem. Phys. 2022, 22, 12803–12825.
Reid, J.S.; Koppmann, R.; Eck, T.F.; Eleuterio, D.P. A review of biomass burning emissions part II: Intensive physical properties of biomass burning particles. Atmos. Chem. Phys. 2005, 5, 799–825.
Adachi, K.; Dibb, J.E.; Scheuer, E.; Katich, J.M.; Schwarz, J.P.; Perring, A.E.; Mediavilla, B.; Guo, H.; Campuzano-Jost, P.; Jimenez, J.L.; et al. Fine Ash-Bearing Particles as a Major Aerosol Component in Biomass Burning Smoke. J. Geophys. Res. Atmos. 2022, 127, e2021JD035657.
Hosna, A.; Merry, E.; Gyalmo, J.; Alom, Z.; Aung, Z.; Azim, M.A. Transfer learning: A friendly introduction. J. Big Data 2022, 9, 102.
Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
Ultralytics. YOLO11 NEW. Available online: https://docs.ultralytics.com/models/yolo11 (accessed on 15 September 2025).
Terven, J.; Córdova-Esparza, D.-M.; Romero-González, J.-A. A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. arXiv 2016, arXiv:1608.06993. [Google Scholar]
Howard, A.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.-C.; Tan, M.; Chu, G.; Vasudevan, V.; Zhu, Y.; Pang, R. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946. [Google Scholar]
Liu, Z.; Mao, H.; Wu, C.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. arXiv 2022, arXiv:2201.03545. [Google Scholar] [CrossRef]
Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2009, 88, 303–338. [Google Scholar] [CrossRef]
Liu, L.; Ouyang, W.; Wang, X.; Fieguth, P.; Chen, J.; Liu, X.; Pietikäinen, M. Deep Learning for Generic Object Detection: A Survey. Int. J. Comput. Vis. 2019, 128, 261–318. [Google Scholar] [CrossRef]
Willmott, C.; Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 2005, 30, 79–82. [Google Scholar] [CrossRef]
Chicco, D.; Warrens, M.J.; Jurman, G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 2021, 7, e623. [Google Scholar] [CrossRef] [PubMed]
Sun, L.; Li, Y.; Hu, T. ForestFireDetector: Expanding Channel Depth for Fine-Grained Feature Learning in Forest Fire Smoke Detection. Forests 2023, 14, 2157. [Google Scholar] [CrossRef]
Hendrycks, D.; Gimpel, K. Gaussian Error Linear Units (GELUs). arXiv 2016, arXiv:1606.08415. [Google Scholar]

Figure 1. Unified Smoke Detection and Aerosol Estimation Framework (SDAF). The framework integrates smoke detection and particulate matter (PM) mass concentration retrieval. A YOLO-based detection module first identifies and delineates smoke regions in out-of-plume (OutP; sampled outside plumes; partial smoke coverage) images. Subsequently, a convolutional neural network (CNN) optimized on in-plume (InP; sampled inside plumes; high smoke coverage) images estimates PM from the detected regions. Within this unified framework, various training and inference strategies are explored to assess the effects of region of interest (ROI)-based processing, domain adaptation, and training-testing consistency on prediction performance.

Figure 2. Performance comparison of six CNN-based PM₁ concentration estimation models on InP images. (a) training-stage evolution of R² across epochs (one epoch stands for a complete pass through the entire training dataset); (b) testing-stage comparison of slope, R², RMSE, and MAE.
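The three error metrics named in this caption can be computed from paired measured and predicted PM₁ values. The sketch below is illustrative only: the function name `regression_metrics` is hypothetical, and R² is taken here as the coefficient of determination, which is an assumption because the caption does not state which definition the authors used.

```python
import numpy as np


def regression_metrics(measured, predicted):
    """Return R^2, RMSE, and MAE for paired measured/predicted values.

    Assumption: R^2 is the coefficient of determination (1 - SS_res / SS_tot),
    not the squared Pearson correlation.
    """
    y = np.asarray(measured, dtype=float)
    p = np.asarray(predicted, dtype=float)
    if y.shape != p.shape or y.size == 0:
        raise ValueError("inputs must be non-empty and of equal shape")

    residuals = y - p
    rmse = float(np.sqrt(np.mean(residuals ** 2)))  # root mean square error
    mae = float(np.mean(np.abs(residuals)))         # mean absolute error

    ss_res = float(np.sum(residuals ** 2))          # residual sum of squares
    ss_tot = float(np.sum((y - y.mean()) ** 2))     # total sum of squares
    if ss_tot == 0.0:
        raise ValueError("measured values are constant; R^2 is undefined")
    r2 = 1.0 - ss_res / ss_tot

    return {"r2": r2, "rmse": rmse, "mae": mae}
```

RMSE weights large errors more heavily than MAE, so reporting both gives a rough sense of how much of the total error comes from a few high-concentration samples.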

Figure 3. Orthogonal distance regression (ODR) fits between predicted and measured PM₁ mass concentrations for InP images. Results are presented for two model configurations: (a) the baseline EfficientNet-B0 and (b) the optimized EfficientNet-B0, selected as the In-plume pretrained Smoke Particle Estimation Model (InP-SPEM). For each scatter plot, a donut chart (a pie chart with concentric rings) displays the frequency distributions of the measured (inner ring) and predicted (outer ring) PM₁ concentrations, with colors representing the selected concentration bins.
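An orthogonal distance regression line minimizes perpendicular distances to the fitted line, so errors in both the measured and the predicted values are treated symmetrically. The sketch below uses the closed-form solution for a straight line with equal error variance in both variables. That equal-variance setting and the function name `orthogonal_fit` are assumptions for illustration; the caption does not specify the software or weighting the authors used.

```python
import numpy as np


def orthogonal_fit(x, y):
    """Fit y = slope * x + intercept by minimizing perpendicular distances.

    Assumption: both variables carry errors of equal variance, which gives
    the closed-form total-least-squares solution used below.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.size < 2:
        raise ValueError("need at least two paired points")

    x_mean, y_mean = x.mean(), y.mean()
    sxx = np.sum((x - x_mean) ** 2)             # spread of x
    syy = np.sum((y - y_mean) ** 2)             # spread of y
    sxy = np.sum((x - x_mean) * (y - y_mean))   # co-variation of x and y
    if sxy == 0.0:
        raise ValueError("x and y are uncorrelated; slope is undefined")

    # Closed-form slope of the orthogonal regression line.
    slope = (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    # The fitted line passes through the centroid of the data.
    intercept = y_mean - slope * x_mean
    return float(slope), float(intercept)
```

With measured PM₁ as `x` and predicted PM₁ as `y`, a slope near 1 and an intercept near 0 would indicate little systematic bias in the predictions.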

Figure 4. Performance comparison of four transfer learning strategies (designated M1, M2, M3, and M4) for PM₁ concentration estimation. M1 is the baseline. M2 refines inputs via ROI extraction. M3 adapts via full-image fine-tuning. M4 combines both for synergistic optimization.

Figure 5. Comparison of four transfer learning strategies (M1–M4) for PM₁ concentration estimation on OutP images. Shown are linear-scale (a–d) and corresponding log-scale (e–h) scatter plots. In each plot, the red line represents the ODR fit, and the green error bars indicate the standard deviation (SD) of the measured PM₁. The orange histograms and blue smoothed curves in the upper and right marginal panels show the frequency distributions of the measured and predicted PM₁, respectively.

Figure 6. Representative PM₁ concentration estimation results obtained using M4 within the SDAF. Each panel shows the detected smoke region (delineated by a blue bounding box) alongside the corresponding measured (mean ± SD) and predicted PM₁ concentrations for an OutP image.

Table 1. Key characteristics of in-plume (InP) and out-of-plume (OutP) images.

Characteristic	In-Plume (InP)	Out-of-Plume (OutP)
Acquisition position	Inside plume	Outside plume
Smoke coverage	High	Partial
Smoke morphology	Homogeneous	Heterogeneous
Fire type	Wildfires	Agricultural fires
Background	Minimal/obscured	Complex
Viewing geometry	Consistent	Variable (multi-angle)
