Improving UAV-Based Detection of Low-Emission Smoke with an Advanced Dataset Generation Pipeline
Abstract
1. Introduction
1.1. Air Pollution and Its Impact on Public Health
1.2. UAV-Based Detection of Low-Emission Sources
1.3. Advances in Vision-Based Smoke Detection
1.3.1. Rule-Based and Classical Approaches to Smoke Detection
1.3.2. Deep Learning Approaches to Smoke Detection
2. Materials and Methods
2.1. Overview of Low-Emission Smoke Detection
2.2. Image Acquisition and Initial Dataset Preparation
- Data Collection Locations and Conditions
- Clear sky and overcast conditions—assessing visibility under different lighting;
- Temperature range from −10 °C to 15 °C—accounting for seasonal variations;
- Wind speeds up to approximately 10 m/s (gusts)—reflecting real-world variability while remaining within UAV operational limits;
- Snow-covered rooftops—included in the dataset to ensure the YOLO model can recognize rooftops under winter conditions.
- UAV Platforms and Flight Parameters
- DJI Mavic Air;
- DJI Mavic 2 Enterprise Dual—thermal imaging capabilities (not used in this work);
- DJI Air 2S—improved low-light performance;
- DJI Mini 4 Pro—lightweight, sub-250 g UAV for regulatory flexibility.
- Flight Parameters and Stability Considerations
2.2.1. Dataset Components
1. Input Dataset;
2. Test Dataset;
3. Qualitative Validation Data.
Dataset Availability
2.3. Automated Dataset Generation for Smoke Detection from Stationary UAV Sequences
- Motion Detection: Temporal changes between video frames are analyzed to identify regions of interest (ROIs) that may contain smoke. This step serves as the initial filter, capturing moving elements that could potentially correspond to chimney emissions.
- Rooftop Localization: A pre-trained YOLO model is used to constrain the search space, ensuring that only motion regions appearing above rooftops are considered as potential smoke emissions. This significantly reduces false positives caused by moving objects such as vehicles, pedestrians, or vegetation.
- Binary Classification (new in this study): The final refinement stage introduces a LightGBM-based classifier that differentiates true smoke regions from other moving elements. This additional step enhances dataset accuracy, particularly in challenging conditions where motion-based and rooftop-based filtering alone are insufficient.
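For orientation, the following is a toy end-to-end sketch of this three-stage cascade. Every function here is a simplified stand-in, not the paper's implementation: crude frame differencing replaces the full motion module, a geometric test replaces the YOLO rooftop detector, and a fixed score replaces the trained LightGBM classifier.

```python
import numpy as np

def motion_candidates(frame_stack, diff_thresh=40):
    """Stage 1 stand-in: crude frame differencing -> one bounding box."""
    diff = np.abs(frame_stack[-1].astype(int) - frame_stack[0].astype(int))
    ys, xs = np.where(diff.max(axis=-1) > diff_thresh)
    if xs.size == 0:
        return []
    return [(xs.min(), ys.min(), xs.max(), ys.max())]

def over_rooftop(box, rooftops, margin=20):
    """Stage 2 stand-in: keep boxes that overlap a rooftop horizontally
    and end at or above its upper edge (image y grows downward)."""
    x1, y1, x2, y2 = box
    return any(x1 < rx2 and rx1 < x2 and y2 <= ry1 + margin
               for rx1, ry1, rx2, ry2 in rooftops)

def is_smoke(box, score=0.9, threshold=0.5):
    """Stage 3 stand-in: a trained LightGBM model would supply the score."""
    return score >= threshold

frames = [np.zeros((480, 640, 3), np.uint8) for _ in range(2)]
frames[1][100:150, 200:260] = 120            # synthetic moving blob
rooftops = [(180, 160, 300, 240)]            # one rooftop box (x1, y1, x2, y2)
smoke = [b for b in motion_candidates(frames)
         if over_rooftop(b, rooftops) and is_smoke(b)]
print(smoke)                                 # the blob above the rooftop survives
```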
2.3.1. Motion Detection and Initial Region Segmentation
Preprocessing and Motion Mask Generation
- Noise Reduction: Gaussian blurring is applied to suppress high-frequency noise and reduce sensitivity to small, fast-moving elements (e.g., leaves, insects, or airborne debris). The kernel size was selected to balance noise suppression with the preservation of motion edges.
- Temporal Gradient Computation: Motion is detected by computing the absolute difference between the luminance values of frames separated by an interval of n frames.
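The equation referenced above did not survive extraction; a plausible reconstruction from the prose description (the notation is an assumption) is:

$$D_t(x, y) = \left| Y_{t+n}(x, y) - Y_t(x, y) \right|,$$

where $Y_t$ denotes the luminance channel of frame $t$ and $n$ is the frame interval.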
Thresholding and Morphological Processing
- Adaptive Thresholding: The temporal gradient images are binarized using adaptive thresholding, in which each pixel is compared against a threshold derived from its local neighborhood (see the code sketch below).
- Morphological Refinement and Contour Analysis: To improve the segmentation of motion regions:
  - Morphological operations (erosion and dilation) are applied using disk-shaped structuring elements to remove noise and fill small gaps;
  - Contour detection and filtering remove small or overly large regions. Adjacent contours are merged based on their proximity, and convex hulls are computed to ensure consistent motion masks.
Morphological operation parameters were chosen to match the expected sizes of smoke regions while preventing the excessive merging of independent detections. Empirical tuning ensured that the process was effective across different environmental conditions.
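A minimal OpenCV sketch of the thresholding and morphological refinement stage described above; all parameter values (blur kernel, block size, structuring-element size, area limits) are illustrative placeholders, not the empirically tuned values from the paper:

```python
import cv2
import numpy as np

def motion_mask(frame_a, frame_b, block=31, c=5, ksize=5):
    """Sketch of the thresholding/morphology stage; parameters are illustrative."""
    gray_a = cv2.GaussianBlur(cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY),
                              (ksize, ksize), 0)
    gray_b = cv2.GaussianBlur(cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY),
                              (ksize, ksize), 0)

    diff = cv2.absdiff(gray_a, gray_b)                    # temporal gradient
    binary = cv2.adaptiveThreshold(diff, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, block, -c)

    disk = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    binary = cv2.erode(binary, disk)                      # remove speckle noise
    binary = cv2.dilate(binary, disk)                     # fill small gaps

    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    hulls = [cv2.convexHull(cnt) for cnt in contours
             if 100 < cv2.contourArea(cnt) < 50_000]      # size filtering
    mask = np.zeros_like(binary)
    cv2.drawContours(mask, hulls, -1, 255, cv2.FILLED)
    return mask
```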
2.3.2. Rooftop Localization
2.3.3. Binary Classification of Smoke Regions
Motivation and Input Data
- RGB frames: Cropped fragments of the original stabilized video frames, providing detailed color and texture information. These images represent the detected motion regions and serve as the primary visual context for classification.
- Difference frames: Temporal gradients computed over a window of 10 frames, highlighting motion dynamics within the detected regions. These frames emphasize changes over time, making them useful for identifying smoke’s characteristic movement patterns.
- “Precise” masks: Binary masks obtained through morphological processing, representing refined motion regions. These masks help isolate detected motion regions, such as smoke or other moving objects like cars, from the background. This enables a more focused evaluation of features such as saturation or hue, reducing the influence of irrelevant areas.
- Convex contour masks: Binary masks representing convex hulls of detected motion regions, providing a smoothed and generalized approximation of object shapes. Similar to “precise” masks, these masks can be used to constrain feature extraction to relevant areas. Their simplified shape makes them particularly useful for reducing noise introduced by irregular or fragmented motion contours.
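To make the four per-region representations concrete, a minimal container sketch (the field names are ours, not the paper's):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class RegionSample:
    """The four inputs described above for one detected motion region."""
    rgb: np.ndarray           # cropped RGB patch from the stabilized frame
    diff: np.ndarray          # temporal gradient over a 10-frame window
    precise_mask: np.ndarray  # refined binary motion mask
    hull_mask: np.ndarray     # convex-hull approximation of the region shape
```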
Feature Selection: Initial Set of Features
- GLCM-Based Features: Gray-Level Co-occurrence Matrix (GLCM) features, such as contrast and correlation, were computed for multiple directions and distances, capturing textural properties from difference and spatial frames.
- Entropy: Shannon entropy, a measure of randomness or complexity, was calculated for both RGB and difference frames, capturing variations within the regions.
- Blur Effect: A Laplacian-based metric was used to quantify the blur of spatial (RGB) or temporal (difference) frames.
- Statistical Measures: Mean and standard deviation of pixel intensities were extracted from both RGB and difference frames to capture basic statistical properties of the regions.
- Color Properties: Mean hue and saturation values were derived from RGB frames, capturing the dominant color characteristics of the regions.
- Mask-Based Features: Binary masks derived from motion regions (“precise” and convex contour variants) were initially considered to constrain feature extraction to the areas of detected motion. However, preliminary tests showed no significant improvement in classification performance, and these features were not included in the final model.
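A sketch of how such a feature vector can be assembled with OpenCV and scikit-image; the exact GLCM distances and angles, and the ordering of features, are assumptions rather than the paper's configuration:

```python
import cv2
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from skimage.measure import shannon_entropy

def extract_features(rgb, diff):
    """rgb: BGR patch; diff: single-channel uint8 difference frame."""
    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
    feats = []

    # GLCM contrast/correlation on the difference frame (temporal texture).
    glcm = graycomatrix(diff, distances=[1, 3], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    feats += list(graycoprops(glcm, "contrast").ravel())
    feats += list(graycoprops(glcm, "correlation").ravel())

    # Shannon entropy of both representations.
    feats += [shannon_entropy(gray), shannon_entropy(diff)]

    # Laplacian variance as a blur metric.
    feats += [cv2.Laplacian(gray, cv2.CV_64F).var()]

    # Basic intensity statistics from both frames.
    feats += [gray.mean(), gray.std(), diff.mean(), diff.std()]

    # Mean hue and saturation from the RGB patch.
    hsv = cv2.cvtColor(rgb, cv2.COLOR_BGR2HSV)
    feats += [hsv[..., 0].mean(), hsv[..., 1].mean()]
    return np.array(feats)
```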
Classifier Selection
- Efficiency: LightGBM achieves fast training and inference times, making it suitable for iterative model development and potential real-time applications.
- Memory usage: The algorithm is optimized for low memory consumption, enabling scalability for larger datasets without excessive computational overhead.
- Accuracy: Gradient boosting is well suited for capturing complex feature interactions, leading to high predictive performance in tasks requiring nuanced decision-making.
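A minimal LightGBM training sketch reflecting these motivations; the synthetic data and the hyperparameters shown are illustrative only:

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; in the paper, X holds the per-region feature
# vectors and y the smoke / non-smoke labels.
X = np.random.rand(1000, 13)
y = np.random.randint(0, 2, 1000)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

clf = lgb.LGBMClassifier(objective="binary", n_estimators=500,
                         learning_rate=0.05)   # illustrative hyperparameters
clf.fit(X_tr, y_tr, eval_set=[(X_val, y_val)],
        callbacks=[lgb.early_stopping(50)])    # stop when validation stalls
scores = clf.predict_proba(X_val)[:, 1]        # probabilities for thresholding
```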
Threshold Optimization for Binary Classification
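Generically, such a decision threshold can be selected by sweeping candidate values over validation scores; the criterion shown below (an F-beta score weighting precision over recall) is an assumption, not necessarily the paper's:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

def pick_threshold(y_true, scores, beta=0.5):
    """Sweep thresholds and keep the one maximizing F-beta (beta < 1
    favors precision; the exact criterion used in the paper is assumed)."""
    best_t, best_f = 0.5, -1.0
    for t in np.linspace(0.05, 0.95, 19):
        pred = (scores >= t).astype(int)
        p = precision_score(y_true, pred, zero_division=0)
        r = recall_score(y_true, pred, zero_division=0)
        f = (1 + beta**2) * p * r / (beta**2 * p + r) if (p + r) else 0.0
        if f > best_f:
            best_t, best_f = t, f
    return best_t
```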
Feature Importance and Final Feature Selection
- SET1: All Features—Includes the complete set of features extracted from both RGB frames and difference frames. While this set achieves high accuracy (0.94) and precision (0.94), it also generates a relatively higher number of false positives (32).
- SET2: No Spatial GLCM—Excludes GLCM features derived from RGB frames, retaining only those from difference frames. Although precision slightly improves (0.94), accuracy remains almost unchanged (0.94) and the number of false positives increases slightly (36).
- SET3: Minimal Set Without GLCM—Removes all GLCM features, leading to a noticeable decline in accuracy (0.88) and precision (0.88), with false positives increasing to 46.
- SET4: Optimal Feature Set—Retains all features from RGB frames while removing spatial GLCM features and limiting GLCM features from difference frames to the two most impactful ones (contrast and correlation). This configuration achieves a balanced trade-off, maintaining accuracy (0.94) and precision (0.94) while slightly reducing the number of false positives (34).
2.4. Final Dataset Preparation
2.4.1. Evaluation Datasets
- Dataset 1: Baseline (Without Classifier)—This dataset corresponds to the configuration used in previous work [6], relying solely on motion detection and rooftop localization without the binary classifier. While the expanded input dataset includes new winter and oblique sequences, annotations were generated using the unrefined pipeline.
- Dataset 2: Optimized Pipeline (With Classifier, SET4)—This dataset incorporates the binary classifier with the optimized feature set (SET4) and the optimized decision threshold. The classifier enhances annotation quality by filtering out false positives, particularly non-smoke motion regions, significantly improving dataset reliability.
- Dataset 3: Classifier Without Rooftop Verification—This configuration applies the binary classifier to all detected motion regions, bypassing the rooftop localization step. The dataset serves to analyze the contribution of rooftop verification to dataset quality and model performance.
Dataset Statistics
2.5. YOLO Training Configuration
- Image Size: The input image resolution was set higher than typical defaults. This adjustment was necessary due to the small size of the smoke regions, as smaller input resolutions significantly degraded detection quality.
- Augmentations: Despite the high variability inherent to the training dataset, additional augmentations were applied. These included random rotations and scaling, both of which further improved the detector's performance (a training sketch follows this list).
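A minimal Ultralytics training sketch reflecting these settings; the dataset path and all numeric values (image size, epochs, rotation and scaling magnitudes) are assumed placeholders rather than the paper's tuned values:

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")          # nano variant, as discussed in the paper
model.train(
    data="smoke_dataset.yaml",      # hypothetical dataset config path
    imgsz=1024,                     # enlarged input for small smoke regions
    epochs=100,
    degrees=10.0,                   # random rotation range (assumed value)
    scale=0.2,                      # random scaling range (assumed value)
)
```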
3. Results
3.1. Evaluation Setup
- YOLO-Base—The baseline model, trained on Dataset 1, which relies solely on motion-based annotations without additional filtering. This dataset was generated using the original pipeline approach from [6] and serves as a reference point for evaluating the impact of the proposed modifications.
- YOLO-Class—The improved version, trained on Dataset 2, which incorporates a LightGBM-based classifier to filter out non-smoke objects and improve annotation quality. Motion regions are constrained to rooftops to focus on residential emissions, aligning with the intended application of the detector.
- YOLO-NoRoof—A classifier-based model trained on Dataset 3, where the rooftop constraint is removed, allowing the binary classifier to process all detected motion regions. This configuration enables the detection of a broader range of smoke sources, such as bonfires and industrial emissions.
3.2. Validation Using Automatically Annotated Datasets
3.3. Model Evaluation on an Independent Test Set
3.4. Qualitative Evaluation on the Test Set
3.5. Qualitative Validation on Video Sequences
3.5.1. Intermittent False Negatives in Video Sequences
3.5.2. Transient False Positives in Individual Frames
3.5.3. Detection of Distant Smoke in Oblique Views
3.5.4. Summary of Video-Based Evaluation
- Intermittent false negatives are primarily frame-dependent and can be mitigated through temporal interpolation (see the sketch after this list).
- Smoke detection in oblique views generally improves as the UAV approaches the source, though steep angles remain challenging.
- False positives are infrequent and typically isolated, making them straightforward to filter.
- Winter conditions present challenges but do not significantly degrade performance in most cases.
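A minimal sketch of the temporal interpolation mentioned above for bridging short detection dropouts; the maximum gap length and the data layout are illustrative assumptions:

```python
def fill_detection_gaps(tracks, max_gap=5):
    """Linearly interpolate a box across short gaps in per-frame detections.
    `tracks` maps frame index -> (x1, y1, x2, y2) or None."""
    frames = sorted(tracks)
    detected = [f for f in frames if tracks[f] is not None]
    for a, b in zip(detected, detected[1:]):
        if 1 < b - a <= max_gap:                 # short dropout between hits
            for f in range(a + 1, b):
                w = (f - a) / (b - a)
                tracks[f] = tuple((1 - w) * pa + w * pb
                                  for pa, pb in zip(tracks[a], tracks[b]))
    return tracks

# Example: detections missing at frames 2-3 are interpolated.
t = {0: (10, 10, 50, 50), 1: (12, 10, 52, 50), 2: None, 3: None,
     4: (18, 12, 58, 52)}
print(fill_detection_gaps(t)[2])
```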
4. Discussion
4.1. Impact of the New Data Generation Pipeline on Detection Performance
4.2. Challenges in Oblique Views and Winter Conditions
4.3. Real-Time Feasibility and Temporal Consistency
4.4. Limitations and Future Directions
4.5. Implications for UAV-Based Environmental Monitoring
4.6. Summary of Key Findings
- The classifier-based dataset refinement significantly improves precision while maintaining recall.
- Oblique views and winter conditions pose challenges but do not substantially degrade detection performance.
- Temporal inconsistencies, such as intermittent false positives and false negatives, can be mitigated through post-processing.
- YOLOv11-nano-based models can operate in real time on high-performance hardware, making UAV-based real-time applications feasible.
- Expanding the dataset to include additional smoke sources could further enhance the generalization of the YOLO-NoRoof model. However, nighttime detection would require a fundamentally different approach, potentially incorporating infrared or thermal imaging.
- UAV-based smoke detection shows strong potential for air quality monitoring and regulatory enforcement applications.
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AGL | Above Ground Level |
| BB | Bounding Box |
| CNN | Convolutional Neural Network |
| DWT | Discrete Wavelet Transform |
| ECC | Enhanced Correlation Coefficient Maximization |
| FN | False Negative |
| FP | False Positive |
| GMM | Gaussian Mixture Model |
| GLCM | Gray-Level Co-occurrence Matrix |
| IoU | Intersection over Union |
| LBP | Local Binary Patterns |
| mAP | mean Average Precision |
| NMS | Non-Maximum Suppression |
| OBB | Oriented Bounding Box |
| PM | Particulate Matter |
| RGB | Red, Green, Blue (color model) |
| ROI | Region of Interest |
| SDK | Software Development Kit |
| SoR | Smoke over Rooftop Index |
| SVM | Support Vector Machine |
| TP | True Positive |
| UAV | Unmanned Aerial Vehicle |
| WHO | World Health Organization |
| YOLO | You Only Look Once |
References
1. Ortiz, A.G.; Guerreiro, C.; Soares, J. EEA Report No 09/2020 (Air Quality in Europe 2020); Annual Report; The European Environment Agency: Copenhagen, Denmark, 2020.
2. Program PAS dla Czystego Powietrza w Polsce; Presentation; Polish Smog Alert (PAS): Cracow, Poland, 2020.
3. Bebkiewicz, K.; Chłopek, Z.; Chojnacka, K.; Doberska, A.; Kanafa, M.; Kargulewicz, I.; Olecka, A.; Rutkowski, J.; Walęzak, M.; Waśniewska, S.; et al. Krajowy Bilans Emisji SO2, NOX, CO, NH3, NMLZO, Pyłów, Metali Ciężkich i TZO za lata 1990–2019; Presentation; The National Centre for Emissions Management (KOBiZE): Warsaw, Poland, 2021.
4. Saydirasulovich, S.N.; Mukhiddinov, M.; Djuraev, O.; Abdusalomov, A.; Cho, Y.I. An Improved Wildfire Smoke Detection Based on YOLOv8 and UAV Images. Sensors 2023, 23, 8374.
5. Mukhiddinov, M.; Abdusalomov, A.B.; Cho, J. A Wildfire Smoke Detection System Using Unmanned Aerial Vehicle Images Based on the Optimized YOLOv5. Sensors 2022, 22, 9384.
6. Szczepański, M. Vision-Based Detection of Low-Emission Sources in Suburban Areas Using Unmanned Aerial Vehicles. Sensors 2023, 23, 2235.
7. Jiao, Z.; Zhang, Y.; Mu, L.; Xin, J.; Jiao, S.; Liu, H.; Liu, D. A YOLOv3-Based Learning Strategy for Real-Time UAV-Based Forest Fire Detection. In Proceedings of the 2020 Chinese Control and Decision Conference (CCDC), Hefei, China, 22–24 August 2020; pp. 4963–4967.
8. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), 18th International Conference, Munich, Germany, 5–9 October 2015; Part III; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
9. Xu, Z.; Xu, J. Automatic Fire Smoke Detection Based on Image Visual Features. In Proceedings of the International Conference on Computational Intelligence and Security Workshops (CISW 2007), Harbin, China, 15–19 December 2007; pp. 316–319.
10. Chunyu, Y.; Jun, F.; Jinjun, W.; Yongming, Z. Video Fire Smoke Detection Using Motion and Color Features. Fire Technol. 2010, 46, 651–663.
11. Yuan, F. A Fast Accumulative Motion Orientation Model Based on Integral Image for Video Smoke Detection. Pattern Recognit. Lett. 2008, 29, 925–932.
12. Calderara, S.; Piccinini, P.; Cucchiara, R. Smoke Detection in Video Surveillance: A MoG Model in the Wavelet Domain. In Proceedings of Computer Vision Systems, 6th International Conference (ICVS 2008), Santorini, Greece, 12–15 May 2008; Gasteratos, A., Vincze, M., Tsotsos, J.K., Eds.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 119–128.
13. Kolesov, I.; Karasev, P.; Tannenbaum, A.; Haber, E. Fire and Smoke Detection in Video with Optimal Mass Transport Based Optical Flow and Neural Networks. In Proceedings of the 2010 IEEE International Conference on Image Processing, Hong Kong, China, 26–29 September 2010; pp. 761–764.
14. Gubbi, J.; Marusic, S.; Palaniswami, M. Smoke Detection in Video Using Wavelets and Support Vector Machines. Fire Saf. J. 2009, 44, 1110–1115.
15. Olivares-Mercado, J.; Toscano-Medina, K.; Sánchez-Perez, G.; Hernandez-Suarez, A.; Perez-Meana, H.; Sandoval Orozco, A.L.; García Villalba, L.J. Early Fire Detection on Video Using LBP and Spread Ascending of Smoke. Sustainability 2019, 11, 3261.
16. Panchanathan, S.; Zhao, Y.; Zhou, Z.; Xu, M. Forest Fire Smoke Video Detection Using Spatiotemporal and Dynamic Texture Features. J. Electr. Comput. Eng. 2015, 2015, 706187.
17. Xu, G.; Zhang, Y.; Zhang, Q.; Lin, G.; Wang, J. Deep Domain Adaptation Based Video Smoke Detection Using Synthetic Smoke Images. Fire Saf. J. 2017, 93, 53–59.
18. Yuan, F. Video-Based Smoke Detection with Histogram Sequence of LBP and LBPV Pyramids. Fire Saf. J. 2011, 46, 132–139.
19. Favorskaya, M.; Pyataeva, A.; Popov, A. Verification of Smoke Detection in Video Sequences Based on Spatio-Temporal Local Binary Patterns. Procedia Comput. Sci. 2015, 60, 671–680.
20. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
21. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
22. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
23. Srinivas, K.; Dua, M. Fog Computing and Deep CNN Based Efficient Approach to Early Forest Fire Detection with Unmanned Aerial Vehicles. In Proceedings of Inventive Computation Technologies; Smys, S., Bestak, R., Rocha, Á., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 646–652.
24. Lee, W.; Kim, S.; Lee, Y.T.; Lee, H.W.; Choi, M. Deep Neural Networks for Wild Fire Detection with Unmanned Aerial Vehicle. In Proceedings of the 2017 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 8–10 January 2017; pp. 252–253.
25. Chen, Y.; Zhang, Y.; Xin, J.; Wang, G.; Mu, L.; Yi, Y.; Liu, H.; Liu, D. UAV Image-Based Forest Fire Detection Approach Using Convolutional Neural Network. In Proceedings of the 2019 14th IEEE Conference on Industrial Electronics and Applications (ICIEA), Xi'an, China, 19–21 June 2019; pp. 2118–2123.
26. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
27. Jiao, Z.; Zhang, Y.; Xin, J.; Mu, L.; Yi, Y.; Liu, H.; Liu, D. A Deep Learning Based Forest Fire Detection Approach Using UAV and YOLOv3. In Proceedings of the 2019 1st International Conference on Industrial Artificial Intelligence (IAI), Shenyang, China, 23–27 July 2019; pp. 1–5.
28. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
29. Alexandrov, D.; Pertseva, E.; Berman, I.; Pantiukhin, I.; Kapitonov, A. Analysis of Machine Learning Methods for Wildfire Security Monitoring with Unmanned Aerial Vehicles. In Proceedings of the 2019 24th Conference of Open Innovations Association (FRUCT), Moscow, Russia, 8–12 April 2019; pp. 3–9.
30. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 3146–3154.
31. Evangelidis, G.; Psarakis, E. Parametric Image Alignment Using Enhanced Correlation Coefficient Maximization. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 1858–1865.
32. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. arXiv 2022, arXiv:2207.02696.
33. Lazarevich, I.; Grimaldi, M.; Kumar, R.; Mitra, S.; Khan, S.; Sah, S. YOLOBench: Benchmarking Efficient Object Detectors on Embedded Systems. arXiv 2023, arXiv:2307.13901.
| Dataset | Training Frames | Validation Frames | Total Annotations |
|---|---|---|---|
| Without Classifier | 16,948 | 4238 | 37,801 |
| With Classifier (SET4) | 16,948 | 4238 | 23,548 |
| Classifier Without Rooftop Verification | 16,948 | 4238 | 28,139 |
| Model | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|
| YOLO-Base_n | 0.944 | 0.794 | 0.557 | 0.308 |
| YOLO-Class_n | 0.975 | 0.899 | 0.797 | 0.461 |
| YOLO-NoRoof_n | 0.967 | 0.826 | 0.713 | 0.405 |
| YOLO-Base_s | 0.930 | 0.822 | 0.603 | 0.342 |
| YOLO-Class_s | **0.980** | **0.912** | **0.821** | **0.490** |
| YOLO-NoRoof_s | 0.966 | 0.852 | 0.755 | 0.452 |

Bold values indicate the best-performing results.
| Model | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 | TP | FP | FN |
|---|---|---|---|---|---|---|---|
| YOLO-Base_n | 0.761 | 0.672 | 0.615 | 0.357 | 831 | 261 | 406 |
| YOLO-Class_n | 0.829 | 0.681 | 0.626 | 0.361 | 843 | 174 | 394 |
| YOLO-NoRoof_n | 0.776 | 0.671 | 0.612 | 0.362 | 835 | 241 | 409 |
| YOLO-Base_s | 0.778 | 0.717 | 0.609 | 0.378 | 896 | 256 | 353 |
| YOLO-Class_s | 0.820 | 0.681 | 0.633 | 0.394 | 850 | 186 | 399 |
| YOLO-NoRoof_s | 0.793 | 0.690 | 0.606 | 0.376 | 867 | 226 | 390 |
| Model | Precision | Recall | TP | FP | FN |
|---|---|---|---|---|---|
| YOLO-Base_n | 0.973 | 0.937 | 1062 | 30 | 72 |
| YOLO-Class_n | 0.994 | 0.939 | 1011 | 6 | 66 |
| YOLO-NoRoof_n | 0.983 | 0.932 | 1058 | 18 | 77 |
| YOLO-Base_s | 0.952 | 0.956 | 1097 | 55 | 50 |
| YOLO-Class_s | 0.986 | 0.929 | 1022 | 14 | 78 |
| YOLO-NoRoof_s | 0.982 | 0.941 | 1073 | 20 | 67 |
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).