Spatio-Temporal Data Model for Early Wildfire Detection
Abstract
1. Introduction


2. Related Work
3. Materials and Methods
1. Collecting data from the archive of the operational wildfire surveillance system;
2. Compilation of different multichannel image data models based on various spatial, temporal, and contextual information;
3. Training and evaluation of the standard YOLO architecture on the collected datasets in order to select the best-performing multichannel image data model;
4. Generating an extended dataset using augmentation techniques for the selected multichannel data model, as well as training and evaluation of the standard YOLO architecture on the extended dataset;
5. Evaluation of the trained algorithm on original high-resolution video sequences taken from the cameras of the wildfire surveillance system.
3.1. Dataset Collection
3.2. Data Models
3.2.1. Long- and Short-Term Memory Encoding
3.2.2. Short-Time Foreground Estimation
3.2.3. Distance Channel
1. RGB: original RGB image taken from the camera;
2. Temporal image: three-channel image consisting of the blue channel of the current frame and two channels representing short-term and long-term memory, as defined in Section 3.2.1;
3. RGB + Temporal: five-channel samples consisting of the original RGB image and two channels representing short-term and long-term memory;
4. RGB + Distance: four-channel samples containing the RGB image and the relative distance from the camera for each pixel;
5. RGB + Temporal + Distance: six-channel samples consisting of the original RGB image, short- and long-term memory, and the relative distance for each pixel;
6. RGB + Foreground: five-channel image consisting of the RGB channels and two additional channels defined in Section 3.2.2: (2) representing the difference between the current frame and the short-term background, and (5) representing long-term dynamic characteristics of the image;
7. All channels (RGB + Temp. + Dist. + Fgr.): eight-channel data samples consisting of the original RGB image and all additional channels defined above.
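
The channel composition of the seven data models can be sketched as a simple stacking step. This is a minimal illustration, not the authors' implementation: the channel names (`short_mem`, `long_mem`, `distance`, `fg_short`, `fg_long`), the channel order within each stack, and the random placeholder planes are assumptions made for the example; the actual channels are those defined in Sections 3.2.1 to 3.2.3.

```python
import numpy as np

# Channel layout per data model, following the list above.
# All names except R, G, B are hypothetical labels for this sketch.
RGB = ["R", "G", "B"]
DATA_MODELS = {
    "RGB": RGB,
    "Temporal": ["B", "short_mem", "long_mem"],
    "RGB+Temporal": RGB + ["short_mem", "long_mem"],
    "RGB+Distance": RGB + ["distance"],
    "RGB+Temporal+Distance": RGB + ["short_mem", "long_mem", "distance"],
    "RGB+Foreground": RGB + ["fg_short", "fg_long"],
    "All": RGB + ["short_mem", "long_mem", "distance", "fg_short", "fg_long"],
}


def compose_sample(channels, model):
    """Stack the single-channel planes of one data model into H x W x C."""
    planes = [channels[name] for name in DATA_MODELS[model]]
    return np.stack(planes, axis=-1)


# Placeholder planes (random values) standing in for real channel data.
height, width = 4, 6
rng = np.random.default_rng(0)
channels = {
    name: rng.random((height, width), dtype=np.float32)
    for name in DATA_MODELS["All"]
}

for model in DATA_MODELS:
    print(model, compose_sample(channels, model).shape)
```

Keeping the layout in a single dictionary makes the number of input channels for each model explicit, which is the value a detector's first convolutional layer would have to match.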
3.3. Data Model Evaluation
3.4. Multichannel YOLO Training on Augmented Dataset
1. No changes can be applied directly to channels based on the dynamic properties of the image, i.e., calculated from a sequence of successive images. All modifications can only be applied to the original RGB images retrieved from the camera.
2. All augmentation procedures applied to a sequence of images must be consistent with possible changes that could realistically occur between two consecutive frames. For example, any augmentation based on changes in image geometry must be applied to all images in the sequence in the same way and with the same parameters.
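
The two constraints above can be illustrated with a small numerical sketch. The `motion_channel` function below is a hypothetical stand-in for a dynamic channel (the absolute difference between the current frame and the mean of earlier frames); it is not the memory encoding of Section 3.2.1 and is used only to show why a geometric augmentation has to be applied to the RGB frames, identically across the sequence, before any dynamic channel is derived.

```python
import numpy as np


def flip_sequence(frames, flip):
    """Apply the same horizontal-flip decision to every RGB frame."""
    return [f[:, ::-1] if flip else f for f in frames]


def motion_channel(frames):
    """Stand-in dynamic channel: |current frame - mean of earlier frames|."""
    gray = np.stack([f.mean(axis=-1) for f in frames])
    return np.abs(gray[-1] - gray[:-1].mean(axis=0))


rng = np.random.default_rng(0)
frames = [rng.random((4, 6, 3)) for _ in range(5)]

# Augment the RGB frames first, then derive the dynamic channel.
consistent = motion_channel(flip_sequence(frames, flip=True))

# With one flip shared by all frames, the result is simply the mirrored
# version of the channel derived from the unmodified sequence.
reference = motion_channel(frames)[:, ::-1]

# Flipping only the last frame produces differences that do not
# correspond to any change in the scene.
inconsistent = motion_channel(frames[:-1] + [frames[-1][:, ::-1]])

print(np.allclose(consistent, reference))
print(np.allclose(inconsistent, reference))
```

In this sketch the consistently flipped sequence yields the same motion information as the original, only mirrored, whereas the inconsistently flipped one does not.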
1. Sequence transform: a set of transformations applied to the entire sequence. Transformations were applied to each frame in the sequence with exactly the same parameters for all transformations in the set.
2. Series transform: a set of transformations applied to all frames in a series, i.e., to a set of consecutive frames captured at one-second intervals during a single stop at a preset position. These transformations reflect changes that may occur over a period of approximately two minutes, which is the time it takes for the camera to return to the same preset position. Transformations were applied to each frame in the series with exactly the same set of parameters.
3. Frame transform: a set of transformations applied to each frame in the sequence with randomly generated parameters. These transformations correspond to changes that can occur within one second.
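
The three levels differ only in how often the transformation parameters are drawn. The following sketch shows one possible way to organise this sampling, assuming that a sequence consists of several series and each series of several frames. The function name `plan_augmentation`, the specific parameters (`flip`, `gain`, `jitter`), and their ranges are hypothetical and chosen only for illustration; which transformations the authors assign to each level is not specified here.

```python
import random


def plan_augmentation(series_lengths, seed=0):
    """Draw augmentation parameters at sequence, series, and frame level.

    series_lengths: number of frames in each series of one sequence.
    Returns a nested list: plan[series][frame] -> parameter dict.
    """
    rng = random.Random(seed)

    # Sequence level: drawn once, shared by every frame in the sequence.
    flip = rng.random() < 0.5

    plan = []
    for n_frames in series_lengths:
        # Series level: drawn once per series (one stop at a preset).
        gain = rng.uniform(0.8, 1.2)

        series_plan = []
        for _ in range(n_frames):
            # Frame level: drawn independently for every frame.
            jitter = rng.uniform(-0.02, 0.02)
            series_plan.append({"flip": flip, "gain": gain, "jitter": jitter})
        plan.append(series_plan)
    return plan


plan = plan_augmentation([3, 3])
for series_plan in plan:
    for params in series_plan:
        print(params)
```

Separating parameter sampling from the image operations keeps the consistency rules easy to check: every frame carries the same `flip`, frames within a series share one `gain`, and only `jitter` varies from frame to frame.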
3.5. Evaluation on High-Resolution Sequences
3.6. Evaluation Metrics
4. Results
4.1. Data Model Selection
4.2. Training on Augmented Data
4.3. Evaluation on Sequences
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest







| Data | Locations | Sequences | Smoke | No Smoke |
|---|---|---|---|---|
| Train | 48 | 234 | 147 | 87 |
| Validation | 18 | 53 | 29 | 24 |
| Test | 13 | 46 | 26 | 20 |
| Data | Samples | Smoke | No Smoke |
|---|---|---|---|
| Train (no augmentation) | 9077 | 5238 | 3839 |
| Train (augmented) | 74,502 | 37,251 | 37,251 |
| Validation | 1694 | 847 | 847 |
| Test | 1854 | 927 | 927 |
| Transformation | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Scale | 1.0 | 2.0 | 0.6 | 1.0 | 1.0 | 1.0 | 1.0 |
| Random scale | 0.0 | 0.0 | 0.0 | 0.2 | 0.2 | 0.2 | 0.2 |
| Horizontal flip | 1.0 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| Rotate | 0.2 | 0.2 | 0.2 | 1.0 | 0.2 | 0.2 | 0.2 |
| Perspective | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 1.0 | 0.2 |
| Optical distortion | 0.2 | 0.2 | 0.2 | 0.2 | 1.0 | 0.2 | 0.2 |
| HSV | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.6 |
| Random brightness | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 1.0 |
| Random gamma | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.6 |

| Data model | Num.Ch. | Nano | Small | Medium | Large |
|---|---|---|---|---|---|
| RGB | 3 | 0.38 | 0.38 | 0.39 | 0.42 |
| Temp. | 3 | 0.45 | 0.43 | 0.43 | 0.42 |
| RGB + T. | 5 | 0.44 | 0.43 | 0.46 | 0.45 |
| RGB + Dist. | 4 | 0.34 | 0.37 | 0.36 | 0.36 |
| RGB + TD | 6 | 0.45 | 0.43 | 0.43 | 0.43 |
| RGB + Fgr. | 5 | 0.42 | 0.44 | 0.41 | 0.43 |
| RGB + TFD | 8 | 0.43 | 0.44 | 0.46 | 0.45 |

| Data model | Num.Ch. | Nano | Small | Medium | Large |
|---|---|---|---|---|---|
| RGB | 3 | 0.310 | 0.323 | 0.330 | 0.345 |
| Temp. | 3 | 0.378 | 0.370 | 0.375 | 0.359 |
| RGB + T | 5 | 0.380 | 0.388 | 0.414 | 0.407 |
| RGB + Dist. | 4 | 0.263 | 0.290 | 0.279 | 0.280 |
| RGB + TD | 6 | 0.389 | 0.374 | 0.378 | 0.363 |
| RGB + Fgr. | 5 | 0.358 | 0.383 | 0.352 | 0.368 |
| RGB + TFD | 8 | 0.374 | 0.390 | 0.408 | 0.387 |

| Model | Dataset | Nano | Small | Medium | Large |
|---|---|---|---|---|---|
| YOLOv8 | Spatio-temporal 5D | 0.49 | 0.51 | 0.50 | 0.52 |
| YOLOv8 | RGB 3D | 0.41 | 0.44 | 0.46 | 0.46 |
| YOLOv11 | RGB 3D | 0.43 | 0.46 | 0.45 | 0.45 |
| YOLOv12 | RGB 3D | 0.46 | 0.47 | 0.44 | 0.46 |

| Model | Dataset | Nano | Small | Medium | Large |
|---|---|---|---|---|---|
| YOLOv8 | Spatio-temporal 5D | 0.46 | 0.47 | 0.47 | 0.49 |
| YOLOv8 | RGB 3D | 0.35 | 0.39 | 0.40 | 0.40 |
| YOLOv11 | RGB 3D | 0.36 | 0.42 | 0.40 | 0.40 |
| YOLOv12 | RGB 3D | 0.40 | 0.40 | 0.39 | 0.40 |

| Model size | Detected (TP) | Missed (FN) | Rate (%) | No Delay | Mean Delay | Max Delay |
|---|---|---|---|---|---|---|
| RGB-T 5C YOLOv8 | | | | | | |
| Nano | 54 | 32 | 62.8 | 44 | 74.4 | 210 |
| Small | 59 | 27 | 68.6 | 48 | 67.3 | 210 |
| Medium | 57 | 29 | 66.3 | 47 | 74.1 | 315 |
| Large | 57 | 29 | 66.3 | 50 | 46.3 | 210 |
| RGB 3C YOLOv8 | | | | | | |
| Nano | 56 | 30 | 65.1 | 41 | 77.5 | 315 |
| Small | 55 | 31 | 64.0 | 44 | 96.6 | 316 |
| Medium | 59 | 27 | 68.6 | 50 | 70.6 | 315 |
| Large | 51 | 35 | 59.3 | 42 | 105.6 | 316 |
| RGB 3C YOLOv11 | | | | | | |
| Nano | 56 | 30 | 65.1 | 44 | 105.4 | 316 |
| Small | 55 | 31 | 64.0 | 41 | 98.1 | 525 |
| Medium | 51 | 35 | 59.3 | 41 | 84.6 | 525 |
| Large | 53 | 33 | 61.6 | 46 | 45.6 | 105 |
| RGB 3C YOLOv12 | | | | | | |
| Nano | 53 | 33 | 61.6 | 44 | 36.2 | 110 |
| Small | 57 | 29 | 66.3 | 52 | 42.6 | 105 |
| Medium | 52 | 34 | 60.5 | 44 | 40.4 | 105 |
| Large | 54 | 32 | 62.8 | 46 | 27.0 | 105 |
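
The Rate (%) column in the table above appears to be the share of detected events among all events, i.e., TP / (TP + FN) expressed as a percentage. The short check below reproduces the values for the RGB-T 5C YOLOv8 rows under that assumption; the function name `detection_rate` is introduced here only for illustration.

```python
def detection_rate(tp, fn):
    """Percentage of events detected: 100 * TP / (TP + FN)."""
    return 100.0 * tp / (tp + fn)


# (TP, FN) pairs copied from the RGB-T 5C YOLOv8 rows of the table.
rows = {
    "Nano": (54, 32),
    "Small": (59, 27),
    "Medium": (57, 29),
    "Large": (57, 29),
}

for size, (tp, fn) in rows.items():
    print(f"{size}: {detection_rate(tp, fn):.1f}%")
```

Each row sums to 86 events (TP + FN), so the rates are directly comparable across model sizes and architectures.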
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Krstinić, D.; Bejo, J.; Sikora, T.; Bugarić, M. Spatio-Temporal Data Model for Early Wildfire Detection. Fire 2026, 9, 175. https://doi.org/10.3390/fire9040175

