An Assessment of the Multi-Input Spatiotemporal RF–XGBoost Hybrid Framework for PM10 Estimation in Lithuania
Abstract
1. Introduction
2. Materials and Methods
2.1. Air Quality in Lithuania
2.2. Datasets
2.2.1. Ground Monitoring Stations
2.2.2. Meteorological Data
2.2.3. Satellite Atmospheric Data
2.2.4. Static Station and Environment Data
2.2.5. MODIS LST Data
2.2.6. Temporal Features
2.3. Method
2.4. Accuracy Assessment
3. Results
3.1. Model Performance Comparison
3.2. Temporal Performance Evaluation
3.3. Station-Level Performance
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| PM | Particulate Matter |
| RF | Random Forest |
| XGBoost | Extreme Gradient Boosting |
| TROPOMI | Tropospheric Monitoring Instrument |
| NO2 | Nitrogen Dioxide |
| CO | Carbon Monoxide |
| SO2 | Sulfur Dioxide |
| O3 | Ozone |
| HCHO | Formaldehyde |
| AI | Aerosol Index |
| MODIS | Moderate-Resolution Imaging Spectroradiometer |
| NDVI | Normalized Difference Vegetation Index |
| R² | Coefficient of Determination |
| MAE | Mean Absolute Error |
| RMSE | Root Mean Square Error |
| ML | Machine Learning |
| AOD | Aerosol Optical Depth |
| SeaWiFS | Sea-viewing Wide Field-of-view Sensor |
| MLR | Multiple Linear Regression |
| SVR | Support Vector Regression |
| KNN | K-Nearest Neighbor Regression |
| IDW | Inverse Distance Weighting |
| WHO | World Health Organization |
| EEA | European Environment Agency |
| EPA | Environmental Protection Agency |
| IQR | Interquartile Range |
| AMS | Automatic Meteorological Stations |
| GEE | Google Earth Engine |
| MAD | Median Absolute Deviation |
| GPR | Gaussian Process Regression |
| RBF | Radial Basis Function |
| LUR | Land-Use Regression |
| SRTM | Shuttle Radar Topography Mission |
| LST | Land-Surface Temperature |
| DEM | Digital Elevation Model |
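Two entries in the abbreviation list, IDW and MAD, name standard techniques for station data. IDW (Shepard's method) estimates a value at an unsampled point as a weighted average of station values with weights $w_i = d(x, x_i)^{-p}$, and the Hampel identifier flags a sample when it deviates from its moving median by more than $n_\sigma \cdot 1.4826 \cdot \mathrm{MAD}$. The sketch below shows textbook forms of both. The function names, the power $p = 2$, planar (projected) coordinates, a window half-width of 3 samples truncated at the series edges, and a threshold of 3 scaled MADs are illustrative assumptions, not settings taken from the article.

```python
import numpy as np


def idw(xy_known, values, xy_query, power=2.0):
    """Shepard-style inverse distance weighting (illustrative sketch).

    xy_known: (n, 2) station coordinates, assumed planar/projected.
    values:   (n,) observed values at the stations.
    xy_query: (m, 2) points to estimate.
    """
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.atleast_2d(np.asarray(xy_query, dtype=float))
    # Pairwise Euclidean distances, shape (m, n)
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=-1)
    out = np.empty(len(xy_query))
    for i, row in enumerate(d):
        exact = row == 0.0
        if exact.any():
            # Query point coincides with a station: return that station's value
            out[i] = values[exact][0]
        else:
            w = 1.0 / row**power
            out[i] = np.sum(w * values) / np.sum(w)
    return out


def hampel(x, half_window=3, n_sigmas=3.0):
    """Hampel identifier (illustrative sketch).

    Replaces samples that deviate from the moving median by more than
    n_sigmas scaled MADs with that median. Windows are truncated at the
    series edges (an assumption). Returns (filtered, outlier_mask).
    """
    x = np.asarray(x, dtype=float)
    y = x.copy()
    mask = np.zeros(len(x), dtype=bool)
    k = 1.4826  # scales the MAD to a standard-deviation estimate for Gaussian data
    for i in range(len(x)):
        lo, hi = max(0, i - half_window), min(len(x), i + half_window + 1)
        window = x[lo:hi]
        med = np.median(window)
        mad = k * np.median(np.abs(window - med))
        if np.abs(x[i] - med) > n_sigmas * mad:
            y[i] = med
            mask[i] = True
    return y, mask
```

For example, `hampel([10, 11, 10, 12, 80, 11, 10, 12, 11])` flags only the spike of 80 and replaces it with the local median of 11.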
| Data Group | Variables | Unit | Spatial/Temporal Resolution | Source |
|---|---|---|---|---|
| Ground air-quality data | PM10, PM2.5, NO2, SO2, O3, CO | µg/m³, mg/m³ | Hourly | EEA |
| Meteorological data | Air temperature, feels-like temperature, wind speed, wind gust, relative humidity, cloud cover, sea-level pressure, precipitation | °C, m/s, %, hPa, mm | Hourly | Meteo.lt |
| Satellite atmospheric data | NO2, CO, SO2, O3, and HCHO columns; Absorbing Aerosol Index (AAI) | mol/m² (columns), unitless (AAI) | ~3.5–7 km | TROPOMI (Sentinel-5P) |
| Static station and environmental data | Coordinates (lat, lon), altitude, elevation (DEM), land cover, NDVI, population density, proximity to main roads | Mixed | Static (derived) | EEA, ESA, USGS, WorldPop, INSPIRE |
| MODIS LST data | Daytime LST, nighttime LST | °C | 1 km | MODIS |
| Temporal features | year, month, day, weekday, season, lag1, lag2, lag3, roll7 | Unitless (calendar features), µg/m³ (lags, roll7) | Daily | Derived |
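The temporal features in the last row (calendar fields plus lag1–lag3 and roll7) can be derived per station from a daily PM10 series. The sketch below is one plausible construction, assuming daily-mean input rows with no missing dates; the column names `station`, `date`, and `pm10`, the meteorological season coding (1 = DJF, 2 = MAM, 3 = JJA, 4 = SON), and computing roll7 over the previous seven days only (so the current day's value is excluded) are all hypothetical choices, not definitions taken from the article.

```python
import pandas as pd


def add_temporal_features(df, value_col="pm10", station_col="station", date_col="date"):
    """Add calendar, lag, and rolling-mean features per station (illustrative sketch).

    Assumes one row per station per day with no gaps, so that a shift of k
    rows equals a lag of k days.
    """
    df = df.sort_values([station_col, date_col])
    d = pd.to_datetime(df[date_col])
    df["year"] = d.dt.year
    df["month"] = d.dt.month
    df["day"] = d.dt.day
    df["weekday"] = d.dt.weekday  # Monday = 0 ... Sunday = 6
    # Assumed meteorological seasons: 1 = DJF, 2 = MAM, 3 = JJA, 4 = SON
    df["season"] = (d.dt.month % 12) // 3 + 1
    g = df.groupby(station_col)[value_col]
    # Lags are computed within each station so values never leak across stations
    for k in (1, 2, 3):
        df[f"lag{k}"] = g.shift(k)
    # Mean of the previous 7 days; NaN until 7 past days are available
    df["roll7"] = g.transform(lambda s: s.shift(1).rolling(7, min_periods=7).mean())
    return df
```

Shifting before rolling keeps roll7 strictly backward-looking, which matters if the target is the same day's PM10; whether the article's roll7 includes the current day is not stated in the table.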

| α | β | MAE (µg/m³) | RMSE (µg/m³) | R² |
|---|---|---|---|---|
| 0.0 | 1.0 | 3.0179 | 3.8912 | 0.7194 |
| 0.1 | 0.9 | 2.9881 | 3.8475 | 0.7257 |
| 0.2 | 0.8 | 2.9638 | 3.8117 | 0.7308 |
| 0.3 | 0.7 | 2.9458 | 3.7842 | 0.7346 |
| 0.4 | 0.6 | 2.9339 | 3.7650 | 0.7373 |
| 0.5 | 0.5 | 2.9281 | 3.7544 | 0.7388 |
| 0.6 | 0.4 | 2.9271 | 3.7523 | 0.7391 |
| 0.7 | 0.3 | 2.9323 | 3.7589 | 0.7382 |
| 0.8 | 0.2 | 2.9439 | 3.7739 | 0.7361 |
| 0.9 | 0.1 | 2.9613 | 3.7975 | 0.7328 |
| 1.0 | 0.0 | 2.9846 | 3.8293 | 0.7283 |
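In every row of the weighting table α + β = 1, which is consistent with a convex combination of two model predictions, $\hat{y} = \alpha\,\hat{y}_A + \beta\,\hat{y}_B$; the table itself does not state which weight belongs to RF and which to XGBoost. The sketch below sweeps α in steps of 0.1 and scores each blend with the standard MAE, RMSE, and R² definitions. The function names and the generic `pred_a`/`pred_b` inputs are hypothetical placeholders for the two models' predictions.

```python
import numpy as np


def mae(y, p):
    return float(np.mean(np.abs(p - y)))


def rmse(y, p):
    return float(np.sqrt(np.mean((p - y) ** 2)))


def r2(y, p):
    ss_res = np.sum((y - p) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def sweep_blend(y_true, pred_a, pred_b, step=0.1):
    """Score p = alpha * pred_a + (1 - alpha) * pred_b over a grid of alpha.

    Returns a list of (alpha, beta, MAE, RMSE, R2) tuples.
    """
    y, a, b = (np.asarray(v, dtype=float) for v in (y_true, pred_a, pred_b))
    n = int(round(1.0 / step))
    rows = []
    for i in range(n + 1):
        alpha = i / n
        beta = 1.0 - alpha
        p = alpha * a + beta * b
        rows.append((alpha, beta, mae(y, p), rmse(y, p), r2(y, p)))
    return rows


def best_by_rmse(rows):
    # Pick the weight pair with the lowest RMSE (index 3 of each tuple)
    return min(rows, key=lambda r: r[3])
```

Applied to the values in the table, the lowest MAE and RMSE and the highest R² all occur at α = 0.6, β = 0.4. In practice the weight would normally be selected on validation data rather than on the data used to report final scores, so that the reported metrics are not optimistically biased.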

| Interval (µg/m³) | N | MAE (µg/m³) | RMSE (µg/m³) | Mean Residual, Pred − Obs (µg/m³) |
|---|---|---|---|---|
| <15 | 3544 | 2.94 | 3.76 | 2.64 |
| 15–40 | 4309 | 2.89 | 3.70 | −0.02 |
| >40 | 31 | 6.62 | 7.48 | −6.62 |
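The interval table summarizes errors and bias (mean residual, predicted minus observed) within concentration bands. The sketch below computes the same four quantities, assuming the bands are defined on observed PM10 and that the boundary values 15 and 40 belong to the middle band; neither choice is stated in the table, and the function name is hypothetical.

```python
import numpy as np


def binned_errors(y_obs, y_pred, lower=15.0, upper=40.0):
    """Per-interval error summary (illustrative sketch).

    Returns {label: (n, mae, rmse, mean_residual)} with residual = pred - obs.
    Bands are defined on observed values; boundary handling is assumed.
    """
    y = np.asarray(y_obs, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    masks = {
        f"<{lower:g}": y < lower,
        f"{lower:g}-{upper:g}": (y >= lower) & (y <= upper),
        f">{upper:g}": y > upper,
    }
    summary = {}
    for label, m in masks.items():
        r = p[m] - y[m]
        n = int(m.sum())
        if n == 0:
            # Empty band: report the count and leave the metrics undefined
            summary[label] = (0, float("nan"), float("nan"), float("nan"))
            continue
        summary[label] = (
            n,
            float(np.mean(np.abs(r))),
            float(np.sqrt(np.mean(r**2))),
            float(np.mean(r)),
        )
    return summary
```

Because mean |r| ≥ |mean r|, with equality only when all residuals share a sign, the >40 row (MAE 6.62, mean residual −6.62) indicates that, to the reported precision, the model underestimated every one of those 31 high-concentration days. The <15 row (MAE 2.94, mean residual 2.64) likewise points to predominantly overestimated low concentrations, while the 15–40 band is nearly unbiased.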
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Fahim, M.A.S.; Sužiedelytė Visockienė, J. An Assessment of the Multi-Input Spatiotemporal RF–XGBoost Hybrid Framework for PM10 Estimation in Lithuania. Sustainability 2026, 18, 2022. https://doi.org/10.3390/su18042022
Chicago/Turabian style: Fahim, Mina Adel Shokry, and Jūratė Sužiedelytė Visockienė. 2026. "An Assessment of the Multi-Input Spatiotemporal RF–XGBoost Hybrid Framework for PM10 Estimation in Lithuania." Sustainability 18, no. 4: 2022. https://doi.org/10.3390/su18042022

