Machine Learning-Based Ground-Level NO₂ Estimation in Istanbul: A Comparative Analysis of Sentinel-5P and GEOS-CF

Yagmur Aydin, Nur

doi:10.3390/app152010997

by

Nur Yagmur Aydin

Geomatics Engineering Department, Engineering Faculty, Gebze Technical University, Kocaeli 41400, Türkiye

Appl. Sci. 2025, 15(20), 10997; https://doi.org/10.3390/app152010997

Submission received: 22 September 2025 / Revised: 10 October 2025 / Accepted: 13 October 2025 / Published: 13 October 2025

(This article belongs to the Special Issue Air Quality Monitoring, Analysis and Modeling)

Abstract

Nitrogen dioxide (NO₂) poses severe risks to human health and the environment, especially in densely populated megacities. Ground-based air quality monitoring stations provide high-temporal-resolution data but are spatially limited, while satellite observations offer broad coverage but measure column densities rather than surface concentrations. To overcome these limitations, this study integrates ground-based observations with satellite-derived NO₂ from Sentinel-5P TROPOMI and GEOS-CF products to estimate ground-level NO₂ in Istanbul using machine learning (ML) approaches. Three ML algorithms (RF, XGB, and CB) were tested on two datasets spanning 2019–2024 at ~1 km resolution, incorporating 20 features, including topographic, meteorological, environmental, and demographic variables. Among the models, CB achieved the best performance (R: 0.686, RMSE: 16.23 µg/m³, and MAE: 11.75 µg/m³ in the test dataset) with the Sentinel-5P dataset, successfully capturing spatial and seasonal variations in ground-level NO₂ both quantitatively and qualitatively. SHAP analysis revealed that, in addition to satellite-derived NO₂, anthropogenic indicators such as population density and road length, together with elevation from the digital elevation model, were the most influential features, while meteorological factors contributed secondarily. Despite the lower spatial resolution of GEOS-CF data, both Sentinel-5P and GEOS-CF datasets supported reliable model outputs. This study provides the first ML-based ground-level NO₂ estimation framework for the Istanbul Metropolitan City.

Keywords:

Sentinel-5P; GEOS-CF; ground-level NO₂; Istanbul; CatBoost; SHAP; machine learning

1. Introduction

As one of the main air pollutants, nitrogen dioxide (NO₂) is produced by natural and anthropogenic sources, including fossil fuel burning from transportation, industrial activities, and power plants [1]. Exposure to NO₂ is associated with a range of health issues, such as cardiovascular and respiratory diseases, lung cancer, and premature death [2,3,4]. Beyond its health impacts, NO₂ also adversely affects the natural environment. It contributes to the formation of tropospheric ozone and aerosol nitrates, leading to acid rain and reduced visibility [5]. Moreover, high concentrations of NO₂ can damage crops and vegetation by reducing yields and inhibiting plant growth [6].

The rapid growth of megacity populations, driven by economic and technological development, has further increased NO₂ emissions from human activities. Several studies have demonstrated the strong connection between economic development and ground-level NO₂ concentrations [7,8]. Over the past three decades, megacities have contributed significantly to the rise in anthropogenic NO₂ emissions, making them critical areas for air pollution mitigation and control [9,10]. Therefore, continuous monitoring and accurate estimation of ground-level NO₂ concentrations in megacities are of vital importance for tracking air pollution and developing effective action plans [11].

Air quality monitoring stations are sustainable systems established in regions where air pollution monitoring is essential. These stations provide high-temporal-resolution data, typically with hourly measurements. However, as point-based systems, their spatial coverage is limited. Furthermore, since the monitoring of anthropogenic emissions is often prioritized, the distribution of stations is concentrated in urban areas, resulting in a lack of spatial balance [12,13]. In contrast, remote sensing technologies enable spatial and temporal analyses through their wide coverage and repeated data acquisition [14]. Despite these advantages, remote sensing has limitations, such as the inability to collect data under cloudy conditions and its measurement of column density rather than surface or near-surface concentrations [13,15]. Nevertheless, numerous studies have demonstrated that satellite-observed tropospheric NO₂ column density correlates with ground-level NO₂, because NO₂ has a short atmospheric lifetime and is formed largely by anthropogenic activities [1,16]. Considering the respective advantages and limitations of satellite and ground-based monitoring systems, their combined use provides complementary insights, supporting the development of more comprehensive and accurate models.

Several approaches have been employed to estimate ground-level NO₂, including traditional statistical methods such as kriging, land use regression, and geographically and temporally weighted regression [16]. However, machine learning (ML) models have shown superior performance by effectively capturing complex nonlinear relationships, accounting for interactions among diverse variables, and achieving more accurate predictions [13,17,18]. ML-based studies have been applied across various scales, including local [19,20,21], regional [22,23], and national [10,17,24,25,26] levels. In addition, explainable artificial intelligence (XAI) techniques, such as Shapley Additive exPlanations (SHAP), have been increasingly used to identify and interpret the environmental, meteorological, and topographic drivers influencing ground-level NO₂ estimates [22,27,28].

Within the scope of this study, ground-level NO₂ concentrations in Istanbul were estimated using ML algorithms by integrating ground-based air quality monitoring data with tropospheric NO₂ column data from the Sentinel-5P TROPOMI and Geos-CF datasets. Three machine learning algorithms were applied to a dataset spanning 2019–2024 at a spatial resolution of ~1 km, with a focus on seasonal variations. Both qualitative and quantitative evaluations were conducted. To improve prediction accuracy, a comprehensive set of topographic, meteorological, environmental, and demographic variables was incorporated. Model interpretability was ensured using SHAP, a widely adopted explainable AI method [29], to identify the most influential features of NO₂ variability. The study addresses the following research questions:

How accurately can ground-level NO₂ in metropolitan cities be estimated using satellite-derived tropospheric NO₂ data?
How do ground-level NO₂ estimates differ when using NO₂ data with varying spatial resolutions?
To what extent do environmental and anthropogenic factors influence the prediction of ground-level NO₂?

In this context, this study represents the first ML-based ground-level NO₂ estimation framework for Istanbul, Türkiye’s most populous and touristic city, providing critical insights for air quality management and policymaking.

2. Study Area and Datasets

2.1. Study Area and Ambient Air Quality Monitoring Station Data

Istanbul is Türkiye’s most populous metropolitan city, with a population of approximately 16 million. Its role as a bridge connecting Asia and Europe has made it a major transportation, industrial, and tourism hub (Figure 1). Due to internal migration and increasing population, residential areas have increased significantly in the last 30 years to meet the city’s housing needs [30,31], and it is estimated that they will increase further [32]. Therefore, monitoring air quality in cities with intense human activity is gaining importance. To this end, the Istanbul Metropolitan Municipality and the Ministry of Environment, Urbanization, and Climate Change have established air quality monitoring stations in the city. The distribution of air quality monitoring stations is shown in Figure 1.

There are 40 air quality stations in Istanbul. However, only 32 of these stations measure NO₂. The stations are concentrated around the Bosphorus, where transportation, tourism, industrial activities, and residential areas are concentrated. As a result, air quality can be monitored only in areas where threats have already been identified, while areas without stations remain unobserved. At the air quality monitoring stations, parameters are measured hourly and made available online (https://havakalitesi.ibb.gov.tr/, accessed on 1 August 2025).

Hourly station data for 2019–2024 were filtered to the satellite overpass time (between 1.00 and 2.00 p.m. in UTC+3), and the data from both hours were averaged for each day. To minimize the effect of abnormal values on the model, values greater than 300 or less than 1 were removed from the data, following Chi et al. (2022) [17].
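The filtering described above can be sketched in a few lines of pandas. This is only an illustration under stated assumptions, not the author's code: the column names (`station`, `time`, `no2`), the choice of the 13:00 and 14:00 local time stamps as the two overpass hours, and the decision to apply the outlier thresholds after averaging are all hypothetical.

```python
import pandas as pd


def overpass_daily_mean(hourly, lo=1.0, hi=300.0):
    """Average the two overpass-hour readings per station and day, then drop outliers.

    `hourly` is assumed to have the columns: station, time (local, UTC+3), no2.
    """
    df = hourly.assign(time=pd.to_datetime(hourly["time"]))
    # Keep only readings stamped 13:00 or 14:00 local time (assumed overpass window).
    df = df[df["time"].dt.hour.isin([13, 14])]
    df = df.assign(date=df["time"].dt.normalize())
    daily = df.groupby(["station", "date"], as_index=False)["no2"].mean()
    # Remove daily values greater than `hi` or less than `lo`.
    keep = (daily["no2"] >= lo) & (daily["no2"] <= hi)
    return daily[keep].reset_index(drop=True)


# Synthetic example data (not real measurements).
hourly = pd.DataFrame({
    "station": ["A", "A", "A", "A", "B", "B"],
    "time": ["2024-01-01 12:00", "2024-01-01 13:00", "2024-01-01 14:00",
             "2024-01-02 13:00", "2024-01-01 13:00", "2024-01-01 14:00"],
    "no2": [10.0, 20.0, 40.0, 0.5, 350.0, 400.0],
})
daily = overpass_daily_mean(hourly)
print(daily)
```

In this toy input only station A on 1 January survives: its two overpass readings are averaged, while the other station-days fall outside the accepted range.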

2.2. Datasets

In the study, various datasets were identified through a literature review and collected from different sources, taking into account the matching of time intervals, as listed in Table 1.

2.2.1. Sentinel-5P TROPOMI Tropospheric NO₂ Columns

The Sentinel-5P satellite was launched on 13 October 2017 under the Copernicus mission and was designed for high-spatial-resolution monitoring of the atmosphere, air pollution, the ozone layer, climate change, and aviation safety. The TROPOMI (TROPOspheric Monitoring Instrument) sensor measures solar radiation backscattered from the Earth and atmosphere. Its products are served in two processing levels with a spatial sampling of approximately 3.5 × 5.5 km since 6 August 2019. However, in Google Earth Engine (GEE), the Level-2 data are stored as Level-3 OFFL (offline) products with a spatial sampling of approximately 1 × 1 km [33]. The Sentinel-5P TROPOMI dataset was extracted using the GEE cloud computing platform, which stores archived and up-to-date datasets and enables data processing and geospatial analysis.

2.2.2. Satellite-Based Variables

Various topographical, environmental, and meteorological factors were included in the model to enrich the feature space and improve model accuracy, as well as to assess their influence on NO₂. The Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) [34] was used as a topographical factor. SRTM is a global DEM dataset with a 30 m spatial resolution. Normalized Difference Vegetation Index (NDVI) [35] was used as the environmental factor to identify green and non-green areas. NDVI data were calculated from MODIS Nadir Bidirectional Reflectance Distribution Function Adjusted Reflectance (NBAR) data, which has a spatial resolution of 500 m and has been providing data since 2000 [36]. Nighttime light (NTL) data, which represent the artificial lights of settlements and human activities, were used to identify urban and non-green areas [37]. Considering the impact of anthropogenic activities on NO₂ formation, NTL data provide a good spatial characterization of the areas where human activities, and hence NO₂ emissions, are concentrated [38]. For this purpose, NTL data were obtained from the Visible Infrared Imaging Radiometer Suite (VIIRS) with a spatial resolution of 500 m [39].
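For readers unfamiliar with NDVI, the index is the normalized difference between near-infrared and red reflectance, (NIR − Red) / (NIR + Red). The short sketch below shows the calculation on plain arrays; the function name and the handling of zero denominators are illustrative choices, and the reflectance values are made up rather than taken from the MODIS NBAR product.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    # Return NaN where both bands are zero instead of dividing by zero.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom == 0, np.nan, (nir - red) / denom)


# Made-up reflectances: vegetated pixel, bare/urban pixel, no-signal pixel.
values = ndvi([0.5, 0.3, 0.0], [0.1, 0.3, 0.0])
print(values)
```

Values close to 1 indicate dense green vegetation, while values near zero are typical of built-up or bare surfaces.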

The climate variables were provided by the Goddard Earth Observing System Composition Forecast (Geos-CF). Geos-CF, developed by the National Aeronautics and Space Administration (NASA), provides global three-dimensional distributions of atmospheric composition with a spatial resolution of approximately 27 km. The Geos-CF replay products provide time-averaged one-hour data by combining meteorological, atmospheric, and chemical collections [40]. Fourteen bands of Geos-CF were used in the study, as given in Table 1. To match the Geos-CF dataset with Sentinel-5P, the Geos-CF data for 2019–2024 were temporally filtered to the Sentinel-5P overpass time, and the Geos-CF data from both hours (between 1.00 and 2.00 p.m. in UTC+3) were averaged for each day to ensure temporal consistency between the datasets.

2.2.3. Auxiliary Variables

To account for the effects of social factors, population density (PD) and road length (RL) were included in the estimation model. Population data were obtained from the Turkish Statistical Institute (TUIK), and population density was computed by dividing the population of each district by its area. Road data were obtained from the road layer shared by OpenStreetMap. A 0.1 × 0.1° grid network was created for the study area, and the total road length within each grid cell was calculated. In addition to meteorological, environmental, and social factors, the day of the year (DOY) was included in the model. These auxiliary data were used to capture the density of human activities both temporally and spatially [41].
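One simple way to aggregate road length on a regular grid is sketched below. It is a simplification under explicit assumptions: each road segment is assigned to the cell containing its midpoint (a GIS workflow would normally clip segments at cell borders), and the function name, grid origin, and segment values are hypothetical.

```python
import numpy as np


def road_length_per_cell(mid_lon, mid_lat, seg_len_km, lon0, lat0, cell=0.1):
    """Sum segment lengths per grid cell, assigning each segment by its midpoint.

    (lon0, lat0) is the lower-left corner of the grid and `cell` is the cell
    size in degrees. Returns a dict keyed by (row, col).
    """
    col = np.floor((np.asarray(mid_lon) - lon0) / cell).astype(int)
    row = np.floor((np.asarray(mid_lat) - lat0) / cell).astype(int)
    totals = {}
    for r, c, length in zip(row, col, seg_len_km):
        key = (int(r), int(c))
        totals[key] = totals.get(key, 0.0) + float(length)
    return totals


# Three made-up segments: two fall in the same cell, one in the neighboring cell.
totals = road_length_per_cell(
    mid_lon=[28.95, 28.97, 29.05],
    mid_lat=[41.02, 41.04, 41.02],
    seg_len_km=[1.5, 2.0, 0.7],
    lon0=28.0,
    lat0=40.0,
)
print(totals)
```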

3. Methodology

The study was conducted in three parts: data preprocessing and feature extraction, model development with ML algorithms and model evaluations, as shown in Figure 2.

3.1. Data Preprocessing and Feature Extraction

Satellite image data, including Sentinel-5P, Geos-CF, MODIS NDVI, SRTM DEM, and VIIRS NTL, were extracted at the locations of the air quality monitoring stations using GEE, and the data were matched both temporally and spatially. The data were divided into three parts: training, validation, and test. The training data cover the years 2019–2022, the validation data cover 2023, and the test data cover 2024.
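A year-based split of this kind keeps the test year strictly after the training years, so the models are evaluated on unseen dates. The pandas sketch below illustrates the idea; the column name `date` and the helper name are assumptions, and the table holds only dummy dates.

```python
import pandas as pd


def split_by_year(df, date_col="date"):
    """Split matched samples into train (2019-2022), validation (2023), test (2024)."""
    years = pd.to_datetime(df[date_col]).dt.year
    train = df[years.between(2019, 2022)]
    val = df[years == 2023]
    test = df[years == 2024]
    return train, val, test


# Dummy table with one row per day over the study period.
samples = pd.DataFrame({"date": pd.date_range("2019-01-01", "2024-12-31", freq="D")})
train, val, test = split_by_year(samples)
print(len(train), len(val), len(test))
```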

Two different data groups were generated to measure the effect of NO₂ data sources. With this purpose, the tropospheric column density of NO₂ gathered from Sentinel-5P was used as input in the first data group, while the NO₂ tropospheric column density data from Geos-CF were used as input in the second data group, along with 19 other variables.

To apply the trained models to the full images, a 0.1 × 0.1° grid network was created, the data were collected on the GEE platform, and the resulting thematic maps were produced with the best-performing model.

3.2. Model Development with Machine Learning

To estimate surface NO₂ concentrations, the most appropriate model, that is, the one that produces the most accurate results, must be identified. Therefore, three different machine learning algorithms, namely Random Forest (RF) Regression, Extreme Gradient Boosting (XGBoost) Regression (XGB), and CatBoost Regression (CB), were chosen based on their success in similar studies [13,19,26,42]. Within the scope of the study, Optuna [43] was used for the hyperparameter optimization of the algorithms.

RF, introduced by Breiman (2001) [44], is one of the most prominent machine learning algorithms for air quality assessment research owing to its robust predictive performance and efficiency [45,46]. RF draws random bootstrap samples from the original training dataset and constructs an ensemble of decision trees. Each tree is trained on roughly two-thirds of the samples, while the remaining out-of-bag samples are used to evaluate the model’s accuracy [47]. For classification, the final label is determined by majority voting; for regression, as in this study, the final estimate is the average of the individual tree predictions.
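The bagging idea behind RF can be illustrated numerically. The sketch below, which uses only NumPy and synthetic numbers, checks that a bootstrap sample contains about 63% of the distinct training samples (often rounded to two-thirds), leaving the rest out-of-bag, and shows that a regression forest averages its tree outputs. It does not train any trees; the per-tree predictions are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_trees = 1000, 200

# Fraction of distinct training samples that end up in each bootstrap sample.
in_bag_fraction = []
for _ in range(n_trees):
    idx = rng.integers(0, n_samples, size=n_samples)  # draw with replacement
    in_bag_fraction.append(np.unique(idx).size / n_samples)
mean_in_bag = float(np.mean(in_bag_fraction))  # close to 1 - 1/e

# For regression, the forest estimate is the mean of the tree predictions.
tree_preds = np.array([[30.0, 42.0],   # tree 1, two query points (made-up values)
                       [34.0, 38.0],   # tree 2
                       [32.0, 40.0]])  # tree 3
forest_pred = tree_preds.mean(axis=0)

print(mean_in_bag, forest_pred)
```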

XGB was presented by Chen and Guestrin (2016) [48] as an advanced tree-based machine learning algorithm based on boosting theory. The main principle of XGB is the sequential refinement of weak learners within an ensemble [49]. Training begins with a base model that produces an initial prediction. In each subsequent iteration, a new tree is fitted to the errors of the current ensemble, so that poorly estimated samples receive more attention and their estimates are corrected [50,51]. In addition, XGB incorporates regularization techniques and optimized loss functions to reduce overfitting and enhance generalization [52].
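The sequential refinement of weak learners can be made concrete with a minimal gradient-boosting loop. The sketch below is a plain squared-error booster built from one-split decision stumps on synthetic data; it deliberately omits the regularization terms and second-order approximation that distinguish XGBoost, so it should be read as the underlying boosting principle rather than as XGBoost itself. All function names are illustrative.

```python
import numpy as np


def fit_stump(x, residual):
    """Find the single threshold on x that best fits the residuals (least squares)."""
    best = None
    for t in np.unique(x)[:-1]:  # exclude the max so the right side is never empty
        left, right = residual[x <= t], residual[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]


def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Boosting with squared-error loss: each stump is fitted to the current residuals."""
    pred = np.full_like(y, y.mean(), dtype=float)  # base model: constant mean
    history = [np.mean((y - pred) ** 2)]
    for _ in range(n_rounds):
        residual = y - pred  # negative gradient of the squared-error loss
        t, left_val, right_val = fit_stump(x, residual)
        pred += learning_rate * np.where(x <= t, left_val, right_val)
        history.append(np.mean((y - pred) ** 2))
    return pred, history


rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.sin(x) + rng.normal(0, 0.1, 200)  # synthetic target
pred, history = boost(x, y)
print(history[0], history[-1])
```

The training error recorded in `history` does not increase from one round to the next, which is also why unconstrained boosting can overfit and why a held-out validation year is needed to judge generalization.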

CB, one of the latest members of tree-based algorithms, was developed by Yandex to effectively handle the challenges associated with categorical features [53]. Automatic processing of categorical data and missing values through ordered boosting eliminates manual preprocessing requirements [54]. Additionally, CB constructs a symmetric tree structure to mitigate overfitting and enables GPU acceleration for large-scale datasets, resulting in fast and superior predictive performance [55,56].

Moreover, in this study, SHAP, an XAI framework based on game theory, was employed to interpret model behavior by quantifying the contribution of each feature to predictions in terms of both magnitude and direction [29]. SHAP values represent the marginal effect of a feature compared to the dataset’s average prediction, while aggregated absolute SHAP values provide global feature importance [57]. This approach enables both local and global interpretation, facilitating the assessment of feature relevance, the validation of model reliability against domain knowledge, and highlighting the role of satellite observations in modeling ground-level NO₂ concentrations.
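To make the game-theoretic idea concrete, the sketch below computes exact Shapley values by enumerating all feature coalitions for a small toy model. It is not the SHAP library and not the procedure used in the study: the toy model, the background sample, and the convention of filling absent features from background rows are assumptions chosen for illustration, and the brute-force enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(f, x, background):
    """Exact Shapley values of model f at point x.

    Features outside a coalition are filled in from `background` rows and the
    model output is averaged over those rows.
    """
    n = len(x)

    def value(coalition):
        idx = np.array(coalition, dtype=int)
        data = background.copy()
        data[:, idx] = x[idx]  # fix coalition features to their values in x
        return f(data).mean()

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                phi[i] += weight * (value(s + (i,)) - value(s))
    return phi


def toy_model(X):
    # Hypothetical model: one additive feature plus an interaction of two others.
    return 2.0 * X[:, 0] + X[:, 1] * X[:, 2]


rng = np.random.default_rng(0)
background = rng.normal(size=(100, 3))
x = np.array([1.0, 2.0, 3.0])
phi = shapley_values(toy_model, x, background)
print(phi, phi.sum())
```

The contributions sum to the difference between the prediction at `x` and the average prediction over the background data, which is the property that allows SHAP values to be read as signed shares of a single estimate.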

3.3. Model Evaluation and Accuracy Assessment

To assess model accuracy, three accuracy metrics were used within the study: Pearson correlation coefficient (R), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). The equations of metrics are, respectively, given as follows:

$$R = \frac{\sum_{i=1}^{n} (x_{i} - x_{m})(\hat{x}_{i} - \hat{x}_{m})}{\sqrt{\sum_{i=1}^{n} (x_{i} - x_{m})^{2} \sum_{i=1}^{n} (\hat{x}_{i} - \hat{x}_{m})^{2}}} \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (x_{i} - \hat{x}_{i})^{2}}{n}} \tag{2}$$

$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} |x_{i} - \hat{x}_{i}|}{n} \tag{3}$$

where $x_{i}$ is the measured value, $\hat{x}_{i}$ is the model-estimated NO₂ value, $n$ is the total number of samples in the validation dataset, and $x_{m}$ and $\hat{x}_{m}$ represent the means of the measured and estimated values, respectively.
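Equations (1)–(3) translate directly into a few NumPy operations. The following sketch uses made-up measured and estimated values; the function name and the dictionary output are illustrative choices.

```python
import numpy as np


def evaluate(measured, estimated):
    """Pearson R, RMSE, and MAE as defined in Equations (1)-(3)."""
    x = np.asarray(measured, dtype=float)
    x_hat = np.asarray(estimated, dtype=float)
    dx, dx_hat = x - x.mean(), x_hat - x_hat.mean()
    r = (dx * dx_hat).sum() / np.sqrt((dx ** 2).sum() * (dx_hat ** 2).sum())
    rmse = np.sqrt(np.mean((x - x_hat) ** 2))
    mae = np.mean(np.abs(x - x_hat))
    return {"R": r, "RMSE": rmse, "MAE": mae}


# Made-up example values.
measured = [10, 20, 30, 40]
estimated = [12, 18, 33, 37]
scores = evaluate(measured, estimated)
print(scores)
```

Because RMSE squares the residuals, it penalizes large errors more heavily than MAE, so reporting both gives a sense of how much of the error is driven by a few poorly estimated samples.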

4. Results

4.1. Correlation Analysis Between Variables

First, the correlations between the features were calculated using the Pearson correlation coefficient and are given in Figure 3. The correlation coefficients of the S5P and Geos-CF NO₂ data with the ground station data (Sta_NO₂) were close to each other, at 0.38 and 0.31, respectively, while the correlation coefficient between the S5P and Geos-CF NO₂ data was 0.68. NDVI has a negative correlation with road length (RL) and population density (PD), with correlation coefficients of −0.66 and −0.57, respectively. The correlation coefficient between PD and RL was 0.59. Among the Geos-CF bands, the highest correlations were found between 2-m air temperature (T2M) and surface skin temperature (TS) (0.97), sea level pressure (SLP) and surface pressure (PS) (0.90), mid layer heights (ZL) and surface geopotential height (PHIS) (0.86), eastward wind (U10M) and northward wind (V10M) (0.86), T2M and specific humidity (Q) (0.81), and TS and Q (0.75).

4.2. Accuracy Assessment and Seasonal Thematic Maps

Three ML algorithms were tested with accuracy metrics including R, RMSE, and MAE for each data part, and the results are given in Table 2. The best-performing model was highlighted for both Sentinel-5P and Geos-CF datasets.

Considering the results given in Table 2, all the error values obtained were lower than the standard deviations of the data itself (34.16 µg/m³ for training, 31.40 µg/m³ for validation, and 29.86 µg/m³ for testing). Since a naive estimator that always returns the mean of the data would have an RMSE equal to the standard deviation, this suggests that all algorithms performed reasonably well. Although the error values of the models established with Geos-CF data in the training phase were lower than those of the models established with Sentinel-5P data, the RMSE and MAE of the models established with S5P in the validation and testing phases were relatively lower, and the R was higher.

Although XGB gave the highest R (0.945 and 0.842) and the lowest RMSE (9.157 and 14.917 µg/m³) and MAE (6.405 and 10.402 µg/m³) values in the training phase, it obtained the lowest R and the highest error values in the validation and testing phases for both S5P and Geos-CF data, which indicates overfitting. CB gave the best results in the validation and testing phases for both datasets owing to its generalization capability. In terms of performance, CB is followed by RF and XGB, respectively.

The station-based diagrams, including accuracy metrics (RMSE, MAE, and R) for all algorithms, are presented in Figure 4. In general, the RF model (Figure 4a,b) and CB model (Figure 4e,f) exhibited similar performance for the Sentinel-5P dataset, characterized by high correlation coefficients (R > 0.7) and relatively low RMSE (mostly below 15 µg/m³). The color distribution further indicates lower MAE values, suggesting that RF and CB achieved more stable and accurate estimations across different stations. The XGB model (Figure 4c,d) showed a wider spread of RMSE values, with several points extending toward higher error levels, reflecting greater variability in prediction accuracy. The CB model (Figure 4e,f) demonstrated consistent and balanced results, with moderate RMSE and relatively high correlation values similar to those of RF. Across all models, the Sentinel-5P-based results (Figure 4b,d,f) generally outperform those derived from Geos-CF (Figure 4a,c,e), indicating a better agreement between Sentinel-5P observations and the model outputs. Overall, the station-based analysis confirms that RF and CB provided the most robust and reliable predictions, while Sentinel-5P data provided stronger consistency with the modeled variables. Additionally, the analysis revealed that the models consistently exhibit high errors at the same stations. The three stations with the highest errors exceeding 20 µg/m³ (Avcılar, Ümraniye2, and Esenler) are located in spatially distinct areas within the most densely populated districts. This may be attributed to abrupt fluctuations in measured values, which could have increased the model errors at these stations.

In addition to the quantitative evaluation, a qualitative assessment is also important in assessing the model’s performance. For this purpose, seasonal thematic maps of NO₂ distribution were created with seasonal average data for 2024. Maps created with Sentinel-5P are given in Figure 5, while maps created with Geos-CF are given in Figure 6.

According to Figure 5, seasonal maps obtained from the XGB model are quite noisy, unlike those from other models. It was observed that the amount of NO₂ was high in the northern parts of the city where forested areas were dense. Seasonal variation in NO₂ distribution shows significant changes in both the RF and CB models. During winter and spring, NO₂ levels are higher in the southern parts of the Bosphorus, where urban areas are dense, while they are lower in other areas. While NO₂ levels decrease in the summer months, they increase again in the autumn. Differences between the two model results are particularly evident in summer and autumn. The sharp linear changes in the results are due to the pixel sizes of the Geos-CF data.

When the thematic maps obtained with the models created with Geos-CF are examined, the XGB model has a high noise level, similar to the results obtained with the Sentinel-5P models. RF and CB models also show consistent seasonal distribution. Model results also diverge between summer and autumn seasons.

In order to evaluate the accuracy of the produced thematic maps, an accuracy assessment was conducted using seasonal average values obtained from the stations. The calculated R, RMSE, and MAE values for four seasons are given in Figure 7.
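The seasonal aggregation behind this comparison can be sketched with pandas. The example assumes meteorological seasons (December to February as winter, and so on), which the text does not state explicitly, and uses hypothetical column names and made-up values.

```python
import pandas as pd

# Assumed meteorological season definition (not stated in the text).
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_station_means(df):
    """Average observed and estimated NO2 per station and season."""
    season = pd.to_datetime(df["date"]).dt.month.map(SEASON_BY_MONTH)
    grouped = df.assign(season=season).groupby(["station", "season"], as_index=False)
    return grouped[["observed", "estimated"]].mean()


# Made-up daily values for one station.
records = pd.DataFrame({
    "station": ["A", "A", "A"],
    "date": ["2024-01-15", "2024-02-15", "2024-07-15"],
    "observed": [40.0, 60.0, 20.0],
    "estimated": [36.0, 50.0, 22.0],
})
seasonal = seasonal_station_means(records)
print(seasonal)
```

The resulting per-station seasonal means can then be passed to the same R, RMSE, and MAE calculations used for the daily samples.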

Considering Figure 7, among the models built with Sentinel-5P, XGB has the lowest R and the highest RMSE and MAE in all seasons. After both quantitative and qualitative evaluations, the XGB algorithm was found to be unsuccessful on this dataset. Comparing the RF and CB models, RF was relatively more successful only in the summer season, while CB performed better in the other seasons. The models’ errors were higher and their correlations lower in the spring and autumn seasons than in the summer and winter seasons.

Among models built with Geos-CF data, XGB appears to be the most successful model with the lowest error values in the quantitative evaluation, even though it fails in the qualitative assessment. This demonstrates that quantitative assessments alone are not sufficient to evaluate ML model results. Among the RF and CB models, CB performed well in all seasons except summer. Spring and autumn were also the seasons with the highest error rates.

When the Sentinel-5P and Geos-CF model results are compared, the RF and CB maps shown in Figure 5 and Figure 6 are mostly similar. In the quantitative evaluation, the CB model created with Sentinel-5P data showed better results than the models established with Geos-CF in the training, validation, and testing stages (Table 2) as well as in the seasonal evaluation analyses (Figure 7).

5. Discussion

In the study, ground-level NO₂ estimation analysis of Istanbul province was carried out using three different ML algorithms (RF, XGB, and CB) and two different datasets, including satellite-derived NO₂ (Sentinel-5P and Geos-CF), meteorological, environmental, and social factors. The model created using data collected between 2019 and 2023 was tested with data from 2024, and its performance was compared both quantitatively and qualitatively. While many studies indicate that the XGB algorithm performs well in estimating ground-level NO₂ [13,17,27,58], in this study, XGB exhibited the lowest performance among the three algorithms. Shetty et al. (2024) [22], who applied the XGB algorithm over Europe, also reported that the model performed poorly in Türkiye. CB produced the best results among the three algorithms. Figure 8 presents the station-based relative error distribution, calculated using both seasonal station observations and the estimated values.

The highest errors were recorded in spring in the southern parts of the Bosphorus, whereas in summer, errors were more pronounced in the northern areas compared to other seasons. Error levels were generally higher at stations on the Asian side (east of the Bosphorus) than on the European side (west of the Bosphorus), with the lowest errors observed in winter at stations in the western part of the city. This is primarily due to the lack of sufficient stations in the northern part of the Asian side. It is noteworthy that the distribution of relative errors is quite similar in both datasets.

To investigate the impact of the features on the estimation of ground-level NO₂, SHAP analysis was performed for each data model. The results are presented in Figure 9, where the 20 features are listed in order of importance. While Sentinel-5P NO₂ data stood out as the most important factor in its model, Geos-CF NO₂ data contributed to its model only as the fourth most important factor. This reveals that location features (RL, PD, and DEM) are more important than Geos-CF NO₂ data. It also demonstrates the necessity of including satellite-derived NO₂ data in the ground-level NO₂ estimation model [58].

In both models, RL and PD are among the top three most important factors. High RL and PD values increase the model output. Dense populations and road networks are strongly associated with higher surface NO₂ levels. The distribution of road networks and PD is given in Figure 10a,b. It is observed that the road network is dense in the southern parts of the Bosphorus where the population is concentrated. The CB model results in Figure 5 and Figure 6 show that ground-level NO₂ values are high in areas where the road network and PD are dense. The main reason for this situation is vehicle emissions and human activities that cause NO₂ formation [59]. Shao et al. (2023) [27] revealed that urbanization and population growth exhibit a power law relationship with NO₂ concentration. In addition, the seasonal variation in ground-level NO₂ concentration in this region can also be associated with human activities. The fact that NO₂ levels are high in winter and spring, decrease in summer, and increase again in autumn is due to heating activities in densely populated areas [60].

DEM was identified as the third most important factor after RL and PD, and its distribution in Istanbul is given in Figure 10c. According to the SHAP results, decreasing elevation causes an increase in ground-level NO₂. The reason for this can be the high settlement and human population on the shores of the Bosphorus, where the elevation is low. When Figure 10 is examined, in areas with high elevation, both the road network and population density decrease, and therefore the NO₂ level also decreases.

The next five most important features are ZL, NTL, V_10M, NDVI, and ZPBL. While high ZL, NTL, and V_10M values cause an increase in ground-level NO₂, low NDVI and ZPBL contribute to this rise. High NTL and low NDVI can also be associated with urbanization [22]. In densely urbanized areas, NTL is higher and NDVI is lower. Wind is an effective parameter in NO₂ transport [28], and in this study the northward wind (V_10M) was found to be more influential than the eastward wind (U_10M). Low ZPBL values may be associated with increased ground-level NO₂ values, as air pollutants tend to concentrate at low altitudes near the Earth’s surface [61]. A low planetary boundary layer height also causes increased ground-level NO₂ concentrations, especially in coastal areas [62]. Most meteorological parameters ranked among the ten least influential features within the scope of the study, lagging behind human activities and topography in estimating ground-level NO₂.

6. Strengths and Limitations

The strengths and limitations of this study were identified as follows. The main strength of the study is the successful demonstration of the ground-level NO₂ distribution over Istanbul using data with varying spatial resolutions. The CB models built with the two datasets (Sentinel-5P and Geos-CF) both produced reliable results in the quantitative and qualitative evaluations, but the model built with Sentinel-5P data performed better, thanks to its higher spatial resolution.

The limitation of the study is the lack of both in situ and auxiliary data. When the distribution of ground air quality monitoring stations in Figure 1 is examined, it is observed that the stations are not distributed homogeneously throughout the city. Because NO₂ emissions are generated by vehicle emissions, industrial activities, and human activities, terrestrial air quality monitoring stations are located in densely populated areas. However, this poses a limitation in regional ground-level NO₂ estimation analyses.

Another limitation is that meteorological data with a spatial resolution of approximately 11 km, such as ERA5-Land, could not be used within the scope of this study, because water areas are masked in this dataset. Since Istanbul is bordered by the Black Sea and the Sea of Marmara, many pixels of this size that contain air quality monitoring stations near the coast are masked, which results in data loss. Therefore, Geos-CF data were used instead of ERA5-Land data. Studies frequently utilize ERA5 products and perform analyses at a broader spatial scale [23,26]. Future studies will test the model with ERA5 data at the same spatial resolution as Geos-CF, and analyses will be expanded to include other metropolitan cities. Furthermore, adding traffic data in cities with high human activity is expected to increase model accuracy. Additionally, the models were analyzed on a seasonal basis in this study. Future research could extend this approach to monthly assessments; however, producing results at a daily temporal resolution is not feasible due to data gaps.

7. Conclusions

In this study, ground-level NO₂ was estimated using three ML algorithms (RF, XGB, and CB) and two datasets (Sentinel-5P and GEOS-CF). Each model incorporated 20 features, and in addition to identifying the most accurate model, the relative contributions of the features were assessed through SHAP analysis. The CB model was identified as the most successful. Although the lower spatial resolution of the atmospheric GEOS-CF data affected the visual quality of the maps produced by both the Sentinel-5P-based and the GEOS-CF-based models, the models still captured the spatial and temporal distribution of ground-level NO₂ both quantitatively and qualitatively.

The results highlight the critical role of anthropogenic indicators (e.g., population density (PD), road networks, and nighttime light (NTL)), topography (DEM), and air pollution variables (NO₂ from Sentinel-5P and GEOS-CF) in driving model performance, while meteorological factors contributed secondarily. In future studies, the data pool will be expanded with traffic data and additional atmospheric datasets, and more comprehensive analyses will be performed using deep learning models to improve model accuracy.
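The idea behind attributing a prediction to individual features can be sketched with exact Shapley values computed by brute-force enumeration. This is not the SHAP library or the explainer used in the study; the linear toy model, its coefficients, and the feature values are hypothetical, and only the feature names (PD, RL, DEM) are taken from Table 1.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to baseline (brute force).

    Features outside a coalition are replaced by their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return predict(mixed)

    phi = {}
    for k in names:
        others = [m for m in names if m != k]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {k}) - value(s))
        phi[k] = total
    return phi


def predict(f):
    """Hypothetical toy model of ground-level NO2 (not the fitted CB model)."""
    return 20.0 + 0.004 * f["PD"] + 1.5 * f["RL"] - 0.02 * f["DEM"]


x = {"PD": 12000.0, "RL": 8.0, "DEM": 40.0}          # one hypothetical grid cell
baseline = {"PD": 5000.0, "RL": 3.0, "DEM": 120.0}   # hypothetical reference cell

phi = shapley_values(predict, x, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.2f}")
```

The contributions sum to the difference between the prediction for the cell and the prediction for the baseline, which is the additivity property that SHAP summary plots rely on. Enumeration scales exponentially with the number of features, so it is only practical for small illustrations such as this one.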

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author (the data are not publicly available due to privacy or ethical restrictions).

Acknowledgments

The author expresses her gratitude to the Istanbul Metropolitan Municipality for providing the air quality ground monitoring station data used in this study.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

NO₂	Nitrogen dioxide
ML	Machine Learning
RF	Random Forest Regression
XGB	XGBoost Regression
CB	CatBoost Regression
XAI	Explainable Artificial Intelligence
SHAP	Shapley Additive exPlanations

References

1. Liu, F.; Beirle, S.; Zhang, Q.; Dörner, S.; He, K.; Wagner, T. NOx Lifetimes and Emissions of Cities and Power Plants in Polluted Background Estimated by Satellite Observations. Atmos. Chem. Phys. 2016, 16, 5283–5298.
2. Khan, R.R.; Siddiqui, M.J. Review on Effects of Particulates: Sulfur Dioxide and Nitrogen Dioxide on Human Health. Int. Res. J. Environ. Sci. 2014, 3, 70–73.
3. Eum, K.-D.; Kazemiparkouhi, F.; Wang, B.; Manjourides, J.; Pun, V.; Pavlu, V.; Suh, H. Long-Term NO₂ Exposures and Cause-Specific Mortality in American Older Adults. Environ. Int. 2019, 124, 10–15.
4. Manisalidis, I.; Stavropoulou, E.; Stavropoulos, A.; Bezirtzoglou, E. Environmental and Health Impacts of Air Pollution: A Review. Front. Public Health 2020, 8, 505570.
5. Seinfeld, J.H.; Pandis, S.N. Atmospheric Chemistry and Physics: From Air Pollution to Climate Change; John Wiley & Sons: Hoboken, NJ, USA, 2016.
6. Chen, T.-M.; Kuschner, W.G.; Gokhale, J.; Shofer, S. Outdoor Air Pollution: Nitrogen Dioxide, Sulfur Dioxide, and Carbon Monoxide Health Effects. Am. J. Med. Sci. 2007, 333, 249–256.
7. Cao, H.; Han, L. The Short-Term Impact of the COVID-19 Epidemic on Socioeconomic Activities in China Based on the OMI-NO₂ Data. Environ. Sci. Pollut. Res. 2022, 29, 21682–21691.
8. Schneider, P.; Lahoz, W.A.; van der A, R. Recent Satellite-Based Trends of Tropospheric Nitrogen Dioxide over Large Urban Agglomerations Worldwide. Atmos. Chem. Phys. 2015, 15, 1205–1220.
9. Güçlü, Y.S.; Dabanlı, İ.; Şişman, E.; Şen, Z. Air Quality (AQ) Identification by Innovative Trend Diagram and AQ Index Combinations in Istanbul Megacity. Atmos. Pollut. Res. 2019, 10, 88–96.
10. Wang, W.; Li, B.; Chen, B. Improved Surface NO₂ Retrieval: Double-Layer Machine Learning Model Construction and Spatio-Temporal Characterization Analysis in China (2018–2023). J. Environ. Manag. 2025, 384, 125439.
11. Zhang, Y.; Li, Z.; Wei, J.; Zhan, Y.; Liu, L.; Yang, Z.; Zhang, Y.; Liu, R.; Ma, Z. Long-Term Exposure to Ambient NO₂ and Adult Mortality: A Nationwide Cohort Study in China. J. Adv. Res. 2022, 41, 13–22.
12. Zhang, D.; Shi, R.; Zhou, Y.; Zheng, L.; Chen, M. The Spatial Distribution Characteristics and Ground-Level Estimation of NO₂ and SO₂ over Huaihe River Basin and Shanghai Based on Satellite Observations. In Proceedings of the Remote Sensing and Modeling of Ecosystems for Sustainability XV, San Diego, CA, USA, 19–23 August 2018; Gao, W., Chang, N.-B., Wang, J., Eds.; SPIE: Bellingham, WA, USA, 2018; p. 22.
13. Kang, Y.; Choi, H.; Im, J.; Park, S.; Shin, M.; Song, C.-K.; Kim, S. Estimation of Surface-Level NO₂ and O₃ Concentrations Using TROPOMI Data and Machine Learning over East Asia. Environ. Pollut. 2021, 288, 117711.
14. Fernandes, A.P.; Riffler, M.; Ferreira, J.; Wunderle, S.; Borrego, C.; Tchepel, O. Spatial Analysis of Aerosol Optical Depth Obtained by Air Quality Modelling and SEVIRI Satellite Observations over Portugal. Atmos. Pollut. Res. 2019, 10, 234–243.
15. Duncan, B.N.; Prados, A.I.; Lamsal, L.N.; Liu, Y.; Streets, D.G.; Gupta, P.; Hilsenrath, E.; Kahn, R.A.; Nielsen, J.E.; Beyersdorf, A.J.; et al. Satellite Data of Atmospheric Pollution for U.S. Air Quality Applications: Examples of Applications, Summary of Data End-User Resources, Answers to FAQs, and Common Mistakes to Avoid. Atmos. Environ. 2014, 94, 647–662.
16. Qin, K.; Rao, L.; Xu, J.; Bai, Y.; Zou, J.; Hao, N.; Li, S.; Yu, C. Estimating Ground Level NO₂ Concentrations over Central-Eastern China Using a Satellite-Based Geographically and Temporally Weighted Regression Model. Remote Sens. 2017, 9, 950.
17. Chi, Y.; Fan, M.; Zhao, C.; Yang, Y.; Fan, H.; Yang, X.; Yang, J.; Tao, J. Machine Learning-Based Estimation of Ground-Level NO₂ Concentrations over China. Sci. Total Environ. 2022, 807, 150721.
18. Bahadur, F.T.; Shah, S.R.; Nidamanuri, R.R. Applications of Remote Sensing Vis-à-Vis Machine Learning in Air Quality Monitoring and Modelling: A Review. Environ. Monit. Assess. 2023, 195, 1502.
19. Fu, J.; Tang, D.; Grieneisen, M.L.; Yang, F.; Yang, J.; Wu, G.; Wang, C.; Zhan, Y. A Machine Learning-Based Approach for Fusing Measurements from Standard Sites, Low-Cost Sensors, and Satellite Retrievals: Application to NO₂ Pollution Hotspot Identification. Atmos. Environ. 2023, 302, 119756.
20. Cedeno Jimenez, J.R.; Pugliese Viloria, A.d.J.; Brovelli, M.A. Estimating Daily NO₂ Ground Level Concentrations Using Sentinel-5P and Ground Sensor Meteorological Measurements. ISPRS Int. J. Geoinf. 2023, 12, 107.
21. Yagmur Aydin, N.; Aydin, I. Estimation of Ground-Level NO₂ Concentrations over Megacities Using Sentinel-5P and Machine Learning Models: A Case Study of Istanbul. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2025, XLVIII-M-6–2025, 303–308.
22. Shetty, S.; Schneider, P.; Stebel, K.; David Hamer, P.; Kylling, A.; Koren Berntsen, T. Estimating Surface NO₂ Concentrations over Europe Using Sentinel-5P TROPOMI Observations and Machine Learning. Remote Sens. Environ. 2024, 312, 114321.
23. Griffin, D.; Hempel, C.; McLinden, C.; Kharol, S.K.; Lee, C.; Fogal, A.; Sioris, C.; Shephard, M.; You, Y. Development and Validation of Satellite-Derived Surface NO₂ Estimates Using Machine Learning versus Traditional Approaches in North America. EGUsphere 2025, 2025, 1–20.
24. Araki, S.; Shima, M.; Yamamoto, K. Spatiotemporal Land Use Random Forest Model for Estimating Metropolitan NO₂ Exposure in Japan. Sci. Total Environ. 2018, 634, 1269–1277.
25. Chan, K.L.; Khorsandi, E.; Liu, S.; Baier, F.; Valks, P. Estimation of Surface NO₂ Concentrations over Germany from TROPOMI Satellite Observations Using a Machine Learning Method. Remote Sens. 2021, 13, 969.
26. Long, S.; Wei, X.; Zhang, F.; Zhang, R.; Xu, J.; Wu, K.; Li, Q.; Li, W. Estimating Daily Ground-Level NO₂ Concentrations over China Based on TROPOMI Observations and Machine Learning Approach. Atmos. Environ. 2022, 289, 119310.
27. Shao, Y.; Zhao, W.; Liu, R.; Yang, J.; Liu, M.; Fang, W.; Hu, L.; Adams, M.; Bi, J.; Ma, Z. Estimation of Daily NO₂ with Explainable Machine Learning Model in China, 2007–2020. Atmos. Environ. 2023, 314, 120111.
28. Sun, W.; Tack, F.; Clarisse, L.; Schneider, R.; Stavrakou, T.; Van Roozendael, M. Inferring Surface NO₂ over Western Europe: A Machine Learning Approach with Uncertainty Quantification. J. Geophys. Res. Atmos. 2024, 129, e2023JD040676.
29. Lundberg, S.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. arXiv 2017, arXiv:1705.07874.
30. Khorrami, B.; Heidarlou, H.B.; Feizizadeh, B. Evaluation of the Environmental Impacts of Urbanization from the Viewpoint of Increased Skin Temperatures: A Case Study from Istanbul, Turkey. Appl. Geomat. 2021, 13, 311–324.
31. Bozkurt, S.G.; Kuşak, L. Detection of Population Density, LULC Variation and Cross-Regional Similarities Using K-Means Clustering Algorithm in Istanbul Example. Mimar. Bilim. Uygulamaları Derg. 2024, 9, 69–86.
32. Akın, A.; Sunar, F.; Berberoğlu, S. Urban Change Analysis and Future Growth of Istanbul. Environ. Monit. Assess. 2015, 187, 506.
33. Sentinel-5P OFFL NO2: Offline Nitrogen Dioxide. Available online: https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S5P_OFFL_L3_NO2 (accessed on 1 August 2025).
34. SRTM. Available online: https://www.earthdata.nasa.gov/data/instruments/srtm (accessed on 1 August 2025).
35. Rouse, J.W., Jr.; Haas, R.H.; Schell, J.A.; Deering, D.W. Monitoring the Vernal Advancement and Retrogradation (Green Wave Effect) of Natural Vegetation; NASA: Washington, DC, USA, 1973.
36. MODIS. Available online: https://developers.google.com/earth-engine/datasets/catalog/MODIS_061_MCD43A4#description (accessed on 1 August 2025).
37. Elvidge, C.D.; Baugh, K.; Zhizhin, M.; Hsu, F.C.; Ghosh, T. VIIRS Night-Time Lights. Int. J. Remote Sens. 2017, 38, 5860–5879.
38. Levin, N.; Kyba, C.C.M.; Zhang, Q.; Sánchez de Miguel, A.; Román, M.O.; Li, X.; Portnov, B.A.; Molthan, A.L.; Jechow, A.; Miller, S.D.; et al. Remote Sensing of Night Lights: A Review and an Outlook for the Future. Remote Sens. Environ. 2020, 237, 111443.
39. VIIRS Lunar Gap-Filled BRDF Nighttime Lights. Available online: https://developers.google.com/earth-engine/datasets/catalog/NASA_VIIRS_002_VNP46A2#description (accessed on 1 August 2025).
40. Geos-CF. Available online: https://developers.google.com/earth-engine/datasets/catalog/NASA_GEOS-CF_v1_rpl_tavg1hr#description (accessed on 1 August 2025).
41. Liu, N.; Lin, W.; Ma, J.; Xu, W.; Xu, X. Seasonal Variation in Surface Ozone and Its Regional Characteristics at Global Atmosphere Watch Stations in China. J. Environ. Sci. 2019, 77, 291–302.
42. Qin, K.; Han, X.; Li, D.; Xu, J.; Loyola, D.; Xue, Y.; Zhou, X.; Li, D.; Zhang, K.; Yuan, L. Satellite-Based Estimation of Surface NO₂ Concentrations over East-Central China: A Comparison of POMINO and OMNO2d Data. Atmos. Environ. 2020, 224, 117322.
43. OPTUNA. Available online: https://optuna.org/ (accessed on 1 August 2025).
44. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
45. Chen, J.; Zhu, S.; Wang, P.; Zheng, Z.; Shi, S.; Li, X.; Xu, C.; Yu, K.; Chen, R.; Kan, H.; et al. Predicting Particulate Matter, Nitrogen Dioxide, and Ozone across Great Britain with High Spatiotemporal Resolution Based on Random Forest Models. Sci. Total Environ. 2024, 926, 171831.
46. Vaishnavi, K.; Sreya, G.; Reddy, K.K.; P R, A. Machine Learning for Air Quality Prediction: Random Forest Classifier. In Proceedings of the 2024 Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), Bhilai, India, 11–12 January 2024; pp. 1–5.
47. Sharda, S.; Kumar, S.; Setia, R.; Dhiman, P.; Patel, N.R.; Pateriya, B.; Salem, A.; Elbeltagi, A. Evaluation of Different Spectral Indices for Wheat Lodging Assessment Using Machine Learning Algorithms. Sci. Rep. 2025, 15, 21774.
48. Chen, T.; Guestrin, C. XGBoost. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794.
49. Ozturk, M.Y.; Colkesen, I. A Novel Hybrid Methodology Integrating Pixel- and Object-Based Techniques for Mapping Land Use and Land Cover from High-Resolution Satellite Data. Int. J. Remote Sens. 2024, 45, 5640–5678.
50. Georganos, S.; Grippa, T.; Vanhuysse, S.; Lennert, M.; Shimoni, M.; Wolff, E. Very High Resolution Object-Based Land Use–Land Cover Urban Classification Using Extreme Gradient Boosting. IEEE Geosci. Remote Sens. Lett. 2018, 15, 607–611.
51. Rumora, L.; Miler, M.; Medak, D. Impact of Various Atmospheric Corrections on Sentinel-2 Land Cover Classification Accuracy Using Machine Learning Classifiers. ISPRS Int. J. Geoinf. 2020, 9, 277.
52. Abdi, A.M. Land Cover and Land Use Classification Performance of Machine Learning Algorithms in a Boreal Landscape Using Sentinel-2 Data. GIScience Remote Sens. 2020, 57, 1–20.
53. Dorogush, A.V.; Ershov, V.; Gulin, A. CatBoost: Gradient Boosting with Categorical Features Support. arXiv 2018, arXiv:1810.11363.
54. Kulkarni, C.S. Advancing Gradient Boosting: A Comprehensive Evaluation of the CatBoost Algorithm for Predictive Modeling. J. Artif. Intell. Mach. Learn. Data Sci. 2022, 1, 54–57.
55. Pham, T.D.; Yokoya, N.; Nguyen, T.T.T.; Le, N.N.; Ha, N.T.; Xia, J.; Takeuchi, W.; Pham, T.D. Improvement of Mangrove Soil Carbon Stocks Estimation in North Vietnam Using Sentinel-2 Data and Machine Learning Approach. GIScience Remote Sens. 2021, 58, 68–87.
56. Ozturk, M.Y.; Colkesen, I. Development of Transferable Hybrid Deep Learning Networks for Temporal and Multi-Regional Mapping of Poplar Plantations with Sentinel-2. Adv. Space Res. 2025, 76, 4249–4279.
57. Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat. Mach. Intell. 2020, 2, 56–67.
58. Wei, Q.; Song, W.; Dai, B.; Wu, H.; Zuo, X.; Wang, J.; Chen, J.; Li, J.; Li, S.; Chen, Z. Spatiotemporal Estimation of Surface NO₂ Concentrations in the Pearl River Delta Region Based on TROPOMI Data and Machine Learning. Atmos. Pollut. Res. 2025, 16, 102353.
59. Kang, H.; Zhu, B.; Zhu, C.; de Leeuw, G.; Hou, X.; Gao, J. Natural and Anthropogenic Contributions to Long-Term Variations of SO₂, NO₂, CO, and AOD over East China. Atmos. Res. 2019, 215, 284–293.
60. Yu, S.; Yin, S.; Zhang, R.; Wang, L.; Su, F.; Zhang, Y.; Yang, J. Spatiotemporal Characterization and Regional Contributions of O₃ and NO₂: An Investigation of Two Years of Monitoring Data in Henan, China. J. Environ. Sci. 2020, 90, 29–40.
61. Xiao, K.; Wang, Y.; Wu, G.; Fu, B.; Zhu, Y. Spatiotemporal Characteristics of Air Pollutants (PM10, PM2.5, SO₂, NO₂, O₃, and CO) in the Inland Basin City of Chengdu, Southwest China. Atmosphere 2018, 9, 74.
62. Lee, S.-J.; Lee, J.; Greybush, S.J.; Kang, M.; Kim, J. Spatial and Temporal Variation in PBL Height over the Korean Peninsula in the KMA Operational Regional Model. Adv. Meteorol. 2013, 2013, 1–16.

Figure 1. The location of Türkiye and Istanbul, along with the distribution of air quality monitoring stations.

Figure 2. Workflow of the study.

Figure 3. The correlation heatmap of the features obtained on a daily basis.

Figure 4. Station-based RMSE, MAE, and R diagrams for 2024. (a,b): RF, (c,d): XGB, and (e,f): CB.

Figure 5. Seasonal maps created with the Sentinel-5P seasonal average dataset for 2024.

Figure 6. Seasonal maps created with the GEOS-CF seasonal average dataset for 2024.

Figure 7. Accuracy assessment of seasonal maps using R, RMSE, and MAE.

Figure 8. Seasonal relative error distributions for (a) Sentinel-5P and (b) GEOS-CF.

Figure 9. SHAP analysis results for the CB models built with the Sentinel-5P and GEOS-CF datasets.

Figure 10. The most influential features: (a) Road network, (b) Population density, and (c) Elevation.

Table 1. List of input and output features used in the study.

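Data Type	Name	Variable	Data Source	Input/Output
Ground Monitoring	Ground-based NO₂ Station Data (Hourly average)	NO₂ measurements	https://havakalitesi.ibb.gov.tr/, accessed on 1 August 2025	Output
Satellite Air Quality Product	Sentinel-5P TROPOMI (Daily)	Tropospheric vertical column of NO₂	https://earthengine.google.com, accessed on 1 August 2025	Input
Satellite Air Quality Product	GEOS-CF (Hourly average)	Hourly average Nitrogen dioxide (NO₂, MW = 46.00 g mol⁻¹) tropospheric column density	https://earthengine.google.com, accessed on 1 August 2025	Input
Climate	GEOS-CF (Hourly average)	Dust optical depth at 550 nm (AOD550_Dust)	https://earthengine.google.com, accessed on 1 August 2025	Input
Climate	GEOS-CF (Hourly average)	Surface geopotential height (PHIS)	https://earthengine.google.com, accessed on 1 August 2025	Input
Climate	GEOS-CF (Hourly average)	Surface pressure (PS)	https://earthengine.google.com, accessed on 1 August 2025	Input
Climate	GEOS-CF (Hourly average)	Specific humidity (Q)	https://earthengine.google.com, accessed on 1 August 2025	Input
Climate	GEOS-CF (Hourly average)	Relative humidity after moist (RH)	https://earthengine.google.com, accessed on 1 August 2025	Input
Climate	GEOS-CF (Hourly average)	Sea level pressure (SLP)	https://earthengine.google.com, accessed on 1 August 2025	Input
Climate	GEOS-CF (Hourly average)	2-m air temperature (T2M)	https://earthengine.google.com, accessed on 1 August 2025	Input
Climate	GEOS-CF (Hourly average)	Total precipitation (TPREC)	https://earthengine.google.com, accessed on 1 August 2025	Input
Climate	GEOS-CF (Hourly average)	Surface skin temperature (TS)	https://earthengine.google.com, accessed on 1 August 2025	Input
Climate	GEOS-CF (Hourly average)	10-m eastward wind (U10M)	https://earthengine.google.com, accessed on 1 August 2025	Input
Climate	GEOS-CF (Hourly average)	10-m northward wind (V10M)	https://earthengine.google.com, accessed on 1 August 2025	Input
Climate	GEOS-CF (Hourly average)	Mid-layer heights (ZL)	https://earthengine.google.com, accessed on 1 August 2025	Input
Climate	GEOS-CF (Hourly average)	Planetary boundary layer height (ZPBL)	https://earthengine.google.com, accessed on 1 August 2025	Input
Society	MODIS (Daily)	Normalized Difference Vegetation Index (NDVI)	https://earthengine.google.com, accessed on 1 August 2025	Input
Society	VIIRS (Daily)	Nighttime light (NTL)	https://earthengine.google.com, accessed on 1 August 2025	Input
Society	OpenStreetMap	Road Length (RL)	https://www.geofabrik.de, accessed on 1 August 2025	Input
Society	TUIK (Annual)	Population Density (PD)	https://biruni.tuik.gov.tr/medas/, accessed on 1 August 2025	Input
Topography	SRTM	Digital Elevation Model (DEM)	https://earthengine.google.com, accessed on 1 August 2025	Input
-	-	Day of Year (DOY)	-	Input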
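Table 1 mixes hourly sources with a daily DOY feature, so the hourly records have to be reduced to daily values before they can be joined. The following minimal sketch shows one way to do that; the readings are hypothetical, and averaging only the valid hours of each day is an assumption, since the aggregation rule is not stated in this part of the text.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hourly station readings: (timestamp, NO2 in µg/m³).
hourly = [
    ("2024-03-01 08:00", 42.0),
    ("2024-03-01 09:00", 38.0),
    ("2024-03-02 08:00", 30.0),
    ("2024-03-02 09:00", None),  # missing hour, skipped in the average
    ("2024-12-31 23:00", 25.0),
]


def daily_means(records):
    """Average the valid hourly values per calendar day and attach the day of year."""
    buckets = defaultdict(list)
    for stamp, value in records:
        if value is not None:
            day = datetime.strptime(stamp, "%Y-%m-%d %H:%M").date()
            buckets[day].append(value)
    return [
        {
            "date": day.isoformat(),
            "doy": day.timetuple().tm_yday,  # 1-based day of year
            "no2": sum(vals) / len(vals),
        }
        for day, vals in sorted(buckets.items())
    ]


rows = daily_means(hourly)
for row in rows:
    print(row)
```

The same grouping pattern would apply to the hourly GEOS-CF variables, so that every input and the target share one row per station and day.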

Table 2. Accuracy assessment results of the methods during training, validation, and test steps.

Sentinel-5P
Data	Train (2019–2022)			Validation (2023)			Test (2024)
Model/Metric	R	RMSE (µg/m³)	MAE (µg/m³)	R	RMSE (µg/m³)	MAE (µg/m³)	R	RMSE (µg/m³)	MAE (µg/m³)
RF	0.820	16.302	11.458	0.660	17.909	12.880	0.666	16.645	12.150
XGB	0.945	9.157	6.405	0.657	18.274	13.111	0.638	17.605	12.766
CB	0.827	15.772	11.121	0.669	17.743	12.658	0.686	16.232	11.746
GEOS-CF
Model/Metric	R	RMSE (µg/m³)	MAE (µg/m³)	R	RMSE (µg/m³)	MAE (µg/m³)	R	RMSE (µg/m³)	MAE (µg/m³)
RF	0.837	15.575	10.954	0.641	18.266	13.077	0.649	17.027	12.409
XGB	0.842	14.917	10.402	0.642	18.389	13.095	0.653	17.000	12.388
CB	0.819	16.164	11.479	0.643	18.193	12.954	0.665	16.582	12.084
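For reference, the three metrics reported in Table 2 can be computed as in the following minimal sketch. It assumes that R denotes the Pearson correlation coefficient between observed and estimated concentrations; the sample values are hypothetical and are not taken from the study.

```python
import math


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    return cov / (std_o * std_p)


def rmse(obs, pred):
    """Root mean square error, in the units of the inputs (here µg/m³)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error, in the units of the inputs (here µg/m³)."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


# Hypothetical observed and estimated daily NO2 values (µg/m³).
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 37.0]

print(f"R    = {pearson_r(obs, pred):.3f}")
print(f"RMSE = {rmse(obs, pred):.3f}")
print(f"MAE  = {mae(obs, pred):.3f}")
```

RMSE penalizes large errors more strongly than MAE, which is why the RMSE values in Table 2 are consistently higher than the corresponding MAE values.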

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
