Article

Machine Learning-Based Ground-Level NO2 Estimation in Istanbul: A Comparative Analysis of Sentinel-5P and GEOS-CF

Geomatics Engineering Department, Engineering Faculty, Gebze Technical University, Kocaeli 41400, Türkiye
Appl. Sci. 2025, 15(20), 10997; https://doi.org/10.3390/app152010997
Submission received: 22 September 2025 / Revised: 10 October 2025 / Accepted: 13 October 2025 / Published: 13 October 2025
(This article belongs to the Special Issue Air Quality Monitoring, Analysis and Modeling)

Abstract

Nitrogen dioxide (NO2) poses severe risks to human health and the environment, especially in densely populated megacities. Ground-based air quality monitoring stations provide high-temporal-resolution data but are spatially limited, while satellite observations offer broad coverage but measure column densities rather than surface concentrations. To overcome these limitations, this study integrates ground-based observations with satellite-derived NO2 from Sentinel-5P TROPOMI and GEOS-CF products to estimate ground-level NO2 in Istanbul using machine learning (ML) approaches. Three ML algorithms (Random Forest, RF; XGBoost, XGB; and CatBoost, CB) were tested on two datasets spanning 2019–2024 at ~1 km resolution, incorporating 20 features, including topographic, meteorological, environmental, and demographic variables. Among the models, CB achieved the best performance (R: 0.686, RMSE: 16.23 µg/m3, and MAE: 11.75 µg/m3 in the test dataset) with the Sentinel-5P dataset, successfully capturing spatial and seasonal variations in ground-level NO2 both quantitatively and qualitatively. SHAP analysis revealed that, alongside satellite-derived NO2, anthropogenic indicators such as population density and road length, together with elevation from the digital elevation model, were the most influential features, while meteorological factors contributed secondarily. Despite the lower spatial resolution of GEOS-CF data, both Sentinel-5P and GEOS-CF datasets supported reliable model outputs. This study provides the first ML-based ground-level NO2 estimation framework for the Istanbul Metropolitan City.

1. Introduction

As one of the main air pollutants, nitrogen dioxide (NO2) is produced by natural and anthropogenic sources, including fossil fuel burning from transportation, industrial activities, and power plants [1]. Exposure to NO2 is associated with a range of health issues, such as cardiovascular and respiratory diseases, lung cancer, and premature death [2,3,4]. Beyond its health impacts, NO2 also adversely affects the natural environment. It contributes to the formation of tropospheric ozone and aerosol nitrates, leading to acid rain and reduced visibility [5]. Moreover, high concentrations of NO2 can damage crops and vegetation by reducing yields and inhibiting plant growth [6].
The rapid growth of megacity populations, driven by economic and technological development, has further increased NO2 emissions from human activities. Several studies have demonstrated the strong connection between economic development and ground-level NO2 concentrations [7,8]. Over the past three decades, megacities have contributed significantly to the rise in anthropogenic NO2 emissions, making them critical areas for air pollution mitigation and control [9,10]. Therefore, continuous monitoring and accurate estimation of ground-level NO2 concentrations in megacities are of vital importance for tracking air pollution and developing effective action plans [11].
Air quality monitoring stations are sustainable systems established in regions where air pollution monitoring is essential. These stations provide high-temporal-resolution data, typically with hourly measurements. However, as point-based systems, their spatial coverage is limited. Furthermore, since the monitoring of anthropogenic emissions is often prioritized, the stations are concentrated in urban areas, resulting in a lack of spatial balance [12,13]. In contrast, remote sensing technologies enable spatial and temporal analyses through their wide coverage and repeated data acquisition [14]. Despite these advantages, remote sensing has limitations, such as the inability to collect data under cloudy conditions and the measurement of column density rather than surface or near-surface concentrations [13,15]. Nevertheless, numerous studies have demonstrated that satellite-observed tropospheric NO2 column density correlates with ground-level NO2, owing to the short atmospheric lifetime of NO2 and its formation from anthropogenic activities [1,16]. Given the respective advantages and limitations of satellite and ground-based monitoring systems, their combined use provides complementary insights and supports the development of more comprehensive and accurate models.
Several approaches have been employed to estimate ground-level NO2, including traditional statistical methods such as kriging, land use regression, and geographically and temporally weighted regression [16]. However, machine learning (ML) models have shown superior performance by effectively capturing complex nonlinear relationships, accounting for interactions among diverse variables, and achieving more accurate predictions [13,17,18]. ML-based studies have been applied across various scales, including local [19,20,21], regional [22,23], and national [10,17,24,25,26] levels. In addition, explainable artificial intelligence (XAI) techniques, such as Shapley Additive exPlanations (SHAP), have been increasingly used to identify and interpret the environmental, meteorological, and topographic drivers influencing ground-level NO2 estimates [22,27,28].
In this study, ground-level NO2 concentrations in Istanbul were estimated using ML algorithms by integrating ground-based air quality monitoring data with satellite-derived tropospheric NO2 columns from the Sentinel-5P TROPOMI and GEOS-CF datasets. Three machine learning algorithms were applied to a dataset spanning 2019–2024 at a spatial resolution of ~1 km, with a focus on seasonal variations. Both qualitative and quantitative evaluations were conducted. To improve prediction accuracy, a comprehensive set of topographic, meteorological, environmental, and demographic variables was incorporated. Model interpretability was ensured using SHAP, a widely adopted explainable AI method [29], to identify the features that most influence NO2 variability. The study addresses the following research questions:
  • How accurately can ground-level NO2 in metropolitan cities be estimated using satellite-derived tropospheric NO2 data?
  • How do ground-level NO2 estimates differ when using NO2 data with varying spatial resolutions?
  • To what extent do environmental and anthropogenic factors influence the prediction of ground-level NO2?
In this context, this study represents the first ML-based ground-level NO2 estimation framework for Istanbul, Türkiye’s most populous and touristic city, providing critical insights for air quality management and policymaking.

2. Study Area and Datasets

2.1. Study Area and Ambient Air Quality Monitoring Station Data

Istanbul is Türkiye’s most populous metropolitan city, with a population of approximately 16 million. Its role as a bridge connecting Asia and Europe has made it a major transportation, industrial, and tourism hub (Figure 1). Due to internal migration and increasing population, residential areas have increased significantly in the last 30 years to meet the city’s housing needs [30,31], and it is estimated that they will increase further [32]. Therefore, monitoring air quality in cities with intense human activity is gaining importance. To this end, the Istanbul Metropolitan Municipality and the Ministry of Environment, Urbanization, and Climate Change have established air quality monitoring stations in the city. The distribution of air quality monitoring stations is shown in Figure 1.
There are 40 air quality stations in Istanbul, but only 32 of them measure NO2. The stations are concentrated around the Bosphorus, where transportation, tourism, industrial activities, and residential areas are also concentrated. As a result, air quality can be monitored only where threats have already been identified, not in areas without stations. The stations record parameters hourly, and the data are available online (https://havakalitesi.ibb.gov.tr/, accessed on 1 August 2025).
Hourly station data from 2019 to 2024 were filtered to the satellite overpass time (between 1:00 and 2:00 p.m., UTC+3), and the values from both hours of each day were averaged. To minimize the effect of abnormal values on the model, values greater than 300 or less than one were removed, following Chi et al. (2022) [17].
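The station preprocessing described above can be sketched as follows. This is a minimal illustration rather than the author's code: the record layout and station names are hypothetical, the overpass window is read as the 13:00 and 14:00 records, and the 1 to 300 thresholds are assumed to be applied to the hourly values before averaging.

```python
from datetime import datetime
from statistics import mean

# Hypothetical hourly records: (station, timestamp in UTC+3, NO2 value).
records = [
    ("StationA", datetime(2024, 1, 5, 13), 42.0),
    ("StationA", datetime(2024, 1, 5, 14), 50.0),
    ("StationA", datetime(2024, 1, 5, 9), 80.0),    # outside the overpass window
    ("StationA", datetime(2024, 1, 6, 13), 350.0),  # above the upper threshold
    ("StationA", datetime(2024, 1, 6, 14), 30.0),
    ("StationB", datetime(2024, 1, 5, 13), 0.5),    # below the lower threshold
]

OVERPASS_HOURS = {13, 14}  # assumed reading of "between 1:00 and 2:00 p.m."
LOW, HIGH = 1.0, 300.0     # validity thresholds stated in the text


def daily_overpass_means(rows):
    """Average the valid overpass-hour readings per (station, date)."""
    grouped = {}
    for station, timestamp, value in rows:
        if timestamp.hour not in OVERPASS_HOURS:
            continue  # keep only records near the satellite overpass
        if value < LOW or value > HIGH:
            continue  # drop abnormal values
        grouped.setdefault((station, timestamp.date()), []).append(value)
    return {key: mean(values) for key, values in grouped.items()}


result = daily_overpass_means(records)
```

Days on which both hourly values are rejected simply produce no entry, so they drop out of the matched dataset instead of contributing a placeholder value.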

2.2. Datasets

In the study, various datasets were identified through a literature review and collected from different sources, taking into account the matching of time intervals, as listed in Table 1.

2.2.1. Sentinel-5P TROPOMI Tropospheric NO2 Columns

The Sentinel-5P satellite was launched on 13 October 2017, and was designed for monitoring the atmosphere and air pollution, ozone-layer monitoring, climate change and aviation safety at high spatial resolution under the Copernicus mission. The TROPOMI (TROPOspheric Monitoring Instrument) sensor collects solar radiation backscattered from the Earth and atmosphere. Its products are served in two processing levels with a spatial sampling of approximately 3.5 × 5.5 km since 6 August 2019. However, the Level-2 data is stored as Level-3 OFL with a spatial sampling of approximately 1 × 1 km in Google Earth Engine (GEE) [33]. The Sentinel-5P TROPOMI dataset was extracted using the GEE cloud computing platform, which stores archive and up-to-date datasets and enables data processing and geospatial analysis.

2.2.2. Satellite-Based Variables

Various topographical, environmental, and meteorological factors were included in the model to enrich the feature space, improve model accuracy, and assess their influence on NO2. The Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) [34] was used as the topographical factor. SRTM is a global DEM dataset with a 30 m spatial resolution. The Normalized Difference Vegetation Index (NDVI) [35] was used as the environmental factor to distinguish green from non-green areas. NDVI was calculated from MODIS Nadir Bidirectional Reflectance Distribution Function Adjusted Reflectance (NBAR) data, which have a spatial resolution of 500 m and have been available since 2000 [36]. Nighttime light (NTL) data, which represent the artificial lights of settlements and human activities, were used to identify urban and non-green areas [37]. Given the impact of anthropogenic activities on NO2 formation, NTL data characterize well the areas where human activities, and hence NO2 emissions, are concentrated [38]. For this purpose, NTL data were obtained from the Visible Infrared Imaging Radiometer Suite (VIIRS) at a spatial resolution of 500 m [39].
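For reference, NDVI follows the standard definition (NIR − Red) / (NIR + Red). The sketch below assumes surface reflectance inputs; the argument names are illustrative, and the text does not state which MODIS NBAR bands were used.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return None  # undefined when both reflectances are zero
    return (nir - red) / total


vegetated = ndvi(0.5, 0.1)   # dense vegetation gives values well above zero
built_up = ndvi(0.2, 0.18)   # built-up or bare surfaces give values near zero
```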
The climate variables were provided by the Goddard Earth Observing System Composition Forecast (GEOS-CF). GEOS-CF, developed by the National Aeronautics and Space Administration (NASA), provides global three-dimensional distributions of atmospheric composition at a spatial resolution of approximately 27 km. The GEOS-CF replay products provide one-hour time-averaged data that combine meteorological, atmospheric, and chemical collections [40]. Fourteen GEOS-CF bands were used in the study, as given in Table 1. To match the GEOS-CF dataset with Sentinel-5P, the GEOS-CF data from 2019 to 2024 were filtered to the Sentinel-5P overpass time, and the values from both hours (between 1:00 and 2:00 p.m., UTC+3) of each day were averaged to keep the datasets temporally consistent.

2.2.3. Auxiliary Variables

To account for the effects of social factors, population density (PD) and road length (RL) variables were included in the estimation model. Population data were obtained from the Turkish Statistical Institute (TUIK), and population density was computed by dividing the population of each district by its area. Road data were obtained from the road layer shared by OpenStreetMap. A 0.1° × 0.1° grid network was created for the study area, and the road length within each grid cell was calculated. In addition to meteorological, environmental, and social factors, the day of the year (DOY) was included in the model. These auxiliary data were used to capture the density of human activities both temporally and spatially [41].
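Two of these auxiliary features are simple to derive, as the sketch below shows. The function names and the area unit (km²) are assumptions for illustration; the text does not specify the unit used for district area.

```python
from datetime import date


def population_density(population, area_km2):
    """District population divided by district area (unit assumed: km^2)."""
    return population / area_km2


def day_of_year(day):
    """Day of the year (DOY), counting 1 January as 1."""
    return day.timetuple().tm_yday


density = population_density(1000, 10)   # hypothetical district
doy_leap = day_of_year(date(2024, 3, 1))
doy_common = day_of_year(date(2023, 3, 1))
```

DOY differs by one between leap and common years after February, which matters when the same calendar date is compared across the 2019–2024 period.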

3. Methodology

The study was conducted in three parts: data preprocessing and feature extraction, model development with ML algorithms and model evaluations, as shown in Figure 2.

3.1. Data Preprocessing and Feature Extraction

Satellite image data, including Sentinel-5P, GEOS-CF, MODIS NDVI, SRTM DEM, and VIIRS NTL, were extracted at the locations of the air quality monitoring stations using GEE and matched both temporally and spatially. The data were divided into three parts: training (2019–2022), validation (2023), and test (2024).
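A year-based split of this kind can be sketched as below. The sample layout (dictionaries with a `year` key) is an assumption for illustration. Splitting by year rather than at random keeps the test year unseen during training and tuning.

```python
def split_by_year(samples):
    """Assign each sample to training, validation, or test by its year."""
    splits = {"train": [], "validation": [], "test": []}
    for sample in samples:
        year = sample["year"]
        if 2019 <= year <= 2022:
            splits["train"].append(sample)
        elif year == 2023:
            splits["validation"].append(sample)
        elif year == 2024:
            splits["test"].append(sample)
        # samples outside 2019-2024 are ignored
    return splits


# Hypothetical matched samples, one per year from 2018 to 2024.
samples = [{"year": y, "no2": 30.0 + y % 5} for y in range(2018, 2025)]
splits = split_by_year(samples)
```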
Two data groups were generated to measure the effect of the NO2 data source. For this purpose, the tropospheric NO2 column density from Sentinel-5P was used as input in the first data group, while the tropospheric NO2 column density from GEOS-CF was used in the second, in each case along with the 19 other variables.
To apply the trained models to full images, a 0.1° × 0.1° grid network was created, the data were collected on the GEE platform, and the resulting thematic maps were produced with the best-performing model.

3.2. Model Development with Machine Learning

To estimate surface NO2 concentrations accurately, the most suitable model must be identified. Therefore, three machine learning algorithms, namely Random Forest (RF) Regression, Extreme Gradient Boosting (XGBoost) Regression (XGB), and CatBoost Regression (CB), were chosen based on their success in similar studies [13,19,26,42]. Optuna [43] was used for hyperparameter optimization of the algorithms.
RF, introduced by Breiman (2001) [44], is one of the most prominent machine learning algorithms in air quality assessment research owing to its robust predictive performance and efficiency [45,46]. RF draws random bootstrap samples from the original training dataset and constructs an ensemble of decision trees, one per sample. Each tree is trained on roughly two-thirds of the distinct samples, while the remaining out-of-bag samples are used to evaluate the model’s accuracy [47]. In regression, as used here, the final estimate is the average of the individual tree predictions; majority voting applies to classification.
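The two ingredients of a regression forest, bootstrap sampling and averaging, can be illustrated with the short sketch below. It is not the configuration used in the study: the sample size, seed, and tree outputs are hypothetical, and no trees are actually fitted.

```python
import random


def bootstrap_sample(n_samples, rng):
    """Draw n_samples indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def forest_predict(tree_predictions):
    """A regression forest averages the per-tree predictions."""
    return sum(tree_predictions) / len(tree_predictions)


rng = random.Random(42)  # fixed seed so the run is repeatable
n_samples = 1000
in_bag = set(bootstrap_sample(n_samples, rng))

# Share of samples never drawn (out-of-bag); in expectation close to 1/e.
oob_fraction = 1 - len(in_bag) / n_samples

averaged = forest_predict([38.0, 42.0, 40.0])  # hypothetical tree outputs
```

Because roughly a third of the samples are left out of each bootstrap draw, every tree has its own held-out set for error estimation without a separate validation split.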
XGB was presented by Chen and Guestrin (2016) [48] as an advanced tree-based machine learning algorithm based on gradient boosting. The main principle of XGB is the sequential refinement of weak learners within an ensemble [49]. Training begins with a base prediction for all samples. In each subsequent iteration, a new tree is fitted to the gradients of the loss function (the residual errors, in the case of squared loss), so that poorly estimated samples receive the largest corrections [50,51]. Unlike classical gradient boosting, XGB incorporates regularization techniques and optimized loss functions to reduce overfitting and enhance generalization [52].
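The sequential refinement idea behind boosting can be sketched with plain gradient boosting on squared loss, where each new weak learner is fitted to the residuals of the current ensemble. This is a deliberately simplified stand-in, not XGBoost itself: it uses one-feature decision stumps, omits regularization and second-order terms, and the data, learning rate, and number of rounds are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump by least squares.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Gradient boosting for squared loss: each stump fits the residuals."""
    base = sum(ys) / len(ys)  # initial prediction: the mean
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


xs = [1, 2, 3, 4, 5, 6]           # hypothetical single feature
ys = [10, 12, 15, 30, 33, 35]     # hypothetical target values
model = boost(xs, ys)
mse_base = mse(ys, [sum(ys) / len(ys)] * len(ys))
mse_boost = mse(ys, [model(x) for x in xs])
```

The training error keeps falling as rounds are added, which is also why boosted models can overfit when the number of rounds and the tree depth are not constrained.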
CB, one of the latest members of tree-based algorithms, was developed by Yandex to effectively handle the challenges associated with categorical features [53]. Automatic processing of categorical data and missing values through ordered boosting eliminates manual preprocessing requirements [54]. Additionally, CB constructs a symmetric tree structure to mitigate overfitting and enables GPU acceleration for large-scale datasets, resulting in fast and superior predictive performance [55,56].
Moreover, in this study, SHAP, an XAI framework based on game theory, was employed to interpret model behavior by quantifying the contribution of each feature to predictions in terms of both magnitude and direction [29]. SHAP values represent the marginal effect of a feature compared to the dataset’s average prediction, while aggregated absolute SHAP values provide global feature importance [57]. This approach enables both local and global interpretation, facilitating the assessment of feature relevance, the validation of model reliability against domain knowledge, and highlighting the role of satellite observations in modeling ground-level NO2 concentrations.
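The quantity that SHAP estimates can be made concrete with an exact Shapley computation on a toy model. The sketch below enumerates all feature coalitions, which is feasible only for a handful of features; it is not the SHAP library and not the procedure used in the study. The model, the input, and the use of a single background point to represent "absent" features are all assumptions for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating all feature coalitions.

    Features outside a coalition take the background value.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else background[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical linear model with three features.
    return 2 * z[0] + 3 * z[1] - z[2]


x = [1.0, 2.0, 3.0]
background = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, background)
```

The values sum to the difference between the prediction for `x` and the prediction for the background point, and their signs give the direction of each feature's effect. Averaging the absolute values over many samples gives the global importance ranking.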

3.3. Model Evaluation and Accuracy Assessment

To assess model accuracy, three accuracy metrics were used within the study: Pearson correlation coefficient (R), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). The equations of metrics are, respectively, given as follows:
$$R = \frac{\sum_{i=1}^{n} (x_i - x_m)(\hat{x}_i - \hat{x}_m)}{\sqrt{\sum_{i=1}^{n} (x_i - x_m)^2}\,\sqrt{\sum_{i=1}^{n} (\hat{x}_i - \hat{x}_m)^2}}$$
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \hat{x}_i)^2}{n}}$$
$$MAE = \frac{\sum_{i=1}^{n} \left| x_i - \hat{x}_i \right|}{n}$$
where $x_i$ is the measured value, $\hat{x}_i$ is the model-estimated NO2 value, $n$ is the total number of samples in the validation dataset, and $x_m$ and $\hat{x}_m$ represent the means of the measured and estimated values, respectively.
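The three metrics can be computed directly from their definitions, as in the sketch below. The function names and the sample values are illustrative and are not taken from the study.

```python
from math import sqrt


def pearson_r(measured, estimated):
    """Pearson correlation coefficient between measured and estimated values."""
    n = len(measured)
    x_m = sum(measured) / n
    e_m = sum(estimated) / n
    cov = sum((x - x_m) * (e - e_m) for x, e in zip(measured, estimated))
    var_x = sum((x - x_m) ** 2 for x in measured)
    var_e = sum((e - e_m) ** 2 for e in estimated)
    return cov / (sqrt(var_x) * sqrt(var_e))


def rmse(measured, estimated):
    """Root Mean Square Error."""
    n = len(measured)
    return sqrt(sum((x - e) ** 2 for x, e in zip(measured, estimated)) / n)


def mae(measured, estimated):
    """Mean Absolute Error."""
    n = len(measured)
    return sum(abs(x - e) for x, e in zip(measured, estimated)) / n


measured = [10.0, 20.0, 30.0, 40.0]    # hypothetical station values
estimated = [12.0, 18.0, 33.0, 39.0]   # hypothetical model estimates
r_value = pearson_r(measured, estimated)
rmse_value = rmse(measured, estimated)
mae_value = mae(measured, estimated)
```

RMSE weights large errors more heavily than MAE, so a gap between the two suggests that a few samples carry unusually large errors.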

4. Results

4.1. Correlation Analysis Between Variables

First, Pearson correlation coefficients between the features were calculated (Figure 3). The correlations of the S5P and GEOS-CF NO2 data with the ground station data (Sta_NO2) were similar, at 0.38 and 0.31, respectively, whereas the correlation between the S5P and GEOS-CF NO2 data was 0.68. NDVI was negatively correlated with road length (RL) and population density (PD), with coefficients of −0.66 and −0.57, respectively. The correlation coefficient between PD and RL was 0.59. Among the GEOS-CF bands, the highest correlations were between 2-m air temperature (T2M) and surface skin temperature (TS), 0.97; sea level pressure (SLP) and surface pressure (PS), 0.90; mid-layer heights (ZL) and surface geopotential height (PHIS), 0.86; eastward wind (U10M) and northward wind (V10M), 0.86; T2M and specific humidity (Q), 0.81; and TS and Q, 0.75.

4.2. Accuracy Assessment and Seasonal Thematic Maps

Three ML algorithms were tested with accuracy metrics including R, RMSE, and MAE for each data part, and the results are given in Table 2. The best-performing model was highlighted for both Sentinel-5P and Geos-CF datasets.
Considering the results in Table 2, all error values were lower than the standard deviations of the data themselves (34.16 µg/m3 for training, 31.40 µg/m3 for validation, and 29.86 µg/m3 for testing). Because an estimator that always returns the mean has an RMSE equal to the standard deviation, this indicates that all algorithms performed better than such a baseline. Although the error values of the models built with GEOS-CF data were lower in the training phase than those of the models built with Sentinel-5P data, the models built with S5P had relatively lower RMSE and MAE and higher R in the validation and testing phases.
Although XGB gave the highest R (0.945 and 0.842) and the lowest RMSE (9.157 and 14.917 µg/m3) and MAE (6.405 and 10.402 µg/m3) values in the training phase, it obtained the lowest R and the highest error values in the validation and testing phases for both S5P and GEOS-CF data, which indicates overfitting. CB gave the best results in the validation and testing phases for both datasets owing to its generalization capability. Ranked by performance, CB is followed by RF and then XGB.
The station-based diagrams, including accuracy metrics (RMSE, MAE, and R) for all algorithms, are presented in Figure 4. In general, the RF model (Figure 4a,b) and CB model (Figure 4e,f) exhibited similar performance for the Sentinel-5P dataset, characterized by high correlation coefficients (R > 0.7) and relatively low RMSE (mostly below 15 µg/m3). The color distribution further indicates lower MAE values, suggesting that RF and CB achieved more stable and accurate estimations across different stations. The XGB model (Figure 4c,d) showed a wider spread of RMSE values, with several points extending toward higher error levels, reflecting greater variability in prediction accuracy. The CB model (Figure 4e,f) demonstrated consistent and balanced results, with moderate RMSE and relatively high correlation values similar to those of RF. Across all models, the Sentinel-5P-based results (Figure 4b,d,f) generally outperform those derived from Geos-CF (Figure 4a,c,e), indicating a better agreement between Sentinel-5P observations and the model outputs. Overall, the station-based analysis confirms that RF and CB provided the most robust and reliable predictions, while Sentinel-5P data provided stronger consistency with the modeled variables. Additionally, the analysis revealed that the models consistently exhibit high errors at the same stations. The three stations with the highest errors exceeding 20 µg/m3 (Avcılar, Ümraniye2, and Esenler) are located in spatially distinct areas within the most densely populated districts. This may be attributed to abrupt fluctuations in measured values, which could have increased the model errors at these stations.
In addition to the quantitative evaluation, a qualitative assessment is also important in assessing the model’s performance. For this purpose, seasonal thematic maps of NO2 distribution were created with seasonal average data for 2024. Maps created with Sentinel-5P are given in Figure 5, while maps created with Geos-CF are given in Figure 6.
According to Figure 5, seasonal maps obtained from the XGB model are quite noisy, unlike those from other models. It was observed that the amount of NO2 was high in the northern parts of the city where forested areas were dense. Seasonal variation in NO2 distribution shows significant changes in both the RF and CB models. During winter and spring, NO2 levels are higher in the southern parts of the Bosphorus, where urban areas are dense, while they are lower in other areas. While NO2 levels decrease in the summer months, they increase again in the autumn. Differences between the two model results are particularly evident in summer and autumn. The sharp linear changes in the results are due to the pixel sizes of the Geos-CF data.
When the thematic maps obtained with the models created with Geos-CF are examined, the XGB model has a high noise level, similar to the results obtained with the Sentinel-5P models. RF and CB models also show consistent seasonal distribution. Model results also diverge between summer and autumn seasons.
In order to evaluate the accuracy of the produced thematic maps, an accuracy assessment was conducted using seasonal average values obtained from the stations. The calculated R, RMSE, and MAE values for four seasons are given in Figure 7.
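Seasonal averaging of daily values can be sketched as below. The month-to-season mapping (meteorological seasons, with winter as December to February) is an assumption, since the text does not define the season boundaries, and the sample values are hypothetical.

```python
from datetime import date
from statistics import mean

# Assumed meteorological seasons for the Northern Hemisphere.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_means(daily_values):
    """Average (date, value) pairs per season."""
    grouped = {}
    for day, value in daily_values:
        grouped.setdefault(SEASON_BY_MONTH[day.month], []).append(value)
    return {season: mean(values) for season, values in grouped.items()}


daily = [
    (date(2024, 1, 10), 40.0),
    (date(2024, 2, 10), 50.0),
    (date(2024, 7, 1), 20.0),
]
season_avg = seasonal_means(daily)
```

The same grouping can be applied to station observations and to model estimates before the metrics are computed per season.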
Considering Figure 7, among the models built with Sentinel-5P, XGB has the lowest R and highest RMSE and MAE errors in all seasons. After both quantitative and qualitative evaluations, the XGB algorithm was found to be unsuccessful on this dataset. In the RF and CB models, RF was relatively successful only in the summer season, while CB performed successfully in other seasons. When the error amounts were examined, it was determined that the models’ error amounts were higher, and the correlations were lower in the spring and autumn seasons compared to the summer and winter seasons.
Among models built with Geos-CF data, XGB appears to be the most successful model with the lowest error values in the quantitative evaluation, even though it fails in the qualitative assessment. This demonstrates that quantitative assessments alone are not sufficient to evaluate ML model results. Among the RF and CB models, CB performed well in all seasons except summer. Spring and autumn were also the seasons with the highest error rates.
When the Sentinel-5P and GEOS-CF model results are compared, the RF and CB maps shown in Figure 5 and Figure 6 are mostly similar. In the quantitative evaluation, the CB model built with Sentinel-5P data gave better results than the models built with GEOS-CF data in the training, validation, and testing stages (Table 2) as well as in the seasonal evaluation (Figure 7).

5. Discussion

In this study, ground-level NO2 in Istanbul province was estimated using three ML algorithms (RF, XGB, and CB) and two datasets that combine satellite-derived NO2 (Sentinel-5P and GEOS-CF) with meteorological, environmental, and social factors. The models built with data collected between 2019 and 2023 were tested with data from 2024, and their performance was compared both quantitatively and qualitatively. While many studies indicate that the XGB algorithm performs well in estimating ground-level NO2 [13,17,27,58], in this study XGB exhibited the lowest performance among the three algorithms. Shetty et al. (2024) [22], who applied the XGB algorithm over Europe, also reported that the model performed poorly in Türkiye. CB produced the best results among the three algorithms; Figure 8 presents the corresponding station-based relative error distribution, calculated from the seasonal station observations and the estimated values.
The highest errors were recorded in spring in the southern parts of the Bosphorus, whereas in summer, errors were more pronounced in the northern areas compared to other seasons. Error levels were generally higher at stations on the Asian side (east of the Bosphorus) than on the European side (west of the Bosphorus), with the lowest errors observed in winter at stations in the western part of the city. This is primarily due to the lack of sufficient stations in the northern part of the Asian side. It is noteworthy that the distribution of relative errors is quite similar in both datasets.
To investigate the impacts of features on the estimation of ground-level NO2, SHAP analysis was performed for each data model. The results are presented in Figure 9, and the 20 features are listed in order of importance. While Sentinel-5P NO2 data became prominent as the most important factor, Geos-CF data contributed to the model as the fourth most important factor. This reveals that location features (RL, PD, and DEM) are more important than Geos-CF data. It also demonstrated the necessity of including satellite-derived NO2 data in the ground-level NO2 estimation model [58].
In both models, RL and PD are among the top three most important factors. High RL and PD values increase the model output. Dense populations and road networks are strongly associated with higher surface NO2 levels. The distribution of road networks and PD is given in Figure 10a,b. It is observed that the road network is dense in the southern parts of the Bosphorus where the population is concentrated. The CB model results in Figure 5 and Figure 6 show that ground-level NO2 values are high in areas where the road network and PD are dense. The main reason for this situation is vehicle emissions and human activities that cause NO2 formation [59]. Shao et al. (2023) [27] revealed that urbanization and population growth exhibit a power law relationship with NO2 concentration. In addition, the seasonal variation in ground-level NO2 concentration in this region can also be associated with human activities. The fact that NO2 levels are high in winter and spring, decrease in summer, and increase again in autumn is due to heating activities in densely populated areas [60].
DEM was identified as the third most important factor after RL and PD, and its distribution in Istanbul is given in Figure 10c. According to the SHAP results, decreasing elevation causes an increase in ground-level NO2. The reason for this can be the high settlement and human population on the shores of the Bosphorus, where the elevation is low. When Figure 10 is examined, in areas with high elevation, both the road network and population density decrease, and therefore the NO2 level also decreases.
The next five most important features are ZL, NTL, V_10M, NDVI, and ZPBL. High ZL, NTL, and V_10M values increase the estimated ground-level NO2, as do low NDVI and ZPBL values. High NTL and low NDVI can also be associated with urbanization [22]: in densely urbanized areas, NTL is higher and NDVI is lower. Wind is an effective parameter in NO2 transport [28]; in this study, the northward wind (V_10M) was found to be more influential than the eastward wind (U_10M). Low ZPBL values may be associated with increased ground-level NO2 values, as air pollutants tend to concentrate at low altitudes near the Earth’s surface [61]. Low PBLH also leads to increased ground-level NO2 concentrations, especially in coastal areas [62]. Meteorological parameters ranked among the ten least influential features in this study, lagging behind human activities and topography in estimating ground-level NO2.

6. Strengths and Limitations

The strengths and limiting factors of this study were identified. The main strength is the successful demonstration of the ground-level NO2 distribution over Istanbul using data with varying spatial resolutions. The CB models built with the two datasets (Sentinel-5P and GEOS-CF) produced sound results in both the quantitative and qualitative evaluations, but the model built with Sentinel-5P data performed better, thanks to its higher spatial resolution.
The limitation of the study is the lack of both in situ and auxiliary data. When the distribution of ground air quality monitoring stations in Figure 1 is examined, it is observed that the stations are not distributed homogeneously throughout the city. Because NO2 emissions are generated by vehicle emissions, industrial activities, and human activities, terrestrial air quality monitoring stations are located in densely populated areas. However, this poses a limitation in regional ground-level NO2 estimation analyses.
Another limitation is that meteorological data with a spatial resolution of approximately 11 km, such as ERA5-Land, cannot be used within the scope of this study. This is because water areas are masked in this dataset. Because Istanbul is geographically surrounded by the Black Sea and the Sea of Marmara, the absence of pixels corresponding to terrestrial air quality measurement stations on land results in data loss due to pixel size. Therefore, Geos-CF data were used instead of ERA5-Land data. Studies frequently utilize ERA5 products and perform analyses at a broader spatial scale [23,26]. Future studies will test the model with ERA5 data at the same spatial resolution as Geos-CF, and analyses will be expanded to include other metropolitan cities. Furthermore, adding traffic data in cities with high human activity will also increase model accuracy. Additionally, the models were analyzed on a seasonal basis in this study. Future research could extend this approach to monthly assessments; however, producing results at a daily temporal resolution is not feasible due to data gaps.

7. Conclusions

In this study, ground-level NO2 estimation was performed using three different ML algorithms and two different datasets. In addition to determining the most accurate model, each model incorporated 20 features, and their relative contributions were assessed through SHAP analysis. The CB model was identified as the most successful. Although the atmospheric GEOS-CF data with lower spatial resolution influenced the visual outcomes in models based on both Sentinel-5P and GEOS-CF inputs, the models were still able to accurately capture the spatial and temporal distribution of ground-level NO2 both quantitatively and qualitatively.
The results highlight the critical role of anthropogenic indicators (e.g., PD, road networks, NTL), topographic factors (DEM), and air pollution variables (NO2 from Sentinel-5P and GEOS-CF) in driving model performance, while meteorological factors contributed secondarily. In future studies, we will expand the data pool by incorporating traffic data and various atmospheric datasets and will perform more comprehensive analyses using deep learning models to enhance model accuracy.
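As a reminder of what a SHAP-style contribution measures, the sketch below computes exact Shapley values for a small toy model by averaging each feature's marginal contribution over all feature orderings. The toy linear model, its coefficients, the feature names, and the single all-zero baseline are assumptions made for illustration; this is not the CB model of this study, and the SHAP library itself uses tree-specific algorithms and a background dataset rather than this brute-force enumeration.

```python
from itertools import permutations


def toy_model(features):
    """Hypothetical linear surrogate: NOT the model trained in this study."""
    return (
        3.0 * features["satellite_no2"]
        + 2.0 * features["population_density"]
        - 1.0 * features["wind_speed"]
    )


def shapley_values(model, instance, baseline):
    """Exact Shapley values relative to a single baseline point.

    Each feature is switched from its baseline value to its instance value
    in every possible order, and the resulting output changes are averaged.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)
        previous_output = model(current)
        for name in ordering:
            current[name] = instance[name]
            output = model(current)
            totals[name] += output - previous_output
            previous_output = output
    return {name: total / len(orderings) for name, total in totals.items()}


baseline = {"satellite_no2": 0.0, "population_density": 0.0, "wind_speed": 0.0}
instance = {"satellite_no2": 1.0, "population_density": 0.5, "wind_speed": 2.0}

contributions = shapley_values(toy_model, instance, baseline)
ranking = sorted(contributions, key=lambda name: abs(contributions[name]), reverse=True)

print(contributions)
print("Features ranked by absolute contribution:", ranking)
```

The contributions sum to the difference between the model output for the instance and for the baseline, which is the property that makes such values usable for ranking features by influence.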

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author (the data are not publicly available due to privacy or ethical restrictions).

Acknowledgments

The author expresses her gratitude to the Istanbul Metropolitan Municipality for providing the air quality ground monitoring station data used in this study.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
NO2: Nitrogen dioxide
ML: Machine Learning
RF: Random Forest Regression
XGB: XGBoost Regression
CB: CatBoost Regression
XAI: Explainable Artificial Intelligence
SHAP: Shapley Additive exPlanations

References

  1. Liu, F.; Beirle, S.; Zhang, Q.; Dörner, S.; He, K.; Wagner, T. NOx Lifetimes and Emissions of Cities and Power Plants in Polluted Background Estimated by Satellite Observations. Atmos. Chem. Phys. 2016, 16, 5283–5298.
  2. Khan, R.R.; Siddiqui, M.J. Review on Effects of Particulates: Sulfur Dioxide and Nitrogen Dioxide on Human Health. Int. Res. J. Environ. Sci. 2014, 3, 70–73.
  3. Eum, K.-D.; Kazemiparkouhi, F.; Wang, B.; Manjourides, J.; Pun, V.; Pavlu, V.; Suh, H. Long-Term NO2 Exposures and Cause-Specific Mortality in American Older Adults. Environ. Int. 2019, 124, 10–15.
  4. Manisalidis, I.; Stavropoulou, E.; Stavropoulos, A.; Bezirtzoglou, E. Environmental and Health Impacts of Air Pollution: A Review. Front. Public Health 2020, 8, 505570.
  5. Seinfeld, J.H.; Pandis, S.N. Atmospheric Chemistry and Physics: From Air Pollution to Climate Change; John Wiley & Sons: Hoboken, NJ, USA, 2016.
  6. Chen, T.-M.; Kuschner, W.G.; Gokhale, J.; Shofer, S. Outdoor Air Pollution: Nitrogen Dioxide, Sulfur Dioxide, and Carbon Monoxide Health Effects. Am. J. Med. Sci. 2007, 333, 249–256.
  7. Cao, H.; Han, L. The Short-Term Impact of the COVID-19 Epidemic on Socioeconomic Activities in China Based on the OMI-NO2 Data. Environ. Sci. Pollut. Res. 2022, 29, 21682–21691.
  8. Schneider, P.; Lahoz, W.A.; van der A, R. Recent Satellite-Based Trends of Tropospheric Nitrogen Dioxide over Large Urban Agglomerations Worldwide. Atmos. Chem. Phys. 2015, 15, 1205–1220.
  9. Güçlü, Y.S.; Dabanlı, İ.; Şişman, E.; Şen, Z. Air Quality (AQ) Identification by Innovative Trend Diagram and AQ Index Combinations in Istanbul Megacity. Atmos. Pollut. Res. 2019, 10, 88–96.
  10. Wang, W.; Li, B.; Chen, B. Improved Surface NO2 Retrieval: Double-Layer Machine Learning Model Construction and Spatio-Temporal Characterization Analysis in China (2018–2023). J. Environ. Manag. 2025, 384, 125439.
  11. Zhang, Y.; Li, Z.; Wei, J.; Zhan, Y.; Liu, L.; Yang, Z.; Zhang, Y.; Liu, R.; Ma, Z. Long-Term Exposure to Ambient NO2 and Adult Mortality: A Nationwide Cohort Study in China. J. Adv. Res. 2022, 41, 13–22.
  12. Zhang, D.; Shi, R.; Zhou, Y.; Zheng, L.; Chen, M. The Spatial Distribution Characteristics and Ground-Level Estimation of NO2 and SO2 over Huaihe River Basin and Shanghai Based on Satellite Observations. In Proceedings of the Remote Sensing and Modeling of Ecosystems for Sustainability XV, San Diego, CA, USA, 19–23 August 2018; Gao, W., Chang, N.-B., Wang, J., Eds.; SPIE: Bellingham, WA, USA, 2018; p. 22.
  13. Kang, Y.; Choi, H.; Im, J.; Park, S.; Shin, M.; Song, C.-K.; Kim, S. Estimation of Surface-Level NO2 and O3 Concentrations Using TROPOMI Data and Machine Learning over East Asia. Environ. Pollut. 2021, 288, 117711.
  14. Fernandes, A.P.; Riffler, M.; Ferreira, J.; Wunderle, S.; Borrego, C.; Tchepel, O. Spatial Analysis of Aerosol Optical Depth Obtained by Air Quality Modelling and SEVIRI Satellite Observations over Portugal. Atmos. Pollut. Res. 2019, 10, 234–243.
  15. Duncan, B.N.; Prados, A.I.; Lamsal, L.N.; Liu, Y.; Streets, D.G.; Gupta, P.; Hilsenrath, E.; Kahn, R.A.; Nielsen, J.E.; Beyersdorf, A.J.; et al. Satellite Data of Atmospheric Pollution for U.S. Air Quality Applications: Examples of Applications, Summary of Data End-User Resources, Answers to FAQs, and Common Mistakes to Avoid. Atmos. Environ. 2014, 94, 647–662.
  16. Qin, K.; Rao, L.; Xu, J.; Bai, Y.; Zou, J.; Hao, N.; Li, S.; Yu, C. Estimating Ground Level NO2 Concentrations over Central-Eastern China Using a Satellite-Based Geographically and Temporally Weighted Regression Model. Remote Sens. 2017, 9, 950.
  17. Chi, Y.; Fan, M.; Zhao, C.; Yang, Y.; Fan, H.; Yang, X.; Yang, J.; Tao, J. Machine Learning-Based Estimation of Ground-Level NO2 Concentrations over China. Sci. Total Environ. 2022, 807, 150721.
  18. Bahadur, F.T.; Shah, S.R.; Nidamanuri, R.R. Applications of Remote Sensing Vis-à-Vis Machine Learning in Air Quality Monitoring and Modelling: A Review. Environ. Monit. Assess. 2023, 195, 1502.
  19. Fu, J.; Tang, D.; Grieneisen, M.L.; Yang, F.; Yang, J.; Wu, G.; Wang, C.; Zhan, Y. A Machine Learning-Based Approach for Fusing Measurements from Standard Sites, Low-Cost Sensors, and Satellite Retrievals: Application to NO2 Pollution Hotspot Identification. Atmos. Environ. 2023, 302, 119756.
  20. Cedeno Jimenez, J.R.; Pugliese Viloria, A.d.J.; Brovelli, M.A. Estimating Daily NO2 Ground Level Concentrations Using Sentinel-5P and Ground Sensor Meteorological Measurements. ISPRS Int. J. Geoinf. 2023, 12, 107.
  21. Yagmur Aydin, N.; Aydin, I. Estimation of Ground-Level NO2 Concentrations over Megacities Using Sentinel-5P and Machine Learning Models: A Case Study of Istanbul. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2025, XLVIII-M-6–2025, 303–308.
  22. Shetty, S.; Schneider, P.; Stebel, K.; David Hamer, P.; Kylling, A.; Koren Berntsen, T. Estimating Surface NO2 Concentrations over Europe Using Sentinel-5P TROPOMI Observations and Machine Learning. Remote Sens. Environ. 2024, 312, 114321.
  23. Griffin, D.; Hempel, C.; McLinden, C.; Kharol, S.K.; Lee, C.; Fogal, A.; Sioris, C.; Shephard, M.; You, Y. Development and Validation of Satellite-Derived Surface NO2 Estimates Using Machine Learning versus Traditional Approaches in North America. EGUsphere 2025, 2025, 1–20.
  24. Araki, S.; Shima, M.; Yamamoto, K. Spatiotemporal Land Use Random Forest Model for Estimating Metropolitan NO2 Exposure in Japan. Sci. Total Environ. 2018, 634, 1269–1277.
  25. Chan, K.L.; Khorsandi, E.; Liu, S.; Baier, F.; Valks, P. Estimation of Surface NO2 Concentrations over Germany from TROPOMI Satellite Observations Using a Machine Learning Method. Remote Sens. 2021, 13, 969.
  26. Long, S.; Wei, X.; Zhang, F.; Zhang, R.; Xu, J.; Wu, K.; Li, Q.; Li, W. Estimating Daily Ground-Level NO2 Concentrations over China Based on TROPOMI Observations and Machine Learning Approach. Atmos. Environ. 2022, 289, 119310.
  27. Shao, Y.; Zhao, W.; Liu, R.; Yang, J.; Liu, M.; Fang, W.; Hu, L.; Adams, M.; Bi, J.; Ma, Z. Estimation of Daily NO2 with Explainable Machine Learning Model in China, 2007–2020. Atmos. Environ. 2023, 314, 120111.
  28. Sun, W.; Tack, F.; Clarisse, L.; Schneider, R.; Stavrakou, T.; Van Roozendael, M. Inferring Surface NO2 over Western Europe: A Machine Learning Approach with Uncertainty Quantification. J. Geophys. Res. Atmos. 2024, 129, e2023JD040676.
  29. Lundberg, S.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. arXiv 2017, arXiv:1705.07874.
  30. Khorrami, B.; Heidarlou, H.B.; Feizizadeh, B. Evaluation of the Environmental Impacts of Urbanization from the Viewpoint of Increased Skin Temperatures: A Case Study from Istanbul, Turkey. Appl. Geomat. 2021, 13, 311–324.
  31. Bozkurt, S.G.; Kuşak, L. Detection of Population Density, LULC Variation and Cross-Regional Similarities Using K-Means Clustering Algorithm in Istanbul Example. Mimar. Bilim. Uygulamaları Derg. 2024, 9, 69–86.
  32. Akın, A.; Sunar, F.; Berberoğlu, S. Urban Change Analysis and Future Growth of Istanbul. Environ. Monit. Assess. 2015, 187, 506.
  33. Sentinel-5P OFFL NO2: Offline Nitrogen Dioxide. Available online: https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S5P_OFFL_L3_NO2 (accessed on 1 August 2025).
  34. SRTM. Available online: https://www.earthdata.nasa.gov/data/instruments/srtm (accessed on 1 August 2025).
  35. Rouse, J.W., Jr.; Haas, R.H.; Schell, J.A.; Deering, D.W. Monitoring the Vernal Advancement and Retrogradation (Green Wave Effect) of Natural Vegetation; NASA: Washington, DC, USA, 1973.
  36. MODIS. Available online: https://developers.google.com/earth-engine/datasets/catalog/MODIS_061_MCD43A4#description (accessed on 1 August 2025).
  37. Elvidge, C.D.; Baugh, K.; Zhizhin, M.; Hsu, F.C.; Ghosh, T. VIIRS Night-Time Lights. Int. J. Remote Sens. 2017, 38, 5860–5879.
  38. Levin, N.; Kyba, C.C.M.; Zhang, Q.; Sánchez de Miguel, A.; Román, M.O.; Li, X.; Portnov, B.A.; Molthan, A.L.; Jechow, A.; Miller, S.D.; et al. Remote Sensing of Night Lights: A Review and an Outlook for the Future. Remote Sens. Environ. 2020, 237, 111443.
  39. VIIRS Lunar Gap-Filled BRDF Nighttime Lights. Available online: https://developers.google.com/earth-engine/datasets/catalog/NASA_VIIRS_002_VNP46A2#description (accessed on 1 August 2025).
  40. Geos-CF. Available online: https://developers.google.com/earth-engine/datasets/catalog/NASA_GEOS-CF_v1_rpl_tavg1hr#description (accessed on 1 August 2025).
  41. Liu, N.; Lin, W.; Ma, J.; Xu, W.; Xu, X. Seasonal Variation in Surface Ozone and Its Regional Characteristics at Global Atmosphere Watch Stations in China. J. Environ. Sci. 2019, 77, 291–302.
  42. Qin, K.; Han, X.; Li, D.; Xu, J.; Loyola, D.; Xue, Y.; Zhou, X.; Li, D.; Zhang, K.; Yuan, L. Satellite-Based Estimation of Surface NO2 Concentrations over East-Central China: A Comparison of POMINO and OMNO2d Data. Atmos. Environ. 2020, 224, 117322.
  43. OPTUNA. Available online: https://optuna.org/ (accessed on 1 August 2025).
  44. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  45. Chen, J.; Zhu, S.; Wang, P.; Zheng, Z.; Shi, S.; Li, X.; Xu, C.; Yu, K.; Chen, R.; Kan, H.; et al. Predicting Particulate Matter, Nitrogen Dioxide, and Ozone across Great Britain with High Spatiotemporal Resolution Based on Random Forest Models. Sci. Total Environ. 2024, 926, 171831.
  46. Vaishnavi, K.; Sreya, G.; Reddy, K.K.; P R, A. Machine Learning for Air Quality Prediction: Random Forest Classifier. In Proceedings of the 2024 Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), Bhilai, India, 11–12 January 2024; pp. 1–5.
  47. Sharda, S.; Kumar, S.; Setia, R.; Dhiman, P.; Patel, N.R.; Pateriya, B.; Salem, A.; Elbeltagi, A. Evaluation of Different Spectral Indices for Wheat Lodging Assessment Using Machine Learning Algorithms. Sci. Rep. 2025, 15, 21774.
  48. Chen, T.; Guestrin, C. XGBoost. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794.
  49. Ozturk, M.Y.; Colkesen, I. A Novel Hybrid Methodology Integrating Pixel- and Object-Based Techniques for Mapping Land Use and Land Cover from High-Resolution Satellite Data. Int. J. Remote Sens. 2024, 45, 5640–5678.
  50. Georganos, S.; Grippa, T.; Vanhuysse, S.; Lennert, M.; Shimoni, M.; Wolff, E. Very High Resolution Object-Based Land Use–Land Cover Urban Classification Using Extreme Gradient Boosting. IEEE Geosci. Remote Sens. Lett. 2018, 15, 607–611.
  51. Rumora, L.; Miler, M.; Medak, D. Impact of Various Atmospheric Corrections on Sentinel-2 Land Cover Classification Accuracy Using Machine Learning Classifiers. ISPRS Int. J. Geoinf. 2020, 9, 277.
  52. Abdi, A.M. Land Cover and Land Use Classification Performance of Machine Learning Algorithms in a Boreal Landscape Using Sentinel-2 Data. GIScience Remote Sens. 2020, 57, 1–20.
  53. Dorogush, A.V.; Ershov, V.; Gulin, A. CatBoost: Gradient Boosting with Categorical Features Support. arXiv 2018, arXiv:1810.11363.
  54. Kulkarni, C.S. Advancing Gradient Boosting: A Comprehensive Evaluation of the CatBoost Algorithm for Predictive Modeling. J. Artif. Intell. Mach. Learn. Data Sci. 2022, 1, 54–57.
  55. Pham, T.D.; Yokoya, N.; Nguyen, T.T.T.; Le, N.N.; Ha, N.T.; Xia, J.; Takeuchi, W.; Pham, T.D. Improvement of Mangrove Soil Carbon Stocks Estimation in North Vietnam Using Sentinel-2 Data and Machine Learning Approach. GIScience Remote Sens. 2021, 58, 68–87.
  56. Ozturk, M.Y.; Colkesen, I. Development of Transferable Hybrid Deep Learning Networks for Temporal and Multi-Regional Mapping of Poplar Plantations with Sentinel-2. Adv. Space Res. 2025, 76, 4249–4279.
  57. Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat. Mach. Intell. 2020, 2, 56–67.
  58. Wei, Q.; Song, W.; Dai, B.; Wu, H.; Zuo, X.; Wang, J.; Chen, J.; Li, J.; Li, S.; Chen, Z. Spatiotemporal Estimation of Surface NO2 Concentrations in the Pearl River Delta Region Based on TROPOMI Data and Machine Learning. Atmos. Pollut. Res. 2025, 16, 102353.
  59. Kang, H.; Zhu, B.; Zhu, C.; de Leeuw, G.; Hou, X.; Gao, J. Natural and Anthropogenic Contributions to Long-Term Variations of SO2, NO2, CO, and AOD over East China. Atmos. Res. 2019, 215, 284–293.
  60. Yu, S.; Yin, S.; Zhang, R.; Wang, L.; Su, F.; Zhang, Y.; Yang, J. Spatiotemporal Characterization and Regional Contributions of O3 and NO2: An Investigation of Two Years of Monitoring Data in Henan, China. J. Environ. Sci. 2020, 90, 29–40.
  61. Xiao, K.; Wang, Y.; Wu, G.; Fu, B.; Zhu, Y. Spatiotemporal Characteristics of Air Pollutants (PM10, PM2.5, SO2, NO2, O3, and CO) in the Inland Basin City of Chengdu, Southwest China. Atmosphere 2018, 9, 74.
  62. Lee, S.-J.; Lee, J.; Greybush, S.J.; Kang, M.; Kim, J. Spatial and Temporal Variation in PBL Height over the Korean Peninsula in the KMA Operational Regional Model. Adv. Meteorol. 2013, 2013, 1–16.
Figure 1. The location of Türkiye and Istanbul, along with the distribution of air quality monitoring stations.
Figure 2. Workflow of the study.
Figure 3. The correlation heatmap of the features obtained on a daily basis.
Figure 4. Station-based RMSE, MAE, and R diagrams for 2024. (a,b): RF, (c,d): XGB, and (e,f): CB.
Figure 5. Seasonal maps created with the Sentinel-5P seasonal average dataset for 2024.
Figure 6. Seasonal maps created with the Geos-CF seasonal average dataset for 2024.
Figure 7. Accuracy assessment of seasonal maps using R, RMSE, and MAE.
Figure 8. Relative error distribution of seasons for (a) Sentinel-5P and (b) Geos-CF.
Figure 9. SHAP analysis results for the CB model estimated for both Sentinel-5P and Geos-CF.
Figure 10. The most effective features: (a) Road network, (b) Population density, and (c) Elevation.
Table 1. List of input and output features used in the study.
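| Data Type | Name | Variable | Data Source | Input/Output |
|---|---|---|---|---|
| Ground Monitoring | Ground-based NO2 Station Data (Hourly average) | NO2 measurements | https://havakalitesi.ibb.gov.tr/, accessed on 1 August 2025 | Output |
| Satellite Air Quality Product | Sentinel-5P TROPOMI (Daily) | Tropospheric vertical column of NO2 | https://earthengine.google.com, accessed on 1 August 2025 | Input |
| Satellite Air Quality Product | Geos-CF (Hourly average) | Hourly average Nitrogen dioxide (NO2, MW = 46.00 g mol−1) tropospheric column density | https://earthengine.google.com, accessed on 1 August 2025 | Input |
| Climate | Geos-CF (Hourly average) | Dust optical depth at 550 nm (AOD550_Dust) | https://earthengine.google.com, accessed on 1 August 2025 | Input |
| Climate | Geos-CF (Hourly average) | Surface geopotential height (PHIS) | https://earthengine.google.com, accessed on 1 August 2025 | Input |
| Climate | Geos-CF (Hourly average) | Surface pressure (PS) | https://earthengine.google.com, accessed on 1 August 2025 | Input |
| Climate | Geos-CF (Hourly average) | Specific humidity (Q) | https://earthengine.google.com, accessed on 1 August 2025 | Input |
| Climate | Geos-CF (Hourly average) | Relative humidity after moist (RH) | https://earthengine.google.com, accessed on 1 August 2025 | Input |
| Climate | Geos-CF (Hourly average) | Sea level pressure (SLP) | https://earthengine.google.com, accessed on 1 August 2025 | Input |
| Climate | Geos-CF (Hourly average) | 2-m air temperature (T2M) | https://earthengine.google.com, accessed on 1 August 2025 | Input |
| Climate | Geos-CF (Hourly average) | Total precipitation (TPREC) | https://earthengine.google.com, accessed on 1 August 2025 | Input |
| Climate | Geos-CF (Hourly average) | Surface skin temperature (TS) | https://earthengine.google.com, accessed on 1 August 2025 | Input |
| Climate | Geos-CF (Hourly average) | 10-m eastward wind (U10M) | https://earthengine.google.com, accessed on 1 August 2025 | Input |
| Climate | Geos-CF (Hourly average) | 10-m northward wind (V10M) | https://earthengine.google.com, accessed on 1 August 2025 | Input |
| Climate | Geos-CF (Hourly average) | Mid-layer heights (ZL) | https://earthengine.google.com, accessed on 1 August 2025 | Input |
| Climate | Geos-CF (Hourly average) | Planetary boundary layer height (ZPBL) | https://earthengine.google.com, accessed on 1 August 2025 | Input |
| Society | MODIS (Daily) | Normalized Difference Vegetation Index (NDVI) | https://earthengine.google.com, accessed on 1 August 2025 | Input |
| Society | VIIRS (Daily) | Nighttime light (NTL) | https://earthengine.google.com, accessed on 1 August 2025 | Input |
| Society | OpenStreetMap | Road Length (RL) | https://www.geofabrik.de, accessed on 1 August 2025 | Input |
| Society | TUIK (Annual) | Population Density (PD) | https://biruni.tuik.gov.tr/medas/, accessed on 1 August 2025 | Input |
| Topography | SRTM | Digital Elevation Model (DEM) | https://earthengine.google.com, accessed on 1 August 2025 | Input |
| - | - | Day of Year (DOY) | - | Input |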
Table 2. Accuracy assessment results of the methods during training, validation, and test steps.
RMSE and MAE are given in µg/m3. Train: 2019–2022; Validation: 2023; Test: 2024.

| Data | Model | Train R | Train RMSE | Train MAE | Validation R | Validation RMSE | Validation MAE | Test R | Test RMSE | Test MAE |
|---|---|---|---|---|---|---|---|---|---|---|
| Sentinel-5P | RF | 0.820 | 16.302 | 11.458 | 0.660 | 17.909 | 12.880 | 0.666 | 16.645 | 12.150 |
| Sentinel-5P | XGB | 0.945 | 9.157 | 6.405 | 0.657 | 18.274 | 13.111 | 0.638 | 17.605 | 12.766 |
| Sentinel-5P | CB | 0.827 | 15.772 | 11.121 | 0.669 | 17.743 | 12.658 | 0.686 | 16.232 | 11.746 |
| Geos-CF | RF | 0.837 | 15.575 | 10.954 | 0.641 | 18.266 | 13.077 | 0.649 | 17.027 | 12.409 |
| Geos-CF | XGB | 0.842 | 14.917 | 10.402 | 0.642 | 18.389 | 13.095 | 0.653 | 17.000 | 12.388 |
| Geos-CF | CB | 0.819 | 16.164 | 11.479 | 0.643 | 18.193 | 12.954 | 0.665 | 16.582 | 12.084 |
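For readers who want to reproduce the kind of accuracy figures reported in Table 2, the following minimal Python sketch computes R, RMSE, and MAE from paired observed and predicted values. It assumes that R denotes the Pearson correlation coefficient, which the table does not state explicitly, and the sample concentrations are invented for illustration rather than taken from the study.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error, in the units of the input (here µg/m3)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error, in the units of the input (here µg/m3)."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical station observations and model estimates (µg/m3).
observed = [20.0, 35.0, 50.0, 65.0]
predicted = [25.0, 30.0, 55.0, 60.0]

r_value = pearson_r(observed, predicted)
rmse_value = rmse(observed, predicted)
mae_value = mae(observed, predicted)

print(f"R = {r_value:.3f}, RMSE = {rmse_value:.3f}, MAE = {mae_value:.3f}")
```

Because RMSE squares the errors before averaging, it is never smaller than MAE and weights large misses more heavily, which is consistent with every row of the table showing RMSE above MAE.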
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yagmur Aydin, N. Machine Learning-Based Ground-Level NO2 Estimation in Istanbul: A Comparative Analysis of Sentinel-5P and GEOS-CF. Appl. Sci. 2025, 15, 10997. https://doi.org/10.3390/app152010997

AMA Style

Yagmur Aydin N. Machine Learning-Based Ground-Level NO2 Estimation in Istanbul: A Comparative Analysis of Sentinel-5P and GEOS-CF. Applied Sciences. 2025; 15(20):10997. https://doi.org/10.3390/app152010997

Chicago/Turabian Style

Yagmur Aydin, Nur. 2025. "Machine Learning-Based Ground-Level NO2 Estimation in Istanbul: A Comparative Analysis of Sentinel-5P and GEOS-CF" Applied Sciences 15, no. 20: 10997. https://doi.org/10.3390/app152010997

APA Style

Yagmur Aydin, N. (2025). Machine Learning-Based Ground-Level NO2 Estimation in Istanbul: A Comparative Analysis of Sentinel-5P and GEOS-CF. Applied Sciences, 15(20), 10997. https://doi.org/10.3390/app152010997
