Estimations of the Ground-Level NO 2 Concentrations Based on the Sentinel-5P NO 2 Tropospheric Column Number Density Product

: The main objective of the presented study was to verify the potential of the Sentinel-5 Precursor (S-5P) Tropospheric NO 2 Column Number Density (NO 2 TVCD) to support air pollution monitoring in Poland. The secondary objective of this project was to establish a relationship between air pollution and meteorological conditions. The ERA-5 data together with the NO 2 TVCD product and auxiliary data were further assimilated into an artiﬁcial intelligence model in order to estimate surface NO 2 concentrations. The results revealed that the random forest method was the most accurate method for estimating the surface NO 2 . The random forest model demonstrated MAE values of 3.4 µ g/m 3 (MAPE~37%) and 3.2 µ g/m 3 (MAPE~31%) for the hourly and weekly estimates, respectively. It was observed that the proposed model could be used for at least 120 days per year due to the cloud-free conditions. Further, it was found that the S-5P NO 2 TVCD was the most important variable, which explained more than 50% of the predictions. Other important variables were the nightlights, solar radiation ﬂux, road density, population, and planetary boundary layer height. The predictions obtained with the proposed model were better ﬁtted to the actual surface NO 2 concentrations than the CAMS median ensemble estimations (~15% better accuracy).


Introduction
Air pollution is a global threat leading to huge impacts on human health and ecosystems.One of the most dangerous pollutants in the atmosphere is nitrogen dioxide (NO 2 ).It is responsible for respiratory diseases, cardiovascular diseases, lower self-cleaning airway capacity levels, weakening of the immune function of the lungs, asthma, and many others [1,2].According to a report by the European Environment Agency (EEA), there were 55,000 premature deaths due to NO 2 across Europe in 2018 [3].Moreover, high nitrogen concentrations damage ecosystems due to eutrophication and acidification (together with sulfur dioxide, SO 2 ).This in turn leads to changes in species diversity (reduced levels of existing species and invasions of new ones), as well as to increased concentrations of toxic metals in water and soils [3,4].In this respect, it is important to monitor the spatiotemporal distribution of NO 2 in order to detect and mitigate high concentrations of this pollutant in the atmosphere.Other air pollutants such as PM 2.5 and PM 10 are monitored more than NO 2 , especially in Poland.To mitigate this problem, other data sources are needed apart from the in situ measurements.This creates a strong demand for satellite data that allows for timely information gathering on air pollution.
Remote sensing data obtained from space-borne sensors can be used to assess the spatiotemporal distribution of NO 2 at the global scale.The Ozone Monitoring Instrument (OMI) onboard the National Aeronautics and Space Administration (NASA) Aura satellite has been providing daily air pollution data from 2004 at the global scale [5,6].
On the basis of the data provided by the OMI, Jamali et al. [7] found that the globally tropospheric vertical column density (TVCD) of NO 2 featured a slightly increasing trend (0.004•10 15 mol/cm 2 /yr) for the period 2005-2018.The highest increase in NO 2 was observed over India (0.04•10 15 mol/cm 2 /yr) and the highest decrease was observed over Japan (−0.049•10 15 mol/cm 2 /yr).Generally, at the global scale, a statistically significant linear trend was observed over ca.62% of the Earth, out of which ca. 54% was positive and ca.8% was negative [7].In the study performed by Krotkov et al. [8], NO 2 TVCD data from the OMI were analyzed over the most polluted areas around the world.It was found that the NO 2 tropospheric columnar concentrations decreased over the eastern USA by 40% and by 5% over Eastern Europe.On the other hand, increases were observed over China (ca.+5%; however, the trend was expressly negative from 2011), the Middle East (ca.+20%), and India (ca.+50) [8].Negative trends were observed over eight European cities, including Athens, Bucharest, Lisbon, Paris, Rome, Rotterdam, Berlin, and Madrid, for the period 2005-2014 [9].Georgoulias et al. [10] studied NO 2 TVCD trends derived from the records of four satellite sensors, namely the Global Ozone Monitoring Experiment (GOME) sensor onboard the second European Remote Sensing satellite (ERS-2), the Global Ozone Monitoring Experiment 2 (GOME-2) sensor onboard Meteorogical Operation A (Metop-A), the GOME-2 sensor onboard Meteorogical Operation B (MetOp-B), and the Scanning Imaging Absorption Spectrometer for Atmospheric Cartography (SCIAMACHY) sensor onboard the Environmental Satellite (Envisat).The key findings from this study related to negative trends observed during the twenty-one years (1996-2017) over the most industrialized and highly populated regions of the well-developed countries and positive trends over developing regions [10].In contrast to Jamali et al. [7], Georgoulias et al. [10] claimed that linear trends are not reliable on a global scale for a such long period [10].
The major advancement in satellite observations of NO 2 was the TROPOspheric Monitoring Instrument (TROPOMI) instrument mounted onboard the Sentinel-5 Precursor satellite (Sentinel-5P, S-5P) launched by the European Space Agency [11].NO 2 atmospheric pollution has been even more threatening since the COVID-19 pandemic started, and many scientific studies have elaborated on this topic.In this respect, reduced NO 2 levels in 2020 were reported by Bauwens et al. [12] at the global and regional scales.They analyzed NO 2 TVCD data over 33 cities all over the world, and only over 1 city (Isfahan) did the atmospheric NO 2 concentration increase during the pandemic [12].It was claimed that the reason for this increase was related to ignoring COVID-19 restrictions in Iran [12].
The satellite NO 2 column number density or tropospheric column number density is a different physical quantity than the ground concentration expressed in parts per million (ppm) or µg/m 3 .The physical relation between both quantities is usually complicated due to the vertical variability of NO 2 concentrations.A vertical profile of NO 2 depends on many factors, including the weather conditions, emissions, and chemical reactions.However, it is known that the satellite NO 2 TVCD is related to the surface concentrations, as the main NO 2 sources are related to human ground activities and the lifetime of NO 2 is short, which in turn precludes transportation at long distances [13][14][15].Since the launch of the OMI in 2004, many studies on the modeling of NO 2 concentrations at the surface have been conducted by assimilating various NO 2 TVCD satellite products.Sentinel-5P data have also been used to estimate this kind of pollution.The research performed by Griffin et al. [16] over the Canadian Oil Sands revealed a linear relationship between the TROPOMI NO 2 TVCD and ground mass concentration of NO 2 characterized by a Pearson correlation coefficient (R) of 0.67 [16].A higher R coefficient (0.85) was observed by Zheng et al. [17], who studied monthly means of NO 2 derived from Sentinel-5P and in situ measurements for different districts in China.Analogously, Cerosimo et al. [18] used the monthly mean S-5P NO 2 TVCD product to retrieve the surface NO 2 concentrations over Italy.However, in this study, the authors did not average the measurements over the districts and used measurements from single stations instead, as opposed to Zheng et al. [17].Cerosimo et al. [18] reported values of R 2 = 0.85 (TROPOMI NO 2 vs. ground concentration) over the north of Italy and R 2 = 0.71 over the south of Italy.In addition, Cerosimo et al. [18] also performed an analysis on non-averaged data that revealed values of R 2 = 0.50 over the south of Italy and R 2 = 0.42 over the north of Italy.A similar coefficient of determination (R 2 = 0.48 for hourly data and R 2 = 0.77 for annual data) was found by Jeong and Hong [19] for surface NO 2 concentrations retrieved from the S-5P TVCD NO 2 product over South Korea.Due to an undeniable relationship between atmospheric concentrations of pollutants and meteorological conditions [20][21][22][23][24], as well as anthropogenic factors [23,[25][26][27][28], they are often assimilated into the ML-based NO 2 retrieval models together with satellite NO 2 products.In this respect, the study carried out by Kim et al. [29] over the north of Italy, Switzerland, Austria, and southeast France, based on the XGboost method [30], revealed different agreements between the modeled NO 2 concentrations and in situ measurements, depending on the averaging interval, i.e., R 2 ~0.45 for hourly data, R 2 ~0.55 for daily averaged data, and R 2 ~0.60 for weekly and monthly averaged data [29].Kim et al. [29] found that the S-5P TVCD NO 2 product had a relative impact of 12% on the estimation of surface NO 2 concentrations.Only traffic emissions had a higher impact (14,8%).Other predictors that influenced the surface NO 2 values by more than 10% were the time of day (11.5%) and planetary boundary layer height (PBLH: 11.4%) [29].Wang et al. [31] employed five machine learning methods for the estimation of surface NO 2 atmospheric concentrations over China based on NO 2 TVCD data together with ancillary data (land cover, normalized difference vegetation index (NDVI), road density, and population density).Wang et al. [31] reported values of R 2 = 0.72 for light gradient boosting (Light-GBM) [32], R 2 = 0.68 for the XGBoost method [30], R 2 = 0.57 for the random forest (RF) method [33], R 2 = 0.55 for the deep belief network (DBN) method [34], and R 2 = 0.43 for the BPNN (back propagation neural network) method [35].The surface NO 2 was predicted by Chan et al. [36] using the S-5P TVCD, planetary boundary layer height, digital elevation model (DEM), air temperature, wind speed, relative humidity, precipitation, and incoming shortwave radiation flux at the surface level as predictors.They used an artificial neural network (ANN) algorithm [37] that featured a correlation value of R 2 = 0.64 [36].Analogously, Kim et al. [29] studied the impact of each predictor on the accuracy of the surface NO 2 estimation.Chan et al. [36] also found that the S-5P NO 2 TVCD was the most significant variable when estimating surface NO 2 concentrations [36].From the perspective of inconsistent results in surface NO 2 modeling and the different impacts of the explanatory variables (e.g., meteorological, anthropogenic) reported by the aforementioned studies, it is necessary to conduct consecutive studies to better define the limitations in the modeling of atmospheric NO 2 concentrations based on satellite data, meteorological conditions, and anthropogenic factors.This challenge was the main motivation to conduct this study and to address the following research questions:

•
How much does the accuracy of surface NO 2 modeling improve if the S-5P TCVD NO 2 product is assimilated into a model?The main objective of this study was to verify the potential of a machine learning (ML) approach based on the NO 2 TVCD Sentinel-5P satellite product generated by the European Space Agency (ESA) to support air pollution monitoring in Poland.In this respect, the surface atmospheric NO 2 concentration modeled by means of the ML algorithm was validated against in situ measurements provided by the Chief Inspectorate of Environmental Protection (GIOS).The secondary objective of the study was to establish a relationship between NO 2 air pollution (based on Sentinel-5P products) and the ERA5 meteorological reanalysis provided by the European Center for Medium-Range Weather Forecasts (ECMWF).
The analyses were performed for two temporal aggregation levels, namely hourly and weekly data, to determine the impact of the temporal averaging.Furthermore, several ML algorithms were tested together with a simple linear regression model to select the most robust method.Ultimately, the impacts of the explanatory variables on the model results were analyzed to determine the most significant variables.
This manuscript is structured into five sections.The Section 1 provides a comprehensive literature overview on the modeling of surface atmospheric NO 2 concentrations.Section 2 provides descriptions of the data used within the study, as well as the methodology for the data processing, analysis, and validation.Section 3 provides the results of the conducted analysis.Section 4 provides a discussion of the acquired results, while Section 5 concludes the conducted study.

Study Area
Poland was used as the study area to develop a NO 2 retrieval algorithm and to analyze the spatial distribution of NO 2 .The climate in Poland is temperate.It is oceanic in the northwest and more continental towards the southeast.The average temperature in summer is ca.20 • C, while in winter it is ca.−1 • C [39].The topography of Poland has a latitudinal belt-like layout, with a young glacial landscape (with a belt for part of the country and lake districts) in the northern part of the country, an old glacial landscape in some parts, and a belt of highlands and mountains on the ground.The lowlands cover 91.3% of the area, the highlands 5.6%, and the mountains 3.3%, of which 0.2% are high mountains [40].The above factors all make Poland a perfect representative of a temperate climate zone.Moreover, other Central and Eastern European countries are facing similar problems and challenges related to air quality as Poland.There is a similar annual mean mass concentration of NO 2 in Poland of 15.6 µg/m 3 to the other countries of the region, e.g., Czechia with 15.5 µg/m 3 , Croatia with 13.8 µg/m 3 , Slovakia with 14.8 µg/m 3 , Hungary with 17.0; µg/m 3 , Serbia with 17.3 µg/m 3 , and Slovenia with 14.5 µg/m 3 [3].

Materials
The modeling of the surface NO 2 mass concentration presented in this study was based on in situ NO 2 measurements, the Sentinel-5P TVCD NO 2 product, the ERA-5 meteorological reanalysis from the Copernicus Climate Change Service (CCS), and various ancillary datasets, such as for the nightlight intensity derived from the Visible Infrared Imaging Radiometer Suite (VIIRS) satellite instrument, population density, road density, and DEM.The study covered the period from 1 July 2018 to 30 July 2021.

NO 2 Tropospheric Vertical Column Density (NO 2 TVCD) Product Retrieved from Sentinel-5P Satellite TROPOMI Measurements
The Sentinel-5P satellite was launched on 13 October 2017 as part of the Copernicus Earth Observation Program.The S-5P mission supports the global monitoring of the atmosphere and air quality using TROPOMI.The TROPOMI instrument consists of four spectrometers measuring radiation in the ultraviolet (270-320 nm), visible (310-500 nm), near-infrared (675-775 nm), and shortwave infrared (2305-2385 nm) electromagnetic spectrums.At the beginning of the mission, the spatial resolution of the TROPOMI products was 3.5 km × 7 km, while currently (December 2022) it is 3.5 km × 5.5 km.The width of the swath is 2600 km, which allows for daily monitoring of the atmosphere at the global scale.The spectral resolutions of the TROPOMI products vary from 1 nm in the UV band through to 0.5 nm in the VIR and NIR bands and 0.25 nm in the SWIR band [11].
The NO 2 TVCD product is generated by the Royal Netherlands Meteorological Institute (KNMI) by means of DOMINO [41,42] and Quality Assurance for Essential Climate Variables (QA4ECV) [43] algorithms.These methods share three common steps: 1.
The NO 2 slant column densities are retrieved from the measured radiance and irradiance spectra using the differential optical absorption spectroscopy (DOAS) method; 2. The separation of tropospheric and stratospheric columns, i.e., their conversion to tropospheric and stratospheric slant columns; 3.
The conversion of tropospheric and stratospheric slant columns into tropospheric and stratospheric vertical column densities [42,44].
The vertical profiles of NO 2 are calculated for the center of a pixel featuring a spatial resolution of 1 • × 1 • and are based on the chemistry transport model (CTM)-TM5-MP [44,45].Finally, the filtering of the cloud cover is performed using the Fast Retrieval Scheme for Clouds from the Oxygen A-Band (FRESCO-S) algorithm [46].
This allowed us to filter the NO 2 TCVD values, with the data qa_values > 0.75, as suggested by van Geffen et al. [44] and Ialongo et al. [14] being used.Consequently, the majority of the cloudy pixels (cloud radiance fractions > 0.5) and pixels covered by snow or ice were removed [51].

In Situ Air Quality Measurements
The NO 2 surface mass concentration was obtained from the Chief Inspectorate of Environmental Protection (GIOS).The GIOS provides hourly averages of measurements recorded every 10 s at a height of 2 m a.g.l.The number of stations where NO 2 is measured is constantly changing, as some new ones are added and some old ones are discontinued.In this respect, 136 locations were selected featuring NO 2 measurement records collected since 2018.Further, stations classified as transport (roadside) and industrial stations were excluded from the analysis because traffic-related and industrial-related pollutants can vary substantially within a few meters, and such stations are not representative of a large satellite footprint [36,52,53].Thus, the data from 116 stations, including 20 non-built-up stations, 12 suburban stations, and 84 urban stations, were used within this study [40].The spatial distribution of the stations in Poland measuring atmospheric NO 2 concentrations is dominated by urban areas, as there are only few background stations in Poland (Figure 1).
Due to the high daily variability of NO 2 [54,55], the hourly means provided by GIOS were linearly interpolated to the time of the S-5P image acquisition.

Meteorological Data
The meteorological conditions have a great impact on the air quality [20][21][22][23][24]29,30]; thus, they were also used as explanatory variables to predict the NO 2 concentrations in this study.In this respect, the following meteorological variables were analyzed: the air temperature (T), dew point temperature (DT), atmospheric pressure (P), surface incoming solar radiation accumulated over one hour (RAD), U zonal wind component (U WIND), V meridional wind component (V WIND), and PBLH.The T and DT were used to calculate the relative humidity (RH), while the U WIND and V WIND were used to calculate the wind speed (WS) and wind direction (WD).These datasets were retrieved from the ERA5-Land Hourly reanalysis model provided by the European Center for Medium-Range Weather Forecasts (ECMWF) at the 0.1 • × 0.1 • spatial resolution [56].The meteorological data, excluding PBLH, were retrieved using the Google Earth Engine (GEE) [57], while the PBLH data were obtained using an R programming language script using the ecmwfr package [58].The ERA5-Land data were linearly interpolated to the time of the S-5P acquisition in order to reduce temporal inconsistencies between meteorological and satellite data.Then, linearly interpolated datasets were spatially downscaled to the 3.5 km × 5.5 km spatial resolution using bilinear interpolation [59][60][61].Due to the high daily variability of NO2 [54,55], the hourly means provided by GIOS were linearly interpolated to the time of the S-5P image acquisition.

Meteorological Data
The meteorological conditions have a great impact on the air quality [20][21][22][23][24]29,30]; thus, they were also used as explanatory variables to predict the NO2 concentrations in this study.In this respect, the following meteorological variables were analyzed: the air temperature (T), dew point temperature (DT), atmospheric pressure (P), surface incoming solar radiation accumulated over one hour (RAD), U zonal wind component (U WIND), V meridional wind component (V WIND), and PBLH.The T and DT were used to calculate the relative humidity (RH), while the U WIND and V WIND were used to calculate the wind speed (WS) and wind direction (WD).These datasets were retrieved from the ERA5-Land Hourly reanalysis model provided by the European Center for Medium-Range Weather Forecasts (ECMWF) at the 0.1° × 0.1° spatial resolution [56].The meteorological data, excluding PBLH, were retrieved using the Google Earth Engine (GEE) [57], while the PBLH data were obtained using an R programming language script using the ecmwfr package [58].The ERA5-Land data were linearly interpolated to the time of the S-5P acquisition in order to reduce temporal inconsistencies between meteorological and satellite data.Then, linearly interpolated datasets were spatially downscaled to the 3.5 km x 5.5.km spatial resolution using bilinear interpolation [59][60][61].
The meteorological datasets were used to derive the zonal atmospheric circulation index values over Poland according to the classification method proposed by Lityński [62][63][64].In this respect, the values for sea-level pressure from the NCEP/NCAR (National The meteorological datasets were used to derive the zonal atmospheric circulation index values over Poland according to the classification method proposed by Lity ński [62][63][64].In this respect, the values for sea-level pressure from the NCEP/NCAR (National Centers for Environmental Prediction/National Center for Atmospheric Research) reanalysis were used.The circulation types (CT) were distinguished using 3 main indices (zonal, Ws; meridional, Wp; cyclonicity index, Cp), which in turn were subdivided into 27 circulation types.The Ws and Wp were calculated based on differences in atmospheric pressure between zones at 40 • -65 • N and 0 • -35 • E, while the Cp was the atmospheric pressure over the node nearest to Warsaw (52.5 • N, 20 • E) subtracted by 1000 hPa [62][63][64].Over the years, the synoptic maps used by Lity ński were replaced by NCEP/NCAR reanalysis data [65][66][67].

Ancillary Data
NO 2 emissions are greatly related to anthropogenic activities [13][14][15]68,69].In this respect, it was necessary to add anthropogenic explanatory variables to train the NO 2 retrieval model, such as population density, road density, and satellite data on the intensity of nightlights.In order to obtain information about the population density, the Global Human Settlement Layers (GHSL) dataset was used for the year 2015.This dataset contains the population density expressed as the number of people within a 250 m × 250 m spatial grid [70].Additionally, the intensity of nightlights was used, as it corresponds well with socioeconomic dynamics [71][72][73].This dataset consists of nighttime radiance imagery acquired by the Visible Infrared Imaging Radiometer Suite (VIIRS) Day/Night Band (DNB) [74,75].These two datasets were extracted from the GEE [57] at a spatial resolution of 464 m × 464 m.The road density was derived from the OpenStreetMap dataset [76].The population density and road density were calculated as the sum of people within a 3.5 × 5.5 grid and the road density was calculated as the total length of the roads within a grid.
Following recommendations from studies on NO 2 modeling [29,30], the digital elevation model (DEM) from the Shuttle Radar Topography Mission (SRTM) featuring a 30 m × 30 m spatial resolution [77] was used in this study.The information on the terrain elevation was extracted from the GEE [57].
Ultimately, the surface atmospheric NO 2 concentrations from the CAMS (Copernicus Atmosphere Monitoring Service) reanalysis were used to benchmark the NO 2 retrieval models proposed within this study.The CAMS dataset covers the period from 10 November 2018 to 30 June 2021 and consists of an ensemble median of 11 CAMS models considered as the best-fitted to real concentrations among other models [78,79].The temporal resolution of the CAMS ensemble median is 1 h, while the spatial resolution is 1 km × 1 km.Estimates are provided at 10 vertical levels (surface, 50 m, 100 m, 250 m, 500 m, 750 m, 1000 m, 2000 m, 3000 m, 5000 m).The data are provided for Europe only [80].In contrast to other data, the CAMS ensemble median NO 2 estimates were not resampled to the resolution of Sentinel-5P pixels so as not to distort the comparable results.

Modeling of Surface NO 2 Mass Concentration
In order to predict the surface NO 2 , we used the datasets mentioned in Section 2.2.In this respect, the NO 2 TVCD, T, P, RAD, WS, PBLH, NIGHTLIGHT, POPULATION, ROADS DENSITY, and ELEVATION were included in the models.Moreover, the atmospheric circulation data (CT), as a categorical factor, were included in the machine learning methods.
Four kinds of methods were used: • Linear regression with one independent variable (LM)-NO 2 TVCD; Prediction analyses were performed for two periods: hourly data (for hours of each available S5-P image) and weekly data (weekly averaged data).Each variable was averaged to the weekly mean.
One parameter was chosen to point out outliers within the dataset and exclude them: • Stations measuring surface NO 2 closer than 100 m from the road (2 stations; 1024 observations; 2% of all observations).
According to Section 2.2.2, transport stations were excluded from the analysis [36,52,53].However, there were still stations located close to the roads within the dataset.Due to the excessive noise concentrations over the stations next to the roads (and defined as background stations), these were also excluded from the dataset [81].
After excluding the outliers, the datasets consisted of 50,099 hourly observations and 13,765 weekly observations, having been reduced by 1024 observations (2%) and 994 observations (7%), respectively.
In order to perform prediction activities, the dataset was divided into two parts: a training dataset used for the training model and a test dataset used for validation.The training dataset consisted of data for January, March, May, July, September, and November, while the testing dataset consisted of data for February, April, June, August, October, and December.Due to our approach, the models were trained and validated over datasets that differed from each other.The training dataset and testing dataset included 22,678 (45%) and 27,421 (55%) hourly observations, respectively, and 7126 (52%) and 6639 (48%) weekly averages, respectively.
To obtain information about the availability of useful S-5P observations, we counted the number of observations characterized by qa_values > 0.75 for each pixel within the study area.Only data for 2019 and 2020 were chosen for the observation analysis as they were the only fully covered years in the study's analysis period.

Validation methodology
The results obtained from the formulated predictions models were validated using the following statistical parameters: • Mean squared error (MSE): • Root mean squared error (RMSE): • Bias: • Mean absolute error (MAE): • Mean percentage absolute error (MAPE): Here, y i is the actual value, ŷi is the predicted value, y is the mean for the actual values, and n is the number of observations.
Considering that the data used for prediction purposes were expressed in various units, we performed the normalization before the modeling.We used standardization feature scaling in order to bring all values into the common scale: Here, yi is the actual value, ŷi is the predicted value,  is the mean for the actual values, and n is the number of observations.
Considering that the data used for prediction purposes were expressed in various units, we performed the normalization before the modeling.We used standardization feature scaling in order to bring all values into the common scale: Here, z is the standardized variable, x is the non-standardized variable, ⲙ is the mean of the non-standardized variable, and σ is the standard deviation of the non-standardized variable.

Ranking the Modeling Skills of the Selected Predictors
The variables' importance for multilinear regression was calculated by comparing the standardized regression coefficients.The standardized coefficients were also summed.Further, the percentage of each variable within the total was calculated [82].
The variables' importance for the random forest model was calculated with the ran-domForest package created for the R programming language, and the variables' contribution was expressed by the mean increase in MSE, divided by a measure of the variability (%IncMSE), which was calculated as follows: Here, z is the standardized variable, x is the non-standardized variable, Here, yi is the actual value, ŷi is the predicted value,  is the mea values, and n is the number of observations.
Considering that the data used for prediction purposes were expre units, we performed the normalization before the modeling.We used stan ture scaling in order to bring all values into the common scale: Here, z is the standardized variable, x is the non-standardized variab of the non-standardized variable, and σ is the standard deviation of the no variable.

Ranking the Modeling Skills of the Selected Predictors
The variables' importance for multilinear regression was calculated the standardized regression coefficients.The standardized coefficients we Further, the percentage of each variable within the total was calculated [8 The variables' importance for the random forest model was calculat domForest package created for the R programming language, and the var is the mean of the non-standardized variable, and σ is the standard deviation of the non-standardized variable.

Ranking the Modeling Skills of the Selected Predictors
The variables' importance for multilinear regression was calculated by comparing the standardized regression coefficients.The standardized coefficients were also summed.Further, the percentage of each variable within the total was calculated [82].
The variables' importance for the random forest model was calculated with the ran-domForest package created for the R programming language, and the variables' contribution was expressed by the mean increase in MSE, divided by a measure of the variability (%IncMSE), which was calculated as follows: 1.
For each variable in the model: Rank the variables' importance according to the %IncMSE values, whereby the greater the value, the more important the variable [83].
This method is considered more reliable than the decrease in node impurity [84,

Numbers of Observations
First of all, the limitations of using satellite S-5P data due to cloud coverage were verified.As mentioned in Section 2.  2a,b).Thus, the frequency rates of cloud-free pixels within the borders of Poland were similar in 2019 and 2020.For each part of Poland, it was possible to obtain data for at least one-third of the year.For the areas in the southern part of Poland, it was even possible to obtain data for more than half of the year.

Modeling of Surface NO 2 Mass Concentration
The modeling of the surface NO 2 was the main objective of the study.The results and validation of estimated surface NO 2 mass concentrations based on satellite, meteorological, and other data are listed in Section 2.2.
Firstly, a linear model with one independent variable-NO 2 TVCD-was used to estimate the surface NO 2 mass concentrations (LM-S5P).The results demonstrated that R 2 = 0.32 with MAE and MAPE values of 4.9 µg/m 3 and 48.4% for the hourly estimates, respectively.LM-S5P was the only model with an RMSE higher than 7.00 µg/m 3 for these measurements.The accuracy of the model was absolutely unsatisfactory.The results were slightly better for the weekly averages.LM-S5P gave results of R 2 = 0.33, RMSE = 6.2 µg/m 3 , MAE = 4.3 µg/m 3 , and MAPE = 42.4% (Table 1).However, it was decided to develop a model with more independent variables.observations in 2020).The lowest numbers of cloud-free observations were noticed over the northern part of Poland near the Baltic Sea area, where only 130-140 observations were noted in 2019 and 2020.Additionally, the area near the northeastern border was characterized by low numbers of cloud-free images; there were 130-140 cloud-free observations in each year (Figure 2a,b).Thus, the frequency rates of cloud-free pixels within the borders of Poland were similar in 2019 and 2020.For each part of Poland, it was possible to obtain data for at least one-third of the year.For the areas in the southern part of Poland, it was even possible to obtain data for more than half of the year.

Modeling of Surface NO2 Mass Concentration
The modeling of the surface NO2 was the main objective of the study.The results and validation of estimated surface NO2 mass concentrations based on satellite, meteorological, and other data are listed in Section 2.2.
Firstly, a linear model with one independent variable-NO2 TVCD-was used to estimate the surface NO2 mass concentrations (LM-S5P).The results demonstrated that R 2 = 0.32 with MAE and MAPE values of 4.9 μg/m 3 and 48.4% for the hourly estimates, respectively.LM-S5P was the only model with an RMSE higher than 7.00 μg/m 3 for these measurements.The accuracy of the model was absolutely unsatisfactory.The results were slightly better for the weekly averages.LM-S5P gave results of R 2 = 0.33, RMSE = 6.2 μg/m 3 , MAE = 4.3 μg/m 3 , and MAPE = 42.4% (Table 1).However, it was decided to develop a model with more independent variables.The second approach involved multilinear regression with ten variables (MLM).The estimates were more accurate in comparison with LM-S5P.We observed values of R 2 = 0.45, RMSE = 6.6 µg/m 3 , MAE = 4.2 µg/m 3 , and MAPE = 42.1% for the hourly data and R 2 = 0.49, RMSE = 5.4 µg/m 3 , MAE = 3.6 µg/m 3 , and MAPE = 35.5% for the weekly averages (Table 1).This method allowed us to verify the influence of each variable in the model.A detailed description of the impact of the variables is given in Section 3.3.
To improve the model and make it possible to include categorical data, a machine learning approach was used.Firstly, estimates were calculated with the use of the random forest (RF) method.This approach achieved R 2 values higher than 0.50 for the hourly data and 0.60 for the weekly averages.The MAE declined to 3.7 µg/m 3 for the hourly measurements and to 3.1 µg/m 3 for the weekly averages, which corresponded to 37.2% and 30.8%, respectively (Table 1).This model was more accurate, especially for the group of high predicted values but low actual values.For the LM-S5P model and MLM, the observed estimates were in the range of ca.40-80 µg/m 3 , while the actual values were lower than 10 µg/m 3 (Figure 3a,b).The RF method handled these measurements and predicted them with higher accuracy (Figure 3c).Additionally, the group of estimations between the black line and red line (Figure 3,b), which corresponded to actual values > 50 µg/m 3 (Figure 3a,b), was predicted with higher accuracy using the RF approach for the hourly data as well as for weekly averages (Figure 3c,g).Therefore, it seems that the RF method was less sensitive for non-typical actual surface NO 2 mass concentrations, which compounded the results of the LM-S5P and MLM method.Figure 3 shows that all of the models overestimated the surface NO2 mass concentrations during relatively clean conditions (NO2 values less than approximately 10 μg/m 3 ) and underestimated the values during pollution events.
To sum up, the development of the linear model into a more complex model requiring more computing power was justified.The multilinear model revealed significantly better estimation accuracy than the linear model with one independent variable (ca.6%), The second machine learning approach used for modeling purposes was the radial kernel support vector machine (SVM).The results and accuracy were very similar in comparison with the RF model at R 2 = 0.54, RMSE = 6.1 µg/m 3 , MAE = 3.7 µg/m 3 , and MAPE = 36.9%for the hourly estimates and R 2 = 0.59, RMSE = 4.9µg/m 3 , MAE = 3.2 µg/m 3 , and MAPE = 31.2%for the weekly averages (Table 1).Significant differences in estimates between the RF and SVM methods were observed within the interval ranges of 70-90 µg/m 3 for the actual (ground-based observation) values and 30-50 µg/m 3 for the predicted values (Figure 3c,d,g,h).The predictions calculated using the SVM method were slightly higher.
Figure 3 shows that all of the models overestimated the surface NO 2 mass concentrations during relatively clean conditions (NO 2 values less than approximately 10 µg/m 3 ) and underestimated the values during pollution events.
To sum up, the development of the linear model into a more complex model requiring more computing power was justified.The multilinear model revealed significantly better estimation accuracy than the linear model with one independent variable (ca.6%), while the machine learning approaches were ca.5% more accurate than the multilinear model (for the hourly observations).For the weekly averages, the tendencies were similar.The MLM model was ca.7% more accurate than the LM-S5P model, while the RF and SVM estimates had accuracy values higher than ca.5% in comparison with the MLM model.Therefore, the SVM method was a little bit more accurate for hourly estimates, while the RF method was more accurate for weekly averages (Table 1).However, one approach was chosen for further investigation-the RF model (for hourly and weekly data).This was decided due to the very low differences between the SVM and RF models, which actually may have been random and due to the lower bias for the RF model (<0.1.µg/m 3 and −0.1 µg/m 3 for hourly and weekly predictions, respectively) (Table 1).Moreover, computing estimates using the RF method is faster and requires a computer with less computing power, which makes the method more applicable for further studies.
The distribution and magnitude of differences between the predicted NO 2 mass concentrations (RF model) and actual surface NO 2 concentrations are shown in Figure 4.A perfect fit would be described by points placed strictly on the red line.Figure 4 allows for easy visualization and interpretation of how many values are underestimated (points below the red line) and overestimated (over the red line).In this study, the differences were characterized by an asymmetric distribution.The magnitude of the underestimations was higher than the magnitude of the overestimations (Figure 4).For the hourly estimates, this trend is even more evident, as the tails are bigger and deviate more from the perfect fit (red line).Differences higher than 50 µg/m 3 can even be observed (Figure 4a), while for the weekly averages a poor number of differences higher than 30 µg/m 3 can be noticed (Figure 4b).Additionally, the increase in overestimations is less smooth and the peaks are higher, at ca. 30 µg/m 3 for hourly and ca.15 µg/m 3 for weekly estimates (Figure 4).The distribution and magnitude of differences between the predicted NO2 mass concentrations (RF model) and actual surface NO2 concentrations are shown in Figure 4.A perfect fit would be described by points placed strictly on the red line.Figure 4 allows for easy visualization and interpretation of how many values are underestimated (points below the red line) and overestimated (over the red line).In this study, the differences were characterized by an asymmetric distribution.The magnitude of the underestimations was higher than the magnitude of the overestimations (Figure 4).For the hourly estimates, this trend is even more evident, as the tails are bigger and deviate more from the perfect fit (red line).Differences higher than 50 μg/m 3 can even be observed (Figure 4a), while for the weekly averages a poor number of differences higher than 30 μg/m 3 can be noticed (Figure 4b).Additionally, the increase in overestimations is less smooth and the peaks are higher, at ca. 30 μg/m 3 for hourly and ca.15 μg/m 3 for weekly estimates (Figure 4).Considering that the function for the estimations was fitted above the perfect fit for actual values lower than ca. 10 μg/m 3 and under the perfect fit for actual values higher than ca. 10 μg/m 3 (Figure 3c,g), it was decided to verify the bias within the quartiles defined by the actual surface NO2 mass concentrations.It was found that the proposed random forest model overestimated the results, with 2.5 μg/m 3 and 2.2 μg/m 3 bias rates for Considering that the function for the estimations was fitted above the perfect fit for actual values lower than ca. 10 µg/m 3 and under the perfect fit for actual values higher than ca. 10 µg/m 3 (Figure 3c,g), it was decided to verify the bias within the quartiles defined by the actual surface NO 2 mass concentrations.It was found that the proposed random forest model overestimated the results, with 2.5 µg/m 3 and 2.2 µg/m 3 bias rates for hourly and weekly predictions within the first quartile (Table 2), which was 4.4 µg/m 3 (Table 1).Even if the bias rates show a high relative difference, it has to be emphasized that a NO 2 mass concentration of ca.4-5 µg/m 3 is still a very low level of air pollution.Whether or not the model was biased at ca. 2-2.5 µg/m 3 , the results still showed a very low level of NO 2 .The estimates within the second and third quartiles were less biased at 1.7 µg/m 3 and 1.3 µg/m 3 for the hourly and weekly NO 2 mass concentrations, respectively (Table 2).This is extremely important because it was the most frequent observed level of NO 2 pollution.It also corresponds well to Figure 3c,g, where the perfect fit and modeled fit are the closest around the values defined by the second and third quartiles (Table 2).The highest bias rates were observed within the fourth quartile, which were −5.1 µg/m 3 and −4.0 µg/m 3 for the hourly and weekly predictions, respectively (Table 2).Therefore, these underestimations for the high actual values of the NO 2 mass concentrations were considered the most serious problem within the proposed model.Such results are expected, while satellite columnar observations tend to average the pollution data in comparison to one-level (surface) concentration.Therefore, all models underestimated the range of surface concentrations of NO 2 .Since this was also a problem that had occurred within the other studies, this issue was analyzed and is discussed in Section 4.

Variables' Importance
With respect to Table 3, for the hourly estimates, the most important variable for the prediction of surface NO 2 mass concentrations was NO 2 TVCD obtained from S-5P.It explained 57% of the model's results.The second most significant variable was PBLH, which explained 7% of the results.RAD, WS, ROADS, and NIGHTLIGHT were the other variables that explained at least 5% of the results, at 6%, 6%, 6%, and 5%, respectively.The rest of the meteorological factors, T and P, explained less than 5% of the results.However, they were still statistically significant (p-value < 0.05).POP explained 3%.Therefore, 57% of the model's results were explained by the TROPOMI data, 23% were explained by the meteorological data (T, P, WS, RAD, PBLH), and 14% by the anthropogenic factors (NIGHTLIGHT, POP, ROADS); 1% was explained by the elevation and 6% by an intercept.The increases in the TVCD of NO 2 , nightlights, population, road density, and elevation gave rise to increased surface NO 2 mass concentrations, while the increases in temperature, pressure, radiation, wind speed, and PBLH caused decreased surface NO 2 mass concentrations.To sum up, it seems that the meteorological factors were favorable to decreases in surface NO 2 values.In contrast, anthropogenic factors increase the surface NO 2 mass concentration (Table 3).
Very similar results were obtained for the weekly averages.Here, 53% of the model's results were explained by the S-5P data.Again, the second most important variable was BLH (8%).Generally, the meteorological data expressed 24% of the model's results, while 14% was expressed by anthropogenic factors.Additionally, 1% of the predictions were determined by the elevation and 8% by the intercept (Table 3).
An investigation of the variables' importance within the RF model was conducted and the results are expressed as % changes in the MSE.Because the RF model allows for using categorical data, it was possible to verify the impacts of factors such as the circulation type (CT).
For the hourly measurements, just like with the multilinear approach, the most important variable for the RF model was the S-5P TVCD of NO 2 .The decrease in accuracy when excluding S-5P from the model was 27%.The next most significant variable was NIGHT-LIGHT (19% decrease in accuracy).The other variables that decreased accuracy by at least 15% when excluded from the model were PBLH (16%) and ROADS (15%).Beside PBLH, the meteorological factor that affected the accuracy the most was RAD (14% decrease in accuracy).POP was almost as important (an almost 14% decrease in MSE).When excluding other factors from the RF model, the accuracy decreased by less than 10% (T, 9%; WS, 7%; ELEVATION, 5%; P, 4%).The lowest impact on the model accuracy was for CT; removing CT from the model increased the MSE by 3% (Figure 5a).Thus, it seems that no additional factor such as CT influenced the quality of the RF model in comparison with the MLM model except for the method itself.In contrast with their importance within the MLM model, the anthropogenic factors (NIGHTLIGHT, ROADS, POP) were more important for the RF approach (Table 3). the MLM model except for the method itself.In contrast with their importance within the MLM model, the anthropogenic factors (NIGHTLIGHT, ROADS, POP) were more important for the RF approach (Table 3).Slightly different results were revealed for the weekly averages.TVCD of NO2 and NIGHTLIGHT were again the most important factors (23% and 13% decreases in accuracy, respectively, when excluded from the model).On the other hand, RAD and POPU-LATION were more important than ROADS (13% and 11% decreases in accuracy, respectively).Still, PBLH had a significant impact on the model (12% increase in MSE when excluded).Again, T, WS, ELEVATION, and P were the least important with 5%, 3%, 3%, and 2% decreases in accuracy, respectively (Figure 5b).It is worth mentioning that the impacts of WS and P were lower for the weekly averages estimations than the hourly estimations, while with the MLM approach, the opposite was true (Table 3).Regardless, according to their importance and influence on the MSE, two groups of variables are emphasized for the RF model (both for the hourly and weekly data):

•
Those with an impact higher than or equal to 10% on the changes in MSE: S5P, NIGHTLIGHT, PBLH, ROADS, RAD, and POP; • Those with an impact lower than 10% on the changes in MSE: T, P, WS, ELEVATION, and CT.Slightly different results were revealed for the weekly averages.TVCD of NO 2 and NIGHTLIGHT were again the most important factors (23% and 13% decreases in accuracy, respectively, when excluded from the model).On the other hand, RAD and POPULATION were more important than ROADS (13% and 11% decreases in accuracy, respectively).Still, PBLH had a significant impact on the model (12% increase in MSE when excluded).Again, T, WS, ELEVATION, and P were the least important with 5%, 3%, 3%, and 2% decreases in accuracy, respectively (Figure 5b).It is worth mentioning that the impacts of WS and P were lower for the weekly averages estimations than the hourly estimations, while with the MLM approach, the opposite was true (Table 3).Regardless, according to their importance and influence on the MSE, two groups of variables are emphasized for the RF model (both for the hourly and weekly data):

•
Those with an impact higher than or equal to 10% on the changes in MSE: S5P, NIGHT-LIGHT, PBLH, ROADS, RAD, and POP; • Those with an impact lower than 10% on the changes in MSE: T, P, WS, ELEVATION, and CT.

Changes in NO 2 TVCD and Surface NO 2 Mass Concentration in Respect to Meteorological Conditions and Other Factors
The changes in NO 2 TVCD and surface NO 2 mass concentration values were verified as indicators of pollution in each interval defined for each meteorological or anthropogenic factor.
A decrease in air temperature implied a decrease in air pollution (TVCD of NO 2 as well as surface NO 2 ).Starting at −4-0 • C, the surface NO 2 fell consistently to 24-28 • C and then it increased until the last interval (Figure 6a).The slight increase starting from 24-28 • C was supposedly induced by photochemical reactions that started to occur with the high solar radiation flux and high air temperature [86].In contrast, the surface NO 2 mass concentration stopped rising at the 8-12 • C range and then fell further at 24-28 • C. Just like the surface NO 2 , it increased again in the next two intervals.On average, the surface NO 2 concentration dropped by 0.8 µg/m 3 in each subsequent interval, while the NO 2 TVCD concentration dropped by 0.7 × 10 5 µmol/m 2 in each subsequent interval.The biggest changes were noticed for the 20-24 • C range, which were −2.2 µg/m 3 for the surface NO 2 and −2.4 × 10 5 µmol/m 2 for the TVCD of NO 2 .The first interval (below −4 • C) seemed to be out of sync because there was a poor number of observations (lower than 1% of all observations).For the same reason, it was hard to unequivocally interpret results for the >32 • C interval (Figure 6a).
The meteorological parameter with the greatest effect on air pollution was PBLH (Section 3.3).A trend of decreasing air pollution with increasing PBLH values was observed.This can be explained by deeper convective mixing of the air pollution emitted at the surface.One exception was the first interval (0-250 m a.g.) for the TVCD of NO 2 ; however, only 1.5% of all observations were covered within this interval.The average decrease in the surface NO 2 mass concentration was 2.7 µg/m 3 with PBL at elevations higher than 250 m, while the average decrease in the TVCD of NO 2 was 1.3 × 10 5 µmol/m 2 with PBL at elevations higher than 250 m (Figure 6b).
A significant decrease in air pollution was also observed with increasing wind speed.The surface NO 2 mass concentration decreased by 1.5 µg/m 3 on average with a WS increase of 2 m/s, while the columnar density decreased by 1.8 × 10 5 µmol/m 2 .The greatest drops were observed between the first and second intervals at 2.7 µg/m 3 and 2.1 × 10 5 umol/m 2 , respectively.There was a greater decrease between 8-10 m/s and >10 m/s for the satellite NO 2 values; however, only 0.1% of all observations were collected under such conditions (Figure 6c).There were also observed differences in surface NO 2 and NO 2 TVCD values for different advections.This was confirmation that the air is cleaner during northern advections.This is caused by the movement of relatively clean Atlantic air masses from the north and the inflow of more polluted air masses from the south.Additionally, the eastern wind direction was more favorable for lower TVCD and surface NO 2 values than the western advections (Figure 6d).As was mentioned previously, NO 2 is mostly generated by human activities, and beyond the western border of Poland there are many human settlements.On the other hand, beyond the eastern border there are more forests, arable lands, grasslands, and wetlands.
The dependence above was confirmed by the distribution of surface NO 2 concentration and NO 2 TVCD with respect to the circulation types.Therefore, the pollution was higher during southern and western circulations, while the lowest pollution was observed during the northern and eastern circulations.Additionally, the pollution was higher during gradientless circulations (suffix o) when the air masses were slack and air mixing was obstructed (Figure 6e).As was already pointed out, the anthropogenic factors were also very significant for the distribution of NO 2 pollution.We found positive trends of 0.9 µg/m 3 and 0.7 × 10 5 µmol/m 2 increases in surface concentration and columnar NO 2 density, respectively, with increases in population density for the 10,000 people per pixel.However, these trends were not so evident or constant.We observed an exception for the 70-80,000 people-per-pixel interval, which was characterized by evident decreases in surface and columnar NO 2 .On the other hand, it was hard to interpret these two last intervals due to the poor numbers of observations, representing 1.8% and 0.9% of all observations, respectively.Moreover, it was assumed that in such areas, there was a lot of residential construction and a lower road density, positively influencing the NO 2 emissions [23,29,31].Regardless, a significant trend of increasing NO 2 was observed for growing populations (Figure 6g).
The nighttime radiance was the last analyzed variable with respect to changes in NO 2 concentration and columnar density.The tendency of the two types of air pollution was positive for the surface NO 2 but was negative for the TVCD.For the surface NO 2 , we observed a 1.1 µg/m 3 increase per 20 nW/m 2 /sr, while we observed an average 0.8 × 10 5 µmol/m 2 decrease per 20 nW/m 2 /sr for TROPOMI NO 2 .However, poor numbers of observations could lead to misleading conclusions.The last two intervals covered only 1.2% and 1.7% of the total, respectively.If they were excluded from the analysis, the trends for both the surface and satellite-measured NO 2 would be positive (Figure 6f).
To sum up, changes in NO 2 pollution, for both the surface mass concentration and columnar density, were observed with respect to the different meteorological conditions and anthropogenic factors.PBLH was the variable that was distinguished by the highest change in air pollution through the intervals.It corresponded well with the RF model and MLM model, which were strongly dependent on the PBLH.Additionally, lower air temperatures implied higher air pollution.This corresponded well with the significant impact on the model by the RAD, which was strictly connected to the temperature but also connected to the PBLH.Furthermore, it was confirmed that the wind speed affected the NO 2 pollution.A higher wind speed implied a lower surface NO 2 concentration and TVCD of NO 2 .The analysis on the advections and circulation types confirmed that southern and western advections implied higher levels of NO 2 .Finally, we affirmed the influence of human activities and settlements, with higher population densities and higher radiance levels due to nightlights being linked to increases in NO 2 .

Discussion
Chan et al. [36] performed similar research for Germany, which is quite climatically similar to Poland.They found a value of R 2 = 0.64 using a neural network approach.However, they compared the daily averaged surface mass concentration of NO 2 to the daily averaged NO 2 TVCD, so their result is not fully comparable with our results [36].Chan et al. [36] also verified the impact of each variable on their model.Both our results and their results show that the TVCD of NO 2 is the most important variable when trying to synergize satellite, meteorological, and geographical data in order to estimate surface pollution concentrations.On the other hand, the second most important variable in the mentioned study was elevation, which was one of the least important variables in our model.It is supposed that Germany has a more varied topography, so that variable has more influence.What is interesting is that in Chan et al.'s study, the impact of the PBLH on the model was very low [36].For the proposed RF model, the PBLH was the most important meteorological variable for hourly estimates and the second one for weekly estimates.This finding corresponds with the results reported by Kim et al. [29], who found that the PBLH was the most important meteorological factor (4th most important of all variables) in their model.In contrast, they claimed that the solar radiation influenced the results least in their model, while for the proposed model it was as important as the PBLH.However, for modeling purposes they used the hour and day of the year, for which the solar radiation was a kind of derivative [29].Thus, the influence of the solar radiation could be explained by those two variables.The validation of the model performed by Kim et al. [29] showed a value of R 2 = 0.59, which was similar to our results.Again, just like in our study and Chan et al.'s study [36], the model proposed by Kim et al. [29] tends to underestimate the surface NO 2 for highly polluted days.However, they claimed that this was intentional in order to optimize the predictions for mean concentrations [29].In other studies, in which only the NO 2 TVCD was used to the estimate surface NO 2 , worse results were achieved, including R 2 = 0.45 in Griffin et al.'s study [16] and R 2 = 0.48 in Jeong and Hong's study [19].
In the only study on TROPOMI NO 2 performed for Poland, Kawka et al. [87] compared NO 2 TVCD with surface NO 2 concentrations estimated using the GEM-AQ model [88] with the use of averaging kernels derived from the S-5P datasets [87].They estimated monthly averages, so their results cannot be compared with our results.However, it was pointed out that the estimates were more accurate when PBLH data were added to the model [87].
Our estimates were also compared to estimates provided by the CAMS model for the ensemble median of the models.Because of the availability of the CAMS data (over a three-year rolling archive), a comparison was performed for the 10 November 2018-30 June 2021 period.In this respect, the MAE, MAPE, and R 2 values of the RF model could be different from the values in Section 3.2.
The comparison revealed that the proposed RF model estimated surface NO 2 concentrations with ca.15% better accuracy than the CAMS ensemble median model (ca.1.5 µg/m 3 ).The R 2 for the RF model was 0.08 higher than the R 2 for the CAMS model.Both the RF model and CAMS model generally underestimated the NO 2 surface mass concentrations (Table 4).Figure 7 shows the spatial distribution and variability of the modeled surface NO 2 concentrations obtained from the RF model (Figure 7a,c), which were developed within the study, as well as the predictions derived from the CAMS model (ensemble median) (Figure 7b,d).The overall distribution of the surface NO 2 mass concentrations is quite similar across the years and methods.The most polluted areas, around cities such as Warszawa, Lodz, Katowice, Krakow, and Wroclaw, are distinguished by surface NO 2 ca.concentrations of ca. 10 µg/m 3 by both the RF and CAMS models.However, there are differences over the other capital cities (smaller cities) such as Bialystok, Olsztyn, Lublin, Rzeszow, and Gorzow Wielkopolski.In 2019 and 2020, the RF model (Figure 7a,c) indicated a slightly higher NO 2 level than the CAMS ensemble median (Figure 7b,d).In Figure 7a,c, single pixels can be observed over these cities (which clearly differ from the surrounding areas), which cannot be observed in Figure 7b,d.Moreover, the same situation can be observed for Bydgoszcz and Torun in 2020.According to the RF model, a mass concentration range of 4-6 µg/m 3 NO 2 was observed for these cities, while in the neighborhoods it was 2-4 µg/m 3 .The CAMS ensemble median indicated that in Torun and Bydgoszcz, as well as around these cities, the surface NO 2 concentrations were 4-6 µg/m 3 .At this point, it should be remembered that NO 2 is strictly connected to human activities, so the air pollution in the city should be higher than in the surrounding area.Thus, it is claimed that the proposed RF model is more sensitive in estimating surface NO 2 mass concentrations over human settlements.On the other hand, the CAMS ensemble median (Figure 7b,d) estimated higher surface NO 2 concentrations than the RF model (Figure 7a,c) over the area that starts from Warsaw and passes through Lodz, then reaches Katowice and Krakow.This is where one of the busiest motorways in Poland is located.It seems that the CAMS ensemble median is more sensitive to linear traffic emissions.
(Figure 7b,d) estimated higher surface NO2 concentrations than the RF model (Figure 7a,c) over the area that starts from Warsaw and passes through Lodz, then reaches Katowice and Krakow.This is where one of the busiest motorways in Poland is located.It seems that the CAMS ensemble median is more sensitive to linear traffic emissions.The model developed within this study is distinguished by its better accuracy (Table 4) and spatial recognition for surface NO2 concentrations than the CAMS (ensemble median) model (Figure 7).However, it must be remembered that the CAMS model provides data on many pollutants.Moreover, the CAMS data are provided hourly.The RF model created within the study is dependent on satellites, so it is impossible to predict surface NO2 concentrations every day using the described approach.
The proposed RF model is prone to similar aberrations as the models used in previous studies; high surface NO2 mass concentrations are underestimated while very low values (<5 μg/m 3 ) are overestimated [29,31,36,87,89].The previous studies did not indicate specific reasons for this but pointed out that it was characteristic of the modeling of NO2 mass concentrations in itself.Underestimations of high values and overestimations of low values were also observed for the ensemble median model estimates provided by CAMS [90,91].It is assumed that the main reason for the underestimations in the proposed random forest model's results is the underestimation of columnar values of NO2 provided by the Sentinel-5P measurements, which was the most important variable within the model.The underestimation of the columnar density is a widely known issue that has been described many times since the launch of S-5P.It was described in studies on Sentinel-5P NO2 concentrations [38,92].Moreover, it was observed by Griffin et al. [16] in a study performed over Canada, by Ialogno et al. [14] in a study performed over Finland, and by Liu et al. [93] in a study performed over China.Thus, this issue has been observed all over the The model developed within this study is distinguished by its better accuracy (Table 4) and spatial recognition for surface NO 2 concentrations than the CAMS (ensemble median) model (Figure 7).However, it must be remembered that the CAMS model provides data on many pollutants.Moreover, the CAMS data are provided hourly.The RF model created within the study is dependent on satellites, so it is impossible to predict surface NO 2 concentrations every day using the described approach.
The proposed RF model is prone to similar aberrations as the models used in previous studies; high surface NO 2 mass concentrations are underestimated while very low values (<5 µg/m 3 ) are overestimated [29,31,36,87,89].The previous studies did not indicate specific reasons for this but pointed out that it was characteristic of the modeling of NO 2 mass concentrations in itself.Underestimations of high values and overestimations of low values were also observed for the ensemble median model estimates provided by CAMS [90,91].It is assumed that the main reason for the underestimations in the proposed random forest model's results is the underestimation of columnar values of NO 2 provided by the Sentinel-5P measurements, which was the most important variable within the model.The underestimation of the columnar density is a widely known issue that has been described many times since the launch of S-5P.It was described in studies on Sentinel-5P NO 2 concentrations [38,92].Moreover, it was observed by Griffin et al. [16] in a study performed over Canada, by Ialogno et al. [14] in a study performed over Finland, and by Liu et al. [93] in a study performed over China.Thus, this issue has been observed all over the world.Regarding the reasons for underestimations, these include the coarse resolution of the a priori-derived profiles (1 • × 1 • ), aerosol impacts, uncertainties regarding the cloud fraction [86], and the size of the TROPOMI resolution for the atmospheric mass factor (AMF), which cannot resolve street-level variations in concentrations [94].As such, algorithms dedicated to providing columnar NO 2 concentrations from TROPOMI have been developed.This was the motivation for reprocessing the Sentinel-5P TROPOMI NO 2 product (S5P-PAL) with the algorithm from version 2.3.1, which was supposed to improve the accuracy of the NO 2 column density estimates over the most polluted areas [38].Underestimations of the TVCD of NO 2 had been also found for data derived by the OMI [94][95][96] and GOME [97].As a reason for the overestimation of low values, it has been indicated that during clear-sky conditions, which are also relatively clean air conditions, the AMF is underestimated and the NO 2 TVCD is overestimated [98].
To sum up, the proposed model underestimates high values and overestimates surface NO 2 mass concentrations.The estimates for the second and third quartiles were quite similar to the actual values.This corresponded well with the other studies and models.The fact that the problem has appeared within studies using the TROPOMI as well as the OMI and GOME means that the issue is caused by the satellite measurement itself, and not by the method of prediction.In contrast to the previous studies, the factors with the greatest effects on the estimates were different for the Polish study area.This led to the conclusion that in different regions, different variables are the most significant for the surface NO 2 mass concentrations.

Conclusions
This study aimed to estimate NO 2 surface mass concentrations based on columnar NO 2 obtained from Sentinel-5P measurements, meteorological conditions, and anthropogenic and geographical factors.Four approaches, including linear regression, multilinear regression, and machine learning methods such as random forest (RF) and support vector machine, were used to achieve the goal of this research.The estimations were performed with respect to the time ranges (hourly and weekly).
It was found that the random forest (RF) and support vector machine (SVM) models were the most accurate methods for estimating surface NO 2 concentrations.However, the RF model was chosen as the best solution as it was faster and required a computer with less computing power, making it applicable for further usage.The RF model demonstrated MAE values of 3.4 µg/m 3 (MAPE~37%) and 3.2 µg/m 3 (MAPE~31%) for the hourly and weekly estimates, respectively (Section 3.2).It also has to be pointed out that the ground measurements may have been influenced by some errors.In particular, the location of a certain station may not be representative for the whole pixel area, in addition to the significant NO 2 equipment uncertainty.In fact, the proposed model may be more accurate than is described by the statistics.Furthermore, it was observed that the RF model could be used for at least 120 days per year due to the cloud-free conditions.However, areas with ca.180 days a year with favorable conditions for estimation were distinguished (southern-western Poland) (Section 3.1).
The variables' impact on the estimations was also verified.It was found that the S-5P TVCD of NO 2 was the most important variable, which explained more than 50% of the predictions.Other important variables were the nightlights, solar radiation flux, road density, population, and planetary boundary layer height (PBLH).The impact of the PBLH was particularly significant, as suggested by Kawka et al. [87].They used only TROPOMI data, but they suggested using the PBLH in further research.The data quantity of the proposed model was quite similar to previous studies performed for other areas [16,19,29,36] (see Section 4).Importantly, the predictions obtained with the proposed RF model were better fitted to the actual surface NO 2 concentrations than the CAMS median ensemble estimations (ca.15% better accuracy), which suggested that the RF model is more accurate for the area of Poland.
Changes in NO 2 pollution, both in terms of the surface mass concentration and columnar density, were observed with respect to various meteorological conditions.The most significant changes were observed for the increase in PBLH when the air pollution decreased by 2.7 µg/m 3 at 250 m (Figure 5b).Additionally, lower air temperatures and lower wind speeds implied higher air pollution.Moreover, western and southern advections as well as atmospheric circulations were favorable to higher levels of NO 2 (Section 3.4).
To conclude, the key findings of this research were as follows: • There were at least 120 days per year when it was possible to perform model calculations for Poland using TROPOMI observations;

•
The results revealed that the machine learning methods (RF and SVM) gave ca.63% accuracy for hourly estimations of NO 2 and 69% accuracy for weekly averages;

•
The implementation of meteorological and anthropogenic variables improved the quality of the models.The MLM approach gave 6% (hourly) and 7% (weekly) lower MAPE values than the LM-S5P approach, while the RF and SVM approaches gave 11% (hourly) and 12% (weekly) lower MAPE values than the LM-S5P approach;

•
The planetary boundary layer height, solar radiation, nightlights, roads density, and population influenced the estimations the most; • The air temperature, wind speed, surface pressure, elevation, and type of atmospheric circulation influenced the estimations the least; • The trends for the surface NO 2 and TVCD of NO 2 were negative for increases in air temperature, PBLH, and wind speed, while they were positive for increases in population and nightlights;

•
The RF model created within the study was better fitted to the actual values than the CAMS ensemble median model.
The conclusions drawn from this study will be further used to improve the model and to verify its accuracy for other areas.Our attention was paid to the issue of the station's representativeness.Thus, in further research, we will verify the station's representativeness with respect to the area of the TROPOMI pixel.In addition, information on the vertical variability of the NO 2 concentrations (especially at the PBL and above) could improve the estimations of the surface concentrations.For this purpose, drone observations will be used to determine the NO 2 profiles and to improve our models.

Figure 1 .
Figure 1.Location of the GIOS stations measuring atmospheric NO2 mass concentrations.The green circles correspond to stations located over non-built-up areas (background stations), the orange circles correspond to stations located over suburban areas, and the red circles correspond to stations located over urban areas.

Figure 1 .
Figure 1.Location of the GIOS stations measuring atmospheric NO 2 mass concentrations.The green circles correspond to stations located over non-built-up areas (background stations), the orange circles correspond to stations located over suburban areas, and the red circles correspond to stations located over urban areas.
(a) Permute the variable; (b) Calculate the new model MSE according to the variable permutation; (c) Take the difference between the model MSE and new model MSE; 3.Collect the results in a list; 4.
2.1, only available observations for 2019 and 2020 were calculated.The numbers of cloud-free observations over Poland depended on the location.The highest numbers of such observations were observed both in 2019 and 2020 over the southwestern part of Poland (170-180 observations in 2019, 190-200 observations in 2020) and over the southeastern part of Poland (150-160 observations in 2019 and 190-200 observations in 2020).The lowest numbers of cloud-free observations were noticed over the northern part of Poland near the Baltic Sea area, where only 130-140 observations were noted in 2019 and 2020.Additionally, the area near the northeastern border was characterized by low numbers of cloud-free images; there were 130-140 cloud-free observations in each year (Figure

Figure 2 .
Figure 2. Days when at least one S5-P image with a qa_value > 0.75 and cloud fraction < 0.50 (further: useful image) was captured: (a) each pixel represents the number of days when at least one useful image was captured in 2019; (b) each pixel represents the number of days when at least one useful image was captured in 2020; (c) the frequency of the pixels with a qa_value > 0.75 and cloud fraction < 0.50 in 2019; (d) the frequency of pixels with a qa_value > 0.75 and cloud fraction < 0.50 in 2020.

Figure 2 .
Figure 2. Days when at least one S5-P image with a qa_value > 0.75 and cloud fraction < 0.50 (further: useful image) was captured: (a) each pixel represents the number of days when at least one useful image was captured in 2019; (b) each pixel represents the number of days when at least one useful image was captured in 2020; (c) the frequency of the pixels with a qa_value > 0.75 and cloud fraction < 0.50 in 2019; (d) the frequency of pixels with a qa_value > 0.75 and cloud fraction < 0.50 in 2020.

27 Figure 3 .
Figure 3. Binned scatter plots showing the distribution of predicted vs. actual (ground-based observations) NO2 mass concentrations: (a-d) hourly measurements; (e-h) weekly averages.The black line shows a perfect agreement, while the red line is according to the predicted values vs. the actual values: (a,e) LM-S5P; (b,f) MLM; (c,g) RF; (d,h) RF.The count is the frequency of actual NO2 values expressed using a logarithmic scale.

Figure 3 .
Figure 3. Binned scatter plots showing the distribution of predicted vs. actual (ground-based observations) NO 2 mass concentrations: (a-d) hourly measurements; (e-h) weekly averages.The black line shows a perfect agreement, while the red line is according to the predicted values vs. the actual values: (a,e) LM-S5P; (b,f) MLM; (c,g) RF; (d,h) RF.The count is the frequency of actual NO 2 values expressed using a logarithmic scale.

Figure 4 .
Figure 4. Quantile-quantile plots, showing (a,c) density plots and (b,d) differences between values predicted by the RF model for surface NO2 mass concentrations and actual surface NO2 mass concentrations: (a,b) hourly measurements; (c,d) weekly averages.

Figure 4 .
Figure 4. Quantile-quantile plots, showing density plots and differences between values predicted by the RF model for surface NO 2 mass concentrations and actual surface NO 2 mass concentrations: (a) hourly measurements; (b) weekly averages.

Figure 5 .
Figure 5. Percentage increases in MSE when excluding variables from the RF model: (a) hourly measurements; (b) weekly averages.

Figure 5 .
Figure 5. Percentage increases in MSE when excluding variables from the RF model: (a) hourly measurements; (b) weekly averages.

Figure 6 .
Figure 6.Changes in surface NO2 mass concentration(μg/m 3 ) and TVCD (umol/m 2 ) x10 5 of NO2 due to meteorological conditions and anthropogenic factors.On the left y axis, the red font and red bars

Figure 6 .
Figure 6.Changes in surface NO 2 mass concentration (µg/m 3 ) and TVCD (umol/m 2 ) × 10 5 of NO 2 due to meteorological conditions and anthropogenic factors.On the left y axis, the red font and red bars correspond to the surface NO 2 mass concentrations, while on the right y axis, the blue font and blue bars correspond to the TVCD of NO 2 .Every chart shows changes in air pollution with respect to different variables: (a) air temperature; (b) planetary boundary layer height; (c) wind speed; (d) wind direction; (e) atmospheric circulation; (f) nighttime radiance; (g) population.Note: % on the bars refers to the percentage of observations of the total dataset.

Figure 7 .
Figure 7. Maps of estimated surface mass concentrations of NO2 (μg/m 3 ) for yearly averages: (a) based on the RF model for 2019; (b) based on the CAMS ensemble median for 2019; (c) based on the RF model for 2020; (d) based on the CAMS ensemble median for 2020.The red points are the capital cities of the voivodeships (NUTS-2).

Figure 7 .
Figure 7. Maps of estimated surface mass concentrations of NO 2 (µg/m 3 ) for yearly averages: (a) based on the RF model for 2019; (b) based on the CAMS ensemble median for 2019; (c) based on the RF model for 2020; (d) based on the CAMS ensemble median for 2020.The red points are the capital cities of the voivodeships (NUTS-2).
85].2.3.4.Determining the Variability of the NO 2 TVCD and Surface NO 2 with Respect to the Meteorological Conditions and Anthropogenic FactorsThere were verified changes in the surface concentrations of NO 2 and NO 2 TVCD due to different meteorological conditions (T, PBLH, P, WS, WD, CTYPE), as well as anthropogenic and geographical factors (nightlights and population).The means of the NO 2 pollution levels (surface and columnar) were calculated for the following intervals:

Table 1 .
Statistics of prediction of NO 2 surface mass concentration with use of various techniques: LM-S5P-Linear regression with one independent variable (NO 2 derived by Sentinel-5P); MLM-Multiple linear regression with several independent variables; RF-Random forest; SVM-Support Vector Machine.

Table 2 .
Bias [µg/m 3 ] for estimations performed by RF model within the quartiles both for hourly measurements and weekly averages.

Table 4 .
Comparison of RF and CAMS ensemble median models' statistics.