Comparison of S5P/TROPOMI Inferred NO2 Surface Concentrations with In Situ Measurements over Central Europe

The aim of this paper is to evaluate the surface concentration of nitrogen dioxide (NO2) inferred from the Sentinel-5 Precursor Tropospheric Monitoring Instrument (S5P/TROPOMI) NO2 tropospheric […] Simulations of the NO2 tropospheric vertical column densities and surface concentrations […] In summer, biases are generally higher for all station types, especially for the traffic stations (~−75%), ranging from −54% to −30% for the background and industrial stations.


Introduction
Nitrogen oxides (NOx = NO + NO2) play a significant role in tropospheric chemistry and have a negative impact on air quality. Nitrogen dioxide (NO2) is a precursor of tropospheric ozone and aerosols [1–3] and has been associated with premature deaths and high mortality rates [4,5]. The short lifetime and the high spatial variability of NO2 […] August 2019), from an ascending sun-synchronous polar orbit [21–23]. The high spatial resolution, combined with an improved signal-to-noise ratio compared to previous spaceborne instruments, allows the detection of local anthropogenic emission sources [8,24–26]. The tropospheric NO2 VCD retrieval is performed in the 405-465 nm range. TROPOMI data are generated with the DOMINO (Dutch OMI NO2) algorithm, which is also used for the Ozone Monitoring Instrument (OMI) data production [27,28]. Details concerning the standard operational NO2 product can be found in the Algorithm Theoretical Basis Document [22] and the Product User Manual [29].
In this work, we use offline v1.3 and v2.3 S5P/TROPOMI tropospheric vertical column densities (VCDs) for two sensing periods (June-July 2019 and December 2019-January 2020). According to the latest validation report [30], the TROPOMI v1.3 tropospheric column densities show a mean negative bias of −32% relative to ground-based MAX-DOAS data from 27 stations. The bias depends on the pollution level at the station, being positive over clean areas (18%) and negative (−46%) over highly polluted areas, but it is overall within the mission requirement of 50%. The low TROPOMI tropospheric VCDs over highly polluted areas led to improvements in the TROPOMI NO2 retrieval algorithm, resulting in version 2.3. The new TROPOMI v2.3 data, publicly available via the ESA Sentinel-5P Product Algorithm Laboratory (S5P-PAL), show on average 10-40% higher tropospheric VCDs than the v1.3 data, depending on the level of pollution and the season, especially in winter over mid and high latitudes [23]. Comparisons with ground-based data show that the v2.3 TROPOMI data reduce the mean bias of the tropospheric columns from −32% to −23%, of the stratospheric columns from −6% to −3% and of the total columns from −12% to −5% [23].
For the purposes of this study, we use observations of mostly cloud-free scenes with an associated quality flag higher than 0.75. The daily orbital files for the sensing periods are gridded onto a 0.05° × 0.1° grid covering the European domain. Figure 1 […] Figure A1. The v2.3 columns show higher values (by 16% on average) over highly polluted areas, namely the Po valley in northern Italy, the cities of Essen, Dusseldorf and Koln in western Germany and the cities of Rotterdam, Brussels and Paris. The differences are more pronounced in winter, whereas the v1.3 TROPOMI data show slightly higher NO2 VCDs over a handful of rural and background areas in central Germany. More specifically, the TROPOMI v2.3 tropospheric VCDs are about 3% higher than the v1.3 dataset in June and July, whereas in winter the v2.3 data are 11-18% higher over the whole domain. Over the hotspots mentioned above, the difference is approximately 13% in summer and ranges between 16% and 26% in winter, as reported in [23].
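The quality filtering and gridding step can be illustrated with a simple sketch that averages pixel-centre values into 0.05° × 0.1° cells. This is an assumption for illustration only: the text does not state which gridding scheme is used (area-weighted oversampling is a common alternative), the function name `grid_orbit` and its default bounds (borrowed here from the inner LOTOS-EUROS domain) are hypothetical, and pixel coordinates are assumed to be finite.

```python
import numpy as np


def grid_orbit(lat, lon, vcd, qa, lat_bounds=(39.0, 55.0), lon_bounds=(2.0, 18.0),
               dlat=0.05, dlon=0.1, qa_min=0.75):
    """Average quality-filtered pixel values onto a regular lat/lon grid.

    Simple pixel-centre binning (hypothetical helper, not the study's code).
    Cells without valid pixels are returned as NaN.
    """
    lat, lon, vcd, qa = map(np.asarray, (lat, lon, vcd, qa))
    nlat = int(round((lat_bounds[1] - lat_bounds[0]) / dlat))
    nlon = int(round((lon_bounds[1] - lon_bounds[0]) / dlon))
    # keep mostly cloud-free pixels (quality flag above the threshold)
    keep = (qa > qa_min) & np.isfinite(vcd)
    i = np.floor((lat - lat_bounds[0]) / dlat).astype(int)
    j = np.floor((lon - lon_bounds[0]) / dlon).astype(int)
    # discard pixels whose centre falls outside the domain
    keep &= (i >= 0) & (i < nlat) & (j >= 0) & (j < nlon)
    total = np.zeros((nlat, nlon))
    count = np.zeros((nlat, nlon))
    np.add.at(total, (i[keep], j[keep]), vcd[keep])
    np.add.at(count, (i[keep], j[keep]), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)
```

Daily grids produced this way can then be averaged over each sensing period.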

LOTOS-EUROS CTM Simulations
In this work, the LOTOS-EUROS chemical transport model is used [31]. LOTOS-EUROS is one of the nine state-of-the-art models used in the Copernicus Atmosphere Monitoring Service (CAMS) to provide air quality forecasts to a broad range of users. It simulates distinct components (e.g., oxidants, primary aerosol, heavy metals) in three dimensions in the lower atmosphere [32]. The model has been used in a wide range of air quality studies and has been extensively evaluated against in situ measurements. An evaluation against ground-based measurements over two major Greek cities showed that the modeled NO2 surface concentrations have a high spatial correlation of 0.86 and a moderate underestimation of about 10% [33]. When compared to industrial stations located near power plants in Greece, the mean seasonal bias of the LOTOS-EUROS NO2 surface concentrations improves from 10 µg/m3 to 2 µg/m3 after assimilating the spaceborne TROPOMI NO2 observations [34]. The model has also recently been used to estimate changes in NO2 emissions due to the COVID-19 pandemic restrictions [26] and to study NO2 concentrations attributed to shipping activity in the Mediterranean and the Black Sea basin [35]. Both studies show good agreement with TROPOMI satellite observations (R~0.95).
For this study, we use the LOTOS-EUROS v2.02.001 open-source version, and more specifically the NO2 tropospheric vertical column densities and NO2 surface concentrations over the central European domain and at selected pixels representative of the ground-based station locations. A nested domain configuration was used: two model runs with different spatial resolutions were performed to ensure a smooth transition of the dynamics between a coarser European domain and a refined central European domain. The first (outer) run covers the European domain from 15° W to 45° E and from 30° N to 60° N with a horizontal resolution of 0.25° × 0.25°. Boundary and initial conditions of this run are obtained from the CAMS global near-real-time (NRT) product with a spatial resolution of 35 km × 35 km. The second (inner) run was performed for central Europe, from 2° E to 18° E and from 39° N to 55° N, with a horizontal resolution of 0.05° × 0.10° (latitude × longitude). Boundary conditions of the inner run are obtained from the lower-resolution outer domain. The model simulations are driven by operational meteorological data from the European Centre for Medium-Range Weather Forecasts (ECMWF) with a horizontal resolution of 7 km × 7 km [36]. Finally, the anthropogenic emission inventory used in the model is CAMS-REG (CAMS regional European emissions) v4.2 for the year 2017, available at 0.05° × 0.10° [37].
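As a quick consistency check of the nested configuration, the grid dimensions implied by the stated bounds and resolutions can be computed. The `Domain` class below is a hypothetical helper written for this illustration; it is not LOTOS-EUROS configuration syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """Regular lat/lon model domain; bounds and spacing in degrees."""
    west: float
    east: float
    south: float
    north: float
    dlon: float
    dlat: float

    def shape(self):
        # number of cells as (nlat, nlon); round() guards against float error
        nlat = int(round((self.north - self.south) / self.dlat))
        nlon = int(round((self.east - self.west) / self.dlon))
        return nlat, nlon

    def contains(self, other):
        # a nested (inner) domain must lie entirely inside its parent
        return (self.west <= other.west and other.east <= self.east
                and self.south <= other.south and other.north <= self.north)


# Values as stated in the text (outer: 0.25 x 0.25 deg; inner: 0.05 lat x 0.10 lon)
outer = Domain(-15.0, 45.0, 30.0, 60.0, dlon=0.25, dlat=0.25)
inner = Domain(2.0, 18.0, 39.0, 55.0, dlon=0.10, dlat=0.05)
```

With these settings the outer grid has 120 × 240 (latitude × longitude) cells and the inner grid 320 × 160 cells, and the inner domain lies entirely within the outer one.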

CAMS Satellite Operator (CSO)
The CAMS Satellite Operator (CSO, https://ci.tno.nl/gitlab/cams/cso, last accessed 30 June 2022) is a toolbox that facilitates the assimilation of satellite observations in regional air quality models. It contains two main components: a preprocessor that can be used to download and convert satellite data, in particular TROPOMI data, and an observation operator that can be added to the source code of a model. With this operator, the model can simulate the satellite retrievals and use them in a data assimilation procedure. The observation operator is included in the LOTOS-EUROS source code and is used to provide the simulations described below.
More specifically, the TROPOMI tropospheric NO2 retrieval product (y_r) is treated by CSO as a profile with one layer from the surface up to 200 hPa. The simulation of a retrieval product from a model state does not require an a priori profile and is given by Equation (1), where:
• y_s is the simulated retrieval, defined on a single-layer profile (n_r = 1);
• A_trop is the tropospheric averaging kernel with shape (n_r, n_a); in this product, n_a = 34 is the number of a priori layers covering the full atmosphere;
• x is a concentration profile defined on model layers covering the full atmosphere; values above 200 hPa are ignored;
• H extracts a simulated profile from the model using vertical and horizontal interpolation.
The tropospheric averaging kernel is derived from Equation (2), using the following entities from the retrieval product:
• A is the total column averaging kernel;
• M is the scalar total column air mass factor;
• M_trop is the tropospheric column air mass factor;
• l_tp is the index of the layer containing the tropopause in the a priori profile.
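The two relations can be sketched in Python as below. The kernel form A_trop,l = A_l · M / M_trop for layers up to and including the tropopause layer, and zero above it, follows the TROPOMI NO2 Product User Manual as we understand it. The 0-based tropopause index, the assumption that H has already mapped the model profile onto the 34 a priori layers as partial columns, and the function names are illustrative; this is not CSO code.

```python
import numpy as np


def tropospheric_kernel(A, M, M_trop, l_tp):
    """Tropospheric averaging kernel from the total-column kernel.

    A: total-column averaging kernel on the n_a a priori layers (surface first).
    M, M_trop: total and tropospheric air mass factors.
    l_tp: 0-based index of the layer containing the tropopause (inclusive).
    """
    A_trop = np.asarray(A, dtype=float) * (M / M_trop)  # new array, input untouched
    A_trop[l_tp + 1:] = 0.0  # layers above the tropopause do not contribute
    return A_trop


def simulate_retrieval(A_trop, x):
    """Simulated single-layer tropospheric column, y_s = A_trop . H(x).

    x: model partial columns already mapped onto the a priori layers (assumed).
    """
    return float(np.dot(A_trop, x))
```

Because A_trop is zero above the tropopause, model values in the upper layers drop out of the simulated column automatically.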
The air mass factors in the TROPOMI product are based on simulations with the TM5-MP global atmospheric model [38] at the coarser resolution of 0.5° × 0.5°. Consequently, the air mass factors do not represent the strong gradients near high-emitting sources. This can be improved by replacing the original tropospheric averaging kernels of the retrieval, which depend on the TM5-MP a priori profiles, with kernels based on the higher-resolution LOTOS-EUROS model profiles. As described in [29], the first step is to estimate an alternative tropospheric air mass factor (Equation (3)) using the alternative a priori profile (x_a), in our case a LOTOS-EUROS simulation. This factor is then used to obtain the updated retrieval and tropospheric averaging kernel as scaled versions of the original variables, as described in Equations (4) and (5).
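For reference, the relations described above can be written out as follows. This is a reconstruction following the averaging-kernel formulation of the TROPOMI NO2 Product User Manual [29], with M'_trop denoting the alternative tropospheric air mass factor, x_{a,l} the partial column of the alternative a priori profile in layer l, and the sums running over the tropospheric layers l ≤ l_tp; the notation and typesetting of the original Equations (3)-(5) may differ.

```latex
\begin{align}
M'_{\mathrm{trop}} &= M_{\mathrm{trop}}\,
  \frac{\sum_{l \le l_{tp}} A_{\mathrm{trop},l}\, x_{a,l}}{\sum_{l \le l_{tp}} x_{a,l}} \tag{3}\\
y'_r &= y_r\, \frac{M_{\mathrm{trop}}}{M'_{\mathrm{trop}}} \tag{4}\\
\mathbf{A}'_{\mathrm{trop}} &= \mathbf{A}_{\mathrm{trop}}\, \frac{M_{\mathrm{trop}}}{M'_{\mathrm{trop}}} \tag{5}
\end{align}
```

If the alternative a priori x_a is the same profile that is being simulated, substituting (5) and then (3) shows that the updated simulation A'_trop · x reduces to the model tropospheric column, the sum of x_l over l ≤ l_tp.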
Thus, the new simulations can be estimated by replacing the variables in Equation (1), initially with Equation (5) and then with Equation (3).
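A minimal Python sketch of the a priori replacement and the resulting simulation follows. It assumes that profiles are already given as partial columns on the retrieval's a priori layers (i.e., H has been applied), that A_trop is already zero above the tropopause, and that layers 0..l_tp (0-based) are tropospheric. The function name `replace_apriori` is hypothetical and not part of CSO.

```python
import numpy as np


def replace_apriori(y_r, A_trop, M_trop, x_a, l_tp):
    """Swap the retrieval's a priori profile for an alternative one.

    y_r: original tropospheric column; A_trop: original tropospheric kernel
    (zero above the tropopause); M_trop: original tropospheric AMF;
    x_a: alternative a priori partial columns (here: a LOTOS-EUROS profile).
    Returns the updated tropospheric AMF, retrieval and averaging kernel.
    """
    A_trop = np.asarray(A_trop, dtype=float)
    x_a = np.asarray(x_a, dtype=float)
    trop = slice(0, l_tp + 1)
    # alternative tropospheric AMF computed with the new a priori profile
    M_trop_new = M_trop * np.dot(A_trop[trop], x_a[trop]) / x_a[trop].sum()
    scale = M_trop / M_trop_new
    y_r_new = y_r * scale        # updated retrieval
    A_trop_new = A_trop * scale  # updated tropospheric averaging kernel
    return M_trop_new, y_r_new, A_trop_new
```

The new simulation is then `np.dot(A_trop_new, x)` for the model profile x. When x_a is that same profile, the result equals the model tropospheric column `x[:l_tp + 1].sum()`, which is a handy sanity check for an implementation.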
The effect of this process is illustrated in Figure 2. Figure 2a shows the original TROPOMI v2.3 NO2 VCDs, while Figure 2b depicts the TROPOMI v2.3 NO2 VCDs after the application of the local air mass factor correction described in Equations (4) and (5). Figure 2c shows the differences between those two datasets for December 2019. The updated TROPOMI v2.3 NO2 VCDs with the local air mass factor correction show sharper gradients, especially over highly polluted areas in western Germany, the Netherlands, Belgium and the Po valley, and over cities such as Paris, Rome and Naples. Increased NO2 VCDs can also be observed over rural and background areas in central Germany and eastern France. Over the whole domain, the updated NO2 VCDs are approximately 20% higher than the original NO2 VCDs. Over hotspots, the updated TROPOMI v2.3 NO2 VCDs are approximately 18% higher for both periods, while over rural areas they are higher by 16% and 22% in summer and winter, respectively.
Note that assimilation is not applied for the purposes of this study. The NO2 products from the CSO process are used as input for the estimation of TROPOMI inferred NO2 surface concentrations, as described in detail in the methodology.

European Environment Agency In Situ Measurements
Hourly in situ measurements of NO2 surface concentrations over central Europe are obtained from the European Environment Agency (EEA) (https://discomap.eea.europa.eu/map/fme/AirQualityExport.htm, last accessed on 11 July 2022). The European air quality database includes air quality monitoring information from all member countries of the European Union (EU) and from some additional countries that cooperate with the EEA. This dataset is used to evaluate the satellite-derived NO2 surface concentrations and the LOTOS-EUROS simulations. In situ measurements at 11:00 UTC were chosen as representative of the S5P overpass time for June, July and December 2019 and January 2020.
For this work, data from 236 stations at more than 20 locations in central Europe were used (Figure 3). The stations were not chosen arbitrarily but according to specific criteria: they had to be distributed over the whole domain, so that comparisons could be made for various regions and area types (traffic, background, rural, etc.). The stations are categorized in the EEA database into the following types: urban traffic (81), suburban traffic (7), urban background (86), suburban background (30), rural background (19), suburban industrial (6) and rural industrial (7).
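Selecting the overpass-time records from the hourly in situ data could look like the sketch below. The (station_id, timestamp, value) record layout and the function name `overpass_records` are hypothetical and do not reflect the EEA export format; timestamps are assumed to be timezone-aware.

```python
from datetime import datetime, timezone

# Study months as stated in the text: June, July and December 2019, January 2020
STUDY_MONTHS = {(2019, 6), (2019, 7), (2019, 12), (2020, 1)}


def overpass_records(records, hour_utc=11):
    """Keep hourly records at the chosen UTC hour within the study months.

    records: iterable of (station_id, timestamp, value) tuples with
    timezone-aware datetimes (hypothetical layout).
    """
    kept = []
    for station, ts, value in records:
        ts_utc = ts.astimezone(timezone.utc)  # normalise any local-time offset
        if ts_utc.hour == hour_utc and (ts_utc.year, ts_utc.month) in STUDY_MONTHS:
            kept.append((station, ts_utc, value))
    return kept
```

Whether the 11:00 label marks the start or the end of the hourly averaging period depends on the data convention and should be checked against the source files.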

Methodology
The objective of this work is to estimate NO2 surface concentrations inferred from S5P/TROPOMI NO2 tropospheric VCDs, using LOTOS-EUROS CTM simulations as input. The methodology is based on Equation (7), originally applied in [9]. According to Equation (7), the satellite-derived NO2 surface concentration equals the ratio of the satellite to the model NO2 tropospheric VCDs multiplied by the model surface concentration in the lowest vertical layer:
$$S_O = \frac{\Omega_O}{\Omega_G}\, S_G \qquad (7)$$
where:
• S_O is the inferred TROPOMI NO2 surface concentration;
• S_G is the NO2 surface concentration of the model;
• Ω_G is the NO2 tropospheric VCD of the model;
• Ω_O is the NO2 tropospheric VCD from the satellite observations.
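Equation (7) is a pixel-wise scaling and can be written as a small vectorised function. This is a sketch: the function name is hypothetical, both VCDs are assumed to be in the same units, and masking pixels with a non-positive model column is a choice made here, not a documented step of the method.

```python
import numpy as np


def inferred_surface_no2(S_G, Omega_G, Omega_O):
    """Satellite-inferred surface NO2, S_O = S_G * Omega_O / Omega_G.

    S_G: model surface concentration in the lowest layer (e.g. ug/m3).
    Omega_G, Omega_O: model and satellite tropospheric VCDs (same units).
    Pixels with a non-positive model column are returned as NaN.
    """
    S_G, Omega_G, Omega_O = (np.asarray(a, dtype=float) for a in (S_G, Omega_G, Omega_O))
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(Omega_G > 0, S_G * Omega_O / Omega_G, np.nan)
```

For example, a model surface value of 20 µg/m3 with a satellite column 25% higher than the model column yields an inferred surface concentration of 25 µg/m3.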
The model vertical profiles are obtained for 11:00 UTC, corresponding to the closest time to the TROPOMI overpass across the domain. Three surface datasets are estimated from three different setups that combine the various simulation and satellite retrieval datasets. More specifically, the first dataset of inferred TROPOMI NO2 surface concentrations is derived using as input the TROPOMI NO2 tropospheric VCDs and the a priori LOTOS-EUROS NO2 tropospheric VCDs and surface concentrations. The second dataset includes the CSO simulations, in which the TM5-MP averaging kernels are applied to the LOTOS-EUROS vertical profiles. Finally, the third dataset is derived by applying to Equation (7) the satellite and model NO2 VCDs, updated with the new air mass factors and averaging kernels, together with the a priori LOTOS-EUROS NO2 surface concentrations. Table 1 describes the model and satellite datasets used in each setup to derive the inferred NO2 satellite surface concentration products, as discussed above.
The modeled and inferred TROPOMI NO2 surface concentrations of the pixels closest to the station locations are selected for comparison with the EEA in situ measurements. To ease the understanding of the process, we provide here a first example of the results, which are discussed in detail further on. Figure 4 shows the derived satellite NO2 surface concentrations (in µg/m3) for July 2019 (left column) and January 2020 (right column) based on the three setups shown in Table 1. Figure 4a,b depict the inferred TROPOMI v2.3 NO2 surface concentrations of the first setup, with the original TROPOMI v2.3 product and the a priori LOTOS-EUROS simulations. Figure 4c,d show the second setup of inferred NO2 surface concentrations, with the inclusion of the CSO simulations and the application of the TM5-MP averaging kernels to the LOTOS-EUROS profiles. Finally, Figure 4e,f illustrate the third setup of the estimated NO2 satellite surface concentrations, with the updated averaging kernels (AKs) and air mass factors. It is evident that as we move from the case of not applying AKs (Figure 4a […]
[Figure 4 caption fragment: (e,f) TROPOMI and LOTOS-EUROS products after application of the CSO AMFs.]
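On a regular latitude-longitude grid, the pixel closest to a station (in terms of cell centres) is simply the cell that contains the station, as in the hypothetical helper below. The grid origin and spacing are parameters (for instance the 0.05° × 0.10° inner domain starting at 39° N, 2° E); this is an illustration, not the study's selection code.

```python
import numpy as np


def nearest_pixel(field, lat0, lon0, dlat, dlon, st_lat, st_lon):
    """Value of the grid cell containing a station on a regular lat/lon grid.

    field: 2D array indexed [lat, lon]; lat0, lon0: south-west grid corner;
    dlat, dlon: cell size in degrees. Returns NaN outside the grid.
    """
    i = int(np.floor((st_lat - lat0) / dlat))
    j = int(np.floor((st_lon - lon0) / dlon))
    nlat, nlon = field.shape
    if 0 <= i < nlat and 0 <= j < nlon:
        return float(field[i, j])
    return float("nan")
```

The same helper can be applied to both the modeled and the inferred surface fields so that both are sampled at identical pixels.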

Investigation into Influencing Quantities
In this section, the quantities that most strongly affect the methodology are studied and the results are analyzed. The derivation of satellite NO2 surface concentrations mainly aims to provide information at locations where ground-based stations do not exist. Therefore, each quantity involved needs to be studied in detail and cross-validated with in situ measurements. Three factors and their effect on the results are examined: the vertical leveling scheme used in the LOTOS-EUROS CTM and the CSO operator simulations (Section 3.1.1), the version of the S5P/TROPOMI satellite data (Section 3.1.2), and the updated air mass factors estimated through the CSO process (Section 3.1.3).

LOTOS-EUROS Vertical Leveling Scheme
Initially, the effect of the LOTOS-EUROS CTM vertical leveling scheme on the results is examined. There are three methods to define vertical layers in the LOTOS-EUROS configuration: the mixed-layer definition, the hybrid-layer definition and the meteo-level definition [39]. The meteo-level definition, used in this study, adopts the level definition of the meteorological data. Layer interfaces can be defined as pressures or heights above the surface, depending on the meteorological data. This option can be more realistic when using the model at high resolution, depending on the application and resolution of the input meteorology.
In this work, the meteo-level definition is used in order to keep the model as consistent as possible with the ECMWF meteorological data. Two different meteo-leveling schemes are applied to the model runs. The first, base setup of model simulations (hereafter the meteo12 leveling scheme) uses 12 vertical layers, while the second (hereafter the meteo34 leveling scheme) uses the same configuration as the ECMWF model, with 34 vertical layers. The meteo12 model simulations extend to approximately 9 km, whereas the meteo34 simulations extend to 30 km. Both schemes include eight additional layers above the model top, giving 20 and 42 layers in total, respectively, which are filled by the boundary conditions in order to cover the full atmosphere for the simulation of total NO2 columns. The first three layers of both schemes are identical. In meteo12, the layers are coarsened above the first three model levels, whereas meteo34 is more detailed, with each vertical layer corresponding to an ECMWF vertical layer. The meteo12 scheme is very efficient in terms of computation time, while the meteo34 scheme is computationally more expensive [40]. Figure 5 shows the LOTOS-EUROS NO2 surface concentration of the first model layer for both leveling schemes for a zoomed-in area covering Belgium, western Germany and the Netherlands. At first glance, no significant differences can be seen between the NO2 surface concentrations of the meteo12 (Figure 5a) and meteo34 (Figure 5b) leveling schemes. The absolute differences between the meteo34 and meteo12 NO2 surface concentrations (Figure 5c), however, show that the meteo34 leveling scheme results in slightly higher concentrations over high-emitting land areas (by a mean of +1.2 µg/m3) and sharper gradients over hotspots, but generally lower concentrations over the sea.
The same pattern is also observed in the winter months (Figure A2). A more detailed view of the differences between the two leveling schemes is given by the vertical profiles of the modeled NO2 concentrations for June 2019 (Figure 6) and January 2020 (Figure A3). Layer interfaces are defined as heights above the surface according to the ECMWF data. Figures 6a and A3a show the vertical profiles of a hotspot pixel, while Figures 6b and A3b depict the vertical profiles of a rural pixel. Figures 6c and A3c show the differences between the two vertical schemes. The hotspot and rural pixels are selected as the pixels closest to a traffic and a rural station in the Netherlands, within the city of Amsterdam. Differences between the vertical profiles of the two leveling schemes are calculated for the first 12 common reference heights, namely the top of each layer of the meteo12 scheme, for both the hotspot and rural pixels (Figures 6c and A3c). In both summer and winter, the meteo34 scheme shows higher concentrations for the first three layers. In contrast, meteo12 shows higher NO2 concentrations between the fifth and the ninth layer (between 0.12 and 1.5 km), while for higher layers the differences become negligible. Differences are more pronounced for the hotspot pixel, where the meteo34 leveling scheme shows higher NO2 concentrations for the first three layers by 0.2 µg/m3 in June 2019 and by 0.9 µg/m3 in January 2020 (Figure A3c). The rural differences are an order of magnitude smaller than those for the hotspot pixel. Overall, for the first model layer, which is used to derive the satellite NO2 surface concentrations, the meteo34 leveling scheme shows 2-4% higher concentrations over the hotspot pixel and 6-10% higher concentrations over the rural pixel for both periods. For the whole central European domain and the first model layer, the meteo34 leveling scheme shows approximately 5% higher NO2 concentrations in the summer months and 3% in the winter months.
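The profile comparison at common reference heights can be sketched as follows. Linear interpolation of the meteo34 profile onto the meteo12 layer tops is an assumption made for this illustration, since the text does not say how values are matched; where a meteo12 layer top coincides with a meteo34 layer top, the interpolation simply returns the meteo34 value. Heights must be increasing, and the function name is hypothetical.

```python
import numpy as np


def profile_differences(z12, c12, z34, c34):
    """Differences (meteo34 - meteo12) at the meteo12 layer-top heights.

    z12, c12: layer-top heights (km, increasing) and concentrations, meteo12.
    z34, c34: the same for meteo34, linearly interpolated onto z12.
    """
    c34_on_12 = np.interp(z12, z34, c34)
    return c34_on_12 - np.asarray(c12, dtype=float)
```

Positive values then indicate layers where the meteo34 scheme gives higher NO2 concentrations, matching the sign of the differences discussed above.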
The LOTOS-EUROS NO2 surface concentrations of the first layer from both leveling schemes are applied to the third setup (Table 1). Inferred TROPOMI v2.3 NO2 surface concentrations are then estimated for each station type and studied period. The output consists of two surface products, updated with the new air mass factors and averaging kernels and derived from the two different leveling schemes. These newly estimated datasets are intercompared for all station types and studied periods in order to assess the effect of the leveling scheme on the implemented methodology. Figure 7 shows the scatter density plots of rural background (Figure 7a) and rural industrial (Figure 7c) stations for the two leveling schemes (Figure 7b,d) for the winter. The TROPOMI-inferred NO2 surface concentrations show an overall good agreement with the in situ measurements for both station types, with correlation coefficients between 0.53 and 0.70. More specifically, the rural background NO2 surface concentrations derived from the meteo12 leveling scheme show a correlation of 0.53 and a slope of 0.67, whereas those derived from the meteo34 leveling scheme show a slightly higher correlation (0.55) and a slope of 0.75.
Rural industrial correlations are 0.70 and 0.67, and the slopes are 0.79 and 0.94, respectively. Tables 2 and A1 summarize the correlation coefficient, slope, and relative bias of the comparison between the inferred NO2 TROPOMI v2.3 surface concentrations, derived from each leveling scheme, and the in situ measurements for winter and summer, respectively. Overall, the use of the meteo34 leveling scheme leads to improvements in almost all statistical parameters examined. The bias of the urban and suburban traffic stations is reduced from −24.55% to −20.70% and from −26.90% to −23.18%, respectively. The suburban industrial stations bias is lower for the meteo34 leveling scheme (−9.70%) than for meteo12 (−15.66%), and the rural industrial stations bias is significantly improved, from −15.57% to −4.32%. For all background station types, the mean relative bias is slightly higher for the meteo34 leveling scheme, by ~5-7%. Slopes are closer to the 1:1 line for the NO2 surface concentrations derived from the meteo34 leveling scheme, except for the urban background stations. Correlation coefficients are very similar for both leveling schemes, with the highest calculated for the industrial stations (0.63 and 0.62 for the suburban industrial and 0.70 and 0.67 for the rural industrial stations) and the lowest for the traffic stations (0.47 and 0.48 for the urban traffic and 0.43 and 0.45 for the suburban traffic stations).
In the summer (Table A1), correlations and slopes are generally lower than in the winter. For traffic stations, correlations range from 0.10 to 0.32 and slopes from 0.03 to 0.14. Relative biases are extremely high for both leveling schemes (~−75%), showing an overall poor agreement with the in situ data. This might be attributed to the stronger underestimation of the in situ measurements by the model during the summer. Background stations show better statistical indicators, especially for the NO2 surface concentrations derived from the meteo34 leveling scheme, whose relative biases are approximately 8% lower than those of meteo12. Finally, industrial stations show the highest correlations of all station types (0.58-0.63), and the relative bias is lower by ~5% for the surface concentrations derived from the meteo34 leveling scheme.
Overall, the meteo34 leveling scheme yields better agreement with the in situ measurements. Slopes are closer to 1 and biases are lower for most station types, except for the background stations in winter, where a modest overestimation of the ground-based measurements is found. The only significant drawback of applying the meteo34 leveling scheme to a larger dataset and a longer period is its higher computational cost.
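The statistics reported per station type (correlation coefficient, slope and bias) can be computed as sketched below. The definitions are assumptions, since the text does not spell them out: the slope comes from an ordinary least-squares fit of inferred against in situ values, and biases are taken as inferred minus in situ, so that negative values indicate underestimation. The sign convention used in the paper's tables may differ in places, and the function name is hypothetical.

```python
import numpy as np


def evaluation_stats(obs, sat):
    """Pearson r, OLS slope, mean bias and relative bias of inferred vs in situ NO2.

    obs: in situ concentrations; sat: inferred satellite concentrations
    (same units, paired by station and time). Non-finite pairs are dropped.
    """
    obs = np.asarray(obs, dtype=float)
    sat = np.asarray(sat, dtype=float)
    ok = np.isfinite(obs) & np.isfinite(sat)
    obs, sat = obs[ok], sat[ok]
    r = np.corrcoef(obs, sat)[0, 1]
    slope, _intercept = np.polyfit(obs, sat, 1)  # fit sat = slope * obs + b
    diff = sat - obs
    return {
        "r": float(r),
        "slope": float(slope),
        "mean_bias": float(diff.mean()),
        "relative_bias_pct": float(100.0 * diff.mean() / obs.mean()),
    }
```

Grouping the station pairs by EEA station type before calling the function yields one row of statistics per type and period.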

S5P/TROPOMI Versions Comparison
Another quantity that affects the results is the product version of the TROPOMI tropospheric NO2 VCDs. Figures 1c and A1c have already shown differences between the TROPOMI v1.3 and v2.3 NO2 tropospheric VCDs: the v2.3 columns are higher by approximately 3% in summer and by 11-18% in winter. Both v1.3 and v2.3 TROPOMI tropospheric VCDs are used as input to the implemented methodology for all possible setups (Table 1). Here, their effect on the NO2 surface concentrations derived with the third setup is shown for winter. The meteo12 leveling scheme was used for the LOTOS-EUROS simulations for computational reasons. Figure 8 shows scatter density plots of the inferred TROPOMI v1.3 and v2.3 NO2 surface concentrations of the third setup against the in situ measurements at the urban traffic and background stations in winter. Both versions yield nearly identical, moderate correlations for both the urban traffic and background stations (Figure 8). However, the relative bias is much improved and the slope is closer to 1 for the TROPOMI v2.3-inferred data, indicating that this dataset is closer to the ground-based truth. The statistical indicators in Table 3 show a significant improvement for the TROPOMI v2.3-derived NO2 surface concentrations at all station types except the rural background stations. More specifically, the mean bias at the urban and suburban traffic stations decreases from 15.46 to 10.46 µg/m³ and from 20.19 to 11.53 µg/m³, respectively, and the slopes improve significantly.
For the urban and suburban background stations, the TROPOMI v2.3 mean bias improves from 3.86 to −2.21 µg/m³ and from 2.27 to −0.89 µg/m³, respectively (in Table 3 the bias is expressed as in situ minus inferred, so negative values denote an overestimation). The rural background TROPOMI v1.3-inferred NO2 surface concentrations lie close to the ground-based measurements, with a mean bias of 0.05 µg/m³, whereas the TROPOMI v2.3 product slightly overestimates the in situ measurements, with a bias of −1.97 µg/m³. Slopes do not improve substantially for the TROPOMI v2.3-inferred NO2 surface concentrations at the background stations, possibly because of the known issues with background values in the TROPOMI tropospheric NO2 data [22]. Finally, the suburban and rural industrial TROPOMI v1.3-inferred products show a bias approximately 15% higher than the TROPOMI v2.3-inferred data. Slopes are higher for the industrial TROPOMI v2.3-inferred products, improving from ~0.58 to ~0.78, which reinforces the conclusion that the TROPOMI v2.3-inferred NO2 surface concentrations agree better with the in situ measurements. Correlation coefficients are not included in Table 3, as they are shown extensively in Table 2 and are nearly identical for both TROPOMI versions. In summer (Table A2), the inferred TROPOMI v2.3 NO2 surface concentrations are higher for all station types by approximately 2 µg/m³, and the relative biases are lower by approximately 5%, 14% and 13% for the traffic, background and industrial stations, respectively. Overall, the comparisons between the two TROPOMI versions and the in situ measurements clearly show that the TROPOMI v2.3-inferred NO2 surface concentrations, after the application of the updated air mass factors, agree much better with the ground-based measurements.
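Table 3 reports biases in µg/m³. Judging from the values quoted above (e.g., −1.97 µg/m³ for a slight overestimation at the rural background stations), the sign convention appears to be in situ minus inferred, the opposite of the relative bias discussed for the setups, where an overestimation is positive. The sketch below groups station-level pairs by station type and averages the bias under that inferred convention; the record layout and the function name `mean_bias_by_type` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_bias_by_type(records):
    """Average bias (ug/m3) per station type.

    records: iterable of (station_type, in_situ, inferred) tuples.
    Assumed sign convention: bias = in_situ - inferred, so a negative
    value means the satellite-inferred product overestimates the station.
    """
    diffs = defaultdict(list)
    for station_type, in_situ, inferred in records:
        diffs[station_type].append(in_situ - inferred)
    return {station_type: mean(values) for station_type, values in diffs.items()}


# Illustrative numbers only, not values from this study.
records = [
    ("rural background", 8.0, 10.0),
    ("rural background", 6.0, 7.0),
    ("urban traffic", 40.0, 30.0),
]
print(mean_bias_by_type(records))  # {'rural background': -1.5, 'urban traffic': 10.0}
```

Keeping the convention explicit in one place avoids mixing the two sign conventions when absolute and relative biases are reported side by side.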

Application of the Updated Air Mass Factors
Another important ingredient in the derivation of satellite NO2 surface concentrations is the application of the updated air mass factors and averaging kernels to the satellite retrievals (Equation (4)) and to the model simulations (Equation (6)), as described in the CSO operator process. Therefore, the NO2 surface products of all possible setups (Table 1) are compared with the in situ measurements to determine whether applying the updated air mass factors and averaging kernels improves the results. As concluded in the previous section, TROPOMI v2.3 is optimal for the derivation of satellite NO2 surface concentrations, and it is therefore used as input in all setups. For the LOTOS-EUROS simulations, the meteo12 leveling scheme was used to limit computational time, since the differences between the two leveling schemes are minor for this analysis. Figure 9 illustrates the relative bias between the inferred TROPOMI NO2 surface concentrations of all setups and the in situ measurements, as well as between the LOTOS-EUROS a priori NO2 surface concentrations and the in situ measurements. In summer (Figure 9a), the relative bias is extremely high for all setups at the urban and suburban traffic stations, improving from ~−90% in the first setup to ~−78% in the third setup. The relative bias improves for all background station types: from ~−77% to ~−45% for the urban background stations and from ~−72% to ~−30% for the suburban background stations in the third setup. The relative bias at the rural background stations also decreases in the third setup, from ~−78% to −53%, although this improvement is less pronounced than for the urban and suburban background stations. Finally, for the suburban and rural industrial stations the bias decreases notably, from ~−73% to ~−51% and from ~−60% to ~−35%, respectively.
Notably, the LOTOS-EUROS relative bias is quite low (~−8%) for the rural industrial stations compared to the other datasets. This can be attributed to the TROPOMI v2.3 NO2 data appearing to underestimate NO2 levels over the selected rural industrial pixels. In winter (Figure 9b), the relative bias of all datasets with respect to the in situ measurements is clearly improved for all station types compared to the summer results (Figure 9a). Overall, the inferred TROPOMI v2.3 NO2 surface concentrations derived from the third setup appear to provide a more realistic product, closer to the ground-based truth than the baseline setup. More specifically, the bias at the urban and suburban traffic stations improves markedly, to ~−25% and ~−27%, respectively, approximately 20-30% lower in magnitude than in the first setup. The bias at the suburban and rural industrial stations in the third setup is ~−16%, almost 15% lower for both station types than in the first setup. For the background stations, the relative bias changes sign, which should be read in light of the low absolute NO2 levels at these sites; nevertheless, it remains the lowest in magnitude of all station types.
In winter, the inferred TROPOMI v2.3 NO2 surface concentrations of the third setup slightly overestimate the urban and suburban background in situ measurements, by 7.4% and 3.9%, whereas the overestimation is higher (10.37%) at the rural background stations. The second setup (in blue) shows a lower bias at the background stations (0.49%, 1.40% and 5.96%). The larger overestimation of the third setup can be attributed to the enhancement of the TROPOMI NO2 tropospheric VCDs by the updated air mass factors and averaging kernels. Finally, the inferred TROPOMI NO2 surface concentrations derived from the baseline setup show the largest discrepancies with the in situ measurements for all station types.
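The satellite side of the air mass factor update can be illustrated with the generic recipe documented for the TROPOMI NO2 product: the averaging kernel provides the box air mass factors (m_l = A_l × M), a new tropospheric AMF is computed with the local model profile, and the tropospheric VCD is rescaled by the ratio of old to new AMFs, since the slant column itself is unchanged. This is a minimal sketch under simplifying assumptions (model partial columns already on the retrieval's vertical grid, no separate temperature correction, tropospheric layers only); it is not necessarily identical to Equation (4) or to the CSO implementation, and the function names are hypothetical.

```python
import numpy as np


def tropospheric_amf(kernel_total, amf_total, partial_columns):
    """Tropospheric AMF recomputed with a new a priori (model) profile.

    kernel_total: total-column averaging kernel A_l on the tropospheric layers
    amf_total: total air mass factor M used in the retrieval
    partial_columns: model NO2 partial columns x_l on the same layers
    Box AMFs are recovered as m_l = A_l * M; M_trop = sum(m_l x_l) / sum(x_l).
    """
    box_amf = np.asarray(kernel_total, dtype=float) * amf_total
    x = np.asarray(partial_columns, dtype=float)
    return float(np.dot(box_amf, x) / x.sum())


def rescale_vcd(vcd_trop, amf_trop_old, amf_trop_new):
    """Keep the tropospheric slant column fixed: N_new = N_old * M_old / M_new."""
    return vcd_trop * amf_trop_old / amf_trop_new


# Illustrative two-layer example (surface layer first, low surface sensitivity).
m_new = tropospheric_amf([0.5, 1.0], 2.0, [3.0e15, 1.0e15])  # 1.25
vcd_new = rescale_vcd(4.0e15, 1.5, m_new)  # assumed original AMF of 1.5
print(m_new, vcd_new)
```

In this illustrative case the profile recomputed from the (more surface-weighted) model gives a smaller AMF (1.25) than the assumed original one (1.5), so the VCD rises from 4.0 to 4.8 × 10^15 molec/cm²; this is one way in which updated air mass factors can enhance the tropospheric columns.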
The effect of the air mass factors is better illustrated in the scatter plots of Figure 10, which depict the suburban background (Figure 10a-c) and industrial (Figure 10d-f) stations. Figure 10a,d shows the scatter plots between the in situ measurements and the inferred TROPOMI NO2 surface concentrations of the first setup, derived from Equation (7). Figure 10b,e and Figure 10c,f show the same comparisons for the inferred TROPOMI NO2 surface concentrations of the second and third setups, respectively. The TROPOMI-inferred NO2 surface concentrations of the second setup are calculated by applying the TM5-MP averaging kernels to the LOTOS-EUROS simulations, whereas those of the third setup are estimated by applying the updated air mass factors and averaging kernels to both the TROPOMI data and the model simulations. The slope improves in both the second and third setups, from 0.71 to 0.77 and 0.78 for the suburban background stations and from 0.63 to 0.73 and 0.76 for the industrial stations. This indicator shows that the third setup performs better than the other two and that its derived NO2 surface concentrations are closer to the ground-based data. Correlation coefficients are moderate for the suburban background stations (R ~ 0.5) and good for the suburban industrial stations (R ~ 0.65), varying only marginally between setups.
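On the model side, the second setup compares TROPOMI with LOTOS-EUROS columns computed through the TM5-MP averaging kernels. In the generic TROPOMI formulation, the tropospheric kernel is A_trop,l = A_l × M / M_trop, and the model column as seen by the satellite is the sum of A_trop,l × x_l over the tropospheric layers. The sketch below assumes that formulation, uses hypothetical function names and illustrative numbers, and shows a useful property: if M_trop is recomputed from the same model profile, the kernel-weighted column reduces to the model's own tropospheric column, so the comparison no longer depends on the TM5-MP a priori profile.

```python
import numpy as np


def tropospheric_kernel(kernel_total, amf_total, amf_trop):
    """A_trop,l = A_l * M / M_trop for the tropospheric layers."""
    return np.asarray(kernel_total, dtype=float) * amf_total / amf_trop


def column_seen_by_satellite(kernel_trop, model_partial_columns):
    """Kernel-weighted model column: sum_l A_trop,l * x_l."""
    return float(np.dot(kernel_trop, np.asarray(model_partial_columns, dtype=float)))


A = [0.5, 1.0]               # total-column kernel, surface layer first (illustrative)
M = 2.0                      # total AMF of the retrieval (illustrative)
x_model = [3.0e15, 1.0e15]   # model partial columns in molec/cm2 (illustrative)

# Kernel built with an assumed TM5-MP tropospheric AMF of 1.5:
# the result depends on the TM5-MP a priori profile.
y_tm5 = column_seen_by_satellite(tropospheric_kernel(A, M, 1.5), x_model)

# AMF recomputed from the same model profile: sum(A*M*x) / sum(x) = 1.25.
m_model = float(np.dot(np.asarray(A) * M, x_model) / np.sum(x_model))
y_local = column_seen_by_satellite(tropospheric_kernel(A, M, m_model), x_model)
print(y_tm5, y_local)  # y_local equals sum(x_model)
```

The identity follows directly from the definitions: with M_trop = Σ m_l x_l / Σ x_l, the kernel-weighted column Σ m_l x_l / M_trop equals Σ x_l.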
In conclusion, the TROPOMI-inferred NO2 surface concentrations of the third setup perform best overall. In summer, their biases are significantly lower than those of the baseline setup. In winter, there is a remarkable improvement for the traffic and industrial stations, whereas for the background stations a slight overestimation is found, which, however, remains at or around 10% (at most 10.37%, at the rural background stations). Note, however, that the second setup performs better for the background stations in winter.

Optimal Setup
We have shown the effect of the three main influencing quantities on the satellite-derived NO2 surface concentrations. Applying the meteo34 leveling scheme to the model simulations produces higher inferred TROPOMI NO2 surface concentrations over land, resulting in a lower bias for all station types during both periods, except for the background stations in winter. Although the results are generally improved, we proceeded with the meteo12 leveling scheme, mainly because of the high computational time required for the meteo34 simulations. The TROPOMI v2.3 NO2 tropospheric VCDs perform better, and their performance is further enhanced by applying the updated air mass factors to both the TROPOMI data and the model simulations. Hence, the optimal setup comprises the TROPOMI v2.3 NO2 VCDs, the LOTOS-EUROS simulations with the meteo12 leveling scheme, and the updated air mass factors and averaging kernels applied to both the satellite data and the LOTOS-EUROS simulations (setup 3 in Table 1).
The simplified Lamsal equation (Equation (7)) provides a practical way to derive satellite NO2 surface concentrations. However, simply applying the a priori model simulations and the original satellite data in Equation (7) produces poor results when compared with ground-based data, as shown in Figure 11. Overall, the NO2 surface concentrations derived from the optimal setup match the ground-based measurements better for both periods. During summer, the background and industrial stations exhibit the lowest bias. The differences with the in situ measurements range between 3 and … During winter, the inferred NO2 surface concentrations of the optimal setup lie close to the ground-based concentrations at the background stations, with the lowest biases calculated for urban, suburban and rural areas: −2.22, −0.89 and −1.97 µg/m³, respectively.
This can be attributed to the known high NO2 values observed by TROPOMI over background areas [23] and to the enhancement of NO2 levels by the updated air mass factors and averaging kernels. The differences at the industrial stations are 3-4 µg/m³, almost 50% lower than in the baseline setup. Finally, the bias at the traffic stations improves considerably, with a mean value of ~11 µg/m³, ~8-10 µg/m³ lower than in the baseline setup. Note that the TROPOMI and LOTOS-EUROS resolution is, unavoidably, too coarse to properly resolve the high concentrations at traffic stations, resulting in higher biases. However, applying the local air mass factors to the satellite NO2 VCDs and the model simulations does reduce the bias at the traffic stations. Summarizing these findings, Table 4 lists the mean NO2 surface concentrations at the EEA stations for the baseline and optimal setups for both periods. It is clear that the ad hoc implementation of the baseline methodology with the original satellite and model data results in significant discrepancies with the ground-based truth. The product improves significantly when the effects of the new TROPOMI product version, the averaging kernels and the air mass factors are accounted for. Thus, the derivation of NO2 surface concentrations is a complex problem that requires sound knowledge of the existing methodologies and of the quantities that can bring the results closer to the ground truth.
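Equation (7) follows the approach of [9]. Assuming it takes the usual form of that method, the satellite-inferred surface concentration is the model surface concentration scaled by the ratio of the satellite and model tropospheric columns, S_sat = S_model × Ω_sat / Ω_model. The minimal sketch below applies that scaling pixel by pixel; the function name and the `min_vcd` cut-off used to mask near-zero model columns are illustrative assumptions, not part of the described method. Both columns must be in the same units (e.g., molec/cm²); in the optimal setup they would be the AMF-updated columns.

```python
import numpy as np


def infer_surface_no2(s_model, vcd_model, vcd_sat, min_vcd=1.0e14):
    """Scale model surface NO2 by the satellite-to-model column ratio.

    S_sat = S_model * (VCD_sat / VCD_model). Pixels whose model column is
    not above `min_vcd` (an illustrative cut-off) are returned as NaN.
    """
    s_model = np.asarray(s_model, dtype=float)
    vcd_model = np.asarray(vcd_model, dtype=float)
    vcd_sat = np.asarray(vcd_sat, dtype=float)
    ratio = np.full(vcd_model.shape, np.nan)
    valid = vcd_model > min_vcd
    ratio[valid] = vcd_sat[valid] / vcd_model[valid]
    return s_model * ratio


# Illustrative values: model surface NO2 in ug/m3, columns in molec/cm2.
print(infer_surface_no2([20.0, 10.0], [4.0e15, 0.0], [5.0e15, 1.0e15]))  # [25. nan]
```

Because the column ratio is applied to the model surface value, any error in the shape of the model's vertical profile propagates directly into the inferred surface concentration.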

Conclusions
The aim of this study is to derive NO2 surface concentrations over Central Europe from the S5P/TROPOMI instrument. To achieve this, we implemented the methodology originally described in [9] for three different setups. The first (baseline) setup uses the a priori TROPOMI v1.3 and v2.3 tropospheric NO2 VCDs and the a priori LOTOS-EUROS simulations of NO2 VCDs and surface concentrations. The second setup uses the a priori TROPOMI NO2 VCDs and the LOTOS-EUROS simulations to which the averaging kernels of the TM5-MP model have been applied. Finally, the third setup uses the modified TROPOMI and LOTOS-EUROS NO2 tropospheric VCDs, obtained by applying the updated air mass factors and averaging kernels via the CSO process, together with the a priori LOTOS-EUROS NO2 surface concentrations. The concentrations derived from all setups are compared with EEA in situ measurements. Furthermore, three important quantities that directly affect the results, namely the LOTOS-EUROS leveling scheme, the TROPOMI NO2 product version, and the updated air mass factors and averaging kernels, were examined thoroughly. The main findings of the study are summarized below.

•
The LOTOS-EUROS meteo34 vertical leveling scheme showed overall improved statistical indicators: slopes are closer to 1 and the relative bias is lower. In particular, the relative bias in summer is lower than with the meteo12 scheme by approximately 2% for the traffic stations, 7-9% for the background stations and 5% for the industrial stations. During winter, the relative bias at the traffic and industrial stations is lower by 4-11%.

•
The derived TROPOMI v2.3 NO2 surface concentrations, updated with the air mass factors and averaging kernels from the local model (third setup), lie closer to the ground-based truth for both periods. In summer, biases are high for the traffic stations (~−70%) and moderate for the background and industrial stations, ranging from −50% to −30%, a significant improvement over the first setup. In winter, the bias at the traffic and industrial stations improves from −50% to −25% and from −30% to −15%, respectively. At the background stations, the inferred NO2 surface concentrations slightly overestimate the ground-based measurements in winter. In this case, the second setup shows a lower bias for the urban (+0.49%), suburban (+1.40%) and rural (+5.96%) background stations than the third setup (+7.40%, +3.90% and +10.37%, respectively). This overestimation in the third setup can be attributed to the sharper gradients included in the updated air mass factors. Comparisons between the first and the third setups show an average improvement of 24% and 18% in the bias in summer and winter, respectively.

•
The implemented methodology performs better for the background and industrial stations for both periods. This may be attributed to the fact that TROPOMI and LOTOS-EUROS resolution is too low to properly resolve the high concentrations at traffic stations, resulting in higher net biases.

•
Results are better in winter for all station types. Model simulations are sampled only at 11:00 UTC, which is the closest time to the TROPOMI overpass. The model underestimates the in situ NO2 surface concentrations during daytime, and the underestimation is larger in summer. This might be attributed to the higher photolysis rate of NO2 in summer (stronger solar radiation, lower cloud cover), which peaks in the early afternoon. Summer NO2 levels are significantly lower and more concentrated near the emission sources than in winter, when the NOx lifetime is longer and local transport of emissions is more pronounced. Model simulations and satellite observations at a resolution of 0.10° × 0.05° cannot resolve emissions at station level, especially in summer, owing to representativeness issues related to the location of the stations. Differences between the two periods might also partly be attributed to the anthropogenic NOx emissions used in the model, which refer to the year 2017.
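To put the model and satellite resolution in perspective, the approximate footprint of a 0.10° × 0.05° grid cell can be estimated with spherical-Earth arithmetic. The sketch below assumes that 0.10° is the longitude increment and 0.05° the latitude increment, and uses 50°N as a representative Central European latitude; both choices are assumptions for illustration.

```python
import math

KM_PER_DEG = 111.32  # approximate length of one degree on a spherical Earth


def cell_size_km(dlon_deg, dlat_deg, lat_deg):
    """Approximate east-west and north-south size (km) of a lon/lat grid cell."""
    dx = dlon_deg * KM_PER_DEG * math.cos(math.radians(lat_deg))
    dy = dlat_deg * KM_PER_DEG
    return dx, dy


dx, dy = cell_size_km(0.10, 0.05, 50.0)
print(f"{dx:.1f} km x {dy:.1f} km")  # 7.2 km x 5.6 km
```

A cell of roughly 7 km × 6 km averages over street-scale concentration gradients, which is consistent with the representativeness issues at traffic stations noted above.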
Overall, the derived TROPOMI v2.3 NO2 surface concentrations of the third setup show the best agreement with the in situ measurements. The ultimate goal of this study is to provide inferred S5P/TROPOMI NO2 surface concentrations with a reliable methodology for areas where in situ measurements are not available. The third (optimal) setup appears to fulfil this task adequately, although there is room for improvement.

Acknowledgments: Results presented in this work have been produced using the Aristotle University of Thessaloniki (AUTH) High Performance Computing Infrastructure and Resources. The authors would like to acknowledge the support provided by the AUTH IT Centre throughout the progress of this research work. We further acknowledge the Atmospheric Toolbox®.