Quantifying the Impact of the Covid-19 Lockdown Measures on Nitrogen Dioxide Levels throughout Europe

Abstract: In this paper, the effect of the lockdown measures on nitrogen dioxide (NO2) in Europe is analysed by a statistical model approach based on a generalised additive model (GAM). The GAM is designed to find relationships between various meteorological parameters and temporal metrics (day of week, season, etc.) on the one hand and the level of pollutants on the other. The model is first trained on measurement data from almost 2000 monitoring stations during 2015–2019 and then applied to the same stations in 2020, providing predictions of expected concentrations in the absence of a lockdown. The difference between the modelled levels and the actual measurements from 2020 is used to calculate the impact of the lockdown measures adjusted for confounding effects, such as meteorology and temporal trends. The study is focused on April 2020, the month with the strongest reductions in NO2, as well as on the gradual recovery until the end of July. Significant differences between the countries are identified, with the largest NO2 reductions in Spain, France, Italy, Great Britain and Portugal and the smallest in eastern countries (Poland and Hungary). The model is found to perform best for urban and suburban sites. A comparison between the found relative changes in urban surface NO2 data during the lockdown and the corresponding changes in tropospheric vertical NO2 column density as observed by the TROPOMI instrument on Sentinel-5P revealed good agreement despite substantial differences in the observing method.


Introduction
The global Covid-19 pandemic in 2020 has led to major changes in society, the economy, and transportation worldwide. In Europe, the first cases of Covid-19 were detected by the end of January. In February, the number of cases increased substantially in a few countries (Italy, France and Spain), and Italy was the first country in Europe to introduce restrictions on the population. Italy imposed a quarantine on more than 50,000 people in the northern part of the country on 22 February. During March, most European countries introduced a full national lockdown, and most of these actions were taken mid-month. By 18 March, more than 250 million people were in lockdown in Europe, and by the beginning of April, 3.9 billion people or around half the global population were subject to complete or partial lockdown [1]. The global road transport activity was almost 50% below the 2019 average by the end of March, and commercial flight activity nearly 75% below 2019 by mid-April 2020 [2]. The lockdown restrictions were gradually lifted in the following weeks and months, varying substantially between the countries in Europe.
The reduced road transport and aviation led to reduced emissions of air pollutants and thereby lower levels of atmospheric pollutants, as documented by several European studies [3][4][5][6][7][8][9][10][11]. The quantification of this effect is, however, not trivial. First, weather patterns have a decisive influence on air pollutants' concentration through atmospheric

• Pure observational-based studies in which the lockdown periods are compared with non-lockdown periods [3,10,12], either using measurements from previous years or by looking at pre- and post-lockdown periods in 2020.
• Studies based on chemical transport models (CTMs) and observations in combination [4,7,8,13–17], either by running separate baseline and lockdown emission scenarios or by estimating the lockdown effect from a comparison between measurements and business-as-usual scenarios.
• Statistical-based studies [5,6,9,18] using multiple regression, generalised additive model (GAM), or machine learning (ML) models to estimate the links between measured concentration levels and meteorological as well as time (day of week, etc.) data. Some studies also use a combination of statistical methods and CTMs [4,7].
There are pros and cons to each of these approaches. Pure observational-based studies are easy to conduct and avoid all model assumptions but are hampered by difficulties in subtracting the meteorological impact. CTM-based studies are the standard way of assessing air pollution levels, but for the lockdown period, the CTMs were faced with difficulties with the emission scenarios since the emission changes during lockdown vary with city, country and time. Some CTM studies use activity data during the lockdown as a proxy for the emissions [8]. In contrast, others use country- and sector-resolved emission reduction factors [4], and some studies have even turned off emissions from specific sectors entirely [16].
An advantage of statistical models (compared to CTMs) is that they do not require any emissions assumptions. Furthermore, as opposed to CTMs, statistical models can be trained on data for each measurement station separately to give optimal prediction accuracy for every station. This is particularly important for, e.g., traffic sites since concentration levels at such locations can deviate quite substantially from the surrounding grid concentrations. The main disadvantage of statistical models vs. a CTM is that the former is not built as a physical causal model but only uses statistically found associations between a set of explanatory meteorological and time variables and the resulting concentrations. Thus, one should be careful with extrapolating results from such a model to other sites and periods far into the past or future.
In this paper, we show that a specific type of statistical regression model, namely a generalised additive model (GAM), is particularly well suited for isolating the effect of the reduced emissions from other confounding processes. Various studies have used machine-learning (ML) methods to assess the lockdown effect on air pollutants, and the gradient boosting technique has been particularly popular [4,7,9]. The GAM approach [19,20] can also be considered an ML method. Still, the main advantage of a GAM model lies in its interpretability. It provides direct functional relationships between each input explanatory variable and the response variable (the atmospheric concentration). In contrast, other ML methods are less interpretable and tend to produce more "black-box" non-transparent relationships and results. The GAM modelling approach is also often found to have good predictive abilities [20,21]. Furthermore, since the GAM model is statistical in nature, we can provide 95% uncertainty intervals for the model predictions, which enable us to compare and check the resulting accuracy of the model with the actual observations. The GAM model can also estimate and consider long-term trends in the concentration levels over several years in the predictions for a left-out or future year without any assumptions about the change in emissions with time. The present study is a substantial extension of the preliminary results presented in [11].
Despite the fundamental differences between the various methodologies discussed above, the estimated effects of lockdown on NO 2 levels in Europe seem to be reasonably consistent across the studies. The studies are, however, not directly comparable since the studied periods differ somewhat.
In their observational-based work, Baldasano [3], Sicard et al. [10], and Tobias et al. [12] all estimated reductions in the NO2 concentration levels of the order of 50-65% for urban and traffic sites in Spain, France and Italy in March 2020. They also found that consideration of the varying weather patterns had a decisive influence on the estimated levels. These estimations agree very well with the CTM-based findings of Barré et al. [4] for urban areas in the same countries. For countries that adopted softer lockdown measures, such as Germany, the Netherlands, Poland and Sweden, Barré et al. [4] estimated smaller reductions in NO2 levels. Keller et al. [7], using the NASA global atmospheric composition model GEOS-CF (GEOS Composition Forecasts) with a bias-correction methodology, found a 46% reduction in NO2 over Spain (14 March to 23 April) and widespread reductions in the order of 22% in March and 33% in April over Europe. Using the WRF-CHIMERE model for Western Europe with different emission scenarios, Menut et al. [8] estimated NO2 concentration reductions in the order of 15-30% for Germany and the Netherlands and 35-45% on average in other countries for March 2020. Grivas et al. [13], looking at the Greater Athens area using the TAPM model, estimated average NO2 concentration reductions of 30-35%, and up to 50% reduction in some Athens basin areas.
Based on a machine learning model fed by meteorological data and time features for background and traffic stations in Spain, Petetin et al. [9] estimated a mean reduction in NO 2 concentration levels of 40% already early in the year when less stringent restrictions were introduced, increasing up to 55% reduction during later and more strict phases of the lockdown. Ordonez et al. [18], applying a GAM model for the period 15 March to 30 April found the best correlation for Benelux sites. For the meteorologically adjusted changes, they found 47-50% reductions in the NO 2 concentration levels for urban locations in France, Italy and Spain. Grange et al. [6] used an ML model called Random Forest Model on NO 2 and O 3 data for 102 metropolitan areas and 34 countries in Europe. They estimated NO 2 reductions that agree very well with the studies mentioned above, with on average 34 and 32% lower concentration levels than expected at traffic/roadside sites and urban background sites, respectively. They also found that the oxidant level (O x = NO 2 + O 3 ) was more or less unchanged during the lockdown, implying a similar increase in O 3 accompanied the reduced NO 2 .

Method
A GAM model [19,20] is a non-linear regression model linking expected values $\mu_i$ of a given response variable $Y_i$ to several explanatory variables $x_{ij}$ through the following set of relations:

$$\mathrm{E}(Y_i) = \mu_i, \qquad g(\mu_i) = \beta_0 + \sum_{j=1}^{p} \beta_j(x_{ij}) \qquad (1)$$

where $\beta_0$ is a constant (the intercept), and where $\beta_j(\cdot)$, for $j = 1, \ldots, p$, represents smooth functions of the covariates $x_{ij}$, with $p$ the number of such covariates. Our GAM model was developed over several years [22,23] and was initially designed to assess air pollutant trends in Europe based on long-term monitoring data of O3, NO2 and PM. That work aimed to apply and adapt for European conditions a statistical method that has been used by the US-EPA (Environmental Protection Agency) on a routine basis for surface ozone trend assessments, adjusting for the inter-annual impact of changing meteorology [24].
The response variable $Y_i$ in Equation (1) represents a measured air pollutant concentration at day number $i$ at a given site, while $x_{ij}$ represents the values of individual explanatory variables for $j = 1, \ldots, p$ at the same location and on the same day $i$, typically meteorological data, such as temperature, humidity, etc., as well as time variables (day of the week, etc.). In Equation (1), $g(\cdot)$ is a function linking the statistical expected value of the response variable $Y_i$, i.e., $\mu_i$, to the explanatory variables $x_{ij}$. In a GAM model, the response variable $Y_i$ is assumed to have a specific probability distribution, known as the response distribution, with mean $\mu_i$ and variance $V_i$. Further, a GAM model is an extension of a multiple linear regression (MLR) model where each $\beta_j$ is a smooth function of $x_{ij}$ and not a constant to be multiplied with $x_{ij}$ as in an MLR model, and where the mean value $\mu_i$ is more generally related to the covariates through the given link function $g(\mu_i)$. For NO2, we apply a log link function $g(\mu) = \log \mu$ and a Gamma distribution as a response distribution. This is because NO2 has a relatively large range of concentration variation of several orders of magnitude, where the variance of $Y_i$, i.e., $V_i$, is typically proportional to $\mu_i^2$. Thus, for such a variable, it is common practice in GAM modelling to choose a logarithmic link function and a distribution which is skewed to the right, such as a Gamma distribution, as a response distribution for $Y_i$ [20]. This was also applied in the previous trend studies [22,23]. In these studies, we looked at surface data of O3, NO2, PM10 and PM2.5.
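As a brief check of why a Gamma response matches a variance proportional to the squared mean, the relation can be written out in the standard shape-scale parametrisation. The symbols $a$ (shape), $s_i$ (scale) and $\phi$ (dispersion) follow the usual GLM convention and are assumptions introduced here for illustration:

```latex
Y_i \sim \mathrm{Gamma}(a, s_i), \qquad a = \phi^{-1}, \qquad s_i = \mu_i \phi
\quad\Longrightarrow\quad
\mathrm{E}[Y_i] = a\, s_i = \mu_i, \qquad
\mathrm{Var}[Y_i] = a\, s_i^2 = \phi\, \mu_i^2 ,
\qquad \text{with } \mu_i = \exp\Big(\beta_0 + \sum_{j=1}^{p} \beta_j(x_{ij})\Big)
\text{ under the log link.}
```

The coefficient of variation is then $\sqrt{\phi}$ regardless of the concentration level, which is what makes the choice convenient for data spanning several orders of magnitude.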
Although developed initially for long-term trend studies, the GAM model proved to be very well suited for studies of the effect of the lockdown measures on air pollutant concentrations during the Covid-19 pandemic. The conceptual idea of the GAM is to establish statistical relationships between the input explanatory variables and the measured air pollutant by training the model on specific periods and then applying the established model to predict the air concentrations in another period. Provided that the model performs reasonably well compared to measurement data, the difference between the predicted NO 2 levels for 2020 (the expected or business as usual (BAU)) and the measured levels gives the reduction in NO 2 due to the activity restrictions during the pandemic. In the following, we document that the model could be used in this way.

Statistical Uncertainty of the GAM Predictions
The uncertainty in the GAM model predictions, depicted as the grey shaded areas of the prediction plots in Section 3, is defined as 95% prediction intervals of the unconditional response distribution of modelled concentrations of NO2 for each day. These distributions cannot be given analytically, so a Monte Carlo approach was used to define each interval. At day number $i$, $N$ samples of log-expected values $\log \hat{\mu}_{ij}$, $j = 1, \ldots, N$, were first drawn from a normal distribution with mean $\hat{\mu}_i$ and standard deviation $\hat{\sigma}_i$. These values corresponded to the estimate of the expected value and standard error, respectively, of the linear predictor (Equation (1)) for day number $i$. Next, the shape ($a$) and scale ($s$) parameters of a Gamma conditional response distribution given the expected value $\hat{\mu}_{ij}$ were defined in the usual way [19] as $\hat{a} = \hat{\phi}^{-1}$ and $\hat{s} = \hat{s}_{ij} = \hat{\mu}_{ij}/\hat{a}$, where $\hat{\phi}$ is the estimated scale or dispersion parameter. Then, $N$ samples of predicted concentrations $\hat{y}_{ij}$ were obtained by random draws from Gamma distributions, i.e., $\hat{y}_{ij} \sim \mathrm{Gamma}(\hat{a}, \hat{s}_{ij})$, representing samples from the unconditional (compound) response distribution of modelled concentrations given the data. Finally, a 95% prediction interval was obtained for each day as the interval between the 0.025 and 0.975 sample quantiles of these concentrations. After some testing with various values of $N$, 100 samples were found to give satisfactory results in defining the 95% prediction intervals, with a good balance between the accuracy of final intervals and the computational efforts.
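The Monte Carlo procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the function names, the input values and the linear-interpolation quantile rule are hypothetical, and the linear-predictor estimate is taken to be on the log scale.

```python
import math
import random


def sample_quantile(sorted_values, q):
    """Sample quantile by linear interpolation (an assumed quantile rule)."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def gamma_prediction_interval(eta_hat, se_eta, phi_hat, n_samples=100, seed=0):
    """95% prediction interval for one day.

    eta_hat : estimated linear predictor (log of the expected concentration)
    se_eta  : standard error of the linear predictor
    phi_hat : estimated dispersion parameter of the Gamma response
    """
    rng = random.Random(seed)
    shape = 1.0 / phi_hat                      # a = 1 / phi
    draws = []
    for _ in range(n_samples):
        # Step 1: draw a log-expected value and map it back to concentration.
        mu = math.exp(rng.gauss(eta_hat, se_eta))
        # Step 2: Gamma scale chosen so that the conditional mean equals mu.
        scale = mu / shape
        # Step 3: draw one concentration from the conditional Gamma.
        draws.append(rng.gammavariate(shape, scale))
    draws.sort()
    return sample_quantile(draws, 0.025), sample_quantile(draws, 0.975)


# Hypothetical day: expected level 20 (concentration units), 5% standard
# error on the log scale, dispersion 0.1.
lower, upper = gamma_prediction_interval(math.log(20.0), 0.05, 0.1, n_samples=2000)
```

With only 100 draws, as in the text, the 0.025 and 0.975 quantiles rest on a handful of tail samples, so the interval limits will be noticeably noisier than with the larger sample used in this example.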

Input Data
The study was based on official air quality measurement data reported to the European Environment Agency (EEA) through the e-Reporting system. These data are publicly available through a web interface (https://discomap.eea.europa.eu/map/fme/AirQualityExport.htm). EU member states, EEA countries, and other associated European countries report measurement data for a wide range of air pollutants to EEA's e-Reporting database on an automated, near real-time basis. The most recent data belong to the E2a data set, also named UTD data (Up to Date), and have been through less stringent quality control procedures. In October/November of each year, the previous year's data are resubmitted. These data constitute the E1a data set, meaning validated data that have been through more rigorous quality control.
In this study, we investigated the period March-July for the years 2015 through 2020. Measurement data were extracted from the e-Reporting database at the end of October 2020, meaning that we used E1a data for 2015-2018 and many of the sites in 2019 (while E2a for the rest) and E2a data for 2020. March through July 2020 included the introduction of lockdown measures in most of Europe, with substantial implications for the road traffic, particularly in March through April, followed by a period of gradual recovery towards more average conditions. From experience with previous testing and application of the GAM model, the five preceding years (2015-2019) provide a sufficient reference for the GAM to be trained on. A more extended period would reduce the number of available sites and increase the importance of interannual trends in pollutant concentrations, whereas the benefit concerning improved model performance is expected to be minor.
All NO 2 data are reported to EEA as hourly averages. The GAM is based on daily input values for meteorology and air quality data, and all NO 2 data were transformed to daily mean values on the input to the model.
The monitoring sites reporting to the EEA are classified according to station type (background, industry, or traffic) and area type (rural, suburban, or urban). In principle, this constitutes nine combinations overall, although some combinations will rarely occur (such as background traffic). Based on these classes, we allocated the stations to the following three categories: • Traffic (all area types); • Urban background and suburban background; • Rural background.
We used operational and ERA-interim data [25] for the meteorological input to the model provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). ERA-interim (ECMWF Re-Analysis) data have a spatial resolution of approximately 0.75°. The operational data, which were used after August 2018 when there was no ERA-interim data available, have a spatial resolution of roughly 0.14°. ERA-interim has 60 vertical levels, and the operational dataset has 137. All data were interpolated from the original data, given as spherical harmonic coefficients, to gridded fields of 0.3° resolution.
The input meteorological data are listed in Table 1. Air temperature at 2 m, specific humidity, and the two horizontal wind components were extracted from the analysis at 00:00, 06:00, 12:00 and 18:00 UT, respectively, for the lowest vertical model level. Air pressure at mean sea level was available as a surface field. The top net solar radiation and the planetary boundary layer height were extracted at 15:00 UT as forecasted data. The top net solar radiation was the incoming solar radiation minus the outgoing solar radiation (by reflection and scattering from the atmosphere and the surface) at the top of the atmosphere.
Table 1 (fragment): continuous time in fraction of years (0.0 = 1 Jan at start of period); this is the trend term.
Based on the gridded fields of meteorological data, we prepared annual time series containing daily values of temperature, relative humidity, solar radiation, planetary boundary layer height (PBL) and wind speed and wind direction at 10 m height for each station separately by picking the data values in the grid square containing the station. Temperature, relative humidity and wind were aggregated into daily mean values based on the four data values each day. The mean wind direction was obtained using a vector mean. The other parameters were already given as daily data, as mentioned.
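A vector mean avoids the wrap-around problem of averaging angles directly (for example, 350° and 10° should average to north, not south). The sketch below illustrates this for the four daily analysis times; the function name, the meteorological "direction the wind comes from" convention and the use of a scalar mean for wind speed are assumptions made for illustration.

```python
import math


def daily_wind(u_values, v_values):
    """Daily mean wind speed and vector-mean wind direction.

    u_values, v_values : eastward and northward wind components (m/s),
                         e.g. at 00, 06, 12 and 18 UT.
    Returns (mean speed in m/s, direction in degrees the wind blows FROM).
    """
    n = len(u_values)
    # Scalar mean of the instantaneous speeds (assumed aggregation).
    mean_speed = sum(math.hypot(u, v) for u, v in zip(u_values, v_values)) / n
    # Vector mean: average the components first, then convert to an angle.
    u_bar = sum(u_values) / n
    v_bar = sum(v_values) / n
    direction = math.degrees(math.atan2(-u_bar, -v_bar)) % 360.0
    return mean_speed, direction


# Two hypothetical 5 m/s winds from 350 and 10 degrees.
u = [-5.0 * math.sin(math.radians(d)) for d in (350.0, 10.0)]
v = [-5.0 * math.cos(math.radians(d)) for d in (350.0, 10.0)]
speed, direction = daily_wind(u, v)
```

In this example the vector-mean direction lies at north (up to floating-point rounding), whereas the arithmetic mean of the two angles would give 180°.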
In addition to the meteorological data, three time variables were included as input to the GAM: day of week number (1, . . . , 7), the day number in season, and overall time since 1 March 2015 given as year fraction. Whereas the first two variables are cyclic, the latter is a continuous term that accounts for long-term trends in the concentration levels.
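The three time variables can be derived from a calendar date as in the following sketch. The function name, the Monday = 1 numbering, the season start on 1 March and the 365.25-day year length are assumptions; the text does not specify these conventions.

```python
from datetime import date


def time_covariates(day, season_start_month=3, origin=date(2015, 3, 1)):
    """Return (day of week, day number in season, time since origin in years)."""
    day_of_week = day.isoweekday()                     # 1 = Monday ... 7 = Sunday
    season_start = date(day.year, season_start_month, 1)
    day_in_season = (day - season_start).days + 1      # 1 on the first season day
    year_fraction = (day - origin).days / 365.25       # continuous trend term
    return day_of_week, day_in_season, year_fraction


dow, dis, trend = time_covariates(date(2020, 4, 1))
```

Because the trend term keeps increasing across years while the other two repeat, a model fitted on 2015-2019 extrapolates the trend smooth slightly when it is evaluated for 2020.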
To account for missing data, a data capture criterion of 75% each year was applied, meaning that for a station to be included in the analyses, it should have at least 75% valid daily data for the actual period (March-July) for every year from 2015 through 2020.
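The data capture rule can be expressed as a simple per-station check. In this sketch the function name and the input layout (a mapping from year to the number of valid daily values in March-July) are hypothetical; the 153-day period length is the number of calendar days from 1 March to 31 July.

```python
def passes_data_capture(valid_days_by_year, days_in_period=153,
                        threshold=0.75, years=range(2015, 2021)):
    """True if every year has at least `threshold` valid daily values.

    valid_days_by_year : dict mapping year -> count of valid daily means
                         within March-July (missing years count as zero).
    """
    return all(
        valid_days_by_year.get(year, 0) / days_in_period >= threshold
        for year in years
    )


# Hypothetical station with one poorly covered year (2017).
station = {2015: 150, 2016: 140, 2017: 100, 2018: 153, 2019: 130, 2020: 120}
keep_station = passes_data_capture(station)
```

Since the criterion must hold in every year, a single year with poor coverage excludes the station from both the training and the prediction steps.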
As explained above, the model setup implies that the response variable, i.e., the daily mean NO2 concentration, was estimated by an additive combination of smooth functions of the meteorological and time variables for that grid square and that day. In other words, air mass history and long-range transport effects were not considered. While this is a significant simplification, experience shows [23] that this simple approach can predict daily mean NO2 levels fairly accurately at many monitoring stations, as discussed in more detail below.

Model Performance of the GAM
To assess the model performance, the GAM was used to predict the daily NO 2 levels at all sites in each of the years 2015-2019. In these calculations, the GAM was optimised based on data from the remaining years (but not 2020), whereas the actual year was not included. These predicted daily values were then compared to the measured data, as explained below.
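The validation scheme described here is a leave-one-year-out cross-validation. A minimal sketch of how the training and evaluation years could be paired is given below; the generator name is hypothetical and the model fitting itself is left out.

```python
def leave_one_year_out(years=(2015, 2016, 2017, 2018, 2019)):
    """Yield (training years, held-out year) pairs; 2020 is never used."""
    for held_out in years:
        training = [year for year in years if year != held_out]
        yield training, held_out


for training_years, test_year in leave_one_year_out():
    # Here the GAM would be fitted on `training_years` and used to
    # predict the daily NO2 levels in `test_year`.
    pass
```

Each year is predicted by a model that has not seen it, so the resulting skill scores mimic the situation in 2020, when the model is applied to a year outside the training data.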
Various statistical measures were calculated to assess the model performance based on the predicted vs. the measured daily NO 2 concentrations at each site individually for the March-July period. Since the GAM was optimised to the observations, the model was unbiased by construction, and the mean bias was indeed found to be close to zero. In this study, we used the linear correlation coefficient (r) and the normalised mean gross error (NMGE) as the model performance measures. The NMGE was chosen since it is a measure of the mean relative deviation of the model from the observed values and is independent of the absolute level of NO 2 , which is essential considering the large variations in NO 2 concentrations over Europe.
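The two performance measures can be computed as sketched below. The text does not spell out the NMGE formula, so the commonly used definition (sum of absolute deviations divided by the sum of observations) is assumed here; the function names and the sample values are illustrative only.

```python
import math


def pearson_r(observed, predicted):
    """Linear (Pearson) correlation coefficient between two series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def nmge(observed, predicted):
    """Normalised mean gross error: sum|pred - obs| / sum(obs) (assumed form)."""
    gross_error = sum(abs(p - o) for o, p in zip(observed, predicted))
    return gross_error / sum(observed)


# Hypothetical daily means (concentration units) at one station.
obs = [10.0, 20.0, 30.0]
pred = [12.0, 18.0, 33.0]
r_value = pearson_r(obs, pred)
nmge_value = nmge(obs, pred)
```

Because the NMGE is normalised by the observed level, a clean rural site and a busy roadside site can be compared on the same scale.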
The GAM was applied to almost 2000 stations, and in the post-processing of the results, we found that the model failed for a number of the sites. Inspection of the measurement data indicated that major breaks in the time series (e.g., due to station placement changes), either within one year or between years, were the cause of many of these failures. Thus, a screening of the stations was required, and we decided to set a criterion of a minimum correlation threshold of r ≥ 0.65 for the linear correlation between the daily GAM predictions and the measured data based on all data from 2015-2019 for a station to be included in the analysis. An additional criterion on the NMGE was not considered necessary since the r-criterion also filtered out the sites with the highest NMGE values.
Atmosphere 2021, 12, 131
The total number of sites in each category and their average r and NMGE values (before and after the screening of the stations), as well as the percentage fraction of sites fulfilling the r-criterion, is given in Table 2. The best agreement between the GAM predictions and measurements was found at traffic sites followed by the urban and suburban background sites that showed a somewhat poorer agreement. For rural background sites, the model performance was considerably poorer, which was expected since these sites are, to a larger extent, controlled by long-range transport events and not by the local emissions and meteorology at the site.
Table 2 shows that 85% and 81% of the traffic and urban/suburban stations, respectively, fulfilled the r-criterion, whereas only around half the background rural stations passed this criterion. The mean correlation coefficient for all traffic and urban/suburban sites was 0.72-0.73 and the NMGE 20-24% before filtering. After filtering, the mean r value was 0.77 and the NMGE 18-22% for these sites. The r and NMGE values were considerably poorer for rural background sites before filtering. The total number of sites after filtering was 1383.
Figure 1 shows the cumulative distribution of the correlation coefficients for the three categories of stations. It indicated that the model performance was fairly even for traffic and urban/suburban sites, with a tail of poor-performing sites at the left end of the diagram followed by fairly uniform r values ranging from 0.65-0.90. For the rural background sites, the cumulative distribution was different, indicating that the lack of model performance for these sites reflected that the GAM was less fit to predict NO2 levels at these locations. The geographical distribution of r and NMGE for all the NO2 sites before the screening is shown in Figure 2.
This indicated that the agreement between the GAM and the measurements was best (high r, low NMGE) in the northwest part of the continent, i.e., Benelux, northwest Germany, northeast France and England. A somewhat poorer agreement could be observed in southern Europe, particularly Spain and some parts of Italy. Figure 2 revealed many sites with very low r-values in Spain, mostly at rural background sites. At the same time, sites in the Madrid and Barcelona agglomerations showed a good agreement between observed and GAM predicted levels, which is discussed further below.

The Impact of Lockdown and Recovery on European NO 2 Levels
As explained above, the GAM was first trained on the measured daily data from March-July for the five years 2015-2019 for each monitoring station separately. The estimated GAM model (Equation (1)) was then applied for predicting the expected levels in March-July 2020 given normal conditions and no lockdown. The differences between the GAM model predictions and the measured values are then seen as the effect of the pandemic lockdown restrictions. Only sites fulfilling the criterion of r ≥ 0.65 were included in the following analyses. Figures 3-5 show the calculated mean relative differences between the predicted and measured NO 2 levels for April 2020, the month with the strongest impact from the lockdown, for the three categories of stations. Stations with a statistically significant change in NO 2 (on a p = 0.05 level) were plotted as squares, while the others were plotted as circles.

The results show marked regional differences in Europe, with the most substantial impacts in the south and west and the least impact in the east. These maps indicated that the largest reductions in NO2 levels occurred in Spain, France and Italy and the smallest declines in eastern countries, such as Hungary, Slovakia and Poland. This is further illustrated in Figure 6, showing the country-averaged spread in the observed minus expected NO2 levels for April in each of the preceding years (2015-2019) given in blue and for April 2020 in red for each country separately. For traffic sites, we estimated the largest median decrease in NO2 levels in Spain (60%), Italy (57%), Portugal (57%), France (56%) and Great Britain (46%). These results agree well with other published studies [4,6–8].
Figure 6. The country-wise differences between measured and expected mean NO2 concentration in April 2020 for traffic sites (upper panel) and urban/suburban sites (lower panel). The numbers in brackets give the number of stations. Only countries with at least four stations in the given category were included in the figure. The centreline shows the median value while the boxes span from the 25% to the 75% quantile and the whiskers from the 9% to the 91% quantile.
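These values were calculated as:

$$\Delta\mathrm{NO_2} = 100\% \times \frac{\overline{\mathrm{NO_2}}_{\mathrm{obs}} - \overline{\mathrm{NO_2}}_{\mathrm{GAM}}}{\overline{\mathrm{NO_2}}_{\mathrm{GAM}}}$$

where the averages of observed and GAM predicted concentrations were taken over all stations of each category in each country for April in each of the years 2015-2020.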
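As a worked illustration of the relative difference between observed and expected levels, the sketch below computes it from station values for one country and month. The normalisation by the GAM-predicted mean is an assumption, as are the function name and the numbers, which are invented for the example.

```python
def relative_change_percent(observed, predicted):
    """Relative difference (%) between mean observed and mean predicted NO2.

    Negative values mean that the measured levels were below the
    business-as-usual prediction. Normalising by the predicted mean is
    an assumption of this sketch.
    """
    obs_mean = sum(observed) / len(observed)
    pred_mean = sum(predicted) / len(predicted)
    return 100.0 * (obs_mean - pred_mean) / pred_mean


# Hypothetical April means at two traffic stations in one country.
delta_no2 = relative_change_percent(observed=[10.0, 14.0], predicted=[25.0, 35.0])
```

Applying the same calculation to each of the years 2015-2019 gives the spread expected without a lockdown, against which the 2020 value can be judged.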
To investigate if the estimated differences in NO2 reductions between the countries could be explained by systematic differences in GAM performance and reflect a model artefact, we looked at the relationship between model performance and ΔNO2. This showed no covariation between the linear correlation coefficients (r) and ΔNO2. Furthermore, the country-wise differences in NO2 concentration reductions during lockdown estimated by our study agree very well with the results from many other European studies [3,4,6–8,18] based on different methodologies. Therefore, we are confident that these differences between the countries express real differences in the lockdown effect on NO2 levels in different countries.
The box-whisker plots shown in Figure 6 are sensitive to outliers when the number of stations is small. This is seen for the traffic sites in Sweden where the results were affected by one single station (SE0058-Dalaplan) in Malmö reporting substantially higher levels than expected in April 2020. These measurements were most likely wrong or reflect a very local change to the traffic pattern since a neighbouring traffic station (SE0096-Bergsgatan) located just 1 km away did not show any signs of such elevated NO 2 levels. In addition, in previous years, these two sites were highly correlated with each other.
The initial lockdown effect and the gradual recovery are shown in Figure 7. This shows the monthly median deviation from the expected NO2 levels, as calculated by the GAM at all urban and suburban stations (including traffic sites) in 2020 for April-July for each country with at least ten such sites. The lockdown was introduced around the middle of March in most countries, whereas the lifting of the restrictions varied substantially between the countries in both date and content.
Figure 7. The median relative drop in NO2 concentration at all urban and suburban sites is given by the difference between the GAM predicted and observed level in April-July 2020. Only countries with at least ten sites are shown.
The monthly data in Figure 7 show a gradual recovery during April-July for all countries. Still, none of the countries was "back to normal" even in July, indicating reduced NO 2 levels across Europe long after the lockdown restrictions had been lifted. This agrees with the findings of [7]. Countries in the east with the smallest reduction in NO 2 levels in April, such as Poland and the Czech Republic, also showed the least change during the period, staying at around 20% reduction throughout these months. In the countries with the largest NO 2 reduction in April, the NO 2 drop changed over these four months from 60% to 30% in Spain, and from 51% to 19% and 20% in France and Italy, respectively.

The time series of the aggregated NO 2 data (observed and predicted) for all traffic stations in Spain and the Czech Republic from 2015 to 2020 are given in Figures 8 and 9 as examples. Similar figures for all countries are given in Figures S1-S3 in the Supplementary Material, based on traffic stations, urban/suburban background stations, and rural stations, respectively. These results showed very good agreement between the observations and the GAM predictions when averaged over the country. The weekly cycle and the peaks and dips through the period were reproduced well by the GAM. The data from the Czech Republic indicated a tendency to underestimate the NO 2 levels during some of the peak episodes. These results strongly support the conclusion that the GAM provides reasonable predictions of the daily NO 2 levels at these sites. The model performance was comparable for the other European countries, as can be seen from the plots in the Supplementary Material.

The results for 2020 in Figure 8 show a substantial reduction in measured NO 2 levels compared to the predictions for the Spanish traffic sites, most pronounced during the lockdown (14 March-9 May). After the lockdown was lifted in May, the disparity between the expected and observed levels gradually decreased, and by the end of July, the measured NO 2 levels were within the 95% confidence interval of the GAM predictions.
In contrast, the results for the Czech traffic sites in 2020 (Figure 9) showed that the measured NO 2 levels were below the predictions for the whole period, but mostly within the 95% confidence interval of the predictions and without a clear trend from March to July, as also discussed above. The country-level differences in the estimated lockdown effects over Europe based on the GAM agree well with other findings, e.g., those reported by [4,7].

Figure 10 shows the observed and predicted NO 2 concentrations during March-July at traffic stations in six large European cities: Barcelona, Madrid, Rome, Paris, Vienna and Berlin. For the first four of these cities, the results showed substantial reductions in the NO 2 levels compared to the predictions. In contrast, the levels in Vienna and Berlin were only slightly reduced. The results for the Spanish sites align with the study by [10], which used an ML-based approach to analyse NO 2 data from Spain in March-April. As seen from Figure 10, the day-to-day variations in the observations closely follow the predictions for all the cities, but at lower levels. This is a strong indication that the reduced levels reflect emission reductions and not weather anomalies or other confounding processes. For Berlin, too, the observations and predictions correlated very well, indicating that the much smaller NO 2 reductions reflect a substantially lower emission reduction in Berlin during the lockdown than in the other cities. From 1 May, Berlin's observed levels were close to the predictions, indicating that road traffic in Berlin returned to normal conditions very early compared to the other cities. Figure 11 provides a spatial view for the same set of six cities, showing the station-level reductions in measured surface NO 2 compared to the business-as-usual scenario.

A Comparison of the GAM Predictions with Satellite Data for Selected Cities
Earth-observing satellites allow for a unique, spatially continuous perspective on air quality that is typically not possible with the relatively sparse official air quality monitoring network. The recently launched Sentinel-5P satellite with its TROPOMI instrument allows for maps with higher spatial resolution than previously possible. Using such data, simple comparisons of monthly mean NO 2 levels between different years can be made. Figure 12 shows a comparison of the April 2019 NO 2 levels against the same period of 2020. Qualitatively, the impact of reduced emissions due to lockdown measures is clearly visible. However, such a simple comparison is prone to various uncertainties and, most importantly, does not account for the effect of different meteorology between the two years [4]. Relative differences calculated in this way between two years result from a combination of various effects and thus cannot be interpreted as the sole signal of lockdown measures.

To estimate to what extent quantitative estimates from a simple satellite-based technique are comparable with a more robust meteorology-correcting approach such as the one presented in this paper, we extracted the relative difference between the April 2019 and April 2020 averages of the NO 2 Tropospheric Vertical Column Density (TVCD) from TROPOMI/S5P over a circular region of 40 km diameter for all cities for which the GAM-based analysis had at least ten stations. All available data from the official Level-2 offline NO 2 product were gridded to 0.025° by 0.025° spatial resolution, filtered for clouds and other retrieval issues (using only retrievals with quality assurance flag values greater than 0.75), averaged to daily mosaics, and subsequently averaged over one month. These non-meteorology-corrected satellite estimates could then be directly compared to the meteorology-corrected relative differences from the GAM approach. The results can be seen in Figure 13.
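The processing chain described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the processing code used in the study: retrievals are assigned to grid cells by their pixel centre (operational gridding would normally weight by pixel footprint), the input is a plain list of tuples rather than Level-2 files, and all function names and example values are hypothetical. Only the 0.025° resolution, the 0.75 quality threshold and the 40 km diameter are taken from the text.

```python
import math
from collections import defaultdict

RES_DEG = 0.025    # grid resolution in degrees (from the text)
QA_MIN = 0.75      # keep only retrievals with qa value greater than this (from the text)
RADIUS_KM = 20.0   # circular region of 40 km diameter (from the text)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def city_monthly_mean(retrievals, city_lat, city_lon):
    """retrievals: iterable of (day, lat, lon, qa, tvcd) for one month."""
    # 1) quality filter and assignment to grid cells, kept separate per day
    daily = defaultdict(list)
    for day, lat, lon, qa, tvcd in retrievals:
        if qa > QA_MIN:
            cell = (math.floor(lat / RES_DEG), math.floor(lon / RES_DEG))
            daily[(day, cell)].append(tvcd)
    # 2) daily mosaic: one mean value per cell and day
    per_cell = defaultdict(list)
    for (_day, cell), values in daily.items():
        per_cell[cell].append(sum(values) / len(values))
    # 3) monthly mean per cell, keeping cells whose centre lies inside the circle
    cell_means = []
    for (i, j), values in per_cell.items():
        centre_lat, centre_lon = (i + 0.5) * RES_DEG, (j + 0.5) * RES_DEG
        if haversine_km(centre_lat, centre_lon, city_lat, city_lon) <= RADIUS_KM:
            cell_means.append(sum(values) / len(values))
    return sum(cell_means) / len(cell_means) if cell_means else None


def relative_difference(mean_2020, mean_2019):
    """Relative change in percent; negative values indicate a reduction."""
    return 100.0 * (mean_2020 - mean_2019) / mean_2019


# Made-up illustrative retrievals around a city centre at (40.4168, -3.7038).
def fake_month(value):
    return [
        (1, 40.4168, -3.7038, 0.90, value),
        (2, 40.4300, -3.7200, 0.80, value),
        (2, 40.4168, -3.7038, 0.50, 100.0),  # rejected by the quality filter
        (3, 41.5000, -3.7038, 0.95, 50.0),   # outside the 20 km radius
    ]

mean_2019 = city_monthly_mean(fake_month(10.0), 40.4168, -3.7038)
mean_2020 = city_monthly_mean(fake_month(6.0), 40.4168, -3.7038)
change = relative_difference(mean_2020, mean_2019)
```

Because no meteorological correction is applied, the resulting percentage mixes the lockdown signal with year-to-year weather differences, as noted in the text.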
There was a surprisingly strong correlation between the two datasets, with an R² value of 0.91. However, there appeared to be a positive bias in the satellite data, particularly for relative differences around −20%. This was also reflected in the linear regression, which gave a slope of 0.81. Nonetheless, the correlation between the two datasets was robust, particularly considering the substantial uncertainties in the satellite-derived estimates and the fact that satellites provide integrated atmospheric column measurements as opposed to the surface-based station observations. This result indicates that simple year-to-year comparisons from satellite data can be useful for a first indicative analysis, even though a proper meteorology correction along the lines demonstrated in this paper remains necessary for a robust quantitative analysis.
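For readers who wish to reproduce this kind of comparison, the slope and R² of an ordinary least-squares fit between two sets of city-level relative changes can be computed as in the sketch below. The assignment of the GAM-based values to x and the satellite values to y is an assumption made here for illustration, and the numbers are made up; they are not the values behind Figure 13.

```python
def ols_slope_r2(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared). For a simple linear regression,
    r_squared equals the squared Pearson correlation coefficient.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Made-up illustrative relative changes in percent (one value per city).
gam_change = [-60.0, -50.0, -40.0, -30.0, -20.0]        # assumed x: GAM-based
satellite_change = [-50.0, -43.0, -33.0, -27.0, -17.0]  # assumed y: satellite-based

slope, intercept, r_squared = ols_slope_r2(gam_change, satellite_change)
```

A slope below one together with a high R², as in this toy example, means the two estimates rank the cities similarly while the satellite-based changes are smaller in magnitude.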

Conclusions
The strong restrictions on human activities linked to the first wave of the Covid-19 pandemic in spring 2020 in Europe led to substantial road traffic changes. This, in turn, led to significant reductions in the level of NO 2 and other pollutants. The quantification of this reduction is not trivial since differing weather patterns and underlying long-term emission trends could mask the signal from the lockdown in 2020. Various methods have been published to solve this issue, and in the present paper, we showed that a GAM (generalised additive model) was very well suited for the task.
The conceptual idea of the GAM is to establish statistical relationships between input explanatory variables and measured air pollutants by training the model on specific periods and then applying the established model to predict air concentrations in another period. The present study was based on NO 2 measurement data from the European Environment Agency (EEA) and gridded meteorological data extracted from ECMWF for the period March-July in 2015-2020. The GAM was applied to nearly 2000 NO 2 monitoring stations separately: it was first trained on the 2015-2019 period and then used to predict "business-as-usual" levels for 2020.
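The train-then-predict idea can be illustrated with a deliberately simplified additive model. The sketch below is a stand-in and not the GAM used in this study: it uses only two explanatory variables (day of week and temperature), replaces smooth spline terms with step functions over temperature bins, fits the components by backfitting, and runs on synthetic data with a built-in 40% reduction in 2020. All names and numbers are hypothetical.

```python
import math
from collections import defaultdict


def _update(rows, key_idx, other_idx, alpha, f_other):
    """One backfitting step: group means of the partial residuals, then centre."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        partial_resid = row[2] - alpha - f_other.get(row[other_idx], 0.0)
        sums[row[key_idx]] += partial_resid
        counts[row[key_idx]] += 1
    f = {k: sums[k] / counts[k] for k in sums}
    centre = sum(f[k] * counts[k] for k in f) / sum(counts.values())
    return {k: v - centre for k, v in f.items()}


def fit_additive(rows, n_iter=20):
    """rows: list of (day_of_week, temp_bin, no2). Returns (alpha, f_dow, f_temp)."""
    alpha = sum(r[2] for r in rows) / len(rows)
    f_dow, f_temp = {}, {}
    for _ in range(n_iter):
        f_dow = _update(rows, 0, 1, alpha, f_temp)
        f_temp = _update(rows, 1, 0, alpha, f_dow)
    return alpha, f_dow, f_temp


def predict(model, dow, temp_bin):
    alpha, f_dow, f_temp = model
    # Levels not seen in training fall back to a zero effect.
    return alpha + f_dow.get(dow, 0.0) + f_temp.get(temp_bin, 0.0)


def synthetic_day(year, d, scale=1.0):
    """Synthetic station-day: weekend dip plus a linear temperature effect."""
    dow = (d + year) % 7
    temp = 5.0 + 15.0 * d / 152 + 3.0 * math.sin(0.7 * d + year)
    no2 = 30.0 - (8.0 if dow >= 5 else 0.0) - 0.5 * (temp - 12.0)
    return dow, int(temp // 5), scale * no2  # 5-degree temperature bins


# Train on 2015-2019 (153 days per year, roughly March-July), predict 2020.
train = [synthetic_day(y, d) for y in range(2015, 2020) for d in range(153)]
model = fit_additive(train)

observed_2020 = [synthetic_day(2020, d, scale=0.6) for d in range(153)]
predicted_2020 = [predict(model, dow, tb) for dow, tb, _ in observed_2020]

obs_total = sum(y for _, _, y in observed_2020)
pred_total = sum(predicted_2020)
relative_change = 100.0 * (obs_total - pred_total) / pred_total
```

In this toy setting, the relative change recovered from the business-as-usual prediction should be close to the −40% built into the synthetic 2020 data, which is the same logic that the study applies per station with a full set of meteorological and temporal predictors.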
The results revealed that a screening of the stations was required. For most of the sites, the GAM provided good predictions of the daily NO 2 levels. In contrast, for a small number of stations, the agreement between predicted and observed levels was poor. Many of these cases could be explained by inconsistent measurement data. A larger fraction of the rural background sites showed poorer agreement between predicted and measured NO 2 levels, reflecting that NO 2 at these sites was, to a greater extent, controlled by long-range transport, which is not captured by the GAM.
The results after aggregating all traffic sites (or urban/suburban sites) for individual countries show particularly good agreement between predicted and observed daily NO 2 levels. This is likely an effect of station-wise peculiarities cancelling out. For the urban and suburban stations, we estimated the most substantial lockdown effect on NO 2 in Spain with a 60% reduction as a country average, followed by Italy (51%), France (51%), Portugal (47%) and Great Britain (43%). The least impact was estimated for the eastern countries of Poland (22%) and Hungary (23%). Our results showed a gradual recovery during April-July for all countries. Still, even in July, the NO 2 levels were 20% lower than expected in many countries, indicating that the effect of reduced emissions lasted long after the first lockdown restrictions had been lifted.
Aggregating the results for European cities also revealed large differences between the cities with Barcelona and Madrid on one end of the scale (mean reduction of around 60% in April) and Berlin, Hamburg and Vienna on the other end (20-30% reduction).
Whereas chemical transport models (CTMs) are state-of-the-art tools concerning assessment studies on a regional scale [26], they are less applicable for urban and roadside conditions. For these locations, statistical models such as the GAM could fill a gap assessing pollutants as documented in the present work.
The GAM has also been applied for PM 10, PM 2.5 and surface ozone [11,22,23] at rural, urban and suburban locations. The experience is that the GAM is indeed a valuable tool even for secondary pollutants at rural background sites, offering a low-cost model type that is complementary to resource-intensive CTMs. The GAM presented in this work will be applied to all of 2020, including the second wave and lockdown of the Covid-19 pandemic, as soon as the data are available.

Data Availability Statement: The study did not report any data.
Acknowledgments: Sabine Eckhardt at NILU is acknowledged for extracting the meteorological data from ECMWF. The GAM computations were performed on resources provided by UNINETT Sigma2, the National Infrastructure for High-Performance Computing and Data Storage in Norway.