1. Introduction
Despite positive developments in emission reductions, air quality is still of concern in Europe. Particulate matter (PM), nitrogen dioxide (NO2), and ground-level ozone (O3) are Europe’s most problematic pollutants negatively affecting human health. The European Environment Agency (EEA) estimated that in 2018 in the 28 European Union member states 379,000 premature deaths could have been caused by long-term exposure to particles with a diameter of 2.5 µm or less (PM2.5), 54,000 to NO2, and 19,400 to O3 [1]. According to the EEA, in 2018 48% and 74% of the urban population in Europe (EU-28) was exposed to concentrations above the World Health Organization (WHO) air quality guidelines (AQG) for particles with a diameter of 10 µm or less (PM10) (annual mean 20 µg/m³) and PM2.5 (annual mean 10 µg/m³), respectively. Fortunately, in most European countries NO2 concentrations steadily decreased between 2009 and 2018. Only 4% of the European population were exposed to NO2 concentrations above the EU annual limit value, which is equal to the AQG (40 µg/m³ in a calendar year) in 2018 [1]. SO2 pollution plays only a minor role in Europe these days, although, for example in vicinities of large power plants infrequent exceedances of limit values do occur (daily 20 µg/m³) and in 2018, based on the WHO AQGs, 19% of the urban population in Europe was affected [1].
Air quality maps are generated to inform the public about air pollution levels in the region they are living in. They serve as visualization of the actual situation and as a basis for air quality assessments. In situ observations, satellite measurements, and output from chemical transport modeling (CTM) are three mutually complementary sources for generating air quality maps. In situ measurements provide accurate actual concentration levels, satellite data provide observations of spatial and temporal patterns (but not concentrations directly), and modeling outputs provide spatially continuous coverage of a given area. The satellite-based monitoring initiative for regional air quality (SAMIRA) project was set up to explore the added value of satellite data for air quality mapping through their synergistic use together with in situ air quality and modeling data. Satellite observations used in the project were acquired by the geostationary spinning enhanced visible and infrared imager (SEVIRI) onboard Meteosat second generation (MSG) [2], the ozone monitoring instrument (OMI) onboard NASA’s Aura platform [3], and the TROPOspheric monitoring instrument (TROPOMI) on the Sentinel-5 Precursor (S5P) satellite [4].
For estimating human exposure to air pollution, knowledge of the PM concentration is essential. The ground-level concentration of PM2.5 can be estimated from satellite observations of total-column aerosol optical depth (AOD) utilizing various approaches (see e.g., [5,6,7], and references therein). Geostationary satellite instruments like SEVIRI allow for the retrieval of AOD (e.g., [8,9,10]) at high temporal frequency and are, therefore, particularly interesting for air quality applications. Consequently, the SEVIRI near-real-time (NRT) AOD retrieval was the first task within SAMIRA. Satellite AOD is a convolution of the contributions from within the planetary boundary layer (PBL) and the free troposphere, and from locally produced and long-range transported aerosols. Due to the complex spatial and temporal relationship between the total-column AOD and ground-level particulate matter, the AOD-to-PM conversion, the second activity within SAMIRA, is a considerable challenge. A multitude of methods for the AOD-to-PM conversion were developed throughout the years, for example using empirical and multivariate relations (e.g., [11,12]), scaling of the satellite AOD with the PM2.5/AOD ratio from a CTM [13], synergistic satellite and ground-based AOD [14], spectral and synergistic satellite information [15,16], fused satellite and model-calibrated PM2.5 [17], and machine learning [18,19,20]. For SAMIRA we chose a physically based AOD-to-PM conversion method, the foundation of which goes back to the work of [21].
SAMIRA also dealt with NO2 and SO2, which were of particular interest due to the improvement in spatial resolution and sensitivity introduced by TROPOMI. Whereas OMI pixels have a spatial resolution of up to 13 × 24 km² at nadir, TROPOMI has a footprint of 3.5 × 5.5 km² at nadir (since August 2019). For OMI, the QA4ECV NO2 is one of the most recent NO2 retrieval products [22]; for TROPOMI NO2 see [23]. For Romania, which was one of our study regions, [24] reported that annual SO2 emissions from power plants decreased between 2005 and 2015, while NO2 emissions were more or less stable during that time period. TROPOMI led to a new era of top-down NO2 emission monitoring from space [25,26], as well as bringing advances for surface NO2 concentration estimates (e.g., [27]). For SO2, despite the increased sensitivity of TROPOMI, in 2018/2019 only the largest SO2 emitters in Europe, e.g., the Polish Bełchatów coal power plant, were visible from space, due to the installation of flue gas desulfurization systems in the European Union [28].
An essential goal of SAMIRA was to improve the societal relevance of air quality data measured from space. This can be done by combining ground-based in situ data with model output and satellite data products. For the combination of different data sources, a range of methods can be used to create spatial concentration fields. Such methods are often referred to as data assimilation and data fusion [29], the latter being a subset of data assimilation methods in a wider sense [30]. Therefore, within the SAMIRA initiative, an operational algorithm for data fusion of multiple heterogeneous datasets was extended to make use of various NRT satellite-based air quality products.
Although the spatial resolution of air quality-related satellite products has significantly improved with the launch of the TROPOMI instrument, the available resolution is still relatively coarse for urban- and local-scale applications, where air pollution tends to have the most significant consequences for the human population. Therefore, we investigated the feasibility of statistical downscaling of OMI and TROPOMI data with the help of geostatistics and fine-scale proxy datasets. The proposed technique builds upon extensive previous research in geostatistics [31,32,33,34]. In geostatistical terms, downscaling is essentially a change of support problem (with support denoting the area of an observation, e.g., a point, a pixel, a grid cell, or a polygon), where the coarse spatial support of the original dataset is seen as an areal support and the fine spatial support of the target resolution is seen as a point support.
Finally, a pre-operational in situ PM10 data assimilation system was developed within SAMIRA. This development did not include satellite data yet. Therefore, it is only briefly described for completeness, being the first step towards a full air quality data assimilation system for Romania.
With this paper we want to give an overview and share lessons learned within the SAMIRA initiative. Following this introduction, in Section 2 we present the general methodology used in the project. Example results from the data product development are shown in Section 3. In Section 4 the validation of the datasets is presented. Section 5 illustrates the visualization system of the data. A discussion of the results, conclusions, and an outlook are summarized in Section 6.
2. SAMIRA Methodology
The overall approach, the activities and the logical flow of the SAMIRA project are illustrated in Figure 1. SAMIRA revolved around five primary research tasks: a. the SEVIRI AOD retrieval, b. the PM2.5 retrieval, i.e., the AOD to PM2.5 conversion, c. the data fusion methodology, which means the integration of satellite-based datasets with in situ monitoring and modeling data, d. the downscaling methodology, i.e., the development of algorithms for increasing the spatial resolution of satellite-based air quality products, and e. the in situ PM10 data assimilation. All activities marked in green in Figure 1 made use of satellite data, which were to various extents combined with ground-based in situ data, model data and auxiliary datasets.
The SAMIRA data product development took place in two consecutive steps: a. the retrieval of historical data (June–September 2014) and their validation, and b. the demonstration and development of the products in near-real-time. The SEVIRI NRT AOD retrieval was developed based on an existing optimal estimation algorithm to provide up-to-date high-frequency AOD maps for Poland, Romania, the Czech Republic, and southern Norway (see Section 2.1). SEVIRI AOD and output from the weather research and forecasting (WRF) model coupled with chemistry (WRF-Chem) [35,36,37] were combined to calculate near-surface hourly PM2.5 (see Section 2.2). WRF-Chem was run on the Babeş-Bolyai University (UBB) high performance computer for June–September 2014 in multiple configurations. The WRF-Chem model was integrated for the entire European area at two horizontal resolutions: 15 km and 5 km. From the model integration at 15 km horizontal resolution, nested domains were run for each country and region of interest, at 5 km (country) and 1 km (region) horizontal resolution, respectively (see Figure 2). For areas marked in bold, we show exemplary results in the following. We combined in situ with satellite data and output from a chemistry transport model (CTM), either WRF-Chem or the comprehensive air quality model with extensions (CAMx), using data fusion techniques (see Section 2.3); the methodology was demonstrated for Europe and the Czech Republic. In the current paper we focus on regional and local air quality matters; therefore, we show results from the latter only. To make the coarse spatial resolution of satellite observations more suitable for local applications, satellite data were downscaled with the help of the high-resolution CTM output and alternative time-invariant proxies using geostatistics (see Section 2.4). The methodology was demonstrated for NO2, SO2, and AOD/PM using OMI and TROPOMI data for the capitals of the four countries and for areas known for their bad air quality, e.g., the Ostrava/Katowice area. Finally, preparatory work for the development of an operational PM air quality forecast system in Romania was done by assimilation of in situ PM10 into the WRF-Chem model (see Section 2.5). In the following all retrievals are described briefly.
2.1. SEVIRI AOD Retrieval
The first step in the SAMIRA air quality product development was the SEVIRI NRT AOD retrieval. A prototype algorithm was initially developed for Poland [38] and within SAMIRA it was improved and extended to the Czech Republic, Romania, and Southern Norway. The algorithm was modified based on case studies presented in [39,40,41]. Improvements that were made are related to the surface reflectance estimation, the improved cloud screening, and the uncertainty calculation. For a full description of the final version of the algorithm, we refer the interested readers to [42]. In short, the computation consists of a few steps. At first, a reference day with a low AOD and low cloud cover is chosen, for which the surface reflectance is calculated from SEVIRI data utilizing the operational global-scale Copernicus atmosphere monitoring service (CAMS) AOD forecast product (https://atmosphere.copernicus.eu/) as background information on the spatial AOD distribution. CAMS AOD is corrected using sun-photometer measurements from several aerosol robotic network (AERONET) [43] stations in the respective country using an optimal interpolation method [38]. Then the surface reflectance is estimated with the use of corrected CAMS AOD. The AOD is calculated for several days around the reference day. Finally, the AOD at 635 nm is interpolated to a regular grid of 0.07° × 0.045°, corresponding approximately to 5.5 × 5.5 km² for Poland. Besides AOD, AOD pixel-level uncertainties are estimated.
For the NRT retrieval, each day at 00:21 UTC surface reflectances are calculated. This is done for each country separately. At 7:00 UTC the following conditions are checked: a. the mean AOD, which is calculated using CAMS data, is below or equal to 0.15 and b. the cloud cover is less than or equal to 65% (SEVIRI cloud mask). If both are fulfilled, surface reflectances are calculated for the previous day using data from SEVIRI, AERONET (both automatically downloaded), and CAMS AOD forecast data (downloaded every day at around 03:00 UTC). The SEVIRI AOD retrieval starts at the 7th, 23rd, 38th, and 53rd minute of each hour (within the time periods between 5:00 and 9:45 UTC, and between 13:00 and 16:45 UTC). There is a delay of about 20 to 23 min in receiving the data, which means that, e.g., the calculation for 7:00 UTC starts at 7:23 UTC. AOD computations take a few minutes, depending on the number of valid pixels (cloud-free and containing a surface reflectance). For each time slot the following is checked: the mean AOD (CAMS) is >0.15 and the cloud cover is ≤65%. If these conditions are fulfilled, AOD and its uncertainties are calculated from SEVIRI data and surface reflectances (for the reference day, being one of the previous days). The reference day for the surface reflectance calculation is chosen under the constraint that the span between the day for which the AOD map is derived and the closest available reference day cannot be more than 15 days.
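The reference-day test and the timing rules described above can be summarized in a short sketch. The thresholds (0.15, 65%), the start minutes, and the 15-day limit are taken from the text; the function names, the restriction to strictly earlier reference days, and the rule "first scheduled start at least 20 min after the SEVIRI time slot" are assumptions made for illustration and do not reproduce the operational SAMIRA code.

```python
from datetime import date, datetime, timedelta
from typing import Optional, Sequence

MAX_MEAN_AOD = 0.15                      # reference-day limit on mean CAMS AOD
MAX_CLOUD_COVER_PCT = 65.0               # limit on SEVIRI cloud cover, percent
MAX_REFERENCE_AGE = timedelta(days=15)   # longest allowed span to the reference day
START_MINUTES = (7, 23, 38, 53)          # retrieval start minutes within each hour
MIN_DATA_DELAY = timedelta(minutes=20)   # assumption: lower end of "20 to 23 min"


def is_reference_day(mean_cams_aod: float, cloud_cover_pct: float) -> bool:
    """A day qualifies as reference day if aerosol load and cloud cover are low."""
    return mean_cams_aod <= MAX_MEAN_AOD and cloud_cover_pct <= MAX_CLOUD_COVER_PCT


def closest_reference_day(target: date, reference_days: Sequence[date]) -> Optional[date]:
    """Return the most recent earlier reference day within 15 days, else None."""
    usable = [d for d in reference_days
              if timedelta(0) < target - d <= MAX_REFERENCE_AGE]
    return max(usable) if usable else None


def retrieval_start(slot: datetime) -> datetime:
    """First scheduled start that is at least MIN_DATA_DELAY after the time slot."""
    earliest = slot + MIN_DATA_DELAY
    hour = earliest.replace(minute=0, second=0, microsecond=0)
    candidates = [h + timedelta(minutes=m)
                  for h in (hour, hour + timedelta(hours=1))
                  for m in START_MINUTES]
    return min(t for t in candidates if t >= earliest)


print(retrieval_start(datetime(2014, 7, 1, 7, 0)))  # 2014-07-01 07:23:00
```

With these assumptions the 7:00 UTC slot is processed at 7:23 UTC, as in the example given in the text, and the 7:45 UTC slot at 8:07 UTC.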
2.2. PM2.5 Retrieval: AOD to PM2.5 Conversion
The near-surface hourly PM2.5 concentrations were retrieved from temporally averaged SEVIRI AOD data [44]. Estimating PM2.5 from satellite observations of total-column AOD requires knowledge of the aerosol properties (microphysical and optical), their vertical distribution (PBL height and fraction of AOD in the PBL), and AOD. More specifically, see Equation (1) from [21]:
$$\mathrm{PM}_{2.5} = \mathrm{AOD}\,\frac{4\,\rho\,r_{\mathrm{eff}}\,f_{\mathrm{PBL}}}{3\,Q_{\mathrm{ext}}\,H_{\mathrm{PBL}}} \qquad (1)$$
with $\rho$ being the density and $r_{\mathrm{eff}}$ the effective radius of the aerosol mixture, $Q_{\mathrm{ext}}$ the extinction coefficient, $f_{\mathrm{PBL}}$ the fraction of AOD in the PBL, and $H_{\mathrm{PBL}}$ the height of the PBL.
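As a worked illustration of how these quantities combine, the following sketch evaluates the commonly used relation in which PM2.5 is proportional to AOD, aerosol density, effective radius and the PBL fraction of AOD, and inversely proportional to the extinction term and the PBL height. The function name, the SI unit choices and all input values are assumptions for illustration only; the extinction term is treated here as a dimensionless extinction efficiency, which is what makes the units consistent.

```python
def aod_to_pm25(aod, rho, r_eff, q_ext, f_pbl, h_pbl):
    """Convert total-column AOD to near-surface PM2.5 in µg/m³.

    aod   : aerosol optical depth (dimensionless)
    rho   : aerosol density in kg/m³
    r_eff : effective radius in m
    q_ext : extinction efficiency (dimensionless, assumption)
    f_pbl : fraction of AOD located in the PBL (0..1)
    h_pbl : PBL height in m
    """
    mass_kg_per_m3 = aod * 4.0 * rho * r_eff * f_pbl / (3.0 * q_ext * h_pbl)
    return mass_kg_per_m3 * 1.0e9  # kg/m³ -> µg/m³


# Illustrative (not measured) inputs: moderate AOD, sub-micron particles.
pm25 = aod_to_pm25(aod=0.3, rho=1500.0, r_eff=0.25e-6,
                   q_ext=2.0, f_pbl=0.8, h_pbl=1500.0)
print(round(pm25, 1))  # 40.0
```

The example makes the sensitivity to the vertical distribution explicit: halving the PBL height at fixed AOD doubles the estimated surface concentration.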
The workflow for our PM2.5 retrieval is illustrated in Figure 3. The most computationally intensive part is obtaining the optical and microphysical properties of aerosols in an online fashion. This was handled by creating a look-up table of properties for a range of aerosol mixtures at a number of relative humidity levels. Using the algorithm developed in [45] for aerosol typing a synthetic database was generated by simulating the optical properties of various aerosol types based on available information on the microphysics.
The algorithm combines the global aerosol dataset (GADS) database [46], the optical properties of aerosol and clouds (OPAC) model [47] and T-Matrix code for light scattering by non-spherical particles [48] in order to compute iteratively the optical properties of different aerosol classes in various humidity conditions and mass proportions, starting from the microphysical properties. From GADS we used the extinction, scattering and absorption coefficients, the single scattering albedo and the asymmetry parameter. From OPAC, the microphysical properties used are the density of aerosol particles, the aerosol mass per cubic meter, and the mode radius. Information from the CTM is used to compute the mass mixing ratios of the aerosols which are part of PM2.5. Using the model we computed the relative humidity, mass mixing ratios for four major aerosol types (soot, water soluble, insoluble, sulfates), and the PBL height. The AOD fraction in the PBL was computed using WRF-Chem and the same aerosol type was assumed at all levels. Uncertainties due to the use of WRF-Chem cross-sections and the assumption of a unique aerosol type in the column were evaluated in post-processing mode by comparisons with the cloud–aerosol LIDAR with orthogonal polarization (CALIOP) onboard the CALIPSO mission [49] and ground-based LIDAR data.
T-matrix calculations are very time consuming (approximately 3 s per mixture/grid point). For a domain of 150 × 200 grid points, an online computation would have taken about a day; therefore, look-up tables were pre-computed. A master script handled the inputs and errors, e.g., those caused by missing input variables. It could run in automated or manual mode; the latter plotted all intermediate steps for debugging and quality control. Within SAMIRA we could demonstrate the NRT capability of our AOD-to-PM2.5 retrieval, with PM2.5 maps being ready within 5 min after the AOD maps became available on a server.
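The motivation for the look-up table can be checked with simple arithmetic, and the table access itself can be sketched in a few lines. The 3 s per grid point and the 150 × 200 grid are from the text; the relative humidity levels, the table values (here an effective radius in µm per mixture) and the function name are hypothetical and only illustrate the principle of interpolating pre-computed properties instead of running the scattering code online.

```python
import numpy as np

# Cost of an online computation: 3 s for each of the 150 x 200 grid points.
online_hours = 3.0 * 150 * 200 / 3600.0
print(online_hours)  # 25.0

# Hypothetical look-up table: one row per aerosol mixture, one column per
# relative humidity level (values are placeholders, not SAMIRA results).
RH_LEVELS = np.array([0.0, 50.0, 70.0, 80.0, 90.0, 95.0])
R_EFF_LUT = np.array([
    [0.20, 0.22, 0.25, 0.28, 0.35, 0.42],   # mixture 0
    [0.30, 0.31, 0.32, 0.34, 0.38, 0.45],   # mixture 1
])


def lookup_r_eff(mixture_index, rel_humidity):
    """Interpolate the tabulated property in RH for every grid point."""
    mixture_index = np.asarray(mixture_index)
    rel_humidity = np.asarray(rel_humidity, dtype=float)
    result = np.full(rel_humidity.shape, np.nan)
    for k in range(R_EFF_LUT.shape[0]):
        mask = mixture_index == k
        # np.interp holds the end values constant outside the tabulated range.
        result[mask] = np.interp(rel_humidity[mask], RH_LEVELS, R_EFF_LUT[k])
    return result


grid_mixture = np.array([[0, 1], [1, 0]])
grid_rh = np.array([[60.0, 50.0], [99.0, 0.0]])
print(lookup_r_eff(grid_mixture, grid_rh))
```

A table look-up of this kind costs a negligible fraction of a second for the whole domain, which is consistent with the reported availability of PM2.5 maps within minutes of the AOD maps.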
2.3. Data Fusion Methodology
The data fusion methodology used for SAMIRA is illustrated in Figure 4. It is a variant of the regression–interpolation–merging mapping [50,51], which is an improved residual kriging method. Residual kriging is a frequently used data fusion method [52]. In residual kriging, monitoring, modeling and other supplementary data are combined in multiple linear regression and subsequent spatial interpolation of its residuals is done by ordinary kriging [53]. Separate map layers were created for rural and urban background areas on a grid at a 1 × 1 km² spatial resolution. The rural layer was based on the rural background stations, while the urban layer was based on the urban and suburban background stations. Residual kriging was applied separately for the rural and urban background areas with the subsequent merging of these map layers by population density.
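The two stages of residual kriging, a multiple linear regression followed by ordinary kriging of its residuals, can be sketched as follows for a single map layer. This is a minimal illustration under stated assumptions: the exponential variogram with fixed sill and range (no nugget, no variogram fitting), the planar coordinates, and all function names are hypothetical and are not the operational mapping code.

```python
import numpy as np


def exp_variogram(h, sill=1.0, corr_range=50.0):
    """Exponential semivariogram without nugget (assumed, not fitted)."""
    return sill * (1.0 - np.exp(-h / corr_range))


def ordinary_kriging(xy_obs, z_obs, xy_new, variogram=exp_variogram):
    """Ordinary kriging of z_obs at stations xy_obs onto locations xy_new."""
    n = len(z_obs)
    d_obs = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=-1)
    # Kriging system: semivariances bordered by the unbiasedness constraint.
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = variogram(d_obs)
    a[n, n] = 0.0
    d_new = np.linalg.norm(xy_obs[:, None, :] - xy_new[None, :, :], axis=-1)
    b = np.ones((n + 1, xy_new.shape[0]))
    b[:n, :] = variogram(d_new)
    weights = np.linalg.solve(a, b)[:n, :]   # drop the Lagrange multiplier
    return weights.T @ z_obs


def residual_kriging(xy_obs, z_obs, pred_obs, xy_new, pred_new):
    """Linear regression on predictors plus kriged regression residuals."""
    design_obs = np.column_stack([np.ones(len(z_obs)), pred_obs])
    coef, *_ = np.linalg.lstsq(design_obs, z_obs, rcond=None)
    residuals = z_obs - design_obs @ coef
    design_new = np.column_stack([np.ones(len(pred_new)), pred_new])
    return design_new @ coef + ordinary_kriging(xy_obs, residuals, xy_new)


# Synthetic example: 8 stations, 2 predictors (e.g., model output, satellite).
rng = np.random.default_rng(0)
stations = rng.uniform(0.0, 100.0, size=(8, 2))
predictors = rng.normal(size=(8, 2))
observed = rng.normal(loc=20.0, scale=5.0, size=8)
at_stations = residual_kriging(stations, observed, predictors, stations, predictors)
```

Because no nugget is used, the sketch is an exact interpolator: evaluated at the station locations it returns the observed values, and away from the stations it relaxes towards the regression estimate.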
Air quality in situ data were acquired from the Czech national air pollution database, the air quality information system (AQIS) [54] and the EEA’s air quality e-reporting database [55]. Model data were acquired from three models: CAMx for the Czech domain at 4.7 × 4.7 km² spatial resolution, WRF-Chem for the European domain (hourly and daily time steps) at 5 × 5 km² spatial resolution and the European Monitoring and Evaluation Programme (EMEP) model [56] for the European domain (annual averages) at 10 × 10 km² spatial resolution. Various satellite datasets were used. SEVIRI AOD from SAMIRA (temporally aggregated into hourly and daily averages) was used for the creation of data-fused PM2.5 and PM10 maps. The PM2.5 product from SAMIRA was used for data-fused PM2.5 maps. NO2 and SO2 from OMI, complemented by data from the Global Ozone Monitoring Experiment–2 (GOME-2), were used for the development of historical test data for 2014. NRT NO2 and SO2 data were obtained from TROPOMI. The supplementary datasets needed were altitudes from the ZABAGED database prepared by the Czech Office for Surveying, Mapping and Cadastre (https://geoportal.cuzk.cz/) and the Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) [57]. Merging of the rural and urban map layers was done using population density based on the Geostat 2011 grid dataset [58] and data from the Czech Office for Surveying, Mapping, and Cadastre.
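The merging of the rural and urban layers by population density can be illustrated with a simple weighting scheme. The text only states that population density is used for the merging; the linear transition between two density thresholds and the threshold values below are assumptions chosen for illustration.

```python
import numpy as np


def merge_layers(rural, urban, pop_density,
                 rural_threshold=100.0, urban_threshold=500.0):
    """Blend rural and urban map layers cell by cell.

    pop_density is in inhabitants per km²; both thresholds are hypothetical.
    Below rural_threshold the rural layer is used, above urban_threshold the
    urban layer, and in between the two are mixed linearly.
    """
    weight_urban = np.clip(
        (pop_density - rural_threshold) / (urban_threshold - rural_threshold),
        0.0, 1.0)
    return (1.0 - weight_urban) * rural + weight_urban * urban


rural_map = np.array([10.0, 10.0, 10.0])
urban_map = np.array([20.0, 20.0, 20.0])
density = np.array([50.0, 300.0, 1000.0])
print(merge_layers(rural_map, urban_map, density))  # [10. 15. 20.]
```

A smooth transition of this kind avoids artificial steps in the final map at the edges of built-up areas.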
2.4. Downscaling Methodology
The downscaling methodology used for SAMIRA is, just like many other downscaling techniques, essentially based on increasing the spatial resolution of a coarse source dataset (in our case satellite data of air quality) with the help of spatial proxy or auxiliary datasets that are available at a fine spatial resolution and that are, to some extent, correlated with the source dataset. As such, the technique makes use of the assumption that the spatial patterns of the unknown fine-scale field of the source variable will be similar to the spatial patterns of the fine-scale proxy datasets.
Figure 5 illustrates the SAMIRA downscaling methodology. In simplified terms, the various input datasets were brought to the same coarse resolution and a statistical model was fitted to directly relate the satellite and the proxy data. The model choice was arbitrary and could range from simple linear regression models to more advanced non-linear models such as random forest or similar. Subsequently, the spatial residuals from the model were calculated, downscaled to the fine target resolution using area-to-point kriging [33], and added to the deterministic trend component of the fitted model.
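The sequence of steps (aggregate the proxy, fit a model at coarse resolution, downscale the residuals, add them to the fine-scale trend) can be sketched as follows. Note the substitution: instead of area-to-point kriging, this sketch simply replicates each coarse residual over its fine cells, which keeps the code short and, like area-to-point kriging, preserves the coarse-cell means. The linear model, the integer resolution ratio and the function names are assumptions for illustration.

```python
import numpy as np


def block_mean(fine, factor):
    """Aggregate a fine grid to coarse resolution by averaging factor x factor blocks."""
    ny, nx = fine.shape
    return fine.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))


def downscale(coarse_sat, fine_proxy, factor):
    """Downscale coarse satellite data with a fine-scale proxy.

    1. bring the proxy to the coarse resolution,
    2. fit a linear model satellite ~ proxy at the coarse resolution,
    3. compute the coarse residuals and spread them over the fine cells
       (block replication as a stand-in for area-to-point kriging),
    4. add the residuals to the fine-scale trend from the fitted model.
    """
    proxy_coarse = block_mean(fine_proxy, factor)
    slope, intercept = np.polyfit(proxy_coarse.ravel(), coarse_sat.ravel(), 1)
    residual_coarse = coarse_sat - (intercept + slope * proxy_coarse)
    residual_fine = np.kron(residual_coarse, np.ones((factor, factor)))
    return intercept + slope * fine_proxy + residual_fine


# Synthetic example: 3 x 3 coarse pixels, each split into 4 x 4 fine cells.
rng = np.random.default_rng(1)
proxy = rng.uniform(0.0, 1.0, size=(12, 12))
satellite = rng.uniform(1.0, 5.0, size=(3, 3))
fine_field = downscale(satellite, proxy, factor=4)
```

By construction, averaging the downscaled field back to the coarse grid reproduces the original satellite values, so the fine-scale detail comes entirely from the proxy while the coarse observations are honored.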
2.5. In Situ PM10 Assimilation
For completeness, the SAMIRA pre-operational in situ PM10 air quality data assimilation forecast system should be mentioned. Based on the prototype system developed in the ESA SiAiR project (2014–2015) [59], the WRF-Chem model was set up to run at 5 km horizontal resolution, covering most of Europe. The emission pre-processor was developed by the Central Institute for Meteorology and Geodynamics (ZAMG) in order to prepare data from emission inventories for WRF-Chem. Emissions were taken from the TNO-MACC II emission inventory [60] and the EMEP inventory (http://www.ceip.at/ceip-reports, accessed on 30 October 2018) for areas not covered by TNO emissions. This development was a first step towards a full air quality data assimilation system for Romania. It was designed to allow extension to other observational datasets. Thus, adding processor(s) for Sentinel 3, Sentinel 5P, and future missions is desirable.
6. Conclusions and Outlook
The SAMIRA initiative led to an increase in knowledge and better exploitation of synergistic satellite-based air quality products. A distributed NRT system for satellite-based regional air quality was set up, thus successfully demonstrating a complex technical interplay between multiple research and operational institutions located in four European countries. Advances were made in the following research areas:
1. The SEVIRI AOD optimal estimation algorithm was improved and geographically extended from Poland to Romania, the Czech Republic and Southern Norway. Alongside AOD, pixel-level uncertainties were estimated. After testing for historical data (June–September 2014), an NRT retrieval was implemented and is currently operational (for details see [42]). The benefit of using SEVIRI for air quality applications is the possibility to obtain data with a high temporal resolution (15 min). The largest limitation of any geostationary AOD algorithm is related to the surface hot-spot effect for scattering angles close to 180° and a small solar zenith angle, limiting the retrieval to times of day up to 10:00 UTC and after 14:00 UTC. The exact range, however, depends on the time of the year and geographical position. A specific issue for the SEVIRI optimal interpolation AOD retrieval is the choice of the reference day (a clean day with low AOD and clear sky) and the requirement that regular ground-based AOD measurements are available. This hinders the retrieval for regions where no photometer data are available, as was the case for the Czech Republic. Due to the northern geographic location of Norway, the scattering geometry is unfavorable and clouds frequently obstruct the scenes; therefore, AOD data for Norway are sparse. Validation against ground-based sun-photometers located in Romania and Poland, using the data from 2014, showed generally good agreement, with country mean correlation coefficients (R) between 0.61 and 0.62, a bias of 0.09–0.12, and an RMSE of 0.12–0.14; the agreement with the 3 km AOD product from MODIS was weaker. For more details see [64].
2. A retrieval for ground-level concentrations of PM2.5 was implemented using the SEVIRI AOD in combination with WRF-Chem output. NRT capability for the AOD-to-PM2.5 retrieval was demonstrated. The satellite-based PM2.5 data product from our method was validated using ground-based in situ PM2.5 observations from national air quality networks. An important lesson learned is that the representativity of the air quality monitoring stations must be taken into account when evaluating the methodology. For representative sites in Poland and the Czech Republic, correlations between 0.49 and 0.56 were found between satellite-based PM2.5 and PM2.5 measured at air quality sites; this is nearly double the correlation between WRF-Chem PM2.5 and PM2.5 observed at the in situ sites. Uncertainties in the PM2.5 retrieval, as well as AOD and WRF-Chem uncertainties, contribute to the uncertainty of the satellite-based PM2.5. The boundary layer altitude was found to be an important parameter for a potential future improvement of the PM2.5 retrieval.
3. The added value of including satellite data when creating air quality maps was demonstrated. An operational algorithm for data fusion with the capability of optimally merging and mapping multiple heterogeneous datasets was extended to make use of various satellite-based air quality products (NO2, SO2, AOD, PM2.5, and PM10). Validation results showed that in multiple cases the inclusion of satellite data can improve the mapping for the Czech Republic for both historical and NRT data. Moreover, the inclusion of satellite data improves the daily and hourly mapping results of NO2 in the rural areas, and the annual results in both the rural background and the urban regions. Inclusion of satellite SO2 slightly improves both rural and urban mapping results for hourly, daily, and annual data. Including AOD or PM2.5 derived from satellite AOD improved the results for PM2.5 and PM10. The main limitation for the operational use of such data lies in the limited satellite data coverage due to lack of daylight and cloudiness. Gap filling in the cloudy areas, using model and in situ data only, can be considered a suitable way to improve coverage, as long as it is properly flagged.
4. A geostatistical downscaling algorithm was developed and tested to bridge the gap between satellite products of air quality (typically provided at spatial resolutions on the order of several kilometres) and urban-scale applications (for which spatial resolutions of hundreds of meters are required). Statistical downscaling has been carried out in many disciplines in the past and substantial efforts have also been made in satellite remote sensing. However, to our knowledge no studies have so far used or implemented such approaches for downscaling satellite-based air quality products. In a first step, the SAMIRA downscaling algorithm was validated using synthetic data. We found that it performs well in extracting spatial details that can be seen in the true high-resolution data field. We then successfully demonstrated downscaling OMI NO2 data to the spatial resolution of TROPOMI, with NO2 data from the latter acting as a true high-resolution reference. It is expected that the advantage provided by the downscaling algorithm will also hold for even finer spatial scales. It is important to note here that, when using a time-invariant proxy, the downscaling on a daily basis (i.e., not for a longer-term average) for a high-resolution instrument such as TROPOMI is limited to situations with relatively calm winds in the area and no significant plumes forming. In the case of a substantial plume, the downscaling algorithm will spatially redistribute the signal to areas where the emissions causing the plume did not actually originate, thus leading to erroneous results.
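The validation statistics quoted in the list above (correlation coefficient R, bias, and RMSE) can be computed from collocated pairs as in the following sketch. The sign convention for the bias (retrieved minus reference), the handling of missing values and the function name are assumptions; the numbers in the example are synthetic.

```python
import numpy as np


def validation_stats(retrieved, reference):
    """Return R, bias, RMSE and sample size for collocated data pairs."""
    retrieved = np.asarray(retrieved, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Keep only pairs where both values are available.
    valid = np.isfinite(retrieved) & np.isfinite(reference)
    r, o = retrieved[valid], reference[valid]
    diff = r - o
    return {
        "R": float(np.corrcoef(r, o)[0, 1]),
        "bias": float(diff.mean()),
        "rmse": float(np.sqrt((diff ** 2).mean())),
        "n": int(valid.sum()),
    }


# Synthetic pairs: satellite AOD vs. sun-photometer AOD, one gap in the retrieval.
stats = validation_stats([0.2, 0.3, 0.4, np.nan], [0.1, 0.2, 0.3, 0.5])
```

In this synthetic case the retrieval is offset by a constant 0.1, so the bias and the RMSE are both 0.1 while the correlation is perfect, which illustrates why all three measures are reported together.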
We can conclude that the SAMIRA project was a significant step towards a better exploitation of Earth observation capabilities for air quality monitoring in Europe. Geostationary satellite instruments, like SEVIRI and the upcoming Sentinel 4 mission [65] (launch planned in 2023), are particularly interesting for air quality applications. Setting up a European or international initiative analogous to, e.g., the Production and Evaluation of Aerosol Climate Data Records from European Satellite Observations (Aerosol_cci) [66], but with a focus on geostationary AOD retrievals, would be an important step forward in improving satellite-based air quality. This is also true for the estimation of the information content for satellite-based PM2.5 retrieval. Outcomes from SAMIRA, together with the work performed for the EEA [67], led to the inclusion of satellite data in the EEA air quality mapping in Europe. A geospatial downscaling algorithm was implemented and we could demonstrate its skill. More work is necessary to better understand the uncertainties and limitations associated with the resulting downscaled products. All of the work shown above is about data integration. Linking the NO2 columnar data with surface concentrations of NO2, for which several hundreds of stations are available in Europe, is a natural next step in the development of satellite-based urban-scale air quality monitoring.