1. Introduction
There are currently many projects worldwide that develop and use information technology to aggregate, analyze, and visualize information about air pollution in cities, providing urban residents with timely information about possible dangers. The information sources for such monitoring systems are ground-based air quality measurement stations and satellite data. Some projects monitor air quality not only for a single country but for many countries or regions. One such project is Smart Urban Solutions for Air Quality, Disasters and City Growth (SMURBS), part of the Horizon 2020 ERA-PLANET program, within which the National Observatory of Athens has developed a platform for monitoring indicator 11.6.2, “Annual mean levels of fine particulate matter in cities”, for 800 cities in 37 European countries [1]. Unfortunately, Ukraine is not included in the list of countries monitored on this platform, which raises the question of developing a similar product for Ukraine.
This article analyzes existing technologies and data worldwide that can be used to monitor air quality in individual cities and in whole countries. We also propose an algorithm for creating similar high-quality products for the territory of Ukraine. For Ukraine, there are currently two sources of air quality information: ground stations equipped with appropriate sensors, and satellite data. We first consider ground measurements.
In recent years, Ukraine has significantly expanded its network of ground stations and created air quality monitoring systems in large cities. This is confirmed by the growing number of monitoring stations from various sensor providers in Ukraine. For example, the current largest air quality networks, EcoCity [2] and SaveDnipro [3], appeared in 2018; the total number of sensors across all stations was 1183 in 2020 and had already reached 1568 by 2021, an increase of 385 sensors in one year. The number of stations measuring air pollution has grown most in large cities, which are also the biggest contributors to air pollution. However, because this activity is concentrated in large cities, the absence or scarcity of ground posts for measuring air quality indicators in less populated towns and villages remains an open problem. In this paper, we propose using open satellite or model data to fill the gaps where there are no ground measurements. The validation of satellite and model data against ground measurements for the territory of Ukraine was considered in [4].
The world’s largest aggregator of air pollution data is the AQICN resource [5], which reports the Air Quality Index (AQI) defined by the United States Environmental Protection Agency, visualizes all sensors, and provides their data through an API. It integrates smart city networks around the world, as well as national and public air quality monitoring systems. Western Europe, Southeast Asia, North America, and India are its largest sources of information. The estimated number of operating stations of different types is roughly 13,000.
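AQICN reports concentrations through the US EPA Air Quality Index, which maps a pollutant concentration onto a 0 to 500 index by linear interpolation between tabulated breakpoints. The following Python sketch illustrates that conversion for 24-hour PM2.5. The breakpoint table is the one the EPA used before its 2024 revision of the PM2.5 bands, included here as an assumption for illustration; the current EPA table should be checked before any operational use, and the function name `pm25_to_aqi` is hypothetical.

```python
import math

# 24-hour PM2.5 breakpoints (C_lo, C_hi, I_lo, I_hi), concentrations in µg/m³.
# Assumption: these are the pre-2024 US EPA values, used only for illustration.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def pm25_to_aqi(concentration):
    """Convert a 24-hour mean PM2.5 concentration (µg/m³) to a US AQI value."""
    # Truncate to one decimal place; the small epsilon guards against values
    # such as 35.4 being stored as 35.3999... in floating point.
    c = math.floor(concentration * 10 + 1e-9) / 10
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            # Linear interpolation within the matching concentration band
            index = (i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo
            # Round half up to the nearest integer index
            return math.floor(index + 0.5)
    raise ValueError("concentration outside the tabulated AQI range")
```

For example, a daily mean of 20 µg/m³ falls in the 12.1–35.4 band and maps to an index of 68 under these breakpoints.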
In Western Europe, one of the most popular air quality monitoring initiatives, and the most common provider of air pollution data, is the Sensor Community [6], whose developers launched the Luftdaten.info public project in 2015 in Stuttgart, Germany [7]. This project has become a catalyst for the development of similar public air quality monitoring networks in Ukraine.
Another portal used by residents of smart cities in the European Union is Airly [8]. It applies its own technological solutions and provides one-day-ahead forecasts of PM2.5 and PM10 dust concentrations. The service works with more than 300 local authorities, and more than 4000 devices around the world are integrated into it.
The Swiss commercial project IQAir [9] uses devices to measure air pollution levels both outdoors and indoors, offering appropriate sensors and a mobile application for control and notification. IQAir operates the world’s largest free real-time air quality information platform and attracts a growing number of citizens, organizations, and governments worldwide. The IQAir AirVisual platform integrates data collected by governments, companies, and individuals around the world.
In the context of smart cities, the most valuable tool is the Urban Atlas service, which belongs to the local component of the Copernicus Land Monitoring Service (CLMS) and provides a detailed digital city plan in vector form. This plan is segmented into small functional areas classified according to the Coordination of Information on the Environment (CORINE) nomenclature [10]. The Urban Atlas is a high-resolution geospatial layer, built for all European cities with a population of more than 100,000, that combines high-resolution satellite data, city segmentation into blocks and functional urban areas (FUAs), important city infrastructure, etc. This product is used as a basis for city planning and for obtaining analytics on the most important indicators of city development, including air quality monitoring.
There are many other air quality monitoring services designed and operated for the territories of the European Union [11, 12], Great Britain [13], the United States [14], and elsewhere. Analyzing them, we can conclude that they publish information in the form of pollutant concentration levels. Almost all smart city monitoring systems include web portals that combine a large amount of environmental and urban data that are directly or indirectly related to air quality and determine the comfort of the urban environment. The clear advantage of such systems is the use of dense networks of measuring equipment in urban areas. In particular, one of the most prominent projects, which covers air quality analysis and the evaluation of the Sustainable Development Goal 11.6.2 indicator, quality of life, analysis of urban growth, and many other important applications, is the SMURBS project [15]. Almost all air quality monitoring projects use data from the Copernicus Atmosphere Monitoring Service (CAMS), a global service based on satellite measurements.
The CAMS air quality monitoring and forecasting service fully covers the territory of Ukraine, which is encouraging for national developers of air quality monitoring services. However, these data are not yet properly used in Ukraine, because applications for cities can only be implemented with an active network of ground stations, which makes it possible to refine pollution concentrations in urban areas and to increase the spatial resolution of global raster products. At the same time, over the last three years, public air quality monitoring initiatives and networks, such as EcoCity and others, have been developing very rapidly in Ukraine. In the future, such citizen science data sources may become alternatives to governmental air quality monitoring networks. Thus, by using these data, Ukrainian cities can already exploit Copernicus data for air quality monitoring in the same way as other European cities. Although ground air quality monitoring networks have grown significantly, many large cities still have only a few monitoring stations each, or none, which cannot capture the full variability of air quality across a large city. Examples include Chernihiv (one station), Sumy (one station), Donetsk and Luhansk (no stations), Kherson (four stations), and Zhytomyr (four stations). The lack of high-quality national products for urban planning and for monitoring air quality indicators in cities makes it impossible to assess the impact of cities on the environment and human health, or to calculate important indicators for achieving the Sustainable Development Goals (SDGs), such as 11.6.2, “Annual mean levels of fine particulate matter in cities”.
The National Research Foundation of Ukraine project aims to create information technology for air quality monitoring in Ukraine based on satellite and in situ data, and to develop an Urban Atlas for Ukrainian cities. In the future, this will enable the effective use of new urban planning tools. The impact of cities on environmental quality is currently very large, and assessing this impact makes it possible to reduce the negative effects of human activities.
2. Methods
The methodology for calculating Sustainable Development Goal Indicator 11.6.2 is based on the methodology proposed by the UN and used within the SMURBS project [15]. The main goal of SMURBS was to collect best practices to promote the concept of the “smart city” and apply it to as many cities as possible, integrating satellite observations to increase the resilience of the environment and society to urban impacts. These European activities brought together a large consortium of experts from around the world, who were divided among the project’s three main areas: air quality, urban growth, and natural or human-made disasters and their consequences. The countries represented in this consortium were Greece, Italy, France, Spain, Germany, Switzerland, Slovenia, the Czech Republic, Romania, and Finland, as well as Ukraine, represented by the Space Research Institute of the National Academy of Sciences of Ukraine and the State Space Agency of Ukraine [16, 17].
The SMURBS countries that developed air quality monitoring systems for pilot cities produced 22 technical solutions, which are currently at different stages of development depending on the country, the city, and the data available for the pilot area. Typically, all participants use the same types of input for their research and air quality monitoring in pilot cities: data from portable sensor ground stations, satellite data, and CAMS model data [18] (or other model data local to the pilot territories). The National Observatory of Athens has developed a platform for monitoring indicator 11.6.2 for 800 cities in 37 European countries. Ground data are used as input, on the basis of which the models are extended over a larger area. The most common indicator for assessing sustainable development in terms of air quality is 11.6.2, the population-weighted annual mean concentration of PM2.5 in functional urban areas (FUAs) and global urban centers (GUCs). It is estimated using advanced modeling that integrates satellite monitoring data, population estimates, topography, and ground-based measurements [19]. An FUA consists of a densely inhabited city and a less densely populated commuting zone whose labor market is highly integrated with the city.
There are two main methodologies for calculating indicator 11.6.2 (the UN methodology and the methodology developed within the SMURBS project), which are based on different datasets. Under both methodologies, all air quality indicators are calculated within FUAs and GUCs. The general calculation equation is the same for the two methodologies and takes the form of a population-weighted mean:

$$\mathrm{SDG}_{11.6.2} = \frac{\sum_{n} C_n P_n}{\sum_{n} P_n} \qquad (1)$$

where $C_n$ represents the estimated annual mean PM2.5 concentration for the n-th FUA or GUC, and $P_n$ represents its population. This equation is used for countries with available ground-based PM2.5 measurements.
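Equation (1) is a population-weighted mean: each urban area’s annual mean PM2.5 concentration C_n is weighted by its population P_n, and the result is Σ C_n P_n / Σ P_n. A minimal Python sketch of that aggregation, assuming the per-area concentrations and populations are already available as parallel lists (the function name is hypothetical):

```python
def sdg_11_6_2(concentrations, populations):
    """Population-weighted mean annual PM2.5 over urban areas.

    concentrations: annual mean PM2.5 (µg/m³) for each FUA or GUC, C_n
    populations:    population of the same FUA or GUC, P_n
    """
    if len(concentrations) != len(populations):
        raise ValueError("one population value is needed per urban area")
    total_population = sum(populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    # Numerator of the weighted mean: sum of C_n * P_n
    weighted = sum(c * p for c, p in zip(concentrations, populations))
    return weighted / total_population
```

With two FUAs at 10 and 20 µg/m³ and populations of 100 and 300 thousand, the indicator is (10·100 + 20·300)/400 = 17.5 µg/m³, pulled towards the value of the more populous area.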
As Equation (1) shows, the main inputs for calculating SDG indicator 11.6.2 are PM2.5 concentrations measured by ground stations in FUAs and GUCs, and statistics on urban populations. The main problem is the limited quality and availability of the required data. For Ukraine in particular, not enough of the data needed for air quality monitoring are available. Therefore, where the necessary data are lacking, alternative sources must be used, in particular the corresponding satellite data.
Many of the methodologies developed within the SMURBS project use CAMS air quality model data, which have a coarse spatial resolution of 11 km. In this article, following the SMURBS methodology, SDG indicator 11.6.2 is calculated for Ukraine for the years 2014–2018.
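When CAMS model fields replace ground measurements, the concentration C_n for each FUA has to be derived from gridded data. The sketch below shows one way to do this, under assumptions that the text does not spell out: the daily CAMS PM2.5 fields, a population grid (for example GHSL) and a raster of FUA labels have already been resampled to the same grid, and within each FUA the annual mean is weighted by the population of each grid cell. The helper `fua_annual_means` and its input layout are hypothetical.

```python
import numpy as np


def fua_annual_means(daily_pm25, fua_labels, population):
    """Population-weighted annual mean PM2.5 for each FUA (hypothetical helper).

    daily_pm25: array (days, ny, nx) of daily PM2.5 in µg/m³
    fua_labels: int array (ny, nx); 0 = outside any FUA, k > 0 = FUA id
    population: array (ny, nx) of residents per grid cell, same grid
    """
    # Annual mean per grid cell, ignoring missing days
    annual = np.nanmean(daily_pm25, axis=0)
    result = {}
    for fua_id in np.unique(fua_labels):
        if fua_id == 0:
            continue  # cells outside every FUA are not used
        mask = fua_labels == fua_id
        weights = population[mask]
        # Weight each cell's annual mean by the people living in it
        result[int(fua_id)] = float(np.sum(annual[mask] * weights) / np.sum(weights))
    return result
```

The per-FUA values returned here could then be combined with FUA populations through Equation (1) to give the national indicator.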
4. Results
4.1. CAMS Air Quality Data Validation for Ukraine
The insufficient number of ground-based stations makes it necessary to look for additional data that allow air quality to be analyzed across the whole country. In our case, these are CAMS model data, which provide hourly and daily air quality data with a spatial resolution of 11 km. The Royal Netherlands Meteorological Institute (KNMI) validates the different CAMS modelling algorithms against ground stations for different parts of Europe every day, and the root mean square error for PM2.5 generally does not exceed 10 µg/m³ [35]. Unfortunately, such validation is not performed for the territory of Ukraine. In this study, CAMS data were validated using the city of Kyiv as an example.
The left-hand panel of Figure 9 presents the coverage of the territory of Ukraine by CAMS data for 18 April 2020, during a period of active forest fires in the Zhytomyr region and the Chornobyl zone. Active fires, detected with our own methodology from high-spatial-resolution Sentinel-2 data of the European Copernicus program, are marked in red [36]. As the right-hand panel of Figure 9 shows, one CAMS pixel contains many ground air quality measurement posts. This is why aggregating ground and satellite data, validating satellite data, and, in the future, increasing the spatial resolution of satellite data using deep learning algorithms are important tasks.
Before validating the satellite data, we need to understand the reliability of the data to be analyzed. In particular, the first step is to identify and eliminate outliers in the data. These outliers can be caused by several factors: failure of the ground station (errors in determining air pollution indicators), and high sensitivity of the sensor to sudden environmental changes (a sharp deterioration of air quality followed by rapid recovery). Such outliers were detected and eliminated from the dataset using a boxplot (box-and-whisker diagram) [37]. Note also that we validate rather coarse global model data (with a spatial resolution of 11 km) against ground data, so sharp peaks in the ground data physically cannot be reproduced by the global data.
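The box-plot screening can be expressed with Tukey’s fences, which treat as outliers all points below Q1 − k·IQR or above Q3 + k·IQR, where Q1 and Q3 are the quartiles and IQR = Q3 − Q1. The text does not give the whisker length used, so the conventional k = 1.5 is an assumption here, as is applying the rule separately to each station’s series; the function name is hypothetical.

```python
import numpy as np


def remove_boxplot_outliers(values, k=1.5):
    """Drop points outside Tukey's box-plot fences [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # ignore missing readings
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Keep only the points inside the whiskers
    return x[(x >= lower) & (x <= upper)]
```

A single spike of 100 µg/m³ in an otherwise steady series of 10 to 14 µg/m³ lies far above the upper fence (17.5 µg/m³ for that series) and is removed, while the remaining readings are kept.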
The next step in data analysis is to compare the outlier-cleaned data with the CAMS model data. To do this, the ground data were aggregated to match the CAMS data by calculating hourly mean PM2.5 values for each station.
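Aggregating raw sensor readings to hourly station means, so that they line up with the hourly CAMS fields, can be sketched with pandas as follows. The column names (`station`, `time`, `pm25`) and the function name are hypothetical, and the timestamps are assumed to already share the time reference of the CAMS data.

```python
import pandas as pd


def hourly_station_means(readings):
    """Average raw PM2.5 readings into hourly means per station.

    readings: DataFrame with columns 'station', 'time' (datetime64), 'pm25'
    Returns a Series indexed by (station, hour).
    """
    df = readings.copy()
    # Assign each reading to the start of its clock hour
    df["hour"] = df["time"].dt.floor("60min")
    return df.groupby(["station", "hour"])["pm25"].mean()
```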
Two commonly used statistical metrics, the correlation coefficient (R) and the root mean squared error (RMSE), were calculated between spatially and temporally co-located ground PM2.5 measurements and CAMS PM2.5 to quantitatively evaluate the accuracy and uncertainty of the latter. These metrics are given by Equations (2) and (3):

$$R = \frac{\sum_{i=1}^{n}(o_i - \bar{o})(p_i - \bar{p})}{\sqrt{\sum_{i=1}^{n}(o_i - \bar{o})^2}\,\sqrt{\sum_{i=1}^{n}(p_i - \bar{p})^2}} \qquad (2)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(p_i - o_i)^2} \qquad (3)$$

where $o_i$ denotes the ground-based PM2.5 measurements and $p_i$ the CAMS PM2.5 data, $\bar{o}$ and $\bar{p}$ are the arithmetic means of the observed and forecasted PM2.5 concentrations, respectively, and $n$ denotes the number of data pairs.
Since we are interested in how well CAMS data predict or model ground observations, we also consider the coefficient of determination R² between ground data and CAMS data, given by Equation (4) (here the ground data are the quantity being predicted):

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(o_i - \hat{o}_i)^2}{\sum_{i=1}^{n}(o_i - \bar{o})^2} \qquad (4)$$

where $\hat{o}_i$ is the air quality value simulated by CAMS for the i-th observation.
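The three validation metrics can be computed directly from paired arrays of ground observations o_i and CAMS values p_i: Pearson’s R as the normalized cross-product of deviations from the means, RMSE as the square root of the mean squared difference, and R² as one minus the ratio of the residual sum of squares to the total sum of squares of the observations. In this Python sketch, the CAMS value is used directly as the prediction of the ground value in R² (no regression is fitted); that reading of Equation (4) is an assumption, and the function name is hypothetical.

```python
import numpy as np


def validation_metrics(observed, modelled):
    """Return (R, RMSE, R2) for co-located ground (o_i) and CAMS (p_i) PM2.5."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(modelled, dtype=float)
    do, dp = o - o.mean(), p - p.mean()
    # Equation (2): Pearson correlation coefficient
    r = np.sum(do * dp) / np.sqrt(np.sum(do ** 2) * np.sum(dp ** 2))
    # Equation (3): root mean squared error
    rmse = np.sqrt(np.mean((p - o) ** 2))
    # Equation (4): coefficient of determination, CAMS values as predictions
    r2 = 1.0 - np.sum((o - p) ** 2) / np.sum(do ** 2)
    return float(r), float(rmse), float(r2)
```

A constant offset shows why R² can differ from the square of R under this definition: for ground values [1, 2, 3] and CAMS values [2, 3, 4], R is 1 but R² is −0.5, because R² penalizes the bias while R ignores it.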
After the outliers in the ground datasets were identified, the statistical indicators (R, R² and RMSE) were calculated; they are presented in Figure 10. For 107 stations, the correlation between ground data and CAMS exceeds 0.5, which indicates sufficiently high quality of the CAMS model data. For 85 stations, the RMSE does not exceed 15 µg/m³, and for 33 stations, it does not exceed 10 µg/m³.
Figure 11 shows a comparison of daily data from all ground stations for 2019–2020 (values in µg/m³). The coefficient of determination is not very high (0.36–0.38) but is stable for both years. The correlation coefficient is higher (0.5–0.6), which allows us to conclude that CAMS data can be used for Kyiv as a reliable data source in analytical tasks. When the stations that are statistical outliers (their values differ strongly from those of neighboring stations) were excluded (Figure 11b), the correlation coefficient and R² increased and the RMSE decreased for both years: in 2019, R² increased from 0.38 to 0.48, the correlation from 0.6 to 0.61, and the RMSE decreased from 10.7 to 10.1 µg/m³; in 2020, R² increased from 0.36 to 0.38, the correlation from 0.5 to 0.54, and the RMSE decreased from 10.5 to 9.7 µg/m³.
Additionally, the dependence plot shows that PM2.5 values from ground stations mostly exceed the CAMS values. A plausible explanation is the coarse spatial resolution of CAMS data, which smooths out the local variability of pollutant concentrations; increasing the resolution should therefore improve the accuracy of CAMS products. A great advantage of satellite and model data for air quality monitoring is the continuous coverage of the study area, which cannot be guaranteed by ground stations alone. In conclusion, CAMS data are reliable global data that can be used in scientific and practical research for the territory of Ukraine.
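The observation that ground PM2.5 mostly exceeds CAMS can be quantified with the mean bias (ground minus CAMS) and the fraction of co-located pairs in which the ground value is larger. This small Python sketch is an illustrative addition rather than a metric reported in the text, and the function name is hypothetical; a positive mean bias and a fraction above 0.5 would indicate that CAMS underestimates the ground measurements.

```python
import numpy as np


def bias_summary(observed, modelled):
    """Mean bias (ground minus CAMS) and share of pairs where ground > CAMS."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(modelled, dtype=float)
    diff = o - p
    # Mean of the signed differences, and fraction of positive differences
    return float(diff.mean()), float(np.mean(diff > 0))
```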
Figure 12 shows graphs comparing different ground stations within a single CAMS cell. Red shows CAMS data, green shows data from a specific ground station located within the investigated cell, and blue shows the average PM2.5 values for all ground stations within this cell. The characteristic jump on days 100–110 of the year (April 2020) is associated with a sharp deterioration in air quality in Kyiv due to fires in the Zhytomyr region and the Chornobyl zone. Stations outlined in red are defined as anomalies and are removed from the analysis because their values are much larger than the average for all stations within the given cell.
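Grouping stations by CAMS cell and flagging those far above their neighbors can be sketched as follows. The text does not state the threshold used, so this Python sketch assumes a hypothetical ratio of 1.5 against the mean of the other stations in the same cell, and a 0.1° grid whose cell edges lie on multiples of 0.1° (roughly the 11 km cells mentioned above); the actual CAMS grid origin should be checked before reuse, and both function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def cams_cell(lat, lon, res_deg=0.1):
    """Index of the grid cell containing a point (assumed grid aligned at 0)."""
    return (math.floor(lat / res_deg), math.floor(lon / res_deg))


def flag_anomalous_stations(stations, ratio=1.5, res_deg=0.1):
    """Flag stations whose mean PM2.5 exceeds `ratio` x the mean of the
    other stations in the same cell.

    stations: dict name -> (lat, lon, mean_pm25)
    """
    cells = defaultdict(list)
    for name, (lat, lon, value) in stations.items():
        cells[cams_cell(lat, lon, res_deg)].append((name, value))
    flagged = set()
    for members in cells.values():
        if len(members) < 2:
            continue  # a lone station has nothing to be compared with
        for name, value in members:
            others = [v for n, v in members if n != name]
            if value > ratio * mean(others):
                flagged.add(name)
    return flagged
```

Comparing each station with the mean of the others, rather than with a cell mean that includes it, keeps a single extreme station from inflating its own reference value.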
4.2. Population Data GHSL and NASA Validation for Ukraine
In this work, FUAs were defined for Ukraine, where they had not previously been defined. This raised the question of the availability and accessibility of population data within the resulting FUAs, which matters because population data are necessary for calculating SDG indicator 11.6.2. Global products can help here, namely the population data from the Global Human Settlement Layer (GHSL) and the global population data from NASA. The first question before using these data for the territory of Ukraine is therefore their verification. To validate GHSL and NASA population data, official population statistics for 2013–2020 were used for the 20 largest cities in Ukraine [38]. A separate validation was conducted for the districts of Kyiv, for which population statistics are available for 2007–2020 [39].
Pearson’s correlation coefficient was calculated to verify the relationship between satellite-derived population data and statistics (Table 1, Figure 13). For Kyiv districts, the GHSL correlation coefficient is close to 0.77, indicating a strong positive linear relationship between GHSL data and statistics. NASA data for Kyiv districts show a correlation coefficient close to 0.52, indicating a positive linear relationship of moderate strength.
During the analysis, the absolute and relative deviations of GHSL and NASA data from the statistical data were calculated. The maximum relative error for GHSL is roughly 33%, in the city of Kherson, while the minimum error is less than 1%, in the city of Nikolaev. In most cities, the error does not exceed 10%. At the level of Kyiv districts, the relative deviation is mostly 2–10%. Deviations of more than 10% are observed in the Desnyansky and Obolonsky districts, and a particularly large deviation (about 50%) was recorded in the Holosiivskyi district.
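The absolute and relative deviations used to compare GHSL and NASA population estimates with official statistics can be written as a small helper. This Python sketch assumes the gridded population has already been summed over each city or district boundary; the sign convention (estimate minus official figure) and the function name are assumptions.

```python
def deviations(estimated, official):
    """Absolute and relative (%) deviation of a population estimate.

    estimated: population from a gridded product, summed over the boundary
    official:  population from official statistics (must be non-zero)
    """
    absolute = estimated - official
    # Relative deviation as a percentage of the official figure
    relative = abs(absolute) * 100.0 / official
    return absolute, relative
```

For example, an estimate of 67,000 residents against an official figure of 100,000 gives an absolute deviation of −33,000 and a relative deviation of 33%.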
For NASA data, the situation is worse owing to the lower spatial resolution of the product. The largest relative error, about 80%, is recorded in the city of Kherson, and for other cities the error exceeds 40%. At the level of Kyiv districts, deviations of about 50% are observed, and in the Holosiivskyi district they exceed 100%. However, when the data are evaluated for the whole city rather than by district, the deviations decrease significantly, in some years falling below 7% for NASA data.
The analysis shows a clear positive linear relationship between GHSL and NASA product data and population statistics, so these products can be used to determine SDG indicator 11.6.2 for Ukraine.
4.3. SDG Indicator 11.6.2
Figure 14 shows the obtained values of indicator 11.6.2 for Ukraine for FUAs and GUCs using different population datasets. As Figure 14 shows, the indicator declined significantly in 2017, reflecting an improvement in air quality in the country as a whole. The chart also shows that, at the national level, indicator 11.6.2 changes little depending on the population data provider. This again confirms that global population data can be used to calculate this SDG indicator.
A comparison with all European countries that reported values of indicator 11.6.2 shows that in 2017 all of them had better air quality than in any other year considered.
Figure 15 and Figure 16 show the results of evaluating indicator 11.6.2 for Ukraine over the last 5 years in comparison with other European countries that calculate this indicator centrally (for FUAs and GUCs); no data for this SDG indicator are available for Europe for 2019 and 2020.
Figure 17 compares the values of indicator 11.6.2 obtained at the national level for Ukraine with those of other European countries in 2018.
It should be mentioned that the air quality monitoring product is based exclusively on open data sources.
Land cover maps have great potential for the analysis of urban areas, as they provide a reliable set of data that can be used for the planning and evaluation of various indicators of sustainable development, such as the indicator for achieving Sustainable Development Goal 11.6.2 and others.
5. Conclusions and Discussion
The results indicate that cities have a greater impact on air quality than suburban areas. This is shown in Figure 18, where, for European countries and for Ukraine, the value of the SDG 11.6.2 indicator for FUAs is lower than that for GUCs. As a further step in the analysis, the authors propose using land cover maps, produced with their own deep learning algorithm for classifying time series of satellite data [40, 41], within the FUAs. This will help to determine the contribution of cities and other land cover types to the air quality indicator.
The considered air quality monitoring methodologies are only beginning to be adapted and used for the territory of Ukraine, in particular for Kyiv. Open-access CAMS air quality model data are stable but require higher spatial resolution, because the product is too coarse for cities. In addition, ground-based monitoring with air quality sensors is only beginning to grow in large cities and has not yet expanded to other settlements and villages. This calls for technology that uses ground data in cities where they exist and builds modelled air pollution values from satellite data in cities where ground data are insufficient. An essential preliminary step is the validation of satellite data against ground data to understand the quality and accuracy of the data source.
To put the developed air quality product in context, it is compared with similar indicators for European cities. Unfortunately, Ukraine has no up-to-date database of city populations, which is an integral part of assessing the Sustainable Development Goal 11.6.2 indicator. Therefore, the global products of the Copernicus GHSL program and NASA are used, as in most EU countries. In addition to the PM2.5 data used in the UN and SMURBS methodologies for assessing indicator 11.6.2, all available data on air pollution (both ground and satellite) will be used in the future.
The technology for calculating indicator 11.6.2, “Annual mean levels of fine particulate matter (e.g., PM2.5 and PM10) in cities (population weighted)”, for air quality assessment was adapted for the territory of Ukraine. The initial workflow and relevant datasets were proposed by the United Nations and implemented within the ERA-PLANET/SMURBS project for the territories of the European Union. During the adaptation, all datasets unavailable for Ukraine were replaced with other available datasets, in particular the population datasets from NASA and Copernicus. The data from the available in situ air quality networks were analyzed for accuracy and applicability. The obtained results rest on accuracy estimates for the available population data and for satellite data, which can be used in place of inaccessible air quality data.
As a further development, the authors plan to reach an agreement on the use of Meteosat satellite data [42], which will be a significant step towards the digitalization of air quality monitoring in Ukraine. We hope that the digitalization process in Ukraine will provide great opportunities for access to additional sources of information that will allow air quality to be monitored at the highest level.