Contributing towards Representative PM Data Coverage by Utilizing Artificial Neural Networks

Abstract: Atmospheric aerosol particles have a significant impact on both the climatic conditions and human health, especially in densely populated urban areas, where the particle concentrations in several cases can be extremely threatening (increased anthropogenic emissions). Most large cities located in high-income countries have stations responsible for measuring particulate matter and various other parameters, collectively forming an operating monitoring network, which is essential for the purposes of environmental control. In the city of Athens, which is characterized by high population density and accumulates a large number of economic activities, the currently operating monitoring network is responsible, among others, for PM10 and PM2.5 measurements. The need for satisfactory data availability, though, can be supported by using machine learning methods, such as artificial neural networks. The methodology presented in this study uses a neural network model to provide spatiotemporal estimations of PM10 and PM2.5 concentrations by utilizing the existing PM data in combination with other climatic parameters that affect them. The overall performance of the predictive neural network models’ scheme is enhanced when meteorological parameters (wind speed and temperature) are included in the training process, lowering the error values of the predicted versus the observed time series’ concentrations. Furthermore, this work includes the calculation of the contribution of each predictor, in order to provide a clearer understanding of the relationship between the model’s output and input. The results of this procedure showcase that all PM input stations’ concentrations have an important impact on the estimations. Considering the meteorological variables, the results for PM2.5 seem to be affected more than those for PM10, although when examining PM10 and PM2.5 individually, the wind speed and temperature contribution is on a similar level with the corresponding contribution of the available PM concentrations of the neighbouring stations.


Introduction
Advances in the field of air quality estimations have been rapid, particularly during the last few decades, demonstrating an increasing interest and attention in both the research community and the authorities responsible for the impact assessment of air quality in modern communities. Many cities worldwide are struggling with poor air quality conditions and subsequently with increased mortality and hospital admission rates, mainly due to cardiovascular and respiratory illnesses [1,2]. This is mostly evident for cities with limited access to clean energy, resulting in an increased need for electric power generation and oil/gas extraction, both procedures responsible for emissions amplification, thus citing air pollution levels as an indicator of sustainable development goals [3]. However, all modern socioeconomic centers, where the majority of human activities transpire, need to carefully monitor and evaluate outdoor and indoor pollutants [4] which, in combination with global climate change (global warming), can lead to higher mortality rates [5]. The connection how both PM concentrations and meteorological values can support PM concentrations' point estimations. Specifically, a Feed-Forward Neural Network (FFNN) approach was used to make spatial point estimations of PM10 and PM2.5 concentrations, aiming to develop a simple yet effective scheme that can provide representative PM datasets for stations with data gaps or expand the available data. The spatial estimation of PM concentrations by using FFNNs and data from neighboring stations has been performed successfully before, when compared with other schemes [38]. However, this study additionally evaluates the incorporation of crucial meteorological parameters, such as the surface temperature and wind speed, and how these additions affect the performance of the networks.
The methodology utilizes data from ground-based observations, obtained from monitoring stations located in the city of Athens, Greece, which is a densely populated metropolitan area, characterized by regional variability considering the type of each subsidiary area that is part of the city. Furthermore, an important part of the presented methodology is to provide a way of understanding the contribution of each model input to the output by utilizing the algorithm proposed by Garson [50]. This approach can contribute to addressing the lack of explanatory power, which is a common problem associated with ANNs, and can provide insight into the structure of the function being approximated, which associates input and output parameters.

Data
The area of study is metropolitan Athens, which is part of the Attica region in Greece. Important characteristics of the functional urban area of Athens are the considerably high population density, the complexity of its meteorological and geophysical features and the agglomeration of the majority of economic activities in Greece, which are associated with various PM pollution sources (vehicular traffic, domestic fuel burning, natural dust and salt, industrial activities, etc.). The Athens basin is defined by four major mountain ranges. These are Mounts Parnitha, Pentelikon, Hymettus and Aigaleo, which are natural borders at the north, northeast, east central and west, respectively, and they affect the air pollutants' dispersion and transportation mechanisms. Additionally, the city lies on the north coast of the Saronic Gulf and the west coast of the Euboean Gulf and thus is affected by the sea breeze and other flows. Consequently, the complex topography and fluctuation of climatic conditions are associated with complex PM10 and PM2.5 concentration profiles, characterized by spatial variability even for stations in close proximity [51,52]. The area of study and the locations of each monitoring station are presented in Figure 1.
The importance of the area, considering the PM pollution fields, over the last few years is additionally connected to the post-2010 time period and the economic crisis that affected Greece, during which particle concentrations in all major cities, and especially Athens, increased significantly due to the extensive residential burning of low-cost biomass as an alternative source of fuel for heating [53,54].
For this study, PM10 and PM2.5 hourly data were obtained from the air quality monitoring network operated by the Hellenic Ministry of the Environment, Energy and Climate Change (MEE), which has operated in the Attica region since 1984. More specifically, data from nine (AGP, ARI, ELE, THR, KOR, LYK, MAR, PIR and PER) and six stations (AGP, ARI, ELE, THR, LYK and PIR) for PM10 (µg/m³) and PM2.5 (µg/m³), respectively, were used in order to create the PM database, and they are presented in Table 1. The selection of the stations was mostly based on data availability in each case. Additionally, daily data for two meteorological parameters (wind speed in km/h and temperature in °C) were obtained for the target station (AGP) from the NOAAN (National Observatory of Athens Automated Network) network of automatic weather stations of the National Observatory of Athens (NOA). The methodology used in this study could be applied to a different target station in the area. However, the AGP station was selected among the six common stations of PM10 and PM2.5 due to the availability of temperature (T) and wind speed (WS) values, which can help in better supporting the methodology. Ultimately, the analysis covers a three-year time period (2016-2018) for both the PM and the meteorological parameters. Figure 2 depicts the average monthly evolution of both PM10 and PM2.5 for the 2016 to 2018 time period at the AGP station. All three years' monthly averaged concentrations for each pollutant are presented in the same diagram for comparative purposes.

Methodology
Initially, as mentioned above, the AGP station was selected as a target station for which all the steps of the methodology were performed. This station had high percentages of data availability (>90%) for both pollutants and for all three years (2016-2018), which was important for the evaluation of the results. Accordingly, yearly averaged, maximum and minimum concentrations for PM10 and PM2.5 were calculated at AGP. These descriptive statistics are helpful during the discussion of the results and act as an initial description of the 2016-2018 PM conditions for this specific location. The machine learning scheme selected for this study is an FFNN model designed for spatial point interpolation. According to Hornik et al., this type of architecture can effectively simulate the relationship between input and output to various degrees of accuracy, based on several parameters that are part of the network's structure (Figure 3) [55].
The FFNN is a multilayer perceptron and the information flow follows one direction, advancing from the input to the output without looping [56]. The equation through which the output of a neuron can be calculated is the following:

$$y = f\left(\sum_{i} w_i x_i + b\right) \quad (1)$$

where $y$ is the output of the neuron, $f$ is the activation function, $x_i$ the inputs, $w_i$ the synaptic weights and $b$ the bias. The synaptic weights are the internal connections among the neurons of the network (Figure 3), and through adjustments of their values, the strength of the connections is modified [57]. The PM10 and PM2.5 concentrations were estimated by using AGP as a target station and the remaining stations' concentrations as inputs. The number of input stations is different for each pollutant (eight and five for PM10 and PM2.5, respectively). Three stations for PM10 are not available for PM2.5 due to limited data, and they were excluded. Additionally, the daily temperature and wind speed values at AGP were used as predictors in the model. Four different models were developed in order to compare their performance. For the first model, the predictors were only the data of the input stations. For the second, third and fourth model, the number of predictors/inputs increased by adding the temperature values, the wind speed values and both wind speed and temperature values, respectively. In all four models, the output was the AGP PM concentrations. Eventually, eight models were created in total (four for PM10 and four for PM2.5). The aim of this additive process was to investigate how much the meteorological parameters affect the accuracy of the estimations. Initially, the datasets (PM10, PM2.5, Temperature and Wind Speed) were randomly divided into the training (70%), validation (15%) and test (15%) subsets. While the pollutant and meteorological data points for these datasets were randomly selected from the 2016-2018 time period, they were common for all inputs of each individual network development.
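The neuron output and the one-directional information flow described above can be illustrated with a minimal sketch. The tanh hidden activation, the linear output neuron, the layer sizes and all numeric values are assumptions made for illustration only; the text does not state which activation functions or weights were used.

```python
import math


def neuron_output(inputs, weights, bias, activation=math.tanh):
    """Return f(sum_i w_i * x_i + b) for a single neuron."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


def ffnn_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass through one hidden layer and a single linear output neuron."""
    hidden = [
        neuron_output(inputs, weights, bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # Assumed linear output activation, a common choice for regression.
    return neuron_output(hidden, output_weights, output_bias, activation=lambda z: z)


# Hypothetical, already-scaled input vector: five station concentrations plus T and WS.
x = [0.2, -0.1, 0.4, 0.0, 0.3, 0.5, -0.2]
hidden_w = [[0.1] * 7, [-0.2] * 7]  # two hidden neurons, illustrative weights
hidden_b = [0.0, 0.1]
estimate = ffnn_forward(x, hidden_w, hidden_b, output_weights=[0.5, -0.5], output_bias=0.05)
```

Switching between the four input scenarios only changes the length of the input vector (stations only, stations plus T, stations plus WS, stations plus both); the forward pass itself is unchanged.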
When the network used a data point for a random hour from a monitoring station, the same hour was selected for the remaining stations. This procedure was followed so as to retain the daily variability and avoid mixing seasons and even days due to the short-term fluctuations of the PM concentrations. The next step involved the selection of the optimum number of neurons in the hidden layer. The FFNN consists of three layers, the input, hidden and output layers [58]. The number of neurons in the input and output layers is completely determined by the inputs and outputs. The hidden layer size is an important part of the network architecture. In order to select the optimum architecture, the criterion that was followed was the minimization of the Mean Absolute Error (MAE) on the validation subset [59]. Lower MAE values correspond to a better performing network in relation to a lower degree of complexity. Because the initial weights of the neurons are randomly established, each FFNN configuration was tested over multiple runs (10 repetitions), and the results of these runs were averaged to account for the randomness of the process. The number of hidden neurons tested in all cases ranged from one to forty. To avoid pattern exploitation in the training subset (overfitting), the early stopping approach was used [60], which stopped the training process when the validation subset error started to increase. The final networks that were developed were evaluated for their estimation accuracy by applying two statistical measures of difference and correlation, the MAE and the coefficient of determination (R²) [38,59,61,62], on the test subset of the output vector. These criteria are calculated by using the following equations:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| E_i - A_i \right| \quad (2)$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( A_i - E_i \right)^2}{\sum_{i=1}^{n} \left( A_i - \bar{A} \right)^2} \quad (3)$$

where $n$ is the number of data points, $E$ the estimated and $A$ the observed concentrations, and $\bar{A}$ the mean of the observed concentrations.
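The two evaluation measures can be computed as in the short sketch below. It assumes the common 1 - SS_res/SS_tot form of the coefficient of determination (the squared correlation coefficient is another convention in use), and the concentration values are made-up numbers, not data from this study.

```python
def mean_absolute_error(estimated, observed):
    """MAE: average absolute difference between estimated and observed values."""
    errors = [abs(e - a) for e, a in zip(estimated, observed)]
    return sum(errors) / len(errors)


def r_squared(estimated, observed):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((a - e) ** 2 for e, a in zip(estimated, observed))
    ss_tot = sum((a - mean_observed) ** 2 for a in observed)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (µg/m³).
observed = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 39.0]

mae = mean_absolute_error(estimated, observed)  # absolute errors 2, 2, 3, 1
r2 = r_squared(estimated, observed)             # SS_res = 18, SS_tot = 500
```

Because the MAE is expressed in the units of the data while R² is dimensionless, the two measures complement each other when models for different pollutants are compared.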
The best-performing models are associated with lower MAE and higher R² values and are evaluated based on the results of both statistical parameters. For the MAE metric, the standard deviation (SD) was also calculated to indicate the dispersion of the estimated concentrations from the MAE value. Additionally, the FFNN models' results were compared with the corresponding estimations of a multiple linear regression (MLR) model [38,63] in order to assess the predictive ability of the FFNNs against a linear baseline. Finally, the accuracy of the FFNN models is also examined by plotting scatter diagrams, which additionally allow an easier comparison among the models. The scatter diagrams provide information on the relationship between the observed and estimated values at high, medium and low concentration levels.
The last part of the methodology is an analysis of the significance of each input variable to the output, for all the FFNN models that were developed, utilizing an algorithm proposed by Garson [50]. This methodology is based on recognizing the associations that the synaptic weights reveal about the relationship between inputs and output, and it has also been used in other studies in the field of air quality to quantify the importance of each station's data (inputs) to the estimated values for the target station [38,63,64]. The Relative Importance (RI) percentage is calculated with the use of Equation (4), where $w_{ij}$ and $w_{kj}$ are the connection weights between the i-th input and j-th hidden neuron, and between the j-th hidden and k-th output neuron, respectively. In general, ANNs provide little explanatory insight into the individual contribution of the input variables in the estimation procedure. The RI method addresses this issue and can be used as a variable selection technique for similar problems.
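A minimal sketch of the weight-partitioning idea follows, for a network with one hidden layer and a single output. It implements one commonly quoted statement of Garson's algorithm, in which each hidden neuron's absolute output weight is shared among the inputs in proportion to their absolute input-hidden weights; this form, the function name and the toy weights are assumptions and may differ in detail from the expression used in this study.

```python
def garson_relative_importance(input_hidden, hidden_output):
    """Relative importance (%) of each input for a single-output network.

    input_hidden[i][j]: weight between input i and hidden neuron j.
    hidden_output[j]:   weight between hidden neuron j and the output neuron.
    """
    n_inputs = len(input_hidden)
    n_hidden = len(hidden_output)
    contributions = [0.0] * n_inputs
    for j in range(n_hidden):
        # Total absolute weight entering hidden neuron j.
        column_total = sum(abs(input_hidden[i][j]) for i in range(n_inputs))
        for i in range(n_inputs):
            share = abs(input_hidden[i][j]) / column_total
            contributions[i] += share * abs(hidden_output[j])
    total = sum(contributions)
    return [100.0 * c / total for c in contributions]


# Toy weights: two inputs, two hidden neurons (illustrative, not trained values).
weights_input_hidden = [[1.0, 1.0],
                        [1.0, 3.0]]
weights_hidden_output = [1.0, 1.0]
ri = garson_relative_importance(weights_input_hidden, weights_hidden_output)
# ri == [37.5, 62.5]
```

Since only absolute weight values are used, the resulting percentages describe the magnitude of each input's contribution and not its direction.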

Results and Discussion
As mentioned above, the results presented in this section are for the AGP station. Descriptive statistics for the 2016-2018 period at the AGP monitoring station are presented in Table 2. This table includes the mean, maximum and minimum concentrations for each year individually and the corresponding values for the three years in total. Both PM10 and PM2.5 are measured in µg/m³, and the monitoring methodology is based on beta radiation absorption. Table 3 includes the number of data points that were used for each subset during the development of the models (input data). There are more available data points for PM10 due to the increased number of monitoring stations that were used as inputs. In both pollutant cases, when the meteorological parameters are added, they qualify as an additional predictor that has the same number of data points as the input stations' concentrations. Thus, the scenario with the most inputs, i.e., where both WS and T are incorporated, has a higher number of data points available for the training, validation and test subsets. In all cases, the architecture of the ANNs, following the experimental design of this work, defines the number of data points included in the input and output vectors. The size of the training-validation-test subsets for the output vector is based on the 70-15-15 percentages which were introduced in the previous section, and the resulting data points are 10,339-2215-2215 for PM10 and 11,090-2376-2376 for PM2.5. The data points of the output vector are the same for all four scenarios, as the output is always the PM concentrations at AGP.
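The subset sizes quoted above can be reproduced with the sketch below. The totals (14,769 and 15,842) are simply the sums of the reported subset sizes; the rounding rule (truncate the validation and test counts, assign the remainder to training), the seed and the function names are assumptions that happen to match the reported numbers. The shuffled time indices are drawn once and reused for every input and for the output, in line with the requirement that the same hour is selected for all stations.

```python
import random


def split_sizes(n_points, val_frac=0.15, test_frac=0.15):
    """Return (train, validation, test) sizes; the remainder goes to training."""
    n_val = int(n_points * val_frac)
    n_test = int(n_points * test_frac)
    return n_points - n_val - n_test, n_val, n_test


def shared_random_split(n_points, seed=0):
    """Shuffle the time indices once so every series uses the same hours per subset."""
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)
    n_train, n_val, _ = split_sizes(n_points)
    train = indices[:n_train]
    validation = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, validation, test


pm10_sizes = split_sizes(14769)  # (10339, 2215, 2215)
pm25_sizes = split_sizes(15842)  # (11090, 2376, 2376)
train_idx, val_idx, test_idx = shared_random_split(14769)
```

Applying `train_idx`, `val_idx` and `test_idx` to each station series and to the T and WS series keeps every input aligned on the same time stamps within a subset.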
The architecture of the models is presented in Table 4. The number of inputs is the total number of predictor stations, and one (T or WS) or two (T and WS) meteorological inputs are added according to the model used in the second, third and fourth row of the table. It is evident that the number of hidden neurons in the models for both PMs is lower when the meteorological data are not included in the inputs (16 and 13 hidden neurons for PM10 and PM2.5, respectively). The same number ranges from 26 to 30 for the remaining six schemes. This difference can be associated with the increased complexity of these networks. As more inputs with different characteristics are added to the network, the latter needs additional hidden neurons to simulate the relationship between input and target data.

Tables 5 and 6 show the MAE, SD of MAE and R² values for each of the eight models and the corresponding cases for the MLR method. These values are the result of applying the abovementioned metrics on the test subset of the output vector. When comparing the two methodologies, it is evident that the FFNN models outperform the MLR scheme for both PM10 and PM2.5 and all input scenarios. They display lower MAE and SD of MAE and higher R² values, indicating that the FFNN methodology simulates the nonlinear relationship between the input and output parameters more effectively. Considering individually the results of the FFNN method, Table 5 provides some interesting findings. In general, the PM10 models are associated with low error and high correlation values, providing satisfactory results regardless of the input data that were used. The PM2.5 models' results showcase higher MAE values (considering that the MAE is higher when compared with the average PM2.5 values) and lower R² values (which can possibly be attributed to the lower number of input data, and subsequently, less information during the training process).
However, on average, in both cases, the schemes that include T and WS values give lower MAE and higher R² values. This is evident especially for the two models that include both T and WS, where the lowest MAE (3.67 µg/m³ and 2.39 µg/m³) and highest R² (0.94 and 0.75) values are produced. Although the average MAE for the PM10 models (3.85 µg/m³) is higher than the corresponding value for PM2.5 (2.44 µg/m³), the MAE is expressed in the same units as the measured data and is therefore not suitable for comparing PM10 with PM2.5, in contrast to R², which indicates better results for the PM10 cases. A conclusion of significant importance can be drawn by comparing the MAE values with the yearly mean, maximum and minimum concentrations, which are presented in Table 2. While the FFNNs that include the surface temperature and wind speed as predictors correspond to better performance statistics (lower MAE and higher R²), the differences between the models are small relative to the Table 2 values. This fact illustrates the effectiveness, in this case, of the models that use only concentrations from neighbouring stations. However, adding more parameters or changing the network configuration (i.e., selecting the subsets' data in chronological order and not randomly, using different approaches to avoid overfitting, etc.) can further improve the results. An additional evaluation of the FFNN models is performed by plotting scatter diagrams of the predicted versus the observed values, as presented in Figure 4. The scatter diagrams for PM10 and PM2.5 (Figure 4) are consistent with the MAE and R² performance statistics (Table 5). The degree of dispersion for PM10 (Figure 4a-d) is lower compared to PM2.5 (Figure 4e-h). This can be explained by the lower number of inputs provided in order to train the PM2.5 models.
Specifically, during the training process, the number of input stations is eight for PM10 and five for PM2.5, meaning that the air quality network density for the latter was lower. In contrast, there are no notable differences when the diagrams are compared based on the different inputs. According to the MAE values, when the models are studied separately for each pollutant, no scheme is substantially superior. However, a closer examination reveals that the models which include both meteorological parameters (T and WS) produce scatter diagrams with lower dispersion around the line of optimum agreement. This is especially evident for the higher concentration values (upper right) of the PM10 models, where the markers are closer to the diagonal.
Finally, the results of the Garson methodology are presented in Table 7. In the case of the PM10 models, the percentage of contribution of the meteorological parameters is nearly half that of the monitoring stations' concentrations. For PM2.5, the corresponding percentages are at a similar level (~15%). Additionally, the monitoring stations which are of the same type (Suburban/Background) as AGP (KOR, LYK and THR), and those which are in proximity (MAR, LYK and ARI), are expected to contribute more to the AGP concentration estimations. However, Table 7 reveals that all stations have a significant importance for the models.

Conclusions
This study used an FFNN application for estimating PM10 and PM2.5 concentrations. ANN approaches, in general, have the advantage of modelling nonlinear relationships effectively compared to other methodologies. An important aspect is the evaluation of the developed models under different scenarios of input parameters. In nearly all cases, the MAE values were lower and the R² values higher when the meteorological values were added during the training process. The models that showcased a better performance were those that had both T and WS as additional inputs, although no crucial differences were noticed among the schemes of the four different scenarios. Regarding the comparison between PM10 and PM2.5, the estimations for the latter had a higher degree of dispersion in the scatter diagrams of the observed versus the estimated values. This can be explained by the more limited information provided during training (more input stations for PM10). The Garson methodology results reveal that all monitoring stations in the Attica region which were involved in the FFNN development process are important for the PM estimations. Future work can extend this methodology to include more target stations with different characteristics and/or add more climate parameters. These additions, considering their impact and usefulness for the models, can be further analyzed and supported by applying suitable feature selection and feature ranking techniques [65-67]. Finally, ANN ensemble approaches [68] can be examined, aiming to reduce the variance of predictions and the generalization error by combining the results of multiple models.
Author Contributions: C.G.T. and A.A. were involved in the investigation, conceptualization, writing-original draft preparation and writing-review and editing of this work, while, individually, C.G.T. was responsible for the data curation, validation of the results and supervised the whole procedure. Both C.G.T. and A.A. performed the various steps of the methodology, processed the data and developed the neural network models. Both authors were involved in the discussion of the results and commented on the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Data Availability Statement:
The air quality and meteorological datasets generated and/or analyzed during the current study are publicly available in the Ministry of Environment and Energy repository (ypen.gov.gr) (accessed on 30 June 2020) and the National Observatory of Athens repository (https://meteosearch.meteo.gr/) (accessed on 30 June 2020), respectively.

Conflicts of Interest:
The authors declare no conflict of interest.