Forecasting Particulate Pollution in an Urban Area: From Copernicus to Sub-Km Scale

Particulate air pollution aggravates cardiovascular and lung diseases. Accurate and continuous air quality forecasting at the local scale facilitates the control of air pollution and the design of effective strategies to limit air pollutant emissions. The Copernicus Atmosphere Monitoring Service (CAMS) provides 4-day-ahead regional (European) forecasts at 10 km spatial resolution, adding value to Copernicus Earth observation (EO) and delivering consistent, open-access air quality forecasts. In this work, we evaluate the CAMS PM forecasts at the local scale against in-situ measurements spanning 2 years, obtained from a network of stations in an urban coastal Mediterranean city in Greece. Moreover, we investigate the potential of modelling techniques to accurately forecast the spatiotemporal pattern of particulate pollution using only open data from CAMS and calibrated low-cost sensors. Specifically, we compare the performance of the Analog Ensemble (AnEn) technique and the Long Short-Term Memory (LSTM) network in forecasting PM2.5 and PM10 concentrations for the next four days, at 6 h increments, at the station level. The results show that CAMS underestimates PM2.5 and PM10 concentrations by a factor of 2 during winter, indicating a misrepresentation of anthropogenic particulate emissions such as wood burning, while it overestimates them in the other seasons. Both AnEn and LSTM provide bias-calibrated forecasts and adequately capture the spatial and temporal variations of the ground-level observations, reducing the RMSE of CAMS by roughly 50% for PM2.5 and 60% for PM10. AnEn marginally outperforms the LSTM in the annual verification statistics. The most pronounced difference in the predictive skill of the models occurs in winter, when PM levels are elevated and AnEn is significantly more efficient. Moreover, the predictive skill of AnEn degrades more slowly as the forecast lead time increases.
Both AnEn and LSTM prove to be reliable tools for air pollution forecasting and could be applied to other regions with small modifications.


Introduction
Air pollution is a pivotal global issue for health and the environment, affecting both the economy and social life. Rapid industrialization and urbanization have triggered an increase in cardiovascular and lung diseases attributable to air pollution [1]. Airborne particles with a diameter of 10 µm or less are among the air pollutants with adverse effects on public health, especially in urban areas. Particulate matter (PM) consists of a complex mixture of particles whose major components are sulfate, nitrates, ammonia, sodium chloride, black carbon, mineral dust and water [2]. Owing to their small size, both coarse particulate matter (PM10) and fine particles (PM2.5) can penetrate deep into the respiratory system, causing serious chronic health problems, including airway irritation, asthma, irregular heart rate and abnormal lung function [3,4]. Long-term exposure to high levels of PM is quantitatively associated with increased mortality and lung cancer [2]. Fine particles are small and light, which allows them to remain in the atmosphere for longer periods; they have also been associated with a 4 to 8% increase in the burden of cardiopulmonary diseases [5]. The sources of particulate quality sensors) grade monitoring stations due to the data variability or lower accuracy. However, calibrating them with the appropriate techniques can generate reliable data with increased spatial coverage [26][27][28].
In this study, we evaluate the CAMS PM forecasts at the local scale against in-situ measurements spanning 2 years, obtained from a dense network of calibrated low-cost air pollution stations in an urban coastal Mediterranean city in Greece. We then apply two fundamentally different statistical approaches to the operational CAMS forecasts to investigate their capacity to accurately map the pollution pattern at the local scale for different types of stations (urban, suburban, rural). Specifically, the AnEn and LSTM methods are trained with the 4-day-ahead operational PM10 and PM2.5 forecasts from CAMS, as well as open data from ground-based observations, for 2018 and tested with the corresponding datasets for 2019.

Investigation Area and Pollution Data
The examined urban area lies between Mount Panachaikon to the east and the Gulf of Patras to the west. It is a major urban center with one of the largest ports in Greece, an important channel of communication with Western Europe. The main sources of anthropogenic particulate matter are traffic (road and maritime) and indoor activities (including wood burning and cooking). According to the Köppen climate classification, the area has a Mediterranean climate with a moderate temperature range, warm dry summers and mild wet winters.
PM2.5 and PM10 concentrations (µg/m³) for the municipality of Patras were collected for the period January 2018 to December 2019. Hourly PM measurements are obtained from the Patras Air network, an air quality monitoring system operated by the Laboratory of Atmospheric Physics, University of Patras, which consists of low-cost sensors (Purple Air) measuring particulate matter concentrations at a dense network of stations installed in Patras. The sensors' performance has been evaluated and calibrations have been derived for Patras [27]. Stations with at least 75% annual data coverage are included in the study and classified according to their location and dominant emission source. The locations of the stations are illustrated in Figure 1 and their types are listed in Table 1. Traffic stations are located near roads, whereas suburban stations are installed to monitor pollution levels in areas less dominated by emissions. Background stations are located at some distance from the areas with the highest emissions and are influenced by the integrated contribution of many pollutant sources.
CAMS forecasts of PM2.5 and PM10 concentrations are obtained every six hours at 0.1° (~10 km) horizontal resolution (Figure 1). The forecasts are interpolated to the locations of the air quality monitoring stations using inverse distance weighting [29]. Because the weights decrease with distance, the closest grid points have the most influence on the interpolated value [30].
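To make the interpolation step concrete, here is a minimal Python sketch of inverse distance weighting. The source does not state the distance exponent or how many grid nodes are used, so the power p = 2, the four surrounding nodes and the planar distance approximation are assumptions; the function name and the grid values are hypothetical.

```python
import math

def idw_interpolate(points, target, power=2.0):
    """Inverse-distance-weighted estimate at a station location.

    points: iterable of (lat, lon, value) for the surrounding CAMS grid nodes
    target: (lat, lon) of the monitoring station
    power:  distance exponent (p = 2 assumed)
    """
    lat0, lon0 = target
    # Scale longitude differences by cos(latitude): a planar approximation
    # that is adequate over a single 0.1 degree grid cell
    coslat = math.cos(math.radians(lat0))
    num = den = 0.0
    for lat, lon, value in points:
        d = math.hypot((lon - lon0) * coslat, lat - lat0)
        if d == 0.0:
            return value  # station coincides with a grid node
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Hypothetical PM2.5 forecasts (ug/m3) at four grid nodes around a station
nodes = [(38.20, 21.70, 10.0), (38.20, 21.80, 20.0),
         (38.30, 21.70, 30.0), (38.30, 21.80, 40.0)]
print(round(idw_interpolate(nodes, (38.25, 21.75)), 2))  # equidistant -> plain mean
```

With equal distances the estimate collapses to the plain mean of the nodes; as the station moves towards one node, that node's value dominates.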
Besides emissions, air pollutant concentrations are affected by the atmospheric conditions prevailing in an area. Temperature affects fuel usage (emissions) and chemical transformations in the atmosphere, while precipitation leads to wet deposition and removal of air pollutants from the atmosphere [31]. Owing to hygroscopic growth, particles tend to absorb large amounts of moisture, potentially leading to dry deposition through gravitational settling [32][33][34]. High wind speeds contribute to the advection of PM, but in some cases the topography of an area can impede the transport of air pollutants to other regions [35]. Figure 2 illustrates the mean monthly values of temperature, rainfall, rainy days, humidity and wind in the study area during the study period, retrieved from the ERA5 reanalysis of ECMWF. The first half of 2018 is warmer and more humid than that of 2019, while the opposite is true for the second half. The number of rainy days (and total precipitation) in January 2019 is double that of January 2018, affecting the wet deposition of pollutants. In February and March the situation is reversed, with larger differences between the two years. The prevailing wind direction is from the west and east in the warm season (April to September), transporting marine aerosol particles into the study area, and from the east from October to March.

AnEn
Based on the current deterministic forecast, the AnEn technique selects the most similar historical forecasts from an archive of predictions issued by the same NWP model and uses the mean of their corresponding observations as the current AnEn forecast [22]. The probability distribution of the future observation y at a given time and location is P(y | x, f), where x and f are the repositories of past observed and predicted values, respectively.
The metric used to estimate the similarity between the current deterministic forecast and past predictions was proposed by Delle Monache et al. [16,19]:
$$\|F_t, A_t\| = \sum_{i=1}^{N_v} \frac{w_i}{\sigma_{f_i}} \sqrt{\sum_{j=-\tilde{t}}^{\tilde{t}} \left(F_{i,t+j} - A_{i,t+j}\right)^2}$$
where $F_t$ and $A_t$ are the current and analog (historical) NWP deterministic forecasts, respectively, at a given station for the same forecast lead time $t$; $N_v$ is the number of predictor variables and $w_i$ their assigned weights; $\sigma_{f_i}$ is the standard deviation of the past forecasts of predictor $i$ at the same station; $\tilde{t}$ is equal to half the width of the time window over which the metric is computed; and $F_{i,t+j}$, $A_{i,t+j}$ are the current and analog forecasts within this time window [19,36,37]. The metric ranks the similarity of past forecasts to the current one. For each lead time, the n most similar analogs are chosen from past dates within the training period, and their corresponding observations are the members used to generate the ensemble prediction. The optimum number of analogs is selected by computing the root mean square error (RMSE) as a function of the number of analogs on the training data set [37,38]; the appropriate ensemble size minimizes the RMSE between the observations and the analog ensemble predictions. Among all possible combinations of the $N_v$ predictor variables, the one leading to the lowest RMSE of the AnEn mean is chosen and kept constant over the testing period. Thus, for each station and forecast lead time, the optimal number of analog members and the most suitable combination of predictors are selected.
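The similarity metric and the analog selection can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the array layout (predictors by time window), the function names and the toy archive are all hypothetical.

```python
import numpy as np

def anen_distance(f_cur, f_past, sigma, weights):
    """Analog metric between the current and one past forecast at lead time t.

    f_cur, f_past: (Nv, W) arrays, predictor i over the window t-t~ .. t+t~
    sigma:   (Nv,) std of past forecasts of each predictor at the station
    weights: (Nv,) predictor weights w_i
    """
    window_sq = np.sum((f_cur - f_past) ** 2, axis=1)   # inner sum over j
    return float(np.sum(weights / sigma * np.sqrt(window_sq)))

def anen_forecast(f_cur, archive_fc, archive_obs, sigma, weights, n_analogs):
    """AnEn mean and members from the n closest historical forecasts.

    archive_fc:  (M, Nv, W) past forecasts for the same station and lead time
    archive_obs: (M,) observations verifying those past forecasts
    """
    dist = np.array([anen_distance(f_cur, a, sigma, weights) for a in archive_fc])
    best = np.argsort(dist, kind="stable")[:n_analogs]  # most similar first
    members = archive_obs[best]
    return members.mean(), members

# Toy example: one predictor (CAMS PM2.5), window of three time steps
current = np.array([[10.0, 12.0, 11.0]])
archive = np.array([[[10.0, 12.0, 11.0]],
                    [[30.0, 31.0, 29.0]],
                    [[11.0, 12.5, 11.0]]])
observed = np.array([15.0, 40.0, 17.0])
mean, members = anen_forecast(current, archive, observed,
                              sigma=np.array([5.0]), weights=np.array([1.0]),
                              n_analogs=2)
print(mean)  # 16.0: average of the observations behind analogs 0 and 2
```

Dividing by $\sigma_{f_i}$ puts predictors with different units (concentrations, calendar days) on a comparable scale before they are summed.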

LSTM
Air pollutant concentration at a specific time t is influenced not only by current conditions but also by values at previous times. Given this, a recurrent neural network (RNN) is selected, which can generate forecasts with sequential information flow [42]. RNNs differ from feed-forward neural networks in having cyclic connections between neurons, which transfer information by introducing the output of previous steps as input to the next step. They also have an internal memory in each hidden layer, which retains the interdependency of the data and enables them to preserve information over time. These models link previous information with the current target, but when the gap between them increases, RNNs have difficulty learning to connect the information. In addition, for long-term dependencies, the gradients used in gradient descent to determine the weights of the network tend to vanish or explode [43]. To address these problems, LSTM networks were proposed [25]; they are specifically designed to learn dependencies between distant pieces of information. Like all RNNs, LSTMs have the structure of a chain of repeating neural network modules, but their repeating modules have a more complex structure [44]. The models consist of an Input Layer, LSTM layers, a Dense Layer and an Output Layer. In the Input Layer, sequential data are created and fed to the LSTM layer, where each LSTM module accepts as input the output of the previous cell, as shown in Figure 3. An LSTM module has a cell state and three gates (namely forget, input and output). The gates are internal mechanisms of the LSTM cell (Figure 3) that determine the removal or addition of information to the cell state.
The subscript $t$ denotes the current time step, so $x_t$ is the input vector to the LSTM unit; $h_{t-1}$ is the output vector of the previous cell, whose memory is represented by $c_{t-1}$; $h_t$ and $c_t$ are the output and state of the current cell and are passed to the next cell, ensuring the sequential dependency. $W$ is the weight matrix applied to the current input in each gate and $b$ the vector of bias terms of the corresponding layer of neurons.
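Assuming the standard LSTM formulation with a forget gate, the gate computations described in the next paragraph can be written as follows, numbered to match the equation references in the text. Here $[h_{t-1}, x_t]$ denotes concatenation, $\odot$ element-wise multiplication and $\tilde{c}_t$ the vector of candidate values; these symbols are our notation.

$$\sigma(z) = \frac{1}{1 + e^{-z}} \tag{2}$$
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{3}$$
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{4}$$
$$\tilde{c}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{5}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{6}$$
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{7}$$
$$h_t = o_t \odot \tanh(c_t) \tag{8}$$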
First, the forget gate of the LSTM module (Figure 4a) determines which information from the previous cell, and to what extent, is discarded or retained in the new cell state. The concatenation of the previous cell output $h_{t-1}$ with the current input $x_t$ is fed into the forget gate layer $f_t$ (Equation (3)), where a sigmoid function (Equation (2)) outputs a value between 0 (discard completely) and 1 (retain completely) for each value in the cell state $c_{t-1}$. Next, the new values pass into the input gate (Figure 4b), where a sigmoid layer (Equation (4)) decides which values will be updated, while a hyperbolic tangent layer generates a vector of candidate values (Equation (5)). The updated cell state is the sum of the previous cell state scaled by $f_t$ and the candidate values scaled by the input gate $i_t$ (Equation (6)). Finally, the cell state passes to the output gate (Figure 4c), where a sigmoid layer selects which information will be output (Equation (7)). To produce the cell output $h_t$, the cell state is passed through a hyperbolic tangent function and multiplied by the output of the output gate $o_t$ (Equation (8)). LSTM models are considered the most suitable for air pollution forecasting, since the output sequence is a function of the observations at previous steps of multiple parameters, including the predicted variable. They have been applied to the prediction of the major air pollutants (CO, NO2, SO2, PM2.5, PM10, O3) [45], most of these studies focusing on short-term and long-term predictions of PM (PM2.5, PM10) [46-48].
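A single forward step of the cell, following the gate sequence just described, can be sketched in NumPy. The weight layout (one matrix per gate acting on the concatenation of $h_{t-1}$ and $x_t$), the dictionary keys and the random initialization are illustrative assumptions; training is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))                      # Equation (2)

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W[g]: (n_hidden, n_hidden + n_input) weights of gate g acting on [h_{t-1}, x_t]
    b[g]: (n_hidden,) bias of gate g; g in 'f' (forget), 'i' (input),
          'c' (candidate), 'o' (output)
    """
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W["f"] @ z + b["f"])                     # forget gate, (3)
    i = sigmoid(W["i"] @ z + b["i"])                     # input gate, (4)
    c_hat = np.tanh(W["c"] @ z + b["c"])                 # candidate values, (5)
    c = f * c_prev + i * c_hat                           # new cell state, (6)
    o = sigmoid(W["o"] @ z + b["o"])                     # output gate, (7)
    h = o * np.tanh(c)                                   # cell output, (8)
    return h, c

# Run four lagged time steps through a cell with random (untrained) weights
rng = np.random.default_rng(42)
n_in, n_hid = 6, 3
W = {g: rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)) for g in "fico"}
b = {g: np.zeros(n_hid) for g in "fico"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(4, n_in)):
    h, c = lstm_step(x, h, c, W, b)
print(h.shape)  # (3,)
```

A forget gate saturated at 1 combined with an input gate saturated at 0 carries $c_{t-1}$ through unchanged, which is how the cell preserves long-range information.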

Predictor Variables
The core predictor variables in both models (AnEn, LSTM) are the CAMS PM2.5 and PM10 concentrations. Two additional variables are included as proxies for emission variability. The Julian day is selected to reproduce emission variations associated with the seasonal cycle. Likewise, the day of the week is added to the predictors to capture the weekly variability of city activities that contribute to air pollutant concentrations.
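One way to derive the two calendar predictors from a forecast valid time is shown below. The encoding is an assumption, since the source does not specify it: the Julian day is taken as the day of the year (1-366), and the day of the week follows Python's `datetime.weekday()` convention (Monday = 0).

```python
from datetime import datetime

def calendar_predictors(valid_time):
    """(Julian day, day of week) for a forecast valid time.

    Julian day is taken as the day of the year (1-366) and the day of
    the week follows datetime.weekday(): Monday = 0 ... Sunday = 6.
    """
    return valid_time.timetuple().tm_yday, valid_time.weekday()

print(calendar_predictors(datetime(2019, 1, 15, 18)))  # (15, 1): a Tuesday
```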

Verification Methodology
Statistical indices such as the Mean Bias Error (MBE) and the Root Mean Square Error (RMSE) [49,50] are used to quantify the error between the observed and predicted PM time series. The MBE measures the systematic tendency of a model to over- or underpredict the observations, while the RMSE describes the overall deviation between estimated and actual values. The metrics are given in Equations (9) and (10):
$$\mathrm{MBE} = \frac{1}{N}\sum_{i=1}^{N}\left(F_i - O_i\right) \tag{9}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(F_i - O_i\right)^2} \tag{10}$$
where $O_i$ and $F_i$ are the observed and forecast values, respectively, and $N$ is the number of forecast-observation pairs. In addition, Taylor diagrams [51] and soccer plots are used to summarize model performance for each season and station.
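A minimal Python sketch of Equations (9) and (10), using the convention that a positive MBE means overestimation (consistent with how the CAMS bias is interpreted later in the text). The function names are illustrative.

```python
import math

def mbe(obs, fc):
    """Mean Bias Error, Equation (9); positive means overestimation."""
    return sum(f - o for o, f in zip(obs, fc)) / len(obs)

def rmse(obs, fc):
    """Root Mean Square Error, Equation (10)."""
    return math.sqrt(sum((f - o) ** 2 for o, f in zip(obs, fc)) / len(obs))

# Errors of opposite sign cancel in the MBE but not in the RMSE
print(mbe([10, 10], [15, 5]), rmse([10, 10], [15, 5]))  # 0.0 5.0
```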
Apart from the continuous evaluation described above, categorical statistics are used to quantify the performance of AnEn and LSTM in forecasting extreme PM2.5 and PM10 levels. The analysis uses the following statistical parameters [50]: (a) the probability of detection (POD), the fraction of observed extreme events that are correctly forecast; (b) the false alarm ratio (FAR), the fraction of forecast extreme events that do not occur; (c) the miss rate (MIS), the complement of POD, i.e., the fraction of observed extreme events that are not forecast; (d) the critical success index (CSI), which gives the overall skill of a model in correctly detecting extreme events, accounting for both false alarms and misses. All four indices range from 0 to 1; a value of 0 indicates perfect skill for FAR and MIS, whereas for POD and CSI the ideal value is 1.
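The four categorical scores follow from a 2 × 2 contingency table of observed versus forecast exceedances. The sketch below assumes an event is a value strictly above the threshold and that at least one event is observed and at least one forecast (otherwise the ratios are undefined); the function name and the sample values are hypothetical.

```python
def categorical_scores(obs, fc, threshold):
    """POD, FAR, MIS and CSI for exceedances of `threshold`.

    An event is a value strictly above the threshold. Assumes at least one
    observed and one forecast event, otherwise the ratios are undefined.
    """
    hits = misses = false_alarms = 0
    for o, f in zip(obs, fc):
        observed, forecast = o > threshold, f > threshold
        if observed and forecast:
            hits += 1
        elif observed:
            misses += 1
        elif forecast:
            false_alarms += 1
    return {
        "POD": hits / (hits + misses),
        "FAR": false_alarms / (hits + false_alarms),
        "MIS": misses / (hits + misses),            # equals 1 - POD
        "CSI": hits / (hits + misses + false_alarms),
    }

# Hypothetical PM2.5 values (ug/m3) with a 20 ug/m3 exceedance threshold
scores = categorical_scores([25, 30, 10, 22, 5], [21, 15, 24, 26, 8], threshold=20)
print(scores["CSI"])  # 2 hits, 1 miss, 1 false alarm -> 0.5
```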

Observed PM Concentrations
The mean monthly PM10 and PM2.5 concentrations at each station during 2018-2019 are illustrated in Figure 5. During winter (DJF), PM levels peak and exhibit the largest spread among stations. The higher atmospheric stability limits pollutant re-circulation, so each station is mostly affected by nearby emission sources. This results in higher concentrations at the urban stations than at the suburban stations. From May to September, PM concentrations remain roughly constant and are similar across station types. The magnitude and variability of PM concentrations are generally consistent across the two years, the only notable difference being the lower PM levels in January 2019 compared with January 2018. This can be attributed to the meteorological conditions in the study area presented in Section 2.1: the twofold increase in total precipitation and rainy days in January 2019 has a washout effect on PM concentrations.

CAMS Evaluation
To compare the gridded 0.1° × 0.1° CAMS forecasts with the observations at specific locations on equal terms, we use the cell declustering geostatistical approach to estimate the observed PM concentration at the scale of the CAMS grid cell. The 0.1° × 0.1° area around the central CAMS grid point (Figure 1) contains six stations. Splitting the CAMS cell into four equal 0.05° × 0.05° boxes, we find either one (SE and SW) or two (NE and NW) monitoring stations within each box. The observed PM concentration at the CAMS spatial scale is calculated as the weighted sum of the sensor concentrations, giving a weight of 0.25 to each station in the south and 0.125 to each station in the north. Figure 5 presents the mean monthly PM levels at the central CAMS grid point shown in Figure 1 against the weighted mean of the six stations within its grid cell. A substantial underestimation occurs during winter (DJF), while overestimation is evident for the other months. The maximum difference between forecasts and observations is found in January (MBE < 0) and April (MBE > 0). A probable reason for this factor-of-2 discrepancy in DJF is errors in the CAMS anthropogenic emissions, especially wood burning, which represents roughly 43% of particulate emissions in the area during winter (Pandis S, personal communication). On the other hand, the April overestimation of PM10 by a factor of ~3 in both years is linked to Saharan dust transport events (17-19/4/18, 12/4/19, 24-28/4/19), which are not identifiable by the low-cost sensors.
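The declustering weights can be reproduced with a short sketch: each occupied 0.05° box receives an equal share of the cell and divides it equally among its stations, which gives 0.25 to each southern station and 0.125 to each northern one. The function name, box labels and concentration values are hypothetical; renormalizing over occupied boxes only is our assumption for the general case with empty boxes.

```python
def declustered_mean(stations):
    """Cell-declustered mean of station values within one CAMS grid cell.

    stations: list of (box, value), where box labels one of the equal
    sub-boxes of the cell. Each occupied box gets an equal share of the
    total weight, split equally among the stations it contains.
    """
    counts = {}
    for box, _ in stations:
        counts[box] = counts.get(box, 0) + 1
    n_boxes = len(counts)                      # occupied boxes (4 here)
    return sum(value / (n_boxes * counts[box]) for box, value in stations)

# Hypothetical PM10 values (ug/m3): one station in SE and SW, two in NE and NW
cell = [("SE", 20.0), ("SW", 24.0), ("NE", 30.0), ("NE", 34.0),
        ("NW", 40.0), ("NW", 44.0)]
print(declustered_mean(cell))  # 0.25*(20+24) + 0.125*(30+34+40+44) = 29.5
```

Without declustering, the two pairs of northern stations would count twice as much as the southern ones and bias the cell average towards the north.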

Development of AnEn and LSTM Models
The LSTM and AnEn methods are employed to produce PM10 and PM2.5 forecasts for the next 90 h at six-hour increments for the eight air quality monitoring stations in the urban area of Patras. The dataset of each station is split into two parts, one for training the models (2018) and the other for evaluating them (2019). In this section, we present the configuration of each algorithm determined during the training phase.

AnEn
Given a forecast, the AnEn algorithm searches for similar past forecasts in the training dataset, as described in Section 2.2.1. The number of analogs and the combination of predictors contribute significantly to the optimal configuration of the AnEn method. Both are determined by leave-one-out cross-validation over the days of the training dataset (2018). PM10 and PM2.5 are the target variables. There are four predictor variables for PM10 (PM2.5): the CAMS forecast of the same variable and three auxiliary variables, namely the CAMS forecast of PM2.5 (PM10), the Julian day and the day of the week (WDAY). The three auxiliary variables yield seven non-empty combinations; together with the configuration without any auxiliary variable, eight combinations are tested. For each station, the number of analogs and the variable combination yielding the lowest RMSE between the observed values and the analog ensemble predictions in the training period are identified. The same configuration is applied in the next section to the 'blind' dataset of the validation period. Table 2 displays the optimum configuration per station, i.e., the number of analogs and the combination of predictor variables yielding the minimum RMSE. At most stations, more than 20 analogs are needed to derive the analog forecast for PM2.5, while fewer members (on average 6 fewer) are required for PM10 forecasts. For the PM10 predictions, the CAMS PM2.5 forecast is used at all stations, whereas the converse holds at only 25% of the stations. For both pollutants, AnEn uses the Julian day as input at most stations (seven out of eight), while WDAY was found important at only 2-3 stations. Hence, the AnEn PM10 forecast relies mostly on three inputs (CAMS forecasts of PM10 and PM2.5, Julian day), while the AnEn PM2.5 forecast has two dominant inputs (CAMS forecast of PM2.5, Julian day).
This partly explains why fewer analogs are needed for PM10. Assigning different weights to the predictors did not lead to better results, as the differences proved statistically insignificant.
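The configuration search can be sketched as a loop over the eight predictor combinations and the candidate ensemble sizes, scoring each by leave-one-out RMSE. This simplified version assumes a single station and lead time, equal predictor weights and a one-step time window (so the metric reduces to a σ-scaled absolute difference); the column names in `AUX` and the key `cams_target` are hypothetical.

```python
import itertools
import numpy as np

# Hypothetical column names for the three auxiliary predictors
AUX = ("cams_other_pm", "julian_day", "weekday")

def loo_rmse(X, y, n_analogs):
    """Leave-one-out RMSE of the AnEn mean over the training days.

    X: (M, Nv) predictors per training day (one station, one lead time)
    y: (M,) verifying observations
    With a one-step window and equal weights, the analog metric reduces
    to the sigma-scaled absolute difference summed over predictors.
    """
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0                    # guard against constant columns
    errors = []
    for k in range(len(y)):
        d = np.sum(np.abs(X - X[k]) / sigma, axis=1)
        d[k] = np.inf                          # exclude the left-out day itself
        best = np.argsort(d, kind="stable")[:n_analogs]
        errors.append(y[best].mean() - y[k])
    return float(np.sqrt(np.mean(np.square(errors))))

def select_configuration(columns, y, analog_counts):
    """Try every subset of AUX (8 in total, including none) with every
    candidate ensemble size; return (rmse, aux_subset, n_analogs)."""
    best = None
    for r in range(len(AUX) + 1):
        for combo in itertools.combinations(AUX, r):
            X = np.column_stack([columns["cams_target"]] + [columns[c] for c in combo])
            for n in analog_counts:
                score = loo_rmse(X, y, n)
                if best is None or score < best[0]:
                    best = (score, combo, n)
    return best
```

Keeping the left-out day out of its own analog pool is what makes the training RMSE an honest estimate of skill on unseen days.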

LSTM
Achieving the best performance for the LSTM model is a complex and time-consuming procedure. Optimizing each hyperparameter individually is not enough; the best combination of them must also be found. To construct the architecture of the LSTM network, the number of hidden layers and the number of nodes in each layer must be set, along with the number of epochs, the batch size, the optimizer, and the loss and activation functions. A range of values is tested for each parameter, and a grid search over all possible parameter combinations yields the final network. The input variables are the same as those of AnEn, i.e., observations and CAMS forecasts of PM10 and PM2.5, the Julian day and the day of the week. The 2018 data of each station are divided into groups of four days, with the first three days of each group used as the training set and the fourth day as the validation set for tuning the model hyperparameters.
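The 3:1 blocked split of the training year reduces to index arithmetic. The sketch assumes six-hourly samples (four per day) with the series starting at the beginning of a four-day block; the function name is hypothetical.

```python
import numpy as np

STEPS_PER_DAY = 4  # six-hourly data

def blocked_split(n_samples, block_days=4, steps_per_day=STEPS_PER_DAY):
    """Indices for a 3:1 blocked split: in every block of `block_days` days,
    the first block_days - 1 days go to training and the last day to validation."""
    idx = np.arange(n_samples)
    day = idx // steps_per_day
    is_val = (day % block_days) == block_days - 1
    return idx[~is_val], idx[is_val]

train_idx, val_idx = blocked_split(32)  # eight days of six-hourly data
print(val_idx)  # [12 13 14 15 28 29 30 31]
```

Validating on whole days interleaved through the year, rather than on a single final block, keeps every season represented in the tuning set.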
The dataset needs preparation before being fed to the LSTM network, including normalizing the input variables to the range 0-1 and transforming it into a form suitable for a supervised learning problem. The partial autocorrelation function (PACF) of the target against its own lagged values showed the highest correlation four steps back (t − 24 h). Therefore, the LSTM network is trained with a time lag of four timesteps: using as input the predictors at the previous four timesteps (back to t − 24 h), the LSTM model learns to produce PM forecasts for each lead time over the next four days.
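The two preparation steps, 0-1 scaling and conversion into lagged supervised samples, can be sketched with NumPy. Fitting the scaler on the training data only and the generic one-step-ahead sliding-window layout are assumptions rather than details stated in the source; the function names are hypothetical.

```python
import numpy as np

def minmax_fit(train):
    """Column-wise minimum and range, estimated on training data only."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0                     # avoid division by zero
    return lo, span

def make_windows(data, target_col, n_lags=4):
    """Turn a (T, n_features) array into supervised samples.

    X[k] holds the n_lags time steps preceding time k + n_lags,
    y[k] is the target column at time k + n_lags.
    """
    X = np.stack([data[k:k + n_lags] for k in range(len(data) - n_lags)])
    y = data[n_lags:, target_col]
    return X, y

# Usage with a toy (12 time steps x 2 features) array
data = np.arange(24, dtype=float).reshape(12, 2)
lo, span = minmax_fit(data)
X, y = make_windows((data - lo) / span, target_col=0)
print(X.shape, y.shape)  # (8, 4, 2) (8,)
```

With six-hourly data, `n_lags=4` spans exactly the 24 h suggested by the PACF.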
Based on trial runs, a single LSTM layer proved suitable for avoiding overfitting. Although 100 units in the hidden layer seem appropriate for all stations, different network sizes were tested to achieve the best result at each station. The activation function of this layer was selected among relu, sigmoid, softmax and tanh, with the sigmoid function yielding the lowest mean square error. The LSTM layer is followed by two dense layers: the first is a fully connected layer with 50 units, and the second acts as the output layer. The output of the model is one-dimensional and uses a sigmoid activation, which produced better forecasts. The model is trained using the Adam gradient-based optimization technique; compared with two other stochastic gradient descent algorithms, plain Stochastic Gradient Descent (SGD) and RMSProp, Adam achieved the minimum error in the fewest epochs. After the optimizer was selected, the number of epochs and the batch size were set to 50 and 76, respectively. Common choices for the loss function minimized during training are the RMSE and the MAE; in this case, the RMSE proved to be the better option. Many of these results are in accordance with the findings of pertinent studies on air pollution forecasting with LSTM models [52][53][54].
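For orientation, the forward pass of the selected architecture (one 100-unit LSTM layer with sigmoid activation, a 50-unit dense layer and a one-unit sigmoid output, fed with four lagged steps) is sketched in plain NumPy so that the data flow and tensor shapes are explicit. Assumptions: six input features, a linear activation for the 50-unit layer (the source does not specify it), sigmoid gate activations, and the layer activation replacing tanh in both the candidate and the output paths, which is what Keras does when `activation="sigmoid"` is passed to its LSTM layer. Weights are random, and training with Adam is not shown.

```python
import numpy as np

N_LAGS, N_FEATURES, N_LSTM, N_DENSE = 4, 6, 100, 50
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_params(n_in, n_out):
    # Small random weights; training (Adam, RMSE loss) is not shown
    return rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out)

lstm_params = {g: dense_params(N_LSTM + N_FEATURES, N_LSTM) for g in "fico"}
hidden_params = dense_params(N_LSTM, N_DENSE)
output_params = dense_params(N_DENSE, 1)

def forward(window):
    """window: (N_LAGS, N_FEATURES) scaled inputs -> scaled PM forecast in (0, 1)."""
    h, c = np.zeros(N_LSTM), np.zeros(N_LSTM)
    for x in window:                                   # four lagged steps
        z = np.concatenate([h, x])
        a = {g: W @ z + b for g, (W, b) in lstm_params.items()}
        f, i, o = sigmoid(a["f"]), sigmoid(a["i"]), sigmoid(a["o"])
        c = f * c + i * sigmoid(a["c"])               # sigmoid in place of tanh
        h = o * sigmoid(c)                            # sigmoid in place of tanh
    W1, b1 = hidden_params
    d = W1 @ h + b1                                    # 50-unit layer, linear (assumed)
    W2, b2 = output_params
    return float(sigmoid(W2 @ d + b2)[0])              # one sigmoid output unit

def rmse_loss(y_true, y_pred):
    """RMSE, the loss preferred over MAE in the text."""
    diff = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.sqrt(np.mean(diff ** 2)))

print(0.0 < forward(rng.random((N_LAGS, N_FEATURES))) < 1.0)  # True
```

Because the output unit is a sigmoid, predictions stay in (0, 1) and must be mapped back to µg/m³ with the inverse of the 0-1 scaling applied to the target.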

AnEn & LSTM Forecast Verification (Validation Phase)
In this section, the optimal configurations of AnEn and LSTM identified during the training phase in 2018 (Section 3.3) are applied to the 2019 dataset to evaluate their forecast skill. The verification is carried out for each station separately, covering different forecast lead times, seasons and extreme levels. CAMS forecasts interpolated to the station locations are also verified for comparison.

Time Series
Concurrent time series of PM predictions by CAMS, LSTM and AnEn against ground-level observations are produced at six-hour increments for all sites of the study. Figure 6 illustrates the time series at two stations of different types, an urban traffic station (Psila Alonia) and a suburban background station (University of Patras), for two-week periods in January and April 2019, respectively. These months are selected because, as shown earlier, the deviation between observations and CAMS peaks in them. At the urban traffic station (Figure 6a), despite the conspicuous tendency of CAMS to underestimate PM concentrations in January, the AnEn substantially corrects the CAMS forecasts towards the magnitude and variability of the measured values. The LSTM captures the variations of the measured values but underestimates the peaks, making it inferior to the AnEn at this type of station. At the suburban background station (Figure 6b), the CAMS model produces considerably overestimated forecasts during April. Applying the AnEn to the CAMS forecasts largely reduces their distance from the observations. The LSTM, by integrating useful antecedent information into the next output, performs well, even though it tends to overestimate the minima. In summary, both AnEn and LSTM demonstrate significant potential to correct the magnitude and phasing of the CAMS PM2.5 and PM10 predictions, with AnEn displaying higher forecast skill for the occasional extreme PM concentrations observed.

Degradation of Forecast Skill
The forecast skill of the models has been verified over the daily cycle for horizons up to 90 h ahead at the eight air quality monitoring stations. Figure 7 displays the normalized RMSE as a function of forecast lead time from hour 0 to 90 for PM2.5 and PM10. The improvements over CAMS are significant at all stations for each forecast lead time, and the AnEn generates better results than the LSTM. The peak error in both approaches is observed at 18 UTC, owing to the elevated PM levels during the evening rush hour. Moreover, the forecast skill degrades more mildly with increasing lead time for the corrected schemes, most slowly for AnEn (0.015 increase in NRMSE per forecast day) compared with LSTM (0.043 per day) and CAMS (0.079 per day).
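The per-day degradation rates can be estimated by fitting a line to NRMSE against lead time. The normalization (RMSE divided by the mean observation at each lead time) and the least-squares fit are assumptions, since the source does not describe how the rates were computed; the diurnal error peak at 18 UTC means a linear fit is only a summary of the trend. The synthetic curve in the usage is hypothetical.

```python
import numpy as np

def nrmse_by_lead(obs, fc):
    """NRMSE per lead time; obs, fc are (n_forecasts, n_leads) arrays.

    Normalized here by the mean observation at each lead time (assumed).
    """
    rmse = np.sqrt(np.mean((fc - obs) ** 2, axis=0))
    return rmse / obs.mean(axis=0)

def degradation_per_day(nrmse, lead_hours):
    """Least-squares slope of NRMSE against lead time, expressed per day."""
    slope_per_hour = np.polyfit(lead_hours, nrmse, 1)[0]
    return slope_per_hour * 24.0

leads = np.arange(0, 91, 6)                  # 0, 6, ..., 90 h
synthetic = 0.4 + 0.015 * leads / 24.0       # hypothetical linear NRMSE curve
print(round(degradation_per_day(synthetic, leads), 3))  # 0.015
```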

Error Indices
Typical error metrics, such as MBE and RMSE, are calculated at each monitoring station on annual, seasonal and monthly time scales to gain insight into the forecast skill of each model. On the annual scale, CAMS shows a positive bias for both pollutants, with the MBE of PM10 roughly double that of PM2.5 (Table 3). As illustrated in Figure 8, the CAMS overestimation is found in all seasons except winter, when the forecasts are clearly underestimated. The annual biases of both approaches are smaller than 1 µg/m³ in absolute terms when aggregated over all stations, a reduction by a factor of at least 3 compared with CAMS (Table 3). The AnEn technique reduces the absolute bias of the CAMS forecasts by approximately 65%. On the annual scale, it generates predictions with a slight overestimation, in the range 0.1 to 1.1 µg/m³ for PM2.5 and 0.2 to 1.7 µg/m³ for PM10. The bias reduction is consistent across all seasons. In contrast to AnEn, the LSTM model exhibits a minor underestimation tendency, with MBE values ranging from −0.9 to 1.7 µg/m³ for PM2.5 and −1.4 to 0.2 µg/m³ for PM10. The performance of each model is also evaluated using the RMSE, a widely used metric in which, unlike the MBE, errors of different signs do not cancel out. As can be inferred from Table 4, neither method is clearly superior. Both models show an annual RMSE (averaged over all stations) lower by approximately 50% for PM2.5 and 60% for PM10 with respect to CAMS. The best performance (~75% RMSE improvement) for both models is found at the suburban background stations (University of Patras, Platani) and the worst (~50% RMSE improvement) at the urban traffic stations (Koukouli, Psila Alonia). AnEn prevails over the LSTM at the urban traffic stations (high PM levels), while the opposite is true at the background stations.
According to the seasonal RMSE values at each station, AnEn attains better results than LSTM during winter (Figure 9). The largest improvement of AnEn over the CAMS forecasts is observed in spring and summer and the smallest in autumn. Overall, in terms of RMSE, AnEn proves more efficient in periods with high particulate pollution levels, whereas the LSTM is marginally more successful in seasons with moderate emissions.

Taylor & Soccer Plots
Taylor diagrams assess the prediction skill of models from several angles, summarizing the RMSE, the Pearson correlation coefficient (PCC) and the standard deviation (STD) in a single plot (Figure 10). The semicircles in the plots are RMSE contours, and the reference point represents the statistics of the observed field. The statistics are derived from the seasonal means of all stations. For PM2.5 (Figure 11a), CAMS forecasts show the highest combined skill in autumn (PCC ~0.4, STD ratio ~1), while in winter the underestimation of PM concentrations degrades the skill (PCC ~0.2, STD ratio ~0.5). In summer, CAMS inflates the observed variance by a factor of 2; in spring, the inflation is larger (STD ratio ~3), partly explained by soiling events not identified by the sensors. In winter, when the highest PM levels are recorded, AnEn clearly outscores LSTM, noticeably improving all three validation scores (error, phasing, variance). In autumn, AnEn achieves a better variance than LSTM at the expense of a larger error; hence, there is no clear winner. Lastly, in spring and summer, the combined skill of AnEn and LSTM is similar. The results for PM10 (Figure 11b) are similar to those for PM2.5. In summary, the seasonal Taylor diagrams highlight the substantial improvement of the CAMS forecasts in winter through AnEn, which achieves a clearly smaller RMSE, a higher correlation and a standard deviation closer to the observed one than the LSTM.
Plotting the normalized MBE against the normalized RMSE of each station in soccer plots allows a comparative assessment of model skill across station types. Whereas CAMS (Figure 11) exceeds the 75% border, the statistical models AnEn and LSTM place the stations in a bounded area, with the NMB not exceeding 20% at any station and the NRMSE mostly below 75%.
For AnEn (Figure 11), the overall performance of the stations lies within the 50-75% range, the only exception being the urban traffic station of Koukouli for PM2.5. For PM10, two additional urban stations, Agia Sofia and Psila Alonia, fall outside the larger box. An overestimation bias is evident for both PM2.5 and PM10 at all stations. For LSTM (Figure 11), the results are more scattered than for AnEn. Exceptional performance is found at the suburban station of the University of Patras, which exhibits the smallest error, whereas the urban stations of Psila Alonia and Koukouli fall outside the larger box. A slight underprediction bias occurs at all stations for PM10, apart from Kastelokampos, which displays an overestimation bias. For PM2.5, half of the stations show overestimation. Overall, the skill of the statistical methods at the urban stations either lies within the bounds or exceeds them, whereas at the suburban and background stations, where lower pollutant levels prevail, good performance is achieved.
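The coordinates of each station in a soccer plot can be computed as below. The normalizations (NMB as the summed bias over the summed observations, NRMSE as RMSE over the mean observation, both in percent) and the box test in the hypothetical helper `inside_box` are assumptions; its default limits of 20% and 75% mirror the values quoted in the text. The sample values are hypothetical.

```python
import numpy as np

def soccer_point(obs, fc):
    """(NMB, NRMSE) in percent for one station (normalizations assumed)."""
    obs, fc = np.asarray(obs, float), np.asarray(fc, float)
    nmb = 100.0 * np.sum(fc - obs) / np.sum(obs)
    nrmse = 100.0 * np.sqrt(np.mean((fc - obs) ** 2)) / np.mean(obs)
    return float(nmb), float(nrmse)

def inside_box(nmb, nrmse, bias_limit=20.0, error_limit=75.0):
    """True if the station falls inside a soccer-plot box with the given limits."""
    return abs(nmb) <= bias_limit and nrmse <= error_limit

nmb, nrmse = soccer_point([10, 20, 30], [12, 18, 36])   # hypothetical values
print(round(nmb, 1), round(nrmse, 1), inside_box(nmb, nrmse))  # 10.0 19.1 True
```

Normalizing by the observed level lets clean suburban stations and polluted traffic stations share one plot.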

Extremes
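The exceedance scores used in this section follow from a 2 × 2 contingency table of forecast versus observed exceedances. The sketch below assumes the standard definitions POD = H/(H+M), FAR = F/(H+F) and CSI = H/(H+M+F), with hits H, misses M and false alarms F, and counts a value strictly above the threshold as an exceedance; MIS is omitted because its definition is not given here. The function names and toy series are hypothetical.

```python
import numpy as np


def exceedance_scores(obs, fcst, threshold):
    """POD, FAR and CSI for exceedances of `threshold` (strictly greater than)."""
    obs_event = np.asarray(obs, dtype=float) > threshold
    fcst_event = np.asarray(fcst, dtype=float) > threshold
    hits = int(np.sum(obs_event & fcst_event))
    misses = int(np.sum(obs_event & ~fcst_event))
    false_alarms = int(np.sum(~obs_event & fcst_event))

    def ratio(num, den):
        # Undefined scores (e.g. no observed events) are returned as NaN.
        return num / den if den > 0 else float("nan")

    return {
        "hits": hits,
        "misses": misses,
        "false_alarms": false_alarms,
        "POD": ratio(hits, hits + misses),
        "FAR": ratio(false_alarms, hits + false_alarms),
        "CSI": ratio(hits, hits + misses + false_alarms),
    }


def csi_from_pod_far(pod, far):
    """CSI expressed through POD and FAR: 1 / (1/POD + 1/(1 - FAR) - 1)."""
    return 1.0 / (1.0 / pod + 1.0 / (1.0 - far) - 1.0)


# Toy PM2.5 series in µg/m³ with a 20 µg/m³ extreme threshold.
obs = [10, 25, 30, 5, 22]
fcst = [21, 26, 15, 4, 30]
print(exceedance_scores(obs, fcst, threshold=20))
print(round(csi_from_pod_far(0.52, 0.46), 2))  # -> 0.36
```

Since CSI depends only on POD and FAR, the AnEn PM2.5 scores reported below (POD = 0.52, FAR = 0.46) give a CSI of about 0.36, consistent with the value reported for AnEn.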
Extreme particulate pollution events are rare in the examined region but of particular importance for human activities. It is therefore necessary to evaluate the proposed methods on their performance during PM exceedances. For the specific area, extreme values of PM2.5 and PM10 are defined as those exceeding 20 and 40 µg/m³ [55], respectively. The scarcity of such records in the dataset makes verification more challenging. The capacity of AnEn, LSTM and CAMS to forecast extreme PM2.5 and PM10 values is assessed with the statistical scores POD, FAR, CSI and MIS described in Section 2.4 (Table 5). In general, the CAMS scores show a substantial weakness in capturing extreme values, so it is more constructive to compare mainly the two statistical methods, AnEn and LSTM. Based on the probability of detection (POD) for PM2.5, AnEn is superior in detecting extreme values, with 0.52 against 0.20 for LSTM, which means that AnEn detects more than half of the extreme events. However, the false alarm ratio (FAR) is 0.46 for AnEn and 0.42 for LSTM, indicating that a considerable proportion of the forecast extreme events did not occur. The composite critical success index (CSI) is 0.36 for AnEn, well above the 0.16 of LSTM. For PM10, AnEn generally replicates the PM2.5 results, achieving a CSI of 0.30, while LSTM fails (CSI = 0.04). Because the CSI accounts for hits, false alarms and misses together, it is the more reliable measure; on this basis, we conclude that AnEn outperforms LSTM in identifying extreme values.

We next extend the analysis to two dimensions and present forecast maps of air pollution from the investigated techniques. For CAMS, the whole area enclosed by the stations is assigned the forecast at the central grid point (Figure 12).
For AnEn and LSTM, the maps were generated by triangulation-based natural neighbor interpolation of their forecasts at each station. As already discussed, CAMS underestimates the station-averaged PM2.5 and PM10 concentrations (Figure 12). AnEn and LSTM improve the predictions at the station locations and introduce spatial variability. Of the two, AnEn reproduces the observed south-north gradient more accurately, mostly owing to its better skill at the peak values observed at the southern stations.
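A minimal sketch of turning station forecasts into a map, assuming SciPy. SciPy's `griddata` does not offer natural neighbor interpolation, so this sketch swaps in its `method="linear"` option, i.e. barycentric linear interpolation on a Delaunay triangulation of the stations; it is also triangulation-based but yields a less smooth surface than natural neighbor weights. Grid points outside the convex hull of the stations come back as NaN. The function name, station coordinates and values are hypothetical.

```python
import numpy as np
from scipy.interpolate import griddata


def station_forecast_map(station_xy, station_values, grid_x, grid_y):
    """Interpolate per-station forecasts onto a regular grid.

    Uses Delaunay-based linear interpolation as a stand-in for the natural
    neighbor method; grid points outside the stations' convex hull become NaN.
    """
    gx, gy = np.meshgrid(grid_x, grid_y)
    return griddata(np.asarray(station_xy, dtype=float),
                    np.asarray(station_values, dtype=float),
                    (gx, gy), method="linear")


# Hypothetical station positions (km) and a linear PM2.5 test field (µg/m³).
stations = [(0.0, 0.0), (2.1, 0.0), (0.0, 1.9), (2.0, 2.2), (1.0, 0.5)]
pm25 = [10 + 3 * x + 5 * y for x, y in stations]
grid = station_forecast_map(stations, pm25, grid_x=[0.5, 1.0, 1.5], grid_y=[0.5, 1.0])
print(grid.shape)  # -> (2, 3): rows follow grid_y, columns follow grid_x
```

A linear field is reproduced exactly inside the hull by any triangulation-based interpolant, which makes it a convenient sanity check; real station forecasts will of course not be linear.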

Conclusions
In this work, we evaluate the CAMS PM forecasts at local scale against two years of in-situ measurements from a network of stations located in an urban coastal Mediterranean city in Greece. Then, we compare the performance of a statistical method (AnEn) and a deep-learning network (LSTM) in forecasting PM10 and PM2.5 concentrations at station level, using only open data as inputs, namely the PM observations from a dense network of calibrated low-cost sensors and the corresponding operational CAMS forecasts. The models are trained with the 2018 datasets, and their four-day-ahead predictions at 6 h increments are validated with the 2019 datasets. The purpose of the study is thus two-fold: to evaluate CAMS in an urban agglomeration and to downscale its forecasts to sub-km scale.
The comparison of the PM2.5 and PM10 concentrations at the monitoring stations, upscaled to the CAMS grid, shows that CAMS underestimates PM2.5 and PM10 concentrations by a factor of 2 during winter, indicating a misrepresentation of anthropogenic particulate emissions such as wood burning. Overestimation is evident in the other seasons, and CAMS achieves its lowest error in autumn.
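The factor-of-2 statement can be read as a ratio of seasonal means between CAMS and the observations upscaled to the CAMS grid cell. A minimal sketch, assuming that upscaling means a simple average over the stations inside the cell and that seasons are meteorological (DJF, MAM, JJA, SON); the function name and toy numbers are hypothetical. A winter ratio near 0.5 corresponds to underestimation by a factor of 2, and ratios above 1 to overestimation.

```python
import numpy as np

# Meteorological seasons by calendar month.
SEASONS = {"DJF": (12, 1, 2), "MAM": (3, 4, 5), "JJA": (6, 7, 8), "SON": (9, 10, 11)}


def seasonal_mean_ratio(months, station_obs, cams_fcst):
    """Ratio of the seasonal mean CAMS forecast to the seasonal mean observation.

    months      : (n_times,) calendar month (1-12) of each time step
    station_obs : (n_times, n_stations) observations of the stations in one CAMS cell
    cams_fcst   : (n_times,) CAMS forecast for that cell
    """
    months = np.asarray(months)
    # Upscale to the CAMS cell: average the stations at each time step (NaN = missing).
    cell_obs = np.nanmean(np.asarray(station_obs, dtype=float), axis=1)
    cams = np.asarray(cams_fcst, dtype=float)
    ratios = {}
    for season, season_months in SEASONS.items():
        mask = np.isin(months, season_months)
        if mask.any():  # skip seasons absent from the sample
            ratios[season] = np.nanmean(cams[mask]) / np.nanmean(cell_obs[mask])
    return ratios


# Toy data: two winter and two summer time steps, two stations.
months = [1, 1, 7, 7]
obs = [[20, 40], [30, 30], [10, 10], [12, 8]]
cams = [15, 15, 12, 8]
print(seasonal_mean_ratio(months, obs, cams))
```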
The AnEn technique searches an input database for analog patterns, whereas the LSTM technique uses the CAMS forecasts as features to learn the pattern of PM concentrations. Both approaches predict PM concentrations effectively, adequately capturing the variations of the measured values at all stations and achieving a substantially lower error than the CAMS forecasts. AnEn reduces the CAMS RMSE by 55% and 60% for PM2.5 and PM10, respectively, approaches the ground-based measurements well throughout the year, and its skill is consistent across all pollution levels, including extreme values. LSTM tends to underestimate high PM concentrations, which makes it more suitable for stations and periods with moderate PM levels; it had difficulty capturing exceedances of the limit values of 25 and 40 µg/m³ for PM2.5 and PM10, respectively. AnEn and LSTM have similar skill at the suburban stations, whereas AnEn proves more efficient at the urban stations, which record higher PM levels, especially in winter. Seasonally, AnEn is superior to LSTM in winter, when PM concentrations are considerably high; in the other seasons, particularly spring and summer, the composite skill of both methods is similar. The results were robust for lead times of up to 4 days, and the corrected predictions showed only a mild degradation. In particular, the predictive skill of AnEn degrades notably more slowly with increasing forecast interval than that of CAMS and LSTM.
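The analog search can be illustrated with a deliberately simplified AnEn. This sketch assumes a single predictor (the CAMS PM forecast), an unweighted Euclidean distance over a short window of neighboring lead times, and the mean of the observations that verified the k closest past forecasts as the corrected value. The study's actual predictors, weights and number of analogs are not reproduced; the function name and toy data are hypothetical.

```python
import numpy as np


def analog_ensemble(hist_fcst, hist_obs, new_fcst, n_analogs=10, half_window=1):
    """Simplified Analog Ensemble correction of one new forecast run.

    hist_fcst : (n_days, n_leads) past CAMS forecasts (training period)
    hist_obs  : (n_days, n_leads) observations verifying those forecasts
    new_fcst  : (n_leads,) new CAMS forecast to be corrected
    Returns the ensemble-mean prediction for every lead time.
    """
    hist_fcst = np.asarray(hist_fcst, dtype=float)
    hist_obs = np.asarray(hist_obs, dtype=float)
    new_fcst = np.asarray(new_fcst, dtype=float)
    n_days, n_leads = hist_fcst.shape
    k = min(n_analogs, n_days)
    prediction = np.empty(n_leads)
    for lead in range(n_leads):
        # Compare forecasts over a window of neighboring lead times.
        lo, hi = max(0, lead - half_window), min(n_leads, lead + half_window + 1)
        dist = np.sqrt(np.sum((hist_fcst[:, lo:hi] - new_fcst[lo:hi]) ** 2, axis=1))
        # The k most similar past forecasts are the analogs; their observations
        # at this lead time form the ensemble.
        analogs = np.argsort(dist, kind="stable")[:k]
        prediction[lead] = hist_obs[analogs, lead].mean()
    return prediction


# Toy training set: CAMS forecasts half of what was observed (factor-2 underestimation).
hist_fcst = np.repeat(np.arange(10.0)[:, None], 4, axis=1)   # 10 days x 4 lead times
hist_obs = 2.0 * hist_fcst
print(analog_ensemble(hist_fcst, hist_obs, new_fcst=[5.0] * 4, n_analogs=3))  # -> [10. 10. 10. 10.]
```

In this toy case the ensemble recovers the observed level, about twice the CAMS value, without any explicit bias model: the correction comes entirely from the observations paired with similar past forecasts.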
The forecast maps demonstrated the superior ability of AnEn to reproduce the observed pollution pattern with its south-north gradient. This suggests that AnEn forecasts could constitute a reliable index for a health decision support system, provided that the variability of the fine-scale processes not represented by the inputs (e.g., weather, traffic, agricultural burning, sandstorms, industrial activity) follows consistent sub-daily and sub-seasonal patterns. In the city of Patras, these conditions are well fulfilled.
AnEn and LSTM are statistical methods whose performance depends on the completeness of the training data and on the stationarity between the training and testing datasets. For the short-term PM forecasting investigated here, the continuous two-year dataset was found to meet both requirements. In this setting, both techniques prove to be reliable tools for air pollution forecasting and could be used in other regions with small modifications. These include the optimization of the architecture hyperparameters (LSTM: number of hidden layers and nodes, learning rate; AnEn: number of analogs) and the selection of the important inputs. Configuring the LSTM is time-consuming because it requires tuning and testing all possible combinations of its hyperparameters. AnEn has lower computational demands and showed better performance across seasons and pollution levels.
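The cost contrast comes from the size of an exhaustive search. A minimal sketch of enumerating the hyperparameter combinations named above, with purely hypothetical candidate values (not those used in the study): each LSTM combination implies training and evaluating a network, whereas each AnEn candidate only changes how many of the already ranked analogs are averaged.

```python
import itertools
from math import prod

# Hypothetical search spaces; the study's actual candidate values are not stated here.
LSTM_GRID = {
    "hidden_layers": [1, 2, 3],
    "nodes": [16, 32, 64, 128],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}
ANEN_GRID = {"n_analogs": [5, 10, 15, 20, 25]}


def enumerate_configs(grid):
    """All combinations of a hyperparameter grid, as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in itertools.product(*grid.values())]


lstm_configs = enumerate_configs(LSTM_GRID)
# The count is the product of the per-parameter options: 3 * 4 * 3 = 36 trainings.
assert len(lstm_configs) == prod(len(v) for v in LSTM_GRID.values())
print(len(lstm_configs), len(enumerate_configs(ANEN_GRID)))  # -> 36 5
```

Adding one more LSTM hyperparameter with n options multiplies the number of trainings by n, which is why exhaustive tuning quickly becomes the dominant cost.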
A limitation of the current study is the lack of explicit meteorological and emission predictors in the statistical approaches, as such data were not available. This will be addressed in a future experiment using forecasts from the Weather Research and Forecasting (WRF) model. High-resolution weather forecasts could incorporate information about urban processes not parameterized in this study, such as advection and diffusion, with a view to further improving PM10 and PM2.5 predictions in cities at sub-km scale.
Funding: This research was funded by Western Greece's Smart Specialization Strategy (RIS3) 2014-2020, co-financed by Greece and the European Union, project "Smart Air Quality Monitoring (SmartAQM)".