A Wavenet-Based Virtual Sensor for PM 10 Monitoring

Abstract: In this work, a virtual sensor for PM 10 concentration monitoring is presented. The sensor is based on wavenet models and uses daily mean NO 2 concentration and meteorological variables (wind speed and rainfall) as input. The methodology has been applied to the reconstruction of PM 10 levels measured at 14 monitoring stations in the Lombardy region (Italy). This region, usually affected by high levels of PM 10 , is a challenging benchmark area for the implemented sensors. Nevertheless, the performance is good, with relatively low bias and high correlation.


Introduction
Exposure to high levels of particulate matter (PM 10 ) is a major social problem [1] due to its impacts on human health, with effects including pulmonary and cardiovascular diseases [2,3]. One of the main challenges in decision making related to PM 10 control is that win-win solutions that also consider other pollutants, such as nitrogen dioxide (NO 2 ) and ozone (O 3 ), are usually complex to identify and implement [4][5][6][7]. For this reason, detailed information about the levels of all significant air pollutants over a given area is a key element of decision-making processes. In this context, integrating information from regional networks with that from novel or private networks based on low-cost technology [8,9] has become increasingly important, mainly because such data can provide suitable information for chemical transport models (CTMs), allowing them to compute concentrations far from the official monitoring network stations [10][11][12].
Four main techniques for the measurement of PM 10 are reported in the literature [13]: (1) gravimetric analysis of pumped and filtered particles; (2) tapered element oscillating microbalance (TEOM); (3) beta-attenuation; (4) light scattering. The first three techniques are quite expensive, so their use is limited to regional authorities, private companies and research groups [13]. Light scattering, instead, is a relatively low-cost technique, but it is often affected by substantial biases [14].
The objective of this work is to evaluate the possibility of implementing a virtual sensor for PM 10 daily mean concentration starting from the data measured by sensors detecting other pollutants and meteorological variables. In particular, the virtual sensors applied in this work are based on NO 2 daily mean concentration and meteorological variables, such as wind speed, rainfall, relative humidity and temperature.
As the name indicates, a virtual sensor can be broadly described as software that computes the value of a variable without measuring it directly, using instead measurements that are physically or chemically related to the variable to be reproduced [15]. Virtual sensors play a key role when a physical sensor cannot be installed because of some limitation (e.g., unreachable position, high cost). There are two possible approaches to virtual sensor implementation:
1. Data-driven: in this approach, time series of input and output variables are collected from direct measurements and are used to compute an approximate mathematical relationship between the measured variables and the sensor output [16];
2. Deterministic: in this approach, the (possibly approximated) physical/chemical relationships among input and output variables are used to compute the unmeasured variable through the virtual sensor [17].
This work presents a data-driven approach based on wavenet models to implement a PM 10 virtual sensor using NO 2 and meteorological variables. All these variables are closely related to the phenomena involved in the formation and accumulation of PM 10 in the atmosphere; they were chosen because the literature reports low-cost sensors whose performance is adequate [18] for identifying a virtual sensor, which in turn allows the definition of a low-cost PM 10 measuring network. Wavenets are data-driven models resulting from the integration of wavelet theory and neural network models [19]. Their main applications are related to sound management/filtering [20], although their nonlinear function approximation (and thus forecasting) properties have also been applied with good results in other fields, such as energy systems [19,21]. These approximation properties make them suitable for environmental monitoring and forecasting applications, but there is still no literature on their application to the reproduction of PM 10 or other air pollutants. Since artificial neural networks are widely used in this field [4,22,23], wavenets could also be useful for the definition of a PM 10 virtual sensor. The paper is organized in two main parts: a methodological one (Section 2), where the basics of artificial neural networks, wavelet theory and wavenets are introduced, and a second part presenting the evaluation of the results on a test case.

Materials and Methods
In this section, the theoretical framework used to derive a virtual air quality sensor based on wavenets [24] is presented.

Artificial Neural Networks
Artificial Neural Networks (ANNs) are function approximators inspired by the behavior of the human brain, modeled as a network of smaller information processing units called neurons (Figure 1). Each input $x_i$ of a neuron is multiplied by a corresponding weight $w_i$, analogous to a synaptic strength; all the weighted inputs are then added together with a bias term $b$ to compute the activation level of the neuron. The output signal $y(x)$ is usually a nonlinear function $f$ of the activation level. Hence, the typical neuron model is represented as:

$$y(x) = f\left(\sum_{i=1}^{d} w_i x_i + b\right) \qquad (1)$$

where $d$ is the length of the input vector. The approximation capacity of a single neuron is quite limited; to overcome this, neurons are collected in layers sharing the same input. The final structure of a neural network is obtained by connecting several layers, as in the case of the two-layer feedforward neural network in Figure 2. In this case, the output $y(x)$ can be computed as:

$$y(x) = g\left(LW\, f\left(IW\, x + b_1\right) + b_2\right) \qquad (2)$$

where $y(x) \in \mathbb{R}^{m_2}$ is the output of the network, $x \in \mathbb{R}^{d}$ is its input vector, $f$ and $g$ are the activation functions of the hidden and output layers, respectively (so that the hidden layer maps $\mathbb{R}^{d}$ to $\mathbb{R}^{m_1}$ and the output layer maps $\mathbb{R}^{m_1}$ to $\mathbb{R}^{m_2}$), and $m_1$ and $m_2$ are the lengths of the hidden layer output and of the neural network output. The bias terms $b_1 \in \mathbb{R}^{m_1}$ and $b_2 \in \mathbb{R}^{m_2}$ and the weight matrices $IW \in \mathbb{R}^{m_1 \times d}$ and $LW \in \mathbb{R}^{m_2 \times m_1}$ are computed during the training phase. Although an artificial neural network can have more than two layers, two-layer networks are commonly used in real applications, both on the basis of the Cybenko approximation theorem and in order to limit the complexity of the network [25].
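As an illustration of the two-layer feedforward structure described above, the following minimal sketch evaluates a network of the usual form y = g(LW f(IW x + b1) + b2). The choice of tanh for the hidden layer, the identity for the output layer, the layer sizes and the random weights are all assumptions made for the example; they are not taken from this work.

```python
import numpy as np

def two_layer_forward(x, IW, b1, LW, b2, f=np.tanh, g=lambda a: a):
    """Evaluate y(x) = g(LW f(IW x + b1) + b2) for a single input vector x."""
    hidden = f(IW @ x + b1)       # hidden-layer output, shape (m1,)
    return g(LW @ hidden + b2)    # network output, shape (m2,)

# Illustrative sizes and random (untrained) parameters: assumptions only.
rng = np.random.default_rng(0)
d, m1, m2 = 3, 5, 1
IW = rng.normal(size=(m1, d))     # input weights, m1 x d
b1 = rng.normal(size=m1)          # hidden-layer bias
LW = rng.normal(size=(m2, m1))    # layer weights, m2 x m1
b2 = rng.normal(size=m2)          # output-layer bias

x = np.array([0.2, -1.0, 0.5])
y = two_layer_forward(x, IW, b1, LW, b2)
```

In a real application the parameters would be obtained by training rather than drawn at random.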

Wavelets and Wavenet Models
Wavelets are a family of orthonormal basis functions that can be used to perform transformations among spaces. Their use ranges from function approximation to audio compression [26][27][28]. Wavelet approximation theory is closely related to multi-resolution analysis [26]. In this context, a function $h(x)$ can be approximated using the so-called wavelet (mother) and scaling (father) functions as:

$$h(x) = \sum_{k} c_{j_0}(k)\,\varphi_{j_0,k}(x) + \sum_{j \ge j_0} \sum_{k} d_j(k)\,\psi_{j,k}(x) \qquad (3)$$

where:
• $c_{j_0}(k)$ are the scaling coefficients;
• $d_j(k)$ are the detail (wavelet) coefficients;
• $\varphi_{j_0,k}(x)$ is the selected scaling (father) function family;
• $\psi_{j,k}(x)$ is the selected wavelet (mother) function family.
The computation of the scaling and wavelet coefficients is strongly connected to the selected wavelet family (considered as the wavelet/scaling function pair). A number of different families have been proposed and are currently in use. More details about wavelet transformation can be found in [26][27][28].
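To make the roles of scaling and detail coefficients concrete, the sketch below computes them for a short discrete signal using the Haar family. Haar is chosen here only because it is the simplest orthonormal wavelet/scaling pair; it is not stated to be the family used in this work, and the sample signal is arbitrary.

```python
import numpy as np

def haar_decompose(signal, levels):
    """Return (scaling coefficients, [detail coefficients, finest level first])."""
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail (wavelet) coefficients
        approx = (even + odd) / np.sqrt(2.0)         # scaling coefficients
    return approx, details

def haar_reconstruct(approx, details):
    """Invert haar_decompose, going from the coarsest level back to the finest."""
    approx = np.asarray(approx, dtype=float)
    for det in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + det) / np.sqrt(2.0)
        out[1::2] = (approx - det) / np.sqrt(2.0)
        approx = out
    return approx

# Arbitrary example signal; its length must be divisible by 2**levels.
signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
scaling, details = haar_decompose(signal, levels=2)
rebuilt = haar_reconstruct(scaling, details)
```

Dropping the smallest detail coefficients before reconstruction gives a coarser approximation of the signal, which is the idea behind wavelet-based function approximation.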
Wavenets (wavelet networks) [24] can be considered as a one hidden layer network with wavelets as activation functions. In particular, the wavenet output Y(x) for an input x ∈ R d can be computed as: and Q ∈ R d×d are the parameters to be computed during the training.
The comparison between Equations (2) and (4) shows that the wavenet can be considered as a neural network with the function: as the activation function of the hidden layer.
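Since the display form of the wavenet output is not reproduced in the text above, the following sketch only illustrates the general idea of a one-hidden-layer network with wavelet activations. Everything in it is an assumption made for illustration: the Mexican-hat-type radial wavelet, the use of Q as a linear projection of the input, the added linear term and offset, and all parameter names and values. It should not be read as the exact model structure used in this work.

```python
import numpy as np

def radial_wavelet(z):
    """Mexican-hat-type radial wavelet, one value per row of z (assumed choice)."""
    r2 = np.sum(z * z, axis=-1)
    return (z.shape[-1] - r2) * np.exp(-0.5 * r2)

def wavenet_forward(x, Q, weights, dilations, translations, linear, offset):
    """Weighted sum of dilated/translated wavelet units plus a linear term (assumed form)."""
    xq = x @ Q                                               # projected input, shape (d,)
    z = dilations[:, None] * (xq[None, :] - translations)    # shape (units, d)
    return float(weights @ radial_wavelet(z) + linear @ x + offset)

# Hypothetical parameters for a 2-input network with 3 wavelet units.
d, units = 2, 3
x = np.array([0.5, -0.2])
Y = wavenet_forward(
    x,
    Q=np.eye(d),
    weights=np.array([1.0, -0.5, 0.3]),
    dilations=np.array([1.0, 2.0, 0.5]),
    translations=np.zeros((units, d)),
    linear=np.array([0.1, 0.2]),
    offset=0.0,
)
```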
When the phenomenon to be modeled with the wavenet is dynamical, the wavenet is fed by an input vector x(t) that is the output of a time delay phase: where u 1 ...u m are the variables selected to compute the output y(t) of the overall system. In this work, since PM 10 formation, accumulation and removal are clearly dynamical processes, the system structure presented in Figure 3 is used.
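A time delay phase of this kind can be sketched as follows. The function name, the dictionary-based interface and the lag convention (a lag of n means the current value plus the n previous ones) are assumptions made for illustration; the exact indexing used in this work is not recoverable from the text.

```python
import numpy as np

def delay_embed(series, lags):
    """Stack current and lagged values of each input into one row per time step.

    series: dict name -> 1-D array (all of the same length)
    lags:   dict name -> number of past steps to include (0 = current step only)
    Returns (X, t_index), where X[i] is the input vector x(t_index[i]).
    """
    max_lag = max(lags.values())
    n = len(next(iter(series.values())))
    rows = []
    for t in range(max_lag, n):          # skip steps without a full history
        row = []
        for name, u in series.items():
            row.extend(u[t - k] for k in range(lags[name] + 1))
        rows.append(row)
    return np.array(rows, dtype=float), np.arange(max_lag, n)

# Synthetic series, for illustration only.
no2 = np.arange(10.0)
ws = 100.0 + np.arange(10.0)
X, t_index = delay_embed({"no2": no2, "ws": ws}, {"no2": 2, "ws": 0})
```

Each row of X then contains, in order, the NO 2 values at t, t-1 and t-2 followed by the wind speed at t.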

Case Study and Dataset Definition
The aim of this work is the definition of a virtual sensor to compute PM 10 daily average concentrations starting from the measured daily average NO 2 concentration and the measured values of four meteorological variables: average daily wind speed WS, total daily rainfall RF, average daily relative humidity RH and average daily temperature T. NO 2 is selected as an input variable because its levels are strongly related to those of PM 10 , as the two pollutants share some emission drivers (e.g., road traffic, domestic heating) and chemical pathways (e.g., the formation of secondary inorganic aerosol such as ammonium nitrate). On the other hand, the selected meteorological variables can be related to general deposition or dispersion conditions (mainly rainfall and wind speed) or to the formation of secondary aerosol by condensation. Thus, Y(x) in Equation (4) is the daily PM 10 concentration computed by the model, which is referred to as nPM 10 (x) from now on. Moreover, the input x of the wavenet function is time dependent, so x = x(t), and it includes both NO 2 concentrations and meteorological variables for day t and the previous days, as in:
In order to test the presented methodology, a series of models has been trained and validated to reproduce the PM 10 daily mean concentrations starting from different inputs measured by the Lombardy region monitoring network. The methodology has been tested using data measured by 14 monitoring stations belonging to the Lombardy region (Italy) monitoring network (Figure 4).
More in detail, data from the year 2019 have been used (365 × 14 = 5110 available raw data tuples). The performance of the different models has been evaluated using a leave-p-out approach with p = 4. Following this approach, 100 tests have been performed for each model configuration: in each test, 10 stations are used for the identification, while the data of the remaining p = 4 randomly selected stations are concatenated to define the metastation used for the validation.
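The station-level leave-p-out procedure can be sketched as below. The function name, the use of integer station identifiers and the fixed random seed are assumptions for illustration; only the numbers (14 stations, p = 4, 100 tests) come from the text.

```python
import numpy as np

def leave_p_out_splits(station_ids, p=4, n_tests=100, seed=0):
    """Yield (identification_ids, validation_ids) pairs of station identifiers."""
    rng = np.random.default_rng(seed)
    station_ids = np.asarray(station_ids)
    for _ in range(n_tests):
        perm = rng.permutation(station_ids)
        # The first p shuffled stations are held out; the rest are used for identification.
        yield perm[p:], perm[:p]

stations = np.arange(14)                      # hypothetical station identifiers
splits = list(leave_p_out_splits(stations))
identification, validation = splits[0]
```

For each split, the records of the held-out stations would be concatenated into a single validation series (the metastation), while the model is identified on the records of the other 10 stations.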

Configuration Tests
In order to evaluate the capability of the methodology presented in Section 2 to compute PM 10 concentrations, all the possible combinations of the input variables have been considered, and the corresponding models PM 10 = WN(x) have been trained.
In principle, the different configurations can be grouped into three categories:
• Configurations including only NO 2 concentration as input;
• Configurations including only meteorological variables as input;
• Configurations including both NO 2 concentrations and meteorological variables as input.
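The grouping above can be made explicit by enumerating every non-empty subset of the five candidate inputs. The sketch below is illustrative only: the variable labels and dictionary keys are hypothetical, and it enumerates variable combinations without the additional choice of the memory length for each input.

```python
from itertools import combinations

POLLUTANT = "NO2"
METEO = ("WS", "RF", "RH", "T")

def input_configurations():
    """Group every non-empty subset of the inputs into the three categories."""
    variables = (POLLUTANT,) + METEO
    groups = {"no2_only": [], "meteo_only": [], "no2_and_meteo": []}
    for size in range(1, len(variables) + 1):
        for subset in combinations(variables, size):
            if subset == (POLLUTANT,):
                groups["no2_only"].append(subset)
            elif POLLUTANT not in subset:
                groups["meteo_only"].append(subset)
            else:
                groups["no2_and_meteo"].append(subset)
    return groups

groups = input_configurations()
```

With five candidate inputs there are 31 non-empty subsets: one with NO 2 alone, 15 with meteorological variables only and 15 combining both.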
For each test, an analysis of the memory of the system, i.e., an evaluation of the performance for varying n NO 2 , n WS , n RF , n RH and n T , has been performed. On the basis of the knowledge of the phenomena related to the formation of PM 10 in the atmosphere, a maximum value of 5 days can be considered for these parameters. Each model has been evaluated on the basis of the following three different statistical indexes: • Normalized Root Mean Squared Deviation: where $\widehat{PM}_{10}(t)$ and $PM_{10}(t)$ are, respectively, the t-th values of the model output and of the validation dataset, and $\mu_{\widehat{PM}_{10}}$ and $\mu_{PM_{10}}$ are their mean values.
From the large set of performed tests, only the best-performing ones are presented here, in particular for the combinations of multiple inputs.
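Indexes of this kind can be computed as in the sketch below. The exact definitions used in this work are not recoverable from the text, so the following choices are assumptions: the deviation is normalized by the range of the observed values, the correlation is the Pearson coefficient, and the third index is taken to be a normalized mean bias.

```python
import numpy as np

def nrmsd(model, observed):
    """Root mean squared deviation divided by the observed range (assumed normalization)."""
    model = np.asarray(model, dtype=float)
    observed = np.asarray(observed, dtype=float)
    rmsd = np.sqrt(np.mean((model - observed) ** 2))
    return float(rmsd / (observed.max() - observed.min()))

def correlation(model, observed):
    """Pearson correlation coefficient between model output and observations."""
    return float(np.corrcoef(model, observed)[0, 1])

def normalized_mean_bias(model, observed):
    """Difference of the means divided by the observed mean (assumed third index)."""
    model = np.asarray(model, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float((model.mean() - observed.mean()) / observed.mean())

# Synthetic values, for illustration only.
observed = np.array([10.0, 20.0, 30.0, 40.0])
model = observed + 2.0
scores = (nrmsd(model, observed), correlation(model, observed),
          normalized_mean_bias(model, observed))
```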

Models with NO 2 as Input
This first class of models includes only NO 2 daily mean concentrations as input. This choice is due to the fact that PM 10 and NO 2 are generated by several common emitting activities (e.g., road transport) and that the secondary inorganic fraction of PM 10 is composed, in part, of nitrates, in particular ammonium nitrate, whose formation depends on the NO 2 concentration in the atmosphere. Table 1 highlights that the performance is quite good in terms of correlation, with values around 0.74, and acceptable in terms of root mean square error, with a normalized root mean squared deviation (which allows the root mean square error to be compared with the overall variability of the output time series) around 0.1.
From these results, it is clear that increasing the memory of the system does not significantly affect the performance or the behavior of the model. The negligible increase in performance for the test with n NO 2 = 4 does not justify the larger number of parameters. Table 2 shows the performance of the same configurations for the part of the time series where PM 10 concentrations higher than 30 µg/m 3 have been measured. The table shows that the model has considerable difficulty in reproducing high concentrations, as highlighted by the marked decrease in the statistical indexes.

Models with Meteorological Variables as Input
The second class of models considers only the meteorological variables as input. These tests allow an assessment of the relative "importance" of meteorology and NO 2 concentration for the computation of PM 10 levels. Tables 3 and 4 show poor performance, with the limited exception of the cases with temperature T as input. Thus, the results suggest that the meteorological conditions alone are not enough to estimate PM 10 concentrations, so they may at best be used in addition to the NO 2 concentrations to improve the performance.
The last class of models considers both the meteorological variables and the NO 2 daily mean concentration as input, in order to evaluate whether the joint use of these information sources leads to an increase in performance. Table 5 presents the results with NO 2 concentrations coupled to one meteorological variable at a time. The performance is in line with that of the models with only NO 2 as input. Moreover, the combined use of more than one meteorological variable did not lead to a substantial increase in performance (Tables 6-8). The only slight improvement can be seen for high concentrations when the temperature is used as input (Tables 9-12), but, even in this case, the performance in the reproduction of the peaks is not good enough (correlation coefficient close to 0.52). These results suggest that, to reproduce mean PM 10 levels in this domain, only the NO 2 concentrations should be used, thus relying on cheaper sensors. Nevertheless, an upper bound on the performance exists, which did not allow the reconstruction of peak concentrations.

Comparison to State-of-the-Art Models
In this section, the wavenet approach used in this work is compared with two state-of-the-art models: (1) a K-nearest neighbors (KNN) model and (2) an artificial neural network-based model, both of which are often used in this context to capture the dynamics of PM 10 [29]. The comparison (Table 13) shows that the performance of the best-identified wavenet is markedly better than that of the KNN model and very similar (slightly better for high orders) to that of the ANN models. Moreover, it has to be stressed that the best wavenet model ensures this performance with limited complexity and with a limited number of variables (only NO 2 concentration) with respect to the other approaches. Figures 5-7 present the time series plots for the best configuration of the wavenet, artificial neural network and KNN models, respectively. As expected, the behaviour of the wavenet and ANN models is very similar, with the former showing slightly better performance for the low values close to sample no. 800. In general, the KNN model reproduces higher values but, as also indicated by its lower correlation coefficient, its time series rarely follows the values and the gradient of the measured data.
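For readers unfamiliar with the KNN baseline, a minimal regression variant can be sketched as follows. The Euclidean distance, the unweighted mean of the neighbors, the value of k and the toy data are assumptions for illustration; the configuration of the KNN model actually used in the comparison is not described in the text.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=5):
    """Predict each query as the mean target of its k nearest training rows (Euclidean)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # Pairwise squared distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]      # indices of the k closest rows
    return y_train[nearest].mean(axis=1)

# Toy data: one input feature, targets equal to the input.
X_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 10.0])
prediction = knn_predict(X_train, y_train, np.array([[1.0]]), k=3)
```

Because each prediction is an average of stored training targets, a model of this kind has no explicit functional form, which may be one reason its output follows the gradient of a time series less closely than a fitted network.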

Conclusions
In this work, a data-driven, wavenet-based virtual sensor for PM 10 daily mean concentration is presented and evaluated. Different model configurations have been tested and evaluated. The methodology has been applied to data measured by the Lombardy regional monitoring network. The results show good agreement between the output of the virtual sensor and the measured data used for validation when the daily mean NO 2 concentration is used as input, in particular around the mean concentration values. However, the models fail to reproduce the peak concentrations, and this behaviour does not change even when other inputs, such as meteorological data, are used. Nevertheless, the results show that this approach can be used to produce supporting information that complements the regional monitoring network and that, thanks to the relatively fast computation, can be made available through app/web services.