IoT and Satellite Sensor Data Integration for Assessment of Environmental Variables: A Case Study on NO2

This paper introduces a novel approach to increase the spatiotemporal resolution of an arbitrary environmental variable. This is achieved by utilizing machine learning algorithms to construct a satellite-like image at any given time moment, based on the measurements from IoT sensors. The target variables are calculated by an ensemble of regression models. The observed area is gridded, and partitioned into Voronoi cells based on the IoT sensors, whose measurements are available at the considered time. The pixels in each cell have a separate regression model, and take into account the measurements of the central and neighboring IoT sensors. The proposed approach was used to assess NO2 data, which were obtained from the Sentinel-5 Precursor satellite and IoT ground sensors. The approach was tested with three different machine learning algorithms: 1-nearest neighbor, linear regression and a feed-forward neural network. The highest accuracy yield was from the prediction models built with the feed-forward neural network, with an RMSE of 15.49 ×10−6 mol/m2.


Introduction
Air pollution, global warming and other pollutants have a great impact on the environment, and have become a major global concern [1]. NO 2 , which is used as the case study in this paper, is one of the greenhouse gases, and an important indicator of air pollution. It is also a precursor for several harmful secondary air pollutants, such as ozone and particulate matter (PM 2.5 , and PM 10 ). This was the reason why networks of in situ Internet of Things (IoT) sensors were established for monitoring environmental variables [2,3]. IoT sensors are low cost, easy to install and can perform measurements with high temporal resolution [4][5][6][7]. However, today, the networks of IoT sensors do not cover larger areas. On the other hand, space agencies have also addressed this issue by launching satellites equipped with instruments to observe air pollutants in the Earth's atmosphere [8]. Satellite measurements assure large coverage, but their temporal resolution is low. Satellites provide measurements in the form of raster images, associated with various environmental attributes [9]. In this context, the spatiotemporal alignment of IoT and satellite data sources represents the main challenge of low-level data fusion approaches that can limit the efficiency of higher levels significantly. Past studies addressed the issue of spatiotemporal alignment, either by interpolation or simulation of monitored sensor values, to match the spatial resolution of satellite images, while feeding the aligned features into the higher level analytics tools [10].
In comparison to simulations, interpolation approaches are usually less computationally demanding. The most commonly used approaches include simple aggregation of the closest sensor values (i.e., Voronoi Natural Neighbors' Interpolation), linear and bilinear interpolation, Inverse Distance Weighting (IDW) and kriging [11,12]. The aggregations of the closest sensor values are the most straightforward, as they do not require ables, traffic, and wind data. Other types of models and their combinations were studied (e.g., Community Multiscale Air Quality and GEOS-Chem) with a variety of regressions (neural network, random forest, gradient boosting algorithms and a generalized additive geographically weighted model) [24]. However, CTM-based simulation approaches are computationally demanding and difficult to implement, as they require precise definition of the behavior of chemical species within the given grid-cell. Consequently, the results are of low spatial and temporal resolutions [25].
The methods mentioned above perform interpolation in the pre-processing step to fill the missing spatial gaps. The proposed method omits interpolation during the initialization. Instead, the assessment of an environmental variable is performed by an ensemble of regression models, where each regression model performs the interpolation by different parameters.
The whole process constructs a satellite-like raster image based on the in situ IoT measurements. The resulting image can, therefore, be constructed in times when IoT measurements are given in comparison with other methods, which expose lower temporal resolutions (typically on the dally, or even monthly, scale). For this, the observed area is partitioned into Voronoi cells, based on the locations of the IoT sensors active at the desired time. Each set of pixels located in a Voronoi cell has its own regression model, by which better adaptation can be achieved to the local characteristics. The parameters for the regression models are selected by using the measurements of the neighboring IoT sensors. The nearest neighbor, linear regression, and forward-feeding neural network are used in this paper. Accordingly, the proposed approach brings the following novelties: • a strong theoretical foundation for modeling the relationship between the IoT and satellite data, • the integration of interpolation directly into regression models, yielding a more compact and consistent algorithm, • an ensemble of base regression models constructed by using measurements from the surrounding IoT sensors, and • an increased temporal resolution, dependent only on the sampling rate of the IoT sensors.
The rest of the paper is organized as follows: The details of the proposed method are explained in Section 2. Section 3 describes the observed area and data preparation. Section 4 provides the results of the approach and their evaluation. Section 5 discusses the obtained results and concludes the article.

Methods
This Section introduces a new ensemble method for assessing the environmental variables by constructing a satellite-like image using the data fusion paradigm. The method's input consists of the gridded observed area, in situ IoT sensor data, and the two archives. The first archive contains satellite images and the second archive stores the past IoT measurements. To construct the satellite-like image, the following steps are implemented (see Figure 1): Data selection, and • Machine learning and Prediction.
These steps are highlighted in the continuation.

Feature Selection
Let S = {s n }, 0 ≤ n < N, be a set of N IoT sensors. Each sensor s n operates at the fixed location Position(s n ) = ( x n , y n ) , and can either be available or turned off. S t = {s t n }, S t ⊆ S denotes a set of these sensors, which are returning the valid measurements of an environmental variable E in the given time t ∈ [ t s , t e ], where t s indicates the starting and t e the ending time of the measurements. As seen in Figure 2, the observed area is covered by a set of M pixels P = {p m }, where p m has the center at the location Position(p m ) = ( x m , y m ), 0 ≤ m < M. and Γ(s t n ) are then contained in a vector of the selected sensors F t at the time moment t, F t = s t n ∪ Γ( s t n ), which is then used in the data selection process.

Data Selection
By utilizing knowledge about the past measurements of the observed environmental variable E by sensors and by satellite images, the regression models are built for each vector of the target pixels TP(v t n ), which are located inside the Voronoi cell v t n , TP(v t n ) = Inside(v t n ), TP(v t n ) ⊆ P (see Figure 2). The measurements from both sources are stored as time series in two archives: Archive A IoT containing times and measurements of E from N IoT sensors A IoT = {Time, E_s 0 , E_s 1 , ..., E_s N−1 }, while A Sat = {Time, E_p 0 , E_p 1 , ..., E_p M−1 } stores the measurements of E obtained from M pixels of the satellite images. Possible invalid values in A IoT are identified and marked. The obtained satellite images are re-gridded, due to inconsistencies in the pixel position in each revisited time [16] before being stored in A Sat . In addition, for each m-th pixel p m , its validity value is also stored beside the value of E.
The selection process (see Figure 3) consists of querying both archives using F t . All time moments, where values from IoT sensors are valid, are used and stored in vector T. To construct the training, validation and testing datasets, the vector T is used to obtain Tables IoT E and Sat E , which are aligned temporally. Figure 3. Data selection process.

Data Selection in Details
In the continuation, the whole selection process is described in detail, with the relation algebra operators' projection (π) and selection (σ).
Archive A IoT is queried firstly by a vector of the selected sensors from vector F t obtained during the feature selection process. The query's result is a vector of times T, T ⊆ [ t s , t e ], (Equation (1)), where denotes valid measurements.
A matrix IoT E , of available measurements of E obtained by IoT sensors in times T, is then acquired by querying the archive A IoT (Equation (2)).
Subsequently, the archive A Sat (Equation (3)) is queried for measurements of E from satellite pixels (Sat E ), for time moments T obtained from (Equation (1)), and target pixels TP(v t n ), where v t n is the Voronoi cell of sensor s t n .
A concrete example is demonstrated in Tables 1 and 2, where the valid measurements are represented schematically by tick marks, while the invalid ones are crossed. The feature selected sensors F t = {s 3 , s 4 , s 6 , s 8 } for sensor s 8 and Γ(s 8 ) = {s 3 , s 4 , s 6 } are seen in Figure 4.  Tables 1 and 2. The center of the Voronoi cell is marked by red dot, while its natural neighbors are marked by black dots. Pixels whose centers are located within the Voronoi cell are colored in yellow. The extraction from the whole observed area is shown, where the pixels from p 0 to p 11 are present.
Similarly, let us assume that sensor s 8 (located in Voronoi cell v 8 in Figure 4) has the following 3 target pixels: TP(v 8 ) = {p 4 , p 7 , p 10 }. The A Sat is queried and Sat E is obtained (Equation (6)).
The results of queries from both archives are the yellow-colored cells in Tables 1 and 2. ..

Machine Learning and Prediction
The base regression models for satellite-like image construction were obtained by three different approaches: • Nearest Neighbor, • Linear Regression, and • Neural Networks.
The same data from IoT E , Sat E and the input feature vector FV t =< F t , E_F t > are used by all three approaches. Let us remember that F t , F t = s t n ∪ Γ( s t n ), while E_F t represents their measurements of the observed environmental variable E, E_F t = E_s t n ∪ E_Γ( s t n ) .

Nearest Neighbor
Nearest Neighbor (1-NN) is an instance-based machine learning algorithm [26]. It queries the training dataset of instances to find the nearest object. In our case, 1-NN is used to find the nearest instance from IoT E according to input feature vector FV t =< F t , E_F t > by which the corresponding values are obtained from Sat E . The distance D between FV t and instance in the row from IoT E , IoT E,row , 0 ≤ row < |IoT E |, is calculated by Equation (7). The distance D is the sum of differences between the measurements of E_F t from vector FV t , multiplied with the Inverse Distance Weighting (IDW) between the central sensor s t n and each of its neighbors in Γ( s t n ). Due to being the result of IDW( s t n , s t n ) = 1, this is omitted, and only the difference between E_s t n and IoT E,row _s n is made at the start of calculation.
D( FV t , IoT E,row ) = |E_s t n − IoT E,row _s n | The E_s t n in Equation (7) denotes the measurements of E in time moment t, E_s t n ∈ E_F t , while IoT E,row _s n is sensor s n 's measurements from IoT E,row (Equation (8)).
Furthermore, the measurements of E from IoT E _Γ( s t n ) row are obtained by Equation (10).
The values of E of the target pixels TP(v t n ) , which are obtained from the satellite, are denoted as E_TP(v t n ). They are obtained from a row in Table Sat E . The index row is acquired by applying the Argmin function, which returns the index in which D is minimal (Equation (11)).
The pixels from TP( v t n ) may contain some invalid values. For such pixels the valid values are used from the next nearest instance row = Argmin( D( FV t , IoT E,row ) ). The process is repeated until all E_TP( v t n ) are valid.

Linear Regression
Linear Regression (LR) also takes advantage of supervised learning to model the relationships between variables [27]. LR is used to model the relationship between the measurements of E from the IoT sensors and a single value of each target pixel from the satellite, E_F t → E_p i , 0 ≤ i < |E_TP(v t n ) |. Consequently, the algorithm uses only the values from IoT E to calculate the value of the targeted pixel E_p i . However, the samples with invalid pixel values are excluded. The result is a regression equation, which is represented by a vector of coefficients b (Equation (12)).

Neural Networks
A Neural Network (NN) utilizes supervised learning [28]. The NN uses samples in the training dataset to generate a regression function that maps the inputs into their corresponding outputs. Our method uses the feed-forward neural network [29] to map the measurements of any environmental variable E from the IoT sensors to the values of the targeted pixels. The network consists of an input layer, a hidden layer, and an output layer. These layers are connected densely, meaning that each neuron of the current layer is connected to each one in the preceding layer. The values in each j-th layer are denoted as x j,k , where j, 0 ≤ j < 3 is the number of layers and k, 0 ≤ k < LayerSize(j) is the number of elements in each layer. The number of neurons in the hidden layer is the mean of the neurons in the input and output layers (Equation (13)).
The regression function E_F t → E_TP( v t n ) is generated in the process of machine learning. This is achieved by feeding the input samples from IoT E to the NN and comparing the obtained output with samples from the Sat E . The initial values of E_F t (also x 0 ) are then used as input to the neurons in the hidden layer. The input to neuron ne j,k is a sum of the input values x j−1 multiplied by the vector of weights w j,k (Equation (14)).
The activation function Φ is applied on the vector of the obtained values, where function Softmax [30] is applied in the hidden layer (Equation (15)), and, a linear activation function is used at the output layer.
The weights of the regression function are then updated according to the calculated error between the actual value E_p i , E_p i ∈ E_TP(v t n ) and the obtained value, which is denoted as E_p i . In the Sat E some training samples can contain invalid measurements. For this purpose, the error is calculated according to Equation (16).

Study Area and Data Preparation
In order to account for various testing conditions, the area of the Republic of Slovenia was used as the observed area. The country covers 20,271 square kilometers, and is known for its geomorphological diversity. This is due mostly to various natural regions gathered in one place: Alps, Dinaric Alps, the Pannonian Basin and the Mediterranean Basin. Although the area is mostly covered by forests (up to almost 60%), it bears a variety of pollution sources [31,32]. As seen in Figure 5, the area is covered sparsely by IoT sensors to monitor NO 2 emissions. One of the main pollution sources is traffic, as roads which are part of Trans-European Transport Network, connecting major European and Slovenian cities, cross the country. Namely, Ljubljana and Maribor are places in Slovenia where the population is the most dense [33]. Furthermore, another polluted area connected to the road traffic is the port of Koper, which is an international connection from Continental Europe to the Mediterranean Sea [34,35]. Moreover, Slovenia owns the Šoštanj Thermal Power Plant located in the Šalek Valey, which is a large source of air pollution, as it produces approximately 35% of the electricity [36]. To obtain different testing scenarios, eight different areas across the whole country were selected as test cases.
Sentinel-5P is dedicated to the monitoring of the Earth's atmosphere, and has a revisit rate of less than a day [37]. The measurements are given to users in the form of Network Common Data Form (format NetCDF) [38]. Each pixel is equipped with a timestamp, the value of the tropospheric NO 2 column, and a quality assurance value (qa_value ∈ [ 0, 1]), which indicates the validity of the measurement [37]. qa_value = 0 corresponds to an invalid measurement, while qa_value = 1 represents a value with no errors. Pixels with qa_value > 0.75 were used as recommended in [39]. Due to the inconsistencies in the satellite pixel position at each revisit time, the observed area was gridded, as seen in Figure 5. Based on the typical satellite pixel sizes, the size 5.5 × 5.5 km was selected for the pixels in the grid. The satellite image was cropped and re-gridded to match the observed gridded area. Additionally, the data were used from the IoT ground sensors measuring NO 2 seen in Figure 5. The unified interface providing various geo-biophysical parameters from sensor networks across Slovenia is available in [40]. The national data provider is the Slovenian Environment Agency [3]. However, other local providers contribute their data to the mentioned platform, such as, for example, the Šoštanj Thermal Power Plant.

Results
The proposed approach was implemented on a personal computer with an Intel Core TM i7 − 9750H CPU, 6 cores, 12 MB of cache and 32 GB of memory. The approach was tested with all three introduced machine learning algorithms.
The implementation of LR was taken from the programming library MLPACK [41]. Furthermore, the feed-forward neural network was implemented using TensorFlow [42]. The regression model was built by setting the hyperparameters: epochs to 250, batch_size to 16, and optimizer to Adam. These hyperparameters, along with the selection of the activation function and the number of neurons in the hidden layer, were obtained by using cross-validation with the Grid Search algorithm [43]. The parameter cv, which determines the number of folds was set to three. Additionally, the input data were scaled to a range between 0 and 1. On the other hand, the mean value was subtracted from the target variables in the training dataset. Additionally, they were divided by their Standard Deviation. The calculated outputs were then mapped back to their original range.
The Tables IoT E and Sat E were aligned temporally, and aggregated by applying the natural join conditions (Equation (17)). The aggregated dataset was split in order to obtain the training, validation and testing datasets (see Figure 6).
The training and validation datasets contained the samples between 1 January 2020 and 28 February 2021. As seen in Figure 7, the training dataset was used to build the prediction model [44]. The validation dataset was applied only during the tuning of the hyperparameters for the regression model built by the NN.
The testing dataset included the measurements from 1 March 2021, and up to and including 1 June 2021. This dataset was withheld from the machine learning, and was used to evaluate the regression model's performance, as performed in [28,45].
The evaluation of the machine learning model was performed by comparing the calculated values and the actual measurements of E (i.e., NO 2 ) from the testing dataset using the Root Mean Square Error (RMSE) metric, defined in Equation (18). The testing dataset included a total of 72,813 valid E values from the satellite pixels in P. Let us remember that P is a set of M pixels in the observed area,  Figure 7. Machine learning and model evaluation workflow.
The accuracy of the base regression models for each machine learning algorithm was evaluated on the test cases (seen in Figure 5) and the whole observed area. Additionally, the execution times were measured for calculating the pixel values within the single image (849 pixels). It should be noted that the measured times include only the prediction phase, whereas the time used for data preparation and the machine learning was excluded. The obtained results are given in Table 3.
As seen in Table 3, the RMSEs varied between test cases. However, the RMSE ratios between them were similar for each base regression model. Likewise, the execution times to calculate the values of pixels in each observed area were also dependent on the regression model. The 1-NN took the longest, while the fastest was LR. The results prove that the method processing time is under 1 min, and that the proposed method is suitable for practical use. However, the time measurement excluded the data preparation and machine learning, which is largely dependent on the size of the training dataset and machine learning algorithm used. Some of the image samples, which were calculated in the absence of the satellite data, are shown in Figure 8.  The additional analyses were conducted on the results obtained by the neural network, as it outperformed the other two algorithms. As seen in Table 4, the RMSE was compared for each test case, based on the number of considered IoT sensors. Some test cases never considered a specific number of IoT sensors. This occurred either due to invalid measurements or positions of the Voronoi cells. These cases are marked with "/" in Tables 4 and 5. Assessment of the NO 2 was conducted using additional meteorological variables. They were considered by generating a vector of random meteorological parameters 50 times. In each new iteration the proposed method calculated pixel values for the whole test dataset P. The averages of the results are seen in Table 6, while the best results were obtained when temperature, humidity, wind speed and NO 2 were considered together (RMSE = 14.79 × 10 −6 mol/m 2 ).

Discussion
A new ensemble approach for the assessment of the arbitrary environmental variable is proposed in this article. It utilizes the data fusion paradigm to construct the satellite-like image, based on measurements from the IoT sensors. Unique to other methods, it omits the interpolation of input variables in the initialization, and performs it during the prediction.
The results showed that, when comparing the method's performance using three different machine learning algorithms, the feed-forward neural network achieved the best performance. The main factor contributing to the performance of the neural network was its architecture, which determines the number of coefficients used to calculate the target variable. This could have caused us to obtain more accurate results than linear regression. There are also some other differences that contribute to the construction of a regression equation. This includes the number of iterations to update weights, and the algorithm used to determine how the weights are updated.
The RMSEs varied between the selected test cases. The best results on average were obtained in the test cases 5, 7 and 8 (Table 3), which are positioned on the plain terrain (see Figure 5). These cases also had better results when less and nearer sensors were considered (Tables 4 and 5). This was due to the lack of major geographical barriers between the considered IoT sensors that contributed to the calculated values [15]. Similarly, good results were also obtained in test case 6. This test case is positioned near the Šoštanj Thermal Power Plant and is surrounded by three or more IoT sensors. The test cases with the worst results had geographical barriers present between them and the considered sensors. However, the obtained results could be improved if more IoT sensors were considered.
The 1-NN took the longest time to perform the calculations. This was due to being unsupervised learning, and most processing was performed in the process of searching for the nearest instance. Furthermore, the approach performed slower in cases where the set of pixels with the smallest distance contained many invalid values. In this case, the algorithm had to search for the next set of pixels. The 1-NN also carries out most of the processing in the prediction phase. On the other hand, linear regression and neural network being the supervised approaches, perform most of the calculations in the phase of machine learning.
Nevertheless, it was shown that the proposed approach performs in a relatively short amount of time, and provides good results. The main advantage of this approach is its assessment of the environmental variable at an arbitrary location within the observed area in times when the IoT measurements are available. On the other hand, other similar approaches assess the variables at lower temporal resolutions. The proposed approach increases the spatial resolution by defining the ensemble of base regression models which are dependent on the neighboring IoT sensors. The proposed approach allows the integration of any machine learning algorithm. Furthermore, it can be applied to an arbitrary variable in any observed area, as long as sufficient and appropriate correlated training data are provided. It may also use other auxiliary meteorological variables in addition to the main predictor (NO 2 in our case).
The main disadvantage of the proposed approach is its inability to include new sensors. This will be improved by incremental machine learning algorithms [46]. Furthermore, the method performance can also be increased by utilizing different machine learning algorithms to model the relationship between the IoT and the satellite sensor data. We will try to improve efficiency by utilizing recurrent neural networks to make time series predictions [47].