Nitrogen Dioxide Monitoring by Means of a Low-Cost Autonomous Platform and Sensor Calibration via Machine Learning with Global Data Correlation Enhancement

Koziel, Slawomir; Pietrenko-Dabrowska, Anna; Wójcikowski, Marek; Pankiewicz, Bogdan

doi:10.3390/s25082352


¹ Engineering Optimization & Modeling Center, Reykjavik University, 102 Reykjavik, Iceland

² Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, 80-233 Gdansk, Poland

* Author to whom correspondence should be addressed.

Sensors 2025, 25(8), 2352; https://doi.org/10.3390/s25082352

Submission received: 19 February 2025 / Revised: 25 March 2025 / Accepted: 4 April 2025 / Published: 8 April 2025

(This article belongs to the Section Environmental Sensing)


Abstract:

Air quality significantly impacts the environment and human living conditions, with direct and indirect effects on the economy. Precise and prompt detection of air pollutants is crucial for mitigating risks and implementing strategies to control pollution within acceptable thresholds. One of the common pollutants is nitrogen dioxide (NO₂), high concentrations of which are detrimental to the human respiratory system and may lead to serious lung diseases. Unfortunately, reliable NO₂ detection requires sophisticated and expensive apparatus. Although cheap sensors are now widespread, they lack accuracy and stability and are highly sensitive to environmental conditions. The purpose of this study is to propose a novel approach to precise calibration of the low-cost NO₂ sensors. It is illustrated using a custom-developed autonomous platform for cost-efficient NO₂ monitoring. The platform utilizes various sensors alongside electronic circuitry, control and communication units, and drivers. The calibration strategy leverages comprehensive data from multiple reference stations, employing neural network (NN) and kriging interpolation metamodels. These models are built using diverse environmental parameters (temperature, pressure, humidity) and cross-referenced data gathered by surplus NO₂ sensors. Instead of providing direct outputs of the calibrated sensor, our approach relies on predicting affine correction coefficients, which increase the flexibility of the correction process. Additionally, a calibration stage incorporating global correlation enhancement is developed and applied. Demonstrative experiments extensively validate this approach, affirming the platform and calibration methodology’s practicality for reliable and cost-effective NO₂ monitoring, especially keeping in mind that the predictive power of the enhanced sensor (correlation coefficient nearing 0.9 against reference data, RMSE < 3.5 µg/m³) is close to that of expensive reference equipment.

Keywords:

air pollution assessment; nitrogen dioxide detection; low-cost sensing systems; correction techniques; artificial intelligence; neural networks

1. Introduction

High concentrations of nitrogen dioxide (NO₂) pollution have long been recognized as harmful to human health. Potential health issues linked to elevated NO₂ levels encompass skin infections, respiratory system problems, nasal and ocular irritation, bronchitis, lung cancer, and worsening of underlying health issues [1,2,3]. Two standards for NO₂ are in place, proposed in the CAFE Directive [4], concerning average yearly and hourly concentrations (40 µg/m³ and 200 µg/m³, respectively; the latter should not be exceeded for more than 18 h per year). The WHO provides similar guidelines [5]. However, concentrations surpassing these limits are being recorded by over a dozen percent of monitoring stations in Europe. This issue is predominantly linked to urban areas, especially in proximity to vehicular transportation systems. NO₂ pollution has substantial economic consequences (e.g., the estimated costs in 2016 in China were close to USD 30 billion [3]). Another aspect is the environmental impact of NO_x pollution, which includes eutrophication of water systems, acid rain occurrence, and photochemical smog [6]. Eutrophication is responsible for intense algal blooms, leading to ecological degradation of water reservoirs. Furthermore, the increase in O₃ concentration due to NO_x negatively impacts agriculture.

Standard NO₂ monitoring requires large and stationary equipment, as well as controlled installation conditions and periodical service. The most reliable methods are (i) photofragment chemiluminescence (featuring high sensitivity yet needing repetitive calibration) [7], (ii) long-distance differential optical absorption spectroscopy (good sensitivity but poor spatial resolution) [8], (iii) laser-induced fluorescence (of excellent sensitivity yet exploiting a pulsed laser and a vacuum system) [9], and (iv) cavity ring-down spectroscopy (portable system not requiring calibration) [10]. Nevertheless, all of the above techniques involve expensive equipment with high maintenance costs.

The aforementioned disadvantages of stationary monitoring stations fostered the development of compact and low-cost sensors, which enable portability and exhibit low deployment expenses. Another important advantage is an increased spatial resolution of air pollution detection, which is crucial due to spatial and temporal heterogeneity of air pollutants, particularly in densely populated urban areas [11,12,13]. Unfortunately, low-priced sensors are grossly inaccurate when compared to the reference stations [14,15,16], which is due to their inherent instability [17], limited fabrication repeatability [18,19], cross-sensitivity to other gases [20,21,22], as well as susceptibility to environmental variables (temperature, humidity, etc.) [23,24,25]. Although achieving sufficient accuracy in pollutant detection is crucial, a substantial volume of less precise data from cost-effective sensors can supplement the output of limited reference stations. Inexpensive devices offer affordable monitoring alternatives for low- and middle-income countries [26]. Additionally, sensor networks [27,28] installed on ground vehicles or unmanned aerial vehicles [29,30] might emerge as pivotal elements of air quality monitoring systems in the foreseeable future.

In recent years, significant efforts have been directed toward developing calibration techniques to enhance cheap sensor systems’ reliability. Two categories of sensor calibration techniques exist: field and laboratory ones [31]. While laboratory methods are theoretically more accurate, they are conducted under conditions seldom replicated in real-world settings (considering environmental parameters and the presence of various ambient gases), making them susceptible to failure when validated in the field [14,15]. Consequently, most studies opt for field calibration, utilizing reference data collected by governmental monitoring stations in corresponding locations. In a numerical sense, calibration exploits a spectrum of regression procedures and a range of advanced machine learning procedures, such as artificial neural networks. A number of approaches utilizing regression techniques have been reported in the literature. In [31], electrochemical sensors of NO and NO₂ were calibrated using multivariate linear regression (MLR), support vector regression (SVR), and random forest (RF) by taking into account temperature and humidity data. In [32], linear statistical learning algorithms, Gaussian process regression (GPR), ridge regression, random forest regression (RFR), and MLR were applied to calibrate cheap NO₂ and PM₁₀ sensing devices, taking into account temperature and humidity. Some of these methods provided promising results. In [33], the MLR technique was employed to calibrate the chemiluminescence NO-NO₂-NO_x analyzer, again, using temperature and humidity data. A variety of studies using regression models for calibrating cost-efficient sensors can be found in [34,35,36,37].

In recent years, artificial neural networks have become increasingly popular for sensor correction, and they appear to be more reliable than the regression methods outlined in the previous paragraph. In [26], calibration of CO, NO₂, O₃, and SO₂ sensors was carried out, taking into account ambient temperature and humidity and utilizing single linear regression (SLR), MLR, RFR, and long short-term memory (LSTM) networks. The obtained results indicate that LSTM is superior to all regression-based algorithms considered in the study. In [11], sensor calibration using historical time series was carried out using a convolutional neural network (CNN) for short-term variation modeling and a recurrent neural network (RNN) for extracting global and periodic features. The results of correcting commercial CO and O₃ sensors (also taking into account temperature and humidity) demonstrated advantages of CNN/RNN over a number of benchmark techniques such as SLR, SVR, or a composition of LSTM and CNN. Other related studies reported in the literature include utilization of Bayesian NNs [38], shallow NNs [39], or dynamic NNs [40,41].

This study introduces a novel autonomous NO₂ measurement platform and a robust machine learning (ML) framework for calibration of cheap commercial sensors. The platform comprises primary and auxiliary NO₂ detectors, additional sensors for assessing environmental conditions (temperature, humidity, pressure), and hardware units with drivers for measuring and data transmission protocols. Our approach employs a combination of data-driven models: an artificial neural network (ANN) and a kriging interpolant for predicting affine correction factors of low-cost sensors. At the same time, the uniqueness of correction factors is ensured through appropriate regularization. The calibration procedure integrates environmental parameters as well as NO₂ concentrations from main and surplus sensors as inputs. Furthermore, global data correlation enhancement is achieved through a further calibration stage. The developed calibration strategy has been validated with the use of the reference and cheap detector data collected during a five-month-long measurement period conducted at several venues in Gdansk (nearly 480,000 citizens), Poland. Comprehensive experiments corroborate that the calibrated NO₂ sensor ensures excellent monitoring precision with a correlation coefficient close to 0.9 with respect to reference measurements and RMSE not exceeding 3.5 μg/m³. At the same time, it has been demonstrated that all incorporated correction mechanisms are relevant and contribute to quality improvements. The reported level of performance makes the presented platform and the calibration methodology appropriate for practical, low-cost, and dependable NO₂ monitoring.

2. Inexpensive NO₂ Sensing Units

Here, we outline the design of the developed autonomous NO₂ monitoring platform. In particular, we provide a brief summary of the sensing units, including the electronic circuitry and sensors. The ML-based correction strategy is explained in Section 3.

2.1. Autonomous Monitoring Platform

To acquire measurement data and to implement calibration of inexpensive NO₂ sensors, custom microprocessor-based hardware units have been designed and prototyped. The system is equipped with a number of environmental sensors, installed to provide various types of data, later employed to improve NO₂ detection reliability. The platform also includes a wireless communication device (GSM modem) for transmitting the readings to the cloud. The automatic data acquisition procedures rely on commercial elements controlled by the BeagleBone® Blue microprocessor system [42], which contains a data storage device and a built-in power supply. The system can operate for at least two hours without external supply owing to a rechargeable 7.4 V/4400 mA battery. The use of low-cost sensors, along with commercially available low-cost modules, makes the entire platform a low-cost solution, especially when compared to professional, calibrated laboratory-class equipment used as a reference. The BeagleBone board incorporates serial input/output (I/O) ports to connect the units (GSM transmission module and several cheap sensors). The schematic diagram of the unit is shown in Figure 1, which also provides information about the sensors included in the platform. The primary operating software as well as the drivers for hardware units have been written in Python 3. The GSM modem transmits the sensors’ readings to the cloud, where they can be accessed through a web browser.

All units have been placed on the customized base plate fabricated from polyethylene terephthalate using a 3D printing manufacturing process (see Figure 2a,b). All gas detectors (SGX, ST, MICS) have been installed in near proximity (cf. Figure 2a) along with the temperature and humidity detectors evaluating operating parameters of SGX, ST, and MICS. The supplementary sensor measuring external temperature and humidity was mounted at the unit’s edge because of the heat generated by the hardware. Figure 2b also illustrates an optional USB stick module (Intel) supporting computations. This module can be used to speed up the application of the calibration model on the device; however, the platform has sufficient time between measurements to complete all calculations even without the hardware accelerator. The complete system is installed in a polyethylene terephthalate weatherproof case shown in Figure 2c.

It should be noted that the low-cost sensors used in the proposed system may exhibit certain limitations. While the parameters of the sensors are given in the manufacturer datasheets [43,44,45,46,47], the real-world operation for such sensors is challenging, and one cannot expect the quality of the results of low-cost sensors to be comparable to the professional reference equipment. We did not estimate the influence of sensor cross-sensitivity to other gases nor saturation and recovery time after high exposure. Instead, we used the results from multiple sensors and developed an algorithm for providing as good conformance as possible to the reference measurements in real, outdoor, and long-term conditions.

According to the manufacturers, the electrochemical sensors used in our platform have a limited lifetime (2–3 years), and they must be replaced accordingly. The sensors are mounted in the sockets, making their replacement straightforward. Other components have longer lifetimes, where the overall lifetime of the platform can be estimated at 10 years, provided that the regular replacement of the above-mentioned electrochemical sensors is performed.

2.2. Reference Data

In order to calibrate and assess the reliability of the monitoring platform delineated in Section 2.1, acquiring high-quality reference data is crucial. In this study, we utilized an air monitoring network in the city of Gdańsk, Poland, set up under the auspices of the ARMAAG Foundation [48]. The locations of the reference facilities are shown in Figure 3a, and the photo of a single facility is presented in Figure 3b. The utilized instruments include the following:

Thermo environmental 42C chemiluminescent NO_x analyzer (stations 1 and 3);
API Teledyne 200E chemiluminescent NO_x analyzer (station 8).

ARMAAG offers the air quality data (gathered every hour) to the public at no cost via their website https://armaag.gda.pl/en/ (accessed on 19 February 2025).

It should also be noted that Gdansk was selected for the presented study for practical reasons. Gdansk University of Technology is located in the city, which has the appropriate measurement infrastructure (reference stations) described above. As the presented low-cost measurement platforms have to be placed in the vicinity of the reference stations during data acquisition, Gdansk was a natural choice for the study. Furthermore, it is a representative urban zone with a large population and considerable industrial infrastructure as well as relatively heavy traffic, which are all factors contributing to NO_x-type air pollution.

3. Machine-Learning-Based Sensor Calibration

This section delves into the calibration method developed to rectify the NO₂ data obtained from the inexpensive sensor utilized in the hardware units described in Section 2. The primary algorithmic components include affine sensor correction, a machine learning-based calibration approach integrating neural network (NN) and kriging interpolation surrogates, and a global strategy for enhancement of (reference and cheap sensor) data correlation. Environmental factors (external/internal temperature, pressure, humidity) and measurements collected by both main and surplus NO₂ sensors serve as inputs for the surrogates.

The remaining part of this section is organized as follows. Section 3.1 formulates the calibration problem, and Section 3.2 discusses the affine sensor correction scheme. Section 3.3 and Section 3.4 describe the surrogate modeling techniques employed by the calibration procedure. Global data correlation enhancement is then discussed in Section 3.5, whereas Section 3.6 summarizes the operating flow of the entire calibration procedure.

3.1. Problem Statement

Low-cost sensor correction is carried out based on the data collected at the reference facilities outlined in Section 2.2. The data were acquired within five months, with the measurements generated hourly. The monitoring units of Section 2, placed in the vicinity of the corresponding reference stations, produce a set of measurements that include NO₂ readings from the main and surplus sensors, as well as environmental parameters (temperature, humidity, pressure). Figure 4 visualizes the relevant outputs rendered by the reference station and the low-cost measurement platform. The summary of the data obtained from the reference facilities and our sensors, along with the associated notation, is also provided in Figure 4. It should be noted that the internal temperature exceeds the external one (the opposite relationship occurs for humidity), which is a result of heating of the electronic equipment within the unit. In other words, the operating conditions of the internal and external sensors are different. Consequently, from the point of view of calibration reliability, it is beneficial to consider both the outside and inside environmental parameters. The readings from the auxiliary NO₂ sensors are grossly inaccurate, yet including them as inputs of the calibration models enables indirect quantification of the factors that influence the main sensor outputs (e.g., cross-sensitivity to various gas pollutants).

The available data are partitioned into two sets: N₀ training samples and N_t testing samples, so that N_t equals around one-tenth of the total sample number N (for details, see Section 4). In the following, the reference training data will be referred to as {y_r₀^(j)}, j = 1, …, N₀, and {y_rt^(j)}, j = 1, …, N_t, will denote testing data. The same data division has been applied to the cheap sensor NO₂ readings. In particular, {y_s₀^(j)}, j = 1, …, N₀, pertains to training points, and {y_st^(j)}, j = 1, …, N_t, refers to testing points. The supplementary data, which will be the input of the calibration models, are gathered into respective vectors. We have {z_s₀^(j)}, j = 1, …, N₀ (auxiliary training data), with z_s₀^(j) = [T_o₀^(j) T_i₀^(j) H_o₀^(j) H_i₀^(j) P₀^(j) S₁₀^(j) S₂₀^(j)]^T, and {z_st^(j)}, j = 1, …, N_t (supplementary testing data), with z_st^(j) = [T_ot^(j) T_it^(j) H_ot^(j) H_it^(j) P_t^(j) S_1t^(j) S_2t^(j)]^T.

Calibration is carried out with the use of the training datasets {y_r₀^(j)}, {y_s₀^(j)}, and {z_s₀^(j)}, j = 1, …, N₀. The correction coefficients are represented by C(y_s,z_s;p), see Figure 5a, with p being a vector comprising concatenated correction model parameters (e.g., hyperparameters of the NN), and y_c = F_CAL(y_s,C(y_s,z_s;p)) standing for the calibrated sensor output. In Section 4, we consider two variations of the calibration process: (i) with the inputs being both auxiliary data z_s and the primary sensor output y_s, and (ii) with the only input being z_s. This is to demonstrate that incorporating the sensor-predicted NO₂ level does improve the correction process reliability.

Using the notation discussed in the previous paragraphs, we formulate the sensor calibration task as follows:

p^{*} = \arg \min_{p} \sqrt{\sum_{j = 1}^{N_{0}} \left( y_{r0}^{(j)} - F_{CAL}\left( y_{s0}^{(j)}, C(y_{s0}^{(j)}, z_{s0}^{(j)}; p) \right) \right)^{2}}

(1)

According to (1), we aim at identifying the parameters of the calibration models so that the reference NO₂ readings and those from the calibrated sensor are in the best possible least-squares (L2-norm) agreement over the considered training set.
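The objective in (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `predict_coeffs` and `apply_correction` are hypothetical placeholders standing in for C(y_s, z_s; p) and F_CAL, respectively, and are not names taken from the platform software.

```python
import numpy as np

def calibration_objective(y_ref, y_sensor, z_aux, predict_coeffs, apply_correction):
    """Value of the objective in (1): root of the summed squared residuals.

    predict_coeffs   -- hypothetical callable playing the role of C(y_s, z_s; p)
    apply_correction -- hypothetical callable playing the role of F_CAL(y_s, C)
    """
    coeffs = predict_coeffs(y_sensor, z_aux)      # correction coefficients per sample
    y_cal = apply_correction(y_sensor, coeffs)    # calibrated sensor output y_c
    return float(np.sqrt(np.sum((y_ref - y_cal) ** 2)))
```

A training routine would minimize this value over the model parameters p; how that is done depends on the surrogate (Sections 3.3 and 3.4).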

3.2. Basic Correction Scheme. Affine Response Scaling

In contrast to conventional approaches, where sensor calibration is arranged by modeling the differences between reference and sensor data, in this work, we employ affine scaling, which is a composition of multiplicative and additive correction. Initial inspection of the data (cf. Figure 6) indicates that the magnitude of reference data changes exceeds that of cheap sensor data. Consequently, it is beneficial to multiplicatively scale the sensor outputs using a coefficient larger than unity to ‘globally’ improve the data alignment.

The details of the affine correction can be found in Figure 7. As mentioned, given the reference and sensor data, the multiplicative correction factor should satisfy A^(j) > 1, which is achieved by setting the hyper-parameter α (cf. (9), (10)) strictly below unity. In practice, α is tuned together with the identification of the primary correction model. Based on the initial experiments, we set α = 0.8 for all experiments in Section 4.
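A minimal sketch of affine response scaling is given below. The exact formulas of Figure 7 are not reproduced in the text, so the form y_c = A(y_s + D) is an assumption made by analogy with the global correction (13). The example also shows why some regularization of the coefficients is needed: for a single sample, many (A, D) pairs map the sensor reading onto the same reference value.

```python
import numpy as np

def affine_correct(y_s, a, d):
    """Affine scaling y_c = A * (y_s + D); form assumed by analogy with (13)."""
    return np.asarray(a, dtype=float) * (np.asarray(y_s, dtype=float) + np.asarray(d, dtype=float))

# Two different coefficient pairs reproduce the same (hypothetical) reference value of 24
y_sensor = 10.0
first = affine_correct(y_sensor, a=2.0, d=2.0)    # 2.0 * (10 + 2)
second = affine_correct(y_sensor, a=1.5, d=6.0)   # 1.5 * (10 + 6)
```

Both calls return 24.0, so the pair (A, D) is not unique without an additional constraint such as the one controlled by α.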

3.3. ML-Based Sensor Calibration

In our approach, low-cost sensor correction is carried out using a composition of neural networks (NNs) and kriging interpolation surrogates (cf. Section 3.4). The utilized NN model is a multi-layer perceptron (MLP) [50,51]. The specific structure of the NN model, shown in Figure 8, incorporates three fully connected hidden layers with 20 neurons each and a sigmoid activation function. The surrogate is trained using a backpropagation Levenberg-Marquardt algorithm [52] (maximal number of epochs 1000, MSE for performance evaluation, random training/testing data division).

Note that the selected NN architecture is purposely simple, which brings a number of benefits. On the one hand, the network training is quick, so that a number of variations can be conveniently explored. On the other hand, due to a large number of training samples (as compared to the number of NN weights), the surrogate is a regression model of low sensitivity to the number of layers. Furthermore, a straightforward MLP structure allows for smoothing out the inherent noise of both sensor and reference readings.
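To make the size of the network concrete, the following NumPy sketch builds an untrained MLP with three hidden layers of 20 sigmoid neurons. Several details are assumptions rather than facts from the text: eight inputs (the seven entries of z_s plus y_s), two outputs (A and D), a linear output layer, and small random initial weights. Levenberg-Marquardt training is not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_mlp(layer_sizes, seed=0):
    """Create (weights, bias) pairs for consecutive fully connected layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def mlp_forward(params, x):
    """Forward pass: sigmoid hidden layers, linear output layer (assumed)."""
    h = np.atleast_2d(x)
    for w, b in params[:-1]:
        h = sigmoid(h @ w + b)
    w, b = params[-1]
    return h @ w + b

# Assumed layout: 8 inputs -> 20 -> 20 -> 20 -> 2 outputs (A and D)
params = init_mlp([8, 20, 20, 20, 2])
n_weights = sum(w.size + b.size for w, b in params)
```

Under these assumptions the network has 1062 trainable parameters, roughly an order of magnitude fewer than the number of training samples reported in Section 4.1.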

As mentioned in Section 3.1, two calibration variations are employed, in which the primary sensor output y_s is either treated as the surrogate input or not taken into account. These will be marked as C_ANN.y(y_s,z_s,p_ANN) and C_ANN(y_s,z_s,p_ANN), respectively. Recall that the correction function outputs are affine scaling factors A and D.

3.4. Auxiliary Correction by Means of Kriging Interpolation

Apart from NN, we also employ kriging [52] as an additional correction mechanism. Kriging belongs to data-driven modeling approaches with numerous engineering and scientific applications [53,54,55,56,57,58]. The kriging formulation with Gaussian correlation functions can be found in Figure 9. Our approach renders independent models for multiplicative and additive correction coefficients A and D. Note that kriging surrogate (cf. (K.7)) consists of two parts, one being a regression model g(x)^Tβ (usually involving low-order polynomials), the other being a stochastic one accounting for local disparities from the trend. The hyper-parameters θ_k are determined using maximum likelihood estimation.

The kriging surrogate is established using the same training dataset as used for the NN model, i.e., {x^(j)}, j = 1, …, N₀, are either vectors {z_s₀^(j)} or a composition of {z_s₀^(j)} and cheap sensor outputs {y_s₀^(j)}. The respective kriging models, rendering the correction coefficients A and D, are denoted as C_KR(y_s,z_s,p_KR) and C_KR.y(y_s,z_s,p_KR).
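For illustration, a compact kriging predictor with a Gaussian correlation function can be written as follows. This is a simplified sketch, not the formulation of Figure 9: it assumes a constant trend (ordinary kriging), takes the hyper-parameters θ_k as given instead of estimating them by maximum likelihood, and adds a tiny nugget term purely for numerical stability. All function names are hypothetical.

```python
import numpy as np

def gauss_corr(xa, xb, theta):
    """Gaussian correlation exp(-sum_k theta_k * (xa_k - xb_k)^2) for all point pairs."""
    diff = xa[:, None, :] - xb[None, :, :]
    return np.exp(-np.sum(theta * diff ** 2, axis=2))

def fit_kriging(x, y, theta, nugget=1e-10):
    """Ordinary kriging fit with fixed theta (no likelihood maximization here)."""
    r = gauss_corr(x, x, theta) + nugget * np.eye(len(x))
    ones = np.ones(len(x))
    # Generalized least-squares estimate of the constant trend
    beta = (ones @ np.linalg.solve(r, y)) / (ones @ np.linalg.solve(r, ones))
    weights = np.linalg.solve(r, y - beta)
    return {"x": x, "theta": theta, "beta": beta, "weights": weights}

def predict_kriging(model, x_new):
    """Trend plus correlation-weighted residuals."""
    r_new = gauss_corr(x_new, model["x"], model["theta"])
    return model["beta"] + r_new @ model["weights"]
```

A useful sanity check is that predictions at the training points reproduce the training values (up to the nugget), which is the interpolation property of kriging. In the setting of this section, one such model would be fitted for A and another for D.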

The kriging surrogate serves as an additional calibration model along with the NN predictor. It should be emphasized that kriging interpolant ensures an exact fit to the training data but has limited ability to generalize. A combination of both kinds of models allows us to achieve a better trade-off between generalization and approximation. We employ a convex combination of the kriging and NN models:

C(y_{s}, z_{s}, p_{TOT}) = \beta C_{ANN}(y_{s}, z_{s}, p_{ANN}) + (1 - \beta) C_{KR}(y_{s}, z_{s}, p_{KR})

(11)

whenever y_s-data are excluded from the model input,

C_{y}(y_{s}, z_{s}, p_{TOT}) = \beta C_{ANN.y}(y_{s}, z_{s}, p_{ANN}) + (1 - \beta) C_{KR.y}(y_{s}, z_{s}, p_{KR})

(12)

when it is included. The overall parameter vector is p_TOT = [p_ANN p_KR]. The calibration coefficients are A(y_s,z_s,p_TOT) and D(y_s,z_s,p_TOT) (similarly for (12)). Based on the initial experiments, it has been determined that β = 0.7 ensures the best results in terms of low-cost sensor calibration quality.
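The convex combination in (11) and (12) amounts to a weighted average of the coefficients predicted by the two surrogates. A minimal sketch, with a hypothetical function name and made-up coefficient values, reads:

```python
import numpy as np

def combine_coefficients(c_ann, c_kr, beta=0.7):
    """Convex combination of NN and kriging predictions, as in (11) and (12)."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * np.asarray(c_ann, dtype=float) + (1.0 - beta) * np.asarray(c_kr, dtype=float)

# Illustrative [A, D] pairs from the two models (values are invented for the example)
combined = combine_coefficients([1.4, 2.0], [1.8, 1.0])
```

With β = 0.7 the result is [1.52, 1.7]; setting β = 1 recovers the NN-only model and β = 0 the kriging-only model.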

At this point some comments are relevant concerning the relationship between ANN and kriging calibration models. As mentioned earlier, ANN is the primary model implemented using a simple architecture to act as a regressor, which learns typical dependencies between environmental parameters and the cheap sensor and reference NO₂ readings. Increasing the ANN complexity would lead to improved approximation capability at the expense of generalization, which is essential given a broad range of NO₂ level variability. This is reflected in better performance of the correction model over the training data in comparison to the testing data, as indicated in Section 4. In other words, the ANN surrogate allows for good overall correction of the inexpensive sensor readings. The kriging interpolation model, on the other hand, is by definition interpolative, therefore providing a perfect approximation of the training data. Incorporating this surrogate with a variable convex combination parameter β allows us to improve the overall approximation capability of the calibration model while readily controlling the balance between approximation and generalization. Thus, adding kriging as a supplementary model enhances the calibration scheme’s flexibility, which would be difficult to achieve with ANN only.

3.5. Global Data Correlation Enhancement

Surrogate-assisted calibration described in Section 3.1 through Section 3.4 is further complemented by a global data correlation enhancement procedure proposed in this study as a supplementary correction method. The motivation behind it is that least-squares calibration of the form of (1) may lead to systematic offsets, which are functions of the NO₂ level from the cheap sensor. This has been shown in Figure 10a,b using a part of the training data considered in Section 4. Although surrogate-based calibration leads to excellent results (see Figure 10a), re-plotting the samples after ordering the reference NO₂ values reveals the aforementioned offsets. More specifically, the average misalignment between the corrected and reference data is positive for low NO₂ levels but becomes slightly negative for higher levels. This is also noticeable on the scatter plot (right-hand-side panel of Figure 10b), which is slightly skewed.

To reduce the offset, we implement a simple procedure that effectively ‘rotates’ the smoothened sensor data so that it becomes better aligned with the ordered reference measurement. Let y_r be the ordered reference data vector, as illustrated by the red line in Figure 10b. Furthermore, let y_c be the calibrated inexpensive sensor data, also ordered (exactly as y_r), as illustrated by the blue line in Figure 10b. Finally, let S_m(y_c) be the smoothened y_c. In the case of aggressive smoothing, S_m produces a monotonically increasing curve representing (local) mean values of the vector y_c. Global data correlation enhancement is executed using affine transformation similar to (5), i.e.,

y_{c.G}^{(j)} = A_{G} \left( y_{c}^{(j)} + D_{G} \right)

(13)

for j = 1, …, N₀. The coefficients A_G and D_G are established by solving a regression problem:

[A_{G} \; D_{G}] = \arg \min_{[A \; D]} \left\| y_{r} - A \left( S_{m}(y_{c}) + D \right) \right\|

(14)

Note that the coefficients are identified at the level of aggregated training data vectors, and they are not functions of any parameters from the low-cost sensor (neither environmental nor auxiliary ones). The effects of global data correlation enhancement have been shown in Figure 10c for the same training data subset as considered in Figure 10b. One can observe a considerable offset reduction, but also the improvement of the scatter plot symmetry. The employment of the global correction permitted increasing the correlation coefficient to 0.95 (from 0.93) and reducing RMSE to 1.8 μg/m³ (from 2.1). Clearly, the r² and RMSE values for the testing data will be less favorable, yet the achieved improvement is similar, as demonstrated in Section 4. Meanwhile, Figure 11 shows the effects of global data correlation improvement for the smoothened calibrated sensor data. As can be seen, application of (13) and (14) leads to a considerably better alignment between the sensor and reference data.
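The procedure of (13) and (14) can be sketched as follows. The smoothing operator S_m is not specified in detail in the text, so a centered moving average with an odd window is used here as a stand-in, and the window length is an arbitrary assumption. Since A(S + D) = A·S + A·D, problem (14) reduces to an ordinary linear least-squares fit with slope A and intercept A·D.

```python
import numpy as np

def moving_average(v, window):
    """Centered moving average with edge padding (stand-in for S_m); window must be odd."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    padded = np.pad(v, window // 2, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")

def fit_global_correction(y_ref, y_cal, window=51):
    """Solve (14): order both vectors by the reference values, smooth, fit A_G and D_G."""
    order = np.argsort(y_ref)
    y_r = y_ref[order]
    s_m = moving_average(y_cal[order], window)
    slope, intercept = np.polyfit(s_m, y_r, deg=1)   # y_r ~ A*S + A*D
    return slope, intercept / slope                  # A_G, D_G

def apply_global_correction(y_cal, a_g, d_g):
    """Apply (13) sample by sample."""
    return a_g * (y_cal + d_g)
```

The two coefficients are fitted once on the aggregated training data and then applied unchanged to any new calibrated reading.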

3.6. Complete Operating Flow of Calibration Procedure

The complete calibration procedure employs the mechanisms discussed in Section 3.2 through Section 3.5. The first step is to predict the (local) correction coefficients by kriging and neural network surrogates using a supplementary vector z_s and the true sensor reading y_s. Both models are combined as described in Section 3.4 using the convex combination factor β. The intermediate quantity y_c is a result of the affine correction (2), (3), whereas the ultimate corrected output is rendered by applying the global correlation scheme (13), (14). The procedure’s flowchart can be found in Figure 12.

4. Results and Discussion

This section delves into the outcomes yielded by the calibration framework presented in Section 3, implemented on the inexpensive sensor detailed in Section 2. We will briefly delineate the dataset composition utilized in validation experiments, describe in detail the results achieved for various setups, and formulate key observations.

4.1. Data Description

Our verification experiments exploit the data gathered from the three reference stations outlined in Section 2.2 and located in the city of Gdansk, Poland. Data acquisition has been performed across five months, from March till August 2023. The autonomous monitoring units of Section 2 have been placed in the immediate vicinity of the reference facilities. The readings were collected hourly. The total sample number is over 10,000. As mentioned earlier, about 90 percent of samples were used to establish the correction models (NN and kriging surrogates, global data correlation enhancement). The remaining samples were used for testing. The training data are denoted as {y_r₀^(j)} (reference measurements), {y_s₀^(j)} (cheap sensor measurements), and {z_s₀^(j)} (measurements of an auxiliary sensor), j = 1, …, N₀. The testing data are denoted as {y_rt^(j)}, {y_st^(j)}, and {z_st^(j)}, j = 1, …, N_t, with N_t = 1008 = 3·336. The latter correspond to three 14-day periods from the following locations (Station 1, 1–15 April; Station 2, 15–29 July; Station 3, 1–14 July). Figure 13 visualizes selected training data subsets, both reference and uncorrected readings. It should be emphasized that the disparities between the readings are considerable, so that the calibration process is a difficult undertaking.

4.2. Numerical Results

The cost-efficient NO₂ sensor embedded into the hardware unit of Section 2 has been calibrated using the technique of Section 3, based on the data described in Section 4.1. For the purpose of comparison, we have considered a number of calibration setups, as summarized in Table 1. Each setup represents a different set of model input data (selected or complete surplus detector parameters, inclusion of the NO₂ output y_s from the main sensor), optional incorporation of the kriging metamodel, as well as optional use of the global data correlation enhancement. This allows us to demonstrate the relevance of particular algorithmic tools and their contribution to improving the measurement reliability of the low-cost sensor. The NN surrogate has been trained 10 times for each setup, and the run with the lowest RMSE loss on the training data was selected as the final surrogate. The affine scaling factor α (cf. Section 3.2) has been adjusted concurrently in this process, and α = 0.8 was selected as the most beneficial value (i.e., it ensured the highest generalization capability and accuracy of the calibration models).
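The selection of the final surrogate among repeated training runs can be sketched as follows. The trainer below is only a stand-in (a random tanh hidden layer with a least-squares output layer) chosen to keep the example short and deterministic; it is not the network or the training algorithm used in this work, and the data are synthetic. All function names are hypothetical.

```python
import numpy as np

def train_once(X, y, seed, n_hidden=12):
    """Stand-in trainer: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.hstack([np.tanh(X @ W + b), np.ones((len(X), 1))])
    w_out, *_ = np.linalg.lstsq(H, y, rcond=None)
    train_rmse = float(np.sqrt(np.mean((H @ w_out - y) ** 2)))
    return {"W": W, "b": b, "w_out": w_out}, train_rmse

def best_of_restarts(X, y, n_restarts=10):
    """Train n_restarts times and keep the run with the lowest training RMSE."""
    runs = [train_once(X, y, seed) for seed in range(n_restarts)]
    rmses = [r for _, r in runs]
    best = int(np.argmin(rmses))
    return runs[best][0], rmses[best], rmses

# Synthetic inputs and targets, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.sin(2.0 * X[:, 0]) + 0.5 * X[:, 1] * X[:, 2]

model, best_rmse, all_rmse = best_of_restarts(X, y)
print(f"training RMSE per run: {np.round(all_rmse, 4)}")
print(f"selected run RMSE:     {best_rmse:.4f}")
```

Selecting on the training loss, as described above, guards against an unlucky initialization; it does not by itself measure generalization, which is assessed separately on the testing data.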

The numerical results can be found in Table 2, which reports the obtained correlation coefficient r and modeling error (here, RMSE) for the testing and training data. The utilized definitions are given in Figure 14. Figure 15, Figure 16 and Figure 17 visualize the results for three selected setups (Setups 2, 5, and 9), showing the reference and corrected sensor outputs for the selected sub-sets, as well as the respective scatter plots. For further clarification, Figure 18 depicts the combined training and testing outputs for Setup 9, arranged in ascending order of NO₂ concentration. The displayed data comprise reference NO₂ levels alongside the respective raw and calibrated sensor readings.
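For reference, the two figures of merit can be computed as in the sketch below. It uses the standard Pearson correlation coefficient and root-mean-square error, which are assumed here to coincide with the definitions given in the article; the sample values are illustrative and are not taken from the datasets.

```python
import numpy as np

def pearson_r(y_ref, y_cal):
    """Pearson correlation coefficient between reference and calibrated data."""
    y_ref = np.asarray(y_ref, float)
    y_cal = np.asarray(y_cal, float)
    d_ref = y_ref - y_ref.mean()
    d_cal = y_cal - y_cal.mean()
    return float((d_ref @ d_cal) / np.sqrt((d_ref @ d_ref) * (d_cal @ d_cal)))

def rmse(y_ref, y_cal):
    """Root-mean-square error, in the same units as the inputs (µg/m³ here)."""
    y_ref = np.asarray(y_ref, float)
    y_cal = np.asarray(y_cal, float)
    return float(np.sqrt(np.mean((y_ref - y_cal) ** 2)))

# Illustrative values only.
y_ref = [10.0, 20.0, 30.0, 40.0]
y_cal = [12.0, 18.0, 33.0, 39.0]
print(f"r = {pearson_r(y_ref, y_cal):.4f}, RMSE = {rmse(y_ref, y_cal):.4f}")
```

Note that r is insensitive to a constant offset or a positive scale factor between the two series, whereas RMSE is not, which is why both are reported.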

Applying the calibration model requires no more than approximately 100,000 multiplications and 10,000 additions (an upper limit with a considerable margin), which can easily be performed on the proposed platform in the time between consecutive measurements.

Additional validation included a comparison of the proposed calibration approach with several benchmark frameworks: linear regression (LR) and two direct approaches—artificial neural network-based (ANN) calibration and calibration implemented using a convolutional neural network (CNN) [55]. The numerical results are gathered in Table 3. For ANN/CNN prediction, the calibrated model output is predicted directly, in contrast to our approach, where correction coefficients are predicted. An analysis of the results shows that our calibration methodology outperforms all benchmark techniques in terms of both correlation coefficients and RMSE. Thus, it can be concluded that the key mechanism behind this superiority lies in the employment of affine correction (cf. Table 3), which ensures better performance compared to the direct prediction of the calibrated sensor using ANN of the same architecture or a CNN.

4.3. Economic Analysis

The Autonomous Measurement Platform outlined in Section 2.1 comprises widely available commercial off-the-shelf (COTS) electronic modules and parts. The electrochemical sensors are less accurate than professional measuring equipment, but their cost is also much lower. The correction mechanisms proposed in this article significantly enhance the operation of these cheap sensors, bringing them close to professional measuring equipment (correlation coefficient of approximately r = 0.9 with respect to the reference data). As can be seen from Table 4 and Table 5, the cost of a professional reference station is significantly higher: approximately USD 87,000 in initial costs and approximately USD 3300 in annual costs. The initial cost of the proposed platform is just USD 750 (in mass production, the cost would be significantly lower). The annual cost of platform usage, including sensor replacement due to their limited lifetime, is approximately USD 120.
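The figures quoted above can be combined into a simple cumulative cost comparison. The sketch below uses only the approximate initial and annual costs stated in this section; the chosen time horizons and the linear cost model (no discounting, no price changes) are assumptions made for illustration.

```python
def cumulative_cost(initial_usd, annual_usd, years):
    """Total cost of ownership after a number of years (linear model)."""
    return initial_usd + annual_usd * years

REFERENCE_STATION = (87_000, 3_300)  # initial, annual (USD), as quoted above
PROPOSED_PLATFORM = (750, 120)       # initial, annual (USD), as quoted above

for years in (1, 5, 10):             # horizons chosen for illustration
    ref = cumulative_cost(*REFERENCE_STATION, years)
    plat = cumulative_cost(*PROPOSED_PLATFORM, years)
    print(f"{years:2d} y: reference USD {ref:,}, platform USD {plat:,}, "
          f"ratio {ref / plat:.0f}x")
```

Under this simple model, the cost of a single reference station over five years (USD 103,500) would cover several dozen platforms over the same period, which is what makes dense deployment economically plausible.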

4.4. Deployment

The deployment of the proposed measurement platforms is rather straightforward. As the hardware platform is supplied with a low voltage (the allowed input voltage range is 12–36 V, max 24 W), even a simple, certified external 15 V laptop-type power supply can be used. It is sufficient that the external power supply carries the FCC logo, which means that it has been authorized under the Supplier’s Declaration of Conformity (SDoC) procedure of the Federal Communications Commission (FCC) of the United States of America. It should also carry the CE marking, which indicates that the power supply has been assessed by the manufacturer and deemed to meet the European Union’s (EU) safety, health, and environmental protection requirements. The platform hardware is an experimental device developed for scientific purposes. Consequently, it does not have any conformity certifications yet. However, should the system be introduced to commercial use, acquiring such certification would not be difficult, as it is a low-voltage device containing standard, mostly passive components. Radio emission conformance tests would likely be required, as the device contains a GSM modem and an antenna.

Clean air is crucial for human health and the environment, and air pollution has been proven harmful to both. Therefore, it is important to monitor the level of air pollution. Dense deployment of the proposed platforms is easily feasible. The only limitation is the coverage and the capacity of the cellular network in the area of deployment, as the devices use GSM modems in IoT mode to upload the measurement results to a cloud database. Multiple utilization scenarios can be proposed for the presented low-cost platform with its calibration algorithms. Installing a dense stationary measurement network assessing the air quality in a certain area would enable end-users to make information-driven decisions to mitigate air pollution exposure. Wide deployment of the platforms on cars, trucks, buses, bikes, and scooters would allow monitoring of air pollution on a large scale (street, urban), which would make it possible to map the pollution dynamics, locate emission hot spots, and increase the spatial and temporal resolution. Installing a pair of sensors, one inside and one outside the cabin of a car or truck, would tell the driver when to open or close the windows, limiting exposure to increased air pollution in a place where drivers and passengers stay for long periods.

4.5. Discussion

The results presented in Section 4.2 are examined here to outline the functionality of the proposed correction process and evaluate the performance of the calibrated sensor. It should be noted that sensor calibration poses a serious challenge due to notable disparities between sensor and reference outputs, alongside the wide dynamic range of measurements (between zero and around 60 µg/m³), as illustrated in Figure 13. Additionally, NO₂ values frequently undergo substantial fluctuations over short intervals.

Despite these challenges, our calibration methodology provides remarkably good results, which is confirmed by the correlation coefficients and RMSE level indicated in Table 2. Clearly, the most involved arrangement (Setup 9), which combines NN and kriging surrogates, the broadest range of input variables (NO₂ measurements collected by the main and surplus sensors and all environmental parameters), and global data correlation enhancement, is also the most successful one.

For this setup, the correlation coefficient exceeds 0.88 for the testing data (and 0.96 for the training data). At the same time, RMSE is just 3.5 μg/m³ and 1.7 μg/m³ for the testing and training samples, respectively. This error value is low, particularly given the broad range of recorded nitrogen dioxide levels, and makes the calibrated low-cost sensor practically usable. The high precision of the corrected sensor also manifests itself in excellent visual agreement between its output and the reference one, as shown in Figure 17. Again, this has been achieved despite considerable misalignment between the reference and raw sensor measurements.

It should also be emphasized that all algorithmic tools developed and employed as the components of the calibration procedure are relevant and contribute to the obtained precision of the corrected sensor. For example, noticeable improvement of the correlation coefficient and reduction of RMSE are achieved due to combining the NN surrogate with the kriging metamodel (cf. Setup 3 vs. 2, or Setup 5 vs. 4). Increasing the number of model inputs (e.g., Setup 7 versus 5) is even more essential. The same can be said about considering the main NO₂ sensor output as a correction model input (Setup 5 versus 3), which by itself improves the correlation coefficient by about 0.03 and lowers RMSE by nearly 0.5 μg/m³.

Finally, the global data correlation enhancement scheme of Section 3.5 is yet another important component, which increases the correlation coefficient by an additional 0.02 and lowers RMSE by about 0.3 μg/m³. As mentioned earlier, these combined enhancements translate into better visual agreement between the corrected and reference data. Similar improvements can be observed on the scatter plots, which are noticeably more concentrated in the vicinity of the identity function for Setup 9 (Figure 17) than for Setup 5 (Figure 16), let alone Setup 2 (Figure 15). Figure 18 provides another way of illustrating the improvements obtained via the introduced correction procedure by showing the aggregated testing and training samples ordered by increasing NO₂ level. As can be observed, the corrected sensor samples are located significantly closer to the corresponding reference data than the uncorrected ones.

The histograms of the absolute errors (i.e., the differences between the reference and corrected readings, y_r − y_c, for the complete testing data) for Setups 2, 5, and 9 are shown in Figure 19. As expected, the error distribution is close to normal. However, for Setups 2 and 5, the mean is negative (−1.4 and −1.0 μg/m³, respectively), which results from a certain asymmetry of the distribution of the NO₂ readings w.r.t. the reference. This is also noticeable on the scatter plots (Figure 15 and Figure 16), which are slightly skewed. For Setup 9, owing to global data correlation enhancement, the mean is close to zero, and the scatter plot is correspondingly more symmetrical.
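The way a global, regression-based remapping removes a systematic mean error can be illustrated with a minimal sketch. It assumes a second-order polynomial mapping from the locally corrected output y_c to the reference, fitted by least squares over the whole training set; the actual form of the global correction is given by Equations (13) and (14) and may differ. The data are synthetic, and the function names are hypothetical.

```python
import numpy as np

def fit_global_map(y_c_train, y_r_train, degree=2):
    """Least-squares polynomial fit of reference values against y_c (assumed form)."""
    return np.polyfit(y_c_train, y_r_train, degree)

def apply_global_map(coeffs, y_c):
    """Apply the fitted polynomial to locally corrected readings."""
    return np.polyval(coeffs, y_c)

def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

# Synthetic data with a level-dependent offset, for illustration only.
rng = np.random.default_rng(1)
y_r = rng.uniform(0.0, 60.0, 500)                   # "reference" levels
y_c = 3.0 + 0.85 * y_r + rng.normal(0.0, 3.0, 500)  # biased "corrected" output

coeffs = fit_global_map(y_c, y_r)
y_g = apply_global_map(coeffs, y_c)

print(f"mean error before: {np.mean(y_r - y_c):+.3f}, after: {np.mean(y_r - y_g):+.3f}")
print(f"RMSE before: {rmse(y_r, y_c):.3f}, after: {rmse(y_r, y_g):.3f}")
```

On the data used for fitting, a least-squares fit with an intercept yields a zero-mean residual, and its RMSE cannot exceed that of the unmodified output because the identity mapping belongs to the fitted family. Whether the improvement carries over to unseen data has to be checked on the testing set.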

The standard deviations of the error for Setups 2, 5, and 9 are 4.2, 3.8, and 3.3 µg/m³, respectively. This indicates that enhancing the calibration approach (such as increasing the number of input variables for the surrogates, combining NN with kriging, and applying global correction) significantly improves the reliability of the inexpensive sensor. From the data, it can be inferred that the probability of the calibrated sensor’s error falling within ±3 µg/m³ is approximately 65%, while the probability of it falling within the ±6 µg/m³ range exceeds 90%.
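These probabilities are consistent with a normal error model, as the short check below suggests. It assumes a zero-mean Gaussian error with the Setup 9 standard deviation of 3.3 µg/m³; the values reported above were inferred from the data, so only approximate agreement is to be expected.

```python
import math

def prob_within(limit, sigma, mean=0.0):
    """P(|error| <= limit) for a normally distributed error."""
    z_hi = (limit - mean) / (sigma * math.sqrt(2.0))
    z_lo = (-limit - mean) / (sigma * math.sqrt(2.0))
    return 0.5 * (math.erf(z_hi) - math.erf(z_lo))

SIGMA_SETUP_9 = 3.3  # µg/m³, standard deviation quoted above

p3 = prob_within(3.0, SIGMA_SETUP_9)
p6 = prob_within(6.0, SIGMA_SETUP_9)
print(f"P(|e| <= 3 µg/m³) = {p3:.3f}")
print(f"P(|e| <= 6 µg/m³) = {p6:.3f}")
```

The `mean` argument allows the same check for Setups 2 and 5, whose error distributions have a nonzero mean.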

Based on the extensive comparison of the performance of low-cost sensors and calibration algorithms, together with the cost of the equipment, given in [59], only a few papers report a correlation coefficient r > 0.9 [60,61,62]. However, calibration was performed individually for each sensor in these works. Furthermore, in [61], each hardware unit contains duplicated sensors for better data quality control. The solution presented in this paper uses the same correction scheme for all sensors (measurement units), avoiding troublesome individual treatment of each device. Despite this general approach, the correlation coefficient is very high.

In conclusion, the reliability of the calibrated sensor is outstanding, in particular for the most sophisticated correction setup, Setup 9. In practice, the calibration procedure can be executed offline, meaning that it is performed on the NO₂ readings from the sensor of the measurement platform after the data have been transmitted to the end user using the wireless communication module. Another approach is to execute the correction directly on the hardware unit, using the installed computational module described in Section 2.

5. Conclusions

This study introduced an autonomous, custom-built platform for monitoring nitrogen dioxide (NO₂) as well as an innovative machine learning calibration technique for a cheap NO₂ sensor. The employed hardware units comprise main and secondary NO₂ sensors, multiple auxiliary detectors for assessing environmental conditions (temperature, humidity, pressure), and dedicated electronic devices with drivers for establishing and managing monitoring protocols and wireless data transmission. The correction method utilizes an affine adjustment of the inexpensive sensor readings, employing regularization to ensure the uniqueness of correction coefficients. It combines neural network and kriging interpolation surrogates as an ensemble metamodel, predicting corrections based on input variables encompassing environmental parameters and NO₂ readings from the main and secondary detectors. Additionally, a global data correlation enhancement layer is included, operating across the complete training dataset.

The developed correction process exploited reference and cheap sensor data collected across multiple locations in Gdansk, Poland, over five months. The proposed monitoring platforms were situated near reference stations and provided output data for analysis at hourly intervals. Rigorous verification indicates that the proposed correction technique achieves high accuracy of NO₂ monitoring, with a correlation coefficient surpassing 0.88 with respect to the reference data. At the same time, the RMSE remains below 3.5 μg/m³. Such accuracy corroborates the practicality and dependability of NO₂ detection using cheap sensing devices. Supplementary experiments involving alternative correction setups underline the importance of the developed algorithmic tools in refining the correction scheme. Specifically, the inclusion of additional input variables such as primary and secondary NO₂ readings, the fusion of NN and kriging surrogates, and global data correlation enhancement collectively improve the accuracy of NO₂ detection.

In future efforts, enhancing NO₂ monitoring reliability remains a key objective. This includes employing other gas detection devices (e.g., SO₂, CO, O₃), using their readings as auxiliary inputs for the correction model, and exploring cross-sensitivity. Additionally, the exploration of more advanced ML-based techniques, particularly deep learning NNs and their integration with regression schemes, will be pursued to refine the calibration process even further. Another direction of future research would be to expand our framework and explore its adaptability to diverse climatic conditions in other locations, as well as carry out investigations concerning long-term sensor drift and the needed calibration updates.

Author Contributions

Conceptualization, S.K. and A.P.-D.; methodology, S.K. and A.P.-D.; software, S.K. and B.P.; validation, S.K. and B.P.; formal analysis, S.K. and M.W.; investigation, S.K., A.P.-D., M.W. and B.P.; resources, S.K. and M.W.; data curation, S.K., A.P.-D., M.W. and B.P.; writing—original draft preparation, S.K.; writing—review and editing, S.K., A.P.-D., M.W. and B.P.; visualization, S.K., A.P.-D., M.W. and B.P.; supervision, M.W.; project administration, S.K.; funding acquisition, S.K., M.W. and B.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work is partially supported by the Icelandic Centre for Research (RANNIS) Grant 2410297 and by the National Science Centre of Poland Grant 2020/37/B/ST7/01448.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors would like to thank Dassault Systemes, France, for making CST Microwave Studio available.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Chen, T.-M.; Kuschner, W.G.; Gokhale, J.; Shofer, S. Outdoor air pollution: Nitrogen dioxide, sulfur dioxide, and carbon monoxide health effects. Am. J. Med. Sci. 2007, 333, 249–256.
Schwela, D. Air pollution and health in urban areas. Rev. Environ. Health 2000, 15, 13–42.
Zhao, S.; Liu, S.; Sun, Y.; Liu, Y.; Beazley, R.; Hou, X. Assessing NO2-related health effects by non-linear and linear methods on a national level. Sci. Total Environ. 2020, 744, 140909.
Agras, J.; Chapman, D. The Kyoto protocol, cafe standards, and gasoline taxes. Contemp. Econ. Policy 1999, 17, 296–308.
WHO. Air Quality Guidelines: Global Update 2005: Particulate Matter, Ozone, Nitrogen Dioxide, and Sulfur Dioxide; World Health Organization: Geneva, Switzerland, 2006.
Mauzerall, D.L.; Sultan, B.; Kim, N.; Bradford, D.F. NO_x emissions from large point sources: Variability in ozone production, resulting health damages and economic costs. Atmos. Environ. 2005, 39, 2851–2866.
Bradshaw, J.; Davis, D.; Crawford, J.; Chen, G.; Shetter, R.E.; Müller, M.; Gregory, G.; Sachse, G.; Blake, D.; Heikes, B.; et al. Photofragmentation—laser induced fluorescence detection of NO₂ and NO: Comparison of measurements with model results based on airborne observations during PEM-Tropics A. Geophys. Res. Lett. 1999, 26, 471–474.
Platt, U. Air monitoring by differential optical absorption spectroscopy. In Encyclopedia of Analytical Chemistry; John Wiley and Sons: New York, NY, USA, 2017.
Matsumoto, J.; Hirokawa, J.; Akimoto, H.; Kajii, Y. Direct measurement of NO₂ in the marine atmosphere by laser-induced fluorescence technique. Atmos. Environ. 2001, 35, 2803–2814.
Berden, G.; Peeters, R.; Meijer, G. Cavity ring-down spectroscopy: Experimental schemes and applications. Int. Rev. Phys. Chem. 2010, 19, 565–607.
Yu, H.; Li, Q.; Wang, R.; Chen, Z.; Zhang, Y.; Geng, Y.; Zhang, L.; Cui, H.; Zhang, K. A deep calibration method for low-cost air monitoring sensors with multilevel sequence modeling. IEEE Trans. Instrum. Meas. 2020, 69, 7167–7179.
Bi, J.; Wildani, A.; Chang, H.H.; Liu, Y. Incorporating low-cost sensor measurements into high-resolution PM_2.5 modeling at a large spatial scale. Environ. Sci. Technol. 2020, 54, 2152–2162.
Castell, N.; Dauge, F.R.; Schneider, P.; Vogt, M.; Lerner, U.; Fishbain, B.; Broday, D.; Bartonova, A. Can commercial low-cost sensor platforms contribute to air quality monitoring and exposure estimates. Environ. Int. 2017, 99, 293–302.
Laref, R.; Losson, E.; Sava, A.; Siadat, M. Empiric unsupervised drifts correction method of electrochemical sensors for in field nitrogen dioxide monitoring. Sensors 2021, 21, 3581.
Fonollosa, J.; Fernández, L.; Gutièrrez-Gálvez, A.; Huerta, R.; Marco, S. Calibration transfer and drift counteraction in chemical sensor arrays using direct standardization. Sens. Actuators B Chem. 2016, 236, 1044–1053.
Rai, A.C.; Kumar, P.; Pilla, F.; Skouloudis, A.N.; Sabatino, S.D.; Ratti, C.; Yasar, A.; Rickerby, D. End-user perspective of low-cost sensors for outdoor air pollution monitoring. Sci. Total Environ. 2017, 607, 691–705.
Kim, H.; Müller, M.; Henne, S.; Hüglin, C. Long-term behavior and stability of calibration models for NO and NO₂ low-cost sensors. Atmos. Meas. Tech. 2022, 15, 2979–2992.
Poupry, S.; Medjaher, K.; Béler, C. Data reliability and fault diagnostic for air quality monitoring station based on low cost sensors and active redundancy. Measurement 2023, 223, 113800.
Carotta, M.C.; Martinelli, G.; Crema, L.; Malagù, C.; Merli, M.; Ghiotti, G.; Traversa, E. Nanostructured thick-film gas sensors for atmospheric pollutant monitoring: Quantitative analysis on field tests. Sens. Actuators B Chem. 2001, 76, 336–342.
Wang, Z.; Li, Y.; He, X.; Yan, R.; Li, Z.; Jiang, Y.; Li, X. Improved deep bidirectional recurrent neural network for learning the cross-sensitivity rules of gas sensor array. Sens. Actuators B Chem. 2024, 401, 134996.
Zimmerman, N.; Presto, A.A.; Kumar, S.P.N.; Gu, J.; Hauryliuk, A.; Robinson, E.S.; Robinson, A.L.; Subramanian, R. A machine learning calibration model using random forests to improve sensor performance for lower-cost air quality monitoring. Atmos. Meas. Tech. 2018, 11, 291–313.
Gorshkova, A.; Gorshkov, M.; Tripathi, N.; Tukmakov, K.; Podlipnov, V.; Artemyev, D.; Mishra, P.; Pavelyev, V.; Platonov, V.; Djuzhev, N.A. Enhancement in NO₂ sensing properties of SWNTs: A detailed analysis on functionalization of SWNTs with Z-Gly-OH. J. Mater. Sci. Mater. Electron. 2023, 34, 102.
Jiao, W.; Hagler, G.; Williams, R.; Sharpe, R.; Brown, R.; Garver, D.; Judge, R.; Caudill, M.; Rickard, J.; Davis, M.; et al. Community Air Sensor Network (CAIRSENSE) project: Evaluation of low-cost sensor performance in a suburban environment in the southeastern United States. Atmos. Meas. Tech. 2016, 9, 5281–5292.
Lewis, A.C.; Lee, J.D.; Edwards, P.M.; Shaw, M.D.; Evans, M.J.; Moller, S.J.; Smith, K.R.; Buckley, J.W.; Ellis, M.; Gillot, S.R.; et al. Evaluating the performance of low cost chemical sensors for air pollution research. Faraday Discuss. 2016, 189, 85–103.
Spinelle, L.; Gerboles, M.; Villani, M.G.; Aleixandre, M.; Bonavitacola, F. Field calibration of a cluster of low-cost commercially available sensors for air quality monitoring. Part B: NO, CO and CO₂. Sens. Actuators B Chem. 2017, 238, 706–715.
Han, P.; Mei, H.; Liu, D.; Zeng, N.; Tang, X.; Wang, Y.; Pan, Y. Calibrations of low-cost air pollution monitoring sensors for CO, NO₂, O₃, and SO₂. Sensors 2021, 21, 256.
Müller, M.; Graf, P.; Meyer, J.; Pentina, A.; Brunner, D.; Perez-Cruz, F.; Hüglin, C.; Emmenegger, L. Integration and calibration of non-dispersive infrared (NDIR) CO₂ low-cost sensors and their operation in a sensor network covering Switzerland. Atmos. Meas. Tech. 2020, 13, 3815–3834.
Shusterman, A.A.; Teige, V.E.; Turner, A.J.; Newman, C.; Kim, J.; Cohen, R.C. The BErkeley Atmospheric CO2 Observation Network: Initial evaluation. Atmos. Chem. Phys. Discuss. 2016, 16, 13449–13463.
Andersen, T.; Scheeren, B.; Peters, W.; Chen, H. A UAV-based active AirCore system for measurements of greenhouse gases. Atmos. Meas. Tech. 2018, 11, 2683–2699.
Kunz, M.; Lavric, J.; Gasche, R.; Gerbig, C.; Grant, R.H.; Koch, F.-T.; Schumacher, M.; Wolf, B.; Zeeman, M. Surface flux estimates derived from UAS-based mole fraction measurements by means of a nocturnal boundary layer budget approach. Atmos. Meas. Tech. 2020, 13, 1671–1692.
Miech, J.A.; Stanton, L.; Gao, M.; Micalizzi, P.; Uebelherr, J.; Herckes, P.; Fraser, M.P. Calibration of low-cost NO2 sensors through environmental factor correction. Toxics 2021, 9, 281.
Nowack, P.; Konstantinovskiy, L.; Gardiner, H.; Cant, J. Machine learning calibration of low-cost NO₂ and PM₁₀ sensors: Non-linear algorithms and their impact on site transferability. Atmos. Meas. Tech. 2021, 14, 5637–5655.
D’Elia, G.; Ferro, M.; Sommella, P.; De Vito, S.; Ferlito, S.; D’Auria, P.; di Francia, G. Influence of concept drift on metrological performance of low-cost NO₂ sensors. IEEE Trans. Instrum. Meas. 2022, 71, 1–11.
Jain, S.; Presto, A.A.; Zimmerman, N. Spatial modeling of daily PM_2.5, NO₂, and CO concentrations measured by a low-cost sensor network: Comparison of linear, machine learning, and hybrid land use models. Environ. Sci. Technol. 2021, 55, 8631–8641.
Ionascu, M.-E.; Castell, N.; Boncalo, O.; Schneider, P.; Darie, M.; Marcu, M. Calibration of CO, NO₂, and O₃ using Airify: A low-cost sensor cluster for air quality monitoring. Sensors 2021, 21, 7977.
Bi, J.; Stowell, J.; Seto, E.Y.W.; English, P.B.; Al-Hamdan, M.Z.; Kinney, P.L.; Freedman, F.R.; Liu, Y. Contribution of low-cost sensor measurements to the prediction of PM_2.5 levels: A case study in Imperial County, CA, USA. Environ. Res. 2020, 180, 108810.
van Zoest, V.; Osei, F.B.; Stein, A.; Hoek, G. Calibration of low-cost NO₂ sensors in an urban air quality network. Atmos. Environ. 2019, 210, 66–75.
De Vito, S.; Delli Veneri, P.; Esposito, E.; Salvato, M.; Bright, V.; Jones, R.L.; Popoola, O. Dynamic multivariate regression for on-field calibration of high speed air quality chemical multi-sensor systems. In Proceedings of the XVIII AISEM Annual Conference, Trento, Italy, 3–5 February 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1–3.
Masson, N.; Piedrahita, R.; Hannigan, M. Quantification method for electrolytic sensors in long-term monitoring of ambient air quality. Sensors 2015, 15, 27283–27302.
Esposito, E.; De Vito, S.; Salvato, M.; Bright, V.; Jones, R.L.; Popoola, O. Dynamic neural network architectures for on field stochastic calibration of indicative low cost air quality sensing systems. Sens. Actuators B Chem. 2016, 231, 701–713.
Wang, Z.; Xie, C.; Liu, B.; Jiang, Y.; Li, Z.; Tai, H.; Li, X. Self-adaptive temperature and humidity compensation based on improved deep BP neural network for NO₂ detection in complex environment. Sens. Actuators B Chem. 2022, 362, 131812.
BeagleBone® Blue, BeagleBoard. Available online: https://www.beagleboard.org/boards/beaglebone-blue (accessed on 19 February 2025).
Datasheet SPS30, Particulate Matter Sensor for Air Quality Monitoring and Control, Sensirion. Available online: https://sensirion.com/products/catalog/SPS30 (accessed on 7 April 2025).
SGX-7NO2 Datasheet, Industrial Nitrogen Dioxide (NO2) Sensor, SGX Sensortech. Available online: https://www.sgxsensortech.com/content/uploads/2021/10/DS-0338-SGX-7NO2-datasheet.pdf (accessed on 19 February 2025).
Four Electrode NO2 Sensor, SemaTech (7E4-NO2-5) (PN: 057-0400-200), SemeaTech Inc. Available online: https://www.semeatech.com/uploads/datasheet/7series/057-0400-200_EN.pdf (accessed on 19 February 2025).
Datasheet MiCS-2714 1107 rev 6, SGX Sensortech. Available online: https://www.sgxsensortech.com/content/uploads/2014/08/1107_Datasheet-MiCS-2714.pdf (accessed on 19 February 2025).
Humidity Sensor BME280, Bosch Sensortec. Available online: https://www.bosch-sensortec.com/products/environmental-sensors/humidity-sensors-bme280/ (accessed on 19 February 2025).
ARMAG Foundation: Home. Available online: https://armaag.gda.pl/en/index.htm (accessed on 19 February 2025).
Map Data from OpenStreetMap. Available online: https://openstreetmap.org/copyright (accessed on 19 February 2025).
Vang-Mata, R. (Ed.) Multilayer Perceptrons; Nova Science Pub. Inc.: New York, NY, USA, 2020.
Dlugosz, S. Multi-Layer Perceptron Networks for Ordinal Data Analysis; Logos Verlag: Berlin, Germany, 2008.
Hagan, M.T.; Menhaj, M. Training feed-forward networks with the Marquardt algorithm. IEEE Trans. Neural Netw. 1994, 5, 989–993.
Bingler, A.; Bilicz, S.; Csörnyei, M. Global sensitivity analysis using a kriging metamodel for EM design problems with functional outputs. IEEE Trans. Magn. 2022, 58, 1–4.
Diago-Mosquera, M.; Aragón-Zavala, A.; Azpilicueta, L.; Shubair, R.; Falcone, F. A 3-D indoor analysis of path loss modeling using kriging techniques. IEEE Antennas Wirel. Propag. Lett. 2022, 21, 1218–1222.
Zhan, D.; Xing, H. A fast kriging-assisted evolutionary algorithm based on incremental learning. IEEE Trans. Evol. Comp. 2021, 25, 941–955.
Yu, S.; Li, Y. Active learning kriging model with adaptive uniform design for time-dependent reliability analysis. IEEE Access 2021, 9, 91625–91634.
Sinha, A.; Shaikh, V. Solving bilevel optimization problems using kriging approximations. IEEE Trans. Cybern. 2022, 52, 10639–10654.
Song, Z.; Wang, H.; He, C.; Jin, Y. A kriging-assisted two-archive evolutionary algorithm for expensive many-objective optimization. IEEE Trans. Evol. Comp. 2021, 25, 1013–1027.
Karagulian, F.; Barbiere, M.; Kotsev, A.; Spinelle, L.; Gerboles, M.; Lagler, F.; Redon, N.; Crunaire, S.; Borowiak, A. Review of the performance of low-cost sensors for air quality monitoring. Atmosphere 2019, 10, 506.
Cordero, J.-M.; Borge, R.; Narros, A. Using statistical methods to carry out in field calibrations of low cost air quality sensors. Sens. Actuators B Chem. 2018, 267, 245–254.
Bigi, A.; Mueller, M.; Grange, S.K.; Ghermandi, G.; Hueglin, C. Performance of NO, NO₂ low cost sensors and three calibration approaches within a real world application. Atmos. Meas. Tech. 2018, 11, 3717–3735.
Spinelle, L.; Gerboles, M.; Villani, M.G.; Aleixandre, M.; Bonavitacola, F. Field calibration of a cluster of low-cost available sensors for air quality monitoring. Part A: Ozone and nitrogen dioxide. Sens. Actuators B Chem. 2015, 215, 249–257.

Figure 1. Autonomous ambient monitoring unit: (a) block diagram (the picture shows the main functional units of the platform along with the peripherals, specifically, sensors, communication devices, and environmental parameter detectors); (b) sensors incorporated into the proposed autonomous measurement platform.

Figure 2. Proposed measurement equipment: inside view, panels (a,b) show top and bottom layers, respectively, and (c) exploded view.

Figure 3. Base stations for reference data acquisition: (a) the exact locations and (b) a photo of the selected station along with the developed hardware unit. The stations are established by the ARMAG Foundation, and according to its terminology, they are dubbed AM1, AM3, and AM8. Maps are provided by OpenStreetMap [49].

Figure 4. NO₂ measurements from (a) the low-cost sensor being calibrated (y_s) and (b) the reference sensor (y_r); the sensor’s auxiliary outputs are surplus NO₂ data (S₁ and S₂), inside and outside temperature (T_i and T_o, respectively) and humidity (H_i and H_o, respectively), and atmospheric pressure (P); (c) data supplied by the reference station and autonomous monitoring platforms of Section 2.

Figure 5. Sensor calibration: (a) general structure with the calibration unit producing the correction coefficients C(y_s,z_s,p), (b) calibration procedure: Correction coefficients (obtained based on auxiliary and sensor data y_s) serve to assess the calibrated output y_c (for comparison, we will also consider a version without y_s as a calibration input). Observe that the correction model is placed between the (raw) low-cost sensor and the overall output of the measurement system. In actual realization, the calibration model is used to provide correction coefficients, which are utilized to create the said final output.

Figure 6. Reference and sensor measurements: selected sub-sets of training samples. Compare the magnitude of data changes, which is considerably higher for reference readings than for the sensor counterparts. This is indicative of potential advantages of multiplicative scaling using A > 1 (scaling coefficient). The data shown consist of six-week measurement results extracted from the acquired datasets and corresponding to the three reference stations considered in this study and the low-cost sensors allocated in their vicinity.

Figure 7. Affine correction of sensor readings. As indicated in the text, the presented scheme is a mixture of additive correction and multiplicative scaling, which are balanced by the coefficient α, determined using the initial experiments. The correction coefficients are obtained by solving regression problem (6), which becomes the loss function to be minimized while training the calibration model.

Figure 8. Main calibration model in the form of a neural network (MLP comprising three fully connected hidden layers). The surrogate’s inputs are environmental parameters (temperature, humidity) as well as raw, low-cost sensor readings. Based on these data, the (trained) model provides predictions of the multiplicative and additive correction coefficients A and D, respectively.

Figure 9. Surrogate modeling using kriging interpolation. The model consists of two parts, a regression (trend) function and localized variations from the trend Z(x). The trend function is typically a low-order polynomial (here, of the second order), whereas Z(x) is a linear combination of radially symmetric basis functions (here, Gaussian), with the scaling parameters determined through maximum likelihood estimation.
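The structure described in the caption (polynomial trend plus Gaussian-correlated deviations) can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation: the function names, the synthetic data, and the fixed correlation parameter `theta` are assumptions made for illustration, and `theta` is set by hand here rather than found by maximum likelihood estimation.

```python
import numpy as np

def quad_basis(x):
    """Second-order polynomial trend basis [1, x, x^2] for 1-D inputs."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, x**2])

def gauss_corr(xa, xb, theta):
    """Gaussian correlation exp(-theta * (xa - xb)^2) between two 1-D point sets."""
    d = np.asarray(xa, dtype=float)[:, None] - np.asarray(xb, dtype=float)[None, :]
    return np.exp(-theta * d**2)

def fit_kriging(x, y, theta=1.0, nugget=1e-10):
    """Fit trend coefficients (generalized least squares) and deviation weights."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    F = quad_basis(x)
    # A tiny nugget on the diagonal keeps the correlation matrix well conditioned.
    R = gauss_corr(x, x, theta) + nugget * np.eye(len(x))
    Ri_F = np.linalg.solve(R, F)
    Ri_y = np.linalg.solve(R, y)
    trend_coef = np.linalg.solve(F.T @ Ri_F, F.T @ Ri_y)
    weights = np.linalg.solve(R, y - F @ trend_coef)
    return {"x": x, "theta": theta, "trend_coef": trend_coef, "weights": weights}

def predict_kriging(model, x_new):
    """Prediction = trend(x) + correlation-weighted deviations Z(x)."""
    r = gauss_corr(x_new, model["x"], model["theta"])
    return quad_basis(x_new) @ model["trend_coef"] + r @ model["weights"]

# Synthetic 1-D example (hypothetical data, not from the measurement campaign).
x_train = np.linspace(0.0, 4.0, 9)
y_train = np.sin(2.0 * x_train) + 0.3 * x_train**2
model = fit_kriging(x_train, y_train, theta=2.0)
y_at_train = predict_kriging(model, x_train)
y_between = predict_kriging(model, np.array([0.25, 1.75, 3.3]))
```

Because kriging is an interpolant, `y_at_train` reproduces `y_train` up to the small nugget, while `y_between` gives predictions between the training points.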

Figure 10. Illustration of the global data correlation enhancement procedure: (a) selected subset of training data (reference and corrected sensor); (b) the same samples arranged w.r.t. increasing NO₂ reference values (left) and the respective scatter plot (right); observe a systematic level-dependent offset (although samples in (a) are well aligned); (c) the same data subject to the global data correlation enhancement procedure (systematic offset has been greatly reduced, and symmetry of the scatter plot has been improved).
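The diagnostic view of panel (b), in which samples are ordered by increasing reference value to expose a level-dependent offset, can be sketched as follows. This is only an illustration of the sorting-and-binning idea; the function name `level_offset`, the number of bins, and the synthetic data are assumptions, and the code does not implement the enhancement procedure itself.

```python
import numpy as np

def level_offset(y_ref, y_cal, n_bins=4):
    """Mean (reference - calibrated) offset in equally populated bins
    of samples sorted by increasing reference value."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_cal = np.asarray(y_cal, dtype=float)
    order = np.argsort(y_ref)
    diff = (y_ref - y_cal)[order]
    return np.array([chunk.mean() for chunk in np.array_split(diff, n_bins)])

# Synthetic data (hypothetical): the calibrated output compresses the
# reference range around 25, which creates a level-dependent offset.
rng = np.random.default_rng(0)
y_ref = rng.uniform(5.0, 45.0, 400)
y_cal = 25.0 + 0.7 * (y_ref - 25.0) + rng.normal(0.0, 1.0, 400)
offsets = level_offset(y_ref, y_cal)
```

In this synthetic case the offsets grow from negative values in the lowest bin to positive values in the highest bin, which is the kind of systematic trend that the ordered plot makes visible.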

Figure 11. The effects of global data correlation enhancement shown for smoothed calibrated sensor data. As can be observed, upon correlation improvement, the reference and sensor data are considerably better aligned, which translates into an increased correlation coefficient and a lower RMSE. The global correction is analytically described by Equation (13), with correction coefficients obtained through a regression process as in (14).

Figure 12. Flowchart of the calibration process. (Local) calibration coefficients are rendered by the composition of the NN and kriging models, which take into account the actual NO₂ sensor output y_s and vector z_s (the models are combined using the convex combination factor β). Next, the outcome y_c is rendered using the affine correction of the sensor reading. Finally, global data correlation enhancement is applied, producing the corrected reading y_c.G. Thus, the cheap sensor readings undergo a series of enhancements, which are applied sequentially. Again, the correction coefficients are functions of the raw sensor readings and environmental parameters corresponding to the current measurement (to be calibrated), whereas global correction is computed based on the entire available training dataset.
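The first two stages of the flowchart can be sketched under simplifying assumptions: the coefficient predictions of the two models are blended with the convex combination factor β, and the reading is corrected as y_c = A·y_s + D. Which model β weights, the plain affine form, and all numerical values are assumptions made for illustration; the balancing by α and the global enhancement stage are omitted.

```python
import numpy as np

def combine_coefficients(coef_nn, coef_kriging, beta=0.7):
    """Convex combination of two coefficient predictions.
    Assumption: beta weights the NN output, (1 - beta) the kriging output."""
    coef_nn = np.asarray(coef_nn, dtype=float)
    coef_kriging = np.asarray(coef_kriging, dtype=float)
    return beta * coef_nn + (1.0 - beta) * coef_kriging

def affine_correction(y_s, A, D):
    """Assumed affine form: multiplicative scaling A and additive shift D."""
    return A * np.asarray(y_s, dtype=float) + D

# Hypothetical raw readings and per-sample coefficient predictions.
y_s = np.array([10.0, 12.0])
A = combine_coefficients([1.5, 1.6], [1.7, 1.4])   # approx. [1.56, 1.54]
D = combine_coefficients([2.0, 1.0], [1.0, 3.0])   # approx. [1.70, 1.60]
y_c = affine_correction(y_s, A, D)                 # approx. [17.30, 20.08]
```

Note that A and D are computed per sample, so each reading receives its own correction, in line with the caption's statement that the coefficients depend on the current measurement.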

Figure 13. Selected NO₂ data subsets: reference and the respective uncorrected sensor outputs: (a) training data, (b) testing data. As can be noticed, the discrepancy between the low-cost sensor and reference readings is significant, especially in terms of the amplitude. It is also fairly consistent across the time period corresponding to the measurement campaign.

Figure 14. Definitions of correlation coefficient r and RMSE. Both r and RMSE are used as the primary performance indicators to evaluate the reliability of the proposed calibration strategy, as well as to compare its different variations. RMSE has been selected as a relevant metric of the absolute error, which gives an idea of the corrected sensor dependability as compared to the typical ranges of measured NO₂ concentration.
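For reference, the two indicators can be computed as in the sketch below, which uses the standard Pearson correlation coefficient and root-mean-square error. The function names and the four sample values are illustrative assumptions, not data from the study.

```python
import numpy as np

def correlation_coefficient(y_ref, y_cal):
    """Pearson correlation coefficient r between reference and sensor data."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_cal = np.asarray(y_cal, dtype=float)
    d_ref = y_ref - y_ref.mean()
    d_cal = y_cal - y_cal.mean()
    return float(np.sum(d_ref * d_cal) / np.sqrt(np.sum(d_ref**2) * np.sum(d_cal**2)))

def rmse(y_ref, y_cal):
    """Root-mean-square error, in the same units as the data."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_cal = np.asarray(y_cal, dtype=float)
    return float(np.sqrt(np.mean((y_ref - y_cal) ** 2)))

# Hypothetical readings in ug/m^3.
y_ref = [10.0, 20.0, 30.0, 40.0]
y_cal = [12.0, 18.0, 33.0, 37.0]
r_value = correlation_coefficient(y_ref, y_cal)
rmse_value = rmse(y_ref, y_cal)   # sqrt(6.5), roughly 2.55
```

The correlation coefficient is dimensionless and insensitive to a constant offset or scaling, whereas RMSE reports the error in concentration units, which is why the two are used together.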

Figure 15. Sensor correction (Setup 2 of Table 1): (a) training data subsets; (b) testing data (uncorrected and corrected sensor readings are shown using green and blue lines, reference marked black); (c) scatter plots of the training and testing samples (shown in the left and right panels, respectively) (gray and black colors indicate uncorrected and corrected samples). The results indicate considerable improvement of the alignment between the calibrated sensor and the reference data in comparison to the raw sensor, which is also observed on the scatter plots. Observe that the testing data are completely separate from the training data and have not been used for calibration model identification.

Figure 16. Sensor correction (Setup 5 of Table 1): (a) training data subsets; (b) testing data (uncorrected and corrected sensor readings are shown using green and blue lines, reference marked black); (c) scatter plots of the training and testing samples (shown in the left and right panels, respectively) (gray and black colors indicate uncorrected and corrected samples). The results indicate considerable improvement of the alignment between the calibrated sensor and the reference data in comparison to the raw sensor, which is also observed on the scatter plots. At the same time, it can be observed that Setup 5 provides noticeably better results than Setup 2 shown in Figure 15, which is primarily due to the incorporation of the auxiliary kriging surrogate.

Figure 17. Sensor correction (Setup 9 of Table 1): (a) training data subsets; (b) testing data (uncorrected and corrected sensor readings are shown using green and blue lines, reference marked black); (c) scatter plots of the training and testing samples (shown in the left and right panels, respectively) (gray and black colors indicate uncorrected and corrected samples). The results indicate considerable improvement of the alignment between the calibrated sensor and the reference data in comparison to the raw sensor, which is also observed on the scatter plots. At the same time, it can be observed that Setup 9 provides noticeably better results than both Setup 2 (Figure 15) and Setup 5 (Figure 16), which is due to the concurrent employment of all correction mechanisms, including global enhancement (cf. Section 3.5).

Figure 18. Sensor calibration (Setup 9): (top) the complete training and (bottom) testing dataset, arranged based on ascending NO₂ concentrations. The visualization permits highlighting the benefits coming from calibration, especially pushing the corrected outputs towards their reference data counterparts (w.r.t. the raw data). It can be noticed that due to global correction (cf. Section 3.5), the calibrated sensor data are symmetrical with respect to the reference data. The lack of perfect symmetry for the testing data (bottom panel) is minor and caused by the fact that the testing data corresponds to relatively long periods of time outside the training intervals.

Figure 19. Absolute error histograms (y_r − y_c, reference vs. corrected data) for aggregated testing samples (values in μg/m³): (top) Setup 2 (mean: −1.4, standard deviation: 4.2), (middle) Setup 5 (mean: −1.0, standard deviation: 3.8), and (bottom) Setup 9 (mean: −0.5, standard deviation: 3.3). The distribution means are shown using solid vertical lines, and standard deviations are marked using dashed lines. The histograms indicate that the calibrated low-cost sensor errors concentrate closer to zero for the more complex calibration setups. Also, the mean value becomes very close to zero for Setup 9, which is due to the employment of the global response correction.
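The summary statistics quoted in the caption relate to RMSE through the identity RMSE² = mean² + (standard deviation)², which holds for the population standard deviation of the error. The sketch below checks the identity on synthetic errors and evaluates it for the Setup 5 values from the caption; the function name and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def error_summary(y_ref, y_cal):
    """Mean, population standard deviation, and RMSE of the error y_ref - y_cal."""
    err = np.asarray(y_ref, dtype=float) - np.asarray(y_cal, dtype=float)
    return float(err.mean()), float(err.std()), float(np.sqrt(np.mean(err**2)))

# Synthetic readings (hypothetical): the corrected sensor slightly overestimates.
rng = np.random.default_rng(1)
y_ref = rng.uniform(5.0, 45.0, 500)
y_cal = y_ref + rng.normal(1.0, 3.8, 500)
mean_err, std_err, rmse = error_summary(y_ref, y_cal)

# Histogram of the signed error, as used for the plots in the figure.
counts, edges = np.histogram(y_ref - y_cal, bins=20)

# Setup 5 values from the caption: mean -1.0, standard deviation 3.8.
rmse_setup5 = float(np.hypot(-1.0, 3.8))   # roughly 3.93 ug/m^3
```

For the Setup 5 figures, the identity gives about 3.9 μg/m³.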

Table 1. Calibration setups considered in verification studies. The table indicates the type of calibration models used (NN or a combination of NN with kriging), specifies the calibration inputs and whether the raw low-cost sensor readings are incorporated as an input, and indicates utilization of the global correlation enhancement scheme (cf. Section 3.5).

Calibration Setup	Calibration Model	Input Variables: Supplementary Data	Input Variables: NO₂ Measurements from Main Sensor (y_s)	Global Data Correlation Enhancement
1	NN	Restricted (only T_o, T_i, H_o, and H_i)	NO	NO
2	NN	Restricted (z_s without pressure P)	NO	NO
3	NN + kriging ¹	Restricted (z_s without pressure P)	NO	NO
4	NN	Restricted (z_s without pressure P)	YES	NO
5	NN + kriging ¹	Restricted (z_s without pressure P)	YES	NO
6	NN	Complete z_s	YES	NO
7	NN + kriging ¹	Complete z_s	YES	NO
8	NN	Complete z_s	YES	YES
9	NN + kriging ¹	Complete z_s	YES	YES

¹ Models combined using the convex combination factor β = 0.7.

Table 2. Sensor calibration performance. The table reports the values for the training data (columns two and three) and the testing data (columns four and five) for each calibration setup considered in Table 1.

Calibration Setup	Training Data: Correlation Coefficient r	Training Data: RMSE [μg/m³]	Testing Data: Correlation Coefficient r	Testing Data: RMSE [μg/m³]
1	0.82	4.0	0.70	5.6
2	0.89	3.0	0.81	4.3
3	0.95	2.2	0.82	4.4
4	0.91	2.8	0.84	4.0
5	0.95	2.0	0.85	3.9
6	0.93	2.5	0.86	3.9
7	0.96	1.8	0.86	3.8
8	0.94	2.4	0.878	3.6
9	0.96	1.7	0.883	3.5

Table 3. Benchmarking: LR and direct ANN/CNN-based prediction.

Calibration Method	Training Data: Correlation Coefficient r²	Training Data: RMSE [μg/m³]	Testing Data: Correlation Coefficient r²	Testing Data: RMSE [μg/m³]
Linear regression S(z_s)	0.28	7.8	0.07	9.9
Linear regression S_y(z_s, y_s)	0.66	5.4	0.56	6.8
Direct ANN ^#-based prediction (z_s)	0.77	4.4	0.26	8.8
Direct ANN ^#-based prediction (z_s and y_s)	0.83	3.8	0.61	6.4
Direct CNN ^$-based prediction (z_s and y_s) (convolution layers: 32, 16, 8)	0.50	6.5	0.29	8.6
Direct CNN ^$-based prediction (z_s and y_s) (convolution layers: 64, 32, 16)	0.72	4.8	0.45	7.6
Direct CNN ^$-based prediction (z_s and y_s) (convolution layers: 128, 64, 32)	0.77	4.5	0.42	7.7

^# ANN uses the same architecture as described in Section 3. ^$ CNN architecture uses filters of the size 4 × 1 × 1 and three convolution layers followed by a fully connected layer of 64 neurons, as well as batch normalization and ReLU layers in between the convolution layers. CNN is trained using the ADAM algorithm with a mini-batch size of 1000.

Table 4. Cost breakdown of the autonomous measurement platform (per unit).

No.	Name of the Component/Module	Approximate Cost at Unit Production	Lifetime
1.	SPS30 particulate matter (PM) sensor (Sensirion [43])	USD 60	>10 years
2.	SGX-7NO₂: NO₂ electrochemical sensor (SGX Sensortech [44])	USD 80	>24 months
3.	7E4-NO₂: NO₂ electrochemical sensor (SemaTech [45])	USD 140	3 years
4.	MiCS 2714: compact MOS ambient quality sensor (SGX Sensortech [46]) for NO₂ and hydrogen detection	USD 16	not applicable
5.	BME280: environmental sensor (Bosch Sensortec [47]) capable of detecting air temperature and humidity together with atmospheric pressure (2 pieces)	USD 14	10 years
6.	BeagleBone Blue microcomputer board	USD 140	n/a
7.	Minor passive components, supplementary modules and accessories	USD 300	n/a
	Total cost of hardware	USD 750
	Electricity (per year)	USD 25
	GSM transmission costs, IoT rate of the 1nce operator (per year) (www.1nce.com, accessed on 19 February 2025)	USD 11

Table 5. The approximate costs of reference stations equipped with professional analyzers. Cost estimation is based on information acquired from ARMAG [48].

No.	Name	Approximate Cost	Remarks
1.	Air-conditioned container, without the measurement equipment (similar to the one shown in Figure 3b)	USD 25,000	purchase cost
2.	NO-NO₂-NO_x analyzer, e.g., API T200	USD 25,000	purchase cost
3.	PM analyzer (PM₁₀, PM_2.5, PM₁), e.g., GRiMM EDM 280	USD 37,000	purchase cost
4.	Service and maintenance of the analyzers	USD 500	cost per year
5.	Electricity (measurement equipment, air-conditioning, heating)	USD 2800	cost per year
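The figures of Tables 4 and 5 can be put side by side with a few lines of arithmetic. The sketch below only sums the tabulated values; the dictionary keys are shortened labels and the variable names are introduced here for illustration.

```python
# Hardware cost of the low-cost platform (Table 4), in USD.
platform_hardware_usd = {
    "SPS30 PM sensor": 60,
    "SGX-7NO2 sensor": 80,
    "7E4-NO2 sensor": 140,
    "MiCS 2714 sensor": 16,
    "BME280 sensors (2 pieces)": 14,
    "BeagleBone Blue board": 140,
    "Passive components, modules, accessories": 300,
}
platform_total_usd = sum(platform_hardware_usd.values())        # 750
platform_yearly_usd = 25 + 11                                   # electricity + GSM

# Reference station with professional analyzers (Table 5), in USD.
station_purchase_usd = 25_000 + 25_000 + 37_000                 # container + analyzers
station_yearly_usd = 500 + 2_800                                # service + electricity

purchase_ratio = station_purchase_usd / platform_total_usd      # 116.0
yearly_ratio = station_yearly_usd / platform_yearly_usd         # about 91.7
```

Based on these tabulated estimates, the purchase cost of a reference station is roughly two orders of magnitude higher than the hardware cost of a single platform, and the yearly running costs differ by a similar factor.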

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

