Reliability Validation of a Low-Cost Particulate Matter IoT Sensor in Indoor and Outdoor Environments Using a Reference Sampler

A suitable and quick determination of air quality allows the population to be alerted to high concentrations of pollutants. Recent advances in computer science have led to the development of a large number of low-cost sensors, improving the spatial and temporal resolution of air quality data and increasing the effectiveness of risk assessment. The main objective of this work is to validate a particulate matter (PM) sensor (HM-3301) in indoor and outdoor environments for measuring PM2.5 and PM10 concentrations. To date, this sensor has not been evaluated in real-world situations, and its data quality has not been documented. Here, the HM-3301 sensor is integrated into an Internet of Things (IoT) platform to establish a permanent Internet connection. The validation is carried out using a reference sampler (LVS3 from Derenda) according to EN12341:2014. The study focuses on statistical analysis; environmental conditions are not considered. The ordinary Linear Model, the Generalized Linear Model, Locally Estimated Scatterplot Smoothing, and the Generalized Additive Model are used to compare and contrast the outcomes. The low-cost sensor is highly correlated with the reference measurements (R² greater than 0.70), especially for PM2.5, for which a very high accuracy value is obtained. In addition, there is a positive relationship between the two measurements, which can be appropriately fitted with the Locally Estimated Scatterplot Smoothing model.


Introduction
Numerous scientific studies show that air pollution is harmful to public health [1][2][3][4][5][6]. Therefore, evaluating pollutant concentrations is essential for risk assessment [7]. Timely knowledge of air quality allows the population to be alerted to high concentrations of pollutants. One pollutant that poses a high risk is particulate matter (PM), owing to its great potential to reach the inner part of the lung [8]. The reference method established by the European Union to determine these pollutants (EN12341:2014) is based on collecting particles on filters with 24-h exposure (Directive 2008/50/EC [9], amended by Commission Directive 2015/1480/EC [10]). This hinders rapid action during episodes of high pollutant concentrations. In addition, the high cost of reference equipment limits its use, so there are fewer monitoring stations.
Recent advances in low-cost sensors and computer science, such as the Internet of Things (IoT) movement and open, low-cost hardware, have made it possible to deploy sensing platforms at costs acceptable for mass deployment. The push towards smart cities has also increased the adoption of these sensing devices [11]. Environmental monitoring is no exception: these advances have allowed the installation of a larger number of outdoor air quality monitoring solutions, improving both spatial and temporal resolution [12] and increasing the effectiveness of risk assessment.
This work focuses on monitoring particle levels in the air, and several sensing solutions for this purpose are currently on the market. Most of these sensors are based on optical light scattering: a laser illuminates the particles, and Mie theory is applied to the scattered light to determine particle size [13]. In this way, different particle size fractions such as PM2.5 (particles smaller than 2.5 µm) and PM10 (particles smaller than 10 µm) can be measured. These sensors can be installed in both indoor and outdoor environments [14]. They are compact and have low power consumption (and support a sleep mode). They can operate at high sampling rates (even higher than professional solutions), and their price ranges from 5 to 100 euros [15].
PM sensors are widely used in particle detection instruments, smart appliances, indoor and outdoor air monitoring, and clean room evaluation. In the literature, these sensors have been used in a long list of applications [16], for example air quality testing equipment, air purifiers and air conditioners, dust and smoke detection and analysis, industrial PM analysis, multichannel particle counters, and environmental testing equipment. Several low-cost PM sensors are on the market, such as the Sharp GP2Y1010 [17], Wuhan Cubic PM3007 [18], Plantower PMS1003 [19], Shinyei PPD42NS [20], Nova SDS011 [21], and HM-3301 [22] (used in this work). These sensors can be deployed by non-profit organizations and citizen scientists [23]. However, the accuracy and reliability of the sensors must be evaluated in a comprehensive and repeatable manner under real-world conditions before they are deployed in large numbers [24]. There are few validation studies of this type of PM2.5 sensor in the literature, and their approaches do not address different time scales or different indoor and outdoor environments [25,26].
This work aims to validate one of these dust sensors (HM-3301) for PM2.5 and PM10, which has not been done in prior research. For this purpose, extensive tests were carried out over several time periods in indoor and outdoor environments. The HM-3301 sensor was integrated into an IoT platform called Sense Our Environment (SEnviro) [27] to provide full connectivity. The validation was carried out using a reference sampler (LVS3 from Derenda) according to EN12341:2014. The comparison focuses on the statistical point of view; environmental conditions are not considered.
In this context, the primary objective of the present work is to determine how well the low-cost, lightweight, and portable HM-3301 performs as a particle counting device. This sensor captures trends in ambient PM2.5 and PM10 concentrations, but essential properties of its response have yet to be demonstrated. A further aim is to validate the sensor against the LVS3 sampler and to analyze the differences between the two sets of data. Another objective is to apply different methodologies to model these differences, such as Linear Models (LMs), Generalized Linear Models (GLMs), Locally Estimated Scatterplot Smoothing (LOESS), and Generalized Additive Models (GAMs).

Low-Cost Sensor
As indicated in the previous section, a PM sensor was integrated into a sensing node platform called Sense Our Environment (SEnviro) [27]. SEnviro is a low-cost, autonomous solution for environmental monitoring. In 2015, several SEnviro nodes were deployed and evaluated on the campus of Jaume I University [28]. Those nodes monitored meteorological and air quality phenomena. A new version of the SEnviro node platform was published in 2018 [29][30][31] to monitor small vineyards. This version provides several improvements over the 2015 version, including 3G connectivity, the possibility of changing the behavior of the IoT node through Over-The-Air (OTA) updates, more efficient power management, and more appropriate techniques for delivering observations using the Message Queuing Telemetry Transport (MQTT) protocol. The current work is based on the latest version of SEnviro and adds the PM sensor for monitoring indoor and outdoor environments. The adaptation of the PM sensor to the IoT node is also detailed.
From the hardware point of view (how the node is built), and following the modules defined in [29] (Core, Sensors/Actuators, Power Supply, and Communication), a new 3G microcontroller called Particle Boron [32] is used. This microcontroller improves on its predecessor (Particle Electron) in speed and natively supports a mesh configuration. The communication module uses 3G to connect the IoT node to the server side. Power is supplied by a battery and a permanent (wired) power supply. A wired supply is used because wireless, battery-only operation is not required for sensor validation, and it increases the performance and reliability of the platform. For this research, the sensor module contains only the PM sensor described in Section 2.1.1, which is connected to the microcontroller using a Grove connector and the I2C communication protocol. Finally, a 3D-printed enclosure was designed to house and protect all the electronic components (Figure 1). At the behavioral level, and following the same previous work [29], the IoT node has a modular design formed by seven modules (Control, Basic Config, Communication, Sensing, Acting, Energy Savings, and Update Mode). These seven modules are implemented as in that work, except that energy saving is disabled because the node has a continuous power supply. To validate the sensor, the sensing module is configured to take observations every 15 min. In addition to the storage mechanisms used in SEnviro, the observations are stored in a Google Drive spreadsheet in real time to facilitate the validation analysis.

The Laser Particulate Matter Sensor: HM-3301
The PM sensor included in the IoT node is the HM-3301 (Figure 2), a new-generation laser dust sensor developed by Huaman Electronics. The HM-3301 is designed for continuous, real-time detection of dust in indoor and outdoor air. The main difference from the previous, pump-based generation is that the HM-3301 uses fan blades to drive air; the air flowing through the detection chamber is used as a test sample for real-time, continuous measurement of dust particles of different sizes. The sensor follows standards such as ISO 21501-4, ISO 14644-1, and FS209E. It supports a six-channel output of 0.3 µm, 0.5 µm, 1.0 µm, 2.5 µm, 5.0 µm, and 10.0 µm, and it is composed of a fan, a condensing mirror, an infrared laser source, a photosensitive tube, and a signal-amplifying and sorting circuit. The sensor communicates with a microcontroller through either the I2C or the UART interface. Table 1 summarizes the PM sensor features.

The HM-3301 is based on Mie scattering theory [13]. When light encounters particles whose size is comparable to (or greater than) its wavelength, the light is scattered (Figure 3). The scattered light is focused on a highly sensitive photodiode, and the resulting signal is amplified and analyzed by a circuit. Using a specific mathematical model and algorithm, the number concentration and mass concentration of the dust particles are obtained. In this work, the PM2.5 and PM10 mass concentrations were used.
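To make the I2C data path more concrete, the following Python sketch decodes one raw HM-3301 data frame. The frame layout is an assumption taken from Seeed Studio's Grove HM3301 example code and should be checked against the HM-3301 datasheet: 29 bytes, read as fourteen big-endian 16-bit words followed by a checksum byte equal to the low 8 bits of the sum of the first 28 bytes, carrying standard-particle and atmospheric-environment PM1.0/PM2.5/PM10 values plus six particle-count channels. `parse_hm3301` is a hypothetical helper, not part of SEnviro, and which mass-concentration variant the node reported is not stated here.

```python
import struct

FRAME_LEN = 29  # assumed: 28 data bytes + 1 checksum byte
COUNT_CHANNELS = ("0.3um", "0.5um", "1.0um", "2.5um", "5.0um", "10um")


def parse_hm3301(frame: bytes) -> dict:
    """Decode one HM-3301 I2C frame (hypothetical helper; layout assumed)."""
    if len(frame) != FRAME_LEN:
        raise ValueError(f"expected {FRAME_LEN} bytes, got {len(frame)}")
    # Assumed checksum rule: low byte of the sum of bytes 0..27 equals byte 28.
    if (sum(frame[:28]) & 0xFF) != frame[28]:
        raise ValueError("checksum mismatch")
    # Bytes 0..27 are read as fourteen big-endian unsigned 16-bit words.
    words = struct.unpack(">14H", frame[:28])
    return {
        "sensor_id": words[1],
        # Standard-particle mass concentrations, in µg/m³.
        "pm1_0_std": words[2],
        "pm2_5_std": words[3],
        "pm10_std": words[4],
        # Atmospheric-environment mass concentrations, in µg/m³.
        "pm1_0_atm": words[5],
        "pm2_5_atm": words[6],
        "pm10_atm": words[7],
        # Particle counts for the six size channels.
        "counts": dict(zip(COUNT_CHANNELS, words[8:14])),
    }
```

For offline testing, a frame can be synthesized with `struct.pack(">14H", ...)` followed by the computed checksum byte; a corrupted checksum makes the helper raise `ValueError` instead of silently returning bad values.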

Reference Sampler: LVS3
The LVS3 sampler unit from Derenda was used to collect particles from ambient air in compliance with EN12341:2014 (Figure 4). It was set up as a control unit in combination with a filter changer. This mechanism uses a 4-m³ rotary vane vacuum pump to draw the particle-laden air into an upstream head. The particles are then separated by size in an upstream impactor and deposited on a quartz fiber filter with a diameter of 47 mm. The air throughput is monitored with a measuring orifice located between the filter and the vacuum pump. The technical parameters of the LVS3 sampler are shown in Table 2. PM2.5 and PM10 concentrations were determined by the gravimetric method, which consists of weighing the filter before and after sampling. According to EN12341:2014, the filters must be conditioned for at least 48 hours in a chamber at 50% relative humidity and 20 °C. The filters were weighed on an analytical balance with a precision of 0.1 mg. The PM concentration levels were determined using Equation (1):

$$C_{PM} = \frac{(P_m - P_v) \cdot 10^{6}}{V_{air}} \qquad (1)$$

where C_PM is the particle concentration in µg/m³; Pm is the weight of the sampled filter in g; Pv is the empty weight of the filter in g; Vair is the volume of air pumped in m³; and the factor 10^6 converts grams to micrograms.
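The gravimetric calculation reduces to a one-line computation. The sketch below assumes filter weights in grams and the pumped volume in m³. The 24-h volume in the example assumes the nominal 2.3 m³/h flow rate of EN12341 low-volume samplers, which is not stated in this section, and the filter weights are made-up illustrative numbers.

```python
def pm_concentration_ug_m3(loaded_g: float, empty_g: float, air_volume_m3: float) -> float:
    """Gravimetric PM concentration C_PM = (Pm - Pv) * 1e6 / Vair, in µg/m³."""
    if air_volume_m3 <= 0:
        raise ValueError("air volume must be positive")
    collected_ug = (loaded_g - empty_g) * 1e6  # grams -> micrograms
    return collected_ug / air_volume_m3


# Illustrative only: 24 h at an assumed 2.3 m³/h gives about 55.2 m³ of air.
volume = 2.3 * 24
c = pm_concentration_ug_m3(loaded_g=0.1012, empty_g=0.1000, air_volume_m3=volume)
```

Under the same assumed volume, the 0.1 mg balance precision corresponds to a concentration resolution of roughly 100 µg / 55.2 m³ ≈ 1.8 µg/m³, which is worth keeping in mind when comparing low daily concentrations.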

Sampling Conditions
The current work aims to validate the low-cost sensor in two different environments (indoors and outdoors) to learn how it behaves across different ranges of PM2.5 and PM10 concentrations. One unit of this inexpensive sensor and two reference samplers (one in each environment) sampled simultaneously at the same location, so external factors affecting the measurements were not considered. During the indoor period, the low-cost sensor and the reference sampler were placed inside an office of Jaume I University with no human presence (Figure 5). For the outdoor validation, Vila-real, an industrial city in the eastern region of the province of Castellón (Spain), was chosen. This province is a strategic area in the framework of European Union pollution control. The area has a high concentration of ceramic tile factories, forming an essential cluster for this industry [33] (Figure 6). Indoor and outdoor measurements were not carried out concurrently.

Statistical Models Used for Validation
Linear models are relatively simple to describe and fit, and they are easy to interpret. In particular, the GLM is a flexible generalization of ordinary linear regression that accommodates response variables whose error distribution differs from the normal distribution. These models relate the linear predictor to the response variable through a link function and allow the variance of each measurement to be a function of its predicted value [34,35].
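As an illustration of the two parametric models, the sketch below fits the reference concentration against the sensor reading with ordinary least squares and with a GLM fitted by iteratively reweighted least squares (IRLS). The section does not state which GLM family or link was used; the Gamma family with a log link is assumed here only because concentrations are positive and their variance tends to grow with the mean. For that choice the IRLS working weights are all equal to 1, so each iteration reduces to an ordinary least-squares fit on the working response. All function names are hypothetical.

```python
import math


def ols(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two or more paired points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def r_squared(x, y, b0, b1):
    """Coefficient of determination of a fitted straight line."""
    my = sum(y) / len(y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y has no variance")
    return 1.0 - ss_res / ss_tot


def glm_gamma_log(x, y, max_iter=50, tol=1e-10):
    """Assumed GLM: Gamma family, log link, log(E[y]) = b0 + b1*x, fitted by IRLS."""
    if any(yi <= 0 for yi in y):
        raise ValueError("Gamma GLM requires strictly positive responses")
    eta = [math.log(yi) for yi in y]  # start from the log of the data
    b0 = b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(e) for e in eta]
        # Working response z = eta + (y - mu) * d(eta)/d(mu); for the log link that is 1/mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Working weights (d(mu)/d(eta))^2 / V(mu) = mu^2 / mu^2 = 1, so plain OLS suffices.
        new_b0, new_b1 = ols(x, z)
        done = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        eta = [b0 + b1 * xi for xi in x]
        if done:
            break
    return b0, b1
```

In practice a statistics package would be used for these fits; the point of the sketch is that the GLM differs from the LM only through the link function and the mean-variance relationship, exactly as described above.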
However, such models may have limited predictive capacity if the relationship between the variables is more complex. Models that capture non-linear relationships while maintaining a high level of interpretability include LOESS and GAM. LOESS is a nonlinear regression, typically with a single predictor; like splines, it fits the data region by region, but it differs in that the local regions can overlap. GAM extends this kind of smoothing to multiple predictors. In this study, an ordinary LM, a GLM, LOESS, and a GAM are used to compare and contrast the results.
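LOESS can be sketched in a few lines: for each evaluation point, take the nearest fraction `span` of the data, weight those points with the tricube kernel, and fit a weighted straight line. This is a minimal degree-1 version written for illustration; R's `loess()`, for example, defaults to `span = 0.75` but uses local quadratic fits (degree 2), and the study's exact settings are not given in this section.

```python
import math


def loess(x, y, x_eval, span=0.75):
    """Degree-1 LOESS: local weighted linear fits with tricube weights."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three paired points")
    k = min(n, max(3, math.ceil(span * n)))  # neighbourhood size
    fitted = []
    for x0 in x_eval:
        dist = [abs(xi - x0) for xi in x]
        h = max(sorted(dist)[k - 1], 1e-12)  # distance to the k-th nearest point
        # Tricube weights; points at or beyond distance h get zero weight.
        w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dist]
        if sum(w) == 0:  # all neighbours tied at distance h: weight them equally
            w = [1.0 if d <= h else 0.0 for d in dist]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0  # fall back to a local mean
        fitted.append(my + slope * (x0 - mx))
    return fitted
```

Because each local fit only sees nearby points, the curve can bend where the sensor-reference relationship changes, which is why LOESS can follow the points more closely than a single global line.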

Data Description
As discussed, the low-cost sensor was validated using a reference sampler unit. The HM-3301 PM sensor samples sequentially, taking measurements at a configurable interval of x minutes (15 in our case), whereas the reference sampler performs daily measurements using one-day filters. The two sets of observations therefore have different granularity, and this difference must be handled before validation.
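Because the sensor reports every 15 min and the sampler once per day, one straightforward way to align them is to average the sensor readings per calendar day and keep only the days that also have a filter result. The section does not specify the aggregation rule; daily means, calendar-day boundaries, and the 75% completeness threshold (72 of 96 possible readings) used below are assumptions, and the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from statistics import mean

READINGS_PER_DAY = 24 * 60 // 15  # 96 readings at a 15-min interval


def daily_means(observations, min_fraction=0.75):
    """Average (timestamp, value) readings per calendar day, dropping sparse days."""
    buckets = defaultdict(list)
    for ts, value in observations:
        buckets[ts.date()].append(value)
    min_count = min_fraction * READINGS_PER_DAY
    return {day: mean(vals) for day, vals in buckets.items() if len(vals) >= min_count}


def pair_with_reference(sensor_daily, reference_daily):
    """Return (day, sensor_mean, reference_value) for days present in both series."""
    common = sorted(set(sensor_daily) & set(reference_daily))
    return [(day, sensor_daily[day], reference_daily[day]) for day in common]


# Example: one complete day (96 readings) and one incomplete day (50 readings).
start = datetime(2019, 10, 16)
obs = [(start + timedelta(minutes=15 * i), 10.0) for i in range(96)]
obs += [(start + timedelta(days=1, minutes=15 * i), 30.0) for i in range(50)]
pairs = pair_with_reference(daily_means(obs), {date(2019, 10, 16): 12.0})
```

Dropping incomplete days avoids comparing a filter that integrated 24 h of air with a sensor mean based on only part of the day; if the sampler's filter changes at a time other than midnight, the day boundaries would need to be shifted accordingly.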
The low-cost PM sensor was installed on 17 September 2019, and data were collected until 12 November 2019, although there were some days of inactivity. During this period, the sensor was located in both indoor and outdoor environments, giving three distinct periods: the first, from 17 September to 14 October, in an indoor environment (Figure 7a); the second, from 15 October to 30 October, in an outdoor environment (Figure 7b); and the third, from 31 October to 12 November, in an indoor environment (Figure 7c). Both indoor periods took place in the same university office. In total, 3677 observations were collected during these three periods at a rate of one every 15 min. Each observation contains a timestamp and PM2.5 and PM10 values. The dataset with all collected observations is published in Zenodo [36].

In addition, two reference sampler units (one for PM10 and one for PM2.5) were operated on several days within the HM-3301 measurement periods. Table 3a summarizes the data collected with the PM2.5 LVS3 sampler, and Table 3b the data for PM10.

Analysis and Modeling
First, to gain insight into how the data are distributed and how strongly the values obtained by the HM-3301 sensor (for PM2.5 or PM10) relate to those of the LVS3 sampler, an exploratory analysis was carried out for each pollutant by means of a scatter plot (Figure 8). Although the number of points is small, a positive trend between the two measurements can be observed. Second, the results of applying the different models to compare the LVS3 sampler and the HM-3301 sensor for PM2.5 and PM10 (indoor and outdoor) are shown in Figure 9. These graphs also include grey bands showing the 95% confidence interval. In most cases, the points fall within the limits of the interval, although LOESS seems to fit the points best.

In addition, Figure 10 shows the LOESS model for the PM2.5 and PM10 values obtained by the HM-3301 sensor, which presents a correlation of 0.8495. The relationship appears approximately linear, and both the correlation value and the graph indicate a suitable fit between the two measurements. Taken separately against the reference sampler, the correlations are 0.9143 for PM2.5 and 0.8870 for PM10, which are very similar values. In all cases the fit is good, with correlations close to 1. After this first joint analysis, the indoor and outdoor environments are distinguished. The results for PM2.5 are shown in Figure 11. Although there are few points, a clear trend can be observed, with correlations of 0.8905 and 0.7306 for the indoor and outdoor data, respectively.
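The correlations reported above are presumably Pearson coefficients computed on the paired values; the section does not name the coefficient, so that is an assumption. The sketch below computes Pearson's r and the usual test statistic t = r·sqrt(n − 2)/sqrt(1 − r²), which is compared with a Student's t distribution with n − 2 degrees of freedom. For a simple linear fit with an intercept, R² equals r².

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two or more paired values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("requires n >= 3 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
```

With few paired days, even a sizeable |r| may not be significant: in a hypothetical case with five days and r = 0.80, t ≈ 2.31, below the two-sided 5% critical value of about 3.18 for 3 degrees of freedom.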
A similar process was applied to the PM10 values. Considering all the values together (indoor and outdoor measurements), a correlation of 0.8870 is obtained, and again LOESS seems to provide the best fit. When indoor and outdoor data are distinguished (Figure 12), the correlations are −0.4644 and 0.7720, respectively. For the indoor data, the small number of samples means the correlation is not significant, which explains the much lower, negative value.

Finally, the accuracy of the data, which indicates how similar the low-cost sensor measurements are to the reference sampler values, is calculated using Equation (2) [37,38], where S is the average concentration obtained by the sensor throughout the testing period and M is the average concentration measured by the reference instrument during the testing period. The value was 80.4337 for PM2.5 and 56.5942 for PM10. Since this measure is interpreted as a percentage, these values indicate that for PM2.5 the sensor measurements reach about 80% agreement with the reference measurements, while for PM10 the accuracy is lower. Thus, in our case, the sensor is more accurate for PM2.5 than for PM10.
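Equation (2) itself is not reproduced in this section. A form commonly used in low-cost sensor evaluations, and consistent with the definitions of S and M above, is A = 100 · (1 − |S − M| / M); the sketch below assumes that form, so it should be read as an illustration rather than the exact formula of [37,38].

```python
from statistics import mean


def accuracy_percent(sensor_values, reference_values):
    """Assumed accuracy A = 100 * (1 - |S - M| / M) from period-average concentrations."""
    s = mean(sensor_values)     # S: average sensor concentration over the test period
    m = mean(reference_values)  # M: average reference concentration over the test period
    if m <= 0:
        raise ValueError("reference mean must be positive")
    return 100.0 * (1.0 - abs(s - m) / m)
```

Under this assumed form, over-reading and under-reading by the same amount are penalized equally, whereas a plain ratio S/M would reward a sensor that over-reads.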
The main results of the study are: (1) the HM-3301 sensor provides values very close to those of the reference sampler (LVS3 from Derenda), with correlations greater than 0.70 (except for the indoor PM10 subset); (2) all the measurements show quite high linearity against the officially measured concentrations of PM2.5 and PM10; and (3) validation of the data recorded directly by the three sensors increased the R² value. These results confirm that this kind of low-cost sensor is viable for PM2.5 and PM10 monitoring under certain environmental conditions.

Discussion and Conclusions
The HM-3301 sensor has drawn attention since its release because of its low cost for measuring PM mass concentration, which is frequently used as an indicator of air quality. Until now, this sensor had not been thoroughly evaluated in real-world conditions, and its data quality was not well documented. In this study, accurate monitoring of indoor and outdoor particulate matter mass concentrations was achieved. The PM2.5 and PM10 fractions assessed in our study are crucial for human health risk assessment. An IoT platform called SEnviro was used to provide an Internet connection and transmit observations in real time.
The low cost of these sensors and the possibility of calibrating them for field use [23] make this new tool advantageous for reliably obtaining continuous exposure estimates over broad areas. The results of this paper suggest that if such sensors are deployed indoors or outdoors, their measurements can be expected to be reasonably accurate and precise. The capacity to obtain detailed, continuous exposure information from a large number of study participants may provide additional insight into short-term exposure to respirable PM2.5 or PM10. However, the applicability of the sensor to personal monitoring studies largely depends on whether its upper limit of detection might be exceeded. The validation of the HM-3301 sensor against the LVS3 reference sampler, according to EN12341:2014, demonstrated that the low-cost sensor measurements are highly correlated with the reference measurements, especially for PM2.5, for which a very high accuracy value was found. In addition, there was generally a positive relationship between the two types of measurements, which was adequately fitted by the LOESS model.
A limitation of the presented work, especially in the outdoor environment, is that the study covered a relatively small number of days without substantial variations in environmental conditions (although there were several days of continuous rain). As noted in [26], data captured by this type of sensor should be treated with caution when the relative humidity exceeds 80% (according to the technical parameters of the HM-3301, this threshold is 90%). For this reason, meteorological parameters will be included in the statistical models in future work. Longer working periods are also needed to evaluate long-term performance: although the manufacturer claims a two-year lifespan in outdoor environments, this must be tested. A long working period can cause problems in the sensor response. One is the formation of films on the optical sensor lens. Another issue could arise if the sensor is located in high-concentration environments, where it is likely to become saturated and fail to capture the correct particulate values. Both limitations can be addressed by applying correction techniques to the results obtained [26].
In conclusion, the analyzed sensor (HM-3301) is suitable for complementing regulatory networks that monitor outdoor (and indoor) PM and for improving the spatial and temporal resolution of PM2.5 and PM10 data. Its low cost and suitability for outdoor use are advantages for reliably obtaining continuous exposure estimates over larger areas such as large cities.
As future work, a longer monitoring period is planned to cover a broader range of weather conditions and to test the long-term stability of the sensors. A follow-up study is proposed that will evaluate sensor performance using at least a one-year time series and will add more low-cost sensors across a wide variety of environmental conditions and pollution regimes. Another line of work should address whether and how other covariates could be included, for instance, how temperature or humidity affect the sensor and its measurements. These possibilities could reduce the limitations of this kind of sensor. The small size, low power requirements, and ability of the HM-3301 sensor to be integrated into other devices such as GPS units or mobile phones make it a promising tool for personal monitoring studies and for providing data to support new research. Finally, emerging machine learning techniques [39] will be explored to validate this kind of low-cost sensor.