Performance Assessment of a Low-Cost PM2.5 Sensor for a near Four-Month Period in Oslo, Norway

The very low-cost Nova particulate matter (PM) sensor SDS011 has recently drawn attention for measuring PM mass concentration, which is frequently used as an indicator of air quality. However, this sensor has not been thoroughly evaluated in real-world conditions and its data quality is not well documented. In this study, three SDS011 sensors were evaluated by co-locating them at an official air quality monitoring station equipped with reference-equivalent instrumentation in Oslo, Norway. The sensors’ PM2.5 measurements were compared with data from the air quality monitoring station over a period of almost four months. Five performance aspects of the sensors were examined: operational data coverage, linearity of response and accuracy, inter-sensor variability, dependence on relative humidity (RH) and temperature (T), and potential improvement of sensor accuracy by data calibration using a machine-learning method. The results of the study are: (i) the three sensors provide quite similar results, with inter-sensor correlations exhibiting R values higher than 0.97; (ii) all three sensors demonstrate quite high linearity against officially measured concentrations of PM2.5, with R² values ranging from 0.55 to 0.71; (iii) high RH (over 80%) negatively affected the sensor response; (iv) data calibration using only the RH and T recorded directly at the three sensors increased the R² value from 0.71 to 0.80, 0.68 to 0.79, and 0.55 to 0.76. The results demonstrate the general feasibility of using these low-cost SDS011 sensors for indicative PM2.5 monitoring under certain environmental conditions. Within these constraints, they further indicate that there is potential for deploying large networks of such devices, due to the sensors’ relative accuracy, size and cost. This opens up a wide variety of applications, such as high-resolution air quality mapping and personalized air quality information services.
However, it should be noted that the sensors often exhibit very high relative errors for hourly values and that there is a high potential for misuse of these types of sensors if they are applied outside the manufacturer-provided specifications, particularly regarding relative humidity. Furthermore, our analysis covers only a relatively short time period, and longer-term studies covering a wider range of meteorological conditions are desirable.


Introduction
Particulate matter (PM) is one of the major airborne pollutants in urban environments and one of the most problematic air pollutants in terms of its negative effects on human health [1]. The effects of PM on human health, which have been widely studied in the last twenty years, include asthma, lung cancer and cardiovascular issues [2,3]. Generally, the level of health effects from PM is related to the size of the particles. For instance, PM up to 10 micrometres (μm) in diameter (PM10) can penetrate into the bronchi. PM up to 2.5 μm (PM2.5, fine particles) can penetrate the lungs, while ultrafine [...] respectively. The performance characteristics of the available low-cost sensors should be well known before their deployment for sensor-based management of air pollution [10].
This study is one of the major tasks within the EU H2020 project hackAIR (www.hackair.eu) [31], which is building a collective, awareness-raising platform for outdoor air quality with pilots in local communities in Norway and Germany. In this study, PM2.5 measurements from a set of three low-cost PM sensor units (SDS011) were compared against tapered element oscillating microbalance (TEOM) observations made at an air quality monitoring station in Oslo, Norway. The TEOM device is a well-characterised instrument that is commonly used in air quality monitoring. We assessed the performance of the three units over a nearly four-month period in winter and spring 2018.

Nova PM Sensor SDS011
The SDS011 sensor is a quite recent air quality sensor developed by Inovafit, a spin-off from the University of Jinan, Shandong province, P.R. China [17] (Figure 1). The technology is based on laser diffraction theory, where the particle density distribution is derived from the light intensity distribution patterns [21,29]. The sensor has a digital output and a built-in fan (Figure 2) and can measure the particle density distribution between 0.3 and 10 μm in the air. A built-in algorithm converts the particle density distribution into particle mass. A short technical specification of the SDS011 is given in Table 1.

Sensor Co-Location and Measurement Site Description
Three SDS011 sensors were co-located at the official air quality monitoring station in Kirkeveien, Oslo, Norway (59°55'56'' N; 10°43'28'' E), which is a road-side station (Figure 3). Road transport is the dominant emission source of PM in the region.
All three sensors were connected to a DHT22 digital temperature and humidity sensor, which measured the relative humidity (RH) and temperature (T) inside the sensor casing. The sensor casing was adopted from the luftdaten.info platform [32]. A reference-equivalent instrument, a TEOM 1405 FDMS (filter dynamics measurement system), which has been calibrated against the true reference Kleinfiltergeraet, operates at the Kirkeveien measurement station. The TEOM 1405 FDMS is a TEOM 1405 DF with an added FDMS unit. The TEOM 1405 DF is a dichotomous analyser that measures coarse and fine particles concurrently on two microbalances; in principle, it is two TEOMs in parallel. The inlet airflow is split by a virtual impactor. The FDMS unit compensates for any losses of volatile organic and inorganic compounds on the particles. The inlet air is passed through a drier to avoid condensation. Mass in the dry air is left to accumulate on the filter for 6 min and the mass concentration (MCbase) is measured. Then the dry sample air is diverted through a filter and a chiller at 5 °C to remove all particles and volatile compounds from the air stream. The clean air is sampled on the filter for another 6 min and the mass concentration (MCref) is measured. The total mass concentration after 12 min is calculated as MC = MCbase − MCref [33]. While the TEOM is not a true reference instrument, the uncertainties resulting from the calibration against the Kleinfiltergeraet are so much smaller than the uncertainties of the SDS011 sensors that they are unlikely to have any significant effect on our analysis.

Data Preparation
PM2.5 data measured by the instrumentation at the official air quality monitoring station were used from 11 December 2017 to 31 March 2018. During this period, one sensor system (S1), which used an SD card for data storage, recorded data every 30 seconds, while the other two sensor systems (S2 and S3), which used the luftdaten.info approach, recorded data every 2.5 minutes [32]. The PM2.5 mass concentration from each sensor and from the official monitoring station was provided at an hourly time scale.
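The step from raw readings at two different logging intervals to a common hourly series can be sketched as follows. This is a minimal illustration, not the processing used in the study: the text does not state how hourly values were formed, so a plain arithmetic mean per clock hour is assumed, and the function name `hourly_means` and all readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def hourly_means(records):
    """Average (timestamp, value) records into clock-hour bins.

    Returns a dict mapping the start of each hour to the mean value.
    Assumption: a simple arithmetic mean per hour.
    """
    bins = defaultdict(list)
    for timestamp, value in records:
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        bins[hour_start].append(value)
    return {hour: mean(values) for hour, values in sorted(bins.items())}


# Hypothetical readings: one unit logs every 30 s, another every 2.5 min (150 s).
start = datetime(2017, 12, 11, 0, 0, 0)
s1 = [(start + timedelta(seconds=30 * i), 10.0) for i in range(240)]  # two hours
s2 = [(start + timedelta(seconds=150 * i), 12.0) for i in range(48)]  # two hours

s1_hourly = hourly_means(s1)
s2_hourly = hourly_means(s2)
```

Once both series share the same hourly keys, they can be paired hour by hour with the reference data regardless of the original logging interval.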

Data Analysis
The manufacturer limits the operating range of the SDS011 sensor in terms of RH to 0%-70% (see Table 1). In many countries, average RH lies consistently above this threshold for significant periods of the year, and in practice many projects and initiatives nonetheless use the sensors outside the official operating range. We are aware of the manufacturer specifications and the physical reasons behind this restriction; nevertheless, we evaluate the performance of the sensors over the entire range of RH to reflect the ongoing practical use of such sensors under real-world conditions. The aim is to quantify the uncertainty of the sensors when they are used outside the manufacturer-provided operating range for RH. We also provide validation results restricted to the manufacturer-provided RH operating range to show the performance of the sensor during appropriate use.
Five performance aspects were examined: operational data coverage of the sensor systems, linearity of the response and accuracy, inter-sensor variability, dependence on air RH and T, and potential accuracy improvement using data calibration with a machine-learning method.
The linearity of response between the SDS011 sensors and the official air quality monitoring station was assessed using linear regression, where the data from the official air quality monitoring station was the independent variable and the SDS011 sensor data the dependent variable. An R² value close to 1 reflects a very good linearity of the sensor response in comparison with the official instrumentation. A small R² value indicates a poor linear relationship.
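The regression described above can be sketched as an ordinary least-squares fit with the reference as the independent variable. This is an illustration only: the study's analysis was carried out in R, the function name `linear_fit` is hypothetical, and the values below are invented rather than taken from the study.

```python
from statistics import mean


def linear_fit(reference, sensor):
    """Ordinary least squares of sensor (dependent) on reference (independent).

    Returns (slope, intercept, r_squared).
    """
    x_mean, y_mean = mean(reference), mean(sensor)
    sxx = sum((x - x_mean) ** 2 for x in reference)
    syy = sum((y - y_mean) ** 2 for y in sensor)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(reference, sensor))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # For a single predictor, R^2 equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical hourly PM2.5 values in ug/m3 (not data from the study).
reference = [2.0, 5.0, 8.0, 12.0, 20.0, 35.0]
sensor = [2.5, 4.0, 7.5, 9.0, 17.0, 28.0]
slope, intercept, r_squared = linear_fit(reference, sensor)
```

A slope below 1 in such a fit means that the sensor reading rises by less than the reference does, which is the pattern of underestimation at higher concentrations.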
Accuracy is the degree of closeness between the sensors' measured values and the reference value. Here, the long-term averaged data accuracy A (in %) is defined as follows [34,35]:

$$A = 100 - \frac{|X - R|}{R} \times 100 \qquad (1)$$

where X is the average concentration measured by the sensors throughout the testing period and R is the average concentration measured by the official air quality monitoring station during the testing period. The higher the positive value (percentage), the higher the sensor's accuracy. For example, a value of 100% implies that the sensors measure exactly what the reference instrument measures. In cases where the sensors overestimate the reference instrument by more than 100%, Equation (1) yields a negative accuracy value [34,35].
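The behaviour of the long-term accuracy can be illustrated with a short sketch. It assumes the form A = 100 − |X − R| / R × 100, which matches the properties stated above (100% for exact agreement, negative values when the sensors overestimate the reference by more than 100%). The function name `long_term_accuracy` and the numbers are hypothetical.

```python
from statistics import mean


def long_term_accuracy(sensor, reference):
    """Accuracy in percent, assuming A = 100 - |X - R| / R * 100,

    where X and R are the period means of sensor and reference.
    """
    x_bar = mean(sensor)
    r_bar = mean(reference)
    return 100.0 - abs(x_bar - r_bar) / r_bar * 100.0


# Hypothetical values in ug/m3 (not data from the study).
reference = [10.0, 10.0, 10.0]
exact = long_term_accuracy([10.0, 10.0, 10.0], reference)   # exact agreement
low = long_term_accuracy([8.0, 9.0, 10.0], reference)       # 10% low on average
high = long_term_accuracy([25.0, 25.0, 25.0], reference)    # 150% overestimate
```

Because only period means enter the calculation, positive and negative hourly errors can cancel, so a high long-term accuracy does not imply small hourly errors.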
Inter-sensor variability describes how close the measurements from three units of the same sensor type are to each other. It is evaluated through a set of descriptive statistical parameters, such as mean, range, and standard deviation. For a set of three sensors, the inter-sensor variability is reported as a percentage and is calculated as follows [34,35]:

$$\text{Inter-sensor variability} = \frac{\text{Mean-highest} - \text{Mean-lowest}}{\text{Mean-average}} \times 100$$

where Mean-highest is the highest of the three sensors' average concentrations, Mean-lowest is the lowest of the three sensors' average concentrations, and Mean-average is the average of the three sensors' average concentrations.

The impact of RH and T on the sensor response was tested by analysing the relationship between the observed PM2.5 sensor error (sensor observation minus reference data) and air temperature as well as RH for the three sensors. A Loess fit [36] was used to better illustrate the relationship.
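The inter-sensor variability defined above, together with a pairwise correlation coefficient between units, can be sketched as follows. The percentage is assumed to be (Mean-highest − Mean-lowest) / Mean-average × 100, in line with the definitions given. The function names and all values are hypothetical.

```python
from math import sqrt
from statistics import mean


def inter_sensor_variability(sensor_means):
    """(highest mean - lowest mean) / average of the means, in percent."""
    spread = max(sensor_means) - min(sensor_means)
    return spread / mean(sensor_means) * 100.0


def pearson_r(a, b):
    """Pearson correlation coefficient between two paired series."""
    a_mean, b_mean = mean(a), mean(b)
    cov = sum((x - a_mean) * (y - b_mean) for x, y in zip(a, b))
    var_a = sum((x - a_mean) ** 2 for x in a)
    var_b = sum((y - b_mean) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical hourly PM2.5 values in ug/m3 for three units (not study data).
s1 = [5.0, 8.0, 12.0, 20.0]
s2 = [4.5, 7.0, 11.0, 18.5]
s3 = [4.0, 6.5, 10.0, 17.0]

variability = inter_sensor_variability([mean(s1), mean(s2), mean(s3)])
r_s1_s2 = pearson_r(s1, s2)
```

The two measures answer different questions: the correlation shows whether the units move together in time, while the variability percentage shows how far their long-term means differ.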
Multiple linear regression (MLR) [37] and a machine-learning method (Random Forest) [38] were used to illustrate the potential improvement in sensor data accuracy obtained by correcting for the effects of RH and T. Both variables were measured by the DHT22 sensor, which was located right beside the SDS011 sensor within the same sensor casing.
All data analyses were carried out in the R environment for statistical computing and visualization [39].

All measurements were conducted under varying meteorological conditions. Figure 5 illustrates that the PM2.5 data from the three sensors follow similar patterns over the nearly four-month period, indicating that they respond similarly to varying environmental conditions. Qualitatively, no significant drift of the signal was observed for any of the three sensor systems over the study period. The PM2.5 concentrations at the hourly time scale ranged from 0.4 μg/m³ to 127.5 μg/m³. The sensor systems were exposed to T between −14.0 °C and +11.4 °C and to RH between 15.4% and 99.5% (Figure 5). The operation of the three tested SDS011 sensors was stable throughout the almost four-month study period, and no obvious errors in terms of data availability or failures of electronic parts of the sensors were observed under these meteorological conditions.

Linearity of the Response and Accuracy
Data from the three SDS011 sensor systems were compared with data from the official air quality monitoring station over a nearly four-month period (11 December 2017-31 March 2018) (Figure 6, Table 2). The results show that the PM sensors responded consistently to the concentrations measured at the reference monitoring station. The three sensors demonstrated a substantial degree of correlation against the official reference instrument at the air quality monitoring station, with R² values equal to 0.71 (S1), 0.68 (S2), and 0.55 (S3), respectively. This result is consistent with a similar study carried out in Wroclaw, Poland by Badura et al. 2018 [30]. As can be seen in Table 2, the slope of all three regression models is slightly below 1, indicating a general underestimation of the PM2.5 mass for all three units, particularly at higher pollution levels. Furthermore, the mean error is generally below 2 μg/m³ and the RMSE (root-mean-square error) is less than 6 μg/m³ for all three units. Sensor system S3 shows the worst overall performance of the three units. The three sensors demonstrated satisfactory to comparatively high data accuracy of the long-term mean concentration, with values of 98.16%, 86.82% and 80.76%, respectively (Table 3). The long-term data accuracy averaged over the three sensors was 88.58%.
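The error statistics quoted above can be sketched as follows. The sign convention for the mean error (sensor minus reference) is an assumption carried over from the way the sensor error is defined for the RH and T analysis. The function names and the paired values are hypothetical; only the three accuracy percentages are taken from the text.

```python
from math import sqrt
from statistics import mean


def mean_error(sensor, reference):
    """Mean of (sensor - reference); negative values mean underestimation."""
    return mean(s - r for s, r in zip(sensor, reference))


def rmse(sensor, reference):
    """Root-mean-square error between paired sensor and reference values."""
    return sqrt(mean((s - r) ** 2 for s, r in zip(sensor, reference)))


# Hypothetical hourly PM2.5 values in ug/m3 (not data from the study).
reference = [4.0, 6.0, 8.0]
sensor = [3.0, 5.0, 9.0]
me = mean_error(sensor, reference)
error_rmse = rmse(sensor, reference)

# Arithmetic check on the reported long-term accuracies of S1, S2, and S3.
average_accuracy = mean([98.16, 86.82, 80.76])
```

The last line reproduces the reported three-sensor average of 88.58% from the individual accuracies.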

Intersensor Variability
The inter-sensor variability over the almost four-month study period (11 December 2017-31 March 2018) was analysed. The three sensors provide quite similar results and do not vary substantially (Figure 4, Table 4, Figure 7), with an inter-sensor variability of around 9.64%, calculated using the definition given in the Data Analysis section.

Influence of Relative Humidity and Air Temperature
The sensors were exposed to T in the range of −14.0 °C to +11.4 °C and to RH in the range of about 15.4% to 99.5%. These parameters were measured by the DHT22 sensor located beside the SDS011 sensor, within the same sensor casing. Therefore, the measurements of T and RH are independent and are not affected by the data availability of the sensors or by electronic parts of the sensors.
Most low-cost sensors for air quality, such as the Alphasense OPC-N2 [23,30], Plantower PMS7003 [19,20,30], and Nova SDS011 [21,30], are to some extent influenced by the ambient environmental conditions. Therefore, we explored the observed PM2.5 error as a function of T and RH (Figure 8). Figure 8 shows how the PM2.5 sensor error (calculated as the hourly mean sensor observation minus the hourly mean observation from the TEOM instrument) varies with T and RH. While all of the raw hourly data are shown, the overall patterns can be most easily observed by analysing the red line, which represents a Loess fit [40] to the raw data.
Figure 8. Relationship between observed PM2.5 sensor error (measured as sensor observation minus reference data) and air temperature (left Panels (a), (c), (e)) as well as relative humidity (right Panels (b), (d) and (f)) for S1 (top row, Panels (a) and (b)), S2 (middle row, Panels (c) and (d)), and S3 (bottom row, Panels (e) and (f)). Black dots indicate the actual hourly average observations, whereas the coloured surfaces indicate the density of occurring observations, highlighting where the majority of observations are located. The red line represents a Loess fit to the dataset, with the grey area indicating the 95% confidence intervals.
As for the dependence of the PM2.5 sensor error on air T, all three units show similar patterns. For relatively low T, below −5 °C, the errors were either slightly negative on average (S1 and S3) or close to zero (S2). For T around zero degrees, all three units show slightly positive errors between 0 μg/m³ and 5 μg/m³. For higher T, the average PM2.5 error of all three units decreases again. There is no obvious physical reason for this pattern, and we think that the peak at around zero degrees is instead related to high RH values at these temperatures (see also Figure 9).
In terms of the impact of RH on the PM2.5 sensor errors, we can initially observe that there is a similar behaviour for all three units. The errors tend to be quite stable between −5 μg/m³ and 0 μg/m³ for RH levels less than approximately 80%. However, for RH values between 80% and 100% we can see a substantial increase in PM2.5 error for all three units. At close to 100% RH, all three units show positive PM2.5 errors of 10 μg/m³ to 15 μg/m³ on average. While the RH values that occurred during our study period ranged from a low of 15% to nearly 100%, the highest frequency of RH values in this study was found in two clusters of around 80% and 90%, respectively.
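A rough way to inspect the dependence of the error on RH without a smoother is to average the error within RH bins. Note that this is a simpler stand-in and not the Loess fit used for Figure 8; the bin width of 10 percentage points, the function name `mean_error_by_rh_bin`, and all values are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_error_by_rh_bin(rh, sensor, reference, bin_width=10.0):
    """Mean sensor error (sensor minus reference) per RH bin.

    Returns a dict mapping the lower edge of each RH bin to the mean error.
    A reading of exactly 100% is placed in the top bin.
    """
    bins = defaultdict(list)
    for h, s, r in zip(rh, sensor, reference):
        lower_edge = min(h // bin_width * bin_width, 100.0 - bin_width)
        bins[lower_edge].append(s - r)
    return {edge: mean(errors) for edge, errors in sorted(bins.items())}


# Hypothetical hourly values (RH in %, PM2.5 in ug/m3), not data from the study.
rh = [45.0, 55.0, 85.0, 88.0, 95.0, 99.0]
sensor = [4.0, 6.0, 12.0, 14.0, 25.0, 30.0]
reference = [5.0, 7.0, 10.0, 11.0, 12.0, 15.0]

error_by_bin = mean_error_by_rh_bin(rh, sensor, reference)
```

Binned means are sensitive to the choice of bin edges and to sparsely populated bins, which is one reason a local regression such as Loess is often preferred for display.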
In order to better disentangle the effects of T and RH, Figure 9 shows the PM2.5 error of each sensor as a function of both RH and air T at the same time. The most obvious pattern is the substantial cluster of positive errors for high RH (>90%) at T above 0 °C. This pattern exists for all three units, although there are some slight differences in the magnitude of the errors. We can further observe the largest negative errors for low T (<−5 °C) and RH between approximately 40% and 80%. The errors tend to decrease again slightly for even lower T (<−13 °C), but the range of RH is very small in this case and the overall number of samples is very low, making it difficult to draw further conclusions.

The effect of RH and T on sensors' data quality needs to be taken into consideration when using low-cost particle sensors [41,42]. There could be several reasons for the effect of RH on the sensor performance. The most obvious reason is that the low-cost sensor has no system for drying the particles before they enter the optical chamber, which means that aerosol particles as well as fog droplets are counted. This leads to a positive artefact compared to the TEOM. The second reason is particle growth by water vapour condensation. Depending on the chemical composition of the particles, water vapour can condense onto them, so that the particles grow. Because particle mass scales with the radius to the power of three, this growth in particle diameter also leads to a positive artefact compared to the TEOM, which measures dry particles. The third reason is the change in the optical properties of the measured particles when water condenses onto them. A critical parameter when calculating the particle density distribution is the refractive index of the particles. Water vapour condensation changes the imaginary component of the refractive index in the Mie equation. This component is the extinction coefficient of the material, defined as the reduction of transmission of optical radiation caused by absorption and scattering of light; changing it leads to a wrong estimation of the size and therefore of the mass reported by the instrument.
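The cubic dependence of mass on particle size mentioned above can be made explicit. As a simplified sketch, assume spherical particles of diameter d with a uniform density ρ that is unchanged by water uptake (in reality, condensed water also changes the effective density), and let g denote an illustrative diameter growth factor, a symbol introduced here and not used in the study:

```latex
m = \frac{\pi}{6}\,\rho\,d^{3},
\qquad
\frac{m_{\mathrm{wet}}}{m_{\mathrm{dry}}}
  \approx \left(\frac{d_{\mathrm{wet}}}{d_{\mathrm{dry}}}\right)^{3} = g^{3}
```

Under these assumptions, a particle whose diameter grows by 20% (g = 1.2) would be assigned a mass about 1.2³ ≈ 1.73 times its dry mass, so even modest hygroscopic growth can produce a large positive bias in an instrument that infers mass from optical size.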
Based upon these observations of high RH negatively affecting the sensors' response, we retained only the sensor data with RH below 80% and plotted them against the officially measured concentrations of PM2.5 (Figure 10). The results indicate that the three sensors demonstrated an increased degree of correlation against the official reference instrument at the air quality monitoring station (Figure 6), with R² values increasing from 0.71 to 0.80 (S1), 0.68 to 0.79 (S2), and 0.55 to 0.71 (S3), respectively. However, the slope of the regression lines is slightly lower than before for all three units (Figure 6). Furthermore, we retained only the sensor data with RH below 70%, in line with the manufacturer-provided RH operating range (max. RH 70%), and plotted them against the officially measured concentrations of PM2.5 (Figure 11). The results showed that the three sensors demonstrated a decreased degree of correlation against the official reference instrument at the air quality monitoring station, with R² values decreasing from 0.71 to 0.65 (S1), 0.68 to 0.57 (S2), and 0.55 to 0.47 (S3), respectively. The physical reason for this decreased correlation is not entirely clear, but it may be related to the fact that nearly 60% of the data were filtered out. Since this filtering also removes most observations at concentrations greater than 20 μg/m³, the available range of concentrations is decreased significantly, which could be responsible for the decrease in correlation.
Figure 11. Linear regression for 1-hour average PM2.5 values for environmental conditions with RH < 70% from the three SDS011 sensors versus the PM2.5 data from the air quality monitoring station for the entire study period.
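The RH filtering step, and the way it narrows the available concentration range, can be sketched as follows. The function name `filter_by_rh` and all values are hypothetical; the thresholds of 80% and 70% are the ones used in the text.

```python
def filter_by_rh(rh, sensor, reference, rh_max):
    """Keep only the hours with RH below rh_max.

    Returns (kept_sensor, kept_reference, retained_fraction).
    """
    kept = [(s, r) for h, s, r in zip(rh, sensor, reference) if h < rh_max]
    kept_sensor = [s for s, _ in kept]
    kept_reference = [r for _, r in kept]
    return kept_sensor, kept_reference, len(kept) / len(rh)


# Hypothetical hourly values (RH in %, PM2.5 in ug/m3), not data from the study.
rh = [45.0, 55.0, 65.0, 75.0, 85.0, 88.0, 95.0, 99.0]
sensor = [4.0, 6.0, 5.5, 8.0, 12.0, 24.0, 38.0, 45.0]
reference = [5.0, 7.0, 6.0, 9.0, 10.0, 22.0, 25.0, 30.0]

sensor_80, reference_80, fraction_80 = filter_by_rh(rh, sensor, reference, 80.0)
sensor_70, reference_70, fraction_70 = filter_by_rh(rh, sensor, reference, 70.0)

# The highest reference concentration left after filtering shows how much
# of the concentration range survives each threshold.
max_reference_all = max(reference)
max_reference_70 = max(reference_70)
```

In this invented example the high concentrations coincide with high RH, so the stricter threshold removes them; a narrower range of reference values tends to lower R² even when the scatter around the regression line is unchanged.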

Correction for Temperature and Humidity Effects
It is to some extent possible to statistically correct for the effects of RH and T that were shown in the previous section, although such corrections tend to be specific to the location at which the co-location is carried out and cannot easily be transferred to other locations with different conditions. We demonstrate an example of such a correction procedure here using simple multiple linear regression (MLR) [37] and a random forest (RF) model [38]. Figure 12 shows the improvement in sensor accuracy that can be achieved when RH and T are accounted for as part of the calibration. The left column of Figure 12 shows the original out-of-sensor data for all three sensors, whereas the middle column and the right column show the data after calibration using an MLR and an RF model, respectively. Even a simple linear regression improves the accuracy with respect to the reference data somewhat, although the increases in R² are relatively modest. However, using the same dataset with an RF model increases the correlation significantly, explaining roughly 10% more of the variability for sensors S1 and S2 and as much as 20% more of the variability in the case of sensor S3, with R² values increasing from 0.71 to 0.80, from 0.68 to 0.79, and from 0.55 to 0.76, respectively (Figure 12).
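The MLR part of such a correction can be sketched as follows. This is an illustration under stated assumptions, not the model used in the study: the text does not give the model formula, so the reference PM2.5 is assumed to be regressed on the raw sensor PM2.5, RH and T. The random forest step is not shown, all function names are hypothetical, and the data are synthetic values generated from a known linear relation.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so that the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


def fit_mlr(predictors, target):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    rows = [[1.0] + list(p) for p in predictors]
    n_coef = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n_coef)]
           for i in range(n_coef)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(n_coef)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, predictor_row):
    """Corrected PM2.5 for one (sensor_pm25, rh, t) row."""
    return coefficients[0] + sum(
        c * x for c, x in zip(coefficients[1:], predictor_row))


# Synthetic rows of (sensor PM2.5 in ug/m3, RH in %, T in deg C).
predictors = [
    (5.0, 40.0, -10.0), (8.0, 60.0, -5.0), (12.0, 85.0, 0.0),
    (20.0, 95.0, 2.0), (7.0, 70.0, 5.0), (15.0, 50.0, -2.0),
    (30.0, 99.0, 1.0), (3.0, 30.0, 10.0),
]
# Synthetic "reference" built from a known relation, so the fit is checkable.
target = [1.0 + 0.8 * s - 0.05 * h + 0.1 * t for s, h, t in predictors]

coefficients = fit_mlr(predictors, target)
corrected = [predict(coefficients, row) for row in predictors]
```

In practice the coefficients would be fitted on one part of the co-location data and evaluated on a held-out part, since a model evaluated on its own training data overstates the improvement.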
It should be noted that this correction for the effect of air T and RH is only valid for the particular location at which the model was trained. The model depends both on the specific characteristics of the environmental parameters at this site and on the characteristics of the PM that occurs at this site (e.g., particle type, size, etc.). Applying such a correction model at a different location that has either different environmental conditions or different particle characteristics is likely to result in inferior performance.

Figure 13 shows scatter plots of the relative expanded uncertainty as a function of the PM2.5 concentration measured at the air quality monitoring station, following the methodology described by Spinelle et al. (2015) [43]. Based on these plots, two out of the three units (S1 and S2) reach the data quality objective (DQO) of 50% as defined in the European Air Quality Directive [8]. Both sensors reach relative expanded uncertainties [44,45] below 50% at concentrations of approximately 20 μg/m³. S3, however, does not meet the DQO.
Figure 12. Comparison of scatterplots between hourly reference PM2.5 observations and hourly sensor PM2.5 observations for the test dataset of the original out-of-sensor data (left column), after correction for relative humidity and temperature effects using multiple linear regression (middle column), and after correction for relative humidity and temperature effects using a random forest model (right column). The three rows represent the data for S1, S2, and S3, respectively. The individual data points are coloured by the value of relative humidity.

Conclusions
The comparison of three SDS011 sensors with data from an official reference air quality monitoring station demonstrated that the SDS011 sensors generally follow the PM2.5 variability. Linear regression indicated a good correlation between the two datasets, with R² values equal to 0.71 (S1), 0.68 (S2) and 0.55 (S3), respectively, over an almost four-month period in challenging Norwegian winter conditions with frequently high relative humidity. The inter-sensor variability analysis showed that the three sensors provided quite similar results and did not vary substantially from each other, with an inter-sensor variability of around 9.64% and inter-sensor correlation R values higher than 0.97. RH and, to a very small extent, T affect the SDS011 performance. Particularly high RH values (over 80%) cause significant overestimates of the true PM2.5 mass. While the sensors provide generally reasonable estimates of PM2.5 mass out of the box, our results also indicate that a field calibration under representative environmental conditions is highly beneficial for improving the accuracy of the measurements. This study was limited to a relatively low number of months with limited variation in environmental conditions. To cover a wider range of meteorological conditions and to test the long-term stability of the sensors, we are working on a follow-up study that will evaluate the performance of the sensors using a time series of at least one year and sensors located in a wide variety of environmental conditions and pollution regimes.
Despite the reasonably good performance of these sensors, it should be noted that the potential for misuse is nonetheless high, especially when they are used outside of a research environment for citizen science applications and personal air quality monitoring, where the users might not have the knowledge required to adequately judge the uncertainty of the sensors. In such cases, the deployment of these sensors will almost certainly not be confined to environments with RH < 80%, and users receive no specific notification that the readings are unreliable when RH is > 80%, although the manufacturer provides a recommended RH operating range. In addition, the relative uncertainties can be quite high for hourly values, and users should be aware of this limitation and take caution in interpreting such measurements. Nevertheless, considering their very low cost and the overall performance assessment results, we conclude that the SDS011 has significant potential for implementing a dense monitoring network when the environmental conditions exhibit, on average, relatively low relative humidity (RH < 80%). When the sensors are used under environmental conditions that often exhibit high relative humidity, appropriate automated filtering or correction routines need to be established to remove problematic observations from the datasets or, at minimum, to provide the users with clear indications of the estimated observation uncertainty. If these conditions are met, we conclude that networks of SDS011 sensors could in future, for example, complement the regulatory outdoor air quality monitoring networks and improve the spatial and temporal resolution of PM2.5 data, opening up various applications for the research community and regulatory agencies and raising public awareness.
Author Contributions: Hai-Ying Liu and Rolf Haugen designed and performed the field experiments; Philipp Schneider and Hai-Ying Liu analysed the data; Matthias Vogt contributed to results interpretation by explaining the dependence of T and RH on sensors' response; Hai-Ying Liu and Philipp Schneider wrote the paper.

Funding:
The research leading to these results has received funding from EU H2020 project hackAIR with contract no. 688363 (www.hackair.eu).