1. Introduction
Particulate matter with an aerodynamic diameter less than or equal to 2.5 μm (PM2.5) is a classified human carcinogen [1]. The Global Burden of Disease Study 2015 showed that around 5.7–7.3 million deaths could be attributable to PM2.5 exposure [2,3]. Many areas worldwide experience annual mean PM2.5 levels reaching 100 μg/m3 [4,5], much higher than 10 μg/m3, the value recommended by the World Health Organization [6]. Since monitoring stations equipped with expensive instruments established by environmental regulatory agencies are situated in only a limited number of areas, the development of low-cost sensors (LCSs) provides opportunities to measure pollutant levels at much higher spatial densities than ever before [7,8,9]. However, most LCSs for air pollutants face data accuracy challenges [9,10], as they are typically not calibrated by the manufacturers due to cost considerations. Underestimated pollutant levels may give false impressions of acceptable air quality, while overestimated pollutant levels (2–3 fold [10]) may mislead residents and result in unnecessary societal costs. Either way, biased LCS networks have limited applications.
In Taiwan, citizens have placed PM2.5 LCSs near their households or inside elementary schools due to concerns about the harmful effects of PM2.5. With the assistance of information scientists and volunteer internet groups, real-time PM2.5 values from all over Taiwan have been made available since 2016 on a website (https://v5.airmap.g0v.tw/#/map) that shows the spatial distribution of PM2.5. The temporal trends and relative comparisons of LCS data among different areas have met the demand of citizens who want to be informed of the PM2.5 levels in their neighborhoods. Pollution awareness has thus been enhanced dramatically. However, overestimated PM2.5 levels from these LCSs often needlessly alarm citizens who are unaware of the aforementioned accuracy issue, and environmental groups have wrongly accused the Taiwan Environmental Protection Administration (Taiwan EPA) of tampering with the data of official monitoring stations, which show consistently lower levels than those of the LCSs (https://news.housefun.com.tw/news/article/154493172776.html). The unnecessary distrust between citizen groups and the Taiwan EPA is an unfortunate side effect of this successful collaboration between academics and citizens. Solving the data accuracy issue could also resolve the dilemma of the current PM2.5 LCS network in Taiwan and in other countries, as well as enhance the applicability of these environmental LCS networks.
Rai et al. [8] proposed a two-stage calibration process with laboratory calibration done by the manufacturers and calibration checks performed by the end-users. This would be ideal if the manufacturers followed the authors’ suggestions. However, requiring manufacturers to calibrate LCSs may be unrealistic, since LCSs are made in larger quantities and at much lower cost than research-grade instruments. The alternative is to conduct side-by-side comparisons and establish correction equations that convert LCS readings into data comparable to those obtained from research-grade instruments, such as GRIMM, SidePak, and tapered element oscillating microbalance (TEOM) analyzers [8,10,11,12]. However, these evaluations are labor-intensive, time-consuming, and resource-demanding, which makes them impractical for the thousands of LCSs placed by citizens over large areas. Innovative ways to correct the data of these LCSs to nearly research-grade observations are thus urgently needed.
An alternative approach to correction relies on data science. A review paper [13] highlighted the growing use of machine learning and other advanced data processing approaches to improve the agreement between LCSs and reference monitors. One of the machine learning methods, random forest, has been applied to calibrate LCSs for CO, NO2, CO2, and O3 with meteorological data and net responses from all sensors [14]. In addition, in preliminary explorations, our group found that deviations of the LCS signals from reference instruments are greatly reduced by applying machine learning methods to correct uncalibrated LCS readings in sensor networks with data from research-grade instruments at regulatory monitoring stations within a 3 km radius; these results will be summarized in another manuscript [15]. In short, machine learning methods are promising for reducing LCS deviations in sensor networks. However, these applications are restricted by the currently limited number of official monitoring stations.
Here, we propose a hybrid method that combines traditional laboratory evaluations and new data science methods to adjust LCS readings to research-grade observations. For LCS sets located within a 3 km radius of regulatory monitoring stations, machine learning could be applied to adjust LCS readings based on the monitoring instruments. For other LCSs in areas without monitoring stations, “seed” LCSs corrected with side-by-side comparisons in the laboratory could be installed strategically in those areas to provide research-grade observations to further correct nearby uncalibrated LCSs. This combination of traditional laboratory evaluations and new machine learning methods could greatly enhance the scientific and social value of such sensor networks. As a result, environmental scientists with the ability to conduct laboratory evaluations could further contribute to community monitoring and citizen science by improving the data accuracy of these LCS networks. This paper focuses on acquiring research-grade data for the seed LCSs. The second part, which involves applying machine learning methods, will be presented in another manuscript [15].
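The reference-selection logic of the hybrid method can be sketched as follows. This is a minimal illustration only: the function names, the coordinates, and the reuse of the 3 km radius for seed devices are assumptions made for the example (the text specifies 3 km only for regulatory stations), and the adjustment step itself is not shown.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def pick_reference(sensor, stations, seeds, radius_km=3.0):
    """Choose the reference for one uncalibrated LCS.

    sensor   -- (lat, lon) of the uncalibrated LCS
    stations -- dict name -> (lat, lon) of regulatory monitoring stations
    seeds    -- dict name -> (lat, lon) of laboratory-corrected seed LCSs
    Returns ("station", name), ("seed", name) or ("none", None).
    Assumption: the same radius is applied to seeds as to stations.
    """
    def nearest_within(sites):
        best = None
        for name, (lat, lon) in sites.items():
            d = haversine_km(sensor[0], sensor[1], lat, lon)
            if d <= radius_km and (best is None or d < best[0]):
                best = (d, name)
        return best

    hit = nearest_within(stations)   # regulatory stations take priority
    if hit is not None:
        return ("station", hit[1])
    hit = nearest_within(seeds)      # otherwise fall back to a seed LCS
    if hit is not None:
        return ("seed", hit[1])
    return ("none", None)

# Hypothetical coordinates, for illustration only.
stations = {"station_A": (25.0330, 121.5654)}
seeds = {"seed_1": (24.0000, 120.7000)}
```

An LCS for which the function returns `("none", None)` lies in an area where a seed device would still have to be installed.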
Laboratory evaluations have been conducted for several PM LCSs, such as those from Alphasense, Dylos, Samyoung, Sharp, Shinyei, Nova, and Plantower. Rai et al. [8] reviewed evaluations published before 2017. Subsequently, at least seven publications have presented results from laboratory evaluations for various PM LCSs [16,17,18,19,20,21,22]. Among the sensors evaluated, Plantower sensors, which have a relatively low cost (~35 USD), consistently performed well in terms of the intra-precision among themselves and their precision compared to various research-grade instruments [16,17,18,19,20,21,22]. It was found that Plantower sensors performed better than Shinyei ones because Plantower sensors are designed with a fan that draws in air and a laser light source [23]. Moreover, there are more than 4000 PMS sensors in the PM sensor network of Taiwan. Thus, a Plantower PMS3003 sensor was chosen as the target LCS in this work for the reasons stated in Section 2. To date, only five publications have focused on PMS3003 [10,16,23,24,25]. Therefore, new laboratory evaluation results can fill the data gap for this LCS and facilitate its application in environmental studies.
With a hybrid method of data correction in mind, it is important to provide valid and robust laboratory correction equations for a limited number of “seed” devices with LCSs (i.e., AS-LUNGs, Academia Sinica, Taipei, Taiwan). In addition, “low-cost” is an important consideration for evaluating these LCS devices to facilitate their wide application. The objective of the current work is to obtain reliable and robust correction equations to convert LCS signals to research-grade data with side-by-side comparisons between research instruments and LCS devices. The robustness of these equations was evaluated under two different experimental settings with two different burnt materials, both before and after 1.5 years of field campaigns. Correction equations in different concentration ranges were also established, and possible ceiling values were explored. Recommendations are given for evaluation methodologies to be applied by other research groups with consideration of different financial requirements and different degrees of variability. This work shows that two candidates can be used as “seed” LCS devices. LCS readings can be converted to data comparable to the Federal Equivalent Method (FEM) based on side-by-side comparisons in a laboratory. For scientists with resources, evaluation experiments can be conducted with incense burning in a custom-built chamber with FEM instruments to obtain correction equations with coefficients of determination (R2) of 0.999, less than 6.0% variability in the slopes for PM2.5 and PM1, and mean root-mean-square errors (RMSEs) of 1.18 and 1.56 µg/m3 for PM2.5 and PM1, respectively, in a range of 0.1–200 µg/m3. For scientists with limited resources, experiments can be conducted using a standard chemical fume hood with an R2 of 0.930–0.996, less than 15.5% variability in the slopes with a mean RMSE of 2.4 µg/m3 for PM2.5, and 10.1% variability in the slopes with a mean RMSE of 1.82 µg/m3 for PM1.
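To make the terms above concrete, the following minimal sketch fits a linear correction equation by ordinary least squares and reports R2 and RMSE. It assumes that the reference reading is regressed on the LCS reading, so that the fitted line converts LCS values to reference-comparable values; the function name and the paired readings are hypothetical and are not data from this work.

```python
import math

def fit_linear_correction(lcs, ref):
    """Least-squares fit of ref = slope * lcs + intercept.

    lcs -- raw LCS readings (µg/m3)
    ref -- paired reference-instrument readings (µg/m3)
    Returns (slope, intercept, r2, rmse).
    """
    n = len(lcs)
    mean_x = sum(lcs) / n
    mean_y = sum(ref) / n
    sxx = sum((x - mean_x) ** 2 for x in lcs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lcs, ref))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    predicted = [slope * x + intercept for x in lcs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ref, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ref)
    r2 = 1.0 - ss_res / ss_tot          # coefficient of determination
    rmse = math.sqrt(ss_res / n)        # same units as the readings
    return slope, intercept, r2, rmse

# Hypothetical paired readings, for illustration only.
lcs_readings = [12.0, 25.0, 51.0, 98.0, 160.0]
ref_readings = [5.5, 11.0, 22.5, 41.0, 66.0]
slope, intercept, r2, rmse = fit_linear_correction(lcs_readings, ref_readings)
```

A corrected value is then obtained as `slope * reading + intercept` for each raw LCS reading.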
4. Discussion
4.1. Performance of PMS3003 and AS-LUNG Sets
Air pollution sensor networks are good complements to current regulatory monitoring networks, providing pollutant levels close to citizens’ living environments over large areas at much lower cost [8,37]. We propose a hybrid method of combining laboratory evaluations and data science to ensure that LCS networks provide accurate PM data. First, LCS data are corrected by laboratory side-by-side comparisons for “seed” LCS devices, which can be installed strategically in areas without EPA stations; secondly, statistical or machine learning methods are applied to adjust nearby uncalibrated LCS devices with data from the EPA or the seed LCS devices wherever available. Thus, readings from other uncalibrated LCS devices in the sensor network can be corrected to nearly research-grade observations accordingly. The current work focuses on the first part of this process to obtain reliable and robust correction equations to convert the readings of LCS devices to research-grade (or FEM-comparable) measurements via side-by-side comparisons with research-grade instruments in the laboratory. The robustness and variability of the acquired correction equations under different experimental settings were evaluated with low-cost considerations.
Our results show that, in both the hood and chamber experiments, the AS-LUNG sets with PMS3003 agree well with the FEM EDM-180, with a high R2 of 0.930–0.998 with linear regressions and 0.999 with segmented regressions. The AS-LUNG sets thus meet the USEPA’s criteria for continuous PM2.5 monitors (r > 0.9 or R2 > 0.81) [30] and for candidate equivalent methods (r > 0.97 or R2 > 0.94) [31]. However, the slopes of these regression lines do not meet the USEPA’s criteria (1 ± 1) [30]. This accuracy issue could be solved via the presented laboratory evaluation methodologies. With side-by-side comparisons, both the AS-LUNG-P and AS-LUNG-O readings could be converted to EDM-180-comparable measurements and serve as “seed” LCS devices in sensor networks. PMS3003 is not the newest Plantower sensor on the market; however, for research purposes, a sensor with a high R2 against FEM instruments is a more reliable source of data than newer models with unknown drawbacks.
Our results also show that chamber experiments with better seals can acquire correction equations with a much lower variability between different LCS devices and duplicate experiments for PM2.5 and PM1 (higher ICC indexes) than the hood experiments. Since more observations at higher concentrations typically ensure the robustness of the regression equations, correction equations from the chamber are taken as more accurate estimations than the hood corrections. Without correction, the PMS3003 readings can overestimate PM2.5 by about 2–3 fold and PM1 by about 1.4–2.2 fold. It should be noted that PMS3003 seems to have an upper limit of around 200–250 µg/m3 for PM1 but can detect PM2.5 up to 400 µg/m3.
Both the hood and chamber experiments were able to obtain correction equations with high R2 values and high ICC indexes (0.952–0.999), showing the excellent precision of AS-LUNG and the excellent repeatability of the presented experimental settings and protocol. The choice of experimental settings needs to consider the required expenses and acceptable degrees of variability. The advantage of using a chemical fume hood is that a hood is a standard set-up in a wet laboratory and does not require extra costs compared to a chamber. Using incense in hoods for side-by-side comparisons yields larger variability: 10.9–15.5% for slopes with mean RMSEs of 1.67–2.40 µg/m3 for PM2.5 and 7.6–10.1% for slopes with mean RMSEs of 1.38–1.82 µg/m3 for PM1 (from both AS-LUNG-P and AS-LUNG-O). On the other hand, if resources permit, a chamber is a better choice for conducting evaluation experiments to acquire correction equations. With segmented regressions, the mean RMSEs of PM2.5 are less than 1.18 µg/m3 with %CVs less than 6.0% for the slopes and RMSEs in the range of 0.1–200 µg/m3 with both incense and mosquito coils in the chamber experiments, with a slight increase in the %CV and RMSEs as the concentration increases. For PM1 of 0.1–200 µg/m3, the mean RMSEs are less than 1.56–1.63 µg/m3, with inter-sensor variability of less than 11.8% with either incense or mosquito coils. These results demonstrate the stability of the PMS3003 sensor. Higher expenses provide better sealed conditions in chamber experiments with a reduced %CV. Our %CV results in the chamber experiments for PM2.5 were within the USEPA’s acceptable measurement uncertainty for continuous PM2.5 monitors (%CV < 10%) [30].
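The inter-sensor variability discussed above can be illustrated with a short sketch of the coefficient of variation (%CV) of the slopes. It assumes that %CV is the sample standard deviation divided by the mean, expressed as a percentage; the function names and slope values are hypothetical and are not results from this work.

```python
import statistics

def slope_cv_percent(slopes):
    """Coefficient of variation (%) of the correction-equation slopes.

    Assumption: sample standard deviation (n - 1 denominator) over the mean.
    """
    return statistics.stdev(slopes) / statistics.mean(slopes) * 100.0

def within_cv_limit(slopes, limit_percent=10.0):
    """True if the slope %CV is below the stated limit (10% by default)."""
    return slope_cv_percent(slopes) < limit_percent

# Hypothetical slopes from several LCS devices, for illustration only.
example_slopes = [0.40, 0.42, 0.44]
example_cv = slope_cv_percent(example_slopes)
```

The 10% default corresponds to the measurement-uncertainty criterion for continuous PM2.5 monitors cited in the text.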
Performance evaluations for PMS3003 have also been reported by other research groups. PMS3003 was assessed in wind tunnels, and high correlations were found with GRIMM 1.109 (R2 = 0.73–0.97), with linearity of 200–850 µg/m3 [23]. Another chamber evaluation of data from 242 PMS3003 sensors found high linear correlations (R2 > 0.978) with a DustTrak monitor with ammonium nitrate and aluminum oxide, providing small intercepts, good repeatability, and certain deviations from the reference values [20], similar to most of the results presented in this work. Additionally, the authors found significant differences between the responses of sensors purchased from different batches, indicating the necessity to calibrate each batch. Moreover, field evaluations were conducted for PMS3003 sensors in two suburban regions with a mean 1 h PM2.5 of 9 ± 9 and 10 ± 3 µg/m3 and, in one location, a 1 h PM2.5 of 36 ± 17 and 116 ± 57 µg/m3 during the monsoon and post-monsoon seasons, respectively [24]. These results showed excellent intra-PMS3003 precision (R2 = 0.98–1.00), but their correlations with the reference instruments were not good. An RMSE of 3 µg/m3 was found, with a quadratic fit for a 24 h integration time against an E-BAM, presenting non-linearity at high levels above 300 µg/m3 [24]. Our work showed a high R2 (0.930–0.999) with a breakpoint around 30–40 µg/m3 in the range of 0.1–400 µg/m3 for PM2.5 (the highest level generated in our experiments) and a non-linear response above 200 µg/m3 for PM1.
Moreover, two other PMS sensor models, a PMS1003 and a PMS5003, were evaluated in Utah (USA) for 320 days against TEOM, and their RMSE values were found to be above 10 µg/m3, much higher than our results in the laboratory [25]. In addition, 19 AirBeams were compared against a BAM in California (USA), with a mean RMSE of 1.08 µg/m3 [38]. Our evaluation in the laboratory showed that the RMSE performance of PMS3003 is better than, or at least comparable to, that of other evaluated sensors.
The signal drift of the sensors after the 1.5-year field campaign was shown to be only 19–24%. Side-by-side comparisons may thus be needed once a year to maintain the validity of the correction equations. In addition, a cleaning procedure is required to maintain good data quality from the sensors. Whether the sensor drift changed linearly over time or occurred suddenly needs to be further evaluated. It was found that the signals of PMS1003 did not change after one year of field operations in the USA [25]. Another sensor, the AirBeam, drifted less than 5% before and after a two-month campaign in the USA [38]. LCS devices have received significant attention for their potential applications. However, the potential drift of sensor responses and the required maintenance of such sensors have not been documented prior to this manuscript. These results demonstrate that wherever data accuracy is important for long-term monitoring, proper maintenance is mandatory. More work needs to be done to better assess sensor drift.
4.2. Choices of Evaluation Settings
Traditionally, aerosol scientists have tended to use expensive standard dust, such as Arizona road dust or urban dust, to evaluate sensor performance. To maintain our low-cost principle, we used commercially available and inexpensive aerosols for our evaluations. To avoid a fire hazard, incense sticks and mosquito coils were chosen as our test materials rather than straw, and cigarettes were not considered due to their tar contents, which could contaminate the chamber surfaces. Although the ingredients of the incense sticks and mosquito coils differ greatly, the acquired correction equations are quite similar, implying the robustness of the correction equations. Incense sticks can be purchased worldwide for traditional, religious, or relaxation purposes. Thus, incense sticks are recommended as an economical burnt material for side-by-side comparisons. Additionally, for sensor evaluations, previous work has shown that sensor performance with incense is similar to that with PM2.5 in residential air in Baltimore, suggesting that incense may be a suitable substitute for urban PM2.5 [22]. The results of the current work and our previous work [10] also support the use of incense sticks, representing urban PM2.5, for the evaluation of LCSs.
In this work, segmented regression was applied to obtain the correction equations for 0.1–200, 0.1–300, 0.1–400 µg/m3 of PM2.5 from the chamber experiments. These correction equations have much smaller intercepts than those of the linear regressions from either the hood or chamber experiments, with much smaller RMSEs. The R2 values are 0.999 for all three concentration ranges. Therefore, segmented regressions are recommended for the correction equations, rather than linear regressions.
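A simplified sketch of a two-segment correction equation is given below. It is not the procedure used in this work: it fits two independent least-squares lines on either side of a candidate breakpoint, chooses the breakpoint that minimizes the total squared error, and does not force the two lines to meet at the breakpoint. Placing the breakpoint on the LCS reading, the function names, and the `min_points` parameter are all assumptions made for illustration.

```python
def ols(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse

def fit_two_segments(lcs, ref, min_points=3):
    """Grid-search one breakpoint on the LCS reading.

    Returns (breakpoint, (slope_low, intercept_low), (slope_high, intercept_high)).
    """
    pairs = sorted(zip(lcs, ref))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    best = None
    for i in range(min_points, len(xs) - min_points + 1):
        # Skip splits that cut through tied readings or leave a segment
        # with no spread in x (the slope would be undefined).
        if xs[i - 1] == xs[i] or xs[0] == xs[i - 1] or xs[i] == xs[-1]:
            continue
        low = ols(xs[:i], ys[:i])
        high = ols(xs[i:], ys[i:])
        total_sse = low[2] + high[2]
        if best is None or total_sse < best[0]:
            best = (total_sse, (xs[i - 1] + xs[i]) / 2.0, low[:2], high[:2])
    if best is None:
        raise ValueError("not enough distinct readings for two segments")
    return best[1], best[2], best[3]

def apply_correction(reading, model):
    """Convert one raw LCS reading with the fitted two-segment model."""
    breakpoint_value, (s1, i1), (s2, i2) = model
    if reading <= breakpoint_value:
        return s1 * reading + i1
    return s2 * reading + i2
```

A continuous segmented fit, in which the two lines are constrained to join at the breakpoint, would need an additional constraint that this sketch omits.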
Moreover, our side-by-side comparisons were conducted with two GRIMM 1.109 instruments. Based on their good agreement with the EDM-180, a FEM instrument, the final correction equations were constructed to convert AS-LUNG readings into FEM-comparable values. The GRIMM device is smaller and easier to carry around. If only research-grade measurements are needed, correction experiments with the GRIMM 1.109 are sufficient. However, if FEM comparable measurements are preferred and resources permit, EDM-180 is recommended as an optimal instrument to use for side-by-side comparisons.
For scientists who have the resources to conduct laboratory evaluations, this work provides valuable information on the choice of experimental setting (i.e., a chemical fume hood versus a chamber), the materials used (incense versus mosquito coils), and linear or segmented regression equations. Different correction equations were compared in this study to illustrate possible biases and variability under different experimental conditions. Traditional methods of conducting side-by-side comparisons with research instruments typically use standard dust and require repeated experiments under pre-specified temperature and RH conditions inside a temperature- and RH-controlled chamber, thus requiring more resources. The greatest advantage of our method is that one can acquire robust correction equations for a stable PM sensor to obtain FEM-comparable data at considerably lower cost and with results that are closer to real-world scenarios than those obtained using traditional methods.
For scientists who have only limited resources and intend to use AS-LUNG sets in areas of interest with PM2.5 concentrations higher than 10 µg/m3, hood experiments with incense and GRIMM 1.109 with linear regressions are sufficient. Lower costs come with higher variability in the slopes and RMSEs. The large intercepts of correction equations would not be an issue in polluted areas. Even with 15–20% variability, AS-LUNG sets, after conversion, can be used as seed LCS devices for the reading adjustment of other uncalibrated LCS devices. The raw measurements with 2–3 times overestimations can be corrected to a more acceptable concentration range.
Due to economic considerations, only one experiment under certain temperature and RH conditions is preferred to acquire one correction equation for one LCS device. The conditions for one experiment cannot cover the wide range of all environmental conditions in the field. Extra side-by-side comparisons should be carried out during different seasons in the field to obtain correction equations covering wider temperature and humidity ranges. This process should be less expensive than setting up comparisons in a temperature- and humidity-controlled chamber. The correction equations established in this work are intended to be applied in the field in subtropical Taiwan, under a temperature of 15–30 °C and humidity of 70–84% based on the monthly means in non-mountainous areas from 1981–2010 [39]. Certain areas with high PM levels, such as Southeast Asia, have climatic conditions similar to those of Taiwan. This inexpensive method of conducting side-by-side evaluations could also be carried out in these countries to facilitate the development of LCS networks.
A limitation of this work is that the provided correction equations may not be applicable in other countries with different temperature and humidity ranges, although the methodology still has practical value. The impacts of temperature and humidity on PM LCSs have also been evaluated by different research groups [20,24,31,40]. Two groups, for example, developed correction equations that consider temperature and humidity [24,40]. While our group did not adjust for temperature and humidity, for the aforementioned reasons and because of the subtropical climatic conditions in Taiwan, we acknowledge that other research groups may need to acquire correction equations under different temperature and humidity ranges. For countries located in different climate zones, one or two more correction equations at lower/higher temperatures and lower humidity may be required for the equations to be applicable in the field.