3.1. Observed PurpleAir and Regulatory PM2.5
To ensure the quality of the PurpleAir PM2.5 data, the developed QA/QC routine eliminated about 15% of the 5-min PM2.5 data before further analysis. It was imperative to evaluate and validate PurpleAir PM2.5 against PM2.5 observed at regulatory sites.
Figure 1 also shows FEM/FRM and non-FEM/FRM-monitored PM2.5 sites in California from 2016 to 2023. Details of these sites, including AQS ID, site name, PurpleAir monitor ID, and dates of monitoring, are given in Table 1 and Table 2 and in Supplementary Material Tables S1 and S2. The tables also list the approximate distance calculated between each PurpleAir monitor and the regulatory site. Regulatory monitor data for 2016 to 2022 were downloaded from the EPA AQS Datamart [32]. The most recent PM2.5 data available extended to October 2022 and were used for the analysis. PurpleAir-monitored PM2.5 was evaluated graphically and statistically against both FEM/FRM and non-FEM/FRM-monitored PM2.5. All regulatory sites with a PurpleAir monitor within 20 m were analysed. For discussion, time-series and scatter plots are shown only for four FEM/FRM and four non-FEM/FRM sites, selected to cover California from north to south.
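As an illustration of how the approximate monitor-to-site distance could be computed and the 20 m criterion applied, the following Python sketch uses the haversine formula. The function names, the spherical-Earth radius, and the idea of passing coordinates as (latitude, longitude) tuples are assumptions made for illustration; the text does not state which distance method was actually used.

```python
import math

# Mean Earth radius in metres (assumption: spherical Earth approximation).
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_collocated(sensor, site, max_distance_m=20.0):
    """True if a sensor (lat, lon) lies within max_distance_m of a site (lat, lon)."""
    return haversine_m(sensor[0], sensor[1], site[0], site[1]) <= max_distance_m
```

At such short distances a planar approximation would give nearly the same answer; the haversine form is used here only because it works directly on latitude and longitude.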
Figure 3 and Figure 4 show hourly average PM2.5 at four FEM/FRM and four non-FEM/FRM sites as black lines and PurpleAir PM2.5 as purple dots (the x-axis is in MM/YY format). These figures show clearly that PurpleAir monitors captured the PM2.5 trend at regulatory monitors from 2016 to 2022. PurpleAir reported higher PM2.5 concentrations than both FEM/FRM and non-FEM/FRM regulatory monitors. Along with the regulatory monitors, the sensors also captured the PM2.5 events due to forest fires. PurpleAir PM2.5 followed the trends of regulatory monitors for concentrations both below and above 100 µg/m3. PM2.5 concentrations above 200 µg/m3 were captured by PurpleAir at Fresno-Garland (Figure 3c) and at all non-FEM/FRM sites, with the exception of a one-day spike at El Rio-El Rio Mesa School (Figure 3d). Spikes in PM2.5 concentrations at Sacramento-T Street (Figure 4c) were due to forest fires, and the trend is visible in both regulatory and PurpleAir data. Thus, the sensors were able to capture local and regional episodic events.
Figure 5 shows scatter plots with hourly average PurpleAir PM2.5 concentrations on the y-axis and regulatory-monitored PM2.5 concentrations on the x-axis. Detailed monitoring information can be found in Table 1. These plots show that PurpleAir reported higher concentrations than the regulatory monitors most of the time. The scatter plots also show dotted +/−25% lines; the majority of points fall outside the +/−25% range, with more points towards the y-axis, that is, towards higher PurpleAir PM2.5. The linear fit line lies above the +25% line at all sites except the El Rio School site (Figure 5b), which shows a one-to-one linear fit.
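The +/−25% comparison described for the scatter plots can be quantified with a short sketch such as the one below. It is only an illustration: the function name, the choice to skip non-positive reference values, and the sample numbers are assumptions, not values from the study.

```python
def fraction_within_band(sensor, reference, tolerance=0.25):
    """Return (share inside the +/-tolerance band, share above the band).

    sensor and reference are paired hourly PM2.5 values; pairs whose
    reference value is not positive are skipped (an assumption of this sketch).
    """
    pairs = [(p, r) for p, r in zip(sensor, reference) if r > 0]
    if not pairs:
        raise ValueError("no valid sensor/reference pairs")
    inside = sum(1 for p, r in pairs
                 if (1 - tolerance) * r <= p <= (1 + tolerance) * r)
    above = sum(1 for p, r in pairs if p > (1 + tolerance) * r)
    return inside / len(pairs), above / len(pairs)


# Hypothetical hourly values in ug/m3, for illustration only.
reference_pm25 = [10.0, 20.0, 40.0, 8.0]
sensor_pm25 = [12.0, 30.0, 45.0, 15.0]
inside_share, above_share = fraction_within_band(sensor_pm25, reference_pm25)
```

A large share above the band corresponds to the pattern described here, in which most points lie on the PurpleAir side of the +25% line.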
PurpleAir-monitored PM2.5 was mostly higher than the regulatory-monitored PM2.5. This may be because PurpleAir monitors were calibrated by the manufacturer using particles with properties completely different from those of particulate matter in ambient air [34], and the conversion of particle counts to mass is unknown [15]. In addition, ambient air contains water droplets within the measured aerodynamic particle size range. Traditionally, both FEM/FRM and non-FEM/FRM monitors measure PM2.5 after removing water content at the sample inlet, which is achieved by heating the sample air in the inlet pipe. PurpleAir sensors, in contrast, measure PM2.5 concentrations without removing the moisture content of aerosols. This water content makes the PM2.5 measured by PurpleAir an “Absolute PM2.5” or, relative to regulatory monitors, a “Wet PM2.5”. How water content is adjusted for in the PurpleAir conversion from particle count to mass is unknown. Therefore, even before any comparison with FEM/FRM and non-FEM/FRM-monitored PM2.5, PurpleAir PM2.5 concentrations can be expected to exceed those of regulatory monitors most of the time.
Table 1 shows a statistical evaluation of PurpleAir monitors in comparison with regulatory monitors. For the statistical evaluation of PM2.5, the coefficient of determination (R²), mean bias (MB), and root mean square error (RMSE) were calculated. Mean bias estimates the average bias between two variables. The coefficient of determination, R-squared (R²), indicates how well the data fit the regression model compared to the observation data. The root mean square error (RMSE) is a frequently used measure of the magnitude of the differences between two sets of measurements. Equations of the evaluation indices are shown below:
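\[ \mathrm{MB} = \frac{1}{n} \sum_{i=1}^{n} \left( P_i - R_i \right) \]

\[ \mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( P_i - R_i \right)^2 } \]

\[ R^2 = \left[ \frac{ \sum_{i=1}^{n} \left( P_i - \bar{P} \right) \left( R_i - \bar{R} \right) }{ \sqrt{ \sum_{i=1}^{n} \left( P_i - \bar{P} \right)^2 } \, \sqrt{ \sum_{i=1}^{n} \left( R_i - \bar{R} \right)^2 } } \right]^2 \]

where $P_i$ is the PurpleAir PM2.5 concentration, $R_i$ is the regulatory PM2.5 concentration, $\bar{R}$ is the mean of $R_i$, $\bar{P}$ is the mean of $P_i$, and $n$ is the number of hourly samples.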
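The three evaluation indices (MB, RMSE, and R²) can be computed as in the following minimal Python sketch. It assumes that MB and RMSE are taken on the difference PurpleAir minus regulatory, and that R² is the squared Pearson correlation between the two series; the function names and sample values are illustrative, not taken from the study.

```python
import math


def mean_bias(p, r):
    """Average of (PurpleAir - regulatory) over paired samples."""
    return sum(pi - ri for pi, ri in zip(p, r)) / len(p)


def rmse(p, r):
    """Root mean square of (PurpleAir - regulatory) over paired samples."""
    return math.sqrt(sum((pi - ri) ** 2 for pi, ri in zip(p, r)) / len(p))


def r_squared(p, r):
    """Squared Pearson correlation between the two series."""
    p_mean = sum(p) / len(p)
    r_mean = sum(r) / len(r)
    cov = sum((pi - p_mean) * (ri - r_mean) for pi, ri in zip(p, r))
    var_p = sum((pi - p_mean) ** 2 for pi in p)
    var_r = sum((ri - r_mean) ** 2 for ri in r)
    return cov ** 2 / (var_p * var_r)


# Hypothetical hourly concentrations in ug/m3, for illustration only.
purpleair = [12.0, 22.0, 35.0, 50.0]
regulatory = [10.0, 20.0, 30.0, 40.0]
mb = mean_bias(purpleair, regulatory)  # positive when PurpleAir reads higher
```

A positive MB with a high R² would indicate a sensor that tracks the regulatory monitor well but reads systematically high, which is the pattern described for most sites.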
For all FEM/FRM sites (Table 1 and Table S1), the coefficient of determination, R², was between 0.23 and 0.9, with an average of 0.62. For all non-FEM/FRM sites (Table 2 and Table S2), R² was between 0.27 and 0.92, with an average of 0.74, which was lower than values reported by studies conducted over shorter durations [10,21,23]. Goleta, El Rio-El Rio School, and Lompoc-H Street showed the lowest R² values, of 0.56, 0.5, and 0.6, respectively. These three sites are along the coastline of Southern California. Moisture content in coastal air is expected to be higher than in inland areas. This supports the view that moisture content plays a significant role in PurpleAir PM2.5 monitoring. Because PM is hygroscopic, it takes up moisture from the air, which results in higher measured concentrations. Unlike regulatory monitors, PurpleAir monitors do not currently heat the inlet air. The remaining sites, located inland, showed higher R² values, greater than 0.70. The mean bias was highest at Fresno-Garland (9.62 µg/m3), followed by Sacramento-T Street (6.93 µg/m3), as shown in Supplementary Material Tables S1 and S2. The mean bias was positive at all sites, indicating higher PM2.5 from PurpleAir than from FEM/FRM and non-FEM/FRM monitors.
After validating the performance of the PurpleAir sensors against observed daily average data, the sensor data were used to compute detailed summary statistics for different regions of California: Bay Area Air Quality Management District (AQMD) (Bay Area), Sacramento Metropolitan AQMD (Sacramento), San Diego Air Pollution Control District (APCD) (San Diego), San Joaquin Valley APCD (San Joaquin), and South Coast AQMD (South Coast), according to the availability of sensor data for recent years (2018–2020). After excluding poorly performing sensors (around 4%), all remaining PurpleAir sensors were used in this statistical analysis. Table 2, Table 3 and Table 4 show results from this analysis.
The sensor dataset revealed a wide range of PM2.5 concentrations, with a maximum 24 h average concentration of about 200 µg/m3 measured in the Bay Area in 2020. Sacramento, another northern Californian region, showed the next highest daily concentrations, followed by the South Coast and San Diego, respectively. Dry weather [35] and forest management techniques over the past few decades have contributed to a rise in the frequency and intensity of wildfire outbreaks in California, and thus to the severity of the 2020 fire season. California also experienced record-breaking wildfires in 2018; although less severe than those in 2020, they produced comparable trends in all the aforementioned districts. San Diego was less affected by the wildfires. Residential burning and other incidents, such as fires started by electric transmission lines, contributed to higher maximum PM2.5 concentrations in Northern California in 2019. Land–sea breezes can also contribute significantly to pollution in Northern California’s coastal areas. The combined influence of wildfires and anthropogenic emissions was felt in the South Coast as well, leading to higher concentrations in this region.
Additionally, across the entire state of California, the standard deviation ranged from 8 to 30 µg/m3 for the individual counties, with higher variability in the Northern California regions most affected by fires. The median PM2.5 concentration of the dataset was between 5 and 13 µg/m3, while the mean concentrations ranged from 7 to 30 µg/m3. Overall, the PM2.5 measurements showed higher maxima and standard deviations in 2020 than in 2019 or 2018, which is commensurate with the fact that wildfire intensity peaked in 2020, as previously mentioned.
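Summary statistics of this kind (mean, median, standard deviation, upper percentiles, and maximum of daily averages) could be produced with a sketch like the one below, which uses only the Python standard library. The function name, the use of the sample standard deviation, and the "inclusive" percentile method are assumptions; the text does not specify how the percentiles were computed.

```python
import statistics


def summarize(daily_pm25):
    """Summary statistics for a list of daily average PM2.5 values (ug/m3)."""
    # quantiles(..., n=100) returns the 99 cut points between percentiles,
    # so index 74 is the 75th percentile and index 94 is the 95th.
    cuts = statistics.quantiles(daily_pm25, n=100, method="inclusive")
    return {
        "mean": statistics.mean(daily_pm25),
        "median": statistics.median(daily_pm25),
        "std": statistics.stdev(daily_pm25),  # sample standard deviation
        "p75": cuts[74],
        "p95": cuts[94],
        "max": max(daily_pm25),
    }
```

Applying the function per region and per year would give one row per region-year combination, which is one plausible layout for a table of such statistics.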
A region-wise inter-comparison (Bay Area, San Diego, Sacramento, San Joaquin, and South Coast) of the daily average sensor data for 2018–2020, as seen from Table 2, Table 3 and Table 4, revealed that areas in Northern California, including the Bay Area, Sacramento, and San Joaquin, had distinctly higher 95th and 75th percentile concentrations than the South Coast during the extensive wildfires of 2020. This re-emphasizes the importance of wildfire impacts on air pollution in Northern California. Therefore, 2020 may be considered the year of the highest daily PM2.5 concentrations measured in California.
The number of sensors also increased significantly. In the Bay Area, it grew from around 369 in 2018 to around 773 in 2020, a growth of about 110 percent. Sacramento and San Diego exhibited more modest growth (94 percent in Sacramento and 18 percent in San Diego). In the southern part of California (South Coast), the number of sensors grew from around 552 in 2018 to around 741 in 2020, a growth of 34 percent.
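The growth percentages follow directly from the sensor counts quoted above, as this small worked example shows. The function name is illustrative, and the counts are the approximate figures given in the text.

```python
def percent_growth(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100.0


bay_area_growth = percent_growth(369, 773)     # about 109.5 percent
south_coast_growth = percent_growth(552, 741)  # about 34.2 percent
```

The Bay Area figure of roughly 109.5 percent is consistent with the rounded value of 110 percent reported in the text.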
Table 5 shows the total number of daily-average (24 h average) PM2.5 observations from all sensors and the number of exceedances in the regions of California. Across all regions of the state, the overall percentage of exceedances was almost 11% in 2020, compared with 3% in 2019 and 7% in 2018. In 2020, the Bay Area had the highest percentage of exceedances (58%), followed by the South Coast (21%), San Joaquin (12%), Sacramento (8%), and San Diego (1%). In 2019, the South Coast had the largest percentage of exceedances (51%), followed by the Bay Area (27%), San Joaquin (17%), Sacramento (4%), and San Diego (1%). In 2018, the South Coast again had the highest percentage of exceedances (54%), followed by the Bay Area (24%), San Joaquin (18%), and San Diego (0.7%). The analysis demonstrates the impact of COVID-19 in the South Coast in 2020, when anthropogenic emissions were lower than in 2018. However, the effects of the California wildfires were more noticeable in 2020.
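The two kinds of percentages discussed here, the overall exceedance rate and each region's percentage of all exceedances, can be illustrated with the sketch below. The threshold of 35 µg/m3 (the US 24 h PM2.5 standard) is an assumption, since the text does not state which threshold defines an exceedance; the function name and the sample data are also hypothetical.

```python
# Assumed exceedance threshold in ug/m3; not stated in the text.
THRESHOLD_UG_M3 = 35.0


def exceedance_summary(daily_by_region, threshold=THRESHOLD_UG_M3):
    """Return (overall exceedance rate in %, per-region share of exceedances in %).

    daily_by_region maps a region name to a list of daily average PM2.5 values.
    """
    counts = {region: sum(1 for v in values if v > threshold)
              for region, values in daily_by_region.items()}
    n_obs = sum(len(values) for values in daily_by_region.values())
    n_exc = sum(counts.values())
    rate = 100.0 * n_exc / n_obs
    shares = {region: (100.0 * c / n_exc if n_exc else 0.0)
              for region, c in counts.items()}
    return rate, shares


# Hypothetical daily averages, for illustration only.
example = {"Region A": [10.0, 40.0, 50.0, 20.0], "Region B": [36.0, 12.0, 8.0, 9.0]}
overall_rate, region_shares = exceedance_summary(example)
```

By construction the regional shares sum to 100%, whereas the overall rate is the fraction of all daily observations above the threshold.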
To test the significance of the annual variations (2018–2020) in the distribution of daily mean PM2.5 levels measured by the sensors across California, the non-parametric Kruskal–Wallis test was performed to determine whether the distributions of daily means were identical or differed significantly between the years. This widely used non-parametric technique tests the null hypothesis that the samples were drawn from the same population. Since the null hypothesis was rejected for all regions of California, Dunn’s post hoc test was conducted to perform a multiple-comparison analysis and identify which samples (years) differed from each other in each region. The Dunn’s test results for the Bay Area for the years 2018, 2019, and 2020 are shown in Table 6 as a representative result.
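The testing procedure, a Kruskal–Wallis test followed by pairwise Dunn comparisons, can be sketched with the Python standard library for the special case of three groups (three years). This is an illustration under stated assumptions rather than the authors' implementation: the tie correction is omitted, the Dunn p-values use a Bonferroni adjustment (the text does not say which adjustment was applied), and all names are hypothetical.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_dunn(groups):
    """Kruskal-Wallis H test plus Dunn pairwise tests for exactly three groups.

    groups maps a label (e.g. a year) to a list of daily mean values.
    Returns (H, Kruskal-Wallis p-value, {pair: Bonferroni-adjusted p-value}).
    No tie correction is applied (simplifying assumption).
    """
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    labels = list(groups)
    pooled = [v for label in labels for v in groups[label]]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    mean_rank, size, start, h = {}, {}, 0, 0.0
    for label in labels:
        n = len(groups[label])
        group_ranks = ranks[start:start + n]
        start += n
        size[label] = n
        mean_rank[label] = sum(group_ranks) / n
        h += sum(group_ranks) ** 2 / n
    h = 12.0 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)
    # With three groups, H is compared with chi-square on 2 degrees of
    # freedom, whose survival function is exp(-x / 2).
    p_kw = math.exp(-h / 2)
    pairwise = {}
    for a, b in combinations(labels, 2):
        se = math.sqrt(n_total * (n_total + 1) / 12.0 * (1 / size[a] + 1 / size[b]))
        z = (mean_rank[a] - mean_rank[b]) / se
        p_two_sided = math.erfc(abs(z) / math.sqrt(2))
        pairwise[(a, b)] = min(1.0, 3 * p_two_sided)  # Bonferroni over 3 pairs
    return h, p_kw, pairwise
```

For real analyses with many tied values or more than three groups, an established statistics package would be preferable to this hand-written version.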
The tables show that the differences in concentrations were significant across all years at the 95% confidence level (Table 4), since p = 0.00 < 0.05. In the case of San Diego, the differences in daily mean concentrations for 2018 and 2019, and between 2019 and 2020 were significant (p = 0.00) but there were no significant differences between 2019 and 2020 (p > 0.05). For the South Coast, there were significant differences in daily mean concentrations of PM2.5 across all the years considered (p = 0.00). For San Joaquin, the differences in daily average PM2.5 concentrations over the years were significant. For Sacramento, the differences were not significant between 2018 and 2020 (p > 0.05) but were significant for the other pairs of years.
3.2. Geostatistically Predicted and Observed PM2.5
The regulatory monitoring network is too sparse to support community-scale PM2.5 exposure assessments. Compared to the existing regulatory network, the PurpleAir network provides a denser set of monitors, down to the community scale, distributed across California. Geostatistical interpolation techniques (Kriging and IDW) applied to PurpleAir PM2.5 might help to bridge the gap between PurpleAir and regulatory-monitored PM2.5. Interpolation was conducted using the daily average PurpleAir PM2.5 for the years 2018 and 2020, because PurpleAir monitoring in California began in 2016 and few monitors were in operation until the end of 2017.
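To make the IDW step concrete, the following minimal sketch estimates PM2.5 at an unmonitored point as a distance-weighted average of nearby sensor values. The power parameter of 2, the use of planar (projected) coordinates, and the function name are assumptions; the text does not give the IDW settings used, and Kriging is not shown because it additionally requires fitting a variogram model.

```python
import math


def idw_estimate(target, stations, power=2.0):
    """Inverse distance weighted estimate at target = (x, y).

    stations is a list of (x, y, value) tuples in the same planar
    coordinate system as target (assumption: projected coordinates).
    """
    if not stations:
        raise ValueError("at least one station is required")
    numerator = 0.0
    denominator = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a station
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical sensors (x, y in km; daily PM2.5 in ug/m3), for illustration only.
sensors = [(0.0, 0.0, 10.0), (3.0, 0.0, 40.0)]
estimate = idw_estimate((1.0, 0.0), sensors)
```

Because the weights decay with distance, the estimate is pulled towards the nearest sensors, which is why a dense sensor network matters for this method.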
Figure 6 shows PurpleAir, FEM/FRM, and non-FEM/FRM daily average PM2.5 on 16 November 2018, interpolated by Kriging and IDW. Both interpolation techniques captured the smoke dispersion from the Camp Fire, which started on 8 November 2018 [36]. The differences in spatially interpolated daily average PurpleAir PM2.5 in the northern part of California are due to the different interpolation approaches of Kriging and IDW. For both years, the interpolated sensor data provided a realistic representation of daily PM2.5 concentrations and thus may reduce the uncertainty introduced by interpolation errors due to the sparse observational network of FEM/FRM and non-FEM/FRM monitors, supporting effective decision-making. Although the sensor data are subject to some uncertainty, as discussed earlier, the PM2.5 interpolated from PurpleAir gave a better representation of PM2.5 because of the dense network of monitors available for interpolation, in comparison to the thinly distributed network of FEM/FRM and non-FEM/FRM monitors. For further analysis, four regulatory sites across California without collocated PurpleAir monitors were selected for assessment. Collocated sites were not selected in order to avoid the influence of PurpleAir PM2.5 monitored at the same location.
Figure 7 shows observed daily average PM2.5 concentrations as black lines, together with interpolated PM2.5 concentrations, at the four above-mentioned regulatory monitoring sites (the x-axis is in MM/YY format). The time-series plots show good agreement between observed and interpolated PM2.5. Both IDW and Kriging captured the peaks of observed PM2.5. However, on many days, Kriging and IDW over-predicted PM2.5, as shown in Figure 7. The over-prediction may be due to the higher PM2.5 observed by PurpleAir monitors. Scatter plots with interpolated PM2.5 on the y-axis and regulatory PM2.5 on the x-axis show good agreement, and most of the interpolated values fall within +/−25%. Both Kriging and IDW can thus be used to interpolate daily average PurpleAir PM2.5 at unmonitored locations for exposure and air quality assessments. The agreement between geostatistically interpolated PurpleAir and observed daily average PM2.5 gives confidence in using PurpleAir PM2.5 together with regulatory monitors to estimate PM2.5 at unmonitored locations. This demonstrates that low-cost PM2.5 sensors have the potential to fill the gaps in regulatory monitoring networks and might help to overcome their limitations and improve air quality and other scientific assessments. PurpleAir PM2.5 can be integrated with observed regulatory PM2.5 to formulate a decision support system using geostatistical techniques, but the uncertainty due to sensor measurements should first be minimized before the data are used to supplement regulatory monitors.
Table 7 shows a statistical evaluation of daily averaged PurpleAir PM2.5, interpolated using Kriging and IDW, against daily averaged observed PM2.5 concentrations. PM2.5 interpolated by Kriging had lower root mean square error (RMSE) and mean bias (MB) values than IDW. Correlation coefficient values for the Oakland-West and Stockton-Hazelton sites were above 0.76 and were lower for the Mira Loma and Otay Mesa sites.