Use of Citizen Science-Derived Data for Spatial and Temporal Modeling of Particulate Matter near the US / Mexico Border

: This paper describes the use of citizen science-derived data for the creation of a land-use regression (LUR) model for particulate matter (PM 2.5 and PM coarse ) for a vulnerable community in Imperial County, California (CA), near the United States (US) / Mexico border. Data from the Imperial County Community Air Monitoring Network community monitors were calibrated and added to a LUR, along with meteorology and land use. PM 2.5 and PM coarse were predicted across the county at the monthly timescale. Model types were compared by cross-validated (CV) R 2 and root-mean-square error (RMSE). The Bayesian additive regression trees model (BART) performed the best for both PM 2.5 (CV R 2 = 0.47, RMSE = 1.5 µ g / m 3 ) and PM coarse (CV R 2 = 0.65, RMSE = 8.07 µ g / m 3 ). Model predictions were also compared to measurements from the regulatory monitors. RMSE for the monthly models was 3.6 µ g / m 3 for PM 2.5 and 17.7 µ g / m 3 for PM coarse . Variable importance measures pointed to seasonality and length of roads as drivers of PM 2.5 , and seasonality, type of farmland, and length of roads as drivers of PM coarse . Predicted PM 2.5 was elevated near the US / Mexico border and predicted PM coarse was elevated in the center of Imperial Valley. Both sizes of PM were high near the western edge of the Salton Sea. This analysis provides some of the initial evidence for the utility of citizen science-derived pollution measurements to develop spatial and temporal models which can make estimates of pollution levels throughout vulnerable communities.


Introduction
Both PM 2.5 (particulate matter less than 2.5 µm in diameter) and PM 10 (particulate matter less than 10 µm in diameter) are linked to adverse health outcomes such as respiratory and cardiac disease, asthma, and increased mortality [1][2][3][4]. PM coarse is the difference between PM 10 and PM 2.5 and, as such, represents a different size fraction than PM 2.5 , whereas PM 10 overlaps with PM 2.5 . The United States (US) Environmental Protection Agency (EPA) refers to particles between 2.5 and 10 µm in diameter measurements of PM from both Federal Equivalent Method (FEM) beta-attenuation monitors (BAMs) and Federal Reference Method (FRM) filter-based gravimetric samplers. By comparing FEM and FRM data with data from the community air monitors, an equation for estimating mass concentrations from the Dylos particle count concentrations was developed. The calibration equation was validated by comparing calibrated monitor results with PM 2.5 measured by collocated reference instruments at six other sites. The process was previously described in detail [15].
Community monitoring PM 2.5 , PM 10 , relative humidity, and temperature data from 35 sites for a 12-month period, 1 October 2016 to 1 October 2017, were used in the following analyses. Relative humidity is known to change particle size due to the addition or subtraction of water from particles [16]. Since temperature and relative humidity were moderately correlated, only relative humidity was included in the conversion equation. The PM data were converted from particle counts to particle mass as detailed in Carvlin et al. [15]. However, the conversion equation was updated using data from the Calexico-Ethel site for the 12-month study period. The previous PM 10 where β 0 is the intercept, RH is the relative humidity as measured by the RH sensor on our custom circuit board, and e is the residual error. The factors c1, c2, and c3 were used in the inversion of the model to estimate BAM equivalent PM concentrations from Dylos measurements, as described in Carvlin et al. [15]. As a part of the conversion process, data were run through an automated quality control (QC) process; hours with less than 75% of data and data with particle counts less than 30 in Dylos bin 1 were discarded. After the automatic QC, a manual QC process was performed to identify time periods when the monitor response was slowly attenuated due to incremental dust build-up on the photodiode and when the monitor readings were oscillating rapidly between high and low, which resulted in a further 1.2% of data being dropped. The conversion equation can produce negative numbers, which were used as is in the following analyses unless otherwise noted. Hourly PM 2.5 and PM coarse data were averaged to monthly data using a 50% data completeness cutoff. This left a total of 207 monthly data points across 33 monitors. Each monitor had six months of data on average; however, some monitors had only a few months. The monitors that had the least amount of data were those near the Salton Sea. These monitors have poor cell reception and are subject to harsher environmental conditions, which leads to lower data completeness.
Regulatory data were downloaded from the California Air Resources Board (CARB) website. The regulatory network consists of five sites located near population centers in Imperial Valley that have Met One 1020 PM 10 beta attenuation monitors (BAMs) (Met One, Grants Pass, OR, USA) [17]. Two of these sites also have Met One 1020 PM 2.5 BAMs. There are also five sites located around the Salton Sea that are operated by the Imperial Irrigation District that have PM 2.5 and PM 10 Thermo Fischer Scientific Series 1405-D tapered element oscillating microbalances (TEOMs) (Thermo Fischer Scientific, Waltham, MA, USA) [18]. These sites were set up to monitor emissions from the Salton Sea as it recedes due to changes in water rights that reduced the agricultural runoff that kept the sea from evaporating. Only QC screened data were used. Values greater than 985 µg/m 3 for BAMs were excluded since this is above the range of the instrument [19]. No upper cutoff was used for TEOM measurements since all values were within the instrument range [20]. All negative values from regulatory instruments were kept as is and were included in the analysis.
Land-use variables and community monitor locations were loaded into ArcGIS (ESRI, v. 10.3.1, Redlands, CA, USA). Then, 250 m, 500 m, and 1000 m buffers were created around each monitor. Land-use parameters were sampled within each of these buffers. Geographic information system (GIS) and meteorological variables are listed in Table 1 along with their source, date, buffers, and averaging period. All data manipulation and analyses were performed using R statistical software (v. 3.3.3, https://www.r-project.org/). Agricultural burning records were received from the Imperial Air Pollution Control District. Acres burned was recorded on the daily level. This information was added to the model as acres burned within 5 km of a monitoring site within the last day. When multiple burns were recorded within 5 km of a site, they were summed.
Other GIS variables were considered but rejected since all monitors had the same value or nearly the same value for that variable. Dropped variables included indicators of industrial PM emissions since none of our monitors were located near industrial sites that had permits to release PM. Satellite PM 2.5 was included, but was not predictive, perhaps because the data were 15 years old and satellite measurements are known to not perform well in desert areas.
Meteorological data completeness was less than ideal, especially for planetary boundary layer height. Because the models require a complete dataset, hours that did not have meteorological data were dropped. This resulted in a limited number of complete hours for October and November 2017 and, therefore, they are not included in the monthly and yearly PM maps.
Some monitors have buffers that cross the US/Mexico border. However, we had no land-use data for Mexico. If only the US side of the land use was used, then the true value of that land use would be underestimated. To adjust for this, the percentage area of the buffer within Mexico was recorded for each site. Then, the variables were multiplied by 100/(100 − % area), which gave the land use for the whole buffer assuming the same distribution in Mexico as in the US. A satellite image of the area showed that, in most cases, land within the buffer on both sides of the border was primarily urban land.
Land-use variables were converted from continuous to categorical or binary. This was done because the monitors do not cover the range of land use seen throughout the valley, particularly the range of land use sampled by the grid of points used for prediction (the fishnet). Therefore, if linear extrapolation was used then the fishnet predictions failed, becoming extremely small or large. Histograms of each variable were analyzed to determine whether the variable should be converted to binary or categorical. The cut point for the binary variables was the first quartile. Most of these variables had nearly all of the measurements around zero and just a few at much higher values. All categorical variables were given three categories. The cut points for the categorical variables were the first quartile, the median, and the third quartile. If the continuous value was less than or equal to the first 25% of all values, it was given the categorical value "low"; if it was between 25% and 75%, it was categorized as "medium", and if it was greater than 75% it was categorized as "high". After conversion from continuous to categorical and binary, the fishnet predictions were much closer to the range of the monitoring measurements. However, the choice of cut points is dependent on the data and, therefore, limits the more general application of the models developed in this paper.
Three alternative models were considered. Categorical variables were converted to binary variables for use in models which cannot process categories. The models were a Bayesian additive regression trees (BART) model, a lasso model, and a partial least squares (PLS) model. PLS is a modeling technique that re-projects the data in order to find the dimension in the input variable space that explains the most variance in the outcome. PLS is used in PM modeling, in particular when there are a large number of variables [21]. Lasso is a penalized least squares method that reduces the number of variables in the model based on an alpha parameter. This parameter is chosen based on cross-validated testing. Mercer et al. [22] and Knibbs et al. [23] used lasso to help reduce the number of variables in PM LUR models. BART is a model that sums individual regression trees using a Bayesian approach [24]. It was used to predict torrential rain and avalanches, and to relate vehicle trip duration to household characteristics [25][26][27]. It should be noted that BART has a random component such that, each time it is re-run, it will produce slightly different results. This is why, for model creation and variable selection, it was important to have a large sample size of runs to get a sense of the average response for the model.
Models were compared by leaving out one site at a time and calculating the RMSE at that site using the rest of the sites. For each model, the RMSE was averaged across all sites. BART was found to be the best performing model for PM 2.5 and PM coarse . A variable selection test for these models was performed to identify which variables had the most impact on the model. The variable selection for the BART models was done by dropping one variable at a time from the model and calculating the test R 2 . Each BART model was run many times and the results were averaged to reduce bias from the random component. The variables that led to the largest decrease in R 2 were those that had the most impact on the model. The 10 most important variables for the PM 2.5 and PM coarse models were compared. In order to compare variable selection stability across models, the top 10 BART and lasso variables were compared. Lasso variables were selected based on their standardized coefficient values, corrected by their standard deviations.

Prediction
The same steps described in Section 2.1 were followed for a grid of points, the fishnet, equally spaced one mile (1.6 km) apart across all of Imperial County (i.e., sampling GIS parameters within buffers at each grid point, finding the nearest meteorological data, etc.).
The BART models were used to predict monthly PM 2.5 and PM coarse concentrations on the fishnet points. The residuals of the PM 2.5 and PM coarse BART models were kriged and added to the model predictions before mapping. This helped the model account for purely spatial variability. These combined PM levels, at both the monthly and study-average timescale, were kriged and a 2D surface was created and compared to kriged PM measurements.
The fishnet predictions were then compared to regulatory PM concentrations. The fishnet point closest to the regulatory site was chosen for comparison. The longest distance between a fishnet point and a regulatory site was <1 km. Model predictions and regulatory measurements were compared using R 2 and RMSE on the monthly and yearly timescales. Table 2 presents summary statistics for the community monitors averaged over the 40 sites in the network compared to the regulatory monitors. Regulatory sites are located to provide coverage of populated areas in the Imperial Valley, as well as to monitor the air quality around the Salton Sea, and they consist of a mix of BAM and TEOM instruments [14]. High hourly PM 2.5 and very high hourly PM 10 and PM coarse values were seen by both the community monitors and the regulatory monitors. The community monitors produced a similar county average to the regulatory monitors for PM 2.5 . However, the community monitors measured slightly lower PM 10 and PM coarse .  Table 3 shows the result of the modeling process for monthly PM 2.5 . RMSE and R 2 were averaged across all of the leave-one-site-out variants. The lasso model chose 47 variables and the PLS model chose 24 variables. In terms of RMSE, lasso and PLS performed similarly and BART performed the best. The average RMSE between model predictions and the test set ranged from 1.50 to 1.79 µg/m 3 . Model R 2 values did not follow the same order as model RMSE. PLS had the highest R 2 (0.54), followed by PLS and BART (0.54 and 0.47, respectively). RMSE was used to select the top performing model since a lower prediction error was deemed more important than linearity. Also, there were a variable number of months of data for each site, and low sample size can dramatically change R 2 , especially when the range of the data is small. For example, some sites only had two months of data and others only had a 2-µg/m 3 variation in monthly average PM 2.5 over the course of a year. BART was chosen as the top performing model based on RMSE. The results of the modeling process for monthly PM coarse are presented in Table 4. Lasso chose 40 variables and PLS chose seven variables. Lasso and PLS performed similarly and BART performed the best. The average RMSE for PM coarse models ranged from 9.14 to 12.30 µg/m 3 . Model R 2 values followed the order of model RMSE for PM coarse . The R 2 value for the PM coarse BART model was 0.65. The difference in performance between models was found to be unstable while testing differing predictor variables, i.e., the top model would switch between PLS, lasso, and BART depending on which variables were included. This would suggest, with this dataset, that these model types are more or less comparable in terms of performance. Table 5 shows the results from the variable importance test for the PM 2.5 and PM coarse models. The top variable for the PM 2.5 model was relative humidity, and the top variable for the PM coarse model was temperature. These variables varied monthly and likely accounted for seasonal patterns. It should be noted the relative humidity may have a confounding effect since it was used to correct the Dylos measurements and was used in the model. However, the relative humidity data used in the model were a monthly average. The PM 2.5 model places more emphasis on meteorological variables, such as wind direction, wind speed, and planetary boundary layer height, and location, while the PM coarse model places more emphasis on land-use variables for farmland, native vegetation, and roads. The California Department of Water Resources defines native vegetation as all desert land and non-riparian land near bodies of water. For PM 2.5 , the lasso model chose land-use variables that represented nearly all categories of land use, as well as a variable representing railroad length, whereas the BART model put more emphasis on meteorological variables. For PM coarse , the lasso model and BART models chose similar variables with more emphasis on land use in the lasso model and more emphasis on transportation in the BART model. The PM coarse lasso model also did not have any meteorological variables. Figure 1a,b show the kriged study-average residuals from the PM 2.5 and PM coarse models. The PM 2. to see if there was spatial correlation in the residuals by month. PM2.5 had two months where p < 0.05, May and August 2017, and PMcoarse had three months with p < 0.05, January, July, and August 2017. The average measured PM2.5 over the study period is shown in Figure 2a and the average predicted PM2.5 is shown in Figure 2b. The predicted PM2.5 map is a combination of the PM2.5 model and the kriged PM2.5 residuals. The spatial structure of the predicted PM2.5 map matches that of the measured PM2.5 map. The highest average PM2.5 levels were near the US/Mexico border and to the west of the Salton Sea near Salton City. Low-PM2.5 regions appeared primarily to the east of the Salton Sea. The predicted PM2.5 map is biased low compared to the measured PM2.5 map. The average measured PM 2.5 over the study period is shown in Figure 2a and the average predicted PM 2.5 is shown in Figure 2b. The predicted PM 2.5 map is a combination of the PM 2.5 model and the kriged PM 2.5 residuals. The spatial structure of the predicted PM 2.5 map matches that of the measured PM 2.5 map. The highest average PM 2.5 levels were near the US/Mexico border and to the west of the Salton Sea near Salton City. Low-PM 2.5 regions appeared primarily to the east of the Salton Sea. The predicted PM 2.5 map is biased low compared to the measured PM 2.5 map.  Figure 3a shows the average measured PMcoarse and Figure 3b shows the average predicted PMcoarse over the study period. The predicted PMcoarse map is a combination of the PMcoarse model and the kriged PMcoarse residuals. The spatial structure of the PMcoarse map differs slightly from the measure PMcoarse map in that the predictions are lower to the southeast of the Salton Sea and higher in the southeast portion of the Imperial Valley. The highest PMcoarse to the west of the Salton Sea and the area east of the Salton Sea had low PMcoarse concentrations. The predicted PMcoarse map is biased low compared to the measured PMcoarse map.  Figure 3a shows the average measured PM coarse and Figure 3b shows the average predicted PM coarse over the study period. The predicted PM coarse map is a combination of the PM coarse model and the kriged PM coarse residuals. The spatial structure of the PM coarse map differs slightly from the measure PM coarse map in that the predictions are lower to the southeast of the Salton Sea and higher in the southeast portion of the Imperial Valley. The highest PM coarse to the west of the Salton Sea and the area east of the Salton Sea had low PM coarse concentrations. The predicted PM coarse map is biased low compared to the measured PM coarse map.      Table 6 shows summary statistics comparing the fishnet predictions and data from the regulatory monitoring network for monthly and yearly averages. The yearly RMSE was 3.2 µg/m 3 for PM 2.5 and 17.7 µg/m 3 for PM coarse . The difference between the model prediction and regulatory monitor values is likely driven by spatiotemporal factors not accounted for in the model and measurement error introduced by the monitors and predictor variables. In particular, PM coarse measurements were biased low compared to regulatory measurements as noted above.  [7]. All of these models produced annual averages of PM 2.5 and, therefore, are not directly comparable to the monthly models described in this paper. That said, the R 2 and RMSE values for the PM 2.5 models presented in this paper are within the range of the annual models found in the literature.

Results
Eeftens et al. [6] created PM coarse LUR models for 20 European cities as a part of the ESCAPE project. The range of validation R 2 seen across the 20 models was 0.03 to 0.73, with a median R 2 of 0.57. Wolf et al. [28] used the ESCAPE data to create finer-spatial-resolution PM coarse LUR models for Germany, which had a validation R 2 of 0.49. Eeftens et al. [29] modeled PM coarse using an LUR model for eight areas in Switzerland. The average validation R 2 was 0.38. These studies were also annual averages. In general, PM coarse LUR models seem to have similar or slightly lower performance compared to PM 2.5 LUR models. The PM coarse models presented in this paper have R 2 values in the range of the annual models found in the literature.
Ahangar et al. [30] used 2017 data from the community network to obtain a finely resolved concentration map of PM using a dispersion model. The residuals between model estimates at the monitor locations and the measured concentrations were then interpolated to the grid points using kriging. They compared predicted monthly averaged PM with data from a regulatory monitor at one location and found that most of the modeled values fell within a factor of two of the regulatory values; their resulting concentration maps were consistent with this study in showing the highest values of PM at the international border, yet concentrations were higher directly south of the Salton Sea in their study as opposed to the west of the Salton Sea in the present study. These differences may be due to time period differences of the two studies and the different modeling approaches.
This paper helps to elucidate the effect of meteorological and land-use parameters on particulate matter in Imperial County. The variable selection process pointed to seasonality, as measured by relative humidity, meteorological parameters, and length of roads as prime drivers in monthly PM. Important variables for monthly PM coarse were seasonality, based on temperature, land-use variables for farmland, and length of roads. The predicted PM coarse map showed areas of high PM coarse around Brawley and to the west of the Salton Sea. The predicted PM map showed high levels in a large area near the US/Mexico border and to the west of the Salton Sea. This may point to agriculture as a potential source of PM coarse , cross-border transport as a potential source of PM, and windblown dust as a source of both sizes of PM.
The PM and PM coarse model variable selection and predicted PM maps seem to agree with the emissions inventories in the 2018 PM 2.5 and 2018 PM 10 state implementation plans (SIPs) created by the Imperial County Air Pollution Control District (ICAPCD) [9,10]. The 2018 PM 2.5 SIP found that the main drivers of PM within Imperial County were unpaved road dust (39%), fugitive windblown dust (30%), off-road vehicles (8%), and agriculture (7%). This was seen in the models as a focus on seasonality and meteorology. The SIP argues that Imperial County would be in compliance with the National Ambient Air Quality Standards but for the transport of pollution from Mexico. As points of evidence, they cite emissions inventories for Calexico and Mexicali, Mexico that suggest that 60% of PM 2.5 in Calexico comes from Mexicali. Two other sites in the center of Imperial County, El Centro and Brawley, had an annual PM concentration of 7-8 µg/m 3 in 2016, whereas Calexico had an annual PM 2.5 concentration of 12.5 µg/m 3 . This can be seen in the elevated measured and predicted PM 2.5 values at the US/Mexico border in Figure 2.
The 2018 PM 10 SIP found that the main drivers of PM 10 were fugitive windblown dust (75%), unpaved road dust (18%), and agriculture (3%). The PM coarse models, which can roughly be compared to the SIP's PM 10 emissions inventory, may have picked this up as seasonality and land use, specifically native vegetation, which was defined as primarily desert area. According to the SIP, wind patterns in Imperial County include high speed wind from the west, particularly during April and May, which may account for the high PM coarse levels seen during May 2017 in Figure 5.
In both Figures 2 and 3, high PM values can be seen west of the Salton Sea. The CARB review of the ICAPCD 2018 PM SIP remarks on this, saying that these high PM concentrations are due to dust from disturbed soils being swept into the air during high-wind events [31].
While there are challenges regarding calibration and accuracy of low-cost sensors, their low cost means they can be assembled into high-density networks that enable detailed modeling such as the LUR presented in this paper. Furthermore, the sensor and model data can be used help double-check and augment existing air monitoring and modeling efforts such as the emissions inventories in the Imperial County SIPs. If the necessary steps are taken to ensure data accuracy and proper display and explanation of sensor data, the authors see citizen science-derived data from low-cost air sensors as a valuable tool for communities, academic researchers, and government air-quality stakeholders.