Combining Regulatory Instruments and Low-Cost Sensors to Quantify the Effects of 2020 California Wildfires on PM2.5 in San Joaquin Valley

The San Joaquin Valley in California has some of the worst air quality conditions in the nation, affected by a variety of pollution sources including wildfires. Although wildfires are part of the regional ecology, recent increases in wildfire activity may pose increased risk to people and the environment. The 2020 wildfire season in California included the largest wildfires reported to date and resulted in poor air quality across the state. In this study, we examined the air quality effects of these wildfires in the San Joaquin Valley area. We determined that four wildfires (LNU Lightning Complex, SCU Lightning Complex, Creek, and Castle) were primarily affecting the air quality in the area. The daily PM2.5 emissions from each of these wildfires were estimated, and the largest daily emissions, 1935 ton/day, were caused by the Creek fire. To analyze the air quality in the study area, we developed a method that combines regulatory and low-cost sensor data to estimate daily PM2.5 concentration levels at 5 km spatial resolution. The concentration maps showed that the highest average concentration levels were reached on 17 September, with an area-wide average of 130 µg/m³, when about one-fifth of the study area was affected by hazardous PM2.5 levels. A sensitivity study of our interpolation method showed that adding low-cost sensors to regulatory data improved the performance of area-wide concentration estimates and reduced the mean absolute error and the root mean square error by more than 20%.


Introduction
The San Joaquin Valley is affected by some of the poorest air quality conditions in the nation [1]. The area is especially affected by high PM2.5 concentrations and is designated as a non-attainment area for PM2.5 [2]. Exposure to PM2.5 has been associated with multiple adverse health effects [3], and PM2.5 levels in California may worsen over time due to climate-driven changes in emissions and meteorological conditions [2,4]. Moreover, exposure to PM2.5 is correlated with poor perceived air quality in the Valley [5].
Increases in wildfire activity in the region, driven by a variety of factors including fire suppression, land-use changes, and climate change [6], pose a risk of increased exposure to wildfire smoke. Furthermore, wildfires, a major source of PM2.5, are expected to increase in the area over the next decades, and the effects of a changing climate combined with population growth could result in more toxic exposure in the Valley [7,8]. The PM2.5 contribution of wildfires can be even larger during the summer and in rural areas [9]. This contribution may become more significant beyond the summer, as the fire season extends from May to December in some regions of California and extreme autumn wildfires are becoming more frequent [10].
Traditionally, the impact of wildfire events on air quality has been investigated using monitors in densely populated urban areas, which are usually far from the emission sources [11][12][13]. However, assessing smoke impacts in areas close to wildfires can support accurate exposure and risk calculations as well as response planning. One promising solution in recent years has been to expand the ground-level monitoring network for wildfire smoke and other pollution sources using low-cost sensors [14]. The accessibility and quick deployment of these sensors have allowed the air quality impacts of various wildfire events to be monitored in recent years [15,16]. However, observations from these sensors are less accurate than those from regulatory monitors and usually need correction before being used in any analysis [17]. Recently, these sensors have been used to improve air quality models and high-resolution concentration maps, which traditionally were created using regulatory monitor observations alone [18,19].
The objective of this study is to analyze the impact of particulate pollutants emitted by wildfires occurring between August and December 2020 on the ambient air quality of the Central Valley using regulatory stations along with the PurpleAir low-cost sensor network. The 2020 fire season was the most active in California history so far (https://www.fire.ca.gov/incidents/2020/, accessed on 15 May 2021). Another aim of this study is to use a trajectory model to determine which of the major wildfires had the greatest impact on air quality in this region. The periods before and during major wildfire events were analyzed and compared, and daily PM2.5 concentration maps were created for the study area. In addition to air quality, PM2.5 emissions from these wildfires were analyzed. We also quantified the accuracy gained in the concentration maps by combining low-cost sensor and regulatory station data, compared with the traditional method of using regulatory station data alone.

Study Location and Period
This study primarily focuses on the San Joaquin Valley Air Basin. In addition to the Basin, we included Mariposa, Tuolumne, and Calaveras counties in the Mountain Counties Air Basin to obtain a more uniform study area over the Sierra Nevada mountains (Figure 1). The study area covers 11 counties, extending from the Sacramento-San Joaquin Delta in the north to the Tehachapi Mountains in the south and from the coastal mountain ranges in the west to the Sierra Nevada in the east. The study period starts on 17 July 2020, a month before the start of the first major wildfire. This period, 17 July to 17 August, was chosen to provide a baseline measurement period before the effects of the wildfires. The analysis was extended to the containment of the last major wildfire in the area on 31 December 2020.
Fire perimeters include both the active fire front and the burned area that is no longer active; however, only the active fire area is required to estimate emissions. To obtain the active fire area, two approaches were used. In the first, consecutive daily fire perimeters were subtracted in ArcGIS Pro, so that the resulting perimeter represented only the active fire area. In the second approach, VIIRS fire pixels were obtained from the NOAA HMS database (Hazard Mapping System, https://www.ospo.noaa.gov/Products/land/hms.html, accessed on 15 May 2021), which uses fire detections from the 2 km resolution ABI data from GOES-16 (East) and GOES-17 (West), 1 km data from the MODIS instrument aboard the Terra satellite, and 375 m resolution VIIRS data from the S-NPP and NOAA-20 polar-orbiting satellites.
The Visible Infrared Imaging Radiometer Suite (VIIRS) instrument provides the highest-resolution fire detections of these sensors, at 375 m; it was first launched in October 2011 aboard the Suomi National Polar-orbiting Partnership (S-NPP) satellite [20].
The HMS fire data were post-processed to isolate fire detections within each fire area. Because VIIRS data were available for the entire duration of the fires, the second (VIIRS) approach was used throughout the analysis. The first approach, based on fire perimeters, was used only to validate the VIIRS active fire area. Additionally, fire perimeters were used to geolocate the fires of interest and to process fuels, emissions, and fire pixel data.

Fire Emission Estimation
Daily fire emissions were calculated using two different methods. The first method uses a custom algorithm that applies the First Order Fire Effects Model (FOFEM) [21,22] Version 6.7 [23] to estimate emissions, while the second method obtains emission estimates from an emissions database, the Fire Inventory from NCAR (FINN) [24]. Two different estimation methodologies were applied to increase confidence in the emission estimates. FOFEM estimates fire emissions, in units of weight per unit area, based on fuel characteristics and moisture content. Note that FOFEM provides a point estimate, not a spatially explicit output. Since emissions over an entire fire perimeter were of interest here, a procedure similar to that of Clinton et al. (2006) was developed to apply the point estimates produced by FOFEM to a gridded spatial area [25]. Before running FOFEM, daily fire perimeters, VIIRS fire pixels, and fuel loading maps were obtained and post-processed using ArcGIS Pro and Python. Emission lookup tables were generated by running FOFEM for the fuel types present in the study area. Default model settings were used for all cases, and fuel moisture was set to dry. Daily emissions were obtained from Fuel Characteristic Classification System (FCCS) fuel loading data for each VIIRS fire pixel within the daily active fire area: the fuel data were overlaid with the FOFEM-derived emission tables for each fuel class, so that emissions were calculated for all fuels contained within a fire perimeter. Although FOFEM provides information on various emitted species, only PM2.5 was included in the analysis here for comparison with air quality data. A flowchart for this process is shown in Figure 2.
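The lookup-table step can be illustrated with a minimal sketch; it is not the authors' code. It assumes that each VIIRS fire pixel inside the daily active fire area is assigned a single FCCS fuel class, that the FOFEM-derived table gives PM2.5 emitted per unit burned area for that class, and that each pixel represents a 375 m × 375 m footprint. The fuel class IDs, emission factors, and function name below are hypothetical placeholders, not FOFEM output.

```python
from collections import defaultdict

# Hypothetical FOFEM-derived lookup table: PM2.5 emitted per unit burned
# area (tons per km^2) for each FCCS fuel class ID. Placeholder values only.
PM25_TONS_PER_KM2 = {
    "fccs_A": 2.0,
    "fccs_B": 5.0,
}

# Assumed footprint of one VIIRS fire pixel (375 m x 375 m), in km^2.
PIXEL_AREA_KM2 = 0.375 * 0.375


def daily_pm25_emissions(fire_pixels):
    """Sum PM2.5 emissions (tons/day) per date.

    fire_pixels: iterable of (date, fuel_class) pairs, one per VIIRS fire
    pixel that falls inside that day's active fire area.
    """
    totals = defaultdict(float)
    for date, fuel_class in fire_pixels:
        factor = PM25_TONS_PER_KM2.get(fuel_class)
        if factor is None:
            continue  # no lookup entry (e.g., a non-burnable class): skip
        totals[date] += factor * PIXEL_AREA_KM2
    return dict(totals)
```

For example, two pixels on one day with classes `fccs_A` and `fccs_B` would contribute (2.0 + 5.0) × 0.140625 ≈ 0.98 tons under these placeholder factors. The real procedure works with full FCCS fuel loadings and FOFEM consumption outputs rather than a single factor per class.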
Fire Inventory from NCAR (FINN) emission estimates for 2020 were obtained from the National Center for Atmospheric Research database (Fire Emission Factors and Emission Inventories, https://www.acom.ucar.edu/Data/fire/, accessed on 1 July 2021). FINN provides daily emission estimates at an approximate horizontal resolution of 1 km² [24].

Trajectory Modeling
Multiple wildfires occurred during the study period in California. The first step in the analysis of the wildfire effect is to determine which of these wildfires had considerable impacts on the study area. For this purpose, we calculated the trajectory of air parcels starting from the location of each fire from the start to the containment date.
These forward trajectories were calculated using NOAA's HYSPLIT (Hybrid Single-Particle Lagrangian Integrated Trajectory) model version 1.5 [26] through the SplitR package (https://github.com/rich-iannone/SplitR, accessed on 7 October 2020) [27]. Every six hours, 48 h forward trajectories were started at each fire location, 50 m above ground level. Global Data Assimilation System (GDAS) model output at 1 × 1-degree resolution was used as the meteorological input. The trajectory locations were apportioned to a 50 × 50 km grid to identify the most frequent trajectory paths. The results of this analysis were used to identify the wildfires most likely to affect air quality in the study area for further emission analysis. While HYSPLIT provides an estimate of possible trajectories over the study area, ground-level smoke effects were assessed using a combination of regulatory and low-cost instruments, as explained in the following sections.
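The gridding step can be illustrated with a short sketch. It assumes that trajectory endpoints are available as (latitude, longitude) pairs, for example as exported from the HYSPLIT output, and it uses a simple equirectangular projection around an assumed reference latitude of 37° N to define the 50 km cells. The projection, grid origin, and function names are illustrative assumptions, not the study's exact procedure.

```python
import math
from collections import Counter

CELL_KM = 50.0            # grid spacing of the density map
KM_PER_DEG_LAT = 111.32   # approximate length of one degree of latitude


def to_km(lat, lon, ref_lat):
    """Equirectangular projection to (x, y) in km; adequate for a small region."""
    x = lon * KM_PER_DEG_LAT * math.cos(math.radians(ref_lat))
    y = lat * KM_PER_DEG_LAT
    return x, y


def trajectory_density(trajectories, ref_lat=37.0):
    """Count trajectory endpoints falling in each 50 x 50 km cell.

    trajectories: list of trajectories, each a list of (lat, lon) endpoints.
    Returns a Counter keyed by (column, row) cell indices.
    """
    counts = Counter()
    for traj in trajectories:
        for lat, lon in traj:
            x, y = to_km(lat, lon, ref_lat)
            counts[(math.floor(x / CELL_KM), math.floor(y / CELL_KM))] += 1
    return counts


def n_trajectory_starts(fire_days, interval_h=6):
    """Number of trajectory start times for a fire burning fire_days days."""
    return fire_days * 24 // interval_h
```

With a six-hour start interval, a fire burning for 10 days yields 40 trajectories; the cells with the highest counts trace the most frequent transport paths.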

Regulatory Data
For the regulatory stations in the study area, hourly PM2.5 values were downloaded from the California Air Resources Board (CARB) network (http://www.arb.ca.gov/aqmis2/aqmis2.php, accessed on 15 May 2021). This network included data from 20 stations in the study area operated by different agencies; more details about these stations and their operating agencies are provided in Table SI-1. PM2.5 measurements were collected at hourly resolution using real-time Beta-Attenuation Monitors (BAM) manufactured by Met One Instruments, Inc. In addition to the PM2.5 data, hourly wind direction and speed data were downloaded for further analysis of wind patterns in the study area.

Low-Cost Sensors
In this study, we used PM2.5 data from low-cost sensors to expand the observations from the regulatory network. The PurpleAir network, the densest network in the study area, deployed by various air quality agencies, citizen groups, and individuals, was used for this purpose (www2.purpleair.com, accessed on 15 May 2021). PurpleAir sensors report 2-min average PM2.5 levels on two channels (labeled A and B), measured by two laser scattering sensors, as well as temperature, pressure, and relative humidity. These sensors have been used and evaluated in a variety of studies for different purposes, including air quality assessment of wildfires [14,15,17,28]. The sensor data were accessed with AirSensor, an open-source R package designed for accessing and visualizing PurpleAir observations [29]. The most conservative quality assurance algorithm in the AirSensor package was used to filter out suspect data points. This algorithm calculates hourly PM2.5 concentrations by averaging the values measured on both channels. Hourly data were invalidated if the minimum count was less than 20 (66% data completion), if the hourly mean difference between channels A and B exceeded 5 µg/m³, or if the 3-hourly data recovery was less than 90%. In addition to these criteria, sensors that were operational for less than half of the study period were not used in the analysis.
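A simplified version of the hourly screening is sketched below; it is not the AirSensor implementation. It applies only the minimum-count rule (20 of the 30 two-minute readings in an hour) and an A/B agreement rule, interpreting the 5 µg/m³ threshold as an upper bound on the disagreement between channel means. The data recovery check is omitted, and the function and constant names are hypothetical.

```python
from statistics import mean

MIN_COUNT = 20        # at least 20 of 30 two-minute readings (~66%)
MAX_AB_DIFF = 5.0     # µg/m^3; assumed limit on |mean_A - mean_B|


def hourly_pm25(channel_a, channel_b):
    """Return the hourly PM2.5 estimate from paired 2-min readings, or None.

    channel_a, channel_b: lists of 2-min PM2.5 readings (µg/m^3) for one hour,
    with None marking missing readings.
    """
    a = [v for v in channel_a if v is not None]
    b = [v for v in channel_b if v is not None]
    # Completeness check on each channel.
    if min(len(a), len(b)) < MIN_COUNT:
        return None
    mean_a, mean_b = mean(a), mean(b)
    # Channel disagreement check: invalidate if A and B diverge too much.
    if abs(mean_a - mean_b) > MAX_AB_DIFF:
        return None
    # Valid hour: average of the two channel means.
    return (mean_a + mean_b) / 2
```

An hour with channel means of 10 and 12 µg/m³ would be kept as 11 µg/m³, whereas an hour with means of 10 and 20 µg/m³, or with only 10 valid readings on one channel, would be invalidated.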
An important consideration for PurpleAir sensors, as for the majority of low-cost sensors, is the accuracy of the measurements and the data correction they require [30,31]. These sensors are known to overestimate PM2.5 concentrations, which can be corrected using collocated regulatory measurements [17,32]. In this study, we used data from five sensors collocated with monitors at regulatory stations to correct the sensor measurements. These stations were located at Visalia, Modesto-14th Street, Bakersfield-California Avenue, Fresno-Garland, and Yosemite Village-Visitor Center (Table SI-1). While simple linear regression can eliminate much of the bias of these sensors, adding temperature and relative humidity has been suggested in the literature to improve the results [17,19,32]. Thus, we tested multiple correction models similar to those used by Barkjohn et al. (2020) and Malings et al. (2020) [17,33]. The performance of the different correction models is shown in Table SI-2. Multivariable linear regression equations using relative humidity and temperature performed only minimally differently from simple linear regression. Hence, we used the simple linear regression model between the collocated measurements because of its simplicity and comparable performance. The linear regression showed that sensor PM2.5 observations are highly correlated with the regulatory monitors (r² = 0.87) (Figure SI-1).
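A minimal sketch of a simple linear regression correction is shown below, fitted by ordinary least squares with the regulatory BAM value as the response and the sensor value as the predictor. The direction of the regression and the function names are assumptions for illustration. The study's fitted coefficients are not reported in this section, so none are reproduced here, and any inputs used with this sketch are illustrative.

```python
from statistics import mean


def fit_linear_correction(sensor, reference):
    """Ordinary least-squares fit of reference = slope * sensor + intercept.

    sensor, reference: paired PM2.5 values (µg/m^3) from collocated instruments.
    Returns (slope, intercept, r_squared).
    """
    mx, my = mean(sensor), mean(reference)
    sxx = sum((x - mx) ** 2 for x in sensor)
    syy = sum((y - my) ** 2 for y in reference)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sensor, reference))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


def correct(value, slope, intercept):
    """Apply the fitted correction to a raw sensor reading."""
    return slope * value + intercept
```

Because PurpleAir sensors tend to read high, a fit of this kind typically yields a slope below 1, which scales the raw sensor readings down toward the regulatory values.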

Interpolation Analysis
The PM2.5 data from sensors and regulatory stations were averaged daily and combined to create a concentration network in the study area. Since regulatory data are more accurate than low-cost sensor data, data from low-cost sensors located within 1 km of a regulatory station were filtered out to give more weight to the regulatory monitors. Moreover, if two sensors were located less than 1 km from each other, the sensor with lower data recovery was filtered out to avoid overfitting in the interpolation or sensitivity analysis and to prioritize the higher-quality sensors. The total number of low-cost sensors used in the final analysis was 165.
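The two 1 km spacing rules can be sketched as follows. The sketch assumes that each sensor has a location and a data recovery fraction. For the sensor-to-sensor rule, it uses a greedy pass in order of decreasing data recovery, which is one reasonable reading of "the sensor with lower data recovery is filtered out" when several sensors cluster together. The data structures and function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0
MIN_SEPARATION_KM = 1.0


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def select_sensors(sensors, stations):
    """Drop sensors near regulatory stations or near better-covered sensors.

    sensors: dict sensor_id -> ((lat, lon), data_recovery_fraction)
    stations: list of (lat, lon) regulatory station locations
    Returns the list of retained sensor IDs.
    """
    # Rule 1: remove sensors within 1 km of any regulatory station.
    candidates = [
        sid for sid, (loc, _) in sensors.items()
        if all(haversine_km(loc, st) >= MIN_SEPARATION_KM for st in stations)
    ]
    # Rule 2: among sensors closer than 1 km to each other, keep the one with
    # higher data recovery (greedy pass in order of decreasing recovery).
    candidates.sort(key=lambda sid: sensors[sid][1], reverse=True)
    kept = []
    for sid in candidates:
        loc = sensors[sid][0]
        if all(haversine_km(loc, sensors[k][0]) >= MIN_SEPARATION_KM
               for k in kept):
            kept.append(sid)
    return kept
```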
Daily data from sensors or regulatory monitors with fewer than 18 valid hours were invalidated. Surface concentrations over the study area were calculated as a gridded surface of daily PM2.5 values by interpolating the regulatory and sensor measurements. We used ordinary kriging for the interpolation [34], a method used in previous air quality studies to interpolate concentrations over local or regional scales [9,19,35,36]. Ordinary kriging performs best with normally distributed data, and for non-normally distributed data a transformation toward normality is recommended [37]. Although exact normality is not always achievable in practice, a natural logarithmic transformation was used in this study to obtain a more nearly normal distribution (Figure SI-2). Concentration levels were interpolated over a 5 × 5 km grid encompassing the study area. The "gstat" library in R was used to perform the kriging analysis for each day using all available observations [38]. First, experimental semivariograms were calculated; then a model was fitted using the default weighted least-squares fit to estimate the model parameters (e.g., sill, nugget, and range). The spherical semivariogram model had the smallest median least-squares error during the study period and was therefore used in these interpolations.
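The following stdlib-only sketch shows ordinary kriging with a spherical semivariogram on log-transformed concentrations at a single grid point. It is not the gstat implementation. The semivariogram parameters (nugget, partial sill, range) are passed in directly rather than fitted by weighted least squares, coordinates are assumed to be projected in km, and the back-transform is a plain exponential, which ignores the bias correction sometimes applied in lognormal kriging.

```python
import math


def spherical(h, nugget, psill, rng):
    """Spherical semivariogram; gamma(0) is 0 by definition."""
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + psill
    r = h / rng
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)


def distance(p, q):
    """Euclidean distance between projected (x, y) points in km."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_kriging(points, values, target, nugget, psill, rng):
    """Estimate PM2.5 at target by ordinary kriging of log-concentrations.

    points: list of (x_km, y_km); values: daily PM2.5 (µg/m^3, must be > 0).
    """
    logs = [math.log(v) for v in values]
    n = len(points)
    # Kriging matrix: semivariances between observations, bordered by a row
    # and column of ones (Lagrange multiplier forcing weights to sum to 1).
    a = [[spherical(distance(points[i], points[j]), nugget, psill, rng)
          for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [spherical(distance(p, target), nugget, psill, rng) for p in points]
    b.append(1.0)
    weights = solve(a, b)[:n]
    # Naive back-transform: exp of the kriged log value (no bias correction).
    return math.exp(sum(w * z for w, z in zip(weights, logs)))
```

Repeating this estimate at every 5 × 5 km cell centre produces one daily map. Because the weights sum to one, a day with uniform concentrations is reproduced exactly, and the estimate at an observation location returns that observation.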

Interpolation Performance and Sensitivity Analysis
The performance of the interpolation was assessed through spatial cross-validation. The stations used for the interpolation were divided into three random groups. For each day, the data in each group were validated against the interpolation model trained on the remaining stations, and the mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (r²) were calculated from the interpolated and observed concentrations. This procedure was repeated 100 times with different random groupings, and the average MAE, RMSE, and r² were used as measures of performance for the interpolated maps.
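The repeated three-group cross-validation can be sketched as below. To keep the example short and self-contained, inverse distance weighting stands in for the kriging step (any interpolator with the same call signature can be passed in). Metrics are pooled over all held-out stations for a single day, and r² is computed as the squared Pearson correlation between observed and predicted values, one common reading of "coefficient of determination" for this kind of comparison. All names are hypothetical.

```python
import math
import random
from statistics import mean


def idw(train, target, power=2.0):
    """Inverse-distance-weighted estimate; a stand-in for the kriging step."""
    num = den = 0.0
    for (x, y), v in train:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0:
            return v
        w = d ** -power
        num += w * v
        den += w
    return num / den


def metrics(obs, pred):
    """MAE, RMSE, and r^2 (squared Pearson correlation) of obs vs pred."""
    err = [p - o for o, p in zip(obs, pred)]
    mae = mean(abs(e) for e in err)
    rmse = math.sqrt(mean(e * e for e in err))
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return mae, rmse, cov ** 2 / (var_o * var_p)


def repeated_group_cv(stations, n_groups=3, n_repeats=100, seed=0,
                      interpolate=idw):
    """Average MAE, RMSE, r^2 over repeated random splits into n_groups.

    stations: list of ((x_km, y_km), daily_value) for a single day.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        order = stations[:]
        rng.shuffle(order)
        groups = [order[g::n_groups] for g in range(n_groups)]
        obs, pred = [], []
        for g, held_out in enumerate(groups):
            # Train on the other groups, predict the held-out stations.
            train = [s for k, grp in enumerate(groups) if k != g for s in grp]
            for loc, value in held_out:
                obs.append(value)
                pred.append(interpolate(train, loc))
        scores.append(metrics(obs, pred))
    return tuple(mean(s[i] for s in scores) for i in range(3))
```

Running the same function once on regulatory stations only and once on the combined network gives the two sets of scores compared in the sensitivity analysis.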
A sensitivity analysis was then performed to quantify the improvement in the interpolated maps gained by combining sensor and regulatory measurements, compared with using regulatory measurements only. First, concentration maps were created using only the regulatory stations. The accuracy of these maps was then compared with that of the maps including sensor data by comparing the average errors and r² in the cross-validation analysis.

Wildfire
The forward trajectories from the HYSPLIT model suggested that four of the wildfires were affecting the study area: SCU Lightning Complex, LNU Lightning Complex, Creek, and Castle. Figure 3 shows the density of trajectories starting from each of these wildfires. The higher trajectory density inside the study area indicates that more smoke could be transported from these fires than from the other wildfires during the study period, which had a smaller impact on the air quality of the area (Figure SI-3). To confirm that the HYSPLIT results were not sensitive to the selected initial height of the trajectories, the trajectory analysis was repeated for these fires at 100 m and 200 m heights. The outcome was consistent with the original results (not shown).
A summary of the incident information for the four fires of interest identified by the HYSPLIT model is presented in Table 1. All fires started in August except the Creek fire, which started in September. The largest fire was the SCU Lightning Complex, which reached an area of 396,624 acres. The smallest fire was the SQF Complex (Castle fire), which reached an area of 174,178 acres. Maps showing the fire growth are presented in Figure 4.
Daily fire emissions were estimated using data from the FINN database and FOFEM. For each fire, we examined emissions occurring between the start and end dates listed in Table 1. Figure 5a,b show the PM2.5 emissions for the LNU Lightning Complex and SCU Lightning Complex fires, respectively; both fires exhibited peak emissions during August. According to these estimates, emissions from the SCU and LNU fires peaked in the same week, between 18 and 20 August. Specifically, FOFEM estimated that peak emissions for the LNU fire occurred on 19 August, whereas FINN data placed the peak on 20 August. Likewise, for the SCU fire, FOFEM estimated peak emissions on 19 August, whereas FINN placed the peak on 20 August. The maximum PM2.5 emissions were higher for the LNU fire than for the SCU fire. As with the Castle and Creek fire estimates, the daily emission estimates from FINN were larger than the FOFEM estimates. The FINN estimate of maximum LNU fire PM2.5 emissions was 1731 ton/day, whereas the FOFEM estimate was 698 ton/day. For the SCU fire, the FINN estimate indicated maximum emissions of 483 ton/day, whereas the FOFEM estimate indicated 202 ton/day.
Emissions from both fires decreased substantially around the end of August, before their total containment. The daily emission estimates from FINN are generally more than twice the FOFEM estimates. This difference stems from the different methodologies used, which cause wildfire emission inventories to differ from each other in magnitude [44]. However, the emission trends from both methods are similar, which suggests that the trends are reliable for our analysis.
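One way to quantify these two observations is to compare peak magnitudes and to correlate the daily series. The text does not name a trend-agreement metric, so the Pearson correlation used here is an assumption, and the function names are hypothetical. Pearson correlation is unaffected by a constant scale factor, which is why the trends can agree closely even when the magnitudes differ by more than a factor of two. The usage lines use only the peak values reported above, since the full daily series are not reproduced in the text.

```python
import math
from statistics import mean


def peak_ratio(finn_series, fofem_series):
    """Ratio of the FINN peak to the FOFEM peak daily emission."""
    return max(finn_series) / max(fofem_series)


def pearson_r(x, y):
    """Pearson correlation between two equally long daily series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Peak daily PM2.5 emissions reported above (ton/day).
print(round(peak_ratio([1731], [698]), 2))  # LNU: 2.48
print(round(peak_ratio([483], [202]), 2))   # SCU: 2.39
```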
It should be noted that neither method was able to estimate wildfire emissions on days with low fire activity, likely because the satellites could not detect weak fire activity. However, our FOFEM method tracked fire activity for longer than the FINN estimates did for all the fires.

Air Quality
Air quality levels during the study period were driven by a combination of emission sources and meteorological effects. During the study period, the dominant winds in the Valley blew from the northwest toward the southeast, with similar patterns at stations from north to south (Figure 6). Wind patterns were distinctly different at the higher-elevation stations: winds from the northeast and southwest dominated at the Sequoia National Park and Yosemite Village stations, and southeasterly and westerly winds dominated at San Andreas-Gold Strike Road. These higher-elevation stations are located in valley drainages and are affected by up- and downslope flows and other topographic influences from the Sierra Nevada mountains [45].
Figure 7 provides a summary of average air quality during the study period. The first major fires affecting the area were the SCU Lightning Complex and LNU Lightning Complex fires, which started on 17 and 18 August, respectively. We used the month before 17 August as the base period, representing concentration levels in the study area without any wildfire impact. This period is highlighted in green in Figure 7a, and its average concentration map, obtained by averaging all estimated concentrations in each grid cell, is shown in Figure 7b. The daily average area-wide concentration during this period was 7.7 ± µg/m³, with small variations. Figure 8 shows the daily percentage of the study area experiencing each AQI category during the study period. During this base period, most of the area experienced good AQI.
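The daily area shares in Figure 8 amount to classifying each 5 × 5 km grid cell by AQI category and counting cells. The sketch below assumes the U.S. EPA daily (24-h) PM2.5 AQI breakpoints in effect in 2020, before the 2024 revision, and treats grid cells as equal in area. The text does not state which breakpoints were used, and the function names are hypothetical.

```python
from collections import Counter

# Upper bounds (µg/m^3) of the daily PM2.5 AQI categories, per the U.S. EPA
# breakpoints in effect in 2020 (assumed here); above 250.4 is Hazardous.
AQI_BREAKPOINTS = [
    (12.0, "Good"),
    (35.4, "Moderate"),
    (55.4, "Unhealthy for Sensitive Groups"),
    (150.4, "Unhealthy"),
    (250.4, "Very Unhealthy"),
]


def aqi_category(pm25):
    """Map a 24-h PM2.5 concentration to its AQI category name."""
    c = int(pm25 * 10) / 10  # EPA truncates PM2.5 to one decimal place
    for upper, name in AQI_BREAKPOINTS:
        if c <= upper:
            return name
    return "Hazardous"


def area_share_by_category(grid_values):
    """Percentage of (equal-area) grid cells in each AQI category for one day."""
    counts = Counter(aqi_category(v) for v in grid_values)
    n = len(grid_values)
    return {name: 100.0 * k / n for name, k in counts.items()}
```

Under these breakpoints, the base-period average of 7.7 µg/m³ falls in the Good category, and an area-wide mean of 130 µg/m³ falls in the Unhealthy category even though individual cells may exceed the Hazardous threshold.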
Concentration levels started increasing substantially on 17 August as smoke from the wildfires to the northwest started affecting the region. Smoke was transported along the San Joaquin Valley by the dominant northwesterly winds (Figure 6), resulting in poor air quality in the region. The period from 17 August to 4 September is shown as the yellow section in Figure 7a, and its average concentration map is shown in Figure 7c. The interpolated maps showed that concentrations increased uniformly across the study area during this period, with an average concentration of 45 µg/m³, more than five times that of the pre-fire period. From 18 to 26 August, the majority of the study area experienced unhealthy AQI, with some areas experiencing very unhealthy conditions. This was a direct result of the LNU and SCU fires being most active during this period, especially from 18 to 20 August (Figure 5a,b). The highest daily PM2.5 concentration during this period occurred on 22 August, when the daily area-wide concentration reached 108 µg/m³. As emissions from the LNU and SCU Lightning Complex fires declined, concentrations decreased by the end of this period. However, with the start of the Creek fire on 4 September and the increasing activity of the Castle fire (Figure 5d), air quality deteriorated again.
The period from 4 September to 6 November, when emissions from the Castle and Creek fires decreased substantially, is highlighted in red in Figure 7a. During this period, the study area was mostly affected by the Castle and Creek fires, as shown in the average concentration map (Figure 7d). The LNU and SCU fires had small PM2.5 emissions during this period (Figure 5a,b) and minimally affected the air quality in the study area. The highest daily emission of the Creek fire was recorded on 7 September (Figure 5c), which matched the unhealthy and very unhealthy air quality conditions experienced during 7-21 September. The highest daily area-wide concentration of the study period occurred on 17 September, with an area-wide average of 130 µg/m³. On the same day, 19% of the study area experienced hazardous AQI levels, the highest share during the study period. From 29 September to 5 October, the area experienced unhealthy conditions. Notably, the areas experiencing hazardous AQI levels were concentrated near the Castle and Creek fires, similar to findings from previous studies in this area [12]. By the end of this period, the northern fires were contained.
As emissions from the Castle and Creek fires decreased, concentration levels in the study area subsided. The period from 6 November to the containment of the Creek fire on 31 December is highlighted in blue in Figure 7a, and the average concentration map is shown in Figure 7e. After 6 November, concentrations across the study area were reduced but remained elevated compared with the pre-fire period. Emissions from the Castle and Creek fires were not detectable by our emission estimation methods during this period, suggesting a negligible effect from these fires (Figure 5c,d). The interpolated concentration map for this period was consistent with the emission estimates, showing little to no contribution from the Creek and Castle fires and higher concentrations in the lower-elevation areas of the Valley (Figure 7e). These higher concentration levels are consistent with historically elevated PM2.5 concentrations and are caused by a lower boundary layer height and reduced ventilation combined with emission sources other than wildfires during this period [46].

Interpolation Performance and Sensitivity Analysis
Applying the interpolation algorithm with only the regulatory measurements resulted in an MAE of 7.0 µg/m³, an RMSE of 18.2 µg/m³, and an R² of 0.63 for the entire study period. Adding the low-cost sensors and applying the interpolation algorithm described in Section 2.4 reduced the MAE and RMSE to 5.2 µg/m³ and 14.2 µg/m³, respectively, and increased R² to 0.81. These results show that adding low-cost sensors, and thereby increasing the density of observations, substantially improved interpolation performance and yielded a more accurate assessment of concentrations in the study area.
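The three scores above can be computed from paired observed and predicted daily concentrations, as in the minimal sketch below. It assumes R² is the coefficient of determination (1 - SS_res/SS_tot); if the study instead reports the squared Pearson correlation, values can differ. The final lines reproduce the percentage reductions implied by the reported MAE and RMSE values.

```python
import math
from typing import Sequence, Tuple


def error_metrics(
    observed: Sequence[float], predicted: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (MAE, RMSE, R²) for paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot  # coefficient of determination (assumed definition)
    return mae, rmse, r2


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction of an error score, in percent."""
    return 100.0 * (before - after) / before


# Reported scores: regulatory-only vs. regulatory + corrected low-cost sensors.
print(f"MAE reduction:  {percent_reduction(7.0, 5.2):.1f}%")   # 25.7%
print(f"RMSE reduction: {percent_reduction(18.2, 14.2):.1f}%")  # 22.0%
```

These are the roughly 26% and 22% reductions cited in the Conclusions.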

Conclusions and Summary
In this study, we analyzed the emissions of wildfires between August and December 2020 and their effects on air quality in the San Joaquin Valley area of California. HYSPLIT back-trajectory analysis suggested that four major wildfires affected the study area during this period: the LNU Lightning Complex, SCU Lightning Complex, Creek, and Castle fires. We estimated daily PM2.5 emissions from these fires using two methods. The first combined satellite fire pixels with the First Order Fire Effects Model (FOFEM), while the second used the Fire Inventory from NCAR (FINN) database. The FOFEM estimates were less than half of the FINN estimates; however, the emission trends from the two methods were strongly correlated, suggesting that the temporal evolution of the wildfire emissions was assessed correctly. Daily emissions from the SCU and LNU Lightning Complex fires peaked at around the same time, between 18 and 20 August. The Creek fire's emissions peaked on 7 September, and the Castle fire's on 15 September.
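Agreement between two emission inventories of the kind compared above can be summarized with two numbers: the Pearson correlation of the daily series (agreement in trend) and the ratio of their totals (agreement in magnitude). The sketch below uses synthetic daily values chosen only for illustration; they are not the FOFEM or FINN estimates from this study, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def compare_inventories(
    est_a: Sequence[float], est_b: Sequence[float]
) -> Tuple[float, float]:
    """Return (correlation of daily series, ratio of total emissions a/b)."""
    return pearson_r(est_a, est_b), sum(est_a) / sum(est_b)


# Synthetic daily PM2.5 emissions (ton/day) for one fire; illustrative only.
finn_like = [100.0, 400.0, 900.0, 600.0, 200.0]
fofem_like = [45.0, 170.0, 390.0, 250.0, 95.0]
r, ratio = compare_inventories(fofem_like, finn_like)
print(f"r = {r:.3f}, FOFEM/FINN total ratio = {ratio:.2f}")
```

With these synthetic series, r is close to 1 while the total ratio is about 0.43: the same pattern of matching trends but different magnitudes that the two methods showed.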
Unlike the traditional approach of using only regulatory stations to analyze the air quality effects of wildfires, we investigated the effects of these PM2.5 emissions on air quality using a combination of regulatory stations and low-cost sensors in the area. First, an equation to correct the low-cost sensor data was developed from concurrent regulatory and sensor measurements in the study area. Then, we interpolated the daily averages from the regulatory stations and corrected sensors to produce 5 km resolution concentration maps of the study area. The interpolated maps showed that, with the start of the LNU and SCU fires, concentration levels in the San Joaquin Valley increased relatively uniformly as smoke from these fires was transported by the dominant northwesterly winds. The highest area-wide PM2.5 concentration, an average of 130 µg/m³, was observed on 17 September; it was caused by smoke from the Castle and Creek fires, and about one-fifth of the study area experienced hazardous PM2.5 levels.
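The specific correction equation is not restated in this section. As an assumption, the sketch below fits the simplest such equation, an ordinary least-squares line mapping daily sensor PM2.5 to concurrent regulatory PM2.5, and applies it to new sensor readings. Published low-cost sensor corrections often add terms such as relative humidity, which this sketch omits; all names and values are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_linear_correction(
    sensor: Sequence[float], reference: Sequence[float]
) -> Tuple[float, float]:
    """Ordinary least-squares fit of reference ≈ intercept + slope * sensor.

    `sensor` and `reference` are concurrent (co-located, same-day) PM2.5 means.
    """
    n = len(sensor)
    mx = sum(sensor) / n
    my = sum(reference) / n
    sxx = sum((x - mx) ** 2 for x in sensor)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sensor, reference))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def apply_correction(
    values: Sequence[float], slope: float, intercept: float
) -> List[float]:
    """Correct raw sensor values; negative results are clipped to zero."""
    return [max(0.0, intercept + slope * v) for v in values]


# Hypothetical co-located pairs (µg/m³): raw sensor vs. regulatory monitor.
raw = [10.0, 20.0, 40.0, 80.0]
ref = [7.0, 12.0, 22.0, 42.0]
slope, intercept = fit_linear_correction(raw, ref)
print(slope, intercept)                                    # 0.5 2.0
print(apply_correction([100.0, 160.0], slope, intercept))  # [52.0, 82.0]
```

Clipping at zero is a design choice of this sketch, since a fitted intercept can otherwise yield negative concentrations for very low raw readings.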
The concentration maps also showed that the Castle and Creek fires had their largest effects on air quality in their immediate vicinity, in contrast to the more uniform effects of the LNU and SCU Lightning Complex fires.
Toward the end of the study period (after the first week of November), although the Castle and Creek fires were still active, their emissions had declined, and concentration levels in the Valley were driven mostly by sources other than wildfires.
A sensitivity study and cross-validation analysis of the interpolation method were used to quantify how much adding sensor measurements to the regulatory ones improves the accuracy of the concentration estimates. Adding the low-cost sensor data reduced the mean absolute error and root mean squared error by about 26% and 22%, respectively, compared with using only the regulatory monitors, and the coefficient of determination improved from 0.63 to 0.81 in the cross-validation analysis. These results show the added benefit of the available low-cost sensors, combined with conventional regulatory measurements, for tracking wildfire-driven air quality events.
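A leave-one-out cross-validation of the kind described above withholds each regulatory monitor in turn, predicts its value from the remaining data, and scores the predictions. The sketch below uses inverse distance weighting (power 2) on projected coordinates in km purely as a stand-in interpolator; it is not necessarily the algorithm of Section 2.4, and the station values and function names are hypothetical. Scoring only at regulatory monitors keeps the reference data identical between the two configurations being compared.

```python
import math
from typing import Sequence, Tuple

Station = Tuple[float, float, float]  # (x_km, y_km, daily PM2.5 in µg/m³)


def idw(x: float, y: float, stations: Sequence[Station], power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate at (x, y) from the given stations."""
    num = den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # exact hit on a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def loocv_mae(targets: Sequence[Station], extra: Sequence[Station] = ()) -> float:
    """Leave-one-out MAE at the `targets` (regulatory monitors).

    Each target is withheld in turn and predicted from the other targets plus
    `extra` (e.g., corrected low-cost sensors).
    """
    errors = []
    for i, (tx, ty, tv) in enumerate(targets):
        others = [s for j, s in enumerate(targets) if j != i] + list(extra)
        errors.append(abs(idw(tx, ty, others) - tv))
    return sum(errors) / len(errors)


# Hypothetical monitors and corrected sensors along a transect.
regulatory = [(0.0, 0.0, 20.0), (10.0, 0.0, 60.0), (20.0, 0.0, 100.0)]
sensors = [(1.0, 0.0, 22.0), (19.0, 0.0, 98.0)]
print(f"regulatory only: MAE = {loocv_mae(regulatory):.1f}")  # 32.0
print(f"with sensors:    MAE = {loocv_mae(regulatory, sensors):.1f}")
```

In this toy case the sensors sit close to the withheld monitors, so the MAE drops sharply; the size of the improvement in real data depends on sensor density and placement.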
This work can help create a more accurate picture of wildfire pollution in the San Joaquin Valley, where wildland fires are expected to grow in severity and frequency. As the sensor network in the region expands, methods similar to ours can be expected to improve air quality assessment further. Future work may explore combining local- or regional-scale air quality models with the current interpolation method to further increase the accuracy of the concentration maps for the San Joaquin Valley area. In addition, the methods developed here can help in understanding long-term trends in the impact of wildfires on air quality in the San Joaquin Valley.
Data Availability Statement: Raw data used for this study are publicly available. Please refer to Section 2.2 and for further information.