Description and Evaluation of the Fine Particulate Matter Forecasts in the NCAR Regional Air Quality Forecasting System

This paper describes a quasi-operational regional air quality forecasting system for the contiguous United States (CONUS) developed at the National Center for Atmospheric Research (NCAR) to support air quality decision-making, field campaign planning, early identification of model errors and biases, and atmospheric science research. This system aims to complement the operational air quality forecasts produced by the National Oceanic and Atmospheric Administration (NOAA), not to replace them. A publicly available information dissemination system has been established that displays various air quality products, including a near-real-time evaluation of the model forecasts. Here, we report the performance of our air quality forecasting system in simulating meteorology and fine particulate matter (PM2.5) for the first year after the system started, i.e., 1 June 2019 to 31 May 2020. Our system shows excellent skill in capturing hourly to daily variations in temperature, surface pressure, relative humidity, water vapor mixing ratios, and wind direction but shows relatively larger errors in wind speed. The model also captures the seasonal cycle of surface PM2.5 very well in different regions and for different types of sites (urban, suburban, and rural) in the CONUS, with a mean bias smaller than 1 µg m−3. The skill of the air quality forecasts remains fairly stable between the first and second days of the forecasts. Our air quality forecast products are publicly available on an NCAR webpage. We invite the community to use our forecasting products in their research, for example as input for urban-scale (<4 km) air quality forecasts or for the co-development of customized products.


Introduction
Exposure to air pollutants can lead to premature deaths through respiratory or cardiovascular diseases not only at high levels but also at levels below the United States (US) Environmental Protection Agency (EPA) defined National Ambient Air Quality Standards (NAAQS) [1][2][3]. Poor air quality was estimated to have caused about 160,000 premature deaths in the US in 2010, with a total economic loss of about $175 billion [4]. To help the public reduce their exposure to air pollution, air quality managers across the US use air quality forecasts along with other sources of information to warn the public of forthcoming air pollution episodes.
The National Center for Atmospheric Research (NCAR) has developed a regional air quality forecasting system in collaboration with the National Aeronautics and Space Administration (NASA) and the Colorado Department of Public Health and Environment (CDPHE) that provides 48 h air quality forecasts of ozone, fine particulate matter (PM2.5), and related species over the contiguous US (CONUS). The NCAR system is not intended to replace the operational air quality forecasts produced by the National Air Quality Forecasting Capability (NAQFC) at the National Oceanic and Atmospheric Administration (NOAA)/National Centers for Environmental Prediction (NCEP) [5] but aims to provide an additional piece of information for air quality management. For instance, if both the

The Model Configuration
The architecture of the NCAR regional air quality forecasting system for the CONUS is depicted in Figure 1. We used the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem) [7,8] as the air quality model. The model domain is defined on a Lambert conformal projection with a horizontal grid spacing of 12 km and 390 × 230 × 43 grid points in the x, y, and z directions. The model top is located at 50 hPa, and the average thickness of the lowest model layer is 7.7 m. Eleven model levels are located between the surface and 1 km altitude. We used a meteorological time step of 15 s and a chemistry time step of 2 min. The meteorological initial and boundary conditions were based on the 00 UTC cycle of the NCEP Global Forecast System (GFS). The GFS output is available every 3 h at a spatial resolution of 0.5° × 0.5° and can be downloaded from the NCEP GFS website. The GFS output is mapped to the WRF-Chem domain using the WRF Preprocessing System (WPS). The chemical initial and boundary conditions were based on the daily 10-day global atmospheric composition forecast conducted at NCAR with the Whole Atmosphere Community Climate Model (WACCM [9]).
Our forecasting system reads in pre-calculated monthly averaged hourly anthropogenic emissions of trace gases and aerosols, which are based on the EPA 2014 v2 National Emissions Inventory (NEI). Near-real-time (NRT) biomass burning emissions are obtained from the Fire Inventory from NCAR (FINN) version 1 [10] and are distributed vertically online within the model using a plume rise parameterization developed by Freitas et al. [11]. This parameterization selects fire properties appropriate for the land use in every grid box that contains fire emissions and simulates the plume rise explicitly using the environmental conditions simulated by WRF. The derived height of the plume is used as the effective injection height.
Once the injection height is estimated, fire emissions in the flaming phase are distributed equally in the vertical grid between the surface and the estimated injection height. The online plume rise module expects the input emissions in the smoldering phase, whereas FINN emissions include both smoldering and flaming emissions; therefore, a scaling factor is applied to the FINN input emissions within the plume rise model. Within the model, fire emissions are assumed to have a diurnal variation with a daytime peak. We employed the persistence fire emission assumption, i.e., the most recent NRT fire emissions are repeated for each day of the forecast cycle, which is commonly done in operational air quality forecasts [12]. FINN NRT fire emissions are available with a latency of one day, i.e., the forecast cycle starting on 1 May 2019 used the fire emissions from 30 April 2019 for both 1 and 2 May 2019.
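As an illustration of the vertical placement just described, the following sketch distributes a flaming-phase emission total evenly (per metre of depth) between the surface and the injection height. This is a simplified stand-in for the WRF-Chem plume-rise code, and the layer interfaces and units are assumptions for illustration only.

```python
import numpy as np

def distribute_fire_emissions(total_emis, z_interfaces, z_inj):
    """Distribute a flaming-phase fire emission total evenly between the
    surface and the injection height z_inj [m].

    z_interfaces: model layer edges [m], surface first.
    Returns the emission assigned to each layer; fractions sum to 1."""
    z = np.asarray(z_interfaces, dtype=float)
    # Clip each layer to the part of the column below the injection height
    top = np.minimum(z[1:], z_inj)
    bottom = np.minimum(z[:-1], z_inj)
    frac = np.clip(top - bottom, 0.0, None) / z_inj
    return total_emis * frac
```

For example, with layer edges at 0, 100, 300, 600, and 1000 m and an injection height of 600 m, the three lowest layers receive 1/6, 1/3, and 1/2 of the emissions and the top layer receives none.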
Biogenic emissions of volatile organic compounds and soil NOx are calculated online within the model using the Model of Emissions of Gases and Aerosols from Nature (MEGAN) [13,14]. MEGAN v2.04 implemented in WRF-Chem uses gridded emission factors based on global datasets of four plant functional types, namely broadleaf trees, needleleaf trees, shrubs/bushes, and herbs/crops/grasses. MEGAN defines the emission factor as the net flux of a compound into the atmosphere, and thus, MEGAN emissions from croplands include emissions from both the soils and plants. However, MEGAN v2.04 does not consider the pulses of NOx emissions that follow fertilization. The standard emission factors for nitrogen compounds (NO, NO2, and NH3) in these four categories are 5, 6, 30, and 70 µg m−2 hr−1, respectively, meaning that the highest emissions of nitrogen compounds occur from the herbs/crops/grasses category. Emissions of dust and sea-salt aerosols are also calculated online within the model using wind-speed-based parameterizations [15,16]. Other input datasets supplied to the model include land use and vegetation information for the biogenic emission and dry deposition calculations and ozone and oxygen column densities for the photolysis frequency estimation.
The gas-phase tropospheric ozone photochemistry is modeled using the Model for Ozone and Related Tracers-4 (MOZART-4) chemical mechanism [17]. The Goddard Global Ozone Chemistry Aerosol Radiation and Transport (GOCART) model is used to represent aerosol processes. GOCART simulates five major tropospheric aerosol types, namely sulfate, organic carbon (OC), black carbon (BC), dust, and sea-salt [15,18,19]. GOCART uses a bulk approach (i.e., it simulates only mass concentrations) for BC, OC, and sulfate and tracks both the size distribution and mass concentrations of dust and sea-salt aerosols. BC and OC are emitted in the hydrophobic mode but age to the hydrophilic mode with an e-folding lifetime of 2.5 days. Both dry and wet deposition remove aerosol particles from the atmosphere, with the exception of hydrophobic BC and OC, which are not affected by dry deposition. The GOCART aerosol model does not simulate nitrate or secondary organic aerosols (SOA), which may lead to an underestimation of simulated PM2.5 mass concentrations. WRF-Chem includes other aerosol schemes that simulate both nitrate aerosols and SOA, but their high computational cost prevents their use in daily forecasting. In addition to the standard model setup, we also included six source-specific CO tracers and eight inert tracers to determine the influence of fires at a given location. These tracers were primarily used to support flight planning during the FIREX-AQ campaign and were not analyzed in this study. Other physical and chemical parameterizations used in the model configuration are listed in Table 1. The air quality forecasting system starts every day at 2 a.m. Mountain Time (MT) with the download and preprocessing of the GFS, WACCM, and FINN datasets, which takes about 15 min. A dedicated queue is allocated on NCAR's supercomputer to run the forecast, and 48 h air quality forecasts are normally available by 5 a.m. MT.
Table 1. Parameterizations used for selected atmospheric processes in the NCAR regional air quality forecasting system.
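The hydrophobic-to-hydrophilic aging of BC and OC described above, with a 2.5-day e-folding lifetime, amounts to a first-order transfer between the two modes. The following sketch illustrates the idea; it is not the GOCART source code.

```python
import math

TAU_AGING_S = 2.5 * 86400.0  # 2.5-day e-folding aging timescale (GOCART)

def age_carbonaceous(m_phobic, m_philic, dt):
    """Transfer hydrophobic BC/OC mass to the hydrophilic mode with
    first-order (exponential) aging over one time step dt [s].
    Returns the updated (hydrophobic, hydrophilic) masses."""
    transferred = m_phobic * (1.0 - math.exp(-dt / TAU_AGING_S))
    return m_phobic - transferred, m_philic + transferred
```

After one e-folding time (2.5 days), about 63% of an initially hydrophobic puff has aged to the hydrophilic mode and become subject to both dry and wet deposition.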

Challenges and Mitigation Strategies
Daily air quality forecasting poses several technical challenges that must be addressed for the automated production of air quality forecasts. First, a delay in the satellite-retrieved fire locations can prevent the generation of NRT fire emissions; in such cases, our system uses the fire emissions from the previous day. Second, an upgrade of the GFS and/or WACCM models can interrupt the input meteorology and chemistry streams and cause a failure of the air quality forecast. Such instances are rare, however, and so far we have had to deal with only one, the GFS forecast system upgrade in June 2019. Third, NCAR's supercomputer "Cheyenne" may occasionally be unavailable for 1-7 days for routine maintenance or because of unexpected outages, making the air quality forecasts unavailable for those days. Automated scripts have been developed to handle Cheyenne outages and backfill the missed days; in general, we are able to fill a 7-day gap in 1-2 days. Fourth, saving the full three-dimensional WRF-Chem output every hour poses a storage challenge. We addressed this challenge by storing only the surface concentrations of the most important air pollutants and the three-dimensional distributions of a few variables every hour, and the full three-dimensional WRF-Chem output every six hours. A full list of the variables included in the hourly output can be seen in Table 2 of the web page "WRF-Chem Tracers for FIREX-AQ" [28].

Information Dissemination System
WRF-Chem forecasts are post-processed to generate many products that help fulfill the needs of the research community as well as air quality forecasters from the CDPHE. These products are displayed on the website "WRF-Chem Forecast maps" [29]. Figure 2 shows a snapshot of the information dissemination system. Users can display the spatial distribution of ozone, PM2.5, NOx, CO, CO source tracers, inert fire tracers, aerosol optical depth (AOD) at 550 nm, and key meteorological variables (planetary boundary layer height, downwelling solar radiation, relative humidity, and ventilation rate) for the CONUS, Colorado, and the Colorado Front Range (a transition zone between the Great Plains and the Rocky Mountains in the central part of Colorado) at the surface and at 3 km, 5 km, and 8 km altitudes. Users can switch between the regions (red arrow in Figure 2), altitudes (blue arrow), and chemical species (green arrow) from the drop-down menus. We also display NRT observations of ozone and PM2.5 from the EPA AirNOW network for a qualitative NRT evaluation of our forecasts. Figure 2 shows an example of the comparison for PM2.5 for 30 September 2020 12 UTC over the CONUS. Both the model and observations show high PM2.5 along the western coast because of the California wildfires, but the model appeared to underestimate the observed PM2.5 levels at this time except in Minnesota, where the model slightly overestimated the observations.
The website also hosts an access panel with links to evaluation and additional visualization, comparison with the TOLNET stations, and comparison with satellite retrievals (Figure 3). Under evaluation and additional visualization, users can access average statistics, hourly maps, Google Earth files (Keyhole Markup language Zipped (KMZ)) containing our forecasts, and forecasts at TOLNET sites. Average statistics include plots comparing the observed and forecasted time series of PM2.5 and ozone averaged over the CONUS, 10 EPA regions, Colorado, and the Front Range (see evaluation and visualization example in Figure 3). Maps for the CONUS showing the correlation coefficient (r), mean bias (MB), and root mean squared error (RMSE) at each site for a given forecast cycle are also displayed. Hourly maps comparing the model forecast with NRT observations are shown in Figure 2.
Under the TOLNET link, we display the vertical profiles of ozone, CO, CO tracers, relative humidity, 300 nm backscatter coefficient, and 300 nm extinction coefficient for the full 48 h forecast cycle at all the TOLNET sites, i.e., Boulder, Greenbelt, Hampton, Huntsville, and Wrightwood (see


Observations
Every day, we download hourly surface PM2.5 concentrations from 1136 EPA AirNOW sites for the NRT evaluation of our forecasts, as presented in Figure 2. These observations are downloaded with a 2-day lag so that a full forecast cycle can be validated. However, many of these sites have missing data, and for the annual evaluation, we include only those sites that have valid data for at least 50% of the days in a month, where each day must have valid data for at least 18 h to be considered a valid day. This requirement reduces the number of sites from 1136 to 612, of which 560 are located in the US. A map of the total sites and the sites used in this analysis is shown in the Supplementary Material (Figure S1). In addition, we use in situ observations of 2 m temperature and relative humidity, surface pressure, and 10 m wind speed and direction for the evaluation of the meteorological parameters simulated by WRF-Chem. The meteorological data used for this study are from the Aviation Routine Weather Report (METAR) network and are distributed by NCEP's Meteorological Assimilation Data Ingest System (MADIS). METAR is a standard record for reporting meteorological data in the US as part of a surface weather observation program. The METAR data have four different Quality Control (QC) levels, and we used level-3 QC-controlled data. Even at QC = 3, individual reporting sites have substantial missing data. As for the PM2.5 observations, we only include those sites that have valid data for at least 50% of the days in a month, where each day must have valid data for at least 18 h to be considered a valid day. The total number of METAR sites that reported data during the study period was 1290, but the data validity requirement reduces the number of sites to 525, 513, 459, 516, and 516 for 2 m temperature, relative humidity, surface pressure, and 10 m wind speed and direction, respectively.
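The site-screening rule described above (a valid day needs at least 18 hourly values; a site is kept if at least 50% of the days in the month are valid) can be sketched as follows. The array layout is an assumption for illustration.

```python
import numpy as np

def valid_sites(hourly, min_hours=18, min_day_frac=0.5):
    """Screen observation sites as described in the text.

    hourly: array of shape (n_sites, n_days, 24) with NaN for missing
    hours. A day is valid if it has >= min_hours values; a site is kept
    if the fraction of valid days is >= min_day_frac.
    Returns a boolean mask over sites."""
    hours_per_day = np.sum(~np.isnan(hourly), axis=2)  # (n_sites, n_days)
    valid_days = hours_per_day >= min_hours
    return valid_days.mean(axis=1) >= min_day_frac
```

Applying this mask to both the PM2.5 and METAR records before computing statistics avoids biasing the evaluation toward sites with sparse reporting.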

Results and Discussion
The model output is paired with the in situ observations of near-surface temperature, pressure, relative humidity, wind speed, wind direction, and PM2.5 mass concentrations through bilinear interpolation to the observation site locations. The evaluation is performed in 10 regions defined by the EPA (see Supplementary Material, Figure S2, for the region definitions) to understand the regional variability in model performance. Model performance is evaluated separately for the first and second days of the forecast to assess variations in the model performance as a function of lead time. Figure 4 shows an evaluation of the modeled hourly and daily averaged 2 m temperature against the observations in the 10 EPA regions from June 2019 to May 2020 using paired data from the first day of each forecast cycle. The model reproduced the observed seasonal cycle of 2 m temperature in all the regions very well. The model also captured the regional variability in 2 m temperature, with a lower amplitude of the seasonal cycle in the southern parts of the domain, i.e., R4, R6, and R9. The correlation coefficient between the model and observations in all the regions exceeded 0.99, and the mean bias was within ±1.6 K. We observed slightly larger differences between the model and observations in 2020 compared to 2019. This may be because of the COVID-19-induced changes in anthropogenic emissions, which were not considered in our forecasting setup and might feed back on the meteorology.
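The model-observation pairing relies on standard bilinear interpolation of the gridded fields to each site location. A minimal sketch, assuming the site's longitude/latitude has already been mapped to fractional grid coordinates (x, y) on the model's Lambert projection:

```python
import numpy as np

def bilinear(field, x, y):
    """Bilinearly interpolate a 2-D model field to a fractional grid
    location (x, y). Valid for interior points (0 <= x < nx-1,
    0 <= y < ny-1); an illustrative sketch of the pairing step."""
    i, j = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - i, y - j  # fractional offsets within the grid cell
    return ((1 - fx) * (1 - fy) * field[j, i]
            + fx * (1 - fy) * field[j, i + 1]
            + (1 - fx) * fy * field[j + 1, i]
            + fx * fy * field[j + 1, i + 1])
```

Interpolating to the site location, rather than taking the nearest grid cell, avoids introducing artificial steps when a site sits near a grid-cell boundary.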

Meteorological Evaluation
The model also showed an excellent ability to reproduce the hourly and daily averaged variations in surface pressure in the regions R1-R7, with correlation coefficients exceeding 0.98 and mean biases smaller than 3 hPa in magnitude (Figure 5). However, the model showed poorer performance in the regions R8-R10, which include the Rocky Mountains. The model cannot resolve all the topographical features of the Rocky Mountains at a grid spacing of 12 km × 12 km, and the underestimation of surface pressure by the model in R8-R10 reflects the smoother topography used by the model. The mean bias in R9 and R10 reached 7.5 hPa in magnitude.
The model reproduced the observed seasonal cycle of relative humidity in all the regions with correlation coefficients exceeding 0.88 (Figure 6). The model showed the largest errors in relative humidity (RH) in the R1, R2, and R5 regions. Interestingly, the model did not show large errors in the regions R9 and R10, where the model showed the largest errors in surface pressure. To analyze the effects of the bias seen in the WRF-simulated surface pressure (PSFC) in R8-R10, we analyzed the sensitivity of RH to variations in PSFC. WRF RH is derived from the water vapor mixing ratio at 2 m (w), PSFC, and the temperature at 2 m (T2) using the following equation from [30]:
RH = (PSFC × w) / ((0.622 + w) × es(T2)) × 100 (%)
where es(T2) is the saturation vapor pressure at T2 and is derived using Equations (2) and (3) of [30]. At a constant w, the biases in PSFC and T2 determine the error in RH. The maximum mean bias magnitudes in PSFC and T2 across all the regions were estimated as 7.5 hPa and 1.6 °C, respectively.
To assess the impact of these biases on the RH calculation, we also calculated RH after adding ±3.5 hPa and ±0.8 °C to PSFC and T2, respectively. The changes in RH due to the surface pressure perturbations were estimated at 0.6%, 0.4%, and 0.6% for the regions R8, R9, and R10, respectively. In contrast, the temperature perturbations caused larger changes in RH of −7.6%, −5.5%, and −7.7% for the regions R8, R9, and R10, respectively. Thus, the RH calculation is more sensitive to the temperature bias than to the pressure bias, which is why the modeled and observed RH agreed reasonably well in regions R8-R10 despite the bias in surface pressure.
Since the water vapor mixing ratio (w) also determines RH, we calculated w from the dew point temperature (Td) and PSFC using
w = 0.622 × e(Td) / (PSFC − e(Td))
where e(Td) is the vapor pressure and is estimated using Equations (5) and (6) provided by [30]. The comparison of the seasonal cycles of the WRF and METAR-derived w for the day-1 forecasts is shown in Figure 7. The model captured the seasonal cycle of w very well in all the regions, as indicated by r values exceeding 0.97 and mean biases smaller than 0.57 g/kg. The good agreement between the WRF and METAR water vapor mixing ratios and the higher sensitivity to temperature errors indicate that the larger RH biases in the latter half of the study period in R1-R3 and R5 were likely a result of larger errors in temperature. To confirm this, we estimated RH using WRF PSFC, WRF w, and the observed METAR T2. This revised RH, estimated by eliminating the temperature bias, showed a lower mean bias than the WRF RH derived from WRF T2 (Figure S3).
For an appropriate evaluation of the modeled 10 m wind speed, we replaced all instances of calm winds (wind speed less than 1 knot) in the model by zero, as is done in the METAR dataset. The model captured the observed seasonal changes in 10 m wind speed in all the regions but generally overestimated the wind speed by 0.64-1.06 m/s on average (Figure 8).
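The humidity relations used above can be sketched as follows. The Magnus-type saturation vapor pressure formula here is a standard stand-in for Equations (2)-(3) and (5)-(6) of [30], which are not reproduced in this paper, so treat the specific coefficients as illustrative assumptions.

```python
import math

def es_hpa(t_c):
    """Saturation vapor pressure [hPa] from temperature [deg C]
    (Magnus-type approximation; stands in for Eqs. (2)-(3) of [30])."""
    return 6.112 * math.exp(17.67 * t_c / (t_c + 243.5))

def rh_percent(w, psfc_hpa, t2_c):
    """RH [%] from mixing ratio w [kg/kg], surface pressure [hPa],
    and 2 m temperature [deg C], as in the equation in the text."""
    return (psfc_hpa * w) / ((0.622 + w) * es_hpa(t2_c)) * 100.0

def mixing_ratio(td_c, psfc_hpa):
    """Mixing ratio [kg/kg] from dew point [deg C] and surface
    pressure [hPa]: w = 0.622 e(Td) / (PSFC - e(Td))."""
    e = es_hpa(td_c)
    return 0.622 * e / (psfc_hpa - e)
```

As a consistency check, when the dew point equals the temperature these formulas return RH = 100% exactly, and perturbing the pressure or temperature inputs reproduces the kind of sensitivity analysis described above.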
The correlation coefficients for 10 m wind speed were lower than those for temperature, pressure, and relative humidity, indicating a slightly poorer model performance for winds. The WRF model is well known to overpredict 10 m wind speed at low to moderate wind speeds in all available planetary boundary layer (PBL) schemes [31]. This shortcoming of the model has been attributed partly to topographical features unresolved by the default surface drag parameterization, which in turn influences the surface drag and friction velocity, and partly to the coarse horizontal and vertical resolutions of the domain [32]. Mass and Ovens [31] also found this overestimation at a horizontal grid spacing of 4 km × 4 km but noticed an improvement at a 1.3 km × 1.3 km grid spacing. In contrast to wind speed, the model captured the wind direction in all the regions very well, with correlation coefficients exceeding 0.77 and mean biases within 11° (Figure 9). The model performance for the second-day forecasts is presented in the Supplementary Material (Figures S4-S9) and was very similar to that for the first-day forecasts.
The ranges of the statistical parameters for the first- and second-day forecasts of all the meteorological parameters are shown in Table 2. These statistical metrics did not change significantly from the first to the second day of forecasting, except for wind speed, where we saw large differences in the correlation coefficient between the first and second days of the forecast. These results suggest that the meteorological accuracy of our forecasting system does not change significantly with lead time.

Surface PM 2.5 Evaluation
The variations in hourly and daily averaged WRF-Chem forecasted surface PM2.5 mass concentrations were evaluated against the EPA AirNOW observations from 1 June 2019 to 31 May 2020 (see Figure 10 for the first-day forecasts and Figure S10 for the second-day forecasts). In most of the EPA regions, both the model and observations showed the highest surface PM2.5 during the winter season. Higher wintertime PM2.5 mass concentrations can be attributed to two factors. First, the seasonal cycle of anthropogenic emissions (see Figure S11) shows that OC emissions peaked in winter in almost all the regions, likely because of residential wood burning. Second, the boundary layer is shallower during winter, so the emissions are trapped in a smaller volume near the surface. In contrast, a deeper boundary layer during summer, combined with the injection of fire emissions into the free troposphere, leads to lower surface PM2.5 mass concentrations during summer. Forest fires generally play an important role, but 2019 was a year of low fire activity. To understand the relative roles of anthropogenic and fire emissions, we also analyzed the time series of the carbon monoxide (CO) tracers implemented in our forecasts to track CO emitted from anthropogenic (CO (Anth)) and fire (CO (Fire)) emissions. The seasonal cycles of CO (Anth) and CO (Fire) are shown in Figure S12. CO (Fire) was always less than CO (Anth), indicating that fire emissions overall did not play a dominant role in controlling air quality during our study period.
WRF-Chem showed a moderate ability to capture the seasonal cycle of surface PM2.5 mass concentrations in the R1-R3 and R5-R7 regions, with correlation coefficients exceeding 0.58, but performed poorly in the remaining regions, i.e., R4 and R8-R10. The annual mean bias in the different regions was within ±2 µg m−3, with the largest bias in the R8 and R9 regions. However, the daily mean bias occasionally exceeded 20 µg m−3 in some of the regions. The underestimation of PM2.5 by the model can be partly attributed to the use of the MOZCART chemical mechanism in WRF-Chem, which is linked to the GOCART aerosol scheme. The GOCART scheme does not include nitrate or secondary organic aerosols and thus misses some of the aerosol components. In addition, wildfire aerosol emissions in FINN v1 are known to be biased low and are expected to contribute to the low model results as well. To understand the model performance in different environments, we also evaluated the model's ability to simulate PM2.5 at urban (Figure S13), suburban (Figure S14), and rural (Figure S15) site types. The model performance at the urban, suburban, and rural sites was overall similar to the performance at all sites. The largest mean bias for all types of sites was found in regions R8 and R9: the urban sites showed the largest mean bias in region R8, whereas the suburban and rural sites showed the largest mean bias in region R9. The ranges of the statistical scores across the 10 EPA regions for the PM2.5 performance at all sites and at the urban, suburban, and rural sites for the day-1 and day-2 forecasts are also shown in Table 2. The statistical scores for all types of sites did not change significantly from the first to the second day of the forecast, indicating that the model's performance remained fairly constant over the forecast duration.
To provide further insight into the hourly performance of our forecasts, we compared the time series of forecasted and observed PM2.5 mass concentrations on a 48 h time scale, obtained by averaging paired data from all the sites in each EPA region (Figure 11). In all the regions, our forecasts captured the higher nighttime and lower daytime concentrations of PM2.5. This diurnal variability in PM2.5 mass concentrations results from a combination of the diurnal variations in emissions and boundary layer dynamics. The hourly standard deviation (not shown in Figure 11) of the averaged observed values (3.73-10.09 µg/m3) was slightly higher than that of the model (3.36-8.94 µg/m3). The difference between the model and observations slightly increased with lead time in regions R1, R4, R5, and R8, whereas it either stayed constant or slightly decreased in the other regions.
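As an illustration of how such a 48 h composite can be constructed, the sketch below averages paired model-observation samples by forecast lead hour. The data are synthetic, and the variable names (`lead_hour`, `pm25_model`, `pm25_obs`) are hypothetical stand-ins for the paired WRF-Chem/AirNOW data, not names from our processing chain.

```python
import numpy as np

# Synthetic paired hourly samples: each entry is one (site, forecast start
# date, lead hour) pairing of forecasted and observed PM2.5 (ug/m3).
rng = np.random.default_rng(0)
n = 5000
lead_hour = rng.integers(0, 48, n)      # 0..47 h into the 48 h forecast
pm25_model = rng.gamma(4.0, 2.0, n)     # stand-in for WRF-Chem values
pm25_obs = rng.gamma(4.0, 2.2, n)       # stand-in for AirNOW values

# 48 h composite: average all paired samples at each forecast lead hour,
# pooling every site and forecast start date within a region.
model_composite = np.array([pm25_model[lead_hour == h].mean() for h in range(48)])
obs_composite = np.array([pm25_obs[lead_hour == h].mean() for h in range(48)])

# Spread of the hourly composite values, analogous to the hourly standard
# deviations quoted in the text.
model_sd = model_composite.std()
obs_sd = obs_composite.std()
```

The same pooling is repeated per EPA region to produce one 48 h curve per region, so day/night differences and any drift with lead time are visible directly.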
We also analyzed the daily variations in the Pearson's correlation coefficient (r), mean bias (MB), and root mean squared error (RMSE). These statistical metrics were calculated from the model-observation pairs available at all the sites over the 48 h forecast period every day in each EPA region. Daily variations in r, MB, and RMSE in each EPA region are shown in Figures S16-S18, respectively. We noticed large day-to-day variability in all the statistical parameters, with r, MB, and RMSE values ranging from −0.43 to 0.88, −11.97 to 16.05 µg/m3, and 2.13 to 36.06 µg/m3, respectively. The highest r values were seen in R1-R3 (i.e., the eastern US), and the lowest r values were seen in R8-R10 (i.e., the western US). The mean bias changed from mostly negative in summer to mostly positive in winter and early spring, particularly in regions R1-R4. The mean bias was mostly within ±5 µg/m3 in regions R5-R7 and was mostly negative throughout the year in R8 and R9. The mean bias fluctuated more rapidly between negative and positive values in R10. The RMSE increased during winter in R1, R2, and R8-R10, while the other regions showed a relatively constant RMSE throughout the year with some occasional increases.
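The three metrics are standard and can be computed from paired arrays in a few lines; a minimal sketch (the function name is ours, not from the forecasting system):

```python
import numpy as np

def pairwise_scores(model, obs):
    """Pearson r, mean bias (model minus observation), and RMSE
    from paired model-observation values."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(model, obs)[0, 1]            # Pearson correlation coefficient
    mb = np.mean(model - obs)                    # mean bias
    rmse = np.sqrt(np.mean((model - obs) ** 2))  # root mean squared error
    return r, mb, rmse

# Toy example with four paired values
r, mb, rmse = pairwise_scores([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 5.0])
```

Applying such a function to each day's pool of model-observation pairs in a region yields the daily time series of r, MB, and RMSE discussed above.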
We also attempted to analyze the daily variations in these statistical metrics for high aerosol loading events, characterized by elevated daily average PM2.5 mass concentrations, but the very limited number of such events prevented us from investigating the daily regional variability in these metrics during such events.
These seasonal changes in the model performance can be attributed to several factors: uncertainties in the seasonal cycle and magnitude of the emissions; contributions of nitrate and SOA, which change across regions and seasons but are not simulated by the GOCART scheme; errors in the meteorological and chemical initialization of the model; errors in physical parameterizations that can lead to errors in PBL dynamics, chemical kinetics, and deposition; and the inability of the model to resolve complex topographical features. For example, recent studies have shown that an average change of about 1.5 µg/m3 in the initial conditions can change 48 h PM2.5 forecasts by 0.5-3 µg/m3 at different lead times in the US [33,34]. The horizontal grid spacing has also been shown to affect model simulations, particularly in urban, coastal, and complex terrain regions [35]. Uncertainties in dry deposition parameterizations are estimated to introduce an uncertainty of up to 40% in anthropogenic and 52% in biogenic organic aerosols over the CONUS in the WRF-Chem model [36,37].
To understand which chemical species contribute the most to PM2.5 mass concentrations in the different regions, we analyzed the mass concentrations of black carbon (BC), organic carbon (OC), dust, sea salt, sulfate, and other PM2.5. For the GOCART model, the PM2.5 mass concentrations were calculated using the following equation:

PM2.5 = BC1 + BC2 + 1.8 × (OC1 + OC2) + DUST1 + 0.286 × DUST2 + SEAS1 + 0.942 × SEAS2 + 1.375 × SULF + P25

where BC1 and BC2 represent hydrophobic and hydrophilic BC, respectively; OC1 and OC2 represent hydrophobic and hydrophilic OC, respectively; P25 represents non-speciated primary PM2.5; DUST1 and DUST2 represent dust from the first and second bins, corresponding to effective radii of 0.73 µm and 1.4 µm, respectively; SEAS1 and SEAS2 represent sea salt from the first and second bins, corresponding to effective radii of 0.3 µm and 1.0 µm, respectively; and SULF represents sulfate. SULF was multiplied by 1.375, the ratio of the molecular weight of ammonium sulfate (132.14 g/mol) to that of sulfate (96.06 g/mol). Since GOCART assumes that sulfate aerosols are present as ammonium sulfate, this factor adds the missing ammonium mass needed to neutralize the sulfate. (OC1 + OC2) was multiplied by 1.8 to convert organic carbon to organic matter. The values of all the aerosol chemical components contributing to the PM2.5 mass concentration at the EPA AirNOW sites were extracted from the six-hourly model output. The variations in daily averaged BC, OC, dust, sea salt, P25, and sulfate in the 10 EPA regions are presented in Figure 12. OC was the dominant contributor to the PM2.5 mass concentration in regions R1-R7, with sulfate and sea salt being the next two most important contributors. Dust also appeared as an important contributor to total PM2.5 in R6 in June 2019.
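The bookkeeping in this equation can be expressed directly in code; the sketch below follows the equation term by term (the function and argument names are ours), with all species assumed to be in consistent mass-concentration units (e.g., µg/m3):

```python
def gocart_pm25(bc1, bc2, oc1, oc2, dust1, dust2, seas1, seas2, sulf, p25):
    """Diagnose PM2.5 from GOCART aerosol components: 1.8 converts OC to
    organic matter, 1.375 adds the ammonium mass to sulfate, and 0.286 /
    0.942 take the PM2.5 fractions of the second dust and sea-salt bins."""
    return (bc1 + bc2                    # hydrophobic + hydrophilic BC
            + 1.8 * (oc1 + oc2)          # OC converted to organic matter
            + dust1 + 0.286 * dust2      # dust bins 1 and 2
            + seas1 + 0.942 * seas2      # sea-salt bins 1 and 2
            + 1.375 * sulf               # sulfate as ammonium sulfate
            + p25)                       # non-speciated primary PM2.5

# With unit mass in every component, the diagnosed PM2.5 is 11.203
total = gocart_pm25(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
```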
Dust, sea salt, OC, and sulfate were the most important contributors to the PM2.5 mass concentration in regions R8 and R9, while OC and sea salt were the most important in region R10. To understand this regional variability in PM2.5 composition, we analyzed the seasonal cycle of the US NEI 2014 anthropogenic emissions of BC, OC, SO2, and other unspeciated PM2.5 (P25) used in the model in each EPA region (Figure S11). OC emissions were the highest in almost all regions, except during June-October in regions R7-R10, where OC and P25 emissions were comparable, and throughout the year in R6. The higher OC emissions explain the dominance of the OC contribution to the PM2.5 composition.
Dust emissions are calculated online within the model and are not output in our forecasts. To understand why dust concentrations were the highest in R9, we show the spatial distribution of the dust source function in Figure S19. The dust source function represents the fraction of alluvium available for wind erosion and thus marks the grid boxes from which dust can be emitted. Dust source functions are pre-calculated and provided as a fixed input dataset to the model. California, Nevada, and Arizona, which all belong to region R9, have the highest dust source function values, explaining the highest dust concentrations in that region.

Conclusions and Outlook
This paper presents the setup, configuration, and evaluation of the surface PM2.5 and meteorological components of the NCAR regional air quality forecasting system. The system complements the NOAA operational air quality forecasts by providing an additional piece of information for decision-makers and the public, serves as a testbed for identifying errors and biases in the model performance through NRT evaluation, supports field campaign planning, and provides the atmospheric chemistry community with a data set to support their research needs. The model shows excellent skill in capturing hourly to daily variations in temperature, surface pressure, relative humidity, and wind direction but shows relatively larger errors in wind speed. The model also captures the seasonal cycle of surface PM2.5 observations very well in all the different regions of the US, with a mean bias smaller than 1 µg m−3. The skill of the forecasting system in simulating both the meteorological parameters and PM2.5 mass concentrations does not change significantly from the first to the second day of the forecast. While the model shows reasonable performance in capturing daily variations, it generally underestimates concentrations on days when PM2.5 exceeds its National Ambient Air Quality Standard. To a large degree, this is due to the model not simulating nitrate aerosols and SOA, but it is also a consequence of the 12 km spatial resolution. Hence, for any forecast model that uses a simplified aerosol scheme, these shortcomings need to be considered in the interpretation.
We plan to take several steps in the coming years to further improve the accuracy of the air quality forecasts in our system. First, we plan to replace the monthly averaged anthropogenic emissions with anthropogenic emissions that vary between weekdays and weekends/holidays and to adjust the emissions for non-NEI-reported years using the EPA-reported annual trends in emissions. Second, we plan to initialize our air quality forecasts via the assimilation of Moderate Resolution Imaging Spectroradiometer (MODIS) AOD retrievals, which have been found very useful in reducing biases in Community Multiscale Air Quality (CMAQ) model predictions of PM2.5 over the CONUS [33] and in WRF-Chem predictions of PM2.5 in Delhi [12]. Third, we plan to evaluate how well the persistence fire assumption works in the CONUS and whether we can develop an algorithm, possibly based on machine learning techniques, to predict fire emissions two days into the future. Future fire emissions can be predicted using fire behavior and spread models, but these are computationally too intensive to be applied to the large number of fires that can be detected on a given day over the CONUS. Future studies will also present a comprehensive evaluation of surface ozone and related species against the available in situ and satellite-based observations. Finally, we invite the community to use our forecasting products, to join in the analysis, to use the model output as boundary conditions for finer-scale (~4 km grid spacing) air quality forecasts over urban areas of the CONUS, or to co-develop customized products to serve their needs.
Supplementary Materials: The following are available online at https://www.mdpi.com/2073-4433/12/3/302/s1, Figure S1: Map of all PM2.5 observation sites (black dots) used in the near-real-time verification and the sites (red dots) used in the annual evaluation presented in this study. Figure S2: Definition of the EPA regions. Figure reproduced from the EPA regional website (https://www.epa.gov/aboutepa/visiting-regional-office (accessed on 20 February 2021)). Figure S3: Seasonal variations in daily observed (Obs), model-simulated (WRF), and revised (Rev) 2 m relative humidity in four EPA regions from June 2019 to May 2020. r and MB for each region, estimated using paired data for the full year, are also shown. Figure S4: Same as Figure 4 but using day 2 of the forecast. Figure S5: Same as Figure 5 but using day 2 of the forecast. Figure S6: Same as Figure 6 but using day 2 of the forecast. Figure S7: Same as Figure 7 but using day 2 of the forecast. Figure S8: Same as Figure 8 but using day 2 of the forecast. Figure S9: Same as Figure 9 but using day 2 of the forecast. Figure S10: Same as Figure 10 but using day 2 of the forecast. Figure S11: Monthly variations in BC, OC, P25, and SO2 emissions averaged at the EPA AirNOW observation sites in each EPA region. BC, OC, and P25 emissions are in ng m−2 s−1, and SO2 emissions are in mole km−2 h−1. Figure S12: Variations in daily averaged CO (Anth) and CO (Fire) tracers in the 10 EPA regions from 1 June 2019 to 31 May 2020. Figure S13: Variations in hourly and daily averaged WRF-Chem forecasted surface PM2.5 mass concentrations against the EPA AirNOW observations at the urban sites in the 10 EPA regions from 1 June 2019 to 31 May 2020. r and MB for each region, estimated using paired data for the full year, are also shown. Figure S14: Same as Figure S13 but at the suburban sites. Figure S15: Same as Figure S13 but at the rural sites.
Figure S16: Seasonal variations in the Pearson's correlation coefficient (r) calculated from model-observations pairs available at all the sites over the 48-h forecast period every day in each EPA region. Figure S17: Seasonal variations in the mean bias (MB) calculated from model-observations pairs available at all the sites over the 48-h forecast period every day in each EPA region. Figure S18: Seasonal variations in the root mean squared error (RMSE) calculated from model-observations pairs available at all the sites over the 48-h forecast period every day in each EPA region. Figure S19: Spatial distribution of the dust source function over the model domain.