WRF-Chem Modeling of Summertime Air Pollution in the Northern Great Plains: Chemistry and Aerosol Mechanism Intercomparison

: Oil and gas production in the Bakken region increased dramatically during the past decade. A WRF-Chem modeling study of the Northern Great Plains was conducted for a July 2010 baseline scenario prior to the largest of these production increases. Simulations using the RACM-MADE/SORGAM, CBMZ-MOSAIC, and MOZART-MOSAIC chemistry-aerosol mechanisms were compared to each other and against ground level observations. All three gas-aerosol modules produced similar prediction results for O 3 , and NO 2 , with moderate correlation to hourly measurements and monthly average values overpredicted by 20% for O 3 and underpredicted by 5% for NO 2 . Monthly average PM2.5 concentrations were relatively accurate, but correlation to hourly measurements was very low and PM2.5 subspecies exhibited high variability with a mix of over and underpredictions depending on the mechanism. Pollutant concentrations were relatively low across the mostly rural study domain, especially in the Bakken region. Results from this work can be used as a basis of comparison for studies of more recent time periods that include increased oil and gas-related emissions.


Introduction
Oil and gas production activities have increased considerably in the United States over the last decade, which has caused increased volatile organic compound (VOC) and nitrogen oxide (NOx) air pollutant emissions [1]. One area that has seen dramatic changes is the Bakken oil region, which spans 520,000 km 2 across western North Dakota and eastern Montana in the U.S., as well as southwestern Manitoba and southeastern Saskatchewan in Canada [2]. Drilling activity in the Bakken region increased ten-fold between 2007 and 2015 and has since maintained an average production rate of over 1.1 million barrels a day, now making North Dakota (ND) the second largest oil-producing state in the U.S. During the same time, natural gas production has also increased more than ten-fold [3]. Monitoring data in the Bakken region has shown corresponding increases in atmospheric levels of NOx, VOCs, and PM2.5 [2,[4][5][6][7].
Atmospheric modeling can provide further insight into these changes and can help predict future impacts of continued drilling activities and increases in production. Modeling of the Bakken region, however, has been limited, with no modeling studies focused specifically on the Northern Great Plains (NGP). Simulations of the broader continental U.S. offer some general information on recent and potential future pollutant conditions of relevance to the Northern Great Plains region. Simulations of air quality in the western half of the U.S. in 2011 with and without oil and gas emissions showed widespread impacts on O3 and nitrogen deposition, and more localized contributions to knowledge, specifically focused in this geographical area. July 2010 was used as a representative summertime episode, the season when ozone and PM2.5 pollutant levels have historically been highest [48], at a time when oil and gas production activities in the region were still small relative to current and expected future activity. The aims of this work were to identify an appropriate combination of model options for simulating this region and use them in a simulation reflecting baseline conditions within our region of study. The results of these baseline model simulations will be used as a basis of comparison for future studies assessing the impact of oil and gas-related activities on regional air quality.
The Model of Ozone and Related Chemical Tracers (MOZART) [31] gas-phase chemistry mechanism includes 85 chemical species and 196 reactions. MOZART is coupled with the Model for Simulating Aerosol Interactions and Chemistry (MOSAIC) [32] aerosol scheme, using four sectional size bins to reduce computational cost. The MOZART-MOSAIC (MM) scheme used includes a limited set of secondary organic aerosol (SOA) species. The Zaveri version of the Carbon Bond Mechanism (CBMZ) gasphase chemistry mechanism [26] is based on the Carbon Bond Mechanism (CBM-IV) [27] and includes 52 total species and 132 reactions. CBMZ is also coupled with MOSAIC. SOA is not included in the CBMZ-MOSAIC (CM) scheme used in this study. The Regional Atmospheric Chemistry Mechanism (RACM) [29] includes 77 gas-phase species and 237 reactions. In this work, the RACM mechanism was coupled with the Modal Aerosol Dynamics Model for Europe (MADE) [33], which describes the aerosol with 3 log-normal modes. A limited number of secondary organic aerosol species were incorporated into MADE with the Secondary Organic Aerosol Model (SORGAM) [34]. RACM chemical mechanism code for implementation in WRF-Chem was generated with the kinetic preprocessor (KPP) [50,51]. The main physical and chemical parameterizations used in WRF-Chem are shown in Table 1. A 24 km grid resolution with 28 vertical layers was used to achieve a balance of accuracy and computational cost. A common set of meteorological and physical options in WRF-Chem were used for simulations with the three different chemistry-aerosol schemes.

Model Scenario
The model scenario used in this study is a simulation of July 2010 in the Northern Great Plains. As shown in Figure 1, the modeling domain included all of the central United States, with analysis focused on the region surrounding and downwind of the Bakken oil formation. The analysis region, indicated by the red box in Figure 1, included the states of North Dakota, South Dakota, Nebraska, Montana, Wyoming, Minnesota, and Iowa. This largely excluded population and pollution sources in the surrounding areas so that analysis could focus on this more rural and unpolluted region. Data from North Dakota monitoring sites showed an O3 concentrations peak during the summer, while PM2.5 concentrations showed no seasonal trend and remained constant on average throughout the year [48,58]. July was used to represent these summertime conditions when both O3 and PM2.5 concentrations in North Dakota are the highest. The 2010 year was selected because it had meteorological conditions, similar to average conditions during the past decade. Additionally, importantly, in 2010 oil and gas production was relatively low (0.3 million barrels oil per day in 2010, compared with over 1.1 million barrels per day each year since 2014 [59]), providing a baseline scenario of the region before the subsequent increases in oil and gas development and associated emissions. The meteorological input data sets for the year 2010 were downloaded from the North American Regional Reanalysis (NARR) 3-h data, at 32 km resolution [60]. Model input data for the emitted chemical species used hourly area and point source anthropogenic emissions from the 2011 National Emission Inventory (NEI-2011) [61] at 4 km resolution, and biogenic emissions from the MEGAN 2.04 model [62] at 1 km resolution. The NEI-2011 emissions include hourly emissions data for an average July weekday that is used to represent emission rates on all days of a simulation. Biogenic emissions vary throughout the month based on simulated hourly temperature and solar radiation. Initial and lateral boundary conditions for chemical species were obtained from the MOZART-4 model [31]. MOZART output data at 6-h intervals was downloaded from the National Center for Atmospheric Research (NCAR) [63] and then converted to the proper format for WRF-Chem using the MOZBC pre-processor [64].
To ensure that simulation results did not drift significantly from real meteorological conditions, the month-long scenario was simulated in 5-day intervals [41]. A 12-h meteorological spin up was used before each 5-day interval and a 5-day spin up time was used to initialize the entire scenario. Some of the simulations were carried out on a highperformance computing system consisting of 3.3 GHz Intel Xeon E5-2643 processors (Intel: Santa Clara, CA, USA) with a 56 Gbit EDR InfiniBand (IB) interconnect and a GPFS parallel file system. Every 5.5 meteorological day model run took about 13 h to complete using 8 cores of this HPC system. Other simulations were carried out on an individual Linux PC consisting of a 3.4-4.0GHz, 4-core Intel Core i7-6700 Processor (Intel: Santa Clara, CA, USA). Every 5.5 meteorological day model run took about 84 h to complete using this PC system.

Observational Data and Analysis
Model predictions were compared to ground level observational data for July 2010 from EPA's Chemical Speciation Network (CSN) [58], Clean Air Status and Trends Network [65], and the Interagency Monitoring of Protected Visual Environments (IMPROVE) network [66]. Locations of the monitoring sites are shown in Figure 1. There were a total of 47 sites reporting data for one or more of the following: hourly values of O3 (27 sites), NO2 (10), PM2.5 (18), near-ground temperature at 2 m (15), surface pressure (9), relative humidity (12), wind speed at 10 m (6), and 24-h values every third day for PM2.5 components OC (15), EC (15), SO4 2− (22), NO3 − (21), and NH4 − (6). Additional details for each of the monitoring sites are found in Table S1 of the Supplementary Information. Sites were assigned into a western group that includes the Bakken and an eastern group, mainly in or near Minnesota and Iowa.
Statistical analysis was performed comparing CBMZ-MOSAIC, RACM-MADE/SORGAM, and MOZART-MOSAIC model predictions to the observational data. Reported statistical measures include Pearson's correlation coefficient (R), mean bias (MB), normalized mean bias (NMB), and root mean square error (RMSE), which are defined in Table 2. Only monitoring sites with <10% missing data were included in the analysis. For aerosol components, the modeled PM2.5 subspecies mass concentration was determined by adding the first three size bins of the MOSAIC scheme or the Aitken and accumulation modes of the MADE/SORGAM aerosol scheme. For organic carbon (OC), RACM-MADE/SORGAM model predictions of organic aerosol (OA) were divided by a factor of 1.4 to obtain OC values, consistent with a commonly used OA:OC ratio [67].

Meteorology
July 2010 exhibited typical summer weather conditions across the study domain. Daily high temperatures were generally 25-35 °C and nighttime lows were 10-20 °C. Scattered rainfall was present somewhere within the region on most days and several periods of strong thunderstorms occurred, most notably 4, 22-24 and 27-28 July. Total monthly precipitation varied from 2-10 cm in the western half of the study region to 10-30 cm in the east, with the highest totals in the southeast [68].
In recent years, the study domain has periodically been impacted by heavy smoke events produced by wildfires in the western United States and Canada. MODIS/Terra satellite imagery for early July 2010 show a cluster of large fires in northern Saskatchewan with clearly visible smoke plumes. On most days, smoke from these fires remained in Canada, north of the model domain for this study, with the only exception on July 15 when there appeared to be a smoke event across parts of ND and into Minnesota [69]. Table 3 lists averaged statistical results comparing meteorological predictions from simulations with the RACM-MADE/SORGAM (RM), CBMZ-MOSAIC (CM), and the MO-ZART-MOSAIC (MM) chemistry-aerosol mechanisms to measurements at ground stations across the domain area of study. These and other statistical results reported for this study are averages of the measures determined for individual stations. A summary of the statistical measures at each location is provided in Tables S2-S5 in the Supplementary Materials. For near-ground temperature and surface pressure, all three models produced similar results that agree well on average with observations, as evidenced by average NMB values of approximately 2% and −0.3%, and average Pearson's correlation coefficients of 0.80 and 0.92, respectively. The models all showed only a small underprediction of average near-ground relative humidity, with −4% average NMB, but produced somewhat lower correlation to hourly observations (average R-value of 0.67). For 10 m wind speed, the three models did not perform quite as strongly, with a tendency toward somewhat larger overpredictions (average NMB value of 30%) and only moderate correlations with observations (average R-value of 0.48). These results are comparable to other studies with WRF-Chem where wind speed was overpredicted by 30-100% [44,70,71]. Differences in meteorological results between the three model formulations was minimal, with only slight variation in the average statistical measures. This is not surprising because the three chemistry-aerosol mechanisms are coupled to the same base WRF meteorological model. The simulations did not produce identical meteorological results to each other because aerosol-cloud-radiation feedbacks in the coupled WRF-Chem model are influenced differently by the different aerosol and gas chemistry mechanism formulations. The effect of geographical location was explored by comparing meteorological parameters for eastern and western monitoring sites. Western sites had lower average observed temperature (20.7 °C in the west vs. 24 °C in the east) possibly due to western sites generally being further north, lower pressure (941 vs. 981 hPa) due to higher elevations in the west, much lower RH (59% vs. 84%), and higher wind speed (3.5 vs. 2.1 m/s). Statistical performance of the models showed variation between eastern and western sites that was generally small, and the three models continued to be in close agreement with each other. Only wind speed predictions had significant geographical bias, with eastern sites overpredicted on average by 58% and western sites underpredicted by 21% (Table S5, Supplementary Materials).
Simulation results also showed similar performance when looking at diurnal cycles in these meteorological measures. As seen in Figure 2, which shows hourly time series and scatterplots for the Dickinson and Beulah sites in western ND, temperature and pressure predictions correlated well with each other and with ambient measurements. Wind speed and relative humidity predictions aligned less closely with observations, but still captured the hourly variability in these measures reasonably well. The trends shown in Figure 2 for temperature, pressure, and relative humidity are representative of model performance at the other monitoring sites across the modeling domain. Wind speed results in Figure 2 are for a western site with large underpredictions, unlike other sites that showed overpredictions for wind speed. Overall, these results give confidence that the meteorological parameters upon which the chemistry simulations are based were simulated appropriately.

Ozone
Model predictions of maximum hourly O3 concentrations are shown in Figure 3 for the RM, CM, and MM chemistry-aerosol mechanisms. The models were generally similar to each other; all three showed a pattern of highest maximum O3 concentrations where major cities are located and towards the southwest and southeast. Peak concentrations in the urban areas exceeded 85 ppb. Maximum O3 concentrations above 65-70 ppb were predicted in areas surrounding major cities and in the southwestern portion of the domain, while much of the central portion of the study area showed lower maximum concentrations of 50-60 ppb. Compared to observations, modeled peak O3 concentrations at the individual monitoring sites showed a negative bias of 2-4 ppb on average. However, there was considerable variation in model error between sites. For all three models, model bias was negatively correlated with measured peak O3 concentration. The models all underpredicted higher observations and overpredicted lower observations by up to 30% in each direction, resulting in less variability between sites for model predictions than for measured concentrations. While general trends were similar, the three models did predict some differences in the level and spatial patterns of peak O3 concentrations (see Figure S1 in the Supplementary Materials). When averaged across the entire domain, the RM model resulted in the highest peak O3 levels. Compared with RM, the CM simulations produced lower concentrations of approximately 1-3 ppb across most of the western half of the domain, while in the east, higher gradients were predicted around urban centers, with peak concentrations higher by 3-6 ppb within the cities, and lower by 3-9 ppb in some surrounding areas. MM results were also lower on average than RM, but showed the opposite behavior with less of a gradient between high and low-concentration areas. Some areas that showed higher peak concentrations with the RM and CM models, such as the Minneapolis metro area and western Nebraska, were lower by 10-20 ppb with the MM model. Conversely, areas in the southeastern portion of the domain had peak O3 levels with MM that were higher by 5-15 ppb.
Monthly average O3 predictions are shown in Figure 4a-c. These plots illustrate the similar geographical patterns in the model results of the three chemistry-aerosol mechanisms. Predicted values ranged from the lower 30 ppbs in the north, to the high 30 s in the south, with much higher concentrations in the southwest reaching 50 ppb. Observations values showed a similar pattern of higher concentrations towards the southwest and a slight north-south gradient, but with some variability between individual sites as to be expected with point measurements. Mean bias (MB) results of model predictions compared to observations are presented in Figure 4d-f. Model simulations with all three mechanisms consistently overpredict observations, on average by 4-6 ppb, with larger bias in the east, particularly at southeastern sites, than in the west. This is consistent with a previous study over the conterminous United States which also showed that RACM overpredicted average 24 h ozone concentrations by a similar magnitude [11].
Simulations showed minor differences between mechanisms in average predicted O3 concentration ( Figure S1, Supplementary Materials). Averaged across the domain, the MM and RM models produced the highest average O3 concentrations, both with predictions higher than CM by approximately 1.5 ppb. Other comparisons of these gasphase chemistry mechanisms showed similar or greater differences in average O3 predictions [18,19]. Geographical patterns for the RM and CM mechanism were similar, but with RM consistently higher by around 1 ppb in the east, and 2 ppb on the west of the domain. Results with the MM mechanism, however, showed a somewhat different pattern. Compared with RM, the MM simulations showed concentrations that were higher towards the east by 1-3 ppb, but lower in the southwest by 1-3 ppb. Additionally, compared with CM, MM showed larger positive bias, with concentration differences in the northeast and southeast of up to 4 ppb. The model differences in average O3 share some features with the peak O3 results ( Figure S1, Supplementary Materials), but with smaller concentration differences between the models for the monthly averages. In the east, MM tended to predict higher concentrations than the other models, except for the Minneapolis urban area where it predicted less O3, while in the southwest the RM mechanism yielded the highest O3 concentrations. MB and NMB results show the CM mechanism predictions exhibited the least bias at both eastern and western sites, while MM and CM had larger overpredictions, with MM highest at the eastern sites and RM highest at sites in the west. Model predictions correlated well with observed hourly O3 concentrations. As seen in Table 4, overall Pearson's correlation coefficients were moderate, with R = 0.52-0.53 for all three mechanisms. Some lack of correlation is expected due to the use of average weekday emissions for all simulation days; for each mechanism, the predictions of eastern sites showed a stronger average correlation coefficient with observations (R = 0.53-0.56) than on western sites (R = 0.45-0.48). The magnitude of average RMSE was also similar for all three mechanisms and for both eastern and western sites, with all values within the range of 10-13 ppb (see Table S6, Supplementary Materials for individual site data).  Figure 5 presents hourly O3 concentrations for two sites, Fargo ND and Dickinson ND, representative of eastern and western monitoring sites. Predicted hourly concentrations capture the observed diurnal cycle in ground level O3. However, like many eastern sites, the Fargo results showed a consistent overprediction of results throughout most of the month, generally failing to capture observed nighttime lows below 20 ppb. For the Dickinson site, the models followed observations more closely, while still predicting smaller day/night variations than measured at the site. Scatter plots of both sites also show that model predictions aligned very closely to each other, and evidenced a clear overprediction when observed concentrations were low in all of the models.

NO2
Model predictions of average hourly NO2 concentrations are shown in Figure 6a-c for the RM, CM, and MM chemistry-aerosol mechanisms. The three model predictions appeared very similar to each other, with average NO2 levels less than 1 ppb across much of the western and far northern portions of the domain and slightly higher concentrations of 1-2 ppb in the eastern half of the region. Higher concentrations were predicted at urban centers and along major interstate freeways, with a large peak reaching 14 ppb around Minneapolis, MN. These model predictions generally aligned with the observational data, which reported higher levels in the east, and lower levels in the west, but with elevated levels near Minneapolis MN, Des Moines IA, and Gillette WY. As shown in Figure 6d-f, a slight positive bias of up to 0.5 ppb was observed at many sites, while a negative bias of 1-2 ppb was observed at others. Much larger discrepancies occurred at the monitoring site near Minneapolis, where the models exhibited a high positive bias of 6 ppb, more than doubling the average observed concentration. Mean NO2 concentration predictions were consistent and extremely similar among the three chemistry mechanisms ( Figure S1, Supplementary Materials). The only noticeable differences between models were RM predictions were approximately 0.5 ppb higher at the peak location of the Minneapolis area, and CM showed levels about 0.3 ppb lower in a small area of the southwest, but even those values were within 10% of the other mechanisms. Averaged across the entire domain, RM predictions of NO2 were only slightly higher (0.03 ppb) than MM results, which were slightly higher (0.03 ppb) than CM results. The average relative difference between mechanisms for monthly mean NO2 and O3 were both approximately 5%, but NO2 differences were more uniform across the domain than O3, which had much larger and offsetting geographical variability. Table 5 provides statistical analysis of the hourly concentration results. Averaged across the 10 NO2 monitoring sites, the predicted monthly average NO2 concentrations for each mechanism were very close to the observed mean of 3.1 ppb, with mean biases smaller than −0.2 ppb. However, these domain average values hide much larger offsetting errors among the individual sites, with underpredictions of 1-2 ppb at several sites, and an overprediction of more than 5 ppb at the site in the Minneapolis area. Comparing model performance at eastern and western sites, MB averaged around −0.8 ppb in the west and about 0.6 ppb in the east, but again with wider variations from site to site. Results for NMB showed a similar pattern, with average overpredictions of 12-19% in the east and underpredictions of 21-28% in the west depending on the model. This range of NMB is consistent with NO2 results reported by other researchers in simulations of the continental U.S. [44,46]. Each of the three models in this study produced similar results. At all locations, the RM mechanism predicted the highest NO2 concentrations, but MM and CM results were always within 10% and 6%, respectively, of the RM values. Moderate correlation was seen between model predictions and observed hourly NO2 concentrations for all mechanisms, with domain-wide average R values of 0.38. On average, eastern sites had slightly higher correlation values with R = 0.42 in the east and R = 0.35 in the west, but it is important to remember that these east and west average values are each based on a small number of monitoring sites and that the individual site values ranged more widely, especially in the west (Table S7, Supplementary Materials). The average RMSE (3.2 ppb) was observed to be roughly equal in magnitude to the observed mean concentration (3.1 ppb), and this relationship held true for eastern and western averages as well as for the individual sites. As shown in Figure 7, predicted hourly concentrations of NO2 followed the observed diurnal cycle at both the Fargo ND and Gillette WY sites, while tending to underpredict their nighttime peaks. This is generally true across the other monitoring sites, with the notable exception of the Minneapolis site, where the models significantly overpredicted peak NO2 levels.

PM2.5
Model predictions of average hourly PM2.5 concentrations are shown in Figure 8a-c for the RM, CM, and MM chemistry-aerosol mechanisms. The three mechanisms appeared to show a similar spatial pattern, with low concentrations in the west and north of less than 5 μg/m 3 and higher concentrations in the southeast exceeding 10 μg/m 3 . Comparing the predictions from each model, some geographical differences are apparent ( Figure S1, Supplementary Materials). The CM mechanism showed predictions in the southeast that were up to 4 μg/m 3 higher compared with the other mechanisms. The MM mechanism predicted concentrations that were 1-2 μg/m 3 higher in the northeast and 0.5 μg/m 3 lower in the west compared with the other mechanisms. Predictions for other parts of the domain appeared to be similar in all three mechanisms. Averaged across the entire domain, CM produced the highest PM2.5 concentrations, followed by the RM and MM mechanisms. The modeled spatial trends in PM2.5 concentrations agreed generally with the observed concentrations. As shown in Table 6, average observed mean values in the east were about twice the western values, 9.8 and 4.8 μg/m 3 respectively. Model predictions at the observation sites showed a similar trend with higher concentrations at the eastern monitoring sites on average than at those in the west. However, there were still noticeable differences between modeled and observed concentrations, as seen in Figure 8d-f. In the west, underpredictions of 1-4 μg/m 3 occurred at most sites, resulting in an average mean bias of −1.2 to −1.9 μg/m 3 . In the east, the average bias was smaller, with MB ranging from −1.0 to +0.6 μg/m 3 . However, this resulted from overpredictions in the southeast, especially with the CM mechanism, which were offset by large underpredictions of 5-8 μg/m 3 with all three mechanisms at a pair of sites in the northeast (Table S8, Supplementary Materials). Although the model predictions of average monthly PM2.5 concentrations were fairly close to the observed levels and matched the performance level of other modeling studies [17,46], with NMB values ranging from −14% to +2%, they failed to capture the hourly and daily variability in PM2.5. Pearson's correlation coefficients were close to zero for all models at most of the sites studied, with average R values ranging from 0.05 to 0.10. Examples of this poor correlation are shown in Figure 9. At the Fargo site, Figure 9a, the models did not predict the observed episodes of elevated PM2.5 levels, nor the hourly variations, resulting in R values of 0.02-0.07. This is in addition to a consistent underprediction for all three mechanisms. Even at the site with the best modelobservation agreement, Gillette WY in Figure 9b, the correlation coefficient was only 0.23-0.30. The very poor correlation with hourly PM2.5 data is similar to PM results from previous studies with the same or similar aerosol mechanisms [20,38,41]. Model correlation may be better for daily average, rather than hourly, predictions, as has been seen in other studies [11,17,38,41,72]. However, calculations for a few sites indicated that even for daily average values, while slightly improved, all three models still showed weak correlation with observed PM2.5 concentrations. Considering together the bias and correlation results, there does not appear to be any one mechanism clearly more suitable for predicting PM2.5 concentrations in this region, as all three showed similar errors.

PM2.5 Subspecies
Before presenting PM2.5 subspecies results, it is important to describe briefly the monitoring sites and measurements that were used. Observational sites for all of the EC, OC, and for some of the SO4 and NO3 subspecies measurements are part of the IMPROVE network of stations which are located in rural areas. Other monitoring sites provide additional measurements of SO4, NO3, and all measurements of NH4. It should be noted that at many of the 25 PM2.5 subspecies sites, observational values for either PM2.5 and/or ozone were not available. Consequently, average results for individual subspecies and for total PM2.5 may not be directly comparable to one another because they are each based on a different set of monitoring sites. Furthermore, the total PM2.5 values include other components that were not individually measured, such as dust and sea salt.
Both of the model aerosol schemes include EC, OC, NO3, NH4, and SO4. These five major aerosol subspecies have corresponding PM2.5 measurement data and were the focus of analysis. Other model aerosol components, such as sodium, chloride, and other inorganics for which measurement data was not available, were not considered individually, but they do contribute to the total simulated PM2.5 mass concentrations. Within the models, the OC fraction is subdivided into multiple subcomponents representing different source or chemical types. Model OC concentrations reported in this work include all of these OC subcomponents, and for the RM and MM models include secondary organic aerosols (SOA). For comparison to observational PM2.5 subspecies data, which are available as daily averages of every third day, model averages of hourly predictions from the corresponding days were used, whereas model-model comparisons used averages of hourly predictions for the entire month.
Model prediction results for PM2.5 subspecies are presented in Figure 10 and Table  7 (See Tables S9-S13 in the Supplementary Materials for individual site results). All three models showed similar geographical patterns for elemental carbon (EC), with areas of higher concentration in the southeast area of the domain, between 0.2 and 0.4 μg/m 3 , and higher concentration peaks predicted in urban areas. The CBMZ-MOSAIC (CM) chemistry-aerosol mechanism predicted the highest concentrations of EC, with an average simulated EC concentration at the monitoring sites of 0.19 μg/m 3 , followed by MM (0.18 μg/m 3 ) and RM (0.16 μg/m 3 ). Compared with the average observation value of 0.17 μg/m 3 , CM and MM slightly overpredicted EC, with NMB values of 20% and 10% respectively, while RM predictions were barely lower than EC observations on average, with a NMB of −2%. Statistical analysis showed a moderate correlation between model predictions and observed concentrations of EC for all mechanisms, with average R values between 0.37 and 0.43. Predicted EC concentrations were similar, but not identical, for the three different mechanisms. EC is a primary pollutant and will depend on emissions, transport, and deposition. Deposition rates and the resulting EC concentrations are influenced by the overall aerosol size distribution and, as noted by other researchers [20], how the size distribution is represented within the model. The RM mechanism uses a modal approach, while CM and MM use a sectional approach to describe the aerosol size distribution. Together with variations in the formation of other aerosol components that affect particle size, this leads to the small differences in predicted EC concentrations shown in Figure 10 and Table 7.  Organic carbon (OC) also showed areas of high concentration towards the southeast of the domain, but these higher levels continue in a band extending northwest. For the RM and CM simulations, OC concentrations were over 0.5 μg/m 3 across most of the northeastern third of the domain, with lower concentrations of 0.2-0.3 μg/m 3 towards the southwest. The MM simulation, however, showed much higher OC levels, with concentrations over 1 μg/m 3 across the eastern third of the domain and exceeding 1.5 μg/m 3 in the southeast. The higher OC concentrations with MM are attributable to the SOA components included in this version of the MOSAIC mechanism. While the CM simulation also uses MOSAIC, it does not include any SOA. When considering only primary organic aerosol, the MM predictions were very similar to the CM results. The RM mechanism uses a different SOA scheme that produces much less SOA. All of the models predicted OC concentrations that were lower than observed values. The MM mechanism predicted the highest simulated mean of 0.76 μg/m 3 , but this was still well below the average observed OC value of 1.18 μg/m 3 , resulting in NMB of −36%. The RM and CM mechanisms with their even lower OC concentrations had NMB values of −75% and −62%, respectively. Despite these large underpredictions, all three models showed a moderate correlation with the observed OC concentrations, with R values ranging from 0.34 for RM to 0.55 for CM.

PM2.5 Subspecies
Nitrate (NO3) concentrations were highest in the southeast of the domain in all three mechanisms, and were widely dispersed without localized peaks. Maximum predicted concentrations reached 1.8 μg/m 3 for the CM mechanism, 1.2 μg/m 3 for MM, and 0.8 for RM, with levels dropping below 0.2 μg/m 3 across most of the western and northern half of the domain for all three mechanisms. All models significantly overpredicted NO3 concentrations on average. While the observed mean was 0.20 μg/m 3 , simulated means were 0.57, 0.55, and 0.30 μg/m 3 for the CM, MM, and RM mechanisms, respectively. This overestimation is consistent with results from other researchers [17,44,72] who obtained NMB values for NO3 ranging from +30% to +240% with similar models in the U.S. and Europe. The larger NO3 concentrations with CM compared to RM also mirror those reported by Balzarini et al. [17], with the difference attributed to a higher rate constant in CBM-Z for the NO2 + OH  HNO3 reaction. Model-observation correlation was also weak with average R = 0.26-0.28.
Ammonium (NH4) concentrations were more broadly distributed across the domain, with a general north-south gradient for all three mechanisms. Unlike EC, OC, and NO3, where the RM mechanism predicted the lowest concentrations of the three mechanisms, for NH4 it produced the highest concentrations, ranging from 0.2 μg/m 3 in the north to over 0.8 μg/m 3 in the south. The MM and CM mechanisms produced lower NH4 levels of 0.1-0.2 μg/m 3 in the north and 0.3-0.5 μg/m 3 in the south. Differences between the mechanism results for NH4 are related to differences in predicted NO3 and SO4 concentrations because aerosol ammonium is primarily formed as H2SO4 and HNO3 are neutralized by NH3 to form (NH4)2SO4 and NH4NO3. As seen in Figure 10, the geographical patterns for NH4 reflect closely a combination of NO3 and SO4 concentrations, with the south-north gradient in NH4 due to SO4 and elevated levels in the southeast due to NO3. At the 6 NH4 monitoring sites, which were mainly in the southeast, the RM mechanism produced a simulated mean of 0.75 μg/m 3 , compared with the observed average value of 0.49 μg/m 3 . The MM and CM mechanisms also overpredicted mean NH4 concentration, but to a lesser extent. Average NMB was 65% for RM, and only 25% and 11% for MM and CM, respectively. Statistical analysis, however, showed essentially no correlation between model and observed NH4 concentrations for any of the mechanisms, as the average Pearson's correlation coefficient was between −0.01 and 0.06. Sulfate (SO4) showed an increasing trend towards the south and southwest of the domain in all three mechanisms, with no localized peaks predicted. The CM mechanism predicted the lowest SO4 levels with average concentrations ranging from approximately 0.5 μg/m 3 across most of the domain to 1.0 μg/m 3 in the far southeast corner. The MM mechanism had a similar north south trend, but was uniformly higher across the entire domain by about 0.75 μg/m 3 . The RM mechanism predicted the largest north-south gradient, ranging from less than 0.5 μg/m 3 in the north to almost 2 μg/m 3 in the south. In comparisons of model predictions with monitoring site data, the RM mechanism produced the highest simulated mean of 1.26 μg/m 3 , followed by the MM prediction of 1.20 μg/m 3 . Both predictions were slightly higher than the observed average value of 1.18 μg/m 3 , resulting in NMB values of approximately 30%. The CM mechanism predictions were much lower at only 0.53 μg/m 3 and NMB of −37%. This is likely due to an underprediction of aqueous SO2 oxidation to form SO4 as has been suggested by other researchers [17,73]. Consistent with this explanation, in the present study SO2 concentrations with CM were higher than with RM and MM. Similar behavior has been seen in other studies where simulations using CM significantly underestimate observed sulfate concentrations [73], and predict much less sulfate than simulations using MM and RM [17,20]. Statistical analysis showed near null Pearson's correlation coefficients for all three mechanisms, with R between −0.15 and +0.13.
Variability in model performance for the different aerosol subspecies depends on how well the gas and aerosol mechanisms capture the underlying formation processes, especially for secondary species. All three mechanisms performed relatively well on EC predictions, with average NMB less than 20% and moderate correlation to hourly observations. As a primary pollutant, modeled EC concentrations depend on emissions inventories, which were the same in all simulations, as well as on transport, dispersion, and removal processes, which were similar across the three models. Consequently, it is not surprising the model results were similar to each other. The good agreement with observations suggests the emission inventories for EC are reasonably accurate. Organic carbon predictions also showed moderate correlation with observations, but unlike EC, OC was significantly underpredicted. For RM and CM mechanisms, the average NMB was more than −60%, and while MM predicted higher concentrations, especially in the eastern portion of the domain, it also underpredicted observations, with an average NMB of −36%. These prediction errors are likely due to uncertainties in representation of SOA process and precursors in the gas-aerosol mechanisms, as well as uncertainties in POA and SOA emissions [74]. The RM simulation uses a limited SOA representation with the SORGAM scheme, but still produced lower OC concentrations than CM, which did not account for any SOA. The MM simulation uses the same MOSAIC aerosol scheme as CM, but with SOA, and this appeared to lead to better predictions. However, even with the inclusion of some SOA processes, RM and MM simulations still produced poor overall OC prediction results, suggesting that errors in organic aerosol representation remain.
For the secondary inorganic aerosol species (NO3, NH4, and SO4), predictions from all of the models showed a very poor correlation with the hourly measurements from monitoring sites. Monthly averages also had significant errors that varied depending on the model and aerosol species, further highlighting uncertainties in simulating aerosol processes as has been noted by other researchers [11,17,20,72,74,75]. Results of this study show that RM overpredicted NH4, but performed reasonably well with both NO3 and SO4. The CM mechanism, on the other hand, had minimal error with NH4, but overpredicted NO3 and underpredicted SO4, both by more than a factor of 2. MM had similar overpredictions of NO3 as CM, but performed much better with NH4 and SO4. Comparing these results to previous WRF-Chem simulations is difficult, as other studies do not show consistent trends for NO3 and SO4 predictions between different mechanisms. For example, our results are similar to WRF-Chem studies by Balzarini and coworkers [17] and Zhang and coworkers [44], who found NO3 was overestimated by CM, RADM2-MADE/SORGAM, and other mechanisms. However, results from Georgiou [20] and Tessum [11] found that CM, MM, and RACM-MADE/VBS all underpredicted nitrate concentrations. Additionally, while Balzarini [17] and Georgiou [20] showed CM underpredicting SO4, similar to our work, they attribute this negative bias to the absence of aqueous phase SO2 oxidation reactions. That explanation does not fit with the results from this work, where the CM and RM simulations did include aqueous reactions, and MM did not, but it was only the CM results that underpredicted SO4. In better agreement are particulate ammonium predictions, where NH4 concentrations follow SO4 and NO3, due to the formation of (NH4)2SO4 and NH4NO3, and were consistently overpredicted by all three mechanisms in this study and matched or exceeded observations in other reported studies [11,17,20,44].

Conclusions
The WRF-Chem online coupled atmospheric chemistry model was used to simulate gas and aerosol concentrations over the Northern Great Plains for July 2010, prior to the significant increase in oil production activities over the following years. The performance of the CBMZ-MOSAIC (CM), RACM-MADE/SORGAM (RMS), and MOZART-MOSAIC (MM) gas-chemistry aerosol mechanisms were analyzed against each other and against ground level observations. All three gas-aerosol mechanism combinations produced generally similar results for both monthly averages and diurnal variations in hourly concentrations. Ozone and NO2 predictions were within 5% of each other on average, while average PM2.5 concentrations varied between mechanisms by less than 20%. Aerosol subspecies results showed much greater variability between mechanisms (differences of more than a factor of two in some cases) with RM predicting less nitrate, CM less sulfate and ammonium, and MM more organic carbon.
When evaluated against observational data, all simulations were able to reproduce hourly meteorological measurements for temperature, pressure, and relative humidity with good correlation, minimal bias, and little geographical variation in performance. Wind speed prediction performance showed moderate correlation to observations, with overpredictions at western sites and underpredictions at eastern sites. Performance for O3 and NO2 was reasonable, with all mechanisms showing moderate correlation to hourly measurements and average NMB of approximately +20% for O3 and −5% for NO2, but with NO2 overpredicted in the east and underpredicted in the west. Model performance was not as good for fine particulate. Monthly average PM2.5 concentrations were relatively accurate, but with very low correlation to hourly measurements. Aerosol subspecies predictions showed low correlation and, with the exception of EC, significant bias, underpredicting OC, overpredicting NO3 and NH4, and over or underpredicting SO4 depending on the mechanism.
Measured and modeled pollutant concentrations for this time period are relatively low across the mostly rural study domain. O3, NO2, and PM2.5 levels are all well below NAAQS standards. Some geographical variations exist in pollutant levels. Monthly average O3 is highest in the southwest of the domain and lowest in the north. Model simulations show the highest NO2 concentrations around the Minneapolis-St. Paul metropolitan area, higher levels in the south and east along major transportation and population centers, and the lowest concentrations in the northwest. PM2.5 concentrations show a strong gradient from highest levels in the south and southeast to lowest in the north and northwest. Predicted fine aerosol subspecies concentrations also show strong geographical patterns. EC and NO3 behave similarly to NO2, with higher levels in the south and southeast and lower levels in the northwest. NH4 and SO4 concentrations are highest in the south and lowest in the north. OC has a gradient from high levels in the east to lower in the west.
For future model simulations of this region, any of the mechanisms appear suitable for predicting gas phase concentrations and particle phase predictions show similar levels of uncertainty. The July 2010 scenario in this study was specifically chosen to capture baseline conditions for the Bakken region before increased oil and gas production activities would have influenced air quality. Modeling results for this time period indicate that, in general, the northwestern portion of the study domain, where the Bakken oil and gas fields are located, had the lowest pollutant concentrations. Additional studies of the region that include the more recent increases in oil and gas-related emissions can use these results as a basis of comparison.
Supplementary Materials: The following are available online at www.mdpi.com/xxx/s1, Figure S1: Differences between model predictions of maximum hourly O3, mean O3, NO2, and PM2.5 concentrations, Table S1: Meteorology and air quality monitoring sites and available measurements, Table S2: Statistical analysis of predicted hourly near ground temperature at individual monitoring sites, Table S3: Statistical analysis of predicted hourly surface pressure at individual monitoring sites, Table S4: Statistical analysis of predicted hourly relative humidity at individual monitoring sites, Table S5: Statistical analysis of predicted hourly 10 m wind speed at individual monitoring sites, Table S6: Statistical analysis of predicted hourly O3 concentrations at individual monitoring sites, Table S7: Statistical analysis of predicted hourly NO2 concentrations at individual monitoring sites, Table S8: Statistical analysis of predicted hourly PM2.5 concentrations at individual monitoring sites, Table S9: Statistical analysis of predicted daily average PM2.5 elemental carbon (EC) concentrations, Table S10: Statistical analysis of predicted daily average PM2.5 organic carbon (OC) concentrations, Table S11: Statistical analysis of predicted daily average PM2.5 nitrate (NO3) concentrations, Table S12: Statistical analysis of predicted daily average PM2.5 ammonium (NH4) concentrations, Table S13: Statistical analysis of predicted daily average PM2.5 sulfate (SO4) concentrations.