Evaluation of the Variability in Chemical Transport Model Performance for Deposition and Ambient Concentrations of Nitrogen and Sulfur Compounds

Air quality models are increasingly used to develop estimates of dry and wet deposition of sulfate and nitrate in watersheds (because of lack of measurements) in an effort to determine the acidifying deposition load into the aquatic systems. These models need to be rigorously evaluated to ensure that one can rely on the modeled quantities instead of the measured quantities. In the United State (U.S.), these models have been proposed for use in establishing national standards based on modeled quantities. The U.S. Environmental Protection Agency (EPA) is considering aquatic acidification as the main ecological endpoint of concern in determining the secondary national ambient air quality standards for nitrogen oxides and sulfur oxides. Acidification is tied to depositions of sulfur and nitrogen, which are linked to ambient concentrations of the elements. As EPA proposes to use a chemical transport model in linking deposition to ambient concentration, it is important to investigate how the currently used chemical transport models perform in predicting depositions and ambient concentrations of relevant chemical species and quantify the variability in their estimates. In this study, several annual simulations by multiple chemical transport models for the entire continental U.S. domain are evaluated against available measurement data for depositions and ambient concentrations of sulfur oxides and reactive nitrogen species. The model performance results vary by evaluation timescale and geographical region. Evaluation of annualized quantities (annual average ambient concentrations and annual total depositions) suppresses the large variances shown in the evaluation using the observation's native shorter-term timescales (e.g., weekly). In 401 addition, there is a large degree of bias and error (especially for deposition fluxes) in the modeling results that brings to question the suitability of using air quality models to provide estimates of deposition loads. Variability in the ratio of deposition to ambient concentration, so-called the Transference Ratio that EPA has proposed to use in linking deposition to ambient concentration, is also examined. Our study shows that the Transference Ratios as well as total reduced nitrogen deposition, another modeled parameter EPA proposed to use in the process of determining the new secondary standard, vary considerably by geographical region and by model simulation.


Introduction
The current secondary (welfare-based) National Ambient Air Quality Standard (NAAQS) for oxides of nitrogen (NO X ) is 0.053 ppm (53 ppb or 100 µg/m 3 ), calculated as the annual arithmetic average of the 1-hour NO 2 concentrations.The standard, which was selected to provide protection to the public welfare against acute injury to vegetation from direct exposure and resulting phytoxicity, was originally set to identical to the primary standard (to protect public health) in 1971.While the U.S. Environmental Protection Agency (EPA) has recently strengthened the primary standard by establishing a new 1-hour standard at a level of 100 ppb based on the 3-year average of the 98th percentile of the yearly distribution of 1-hour daily maximum concentrations, the secondary standard remains as it was set in 1971.The current secondary standard for oxides of sulfur (SO X ) was also set in 1971 and uses SO 2 as the atmospheric indicator.The standard of 0.5 ppm, based on a 3-hour average SO 2 concentration, is not to be exceeded more than once per year.
EPA has recently conducted a review of the secondary NAAQS for NO X and SO X and argued that there is now sufficient information available to support the concept that the largest impact to public welfare from SO X and NO X are the ecological effects associated with the deposition of nitrogen and sulfur compounds to terrestrial and aquatic environments [1][2][3][4].Based on the findings from this review, EPA has concluded that the current secondary NO X and SO X standards are not adequate to provide sufficient protection against adverse ecological effects associated with deposition of oxides of nitrogen and sulfur to sensitive ecosystems.Three issues were identified as main concerns of the current structure of the standards:  Ecological effects related to deposition are usually associated with depositional loads occurring over periods of months to years, which differ significantly from the short-term concentrations used by the current standards;  The current standards, which use NO 2 and SO 2 as indicators, address only a fraction of total atmospheric oxides of nitrogen and sulfur because they do not capture all relevant chemical species of nitrogen and sulfur that contribute to deposition; and  Under the current standards, NO 2 and SO 2 are assessed individually even though they play a joint role in acidifying aquatic ecosystems.
Addressing these issues, EPA proposed the following as the elements of the new secondary NAAQS:  Indicator: The concentration of all the major oxides of sulfur (SO X = SO 2 + particulate SO 4 ) and nitrogen (NO Y = NO + NO 2 + HNO 3 + PAN + 2 N 2 O 5 + HONO + NO 3 + organic nitrates + particulate NO 3 );  Form: The Aquatic Acidification Index (AAI) designed to be an ecologically relevant form of the standard that determines the allowable levels of ambient NO Y and SO X ;  Averaging Time: An annual averaging time based on the average of each year over a consecutive 3 to 5 year period; and  Level: A target chronic Acid Neutralizing Capacity (ANC) value in range of 20 to 70 µeq/L.The Clean Air Scientific Advisory Committee (CASAC) reviewed EPA's assessment and agreed that aquatic acidification should be the focus for developing a secondary NO X and SO X NAAQS because of the quantity and quality of the ecological effects data [5,6].
The complexities in the responses of ecosystems to deposition of atmospheric NO Y and SO X require a more sophisticated form of the standard than the current structure.The conceptual design of the new secondary NAAQS proposed by EPA consists of three main components: (1) linkage between ecological indicators and ecological effects; (2) linkage between an ecological indicator and atmospheric deposition; and (3) linkage between deposition and ambient air indicators.With respect to linking ecological indicator to adverse effects of aquatic acidification, EPA proposed to use ANC which is the most widely used indicator of acid sensitivity and has been found in various studies to be the best single indicator of the biological response and health of aquatic communities in acid sensitive systems.To link atmospheric deposition to the ecological indicator, ANC, EPA proposed to use ecosystem acidification models that quantify the relationship between deposition of nitrogen and sulfur and the resulting ANC in surface waters based on an ecosystem's inherent generation of ANC and ability to neutralize nitrogen deposition through biological and physical processes.Finally, the linkage between deposition and ambient concentrations of NO Y and SO X introduces a new quantity termed as Transference Ratio (T) which is defined as the ratio of total wet and dry deposition to concentration: where Dep(SO X or NO Y ) is the annual total wet and dry deposition of SO X or NO Y , and [SO X or NO Y ] is the annual average concentration of ambient SO X or NO Y .Due to lack of appropriate measurement data, EPA proposed to use EPA's Community Multiscale Air Quality (CMAQ) modeling system [7] to calculate the Transference Ratios.EPA defines the AAI in terms of four ecological and atmospheric factors and ambient air indicators (NO Y and SO X ) as follows: where F1 represents the ecosystem's natural neutralizing capability; F2 represents acidifying depositions associated with reduced nitrogen (NH X ) based on CMAQ model simulations; F3 and F4 are the Transference Ratios to convert ambient NO Y and SO X concentrations to nitrogen and sulfur deposition, respectively, which is also based on CMAQ modeling.The value of F1 would be based on a representative runoff rate as well as a representative critical load (the amount of acidifying atmospheric deposition of nitrogen and sulfur beyond which a target ANC is not reached) for the ecosystem associated with a single national target ANC level.The values of F1-F4 will vary spatially and thus an appropriate regionalization scheme is necessary for spatial aggregation (or averaging).More details on each of the above F factors are given elsewhere [4].
As described above, determination of the AAI value (and establishment of the new secondary NAAQS) relies heavily on use of chemical transport models.Therefore, it is important to investigate how the currently used chemical transport models such as CMAQ and the Comprehensive Air-quality Model with Extensions (CAMx) [8] perform in predicting depositions and ambient concentrations of relevant chemical species and quantify the variability in their estimates.The CMAQ modeling system has been periodically given updates, and for each new release EPA performed a comprehensive operational evaluation mostly focusing on eastern U.S. [9][10][11][12][13].CAMx model performance has also been investigated in various U.S. regions [14][15][16][17].While providing detailed analysis on the model performance, conventional model evaluation studies typically focus on single model application.
The Transference Ratios, TSO X and TNO Y , are calculated from model simulations and are key parameters in determining AAI.Another key parameter in the AAI formulation is total reduced nitrogen loading (LNH X , i.e., the term F2 in the AAI equation) that is also based on model simulations.EPA suggested that these parameters are sufficiently stable across time and space based on the CMAQ modeling results over the Adirondacks and Shenandoah case study areas [4].We examined variability in these parameters over various regions in contiguous U.S. using several annual model simulation datasets.

Modeling Databases
Several existing chemical transport model simulation outputs were selected based on two conditions: (1) since the new secondary NAAQS proposed by EPA is based on annualized depositions and ambient concentrations, the modeling period should cover at least a year; (2) both the concentration and deposition outputs should be available.Since many of the historical model runs were performed to address concentration issues (e.g., ozone, PM 2.5 and regional haze SIPs), the deposition outputs were often not archived.Table 1 summarizes the modeling datasets included in this study.The compiled datasets show variations in the model, emission/meteorological modeling year and grid resolution used.Different models may employ different algorithms for the same atmospheric process.For example, CAMx adopted a dry deposition model based on the approaches of Wesely [18] and Slinn and Slinn [19] while CMAQ uses the M3 dry deposition model by Pleim et al. [20,21].Details on the science algorithms used in the modes can be found in the references given below Table 1 and references therein.Figure 1 shows modeling domains used in these datasets.Each of the VISTAS and UBAQS modeling includes nested grid simulation with 12-km resolution (while covering a different region) in addition to the 36-km continental U.S. (CONUS) modeling domain.The FCAQTF modeling domain consists of 12-km and 4-km nested grids covering southwestern U.S. and the Four Corners region, respectively, as well as the common 36-km CONUS domain.

Model Performance Evaluation
As mentioned earlier, the proposed new standard relies extensively on modeling results rather than monitoring data, therefore, a rigorous evaluation of model performance is crucial.While EPA has conducted performance evaluation for their CMAQ modeling over the continental U.S. modeling domain (the 2002 annual simulation with CMAQ v4.6 and the 2002 through 2005 simulations with CMAQ v4.7) as part of their review of the secondary NAAQS for NO X and SO X [3,4], they focused on statistics based on annualized quantities and/or averaged over large regions.For this study, we performed more comprehensive model evaluation using several annual CMAQ and CAMx simulations for the 36-km CONUS modeling domain, and also examined how different statistics affect the evaluation results.
Several ambient and deposition monitoring networks were used for the model performance evaluation.Ambient concentrations and dry deposition of gaseous and particulate sulfur and nitrogen compounds (SO 2 , HNO 3 , NH 3 , particulate sulfate and nitrate) were evaluated against measurements from the Clean Air Status and Trends Network (CASTNET).Note that CASTNET reports dry deposition fluxes estimated by an inferential method which calculates the fluxes as the product of a locally-measured atmospheric concentration and a modeled deposition velocity [27].The deposition velocity is calculated using the Multi-Layer Model (MLM) where the vegetated canopy is discretized into multiple layers and stomatal and boundary layer resistances are calculated for each of these layers [28].Wet deposition of total (gaseous and particulate) sulfate, nitrate and ammonium were evaluated against measurements from the National Trends Network (NTN) of the National Atmospheric Deposition Program (NADP).Standard sampling interval for both CASTNET and NADP is one week (weekly average concentrations and weekly total deposition).Modeled NO Y concentrations were evaluated against hourly NO Y measurements from the Air Quality System (AQS) database.Additionally, hourly NO Y measurements at eight monitoring stations in the southeastern states (Alabama, Florida, Georgia and Mississippi) from the Southeastern Aerosol Research and Characterization (SEARCH) study were also included in the evaluation.
Table 2 summarizes performance of the selected model simulations using both annualized quantities (annual average for concentrations and annual total for deposition) and those with the monitoring network's native sampling intervals (one week for the CASTNET and NADP monitors; one hour for the AQS and SEARCH monitors).Additional model performance metrics are given in the Supplementary Material (see Tables S1-S3).In general, model performance for ambient concentrations is relatively better than that for depositions with dry deposition performance being especially poor.As noted earlier, CASTNET's dry deposition data is not "true" measurements, but estimates based on an "inferential model" involving measured air concentrations coupled with species-and location-dependent deposition velocities that reflect local land use and meteorological conditions at each monitoring site.Therefore, the performance evaluation results for dry depositions should be taken with caution.However, the large discrepancies (120-230% for gas species and 70-800% for PM species in MNE with native sampling frequency) between CMAQ-/CAMx-predicted dry depositions and CASTNET estimates may indicate significant uncertainties in dry deposition modeling.Spatial variability of deposition surface is one of the main factors that contribute to these discrepancies.Deposition velocities estimated at a monitoring site may not be adequately represented with relatively large size of model's computational grid cell which would contain complex mix of various land use types.Since the deposition flux is expressed as the product of deposition velocity and ambient concentration, the uncertainty in concentration measurements will be propagated directly into the calculated fluxes.Previous field studies indicated 5-30% of biases between different measurement methods for several gas and PM species [27].The models also exhibit particularly poor performance for wet depositions of total ammonium (NH 3 and particulate NH 4 ).As EPA has noted in their assessment [4], modeling ammonia deposition is a difficult task due to complexity of ammonia deposition processes (e.g., bi-directional flux of ammonia) and relatively large uncertainty in the ammonia emission inventory.EPA has attempted to improve CMAQ model performance for depositions, for example, by adjusting the CMAQ model output based on precipitation data generated by the Parameter-elevation Regressions on Independent Slopes Model (PRISM) or by incorporating a bi-directional ammonia flux algorithm in CMAQ [4].However, these corrections should be evaluated in a more comprehensive and thorough way before being utilized in a regulatory framework.In most cases, the biases and errors with weekly or hourly quantities are poorer than those with annualized quantities (see Table 2) as using annualized quantities is subject to compensation of errors due to temporal averaging/aggregation.The effect of temporal averaging is obvious in Figure 2 where the hourly average scatter plot clearly shows model's failure to reproduce high NO Y concentrations observed at urban and suburban sites (scatter plots for individual sites are shown in Figure S1) while the annual average scatter plot suggests more acceptable performance.Spatial aggregation can also affect results of model performance evaluation.Averaging across large regions (e.g., over continental U.S.) may hide spatial variations in model performance because areas of overestimation and underestimation can compensate each other.Spatial variation of the model performance is clearly demonstrated in Figure 3 which shows both overestimation and underestimation of SO 4 wet deposition by the model depending on location of the monitoring site.Factors contributing to the spatial variation in model performance include variations in geographical topography and land use/land cover.Temporal averaging in the performance statistics also affects this spatial variation.Model evaluation for weekly total wet deposition of SO 4 (left panels of Figure 3) shows that the model overestimates in most areas while exhibiting underestimations in certain regions in western and central U.S. while using annualized quantities (right panels of Figure 3) lessens level of the model overestimations in eastern U.S. and even changes sign of the biases in certain areas in central U.S. (from overestimation to underestimation).Note that since conservative estimation is more desirable in regulatory applications, model overestimation is considered more acceptable than underestimation.

Variability in Transference Ratios and Reduced Nitrogen Loading
To examine variability in TSO X , TNO Y and LNH X , we calculated these parameters using compiled set of annual chemical transport model simulations (listed in Table 1) over various regions in U.S. We first focused our analysis on the two case study areas (Adirondacks and Shenandoah; see Figure 4) that have been extensively studied by EPA for their review of the secondary SO X and NO X NAAQS.Variability in TSO X , TNO Y and LNH X is displayed by utilizing a "box and whisker" plot in which gray boxes represent 50% of the distribution around the median and any data points beyond whiskers are regarded as outliers.While EPA presented the inverse of TSO X and TNO Y in their variability analysis, we directly plotted these quantities as they were defined.We focused our analysis on the data range within the top and bottom whisker ends excluding outliers.Figure 5 presents variability in the TSO X , TNO Y , and LNH X values across the grid cells within each case study area calculated by all the model simulations considered in this study.Both TSO X and TNO Y vary considerably.At Adirondacks, the TSO X values range from 0.56 to 2.3 cm/s (varying by a factor of ~4) and TNO Y from 0.24 to 1.5 cm/s (varying by a factor of ~6).TSO X and TNO Y at Shenandoah show similar variability: The TSO X values spans from 0.35 to 1.3 cm/s (by a factor of ~4) and TNO Y from 0.22 to 0.96 (by a factor of ~4).LNH X changes its value by factors of 4 and 5 at Adirondacks and Shenandoah, respectively.

Conclusions
EPA's recent proposal for new secondary NAAQS of SO X and NO Y rely heavily on chemical transport modeling simulations.Therefore, it is crucial to adequately evaluate model's ability to accurately predict ambient concentrations and deposition of the relevant species.The rigor required when using models to establish a standard (and to determine non-attainment of those standards) should be much higher than when the models are used in a relative sense for implementation of those standards, such as development of control strategies.To supplement EPA's model performance evaluation of the CMAQ model, we conducted a rigorous and comprehensive model performance evaluation using several annual model simulations by multiple chemical transport models.The results show that there are significant uncertainty and variability in model performance of estimating concentrations and depositions of SO X and NO Y species.We also note that a complete model performance evaluation is problematic at the moment due to two issues.Firstly, routine measurements of total oxidized and reduced nitrogen is limited.The National Core (NCore) multi-pollutant monitoring program [29] routinely measures NO Y and SO X beginning in 2011, but the coverage is still not sufficient (83 sites at mainly urban locations).Secondly, significant uncertainties exist in measurement of dry deposition.As noted earlier, CASTNET dry deposition data are model-based estimates, not direct measurements.Uncertainties associated with the inferential measurements of dry deposition have been discussed elsewhere [30][31][32].One of the main uncertainties is introduced by scaling up estimates of deposition velocities from a small area with relatively uniform surface characteristics to a large area consisting of various land use categories.Even with uniform surface condition, conventional inferential approaches cannot be reliably applicable to very stable atmospheric conditions that can occur near surface during nighttime.Evaluation studies where modeled deposition velocities were compared with direct measurements in relatively small areas showed that differences as large as ±30% are common [33].
The conceptual design of the new standard proposed by EPA includes many model-based parameters such as the Transference Ratios (TSO X and TNO Y ) and total reduced nitrogen deposition (LNH X ) to link ambient air indicators (SO X and NO Y ) to an ecological indicator (ANC).EPA argued that these parameters are sufficiently stable to be used for establishing the new standard [4].However, we found that, unlike EPA's analysis, there is a substantial spatial variability in these parameters across the U.S. as well as within a given ecosystem watershed area.It is also shown that the choice of the simulation year, the chemical transport model, and the model configuration (e.g., the grid resolution) can make a large difference in the calculated TSO X , TNO Y and LNH X parameters.Such variability can potentially introduce significant uncertainty in the definition of the new secondary SO X and NO Y standards proposed by EPA.Given identified uncertainties and biases in the modeling results, these parameters should be based on observations, not model estimates.If the model can be shown to reproduce the observed values of these parameters then, and only then, can the model-based parameters be used for the regulatory purpose.

Figure 1 .
Figure 1.Modeling domains for (a) the Visibility Improvements State and Tribal Association of the Southeast (VISTAS) regional haze State Implementation Plan (SIP) modeling study; (b) the Uinta Basin Air Quality Study (UBAQS) modeling study; (c) the Electric Power Research Institute (EPRI) modeling study; and (d) the Four Corners Air Quality Task Force (FCAQTF) modeling study.

Figure 2 .Figure 3 .
Figure 2. Scatter plots of (a) Annual average NO Y concentrations and (b) Hourly average NO Y concentrations: VISTAS 2002 annual Community Multiscale Air Quality (CMAQ) simulation vs. Southeastern Aerosol Research and Characterization (SEARCH) monitoring data.

Figure 4 .
Figure 4. Regions used for analysis of variability in TSO X , TNO Y and LNH X .The Adirondack region includes 18 grid cells in a 36-km modeling grid and 164 cells in a 12-km grid.The Shenandoah region has 23 cells in a 36-km grid and 207 cells in a 12-km grid.

Figure 5 .
Figure 5. Box and whisker plots for variability of (a) TSO X ; (b) TNO Y ; and (c) LNH X across all the grid cells at each case study area and all the model simulations.The number in parentheses represents the number of each sample data.The inner quartiles of each data sample are represented by a stack of two gray boxes, separated at the median by a thin line.The height of the gray boxes together makes up the interquartile range (IQR).The range of data falling within 1.5 IQRs of the median is represented by whiskers.Any outliers that fall between 1.5 and 3 IQRs from the median are represented by asterisk markers, while any outliers falling beyond 3 IQRs from the median are represented by circular markers.The average of each sample is represented by a diamond marker.

Figure 6 .
Figure 6.Box and whisker plots for variability in TSO X (cm/s) at (a) Adirondacks and (b) Shenandoah across all the model simulations.The number in parentheses represents the number of each sample data.How to read a box and whisker plot is explained in the Figure 5 caption.

Figure 7 .
Figure 7. Box and whisker plots for variability in TNO Y (cm/s) at (a) Adirondacks and (b) Shenandoah across all the model simulations.The number in parentheses represents the number of each sample data.How to read a box and whisker plot is explained in the Figure 5 caption.

Figure 8 .
Figure 8. Box and whisker plots for variability in LNH X (kg-N/ha/year) at (a) Adirondacks and (b) Shenandoah across all the model simulations.The number in parentheses represents the number of each sample data.How to read a box and whisker plot is explained in the Figure 5 caption.

Figure 9 .Figure 9 .
Figure 9. Box and whisker plots for variability in (a) TSO X ; (b) TNO Y and (c) LNH X across all the model simulations at each IMPROVE site.The number in parentheses represents the number of each sample data.How to read a box and whisker plot is explained in the Figure 5 caption.

Table 1 .
Annual chemical transport model datasets compiled and used for this study.
[26]ibility Improvements State and Tribal Association of the Southeast (VISTAS) regional haze SIP modeling study[22];2Uinta Basin Air Quality Study (UBAQS)[23];3Electric Power Research Institute (EPRI) project for development and application of the Advanced Modeling System for Transport, Emissions, Reactions and Deposition of Atmospheric Matter (AMSTERDAM)[24];4CAMx modeling over the 36-km continental U.S. domain for various applications including Denver SIP modeling study[25];5Four Corners Air Quality Task Force (FCAQTF) modeling study for the Four Corners region[26];6Continental U.S. 36-km modeling domain (see Figure1

Table 2 .
Model performance statistics 1 for various sulfur or nitrogen compounds using annualized quantities and those with native sampling frequency 2 .