This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Chemical data assimilation is the process by which models use measurements to produce an optimal representation of the chemical composition of the atmosphere. Leveraging advances in algorithms and increases in the available computational power, the integration of numerical predictions and observations has started to play an important role in air quality modeling. This paper gives an overview of several methodologies used in chemical data assimilation. We discuss the Bayesian framework for developing data assimilation systems, the suboptimal and the ensemble Kalman filter approaches, the optimal interpolation (OI), and the three and four dimensional variational methods. Examples of assimilating real observations with the CMAQ model are presented.

Chemical data assimilation produces improved estimates of the chemical state of the atmosphere by combining information from three different sources: the physical and chemical laws of evolution (encapsulated in the model), the reality (as captured by the observations), and the current best estimate of the distribution of pollutants in the atmosphere (encapsulated in the prior) – all with associated errors [

The chemical interactions take place on a wide range of temporal scales (from milliseconds to days). This makes the system numerically stiff. The concentrations of short-lived radical species follow those of long-lived species through quasi-steady-state relations. After a short time the chemical evolution collapses onto a low dimensional manifold in state space. As a consequence, when meteorological fields are computed off line, ensembles of simulations will tend to converge to the same trajectory. Moreover, a direct adjustment of radical species through data assimilation is not feasible.

In regional air quality simulations, the influence of the initial conditions fades in time, and the concentration fields become largely driven by emission and removal processes (and by lateral boundary conditions in regional simulations). Therefore, to improve the analysis capabilities of CTMs, it is necessary to consider the estimation of emission parameters and lateral boundaries through data assimilation [

Chemical observations are still sparse, as the network is not as extensive as that used in numerical weather prediction. Local observations of chemical and particulate concentrations are strongly influenced by the local variability, yet they are used to constrain large scale three dimensional fields. Recently there has been considerable growth in the available remote sensing (satellite) data on tracer concentrations. This data is characterized by non-negligible biases; a method to alleviate this issue is proposed in [

An additional difficulty arises from the multiphysics nature of the simulation, where the evolution is driven by multiple competing physical processes. A successful data assimilation system needs to correctly account for error correlations between chemical species (due to chemical interactions) and between chemical and dynamic variables (due to transport processes).

This paper gives an overview of the state of the art in chemical data assimilation. We review chemical transport models in Section 1.1 and chemical observations in Section 1.2. Section 2 is devoted to the formulation of the chemical data assimilation problem in a Bayesian framework. Practical assimilation methods discussed include optimal interpolation (OI) (Section 3.3), suboptimal Kalman filters (Section 3.2), ensemble Kalman filters (Section 3.4), three dimensional variational (3D-Var, Section 3.5) and four dimensional variational data assimilation (4D-Var, Section 3.6). Challenges to chemical data assimilation such as data inputs, the construction of adjoints, and the construction of error covariance matrices are highlighted in Section 4. Assimilation results with real data and the CMAQ model are presented in Section 5. Section 6 draws conclusions and points to future directions in chemical data assimilation.

An atmospheric chemical transport model (CTM) solves the mass balance equations for the concentrations x_{(i)} of the tracer species 1 ≤ i ≤ s, with s the number of species:

Here the concentrations x_{(i)} are expressed as mole fractions (e.g., the number of molecules of tracer per 1 billion molecules of air); multiplying by the air density gives the absolute concentration of tracer i (molecules/cm^{3}). The rate of transformations of species i depends on all other concentrations at the same spatial location. Such local transformations are determined by gas and liquid phase chemical kinetics, by inter-phase mass transfer, by aerosol dynamic processes (coagulation and growth), and by thermodynamic processes. The elevated and ground level emission rates of species i enter as source terms, and its deposition velocity enters through the boundary conditions.
The system is initialized with the concentration fields x^{initial}, and is subject to Neumann boundary conditions at the ground Γ^{ground}, to Dirichlet boundary conditions at the inflow boundary Γ^{inflow} (along the top and, for regional models, along the lateral boundary as well), and to a no-diffusive-flow condition at the outflow boundary Γ^{outflow}.

The numerical solution of the mass balance equations advances the concentration fields in time via the discrete model solution operator:

In (2), the solution x_{i} at time t_{i} depends on the initial state x_{0}, which is obtained by sampling x^{initial} at the grid points. The model solution operator ℳ depends on model parameters such as emission rates, deposition velocities, and boundary fluxes. In principle, all the model parameters, as well as the initial conditions x_{0}, can be retrieved through data assimilation if there are enough observations. In practice, however, the number of model parameters to be determined has to be limited, since the available observations are too few to accurately constrain the full problem.

While there are numerous CTMs available for both regional and global applications, the Community Multiscale Air Quality (CMAQ) model is primarily used to provide examples in Section 5. As an open-source community model, CMAQ is widely used by the air quality community worldwide and is continuously updated with support from the U.S. Environmental Protection Agency (EPA) and the Community Modeling & Analysis System (CMAS) [

Measurements of atmospheric chemical fields have increased significantly in recent years throughout the world. Many ground-based networks have been established to routinely monitor air quality at the surface level. For instance, in the U.S. the AIRNow network has been reporting ozone and fine particle observations (PM2.5,

In addition to the in-situ measurement networks, multiple satellite instruments capable of measuring tropospheric and stratospheric chemical fields have been operating to provide real-time measurements [

In recent years, many field experiments have been carried out with intensive measurement activities. For instance, the International Consortium for Atmospheric Research on Transport and Transformation (ICARTT) field campaign took place in the northeastern United States and the Maritime Provinces of Canada during summer 2004 [

We now summarize some of the previous work in the field of chemical data assimilation. The field has accumulated a large body of work from contributions by many authors. Among those, many excellent papers were products of the Global and Regional Earth-System Monitoring Using Satellite and In situ Data (GEMS [

Two approaches to data assimilation have become widely used in applications: variational methods, rooted in control theory, and Kalman filter methods, rooted in statistical estimation theory. The base concepts of the variational approach to chemical data assimilation are discussed in [

EnKF, extended Kalman filter [

The true state of the system (the true distribution of tracer concentrations in the atmosphere) is a continuous vector field distributed across three space dimensions and one time dimension. The number of components of the vector at a given location and a given moment equals the number of chemical species present in the atmosphere. The true state is unknown and needs to be estimated from the available information.

In practice we work with a finite dimensional representation x^{t} = 𝘚(·) ∈ ℝ^{n} of the continuous field, and seek to estimate x^{t} from the available information. The operator 𝘚 maps the physical space to the model space (for example, it can sample the continuous field at the grid points, or it can lump several chemical species into a single representative family and then average the family concentration over each grid cell).

In order to obtain an estimate of x^{t} data assimilation combines three different sources of information: the prior information, the model, and the observations. The best estimate that optimally fuses all these sources of information is called the analysis, and is denoted by x^{a} ∈ ℝ^{n}

The background (prior) probability density 𝒫^{b}(x) encapsulates our current knowledge of the tracer distribution. Specifically, 𝒫^{b}(x) describes the uncertainty with which one knows x^{t} at the present, before any (new) measurements are taken. The mean taken with respect to this probability density is denoted by E^{b}[x].

The current best estimate of the true state is called the a priori, or the background state, x^{b} ∈ ℝ^{n}. (Typically the background is taken to be the mean of the prior distribution, x^{b} = E^{b}[x].) A typical assumption is that the random background errors ε^{b} = x^{b} − x^{t} are unbiased and have a normal probability density,

Here B = E^{b}[ε^{b} (ε^{b})^{T}] ∈ ℝ^{n×n} is the background error covariance matrix.

The model ℳ advances the initial state x_{0} ∈ ℝ^{n} at time t_{0} to future state values x_{i} ∈ ℝ^{n} at times t_{i}:

The size of the state space in realistic chemical transport models is very large, typically 𝒪(10^{7}) variables for regional models and 𝒪(10^{8}) for global models. The model is a first order Markov process, meaning that the probability distribution of the state at time t_{i} depends only on the state at the previous time t_{i−1}:

𝒫(x_{i} | x_{0},…, x_{i−1}) = 𝒫(x_{i} | x_{i−1}).

Observations represent snapshots of reality available at several discrete time moments. Specifically, measurements y_{i} ∈ ℝ^{m} of the true state are taken at times t_{i}:

The observation operator ℋ^{t} maps the physical state space onto the observation space. The measurement (instrument) errors are denoted by

In order to relate the model state to observations we also consider the relation

where the observation operator ℋ maps the model state space onto the observation space. In many practical situations ℋ is a highly nonlinear mapping (as is the case, e.g., with satellite observation operators). At present the chemical observations are sparsely distributed, and their number is small compared to the dimension of the state space, m ≪ n.

The observation error term accounts for both the measurement (instrument) errors and the representativeness errors, i.e., the mismatch between the discretized model fields and the continuous reality x^{t}.

Typically observation errors are assumed to be unbiased and normally distributed

Observation errors at different times are assumed to be independent of each other.

Based on these three sources of information data assimilation computes the analysis (posterior) probability density 𝒫^{a}(x). Specifically, 𝒫^{a}(x) describes the uncertainty with which one knows x^{t} after all the information available from measurements has been accounted for. The mean taken with respect to this probability density is denoted by E^{a}[x].

The best estimate x^{a} of the true state obtained from the analysis distribution is called the a posteriori, or the analysis, state. (Typically x^{a} = E^{a}[x], but this is not necessary; in the maximum likelihood approach the refined estimate of the true state is obtained from the mode of the analysis distribution.) The analysis estimation errors ε^{a} = x^{a} − x^{t} are characterized by the analysis error mean β^{a} = E^{a}[ε^{a}] and by the analysis error covariance A = E^{a}[(ε^{a} − β^{a}) (ε^{a} − β^{a})^{T}] ∈ ℝ^{n×n}.

The chemical data assimilation problem is formulated in a Bayesian framework. The analysis probability density is the probability density of the state conditioned on all the available observations y_{1},…, y_{N}:

The denominator is a normalization factor that does not depend on the state, and can be ignored in the estimation.

Since the observation errors at different times are independent, the joint probability density of the observations y_{1},…, y_{N} factorizes over the individual observation times.

Bayes' theorem requires the manipulation of probability densities in very high dimensional state spaces (n ∼ 10^{7}). Approximations are needed in order to represent such densities. One approach is to approximate all probabilities involved by normal distributions, in which case closed form solutions for the posterior density are possible, see Section 2.3. Practical algorithms based on normal approximations are the suboptimal Kalman filters, discussed in Section 3.2. Another possible approximation is the Monte Carlo approach, where all the probability densities involved are represented by samples in the state space. In this case the application of Bayes' theorem
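As a minimal illustration of the normal-approximation route through Bayes' theorem, consider a scalar state with Gaussian prior and a direct Gaussian observation; the posterior is again Gaussian with a precision-weighted mean. This sketch is ours (the values are arbitrary), not part of any operational system:

```python
# Scalar Gaussian Bayes update: prior N(xb, b), observation y = x + eps with
# eps ~ N(0, r). The posterior is N(xa, a) with a precision-weighted mean.
def gaussian_update(xb, b, y, r):
    k = b / (b + r)              # scalar analogue of the Kalman gain
    xa = xb + k * (y - xb)       # posterior mean
    a = (1.0 - k) * b            # posterior variance, always smaller than b
    return xa, a

# Illustrative values: a 40 ppbv background with large uncertainty, and a
# more accurate observation of 52 ppbv
xa, a = gaussian_update(xb=40.0, b=16.0, y=52.0, r=4.0)
```

The posterior mean lies between the background and the observation, closer to the more accurate (smaller variance) source, and the posterior variance is smaller than either input variance.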

In practice we want to use the analysis distribution to obtain estimates x^{a} of the true state x^{t} that are optimal in a certain sense. One way to define a best estimator is to minimize the expected value of the mean square error, min E^{a}[‖x^{a} − x^{t}‖^{2}]. The resulting minimum mean square error (MMSE) estimator is given by the mean of the posterior distribution, x^{a} = E^{a}[x]. This estimator is not practical for large scale systems, as it requires an integration in the high dimensional state space. Practical estimators are obtained by taking the mean of an approximation of the posterior distribution, see for example Section 3.4. A computationally feasible estimator is given by the mode of the posterior distribution, and is called the maximum a posteriori (MAP) estimator, as discussed in Section 2.4. Of particular interest are unbiased estimators, which are characterized by a zero posterior error mean (β^{a} = 0). A minimum variance unbiased (MVUE) estimator x^{a} has the smallest total variance, min trace E^{a}[(x^{a} − E^{a}[x^{a}]) (x^{a} − E^{a}[x^{a}])^{T}].

Consider a time invariant ideal case where the observation operator is linear, ℋ(x) = H·x.

After inserting the normal probability densities into Bayes' theorem one finds that the posterior is also normal, 𝒫^{a}(x) = 𝒩(x^{a}, A),

with the analysis mean x^{a} and covariance A given by

x^{a} = x^{b} + K (y − H x^{b}),  A = (I − K H) B,

where K = B H^{T} (H B H^{T} + R)^{−1} ∈ ℝ^{n×m} is the Kalman gain matrix.
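The analysis mean, covariance, and gain for the linear-Gaussian case can be sketched directly in NumPy. The dimensions and values below are illustrative assumptions of ours, chosen small enough to check by hand:

```python
import numpy as np

# Linear-Gaussian analysis: xa = xb + K (y - H xb), A = (I - K H) B,
# with Kalman gain K = B H^T (H B H^T + R)^{-1}.
def analysis(xb, B, H, y, R):
    S = H @ B @ H.T + R                 # innovation covariance
    K = B @ H.T @ np.linalg.inv(S)      # Kalman gain
    xa = xb + K @ (y - H @ xb)          # analysis mean
    A = (np.eye(len(xb)) - K @ H) @ B   # analysis covariance
    return xa, A, K

xb = np.array([1.0, 2.0, 3.0])          # background state (3 variables)
B = np.diag([1.0, 0.5, 2.0])            # background error covariance
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])         # two point observations
R = 0.25 * np.eye(2)                    # observation error covariance
xa, A, K = analysis(xb, B, H, np.array([1.5, 2.0]), R)
```

With a diagonal B, the unobserved second component is left at its background value, and the total analysis variance (trace of A) is smaller than that of B.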

In the maximum likelihood approach one looks for the argument that maximizes the posterior distribution, or equivalently, minimizes its negative logarithm:

The scaling factors of the probability densities, as well as the term −ln 𝒫(y), do not depend on the state and can be discarded from the minimization.

Similarly, under the assumption that observation errors are independent

The maximum likelihood estimator is obtained as the minimizer of the cost function

where the constant terms have been left out.

Note that if, in addition, the observation operator is linear, the cost function is quadratic and its unique minimizer can be computed explicitly.

The result is the Kalman filter estimate for the mean

Typical data assimilation applications are concerned with time dependent systems, e.g., the evolution of the chemical composition of the atmosphere. In such applications the interest is not focused on one analysis at one time, but on a series of analyses for the times t_{1},…, t_{N}.

There are two approaches to obtain the analysis probability densities 𝒫^{a}(x_{i}) at the times t_{1},…, t_{N}. In the smoothing approach the analysis at each time t_{i} is conditioned on all the available observations y_{1},…, y_{N}, past and future.

In the filtering approach the analysis at time t_{i} is conditioned only on the observations available up to that time, y_{1},…, y_{i}.

We now discuss the Kalman filter approach in the ideal case where the observation operator is linear, ℋ(x) = H·x, and the model solution operator is linear, ℳ_{t_{i−1}→t_{i}}(x) = M_{t_{i−1}→t_{i}} · x.

The background state at time t_{i} is obtained by propagating the previous analysis through the model from t_{i−1} to t_{i}:

Note that a model forecast starting from the true state at t_{i−1} does not reproduce the true state at t_{i}, due to model errors:

where η_{i} denotes the model error at time t_{i}.

The background error at t_{i} therefore consists of the analysis error at t_{i−1}, transported through the model equations, plus the model error:

The model error η_{i} accounts for the imperfections of the model in propagating the state from t_{i−1} to t_{i}.

For every observation time t_{i} the filter computes the analysis state by combining the background state with the observations y_{i},

with the Kalman gain matrix given by

where B_{i} and R_{i} are the background and observation error covariance matrices at time t_{i}.
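One complete forecast-analysis cycle of the linear Kalman filter can be sketched as follows. The toy dynamics, covariances, and observation below are illustrative assumptions made for the demonstration:

```python
import numpy as np

# One Kalman filter cycle for a linear model: the forecast step transports the
# analysis mean and covariance and adds the model error covariance Q; the
# analysis step then combines the background with the observations.
def kf_cycle(xa_prev, A_prev, M, Q, H, y, R):
    # forecast step
    xb = M @ xa_prev
    B = M @ A_prev @ M.T + Q
    # analysis step
    S = H @ B @ H.T + R                    # innovation covariance
    K = B @ H.T @ np.linalg.inv(S)         # Kalman gain
    xa = xb + K @ (y - H @ xb)
    A = (np.eye(len(xb)) - K @ H) @ B
    return xa, A

M = np.array([[1.0, 0.1], [0.0, 1.0]])     # toy linear dynamics
Q = 0.01 * np.eye(2)                       # model error covariance
H = np.array([[1.0, 0.0]])                 # only the first component observed
R = np.array([[0.04]])
xa, A = kf_cycle(np.array([0.0, 1.0]), 0.5 * np.eye(2), M, Q, H,
                 np.array([0.2]), R)
```

Because the observation is much more accurate than the forecast, the analysis of the observed component moves close to the observed value, and its analysis variance drops well below the forecast variance.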

Practical data assimilation algorithms use the estimation approaches presented in Section 2.2, together with various approximations most often related to Gaussian assumptions and to the structure of the underlying physical model.

The extended Kalman filter (EKF) generalizes the original linear Kalman filter to nonlinear models and nonlinear observation operators.

The EKF approach modifies

The extended Kalman filter is not practical for large systems because of the 𝒪(n^{2}) memory size needed to store full covariance matrices, and the prohibitive computational costs associated with inverting large matrices in the computation of the Kalman gain.

A low memory approximation of a covariance matrix can be built from the error variances of the individual components x_{(ℓ)} for ℓ = 1,…,n, together with a correlation model: the correlation between the errors in x_{(ℓ)} and x_{(k)} can be modeled as decreasing with the distance between the grid points of ℓ and k.

Polynomial models of spatial correlations [
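A simple distance-decay correlation model of this kind can be sketched as follows. The exponential form, the length scale, and the standard deviation are illustrative assumptions, not the specific model used in any particular system:

```python
import numpy as np

# Illustrative covariance model on a 1-D grid: corr(l, k) = exp(-|s_l - s_k|/L),
# scaled by per-point standard deviations, giving B = D C D with D = diag(sigma).
def build_covariance(coords, sigma, length_scale):
    d = np.abs(coords[:, None] - coords[None, :])   # pairwise grid distances
    C = np.exp(-d / length_scale)                   # correlation matrix
    return np.outer(sigma, sigma) * C               # covariance matrix

coords = np.linspace(0.0, 100.0, 11)    # grid point locations (km, assumed)
sigma = np.full(11, 14.3)               # background std dev (ppbv, assumed)
B = build_covariance(coords, sigma, length_scale=30.0)
```

The exponential kernel yields a symmetric, full-rank (positive definite) matrix, so terms such as x^{T} B^{−1} x are well defined; only the coordinates, variances, and one length scale need to be stored.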

The simplest approach to avoid the cost of propagating full covariance matrices is to evolve only the error variances, transporting them as passive tracers from t_{i−1} to t_{i}.

The _{i}_{i}

Using a low rank representation of the analysis covariance at t_{i−1}

leads to the following forecast covariance

The terms

and the analysis covariance

It is immediate that the analysis increments

Localization improves the accuracy of the approximation by removing spurious long-distance correlations, and results in a full rank forecast covariance matrix.
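Localization is typically implemented as a Schur (elementwise) product of the low-rank sample covariance with a compactly supported correlation taper. The triangular taper below is an illustrative stand-in for the smoother functions used in practice (e.g., Gaspari-Cohn); the grid and ensemble are toy assumptions:

```python
import numpy as np

# Covariance localization: the elementwise (Schur) product of a rank-deficient
# sample covariance with a compactly supported taper removes spurious
# long-distance correlations and restores positive definiteness.
def localize(P_sample, coords, cutoff):
    d = np.abs(coords[:, None] - coords[None, :])
    rho = np.maximum(0.0, 1.0 - d / cutoff)   # taper: 1 at zero distance, 0 beyond cutoff
    return rho * P_sample                     # Schur product

rng = np.random.default_rng(0)
n, r = 20, 3
X = rng.standard_normal((n, r))               # r ensemble perturbation vectors
P = X @ X.T / (r - 1)                         # rank-deficient sample covariance
P_loc = localize(P, np.arange(n, dtype=float), cutoff=5.0)
```

All entries of P_loc between points farther apart than the cutoff are exactly zero, while the sample covariance P itself has rank at most r.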

Optimal interpolation [ computes each component of the analysis using only a small number of observations that are important for that component x_{(ℓ)}. For example, these can be observations located sufficiently close to the grid point where x_{(ℓ)} is defined. Let the selection operator S_{ℓ} ∈ ℝ^{μ×m} pick out the μ important observations, y_{ℓ} = S_{ℓ} y ∈ ℝ^{μ}, and let H_{ℓ} = S_{ℓ} H ∈ ℝ^{μ×n} be the corresponding rows of the observation operator.

Let e_{ℓ} ∈ ℝ^{n} denote the ℓ-th unit vector. The analysis for the component x_{(ℓ)} is given by

The cost of forming and solving the small local system is only 𝒪(μ^{3}), instead of the 𝒪(m^{3}) cost associated with the complete matrix H B H^{T} + R; moreover, only the background error covariances associated with the important observations are used. The weight w_{ℓ} is obtained by solving the small linear system and transposing the result. The analyses for the different components ℓ can be computed in parallel.
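The per-component local solve can be sketched as follows; the point-observation operator, radius-based selection, and values are illustrative assumptions of ours:

```python
import numpy as np

# Optimal interpolation sketch: each state component is updated using only the
# few observations within a given radius, so only a small linear system is
# solved per component (and components could be processed in parallel).
def oi_analysis(xb, coords, B, obs_vals, obs_idx, r_var, radius):
    xa = xb.copy()
    m, n = len(obs_idx), len(xb)
    H = np.zeros((m, n))
    for j, k in enumerate(obs_idx):
        H[j, k] = 1.0                              # point observations of the state
    innov = obs_vals - H @ xb                      # innovation vector
    for l in range(n):
        near = [j for j, k in enumerate(obs_idx)
                if abs(coords[k] - coords[l]) <= radius]
        if not near:
            continue                               # no nearby data: keep background
        Hl = H[near, :]                            # rows for the nearby observations
        S = Hl @ B @ Hl.T + r_var * np.eye(len(near))
        w = np.linalg.solve(S, Hl @ B[:, l])       # local weight vector
        xa[l] = xb[l] + w @ innov[near]
    return xa

# 5-point grid, identity background covariance, one observation at grid point 2
xb = np.zeros(5)
xa = oi_analysis(xb, np.arange(5.0), np.eye(5),
                 np.array([3.0]), [2], r_var=1.0, radius=0.5)
```

With an identity B and equal observation variance, the single observed point is pulled halfway toward the observation and all other points are untouched.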

When approximations of

The ensemble Kalman filter (EnKF) [ represents the probability densities by ensembles of sample states. The analysis probability density at t_{i−1} is represented by the sample points (ensemble members), each of which is propagated through the model to obtain the forecast ensemble at t_{i},

where the random variables η_{i} are samples from the model error distribution at time t_{i}.

Each member of the forecast ensemble is processed separately using

To obtain the correct posterior statistics, a different set of perturbed observations is used for each ensemble member: the observations y_{i} are perturbed with random noise drawn from the observation error distribution at time t_{i}.
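The perturbed-observations analysis step can be sketched as follows; the ensemble size, state dimension, and values are toy assumptions:

```python
import numpy as np

# EnKF analysis with perturbed observations: the forecast covariance is
# estimated from the ensemble, and each member assimilates a differently
# perturbed copy of the observation vector.
def enkf_analysis(Xf, H, y, R, rng):
    n, N = Xf.shape
    Xp = Xf - Xf.mean(axis=1, keepdims=True)        # ensemble perturbations
    Pf = Xp @ Xp.T / (N - 1)                        # sampled forecast covariance
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)  # ensemble Kalman gain
    Xa = np.empty_like(Xf)
    for j in range(N):
        # perturb the observations with a sample of the observation error
        yj = y + rng.multivariate_normal(np.zeros(len(y)), R)
        Xa[:, j] = Xf[:, j] + K @ (yj - H @ Xf[:, j])
    return Xa

rng = np.random.default_rng(1)
Xf = rng.standard_normal((4, 50)) + 2.0             # 50-member forecast ensemble
H = np.array([[1.0, 0.0, 0.0, 0.0]])                # observe the first component
Xa = enkf_analysis(Xf, H, np.array([0.0]), np.array([[0.25]]), rng)
```

The ensemble mean of the observed component is pulled toward the observation, while the analysis spread supplies a sample estimate of the posterior covariance.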

The ensemble Kalman filter raises several issues. First, the rank of the estimated covariance matrix is at most the number of ensemble members, which is typically much smaller than the dimension of the state space.

In spite of these problems, the ensemble Kalman filter has many attractive features. The effects of non-linear dynamics are captured by the use of the forward model

Numerous improvements of the original EnKF [

The use of EnKF [ with small ensembles requires localization, where the sampled covariance between x_{(ℓ)} and x_{(k)} is tapered according to the distance between the grid points where x_{(ℓ)} and x_{(k)} are defined in the forecast covariance P^{f}. It has been observed in practice that, after a number of assimilation cycles, all ensemble members tend to be close to one another in the state space. In this case the estimated forecast covariance is too small, and a common remedy is to inflate P^{f} by a factor slightly larger than one.

Variational methods solve the data assimilation problem in an optimal control framework [

In 3D-Var data assimilation the observations y_{1},…, y_{N} are assimilated sequentially, one observation time at a time; the background state at t_{i} is the model forecast started from the previous analysis at t_{i−1}.

The discrepancy between the model state x_{i} and the observations y_{i} at time t_{i}, together with the departure from the background, is measured by the following cost function:

While in principle a different background covariance matrix should be used at each time, in practice the same matrix is re-used throughout the assimilation window, B_{i} = B.

Typically a gradient-based numerical optimization procedure is employed to solve

Note that the gradient requires the computation of the adjoint of the linearized observation operator at each observation time t_{i}.

Preconditioning is often used to improve the convergence of the numerical optimization.
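The gradient-based minimization of the 3D-Var cost function can be illustrated on a toy problem using SciPy's L-BFGS-B optimizer; the dimensions and values are illustrative assumptions. For a linear observation operator the minimizer coincides with the Kalman analysis, which provides a convenient check:

```python
import numpy as np
from scipy.optimize import minimize

# 3D-Var cost J(x) = 1/2 (x-xb)^T B^{-1} (x-xb) + 1/2 (Hx-y)^T R^{-1} (Hx-y),
# minimized with a gradient-based quasi-Newton method.
def threedvar(xb, Binv, H, y, Rinv):
    def cost_grad(x):
        dx = x - xb
        dy = H @ x - y
        J = 0.5 * dx @ Binv @ dx + 0.5 * dy @ Rinv @ dy
        g = Binv @ dx + H.T @ Rinv @ dy      # exact gradient of J
        return J, g
    res = minimize(cost_grad, xb, jac=True, method="L-BFGS-B")
    return res.x

xb = np.array([10.0, 20.0])
Binv = np.linalg.inv(np.array([[4.0, 1.0], [1.0, 4.0]]))
H = np.array([[1.0, 0.0]])                   # observe the first component only
xa = threedvar(xb, Binv, H, np.array([14.0]), np.array([[1.0]]))
```

Here the off-diagonal background correlation spreads the single observation's information to the unobserved second component: the Kalman analysis for this problem is (13.2, 20.8), and the optimizer recovers it.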

In strongly-constrained 4D-Var data assimilation all the observations y_{1},…, y_{N} are considered simultaneously, and the control variables are the initial conditions x_{0}; they uniquely determine the state of the system at all future times via the model

Given the background value of the initial state x^{b}_{0} and the observations y_{i} at times t_{i}, the 4D-Var cost function reads:

Note that the departure of the initial conditions from the background is weighted by the inverse background error covariance matrix, while the differences between the model predictions ℋ(x_{i}) and the observations y_{i} are weighted by the inverse observation error covariance matrices. The 4D-Var analysis is computed as the initial condition which minimizes the cost function.

The model

The large scale optimization problem

The 4D-Var gradient requires not only the linearized observation operators, but also the adjoint of the tangent linear model, which propagates the sensitivities backwards in time from each observation time t_{i} to the initial time.
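The forward/adjoint structure of the 4D-Var gradient can be sketched on a linear toy model; the model matrix, covariances, and observations below are illustrative assumptions. A finite-difference comparison is the standard way to validate such an adjoint gradient:

```python
import numpy as np

# Strong-constraint 4D-Var cost and gradient for a linear toy model
# x_i = M x_{i-1}: the forward sweep stores the trajectory; the adjoint
# (backward) sweep propagates observation mismatches with M^T and
# accumulates the gradient with respect to the initial condition x0.
def fourdvar_cost_grad(x0, xb, Binv, M, H, ys, Rinv):
    traj = [x0]
    for _ in ys:
        traj.append(M @ traj[-1])                 # forward sweep
    J = 0.5 * (x0 - xb) @ Binv @ (x0 - xb)        # background term
    for i, y in enumerate(ys, start=1):
        d = H @ traj[i] - y
        J += 0.5 * d @ Rinv @ d                   # observation terms
    lam = np.zeros_like(x0)
    for i in range(len(ys), 0, -1):               # adjoint sweep
        d = H @ traj[i] - ys[i - 1]
        lam = M.T @ (lam + H.T @ Rinv @ d)
    return J, Binv @ (x0 - xb) + lam

M = np.array([[1.0, 0.05], [-0.05, 1.0]])
H = np.eye(2)
Binv = np.eye(2)
Rinv = 4.0 * np.eye(2)
xb = np.array([1.0, 0.0])
ys = [np.array([0.9, 0.1]), np.array([0.8, 0.2])]
J0, g = fourdvar_cost_grad(np.array([1.1, -0.1]), xb, Binv, M, H, ys, Rinv)
```

This gradient can then be handed to any quasi-Newton optimizer, exactly as in the 3D-Var sketch, to compute the 4D-Var analysis of x_{0}.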

In the incremental formulation of 4D-Var [

where M_{t_{0}→t_{i}} denotes the tangent linear model propagating increments of the initial state x_{0} to time t_{i}, and d_{i} are the innovation vectors.

Weakly constrained 4D-Var avoids the assumption of a perfect model, implicit in the formulation above, by imposing the model equations between consecutive states only as weak constraints.

The weakly constrained 4D-Var estimate of the entire trajectory x = [x_{0}, x_{1},…, x_{N}] is the minimizer of a cost function that penalizes the model errors as well.

The optimization variables are the model states at all times x ∈ ℝ^{n}^{(}^{N}^{+1)}, and therefore the resulting optimization problem is of larger dimension than that for strongly-constrained 4D-Var.

Insightful comparisons of the relative merits of EnKF and 4D-Var [

EnKF is simple to implement, while 4D-Var requires the construction of adjoint models, a non-trivial task in the presence of stiff chemistry [

On the other hand the 4D-Var optimal solution is consistent with model dynamics throughout the assimilation window. 4D-Var naturally incorporates asynchronous observations while for EnKF asynchronous observations require a more involved framework [

Very recent work has focused on the development of hybrid data assimilation methods that attempt to combine the advantages of both variational and ensemble techniques [

Running chemical transport models requires several essential components. Firstly, model-ready emission files have to be processed using emission inventories. Secondly, meteorological states are needed for commonly-used off-line CTMs. Lastly, the realistic initial concentrations for various constituents are required. A spin-up period is often chosen to generate such initial fields when no previous run results are available. Chemical data assimilation adds two more components to these,

Obtaining and utilizing atmospheric chemical observations remains a challenge. Currently atmospheric chemical observations come from many different sources. They vary greatly in their dissemination methods, availability, data reliability due to different validation and quality control methods, instrument descriptions and measurement uncertainties, temporal and spatial resolutions, and data formats. “Integrated Global Atmospheric Chemistry Observations” (IGACO) is an ongoing effort as a component of the Integrated Global Observing Strategy (IGOS) partnership [

The most important challenge posed by 4D-Var data assimilation is the need to construct and maintain an adjoint of the chemical transport model. The construction of adjoint models is a labor intensive and error prone task. Moreover, the adjoint is specific to the chemical transport model version at hand; any new release of an improved version of the code requires changes in the adjoint model to reflect the changes in the forward model. The construction of the adjoint model is a continuous process that follows closely the development of the forward chemical transport model.

The adjoint of a chemical transport model consists of adjoints of all the individual science processes [

The two approaches lead to different results, since taking the adjoint and discretization operations do not commute. Considerable work has been done to understand the theoretical properties of different types of adjoint models, and the implications they have on sensitivity analysis and chemical data assimilation [

Specialized tools have been developed to assist the construction of chemical transport adjoint models. The chemical kinetic preprocessor KPP produces efficient code for the simulation of stiff chemistry, together with efficient tangent linear and discrete adjoint chemical kinetic models [

The quality of the assimilation depends on the accuracy with which the background and observation error covariances are known; misspecification of these covariances directly impacts the accuracy of the analysis [

Background error covariances determine the relative weighting between observations and a priori data, and dictate how the information is spread in space and among variables. Background error covariances are based on models of the error at the current time (or at initial time in 4D-Var). In case of cyclic data assimilation the analysis error covariance from the previous cycle, transported to the current time, may be used as the new background error covariance. Background error covariance matrices need to:

capture the spatial error correlations created by the flow (transport and diffusion),

capture the inter-species error correlations created by the chemical interactions,

have full rank, such that terms of the form x^{T} B^{−1} x are well defined, and

allow for computationally efficient evaluations of matrix-vector operations of the form B x, B^{1/2} x, and B^{−1} x.
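For a covariance matrix small enough to store explicitly, all three matrix-vector operations listed above can be served by a single Cholesky factorization B = L L^{T}: L acts as one valid square root, and two triangular solves give B^{−1} x. This is an illustrative sketch (operational systems use operator forms of B instead of dense matrices):

```python
import numpy as np

# Build the three covariance operations from one Cholesky factorization.
def make_cov_ops(B):
    L = np.linalg.cholesky(B)                   # B = L L^T
    def apply_B(x):
        return B @ x
    def apply_sqrtB(x):
        return L @ x                            # B^{1/2} x (Cholesky square root)
    def apply_Binv(x):
        z = np.linalg.solve(L, x)               # forward substitution
        return np.linalg.solve(L.T, z)          # back substitution
    return apply_B, apply_sqrtB, apply_Binv

B = np.array([[4.0, 1.0], [1.0, 3.0]])
apply_B, apply_sqrtB, apply_Binv = make_cov_ops(B)
x = np.array([1.0, 2.0])
```

Applying B after B^{−1} returns the original vector, and the Cholesky factor reproduces B, which confirms the factorization-based operators are mutually consistent.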

Reasonable approximations and representations of the background error are crucial to data assimilation applications. Chai [

An autoregressive (AR) model approach to represent background error covariance matrices has been proposed in [ The background errors are assumed to have zero mean, 〈ε^{b}〉 = 0, and background covariance

Here (

A simplified approach proposed in [

In the context of 4D-Var chemical data assimilation the hybrid approach discussed in [

At the end of any data assimilation calculation one would like to estimate the quality of the analysis,

In operational data assimilation the goal is to improve forecasts. The model is initialized with the analysis that incorporates information from all past observations; the model is run, and the forecast is compared against the new observations that become available in the subsequent time window. Well established metrics for model-observation discrepancies in forecast mode are the forecast skill scores [

The data assimilation system itself has the ability to provide estimates of the posterior error magnitude. If an ensemble Kalman filter is used, estimates of the analysis covariance matrices at each time are readily available from the spread of the analysis ensemble.

In [

As described in Section 4.3, model background error statistics are crucial in data assimilation applications. It is important to gain knowledge of model uncertainties for a CTM with its specific setups, including the gas phase chemistry mechanism and aerosol module, model resolution, emission inventories,

A computational grid with a 12-km resolution covering the contiguous United States (CONUS, shown in

Repeating the steps described in [

Model error correlation coefficients are shown in

Two CMAQ data assimilation systems are built with the 4D-Var and OI approaches separately. The data assimilation time window is set to start from 1200Z on August 5, 2007 until 1200Z on August 6, 2007. In this 24-h period, the AIRNow hourly-averaged observations are assimilated; the observations are assumed to be uncorrelated with each other and to have a uniform root-mean-square error of 3.3 ppbv. To check the effect of the data assimilation tests, an additional "forecast" day, starting from 1200Z on August 6, 2007 until 1200Z on August 7, 2007, is continuously run and evaluated against the AIRNow observations that are not assimilated in any of the assimilation tests.

In the 4D-Var data assimilation, the initial ozone concentrations are chosen as the only control parameters to be adjusted. Currently, the ozone background error covariance matrix B is assumed to be diagonal, with the root-mean-square errors set as 14.3 ppbv at every grid point. A quasi-Newton limited memory L-BFGS [

For the OI data assimilation runs, the assimilation happens every hour by combining the model results with the observations. To illustrate the effect of the background error covariance, we designed a case that eliminates the spatial correlation usage, both horizontally and vertically. It is listed in

where the horizontal background error variance σ_{h}^{2} is specified in ppbv^{2}. Instead of using a constant vertical correlation structure obtained in Section 5.1, we use the boundary layer depth information available from the meteorological inputs. In Case 4, the vertical correlation coefficients are set to 1.0 for any two model grid layers inside the boundary layer; otherwise, the background errors are assumed to be uncorrelated.

Compared to ozone predictions, CMAQ PM2.5 predictions are much worse for the NAQFC experimental runs [

In the test, the MODIS AOD fine mode products are used. The model counterpart can be reconstructed by integrating the hourly extinction coefficients over the whole vertical columns. The extinction coefficients calculated from two visibility methods, Mie theory approximation and mass reconstruction method [_{h}_{MODIS}

The correlation coefficients R^{2} improve over four out of six days in both regions. This is encouraging, as the relationship between the column quantity of AOD and the surface PM2.5 is not linear: a better reconstructed AOD cannot guarantee better predictions of surface aerosol. The current simplification of placing the observations at a single time each day and adjusting all the aerosol species using a single factor will be modified in the future. In addition, switching from the OI approach to a 3D-Var or 4D-Var method is expected to generate better assimilation results.

New developments in chemical data assimilation techniques and algorithms, and the increased volume and diversity of available chemical measurements, have opened exciting opportunities for better science through the integration of chemical transport models and observations. Chemical data assimilation has begun to play an essential role in air quality assessments for environmental management. Widely used chemical transport models such as STEM, CMAQ, and GEOS-Chem, have been endowed with adjoint sensitivity analysis and data assimilation capabilities, and are now being used by the community to answer important scientific questions. The availability of these tools, and the growing importance of chemical weather forecasting to society, should help stimulate significant advances in chemical data assimilation in the foreseeable future.

Future advances will require a sustained development of new chemical data assimilation algorithms. While there is much to build upon from the assimilation experience in weather prediction, there are significant differences and challenges that are specific to chemical weather. Promising possibilities are opened up by combining the strengths of 4D-Var and EnKF techniques in hybrid data assimilation methods. Feedbacks between the meteorological and air quality components, which have mostly been studied as separate systems, are critical to improving the understanding of air quality. Future work needs to build the infrastructure required to couple meteorological and air quality forecasting and data assimilation systems. Finally, current chemical data assimilation system capabilities should be extended to enable the optimal design of the observing systems, and to rigorously quantify the informational value added by each instrument in heterogeneous sensor networks.

CMAQ CONUS computational domain and ozonesonde locations. Red circles indicate ozonesonde locations where observations are used to calculate vertical model error statistics. Unit of longitude and latitude: degree.

Ozone error statistics results through Hollingsworth-Lönnberg approach. AIRNow observations are used to get horizontal error statistics (left). Ozonesonde observations are used in calculating vertical model error statistics (right). Unit of height: meter.

Scatter plots of AIRNow ozone observations and CMAQ predictions for the assimilation (upper, (

MODIS AOD (fine mode) and CMAQ reconstructed AOD. AOD-Recona and AOD-Reconb are calculated before and after assimilation. The differences (AOD-Recona-AOD-Reconb) are also shown.

Model ozone biases and root-mean-square errors (RMSE) against AIRNow observations during 8:00 am–8:00 pm local time on Day 1 (August 5, 2007) and Day 2 (August 6, 2007). Case 1 is the base case,

Case | Method | Background Error Covariance | Day 1 Bias (ppbv) | Day 1 RMSE (ppbv) | Day 2 Bias (ppbv) | Day 2 RMSE (ppbv)
---|---|---|---|---|---|---
1 | N/A | N/A | 8.3 | 15.9 | 8.7 | 16.3
2 | 4D-Var | Diagonal | −0.8 | 11.0 | 7.6 | 15.6
3 | OI | Diagonal | 2.6 | 12.7 | 7.5 | 15.8
4 | OI | H⊗V⊗C | −1.3 | 13.2 | 3.1 | 12.8

Correlation between CMAQ PM2.5 predictions and AIRNow hourly observations in Upper Midwest (UM) and Northeast (NE) US before and after (OI) MODIS AOD assimilation.

^{2} |
||||||
---|---|---|---|---|---|---|

0.420 | 0.138 | 0.355 | 0.154 | 0.234 | 0.021 | |

0.399 | 0.178 | 0.311 | 0.180 | 0.270 | 0.041 | |

0.253 | 0.416 | 0.097 | 0.070 | 0.156 | 0.217 | |

0.306 | 0.367 | 0.110 | 0.207 | 0.171 | 0.206 |

The work of A. Sandu has been supported in part by NSF through awards NSF OCI-0904397, NSF CCF-0916493, and NSF DMS-0915047.


The paper is dedicated to the memory of Dr. Daewon Byun, whose work remains a lasting legacy to the field of air quality modeling and simulation.