1. Introduction
Formaldehyde (HCHO) is an important Hazardous Air Pollutant (HAP) contributing to cancer risk in the United States (US) [1]. In addition, HCHO exposure can cause eye and airway irritation, allergies, and pulmonary disease [2]. The US Environmental Protection Agency (USEPA, Washington, DC, USA) is proposing to update its HCHO inhalation unit risk (IUR) to 6.4 × 10⁻⁶ (μg/m³)⁻¹ so that a long-term average ambient air concentration of about 1.3 parts per billion by volume (ppb) would correspond to a cancer risk of 1-in-10⁵ [3]. Moreover, the USEPA considers a reference concentration (RfC) of 7 μg/m³, about 5.6 ppb, as a threshold for non-cancer health impacts [3].
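The arithmetic linking the IUR and RfC to ppb values can be sketched as follows. This is an illustrative calculation only: the conversion temperature and pressure (20 °C, 1 atm) are assumptions that happen to reproduce the rounded values quoted above, and the function and variable names are hypothetical.

```python
# Sketch: convert HCHO mass concentrations to mixing ratios and derive the
# concentration that corresponds to a 1-in-10^5 cancer risk.
R_GAS = 8.314462618      # J/(mol K), universal gas constant
MW_HCHO = 30.026         # g/mol, molar mass of formaldehyde

def ugm3_to_ppb(conc_ugm3, temp_k=293.15, pressure_pa=101325.0, mw=MW_HCHO):
    """Convert ug/m3 to ppb by volume (temperature and pressure are assumed)."""
    molar_volume_l = R_GAS * temp_k / pressure_pa * 1000.0  # L/mol
    return conc_ugm3 * molar_volume_l / mw

IUR = 6.4e-6             # (ug/m3)^-1, proposed inhalation unit risk
TARGET_RISK = 1.0e-5     # 1-in-10^5 lifetime cancer risk

conc_at_risk_ugm3 = TARGET_RISK / IUR          # risk = IUR x concentration
conc_at_risk_ppb = ugm3_to_ppb(conc_at_risk_ugm3)
rfc_ppb = ugm3_to_ppb(7.0)                     # RfC of 7 ug/m3

print(f"{conc_at_risk_ugm3:.3f} ug/m3 = {conc_at_risk_ppb:.2f} ppb")
print(f"RfC = {rfc_ppb:.2f} ppb")
```

At a different reference temperature (for example 25 °C) the ppb values shift by about 2%, which is within the rounding of the figures quoted in the text.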
The cumulative impacts of ambient HCHO on human health are difficult to assess. On the one hand, cumulative exposure to HCHO alone is the result of both direct emissions (primary HCHO) and chemical formation in the air (secondary HCHO), so a large variety of organic compounds from a multitude of stationary and mobile anthropogenic sources, as well as vegetation (biogenic sources), contributes to ambient concentrations of HCHO. On the other hand, the health impacts of ambient HCHO are exacerbated by reactive chemistry in the air, which enhances the concentrations of ozone and fine particulate matter (PM2.5) due to the atmospheric radicals produced by HCHO decomposition [4,5]. Thus, mitigating HCHO exposure may have multiple air quality co-benefits.
In a previous paper, Olaguer et al. [6] surveyed the scientific literature on regional models of atmospheric HCHO. They also conducted new 1.3 km horizontal resolution modeling of the Southeast Michigan (SEMI) ozone nonattainment area in the US. They noted how current emissions inventories systematically underestimate primary HCHO so that regional air quality models severely underpredict ambient HCHO relative to observations in industrialized cities such as Detroit, Michigan. Citing measurements of ambient HCHO during the 2021 Michigan-Ontario Ozone Source Experiment (MOOSE) [7], Olaguer et al. [6] further demonstrated that the urban HCHO deficit could not be explained by inadequate mechanisms for secondary formation associated with biogenically emitted isoprene, as had been proposed by Marvin et al. [8].
The University of Michigan School of Public Health (UMSPH) [9] recently reviewed the status and long-term trends of ambient HCHO exposure in metropolitan Detroit based on 20 years of monitoring data at three regulatory stations (Dearborn, Southwest Detroit, River Rouge) within 5 km of each other in a heavily industrialized section of SEMI. Long-term average concentrations were 2.2 ppb for Southwest Detroit, 2.6 ppb for Dearborn, and 3.2 ppb for River Rouge. The highest level (River Rouge) corresponded to the 84th percentile of all ground-level HCHO measurements in the US over the same period, while the lowest level (Southwest Detroit) was close to the national median. One site (Dearborn) showed a marginally significant decrease of 0.04 ppb/year, while the other two did not show any statistically significant trends. In 2021, the year of the MOOSE study, the annual mean HCHO concentrations at the three sites were 2.26 ppb, 2.35 ppb, and 3.00 ppb for Dearborn, Southwest Detroit, and River Rouge, respectively [10].
The area of metropolitan Detroit investigated by UMSPH is an environmental justice community in which multiple industrial facilities are located near residential neighborhoods of mostly ethnic minorities and low-income families. A map of the area was provided by Olaguer et al. [6]. The goal of this study is to infer more accurate estimates for HCHO emissions in this area than are now available and to compute the resulting spatially distributed cumulative exposure to ambient HCHO, accounting for both primary sources and secondary formation on finer scales than possible with current regional models. This is a necessary intermediate step to address the cumulative impacts of HCHO via the criteria pollutants ozone and PM2.5, which is left to future studies.
Past attempts to quantify primary HCHO using methods other than standard emission factors or measurements directly at the source included inverse modeling. This technique infers emissions from ambient air measurements by means of an air quality model and an optimization scheme that minimizes the differences between atmospheric concentrations predicted by the model and corresponding observations. Olaguer [11] was the first to apply a 3D fine-scale inverse model with reactive chemistry to infer HCHO emissions from a petrochemical facility based on stationary measurements during the Second Texas Air Quality Study. Olaguer et al. [12] continued this approach in quantifying HCHO emissions from specific industrial processes at the third largest refinery in the US based on mobile quantum cascade laser measurements. This study applies the inverse modeling technique to improve estimates of HCHO emissions and resulting ambient air exposures in the metropolitan Detroit community of interest. The novelty of this study is that it is the first attempt to quantify cumulative exposures to HCHO on fine scales based on the realistic and comprehensive treatment of both primary and secondary sources in an extended urban area.
2. Methods
2.1. Ambient Air Measurements of HCHO and Related Gaseous Species
The analysis period of this study coincides with the MOOSE binational field campaign that took place at the international border between the US and Canada, mostly in the late spring and summer of 2021 [7]. During MOOSE, a mobile laboratory was fielded by Aerodyne Research, Inc. (Billerica, MA, USA) as a platform for real-time optical absorption and (electron or chemical) ionization-mass spectrometry measurement techniques. The Aerodyne Mobile Lab (AML), its instrument manifest, and its operations during MOOSE are described in detail by Yacovitch et al. [13] and in a forthcoming paper [14]. Note that a sizeable suite of compounds could be measured by the AML at the multiple parts per trillion (ppt) level, even at frequencies of ~1 s in the case of proton transfer reaction-mass spectrometry (PTR-MS) and tunable infrared laser direct absorption spectrometry (TILDAS).
The AML measurements were complemented by routine stationary measurements of hydrocarbons (USEPA Method TO-15) and/or carbonyls (USEPA Method TO-11A) at the Dearborn, Southwest Detroit, and River Rouge monitoring stations operated by the Michigan Department of Environment, Great Lakes, and Energy (EGLE). Moreover, limited measurements of the important nitrogen reservoirs, HONO and HNO3, were made by a university research team using an annular denuder system [15] at the EGLE Trinity-St. Mark’s station (42.29582° N, 83.12943° W) on some days during MOOSE.
The AML measured ambient air concentrations of ozone, nitrogen oxides, and organic compounds in the area of interest, which MOOSE participants referred to as the Dearborn Loop.
2.2. Modeling Methodology
A fine-scale (sub-km horizontal resolution) 3D Eulerian air quality model known as the Microscale Forward and Adjoint Chemical Transport (MicroFACT) model [16] was used to simulate atmospheric concentrations of various chemical compounds. Transport of 35 chemical species is simulated in MicroFACT using standard algorithms, including the Piecewise Parabolic Method [17] and the Smolarkiewicz scheme [18] for horizontal and vertical advection, respectively, while the Euler backward iterative (EBI) scheme [19] serves as the chemistry solver.
The MicroFACT model accounts for chemical transformations in the air via a chemical mechanism optimized for urban applications. The mechanism employs 116 gas-phase reactions for daytime and night-time chemistry and includes heterogeneous reactions for the secondary production of nitric and nitrous acid on aerosol surfaces. Reaction rates on aerosols were computed based on a ratio of reaction surface area to air volume of 0.0014 m² m⁻³ as in Zhang et al. [20]. Reactions 8(a) and 8(b) of Zhang et al. [20] were also recently added to simulate the conversion of NO2 to HONO at the ground.
The 4D variational (4Dvar) data assimilation technique [21] and an adjoint counterpart to the forward version of MicroFACT (see Supplementary Material of [16] as well as Equation (S1) of the Supplementary Material of this paper for details of the chemical Jacobian used to construct the adjoint) were used to perform inverse modeling based on mobile laboratory and other ambient air measurements during MOOSE. This was the same method used by Olaguer [11] and Olaguer et al. [12], who employed a predecessor of MicroFACT to infer petrochemical industry emissions of HCHO from air quality field campaign measurements in Texas.
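For orientation, a generic textbook form of a 4Dvar cost function with emissions included in the control vector is sketched below. It is not taken from this paper or its Supplementary Material, and all symbols are assumptions chosen for illustration: c_0 is the initial concentration field, c_b its background (prior) estimate, e the emissions, e_p the prior emissions, B and E the background and emission error covariances, y_k the observations at time step k, H_k the observation operator, and R_k the measurement error covariance.

```latex
J(\mathbf{c}_0, \mathbf{e}) =
  \tfrac{1}{2}\,(\mathbf{c}_0 - \mathbf{c}_b)^{T}\,\mathbf{B}^{-1}\,(\mathbf{c}_0 - \mathbf{c}_b)
+ \tfrac{1}{2}\,(\mathbf{e} - \mathbf{e}_p)^{T}\,\mathbf{E}^{-1}\,(\mathbf{e} - \mathbf{e}_p)
+ \tfrac{1}{2}\sum_{k}\,(\mathbf{H}_k \mathbf{c}_k - \mathbf{y}_k)^{T}\,\mathbf{R}_k^{-1}\,(\mathbf{H}_k \mathbf{c}_k - \mathbf{y}_k)
```

In this form, c_k is obtained by running the forward model from c_0 with emissions e, and the adjoint model supplies the gradient of J with respect to c_0 and e for the minimization.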
The modeling grid for this study covers an 8 km × 8 km area encompassing the Dearborn Loop with a horizontal resolution of 400 m. There are 20 vertical layers extending from the surface to the model top at 1500 m above ground level (AGL). The lowest five layers each have a thickness of 2 m, while the vertical resolution decreases parabolically with the height above these layers. A time step of 10 s was selected to ensure computational stability.
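The grid dimensions and time step quoted above can be checked against a generic Courant number, as in the following sketch. The Courant criterion used here is a common guideline for explicit advection schemes and is an assumption of this example, not a stability limit stated for MicroFACT. All names are hypothetical.

```python
# Sketch: grid size and Courant numbers for the stated grid and time step.
DOMAIN_M = 8000.0     # horizontal domain extent (m)
DX_M = 400.0          # horizontal resolution (m)
DZ_MIN_M = 2.0        # thickness of the lowest layers (m)
DT_S = 10.0           # model time step (s)

def courant_number(speed_ms, dt_s, spacing_m):
    """Fraction of a grid cell traversed in one time step."""
    return abs(speed_ms) * dt_s / spacing_m

cells_per_side = int(DOMAIN_M / DX_M)

# Speeds at which the Courant number reaches 1 (assumed guideline).
max_horizontal_speed = DX_M / DT_S     # m/s
max_vertical_speed = DZ_MIN_M / DT_S   # m/s

print(cells_per_side, max_horizontal_speed, max_vertical_speed)
print(courant_number(5.0, DT_S, DX_M))  # e.g., a 5 m/s horizontal wind
```

Under this guideline, the thin near-surface layers constrain the time step far more than the 400 m horizontal spacing does.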
MicroFACT uses building-sensitive wind fields generated by the Quick Industrial Complex (QUIC) urban wind model [22] based on 3D building shape files and available measurements from ground-based meteorological stations. The QUIC outputs cover an 8.4 km × 8.4 km area slightly larger than the MicroFACT model domain described above with 5 m horizontal resolution and the same vertical structure. The higher-resolution wind field was appropriately averaged and staggered relative to the coarser MicroFACT grid to ensure mass conservation.
2.3. Simulation Periods, Meteorology, and Initial/Boundary Conditions
For this study, two 1 h periods with different prevailing winds were chosen for the model simulations. The first simulation period (Period 1) was 22 May 2021, 13:13–14:13 local standard time (LST), during which prevailing winds were approximately WNW. The second simulation period (Period 2) was 30 May 2021, 10:00–11:00 LST, during which prevailing winds were approximately NNE. Emissions inferred by inverse modeling should be interpreted as averages over the appropriate simulation period. Note that the inferred emissions, while strictly pertaining to the selected 1 h periods, are expressed in equivalent US tons per year and compared to reported annual mean emissions.
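The conversion between an emission rate averaged over a 1 h period and the equivalent annual figure can be sketched as follows. The internal rate unit (g/s) and the 365-day year are assumptions of this example, and the function names are hypothetical.

```python
# Sketch: express a short-term average emission rate in US tons per year.
SECONDS_PER_YEAR = 365 * 24 * 3600   # assumes a 365-day year
GRAMS_PER_US_TON = 907184.74         # 1 US (short) ton = 2000 lb
LBS_PER_US_TON = 2000

def gps_to_tpy(rate_g_per_s):
    """Grams per second to US tons per year, assuming steady emissions."""
    return rate_g_per_s * SECONDS_PER_YEAR / GRAMS_PER_US_TON

def tpy_to_gps(rate_tpy):
    """US tons per year back to grams per second."""
    return rate_tpy * GRAMS_PER_US_TON / SECONDS_PER_YEAR

def tpy_to_lbs_per_year(rate_tpy):
    """US tons per year to pounds per year."""
    return rate_tpy * LBS_PER_US_TON

print(f"1 g/s is about {gps_to_tpy(1.0):.1f} US tons/year")
```

The annualized figure assumes that the hourly rate persists all year, which is why such values are best read as equivalents for comparison with inventories rather than as annual totals.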
Figure 1 shows wind streamlines at a height of 1 m AGL generated by the QUIC model, constrained by wind data from the three EGLE monitoring stations in the Dearborn Loop plus the EGLE station at Allen Park near the southwest corner of the model domain. The prevailing wind speeds and directions are such that a pollutant signal can propagate across the entire MicroFACT domain within the simulation period so that an approximate concentration steady state is established at the end of each simulation (assuming steady emissions).
To calculate the planetary boundary layer (PBL) height, values of the surface temperature T_{S,j} and nominal surface wind speed U_{S,j} (where j is the simulation period) were specified from EGLE station measurements that best reflected the incoming air mass (Dearborn for j = 1 and Southwest Detroit for j = 2). Surface pressure, relative humidity, and cloud cover were specified based on reported conditions at the Detroit Metropolitan Wayne County (DTW) Airport. Values for the friction velocity, Monin–Obukhov length, and PBL height were derived from the turbulence parameterization described in the Supplementary Material of Olaguer et al. [23], assuming a surface roughness length of 0.1 m and the specified values of T_{S,j}, U_{S,j}, and cloud cover.
Table 1 summarizes the meteorological input parameters assumed in this study.
The layer temperatures above the surface were extrapolated from the surface temperature assuming the moist adiabatic lapse rate. The resulting vertical profile was used to compute temperature-dependent chemical reaction rates. Clear-sky photolysis rates were multiplied by the ratio of the measured solar radiation flux at the EGLE station in New Haven, Michigan (the nearest station where such measurements are available) to the corresponding clear-sky solar radiation flux obtained from www.meteoexploration.com (accessed on 8 March 2023).
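The photolysis adjustment described above amounts to a single scaling factor, sketched below. The function name and the numerical inputs are hypothetical and serve only to show the calculation.

```python
# Sketch: scale a clear-sky photolysis rate by the ratio of measured to
# clear-sky solar radiation flux (both fluxes in the same units).
def scale_photolysis(j_clear_sky, measured_flux, clear_sky_flux):
    """Return the photolysis rate adjusted for observed sky conditions."""
    if clear_sky_flux <= 0:
        raise ValueError("clear-sky flux must be positive")
    return j_clear_sky * (measured_flux / clear_sky_flux)

# Hypothetical inputs: J = 8.0e-3 1/s, fluxes of 600 and 800 W/m2.
j_adjusted = scale_photolysis(8.0e-3, 600.0, 800.0)
print(j_adjusted)
```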
The profile of vertical diffusivity was computed from the PBL height, the friction velocity, and the Monin–Obukhov length based on the urban diffusivity parameterization of Delle Monache et al. [24] as implemented by Olaguer et al. [12]. The horizontal diffusivity was set at a constant value of 50 m²/s as in Olaguer [16]. Dry deposition velocities for the 35 transported species were likewise set as in Olaguer [16] based on preceding literature.
The boundary conditions (BCs) for the air quality fields, which also serve as uniform prior estimates of initial conditions, are from Olaguer [16] except as follows. For the three pollutants subject to inverse modeling, namely NO, HCHO, and CO, the BCs were derived from the minimum AML measurements during the simulation period. For the secondary pollutants NO2 and O3, and for the volatile organic compounds (VOCs) isoprene, toluene, xylenes, acetaldehyde, and terpenes, the BCs were derived from averages of AML measurements during the simulation period. For the hydrocarbons propene and 1,3-butadiene, BCs were derived from annual mean concentrations measured at the appropriate upwind EGLE station (Dearborn or Southwest Detroit). Lastly, BCs for HONO and HNO3 were taken from MOOSE measurements at the Trinity–St. Mark’s station on 12 June 2021, 9:31–14:39 LST regardless of the simulation period.
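The rules for deriving BCs from the AML measurements (minimum for the inverse-modeled species, average for the secondary pollutants and VOCs) can be sketched as follows. The species keys, function name, and sample values are hypothetical.

```python
# Sketch: derive a boundary condition from mobile laboratory samples taken
# during one simulation period.
from statistics import mean

MIN_SPECIES = {"NO", "HCHO", "CO"}            # inverse-modeled species
MEAN_SPECIES = {"NO2", "O3", "isoprene", "toluene",
                "xylenes", "acetaldehyde", "terpenes"}

def boundary_condition(species, samples_ppb):
    """Return the BC (ppb) for a species from a list of measurements."""
    if not samples_ppb:
        raise ValueError("no samples available")
    if species in MIN_SPECIES:
        return min(samples_ppb)               # background taken as minimum
    if species in MEAN_SPECIES:
        return mean(samples_ppb)              # period average
    raise KeyError(f"{species}: BC comes from another source")

print(boundary_condition("HCHO", [2.0, 1.5, 4.0]))
print(boundary_condition("O3", [40.0, 50.0]))
```

Using the minimum for the inverse-modeled species keeps locally emitted plumes out of the background, so that the inversion can attribute them to sources inside the domain.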
Table 2 lists the relevant BCs and deposition velocities assumed for this study.
2.4. Initial Emissions Estimates
Initial estimates of point source emissions for CO, NOx (=NO + NO2), and VOCs were based on the 2017 US National Emissions Inventory (NEI) [25]. One exception was HCHO, for which initial estimates were based on process-specific emission ratios of HCHO to CO derived from available stack tests and field study measurements as described in Olaguer et al. [6]. Initial estimates for HONO were set at 0.8% of NOx emissions [16,26]. Point source emissions were assigned to vertical layers using calculated plume release heights based on reported stack data for each industrial facility emission point in the NEI within the model domain. The calculated plume release heights were averaged over each horizontal grid cell.
Figure 2 shows the resulting distribution of plume release heights for Periods 1 and 2. Note that the red markers in the figure represent the locations of industrial facilities or EGLE monitoring sites (see Table 3). The highest plume release heights in Figure 2 are associated with power generation, steel manufacturing, and coking facilities. These sites are either in the northwest section of the model domain or in the eastern section near the bank of the Detroit River.
The plume rise-adjusted emissions derived from the NEI were kept unchanged except for NO, HCHO, CO, and NO2. The emissions of the first three compounds were adjusted by inverse modeling based on AML measurements of their concentrations in ambient air. The emissions of NO2, while not directly subject to inverse modeling, were indexed to NO emissions such that the emissions ratio of NO to NOx for all elevated and ground-level sources was always 90%. Numerical experiments with wider sets of inversely modeled compounds beyond NO, HCHO, and CO did not significantly improve the solution quality.
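The indexing of NO2 to NO, together with the initial HONO estimate given earlier in this section, can be sketched as follows. The example assumes that all species are expressed in the same emission units; whether the ratios are applied on a mass or molar basis is not stated here, and the function names are hypothetical.

```python
# Sketch: derive NO2 emissions from NO at a fixed NO:NOx ratio, and the
# initial HONO estimate from NOx.
NO_FRACTION_OF_NOX = 0.90      # NO:NOx emissions ratio
HONO_FRACTION_OF_NOX = 0.008   # initial HONO estimate, 0.8% of NOx

def index_no2_to_no(no_emission):
    """Return NOx and NO2 emissions implied by a given NO emission."""
    nox = no_emission / NO_FRACTION_OF_NOX
    return {"NOx": nox, "NO2": nox - no_emission}

def initial_hono(nox_emission):
    """Initial (not inverse-modeled) HONO estimate from NOx emissions."""
    return HONO_FRACTION_OF_NOX * nox_emission

result = index_no2_to_no(90.0)   # e.g., 90 tpy of NO
print(result, initial_hono(100.0))
```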
To avoid biasing the inverse model results, the initial point source estimates from the NEI for NO, NO2, HCHO, and CO were first averaged over the entire horizontal domain. The domain-averaged value for each species i was then re-assigned to grid cells with non-zero plume release heights as the prior emissions estimate. Note that this method lowers the total domain emissions of each of the four compounds relative to the initial NEI-derived estimates. Overall, the prior emissions estimates are purposely conservative.
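The domain-averaging step, and the reason it lowers total domain emissions, can be illustrated with a small sketch. The 2 × 2 grid and its values are hypothetical, as is the function name.

```python
# Sketch: replace NEI-derived point source emissions with a uniform prior
# equal to the domain average, assigned only to cells that contain stacks.
def flatten_point_source_prior(emissions, plume_heights):
    """Return the prior emissions grid for one species."""
    n_cells = sum(len(row) for row in emissions)
    domain_mean = sum(sum(row) for row in emissions) / n_cells
    return [[domain_mean if height > 0 else 0.0 for height in row]
            for row in plume_heights]

nei = [[100.0, 0.0], [20.0, 0.0]]       # hypothetical emissions (tpy)
heights = [[50.0, 0.0], [10.0, 0.0]]    # hypothetical release heights (m)

prior = flatten_point_source_prior(nei, heights)
total_nei = sum(sum(row) for row in nei)
total_prior = sum(sum(row) for row in prior)
print(prior, total_nei, total_prior)
```

Because the average is taken over all cells but re-assigned only to the fraction that contains stacks, the prior total equals the original total multiplied by that fraction, so the prior carries no information about which facilities are the largest emitters.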
For stationary non-point sources of NOx and CO, total emissions in the model domain were set equal to the total NOx or CO point source emissions in the model domain multiplied by the ratio of Wayne County (NEI) non-point source emissions to Wayne County (NEI) point source emissions. These total emissions were then divided by the total number of horizontal grid cells. The result was assigned to each surface grid cell as a prior non-point source emissions estimate. The initial estimate for HCHO was obtained by conservatively multiplying the corresponding value for CO by a factor of 0.0002. Note that because marine vessels routinely operate in the Detroit River, no distinction was made between land and water grid cells in assigning non-point source emissions, which are presumed to be largely produced by combustion. Nevertheless, the possibility of significant non-combustion fugitive emissions of HCHO is considered in the inverse modeling adjustments of ground-level emissions.
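The non-point source allocation described above can be sketched as follows. All numerical inputs are hypothetical placeholders rather than NEI values, and the function name is an assumption of this example; only the 0.0002 HCHO:CO factor and the 20 × 20 grid implied by the 8 km domain at 400 m resolution come from the text.

```python
# Sketch: uniform prior non-point source emissions per surface grid cell.
HCHO_TO_CO_NONPOINT = 0.0002   # factor applied to the CO value

def nonpoint_prior_per_cell(domain_point_total, county_nonpoint_total,
                            county_point_total, n_cells):
    """Scale domain point source emissions by the county non-point to point
    ratio, then spread the result evenly over all horizontal cells."""
    domain_nonpoint_total = (domain_point_total
                             * county_nonpoint_total / county_point_total)
    return domain_nonpoint_total / n_cells

N_CELLS = 20 * 20   # 8 km x 8 km domain at 400 m resolution

# Hypothetical totals in tpy: domain point CO, county non-point CO, county point CO.
co_per_cell = nonpoint_prior_per_cell(10000.0, 5000.0, 20000.0, N_CELLS)
hcho_per_cell = co_per_cell * HCHO_TO_CO_NONPOINT
print(co_per_cell, hcho_per_cell)
```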
Mobile source emissions were estimated by scaling county-level emissions computed using the Motor Vehicle Emission Simulator (MOVES) [27] according to the ratio of total road lengths in each model grid cell to the total road lengths in Wayne County. Biogenic emissions of isoprene were estimated from the high-resolution regional air quality model runs described in Olaguer et al. [6]. Grid cell-specific isoprene emissions of 0.024 US tons/year were assumed for Period 1, and 0.0072 US tons/year for Period 2.
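The road-length scaling of county-level mobile source emissions can be sketched as follows. The county total, road lengths, and cell indices are hypothetical, as is the function name.

```python
# Sketch: allocate county-level mobile source emissions to grid cells in
# proportion to the road length contained in each cell.
def mobile_emissions_per_cell(county_emission, road_length_by_cell,
                              county_road_length):
    """Return emissions per cell, in the same units as county_emission."""
    return {cell: county_emission * length / county_road_length
            for cell, length in road_length_by_cell.items()}

# Hypothetical inputs: 1000 tpy county total, 5000 km of county roads.
road_km = {(0, 0): 5.0, (0, 1): 0.0, (1, 0): 2.5}
mobile = mobile_emissions_per_cell(1000.0, road_km, 5000.0)
print(mobile)
```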
2.5. Error Parameters
The 4Dvar method requires specification of certain error parameters. Some of these parameters were set independently of the simulation period. These include the assumed measurement errors for the three species subject to inverse modeling, and the corresponding prior estimates of the background concentration error covariances, which indicate the uncertainty in the initial conditions. These parameters are listed in Table 4.
In addition to the error parameters listed in Table 4, prior estimates of the emission error covariances must also be set to reflect uncertainties in the prior emissions estimates. These parameters are specific to each simulation period and were tuned to optimize the agreement between the forward model-predicted concentrations of the three inverse-modeled species and the corresponding grid cell-averaged ambient concentration measurements. Separate error covariances were assigned to elevated and ground-level emission sources.
The prior estimate of the point source emission error covariance for species i was applied only to elevated sources and is modeled in terms of the prior point source emissions estimate (see above), the cell-averaged plume release height (meters), and a tunable parameter as follows:
[Equation (1)]
The prior estimate of the ground-level emission error covariance for species i was applied only to surface cells with no stacks present and is modeled in terms of the prior ground-level emissions estimate and a second tunable parameter as follows:
[Equation (2)]
In the case of NO and CO, Equation (2) was only applied when the maximum measured ambient concentration of NO in a grid cell exceeded 50 ppb or if the maximum measured ambient concentration of CO in the same grid cell exceeded 300 ppb. This was partially intended to represent transient vehicular traffic plumes beyond longer-term average mobile source emissions. For HCHO, the ground-level emission error covariance allows for non-combustion fugitive emissions that are not correlated with NO or CO emissions.
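The concentration thresholds that gate the ground-level emission error covariance can be written as a simple predicate, sketched below. Treating HCHO as always eligible in stack-free surface cells is this example's reading of the paragraph above, not a rule stated explicitly, and the function name is hypothetical.

```python
# Sketch: decide whether the ground-level emission error covariance is
# applied to a stack-free surface cell for a given species.
NO_TRIGGER_PPB = 50.0
CO_TRIGGER_PPB = 300.0

def ground_covariance_applies(species, max_no_ppb, max_co_ppb):
    """max_no_ppb and max_co_ppb are the maximum measured ambient
    concentrations of NO and CO in the grid cell."""
    if species in ("NO", "CO"):
        return max_no_ppb > NO_TRIGGER_PPB or max_co_ppb > CO_TRIGGER_PPB
    if species == "HCHO":
        return True   # assumed: no NO or CO trigger for fugitive HCHO
    raise KeyError(f"{species} is not an inverse-modeled species")

print(ground_covariance_applies("NO", 60.0, 100.0))
print(ground_covariance_applies("CO", 10.0, 100.0))
```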
Note that the inverse modeling method used in this study combines automated emissions adjustments by the 4Dvar technique with heuristic emissions tuning via the two tunable parameters in Equations (1) and (2). The selected values of these parameters are listed in Table 5. The lower values of these parameters for Period 2 relative to Period 1 may reflect a lower PBL height, lower traffic emissions, or the disposition of sources relative to the prevailing winds.
2.6. Observed Ambient Concentrations
The AML sampling plan during MOOSE was largely, though not exclusively, focused on the chemical fingerprinting of point sources. Other objectives included co-location with EGLE monitoring stations during high ozone days, characterization of lake breeze front chemical conditions, and coordinated actions with other mobile laboratories. The Dearborn Loop was sampled during 12 of the 40 days that the AML was present in the SEMI region. The two simulation periods of interest to this study were selected because of the prevailing wind directions relative to the two EGLE monitoring stations with the most measurements (Dearborn and Southwest Detroit), and because they facilitated contrasting analyses of the emission sources within the Dearborn Loop area.
For this study, AML measurement records with over 150 ppb of NO or 1000 ppb of CO were filtered from the analysis. This was intended to remove interferences from individual motor vehicle plumes without eliminating the transient influences of mobile source fleets, as explained above.
Figure 3 shows grid cell-averaged measurements of ambient concentrations of the three inverse-modeled species during Periods 1 and 2. Note that for Period 1 there were coincident 24 h average HCHO concentration measurements available at the Southwest Detroit and River Rouge monitoring stations. These measurements were treated as indicative of the hourly averages at those locations during Period 1. There were no corresponding station measurements available during Period 2, as carbonyls were only measured every 6 days at best.
There is a significant difference in the magnitudes of the measured ambient concentrations between the two periods of interest. Period 1 concentrations are much higher than those measured during Period 2. This is likely due to the significant influence of emissions from the industrial facilities at the northwest corner of the domain, which were upwind of the AML measurements during Period 1.
The cell-averaged ambient concentrations obtained from ~1 s frequency measurements by the AML were treated by the inverse model as constant during the data assimilation time window. The results of the MicroFACT modeling based on these observation-based inputs are discussed in the next section.
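The filtering and grid cell averaging of the ~1 s AML records described in this section can be sketched as follows. The record layout, the local coordinates x_m and y_m (meters from the southwest corner of the domain), and the function name are assumptions of this example.

```python
# Sketch: drop records dominated by individual vehicle plumes, then average
# the remaining measurements over 400 m model grid cells.
from collections import defaultdict

NO_CUTOFF_PPB = 150.0
CO_CUTOFF_PPB = 1000.0
DX_M = 400.0

def cell_averages(records, species):
    """Return {(ix, iy): mean concentration} for the chosen species."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        if rec["NO"] > NO_CUTOFF_PPB or rec["CO"] > CO_CUTOFF_PPB:
            continue                      # filtered from the analysis
        cell = (int(rec["x_m"] // DX_M), int(rec["y_m"] // DX_M))
        sums[cell] += rec[species]
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

records = [   # hypothetical measurements (ppb)
    {"x_m": 10, "y_m": 10, "NO": 20, "CO": 200, "HCHO": 2.0},
    {"x_m": 390, "y_m": 50, "NO": 30, "CO": 250, "HCHO": 4.0},
    {"x_m": 100, "y_m": 100, "NO": 200, "CO": 300, "HCHO": 9.0},
    {"x_m": 450, "y_m": 10, "NO": 10, "CO": 150, "HCHO": 1.0},
]
hcho_cells = cell_averages(records, "HCHO")
print(hcho_cells)
```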
4. Discussion
The HCHO emissions estimated in this study stand in sharp contrast to the corresponding emissions obtained or inferred from reports to the State of Michigan in 2017 by industrial facilities in the Dearborn Loop area.
Table 7 lists these facility emissions as recorded in the Michigan air emissions reporting system (MAERS), a public database maintained by EGLE [28]. The NOx, CO, and VOC emissions data in MAERS were the basis for the 2017 NEI data used to derive initial emissions estimates for the model. Note that the facilities in Table 7 are listed in order of their reported CO emissions. Except for Marathon Petroleum, these facilities have recorded HCHO emissions far below 1 tpy.
The underestimation of primary HCHO is caused by the corresponding emission factors in the USEPA’s AP-42 database [29] being out-of-date. However, in 2014 the USEPA made changes in VOC, NOx, and CO emission factors for flares and other units at refineries in response to legal action [30]. Moreover, emissions inventories usually do not include primary HCHO that is formed as a product of incomplete combustion (PIC) prior to release. As discussed by Olaguer et al. [6], these combustion emissions have been successfully measured in the field with contemporary ultraviolet/visible and/or infrared spectroscopic techniques. Such field measurements, however, have mostly not been incorporated into emission factor determinations for HCHO.
As already mentioned, an important PIC co-emitted with HCHO is CO, although CO is itself an active agent in reducing NOx [31]. The maximum elevated cell emissions of CO inferred by this study were 7103 tpy (=14.2 million lbs/year) in Period 1 and 7608 tpy (=15.2 million lbs/year) in Period 2, while the maximum ground-level cell emissions were 542 tpy and 491 tpy for Periods 1 and 2, respectively. The magnitudes of the inferred point source emissions suggest that some of the reported CO emissions in Table 7 may be overestimates, especially for the two steel mills. Even if this were true, the application of typical HCHO:CO ratios would still yield HCHO emission estimates well over 1 tpy for these sources.
For NO, the maximum elevated cell emissions for Periods 1 and 2 were 366 tpy (=731,051 lbs/year) and 64 tpy (=128,580 lbs/year), respectively, while the corresponding maximum ground-level cell emissions were 169 tpy in Period 1 and 77 tpy in Period 2. As was the case for CO, the inferred point source emissions for NO are generally lower than the reported values in MAERS.
The levels of HCHO emissions inferred by the MicroFACT model may be sufficient to explain the ambient HCHO concentrations consistently measured at the three EGLE stations in the Dearborn Loop over the last two decades, as well as the concentration gradient between the Dearborn and River Rouge sites measured in 2021 (see Figure 6). While only two wind directions over two 1 h periods were simulated in this study, it is not difficult to see how different wind directions over a long period would alternate between the high exposure scenario of Period 1 and the low exposure scenario of Period 2.
Figure 8 displays a wind rose based on measurements at DTW airport between 2018 and 2022. Northwesterly and northeasterly wind directions are prominent in the long-term data. In contrast, the dominant southwesterly wind direction should generate low HCHO exposures like those in Period 2, as the major emission sources would be downwind of or peripheral to most of the affected residential areas. Thus, the resulting long-term average HCHO exposures over the populated areas in the Dearborn Loop would likely be very similar to those indicated by the EGLE station measurements.
5. Summary and Conclusions
This study has demonstrated the feasibility and utility of inverse modeling for estimating widespread industrial emissions of chemically reactive compounds and quantifying cumulative exposure to HAPs with multiple primary and secondary sources. The results have practical implications for air quality policy and regulations because they identify a range of specific industrial sources that may be subject to controls to mitigate inhalation exposure to toxic HCHO and reduce the ambient concentrations and health impacts of criteria pollutants. These controls may include minimizing or improving the efficiency of flares, adding oxidation catalysts to stationary engines, and better detecting and capturing fugitive emissions.
The main conclusions drawn from this study are as follows:
(1) HCHO emissions from individual industrial facilities mainly representing the power, steel, coking, and waste treatment industries likely exceed 1 tpy.
(2) The average emission ratios of HCHO to CO in combustion sources emitting more than 1 tpy of HCHO are roughly 2 to 5%. This is consistent with known ratios from stack tests and other field measurements.
(3) Both elevated point sources and ground-level sources contribute significantly to ambient HCHO exposures.
(4) When winds favor the transport of emission plumes from the largest industrial sources towards the interior of the study area, widespread exposure to ambient concentrations of HCHO between 3 and 6 ppb may occur. Otherwise, ambient HCHO is mostly below 2 ppb (roughly the US national median) except in the vicinity or immediately downwind of large sources.
(5) Application of the MicroFACT model helped to explain the observed HCHO concentration gradient between the EGLE monitoring stations at Dearborn and River Rouge in 2021. The gradient arises because of the importance of primary HCHO on the local scale versus broader plumes of secondary HCHO, which would result in a more homogeneous distribution of ambient concentrations.
(6) Longer-term exposure to HCHO will result from a mix of favorable and unfavorable wind conditions, such that the measured long-term average HCHO concentrations at the three EGLE monitoring sites in the study area are likely good indicators of local population exposure.
In the future, the inverse model may be enhanced by utilizing automated techniques for estimating initial error covariances, such as pattern recognition (e.g., positive matrix factorization) or artificial intelligence, to distinguish source behaviors. This will minimize subjectivity and increase the accuracy of inverse modeling results. The cumulative impacts of HCHO including the enhancement of ambient ozone and PM2.5 would also be an important avenue for further investigation.