Development of a Health-Based Index to Identify the Association between Air Pollution and Health Effects in Mexico City

Health risks from air pollution continue to be a major concern for residents of Mexico City. These health burdens could be partially alleviated through individual avoidance behavior if accurate information on the daily health risks of multiple pollutants were available. In this study, a split-sample approach was used to create and validate a multi-pollutant, health-based air quality index. Poisson generalized linear models were used to assess the impacts of ambient air pollution (i.e., fine particulate matter (PM2.5), nitrogen dioxide (NO2), and ground-level ozone (O3)) on a total of 610,982 emergency department (ED) visits for respiratory disease. These visits were recorded as daily counts at 40 facilities in the metropolitan area of Mexico City from 2010 to 2015. Increased risk of respiratory ED visits was observed for interquartile increases in the 4-day average concentrations of PM2.5 (Risk Ratio (RR) 1.03, 95% CI 1.01–1.04), O3 (RR 1.03, 95% CI 1.01–1.05), and, to a lesser extent, NO2 (RR 1.01, 95% CI 0.99–1.02). An additive, multi-pollutant index was created using coefficients for these three pollutants. Positive associations of index values with daily respiratory ED visits were observed among children (ages 2–17) and adults (ages 18+). The use of previously unavailable daily health records made two things possible: an assessment of the effects of short-term ambient air pollution concentrations on respiratory morbidity in Mexico City, and the creation of a health-based air quality index that is now in use in Mexico City.


Introduction
Once one of the world's most polluted cities, Mexico City has made significant improvements in recent decades through targeted air quality management of fuels and industrial emissions [1,2]. However, the rate of air quality improvement has recently slowed. Air pollution-related health burdens persist among the city's residents because of rapid urban development and local topographical conditions that trap pollution in the Mexico City valley [3][4][5][6]. Beyond lowering pollutant concentrations through improved air quality management, adverse health outcomes could also be reduced through behavior modification (such as choosing to remain indoors on poor air quality days). Individuals need accurate information about the daily health risks of air pollution to make the best behavior modification decisions. To meet this need, the Marron Institute of Urban Management at New York University collaborated with Mexico City's Secretaría del Medio Ambiente (SEDEMA). Together they developed a rigorous health-based air quality index, based on local health statistics, for use as a communication tool in Mexico City.
Air quality indices communicate current air pollution conditions to the public. Their intention is to encourage individuals to change their behavior in ways that reduce adverse health outcomes in a given locale. Studies have shown increased public awareness of these tools in three groups in particular: people living in highly polluted regions, people with respiratory conditions, and people in regions where doctors have been trained to provide information about air quality indices [6][7][8][9]. Additionally, changes in behavior in response to index alerts have been observed in numerous locations [6,10,11]. These tools may provide immediate benefits that complement existing air quality management efforts, which take time to implement and often run into bureaucratic roadblocks.
Traditional risk communication tools, such as the U.S. Air Quality Index (AQI), have been constructed to emphasize extreme pollution episodes, highlighting days when pollution levels are unusually high and above regulatory levels [12][13][14]. Mexico has modeled its regulatory air quality standards on those of the U.S. It also uses a similar air quality index, based on single-pollutant concentrations, to communicate daily risk to the public [1]. However, these and similar tools are limited in their ability to capture the risks associated with lower levels of pollution, because by design anything below their regulatory levels is deemed "safe" [15]. In reality, strong evidence suggests that a large proportion of the health effects attributable to air pollution occur on days when exposures are below regulatory standards [16,17].
In contrast to indices based on federally mandated pollutant concentration limits, recent efforts have been made to develop health-based indices using risk ratios derived from the epidemiological literature. The first of these was Canada's Air Quality Health Index [18,19], which has served as a model for similar index designs in other countries [20][21][22]. Most health-based index designs incorporate a multipollutant model to better represent the reality of airshed mixtures and ambient exposures. They may also include multi-day lag structures to capture the full spectrum of short-term health impacts [19,23,24]. Existing health-based indices typically rely on mortality to determine risk communication messages, an outcome that may not best reflect the day-to-day needs of the general population. In contrast, respiratory morbidity is a relevant health outcome across a wider range of age categories (from children to the oldest adults) [25,26]. It is also the health endpoint most likely to drive individual behavior modification decisions [11,27–30]. Respiratory morbidity has also been shown to be the only health outcome improved through awareness and use of a health-based air quality index in Canada, even though that index was designed around short-term mortality risk. Chen et al. (2018) examined a population-based ten-year cohort in Toronto and found that only asthma-related emergency department visits showed significant reductions in association with air quality alerts. The six other cardiovascular and respiratory health endpoints, including mortality, showed no association with index values. A similar study in Santiago, Chile did observe reductions in mortality associated with air quality alerts. However, that city frequently experiences severe pollution episodes that are uncommon in Toronto, which suggests that mortality is a less useful endpoint in moderate to low pollution settings [31].
In this study, the associations between pollutant concentrations and respiratory morbidity outcomes were examined in Mexico City and used to produce a multipollutant, health-based air quality index. To produce a rigorous index suitable for communication to the general public, three goals were set at the start of the study. First, because pollutants affect different age groups to different extents, the index should accurately predict respiratory outcomes for both children and adults. Second, the index should include at least three ambient air pollutants. Indices that rely too heavily on a single pollutant cannot accurately capture the overall health risk to a population exposed to many different pollutants each day. Finally, the index should show a generally normal distribution to allow for effective risk communication, particularly at relatively low levels of pollution.

Materials and Methods
Hourly and daily pollution monitoring datasets for 2010 to 2015 were obtained from SEDEMA for all available monitors in Mexico City. The individual pollution variables were aggregated into daily exposure variables at health-relevant averaging times:
- the 24-h average for fine particulate matter (PM2.5, µg/m³);
- the maximum 8-h average for ground-level ozone (O3, ppb);
- the 1-h maximum for nitrogen dioxide (NO2, ppb).

Monitors used in the primary health analysis were selected based in part on a low number of missing monitoring days per season and on spatial representation of the metropolitan area. A cut-point of 70% of days with valid monitoring data per season, prior to data imputation, was used as a screening criterion for inclusion. Missing values were imputed with multivariate imputation by chained equations (MICE), using predictive mean matching to impute the non-normally distributed pollution data. In total, six PM2.5 monitors, ten O3 monitors, and five NO2 monitors met the inclusion criteria (see Appendix A). All imputations were completed using R [32,33].
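The daily averaging times and the seasonal screening rule described above can be sketched in plain Python. This is a minimal illustration, not SEDEMA's or the authors' processing code. Only the 70% seasonal cut-point comes from the text. Three elements are assumptions: the minimum-valid-hour rules (18 of 24 hours for the daily mean, 6 of 8 hours for each ozone window), the restriction of 8-h windows to hours within the same day, and the function names. MICE imputation is not reproduced.

```python
from statistics import mean
from typing import Optional, Sequence

Hours = Sequence[Optional[float]]  # one day of hourly values; None marks a missing hour


def pm25_24h_mean(hours: Hours, min_valid: int = 18) -> Optional[float]:
    """24-h average; min_valid (18 of 24 hours) is an assumed completeness rule."""
    valid = [v for v in hours if v is not None]
    return mean(valid) if len(valid) >= min_valid else None


def o3_max_8h_mean(hours: Hours, min_valid: int = 6) -> Optional[float]:
    """Maximum running 8-h average, using only windows that lie within the day."""
    window_means = []
    for start in range(len(hours) - 7):
        window = [v for v in hours[start:start + 8] if v is not None]
        if len(window) >= min_valid:  # assumed rule: at least 6 of 8 hours valid
            window_means.append(mean(window))
    return max(window_means) if window_means else None


def no2_1h_max(hours: Hours) -> Optional[float]:
    """Highest single hourly value of the day."""
    valid = [v for v in hours if v is not None]
    return max(valid) if valid else None


def passes_screening(daily: Sequence[Optional[float]], cutoff: float = 0.70) -> bool:
    """Criterion from the text: at least 70% of days in a season have valid data."""
    if not daily:
        return False
    return sum(v is not None for v in daily) / len(daily) >= cutoff
```

Screening is applied to the raw daily series before imputation, so a monitor that fails `passes_screening` for a season would be excluded rather than filled in.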
Meteorological variables from the MER air quality station (see Figure A1) were also used in the analysis to control for the effects of temperature and relative humidity. Both have been shown to be associated with respiratory health outcomes and with daily pollution concentrations [34][35][36]. Daily 24-h average temperature and relative humidity values were used in the primary health analysis. A sensitivity analysis using daily maximum temperature and relative humidity did not change the results.
Daily health data were available for the years 2010-2015 in the metropolitan area of Mexico City, obtained through a data sharing agreement between SEDEMA and the city's Ministry of Health. Prior to this study, only weekly numbers had been made available to researchers for analysis. Without this newly acquired daily health data, this analysis and development of a health-based air quality index would not have been possible.
Respiratory emergency department (ED) visits were defined as:
- upper respiratory infections (ICD-10 codes J00-J06);
- asthma (J45-J46);
- chronic obstructive pulmonary disease (COPD) (J44);
- pneumonia (J12-J18);
- acute lower respiratory infections (J20-J22);
- chronic lower respiratory disease (J40-J42, J47);
- other respiratory illness (J30-J39).

Daily respiratory ED counts were calculated for ages 2-17 years, 18+ years, and a combined category of all ages. A total of 610,982 respiratory ED visits were reported from 40 facilities during the study period, and approximately 80% of these came from a subset of 17 facilities. Full descriptive statistics by age group and year are shown in Table 1. Respiratory ED visits, rather than respiratory hospital admissions, were used as the primary measure of population-level morbidity because they provided nearly 20 times as many events per day. A sensitivity analysis that combined daily respiratory hospital admissions with respiratory ED visits did not modify the results of the study. The study period was divided a priori into even and odd years so that independent health data were available for the creation and for the validation of the health-based air quality index, consistent with previous work published by Perlmutt.
Poisson generalized linear models were used to assess the associations of individual air pollutants with respiratory ED visits in Mexico City. Such models are well suited to daily count time series with nonlinear trends and are widely used to analyze the health impacts of air pollution. Quasi-likelihood estimators were used to account for overdispersion of the data [38]. Model selection, including the number of degrees of freedom used for natural splines, was based on two criteria: Akaike information criterion (AIC) scores, and the inclusion of variables associated with both air pollution concentrations and the health outcomes of interest [39][40][41].
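The outcome definition and the split-sample design can be illustrated with a small Python sketch. The ICD-10 ranges are taken from the definition above. The function names are hypothetical, and codes are assumed to arrive as strings such as "J45.9". Which half of the years is used for construction and which for validation is an assumption made only for illustration.

```python
from typing import Optional

# ICD-10 three-character ranges (inclusive), taken from the outcome definition in the text.
RESPIRATORY_GROUPS = [
    ("upper respiratory infection", 0, 6),          # J00-J06
    ("pneumonia", 12, 18),                          # J12-J18
    ("acute lower respiratory infection", 20, 22),  # J20-J22
    ("other respiratory illness", 30, 39),          # J30-J39
    ("chronic lower respiratory disease", 40, 42),  # J40-J42
    ("COPD", 44, 44),                               # J44
    ("asthma", 45, 46),                             # J45-J46
    ("chronic lower respiratory disease", 47, 47),  # J47
]


def respiratory_category(icd10: str) -> Optional[str]:
    """Map a code such as 'J45.9' to its outcome group, or None if not included."""
    code = icd10.strip().upper()
    if len(code) < 3 or code[0] != "J" or not code[1:3].isdigit():
        return None
    number = int(code[1:3])
    for name, low, high in RESPIRATORY_GROUPS:
        if low <= number <= high:
            return name
    return None  # e.g. J43 or J60-J99 fall outside the study definition


def sample_half(year: int) -> str:
    """A priori split by calendar year; assigning even years to construction is assumed."""
    return "construction" if year % 2 == 0 else "validation"
```

Daily counts would then be tallied per age group over visits whose `respiratory_category` is not `None`, separately for each half of the study period.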
The primary time-series model for each individual air pollutant used non-linear terms to control for long-term and seasonal trends, day of the week, and same-day and multi-day lagged meteorological variables, as shown in Equation (1):
$$
\begin{aligned}
\text{Daily respiratory ED visits} = {} & \text{pollutant concentration} + \text{day of week (6 df)} + \text{length of study period (24 df)} \\
& + \text{same-day temperature (3 df)} + \text{lag days 1-3 temperature (3 df)} \\
& + \text{same-day relative humidity (3 df)} + \text{lag days 1-3 relative humidity (3 df)}
\end{aligned}
\tag{1}
$$
Natural splines were used for all variables other than day of the week, with the indicated number of degrees of freedom (df). A sensitivity analysis used alternative degrees of freedom, chosen as those with the next-lowest AIC values. It indicated that the health results were not substantially changed.
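Below is a minimal sketch of Equation (1) as a quasi-Poisson regression. It is written with NumPy only so that every step is visible, and it is not the authors' R code. The natural spline basis uses the truncated-power construction, with boundary knots at the data range and interior knots at quantiles as R's `ns()` places them. R parameterizes the same spline space with B-splines instead. The model is fit by iteratively reweighted least squares, and standard errors are inflated by the Pearson dispersion estimate. Day of week is assumed to be coded 0-6, and the helper names are hypothetical.

```python
import numpy as np


def ns_basis(x, df):
    """Natural cubic spline basis (truncated-power form), df columns, no intercept."""
    x = np.asarray(x, dtype=float)
    x = (x - x.min()) / (x.max() - x.min())            # rescale to [0, 1] for stability
    knots = np.quantile(x, np.linspace(0, 1, df + 1))  # 2 boundary + (df - 1) interior knots

    def d(k):
        return (np.clip(x - knots[k], 0, None) ** 3
                - np.clip(x - knots[-1], 0, None) ** 3) / (knots[-1] - knots[k])

    last = len(knots) - 2
    return np.column_stack([x] + [d(k) - d(last) for k in range(last)])


def design_matrix(pollutant, dow, time, temp0, temp13, rh0, rh13):
    """Columns of Equation (1): intercept, pollutant (linear), day-of-week dummies, splines."""
    dow = np.asarray(dow)
    dummies = np.column_stack([(dow == d).astype(float) for d in range(1, 7)])  # 6 df
    return np.column_stack([
        np.ones(len(dow)), np.asarray(pollutant, dtype=float), dummies,
        ns_basis(time, 24), ns_basis(temp0, 3), ns_basis(temp13, 3),
        ns_basis(rh0, 3), ns_basis(rh13, 3),
    ])


def fit_quasi_poisson(y, X, n_iter=50, tol=1e-10):
    """Log-link Poisson GLM by IRLS; SEs scaled by the Pearson dispersion (quasi-Poisson)."""
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())              # column 0 is assumed to be the intercept
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu             # working response
        xtw = X.T * mu                      # X'W with W = diag(mu)
        new_beta = np.linalg.solve(xtw @ X, xtw @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    mu = np.exp(X @ beta)
    n, p = X.shape
    phi = np.sum((y - mu) ** 2 / mu) / (n - p)   # overdispersion estimate
    cov = phi * np.linalg.inv((X.T * mu) @ X)
    return beta, np.sqrt(np.diag(cov)), phi
```

In practice the pollutant column would hold the lag structure under study (for example, the lag 0-3 average). `beta[1]` and `se[1]` would then feed the relative-risk calculation for an interquartile increase.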
Associations between pollutant concentrations and respiratory ED visits were assessed for individual lag days 0-5. They were also assessed for multi-day average lag structures (e.g., lag days 0-3) formed within the same 6-day exposure window. Relative risks and 95% confidence intervals (CI) are reported for an interquartile range (IQR) increase in each air pollutant. All analyses were completed using R [32].
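The lag structures and interquartile scaling can be made concrete with a short Python sketch. Lag k refers to the concentration k days before the visit day. The enumeration below assumes that the multi-day averages are formed from consecutive lag days (for example 0-3 or 3-5), which matches the examples reported in the Results, and the helper names are hypothetical. For a log-linear coefficient β, the relative risk per IQR increase is exp(β × IQR). The interval shown is a standard Wald-type interval, exp((β ± 1.96 SE) × IQR). Quartile conventions differ between software packages, so IQR values may not match R's exactly.

```python
import math
from statistics import quantiles


def lag_windows(max_lag=5):
    """All single-day and consecutive multi-day lag windows within lags 0..max_lag."""
    return [(a, b) for a in range(max_lag + 1) for b in range(a, max_lag + 1)]


def lag_average(series, start, end, t):
    """Mean of lag days start..end relative to day t; None if the window precedes the data."""
    if t - end < 0:
        return None
    window = series[t - end: t - start + 1]
    return sum(window) / len(window)


def iqr(values):
    """Interquartile range using the 'inclusive' quartile method."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def rr_per_iqr(beta, se, iqr_width, z=1.96):
    """Relative risk and Wald-type 95% CI for an IQR increase in concentration."""
    return (math.exp(beta * iqr_width),
            math.exp((beta - z * se) * iqr_width),
            math.exp((beta + z * se) * iqr_width))
```

With a 6-day window this yields 21 candidate lag structures (six single days plus fifteen multi-day averages), each of which would be fit with the model in Equation (1).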
A health-based air quality index including PM2.5, O3, and NO2 was created using coefficients from the individual pollutant models, with the effects of the individual pollutants treated as additive. Daily index values were estimated using the coefficients derived for each pollutant in the primary analysis (see Appendix B). These daily values were then related to population-level respiratory morbidity using a model similar to that described for the individual pollutants. This step validated how well the index represents population-level health risks. A more detailed description of how to calculate daily index values is given in Appendix C.
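As a reading aid, the additive structure above can be written compactly, together with the zero-floor and scaling steps described in Appendix C. The per-pollutant excess-risk form $e^{\beta_p \bar{C}_{p,t}} - 1$ is an assumption borrowed from the form used in Canada's Air Quality Health Index. It is also assumed that $\bar{C}_{p,t}$ is the lag 0-3 average concentration, because the Appendix C coefficients come from lag 0-3 models. The flow chart in Figure A2, not this sketch, defines the actual calculation.

$$
\mathrm{ER}_t = \sum_{p \in \{\mathrm{PM}_{2.5},\, \mathrm{O}_3,\, \mathrm{NO}_2\}} \max\!\left(0,\; e^{\beta_p \bar{C}_{p,t}} - 1\right),
\qquad
I_t = I_{\max} \, \frac{\mathrm{ER}_t}{S}
$$

Here $\beta_p$ is the coefficient for pollutant $p$, $S$ is the scaling value, and $I_{\max}$ is the chosen maximum index value (10 in Appendix C). Because $S$ and $I_{\max}$ only rescale $\mathrm{ER}_t$, they change how values are presented but not how days are ranked by risk.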

Results
Significant associations between increased air pollution exposures and increased counts of daily respiratory ED visits were observed across multiple pollutants, age ranges, and lag days. A complete listing of relative risks by lag structure and age group for all three pollutants is given in Table 2. The coefficients and standard errors used to calculate these relative risks are given in Appendix B for the same age groups, lag structures, and pollutants. Figure 1 shows the relative risk of respiratory ED visits for an interquartile increase in PM2.5 concentrations. Significant associations are observed across multiple individual lag days for both children (ages 2-17 years) and adults (ages 18+ years), with maximum relative risks around lag days 2 and 3 in both age groups. The average of lag days 0-3 captures this window. It indicates a relative risk of 1.03 (95% CI: 1.01-1.04) per interquartile increase in PM2.5 concentrations among individuals of all ages. This effect is slightly more pronounced in adults than in children, but effect sizes are highly similar on a per-unit basis.
Exposures to increased levels of ambient O3 were also significantly associated with respiratory ED visits in Mexico City during the study period. Figure 2 shows the relative risks for children, adults, and all ages for an interquartile increase in O3 concentrations. Unlike for PM2.5, the peak impact of O3 appears to occur primarily on lag day 1 among adults and on lag days 1 and 2 among children. A four-day moving average of lag days 0-3 captures this window and indicates a relative risk of 1.03 (95% CI: 1.01-1.05) among individuals of all ages. The effects of PM2.5 were similar among children and adults. For an interquartile increase in ambient O3, by contrast, the effect size among adults is more than twice as large as among children.
As shown in Figure 3, associations with respiratory ED visits were not as consistent for NO2 as they were for PM2.5 and O3. Among adults, no individual lag day showed a statistically significant association with increased respiratory morbidity risk during the study period. Among children, there were significant or nearly significant positive associations between NO2 and respiratory ED visits at lag days 1 and 4, and non-significant positive associations on other lag days. The associations for NO2 were less likely to be significant than those for PM2.5 and O3. In addition, the effect size among individuals of all ages was approximately one third that of the other pollutants.
The results of the validation of the index constructed from daily concentrations of PM2.5, O3, and NO2 are shown in Figure 4. Significant associations were observed for the 6-day moving average of lag days 0-5, with similar effect sizes for children and adults. The relative risks and confidence intervals are shown by age group in Table 3.
Figure 4. Risk ratios of respiratory ED visits in Mexico City corresponding to one interquartile increase in health-based index values, by lag structure and age group. The primary exposure window of interest is the 6-day average of lag days 0-5. The index values are significantly associated with population-level respiratory risk for both children and adults over the multi-day window of health impacts observed for the underlying individual pollutants. Lag days 0-2 and lag days 3-5 are shown because they represent the most extreme differences in results between age groups. Other lag structures (e.g., lag days 1-3) are significantly associated with health risks in both populations at similar levels of relative risk.
Table 3. Risk ratios for respiratory emergency department visits in Mexico City associated with health-based index values, by age group and lag structure. The primary exposure period of interest is the 6-day average of lag days 0-5.
Significant associations were observed for both children and adults during the critical time period in which health effects were observed across the individual pollutants evaluated in this study. There are, however, notable differences between the two age groups in the timing of significant effects. The most extreme examples are presented in Figure 4 and shown in more detail in Table 3. At the population level, adults showed significant associations with adverse respiratory outcomes immediately following exposure (i.e., lag days 0-2) but not at later lag periods (i.e., lag days 3-5).
The opposite was true for children, who continued to experience adverse health impacts of exposure to elevated levels of air pollution 3-5 days following exposure. However, the lack of positive associations among children at lag days 0-2 should be interpreted with caution given that the non-significant association is driven entirely by a lack of effect observed at lag day 0, a finding that was also consistently observed in the individual pollutant results. These values were specifically selected to show the most dramatic differences in effects observed by age group. Other groupings of lag days (e.g., lag days 1-3, etc.) show significant associations for population-level health risks among both children and adults with similar magnitudes of relative risks (t-statistic for children at lag days 1-3 = 2.55, t-statistic for adults = 2.49).

Discussion
An ideal health-based air pollution index is capable of easily and accurately communicating the daily health risks of outdoor air pollution exposures to the public. The index should take into account the effects of multiple pollutants at both high and relatively low concentrations and be able to represent risks that occur across broad age ranges in order to be meaningful for the general population. Beyond these general goals, there was no pre-determined combination of pollutants stipulated for inclusion in the generation of this study's final index model.
The inability to detect a stronger NO2 effect, especially among adults, is likely due to increased exposure misclassification when central site monitors are used to estimate population-level health effects. NO2 concentrations are much higher near major roads and during commute times [42][43][44][45]. Central site measurements of NO2 are therefore likely not capturing the true exposures of affected populations. Despite this limitation, the coefficient for NO2 associations with population-level respiratory morbidity was used in creating the index, in the absence of more precise exposure estimates for NO2.
Like many existing health-based risk communication approaches, this index was designed to consider the multi-day effects that have been consistently observed to be associated with air pollution exposures rather than a same-day, rolling hourly exposure to air pollution [46]. It is also agnostic towards existing regulatory limits or recommended standards which are considered in some air quality indices (e.g., AQI in the US and the health-based index in Hong Kong) [14,47]. Rather, it was built to consider observable population-level health risks and is created using coefficients developed specifically for Mexico City. It is possible that a generic health-based index using coefficients derived from a variety of locales could be developed [37,48], but this approach was not tested in this study.
Unlike existing health-based risk communication indices for air pollution [19][20][21][31], however, this index was built specifically to consider the respiratory morbidity risks of air pollution rather than mortality risks. It is also unique in providing individuals with reliable information not just on high pollution days but also on days typically described as having good or moderate levels of air pollution. Susceptible individuals already experience adverse health effects at these lower concentrations [16,17,49]. Until now, however, they have lacked information that could inform daily behavior modification decisions. Nevertheless, it is not recommended that this index replace existing mechanisms that trigger required actions based on categories of outdoor pollution levels, such as school closures when air quality exceeds a specific level on the existing index. These existing mechanisms are well suited to reducing continued emissions of pollutants and to providing broad-based guidance for reducing exposures [7,9,11,50]. Instead, this study's index should serve as a health-focused supplement that individuals can use to inform behavior modification decisions, alongside the effective regulatory actions that are already in place.
It is also recommended that communication of health-based air pollution index values avoid strict cut-points in both visual and descriptive dissemination of information. Existing communication approaches rely heavily on strict cut-points in messaging about outdoor air pollution levels. These cut-points have little scientific basis and do not reflect the individual heterogeneity of effects across healthy individuals. They reflect it even less among individuals with increased susceptibility, whom this index is specifically designed to help [51]. Designated alert levels may also produce information overload, reducing individual behavior change when a level is breached over multiple days [52,53]. Rather than specifying categories of health risk using cut-points, it is preferable to identify categories of air pollution levels (e.g., days with relatively low, typical, or relatively high pollution levels in the context of what is commonly observed in the region). Colors may add to the effectiveness of communicating the index in this regard. If colors are used, they should reflect the continuous, non-threshold scale of health risks accompanying index values. The index values will be most effective when communicated consistently, so that susceptible individuals can learn the level at which they might want to change their behavior to reduce personal exposure to outdoor air pollution. Therefore, while the choice of scaling value and maximum index value is flexible, it should not be modified once individuals begin to adapt to the new index values.
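One way to follow this advice is to label a day relative to the region's own historical distribution of index values and to color it on a continuous scale rather than in fixed bands. The sketch below is an illustration built on several assumptions. The tercile split, the label wording, the green-to-red gradient, and the function names are arbitrary choices, not recommendations from this study.

```python
from bisect import bisect_right
from statistics import quantiles

LABELS = ("relatively low", "typical", "relatively high")


def relative_label(value, history, labels=LABELS):
    """Describe a day's value relative to historical index values (tercile split assumed)."""
    cuts = quantiles(history, n=len(labels), method="inclusive")
    return labels[bisect_right(cuts, value)]


def continuous_color(index_value, max_index=10.0):
    """Hex color interpolated linearly from green (0) to red (max_index), with no steps."""
    fraction = min(max(index_value / max_index, 0.0), 1.0)  # values above max stay red
    red = round(255 * fraction)
    green = round(255 * (1.0 - fraction))
    return f"#{red:02x}{green:02x}00"
```

The label describes how unusual the pollution level is, while the color changes smoothly with the index value. Neither implies a threshold below which exposure is "safe".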
Many important decisions regarding the spatial and temporal resolution of the index values will need to be made in order to best communicate the health risks of ambient outdoor air pollution in Mexico City. It is not recommended that these values be combined with real-time personal monitoring of air pollutants, because the index was developed from longer pollutant averaging times measured at central site monitors. Instead, using rolling rather than daily pollutant concentrations, with the same averaging times as in the study, may allow for "real-time" reporting of index values. While this approach best supports the science behind the index, the resources available to local air quality managers must also be considered in order to encourage the most consistent risk communication to the general public. In considering these issues, we have recommended that the way daily temperatures are reported serve as a guide for communicating the health risks of air pollution through the index values. In particular, this may mean emphasizing forecasted index values so that susceptible individuals can plan their personal behaviors. It may also mean allowing the public to learn for themselves the levels at which they will start to take specific actions to reduce exposures.
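A rolling implementation could recompute the study's averaging times every hour over the trailing 24 hours. The Python sketch below assumes complete hourly data with the most recent hour last, and its function name is hypothetical. Handling of missing hours, and of the multi-day lag window behind the coefficients, is left out.

```python
from typing import Sequence, Tuple


def trailing_24h_metrics(
    pm25: Sequence[float], o3: Sequence[float], no2: Sequence[float]
) -> Tuple[float, float, float]:
    """Study averaging times computed over the most recent 24 hourly values of each pollutant."""
    for series in (pm25, o3, no2):
        if len(series) < 24:
            raise ValueError("need at least 24 hourly values per pollutant")
    pm_last, o3_last, no2_last = pm25[-24:], o3[-24:], no2[-24:]
    pm25_24h_mean = sum(pm_last) / 24
    # 8-h running means whose windows lie entirely inside the trailing 24 hours.
    o3_max_8h = max(sum(o3_last[i:i + 8]) / 8 for i in range(17))
    no2_max_1h = max(no2_last)
    return pm25_24h_mean, o3_max_8h, no2_max_1h
```

Calling this once per hour as new readings arrive gives values on the same averaging times as the daily inputs used in the study.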
In addition to working with traditional media outlets and developing web-based and mobile-based communication tools, it may be advisable to specifically train primary health care providers in the utilization and interpretation of air pollution index values. Previous research has shown that this is a viable mechanism for informing the public of air pollution indices, particularly for individuals with preexisting respiratory diseases [6,8].
This approach may also provide an accelerated path towards targeting individuals in the population who are most susceptible, and thus most likely to benefit from this tool.
Finally, special attention should be paid to environmental justice and health literacy issues when considering how the information can best be communicated to the public. This is especially true because socioeconomic status affects both susceptibility to the health risks of air pollution and the ways in which information is most frequently obtained [54][55][56][57]. The greatest mitigation of adverse public health risks associated with daily air pollution exposures will come from two steps: considering relevant environmental justice issues in communicating this index, and maximizing the ability of all individuals to access reported index values [52].

Conclusions
Air pollution is significantly associated with respiratory morbidity outcomes in Mexico City. Using newly available daily health data, locally derived coefficients for three major air pollutants were estimated and used in the design of a health-based air quality index. In conjunction with forecasted pollutant concentrations, daily index values can effectively communicate daily health risks to individuals of all ages across a wide range of ambient air quality conditions. The construction and validation of a health-based index was the goal of this research. The ultimate intent, however, is an index that not only accurately communicates daily health risks but is also adopted by the public. This requires outreach and advocacy by local government, as well as by local public health and environmental organizations. Local air quality managers are advised to communicate these values to the public in a way that reflects the non-threshold health risks associated with various air pollution levels. Additionally, health care providers may be a key channel for distributing index information. Special care should be taken to ensure that the tool is distributed equitably across income and education levels. Moreover, expansion to the public requires mobile apps, websites, or other risk communication methods that give the general public access to current index values. SEDEMA has already adopted such measures in Mexico City using the index developed in this study (http://aire.cdmx.gob.mx/conoce-tu-numero/, accessed on 9 March 2021), and has worked with medical practitioners to expand outreach efforts to the public.
Although this research focused on the construction and validation of a health-based air quality index, SEDEMA continues to translate it into practice. This enables the citizens of Mexico City to make informed decisions about when to modify their outdoor activities, reduce their exposure to air pollution, and potentially reduce respiratory morbidity.
Data Availability Statement: Restrictions apply to the availability of these data. Health data used in this study's analysis were obtained through a data sharing agreement between SEDEMA and the Mexico City Ministry of Health and are unavailable due to patient privacy restrictions.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix C. Calculating Daily Health-Based Index Values
A summary of the methods used to calculate daily index values is shown in the flow chart in Figure A2. Of particular note is the averaging time to be used for each pollutant, along with the accompanying coefficients derived from the time-series analysis.
Figure A2. A guide to calculating a daily air pollution index in Mexico City. The coefficients provided correspond to lag days 0-3 associations between the individual pollutants and respiratory ED visits for all ages. The provided scaling value corresponds to the maximum daily excess risk observed from 2010 to 2015. This value can be modified as desired in order to re-scale index values. Similarly, step 4 shows the creation of daily index values that range from 0 to 10. Alternative ranges of values can be used if a maximum value of 10 is not desired. Daily excess risk may exceed the scaling value provided, resulting in an index value greater than the selected maximum value.
The precise values of these coefficients are less important than their ratios. The ratios indicate the greater importance of PM2.5 and O3, compared with NO2, when computing index values. These coefficients were derived from the lag 0-3 associations among individuals of all ages. Using slightly different coefficients from other age groupings or lag structures would not be expected to alter the validation of the index, as long as the ratios between the pollutants remained the same.
Note that the calculated excess risk for an individual pollutant may be negative on a given day. In these circumstances, it is essential that the value be set to zero when calculating the combined daily excess risk, as shown in Step 1 of the flow chart in Figure A2. Failure to do so will result in index values that do not accurately reflect population-level risks.
As identified in Step 3 of the flow chart, an initial scaling value has been provided that corresponds to the maximum excess risk observed during the study period. Local staff can change this value according to their priorities and preferences, but once selected it should not be modified. This value, in conjunction with the desired maximum index value, determines how the index values are scaled for communication purposes. It does not change the ability of the index to represent health risks. The daily excess risk may exceed the selected scaling value, in which case the calculated index value will be greater than the maximum index value. This can easily be planned for when deciding how the index is communicated to the public.
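The Appendix C steps can be sketched in Python as follows. This is a hypothetical illustration, not a reproduction of the flow chart in Figure A2. The excess-risk form exp(β·C) - 1, the function names, and the coefficient, concentration, and scaling values in the usage lines are all assumptions. The actual coefficients are in Appendix B and are not reproduced here.

```python
import math
from typing import Mapping


def excess_risk(beta: float, concentration: float) -> float:
    """Assumed per-pollutant excess risk: exp(beta * C) - 1 (AQHI-style form)."""
    return math.exp(beta * concentration) - 1.0


def daily_index(
    concentrations: Mapping[str, float],
    coefficients: Mapping[str, float],
    scaling_value: float,
    max_index: float = 10.0,
) -> float:
    """Combine per-pollutant excess risks into a scaled daily index value."""
    # Step 1: negative per-pollutant excess risks are set to zero before summing.
    combined = sum(
        max(0.0, excess_risk(beta, concentrations[name]))
        for name, beta in coefficients.items()
    )
    # Steps 3-4: divide by the scaling value and stretch onto 0..max_index.
    # Days whose combined excess risk exceeds scaling_value score above max_index.
    return max_index * combined / scaling_value


# Hypothetical coefficients (per unit concentration) and lag 0-3 average concentrations.
coefs = {"pm25": 0.0020, "o3": 0.0015, "no2": 0.0005}
conc = {"pm25": 25.0, "o3": 40.0, "no2": 45.0}
print(round(daily_index(conc, coefs, scaling_value=0.25), 2))
```

Keeping the zero floor inside the sum mirrors Step 1. Without it, a pollutant with a negative excess risk on some day would cancel part of the risk contributed by the others.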