Air Pollution, Socioeconomic Status, and Avoidable Hospitalizations: A Multifaceted Analysis

Minutti-Martinez, Carlos; Mata-Rivera, Miguel F.; Arellano-Vazquez, Magali; Escalante-Ramírez, Boris; Olveres, Jimena

doi:10.3390/mca30040069


Air Pollution, Socioeconomic Status, and Avoidable Hospitalizations: A Multifaceted Analysis^†

by Carlos Minutti-Martinez ¹, Miguel F. Mata-Rivera ²,*, Magali Arellano-Vazquez ¹, Boris Escalante-Ramírez ³ and Jimena Olveres ³

¹ INFOTEC, Centro de Investigación e Innovación en Tecnologías de la Información y Comunicación, Aguascalientes 20326, Mexico

² UPIITA, Instituto Politécnico Nacional, Mexico City 07340, Mexico

³ CECAv, Centro de Estudios en Computación Avanzada, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico

* Author to whom correspondence should be addressed.

† This is a revised and extended version of the paper published in Minutti-Martinez, C.; Mata-Rivera, M.F.; Arellano-Vazquez, M.; Escalante-Ramírez, B.; Olveres, J. Air Pollution, Socioeconomic Status, and Avoidable Hospitalizations in Mexico City: A Multifaceted Analysis. In Proceedings of the Mexican International Conference on Artificial Intelligence, Tonantzintla, Mexico, 21–25 October 2024.

Math. Comput. Appl. 2025, 30(4), 69; https://doi.org/10.3390/mca30040069

Submission received: 16 April 2025 / Revised: 22 June 2025 / Accepted: 24 June 2025 / Published: 30 June 2025

(This article belongs to the Special Issue New Trends in Computational Intelligence and Applications 2024)


Abstract

This study investigates the combined effects of air pollution and socioeconomic factors on disease incidence and severity, addressing gaps in prior research that often analyzed these factors separately. Using data from 86,170 hospitalizations in Mexico City (2015–2019), we employed multivariate statistical methods (PCA and factor analysis) to construct composite measures of social and economic status and grouped correlated pollutants. Logistic and negative binomial regression models assessed their associations with hospitalization risk and frequency. Results showed that economic status significantly influenced diabetes complications, while social factors affected prenatal care-related diseases and hypertension. The $PM_{10}$–$PM_{2.5}$–CO group increased the incidence of asthma, influenza, and epilepsy, whereas $NO_{2}$–$NO_{x}$ impacted diabetes complication severity and influenza. Nonlinear effects and interactions (e.g., age and weight) were also identified, highlighting the need for integrated analyses in environmental health research.

Keywords:

avoidable hospitalizations; ambulatory care sensitive conditions; diabetes; air pollution; socioeconomic status; environmental health; epidemiology; risk factors; Mexico City

1. Introduction

Avoidable hospitalizations (AHs), also known as ambulatory care sensitive conditions (ACSCs), represent hospital admissions that could potentially be prevented through timely and effective outpatient care. These conditions include chronic illnesses such as diabetes, asthma, and congestive heart failure, as well as acute conditions such as pneumonia and complicated appendicitis. When primary care is effective, it can help prevent or manage these conditions, thereby reducing the need for hospitalization [1,2,3].

As a key indicator of primary care quality, avoidable hospitalizations frequently result from inadequate or delayed community-based care. Their occurrence underscores the critical need for improved care coordination, enhanced preventive services, and better disease management strategies across healthcare settings [1,2].

Research consistently demonstrates that lower socioeconomic status (SES) is associated with an elevated risk of avoidable hospitalizations, conditions that could have been prevented through timely outpatient care [4,5,6]. The combined effect of individual-level household income and neighborhood-level material deprivation on hospitalization risk proves particularly significant, with individuals residing in low-income neighborhoods experiencing the highest risk. While the precise mechanisms remain incompletely understood, experts believe factors such as limited healthcare access, health behaviors, and health outcomes likely play important roles [4].

In urban environments, lower-SES populations usually face disproportionate exposure to higher air pollution levels, contributing to increased mortality risks from all causes, including respiratory conditions [7,8]. Although extensive epidemiological research on air pollution recognizes this disparity, significant gaps remain in understanding the optimal methods for adjusting for SES confounding and the potential biases arising from improper adjustment [8].

Recent research has increasingly linked air pollution to neurological outcomes. A large population-based study in Ontario, Canada [9], reported that long-term exposure to fine particulate matter ($PM_{2.5}$) was associated with a 5.5% increased risk of developing epilepsy, while ozone exposure was linked to a 9.6% increase. Other research suggests that air pollutants affect the number of pediatric patients in the emergency department with epilepsy attacks [10]. Proposed mechanisms include the entry of pollutants into the bloodstream and central nervous system via the lungs, potentially crossing the blood–brain barrier and triggering neuroinflammatory processes [11]. However, findings across studies are not entirely consistent, particularly with regard to specific pollutants.

The high correlation between social and economic factors presents analytical challenges in determining their relative contributions. Consequently, SES analyses frequently focus primarily on income or residential area while neglecting other crucial factors like educational attainment, potentially oversimplifying complex socioeconomic relationships.

International comparisons reveal considerable variation in avoidable hospitalization rates, with Mexico showing distinct patterns. For instance, asthma admission rates vary 12-fold across OECD countries, with Mexico, Italy, and Colombia reporting the lowest rates, while Latvia, Turkey, and Poland report rates more than twice the OECD average [12].

Understanding avoidable hospitalizations in specific populations requires careful consideration of contextual factors including SES, healthcare access, and health behaviors. By studying these populations in detail, policymakers can identify necessary changes to address avoidable hospitalizations, ultimately improving healthcare outcomes while reducing costs.

Analyzing trends in avoidable hospitalizations by clinical condition helps inform healthcare policy and resource allocation by identifying increasing or decreasing rates over time. This examination can reveal important patterns and correlations between specific conditions and hospitalization rates, guiding targeted interventions. Additionally, it can help identify high-risk groups based on age, sex, or SES.

In this comprehensive study, we investigate the principal risk factors for avoidable hospitalizations in Mexico City, examining their relationship with SES and air pollution, identifying high-risk groups, and tracking temporal changes. Our analysis of 86,170 patient records from 2015 to 2019 employed negative binomial regression, logistic regression, and Gradient Boosting Machine (GBM) models to account for nonlinearity and interactions between variables. We included SES and air pollution as key risk factors, along with relevant covariates such as locality, age, sex, weight, and admission date.

To carefully examine SES effects, we generated a composite indicator using factor analysis (FA) that provides a more nuanced assessment of each economic and social factor’s contribution, thereby better capturing their interrelation. For air pollution, we constructed indexes through Principal Component Analysis (PCA) to properly account for spatial pollutant concentration correlations across localities. We applied an iterative algorithm specifically tailored to this problem to obtain relevant factors, systematically eliminating non-significant variables while penalizing the simultaneous inclusion of highly correlated variables to reduce multicollinearity and interpretation problems.

Our results demonstrate that different aspects of the composite SES indicator influence the incidence of various avoidable hospitalization categories, while environmental air pollution affects both the incidence and severity of hospitalizations. In particular, we identified significant interactions and nonlinear effects between variables, findings that can directly inform prevention efforts and public policy aimed at reducing avoidable hospitalizations.

2. Materials and Methods

To determine the relevance of air pollution (AP) and socioeconomic status (SES) to the leading causes of hospitalization, we matched each patient’s locality of residence with complementary datasets to estimate the corresponding AP and SES indexes. Our analysis incorporated multiple confounding factors including sex, age, weight, access to social security, municipality of residence, and admission date (months 1–60), among others.

We estimated severity through two measures: the number of hospitalization days and mortality occurrence. For each variable, we assessed its contribution using an iterative algorithm designed to address the problem of multicollinearity. Unlike traditional Forward–Backward Selection algorithms [13], which do not account for correlations between variables, our approach systematically eliminates and includes variables based on the Akaike Information Criterion (AIC) while penalizing correlation to maximize model interpretability. Additionally, to explore potential nonlinearity and variable interactions, we utilized relative feature importance derived from the Gradient Boosting Machine (GBM) model. All variables were scaled to a 0–1 range to facilitate comparison.
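The 0–1 scaling mentioned above can be sketched as follows. This is a minimal illustration that assumes column-wise min-max scaling (the text does not say which rescaling was used); the function name `min_max_scale` and the demo values are hypothetical.

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column of X to the [0, 1] range (assumed min-max scaling)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Constant columns would divide by zero; map them to 0 instead.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (X - col_min) / safe_range

# Hypothetical rows: age (years), weight (kg).
demo = np.array([[20.0, 50.0], [40.0, 70.0], [60.0, 110.0]])
scaled = min_max_scale(demo)
```

With all predictors on the same range, regression coefficients can be compared directly as effect magnitudes, which is the stated purpose of the scaling.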

2.1. Data Sources

We integrated three primary data sources:

Hospitalizations: The Mexico City Ministry of Health (SEDESA) provided anonymized data from public hospitals under CONACYT project 7051. This comprehensive dataset included patient information such as:

Demographic characteristics (age, weight, sex);
Geographic origin;
Hospitalization indicators;
Health services entitlement status;
Admission and discharge dates;
Duration of hospitalization;
Medical conditions;
Locality of residence;
International Classification of Diseases (ICD) codes for:
– Initial diagnosis;
– Primary condition;
– Cause of death (when applicable).

Air Pollutant Concentrations: We obtained AP measures from Mexico City’s Automatic Air Quality Monitoring Network [14]. For each monitoring station, we calculated 15-year averages (2005–2020) for concentrations of $PM_{10}$, $PM_{2.5}$, $CO$, $NO_{X}$, $NO_{2}$, $SO_{2}$, $NO$, and $O_{3}$.

Using kriging interpolation and QGIS, we estimated mean concentrations for each patient’s locality based on centroid coordinates.

Census of Population and Housing: We incorporated official 2020 Mexico Census data containing detailed housing and population variables at the locality level, which served as the foundation for constructing our SES indicators.
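To make the kriging step for pollutant concentrations more concrete, the sketch below implements ordinary kriging at a single target point with NumPy. It is only an illustration: the study used QGIS, and the exponential variogram, its parameters (`sill`, `range_`, `nugget`), the station coordinates, and the concentration values are all hypothetical.

```python
import numpy as np

def exp_variogram(h, sill=1.0, range_=10.0, nugget=0.0):
    """Exponential semivariogram; parameters are illustrative, not fitted."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + sill * (1.0 - np.exp(-h / range_))
    return np.where(h == 0, 0.0, gamma)

def ordinary_kriging(stations, values, target):
    """Predict the value at `target` from station coordinates and values."""
    stations = np.asarray(stations, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)
    # Pairwise station distances and station-to-target distances.
    d = np.linalg.norm(stations[:, None, :] - stations[None, :, :], axis=2)
    d0 = np.linalg.norm(stations - target, axis=1)
    # Kriging system; the extra row and column force the weights to sum to 1.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = exp_variogram(d0)
    weights = np.linalg.solve(A, b)[:n]
    return float(weights @ values), weights

# Hypothetical stations on a 10 km square and a locality centroid at its center.
stations = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]]
pm10 = [40.0, 55.0, 35.0, 60.0]
centroid_pred, centroid_w = ordinary_kriging(stations, pm10, [5.0, 5.0])
station_pred, _ = ordinary_kriging(stations, pm10, [0.0, 0.0])
```

Because the centroid is equidistant from all four stations, the weights are equal and the prediction is the plain average; at a station location the predictor returns the station's own value, since kriging without a nugget is an exact interpolator.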

2.2. Socioeconomic Status and Air Pollution Factors

Census data are widely employed for constructing neighborhood-level composite SES indicators, typically using Principal Component Analysis (PCA) or Factor Analysis (FA) to weight each variable’s contribution [15,16].

In this study, we derived SES indicators using FA to detect more nuanced economic and social dimensions (F_ECONOM and F_SOCIAL) in the data, resulting in indicators where higher values represent less favorable circumstances, which can be interpreted as economic and social lag indicators. We validated these factors by regression analysis against established indices including:

Social Lag Index (SLI) [17];
Social Development Index (SDI) [18];
Human Development Index (HDI) [19].

These analyses resulted in coefficients of determination ($R^{2}$) exceeding 0.9, indicating strong concordance.

The economic factor (F_ECONOM) is most strongly influenced by housing-related census variables such as:

Number of dwellings with latrine;
Single-room dwellings;
Dwellings with earthen floors.

In contrast, the social factor (F_SOCIAL) is strongly influenced by variables such as:

Average number of live-born children;
Average educational attainment;
Affiliation with different health services.

Additional details on the development of the SES indicators are presented in Appendix A.2.

Regarding air pollution, Mexico City’s complex terrain significantly influences local meteorology and atmospheric pollutant behavior, resulting in spatially correlated AP patterns. Failure to account for this spatial correlation could lead to biased effect estimates. Therefore, we constructed pollution factors by grouping geographically correlated pollutants. Using PCA, we identified three distinct pollutant groups based on spatial concentration patterns:

PM_CO ($PM_{10}$, $PM_{2.5}$, and $CO$);
NO2_NOx ($NO_{X}$ and $NO_{2}$);
SO2_NO_O3 ($SO_{2}$, $NO$, and $O_{3}$).

Additional details on the development of the air pollutant factors are presented in Appendix A.1.
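One way to see how PCA can group spatially correlated pollutants is sketched below. It assumes a simple rule (assign each pollutant to the principal component on which it has the largest absolute loading) and uses synthetic data generated from three hypothetical latent patterns; the study's actual procedure is described in its Appendix A.1 and may differ. The function name `pca_groups` and all data are illustrative.

```python
import numpy as np

def pca_groups(X, names, n_components):
    """Group variables by the principal component with their largest |loading|."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    # eigh returns ascending eigenvalues; keep the largest n_components.
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    assigned = np.argmax(np.abs(loadings), axis=1)
    groups = {}
    for name, comp in zip(names, assigned):
        groups.setdefault(int(comp), []).append(name)
    return groups, loadings

# Synthetic locality-level concentrations driven by three latent spatial patterns.
rng = np.random.default_rng(0)
n = 2000
f1, f2, f3 = rng.normal(size=(3, n))
noise = rng.normal(size=(8, n))
X = np.column_stack([
    f1 + 0.2 * noise[0], f1 + 0.2 * noise[1], f1 + 0.2 * noise[2],
    f2 + 0.3 * noise[3], f2 + 0.3 * noise[4],
    f3 + 0.6 * noise[5], f3 + 0.6 * noise[6], f3 + 0.6 * noise[7],
])
names = ["PM10", "PM2.5", "CO", "NOx", "NO2", "SO2", "NO", "O3"]
groups, loadings = pca_groups(X, names, n_components=3)
```

On real data the components are rarely this clean, and a rotation step or inspection of the loadings is often needed before assigning variables to groups.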

2.3. Analytical Models

For each patient, we measured hospitalization severity through a composite indicator that combined mortality occurrence and hospitalization duration. Specifically, the severity Y for patient i was defined such that:

$Y_{i} > 0.5$ indicated mortality, with values approaching 1 representing faster mortality (greater severity).
$Y_{i} < 0.5$ indicated survival, with values approaching 0 representing shorter hospital stays (lower severity).

This formulation can alternatively be interpreted as a classification problem with high-severity (death) and low-severity (non-death) classes, weighted to account for extreme cases (see [20] for more details on severity estimation).
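The exact severity formula is given in [20] and is not reproduced here. Purely to illustrate the two properties listed above, the sketch below uses a hypothetical exponential mapping from length of stay to a score in [0, 1]; the function `severity_score` and its `scale` parameter are assumptions, not the study's definition.

```python
import math

def severity_score(days, died, scale=30.0):
    """Hypothetical severity score; NOT the formula used in the study."""
    decay = math.exp(-days / scale)
    if died:
        # Death: score in (0.5, 1], closer to 1 when death occurs sooner.
        return 0.5 + 0.5 * decay
    # Survival: score in [0, 0.5), closer to 0 for shorter stays.
    return 0.5 * (1.0 - decay)

fast_death = severity_score(1, died=True)
slow_death = severity_score(20, died=True)
short_stay = severity_score(1, died=False)
long_stay = severity_score(20, died=False)
```

Any monotone mapping with the same ordering would serve the illustration equally well; the point is only that death and survival occupy opposite halves of the scale.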

A comprehensive catalog of Avoidable Hospitalization for Ambulatory Care Sensitive Conditions using ICD-10 codes was taken from [21]. Table 1 presents the 14 categories with sufficient data for the severity analysis, showing case counts by locality and admission date.

Two primary model types were developed: models estimating monthly locality-specific hospitalization counts for specific conditions, and models estimating hospitalization severity.

For hospitalization count modeling at the locality level, we included total locality population (POBTOT) as the primary expected predictor. Additionally, we considered population proportions by age group (0–2 years, 18–24 years, and 60+ years), population density (POB_AREA), male–female ratio (REL_H_M), and SES and AP factors described previously.

At the patient level, we considered the municipality of residence (E_MUN_XXXXX), admission date (ADM_DATE, months 1–60), and month of admission (MONTH).

For hospitalization count modeling, following the meta-analysis by Wallar et al. [22] which concluded that negative binomial regression is most appropriate for this type of data, we employed negative binomial regression as our primary count model.
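As a reading aid, one common way to write a negative binomial count model is the NB2 form with a log link. The text does not state which parameterization or link was used, so the notation below ($Y_{lt}$ for the count in locality $l$ and month $t$, mean $\mu_{lt}$, dispersion $\alpha$, predictors $x_{k,lt}$) is an assumption made for illustration.

$$
Y_{lt} \sim \mathrm{NB}(\mu_{lt}, \alpha), \qquad
\log \mu_{lt} = \beta_0 + \sum_{k} \beta_k x_{k,lt}, \qquad
\mathrm{Var}(Y_{lt}) = \mu_{lt} + \alpha \mu_{lt}^{2}
$$

In this form the variance exceeds the mean whenever $\alpha > 0$, which is why the model suits overdispersed counts, and it reduces to Poisson regression as $\alpha \to 0$.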

For the prediction of mortality during hospitalization, we selected logistic regression based on its established suitability for clinical outcomes (see [23]). At the patient level, we included potentially relevant severity predictors such as age (AGE), weight (WEIGHT), sex (SEX_M: 1 = male, 0 = female), and origin (PROCED: 1 = external, 2 = emergency, 3 = referred, 4 = other, 9 = unspecified).

At the locality level, we incorporated municipality of residence (E_MUN_XXXXX) and admission date (ADM_DATE, months 1–60).

In both model types, municipality of residence proved particularly relevant as different municipalities may have varying hospital infrastructure, health policies, or other unmeasured factors that could spuriously correlate with SES or AP exposure.

We implemented a systematic variable selection algorithm that began by identifying the 10 variables most strongly correlated with the outcome. The procedure iteratively removed variables with the weakest contribution to model fit, evaluated using the AIC, while penalizing the inclusion of highly correlated variables. Specifically, the final model was selected to minimize $AIC + \lambda \cdot r$, where $\lambda$ is a penalty parameter and $r$ represents the maximum absolute correlation between included variables. This approach balances goodness-of-fit with reduced multicollinearity, enhancing both model interpretability and robustness.
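The penalized selection criterion can be sketched as a greedy backward elimination. This is a simplified illustration under stated assumptions: it uses an ordinary least-squares fit with a Gaussian AIC as a stand-in for the negative binomial and logistic models, it only removes variables (the study's algorithm also re-includes them), and the value `lam=50.0`, the function names, and the synthetic data are hypothetical.

```python
import numpy as np

def gaussian_aic(X, y):
    """AIC of a least-squares fit with intercept (stand-in for the study's models)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]

def max_abs_corr(X):
    """Largest absolute pairwise correlation among columns (0 if fewer than 2)."""
    if X.shape[1] < 2:
        return 0.0
    c = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(c, 0.0)
    return float(c.max())

def penalized_score(X, y, cols, lam):
    """AIC + lam * r for the model restricted to the columns in `cols`."""
    sub = X[:, cols]
    return gaussian_aic(sub, y) + lam * max_abs_corr(sub)

def select_variables(X, y, lam=50.0, n_start=10):
    # Start from the n_start columns most correlated with the outcome.
    corr_y = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    cols = sorted(np.argsort(corr_y)[::-1][:n_start].tolist())
    best = penalized_score(X, y, cols, lam)
    while len(cols) > 1:
        # Try dropping each column; keep the drop that lowers the score the most.
        trials = [(penalized_score(X, y, [c for c in cols if c != d], lam), d)
                  for d in cols]
        score, drop = min(trials)
        if score >= best:
            break
        best, cols = score, [c for c in cols if c != drop]
    return cols, best

# Synthetic check: column 1 nearly duplicates column 0, column 3 is pure noise.
rng = np.random.default_rng(1)
n = 500
x0 = rng.normal(size=n)
x1 = x0 + 0.1 * rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
X_demo = np.column_stack([x0, x1, x2, x3])
y_demo = 2.0 * x0 + 1.0 * x2 + rng.normal(size=n)
selected, score = select_variables(X_demo, y_demo, lam=50.0)
```

In the synthetic example the correlation penalty makes it costly to keep both near-duplicate columns, so only one of them survives alongside the independent signal column.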

While conventional modeling methods often struggle with high-dimensional relationships, advanced machine learning techniques like Gradient Boosting Machine (GBM) models have demonstrated superior performance in medical predictive analytics compared to traditional statistical models (e.g., Kong et al. [24]). We employed GBM to automatically account for nonlinear confounding effects and interactions, estimating variable effects, and exploring complex relationships difficult to detect with classical models.

For robust validation, we reserved 15% of the records in each category as a holdout set, ensuring that our predictions were non-random and that variable importance measurements had genuine predictive value. The holdout set also allows the two model types to be compared, showing where modeling nonlinearity and interactions increases accuracy.
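A per-category 15% holdout can be sketched as below. The function name `holdout_mask_by_category`, the fixed seed, and the category counts are hypothetical; the text does not specify how records were sampled within each category.

```python
import numpy as np

def holdout_mask_by_category(categories, holdout_frac=0.15, seed=0):
    """Boolean mask that is True for held-out records, sampled within each category."""
    categories = np.asarray(categories)
    rng = np.random.default_rng(seed)
    mask = np.zeros(len(categories), dtype=bool)
    for cat in np.unique(categories):
        idx = np.flatnonzero(categories == cat)
        n_hold = int(round(holdout_frac * len(idx)))
        # Sample without replacement so no record is held out twice.
        mask[rng.choice(idx, size=n_hold, replace=False)] = True
    return mask

# Hypothetical category labels with unequal sizes.
cats = np.array(["DC"] * 200 + ["ASTH"] * 100 + ["I&P"] * 60)
mask = holdout_mask_by_category(cats)
```

Sampling within each category keeps the holdout proportion the same for rare and common conditions, so validation metrics are comparable across categories.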

3. Results

Our analysis revealed several key findings regarding relevant factors for each model. We report standardized coefficients and 95% confidence intervals to facilitate interpretation of effect magnitudes and associated uncertainty. If predictor variables do not appear in the results tables, it is because they were either weakly associated with the outcome or highly correlated with more influential variables, and were therefore excluded during model selection based on relevance and multicollinearity considerations.

For visualization, red colors in figures represent effects that increase hospitalizations or severity, while blue indicates protective effects; 95% Confidence Intervals (CIs) are also included. In GBM models, we present the normalized Gini importance for the top 10 influential variables.

Figure 1 displays the estimated risk factors associated with diabetes complications for both hospitalization frequency and severity, along with their 95% confidence intervals. Patient weight emerged as one of the most significant factors increasing severity, while $NO_{2}$ and $NO_{x}$ pollutants also showed substantial effects. For the number of hospitalizations, total locality population (POBTOT) showed the largest effect as expected, but economic status (F_ECONOM) demonstrated a similarly strong association where localities with less favorable economic conditions had a higher number of hospitalizations.

Figure 2 displays the estimated risk factors associated with influenza and pneumonia for both hospitalization frequency and severity, along with their 95% confidence intervals. Patient age has a significantly larger effect on severity than the other variables, followed by patient weight, while $NO_{2}$ and $NO_{x}$ pollutants also showed substantial effects. For the number of hospitalizations, the population aged 65 years or older (POB65_MAS) and exposure to $SO_{2}$, $NO$, and $O_{3}$ have similar effects, both increasing the number of hospitalizations. Exposure to PM and CO is also related to an increase in the number of hospitalizations.

Figure 3 presents some results of the GBM model for diabetes complications that illustrate the interactions between variable pairs and their effects on the number of hospitalizations and severity. These plots highlight the importance of analyzing interactions and nonlinear effects. For example, while admission date showed no statistical significance in regression analysis, GBM revealed a nonlinear pattern where cases increased until month 30 then decreased, a pattern that could yield non-significant linear effects despite meaningful temporal variation. Similarly, examining the social–economic factor relationship showed that economic factors are more relevant, but unfavorable social conditions amplified effects when combined with poor economic status.

The severity analysis in Figure 3 shows both linear age effects and nonlinear age-weight interactions. Severity peaked for older patients with either high or low weight. While logistic regression identified $NO_{2}$ and $NO_{x}$ effects, GBM additionally revealed severity increases when these pollutants co-occurred with PM and CO exposure.

Together, these figures demonstrate how both modeling approaches can identify significant risk factors while GBM provides additional insight into complex nonlinear relationships and interactions.

Table 2 and Table 3 present variables of interest related to the number of hospitalizations, as identified by the regression and GBM models, respectively. Similarly, Table 4 and Table 5 show the corresponding results for hospitalization severity. Table 2 and Table 4 report estimated effects and 95% confidence intervals for the variables retained in the final regression models, while Table 3 and Table 5 display normalized Gini importance scores from the GBM models, with bold values indicating variables that were also retained in the regression models. Comparing these results provides complementary insights. While Gini importance does not indicate the direction of association, high importance scores for variables not selected in the regression models may reflect nonlinear relationships or interactions not captured by the linear specification.

The results of the negative binomial regression (Table 2) confirm that the total population (POBTOT) has the strongest effect in all categories, as expected. GBM results (Table 3) similarly identify POBTOT as the most important. For admission date (ADM_DATE), only ear, nose, and throat infections (EN&T INFEC) showed significant linear effects (decreasing over time), but GBM revealed substantial nonlinear effects for diabetes (DC), angina (ANG), and COPD, suggesting an inverted U-shaped temporal pattern as shown in Figure 3, that linear models might miss.

Economic status (F_ECONOM) significantly affected only diabetes complications (DC) in regression, with GBM confirming DC as the most impacted category. For social status (F_SOCIAL), prenatal delivery-related conditions (DPCPD) showed the strongest effect, with less favorable status increasing hospitalizations. Hypertension (HYPERT) also showed increased hospitalizations for less favorable F_SOCIAL values.

Air pollution groups showed category-specific effects:

PM_CO ($PM_{10}$, $PM_{2.5}$, CO) increased hospitalizations for:
– Influenza and pneumonia (I&P);
– Gastroenteritis (GASTRO);
– Ear, nose, and throat infections (EN&T INFEC);
– Asthma (ASTH);
– Epilepsy (EPILEP).
SO2_NO_O3 significantly affected I&P.

GBM importance scores align with these findings, showing PM_CO as the most influential pollutant group, particularly for asthma and ENT infections.

For hospitalization severity (Table 4 and Table 5), age and weight showed the strongest effects. Age significantly affected almost all categories, with influenza and pneumonia showing the largest effect. Weight most strongly impacted diabetes (DC) and hypertension (HYPERT). Although weight was not selected as a relevant variable in many regression models, GBM showed high importance across multiple categories, likely reflecting inverted U-shaped relationships where both high and low weights increase severity (e.g., for asthma). Sex differences emerged for pyelonephritis (riskier for males) and heart failure (riskier for females).

Asthma showed reduced severity with unfavorable economic status and PM_CO exposure. These results may reflect a mixture of more exposure to these pollutants in higher SES areas, and also a possible survivor bias where severely affected individuals cannot reside in highly polluted areas.

For the NO2_NOx group of pollutants, exposure shows increased severity for diabetes complications as well as influenza and pneumonia.

Table 6 presents the performance of regression (negative binomial and logistic) and GBM models to predict the number and severity of hospitalizations in the validation dataset. Severity can be interpreted as a weighted binary prediction, where more severe cases have a greater weight in the loss function; therefore, AUC values can be estimated. The correlation between predicted and ground truth is used as a performance metric for the number of hospitalizations and the AUC value for severity. In both cases, values closer to 1 are preferable, and bold values indicate the best model on each task and category. We observed that for the number of hospitalizations, the GBM model tends to perform better across most categories, while for severity, regression tends to have better results in many cases. However, in terms of average AUC value, GBM performs slightly better.
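The two performance metrics can be sketched as follows: a Pearson correlation for predicted versus observed counts, and a weighted AUC in which each positive-negative pair contributes the product of the two case weights. The weighting scheme used in the study is not given here, so the `weights` argument and all demo values are hypothetical.

```python
import numpy as np

def pearson_r(pred, obs):
    """Pearson correlation between predicted and observed counts."""
    return float(np.corrcoef(pred, obs)[0, 1])

def weighted_auc(y_true, scores, weights=None):
    """Weighted probability that a positive case outranks a negative case."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    w = np.ones(len(scores)) if weights is None else np.asarray(weights, dtype=float)
    pos_s, neg_s = scores[y_true], scores[~y_true]
    pos_w, neg_w = w[y_true], w[~y_true]
    # Compare every positive with every negative; ties count as half a win.
    wins = (pos_s[:, None] > neg_s[None, :]) + 0.5 * (pos_s[:, None] == neg_s[None, :])
    pair_w = pos_w[:, None] * neg_w[None, :]
    return float((wins * pair_w).sum() / pair_w.sum())

# Hypothetical outcomes (1 = death) and predicted severity scores.
y = [1, 1, 0, 0]
s = [0.9, 0.4, 0.5, 0.1]
auc_plain = weighted_auc(y, s)
auc_weighted = weighted_auc(y, s, weights=[2.0, 1.0, 1.0, 1.0])
r_counts = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

In the demo, three of the four positive-negative pairs are ranked correctly, giving an unweighted AUC of 0.75; doubling the weight of the correctly ranked first case raises the weighted value to 5/6.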

These regression models, which include only variables selected for their relevance and avoid simultaneously including highly correlated predictors, exhibit performance comparable to the more complex GBM models while offering greater interpretability.

4. Discussion

Our multifaceted analytical approach, combining composite SES/AP indicators with advanced modeling techniques, reveals complex interactions between risk factors for avoidable hospitalizations in Mexico City. By distinguishing economic versus social SES dimensions and their differential health impacts, we provide nuanced insights for targeted public health interventions.

The identified nonlinear effects and interactions—particularly between age and weight—highlight limitations of conventional regression approaches. These complex relationships explain why some factors (e.g., admission date) showed significance in GBM but not regression models, underscoring the value of sophisticated analytical methods in epidemiological research.

Diabetes complications exemplify the need for more complex research that accounts for nuanced studies of air pollution and SES. Our results show that the economic component of SES is the main contributor to increased hospitalizations, which is important for better-targeted care campaigns. Additionally, populations more exposed to $NO_{2}$ should be more aware of a higher risk of severe hospitalization when it occurs.

Asthma is another example of how the complex interaction between SES and air pollution should be considered to better understand the effects of both factors on disease outcomes. Table 2 shows that the most significant variables predicting the number of hospitalizations due to asthma in a location are its population and, more importantly, exposure to PM and CO. Regarding hospitalization severity (Table 4), however, we observed that better SES could lead to higher severity, which might seem to contradict previous research where low SES is associated with more severe asthma (e.g., [25]). In the context of Mexico City, our results (see Appendix A.3) show that higher exposure to PM, CO, and $NO_{2}$ is related to higher SES (consistent with existing research, e.g., [26]), and PM and $NO_{2}$ are related to higher risks of asthma (see [27]). Therefore, this interaction between SES and air pollution can result in the compensatory effects observed in severity. The existence of complex interactions can be inferred from the large performance difference between regression and GBM models (Table 6), as GBM can better model nonlinear effects and interactions.

The higher severity observed in women with heart failure (Table 4) is consistent with recent findings. Lu et al. [28] reported that women hospitalized for heart attacks are less likely to receive key interventions such as cardiac catheterization, percutaneous coronary intervention (PCI), and coronary artery bypass grafting (CABG) compared to men, contributing to higher mortality rates among women. Similarly, Ezekowitz et al. [29] found that hospital mortality rates were notably higher in women than in men for ST-segment elevation myocardial infarction (STEMI), with 9.4% mortality in women versus 4.5% in men, and 4.7% versus 2.9% for non-STEMI (NSTEMI). Given the gender differences observed in our analysis for Mexico City, it may be valuable to investigate whether similar disparities in treatment access or quality exist locally, where targeted changes in patient management could potentially reduce excess mortality.

The higher severity found in men compared to women for pyelonephritis also aligns with prior studies. Kim et al. [30] reported an in-hospital mortality rate of 1.5 per 1000 episodes of pyelonephritis in men, compared to 0.5 per 1000 in women. Severe pyelonephritis in men is also associated with higher rates of complications such as renal abscesses, which are rare in females. Experimental models suggest that androgens (male hormones) may enhance the severity of urinary tract infections, including pyelonephritis, in males [31].

Although SES is usually studied as a mixture of economic and social factors, our results show the need for more nuanced analysis. For example, in the correlations of variables (Appendix A.3), even though economic and social factors are highly correlated, our results show that air pollutants are mainly related to the economic aspects of SES. In addition, for Mexico City, different pollutants behave differently in relation to SES. More favorable SES scores are related to higher exposure to PM, CO, and $NO_{2}$, while lower SES is associated with higher exposure to $SO_{2}$ and $O_{3}$. Like Mexico City, many cities could have unique and more complex situations where these interactions between SES and air pollution should be considered to properly address public policy and prevention, underscoring the need for more research in regions with different circumstances than those typically studied.

Regarding model performance, although some interpretability is possible in GBM by studying the Gini importance of variables, the magnitude and direction of effects can be more difficult to assess, and multicollinearity issues remain, with the possibility of wrongly assessing the magnitude of effects that could be diluted across other variables. Therefore, if prediction accuracy is the main objective, GBM could be preferable, but when studying the effects of different factors on diseases, our results show that regression models can be highly interpretable while still maintaining competitive performance. However, comparing both types of models can be beneficial for discovering unknown nonlinear effects and interactions between variables.

Although changes in patient management over time could influence trends in avoidable hospitalizations, such information was not directly available in the dataset. However, our analysis did not reveal consistent temporal effects across most conditions. In particular, the variable representing admission date (ADM_DATE) was not retained in the final regression models for the majority of disease categories, suggesting limited or no measurable shifts in hospitalization patterns during the study period. An exception was observed for ear, nose, and throat infections, where a downward trend was detected. These findings may indicate overall stability in care delivery or access for most conditions, although unobserved factors such as policy changes or protocol updates cannot be ruled out. Future research could benefit from incorporating hospital-level or policy implementation data to further investigate temporal changes in care practices.

Our findings advance understanding of how air pollution and SES jointly influence both AH incidence and severity, a key improvement over previous studies examining these factors separately [22,32]. Notably, SES factors primarily affected hospitalization frequency (especially for chronic conditions like diabetes), while air pollution impacted both incidence and severity (e.g., diabetes complications and influenza). This pattern suggests socioeconomic factors influence long-term health behaviors and preventive care access, while pollution has both acute and chronic health effects.

5. Conclusions

Our analysis of 86,170 hospitalizations in Mexico City (2015–2019) yields several important conclusions with both scientific and policy implications:

5.1. Key Findings

SES effects are multidimensional: Economic and social components of SES showed distinct health impacts, with:
– Economic factors strongly influencing diabetes complications.
– Social factors more relevant for prenatal conditions and hypertension.
Pollution effects are pollutant-specific: Different pollutant groups affected:
– Incidence (${PM}_{10}$–${PM}_{2.5}$–CO group): Asthma, influenza, epilepsy.
– Severity (${NO}_{2}$–${NO}_{x}$ group): Diabetes complications, influenza.
Complex interactions exist: Notable nonlinear relationships were found for:
– Age–weight interactions in disease severity.
– Temporal patterns in hospitalization rates.
– SES–pollution interactions.

5.2. Policy Implications

These findings suggest several targeted intervention strategies:

Pollution control:
– Priority reduction of ${PM}_{2.5}$ and CO in areas with high asthma rates.
– ${NO}_{2}$ mitigation near diabetes treatment centers.
Healthcare interventions:
– Economic support programs for diabetes management.
– Social support initiatives for maternal health.
Monitoring and research:
– Enhanced surveillance in high-risk populations.
– Further study of neurological effects of air pollution.

5.3. Methodological Contributions

Our study demonstrates the value of:

Combining traditional and machine learning approaches.
Developing composite SES indicators.
Analyzing pollutant groups rather than individual species.
Examining both incidence and severity outcomes.

These findings significantly advance our understanding of the complex interplay between environmental and social determinants of health in urban populations. The methodologies developed here can be applied to other cities facing similar public health challenges, while the specific results provide actionable insights for improving population health in Mexico City.

Author Contributions

Conceptualization, C.M.-M.; data curation, C.M.-M.; formal analysis, C.M.-M.; investigation, C.M.-M.; methodology, C.M.-M.; project administration, M.F.M.-R.; resources, M.F.M.-R.; supervision, M.F.M.-R., B.E.-R. and J.O.; writing—original draft, C.M.-M.; writing—review and editing, C.M.-M., M.F.M.-R., M.A.-V., B.E.-R. and J.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The data analysis was conducted using anonymized data, with no possibility of identifying individual patients. Data were extracted from the administrative databases of the Ministry of Health of Mexico City in accordance with CONACYT project 7051. To ensure confidentiality and anonymity, the Ministry of Health provided data with all direct identifiers removed. Written informed consent for participation was not required, in accordance with national legislation. Access to the data was provided from the start of the research project on 15 July 2021.

Data Availability Statement

The data supporting the study findings are the responsibility of the Ministry of Health of Mexico City (SEDESA). Thus, restrictions apply to the availability of these data, which were used under license for the CONACYT project 7051 and are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of SEDESA. The code used to generate the SES indicators, contaminant factors, and predictive models is openly available at https://github.com/cminuttim (accessed on 16 April 2025).

Acknowledgments

The authors thank the Mexican National Council of Science and Technology (CONACYT), which made this research possible through project 7051, “Data observatory for discoveries of social-spatial-temporal patterns in health, mobility and air quality”, and the Ministry of Health of Mexico City (SEDESA) for providing their data and knowledge. This article is a revised and expanded version of a paper entitled “Air Pollution, Socioeconomic Status, and Avoidable Hospitalizations in Mexico City: A Multifaceted Analysis”, which was presented at the “6th Workshop on New Trends in Computational Intelligence and Applications” (CIAPP 2024), as part of MICAI 2024, INAOE, Puebla, Mexico, on 22 October 2024 [33]. During the preparation of this manuscript, the authors used ChatGPT-4 and Claude 3 for grammar and style correction purposes. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ACSCs	Ambulatory Care Sensitive Conditions
AHs	Avoidable Hospitalizations
AIC	Akaike Information Criterion
AP	Air Pollution
AUC	Area Under the ROC Curve
CO	Carbon Monoxide
COPD	Chronic Obstructive Pulmonary Disease
DC	Diabetes Complications
FA	Factor Analysis
GBM	Gradient Boosting Machine
HDI	Human Development Index
ICD	International Classification of Diseases
NO	Nitric Oxide
NO₂	Nitrogen Dioxide
NOₓ	Nitrogen Oxides
O₃	Ozone
PCA	Principal Component Analysis
PM₂.₅	Fine Particulate Matter (<2.5 μm)
PM₁₀	Coarse Particulate Matter (<10 μm)
SEDESA	Secretaría de Salud de la Ciudad de México
SES	Socioeconomic Status
SLI	Social Lag Index
SDI	Social Development Index
SO₂	Sulfur Dioxide

Appendix A. Statistical Summary

Appendix A.1. Air Pollution Factors

Mexico City’s complex terrain, surrounded by mountains, significantly influences local meteorology and pollutant behavior. The region’s wind patterns play a crucial role in determining air quality and pollutant distribution patterns [34].

The high correlation between certain air pollutants presents challenges for robust modeling due to potential multicollinearity issues. To address this, we employed principal component analysis (PCA) to group commonly co-occurring pollutants while preserving maximal information. Figure A1 displays the PCA biplot, clearly illustrating both pollutant groupings and their spatial distribution patterns.

We conducted additional PCAs for each identified group to determine individual pollutant weights, resulting in the following final pollutant factors:

$$
\begin{aligned}
\mathrm{PM\_CO} &= 0.35 \cdot {PM}_{10} + 0.39 \cdot {PM}_{2.5} + 0.26 \cdot CO \\
\mathrm{NO2\_NOx} &= 0.54 \cdot {NO}_{x} + 0.46 \cdot {NO}_{2} \\
\mathrm{SO2\_NO\_O3} &= 0.35 \cdot {SO}_{2} + 0.33 \cdot NO + 0.32 \cdot {O}_{3}
\end{aligned}
$$
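
As a minimal sketch of how these weighted sums could be applied, the Python snippet below computes the three factors for a single locality. The weights are copied from the equations above. The function name `pollutant_factors`, the dictionary layout, and the input values are hypothetical, and the snippet assumes the pollutant concentrations have already been standardized (for example, as z-scores) so that they are on a comparable scale.

```python
# Weights copied from the pollutant-factor equations in Appendix A.1.
POLLUTANT_WEIGHTS = {
    "PM_CO": {"PM10": 0.35, "PM2.5": 0.39, "CO": 0.26},
    "NO2_NOx": {"NOx": 0.54, "NO2": 0.46},
    "SO2_NO_O3": {"SO2": 0.35, "NO": 0.33, "O3": 0.32},
}


def pollutant_factors(standardized):
    """Return the three composite pollutant factors for one locality.

    `standardized` maps pollutant name -> standardized concentration.
    Assumption (not stated in the source): inputs are z-scores.
    """
    return {
        factor: sum(weight * standardized[pollutant]
                    for pollutant, weight in weights.items())
        for factor, weights in POLLUTANT_WEIGHTS.items()
    }


# Hypothetical z-scores for one locality (illustrative values only).
example = {
    "PM10": 1.0, "PM2.5": 0.5, "CO": -0.2,
    "NOx": 0.3, "NO2": 0.1,
    "SO2": -1.0, "NO": 0.0, "O3": 0.8,
}

factors = pollutant_factors(example)
for name, value in factors.items():
    print(f"{name}: {value:.3f}")
```

Because each set of weights sums to one, every factor is a weighted average of the pollutants in its group.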

Figure A1. Principal component analysis of air pollutants across localities in the Mexico City Metropolitan Area, showing pollutant groupings and spatial distribution patterns.

Appendix A.2. Socioeconomic Factors

Table A1 presents the variables used to construct our composite socioeconomic status (SES) indicator, while Figure A2 shows the factor loadings from the factor analysis. The social component primarily captures variables related to education (average schooling level), reproductive health (average number of live-born children, linked to female education and fertility rates [35,36,37]), and healthcare access (affiliation to various health services). In contrast, the economic component focuses on dwelling characteristics, including traditional poverty indicators such as earthen floors [38].

Table A1. Socioeconomic variables used for composite SES indicator construction.

Variable	Description
REL_H_M	Male–female ratio
PROM_HNV	Average number of live-born daughters and sons
P3HLINHE_M	Male population aged 3+ speaking indigenous language but not Spanish
P5_HLI_NHE	Population aged 5+ speaking indigenous language but not Spanish
PDER_IMSS	Population affiliated to IMSS health services
PDER_ISTE	Population affiliated to ISSSTE health services
PDER_ISTEE	Population affiliated to state ISSSTE health services
PAFIL_PDOM	Population affiliated to PEMEX, Defense, or Navy health services
PDER_SEGP	Population affiliated to Instituto de Salud para el Bienestar
PDER_IMSSB	Population affiliated to IMSS BIENESTAR
PAFIL_IPRIV	Population with private health insurance
GRAPROES	Average level of schooling
VIVPAR_DES	Uninhabited private dwellings
VIVPAR_UT	Temporary private dwellings
PROM_OCUP	Average occupants per inhabited dwelling
VPH_PISOTI	Dwellings with earthen floors
VPH_1CUART	Dwellings with only one room
VPH_AGUAFV	Dwellings without piped water
VPH_LETR	Dwellings with latrine
VPH_NODREN	Dwellings without drainage
VPH_SNBIEN	Dwellings without property ownership
VPH_SINRTV	Dwellings without radio or TV
VPH_SINTIC	Dwellings without information/communication technologies

Figure A2. Factor loadings showing the relative contributions of variables to the social and economic components of the SES indicator.

Appendix A.3. Distributions and Correlations

Figures A3–A6 present the distributions and correlations between variables across different avoidable hospitalization categories and severity measures.

Notably, while the social and economic components of SES are generally correlated, they demonstrate distinct relationships with other variables. For diabetes complications (Figure A3), the economic component shows stronger correlations with air pollution groups than the social component. Additionally, the ${SO}_{2}$–NO–${O}_{3}$ group frequently exhibits inverse correlations compared to the ${NO}_{2}$–${NO}_{x}$ group. This pattern underscores the importance of careful variable selection in model construction, as using one pollutant group as a proxy for another could lead to erroneous conclusions.

The correlation analysis also helps identify potential confounding factors and multicollinearity issues. It reveals some counterintuitive relationships, such as the association between higher SES and more severe asthma hospitalizations, a finding explained by the concurrent higher exposure to PM, CO, and ${NO}_{2}$ pollution in higher SES areas.

Figure A3. Variable correlations for diabetes complications (DC), influenza/pneumonia (I&P), ENT infections (EN&T INFEC), and gastroenteritis (GASTRO).

Figure A4. Variable correlations for ulcers (ULCER), pyelonephritis (PYELO), cellulitis (CELL), and asthma (ASTH).

Figure A5. Variable correlations for prenatal conditions (DPCPD), epilepsy (EPILEP), hypertension (HYPERT), and heart failure (HEART).

Figure A6. Variable correlations for angina (ANG) and chronic obstructive pulmonary disease (COPD).

References

1. Rosano, A.; Loha, C.A.; Falvo, R.; van der Zee, J.; Ricciardi, W.; Guasticchi, G.; de Belvis, A.G. The relationship between avoidable hospitalization and accessibility to primary care: A systematic review. Eur. J. Public Health 2012, 23, 356–360.
2. Lyhne, C.N.; Bjerrum, M.; Riis, A.H.; Jørgensen, M.J. Interventions to prevent potentially avoidable hospitalizations: A mixed methods systematic review. Front. Public Health 2022, 10, 898359.
3. Sanderson, C.; Dixon, J. Conditions for which onset or hospital admission is potentially preventable by timely and effective ambulatory care. J. Health Serv. Res. Policy 2000, 5, 222–230.
4. Wallar, L.E.; Rosella, L.C. Individual and neighbourhood socioeconomic status increase risk of avoidable hospitalizations among Canadian adults: A retrospective cohort study of linked population health data. Int. J. Popul. Data Sci. 2020, 5, 1351.
5. Spycher, J.; Morisod, K.; Moschetti, K.; Le Pogam, M.A.; Peytremann-Bridevaux, I.; Bodenmann, P.; Cookson, R.; Rodwin, V.; Marti, J. Potentially avoidable hospitalizations and socioeconomic status in Switzerland: A small area-level analysis. Health Policy 2024, 139, 104948.
6. Blustein, J.; Hanson, K.; Shea, S. Preventable hospitalizations and socioeconomic status. Health Aff. 1998, 17, 177–189.
7. Blanco-Becerra, L.C.; Miranda-Soberanis, V.; Barraza-Villarreal, A.; Junger, W.; Hurtado-Díaz, M.; Romieu, I. Effect of socioeconomic status on the association between air pollution and mortality in Bogota, Colombia. Salud Publica Mex. 2014, 56, 371–378.
8. Hajat, A.; MacLehose, R.F.; Rosofsky, A.; Walker, K.D.; Clougherty, J.E. Confounding by socioeconomic status in epidemiological studies of air pollution and health: Challenges and opportunities. Environ. Health Perspect. 2021, 129, 65001.
9. Antaya, T.C.; Le, B.; Oiamo, T.; Wilk, P.; Speechley, K.N.; Burneo, J.G. The association of air pollution with new-onset epilepsy. Epilepsia 2025.
10. Yalçın, G.; Sayınbatur, B.; Toktaş, İ.; Gürbay, A. The relationship between environmental air pollution, meteorological factors, and emergency service admissions for epileptic attacks in children. Epilepsy Res. 2022, 187, 107026.
11. Calderón-Garcidueñas, L.; Solt, A.C.; Henríquez-Roldán, C.; Torres-Jardón, R.; Nuse, B.; Herritt, L.; Villarreal-Calderón, R.; Osnaya, N.; Stone, I.; García, R.; et al. Long-term air pollution exposure is associated with neuroinflammation, an altered innate immune response, disruption of the blood-brain barrier, ultrafine particulate deposition, and accumulation of amyloid β-42 and α-synuclein in children and young adults. Toxicol. Pathol. 2008, 36, 289–310.
12. OECD. Avoidable Hospital Admissions. 2019. Available online: https://www.oecd.org/en/publications/health-at-a-glance-2019_4dd50c09-en.html (accessed on 16 April 2025).
13. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer: Berlin/Heidelberg, Germany, 2013; Volume 112.
14. RAMA. Automatic Air Quality Monitoring Network. 2025. Available online: http://www.aire.cdmx.gob.mx/default.php?opc=%27aKBh%27 (accessed on 16 April 2025).
15. Messer, L.C.; Laraia, B.A.; Kaufman, J.S.; Eyster, J.; Holzman, C.; Culhane, J.; Elo, I.; Burke, J.G.; O’Campo, P. The Development of a Standardized Neighborhood Deprivation Index. J. Urban Health 2006, 83, 1041–1062.
16. Yu, M.; Tatalovich, Z.; Gibson, J.T.; Cronin, K.A. Using a composite index of socioeconomic status to investigate health disparities while protecting the confidentiality of cancer registry data. Cancer Causes Control 2014, 25, 81–92.
17. CONEVAL. Índice de Rezago Social (IRS), 2020. 2021. Available online: https://www.coneval.org.mx/Medicion/IRS/Paginas/Indice_Rezago_Social_2020.aspx (accessed on 16 April 2025).
18. EvaluaCDMX. Índice de Desarrollo Social de la Ciudad de México, 2020. 2021. Available online: https://evalua.cdmx.gob.mx (accessed on 16 April 2025).
19. UNDP. Informe de Desarrollo Humano Municipal 2010–2015. 2019. Available online: https://www.undp.org/es/mexico/publications/idh-municipal-2010-2015 (accessed on 17 April 2025).
20. Minutti-Martinez, C.; Galindo, A.; Valdez-Garduno, L.F.; Mata-Rivera, M.F. Exploring nonlinear effects of air pollution on hospital admissions by disease using gradient boosting machines. In Proceedings of the 2022 19th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE), Mexico City, Mexico, 9–11 November 2022; IEEE: Piscataway, NJ, USA, 2022.
21. Poblano Verástegui, O.; Torres-Arreola, L.D.P.; Flores-Hernández, S.; Nevarez Sida, A.; Saturno Hernández, P.J. Avoidable Hospitalization trends from Ambulatory Care-Sensitive Conditions in the public health system in México. Front. Public Health 2021, 9, 765318.
22. Wallar, L.E.; De Prophetis, E.; Rosella, L.C. Socioeconomic inequalities in hospitalizations for chronic ambulatory care sensitive conditions: A systematic review of peer-reviewed literature, 1990–2018. Int. J. Equity Health 2020, 19, 60.
23. Shipe, M.E.; Deppen, S.A.; Farjah, F.; Grogan, E.L. Developing prediction models for clinical use using logistic regression: An overview. J. Thorac. Dis. 2019, 11, S574–S584.
24. Kong, G.; Lin, K.; Hu, Y. Using machine learning methods to predict in-hospital mortality of sepsis patients in the ICU. BMC Med. Inform. Decis. Mak. 2020, 20, 251.
25. Lee, W.S.; Hwang, J.K.; Ryu, J.; Choi, Y.J.; Oh, J.W.; Kim, C.R.; Han, M.Y.; Oh, I.H.; Lee, K.S. The relationship between childhood asthma and socioeconomic status: A Korean nationwide population-based study. Front. Public Health 2023, 11, 1133312.
26. García-Burgos, J.; Miquelajauregui, Y.; Vega, E.; Namdeo, A.; Ruíz-Olivares, A.; Mejía-Arangure, J.M.; Resendiz-Martinez, C.G.; Hayes, L.; Bramwell, L.; Jaimes-Palomera, M.; et al. Exploring the spatial distribution of air pollution and its association with socioeconomic status indicators in Mexico City. Sustainability 2022, 14, 15320.
27. Tiotiu, A.I.; Novakova, P.; Nedeva, D.; Chong-Neto, H.J.; Novakova, S.; Steiropoulos, P.; Kowal, K. Impact of air pollution on asthma outcomes. Int. J. Environ. Res. Public Health 2020, 17, 6212.
28. Lu, H.; Hatfield, L.A.; Al-Azazi, S.; Bakx, P.; Banerjee, A.; Burrack, N.; Chen, Y.C.; Fu, C.; Gordon, M.; Heine, R.; et al. Sex-based disparities in acute myocardial infarction treatment patterns and outcomes in older adults hospitalized across 6 high-income countries: An analysis from the International Health Systems Research Collaborative. Circ. Cardiovasc. Qual. Outcomes 2024, 17, e010144.
29. Ezekowitz, J.A.; Savu, A.; Welsh, R.C.; McAlister, F.A.; Goodman, S.G.; Kaul, P. Is there a sex gap in surviving an acute coronary syndrome or subsequent development of heart failure? Circulation 2020, 142, 2231–2239.
30. Kim, B.; Myung, R.; Kim, J.; Lee, M.j.; Pai, H. Descriptive epidemiology of acute pyelonephritis in Korea, 2010–2014: Population-based study. J. Korean Med. Sci. 2018, 33, e310.
31. Olson, P.D.; Hruska, K.A.; Hunstad, D.A. Androgens enhance male urinary tract infection severity in a new model. J. Am. Soc. Nephrol. 2016, 27, 1625–1634.
32. Hajat, A.; Hsia, C.; O’Neill, M.S. Socioeconomic Disparities and Air Pollution Exposure: A Global Review. Curr. Environ. Health Rep. 2015, 2, 440–450.
33. Minutti-Martinez, C.; Mata-Rivera, M.F.; Arellano-Vazquez, M.; Escalante-Ramírez, B.; Olveres, J. Air Pollution, Socioeconomic Status, and Avoidable Hospitalizations in Mexico City: A Multifaceted Analysis. In Proceedings of the Mexican International Conference on Artificial Intelligence, Tonantzintla, Mexico, 21–25 October 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 73–86.
34. Minutti-Martinez, C.; Arellano-Vázquez, M.; Zamora-Machado, M. A Hybrid Model for the Prediction of Air Pollutants Concentration, Based on Statistical and Machine Learning Techniques. Lect. Notes Comput. Sci. 2021, 13068, 252–264.
35. Jain, A.K. The Effect of Female Education on Fertility: A Simple Explanation. Demography 1981, 18, 577–595.
36. Basu, A.M. Why does Education Lead to Lower Fertility? A Critical Review of Some of the Possibilities. World Dev. 2002, 30, 1779–1790.
37. Brzozowska, Z. Female Education and Fertility under State Socialism in Central and Eastern Europe. Population 2015, 70, 689–725.
38. Gutierrez-Jimenez, J.; Torres-Sanchez, M.G.C.; Fajardo-Martinez, L.P.; Schlie-Guzman, M.A.; Luna-Cazares, L.M.; Gonzalez-Esquinca, A.R.; Guerrero-Fuentes, S.; Vidal, J.E. Malnutrition and the presence of intestinal parasites in children from the poorest municipalities of Mexico. J. Infect. Dev. Ctries. 2013, 7, 741–747.

Figure 1. Key determinants of hospitalization frequency (top) and severity (bottom) for diabetes complications.

Figure 2. Key determinants of hospitalization frequency (top) and severity (bottom) for influenza and pneumonia.

Figure 3. Partial dependence plots showing interactions between variables for hospitalization frequency (left) and severity (right) in diabetes complications.

Table 1. Avoidable hospitalization categories and number of cases analyzed.

Code	Category	Records
DC	Diabetes complications	23,868
I&P	Influenza and pneumonia	16,075
EN&T INFEC	Ear, nose, and throat infections	7146
GASTRO	Dehydration and gastroenteritis	5252
PYELO	Pyelonephritis	5218
ULCER	Perforated or bleeding ulcer	5172
CELL	Cellulitis	4428
ASTH	Asthma	4278
DPCPD	Diseases related with the prenatal health care of pregnancy and delivery	4004
EPILEP	Convulsions and epilepsy	3279
HYPERT	Hypertension	2319
HEART	Congestive heart failure	2110
COPD	Chronic obstructive pulmonary disease	1600
ANG	Angina	1421

Table 2. Estimated effects and 95% CI for hospitalization frequency (Negative Binomial Regression).

Category	POBTOT	ADM_DATE	F_ECONOM	F_SOCIAL	PM_CO	SO2_NO_O3
DC	$6.979 \pm 1.118$		$5.493 \pm 1.101$
I&P	$5.961 \pm 1.344$				$1.256 \pm 0.559$	$2.584 \pm 0.813$
EN&T INFEC	$4.474 \pm 1.364$	$- 0.426 \pm 0.273$			$1.790 \pm 0.397$
GASTRO	$1.616 \pm 0.384$				$1.451 \pm 0.343$
ULCER	$3.768 \pm 0.662$
PYELO	$2.404 \pm 0.356$
CELL	$3.581 \pm 0.590$
ASTH	$1.433 \pm 0.385$				$1.666 \pm 0.356$
DPCPD	$2.647 \pm 0.752$			$2.252 \pm 0.768$
EPILEP	$1.870 \pm 0.381$				$0.602 \pm 0.381$
HYPERT	$2.234 \pm 0.397$			$0.520 \pm 0.506$
HEART	$2.758 \pm 0.676$
ANG	$2.499 \pm 0.773$
COPD	$1.645 \pm 0.397$
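
Table 2 reports each retained effect as an estimate plus or minus a margin. Assuming that margin is the half-width of the 95% confidence interval (the table does not say so explicitly), the interval bounds can be recovered with simple arithmetic. The sketch below does this for a few rows copied from Table 2; the helper names `interval` and `excludes_zero` are illustrative and not part of the authors' code.

```python
# (category, variable, estimate, half_width), copied from Table 2.
TABLE2_ROWS = [
    ("DC", "F_ECONOM", 5.493, 1.101),
    ("EN&T INFEC", "ADM_DATE", -0.426, 0.273),
    ("DPCPD", "F_SOCIAL", 2.252, 0.768),
    ("EPILEP", "PM_CO", 0.602, 0.381),
    ("HYPERT", "F_SOCIAL", 0.520, 0.506),
]


def interval(estimate, half_width):
    """Lower and upper bounds, assuming `half_width` is the 95% CI half-width."""
    return estimate - half_width, estimate + half_width


def excludes_zero(estimate, half_width):
    """True when the whole interval lies on one side of zero."""
    low, high = interval(estimate, half_width)
    return low > 0 or high < 0


for category, variable, estimate, half_width in TABLE2_ROWS:
    low, high = interval(estimate, half_width)
    flag = "excludes 0" if excludes_zero(estimate, half_width) else "includes 0"
    print(f"{category:<11} {variable:<9} [{low:.3f}, {high:.3f}] {flag}")
```

Under this reading, the negative ADM_DATE estimate for ear, nose, and throat infections has an interval entirely below zero, while the F_SOCIAL interval for hypertension only narrowly stays above it.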

Table 3. Gini importance for hospitalization frequency prediction (GBM).

Category	POBTOT	ADM_DATE	F_ECONOM	F_SOCIAL	PM_CO	NO2_NOx	SO2_NO_O3
DC	0.489	0.126	0.075	0.005	0.013	0.010	0.087
I&P	0.533	0.046	0.007	0.088	0.027	0.003	0.002
EN&T INFEC	0.530	0.080	0.007	0.007	0.170	0.023	0.001
GASTRO	0.670	0.053	0.003	0.013	0.105	0.009	0.010
ULCER	0.580	0.099	0.026	0.018	0.031	0.004	0.003
PYELO	0.765	0.063	0.005	0.024	0.005	0.006	0.002
CELL	0.745	0.078	0.006	0.003	0.004	0.003	0.009
ASTH	0.496	0.084	0.003	0.004	0.163	0.083	0.002
DPCPD	0.654	0.066	0.021	0.013	0.008	0.008	0.039
EPILEP	0.753	0.058	0.003	0.004	0.015	0.005	0.018
HYPERT	0.755	0.076	0.009	0.013	0.004	0.007	0.034
HEART	0.728	0.087	0.010	0.009	0.005	0.022	0.009
ANG	0.570	0.162	0.005	0.006	0.010	0.008	0.014
COPD	0.550	0.132	0.013	0.018	0.031	0.008	0.023

Bold indicates variables selected as relevant in the regression models.

Table 4. Estimated effects and 95% CI for hospitalization severity (Logistic Regression).

Category	AGE	WEIGHT	SEX_M	F_ECONOM	PM_CO	NO2_NOx
DC		$1.225 \pm 0.871$				$0.374 \pm 0.153$
I&P	$4.555 \pm 0.175$	$0.971 \pm 0.467$				$0.334 \pm 0.190$
EN&T INFEC	$4.033 \pm 0.299$
GASTRO	$0.195 \pm 0.018$
ULCER	$2.211 \pm 0.419$
PYELO	$4.069 \pm 0.364$		$0.447 \pm 0.183$
CELL	$3.792 \pm 0.522$
ASTH				$- 0.037 \pm 0.020$	$- 0.016 \pm 0.007$
DPCPD				$0.031 \pm 0.015$
EPILEP	$0.200 \pm 0.029$
HYPERT	$3.405 \pm 0.731$	$2.339 \pm 1.445$
HEART	$1.615 \pm 0.676$		$- 0.238 \pm 0.217$
ANG	$1.977 \pm 1.222$
COPD	$3.314 \pm 0.732$
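
For a logistic regression, a coefficient on the log-odds scale converts to an odds ratio by exponentiation. The sketch below applies this to the two SEX_M entries in Table 4. It rests on assumptions that the table does not confirm: that the reported values are log-odds coefficients, that SEX_M is an unscaled 0/1 indicator, and that the margin after the ± sign is the half-width of the 95% confidence interval. The function name `odds_ratio` is illustrative.

```python
import math


def odds_ratio(estimate, half_width):
    """Convert a log-odds estimate and CI half-width to (OR, lower, upper)."""
    return (
        math.exp(estimate),
        math.exp(estimate - half_width),
        math.exp(estimate + half_width),
    )


# SEX_M entries copied from Table 4: (category, estimate, half_width).
SEX_M_ROWS = [("PYELO", 0.447, 0.183), ("HEART", -0.238, 0.217)]

results = {}
for category, estimate, half_width in SEX_M_ROWS:
    results[category] = odds_ratio(estimate, half_width)
    ratio, low, high = results[category]
    print(f"{category}: OR = {ratio:.2f} (95% CI {low:.2f} to {high:.2f})")
```

An odds ratio above one (pyelonephritis) would indicate higher odds of a severe hospitalization for male patients, and a ratio below one (congestive heart failure) lower odds, if the assumptions above hold.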

Table 5. Gini importance for hospitalization severity prediction (GBM).

Category	AGE	WEIGHT	SEX_M	F_ECONOM	F_SOCIAL	PM_CO	NO2_NOx	SO2_NO_O3
DC	0.181	0.096	0.002	0.004	0.006	0.017	0.023	0.002
I&P	0.927	0.020	0.001	0.002	0.005	0.008	0.002	0.003
EN&T INFEC	0.742	0.101	0.004	0.010	0.009	0.008	0.013	0.006
GASTRO	0.431	0.110	0.012	0.060	0.042	0.013	0.028	0.012
ULCER	0.042	0.027	0.001	0.013	0.013	0.005	0.014	0.002
PYELO	0.760	0.039	0.032	0.008	0.007	0.023	0.006	0.011
CELL	0.437	0.087	0.024	0.052	0.034	0.019	0.023	0.026
ASTH	0.166	0.211	0.004	0.016	0.078	0.043	0.023	0.049
DPCPD	0.333	0.365	0.000	0.026	0.025	0.030	0.017	0.013
EPILEP	0.388	0.187	0.013	0.022	0.041	0.031	0.023	0.023
HYPERT	0.359	0.255	0.022	0.032	0.008	0.023	0.059	0.006
HEART	0.274	0.138	0.014	0.021	0.026	0.031	0.007	0.021
ANG	0.277	0.067	0.029	0.043	0.023	0.032	0.020	0.028
COPD	0.469	0.174	0.020	0.025	0.019	0.044	0.026	0.028

Bold values indicate the best model on each task and category.

Table 6. Performance of the regression and GBM models over the number and severity of hospitalizations.

Category	Number of Hospitalizations (Correlation), REG	Number of Hospitalizations (Correlation), GBM	Severity of Hospitalizations (AUC), REG	Severity of Hospitalizations (AUC), GBM
DC	0.880	0.972	0.646	0.732
I&P	0.889	0.929	0.900	0.902
EN&T INFEC	0.834	0.854	0.930	0.912
GASTRO	0.851	0.922	0.891	0.818
ULCER	0.886	0.919	0.722	0.721
PYELO	0.874	0.906	0.832	0.816
CELL	0.913	0.906	0.872	0.808
ASTH	0.783	0.868	0.726	1.000
DPCPD	0.814	0.865	0.601	0.586
EPILEP	0.871	0.828	0.800	0.728
HYPERT	0.809	0.837	0.748	0.730
HEART	0.812	0.809	0.600	0.601
ANG	0.645	0.552	0.684	0.646
COPD	0.625	0.624	0.684	0.728
MEAN	0.820	0.842	0.760	0.766
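
The MEAN row and a per-category comparison of the two model types can be reproduced from Table 6 with a few lines of Python. The values below are copied from the table; the variable and function names are illustrative, and the better model is assumed to be the one with the higher correlation or AUC.

```python
from statistics import mean

# (category, corr_REG, corr_GBM, auc_REG, auc_GBM), copied from Table 6.
TABLE6 = [
    ("DC", 0.880, 0.972, 0.646, 0.732),
    ("I&P", 0.889, 0.929, 0.900, 0.902),
    ("EN&T INFEC", 0.834, 0.854, 0.930, 0.912),
    ("GASTRO", 0.851, 0.922, 0.891, 0.818),
    ("ULCER", 0.886, 0.919, 0.722, 0.721),
    ("PYELO", 0.874, 0.906, 0.832, 0.816),
    ("CELL", 0.913, 0.906, 0.872, 0.808),
    ("ASTH", 0.783, 0.868, 0.726, 1.000),
    ("DPCPD", 0.814, 0.865, 0.601, 0.586),
    ("EPILEP", 0.871, 0.828, 0.800, 0.728),
    ("HYPERT", 0.809, 0.837, 0.748, 0.730),
    ("HEART", 0.812, 0.809, 0.600, 0.601),
    ("ANG", 0.645, 0.552, 0.684, 0.646),
    ("COPD", 0.625, 0.624, 0.684, 0.728),
]


def column_means(rows):
    """Mean of each numeric column, rounded to three decimals."""
    columns = list(zip(*rows))[1:]  # drop the category column
    return [round(mean(column), 3) for column in columns]


def count_wins(rows, reg_idx, gbm_idx):
    """Count categories where REG or GBM has the strictly higher score."""
    reg = sum(1 for row in rows if row[reg_idx] > row[gbm_idx])
    gbm = sum(1 for row in rows if row[gbm_idx] > row[reg_idx])
    return reg, gbm


means = column_means(TABLE6)
frequency_wins = count_wins(TABLE6, 1, 2)
severity_wins = count_wins(TABLE6, 3, 4)
print("Column means (corr REG, corr GBM, AUC REG, AUC GBM):", means)
print("Frequency, (REG wins, GBM wins):", frequency_wins)
print("Severity, (REG wins, GBM wins):", severity_wins)
```

On these numbers, GBM has the higher correlation in 9 of the 14 categories for hospitalization frequency, while regression has the higher AUC in 9 of the 14 categories for severity.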


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

