Article

Spatial Prediction of High-Risk Areas for Asthma in Metropolitan Areas: A Machine Learning Approach Applied to Tehran, Iran

1
Department of Geography and Urban-Rural Planning, Faculty of Social Sciences, University of Mohaghegh Ardabili, Ardabil 5619911367, Iran
2
Department of Geography, Faculty of Geography, University of Tehran, Tehran 1417853933, Iran
3
Center for Community Health Impact, UTHealth Houston School of Public Health, El Paso, TX 79902, USA
*
Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2025, 14(3), 105; https://doi.org/10.3390/ijgi14030105
Submission received: 11 December 2024 / Revised: 30 January 2025 / Accepted: 23 February 2025 / Published: 1 March 2025
(This article belongs to the Special Issue HealthScape: Intersections of Health, Environment, and GIS&T)

Abstract

Asthma prevalence in large urban areas of developing countries is a significant public health concern, with increased rates driven by various socioeconomic and environmental factors. This study aims to predict asthma risk in Tehran, a major urban center in Iran. Data from 1473 asthma patients, alongside demographic, socioeconomic, air quality, environmental, weather, and healthcare access variables, were analyzed using geographic information systems (GIS) and remote sensing techniques. Three ensemble machine learning algorithms—Random Forest (RF), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost)—were applied to model and predict asthma risk. A Negative Binomial Regression Model (NBRM) identified seven key predictors: population density, unemployment rate, particulate matter (PM2.5 and PM10), nitrogen dioxide (NO2), sulfur dioxide (SO2), neighborhood deprivation index, and road intersection density. Among the algorithms, GBM outperformed the others, with a training RMSE of 0.56 and a test RMSE of 1.07, demonstrating strong generalization. Additionally, GBM achieved the highest R-squared values (0.95 for training and 0.76 for testing) and lower MAE values (0.43 for training and 0.88 for testing). Effective pattern recognition was confirmed by EV values of 0.95 for training and 0.75 for testing, along with a Moran’s I value of 0.17, indicating minimal spatial autocorrelation.

1. Introduction

Asthma, a primary non-communicable disease, affects both children and adults, with the highest prevalence among children. It causes inflammation and narrowing of the airways, leading to symptoms like coughing, wheezing, breathlessness, and chest tightness [1]. Approximately 300 million people worldwide have asthma, with 250,000 deaths attributed to the disease annually, most of which are preventable [2]. According to a 2023 study, asthma prevalence varied across continents: Asia (3.44%), Africa (3.67%), South America (4.90%), Europe (5.69%), North America (8.29%), and Oceania (8.33%); among global asthma cases, 26.70% were severe, 30.99% were eosinophilic, 48.95% included allergic rhinitis, and 7.0% to 25.40% included nasal polyps [3]. Asthma is often underdiagnosed and undertreated, particularly in low- and middle-income countries [1]. Various studies have reported the prevalence of asthma in Iran in recent years at around 8.9% (95% confidence interval (CI): 8.5–9.3) [4,5].
Furthermore, demographic and socioeconomic factors like population density, aging, illiteracy, and unemployment are linked to higher rates of asthma prevalence and mortality in large urban areas [6]. High population density increases exposure to air pollution and asthma [7,8]. Aging populations may experience worsened asthma symptoms due to age-related changes [9,10]. Low literacy levels can hinder effective asthma management, leading to more severe outcomes [11]. Unemployment, often tied to lower socioeconomic status, limits access to healthcare and may increase exposure to asthma environmental triggers [12,13].
Previous studies have also shown that exposure to ambient air pollutants, including particulate matter (PM2.5 and PM10), nitrogen dioxide (NO2), ozone (O3), and sulfur dioxide (SO2), is a significant risk factor for asthma prevalence [14,15,16,17]. Particulate matter, consisting of tiny particles suspended in the air, can penetrate deep into the respiratory system, triggering airway inflammation and exacerbating asthma symptoms [18]. NO2, primarily emitted from vehicles and industrial sources, has been associated with increased asthma incidence and severity, likely due to its irritant effects on the airways [19,20]. Increased levels of O3, a highly reactive gas formed by the interaction of sunlight with pollutants like NO2, have been linked to asthma attacks and decreased lung function among individuals residing in urban centers [21]. SO2, primarily emitted from industrial processes and power plants, can irritate the airways and exacerbate respiratory conditions such as asthma [20].
Large urban areas in developing countries also pose significant built environmental risks for asthma onset or fatality. A study has indicated that the prevalence, severity, and morbidity of asthma have significantly increased among residents in low-income urban areas [21]. Poor-quality urban environments, characterized by socioeconomic disadvantage, inadequate access to healthcare, and poor housing conditions, contribute to the exacerbation of asthma symptoms in deprived areas [22,23,24,25]. Road intersection density, often associated with high traffic volume and vehicular emissions, is another significant determinant of asthma prevalence [25,26]. The proximity of residential areas to busy roads increases exposure to traffic-related air pollutants, such as particulate matter and nitrogen oxides, which can aggravate asthma symptoms and respiratory inflammation [27].
According to previous studies [28,29], the Normalized Difference Vegetation Index (NDVI), a green space and vegetation density measure, has been inversely associated with asthma prevalence. Higher levels of greenery in neighborhoods are linked to improved air quality, reduced pollution exposure, and enhanced respiratory health outcomes [28,29]. Exposure to industrial emissions, including SO2 and volatile organic compounds, is a known risk factor for asthma prevalence [30]. Additionally, proximity to fuel stations in urban areas, characterized by emissions from gasoline and diesel vehicles, is associated with increased asthma prevalence due to heightened exposure to benzene, a known respiratory irritant [31]. Urban heat islands (UHIs), where cities are hotter than surrounding areas due to human activities, worsen with climate change, leading to higher asthma prevalence and severity. Increased temperatures amplify air pollutants like ozone (O3) and particulate matter (PM2.5), which trigger asthma symptoms [32,33].
Climate change also prolongs heat waves and the pollen season, further aggravating asthma [34]. Urban areas experience high temperatures under extreme climate conditions, which worsen asthma by intensifying pollutants like ozone and particulate matter [34]. Some studies have confirmed that limited access to healthcare facilities in large cities of developing countries exacerbates asthma by delaying diagnosis, reducing treatment adherence, and hindering symptom management [35,36].
In Tehran, the capital city of Iran, it was estimated that the average asthma prevalence was 13.4% [37], exceeding the country’s overall average asthma prevalence of 8.9%. Certain studies indicated that asthma among adults in Tehran was 11.73% [5]. In children, the highest reported prevalence was 32% in Tehran, while in adolescents, it was 37% [38]. Previous research on asthma in Tehran has frequently explored the correlation between asthma and patient lifestyle [39], the link between asthma and food allergies or environmental pollutants [40], and the economic implications [41] associated with asthma. Some studies have also examined asthma’s spatial dimensions and spatial modeling within Tehran city using ensemble machine learning algorithms [42,43,44].
Spatial epidemiology is the application of theory and methods from epidemiology, geography, and statistics to describe spatial distributions of health outcomes and to analyze associations with possible causes to inform intervention and improve health [45]. Utilizing geographic information systems (GIS) for spatial analysis of asthma is essential for comprehending its prevalence in urban areas and informing prevention and management strategies [42,46,47]. Various geostatistical modeling techniques, including spatial regression [48], Bayesian [49], and machine learning algorithms (MLAs), integrate geographical and environmental data to identify patterns and correlations with asthma incidence [42].
Geographically Weighted Regression (GWR), Multiscale Geographically Weighted Regression (MGWR), Spatial Lag Model (SLM), Spatial Error Model (SEM), Bayesian Spatial Models, and Spatial Autoregressive Models (SAR) are primarily statistical or econometric models that incorporate spatial dependencies and structures [50]. These methods are valued for their interpretability and ability to model spatial relationships and dependencies explicitly [51]. However, they are not typically classified as machine learning algorithms (MLAs). Among the various machine learning algorithms, Random Forest (RF), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost) are ensemble algorithms that have been specifically adapted for spatial count data. These algorithms capture complex, non-linear relationships, making them practical for regression and prediction tasks. While GBM improves accuracy by iteratively optimizing residual errors, RF minimizes overfitting by averaging multiple decision trees. Large spatial datasets benefit greatly from XGBoost’s scalability and regularization. To balance bias, variance, and overfitting, grid search and cross-validation were used to optimize the hyperparameters for all algorithms, including the number of trees, learning rate, and regularization terms. These algorithms were selected because of their versatility and resilience in identifying significant predictors and interactions in spatial epidemiological settings [42,52].
Machine learning (ML), which offers improved disease prediction, risk assessment, and individualized treatment skills, has emerged as a key instrument in medical research in recent years. The ability of machine learning algorithms like Random Forest (RF), Gradient Boosting Machine (GBM), and XGBoost to capture intricate, non-linear correlations between risk factors is demonstrated by successful applications in domains including diabetes, cardiovascular illnesses, and cancer [53,54,55], as well as in the prediction of infectious diseases such as COVID-19 [56]. In the context of asthma, machine learning (ML) is an emerging method that enhances conventional statistical techniques by offering more precise forecasts and insights into the incidence of asthma. Although there are currently few studies on machine learning in asthma risk prediction [42,43,44], newer studies show how ML is increasingly used to advance knowledge and direct focused public health initiatives. By using cutting-edge machine learning techniques to forecast asthma risk in Tehran, this work adds to this trend while tackling the city’s problems with extreme air pollution, urban heat island effects, and healthcare inequities. The study intends to further the integration of machine learning in asthma epidemiology by improving spatial risk prediction and informing public health initiatives using these methods.
This study fills essential gaps in the body of knowledge regarding the prevalence of asthma in Tehran. Although the prevalence of asthma in the city has been studied in previous years, machine-learning-based regression techniques for predicting spatial risk have not been widely applied. Furthermore, sociodemographic, built-environmental, and environmental factors (such as air pollution, urban heat islands, and healthcare accessibility) have not all been thoroughly incorporated into a single spatial framework in previous studies. This study uses the Negative Binomial Regression Model (NBRM), geographic information systems (GIS), and sophisticated ensemble machine learning techniques, such as Random Forest (RF), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost), to close these gaps. These methods pinpoint high-risk areas and offer helpful information for focused public health initiatives. The study uses extensive spatial regression models and a larger dataset to improve prediction accuracy and policy relevance. Tehran is a critical case study because of its extreme air pollution, notable urban heat island effects, and inequalities in access to healthcare. Asthma risks are increased by the city’s congested urban environment and industrial pollution, underscoring the necessity of efficient spatial modeling. By concentrating on Tehran, this study contributes to a more thorough understanding of asthma prevalence and provides a solid framework for predicting spatial risk in highly polluted urban environments. It focuses on combining socioenvironmental factors to identify high-risk locations, which will help direct focused public health programs in the city.

2. Materials and Methods

2.1. Study Setting

Tehran, Iran’s capital, covers 730 square kilometers with a population density of about 11,000 per square kilometer. It has over 8 million residents within the city and more than 15 million in the metropolitan area [57]. The city, divided into 22 districts and 350 neighborhoods, provides a unique environment for studying asthma prevalence and risk factors. The city experiences a semi-arid climate marked by hot summers and cold winters, resulting in diverse seasonal air quality challenges [58,59]. Tehran’s urban heat island effects are significant, with temperature variations of up to 7 °C between urban and rural areas, influenced by dense construction and limited green spaces [60]. Approximately 7.5% of Tehran’s population is 65 years or older, the illiteracy rate is 7%, and the unemployment rate was reported to be about 7% [61].
Tehran is one of the world’s most polluted cities. It faces significant air pollution challenges, with elevated levels of particulate matter (PM2.5 and PM10) exceeding safe thresholds, reaching 30–50 µg/m3 for PM2.5 and 70–100 µg/m3 for PM10 during peak pollution seasons [62]. Additionally, NO2 levels persistently average 50–60 µg/m3, surpassing the WHO limit of 40 µg/m3. Ozone (O3) concentrations can spike to 150 µg/m3 in summer due to photochemical reactions from vehicular and industrial emissions, while SO2 levels are generally lower, averaging 20–30 µg/m3 with occasional industrial spikes [63,64].
According to calculations based on OSM data using a geographic information system, Tehran’s urban infrastructure showcases a high road intersection density, notably exceeding 400 intersections per square kilometer in central districts. Furthermore, according to the calculation based on Landsat 8 images, the city’s average Normalized Difference Vegetation Index (NDVI) was 0.3, indicating moderate vegetation cover, with higher values observed in the greener northern areas. The city has 6.5 m2 of green area for every person [65]. Proximity to industrial zones, particularly in the outskirts, exposes residents to significant industrial emissions [66]. There are about 120 public and private hospitals in Tehran, and about 60 hospitals provide services to asthma patients [61]. However, the spatial distribution of these hospitals is uneven, and local access to them varies. Figure 1 illustrates the location of the study area, the distribution of hospitals that have admitted patients with asthma, and the spatial distribution of asthma cases from 2020 to 2023.

2.2. Data Source and Its Processing

We employed a comprehensive dataset containing spatial and non-spatial data, consisting of 1 dependent variable and 15 independent variables selected based on existing literature. This dataset can be categorized into four primary groups: (1) census datasets, (2) remote sensing data acquired from Landsat 8 and Sentinel-5 satellite products, (3) GIS datasets, and (4) open-source datasets, which incorporate spatial data obtained from OpenStreetMap. The asthma data (response variable) consisted of information on 2179 patients, collected in Excel format from 70 hospitals affiliated with the Ministry of Health and Medical Education (MHME), spanning the period from 6 July 2020, to 2 August 2023, with an assigned ethics code number.
After incomplete or outlier cases were eliminated, 1473 patients made up the final sample for analysis. To protect patient privacy, the data were geocoded using the UTM coordinate system, which represents locations as neighborhood-level point features. This method allowed for spatial analysis while maintaining patient confidentiality. This secondary data analysis received ethical approval in accordance with the Ministry of Health and Medical Education’s guidelines. The asthma cases included in this analysis represent the confirmed asthma diagnoses in the area during the designated study period. The study’s reproducibility was ensured by adhering to a protocol that described the data integration and spatial analysis methodology. Although the data offer insightful information about the distribution of asthma, this study does not purport to provide a thorough epidemiological evaluation of all asthma cases in the study area.
Figure 2 shows the research methodology flowchart.
Patient information included age, sex, date of admission, date of discharge, hospitalization outcome (discharged, complete recovery, partial recovery, and death), and patient address. We used the patient addresses from the original dataset to create the point data for the dependent variable. These addresses were geocoded to determine each case’s exact geographic coordinates (latitude and longitude). Following this, the data were aggregated into 350 neighborhoods within Tehran city and compiled as asthma cases (N = 1473) (dependent variable) in a shapefile. The indicators considered are provided in Table 1, based on existing literature and data availability for the city. Population data and data on elderly individuals (aged 65 and older) were extracted from the 2016 census data [67]. The proportion of illiterate people (%) among individuals aged 6 and above and the proportion of unemployed people (%) among individuals in the labor force aged 15 to 64 were calculated for the 350 neighborhoods using census data. Additionally, information on the neighborhood deprivation index (percentage of deteriorated buildings in each neighborhood) was obtained from Tehran Municipality, indicating areas with deprived buildings.
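The aggregation of geocoded cases into neighborhood counts can be illustrated with a minimal point-in-polygon sketch. It is not the authors' GIS workflow: the ray-casting test, the two square "neighborhoods", and the case coordinates are all hypothetical and stand in for the actual shapefile operations.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_cases(points, neighborhoods):
    """Count points per neighborhood; points outside every polygon are ignored."""
    counts = {name: 0 for name in neighborhoods}
    for x, y in points:
        for name, polygon in neighborhoods.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break
    return counts


# Hypothetical projected coordinates (metres), not data from the study.
neighborhoods = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
cases = [(2, 3), (5, 5), (12, 4), (25, 1)]
case_counts = count_cases(cases, neighborhoods)
```

The resulting per-neighborhood count plays the role of the dependent variable; the last hypothetical point falls outside both polygons and is not counted.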
Data on the most common air pollutants, including offline high-resolution imagery of the UV Aerosol Index (UVAI), also known as the Absorbing Aerosol Index (AAI), and levels of nitrogen dioxide (NO2), ozone (O3), and sulfur dioxide (SO2), were extracted using the Sentinel-5 Precursor, a satellite launched on 13 October 2017 by the European Space Agency to monitor air pollution. We employed the Google Earth Engine (GEE) cloud-based geospatial analysis platform to extract all city-level pollutant concentrations with a uniform cell size of 10 m. To calculate road intersection density (per square kilometer), we extracted all intersections from the road network using OpenStreetMap (OSM) data with QGIS software (version 3.36.3). The Normalized Difference Vegetation Index (NDVI) is a quantitative index of greenness ranging from 0 to 1, where 0 represents minimal or no greenness, and 1 represents maximum greenness [68]. The Normalized Difference Vegetation Index (NDVI) formula is typically represented as:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}} \tag{1}$$
NIR denotes near-infrared reflectance, while Red signifies red reflectance. NDVI is a pivotal metric within remote sensing analytical products employed for vegetation assessment [69]. NDVI was determined using Landsat 8 imagery. For Landsat 8, the NDVI calculation is expressed as (Band 5 − Band 4)/(Band 5 + Band 4) [70].
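As a small illustration of the band arithmetic above, the following sketch computes NDVI pixel by pixel and averages it over a tiny grid. The reflectance values are hypothetical, and the function names are illustrative rather than part of any remote sensing library.

```python
def ndvi(nir, red):
    """Return (NIR - Red) / (NIR + Red) for one pixel, or None if the denominator is zero."""
    denominator = nir + red
    if denominator == 0:
        return None
    return (nir - red) / denominator


def ndvi_grid(band5, band4):
    """Apply the Landsat 8 form (Band 5 - Band 4) / (Band 5 + Band 4) to two equal-sized grids."""
    return [[ndvi(n, r) for n, r in zip(row5, row4)]
            for row5, row4 in zip(band5, band4)]


def mean_ndvi(grid):
    """Average the valid pixels, e.g. to summarise one neighborhood."""
    values = [v for row in grid for v in row if v is not None]
    return sum(values) / len(values) if values else None


# Hypothetical surface reflectance values for a 2 x 2 window.
band5 = [[0.40, 0.30], [0.25, 0.0]]   # near-infrared
band4 = [[0.10, 0.15], [0.20, 0.0]]   # red
grid = ndvi_grid(band5, band4)
neighborhood_mean = mean_ndvi(grid)
```

Pixels with no signal in either band are skipped rather than treated as zero, which is one possible design choice when averaging to the neighborhood level.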
Data regarding the locations of industrial sites (V12), fuel stations (V13), and healthcare facilities (V15) were extracted from the city’s land use vector map, obtained from the Tehran City Municipality. To calculate V12, we computed the spatial density of industrial units per square kilometer. For variable V13, we used the Euclidean distance method to calculate the distance (in meters) between the neighborhood center and fuel stations. The proximity of these locations to the 350 neighborhoods’ centroids was assessed using Euclidean distance tools in ArcGIS Pro. To calculate V15, we computed the spatial density of health centers (such as hospitals) per square kilometer. These data were subsequently cross-referenced with OpenStreetMap datasets.
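The two kinds of measure described in this paragraph, a Euclidean distance from a neighborhood centroid to the nearest facility and a facility density per square kilometer, can be sketched as follows. The coordinates and areas are hypothetical, and the sketch assumes projected coordinates in metres (as with UTM); it does not reproduce the ArcGIS Pro tools used in the study.

```python
import math


def nearest_distance(centroid, facilities):
    """Euclidean distance (m) from a centroid to the closest facility."""
    cx, cy = centroid
    return min(math.hypot(cx - fx, cy - fy) for fx, fy in facilities)


def density_per_km2(count, area_m2):
    """Number of units per square kilometre for an area given in square metres."""
    return count / (area_m2 / 1_000_000)


# Hypothetical inputs, not values from the study.
centroid = (0.0, 0.0)
fuel_stations = [(300.0, 400.0), (1000.0, 0.0)]
distance_to_fuel = nearest_distance(centroid, fuel_stations)   # V13-style measure
industrial_density = density_per_km2(3, 1_500_000)             # V12-style measure
```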
Urban heat islands (UHIs) are characterized by elevated temperatures in urban areas compared to their rural surroundings, primarily due to human activities. Land surface temperature (LST) is a key metric for identifying UHIs, with urban areas typically exhibiting higher LSTs due to heat-absorbing surfaces and reduced vegetation cover [71]. The UHIs were quantified using the USGS Landsat 8 Level 2, Collection 2, Tier 1 product via Google Earth Engine (GEE). A commonly used formula for LST calculation from satellite imagery is derived from Planck’s law, which relates the radiance detected by the sensor to the surface temperature. The formula is expressed as follows [68,72]:
$$\mathrm{LST} = \frac{K_2}{\ln\left(\dfrac{K_1}{T_B} + 1\right)} - 273.15 \tag{2}$$
where LST represents the land surface temperature in degrees Celsius, and K1 and K2 denote calibration constants specific to the sensor utilized. TB signifies the brightness temperature recorded by the sensor. This formula stems from the fundamental principles of Planck’s law and the Stefan–Boltzmann law, elucidating the correlation between the emitted radiance from a surface and its temperature [68,72].
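A minimal sketch of this conversion is given below, using the article's symbols. The default constants are an assumption: they are the band 10 thermal constants commonly listed in Landsat 8 scene metadata, and the study does not state which values it used. Note also that in the USGS formulation the sensor-measured quantity inside the logarithm is the at-sensor spectral radiance; the argument is named `tb` here only to follow the notation above.

```python
import math

# Assumed Landsat 8 TIRS band 10 thermal constants (from scene metadata, not from the article).
K1_BAND10 = 774.8853
K2_BAND10 = 1321.0789


def lst_celsius(tb, k1=K1_BAND10, k2=K2_BAND10):
    """Evaluate K2 / ln(K1 / tb + 1) - 273.15 for a positive sensor value tb."""
    if tb <= 0:
        raise ValueError("sensor value must be positive")
    return k2 / math.log(k1 / tb + 1.0) - 273.15


# Hypothetical sensor value, for illustration only.
example_lst = lst_celsius(10.0)
```

The subtraction of 273.15 converts kelvin to degrees Celsius, so larger sensor values map to higher temperatures.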
Using Google Earth Engine, we extracted the environmental and air quality variables: particulate matter (AAI, including PM2.5 and PM10) (V5), nitrogen dioxide (NO2) (V6), ozone (O3) (V7), and sulfur dioxide (SO2) (V8), together with the Normalized Difference Vegetation Index (NDVI) (V11) and urban heat islands (UHIs) (V14) derived from Landsat 8. Raster data were pre-processed to match the neighborhood polygons spatially, and mean values were then compiled for every spatial unit. We used the annual average values of the remote sensing variables for compatibility with the 2020–2023 aggregated asthma data. Additionally, using the Sentinel-5 data’s temporal coverage and spatial consistency, the spatial–temporal relationships between these variables and asthma prevalence were examined.
After preparing the spatial layers for each predictive variable, all selected spatial data indicators were transformed to the UTM Projection, WGS-84 Datum, Zone 39 N, with a uniform cell size of 30 m resolution to prevent any spatial output errors. The values of each variable (averaging where necessary) were then extracted for each neighborhood and stored in a geodatabase using ArcGIS Pro, maintaining the same coordinate system.
Figure 3 illustrates the spatial distribution of predictor value ranges across the study area, collected at the neighborhood level, which serves as the spatial analysis unit.

2.3. Analytical Methods

2.3.1. Statistical Methods Used for Descriptive Analysis

We employed Fisher’s exact test, an extension of the Chi-squared analysis [73], to evaluate the association between age, sex, and disease outcome to determine their independence under the null hypothesis. Additionally, analysis of variance (ANOVA) serves as a valuable statistical method for assessing disparities among the means of three or more groups, akin to an extension of the t-test for comparing multiple independent samples [74]. In this study, ANOVA was employed to assess the differences in mean ages across distinct age groups—categorized as “children” (age < 12), “adolescents” (age between 12 and 17), “adults” (age between 18 and 59), and “elderly” (age 60 and above)—concerning various disease outcome categories.

2.3.2. Statistical Methods Used for Inferential Analysis

Negative Binomial Regression Model (NBRM)

There are various methods to detect localized influential predictors in regression analysis, such as Explanatory Regression (ER), Ordinary Least Squares (OLS), and Generalized Linear Models (GLMs) [75]. GLMs are particularly noted for their effectiveness in pinpointing the most significant predictors in count data, especially when overdispersion is present. We assessed overdispersion by comparing the residual deviance to the degrees of freedom.
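The overdispersion check mentioned above can be sketched for the simplest case, an intercept-only Poisson model, where the fitted mean is the sample mean. The counts are hypothetical, and the study's actual check was presumably run on the fitted regression model in R; this is only an illustration of comparing residual deviance with the degrees of freedom.

```python
import math


def poisson_deviance(y, mu):
    """Residual deviance 2 * sum(y * ln(y / mu) - (y - mu)), with 0 * ln(0) taken as 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


def dispersion_ratio(y, n_params=1):
    """Deviance divided by residual degrees of freedom for an intercept-only Poisson fit."""
    mean = sum(y) / len(y)
    deviance = poisson_deviance(y, [mean] * len(y))
    return deviance / (len(y) - n_params)


# Hypothetical neighborhood case counts, not data from the study.
counts = [0, 1, 0, 2, 15, 3, 0, 9, 1, 4]
ratio = dispersion_ratio(counts)
```

A ratio well above 1 suggests more variability than a Poisson model allows, which is the situation in which a negative binomial model is usually preferred.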
Although machine learning algorithms handle multicollinear data better than simple regressors, multicollinearity can still affect model interpretability and feature importance across different machine learning algorithms [76]. Therefore, we first ran a multicollinearity test in R-Studio using the “car” package [77] to find predictors with a high variance inflation factor (VIF). Following previous studies, we treated VIF > 5 as the removal threshold [42]; only one variable (V14), which had the highest VIF (5.2), exceeded it and was removed.
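For orientation, the standard definition of the variance inflation factor for predictor $j$ is shown below, where $R_j^2$ is the coefficient of determination obtained by regressing predictor $j$ on all other predictors. The worked value is simple arithmetic on the reported VIF of 5.2 and is not a figure stated by the authors.

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad \mathrm{VIF}_j = 5.2 \;\Rightarrow\; R_j^2 = 1 - \frac{1}{5.2} \approx 0.81$$

In other words, a VIF of 5.2 means that roughly 81% of the variance of V14 can be explained by the remaining predictors.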
In the next step, Negative Binomial Regression (NBRM) was employed using the “MASS” package in R-Studio [78] to examine the relationship between asthma cases and predictors in this study. This method is ideal for count data with overdispersion, a common issue in ecological and health studies [79]. NBRM extends Poisson regression to accommodate this variability, providing a better fit and more reliable coefficient estimates and inference. In the generalized linear model framework, NBRM (Equation (3)) models the total count of events (Y) within a defined space–time interval, parameterized from a Poisson–gamma mixture as described by Hilbe [80], or equivalently, as the count of failures before achieving the (1/α)th success. The NBRM model can be written as [81]:
$$\Pr(y_i \mid x_i) = \frac{\Gamma\left(y_i + \alpha^{-1}\right)}{y_i!\,\Gamma\left(\alpha^{-1}\right)} \left(\frac{\alpha^{-1}}{\alpha^{-1} + \mu_i}\right)^{\alpha^{-1}} \left(\frac{\mu_i}{\alpha^{-1} + \mu_i}\right)^{y_i} \tag{3}$$
The formula represents the probability mass function of Negative Binomial Regression (NBRM), where $\Pr(y_i \mid x_i)$ denotes the probability of observing $y_i$ events given predictor variables $x_i$. The function incorporates parameters $\alpha$ and $\mu_i$, which describe the dispersion and mean, respectively. This equation is crucial for modeling count data with overdispersion, accommodating scenarios where the variance exceeds the mean, as commonly encountered in ecological and health studies [81].
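The probability mass function above can be evaluated numerically with a short sketch. It works on the log scale with `math.lgamma` for stability; the parameter values in the example are hypothetical. The variance expression μ + αμ² is the standard result for this parameterization and is included to show how α controls overdispersion.

```python
import math


def nb_pmf(y, mu, alpha):
    """Negative binomial probability of count y with mean mu > 0 and dispersion alpha > 0."""
    inv_alpha = 1.0 / alpha
    log_p = (math.lgamma(y + inv_alpha) - math.lgamma(y + 1) - math.lgamma(inv_alpha)
             + inv_alpha * math.log(inv_alpha / (inv_alpha + mu))
             + y * math.log(mu / (inv_alpha + mu)))
    return math.exp(log_p)


def nb_variance(mu, alpha):
    """Variance implied by this parameterization: mu + alpha * mu**2 (exceeds the mean)."""
    return mu + alpha * mu ** 2


# Hypothetical parameters: mean of 4 cases per neighborhood, dispersion 0.5.
mu, alpha = 4.0, 0.5
probabilities = [nb_pmf(y, mu, alpha) for y in range(500)]
total = sum(probabilities)
mean = sum(y * p for y, p in enumerate(probabilities))
variance = sum((y - mean) ** 2 * p for y, p in enumerate(probabilities))
```

As α approaches zero the variance approaches the mean, and the distribution approaches the Poisson case that NBRM generalizes.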

2.3.3. Geospatial and Spatial Statistics Methods for Spatial Analysis

Kernel Density Estimation (KDE)

A well-known quartic type of kernel density estimation (KDE) method [82] was applied to create a heat map with a 30 m cell size, mapping asthma cases within the study area. This method assesses asthma case density per square kilometer using a smooth, continuous surface fitted over observed data points. It employs a quartic (biweight) kernel to enhance spatial visualization and capture patterns effectively, enabling detailed mapping and nuanced modeling of event distribution [83]. The optimal bandwidth size was determined using the mean random distance (RD mean) method [84]. A bandwidth of 1000 m, based on RD mean calculations, was selected for its effectiveness in producing a smooth density map.
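A quartic kernel density estimate at a single location can be sketched as follows. The normalization constant 3/π is the usual two-dimensional form of the quartic kernel and is an assumption here, since the article does not spell out the kernel formula; the case coordinates are hypothetical.

```python
import math


def quartic_kernel(u):
    """2-D quartic (biweight) kernel: (3 / pi) * (1 - u^2)^2 inside the unit disc, else 0."""
    return (3.0 / math.pi) * (1.0 - u * u) ** 2 if u < 1.0 else 0.0


def kde_at(location, points, bandwidth):
    """Density (events per square map unit) at a location for a given bandwidth."""
    total = 0.0
    for px, py in points:
        distance = math.hypot(location[0] - px, location[1] - py)
        total += quartic_kernel(distance / bandwidth)
    return total / bandwidth ** 2


# Hypothetical case locations in metres; 1000 m bandwidth as in the text.
cases = [(0.0, 0.0), (500.0, 0.0), (2000.0, 0.0)]
density_per_m2 = kde_at((0.0, 0.0), cases, 1000.0)
density_per_km2 = density_per_m2 * 1_000_000
```

Cases farther away than the bandwidth contribute nothing, which is why the third hypothetical point does not affect the estimate at the origin.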

The Hot Spot Analysis (Getis-Ord Gi*)

The Getis-Ord Gi* statistic, known as hot spot analysis (HAS), is a cluster mapping technique employed in health-related analysis to examine event density in specific locations [85]. Hot spots are areas with concentrated incidents vital for disease prevention [86]. The Gi* statistic identifies local spatial autocorrelation through Z score and p-value, indicating clusters of high or low values [86]. Significant Z scores arise when a feature’s local sum and neighbors differ significantly from the overall sum, suggesting non-random clustering [87]. Using point feature patterns, the Gi* statistic has been widely used to pinpoint hot and cold spots [88]. This study utilized the Getis-Ord Gi* statistic to detect hot and cold spots of asthma occurrences based on sample data at the neighborhood level, applying the “K nearest neighbors” (KNN) method for spatial relationship conceptualization [86]. In spatial analysis, K nearest neighbors (KNN) identify the K closest polygons to a target based on metrics like Euclidean distance, which are ideal for detecting clusters or patterns involving neighboring polygons. This method is flexible, allowing us to adjust K for tailored proximity analysis and understand spatial relationships among features [85,86].
Our analysis used the false discovery rate (FDR), a statistical technique to manage false positives among significant outcomes during multiple-hypothesis testing. By lowering critical p-values, FDR addresses the heightened likelihood of false positives due to multiple comparisons, thereby enhancing accuracy in identifying genuinely significant findings in extensive datasets or spatial analyses [85,86].
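To make the idea of lowering the critical p-value concrete, the sketch below applies the Benjamini–Hochberg step-up procedure, one common way to control the false discovery rate. The article does not state which FDR procedure its software applies, so this choice is an assumption, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which tests remain significant under FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank whose p-value is below its rank-specific threshold.
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * q:
            cutoff_rank = rank
    significant = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[index] = True
    return significant


# Hypothetical p-values for five neighborhoods.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
flags = benjamini_hochberg(p_values, q=0.05)
```

In this hypothetical example, two p-values that would pass an uncorrected 0.05 threshold (0.039 and 0.041) are no longer flagged once the multiple comparisons are accounted for.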
KDE creates a smooth surface to visualize event density, highlighting areas of high concentration without assessing statistical significance. In contrast, HAS (Getis-Ord Gi*) uses Z-scores and p-values to identify statistically significant clusters of high and low values, providing an analytical approach to detect local spatial autocorrelation [86].
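The Gi* Z-score with a K-nearest-neighbors conceptualization can be sketched as below. The sketch assumes binary weights, includes the focal unit in its own neighborhood (the convention that distinguishes Gi* from Gi), and uses centroids in place of polygons; the coordinates, counts, and K are hypothetical and do not reproduce the settings used in the study.

```python
import math


def knn_with_self(points, i, k):
    """Indices of unit i and its k nearest neighbours by Euclidean distance."""
    xi, yi = points[i]
    others = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: math.hypot(points[j][0] - xi, points[j][1] - yi))
    return [i] + others[:k]


def getis_ord_gi_star(points, values, k):
    """Gi* Z-scores with binary KNN weights; requires k + 1 < n and non-constant values."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    z_scores = []
    for i in range(n):
        neighbours = knn_with_self(points, i, k)
        w_sum = float(len(neighbours))      # sum of binary weights w_ij
        w_sq_sum = w_sum                    # sum of w_ij^2 equals w_sum for 0/1 weights
        local_sum = sum(values[j] for j in neighbours)
        numerator = local_sum - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores


# Hypothetical centroids along a line with high counts at one end.
centroids = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
case_counts = [9, 8, 7, 1, 0, 2]
z = getis_ord_gi_star(centroids, case_counts, k=2)
```

Positive Z-scores mark units whose neighborhood sum is higher than expected (candidate hot spots) and negative ones mark candidate cold spots; significance would still need to be judged from the associated p-values.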

2.3.4. Methods for Spatial Predictions Using MLAs

Among various types of machine learning algorithms, Random Forest (RF), Gradient Boosting Machine (GBM), and XGBoost algorithms were chosen for regression and prediction analysis. These MLAs are well-suited for spatial regression analysis due to their ability to manage complex relationships, large datasets, and noise. They enhance predictive accuracy, reveal key factors, and offer robust, generalizable insights into asthma prevalence [43,44,89].

Random Forest (RF)

Random Forest (RF) regression is a supervised ensemble machine learning algorithm developed by Leo Breiman and Adele Cutler [90]. It builds on the random decision forests introduced by Ho in 1995 [92], which Breiman extended in 2001 [93], and is valued for its ease of implementation, speed, and high performance. The algorithm constructs multiple decision trees in parallel using the bagging technique: it draws bootstrapped random samples from the input data and trains one decision tree on each sample [91]. Selecting random feature subsets for training decorrelates the trees, which reduces model variance and enhances prediction accuracy. For regression, RF predicts by averaging the outputs of the individual trees (majority voting is used only for classification), which mitigates the overfitting of individual trees. If implemented accordingly, RF can be adapted for count data, primarily through Poisson regression trees [91]. The result is a robust and intuitive model that requires fewer parameters [90].
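The bagging idea can be illustrated with a deliberately simplified sketch: each "tree" is a one-split stump trained on a bootstrap sample and a single randomly chosen feature, and the forest prediction is the average over stumps. This is far simpler than the "randomForest" package used in the study, and the data and function names are hypothetical.

```python
import random


def fit_stump(X, y, feature):
    """Best single split on one feature by squared error: (feature, threshold, left, right)."""
    best = None
    values = sorted(set(row[feature] for row in X))
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [yi for row, yi in zip(X, y) if row[feature] <= threshold]
        right = [yi for row, yi in zip(X, y) if row[feature] > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((v - left_mean) ** 2 for v in left)
               + sum((v - right_mean) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # feature is constant in this bootstrap sample
        overall = sum(y) / len(y)
        return (feature, float("inf"), overall, overall)
    return (feature, best[1], best[2], best[3])


def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    return left_mean if row[feature] <= threshold else right_mean


def fit_forest(X, y, n_trees=100, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]   # sampling with replacement
        feature = rng.randrange(p)                       # random feature subset of size 1
        forest.append(fit_stump([X[i] for i in sample], [y[i] for i in sample], feature))
    return forest


def predict_forest(forest, row):
    """Regression prediction: the average of the individual tree outputs."""
    return sum(predict_stump(stump, row) for stump in forest) / len(forest)


# Hypothetical data: a pollution-like feature rising and a greenness-like feature falling with y.
X = [[1, 9], [2, 8], [3, 6], [4, 5], [5, 3], [6, 1]]
y = [1, 1, 2, 6, 7, 8]
forest = fit_forest(X, y, n_trees=100, seed=0)
low_risk = predict_forest(forest, [1, 9])
high_risk = predict_forest(forest, [6, 1])
```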

Gradient Boosting Machine (GBM)

GBM is a robust machine learning algorithm used for regression tasks. It builds an ensemble of decision trees sequentially, where each new tree is fitted to the residuals of the current ensemble and thereby corrects the errors made by the previous trees. This technique uses weak learners (simple decision trees) and employs the mean squared error loss to improve the model progressively. Key features of GBM include the learning rate, which controls the contribution of each tree, and regularization techniques that prevent overfitting by limiting tree depth and applying penalties [94]. The algorithm also supports early stopping, in which performance on a validation set is monitored and training halts when improvement ceases, and it provides extensive hyperparameter tuning options for precise control over model complexity. Known for its high accuracy and ability to handle large datasets [95], GBM is highly effective in various practical applications. It offers strong performance in data-driven tasks and competitive machine learning challenges [96].
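The residual-fitting loop described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R "gbm" package the study used; the one-split "stump" learner, the toy data, and the learning rate of 0.1 are assumptions made for the example (100 trees mirrors the setting reported in the text).

```python
# Minimal gradient-boosting sketch for squared-error loss (illustrative only;
# not the authors' implementation). Uses a single numeric feature.

def fit_stump(x, residuals):
    """Return a one-split regression tree fitted to the residuals.

    Assumes x contains at least two distinct values.
    """
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def gradient_boost(x, y, n_trees=100, learning_rate=0.1):
    """Fit stumps sequentially, each one to the current residuals."""
    base = sum(y) / len(y)              # initial prediction: the mean
    stumps = []
    predictions = [base] * len(y)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * stump(xi)
                       for xi, pi in zip(x, predictions)]

    def predict(xi):
        return base + learning_rate * sum(s(xi) for s in stumps)

    return predict


def training_mse(model, x, y):
    return sum((model(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)


# Hypothetical toy data: one predictor and a count-like response.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 1, 2, 2, 5, 6, 6, 7]

mse_0 = training_mse(gradient_boost(x, y, n_trees=0), x, y)
mse_1 = training_mse(gradient_boost(x, y, n_trees=1), x, y)
mse_100 = training_mse(gradient_boost(x, y, n_trees=100), x, y)
```

On the training data the error falls as trees are added, which is also why a held-out set is needed to detect the point at which additional trees stop helping.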

Extreme Gradient Boosting (XGBoost)

XGBoost is an advanced implementation of GBM, recognized for its efficiency, flexibility, and optimization. It excels in handling sparse data, leveraging parallel processing, and utilizing weighted quantile sketching to enhance accuracy and scalability on large datasets. XGBoost supports custom loss functions, integrates built-in cross-validation, and is extensively used in competitive environments and industry applications for classification, regression, and ranking, owing to its strong performance and rich feature set [97].

MLA Implementation Procedure

The packages “randomForest” [93], “gbm”, and “xgboost” [97] were used in the R-Studio software (Version 2024.12.0+467) to run the RF, GBM, and XGBoost algorithms, respectively. Similar settings were applied across the algorithms so that the same analyses were performed and the model outputs could be compared. We loaded the input database, comprising the seven most significant predictors and our response variable (asthma cases), into R-Studio as a comma-separated values file (*.csv). We set the number of trees to 100 for each algorithm. To evaluate model performance, we used the holdout method, a technique in machine learning in which the dataset is split into separate training and testing sets [98]: 80% of the data was allocated for training, and the remaining 20% was reserved for testing.
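A holdout split of this kind can be sketched as follows. The example is in Python rather than R, and the function name, the random seed, and the use of integer neighborhood identifiers are assumptions made for illustration; only the 80/20 proportion and the 350 neighborhoods come from the text.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows reproducibly and reserve a fraction for testing."""
    rng = random.Random(seed)       # fixed seed so the split is repeatable
    shuffled = rows[:]              # copy, leaving the input untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical identifiers standing in for the 350 neighborhoods.
neighborhood_ids = list(range(350))
train, test = holdout_split(neighborhood_ids)
```

Note that a purely random split ignores spatial structure; neighboring units can end up on both sides of the split, which is one reason residual autocorrelation is checked separately.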

MLAs Accuracy Assessment

The most commonly used validation metrics for count data in MLAs, including R-squared (R2), root mean squared error (RMSE), mean squared error (MSE), mean absolute error (MAE), and explained variance (EV), also called variance explained (VE), were applied to measure model performance and select the best model for analysis and interpretation [99]. R-squared (R2) measures how well a regression model fits the data; values typically range from 0 (poor fit) to 1 (perfect fit), although R2 can fall below 0 on held-out data when a model predicts worse than the mean of the observed values [99]:
$$
R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(\mathrm{predicted}_{i} - \mathrm{actual}_{i}\right)^{2}}{\sum_{i=1}^{n}\left(\mathrm{actual}_{i} - \mathrm{mean}(\mathrm{actual})\right)^{2}}
$$
MSE is a metric used in regression analysis to measure the average squared differences between predicted and actual values. It provides a way to quantify the overall quality of a model’s predictions, where lower MSE values indicate better performance [99]:
$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{predicted}_{i} - \mathrm{actual}_{i}\right)^{2}
$$
where the number of observations is represented by n.
RMSE in regression quantifies average prediction errors by taking the square root of the average squared differences between predicted and actual values, indicating prediction accuracy. Lower RMSE values signify better alignment between predicted and actual outcomes, indicating superior model performance in minimizing prediction errors [100]:
$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{predicted}_{i} - \mathrm{actual}_{i}\right)^{2}}
$$
MAE is a metric that measures the average absolute differences between predicted and actual values. It provides a straightforward way to quantify the magnitude of errors in a model’s predictions, where lower MAE values indicate better predictive accuracy [99]:
$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\mathrm{predicted}_{i} - \mathrm{actual}_{i}\right|
$$
In statistics, explained variance (EV) measures the proportion of a dataset’s variation (dispersion) that a mathematical model accounts for. In regression, it indicates how well the predictors account for variability in the response variable. Higher EV values indicate greater explanatory power, which is crucial for assessing model effectiveness or dimensionality reduction success [101]. We applied all suitable metrics to measure the MLAs’ performance using the “caret” package designed for MLAs [102].
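The metrics above can be computed directly from paired observed and predicted values. The sketch below is in Python rather than the "caret" package the study used, and the toy values are hypothetical. The text gives no formula for EV, so the common definition, one minus the variance of the errors divided by the variance of the observed values, is assumed here.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def r_squared(actual, predicted):
    m = _mean(actual)
    ss_res = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - m) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mse(actual, predicted):
    return _mean([(p - a) ** 2 for a, p in zip(actual, predicted)])


def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    return _mean([abs(p - a) for a, p in zip(actual, predicted)])


def explained_variance(actual, predicted):
    # Assumed definition: 1 - Var(actual - predicted) / Var(actual).
    errors = [a - p for a, p in zip(actual, predicted)]
    err_mean, act_mean = _mean(errors), _mean(actual)
    var_err = _mean([(e - err_mean) ** 2 for e in errors])
    var_act = _mean([(a - act_mean) ** 2 for a in actual])
    return 1 - var_err / var_act


# Hypothetical example: every prediction is too high by exactly 1.
actual = [2, 4, 6, 8]
predicted = [3, 5, 7, 9]

scores = {
    "R2": r_squared(actual, predicted),
    "MSE": mse(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "EV": explained_variance(actual, predicted),
}
```

Under this definition EV ignores a constant bias while R2 penalizes it, so the two agree only when the mean error is zero; in the toy case R2 is 0.8 and EV is 1.0.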
After applying the MLAs, the Global Moran’s Index (GMI) was employed to assess spatial autocorrelation in model residuals, offering insights into potential spatial dependencies. This index is instrumental in identifying whether the residuals demonstrate significant spatial clustering or dispersion across the study area, which aids in selecting the most suitable model. The GMI is defined as follows [86,90]:
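$$
I = \frac{n\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\left(x_{i}-\bar{x}\right)\left(x_{j}-\bar{x}\right)}{\left(\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\right)\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}
$$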
where n represents the total number of spatial units (in this case, the number of neighborhoods in the Tehran metropolitan area); $x_i$ denotes the value of the variable of interest in neighborhood i (here, the model residual); $\bar{x}$ is the mean of that variable across all neighborhoods; and $w_{ij}$ represents the spatial weight between neighborhoods i and j. Moran’s Index (I) ranges from −1 to +1, with values further from zero indicating stronger (positive or negative) spatial autocorrelation [103]. To compute the GMI, training and test data residuals were assigned as value fields for each study unit, and the index was then calculated using GeoDa software (Version 1.22.0.4) [104].
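As a worked illustration of the index, the sketch below computes Global Moran's I for a hypothetical row of four spatial units with binary adjacency weights. It is written in Python rather than GeoDa, and the values and weight matrix are invented for the example.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)                 # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))      # weighted cross-products
    ss = sum(d * d for d in dev)                          # sum of squared deviations
    return (n / s0) * (cross / ss)


# Hypothetical layout: four units in a row, each adjacent to its neighbors.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

trend_residuals = [1, 2, 3, 4]          # similar values sit next to each other
alternating_residuals = [1, -1, 1, -1]  # neighbors always differ

i_trend = morans_i(trend_residuals, w)
i_alternating = morans_i(alternating_residuals, w)
```

The smoothly trending values give I = 1/3 (positive autocorrelation) and the alternating values give I = −1 (perfect dispersion), so residuals with I near zero are the desirable case for a regression model.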

3. Results

3.1. Non-Spatial Descriptive Findings

According to the temporal trend of asthma cases in the sample data, 1473 cases occurred over four years. Just two cases, or 0.14% of the total, were reported in 2020, and 46 cases (3.12%) in 2021. The count then rose sharply to 823 cases (55.86%) in 2022 before decreasing to 602 cases (40.88%) in 2023. Data analysis revealed that the average age of the patients was 53 years. Among all cases, 47.5% (n = 699) were female, and 52.5% (n = 774) were male. Among all cases (N = 1473), 12.61% (n = 186) were children, 2.02% (n = 28) were adolescents, 38.6% (n = 569) were adults, and 46.77% (n = 690) were elderly. Fisher’s exact test showed a significant (p-value < 0.00) relationship between age groups and asthma. However, the relationship between gender and asthma prevalence was not significant (p-value > 0.05), showing no substantial difference in disease prevalence between genders. The average length of hospital stay was five days. The time distribution of the data shows that nearly 60% of the patients were admitted from February to July. Within the studied samples, 41.61% (n = 614) had achieved complete recovery, 35.54% (n = 524) had partially recovered, 11.83% (n = 172) were discharged, and 11.12% (n = 163) had passed away. The results of the ANOVA test indicated that the age group variable has a highly significant effect on the outcome of death (p < 2 × 10−16). Among the total deaths due to asthma (163 cases), 53% (n = 85) were men. Notably, the highest proportion of mortality was observed among the elderly age group (8.81%), followed by adults (2.24%).

3.2. Spatial Analysis Findings

3.2.1. KDE Method Results

Across the 350 neighborhoods studied, the KDE map revealed significant variability in asthma cases. The mean KDE value is 3.4, with a standard deviation of 2.38 and a range of 11.73 (minimum 0.04, maximum 11.77). Notably, 161 neighborhoods (approximately 46%) have KDE values exceeding the mean of 3.4 cases per square kilometer, indicating higher asthma burdens in these areas (see Figure 1).

3.2.2. Hot Spot Analysis Results

The “hot spot” analysis identified 87 hot spots (24.9%, with p-value < 0.05 and Gi* statistic > 1) and 88 cold spots (25.1%, with p-value < 0.05 and Gi* statistic < −1). Hot spots have significantly higher concentrations of asthma cases, while cold spots have significantly lower concentrations. The spatial distribution of these spots reveals that areas with hot spots and high asthma burden are in the west and east of the city center, indicating localized clusters of high asthma incidence (see Figure 4).

3.3. Results from Negative Binomial Regression Model (NBRM)

In our analysis of the factors affecting the distribution of asthma cases in Tehran, we employed a Negative Binomial Regression Model (NBRM) to account for overdispersion in the count data. The predictors included variables V1 through V15, with the response variable being the count of asthma cases. We calculated each predictor’s variance inflation factor (VIF) to address multicollinearity. A VIF value exceeding 5 indicates potential multicollinearity issues. Predictors with VIF values greater than 5, such as V14, were removed iteratively, so the final set of predictors included in our NBRM was free of significant multicollinearity. We assessed overdispersion by comparing the residual deviance to the degrees of freedom. The ratio of deviance to degrees of freedom was 0.632, indicating no significant overdispersion in the model. We employed a stepwise selection method to refine the model further. The final model, selected based on the lowest Akaike information criterion (AIC), is summarized in Table 2.
The lack of significant overdispersion and multicollinearity issues validated the model’s robustness. The stepwise selection process further optimized the model, ensuring the inclusion of only the seven most relevant predictors. The final model emphasizes several key factors that significantly impact the distribution of asthma cases in Tehran. Variables V1 (population density), V4 (proportion of unemployed people (%)), V5 (particulate matter including PM2.5 and PM10), V6 (nitrogen dioxide (NO2)), V8 (sulfur dioxide (SO2)), V9 (neighborhood deprivation index (%)), and V10 (road intersection density) consistently emerged as significant across model iterations, and they were therefore used as the predictors in our subsequent analysis and MLA predictions (Table 3).
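The two diagnostics used here can be stated as formulas. The symbols below ($R_j^2$, $D_{\mathrm{res}}$, $\mathrm{df}_{\mathrm{res}}$, $\hat{\phi}$) are notation introduced for this illustration; the VIF expression is its standard definition, and the dispersion ratio follows the deviance-based check described in the text:

$$
\mathrm{VIF}_j = \frac{1}{1 - R_j^{2}}, \qquad \mathrm{VIF}_j > 5 \iff R_j^{2} > 0.8, \qquad
\hat{\phi} = \frac{D_{\mathrm{res}}}{\mathrm{df}_{\mathrm{res}}} = 0.632
$$

Here $R_j^2$ is the coefficient of determination obtained by regressing predictor $j$ on all other predictors, so the cutoff of 5 removes any predictor for which the others explain more than 80% of its variance. A dispersion ratio well above 1 would signal overdispersion that the model has not absorbed; the reported value of 0.632 is below 1.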

3.4. Results and Performance of MLAs

The evaluation of various machine learning algorithms (MLAs) for predicting the response variable “asthma cases” across different neighborhoods offers insights into the performance of Random Forest (RF), Gradient Boosting Machine (GBM), and XGBoost. The evaluation metrics, including RMSE (root mean squared error), R-squared, MAE (mean absolute error), explained variance (EV), and Moran’s I, help assess each model’s fit to the training data and its ability to generalize to unseen test data, as well as the spatial randomness of residuals. These metrics are universally applicable and highly relevant for RF, GBM, and XGBoost regression models. In this study, we used 20% of the spatial data for testing. The MLA diagnostics are summarized in Table 4.
The Random Forest algorithm demonstrates moderate performance, with a training RMSE of 0.56 and a test RMSE of 1.08, indicating a good fit on the training data but reduced performance on the test data. The R-squared values are 0.96 for the training set and 0.75 for the test set, suggesting that the algorithm explains a significant portion of the variance in the response variable, though less effectively on the test data. The MAE values are 0.40 for training and 0.84 for testing, highlighting higher prediction errors on unseen data. The explained variance (EV) values of 1.00 for training and 0.74 for testing further indicate that the algorithm performs well on the training data but may not generalize as effectively. The Moran’s I value of 0.29 indicates moderate spatial autocorrelation in the residuals.
The Gradient Boosting algorithm performs slightly better on the test data, with a training RMSE of 0.56 and a test RMSE of 1.07, marginally below the Random Forest test RMSE of 1.08. The R-squared values are 0.95 for training and 0.76 for testing, reflecting good explanatory power for the variance in the data. The MAE values of 0.43 for training and 0.88 for testing are slightly higher than those of the Random Forest algorithm (0.40 and 0.84), so GBM’s advantage lies in RMSE, R-squared, and EV rather than in MAE. Additionally, the EV values of 0.95 for training and 0.75 for testing suggest effective pattern capture without substantial overfitting. The Moran’s I value of 0.17 suggests lower spatial autocorrelation in the residuals compared to the Random Forest.
The XGBoost algorithm achieves the lowest training RMSE of 0.22 and the highest training R-squared of 0.99, indicating an almost perfect fit on the training data. However, the test RMSE of 1.21 and test R-squared of 0.69 suggest a notable decline in performance on the test data, implying potential overfitting. The MAE values are 0.16 for training and 0.91 for testing, showing minimal errors on the training set but higher errors on the test set. The EV values of 0.99 for training and 0.68 for testing further confirm that XGBoost may be overfitting to the training data. Additionally, Moran’s I value of 0.12 indicates the lowest level of spatial autocorrelation in the residuals among the three algorithms, suggesting that XGBoost may handle spatial dependencies more effectively.
Among the Random Forest (RF), Gradient Boosting Machine (GBM), and XGBoost algorithms evaluated, GBM emerged as the best performer. It achieved the lowest test RMSE (1.07), the highest test R-squared (0.76, against 0.95 for training), and the highest test explained variance (0.75, against 0.95 for training), indicating a superior balance between training and test performance. The algorithm also exhibited low spatial autocorrelation in residuals, with a Moran’s I value of 0.17 (lower than RF at 0.29, though higher than XGBoost at 0.12), making it the most robust and suitable for predicting asthma cases in this dataset.
The XGBoost algorithm, while showing the lowest training RMSE (0.22) and highest training R-squared (0.99), demonstrated a significant decline in performance on the test data (test RMSE of 1.21 and test R-squared of 0.69), suggesting potential overfitting. Its Moran’s I value of 0.12 indicates the lowest spatial autocorrelation in residuals, which is a positive aspect. Still, the overall model performance on test data was not as strong as GBM. The Random Forest algorithm had a test RMSE of 1.08, with R-squared values of 0.96 for training and 0.75 for testing, showing fair performance. However, it had a higher level of spatial autocorrelation in residuals (Moran’s I value of 0.29) compared to GBM and XGBoost.
In sum, the GBM algorithm is the most suitable for this regression task. It offers a well-balanced and robust performance across various metrics, including low spatial autocorrelation in residuals, making it the best choice among the three algorithms for predicting asthma cases in this dataset.
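The comparison can be made explicit by computing each model's generalization gap (test RMSE minus training RMSE) from the values reported above. The sketch below is in Python; the dictionary layout and variable names are assumptions for illustration, while the numbers are those quoted in this section.

```python
# Metrics as reported in the text for each algorithm.
reported = {
    "RF":      {"train_rmse": 0.56, "test_rmse": 1.08, "test_r2": 0.75, "morans_i": 0.29},
    "GBM":     {"train_rmse": 0.56, "test_rmse": 1.07, "test_r2": 0.76, "morans_i": 0.17},
    "XGBoost": {"train_rmse": 0.22, "test_rmse": 1.21, "test_r2": 0.69, "morans_i": 0.12},
}

# Generalization gap: how much worse the model does on unseen data.
gaps = {
    name: round(m["test_rmse"] - m["train_rmse"], 2)
    for name, m in reported.items()
}

best_by_test_rmse = min(reported, key=lambda name: reported[name]["test_rmse"])
lowest_residual_autocorrelation = min(reported, key=lambda name: reported[name]["morans_i"])
```

XGBoost's gap (0.99) is roughly twice that of RF (0.52) and GBM (0.51), which is the quantitative basis for the overfitting concern, even though XGBoost has the lowest residual Moran's I.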
The variable importance diagnostics from the Gradient Boosting Machine (GBM) algorithm using R-Studio indicate that V1 (6.49%), V5 (4.53%), V9 (6.86%), and V10 (36.53%) are the most influential predictors for asthma occurrence, with V10 being the most significant. In contrast, V4 (1.2%), V6 (2.25%), and V8 (2.28%) contribute minimally to the model’s predictive power. Mapping the most essential variables provides a detailed view of asthma occurrence by highlighting spatial patterns and trends. Figure 5 displays the spatial distribution of the values of the most important predictors in the study area at a high-resolution scale (100-cell size) based on the Gradient Boosting algorithm, our best-fitting model.

3.5. Visualizing Risk Prediction of Disease

The primary outcome of our study is a risk map developed using key predictors to assess the probability of asthma in the study area (Figure 6). Utilizing our best-fitting Gradient Boosting Machine (GBM) algorithm, validated with appropriate metrics, we generated a risk prediction map to evaluate the likelihood of asthma occurrence. To assess the model’s accuracy, we employed a bivariate choropleth map in ArcGIS Pro version 3.2 [105]. This map compares the actual vs. predicted risk of asthma occurrence across various urban neighborhoods as determined by the GBM algorithm. The map demonstrates a strong agreement (R2 = 0.92) between the model’s predicted values and the observed asthma case count per neighborhood. In a scatter plot, the R2 value, or the coefficient of determination, measures how well the independent variable(s) explain the variance in the dependent variable. R2 ranges from 0 to 1: a value closer to 1 indicates that the data points fit the regression line well, meaning a strong relationship between the variables, whereas a value closer to 0 suggests a weak relationship, with the data points more widely scattered around the line. The red line represents the identity line, indicating perfect prediction where observed values exactly match predicted values.
Created in QGIS (QGIS Development Team, 2024), the map is designed for a straightforward interpretation, with high-risk neighborhoods highlighted. Based on the GBM model’s results (Figure 6), around 164 neighborhoods (46.85%), home to an estimated 4,400,000 people (52% of the city’s total population), are identified as high-risk areas, with predicted values exceeding the mean of approximately 3.42. The map indicates that the southwest and southeast regions near the city center have the highest risk of asthma occurrence. These insights can guide public health interventions and resource allocation, ensuring that high-risk areas receive the necessary attention to mitigate asthma risks.
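The high-risk rule described above, flagging neighborhoods whose predicted value exceeds the mean prediction, can be sketched as follows. The example is in Python, and the neighborhood labels and predicted values are hypothetical; only the thresholding rule comes from the text.

```python
def flag_high_risk(predicted):
    """Return the mean prediction and the units whose prediction exceeds it."""
    threshold = sum(predicted.values()) / len(predicted)
    flagged = sorted(name for name, value in predicted.items() if value > threshold)
    return threshold, flagged


# Hypothetical predicted values for four neighborhoods.
predicted = {"A": 1.0, "B": 2.5, "C": 4.0, "D": 6.5}

threshold, flagged = flag_high_risk(predicted)
share_flagged = 100 * len(flagged) / len(predicted)
```

Because the cutoff is the mean, the share of flagged units depends on the skew of the predictions; in the study this rule classed 164 of 350 neighborhoods as high-risk.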

4. Discussion

This study revealed significant age-related differences in asthma prevalence and outcomes, with a notable relationship between age groups and asthma incidence. Our findings showed that elderly patients with asthma are at a higher risk for morbidity and mortality from their asthma than younger patients, as corroborated by previous studies [106,107,108]. Asthma morbidity and mortality are higher in the elderly due to underdiagnosis, comorbidities, and physiological changes like reduced lung elasticity and muscle strength. Immunosenescence, characterized by diminished immune responses and increased systemic inflammation that occurs with age, exacerbates asthma and infection risks [107,108]. Underutilization and reduced effectiveness of inhaled corticosteroids, coupled with airway remodeling, further elevate asthma risks. Additionally, age-related changes in lung structure, such as decreased chest wall compliance and increased airway obstruction, contribute to the severity of asthma in older adults [107,108].
In contrast, according to our findings, gender did not significantly correlate with asthma prevalence. Asthma prevalence is similar across genders due to balancing factors: boys have higher rates in childhood, while women have higher rates in adulthood due to hormonal influences and symptom reporting [109,110]. These opposing trends result in comparable overall prevalence. Genetic and environmental factors contributing to asthma risk also do not show strong gender bias, further equalizing prevalence rates between males and females [109,110].
Seasonal trends were observed, with most hospital admissions occurring from February to July, highlighting potential environmental or lifestyle factors influencing asthma exacerbations. According to previous studies [111,112,113,114], respiratory infections such as colds and flu are more common during these months, triggering asthma exacerbations [111]. Additionally, springtime increases pollen levels from trees, grasses, and flowers, leading to heightened allergic reactions in asthmatics. Changes in weather, including sudden temperature fluctuations and humidity [112,113] and increasing air pollutants [114], can also aggravate asthma symptoms. These combined factors contribute to the higher rate of asthma hospital admissions during this period.
Applying KDE analysis to 350 neighborhoods revealed significant spatial variability in asthma cases, with nearly half experiencing elevated asthma rates. The “hot spot” analysis further identified clusters of high and low asthma incidence, with notable concentrations of hot spots in the west and east of the city center. These results suggest localized factors affecting asthma prevalence and underscore the need to uncover underlying causes and guide targeted public health interventions. Asthma occurrences vary across large urban areas due to various environmental and socioeconomic factors and air pollutants, with some of these factors explored in this study. Higher asthma occurrences in specific neighborhoods are linked to elevated levels of air pollutants resulting from drought, traffic, and industrial activities [26]. Poor urban design, characterized by limited green spaces, combined with socioeconomic disparities, further impacts asthma occurrences, as lower-income areas face higher exposure to pollutants and have reduced access to healthcare [28,29]. Additionally, local climate and weather patterns can exacerbate asthma symptoms, leading to significant spatial differences in asthma occurrences within urban settings [112,113,114].
Using a validated and robust NBRM that exhibited no issues with overdispersion or multicollinearity, we identified population density (V1), proportion of unemployed people (V4), particulate matter (including PM2.5 and PM10) (V5), nitrogen dioxide (NO2) (V6), sulfur dioxide (SO2) (V8), neighborhood deprivation (V9), and road intersection density (V10) as the most significant predictors of asthma distribution (Table 3). These results underscore the critical influence of environmental characteristics (including built environment characteristics), air pollutants, and socioeconomic conditions on asthma prevalence. We discuss the most influential of these factors in detail below, as ranked by the variable importance results of our machine learning analysis.
Among the MLAs evaluated for predicting asthma cases, the GBM emerged as the top performer. It achieved the lowest test RMSE and the highest test R-squared and explained variance values, indicating a strong balance between training and test performance. Additionally, the GBM algorithm demonstrated low spatial autocorrelation in residuals, making it the most robust and suitable algorithm for this dataset. The GBM findings highlighted road intersection density (V10), neighborhood deprivation index (V9), population density (V1), and particulate matter (V5) as the most influential predictors of asthma occurrence, with road intersection density being the most significant. In contrast, sulfur dioxide (V8), nitrogen dioxide (V6), and the proportion of unemployed people (V4) contributed minimally to the model’s predictive power. This underscores the reliability and effectiveness of the GBM algorithm, making it a valuable tool for similar predictive tasks in the future.
According to the GBM algorithm, road intersection density (V10) predicts asthma cases significantly. Increased traffic congestion at intersections leads to higher emissions of pollutants such as particulate matter and nitrogen dioxide, exacerbating respiratory conditions and reducing air quality, contributing to the development and worsening of asthma symptoms [26]. Urban neighborhood deprivation (V9) is linked to asthma due to various interconnected factors, such as limited access to healthcare, higher pollution levels, poor housing conditions, increased stress, and greater exposure to environmental hazards. These conditions contribute to higher rates of respiratory problems, including asthma, in deprived and impoverished areas [21,22,23]. Population density (V1) is also associated with asthma due to higher pollution levels, increased exposure to allergens, and more frequent respiratory infections in densely populated areas. These factors collectively contribute to a higher prevalence of asthma [7,8]. Particulate matter (PM2.5 and PM10) (V5) is significantly associated with asthma. Fine particles (PM2.5) can penetrate the lungs, causing inflammation and aggravating cardiovascular and respiratory conditions [14]. Increased levels of PM2.5 are linked to a 2–3% rise in asthma symptoms among children. Similarly, elevated PM10 concentrations are correlated with more frequent emergency room visits and hospital admissions for asthma, underscoring the substantial impact of particulate matter on asthma prevalence and severity [14,15,32,33,34].
The primary outcome of this study was the creation of a risk map using the Gradient Boosting Machine (GBM) algorithm to evaluate asthma occurrence across different neighborhoods. This map compared predicted asthma risks with observed cases and identified several high-risk areas, notably in the southwest and southeast zones near the city center. Compared to findings from previous studies [42,43,44], our use of various machine learning algorithms (MLAs) and a broader set of predictors enabled us to predict areas with the highest disease risk with greater reliability and accuracy. This approach confirmed earlier results and provided more nuanced insights into the spatial distribution of asthma risk, enhancing the robustness and precision of our predictive modeling.

4.1. Strengths, Limitations, and Future Directions

This study utilized GIS, remote sensing (RS), and ensemble machine learning algorithms, specifically GBM, to predict asthma-prone areas in urban settings. Identifying key predictors such as population density, particulate matter (PM2.5 and PM10), neighborhood deprivation index, and road intersection density yields a detailed risk map of Tehran’s high-risk areas. These findings offer critical insights for targeted public health interventions, assisting community planners and administrators in managing asthma and optimizing resource allocation. However, several limitations must be acknowledged. The study may not fully account for age-related complexities, gender differences, or all the environmental and lifestyle factors that influence asthma. Spatial variability suggests that localized factors influence prevalence, but the study may not account for all contributing variables. Reliance on specific data sources, such as satellite imagery and census data, may result in biases in spatial resolution, temporal variability, and data coverage. For example, remote sensing data may have limitations in accurately capturing fine-scale variations in urban environments, whereas census data may not fully reflect population dynamics or specific subpopulations. Furthermore, excluding factors such as urban heat islands and socioeconomic disparities risks overlooking essential determinants. These data-related biases may limit the results’ generalizability, so caution is advised when applying these findings to other settings with different environmental or social characteristics.
Future research should incorporate more comprehensive data and additional variables to enhance predictive accuracy. Moreover, while machine learning algorithms are robust and predictive, they are computationally intensive and prone to overfitting, require extensive tuning, and can face obstacles of interpretability, outliers, and imbalanced data.

4.2. Policy Implications

The study’s conclusions have ramifications for public health initiatives and urban development strategies meant to reduce the prevalence of asthma. The reported regional variations in Tehran’s asthma prevalence highlight the need for a focused and localized strategy. Strict air quality monitoring, emission control in busy locations, and the encouragement of cleaner modes of transportation, especially in impoverished and highly populated neighborhoods, are all essential components of successful programs. Green space and buffer zone integration must be prioritized in urban development to lower pollution exposure and enhance air quality.
Addressing inequities in asthma outcomes requires improved access to healthcare services in high-risk communities and more public health surveillance. Additionally, the study highlights the need to address age-related vulnerabilities, particularly the increased risks older populations face due to comorbidities and decreased lung function. Tailored healthcare interventions, such as early diagnosis and better management strategies, are crucial to meeting their needs. Public awareness campaigns and infrastructure investments aimed at reducing traffic congestion can help minimize environmental triggers of asthma. Furthermore, addressing socioeconomic disparities through improved housing, education, and economic development initiatives can play a key role in mitigating the burden of asthma.
Important insights into the function of essential predictors such as particulate matter, neighborhood deprivation, and traffic intersection density were obtained by applying machine learning models, especially GBM. These results highlight the need for evidence-based decisions to efficiently distribute resources and create interventions focusing on high-risk locations. Metropolitan planners and legislators may promote healthier and more resilient areas by enacting egalitarian, data-driven, and locally relevant policies.

5. Conclusions

In this study, we integrated GIS, remote sensing (RS), and ensemble machine learning algorithms to predict asthma-prone areas in urban environments. Our results indicate that ensemble machine learning algorithms effectively identify asthma risk areas, with the Gradient Boosting Machine (GBM) algorithm demonstrating superior accuracy compared to other algorithms. Key predictors in our model were population density, particulate matter (PM2.5 and PM10), neighborhood deprivation index, and road intersection density. The resulting asthma risk map highlighted higher-risk areas in the southern and western parts of Tehran near the city center, where increased population density and transportation contribute significantly to air pollution levels. Such risk maps provide valuable tools for community planners and administrators to manage and mitigate asthma in these regions. Additionally, our findings offer essential insights for guiding public health interventions and optimizing resource allocation to address asthma risks in the most affected neighborhoods.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijgi14030105/s1, Table S1: Data.

Author Contributions

Conceptualization, Alireza Mohammadi; methodology, Alireza Mohammadi; software, Alireza Mohammadi; validation, Alireza Mohammadi; formal analysis, Alireza Mohammadi; investigation, Elahe Pishgar; resources, Elahe Pishgar; data curation, Alireza Mohammadi; writing—original draft preparation, Alireza Mohammadi; writing—review and editing, Juan Aguilera; visualization, Alireza Mohammadi; supervision, Alireza Mohammadi; project administration, Alireza Mohammadi; funding acquisition, Alireza Mohammadi. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the University of Mohaghegh Ardabili, Iran, under grant number 14341. The funder provided financial support for the study, and the grant number has been verified for accuracy.

Data Availability Statement

The original contributions presented in this study are included in the Supplementary Material. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors thank the Masih Daneshvari Hospital (Tehran) for making the data accessible. The authors are grateful to the University of Mohaghegh Ardabili (Iran) for the opportunity to conduct this research. The authors also thank the editor-in-chief for their supportive comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. World Health Organization. Asthma. Available online: https://www.who.int/news-room/fact-sheets/detail/asthma (accessed on 9 September 2024).
  2. Merhej, T.; Zein, J.G. Epidemiology of Asthma: Prevalence and Burden of Disease. Adv. Exp. Med. Biol. 2023, 1426, 3–23. [Google Scholar] [CrossRef] [PubMed]
  3. Rabe, A.P.J.; Loke, W.J.; Gurjar, K.; Brackley, A.; Lucero-Prisno, D.E. Global Burden of Asthma, and Its Impact on Specific Subgroups: Nasal Polyps, Allergic Rhinitis, Severe Asthma, Eosinophilic Asthma. J. Asthma Allergy 2023, 16, 1097–1113. [Google Scholar] [CrossRef] [PubMed]
  4. Seyedrezazadeh, E.; Gilani, N.; Ansarin, K.; Yousefi, M.; Sharifi, A.; Rouhi, A.H.J.; Aftabi, Y.; Najmi, M.; Dastan, I.; Moghaddam, M.P. Economic Burden of Asthma in Northwest Iran. Iran. J. Med. Sci. 2023, 48, 156–166. [Google Scholar] [CrossRef] [PubMed]
  5. Fazlollahi, M.R.; Najmi, M.; Fallahnezhad, M.; Sabetkish, N.; Kazemnejad, A.; Bidad, K.; Shokouhi Shoormasti, R.; Mahloujirad, M.; Pourpak, Z.; Moin, M. The Prevalence of Asthma in Iranian Adults: The First National Survey and the Most Recent Updates. Clin. Respir. J. 2018, 12, 1872–1881. [Google Scholar] [CrossRef] [PubMed]
  6. Holmes, L.; Enwere, M.; Williams, J.; Ogundele, B.; Chavan, P.; Piccoli, T.; Chinaka, C.; Comeaux, C.; Pelaez, L.; Okundaye, O.; et al. Black–White Risk Differentials in COVID-19 (SARS-CoV2) Transmission, Mortality and Case Fatality in the United States: Translational Epidemiologic Perspective and Challenges. Int. J. Environ. Res. Public Health 2020, 17, 4322. [Google Scholar] [CrossRef] [PubMed]
  7. Shin, S.; Bai, L.; Burnett, R.T.; Kwong, J.C.; Hystad, P.; van Donkelaar, A.; Lavigne, E.; Weichenthal, S.; Copes, R.; Martin, R.V.; et al. Air Pollution as a Risk Factor for Incident Chronic Obstructive Pulmonary Disease and Asthma: A 15-Year Population-Based Cohort Study. Am. J. Respir. Crit. Care Med. 2021, 203, 1138–1148. [Google Scholar] [CrossRef] [PubMed]
  8. Aslam, R.; Sharif, F.; Baqar, M.; Nizami, A.S.; Ashraf, U. Role of Ambient Air Pollution in Asthma Spread among Various Population Groups of Lahore City: A Case Study. Environ. Sci. Pollut. Res. 2023, 30, 8682–8697. [Google Scholar] [CrossRef] [PubMed]
  9. Nanda, A.; Baptist, A.P.; Divekar, R.; Parikh, N.; Seggev, J.S.; Yusin, J.S.; Nyenhuis, S.M. Asthma in the Older Adult. J. Asthma 2020, 57, 241–252. [Google Scholar] [CrossRef] [PubMed]
  10. Khosa, J.K.; Louie, S.; Moreno, P.L.; Abramov, D.; Rogstad, D.K.; Alismail, A.; Matus, M.J.; Tan, L.D. Asthma Care in the Elderly: Practical Guidance and Challenges for Clinical Management-A Framework of 5 “Ps.”. J. Asthma Allergy 2023, 16, 33–43. [Google Scholar] [CrossRef]
  11. Zaeh, S.E.; Ramsey, R.; Bender, B.; Hommel, K.; Mosnaim, G.; Rand, C. The Impact of Adherence and Health Literacy on Difficult-to-Control Asthma. J. Allergy Clin. Immunol. Pract. 2022, 10, 386–394. [Google Scholar] [CrossRef]
  12. Jabre, N.A.; Keet, C.A.; McCormack, M.; Peng, R.; Balcer-Whaley, S.; Matsui, E.C. Material Hardship and Indoor Allergen Exposure among Low-Income, Urban, Minority Children with Persistent Asthma. J. Community Health 2020, 45, 1017–1026. [Google Scholar] [CrossRef] [PubMed]
  13. Perez, M.F.; Coutinho, M.T. An Overview of Health Disparities in Asthma. Yale J. Biol. Med. 2021, 94, 497–507. [Google Scholar]
  14. Qin, P.; Luo, X.; Zeng, Y.; Zhang, Y.; Li, Y.; Wu, Y.; Han, M.; Qie, R.; Wu, X.; Liu, D.; et al. Long-Term Association of Ambient Air Pollution and Hypertension in Adults and in Children: A Systematic Review and Meta-Analysis. Sci. Total Environ. 2021, 796, 148620. [Google Scholar] [CrossRef] [PubMed]
  15. Singh, G.K.; Rai, S.; Jadon, N. Major Ambient Air Pollutants and Toxicity Exposure on Human Health and Their Respiratory System: A Review. J. Environ. Manag. Tour. 2021, 12, 1774–1788. [Google Scholar] [CrossRef]
  16. Ayres-Sampaio, D.; Teodoro, A.C.; Sillero, N.; Santos, C.; Fonseca, J.; Freitas, A. An Investigation of the Environmental Determinants of Asthma Hospitalizations: An Applied Spatial Approach. Appl. Geogr. 2014, 47, 10–19. [Google Scholar] [CrossRef]
  17. Alvarez-Mendoza, C.I.; Teodoro, A.; Freitas, A.; Fonseca, J. Spatial Estimation of Chronic Respiratory Diseases Based on Machine Learning Procedures—An Approach Using Remote Sensing Data and Environmental Variables in Quito, Ecuador. Appl. Geogr. 2020, 123, 102273. [Google Scholar] [CrossRef]
  18. Sonwani, S.; Madaan, S.; Arora, J.; Suryanarayan, S.; Rangra, D.; Mongia, N.; Vats, T.; Saxena, P. Inhalation Exposure to Atmospheric Nanoparticles and Its Associated Impacts on Human Health: A Review. Front. Sustain. Cities 2021, 3, 690444. [Google Scholar] [CrossRef]
  19. Naclerio, R.; Ansotegui, I.J.; Bousquet, J.; Canonica, G.W.; D’Amato, G.; Rosario, N.; Pawankar, R.; Peden, D.; Bergmann, K.C.; Bielory, L.; et al. International Expert Consensus on the Management of Allergic Rhinitis (AR) Aggravated by Air Pollutants: Impact of Air Pollution on Patients with AR: Current Knowledge and Future Strategies. World Allergy Organ. J. 2020, 13, 100106. [Google Scholar] [CrossRef]
  20. Manisalidis, I.; Stavropoulou, E.; Stavropoulos, A.; Bezirtzoglou, E. Environmental and Health Impacts of Air Pollution: A Review. Front. Public Health 2020, 8, 14. [Google Scholar] [CrossRef]
  21. Altman, M.C.; Kattan, M.; O’Connor, G.T.; Murphy, R.C.; Whalen, E.; LeBeau, P.; Calatroni, A.; Gill, M.A.; Gruchalla, R.S.; Liu, A.H.; et al. Associations between Outdoor Air Pollutants and Non-Viral Asthma Exacerbations and Airway Inflammatory Responses in Children and Adolescents Living in Urban Areas in the USA: A Retrospective Secondary Analysis. Lancet Planet. Health 2023, 7, e33–e44. [Google Scholar] [CrossRef]
  22. Keet, C.A.; McCormack, M.C.; Pollack, C.E.; Peng, R.D.; McGowan, E.; Matsui, E.C. Neighborhood Poverty, Urban Residence, Race/Ethnicity, and Asthma: Rethinking the Inner-City Asthma Epidemic. J. Allergy Clin. Immunol. 2015, 135, 655–662. [Google Scholar] [CrossRef] [PubMed]
  23. Sullivan, P.W.; Ghushchyan, V.; Kavati, A.; Navaratnam, P.; Friedman, H.S.; Ortiz, B. Health Disparities Among Children with Asthma in the United States by Place of Residence. J. Allergy Clin. Immunol. Pract. 2019, 7, 148–155. [Google Scholar] [CrossRef]
  24. Roy, S.; Majumder, S.; Bose, A.; Chowdhury, I.R. The Rich-Poor Divide: Unravelling the Spatial Complexities and Determinants of Wealth Inequality in India. Appl. Geogr. 2024, 166, 103267. [Google Scholar] [CrossRef]
  25. Stewart, I.T.; Clow, G.L.; Graham, A.E.; Bacon, C.M. Disparate Air Quality Impacts from Roadway Emissions on Schools in Santa Clara County (CA). Appl. Geogr. 2020, 125, 102354. [Google Scholar] [CrossRef]
  26. Khreis, H. Traffic, Air Pollution, and Health. In Advances in Transportation and Health; Elsevier: Amsterdam, The Netherlands, 2020; pp. 59–104. [Google Scholar]
  27. Gasana, J.; Dillikar, D.; Mendy, A.; Forno, E.; Ramos Vieira, E. Motor Vehicle Air Pollution and Asthma in Children: A Meta-Analysis. Environ. Res. 2012, 117, 36–45. [Google Scholar] [CrossRef] [PubMed]
  28. Yu, H.; Zhou, Y.; Wang, R.; Qian, Z.; Knibbs, L.D.; Jalaludin, B.; Schootman, M.; McMillin, S.E.; Howard, S.W.; Lin, L.Z.; et al. Associations between Trees and Grass Presence with Childhood Asthma Prevalence Using Deep Learning Image Segmentation and a Novel Green View Index. Environ. Pollut. 2021, 286, 117582. [Google Scholar] [CrossRef] [PubMed]
  29. Zeng, X.W.; Lowe, A.J.; Lodge, C.J.; Heinrich, J.; Roponen, M.; Jalava, P.; Guo, Y.; Hu, L.W.; Yang, B.Y.; Dharmage, S.C.; et al. Greenness Surrounding Schools Is Associated with Lower Risk of Asthma in Schoolchildren. Environ. Int. 2020, 143, 105967. [Google Scholar] [CrossRef] [PubMed]
  30. Buteau, S.; Shekarrizfard, M.; Hatzopolou, M.; Gamache, P.; Liu, L.; Smargiassi, A. Air Pollution from Industries and Asthma Onset in Childhood: A Population-Based Birth Cohort Study Using Dispersion Modeling. Environ. Res. 2020, 185, 109180. [Google Scholar] [CrossRef]
  31. Ly, B.-T.; Kajii, Y.; Nguyen, T.-Y.L.; Shoji, K.; Van, D.-A.; Do, T.-N.N.; Nghiem, T.-D.; Sakamoto, Y. Characteristics of Roadside Volatile Organic Compounds in an Urban Area Dominated by Gasoline Vehicles, a Case Study in Hanoi. Chemosphere 2020, 254, 126749. [Google Scholar] [CrossRef]
  32. Arunab, K.S.; Mathew, A. Quantifying Urban Heat Island and Pollutant Nexus: A Novel Geospatial Approach. Sustain. Cities Soc. 2024, 101, 105117. [Google Scholar] [CrossRef]
  33. Aghamohammadi, N.; Ramakreshnan, L.; Supramanian, R.K.; Lim, Y.C. Climate Change Adaptation and Public Health Strategies in Malaysia. In Climate Change and Human Health Scenarios: International Case Studies; Springer: Berlin/Heidelberg, Germany, 2023; pp. 99–113. [Google Scholar]
  34. D’Amato, G.; Chong-Neto, H.J.; Monge Ortega, O.P.; Vitale, C.; Ansotegui, I.; Rosario, N.; Haahtela, T.; Galan, C.; Pawankar, R.; Murrieta-Aguttes, M.; et al. The Effects of Climate Change on Respiratory Allergy and Asthma Induced by Pollen and Mold Allergens. Allergy Eur. J. Allergy Clin. Immunol. 2020, 75, 2219–2228. [Google Scholar] [CrossRef] [PubMed]
  35. Simich, C.S.; Jones, M.P. Chapter Asthma. In Urban Emergency Medicine; Cambridge University Press & Assessment: Cambridge, UK, 2023; p. 98. [Google Scholar]
  36. Yasaratne, D.; Idrose, N.S.; Dharmage, S.C. Asthma in Developing Countries in the Asia-Pacific Region (APR). Respirology 2023, 28, 992–1004. [Google Scholar] [CrossRef] [PubMed]
  37. Sabeti, Z.; Ansarin, K.; Seyedrezazadeh, E.; Asghari Jafarabadi, M.; Zafari, V.; Dastgiri, S.; Shakerkhatibi, M.; Gholampour, A.; Ghanbari Ghozikali, M.; Ghasemzadeh, R.; et al. A Comparison of Asthma Prevalence in Adolescents Living in Urban and Semi-Urban Areas in Northwestern Iran. Hum. Ecol. Risk Assess. 2021, 27, 2051–2068. [Google Scholar] [CrossRef]
  38. Rahimian, N.; Aghajanpour, M.; Jouybari, L.; Ataee, P.; Fathollahpour, A.; Lamuch-Deli, N.; Kooti, W.; Kalmarzi, R.N. The Prevalence of Asthma among Iranian Children and Adolescent: A Systematic Review and Meta-Analysis. Oxid. Med. Cell. Longev. 2021, 2021, 6671870. [Google Scholar] [CrossRef] [PubMed]
  39. Shariat, M.; Rostamian, E.; Moayeri, H.; Shariat, M.; Sharifi, L. A Review on the Relation between Obesity and Vitamin D with Pediatric Asthma, and a Report of a Pilot Study in Tehran, Iran: Review Article. Tehran Univ. Med. J. 2020, 78, 274–283. [Google Scholar]
  40. Masoud, F.; Kashi, G. Air Pollution on Mortality from Asthma in Tehran during the Years 1391 to 1394. Iran. J. Allergy Asthma Immunol. 2018, 17, 181–182. [Google Scholar]
  41. Sharifi, L.; Pourpak, Z.; Fazlollahi, M.R.; Bokaie, S.; Moezzi, H.R.; Kazemnejad, A.; Moin, M. Asthma Economic Costs in Adult Asthmatic Patients in Tehran, Iran. Iran. J. Public Health 2015, 44, 1212–1218. [Google Scholar]
  42. Razavi-termeh, S.V.; Sadeghi-niaraki, A.; Choi, S.M. Spatial Modeling of Asthma-prone Areas Using Remote Sensing and Ensemble Machine Learning Algorithms. Remote Sens. 2021, 13, 3222. [Google Scholar] [CrossRef]
  43. Razavi-Termeh, S.V.; Sadeghi-Niaraki, A.; Choi, S.M. Spatio-Temporal Modelling of Asthma-Prone Areas Using a Machine Learning Optimized with Metaheuristic Algorithms. Geocarto Int. 2022, 37, 9917–9942. [Google Scholar] [CrossRef]
  44. Razavi-Termeh, S.V.; Sadeghi-Niaraki, A.; Choi, S.M. Asthma-Prone Areas Modeling Using a Machine Learning Model. Sci. Rep. 2021, 11, 1912. [Google Scholar] [CrossRef] [PubMed]
  45. Morrison, C.N.; Mair, C.F.; Bates, L.; Duncan, D.T.; Branas, C.C.; Bushover, B.R.; Mehranbod, C.A.; Gobaud, A.N.; Uong, S.; Forrest, S.; et al. Defining Spatial Epidemiology: A Systematic Review and Re-Orientation. Epidemiology 2024, 35, 542–555. [Google Scholar] [CrossRef] [PubMed]
  46. Kappas, M. GIS and Remote Sensing for Public Health. In Geospatial Data Science in Healthcare for Society 5.0; Springer: Berlin/Heidelberg, Germany, 2022; pp. 79–97. [Google Scholar]
  47. Cushing, A.M.; Khan, M.A.; Kysh, L.; Brakefield, W.S.; Ammar, N.; Liberman, D.B.; Wilson, J.; Shaban-Nejad, A.; Espinoza, J. Geospatial Data in Pediatric Asthma in the United States: A Scoping Review Protocol. JBI Evid. Synth. 2022, 20, 2790–2798. [Google Scholar] [CrossRef] [PubMed]
  48. Studies, P. Spatial Analysis and Determinants of Asthma Health and Health Services Use Outcomes in Ontario. Master's Thesis, University of Ottawa, Ottawa, ON, Canada, 2016. [Google Scholar]
  49. Spyroglou, I.I.; Spöck, G.; Chatzimichail, E.A.; Rigas, A.G.; Paraskakis, E.N. A Bayesian Logistic Regression Approach in Asthma Persistence Prediction. Epidemiol. Biostat. Public Health 2018, 15, e12777-1–e12777-14. [Google Scholar] [CrossRef]
  50. Roy, S.; Chowdhury, I.R. Intoxication in the City: Investigating Spatial Patterns and Determinants of Drugs and Alcohol-Related Illegal Activities in India's Geostrategic Corridor. Appl. Geogr. 2024, 171, 103386. [Google Scholar] [CrossRef]
  51. Grekousis, G. Spatial Analysis Methods and Practice: Describe-Explore-Explain Through GIS; Cambridge University Press: Cambridge, UK, 2020; ISBN 9781108614528. [Google Scholar]
  52. Amaral, J.L.M.; Sancho, A.G.; Faria, A.C.D.; Lopes, A.J.; Melo, P.L. Differential Diagnosis of Asthma and Restrictive Respiratory Diseases by Combining Forced Oscillation Measurements, Machine Learning and Neuro-Fuzzy Classifiers. Med. Biol. Eng. Comput. 2020, 58, 2455–2473. [Google Scholar] [CrossRef]
  53. Placido, D.; Yuan, B.; Hjaltelin, J.X.; Zheng, C.; Haue, A.D.; Chmura, P.J.; Yuan, C.; Kim, J.; Umeton, R.; Antell, G.; et al. A Deep Learning Algorithm to Predict Risk of Pancreatic Cancer from Disease Trajectories. Nat. Med. 2023, 29, 1113–1122. [Google Scholar] [CrossRef] [PubMed]
  54. Zhang, L.; Wang, Y.; Niu, M.; Wang, C.; Wang, Z. Machine Learning for Characterizing Risk of Type 2 Diabetes Mellitus in a Rural Chinese Population: The Henan Rural Cohort Study. Sci. Rep. 2020, 10, 4406. [Google Scholar] [CrossRef]
  55. Oikonomou, E.K.; Khera, R. Machine Learning in Precision Diabetes Care and Cardiovascular Risk Prediction. Cardiovasc. Diabetol. 2023, 22, 259. [Google Scholar] [CrossRef]
  56. Razavi-Termeh, S.V.; Sadeghi-Niaraki, A.; Farhangi, F.; Choi, S.-M. COVID-19 Risk Mapping with Considering Socio-Economic Criteria Using Machine Learning Algorithms. Int. J. Environ. Res. Public Health 2021, 18, 9657. [Google Scholar] [CrossRef]
  57. World Population Review. Sharjah Population 2024; World Population Review: Walnut, CA, USA, 2024. [Google Scholar]
  58. Maghrebi, M.; Danandeh Mehr, A.; Karrabi, S.M.; Sadegh, M.; Partani, S.; Ghiasi, B.; Nourani, V. Spatiotemporal Variations of Air Pollution during the COVID-19 Pandemic across Tehran, Iran: Commonalities with and Differences from Global Trends. Sustainability 2022, 14, 16313. [Google Scholar] [CrossRef]
  59. Dehghan, A.; Khanjani, N.; Bahrampour, A.; Goudarzi, G.; Yunesian, M. The Relation between Air Pollution and Respiratory Deaths in Tehran, Iran- Using Generalized Additive Models. BMC Pulm. Med. 2018, 18, 49. [Google Scholar] [CrossRef] [PubMed]
  60. Kiavarz, M.; Hosseinbeigi, S.B.; Mijani, N.; Shahsavary, M.S.; Firozjaei, M.K. Predicting Spatial and Temporal Changes in Surface Urban Heat Islands Using Multi-Temporal Satellite Imagery: A Case Study of Tehran Metropolis. Urban Clim. 2022, 45, 101258. [Google Scholar] [CrossRef]
  61. Management and Planning Organization of Tehran Province. Results of the 2015 Census of Tehran Province and City; MPO: Tehran, Iran, 2016. [Google Scholar]
  62. Mohammadi, A.; Pishgar, E.; Fatima, M.; Lotfata, A.; Fanni, Z.; Bergquist, R.; Kiani, B. The COVID-19 Mortality Rate Is Associated with Illiteracy, Age, and Air Pollution in Urban Neighborhoods: A Spatiotemporal Cross-Sectional Analysis. Trop. Med. Infect. Dis. 2023, 8, 85. [Google Scholar] [CrossRef]
  63. Khoshakhlagh, A.H.; Mohammadzadeh, M.; Morais, S. Air Quality in Tehran, Iran: Spatio-Temporal Characteristics, Human Health Effects, Economic Costs and Recommendations for Good Practice. Atmos. Environ. X 2023, 19, 100222. [Google Scholar] [CrossRef]
  64. Banirazi Motlagh, S.H.; Pons-Valladares, O.; Hosseini, S.M.A. City-Scale Model to Assess Rooftops Performance on Air Pollution Mitigation; Validation for Tehran. Build. Environ. 2023, 244, 110746. [Google Scholar] [CrossRef]
  65. Ramyar, R.; Saeedi, S.; Bryant, M.; Davatgar, A.; Mortaz Hedjri, G. Ecosystem Services Mapping for Green Infrastructure Planning–The Case of Tehran. Sci. Total Environ. 2020, 703, 135466. [Google Scholar] [CrossRef] [PubMed]
  66. Gheshlaghpoor, S.; Abedi, S.S.; Moghbel, M. The Relationship between Spatial Patterns of Urban Land Uses and Air Pollutants in the Tehran Metropolis, Iran. Landsc. Ecol. 2023, 38, 553–565. [Google Scholar] [CrossRef]
  67. Statistical Centre of Iran (SCI). Tehran City Housing and Income Census Data and Reports 2016; Statistical Centre of Iran: Tehran, Iran, 2022. [Google Scholar]
  68. Roy, S.; Bose, A.; Majumder, S.; Roy Chowdhury, I.; Abdo, H.G.; Almohamad, H.; Abdullah Al Dughairi, A. Evaluating Urban Environment Quality (UEQ) for Class-I Indian City: An Integrated RS-GIS Based Exploratory Spatial Analysis. Geocarto Int. 2022, 38, 2153932. [Google Scholar] [CrossRef]
  69. Huang, S.; Tang, L.; Hupy, J.P.; Wang, Y.; Shao, G. A Commentary Review on the Use of Normalized Difference Vegetation Index (NDVI) in the Era of Popular Remote Sensing. J. For. Res. 2021, 32, 1–6. [Google Scholar] [CrossRef]
  70. U.S. Geological Survey. USGS Landsat Normalized Difference Vegetation Index|U.S. Geological Survey. Available online: https://www.usgs.gov/landsat-missions/landsat-normalized-difference-vegetation-index (accessed on 20 June 2024).
  71. Santamouris, M.; Cartalis, C.; Synnefa, A.; Kolokotsa, D. On the Impact of Urban Heat Island and Global Warming on the Power Demand and Electricity Consumption of Buildings—A Review. Energy Build. 2015, 98, 119–124. [Google Scholar] [CrossRef]
  72. Kumari, B.; Tayyab, M.; Shahfahad; Salman; Mallick, J.; Khan, M.F.; Rahman, A. Satellite-Driven Land Surface Temperature (LST) Using Landsat 5, 7 (TM/ETM+ SLC) and Landsat 8 (OLI/TIRS) Data and Its Association with Built-Up and Green Cover Over Urban Delhi, India. Remote Sens. Earth Syst. Sci. 2018, 1, 63–78. [Google Scholar] [CrossRef]
  73. Ufondu, A.N.; Shukla, U.C.; Stambaugh, C.; Huber, K.E.; Stambaugh, N. Categorical Variable Analyses: Chi-Square, Fisher's Exact, Mantel–Haenszel. In Translational Radiation Oncology; Eltorai, A.E.M., Bakal, J.A., Kim, D.W., Wazer, D.E., Eds.; Academic Press: Cambridge, MA, USA, 2023; pp. 165–170. ISBN 9780323884235. [Google Scholar]
  74. Jones, G.P.; Stambaugh, C.; Stambaugh, N.; Huber, K.E. Analysis of Variance. In Translational Radiation Oncology; Eltorai, A.E.M., Bakal, J.A., Kim, D.W., Wazer, D.E., Eds.; Academic Press: Cambridge, MA, USA, 2023; pp. 171–177. ISBN 9780323884235. [Google Scholar]
  75. ESRI. How Exploratory Regression Works. Available online: https://pro.arcgis.com/en (accessed on 26 October 2022).
  76. Chan, J.Y.; Leow, S.M.; Bea, K.T.; Cheng, W.K.; Phoong, S.W.; Hong, Z.-W.; Lin, J.-M.; Chen, Y.-L. A Correlation-Embedded Attention Module to Mitigate Multicollinearity: An Algorithmic Trading Application. Mathematics 2022, 10, 1231. [Google Scholar] [CrossRef]
  77. Fox, J.; Weisberg, S. An R Companion to Applied Regression, 3rd ed.; Sage: Thousand Oaks, CA, USA, 2011. [Google Scholar]
  78. Venables, W.N.; Ripley, B.D. Modern Applied Statistics with S, 4th ed.; Springer: New York, NY, USA, 2002; ISBN 0-387-95457-0. [Google Scholar]
  79. Stevens, R.S.; Dean, M.D.; Miller, J.S.; Dougald, L.E. Monitoring Crash Impacts of Exceptions to Entrance Spacing Standards: Lessons Learned from Virginia. Case Stud. Transp. Policy 2020, 8, 648–657. [Google Scholar] [CrossRef]
  80. Hilbe, J.M. Negative Binomial Regression; Cambridge University Press: Cambridge, UK, 2007; ISBN 9780511811852. [Google Scholar]
  81. Cappai, S.; Rolesu, S.; Coccollone, A.; Laddomada, A.; Loi, F. Evaluation of Biological and Socio-Economic Factors Related to Persistence of African Swine Fever in Sardinia. Prev. Vet. Med. 2018, 152, 1–11. [Google Scholar] [CrossRef]
  82. Silverman, B.W. Density Estimation: For Statistics and Data Analysis; Routledge: London, UK, 2018; ISBN 9781351456173. [Google Scholar]
  83. Carlos, H.A.; Shi, X.; Sargent, J.; Tanski, S.; Berke, E.M. Density Estimation and Adaptive Bandwidths: A Primer for Public Health Practitioners. Int. J. Health Geogr. 2010, 9, 39. [Google Scholar] [CrossRef] [PubMed]
  84. Chun, Y.; Griffith, D.A. Spatial Statistics and Geostatistics: Theory and Applications for Geographic Information Science and Technology; Sage Publishing: Thousand Oaks, CA, USA, 2012. [Google Scholar]
  85. ESRI. ArcGIS Pro Help. Available online: https://pro.arcgis.com/en/pro-app/latest/help/main/welcome-to-the-arcgis-pro-app-help.htm (accessed on 10 June 2024).
  86. Mitchel, A. Volume 2: Spatial Measurements and Statistics. In The ESRI Guide to GIS Analysis; ESRI Press: Bucharest, Romania, 2005; Volume 2. [Google Scholar]
  87. Bornmann, L.; de Moya Angeon, F. Hot and Cold Spots in the US Research: A Spatial Analysis of Bibliometric Data on the Institutional Level. J. Inf. Sci. 2019, 45, 84–91. [Google Scholar] [CrossRef]
  88. Ord, J.K. Art Getis and local spatial statistics. J. Geogr. Syst. 2024, 26, 191–200. [Google Scholar] [CrossRef]
  89. Farahani, M.; Razavi-Termeh, S.V.; Sadeghi-Niaraki, A. A Spatially Based Machine Learning Algorithm for Potential Mapping of the Hearing Senses in an Urban Environment. Sustain. Cities Soc. 2022, 80, 103675. [Google Scholar] [CrossRef]
  90. Environmental Systems Research Institute (ESRI). ArcGIS Professional GIS Help. Available online: https://pro.arcgis.com/en/pro-app/latest/help (accessed on 4 August 2024).
  91. Mienye, I.D.; Sun, Y. A Survey of Ensemble Learning: Concepts, Algorithms, Applications, and Prospects. IEEE Access 2022, 10, 99129–99149. [Google Scholar] [CrossRef]
  92. Ho, T.K. Random Decision Forests. In Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, Montreal, QC, Canada, 14–16 August 1995; IEEE: New York, NY, USA, 1995; Volume 1, pp. 278–282. [Google Scholar]
  93. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  94. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
  95. Miller, A.; Panneerselvam, J.; Liu, L. A Review of Regression and Classification Techniques for Analysis of Common and Rare Variants and Gene-Environmental Factors. Neurocomputing 2022, 489, 466–485. [Google Scholar] [CrossRef]
  96. Natekin, A.; Knoll, A. Gradient Boosting Machines, a Tutorial. Front. Neurorobot. 2013, 7, 21. [Google Scholar] [CrossRef] [PubMed]
  97. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
  98. Raschka, S. Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning. arXiv 2018, arXiv:1811.12808. [Google Scholar]
  99. Dagli, B.Y. Application of a Statistical Regression Technique for Dynamic Analysis of Submarine Pipelines. J. Mar. Sci. Eng. 2024, 12, 955. [Google Scholar] [CrossRef]
  100. Eastman, J.R. TerrSet Geospatial Monitoring and Modeling System; Clark University: Worcester, MA, USA, 2016; pp. 345–389. Available online: https://www.clarku.edu/centers/geospatial-analytics/terrset/ (accessed on 12 June 2024).
  101. Barbur, V.A.; Montgomery, D.C.; Peck, E.A. Introduction to Linear Regression Analysis; John Wiley & Sons: Hoboken, NJ, USA, 1994; Volume 43, ISBN 1119578752. [Google Scholar]
  102. Kuhn, M. Building Predictive Models in R Using the Caret Package. J. Stat. Softw. 2008, 28, 1–26. [Google Scholar] [CrossRef]
  103. Vilinová, K. Spatial Autocorrelation of Breast and Prostate Cancer in Slovakia. Int. J. Environ. Res. Public Health 2020, 17, 4440. [Google Scholar] [CrossRef] [PubMed]
  104. Anselin, L. GeoDa [software]. Version 1.22.0.4. 2023. Available online: https://geodacenter.github.io/ (accessed on 10 June 2024).
  105. ESRI ArcGIS Pro. ArcGIS PRO Modul. 4—Data Anal; University of Toronto: Toronto, ON, Canada, 2022; p. 3. [Google Scholar]
  106. Pishgar, E.; Fanni, Z.; Tavakkolinia, J.; Mohammadi, A.; Kiani, B.; Bergquist, R. Mortality Rates Due to Respiratory Tract Diseases in Tehran, Iran during 2008-2018: A Spatiotemporal, Cross-Sectional Study. BMC Public Health 2020, 20, 1414. [Google Scholar] [CrossRef] [PubMed]
  107. D’Amato, G.; Vitale, C.; Molino, A.; Stanziola, A.; Sanduzzi, A.; Vatrella, A.; Mormile, M.; Lanza, M.; Calabrese, G.; Antonicelli, L. Asthma-Related Deaths. Multidiscip. Respir. Med. 2016, 11, 37. [Google Scholar] [CrossRef] [PubMed]
  108. Dunn, R.M.; Busse, P.J.; Wechsler, M.E. Asthma in the Elderly and Late-Onset Adult Asthma. Allergy Eur. J. Allergy Clin. Immunol. 2018, 73, 284–294. [Google Scholar] [CrossRef] [PubMed]
  109. Fuhlbrigge, A.L.; Jackson, B.; Wright, R.J. Gender and Asthma. Immunol. Allergy Clin. N. Am. 2002, 22, 753–789. [Google Scholar] [CrossRef]
  110. Oraka, E.; Kim, H.J.E.; King, M.E.; Callahan, D.B. Asthma Prevalence among US Elderly by Age Groups: Age Still Matters. J. Asthma 2012, 49, 593–599. [Google Scholar] [CrossRef] [PubMed]
  111. Chan, K.-P.F.; Kwok, W.-C.; Ma, T.-F.; Hui, C.-H.; Tam, T.C.-C.; Wang, J.K.-L.; Ho, J.C.-M.; Lam, D.C.-L.; Sau-Man Ip, M.; Ho, P.-L. Territory-Wide Study on Hospital Admissions for Asthma Exacerbations in the COVID-19 Pandemic. Ann. Am. Thorac. Soc. 2021, 18, 1624–1633. [Google Scholar] [CrossRef] [PubMed]
  112. Bagheri, O.; Moeltner, K.; Yang, W. Respiratory Illness, Hospital Visits, and Health Costs: Is It Air Pollution or Pollen? Environ. Res. 2020, 187, 109572. [Google Scholar] [CrossRef]
  113. Chen, Y.; Kong, D.; Fu, J.; Zhang, Y.; Zhao, Y.; Liu, Y.; Chang, Z.; Liu, Y.; Liu, X.; Xu, K.; et al. Associations between Ambient Temperature and Adult Asthma Hospitalizations in Beijing, China: A Time-Stratified Case-Crossover Study. Respir. Res. 2022, 23, 38. [Google Scholar] [CrossRef] [PubMed]
  114. Ko, F.W.S.; Lau, L.H.S.; Ng, S.S.; Yip, T.C.F.; Wong, G.L.H.; Chan, K.P.; Chan, T.O.; Hui, D.S.C. Respiratory Admissions before and during the COVID-19 Pandemic with Mediation Analysis of Air Pollutants, Mask-Wearing and Influenza Rates. Respirology 2023, 28, 47–55. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions, and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions, or products referred to in the content.
Figure 1. Spatial distribution map of asthma cases and their density in the study area.
Figure 2. Research methodology flowchart.
Figure 3. Spatial distribution maps of initial predictors in the study area.
Figure 4. The spatial distribution of asthma case hot spots and cold spots within the study area.
Figure 5. Spatial distribution maps of the most significant predictors identified by the Gradient Boosting algorithm. Legend: V1: population density; V5: particulate matter, including PM2.5 and PM10; V9: neighborhood deprivation index (%), indicating the most deprived areas; V10: road intersection density per square kilometer.
Figure 6. Spatial distribution map of locations with varying asthma occurrence risk probability levels within the study area.
Table 1. Indicators identified from the literature review to examine the relationship between asthma prevalence and neighborhood characteristics in Tehran (2020–2024).

| Aspects | Indicator | Spatial Database and Data Type | Source |
|---|---|---|---|
| Demographic and Socioeconomic | V1: Population density (per sq.km) | Census, ESRI shapefile | [67] |
| | V2: Proportion of elderly (%) | Census, ESRI shapefile | [67] |
| | V3: Proportion of illiterate people (%) | Census, ESRI shapefile | [67] |
| | V4: Proportion of unemployed people (%) | Census, ESRI shapefile | [67] |
| Air Quality Index | V5: Particulate matter (AAI include PM2.5 and PM10) | Sentinel-5, Raster | Google Earth Engine |
| | V6: Nitrogen dioxide (NO2) | Sentinel-5, Raster | Google Earth Engine |
| | V7: Ozone (O3) | Sentinel-5, Raster | Google Earth Engine |
| | V8: Sulfur dioxide (SO2) | Sentinel-5, Raster | Google Earth Engine |
| Environmental | V9: Neighborhood deprivation index (%) | Land use map, ESRI shapefile | Tehran municipality, OpenStreetMap |
| | V10: Road intersection density (per square kilometers) | OSM, ESRI shapefile, and Raster | OpenStreetMap |
| | V11: Normalized Difference Vegetation Index (NDVI) | Landsat 8, Raster | Google Earth Engine |
| | V12: Exposure to industrial emissions | Land use map, OSM, ESRI shapefile | OpenStreetMap |
| | V13: Proximity to fuel stations | Land use map, OSM, ESRI shapefile | Tehran Municipality, OpenStreetMap |
| Weather and Climate | V14: Urban heat islands (UHIs) | Landsat 8, Raster | Google Earth Engine |
| Access and Utilization of Healthcare Services | V15: Access to healthcare facilities | Land use map, OSM, ESRI shapefile | Tehran municipality, OpenStreetMap |
Table 2. Initial summary of negative binomial regression model coefficients.

| Predictor | Estimate | Std. Error | z Value | Pr (>|z|) |
|---|---|---|---|---|
| V1 | 1.9 × 10^−5 | 4.9 × 10^−6 | 3.8 × 10^0 | 1.5 × 10^−4 *** |
| V2 | −3.2 × 10^−2 | 1.9 × 10^−2 | −1.6 × 10^0 | 1.0 × 10^−1 |
| V3 | 8.8 × 10^−3 | 1.1 × 10^−2 | 7.7 × 10^−1 | 4.4 × 10^−1 |
| V4 | −3.2 × 10^−2 | 4.1 × 10^−2 | −7.7 × 10^−1 | 4.4 × 10^−1 |
| V5 | 1.4 × 10^0 | 6.2 × 10^−1 | 2.2 × 10^0 | 2.6 × 10^−2 * |
| V6 | 2.3 × 10^3 | 5.9 × 10^2 | 3.9 × 10^0 | 1.1 × 10^−4 *** |
| V7 | −1.4 × 10^2 | 2.2 × 10^2 | −6.3 × 10^−1 | 5.3 × 10^−1 |
| V8 | 5.2 × 10^3 | 1.6 × 10^3 | 3.3 × 10^0 | 1.1 × 10^−3 ** |
| V9 | −5.6 × 10^−3 | 2.2 × 10^−3 | −2.6 × 10^0 | 1.0 × 10^−2 * |
| V10 | 5.9 × 10^−4 | 1.5 × 10^−4 | 4.0 × 10^0 | 7.6 × 10^−5 *** |
| V11 | 9.5 × 10^−1 | 1.1 × 10^0 | 8.7 × 10^−1 | 3.8 × 10^−1 |
| V12 | 3.4 × 10^−3 | 4.6 × 10^−3 | 7.4 × 10^−1 | 4.6 × 10^−1 |
| V13 | −2.1 × 10^−5 | 3.6 × 10^−5 | −5.7 × 10^−1 | 5.7 × 10^−1 |
| V14 | −2.7 × 10^−2 | 4.1 × 10^−2 | −6.6 × 10^−1 | 5.1 × 10^−1 |
| V15 | 6.7 × 10^−3 | 2.3 × 10^−2 | 2.9 × 10^−1 | 7.7 × 10^−1 |

Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1. AIC = 1219.
Table 3. Stepwise selected model summary.

| Predictor | Estimate | Std. Error | z Value | Pr (>\|z\|) |
|---|---|---|---|---|
| V1 | 1.97 × 10⁻⁵ | 3.07 × 10⁻⁶ | 6.433758 | 1.24 × 10⁻¹⁰ *** |
| V4 | −0.07422 | 0.034637 | −2.14278 | 0.032131 * |
| V5 | 1.404977 | 0.470555 | 2.985788 | 0.002828 ** |
| V6 | 1865.001 | 413.0592 | 4.515094 | 6.33 × 10⁻⁶ *** |
| V8 | 4250.563 | 1240.331 | 3.426957 | 0.00061 *** |
| V9 | −0.00491 | 0.002075 | −2.36894 | 0.017839 * |
| V10 | 0.000556 | 0.000138 | 4.041288 | 5.32 × 10⁻⁵ *** |

Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1. AIC = 1210.
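The z values and p-values reported in Table 3 are consistent with the usual Wald construction, in which z is the estimate divided by its standard error and the p-value is the two-sided tail of a standard normal distribution. The following is a minimal sketch that rechecks the table under that assumption; the source does not state how the columns were computed, and the names `TABLE3`, `wald_z`, `two_sided_p`, and `recheck` are illustrative only.

```python
import math

# Rows copied from Table 3:
# (predictor, estimate, std. error, reported z value, reported p-value)
TABLE3 = [
    ("V1", 1.97e-5, 3.07e-6, 6.433758, 1.24e-10),
    ("V4", -0.07422, 0.034637, -2.14278, 0.032131),
    ("V5", 1.404977, 0.470555, 2.985788, 0.002828),
    ("V6", 1865.001, 413.0592, 4.515094, 6.33e-6),
    ("V8", 4250.563, 1240.331, 3.426957, 0.00061),
    ("V9", -0.00491, 0.002075, -2.36894, 0.017839),
    ("V10", 0.000556, 0.000138, 4.041288, 5.32e-5),
]


def wald_z(estimate, std_error):
    """Wald statistic: coefficient divided by its standard error."""
    return estimate / std_error


def two_sided_p(z):
    """Two-sided standard normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def recheck(rows):
    """Return {predictor: (z from rounded estimate and SE, p from reported z)}."""
    result = {}
    for name, estimate, std_error, z_reported, _p_reported in rows:
        result[name] = (wald_z(estimate, std_error), two_sided_p(z_reported))
    return result


if __name__ == "__main__":
    for name, (z, p) in recheck(TABLE3).items():
        print(f"{name}: z = {z:.3f}, p = {p:.3g}")
```

Because the table prints rounded estimates and standard errors, the recomputed z values can differ from the reported ones in the second decimal place (for V1, for example); the p-values are computed from the reported z values to avoid compounding that rounding.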
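Table 4. Summary of MLA diagnostics.

| MLAs | RMSE (Train) | RMSE (Test) | R-Squared (Train) | R-Squared (Test) | MAE (Train) | MAE (Test) | EV (Train) | EV (Test) | Moran’s I (Train) |
|---|---|---|---|---|---|---|---|---|---|
| RF | 0.56 | 1.08 | 0.96 | 0.75 | 0.40 | 0.84 | 1 | 0.74 | 0.29 |
| GBM | 0.56 | 1.07 | 0.95 | 0.76 | 0.43 | 0.88 | 0.95 | 0.75 | 0.17 |
| XGBoost | 0.22 | 1.21 | 0.99 | 0.69 | 0.16 | 0.91 | 0.99 | 0.68 | 0.12 |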
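One way to read Table 4 is to compare each model's training and test scores, since a large gap suggests overfitting. The sketch below does that arithmetic on the values copied from the table; it is an illustrative reading aid rather than the authors' procedure, and the names `TABLE4`, `train_test_gap`, and `best_on_test` are hypothetical.

```python
# Values copied from Table 4: model -> metric -> (train, test)
TABLE4 = {
    "RF": {"RMSE": (0.56, 1.08), "R2": (0.96, 0.75),
           "MAE": (0.40, 0.84), "EV": (1.00, 0.74)},
    "GBM": {"RMSE": (0.56, 1.07), "R2": (0.95, 0.76),
            "MAE": (0.43, 0.88), "EV": (0.95, 0.75)},
    "XGBoost": {"RMSE": (0.22, 1.21), "R2": (0.99, 0.69),
                "MAE": (0.16, 0.91), "EV": (0.99, 0.68)},
}


def train_test_gap(table, metric):
    """Absolute difference between test and train score for each model."""
    return {
        model: round(abs(scores[metric][1] - scores[metric][0]), 2)
        for model, scores in table.items()
    }


def best_on_test(table, metric, higher_is_better):
    """Name of the model with the best test score for the given metric."""
    pick = max if higher_is_better else min
    return pick(table, key=lambda model: table[model][metric][1])


if __name__ == "__main__":
    print("RMSE gap:", train_test_gap(TABLE4, "RMSE"))
    print("Best test RMSE:", best_on_test(TABLE4, "RMSE", higher_is_better=False))
    print("Best test R2:", best_on_test(TABLE4, "R2", higher_is_better=True))
```

On these numbers, XGBoost has the lowest training RMSE but the largest train-to-test gap, while GBM has the lowest test RMSE and the highest test R-squared.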
