Article

Modeling Health-Supportive Urban Environments: The Role of Mixed Land Use, Socioeconomic Factors, and Walkability in U.S. ZIP Codes

1 Department of Architecture Engineering, Faculty of Engineering, Tanta University, Tanta 3111, Egypt
2 Department of Transportation Engineering, Faculty of Engineering, Alexandria University, Alexandria 21544, Egypt
3 The Center of Road Traffic Safety, Naif Arab University for Security Sciences, Riyadh 11452, Saudi Arabia
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(23), 10873; https://doi.org/10.3390/su172310873
Submission received: 30 October 2025 / Revised: 24 November 2025 / Accepted: 30 November 2025 / Published: 4 December 2025
(This article belongs to the Section Sustainable Urban and Rural Development)

Abstract

Over recent decades, planners in the U.S. have increasingly adopted mixed-use projects to reduce automobile dependency and strengthen local community identity, although results remain inconsistent across cities. Urban health and fitness outcomes are shaped by complex interactions among the built environment, socioeconomic factors, and demographic characteristics. This study introduces a Health and Fitness Index (HFI) for 28,758 U.S. ZIP codes, derived from normalized measures of walkability, healthcare facility density, and carbon emissions, to assess spatial disparities in health-supportive environments. Using four modeling approaches (lasso regression, multiple linear regression, decision trees, and k-nearest neighbor classifiers), we evaluated the predictive importance of 15 urban and socioeconomic variables. Multiple linear regression produced the strongest generalization performance (R² = 0.60, RMSE = 0.04). Key positive predictors included occupied housing units, business density, land-use mix, household income, and racial diversity, while income inequality and population density were negatively associated with the HFI. This study also evaluates five statistical formulations (Metropolis Hybrid Models) that incorporate different combinations of walkability, land-use mix, environmental variables, and socioeconomic indicators to test whether relationships between urban form and socioeconomic conditions remain consistent across variable combinations. In cross-sectional multivariate regression, mixed-use development in high-density areas is strongly associated with healthcare facilities, and these areas tend to serve younger and more racially diverse populations. Decision tree feature importance rankings and clustering profiles highlight structural inequalities across regions, suggesting that enhancing business diversity, land-use integration, and income equity could substantially improve health-supportive urban design.
This research provides a data-driven framework for urban planners to identify underserved neighborhoods and develop targeted interventions that promote walkability, access to health infrastructure, and sustainability. It contributes to the growing literature on urban health analytics by integrating machine learning, spatial clustering, and multidimensional urban indicators to advance equitable and resilient city planning.

1. Introduction

Mixed-use development integrates multiple land uses into a single project instead of separating them into single-purpose zones. It can be implemented at the site, neighborhood, or regional scale and may be incorporated into new construction, renovations, brownfield projects, and Smart Growth initiatives in both urban and rural settings. Such districts typically feature higher densities and combine employment, retail, healthcare, and recreational services within residential environments, often enabled by zoning reform or Smart Growth policies.
Mixed-use development has historically been advocated as an essential element of effective urban design and has garnered support from critics of contemporary urban development and planning [1,2,3,4,5]. The approach is widely credited with making cities more livable, bringing people together, and promoting physical activity, all of which contribute to healthier, more dynamic urban communities [6,7,8,9]. Mixed-use development is considered advantageous because of its walkability, transit accessibility, contribution to suburban and urban redevelopment, increased urban vitality, broader range of housing options, and more efficient land use that can facilitate land preservation [10,11,12].
Facilities are fundamental elements of an urban environment [13,14,15] and significantly enhance the quality of urban life. Some services, such as the supply of potable water, are essential; others, such as healthcare, cultural, and educational services, are in high demand and are frequent targets for improvement. Public facilities are owned by the municipality and operated for the benefit of the community. The term “community facilities” encompasses a wider range of establishments [16], including those owned by the public as well as those owned and/or operated by the private sector for the community’s benefit.
The demand for additional and more varied community facilities and services rises with urban expansion and population growth [17,18]. At the same time, older facilities become obsolete as living standards and public expectations rise. While the need for essential municipal infrastructure (electricity, water, and sewer lines) persists, demand for additional services (health clinics, junior colleges) is growing, driven by a more discerning public with higher expectations. Planners and urban planning agencies are expected to assess needs, establish priorities, and set standards for a wide array of public services and facilities.
The use and development of real estate by health systems and hospitals is not new. For years, healthcare providers have repositioned their real estate portfolios to reduce costs and risk, notably in the wake of the Affordable Care Act. What distinguishes current trends is not the scale or use of real estate but where facilities are located and the rationale behind their placement.
To avoid ambiguity, we define here the role of the Metropolis Hybrid Models used in this study. Rather than representing different mixed-use land-use typologies, the Hybrid Models are alternative regression formulations that use different combinations of predictors and ZIP-code subsets depending on data availability.
Hybrid 1 includes walkability variables.
Hybrid 2 adds water-area measures.
Hybrid 3 incorporates carbon-emission variables and excludes ZIP codes with water area.
Hybrid 4 uses only socioeconomic and land-use indicators.
The All-ZIP Codes model applies a unified structure to all 28,758 ZIP codes.
This structure enables testing the stability and consistency of urban-health relationships under different data constraints and urban forms.

2. Literature Review

While literature on various urban development patterns is common, empirical studies that address mixed-use development specifically are relatively scarce. Few published studies address the specific design adjustments or planning mechanisms required to improve mixed-use outcomes. By classifying developments by functions (land uses), degree of mixing, urban fabric, and socioeconomic characteristics of the location, this study focuses on the healthcare-related dimensions that may affect mixed-use development.
Empirical evidence strongly supports the link between built-environment walkability and improved health outcomes. A recent systematic review across multiple cities found that higher walkability—measured through density, connectivity, and proximity to amenities—is associated with lower rates of cardiovascular disease, obesity, and all-cause mortality [19,20]. A landmark longitudinal “natural experiment” using smartphone-based physical activity data for over 2 million U.S. residents demonstrated that relocations to more walkable neighborhoods significantly increased moderate-to-vigorous physical activity, with lasting effects beyond three months and across diverse demographic groups [21].
Another study in New York City found that perceptions of walkability—particularly barriers such as poor lighting or traffic—were negatively associated with both mental and physical quality of life among Black and Latino populations, even after adjusting for covariates [22]. Research shows that more heterogeneous land use (i.e., higher entropy) leads to reduced vehicle miles traveled (VMT) and consequent reductions in greenhouse gas emissions. A comprehensive meta-analysis reported that each 1% increase in land-use mix yields an average 0.09% reduction in household VMT [23,24].
In a cross-sectional study of 268 Chinese cities using the information entropy measure of land-use structure (IELUS), researchers found a nonlinear (U-shaped) relationship with CO2 emissions: diversity initially reduces emissions up to an entropy value of about 0.35, after which the relationship reverses [25]. Evidence from Shanghai showed that development zones with a higher land-use mix had better carbon emission coordination and overall environmental performance than heavily industrial or single-use areas [23].
Socioeconomic disparities shape access to health-supportive environments. Several studies report that higher household income, lower inequality, and greater racial diversity are associated with better walkability, greater availability of facilities, and a lower collective carbon burden. For example, neighborhood built-environment features derived from Google Street View were linked to fewer cardiovascular risk factors and higher physical activity levels, especially in low-income groups [19]. Furthermore, built-form attributes such as density and land-use mix showed weaker positive effects in disadvantaged communities, indicating that structural inequities may moderate the effect of the environment on health outcomes [26].
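To make the entropy-based mix measure concrete, the sketch below computes the Shannon entropy of land-use shares, $H = -\sum_i p_i \ln p_i$, optionally normalized by $\ln K$ so that it ranges from 0 (a single use) to 1 (equal shares across $K$ categories). The normalization, the category list, and the areas are assumptions for illustration; the IELUS in the cited study may be defined differently, so its ~0.35 threshold should not be read directly off this scale.

```python
import math


def land_use_entropy(areas, normalize=True):
    """Shannon entropy of land-use shares for one spatial unit.

    areas: non-negative area (or floor space) for each land-use category.
    normalize: if True, divide by ln(K), K = number of categories, so the
    result lies in [0, 1] (an assumed convention; studies differ).
    """
    total = sum(areas)
    if total <= 0:
        raise ValueError("total area must be positive")
    shares = [a / total for a in areas if a > 0]  # 0 * ln(0) is treated as 0
    h = sum(-p * math.log(p) for p in shares)
    k = len(areas)
    if not normalize:
        return h
    return h / math.log(k) if k > 1 else 0.0


# Hypothetical ZIP codes: residential, commercial, office, institutional areas
print(land_use_entropy([100, 0, 0, 0]))               # single use: no mix
print(round(land_use_entropy([50, 20, 20, 10]), 3))   # uneven mix
print(round(land_use_entropy([25, 25, 25, 25]), 3))   # equal shares: maximum mix
```

Normalizing by $\ln K$ makes units with different numbers of land-use categories comparable, which is one reason it is a common (but not universal) choice.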
Composite indices integrating built environment, service access, and environmental quality are increasingly utilized for urban health assessment. For instance, the ACSM American Fitness Index aggregates community-level walkability, facility access, and physical activity indicators to benchmark U.S. cities’ health-supportive conditions. These frameworks underscore the value of multidimensional, scalable indices in guiding policy and planning.
Unsupervised learning techniques like hierarchical clustering and principal component analysis (PCA) have been effectively applied to reveal urban typologies based on mixed-use, density, and socioeconomic profiles. Studies frequently find that clusters with low walkability, high inequality, and limited services exhibit poorer environmental and health outcomes, supporting targeted intervention strategies in identified vulnerable areas [27].
The Haussmann plan for Paris encouraged residents to spend more time being physically active in their neighborhoods, through more walking and cycling trips and fewer transit trips than residents of suburban developments, and it had a major effect on public health in Paris at that time. Physical inactivity is a leading risk factor for mortality, causing about 3 million deaths annually [28]. Even moderate physical activity yields considerable health benefits. Saelens et al. (2003) found that physical activity was 50 percent higher among residents of highly walkable communities than among those in less walkable communities [29]. Ewing et al. related multiple health outcomes to county-level compactness and sprawl measures and found that compact urban and suburban areas may have beneficial effects on obesity and chronic disease [30].
Mixed-use development supports public transportation, recreation areas, higher population density, and a limited urban footprint [10,31,32]. By contrast, single-use development can critically affect ecosystem processes and services [33], with wide-ranging and long-term consequences. Researchers have linked urban sprawl to many environmental problems, including loss of wildlife habitat, water pollution, and air pollution, which contribute to biodiversity decline and species extinction. Urban life also affects body weight and health, and it is related to socioeconomic characteristics and racial distribution [34,35,36]. Transportation research has examined the relationship between built-environment variables and walking and cycling [37]. Declining activity levels have led to sedentary lifestyles, which are among the fundamental causes of obesity and overweight [38]. There is compelling evidence that design and land-use policies, together with mixed-use development, enhance physical activity, especially when integrated with transit infrastructure and pedestrian pathways or trails [39,40,41,42]. Mixed-use development plans that incorporate improvements to bicycle or pedestrian networks also expand opportunities for active transportation.
A primary benefit of mixed-use development is the promotion of walking and cycling as modes of transportation. The influence of mixed-use development on pedestrian trips exceeds its effect on reducing vehicle kilometers traveled [43,44]. Ewing and Cervero (2010) estimate that, on average, walking trips increase by 0.15% for each 1% rise in land-use entropy and by 0.25% for each 1% reduction in walking distance to a store [45].
Sung and Lee examined the relationship between walking activity and the residential built environment. Their findings supported Jacobs’ theory, showing that pedestrian activity correlates with mixed land use, density, block size, and accessibility [46].
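The elasticities quoted above translate into percentage changes as shown in this small sketch. It assumes the usual reading of an elasticity as the percentage change in the outcome per 1% change in the driver, and it compares the linear approximation with a constant-elasticity model; the functional form is an assumption, since meta-analytic estimates are averages not tied to a specific model.

```python
def pct_change_linear(elasticity, pct_change_x):
    """First-order approximation: %change(y) ~= elasticity * %change(x)."""
    return elasticity * pct_change_x


def pct_change_constant_elasticity(elasticity, pct_change_x):
    """Change implied by a constant-elasticity model y = A * x**elasticity."""
    return ((1 + pct_change_x / 100) ** elasticity - 1) * 100


# Elasticities quoted in the text above
VMT_VS_LAND_USE_MIX = -0.09      # household VMT with respect to land-use mix
WALK_VS_STORE_DISTANCE = -0.25   # walking trips with respect to distance to a store

scenarios = [
    ("household VMT, land-use mix +10%", VMT_VS_LAND_USE_MIX, 10),
    ("walking trips, store distance -10%", WALK_VS_STORE_DISTANCE, -10),
]
for label, e, dx in scenarios:
    lin = pct_change_linear(e, dx)
    exact = pct_change_constant_elasticity(e, dx)
    print(f"{label}: {lin:+.2f}% (linear), {exact:+.2f}% (constant elasticity)")
```

For modest changes the two readings differ little: a 10% increase in land-use mix corresponds to a reduction in household VMT of just under 1% either way.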
In mixed-use development zones, individuals engage in walking and cycling more frequently than in single-use development zones [42,47]. Evidence indicates that children residing in Smart Growth areas with green spaces engage in greater levels of moderate-to-vigorous physical activity (MVPA) compared to their peers in conventional neighborhoods [48,49].
Substituting cycling and walking for automobile trips in mixed-use development zones can reduce vehicle miles traveled (VMT) and the greenhouse gas emissions that exacerbate climate change, thereby affecting health [47,50]. Researchers studying mixed land use have concentrated on automobile travel, and initiatives that decrease VMT concurrently reduce greenhouse gas emissions, so the emission decline tracks the underlying change in travel patterns. Additional research [51,52] has shown that vehicle choice is also influenced by land use. Reductions in automobile miles resulting from more mixed land use should therefore lead to lower greenhouse gas emissions. Frank et al. showed that land-use mix had nearly the same influence on greenhouse gas emissions as on vehicle miles traveled [53].
The frameworks, methodologies, and results of these studies can be applied to research on physical exercise, helping researchers understand the factors that affect physical activity in mixed-use developments. Sports facilities can function as the entertainment cornerstone of mixed-use developments. Nonetheless, building a new sports facility does not guarantee that the community will have a complementary mix of land uses. In Pittsburgh, PNC Park and Heinz Field anchor a mixed-use development project that extends the downtown area, and Columbus’s Nationwide Arena is an integral component of an arena district [54].
A growing body of scientific evidence demonstrates that environments can induce and/or exacerbate health problems. Research indicates that depression is associated with limited access to green spaces. Depression is also re-emerging as a challenge for many inner-city residents, as are the physical isolation of suburban residents and the immobility imposed on people who cannot drive and lack transit alternatives. Cars not only emit pollutants [55,56] but also keep people seated for extended periods [57,58], and they cause crashes and fatalities [59,60]. Furthermore, non-walkable distances and a culture of automobile dependence promote sedentary lifestyles, leading to obesity, diabetes, and various other health conditions [61,62].

Research Gaps Identified in the Literature

Although extensive studies have examined walkability, mixed-use development, socioeconomic disparities, and environmental burdens, the existing literature reveals that several important research gaps remain:
1. Lack of ZIP-code–level, national models integrating walkability, health services, and environmental measures.
  • Most studies focus on cities, counties, or neighborhoods; large-scale, data-driven analyses at fine geographic resolution remain limited.
2. Limited empirical research linking mixed-use development to health outcomes via composite indices.
  • Existing studies often analyze walkability, land-use mix, or emissions independently rather than through integrated metrics.
3. Insufficient differentiation between metropolitan, micropolitan, and rural areas.
  • Research tends to focus on large cities, overlooking geographic heterogeneity and non-metropolitan dynamics.
4. Absence of model-based typologies linking urban form, socioeconomic structure, and environmental sustainability.
  • Clustering methods are underused in studies of health-supportive environments.
5. To our knowledge, no study has used multiple statistical formulations (Hybrid Models 1–4) to test variable stability under different modeling constraints.
  • Most studies rely on a single regression structure, which limits robustness.
6. Limited use of machine learning methods for structured urban datasets.
  • Machine learning is commonly applied to image-based data but rarely to large-scale ZIP-code demographic datasets.
This study addresses these gaps by developing a large-scale, cross-sectional analysis across 28,758 ZIP codes, constructing a multidimensional Health and Fitness Index, and evaluating its determinants using four modeling approaches that combine regression and machine learning techniques.

3. Data and Methods

3.1. Primary Research Hypothesis (H1)

ZIP codes with urban environments that support walkability and provide access to health-related facilities are expected to support more health-promoting behaviors and outcomes. Therefore, environments characterized by higher walkability, greater availability of health services, and lower environmental burdens (e.g., carbon emissions) are hypothesized to achieve higher scores on the Health and Fitness Index (HFI).

3.2. Supporting Hypotheses

Hypothesis 2 (H2):
Socioeconomic characteristics, such as household income, racial diversity, and housing stability, are expected to influence health-supportive behaviors. Based on existing literature, greater socioeconomic advantage tends to correlate with healthier living environments, whereas socioeconomic stressors (e.g., overcrowding, high population density) are generally associated with poorer health outcomes.
Hypothesis 3 (H3):
Urban planning scholarship suggests that diverse land-use patterns (e.g., mixed-use development) reduce reliance on automobile travel and support sustainability. Therefore, ZIP codes with greater land-use diversity are expected to demonstrate more favorable environmental performance, including reduced per capita carbon emissions.
Hypothesis 4 (H4):
Urban environments can be meaningfully categorized into typologies based on their built form, socioeconomic composition, and environmental conditions. It is hypothesized that hierarchical clustering will reveal distinct ZIP-code groups reflecting different levels of health-supportive infrastructure and socioeconomic complexity.
Hypothesis 5 (H5):
Machine learning literature suggests that indicators of neighborhood stability and economic capacity, such as housing occupancy, business density, and household income, tend to emerge as strong predictors of community-level health outcomes. Therefore, these variables are hypothesized to have high predictive importance in modeling the Health and Fitness Index.
Hypothesis 6 (H6):
Urbanization theory posits that metropolitan areas contain richer infrastructure, more services, and better access to health-supportive amenities than micropolitan or rural regions. Consequently, metropolitan ZIP codes are expected to exhibit higher Health and Fitness Index values.

3.3. Data Sources

Data were compiled from publicly available U.S. Census Bureau sources (1990–2020), ZIP code demographic profiles, and block-level census datasets. Demographic data are only as useful as their geographic boundaries allow when comparing the spatial distribution of zone diversity across residential and nonresidential zones. The technique distinguishes a variety of land-use categories, but only residential land-use categories are used to measure diversity [63].
ZIP code demographics are useful because they show how a city is subdivided and make it easy to compare trends in the city with those in the suburbs. ZIP code data are designed to connect past, present, and future conditions as simply as possible, and they also provide social characteristics such as income and education. The business data cover for-profit businesses ranging from supermarkets to graphic design firms, as well as nonprofit and government organizations such as museums and religious organizations. Table 1 summarizes the number of cases for each urban model. In this research, the focus is on metropolitan ZIP codes. According to A Modern Dictionary of Geography (2001), a metropolis is a dominant city serving as a center for government, commerce, culture, or ecclesiastical leadership. The term has also been used loosely for any large city, as in “metropolitan county” and “metropolitan region.” As used here, the term covers a range of places between major cities and small towns. Table 2 shows the data sources and measurement methods for the mixed-use and health variables. Figure 1 illustrates the distribution of urban infrastructure, socioeconomic, and demographic variables across ZIP codes.
Two carbon-emission variables are used in this study for different purposes. First, for the construction of the Health and Fitness Index (HFI), total carbon emissions were normalized and inverted so that lower emissions correspond to higher environmental sustainability. Second, in Regression Model 3 (Metropolis Hybrid 3), raw total carbon emissions (not inverted) were included as an independent predictor to examine the direct statistical relationship between emissions and the composite HFI. These two variables are therefore not identical. The positive coefficient of carbon emissions in Hybrid Model 3 reflects urban concentration effects common in metropolitan ZIP codes—high walkability and high service density co-occurring with higher total emissions—and does not contradict the inverted use of emissions inside the HFI itself.

3.4. Methodology

This study employs a cross-sectional, data-driven approach to evaluate the determinants of the Health and Fitness Index (HFI) across 28,758 ZIP codes in the United States. All variables used in the analysis correspond directly to those listed in Table 2. Data were compiled from publicly available datasets covering the following dimensions:
  • Urban form indicators:
    Walk Score, representing the walkability index based on distance to nearby amenities
    Mix Factor, capturing the balance of population, jobs, and employment diversity
    Population density, representing residents per square mile
    Number of businesses, measuring overall economic activity
  • Facility availability variables:
    Education facilities
    Health facilities
    Accommodation and food services
    Arts, entertainment, and recreation facilities
  • Socioeconomic and demographic variables:
    Income per household
    % occupied housing units
    Median age
    Diversity of race, measuring the probability that two randomly selected residents have different races
    Population aged 75 years and over
  • Environmental variables: Total carbon emissions (tCO2e/yr) for each ZIP code
All continuous variables were normalized prior to analysis to ensure comparability across ZIP codes.
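The racial diversity variable is defined above as a probability. One common way to compute such an index, assumed here because the text does not give the formula, is the Gini–Simpson form $1 - \sum_i p_i^2$, where $p_i$ is the share of residents in race category $i$; a finite-population variant draws two distinct residents instead. The category counts below are hypothetical.

```python
def race_diversity(counts, with_replacement=True):
    """Probability that two randomly selected residents belong to different races.

    counts: number of residents in each race category for one ZIP code.
    with_replacement=True gives 1 - sum(p_i**2); False uses the exact
    finite-population form for drawing two distinct residents.
    """
    n = sum(counts)
    if with_replacement:
        if n == 0:
            raise ValueError("need at least one resident")
        return 1 - sum((c / n) ** 2 for c in counts)
    if n < 2:
        raise ValueError("need at least two residents")
    same = sum(c * (c - 1) for c in counts)  # ordered pairs of same-race residents
    return 1 - same / (n * (n - 1))


# Hypothetical ZIP code with three race categories
counts = [600, 300, 100]
print(round(race_diversity(counts), 4))
print(round(race_diversity(counts, with_replacement=False), 4))
```

For ZIP codes with thousands of residents the two variants are nearly identical; the difference matters only for very small populations.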

3.4.1. Construction of the Health and Fitness Index (HFI)

The dependent variable, HFI, was constructed as a composite measure capturing walkability, healthcare accessibility, and environmental sustainability. Using Min-Max normalization:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (1)$$
where
  • $x'$ = normalized value
  • $x$ = raw variable value
  • $x_{\min}$, $x_{\max}$ = minimum and maximum of the variable
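Min-max normalization is simple to sketch. The handling of a constant column (every value mapped to 0) is an assumption, since the formula is undefined when the maximum equals the minimum; the Walk Score values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] with x' = (x - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


walk_scores = [20, 55, 90]  # hypothetical raw Walk Score values
print(min_max_normalize(walk_scores))  # [0.0, 0.5, 1.0]
```

Because the minimum and maximum are taken over the full sample, normalized values depend on which ZIP codes are included; in a train/test workflow the bounds would normally be computed on the training data only.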
Walkability (Walkscore_Norm)
Walkability is quantified using the Walk Score® index, which assigns a numeric value from 0 to 100 for each ZIP code. The score is computed using a weighted distance-decay algorithm:
$$\text{WalkScore} = \sum_{k=1}^{n} w_k \cdot f(d_k) \qquad (2)$$
where
  • $d_k$ = network distance to the nearest amenity in category $k$ (e.g., grocery, school, park, restaurant)
  • $w_k$ = category weight
  • $f(d_k)$ = distance-decay function awarding maximum points for distances ≤ 0.4 km and zero points for distances ≥ 1.6 km
The resulting raw Walk Score values were normalized to:
$$\text{Walkscore\_Norm} \in [0, 1]$$
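The weighted distance-decay sum above can be sketched as follows. The actual Walk Score® weights and decay curve are proprietary and not given in the text, so the linear decay between 0.4 km and 1.6 km, the category weights, and the 0 to 100 scaling below are all hypothetical.

```python
def decay(d_km, full_km=0.4, zero_km=1.6):
    """Hypothetical linear decay: 1 up to full_km, 0 from zero_km onward."""
    if d_km <= full_km:
        return 1.0
    if d_km >= zero_km:
        return 0.0
    return (zero_km - d_km) / (zero_km - full_km)


def walk_score_like(distances_km, weights):
    """Weighted distance-decay score scaled to 0-100 (not the proprietary Walk Score)."""
    total_weight = sum(weights.values())
    raw = sum(w * decay(distances_km[cat]) for cat, w in weights.items())
    return 100 * raw / total_weight


# Hypothetical ZIP code: nearest-amenity network distances (km) and category weights
distances = {"grocery": 0.3, "school": 1.0, "park": 2.0, "restaurant": 0.4}
weights = {"grocery": 3, "school": 1, "park": 1, "restaurant": 2}
print(round(walk_score_like(distances, weights), 1))
```

Dividing by the total weight keeps the score on a fixed 0 to 100 scale regardless of how many categories are included.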
Healthcare Accessibility (Health_Norm)
Healthcare accessibility is quantified as the density of health facilities per ZIP code, defined as:
$$\text{Health\_Density}_i = \frac{\text{Health\_Facilities}_i}{\text{Area}_i \ (\text{sq mi})} \qquad (3)$$
where
  • $\text{Health\_Facilities}_i$ = number of licensed healthcare facilities in ZIP code $i$
  • $\text{Area}_i$ = land area of ZIP code $i$
This measure captures the spatial availability of healthcare services per unit area. The resulting density values were normalized:
$$\text{Health\_Norm} \in [0, 1]$$
Environmental Sustainability (Carbon_Norm)
Environmental sustainability is measured using total annual carbon emissions (tCO2e/year) for each ZIP code.
Because lower emissions indicate a healthier environment, the normalized carbon values were inverted:
$$\text{Carbon\_Norm}_i = 1 - \frac{\text{Carbon}_i - \text{Carbon}_{\min}}{\text{Carbon}_{\max} - \text{Carbon}_{\min}}$$
Thus, ZIP codes with cleaner environmental conditions receive higher scores.
The HFI was calculated as:
$$\text{HFI}_i = \frac{\text{Walkscore\_Norm}_i + \text{Health\_Norm}_i + \text{Carbon\_Norm}_i}{3}$$
where
  • $\text{Walkscore\_Norm}_i$: normalized walkability score
  • $\text{Health\_Norm}_i$: normalized density of health facilities
  • $\text{Carbon\_Norm}_i$: normalized total carbon emissions (inverted to reflect a positive contribution to fitness)
This composite index enables a holistic evaluation of health and fitness infrastructure by combining physical accessibility, healthcare resources, and environmental sustainability. Higher values of the index indicate ZIP codes that are more walkable, better served by health facilities, and environmentally cleaner, thereby representing areas with greater potential to support healthier lifestyles.
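The three steps above (health facility density, min-max normalization with carbon inversion, and equal-weight averaging) can be combined in a short sketch. The input values are hypothetical, and computing the min-max bounds over the sample passed in is an assumption about how the normalization was applied.

```python
def min_max(values):
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def health_fitness_index(walk_scores, health_facilities, land_area_sq_mi, carbon_tco2e):
    """Equal-weight mean of normalized walkability, health density, and inverted carbon."""
    if any(a <= 0 for a in land_area_sq_mi):
        raise ValueError("land area must be positive")
    density = [f / a for f, a in zip(health_facilities, land_area_sq_mi)]
    walk_n = min_max(walk_scores)
    health_n = min_max(density)
    carbon_n = [1 - c for c in min_max(carbon_tco2e)]  # lower emissions -> higher score
    return [(w + h + c) / 3 for w, h, c in zip(walk_n, health_n, carbon_n)]


# Three hypothetical ZIP codes: dense urban, suburban, rural
hfi = health_fitness_index(
    walk_scores=[90, 50, 10],
    health_facilities=[40, 6, 1],
    land_area_sq_mi=[2.0, 3.0, 50.0],
    carbon_tco2e=[300_000, 120_000, 20_000],
)
print([round(v, 3) for v in hfi])
```

In this toy example the most walkable, best-served ZIP code also has the highest emissions, so its carbon component is 0 and its score comes entirely from the other two components, the same co-occurrence of walkability, service density, and total emissions discussed in Section 3.3.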

3.4.2. Predictive Modeling

We modeled HFI using multiple approaches:
Multiple Linear Regression: A baseline model was estimated as:
$$\text{HFI}_i = \beta_0 + \sum_{j=1}^{p} \beta_j X_{ij} + \varepsilon_i$$
where
  • $\text{HFI}_i$: Health and Fitness Index for ZIP code $i$
  • $X_{ij}$: predictor $j$ (urban form, socioeconomic, environmental variables)
  • $\beta_j$: regression coefficients
  • $\varepsilon_i$: error term
Model fit was assessed via RMSE and MAE:
$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2}$$
$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left|\hat{y}_i - y_i\right|$$
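A minimal sketch of the baseline regression and the two error metrics, using NumPy's least-squares solver on synthetic data. The data, coefficients, and noise level are invented for illustration and are not the study's; in practice the metrics would be computed on the held-out test set rather than on the fitting data as in this toy example.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares estimates [beta_0, beta_1, ..., beta_p] with an intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])  # column of ones for beta_0
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def predict(beta, X):
    return beta[0] + X @ beta[1:]


def rmse(y_hat, y):
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))


def mae(y_hat, y):
    return float(np.mean(np.abs(y_hat - y)))


# Synthetic data standing in for normalized ZIP-code predictors (hypothetical)
rng = np.random.default_rng(0)
n, p = 500, 3
X = rng.normal(size=(n, p))
true_beta = np.array([0.40, 0.05, -0.03, 0.02])  # intercept first
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.04, size=n)

beta = fit_ols(X, y)
y_hat = predict(beta, X)
print("coefficients:", np.round(beta, 3))
print("RMSE:", round(rmse(y_hat, y), 4), "MAE:", round(mae(y_hat, y), 4))
```

RMSE weights large errors more heavily than MAE, so MAE is never larger than RMSE for the same predictions.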
Lasso Regression: To handle multicollinearity and perform feature selection, Lasso regression was employed:
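$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \left\{ \frac{1}{2n} \sum_{i=1}^{n} \left(y_i - X_i \beta\right)^2 + \mu \sum_{j=1}^{p} \left|\beta_j\right| \right\}$$
where
  • $\mu$: regularization parameter controlling the sparsity of the coefficients.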
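The lasso objective above can be minimized by cyclic coordinate descent with soft-thresholding. This sketch assumes comparably scaled predictors (the study normalizes all continuous variables) and an unpenalized intercept; it is an illustrative solver on synthetic data, not necessarily the implementation used in the study, and $\mu$ would normally be chosen by cross-validation.

```python
import numpy as np


def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, mu, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - b0 - X b||^2 + mu * ||b||_1.

    The intercept b0 is not penalized; it is handled by centering X and y.
    """
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = yc.copy()  # residual yc - Xc @ beta for the starting beta (all zeros)
    for _ in range(n_iter):
        for j in range(p):
            resid += Xc[:, j] * beta[j]      # add back feature j's contribution
            rho = Xc[:, j] @ resid / n       # covariance with the partial residual
            beta[j] = soft_threshold(rho, mu) / col_sq[j]
            resid -= Xc[:, j] * beta[j]      # remove the updated contribution
    intercept = y_mean - x_mean @ beta
    return intercept, beta


# Synthetic example: 5 predictors, only the first two are informative
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = 0.5 + 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.05, size=300)
b0, b = lasso_cd(X, y, mu=0.05)
print(round(float(b0), 3), np.round(b, 3))
```

The uninformative coefficients are set exactly to zero, while the informative ones are shrunk toward zero by roughly $\mu$; this is the feature-selection behavior the text relies on.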
Decision Tree Model: A Decision Tree Regressor was implemented to capture nonlinear interactions.
Splits were chosen to minimize the Mean Squared Error (MSE) at each node:
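$$\text{MSE}_{\text{split}} = \frac{N_{\text{left}}}{N} \cdot \frac{1}{N_{\text{left}}} \sum_{i \in \text{left}} \left(y_i - \bar{y}_{\text{left}}\right)^2 + \frac{N_{\text{right}}}{N} \cdot \frac{1}{N_{\text{right}}} \sum_{i \in \text{right}} \left(y_i - \bar{y}_{\text{right}}\right)^2$$
where $N = N_{\text{left}} + N_{\text{right}}$ is the number of observations at the node, so each child's MSE is weighted by its share of the node's observations.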
Feature importance was computed as:
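$$\text{Importance}_j = \frac{\sum_{t \in T_j} \Delta\text{MSE}_t}{\sum_{t \in T} \Delta\text{MSE}_t}$$
where
  • $T_j$: set of nodes where feature $j$ was used
  • $T$: set of all split nodes in the tree
  • $\Delta\text{MSE}_t$: reduction in MSE due to the split at node $t$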
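A sketch of the split search for one feature at one node, plus the importance normalization. It uses the sample-weighted child MSE, the criterion standard CART regressors minimize; the feature values, HFI values, and per-node MSE reductions are hypothetical, and a full tree would repeat this search recursively over all features.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Best threshold on one feature by weighted child MSE.

    Returns (threshold, split_mse, mse_reduction), where
    split_mse = (n_left/n) * MSE_left + (n_right/n) * MSE_right
              = (SSE_left + SSE_right) / n.
    """
    n = len(y)
    parent_mse = sse(y) / n
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        split_mse = (sse(left) + sse(right)) / n
        if best is None or split_mse < best[1]:
            best = (threshold, split_mse, parent_mse - split_mse)
    return best


def feature_importances(reductions):
    """Normalize summed MSE reductions per feature so they add up to 1."""
    total = sum(sum(r) for r in reductions.values())
    return {feat: sum(r) / total for feat, r in reductions.items()}


# Hypothetical business counts and HFI values for six ZIP codes
x_businesses = [5, 12, 18, 40, 55, 61]
y_hfi = [0.20, 0.21, 0.19, 0.60, 0.62, 0.58]
print(best_split(x_businesses, y_hfi))

# Hypothetical MSE reductions at the split nodes where each feature was used
print(feature_importances({"occupied_housing": [0.03, 0.01], "mix_factor": [0.01]}))
```

Weighting each child by its share of observations prevents the search from favoring splits that isolate a single point into a zero-variance child.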
K-Nearest Neighbor (KNN) Classifier: The K-Nearest Neighbor classifier was employed to categorize ZIP codes into high-, medium-, and low-HFI groups and to identify the variables that most strongly differentiate these categories.
  • KNN Algorithm: KNN classifies an observation by examining the k closest ZIP codes in feature space and assigning the majority class:
$$\hat{y}_i = \operatorname{mode}\{y_{i,1}, y_{i,2}, \ldots, y_{i,k}\}$$
where $y_{i,r}$ is the class of the $r$-th nearest neighbor of ZIP code $i$.
Distance Metric: Euclidean distance:
$$d(x_i, x_j) = \sqrt{\sum_{m=1}^{p} \left(x_{im} - x_{jm}\right)^2}$$
Hyperparameter tuning: The optimal $k$ was determined using grid search with stratified 10-fold cross-validation.
Purpose in this study: KNN is used to
  • validate whether ZIP codes naturally cluster into distinct health/fitness profiles, and
  • compare classification performance with regression-based predictor rankings.
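The classification rule and distance metric above can be written in a few lines. The two features (normalized walkability and normalized household income) and the HFI class labels are hypothetical, and $k$ would be tuned by cross-validation as described rather than fixed at 3.

```python
import math
from collections import Counter


def euclidean(a, b):
    """d(x_i, x_j) = sqrt(sum over m of (x_im - x_jm)^2)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_X, train_y, x, k=3):
    """Majority class among the k training points closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    # With tied counts, most_common keeps first-seen order, i.e., the nearer neighbor's class
    return votes.most_common(1)[0][0]


# Hypothetical ZIP codes: (normalized walkability, normalized income) and HFI class
train_X = [(0.10, 0.20), (0.15, 0.10), (0.50, 0.50), (0.55, 0.45), (0.90, 0.80), (0.85, 0.90)]
train_y = ["low", "low", "medium", "medium", "high", "high"]
print(knn_predict(train_X, train_y, (0.80, 0.85)))
print(knn_predict(train_X, train_y, (0.12, 0.15)))
```

Because KNN relies on raw distances, it is sensitive to feature scale; the normalization applied to all continuous variables keeps any single predictor from dominating the distance.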

3.4.3. Train–Validation–Test Split

To ensure robust and statistically valid model development, the dataset was partitioned into 70% training, 15% validation, and 15% testing subsets. This structure provides a sufficiently large training set for model learning while retaining adequate data for hyperparameter tuning (validation) and final out-of-sample evaluation (test). The split was performed using train_test_split with random_state = 42 for reproducibility. This revision replaces the earlier, unconventional 16/4/80 split to align with best practices in machine-learning-based modeling.
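A dependency-free sketch of a 70/15/15 partition with a fixed seed. The function name and the use of Python's `random` module are assumptions for illustration; the study itself reports using `train_test_split`.

```python
import random


def train_val_test_indices(n, train=0.70, val=0.15, seed=42):
    """Shuffle row indices once, then cut them into train/validation/test blocks."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(train * n)
    n_val = round(val * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = train_val_test_indices(28_758)
print(len(train_idx), len(val_idx), len(test_idx))
```

This reproduces the proportions, not the exact rows that scikit-learn's `train_test_split(..., random_state=42)` would select. With scikit-learn, a 70/15/15 partition is usually obtained with two calls: first hold out 30% (`test_size=0.30`), then split that held-out portion in half (`test_size=0.50`) into validation and test sets.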

4. Data Analysis and Correlations

Figure 2 presents the relationship between the Health and Fitness Index (HFI) and Land-Use Mix Factor in different contexts. In Figure 2A, each dot represents a ZIP code, color-coded by city type. Regression lines show the linear trend for each city type with 95% confidence intervals. Higher Health and Fitness Index values are generally associated with increased land-use diversity, particularly in metropolitan and urban ZIP codes, while rural and micro areas exhibit weaker associations. Most city types (e.g., P and Metro) show a positive relationship, meaning ZIP codes with a higher HFI tend to have more mixed land uses. By contrast, micropolitan and rural areas exhibit flatter slopes, indicating that mixed land use does not translate as strongly into health-supportive conditions in less urbanized regions. The scatter shows substantial variability, indicating other factors may influence this relationship.
In Figure 2B, each dot represents a ZIP code, color-coded by climate classification. Regression lines show the linear association between health-supportive infrastructure and land-use diversity within each climate zone. Cold and Marine zones exhibit the strongest positive relationships, whereas Hot-Dry and Mixed-Dry regions demonstrate weaker associations, suggesting climate-modulated effects on urban form and health infrastructure. Marine and Cold zones: Show strong positive slopes → Healthier, more walkable ZIPs in these zones are strongly associated with mixed land uses. Hot-Humid and Mixed-Humid zones: Positive but moderate relationships. Hot-Dry and Mixed-Dry zones: Flatter slopes → health and fitness infrastructure does not strongly translate to land-use diversity. Very Cold: Limited data (few points), making regression less stable.
In Figure 2C, each ZIP code is color-coded based on its moisture regime classification (A (Humid), B (Dry), C (Marine)). Linear regression lines show that Regime C areas exhibit the strongest positive relationship between health-supportive infrastructure and land-use diversity, while Regime B areas display a weaker correlation, suggesting environmental moderation of urban planning impacts. Regime C (Marine): Shows the steepest slope, meaning in this regime, areas with higher HFI strongly correlate with more mixed land uses. Regime A (Humid): Positive relationship but less steep than C, indicating a moderate correlation. Regime B (Dry): The flattest slope, suggesting minimal relationship between HFI and mix factor.
In Figure 2D, ZIP codes are color-coded by U.S. Census region. The Midwest (green) shows the steepest slope, indicating that HFI is most strongly associated with land-use mix in this region. The Northeast (blue) shows a positive but moderate association, and the South (orange) a positive but weaker one. The West (red) has a nearly flat regression line, indicating little to no correlation between HFI and the mix factor in western ZIP codes.
Figure 3 presents the correlation matrix of the considered variables. Mix Factor, Retail Trade, and Number of Businesses are positively correlated with one another, meaning that areas with diverse land uses also tend to have more retail and business activity. Income per Household is positively correlated with HFI, Number of Businesses, and Mix Factor. The Gini coefficient (income inequality) is negatively correlated with Income per Household and only weakly related to the other variables. Median Age shows weak negative correlations with HFI and the activity-based variables, suggesting that more active, mixed-use ZIP codes have younger populations. Occupied Housing Units correlate positively with businesses and income, consistent with denser, economically active neighborhoods. Several urban form variables (Mix Factor, Retail Trade, Number of Businesses, and Finance/Professional Services) are strongly interrelated, so they may carry overlapping information when used together in regression models.
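This kind of overlap can be quantified with variance inflation factors, VIF_j = 1/(1 - R2_j), where R2_j comes from regressing predictor j on all other predictors. The paper does not report VIFs; the sketch below illustrates the calculation on synthetic columns using NumPy least squares, and the names `mix`, `businesses`, and `age` are hypothetical stand-ins.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R2_j) for each column j of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # include an intercept
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


# Synthetic illustration: two nearly collinear columns plus one independent column.
rng = np.random.default_rng(0)
mix = rng.normal(size=1_000)
businesses = mix + 0.1 * rng.normal(size=1_000)  # hypothetical near-duplicate of mix
age = rng.normal(size=1_000)
print(np.round(variance_inflation_factors(np.column_stack([mix, businesses, age])), 1))
```

VIFs near 1 indicate little overlap; values above common rule-of-thumb thresholds such as 5 or 10 flag predictors that largely duplicate one another.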
Facility and service availability variables are heavily right-skewed, indicating that these services are concentrated in a small number of high-density areas. Median age, household income, and the HFI are approximately normally distributed, whereas population density is skewed toward low-density rural ZIP codes and occupied housing toward high occupancy.
Figure 4 shows the correlation heatmap between the Health and Fitness Index and ZIP-code-level socioeconomic indicators. The index is positively correlated with occupied housing units, business density, household income, and the land-use mix factor, suggesting that economically active, well-inhabited, and diverse urban areas are more health-supportive. Negative correlations with median age and the Gini coefficient indicate that ZIP codes with older populations and greater inequality tend to have lower HFI values.

5. Regression Modeling Results

Before model development, the dataset was partitioned into training, validation, and test subsets to enable out-of-sample evaluation and to detect overfitting. Because the dataset includes nearly 30,000 ZIP codes, the large test set provides a strong basis for assessing generalization across all Hybrid Models. The split used in this study was:
  • Training set: 16% of all observations
  • Validation set: 4% of all observations
  • Test set: 80% of all observations
This proportional split was applied consistently across all Hybrid Models to ensure comparability, and all partitions were generated using train_test_split in scikit-learn with a fixed random seed (random_state = 42) to ensure reproducibility.
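The text gives the split proportions and states that `train_test_split` was used with `random_state = 42`, but not the order in which the splits were made. One way to obtain 16%/4%/80% is a two-stage split, sketched below under that assumption on random stand-in data shaped like the study's 28,758 ZIP codes and the 14 predictors listed below; the helper name `three_way_split` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def three_way_split(X, y, seed=42):
    """Split into about 16% train, 4% validation, and 80% test."""
    # Stage 1: hold out 80% of all observations as the test set.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.80, random_state=seed
    )
    # Stage 2: split the remaining 20% into 80/20, i.e. 16% train and 4% validation overall.
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.20, random_state=seed
    )
    return X_train, X_val, X_test, y_train, y_val, y_test


# Random stand-in for the ZIP-code table (values carry no meaning).
rng = np.random.default_rng(0)
X = rng.normal(size=(28_758, 14))
y = rng.normal(size=28_758)
parts = three_way_split(X, y)
print([len(p) for p in parts[:3]])
```

Fixing `random_state` in both stages makes the partition reproducible, which is what allows every Hybrid Model to be compared on identical subsets.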
The Health and Fitness Index (HFI) served as the dependent variable, while all remaining variables were used as predictors. The predictor variables were grouped as follows:
  • Land use and services: Mix Factor, Retail Trade, Education Facilities, Accommodation and Food Services, Arts/Entertainment/Recreation, Finance/Professional Services
  • Housing and demographics: Occupied Housing Units, Number of Businesses, Diversity of Race, Population Aged 75+, Median Age, Population Density
  • Socioeconomic indicators: Income per Household, Gini Coefficient Estimate
After fitting the initial regression models, the results indicated that several predictors had negligible or statistically insignificant contributions. To identify the most influential variables, we applied the Least Absolute Shrinkage and Selection Operator (LASSO) regression, as shown in Figure 5. LASSO applies an L1 penalty on regression coefficients, shrinking weak predictors to zero and thus performing both regularization and variable selection. Only three predictors retained non-zero coefficients, indicating a measurable association with the Health and Fitness Index.
All other features had coefficients of exactly zero, indicating limited explanatory value for the Health and Fitness Index under the LASSO penalty. The three retained predictors were:
  • Number of Businesses: the strongest predictor, positively associated with the HFI. This suggests that ZIP codes with more business establishments tend to have better walkability, better health facility access, and lower carbon emissions.
  • Population Aged 75+: a weaker but positive contribution, implying that areas with a larger elderly population may be slightly more health-supportive.
  • Income per Household: a minor positive contribution, consistent with the idea that higher-income neighborhoods offer more resources for health and fitness.
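The selection step can be sketched with scikit-learn's `LassoCV`, which chooses the penalty strength (λ, called `alpha` in scikit-learn) by cross-validation. Predictors are standardized first so that the L1 penalty treats them on a common scale; whether the authors standardized is not stated, so treat that as an assumption. The data are synthetic, with three informative columns and three pure-noise columns, and all names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: only the first three of six predictors affect the target.
rng = np.random.default_rng(1)
names = ["signal_1", "signal_2", "signal_3", "noise_1", "noise_2", "noise_3"]
X = rng.normal(size=(2_000, len(names)))
y = 0.03 * X[:, 0] + 0.01 * X[:, 1] + 0.008 * X[:, 2] + 0.02 * rng.normal(size=2_000)

# Standardize, then let 5-fold cross-validation pick the L1 penalty strength.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(X, y)

coefs = model[-1].coef_
selected = [name for name, c in zip(names, coefs) if c != 0.0]
print("chosen alpha:", model[-1].alpha_)
print("non-zero coefficients:", selected)
```

Any predictor whose coefficient is driven exactly to zero drops out, which is the mechanism behind the short list of retained predictors above.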
Following the feature selection process, four predictive modeling approaches were evaluated: multiple linear regression, LASSO regression, decision tree regression, and the k-nearest neighbor (KNN) classifier. This comparison aims to assess predictive performance and to determine the robustness and consistency of variable importance across models.
(1)
Multiple Linear Regression
The multiple linear regression model produced the following metrics:
  • RMSE: 0.0405
  • MAE: 0.0317
  • R2: 0.60
Cross-validation (5-fold) yielded scores of [0.56, 0.66, 0.41, 0.51, 0.61], with an average of 0.55, indicating moderate stability across folds.
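The holdout metrics and the 5-fold cross-validated R2 above can be computed as in the sketch below. It uses synthetic data and a single holdout split with `test_size=0.8` and `random_state=42` to echo the proportions reported earlier; the coefficients and noise level are arbitrary assumptions, so the printed numbers will not match the study's.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic stand-in data.
rng = np.random.default_rng(2)
X = rng.normal(size=(5_000, 5))
y = X @ np.array([0.02, 0.015, 0.01, -0.01, 0.0]) + 0.04 * rng.normal(size=5_000)

# Single holdout split (80% test).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=42)
reg = LinearRegression().fit(X_train, y_train)
pred = reg.predict(X_test)

rmse = float(np.sqrt(mean_squared_error(y_test, pred)))  # same units as the target
mae = mean_absolute_error(y_test, pred)
r2 = r2_score(y_test, pred)

# 5-fold CV on shuffled folds; the reported figure is the mean of the five R2 scores.
folds = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(LinearRegression(), X, y, cv=folds, scoring="r2")
print(f"RMSE={rmse:.4f} MAE={mae:.4f} R2={r2:.2f}")
print("fold R2:", np.round(cv_scores, 2), "mean:", round(cv_scores.mean(), 2))
```

RMSE is never smaller than MAE, and a wide spread of fold scores (as in 0.41 to 0.66 above) signals that performance depends noticeably on which ZIP codes fall in each fold.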
(2)
LASSO Regression
LASSO regression was evaluated with the optimal regularization parameter (λ) determined via cross-validation:
  • RMSE: 0.0431
  • MAE: 0.0342
  • R2: 0.57
  • Mean cross-validated R2: 0.53
As expected, predictive accuracy was slightly lower due to the imposed sparsity constraint, but the model yielded clear feature-selection benefits.
(3)
Decision Tree Regression
A decision tree regressor was trained with maximum depth tuned via grid search:
  • RMSE: 0.0528
  • MAE: 0.0447
  • R2: 0.42
  • Cross-validated R2: 0.39
The decision tree exhibited lower predictive accuracy and stability, likely because a single tree is sensitive to the particular data split and to the high dimensionality of the ZIP-code features.
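A minimal sketch of tuning `max_depth` with scikit-learn's `GridSearchCV` and then reading impurity-based feature importances from the refitted tree. The depth grid, the step-shaped synthetic target, and all values are hypothetical; the study's search grid is not listed here.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(3_000, 6))
# Hypothetical target: a step in column 0, a linear trend in column 1, plus noise.
y = np.where(X[:, 0] > 0, 0.08, 0.0) + 0.02 * X[:, 1] + 0.02 * rng.normal(size=3_000)

depth_grid = [2, 3, 4, 6, 8, 12]  # hypothetical search grid
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": depth_grid},
    cv=5,
    scoring="r2",
)
search.fit(X, y)
best_tree = search.best_estimator_  # refitted on all data with the best depth

print("best max_depth:", search.best_params_["max_depth"])
print("mean CV R2:", round(search.best_score_, 3))
# Impurity-based importances are non-negative and sum to 1 across predictors.
print("importances:", np.round(best_tree.feature_importances_, 3))
```

Limiting depth trades flexibility for stability; deeper trees fit the training folds closely but tend to score worse on held-out folds.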
(4)
K-Nearest Neighbor (KNN) Classifier
The KNN classifier was used to categorize ZIP codes into low-, medium-, and high-HFI groups. With the optimal value of k = 7, the model achieved:
  • Overall classification accuracy: 74%
  • Macro-averaged precision: 0.71
  • Macro-averaged recall: 0.70
  • Macro-averaged F1-score: 0.69
A 5-fold cross-validation yielded classification accuracy scores of [0.73, 0.75, 0.70, 0.72, 0.75], averaging 0.73, indicating stable classification performance across folds.
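The text does not state how the low-, medium-, and high-HFI groups were defined. The sketch below assumes tertiles of a continuous HFI, standardizes the predictors (KNN relies on distances), and fits `KNeighborsClassifier` with `k = 7` before computing accuracy and macro-averaged precision, recall, and F1. All data are synthetic.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.normal(size=(4_000, 5))
hfi = X @ np.array([0.03, 0.02, 0.01, 0.0, 0.0]) + 0.02 * rng.normal(size=4_000)

# Assumption: the three groups are HFI tertiles.
cuts = np.quantile(hfi, [1 / 3, 2 / 3])
labels = np.digitize(hfi, cuts)  # 0 = low, 1 = medium, 2 = high

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42, stratify=labels
)
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7))
knn.fit(X_train, y_train)
pred = knn.predict(X_test)

acc = accuracy_score(y_test, pred)
# Macro averaging weights each HFI group equally, regardless of its size.
prec, rec, f1, _ = precision_recall_fscore_support(y_test, pred, average="macro")
print(f"accuracy={acc:.2f} macro P={prec:.2f} R={rec:.2f} F1={f1:.2f}")
```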
Cross-validation scores were computed using k-fold cross-validation, in which the dataset is randomly divided into k subsets (folds). The model is trained on k − 1 folds and validated on the remaining fold, and this is repeated k times so that each fold serves once as the validation set. For each fold, a performance metric (R2 for regression or accuracy for KNN) is calculated, and the mean of the k scores is reported as the overall cross-validation score.
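The procedure described above can be written out in a few lines of standard-library Python. The fold generator below is a generic illustration (its shuffling scheme and seed are assumptions), and the final lines simply average the per-fold scores reported in this section, reproducing the stated means of 0.55 and 0.73.

```python
import random
from statistics import mean


def k_fold_indices(n, k=5, seed=42):
    """Yield (train_idx, val_idx) pairs; each observation is validated exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)       # random division into k subsets
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


# The reported cross-validation score is the mean of the per-fold scores.
mlr_folds = [0.56, 0.66, 0.41, 0.51, 0.61]  # R2, multiple linear regression
knn_folds = [0.73, 0.75, 0.70, 0.72, 0.75]  # accuracy, KNN classifier
print(round(mean(mlr_folds), 2), round(mean(knn_folds), 2))  # 0.55 0.73
```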
All regression models substantially outperformed the baseline that predicts the mean HFI (RMSE = 0.064), indicating meaningful predictive skill across formulations. Feature importance was derived from standardized regression coefficients, shown in Figure 6, which indicate the contribution of each predictor to the Health and Fitness Index. The strongest positive predictors are Occupied Housing Units, Number of Businesses, Land-Use Mix Factor, Income per Household, and Racial Diversity, indicating that urban areas with dense occupancy, vibrant business activity, economic affluence, and diversity tend to have healthier and more fitness-supportive environments. Negative coefficients for Income Inequality (Gini), Population Density, and Accommodation/Food Services suggest that greater inequality and higher density are associated with slightly lower HFI values. The model's R2 of 0.60 means that approximately 60% of the variance in the HFI is explained by these predictors, leaving room for nonlinear effects or unobserved variables. The regression model results are presented in Table 3.
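Standardized coefficients are ordinary least-squares coefficients fitted after scaling every predictor to zero mean and unit variance, so each measures the change in the target per one-standard-deviation change in that predictor and can be compared across variables recorded in different units. The sketch below also fits the mean-prediction baseline with scikit-learn's `DummyRegressor`. Predictor names, scales, and effects are hypothetical.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
names = ["var_a", "var_b", "var_c"]  # hypothetical predictors on very different scales
X = np.column_stack([
    rng.normal(0, 1_000, 3_000),
    rng.normal(0, 1, 3_000),
    rng.normal(0, 50, 3_000),
])
y = 0.00002 * X[:, 0] - 0.01 * X[:, 1] + 0.0001 * X[:, 2] + 0.03 * rng.normal(size=3_000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
reg = LinearRegression().fit(scaler.transform(X_train), y_train)

# Rank predictors by the magnitude of their standardized coefficients.
for name, b in sorted(zip(names, reg.coef_), key=lambda t: -abs(t[1])):
    print(f"{name}: {b:+.4f}")

# Baseline that always predicts the training-set mean.
base = DummyRegressor(strategy="mean").fit(X_train, y_train)
rmse_base = float(np.sqrt(mean_squared_error(y_test, base.predict(X_test))))
rmse_model = float(np.sqrt(mean_squared_error(y_test, reg.predict(scaler.transform(X_test)))))
print(f"baseline RMSE={rmse_base:.4f}  model RMSE={rmse_model:.4f}")
```

On raw scales the coefficient of `var_a` would look tiny simply because that variable is measured in large units; standardizing removes that artifact before ranking.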

6. Discussion

The findings of this study are broadly consistent with previous research linking walkability, mixed-use development, and environmental quality to healthier communities. Similar to the natural experiment reported in [21], our results show that higher walkability scores are strongly associated with higher Health and Fitness Index values, supporting the role of pedestrian-friendly environments in promoting physical activity.
The positive association between land-use mix and environmental performance echoes previous findings from Shanghai and Beijing [23,24], which linked greater land-use diversity to lower CO2 emissions and travel energy consumption. Our results also align with the meta-analytic evidence that mixed-use development contributes to sustainability and greenhouse gas reduction [45,53].
Additionally, the strong role of socioeconomic variables—especially income, occupied housing units, and racial diversity—is consistent with studies showing that structural inequality shapes access to health-supportive environments [19,26]. Our machine learning models similarly identified household income and housing occupancy as major predictors, reinforcing findings reported in urban-health literature.
However, our results differ from some studies that reported nonlinear or U-shaped environmental relationships with land-use mix (e.g., IELUS values in Chinese cities). In contrast, our nationwide ZIP-code analysis found a predominantly positive association between mixed-use indicators and environmental sustainability, possibly due to differences in scale, context, or measurement.
Finally, unlike many previous studies that focused exclusively on urban centers, this study demonstrates substantial differences between metropolitan and non-metropolitan areas, with significantly higher Health and Fitness Index values in metropolitan ZIP codes, an aspect seldom evaluated in earlier research.
This study developed a comprehensive, data-driven framework for evaluating the health-supportive capacity of urban environments across 28,758 ZIP codes in the United States. By constructing a Health and Fitness Index (HFI)—integrating walkability, density of healthcare facilities, and per capita carbon emissions—we provided a multidimensional measure of neighborhood-level health-supportive infrastructure and sustainability. Leveraging machine learning, regression analysis, and spatial clustering, this research advances understanding of how urban form, socioeconomic structure, and environmental factors interact to shape public health outcomes.
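The HFI combines walkability, healthcare facility density, and per capita carbon emissions, but the normalization and weighting are not restated in this section. The sketch below is a hypothetical construction assuming min-max normalization of each component, inverted emissions (so lower emissions score higher), and equal weights; the function and parameter names are illustrative only.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def health_fitness_index(walk, facility_density, co2_per_capita, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical composite: normalize each input, invert emissions, take a weighted sum."""
    w_n = min_max(walk)
    f_n = min_max(facility_density)
    c_n = [1.0 - c for c in min_max(co2_per_capita)]  # lower emissions -> higher score
    ww, wf, wc = weights
    return [ww * a + wf * b + wc * c for a, b, c in zip(w_n, f_n, c_n)]


# Three hypothetical ZIP codes.
walk = [85, 40, 10]
facilities = [12.0, 3.0, 0.5]
co2 = [4.0, 7.0, 11.0]
print([round(h, 3) for h in health_fitness_index(walk, facilities, co2)])
```

Min-max scaling makes the index relative to the sample: the best ZIP code on every component scores 1 and the worst scores 0, so values are comparable within, but not across, datasets.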
Multiple linear regression identified occupied housing units, the number of businesses, land-use mix, household income, and racial diversity as the strongest positive predictors of the Health and Fitness Index, and LASSO likewise retained the number of businesses and household income among its three non-zero predictors. Negative associations emerged for income inequality, population density, and carbon-intensive activity, suggesting that structural inequities and unsustainable development patterns may undermine the potential for healthy, active communities. Linear regression achieved the highest generalization performance (R2 = 0.60, RMSE = 0.04), outperforming more complex models such as decision trees and k-nearest neighbors, which suggests a largely linear relationship between these predictors and health-supportive urban outcomes. Most ZIP codes clustered into a dominant group characterized by moderate walkability and health facility density, while smaller outlier clusters exhibited either high-income, mixed-use, low-emission profiles or underserved communities with poor accessibility and higher emissions.
Although LASSO and MLR appear to yield different sets of important predictors, the two methods address different analytical questions. LASSO imposes an L1 penalty, which forces correlated variables to compete with one another. As a result, only the strongest representative variables are retained, while other correlated predictors are shrunk to zero, even if they have meaningful effects. In contrast, MLR estimates coefficients for all variables simultaneously, revealing their relative effect sizes after controlling for all other predictors. Therefore, MLR highlights influential variables within the full multivariate structure, whereas LASSO identifies a minimal subset of essential predictors. These outputs are not contradictory; rather, they provide complementary perspectives on predictor importance in a multicollinear urban dataset.
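The contrast described above can be reproduced on synthetic data with two strongly correlated predictors (correlation about 0.95), both of which affect the target. Ordinary least squares assigns each a coefficient, whereas LASSO with a sufficiently large penalty retains only the stronger one. The penalty `alpha=2.0` is deliberately large for illustration and is not the value used in the study.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
n = 20_000
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + np.sqrt(1 - 0.95**2) * rng.normal(size=n)  # corr(x1, x2) about 0.95
y = 3.0 * x1 + 0.3 * x2 + rng.normal(size=n)  # both predictors truly matter

X = StandardScaler().fit_transform(np.column_stack([x1, x2]))
ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=2.0).fit(X, y)  # deliberately strong penalty

print("OLS:  ", np.round(ols.coef_, 2))    # both predictors receive a coefficient
print("LASSO:", np.round(lasso.coef_, 2))  # the weaker correlated predictor is zeroed
```

The zeroed coefficient does not mean `x2` has no effect; it means that, given `x1`, the small extra fit it offers is not worth its L1 cost, which is why the two methods answer different questions.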
The findings underscore critical spatial inequities in the distribution of health-supportive infrastructure and sustainable urban design. Metropolitan areas generally scored higher on the Health and Fitness Index than micropolitan and rural ZIP codes, reflecting greater business density, service availability, and walkability. However, this urban advantage is tempered by challenges such as high income inequality and elevated transport emissions in certain metropolitan regions.
From a methodological standpoint, the study demonstrates the feasibility of combining multivariate statistical models with machine learning to analyze complex urban systems at a granular scale. The use of entropy-based land-use diversity metrics and per capita carbon normalization provides nuanced insights into the interplay between urban form and environmental outcomes. Moreover, the integration of socioeconomic variables reveals that built-environment interventions alone may be insufficient without addressing underlying structural inequities.
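A common entropy-based land-use mix measure is the Shannon entropy of land-use shares divided by its maximum, ln k, for k categories, giving 0 for a single use and 1 for a perfectly even mix. The exact definition of the study's Mix Factor is not restated in this section, so the sketch below shows this widely used form as an assumption; which quantity is counted per category (floor area, parcels, or establishments) is also left open.

```python
import math


def entropy_mix_index(amounts):
    """Normalized Shannon entropy of land-use shares: 0 = single use, 1 = even mix.

    amounts: non-negative amount per land-use category (all k categories, including zeros).
    """
    k = len(amounts)
    total = sum(amounts)
    if k < 2 or total <= 0:
        return 0.0
    shares = [a / total for a in amounts if a > 0]  # 0 * ln(0) is treated as 0
    h = sum(-p * math.log(p) for p in shares)
    return h / math.log(k)  # divide by the maximum possible entropy


print(round(entropy_mix_index([25, 25, 25, 25]), 3))  # 1.0
print(round(entropy_mix_index([100, 0, 0, 0]), 3))    # 0.0
print(round(entropy_mix_index([70, 20, 10, 0]), 3))
```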
The outcomes of the multivariate regressions illustrate the correlations between urban form and commuting patterns in the United States. The five models use the socioeconomic variables population density, total housing units, percentage of occupied housing units, median age, racial diversity, and income per family. The urban form variables include the mix factor, street factor, mixed land use, and Walk Score, and each was assessed individually within the models. In summary, we observe a substantial correlation between the numbers of businesses and healthcare facilities and both walking and land-use diversity, even after controlling for other urban form and socioeconomic factors.
In metropolitan regions, walkability is strongly correlated with healthcare facilities and mixed land use. Educational institutions show a positive correlation with walking and public transit, which may indicate that healthcare workers tend to live within walking distance of their workplaces. Conversely, residents aged 75 and older appear to live in affordable homes located far from employment centers, where they can use local (less corporate) lifestyle amenities but depend on transportation for daily trips. We observe a higher concentration of residential and commercial facilities accompanied by better pedestrian and transit commuting options. Accommodation and dining facilities show a significant positive correlation with all health factors.
The metropolitan-area model indicates a greater number of health facilities, and metropolitan hybrid areas are expected to attract younger people seeking career opportunities. The findings show a negative association between median age and the number of facilities across all models. This is consistent with the employment variable in the descriptive statistics and with the health-promotion aims of high-density urban planning.
The findings also show a positive association between the proportion of occupied housing units and the overall number of facilities across all models. The diminishing number of amenities appears to corroborate the positive influence of green space on neighborhood health.
It is therefore essential to examine which services or facilities contribute to improved health outcomes. The arts, entertainment, and recreation sector and the accommodation and food services sector show a robust positive linear association with the population aged 75 and older across all models.
Some research indicates that low-density urban regions offer a healthier environment. A plausible explanation is that low population density and abundant recreational facilities draw residents to these regions. Nonetheless, densely populated urban regions offer more healthcare facilities. Individuals aged 75 and older may also prefer to avoid highly mixed land use, and families with children tend to relocate to low-density districts, which may contribute to a higher median age in densely populated neighborhoods.
The data indicate that more facilities in mixed-use developments are associated with higher carbon emissions, even though high-density urban regions have greater public transit availability than low-density areas. Several explanations are plausible. First, more facilities require more truck trips to deliver goods. Second, higher population density may increase energy consumption for heating and cooling. In addition, the proportion of occupied homes is greater in high-density urban regions, where most employment opportunities and residents are concentrated. The most common commute is between suburban regions and hybrid areas, and the least common is from a dense urban residence to a suburban workplace.
Policy implications are significant: enhancing business diversity, mixed land use, and access to healthcare facilities, alongside efforts to reduce carbon emissions, can substantially improve neighborhood health-supportive environments. Planners should prioritize equity-driven design to ensure that underserved ZIP codes receive targeted investments in infrastructure and services. The clustering analysis further provides a tool for identifying typologies of vulnerability and resilience, enabling more precise allocation of resources.

7. Conclusions

In conclusion, this research contributes to urban health scholarship by providing a robust, scalable methodology for quantifying health-supportive environments and uncovering spatial disparities. It bridges urban planning, public health, and sustainability domains, offering evidence-based insights to guide equitable and resilient city development. Future research can extend this framework by incorporating temporal dynamics, finer spatial resolution, and behavioral health data to further refine urban health metrics and intervention strategies.
The findings of this study hold significant implications for urban planning, public health policy, and sustainable development across the United States. By quantifying health-supportive environments through the Health and Fitness Index (HFI) and identifying key predictors, this research provides actionable insights for policymakers aiming to foster equitable, resilient, and sustainable cities.
First, enhancing land-use diversity and business density should be a central strategy. The positive relationship between mixed-use development and the HFI highlights the need for zoning policies that encourage the co-location of residential, commercial, educational, and healthcare facilities. Mixed-use neighborhoods reduce reliance on private vehicles, lower per capita carbon emissions, and improve access to essential health services, thereby promoting active, healthier lifestyles.
Second, addressing income inequality and structural socioeconomic barriers is essential to improving health outcomes. ZIP codes with higher income disparity and lower household income exhibited reduced HFI scores despite having access to some urban infrastructure. Urban policies should therefore integrate affordable housing initiatives, community-based healthcare services, and inclusive urban design that accommodates vulnerable populations, including racial minorities and the elderly. Investments in public transit and safe pedestrian infrastructure can further bridge accessibility gaps for disadvantaged communities.
Third, the clustering analysis provides a robust tool for targeted interventions. Distinct typologies identified in this research can guide policymakers in prioritizing resource allocation. For example, outlier clusters with high emissions and poor walkability may benefit from green retrofitting programs, incentives for low-emission transport, and the development of active mobility networks (bike lanes, pedestrian-friendly streets). Conversely, already high-performing clusters could be targeted for emission-reduction policies to sustain their health and environmental advantages.
Fourth, metropolitan areas generally outperform rural regions in terms of health-supportive environments, but they also exhibit elevated transport emissions and inequality hotspots. Policymakers should focus on decentralizing urban amenities to surrounding micropolitan and rural ZIP codes, mitigating the urban-rural divide in health infrastructure. Expanding telehealth services and regional public transit systems can enhance accessibility for remote communities.
The results of this study emphasize that mixed-use development should not only increase density but also ensure compatibility with surrounding urban clusters, particularly medical services, recreation, retail, and housing. Successful health-supportive urban planning requires more than renaming zoning codes; it necessitates true community planning that addresses density, massing, heights, parking, and setbacks, supported by innovative financing tools such as tax increment financing and revenue bonds for infrastructure improvements.
Healthcare organizations and planners can benefit from predictive analytics and portfolio management tools, commonly used in retail and banking industries, to identify optimal locations for healthcare services. This data-driven approach can enhance accessibility, rationalize service distribution, and improve the financial feasibility of healthcare investments. Mixed-use developments that integrate healthcare, wellness facilities, and retail opportunities create “live, work, play” environments, aligning healthcare delivery with consumer demand for wellness experiences.
Moreover, zoning diversity should be carefully managed to avoid unintended consequences of high-density development that may contribute to unhealthy environments. Strategies should include expanding green spaces, arts and recreation facilities, and public parks in areas with socioeconomic diversity but limited housing units. These amenities enhance physical activity opportunities and foster social cohesion, improving overall health outcomes.
Accessibility remains a critical consideration. Mixed-use developments with healthcare components should provide intuitive wayfinding, transit connectivity, ample parking, and drop-off areas, ensuring convenience for patients and staff. The rising demand for ambulatory care facilities and community-based healthcare services reflects a broader shift toward preventive care, wellness, and decentralized service delivery.
The evolving healthcare model, driven by technological innovation and operational restructuring, underscores the importance of adaptive mixed-use management. Future developments should anticipate changing tenant needs by offering flexible spaces, improved common areas, and complementary amenities, transforming healthcare facilities from illness-treatment hubs into wellness-centered community anchors. This paradigm shift supports the creation of healthier, more resilient, and happier urban communities.
Lastly, integrating data-driven planning tools—including machine learning models and spatial clustering techniques—into municipal and federal planning frameworks can improve monitoring and forecasting of health-supportive urban indicators. These tools allow decision-makers to dynamically assess policy impacts, optimize infrastructure investments, and track progress toward sustainable development goals (SDGs).

Author Contributions

Conceptualization, M.Z. and A.M.D.; methodology, M.Z. and A.M.D.; software, M.Z.; validation, M.Z., A.M.D., and S.S.; formal analysis, M.Z. and A.M.D.; investigation, M.Z. and A.M.D.; resources, M.Z., A.M.D., and S.S.; data curation, M.Z.; writing—original draft preparation, M.Z., A.M.D., and S.S.; writing—review and editing, A.M.D.; visualization, M.Z., A.M.D., and S.S.; supervision, M.Z.; project administration, M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
VMT: Vehicle Miles Traveled
HFI: Health and Fitness Index
LASSO: Least Absolute Shrinkage and Selection Operator

References

  1. Geyer, H.S. The theory and praxis of mixed-use development—An integrative literature review. Cities 2024, 147, 104774. [Google Scholar] [CrossRef]
  2. Ryckewaert, M.; Zaman, J.; De Boeck, S. Variable Arrangements Between Residential and Productive Activities: Conceiving Mixed-Use for Urban Development in Brussels. Urban. Plan. 2021, 6, 334–349. [Google Scholar] [CrossRef]
  3. Jinollo, G.T.; Habtemariam, L.W.; Belete, D.A. The impacts of mixed land use planning on spatial development. Discov. Cities 2025, 2, 8. [Google Scholar] [CrossRef]
  4. Kausar, A.; Zubair, S.; Sohail, H.; Anwar, M.M.; Aziz, A.; Vambol, S.; Vambol, V.; Khan, N.A.; Poteriaiko, S.; Tyshchenko, V.; et al. Evaluating the challenges and impacts of mixed-use neighborhoods on urban planning: An empirical study of a megacity, Karachi, Pakistan. Discov. Sustain. 2024, 5, 24. [Google Scholar] [CrossRef]
  5. Zagow, M.; Elbany, M.; Darwish, A.M. Identifying urban, transportation, and socioeconomic characteristics across US zip codes affecting CO2 emissions: A decision tree analysis. Energy Built Environ. 2024, 6, 484–494. [Google Scholar] [CrossRef]
  6. Martino, N.; Girling, C.; Lu, Y. Urban form and livability: Socioeconomic and built environment indicators. Build. Cities 2021, 2, 220–243. [Google Scholar] [CrossRef]
  7. Hamidi, N. How Walkable Mixed-Use Urbanism Affects Environmental, Social, and Economic Sustainability. Int. J. Sustain. Appl. Sci. Eng. 2025, 2, 25–38. [Google Scholar]
  8. Zahrah, W.; Ginting, N.; Aulia, D.N.; Marisa, A. Quality of life for livable mixed use living. IOP Conf. Ser. Earth Environ. Sci. 2021, 780, 012043. [Google Scholar] [CrossRef]
  9. Darwish, A.M.; Zagow, M.; Elkafoury, A. Impact of land use, travel behavior, and socio-economic characteristics on carbon emissions in cool-climate cities, USA. Environ. Sci. Pollut. Res. 2023, 30, 91108–91124. [Google Scholar] [CrossRef]
  10. Hendrigan, C.; Newman, P. Dense, mixed-use, walkable urban precinct to support sustainable transport or vice versa? A model for consideration from Perth, Western Australia. Int. J. Sustain. Transp. 2017, 11, 11–19. [Google Scholar] [CrossRef]
  11. Shulman, L. Walk and Thrive? The Importance of Low-Income Household Access to Mixed-Use Neighborhoods in the Metro Vancouver Region. Ph.D. Thesis, University of British Columbia, Vancouver, BC, Canada, 2021. [Google Scholar] [CrossRef]
  12. Elkafoury, A.; Zagow, M.; Saeed, K.; Darwish, A.M. Model Willingness to Use Public Transport in the USA Based on Socio-Economic and Demographic Characteristics. Civil. Eng. Archit. 2023, 11, 1487–1497. [Google Scholar] [CrossRef]
  13. Shokry, S.; Alrashidi, A.; El-Bany, M.E.-S.; Darwish, A.M. Impact of road geometry on school-area traffic congestion using regression and machine learning analysis: Lessons from six Saudi cities. Transp. Res. Interdiscip. Perspect. 2025, 34, 101686. [Google Scholar] [CrossRef]
  14. Temeljotov Salaj, A.; Lindkvist, C.M. Urban facility management. Facilities 2021, 39, 525–537. [Google Scholar] [CrossRef]
  15. Mbiankeu Nguea, S. Uncovering the linkage between sustainable development goals for access to electricity and access to safely managed drinking water and sanitation services. Soc. Sci. Med. 2024, 345, 116687. [Google Scholar] [CrossRef]
  16. McShane, I.; Coffey, B. Rethinking community hubs: Community facilities as critical infrastructure. Curr. Opin. Environ. Sustain. 2022, 54, 101149. [Google Scholar] [CrossRef]
  17. Mahtta, R.; Fragkias, M.; Güneralp, B.; Mahendra, A.; Reba, M.; Wentz, E.A.; Seto, K.C. Urban land expansion: The role of population and economic growth for 300+ cities. Npj Urban. Sustain. 2022, 2, 5. [Google Scholar] [CrossRef]
  18. Abdi, M.H.; Lamíquiz-Daudén, P.J. Transit-oriented development in developing countries: A qualitative meta-synthesis of its policy, planning and implementation challenges. Int. J. Sustain. Transp. 2022, 16, 195–221. [Google Scholar] [CrossRef]
  19. Sallis, J.F.; Saelens, B.E.; Frank, L.D.; Conway, T.L.; Slymen, D.J.; Cain, K.L.; Chapman, J.E.; Kerr, J. Neighborhood built environment and income: Examining multiple health outcomes. Soc. Sci. Med. 2009, 68, 1285–1293. [Google Scholar] [CrossRef]
  20. Ye, Y.; Richards, D.; Lu, Y.; Song, X.; Zhuang, Y.; Zeng, W.; Zhong, T. Measuring daily accessed street greenery: A human-scale approach for informing better urban planning practices. Landsc. Urban Plan. 2019, 191, 103434.
  21. Althoff, T.; Ivanovic, B.; Hicks, J.L.; Delp, S.L.; King, A.C.; Leskovec, J. Countrywide natural experiment reveals impact of built environment on physical activity. arXiv 2024, arXiv:2406.04557.
  22. Zhu, J.; Kodali, H.; Wyka, K.E.; Huang, T.T.-K. Perceived neighborhood environment walkability and health-related quality of life among predominantly Black and Latino adults in New York City. BMC Public Health 2023, 23, 127.
  23. Shi, Y.; Zheng, B.; Wang, Z.; Zheng, J. Mixed Land Use and Its Relationship with CO2 Emissions: A Comparative Analysis Based on Several Typical Development Zones in Shanghai. Land 2023, 12, 1675.
  24. Zhang, M.; Zhao, P. The impact of land-use mix on residents’ travel energy consumption: New evidence from Beijing. Transp. Res. Part D Transp. Environ. 2017, 57, 224–236.
  25. Li, Q.; Chen, X.; Jiao, S.; Song, W.; Zong, W.; Niu, Y. Can Mixed Land Use Reduce CO2 Emissions? A Case Study of 268 Chinese Cities. Sustainability 2022, 14, 15117.
  26. Adkins, A.; Makarewicz, C.; Scanze, M.; Ingram, M.; Luhr, G. Contextualizing Walkability: Do Relationships Between Built Environments and Walking Vary by Socioeconomic Context? J. Am. Plan. Assoc. 2017, 83, 296–314.
  27. Avila-Palencia, I.; Sánchez, B.N.; Rodríguez, D.A.; Perez-Ferrer, C.; Miranda, J.J.; Gouveia, N.; Bilal, U.; Useche, A.F.; Wilches-Mogollon, M.A.; Moore, K.; et al. Health and Environmental Co-Benefits of City Urban Form in Latin America: An Ecological Study. Sustainability 2022, 14, 14715.
  28. WHO European. European Ministerial Conference on Counteracting Obesity—Diet and Physical Activity for Health. In European Charter on Counteracting Obesity; World Health Organization: Geneva, Switzerland, 2006; Volume 2011, p. 57.
  29. Saelens, B.E.; Sallis, J.F.; Frank, L.D. Environmental correlates of walking and cycling: Findings from the transportation, urban design, and planning literatures. Ann. Behav. Med. 2003, 25, 80–91.
  30. Ewing, R.; Meakins, G.; Hamidi, S.; Nelson, A.C. Relationship between urban sprawl and physical activity, obesity, and morbidity—Update and refinement. Health Place 2014, 26, 118–126.
  31. Otsuka, N.; Welsch, J.; Lättman, K.; Prichard, E.; van der Vlugt, A.-L.; De Vos, J. Walking in urban neighbourhoods—Insights from a mixed methods approach and citizen science in walkability research. Transp. Res. Interdiscip. Perspect. 2025, 33, 101588.
  32. Bibri, S.E.; Krogstie, J.; Kärrholm, M. Compact city planning and development: Emerging practices and strategies for achieving the goals of sustainability. Dev. Built Environ. 2020, 4, 100021.
  33. Eckersten, S.; Balfors, B. Bridging gaps between land use and transport planning to facilitate the transition toward sustainable development. Int. J. Sustain. Transp. 2025, 19, 517–531.
  34. Cacciatore, S.; Mao, S.; Nuñez, M.V.; Massaro, C.; Spadafora, L.; Bernardi, M.; Perone, F.; Sabouret, P.; Biondi-Zoccai, G.; Banach, M.; et al. Urban health inequities and healthy longevity: Traditional and emerging risk factors across the cities and policy implications. Aging Clin. Exp. Res. 2025, 37, 143.
  35. Dau, L.; Barros, P.; Cilliers, E.J.; Hemsley, B.; Martin, M.; Lakhanpaul, M.; Smith, M. Urban density and child health and wellbeing: A scoping review of the literature. Health Place 2025, 91, 103393.
  36. Bin Tan, S.; Dickens, B.L.; Sevtsuk, A.; Zheng, S.; Zeng, K.; Lee, Y.S.; Yap, F.; Chan, S.-Y.; Chan, J.K.Y.; Tan, K.H.; et al. Exploring how socioeconomic status affects neighbourhood environments’ effects on obesity risks: A longitudinal study in Singapore. Landsc. Urban Plan. 2022, 226, 104450.
  37. Lee, J.; Choi, K.; Leem, Y. Bicycle-based transit-oriented development as an alternative to overcome the criticisms of the conventional transit-oriented development. Int. J. Sustain. Transp. 2016, 10, 975–984.
  38. Tcymbal, A.; Demetriou, Y.; Kelso, A.; Wolbring, L.; Wunsch, K.; Wäsche, H.; Woll, A.; Reimers, A.K. Effects of the built environment on physical activity: A systematic review of longitudinal studies taking sex/gender into account. Environ. Health Prev. Med. 2020, 25, 75.
  39. Liu, J.; Zhou, J.; Xiao, L. Built environment correlates of walking for transportation: Differences between commuting and non-commuting trips. J. Transp. Land Use 2021, 14, 1129–1148.
  40. Boakye-Dankwa, E.; Barnett, A.; Pachana, N.A.; Turrell, G.; Cerin, E. Associations between latent classes of perceived neighborhood destination accessibility and walking behaviors in older adults of a low-density and a high-density city. J. Aging Phys. Act. 2019, 27, 553–564.
  41. Yin, C.; Cao, J.; Sun, B.; Liu, J. Exploring built environment correlates of walking for different purposes: Evidence for substitution. J. Transp. Geogr. 2023, 106, 103505.
  42. Saelens, B.E.; Handy, S.L. Built Environment Correlates of Walking: A Review. Med. Sci. Sports Exerc. 2008, 40, S550.
  43. Kuzmyak, R.; Baber, C.; Savory, D. Use of a walk opportunities index to quantify local accessibility. Transp. Res. Rec. 2006, 1977, 145–153.
  44. Boarnet, M.G.; Greenwald, M.; McMillan, T. Walking, Urban Design, and Health: Toward a Cost-Benefit Analysis Framework. J. Plan. Educ. Res. 2008, 27, 341–358.
  45. Ewing, R.; Cervero, R. Travel and the Built Environment: A Meta-Analysis. J. Am. Plan. Assoc. 2010, 76, 265–294.
  46. Sung, H.; Lee, S. Residential built environment and walking activity: Empirical evidence of Jane Jacobs’ urban vitality. Transp. Res. Part D Transp. Environ. 2015, 41, 318–329.
  47. Kramer, M.G. Our Built and Natural Environments; EPA: Washington, DC, USA, 2013.
  48. Jerrett, M.; Almanza, E.; Davies, M.; Wolch, J.; Dunton, G.; Spruijt-Metz, D.; Pentz, M.A. Smart Growth Community Design and Physical Activity in Children. Am. J. Prev. Med. 2013, 45, 386–392.
  49. Dunton, G.F.; Intille, S.S.; Wolch, J.; Pentz, M.A. Investigating the impact of a smart growth community on the contexts of children’s physical activity using Ecological Momentary Assessment. Health Place 2012, 18, 76–84.
  50. Salon, D.; Boarnet, M.G.; Handy, S.; Spears, S.; Tal, G. How do local actions affect VMT? A critical review of the empirical evidence. Transp. Res. Part D Transp. Environ. 2012, 17, 495–508.
  51. Brownstone, D.; Golob, T. The Impact of Residential Density on Vehicle Usage and Energy Consumption. J. Urban Econ. 2009, 65, 91–98.
  52. Fang, H.A. A discrete–continuous model of households’ vehicle choice and usage, with an application to the effects of residential density. Transp. Res. Part B Methodol. 2008, 42, 736–758.
  53. Frank, L.D.; Greenwald, M.J.; Kavage, S.; Devlin, A. An Assessment of Urban Form and Pedestrian and Transit Improvements as an Integrated GHG Reduction Strategy; The State of Washington, Department of Transportation: Seattle, WA, USA, 2011.
  54. Stein, D.F. Entertaining Entertainment Districts. Urban Land 2007, 66, 106–109.
  55. Elkafoury, A.; Elboshy, B.; Darwish, A.M. Development of response surface method prediction model for traffic-related roadside noise levels based on traffic characteristics. Environ. Sci. Pollut. Res. 2023, 30, 94229–94241.
  56. Berg Mårtensson, H.; Höjer, M.; Åkerman, J. Low emission scenarios with shared and electric cars: Analyzing life cycle emissions, biofuel use, battery utilization, and fleet development. Int. J. Sustain. Transp. 2024, 18, 115–133.
  57. Darwish, A.M.; Almansour, M.; Salah, A.; Zagow, M.; Saeed, K.; Elkafoury, A. Sensitivity evaluation of machine learning-based calibrated transportation mode choice models: A case study of Alexandria City, Egypt. Transp. Res. Interdiscip. Perspect. 2024, 24, 101052.
  58. Alsobky, A.; Darwish, A.M. A realistic framework for volume-delay function determination. Ain Shams Eng. J. 2023, 14, 102279.
  59. Acosta-González, N.; Cahueñas, S.; Pérez, C. Risk factors for fatal road traffic accidents in Ecuador. Transp. Res. Interdiscip. Perspect. 2025, 32, 101515.
  60. Hermawan, I.; Mulya Firdausy, C.; Rizqy Rambe, K.; Zuhdi, F.; Erwidodo; Nugraheni, R.D.; Malisan, J.; Isnasari, Y.; Marpaung, E.; Asshagab, S.M. Road traffic facilities, traffic accidents, and poverty: Lesson learned from Indonesia. Transp. Res. Interdiscip. Perspect. 2024, 28, 101273.
  61. Patil, G.R.; Sharma, G. Overweight/obesity relationship with travel patterns, socioeconomic characteristics, and built environment. J. Transp. Health 2021, 22, 101240.
  62. Saeidizand, P.; Boussauw, K. Patterns of car dependency of metropolitan areas worldwide: Learning from the outliers. Int. J. Sustain. Transp. 2024, 18, 221–235.
  63. US-Census. 2025. Available online: https://data.census.gov/table (accessed on 10 May 2025).
Figure 1. Distribution of urban infrastructure, socioeconomic, and demographic variables.
Figure 2. Relationship between the Health and Fitness Index and the Land-Use Mix Factor in different contexts: (A) city type; (B) climate zone; (C) climate moisture regime; (D) region.
Figure 3. Correlation matrix of the considered variables.
Figure 4. Correlation heatmap showing associations between Health and Fitness Index and ZIP code–level socioeconomic indicators.
Figure 5. Coefficients of Lasso regression analysis.
Figure 6. Contribution of each predictor to the Health and Fitness Index regression model.
Table 1. Number of cases for each urban model.
| Model | All ZIP codes | Metropolis Hybrid 1 | Metropolis Hybrid 2 | Metropolis Hybrid 3 | Metropolis Hybrid 4 |
|---|---|---|---|---|---|
| ZIP codes | 28,758 | 15,828 | 15,828 | 14,063 | 9563 |
Model formulation and purpose of each Metropolis Hybrid model:

Metropolis Hybrid 1. Formulation: a statistical model that examines how urban facility types, demographic characteristics, and elements of the urban fabric relate to the dependent variable within U.S. metropolitan ZIP codes. This model specifically includes Walk Score, allowing the analysis to capture the effect of pedestrian accessibility and walkability on the outcome. Purpose: this formulation isolates the effect of walkability while holding the other structural and socioeconomic variables constant, testing whether pedestrian-friendly environments significantly affect the dependent variable in metropolitan areas.

Metropolis Hybrid 2. Formulation: constructed from the same metropolitan ZIP-code sample as Hybrid 1, but includes Water Area as a predictor. This modification captures the influence of proximity to water bodies, a key geographical and urban-development factor in many U.S. metropolitan regions. Purpose: this model tests whether ZIP codes that include or border water areas show different patterns in the dependent variable. Water proximity often affects land values, land-use diversity, housing patterns, and environmental amenities; Hybrid 2 quantifies that relationship.

Metropolis Hybrid 3. Formulation: refines the dataset by excluding ZIP codes that contain water areas and by dropping Walk Score, while including total carbon emissions (tCO2e/yr). The model therefore examines urban ZIP codes where water is not a confounding factor, focusing on environmental performance and built-environment structure. Purpose: by removing ZIP codes with water area and the walkability score, this formulation isolates the relationship between carbon emissions, built form, and socioeconomic characteristics. It helps determine whether low-emission ZIP codes have distinct structural or demographic profiles independent of geographical water features.

Metropolis Hybrid 4. Formulation: the most selective model, using a reduced dataset restricted to metropolitan ZIP codes with complete socioeconomic, demographic, and built-form data (no missing entries). It excludes Walk Score, Water Area, and carbon emissions, so the model focuses strictly on structural and socioeconomic predictors. Purpose: Hybrid 4 provides a baseline metropolitan model of urban form and socioeconomic structure, free of external environmental or geographical modifiers. This allows direct comparison with the earlier models to show how excluding the environmental and walkability variables changes explanatory power (adjusted R² = 0.842).
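The four formulations differ in two ways: which ZIP codes enter the sample, and which extra predictors are added. A minimal Python sketch of that bookkeeping follows, under stated assumptions: the column names (`metro`, `water_area`, `walk_score`, `total_tco2e`, and the `BASE` list) are hypothetical stand-ins rather than the authors' variable names, the shared predictor list is a short subset, and Hybrid 2 is read as swapping Walk Score for Water Area, a point the text leaves open.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical ZIP-code record; every column name here is illustrative only.
Record = dict[str, Optional[float]]

# Assumed predictors shared by all four formulations (a small stand-in subset).
BASE = ["mix_factor", "mixed_land_use", "income", "median_age"]


@dataclass(frozen=True)
class Spec:
    name: str
    extra: tuple[str, ...]               # predictors added on top of BASE
    in_sample: Callable[[Record], bool]  # rule deciding which ZIP codes enter


def is_metro(r: Record) -> bool:
    return r.get("metro") == 1


def metro_without_water(r: Record) -> bool:
    # Hybrid 3: metropolitan ZIP codes that contain no water area.
    return is_metro(r) and (r.get("water_area") or 0.0) == 0.0


def metro_complete(r: Record) -> bool:
    # Hybrid 4: metropolitan ZIP codes with no missing predictor values.
    return is_metro(r) and all(r.get(c) is not None for c in BASE)


SPECS = [
    Spec("Metropolis Hybrid 1", ("walk_score",), is_metro),
    Spec("Metropolis Hybrid 2", ("water_area",), is_metro),  # assumption: Water Area replaces Walk Score
    Spec("Metropolis Hybrid 3", ("total_tco2e",), metro_without_water),
    Spec("Metropolis Hybrid 4", (), metro_complete),
]


def design(spec: Spec, records: list[Record]) -> tuple[list[str], list[Record]]:
    """Return the predictor names and the filtered ZIP-code sample for one spec."""
    return BASE + list(spec.extra), [r for r in records if spec.in_sample(r)]


# Toy records with invented values, for illustration only.
records: list[Record] = [
    {"metro": 1, "water_area": 0.0, "mix_factor": 0.4, "mixed_land_use": 0.6, "income": 55000.0, "median_age": 38.0},
    {"metro": 1, "water_area": 2.5, "mix_factor": 0.3, "mixed_land_use": 0.5, "income": None, "median_age": 41.0},
    {"metro": 0, "water_area": 0.0, "mix_factor": 0.2, "mixed_land_use": 0.1, "income": 40000.0, "median_age": 45.0},
]
for spec in SPECS:
    cols, sample = design(spec, records)
    print(spec.name, len(sample), cols[-1])
```

On the toy records, Hybrids 1 and 2 keep both metropolitan ZIP codes, while Hybrids 3 and 4 keep only the first one (the second has a water area and a missing income value). Keeping each sample rule next to its predictor list makes the ZIP-code counts in Table 1 traceable to an explicit filter.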
Table 2. Research variables.
| Variables | Definition | Data source |
|---|---|---|
| Health and Fitness Index | Index integrating three key components: normalized walkability scores, the availability of health facilities, and normalized total carbon emissions. | |
| Number of businesses | Total number of businesses in each ZIP code | US-Census |
| 75 years and over | Population aged 75 years and over in each ZIP code | US-Census |
| Walk Score | Walk Score is a standardized walkability index that measures how accessible essential daily services are on foot. The score is generated by an algorithm that calculates proximity to multiple categories of amenities (including grocery stores, schools, restaurants, parks, and retail) using a distance-decay function that awards maximum points for amenities within 400 m and zero points beyond 1.6 km. The algorithm also incorporates the diversity of accessible amenities and the pedestrian-friendliness of the street network, including intersection density, block length, and population density. The final score is normalized to a 0–100 scale, where higher scores indicate more walkable environments. | https://www.walkscore.com/professional/research.php (accessed on 29 October 2025) |
| Health facilities | Number of health facilities in each ZIP code | US-Census |
| Total (tCO2e/yr) | Total carbon emissions in each ZIP code, in tonnes of CO2 equivalent per year | US-Census |
| Population density | Number of people per square mile in each ZIP code | US-Census |
| Median age | Median age in each ZIP code | US-Census |
| Diversity of race | Probability that two people selected at random have the same race | |
| % Occupied housing units | Occupied housing units as a percentage of total housing units in each ZIP code | US-Census |
| Income per household | Income per household in each ZIP code | US-Census |
| Education | Number of education facilities in each ZIP code | US-Census |
| Accommodation and food services | Number of accommodation and food service establishments in each ZIP code | US-Census |
| Arts, entertainment, and recreation | Number of arts, entertainment, and recreation facilities in each ZIP code | US-Census |
| Mix Factor | Index combining the ratio of jobs to population and the diversity of employment types | US-Census |
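Table 2 names the three Health and Fitness Index components but not how they are combined. One plausible construction is sketched below purely as an assumption: min-max scale each component across ZIP codes, invert the emissions term so that lower emissions raise the index, and average the three parts with equal weights. The function names and toy inputs are hypothetical, not the authors' procedure.

```python
def min_max(values: list[float]) -> list[float]:
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def toy_hfi(walk: list[float], health: list[float], co2: list[float]) -> list[float]:
    """Hypothetical composite of the three HFI components, one value per ZIP code.

    Assumptions (not stated in the source): min-max scaling, emissions
    entered inversely as (1 - scaled CO2), and equal weights.
    """
    w, h, c = min_max(walk), min_max(health), min_max(co2)
    return [(w_i + h_i + (1.0 - c_i)) / 3.0 for w_i, h_i, c_i in zip(w, h, c)]


# Three invented ZIP codes: Walk Score, health facilities, tCO2e/yr.
print(toy_hfi([20, 60, 100], [0, 5, 10], [300, 200, 100]))  # → [0.0, 0.5, 1.0]
```

Min-max scaling keeps each component on a common 0 to 1 range before averaging; a different scaling or weighting would shift the index values, so the weights here are a modelling choice rather than a finding.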
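The distance-decay rule in the Walk Score definition can be illustrated with a toy proximity score. This is a sketch under explicit assumptions: the taper between 400 m and 1.6 km is taken as linear, the category weights are invented, and the amenity-diversity and street-network components that Walk Score also uses are left out, so it is not Walk Score's proprietary algorithm.

```python
def decay_weight(distance_m: float, full_m: float = 400.0, zero_m: float = 1600.0) -> float:
    """Fraction of an amenity's points kept at a given walking distance.

    Full credit up to full_m and none beyond zero_m, as in the Table 2
    definition; the straight-line taper in between is an assumption.
    """
    if distance_m <= full_m:
        return 1.0
    if distance_m >= zero_m:
        return 0.0
    return (zero_m - distance_m) / (zero_m - full_m)


def toy_access_score(nearest_m: dict[str, float], weights: dict[str, float]) -> float:
    """Hypothetical 0-100 proximity score from the nearest amenity in each category."""
    total = sum(weights.values())
    earned = sum(w * decay_weight(nearest_m.get(cat, float("inf"))) for cat, w in weights.items())
    return 100.0 * earned / total


weights = {"grocery": 3.0, "school": 1.0, "park": 1.0}  # invented category weights
nearest = {"grocery": 300.0, "school": 1000.0}          # no park recorded, so it earns 0 points
print(toy_access_score(nearest, weights))  # → 70.0
```

The grocery store sits inside the 400 m full-credit radius, the school at 1 km keeps half its points under the linear taper, and the missing park contributes nothing.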
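Table 2 defines the race-diversity variable as the probability that two randomly selected people share the same race, which is Simpson's index. A minimal sketch, assuming race counts per ZIP code are available (the group labels and counts below are invented): whether the authors sampled with or without replacement, or reported the complement 1 − Σ p_i² (the Gini–Simpson form, where higher values mean more diversity), is not stated, so both computations here are assumptions.

```python
def same_race_probability(counts: dict[str, int], with_replacement: bool = True) -> float:
    """Probability that two randomly selected residents share the same race.

    with_replacement=True gives sum(p_i**2); False gives the finite-population
    form sum(n_i * (n_i - 1)) / (N * (N - 1)).
    """
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two people")
    if with_replacement:
        return sum((c / n) ** 2 for c in counts.values())
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


zip_counts = {"group_a": 50, "group_b": 30, "group_c": 20}  # invented counts
p_same = same_race_probability(zip_counts)
print(round(p_same, 4), round(1.0 - p_same, 4))  # → 0.38 0.62
```

With tens of thousands of residents per ZIP code the two forms differ only negligibly; the choice matters mainly for sparsely populated ZIP codes.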
Table 3. Regression Model Results.
| Dependent Variable | All ZIP codes | Metropolis Hybrid 1 | Metropolis Hybrid 2 | Metropolis Hybrid 3 | Metropolis Hybrid 4 |
|---|---|---|---|---|---|
| Regression statistics | | | | | |
| (Constant) | 0.0 | 0.0 | 0.0 | −100.85 | −107.424 |
| Type of facilities | | | | | |
| Education | 7.45 | 7.76 | 7.49 | 7.336 | 7.157 |
| Accommodation and food services | 4.45 | 4.68 | 4.76 | 4.752 | 4.907 |
| Health | 2.07 | 2.16 | 2.11 | 2.087 | 2.168 |
| Arts, entertainment, and recreation | 3.46 | 3.34 | 3.19 | 3.196 | 3.006 |
| Social and economic | | | | | |
| Population density | | | | −0.003 | −0.004 |
| Total housing units | 0.01 | 0.01 | 0.01 | 0.006 | 0.005 |
| % Occupied housing units | | | | 55.833 | 62.917 |
| Median age | −0.58 | −0.60 | −0.44 | 0.675 | 1.231 |
| Diversity of race | −33.27 | −50.63 | −51.09 | −67.675 | −87.406 |
| Income per household | | | | 0.001 | 0.001 |
| Health | | | | | |
| 75 years and over | −0.05 | −0.06 | −0.05 | −0.060 | −0.054 |
| Total (tCO2e/yr) | 0.98 | 1.19 | 0.10 | | |
| Urban fabric | | | | | |
| Mix factor | 0.26 | 0.24 | 0.23 | 0.218 | 0.168 |
| Street factor | −0.22 | −0.18 | −0.07 | 0.150 | 0.148 |
| Mixed land use | 34.77 | 45.15 | 46.64 | 45.445 | 53.813 |
| Water Area | −0.05 | | −0.09 | | |
| Walk score | 0.19 | 0.07 | | | |
| Adjusted R² | 0.92 | 0.92 | 0.93 | 0.871 | 0.842 |
| Observations | 28,758 | 15,828 | 15,828 | 14,063 | 9563 |
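Table 3 reports adjusted R², which discounts R² for the number of predictors p relative to the number of observations n: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The sketch below implements that standard formula; the R² value and predictor count in the example are hypothetical, since the exact number of predictors per model is not listed, and only n is taken from the Hybrid 4 column.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p predictors (intercept not counted in p)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical R^2 and predictor count; n matches the Hybrid 4 sample size.
print(round(adjusted_r2(0.85, n=9563, p=15), 4))  # → 0.8498
```

With samples of roughly 9,500 to 29,000 ZIP codes and about fifteen predictors, the penalty is a fraction of a thousandth, so adjusted and unadjusted R² nearly coincide for every column of Table 3.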
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zagow, M.; Darwish, A.M.; Shokry, S. Modeling Health-Supportive Urban Environments: The Role of Mixed Land Use, Socioeconomic Factors, and Walkability in U.S. ZIP Codes. Sustainability 2025, 17, 10873. https://doi.org/10.3390/su172310873

AMA Style

Zagow M, Darwish AM, Shokry S. Modeling Health-Supportive Urban Environments: The Role of Mixed Land Use, Socioeconomic Factors, and Walkability in U.S. ZIP Codes. Sustainability. 2025; 17(23):10873. https://doi.org/10.3390/su172310873

Chicago/Turabian Style

Zagow, Maged, Ahmed Mahmoud Darwish, and Sherif Shokry. 2025. "Modeling Health-Supportive Urban Environments: The Role of Mixed Land Use, Socioeconomic Factors, and Walkability in U.S. ZIP Codes" Sustainability 17, no. 23: 10873. https://doi.org/10.3390/su172310873

APA Style

Zagow, M., Darwish, A. M., & Shokry, S. (2025). Modeling Health-Supportive Urban Environments: The Role of Mixed Land Use, Socioeconomic Factors, and Walkability in U.S. ZIP Codes. Sustainability, 17(23), 10873. https://doi.org/10.3390/su172310873
