Article

Comparative Analysis of the Impact of Built Environment and Land Use on Monthly and Annual Mean PM2.5 Levels

1
School of Architecture and Art, Hebei University of Engineering, Handan 056038, China
2
School of Architecture and Urban Planning, Beijing University of Civil Engineering and Architecture, Beijing 100032, China
3
School of Architecture, Tianjin University, Tianjin 300072, China
*
Author to whom correspondence should be addressed.
Atmosphere 2025, 16(6), 682; https://doi.org/10.3390/atmos16060682
Submission received: 22 April 2025 / Revised: 27 May 2025 / Accepted: 3 June 2025 / Published: 5 June 2025
(This article belongs to the Special Issue Modeling and Monitoring of Air Quality: From Data to Predictions)

Abstract

Urban planners are progressively recognizing the significant effects of the built environment and land use on PM2.5 levels. However, in analyzing the drivers of PM2.5 levels, researchers’ reliance on annual mean and seasonal means may overlook the monthly variations in PM2.5 levels, potentially impeding accurate predictions during periods of high pollution. This study focuses on the area within the Sixth Ring Road of Beijing, China. It utilizes gridded monthly and annual mean PM2.5 data from 2019 as the dependent variable. The research selects 33 independent variables from the perspectives of the built environment and land use. The Extreme Gradient Boosting (XGBoost) method is employed to reveal the driving impacts of the built environment and land use on PM2.5 levels. To enhance the model accuracy and address the randomness in the division of training and testing sets, we conducted twenty comparisons for each month. We employed Shapley Additive Explanations (SHAP) and Partial Dependence Plots (PDP) to interpret the models’ results and analyze the interactions between the explanatory variables. The results indicate that models incorporating both the built environment and land use outperformed those that considered only a single aspect. Notably, in the test set for April, the R2 value reached up to 0.78. Specifically, the fitting accuracy for high pollution months in February, April, and November is higher than the annual mean, while July shows the opposite trend. The coefficient of variation for the importance rankings of the seven key explanatory variables exceeds 30% for both monthly and annual means. Among these variables, building density exhibited the highest coefficient of variation, at 123%. Building density and parking lots density demonstrate strong explanatory power for most months and exhibit significant interactions with other variables. 
Land use factors such as wetlands fraction, croplands fraction, park and greenspace fraction, and forests fraction have significant driving effects during the summer and autumn months. Examining PM2.5 drivers at the monthly time scale supports more effective reduction of PM2.5 levels, which is essential for developing refined urban planning strategies that foster healthier urban environments.

1. Introduction

PM2.5 (airborne particles with an aerodynamic diameter of less than 2.5 μm) significantly threatens human health and environmental quality, and rapid urbanization worldwide has exacerbated these effects [1,2]. China is no exception [3,4]; the capital, Beijing, has reached an urbanization rate of 87.6%, but it also grapples with PM2.5 pollution [5,6]. Research analysis has revealed that the annual mean PM2.5 levels in Beijing exceeded the national standard by 20% in 2019 [6,7,8]. In March 2020, the Chinese government issued the “Guiding Opinions on Building a Modern Environmental Governance System,” which proposed measures for air pollution control. Research indicates that road density, building density, and floor area ratio have a positive impact on the annual mean PM2.5 levels in Beijing [9]. Further studies have shown a strong correlation between building morphology and PM2.5 levels, with significant effects observed in spring and autumn, but not in summer and winter [10]. Existing research indicates that the influence of various factors on PM2.5 levels exhibits temporal variability, and a deeper understanding of this variability can provide critical insights for enhancing environmental policies. Therefore, in the subsequent literature analysis, we reviewed relevant studies on the selection of PM2.5 variables, explanatory variables, research areas, and research methods (see Supplemental Material, Table S1).
In previous studies, the acquisition of PM2.5 data primarily relies on monitoring point data and remote sensing data, each with advantages and applications [11,12]. Monitoring points provide high-resolution PM2.5 data [1]. However, the limited number of monitoring stations and the challenges associated with data accessibility hinder the effective utilization of this data at the urban level. In contrast, remote sensing data can cover large areas and provide continuous spatial information [13,14]. Additionally, it is crucial to consider the resolution of remote sensing data as well as the delineation of research units. Most studies organize data by streets, custom grids, or administrative regions, which may introduce biases during data processing [15]. Therefore, the study employs the original resolution of 0.001° to establish the research units. Furthermore, PM2.5 levels exhibit non-stationarity across different temporal scales [16,17]. While annual and seasonal averages have garnered significant attention from researchers, the variations in monthly averages are often overlooked.
Previous research has laid a solid foundation for exploring the impact of the built environment and land use on PM2.5 levels, which is becoming a prominent topic in air pollution research. Regarding the built environment, studies have shown that reasonably planning built-up areas and urban road density while controlling population density helps reduce pollutant emissions [7,8,18]. Furthermore, as urban building density continues to increase, compact urban forms weaken airflow and restrict the dispersion of PM2.5 [18,19,20]. Research conducted in Seoul, South Korea, found that increased traffic density leads to higher PM2.5 levels, while public transportation and mixed land use are negatively associated with PM2.5 levels [21]. Nevertheless, these studies do not employ a systematic method for selecting built environment variables, which limits cross-comparison among studies. Cervero et al. initially proposed a theoretical framework of “3D” dimensions, which includes density, diversity, and design [22]. Later, Ewing and Cervero built on the “3D” framework by incorporating distance to transit and destination accessibility, creating a more thorough “5D” dimension [23]. However, the “5D” dimension lacks population-related explanatory variables. To address this gap, De Gruyter et al. further developed the framework by incorporating demand management and demographic factors, thus introducing the “7D” dimension [24]. This improvement enables researchers to select built environment variables more comprehensively when analyzing their impact on PM2.5 levels and other environmental factors. Additionally, land use has been shown to significantly affect PM2.5 levels. Urban land use diversity, clustering, and concentration of development can make streets more walkable, reduce travel distances, and increase public transportation usage, thereby reducing vehicle emissions [8,25,26]. 
The increase in impervious surface area in urban centers leads to higher temperatures, promoting the formation of pollutants. With advancements in remote sensing and image recognition technologies, researchers have successfully identified urban land types [27,28] and quantified the impact of land use in New Delhi, India, on air pollution using land use data [29]. Generally, vegetation is considered beneficial for controlling PM2.5 levels. However, its effectiveness may vary with seasonal changes due to climatic influences [30]. These studies demonstrate that the built environment and land use influence PM2.5 levels, but the construction of indicator systems remains inadequate. This study employs a comparative analysis of two variable systems, providing credible explanations for a deeper understanding of the influencing factors of PM2.5 levels.
Existing research methods have established a solid foundation for studying the impact of the built environment and land use on PM2.5 levels, making it a focal point for analysis in the field of air quality modeling. Scholars have commonly employed linear models to analyze the relationship between the built environment, land use, and PM2.5 levels [2,11,20,31,32,33]. Numerous scholars have employed Ordinary Least Squares (OLS) models to analyze the impact of urban socioeconomic factors, urban morphology, and transportation networks on PM2.5 [32,34,35]. However, the least squares method fails to account for spatial effects, overlooking the potential spatial influences between air quality and its influencing factors. Some scholars have recognized this issue and proposed the Geographically Weighted Regression (GWR) model to explain the spatial heterogeneity of PM2.5 influencing factors [11,34]. Although the GWR model has been improved, it applies a uniform bandwidth to all explanatory variables, which can reduce the model’s accuracy. Subsequently, the Multiscale Geographically Weighted Regression (MGWR) model was introduced, which can obtain a set of optimal bandwidths for specific covariates, enhancing the model’s accuracy [36]. However, linear models typically assume that the data follows a linear function, which may overlook important details and introduce biases in the model results [37,38]. Nonlinear models are considered an effective solution to this problem. In studies focused on air pollution modeling in the Northwestern United States, a variety of machine learning models were compared to identify the optimal explanatory model, with the XGBoost model demonstrating the highest goodness of fit [39]. Additionally, partial dependence plots (PDP) and Shapley additive explanations (SHAP) serve as interpretive methods for XGBoost, which are crucial in enhancing the model’s interpretability. Peng et al. 
utilized PDP to explore the nonlinear relationships and threshold effects of the built environment on PM2.5 [40]. Doan et al. revealed the impact of the built environment, road vehicles, and PM2.5 on urban vitality, conducting an importance analysis of explanatory variables using SHAP [41]. SHAP interaction values aid a deeper understanding of the mechanisms through which land use impacts PM2.5 levels. Research in China has shown that as the interaction between forest and grassland increases, PM2.5 levels tend to decrease. Although existing studies employ the XGBoost method, model accuracy varies with the partition of the training and testing sets [42]. Therefore, this study conducts multiple experiments to validate the stability of the model and considers the interactions between variable features.
To fill this gap, the present study investigates how built environment and land use variables influence monthly and annual average PM2.5 levels. The XGBoost algorithm and the SHAP method are employed to interpret the results. The study examines the threshold effects and interactions of influencing factors. The research primarily has three objectives:
(1)
To compare the explanatory variable systems of built environment, land use, and a combination of both, in order to identify the optimal explanatory model.
(2)
To evaluate the predictive impacts of variables on monthly and annual average PM2.5 levels under the optimal explanatory model, and to analyze the threshold effects and interactions of these variables using explainable machine learning methods.
(3)
To analyze the differences in the impact of explanatory variables on monthly and annual average PM2.5 levels and propose more refined environmental policies.
This study constructs a variable system from the perspectives of built environment and land use, enhancing the accuracy of the predictive model. Additionally, the analysis of threshold effects and interactions of significant influencing factors on monthly and annual mean PM2.5 levels provides a theoretical basis for formulating more refined urban environmental policies to mitigate PM2.5 pollution.

2. Research Area and Data Sources

2.1. Research Scope and Data Sources

As China’s political, economic, and cultural center, Beijing has an urban population of 21.33 million and, following rapid urbanization, an urbanization rate of 87.6%. This study selects the area within Beijing’s Sixth Ring Road as the research area (Figure 1a), which accounts for approximately 78% of the total population of Beijing. Beijing also confronts significant PM2.5 pollution challenges that pose a serious threat to public health, so findings for this area are significant for preventing and controlling PM2.5 in Beijing and other megacities. In addition, we use the original raster cell of the PM2.5 TIFF data (0.001° in latitude and longitude) as the unit of analysis to preserve the accuracy of the data (Figure 1b).
To avoid the influence of the COVID-19 pandemic from 2020 to 2022, the PM2.5 data for 2019 are sourced from the surface PM2.5 dataset of the Atmospheric Composition Analysis Group. The dataset is published on the group website hosted by Washington University in St. Louis (https://sites.wustl.edu/acag/datasets/surface-pm2-5/, accessed on 10 October 2023). Rather than relying on ground monitoring alone, it is an annual mean surface estimate product obtained from satellite observations, chemical transport modeling, and ground monitoring, with a high level of accuracy (R2 = 0.81) [43,44]. This study selected the monthly and annual mean PM2.5 raster datasets for 2019, with a resolution of 0.001°. The dataset completely covers the area within the Sixth Ring Road of Beijing and contains no missing values. In addition, research units of 0.001° by 0.001° were constructed at the original resolution and at the same locations as the raster cells. The PM2.5 statistics show that monthly PM2.5 levels fluctuated significantly in 2019, ranging from 22.6 μg/m3 (May) to 77.36 μg/m3 (April) (Table 1). In terms of extreme values, the minimum was 22 μg/m3 (January), while the maximum reached 187 μg/m3 (April), highlighting the presence of extremely high values during that month. These data reflect the unstable characteristics of PM2.5 levels over time. The months with higher PM2.5 levels are February, April, July, and September.
This study also established a dataset of explanatory variables based on the built environment and land use, which includes 33 explanatory variables (Table 2). The data on buildings and road networks used as explanatory variables are obtained from OpenStreetMap (https://map.baidu.com, accessed on 10 October 2019). It includes the floor information and ground area of each building within the study area. Additionally, manual verification was conducted, and the road network information comprises 11,854 segments. Point of interest (POI) data, such as parking lots, bus stops, and subway stations information are retrieved from AmapAPI (https://lbs.amap.com, accessed on 10 October 2019). Population statistics are derived from Worldpop (https://hub.worldpop.org, accessed on 10 October 2019). This dataset consists of point data with a resolution of 100 m. It provides detailed information on the population distribution within the study area, and the population counts are used to calculate the population density within the research units. The Normalized Difference Vegetation Index (NDVI) data is sourced from the NASA Earth Data Portal (https://search.earthdata.nasa.gov/search, accessed on 10 October 2023). This dataset has a spatial resolution of 1 km and provides monthly and annual average vegetation coverage within the study area. The land use classification data is sourced from the database developed by Tsinghua University (https://data-starcloud.pcl.ac.cn/, accessed on 10 March 2024). This study utilized the 2017 non-urban land use classification dataset, which has a spatial resolution of 30 m and includes 9 types of land use. Additionally, the 2018 urban land use classification dataset, which comprises 12 types of land use, was also employed. The study extracted urban construction land and non-urban built-up land within the research area.

2.2. Dataset Construction

The factors influencing urban PM2.5 levels are multifaceted, with numerous studies examining the impacts of the built environment and land use. The influencing factors extracted in this study include building data, POI, land use, normalized vegetation index, roads, subways, public transport, and population data, totaling 33 explanatory variables. Based on these data, a dataset was constructed from the two dimensions of the built environment and land use. In the “7D” built environment dimension, building data, POI, roads, subways, public transport, and population data were selected. POI, land use, and roads were chosen in the land use dimension. From these two perspectives, three models are established to compare the performance of nonlinear regression models (Table 2): Model 1 is the built environment model, Model 2 is the land use model, and Model 3 combines both dimensions.

3. Method

3.1. Research Framework

This study investigates the impacts of the built environment and land use on both monthly and annual mean PM2.5 levels using XGBoost. It explores the threshold effects and interactions of influencing factors and proposes measures and suggestions for PM2.5 mitigation. The analytical framework utilized in this study is illustrated in Figure 2.
The data preparation phase has been elaborated in Section 2. Three different models are established based on these indicators. Subsequently, XGBoost is used to perform regression analysis on the three models. Additionally, partial dependence plots (PDP) are employed to understand the nonlinear relationships and threshold effects between the explanatory and dependent variables. The study employs Shapley values to analyze the global importance and interactions of the explanatory variables, thereby gaining a deeper understanding of how various factors collectively influence PM2.5 levels. Based on the analysis results, recommendations for mitigating PM2.5 are proposed.

3.2. Extreme Gradient Boosting (XGBoost)

XGBoost is a widely recognized tree-based machine learning model demonstrating strong modeling capabilities for nonlinear features across various data types. XGBoost is not affected by multicollinearity, which allows it to maintain model stability and accuracy when handling highly correlated features [45]. This characteristic enables XGBoost to effectively identify important features even in the presence of redundant information, thereby providing more reliable predictive results. Additionally, it demonstrates strong robustness and flexibility when dealing with complex, high-dimensional datasets [46]. This study employs XGBoost to investigate the nonlinear relationships between the built environment, land use, and PM2.5, as follows:
$$\hat{y}_l^{(a)} = \hat{y}_l^{(a-1)} + f_a(x_q)$$
In the equation, $\hat{y}_l^{(a)}$ represents the PM2.5 value predicted by the model after the $a$-th iteration, $\hat{y}_l^{(a-1)}$ is the prediction from the known ensemble of $a-1$ decision trees, $f_a(x_q)$ is the $a$-th decision tree, and $x_q$ corresponds to the $q$-th explanatory variable. The core of the XGBoost solution is to fit the $a$-th base decision tree to the residuals left by the first $a-1$ decision trees and to sum the outputs of all decision trees once the specified number of iterations is reached.
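The additive update above can be illustrated with a minimal sketch. It is not the XGBoost implementation (which adds regularization and second-order gradient information); it only shows the idea of fitting each new tree to the residuals of the current ensemble. The one-split "stump" learner, the shrinkage factor `eta`, and the toy building density and PM2.5 values are all assumptions made for illustration and are not taken from the study.

```python
# Minimal gradient-boosting sketch for squared error (illustrative only).

def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    # Candidate thresholds: every distinct x except the largest,
    # so both sides of the split are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=50, eta=0.3):
    """Return the base prediction and the list of fitted stumps."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        # Additive update: y_hat^(a) = y_hat^(a-1) + eta * f_a(x)
        preds = [p + eta * tree(x) for p, x in zip(preds, xs)]
        trees.append(tree)
    return base, trees


def predict(base, trees, x, eta=0.3):
    """Sum the shrunken outputs of all stumps on top of the base value."""
    return base + eta * sum(tree(x) for tree in trees)


# Hypothetical toy data: building density vs. PM2.5 (not from the paper).
densities = [0.02, 0.05, 0.10, 0.20, 0.30, 0.40]
pm25 = [40.0, 46.0, 52.0, 55.0, 56.0, 56.5]
base, trees = boost(densities, pm25)
```

With `eta = 1` the update matches the equation exactly; a smaller `eta` shrinks each tree's contribution, which corresponds to the learning rate listed among the tuned hyperparameters.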
During the training of the XGBoost model, we repeated the division into training and testing sets twenty times to avoid the bias of a single split, with 80% of the data used for training and 20% for testing. Next, the training set was further divided into a training set and a validation set in an 80% to 20% ratio using five-fold cross-validation to ensure more reasonable partitioning results [47]. Specifically, the training set is divided into five equal subsets, with one subset used as the validation set and the remaining four as the training set, and this process is repeated five times so that each subset serves once as the validation set. This method provides a comprehensive evaluation of model performance, helping to prevent overfitting and enhance generalization ability. Furthermore, in the search for optimal hyperparameters, we used Bayesian optimization to adjust parameters such as the number of trees, maximum depth, and learning rate. This approach builds a probabilistic model of the objective function and explores the hyperparameter space efficiently, improving model performance while reducing computational cost, and thus automatically identifies the best parameter combination. To comprehensively assess the model’s accuracy, we employed multiple performance metrics, including the coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE). Hyperparameter tuning was performed during training (Table 3) to optimize the model’s performance [48]. The meanings of the hyperparameters are as follows: colsample_bytree controls the fraction of features sampled for each tree to prevent overfitting, eta (the learning rate) scales the contribution of each tree to the final prediction, gamma sets the minimum loss reduction required for a node split to control model complexity, max_depth represents the maximum depth of each tree, n_estimators is the number of trees to be trained, and subsample is the proportion of the training data randomly sampled for each tree. Ultimately, we used the optimized XGBoost model for predictions.
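The evaluation protocol described above (twenty random 80/20 splits, five-fold partitioning of the training data, and the R2, RMSE, and MAE metrics) might be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: a one-feature least-squares line stands in for XGBoost, the Bayesian hyperparameter search is omitted, and the data are synthetic. All function names are hypothetical.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def train_test_indices(n, rng, test_fraction=0.2):
    """Random split of range(n) into (train, test) index lists."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists; each fold is validation once."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield train, validation


def fit_line(xs, ys):
    """Placeholder learner (one-feature least squares), standing in for XGBoost."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)


def repeated_evaluation(xs, ys, n_repeats=20, seed=0):
    """Evaluate on n_repeats independent 80/20 splits; return (R2, RMSE, MAE) tuples."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_repeats):
        train, test = train_test_indices(len(xs), rng)
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        y_true = [ys[i] for i in test]
        y_pred = [model(xs[i]) for i in test]
        results.append((r2_score(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred)))
    return results


# Synthetic data (not from the paper): a noisy linear relationship.
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 + ((7 * i) % 5 - 2) for i, x in enumerate(xs)]
results = repeated_evaluation(xs, ys)
mean_r2 = sum(r[0] for r in results) / len(results)
```

In a fuller version, `k_fold_indices` would be applied to the training indices of each split to score candidate hyperparameters before the final model is evaluated on the held-out 20%.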

3.3. Explanation of Machine Learning Models: SHAP and PDP

We use both PDP and SHAP methods to explain and visualize XGBoost; PDP can visualize the nonlinear effects of independent variables on dependent variables [49]. Additionally, for densely distributed data ranges, the interpretability of results in PDP is greater. Conversely, for sparsely distributed data, over-interpretation should be avoided. SHAP, based on game theory, enhances the interpretability of XGBoost. SHAP provides a comprehensive analysis of global, local, and interaction effects. SHAP is commonly used for the interpretation of machine learning models, and its calculation formula is:
$$\theta_m(f) = \sum_{G \subseteq H \setminus \{m\}} \frac{|G|!\,\left(|H| - |G| - 1\right)!}{|H|!} \left[ f_{G \cup \{m\}}\left(x_{G \cup \{m\}}\right) - f_G\left(x_G\right) \right]$$
In the formula, $m$ represents a feature, $H$ denotes the set of all features, $G$ is a subset of features that excludes $m$, $|G|!$ denotes the factorial of the number of features in $G$, $x_G$ represents the input feature values in $G$, $f_{G \cup \{m\}}$ denotes a model trained with the feature $m$, $f_G$ denotes another model trained without the feature $m$, and $f_{G \cup \{m\}}(x_{G \cup \{m\}}) - f_G(x_G)$ represents the difference in outputs between the two models. If $\theta_m$ is greater than 0, the feature has a positive impact on the model prediction; if it is less than 0, the feature has a negative effect on the model prediction. In this study, we use SHAP analysis to rank the global importance of explanatory variables. Then, we use SHAP interaction values to identify the sensitive built environment and land use factors that influence PM2.5.
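As a worked illustration of the formula, the following sketch computes exact Shapley values by enumerating every subset $G$. Two simplifications are assumptions of this example and not part of the study: instead of retraining a model for each subset, absent features are replaced by a baseline value of zero, and the model itself is a hypothetical two-feature function with an interaction term.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values for a coalition value function.

    value(coalition) takes a frozenset of feature indices and returns the
    model output when only those features are 'present'.
    """
    phi = [0.0] * n_features
    for m in range(n_features):
        others = [f for f in range(n_features) if f != m]
        for size in range(len(others) + 1):
            # Weight |G|! (|H| - |G| - 1)! / |H|!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                g = frozenset(subset)
                phi[m] += weight * (value(g | {m}) - value(g))
    return phi


# Hypothetical model and instance (not from the paper).
X = (1.0, 2.0)
BASELINE = (0.0, 0.0)


def toy_model(x0, x1):
    return 3.0 * x0 + 2.0 * x1 + x0 * x1


def value(coalition):
    """Model output with features outside the coalition set to the baseline."""
    inputs = [X[k] if k in coalition else BASELINE[k] for k in range(len(X))]
    return toy_model(*inputs)


phi = shapley_values(value, n_features=2)
# phi[0] = 0.5 * (3 - 0) + 0.5 * (9 - 4) = 4.0
# phi[1] = 0.5 * (4 - 0) + 0.5 * (9 - 3) = 5.0
```

The two values sum to 9, the difference between the prediction for the instance and the prediction for the baseline. Exhaustive enumeration grows exponentially with the number of features, which is why tree-specific algorithms are used in practice for models with 33 variables.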

4. Results

4.1. Comparison of Models for Various Months

This study compared three variable combinations: Model 1, Model 2, and Model 3. Because the accuracy of XGBoost varies with the random split of the training and testing sets [42], this study conducted twenty experiments for each model. The performance indicators recorded for these models were R2, RMSE, and MAE (see Supplemental Material, Tables S2–S4). Figure 3a–c display the test set results for R2, RMSE, and MAE, respectively. The comparison is based on the results of the 20 experiments. As shown in Figure 3a, the R2 of Model 3 on the test set is consistently higher than that of Models 1 and 2. The monthly and annual mean test accuracies of Model 3 differ, but the R2 for most months exceeds 0.6. Furthermore, as illustrated in Figure 3b,c, the RMSE and MAE of Model 3 exhibit smaller fluctuations across all months and are lower than those of Models 1 and 2, indicating that Model 3 has smaller prediction errors. This finding suggests that incorporating both built environment and land use factors offers a robust explanation for PM2.5 levels. Consequently, this study utilizes Model 3 for further analysis.

4.2. Relative Importance Analysis

To gain a deeper understanding of the decision-making process of the machine learning models, we use a global interpretation method to analyze the model results. We begin by using SHAP to rank the relative importance of the variables. Figure 4 shows the relative importance ranking of the 33 explanatory variables in Model 3 for January. The rankings for the remaining 11 months and the annual mean can be found in Supplemental Material, Figure S1. The left side of the figure shows the types of explanatory variables. Red signifies a positive effect on the PM2.5 prediction, while green denotes a negative effect, as inferred from the scatter plot on the right. Next, we used the elbow method to select the explanatory variables that contribute significantly to the monthly and annual mean PM2.5 levels. The elbow method is normally used to choose the number of clusters: the loss function is plotted against the number of clusters, and the inflection point of the curve, or “elbow,” marks the optimal number. In this study, we apply the same idea to the contributions of the explanatory variables and select those that demonstrate significant effects. Significant variables are highlighted within red squares. We concluded that parking lots density and building density have strong explanatory power for the annual mean, while building density, croplands fraction, forests fraction, wetlands fraction, park and greenspace fraction, density of population, and parking lots density have strong explanatory power at the monthly scale. The following analysis focuses on these seven explanatory variables.
To investigate the differences in the relative importance of variables between the monthly and annual means, we ranked the 33 variables in Model 3 according to their explanatory power. Figure 5 shows the monthly and annual mean relative importance rankings of the 33 variables in Model 3, sorted in descending order of their explanatory power for PM2.5 levels. Overall, the highlighted lines in the figure gradually converge from left to right, suggesting that the explanatory power of these variables tends to stabilize in the annual mean. Still, the significant differences between months cannot be ignored. Secondly, we calculated the mean and coefficient of variation of the explanatory power rankings for the seven variables (Figure 6). Among them, building density, forests fraction, density of population, and parking lots density have relatively low mean rankings (that is, they rank near the top), indicating a significant impact on PM2.5 levels. The coefficients of variation for density of population, forests fraction, and park and greenspace fraction are relatively small, suggesting that these variables provide a stable mitigation effect on PM2.5 levels monthly and annually as control indicators. Variables with larger fluctuations include building density, wetlands fraction, and croplands fraction, which exhibit greater monthly variation, indicating that measures focused on specific months might be more effective.
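The coefficient of variation of a variable's importance ranking is the standard deviation of its ranks divided by their mean. A minimal sketch follows; the rank sequences are invented for illustration (thirteen values for twelve months plus the annual mean), and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(ranks):
    """Coefficient of variation in percent: population std / mean * 100."""
    return pstdev(ranks) / mean(ranks) * 100.0


# Hypothetical importance ranks across 12 months plus the annual mean.
stable_ranks = [3, 3, 4, 3, 3, 4, 3, 3, 4, 3, 3, 4, 3]
volatile_ranks = [1, 20, 2, 15, 1, 30, 3, 25, 1, 18, 2, 22, 1]

cv_stable = coefficient_of_variation(stable_ranks)
cv_volatile = coefficient_of_variation(volatile_ranks)
```

A variable whose rank jumps between the top and the bottom of the list from month to month yields a large value, while a variable that stays near the same position yields a small one.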

4.3. Nonlinear and Threshold Effects

We use PDP to better understand the relationship between the variables and PM2.5 levels (Figure 7). The dependence plot displays the values of the explanatory variable along the X-axis, while the Y-axis indicates the positive or negative impact of the variable on PM2.5 levels, with the absolute value reflecting the magnitude of the effect [50]. We also categorized the seasons, with spring including March, April, and May; summer consisting of June, July, and August; autumn covering September, October, and November; and winter encompassing December, January, and February. In Figure 7, spring is represented in green, summer in blue, autumn in yellow, and winter in red. The black circles indicate the months in which the variable has significant relative importance, while the green range represents the threshold intervals for the monthly and annual means.
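For readers unfamiliar with partial dependence, the classical computation can be sketched as follows: the feature of interest is forced to each grid value in turn, and the model predictions are averaged over all rows. The model, feature names, and data here are hypothetical; the saturating response is chosen only to mimic a threshold effect, and the curves in Figure 7 may be centred or scaled differently.

```python
def partial_dependence(model, rows, feature, grid):
    """Average prediction over rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the input rows stay unchanged
            modified[feature] = value
            total += model(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    """Hypothetical model: PM2.5 rises with building density, then saturates."""
    density_effect = 15.0 * min(row["building_density"], 0.25) / 0.25
    return 40.0 + density_effect + 0.01 * row["parking_density"]


# Hypothetical grid cells (not from the paper).
rows = [
    {"building_density": 0.05, "parking_density": 10.0},
    {"building_density": 0.15, "parking_density": 20.0},
    {"building_density": 0.30, "parking_density": 30.0},
]
grid = [0.0, 0.125, 0.25, 0.5]
curve = partial_dependence(toy_model, rows, "building_density", grid)
```

In this toy case the curve rises between 0 and 0.25 and is flat afterwards, which is the kind of pattern read as a threshold interval.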
In the built environment dimension, the most influential factors are building density, parking lots density, and density of population. Their influence is decisive in autumn and winter. Figure 7a illustrates the nonlinear relationship between building density and PM2.5 levels. Overall, the impact of building density on PM2.5 levels is positive, and it is significant in eight months. The degree of influence varies across months. Specifically, when building density ranges from 0 to 0.06, PM2.5 levels increase rapidly with rising building density; however, once building density exceeds 0.25, the increase in PM2.5 levels becomes more gradual. Judging by the line colors, there is no clear seasonal effect; the months in which building density has a considerable impact are February, July, and October. This may be attributed to the holiday periods in these months, when increased foot traffic and higher energy consumption in buildings contribute to higher PM2.5 levels [20,51,52,53]. Figure 7b illustrates the nonlinear impact of parking lots density on PM2.5 levels. Overall, the trend is similar across months, with a low impact on PM2.5 levels from 0 to 5, a rapid increase in PM2.5 levels as parking lots density rises from 12 to 24, and a slower increase from 24 to 200. Parking lots density significantly affects PM2.5 levels in June, July, November, December, and the annual mean. These months correspond to high- and low-temperature periods, which lead to increased vehicle usage and may contribute to the rise in PM2.5 levels [54,55,56]. Figure 7c shows the nonlinear impact of the density of population on PM2.5 levels. Within the range from 910 to 960 there is a noticeable threshold effect, and both the importance and the effect are relatively weak in winter. The months of relatively high importance are June and October, with differing impact trends.
In the context of land use, the most influential factors are park and greenspace fraction, wetlands fraction, forests fraction, and croplands fraction. Figure 7d illustrates the nonlinear impact of park and greenspace fraction on PM2.5 levels. Overall, all months exhibit a positive effect except October, which shows a negative impact. The months of relatively high importance are April, July, and September, all showing positive impacts and significant influence. A possible explanation is that Beijing, as an economically developed city, is expanding its parks and greenspaces, which attract a greater influx of population and heightened human activity. This aggregation can intensify rapidly over a certain period, leading to a swift rise in PM2.5 levels near park green spaces. This aligns with some scholars’ findings that public green spaces in rapidly urbanizing areas do not effectively mitigate PM2.5 levels [57,58]. Figure 7e depicts the nonlinear impact of croplands fraction on PM2.5 levels. The degree of both positive and negative effects varies significantly across months, with several periods showing negative impacts. The range of croplands fraction from 0 to 0.004 is a phase of noticeable change, while the range from 0.004 to 1 is a phase of slow change. Notably, March is of high importance and exerts a negative influence on predictions. Conversely, positive impacts emerge in May, August, and October, likely linked to specific agricultural activities. Farming operations are prevalent in May, when fertilization can release particulate matter into the atmosphere. Furthermore, in October, extensive mechanized operations during the harvest season contribute to increased dust emissions [52,59].
Figure 7f indicates the nonlinear effect of wetlands fraction on PM2.5 levels. Overall, wetlands fraction has a positive impact on PM2.5 levels in summer and winter. The months of relatively high importance are April, May, July, and September, all exerting positive and significant effects on PM2.5 levels. This indicates that wetlands fraction is an essential factor influencing PM2.5 levels. A possible reason is that wetlands readily promote the growth of algae and microorganisms in summer. The decomposition and metabolic processes of these organisms release a certain amount of volatile organic compounds (VOCs), which are essential precursors in the formation of PM2.5. Additionally, when moisture evaporates from the wetland surface, it can carry some fine particulate matter into the air. These particles can act as condensation nuclei for PM2.5, gradually increasing in size by absorbing surrounding pollutants. This phenomenon is exacerbated in hot summer months, as high temperatures accelerate evaporation [60]. Figure 7g illustrates the nonlinear impact of forests fraction on monthly PM2.5 levels. Overall, except for February and March, which show negative impacts, the remaining months show positive effects, with no clear seasonal pattern. The months of relatively high importance are June and August, both showing positive impacts. Within the range from 0 to 0.05, an increase in forests fraction leads to a rapid rise in PM2.5 levels; as forests fraction continues to increase, the effect gradually diminishes. A possible reason is that the vigorous biological activity in forests releases VOCs during photosynthesis that contribute to PM2.5 formation. Additionally, forested areas receive many tourists in summer, which can increase PM2.5 emissions from human activities [61,62].
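The threshold behaviour described for Figure 7 (a steep rise at low values followed by a flatter segment) can be illustrated with a minimal partial dependence sketch. This is not the authors' code. It assumes the curves are read as partial dependence curves, and `toy_predict`, the four grid cells, and all coefficients are invented stand-ins for a fitted model and the real data.

```python
import math


def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction over all rows with one feature forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # overwrite only the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_predict(row):
    """Hypothetical stand-in for a fitted model (not the paper's XGBoost model).

    PM2.5 rises quickly with building density and then saturates; parking
    lots density adds a small linear term.
    """
    building_density, parking_density = row
    saturating = 10.0 * (1.0 - math.exp(-building_density / 0.05))
    return 40.0 + saturating + 0.01 * parking_density


# Invented grid cells: (building density, parking lots density).
rows = [(0.02, 3.0), (0.10, 15.0), (0.20, 40.0), (0.35, 120.0)]
grid = [0.0, 0.06, 0.25, 0.40]
curve = partial_dependence(toy_predict, rows, 0, grid)

# Slope between consecutive grid points: steep at first, then nearly flat.
slopes = [
    (curve[i + 1] - curve[i]) / (grid[i + 1] - grid[i])
    for i in range(len(grid) - 1)
]
for left, right, slope in zip(grid, grid[1:], slopes):
    print(f"building density {left:.2f}-{right:.2f}: slope {slope:.2f}")
```

Comparing the slopes of adjacent segments is one simple way to locate a range in which the response flattens. The breakpoints used here were chosen only to echo the ranges mentioned in the text.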

4.4. Interaction Effects Between Variables

Interaction analysis allows a deeper exploration of how explanatory variables affect PM2.5; for example, the interaction between building density and parking lots density significantly affects PM2.5 levels. Examining the interaction effects of explanatory variables on PM2.5 levels is therefore necessary. Each figure in the interaction analysis illustrates how one variable affects PM2.5 levels as another variable changes. When the SHAP interaction value is 0, there is no interaction in the prediction of PM2.5 levels; a value greater than 0 indicates a positive interaction, and a value less than 0 a negative interaction. The interaction analysis that follows covers the seven explanatory variables with the most significant influence.
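To make the sign convention concrete, the following minimal sketch computes exact SHAP interaction values for a model with only two features. It is an illustration under simplifying assumptions, not the procedure used in this study: absent features are replaced by a single reference point, and the functions `additive_model` and `interacting_model`, their coefficients, and the input values are all hypothetical.

```python
def shap_interaction_two_features(f, x, reference):
    """Exact SHAP main effects and interaction value for a two-feature model.

    A feature that is absent from a coalition is set to its reference value
    (a single background point), which is a simplifying assumption.
    """
    x1, x2 = x
    r1, r2 = reference
    f_none = f(r1, r2)  # neither feature known
    f_1 = f(x1, r2)     # only feature 1 known
    f_2 = f(r1, x2)     # only feature 2 known
    f_both = f(x1, x2)  # both features known

    # Shapley values for two players: average the two possible orderings.
    phi_1 = 0.5 * ((f_1 - f_none) + (f_both - f_2))
    phi_2 = 0.5 * ((f_2 - f_none) + (f_both - f_1))

    # The joint effect is split equally between the pairs (1, 2) and (2, 1).
    phi_12 = 0.5 * (f_both - f_1 - f_2 + f_none)

    return {
        "base": f_none,
        "main_1": phi_1 - phi_12,  # main effect after removing the interaction share
        "main_2": phi_2 - phi_12,
        "interaction_12": phi_12,
    }


# Hypothetical models of PM2.5 as a function of building density (bd)
# and parking lots density (pd); the coefficients are invented.
def additive_model(bd, pd):
    return 30.0 + 20.0 * bd + 0.05 * pd


def interacting_model(bd, pd):
    return additive_model(bd, pd) + 0.5 * bd * pd


reference = (0.1, 5.0)
no_interaction = shap_interaction_two_features(additive_model, (0.3, 20.0), reference)
positive = shap_interaction_two_features(interacting_model, (0.3, 20.0), reference)
negative = shap_interaction_two_features(interacting_model, (0.3, 2.0), reference)

print(no_interaction["interaction_12"], positive["interaction_12"], negative["interaction_12"])
```

In the additive model the interaction value is zero (up to floating-point rounding). In the interacting model it is positive when both features lie above the reference and negative when one lies above and the other below. In every case the base value, the two main effects, and twice the interaction value add up to the model prediction.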
To better understand the interaction effects of influencing factors on the annual mean and monthly PM2.5 levels, we conducted analyses from the perspectives of the built environment and land use. Figure 8 illustrates the interaction between building density and density of population. The X-axis represents building density, ranging from 0 to 0.4, and the Y-axis represents the SHAP interaction values. As building density increases, an accompanying rise in population density leads to a continuous increase in SHAP interaction values, that is, a positive interaction effect that further raises the predicted PM2.5 levels. The distribution and density patterns may differ subtly across months. For instance, in June there may be fluctuations when building density is between 0.1 and 0.2; however, these fluctuations do not affect the overall positive interaction effect. Figures S2–S6 (see Supplemental Material) illustrate, respectively, the interactions between building density and parking lots density, wetlands fraction, croplands fraction, park and greenspace fraction, and forests fraction. These interactions have a positive effect on PM2.5 level predictions, although the interactions in March, August, and September are not significant. As building density increases, simultaneous increases in the density of population, parking lots density, and wetlands fraction lead to higher predicted PM2.5 levels (Figure 8, Figures S2 and S3). An increase in building density coupled with a decrease in croplands fraction, park and greenspace fraction, and forests fraction likewise results in higher predicted PM2.5 levels. These results consistently point to building density, a built environment factor, and confirm that it significantly impacts PM2.5 levels in Beijing. This finding is consistent with previous studies conducted in Beijing and other cities [7,63,64].
Adjusting building density is not feasible in the short term for the highly polluted months of October, November, and December. Based on the interactions, potential mitigation measures include increasing park and greenspace fraction, croplands fraction, and forests fraction. These indicators are all related to land use [65,66].
Figures S7–S9 (see Supplemental Material) illustrate the interactions of wetlands fraction with density of population, parking lots density, and park and greenspace fraction, all of which have a positive effect on PM2.5 level predictions. As wetlands fraction increases, a rise in population density, parking lots density, or park and greenspace fraction increases the predicted PM2.5 levels. This may be attributed to the characteristics of large cities, where PM2.5 levels are generally higher and wetlands can adsorb PM2.5. However, increases in the other three explanatory variables represent more human activity, which raises PM2.5 levels [67]. Additionally, the adsorption capacity of wetlands is lower during the summer and autumn seasons [68]. Figure S10 (see Supplemental Material) illustrates the interactions of croplands fraction with density of population, wetlands fraction, parking lots density, forests fraction, and park and greenspace fraction in August. These interactions have a negative effect on PM2.5 level predictions and are evident only in August. As croplands fraction increases, a concurrent rise in wetlands fraction and forests fraction negatively affects the predicted PM2.5 levels, as does a concurrent decrease in the density of population, park and greenspace fraction, and parking lots density. Figure S11 (see Supplemental Material) shows the interactions of forests fraction with density of population, parking lots density, and wetlands fraction in October, all of which have a positive effect on PM2.5 level predictions. As forests fraction increases, a decrease in parking lots density, wetlands fraction, and the density of population negatively affects the predicted PM2.5 levels.

5. Discussion

5.1. Necessity of Influencing Mechanism Research of Monthly PM2.5 Levels

This study analyzes the nonlinear effects of the built environment and land use on monthly and annual mean PM2.5 levels. The results indicate that annual mean analyses cannot substitute for monthly studies: the model’s goodness of fit is better for months with severe pollution than for the annual mean, and the ranking of relative importance and the intensity of nonlinear effects also differ. The reason for this outcome may be that the annual mean diminishes the significance of months with severe pollution, as demonstrated in Chen’s study, which shows that PM2.5 levels vary across different time spans [69]. Additionally, existing studies have shown that vehicle emissions and wind direction significantly impact daily average PM2.5 levels but have a weaker effect on monthly averages [70]. These findings highlight the need to select appropriate time spans for a comparative analysis of PM2.5 levels. On this basis, we analyzed the impacts of the built environment and land use, which have been overlooked in previous studies. Our findings suggest that monthly or more granular temporal analyses could enhance prevention and control policies.

5.2. Effectiveness of XGBoost Model Construction

First, the robustness and generalizability of the model must be addressed during its construction. Research indicates that repeatedly partitioning the training and testing sets helps to mitigate the instability caused by a single random split. Additionally, introducing a validation set helps control overfitting. This is consistent with the findings of Liu et al., who addressed this issue by extracting a subset of the data to serve as a local training set [42]. Each model enhanced the randomness of the training and testing set divisions through 780 experiments. Within the training set, we performed five-fold cross-validation to further partition the data into training and validation sets. Ultimately, we considered the optimal accuracy of all three models simultaneously. Based on the results, we compared and examined the impacts of the built environment and land use. The comparison indicates that considering the influencing factors comprehensively helps identify more significant control variables. In contrast, previous studies have analyzed the impacts of the built environment, land use, and urban form separately [11,26,71].
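The repeated-partition procedure can be sketched as follows. This is a model-agnostic illustration rather than the study's pipeline: a one-variable least-squares line stands in for XGBoost so that the example runs with the standard library alone. The 80/20 split ratio, the synthetic data, and all function names are assumptions. The default of 20 repeats follows the twenty comparisons per month reported in the abstract.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (stand-in for XGBoost)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def r_squared(observed, predicted):
    """Coefficient of determination on a held-out set."""
    mean_y = statistics.fmean(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def fit_and_score(xs, ys, train_idx, eval_idx):
    """Fit on train_idx and return R2 on eval_idx."""
    intercept, slope = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    predicted = [intercept + slope * xs[i] for i in eval_idx]
    return r_squared([ys[i] for i in eval_idx], predicted)


def repeated_evaluation(xs, ys, n_repeats=20, test_share=0.2, n_folds=5):
    """Repeat a random train/test split; run k-fold CV inside each training set."""
    results = []
    for seed in range(n_repeats):
        indices = list(range(len(xs)))
        random.Random(seed).shuffle(indices)  # a different partition per seed
        n_test = int(len(indices) * test_share)
        test_idx, train_idx = indices[:n_test], indices[n_test:]

        cv_scores = []
        for fold in range(n_folds):
            val_idx = train_idx[fold::n_folds]  # every n_folds-th training sample
            held_out = set(val_idx)
            fit_idx = [i for i in train_idx if i not in held_out]
            cv_scores.append(fit_and_score(xs, ys, fit_idx, val_idx))

        results.append({
            "seed": seed,
            "cv_r2": statistics.fmean(cv_scores),
            "test_r2": fit_and_score(xs, ys, train_idx, test_idx),
        })
    return results


# Synthetic data: PM2.5 as a noisy linear function of building density.
rng = random.Random(0)
xs = [rng.uniform(0.0, 0.4) for _ in range(200)]
ys = [35.0 + 25.0 * x + rng.gauss(0.0, 1.0) for x in xs]

results = repeated_evaluation(xs, ys)
test_scores = [r["test_r2"] for r in results]
mean_test_r2 = statistics.fmean(test_scores)
spread_test_r2 = statistics.stdev(test_scores)
print(f"test R2 over {len(results)} splits: mean {mean_test_r2:.3f}, sd {spread_test_r2:.3f}")
```

Reporting the mean and spread of the test R2 across partitions, rather than the score from a single split, shows how much of the apparent accuracy depends on the random division of the data.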

5.3. Impact of the Built Environment and Land Use on PM2.5 Levels

Table 4 summarizes the main findings regarding the effects of the built environment and land use on PM2.5 levels. Building density shows a positive effect in terms of global relative importance and nonlinear impact in several months. This finding aligns with prior research indicating that high building density exerts the most significant influence on PM2.5 levels in Wuhan, China [20]. It also supports the assertion that optimizing urban building density is an effective strategy for mitigating PM2.5 concentrations [72]. However, existing studies lack analysis of the interaction effects among the influencing variables, and reducing building density is difficult to achieve through policymaking. Through interaction analysis, we found that the interactions between building density and six other variables produce a strong positive effect, which implies that controlling those variables will also have a significant impact. Therefore, measures such as increasing the park and greenspace fraction, croplands fraction, and forests fraction, or reducing the density of population and parking lots density, will effectively alleviate PM2.5 levels.
The most significant factors at the land use level are wetlands fraction, park and greenspace fraction, and forests fraction. An increase in wetlands fraction leads to an increase in PM2.5 levels. The global importance analysis shows a positive impact in April, May, July, and September. In the nonlinear analysis, only October and March show negative impacts, which are more pronounced in the summer. This may be related to climate and temperature. Li et al. suggested in their study of wetlands in Beijing that the efficiency of wetlands in adsorbing PM2.5 is lowest in summer, which is consistent with our findings [68,73,74]. This may be because agricultural activities are also a source of PM2.5 [75]; the reduced agricultural activities in Beijing during August may also lead to lower PM2.5 levels. Forests fraction shows a positive impact in both the global importance and nonlinear analyses in June and August. This has a negative impact during the summer. Existing studies have shown that the effects of forests on PM2.5 in Beijing are adverse in winter. However, the lushness of trees in summer may hinder the dispersion of PM2.5, leading to deposition [76]. This does not prove that an increased forests fraction will lead to higher PM2.5 levels. Nevertheless, the interaction results support policy recommendations such as reducing parking lots density and the density of population. Increasing wetlands fraction can work synergistically to enhance forests fraction.

5.4. Limitations and Future Research

This study has several limitations that deserve attention. Firstly, regional representativeness is limited: this study focuses on Beijing, and its results may not apply to other cities or regions, especially those with different climatic and geographical conditions. Future research should conduct broader regional comparisons to enhance the generalizability of the findings. Secondly, the temporal resolution of the data is limited: some data (such as population data) are difficult to obtain on a monthly basis. However, future governance of high PM2.5 pollution periods may trend towards combined annual mean and monthly analyses. Regarding variable selection, the variables chosen in this study may not encompass all factors influencing PM2.5 levels, such as socioeconomic and meteorological factors. Future research should consider additional relevant variables. Regarding the spatial heterogeneity of variable impacts, the model has not adequately captured how the influence of each factor on PM2.5 levels differs across locations. Furthermore, the delineation of the study area may give rise to the Modifiable Areal Unit Problem (MAUP), meaning that different ways of defining research units may lead to significant differences in the results. This necessitates more careful consideration of area delineation in future research. This study also does not consider policy impacts, such as the free passage on expressways during the high pollution month of October, increased tourism mobility, and restrictions on vehicle usage in winter. Finally, although machine learning algorithms outperform traditional models in predictive performance and nonlinear analysis, they do not establish causal relationships between variables. Future research should further explore causality.

6. Conclusions

This study uses the area within Beijing’s Sixth Ring Road as a case study to explore the nonlinear relationships among the built environment, land use, and PM2.5. The results indicate that the impact of explanatory variables on annual mean PM2.5 levels differs from their effects on monthly PM2.5 levels, with distinct variations observed across months. Wetlands fraction, forests fraction, croplands fraction, and park and greenspace fraction show particularly significant impacts during summer and autumn months, with wetlands fraction and park and greenspace fraction being especially influential in heavily polluted months. Other variables, such as building density and parking lots density, also exhibit significant effects during summer, autumn, and winter months, with notable interactions with other explanatory variables. Based on our research findings, we suggest the following environmental policies. For annual strategies, priority should be given to regulating building density, along with reducing population density and increasing the proportion of park and greenspace for more significant effects. For monthly control measures, policymakers should focus on increasing the park and greenspace fraction and wetlands fraction, as these have a notable impact during the summer months. Additionally, reducing population and parking lots densities will also contribute to pollution reduction. Enhancing parks and greenspaces can help lower PM2.5 levels, and improving plant layouts, ventilation, and tree quality will further enhance results.
These findings contribute to understanding the impacts of land use and the built environment on PM2.5 levels, particularly during heavily polluted months, providing a basis for targeted urban planning strategies to reduce PM2.5 emissions. The contributions of this study mainly include:
(1) A comparative analysis was conducted to examine the differences in the impacts of the built environment and land use factors on annual mean and monthly mean PM2.5 levels in Beijing.
(2) Multiple divisions of the training and testing sets were performed to reduce the instability in XGBoost model accuracy caused by random dataset partitioning.
(3) By analyzing the relative importance and threshold effects of various factors influencing PM2.5, we enhanced our understanding of the mechanisms affecting both annual mean and monthly mean PM2.5 levels, especially for months with high PM2.5 pollution.
(4) Interaction effects are highlighted to analyze the synergies of influencing factors. These insights could help develop combination measures and policy recommendations from the perspective of land use and the built environment to effectively reduce PM2.5 levels.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/atmos16060682/s1, Figure S1: Importance ranking of variables; Figure S2: The SHAP interaction values between building density and parking lots density; Figure S3: The SHAP interaction values between building density and wetlands fraction; Figure S4: The SHAP interaction values between building density and croplands fraction; Figure S5: The SHAP interaction values between building density and park & greenspace fraction; Figure S6: The SHAP interaction values between building density and forests fraction; Figure S7: The SHAP interaction values between wetlands fraction and density of population; Figure S8: The SHAP interaction values between wetlands fraction and parking lots density; Figure S9: The SHAP interaction values between wetlands fraction and park & greenspace fraction; Figure S10: The SHAP interaction values between croplands fraction and density of population, wetlands fraction, parking lots density, forests fraction, and park & greenspace fraction, respectively; Figure S11: The SHAP interaction values between forests fraction and density of population, parking lots density, and wetlands fraction; Table S1: Literature review; Table S2: Model 1 testing sets result; Table S3: Model 2 testing sets result; Table S4: Model 3 testing sets result; Table S5: Abbreviated list.

Author Contributions

Conceptualization, Z.W. and X.C.; Data curation, A.S., S.L. and X.C.; Formal analysis, Z.W.; Funding acquisition, Z.W.; Methodology, A.S., Z.W., S.L. and X.C.; Software, S.L.; Supervision, Z.W.; Visualization, X.C.; Writing—original draft, A.S., Z.W. and S.L.; Writing—review and editing, A.S. and Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors would like to express their gratitude to anonymous reviewers for their valuable comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Borna, M.; Turci, G.; Marchetti, M.; Schiano-Phan, R. Evaluating the Influence of Urban Blocks on Air Pollution Concentration Levels: The Case Study of Golden Lane Estate in London. Sustainability 2024, 16, 696.
  2. Ding, Y.; Wang, C.; Wang, J.; Wang, P.; Huang, L. Revealing the impact of built environment, air pollution and housing price on health inequality: An empirical analysis of Nanjing, China. Front. Public Health 2023, 11, 1153021.
  3. Wai, K.-M.; Yu, P.K.N. Application of a Machine Learning Method for Prediction of Urban Neighborhood-Scale Air Pollution. Int. J. Environ. Res. Public Health 2023, 20, 2412.
  4. Geng, G.; Zheng, Y.; Zhang, Q.; Xue, T.; Zhao, H.; Tong, D.; Zheng, B.; Li, M.; Liu, F.; Hong, C.; et al. Drivers of PM2.5 air pollution deaths in China 2002–2017. Nat. Geosci. 2021, 14, 645–650.
  5. Wu, Y.; Lin, S.; Shi, K.; Ye, Z.; Fang, Y. Seasonal prediction of daily PM2.5 concentrations with interpretable machine learning: A case study of Beijing, China. Environ. Sci. Pollut. Res. 2022, 29, 45821–45836.
  6. Liu, L.; Silva, E.A.; Liu, J. A decade of battle against PM2.5 in Beijing. Environ. Plan. A Econ. Space 2018, 50, 1549–1552.
  7. Chen, J.; Wang, B.; Huang, S.; Song, M. The influence of increased population density in China on air pollution. Sci. Total Environ. 2020, 735, 139456.
  8. Wang, B.; Loo, B.P.Y.; Liu, J.; Lei, Y.; Zhou, L. Urban vibrancy and air pollution: Avoidance behaviour and the built environment. Int. J. Urban Sci. 2024, 28, 611–630.
  9. Cao, Q.; Luan, Q.; Liu, Y.; Wang, R. The effects of 2D and 3D building morphology on urban environments: A multi-scale analysis in the Beijing metropolitan region. Build. Environ. 2021, 192, 107635.
  10. Luan, Q.; Jiang, W.; Liu, S.; Guo, H. Impact of Urban 3D Morphology on Particulate Matter 2.5 (PM2.5) Concentrations: Case Study of Beijing, China. Chin. Geogr. Sci. 2020, 30, 294–308.
  11. Deng, X.; Gao, F.; Liao, S.; Li, S. Unraveling the association between the built environment and air pollution from a geospatial perspective. J. Clean. Prod. 2023, 386, 135768.
  12. Zhang, J.; Wan, Y.; Tian, M.; Li, H.; Chen, K.; Xu, X.; Yuan, L. Comparing multiple machine learning models to investigate the relationship between urban morphology and PM2.5 based on mobile monitoring. Build. Environ. 2024, 248, 111032.
  13. Li, Y.; Zhang, M.; Ma, G.; Ren, H.; Yu, E. Analysis of Primary Air Pollutants’ Spatiotemporal Distributions Based on Satellite Imagery and Machine-Learning Techniques. Atmosphere 2024, 15, 287.
  14. Cao, Y.; Yang, T.; Wu, H.; Yan, S.; Yang, H.; Zhu, C.; Liu, Y. Resilience Assessment and Improvement Strategies for Urban Haze Disasters Based on Resident Activity Characteristics: A Case Study of Gaoyou, China. Atmosphere 2024, 15, 289.
  15. Weigand, M.; Wurm, M.; Dech, S.; Taubenböck, H. Remote Sensing in Environmental Justice Research—A Review. ISPRS Int. J. Geo-Inf. 2019, 8, 20.
  16. Wang, D.; Zhong, Z.; Bai, K.; He, L. Spatial and Temporal Variabilities of PM2.5 Concentrations in China Using Functional Data Analysis. Sustainability 2019, 11, 1620.
  17. Chen, X.; Yin, L.; Fan, Y.; Song, L.; Ji, T.; Liu, Y.; Tian, J.; Zheng, W. Temporal evolution characteristics of PM2.5 concentration based on continuous wavelet transform. Sci. Total Environ. 2020, 699, 134244.
  18. Jiang, S.; Tang, L.; Lou, Z.; Wang, H.; Huang, L.; Zhao, W.; Wang, Q.; Li, R.; Ding, Z. The changing health effects of air pollution exposure for respiratory diseases: A multicity study during 2017–2022. Environ. Health 2024, 23, 36.
  19. Ng, E. Policies and technical guidelines for urban planning of high-density cities—Air ventilation assessment (AVA) of Hong Kong. Build. Environ. 2009, 44, 1478–1488.
  20. Yuan, M.; Song, Y.; Huang, Y.; Shen, H.; Li, T. Exploring the association between the built environment and remotely sensed PM2.5 concentrations in urban areas. J. Clean. Prod. 2019, 220, 1014–1023.
  21. Ahn, H.; Lee, J.; Hong, A. Urban form and air pollution: Clustering patterns of urban form factors related to particulate matter in Seoul, Korea. Sustain. Cities Soc. 2022, 81, 103859.
  22. Cervero, R.; Kockelman, K. Travel demand and the 3Ds: Density, diversity, and design. Transp. Res. Part D Transp. Environ. 1997, 2, 199–219.
  23. Ewing, R.; Cervero, R. Travel and the built environment: A synthesis. Transp. Res. Rec. 2001, 1780, 87–114.
  24. Ewing, R.; Cervero, R. Travel and the Built Environment: A meta-analysis. J. Am. Plann. Assoc. 2010, 76, 265–294.
  25. Ajayi, S.A.; Adams, C.A.; Dumedah, G.; Adebanji, A.O.; Ackaah, W. The impact of traffic mobility measures on vehicle emissions for heterogeneous traffic in Lagos City. Sci. Afr. 2023, 21, e01822.
  26. Xu, J.; Saeedi, M.; Zalzal, J.; Zhang, M.; Ganji, A.; Mallinen, K.; Wang, A.; Lloyd, M.; Venuta, A.; Simon, L.; et al. Exploring the triple burden of social disadvantage, mobility poverty, and exposure to traffic-related air pollution. Sci. Total Environ. 2024, 920, 170947.
  27. Paydar, M.; Kamani Fard, A.; Sabri, S. Walking Behavior of Older Adults and Air Pollution: The Contribution of the Built Environment. Buildings 2023, 13, 3135.
  28. Gong, P.; Chen, B.; Li, X.; Liu, H.; Wang, J.; Bai, Y.; Chen, J.; Chen, X.; Fang, L.; Feng, S.; et al. Mapping essential urban land use categories in China (EULUC-China): Preliminary results for 2018. Sci. Bull. 2020, 65, 182–187.
  29. Sharma, D.; Thapar, S.; Jain, D.; Sachdeva, K. Mapping the Spatiotemporal Variability of Particulate Matter Pollution in Delhi: Insights from Land Use Regression Modelling. J. Indian Soc. Remote Sens. 2024, 52, 1329–1346.
  30. Zhou, H.; Dai, Z.; Wu, C.; Ma, X.; Zhu, L.; Wu, P. Comparison of Different Impact Factors and Spatial Scales in PM2.5 Variation. Atmosphere 2024, 15, 307.
  31. Zhang, Y.; Yang, Y.; Chen, J.; Shi, M. Spatiotemporal heterogeneity of the relationships between PM2.5 concentrations and their drivers in China’s coastal ports. J. Environ. Manag. 2023, 345, 118698.
  32. Rahnama, M.R.; Sabaghi Abkooh, S. The effect of air pollutant and built environment criteria on unhealthy days in Mashhad, Iran: Using OLS regression. Urban Clim. 2021, 37, 100836.
  33. Zhao, H.; Wu, M.; Du, Y.; Zhang, F.; Li, J. Relationship between Built-Up Environment, Air Pollution, Activity Frequency and Prevalence of Hypertension—An Empirical Analysis from the Main City of Lanzhou. Int. J. Environ. Res. Public Health 2022, 20, 743.
  34. Duan, S.; Liu, Q.; Jiang, D.; Jiang, Y.; Lin, Y.; Gong, Z. Exploring the Joint Impacts of Natural and Built Environments on PM2.5 Concentrations and Their Spatial Heterogeneity in the Context of High-Density Chinese Cities. Sustainability 2021, 13, 11775.
  35. Ouyang, X.; Wei, X.; Li, Y.; Wang, X.-C.; Klemeš, J.J. Impacts of urban land morphology on PM2.5 concentration in the urban agglomerations of China. J. Environ. Manag. 2021, 283, 112000.
  36. Duan, H.; Cao, Q.; Wang, L.; Gu, X.; Ashrafi, K. Exploring the relationships between 3D urban landscape patterns and PM2.5 pollution using the multiscale geographic weighted regression model. Prog. Phys. Geogr. Earth Environ. 2024, 48, 368–388.
  37. Xu, X. Forecasting air pollution PM2.5 in Beijing using weather data and multiple kernel learning. J. Forecast. 2019, 39, 117–125.
  38. Song, C.; Fu, X. Research on different weight combination in air quality forecasting models. J. Clean. Prod. 2020, 261, 121169.
  39. Ma, J.; Cheng, J.C.P.; Xu, Z.; Chen, K.; Lin, C.; Jiang, F. Identification of the most influential areas for air pollution control using XGBoost and Grid Importance Rank. J. Clean. Prod. 2020, 274, 122835.
  40. Peng, T.; Gan, M.; Yao, Z.; Yang, X.; Liu, X. Nonlinear impacts of urban built environment on freight emissions. Transp. Res. Part D Transp. Environ. 2024, 134, 104358.
  41. Doan, Q.C.; Ma, J.; Chen, S.; Zhang, X. Nonlinear and threshold effects of the built environment, road vehicles and air pollution on urban vitality. Landsc. Urban Plann. 2025, 253, 105204.
  42. Liu, M.; Liu, Y.; Ye, Y. Nonlinear effects of built environment features on metro ridership: An integrated exploration with machine learning considering spatial heterogeneity. Sustain. Cities Soc. 2023, 95, 104613.
  43. Hammer, M.S.; van Donkelaar, A.; Li, C.; Lyapustin, A.; Sayer, A.M.; Hsu, N.C.; Levy, R.C.; Garay, M.J.; Kalashnikova, O.V.; Kahn, R.A.; et al. Global Estimates and Long-Term Trends of Fine Particulate Matter Concentrations (1998–2018). Environ. Sci. Technol. 2020, 54, 7879–7890.
  44. van Donkelaar, A.; Martin, R.V.; Li, C.; Burnett, R.T. Regional Estimates of Chemical Composition of Fine Particulate Matter Using a Combined Geoscience-Statistical Method with Information from Satellites, Models, and Monitors. Environ. Sci. Technol. 2019, 53, 2595–2611.
  45. Chen, H.; Chen, H.; Liu, Z.; Sun, X.; Zhou, R.; Wang, K. Analysis of Factors Affecting the Severity of Automated Vehicle Crashes Using XGBoost Model Combining POI Data. J. Adv. Transp. 2020, 2020, 8881545.
  46. Chan, J.Y.-L.; Leow, S.M.H.; Bea, K.T.; Cheng, W.K.; Phoong, S.W.; Hong, Z.-W.; Chen, Y.-L. Mitigating the Multicollinearity Problem and Its Machine Learning Approach: A Review. Mathematics 2022, 10, 1283.
  47. Charilaou, P.; Battat, R. Machine learning models and over-fitting considerations. World J. Gastroenterol. 2022, 28, 605–607.
  48. Demir, S.; Sahin, E.K. An investigation of feature selection methods for soil liquefaction prediction based on tree-based ensemble algorithms using AdaBoost, gradient boosting, and XGBoost. Neural Comput. Appl. 2022, 35, 3173–3190.
  49. Aas, K.; Jullum, M.; Løland, A. Explaining individual predictions when features are dependent: More accurate approximations to Shapley values. Artif. Intell. 2021, 298, 103502.
  50. Sun, J.; Zhou, T.; Wang, D. Relationships between urban form and air quality: A reconsideration based on evidence from China’s five urban agglomerations during the COVID-19 pandemic. Land Use Policy 2022, 118, 106155.
  51. Yuan, C.; Ng, E.; Norford, L.K. Improving air quality in high-density cities by understanding the relationship between air pollutant dispersion and urban morphologies. Build. Environ. 2014, 71, 245–258.
  52. Chen, Z.; Chen, D.; Wen, W.; Zhuang, Y.; Kwan, M.-P.; Chen, B.; Zhao, B.; Yang, L.; Gao, B.; Li, R.; et al. Evaluating the “2+26” regional strategy for air quality improvement during two air pollution alerts in Beijing: Variations in PM2.5 concentrations, source apportionment, and the relative contribution of local emission and regional transport. Atmos. Chem. Phys. 2019, 19, 6879–6891.
  53. Li, C.; Zhang, K.; Dai, Z.; Ma, Z.; Liu, X. Investigation of the Impact of Land-Use Distribution on PM2.5 in Weifang: Seasonal Variations. Int. J. Environ. Res. Public Health 2020, 17, 5135.
  54. Savio, N.; Lone, F.A.; Bhat, J.I.A.; Kirmani, N.A.; Nazir, N. Study on the effect of vehicular pollution on the ambient concentrations of particulate matter and carbon dioxide in Srinagar City. Environ. Monit. Assess. 2022, 194, 393.
  55. Elhadi, R.E.; Abdullah, A.M.; Abdullah, A.H.; Ash’aari, Z.H.; Khan, M.F. Seasonal Variations of Atmospheric Particulate Matter and its Content of Heavy Metals in Klang Valley, Malaysia. Aerosol Air Qual. Res. 2018, 18, 1148–1161.
  56. Liu, M.; Bai, X.; Luan, D.; Wei, J.; Gong, Y.; Gao, Q. Association between built environments and quality of life among community residents: Mediation analysis of air pollution. Public Health 2022, 211, 75–80.
  57. Fan, Z.; Zhan, Q.; Liu, H.; Wu, Y.; Xia, Y. Investigating the interactive and heterogeneous effects of green and blue space on urban PM2.5 concentration, a case study of Wuhan. J. Clean. Prod. 2022, 378, 134389.
  58. Jiang, Y.; Huang, G.; Fisher, B. Air quality, human behavior and urban park visit: A case study in Beijing. J. Clean. Prod. 2019, 240, 118000.
  59. Zhu, Z.; Wang, G.; Dong, J. Correlation Analysis between Land Use/Cover Change and Air Pollutants—A Case Study in Wuyishan City. Energies 2019, 12, 2545.
  60. Liu, J.; Yan, G.; Wu, Y.; Wang, Y.; Zhang, Z.; Zhang, M. Wetlands with greater degree of urbanization improve PM2.5 removal efficiency. Chemosphere 2018, 207, 601–611.
  61. Lee, H.; Jeon, J.; Lee, M.; Kim, H.S. Seasonal contrasting effects of PM2.5 on forest productivity in peri-urban region of Seoul Metropolitan Area, Republic of Korea. Agric. For. Meteorol. 2022, 325, 109149.
  62. Nizamani, M.M.; Zhang, H.-L.; Bolan, N.; Zhang, Q.; Guo, L.; Lou, Y.; Zhang, H.-Y.; Wang, Y.; Wang, H. Understanding the drivers of PM2.5 concentrations in Chinese cities: A comprehensive study of anthropogenic and environmental factors. Environ. Pollut. 2024, 361, 124783.
  63. Park, S.-H.; Ko, D.-W. Investigating the Effects of the Built Environment on PM2.5 and PM10: A Case Study of Seoul Metropolitan City, South Korea. Sustainability 2018, 10, 4552.
  64. King, K.E. Chicago Residents’ Perceptions of Air Quality: Objective Pollution, the Built Environment, and Neighborhood Stigma Theory. Popul. Environ. 2015, 37, 1–21.
  65. Chen, M.; Dai, F.; Yang, B.; Zhu, S. Effects of urban green space morphological pattern on variation of PM2.5 concentration in the neighborhoods of five Chinese megacities. Build. Environ. 2019, 158, 1–15.
  66. Yin, Z.; Zhang, Y.; Ma, K. Evaluation of PM2.5 Retention Capacity and Structural Optimization of Urban Park Green Spaces in Beijing. Forests 2022, 13, 415.
  67. Yang, T.; Wang, Y.; Wu, Y.; Zhai, J.; Cong, L.; Yan, G.; Zhang, Z.; Li, C. Effect of the wetland environment on particulate matter and dry deposition. Environ. Technol. 2018, 41, 1054–1064.
  68. Li, C.; Huang, Y.; Guo, H.; Wu, G.; Wang, Y.; Li, W.; Cui, L. The Concentrations and Removal Effects of PM10 and PM2.5 on a Wetland in Beijing. Sustainability 2019, 11, 1312. [Google Scholar] [CrossRef]
  69. Chen, W.; Tang, H.; Zhao, H. Diurnal, weekly and monthly spatial variations of air pollutants and air quality of Beijing. Atmos. Environ. 2015, 119, 21–34. [Google Scholar] [CrossRef]
  70. Casallas, A.; Castillo-Camacho, M.P.; Guevara-Luna, M.A.; González, Y.; Sanchez, E.; Belalcazar, L.C. Spatio-temporal analysis of PM2.5 and policies in Northwestern South America. Sci. Total Environ. 2022, 852, 158504. [Google Scholar] [CrossRef]
  71. Zhao, L.; Zhang, M.; Cheng, S.; Fang, Y.; Wang, S.; Zhou, C. Investigate the effects of urban land use on PM2.5 concentration: An application of deep learning simulation. Build. Environ. 2023, 242, 110521. [Google Scholar] [CrossRef]
  72. Guo, L.; Luo, J.; Yuan, M.; Huang, Y.; Shen, H.; Li, T. The influence of urban planning factors on PM2.5 pollution exposure and implications: A case study in China based on remote sensing, LBS, and GIS data. Sci. Total Environ. 2019, 659, 1585–1596. [Google Scholar] [CrossRef] [PubMed]
  73. Lu, S.; Yang, X.; Li, S.; Chen, B.; Jiang, Y.; Wang, D.; Xu, L. Effects of plant leaf surface and different pollution levels on PM2.5 adsorption capacity. Urban For. Urban Green. 2018, 34, 64–70. [Google Scholar] [CrossRef]
  74. Yan, G.; Yu, Z.; Wu, Y.; Liu, J.; Wang, Y.; Zhai, J.; Cong, L.; Zhang, Z. Understanding PM2.5 concentration and removal efficiency variation in urban forest park—Observation at human breathing height. PeerJ 2020, 8, e8988. [Google Scholar] [CrossRef] [PubMed]
  75. Gianquintieri, L.; Oxoli, D.; Caiani, E.G.; Brovelli, M.A. Implementation of a GEOAI model to assess the impact of agricultural land on the spatial distribution of PM2.5 concentration. Chemosphere 2024, 352, 141438. [Google Scholar] [CrossRef]
  76. Cao, W.; Zhou, W.; Yu, W.; Wu, T. Combined effects of urban forests on land surface temperature and PM2.5 pollution in the winter and summer. Sustain. Cities Soc. 2024, 104, 105309. [Google Scholar] [CrossRef]
Figure 1. Research scope: (a) research area; (b) unit of analysis.
Figure 2. Analytical framework.
Figure 3. Comparison of model performance: (a) comparison of R² for testing sets across the three models; (b) comparison of RMSE for testing sets across the three models; (c) comparison of MAE for testing sets across the three models.
Figure 4. Importance ranking of variables.
Figure 5. Trend of changes in the importance ranking of explanatory variables.
Figure 6. Coefficient of variation and mean of variable ranking change.
Figure 7. Threshold effect. (a) Building density; (b) Parking lots density; (c) Density of population; (d) Park and greenspace fraction; (e) Croplands fraction; (f) Wetlands fraction; (g) Forests fraction.
Figure 8. The SHAP interaction values between building density and population density.
Table 1. Descriptive statistics of PM2.5.
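| Period | Minimum | Maximum | Mean | Standard Deviation |
|---|---|---|---|---|
| January | 22 | 44 | 33.8 | 5.2 |
| February | 23.9 | 113.8 * | 55.7 * | 24.2 |
| March | 28.8 | 50.5 | 41.2 | 4.0 |
| April | 32 | 187 * | 77.4 * | 37 |
| May | 16.8 | 43.3 | 22.6 | 7.1 |
| June | 25 | 52.1 | 39.2 | 6.6 |
| July | 31.3 | 144.3 * | 60.8 * | 22.5 |
| August | 17.4 | 66.2 | 29.0 | 10.2 |
| September | 39.1 | 142.1 * | 67.4 * | 22.4 |
| October | 31.6 | 90.7 | 50.7 | 17.8 |
| November | 24.4 | 67.5 | 45.0 | 11.4 |
| December | 42.9 | 99.3 * | 71.2 * | 13.0 |
| Annual mean | 32.7 | 71.6 | 45.3 | 10.5 |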
Note: * denotes months in which the mean and maximum of PM2.5 are relatively large.
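As a quick check on the note above, the following sketch re-enters the Table 1 values and ranks the months by mean and by maximum. The five-month cutoff, the function name `top_months`, and the dictionary layout are assumptions made for illustration; the article does not state the rule used to assign the asterisks.

```python
# Table 1 values re-entered by hand: month -> (minimum, maximum, mean, standard deviation).
PM25_2019 = {
    "January": (22.0, 44.0, 33.8, 5.2),
    "February": (23.9, 113.8, 55.7, 24.2),
    "March": (28.8, 50.5, 41.2, 4.0),
    "April": (32.0, 187.0, 77.4, 37.0),
    "May": (16.8, 43.3, 22.6, 7.1),
    "June": (25.0, 52.1, 39.2, 6.6),
    "July": (31.3, 144.3, 60.8, 22.5),
    "August": (17.4, 66.2, 29.0, 10.2),
    "September": (39.1, 142.1, 67.4, 22.4),
    "October": (31.6, 90.7, 50.7, 17.8),
    "November": (24.4, 67.5, 45.0, 11.4),
    "December": (42.9, 99.3, 71.2, 13.0),
}

MAX_COLUMN, MEAN_COLUMN = 1, 2  # positions inside each tuple


def top_months(stats, column, k=5):
    """Return the k months with the largest value in the given tuple position."""
    ranked = sorted(stats, key=lambda month: stats[month][column], reverse=True)
    return ranked[:k]


# k=5 is an assumed cutoff, chosen only to compare against the starred months.
top_by_mean = top_months(PM25_2019, MEAN_COLUMN)
top_by_max = top_months(PM25_2019, MAX_COLUMN)
print("Largest means:  ", top_by_mean)
print("Largest maxima: ", top_by_max)
```

With a cutoff of five, both rankings select February, April, July, September, and December, which are the months carrying an asterisk in Table 1.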
Table 2. Descriptive statistics of explanatory variables.
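| Dimensions | Variables | Computing Method | Unit | Mean | Std. Deviation | Model 1 | Model 2 | Model 3 |
|---|---|---|---|---|---|---|---|---|
| Density (Built Environment) | Building density | The ratio of the first-floor area of the total building in each research unit to the research unit area | | 0.11 | 0.10 | + | + | + |
| Density (Built Environment) | Density of residential building | The ratio of the number of facility points in each research unit to the research unit area | quantity/km² | 11.71 | 17.12 | + | | + |
| Density (Built Environment) | Density of commercial facilities | The ratio of the number of facility points in each research unit to the research unit area | quantity/km² | 53.33 | 84.91 | + | | + |
| Density (Built Environment) | Density of office facilities | The ratio of the number of facility points in each research unit to the research unit area | quantity/km² | 62.41 | 111.48 | + | | + |
| Density (Built Environment) | Density of public service facilities | The ratio of the number of facility points in each research unit to the research unit area | quantity/km² | 29.67 | 50.55 | + | | + |
| Density (Built Environment) | Floor area ratio | The ratio of the total floor area in each research unit to the research unit area | | 0.56 | 0.64 | + | + | + |
| Diversity (Built Environment) | Mixed utilization of points of interest | The Shannon–Wiener Diversity Index represented the mixed utilization of points of interest. | | 0.82 | 0.30 | + | | + |
| Diversity (Built Environment) | Mixed utilization of land | The Shannon–Wiener Diversity Index was used to represent the mixed utilization of land | | 0.37 | 0.11 | | + | + |
| Design (Built Environment) | Road density | The ratio of road length to research unit area in each research unit | km/km² | 3.41 | 2.8 | + | + | + |
| Destination accessibility (Built Environment) | Nearest to the subway station | The center of the research unit is a straight-line distance from the nearest subway station | quantity/km² | 2684.58 | 2078.49 | + | | + |
| Distance to transit (Built Environment) | Bus coverage of 500 m | | km/km² | 204.29 | 475.63 | + | | + |
| Demand management (Built Environment) | Parking lots density | The ratio of the number of parking lots in each research unit to the research unit area | quantity/km² | 15.71 | 26.61 | + | | + |
| Demographics (Built Environment) | Density of population | The ratio of the number of people in each research unit to the research unit area | quantity/km² | 75.13 | 24.92 | + | | + |
| (Built Environment) | Normalized differential vegetation index | | | 0.16 | 0.08 | + | | + |
| Land use fractions | Croplands fraction | The fraction of the area of this land type of site in the grid to the area of the grid | | 0.04 | 0.11 | | + | + |
| Land use fractions | Forests fraction | The fraction of the area of this land type of site in the grid to the area of the grid | | 0.04 | 0.16 | | + | + |
| Land use fractions | Grasslands fraction | The fraction of the area of this land type of site in the grid to the area of the grid | | 0.03 | 0.17 | | + | + |
| Land use fractions | Shrublands fraction | The fraction of the area of this land type of site in the grid to the area of the grid | | 0.01 | 0.01 | | + | + |
| Land use fractions | Wetlands fraction | The fraction of the area of this land type of site in the grid to the area of the grid | | 0.01 | 0.01 | | + | + |
| Land use fractions | Water bodies fraction | The fraction of the area of this land type of site in the grid to the area of the grid | | 0.01 | 0.04 | | + | + |
| Land use fractions | Tundras fraction | The fraction of the area of this land type of site in the grid to the area of the grid | | 0.01 | 0.01 | | + | + |
| Land use fractions | Barren lands fraction | The fraction of the area of this land type of site in the grid to the area of the grid | | 0.08 | 0.02 | | + | + |
| Land use fractions | Residential fraction | The fraction of the area of this land type of site in the grid to the area of the grid | | 0.36 | 0.32 | | + | + |
| Land use fractions | Business office fraction | The fraction of the area of this land type of site in the grid to the area of the grid | | 0.01 | 0.07 | | + | + |
| Land use fractions | Commercial service fraction | The fraction of the area of this land type of site in the grid to the area of the grid | | 0.01 | 0.03 | | + | + |
| Land use fractions | Industrial fraction | The fraction of the area of this land type of site in the grid to the area of the grid | | 0.01 | 0.07 | | + | + |
| Land use fractions | Transportation stations fraction | The fraction of the area of this land type of site in the grid to the area of the grid | | 0.01 | 0.14 | | + | + |
| Land use fractions | Airport facilities fraction | The fraction of the area of this land type of site in the grid to the area of the grid | | 0.03 | 0.15 | | + | + |
| Land use fractions | Administrative fraction | The fraction of the area of this land type of site in the grid to the area of the grid | | 0.03 | 0.15 | | + | + |
| Land use fractions | Educational fraction | The fraction of the area of this land type of site in the grid to the area of the grid | | 0.01 | 0.07 | | + | + |
| Land use fractions | Medical fraction | The fraction of the area of this land type of site in the grid to the area of the grid | | 0.04 | 0.12 | | + | + |
| Land use fractions | Sport and cultural fraction | The fraction of the area of this land type of site in the grid to the area of the grid | | 0.01 | 0.03 | | + | + |
| Land use fractions | Park and greenspace fraction | The fraction of the area of this land type of site in the grid to the area of the grid | | 0.04 | 0.29 | | + | + |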
Note: + indicates that the influencing factor is used as an explanatory variable in the corresponding model.
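Table 2 names the Shannon–Wiener Diversity Index for the two mixed-utilization variables but does not reproduce its formula. The sketch below uses the standard form H = −Σ p_i ln p_i. The natural-log base, the function name `shannon_wiener`, and the sample counts are assumptions for illustration, not values or code from the article.

```python
import math


def shannon_wiener(counts):
    """Shannon-Wiener diversity H = -sum(p_i * ln(p_i)) over categories with a positive count."""
    positive = [c for c in counts if c > 0]
    total = sum(positive)
    if total == 0:
        return 0.0  # no observations in this unit, so no diversity to measure
    return -sum((c / total) * math.log(c / total) for c in positive)


# Hypothetical point-of-interest counts in one research unit
# (for example residential, commercial, office, public service).
example_counts = [12, 30, 25, 8]
print(round(shannon_wiener(example_counts), 3))
```

The index is zero when a unit holds a single category and reaches ln(k) when k categories are equally represented, so higher values indicate a more even mix.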
Table 3. Hyperparameter settings.
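| Period | colsample_bytree | eta | gamma | max_depth | n_estimators | subsample |
|---|---|---|---|---|---|---|
| January | 1 | 0.04 | 0.23 | 10 | 368 | 0.6 |
| February | 0.56 | 0.22 | 0.45 | 3 | 314 | 1 |
| March | 0.78 | 0.15 | 0.23 | 16 | 1912 | 0.7 |
| April | 0.5 | 0.04 | 0.45 | 5 | 1210 | 0.7 |
| May | 0.94 | 0.04 | 0.89 | 3 | 2945 | 0.6 |
| June | 0.94 | 0.17 | 0.45 | 12 | 891 | 0.7 |
| July | 0.5 | 0.05 | 0.45 | 5 | 1787 | 0.7 |
| August | 0.67 | 0.06 | 0.56 | 16 | 1711 | 0.7 |
| September | 0.67 | 0.01 | 0.23 | 17 | 1034 | 0.7 |
| October | 0.83 | 0.14 | 0.12 | 11 | 1826 | 0.6 |
| November | 0.89 | 0.05 | 0.78 | 7 | 3713 | 0.6 |
| December | 0.67 | 0.11 | 0.89 | 10 | 1918 | 0.7 |
| Annual mean | 0.67 | 0.11 | 0.89 | 10 | 1918 | 0.7 |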
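Table 3 lists its settings under XGBoost parameter names. A minimal sketch of how one row might be converted into keyword arguments for a regressor follows. It assumes the April row reads max_depth = 5 and n_estimators = 1210, and that `eta` is passed as `learning_rate`, its alias in XGBoost's scikit-learn interface. The helper name `to_regressor_kwargs` is hypothetical, and the article does not describe its own code.

```python
# April row of Table 3, keyed by the column names used in the table.
# Assumption: the max_depth and n_estimators digits split as 5 and 1210.
april_row = {
    "colsample_bytree": 0.5,
    "eta": 0.04,
    "gamma": 0.45,
    "max_depth": 5,
    "n_estimators": 1210,
    "subsample": 0.7,
}


def to_regressor_kwargs(row):
    """Copy a Table 3 row, rename eta to learning_rate, and check basic value ranges."""
    kwargs = dict(row)  # work on a copy so the input row stays unchanged
    kwargs["learning_rate"] = kwargs.pop("eta")
    for name in ("colsample_bytree", "subsample"):
        if not 0.0 < kwargs[name] <= 1.0:
            raise ValueError(f"{name} must lie in (0, 1], got {kwargs[name]}")
    for name in ("max_depth", "n_estimators"):
        if kwargs[name] < 1:
            raise ValueError(f"{name} must be a positive integer, got {kwargs[name]}")
    return kwargs


april_kwargs = to_regressor_kwargs(april_row)
print(april_kwargs)
# With the xgboost package installed, these could be passed on as
# xgboost.XGBRegressor(**april_kwargs); that call is not made here.
```

Keeping one dictionary per period makes it straightforward to fit the monthly and annual models in a loop with period-specific settings.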
Table 4. Main research findings.
| Dimension | Variables with Greater Relative Importance | Months that Were Significantly Affected | Seasons that Were Affected Considerably | Variables with Significant Interactions | Interaction Direction |
|---|---|---|---|---|---|
| Built environment | Building density, parking lots density | Annual mean, January, February *, June, July *, October, November, December * | summer, autumn, winter | Park and greenspace fraction, Croplands fraction, Forests fraction, Wetlands fraction | Positive interaction |
| | | | | Density of population, Parking lots density | Positive interaction |
| Land use | Wetlands fraction | April *, May, July *, August, September * | summer | Park and greenspace fraction | Positive interaction |
| | | | | Density of population, Parking lots density | Positive interaction |
| Land use | Croplands fraction | August | summer | Forests fraction, Wetlands fraction, Park and greenspace fraction | Negative interaction |
| | | | | Density of population, Parking lots density | Negative interaction |
| Land use | Forests fraction | October | autumn | Wetlands fraction | Positive interaction |
| | | | | Density of population, Parking lots density | Positive interaction |
| Land use | Park and greenspace fraction | April *, July *, September * | summer | | non-significant |
Note: * indicates months with high PM2.5 concentrations.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Song, A.; Wang, Z.; Li, S.; Chen, X. Comparative Analysis of the Impact of Built Environment and Land Use on Monthly and Annual Mean PM2.5 Levels. Atmosphere 2025, 16, 682. https://doi.org/10.3390/atmos16060682

AMA Style

Song A, Wang Z, Li S, Chen X. Comparative Analysis of the Impact of Built Environment and Land Use on Monthly and Annual Mean PM2.5 Levels. Atmosphere. 2025; 16(6):682. https://doi.org/10.3390/atmos16060682

Chicago/Turabian Style

Song, Anjian, Zhenbao Wang, Shihao Li, and Xinyi Chen. 2025. "Comparative Analysis of the Impact of Built Environment and Land Use on Monthly and Annual Mean PM2.5 Levels" Atmosphere 16, no. 6: 682. https://doi.org/10.3390/atmos16060682

APA Style

Song, A., Wang, Z., Li, S., & Chen, X. (2025). Comparative Analysis of the Impact of Built Environment and Land Use on Monthly and Annual Mean PM2.5 Levels. Atmosphere, 16(6), 682. https://doi.org/10.3390/atmos16060682

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
