The Inﬂuence of Transportation Accessibility on Trafﬁc Volumes in South Korea: An Extreme Gradient Boosting Approach

: This study explored how transportation accessibility and trafﬁc volumes for automobiles, buses, and trucks are related. This study employed machine learning techniques, speciﬁcally the extreme gradient boosting decision tree model (XGB) and Shapley Values (SHAP), with national data sources in South Korea collected from the Korea Transport Institute, Statistics Korea


Introduction
Traffic congestion is a major problem in many urban areas, leading to increased travel time, air pollution, and decreased quality of life [1].One of the main factors contributing to traffic congestion is the volume of vehicles on the road, which is influenced by factors including accessibility to different destinations and transportation modes.Mondschein and Taylor [2], for instance, suggested that there are sites where individuals make numerous traffic and engage in numerous activities despite congestion, which tend to be more central, built-up areas with higher levels of accessibility.Accordingly, a potential strategy for reducing traffic congestion is to improve accessibility to different destinations and modes of transportation.By improving access to public transportation, for example, individuals may be more likely to use public transit rather than drive alone, which can help to reduce the number of vehicles on the road and alleviate congestion.Similarly, by improving access to essential services and amenities, such as grocery stores, schools, and healthcare facilities, individuals may be able to reduce the number of trips they take, which can also help to reduce congestion.
Therefore, this study aims to offer a deep understanding of how transportation accessibility and traffic volumes are associated and how the association differs by automobile, bus, and truck utilizing innovative approaches (i.e., ML techniques), extreme gradient boosting decision tree model (XGB), and Shapley values (SHAP) with nationwide data sources in South Korea collected from the Korea Transport Institute, Statistics Korea, and National Spatial Data Infrastructure Portal.This study makes several contributions, including (1) identifying critical factors and understanding their effects, and (2) providing policy implications.This study can aid in identifying the primary factors that influence traffic volumes, such as the availability of transportation infrastructure and the degree of congestion.Transportation planners and policymakers can use this information to devise strategies to improve transportation infrastructure and reduce congestion.Without studying the relationship between accessibility and congestion, transportation planners may not fully understand the transportation needs of users, leading to inefficient transportation systems that may reduce mobility for users.This study can also suggest the efficacy of various strategies for enhancing transportation infrastructure and reducing congestion.For instance, studies can assess the effects of constructing new roads or highways, expanding public transportation options, or instituting congestion pricing policies.

Related Works 2.1. Traffic and Accessibility
Previous literature has acknowledged the association between traffic volumes and accessibility.Broadly speaking, the built environment influences how people move through and interact with their surroundings, therefore land use and travel behavior are inextricably related [3].A substantial body of work has been devoted to the study of the relationship between the built environment and travel behavior, with several major findings emerging [4].Mainly, the built environment, operationalized by D variables (e.g., density, diversity, design, destination accessibility, and distance to transit), was found to have a considerable impact on the travel behavior of diverse transportation modes, such as automobiles and car-hailing services [5][6][7][8].
More specifically, previous literature has acknowledged the relationship between the built environment, particularly accessibility, and travel behavior.Built environment patterns influence travel behavior by determining the accessibility of destinations and the types of transportation available to individuals.For instance, after controlling for individual and household variables, Frank et al. [9] discovered that residents of neighborhoods with higher degrees of accessibility to destinations by foot and bike were more inclined to walk or bike for transportation.A similar finding was found by Duranton and Turner [10], who discovered that the amount of traffic congestion is a function of how easily infrastructures can be accessed, with more accessibility leading to increased traffic volumes and congestion.Lavieri et al. [11] explored the relationship between active transportation use and virtual and physical accessibility, controlling for information and communication technology use measures and other relevant factors.In their model, they considered that individual, household, and work factors influence activity-travel choices and that these choices are in turn influenced by both virtual accessibility and physical accessibility.Wang et al. [12] investigated the association between multi-use path accessibility and active travel behavior in Salt Lake City and found that multi-use path accessibility and a favorable clustering impact can influence active travel behavior.Yan [13] concluded that improvements in accessibility can lead to increases in not only TCS but also destination utility.

Machine Learning in Urban Sciences
Methodologically, the use of machine learning (ML) in the age of artificial intelligence (AI) can offer numerous contributions and innovations to the field of urban science.For instance, ML has the potential to increase the precision of predictions and classifications, especially when dealing with large and complex datasets [14,15].This could enable more precise and nuanced analyses of urban phenomena, leading to improved policy decisions and planning results.Second, ML can autonomously identify the most significant features of a dataset, enabling more efficient and effective analysis.This could assist researchers in concentrating their efforts on the most significant factors influencing urban phenomena, resulting in more targeted interventions and policy decisions.Thirdly, ML can recognize intricate patterns and relationships in urban data that may not be readily apparent to human analysts [16].This could contribute to discoveries and insights in the field of urban analytics, expanding our knowledge of how cities function.Fourth, ML can be trained on large datasets and executed rapidly, enabling analyses at larger scales and with greater granularity than conventional techniques.This could enable researchers to analyze urban phenomena on a regional or even national scale, providing fresh perspectives on urban challenges and issues.

Research Gaps
The empirical evidence demonstrates that land use patterns have a major impact on travel behavior and, more importantly, that increasing accessibility can assist in reducing reliance on single-occupancy vehicles while also supporting more sustainable modes of transportation.However, there have been several research gaps that this study aims to fill.First, a limited scope has been made to explore traffic volumes of diverse transportation modes as previous studies have generally used individual or household travel or the use of a certain transportation mode.Second, a few studies have focused on the relationship between the built environment, particularly accessibility, and traffic volumes.Also, there is a research gap when it comes to understanding the impact of accessibility to infrastructures such as schools, markets, and bus stops on traffic volumes.Third, despite the extensive research on the relationship between infrastructure accessibility and traffic volumes, there is a research gap when it comes to comprehending this relationship across various modes of transportation, including automobiles, buses, and trucks, in a particular context.Fourth, a few studies have attempted to explore and understand the complex relationship, such as non-linearity and feature importance, between accessibility and traffic volumes by using the ML approach [14].

Dependent Variable: Traffic Volumes
The dependent variables in this study were traffic volumes measured as annual average daily traffic (AADT) in the number of vehicles per day (see Table 1).The traffic volume used in this study refers to the estimated number of vehicles traveling on the road predicted via an algorithm that estimates the traffic volume of unobserved roads using observed actual traffic volume and navigation data.The traffic data was at the finer level of geographical boundary in South Korea, which is Eup/Myeon/Dong (EMD).The sample size was around 3200.This study categorized traffic volume into three modes: truck, car, and bus, as they differ in size, speed, purpose, emissions, and impact on traffic.Consequently, three different models were generated for each variable in this investigation.
The data were collected from the Korea Transport Institute.The sources offer nationwide representative data and advantages of this research, including generalizability and reduced bias.Nationwide representative data sources provide a more comprehensive picture of the population and have the potential to boost the findings' validity to be generalized to the full population of interest.The use of data sources that are representative of the entire country can help eliminate bias by ensuring that the sample is not systematically biased toward any particular groups or locations of the country.However, the data have limitations.For instance, nationwide analyses may not take into account specific regional or local contexts, which can limit the applicability of the findings to specific regions or populations.Also, nationwide data sources often have limited variables available for analysis, which can make it difficult to fully explore the relationships between different factors.Acknowledging the limitations, this study used the data sets.
where r is areas of residential use permits in km Regarding the descriptive statistics, there was a difference in the traffic volumes that occurred between the three different modes of transportation.More specifically, the traffic volumes for automobiles were the largest, followed by trucks and buses.Figure 1 shows the spatial distribution of traffic volumes of the three modes.It shows a large concentration of traffic volumes in metropolitan regions, such as Seoul and Busan.In addition, the majority of the truck traffic was concentrated along the key highway corridors, such as the Gyeong-bu Line and the Ho-nam Line.
Urban Sci.2023, 7, x FOR PEER REVIEW 5 of 16 limitations.For instance, nationwide analyses may not take into account specific regional or local contexts, which can limit the applicability of the findings to specific regions or populations.Also, nationwide data sources often have limited variables available for analysis, which can make it difficult to fully explore the relationships between different factors.Acknowledging the limitations, this study used the data sets.
Regarding the descriptive statistics, there was a difference in the traffic volumes that occurred between the three different modes of transportation.More specifically, the traffic volumes for automobiles were the largest, followed by trucks and buses.Figure 1 shows the spatial distribution of traffic volumes of the three modes.It shows a large concentration of traffic volumes in metropolitan regions, such as Seoul and Busan.In addition, the majority of the truck traffic was concentrated along the key highway corridors, such as the Gyeong-bu Line and the Ho-nam Line.

Independent Variables
This study used several independent variables categorized into two sections: (1) transportation accessibility, which is the focus of this study, and (2) control factors.First, this study employed diverse transportation accessibility indicators, including travel time to educational infrastructures (i.e., elementary school, middle school, and high school), commercial properties (i.e., mart and market), and public transportation infrastructures (i.e., bus stop and transit station).The models in this study were controlled for several factors that might be able to influence traffic volumes, such as population density, employment density, and land use diversity.
For the multicollinearity test, we developed ordinary least square (OLS) regression models for each truck, automobile, and bus, and estimated Variance Inflation Factor (VIF).The results of OLS models, which have adjusted R-squared of 0.485, 0.669, and 0.382 each for the three models, in Table 2 reveal that none of the independent variables show VIF of more than 10.Therefore, we concluded that the inclusion of all variables shown in Table 2 would not produce bias and issues related to the multicollinearity.Additionally, many of the variables show a significant relationship with traffic volumes of automobiles, buses, and trucks.

Independent Variables
This study used several independent variables categorized into two sections: (1) transportation accessibility, which is the focus of this study, and (2) control factors.First, this study employed diverse transportation accessibility indicators, including travel time to educational infrastructures (i.e., elementary school, middle school, and high school), commercial properties (i.e., mart and market), and public transportation infrastructures (i.e., bus stop and transit station).The models in this study were controlled for several factors that might be able to influence traffic volumes, such as population density, employment density, and land use diversity.
For the multicollinearity test, we developed ordinary least square (OLS) regression models for each truck, automobile, and bus, and estimated Variance Inflation Factor (VIF).The results of OLS models, which have adjusted R-squared of 0.485, 0.669, and 0.382 each for the three models, in Table 2 reveal that none of the independent variables show VIF of more than 10.Therefore, we concluded that the inclusion of all variables shown in Table 2 would not produce bias and issues related to the multicollinearity.Additionally, many of the variables show a significant relationship with traffic volumes of automobiles, buses, and trucks.

Methodological Approach
This study used the methodological approach depicted in Figure 2 to develop XGB models and SHAP values.The process can be categorized into several parts: (1) collect data, (2) split data into train and test sets, (3) train 5 machine learning algorithms with hyperparameter tuning using the grid-search, (4) search for optimal algorithm using the 10-fold cross-validation, (5) employ SHAP method, and (6) interpret SHAP methods, including feature importance, summary plot, dependence plot, and interaction value plot.This study employed several Python packages, such as Sklearn.10-fold cross-validation, (5) employ SHAP method, and (6) interpret SHAP methods, including feature importance, summary plot, dependence plot, and interaction value plot.This study employed several Python packages, such as Sklearn.

Extreme Gradient Boosting Decision Tree Model
This study used XGB.The algorithm is a type of gradient-boosting algorithm that uses decision trees as base learners [19].It works by iteratively adding decision trees to the model, with each tree attempting to correct the errors made by the previous tree [20].

Extreme Gradient Boosting Decision Tree Model
This study used XGB.The algorithm is a type of gradient-boosting algorithm that uses decision trees as base learners [19].It works by iteratively adding decision trees to the model, with each tree attempting to correct the errors made by the previous tree [20].
During training, XGB calculates the gradient and Hessian of the loss function concerning each prediction and then fits a decision tree to these values.The algorithm then adds this new tree to the model and updates the predictions for each sample based on the new tree's output.This process is repeated for a specified number of iterations or until convergence is achieved.One key feature of XGB is its ability to handle missing data and regularization techniques such as L1 and L2 regularization.Additionally, it has built-in support for parallel processing and can handle large datasets efficiently [21].Overall, XGB is a powerful machine-learning algorithm that has been shown to achieve state-of-the-art performance on a wide range of tasks [14].
XGB algorithm in this study is trained using a comprehensive shear strength database of 3278 samples, where 80% and 20% of the data are, respectively used for training and testing.The XGBoost algorithm contributes to the predictive model by achieving approximately 66.4%, 80.9%, and 54.3% validation accuracy for truck, car, and bus models, respectively, which exceeds the model performance of linear regression (LR), decision tree (DT), random forest (RF), and gradient boosting decision tree (GB) models (see Table 3).Therefore, we selected XGB as an optimal algorithm and interpreted it by using the Shapley value (SHAP).The hyper-parameters of the XGB truck model were an alpha of 0.01, a learning rate of 0.1, a max depth of 6, a number of estimators of 250, and a subsample of 0.9.Those of XGB car models were an alpha of 0.1, a learning rate of 0.1, a max depth of 6, a number of estimators of 250, and a subsample of 1.0.Finally, those of the XGB Bus model were an alpha of 0, a learning rate of 0.1, a max depth of 6, a number of estimators of 250, and a subsample of 0.8.In addition to the model specification of the XGB algorithm, the hyper-parameters of DT truck, car, and bus models were a max depth of 6, max features of auto, and a minimum sample leaf of 5. Also, after the grid search, RF truck, car, and bus models had a max depth of 10, minimum samples of 5, and a number of estimators of 250.Lastly, GB had a loss function of ls, a max depth of 6, max features of sqrt, and a number of estimators of 250.
The mathematical fundamentals will be introduced briefly in what follows.The equations are from several previous studies, such as Feng et al. [22] and Chen and Guestrin [23].That is, we declared that we do not have the originality to develop the equations here.Considering we have a database including n samples, say, D{(x 1 , y 1 ), (x 2 , y 2 ), . . . ,(x n , y n )} Urban Sci.2023, 7, 91 8 of 16 where x i , i = 1, 2, . . .; n = input variables; y i = output variable.Thus, the task is to train a model to find the mapping between the inputs and output, say, where ŷi = prediction value φ(•) final strong learner; f k (•) weak learner generated by the decision tree (DT) method; K = number of weak learners; and α k = learning rate used to avoid overfitting.According to XGBoost, a loss function L(•) is defined to represent the error between the prediction ŷi and the real value y i , which has the following form: in which the first right-hand-side term, L(•), denotes realistic training loss between real and predicted values, and the second right-hand-side term, Ω(•), denotes the complexity of the model, which is usually referred to as the regularization term.These two terms, respectively measure how well the model fits the data and the complexity of the model.In general, a squared loss function is adopted for the first term, whereas the second term is expressed by the tree node number and the L2 norm of the leaf score.That is, 4) where T = number of leaf nodes; w k = leaf scores (or weights); and γ and λ = penalty coefficients.Therefore, the problem becomes one of finding the appropriate learner f t at each step t ≤ K to minimize the loss function, as in Here, at the tth step, the objective can be rewritten as where ŷt−1 i = predicted value at last step; and const.represents ∑ t−1 k=1 Ω( f t ), which is indeed a constant.
Considering a second-order Taylor expansion and neglecting the constant term, the objective loss function in Equation ( 6) can be further approximated as where ) are the first and second-order gradients of the loss function.The term L (y i , ŷt−1 i ) is a constant, so it can be removed in the minimization process.Meanwhile, denoting the sample set of leaf j by I j , the objective can be simplified as Urban Sci.2023, 7, 91 where G j = ∑ i=I j g i and H j = ∑ i∈I j h i .With this loss function, the optimized weight w * j for leaf j, as well as the tree structure score L split after splitting, can be derived to build a tree, that is, where G L , G R , H L and H R are the first and second gradients of the left and right children after the split, respectively.

Interpretable Machine Learning: Shapley Value
The interpretability of machine learning projects is becoming an increasingly significant requirement, and as a result, there is a growing need to communicate the complicated outputs of model interpretation methodologies to non-technical stakeholders [24,25].Therefore, this study attempted to not only develop XGB algorithms but also interpret them using SHAP values.SHAP values are built on a concept from cooperative game theory that is used to quantify the contribution of each feature in a machine learning model to the final prediction [26].In the context of machine learning, the SHAP value can be used to explain the output of a model by attributing a portion of the prediction to each input feature [22].That is, it is capable of producing feature attributes for a single instance and allows further understanding of the prediction behavior of the algorithm [27].
Therefore, this study used SHAP in four ways (see Figure 2): (1) calculate feature importance, (2) offer summary plots, (3) present dependence plots, and (4) describe interaction plots.First, it can estimate the feature importance of each feature.Specifically, it calculates the average marginal contribution of each feature across all possible subsets of features [28,29].To calculate the Shapley value for a given feature, we consider all possible subsets of features that include that feature and calculate the difference in model output when that feature is included versus when it is not included.We then take the average of these differences across all possible subsets, weighting each subset by its size relative to the total number of subsets.Second, SHAP can produce several plots for direct interpretation [30].For instance, the summary plot shows the impact of each feature on model output for a single sample, with features ordered by importance.Also, the dependence plot presents how the value of a single feature affects model output by capturing non-linear relationships while accounting for interactions with other features.Furthermore, the interaction value plot shows how pairs of features interact to affect model output, with each point representing a single sample.
The SHAP value, g(x ), can be defined as where x is the vector of simplified input variables that are obtained from the original input variables x in the data set; M is the number of features of the data set; ϕ 0 is a constant when all inputs are null; and ϕ i is the attribution value for each feature i.It is noted that the authors do not have the originality of the equation, rather they are collected from previous studies, including Feng et al. [22] and Lundberg and Lee [27].

Results
This section consists of three subsections: (1) feature importance and its corresponding summary plot, (2) dependence plots to show non-linear relationships, and (3) interaction value plots to reveal the interaction effects of two variables on the dependent variable.The SHAP methodology was used to produce all the results in this section, which were based on the final three XGB algorithms for AADT of trucks, automobiles, and buses.
We selected XGB algorithms for all three models based on the results shown in Table 3. Table 3 indicates that XGB showed a significantly higher prediction accuracy compared to LR, DT, RF, and GB.There was, however, a difference in the model performance of the three XGB variants.This could be because the set of factors explained car traffic better than the other two modes of transportation.Furthermore, while many other characteristics, such as industrial accessibility, can be associated with truck and bus traffic, the model did not account for all of them.As a result, there is a disparity in model performance.However, this may not fully explain the variation.Thus, further study is needed to explore this aspect.

Feature Importance and Summary Plot
There are several methods for enhancing algorithm comprehension using SHAP, and feature importance is one of them.Feature importance allows us to estimate the contribution of each independent variable in percent to the model's prediction.Table 4 shows the results of feature importance and its rank.According to the feature importance plot, the important features for predicting AADT of trucks were Mart (48.27%),Ave Speed (17.49%),Metro (15.85%),Pop Density (11.70%), and Budget (11.26%).A similar pattern was detected for car and bus models.Specifically, Mart, Metro, and Ave Speed showed a significant contribution when determining the AADT of car and bus.Like the spatial concentration of AADT of the transportation modes in Figure 1, Metro showed considerable feature importance in Table 4. Regarding the accessibility indicators, Bus Stop, Transit, and High showed approximately 10% feature importance when explaining the AADT of the bus.Other accessibility indicators exhibited around 5 to 10% of feature importance except for Mart.The summary plot in Figure 3 adds the effects of independent variables to the feature importance in Table 4.Each point on the plot represents SHAP values for a feature and sample.While the y-axis shows each independent variable, the x-axis represents SHAP values.The color denotes the values of the independent variable, which ranges from low (red) to high (blue).Similar to the findings in Table 4, Figure 3 shows that the three most Urban Sci.2023, 7, 91 11 of 16 important features were Mart and Ave Speed for the truck model, Mart and Metro for the car model, and Mart and Ave Speed for the bus model.Moreover, the plots show lower values of Mart (i.e., closer to Mart) were associated with higher AADT for all three transportation modes in South Korea.Interestingly, lower accessibility to elementary school was associated with lower AADT of the transportation modes.Strict transportation regulations, such as the speed limit (30 km/h around the elementary school), may explain the results.The summary plot in Figure 3 adds the effects of independent variables to the feature importance in Table 4.Each point on the plot represents SHAP values for a feature and sample.While the y-axis shows each independent variable, the x-axis represents SHAP values.The color denotes the values of the independent variable, which ranges from low (red) to high (blue).Similar to the findings in Table 4, Figure 3 shows that the three most important features were Mart and Ave Speed for the truck model, Mart and Metro for the car model, and Mart and Ave Speed for the bus model.Moreover, the plots show lower values of Mart (i.e., closer to Mart) were associated with higher AADT for all three transportation modes in South Korea.Interestingly, lower accessibility to elementary school was associated with lower AADT of the transportation modes.Strict transportation regulations, such as the speed limit (30 km/h around the elementary school), may explain the results.

Dependence Plot
Figures 4-6 show selected dependence plots for the important accessibility indicators.The plot gives a graphical description of the marginal effect of an independent variable on AADT, after accounting for the average influences of all other variables used in the XGB model [31].One of the strengths of this method is that it is not constrained by the linearity assumption in the econometrics, rather it can reveal non-linear relationships [32].In the plots in the figures, each sample of the dataset appears as its point, and the point is

Dependence Plot
Figures 4-6 show selected dependence plots for the important accessibility indicators.The plot gives a graphical description of the marginal effect of an independent variable on AADT, after accounting for the average influences of all other variables used in the XGB model [31].One of the strengths of this method is that it is not constrained by the linearity assumption in the econometrics, rather it can reveal non-linear relationships [32].In the plots in the figures, each sample of the dataset appears as its point, and the point is presented as a scatterplot of the value of an independent variable on the x-axis and its corresponding SHAP values on the y-axis.presented as a scatterplot of the value of an independent variable on the x-axis and its corresponding SHAP values on the y-axis.In Figure 4, all else equal, the SHAP values for the accessibility index for Mart were comparable between 0 and 4, but then decreased sharply, meaning that better access to Mart showed a higher AADT of truck for a certain range, while AADT considerably decreased after the range.Also, accessibility to transit was comparable, whereas the plot exhibited a negative approximately linear trend after a certain point.Figure 5 demonstrates dependent plots that show the non-linear association between Mart, High, Transit, and AADT of cars.For accessibility to Mart (Mart), the AADT of cars was high when the travel time to high school was lower.However, AADT substantially decreased after the logtransformed travel time of 3. Also, the effect of accessibility to high school (High) did not    In Figure 4, all else equal, the SHAP values for the accessibility index for Mart were comparable between 0 and 4, but then decreased sharply, meaning that better access to Mart showed a higher AADT of truck for a certain range, while AADT considerably decreased after the range.Also, accessibility to transit was comparable, whereas the plot exhibited a negative approximately linear trend after a certain point.Figure 5 demonstrates dependent plots that show the non-linear association between Mart, High, Transit, and AADT of cars.For accessibility to Mart (Mart), the AADT of cars was high when the travel time to high school was lower.However, AADT substantially decreased after the logtransformed travel time of 3. Also, the effect of accessibility to high school (High) did not In Figure 4, all else equal, the SHAP values for the accessibility index for Mart were comparable between 0 and 4, but then decreased sharply, meaning that better access to Mart showed a higher AADT of truck for a certain range, while AADT considerably decreased after the range.Also, accessibility to transit was comparable, whereas the plot exhibited a negative approximately linear trend after a certain point.Figure 5 demonstrates dependent plots that show the non-linear association between Mart, High, Transit, and AADT of cars.For accessibility to Mart (Mart), the AADT of cars was high when the travel time to high school was lower.However, AADT substantially decreased after the log-transformed travel time of 3. Also, the effect of accessibility to high school (High) did not change, but it reduced by more than 2.5 away from the high school.The effective range of accessibility to transit stations (Transit) on AADT of cars was between 2.5 and 3.5.Figure 6 shows the non-linear relationship between the AADT of bus and Bus Stop, High, and Transit.The AADT of buses was higher the closer they were to the bus stop, which is in line with the findings of earlier research and makes intuitive sense.Complex non-linear relationships were observed in High and Transit.Specifically, the effects of High and Transit were substantially large when the log-transformed travel times to high school and transit station were 1.5 and 2, respectively.

Interaction Value Plot
The interaction value plots in Figures 7-9 visualize interaction effects between two independent variables on the AADT of the truck, car, and bus. Figure 7 demonstrates that controlling for the independent variables, the effect of accessibility to Mart in the metropolitan areas on AADT of trucks (red dots) was substantially larger than that in other areas (blue dots).Also, the interaction effect between Metro and Transit was the second degree in Figure 7, suggesting that while the lower or higher travel time to transit stations in rural areas was associated with higher AADT of trucks, the AADT of trucks reached the peak with medium travel time to the infrastructure in other areas.Figure 8 reveals that in the metropolitan areas, better accessibility to elementary school was associated with substantially lower AADT of cars.However, in other areas with lower car ownership and ridership, the direction of the association was the opposite.The interaction effects in Figure 9 were significant.Specifically, better accessibility to the bus stop or transit station particularly in metropolitan areas was associated with the AADT of buses, but the accessibility in other areas seems to have a nearly linear and positive influence on the AADT of buses.

Interaction Value Plot
The interaction value plots in Figures 7-9 visualize interaction effects between two independent variables on the AADT of the truck, car, and bus. Figure 7 demonstrates tha controlling for the independent variables, the effect of accessibility to Mart in the metro politan areas on AADT of trucks (red dots) was substantially larger than that in other area (blue dots).Also, the interaction effect between Metro and Transit was the second degree in Figure 7, suggesting that while the lower or higher travel time to transit stations in rura areas was associated with higher AADT of trucks, the AADT of trucks reached the peak with medium travel time to the infrastructure in other areas.Figure 8 reveals that in the metropolitan areas, better accessibility to elementary school was associated with substan tially lower AADT of cars.However, in other areas with lower car ownership and rid ership, the direction of the association was the opposite.The interaction effects in Figure 9 were significant.Specifically, better accessibility to the bus stop or transit station partic ularly in metropolitan areas was associated with the AADT of buses, but the accessibility in other areas seems to have a nearly linear and positive influence on the AADT of buses

Interaction Value Plot
The interaction value plots in Figures 7-9 visualize interaction effects between tw independent variables on the AADT of the truck, car, and bus. Figure 7 demonstrates tha controlling for the independent variables, the effect of accessibility to Mart in the metro politan areas on AADT of trucks (red dots) was substantially larger than that in other area (blue dots).Also, the interaction effect between Metro and Transit was the second degre in Figure 7, suggesting that while the lower or higher travel time to transit stations in rura areas was associated with higher AADT of trucks, the AADT of trucks reached the pea with medium travel time to the infrastructure in other areas.Figure 8 reveals that in th metropolitan areas, better accessibility to elementary school was associated with substan tially lower AADT of cars.However, in other areas with lower car ownership and rid ership, the direction of the association was the opposite.The interaction effects in Figur 9 were significant.Specifically, better accessibility to the bus stop or transit station partic ularly in metropolitan areas was associated with the AADT of buses, but the accessibilit in other areas seems to have a nearly linear and positive influence on the AADT of buses

Key Findings
There are several key findings in this study.First, accessibility indicators exhibited around 5 to 10% of feature importance except for Mart (around 50%).Also, the important features for predicting the AADT of trucks were Mart (48.27%),Ave Speed (17.49%),Metro (15.85%),Pop Density (11.70%), and Budget (11.26%).A similar pattern was detected for car and bus models.Second, the dependence plots indicated threshold effects.For instance, the SHAP values for the accessibility index for Mart were comparable between 0 and 4 but then decreased sharply, indicating that better access to Mart showed a higher AADT of trucks for a certain range, while AADT considerably decreased after the range.Third, this study found interaction effects in the interaction plots.For instance, better accessibility to public transportation infrastructures, such as bus stops and transit stations, was associated with higher annual average daily traffic (AADT), particularly in metropolitan areas including Seoul and Busan.

Key Findings
There are several key findings in this study.First, accessibility indicators exhibited around 5 to 10% of feature importance except for Mart (around 50%).Also, the important features for predicting the AADT of trucks were Mart (48.27%),Ave Speed (17.49%),Metro (15.85%),Pop Density (11.70%), and Budget (11.26%).A similar pattern was detected for car and bus models.Second, the dependence plots indicated threshold effects.For instance, the SHAP values for the accessibility index for Mart were comparable between 0 and 4 but then decreased sharply, indicating that better access to Mart showed a higher AADT of trucks for a certain range, while AADT considerably decreased after the range.Third, this study found interaction effects in the interaction plots.For instance, better accessibility to public transportation infrastructures, such as bus stops and transit stations, was associated with higher annual average daily traffic (AADT), particularly in metropolitan areas including Seoul and Busan.

Implication
The results of this study indicate a significant and non-linear relationship between transportation accessibility indicators and traffic volumes.The findings of this study have important implications for transportation planning and policy.Specifically, they suggest that improving accessibility to public transportation infrastructures can help increase traffic volumes of buses, particularly in metropolitan areas, including Seoul and Busan, which can promote sustainable modes of transportation.Additionally, they suggest that increasing accessibility to large-scale marts may have unintended consequences on traffic volumes for both trucks and automobiles and induce congestion, which implies that it should be approached with caution.Interestingly, given that better accessibility to elementary schools was associated with lower AADT for all three transportation modes, strict transportation regulations, as a speed limit of 30 km/h, may be an adequate approach to lower traffic volumes and congestion.

Limitation of this Study and Future Research Direction
This study has several limitations.First, the findings of this study may be limited to the specific area and transportation system examined in this study.The results may not necessarily apply to other areas or transportation systems with different characteristics.Second, the study relies on data from various sources, including traffic volume counts, transportation infrastructure maps, and demographic data.Some data may be incomplete or inaccurate, which could affect the results of the study.Third, the study examines the association between accessibility to infrastructures and traffic volumes but cannot establish causality.Other factors not examined in this study, such as land use patterns and travel behavior, may also influence traffic volumes.Fourth, the study examines traffic volumes over a specific time frame, and the relationship between accessibility to infrastructures and traffic volumes may vary over longer or shorter periods.
This study suggests several future research directions.For instance, further studies are needed to explore how transportation accessibility impacts traffic volumes within diverse regional or local contexts.Also, future study needs to expand diverse sets of accessibility indicators, such as accessibility to employment, and explore the associations.

Conclusions
This study aimed to explore the relationship between accessibility to infrastructures and traffic volumes across different modes of transportation in South Korea using XGB and SHAP approaches.We believe this research is an important step in understanding the complex relationship between accessibility to infrastructures and traffic volumes.By understanding the factors that influence traffic volumes, the study aims to provide important insights into how transportation systems can be developed and managed to meet the needs of users while also addressing social, economic, and environmental challenges [33].The findings of this study have important implications for transportation planning and policy and highlight the need to consider accessibility to infrastructures when developing transportation plans and policies [10].

Figure 2 .
Figure 2. Overall process of the methodological approach used in this study.

Figure 2 .
Figure 2. Overall process of the methodological approach used in this study.

Figure 3 .
Figure 3. Summary plots for feature importance of the truck (Left), car (Middle), and bus (Right) models.

Figure 3 .
Figure 3. Summary plots for feature importance of the truck (Left), car (Middle), and bus (Right) models.

Figure 4 .
Figure 4. Selected dependence plots of the truck model.

Figure 4 .
Figure 4. Selected dependence plots of the truck model.

Figure 4 .
Figure 4. Selected dependence plots of the truck model.

Figure 5 .
Figure 5. Selected dependence plots of the car model.

Figure 6 .
Figure 6.Selected dependence plots of the bus model.

Figure 5 .
Figure 5. Selected dependence plots of the car model.

Figure 4 .
Figure 4. Selected dependence plots of the truck model.

Figure 5 .
Figure 5. Selected dependence plots of the car model.

Figure 6 .
Figure 6.Selected dependence plots of the bus model.

Figure 6 .
Figure 6.Selected dependence plots of the bus model.

Figure 7 .
Figure 7. Selected interaction value plots of the truck model.

Figure 8 .
Figure 8. Selected interaction value plots of the car model.

Figure 7 .
Figure 7. Selected interaction value plots of the truck model.

Figure 7 .
Figure 7. Selected interaction value plots of the truck model.

Figure 8 .
Figure 8. Selected interaction value plots of the car model.

Figure 8 .
Figure 8. Selected interaction value plots of the car model.Urban Sci.2023, 7, x FOR PEER REVIEW 14 of 16

Figure 9 .
Figure 9. Selected interaction value plots of the bus model.

Figure 9 .
Figure 9. Selected interaction value plots of the bus model.

Table 1 .
Description and descriptive statistics of variables used in this study.

Table 2 .
Results of OLS regression models.

Table 3 .
Model performance comparison.

Table 4 .
Results on feature importance of the truck, car, and bus models.