Application of Machine Learning to Child Mode Choice with a Novel Technique to Optimize Hyperparameters

Travel mode choice (TMC) prediction is crucial for transportation planning. Most previous studies have focused on TMC in adults, whereas predicting TMC in children has received less attention. On the other hand, previous children’s TMC prediction studies have generally focused on home-to-school TMC. Hence, LIGHT GRADIENT BOOSTING MACHINE (LGBM), as a robust machine learning method, is applied to predict children’s TMC and detect its determinants since it can present the relative influence of variables on children’s TMC. Nonetheless, the use of machine learning introduces its own challenges. First, these methods and their performance are highly dependent on the choice of “hyperparameters”. To solve this issue, a novel technique, called multi-objective hyperparameter tuning (MOHPT), is proposed to select hyperparameters using a multi-objective metaheuristic optimization framework. The performance of the proposed technique is compared with conventional hyperparameters tuning methods, including random search, grid search, and “Hyperopt”. Second, machine learning methods are black-box tools and hard to interpret. To overcome this deficiency, the most influential parameters on children’s TMC are determined by LGBM, and logistic regression is employed to investigate how these parameters influence children’s TMC. The results suggest that MOHPT outperforms conventional methods in tuning hyperparameters on the basis of prediction accuracy and computational cost. Trip distance, “walkability” and “bikeability” of the origin location, age, and household income are principal determinants of child mode choice. Furthermore, older children, those who live in walkable and bikeable areas, those belonging low-income groups, and short-distance travelers are more likely to travel by sustainable transportation modes.


Introduction
Predicting travel mode choice is essential for transportation planning. However, most previous travel mode choice studies have focused on adults, whereas analyzing TMC in children has received less attention. Possibly, as a result, most transport planning is based on adults' needs, and children are more and more reliant on adults for their transport needs [1]. Children are members of society, and travel mode choice has been found to relate to their overall wellbeing. Children travels can influence parents' travel behavior [2]. Furthermore, children become adults, and their childhood behaviors can impact their adult behaviors. Therefore, it is important to examine children's TMC and how different parameters impact healthier and more sustainable mode choices.
In real-life travel behavior, individuals choose between different transportation modes. Therefore, the features of different transportation modes (e.g., travel time, travel cost, and While ML techniques require the determination of hyperparameters, their determination is typically performed ad hoc. Table 1 provides a summary of recent studies on TMC prediction using machine learning techniques and the methods applied to tune hyperparameters. As can be seen, over 30% of those studies did not tune hyperparameters at all, and, as a result, their models might suffer from over-or underfitting. When hyperparameter tuning is undertaken, it is normally applied by breaking datasets into training, validation, and sometimes even testing datasets. Furthermore, the existence of excessive outliers on validation data can lead to selecting nonoptimal values for hyperparameters. To overcome these deficiencies, the application of k-fold cross-validation is recommended [31]. However, only 42.4% of studies shown in Table 1 employed the k-fold cross-validation process to tune hyperparameters. The application of a robust method to tune hyperparameters is vital to develop an accurate prediction model. As shown in Table 1, trial and error is the most commonly used method in TMC prediction studies. However, the trial-and-error method has two major problems; it is a time-consuming technique and depends on modeler experience [53]. Accordingly, other researchers have applied systematic methods, including grid search, random search, and Hyperopt. Grid search is a brute force method, and it is not computationally efficient. Random search does not guarantee that optimal hyperparameters are found [54]. Moreover, all of these methods (i.e., trial and error, grid search, random search, and Hyperopt) only apply a single performance indicator (e.g., prediction accuracy) to tune hyperparameters. However, in many real-life prediction problems (e.g., TMC), datasets are not balanced. For instance, there tend to be many more car than bicycle trips (e.g., [52]). The promotion of such active modes may be a policy objective, but models using only overall accuracy might not adequately predict low-frequency modes. Thus, rather than simply using accuracy as a single performance indicator, multiple performance indicators (such as accuracy and F1-score together) should be applied to solve the problem of the imbalanced distribution of transportation modes. Hence, developing a new method that can consider multiple performance indicators in hyperparameter tuning is important, but currently overlooked. Table 1. A summary of recent studies on the application of machine learning to travel mode choice prediction.

Reference
Optimizing Hyperparameters

Considering k-Fold
Cross-Validation

Children Mode Choice
Pham et al. [21] (Tenfold) Trial and error × Pineda-Jaramillo and Arbeláez-Arenas [20] Random search × Kashifi et al. [33] × (Tenfold) -× Salas et al. [22] (Fivefold) Hyperopt Chao [25] × × -× As mentioned, the second problem with machine learning techniques is their black-box nature. To address this, many white-box prediction techniques have been developed, such as programming techniques (e.g., soccer league competition [55], water cycle programming [56], coyote optimization programming [32], and marine predator programming [57]) and M5tree [58]. Programming techniques cannot be applied to classification problems. Additionally, M5tree cannot represent the influence of variables on the response variable considering all respondents. In this regard, researchers have begun using ensemble machine learning techniques (e.g., gradient boosting) for TMC prediction problems since these methods can present the relative influence of each input variable on the response variable [20,24,26,52]. Although ensemble techniques can determine the influence of each variable, they can represent the direction of those influences.
Accordingly, after detecting the input variables with the highest relative influence on the response variable, different methods, such as accumulated local effects [59], Shapley additive explanations (SHAP) [60], partial dependence plot (PDP) [61], and local interpretable model agnostic explanations (LIME) [62], can be applied to represent in which the direction (positively, negatively, linearly, quadratically, etc.) of the top input variables impacts the response variable. However, LIME cannot indicate the influence direction of variables for all respondents, and it is a disaggregated technique. Although SHAP, PDP, and ALE can illustrate the influence direction of variables considering all data samples, they cannot represent whether the behavior of different groups is significantly different or not. Hence, multinomial logistic regression [63] is often used to determine the influence direction of variables and detect which groups significantly behave differently.
As can be seen from Table 1, although research has been conducted on adult TMC, child TMC has not received enough attention, and this group has been excluded in most studies. To address this issue, this study developed a model to predict the TMC of children and determine which variables significantly influence child mode choice for all trips (i.e., not only school-related trips) using an ensemble learning approach. Since conventional techniques to tune hyperparameters may not be highly efficient, and they can only optimize a single indicator during the tuning process, a new technique is proposed in this study that can optimize multiple indicators. The proposed technique can be highly effective for imbalanced datasets (e.g., Montreal TMC data), as the F1-score and accuracy can be maximized simultaneously. After detecting the most important variables on children's TMC, multinomial logistic regression is applied to make the results of the black-box prediction technique interpretable. In other words, multinomial logistic regression is used to represent in which direction top-ranked variables can influence child TMC and how these variables can support sustainable transportation.
In the next section, the datasets used in this study is initially explained. Then, the developed technique and the conventional techniques applied for tuning hyperparameters are described. Afterward, the results are presented and discussed.

Methods
The main objectives of this studies are as follows:

•
To develop a new method to tune hyperparameters; • To predict child mode choices accurately; • To determine which variables influence the child travel mode choice.
The flowchart of the methodology is shown in Figure 1. As can be seen, initially, different datasets are merged to develop a comprehensive dataset including many variables. Then, a new method is developed to tune hyperparameters. The developed method is compared with conventional regularisation techniques based on prediction accuracy and running time. The most accurate hyperparameter tuning technique is then used to run the final model. Subsequently, the machine learning technique is run, and the relative influence of variables is determined. Lastly, multiple logistic regression is used to interpret the results of the machine learning technique. final model. Subsequently, the machine learning technique is run, and the relative influence of variables is determined. Lastly, multiple logistic regression is used to interpret the results of the machine learning technique.

Datasets and Variables
To the best of the authors' knowledge, most previous studies used trip details, individual and household characteristics to model TMC. However, in this study, additional variables, such as accessibility, geographic, and land-use variables, are added to the mentioned variables to help explain TMC. To this end, three datasets are taken into account, the 2018 Montreal OD survey, Walk Score, and Montreal proximity measure data.
The Montreal OD survey was conducted in the fall of 2018, and roughly 400,000 trips were recorded for "an average fall" day. From this survey, 14 variables are considered, including age, gender, availability of a monthly transit pass, disability status, interview language, household income, the presence of people in the household with restrictions in movement, number of members in the household, number of cars in the household, trip distance, start time of the trip, reason for trip, region of origin, and region of destination.

Datasets and Variables
To the best of the authors' knowledge, most previous studies used trip details, individual and household characteristics to model TMC. However, in this study, additional variables, such as accessibility, geographic, and land-use variables, are added to the mentioned variables to help explain TMC. To this end, three datasets are taken into account, the 2018 Montreal OD survey, Walk Score, and Montreal proximity measure data.
The Montreal OD survey was conducted in the fall of 2018, and roughly 400,000 trips were recorded for "an average fall" day. From this survey, 14 variables are considered, including age, gender, availability of a monthly transit pass, disability status, interview language, household income, the presence of people in the household with restrictions in movement, number of members in the household, number of cars in the household, trip distance, start time of the trip, reason for trip, region of origin, and region of destination.
From the Walk Score dataset [64], walk score, transit score, and bike score variables were collected. Walk score measures the walkability of a location according to the distance to different amenities, including schools, parks, restaurants, grocery stores, and coffee shops. Transit score represents how well a location is served by public transit. The bike score indicates how a location is good for biking based on the availability of bike lanes, road connectivity, hilliness, and nearby amenities. These indices quantify the quality of walking, transit, and biking trips from 0 (worst) to 100 (excellent).
The built environment data were further enriched by using proximity data for Montreal. Ten variables were added, including accessibility level to primary school, secondary school, childcare facility, park, library, grocery store, health facility, pharmacy, employment source, and public transit. These indices measure the closeness of a dissemination block to the mentioned services using a gravity-based accessibility measure. Dissemination blocks are the smallest geographic area bounded on all sides by streets or boundaries of Statistics Canada's standard geographic areas [65]. For more information about the mentioned accessibility indices, please visit Statistics Canada [66]. In contrast to walk score, which provides an overall score, these values are destination-specific.
Altogether, 27 variables were applied to explain TMC. The attributes of selected variables are shown in Table 2. Since this investigation focused on children's TMC, the trips of individuals aged from 5 to 17 were taken into consideration (5 is the minimum age for trips to be collected on an individual level in Montreal; 18 is considered an adult in Canada). The trips where the origin is home were considered because the first trip's mode restricts the following TMC [52], and built environment data were collected according to the individual's residential location. In the final dataset, the number of relevant trips was 9597. These observations were randomly divided into training (80% of total samples) and testing data (20% of total samples). Six transportation modes were used for the mentioned trips: school bus (18.6%), car as a passenger (33.6%), bus (10.9%), rail transit (6.7%), cycling (2.4%), and walking (27.8%). The share of these modes in the dataset was 18.6%, 33.6%, 10.9%, 6.7%, 2.4%, and 27.8%, respectively. Hence, the share of transportation modes was imbalanced, and it was more appropriate to develop a model that can maximize F1-score, as well as accuracy, in the hyperparameter tuning process.

Modeling
For modeling TMC, an ensemble learning approach was applied for two reasons. First, the results of recent studies showed that ensemble prediction techniques generally outperform other modeling techniques, such as naïve Bayes, logistic regression, k-nearest neighbor, support vector machine, artificial neural network, nested logit, and multinomial logit, in explaining TMC in terms of prediction accuracy [11,30,50,52]. Second, ensemble techniques can prioritize variables on the basis of their relative influence on the response variable [67].
In this study, light gradient boosting machine (LGBM), a powerful and fast ensemble technique, was employed for the prediction process.
LGBM is an updated version of tree-based gradient boosting developed by Microsoft. Like other ensemble techniques, LGBM combines different weak learners (i.e., decision trees) to form a powerful and robust prediction algorithm [68].
LGBM is a quick method, and it is highly efficient for large-scale prediction problems. Parallel learning is supported by LGBM, and, as a result, memory usage is significantly reduced. A leaf-wise leaf growth strategy is used in LGBM modeling that can limit the depth growth in the splitting process. The mentioned leaf-wise leaf growth strategy splits the same layer of leaves simultaneously. Therefore, LGBM can implement multithreaded optimization. To this end, the complexity of the model is controlled automatically, and the probability of overfitting is considerably reduced [69].

Tuning Hyperparameters
A new technique is proposed to optimize hyperparameters of machine learning techniques considering multiple performance indicators. That is, a new multi-objective hyperparameter tuning (MOHT) approach is developed in this study. In this regard, non-dominated sorting genetic algorithm III (NSGA-III), a multi-objective metaheuristic algorithm, was used as the optimization tool. Genetic algorithms have been widely used to optimize several engineering problems [70][71][72]. NSGA-III was used for the optimization process since it is a multi-objective metaheuristic optimization technique, and metaheuristic techniques can sync with machine learning techniques [73]. In this technique, the hyperparameter values are optimized using an optimization framework. In each iteration of the optimization process, NSGA-III assigns different values to hyperparameters. Then, the machine learning technique is run to evaluate the performance indicators (i.e., accuracy and F1-score) for each of the assigned hyperparameters. Then, the model tries to improve the performance indicators by optimizing the hyperparameters. The optimization modeling of the proposed method is presented in Equations (1)- (5).
where Z 1 and Z 2 are the objective functions of the proposed optimization model. Accuracy K−CV and F K−CV 1 imply the accuracy and F1-score of validation data calculated using the k-fold cross-validation technique. In this study, a fivefold cross-validation was used for the tuning process (K = 5). HP Int i and HP Con j denote integer and continuous-ranged hyperparameters.
Set i is the defined set of integer hyperparameter i. HP min j and HP max j are the minimum and maximum defined values for continuous-ranged hyperparameters j. I and J represent the number of integer and continuous-ranged hyperparameters, respectively.
Equations (1) and (2) are the objective function of the proposed technique. That is, in the hyperparameter tuning process, accuracy and F1-score are maximized simultaneously. Equation (3) guarantees that the optimal value of integer hyperparameters is selected from their defined set. Equations (4) and (5) are the constraints that force the model to select the optimal value of continuous-ranged hyperparameters from their allowed range.
As mentioned, NSGA-III was employed to solve the multi-objective optimization problem. NSGA-III is a metaheuristic optimization algorithm that is used for solving multi-objective optimization problems. This algorithm aims to find non-dominated sorting optimal solutions integrating all objective functions rather than converting all objective functions into a single objective function. As a result, NSGA-III presents a Pareto front in which the optimal solutions cannot dominate each other on the basis of all objective functions [74]. The pseudo-code of MOHT is shown in Figure 2.  Equations (1) and (2) are the objective function of the proposed technique. That is, in the hyperparameter tuning process, accuracy and F1-score are maximized simultaneously. Equation (3) guarantees that the optimal value of integer hyperparameters is selected from their defined set. Equations (4) and (5) are the constraints that force the model to select the optimal value of continuous-ranged hyperparameters from their allowed range.
As mentioned, NSGA-III was employed to solve the multi-objective optimization problem. NSGA-III is a metaheuristic optimization algorithm that is used for solving multi-objective optimization problems. This algorithm aims to find non-dominated sorting optimal solutions integrating all objective functions rather than converting all objective functions into a single objective function. As a result, NSGA-III presents a Pareto front in which the optimal solutions cannot dominate each other on the basis of all objective functions [74]. The pseudo-code of MOHT is shown in Figure 2. Three conventional hyperparameter tuning techniques, namely, grid search, random search, and Hyperopt, were used to evaluate the effectiveness of the proposed hyperparameter tuning approach (i.e., MOHT). Grid search checks all the possible combinations of hyperparameters to find their optimal values. That is, a possible set for each hyperparameter should be defined. Then, all the possible combinations of hyperparameters in the possible set are used to run the model. Lastly, the combination that leads to the highest accuracy is considered the optimal value of hyperparameters. Random search only checks Three conventional hyperparameter tuning techniques, namely, grid search, random search, and Hyperopt, were used to evaluate the effectiveness of the proposed hyperparameter tuning approach (i.e., MOHT). Grid search checks all the possible combinations of hyperparameters to find their optimal values. That is, a possible set for each hyperparameter should be defined. Then, all the possible combinations of hyperparameters in the possible set are used to run the model. Lastly, the combination that leads to the highest accuracy is considered the optimal value of hyperparameters. Random search only checks some random possible combinations of hyperparameters and tunes hyperparameters on the basis of a limited number of random combinations. Hyperopt is an efficient hyperparameter tuning method that applies parallel and serial optimization to efficiently optimize hyperparameters [75].
Grid search is a brute force technique, and it only assigns a limited number of possible hyperparameter values to the hyperparameters' initial set. That is, for hyperparameters with a continuous range, only a few values can be checked, and the optimal value of hyperparameters may not be found. However, grid search is an exact algorithm, and its optimal solution is not changed in different runs. Random search may not find the optimal values for hyperparameters because it assigns random values to hyperparameters. Nonetheless, random search is a quick technique, and it is computationally efficient when the number of hyperparameters is significant. Hyperopt is computationally more efficient than grid search, while its running time is generally higher than random search. Furthermore, all these techniques apply a single performance indicator (e.g., accuracy) to tune hyperparameters. To address this issue, this study developed the MOHT.
The defined set for hyperparameters is presented in Table 3. As can be seen, grid search cannot cover the entire range, and a set with a few possible options should be considered in this technique since it is a comprehensive search method.

Results Interpretation
Although LGBM can rank variables on the basis of their relative influence on the response variable (i.e., children's TMC), it cannot interpret how each variable (e.g., trip distance) impacts the children's TMC. To solve this issue, after detecting the variables with the highest relative influence on children's TMC, multinomial logistic regression was applied to determine how these top-ranked variables influence the children's TMC. Since multinomial logistic regression cannot converge when the number of variables is significant, the top variables on child TMC were detected using the relative influence presented by LGBM. Then, those top variables were applied for modeling multinomial logistic regression.
Multinomial logistic regression is a robust statistical modeling technique that can be used for classification and interpretation. A set of explanatory variables are used in multinomial logistic regression for assessing the probability of dichotomous outcome events. Dichotomous variables mainly represent whether some events occur or not. In this technique, it is assumed that the relation between the explanatory variables is linear. Therefore, multinomial logistic regression uses linear decision boundaries, but it is a nonlinear technique [67]. From a sustainable transport perspective, the car as a passenger is considered the reference in the multinomial logistic regression to determine how top variables can attract children to use more sustainable transportation modes.

Results and Discussion
In this section, the results of hyperparameter tuning techniques are initially presented, and the best technique is determined. Then, the ranking of variables based on their relative influence on children' TMC is presented using the most accurate hyperparameter technique and LGBM. Ultimately, the results of a multinomial logistic regression model are presented.

The Performance of Hyperparameter Tuning Techniques
The optimal values of hyperparameters using different techniques are shown in Table 4. Although MOHT is a multi-objective algorithm and generally provides the users with multiple non-dominated optimal solutions (i.e., a Pareto front), it presents a single optimal solution for the applied case study. If MOHT presents over one optimal solution, it is recommended to apply gray relational analysis to find the best optimal solution according to the details provided by Naseri et al. [76]. The testing data accuracy and testing data F1-score of different hyperparameter tuning techniques are shown in Figures 3 and 4. As can be seen, the proposed technique in this study (MOHT) obtained the highest testing data accuracy, followed by grid search, Hyperopt, and random search. That is, applying MOHT could increase the prediction accuracy by 1.25%, 2.81%, and 3.59%, respectively, compared to grid search, Hyperopt, and random search. Similarly, MOHT outperformed other techniques in terms of testing data F1-score. The testing data F1-score of MOHT was 1.74%, 3.61%, and 4.89% greater than that of grid search, Hyperopt, and random search, respectively. Therefore, the testing data F1-score improvement of MOHT was more than its prediction accuracy, which is related to considering both accuracy and F1-score in the objective function of MOHT. Hence, it can be postulated that considering multiple performance indicators in the tuning hyperparameter techniques can improve the overall performance of the model. However, techniques including a single performance indicator can only improve the prediction accuracy and not all vital performance indicators.

The Performance of Hyperparameter Tuning Techniques
The optimal values of hyperparameters using different techniques are shown in Table 4. Although MOHT is a multi-objective algorithm and generally provides the users with multiple non-dominated optimal solutions (i.e., a Pareto front), it presents a single optimal solution for the applied case study. If MOHT presents over one optimal solution, it is recommended to apply gray relational analysis to find the best optimal solution according to the details provided by Naseri et al. [76]. The testing data accuracy and testing data F1-score of different hyperparameter tuning techniques are shown in Figures 3 and 4. As can be seen, the proposed technique in this study (MOHT) obtained the highest testing data accuracy, followed by grid search, Hyperopt, and random search. That is, applying MOHT could increase the prediction accuracy by 1.25%, 2.81%, and 3.59%, respectively, compared to grid search, Hyperopt, and random search. Similarly, MOHT outperformed other techniques in terms of testing data F1-score. The testing data F1-score of MOHT was 1.74%, 3.61%, and 4.89% greater than that of grid search, Hyperopt, and random search, respectively. Therefore, the testing data F1-score improvement of MOHT was more than its prediction accuracy, which is related to considering both accuracy and F1-score in the objective function of MOHT. Hence, it can be postulated that considering multiple performance indicators in the tuning hyperparameter techniques can improve the overall performance of the model. However, techniques including a single performance indicator can only improve the prediction accuracy and not all vital performance indicators.   The receiver operating characteristic (ROC) curves of different hyperparameter tuning techniques are indicated in Figure 5. Drawing on the results, the highest area under the curve (AUC) of the ROC curves was related to MOHT, followed by grid search, Hyperopt, and random search, with values of 0.81, 0.80, 0.79, and 0.78, respectively. Accordingly, MOHT was the best technique. MOHT obtained the highest AUC of the ROC curve for the least frequent mode (cycling) with a value of 0.62, which was 2%, 4%, and 6% more than that of grid search, Hyperopt, and random search. This improvement resulted from considering prediction accuracy and F1-score in MOHT, proving that MOHT was highly efficient for modeling this imbalanced TMC dataset. Therefore, considering an optimization framework to tune hyperparameters can even improve the performance indicators not considered in the objective function of the optimization model.  The receiver operating characteristic (ROC) curves of different hyperparameter tuning techniques are indicated in Figure 5. Drawing on the results, the highest area under the curve (AUC) of the ROC curves was related to MOHT, followed by grid search, Hyperopt, and random search, with values of 0.81, 0.80, 0.79, and 0.78, respectively. Accordingly, MOHT was the best technique. MOHT obtained the highest AUC of the ROC curve for the least frequent mode (cycling) with a value of 0.62, which was 2%, 4%, and 6% more than that of grid search, Hyperopt, and random search. This improvement resulted from considering prediction accuracy and F1-score in MOHT, proving that MOHT was highly efficient for modeling this imbalanced TMC dataset. Therefore, considering an optimization framework to tune hyperparameters can even improve the performance indicators not considered in the objective function of the optimization model.
The running time of the hyperparameter tuning techniques is presented in Figure 6. MOHT could reduce the computational time by 68% and 71% compared to Hyperopt and grid search, indicating that MOHT was a highly efficient technique regarding the computational cost. However, the MOHT running time was 2.5 times more than that of the random search. As mentioned, random search checks a limited number of random combinations; hence, it was the fastest technique. On the other hand, random search is less likely to find optimal values of hyperparameters, and the testing data accuracy and F1-score obtained by the random search were significantly lower compared to MOHT. Therefore, it can be postulated that MOHT outperformed other techniques when considering the testing data accuracy, testing data F1-score, and running time.
Liashchynskyi and Liashchynskyi [54] compared the performance of grid search and random search regarding the prediction accuracy and running time. The results suggested that, although random search was a faster technique, grid search could obtain higher prediction accuracy. Hence, their results are in line with the findings of this study.
ingly, MOHT was the best technique. MOHT obtained the highest AUC of the ROC curve for the least frequent mode (cycling) with a value of 0.62, which was 2%, 4%, and 6% more than that of grid search, Hyperopt, and random search. This improvement resulted from considering prediction accuracy and F1-score in MOHT, proving that MOHT was highly efficient for modeling this imbalanced TMC dataset. Therefore, considering an optimization framework to tune hyperparameters can even improve the performance indicators not considered in the objective function of the optimization model. The running time of the hyperparameter tuning techniques is presented in Figure 6. MOHT could reduce the computational time by 68% and 71% compared to Hyperopt and grid search, indicating that MOHT was a highly efficient technique regarding the computational cost. However, the MOHT running time was 2.5 times more than that of the random search. As mentioned, random search checks a limited number of random combinations; hence, it was the fastest technique. On the other hand, random search is less likely to find optimal values of hyperparameters, and the testing data accuracy and F1-score obtained by the random search were significantly lower compared to MOHT. Therefore, it can be postulated that MOHT outperformed other techniques when considering the testing data accuracy, testing data F1-score, and running time. Liashchynskyi and Liashchynskyi [54] compared the performance of grid search and random search regarding the prediction accuracy and running time. The results suggested that, although random search was a faster technique, grid search could obtain higher prediction accuracy. Hence, their results are in line with the findings of this study.

The Relative Influence of Variables on Children's TMC
Since MOHT led to the highest prediction accuracy, the LGBM was performed using the optimal hyperparameter values found by MOHT. Then, the relative influence of variables was extracted to determine which variables impact the children's TMC the most. The relative influence is illustrated in Figure 7. As can be seen, trip distance had by far the highest impact on children's TMC, with a relative influence of 15.5%. Walk score, age, bike score, household income, and accessibility to secondary school were the next best variables. The relative influence of other variables was less than 5%. Among accessibility parameters, accessibility to secondary school, accessibility to libraries, and accessibility to grocery stores had the greatest influence on children's TMC. Wang and Ross [77] investigated the relative influence of different variables on adults' TMC, and the results suggested that the relative influence of trip distance was significantly higher than the number of vehicles per capita, population density, and the number of people in the household. Accordingly, their results are in line with the results of the current study. In the Kim [11] study, age had a considerably higher influence than gender in terms of relative influence on TMC, which is consistent with the results shown in Table 5.  Wang and Ross [77] investigated the relative influence of different variables on adults' TMC, and the results suggested that the relative influence of trip distance was significantly higher than the number of vehicles per capita, population density, and the number of people in the household. Accordingly, their results are in line with the results of the current study. In the Kim [11] study, age had a considerably higher influence than gender in terms of relative influence on TMC, which is consistent with the results shown in Table 5.

Analyzing the Influence Direction of Top Variables
Although LGBM can rank the variables on the basis of their relative influence on the response variable, it cannot determine how changing variables affect the response variable. In this regard, multinomial logistic regression was performed to examine the direction influence of top-ranked variables, and the results are shown in Table 5. In the mentioned analysis, the car as a passenger was considered the reference. According to the results, most of the variables were statistically significant in terms of impact on children's TMC, which may be related to considering the top-ranked variables of LGBM in the multinomial logistic regression.
Children are more likely in Montreal to travel by public transit (i.e., rail transit and bus) than by car as a passenger. Nonetheless, they are less likely to travel by school bus or active transportation (i.e., cycling and walking) then by car as a passenger. By reducing the trip distance, children are more likely to walk or cycle to their destination, while the probability of traveling by public transit or school bus is reduced. In regions with a lower walk score, children are more likely to travel by school bus, and public and active transportation are less used than cars as a passenger. Children aged under 12 years are more likely to travel by car as a passenger. For those aged over 15, rail transit was their preference, followed by cycling, bus, and walking. However, they were not likely to prefer the school bus over a car as a passenger. A reduction in bike score led to a reduction in the probability of choosing bus, rail transit, cycling, and walking over a car as a passenger. Furthermore, a statistically significant difference was not found between the car as a passenger and the school bus if bike score was changed.
As compared to high-income households, children in the low-income group (<60 thousand CAD annually) preferred the school bus, bus, rail transit, cycling, and walking over a car as a passenger. The middle-income group (60-120 thousand CAD annually) was more likely to travel by school bus as compared to the high-income group, but a significant difference could not be seen between a car as a passenger and other transportation modes. A reduction in the accessibility to secondary school resulted in an increment in the intention to choose school bus, bus, and walking over a car as a passenger. On the other hand, reducing the accessibility to secondary school decreased the probability of choosing rail transit and cycling over a car as a passenger.

Managerial Implications
Individuals over 15, those who live in regions with higher walk score, bike score, and accessibility to secondary schools, the low-income group, and short-distance travelers are more likely to travel by active transportation. Moreover, older children (aged over 15), long-distance travelers, residents of regions with higher walk score and bike score, and the low-income group generally use public transit more than a car as a passenger. children aged 12 to 15, residents of regions with the lowest level of walk score, bike score, and accessibility to secondary schools, long-distance travelers, and low-and middle-income groups are more likely to travel by school bus than by car as a passenger.
Therefore, improving the walk score can increase the share of active and public transportation in child trips. Similarly, the bike score needs to be increased if the goal is to promote active transportation in children. Accessibility to schools should be improved if the governments tend to attract children to travel by active transport.
One of the limitations of this study is that it only applied the NSGA-III algorithm to develop a multi-objective hyperparameter tuning technique. It is recommended to consider other multi-objective optimization algorithms to develop new hyperparameter tuning techniques and compare their accuracy with the method proposed in this study.

Conclusions
In this study, the travel mode choice of children aged 5 to 17 was investigated using a robust ensemble learning technique, LGBM. To maximize the model's performance, a new multi-objective approach (MOHT) was proposed to tune machine learning techniques hyperparameters. The performance of the proposed technique was compared with the conventional tuning methods. MOHT was demonstrated to be an appropriate technique for tuning hyperparameters of imbalanced datasets (such as travel mode choice) since it can consider multiple machine learning performance indicators in the tuning process. MOHT outperformed other hyperparameter tuning techniques in terms of machine learning performance indicators (e.g., prediction accuracy, F1-score, and AUC). Moreover, this technique could significantly reduce the computational cost compared to grid search and Hyperopt. However, the running time of MOHT was considerably higher than the random search, but it could present more accurate solutions.
The independent variables were ranked on the basis of their relative influence on children's TMC, and trip distance, walk score, age, bike score, household income, and accessibility to secondary schools were the top-ranked variables. Since LGBM could not represent how these top-ranked variables influence children's TMC, multiple logistic regression was applied to better understand the influence of these variables on children's TMC. With reference to trips by car, the results suggested that, as trip distance decreases, active modes are more likely. The built environment, as measured by walk score, was positively associated with all sustainable and independent modes as was bike score to a lesser degree. As age increased, children used more sustainable and independent modes. Finally, the highest household income was associated with more car as passenger trips, but the relationship with active modes was less strong. The results suggest that policies for a mixed-use development with high-quality public transport networks, such as Singapore's 20 min towns and 45 min city [78], can facilitate both local travel and the use of public transport by children.