Linking Mode Choice with Travel Behavior by Using Logit Model Based on Utility Function

Abstract: Currently available transport modeling tools are used to evaluate the effects of behavior change. The aim of this study is to analyze the interaction between the transport mode choice and the travel behavior of an individual and, more specifically, to identify which variables have the greatest effect on mode choice. This is realized by using a multinomial logit model (MNL) and a nested logit model (NL) based on a utility function. The utility function contains activity characteristics, trip characteristics (travel cost, travel time, and the distance between activity places), and individual characteristics to calculate the maximum utility of the mode choice. The variables in the proposed model are tested on real observations from Budapest, Hungary as a case study. The results show that the “Trip distance” variable was the most significant, followed by “Travel time” and “Activity purpose”. These parameters should therefore be given priority when elaborating urban traffic models and travel plans. The advantage of the proposed logit models and utility function is the ability to identify the relationship between the travel behavior of an individual and the mode choice. With the results, it is possible to estimate the influence of the various variables on mode choice and identify the best mode based on the utility function.


Introduction
Transport modeling is used to evaluate effects of behavior changes and to determine the impacts of infrastructure upgrades. The available tools are becoming complex, and a growing number of parameters, aspects, and stakeholders have to be considered [1]. The traditional modeling tools have been extended with activity-based models, which serve the analysis of traffic impacts and travel behavior, where a main issue is the identification of the most relevant modeling parameters [2].
The models based on daily activity and mode choice processes have dominated the transportation research community. These models predict behavior including information about daily activities and mode choice. The major advantages of using the activity-based models and mode choice behavior are an explicit analysis of complex travel behavior and a better understanding of travelers' responses to transportation policies.
Although the theoretical underpinnings of these models differ, they commonly have the assumption that the individuals will have within their choice sets the alternative they prefer. They sometimes find themselves subject to a set of constraints [3,4]. However, in most of these models, the choice sets are typically assumed to be given or derived according to some rules based on the importance of the very choice to the user. The delineation of choice sets is particularly important in activity-travel modeling, which receives increasing attention in the activity-based travel demand modeling [5,6].
In various statistical or econometric utility-based models, it is assumed that users faced with a set of alternatives choose the one with maximum utility. The choices are calculated with a function that maximizes the overall utility of a daily activity pattern within derived choice sets [7].
In this context, the mode choice set refers to the set of all available discrete alternatives known by the individual. This means that the traveler considers different elements such as contextual factors, choice alternatives, and subjective values before activity-travel plans are executed. The most widespread tool is multinomial logit modeling, which has a well-defined mathematical structure and easily interpretable results, and where one focus of the analysis is the definition of trip characteristics and socio-demographic variables [8]. It is often stated that the models have to include the connection between attitudes and behavior, and that it has to be explored how individual parameters affect mode choice [9].
Therefore, this paper aims to examine the relationships among transport mode choice, individual and household characteristics, and the daily activities of travelers by using a multinomial logit model and a nested logit model; it then identifies which of these variables have the greatest effect on mode choice. In addition, the determination of the utility function, formulated from individual and household characteristics and daily activity behavior, is treated as a utility maximization problem to define the most suitable mode choice. In this paper, the activity-based modeling parameters are examined with respect to their effect on mode choice using a novel combination of logit models. The results will be useful both for transport modelers and for decision makers who would like to examine the factors of travel behavior change.
The paper is organized as follows: Section 2 offers a theoretical background, Section 3 presents the model formulation, Section 4 covers the study area and the interview technique as well as the process of data collection and how the model is estimated. It also includes a brief analysis of socio-economic characteristics in the study area. Section 5 is on model specification with model estimation, results, and utility function analysis, and Section 6 concludes the paper by highlighting some of the research results.

Theoretical Background
A primary focus of modeling theory is the utility-based modeling approach based on Joh et al. [10]. Several concepts of activity-based models have been suggested in the research, including constraint-based models, (nested logit) utility-maximizing models, advanced statistical models, and models based on individual behavior. Similarly, Bhat and Pendyala [11] discussed adopting diverse methodologies, including discrete choice theory, utility theory, latent class modeling, rule-based modeling, and micro-simulation approaches, to mode choice and activity behavior. In their work, they analyzed interactions between intra-household modeling and mode choice in the context of daily activity behavior. Based on their findings, there are parameters that have important implications for modeling travel behavior. Therefore, it is relevant to explore which parameters have the main effects on travel behavior.
Several models contain elements of both mode choice and preferences. For example, Timmermans et al. [12] introduced a simple model. They state that the presence of children of various ages in the household, the work status of the household, and the age and car availability of the spouses affect the time duration of activities and the travel behavior in the household. Multinomial logit models including these variables were used to predict the time duration dedicated to a set of activities. Statistical modeling approaches are also used to study the effect of household socio-demographic statistics on mode choice behavior in daily activity, as can be seen in the studies of Wallace et al. [13] and Yang et al. [14]. In this process, they proposed different regression models to understand the results. Discrete choice models are often applied to deal with concepts of this nature. The authors discussed the potential strengths and weaknesses of the modeling approaches, where the main issues are the statistical significance and the usability for decision-making.
Dissanayake and Morikawa [15] proposed various discrete choice (multinomial and nested logit) models to investigate the role of household structure and travel characteristics to undertake the daily activity at various degrees of complexity. They found that these models are appropriate mainly because of the goodness of model fitting. This means that using logit models will result in meaningful interpretations; thus, we also used them in this study.
Considering the connection of the elements, Ye et al. [16] dealt with trip chaining and mode choice: they identified individual trip chains and analyzed the effects of mode choice. Comparing this approach with other solutions, they found that these steps are the most appropriate and provide the best goodness-of-fit values. Moreover, Islam and Habib [17] investigated the relationship between mode choice and the complexity of daily activities. In this process, they studied the hierarchical relationship between activity chain and mode choice. They found that mode choice decisions remain consistent during the weekdays. As a main result, it was stated that several socio-economic characteristics influence the choice; thus, a deeper analysis of these factors is required in this study.
To provide a case study, Ashalatha et al. [18] analyzed a revealed preference study of mode choice in India by using a multinomial logit model. The analysis indicated that the preference for using a car increases with age, while the preference for using a two-wheeler decreases, in comparison to the use of the bus. Furthermore, the results showed that an increase in time and cost makes travelers change from public transport to private modes. This paper also justifies the applicability of the model for mode choice analysis.
Several research papers have shown that, although the topic of mode choice has been of high significance on the research agenda for many years, most models of transport demand are still based on individual travel behavior to analyze transport mode choice. Therefore, the aim of this paper is to deal systematically with a model of mode choice processes in daily activity by using a multinomial logit model (MNL) and a nested logit model (NL). Hence, it investigates mode choice based on a utility function with a special focus on individual and household characteristics as well as daily activity and trip characteristics.

Analysis Framework
The basic assumption is that mode choice depends on travel behavior and environmental situation. The traveler has opinions (knowledge) of the environment and preferences and basic needs. These opinions and preferences lead to plans, programs, and schedules. The traveler processes those plans, programs, and schedules in temporal and spatial ways [19].
This research aims to explore mode choice behavior in Budapest. The model is based on an activity-travel survey collected from household information. The proposed methodology can be divided into three steps. The first step involves the identification of the mode (walk, private transport, and public transport), the calculation of the trip characteristics (time, cost, and distance), the determination of the number of trips per mode, and the identification of the activity purpose and of the individual and household characteristics. The second step is to analyze the mode choice based on the individual, household, and travel characteristics by using MNL and NL. The third step is to investigate the mode choice behavior based on the utility function.

Basic Construction of Utility Theory
Utility is an indicator of the value of a mode choice to an individual. Generally, it can be derived from the attributes of the alternatives, and the utility maximization rule states that an individual will select, out of the set of available alternatives, the alternative that maximizes his or her utility. The utility function (U) has the property that an alternative is chosen if its utility is greater than the utility of all other alternatives in the individual's choice set [20]. That is, alternative (i) is chosen among a set of alternatives (j) if and only if the utility of alternative (i) is greater than or equal to the utility of all alternatives (j) in the choice set (C). The utility of a mode choice can be represented as the sum of an observed utility and an error part, as in Equation (1) [7]:
$$U_{it} = V_{it} + \varepsilon_{it} \qquad (1)$$
where:
$U_{it}$ is the utility function of the alternative (i) to the mode choice (t) ($U_{it}$ is equivalent to $U(X_i, S_t)$ but provides a simpler notation),
$V_{it}$ is the deterministic or observable portion of the utility estimated by the analyst, and
$\varepsilon_{it}$ is the error, or the portion of the utility unknown to the analyst.
In the case of mode choice modeling, the objective is to analyze how the traveler selects a mode from the available alternatives so that the utility is maximized.
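The choice rule behind Equation (1) can be illustrated with a short simulation. This is only a sketch: the systematic utilities are hypothetical numbers rather than values from the Budapest data, and the standard Gumbel distribution for the error term is an assumption (it is the error distribution that leads to logit models).

```python
import math
import random

def gumbel_draw(rng):
    """Draw one standard Gumbel error term (assumed error distribution)."""
    u = max(rng.random(), 1e-12)  # guard against log(0)
    return -math.log(-math.log(u))

def choose_mode(systematic_utility, rng):
    """Return the alternative with the largest total utility U = V + eps."""
    total = {mode: v + gumbel_draw(rng) for mode, v in systematic_utility.items()}
    return max(total, key=total.get)

# Hypothetical systematic utilities V for one traveler (illustrative only)
V = {"walk": -1.2, "car": 0.4, "pt": 0.1}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
choices = [choose_mode(V, rng) for _ in range(1000)]
shares = {mode: choices.count(mode) / len(choices) for mode in V}
```

Because the error term differs from draw to draw, the alternative with the highest systematic utility is chosen most often but not always, which is why the model yields choice probabilities rather than a single deterministic choice.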

Utility Associated with Alternatives Chosen by Travelers
The utility components of the alternatives cover the mode chosen by the traveler (walking, travel by car, or travel by public transport) and the variables associated with the alternatives, which describe individual characteristics in addition to trip characteristics. These variables affect the utility of each choice made by travelers [21,22].
The general utility function is given in Equation (2):
$$V(S_i) = \sum_{k} \beta_k S_{ik} \qquad (2)$$
where:
$V(S_i)$ is the value of the utility function of mode choice (i) by the traveler,
$S_{ik}$ is the value of variable (k) for the mode choice (i) by the traveler, where the modes include walk, car, and public transport (PT), and
$\beta_k$ is the coefficient of the independent variable (k) associated with the alternatives; these variables describe individual characteristics in addition to trip characteristics and include travel time, travel cost, trip distance, activity purpose, individual income, and transfer number.
The modes included in this research are walk (W), car (C), and public transport (PT). The utility function of each alternative is given by Equations (3), (4), and (5), respectively, where:
$V(S_i)$ is the value of the utility function of mode choice (i) by the traveler,
$S_{ik}$ is the mode choice (i) by the traveler, which includes walk, car, and PT,
$A_{WP}$ is the weight case of activity purpose (i),
$IN_{WI}$ is the weight case of monthly income (i), and
$\beta_k$ is the coefficient of the independent parameter that defines the alternatives of mode choice (i) chosen by the traveler.
These equations define a linear utility function of mode choice. It is used to estimate the utility value of each alternative, which depends on the values of travel cost, travel time, trip distance, and the other variables associated with the alternatives.
The traveler's choice is then the alternative with the highest utility value rather than an alternative with a lower value.
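As a minimal sketch of how a linear utility function of this kind is evaluated, the following computes the systematic utility of each mode and picks the largest. The coefficients, the variables assigned to each mode, and the trip attributes are all hypothetical placeholders; they are not the estimates or the specification reported in this paper.

```python
def linear_utility(coefficients, attributes):
    """Systematic utility V = sum_k beta_k * x_k for one alternative."""
    return sum(coefficients[name] * attributes[name] for name in coefficients)

# Hypothetical coefficients (beta_k); NOT the estimates reported in the paper
beta = {
    "walk": {"travel_time": -0.08, "trip_distance": -0.50},
    "car": {"travel_time": -0.03, "travel_cost": -0.10, "trip_distance": 0.05},
    "pt": {"travel_time": -0.04, "travel_cost": -0.15, "transfers": -0.30},
}
# Hypothetical trip attributes per mode (minutes, cost units, km, count)
trip = {
    "walk": {"travel_time": 45.0, "trip_distance": 3.5},
    "car": {"travel_time": 12.0, "travel_cost": 4.0, "trip_distance": 3.5},
    "pt": {"travel_time": 20.0, "travel_cost": 1.5, "transfers": 1.0},
}

V = {mode: linear_utility(beta[mode], trip[mode]) for mode in beta}
best_mode = max(V, key=V.get)  # alternative with the highest utility
```

Keeping the coefficients in a per-mode dictionary makes it easy to give each alternative its own set of variables, for example a transfer count that only applies to public transport.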

The Multinomial Logit Model and the Nested Logit Model
The mode choice process can be explained by random utility theory [23]. In this study, the multinomial logit model and a hierarchical logit (or nested logit) modeling structure are used to investigate and identify the effect of traveler-related variables on mode choice as well as to estimate the coefficients of the underlying model. The multinomial logit (MNL) and nested logit (NL) models are generally among the best structures for modeling mode choice and are used with the utility function technique to identify mode choice in travel behavior analysis [24]. Mode choice models statistically relate the choice made by each traveler to the attributes of the alternatives available. The error components of the utilities of the different alternatives in the MNL model are assumed to be independent.
The mathematical structure known as the MNL gives the choice probability of each alternative as a function of the systematic portion of the utility of all the alternatives. The general expression for the probability of choosing an alternative 'i' (i = 1, 2, ..., J) from a set of J alternatives is given in Equation (6):
$$\Pr(i) = \frac{\exp(V_{in})}{\sum_{j=1}^{J} \exp(V_{jn})} \qquad (6)$$
where:
$\Pr(i)$ is the probability that the traveler chooses alternative (i) for a mode choice (n),
$V_{in}$ is the systematic component of the utility of alternative (i) for a mode choice (n) by the traveler, and
$V_{jn}$ is the systematic component of the utility of each alternative (j) in the set.
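The MNL probability of Equation (6) is straightforward to compute. The sketch below uses hypothetical utility values; subtracting the largest utility before exponentiating is a common numerical safeguard and does not change the result.

```python
import math

def mnl_probabilities(V):
    """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j)."""
    v_max = max(V.values())  # subtract the maximum for numerical stability
    exp_v = {alt: math.exp(v - v_max) for alt, v in V.items()}
    denom = sum(exp_v.values())
    return {alt: e / denom for alt, e in exp_v.items()}

# Hypothetical systematic utilities (illustrative only)
V = {"walk": -5.35, "car": -0.585, "pt": -1.325}
P = mnl_probabilities(V)
```

The probabilities sum to one, and the alternative with the highest systematic utility receives the highest probability.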
The MNL identifies how the independent variables are related to the dependent variable and are expressed in terms of utility.
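Logit coefficients are commonly estimated by maximizing the log-likelihood of the observed choices. The estimation routine used in this study is not described here, so the following is only a sketch of the quantity such a routine would maximize, with two hypothetical trips and generic (not alternative-specific) coefficients as simplifying assumptions.

```python
import math

def mnl_log_likelihood(beta, observations):
    """Sum of log P(chosen alternative) over all observations.

    Each observation is (attributes_by_alternative, chosen_alternative),
    where attributes_by_alternative maps alternative -> list of x_k values.
    """
    total = 0.0
    for attributes, chosen in observations:
        V = {alt: sum(b * x for b, x in zip(beta, xs))
             for alt, xs in attributes.items()}
        v_max = max(V.values())
        # log of the MNL denominator, computed in a numerically stable way
        log_denom = v_max + math.log(sum(math.exp(v - v_max) for v in V.values()))
        total += V[chosen] - log_denom
    return total

# Two hypothetical trips; attributes are [travel_time_min, travel_cost]
data = [
    ({"car": [10.0, 3.0], "pt": [25.0, 1.0]}, "car"),
    ({"car": [30.0, 6.0], "pt": [20.0, 1.0]}, "pt"),
]
ll_null = mnl_log_likelihood([0.0, 0.0], data)   # all alternatives equally likely
ll_time = mnl_log_likelihood([-0.1, 0.0], data)  # negative travel-time coefficient
```

In this toy example both travelers picked the faster mode, so a negative travel-time coefficient yields a higher log-likelihood than the null model with all coefficients set to zero.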
The nested logit (NL) model is a generalization of the multinomial logit model (MNL), and it partially relaxes the independence of irrelevant alternatives (IIA) property of the MNL model. A nested logit model is appropriate when subsets of similar alternatives are grouped in hierarchies or nests [25,26]. Here, a nested logit model can consist of three mode choice groups: public transportation modes, private transportation modes, and walk modes. The NL model can be calibrated to find the coefficients by using standard logit estimation. The hierarchical structure of the NL model, as represented by Equation (7), is estimated for each hierarchy [27,28]:
$$P_{ji} = \frac{\exp(\beta_j S_{ji})}{\sum_{J} \exp(\beta_J S_{Ji})} \qquad (7)$$
where:
$P_{ji}$ is the probability that traveler i chooses alternative j,
$\beta_j$ is a vector of all estimable coefficients for alternative j, and
$S_{Ji}$ is a vector of all explanatory variables for traveler i.
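The IIA property mentioned above follows directly from the MNL form in Equation (6). As a short worked step, using the symbols $\Pr(i)$ and $V_{in}$ of that equation and $k$ for a second alternative, the common denominator cancels in the ratio of two choice probabilities:

```latex
\frac{\Pr(i)}{\Pr(k)}
  = \frac{\exp(V_{in}) \big/ \sum_{j} \exp(V_{jn})}
         {\exp(V_{kn}) \big/ \sum_{j} \exp(V_{jn})}
  = \exp\left(V_{in} - V_{kn}\right)
```

The ratio therefore depends only on the utilities of the two alternatives being compared and not on any other alternative in the choice set, which is the restriction that grouping similar alternatives into nests is meant to relax.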
The multinomial logit model treats all alternatives equally, whereas the nested logit model includes intermediate branches that group alternatives. Within the paradigm of utility maximization, multinomial logit models are more widely adopted than nested logit models, because the MNL model provides a direct link by which choice probabilities can be estimated given the characteristics of the modes and of the traveler.
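To make the difference between the two structures concrete, the sketch below implements the standard two-level nested logit from the discrete choice literature, with an inclusive value per nest. It is not necessarily the exact specification estimated in this study; the sub-modes (bus, metro, bike), the utilities, and the nest parameters are hypothetical.

```python
import math

def nested_logit_probabilities(V, nests, scale):
    """Two-level nested logit choice probabilities.

    V: alternative -> systematic utility
    nests: nest name -> list of alternatives in that nest
    scale: nest name -> nest parameter lambda in (0, 1]; 1 reproduces MNL
    """
    # Inclusive value of each nest: log-sum of the scaled utilities
    inclusive = {
        nest: math.log(sum(math.exp(V[a] / scale[nest]) for a in alts))
        for nest, alts in nests.items()
    }
    denom = sum(math.exp(scale[n] * inclusive[n]) for n in nests)
    probs = {}
    for nest, alts in nests.items():
        lam = scale[nest]
        p_nest = math.exp(lam * inclusive[nest]) / denom       # P(nest)
        within = sum(math.exp(V[a] / lam) for a in alts)
        for a in alts:
            probs[a] = p_nest * math.exp(V[a] / lam) / within  # P(nest) * P(a | nest)
    return probs

# Hypothetical alternatives and utilities (illustrative only)
V = {"car": 0.5, "bus": 0.2, "metro": 0.4, "walk": -0.3, "bike": -0.6}
nests = {"private": ["car"], "public": ["bus", "metro"], "walk": ["walk", "bike"]}
scale = {"private": 1.0, "public": 0.6, "walk": 0.8}
P = nested_logit_probabilities(V, nests, scale)
```

Setting every nest parameter to 1 collapses the nests and reproduces the MNL probabilities, which is a convenient check on an implementation.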

Study Area
One of the important steps in the household survey process is the selection of the study area. Budapest is the capital and the most populous city of Hungary; therefore, it is a suitable choice as the study area.
The data collection was carried out by the Hungarian Census Bureau in 2014. This type of data provides detailed information about the travel behavior of travelers, including socio-demographic variables and daily activity plans. In more detail, the dataset consists of the following variables: departure time, arrival time, travel time, transport mode, and activity type (e.g., home, work, leisure, education, and other). In addition, the data collected from respondents include information on individuals and their household characteristics, the daily activity, and trip characteristics. After filtering the collected data, a total of 1889 trips were identified. The applied interview process is designed to cover most of the individual and household characteristics, the daily activity, and trip characteristics. The interview is structured around questions on which aspects appear in the daily activity process of the respondents, why these elements are important, and how each aspect affects the mode choice.

Socio-Economic Characteristics
This section presents a descriptive analysis of the socio-economic characteristics of the households and individuals in the sample, together with the trip characteristics. The analysis shows that there are significant variations in employment percentage, income level, education level, and vehicle ownership in the study area. Table 1 shows the descriptive statistics of the study area, which represent the socio-economic and demographic characteristics of the households and individuals. Table 2 presents the frequency percentages of the household and individual characteristics and the activity purpose.

Model Specification
The primary data collected from the individual information and the household questionnaire survey are sorted and coded into groups of similar characteristics. The coded data are then used as the set of variables for model generation, where the variables are chosen based on previous theoretical and empirical work on mode choice models and daily activity analysis conducted by other researchers. The final specification of the variables is then arrived at based on statistical testing.
Three categories of variables that influence the transport mode are considered: individual and household socio-demographics, daily activity characteristics, and trip characteristics. Table 3 lists the variables used in the model together with their descriptive analysis.

Checking of the Selected Model
Next, we investigated how well the model fits the data, as can be done for any particular MNL model [29]. The goodness-of-fit of a statistical model describes how well it fits a set of observations. According to the results, the model shows a good fit to the data. Table 4 reports the goodness-of-fit of the model.
Table 4. Goodness-of-fit.

Pseudo R-Square
To assess the goodness-of-fit of the model, the pseudo-$R^2$ is examined. The pseudo-$R^2$ values calculated for the MNL model are shown in Table 5. A pseudo-$R^2$ summarizes the proportion of variance in the dependent variable associated with the independent variables. The model with the largest pseudo-$R^2$ statistic is the best according to this measure [30].
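Several pseudo-$R^2$ definitions exist, and which variants Table 5 reports cannot be determined from this text. As one common example, McFadden's pseudo-$R^2$ compares the log-likelihood of the fitted model with that of an intercept-only model. The log-likelihood values in the sketch are hypothetical.

```python
def mcfadden_pseudo_r2(ll_model, ll_null):
    """McFadden's pseudo-R^2 = 1 - LL_model / LL_null."""
    return 1.0 - ll_model / ll_null

# Hypothetical log-likelihoods (illustrative, not values from Table 5)
r2 = mcfadden_pseudo_r2(ll_model=-1450.0, ll_null=-2075.0)
```

A value of 0 means the fitted model is no better than the intercept-only model, and values closer to 1 indicate a larger improvement in log-likelihood.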

Goodness-of-Fit Measures
The likelihood-ratio test assesses the goodness of fit of two competing statistical models based on the ratio of their likelihoods, specifically one found by maximization over the entire parameter space and another found after imposing some constraint [31]. Table 6 gives the likelihood-ratio tests, the -2 log-likelihood of the reduced model, and the significance of the difference for our selected model. The results show that the vehicle per household, time, and cost variables have a greater effect on transport mode choice than the other variables. They also indicate a statistically significant relationship between the combination of independent variables and the dependent variable.
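The test statistic itself is simple to compute: it is minus two times the difference between the log-likelihood of the reduced model and that of the full model, compared against a chi-square critical value with degrees of freedom equal to the number of dropped parameters. The sketch below uses hypothetical log-likelihoods and a small lookup of standard 5% critical values instead of a statistics library.

```python
# Upper 5% critical values of the chi-square distribution for small df
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def likelihood_ratio_test(ll_reduced, ll_full, df):
    """Return (statistic, reject_at_5_percent) for two nested models."""
    statistic = -2.0 * (ll_reduced - ll_full)
    return statistic, statistic > CHI2_CRITICAL_05[df]

# Hypothetical values: the reduced model drops one variable with 2 parameters
stat, significant = likelihood_ratio_test(ll_reduced=-1462.5, ll_full=-1450.0, df=2)
```

A variable whose removal produces a large statistic relative to the critical value contributes significantly to the model, which is the reasoning behind comparing variables by their likelihood-ratio tests.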

The Multinomial Logit Model
The multinomial logit (MNL) model was used to estimate the coefficients used in the utility function and to identify the influence of the individual, household, and trip characteristics on the mode choice. The estimated coefficients, significance levels, and t-tests of the variables obtained from the MNL analysis are shown in Table 7. The model estimation is statistically significant, and multicollinearity does not appear to be present in the model, as the standard errors of the regression coefficients β do not exceed 4. The regression coefficient of an independent variable measures the change in the logit for a one-unit change in that predictor variable while the other predictor variables are kept constant.
In this study, the dependent variable is the transport mode, which includes walk (walk and bike), private transport (car), and public transport (PT).
To analyze the model, we focus on the independent variables whose relationship with the dependent variable is statistically significant at the 0.05 level, based on the model results shown in Table 8. Consequently, the model interpretation covers only these variables, as follows:
• The results indicate that a one-unit increase in the independent variable (Gender male) is associated with a 0.607 increase in the relative log odds of choosing a car, against 0.495 for choosing public transport. This result shows that males are more inclined to travel by car than by other transport modes.
• Table 7 also shows that, for the variable vehicle per household (1), a one-unit increase in the independent variable is associated with a 1.232 increase in the coefficient of choosing a car. This confirms that the presence of vehicles in a household has a strong statistical influence on reducing public transport use.
• For monthly income (4 and 5), a one-unit increase in the independent variable (INCOME-4 and 5) is associated with increases of 0.650 and 0.962, respectively, in the relative choice of car. This indicates that an increase in the total household income raises the probability that travelers use the car.
• A one-unit increase in the independent variable of time (2 and 3) is associated with increases of 0.241 and 0.103, respectively, in the coefficient of choosing PT, against a coefficient of -0.235 for travel by car. This confirms that when the travel time increases by more than 30 min, travelers switch from PT to private transport.
• For the variable work activity (1), a one-unit increase in the independent variable is associated with a 0.416 increase in the coefficient of choosing travel by car, compared with 0.326 for PT and 0.113 for walking.
• The findings for the distance variable show that travelers prefer to walk when the distance is less than 4 km and prefer to travel by PT when the distance between the activity destinations is 4-10 km. Travelers are inclined to travel by car if the distance is more than 10 km.
• For the variable travel cost (4), a one-unit increase in the independent variable is associated with a 0.938 increase in the coefficient of choosing PT, against an increase of 0.497 for travel by car. This confirms that the travel cost has a strong statistical influence on public transport use and car use.
• Table 7 shows that an increase in the number of public transport transfers needed to reach the activity destination decreases the probability of using PT for traveling.
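Coefficients expressed as changes in log odds can be easier to read once converted to odds ratios by exponentiation. The sketch below applies this to two of the car coefficients quoted in the text; it assumes they are log-odds coefficients relative to the model's reference category, which the text does not name explicitly.

```python
import math

def odds_ratio(coefficient):
    """Convert a logit coefficient into a multiplicative change in odds."""
    return math.exp(coefficient)

# Coefficients quoted in the text for choosing a car
reported = {"gender_male": 0.607, "vehicle_per_household_1": 1.232}
ratios = {name: odds_ratio(b) for name, b in reported.items()}
```

Under that assumption, the odds of choosing a car are multiplied by roughly 1.8 for male travelers and by roughly 3.4 for households in vehicle category (1), other variables held constant.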
Considering these findings, the individual and household characteristics and the trip characteristics all affect the mode choice in Budapest, with a statistically significant difference or variation for each independent variable. Among the household characteristics, family size has a positive effect on choosing the car for traveling, and vehicle per household has a positive effect on choosing private transport; the results also show that married travelers are more likely to use a personal car than other modes. Household income shows a negative coefficient for public transport and walk and a positive coefficient for car, suggesting that higher incomes lead to a greater likelihood of using private transport. Among the trip characteristics, travel time has a negative coefficient for walking and for the use of cars in the private nest: as travel time increases, travelers are more likely to substitute private transport for public transport. The distance between activity destinations and the travel cost show a positive coefficient for cars, suggesting that travelers living far away from the downtown are more likely to choose this mode. It is observed that increasing the number of transfers between modes leads to a reduced likelihood of using public transport.
Table 9 shows the correlations between the parameters of the modes (walk, car, and PT) and the 20 explanatory variables within the NL model. There is a significant correlation between the modes and the main explanatory variables. The goodness of fit of the model is reasonable, and the data used in this model are found to be appropriate. The log-likelihood ratio (LLR) test is conducted for the nested logit model. As Table 10 shows, the LLR value of the developed model is substantially greater than the chi-square value with the respective degrees of freedom.

Estimation of Utility Function
The estimation procedure is based on random utility maximization (RUM) choice theory to provide parameter estimates for the mode choice based on the utility function. Specifically, the multinomial logit model with the utility function as the objective function of mode choice is estimated based on the observed one-day activity travel found in conventional travel datasets. In this study, the choice set of alternatives has been determined with reference to the observed travel patterns of the travelers in the dataset.
Table 11. Summary: Average utility value of transport mode based on the variables.
Table 11 and Figures 1 and 2 present the utility values of mode choice with respect to gender (male and female). Based on the average utility value, men prefer to use private transport when making trips: their utility values of private transport are greater than those of public transport. In contrast, women prefer public transport more than men do: based on the average utility value, their utility values of private transport are lower than those of PT.
The age of the traveler has an effect on the utility value of the chosen mode, as shown in Figures 3-5 and Table 11. Travelers in the age group of 10 to 20 are more frequent users of walking or public transportation to complete their activities. In addition, travelers aged 21-59 have a lower utility value for using public transport than for traveling by car. Meanwhile, travelers in the age group of 60 years and above make a greater number of trips by public transport compared to other age groups.
The number of persons in a household influences the utility values of the mode choice for traveling. Table 11 and Figures 6 and 7 present the utility values of the mode choice with respect to family size. Small families (1 or 2 people) have a lower probability of using private transport when making trips. In contrast, this probability increases for larger families: based on the average utility value, the utility values of private transport exceed those of public transport when the family size is larger (more than two people).
Table 11 includes results indicating that workers have a greater propensity to make work activity trips by private transport than by other modes. According to the results in Figure 9 and Table 11, the shopping activity is often affected by the location of the market, that is, by how far it is from the house. Walking or using public transport to complete a shopping activity has more utility to the traveler than using the car.

The results given in Figures 10 and 11 and in Table 11 show that the household income has a significant impact on the choice of transport mode and consequently affects the traveler's utility value for the chosen mode. In particular, travelers with a high income are inclined to use a car for traveling. Table 11 presents the average utility values of the variables (gender, age, family size, activity purpose, and monthly income) in relation to mode choice. This analysis considers the modes walk, private transport (car), and public transport.
The presented figures and Table 11 show that the utility values are highly dependent on the socio-demographic parameters, especially age and activity purpose. This means that the final choice of transport mode will differ between the younger and the older generation, as they have specific priorities, which are reflected in their utility functions. In general, younger persons tend to walk when realizing an activity, while this transport mode is almost neglected by the older generation. Transport by car is mostly favored by middle-aged persons. Similar differences apply to the activity purpose, where work-related and leisure-related activities attract different transport modes. For example, going to work is usually a commuting activity by car, while shopping at the grocery store can easily be done by walking.
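Average utility values per group and mode, such as those summarized in Table 11, can be obtained by averaging the utility of individual trips within each group. The following sketch shows one way to do this; the records are hypothetical and are not taken from Table 11.

```python
from collections import defaultdict

def average_utility_by_group(records):
    """Mean utility for each (group, mode) pair."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, mode, utility in records:
        sums[(group, mode)] += utility
        counts[(group, mode)] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical (group, mode, utility) records, not taken from Table 11
records = [
    ("male", "car", 0.8), ("male", "car", 0.6), ("male", "pt", 0.3),
    ("female", "car", 0.2), ("female", "pt", 0.7), ("female", "pt", 0.5),
]
averages = average_utility_by_group(records)
```

The same function works for any grouping variable (age group, family size, activity purpose, or income class) by changing the first element of each record.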

Conclusions
This research aims to estimate the influence of various variables on mode choice. To achieve this work, we used a multinomial logit model (MNL) and nested logit model (NL) to analyze the relationships between mode choices of the traveler and the characteristics of the individual, the household, the activities, and the trip.
Using the modeling techniques, tests were carried out to verify that the models fit the data in statistical terms. In addition, the estimates of the variables were reviewed and interpreted. As the main result, it was found that the examined variables have a high significance for the mode choice. The coefficients and t-tests of the variables showed that all explanatory variables were significant, but the effects and contribution of each variable were not the same. Thus, they were sorted according to their effects based on the model. As a conclusion, the significance of the variables could be identified in the following order: "trip distance", "travel time", "activity purpose", "household income", "travel cost", "vehicle per household", "number of transfers between public transport", "family size", "age", "gender", "employment", and finally "education". This means that trip distance, travel time, and activity purpose are the most decisive factors influencing the mode choice of an individual.
The utility function presented in this study highlights the concept of utility maximization in the mode choice of travelers. This function includes the utility of performing daily activities, so that it introduces a trade-off between different modes based on utility. The model allows the selection of the travel mode depending on the trip distance, travel time, travel cost, activity purpose, monthly income, and transfer number to achieve the best personal utility. New terms are introduced in the utility function to improve the model performance and simplify the estimation of the maximum utility of mode choice. With the results, it is possible to estimate the influence of the various variables on mode choice and identify the best mode based on the utility function. This can have a practical application when elaborating urban traffic models and travel plans. Thus, the results can be relevant to both urban planners and city-related decision-makers.
Overall, the research provides promising insights into the mode choice behavior in Budapest. The main limitation of the study lies in the modeling techniques used. Hence, advanced models such as mixed logit or probit models can be applied for further improvement. In addition, the travel behavior of travelers may change over time, which may limit the validity of the results. Therefore, it would be beneficial to run the models on a fresh and full dataset in the future. Another limitation is the size of the usable dataset: in the filtering process, several trips had to be discarded because of missing information. As future research, extending the analysis to a larger dataset will be useful for further identifying the variables that affect mode choice. Finally, this study leads to the development of a microsimulation-based prototype of mode demand models.
Data Availability Statement: Main data sets collected and analyzed during the current study come from the Hungarian Census Bureau and can be found here: https://www.ksh.hu/stadat_annual_2_2.