Research on the Choice Behavior of Taxis and Express Services Based on the SEM-Logit Model

Abstract: With the development of Internet technology, online car-hailing is booming in China, which has profoundly affected people's travel structures. In order to seek the sustainable development of taxi and online car-hailing services from the perspective of passenger mode choice behavior, the mechanism of passengers' decision-making procedures and their travel mode choice behaviors were analyzed. To study the influence of latent variable factors on passenger choice behavior, this paper first designed a questionnaire, and a structural equation model (SEM) was established for the preliminary study of the relationship between the latent variables and the behavioral intentions using the online survey data. Then, the latent variables were introduced into the Logit model, setting up the SEM-Logit model to explore the mode choice patterns between taxis and online car services. The results showed that the SEM-Logit model with the latent variables is better than a general Logit model in terms of the model precision and hit ratio. Meanwhile, after introducing the latent variables, it was found that convenience, comfort, and economy factors have a significant influence on the model, and the explanatory power of the model increases accordingly.


Introduction
Trip mode choice is an important part of traffic demand analysis. Most factors that affect it, such as cost, time, and other economic indicators, can be directly observed and have been widely used in previous research. In addition, latent factors that cannot be observed directly, such as attitudes and feelings, are also believed to play a role.
In recent years, online car-hailing has become popular in China, and a number of ride-hailing platforms have emerged as a consequence. Online car-hailing services provide the information needed to match drivers with passengers and reduce the share of time spent searching for customers. Meanwhile, they can meet some of the private car travel needs of groups with a high value of time, which makes them a sustainable travel mode. Unlike taxis, online car-hailing services can only carry passengers through network appointments and are not allowed to cruise (their service types are shown in Figure 1). Although both online car-hailing services and taxis can provide personalized, door-to-door services (both are on-demand mobility services, and taxis can additionally be hailed on the street [1]), some researchers have shown that there are certain differences in their travel characteristics. Cui et al. [2] analyzed the online order data of taxis and express services (one type of online car-hailing service whose service and price are roughly equal to those of a taxi) and found that, in terms of the travel time of taxi online orders, trips of less than 20 minutes account for 22.8% of all rides, trips of 20-30 minutes account for 24.5%, and trips of 30-40 minutes account for 21.2%. Moreover, the hot spots for pick-up and drop-off vary greatly. However, the online order data of express services are
The rapid development of online car-hailing has had a great impact on the taxi market, because it has realized information matching between passengers and drivers by using Internet technology. Isaa and Davis (2014) [3] argued that Uber severely disrupted the taxi service industry. Rayle et al. (2015) [4] investigated the usage of online car-hailing in San Francisco, USA, and explored the characteristics of online car-hailing users and their reasons for using it, as well as the influence of the online car-hailing market on the traditional taxi market, public transport, and private cars. The research suggested that online car-hailing can be a substitute for public transport and private cars to some extent. McKenzie and Baéz (2016) [5] differentiated Uber and taxi transportation through events attended by their passengers and explored event detection at a variety of spatial and temporal resolutions. Harding et al. (2016) [6] discussed taxi apps and their impact on taxi markets and suggested that regulators should focus on reducing the likelihood of monopoly and collusion in a taxi market led by apps. Flores and Rayle (2017) [7] explained how and why Uber came to be accepted in San Francisco. Jiang et al. (2018) [8] comprehensively compared Uber, Lyft, and taxis with respect to key market features. Watanabe (2016) [9] considered that Uber used a disruptive business model driven by digital technology to trigger a ride-sharing revolution and fundamentally improved the efficiency of the load factor. Kim (2018) [10] concluded that Uber and taxis can create positive values through well-intentioned competition.
In terms of the travel characteristics analysis of online car-hailing, Dawes (2016) [11] found that there are relationships between the transportation characteristics of Uber or Lyft and user identification and attitude. Poulsen et al. (2016) [12] conducted a geo-spatial analysis of Green cab and Uber rides in the outer neighborhoods of New York City and found that demand for Green cabs is still growing, but that the number of Uber rides in the same area is growing more rapidly, while Green cabs are performing better than Uber in relatively poor neighborhoods. However, when looking at differences between weekdays and weekends, they found no differences between Green cabs and Uber. Abel and Kerry (2018) [13] showed that the number of Uber and Lyft rides is significantly correlated with whether it is raining by using all taxi, Lyft, and Uber rides in New York City. He and Shen (2015) [14] proposed a spatial equilibrium model that balanced the supply and the demand of taxi services in a regulated taxi market with the smartphone-based e-hailing application. Brodeur and Nield (2016) [15] found that the number of Uber rides per hour increases when it is raining, and surge pricing encourages an increase in supply. During the same time period, however, the number of taxi rides per hour decreased after Uber entered the New York market. Chen (2016) [16] studied driver work practices under the dynamic price markup of Uber and found that the driver's work adjustment is more flexible when the premium is high.
In the research on the choice behavior of online car-hailing, Peng et al. (2014) [17] adopted the planning behavior theory (TPB), the rational behavior theory (TRA), and the technology acceptance model (TAM), and analyzed the behavior intentions of passengers using car-hailing software.
The results showed that the perceived usability, the perceived usefulness, and the compatibility have positive indirect effects on user attitude, consequently affecting the user intention; subjective norms have positive direct impacts on user behavior intention, whereas perceived risk has a negative direct impact on behavior intention. The perceived price level has an impact on behavior intention and user attitude. Zhang (2017) [18] used the Push-Pull-Mooring (PPM) model to study the transfer intention of online car-hailing users. Zhu (2017) [19] constructed a passenger satisfaction model based on the structural equation model. Zhang et al. (2016) [20] set up a binomial logit model to analyze urban residents' choice behaviors for taxis and online car-hailing. Scholars thus often consider the subjective attitudes of users when studying behavior intention. In the 1970s, Spear (1976) [21] discussed how abstract transportation system characteristics can be quantified and included as explanatory variables in models of travel demand behavior. Since then, perceptions, feelings, preferences, and other variables have been taken into account in choice models, and it has been shown that these latent variables have a significant impact on choice results [22-24].
To sum up, passengers' choice behavior is affected not only by directly observable variables, such as their personal characteristics and trip characteristics, but also by many psychological factors (for example, personal feelings and attitudes) that cannot be directly observed. This paper established a choice model considering the latent variables. Firstly, the questionnaire data were sorted, the decision-making process of passengers' trip mode choice behavior was analyzed, and the appropriate variables were selected. The structural equation model (SEM) was adopted to discuss the relationships among the variables. Subsequently, the analyzed latent variables were substituted into the choice model, and the choice model including the latent variables was constructed to explain and predict the choice behavior of passengers. On the theoretical side, this paper conducted a model analysis of the selection behavior of taxi drivers and passengers at the psychological level and put forward a more suitable mechanism for analyzing the behavior selection model. On the practical side, survey data were used to calibrate the model, which could provide suggestions for relevant departments to develop sustainable development policies for taxis and online car-hailing, optimize scheduling and distribution, and improve passenger satisfaction.
The structure of the paper is as follows. Section 2 outlines the model and the selected variables. Section 3 describes the data collection and the data statistics. Section 4 analyzes the results. Section 5 gives the conclusion.

SEM-Logit Model
To describe the influence of subjective factors on passengers' decision-making processes, a SEM-Logit model considering the latent variables was constructed. The model consisted of two parts. The first part was the SEM model, which was mainly used to describe the causal relationship between the latent variables of travel mode selection and the corresponding observation variables. The second part was the Logit model, which was used to express the nonlinear functional relationship between the probability of choosing a certain travel plan and the latent variables influencing the decision. Research on the application of the SEM-Logit model includes Yáñez et al. (2010) [25], who used hybrid discrete choice models incorporating latent variables for an urban multimode choice case. Chen and Li (2017) [26] presented a mode choice model for public transport, which integrated the structural equation model and the discrete choice model with categorized latent variables. Ding et al. (2018) [27] investigated the influences of the built environment on car ownership and travel mode choice simultaneously by making use of a multilevel integrated MNL (multinomial logit) model and SEM. Han et al. (2018) [28] used the SEM-NL methodology to explore the possible causal relationships among personal waiting behavior, attitudes to bus service satisfaction, and travel mode choices of passengers waiting at a bus station. The specific description of the model is as follows:

Improvement of the utility function
The latent variables are added to the fixed utility term so that the utility function includes not only explicit variables, such as trip characteristics and the personal socio-economic characteristics of passengers, but also latent variables such as perceptions and attitudes. The improved utility function can be expressed as [25]:

$$U_{in} = V_{in} + \varepsilon_{in} = \sum_{l} a_{il} s_{iln} + \sum_{q} b_{iq} z_{iqn} + \sum_{k} c_{ik} \eta_{ikn} + \varepsilon_{in}$$

where $i$ refers to an alternative, $n$ indexes the passengers, $l$ indexes the directly observable characteristics of the passengers, $q$ indexes the directly observable characteristics of the trip mode, $k$ indexes the latent variables, $s_{iln}$ are the manifest variables of personal characteristics, $z_{iqn}$ are the manifest variables of the trip mode, $\eta_{ikn}$ are the latent variables, $a_{il}$, $b_{iq}$, and $c_{ik}$ are parameters to be estimated, $V_{in}$ is the fixed utility term, and $\varepsilon_{in}$ is the random error term.

Adaptation coefficient calculation for latent variables $\eta_{ikn}$
To determine the adaptation coefficient of the latent variables, SEM is needed to describe the relationship between each latent variable and its measurement variables. The latent variables $\eta_{ikn}$ can be described by a series of corresponding measurement variables $x_{irn}$. Taking an exogenous latent variable in the model as an example, the measurement model of the structural equation model for the attitude perception attribute in passengers' trip mode selection is expressed in vector form as:

$$x = \Lambda_x \eta + \delta$$

where $\delta$ is the vector of measurement errors. The load factors (path coefficients) $\Lambda_{x1}, \Lambda_{x2}, \ldots, \Lambda_{xn}$ explained by the exogenous latent variable $\eta_1$ are regarded as the weights of the observation variables (index variables); the load factors are then standardized, and $\alpha_{x1}, \alpha_{x2}, \ldots, \alpha_{xn}$ denote the assigned weights.
Finally, the survey values of the observed variables are substituted into the formula, and the adaptation values of the latent characteristic variables in the attitude perception attribute can be obtained for the passengers.
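The weighting step can be illustrated with a minimal Python sketch. It assumes that "standardized" means rescaling the load factors so that they sum to one, and that the adaptation value is the resulting weighted average of a respondent's item scores; both are assumptions about the intended calculation, and the function names, load factors, and answers are hypothetical.

```python
def normalized_weights(loadings):
    """Rescale load factors so that they sum to one (assumed standardization)."""
    total = sum(loadings)
    if total == 0:
        raise ValueError("loadings must not sum to zero")
    return [value / total for value in loadings]


def latent_score(loadings, responses):
    """Weighted average of one respondent's item scores for one latent variable."""
    if len(loadings) != len(responses):
        raise ValueError("need exactly one response per loading")
    weights = normalized_weights(loadings)
    return sum(w * x for w, x in zip(weights, responses))


# Hypothetical load factors for three items of one latent variable,
# and one respondent's five-point Likert answers to those items.
example_loadings = [0.8, 0.7, 0.5]
example_answers = [4, 5, 3]
convenience_score = latent_score(example_loadings, example_answers)
print(round(convenience_score, 3))
```

Because the weights sum to one, the score stays on the same 1 to 5 scale as the questionnaire items, which keeps the latent variables comparable when they enter the utility function.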

Discrete choice model
To describe the decision-making behavior of the passengers, a binary variable $d_{in}$ is introduced, which can only take the value 0 or 1: $d_{in} = 0$ indicates that option $i$ is not selected, and $d_{in} = 1$ indicates that option $i$ is selected. The formula is as follows:

$$d_{in} = \begin{cases} 1, & U_{in} \geq U_{jn} \text{ for all } j \neq i \\ 0, & \text{otherwise} \end{cases}$$
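Under the usual logit assumption of independent Gumbel-distributed error terms, the probability of selecting an option has a closed logistic form that depends only on the difference between the fixed utilities. The following Python sketch shows this for the two-option case; the function names and the 0.5 decision threshold are illustrative assumptions, not taken from the paper.

```python
import math


def express_probability(v_express, v_taxi=0.0):
    """Binary logit probability of choosing the express service over a taxi."""
    # Logistic form of exp(V1) / (exp(V0) + exp(V1)); only V1 - V0 matters.
    diff = v_express - v_taxi
    # Branch on the sign so math.exp never receives a large positive argument.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    exp_diff = math.exp(diff)
    return exp_diff / (1.0 + exp_diff)


def predicted_choice(v_express, v_taxi=0.0):
    """Return 1 (express service) if its probability exceeds 0.5, else 0 (taxi)."""
    return 1 if express_probability(v_express, v_taxi) > 0.5 else 0


# Hypothetical fixed utilities, for illustration only.
print(express_probability(0.8, 0.0))
print(predicted_choice(-0.3, 0.0))
```

Setting the utility of the reference option to zero, as in the default argument, corresponds to estimating the utility of one mode relative to the other.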

Model Specification and Hypothesis
To study passengers' choice behaviors for taxis and online car-hailing, an online questionnaire survey was conducted to obtain research data. The SEM-Logit model was established by combining the SEM with the Logit model. The model consisted of two parts. The first part was the SEM model, which was mainly used to describe the causal relationships between the latent variables of the trip mode choice and their corresponding observation variables, and between the latent variables and the explicit variables. According to the literature [26,29-32], five types of exogenous latent variables (convenience, reliability, comfort, safety, and economy) were selected. The perceived value and the behavioral intention were endogenous latent variables, and the specific explanation of the selected variables is shown in Table 1. The second part was the Logit model, which was used to describe the functional relationship between the probability of choosing a trip mode and the latent variables and explicit variables that affect the decision-making. It is worth noting that the observed variables had no direct influence on the individual choice behavior and could only be used to measure the latent variables.

BI1: Be willing to choose this mode in the next year
BI2: Happy to recommend this mode to others

Before the establishment of the model, the following assumptions were made: (1) travelers were rational in making mode choices, that is, they would choose the travel scheme with the highest utility value; (2) the mode choice options were categorized into two types, taxis and express services; (3) the evaluation of the options depended on the utility function U, which included both latent and explicit factors; (4) the error term of each utility function independently followed a Gumbel distribution with a mean of zero, while the error terms of the remaining stochastic functions followed a normal distribution; (5) the observations were independent and there was no multicollinearity [33].

Data Collection and Analysis
Data were collected through an online questionnaire, and responses were measured on a five-point Likert scale (1: very negative to 5: very positive). A one-off survey method was adopted, and the respondents received a reward after completing the questionnaire. The questionnaire covered the personal attributes, trip attributes, and attitude attributes of the respondents. The investigation was conducted from 19 December to 24 December 2018. In total, 519 questionnaires were distributed and returned. Samples in which respondents did not answer the questions seriously, samples with three or more unanswered questions, and samples with five consecutive extreme values were eliminated. This left 452 questionnaires for analysis, a validity rate of 87.09%.
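The two mechanical screening rules and the validity rate can be sketched in Python as follows. The sketch assumes that a blank answer is stored as `None` and that an "extreme value" is a score of 1 or 5 on the five-point scale; both are assumptions about how the rules were applied, and the rule about respondents who did not answer seriously is not modeled.

```python
def is_valid_response(answers, max_missing=2, max_extreme_run=4):
    """Apply the two mechanical screening rules to one questionnaire.

    answers: Likert scores 1-5 in question order, with None for a blank.
    """
    # Rule 1: discard questionnaires with three or more unanswered questions.
    missing = sum(1 for value in answers if value is None)
    if missing > max_missing:
        return False
    # Rule 2: discard questionnaires with five consecutive extreme answers.
    run = 0
    for value in answers:
        if value in (1, 5):
            run += 1
            if run > max_extreme_run:
                return False
        else:
            run = 0
    return True


def validity_rate(n_valid, n_returned):
    """Share of returned questionnaires kept for analysis, in percent."""
    return 100.0 * n_valid / n_returned


# Counts reported in the text: 452 valid out of 519 returned questionnaires.
print(round(validity_rate(452, 519), 2))
```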

Descriptive Statistical Analysis
In the valid questionnaires, the proportions of males and females were basically equal. The respondents were mainly young people aged between 21 and 30 (70.4%). The main occupations were students and enterprise employees, who together accounted for 68.4% of the total sample. Education levels were concentrated at bachelor's degree or above (90%). Individuals earning less than CNY 2500 a month accounted for 30.1% of the sample, followed by those earning between CNY 7001 and CNY 10,000 (19.9%). The details are shown in Table 2.

Travel Characteristics Analysis
Among the most commonly used modes of transport, 54.9% of respondents chose express services, higher than the preference for taxis at 45.1%. Among the respondents who chose taxis as their trip mode, the proportion using car hailing software (63.2%) was higher than cruising (37.7%).
(1) The relationship between personal attributes and trip mode selection
It can be seen from Table 3 that the proportion of males choosing express services was higher (58.6%), while the percentages of females choosing taxis and express services were basically equivalent at 49.3% and 50.7%, respectively. In the relationship between age and travel mode choice, the proportion of young people choosing express services was clearly higher than the proportion choosing taxis, but as age increased, the proportion choosing taxis rose gradually and eventually surpassed the proportion choosing express services. In terms of occupation, except among students, the proportion of business employees and other professions choosing express services was about 60%, and the proportion of other occupations choosing taxis was higher than the proportion choosing express services. The proportion of liberal professions and retirees choosing taxis was much higher than the proportion choosing express services. In terms of education level, except among those with a high school degree or below, the proportion of those who chose taxis was higher than the proportion choosing express services. Moreover, the proportion of those with other qualifications was higher than that of those choosing taxis. For monthly income, people with incomes of CNY 2501-4000 and CNY 5501-7000 were more likely to choose taxis than express services.
Sustainability 2019, 11, 2974
(2) The relationship between travel attributes and trip mode selection
It can be seen from Table 4 that taxis accounted for a higher proportion of business trips than of commuting and recreation trips. For travel at night, the proportion choosing express services was clearly higher than the proportion choosing taxis. Similarly, the percentage choosing express services was clearly higher than the percentage choosing taxis for trips starting from suburban areas. In terms of travel distance, express services were the main choice when the travel distance was less than 5 kilometers, while the proportion choosing taxis was higher when the travel distance exceeded 5 kilometers. When the waiting time was below 10 minutes, express services were the major travel mode.

Reliability Test
After the sample data were obtained through the formal questionnaire survey, a reliability test of the formal questionnaire was conducted again, and Cronbach's α coefficient was calculated for each of the seven latent variables involved in the model. The results are shown in Table 5. The reliability coefficient of each latent variable was greater than 0.8, indicating that the questionnaire had good reliability and could be used for structural equation analysis.
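For reference, Cronbach's α of a latent variable with k items is k/(k − 1) times one minus the ratio of the summed item variances to the variance of the respondents' total scores. A minimal Python sketch of this standard formula follows; the item scores are hypothetical and are not taken from the survey.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one latent variable.

    item_scores: one list per questionnaire item, each holding the scores
    of the same respondents in the same order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    # Sum of the individual item variances (sample variance).
    item_variance_sum = sum(variance(scores) for scores in item_scores)
    # Variance of each respondent's total score across the items.
    totals = [sum(per_respondent) for per_respondent in zip(*item_scores)]
    total_variance = variance(totals)
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of five respondents to three items of one construct.
items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 1],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Values above 0.8, the threshold quoted in the text, indicate that the items of a construct vary together closely enough to be treated as measurements of the same latent variable.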

Parameter Estimation of SEM
The adaptation coefficients of the latent variables had to be calculated before the SEM-Logit model could be established. For this purpose, a structural equation model was established to estimate the path coefficient of each latent variable. It was assumed that the travel behavior intention in the structural equation model was affected by the perceived value of the mode, and that the perceived value was affected by convenience, reliability, comfort, safety, and economy. The structural equation model for the choice behavior intention of an express service is shown in Figure 2. The load factor coefficients of all variables were significant at the 0.05 level. The specific test results are shown in Table 6, and the fitting results of the model are shown in Table 7.

Parameter Estimation of the SEM-Logit Model
The trip mode choice behavior for taxis and express services considering the latent variables was based on the BL (binary logit) model, and the five latent variables of perceived value (convenience, safety, reliability, comfort, and economy) were substituted into the choice model. Afterwards, software was used to calibrate and verify the model parameters. Taking taxis as the reference, the utility function of the express services could be established as:

$$\begin{aligned} V_{express} = {} & \text{Constant} + \theta_1 \cdot \text{Gender} + \theta_2 \cdot \text{Age} + \theta_3 \cdot \text{Occupation} + \theta_4 \cdot \text{Education} \\ & + \theta_5 \cdot \text{Income} + \theta_6 \cdot \text{Purpose} + \theta_7 \cdot \text{Time} + \theta_8 \cdot \text{Location} + \theta_9 \cdot \text{Distance} \\ & + \theta_{10} \cdot \text{Waiting} + \theta_{11} \cdot TC + \theta_{12} \cdot TS + \theta_{13} \cdot TR + \theta_{14} \cdot CC + \theta_{15} \cdot TE \end{aligned}$$

Using the estimated path coefficients and the corresponding structural equation model, the adaptation coefficients of the latent variables of perceived attitude, such as convenience, safety, reliability, comfort, and economy, were substituted into the choice model. These latent variables, the personal attributes, and the characteristic vectors of the trip attributes were taken as the characteristic variables of trip utility to calibrate and verify the model parameters. Taking taxis as the reference level, the model parameter estimation results were obtained from the basic attribute survey data and the attitude perception survey data. The specific parameter estimation results and relevant test results are shown in Table 8. According to the parameter calibration results in Table 8, the trip mode choice model could be obtained, and the selection function is shown below:

$$\ln \frac{P_{1n}}{P_{0n}} = V_{express}$$

where the parameters of $V_{express}$ take the estimated values in Table 8, $P_{0n}$ is the probability that passengers choose a taxi, and $P_{1n}$ is the probability that passengers choose an express service.
The test of the model mainly included the likelihood ratio chi-square ($\chi^2$) test, the McFadden test ($\rho^2$), and the hit rate test.
(1) Under the hypothesis $H_0: \theta_1 = \theta_2 = \ldots = \theta_k = 0$ and at the 5% significance level, the critical chi-square value $\chi^2_\alpha$ with 15 degrees of freedom is 24.996. Since $-2(L(0) - L(\theta)) = 479.404 > \chi^2_\alpha$, the null hypothesis is rejected, which indicates that the feature vector of the model has a significant impact on the choice of trip mode.
(2) McFadden's coefficient of determination $\rho^2$ of the model is 0.271, which is higher than that of the BL model without latent variables ($\rho^2 = 0.116$), indicating that the model has better precision when the latent variables are considered.
(3) As can be seen from Table 9, the hit ratio of the SEM-Logit model is higher than that of the BL model.
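The three tests above can be sketched in Python with the standard definitions of each statistic. The log-likelihood values and predicted probabilities below are hypothetical placeholders, the 0.5 classification threshold is an assumption, and the critical value is the one quoted in the text rather than computed.

```python
def likelihood_ratio_statistic(ll_null, ll_model):
    """-2 (L(0) - L(theta)) computed from two log-likelihood values."""
    return -2.0 * (ll_null - ll_model)


def mcfadden_rho_squared(ll_null, ll_model):
    """McFadden's rho-squared: 1 - L(theta) / L(0)."""
    return 1.0 - ll_model / ll_null


def hit_ratio(observed, predicted_probabilities, threshold=0.5):
    """Share of respondents whose predicted mode matches the observed one."""
    hits = sum(
        1
        for choice, prob in zip(observed, predicted_probabilities)
        if (1 if prob > threshold else 0) == choice
    )
    return hits / len(observed)


# Reported in the text: statistic 479.404 against the 5% critical value
# of the chi-square distribution with 15 degrees of freedom.
CRITICAL_VALUE_DF15 = 24.996
reject_null = 479.404 > CRITICAL_VALUE_DF15

# Hypothetical log-likelihoods and predictions, for illustration only.
ll_null, ll_model = -300.0, -225.0
print(likelihood_ratio_statistic(ll_null, ll_model))
print(mcfadden_rho_squared(ll_null, ll_model))
print(hit_ratio([1, 0, 1, 1], [0.8, 0.3, 0.4, 0.9]))
```

Here 1 stands for an express service and 0 for a taxi, matching the reference-level coding used for the utility function.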

Discussions and Conclusions
According to the estimated values of the characteristic variables in Table 8, it can be seen that latent variable factors had significant impacts on the choice results. Single-factor analysis was conducted on travelers' trip mode choice behaviors. Assuming that other characteristic variables remained invariable, the impact of changing a certain characteristic variable on the choice of travel mode was quantified.
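In a binary logit model, holding the other variables fixed, a one-unit increase in a variable multiplies the odds of choosing an express service over a taxi by the exponential of its coefficient. The Python sketch below illustrates this with two coefficients reported for the model (1.678 for convenience and 0.119 for education level); the function names and the 0.5 starting probability are illustrative assumptions.

```python
import math


def odds_ratio(coefficient, step=1.0):
    """Factor by which the odds P1/P0 change when a variable rises by `step`."""
    return math.exp(coefficient * step)


def shifted_probability(probability, coefficient, step=1.0):
    """New choice probability after the variable rises by `step`."""
    odds = probability / (1.0 - probability)
    new_odds = odds * odds_ratio(coefficient, step)
    return new_odds / (1.0 + new_odds)


# Coefficients reported for the estimated model.
print(round(odds_ratio(1.678), 3))   # convenience
print(round(odds_ratio(0.119), 3))   # education level

# Hypothetical passenger who is initially indifferent between the two modes.
print(round(shifted_probability(0.5, 1.678), 3))
```

The factor applies to the odds rather than to the probability itself, so the resulting change in probability depends on the starting probability and can never push it above one.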
(1) Personal attributes
Among the personal attribute variables, the significant ones were age, education level, and income. The regression coefficient of age in the model was −0.493, indicating that travelers were more inclined to take taxis as age increased, which is consistent with the commonly held belief that smartphones are less popular among older travelers. The regression coefficient of education level in the model was 0.119, with an odds ratio of choosing express services over taxis of 1.127, indicating that, in the survey, each one-level increase in education multiplied the odds of choosing express services by 1.127. The regression coefficient of income was positive, showing that passengers preferred express services as income increased.
(2) Travel attributes
Among the travel attribute variables, the travel time and the waiting time were not significant. The regression coefficient of travel purpose was −0.403, showing that non-commuters were more inclined to choose taxis. The regression coefficient of departure place was 1.290, indicating that passengers taking rides from suburban areas preferred online car-hailing. This is consistent with the fact that taxis are mostly concentrated in urban areas, with few in suburban areas. The regression coefficient of travel distance was negative, with an odds ratio of 0.783, meaning that passengers were more inclined to use taxis for long-distance travel, because most long-distance trips were business trips and business passengers were more likely to use taxis.
(3) Attitude perception attributes
Among the attitude perception variables, convenience, comfort, and economy were significant. Reliability and safety did not pass the significance test. The regression coefficient of convenience was 1.678, indicating that passengers perceived express services as more convenient than taxis, with an odds ratio of 5.355. That is, when passengers' perception of convenience improved by one level, the odds of choosing express services were multiplied by 5.355. In contrast, the regression coefficient of comfort was −0.280, showing that passengers were not satisfied with the comfort of express services. The regression coefficient of economy was 0.854, showing that passengers were satisfied with the economy of express services.
This paper selected taxis and express services as the investigated modes of travel. The structural equation model and the factor analysis method were used to calculate the fitted values of the latent variables. The calculation results were substituted into the utility function, and a trip mode choice model for taxis and express services considering the latent variables was established to quantitatively describe the impacts of the latent variables on choice results. The results showed that the SEM-Logit model including latent variables was better than the BL model without latent variables in terms of model precision and hit ratio. Meanwhile, after introducing the latent variables, it was found that convenience, comfort, and economy had a significant influence on the model, and the explanatory power of the model increased accordingly.