Analyzing the Influence of Transportations on Chinese Inbound Tourism: Markov Switching Penalized Regression Approaches

This study investigates the nonlinear impact of various modes of transportation (air, road, railway, and maritime) on the number of foreign visitors to China from its major source countries. Our nonlinear tourism demand equations are specified through the Markov-switching regression (MSR) model, thereby capturing possible structural changes in Chinese tourism demand. Because this empirical study involves many variables and a small number of observations, we may face multicollinearity and endogeneity bias. Therefore, we introduce two penalized maximum likelihood estimators, namely ridge and lasso, to estimate the high-dimensional parameters of the MSR model. We find structural changes in all tourist arrival series, with significant coefficient shifts in the transportation variables. The coefficients are relatively more significant in regime 1 (the low tourist arrival regime). The coefficients in regime 1 are all positive (except railway length in operation), while fewer of the estimated coefficients in regime 2 are positive, and those that are positive are weak. This study shows that, as transportation develops and inbound tourism demand from the ten countries changes, some variables with an originally strong positive effect have only a weak positive effect once tourist arrivals are classified in the high tourist arrival regime.


Introduction
As travel and tourism is one of the world's largest economic sectors, it has played an essential role in contributing to job creation and economic growth and creating prosperity worldwide [1][2][3]. In China, tourism has become an important contributor to the domestic economy since implementing reform and opening-up policies in the early 1980s. Among global destinations, China ranks fourth in total inbound tourists (Forbes, 2019). China Tourism Academy (2020) reported that the Chinese tourism industry generates 79.87 million jobs and 10.94 trillion yuan in revenue, accounting for 10.31% and 11.05% of total employment, and China's GDP, respectively. Therefore, the Chinese government takes tourism development into account when making essential policies on economic growth.
One of the critical factors driving tourism development is transportation, whose importance to tourism development has been widely recognized [4][5][6][7]. Since the Chinese reform and opening-up policies were first put into action, investment in transportation has received considerable attention as a prerequisite for tourism development, and transportation development remains a high priority today. There is general agreement that tourism expands more when transportation systems are better. Recently, China has provided advanced tourism services for inbound international tourists and has constructed and improved its domestic transportation infrastructure, including international airports, high-speed railways, super-highways, expressways, and tunnel-roads, to link tourists with various tourist attractions.
Given the importance of the travel and tourism sector, the links between transportation and tourism have attracted growing attention and debate among scholars in various fields [4][5][6][7][8]. Nevertheless, the empirical findings are ambiguous. Some studies indicated that transportation has a positive impact on tourism; some argued that transportation makes no contribution to tourism development; and others found that the magnitude of the transportation effect differs across periods. Since previous research did not reach a consistent conclusion, we suspect that these ambiguous results are due to the use of different transportation indicators or modes, such as railway transportation [9][10][11][12], road transportation [13,14], air transportation [6,7,15], and maritime transportation [16,17]. In the case of maritime transportation, given the small number of ocean liners currently traveling to and from China, this mode may have only a weak impact on international tourism demand. However, it allows tourists to travel to several port destinations, enjoy various amenities provided onboard, and participate in leisure programs. Park, Lee, Moon and Heo [18] mentioned that Chinese maritime transportation had grown rapidly, especially in the past decade, with the number of cruise tourists rising noticeably.
Therefore, it is more suitable to investigate the impact of different transportation modes on tourism development. In addition, it is a fact that these transportation modes have different times and costs of construction, while the government budget is limited. Hence, understanding the heterogeneous impacts of transportation modes may lead to efficient budget policies. Unfortunately, very few studies are available for China, which econometrically analyze the impact of different transportation modes on tourism development.
This study aims to fill this gap in the literature by ascertaining which transportation modes play an important role in the number of inbound visitors. To accomplish this goal, we consider four modes of transportation (railway, road, air, and maritime), along with macroeconomic determinants as control variables. We analyze the influence of these factors on the number of visitors from China's ten main international tourism sources, namely South Korea, Japan, Russia, the USA, Mongolia, Malaysia, the Philippines, Singapore, India, and Canada. With a deeper understanding of tourism demand, the government can plan and develop tourism-related transportation more confidently. From the methodological point of view, there has been no consensus on whether transportation has an asymmetric effect on tourism development. We suspect that the influence of transportation on tourism may not be stable over time, and that the impact is unlikely to remain linear as transportation expands. Although all transportation modes have affected tourism development in China, their structures and purposes differ markedly. Therefore, we suspect that there is a nonlinear relationship between the scale of transportation and tourism development. In the literature, both machine learning approaches and nonlinear econometric approaches can capture a nonlinear relationship between input and output variables. Huang, Ma, and Hu [19] described machine learning as a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. They also suggested that these approaches could provide high forecasting performance.
Although machine learning approaches enable us to model the nonlinear relationship between transportation modes and China's international tourism, the weight parameters of machine learning models are difficult to interpret economically. Hence, in this study, we instead employ a nonlinear econometric model, the Markov-switching regression (MSR) model, to verify the nonlinear impact of different transportation modes on arrivals from China's top ten inbound tourism source countries. This model also allows us to analyze and compare the business cycle patterns of China's main sources of international tourism. The ability of the MSR model to detect business cycle dates is described by well-known statistical institutions (namely the Economic Cycle Research Institute, ECRI, and the National Bureau of Economic Research, NBER) [20]. Because transportation affects tourism differently across modes, the government and authorities should consider which type of transportation policy to implement in order to manage structural changes, which highlights the importance of investigating the business cycle of the inbound tourism market. We therefore compare the business cycle patterns of China's ten main sources of international tourism. The results may reveal why the structural changes in international tourism differ significantly across source countries as transportation infrastructure is developed.
The rest of the paper is organized as follows. Section 2 reviews the literature. Section 3 describes the data and the methodology for the empirical models. Section 4 presents the empirical results, and Section 5 concludes with a discussion of the major findings and policy implications.

Literature Review
Transportation and tourism are closely related economic activities, and the literature has paid increasing attention to the vital role of transportation in tourism development. In the last two decades, many studies have investigated the impact of various transportation modes, namely road, air, maritime, and railway, on tourism development, but the results remain controversial as to whether these modes promote tourism. Some studies revealed a positive influence on tourism, some indicated a negative effect of transportation on tourism development, and others suggested that transportation plays a supporting role whose magnitude varies over time.
Many studies revealed that transportation plays a vital role in tourism development [5,21]. Khadaroo and Seetanah [6] considered air, land, and sea transports as factors affecting the tourism demand of Mauritius's island. They found that tourists from Asia, Europe, and America are susceptible to the island's transportation. Whereas, Pagliara, Mauriello, and Gomez [22]; Albalate and Fageda [9]; Yin, Pagliara, and Wilson [10]; Yin et al. [12]; and Pagliara, Mauriello, and Russo [11] have produced evidence of the significant effect of high-speed rail on tourist outcomes. Yin et al. [12] documented that high-speed rail improves the accessibility of the tourist to the destination due to the reduction of the tourist's traveling time. Button and Taylor [23] put air transportation in the tourism demand regression and conclude that it is essential in attracting international tourists. Air transport has expanded to new destinations, as well as reduced traveling time. In recent years, global tourism has developed rapidly, and the global air transport network is the main positive factor affecting inbound and outbound tourism [24]. Kanwal et al. [14] investigated the road and transportation infrastructure construction and the community support for tourism in the context of the China-Pakistan economic corridor (CPEC). They mentioned that road infrastructure and road transportation play a great role in tourism development for providing all possible destinations' accessibility. Hardy [25] also suggested that touring routes or self-drive trails have become a more popular attraction, and a better road infrastructure could provide a more satisfactory tourism experience. Although different transportation modes are considered in the tourism literature, the previous studies concluded that improved transportation usually increases a tourist destination's accessibility and increases tourist demand.
In recent years, transportation has played a significant role in explaining China's tourism development as well. For example, Chen and Haynes [26] investigated the impact of the Chinese high-speed rail on international tourism demand using a dynamic panel regression model of 21 countries. They confirmed that high-speed rail produces a weak positive effect on tourist arrivals. Li, Yang, and Cui [27] also conducted a panel regression analysis to provide a comprehensive analysis of the impact of high-speed rail on tourist arrivals in China. They found that high-speed rail has a stronger effect on international arrivals than on domestic ones. Moreover, they found a heterogeneous impact of high-speed rail on tourist arrivals across regions. Dong et al. [28] put the road length (km) per area as the transportation indicator in the tourism demand equation, and the result shows a significant effect of road on tourist arrivals. Jin et al. [29] studied the impact of high-speed rail on ice and snow tourism in Northeast China. They pointed out that after the implementation of high-speed rail, tourism economic ties between cities have increased. The isochron analysis shows that the central city promotes the tourism development of the surrounding cities, the number of daily and weekend trips increases significantly, and the shortest travel time between cities is significantly shortened. However, air and maritime transportations on tourism development are found to be limited for China.
There is still a controversy among the researchers despite the decisive evidence of the positive impact of transportation on tourism development. The studies of Pagliara et al. [22] and Albalate and Fageda [9] revealed a negative impact of high-speed rail on tourist arrivals. They identified several reasons why the effect of high-speed rail on tourist arrivals is not positive. First, the railway network is not appropriately designed and does not correspond to the riders' needs. Second, there is a substitution effect of air transportation. A similar finding is reached by Chen and Haynes [26]. They analyzed and found the weak impact of Chinese high-speed rail on international tourism demand. Some empirical studies have found that transportation improvement may hamper tourism development as the length of stay or traveling is shorter, resulting in reduced travel expenditure [30].
This contradiction among researchers motivates further investigation of the transportation-tourism nexus. This study moves in this direction, taking China, the fourth most visited country in the world, as its case. Previous studies have considered various empirical and methodological aspects of modeling the effect of transportation on tourism. Nevertheless, they have not considered the presence of a structural shift in tourism. Specifically, the relationship between tourism and any transportation mode has been assumed to be linear and constant over time. We therefore suspect that the linear regression models generally used in previous studies may oversimplify the transportation-tourism nexus. An adequate understanding and estimation of structural shifts is essential for formulating and introducing policies on tourism development [30][31][32]. To deal with the possible nonlinear effect of transportation, as well as structural change in tourism, the Markov-switching regression (MSR) model is used in this study. To the best of our knowledge, the nonlinear impacts of different transportation modes on arrivals from the top ten inbound source countries have not been econometrically analyzed for China. Specifically, this is the first attempt to determine whether different transportation modes exert different influences on international tourist arrivals to China.
Another contribution of this study is econometric. Although the MSR approach can address structural changes, insufficient sample data (overparameterization), multicollinearity, and endogeneity problems may still occur in our model. We note that, in the literature, transportation and tourism data are generally collected annually, and the transportation modes may interact through a "substitution effect" [9] and a "complementary effect" between modes. In addition, the transportation modes are measured in a mixture of units, which can still lead to aggregation bias even after logged values are used in the analysis. To ameliorate these concerns, we contribute to the academic literature by introducing regularization methods, namely the ridge estimation of Hoerl and Kennard [33] and the lasso estimation of Tibshirani [34], to estimate all unknown parameters in the MSR model. Ridge and lasso have several advantages over traditional unpenalized maximum likelihood, as they reduce model complexity and prevent over-fitting. In particular, lasso estimation makes the model easier to interpret by eliminating irrelevant variables that are not associated with the response variable (variable selection). It also reduces the size of the estimation problem, enabling maximum likelihood to run faster and making it possible to handle high-dimensional data [35]. The estimated coefficients from the MSR model based on these two methods reveal findings and destination management implications that previous studies have not explored.

Markov-Switching Regression Model
Theory supports classifying international tourist arrivals into a high tourist arrival state and a low tourist arrival state [31,32,36]. In the literature, most studies distinguish two or three states of the economic climate. However, with more than two states the Markov-switching model becomes too complicated to distinguish the characteristics of the data properly, and the resulting states tend to lack persistence and to be highly volatile. The different states of tourism can be described by an unobserved state or regime variable, which takes two discrete values (coded 0 and 1 in [36,37]; the two regimes are labeled 1 and 2 here). The MSR model is defined as

$$y_t = x_t' \beta(s_t) + \sigma(s_t) u_t,$$

where $y_t$ is the dependent variable at time $t$; $x_t$ is a $k \times 1$ vector of observed exogenous or independent variables at time $t$; $\beta(s_t)$ is a $k \times 1$ vector of regime-dependent unknown coefficients; $u_t$ is an error term at time $t$ with a standard normal distribution, $u_t \sim N(0, 1)$; and $\sigma^2(s_t)$ is the regime-dependent unknown variance at time $t$. The unobserved state variable $s_t$ evolves according to a first-order Markov chain with the transition probabilities

$$P = \begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix}, \qquad p_{ij} = \Pr(s_t = j \mid s_{t-1} = i), \qquad p_{i1} + p_{i2} = 1,$$

where $p_{11}$ is the probability of remaining in regime 1, while $p_{22}$ is the probability of remaining in regime 2.
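To make the regime-switching structure concrete, the following minimal Python sketch simulates data from a two-regime model of the kind described above. The function name `simulate_msr`, the parameter values in the usage note, and the use of array indices 0 and 1 for regimes 1 and 2 are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def simulate_msr(T, beta, sigma, p11, p22, seed=0):
    """Simulate y_t = x_t' beta(s_t) + sigma(s_t) * u_t with a two-state Markov chain.

    beta  : array of shape (2, k), one coefficient vector per regime
    sigma : array of shape (2,), one standard deviation per regime
    Index 0 stands for regime 1 and index 1 for regime 2 (illustrative convention).
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    k = beta.shape[1]

    # Transition matrix: row i holds the probabilities of moving from regime i.
    P = np.array([[p11, 1.0 - p11],
                  [1.0 - p22, p22]])

    # Hypothetical regressors: an intercept plus k - 1 standard normal columns.
    x = rng.normal(size=(T, k))
    x[:, 0] = 1.0

    # Draw the initial regime from the steady-state distribution of the chain.
    s = np.empty(T, dtype=int)
    pi1 = (1.0 - p22) / (2.0 - p11 - p22)
    s[0] = 0 if rng.random() < pi1 else 1
    for t in range(1, T):
        s[t] = rng.choice(2, p=P[s[t - 1]])

    # Regime-dependent mean and scale, with standard normal errors.
    u = rng.normal(size=T)
    y = np.einsum("tk,tk->t", x, beta[s]) + sigma[s] * u
    return y, x, s
```

For example, `simulate_msr(200, [[0.0, 1.0], [2.0, 0.5]], [0.5, 1.0], 0.9, 0.8)` produces a series whose intercept, slope, and volatility all shift when the hidden state changes; values of `p11` and `p22` close to one give persistent regimes.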

Estimation
In recent years, there has been significant interest in MSR models, which has led to the development of efficient methods for estimating their parameters, for example, the expectation-maximization (EM) algorithm [38], maximum likelihood estimation [39], and Bayesian estimation [40]. However, MS models have considerably more parameters than comparable linear models; thus, these conventional estimators may suffer from excessive modeling bias and low predictability. In particular, when the number of estimated regime-dependent coefficients $k$ is large, the computation becomes much more difficult.
In this study, two penalty functions, ridge and lasso, are proposed for estimating all unknown parameters $\Theta(s_t) = \{\beta(s_t), \sigma(s_t), p_{11}, p_{22}\}$ of the MSR model. While $y_t$ and $x_t$ are directly observed, $s_t$ is an unobserved latent state variable that must be inferred from the behavior of $y_t$ and $x_t$. Let $\theta = \{\beta(s_t), \sigma(s_t)\}$. This inference takes the form of two regime probabilities,

$$\Pr(s_t = j \mid Z_{t-1}; \theta), \qquad j = 1, 2,$$

where $Z_{t-1}$ is the information available at time $t - 1$ together with $x_t$. These two regime probabilities can be computed with Hamilton's filter [41] in the following steps:

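• Step 1: We use the steady-state (unconditional) probabilities as the initial probabilities $\pi_{1,0}$ and $\pi_{2,0}$:

$$\pi_{1,0} = \frac{1 - p_{22}}{2 - p_{11} - p_{22}}, \qquad \pi_{2,0} = \frac{1 - p_{11}}{2 - p_{11} - p_{22}}.$$

• Step 2: After computing the inferences for the regimes at time $t = 0$, we update the inferences for the regimes at time $t = 1$ as

$$\pi_{j,1} = \frac{f(y_1 \mid Z_0; \theta(s_1 = j)) \, \pi_{j,0}}{\sum_{i=1}^{2} f(y_1 \mid Z_0; \theta(s_1 = i)) \, \pi_{i,0}}, \qquad j = 1, 2,$$

where $f(y_t \mid Z_{t-1}; \theta(s_t = j))$, $j = 1, 2$, is the density function of regime $j$.

• Step 3: We then form the prediction probabilities for the regimes at time $t = 2$ as

$$\pi_{j,2|1} = \sum_{i=1}^{2} p_{ij} \, \pi_{i,1}, \qquad j = 1, 2.$$

• Step 4: We similarly derive the two regime probabilities for $t = 2, \ldots, T$ by repeating Steps 2 and 3, with the prediction probability $\pi_{j,t|t-1}$ taking the place of $\pi_{j,0}$ in the update. The probabilities obtained from this iterative inference are called "the filtered probabilities".

Once the filtered probabilities $\pi_{j,t}$ (for $t = 1, \ldots, T$; $j = 1, 2$) are obtained as above, we can construct the full log-likelihood function as

$$\log L(\Theta(s_t)) = \sum_{t=1}^{T} \log \left( \sum_{j=1}^{2} f(y_t \mid x_t; \theta(s_t = j)) \, \pi_{j,t|t-1} \right), \qquad (11)$$

with $\pi_{j,1|0} = \pi_{j,0}$. As the normal distribution and the lasso and ridge penalty functions are considered in this study, the density function $f(y_t \mid x_t; \theta(s_t = j))$ is penalized by lasso or by ridge, giving a lasso density and a ridge density, respectively. To obtain the penalized maximum likelihood estimator $\hat{\Theta}(s_t)$, we maximize the full log-likelihood function in Equation (11) with respect to $\Theta(s_t)$ for each candidate regularization parameter $\lambda$. Hence:

$$\hat{\Theta}(s_t) = \arg\max_{\Theta(s_t)} \log L(\Theta(s_t); \lambda). \qquad (14)$$

As $\lambda$ increases, the regression coefficients shrink towards zero, while $\lambda = 0$ returns the traditional maximum likelihood estimator. We penalize only $\beta(s_t)$ and choose not to penalize $\sigma(s_t)$, $p_{11}$, and $p_{22}$. If either $\beta(s_t = 1)$ or $\beta(s_t = 2)$ is equal to zero, the model corresponds to a linear relationship between $x_t$ and $y_t$.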
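The filtering steps and the penalized objective can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the estimation code used in the study: the penalty is subtracted from the log-likelihood as $\lambda \sum |\beta|$ (lasso) or $\lambda \sum \beta^2$ (ridge), which is one common formulation and may differ from the penalized densities of the original; the first column of `x` is assumed to be an unpenalized intercept; and the function names are hypothetical.

```python
import numpy as np

def normal_pdf(y, mean, sd):
    """Normal density, evaluated elementwise."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def hamilton_filter(y, x, beta, sigma, p11, p22):
    """Return the filtered probabilities (T x 2) and the log-likelihood.

    beta has shape (2, k) and sigma has shape (2,), one row/entry per regime.
    """
    T = len(y)
    P = np.array([[p11, 1.0 - p11],
                  [1.0 - p22, p22]])
    # Step 1: steady-state probabilities serve as the first prediction.
    pred = np.array([1.0 - p22, 1.0 - p11]) / (2.0 - p11 - p22)
    filtered = np.empty((T, 2))
    loglik = 0.0
    for t in range(T):
        # Density of y_t under each regime.
        dens = normal_pdf(y[t], x[t] @ beta.T, sigma)
        # Step 2: update the regime probabilities with the new observation.
        joint = dens * pred
        lik_t = joint.sum()
        filtered[t] = joint / lik_t
        loglik += np.log(lik_t)
        # Step 3: predict the regime probabilities for the next period.
        pred = filtered[t] @ P
    return filtered, loglik

def penalized_loglik(y, x, beta, sigma, p11, p22, lam, penalty="lasso"):
    """Log-likelihood minus a lasso or ridge penalty on the slope coefficients.

    Assumption: column 0 of x is an intercept and is left unpenalized;
    sigma, p11, and p22 are not penalized.
    """
    _, loglik = hamilton_filter(y, x, beta, sigma, p11, p22)
    slopes = beta[:, 1:]
    if penalty == "lasso":
        pen = np.abs(slopes).sum()
    else:
        pen = (slopes ** 2).sum()
    return loglik - lam * pen
```

In practice, `penalized_loglik` would be passed (with a sign change) to a numerical optimizer for each candidate $\lambda$. Note that a smooth optimizer will not return exact zeros under the lasso penalty, so a dedicated algorithm or a small threshold is typically needed to obtain sparse coefficients.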

Tuning the Regularization Parameter Using Information Criteria
The selection of the regularization parameter $\lambda$ is important for ensuring the consistency of the penalized maximum likelihood estimation in Equation (14). Traditionally, the Bayesian information criterion (BIC) has been used to find the optimal $\lambda$. However, Chen and Chen [42] found that the BIC is too liberal for model selection when the number of estimated parameters is large. Therefore, they introduced the extended BIC (EBIC) for models with a moderate sample size but a huge number of parameters. The EBIC is computed from the following quantities: $L(y_t \mid x_t; \hat{\Theta}(s_t); \lambda)$ is the maximized likelihood of $\Theta(s_t)$ given the candidate $\lambda$; $P$ is the number of parameters in the model; $v(\lambda)$ is the degrees of freedom, measured by $T - k^*$, where $k^*$ is the number of non-zero coefficients; and $\xi$ is a value in the interval $[0, 1]$. In this study, we set $\xi = 0.5$ [43]. The best $\lambda$ is the one with the lowest EBIC.
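As an illustration of how $\lambda$ can be tuned with an extended BIC, the sketch below uses the general form proposed by Chen and Chen [42], $-2 \log L + k^* \log T + 2 \xi \log \binom{P}{k^*}$, with the number of non-zero coefficients $k^*$ as the model size. This is an assumption made for illustration: the exact expression used in the study, including how $v(\lambda)$ enters, may differ. The function names and the candidate fits in the usage note are hypothetical.

```python
import math

def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def ebic(loglik, k_star, T, P, xi=0.5):
    """Extended BIC in the general Chen-and-Chen form (illustrative assumption).

    loglik : maximized log-likelihood for a given lambda
    k_star : number of non-zero coefficients
    T      : number of observations
    P      : total number of candidate parameters
    xi     : value in [0, 1]; xi = 0 gives the ordinary BIC
    """
    return -2.0 * loglik + k_star * math.log(T) + 2.0 * xi * log_binom(P, k_star)

def select_lambda(fits, T, P, xi=0.5):
    """Pick the lambda with the lowest EBIC from (lambda, loglik, k_star) triples."""
    scores = [(ebic(ll, k, T, P, xi), lam) for lam, ll, k in fits]
    return min(scores)[1]
```

With hypothetical fits `[(0.0, -10.0, 13), (0.5, -11.0, 5), (5.0, -30.0, 0)]`, `T = 22`, and `P = 13`, the unpenalized fit is charged heavily for its 13 non-zero coefficients and the heavily penalized fit loses too much likelihood, so the middle value of $\lambda$ is selected.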
In addition, to better understand the theoretical analysis, we summarize the important notation presented in Table 1.

Data Sources
In this study, the MSR model has been proposed to investigate the impacts of various transportation modes on the top ten source countries of Chinese inbound tourism. The dataset collected contains information on Chinese transportations and the number of tourist arrivals. The largest number of tourists originated from the following ten countries: South Korea, Japan, Russia, the USA, Mongolia, Malaysia, Philippines, Singapore, India, and Canada. We consider the annual time series data covering 1996-2018. The data are collected from various sources and databases (Table A1, Appendix A).
As permitted by data availability and tourism significance, the data required for this study are collected from several sources. We take the number of tourist arrivals as the dependent variable, representing China's tourism development [4][5][6][7]. These data are collected from the CEIC database (https://www.ceicdata.com/en (accessed on 5 February 2021)). The core explanatory variables of interest are four transportation modes, namely road, railway, maritime, and air: (i) road transportation is proxied by the Chinese highway length [28]. This variable measures the availability and quality of internal land transportation [6,44]; (ii) railway transportation is proxied by the Chinese railway length in operation and fixed-asset investment on the railway [9,10,12]. These indicators reflect the safety, convenience, timeliness, flexibility, and affordability of transportation for tourists [21]; (iii) maritime transportation is proxied by the Chinese coastal major port, the Chinese river major port, and the Chinese navigable length of the river. The first two variables capture the embarking and landing of passengers between ship and land. Thana [45] mentioned that port infrastructure is the start of the transport chain: travel can begin at sea and continue with intermodal transport (road, rail, or air). The data regarding road, rail, and air are collected from the Ministry of Transport of the People's Republic of China (http://www.epschinastats.com/db_transportation.html (accessed on 5 February 2021)); (iv) air transportation is proxied by the number of international air routes in China, the length of China's international air routes, the number of Chinese airports, the number of international air passengers in China, and Chinese aircraft daily utilization. Air is included in the tourism equation as it proxies greater speed, safety, service quality, and reliability [21]. As suggested by [6,15,44], air transportation plays a vital role in boosting tourism growth. The aircraft daily utilization variable measures aircraft productivity. The air transportation data are retrieved from the CEIC database (https://www.ceicdata.com/en (accessed on 5 February 2021)).
Nevertheless, tourism has also been recognized as a volatile industry that is affected not only by tourism infrastructure (transportation) [5,6], but also by local and global economic conditions, macroeconomic policies, and security [46][47][48]. Although China has many remarkable tourist attractions, Chinese inbound tourism is still highly vulnerable to these factors, which creates challenges in establishing tourism development strategies and tourism infrastructure, as well as in delivering tourism products and services at consistent standards [49]. Therefore, economic factors are also included as control variables in this study. The choice of these control variables is in line with the literature. The control variables used to explain tourist arrivals include the inflation rate [50,51], as well as the GDP per capita of the country of origin and of China [6,52]. Although the exchange rate is another important factor affecting tourism demand [48,53], exchange rate data were unavailable for the period 1996-2018, which is a limitation of this study. The ridge and lasso estimations are employed to mitigate this issue as well as the multicollinearity among the independent variables. All economic variables are collected from the World Bank database (https://data.worldbank.org (accessed on 5 February 2021)). All statistical calculations and graphics are carried out in R.

Empirical Model
This study intends to understand the regime-switching effect of transportation modes on international tourism development. Therefore, a two-regime MSR model is constructed, and our empirical model for each source of tourists to China can be defined as

$$\ln TOUR_{i,t} = \beta_0(s_t) + \beta_1(s_t)' \ln AIR_{i,t} + \beta_2(s_t)' \ln MARITIME_{i,t} + \beta_3(s_t) \ln ROAD_{i,t} + \beta_4(s_t)' \ln RAILWAY_{i,t} + \beta_5(s_t)' CONTROL_{i,t} + \sigma(s_t) u_{i,t},$$

where $\beta_0(s_t)$ is the regime-dependent intercept, $\beta_1(s_t)$ to $\beta_5(s_t)$ are the regime-dependent coefficients, and $\ln TOUR_{i,t}$ is the log of tourist arrivals from a given country $i$ to China at time $t$. The vector $\ln AIR_{i,t}$ is a set of air transportation variables (the number of international air routes of China, the length of the international air routes of China, the number of Chinese airports, and Chinese aircraft daily utilization); $\ln MARITIME_{i,t}$ is a vector of maritime transportation variables (Chinese coastal major port, Chinese river major port, and Chinese navigable length of the river); $\ln ROAD_{i,t}$ is the road transportation variable (Chinese highway length); $\ln RAILWAY_{i,t}$ is a set of railway transportation variables (Chinese railway length and Chinese fixed-asset investment on the railway); and $CONTROL_{i,t}$ is a set of control variables (the relative real GDP per capita between the home country and China, and the Chinese inflation rate). We apply the natural logarithm difference transformation to all variables, except for the inflation rate (in percentage units), to reduce possible heteroscedasticity.
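The log-difference transformation mentioned above converts a level series into approximate growth rates. A minimal Python sketch follows; the function name and the arrival figures in the usage note are hypothetical and are not data from the study.

```python
import numpy as np

def log_diff(series):
    """Return ln(x_t) - ln(x_{t-1}) for a strictly positive series."""
    series = np.asarray(series, dtype=float)
    return np.diff(np.log(series))
```

For instance, `log_diff([100.0, 110.0, 121.0])` gives two values equal to ln(1.1), roughly 0.095, reflecting constant 10% growth. The transformed series is one observation shorter than the original.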
The descriptive statistics of the variables listed in Table 1 are summarized in Table 2. Ranked by the mean growth of tourist arrivals, the ten source countries are India (0.144), South Korea (0.113), Mongolia (0.094), Canada (0.093), Russia (0.092), Malaysia (0.088), the Philippines (0.081), the USA (0.080), Singapore (0.069), and Japan (0.061). Figure 1 illustrates the yearly tourist arrivals from these ten countries. Only Japan and Russia exhibit a weak upward trend, while the others show persistent growth. We also observe significant drops in tourist arrivals in 2004 and 2008, corresponding to the SARS epidemic in China; during 2007-2008, corresponding to the global financial crisis; and during 2014-2015, which appears to be consistent with steep visa fees, cumbersome rules for tourists, and harmful pollution. This indicates that there might be a structural change in Chinese tourism demand.
Note: JB denotes the Jarque-Bera normality test, and ZA denotes the Zivot and Andrews nonlinear unit root test. "*", "**" and "***" denote rejection of the null hypothesis at the 10%, 5%, and 1% significance level, respectively.

Unit Root Test
Before estimating the MSR model to analyze the impact of transportation on tourism, as well as the business cycle of the Chinese tourism development, we need to check whether the data is stationary, using the unit root test as non-stationarity of the variables can lead to a spurious regression. In this study, we consider both linear and nonlinear unit root tests, namely the Augmented Dickey-Fuller (ADF) test [54] and Zivot and Andrews [55]. Note that the ADF test does not account for the potential structural breaks in the time series, and it is possible to provide inaccurate results when a structural break has occurred in the time series. Therefore, we also consider the ZA test to validate our stationary test, taking into consideration the presence of structural breaks.
In this study, we consider the ADF test, which regresses the first difference $\Delta y_t$ on the lagged level ($y_{t-1}$), a linear deterministic trend ($t$), and $P$ lagged first differences ($\Delta y_{t-p}$). It can be expressed as

$$\Delta y_t = \alpha + \theta y_{t-1} + \gamma t + \sum_{p=1}^{P} \delta_p \Delta y_{t-p} + \varepsilon_t, \qquad (17)$$

where $\Delta$ is the first difference operator and $\varepsilon_t$ is the error term. The null hypothesis of a unit root ($H_0: \theta = 0$) is tested against the alternative that the variable is stationary ($H_1: \theta < 0$). However, the ADF test does not account for possible structural breaks when testing for unit roots. Zivot and Andrews [55] therefore generalized Equation (17) as

$$\Delta y_t = \alpha + \theta y_{t-1} + \gamma t + \phi DU_t + \psi DT_t + \sum_{p=1}^{P} \delta_p \Delta y_{t-p} + \varepsilon_t,$$

where $DU_t = 1$ if $t > TB$, else $DU_t = 0$, is a dummy variable for a mean shift at each potential breakpoint ($TB$), and $DT_t = t - TB$ if $t > TB$, else $DT_t = 0$, is the trend shift variable. Among all possible breakpoints, the optimal $TB$ is the one that minimizes the t-statistic on $\theta$, that is, the one giving the strongest evidence against the unit root null. The null hypothesis is that the variable contains a unit root with a drift and no break, whereas the alternative hypothesis is that the variable follows a trend-stationary process with a single break occurring at an unknown time.
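The ADF regression can be illustrated with a short Python sketch that estimates the test regression by ordinary least squares and returns the t-statistic on $\theta$. The function name `adf_tstat` and the default lag length are assumptions for illustration, and this is not the routine used in the study. Note that the statistic must be compared with Dickey-Fuller critical values rather than with the usual t-distribution.

```python
import numpy as np

def adf_tstat(y, lags=1):
    """t-statistic on theta in
    dy_t = alpha + theta * y_{t-1} + gamma * t + sum_p delta_p * dy_{t-p} + e_t.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                       # dy[i] = y[i + 1] - y[i]
    n = len(dy) - lags                    # usable observations after lagging
    target = dy[lags:]                    # dependent variable: current difference
    level = y[lags:-1]                    # lagged level y_{t-1}
    trend = np.arange(1, n + 1, dtype=float)
    cols = [np.ones(n), level, trend]
    for p in range(1, lags + 1):
        cols.append(dy[lags - p:len(dy) - p])   # p-th lagged difference
    X = np.column_stack(cols)

    # OLS estimates and the conventional coefficient covariance matrix.
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    s2 = resid @ resid / (n - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])
```

A stationary series yields a strongly negative statistic, while a random walk yields a value close to zero. A Zivot-Andrews style search would add the $DU_t$ and $DT_t$ columns for each candidate breakpoint and keep the breakpoint with the most negative statistic.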
The results of these two tests are reported in Table 2. The result shows that the ADF and ZA tests reject the nonstationary null hypothesis, indicating that all the time series variables are stationary. Therefore, these variables are appropriately used for studying the relationship between Chinese inbound tourism and transportation and the tourism cycle further.

Model Comparison
In this study, the two-regime Markov-switching lasso regression (MS-LR) and Markov-switching ridge regression (MS-RR) models are used to investigate the nonlinear effects of transportation modes on tourist arrivals to China from the top ten source countries. We therefore validate the performance of our MSR models by comparing them with single-regime models, namely the linear ridge regression (RR) and lasso regression (LR) models. In this comparison, the EBIC is used as the comparison measure, and the lowest EBIC indicates the preferred, more parsimonious model. Table 3 shows the comparison of five different models for all ten tourism demand equations. The result confirms our hypothesis of a nonlinear transportation-tourism nexus, as the EBICs of the two-regime models, MS-LR and MS-RR, are lower than those of the single-regime models, namely linear-MLE, RR, and LR. Focusing on the EBICs of our proposed models, Table 3 shows that MS-LR is superior to MS-RR for all arrival series except that of Russian tourists. This finding is not surprising, as MS-LR is more flexible than MS-RR and performs robust parameter estimation and variable selection in the Markov-switching regression simultaneously. However, we cannot conclude that lasso is always the best estimator for the MSR model, as the characteristics of a data set can influence the performance of a regularized estimator. Emmert-Streib and Dehmer [56] suggested that no single method always dominates the others, since each has specific strengths and weaknesses. Therefore, the MS-RR model may be preferred in the case of Russian tourist arrivals.

Estimation Result
We use two estimation techniques because the traditional MS-regression estimated by MLE can have a large variance when the number of predictors is large relative to the number of observations. Estimation becomes infeasible when the number of parameters exceeds the number of observations. Therefore, two types of penalized likelihood functions, namely ridge and lasso, are applied to our MS regression.
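To illustrate how the two penalties act, the sketch below applies each one to a single unpenalized coefficient in the simplest one-coefficient case. It is not the authors' estimator, which penalizes the full Markov-switching likelihood; the function names, the coefficient values, and the penalty weight `lam` are hypothetical.

```python
def ridge_shrink(b, lam):
    """Minimizer of 0.5*(b - beta)**2 + 0.5*lam*beta**2: proportional shrinkage."""
    return b / (1.0 + lam)


def lasso_shrink(b, lam):
    """Minimizer of 0.5*(b - beta)**2 + lam*abs(beta): soft-thresholding."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # coefficients no larger than lam in absolute value are removed


unpenalized = [1.7, -2.5, 0.08, 0.3]  # hypothetical unpenalized estimates
lam = 0.5                             # hypothetical penalty weight
ridge = [ridge_shrink(b, lam) for b in unpenalized]
lasso = [lasso_shrink(b, lam) for b in unpenalized]
print("ridge:", ridge)
print("lasso:", lasso)
```

Ridge scales every coefficient toward zero without removing any, whereas lasso sets the two small coefficients exactly to zero, which is why lasso doubles as a variable-selection step.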
As shown in Section 4.1, MS-LR is superior to MS-RR for all tourist arrival series except Russian tourists. Therefore, only the results from the best model specification for each series are reported in Table 4 and interpreted. We note that the ridge penalty keeps all predictors in the MS-regression model, while lasso ensures sparsity by shrinking some coefficients exactly to zero. Therefore, t-statistics are used to test the significance of each predictor only in the MS-RR model.

Note: GDPO is the relative Gross Domestic Product of the home country and China. For the ridge estimation, "**" and "***" denote rejection of the null hypothesis at the 5% and 1% significance levels, respectively. Values in parentheses ( ) are standard errors. No standard errors or statistical inference are reported for the lasso estimation, as the insignificant parameters are already eliminated by the penalty term.

The above results show that transportation has a regime-switching effect on arrivals from the top ten source countries of Chinese inbound tourism. The estimated intercept terms of regimes 1 and 2 differ significantly, and the intercept of regime 1 is lower than that of regime 2. Therefore, we can interpret regime 1 as a low tourist arrival regime and regime 2 as a high tourist arrival regime.
For the low tourist arrival regime (regime 1), the first conclusion is that transportation plays an important role together with the control variables. This is most pronounced for Mongolia (MO), the Philippines (PH), Malaysia (MA), and India (IN), for which ten out of eleven transportation indicators are not eliminated from the models. These results indicate that the development of all Chinese transportation infrastructure, namely railway, maritime, air, and road, could attract more tourists from Mongolia, the Philippines, Malaysia, and India to China. Second, the Chinese railway length in operation (RLO) shows a strongly negative impact on inbound tourism from all ten countries. Surprisingly, the net impact of Chinese railway length on tourism does not seem to be consistently positive. There are two possible explanations. The first is in line with Pagliara et al. [22] and Albalate and Fageda [9], who suggested that the railway network is not appropriately designed and does not correspond to travelers' needs. The second is the substitution effect between railway and air transportation: the substitution of railway for aviation has decreased the number of tourist arrivals in China, which has indirectly resulted in negative effects on inbound tourism. However, the Chinese fixed-asset investment in railways (RFAI) has a significant effect with the expected positive sign. That is, higher investment in railways leads to higher tourist arrivals from Malaysia, South Korea, Russia, Singapore, Mongolia, and the Philippines. Third, the maritime transportation indicators show a mix of positive and negative impacts on tourism demand. The Chinese river port and the navigable length of rivers have a positive impact on tourism demand, whereas the Chinese coastal port has a negative impact.
It is surprising that the operation of Chinese coastal ports negatively affects tourist arrivals from Japan, the Philippines, India, Russia, and the USA. A possible reason is that both tourists and businesses rely on water transportation, the former for travel and the latter for imports and exports; thus, increasing the operation of Chinese coastal ports may increase traffic intensity and extend the time cruisers remain at port destinations. Lau and Yip [57] mentioned that coastal ports require advanced facilities to accommodate the increasing size of cruise ships and the increased cruiser transit. Therefore, inefficient port facilities cannot generate higher tourist arrivals and can even reduce coastal tourism. In addition, maritime transport is sometimes associated with threats to the marine environment, as pollution harms society's interests in fields ranging from human existence to recreation, including tourism [58].
Fourth, the air transportation and airport infrastructure variables (NIAR, LIAR, APT, ADU) are a relatively important element in generating tourism, especially daily aircraft utilization (ADU). We find that daily aircraft utilization has a positive impact on arrivals from all sources of Chinese international tourism. Note that this variable can be viewed as a measure of aircraft productivity; thus, this result provides evidence that aircraft productivity plays a vital role in boosting Chinese tourism growth. This interpretation is consistent with the results of Xie and Tveterås [44], Yin et al. [12], Khadaroo and Seetanah [6], and Eric, Semeyutin, and Hubbard [15]. Fifth, based on the magnitudes of the transportation coefficients, we find that ADU has the largest positive impact on Chinese tourism demand, while RLO has the largest negative impact. The logarithmic coefficients of daily aircraft utilization are positive across source countries and range from 0.0794 for South Korean tourists to 1.7347 for Mongolian tourists. The logarithmic coefficients of Chinese railway length in operation are negative across source countries and range from −2.5162 for Japanese tourists to −0.4678 for South Korean tourists. Overall, judging by these coefficients, tourists are particularly sensitive to railway infrastructure and aircraft productivity.
In relation to the control variables, we find that the relative GDP of the originating country has the strongest impact on foreign tourist arrivals to China. The elasticities range from 0.0343 for Singaporean tourists to 0.5561 for Indian tourists, which means that a 1% increase in GDPO is associated with an increase of around 0.0343% to 0.5561% in foreign tourist arrivals.
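The elasticity reading above can be checked with a short calculation. The sketch assumes a log-log specification, which is what treating the coefficients as elasticities implies; the function name is hypothetical, and only the two coefficient values come from the text.

```python
def pct_change_in_arrivals(elasticity, pct_change_in_x):
    """Exact percentage change in arrivals implied by a log-log coefficient.

    In a log-log model, arrivals scale by (1 + dx) ** elasticity when the
    regressor grows by the fraction dx.
    """
    return ((1.0 + pct_change_in_x / 100.0) ** elasticity - 1.0) * 100.0


# GDPO elasticities quoted in the text: lowest (Singapore) and highest (India).
low = pct_change_in_arrivals(0.0343, 1.0)
high = pct_change_in_arrivals(0.5561, 1.0)
print(f"1% rise in GDPO -> {low:.4f}% to {high:.4f}% more arrivals")
```

For a 1% change the exact figure is very close to the coefficient itself, which is why the coefficient is usually read directly as a percentage effect; the approximation weakens for larger changes.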
For the high tourist arrival regime (regime 2), a brief interpretation of the estimation results in Table 4 is given below. First, transportation is also confirmed to play an essential role in this regime, although the effect is less pronounced than in regime 1. Second, we observe that the HWL and ADU variables are key determinants attracting more tourists from all countries. Bao et al. [59] suggested that tourists spend most of their time on the roads during high travel periods, resulting in a lower-quality vacation and an unpleasant travel experience. Therefore, the development of new highways could relieve traffic congestion and provide an opportunity to explore new destinations during the high tourist arrival regime. For ADU, we expect that the supply of airport infrastructure is still limited and does not match the demand of international tourists during the high tourism regime; hence, improving aircraft productivity would enhance capacity utilization. Consequently, it makes economic sense to raise aircraft productivity to gain more tourist arrivals from these ten countries. Third, maritime transportation shows a weak positive impact for seven out of ten source countries (the Philippines, Singapore, Malaysia, India, the USA, Russia, and Canada), as reflected in the relatively small coefficients of RPL and NLR. Fourth, among the control variables, the relative GDP of the originating country has a significant impact on the number of foreign tourists from all countries except the USA and Canada. A possible explanation is that economic expansion may generate a positive economic climate that favors international tourism activities. However, in the case of the USA and Canada, the link between economic growth and tourism is quite limited.
This minor role of the originating country's GDP is not surprising if we consider that tourists from the USA and Canada, who are high-income tourists, are not dependent on the price levels in China [8]. Van Can [60] confirmed that the choice of high-income travelers does not depend on price.
Comparing these two regimes, we observe that the estimated coefficients are more significant in regime 1 (low tourist arrival regime). The estimated coefficients in regime 1 are all positive (except for railway length in operation). In contrast, fewer of the estimated coefficients in regime 2 are positive, and they are weak, which reveals that Chinese transportation has an asymmetric impact on China's inbound tourism. This study shows that, in the process of transportation development and changing inbound tourism demand from the ten countries, the originally strong positive effects of some variables become weak positive effects. These results can be explained by the fact that the number of tourist arrivals is greater in regime 2 than in regime 1. Therefore, transportation may not be the key driver of tourist arrivals in the second regime. Furthermore, there is evidence that aircraft productivity looks more promising than other variables in regime 1, while both aircraft productivity and highways are more pronounced in regime 2. This result suggests that different development plans are required in the two regimes to attract Chinese inbound tourism.

Table 5 shows the probabilities of being in regime 1 and regime 2 for the best-fit MS-model (presented in Table 4) for the 10 source countries. The first conclusion is that the transition probabilities are similar across source countries, ranging from 0.3114 (Japan) to 0.6298 (Singapore) for p_11 and from 0.9451 (USA) to 0.9937 (Malaysia) for p_22. Specifically, when Chinese tourism demand is in regime 1 at time t, the probability that it remains in regime 1 at time t + 1 ranges from 0.3114 to 0.6298. Likewise, when Chinese tourism demand is in regime 2 at time t, the probability that it remains in regime 2 at time t + 1 ranges from 0.9451 to 0.9937. These results suggest that the chance of switching out of the high tourist arrival regime is low, whereas the low tourist arrival regime is much less persistent.
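The persistence probabilities p_11 and p_22 reported above translate into expected regime durations through the standard result for a first-order Markov chain, 1/(1 − p_ii). The sketch below applies it to probabilities quoted in the text; the function name is hypothetical.

```python
def expected_duration(p_stay):
    """Expected number of consecutive periods spent in a regime when the
    probability of remaining in it from one period to the next is p_stay."""
    if not 0.0 <= p_stay < 1.0:
        raise ValueError("p_stay must lie in [0, 1)")
    return 1.0 / (1.0 - p_stay)


# Persistence probabilities quoted in the text.
for label, p in [("p_11 = 0.3114", 0.3114),
                 ("p_11 = 0.6298", 0.6298),
                 ("p_22 = 0.9451", 0.9451)]:
    print(label, "->", round(expected_duration(p), 4), "periods")
```

These inputs give roughly 1.45, 2.70, and 18.21 periods, in line with the durations reported in this section. Because the formula is very sensitive when p_ii is close to 1, probabilities rounded to four decimals will not reproduce the longest reported durations to every digit.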
Also, the expected duration of regime 1 ranges from 1.4522 years for Japanese tourists to 2.7012 years for Singaporean tourists, and that of regime 2 ranges from 18.2149 years for Japanese tourists to 158.302 years for Singaporean tourists (durations are in years because the data are annual). This suggests that the high tourist arrival regime lasts approximately 10-100 times as long as the low tourist arrival regime. Based on these findings, we can conclude that, during 1995-2018, the Chinese tourism market had a strong tendency to remain in the high tourist arrival regime and a weak tendency to shift from the high to the low tourist arrival regime. This implies that the expansion effect of the transportation modes on Chinese tourism is promising, as they keep tourist arrivals high most of the time. Finally, we investigate the tourism cycle for all ten source countries using the filtered probabilities (π_{1,t} and π_{2,t}). The plot of the filtered regime probabilities shows when tourist arrivals are likely to be in the high (regime 2) or low (regime 1) state. The filtered probabilities of the high tourist arrival regime are illustrated in Figure 2. The figure shows that the filtered probabilities follow similar time-varying paths for each of the ten tourist arrival series and that the high tourist arrival state (regime 2) is persistent and long-lasting. This indicates that Chinese transportation resulted in structural change in overall Chinese tourism demand. Moreover, we observe a sudden drop in the probabilities of staying in this regime around 2015-2016 for all tourist arrivals (except South Korea and the Philippines).
This drop may reflect the emergence of severe pollution and the introduction of censorship and complicated visa requirements, which made traveling to and within China riskier and more expensive, respectively. Moreover, there is a sharp decline around 1997-1998 in the probabilities for South Korean, Malaysian, Indian, and Philippine tourist arrivals, which appears to coincide with the Asian financial crisis. Furthermore, the probabilities for Malaysian and Singaporean tourists are low in 2007-2008, which corresponds to the global financial crisis originating in the USA. As a result of the 2007-2008 US crisis, the global economy exerted adverse spillover effects on Asian tourism [61].
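The filtered probabilities π_{1,t} and π_{2,t} discussed above are typically obtained with Hamilton's forward filter. The sketch below is a simplified illustration rather than the authors' code: it assumes Gaussian regime densities with regime-specific means in place of the full regression x_t'β, and the function names, input series, means, standard deviations, and transition probabilities are all hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def hamilton_filter(y, means, sds, p11, p22, init=(0.5, 0.5)):
    """Filtered probabilities [(pi_1t, pi_2t), ...] for a two-regime model."""
    # Row i of the transition matrix holds Pr(s_t = j | s_{t-1} = i).
    trans = [[p11, 1.0 - p11], [1.0 - p22, p22]]
    prev = list(init)
    filtered = []
    for obs in y:
        # Prediction step: regime probabilities given information up to t-1.
        pred = [prev[0] * trans[0][j] + prev[1] * trans[1][j] for j in range(2)]
        # Update step: reweight each regime by the likelihood of the observation.
        joint = [pred[j] * normal_pdf(obs, means[j], sds[j]) for j in range(2)]
        total = joint[0] + joint[1]
        prev = [joint[0] / total, joint[1] / total]
        filtered.append((prev[0], prev[1]))
    return filtered


# Hypothetical series: two low observations followed by two high ones.
probs = hamilton_filter([1.0, 1.1, 5.0, 5.2], means=(1.0, 5.0), sds=(0.5, 0.5),
                        p11=0.6, p22=0.95)
for t, (pi1, pi2) in enumerate(probs, start=1):
    print(t, round(pi1, 4), round(pi2, 4))
```

A sudden fall in π_{2,t}, like the drops around 1997-1998 or 2015-2016 described in the text, appears in such a filter when an observation is far more likely under the low-regime density than under the high-regime one.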
In light of these empirical and statistical facts regarding international tourist arrivals in China, our empirical results are quite consistent with the historical record, which suggests that our models are well suited to this type of empirical study. We observe a significant structural change for South Korean, Malaysian, Indian, and Philippine tourists during 1997-1998, for Malaysian and Singaporean tourists during 2007-2008, and for all tourist arrivals (except South Korea and the Philippines) during 2015-2016. These episodes correspond to external factors (economic crises) and internal factors (the Chinese environment and visa policy), which could steadily push China down the list of top tourist destinations.
This study implies that there exist two regimes of international tourist arrivals to China. Unlike the existing literature, we show that the impacts of transportation modes differ from one another and between the two regimes. Our evidence from the filtered probabilities reveals that unobserved events (such as economic crises and domestic regulations) play an important role in China's international tourist arrivals. These results provide useful information for policymakers preparing tourism policy for periods of high and low tourist arrivals.

Discussion, Conclusions, and Suggestions
This empirical study explores the influence of various transportation modes, namely air, road, maritime, and railway, on Chinese inbound tourism from the top ten source countries from 1995 to 2018 using Markov-switching regression (MSR) models. There are three distinctive features of this paper. First, this paper compares the magnitude of the impact of various transportation modes when international tourism to China is in the low tourist arrival regime and in the high tourist arrival regime. Specifically, the MSR models allow the impact of the transportation modes to shift structurally between regimes. Second, this paper applies ridge and lasso estimation to the parameters of the tourist arrival equations for the 10 source countries, after comparing the linear models against the alternative switching models. We note that ridge and lasso are incorporated in the MSR models to reduce complexity and prevent over-fitting. These techniques also reduce the size of the problem so that maximum likelihood works faster, making it possible to handle high-dimensional data [62]. Third, the tourism cycle of international arrivals from the 10 source countries is detected and illustrated.
Overall, the impacts of transportation modes on international tourist arrivals to China from the top ten source countries are mainly nonlinear, and the MSR models capture two tourist arrival regimes. In relation to the magnitude of the effects of transportation modes on the Chinese tourism market, we observe that aircraft productivity makes the largest positive contribution to Chinese tourism demand, while railway length in operation has the largest negative impact. Surprisingly, while railway transport infrastructure is commonly believed to affect tourism demand positively, our study does not find consistent evidence of a positive relationship between railway transport infrastructure and tourist outcomes. As suggested by Albalate and Fageda [9], railway transportation may be more competitive in terms of travel time, frequencies, and comfort, but not necessarily in terms of price. Hence, if visitors are more sensitive to price than to time, the overall competitiveness of railway transportation relative to aviation may not positively affect tourist outcomes. In addition, Pagliara et al. [22] and Albalate and Fageda [9] suggested that the railway network is not appropriately designed and does not correspond to visitors' needs. For Chinese railway transportation, the first explanation may not be the key source of the negative effect on tourism demand, as Chinese railway prices are relatively low compared to those of other countries. We expect that an inappropriate design of the railway network in China is the main problem. At the end of 2018 there were 131,000 km of railway across China, of which 29,000 km were high-speed rail, the largest high-speed rail system in the world [63]. However, the railway network, especially the traditional railway, may fail to transport all passengers to their destinations, as the network in most places is not in good condition and access to destinations is therefore problematic.
Moreover, the expansion of the railway network may lead to long travel times and unpleasant train journeys. For aircraft productivity, the result shows a strong positive impact for all tourist arrival series, signifying that aircraft productivity is one of the main factors affecting Chinese tourism demand.
Comparing the high and low tourist arrival regimes, we find a stronger impact of transportation modes in the low tourist arrival regime than in the high tourist arrival regime. The strong positive effect of some variables becomes a weak positive effect. This indicates that transportation is a key driver of international tourist arrivals in the low tourist arrival regime. We also find a significant structural change for South Korean, Malaysian, Indian, and Philippine tourists during 1997-1998, corresponding to the Asian financial crisis; for Malaysian and Singaporean tourists during 2007-2008, corresponding to the global financial crisis; and for all tourist arrivals (except South Korea and the Philippines) during 2015-2016, corresponding to Chinese visa restrictions and pollution.
The impacts of transportation, especially highways and aircraft productivity, on Chinese tourism become more salient when tourist arrivals are classified in the low tourist arrival regime. Thus, expanding highway construction and improving aircraft productivity could generate sizable fiscal revenues from international tourist arrivals. In addition, by continuously improving aircraft productivity in the high tourist arrival regime, China could attract more international tourists. This suggests that China's government should continue to pursue innovations and technologies that raise aircraft productivity, as well as simplify procedures and improve efficiency. Specifically, if tourism demand is growing rapidly but capacity is limited, it makes economic sense to expand stretched capacity or increase air transportation productivity in China.
Furthermore, the structural change in Chinese tourism demand is another critical issue observed in this study. If it is ignored, transportation development may be inappropriate and government policy misdirected. The Chinese government should be aware of the factors affecting structural change and incorporate them into policy planning.
Some limitations suggest directions for future work. First, we use annual data, while monthly or quarterly data may be better for understanding structural change in tourism demand. Future research may consider collecting monthly or quarterly data to better capture the structural changes in the tourism cycle. Second, the variables analyzed in this paper are transportation and some macroeconomic factors (control variables). Future research can introduce other variables, such as the number of five-star hotels, the exchange rate, and population. Third, this paper uses the number of international tourist arrivals from each source country; future research may consider international tourist receipts. Finally, as suggested by Thai, Wu, and Xiong [64] and Stai et al. [65], social characteristics and personalization are key factors determining human preference. Therefore, further research may consider these variables and investigate their effects on tourists' preferences for transportation modes.

Data Availability Statement:
The data used in the empirical analyses are available online at the CEIC database, World Bank database, and http://www.epschinastats.com/db_transportation.html (accessed on 5 February 2021). The data are also available upon request.