Modeling the Trip Distributions of Tourists Based on Trip Chain and Entropy-Maximizing Theory

Suburban tourist railways are an emerging mode of tourist transportation. Knowing the travel demand and trip distribution patterns of tourists is an important prerequisite for planning and constructing suburban tourist railways. However, this issue has so far attracted very little research attention. Therefore, this paper proposes a model for forecasting the trip distribution of tourists who travel by suburban tourist railway. Based on an analysis of the characteristics of tourists' trips and using the trip chain method, the frequency, order, and distance of tourists' trips and the visiting volume of their stay points are studied in depth. A tourist trip distribution forecasting model is then built, which uses Entropy-Maximizing theory to predict the probability distribution of trip chains and thereby obtain the distribution of tourists within the city. A case study of H city was conducted to test the proposed model. The results show that the model output reflects the real trip distribution characteristics of tourists very well, demonstrating the applicability and effectiveness of the proposed model.


Introduction
With the development of tourism and suburban railways, the suburban tourist railway has become an emerging solution to tourists' need for "fast move, slow tourism" [1,2]. A suburban tourist railway connects the tourist attractions, transportation hubs, and passenger distribution centers located in the urban and suburban areas of a city. It also shares the characteristics of rail transit, such as large capacity, comfort, speed, and safety [3,4]. A suburban tourist railway could therefore significantly optimize a city's tourist traffic network and strengthen the connections between urban and suburban tourist attractions, bringing benefits (e.g., efficiency and convenience) to tourists, the tourism industry, and the city's transportation system, especially when the city faces large numbers of tourists and heavy traffic congestion. Thus, the construction of suburban tourist railways, including feasibility analysis, planning, and design, is of great importance to a city's tourist transport and economy. Knowing the transport demand and trip distribution patterns of tourists is an important prerequisite for planning and building such railways. Models that forecast the spatial distribution of tourist flows are therefore an important research issue in the development of suburban tourist railways.
In recent years, considerable effort has been devoted to research on tourists' trip activities and transport demand forecasting models. Regarding trip activity analysis, Xu et al. [5] established the gravitational order of tourist flows to tourism cities in different periods based on an analysis of tourist flow characteristics. Li and Lin [6] examined the spatial-temporal dynamics of tourist flows in tourist destinations and built a spatial concentration index model and a tourism center-of-gravity model. Song and Guo [7] analyzed the travel patterns of tourists using mobile communication big data and established a statistical model of tourist flows based on these data. Regarding trip distribution forecasting, the most commonly used methods are aggregate and disaggregate models [8]. Aggregate models mainly include growth-factor methods (e.g., the Fratar method), gravity models [9][10][11], and intervening opportunities models [12]. Gravity models were later explained in terms of entropy maximization and the maximum likelihood principle, and the intervening opportunities model was modified by Sheffi [13] to make it more widely applicable. Disaggregate models include random utility models such as the Logit and Nested Logit models [8]. In the last few decades, activity-based models, which view travel as a demand derived from travelers' need to participate in activities, have attracted much attention [14,15]. In this context, based on tourists' travel patterns, Tian et al. [16] proposed a trip distribution model that uses the trip chain [17,18], a sequence of linked trip segments between a pair of anchor activities, as the unit of analysis. This distinguishes it from traditional trip distribution models [19], in which trips are independent of each other. In addition, Tang et al. [20] reported a method based on Entropy-Maximizing (EM) theory to model the Origin-Destination (OD) distribution of taxi trips, using prior probabilities to improve the prediction of trip distribution. Furthermore, new data sources and techniques, such as social media data and Machine Learning (ML), have increasingly been used to improve the predictive power and accuracy of trip distribution models [21,22]. In summary, these methods have significantly improved the understanding of tourists' trip activities and transport demand. However, existing studies mainly focus on trip distribution models for traditional transportation modes; very little attention has been paid to forecasting the tourist flow distribution of suburban tourist railways. As a result, planners and policy-makers in the local governments of tourist cities still lack proper methods and solid evidence for planning and designing suburban tourist railways.
To fill this gap, this paper proposes a method based on Entropy-Maximizing theory and the trip chain to model the tourist flow distribution of suburban tourist railways. By analyzing the trip distribution characteristics of tourists (Section 2), a trip chain-based model is built to describe their trip distribution (Section 3). Then, a tourist flow distribution forecasting model for the suburban tourist railway is built based on the trip chain model and Entropy-Maximizing theory (Section 4). A case study using trip distribution data collected from tourists who take the suburban tourist railway around H city demonstrates the use of the proposed method (Section 5). Analysis of the results shows that the model reflects the real trip distribution characteristics of tourists very well. Section 6 provides the conclusions of this paper.

Trip Distribution Characteristics of Suburban Tourism Activities
Tourism activity refers to the process in which tourists spend time and travel through space for tours and sightseeing. Tourism activities can be influenced by many factors, such as the number, distribution, trip distance, and attractiveness of scenic spots. Tourists attracted by multiple scenic spots normally organize multiple trips and activities in a certain order, so tourism activities within a city's suburbs can follow many different patterns (Figure 1). The primary characteristics of tourism activities are listed below:
1. The multiple trips characteristic. Tourists usually visit multiple scenic spots when they travel in a tourism city, and each visit to a scenic spot is recorded as a trip. Thus, a tourism activity may involve multiple trips, although the number of trips is generally limited by factors such as travel time and tourists' willingness.
2. The sequential characteristic. The travel paths of tourism activities are not random but follow certain rules. When tourists visit different scenic spots of a tourism city, they generally travel in a certain order: they start from a departure point (e.g., a hotel or transportation hub), visit various scenic spots, and finally end at an accommodation place or transportation hub. The trips involved in a tourism activity are therefore sequential and can be linked as a trip chain.
3. The trip distance characteristic. Activities between transportation hubs and scenic spots reflect the trip distance of tourists [23]. The trip distance between departure points and scenic spots, or between scenic spots, is a very important factor affecting the time, cost, and experience of tourism activities. It can be regarded as a travel impedance, since long-distance travel is time-consuming, costly, and tiring. Thus, scenic spots near urban areas generally attract more tourists.
4. The clustering characteristic. Activity places of tourists in tourism cities mainly include transportation hubs, scenic spots (especially scenic spots close to each other), and accommodation places. Transportation hubs are the distribution centers of tourists, scenic spots (tourist attractions) are the destinations of tourism activities, and accommodation places are where tourists rest. What they have in common is that they are nodal areas that attract and cluster large numbers of tourists.

Basic Idea for Trip Distribution Modeling
Traditional trip distribution models, such as the gravity model and the growth factor model, calculate trip distribution on the basis of single trips, whereas the tourist trip chain method considers tourists' multiple trips as a linked chain. The trips of a tourist form a chain structure: the tourist starts from the origin point, visits several scenic spots, and finally arrives at the destination point. The trip chain reflects the actual trips of tourists, and each tourist has a trip chain. Using the trip chain method to investigate tourists' travel activities can reveal the order, frequency, and distance of their trips and the number of tourists at their stay points, which is consistent with the trip distribution characteristics of tourism activities.
A study area can be divided into multiple traffic zones according to its scenic spots and transportation hubs. These traffic zones can be regarded as the stay points of trip chains, so a tourist trip chain becomes a chain of activities between traffic zones. To predict the spatial distribution of tourists based on the activity patterns of their trip chains, the first step is to count the traffic zones in the tourism city; the next is to identify the trip chain types and count them according to the traffic zones and the characteristics of tourists' trips. In addition, each stay point is both the end of the previous trip and the start of the next trip, which implies that it accumulates a certain number of tourists. Thus, the tourist attraction and production of each traffic zone can be calculated. Then, based on the visiting volume and trip distance constraints of the stay points, distribution functions can be established to calculate the probability of each trip chain type. After that, the trip volume of each trip chain can be obtained, as well as the trip volumes distributed between zones. Figure 2 depicts the overall process.
Appl. Sci. 2021, 11, x FOR PEER REVIEW

Basic Definitions of Tourist Trip Chain
A trip refers to the movement of people, goods, or vehicles from one place to another [24]. Trips made by people are called personal or individual trips. A trip chain is defined as a round trip in which a person completes one or more activities in chronological order; it contains a large amount of information about the time, space, modes, and activities of the trip [25]. A trip chain generally consists of multiple segments, each of which is recorded as a trip. Accordingly, a tourist trip chain is a set of sequential trips taken by tourists who participate in one or more tourism activities for tours and sightseeing [26]. We define the departure point of a trip chain as its starting point, its final destination as its ending point, and the points in between as stay points.

To facilitate the study, this paper divides tourist destinations into traffic zones. These traffic zones can be further classified into several types: scenic spot-dominated traffic zones, transportation hub-dominated traffic zones, city entrance/exit-dominated traffic zones, and traffic zones in other areas of the city. Each type corresponds to the starting point, ending point, or stay points of a trip chain. Generally, the starting and ending points of a trip chain are traffic zones dominated by a transportation hub or a city entrance/exit, whereas the stay points are scenic spot-dominated traffic zones or traffic zones in other areas of the city. Meanwhile, each traffic zone can be either the starting point or the ending point of a trip within a trip chain. For simplicity, this paper calls the starting point of a trip its origin zone and the ending point of a trip its destination zone.
Trip chains can be further divided into simple and complex chains according to their complexity. A simple trip chain consists of two consecutive trips and has only one stay point. A complex trip chain consists of more than two consecutive trips and has more than one stay point.
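To make these definitions concrete, the following Python sketch represents a trip chain as a starting point, an ordered tuple of stay points, and an ending point, and classifies it as simple or complex by counting stay points. The class name `TripChain` and its fields are hypothetical illustrations, not part of the original method.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TripChain:
    """A tourist trip chain: starting point, ordered stay points, ending point."""
    start: int
    stay_points: Tuple[int, ...]
    end: int

    @property
    def n_trips(self) -> int:
        # k stay points are linked by k + 1 consecutive trips
        return len(self.stay_points) + 1

    @property
    def kind(self) -> str:
        # Simple: two trips and one stay point; complex: more than two trips.
        if not self.stay_points:
            raise ValueError("a trip chain needs at least one stay point")
        return "simple" if len(self.stay_points) == 1 else "complex"
```

For example, `TripChain(0, (3,), 0)` is a simple chain of two trips, while `TripChain(0, (3, 4), 1)` is a complex chain of three trips.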
To analyze the trip types of each tourist, this paper uses $m$ to denote an origin zone, $m \in \{1, 2, 3, \ldots, M\}$, where $M$ is the maximum number of origin zones. Similarly, $n$ denotes a destination zone, $n \in \{1, 2, 3, \ldots, N\}$, where $N$ is the maximum number of destination zones. Then, as shown in Figure 3, the trip of a tourist who starts from origin zone $m$, visits destination zones $n_1, n_2, n_3, \ldots, n_\phi$, and finally returns to $m$ has a chain structure. Here, $\phi$ is the maximum number of destination zones that a tourist can visit in one trip chain. The choice set for trips starting from an origin zone can then be defined as $H = \{n_1, n_2, \ldots, n_\phi\}$. It is a set of trip chains that select $\phi$ destination zones and is hereafter called the $\phi$-dimensional destination zone choice set.

The Number of Tourist Trip Chains
To facilitate the analysis of tourists' trip chains, this paper specifies that the origin and destination zones in a trip chain are not the same. Furthermore, each trip in a trip chain starts from one traffic zone to another, and there are no repeated traffic zones in a trip chain. This means there are no cyclic structures within the trip chain formed by traffic zones.
Therefore, the length of a trip chain can be defined as the number of destination zones selected in that chain, and the maximum length of the trip chains in a choice set is the maximum number of destination zones that can be selected, i.e., $\phi$. Suppose a trip distribution model has a $\phi$-dimensional destination zone choice set. Because the length of a trip chain cannot exceed the number of accessible destination zones $J$, i.e., $1 \le \phi \le J$, the number of trip chains in the model is
$$M\left(A_J^1 + A_J^2 + \cdots + A_J^{\phi}\right) = M\sum_{k=1}^{\phi} A_J^{k},$$
where $M$ is the maximum number of origin zones and $A_J^k = J!/(J-k)!$ is the number of permutations of $k$ zones chosen from $J$. For example, if a tourist intends to visit three scenic spots, the numbers of origin zones and destination zones are both three, and the maximum length of a trip chain is three (i.e., $\phi = 3$), then the dimension of the traffic zone choice set is at most three, and the number of trip chains is $3 \times (A_3^1 + A_3^2 + A_3^3) = 3 \times (3 + 6 + 6) = 45$. In practice, the length of a trip chain, and hence the number of trip chains, can be determined from the travel behavior of tourists; more specifically, it can be estimated through a questionnaire survey of tourists about the time and distance required to complete their trips.
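The counting rule can be checked by brute-force enumeration. The sketch below assumes zones are labeled with integers and that origin and destination zones are disjoint (so $J$ equals the number of destination zones); it lists every ordered, repeat-free selection of 1 to $\phi$ destination zones for each origin zone and compares the total with the closed form. Both function names are hypothetical.

```python
from itertools import permutations
from math import perm


def enumerate_chains(origins, destinations, phi):
    """All chains (m, H): H is an ordered, repeat-free tuple of 1..phi destination zones."""
    chains = []
    for m in origins:
        candidates = [n for n in destinations if n != m]  # a chain never revisits m
        for k in range(1, phi + 1):
            chains.extend((m, H) for H in permutations(candidates, k))
    return chains


def count_chains(M, J, phi):
    """Closed form M * (A_J^1 + ... + A_J^phi), with A_J^k = J! / (J - k)!."""
    return M * sum(perm(J, k) for k in range(1, phi + 1))
```

For the three-origin, three-destination example with $\phi = 3$, both functions give 45 chains.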

The Trip Distribution Forecasting Model Based on Trip Chains
Based on the trip chain method proposed to analyze the distribution of tourists, this section uses the Entropy-Maximizing theory to model the probability of tourists' trip distribution. Two types of trip chain constraints, i.e., the visiting volume constraint of stay points and the travel distance constraint, were defined based on the characteristics of trip chains. Then, a trip chain-based trip distribution model was built, along with the corresponding model-solving algorithm.

Visiting Volume Constraint of Stay Points
The trip volume of a trip chain is distributed across the traffic zones it involves, so its value is constrained by the numbers of trips produced in its origin zone and attracted to its destination zones. By dividing scenic spots and transportation hubs into traffic zones, the tourist production and attraction of each traffic zone can be inferred from the recorded numbers of tourists at scenic spots and the traffic volumes of transportation hubs. This allows us to impose constraints on the tourist production and attraction of the origin and destination zones of a trip chain. Suppose every tourist departs from a traffic zone $m$ and selects the choice set $H$; the visiting volume constraint of the stay points of a trip chain can then be defined as Equation (1), where $O_m$ is the trip volume produced by origin zone $m$; $D_n$ is the number of tourists arriving at destination zone $n$; $H$ is the destination zone choice set of tourists; and $S_{mH}$ is the volume of trip chains that start from origin zone $m$ and select $H$ as their choice set. Since every trip chain starts from an origin zone, the number of tourists of origin zone $m$ (i.e., $O_m$) is the sum of the trip volumes of all trip chains that start from $m$. Similarly, the number of tourists of destination zone $n$ (i.e., $D_n$) is the sum of the volumes of the trip chains that start from an origin zone $m$ and take traffic zone $n$ as their $j$th destination zone, i.e., the number of tourists whose trip chains pass through traffic zone $n$.
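Equation (1) itself is not reproduced in this copy. Reading it from the definitions above, the two constraints can be written as $\sum_{H} S_{mH} = O_m$ for every origin zone $m$ and $\sum_{m}\sum_{H \ni n} S_{mH} = D_n$ for every destination zone $n$. The following Python sketch is an illustration under that reading rather than the paper's implementation: it aggregates a dictionary of chain volumes keyed by `(m, H)` into these two totals. The function name is hypothetical.

```python
from collections import defaultdict


def production_and_attraction(volumes):
    """Aggregate chain volumes S[(m, H)] into O_m and D_n (assumed reading of Equation (1))."""
    O = defaultdict(float)
    D = defaultdict(float)
    for (m, H), s in volumes.items():
        O[m] += s        # every chain starting at m adds to m's production
        for n in H:      # every zone visited along the chain counts its tourists once
            D[n] += s
    return dict(O), dict(D)
```

Comparing these totals with the observed $O_m$ and $D_n$ gives the residuals that a solver must drive to zero.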

Trip Distance Constraint
The actual trip distances of tourists in a tourism city with a complex traffic network reflect the distance limits of tourist trip chains. When calculating trip impedance, the trip distance or travel time should generally be calculated on the actual traffic network. Thus, to obtain the actual trip distances between traffic zones and calculate trip impedance, it is necessary to build a tourism traffic network that accounts for the lengths and interconnections of the roads, public transit lines, railways, and other facilities used by tourists. The data preparation and processing this requires should also be taken into account.
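As one way to obtain network-based distances between traffic zones, the sketch below runs Dijkstra's shortest-path algorithm on a weighted tourism traffic network. The adjacency-list format and the mapping from each traffic zone to one representative network node are assumptions made for illustration; a real network would combine road, public transit, and railway links.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra's algorithm: shortest network distance from source to every reachable node.

    graph maps each node to a list of (neighbor, link_length) pairs (hypothetical format).
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: a shorter path to u was already settled
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def zone_distance_matrix(graph, zone_nodes):
    """Distance between traffic zones, each represented by one network node."""
    matrix = {}
    for a, node_a in zone_nodes.items():
        dist = shortest_distances(graph, node_a)
        matrix[a] = {b: dist.get(node_b, float("inf")) for b, node_b in zone_nodes.items()}
    return matrix
```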
The trip distance of a tourist's travel from an origin traffic zone to a destination traffic zone is normally limited to a certain range. Thus, the total trip distance of tourists can be constrained as in Equation (2), where $d$ is the total trip distance and $d_{mH}$ is the trip distance of the trip chain with choice set $H$ that starts from origin traffic zone $m$.
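Equation (2) is likewise not reproduced here. Assuming it takes the usual entropy-maximizing cost form $\sum_{m}\sum_{H} S_{mH}\, d_{mH} = d$, and assuming a chain's distance is the sum of its leg distances along $m \to n_1 \to \cdots \to n_\phi \to m$ (the return leg follows the chain structure described for Figure 3; `closed=False` drops it), the left-hand side can be computed as follows. The function names and the nested-dictionary distance matrix are hypothetical.

```python
def chain_distance(m, H, zone_dist, closed=True):
    """d_mH: sum of leg distances along m -> n_1 -> ... -> n_phi (-> m when closed)."""
    path = [m, *H] + ([m] if closed else [])
    return sum(zone_dist[a][b] for a, b in zip(path, path[1:]))


def total_distance(volumes, zone_dist, closed=True):
    """Left-hand side of the assumed distance constraint: sum of S_mH * d_mH."""
    return sum(s * chain_distance(m, H, zone_dist, closed) for (m, H), s in volumes.items())
```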

Objective Function
Let $P(S_{mH})$ be the probability of a tourist trip chain distribution; the objective function of the tourist trip chain probability distribution can then be established as Equation (3), where $S$ is the total trip volume.
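Equation (3) is not reproduced in this copy. For orientation only, a form commonly used in entropy-maximizing trip distribution with prior probabilities counts the number of microstates consistent with a distribution $\{S_{mH}\}$, weighted by prior chain probabilities $P_{mH}$ (the prior that appears in Equation (11)). Assuming that form, the objective reads:

```latex
W(\{S_{mH}\}) = \frac{S!}{\prod_{m}\prod_{H} S_{mH}!}\,\prod_{m}\prod_{H} P_{mH}^{\,S_{mH}},
\qquad S = \sum_{m}\sum_{H} S_{mH}
```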

Model Building
This paper uses Entropy-Maximizing theory to build the trip distribution model. According to this theory, the most probable trip chain distribution is the one that maximizes $P(S_{mH})$, i.e., the one at which the entropy reaches its maximum. Therefore, the trip distribution forecasting model can be transformed into the optimization problem shown in Equation (4). Taking the logarithm of the objective function and applying Stirling's approximation, $\ln x! \approx x \ln x - x$, transforms the objective function into Equation (5). For this optimization problem, we build the Lagrangian and incorporate the constraints into it, as shown in Equation (6), where $\alpha$, $\beta$, and $\gamma$ are Lagrange multipliers.
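Assuming an objective of the common form $W = S!\prod_{m,H} P_{mH}^{S_{mH}}/S_{mH}!$ with prior chain probabilities $P_{mH}$, the logarithm, Stirling's approximation, and the Lagrangian can be worked out as below. This is a sketch of the standard derivation with $\alpha_m$ and $\beta_n$ written per zone, not a reproduction of Equations (5) and (6). Because $S$ is fixed, the term $S\ln S - S$ is a constant and is dropped from the Lagrangian.

```latex
\begin{aligned}
\ln W &\approx (S\ln S - S) - \sum_{m,H}\big(S_{mH}\ln S_{mH} - S_{mH}\big) + \sum_{m,H} S_{mH}\ln P_{mH},\\
\mathcal{L} &= -\sum_{m,H}\big(S_{mH}\ln S_{mH} - S_{mH}\big) + \sum_{m,H} S_{mH}\ln P_{mH}
  + \sum_{m}\alpha_m\Big(O_m - \sum_{H} S_{mH}\Big)\\
&\quad + \sum_{n}\beta_n\Big(D_n - \sum_{m}\sum_{H \ni n} S_{mH}\Big)
  + \gamma\Big(d - \sum_{m,H} S_{mH}\, d_{mH}\Big).
\end{aligned}
```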
We then take the partial derivatives with respect to $S_{mH}$, $\alpha$, $\beta$, and $\gamma$, respectively, and set them to zero, obtaining Equation (7), where $\phi$ is the length of a trip chain. The resulting $S_{mH}$ is given by Equation (8). In addition, the trip production and attraction constraints can be represented as Equation (9). To simplify the representation of the model, we define the balancing factors $R_m$ and $T_n$ in Equation (10). Thus, $S_{mH}$ can be represented as Equation (11):
$$S_{mH} = R_m O_m \prod_{n \in H} T_n D_n \, P_{mH} \exp(-\gamma d_{mH}) \qquad (11)$$
The parameters $R_m$ and $T_n$ can then be represented as Equation (12), where $\lambda$ is the iteration index. Substituting Equation (11) into the trip chain distance constraint (i.e., Equation (2)) yields the trip distribution forecasting model in Equation (13).
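Continuing from a Lagrangian of the standard entropy-maximizing form (entropy term with prior $P_{mH}$, multipliers $\alpha_m$ and $\beta_n$ per zone, and $\gamma$ for total distance), setting $\partial\mathcal{L}/\partial S_{mH} = 0$ gives the exponential form below, where the sum runs over the $\phi$ destination zones $n_1, \ldots, n_\phi$ of chain $H$. Writing $R_m O_m = e^{-\alpha_m}$ and $T_n D_n = e^{-\beta_n}$ (an assumed reading of the definitions in Equation (10)) gives a product form for $S_{mH}$, and substituting it back into the production and attraction constraints gives closed-form expressions for the balancing factors. Because no zone repeats within a chain, $S_{mH}$ is linear in each $T_n$, which is what makes the $T_n$ expression explicit. This is a worked sketch, not a reproduction of Equations (7) to (12).

```latex
\begin{aligned}
\frac{\partial \mathcal{L}}{\partial S_{mH}} &= -\ln S_{mH} + \ln P_{mH} - \alpha_m - \sum_{j=1}^{\phi}\beta_{n_j} - \gamma d_{mH} = 0
\;\Longrightarrow\;
S_{mH} = P_{mH}\exp\Big(-\alpha_m - \sum_{j=1}^{\phi}\beta_{n_j} - \gamma d_{mH}\Big),\\
R_m &= \Big[\sum_{H}\Big(\prod_{n \in H} T_n D_n\Big) P_{mH}\, e^{-\gamma d_{mH}}\Big]^{-1},\\
T_n &= \Big[\sum_{m}\sum_{H \ni n} R_m O_m \Big(\prod_{n' \in H,\; n' \neq n} T_{n'} D_{n'}\Big) P_{mH}\, e^{-\gamma d_{mH}}\Big]^{-1}.
\end{aligned}
```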

Model Solving
This section presents the procedure for calculating $\alpha$, $\beta$, and $\gamma$ in Equation (8) and, according to Equation (10), the parameters $\gamma$, $R_m^{\lambda+1}$, and $T_n^{\lambda+1}$ (Equation (14)). The number of trips produced by each origin zone $m$ (i.e., $O_m$) and the number of trip chains of each destination zone $n$ in the study area, as well as the total trip distance $d$ obtained from the traffic network of the tourism city, are given. The solution steps are described in Figure 4.

• step 2: … Equation (14), where $m \in \{1, 2, 3, \ldots, M\}$ and $n \in \{1, 2, 3, \ldots, N\}$;
• step 3: check the two convergence conditions $|R_m^{\lambda+1} - R_m^{\lambda}|/R_m^{\lambda} < \varepsilon$ and $|T_n^{\lambda+1} - T_n^{\lambda}|/T_n^{\lambda} < \varepsilon$; if both are satisfied, go to step 4; otherwise, set $\lambda = \lambda + 1$ and return to step 2;
• step 4: let $x_k = \gamma_k$ and compute $\gamma$ iteratively using Newton's method, as shown in Equation (15); if $|x_{k+1} - x_k|/x_k < \varepsilon$, end the iteration; otherwise, let $\gamma_{k+1} = x_{k+1}$ and continue;
• step 5: let $\lambda = \lambda + 1$ and return to step 2.
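The solution steps can be sketched in Python as below. This is an illustrative implementation under stated assumptions, not the authors' code: it assumes the product form $S_{mH} = R_m O_m \prod_{n \in H} T_n D_n \, P_{mH} e^{-\gamma d_{mH}}$ for Equation (11), updates each balancing factor by rescaling it until its own constraint holds (equivalent to the closed-form balancing expressions applied one factor at a time), runs Newton's method on $f(\gamma) = \sum S_{mH} d_{mH} - d$ with $R$ and $T$ held fixed, and, unlike step 3, stops when all constraints hold to a relative tolerance. The function names are hypothetical.

```python
import math


def chain_volumes(chains, P, dist, O, D, R, T, gamma):
    """Assumed Equation (11): S_mH = R_m O_m * prod_{n in H}(T_n D_n) * P_mH * exp(-gamma d_mH)."""
    return {
        (m, H): R[m] * O[m] * math.prod(T[n] * D[n] for n in H)
        * P[(m, H)] * math.exp(-gamma * dist[(m, H)])
        for (m, H) in chains
    }


def solve_trip_chains(chains, P, dist, O, D, d_total, tol=1e-9, max_iter=5000):
    """Alternate balancing-factor updates and a Newton solve for gamma (sketch of steps 2-5)."""
    R = {m: 1.0 for m in O}
    T = {n: 1.0 for n in D}
    gamma = 0.0

    def current():
        return chain_volumes(chains, P, dist, O, D, R, T, gamma)

    for _ in range(max_iter):
        # Steps 2-3: S_mH is linear in R_m and in each T_n (no zone repeats in a
        # chain), so rescaling a factor by target/current enforces its constraint.
        for m in O:
            S = current()
            R[m] *= O[m] / sum(s for (k, _), s in S.items() if k == m)
        for n in D:
            S = current()
            T[n] *= D[n] / sum(s for (_, H), s in S.items() if n in H)
        # Step 4: Newton's method on f(gamma) = sum S_mH d_mH - d_total, with
        # f'(gamma) = -sum S_mH d_mH^2 while R and T are held fixed.
        for _ in range(100):
            S = current()
            f = sum(s * dist[c] for c, s in S.items()) - d_total
            step = f / sum(s * dist[c] ** 2 for c, s in S.items())
            gamma += step  # equals gamma - f / f'(gamma)
            if abs(step) < tol * max(1.0, abs(gamma)):
                break
        # Step 5 (modified): stop once all constraints hold to a relative tolerance.
        S = current()
        err = [abs(sum(s for (k, _), s in S.items() if k == m) / O[m] - 1) for m in O]
        err += [abs(sum(s for (_, H), s in S.items() if n in H) / D[n] - 1) for n in D]
        err.append(abs(sum(s * dist[c] for c, s in S.items()) / d_total - 1))
        if max(err) < tol:
            break
    return S, gamma
```

Because the entropy-maximizing solution is unique for consistent inputs, one way to check such a sketch is to generate $O_m$, $D_n$, and $d$ from a known distribution of the exponential form and confirm that the solver recovers it.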

Model Inputs, Outputs, and Application
Applying the Entropy-Maximizing-based trip chain distribution model requires several types of input data, including tourist survey data, the traffic network and traffic zones of the tourism city, and the numbers of tourists produced and attracted by each traffic zone, obtained through a questionnaire survey of tourists. The prior probability of each trip chain is also needed. The model then yields the actual probability of each trip chain and the trip distribution among traffic zones.
The model inputs include the numbers of tourists produced and attracted by each traffic zone, the prior probabilities of trip chains, and the trip distances between traffic zones.
The numbers of tourists in each traffic zone are a precondition of the model, so the origin and destination zones must be determined before the model input data are prepared. The origin zones are the areas primarily influenced by transportation hubs, and the destination zones are the areas dominated by scenic spots. The tourist production and attraction of these traffic zones are important model inputs.
The prior probability of each initial travel path can be obtained from the survey data. Trip distance is not only a model constraint but also an important model input. Trip distances can be aggregated into classes according to the types of tourists' trips. The trip distance of each trip chain is then calculated separately, and the trip type of each trip distance class is used as an input parameter of the model.
The output of the model is the trip probability of each trip chain. From it, the number of trips can be analyzed statistically to verify whether the classification of trips is consistent with tourists' actual trips. Trip distances can also be obtained from the model, and the distribution of tourists' trip distances analyzed. Finally, the model yields the spatial distribution of tourists' trips.
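To make the input requirements concrete, the following Python sketch bundles them into a hypothetical `TripChainModelInput` container with basic consistency checks. The class, its field names, and the chain-code format (an origin digit followed by the digits of the visited zones, following the trip chain code 93 example used later for Table 3) are illustrative assumptions, not part of the original model.

```python
import math
from dataclasses import dataclass


@dataclass
class TripChainModelInput:
    """Hypothetical container for the model inputs described above."""
    productions: dict  # origin zone -> tourists produced
    attractions: dict  # destination zone -> tourists attracted
    prior: dict        # trip chain code, e.g. "93" -> prior probability from the survey
    distance: dict     # trip chain code -> trip distance of the chain (km)

    def validate(self):
        """Raise ValueError if the inputs are inconsistent."""
        if any(p < 0 for p in self.prior.values()):
            raise ValueError("prior probabilities must be non-negative")
        if not math.isclose(sum(self.prior.values()), 1.0, abs_tol=1e-9):
            raise ValueError("prior probabilities must sum to 1")
        missing = set(self.prior) - set(self.distance)
        if missing:
            raise ValueError(f"no trip distance for chains: {sorted(missing)}")
        for code in self.prior:
            # Assumed code format: origin digit, then one digit per visited zone.
            origin, *stops = (int(ch) for ch in code)
            if origin not in self.productions:
                raise ValueError(f"chain {code} does not start in an origin zone")
            if not stops or any(s not in self.attractions for s in stops):
                raise ValueError(f"chain {code} visits an unknown destination zone")
```

For example, `TripChainModelInput({8: 100.0, 9: 150.0}, {3: 120.0, 7: 130.0}, {"93": 0.6, "83": 0.3, "937": 0.1}, {"93": 60.0, "83": 62.0, "937": 75.0}).validate()` passes; all of these numbers are hypothetical.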
The proposed model can also be used for forecasting in a target year. Specifically, based on the annual growth of tourists in recent years, gray prediction theory can be used to predict the number of tourists in each traffic zone in the target year. Then, taking these numbers as input values, the model can be used to predict the trip distribution.
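Gray prediction is commonly implemented with the GM(1,1) grey model. The text does not specify the variant, so the following Python sketch assumes standard GM(1,1) with trapezoidal background values; the function names are hypothetical, and the input series would be a traffic zone's annual tourist counts for recent years.

```python
import math
from itertools import accumulate


def gm11_fit(x0):
    """Fit GM(1,1) to a positive series x0 and return (a, b).

    Model: x0(k) + a * z1(k) = b, where x1 is the accumulated series and
    z1(k) = 0.5 * (x1(k) + x1(k-1)) is the background value.
    """
    if len(x0) < 4:
        raise ValueError("GM(1,1) is usually fitted to at least 4 observations")
    x1 = list(accumulate(x0))
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]
    y = list(x0[1:])
    m = len(z)
    s_z, s_y = sum(z), sum(y)
    s_zz = sum(zi * zi for zi in z)
    s_zy = sum(zi * yi for zi, yi in zip(z, y))
    # Ordinary least squares for y = slope * z + intercept.
    det = m * s_zz - s_z * s_z
    slope = (m * s_zy - s_z * s_y) / det
    intercept = (s_zz * s_y - s_z * s_zy) / det
    return -slope, intercept  # development coefficient a, grey input b


def gm11_forecast(x0, steps=1):
    """Forecast the next `steps` values of x0 with GM(1,1)."""
    a, b = gm11_fit(x0)
    if abs(a) < 1e-12:
        raise ValueError("development coefficient is ~0; GM(1,1) is not applicable")
    n = len(x0)

    def x1_hat(k):
        # Time-response function; k is 0-based, so x1_hat(0) == x0[0].
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Inverse accumulation turns the fitted cumulative series back into levels.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]
```

For a series growing by exactly 10% per year, such as [100, 110, 121, 133.1, 146.41], the fitted development coefficient is a = −0.2/2.1 ≈ −0.0952 and the one-step forecast is about 160.9, close to the 161.05 implied by continued 10% growth.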

Problem Statement
The suburban tourist railway Line T1 in H city is a newly built railway serving tourist passenger flow. This paper therefore takes the region along the railway as the study area for analyzing and predicting the distribution of tourists' trips. Along the railway, there are six scenic spots, coded 1 to 6 (Figure 5). These scenic spots are divided into six traffic zones according to the locations of the railway stations. In addition, a virtual traffic zone is added to account for the influence of tourists from other scenic spots in H city on the scenic spots along Line T1. Two more traffic zones, based on the north railway station and the road entrances/exits, respectively, are added to represent the sources of tourists. The spatial distribution of these traffic zones is shown in Figure 5.

Distributing volume of traffic zone
As mentioned in Section 5.1, the study area was divided into nine traffic zones. The distributing volume of each traffic zone was calculated from the tourism and traffic survey data of H city in 2019. Specifically, the tourist volume of a scenic spot-dominated traffic zone was calculated from the number of visitors to that scenic spot; the tourist volume of a transportation hub-dominated traffic zone was calculated from the number of passengers arriving at and departing from that hub; and the tourist volume of a road entrance/exit-dominated traffic zone was calculated from the passenger flow volume obtained from the traffic survey data. The resulting tourist volume of each traffic zone is shown in Table 1. Taking the traffic zones dominated by the north railway station and the road entrances/exits as origin zones (traffic zones 8 and 9 in Table 1) and all traffic zones dominated by scenic spots as destination zones (traffic zones 1 to 7 in Table 1), the trip attraction of the destination zones is D = {56, 338, 125, 86, 107, 42, 400}, the trip production of the origin zones is O = {379, 595}, and the total number of tourists is 974 (in units of 10,000 persons per year).
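A quick arithmetic check of these figures: the productions sum to 974, matching the stated total, while the attractions sum to 1,154. One reading consistent with the trip chain formulation (an interpretation, not a statement from the text) is that attractions count visits to stay points, so a tourist whose chain includes two scenic spots is counted in two destination zones; under that reading the ratio below is the average number of stay points per tourist.

```python
# Trip productions and attractions from the text (unit: 10,000 persons/year).
O = [379, 595]                          # the two origin zones
D = [56, 338, 125, 86, 107, 42, 400]    # the seven destination zones

total_production = sum(O)
total_attraction = sum(D)
# Interpretive assumption: attractions count stay-point visits, not tourists.
visits_per_tourist = total_attraction / total_production
print(total_production, total_attraction, round(visits_per_tourist, 3))
# prints: 974 1154 1.185
```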

Trip distance
Trip distance represents not only the actual distance between traffic zones but also trip impedance, such as the interaction between road and railway traffic. This section calculates the distances between traffic zones according to the trip distance constraint of the proposed model described in Section 4.1.2. First, the traffic network data of the study area were prepared. Then, taking the centroid of each traffic zone as its origin and destination point, the trip distance between each pair of traffic zones was calculated using the shortest path method. The results are listed in Table 2.
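The shortest path step can be sketched with Dijkstra's algorithm run from each zone centroid. The network in the example is a hypothetical toy graph (node names and edge weights are invented), and the helper names are illustrative; in practice, edge weights could be link lengths or a generalized impedance that combines road and railway links, as the paragraph suggests.

```python
import heapq


def build_graph(edges):
    """Undirected adjacency list from (node_u, node_v, weight) tuples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


def dijkstra(graph, source):
    """Shortest distances from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def zone_distance_matrix(graph, centroids):
    """centroids maps zone id -> centroid node; returns {(zone_i, zone_j): distance}."""
    result = {}
    for zi, ni in centroids.items():
        dist = dijkstra(graph, ni)
        for zj, nj in centroids.items():
            result[(zi, zj)] = dist.get(nj, float("inf"))
    return result
```

With edges c8-a (10), a-c3 (20), c8-c3 (35), and c9-a (12), the distance from zone 8 to zone 3 is 30 (via node a) rather than the direct 35.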

Calculation of Model Parameters
According to the survey of tourists' trips, this study assumes that tourists visit no more than two scenic spots per day. Then, with ε = 0.05, the model parameters γ, $R_m$, and $T_n$ for this case can be calculated based on Equation (10). Following the steps depicted in Figure 4, the residual falls below 0.05 after 10 iterations, and the output is γ = −1.7. Figure 6 shows the relation between the residual and the number of iterations. Then, based on Equation (10) and the values of $R_m$ and $T_n$, the parameters α and β can be calculated. Because there are two origin zones and seven destination zones, the resulting parameter values are $\alpha_1 = 10.70$, $\alpha_2 = 10.81$, $\beta_1 = -11.23$, $\beta_2 = -8.84$, $\beta_3 = -6.14$, $\beta_4 = -5.92$, $\beta_5 = -5.75$, $\beta_6 = -7.24$, and $\beta_7 = -5.51$.

Trip chain probability distribution
Sorting the trip chains obtained from the model in descending order of calculated probability, we obtained the top 15 trip chains presented in Table 3. Each trip chain code in Table 3 identifies the corresponding actual trip chain. For example, trip chain code 93 denotes the trip chain that starts from traffic zone 9 (see Table 1), goes to traffic zone 3, and then returns to traffic zone 9. Note (Table 3): RE = road entrance/exit; SS = scenic spot; NRS = north railway station; OSS = other scenic spots.
As shown in Table 3, among the 15 trip chains with the highest probability, trip chains that include only one scenic spot outnumber those that include more than one scenic spot. Specifically, the trip chain with code 93 has the highest probability. This means that tourists are most likely to travel from the road entrance/exit-dominated traffic zone to scenic spot 3, and this chain also carries the largest number of tourists. Moreover, tourists are more likely to visit only one traffic zone. In particular, tourists are more likely to travel to scenic spot 3, the other scenic spots, scenic spot 2, scenic spot 4, and scenic spot 5. The detailed probability distribution of the trip chains listed in Table 3 is shown in Figure 7.
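The coding scheme can be made explicit with a small Python sketch. Zone labels follow the text (zones 1 to 6 are scenic spots, 7 is the virtual other-scenic-spots zone, 8 is the north railway station, and 9 is the road entrance/exit). Extending the two-digit pattern of code 93 to chains with two scenic spots (e.g. "837") is an assumption, as are the function names.

```python
# Zone labels as described in the text; abbreviations follow the Table 3 note.
ZONE_LABELS = {**{i: f"SS {i}" for i in range(1, 7)}, 7: "OSS", 8: "NRS", 9: "RE"}


def decode_chain(code):
    """Turn a chain code into its zone sequence, closing the loop at the origin.

    Assumes one digit per zone, origin first: "93" -> [9, 3, 9].
    """
    zones = [int(ch) for ch in str(code)]
    if len(zones) < 2 or any(z not in ZONE_LABELS for z in zones):
        raise ValueError(f"not a valid trip chain code: {code!r}")
    return zones + [zones[0]]


def describe(code):
    """Human-readable form of a chain code, e.g. 'RE -> SS 3 -> RE'."""
    return " -> ".join(ZONE_LABELS[z] for z in decode_chain(code))
```

Under these assumptions, `describe("93")` gives "RE -> SS 3 -> RE", and a hypothetical two-spot chain "837" reads "NRS -> SS 3 -> OSS -> NRS".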

Analysis of the trip chain types
As described in Section 3.2, this paper classifies trip chains into two types: simple trip chains and complex trip chains. Here, trip chains were classified according to the value of φ: a trip chain with φ = 1 is a simple trip chain, and a trip chain with φ = 2 is a complex trip chain. The proportions of simple and complex trip chains can be calculated from the model, and the trip volume of each trip chain type is shown in Table 4. The results in Table 4 show that simple trip chains account for a larger proportion of tourism activities than complex trip chains. According to the maximum entropy-based trip distribution model, simple trip chains account for 77.82% of trips, with a trip volume of 15,120,493, whereas complex trip chains account for 22.18%, with a trip volume of 4,319,897. The main reason might be that, when tourists visit tourism cities, a simple trip chain has a comparatively longer trip distance. This is discussed further in the next section.
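A minimal sketch of the classification: φ is taken as the number of scenic spots in a chain, read from an assumed chain-code format (one digit per zone, origin first), and the shares are computed from the model's chain probabilities. The function names and the probabilities in the example are hypothetical, not the values behind Table 4.

```python
def phi(code):
    """Number of scenic spots (stay points) in a chain code.

    Assumed format: origin digit followed by one digit per visited zone,
    so "93" -> 1 and "937" -> 2.
    """
    return len(str(code)) - 1


def chain_type_shares(chain_prob):
    """Shares of probability mass on simple (phi == 1) and complex (phi == 2) chains."""
    totals = {"simple": 0.0, "complex": 0.0}
    for code, p in chain_prob.items():
        k = phi(code)
        if k == 1:
            totals["simple"] += p
        elif k == 2:
            totals["complex"] += p
        else:
            raise ValueError(f"chain {code} visits {k} scenic spots; at most 2 expected")
    mass = sum(totals.values())
    return {kind: value / mass for kind, value in totals.items()}
```

Multiplying each share by the total trip volume would give the per-type trip volumes of the kind reported in Table 4.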

Analysis of trip distance
Each trip chain has a certain distance, so trip chains can be classified by distance. By counting the trip chains in 5 km intervals, we obtain the distribution of trip distances and the corresponding trip probabilities, listed in Table 5.
As illustrated in Table 5, trip distances in the range of 25-35 km have the largest proportion; the corresponding activity locations are traffic zones 3 and 7, namely scenic spot 3 and the other scenic spots. According to Figure 7, these two traffic zones and their related trip chains also have the largest numbers of tourists. Furthermore, the distances from origin zones 8 and 9 to destination zones 3 and 7 are about 30 km. It is therefore reasonable that the model results are concentrated in the range of 25-40 km. The detailed probability distribution of trip distances is shown in Figure 8.
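The 5 km classification can be sketched as a probability-weighted histogram. The function name and the (distance, probability) pairs in the example are hypothetical; each chain is assigned to the half-open bin [k × 5, (k + 1) × 5) km that contains its distance.

```python
import math
from collections import defaultdict


def distance_histogram(chains, width=5.0):
    """Sum trip chain probabilities into [k*width, (k+1)*width) distance bins.

    chains: iterable of (distance_km, probability) pairs.
    Returns {bin lower bound: total probability}, sorted by distance.
    """
    bins = defaultdict(float)
    for distance_km, prob in chains:
        lower = math.floor(distance_km / width) * width
        bins[lower] += prob
    return dict(sorted(bins.items()))
```

For instance, chains at 31 km and 33 km both fall into the 30-35 km bin, whose probability is the sum of the two.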

Analysis of trip distribution
The trip distribution calculated using the trip chain-based trip distribution model is listed in Table 6.

It can be seen from Table 6 that traffic zones 8 and 9 have relatively large trip volumes. It can therefore be inferred that most tourist trips to the scenic spots are made by railway and by road. Note that the trip volume between some scenic spots is very small, e.g., between traffic zones 1 and 2. The main reason could be that the origin zones of these trip chains, traffic zones 8 and 9, are far from traffic zones 1 and 2, so the model assigns these trips small volumes.
Trip desire lines (i.e., origin-destination flows of expected trips) can be drawn from the trip distribution, as shown in Figure 9. They show that the trip volume from the traffic zones based on scenic spot 3 and the other scenic spots to the traffic zones based on the north railway station and the road entrances/exits is relatively large, followed by the trip volume from the traffic zones based on scenic spots 2, 4, and 5 to the traffic zones based on the north railway station and the road entrances/exits. Constrained by the visiting volume of stay points, the trip volume of the traffic zones based on transportation hubs is the largest, followed by the trip volume between the traffic zones that tourists visit more often. These results are consistent with the trip distribution between traffic zones.
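The step from trip chain results to a zone-to-zone table such as Table 6 can be sketched as follows, under the assumption (not stated in the text) that each chain contributes its volume to every leg it traverses, including the return leg to its origin. Chain codes use an assumed one-digit-per-zone format, and the function name and volumes are hypothetical.

```python
from collections import defaultdict


def chains_to_od(chain_volumes):
    """Aggregate trip chain volumes into OD leg flows {(from_zone, to_zone): volume}.

    Assumed code format: origin digit first, one digit per visited zone;
    every chain returns to its origin, e.g. "937" -> 9 -> 3 -> 7 -> 9.
    """
    od = defaultdict(float)
    for code, volume in chain_volumes.items():
        zones = [int(ch) for ch in str(code)]
        path = zones + [zones[0]]
        for origin, destination in zip(path, path[1:]):
            od[(origin, destination)] += volume
    return dict(od)
```

Summing these leg flows in both directions for each zone pair gives the line widths one would draw as desire lines.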

Screenline Test
To test the accuracy of the trip distribution between scenic spots, known traffic volumes are needed for comparison. This paper therefore sets screenlines between traffic zones. Table 7 shows the areas separated by the screenlines. Specifically, as shown in Figure 10, the boundary of the traffic zone of scenic spot 1 was set as screenline 1; the boundary of the traffic zone of scenic spot 3 as screenline 2; the boundary line enclosing traffic zones 2 and 4 as screenline 3; and the boundary line enclosing the traffic zones of scenic spots 5 and 6 as screenline 4. Then, the road traffic volume was converted into the trip volume of tourists, and the trip OD volume was allocated to the traffic network of the screenlines.
The actual passenger flow across each screenline was obtained by calculating the annual traffic volume of the relevant roads crossing it, and was then converted into the number of tourists. The traffic flow data were collected by recording the number, license plates, and models of vehicles passing road checkpoints, and tourist flow was distinguished from total flow by treating buses and non-local cars as tourist traffic. The trip distribution results obtained from the model were then used to simulate the passenger flow at the cross-sections of the screenlines, and the simulated results were compared with the actual trip volumes to test the reliability of the model output. As shown in Table 8, the error between the simulated and actual passenger flows at the screenline cross-sections is within 15%, which is within the acceptable error range. This indicates that the model is reasonably reliable and can be applied to predict the trip distribution of tourists on the suburban tourist railway in the future.
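The comparison can be sketched as follows. Assumptions: a screenline is a closed boundary around a set of zones; an OD flow crosses it once when exactly one trip end lies inside (flows that only pass through the enclosed area are ignored); and the 15% threshold applies to the relative error |simulated − actual| / actual. The function names and numbers are hypothetical.

```python
def screenline_crossings(od, inside):
    """Total OD flow crossing a closed screenline around the zones in `inside`.

    A flow counts when exactly one of its two ends is inside the boundary.
    """
    return sum(v for (o, d), v in od.items() if (o in inside) != (d in inside))


def relative_error(simulated, observed):
    """|simulated - observed| / observed."""
    return abs(simulated - observed) / observed


def within_tolerance(simulated, observed, tol=0.15):
    """True if the simulated screenline flow is within `tol` of the observed flow."""
    return relative_error(simulated, observed) <= tol
```

For example, with hypothetical flows {(9, 3): 150, (3, 9): 100, (3, 7): 50, (7, 9): 50, (8, 1): 20} and a screenline around zone 3, the simulated crossing volume is 300; against an observed 280, the relative error is about 7%, inside a 15% tolerance.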

Conclusions
Aiming to predict the trip distribution of tourists of the suburban tourist railway, this paper proposed a forecasting model based on the Entropy-Maximizing theory and the trip chain method. A case study was conducted to demonstrate the applicability and effectiveness of the proposed model. The results of the case study show that the model fits the data obtained from the tourism and traffic survey well. The model can therefore serve as a useful decision-support method for the planning and construction of suburban tourist railways.
Future research is still needed to improve the proposed model. The first issue is the integration of traditional survey data with new data sources, such as social media data, to improve the model's robustness and accuracy. In particular, social media is a near-real-time and cost-effective data source that can provide more detailed and potentially more complete information with high spatial-temporal resolution [21,27]. The second issue is forecasting the spatial-temporal trip distribution of tourists on a suburban tourist railway. Time is naturally an important factor influencing tourists' trips; however, this paper is limited to the spatial dimension of trip distribution.