Article

Multi-Objective Optimization of Differentiated Urban Ring Road Bus Lines and Fares Based on Travelers’ Interactive Reinforcement Learning

1
School of Management, Beijing Union University, Beijing 100101, China
2
School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
*
Author to whom correspondence should be addressed.
Symmetry 2021, 13(12), 2301; https://doi.org/10.3390/sym13122301
Submission received: 2 November 2021 / Revised: 17 November 2021 / Accepted: 17 November 2021 / Published: 2 December 2021

Abstract

This paper proposes a new multi-objective bi-level programming model for the ring road bus lines and fare design problems. The proposed model consists of two layers: the traffic management operator and travelers. In the upper level, we propose a multi-objective bus lines and fares optimization model in which the operator’s profit and travelers’ utility are set as objective functions. In the lower level, an evolutionary multi-agent model of travelers’ bounded rational reinforcement learning with social interaction is introduced. A solution algorithm for the multi-objective bi-level programming is developed on the basis of the equalization algorithm of the OD matrix. A numerical example based on a real case was conducted to verify the proposed models and solution algorithm. The computational results indicated that travel choice models with different degrees of rationality significantly changed the optimization results of bus lines and the differentiated fares; furthermore, the multi-objective bi-level programming in this paper can generate solutions that reduce the maximum section flow, increase the profit, and reduce travelers’ generalized travel cost.

1. Introduction

1.1. Background and Motivation

In recent years, with the continuous increase of the scale of big cities, the road network of many large cities presents a ring structure spreading from the center to the periphery. From the perspective of urban planning, in cities with high population density, the ring road plays an important role in alleviating congestion in urban centers and realizing rapid connectivity between urban areas [1]. The ring road traffic congestion index should show the characteristics of gradually decreasing from the core area to the periphery, and according to the Beijing Transportation Development Annual Report 2021, the traffic congestion index in Beijing has decreased under the COVID-19 virus situation; however, congestion still occurs frequently in some ring roads. For example, during the rush hour of public transportation, there has been a long period of congestion between Ciyunsi bridge and Dajiaoting bridge on the middle east fourth ring road, which indicates that the bus line operation on the ring road needs to be further optimized.
In addition, in the existing research on traffic flow evolution and travel behavior, scholars have carried out important work on the endogenous complexity of travel behavior, such as travelers’ reference point dependency, mental accounting, regret aversion, and social interaction [2,3]. Through these studies, scholars found that, under the influence of bounded rationality, social interaction, and behavioral complexity, the traffic flow evolution is affected by the complexity of group decision making. Therefore, the design and operation scheme of bus lines will also be affected by the complexity of travelers’ group behavior. In terms of traffic system optimization, the impact of travelers’ bounded rationality, social interaction, and day-to-day evolutionary complexity on the optimization results of the transportation system deserves more in-depth research.

1.2. Literature Review

The optimization of bus line operation seeks to facilitate people’s traveling by adjusting the existing stop schedule plan, differentiated fares, timetable, etc. In general, the objective function contains profit maximization, maximizing travel utility, minimizing the operation cost, etc. In recent years, more and more studies have found that optimizing the variables related to travel behavior or utility can significantly improve the performance of traffic system. Hm et al. [4] found that optimizing the timetable on the basis of the dynamic travel demand of passengers can effectively increase the line passenger flow. On the basis of the multi-source big data, Yu et al. [5] accurately extracted candidate stations, which are very popular with travelers and convenient for transfer; the empirical study of a real case showed that the optimized lines can quickly satisfy travelers’ demand. Aiming at minimizing the cost of travelers and operators, Liang et al. [6] established a multi-objective optimization model of bus networks, which successfully reduced travelers’ waiting time and the in-vehicle time. Aiming at the optimization problem of bus network with demand response and the revenue of operators, Huang et al. [7] proposed a two-stage (static phase and dynamic phase) optimization model to solve the network design problem. Zhang et al. [8] introduced the travel time dependence of travelers into the bus timetable optimization problem, significantly improving the utility of passengers.
On the other hand, some studies in recent years have also found that, in addition to reasonable bus line design, the differentiated fares can also improve the efficiency of the traffic system; compared with the traditional fare scheme, differentiated bus fares strategy often has more advantages [9,10]. The differentiated fares system can effectively reduce travelers’ time cost and alleviate traffic congestion during peak hours [11,12]. The implementation of differentiated fares based on comfort level can reduce social costs and is more conducive to the public transport system than charging congestion fees [13]. In addition, the differentiated bus fares scheme can achieve a better balance between eliminating externalities and ensuring consumer surplus, as well as improving the Pareto distribution [12,13].
From the above literature review, it can be seen that the joint improvement and optimization of the bus network and fare scheme can effectively improve the efficiency of the transportation system. However, research on the joint optimization of the bus network and differentiated fares needs to be deepened further. Moreover, most of the existing studies of bus lines and fares rely on basic assumptions that simplify travelers’ behavior factors, and the cluster behavior of travelers is usually not considered. Some typical studies have found that after the implementation of the optimization scheme, there will still be a social dilemma [14,15] in the use of transportation resources. This is because, on the one hand, the traffic system is a typical complex socio-economic system; travelers’ decision making will be affected by the social environment and bounded rationality [16]; and under the influence of multi-channel information, cluster travel behavior has the characteristics of learning and interaction [17,18]. On the other hand, travel choice is a day-to-day evolution process [19], and thus there is an uncertain causal relationship between travelers’ traffic information and behavior [20]. In addition, it is very difficult for travelers to accurately obtain the utility information of all potential travel modes, and thus it is also difficult to simulate the process of travel experience accumulation by using an analytical model.
The work of this paper is mainly divided into two parts: firstly, travelers’ reinforcement learning with social interaction is introduced into the ring road bus lines design problem. Secondly, a multi-objective bi-level programming of bus lines and differentiated ticket fares joint optimization model is established, and the solution algorithm in which the swarm intelligence multi-objective optimization algorithm is combined with the equalization algorithm of OD matrix is designed. Moreover, the model we proposed in this paper was applied to the Fourth Ring Road in Beijing to verify the effectiveness of the model.

1.3. Paper Organization

The remainder of this paper is organized as follows. The problem statement and basic assumptions are presented in Section 2. The analysis of generalized travel cost is presented in Section 3. Section 4 describes travelers’ BM reinforcement learning model with social interaction, which is followed by the properties of the model. Section 5 proposes the multi-objective bi-level programming model of bus lines and differentiated ticket fares, and Section 6 presents the corresponding solution algorithm of multi-objective bi-level programming. Section 7 presents the numerical example under real case to verify the proposed model and algorithm. Section 8 concludes this paper.

2. Problem Statement and the Basic Assumptions

2.1. Problem Statement

Consider a graph $G = (V, A)$ of an urban ring road that contains $N$ main bus stops, where $V$ is the set of bus stops and $A$ is the set of links ($a \in A$). Let $R$ denote the set of ring bus lines, where each bus line is composed of bus stops and links, and let $p$ denote the travel mode of private car and $b$ the travel mode of shared bike. Let

$$
D = \begin{pmatrix} D_{1,1} & \cdots & D_{1,N} \\ \vdots & \ddots & \vdots \\ D_{N,1} & \cdots & D_{N,N} \end{pmatrix}
$$

denote the demand matrix in graph $G$, in which $D_{i,j}$ represents the travel demand between bus stops $i$ and $j$. On the ring road, travelers always choose the shortest route. In daily travel activities, travelers can choose buses, private car, or shared bike between OD pair $i$ and $j$; thus, $D_{i,j}^{r}$ is the travel demand of bus line $r$ between OD pair $i$ and $j$. Due to the competition between different travel modes and travelers’ choice between different bus lines on the ring road, the traffic flow will transfer among different ring bus lines, private cars, and shared bikes.
In reality, there are often different stop schedule plans for different ring bus lines on the ring road; let $N_{\max}^{line}$ denote the maximum number of ring bus lines on the ring road. Thus, the stop schedule plans can be represented as a 0–1 matrix (see Table 1).
It can be seen from Table 1 that in the matrix, “1” means that the bus stops at the corresponding bus stop, and “0” means that it does not. In addition, there are differentiated ticket fares $P_r$ per kilometer for the $N_{\max}^{line}$ bus lines; if travelers choose private car or shared bike, they will have to pay the parking fee or bike sharing fee. The objective of the bus operation management department is to optimize and adjust the stop schedule plans and differentiated fares of bus lines, so as to improve the operation income, expand social welfare, and balance the transportation resources. Therefore, in this paper, we set the matrix of stop schedule plans and the differentiated fares of bus lines as the optimization variables.

2.2. Basic Assumptions

(1)
Travel demand between bus stops. The travel demand D between bus stops is obtained from real daily bus IC card data in Beijing.
(2)
Travelers’ bounded rationality. It is difficult for travelers to know the accurate utility information of all potential travel modes at the same time. Travel decision-making is affected by travel cost, information interaction, and historical travel experience; thus, travelers’ perception of utility is a process of reinforcement learning.
(3)
Travel modes. There are three optional travel modes among bus stops: (1) buses, (2) private car, and (3) shared bike. In the ring bus lines, travelers can travel between any two bus stops without changing lines, and they always choose the shortest path (in a ring bus line, there are clockwise and counterclockwise paths from bus stop i to bus stop j). Let d* denote the critical distance of bicycle riding; when the distance between OD is larger than d*, travelers will not choose to ride a bicycle (Figure 1).
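The shortest-path and critical-distance rules in assumption (3) can be illustrated with a minimal sketch. The function names, the 0-based stop indexing, and the list-of-link-lengths representation are assumptions made for this illustration only; they are not part of the model specification.

```python
def ring_distance(link_lengths, i, j):
    """Shortest distance between stops i and j on a ring.

    link_lengths[k] is the length of the link from stop k to stop (k + 1) % N.
    Stops are 0-indexed in this sketch.
    """
    n = len(link_lengths)
    # Clockwise distance: traverse links i, i+1, ..., j-1 (indices modulo n).
    clockwise = 0.0
    k = i
    while k != j:
        clockwise += link_lengths[k]
        k = (k + 1) % n
    # The counterclockwise path covers the rest of the ring.
    counterclockwise = sum(link_lengths) - clockwise
    return min(clockwise, counterclockwise)


def bike_available(link_lengths, i, j, d_star):
    """Shared bike is an option only if the OD distance does not exceed d*."""
    return ring_distance(link_lengths, i, j) <= d_star
```

For a ring with link lengths `[2.0, 3.0, 4.0, 1.0]`, the trip from stop 0 to stop 3 uses the counterclockwise path of length 1.0 rather than the clockwise path of length 9.0.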

3. Generalized Travel Cost

By summarizing the literature on travel behavior, we find that the generalized travel cost consists of the following elements:
(1)
Psychological time of waiting for the bus
For travelers who choose buses, arrivals at the bus stop follow a Poisson process with intensity $\lambda$. Let $\phi_r$ denote the departure frequency of bus line $r$, and let $\tau_x$ denote the time of traveler $x$’s arrival at the bus stop. The arrivals of travelers can be regarded as independent random variables that are uniformly distributed on the interval $[0, 1/\phi_r]$. Thus, we have $E(\tau_x) = \frac{1}{2\phi_r}$, and the waiting time can be represented as $\frac{1}{\phi_r} - \tau_x$. Let $S(\tau)$ denote the number of travelers who have arrived at the bus stop by time $\tau$; the expected waiting time of travelers can then be formulated as $E\left[\sum_{x=1}^{S(\tau)} \left(\frac{1}{\phi_r} - \tau_x\right)\right]$. According to the time processing theory, there is a certain difference between travelers’ psychological feeling and the actual physical time, and the psychological feeling is more in accordance with travelers’ perception of utility. Therefore, the physical time should be converted into the psychological time $\alpha\left(\frac{1}{\phi_r} - \tau_x\right)^{\beta}$, where $\alpha$ and $\beta$ represent the travel purpose coefficient and the attention coefficient, respectively. Moreover, we have

$$
\begin{cases}
E_{psy}\left[\sum_{x=1}^{S(\tau)} \alpha\left(\frac{1}{\phi_r} - \tau_x\right)^{\beta}\right] = \lambda \int_{0}^{\tau} \alpha\left(\frac{1}{\phi_r} - \tau_x\right)^{\beta} d\tau_x \\
E[S(\tau)] = \frac{\lambda}{\phi_r}
\end{cases}
\tag{1}
$$

Solving Equation (1), travelers’ psychological time of waiting for the bus can be formulated as $T_{psy}^{r} = \frac{\alpha}{\phi_r^{\beta}(\beta + 1)}$; for travelers who choose private car or shared bike, the psychological waiting time is 0.
(2)
Travel time
For travelers who choose buses between OD pair $i, j$ on day $t$, if they choose bus line $r$, the travel time can be represented as $T_{i,j}^{r,t}$. For travelers who choose private cars, we assume in this paper that the travel time of private car between OD pair $i, j$ equals the shortest bus line travel time, which can be represented as $T_{i,j}^{p,t} = \min_{r \in R}\{T_{i,j}^{r,t}\}$. For travelers who choose shared bike, the travel time between OD pair $i, j$ is $T_{i,j}^{b,t} = \frac{d_{i,j}}{v_b}$, in which $d_{i,j}$ is the distance between bus stops $i$ and $j$, and $v_b$ is the average speed of a bicycle.
(3)
Crowding degree
For travelers who choose buses, the bus may be crowded because of the restrictions of bus capacity and the different stop schedule plans. Take bus line $r$, which contains $N_r$ bus stops, as an example, and let $f_{i,i+1}^{r+,t}$ and $f_{i,i+1}^{r-,t}$ represent the up-direction (from $i$ to $i+1$) and down-direction (from $i+1$ to $i$) traffic flow between bus stops $i$ and $i+1$ on day $t$, respectively. Thus, we have

$$
\begin{cases}
f_{i,i+1}^{r+,t} = f_{i-1,i}^{r+,t} + \sum_{j=i}^{N_r} D_{i,j}^{r,t} - \sum_{j=1}^{i} D_{j,i}^{r,t} \\
f_{i,i+1}^{r-,t} = f_{i+1,i+2}^{r-,t} + \sum_{j=1}^{i+1} D_{i+1,j}^{r,t} - \sum_{j=i+1}^{N_r} D_{j,i+1}^{r,t}
\end{cases}
\tag{2}
$$

Then the maximum section flow is $\max\{f_{i,i+1}^{r+} + f_{i,i+1}^{r-}\}$ $(r \in R, i \in V)$, and the crowding degree of bus line $r$ between bus stops $i$ and $j$ on day $t$ can be formulated as

$$
\begin{cases}
C_{i,j}^{r+,t} = \sum_{s=i}^{j-1} \dfrac{\eta f_{s,s+1}^{r+,t}}{V_r \phi_r} \\
C_{i,j}^{r-,t} = \sum_{s=i}^{j-1} \dfrac{\eta f_{s,s+1}^{r-,t}}{V_r \phi_r}
\end{cases}
\tag{3}
$$

Here, $V_r$ represents the bus capacity of line $r$, and $\eta$ is the crowding factor. In addition, we assume that the crowding degree (not traffic jam) of private car and shared bike is 0.
(4)
Bus ticket fare, parking fee of private car, and bike sharing fee
For travelers who choose buses, let $P_r$ denote the fare per kilometer of bus line $r$; thus, travelers need to pay $P_r d_{i,j}$. Travelers who choose private car or shared bike need to pay the parking fee or the bike sharing fee, which are represented as $\tilde{P}_p$ and $\tilde{P}_b$, respectively.
(5)
The effect of social interaction on travel cost
Let $\xi_x = 1$ denote that traveler $x$ chooses the bus and $\xi_x = -1$ that traveler $x$ chooses another travel mode, and let $\lambda_m$ be the social interaction level. Let $E(\mu_x)$ denote the traveler’s expectation of the travel mode choice between bus stops $i$ and $j$; thus, $E(\mu_x) = \frac{\sum_{y \neq x} E(\mu_y)}{\bar{D}_{i,j} - 1}$, and according to the principle of multiplier interaction, the effect of social interaction on day $t$ can be formulated as

$$
M_{i,j}^{t}(\xi_x, \mu_x) = \lambda_m \xi_x \frac{\sum_{y \neq x} E(\mu_y)}{\bar{D}_{i,j} - 1}
\tag{4}
$$

In summary, the generalized travel cost between bus stops $i$ and $j$ on day $t$ can be formulated as

$$
\begin{cases}
G_{i,j}^{\kappa,t} = \zeta_{psy} T_{psy}^{\kappa} + \zeta_T T_{i,j}^{\kappa,t} + \zeta_C C_{i,j}^{\kappa,t} + \zeta_P P_{\kappa} d_{i,j} + \zeta_M M_{i,j}^{t}(\xi_x, \mu_x), & (\kappa = r,\ r \in R) \\
G_{i,j}^{\kappa,t} = \zeta_T T_{i,j}^{\kappa,t} + \zeta_C C_{i,j}^{\kappa,t} + \zeta_P \tilde{P}_{\kappa} + \zeta_M M_{i,j}^{t}(\xi_x, \mu_x), & (\kappa = p, b)
\end{cases}
\tag{5}
$$

Here, $\zeta_{psy}$, $\zeta_T$, $\zeta_C$, $\zeta_P$, and $\zeta_M$ represent cost coefficients.
In the existing studies, some scholars use the social interaction model to simulate group travel choice behavior, but in reality, the essence of “interaction” is the diffusion of asymmetric and incomplete travel information in the group. Travelers make decisions based on the external information they receive, rather than being directly influenced by other travelers’ behavior. Therefore, in this paper, travelers’ social interaction is reflected in the information of generalized travel cost (Equation (5)) rather than the choice behavior itself.
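As a hedged illustration of how the cost elements of this section fit together, the sketch below evaluates the closed-form psychological waiting time, checks it numerically against the average of $\alpha(1/\phi_r - \tau)^{\beta}$ over a uniform arrival time, and assembles the generalized cost as a weighted sum. The function names and the dictionary of coefficients are assumptions of this sketch, and the parameter values used in any call are arbitrary.

```python
def psychological_wait(alpha, beta, phi_r):
    """Closed form T_psy = alpha / (phi_r**beta * (beta + 1))."""
    return alpha / (phi_r ** beta * (beta + 1))


def psychological_wait_numeric(alpha, beta, phi_r, steps=100_000):
    """Midpoint-rule average of alpha * (1/phi_r - tau)**beta for tau
    uniformly distributed on [0, 1/phi_r]."""
    headway = 1.0 / phi_r
    h = headway / steps
    total = 0.0
    for k in range(steps):
        tau = (k + 0.5) * h
        total += alpha * (headway - tau) ** beta
    return total * h / headway


def generalized_cost_bus(t_psy, travel_time, crowding, fare_per_km,
                         distance, interaction, coef):
    """Weighted sum of the five cost elements for a bus line."""
    return (coef["psy"] * t_psy + coef["T"] * travel_time
            + coef["C"] * crowding + coef["P"] * fare_per_km * distance
            + coef["M"] * interaction)


def generalized_cost_other(travel_time, crowding, fee, interaction, coef):
    """Private car or shared bike: no waiting term, flat fee instead of fare."""
    return (coef["T"] * travel_time + coef["C"] * crowding
            + coef["P"] * fee + coef["M"] * interaction)
```

The numerical check is only a sanity test of the closed form; it does not replace the derivation from Equation (1).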

4. BM Reinforcement Learning Model with Interaction

In real travel activities, travelers’ behavior is not always completely rational, and counterintuitive paradox often occurs in daily travel choice decision making, which conflicts with the traditional expected utility theory. In recent years, regret theory has been developed continuously. Regret theory holds that the decision making of travelers’ route (travel mode) choice is not only related to the utility of the selected route itself, but also related to the feedback generated by the comparison with other alternative routes (travel modes). At present, regret theory has been found to be more accurate in describing travelers’ decision-making behavior in an uncertain environment, and the calculated results are more in accordance with the reality. Therefore, this paper describes travelers’ generalized cost according to regret theory.

4.1. Utility Based on Regret Theory

The model construction of regret theory has experienced the improvement process from RRM1 [21] to RRM2 [22], and then to the consideration of path impedance and “regret feeling” [23]. In this paper, the construction idea of the regret theory is organically combined with the above travel scenarios, and the travel mode choice model is constructed as follows.
The regret cost based on generalized travel cost is formulated as
$$
\bar{h}_{i,j}^{\kappa,t} = G_{i,j}^{\kappa,t} - \delta\left(\min_{\kappa \in \{R,p,b\}}\{G_{i,j}^{\kappa,t}\} - G_{i,j}^{\kappa,t}\right)
\tag{6}
$$

The formation of regret psychology is based on the objective generalized cost observed by travelers; thus, the $G_{i,j}^{\kappa,t}$ inside $\delta(\cdot)$ represents the generalized travel cost without social interaction. According to the design method of the regret function, the function $\delta(x)$ can be represented as

$$
\delta(x) = 1 - e^{-\psi x}
\tag{7}
$$

Here, $\psi$ represents travelers’ regret aversion level; the larger the value of $\psi$, the more regret-averse the traveler is.
Furthermore, in this paper, we assume that travelers’ regret aversion level $\psi$ is heterogeneous. Let $N_{agent}$ be the number of agents participating in the reinforcement learning simulation between each OD pair; the $N_{agent}$ agents form the decision space $\Omega_{agent}$ ($x \in \Omega_{agent}$), and the agents are distributed on a grid, with each node representing a traveler. After traveling between bus stops $i$ and $j$ on day $t$, each traveler updates their $\psi$ through information exchange in the Moore neighborhood (unlike the travel information on various intelligent devices, regret aversion is an endogenous psychological activity, and thus we set a small interaction range).
This process can be designed as follows:
(1) Each traveler $x$ chooses the traveler with the lowest regret cost in the neighborhood (denoted as $x^{*}$).

$$
\bar{h}_{i,j,x^{*}}^{\kappa,t} = \min\left\{\bar{h}_{i,j,x'}^{\kappa,t} \mid x' \in \Omega_{neighbor}\right\}
\tag{8}
$$

(2) Let $\psi_x$ be the regret aversion level of traveler $x$; traveler $x$ updates the value of $\psi_x$ with intensity $p_c$.

$$
\psi_x^{t+1} = (1 - p_c)\,\psi_x^{t} + p_c\,\psi_{x^{*}}^{t}
\tag{9}
$$
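The regret cost and the neighborhood update of $\psi$ can be sketched as follows. This is a minimal illustration under stated assumptions: the grid does not wrap around at its borders, the neighborhood excludes the traveler itself, all travelers update synchronously, and the names `regret_costs` and `update_psi` are hypothetical.

```python
import math


def regret_delta(x, psi):
    """Regret function delta(x) = 1 - exp(-psi * x)."""
    return 1.0 - math.exp(-psi * x)


def regret_costs(costs, psi, costs_no_interaction=None):
    """Regret cost per mode: h = G - delta(min G' - G').

    costs: dict mode -> generalized cost G.
    costs_no_interaction: dict mode -> cost without the social interaction
    term (G'); defaults to costs when not supplied.
    """
    base = costs if costs_no_interaction is None else costs_no_interaction
    best = min(base.values())
    return {m: costs[m] - regret_delta(best - base[m], psi) for m in costs}


def update_psi(psi_grid, regret_grid, p_c):
    """One synchronous update of psi on a rectangular grid.

    Each traveler moves its psi toward the psi of the Moore neighbor
    (up to eight surrounding cells) that has the lowest regret cost.
    """
    rows, cols = len(psi_grid), len(psi_grid[0])
    new_grid = [row[:] for row in psi_grid]
    for r in range(rows):
        for c in range(cols):
            best = None
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) == (0, 0) or not (0 <= rr < rows and 0 <= cc < cols):
                        continue
                    if best is None or regret_grid[rr][cc] < regret_grid[best[0]][best[1]]:
                        best = (rr, cc)
            if best is not None:
                new_grid[r][c] = ((1 - p_c) * psi_grid[r][c]
                                  + p_c * psi_grid[best[0]][best[1]])
    return new_grid
```

With this form of $\delta$, the mode with the lowest cost carries no regret, and the regret added to any other mode grows with both the cost gap and $\psi$.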

4.2. Bush–Mosteller Reinforcement Learning Model Based on Regret Theory

Most of the existing studies use logit model to depict the choice of travel mode, in which travelers know exactly the utility of each potential choice when making decisions. However, in reality, it is difficult for travelers to know the accurate utility information of all potential choices, and travelers’ choice of travel mode is a process of continuous improvement of their own experience that is affected by travel cost, information interaction, and historical travel experience. Therefore, in this paper, the Bush–Mosteller reinforcement learning model is introduced to simulate the evolution process of travelers’ travel mode choice.
Traveler $x$ can only choose one single travel mode in one day, and they may choose different travel modes over several days, which indicates that over a long period of daily travel activities, travelers can not only obtain the perceptive utility (PU) of every travel mode but can also gain the experience of travel mode utility (EU) on day $t$. Let $U_{i,j}^{\kappa,t}$ and $E_{i,j}^{t}$ denote the PU of travel mode $\kappa$ and the EU on day $t$, respectively, which can be formulated as

$$
U_{i,j,x}^{\kappa,t} =
\begin{cases}
\dfrac{\sum_{s=1}^{t} \left(-\bar{h}_{i,j,x}^{\kappa,t}\right) \varepsilon_{i,j,x}^{\kappa,s}}{\sum_{s=1}^{t} \varepsilon_{i,j,x}^{\kappa,s}}, & s \le L \\[2ex]
\dfrac{\sum_{s=t-L}^{t} \left(-\bar{h}_{i,j,x}^{\kappa,t}\right) \varepsilon_{i,j,x}^{\kappa,s}}{\sum_{s=t-L}^{t} \varepsilon_{i,j,x}^{\kappa,s}}, & s > L
\end{cases}
\tag{10}
$$

$$
E_{i,j,x}^{t} =
\begin{cases}
\dfrac{\sum_{s=1}^{t-1} \left(-\bar{h}_{i,j,x}^{\kappa,t}\right)}{t - 1}, & s \le L \\[2ex]
\dfrac{\sum_{s=t-L}^{t-1} \left(-\bar{h}_{i,j,x}^{\kappa,t}\right)}{t - L}, & s > L
\end{cases}
\tag{11}
$$

Here, $\varepsilon_{i,j,x}^{\kappa,s}$ is a 0–1 variable; if traveler $x$ chooses travel mode $\kappa$ between bus stops $i$ and $j$ on day $s$, then $\varepsilon_{i,j,x}^{\kappa,s} = 1$; otherwise, $\varepsilon_{i,j,x}^{\kappa,s} = 0$. $L$ represents the traveler’s memory length of historical regret cost.
On day $t$, traveler $x$ makes a comparison between $U_{i,j,x}^{\kappa,t}$ and $E_{i,j,x}^{t}$:

$$
u_{i,j,x}^{\kappa,t} =
\begin{cases}
\dfrac{U_{i,j,x}^{\kappa,t} - E_{i,j,x}^{t}}{\left|\max\{U_{i,j,x}^{\kappa,t} - E_{i,j,x}^{t}\}\right|}, & U_{i,j,x}^{\kappa,t} \ge E_{i,j,x}^{t} \\[2ex]
\dfrac{U_{i,j,x}^{\kappa,t} - E_{i,j,x}^{t}}{\left|\min\{U_{i,j,x}^{\kappa,t} - E_{i,j,x}^{t}\}\right|}, & U_{i,j,x}^{\kappa,t} < E_{i,j,x}^{t}
\end{cases}
\tag{12}
$$
Let $l$ denote the learning intensity of travelers; traveler $x$ updates the choice probability of travel mode $\kappa$ (represented as $\omega_{i,j,x}^{\kappa,t}$) and the choice probabilities of the other travel modes $\neg\kappa$ (represented as $\omega_{i,j,x}^{\neg\kappa,t}$) between bus stops $i$ and $j$.

$$
\omega_{i,j,x}^{\kappa,t} =
\begin{cases}
\omega_{i,j,x}^{\kappa,t-1} + \left(1 - \omega_{i,j,x}^{\kappa,t-1}\right) l\, u_{i,j,x}^{\kappa,t-1}, & u_{i,j,x}^{\kappa,t-1} \ge 0 \\
\omega_{i,j,x}^{\kappa,t-1} + \omega_{i,j,x}^{\kappa,t-1}\, l\, u_{i,j,x}^{\kappa,t-1}, & u_{i,j,x}^{\kappa,t-1} < 0
\end{cases}
\tag{13}
$$

$$
\omega_{i,j,x}^{\neg\kappa,t} =
\begin{cases}
\omega_{i,j,x}^{\neg\kappa,t-1} - \omega_{i,j,x}^{\neg\kappa,t-1}\, l\, u_{i,j,x}^{\kappa,t-1}, & u_{i,j,x}^{\kappa,t-1} \ge 0 \\[1ex]
\omega_{i,j,x}^{\neg\kappa,t-1} - \dfrac{\omega_{i,j,x}^{\neg\kappa,t-1}\, \omega_{i,j,x}^{\kappa,t-1}\, l\, u_{i,j,x}^{\kappa,t-1}}{1 - \omega_{i,j,x}^{\kappa,t-1}}, & u_{i,j,x}^{\kappa,t-1} < 0
\end{cases}
\tag{14}
$$

The average choice probability between bus stops $i$ and $j$ can be represented as $\left(\bar{\omega}_{i,j,x}^{\kappa,t}, \bar{\omega}_{i,j,x}^{\neg\kappa,t}\right)$, and the traffic flow of travel mode $\kappa$ between bus stops $i$ and $j$ is $Q_{i,j}^{\kappa,t} = D_{ij}\, \bar{\omega}_{i,j,x}^{\kappa,t}$.
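The probability update for the chosen mode and for the remaining modes can be sketched as a single function. This is a minimal sketch, assuming the stimulus $u$ has already been normalized to $[-1, 1]$ and that the chosen mode has probability below 1 when the stimulus is negative; the name `bm_update` and the dictionary representation are illustrative choices.

```python
def bm_update(probs, chosen, stimulus, learning_rate):
    """One Bush-Mosteller step for a single traveler.

    probs: dict mode -> current choice probability (sums to 1).
    chosen: the mode used on the previous day.
    stimulus: normalized stimulus u of the chosen mode, in [-1, 1].
    learning_rate: learning intensity l, in [0, 1].
    """
    p_k = probs[chosen]
    step = learning_rate * stimulus
    updated = {}
    if stimulus >= 0:
        # Positive feedback: reinforce the chosen mode, shrink the others
        # proportionally.
        updated[chosen] = p_k + (1 - p_k) * step
        for mode, p in probs.items():
            if mode != chosen:
                updated[mode] = p - p * step
    else:
        # Negative feedback: weaken the chosen mode and redistribute the
        # lost probability to the others in proportion to their shares.
        updated[chosen] = p_k + p_k * step
        for mode, p in probs.items():
            if mode != chosen:
                updated[mode] = p - p * p_k * step / (1 - p_k)
    return updated
```

In both branches the probabilities still sum to 1 after the update, which is what allows the average probabilities to be multiplied by the fixed demand $D_{ij}$ to obtain mode flows.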

4.3. Properties of the Model

Theorem 1. 
There exists an equilibrium state of traffic flow among all the bus stops. The necessary and sufficient condition for the non-zero traffic flow of each travel mode to reach the equilibrium state is that the perceptive utility (PU) of each travel mode is the same as the experience of travel mode utility (EU), which can be formulated as

$$
\sum_{x=1}^{Q_{ij}} U_{i,j,x}^{\kappa}(\psi_x) = \sum_{x=1}^{Q_{ij}} E_{i,j,x}(\psi_x), \quad \forall \kappa \in \{R, p, b\}
\tag{15}
$$
Proof. 
On day $t$, travelers between bus stops $i$ and $j$ can be divided into two groups: travelers who choose travel mode $\kappa$ form the first group, and travelers who do not choose travel mode $\kappa$ form the second group. Travelers from the first group make decisions based on Equation (13), and the probability updating formula of choosing travel mode $\kappa$ can be formulated as

$$
\Delta\omega_{i,j,x}^{\kappa,t} = \frac{\omega_{i,j,x}^{\kappa,t} - \omega_{i,j,x}^{\kappa,t-1}}{\omega_{i,j,x}^{\kappa,t-1}} =
\begin{cases}
\dfrac{\left(1 - \omega_{i,j,x}^{\kappa,t-1}\right) l\, u_{i,j,x}^{\kappa,t-1}}{\omega_{i,j,x}^{\kappa,t-1}}, & u_{i,j,x}^{\kappa,t-1} \ge 0 \\[2ex]
l\, u_{i,j,x}^{\kappa,t-1}, & u_{i,j,x}^{\kappa,t-1} < 0
\end{cases}
\tag{16}
$$

Travelers from the second group make decisions based on Equation (13), and the probability updating formula of choosing travel mode $\kappa$ can be formulated as

$$
\Delta\omega_{i,j,x}^{\kappa,t} =
\begin{cases}
-\,l\, u_{i,j,x}^{\neg\kappa,t-1}, & u_{i,j,x}^{\neg\kappa,t-1} \ge 0 \\[1ex]
-\,\dfrac{\omega_{i,j,x}^{\neg\kappa,t-1}\, l\, u_{i,j,x}^{\neg\kappa,t-1}}{1 - \omega_{i,j,x}^{\neg\kappa,t-1}}, & u_{i,j,x}^{\neg\kappa,t-1} < 0
\end{cases}
\tag{17}
$$
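Therefore, the probability updating formula of choosing travel mode $\kappa$ for all the travelers between bus stops $i$ and $j$ can be represented as

$$
\sum_{x=1}^{D_{ij}} \Delta\omega_{i,j,x}^{\kappa}
= \sum_{x=1}^{Q_{i,j}^{\kappa}} \Delta\omega_{i,j,x}^{\kappa} + \sum_{\neg\kappa \in \{R,p,b\}} \sum_{x=1}^{Q_{i,j}^{\neg\kappa}} \Delta\omega_{i,j,x}^{\kappa}
=
\begin{cases}
\displaystyle \sum_{x=1}^{Q_{i,j}^{\kappa}} \frac{\left(1 - \omega_{i,j,x}^{\kappa}\right) l\, u_{i,j,x}^{\kappa}}{\omega_{i,j,x}^{\kappa}}
- \sum_{\substack{\neg\kappa \in \{R,p,b\}, \\ u_{i,j,x}^{\neg\kappa,t-1} \ge 0}} \sum_{x=1}^{Q_{i,j}^{\neg\kappa}} l\, u_{i,j,x}^{\neg\kappa}
- \sum_{\substack{\neg\kappa \in \{R,p,b\}, \\ u_{i,j,x}^{\neg\kappa,t-1} < 0}} \sum_{x=1}^{Q_{i,j}^{\neg\kappa}} \frac{\omega_{i,j,x}^{\neg\kappa}\, l\, u_{i,j,x}^{\neg\kappa}}{1 - \omega_{i,j,x}^{\neg\kappa}}, & u_{i,j,x}^{\kappa} \ge 0 \\[4ex]
\displaystyle \sum_{x=1}^{Q_{i,j}^{\kappa}} l\, u_{i,j,x}^{\kappa}
- \sum_{\substack{\neg\kappa \in \{R,p,b\}, \\ u_{i,j,x}^{\neg\kappa,t-1} \ge 0}} \sum_{x=1}^{Q_{i,j}^{\neg\kappa}} l\, u_{i,j,x}^{\neg\kappa}
- \sum_{\substack{\neg\kappa \in \{R,p,b\}, \\ u_{i,j,x}^{\neg\kappa,t-1} < 0}} \sum_{x=1}^{Q_{i,j}^{\neg\kappa}} \frac{\omega_{i,j,x}^{\neg\kappa}\, l\, u_{i,j,x}^{\neg\kappa}}{1 - \omega_{i,j,x}^{\neg\kappa}}, & u_{i,j,x}^{\kappa} < 0
\end{cases}
\tag{18}
$$

Combining and expanding the sums above gives

$$
\sum_{x=1}^{D_{ij}} \Delta\omega_{i,j,x}^{\kappa} =
\begin{cases}
\displaystyle l_1 \left\{ \sum_{x=1}^{Q_{i,j}^{\kappa}} \frac{U_{i,j,x}^{\kappa}(\psi_x) - E_{i,j,x}(\psi_x)}{\omega_{i,j,x}^{\kappa}}
- \sum_{\kappa \in \{R,p,b\}} \sum_{x=1}^{Q_{ij}} \left[U_{i,j,x}^{\kappa}(\psi_x) - E_{i,j,x}(\psi_x)\right]
+ \sum_{\substack{\neg\kappa \in \{R,p,b\}, \\ u_{i,j,x}^{\neg\kappa} < 0}} \sum_{x=1}^{Q_{i,j}^{\neg\kappa}} \frac{\left(1 - 2\omega_{i,j,x}^{\neg\kappa}\right)\left[U_{i,j,x}^{\neg\kappa}(\psi_x) - E_{i,j,x}(\psi_x)\right]}{1 - \omega_{i,j,x}^{\neg\kappa,t-1}} \right\}, & u_{i,j,x}^{\kappa} \ge 0 \\[4ex]
\displaystyle l_2 \left\{ \sum_{x=1}^{Q_{i,j}^{\kappa}} \left[U_{i,j,x}^{\kappa}(\psi_x) - E_{i,j,x}(\psi_x)\right]
- \sum_{\substack{\neg\kappa \in \{R,p,b\}, \\ u_{i,j,x}^{\neg\kappa} \ge 0}} \sum_{x=1}^{Q_{i,j}^{\neg\kappa}} \left[U_{i,j,x}^{\neg\kappa}(\psi_x) - E_{i,j,x}(\psi_x)\right]
- \sum_{\substack{\neg\kappa \in \{R,p,b\}, \\ u_{i,j,x}^{\neg\kappa} < 0}} \sum_{x=1}^{Q_{i,j}^{\neg\kappa}} \frac{\omega_{i,j,x}^{\neg\kappa}\left[U_{i,j,x}^{\neg\kappa}(\psi_x) - E_{i,j,x}(\psi_x)\right]}{1 - \omega_{i,j,x}^{\neg\kappa}} \right\}, & u_{i,j,x}^{\kappa} < 0
\end{cases}
\tag{19}
$$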
Sufficiency: when condition (15) is satisfied, $\sum_{x=1}^{D_{ij}} \Delta\omega_{i,j,x}^{\kappa} = 0$ holds, and the traffic flow reaches equilibrium.
Necessity: when $\sum_{x=1}^{D_{ij}} \Delta\omega_{i,j,x}^{\kappa} = 0$ holds, for $\omega_{i,j,x}^{\kappa} \neq 0$ and $\omega_{i,j,x}^{\neg\kappa} \neq 0$, with the continuous updating of $\psi_x$, travelers’ $\psi_x$ tends to be the same, and thus we have $\sum_{x=1}^{Q_{ij}} U_{i,j,x}^{\kappa}(\psi_x) = \sum_{x=1}^{Q_{ij}} E_{i,j,x}(\psi_x)$, $\forall \kappa \in \{R, p, b\}$. □

5. Multi-Objective Bi-Level Programming of Bus Lines and Differentiated Ticket Fares

5.1. Constraints

(1)
Constraint of the bus stop setting
For each bus stop schedule plan X r , the number of bus stops should be larger than or equal to 2:
$$
\sum_{i=1}^{N} X_{r,i} \ge 2, \quad \forall r \in R
\tag{20}
$$
(2)
Constraint of the traffic flow
The total daily travel demand between bus stops is fixed:
$$
D_{ij} = \sum_{\kappa \in \{R,p,b\}} Q_{i,j}^{\kappa,t}
\tag{21}
$$
(3)
Reasonable fare range and the constraint of total fare revenue
Public transport has the attribute of social public welfare; thus, the ticket fare and revenue should be controlled within a certain range:
$$
P_{r,\min} \le P_r \le P_{r,\max} \quad (r \in R)
\tag{22}
$$

$$
\sum_{i \in V} \sum_{j \in V} \sum_{r \in R} Q_{i,j}^{r} P_r d_{i,j} \le M
\tag{23}
$$
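A feasibility check for a candidate stop plan and fare vector can be sketched as follows. The nested-list data layout and the function name `is_feasible` are assumptions of this illustration; demand conservation is not checked here because flows obtained by multiplying a fixed demand by choice probabilities that sum to 1 satisfy it by construction.

```python
def is_feasible(stop_plan, fares, fare_min, fare_max, bus_flows, dist, revenue_cap):
    """Check the stop-setting, fare-range, and revenue constraints.

    stop_plan[r][i] is 1 if line r stops at stop i, else 0.
    fares[r] is the per-kilometer fare of line r.
    bus_flows[r][i][j] is the flow of line r between stops i and j.
    dist[i][j] is the distance between stops i and j.
    """
    # Every line must serve at least two stops.
    if any(sum(row) < 2 for row in stop_plan):
        return False
    # Every fare must lie within its admissible range.
    for price, low, high in zip(fares, fare_min, fare_max):
        if not (low <= price <= high):
            return False
    # Total fare revenue must not exceed the cap M.
    n = len(dist)
    revenue = sum(
        bus_flows[r][i][j] * fares[r] * dist[i][j]
        for r in range(len(fares))
        for i in range(n)
        for j in range(n)
    )
    return revenue <= revenue_cap
```

In a population-based search, such a check would typically be applied to each decoded individual before its objective values are compared.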

5.2. Objective Function

The traffic management department encourages people to choose public transport or shared bike through bus lines optimization and fare policy adjustment, which can change travelers’ social equilibrium, so as to avoid pollution and congestion caused by a large number of private cars and solve the social dilemma. For the transportation management department, under the market conditions, on the one hand, it is necessary to maximize the profits generated by the public transport system; on the other hand, it is necessary to maximize the travel utility of travelers, so as to realize social welfare. Since it is difficult to determine the a priori weight of the two objectives and travelers’ bounded rationality in reality, in this paper, the idea of Pareto optimization is introduced to transform the bus line and fare optimization problem into a multi-objective optimization problem.
The objective function of buses’ profit maximization can be formulated as
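$$
\max F_1(X, P_r) = \sum_{i \in V} \sum_{j \in V} \left( \sum_{r \in R} Q_{i,j}^{r} P_r d_{i,j} - \sum_{r \in R} \phi_r c_r \right)
\tag{24}
$$

where $c_r$ represents the average operating cost. The objective function of maximizing travel utility can be formulated as

$$
\max F_2(X, P_r) =
\begin{cases}
\displaystyle -\sum_{i \in V} \sum_{j \in V} \sum_{r \in R} Q_{i,j}^{r} G_{i,j}^{r} - \sum_{i \in V} \sum_{j \in V} Q_{i,j}^{p} G_{i,j}^{p} - \sum_{i \in V} \sum_{j \in V} Q_{i,j}^{b} G_{i,j}^{b}, & d_{i,j} \le d^{*} \\[2ex]
\displaystyle -\sum_{i \in V} \sum_{j \in V} \sum_{r \in R} Q_{i,j}^{r} G_{i,j}^{r} - \sum_{i \in V} \sum_{j \in V} Q_{i,j}^{p} G_{i,j}^{p}, & d_{i,j} > d^{*}
\end{cases}
\tag{25}
$$

Moreover, the large number of individual travelers each seek to maximize their utility through the reinforcement learning process of travel choice and the evolutionary process of the heterogeneous regret aversion level.

$$
\max F_3 = U_{i,j,x}^{\kappa,t}\left(\omega_{i,j,x}^{\kappa,t}, \bar{h}_{i,j,x}^{\kappa,t}\right), \quad \forall x \in \Omega_{agent}
\tag{26}
$$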

5.3. The Multi-Objective Bi-Level Programming Model

It is worth mentioning that Equations (24)–(26) constitute a multi-objective bi-level programming problem in which the problem $\max F(X, P_r) = [F_1(X, P_r), F_2(X, P_r)]^{T}$ forms the upper-level programming and the problem $\max F_3$ represents the lower-level programming. The multi-objective bi-level programming model can be represented as

$$
\begin{cases}
\max F(X, P_r) = [F_1(X, P_r), F_2(X, P_r)]^{T} \\
\max F_3 = U_{i,j,x}^{\kappa,t}\left(\omega_{i,j,x}^{\kappa,t}, \bar{h}_{i,j,x}^{\kappa,t}\right), \quad \forall x \in \Omega_{agent} \\
\text{s.t.} \quad \sum_{i=1}^{N} X_{r,i} \ge 2, \quad \forall r \in R \\
\qquad\ D_{ij} = \sum_{\kappa \in \{R,p,b\}} Q_{i,j}^{\kappa,t} \\
\qquad\ P_{r,\min} \le P_r \le P_{r,\max}, \quad r \in R \\
\qquad\ \sum_{i \in V} \sum_{j \in V} \sum_{r \in R} Q_{i,j}^{r} P_r d_{i,j} \le M
\end{cases}
\tag{27}
$$
Definition 1. 
The Pareto optimal solution of bus stop schedule plan and differentiated ticket fares. For the variables $(X_a, P_{r,a})$ and $(X_b, P_{r,b})$ under constraints, if $F_i(X_a, P_{r,a}) \ge F_i(X_b, P_{r,b})$ $(i = 1, 2)$ and there exists at least one $i$ that satisfies $F_i(X_a, P_{r,a}) > F_i(X_b, P_{r,b})$, then $(X_a, P_{r,a})$ dominates $(X_b, P_{r,b})$, which is denoted by $(X_a, P_{r,a}) \succ (X_b, P_{r,b})$. Moreover, if the vector $(X_c, P_{r,c})$ is not dominated by any other feasible vector, then $(X_c, P_{r,c})$ is a non-dominated solution. The set of objective function values calculated from all non-dominated vectors constitutes the Pareto frontier of the bus line and fare optimization problem.
It can be seen that the introduction of multi-objective bi-level programming can provide traffic management departments with a decision-making space that is not affected by a priori probability and that allows a trade-off between economic income and travelers’ utility.
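Definition 1 can be sketched directly in code for the case where both objectives are maximized. The function names are illustrative, and each solution is represented here only by its tuple of objective values $(F_1, F_2)$.

```python
def dominates(f_a, f_b):
    """True if objective vector f_a dominates f_b under maximization:
    no component is worse and at least one is strictly better."""
    no_worse = all(a >= b for a, b in zip(f_a, f_b))
    strictly_better = any(a > b for a, b in zip(f_a, f_b))
    return no_worse and strictly_better


def pareto_front(objective_vectors):
    """Return the non-dominated objective vectors, in their input order."""
    return [
        p for p in objective_vectors
        if not any(dominates(q, p) for q in objective_vectors)
    ]
```

For example, among the objective vectors `(1, 5)`, `(2, 4)`, `(3, 3)`, `(2, 2)`, and `(1, 1)`, the last two are dominated by `(2, 4)`, and the first three form the frontier.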

6. Solution Algorithm of Multi-Objective Bi-Level Programming

It can be seen from the above model that the multi-objective bi-level programming problem has the characteristics of multivariable and nonlinear; therefore, in this paper, we designed a solution algorithm in which the swarm intelligence multi-objective optimization algorithm is combined with the equalization algorithm of OD matrix. The algorithm steps are as follows:
Step 1: Population initialization. In recent years, swarm intelligence optimization algorithm based on complex network has been proven to be very effective in avoiding local optimum. Therefore, we first establish a network with grid structure for the population and introduce the small world network generation algorithm to depict the connections between individuals in the population, where each individual represents a solution ( X , P r ) .
Step 1.1: Set the region $i \in [0, n_p]$, $j \in [0, n_p]$ as the complex network generation area; both $i$ and $j$ are integers, and each node $(i, j)$ represents an individual in the population.
Step 1.2: Each individual ( i , j ) establishes connection with the surrounding eight neighbors to form a cellular network.
Step 1.3: Here, we introduce the method in literature [24]. Let p c u t be the rewiring probability of the network, p c u t [ 0 , 1 ] . We set a random number p r a n d [ 0 , 1 ] for each node ( i , j ) ; if p r a n d p c u t , cut one of node ( i , j ) ’s links randomly, and then establish a new link between node ( i , j ) and a node that is not in the surrounding eight neighbors of node ( i , j ) . Thus, the new neighborhood is established. In order to make the population space achieve a better balance between complete certainty and complete randomness, we set p c u t = 0.5 .
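Steps 1.1 to 1.3 can be sketched as follows. This is only an illustration under stated assumptions: the grid has `n` nodes per side with non-periodic borders, a rewired link is never duplicated, and the function names are hypothetical; the exact procedure of [24] may differ.

```python
import random
from typing import Dict, Set, Tuple

Node = Tuple[int, int]


def moore_neighbours(node: Node, n: int) -> Set[Node]:
    """The (up to) eight surrounding grid cells of a node on an n x n grid."""
    i, j = node
    return {(i + di, j + dj)
            for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di, dj) != (0, 0) and 0 <= i + di < n and 0 <= j + dj < n}


def build_network(n: int, p_cut: float, rng: random.Random) -> Dict[Node, Set[Node]]:
    """Cellular network (Step 1.2) followed by random rewiring (Step 1.3)."""
    nodes = [(i, j) for i in range(n) for j in range(n)]
    links = {v: moore_neighbours(v, n) for v in nodes}
    for v in nodes:
        if rng.random() <= p_cut and links[v]:
            lattice = moore_neighbours(v, n)
            # New partner: not v itself, not one of the eight neighbours, not already linked.
            candidates = [u for u in nodes
                          if u != v and u not in lattice and u not in links[v]]
            if not candidates:
                continue
            old = rng.choice(sorted(links[v]))   # cut one existing link at random
            links[v].discard(old)
            links[old].discard(v)
            new = rng.choice(candidates)         # establish the long-range link
            links[v].add(new)
            links[new].add(v)
    return links


network = build_network(n=5, p_cut=0.5, rng=random.Random(0))
```

Each rewiring removes one link and adds one, so the total number of links of the initial cellular network is preserved.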
Step 2: The real encoding technique is employed, and the solution corresponding to individual $(i, j)$ is $(X, P_r)_{i,j}$, in which each $X_{r,i}$ in $X$ is encoded with a random number between 0 and 1. A random number larger than 0.5 means $X_{r,i} = 1$; otherwise, $X_{r,i} = 0$.
Due to the existence of equilibrium conditions in the group Bush–Mosteller model, given the bus stop schedule plan (represented as the 0–1 matrix $X$) and the differentiated ticket fares $P_r$ ($r \in R$), through the continuous iteration of $T_{i,j}^{\kappa,t}$, $C_{i,j}^{\kappa,t}$, and $M_{i,j}^{t}$, the equilibrium traffic flow OD matrix of various travel modes between bus stops can be obtained. The OD matrix equalization algorithm is designed as follows:
Step 2.1: Within a certain distance $d_{i,j} \le d$, there is a competitive relationship between bus and shared bike; for each OD pair in $D$, given the bus ticket fare $P_r d_{i,j}$ ($r \in R$), the shared bike management department will set the optimal equilibrium bike sharing fee $\tilde{P}_b$ on the basis of generalized Nash equilibrium.
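$$
\begin{cases}
\tilde{P}_b^{k} = \arg\max \left[ \hat{Q}_b^{k}(\bar{P}_r^{k-1})\, \tilde{P}_b^{k} \right] \\
\bar{P}_r^{k} = \arg\max \left[ \hat{Q}_r^{k}(\tilde{P}_b^{k-1})\, \bar{P}_r^{k} \right] \\
\hat{Q}_b^{k}(\bar{P}_r^{k-1}) = Q_b^{k-1} + \dfrac{\partial Q_b}{\partial \bar{P}_r} (\bar{P}_r^{k} - \bar{P}_r^{k-1}) + \dfrac{\partial Q_b}{\partial \tilde{P}_b} (\tilde{P}_b^{k} - \tilde{P}_b^{k-1}) \\
Q_b^{k-1} = \arg\min \displaystyle\int_0^{Q_b^{k-1}} G(\tilde{P}_b^{k-1})\, dx \\
\text{s.t.}\ \ \tilde{P}_b^{\min} \le \tilde{P}_b^{k} \le \tilde{P}_b^{\max},\ r \in R
\end{cases}
\tag{28}
$$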
Equation (28) is a generalized Nash equilibrium problem in which $\min \int_0^{Q_b^{k-1}} G(\tilde{P}_b^{k-1})\, dx$ represents the estimation of traffic flow by shared bike operators, and it can be calculated on the basis of the logit model, where $\partial Q_b / \partial \bar{P}_r$ and $\partial Q_b / \partial \tilde{P}_b$ represent the derivatives of the shared bike traffic flow with respect to the bus fare and the bike sharing fee. Equation (28) can be solved by the method of classical sensitivity analysis, but we will not go into much detail here.
Step 2.2: For the given bus stop schedule plan $X$, differentiated ticket fares $P_r$, parking fee $\tilde{P}_p$, and bike sharing fee $\tilde{P}_b$, set the initial values of $C_{i,j}^{r,0}$ and $M_{i,j}^{0}$ to 0; set the iteration time $t = 0$; and calculate the OD flow matrix $Q^{\kappa,t}$ on the basis of Equations (10)–(14).
Step 2.3: Substitute $Q^{\kappa,t}$ into Equation (2) and calculate the traffic flows $f_{i,i+1}^{r+,t+1}$ and $f_{i,i+1}^{r-,t+1}$ of the bus lines; then, the value of $C_{i,j}^{r,t+1}$ is obtained. Moreover, the number of travelers that choose other travel modes ($\sum_{y \ne x} \mu_y$) is obtained; substitute $\sum_{y \ne x} \mu_y$ into Equation (4), and the value of $M_{i,j}^{t+1}$ is calculated.
Step 2.4: The generalized travel cost matrix between bus stops,
$$
G = \begin{pmatrix} G_{1,1} & \cdots & G_{1,N} \\ \vdots & G_{i,j} & \vdots \\ G_{N,1} & \cdots & G_{N,N} \end{pmatrix},
$$
is calculated according to Equation (5).
Step 2.5: Update the OD matrix of traffic flow according to Equations (10)–(14) on the basis of the new $G$, and $Q^{\kappa,t+1}$ is obtained.
Step 2.6: For all the travel modes $\kappa$, let $\max_{i,j \in V,\, i \ne j} \left\{ \dfrac{|Q_{i,j}^{\kappa,t+1} - Q_{i,j}^{\kappa,t}|}{Q_{i,j}^{\kappa,t}} \right\} \le \varsigma$ represent the termination condition (the relative flow difference between consecutive evolution steps). If this condition is satisfied, the algorithm stops the iteration; otherwise, return to Step 2.3. Thus, $Q^{\kappa,t+1}$ is the traffic flow matrix corresponding to the given bus stop schedule plan (the 0–1 matrix $X$) and the differentiated ticket fares $P_r$ ($r \in R$). Then, calculate the objective function $F(X, P_r) = [F_1(X, P_r), F_2(X, P_r)]^T$ based on $Q^{\kappa,t+1}$.
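The loop structure of Steps 2.3 to 2.6 can be sketched as a fixed-point iteration with a relative-change stopping rule. The sketch below is not the authors' model: `toy_update` is a hypothetical stand-in for Equations (2), (4), (5), and (10)–(14), in which the bus cost grows with the bus flow and the demand is split by a simple logit rule; all numbers are illustrative.

```python
import math
from typing import Callable, Dict, Tuple

Flows = Dict[str, float]


def equalize(q0: Flows, update: Callable[[Flows], Flows],
             tol: float = 1e-4, max_iter: int = 1000) -> Tuple[Flows, int]:
    """Iterate the flow update until the largest relative change is at most tol."""
    q = dict(q0)
    for t in range(1, max_iter + 1):
        q_next = update(q)
        # Termination measure of Step 2.6 (flows are assumed to stay positive).
        change = max(abs(q_next[k] - q[k]) / q[k] for k in q)
        q = q_next
        if change <= tol:
            return q, t
    raise RuntimeError("flows did not stabilize within max_iter iterations")


def toy_update(q: Flows) -> Flows:
    """Hypothetical update: congestion-dependent bus cost, logit split of demand."""
    demand = 100.0
    cost_bus = 1.0 + 0.01 * q["bus"]   # bus cost rises with bus flow
    cost_other = 2.0                   # fixed cost of the alternative mode
    w_bus, w_other = math.exp(-cost_bus), math.exp(-cost_other)
    share = w_bus / (w_bus + w_other)
    return {"bus": demand * share, "other": demand * (1.0 - share)}


flows, steps = equalize({"bus": 50.0, "other": 50.0}, toy_update)
```

Because the toy update is a contraction, the iteration stabilizes after a few steps; the convergence of the actual model is the subject of Theorem 1 and Section 7.2.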
Step 3: Establish the set of local non-dominated solutions for each individual in the population. Let $ND_{i,j}$ denote the non-dominated solution set of individual $(i, j)$. Add the objective function values (solutions) of individual $(i, j)$'s neighbors, $F(X, P_r)_{n_i,n_j}$, into $ND_{i,j}$: if a solution in $ND_{i,j}$ is dominated by the newly added solution ($F(X, P_r)_{n_i,n_j} \succ ND_{i,j}(k)$), then delete the dominated solution from $ND_{i,j}$; if the newly added solution is not dominated by any solution in $ND_{i,j}$, add the new solution into $ND_{i,j}$.
Step 4: Establish the set of global non-dominated solutions for all individuals. Let $ND_g$ denote the global non-dominated solution set, and add the objective function values (solutions) of every node of the population into $ND_g$: if a solution in $ND_g$ is dominated by the newly added solution ($F(X, P_r)_{i,j} \succ ND_g(k)$), then delete the dominated solution from $ND_g$; if the newly added solution is not dominated by any solution in $ND_g$, add the new solution into $ND_g$.
In terms of the constraint condition, we introduce the method of "constraint violation value" to describe the degree to which a solution violates the constraint; the constraint violation value is formulated as
$$
CV[(X, P_r)_{i,j}] = \left\langle \sum_{i \in V} \sum_{j \in V} \sum_{r \in R} Q_{i,j}^{r} P_r d_{i,j} - M \right\rangle
$$
where $\langle x \rangle$ means that if $x \le 0$, then $\langle x \rangle = 0$; otherwise, $\langle x \rangle = |x|$. It can be seen that the smaller the value of $CV$, the better the solution is. Therefore, the dominance between two solutions $F(X, P_r)_1$ and $F(X, P_r)_2$ can be redefined as follows:
$F(X, P_r)_1$ dominates $F(X, P_r)_2$ if one of the following conditions is satisfied: (1) $F(X, P_r)_1$ is a feasible solution, but $F(X, P_r)_2$ is a non-feasible solution; (2) both $F(X, P_r)_1$ and $F(X, P_r)_2$ are non-feasible solutions, and $CV[(X, P_r)_1] < CV[(X, P_r)_2]$; (3) both $F(X, P_r)_1$ and $F(X, P_r)_2$ are feasible solutions, and $F(X, P_r)_1$ Pareto dominates $F(X, P_r)_2$.
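The three conditions of the redefined dominance can be written as a short Python sketch. It assumes that a solution is feasible exactly when its constraint violation value is zero and that the objectives are maximized; the function names are hypothetical.

```python
from typing import Sequence


def violation(x: float) -> float:
    """Bracket operator: 0 when x <= 0, otherwise |x|."""
    return 0.0 if x <= 0 else abs(x)


def pareto_dominates(f1: Sequence[float], f2: Sequence[float]) -> bool:
    """Plain Pareto dominance for maximized objectives."""
    return (all(a >= b for a, b in zip(f1, f2))
            and any(a > b for a, b in zip(f1, f2)))


def constrained_dominates(f1: Sequence[float], cv1: float,
                          f2: Sequence[float], cv2: float) -> bool:
    """Redefined dominance using the constraint violation values cv1 and cv2."""
    feasible1, feasible2 = cv1 == 0, cv2 == 0
    if feasible1 and not feasible2:        # condition (1)
        return True
    if not feasible1 and not feasible2:    # condition (2)
        return cv1 < cv2
    if feasible1 and feasible2:            # condition (3)
        return pareto_dominates(f1, f2)
    return False                           # infeasible never dominates feasible
```

For example, `constrained_dominates((1, 1), 0.0, (9, 9), 4.0)` is true: a feasible solution dominates a non-feasible one regardless of the objective values.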
Step 5: Calculate the crowding distance in $ND_{i,j}$ and $ND_g$. Taking $ND_g$ as an example, the solutions in $ND_g$ are arranged in descending order from 1 to $N_k$ according to the objective function value $F_s(X, P_r)_k$ ($s = 1, 2$, $k \in [1, N_k]$), where $N_k$ represents the number of solutions. The crowding distance of the $k$th solution for objective function $s$ is formulated as
$$
\begin{cases}
dis_{s,k} = F_s(X, P_r)_{k+1} - F_s(X, P_r)_{k-1} \\
dis_{s,k} = \infty, \quad k = 1 \ \text{or} \ k = N_k
\end{cases}
$$
Then, the crowding distance of the $k$th solution is $dis_k = \sum_{s=1}^{2} dis_{s,k}$.
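A minimal sketch of the crowding distance of Step 5 is given below. It sorts the solutions separately for each objective and, as an assumption made here, takes the absolute value of the neighbor difference so that the distance is non-negative under descending order; the function name and the example front are hypothetical.

```python
import math
from typing import List, Sequence


def crowding_distances(front: List[Sequence[float]]) -> List[float]:
    """Sum over objectives of the gap between each solution's two sorted neighbours."""
    n = len(front)
    dist = [0.0] * n
    for s in range(len(front[0])):
        # Indices of the solutions in descending order of objective s.
        order = sorted(range(n), key=lambda k: front[k][s], reverse=True)
        dist[order[0]] = dist[order[-1]] = math.inf   # boundary solutions
        for pos in range(1, n - 1):
            gap = abs(front[order[pos + 1]][s] - front[order[pos - 1]][s])
            dist[order[pos]] += gap
    return dist


example_front = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
example_dist = crowding_distances(example_front)
```

For the middle solution (2.0, 3.0), the gap is 3.0 in the first objective and 4.0 in the second, so its crowding distance is 7.0, while the two boundary solutions receive an infinite distance.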
Step 6: Selection. We introduce the "roulette" method to select the optimal solution in $ND_{i,j}$ and $ND_g$; thus, the probability that the individual corresponding to the $k$th solution is selected is $dis_k / \sum_k dis_k$. The individual selected in $ND_{i,j}$ is marked as $(i', j')$, and the individual selected in $ND_g$ is marked as $(i'', j'')$.
Step 7: Crossover. Set $Y = (X, P_r)$. According to the literature [24], let $\mu_{i,j} = (Y_{i',j'} + Y_{i'',j''}) / 2$ and $\sigma_{i,j} = |Y_{i',j'} - Y_{i'',j''}|$; then, the crossover between $Y_{i',j'}$ and $Y_{i'',j''}$ is formulated as
$$
Y_{i,j} = N(\mu_{i,j}, \sigma_{i,j}^2)
$$
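The crossover of Step 7 can be sketched as an element-wise Gaussian draw. This is an illustration only: it assumes the two selected parents are given as flat lists of real-coded values, and the function name is hypothetical; the handling of out-of-range values in [24] is not reproduced here.

```python
import random
from typing import List, Sequence


def gaussian_crossover(y_local: Sequence[float], y_global: Sequence[float],
                       rng: random.Random) -> List[float]:
    """Draw each gene from a normal distribution centred between the two parents."""
    child = []
    for a, b in zip(y_local, y_global):
        mu = (a + b) / 2.0       # midpoint of the two parents
        sigma = abs(a - b)       # spread grows with parent disagreement
        child.append(rng.gauss(mu, sigma))
    return child


rng = random.Random(1)
child = gaussian_crossover([0.2, 0.7, 0.31], [0.6, 0.7, 0.12], rng)
```

When the two parents agree on a gene, the spread is zero and the child inherits that value unchanged; the more the parents differ, the wider the search around their midpoint.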
Step 8: Mutation. Let $p_m$ be the probability of chaotic mutation; here, we use the "tent map" to iterate the chaotic sequence because of its good ergodicity. The tent map is formulated as
$$
\rho_{t+1} = \begin{cases} 2\rho_t, & \rho_t \in [0, 0.5] \\ 2 - 2\rho_t, & \rho_t \in (0.5, 1] \end{cases}
$$
and the individual is updated with probability $p_m$ (the value range of $Y_{i,j}$ is $[Y_{i,j,\min}, Y_{i,j,\max}]$):
$$
Y_{i,j} = Y_{i,j,\min} + \rho_t (Y_{i,j,\max} - Y_{i,j,\min})
$$
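The chaotic mutation of Step 8 can be sketched as follows. The sketch assumes that the chaotic sequence advances once per gene and that each gene mutates independently with probability `p_m`; these details and the function names are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def tent(rho: float) -> float:
    """One iteration of the tent map on [0, 1]."""
    return 2.0 * rho if rho <= 0.5 else 2.0 - 2.0 * rho


def chaotic_mutation(y: Sequence[float], lower: Sequence[float],
                     upper: Sequence[float], rho: float, p_m: float,
                     rng: random.Random) -> Tuple[List[float], float]:
    """Replace genes, with probability p_m, by a tent-map point inside their range."""
    out = list(y)
    for idx in range(len(out)):
        rho = tent(rho)                  # advance the chaotic sequence
        if rng.random() < p_m:
            out[idx] = lower[idx] + rho * (upper[idx] - lower[idx])
    return out, rho


mutated, rho_next = chaotic_mutation(
    [0.3, 0.1], lower=[0.0, 0.0], upper=[1.0, 0.5],
    rho=0.37, p_m=0.01, rng=random.Random(2))
```

One practical caveat: in binary floating-point arithmetic, the slope-2 tent map tends to collapse to 0 after a few dozen iterations, so an implementation may need to reseed $\rho_t$ from time to time.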
Step 9: Determine whether the algorithm meets the termination condition (the Pareto front can no longer be improved). If so, the algorithm stops; otherwise, return to Step 2.
The flow chart of the solution algorithm is illustrated in Figure 2.

7. Case Study: Optimization of Bus Line and Fares of Fourth Ring Road in Beijing

7.1. Case and Parameter Setting

At present, there are two bus circle lines on the Fourth Ring Road in Beijing (bus no. 400 and bus no. 400 fast; bus no. 400 fast does not stop at every bus stop), and the abovementioned model is introduced to design feasible differentiated bus lines. In this paper, 12 bus stops of bus no. 400 with large traveler flow on the Fourth Ring Road in Beijing were selected as the object bus stops (see Figure 3; bus stops with very few travelers were not considered). The OD matrix was obtained from the average passenger flow data between bus stops recorded within one month (bus card data). Unless otherwise stated, the parameters of the model and algorithm are those given in Table 2 and Table 3.
Table 3 lists the parameter settings of the model. The values of $\lambda_m$ and $c_r$ are determined on the basis of the literature [10]; the fare range of bus per kilometer and the parking fee are determined on the basis of the mean value of real prices in Beijing, with the value range appropriately expanded; and the average departure frequency of a bus line is determined by recording the departure frequency at an important bus stop. It is difficult for travelers to remember every day of their past travel experience, and thus we set the value of $L$ to 10 and the value of $N_{agent}$ to 100 to ensure that the evolution results of the BM model can converge within a limited number of iterations. By summarizing the relevant literature [25], we found that when the average travel distance is less than 5 km, there will be a demand for shared bikes; thus, we set $d = 5$.

7.2. Convergence of OD Matrix Equalization Algorithm

Step 2 of the solution algorithm in Section 6 is the OD matrix equalization algorithm; here, we first verified Theorem 1 by numerical simulation under different bus line plans.
In Figure 4a, the vertical axis represents the mean value of $|Q_{i,j}^{\kappa,t+1} - Q_{i,j}^{\kappa,t}| / Q_{i,j}^{\kappa,t}$ over all the OD pairs (Table 2); after about 20 iterations, the value tends to 0. Figure 4b–d shows the traffic flow evolution among some typical OD pairs with 2, 3, and 4 bus lines, respectively. It can be seen that the OD matrix equalization algorithm based on multi-agent reinforcement learning and social interaction can make the traffic flow between bus stops converge to a stable state. Theorem 1 is thus numerically verified for the OD matrix (multiple OD pairs).

7.3. Optimization Results of the Differentiated Bus Lines and Fares

The optimization results of the multi-objective bi-level programming model in this paper can be illustrated by the Pareto front obtained through the above solution algorithm. Moreover, in order to compare the effects of different travel choice models (complete rationality and bounded rationality) on the optimization results, we also introduced the traditional logit model based on regret theory (lower level of (27), $\beta = l$, $\psi = 0.5$) to simulate the travel mode choice behavior:
$$
\begin{cases}
\bar{h}_{i,j}^{\kappa,t} = G_{i,j}^{\kappa,t} - \delta \left( \min_{\kappa \in \{R, p, b\}} \{ G_{i,j}^{\kappa,t} \} - G_{i,j}^{\kappa,t} \right) \\
Q_{i,j}^{\kappa} = D_{i,j} \dfrac{\exp(-\beta \bar{h}_{i,j}^{\kappa,t})}{\sum_{\kappa \in \{R, p, b\}} \exp(-\beta \bar{h}_{i,j}^{\kappa,t})}
\end{cases}
$$
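A regret-adjusted logit split of this kind can be sketched in a few lines of Python. The sketch assumes the sign convention in which the regret term adds $\delta$ times the gap to the cheapest mode to the generalized cost and in which lower cost means higher choice probability; the mode names and cost values are hypothetical.

```python
import math
from typing import Dict


def regret_logit(costs: Dict[str, float], demand: float,
                 beta: float, delta: float) -> Dict[str, float]:
    """Split the OD demand over modes using regret-adjusted generalized costs."""
    g_min = min(costs.values())
    # Regret-adjusted cost: the further a mode is from the cheapest, the larger the penalty.
    h = {mode: g - delta * (g_min - g) for mode, g in costs.items()}
    weights = {mode: math.exp(-beta * value) for mode, value in h.items()}
    total = sum(weights.values())
    return {mode: demand * w / total for mode, w in weights.items()}


# Hypothetical generalized costs of bus, private car, and shared bike for one OD pair.
od_costs = {"bus": 3.0, "car": 4.0, "bike": 5.0}
with_regret = regret_logit(od_costs, demand=100.0, beta=1.0, delta=0.5)
without_regret = regret_logit(od_costs, demand=100.0, beta=1.0, delta=0.0)
```

With these illustrative numbers, the regret term shifts additional demand toward the cheapest mode compared with the plain logit split (`delta=0.0`).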
Figure 5 illustrates the Pareto optimal solutions obtained by the multi-objective bi-level programming based on the two travel choice models (the BM reinforcement learning model with interaction and the logit model), and Table 4 and Table 5 show the representative Pareto optimal solutions based on these two models. It can be seen from Figure 5 that, compared with the traditional logit model with complete rationality, the multi-objective bi-level programming based on the BM reinforcement learning model obtained higher travelers' utility but lower profit, which indicates that travelers' learning behavior under the assumption of regret theory improves the effectiveness of group decision making and thereby reduces the profit of buses. It can also be seen from Table 4 and Table 5 that different travel choice models significantly change the optimization results of bus lines and the differentiated fares. Moreover, except for the bus lines that stop at every bus stop, it can be seen from the profit-oriented optimal solution (solution B3) that the lower the number of bus stops, the higher the ticket fare.
It can be seen from Table 6 that the continuous evolution of travelers' regret aversion level in the BM reinforcement learning model effectively reduces the generalized travel cost (lower than in the logit model); therefore, the profit of the BM model is also lower than that of the logit model. Moreover, on the basis of the optimized bus lines, compared with the ticket fare in the real case (bus no. 400: RMB 2 within 10 km, RMB 1 for every additional 5 km), the differentiated fares effectively reduce the maximum section flow of bus lines, which means that the balance of passenger flow distribution in the bus network has been improved.

7.4. Effect of Important Parameters on Optimization Results

(1) Maximum number of bus lines
We changed the maximum number of bus lines $N_{\max}^{line}$ and investigated the corresponding changes in the optimal solution.
It can be seen from Figure 6 that, as $N_{\max}^{line}$ decreases, the traveler utility decreases and the operating profit of buses increases. Furthermore, in reality, there are two bus circle lines on the Fourth Ring Road in Beijing (bus no. 400 and bus no. 400 fast); under the condition of $N_{\max}^{line} = 2$, a comparison is made between the corresponding Pareto optimal solutions and the bus lines and ticket fares in the real case (Table 7). Table 7 shows that, compared with the bus lines and fares in the real case (bus no. 400 and bus no. 400 fast), the multi-objective bi-level programming proposed in this paper can generate solutions that reduce the maximum section flow, increase the profit, and reduce the generalized travel cost, thus reducing congestion.
(2) Traveler's learning behavior
On the basis of the optimal bus lines and differentiated fares obtained from the multi-objective bi-level programming, we analyzed the impact of travelers' behavior on the objective function through numerical simulation.
Figure 7 illustrates the effect of travelers' learning behavior (travelers' learning intensity $l$ and travelers' interaction intensity of risk aversion $p_c$) on the generalized travel cost. It can be seen that the generalized travel cost corresponding to the representative Pareto optimal solutions decreases as $l$ and $p_c$ increase. This result indicates that travelers' reinforcement learning and information exchange on the risk aversion level in the multi-objective bi-level programming are effective; therefore, increasing the dissemination of travel cost information and risk attitude among travelers can effectively reduce travel costs.

7.5. Equilibrium Analysis of Ring Road Bus Line Planning

In reality, the management department of buses is not always able to accurately perceive the complexity of travelers' decisions; therefore, it often predicts group behavior on the basis of general equilibrium theory. Moreover, when taking social interaction into consideration, we find that the equilibrium condition of travelers from the perspective of the bus management department can be formulated as [10]
$$
E(\xi_x) = \tanh \left\{ \beta \left[ -\bar{G}_{i,j}^{\kappa} + \lambda_m \frac{\sum_{y \ne x} E(\xi_y)}{\bar{D}_{i,j} - 1} \right] \right\}
$$
Furthermore, the social equilibrium equation of all travelers among the bus lines can be formulated as
$$
\mu = \tanh \left[ \beta \left( \lambda_m \mu - \bar{G}_{i,j}^{\kappa} \right) \right]
$$
Here, $\mu$ represents the average choice proportion of a travel mode when travelers' behavior is in equilibrium; therefore, the adjustment of bus lines and travelers' group behavior can change the social equilibrium. On the basis of the representative Pareto optimal solutions, we analyzed the proportion of travel mode choice according to Equation (34) and investigated the impact of optimal bus lines and differentiated fares on the equilibrium.
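The number of equilibria of a self-consistency equation of the form $\mu = \tanh[\beta(\lambda_m \mu - \bar{G})]$ can be explored numerically. The sketch below scans the interval $[-1, 1]$ for sign changes and refines each one by bisection; it treats $\bar{G}$ as a single scalar, and the parameter values are illustrative rather than taken from the case study.

```python
import math
from typing import List


def equilibria(beta: float, lam: float, g_bar: float,
               grid: int = 2000, tol: float = 1e-10) -> List[float]:
    """Roots of tanh(beta * (lam * mu - g_bar)) - mu on [-1, 1]."""
    def f(mu: float) -> float:
        return math.tanh(beta * (lam * mu - g_bar)) - mu

    xs = [-1.0 + 2.0 * k / grid for k in range(grid + 1)]
    roots = []
    for a, b in zip(xs, xs[1:]):
        fa, fb = f(a), f(b)
        if fa == 0.0:                 # a grid point that is exactly a root
            roots.append(a)
        elif fa * fb < 0:             # sign change: refine by bisection
            lo, hi = a, b
            while hi - lo > tol:
                mid = (lo + hi) / 2.0
                if f(lo) * f(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append((lo + hi) / 2.0)
    return roots


weak = equilibria(beta=1.0, lam=0.5, g_bar=0.0)     # weak social interaction
strong = equilibria(beta=1.0, lam=2.3, g_bar=0.3)   # strong social interaction
```

With weak interaction the equation has a single solution, whereas a sufficiently strong interaction level can produce several, which is the mechanism behind the multiple equilibrium points discussed for Figure 8.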
Figure 8a shows the relationship between the subjective expectation curves (based on Equation (34)) of different bus line schemes (the Pareto optimal solutions and the bus lines and ticket fares in the real case, $N_{\max}^{line} = 2$) and travelers' group selection equilibrium under the multi-agent BM reinforcement learning model. It can be seen that, when $\lambda_m = 1.5$, neither the Pareto optimal solutions nor the bus lines and ticket fares in the real case can make travelers' subjective expectation and actual decision reach equilibrium; however, compared with the bus lines and ticket fares in the real case, the model proposed in this paper produces Pareto optimal solutions that bring the subjective expectation curve closer to equilibrium (B1 and B2). Moreover, it can be seen from Figure 8b that, when $\lambda_m = 2.3$, the Pareto optimal solution B2 reaches a unique equilibrium; in this equilibrium state, the proportion of travelers who choose bus is greater than 0.5. The Pareto optimal solution B1 produces two equilibrium points, namely, an advantage equilibrium point (the proportion of travelers who choose bus is greater than 0.5) and a disadvantage equilibrium point (the proportion of travelers who choose bus is less than 0.5). Therefore, Figure 8a,b shows that the complexity of travelers' group behavior significantly shifts the social equilibrium equation, and an increase in social interaction intensity moves the subjective expectation curve to the upper left. For the management department of buses, different bus line and fare plans (from B1 to B3) also significantly shift the social equilibrium equation, and thus the management department of buses can appropriately increase the dissemination of accurate travel cost information among travelers to promote the formation of equilibrium.

8. Conclusions

In this paper, a multi-objective bi-level programming model of bus lines and differentiated ticket fares for the urban ring road was proposed. The operating profit and travelers' utility are taken as objective functions. In the proposed model, travelers' reinforcement learning behavior and social interaction for higher utility, based on regret theory, are introduced. Through the numerical analysis based on real bus lines (the bus circle lines on the Fourth Ring Road in Beijing), we drew the following conclusions: (1) Travel choice models with different degrees of rationality significantly change the optimization results of bus lines and the differentiated fares. (2) Compared with the ticket fare in the real case, the differentiated fares effectively reduce the maximum section flow of bus lines. (3) Compared with the bus lines and fares in the real case, the multi-objective bi-level programming in this paper can generate solutions that reduce the maximum section flow, increase the profit, and reduce the generalized travel cost. (4) In order to encourage travelers to choose buses, the management department of buses can appropriately increase the dissemination of accurate travel cost information among travelers to promote the formation of the advantage equilibrium and to reduce travelers' travel costs. Moreover, travelers should also increase the intensity of learning and the social interaction of risk aversion level to reduce their generalized travel costs.
In addition, this paper shows that, compared with the logit model with complete information and complete rationality, under the condition of multi-objective optimization, the evolutionary learning behavior of travelers can reduce the operating profit of the transportation system. Therefore, whether under the condition of complete rationality or under the condition of complex cluster behavior, the more accurately travelers know the utility information, the higher the travel utility and the lower the profit of the transportation system.

Author Contributions

Conceptualization, X.L. and X.Z.; methodology, X.L.; validation, X.L. and B.L.; formal analysis, X.L.; investigation, B.L.; writing—original draft preparation, X.L.; writing—review and editing, X.L. and X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This study is funded by the Youth Project of Humanities and Social Sciences Financed by Ministry of Education in China (grant number: 20YJC630069) and the Youth Project of National Natural Science Foundation of China (grant number: 72103019).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Liang, J.; Wu, J.; Gao, Z.; Sun, H.; Yang, X.; Lo, H.K. Bus transit network design with uncertainties on the basis of a metro network: A two-step model framework. Transp. Res. Part B Methodol. 2019, 126, 115–138.
2. Gz, A.; Fw, B.; Jia, N.C.; Ma, S.; Wu, Y. Information adoption in commuters’ route choice in the context of social interactions. Transp. Res. Part A Policy Pract. 2019, 130, 300–316.
3. Calastri, C.; Hess, S.; Daly, A.; Maness, M.; Kowald, M.; Axhausen, K. Modelling contact mode and frequency of interactions with social network members using the multiple discrete–continuous extreme value model. Transp. Res. Part C Emerg. Technol. 2017, 76, 16–34.
4. Hm, A.; Xiang, L.A.; Hy, B. Single bus line timetable optimization with big data: A case study in Beijing. Inf. Sci. 2020, 536, 53–66.
5. Yu, H.; Lv, W.; Liu, H.; Fu, X.; Xiao, R. A dynamic line generation and vehicle scheduling method for airport bus line based on multi-source big travel data. Soft Comput. 2020, 24, 6329–6344.
6. Liang, M.; Zhang, H.M.; Ma, R.; Wang, W.; Dong, C. Cooperatively coevolutionary optimization design of limited-stop services and operating frequencies for transit networks. Transp. Res. Part C Emerg. Technol. 2021, 125, 103038.
7. Huang, D.; Gu, Y.; Wang, S.; Liu, Z.; Zhang, W. A two-phase optimization model for the demand-responsive customized bus network design. Transp. Res. Part C Emerg. Technol. 2020, 111, 1–21.
8. Zhang, W.; Xia, D.; Liu, T.; Fu, Y.; Ma, J. Optimization of single-line bus timetables considering time-dependent travel times: A case study of Beijing, China. Comput. Ind. Eng. 2021, 158, 107444.
9. Zhao, P.; Zhang, Y. The effects of metro fare increase on transport equity: New evidence from Beijing. Transp. Policy 2019, 74, 73–83.
10. Li, X.Y.; Zhu, X.; Li, J. Multi-objective optimization of urban public transportation network differentiated fare. J. Transp. Syst. Eng. Inf. Technol. 2020, 20, 148–155, 176.
11. Yang, H.; Tang, Y. Managing rail transit peak-hour congestion with a fare-reward scheme. Transp. Res. Part B Methodol. 2018, 110, 122–136.
12. Tang, Y.; Yang, H.; Wang, B.; Huang, J.; Bai, Y. A Pareto-improving and revenue-neutral scheme to manage mass transit congestion with heterogeneous commuters. Transp. Res. Part C Emerg. Technol. 2020, 113, 245–259.
13. Li, Z.C.; Zhang, L. The two-mode problem with bottleneck queuing and transit crowding: How should congestion be priced using tolls and fares? Transp. Res. Part B Methodol. 2020, 138, 46–76.
14. Marek, E.M. Social learning under the labeling effect: Exploring travelers’ behavior in social dilemmas. Transp. Res. Part F Psychol. Behav. 2018, 58, 511–527.
15. Wang, Y.; Wang, Y.; Choudhury, C. Modelling heterogeneity in behavioral response to peak-avoidance policy utilizing naturalistic data of Beijing subway travelers. Transp. Res. Part F Traffic Psychol. Behav. 2020, 73, 92–106.
16. Shamshiripour, A.; Rahimi, E.; Shabanpour, R.; Mohammadian, A.K. Dynamics of travelers’ modality style in the presence of mobility-on-demand services. Transp. Res. Part C Emerg. Technol. 2020, 117, 102668.
17. Zhu, Z.; Li, X.W.; Liu, W.; Yang, H. Day-to-day evolution of departure time choice in stochastic capacity bottleneck models with bounded rationality and various information perceptions. Transp. Res. Part E Logist. Transp. Rev. 2019, 131, 168–192.
18. Ye, H.; Xiao, F.; Yang, H. Day-to-day dynamics with advanced traveler information. Transp. Res. Part B Methodol. 2021, 144, 23–44.
19. Yang, Y.; Ke, H. Day-to-Day dynamic traffic assignment with imperfect information, bounded rationality and information sharing. Transp. Res. Part C Emerg. Technol. 2020, 114, 59–83.
20. Kroesen, M.; Chorus, C. A new perspective on the role of attitudes in explaining travel behavior: A psychological network model. Transp. Res. Part A Policy Pract. 2020, 133, 82–94.
21. Chorus, C.G.; Arentze, T.A.; Timmermans, H.J.P. A random regret-minimization model of travel choice. Transp. Res. Part B Methodol. 2008, 42, 1–18.
22. Chorus, C.G. A new model of random regret minimization. Eur. J. Transp. Infrastruct. Res. 2010, 10, 181–196.
23. Ramos, G.; Bazzan, A.; Silva, B. Analysing the impact of travel information for minimising the regret of route choice. Transp. Res. Part C Emerg. Technol. 2018, 88, 257–271.
24. Li, X.; Zhang, H. A multi-agent complex network algorithm for multi-objective optimization. Appl. Intell. 2020, 2, 2690–2717.
25. Wang, X.D.; Cheng, Z.H.; Trepanier, M.; Sun, L. Modeling bike-sharing demand using a regression model with spatially varying coefficients. J. Transp. Geogr. 2021, 93, 103059.
Figure 1. Travel modes between bus stops.
Figure 2. Flow chart of the solution algorithm.
Figure 3. Representative bus stops on the Fourth Ring Road in Beijing.
Figure 4. Evolution of traffic flow. (a) Mean flow difference between two evolution steps. (b) Traffic flow of 2 bus lines. (c) Traffic flow of 3 bus lines. (d) Traffic flow of 4 bus lines.
Figure 5. Pareto front of the optimal bus lines and fares.
Figure 6. Pareto optimal solutions with different values of $N_{\max}^{line}$.
Figure 7. Effect of traveler’s learning behavior on generalized travel cost: (a) effect on solution B1, (b) effect on solution B2, (c) effect on solution B3.
Figure 8. Equilibrium analysis of ring road bus line and fare plans. (a) BM reinforcement learning model with $\lambda_m = 1.5$. (b) BM reinforcement learning model with $\lambda_m = 2.3$.
Table 1. The matrix of stop schedule plans.

| | Bus Stop 1 | Bus Stop 2 | Bus Stop i | Bus Stop N |
|---|---|---|---|---|
| Bus line 1 | 1 | 1 | 1 | 1 |
| Bus line 2 | 0 | 1 | 0 | 1 |
| Bus line k | 1 | 1 | 1 | 0 |
| Bus line $N_{\max}^{line}$ | 0 | 0 | 1 | 1 |
Table 2. Average travel demand among bus stops in a real case.
Travel Demand Dij1. Wu Ke Song Qiao Nan2. Si Ji Qing Qiao Nan3. Zhong Guan Cun Yi Jie4. Xue Yuan Qiao Dong5. An Hui Qiao Dong6. Wang Jing Qiao Dong7. Hong Ling Jin Qiao Bei8. Da Jiao Ting Qiao Nan9. Xiao Hong Men Qiao10. Huang Tu Gang11. Yi Hai Hua Yuan12. Bei Da Di
1. Wu Ke Song Qiao Nan026.4519.1618.83205.932.682.682.682.682.682.68
2. Si Ji Qing Qiao Nan10.54036.4422.8927.3415.3312.4411.6811.5410.5411.5411.74
3. Zhong Guan Cun Yi Jie5.5710.2013.3128.5717.4711.077.138.34776.575.576.57
4. Xue Yuan Qiao Dong3.774.774.77052.4423.810.75.865.534.774.774.97
5. An Hui Qiao Dong11.1412.9611.1430.68046.3431.4917.4413.7612.9412.2212.64
6. Wang Jing Qiao Dong222338.12020.8711.74.673.4743.33
7. Hong Ling Jin Qiao Bei3.394.393.393.393.394.3909.1715.258.4510.066.39
8. Da Jiao Ting Qiao Nan4.545.545.545.544.545.876.39012.974.544.545.54
9. Xiao Hong Men Qiao1.91.91.91.91.91.91.91.9011.3123.86.1
10. Huang Tu Gang6.564.213.7353.963.363.363.363.364.8032.3918.71
11. Yi Hai Hua Yuan6.484.542.492.322.292.182.182.182.181.18027.08
12. Bei Da Di45.2326.0322.9622.6422.3422.222.722.222.5625.4921.20
Table 3. Parameter setting.

| Symbol | Meaning | Value | Symbol | Meaning | Value |
|---|---|---|---|---|---|
| $n_p \times n_p$ | population size of the swarm algorithm | 25 | $d$ | critical distance of bicycle riding | 5 (km) |
| $N_{\max}^{line}$ | maximum number of bus lines | 5 | $\phi_r$ | departure frequency of bus line | 5 |
| $\zeta_{psy}, \zeta_T, \zeta_C, \zeta_P, \zeta_M$ | cost coefficients | 0.05 | $l$ | travelers' learning intensity | 0.9 |
| $\lambda_m$ | social interaction level | 1.5 | $L$ | traveler's memory length | 10 |
| $V_r$ | bus capacity | 150 | $p_c$ | traveler's interaction intensity of risk aversion | 0.8 |
| $[P_r^{\min}, P_r^{\max}]$ | fare range of bus per kilometer | [0, 0.5] | $\tilde{P}_p$ | parking fee | 10 |
| $\psi_x^t$ | travelers' regret aversion level | [0, 1] | $N_{agent}$ | the number of agents in reinforcement learning | 100 |
| $p_m$ | mutation probability | 0.01 | $c_r$ | average operating cost | 10 |
Table 4. Representative Pareto optimal solution based on BM reinforcement learning model.

| Solution | Stop Schedule Plan | 1. Wu Ke Song Qiao Nan | 2. Si Ji Qing Qiao Nan | 3. Zhong Guan Cun Yi Jie | 4. Xue Yuan Qiao Dong | 5. An Hui Qiao Dong | 6. Wang Jing Qiao Dong | 7. Hong Ling Jin Qiao Bei | 8. Da Jiao Ting Qiao Nan | 9. Xiao Hong Men Qiao | 10. Huang Tu Gang | 11. Yi Hai Hua Yuan | 12. Bei Da Di | Fare per Kilometer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| B1 | Bus no. 400 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.0041 |
| B1 | Bus line 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0.0091 |
| B1 | Bus line 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0.1132 |
| B1 | Bus line 3 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0.3668 |
| B1 | Bus line 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0.3853 |
| B2 | Bus no. 400 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.4321 |
| B2 | Bus line 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0.2245 |
| B2 | Bus line 2 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0.2838 |
| B2 | Bus line 3 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0.3405 |
| B2 | Bus line 4 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0.2402 |
| B3 | Bus no. 400 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.4864 |
| B3 | Bus line 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0.4704 |
| B3 | Bus line 2 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.0910 |
| B3 | Bus line 3 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0.3836 |
| B3 | Bus line 4 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0.4724 |
Table 5. Representative Pareto optimal solution based on logit model.
Table 5. Representative Pareto optimal solution based on logit model.
Solution L1
Stop Schedule Plans1. Wu Ke Song Qiao Nan2. Si Ji Qing Qiao Nan3. Zhong Guan Cun Yi Jie4. Xue Yuan Qiao Dong5. An Hui Qiao Dong6. Wang Jing Qiao Dong7. Hong Ling Jin Qiao Bei8. Da Jiao Ting Qiao Nan9. Xiao Hong Men Qiao10. Huang Tu Gang11. Yi Hai Hua Yuan12. Bei Da DiFare per Kilometer
Bus no. 4001111111111110.1954
Bus line 11111010110010.2473
Bus line 20110001001000.3761
Bus line 31100010110100.2140
Bus line 41011110111110.0372
Solution L2
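| Stop Schedule Plans | 1. Wu Ke Song Qiao Nan | 2. Si Ji Qing Qiao Nan | 3. Zhong Guan Cun Yi Jie | 4. Xue Yuan Qiao Dong | 5. An Hui Qiao Dong | 6. Wang Jing Qiao Dong | 7. Hong Ling Jin Qiao Bei | 8. Da Jiao Ting Qiao Nan | 9. Xiao Hong Men Qiao | 10. Huang Tu Gang | 11. Yi Hai Hua Yuan | 12. Bei Da Di | Fare per Kilometer |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bus no. 400 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.3338 |
| Bus line 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0.1492 |
| Bus line 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0.0632 |
| Bus line 3 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0.3072 |
| Bus line 4 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0.3573 |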
Solution L3
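| Stop Schedule Plans | 1. Wu Ke Song Qiao Nan | 2. Si Ji Qing Qiao Nan | 3. Zhong Guan Cun Yi Jie | 4. Xue Yuan Qiao Dong | 5. An Hui Qiao Dong | 6. Wang Jing Qiao Dong | 7. Hong Ling Jin Qiao Bei | 8. Da Jiao Ting Qiao Nan | 9. Xiao Hong Men Qiao | 10. Huang Tu Gang | 11. Yi Hai Hua Yuan | 12. Bei Da Di | Fare per Kilometer |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bus no. 400 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.1500 |
| Bus line 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0.4658 |
| Bus line 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0.5621 |
| Bus line 3 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0.4197 |
| Bus line 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0.2452 |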
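Each stop schedule plan above is a row of twelve binary digits, one per stop. As a minimal sketch of how such a row can be read, the snippet below decodes a plan into stop names. It assumes that a 1 means the line serves the stop and a 0 means it skips it, which this section does not state explicitly; the names `STOPS` and `served_stops` are hypothetical helpers introduced only for illustration.

```python
# Stop names in ring order, copied from the table header.
STOPS = [
    "Wu Ke Song Qiao Nan", "Si Ji Qing Qiao Nan", "Zhong Guan Cun Yi Jie",
    "Xue Yuan Qiao Dong", "An Hui Qiao Dong", "Wang Jing Qiao Dong",
    "Hong Ling Jin Qiao Bei", "Da Jiao Ting Qiao Nan", "Xiao Hong Men Qiao",
    "Huang Tu Gang", "Yi Hai Hua Yuan", "Bei Da Di",
]


def served_stops(plan: str) -> list[str]:
    """Return the names of the stops flagged 1 in a 12-character plan.

    Assumption: 1 = the line stops here, 0 = the line skips this stop.
    """
    if len(plan) != len(STOPS) or set(plan) - {"0", "1"}:
        raise ValueError("plan must be a 12-character string of 0s and 1s")
    return [name for name, flag in zip(STOPS, plan) if flag == "1"]


# Bus line 2 of Solution L3 (fare 0.5621 per kilometer).
print(served_stops("001000010000"))
# Bus no. 400 serves every stop in all six solutions.
print(len(served_stops("1" * 12)))
```

Under this reading, bus line 2 of Solution L3 is a two-stop express between Zhong Guan Cun Yi Jie and Da Jiao Ting Qiao Nan, while bus no. 400 is the all-stop baseline.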
Table 6. Calculation results of the model.
| | BM Reinforcement Learning Model | | | Logit Model | | |
| --- | --- | --- | --- | --- | --- | --- |
| Pareto Optimal Solution | B1 | B2 | B3 | L1 | L2 | L3 |
| Generalized travel cost | 2.8443 | 3.9530 | 4.2712 | 3.7683 | 4.5115 | 5.5936 |
| Average operating profit of buses | −142.8101 | −127.2018 | −103.8919 | −161.8960 | −121.2167 | −90.0820 |
| Maximum section flow ($\max\{f_{i,i+1}^{r+} + f_{i,i+1}^{r-}\}$, $r \in R$, $i \in V$) of the optimal bus lines | 110.8276 | 80.0160 | 68.3004 | 69.3548 | 92.4435 | 95.4080 |
| Maximum section flow ($\max\{f_{i,i+1}^{r+} + f_{i,i+1}^{r-}\}$, $r \in R$, $i \in V$) of the optimal bus lines based on real ticket fare | 79.1855 | 84.3061 | 75.3261 | 69.5392 | 92.9450 | 92.6598 |
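The cost and profit rows of Table 6 can be checked for the trade-off that a Pareto set should show. The sketch below is an illustration only, not the paper's solution algorithm: it assumes the two objectives are to minimize generalized travel cost (used here as a stand-in for travelers' utility) and to maximize average operating profit, and it compares solutions only within the same choice model. The names `dominates` and `non_dominated` are hypothetical.

```python
# (generalized travel cost, average operating profit) copied from Table 6.
BM = {
    "B1": (2.8443, -142.8101),
    "B2": (3.9530, -127.2018),
    "B3": (4.2712, -103.8919),
}
LOGIT = {
    "L1": (3.7683, -161.8960),
    "L2": (4.5115, -121.2167),
    "L3": (5.5936, -90.0820),
}


def dominates(a, b):
    """True if a is no worse than b on both objectives and better on one.

    Assumption: cost is minimized and profit is maximized.
    """
    cost_a, profit_a = a
    cost_b, profit_b = b
    no_worse = cost_a <= cost_b and profit_a >= profit_b
    strictly_better = cost_a < cost_b or profit_a > profit_b
    return no_worse and strictly_better


def non_dominated(solutions):
    """Return the names of solutions that no other solution dominates."""
    return sorted(
        name
        for name, objectives in solutions.items()
        if not any(
            dominates(other, objectives)
            for other_name, other in solutions.items()
            if other_name != name
        )
    )


print(non_dominated(BM))     # ['B1', 'B2', 'B3']
print(non_dominated(LOGIT))  # ['L1', 'L2', 'L3']
```

Within each model, a lower generalized travel cost always comes with a lower operating profit, so none of the three representative solutions dominates another.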
Table 7. The comparison between Pareto optimal solution and the real case.
| Pareto Optimal Solutions and the Real Case | B1 | B2 | B3 |
| --- | --- | --- | --- |
| Maximum section flow ($\max\{f_{i,i+1}^{r+} + f_{i,i+1}^{r-}\}$, $r \in R$, $i \in V$) of the optimal bus lines and differentiated fares | 140.1102 | 89.9407 | 81.2725 |
| Maximum section flow ($\max\{f_{i,i+1}^{r+} + f_{i,i+1}^{r-}\}$, $r \in R$, $i \in V$) of bus lines under real case (bus no. 400 and bus no. 400 fast in Beijing) | 101.3021 | 96.2163 | 87.2309 |
| Average operating profit of the optimal bus lines and differentiated fares | −75.3130 | −56.5348 | −37.3504 |
| Average operating profit of bus lines under real case (bus no. 400 and bus no. 400 fast in Beijing) | −63.0251 | | |
| Generalized travel cost of the optimal bus lines and differentiated fares | 0.4730 | 0.6485 | 0.9972 |
| Generalized travel cost of bus lines under real case (bus no. 400 and bus no. 400 fast in Beijing) | 0.7104 | | |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Li, X.; Zhu, X.; Li, B. Multi-Objective Optimization of Differentiated Urban Ring Road Bus Lines and Fares Based on Travelers’ Interactive Reinforcement Learning. Symmetry 2021, 13, 2301. https://doi.org/10.3390/sym13122301
