Article

Collaborative Optimization Method for Multi-Train Energy-Saving Control with Urban Rail Transit Based on DRLDA Algorithm

1
College of Mechanical and Control Engineering, Guilin University of Technology, Guilin 541004, China
2
Guangxi Key Laboratory of Hidden Metallic Ore Deposits Exploration, Guilin University of Technology, Guilin 541004, China
3
Yunnan Institute of Transportation Planning and Design, Kunming 650011, China
4
College of Information Science and Engineering, Guilin University of Technology, Guilin 541004, China
5
Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin 541004, China
6
College of Computer Science and Technology, Beijing Jiaotong University, Beijing 100044, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(4), 2454; https://doi.org/10.3390/app13042454
Submission received: 14 January 2023 / Revised: 6 February 2023 / Accepted: 9 February 2023 / Published: 14 February 2023
(This article belongs to the Section Transportation and Future Mobility)

Abstract

With traffic congestion worsening, people increasingly choose urban rail transit (URT) for travel. Although URT alleviates traffic congestion, the long-term operation of a large number of trains leads to huge energy consumption. To align with the social development goal of “Low carbon”, a collaborative optimization method for multi-train energy-saving control is proposed in this paper. First, under the premise of safe and on-time train operation, the composition of single-train operating conditions is determined from the conversion of operating conditions between stations and the corresponding force changes. A combinatorial optimization model for single-train energy consumption is established with the dual control objectives of reducing passengers’ average waiting time and train traction energy consumption. The energy-saving control strategy of a single train is investigated using the ARMA-Radial Basis Function Neural Network (ARMA-RBFNN) and the Genetic Algorithm (GA). Next, queuing theory is introduced to analyze the variation in passenger waiting time for multiple trains at different arrival intervals. A Deep Reinforcement Learning (DRL) algorithm is designed to obtain the correlation among passenger waiting time, arrival interval and train stopping time. The optimization objective is to minimize the multi-train traction energy consumption and the average passenger waiting time, subject to conditions such as the train operating safety interval, speed limits, multiple operating states and the single-train energy-saving model. Then, a multi-train cooperative energy-saving control model is proposed based on the Dragonfly Algorithm (DA). Finally, a case study of Beijing Metro Line 4 is conducted to illustrate the effectiveness of the proposed method. 
The results demonstrate that the total traction energy consumption and passenger waiting time are reduced by 3.1% and 5 s, respectively, compared with the method of independently optimizing the single-train control strategy. The findings can aid in the development of energy-saving strategies and also provide a basis for energy-saving operation control of multiple trains.

1. Introduction

In recent years, Urban Rail Transit (URT) systems have established independent operation lines. Compared with urban buses, taxis and other forms of public transport, URT systems have developed significantly owing to their advantages in safety, punctuality and reliability. The construction of URT systems has become an important way to alleviate traffic congestion. In China, the operating mileage of URT systems increased from 5032.7 km in 2017 to 9192.62 km in 2021 Q3 [1,2], an average annual growth rate of about 19.16% [2]. With the increase in operating mileage, the energy consumption of URT systems has drawn more attention. In 2020, the annual energy consumption of URT systems was about 17.24 billion kWh [3], of which traction energy consumption accounted for 8.4 billion kWh, or 48.7% [3]. From the perspective of environmental protection and operational cost reduction, it is necessary to study how to reduce the energy consumption of URT systems, in particular by optimizing multi-train operating conditions and statuses. Reducing traction energy and improving multi-train operating control strategies are therefore two effective measures for saving energy in URT systems.
Regarding single-train energy-saving operating control strategies, most studies focus on the line condition, train operating speed and condition conversion point. Howlett, Pudney and Vu [4] calculated the critical switching points and key speeds so as to obtain an optimal and feasible solution for train energy saving. Bocharnikov, Tobias and Roberts [5] presented a single-train trajectory optimization to obtain minimum energy consumption with maximum regenerated energy. Albrecht et al. [6,7] proposed a method to obtain an energy-efficient driving strategy for a train’s journey on an undulating track with steep grades. Miao et al. [8] proposed an energy-saving operation optimization model of a single train based on time discretization. Li et al. [9] presented a Pareto multi-objective genetic algorithm to optimize the energy consumption of the train. Li et al. [10] established an energy-saving optimal control model of a train under a timing constraint considering ramp, speed limit of the line and feedback of regenerative braking energy. Various intelligent algorithms were applied to obtain the best combination sequence and accurate driving mode switching point based on different operation conditions and train types [11]. Yin, Chen and Li [12] presented a regression tree to deal with design and analysis of energy-efficient train speed profiles. Fernández-Rodríguez, Fernández-Cardador and Cucala [13] used fuzzy optimization to calculate an optimal driving strategy that balanced energy-efficiency and computational efficiency. Bao et al. [14] proposed a design and analysis of energy-efficient train speed profiles to predict the actual load value of microgrid energy. Dou et al. [15] established a speed-flow relationship model to analyze the highway situation based on the data passing through sensor network nodes. To sum up, the train has many possible control strategies for each segment when the running status is determined. 
As the literature review shows, once the running state is determined, a train has many possible control strategies for each operating segment, and the traction energy consumption generally differs from one strategy to another. However, most researchers have sought an energy-saving control strategy for an idealized environment in which the train operation statuses at adjacent stations are given, so as to balance the sequence of strategy formulation and the switching points of operating conditions. This can reduce the practicality of the control strategy and its control model in real-life cases.
A train control model (TCM), which covers multiple trains and their carriages, treats the running train as a mass model. Although the coupling between adjacent carriages is simplified, it is still a common and effective way to develop a train control strategy. Thus, some studies have carried out research on train traction conditions. Yang et al. [16] proposed a cooperative scheduling method to optimize the timetable so that the energy generated by braking trains could be used directly by accelerating trains. Ye and Liu [17] established a multi-phase optimal control method for multi-train control and scheduling on railway lines. Huang and Shuai [18] applied a customer-oriented dynamic cyclical adjustment approach to optimize a high-speed railway train scheduling plan. Liu, Guo and Yu [19] formulated a cooperative control model, considering the utilization of regenerative energy. Long and Yin [20] developed hyper-speed train operation control and energy-saving control URT strategies. Liu et al. [21] proposed a cooperative scheduling approach to optimize the timetable based on variable traction force and braking action. Feng et al. [22] presented an energy-saving operation optimization method for URT trains based on the recycling of braking energy. Cao [23] constructed a multiple energy consumption model to optimize the manipulation strategy. Pineda-Jaramillo et al. [24] used four basic features to predict the traction power based on different machine learning models. Deng et al. [25] created an energy consumption optimization method for train operation with these multiple parameters. Liu, Yang and Yang [26] proposed multiple high-speed trains following a characteristic model and a cooperative optimal control strategy. Oh et al. [27] investigated and classified representative applications for railway safety, mainly focusing on deep learning approaches. Zhang et al. [28] established a multi-train collaborative energy consumption optimization model to minimize the total energy consumption. Jieli and Tao [29] constructed a collaborative optimization model of multi-train operation to maximize the overlap time for regenerative braking. The energy performance method for the energy-efficient train model is still a popular topic of research. Wang et al. [30] proposed an integrated energy-efficient train operation method to minimize the systematic energy. Gu, Tang and Ma [31] proposed a multiple-optimization-model-based energy-efficient operation method to reduce energy consumption and arrival delay for following trains. Su et al. [32] proposed a train regulation method that makes full use of dwell time supplements, which can reduce energy consumption. He et al. [33] presented a multi-particle train model to realize train trajectory optimization within a given timetable. Huang et al. [34] proposed a joint scheduling model to improve service quality and reduce energy consumption in urban rail transit networks.
In summary, many existing studies have proposed different train control models and corresponding optimization methods. However, the approaches in these studies can impair passenger travel comfort because they repeatedly adjust station dwell time, inter-station operating time and other operation parameters, and because they change the current train operation schedule over a large range. At the same time, combinations of multiple operating conditions, such as multi-traction or multi-cruise, and actual operating conditions, such as line speed limits and passengers’ waiting time, have not been considered, which reduces the practicability of the models to a certain extent. Although several studies focused on trains at a single station or at adjacent stations in the same power supply interval, few considered the utilization of regenerative energy for multiple trains at multiple stations. In addition, most previous works address single-objective optimization problems, whereas train performance is often influenced by many factors.
The control strategy optimization and the train timetable optimization attempt to achieve global minimization of energy consumption. Several studies focus on train timetable optimization based on dynamic passenger flow. Sun et al. [35] formulated three optimization models to design demand-sensitive timetables by demonstrating train operation using equivalent time. Li et al. [36] adopted the Modular Multilevel Converter Type Railway Power Conditioner (MMC-RPC) with a distributed super-capacitor (SC) energy storage (ES) scheme. Gao et al. [37] proposed a comprehensive quantitative optimization model of the route scheme for the desert railway based on Interval Number and TOPSIS. Niu, Zhou and Gao [38] calculated and adjusted train timetables for a rail corridor with given time-varying origin-to-destination passenger demand matrices. Considering dwell time uncertainty and traction force, a timetable optimization method was proposed to minimize the energy consumption and train travel time [39,40]. Su et al. [41] presented an integrated train operation approach by jointly optimizing the train timetable and driving strategy, which can minimize the systematic net energy consumption, including the traction energy consumption and the reused regenerative energy. Gao and Yang [42] proposed an integrated optimization method for trains’ timetables and speed profiles. The train speed profile can be optimized to achieve on-time and precise stopping through a sequence of operating modes and associated tracking/braking accelerations at each position. Li et al. [43] proposed a multi-objective programming model consisting of three objective functions to minimize the energy consumption, passenger waiting time at stations and waiting time of transfer passengers at transfer stations. In recent years, data-driven energy-saving control methods have been applied in URT systems, and a range of machine learning methods has been employed in data-driven optimization of multi-train energy control strategies. Bai et al. [44] presented a cooperative control method to optimize each switching point of the driving regimes. Yin et al. [45] used a deep neural network (DNN) to improve the performance of train energy control. Zhang et al. [46] proposed a policy-based reinforcement learning train control approach to reduce the traction energy consumption of the train. Liu et al. [47] applied a model predictive control method to the minimum safety tracking interval. Due to the multiple parameters of the trains’ dynamics, Cao et al. [48] and Cao, Wen and Ma [49] applied proportional integral differential (PID) and fuzzy controllers to optimize train control and the speed–time curve. Liu and Liu [50] built a dynamic planning model using rail running time to find the optimal control strategy. Zhu et al. [51] proposed multi-train timetable optimization based on the improved Seeker Optimization Algorithm (SOA). Zhang et al. [52] proposed a novel distance measurement on hesitant fuzzy sets (HFSs) based on equal-probability transformation and applied it to decision-making regarding traffic control.
Nevertheless, there are significant limitations to the previous studies. First, the above studies mainly focus on the passenger waiting time at the station, and do not consider actual constraints such as train multi-stage traction energy consumption, actual train capacity, safety speed limit, etc. The established mathematical model has not yet been perfected and adapted to the actual operation conditions. Second, reducing passenger waiting time and energy consumption through train scheduling has a great impact on improving service quality and energy efficiency from an operational perspective [43]. However, there is a lack of studies on coordinated optimization of train traction energy consumption of multiple trains that also consider time-varying passenger demand and train operating state variation. Third, the above studies reduce the total energy consumption by optimizing the timetable or speed profile, respectively. However, the train timetable and speed profile are not independent elements of train operation. Both have a direct impact on traction energy consumption and the utilization of regenerative braking energy, and should be considered together.
In summary, research on both single-train energy-saving operation and multi-train timetable optimization has produced rich findings, but some urgent problems in energy-saving train control still deserve in-depth study.
On the one hand, in terms of train control, an energy-saving control method and mathematical model based on the combination of time-varying passenger flow and multi-train operating conditions have not been fully established. When multiple trains run at the same time, the short departure interval means that any slight change in one train's operation affects the other trains in the network and indirectly influences traction energy consumption. Few studies have addressed a coordinated optimization control method for train traction energy consumption that ensures the safety of multi-train operation and is based on the interaction between time-varying passenger demand and train operating state variation.
On the other hand, although data-driven models have significantly improved the accuracy and flexibility of TCMs, shortcomings remain: existing data-driven TCMs are black-box models that do not explain how they use the full range of inputs in physical operation. Moreover, train control is a multi-objective optimization problem concerning safety, punctuality, passenger comfort and energy saving, which makes it a complex learning problem. This study draws on deep learning and neural networks, which have demonstrated their ability to model large amounts of high-dimensional data in predicting the movements of helicopters, robots and patients. Most of these models are multi-layer neural networks without physical properties. As mentioned above, TCMs are sensitive to variations in operating conditions and train parameters, whereas data-driven models are black boxes and are thus difficult to use in practice. Therefore, how to combine deep learning with a dynamic model of physical operating parameters to predict and control multi-train energy consumption remains an open problem.
To address the collaborative optimization of multi-train energy-saving control, a set of mathematical models is presented whose objective is to minimize the multi-train traction energy consumption and the average passenger waiting time. Then, a DRLDA algorithm is designed to obtain the optimal objective value and the cooperative control optimization strategy for all trains. The rest of this paper is organized as follows. Section 2 provides a multiple objective-driven energy-saving control model of a single train. In Section 3, the mathematical models for multi-train cooperative energy-saving control are established based on the DRLDA algorithm. In Section 4, the effectiveness of the proposed approach is demonstrated through case studies. Section 5 summarizes the main conclusions.

2. Multiple Objective-Driven Energy-Saving Control Model of a Single Train

2.1. Passenger Flow Prediction of URT

When studying the energy-saving train control operation strategy, the influence of passenger flow variations should be considered. Using the metro’s historical passenger flow data to effectively predict the movement of passengers in or out of the stations is of great significance for studying train traction energy consumption and reducing the cost of train operation. In the process of metro passenger flow prediction, when there are large, random fluctuations in passenger flow, the predicting accuracy of traditional models is often greatly affected. Thus, this paper uses the change-point algorithm to handle the time series consisting of passenger flow data. Then, the wavelet transform method is applied to denoise the change point set. Finally, the URT passenger flow is predicted by using the ARMA-RBFNN algorithm.

2.1.1. The Change-Point Algorithm

The passenger flow data are random in nature; these data present continuous variation in a sawtooth shape. Some change points are ignored when predicting directly with the original data, which affects the prediction accuracy. Therefore, the set of change points is defined as b(w). The change-point algorithm is used to obtain the b(w) in the time series of passenger flow. The change-point algorithm can improve the fitting accuracy of the model by dividing the time series of passenger flow into time periods with different characteristics based on qualitative changes in the observed function values at a certain time point. The specific steps of the algorithm are as follows:
Step 1: The cycle number is set as u, the number of change points as w and the auxiliary variable as e. It is supposed that u = 1, w = 1, e = 1 and $\beta_j = l_j$ (j = 1, 2, …, k), where $l_j$ is the valley point on the peak curve (or the peak point of the valley curve).
Step 2: On the curve, $\beta_u$ is the starting point. Two convex waves are taken along the direction of the time axis, which are denoted as $\beta_u \beta_{u+1} \beta_{u+2}$. The wave variance of b(w) is calculated within this range. Each point in the wave is taken as the middle point and then evaluated. The next step is to calculate the maximum and minimum values over the middle point $t^*$.
$$Y(\beta_u, \beta_{u+1}, \beta_{u+2}) = \sum_{t=\beta_u}^{\beta_{u+1}} \left(x_t - \bar{x}_{\beta_u\,\beta_{u+1}}\right)^2 + \sum_{t=\beta_{u+1}+1}^{\beta_{u+2}} \left(x_t - \bar{x}_{\beta_{u+1}+1\,\beta_{u+2}}\right)^2 \tag{1}$$
$$\max(\beta_u, t_1^*, \beta_{u+2}) = \max\left[\sum_{t=\beta_u}^{t} \left(x_t - \bar{x}_{\beta_u\,t}\right)^2 + \sum_{t=t+1}^{\beta_{u+2}} \left(x_t - \bar{x}_{t+1\,\beta_{u+2}}\right)^2\right] \tag{2}$$
$$\min(\beta_u, t_2^*, \beta_{u+2}) = \min\left[\sum_{t=\beta_u}^{t} \left(x_t - \bar{x}_{\beta_u\,t}\right)^2 + \sum_{t=t+1}^{\beta_{u+2}} \left(x_t - \bar{x}_{t+1\,\beta_{u+2}}\right)^2\right] \tag{3}$$
where $x_t$ is the value of the passenger flow time series at time t and $\bar{x}$ with two subscripts is the mean of the series over the indicated interval. $t_1^*$ is the middle point that gives the maximum value, and $t_2^*$ is the middle point that gives the minimum value.
Step 3: Change point estimation:
$$R_u = \frac{2\,Y(\beta_u, \beta_{u+1}, \beta_{u+2})}{\max(\beta_u, t_1^*, \beta_{u+2}) + \min(\beta_u, t_2^*, \beta_{u+2})} \tag{4}$$
$$cr_u = \frac{\max(\beta_u, t_1^*, \beta_{u+2})}{\min(\beta_u, t_2^*, \beta_{u+2})} \tag{5}$$
Step 3.1: If $\max(\beta_u, t_1^*, \beta_{u+2}) \approx \min(\beta_u, t_2^*, \beta_{u+2})$, i.e., when $1 \le cr_u < 1.3$, the time series of passenger flow changes only slightly. There is no need to delineate a change point in the range $(l_u, l_{u+2})$.
Step 3.2: If R u 1 4 + 3 2 2 c r u + 1 , whether it satisfies t 1 * = β u + 1 or not, then b ( w ) = β u + 1 , where b ( w ) represents a set of change points on the curve.
Step 3.3: If 1 4 + 3 2 2 c r u + 1 R u 1 , search for β u + 1 .
Step 3.4: If $t_c \in \{\beta_{u+1}^{L}, \beta_{u+1}^{R}\}$ satisfies $Dev(\beta_u, t_c, \beta_{u+2}) = \min\{Dev(\beta_u, \beta_{u+1}^{L}, \beta_{u+2}),\ Dev(\beta_u, \beta_{u+1}^{R}, \beta_{u+2})\}$, where Dev(·) indicates variance, then $b(w) = t_c$ is obtained.
Step 3.5: If $R_u > 1$, then $b(w) = t_2^*$.
The validity of $b(w)$ is then determined: if $T(w-1) > \varepsilon + 1$, the change point is retained. Otherwise, it is deleted.
Step 3.6: If $b(w)$ exists, then u = u + 1, $b(w) = \beta_u$, $\beta_{u+1} = \beta_{u+1}$, $\beta_{u+2} = \beta_{u+2}$, w = w + 1 and e = 0. Otherwise, e = 1, $\beta_u = \beta_u$, $\beta_{u+1+e} = \beta_{u+1+e}$ and $\beta_{u+2+e} = \beta_{u+2+e}$. When $\beta_{u+2} = \beta_k$, the search is complete and the change-point algorithm ends. Otherwise, Step 2 is repeated.
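The statistics in Equations (1)–(5) can be illustrated with a minimal Python sketch. It assumes the window is given by three integer indices into the series and that every point from the first index up to (but not including) the last is tried as the middle point. The function names `sse`, `split_sse` and `change_point_stats` and the sample data are hypothetical, and the decision rules of Steps 3.1–3.6 are not implemented here.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations of `values` from their own mean."""
    if not values:
        return 0.0
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def split_sse(x, start, mid, end):
    """Squared deviation of x[start..mid] plus that of x[mid+1..end] (inclusive)."""
    return sse(x[start:mid + 1]) + sse(x[mid + 1:end + 1])


def change_point_stats(x, b_u, b_u1, b_u2):
    """Evaluate Y, max, min, R_u and cr_u for the window (b_u, b_u1, b_u2)."""
    y = split_sse(x, b_u, b_u1, b_u2)                      # Equation (1)
    # Try every candidate middle point t in the window.
    scores = {t: split_sse(x, b_u, t, b_u2) for t in range(b_u, b_u2)}
    t1 = max(scores, key=scores.get)                       # Equation (2)
    t2 = min(scores, key=scores.get)                       # Equation (3)
    v_max, v_min = scores[t1], scores[t2]
    total = v_max + v_min
    r_u = 2 * y / total if total > 0 else float("nan")     # Equation (4)
    cr_u = v_max / v_min if v_min > 0 else float("inf")    # Equation (5)
    return {"Y": y, "t1": t1, "t2": t2, "max": v_max,
            "min": v_min, "R_u": r_u, "cr_u": cr_u}


# Hypothetical passenger counts with a level shift after index 3.
flow = [10, 12, 11, 13, 50, 52, 51, 53]
stats = change_point_stats(flow, 0, 3, 7)
print(stats)
```

In this toy series the minimum split falls exactly at the level shift, so `t2` coincides with the middle point of the window and the ratio `cr_u` is far above 1.3.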

2.1.2. Wavelet Transform Method

The original data are processed again on the basis of wavelet transform to ensure the validity. Wavelet transform has the characteristic of multi-resolution analysis, which can focus on any detail of the passenger flow data for multi-resolution time-frequency domain analysis and can also effectively remove the noise from the passenger flow data. The equation for wavelet transform is as follows:
$$T_x(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\, \overline{\omega\left(\frac{t-b}{a}\right)}\, dt = \left\langle x(t), \omega_{a,b}(t) \right\rangle \tag{6}$$
where a is the scale of the passenger flow data and b is the translation amount of the passenger flow data. x(t) is the time-varying passenger flow data, $\omega(t)$ is the basic wavelet and $\omega_{a,b}(t)$ is a function set of basic wavelets. t is the sampling time interval.
Step 1: The sensor data is input.
Step 2: Wavelet transform multi-scale decomposition is used, including Haar, Db, Sym and Coif.
Step 3: Wavelet coefficients of each scale are denoised and processed with different layers.
Step 4: The signal-to-noise ratio of each result is calculated and compared, and the wavelet basis function is selected accordingly. The final passenger flow data are obtained.
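Steps 1–4 can be sketched in plain Python for one wavelet basis. This is only an illustration under stated assumptions: it uses the Haar basis alone (the Db, Sym and Coif bases are not implemented), applies soft thresholding with a hand-picked threshold, and measures the signal-to-noise ratio against a known clean signal. The function names and the sample data are hypothetical.

```python
import math


def haar_decompose(signal, levels):
    """Multi-level orthonormal Haar transform.

    len(signal) must be divisible by 2**levels.
    """
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, from the coarsest level to the finest."""
    approx = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for s, w in zip(approx, detail):
            rebuilt += [(s + w) / math.sqrt(2), (s - w) / math.sqrt(2)]
        approx = rebuilt
    return approx


def soft_threshold(coeffs, thr):
    """Shrink each coefficient towards zero by thr."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def denoise(signal, levels, thr):
    """Decompose, threshold the detail coefficients, reconstruct."""
    approx, details = haar_decompose(signal, levels)
    details = [soft_threshold(d, thr) for d in details]
    return haar_reconstruct(approx, details)


def snr_db(reference, estimate):
    """Signal-to-noise ratio of `estimate` relative to `reference`, in dB."""
    power = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return float("inf") if noise == 0 else 10 * math.log10(power / noise)


# Hypothetical data: a two-level flow pattern with alternating +/-5 noise.
clean = [100.0] * 8 + [300.0] * 8
noisy = [c + (5.0 if i % 2 == 0 else -5.0) for i, c in enumerate(clean)]
restored = denoise(noisy, levels=3, thr=8.0)
print(snr_db(clean, noisy), snr_db(clean, restored))
```

Repeating the same comparison for several wavelet bases and keeping the one with the highest signal-to-noise ratio would correspond to the selection described in Step 4.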

2.1.3. ARMA-RBFNN Algorithm

The data processed by the change-point algorithm and wavelet transform method are non-stationary time series. The non-stationary time series are transformed into stationary time series by difference computing. However, the ARMA model could not predict the nonlinearity of these data. The ARMA-RBFNN algorithm is constructed to reduce the prediction error. The specific steps are described as follows:
Step 1: The data are input after secondary processing.
Step 2: The stationarity of the series is examined by calculating the autocorrelation function and the partial autocorrelation function. Let $x_t$ and $x_{t-k}$ be two random variables in b(w) that are k periods apart. The autocovariance function with a lag of k periods, defined as $cov_k$, is calculated by Equation (7).
$$cov_k = cov\left(b(w), b(w-k)\right) = E\left[\left(b(w) - \bar{u}\right)\left(b(w-k) - \bar{u}\right)\right] \tag{7}$$
where $\bar{u}$ is the mean value of the passenger flow time series.
The autocorrelation function of the passenger flow time series is $\rho_k = \frac{cov_k}{\vartheta_{x_{t-k}}\,\vartheta_{x_t}}$, where $\vartheta_{x_t}^2 = E\left[\left(x_t - E(x_t)\right)^2\right]$. When the passenger flow time series is stationary, the autocorrelation function is defined as $\rho_k = \frac{cov_k}{\vartheta_0}$.
For a data sample, the autocorrelation is estimated as $\hat{\rho}_k = \frac{\sum_{t=k+1}^{n} (x_t - \bar{x})(x_{t-k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2}$, where $\bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t$. The partial autocorrelation of the data sample is defined as
$$\hat{\phi}_{kk} = \begin{cases} \hat{\rho}_1, & k = 1 \\ \dfrac{\hat{\rho}_k - \sum_{j=1}^{k-1} \hat{\phi}_{k-1,j}\, \hat{\rho}_{k-j}}{1 - \sum_{j=1}^{k-1} \hat{\phi}_{k-1,j}\, \hat{\rho}_{j}}, & k = 2, 3, \ldots \end{cases}$$
If the graph has tailing characteristics, it belongs to a stationary time series, and Step 4 should be completed. Otherwise, it belongs to a non-stationary time series, and Step 3 should be completed.
Step 3: Difference computing transforms the non-stationary time series into a stationary time series. Because the passenger flow entering the station is periodic, periodic differences with period s are taken for the non-stationary time series.
$$y_t = x_{t+s} - x_t \tag{8}$$
Step 4: It is assumed that the passenger flow time series consists of linear and nonlinear components, i.e., L(t) is the linear part and N(t) is the nonlinear part.
$$y_t = L(t) + N(t) \tag{9}$$
Step 5: The orders p and q of the ARMA model are determined by the AIC criterion.
Step 6: For the linear part of the passenger flow time series, the ARMA model is used for passenger flow prediction. The prediction results are defined as p(t), and the nonlinear prediction error (defined as e(t)) is calculated as follows:
$$e(t) = y_t - p(t) \tag{10}$$
Step 7: e(t) is randomly divided into two parts: 75% of the errors form the training set and 25% form the test set.
Step 8: The K-means clustering algorithm is applied to determine the radial basis function centers. The average method is used for the radial basis function widths so as to determine the core parameters of the RBFNN algorithm.
Step 9: The training set is used to train the RBFNN algorithm. The RBFNN algorithm is then used to predict the results of the test set.
Step 10: With the RBFNN prediction defined as $\hat{N}_t$, the final prediction result is given by
$$\hat{y}_t = \hat{L}_t + \hat{N}_t \tag{11}$$
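The stationarity checks and the periodic differencing used in the steps above can be sketched in plain Python. The sketch assumes the standard Durbin–Levinson recursion for the partial autocorrelation, including its usual update of the intermediate coefficients, which the text does not spell out. The function names and sample values are hypothetical, and the ARMA fitting and RBFNN training themselves are not shown.

```python
def seasonal_difference(x, s):
    """Periodic difference y_t = x_{t+s} - x_t for a series with period s."""
    return [x[t + s] - x[t] for t in range(len(x) - s)]


def sample_acf(x, max_lag):
    """Sample autocorrelation rho_hat[0..max_lag] of the series x."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def sample_pacf(rho, max_lag):
    """Partial autocorrelations phi_hat[k][k] for k = 1..max_lag.

    `rho` holds autocorrelations with rho[0] == 1. The intermediate
    coefficients phi[(k, j)] follow the standard Durbin-Levinson update.
    """
    phi = {}
    pacf = []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
        else:
            num = rho[k] - sum(phi[(k - 1, j)] * rho[k - j] for j in range(1, k))
            den = 1 - sum(phi[(k - 1, j)] * rho[j] for j in range(1, k))
            phi_kk = num / den
        phi[(k, k)] = phi_kk
        for j in range(1, k):
            phi[(k, j)] = phi[(k - 1, j)] - phi_kk * phi[(k - 1, k - j)]
        pacf.append(phi_kk)
    return pacf


# Hypothetical inbound flow with a strict period of 4 samples.
periodic = [1, 2, 3, 4] * 5
print(seasonal_difference(periodic, 4))

# Autocorrelations decaying as 0.5**k, as for a first-order autoregression.
rho = [1.0, 0.5, 0.25, 0.125]
print(sample_pacf(rho, 3))
```

A partial autocorrelation that cuts off after lag 1 while the autocorrelation tails off, as in the second example, is the kind of pattern used to choose low model orders before the AIC comparison.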

2.2. Energy-Saving Control Model of a Single Train

In Section 2.1, the variation pattern of passenger flow was obtained by the ARMA-RBFNN algorithm. When passengers board the train, how to control the train operation and reduce traction energy consumption has become an important part of the train energy-saving model.

2.2.1. Parameters and Notations

As shown in Table 1, the relevant notations and parameters throughout this paper are listed below to describe the problem more conveniently.

2.2.2. Operating Condition Analysis

In the force analysis of the train, it can be clearly seen that under the action of different forces, the train travels in different operating conditions. The operating condition of a train during operation can be divided into four types: traction, cruise, coasting and braking. In the traditional sense, there are only three train operating conditions: traction, coasting and braking. Traction conditions include primary traction, secondary traction and multiple traction. Cruising conditions mean that the running resistance and traction force of the train are balanced without resultant force, and the train runs at a fixed speed. Thus, the resultant force is analyzed under each operating condition.
$$F = \begin{cases} F_{traction} - W, & \text{traction operating condition} \\ 0, & \text{cruise operating condition} \\ -W, & \text{coasting operating condition} \\ -(B + W), & \text{braking operating condition} \end{cases} \tag{12}$$
where $F_{traction}$ is the traction force, W is the resistance and B is the braking force.
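The piecewise resultant force can be written as a small Python function. This is a minimal sketch: the `Condition` enum, the function name and the force values in the example are hypothetical, and all forces are assumed to be expressed in the same unit.

```python
from enum import Enum


class Condition(Enum):
    TRACTION = "traction"
    CRUISE = "cruise"
    COASTING = "coasting"
    BRAKING = "braking"


def resultant_force(condition, traction=0.0, resistance=0.0, braking=0.0):
    """Resultant force on the train for one of the four operating conditions."""
    if condition is Condition.TRACTION:
        return traction - resistance
    if condition is Condition.CRUISE:
        return 0.0  # traction balances resistance
    if condition is Condition.COASTING:
        return -resistance
    if condition is Condition.BRAKING:
        return -(braking + resistance)
    raise ValueError(f"unknown operating condition: {condition!r}")


# Hypothetical forces in kN.
for cond in Condition:
    print(cond.value, resultant_force(cond, traction=300.0,
                                      resistance=50.0, braking=200.0))
```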

2.2.3. Energy Consumption Calculation

Most studies only calculate the traction energy consumption of the train under the four operating conditions while ignoring the secondary traction, which reduces the accuracy of the results. Thus, due to comprehensive consideration of secondary traction, it is assumed that the train operation process consists of six stages in Figure 1: traction, coasting, secondary traction, cruise, coasting and braking. The energy consumption is calculated, respectively, as follows.
Step 1: Traction stage 0–s1: the train starts under traction with a certain acceleration to overcome the resistance, resulting in traction energy consumption.
$$F_{s_{i1}} - m(1+\gamma) g W = m(1+\gamma)\, a_{s_{i1}} \tag{13}$$
$$v = a_{s_{i1}} t_{i1} \tag{14}$$
$$E_{s_{i1}} = \int_{0}^{t_{i1}} F_{s_{i1}} v \, dt = \int_{0}^{t_{i1}} \left[ m(1+\gamma)(gW + a_{s_{i1}}) \right] a_{s_{i1}} t \, dt = \int_{0}^{t_{i1}} M(\Delta t)(1+\gamma)(gW + a_{s_{i1}})\, a_{s_{i1}} t \, dt \tag{15}$$
where γ is the rotary mass coefficient, γ = 0.08.
Step 2: Coasting stage s1–s2: the train is only affected by resistance during this process. The speed gradually decreases, and there is no traction energy consumption.
Step 3: Secondary traction stage s2–s3: the train applies traction for the second time with a certain acceleration, causing traction energy consumption.
$$v_{s_{i2}} = v_{s_{i1}} - a_{s_{i2}} t_{i2} \tag{16}$$
$$v_{s_{i3}} = v_{s_{i2}} + a_{s_{i3}} t_{i3} = v_{s_{i1}} - a_{s_{i2}} t_{i2} + a_{s_{i3}} t_{i3} \tag{17}$$
$$F_{s_{i3}} - m(1+\gamma) g W = m(1+\gamma)\, a_{s_{i3}} \tag{18}$$
$$E_{s_{i3}} = \int_{t_{i2}}^{t_{i3}} F_{s_{i3}} v \, dt = \int_{t_{i2}}^{t_{i3}} m(1+\gamma)(gW + a_{s_{i3}})(v_{s_{i1}} - a_{s_{i2}} t_{i2} + a_{s_{i3}} t_{i3}) \, dt = \int_{t_{i2}}^{t_{i3}} M(\Delta t)(1+\gamma)(gW + a_{s_{i3}})(v_{s_{i1}} - a_{s_{i2}} t_{i2} + a_{s_{i3}} t_{i3}) \, dt \tag{19}$$
Step 4: Cruise stage s3–s4: the resultant force on the train is zero. At the same time, the train runs at a constant speed of $v_{s_{i3}}$. The traction energy consumption is calculated according to the resistance of the train.
$$F_{s_{i4}} = m(1+\gamma) g W = M(\Delta t)(1+\gamma) g W \tag{20}$$
$$E_{s_{i4}} = \int_{t_{i3}}^{t_{i4}} F_{s_{i4}} v \, dt = \int_{t_{i3}}^{t_{i4}} M(\Delta t)(1+\gamma) g W v_{s_{i3}} \, dt \tag{21}$$
Step 5: Coasting stage s4–s5: the train is only affected by resistance in this process and decelerates; the traction energy consumption is zero.
Step 6: Braking stage s5–s6: the train is only affected by two forces, resistance and braking force, and the traction force does not do any work, so the traction energy consumption is zero.
In summary, the energy consumption of train operation in this zone, defined as $E_3$, is the sum of the energy consumed in the primary traction, secondary traction and cruising stages, which is calculated by Equation (22).
$$E_3 = E_{s_{i1}} + E_{s_{i3}} + E_{s_{i4}} = \int_{0}^{t_{i1}} M(\Delta t)(1+\gamma)(gW + a_{s_{i1}})\, a_{s_{i1}} t \, dt + \int_{t_{i2}}^{t_{i3}} M(\Delta t)(1+\gamma)(gW + a_{s_{i3}})(v_{s_{i1}} - a_{s_{i2}} t_{i2} + a_{s_{i3}} t_{i3}) \, dt + \int_{t_{i3}}^{t_{i4}} M(\Delta t)(1+\gamma) g W (a_{s_{i1}} t_{i1} - a_{s_{i2}} t_{i2} + a_{s_{i3}} t_{i3}) \, dt = H(\Delta t_i, t_{i1}, t_{i2}, t_{i3}, t_{i4}) \tag{22}$$
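Equation (22) can be evaluated numerically with a short Python sketch. Several points are assumptions made only for illustration: the mass M(Δt) is treated as a constant over the run, W is taken as a dimensionless unit-resistance coefficient so that M(1+γ)gW is a force, SI units are used throughout, and the integrals are evaluated exactly as written, with the secondary-traction and cruise integrands constant in t. The function name and the numerical inputs are hypothetical.

```python
G = 9.81      # gravitational acceleration in m/s^2 (assumed SI units)
GAMMA = 0.08  # rotary mass coefficient given in the text


def traction_energy(mass, w, a1, a2, a3, t1, t2, t3, t4, gamma=GAMMA, g=G):
    """Return (E_s1, E_s3, E_s4, E_3) in joules for one inter-station run.

    mass: train mass in kg, treated as constant (stands in for M(dt)).
    w: unit resistance coefficient (dimensionless, an assumption).
    a1, a2, a3: accelerations of primary traction, first coasting
        (a deceleration magnitude) and secondary traction, in m/s^2.
    t1..t4: the times t_i1..t_i4 that appear in the equations, in s.
    """
    k = mass * (1 + gamma)
    v1 = a1 * t1                      # speed at the end of primary traction
    v3 = v1 - a2 * t2 + a3 * t3       # cruise speed as written in Eq. (17)
    e_s1 = k * (g * w + a1) * a1 * t1 ** 2 / 2   # closed form of Eq. (15)
    e_s3 = k * (g * w + a3) * v3 * (t3 - t2)     # Eq. (19), constant integrand
    e_s4 = k * g * w * v3 * (t4 - t3)            # Eq. (21), constant integrand
    return e_s1, e_s3, e_s4, e_s1 + e_s3 + e_s4


# Hypothetical run: 200 t train, unit resistance 0.003.
e1, e3, e4, total = traction_energy(
    mass=200_000.0, w=0.003,
    a1=1.0, a2=0.1, a3=0.5,
    t1=10.0, t2=20.0, t3=30.0, t4=60.0,
)
print(f"E_3 = {total / 3.6e6:.2f} kWh")
```

Keeping the three stage energies separate in the return value makes it easy to see which stage dominates when the switching times are varied.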

2.2.4. Single-Train Control Model

The process of urban rail transit operation can be divided into two main bodies: passengers and operators. From the perspective of passengers, there is a “benefit paradox” relationship between the high demand of passengers for a certain service level and train operation energy consumption. Passengers always hope to board the train as soon as it arrives as well as for the waiting time at the platform to be shortened. However, such hopes lead to an increase in train running speed and traction energy consumption, which, in turn, increases operation cost. From the perspective of the operator, the operator wants to appropriately reduce the train’s running speed and traction energy consumption in order to lower operation costs while transporting as many passengers as possible and ensuring the safety of train operation. The contradiction between supply and demand between the two main bodies is becoming increasingly prominent and influential. Therefore, this paper comprehensively considers strategies to meet the high service level requirements for passengers and co-optimize the traction energy consumption of trains.
Step 1: Control optimization objectives
  • Average passenger waiting time
In an actual situation, when passengers arrive at the platform to wait for the train, they automatically disperse according to the current queuing situation. Finally, the number of people queuing in front of each door tends to be balanced. Assuming that the travel time of passengers between the gate and the platform is ignored, the passenger waiting time is calculated as follows.
The departure time of the train at the ith platform is defined as $r_j^i$, which is calculated by Equation (23).
$$r_j^i = T_j^i + \Delta t_i + d_j^i \tag{23}$$
Similarly, the departure time of the previous train at the ith platform is defined as $r_{j-1}^i$, which is shown as Equation (24).
$$r_{j-1}^i = T_{j-1}^i + \Delta t_i + d_j^i \tag{24}$$
The maximum passenger waiting time at the ith platform is calculated by Equation (25).
$$\Delta T_j^i = T_j^i + \Delta t_i - r_{j-1}^i = T_j^i + \Delta t_i - (T_{j-1}^i + \Delta t_i + d_j^i) = T_j^i - T_{j-1}^i - d_j^i \tag{25}$$
The number of passengers waiting to board the train at the ith platform is the number of passengers arriving during the time difference between the departure of the previous train from the platform and the arrival of this train at the platform.
$$Q_{up}^i = f_i(\Delta t_i) = \int_{r_{j-1}^i}^{T_j^i + \Delta t_i} h_i(t) \, dt \tag{26}$$
The time when the kth passenger arrives at the ith platform is defined as $t_{q_k}^i$; the waiting time of that passenger at the ith platform, $\Delta t_{q_k}^i$, is calculated by Equation (27).
$$\Delta t_{q_k}^i = T_j^i + \Delta t_i - t_{q_k}^i \tag{27}$$
Therefore, the average passenger waiting time is calculated by Equation (28).
$$\overline{\Delta t_{q_k}^i} = \frac{\sum_{k=1}^{Q_{up}^i} \Delta t_{q_k}^i}{Q_{up}^i} = \frac{\sum_{k=1}^{\int_{r_{j-1}^i}^{T_j^i + \Delta t_i} h_i(t) \, dt} (T_j^i + \Delta t_i - t_{q_k}^i)}{\int_{r_{j-1}^i}^{T_j^i + \Delta t_i} h_i(t) \, dt} \tag{28}$$
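Equations (27) and (28) amount to averaging, over the passengers who reach the platform before the train does, the gap between the train's arrival and each passenger's arrival. A minimal sketch follows; the function name and the list-of-timestamps input are assumptions made for illustration.

```python
def average_waiting_time(passenger_arrivals, train_arrival):
    """Mean of (train_arrival - passenger_arrival), as in Equations (27)-(28).

    passenger_arrivals: platform arrival times of individual passengers (s).
    train_arrival: arrival time of the train, T + delta_t (s).
    """
    # Only passengers already on the platform board this train.
    waits = [train_arrival - t for t in passenger_arrivals if t <= train_arrival]
    if not waits:
        return 0.0
    return sum(waits) / len(waits)
```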
  • Traction energy consumption
The total traction energy consumption of the train on the whole line is the sum of the energy consumption between every two adjacent stations. It consists of traction energy consumption and cruise energy consumption. Assuming that the line has I − 1 inter-station sections, it follows from Equation (22) that the total traction energy consumption of the jth train on this line is calculated by Equation (29).
$$E_j = \sum_{i=1}^{I-1} E_3 = \sum_{i=1}^{I-1} H(\Delta t_i, t_i^1, t_i^2, t_i^3, t_i^4) \tag{29}$$
Step 2: Constraint condition
To ensure safe train operation, the train operating status needs to be subject to the following constraints:
  • Speed constraint
According to multiple factors, such as the physical conditions of the train line, train operating safety and passenger comfort, the train operation will be limited by the maximum speed, vmax.
To ensure punctual arrival and maximize transportation efficiency, the train is also restricted by a minimum speed, vmin, during traction operation. When the train is coasting and its speed approaches vmin, it should accelerate immediately so that the running speed stays above vmin and the train can reach the next station on time.
  • Train dwell time constraint
When a train enters a station and stops, it needs to be given some time for passengers to fully disembark/embark before moving to the next station. Thus, the train dwell time (defined as $d_j^i$) should meet the following condition.
$$d_{\min} \le d_j^i \le d_{\max} \tag{30}$$
  • Passenger waiting time constraint
Considering the complex psychological characteristics of passengers, the train should meet their traveling and psychological needs. Therefore, the maximum waiting time that passengers can accept at the platform is defined as $(\Delta t_q)_{\max}$, i.e., the passenger waiting time (defined as $\Delta t_{q_k}^i$) is constrained by Equation (31).
$$0 < \Delta t_{q_k}^i \le (\Delta t_q)_{\max} \tag{31}$$
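The three constraints of Step 2 can be combined into a single feasibility check on a candidate operating plan. The sketch below is only one way to do this: the function name and arguments are hypothetical, and the speed bounds are applied to every sampled speed, which is a simplification of the text (the minimum speed is described only for traction and coasting).

```python
def satisfies_constraints(speeds, dwell, waits,
                          v_min, v_max, d_min, d_max, wait_max):
    """Check the speed, dwell time (Eq. 30) and waiting time (Eq. 31) constraints."""
    speed_ok = all(v_min <= v <= v_max for v in speeds)
    dwell_ok = d_min <= dwell <= d_max
    wait_ok = all(0 < w <= wait_max for w in waits)
    return speed_ok and dwell_ok and wait_ok
```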
Step 3: Modeling
Considering the cost benefit of the two main bodies, namely the operator and the passenger, a single train combination optimization model is established that aims to balance the traction energy consumption and the average passenger waiting time. The model control objectives are summarized as follows.
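$$\begin{aligned} \min J &= \alpha E_j + \beta \overline{\Delta t_{q_k}^i} = \alpha H(\Delta t_i, t_i^1, t_i^2, t_i^3, t_i^4) + \beta K(\Delta t_i) \\ \text{s.t.}\quad & \Delta t_i \in (0, 60) \\ & t_i^1 + t_i^2 + t_i^3 + t_i^4 + t_B \le T_j^i + \Delta t_i \\ & v_{\min} \le v_{s_i^n} \le v_{\max} \\ & d_{\min} \le d_j^i \le d_{\max} \\ & 0 < \Delta t_{q_k}^i \le (\Delta t_q)_{\max} \end{aligned} \tag{32}$$
Step 4: The model is solved by GA.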
Step 4.1: The process is initialized and basic data are input, including line name, train data, train model parameters, single side slope, curve radius, speed limit of running line, etc.
Step 4.2: The control model and its objective function are determined based on Equation (32).
Step 4.3: The encoding mode is determined. Real-number encoding is used to represent the conversion points $s_i^n$ of the train operating conditions. Since $s_i^n$ is a model variable, each conversion point is represented by a gene, and a chromosome represents the sequence of conversion points. A chromosome therefore contains multiple genes: $S_I = \{s_i^1, s_i^2, \ldots, s_i^n\}$.
Step 4.4: The reciprocal of the objective function is selected as the fitness function to dynamically search the internal information, and then the individual evaluation is conducted.
$$fit = \frac{1}{\alpha E_j + \beta \overline{\Delta t_{q_k}^i}} \tag{33}$$
Step 4.5: GA is designed according to roulette selection and adaptive crossover/mutation operation.
Step 4.6: Operation parameters, including population size, iteration times, selection operator, crossover operator, mutation operator, etc., are determined.
Step 4.7: Check whether the maximum number of generations has been reached. If so, the result is output and the algorithm ends. Otherwise, return to Step 4.4 and continue the calculation. The flowchart of GA is shown in Figure 2.
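Steps 4.1 to 4.7 can be sketched as a small real-coded GA. The version below is illustrative only: it uses roulette selection and the reciprocal fitness of Equation (33), but fixed crossover and mutation rates stand in for the adaptive operators of Step 4.5, elitism is added, and `run_ga`, `objective` and `bounds` are hypothetical names. The objective must be strictly positive so that its reciprocal is a valid fitness.

```python
import random

def run_ga(objective, bounds, pop_size=30, generations=60,
           crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Minimize a positive objective over real-valued genes within bounds."""
    rng = random.Random(seed)

    def clip(ind):
        return [min(hi, max(lo, g)) for g, (lo, hi) in zip(ind, bounds)]

    def roulette(pop, fits):
        # Selection probability is proportional to fitness.
        pick = rng.uniform(0.0, sum(fits))
        acc = 0.0
        for ind, f in zip(pop, fits):
            acc += f
            if acc >= pick:
                return ind
        return pop[-1]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = min(pop, key=objective)
    for _ in range(generations):
        fits = [1.0 / objective(ind) for ind in pop]   # Step 4.4, Eq. (33)
        children = [best[:]]                           # elitism keeps the best
        while len(children) < pop_size:
            p1, p2 = roulette(pop, fits), roulette(pop, fits)
            child = p1[:]
            if rng.random() < crossover_rate:
                w = rng.random()                       # arithmetic crossover
                child = [w * x + (1.0 - w) * y for x, y in zip(p1, p2)]
            for k, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:       # Gaussian mutation
                    child[k] += rng.gauss(0.0, 0.1 * (hi - lo))
            children.append(clip(child))
        pop = children
        best = min(pop, key=objective)
    return best, objective(best)
```

In the single-train model, each gene would correspond to one conversion point and `objective` to the weighted sum in Equation (32), with infeasible chromosomes penalized or repaired before evaluation.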

3. Multi-Train Cooperative Energy-Saving Control Model Based on Multi-Objective Combinatorial Optimization

3.1. Scenario Assumptions

When studying a multi-train cooperative control method, it is necessary to optimize the arrival time, departure interval and maximum interval speed at each station while considering the benefits of both passengers and operators. Combined with the actual operation of the train, the following assumptions are made.
For passengers: on one hand, it is assumed that passengers show no transfer behavior. On the other hand, when the train stops regularly at a station, it is assumed that passengers board the train in an orderly manner, and the mechanism of “first come, first served” is adopted.
For train operation: on one hand, it is assumed that the trains depart in order at the starting station, only one train is allowed to stop at each station at the same time and trains cannot overtake the previous train on the route. On the other hand, it is assumed that the train follows the “stop at station by station” principle during operation. The trains are not allowed to stop during inter-station operation, i.e., trains can only stop at stations.

3.2. Constraint Condition

When multiple trains are running on the line, they must satisfy the train safety interval constraint in addition to the constraints of the single-train control model. Under the scenario assumptions, only one train may stop at a station at a time and trains cannot stop on the running line, so a safety interval is required between two consecutive trains. The minimum safety interval of the train includes the maximum braking distance and the safety protection distance required for emergency braking. To simplify the calculation, the minimum safety interval of the train is defined as $T_{aq}$.
$$T_j^i - T_{j-1}^i \ge T_{aq} \tag{34}$$

3.3. Methodology

3.3.1. Modeling

Step 1: Queuing theory is introduced to study the relationships among time-varying passenger flow prediction, arrival interval and dwell time. Service time is defined as the train dwell time, and the train arrival interval ($\Delta T_j^i$) is defined as the vacation time. Trains entering and leaving stations and passengers embarking/disembarking therefore form a service queuing model, i.e., a single vacation system in which every service is followed by a vacation. The average queue length of waiting passengers is defined as $L$, the arrival interval as $\Delta T_j^i$ and the dwell time as $d_j^i$. According to Ning, Niu and Zhang [53]:
$$L = \tilde{\rho} + \frac{\tilde{\lambda}^2 \left[ b^2 + 2 d_j^i E(\Delta T_j^i) + E((\Delta T_j^i)^2) \right]}{2 \left[ 1 - \rho - \tilde{\lambda} E(\Delta T_j^i) \right]} + \frac{\tilde{\lambda} E((\Delta T_j^i)^2)}{2 E(\Delta T_j^i)} \tag{35}$$
where $\tilde{\lambda}$ represents the input conversion parameter of the passenger arrival rate $\lambda$; $\rho$ and $\tilde{\rho}$ both represent the service intensity, i.e., $\rho = \tilde{\lambda} / \mu$ and $\tilde{\rho} = \rho + \tilde{\lambda} E(\Delta T_j^i)$; and $b^2$ represents the second moment of the service time distribution.
Considering Equation (26):
$$Q_{up}^i = f_i(\Delta t_i) = \int_{T_{j-1}^i + \Delta t_i + d_j^i}^{T_j^i + \Delta t_i} h_i(t) \, dt \tag{36}$$
The average passenger arrival rate during the arrival interval is calculated by Equation (37).
$$\bar{\lambda_i} = \frac{Q_{up}^i}{\Delta T_j^i} = \frac{\int_{T_{j-1}^i + \Delta t_i + d_j^i}^{T_j^i + \Delta t_i} h_i(t) \, dt}{T_j^i - T_{j-1}^i - d_j^i} \tag{37}$$
The average passenger waiting time is calculated by Equation (38).
$$\Delta T = \frac{L_i}{\bar{\lambda_i}} \tag{38}$$
The service time for one passenger is defined as $Y_k$; the total service time for all arriving passengers is calculated by Equation (39).
$$X(t) = \sum_{k=1}^{Q_{up}^i} Y_k \tag{39}$$
As the dwell time of the train is certain, the service time of each passenger is the same, and is calculated by Equation (40).
$$E[X(t)] = \lambda t E(Y_k) \tag{40}$$
The average service time of passengers per unit time is calculated by Equation (41).
$$H = \frac{E[X(t)]}{t} \tag{41}$$
The minimum service intensity of the train is D s/person within the dwell time of a train; the corresponding parameter $\tilde{\lambda}$ is calculated by Equation (42).
$$\tilde{\lambda} = \frac{H}{D} = \frac{\lambda E(Y_k)}{D} \tag{42}$$
If the arrival time of the train obeys a negative exponential distribution and the dwell time obeys a deterministic distribution, the second moment is $b^2 = D^2$. The service time is $D = u_j^i$.
The arrival interval is expressed based on the actual arrival and departure times:
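$$E(\Delta T_j^i) = u_j^i - u_{j-1}^i = T_j^i + \Delta t_j^i - T_{j-1}^i - \Delta t_{j-1}^i \tag{43}$$
$$E[(\Delta T_j^i)^2] = D(\Delta T_j^i) + [E(\Delta T_j^i)]^2 = 2 (T_j^i + \Delta t_j^i - T_{j-1}^i - \Delta t_{j-1}^i)^2 \tag{44}$$
Substituting Equations (36)–(44) into Equations (35) and (38) gives Equation (45).
$$\begin{aligned} \Delta T &= \frac{1}{\bar{\lambda}} \Bigg\{ \tilde{\lambda} d_j^i + \tilde{\lambda} (T_j^i + \Delta t_j^i - T_{j-1}^i - \Delta t_{j-1}^i) + \frac{2 \tilde{\lambda} (T_j^i + \Delta t_j^i - T_{j-1}^i - \Delta t_{j-1}^i)^2}{2 (T_j^i + \Delta t_j^i - T_{j-1}^i - \Delta t_{j-1}^i)} \\ &\quad + \frac{(\tilde{\lambda})^2 \left[ (d_j^i)^2 + 2 d_j^i (T_j^i + \Delta t_j^i - T_{j-1}^i - \Delta t_{j-1}^i) + 2 (T_j^i + \Delta t_j^i - T_{j-1}^i - \Delta t_{j-1}^i)^2 \right]}{2 \left[ 1 - \tilde{\lambda} d_j^i - \tilde{\lambda} (T_j^i + \Delta t_j^i - T_{j-1}^i - \Delta t_{j-1}^i) \right]} \Bigg\} \\ &= P(\tilde{\lambda}, \Delta t_j^i) \end{aligned} \tag{45}$$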
where the service time is the minimum dwell time and the number of passengers that the train can serve is Q. The remaining number of passengers for a given service time is defined as $\bar{\lambda} \times \Delta T - Q$.
During vacation time, when $\bar{\lambda} \times \Delta T > Q$, the train arrival interval should be shortened to reduce the number of waiting passengers. The dwell time correspondingly extends and the number of passengers on board should increase. When $\bar{\lambda} \times \Delta T < Q$, the arrival interval is extended to reduce the traction energy consumption of the train. The optimal passenger waiting time is taken as the control objective. The function is established by Equation (46).
$$\begin{aligned} Y_1 &= \sum_{i=1}^{I} \sum_{j=1}^{J} \Delta T \\ \text{s.t.}\quad & \tilde{\lambda} d_j^i + \tilde{\lambda} (T_j^i + \Delta t_j^i - T_{j-1}^i - \Delta t_{j-1}^i) < 1 \\ & T_j^i - T_{j-1}^i \ge T_{aq} \end{aligned} \tag{46}$$
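The chain from the queue length of Equation (35) to the waiting time of Equation (38) can be checked numerically. The sketch below assumes a deterministic service time equal to the dwell time (so the second moment is the dwell time squared) and an exponentially distributed arrival interval (so its second moment is twice the squared mean, as in Equation (44)). It uses the converted rate for the squared term, as Equation (45) does. All names are hypothetical.

```python
def queue_waiting_time(lam_tilde, lam_bar, dwell, interval):
    """Average waiting time from Equations (35) and (38).

    lam_tilde: converted passenger arrival parameter (Eq. 42).
    lam_bar:   average arrival rate during the arrival interval (Eq. 37).
    dwell:     train dwell time, used as the deterministic service time.
    interval:  expected train arrival interval (Eq. 43).
    """
    rho_tilde = lam_tilde * dwell + lam_tilde * interval
    if rho_tilde >= 1.0:
        # Stability constraint of Equation (46) is violated.
        raise ValueError("queue is unstable: rho_tilde must be below 1")
    second_moment = 2.0 * interval ** 2        # Eq. (44)
    b2 = dwell ** 2                            # second moment of service time
    queue_len = (rho_tilde
                 + lam_tilde ** 2 * (b2 + 2.0 * dwell * interval + second_moment)
                 / (2.0 * (1.0 - rho_tilde))
                 + lam_tilde * second_moment / (2.0 * interval))
    return queue_len / lam_bar                 # Eq. (38)
```

Summing this value over all stations and trains gives the objective Y1 of Equation (46).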
Step 2: Traction energy consumption of multiple trains
The multi-train cooperative model is not a single objective optimization problem, but a multi-objective optimization problem. The main optimization objectives are described as follows:
  • Passenger waiting time
Passenger waiting time is naturally formed when passengers arrive at the platform through the gate and wait for the train to arrive. As noted in Section 3.3, $\Delta T_j^i$ is defined as the vacation time of the queuing model. The passenger waiting time can be expressed as follows.
$$Y_1 = \sum_{i=1}^{I} \sum_{j=1}^{J} \Delta T \tag{47}$$
  • Traction energy consumption
According to a single train energy-saving control model, a multi-train traction energy consumption control function is calculated by Equation (48).
$$Y_2 = \sum_{j=1}^{J} E_j = \sum_{j=1}^{J} \sum_{i=1}^{I-1} E_3 \tag{48}$$
Using the weighted-sum method and taking into account train operation safety, equipment working status and other constraint conditions, the energy-saving cooperative control model for multiple trains can be expressed as follows.
$$\begin{aligned} \min Y &= \theta Y_1 + \omega Y_2 \\ \text{s.t.}\quad & \Delta t_j^i \in (0, 60) \\ & t_i^1 + t_i^2 + t_i^3 + t_i^4 + t_B \le T_j^i + \Delta t_i \\ & v_{\min} \le v_{s_i^n} \le v_{\max} \\ & T_j^i - T_{j-1}^i \ge T_{aq} \\ & d_{\min} \le d_j^i \le d_{\max} \\ & T_{aq} \le \Delta T \le (\Delta t_q)_{\max} \\ & \tilde{\lambda} d_j^i + \tilde{\lambda} (T_j^i + \Delta t_j^i - T_{j-1}^i - \Delta t_{j-1}^i) < 1 \end{aligned} \tag{49}$$

3.3.2. DRLDA Algorithm for Multi-Train Energy-Saving Control Strategy

External force, speed, dwell time, passenger waiting time and traction energy consumption are basic variables that are interrelated and mutually restrictive in the train operation control process. Speed, passenger flow, position and distance are the cumulative result of the external forces acting on a train over time, and the choice of external force is in turn affected by the state of the other basic variables. Moreover, when a train arrives at a station, its speed should reach a specified value, such as zero, and the actual operation time must be less than or equal to the planned operation time. The relationships among time-varying passenger flow prediction, train arrival time, dwell time and traction energy consumption should also be considered before the train enters the station, and during operation the speed must not exceed the speed limit at each location.
Under these complex constraints, multi-train energy-saving control using DRL alone is a challenging cooperative optimization problem, because learning involves a long trial-and-error process. DRL is therefore combined with DA to handle the constraints. The learning system retains the global exploration and learning ability of reinforcement learning (RL) while avoiding the locality and incompleteness of purely RL-guided learning. Under DA hybrid learning, the current state of the DRL agent is compared with its expected state. If they are inconsistent, the agent has made an incorrect decision, and DA intervenes in time through weight distribution. The agent then receives specific feedback actions and instructs the controller to make a reliable decision for that state. Finally, the input data for a given train/passenger flow and railway line are used to learn from the train energy consumption and passenger travel data to achieve the goals of safety, energy saving, comfort, punctuality, etc.
RL can make the optimal decision by learning the optimal control strategy incrementally, so as to maximize the cumulative reward value and improve the system performance. DRL combines the perception ability of Deep Learning (DL) with the decision-making ability of RL. Before applying the DRLDA algorithm to the multi-train energy saving control, the basic elements are defined as state space, action space, reward mechanism setting, strategy and value function, respectively.
Step 1: State space
To accurately present the current train information, multiple parameters are selected as environment states, which are related to different attributes of the environment: speed (defined as $v_{s_i^n}$), passenger waiting time (defined as $Y_1$), traction energy consumption (defined as $Y_2$), current position (defined as $x_i$) and the remaining operating time of the train (defined as $t_{rest}^i$). The state space is defined as Equation (50):
$$\begin{aligned} s_i &= [v_{s_i^n}, Y_1, Y_2, x_i, t_{rest}^i] \\ \text{s.t.}\quad & v_{s_i^n} \in [v_{\min}, v_{\max}] \\ & Y_1 \in [T_{aq}, \Delta T] \\ & Y_2 \in [0, E_j] \\ & x_i \in [0, x_{end}] \\ & t_{rest}^i \in [0, t_{plan}] \end{aligned} \tag{50}$$
where $x_{end}$ is the end position of the railway line, $t_{rest}^i = t_{plan} - t_{operating}$ represents the remaining operating time of a train, $t_{plan}$ is the planned train travel time and $t_{operating}$ indicates the time that the train has been running.
The process of train operation is discrete; at each time point t, a train is in one of two states, i.e., the stop-at-station state or the running state. For the stop-at-station state, the value ranges for the variables are $v_{s_i^n} = 0$, $\Delta T \ge Y_1 \ge T_{aq}$, $Y_2 = 0$, $x_i \in x_{stop}$ and $0 \le t_{rest}^i \le t_{dwell} = d_{\max}$, which satisfy the upper limits of dwell time and train speed. For the running state, when the train runs on the railway line, the value ranges for the variables are $v_{\min} \le v_{s_i^n} \le v_{\max}$, $\Delta T \ge Y_1 \ge T_{aq}$, $Y_2 \le E_j$, $x_i \in x_{end} - x_{stop}$ and $t_{dwell} - \Delta t \le t_{rest}^i \le t_{rest\,\max}^i$ (where $\Delta t$ is the time interval), which should satisfy the speed limit and the upper limit of the remaining operating time. During state transition, the state space $s_i$ of multiple trains can be obtained by generating all candidate states within the value range.
Step 2: Action space
Trains generate the corresponding traction or braking forces based on the continuous output of the main control handle or URT system. Therefore, the action space A is composed of the variation tendency of the control force level (defined as $CFL_i$) according to the current operating states. The action set is defined as follows:
$$\begin{aligned} A &= \{ a_n^i = CFL_i, \; CFL_i \in [-100\%, 100\%] \} \\ \text{s.t.}\quad & U_1 = CFL_i \cdot F_{traction\_max}, \quad CFL_i > 0 \\ & U_2 = CFL_i \cdot F_{coasting}, \quad CFL_i < 0 \\ & U_3 = CFL_i \cdot F_{secondary\_traction}, \quad CFL_i > 0 \\ & U_4 = 0, \quad CFL_i = 0 \\ & U_5 = CFL_i \cdot B_{\max}, \quad CFL_i < 0 \end{aligned} \tag{51}$$
where $U_i$ represents the different effects under different operating conditions and running states. $F_{traction\_max}$ and $B_{\max}$ are the maximum traction force and the maximum braking force of the train, respectively. $F_{secondary\_traction}$ is the secondary traction force. $F_{coasting}$ is the train coasting force. The control force level ranges over [−100%, 100%]: the maximum traction force corresponds to 100% and the maximum braking force to −100%. A value of 0% means that the train runs in a cruise state.
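The mapping from a control force level to an applied force can be illustrated with a reduced version of Equation (51). This sketch keeps only the maximum traction and maximum braking cases and omits the separate coasting and secondary traction forces, which is a simplification made here for brevity. The function name is hypothetical, and the level is expressed as a fraction in [−1, 1] rather than a percentage.

```python
def control_force(cfl, f_traction_max, b_max):
    """Map a control force level in [-1, 1] to a signed force.

    Positive levels scale the maximum traction force, negative levels
    scale the maximum braking force, and zero applies no control force.
    """
    cfl = max(-1.0, min(1.0, cfl))   # enforce the [-100%, 100%] range
    if cfl > 0:
        return cfl * f_traction_max
    if cfl < 0:
        return cfl * b_max
    return 0.0
```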
Step 3: Reward mechanism setting
An agent is able to randomly generate control strategies based on the definition of state and action. The reward function is of great importance in real-world RL applications. Different reward values can guide the algorithm to learn better control strategies in order to maximize the benefit, and a reasonable reward and punishment mechanism can improve the convergence speed. Generally, under the cooperative setting, the common reward (defined as $R_{total}(s_i, a_n^i)$) is obtained from the independent individual reward and the cooperative individual reward based on the relationship between multiple trains. In order to reduce the energy consumption of all trains, $R_{total}(s_i, a_n^i)$ is calculated by Equation (52).
$$R_{total}(s_i, a_n^i) = \min Y = \theta Y_1 + \omega Y_2 \tag{52}$$
Step 4: Strategy and value function
In RL, the train selects the action to execute in the current state according to a mapping from states to a probability distribution over actions, which is defined as the strategy $\pi$. The agent selects the action with the maximum value of the state-action function Q. The current state-action function Q is defined as:
$$Q(s_i, \pi(s_i)) = E\left[ (1 - \kappa) R_{total}(s_i, \pi(a_n^i \mid s_i)) + \kappa Q(s_{i+1}, \pi(a_n^i \mid s_{i+1})) \right] \tag{53}$$
where $\kappa$ is a discount factor; the larger $\kappa$ is, the greater the focus on future cumulative returns. The cumulative return value of the action strategy is defined as a function $J(\pi)$, calculated by $J(\pi) = E_{\sim \tau}\left[ Q(s, \pi(a \mid s)) \right]$. Thus, the optimal control strategy can be calculated by Equation (54).
$$optimal\,\pi = \arg\max_{\pi} J(\pi) \tag{54}$$
Step 5: DRLDA strategy design
This algorithm consists of two kinds of networks, i.e., the actor network and the critic network, together with an experience replay pool, the multi-train energy-saving control model and the reward function. The DRLDA algorithm learning framework is illustrated in Figure 3. The two main networks, the actor network and the critic network, output the train state and control strategy to the train cooperative model to simulate the train operation process. The reward function evaluates the action strategy. Then, the reward values and train states are stored in an experience replay pool through a replay memory mechanism. When the pool is full of data, small batches of random samples are used to train the two main networks, which can effectively reduce the correlations among samples and improve the learning speed. The training process of the proposed algorithm is divided into a state-action function update and an action strategy update, so the two main networks combine the advantages of value function update and strategy function update. The critic network adopts the single-step update method based on the value function to evaluate the current multi-train control strategy. Its network parameter is defined as $\xi^Q$. The actor network adopts the strategy gradient update method, which can better solve the problem of continuous action space, and is used to output the multi-train energy-saving control strategy. Its network parameter is defined as $\xi^\pi$.
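The experience replay pool described above can be sketched with a bounded queue and uniform random sampling. The class and method names below are hypothetical, and the stored tuple layout (state, action, reward, next state) is an assumption; Algorithm 1 in this paper stores a slightly different tuple.

```python
import random
from collections import deque

class ReplayPool:
    """Fixed-capacity experience pool with uniform mini-batch sampling."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)   # oldest samples are dropped
        self.rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def is_full(self):
        return len(self.buffer) == self.buffer.maxlen

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks temporal correlation.
        return self.rng.sample(list(self.buffer), batch_size)
```

Training would start only once `is_full()` returns true, matching the description that mini-batches are drawn when the pool is full of data.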
  • Critic Network
The critic network is similar to the Deep Q-Network (DQN) algorithm, which uses a fully connected neural network to fit each state action value, that is, $Q(s, A \mid \xi^Q) = Q(s, A)$. The state observations and the actions of sample data are taken as the input. The state action value Q is selected as the output. A larger Q value indicates a higher evaluation of that action strategy. The new weight of the Q neural network parameters is calculated by Equation (55).
$$\xi_{t+1}^Q = \xi_t^Q + \zeta_Q \, \Delta y \, \nabla Q(s, a \mid \xi_t^Q) \tag{55}$$
$\Delta y = y_t - \hat{y}_t$ is established, where $y_t = R_{total}^t + \kappa Q_{t+1}(s_{t+1}, \pi_{s_i}^t(s_{t+1} \mid \xi_t^{\pi_{s_i}}) \mid \xi^Q)$ is the target value, $\hat{y}_t = Q_t(s_t, \pi_{s_i}^t(s_t \mid \xi_t^{\pi_{s_i}}) \mid \xi^Q)$ is the current estimate and $\zeta_Q$ is the learning rate. When calculating, these expressions are substituted into Equation (55). $\nabla Q(s, a \mid \xi_t^Q)$ is the update gradient calculated by the critic network based on the Q value, which is output to the actor network as the basis for updating the action strategy. If the gradient is positive, it means that the optimal strategy is close to this control strategy. The larger the gradient, the faster the approach speed.
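To make the update of Equation (55) concrete, the sketch below replaces the fully connected critic with a linear approximator, for which the gradient of Q with respect to the weights is simply the feature vector. This substitution, and the names `critic_update`, `features` and `q_next`, are assumptions made for illustration.

```python
def critic_update(weights, features, reward, q_next, kappa, lr):
    """One temporal-difference step for a linear critic Q = w . features.

    Returns the updated weights and the TD error (target minus estimate).
    """
    q_now = sum(w * f for w, f in zip(weights, features))
    target = reward + kappa * q_next          # target value y_t
    delta = target - q_now                    # TD error
    # For a linear critic, dQ/dw equals the feature vector.
    new_weights = [w + lr * delta * f for w, f in zip(weights, features)]
    return new_weights, delta
```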
  • Actor Network
The division of the action space has a great impact on RL. If the division is too coarse, it is impossible to describe the train movement and operation process accurately. If the division is too fine, the dimensionality becomes too large for the train to learn all the actions. The actor network is applied to Q neural network parameters to estimate the train control strategy. The state observations of the sample data are taken as the input, and the action strategy is taken as the output. Strategies can be divided into stochastic and deterministic strategies. In the process of train operation, there is a definite optimal control strategy in each state. Thus, this paper selects the deterministic strategy, which outputs the corresponding optimal action in the current state, that is, $a = \pi(s_i \mid \xi)$. According to gradient ascent (Lillicrap et al. [54]), the derivative of $J(\pi)$ is calculated to obtain the strategy gradient, as shown in Equation (56).
$$\nabla_{\xi^{\pi_{s_i}}} J(\pi) \approx E\left[ \nabla_a Q(s, a \mid \xi^Q) \big|_{a = \pi(s_i)} \, \nabla_{\xi^{\pi_{s_i}}} \pi(s \mid \xi^{\pi_{s_i}}) \right] \tag{56}$$
When performing gradient ascent to update strategies in a continuous control problem, maximizing J maximizes the corresponding Q value. Therefore, the agent is encouraged to learn proactively from the randomly sampled data, and the hybrid control increments ($\Delta \pi_{\xi_i}^{hr}$, i = 1, 2, …, N) are brought into Equation (56) as the unbiased estimation of the expectation. Equation (56) is rewritten as Equation (57) to obtain the strategy gradient.
$$\nabla_{\xi^{\pi_{s_i}}} J(\pi) \approx \frac{\upsilon}{N} \sum_i \nabla_a Q(s, a \mid \xi^Q) \, \nabla_{\xi^{\pi_{s_i}}} \pi(s \mid \xi^{\pi_{s_i}}) + \frac{1 - \upsilon}{N} \sum_i \left[ \pi(s \mid \xi^{\pi_{s_i}}) - \Delta \pi_{\xi_i}^{hr} \right] \nabla_{\xi^{\pi_{s_i}}} \pi(s \mid \xi^{\pi_{s_i}}) \tag{57}$$
where $s = s_i$ and $a = \pi(s_i)$. The new weight of the neural network parameters is then calculated as $\xi_{t+1}^Q = \xi_t^Q + a_\pi \nabla_{\xi^{\pi_{s_i}}} J(\pi)$.
  • Parameters Update and Their Mechanism
To increase the stability of algorithm training and reduce large changes to the neural network, as shown in Equations (58) and (59), the actor network and critic network should track the learning by “soft update” instead of directly changing the weights.
$$\xi_{t+1}^Q = \sigma \xi_t^Q + (1 - \sigma) \xi_{t-1}^Q \tag{58}$$
$$\xi_{t+1}^{\pi_{s_i}} = \psi \xi_t^{\pi_{s_i}} + (1 - \psi) \xi_{t-1}^{\pi_{s_i}} \tag{59}$$
where $\xi_t^{\pi_{s_i}}$ and $\xi_t^Q$ are the parameters of the actor network and critic network before updating, respectively. $\xi_{t+1}^{\pi_{s_i}}$ and $\xi_{t+1}^Q$ are the parameters of the actor network and critic network after updating, respectively.
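The soft update of Equations (58) and (59) is a convex combination of the current and previous parameter vectors, applied element by element. A minimal sketch, with hypothetical names and flat lists standing in for network weights:

```python
def soft_update(current, previous, rate):
    """Blend current and previous parameters as in Equations (58)-(59).

    rate plays the role of sigma (critic) or psi (actor).
    """
    return [rate * c + (1.0 - rate) * p for c, p in zip(current, previous)]
```

A small rate keeps the updated parameters close to the previous ones, which is what damps large changes during training.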
RL in a continuous action space generally performs exploration by adding noise in order to learn new action strategies; that is, $a_t = \pi(s \mid \xi^{\pi_{s_i}}) + \mathcal{N}$, where $\mathcal{N}$ refers to the noise process. It is of great importance to improve the exploration efficiency. According to the actual train operation, the maximum traction level of the control strategy after adding noise is limited to no more than 100%, and the maximum braking level is no less than −100%.
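Exploration with bounded actions can be sketched as adding noise to the deterministic action and clipping the result. Gaussian noise is an assumption here, since the text does not specify the noise process, and the function name is hypothetical. Levels are expressed as fractions in [−1, 1].

```python
import random

def explore_action(policy_action, noise_std, rng):
    """Add zero-mean Gaussian noise, then clip to the valid level range."""
    noisy = policy_action + rng.gauss(0.0, noise_std)
    # Traction level at most 100%, braking level at least -100%.
    return max(-1.0, min(1.0, noisy))
```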
  • The Pseudo Code of DRLDA Algorithm (Algorithm 1)
Algorithm 1: DRLDA algorithm for multi-train energy-saving control model
1: Initialize replay memory and the common reward, $R_{total}(s_i, a_n^i)$.
2: Initialize actor network and critic network.
3: Initialize target actor network and critic network.
4: For i = 1:
5:     Reset state space $s_i$.
6:     Initialize a random process $\mathcal{N}$ for ActionState.
7:     For step i = 1 to t:
8:         Select action $\Delta \pi_i = \pi(s_i \mid \xi^{\pi_{s_i}}) + \mathcal{N}_i$ based on current strategy and noise.
9:         Set actor network, critic network and the parameter update mechanism with $\xi^Q$ and $\xi^\pi$.
10:         Obtain hybrid controller output, $\Delta \pi_{\xi_i}^{hr}$.
11:         Execute simulation using $\pi_{\xi_i} - \pi_{\xi_i}^{hr}$ and observe reward $R_{total}^i$.
12:         Obtain next state, $s_{i+1}$.
13:         Calculate ultimate action increment, $\Delta \pi_{\xi_i} = \pi_{\xi_i} - \pi_{\xi_{i-1}}$.
14:         Store $(s_i, CFL_i, R_{total}, Q_i)$ in replay memory.
15:         Sample random mini-batch of $(s_i, CFL_i, R_{total}, Q_i)$ with size N.
16:         Set $J(\pi)$.
17:         Initialize DA parameters, including the speed and position of population, the initial value of the weight, the number of iterations, the radius, etc.
18:         Calculate the objective function value of each Dragonfly individual according to Equation (49).
19:         Save the feasible solutions of all objective functions to the external file.
20:         Compare the number of feasible solutions put into the external file with the maximum capacity of the external file. If it is greater than the maximum capacity, delete the redundant feasible solutions according to the dynamic maintenance strategy. If it is less than the maximum capacity, go to the next step.
21:         Update the weight factor of Dragonfly behavior and radius parameters.
22:         Update the population location by using the mixed mutation method.
23:         Update the critic network using Equation (55).
24:         Update the actor network using the sampled strategy gradient, Equation (56).
25:         Update the target networks in a soft manner.
26:         Condition judgment: if the maximum number of iterations is reached, execute the next step; if the maximum number of iterations is not reached, go to Step 18.
27:         Output the optimal solution.
28:     End
29: End

4. Case Study

In order to evaluate the effectiveness of the DRLDA algorithm and the ARMA-RBFNN-GA method, we used actual sensor data from Beijing Metro Line 4 (such as Xinjiekou, Ping'anli, Xisi and Lingjinghutong) for simulation experiments, as shown in Figure 4.
In practice, these data include train parameters, passenger flow data and railway line data. The train was a type B train (SFM05 model) in a 6-car formation with a rated capacity of 1440 passengers, etc. The train adopted a DC 750 V third-rail power supply, and the designed maximum speed was 80 km/h. The passenger flow data were derived from historical operation data, including dynamic passenger flow demand data and train operation schedule data. For example, the metro passenger flow data were collected from the Automatic Fare Collection (AFC) data, survey data were provided by the Beijing Metro Company and the railway passenger flow data were obtained from statistics provided by the Passenger Transport Center, Beijing Railway Administration. Since the railway timetable is affected by many factors, it cannot be easily changed. The planning time range for the metro should be determined based on Beijing Metro Line 4 (such as Xinjiekou, Ping'anli, Xisi and Lingjinghutong) and the actual travel time range of passengers. In addition, to ensure that the case study was representative, we investigated the original passenger flow data of the Xinjiekou, Ping'anli, Xisi and Lingjinghutong stations of Beijing Metro Line 4 from 7:00 to 9:00 on weekdays for one month, with statistics taken at 30 s intervals. The passenger flow data from one week are visualized in Figure 5. Railway line data include stations, line length, planned train operating time, speed limit, etc. Actual energy consumption data were obtained from the Beijing Metro Company.
As shown in Figure 5, passengers embarking on/disembarking from the Beijing metro conform to the actual passenger flow distribution.
In AnyLogic simulation software, as shown in Figure 6, the pedestrian library was used to construct a simulation of passengers entering and leaving the station and queuing for trains. The PedSource module was used to generate passengers. The passengers were shown to enter the platform through the gate, disperse in front of each door and wait for the train to arrive and stop before boarding. Then, the PedSink module was applied to demonstrate the disappearance of the passengers. Some passengers disembarked from the train, made their way from the platform to the gates and left through those gates. These passengers disappeared, thus completing the simulation of passengers entering and leaving the station and embarking on/disembarking from the train.
When building the train operation simulation, the Train Source module was used to generate the train. In the simulation, the train runs through track operation into the platform, which was completed by the Train Move To module. The TrainUpload module was used to represent the process of passengers embarking on/disembarking from the train. Then, the train goes to the next station. According to the actual operation of the train, several simulation parameters of the train (such as train and line length, running speed, etc.) were set with the property inspector of Train Source and Train Move To, as shown in Figure 7.

4.1. Comparison Results of Passenger Flow Prediction

To test and verify the performance of the ARMA-RBFNN algorithm, the original passenger flow data from the Xinjiekou, Pinganli, Xisi and Lingjinghutong stations of Beijing Metro Line 4 were collected from 7:00 to 9:00 on weekdays for one month. This paper takes 3–4 s as the interval period to process the passenger flow data by using the change-point algorithm. As shown in Figure 8, the change-point algorithm locates the change points in the passenger flow data; here, we use the Xinjiekou station as an example.
After change-point processing, four wavelet families (Haar, db, sym and coif) are used to denoise the passenger flow data at a variety of decomposition levels. The signal-to-noise ratio (SNR) obtained with each wavelet function at each decomposition level is shown in Table 2 and Table 3.
A larger signal-to-noise ratio indicates a better denoising effect. Table 2 shows that the signal-to-noise ratio is largest when the decomposition level of the db wavelet function is 4. Therefore, the db4 wavelet function is used to denoise the passenger flow data for embarking on the train. Similarly, based on Table 3, the db5 wavelet function is used to denoise the passenger flow data for disembarking from the train.
The ARMA algorithm requires a stationary time series, so the stationarity of the passenger flow entering/leaving the station should be tested before ARMA is applied. As shown in Figure 9 and Figure 10, the autocorrelation function and partial autocorrelation function are calculated and analyzed for both the embarking and disembarking states.
Figure 9 and Figure 10 demonstrate that the autocorrelation function decreases from 1.0 toward 0.0 and then fluctuates around it with gradually decreasing amplitude. The partial autocorrelation function likewise decays from 1.0 and oscillates ever closer to 0.0. Both functions tail off, indicating a stationary time series, which satisfies the applicability conditions of the ARMA algorithm.
In this section, the order of the ARMA model is determined based on the AIC criterion. According to experimental calculation, AIC attains its minimum value when p = 2 and q = 3. Thus, the model is determined as ARMA (2, 3). Then, the selection and performance of the RBFNN algorithm’s parameters are shown in Figure 11. Finally, the performance of ARMA-RBFNN is compared with other deep learning models on the datasets.
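The SNR-based choice of wavelet described above can be sketched as follows. This is only an illustration: the SNR definition used here (power of the reference series over power of the residual, in decibels) is a common convention and an assumption, since the formula is not stated in this section, and the function names and sample values are hypothetical.

```python
import math

def snr_db(reference, denoised):
    """SNR in dB: power of the reference over power of the residual (assumed definition)."""
    if len(reference) != len(denoised):
        raise ValueError("series must have the same length")
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, denoised))
    if noise_power == 0:
        return math.inf  # the two series are identical
    return 10.0 * math.log10(signal_power / noise_power)

def best_candidate(reference, candidates):
    """Return the name of the denoised series with the largest SNR."""
    return max(candidates, key=lambda name: snr_db(reference, candidates[name]))

# Hypothetical passenger counts and two hypothetical denoised versions.
raw = [10.0, 12.0, 11.0, 13.0]
candidates = {
    "db4": [10.0, 12.0, 11.0, 12.0],
    "haar": [9.0, 13.0, 10.0, 14.0],
}
print(best_candidate(raw, candidates))
```

In practice the candidate series would come from a wavelet library, with one entry per wavelet function and decomposition level.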
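As a rough illustration of the check above, the sample autocorrelation function can be computed directly from a series. The sketch below uses the standard biased estimator; the function name and the passenger counts are hypothetical and are not taken from the paper's data.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation r_0 .. r_max_lag of a numeric sequence."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance_sum = sum(c * c for c in centered)
    if variance_sum == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / variance_sum
        for k in range(max_lag + 1)
    ]

# Hypothetical per-interval passenger counts.
flows = [120, 135, 128, 140, 132, 138, 130, 136]
acf = autocorrelation(flows, 3)
print([round(r, 3) for r in acf])
```

A series whose autocorrelation starts at 1.0 and decays toward 0.0 with shrinking oscillations shows the tailing behavior described for Figure 9 and Figure 10.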
Specifically, some commonly used time series prediction models, i.e., an LSTM variant (LAG-LSTM; Yin, Ning and Tang [55]) and MQ-RNN (Wen et al. [56]), are selected as benchmark models. The benchmark models are detailed as follows. First, the LAG-LSTM parameters are set as learning rate = 0.002, batch size = 80, previous time steps = 14 and time delay = 5. The dimension of the hidden state is set as 12, and the dropout is 0.2 (Yin, Ning and Tang [55]). Next, MQ-RNN is constructed with a simple combination of state and control variables as input data for the standard RNN network. Its parameters are set as a hidden state $h_t$ of dimension 12 and $N = 14$ (Wen et al. [56]).
In order to evaluate the performance of different algorithms, the Mean Absolute Percentage Error (MAPE) is selected as the evaluation index. As the ARMA-RBFNN algorithm predicts the passenger flow, it is necessary to evaluate the performance of this method as follows (see Figure 12 and Table 4).
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
where $y_i$ is the actual passenger flow data, $\hat{y}_i$ is the predicted data and $n$ is the number of samples.
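The MAPE defined above can be computed in a few lines. The following is a minimal sketch; the function name and the sample values are hypothetical, and zero actual values are rejected because the ratio is undefined for them.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((y - p) / y) for y, p in zip(actual, predicted))
    return 100.0 / len(actual) * total

# Hypothetical actual and predicted passenger counts.
print(round(mape([100, 200], [98, 204]), 2))  # 2.0
```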
As shown in Figure 12, the MAPE values of ARMA-RBFNN, LAG-LSTM and MQ-RNN are 2.23%, 6.38% and 7.88%, respectively. Table 4 shows that the prediction error of ARMA-RBFNN is the smallest of the three algorithms. Therefore, the ARMA-RBFNN algorithm can be used to improve the prediction accuracy of passenger flow in URT systems.

4.2. Evaluation of Single-Train Control Effect

In this set of experiments, the single-train control effect is tested by GA, which solves the global optimal value of the model based on the secondary processed datasets. The train parameters adopt the type B train of Beijing line 4. Other parameters, including single side slope, curve radius (2000 m) and speed limit (80 km/h), are also selected. The station spacing is shown in Table 5.
  • Selection Operator
The selection operator is changed without changing other operators. The relationship between the fitness function value and the number of iterations is shown in Figure 13.
Figure 13 shows that the number of iterations required for the fitness function value to stabilize decreases as the selection probability increases. When the selection probability is 0.75, the fitness function value stabilizes in the 26th generation. When the selection probability is 0.85, it stabilizes in the 22nd generation. When the selection probability is 0.95, it stabilizes in the 8th generation. Thus, when the selection probability is in the range [0.80, 0.95], the convergence speed of GA is improved.
  • Crossover Operator
The crossover operator is changed without changing other operators. The relationship between the fitness function value and the number of iterations is shown in Figure 14.
Figure 14 shows that changing the crossover operator has little effect on the number of iterations needed for the fitness function value to stabilize, but it does affect how closely that value approaches the optimum. When the crossover probability is between 0.7 and 0.8, the fitness function value is close to the optimal value. When it is between 0.8 and 0.9, the fitness function value is far from optimal, with a large error. Therefore, a crossover probability in the range [0.75, 0.85] can improve the evolution effect of GA.
  • Mutation Operator
The mutation operator is changed without changing other operators. The relationship between the fitness function value and the number of iterations is shown in Figure 15.
Figure 15 demonstrates that when the mutation probability is 0.003, the fitness function value levels off in the 6th generation. When the mutation probability is 0.004, it levels off in the 11th generation, and when the mutation probability is 0.005, it levels off in the 16th generation. At 0.004, the fitness function value is closest to the optimal value. Therefore, the algorithm evolves better when the mutation probability is taken in the range [0.003, 0.005].
In summary, the selection operator, crossover operator and mutation operator are determined. The GA is then run, and the relationship between the optimal fitness value and the number of iterations is shown in Figure 16.
Figure 16 shows that the fitness curve stabilizes after about 53 iterations. This means that the optimal value of the fitness function has been reached and the curve has converged.
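The role of the three operators can be illustrated with a toy genetic algorithm. This is a sketch under stated assumptions, not the model solved in this paper: the fitness function is the simple "one-max" bit count rather than the train energy model, selection is a probabilistic binary tournament (the paper's selection scheme is not specified here), and the default probabilities are merely picked from within the ranges discussed above. All names are hypothetical.

```python
import random

def run_ga(fitness, n_bits=16, pop_size=30, generations=60,
           p_select=0.85, p_cross=0.8, p_mutate=0.004, seed=0):
    """Toy binary GA; returns the best fitness found in each generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(pop, key=fitness)
        history.append(fitness(best))
        children = [best[:]]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            parents = []
            for _ in range(2):
                a, b = rng.sample(pop, 2)
                fitter, weaker = (a, b) if fitness(a) >= fitness(b) else (b, a)
                # Probabilistic tournament: keep the fitter one with p_select.
                parents.append(fitter if rng.random() < p_select else weaker)
            child = parents[0][:]
            if rng.random() < p_cross:
                cut = rng.randint(1, n_bits - 1)  # single-point crossover
                child = parents[0][:cut] + parents[1][cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < p_mutate else g for g in child]
            children.append(child)
        pop = children
    return history

history = run_ga(fitness=sum)  # "one-max": fitness is the number of 1 bits
print(history[0], history[-1])
```

Because the best individual is always carried over, the recorded fitness never decreases, which gives a convergence curve of the same general shape as the one described for Figure 16.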
The speed and operating condition conversion points among the four stations are shown in Figure 17.
Figure 17 indicates that the operating conditions of the train between Xinjiekou-Lingjinghutong stations consist of four types: traction, cruise, coasting and braking, which verifies the effectiveness of the method proposed in this paper. Meanwhile, considering the operation efficiency and control effect, the optimal recommended speed of trains between stations obtained by the proposed algorithm can effectively satisfy the constraints of the speed limit and the safety of train operation, which further verifies the applicability of the proposed algorithm.
The solution results of GA were entered into the AnyLogic simulation software for verification and analysis. The operation process is shown in Figure 18 and Figure 19. Figure 18 shows the interface where passengers were generated and entered the platform. Figure 19 shows the interface where passengers started to embark on/disembark from the train after it stopped at the station.
As shown in Table 6, traction energy consumption and average passenger waiting time during train operation are calculated.
Table 6 shows that on the run between Xinjiekou and Pinganli stations, the energy-saving efficiency was 5.58% and the average passenger waiting time at Pinganli station was reduced by 2 s. On the run between Pinganli and Xisi stations, the energy-saving efficiency was 2.38% and the average passenger waiting time at Xisi station was reduced by 0.2 s. On the final run, to Lingjinghutong station, the energy-saving efficiency was 5.57% and the average passenger waiting time at Lingjinghutong station was extended by 3 s. For a single train, the overall energy-saving efficiency was 4.51% and the average passenger waiting time was extended by 0.8 s, thus realizing energy saving and achieving the expected effect.
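The figures above can be cross-checked with simple arithmetic. In the sketch below, the efficiency is assumed to be the percentage reduction relative to the baseline energy, which is the usual definition but is not spelled out in this section; the function name is hypothetical. The waiting-time changes are the ones reported above, and their sum reproduces the reported net change of 0.8 s.

```python
def saving_efficiency(baseline_energy, optimized_energy):
    """Percentage reduction relative to the baseline (assumed definition)."""
    if baseline_energy <= 0:
        raise ValueError("baseline energy must be positive")
    return 100.0 * (baseline_energy - optimized_energy) / baseline_energy

# Waiting-time changes reported above, in seconds (negative = shorter wait).
wait_changes = {"Pinganli": -2.0, "Xisi": -0.2, "Lingjinghutong": 3.0}
net_change = round(sum(wait_changes.values()), 1)
print(net_change)  # 0.8
```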

4.3. Evaluation of Multi-Train Control Effect

To test and verify the reliability and performance of the DRLDA algorithm, several simulation cases were run. The training processes of DDPG-RS and DQN-RS (Shang, Zhou and Fujita [57]) were compared with that of the proposed method in an ordinary scenario. Neural network structures of different depths can be employed, depending on the complexity of the relationships among states, actions and evaluation indexes. Since the states and actions are multi-dimensional in the train energy-saving control problem, the Q networks, actor networks and critic networks are each composed of several fully connected hidden layers, each with a number of neurons. In the continuous action space, the output layers of the actor networks and critic networks have only one neuron (Kou et al. [58]; Wang et al. [59]; Zhao et al. [60]). The related training parameters are shown in Table 7. If the termination condition described in the DRLDA strategy is triggered during training, the current training episode is stopped and the train is reset to its initial state to start a new one. The control output was discretized into eleven control levels [−100%, −80%, −60%, −40%, −20%, 0%, 20%, 40%, 60%, 80%, 100%]. The more discretization levels there are, the more difficult the training becomes. After noise was added in the exploration process, each of the three algorithms selected the closest control level.
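The mapping from a noisy action to the closest control level can be sketched as follows. This is an illustration only: levels are expressed as fractions from -1.0 to 1.0, Gaussian exploration noise is an assumption (the noise type is not stated here), and the names `nearest_level` and `explore` are hypothetical.

```python
import random

# Eleven control levels from -100% to 100% in steps of 20%, as fractions.
LEVELS = [round(-1.0 + 0.2 * i, 1) for i in range(11)]

def nearest_level(action):
    """Clip the action to [-1, 1] and snap it to the closest control level."""
    clipped = max(-1.0, min(1.0, action))
    return min(LEVELS, key=lambda level: abs(level - clipped))

def explore(action, noise_std, rng):
    """Add Gaussian exploration noise (assumed), then snap to a control level."""
    return nearest_level(action + rng.gauss(0.0, noise_std))

rng = random.Random(1)
print(nearest_level(0.27), explore(0.5, 0.05, rng))
```

A finer grid gives smoother control but a larger action set, which is consistent with the remark that more discretization levels make training harder.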
Figure 20 shows how the average reward value changes during training for the three algorithms.
Figure 20 demonstrates that the average reward values of the three algorithms rise and then tend to stabilize. The DRLDA algorithm learns the action space more easily, so its training process is more stable. Its average episode reward stabilizes at about −135, with a small oscillation amplitude. The average reward per 200 episodes for DRLDA is generally higher than the values for DDPG-RS and DQN-RS in Figure 20, which implies that DRLDA learns better than these two algorithms. As the number of training episodes increases, the average episode reward values of the DDPG-RS and DQN-RS algorithms keep rising with oscillation. They rise only slowly after reaching −160 and −200, respectively.
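The "average reward per 200 episodes" used in this comparison is a block average over the episode rewards. A minimal sketch, assuming non-overlapping blocks (the paper does not say whether a sliding window was used) and a hypothetical function name:

```python
def block_average(rewards, block=200):
    """Mean reward over consecutive, non-overlapping blocks of episodes."""
    averages = []
    for start in range(0, len(rewards), block):
        chunk = rewards[start:start + block]
        averages.append(sum(chunk) / len(chunk))
    return averages

# Tiny hypothetical example with a block size of 2.
print(block_average([-210, -190, -150, -130], block=2))  # [-200.0, -140.0]
```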
Table 8 compares the results and the computing resources required by the three algorithms above and other alternative approaches.
As shown in Table 8, the DRLDA algorithm has more state transfers than the other algorithms and obtains a higher average reward value, indicating that DRLDA triggers fewer training terminations and learns better. Because of its algorithm structure, Q learning chooses discrete vectors for the action space, which limits its generalization ability in simulation and makes it slightly less effective. However, owing to the relative simplicity of the algorithm, its training time is relatively short. The DQN algorithm is relatively slow in computing because it is more complex than the basic reinforcement learning algorithm. At the same time, because artificial neural network training has a certain degree of randomness, the training results are prone to fluctuations, which can be reduced by adjusting the reward function weights and improving the neural network. Compared with the other algorithms, the proposed algorithm requires less data. Through direct interaction between the environment and the agent, it continuously learns and corrects each step of the action without the need for complex operation condition settings. Therefore, the proposed algorithm is more convenient and shows better performance than the other algorithms.
Figure 21 shows the convergence curve of the optimal value against the number of iterations when solving with the DRLDA algorithm. The operating times of multiple trains at Xinjiekou, Pinganli, Xisi and Lingjinghutong stations, obtained by inputting different train arrival intervals, are shown in Figure 22.
Figure 21 demonstrates that the proposed algorithm converges to a relatively satisfactory solution after more than 100 generations of computation. The time to obtain the optimal solution is about 371 s. As seen in Figure 22, the multi-train operating time optimization diagram, drawn according to the optimization results obtained by the algorithm, reflects the variation pattern of passenger flow with time, and the train operating interval changes dynamically with the passenger flow demand. During the morning peak hours of 7:00–9:00 a.m., the operating lines are more closely distributed, which can reduce passenger waiting time, rationalize train departures and reduce operation costs.
For verification, the solution results of the DRLDA algorithm were entered into AnyLogic. In the multi-train control model, the relationship between train traction energy consumption and passenger waiting time is mainly studied by changing the arrival interval of the train. The TrainSource module in AnyLogic sets the train arrival interval, which can be adjusted in minutes or seconds. According to the solution results, the train arrival interval is adjusted and the simulation is run.
Figure 23 shows passengers arriving at the platform and dispersing to each door position to wait for the train. Figure 24 shows passengers queuing in front of the doors, waiting to board as the train is about to enter the station.
As shown in Table 9, the planned operating times of the trains are compared with the optimization results of the proposed model.
Considering the punctuality of the train in operation, the train operating time is optimized within the range of (0.60 s). It can be seen from Table 9 that the train’s arrival time at each station also changes with different train arrival intervals and is affected by passenger waiting time. On this basis, the multi-train energy consumption at the four stations and the average passenger waiting time were calculated and compared, and are shown in Table 10.
The number of metro passengers waiting at the station is determined by the arrival rate of metro passengers at the station and the following headway. Table 10 shows that with the arrival of the morning rush, the trains gradually entered the normal operation stage during the day, the passenger flow increased and the scheduled departure interval gradually increased. When the train arrival interval was changed, the average passenger waiting time increased. More passengers entered the train, resulting in an increase in the frequency of train operation and a gradual increase in the number of train traction runs, which increased the train traction energy consumption. However, from the perspective of multi-train collaborative operation, the total traction energy-saving efficiency of these six trains was 3.1% and the passenger waiting time was reduced by 5 s, which achieves the energy-saving effect and the expected goal. Furthermore, multi-train collaborative operation has a positive effect on some trains (trains 1, 2, 4 and 6) and a negative effect on others (trains 3 and 5). A possible reason is that the passenger flow is assumed to be dynamic, i.e., it varies in both space and time. Therefore, even slight changes in the following headway, running time and dwell time have a rather large impact on the train traction energy consumption.

5. Conclusions

The rapid development of urban rail transit has brought about huge energy consumption problems due to its large capacity and long transportation hours. With the development concept of “low carbon” and “sustainable development”, how to reduce train traction energy consumption has become an important element in the sustainable development of rail transit. The focus of this paper is to reduce the train traction energy consumption while reducing the waiting time of passengers on the platform to satisfy their travel demands. Thus, a DRLDA algorithm was proposed for multi-train energy-saving collaborative optimized control to deal with complicated multidimensional constraints. Firstly, the change-point algorithm was used to find the changing points in the time series of passenger flow curve, which constitute the variation point sets. The wavelet variation method was used for denoising to ensure the authenticity of the data. Then, the train operation between stations was studied to determine the combination of train operating conditions, and a single train’s energy consumption was calculated. Considering the multidimensional conditions, the single-train energy-saving control model was established, the goal of which was to minimize the multi-train traction energy consumption and the average passenger waiting time. An ARMA-RBFNN algorithm was proposed and applied to implement the single-train energy-saving control strategy. Next, the queuing theory was introduced to study the relationship between passenger waiting time and train arrival intervals and stopping times. The multi-train cooperative energy-saving control model was established based on many factors and conditions. The global optimal value of the multi-train cooperative energy-saving control model was solved by the Dragonfly Algorithm. 
The DRLDA algorithm was designed to handle multi-train cooperative energy-saving control, and a proactive constraint handling system was added to DRL to adjust learning directions and avoid learning oscillations. The simulation results demonstrated that the MAPE of ARMA-RBFNN was 2.23%, the smallest among the compared algorithms. Moreover, for a single train, the overall energy-saving efficiency was 4.51% and the average passenger waiting time was extended by 0.8 s. For multiple trains, DRLDA showed advantages over DDPG-RS and DQN-RS in accelerating learning convergence and obtaining smooth results while satisfying multidimensional constraints. At the same time, the total traction energy consumption and passenger waiting time were reduced by 3.1% and 5 s, respectively, achieving the energy-saving effect.
In the future, our research will mainly focus on the following aspects: (1) In practice, the passenger waiting time varies, so it can be modeled as an interval range to better reflect the real data. (2) The multi-train collaborative control model in the case of interchange should be studied in depth. (3) We hope to obtain more concrete operational data to study transfer behavior in a larger-scale rail transit network.

Author Contributions

Conceptualization, L.Q. and L.D.; methodology, L.Q. and L.D.; formal analysis, L.Q. and L.D.; investigation, X.X. and L.Z.; validation, L.Q. and L.D.; visualization, L.Q., X.Q. and L.D.; writing – original draft preparation, L.D.; writing – review and editing, L.D.; supervision, L.Z.; project administration, X.X. and L.Z.; funding acquisition, X.X. and L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 61741303), the National Natural Science Foundation of China (Grant No. 62262011) and the Natural Science Foundation of Guangxi (Grant No. 2021JJA170130).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the findings of this study are available from the first author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. China Association of Metros. Statistics and Analysis of Urban Rail Transit in 2017. 2018. Available online: https://doi.org/10.14052/j.cnki.china.metros.2018.04.002 (accessed on 15 April 2018).
  2. Hou, X.F.; Mei, J.P.; Zuo, C. Overview of urban rail transit routes in China in 2021. Mod. Urban Rail Transp. 2022, 35, 12–16. [Google Scholar] [CrossRef]
  3. Hou, X.F.; Mei, J.P.; Zuo, C. Overview of urban rail transit routes in China in 2020. Mod. Urban Rail Transp. 2021, 34, 12–17. [Google Scholar] [CrossRef]
  4. Howlett, P.G.; Pudney, P.J.; Vu, X. Local energy minimization in optimal train control. Automatica 2009, 45, 2692–2698. [Google Scholar] [CrossRef]
  5. Bocharnikov, Y.V.; Tobias, A.M.; Roberts, C. Reduction of train and net energy consumption using genetic algorithms for trajectory optimisation. In Proceedings of the IET Conference on Railway Traction Systems (RTS 2010), Birmingham, UK, 13–15 April 2010; pp. 1–5. [Google Scholar] [CrossRef]
  6. Albrecht, A.; Howlett, P.; Pudney, P.; Vu, X.; Zhou, P. The key principles of optimal train control-Part 1: Formulation of the model, strategies of optimal type, evolutionary lines, location of optimal switching points. Transp. Res. Part B Methodol. 2016, 94, 482–508. [Google Scholar] [CrossRef]
  7. Albrecht, A.; Howlett, P.; Pudney, P.; Vu, X.; Zhou, P. The key principles of optimal train control-Part 2: Existence of an optimal strategy, the local energy minimization principle, uniqueness, computational techniques. Transp. Res. Part B Methodol. 2016, 94, 509–538. [Google Scholar] [CrossRef]
  8. Miao, C.Y.; Wu, S.L.; Zhou, Z.; Zhang, W. Energy Saving Operation Optimization Model of Single-train Based on Time Discretization. J. Logist. Eng. Univ. 2016, 32, 92–96. [Google Scholar] [CrossRef]
  9. Li, T.; Gui, H.D.; Sun, F.; Xing, Z.Y. Study on timing energy saving of single train based on Paretomulti-objective genetic algorithm. J. Guangxi Univ. 2017, 42, 1715–1722. [Google Scholar] [CrossRef]
  10. Li, G.; Lin, J.H.; Zhuang, Z.; He, L. A Research on Energy-Saving Train Control of Urban Mass Transit based on Regenerative Brake. Railw. Transp. Econ. 2019, 41, 121–126. [Google Scholar] [CrossRef]
  11. Scheepmaker, G.M.; Goverde, R.M.; Kroon, L.G. Review of energy-efficient train control and timetabling. Eur. J. Oper. Res. 2017, 257, 355–376. [Google Scholar] [CrossRef] [Green Version]
  12. Yin, J.; Chen, D.; Li, Y. Smart train operation algorithms based on expert knowledge and ensemble CART for the electric locomotive. Knowl.-Based Syst. 2016, 92, 78–91. [Google Scholar] [CrossRef]
  13. Fernández-Rodríguez, A.; Fernández-Cardador, A.; Cucala, A.P. Balancing energy consumption and risk of delay in high speed trains: A three-objective real-time eco-driving algorithm with fuzzy parameters. Transp. Res. Part C Emerg. Technol. 2018, 95, 652–678. [Google Scholar] [CrossRef]
  14. Bao, M.; Zhang, H.; Wu, H.; Zhang, C.; Wang, Z.; Zhang, X. Multiobjective Optimal Dispatching of Smart Grid Based on PSO and SVM. Mob. Inf. Syst. 2022, 2022, 2051773. [Google Scholar] [CrossRef]
  15. Dou, H.; Wang, Y.; Zhou, J.; Liu, Y. Coordinated Control of New Energy Environment and Mixed Vehicle Flow Speed Based on Sensor Network. Mob. Inf. Syst. 2022, 2022, 6161154. [Google Scholar] [CrossRef]
  16. Yang, X.; Li, X.; Gao, Z.; Wang, H.; Tang, T. A cooperative scheduling model for timetable optimization in subway systems. IEEE Trans. Intell. Transp. Syst. 2012, 14, 438–447. [Google Scholar] [CrossRef]
  17. Ye, H.; Liu, R. A multiphase optimal control method for multi-train control and scheduling on railway lines. Transp. Res. Part B Methodol. 2016, 93, 377–393. [Google Scholar] [CrossRef]
  18. Huang, W.; Shuai, B. Approach and application on high-speed train stop plan for better passenger transfer efficiency: The China case. Int. J. Rail Transp. 2019, 7, 55–78. [Google Scholar] [CrossRef]
  19. Liu, J.; Guo, H.; Yu, Y. Research on the cooperative train control strategy to reduce energy consumption. IEEE Trans. Intell. Transp. Syst. 2016, 18, 1134–1142. [Google Scholar] [CrossRef]
  20. Long, G.Q.; Yin, X.Z. Energy saving control strategy of urban rail transit train. Railw. Comput. Appl. 2018, 27, 90–94. [Google Scholar] [CrossRef]
  21. Liu, H.; Zhou, M.; Guo, X.; Zhang, Z.; Ning, B.; Tang, T. Timetable optimization for regenerative energy utilization in subway systems. IEEE Trans. Intell. Transp. Syst. 2018, 20, 3247–3257. [Google Scholar] [CrossRef]
  22. Feng, Y.; Chen, S.K.; Ran, X.C.; Bo, Y.; Jia, W.Z. Energy Saving Operation Optimization of Urban Rail Transit Trains Through the Use of Regenerative Braking Energy. J. China Railw. Soc. 2018, 40, 15–22. [Google Scholar] [CrossRef]
  23. Cao, J.F. A Study on the Energy-efficient Manipulation Strategy of Urban Rail Transit Train. Railw. Transp. Econ. 2019, 41, 108–113 + 118. [Google Scholar] [CrossRef]
  24. Pineda-Jaramillo, J.; Martínez-Fernández, P.; Villalba-Sanchis, I.; Salvador-Zuriaga, P.; Insa-Franco, R. Predicting the traction power of metropolitan railway lines using different machine learning models. Int. J. Rail Transp. 2021, 9, 461–478. [Google Scholar] [CrossRef]
  25. Deng, L.; Zhong, M.; Xu, J.; Xu, G. Train Operation Curve Optimization for an Urban Rail Interval with Multi-Parameter Adjustment. Trans. Res. Record J. Transp. Res. Board 2022, 2676, 811–826. [Google Scholar] [CrossRef]
  26. Liu, H.; Yang, L.; Yang, H. Cooperative Optimal Control of the Following Operation of High-Speed Trains. IEEE Trans. Intell. Transp. Syst. 2022, 23, 17744–17755. [Google Scholar] [CrossRef]
  27. Oh, K.; Yoo, M.; Jin, N.; Ko, J.; Seo, J.; Joo, H.; Ko, M. A Review of Deep Learning Applications for Railway Safety. Appl. Sci. 2022, 12, 10572. [Google Scholar] [CrossRef]
  28. Zhang, Y.; Zuo, T.; Zhu, M.; Huang, C.; Li, J.; Xu, Z. Research on multi-train energy saving optimization based on cooperative multi-objective particle swarm optimization algorithm. Int. J. Energy Res. 2021, 45, 2644–2667. [Google Scholar] [CrossRef]
  29. Jieli, L.; Tao, H. Research on Energy-saving Collaborative Optimization Method for Multiple Trains Considering Renewable Energy Utilization. In Proceedings of the 2020 5th International Conference on Communication, Image and Signal Processing (CCISP), Chengdu, China, 13–15 November 2020; pp. 58–63. [Google Scholar] [CrossRef]
  30. Wang, X.; Tang, T.; Su, S.; Yin, J.; Gao, Z.; Lv, N. An integrated energy-efficient train operation approach based on the space-time-speed network methodology. Transp. Res. Part E Logist. Transp. Rev. 2021, 150, 102323. [Google Scholar] [CrossRef]
  31. Gu, Q.; Tang, T.; Ma, F. Energy-efficient train tracking operation based on multiple optimization models. IEEE Trans. Intell. Transp. Syst. 2015, 17, 882–892. [Google Scholar] [CrossRef]
  32. Su, S.; Tang, T.; Xun, J.; Cao, F.; Wang, Y. Design of running grades for energy-efficient train regulation: A case study for beijing yizhuang line. IEEE Intell. Transp. Syst. Mag. 2019, 13, 189–200. [Google Scholar] [CrossRef]
  33. He, D.; Zhang, L.; Guo, S.; Chen, Y.; Shan, S.; Jian, H. Energy-efficient train trajectory optimization based on improved differential evolution algorithm and multi-particle model. J. Clean. Prod. 2021, 304, 127163. [Google Scholar] [CrossRef]
  34. Huang, Y.; Yang, L.; Tang, T.; Gao, Z.; Cao, F. Joint train scheduling optimization with service quality and energy efficiency in urban rail transit networks. Energy 2017, 138, 1124–1147. [Google Scholar] [CrossRef]
  35. Sun, L.; Jin, J.G.; Lee, D.H.; Axhausen, K.W.; Erath, A. Demand-driven timetable design for metro services. Transp. Res. Part C Emerg. Technol. 2014, 46, 284–299. [Google Scholar] [CrossRef]
  36. Li, T.; Shi, Y. Application of MMC-RPC in High-Speed Railway Traction Power Supply System Based on Energy Storage. Appl. Sci. 2022, 12, 10009. [Google Scholar] [CrossRef]
  37. Gao, Y.; Dong, X.; Han, F.; Li, Z. An Optimization Model for a Desert Railway Route Scheme Based on Interval Number and TOPSIS. Appl. Sci. 2022, 12, 10728. [Google Scholar] [CrossRef]
  38. Niu, H.M.; Zhou, X.S.; Gao, R.H. Train scheduling for minimizing passenger waiting time with time-dependent demand and skip-stop patterns: Nonlinear integer programming models with linear constraints. Transp. Res. Part B Methodol. 2015, 76, 117–135. [Google Scholar] [CrossRef]
  39. Canca, D.; Zarzo, A. Design of energy-efficient timetables in two-way railway rapid transit lines. Transp. Res. Part B Methodol. 2017, 102, 142–161. [Google Scholar] [CrossRef]
  40. Yang, X.; Chen, A.; Ning, B.; Tang, T. Bi-objective programming approach for solving the metro timetable optimization problem with dwell time uncertainty. Transp. Res. Part E Logist. Transp. Rev. 2017, 97, 22–37. [Google Scholar] [CrossRef]
  41. Su, S.; Wang, X.; Cao, Y.; Yin, J. An energy-efficient train operation approach by integrating the metro timetabling and eco-driving. IEEE Trans. Intell. Transp. Syst. 2019, 21, 4252–4268. [Google Scholar] [CrossRef]
  42. Gao, Z.; Yang, L. Energy-saving operation approaches for urban rail transit systems. Front. Eng. Manag. 2019, 6, 139–151. [Google Scholar] [CrossRef]
  43. Li, W.; Peng, Q.; Wen, C.; Xu, X. Comprehensive optimization of a metro timetable considering passenger waiting time and energy efficiency. IEEE Access 2019, 7, 160144–160167. [Google Scholar] [CrossRef]
  44. Bai, Y.; Cao, Y.; Yu, Z.; Ho, T.K.; Roberts, C.; Mao, B. Cooperative control of metro trains to minimize net energy consumption. IEEE Trans. Intell. Transp. Syst. 2019, 21, 2063–2077. [Google Scholar] [CrossRef]
  45. Yin, J.; Su, S.; Xun, J.; Tang, T.; Liu, R. Data-driven approaches for modeling train control models: Comparison and case studies. ISA Trans. 2020, 98, 349–363. [Google Scholar] [CrossRef] [PubMed]
  46. Zhang, M.; Zhang, Q.; Liu, W.T.; Zhou, B.Y. A policy-based reinforcement learning algorithm for intelligent train control. J. China Railw. Soc. 2020, 42, 69–75. [Google Scholar] [CrossRef]
  47. Liu, Y.; Zhou, Y.; Su, S.; Xun, J.; Tang, T. An analytical optimal control approach for virtually coupled high-speed trains with local and string stability. Transp. Res. Part C Emerg. Technol. 2021, 125, 102886. [Google Scholar] [CrossRef]
  48. Cao, Y.; Wang, Z.C.; Liu, F.; Li, P.; Xie, G. Bio-inspired speed curve optimization and sliding mode tracking control for subway trains. IEEE Trans. Veh. Technol. 2019, 68, 6331–6342. [Google Scholar] [CrossRef]
  49. Cao, Y.; Wen, J.; Ma, L. Tracking and collision avoidance of virtual coupling train control system. Future Gener. Comput. Syst. 2021, 120, 76–90. [Google Scholar] [CrossRef]
  50. Liu, W.; Liu, D. Dynamic Adjustment Strategy of Rail Guide Vehicle. Mob. Inf. Syst. 2021, 2021, 1433552. [Google Scholar] [CrossRef]
  51. Zhu, C.Q.; Du, G.F.; Ding, Y.W.; Huang, W.G.; Wang, J.; Fan, M.D.; Zhu, Z.K. Rail potential control with train diagram optimization in multitrain DC traction power system. Int. J. Rail Transp. 2022, 10, 476–496. [Google Scholar] [CrossRef]
  52. Zhang, F.; Zhao, Y.; Ye, J.; Wang, S.; Hu, J. Novel Distance Measures on Hesitant Fuzzy Sets Based on Equal-Probability Transformation and Their Application in Decision Making on Intersection Traffic Control. Comput. Model. Eng. Sci. 2023, 135, 1589–1602. [Google Scholar] [CrossRef]
  53. Ning, Z.; Niu, H.X.; Zhang, Z.X. Automatic operation regulation optimization model of metro train based on queuing theory. J. Railw. Sci. Eng. 2019, 16, 1826–1832. [Google Scholar] [CrossRef]
  54. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
  55. Yin, J.; Ning, C.; Tang, T. Data-driven models for train control dynamics in high-speed railways: LAG-LSTM for train trajectory prediction. Inf. Sci. 2022, 600, 377–400.
  56. Wen, R.; Torkkola, K.; Narayanaswamy, B.; Madeka, D. A multi-horizon quantile recurrent forecaster. arXiv 2017, arXiv:1711.11053.
  57. Shang, M.; Zhou, Y.; Fujita, H. Deep reinforcement learning with reference system to handle constraints for energy-efficient train control. Inf. Sci. 2021, 570, 708–721.
  58. Kou, P.; Liang, D.; Wang, C.; Wu, Z.; Gao, L. Safe deep reinforcement learning-based constrained optimal control scheme for active distribution networks. Appl. Energy 2020, 264, 114772.
  59. Wang, X.; Gu, Y.; Cheng, Y.; Liu, A.; Chen, C.P. Approximate policy-based accelerated deep reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 1820–1830.
  60. Zhao, H.; Zhao, J.; Qiu, J.; Liang, G.; Dong, Z.Y. Cooperative wind farm control with deep reinforcement learning and knowledge-assisted learning. IEEE Trans. Ind. Inform. 2020, 16, 6912–6921.
Figure 1. Schematic diagram of train operating conditions.
Figure 2. The flowchart of GA.
Figure 3. The DRLDA algorithm learning framework.
Figure 4. Illustration of Beijing Metro Line 4, including the Xinjiekou, Pinganli, Xisi and Lingjinghutong stations (source: Baidu Map).
Figure 5. Distribution of passenger flow changes on different days: (a) passenger flow entering stations on different days; (b) passenger flow leaving stations on different days.
Figure 6. Simulation modeling of passenger activity.
Figure 7. Train operation simulation modeling.
Figure 8. Passenger flow data in different states: (a) passenger flow data embarking on the train; (b) passenger flow data disembarking from the train.
Figure 9. Visualization for the stability test of passenger flow entering the station.
Figure 10. Visualization for the stability test of passenger flow leaving the station.
Figure 11. RBFNN algorithm parameter selection and performance results: (a) the prediction results of four different stations; (b) ARMA-RBFNN performance results; (c) ARMA-RBFNN training state results; (d) ARMA-RBFNN regression results.
Figure 12. MAPE comparative analysis results.
Figure 13. Influence of selection operator on GA.
Figure 14. Influence of crossover operator on GA.
Figure 15. Influence of mutation operator on GA.
Figure 16. The global optimal solution results.
Figure 17. The speed and operating condition conversion points among the four stations.
Figure 18. Passenger activity interface.
Figure 19. Train parking interface.
Figure 20. Training process of three different algorithms.
Figure 21. Solving method with the DRLDA algorithm.
Figure 22. Multi-train operating time optimization diagram.
Figure 23. Passengers wait for the train.
Figure 24. Passengers board the train.
Table 1. Notation explanation.
Notation | Description
$i$ | The station and station spacing number
$j$ | The train number
$Q_{up}^{i}$ | The number of passengers getting on at the ith station
$Q_{off}^{i}$ | The number of passengers getting off at the ith station
$Q^{i}$ | The number of passengers on the train after departure from the ith station
$T_{j}^{i}$ | The scheduled operation time of the jth train to the next station
$\Delta t_{i}$ | The optimization time at the ith station
$m$ | The total weight of the train
$m_{p}$ | The weight of the train
$m_{Q^{i}}$ | The weight of the passengers on the train
$S_{i}$ | The distance between the ith and (i + 1)th stations
$s_{i}^{n}$ | The position of the nth operating condition conversion point at the ith station, $n \in N$
$t_{i}^{n}$ | The conversion time of the nth operating condition at the ith station spacing, $n \in N$
$v_{s_{i}^{n}}$ | The speed at the position of the nth operating condition conversion point at the ith station
$a_{s_{i}^{n}}$ | The acceleration at the position of the nth operating condition conversion point at the ith station
$t_{B}$ | The time between the train braking with maximum braking force and stopping
$F_{s_{i}^{n}}$ | The force of the nth operating condition at the ith station spacing
$E_{s_{i}^{n}}$ | The energy consumption of the nth operating condition at the ith station spacing
$d_{j}^{i}$ | The stopping time of the jth train at the ith station
$u_{j}^{i}$ | The arrival time of the jth train at the ith station
$r_{j}^{i}$ | The departure time of the jth train at the ith station
$\Delta T_{j}^{i}$ | The maximum passenger waiting time at the ith platform
$t_{qk}^{i}$ | The time of arrival of the kth passenger at the ith platform, $k = 1, 2, \ldots, Q_{up}^{i}$
$\Delta t_{qk}^{i}$ | The waiting time of the kth passenger at the ith platform
$F_{traction}$ | The traction force
$W$ | The resistance
$B$ | The braking force
Table 2. The signal-to-noise ratio of passengers embarking on the train.
Wavelet Function | Haar | db2 | db3 | db4 | db5 | db6 | Sym2 | Sym3 | Sym4 | Sym5 | Sym6 | Coif1 | Coif2 | Coif3 | Coif4 | Coif5
SNR | 25.55 | 34.93 | 34.95 | 35.07 | 34.19 | 34.25 | 28.99 | 27.99 | 28.62 | 27.88 | 29.02 | 29.76 | 30.09 | 29.64 | 30.04 | 29.55
Table 3. The signal-to-noise ratio of passengers disembarking from the train.
Wavelet Function | Haar | db2 | db3 | db4 | db5 | db6 | Sym2 | Sym3 | Sym4 | Sym5 | Sym6 | Coif1 | Coif2 | Coif3 | Coif4 | Coif5
SNR | 27.63 | 35.82 | 33.57 | 34.59 | 36.28 | 35.35 | 30.05 | 28.26 | 26.49 | 29.18 | 28.13 | 28.39 | 30.21 | 28.27 | 31.35 | 29.68
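If the wavelet basis is chosen by the highest signal-to-noise ratio, which is an assumption here since this section does not state the selection rule, the choice can be read off Tables 2 and 3 with a short sketch. The function name `best_wavelet` is hypothetical; the numbers are copied from the two tables.

```python
# SNR values copied from Tables 2 and 3; the selection rule (maximum SNR)
# is an assumption, not something stated in this section.
WAVELETS = ["Haar", "db2", "db3", "db4", "db5", "db6",
            "Sym2", "Sym3", "Sym4", "Sym5", "Sym6",
            "Coif1", "Coif2", "Coif3", "Coif4", "Coif5"]
SNR_EMBARKING = [25.55, 34.93, 34.95, 35.07, 34.19, 34.25,
                 28.99, 27.99, 28.62, 27.88, 29.02,
                 29.76, 30.09, 29.64, 30.04, 29.55]
SNR_DISEMBARKING = [27.63, 35.82, 33.57, 34.59, 36.28, 35.35,
                    30.05, 28.26, 26.49, 29.18, 28.13,
                    28.39, 30.21, 28.27, 31.35, 29.68]


def best_wavelet(names, snr_values):
    """Return the (name, snr) pair with the largest SNR."""
    if len(names) != len(snr_values):
        raise ValueError("names and snr_values must have the same length")
    return max(zip(names, snr_values), key=lambda pair: pair[1])


best_embarking = best_wavelet(WAVELETS, SNR_EMBARKING)
best_disembarking = best_wavelet(WAVELETS, SNR_DISEMBARKING)
print(best_embarking, best_disembarking)
```

Under that rule the two data sets favour different Daubechies wavelets (db4 for embarking, db5 for disembarking), with db2 to db6 all well ahead of the Haar, Symlet and Coiflet families.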
Table 4. Comparison of MAPE prediction errors.
Model | ARMA-RBFNN | LAG-LSTM | MQ-RNN
<10% | 93.78% | 73.06% | 69.07%
10–15% | 6.22% | 20.16% | 18.65%
15–20% | -- | 7.25% | 5.18%
>20% | -- | 0.52% | 3.11%
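Table 4 reports, for each model, the share of predictions whose percentage error falls into each band. A minimal sketch of how such a banding could be computed is given below. The function name `ape_band_shares` and the sample numbers are hypothetical and are not taken from the paper; values that sit exactly on a band edge are assigned to the higher band, which is also an assumption.

```python
from bisect import bisect_right

# Band edges in percent, matching the rows of Table 4.
BAND_EDGES = [10.0, 15.0, 20.0]
BAND_LABELS = ["<10%", "10-15%", "15-20%", ">20%"]


def ape_band_shares(actual, predicted):
    """Share (in percent) of samples whose absolute percentage error
    falls into each band. Assumes no actual value is zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    counts = [0] * len(BAND_LABELS)
    for a, p in zip(actual, predicted):
        ape = abs(a - p) / abs(a) * 100.0  # absolute percentage error
        counts[bisect_right(BAND_EDGES, ape)] += 1
    return {label: 100.0 * c / len(actual)
            for label, c in zip(BAND_LABELS, counts)}


# Hypothetical passenger counts, for illustration only.
actual = [100.0, 200.0, 400.0, 500.0]
predicted = [95.0, 224.0, 330.0, 350.0]
shares = ape_band_shares(actual, predicted)
print(shares)
```

With these illustrative inputs the four errors are roughly 5%, 12%, 17.5% and 30%, so each band receives one quarter of the samples.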
Table 5. Distance between stations.
Origination/Destination Station | Distance [m]
Xinjiekou/Pinganli | 1100
Pinganli/Xisi | 1100
Xisi/Lingjinghutong | 869
Table 6. Comparison of optimization results.
Index | Xinjiekou/Pinganli Station | Pinganli/Xisi Station | Xisi/Lingjinghutong Station
Actual average passenger waiting time [s] | 105.00 | 102.00 | 93.00
Calculated average passenger waiting time [s] | 103.00 | 101.80 | 96.00
Actual energy consumption [kW·h] | 14.67 | 14.28 | 13.64
Calculated energy consumption [kW·h] | 13.85 | 13.94 | 12.88
Energy efficiency [%] | 5.58 | 2.38 | 5.57
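The "Energy efficiency" row of Table 6 is numerically consistent with the relative reduction (actual − calculated) / actual × 100. That reading is inferred from the tabulated values rather than stated in this section, so the sketch below should be taken as a plausibility check; the function name `energy_saving_percent` is hypothetical.

```python
def energy_saving_percent(actual_kwh, optimized_kwh):
    """Relative energy reduction in percent (assumed definition)."""
    if actual_kwh <= 0:
        raise ValueError("actual energy must be positive")
    return (actual_kwh - optimized_kwh) / actual_kwh * 100.0


# (actual, calculated) energy consumption in kW·h, copied from Table 6.
SECTIONS = {
    "Xinjiekou/Pinganli": (14.67, 13.85),
    "Pinganli/Xisi": (14.28, 13.94),
    "Xisi/Lingjinghutong": (13.64, 12.88),
}

savings = {name: energy_saving_percent(actual, calculated)
           for name, (actual, calculated) in SECTIONS.items()}
for name, value in savings.items():
    print(f"{name}: {value:.2f}%")
```

The recomputed values agree with the table to within rounding; the first section comes out at about 5.59% against the printed 5.58%, which suggests the table truncates rather than rounds.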
Table 7. Training parameters.
Parameter | DRLDA | DDPG-RS | DQN-RS
State transition step | 100 | 100 | 100
The batch size N of transfer samples from replay memory | 32 | 32 | 32
Single training step | 20 | 20 | 20
Actor network learning rate | 0.0001 | 0.001 | --
Critic network learning rate | 0.0001 | 0.001 | 0.001
Soft update parameter | 0.0001 | 0.001 | 0.001
Sample data | 125 | 125 | 125
Replay memory capacity | 12,000 | 10,000 | 10,000
Discount factor | 0.99 | 0.8 | 0.9
Line length [m] | 3069 | 3069 | 3069
Other parameters | -- | See Ref. [57] | See Ref. [57]
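The soft update parameter in Table 7 is the blending weight used when a target network tracks its online network, as in the DDPG method of Ref. [54]. The sketch below shows that standard rule on plain lists of floats. Whether DRLDA applies it in exactly this form is an assumption, and the names `soft_update` and `TAU_DRLDA` are hypothetical.

```python
def soft_update(target_params, online_params, tau):
    """Standard soft target update: target <- tau * online + (1 - tau) * target."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    if len(target_params) != len(online_params):
        raise ValueError("parameter lists must have the same length")
    return [tau * w_online + (1.0 - tau) * w_target
            for w_target, w_online in zip(target_params, online_params)]


TAU_DRLDA = 0.0001  # soft update parameter listed for DRLDA in Table 7

# Toy parameter vectors, for illustration only.
target = [0.0, 1.0]
online = [1.0, 0.0]
updated = soft_update(target, online, TAU_DRLDA)
print(updated)
```

A smaller value moves the target network more slowly per step, so the DRLDA setting of 0.0001 changes the target ten times more gradually than the 0.001 listed for DDPG-RS and DQN-RS.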
Table 8. Comparison of the training session results of the proposed method with other algorithms.
Algorithm | Number of Training Sessions | Number of States | Average Reward Value | Single Training Time [s]
DRL-DA | 10,000 | 751,442 | −104 | 0.0008
DDPG-RS | 10,000 | 735,117 | −141 | 0.02
DQN-RS | 10,000 | 680,985 | −181 | 0.06
DDPG | 10,000 | 640,088 | −282 | 0.83
DQN | 10,000 | 601,128 | −369 | 2.04
DRL-DA | 20,000 | 837,022 | −119 | 0.001
DDPG-RS | 20,000 | 802,441 | −150 | 0.21
DQN-RS | 20,000 | 750,088 | −193 | 0.35
DDPG | 20,000 | 721,672 | −294 | 1.19
DQN | 20,000 | 682,309 | −382 | 2.20
DRL-DA | 30,000 | 915,147 | −132 | 0.032
DDPG-RS | 30,000 | 870,036 | −158 | 0.33
DQN-RS | 30,000 | 834,011 | −205 | 0.68
DDPG | 30,000 | 802,113 | −303 | 1.54
DQN | 30,000 | 779,554 | −394 | 2.31
DRL-DA | 40,000 | 1,012,271 | −141 | 0.074
DDPG-RS | 40,000 | 953,046 | −167 | 0.56
DQN-RS | 40,000 | 903,117 | −218 | 1.04
DDPG | 40,000 | 880,121 | −312 | 1.81
DQN | 40,000 | 840,455 | −408 | 2.57
Table 9. Comparison of operating times.
Train Number | Planned: Xinjiekou | Planned: Pinganli | Planned: Xisi | Planned: Lingjinghutong | Optimized: Xinjiekou | Optimized: Pinganli | Optimized: Xisi | Optimized: Lingjinghutong
1 | 7:02 | 7:04 | 7:07 | 7:09 | 7:02:02 | 7:04:40 | 7:07:03 | 7:09:39
2 | 7:05 | 7:07 | 7:10 | 7:12 | 7:05:42 | 7:07:42 | 7:10:32 | 7:12:39
3 | 7:08 | 7:10 | 7:13 | 7:15 | 7:08:12 | 7:10:12 | 7:13:41 | 7:15:39
4 | 7:12 | 7:14 | 7:17 | 7:19 | 7:12:42 | 7:14:42 | 7:17:11 | 7:19:09
5 | 7:16 | 7:18 | 7:21 | 7:23 | 7:16:23 | 7:18:40 | 7:21:03 | 7:23:39
6 | 7:20 | 7:22 | 7:25 | 7:27 | 7:20:02 | 7:22:15 | 7:25:03 | 7:27:39
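The planned times in Table 9 are given to the minute and the optimized times to the second, so the shift introduced by the optimization is easiest to see in seconds. The sketch below converts both formats and computes the per-station offsets for train 1 and the optimized headway between trains 1 and 2 at Xinjiekou. The helper `to_seconds` is hypothetical; the clock values are copied from the table.

```python
def to_seconds(clock):
    """Convert 'H:MM' or 'H:MM:SS' to seconds after midnight."""
    parts = [int(p) for p in clock.split(":")]
    if len(parts) not in (2, 3):
        raise ValueError(f"unsupported clock format: {clock!r}")
    while len(parts) < 3:
        parts.append(0)  # missing seconds are taken as zero
    hours, minutes, seconds = parts
    return hours * 3600 + minutes * 60 + seconds


# Train 1, stations Xinjiekou, Pinganli, Xisi, Lingjinghutong (Table 9).
planned_train1 = ["7:02", "7:04", "7:07", "7:09"]
optimized_train1 = ["7:02:02", "7:04:40", "7:07:03", "7:09:39"]

# Offset of the optimized time from the planned time at each station.
offsets_train1 = [to_seconds(o) - to_seconds(p)
                  for p, o in zip(planned_train1, optimized_train1)]

# Optimized headway between trains 1 and 2 at Xinjiekou.
headway_1_2 = to_seconds("7:05:42") - to_seconds("7:02:02")
print(offsets_train1, headway_1_2)
```

For train 1 every optimized time falls within 40 s after the planned minute, and the optimized headway to train 2 at Xinjiekou is 220 s against the planned 180 s.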
Table 10. Comparison results.
Train Number | Actual Energy Consumption [kW·h] | Actual Average Waiting Time [s] | Energy Consumption by DRLDA [kW·h] | Average Waiting Time by DRLDA [s] | Time Reduction [s] | Energy Efficiency [%]
1 | 81.50 | 235.00 | 76.40 | 232.00 | 3.00 | 6.30
2 | 83.60 | 231.00 | 78.20 | 230.00 | 1.00 | 6.50
3 | 82.70 | 238.00 | 84.50 | 243.00 | −5.00 | −2.20
4 | 89.60 | 307.00 | 85.30 | 301.00 | 6.00 | 4.80
5 | 88.50 | 311.00 | 90.10 | 315.00 | −4.00 | −1.80
6 | 92.50 | 304.00 | 87.70 | 300.00 | 4.00 | 5.20
Total | 518.40 | 1626.00 | 502.20 | 1621.00 | 5.00 | 3.10
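The "Total" row of Table 10 can be reproduced from the per-train rows, which also makes the trade-off visible: two trains end up worse off so that the fleet as a whole improves. The sketch below is a cross-check only; the function name `summarize` is hypothetical, and the energy figure uses the relative-reduction reading of "Energy Efficiency", which is an assumption inferred from the numbers.

```python
# (train, actual energy [kW·h], actual wait [s], DRLDA energy [kW·h], DRLDA wait [s]),
# copied from Table 10.
ROWS = [
    (1, 81.50, 235.0, 76.40, 232.0),
    (2, 83.60, 231.0, 78.20, 230.0),
    (3, 82.70, 238.0, 84.50, 243.0),
    (4, 89.60, 307.0, 85.30, 301.0),
    (5, 88.50, 311.0, 90.10, 315.0),
    (6, 92.50, 304.0, 87.70, 300.0),
]


def summarize(rows):
    """Aggregate fleet-level energy and waiting-time changes."""
    actual_energy = sum(r[1] for r in rows)
    drlda_energy = sum(r[3] for r in rows)
    actual_wait = sum(r[2] for r in rows)
    drlda_wait = sum(r[4] for r in rows)
    return {
        "actual_energy_kwh": actual_energy,
        "drlda_energy_kwh": drlda_energy,
        "energy_saving_percent": (actual_energy - drlda_energy) / actual_energy * 100.0,
        "wait_reduction_s": actual_wait - drlda_wait,
        "trains_using_more_energy": [r[0] for r in rows if r[3] > r[1]],
    }


summary = summarize(ROWS)
print(summary)
```

The totals match the table (518.40 and 502.20 kW·h, a 5 s reduction in summed waiting time, and a saving of roughly 3.1%), while trains 3 and 5 consume more energy and have longer average waiting times after optimization than before.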
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Dong, L.; Qin, L.; Xie, X.; Zhang, L.; Qin, X. Collaborative Optimization Method for Multi-Train Energy-Saving Control with Urban Rail Transit Based on DRLDA Algorithm. Appl. Sci. 2023, 13, 2454. https://doi.org/10.3390/app13042454

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.