Analysis of Bus Line Operation Reliability Based on Copula Function

Abstract: To promote urban sustainability, more and more cities are paying attention to improving public transportation. City managers aim to shift travelers from private cars to public transport by raising its service level. The operational reliability of bus lines plays a crucial role in maintaining a high level of public transportation service. Previous studies have focused on the service level of the whole line by investigating the overall stability of departure punctuality, line running time, and punctuality. This study aims to clarify how the whole line and its inter-station paths relate in terms of their influence on the operational reliability of bus lines. It proposes a bus route reliability evaluation method based on the copula function and applies it to actual line data as a case study. The results show that the method reveals the relationship between the overall line and the inter-station paths in terms of operational reliability, and that it identifies the inter-station path that is most critical to the whole line. The method can provide a good reference for the diagnosis and evaluation of line operation and for the optimization of bus routes. The results can support the sustainable development of public transportation.


Introduction
It has been realized that developing public transport is a promising and sustainable way to alleviate traffic congestion and improve the urban environment [1]. Correspondingly, many efforts have been made to develop public transportation [2]. Among these efforts, improving the service quality of public transport takes a dominant position. The operational reliability of bus lines is an important factor that the public considers when choosing how to travel. It is also one of the most important indicators used by industry departments to improve service levels and by enterprises to improve bus operation efficiency.
Reliability is a main determinant of the sustainable development of public transportation. The reliability of bus services is an important reflection of the quality of bus service, and is also a key factor that residents consider when choosing transportation modes [3,4]. Improving the reliability of public transportation helps improve its service quality, enhances the satisfaction of public transportation travelers, and promotes the sustainable development of public transportation.
The concept of public transport service reliability was first proposed by Sterman in 1976 [5]. Later, bus service reliability was defined as the ability to provide stable service within a period of time [6][7][8][9]. A survey report of bus service quality for Seoul Metropolitan City in 2003 provided a more comprehensive definition of bus service reliability, which included the accurate operation of public transportation systems, the probability of vehicles arriving at a station on time, the delay rate, and so on [10]. Some other reports noted that reliability of the public transportation system is the ability to complete specified tasks under certain operating conditions and times [11][12][13][14][15]. Additionally, time reliability of a public transportation system is divided into vehicle running time reliability and passenger waiting time reliability [16][17][18].
In terms of influencing factors, the Transit Capacity and Quality of Service Manual (TCQSM) system lists factors that affect reliability, which include traffic conditions, road construction and maintenance, vehicle and maintenance quality, availability of vehicles and drivers, bus priority, completion of the whole schedule, passenger demand balance, driver technology, use of wheelchair elevators and slopes, line length, and number of stations [19]. Abkowitz and Engelstein [20] pointed out that factors that may affect the operational reliability of bus lines include line characteristics (line length, number of signalized intersections, degree of parking on the road, distance between stations, etc.), operation conditions (traffic volume, service frequency, passenger activities), drivers (departure delay, driving behavior) [21,22], etc. Due to the influence and restriction of many factors on the reliability of public transport operations, research on the reliability of public transport has been a difficult problem for operators and researchers all over the world.
In terms of evaluation indexes, the TCQSM from the United States sets up evaluation indexes related to reliability at both the station and line levels when evaluating the quality of public transport services [19]. Because the reliability of adjacent stations is almost the same, line-level indexes such as punctuality have received a great deal of attention. In subsequent research, a number of indexes have been taken into account [23], such as the time coefficient of variation, running time ratio [24,25], time reliability [26], headway [27][28][29], and other indicators. Moreover, Chen et al. put forward three performance parameters to evaluate the reliability of public transport services, namely, the on-time rate index (PIR), departure index (DIS), and uniformity index (EIS) [30]. Bruinsma et al. studied passenger departure times and section operation times [31].
In accordance with the above indexes, many reliability evaluation methods have been put forward, see [32][33][34][35][36][37][38]. Statistical methods play an important role in these methods. It is concluded that the bus travel time for one line obeys a normal distribution, lognormal distribution, gamma distribution, Weibull distribution, etc. Therefore, Zhao et al. proposed the system engineering theory [39] and Song studied bus lines as a series system and then solved the problem using Monte Carlo simulations [40]. Sharov and Boisjoly et al. studied the multi-level evaluation model and algorithm of operation reliability of conventional public transportation networks [40,41].
In summary, there is no consensus definition of bus operation reliability, and different indexes of bus service reliability have been proposed from different perspectives. From the perspective of passengers, the adopted indexes mainly concern the smoothness of headway, punctual arrival, and waiting times. From the operating point of view, the indexes concern the execution rate of the running time. Most research considers the variation of operational time between stations, but few studies examine bus service reliability from the perspective of the relationship between the paths that connect adjacent stations.
To fill this research gap, this paper takes the conventional bus operation system on a single line as the research object and defines the operational reliability of bus lines in terms of system theory. The operational reliability of a single bus line is treated as a function of the routes between consecutive stations along the line, and the reliability evaluation method is studied under the influence of the correlation among these components. Studying the relationship between the inter-station routes and the total reliability of the bus line provides a more reasonable basis for the optimization, adjustment, and service improvement of the bus line.
Along these lines, this paper evaluates and analyzes the reliability of bus routes and the relationship between inter-station routes using the Copula function. As a multivariable correlation analysis function [42], it can express the correlation of different variables well and is advantageous for constructing system reliability analysis models. The remaining sections of this paper are arranged as follows: the second part presents the bus line operation reliability evaluation index. The third part covers the general idea and the bus line reliability evaluation model based on the Copula function. The fourth part covers the actual case analysis (including data and calculation steps). Conclusions and prospects are given in the last section.

Definition of Bus Line Operation Reliability
This paper takes the one-way operation of a single bus line as the research object. The operational reliability of the bus line is defined from the passenger's point of view. The main transport function of a bus line is to carry passengers from the starting point to the end point, within a certain period of time, under the specified route conditions. Specifically, suppose a one-way single bus line is G, its running direction is $N_1 \rightarrow N_{m+1}$, and it is composed of $m + 1$ ($m \geq 1$) bus stops. The bus stops are denoted by $N_i$ ($i = 1, 2, \ldots, m + 1$), and $m$ is the number of inter-station paths. $L_j$ ($j = 1, 2, \ldots, m$) represents the path between the two adjacent stations $N_j$ and $N_{j+1}$.
In the operation of one-way bus lines, the situation of alternative lines is not considered; that is, only the operation of a single line is considered. In order to realize the function of one-way operation of the bus line, it is necessary to ensure the normal operation (that is, not invalid) of the bus vehicles between the stations on the bus line.
A functional structure diagram of the bus line is shown in Figure 1. In the operation of the bus line system G ($N_1 \rightarrow N_{m+1}$), the failure of any inter-station path will lead to failure of the whole bus line operation. Therefore, the reliability of the one-way bus route G can be represented through the failure probabilities of all $m$ inter-station paths. The specific calculation method is as follows. The operating reliability of the one-way single bus route G ($N_1 \rightarrow N_{m+1}$) is denoted by $R_g(t)$, and the operating reliability of each inter-station path $L_j$ is denoted by $R_j(t)$. Previous studies assumed that the inter-station paths operate independently of each other, so that the reliability of the bus route (the reliability estimate of a series system) is:

$$R_g(t) = \prod_{j=1}^{m} R_j(t)$$

In practice, the operations of the routes between bus stops may affect each other, and bus operation is affected by many factors. Describing the relationship between the inter-station paths and the reliability of the line is therefore a difficult problem. This paper adopts a new method to address this problem, which is discussed in Section 3.
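The independence assumption of the earlier studies can be made concrete with a short numerical sketch. The function name and the three example reliabilities below are hypothetical and are not taken from the case study; the sketch only shows that, under independence, the line reliability is the product of the path reliabilities and can never exceed the weakest path.

```python
from math import prod


def series_reliability_independent(path_reliabilities):
    """Reliability of a series system whose components fail independently."""
    if not path_reliabilities:
        raise ValueError("at least one inter-station path is required")
    for r in path_reliabilities:
        if not 0.0 <= r <= 1.0:
            raise ValueError("each reliability must lie in [0, 1]")
    return prod(path_reliabilities)


# Hypothetical reliabilities of three inter-station paths (illustration only).
example_paths = [0.95, 0.90, 0.85]
line_reliability = series_reliability_independent(example_paths)
print(f"line reliability under independence: {line_reliability:.5f}")
```

With many paths the product shrinks quickly, which is one reason the independence assumption can understate the reliability of a line whose paths are positively correlated.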

Operational Reliability Evaluation Index of Bus Lines
The operational reliability of a single bus line is related to the operation of its vehicles. The reliability evaluation indexes are selected mainly according to the following principles: 1. They should reflect the operational reliability characteristics of bus lines, so that existing problems are convenient to analyze and find; 2. They should reflect what passengers require, that is, expect, of the operational capacity provided by the bus; 3. They should be operable and easy to calculate from the given data. After comprehensive consideration of the above factors, one-way bus line operation is regarded as a system from the perspective of system theory, and each inter-station path is regarded as a subsystem.
Accuracy and regularity are the key indicators for measuring the reliability of public transportation. The higher the accuracy and regularity, the more reliable the public transport system is. Mathematically, accuracy and regularity are usually expressed by average time standard deviation and coefficient of variation. The evaluation indexes of reliability in the operation of bus lines are therefore selected as listed in Table 1. Along with these indicators, a series of threshold ranges should be assigned to judge whether bus line operations and inter-station route operations are reliable or unreliable. These threshold ranges are affected by many factors, such as the benefits and costs of bus operation, road traffic conditions, and passenger demands and expectations, and they may differ between routes, between inter-station routes, and between time periods.
Broadly, the reliability index is the probability that a system operates normally during its life. The operating life of a one-way bus line is therefore its non-failure running time, which generally lies within the range of the average running time and is denoted by $T$. The life distribution function corresponding to $T$ is:

$$F_g(t) = P\{T \leq t\}, \quad t \geq 0$$

that is, the probability that the bus line has failed by time $t$. The probability that the bus line is normal (not failed) up to time $t$, that is, the reliability or reliability function of the bus line at time $t$, is:

$$R_g(t) = P\{T > t\} = 1 - F_g(t)$$

Conversely, when the running time of the entire bus line exceeds the normal average running time range, the line is at fault. The fault distribution is $F_g(t)$.
$T_j$ is the inter-station bus travel time when the running time of vehicles between two adjacent stations falls within the variation range of the normal running time between stations. The corresponding distribution function of $T_j$ is:

$$F_j(t) = P\{T_j \leq t\}, \quad t \geq 0$$

The probability that the inter-station path is normal (not failed) up to time $t$, that is, the reliability or reliability function of the inter-station path at time $t$, is:

$$R_j(t) = P\{T_j > t\} = 1 - F_j(t)$$

Conversely, when the operation time of the inter-station path exceeds the normal variation range of the inter-station operation time, the path is at fault.
Under the specified conditions, the failure probability of a bus line in unit time after time $t$ is called the failure probability density function at time $t$, denoted $f_g(t)$:

$$f_g(t) = \frac{dF_g(t)}{dt} = -\frac{dR_g(t)}{dt}$$

Similarly, the failure probability of the inter-station path in unit time after time $t$ under the specified conditions is called the failure probability density function of the path at time $t$, denoted $f_j(t)$:

$$f_j(t) = \frac{dF_j(t)}{dt} = -\frac{dR_j(t)}{dt}$$

The conditional probability that a bus line that is still normal at time $t$ fails in the following unit time (or unit of running path) is called the failure rate of the line at time $t$, denoted:

$$\lambda_g(t) = \frac{f_g(t)}{R_g(t)}$$

In the same way, the conditional probability that an inter-station route that is still normal at time $t$ fails in the following unit time (or unit of running path) is called the failure rate of the path at time $t$, denoted:

$$\lambda_j(t) = \frac{f_j(t)}{R_j(t)}$$
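The relationship between the reliability function, the failure probability density, and the failure rate can be checked numerically. The following is a minimal sketch under the assumption that the life $T$ follows a normal distribution; the parameter values (a mean of 60 and a standard deviation of 5, in minutes) are hypothetical and chosen only for illustration, and the function names are not from the paper.

```python
from statistics import NormalDist


def reliability(t, life):
    """R(t) = P{T > t} = 1 - F(t)."""
    return 1.0 - life.cdf(t)


def failure_density(t, life):
    """f(t) = dF(t)/dt, the density of the life distribution."""
    return life.pdf(t)


def failure_rate(t, life):
    """lambda(t) = f(t) / R(t), defined while R(t) > 0."""
    r = reliability(t, life)
    if r <= 0.0:
        raise ValueError("failure rate is undefined where R(t) = 0")
    return failure_density(t, life) / r


# Hypothetical life distribution (minutes); not estimated from the case data.
life = NormalDist(mu=60.0, sigma=5.0)
for t in (55.0, 60.0, 65.0):
    print(t, round(reliability(t, life), 4), round(failure_rate(t, life), 4))
```

For a normal life distribution the failure rate increases with $t$, which the printed values illustrate.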

Research Framework
The main research ideas of this paper are shown in Figure 2. Firstly, bus route G is described by the system theory method, and a series system is established. Next, the evaluation indexes of line operation reliability are determined, including the line indexes and the inter-station path indexes. Then, the Copula function model is determined. In addition, the parameter values are determined using the estimation algorithm. Finally, the line reliability and inter-station path dependency are analyzed using actual data.

Reasons for Choosing the Copula Function
As a multivariable correlation analysis function [41], the Copula function can express the correlation of different variables well and is advantageous for constructing system reliability analysis models. To characterize the interaction between the bus route and its inter-station routes, the Copula function is used in this paper. The Copula function derives from Sklar's theorem [43]; it can construct multivariate joint distribution functions with arbitrary marginal distributions and is highly flexible and adaptable. Therefore, the Copula function can be used to describe the interaction between inter-station paths when such an interaction exists. Suppose that bus line G is connected in series by $m$ inter-station paths $L_i$, that the life of the $i$-th inter-station path is $T_i$, and that its reliability is $R_i(t)$. If the life distribution function of $T_i$ is $F_i(t_i)$, $i = 1, 2, \ldots, m$, then the lifetime of G is $T = \min(T_1, T_2, \ldots, T_m)$. The joint distribution function is:

$$H(t_1, t_2, \ldots, t_m) = P\{T_1 \leq t_1, T_2 \leq t_2, \ldots, T_m \leq t_m\}$$

According to Sklar's theorem, there exists an $m$-dimensional copula $C$ such that:

$$H(t_1, t_2, \ldots, t_m) = C\big(F_1(t_1), F_2(t_2), \ldots, F_m(t_m)\big)$$

Because each $F_i(t_i)$ is a continuous function, $C$ is unique.
Then, the reliability of bus line G is calculated as follows:

$$R_g(t) = P\{\min(T_1, \ldots, T_m) > t\} = 1 - \sum_{i=1}^{m} F_i(t) + \sum_{k=2}^{m} (-1)^k \sum_{1 \leq i_1 < \cdots < i_k \leq m} C\big(F_{i_1}(t), \ldots, F_{i_k}(t)\big)$$

where $2 \leq k \leq m$ and $C(F_{i_1}(t), \ldots, F_{i_k}(t))$ denotes the $k$-dimensional margin of $C$ for paths $i_1, \ldots, i_k$.
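The series-system reliability under a copula can be illustrated by inclusion-exclusion over the path failure events: the line survives to time $t$ with probability one, minus the sum of the path failure probabilities, plus alternating-sign copula terms over all subsets of two or more paths. The sketch below is a minimal illustration and not the authors' implementation. The function names and failure probabilities are hypothetical, and the two copulas used (independence and comonotone) are chosen only because their lower-dimensional margins have the same closed form.

```python
from itertools import combinations
from math import prod


def independence_copula(u):
    """C(u_1, ..., u_k) = u_1 * ... * u_k (paths fail independently)."""
    return prod(u)


def comonotone_copula(u):
    """C(u_1, ..., u_k) = min(u_1, ..., u_k) (perfect positive dependence)."""
    return min(u)


def series_reliability(failure_probs, copula):
    """P{all paths survive} by inclusion-exclusion over failure events.

    failure_probs[i] is F_i(t); copula(subset) must return the joint failure
    probability of that subset, i.e. the matching margin of the copula.
    The cost grows as 2**m, so this is only practical for small m.
    """
    m = len(failure_probs)
    total = 1.0 - sum(failure_probs)
    for k in range(2, m + 1):
        sign = 1.0 if k % 2 == 0 else -1.0
        for subset in combinations(failure_probs, k):
            total += sign * copula(subset)
    return total


# Hypothetical failure probabilities F_i(t) of three inter-station paths.
example_failure_probs = [0.10, 0.20, 0.05]
r_independent = series_reliability(example_failure_probs, independence_copula)
r_comonotone = series_reliability(example_failure_probs, comonotone_copula)
print(round(r_independent, 4), round(r_comonotone, 4))
```

The independence copula reproduces the product of the path reliabilities, while the comonotone copula gives the reliability of the weakest path; any positively dependent copula between these two yields a line reliability between the two values.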

Copula Function
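There are two popular families of Copula functions: elliptical Copula functions and Archimedean Copula functions. Elliptical copula functions can be further divided into normal (Gaussian) Copula functions and t-copula functions, of which the Gaussian copula function can be extended to the n-variable model:

$$C(u_1, u_2, \ldots, u_n; R) = \Phi_R\big(\Phi^{-1}(u_1), \Phi^{-1}(u_2), \ldots, \Phi^{-1}(u_n)\big)$$

where $\Phi_R$ denotes the joint distribution function of the n-dimensional standard normal distribution with correlation matrix $R$, and $\Phi(\cdot)$ and $\Phi^{-1}(\cdot)$ denote the standard normal distribution function and its inverse function, respectively. $R$ is the n-order correlation coefficient matrix, and $X = (x_1, x_2, \ldots, x_n)$ corresponds to the n analysis variables. This form is suitable for a group of correlated variables in system reliability analysis.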
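A Gaussian copula can be evaluated by simulation when no closed form is convenient. The sketch below estimates the series-system reliability by Monte Carlo under a simplifying assumption that is not made in the paper: all pairs of inter-station paths share the same correlation `rho`, so that correlated standard normal variables can be generated from one common factor without a matrix decomposition. The function name, the failure probabilities, and the value of `rho` are hypothetical.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def gaussian_copula_series_reliability(failure_probs, rho, n_samples=20000, seed=0):
    """Monte Carlo estimate of P{no path fails} under an equicorrelated
    Gaussian copula. Each failure probability must lie strictly in (0, 1)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1) for the one-factor construction")
    rng = random.Random(seed)
    # Path i fails when U_i <= F_i, i.e. when Z_i <= Phi^{-1}(F_i).
    thresholds = [_STD_NORMAL.inv_cdf(p) for p in failure_probs]
    a, b = rho ** 0.5, (1.0 - rho) ** 0.5
    survived = 0
    for _ in range(n_samples):
        common = rng.gauss(0.0, 1.0)
        # Z_i = a * common + b * noise_i has unit variance and pairwise
        # correlation rho.
        if all(a * common + b * rng.gauss(0.0, 1.0) > z for z in thresholds):
            survived += 1
    return survived / n_samples


# Five hypothetical paths, each with a failure probability of 0.05.
probs = [0.05] * 5
r_indep = gaussian_copula_series_reliability(probs, rho=0.0, seed=1)
r_corr = gaussian_copula_series_reliability(probs, rho=0.9, seed=1)
print(r_indep, r_corr)
```

With `rho = 0` the estimate approaches the independent product, and with strong positive correlation it moves toward the reliability of a single path, which mirrors the series effect discussed in the text. A general correlation matrix would require a Cholesky factor in place of the one-factor construction.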

Selection and Estimation of Copula Function Model
Because the relationship between bus routes is very complex, the choice of copula function is very important, and the correlation of the indicators must be analyzed first. This paper chooses a copula function based on empirical distribution analysis. The specific steps are as follows. Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a set of samples selected from the population $(X, Y)$. The population $(X, Y)$ has a joint distribution function $H(x, y)$, and the corresponding copula is $C(u, v)$. The empirical joint distribution function is $H_n(x, y)$, and the corresponding empirical copula is $C_n(u, v)$. The copula function model is then selected according to the scatter diagram of $(X, Y)$.
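The empirical copula can be computed directly from the ranks of the sample. The sketch below uses the common rank-based definition, in which $C_n(u, v)$ is the fraction of sample points whose x-rank is at most $un$ and whose y-rank is at most $vn$; this definition and the function names are assumptions for illustration, since the paper does not give the formula, and the sample values are hypothetical. Ties in the data are not handled.

```python
def ranks(values):
    """Return the rank (1 = smallest) of each value, assuming no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def empirical_copula(xs, ys):
    """Build C_n(u, v) from paired samples (x_k, y_k)."""
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    rank_x, rank_y = ranks(xs), ranks(ys)

    def c_n(u, v):
        eps = 1e-9  # guards against floating-point error in u * n
        count = sum(
            1 for a, b in zip(rank_x, rank_y)
            if a <= u * n + eps and b <= v * n + eps
        )
        return count / n

    return c_n


# Hypothetical running times (s) on two adjacent inter-station paths.
path_a = [100.0, 110.0, 120.0, 130.0]
path_b = [200.0, 215.0, 230.0, 260.0]
c_n = empirical_copula(path_a, path_b)
print(c_n(0.5, 0.5), c_n(0.5, 1.0))
```

Comparing $C_n$ on a grid with candidate parametric copulas, alongside the scatter diagram, is one way to support the model choice.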
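Before the copula function is used to estimate the reliability of bus lines, its parameter $\theta$ must be estimated. At present, the most commonly used estimation method is maximum likelihood estimation. Suppose an n-dimensional random vector $X = (x_1, x_2, \ldots, x_n)$ whose $i$-th edge (marginal) distribution is $F_i(x_i)$ with density $f_i(x_i)$, and let $c(\cdot\,; \theta)$ denote the density of the copula. For $N$ observations, the log-likelihood function is:

$$\ln L(\theta) = \sum_{k=1}^{N} \ln c\big(F_1(x_{1k}), \ldots, F_n(x_{nk}); \theta\big) + \sum_{k=1}^{N} \sum_{i=1}^{n} \ln f_i(x_{ik})$$

According to the extremum theorem, the estimate $\hat{\theta}$ satisfies:

$$\frac{\partial \ln L(\theta)}{\partial \theta} = 0$$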
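Maximum likelihood estimation of a copula parameter can be sketched for the simplest case, a bivariate Gaussian copula with a single correlation parameter $\rho$. The example below is illustrative only and makes several assumptions that are not in the paper: the margins are replaced by rank-based pseudo-observations, the likelihood is maximized by a plain grid search, and the data are simulated with a known correlation of 0.6. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def pseudo_observations(values):
    """Map a sample into (0, 1) via rank / (n + 1), assuming no ties."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    u = [0.0] * n
    for rank, index in enumerate(order, start=1):
        u[index] = rank / (n + 1)
    return u


def fit_gaussian_copula_rho(us, vs, grid_size=199):
    """Grid-search MLE of rho for the bivariate Gaussian copula density
    c(u, v) = (1 - rho^2)^(-1/2) * exp(-(rho^2 (x^2 + y^2) - 2 rho x y)
                                        / (2 (1 - rho^2))),
    where x and y are the standard normal quantiles of u and v."""
    xs = [_STD_NORMAL.inv_cdf(u) for u in us]
    ys = [_STD_NORMAL.inv_cdf(v) for v in vs]
    n = len(xs)
    sum_sq = sum(x * x + y * y for x, y in zip(xs, ys))
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    def log_likelihood(rho):
        d = 1.0 - rho * rho
        return -0.5 * n * math.log(d) - (rho * rho * sum_sq - 2.0 * rho * sum_xy) / (2.0 * d)

    grid = [-0.99 + 1.98 * k / (grid_size - 1) for k in range(grid_size)]
    return max(grid, key=log_likelihood)


# Simulated data with true correlation 0.6 (illustration only).
rng = random.Random(1)
sample_x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
sample_y = [0.6 * x + 0.8 * rng.gauss(0.0, 1.0) for x in sample_x]
rho_hat = fit_gaussian_copula_rho(
    pseudo_observations(sample_x), pseudo_observations(sample_y)
)
print(rho_hat)
```

A grid search is used here only to keep the sketch dependency-free; in higher dimensions a numerical optimizer over the full correlation matrix would normally replace it.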

Raw data
Taking bus route 430 in Beijing as an example, GPS data and vehicle scheduling data from July 2020 were selected for the case analysis. The route runs from Dongshagezhuang bus station to Andingmen, and the whole journey is about 21.4 km. There are 25 stations in this direction and 24 inter-station routes between them, as shown in Figure 3. The correspondence between numbers and stations is shown in Table 2. The original GPS data are shown in Table 3. In Table 3, positions are given in the WGS-1984 coordinate system; RUN_STATUS is the state of operation of the vehicle, 1 for operation and 0 for non-operation; PARK_STATUS is the entry and exit status of the parking lot, 1 for in and 0 for out.
An example of raw data for a trip plan is shown in Table 4.

Data Processing
The raw data cannot be used directly because of problems with completeness, noise, consistency, and other issues. To improve the data quality, the raw data must be preprocessed, including completion of missing data, deletion of erroneous data, deletion of duplicate data, and processing of GPS drift data. Taking GPS drift processing as an example, drift data are characterized by individual GPS positions that deviate seriously from the rest of the GPS data series.
A drift point is detected by comparing the displacement between consecutive positioning points with a threshold $\Delta S_m$, the largest plausible displacement of the current GPS positioning point relative to the previous one. This threshold is determined by the following quantities: $t_k$ is the positioning time of the current point; $t_{k-1}$ is the previous positioning time; $r$ is the error range of the GPS receiver; and $V_h$ is the maximum speed limit of the road network of the target city, which is 80 km/h for Beijing.

$$\Delta S = R \times \arccos\big[\sin(y_{k-1})\sin(y_k) + \cos(y_{k-1})\cos(y_k)\cos(x_k - x_{k-1})\big]$$
where $\Delta S$ is the shortest spherical distance between two fixed points; $R$ is the radius of the Earth; $x_{k-1}$ and $x_k$ are the longitude coordinates of the GPS fixed points; $y_{k-1}$ and $y_k$ are the latitude coordinates of the GPS fixed points. When $\Delta S > \Delta S_m$, a GPS position point is identified as a drift point.
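The drift check can be sketched as follows. The spherical distance uses the arccos form given in the text, with coordinates converted from degrees to radians. The form of the displacement threshold is an assumption, since its equation is not recoverable from the text: the sketch uses the maximum speed multiplied by the elapsed time, plus the receiver error. The Earth radius of 6371 km, the receiver error of 0.02 km, and the example coordinates are also hypothetical values.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumed value


def spherical_distance_km(lon1, lat1, lon2, lat2):
    """Shortest spherical distance between two points given in degrees."""
    y1, y2 = math.radians(lat1), math.radians(lat2)
    dx = math.radians(lon2 - lon1)
    cos_angle = math.sin(y1) * math.sin(y2) + math.cos(y1) * math.cos(y2) * math.cos(dx)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_KM * math.acos(cos_angle)


def is_drift_point(prev_fix, cur_fix, v_max_kmh=80.0, receiver_error_km=0.02):
    """prev_fix and cur_fix are (longitude_deg, latitude_deg, time_s) tuples.

    Assumed threshold: v_max * (t_k - t_{k-1}) + r.
    """
    lon1, lat1, t1 = prev_fix
    lon2, lat2, t2 = cur_fix
    elapsed_h = (t2 - t1) / 3600.0
    if elapsed_h <= 0.0:
        raise ValueError("fixes must be in strictly increasing time order")
    max_displacement_km = v_max_kmh * elapsed_h + receiver_error_km
    return spherical_distance_km(lon1, lat1, lon2, lat2) > max_displacement_km


# Hypothetical fixes 10 s apart near Beijing.
previous = (116.4000, 40.0000, 0.0)
plausible = (116.4005, 40.0000, 10.0)   # roughly 43 m away
implausible = (116.4100, 40.0000, 10.0)  # roughly 850 m away
print(is_drift_point(previous, plausible), is_drift_point(previous, implausible))
```

For very short distances the arccos form loses some numerical precision; the haversine form is a common alternative when sub-metre accuracy matters.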

Index calculation
According to Table 1 (in the second part), the line reliability evaluation index values are calculated from the original data. The specific steps are as follows:

Inter-station path average running time ($\bar{T}_j$)
Step 1: Select the real-time bus dynamic data according to the route number, obtain the arrival times of all the vehicles on the line for one day, and save the results as the bus route arrival time table.
Step 2: According to the vehicle number and arrival time, obtain the arrival times of each vehicle.
Step 3: Traverse the stations in order, and obtain the running time of each inter-station path by subtracting the arrival times of a vehicle at consecutive stations.
Step 4: Calculate the average running time of each inter-station path, that is, the average running time between stations of the line within the statistical period.
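Steps 3 and 4 above can be sketched as follows. The input layout (one list of arrival times per vehicle trip, ordered by station) and the function names are assumptions for illustration; the arrival times are hypothetical. Because the running time is the difference between consecutive arrival times, it includes the dwell time at the upstream station.

```python
from collections import defaultdict
from statistics import fmean


def interstation_running_times(trips):
    """Collect running times per inter-station path.

    trips: list of trips, each a list of arrival times (s) at stations
    1, 2, ..., m + 1 in running order. Path j connects stations j and j + 1.
    """
    per_path = defaultdict(list)
    for arrivals in trips:
        for j in range(len(arrivals) - 1):
            running_time = arrivals[j + 1] - arrivals[j]
            if running_time > 0:  # skip out-of-order or duplicate records
                per_path[j + 1].append(running_time)
    return dict(per_path)


def average_running_times(trips):
    """Average running time of each inter-station path over all trips."""
    return {j: fmean(times) for j, times in interstation_running_times(trips).items()}


# Two hypothetical trips over three stations (two inter-station paths).
demo_trips = [
    [0, 120, 300],
    [600, 740, 900],
]
demo_averages = average_running_times(demo_trips)
print(demo_averages)
```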


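Inter-station path average running speed ($\bar{V}_j$)
Step 1: Obtain the average running time $\bar{T}_j$ of each inter-station path as described above.
Step 2: Match the average running time of the bus route in the up and down directions with the distance between the corresponding bus stations.
Step 3: Calculate the ratio $\bar{V}_j = S_j / \bar{T}_j$, where $S_j$ is the operational mileage of the inter-station path.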

Coefficient of variation (CVT)
$\mathrm{CVT} = \sigma_T / \bar{T}$, where $\sigma_T$ is the standard deviation of the running time between stations, computed from the following quantities: $T_i$ is the inter-station path running time; $\bar{T}$ is the average inter-station path running time within a statistical time period; $n$ is the total number of inter-station paths. Through statistical analysis of one month's data of the line, the line reliability evaluation index results are obtained, as shown in Table 5 below.
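The average running speed and the coefficient of variation can be computed as in the sketch below. The use of the population standard deviation (division by the number of observations) is an assumption, since the normalization is not recoverable from the text; the function names, the distance, and the running times are hypothetical.

```python
from statistics import fmean, pstdev


def average_speed_kmh(distance_km, running_times_s):
    """Average running speed: path mileage divided by mean running time."""
    mean_time_s = fmean(running_times_s)
    if mean_time_s <= 0:
        raise ValueError("mean running time must be positive")
    return distance_km / (mean_time_s / 3600.0)


def coefficient_of_variation(running_times_s):
    """CVT: standard deviation of the running time divided by its mean.

    Uses the population standard deviation (an assumption).
    """
    mean_time_s = fmean(running_times_s)
    if mean_time_s <= 0:
        raise ValueError("mean running time must be positive")
    return pstdev(running_times_s) / mean_time_s


# Hypothetical running times (s) on a 1 km inter-station path.
demo_times = [100.0, 120.0, 140.0]
demo_speed = average_speed_kmh(1.0, demo_times)
demo_cvt = coefficient_of_variation(demo_times)
print(round(demo_speed, 2), round(demo_cvt, 4))
```

A lower CVT indicates more regular running times on the path.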

Determine the Edge Distribution Model
Many distribution functions have been used for bus line running times. They include symmetric distributions, such as the normal distribution and the logistic distribution, and asymmetric distributions, such as the lognormal distribution, the gamma distribution, and the log-logistic distribution. For headway distribution, the exponential distribution and the gamma distribution are usually used.
Based on past experience, it is assumed that the operation of bus 430 between stations, from Dongshagezhuang bus station to Andingmen, follows a normal distribution. The probability distribution model, i.e., the density function, is as follows:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

where $\mu$ is the mean value of the random variable $x$, and $\sigma$ is the standard deviation of the random variable $x$. The parameters $\mu$ and $\sigma$ of the density function of the bus line, estimated by the maximum likelihood method with a confidence of 95%, are given in Table 6.
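For a normal distribution, the maximum likelihood estimates are the sample mean and the standard deviation computed with division by the number of observations. The sketch below fits these parameters and then evaluates the probability that a running time falls inside a given range. Reading that probability as a path reliability, together with the range limits and the sample values, is an assumption made for illustration and is not taken from Table 6.

```python
from statistics import NormalDist, fmean, pstdev


def fit_normal_mle(samples):
    """Maximum likelihood fit of a normal distribution.

    The MLE of sigma divides by n, which is what pstdev computes.
    """
    if len(samples) < 2:
        raise ValueError("at least two samples are required")
    return NormalDist(mu=fmean(samples), sigma=pstdev(samples))


def probability_within(dist, lower, upper):
    """P{lower < X <= upper} under the fitted distribution."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return dist.cdf(upper) - dist.cdf(lower)


# Hypothetical running times (s) on one inter-station path.
demo_samples = [100.0, 110.0, 120.0, 130.0, 140.0]
fitted = fit_normal_mle(demo_samples)
within_one_sigma = probability_within(
    fitted, fitted.mean - fitted.stdev, fitted.mean + fitted.stdev
)
print(fitted.mean, round(fitted.stdev, 4), round(within_one_sigma, 4))
```

Note that `NormalDist.from_samples` would use the sample standard deviation (division by n minus 1), which differs slightly from the maximum likelihood estimate for small samples.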

Determine Copula Function Model
Suppose that bus line 430 is composed of 24 inter-station paths $L_j$ in series, that the life of the $i$-th inter-station path is $T_i$, and that its reliability is $R_i(t)$. The copula function is selected based on the empirical analysis method described earlier, and the reliability of the bus line running time conforms to the normal distribution. Therefore, the reliability estimation function of the whole bus line uses the copula

$$C(u_1, u_2, \ldots, u_{24}) = \Phi_R\big(\Phi^{-1}(u_1), \Phi^{-1}(u_2), \ldots, \Phi^{-1}(u_{24})\big), \quad u_i \in [0, 1]$$

which is called the normal (Gaussian) copula function with covariance matrix $R$. The linear correlation matrix $R$ is symmetric and positive definite, with $\mathrm{diag}\,R = 1$.
The model parameters are obtained through the maximum likelihood estimation method. The reliability estimates of inter-station paths No. 1 to No. 24, obtained from historical data, are shown in Figure 4. It can be seen from Figure 4 that the reliability of the inter-station paths is generally above 0.85; only the fifth section is below 0.86, and the overall situation is relatively stable. The three-dimensional diagram of the operation time distribution of the inter-station routes in Figure 5 shows that the operation times of the routes follow a consistent overall trend, which can be understood as correlation between the inter-station routes: the operation of one route affects the operation of the adjacent routes, and there is an obvious series effect. Through maximum likelihood estimation with a 95% confidence interval, the reliability estimate of the whole line is 0.86. The reliability estimates of the inter-station routes in Figure 4 range from a minimum of 0.85 to a maximum of 0.95, which indicates that the reliability of the whole line is clearly related to the reliability of the inter-station routes. In particular, the reliability of the whole line is most affected by the inter-station path with the lowest reliability, that is, path No. 5. Path No. 5 runs from the south of Tiantongyuan north station to Dongsanqi south station, and its low reliability may be due to the large passenger flow or the large number of non-bus vehicles on this section. Therefore, a reliability improvement project can be carried out for this section at a later stage to improve the reliability of the whole line. As shown in Figure 6, managers should find and improve the inter-station sections with low reliability in the line during the whole day.
Figure 6. Route map.

Results and Discussion
Analyzing the relationship between the reliabilities of the inter-station routes is the core of identifying the key nodes of whole-route operation reliability and the relationship between the routes. With the copula-based reliability estimation of public transportation lines, the relationship between the reliability indexes of different routes can be analyzed.
The above theoretical analysis and practical case analysis yield the operation reliability of the one-way bus line and its relationship with the reliability of the inter-station paths. Assuming that the operation state of the inter-station paths obeys a normal distribution, the copula function in the whole-line operation reliability function is the normal (Gaussian) copula function with covariance matrix R.
The analysis of the actual line also identifies the typical inter-station path that affects the operation reliability of the whole line, and the corresponding reasons can be determined. The operational reliability of the bus line between stations is affected by the operation of adjacent shifts, the passenger flow, and road conditions, which provides the basis for improving the reliability of the whole line or for optimizing and adjusting the line at a later stage.
The analysis of the reliability of the bus line between stations shows that the reliability of the whole line is most affected by the inter-station path with the lowest reliability, which further supports the assumption that the whole line is a series system. However, the operation reliability of the same line may vary greatly between driving directions. Therefore, bus operation reliability needs to be calculated separately for the two driving directions of the same line, and the inter-station path reliability in the reverse direction is also different.
In addition, traditional research treats route operation between stations as an independent subsystem, whereas this paper assumes that the operations of the inter-station routes of a line interact. It is difficult to make a conclusive comparison in this regard, because few previous studies have discussed the relationship between stations. The traditional research method for station reliability uses statistics to calculate the distribution of the punctuality rate of the arrival time at each station. Variable stations are identified from the distribution of station reliability, and such a station is then considered to be near an unstable section of the line. This statistical result is essentially consistent with the critical path identified in this paper. The advantage of the method in this paper is that it also shows the relationship between this section and the reliability of the whole bus line and quantifies it, which traditional statistical methods do not. The calculation adopts the assumption used in most traditional research, namely that the running times of bus lines and inter-station routes follow the normal distribution. In practice they may not, which is a limitation of the current calculation that needs further verification.
Because research on structural reliability based on copula theory is still at an early stage and bus line operations are complex, this field involves uncertainty, and establishing bus line reliability evaluation with the copula function is a feasible exploration. However, the method involves parameter estimation for a multidimensional copula function. As the number of dimensions increases, the calculation time and difficulty may increase, which needs to be improved in follow-up research.

Conclusions
The operation reliability of bus lines plays a crucial role in maintaining a high level of public transportation service. This study aims to clarify how the whole line and its inter-station paths relate in terms of their influence on the operation reliability of bus lines. It proposes a bus route reliability evaluation method based on the copula function and applies it to actual line data as a case study. The results show that the method reveals the relationship between the overall line and the inter-station paths in terms of operation reliability, and that it identifies the inter-station path that is most critical to the whole line. Previous studies focused on the service level of the whole line by investigating the overall stability of the departure punctuality, line running time, and punctuality [44].
Using the copula function to evaluate and analyze the operational reliability of bus lines lays a foundation for later extension to the overall operational reliability of a bus line network. It also plays an important role in the operation and optimization of a single line. The method can provide a good reference for the diagnosis and evaluation of line operation and for the optimization of bus routes, and the results can support the sustainable development of public transportation. Parts of a route can be improved according to the relevant conclusions. In addition, the method can be used to optimize the operation scheme of the entire bus line, so as to improve the punctuality rate of bus operations, the service level, and public satisfaction [45]. The next step is to attract more people to choose public transport and, finally, to provide support for green and sustainable development.