Exact and Evolutionary Algorithms for Synchronization of Public Transportation Timetables Considering Extended Transfer Zones †

Featured Application: The planning methods presented in this article are speciﬁcally applicable to help decision makers in the processes of the conﬁguration and operation of public intelligent transportation systems under the novel paradigm of smart cities. Abstract: This article addresses timetable synchronization in public transportation, an important problem in modern smart cities, in order to guarantee a proper quality of service to citizens. Two variants of the bus timetabling synchronization problem considering extended transfer zones are studied: optimizing offsets and optimizing offsets and headways for each line. An exact mixed integer programming and an evolutionary algorithm are developed to solve both problem variants. The algorithms are evaluated on 45 instances of a real case study, the intelligent transportation system of Montevideo, Uruguay. Experimental results reported signiﬁcant improvements over the current timetable implemented by the city administration. The number of successful synchronizations improved up to 66.6% and 179.9% for the ﬁrst and second problem variant, respectively. The average waiting times for transfers improved, especially in tight problem instances (up to 57.8% and 158.3% for the ﬁrst and second problem variant, respectively). The proposed planning methods are useful to help decision makers to conﬁgure public transportation systems.


Introduction
Public transportation is a key service in smart cities, allowing for efficient and low pollution mobility for citizens and reducing the dependency on cars and other motorized transportation modes [1,2]. Designing and operating an efficient public transportation system requires solving several relevant problems, including the proper design and management of routes, timetabling, and planning of buses and drivers, to provide a good quality of service to citizens [3] and also to promote sustainability [4].
The timetable synchronization problem consists of crafting a timetable that optimizes transfers between lines in a public transportation system. Timetable synchronization has been recognized as one of the most difficult problems for public transportation planning and optimization [5]. The problem has been often addressed intuitively, assuming that experienced operators are able to define headways, i.e., the time between consecutive departures of buses in a line to provide a proper quality of service. Conversely, this work proposes combinatorial optimization approaches to state the problem and the use of optimization techniques to solve real-world instances.
A flexible approach is proposed, entirely applicable to contexts where information about operating lines, traveling times, and transfer zones is available and when transfers proposed exact and evolutionary approaches, adapted to solve both addressed problem variants; (v) an exhaustive experimental evaluation of the proposed optimization methods over 45 problem instances for each problem variant.
Regarding previous studies in the literature, this article contributes by considering an extended transfer zones proposal to model transfers between different (geographically separated) bus stops of different lines. This formulation is more useful than standard models to capture the reality of modern intelligent transportation systems that do not limit transfers to specific locations, instead allowing them to be performed at every pair of bus stops. The new problem model provides flexibility for passengers who have many realistic transfer options available and also for the bus system administration to design proper timetables. The new formulation also considers the explicit demand of transfer trips for each pair of bus stops that defines a transfer zone, unlike previous formulations that only considered the number of bus trips synchronized [9][10][11] or proposals focused on minimizing waiting times between bus trips [12]. Thus, the proposed formulation allows for focusing on the quality of service offered to passengers better than existing approaches. The proposed evolutionary approach is also a contribution, as no previous application of EAs to solve the problem was found in the review of related works. The EA is useful for solving large-dimension problem instances, e.g., by considering a large number of transfer zones in a city scenario. Besides the formulations proposed to model both problem variants, a relevant contribution of the reported research is related to the obtained results for the case study in Montevideo: both the exact and EA approaches are able to compute timetables that are significantly improved over the real timetable applied by the city administration, considered as a reference baseline for the comparison.
The article is organized as follows. Section 2 introduces the problem, the proposed model, and the two variants addressed. Section 3 presents a review of related articles. The exact and evolutionary approaches for bus synchronization are described in Section 5. Section 6 reports the evaluation of the proposed methods over realistic problem instances in Montevideo, and the main conclusions are outlined in Section 7.

The Bus Synchronization Problem
This section introduces the bus synchronization problem (BSP) model and the formulation of two specific variants of the problem.

Overall Description of the Problem Model
The BSP considers two of the most important purposes of a mass transportation system: offering an efficient means for the movement of citizens, while maintaining low costs and fares. The considered model focuses on citizens, offering them an efficient travel experience and short waiting times for passengers that use two or more buses to perform consecutive trips.
The concept of a synchronization event is defined as the action of providing passengers transfers whose waiting times are lower than the maximum time passengers are willing to wait. The research proposes addressing the BSP on real scenarios, built using real data about lines, bus stops, traveling times, and passengers performing transfers between lines. The problem model divides a day into several planning periods, considering the regularity of travel demands, travel times, and citizens' behavior. On each planning period, a data analysis approach can be applied to extract common characteristics and steady information to build BSP instances.
In the considered problem model, the bus network is represented by a set of bus lines and a set of relevant locations (the synchronization nodes or transfer zones), where passengers can transfer from one line to another. Unlike previous formulations of the problem [9,10], in the proposed model, nodes are not just bus stops but wider transfer zones, formed by separated bus stops for lines i and j. This way, the model can include several bus stops for several lines (see a graphical description for two generic lines i and j in Figure 1). The proposed model explicitly includes the distance between the bus stops for lines i and j, d i,j b , drawn in red in Figure 1. For the instances solved in this article, it is assumed that the walking speed of pedestrians is constant ws = 6 km/h, and the walking time for a distance d between any pair of bus stops is given by wt = d /ws. The proposed problem model is useful for capturing the reality of contemporary intelligent transportation systems, which do not impose a limitation on the number of transfers for passengers and in which transfers are not limited to specific locations. Instead, transfers can be performed at every pair of bus stops, providing flexibility to passengers when planning a trip. This is the case of the intelligent transportation system in Montevideo, Uruguay, which is the case study in this article [7]. In this scenario, the corresponding synchronization problem is more complex, as many realistic transfers options are available to passengers, which in turn may require them to walk between bus stops.
Unlike traditional formulations, the proposed problem model is not focused on maximizing the number of passengers headed from origin to destination, mainly because the main focus is on the user experience when performing transfers, by considering the maximum time passengers are willing to wait for a transfer. This way, the objective function of the optimization model considers the demand of transfers in each transfer zone (pair of bus stops) for all synchronized trips of two bus lines. Previous articles addressing different variants of the timetable synchronization problem have worked under stronger assumptions, i.e., only considering trips for lines to synchronize.
Two cases are distinguished in the proposed model. The case with uniform departing times and the case with non-uniform departing times. These two cases are described next.
In the case with uniform departing times, there are no differences between departing times in the planning period; thus, r and s (trips of line i and line j, respectively) are not relevant for defining the time between consecutive departures (which in fact is constant, i.e., F 1 i ). This is a realistic assumption when planning for short and medium periods (e.g., in the morning, in the afternoon, in rush hours, etc.). For example, in the case study solved in this article, i.e., the bus network of Montevideo, Uruguay, this assumption holds: in the time period 12:00-14:00 on working days, the time between consecutive buses is almost constant for all lines (the standard deviation of their values is between 0.80% and 1.37%). However, trips of line i and line j are relevant for defining the number of synchronizations and the number of synchronized passengers on each trip.
In the case with non-uniform departing times, the trip number r is relevant to determine synchronizations. The time between consecutive buses is given by X i r − X i r−1 for trip r of line i, and the problem formulation must be solved considering the number of trips that are synchronized. When computing the objective function in this model, the demand is split uniformly among the f j trips of line j. This assumption is also realistic when passengers' demand has slight variations between consecutive trips. For modeling purposes, we are simply assuming that the planning period is determined because of its regularity. Whenever buses' interarrival times are uniformly distributed along a planning period, a fairly precise reference weight (i.e., average number of passengers) for each transfer zone is the ratio of the demand of transfers for the considered lines i and j over the number of trips of line i in the planning period T. However, due to the fact that the passengers transferring onto a bus previously alighted from another, they arrive in bursts. There are more accurate formulations for weights in transfer zones [13]. Defining and studying an accurate model to handle variable demands are proposed as future work.

Related Work
Timetable synchronization was stated as an important problem for public transportation systems in pioneering research articles by Ceder [3]. Daduna and Voß [14] presented one of the first proposals for synchronizing schedules on public transportation. The authors studied several objective functions, including weighted sum approaches with transfers and maximum waiting time, and proposed metaheuristics for simple problem versions with uniform frequencies. The approach was evaluated over a case study on Berlin and several cities in Germany. In the experiments, tabu search outperformed simulated annealing in randomized instances, and the authors highlighted the trade-off between operation costs and efficiency.
The transit network timetabling problem was studied by Ceder et al. [15] to optimize synchronization events at stops shared by several bus lines, by maximizing the number of simultaneous arrivals. A greedy approach was introduced to define custom timetables, properly selecting nodes from the bus network. The article focused on maximizing simultaneous bus arrivals over small problem instances involving few nodes and lines. A subjective metric was proposed by Fleurent et al. [16] to evaluate synchronizations, including expert knowledge. The proposed synchronization metric was applied to design a heuristic to minimize vehicle operation costs in a case study consisting of small problem instances from Montréal, Canada. Different timetables were found by weighting the components of the cost function.
Shafahi and Khani [17] presented a mixed integer programming model for optimizing transfers in a public transportation network with fixed transfer stations. The problem considered the offset optimization, i.e., setting the departure times of buses, which was solved by an exact method using CPLEX. A second problem variant was formulated, accounting for the extra stopping time of buses at transfer stations. A genetic algorithm was proposed to solve this problem variant over small problem instances involving 14 lines and just three transfer stations. A real case study was presented: the public transportation network in the city of Mashhad, Iran. The proposed methods were able to improve up to 14.5% with a business-as-usual (i.e., no explicit optimization) strategy.
A variant of the the synchronization problem including time-windows between travel times was addressed by Ibarra and Ríos [10]. A multi-start iterated local search (MILS) was applied to solve eight scenarios from the public transportation of Monterrey, Mexico, considering from 4 to 40 nodes. In less than 60 seconds, MILS computed accurate timetables for medium-size instances, regarding both upper bounds and an exact Branch & Bound algorithm. MILS was also applied by Ibarra et al. [11] to solve the multiperiod BSP, to optimize multiple trips of a given set of lines. MILS was able to compute results similar to a variable neighborhood search and a simple population-based algorithm on synthetic instances with few transfer zones. Results for a sample case study using data for a single line of Monterrey demonstrated that maximizing synchronizations for a specific node usually reduces the number of synchronizations for other nodes.
A recent article by Abdolmaleki et al. [18] proposed a model for timetable synchronization assuming a fixed headway for each line. The authors identified specific cases of the problem that are solvable in polynomial time. An approximation algorithm was proposed, based on the maximum directed cut problem. The proposed method was evaluated in a simple network and a large case study in Mashhad, Iran. In turn, a recursive algorithm that executes in quasi-linear time was proposed to minimize the total transfer waiting time in a problem variant that relaxes the fixed headway assumption.
An EA was proposed in a previous article [19] for a specific variant of the BSP, which outperformed real timetables and heuristics. This article extends our previous approach, solving a different BSP variant to determine optimal values for offset and headways, maintaining the number of trips of the real timetable, thus not impacting the provided quality of service.
Regarding the related literature, the approach proposed in this article contributes by considering the extended transfer zones model accounting for transfers between different (geographically separated) bus stops of different lines; by considering the explicit demand of transfer trips for each pair of bus stops that defines a transfer zone to better focus on the quality of service offered to passengers; and by simultaneously adjusting offsets and headways for each line.

The Proposed Problem Formulations
This section describes the proposed problem formulations for the considered problem variants.

Problem Data
The formulation of the studied problem considers: • The planning period [0, T], expressed in minutes. • A set of bus lines I = {i 1 , i 2 , . . . , i n }, whose routes are fixed and known beforehand. is the distance that separates the bus stops for lines i and j in b. Each b considers two bus stops with transfer demand between lines i and j, as described in Figure 1 Considering the previously defined elements, two variants of the BSP are formulated, accounting for the optimization of just offsets and both offsets and headways for each line, respectively. The next subsections describe these two problem variants.

Problem Variant #1: Offset Optimization
The first problem variant focuses on optimizing the offsets (i.e., the departing time of the first trip of each line). Subsequent departing times are fixed by the reference value defined by the city administration (F i ). Thus, the control variables of the problem are the offset of each line (X i 1 ), which defines the whole set of departing times for all trips of each line. Auxiliary variables are needed to capture the synchronization events in each transfer zone. Binary variables Z ij rsb take a value 1 when trip r of line i and trip s of line j are synchronized in node b (i.e., trip r of line i arrives before trip s of line j and allows passengers to complete the transfer, i.e., walk between the corresponding bus stops and wait less than the waiting threshold for that transfer, W ij b ). To guarantee that all lines perform the required number of trips in the planning period, possible values for the offset are limited to the interval [0, T (mod F i )] (see a graphical description in Figure 2).
The optimization problem formulates the maximization of the number of successful transfers completed in the planning period in every transfer zone (the objective function in Equation (1)). The expression ∑ if the corresponding transfer is synchronized. In Equations (2) and (3), the arrival time of trip r of line i to transfer zone b is A i rb , and the arrival time of trip s of line j to transfer zone b is A j sb . For an interpretation of constraint (2), consider that the limit time defines the maximum time passengers are willing to wait for a transfer between trip r of line i and trip s of line j at transfer zone b. Whenever the arrival time of trip s of line j does not surpass that limit, the right-hand side of Equation (2) is greater (or equal) to 1, so this is the only case when Z ij rsb (the synchronization variable) is allowed to take the value 1. Furthermore, it is also necessary for passengers alighting from trip r of line i to walk to the transfer point (arriving at time A i rb + d ij b ) before trip s of line j arrives (at time A j sb ). Otherwise, passengers will not complete the transfer on time. This second condition, when met, also allows Z ij rsb to be set to 1, as the right term of constraints in Equation (3) is positive. To date, there is a potential issue when non-synchronized trips lead to negative values on the right term of Equation (3), which produces unfeasible constraints. The formulation only needs one value, , to be lower than zero, so the synchronization variable Z ij rsb is deactivated. Thus, it is enough to introduce a constant value M, large enough to guarantee that both Equations (2) and (3) These values of M can be easily calculated during the process of crafting the MIP formulation before using any solver implementation, since finding M is a polynomial complexity problem. Equation (4) indicates that the maximum value for the offset of each line is the minimum between T (mod F i ) and the maximum headway H i . No constraints are defined over headways, since a fixed frequency F i , which satisfies the bounds for headways, is assumed for all subsequent trips. Finally, Equation (5) defines that decision variables Z ij rsb are within the domain of binary variables.
Without losing generality, the proposed problem formulation assumes that F j > W ij b , ∀j ∈ I, i.e., headways of bus lines are larger than the waiting time thresholds for users. The case where F j ≤ W ij b corresponds to a scenario in which the headway of line j is lower than the time users are willing to wait; thus, all transfers with line j would be synchronized, and they would not be part of the problem to solve (i.e., those lines can be removed for the specific problem instance to solve).

Problem Variant #2: Headways Optimization
The second problem variant proposes optimizing not only the offsets of each line but also the headways of each line. Headways are allowed to vary within a fixed range of the reference value F i . The interval for variation is defined by To work under the assumption of maintaining an appropriate quality of service for the transportation system, which is assumed to be fairly provided by the current situation (i.e., using the reference values for the time difference between consecutive trips of each line), small values of α are considered (see a graphical description in Figure 3).
In Equations (6)-(13), X i r is the time of departure of trip r of line i, and F i is the reference value for the time difference between consecutive trips of line i, as defined in problem variant #1. In turn, α ∈ [0, 1] is the coefficient used for defining the allowed deviation of times between buses from the reference value F i . Finally, X i f i is the departing time of the last trips of the line i.
The objective function is Equation (6), i.e., maximizing the number of successful transfers completed in every transfer zone in the planning period. In this case, the demand is split considering the (flexible) time between consecutive buses of line i (i.e., X i r − X i r−1 ). Equations (7)-(13) formulate the problem constraints. Equations (7) and (8) define a synchronization, as in the previous problem formulation.
Equation (9) guarantees that every trip r of line i synchronizes with, at most, one trip s of line j. The last point is necessary because headways are part of the control variables in this case. In the previous model, headways were known in advance, so they could be pre-filtered in the case where a headway was lower than the maximum time passengers are willing to wait at some stop. Conversely, in this model, Equation (9) prevents us from counting such synchronizations more than once. Constraints in Equations (10) and (11) impose the limits for headways, according to the allowed variation α. Constraints in Equation (12) guarantee that the last trip is within the specified maximum range. Finally, Equation (13) defines the domain for decision variables Z ij rsb . The formulation in Equations (6)- (13) intuitively extends the previous to incorporate headways as control variables, but it has the drawback of being quadratic in its objective function due to the product of Z ij rsb (X i r − X i r−1 ). However, it is linearized by a change of variables and additional constraints. Let y ij rsb be as Z ij rsb (X i r − X i r−1 ) in (6), so the objective turns out to be 1 , which is linear. Moreover, two equations per each y ij rsb variable must be included: Within a maximization problem, variables y ij rsb will take a value as high as possible. H i is an upper bound for (X i r −X i r−1 ) because of Equation (11), so whenever Z ij rsb = 1, the second equation results, y ij rsb ≤ H i , and it is the first equation that guarantees y's value to be (X i r −X i r−1 ) at most, which is then exactly the value that the variable y ij rsb will take. Conversely, when Z ij rsb = 0, the second equation forces y ij rsb to be 0. This behavior replicates that of Z ij rsb (X i r − X i r−1 )'s product, so the change of variables is equivalent and linear.  Table 1 summarizes both problem variants. Variables X i 1 (i.e., offsets) are the control variables in both proposed problem variants, while the latter adds new control variables X i r with r > 1. The difference is semantic rather than substantive, and we refer to them as X i r in general. The group of variables Z ij rsb are auxiliary in both proposed problem variants. They are used to identify synchronizations as an outcome of control variable X i r values. These variables are only meaningful when stated as Boolean variables, a condition that must be explicitly set in the formulation since it is not achieved otherwise. Finally, problem variant #2 introduces y ij rsb variables, which are also auxiliary and number as many as those of Z ij rsb . They are used to obtain to a purely linear alternative formulation for the second problem variant.
Unlike Z ij rsb variables, the domain of X i r and y ij rsb is that of the real numbers, since their values are a consequence of active constraints in an optimization process. However, whenever parameters that define an instance are integers, optimal X i 1 values Equations (1)-(5) are to be integers as well, as they are the result of pushing the objective function variables against integer constraints. The situation is not so for Equations (6)-(13), because α values greater than 0 generally lead to non-integer bounds in both proposed problem variants.

Exact and Evolutionary Computation Methods for the BSP
This section presents the exact and evolutionary computation approaches developed to address the BSP, considering extended transfer zones.

Mathematical Programming
The exact method for solving the formulated model was developed by combining several tools. Each instance in the input dataset was imported into MATLAB matrices, just as it was for each solution (externally found). In this way, results could be easily analyzed, debugged, and post-processed. The software version of MATLAB is R2015a-8.5.0.
Some parameters of the models formulated in both proposed problem variants are to be computed during the MIP construction itself. The most notorious is the value of the parameter M, which has to be large enough to prevent (2), (3) or (7), (8) from being unfeasible but not so large to lead to numerical issues. Because of this, we decided not to use AMPL or other algebraic modeling language to feed the solver. Instead, we developed a C++ program to read instances in the input dataset and convert them to CPLEX LP-format. IBM(R) ILOG(R) CPLEX(R) Interactive Optimizer 12.6.3.0 was used as the optimization tool. The default GAP tolerance for the MIP solver is 0.01%. This corresponds to the relative distance between the best integer solution found and the best upper bound estimated up to that moment: is its objective function value, and bestBound is the lowest upper bound found for the optimum value.

Evolutionary Algorithm
The EA was developed using the Malva library (github.com/themalvaproject, accessed on 15 June 2021) in the C++ programming language. The main implementation details are described in the next subsections.

Solution Encoding
For both problem variants, candidate solutions are represented using integer vectors. Next, both representations are explained.
Problem variant #1: offset optimization. For problem variant #1, each integer value in the solution representation indicates the offset for each bus line, i.e., the time elapsed between time zero (start of the planning period) and the time when the first trip of the line departs. A candidate solution of the BSP is represented by a vector X = {X 1 1 , X 2 1 , . . . X n 1 } (n is the number of lines in the instance), X i 0 ∈ Z + , and 0 ≤ X i 0 ≤ T mod H i . Figure 4 describes the solution representation for a problem instance with n bus lines.  Figure 5 describes the solution representation for a problem instance with n bus lines (offset values for each line are marked in blue font).
X 2 1 · · · X 1 f 2 · · · · · · X n 1 X 1 n · · · X n f n line 1 line 2 line n Evolution model. The EA follows the (µ + λ) evolution model [20]. λ offsprings are generated from µ parents, and all compete among themselves to select the solutions to include in the population in a new generation. The (µ + λ) evolution model computed better and more diverse solutions than a standard generational model in configuration experiments.
Initialization operator. The initialization generates random solutions, selecting integer values within the corresponding ranges and taking into account the problem constraints. The random initialization operator is conceived to generate an appropriate diversity for the evolutionary process.
Selection operator. A tournament method is used for selecting individuals during the evolutionary search. A tournament size of three individuals was used (just one individual survives). The tournament operator outperformed the standard proportional selection in configuration experiments, providing a proper selection pressure for candidate solutions during the evolutionary process.
Recombination operator. An ad hoc variant of the well-known two-point crossover operator was designed. Crossover points are randomly selected in [1, n−1], and information from both parents is exchanged between the crossover points. The main idea is to maintain those features of bus lines synchronized in the parent to be used as relevant information for offspring generation. The recombination probability is p R . A correction procedure is applied to guarantee the feasibility of the generated solutions by properly shifting conflicting offset and/or headway values.
Mutation operator. An ad hoc variant of Gaussian mutation is applied. Selected position(s) in an individual are changed according to a Gaussian distribution and considering the limits (minimum and maximum) defined for both offsets and headways of each line. The mutation probability is p M .

Experimental Evaluation
The experimental analysis of the exact and evolutionary methods for the BSP is reported in this section.

Methodology
This subsection describes the methodology applied for the experimental evaluation of the proposed methods to solve the BSP.

Problem Instances
The experimental evaluation considered realistic problem instances, generated using real information from the case study, i.e., the STM in Montevideo, Uruguay.
Several data sources were considered to gather information to build the BSP instances. All the information about the bus network (lines, routes, real timetables, bus stops locations, etc.) was retrieved from the National Open Data Catalog (catalogodatos.gub.uy, accessed on 15 June 2021). The real information about transfer demands was provided by the city administration. All the gathered data were analyzed applying urban data analysis [7]. Several elements define the scenario and the problem instances built:

•
The starting time was set to 12:00 p.m., and the final time was set to 2:00 p.m., including one of the two peak hours occurring in a working day for the public transportation system in Montevideo [21]. • The transfer demand function (P) is generated from real information registered by the smart cards of the STM.

Execution Platform
Experiments were carried out on a Quad-core Xeon E5430 at 2.66GHz, 8 GB RAM, from the high-performance computing infrastructure of the National Supercomputing Center (Cluster-UY), Uruguay [22].

Baseline Solution for the Comparison
A significant solution to be used as baseline for the comparison is the real timetable used in the transportation system of Montevideo. This solution determines the current level of service regarding both direct trips and transfers. The real timetable does not apply an explicit synchronization approach for transfers. The first trip of each line departs when the planning period starts, and subsequent trips depart according to the real predefined headways for each line. The comparison with the real timetable allows determining the advantages of the explicit optimization approaches proposed for the studied BSP variants.

Statistical Analysis and Metrics
The effectiveness of the proposed EA is assessed via proper statistical analysis. For every problem instance and experiment, 30 independent executions (i.e., using a different seed for the random numbers generator) were performed. Then, a three-step analysis is applied to study the fitness results distributions: 1.
Normality check. The Shapiro-Wilk statistical test is applied to determine if the results distribution is modeled by a normal distribution, i.e., computing the likeliness of the underlying randomness to be normally distributed.

2.
Mean rank comparison (for parametric configuration analysis). The Friedman rank statistical test is applied to detect/analyze the differences in the distributions of fitness values across multiple executions.

3.
Pairwise comparison of distributions. The Kruskal-Wallis non-parametric test is applied to determine if the differences between two parameter configurations have statistic significance.

4.
Boxplots are used for results visualization and graphical comparison. Relevant order statistics are computed and reported: first quartile (Q1), third quartile (Q3), median, minimum, and maximum values, since the Shapiro-Wilk test confirmed that the results do not follow a normal distribution. The interquartile range (IQR) is used as a measure of statistical dispersion, as usual for non-normal distributions.
Several metrics are used for evaluating the exact and the evolutionary approach: (i) the number of successfully synchronized trips for passengers (i.e., the objective function defined in Equations (1) and (6)), (ii) the improvements over the real solution, and (iii) the average waiting for transfers in a transfer zone.

Parameter Setting
Exact Mathematical Programming. The default set of parameters of CPLEX was used to find exact solutions, except for the timeout parameter, which was set to 7200 seconds. Variant #2 with α = 0 is equivalent to variant #1, where headways are fixed and equal to F i . However, problem formulations are not equal nor are the proposed methods to solve them. Variant #2 implies significantly more variables than variant #1, so it is worth checking that algorithms are also able to rapidly find solutions when α = 0. Such instances were solved within a few seconds, so the timeout limit only applies for α = 0.3. It was verified, though, that the performance of the more general variant #2 is quite good for very low values of α, comparable to the performance of the much simpler version #1. Runtimes degrade for higher values of α, and in fact, the best solutions found for α = 0.3 were not proven to be optimal according to the defined tolerance for the MIP solver (0.01%). There is room to explore changes in parameters in order to improve performance, which is appointed as a future line of work.
Evolutionary algorithm. Since EAs are non-deterministic, parameter configuration analysis is mandatory to find the proper combination of parameter values to compute the best results. Studied parameters included population size (ps), recombination probability (p R ), and mutation probability (p M ). Experiments were performed on five small problem instances with different features to avoid bias. Candidate values considered for each parameter were ps ∈ {15, 25, 50}, p R ∈ {0.5, 0.75, 0.9}, and p M ∈ {0.001, 0.01, 0.1}.
A summary of the results obtained in the analysis of the population size is presented in Table 2. Results of median and best fitness values are reported for each population size. The Friedman rank test is applied to analyze the results distribution. Results in Table 2 demonstrate that the best results were computed using a population size of 25 individuals. Using a larger population size turned to be counterproductive for the EA, since suboptimal solutions dominate quickly and results do not improve in the long term. Table 3 reports the best fitness values and the Friedman ranks of the nine configurations (C1, . . . , C9) considered in the analysis of the recombination and mutation probabilities. The p-value of the Friedman rank was below 10 −3 , thus assuring statistical significance of the results. In turn, the boxplots in Figure 6 graphically compare the median, best, worst, Q1, and Q3 values obtained for each studied configuration in the problem instance, which is representative of the results computed for other instances too. According to the reported metrics, the best results were obtained with configuration C2, i.e., setting PS = 25, p R = 0.5, and p M = 0.005. Configuration C2 systematically obtained the best Friedman ranking in all but one of the studied problem instances.   Figure 6. Boxplot analysis of parameter configurations for a representative BSP instance.

Numerical Results
This subsection reports the numerical results of the proposed methods for the BSP and the comparison with the baseline real timetable for both studied problem variants. Table 4 presents the objective function values achieved by the exact and the EA for the studied BSP instances for variant #1 (offset optimization). The real timetable is reported as baseline for the comparison. Column 'EA vs. exact' compares the results of both proposed approaches (a negative percentage value means a smaller function value for the EA solution). Relative improvements over the real timetable (in percentage values) are also reported. Column 'EA vs. real' reports the relative improvement over the real timetable for the EA solution, and column 'exact vs. real' reports the relative improvement over the real timetable for the exact solution.

Problem Variant #1: Offset Optimization
Results in Table 4 demonstrate that both exact and EA significantly outperform the baseline real solution in all problem instances. EA improved the real solution up to 66,3% in instance 40.37.30.1, and the exact method improved over the real solution up to 66.6% in instance 70.63.30.2. When comparing the EA and the exact solutions, the evolutionary approach demonstrated to be highly efficient at solving the problem: it computed the optimal solution in 54 out of the 75 problem instances studied. In all other instances, the distance to the optimum was below 1%. Overall, improvements over the real timetable were 24.5% (on average) for EA and 24.6% (on average) for the exact solution. The average improvements of both exact and EA over the baseline solutions, grouped by tolerance and scenario size, are reported in Table 5. Regarding tolerance levels, improvements of EA and the exact algorithm over the real timetable were up to 57.56% and 57.96%, respectively, in scenarios with λ = 30, which pose the tighter bounds for user waiting times. In less restrictive scenarios, both studied methods improved over 9.74% in regard to the real timetable. Regarding the scenario dimension, both methods computed robust solutions that improve between 23.88% (EA, for scenarios with 30 transfer zones) and 25.72% (exact, for scenarios with 110 transfer zones). The reported improvements on the objective function values grouped by tolerance and problem dimension indicate that for all sizes, the improvements over the real timetable increase as user tolerance decreases. This is a relevant result, which demonstrates that the optimization methods properly scale when the complexity of the problem increases, providing solutions with a better quality of service. Improvements also slightly increase for larger instance dimensions.

Problem Variant #2: Offset and Headways Optimization
Problem variant #2 imposes a significantly harder challenge for the studied optimization algorithms. Control variables, originally spanning a sole variable per line (offsets), also incorporate several more variables per line (headways). In addition, as mentioned in Section 4.3, in the exact model, the number of variables increases not only because of control variables (i.e., offsets and headways) but also because of auxiliary variables (i.e., synchronizations and the objective's contributions), half of which are Boolean. In turn, for the EA, the number of variables increased more than 20 times, and a correction procedure is needed to repair non-feasible solutions generated by the evolutionary operators. Both facts result in a larger and more complex search space for optimization. Table 6 reports the results of the exact algorithm for BSP variant #2. Each scenario has an associated row in the table. Columns #vars, #varsCtl, and #varsBool, respectively, correspond to the total number of variables, the number of control variables (all of which are real variables), and the number of Boolean variables. The column named best-sol reports the value of the objective function for the best solution found before timeout. Column GAP corresponds to the estimated worst-case gap to the optimum, that is, the relative gap between the reported solution and the lowest upper bound for the optimum estimated up to that moment. Finally, column time to solution shows the time required for the solver to find that solution.

Scenario #vars #varsCtl #varsBool Best-Sol GAP Time to Solution
Since no instance was solved to optimality when α = 0.3, the overall runtime was 7200 s for all instances. For example, in instance 30.40.50.4, the best solution was found in 328 seconds, and it was not improved in the remaining 6872 seconds. The rest of the processing was spent to improve the gap to the lower bound. GAPs reported in Table 6 are always best-gaps, i.e., the lowest gap, the last reported in logs before timeout.
The remarkable facts of the results in Table 6 are: (i) control variables only represent between 2.5% and 5.3% of all variables, meaning that most resources in the exact approach are put into the auxiliary variables needed to reach the linear mixed integer programming formulation; (ii) the total number of variables is highly correlated with both the number of lines and bus stops; and (iii) although gap and time to solution are both correlated with the number of control and total variables, those correlations are higher for the total number of variables. The last two observations arise from computing Pearson's linear correlation coefficient over the whole set of instances used as a dataset: number of bus stops, number of lines, #vars, #varsCtl, GAP, and time to solution, as in Table 6.
The most notorious difference between variant #1 and variant #2 comes from the value of α, not from the higher number of variables. The number and type of variables in Equations (6)- (13) are not affected by α, but CPLEX time-to-optimal is in the order of the second for α = 0 (no headway tolerance at all). Conversely, when α = 0.3, none of the solutions is proven optimal within the 2 h timeout period, and only in 7 out of the 75 instances could the best solution be found before 20 min. Thus, the increased size and complexity of the search space coming from higher values of α have a much higher impact in the performance than the number of variables.
The objective function values achieved by exact and EA for the studied problem instances in BSP variant #2 are reported in Table 7. The real timetable is reported as baseline for the comparison, and columns 'EA vs. exact', 'EA vs. real', and 'exact vs. real' summarize the comparison, as in Table 4. Results in Table 7 demonstrate that both exact and EA significantly outperform the baseline real solution in all BSP instances. EA improved over the real solution up to 155.9% in instance 30.37.30.1, and the exact method improved over the real solution up to 179.9% in instance 30.40.30.0. The comparative analysis of EA and the exact solutions indicates that the evolutionary approach demonstrated to be highly efficient at solving the problem: on average, the distance to the exact solution was below 1.32%. Overall, EA improved 63.4% (on average) over the real timetable, and the exact method improved 66.2% (on average) over the real timetable.
The average improvements of exact and EA over the baseline real solution, grouped by tolerance and instance size for BSP variant #2, are reported in Table 8. Regarding tolerance levels, EA improved up to 150.95%, and the exact method improved up to 158.13% over the real timetable. The best improvements were computed for λ = 30, which pose the tighter bounds for user waiting times. In less restrictive scenarios, improvements over the real timetable were over 25.23% for both methods. Regarding scenario dimension, both methods computed robust solutions that improve between 58.50% (EA, instances with 110 transfer zones) and 70.25% (exact, instances with 30 transfer zones) over the real solution. The reported improvements on the objective function values grouped by tolerance and problem dimension demonstrate that for all BSP instances, the improvements over the real timetable increase as user tolerance decreases. Similar to problem variant #1, the optimization algorithms properly scale with the complexity of the problem, computing better solutions for tighter instances.
The proposed EA computed the same solution as the exact method in 33 problem instances (10 with 30 transfer zones, 18 with 70 transfer zones, and 5 with 110 transfer zones). Furthermore, in another 14 instances, the difference was below 1%, and in two problem instances, the EA computed a better solution than the exact method. Table 9 reports the average objective values (normalized by the number of transfer zones) for all scenarios, grouped by tolerance. Results show that both EA and the exact methods significantly improve over the baseline real solution. For BSP variant #1, the largest difference is 1.72 (4.68 -2.97) between the exact approach and the real solution, when low user tolerance is considered (λ = 30). Similar to the results reported in Table 5, the graphic demonstrates that improvements of the optimization methods are better in tighter BSP instances. For BSP variant #2, the best improvement of the exact method was 4.45, and the best improvement of EA was 4.45, both when λ = 30. Results are graphically presented in Figure 7 for BSP variant #1.  Three quality of service metrics (r, l s , and l n ) are considered in the results reported for the computed solutions in Table 10, grouped by dimension (NP) and tolerance (λ). Metric r is the ratio between the average waiting time computed by the proposed timetable schedules and the threshold value that passengers are willing to wait for a transfer in each transfer zone (W ij b ). The r metric evaluates the number of successful synchronized trips, which are represented by r ≤ 1.0, whereas unsuccessful synchronizations are represented by r > 1.0. Metric r also allows evaluating how far from acceptable (i.e., synchronized) the trips of two lines are, considering the deviation from the ratio defined by the threshold value passengers are willing to wait (1.0). In turn, metric l s is the average number of successfully synchronized lines, whereas metric l n is the average number of not synchronized lines. All metrics are reported for each instance class regarding dimension and tolerance. Results in Table 10 indicate that both studied optimization methods significantly improved the quality of service when compared with the real solution. Both the exact method and the EA computed lower values of the waiting time metric for all instance classes. The best improvements of the studied methods over the baseline real solution occurred for problem instances with NP = 70/110 and λ = 30, where both exact and EA computed solutions with a higher number of synchronized lines. In turn, the largest number of synchronized lines were computed by the exact method in problem instances with NP = 110 and λ = 30. Furthermore, better waiting time values were obtained by both exact and EA in problem instances with lower waiting time tolerance from users, with respect to the baseline real solution.

Quality of Service Results
A comparison of waiting time values for transfers of all lines (normalized by W b ) is presented in the histograms in Figure 8. Results correspond to scenario 70.63.30.2, which is representative of the results computed for other studied BSP instances. Normalized average waiting time results are reported for the baseline real solution, the EA solution for BSP variant #1 (α = 0.0), and the exact solution for BSP variant #2 (α = 0.3). Values of the normalized average waiting time over 1.0 mean that for a number of lines in the reported scenario, the optimization methods were not able to find a combination of offset and headway values that allows lowering the waiting time for transfers under W b . This occurs as a direct consequence of the inter-related design of bus lines and the behavior of passengers in each scenario, since when adjusting offsets and headways to achieve successful transfers between line i and line j in transfer zone b (i.e., for those transfers avg(wait) < W b ), those lines are not synchronized (at least for some trips) with other lines on other transfer zones b * (i.e., for those transfers, avg(wait) > W b * ). The histograms in Figure 8 report both successful synchronizations (avg(wait) < W b , on the left side of each histogram) and non-successful synchronizations (avg(wait) > W b , on the right side of each histogram).
Results reported in the histograms in Figure 8 demonstrate that the exact solution effectively reduces the waiting time for a significant percentage of lines in the studied instance. The exact solution has more lines with an average waiting time lower or equal to 1 (i.e., 23 in the exact solution vs. only 6 in the real solution). Furthermore, five lines in the exact solution have an average waiting time less than 0.5 of W b , whereas there is none in the real solution. Regarding larger waiting times, only 3 lines in the exact solution have more than 1.5 of the maximum value for a successful synchronization, whereas in 13 lines of the real solution, passengers must wait 1.5 more than expected to perform a multi-leg trip. The comparative results demonstrate that the solution found by the exact method synchronizes a larger number of lines, thus improving the QoS. Solutions computed by the proposed EA have a similar behavior regarding waiting times, as reported in Table 10. Similar results were obtained for other studied BSP instances, and even better results were achieved in the most restrictive instances (λ = 30).  Finally, regarding execution time, both the exact and evolutionary approaches were able to compute accurate solutions in very short execution times. For problem variant #1, the execution times were less than 10 seconds. For problem variant #2, the time limit of the exact method was two hours, and the EA found the best solutions in less than five minutes.

Concluding Remarks
This article studied the bus timetabling synchronization problem. The problem model extends existing ones by defining extended transfer zones for every pair of bus stops. In turn, an improved model was included to account for the transfer demands of passengers, and two variants of the problem were addressed: optimizing the offset for each line, and optimizing the offset and headways for each line. For the second problem variant, the optimization domain was restricted to a certain deviation on the real headways defined for the case study to provide a proper quality of service to users.
An exact MILP approach and an EA were proposed to solve the problem. Both methods were evaluated for a real case study for the public transportation system in Montevideo, Uruguay. A total number of 75 scenarios were defined using real data about lines, headways, and transfer demands, gathered from the intelligent transportation system of Montevideo. Results were compared with the real timetable currently implemented by the city administration.
The empirical evaluation indicated that both proposed methods are able to compute solutions that are significantly improved over the real current timetable in Montevideo. The exact method computed the optimal solution for all instances in problem variant #1, improving successful synchronizations up to 66.6% (24.6% on average) over the real timetable. The EA was highly efficient for this problem variant, too, improving the synchronizations up to 66.3% (24.5% on average) over the current timetable. Problem variant #2 poses a significantly harder challenge for optimization, as the number of variables significantly increases for each considered scenario. Despite this, the exact method was able to compute an accurate solution, which provides significant improvements on the number of successfully synchronized trips over the real timetable. Improvements up to 179.9% (66.2% on average) for the exact methods and up to 174.8% (66.2% on average) for the EA. The proposed optimization methods also improved relevant quality of service metrics, such as the average waiting times for transfers, especially in tight problem instances, which reduced up to 57.8% for BSP variant #1 and up to 158.3% for BSP variant #2.
Both methods were efficient regarding computing times. The exact method provided accurate objective function values within less than a second for problem variant #1. In turn, the proposed EA is useful for solving larger problem instances of problem variant #2 in reduced execution times.
Future lines of research are related to addressing other BSP variants, e.g., explicitly focusing on the perceived waiting times for users, and modeling the real demand for direct trips from real data on ticket sales provided by the transportation system. Multi-objective versions of the problem can be devised, too, by including the explicit optimization of other relevant functions, such as the operation cost of the bus fleet and other quality of service metrics.