Optimal transport in multilayer networks for traffic flow optimization

Modeling traffic distribution and extracting optimal flows in multilayer networks is of utmost importance to design efficient multi-modal network infrastructures. Recent results based on optimal transport theory provide powerful and computationally efficient methods to address this problem, but they are mainly focused on modeling single-layer networks. Here we adapt these results to study how optimal flows distribute on multilayer networks. We propose a model where optimal flows on different layers contribute differently to the total cost to be minimized. This is done by means of a parameter that varies with layers, which allows to flexibly tune the sensitivity to traffic congestion of the various layers. As an application, we consider transportation networks, where each layer is associated to a different transportation system and show how the traffic distribution varies as we tune this parameter across layers. We show an example of this result on the real 2-layer network of the city of Bordeaux with bus and tram, where we find that in certain regimes the presence of the tram network significantly unburdens the traffic on the road network. Our model paves the way to further analysis of optimal flows and navigability strategies in real multilayer networks.

transportation modes and is computationally efficient. While here we focus on transportation networks, our method is applicable to a broader set of practical applications involving flows on multilayer networks.

What makes multilayer networks different than single-layer in transportation
Having given the broader context for our work, we now highlight the main features of transport on multilayer networks. The presence of edges between layers (inter-layer edges) makes a multilayer network fundamentally distinct from a standard single-layer one, as these edges allow passengers to switch between transportation modes. However, this is not the only difference. In fact, in a multilayer network, the various layers have different characteristics. The main one is that the type of transportation cost varies across layers. For example, the cost to build and maintain the infrastructure differs depending on the transportation mode, with subway or rail tracks costing more than a road network. Moreover, the cost assigned to traffic congestion is also different, as road networks are more sensitive to traffic bottlenecks than rail ones. In addition, also the power dissipated differs depending on the means of transportation, as running a tram generally produces less CO 2 emissions than running a bus. All these different features impact the results of an optimal transport problem, as the network features contributing to the cost function to be optimized vary with layers, and thus also the optimal solution.
Finally, the network topologies themselves vary with layers [41], as a bus network has many edges with short lengths, while a rail network tends to have fewer edges but longer. In addition, the weights assigned to each edge differ based on the layer, which can induce a coupling between layers [42].

A. Multilayer transportation networks
In general, a multilayer network is represented as a graph G({V α } α , {E α } α , {E αγ } α,γ ), where V α and E α are the set of nodes and edges in layer α, respectively, and E αγ is the set of edges between nodes in layer α and nodes in layer γ. Here α = 1, . . . , L and L is the number of layers. We denote with N α = |V α | the number of nodes in layer α, and with E α = |E α | the number of edges in layer α, E αγ = |E αγ | is the number of edges between nodes in layer α and γ. Finally, we denote with V 0 = ∪ α V α the total set of nodes, E 0 = (∪ α E α ) ∪ (∪ αγ E αγ ) the total set of edges, and N 0 = |V 0 | and E 0 = |E 0 | their cardinalities. We assume that edges have lengths l e > 0, which determine the cost to travel through them.
Transportation networks are relevant examples of this type of structures, where nodes are stations, edges are connections between stations and layers are transportation modes, for instance rails or bus routes. A convenient way to represent multilayer network is with two tensors [43]: i) an intra-layer adjacency tensor A with entries A α uv = 1 if there is an edge between nodes u and v in layer α, and 0 otherwise. We refer to this type of edges as intra-layer edges; ii) an inter-layer adjacency tensorÂ with entriesÂ αγ uv = 1 if there is an edge between node u in layer α and node v in layer γ, and 0 otherwise. Without loss of generality, in our applications we haveÂ αγ uv = 0 if u = v, meaning that different layers are connected solely by shared nodes. We refer to edges connecting nodes in different layers as inter-layer edges. In the case of transportation networks, the main application studied here, a station could have a bus stop, a train platform and a subway entrance, which allows passengers to switch between communication modes within the same station. For example, one can think of an inter-layer edge as the stairs connecting the subway entrance with the entrance to the train station. Typically, inter-layer edges are thus much shorter than intra-layer edges.
In case of multilayer networks, we need to be careful on how stations connecting multiple transportation modes are represented. In fact, if an entry station connects more than one layer, we may not be able to distinguish in what layer a passenger enters. In other words, if a node u belongs to more than one layer, i.e. a node u α exists for more than one value of α, we may not be able to tell whether the passengers entering u entered from u α , u γ or from any of the other instances of node u in the various layers. To alleviate this problem, we build auxiliary super nodes u, which do not belong to any layer in particular but instead connect the various instances of the same node in the various layers together. Specifically, we remove all the inter-layer edges (u α , u γ ) and replace them with auxiliary inter-super edges (u α , u) connecting all the instances u α of node u with the super node u, as in a star graph, so that the original edge (u α , u γ ) has been replaced by a 2-edge path {(u α , u), (u γ , u)}.
This auxiliary structure allows the model to allocate in an optimal way passengers along the inter-super edges when they enter from a station with connections to more than one layer, thus avoiding to select arbitrary entrances a priori. This becomes relevant in applications where the cost to travel along inter-layer edges is non trivial. For instance, in situations where changing connection impacts the comfort of the passengers. Moreover, the introduction of super nodes and edges facilitates how we represent the multilayer network. In fact, by adding these auxiliary super nodes and inter-super edges, we only need to consider an individual network adjacency matrix A, instead of two separate tensors. This matrix has entries A uv = 1 if an edge exists between nodes u and v and 0 otherwise, where a node u can be a node u α in layer α or a super node u. The set of nodes will then be V = V 0 ∪ V super , where V super is the set of super nodes, and |V super | = N super is their number, which corresponds to the number of nodes that belong to more than one layer. Similarly, the new set of edges is E = (∪ α E α ) ∪ E super where E super is the set of inter-super edges. The final numbers of nodes and edges are N = |V| = N 0 + N super and E = |E| ≥ E 0 . Notice that this construction is equivalent to assume that the network has L + 1 layers, where the extra layer is made of inter-super edges E super and all nodes incident to them (without loss of generality, we assume that all the inter-super edges are treated equally). We denote it as super layer and this corresponds to α = L + 1, so that E L+1 ≡ E super . We show an example of this structure in Figure 1.
Finally, we consider a coupling between layers as in [42] that controls how the layers are linked. Specifically, we multiply the lengths of each edge by a factor w α ∈ [0, 1] that depends on what layer the edge belongs to. For convenience, we introduce q e ≡ q e (α) taking values q e = α, for each e ∈ E α and with α = 1, . . . , L + 1. Using this, we define the resulting length as e := w qe l e . This ensures that edges in different layers can be navigated differently. If we interpret w α as the inverse of a velocity, then e is proportional to the time needed to travel along edge e, which can be seen as an "effective" length. When w α < 1 and w γ = 1, a passenger takes less time to travel along an edge of length l e in α than one in γ. Typically, e are small for inter-super edges. Nevertheless, one can tune the cost to travel along them by tuning w L+1 .

B. The model
We consider the formalism of optimal transport theory, and in particular recent works that maps the setting of solving a standard optimization problem into that of solving a dynamical system of equations [24][25][26][27][28][29][30]40]. Specifically, we model two main quantities defined on network edges: i) fluxes F e of passengers traveling through an edge e; ii) conductivities µ e , these are quantities determining the flux passing through an edge e. Intuitively, the conductivity µ e of an edge can be seen as proportional to the size of the edge e. To keep track of the different routes that passengers have, we consider a multi-commodity formalism as in [40], i.e. we distinguish passengers based on their entry station a ∈ S, where S ⊆ V is the set of stations where passengers enter, we denote with M = |S| the number of passenger types. With this formalism, we have that the fluxes F e are M -dimensional vectors, where the entries F a e denote an amount of passengers of type a traveling on edge e. The important modeling choice is that the conductivities µ e are shared between passengers, thus they are scalar numbers contributing to the cost for all passengers' types traveling trough e. This formalism can be equally applied to both edge types, intra-layer and inter-super edges.
We assume that fluxes are determined by pressure potentials p a u defined on nodes as: We model the amount of passengers entering a station a with a positive real number g a . For notational convenience, we define a N × M dimensional matrix of entries g a u such that g a u := 0 if u = a, and g a u := g a if u = a. Similarly, we define with h a u the amount of passengers of type a exiting at node u. Here the only constraint is that h a u = 0 if u = a, to avoid unrealistic situations were passengers entering in one station exit from the same station. Finally, we define the N × M -dimensional source matrix with entries S a u = g a u − h a u , which indicates the amount of passengers of type a entering or exiting a station. Notice that for each a ∈ S we have u S a u = 0, meaning the system is isolated, i.e. all the passengers of a certain type who enter the network also exit.
With this in mind, we enforce mass conservation by imposing Kirchhoff's law on nodes. To properly enforce this constraint we need to consider all the edges, both intra-layer and inter-layer edges. This can be compactly written by considering the multilayer network signed incidence matrix B with entries B ve = 1, −1, 0 if node v ∈ V is the start, end of edge e ∈ E, or none of them, respectively. With this in mind, Kirchhoff's law can be written as: Finally, we assume that the conductivities follow the dynamics: where q e encodes the type of edge, as defined in Section II A. The parameter 0 < β qe < 2 is important as it determines the type of optimal transport problem that we aim to solve, as we describe in more detail later. Interpreting the conductivities as quantities proportional to the size of an edge, this dynamics enforces a feedback mechanism such that the edge size increases if the flux trough that edge increases, it decreases otherwise. This feedback mechanism has been observed in biological networks like the one made by the slime mold Physarum polycephalum [24,44], which adapts its body shape to optimally navigate the space searching for food. The important property of this dynamics is that its stationary solutions minimize a multilayer transport cost function: where Γ (β α ) = 2 (2 − β α ) / (3 − β α ) for all α and the 2-norm is calculated over the M entries of each F e . This means that solving the systems of Eqs. (1) to (3) is equivalent to finding the optimal trajectories of passengers in a multilayer network, where optimality is given with respect to the cost in Eq. (4). An extended discussion and a formal derivation of this property can be found in [32,40]. The parameter β qe (taking value β α on layer α) regulates how the fluxes should distributes in each of the layers. In fact, according to Eq. (4), when β α > 1, the fluxes are encouraged to consolidate into few edges of layer α, being Γ (β α ) < 1 and thus the cost in Eq. (4) sub-linear. In the opposite scenario, when 0 < β α < 1, we have that the fluxes are encouraged to distribute over more edges and with lower values, in order to keep traffic congestion low. Finally, when β α = 1 we obtain shortest path-like minimization. The consequence of having different β α in different layers is that the optimal trajectories will have different topologies in each of the layers. At the same times, layers are coupled together, thus the final trajectories will be a complex combination of the weights w α and the β α . We give an example of optimal flows for various combinations of these parameters in Figure 2.

C. The algorithmic implementation
The numerical implementation consists in initializing the µ e > 0 at random. Then one iterates between i) extracting the pressure potentials (or the fluxes) using Eqs. (1) and (2); ii) use these to recompute the µ e by means of Eq. (3), this can be solved numerically with a finite difference discretization. The iteration is repeated until convergence. In our experiments, we terminate a run of the algorithm when the difference J general, hence the solution of Algorithm 1 may converge to a local optima. One should then run the algorithm several times, each time initializing to a different random initial realization of µ e > 0. A possible choice for a final optimal solution would be the one that has lower J β . We give the pseudocode for this in Algorithm 1, this is complemented with the block diagram in Figure 3. Most of the computational effort required by Algorithm 1 is in the solution of M linear systems as in Eq. (2). In our implementation, this has been performed by a sparse direct solver (UMFPACK), performing a LU decomposition of each column of the right hand side of Eq. (2), and having complexity scaling as O(M N 2 ). The resulting {F e } capture how passengers travel along the network via optimal trajectories. The norms ||F e || 2 measure the total amount of passengers along an edge e.

A. Results on synthetic data
We show how the model works on synthetic data where each layer is planar, to mimic realistic scenarios of transportation networks in space. We generate 2-layer networks and the source matrix S as done in [42]. Specifically, we generate one layer by randomly placing N nodes in the square [0, 1] × [0, 1] and then extract their Delaunay triangulation [45]. We then select a subset of nodes and use this to build the second layer, with an analogous procedure. An example of this is given in Figure 2. After having constructed the network topology, we assign entry and exit stations to each node in the network starting from a monocentric scenario where all passengers exit from a central station, regardless their origin. We then randomly re-assign with a probability p ∈ [0, 1] the exit station of each set of passengers. When p = 0 all the passengers travel to the city center, while when p = 1 the destinations are assigned completely at random. We generate 20 networks with N 1 = 100 and N 2 = 10, so that layer 1 has on average shorter edges than layer 2. For each sampled network we take 50 random samples of S. We consider p ∈ {0.2, 0.8} to study two opposite situations of having a majority or a minority of the passengers directed to a common central node. Then, we fix w 1 = 1 and vary w 2 ∈ {0.2, 0.8}, to mimic a scenario where traveling on the second layer is faster. Overall, with these combinations of parameters, we obtain 2-layer networks that resemble a road-rail network. With this in mind, we run our model with the following combination of parameters for the dynamics: (β 1 , β 2 ) ∈ {(0.5, 1.1), (0.5, 1.3), (0.5, 1.5), (1., 1.)}. This is because we expect to penalize traffic congestion in a road network, hence β 1 = 0.5. Instead, a rail network is less sensitive to traffic but it may cost more to build connections, thus once should consolidate traffic along fewer edges, hence β 2 > 1. The case (β 1 , β 2 ) = (1., 1.) is used as a baseline for comparison with shortest path-like optimization.
We measure how passengers distribute along the optimal trajectories to assess how the network operates under various regimes of w and β. For this, we consider ||F e || 2 and measure the distribution of this quantity along the edges, to see how this varies across parameters' values and in each of the two layers. In addition, we calculate the current flow edge betweenness centrality (FBC) [46], which captures how important an edge is based on how many passengers travel through it. This is different than the standard edge betweenness centrality [47] in that it considers random paths connecting two points, instead of only the shortest paths. We argue that FBC is more appropriate in our case as the shortest paths may not be the optimal trajectories where passengers travel. We calculate the weighted version of FBC, where the edge weigth is ||F e || 2 , so that the random paths are more likely to follow edges with higher flux. We use the Gini coefficient Gini ∈ [0, 1] to characterize the disparity in the flow assignment along edges. We consider the following definition [48]: where r, q denote edges, x is the quantity we want to measure this coefficient with andx = e x e /E is its average value. Here we use x e = ||F e || 2 and x e = F BC e . When Gini is close to one, most of the flow passes through few edges. Instead, when Gini is small, the flows are distributed evenly across edges. Looking at Figure 4, we see that Gini increases with β 2 and thus the network usage becomes more hierarchical, as expected in this case (we report here results for Gini w.r.t. the flux, but similar results are observed for FBC, see Figure S1). The exact value of Gini depends on the travel demand, as for p = 0.2, i.e. when the central node is a destination in 80% of the journeys, Gini is higher than when p = 0.8. This is because with fewer destinations there are also fewer possible path trajectories, and thus more passengers use the same part of the network. We can also see how Gini decreases for higher w 2 , i.e. when traveling by tram is not much faster than traveling on the road network. Finally, we can notice the drop in Gini compared to the shortest path-like scenario β 1 = β 2 = 1. In this case, the traffic distribution is the most hierarchical, suggesting that possible traffic congestions can be avoided by setting lower values of β 1 .

FIG. 4:
Results on synthetic data. We show the Gini w.r.t. the optimal ||F e || 2 (y axis) vs β 2 (x axis) for synthetic 2-layer networks generated as in Section III A. Blue and red markers denote p = 0.2, 0.8, respectively, w 1 = 1 in all cases, while w 2 = 0.2 (left) and w 2 = 0.8 (right); β 1 = 0.5 in all cases, except for the case where β 2 = 1, for which β 1 = 1. This case is a shortest path-like baseline. Markers are averages over 20 network samples and 50 source matrix samples (for a total of 1000 individual samples).
Our model can be used to simulate traffic distributions under various conditions. In fact, tuning p, {w α } and {β α }, one can simulate disparate scenarios. For instance, in Figure 2 we show results for different parameters' choices on a particular realization of a 2-layer synthetic network. Several conclusions can be drawn from this simple experiment. For instance, the second layer, which ideally can represent a tram network, is only partially used when β 2 = 1.5. This value encourages traffic to consolidate on fewer main connections, simulating the scenario where building the rail infrastructure is expensive. Our model can guide a network manager to decide what edges should be prioritized when designing the network. In this example, we can distinguish what set of edges are the most utilized. These are mainly central edges, but the exact set can change depending on the other parameters. For example, if the travel demand, tuned by p, switch from a monocentric to a more heterogenous set of entry-exit stations, one of the main central edges changes from connecting a periphery to the center, to connecting two locations in the periphery.

B. Results on real data
We illustrate our model on a real 2-layer network of the city of Bordeaux, where the two layers are the bus and tram, respectively. Data are taken from [49]. We simulated a monocentric source matrix S, i.e. p = 0.0, to asses the scenario where all the passengers travel to the city center, however results are similar for other values of p (not reported here). Optimal paths are extracted using our model for β 1 = 0.5, β 2 = 1.5, w 2 = 0.2 and compared against the case where the tram network is absent. This can be simulated by setting a high value of w 2 , so that the cost on the tram edges makes it extremely unlikely to use any tram connection (here we used w 2 = 100). We measure the total percentage flux f 2 = e∈E2 ||F e || 2 /( e∈E1 ||F e || 1 + e∈E2 ||F e || 2 ) passing through layer 2. Remarkably, in this scenario the tram network absorbs f 2 = 17% of the total flow of passengers, even though the tram network contains only E 2 = 112 edges, compared to E 1 = 2347 bus edges. This allows to reduce significantly the traffic along the road network, as can be seen in Figure 5: the road edges, and in particular those parallel to the tram line and close to the city center, get thinner, as more passengers use the tram. This also results in a higher Gini 1 = 0.26 (calculated on edges in layer 1 w.r.t. ||F e || 2 ), compared to the Gini 1 = 0.23 when the tram is absent: as the passengers use the tram, they decrease traffic on many road edges. While the traffic distribution on layer 1 gets more hierarchical (higher Gini 1 ), this does not necessarily lead to more traffic congestion. In fact, the total percentage flow f 1 decreases, as we saw above. Additional plots can be seen in Figure S2. Here β 1 = 0.5 in both cases, while β 2 = 1.5 in the second case. The width of the edges is proportional to the optimal ||F e || 2 . The reported Gini 1 coefficient for the bus network (layer 1) is calculated using ||F e || 2 . The total percentage flux f 2 = e∈E2 ||F e || 2 /( e∈E1 ||F e || 1 + e∈E2 ||F e || 2 ) = 0.17, distributed over E 2 = 112 tram edges, compared to E 1 = 2347 bus edges.

IV. Discussion
We have presented a model that extracts optimal flows on multilayer networks based on optimal transport theory. Our models accounts for different contributions from different layers to the total transport cost by means of a parameter β α . Our modeling choice is relevant in scenarios where passengers can travel using different transport modalities on an interconnected transportation network. We have shown how the optimal distribution of passenger flows on network edges is influenced by different factors. In fact, a complex combination of the parameter β α on each layer, the coupling between layers and the distribution of the origin and destination pairs determine how heterogeneous the flow distributions is inside the various layers. In particular, when β α < 1 in one layer and β α > 1 in another layer, the network topologies are significantly different in the two layers, as in one the traffic is more balanced and distributed along many edges, while in the other traffic is consolidated along few main arteries. To show the potential of our model, we have considered an application to the 2-layer bus and tram network of Bordeaux, showing how the presence of the tram changes the traffic distribution on the road network.

V. Conclusions
In this work we propose a model that uses optimal transport theory to find optimal path trajectories on multilayer networks. By means of the regularization parameter β α , we are able to take into account different contributions from the different layers to the total transportation cost. We illustrate the model on both synthetic and real data and show how the optimal distribution of passenger flows on network edges is influenced by different parameters used for the construction of the model (i.e., w, p, β α ).
In the absence of real data, we simulated the entry and exit destination of passengers. However, if travel demands were known, for instance using mobile data [50], it would be interesting to investigate the distribution of traffic obtained with our model and compare it with real usage data as done in [51]. We have considered a cost assigned on edges where β α tunes the impact of traffic on them, but one can generalize this to include penalties on nodes based on their degrees, as considered in [52]. Our model can be used to extract the main features of multilayer transportation networks [53] or to study the existence of several congestion regimes in both synthetic and real data [21] and investigate how this changes varying β α . Finally, in our experiments we fixed the weight of inter-super nodes to be small. Potentially, one could suitably increase this to account for the cost of changing transportation mode within a journey and use our model to see how optimal trajectories change. This would be relevant in scenarios where passengers' comfort contributes to the total transport cost. To facilitate future analysis, we provide an open source implementation of our code at https://github.com/cdebacco/MultiOT.

Supporting Information (SI)
FIG. S1: Additional results on synthetic data. We show the Gini w.r.t. the optimal F BC (top) and the total percentage flux f 2 on layer 2 (bottom) vs β 2 (x axis), for synthetic 2-layer networks generated as in Section III A; w 2 = 0.2, 0.8 (left,right), β 1 = 0.5 in all cases, except for the case where β 2 = 1, for which β 1 = 1. This cases is a shortest path-like baseline. Markers are averages over 20 network samples and 50 source matrix samples.