On Multistage Multiscale Stochastic Capacitated Multiple Allocation Hub Network Expansion Planning

: The hub location problem (HLP) basically consists of selecting nodes from a network to act as hubs to be used for ﬂow trafﬁc directioning, i.e., ﬂow collection from some origin nodes, probably transfer it to other hubs, and distributing it to destination nodes. A potential expansion on the hub building and capacitated modules increasing along a time horizon is also considered. So, uncertainty is inherent to the problem. Two types of time scaling are dealt with; speciﬁcally, a long one (viz., semesters, years), where the strategic decisions are made, and another whose timing is much shorter for the operational decisions. Thus, two types of uncertain parameters are also considered; namely, strategic and operational ones. This work focuses on the development of a stochastic mixed integer linear optimization modeling framework and a matheuristic approach for solving the multistage multiscale allocation hub location network expansion planning problem under uncertainty. Given the intrinsic difﬁculty of the problem and the huge dimensions of the instances (due to the network size of realistic instances as well as the cardinality of the strategic scenario tree and operational ones), it is unrealistic to seek an optimal solution. A matheuristic algorithm, so-called SFR3, is introduced, which stands for scenario variables ﬁxing and iteratively randomizing the relaxation reduction of the constraints and variables’ integrality. It obtains a (hopefully, good) feasible solution in reasonable time and a lower bound of the optimal solution value to assess the solution quality. The performance of the overall approach is computationally assessed by using stochastic-based perturbed well-known CAB data.


Introduction and Motivation
The hub location problem (HLP) basically consists of selecting nodes from a network to be used for flow traffic directioning, i.e., flow collection from some origin nodes, probably transfer it to other hubs, and distributing it to destination nodes. Examples can be found in product distribution, telecommunications, land and air transportation, etc. As a matter of fact, an HLP can be considered in any flow supplying problem where, instead of directly delivering from origin to destination, there is an opportunity (due to capacity limitations and economic reasons, among others) for performing hub transfer to delivering flow from different pairs of origin-destination. See, for instance, the multiperiod deterministic problem presented in [1], where a case study of humanitarian aid distribution in Lebanon is considered. The hub building and operational overall cost are the functions to be minimized in most of the problems. Real-life multiperiod problems are frequently encountered in industry where the hub capacity needs to be expanded and/or additional hubs are required to satisfy the flow demand from one node to another; see [2,3]. The stochasticity on the main parameters is inherent to multiperiod problems; see [4,5]. The hubs can be capacitated or uncapacitated, and the problem can be deterministic or stochastic, and static (i.e., single period) or dynamic along a time horizon. Let the dynamic HLP be named as a multiperiod one for the deterministic and two-stage multiperiod settings, and a multistage one for the other stochastic settings, where the uncertainty is either a stage-dependent, stagewise one, or even both types in a multiscale setting. The NP-hardness of HLP is proven in [6], even for the single-allocation uncapacitated p-hub location deterministic version, where p hubs must be located and non-hub nodes are allocated to the hub ones. This type of problems is a particular case of the one that is addressed in this work.
There is broad literature on the subject, see [2,7,8]; recent reviews and some perspectives on modelling are presented in [9,10]. However, the literature is scarce on the inherent stochasticity of HLP, to the best of our knowledge, there are no works on multistage stochastic HLP.
This work deals with the multistage multiscale stochastic capacitated HLP with network expansion along a time horizon. The uncertainty of the strategic parameters is assumed to be stagewise-dependent and the one of the operational parameters is assumed to be stage-dependent. So, a multiscale setting is addressed, as it frequently happens in reality. Given the network size of realistic instances as well as the cardinality of the strategic scenario tree and operational ones, it is a challenge to deal with the intrinsic difficulty of the problem and the huge dimensions of the instances. It is therefore unrealistic to seek an optimal solution for the deterministic equivalent model (DEM) of the tight stochastic mixed integer linear optimization one (traditionally, named MILP) that is presented. In contrast, a matheuristic algorithm, so-called SFR3, is introduced for achieving (hopefully, good) feasible solutions jointly, with a lower bound of the optimal solution value to assess the solution quality.
The rest of the work is organized as follows: Section 2 presents the problem's assumptions. For completeness purposes and setting some notation to be used throughout the work, Section 2.1 also recalls the main concepts of the strategic multistage scenario tree with embedded operational two-stage ones. Section 3 is devoted to a literature review on the subject, and outlines the main contributions of the work. Section 4 presents the model for MS-Hub-NEP. Section 5 introduces the matheuristic algorithm SFR3. Section 6 reports the main results of the computational experiment. Section 7 draws the main conclusions, and its Section 7.1 outlines the perspectives of model MS-Hub-NEP and maheuristic SFR3, as well as a future research agenda. An example is presented in Appendix A to illustrate the main decisions to be made in MS-Hub-NEP and the role of the scenario trees. Appendix B reports the results of the computational comparison between SFR3 strategies.

MS-Hub-NEP Problem Description
The section has two parts; first, the problem is presented jointly with the related sets. Secondly, Section 2.1 recalls the main elements of the strategic multistage scenario tree and related operational two-stage ones, also introducing the location-allocation elements for the strategic and operational scenarios.
The features of the Multistage Stochastic-Hub Network Expansion Planning problem (MS-Hub-NEP) are presented. The time horizon is finite and divided into time units, socalled stages. All nodes in the network (even the hubs) can have flow outbound, inbound or both. Without loss of generality, the set of potential hub locations is assumed to be a subset of the nodes in the network. A hub can have outbound and inbound flows. Hub building (a strategic decision) is performed at the beginning of the stages. The hubs are capacitated, so, the number of initial capacity modules should be decided, in the case of building. An expansion on the number of modules (another strategic decision) could be decided at later stages. A maximum number of capacity modules is imposed for each hub; all modules have the same capacity (i.e., they are assumed to be technically identical). The number of available hubs at the stages is bounded. The flow from origin node, say i, destined to another one, say j, can be transported (an operational decision) by using the so-called non-stop service mode or, alternatively, the transportation can be done through the hub network. The unit transportation cost is a function of the distance between the nodes.
The assumption of two hubs to be used, at most, to transfer a given origin-destination flow through the hub network, is not always realistic. However, it is one of the main assumptions of the work and, even in this case, the problem is still very difficult to solve. By using the notation k and l for the hubs involved, the route can be i → j, i → k → j or i → k → l → j. If the non-stop service mode between any two network nodes is decided, then one of those nodes, at most, could be hub available. The allocation of nodes to hubs can be changed from a stage to another. In stochastic optimization parlance, it means that the allocation can be changed from a strategic node to each of its immediate successors. The number of origin nodes to allocate to any hub could be bounded.
As an illustrative example, let us assume that the strategic decisions have already been made, based on the hub building and capacity module strategic costs, the transportation costs, and the flow demand. So, let hubs 1 and 3 be available at, say, stage 1 with a given capacity; see Figure 1. The figure also shows the operational decisions that have been assumed for a given operational scenario. Those decisions are related to transporting the flow demand from origin i to destination j, perhaps through hubs k and l. That is, it shows the routes i → j, i → k → j or i → k → l → j, for the flow of each network node pair (ij). Suppose that the operational decisions are made as follows: Flow demand in the non-stop service mode, 7 → 6, 7 → 8 and 2 → 7; 8 → 3, collection at hub 3 of flow demand from node 8; 8 → 3 → 4, collection at hub 3 of flow demand from node 8 with destination node 4; 8 → 3 → 1 → 5, collection at hub 3 of flow demand from node 8 that is transferred to hub 1 with destination node 5; 6 → 1 → 4 and 6 → 1 → 5, collection at hub 1 of flow demand from node 6 with destination nodes 4 and 5, resp.; and 2 → 1 → 3 → 9, collection at hub 1 of flow demand from node 2 that is transferred to hub 3 with destination node 9. Observe that hub 3 is inbound under the assumed operational scenario, being the destination of a portion of the flow demand from origin node 8. Note that the hub edge kl is used in both senses for hubs 1 and 3 since, on one hand, it transfers the flow demand from node 2 to node 9 and, on the other hand, it transfers it from node 8 to node 5. Flow transfer from a hub to another one Flow distributed from a hub to a destination node Flow from an original node collected at a hub Flow directly transported from origin to destination in the non-service mode (

Sets for the Hub Network
Let the following notation, where capital letters represent data, calligraphic symbols do it for sets, and i, j, k, l, are indices.
V, network nodes. H ⊆ V, candidate hubs. E ⊂ V × V, pairs of origin node i and destination node j, each one denoted as ij. Note that hub k can also be any of those nodes, for k ∈ H. D ij , 'distance' between nodes i and j, for i, j ∈ V.Recall that the edge types in the hub network are as follows: origin node to hub, hub to hub, and hub to destination node. D, maximum 'distance' that is allowed for any hub edge in the network. It is worth pointing out that link (ij) is not allowed forD ij > D.
For the purpose of reducing the number of variables and constraints related to the operational scenarios in the model below, the strengthening sets K, K → i , K ← i , H i , H ij are as follows: K = {kl ∈ H × H : k < l|D kl ≤ D ∨ D lk ≤ D}, hub pair set for flow transfer in the directions from k to l and l to k) in case of their availability.
hub pair set for flow transfer from origin node i in the direction k to l, for i ∈ V. K ← i = {kl ∈ K : D il ≤ D ∧ D lk ≤ D}, hub pair set for flow transfer from origin node i in the direction l to k, for i ∈ V. H ij = {l ∈ H : D il ≤ D ∧ D lj ≤ D}, hub set for flow collecting from origin node i, where it is directly distributed to destination node j, for ij ∈ E, such that H ij ⊆ H i . Any of the hubs i and j can be in H ij (but not both), such that i = k if hub i is outbound and j = l if hub j is inbound. H i = {k ∈ H : either ∃l|kl ∈ K → i or k ∈ H ij }, hub set for flow collecting from origin node i, for i ∈ V, such that either the collected flow at a hub is transferred to another, or it is directly distributed to the appropriate nodes. Note: Hub i can be in H i .

Strategic Multistage Operational Two-Stage Stochastic Trees
The notation of multistage and two-stage scenario trees to be used in this work is recalled from [11].

Strategic Multistage Stochastic Tree
Let a strategic scenario be the realization of the uncertain strategic parameters (i.e., hub-related building and capacity module installation costs) along the time horizon. A strategic node for a given stage has one-to-one correspondence with the group of strategic scenarios that have the same realization of the uncertain parameters up to the stage. This information structure can be visualized as the tree depicted in Figure 2, where each rootto-leaf path represents a specific scenario and, then, it corresponds to a realization of the whole set of the uncertain parameters. Let us point out that it is beyond the scope of this work to present a methodology for multistage scenario tree generation and reduction; see e.g., [12,13] and references therein. 14,27,28,29, 30} The notation for the lexicographically ordered sets in the strategic tree is as follows: T , stages, and T = |T |. N , nodes in the scenario tree, such that N = {0, · · ·, N − 1}, and N = |N |. N t , nodes in stage t, where N t ⊂ N , for t ∈ T . Note: By construction, |N 1 |=1. Ω, scenarios. Each one is included by the nodes in the Hamiltonian path from the root node 0 to a node, say, ω in the last stage, through the ones in set T ; so, ω ∈ N T . Note: For convenience, a scenario has traditionally been denoted by its last node in the path. Ω n ⊆ Ω, scenarios having one-to-one correspondence with node n. A n , node n and its ancestors, for n ∈ N . Note that A 0 is only included by node 0, where 0 ∈ N 1 . S n , successors nodes of node n, for n ∈ N . Note: S ω = ∅, for ω ∈ N T . S n 1 ⊆ S n , immediate successors of node n, for n ∈ N . Let the following other elements in the strategic scenario tree: w n , weight factor representing the likelihood that is associated with node n, for n ∈ N . Note: w n = ∑ ω∈Ω n w ω , where w ω ≥ 0 gives the modeler-driven likelihood associated with scenario ω, such that ∑ ω∈Ω w ω = 1. t n , stage to which node n belongs to, so, n ∈ N t n .
A n 1 , immediate ancestor node of node n, for n ∈ N . Note: It is assumed that A 0 1 is not necessarily null, since it is allowed that the hub network already exists before the beginning of the time horizon; so, in that case, the aim of the proposal is the hub expansion along the time horizon.

Operational Two-Stage Trees Rooted at Strategic Nodes
The operational uncertainty is represented in a finite set of stage-dependent scenarios. It is structured in a two-stage tree rooted at any strategic node in set N . So, the operational realizations (i.e., scenarios) are visualized in the nodes of the second stage. Let us introduce the following additional notation: Π t , set of operational scenarios in stage t, for t ∈ T . w π , weight factor or probability of operational scenario π, for π ∈ Π t , such that ∑ π∈Π t w π = 1, for t ∈ T .
So, it is assumed that the operational uncertainty in MS-Hub-NEP (e.g., flow demand) is independent of the strategic one.
Following on from the illustration of the strategic tree dimensions considered above, let us assume that there are |Π t | = 4 operational scenarios in the two-stage tree rooted at any strategic node that belongs to stage t in the scenario tree with |N | = 31 strategic ones. So, in total, there are 124 uncertain operational situations to be dealt with, being partitioned into 31 groups. It means that there are 124 hub operational (static deterministic) interconnected submodels, see next. Thus, the submodels related to the nodes in set S n 1 , for n ∈ N \ {N T }, have the same initial infrastructure topology (i.e., initial set of available hubs). Figure 3 depicts the illustrative strategic and operational trees, where, e.g., Π 3 = {a, b, c, d}, for any strategic node in set N 3 . For |Π t | = 8 ∀t ∈ T it results that there are 248 hub operational submodels. See, in Section 6, a broad computational experience.

Remark 2.
For the unlikely case where the strategic nodes in MS-Hub-NEP are also stagewisedependent on the operational ones then, instead of the tree depicted in Figure 3, the full combination of strategic and operational scenarios results in a gigantic multistage scenario tree. For ease of presenting, let us assume the frequent case where the multistage scenario tree is symmetric. Additionally, let us consider the two following alternative decision making frameworks at stage t. The first one is a two-step framework, where the strategic decisions are carried out at step 1 by considering all the potential operational scenarios that can occur at step 2 of the stage as above. In the second framework, it is assumed that the strategic decisions are delayed until the operational uncertainty is unveiled and, then, both types of decisions are made simultaneously, so, only one step is considered. Thus, let the new 0-1 parameter ρ t take the value 1 for the first framework in stage t, and otherwise, 0. So, for ρ t = 1, there are two types of nodes in each stage: strategic and operational. The operational nodes are the leaf ones in the stage. Let L t and, for this purpose, N t denote the leaf node subset and the node set in stage t, resp, for t ∈ T , being expressed as where b t gives the number of strategic nodes in stage t for the immediate ancestor, ∀t ∈ T \ {1}. As an example, for T = 5, b t = 2 and ρ t = 1, the dimensions of the scenario tree are ∑ t∈T |N t | = 23,405 nodes and |Ω| = L T = 16,384 scenarios for |Π t | = 4, and 629,145 nodes and 524,288 scenarios for |Π t | = 8 ∀t ∈ T . Observe that each scenario has a one-to-one correspondence with a hub strategic-operational multiperiod deterministic submodel. So, it is useful to consider the proposed approximation type in order to have an affordable structure. Location-Allocation Parameters for Strategic and Operational Scenarios γ t , maximum number of hubs that can be available at any strategic node n that belongs to stage t, for n ∈ N t , t ∈ T . λ t , maximum number of origin nodes that can be allocated to a hub (i.e., whose flow can be collected) in any strategic node that belongs to set N t , for t ∈ T . q k "maximum number of capacity modules that can be available at hub k, for k ∈ H. K, flow capacity of a module in any hub. Note: The modules are assumed to be identical.
Costs Related to Hub k in Strategic Node n, for n ∈ N F n k , hub building cost. G n k , installation cost of an initial capacity module at the hub. H n k , installation cost of an additional capacity module at the hub. M k,t , maintenance cost of a capacity module at hub k in stage t, for k ∈ H, t ∈ T . Note: It is the expected cost in the operational scenarios in stage t that also depends on the hub location.
Costs Related to Operational Scenario π in Stage t, for Π t , t ∈ T A π ik , flow allocation setup cost of origin node i to hub k, for k ∈ H i , i ∈ V. B π kl , flow transfer setup cost through hub edge kl, for kl ∈ K. C π ij , flow transportation setup cost from origin node i to destination node j in case the non-stop service mode is chosen, for ij ∈ E . P π ij , unit cost of flow (a) collection from origin node i at hub j, for j ∈ H i , i ∈ V, (b) transfer through hub edge ij, for ij ∈ K, and (c) distribution from hub i to destination node j, for i ∈ H, j ∈ V. Note: P π ik (resp., P π lj ) is zero for hub outbound (resp., inbound) flow, where i = k (resp., l = j).
Note: All costs should be computed by considering the net present value.
Flow from Origin to Destination under Operational Scenario π in Stage t, for Π t , t ∈ T F π ij , flow demand from origin node i to destination node j under scenario π, for ij ∈ E . O π i , overall flow originated in node i under scenario π, computed as ∑ j:ij∈E F π ij , for i ∈ V. Problem Decisions

•
At the strategic nodes of the multistage scenario tree: location of the new hubs and number of initial capacity modules to be installed as well as the additional ones to be installed in the successor strategic nodes, if any. • Under the operational scenarios in the two-stage trees, being rooted at the strategic nodes: flow from origin nodes to be collected in the hubs, flow transferred from one hub to another one, if any, flow distributed from the hubs to the destination nodes, and flow directly transported from origin to destination in the non-stop service mode.
The objective is to minimize the expected hub building cost, the expected cost of initial and additional capacity modules installation, and the expected cost of hub maintenance, operation and routing flow along the time horizon.

HLP Literature Review
Let us look at the following HLP categories:

Static Deterministic Hub Location
Most of the works in the literature on hub location and operation (up to 2016, at least) are usually related to a single period. Its key elements are assumed to be known with certainty, such the origin-destination flow demand, the hub availability, and the strategic and operational costs, among others. The hubs are frequently assumed to be uncapacitated ones, and different types of MILP models are introduced, as well as a few nonlinear ones. See the seminal works [14][15][16], and a survey of HLPs up to the early 1990s in [17]. An integer model for the Euclidean uncapacitated HLP is presented in [18], jointly with an algorithm to consider the set of valid inequalities that, in a Lagrangean relaxation framework, seem to be tighter to reduce the branch-and-bound (B&B) burden. Several MILP models are presented in [19] for the capacitated HLP, and tight valid inequalities are identified in a B&B algorithm. A survey of HLPs is presented in [20], where some real-world applications, models and decomposition algorithms for problem solving are reviewed by considering works in the literature from 2013 to 2016. A further comprehensive review is presented in [21] for the classical HLP.
The design of intermodal HLPs is reviewed in [10], where most of the 100 works that are considered were published between 2014 and 2020. Most of those works are static deterministic MILP models, very few are multiperiod ones, and the literature dealing with uncertainty is very scarce, mainly devoted to dealing with disruptive events at the hubs and the links. A capacitated intermodal transportation hub network model is presented in [22], where spoke-to-hub and hub-to-spoke transportation is allowed. A real-life problem of a Turkish public institution is considered for validating the proposed approach for problem solving. A MILP model for a capacitated multimodal, multicommodity HLP is presented in [23], where a non-stop service mode (i.e., a direct transportation from origin to destination) is considered as the hub network alternative.
A profit maximization is presented in [24] for a multicommodity multiple node allocation to hubs in a singular HLP, where the design (here, so-called strategic) decisions are as follows: nodes where hubs are located (satisfying the triangular inequality); node selection as origin or destination of commodity, at least, to serve; and hub edges selection to be used for transferring a commodity, at least. All of them are represented by binary variables. Operational decisions for the commodities, in case of being serviced, are those related to traffic through the network. All of them are represented by binary variables and, since, the problem is uncapacitated, the integrality is relaxed, due to the fact that an optimal solution guarantees that feature. A sophisticated Lagrangan relaxation is considered, where the optimality of the solution is guaranteed.
Two formulations are presented in [25] for the uncapacitated HLP, where the flow transfer between more than two hubs is allowed. Mixed integer-based approaches for dealing with discrete multiple location-allocation hub-centroid problems are presented in [26] and references therein. In that work, two non-cooperative agents, the leader and the follower, are considered for locating hubs to compete for the same market in a static setting.
A mixed integer nonlinear optimization (traditionally, named MINLP) model for a single period single allocation CHL problem is presented in [27], where the capacity is represented by a set of exclusive modules of different size, and two hubs are allowed, at most for flow transfers. The flow congestion in the hubs, see [28], is key in the approach, being introduced with non-linearities in the objective function. It is partially linearized by converting it into alternate partial linearizations, so that several mixed integer second order conic program-based reformulations are also proposed.
The HLP-Ordered Median Tree problem (HLP-OMTP) is presented in [29] in its singleallocation unconstrained version, where each network node is only allocated to one hub, and a given number of hubs should be located. The objective function to minimize is the weighted average cost of the ordered flow collection and distribution, as well as the hub edges flow. Several Minimum Spanning Tree and OMTP-based binary linear optimization models are introduced. A unified approach of the single-allocation OMTP is presented in [30], where a MILP model is considered.

Multiperiod Deterministic Hub Location
Multiperiod models are very scarce given the problem's difficulty; however, as noted in [31], 'ignoring the multiperiod nature of the problem might considerable formulations affect costs'. A good review of multiperiod hub network expansion planning (NEP) modeling and algorithmic approaches is presented in [2]. Among the very few works in the literature dealing with the multiperiod NEP problem, A MILP model for the uncapacitated hub NEP (MUNEP) is presented in [32]. A B&B approach is proposed for problem solving, where a Lagrangean relaxation is considered for decomposing the model into smaller submodels, by exploiting the NEP structure, and some reduction procedures are also presented. Another MILP model is presented in [3] for MUNEP, where the leasing of the hubs is performed along the time horizon, as well as the termination of the existing contracts. Two approaches for problem solving are proposed; namely, an extension of the Benders algorithm and a metaheuristic one. The latter is based on neighborhood structures, so that the algorithm moves from one feasible solution (given by a set of hubs and hub edges) to another one within the neighborhood of the former.
A MILP model is presented in [31] for the multiperiod capacitated NEP, where single and multiple allocation variants are considered, and tight valid inequalities are generated and appended to the model. Some MILPs for multiperiod multimodal NEP with hub and hub edge capacity degradations due to disruptive events are presented in [33,34]. A problem with serial demands is presented in [1], where a set of valid inequalities are identified and appended to the original MILP model. A Benders cut algorithm is introduced, and the approach is validated by considering a case study of humanitarian aid distribution in Lebanon.

Static Stochastic Hub Location
Most of the works in the literature on static stochastic location are usually related to network infrastructure stochastic disruptions, due to natural disasters and terrorist attacks. See in [35] a comprehensive review on the subject. Additionally, a MINLP model presented for the single-allocation capacitated HLP with stochastic complete and partial hub disruptions and stochastic partial hub edges (i.e., hub network links). A bi-objective framework is considered for the minimization of the total cost and the maximum transportation time, in order to get lower bounds for the Pareto-frontier. By appropriate approximations, the nonlinear chance constraints are converted into linear ones, and then a MILP model is considered. Near optimal Pareto solutions are obtained by two metaheuristics, namely, a genetic algorithm and a variable neighborhood search-based algorithm.

Two-Stage Stochastic Hub Location
The key parameters in HLP are frequently uncertain in the decision making process in real-life problems; however, the literature on stochastic hub NEP is very scarce. The realization of the uncertain parameters in mathematical optimization can usually be structured in a representative finite set of scenarios along the periods in a time horizon. Traditionally, special attention has been paid to optimize the DEM of the stochastic model by, in this case, minimizing the expected hub network location and operation in the scenarios. Obviously, that optimization is subject to the satisfaction of the problem constraints for each of the realizations (i.e., scenarios) that have been considered for representing the uncertainty of the main parameters. Thereafter, it will be referred to as the risk neutral (RN) problem. The parameters' uncertainty in HLP has been studied since the 1960s, see, recently, [5,36]. Most of the works in the literature deal with two-stage static stochastic models, and related algorithms for problem solving.
The capacitated/uncapacitated hub location decisions (referred to in this work as strategic ones) under uncertainty are considered as first stage variables in the models. No subordination is made to a single scenario, but all of them are considered as a whole for decision making. On the other hand, the (referred to in this work as operational) decisions on allocation of origin-destination flow demands from one hub to another are considered as second stage variables in the scenarios. A typical static two-stage MILP model is presented in [37], where the first stage variables represent the decisions about the air freight hub location, and the second stage variables are the flight routes to allocate. A two-stage pure binary model is presented in [38] for the uncapacitated HLP, where the uncertainty is associated to the flow demand and transportation costs. It is shown that the stochastic problem can be replaced with the equivalent expected value one for the demanddependent transportation costs. Additionally, a Monte Carlo simulation-based algorithm is introduced for problem solving in case of independent transportation costs, where a sample average approximation (SAA) scheme is integrated with a Benders decomposition. Simulated annealing and imperialistic competitive metaheuristic algorithms are presented in [39] to address a static sustainable fuzzy multiobjective HLP under uncertainty, based on chance constrained optimization. Another HLP and a genetic metaheuristic are presented in [40], where the overall transportation cost is minimized with a reliable constraint based on the flow maximization through the network. One of its new features is that the hub edge capacity is subject to stochastic degradation, as in a form of daily traffic, earthquake, flood, etc. A version of the capacitated HLP under uncertainty is presented in [41], where a robust static MILP model is introduced. A particle swarm metaheuristic algorithm is introduced in [42] to address a static capacitated HLP under uncertainty, where a fuzzy bi-objective model is also considered. A heuristic procedure for solving a static two-stage MILP model is proposed in [43] for the uncapacitated HLP. Its scheme for dealing with the non-stop service mode on the origin-destination flow servicing has partially inspired the current work. A two-stage model is presented in [44] for designing and maintaining a reliable and efficient transportation network, whose infrastructure protection is considered in an endogenous uncertainty-based scheme. An accelerated L-shaped algorithm (i.e., a specialization of the Benders algorithm for the two-stage stochastic optimization) is introduced for problem solving. A two-stage stochastic MILP model for the capacitated HLP is presented in [45], where the uncertainty in the demand is simulated by a Monte Carlo-based SAA scheme. Another accelerated L-shaped algorithm, jointly with a variable fixing approach, is presented for problem solving. A mixed integer quadratic model is presented in [46] for the capacitated two-stage stochastic HLP under demand uncertainty, where the location is performed at the first stage and the allocation of nodes to hubs is done in the second one; an ad-hoc B&C algorithm is introduced.
A robust optimization approach for dealing with the uncapacitated HLP is presented in [47], where the interval uncertainty is considered for the demand and transportation cost; the MILP model is solved by a B&C algorithm. Robust binary models have been presented in [48] for minimizing a distribution-free worst case scenario hub set-up cost, as well as the flow transportation cost for single and multiple allocation versions in a static (i.e., a single stage) setting.
To the best of our knowledge, the first work on a two-stage multiperiod stochastic MILP model is presented in [4] for the capacitated hub NEP, where the definition of the problem subject of the current work and some modeling schemes have been taken from. It is implicitly assumed that the uncertainty is not periodwise-dependent. In contrast, the value of the uncertain parameters, except for the first period, determines, with 100% certainty, the value of the parameters in later periods. The hub building and initial capacity modules installation are modelled by period-related 'here and now' (i.e., first stage) binary variables. The hub capacity modules expansion is modeled by scenario-related 'wait-andsee' (i.e., second stage) binary variables. There are no differences in the modeling treatment between the strategic and operational uncertainties. The modeling object of the binary decision variables is the so-called impulse one, instead of the tighter so-called step variable type. Note that the value 1 for a step binary variable in a given stage means, in this case, that the hub building occurred by that stage (i.e., at the same one or earlier). The value 1 for its impulse counterpart means that the hub building occurred at that stage.

Decomposition Matheuristics
Given the dimensions of the HLP large-scale instances, the straightforward use of state-of-the-art solvers requires a very high computational effort. There are numerous types of exact and inexact decomposition algorithms for stochastic MILP problem solving; see [49] for a recent literature overview. However, the model solving up to optimality even for a single scenario along a time horizon also requires a very high computational effort. So, Lagrangean-based approaches for solving inter-connected scenario-based submodels are excluded. To the best of our knowledge, the stochastic nested decomposition (SND) methodology, see [50,51], among others, could be an interesting one to consider, where the use of single strategic nodes-based submodels should not be discarded. However, the computing time could be expensive even in that case.
Matheuristics, see [52][53][54], are the methodologies of choice for providing (hopefully, good) feasible solutions for model (1)- (30), in affordable computing time. Very few even provide the solution's quality by obtaining lower bounds of the optimal solution value. Fix-and-relax (FR) is a matheuristic, as far as we know, introduced in [55] for solving dynamic deterministic 0-1 MILP problems; see also [56]. The solution value obtained at the first FR level is a lower bound of the solution value of the problem. The approach has given good results while solving large-scale dynamic deterministic and stochastic MILP problems, see [57][58][59][60], among others.
The so-named restrict-relax (RR) variant was introduced in [61] where, besides fixing variables and relaxing integrality, those fixings can be, iteratively, un-fixed at the B&C nodes. That approach was considered in [62] for constructing a feasible solution of the optimal maintenance and refueling scheduling nuclear power plant in the first stage of the multiperiod (5 years) two-stage 0-1 MILP model that is presented; the second stage is composed by the power production optimization in the scenarios. A multi-neighborhood search and other local procedures are devised for improving the RR solution. It allows one to consider multi-objective functions. On the other hand, it provides a lower bound of the optimal solution value. A hybridization MILP with dual heuristics and machine learning techniques for the scheduling power as above are presented in [63].

On Multistage Strategic and Operational Stochastic Scenarios in Hub NEP. The Subject of This Work
As can be observed in the stochastic HLPs considered above, the uncertainty is assumed to be revealed at a single moment, being the case of the two-stage single-period stochastic scenario tree setting, as well as the case of the two-stage multiperiod one. However, as pointed out in [5], there are many cases where the uncertainty is being revealed along a time horizon, being frequently the case where the realization of the uncertainty is stagewise-dependent. On the other hand, it is well documented in the stochastic literature that a two-stage multiperiod stochastic tree is usually a relaxation of its most difficult multistage stochastic counterpart. Observe that the nonanticipativity constraints are violated in the nodes of all stages but the first one, provided that the uncertainty is stagewisedependent. So, in that case, the two-stage approach would be inappropriate. However, the proposed scheme in this work benefits from the HLP categories briefly reviewed above.
In real-life hub NEP problems, two types of time scaling could be considered, namely, a long one (say, semesters, years) and another one where the timing is much shorter. So, strategic decisions (in the long time scale) and operational ones (in the shorter time scale) have to be considered. As it is pointed out in [9], point out that 'There is also a need to better integrate hub location models with service network design research to bridge the strategic and tactical [here, operational] decision models,. . . to bridge long-and short term decisions, requiring managing the time scale differences between the different decisions'. The hub location decisions, as well as the initial capacity dimensioning and extensions, are strategic decisions to be made along a time horizon. The flow transportation through the hub network, as well as a direct one for the pairs origin-destination, is an operational decision.
Two types of uncertain parameters are also considered in this work; namely, strategic and operational ones. Examples of strategic parameters are the costs of hub building, as well as the initial and additional capacity modules installation along the time horizon. The uncertainty of that type of parameters is usually stagewise-dependent, i.e., its realization varies depending on the realization of the uncertainty in the previous stages. In fact, the parameters in a given stage may have different realizations for each set of parameters in the previous stages. On other hand, the uncertainty of the so-called operational parameters is only stage-dependent. It is represented in a two-stage scenario tree, where the nodes in the second stage give the realizations of the set of those uncertain parameters (so-called operational scenarios). Examples of this type of parameters are the origin-destination flow demand, costs of flow hub collection, transfer and distribution, and the cost of the non-stop service mode. This work deals with handling those difficult schemes.
See in [11] the rationale behind the partition of uncertain parameters into strategic and operational ones and, as the case may be, tactical parameters. Basically, it consists of considering that the strategic decisions should not be based on occasional operational ones at the stages, nor on occasional realizations of other operational parameters. That is, it is assumed that the strategic decisions should not depend on single operational scenarios in the previous stages. By contrast, strategic decisions should depend on the realizations of the strategic uncertain parameters in the stage, which depend on the previous ones, as well as on the set of realizations as a whole of the operational parameters in the next stages. So, while dealing with uncertainty in multistage scenario trees, that observation is translated into considering that the strategic nodes in the tree should not be successors of occasional operational nodes. An additional reason is the gigantic stochastic model that would result in the unlikely case, where strategic and operational features are not being taken into independent consideration. Note that the operational uncertainty can be represented in a two-stage tree, where the second stage nodes have one-to-one correspondence with the operational scenarios in the stage. The root nodes of those trees are precisely the strategic nodes in the stage, where the strategic uncertainty is realized and the related hub availability is represented.
The multistage multiscale approach is considered in different industrial sectors, such as production energy planning, see [11,64,65], and rapid transit network design, see [66]. None of those works present any algorithmic approach for empirically validating the proposals, but an ad hoc one is considered in the latter. The multiscale approach is also considered in multistage forest stand harvesting selection planning, see [67], where multiperiod 'tactical' activity replaces two-stage operational' one. Note that it is related to storable good production (as opposed to service production as in this work). Then, it does influence, as an expected scenario set, the decisions to be made in the successor strategic nodes.
While dealing with large time horizon problems as in this work, those two frequent types of uncertainty imply two types of variables, precisely, strategic and operational. The strategic variables are related to the decisions on the NEP infrastructure elements, such as location, capacity, and availability timing. The operational variables are related to the operational decisions of the available elements in the network at the stages along the time horizon. So, there are two types of optimization models: strategic and operational, being very different in all aspects and intrinsically inter-related in a usually large-scale model for real-life problem solving.

Contribution of the Work
To the best of our knowledge, no work in the literature considers multiple-allocation stochastic hub location in a time horizon setting where strategic uncertainty is represented in a multistage finite scenario tree, and the operational one is represented in a finite set of stage-dependent scenarios.
The main features of the contribution are as follows: • A tight MILP model is introduced, where the investment on hub network availability and expansion (i.e., number of hubs and the additional number of capacity modules for the already available hubs) is assumed to be made at the strategic nodes of the stages along the time horizon. The operation (i.e., origin node flow collection, transfer and distribution to destination nodes) of the available hubs in the network is carried out at the operational scenarios in the stages. The objective function to minimize is the expected hub investment cost in the strategic nodes plus the expected cost related to the operational scenarios. It is well-known that the step modeling object is tighter than its impulse counterpart, since it produces higher lower bounds in the instances' minimization. It also allows state variables to only link pairs of consecutive stages; in fact, a variable of that type links a strategic node with each of its immediate successors in a multistage stochastic setting. • The large-scale of the problem's realistic instances is due to the hub network size and the number of capacity modules, as well as the cardinality of the strategic and operational scenario trees. It makes it unrealistic to seek a problem solving up to optimality by the straightforward use of MILP solvers and, probably, any other current means. Based on the type of network expansion modeling that is considered, the so-called SFR3 decomposition matheuristic algorithm is introduced for problem solving. It stands for scenario variables fixing and iteratively randomizing the relaxation reduction of the constraints and variables' integrality. The aim is to obtain (hopefully, good) feasible solutions whose overall cost optimality gap is also provided. The step variable modeling object is key for achieving a good performance of the matheuristic. The validity of the approach is computationally assessed, by considering a broad testbed from medium-and large-scale instances up to very large-scale ones. The quality of the results obtained by using a set of SFR3 strategies versus the straightforward use of a state-of-the-art MILP solver is also studied. Some of the results when using SFR3 are coherent with other works applying the matheuristic to solve very large scale instances of optimization problems, possibly highly constrained ones, see [68]. An approach with similar ideas as in SFR3 has been presented in [62] for a two-stage multiperiod MILP model, to deal with the optimal maintenance and refuelling nuclear power plants.

Strategic Multistage Operational Two-Stage Stochastic MILP Model for MS-Hub-NEP
Let the following notation, where binary and continuous variables are represented by Greek and small letters, resp., and q, q , n, n , n1, n2, n3, t, t are indices.
Strategic Variables for Hub k in Strategic Node n, for k ∈ H, n ∈ N The strategic variables represent the hub availability, the number of initial capacity modules and the number of additional ones as an expansion. They are as follows: γ n k , step binary variable whose value 1 means that hub k has been made available for flow servicing by strategic node n and otherwise, 0.
k = 1 means that hub k has been made available (i.e., built) in strategic node n. On the other hand, γ n k − γ A n 1 k = 0 means that hub k has been made available by strategic node A n 1 for γ A n 1 k = 1, and it has not yet been made available for γ n k = 0. Note 2: It is assumed that hub k is already available at the beginning of the time horizon for γ A 0 1 k = 1. δ n kq , step binary variable whose value 1 means that q capacity modules have been initially installed at hub k by strategic node n and otherwise, 0, for all q = 1, · · ·, q k . Note 1: it is assumed that q capacity modules are already available at the beginning of the time horizon for δ A 0 1 kq = 1. In that case, γ A 0 1 k = 1 as well. Note 2: the initial capacity modules must be installed in the same strategic node where the hub is made available, if any. So, γ n k = 0 implies δ n kq = 0, and vice versa, for all q = 1, · · ·, q k . Note 3: δ n kq = 1 implies δ n kq = 0 for q, q = 1, · · ·, q k : q = q . η n kq , step binary variable whose value 1 means that q additional capacity modules have been installed at hub k by strategic node n and, otherwise, 0, for all q = 1, · · ·, q k . Note 1: kq = 1 means that q additional capacity modules have been installed in strategic node n. On the other hand, η n kq − η A n 1 kq = 0 means that those q modules have been installed at hub k by strategic node A n 1 for η A n 1 kq = 1, and they have not yet been installed for η n kq = 0.
kq = 0 for q, q = 1, · · ·, q k : q = q . Note 3: the additional capacity modules can only be installed, if any, in the successor strategic nodes to the one where the hub was made available, so, γ n k = 0 implies η A n 1 kq = 0 for any admissible q.
The (kq)-based scheme is taken from [4], where the two-stage multiperiod stochastic framework is presented. However, the step-variable based version considered here for the multistage framework is tighter than the classical counterpart integer variable δ n k -based one. Let the following notation to be used throughout the rest of the work when considering the operational variables, see below: Π n := Π t n ∀ n ∈ N . It means the situation that results from the joint realization of the uncertain strategic parameters represented in set A n (recall that it is composed by node n and its ancestors) and the operational scenario set Π t n for stage t n .
Operational Variables under Scenario π, for π ∈ Π n , n ∈ N The operational variables represent the hub collection, transfer and distribution of the flow demand. They are as follows: x π ik , flow originated in node i that is collected at hub k (i.e., transported from node i to hub k) under scenario π, for k ∈ H i , i ∈ V. The flow collected a hub can be transferred to another one, and that flow can also be directly distributed to the appropriate network nodes. λ π ik , binary variable whose value 1 means that origin node i is allocated to hub k (i.e., x π ik > 0) under scenario π and otherwise, 0, for k ∈ H i , i ∈ V. Note: λ π ik = 0 implies x π ik = 0 and x π ik > 0 implies λ π ik = 1. y π ikl , flow originated in node i, which is collected at hub k to be transferred to hub l (as the second one), and from there, it is distributed to the appropriate network nodes under scenario π, for kl ∈ K → i , i ∈ V. y π ilk , flow originated in node i that is collected at hub l to be transferred to hub k (as the second one), and from there, it is distributed to the appropriate network nodes under scenario π, for kl ∈ K ← i , i ∈ V. Note: The variables y π ikl and y π ilk are mutually exclusive. µ π kl , binary variable whose value 1 means that there is flow that is transferred through hub edge kl under scenario π (i.e., ∃i ∈ V : (y π ikl > 0 or y π ilk > 0)) and otherwise, 0, for kl ∈ K. Note: µ π kl = 0 implies y π ikl = 0 and y π ilk = 0, and y π ikl > 0 or y π ilk > 0 imply µ π kl = 1 ∀i ∈ V. z π ilj , flow originated in node i destined to node j that is distributed from hub l; either as the second one (in which case, the flow has been transferred from another hub) or as the unique one (in which case, the flow has been collected at the same hub) under scenario π, for l ∈ H ij , ij ∈ E .
Operational Variables for the Non-Stop Service Mode between Network Nodes under Scenario π, for π ∈ Π n , n ∈ N ν π ij , binary variable whose value 1 means that flow F π ij from origin node i to destination node j is transported as a whole in a non-stop service mode under scenario π and otherwise, 0. Appendix A presents an example to illustrate the joint role of the strategic nodes and operational scenarios and the meaning of the variables in the model.

DEM for MS-Hub-NEP
The objective function C (1) is the minimization of the expected cost of the network expansion planning in the scenarios, subject to the constraint system (5)- (30). where Function (2) expresses the hub network expected investment cost due to hub building, initial capacity module installation, and additional ones along the time horizon. Function (3) expresses the hub capacity modules expected maintenance cost under the operational scenarios. Function (4) expresses the overall expected cost to setup the origin node allocation to hubs and hub edge transfer, as well as the expected cost for the non-stop service flow from origin to destination and for the collecting, transfer and distributing the flow from origin to destination in the operational scenarios. Note that the hub transfer link for flow originated in a given node can be performed from hub k to hub l and from l to k. Note that the elements {kl} in sets K → i and K ← i are, by construction, such that k < l. Constraints system (5)- (11) imposes restrictions on the hub building and the initial and additional capacity modules. Constraint (5) defines the step character of the γ-binary variables. Recall that γ A 0 1 k = 1 means that hub k is already available before starting the time horizon. Constraint (6) upper bounds the number of available hubs in the strategic nodes. Constraint (7) defines the step character of the δ-binary variables.
Constraint (8) upper bounds the number of the initial capacity modules to be installed in the hubs. Note that its installation is performed in the same strategic node (and, then, at the same stage), where the related hub is made available. Additionally, the hub building and its initial capacity module installation can only be performed once. So, given its oneto-one correspondence with the term q k ∑ q=1 δ n kq , the variable γ n k can be replaced. In that case, the constraint system (5)-(11) should be reformulated. The drawback of this alternative is that the model's constraint matrix density is strongly increased, and there is no guarantee that the resulting model will be a tighter one. Constraint (9) defines the step character of the η-binary variables. It also prevents additional capacity modules expansion in a strategic node n (i.e., η n kq = 1), in case the related hub was not made available in any of its ancestor nodes (i.e., γ A n 1 k = 0). Constraint (10) prevents the capacity module expansion at any hub in a strategic node in case the hub has not previously been made available in an ancestor node. Note that the constraint system (9) and (10) prevents the same number of additional capacity modules to be expanded in a hub in any successor to node n, where η n kq − η A n 1 kq = 1. This apparent shortcoming is arguably offset since, given the realistic instances' dimensionssee Section 6, seeking optimal solutions is not a high priority and, on the other hand, it is convenient due to its mathematical strength.
Constraint (11) upper bounds the number of capacity modules that can be available at the hubs. Observe also that the RHS is strengthened by considering the related γ-variables. Let us point out that this type of constraints could be reinforced by cuts expressed as ∑ q k q=1:q+q >q k δ n kq + η n kq ≤ γ n k ∀q = 1, · · ·, q k , k ∈ H, n ∈ N in order to prevent a number of capacity modules that violate the given maximum that is allowed at the hubs. However, provisional experience does not detect the computational effects of that model's tightness. Constraint system (12)- (25) imposes restrictions on the hub network under the operational scenarios that belong to the two-stage trees rooted at the strategic nodes in the multistage scenario tree. Constraint systems (12) and (13) define the variable upper bound of the λ-binary variables in the hub network, and it prevents the allocation of origin nodes to hubs that have not yet been made available.
Constraint (14) upper bounds the number of origin nodes to be allocated at any hub. Constraint (15) forces a hub edge, say kl, to not be in operation (i.e., µ π kl = 0), provided that one of the two hubs, at least, has not yet been made available (i.e., γ n k γ n l = 0) by the strategic node n where operational scenario π belongs to (i.e., π ∈ Π n ). Constraint (16) prevents the flow originated in node i, represented by the non-negative y-variables, could be transferred from k to l or viceversa in case the hub edge kl is not available.
For a given origin node, the constraint (17) ensures the flow conservation balance in a hub. Note that the flow that arrives to a hub, say k, as collected from the node plus the flow that is transferred to it from other hubs, must be equal to the flow that is distributed from that hub (as a unique one or a second hub), plus the flow that is transferred to the other hubs. Observe that the y-variables in system (16) and (17) belong to the sets K → i and K ← i for flow transfer from hub k to l and from l to k, respectively. Note also that the sets K → i and K ← i are restricted to the hub edge set {kl} in the directions from k to l and l to k, respectively.
Constraint systems (18) and (19) prevent the violation of the hub capacity by the overall flow collected and transferred.
Constraint (20) prevents the service mode between nodes i and j under an operational scenario, provided that both nodes were made hub available by the strategic node which the operational scenario belongs to. Note that the classical tight modeling object ν π ij ∈ {0, 1}, ν π ij ≤ 1 − γ n i , ν π ij ≤ 1 − γ n j cannot replace constraints (20) since, in this case, it should only restrict the non-stop service mode to the situation where neither of the two related network nodes is a hub. Constraint (21) ensures that the overall flow originated in a node is transported by using the hub network as well as the non-stop service mode, if needed.
Constraint (22) defines the flow distribution z-variable. It also ensures that if a destination node receives flow from some hub (i.e., it is distributing it to the node), then the hub is available.
Constraint (23) guarantees that an origin-destination flow is either delivered by using the hub network or the non-stop service mode. Constraint (24) prevents the flow originated at a hub (i.e., it is an outbound) from also being collected at another. Constraint (25) prevents the flow destined to a hub (i.e., it is an inbound) from being distributed by another.
The main limitations of model (1)-(30) are as follows: (a) no hub closing along the time horizon, see [3] for hub leasing and contract cancellation in an uncapacitated deterministic environment; (b) no degradation on the hub and hub edge capacities, due to disruptive events, see [33,34] for multiperiod deterministic, [35] for static stochastic, and [40] for two-stage stochastic with capacity degradation; and (c) no more than the classic two hub paradigm, at most, for each origin-destination flow, see [25] for uncapacitated static deterministic, where more than two hubs are allowed. See in Section 7.1 the outline of the future research agenda.
For the ease of presenting of the submodels that result from the decomposition of the original model to be presented next, allow the following additional notation for the variables' vectors: Γ n , state step binary variables in strategic scenario node n, for n ∈ N , such that Γ n = (γ n k , (δ n kq ) q k q=1 , (η n kq ) q k q=1 ∀k ∈ H) X π , continuous variables in operational scenario π, for π ∈ Π n , n ∈ N , such that X π = (x π ik ∀k ∈ H i , y π ikl ∀kl ∈ K → i , y π ilk ∀kl ∈ K ← i , i ∈ V; z π ilj ∀l ∈ H ij , ij ∈ E ) ∆ π , binary variables under operational scenario π, for π ∈ Π n , n ∈ N , such that ∆ π = (λ π ik ∀k ∈ H i , i ∈ V; µ π kl ∀kl ∈ K; ν π ij ∀ij ∈ E ) Additionally, let variableq n k denote the overall number of available capacity modules (i.e., the initial and additional ones) at hub k in strategic node n, for k ∈ H, n ∈ N . It is expressed as

SFR3, a Decomposition Matheuristic Algorithm
Fix-and-relax (FR) is a matheuristic algorithm that, to the best of our knowledge, was introduced in [55] for solving dynamic deterministic 0-1 MILP problems. See the same idea described in [56]. It consists of solving a sequential series of subproblems, such that the variables in each one are partitioned into three subsets. The variables in the first subset are fixed, the binary variables in the second one are kept integer, and the integrality of the binary variables in the third subset is relaxed. The partition of the variables results from an ordering which is established a priori, and the variables declared integer in each subproblem define the so-called FR level. In particular, the variables that are fixed at level are the variables fixed at level − 2 plus the variables declared integer in level − 1. However, as a matter of fact, the relaxation of the integrality in a sizable subset of binary variables in the original problem (1)-(30) prevents taking advantage of that integrality feature when solving the submodels in classical FR. Note that the knowledge of the variables' integrality by any solver strongly helps to model's tightening by performing probing, fixing variables, eliminating redundant constraints, and appending new cuts. So, the computing effort could be very high for problem solving by straightforward use of the optimizer as well as by using classical FR in the presence of a high number of binary variables in the instance.
Algorithm 1, SFR3, stands for scenario variables fixing and iteratively randomizing the relaxation reduction of the constraints and variables' integrality. It is a specialization of the FR methodology for partially preventing the drawbacks outlined above. At any iteration but the last one, a given subset of constraints and variables' integrality is relaxed. After fixing some variables at any iteration, the main idea consists of reducing those relaxations in a set of randomized executions. The rationale behind that scheme in a FR framework consists of keeping the submodels' dimensions within affordable limits until obtaining a (hopefully, good) feasible solution.
Let the following input parameters and related new sets in SFR3, see Figure 4 and, in particular, the relaxation so-called n-submodel of the original model (1) integrality is performed in the n-submodel, say, t = {T, T − 1, · · ·, T − t R + 1}, provided that relaxation is not performed for stage t : t ≤ t < t + t. t * , additional number of consecutive stages (starting with stage t + t) where no constraints are relaxed while solving the submodels at iteration t. Note: 0 ≤ t * ≤ T − t.
α t , β t , probabilities that are allowed for the non-relaxation of operational scenario and strategic nodes, resp., that belong to stage t in a execution of n-submodel, see next.
New sets at each execution of n-submodel N n t ⊆ S n ∩ N t , strategic node subset that belongs to stage t whose elements are successors of strategic node n that are not relaxed for solving n-submodel at an execution, for t ∈ T : t ≥ t + (t + t * ), n ∈ N t , t ∈ T : t ≤ T − (t + t * ). The nodes in setÑ n t are selected at random with probability β t , but with the caveat on keeping a Hamiltonian path from node n to a leaf node in the scenario tree. L n , leaf node set in the strategic subtree rooted at node n up to stage t + (t + t * ) − 1 for a given execution of n-submodel. So, L n = S n ∩ N t+(t+t * )−1 . For the purpose of preserving the caveat mentioned above, let the assumption: ∃ ∈ L n , such that all nodes, say, {n3}, in the Hamiltonian path from a node, say, n1 to a node, say, n2 in the strategic scenario tree, do satisfy the condition n3 ∈ (S A n3 1 1 ∩Ñ n t n3 ), for n1 ∈ S 1 ∩Ñ n t n1 and n2 ∈ S ∩Ñ n T . Note that, by construction, node n1 satisfies that condition. Π n ⊆ Π t , operational scenario subset in the two-stage tree rooted at strategic node n whose elements are not relaxed for solving n-submodel at an execution, for n ∈Ñ n t and t, n as above. The scenarios in setΠ n are selected at random with probability α t . S n , subset of the successor nodes in set S n \ S n that have been chosen not to be relaxed in the current execution of the algorithm, for n ∈ N t , t ∈ T : t ≤ T − (t + t * ). It can be expressed as (33). The solution of n-submodel at a given execution of any iteration is denoted as vector (∆ π ,X π ∀π ∈Π n ,Γ n ∀n ∈ {n} ∪ S n ∪ S n ). The cost can be expressed Let C n denote the smallest cost in the executions of an iteration, and note that C 0 is the lower bound of the optimal solution of the original model (1) Retrieved soln Γ A n 1 :=Γ A n 1 (36b) As cons system (5)- (11) where ∀n ∈ N is replaced with ∀n ∈ {n} ∪ S n ∪ S n (36c) As cons system (12)- (25) where ∀π ∈ Π n , n ∈ N is replaced with ∀π ∈Π n , n ∈ {n} ∪ S n ∪ S n (36d)

Binary variables' integrality is not relaxed in vectors
∆ π ∀π ∈Π n and Γ n ∀n ∈ {n} ∪S n ∪ S Remarks on n-submodel (36): Given the step variable type of the modeling object that is considered for the Γ-variables, only the strategic nodes A n 1 and n link the submodel (36). Node A n 1 is such that t A n 1 = t − 1. So, the solutionΓ A n 1 is retrieved from the A n 1 -submodel. Input data: B ∈ {L, U}, t, t * , t R , e, α t , β t ∀t ∈ T : t ≥ (t + t * ) ;

14
GenerateΠ n at random from set Π t n ;

Additional Details
• For B = L, C 0 is computed as the highest cost C 0 (34) in the e executions of n-submodel  (30). Note that the objective function and constraints for node 0 in the n-submodel, jointly with those of its successor nodes in sets (31) and (33) and related operational scenarios, gives a lower bound for each execution. Additionally, observe that the algorithm running is interrupted after obtaining that bound (lines [26][27], so, the input parameters t, t * , t R , e, α t and β t , ∀t ∈ T : t > (t + t * ) could be more restricted than the related values for option B = U.
• For B = U, C n is computed as the smallest cost C n in the e executions (lines 20-21) of the original model (1)- (30), so that the partial incumbent solution value (line 22) is updated. For that purpose, the scheme in lines 7-25 is used, where n-submodel (36) is solved for all n ∈ N t until the updated stage t (line 30) is zero. The rationale behind it is that, once the updated stage t (line 29) is such that t > T − t + 1, then the stages {t } involved in the next iteration are t = t, t + 1, · · ·, T. So, the execution of its Step 1 requires the resetting t := T − t + 1, and note that it is to be the last iteration of the algorithm. Observe that t is reset to zero at the end of its execution. Note 1: The submodel (36) for the full scenario tree rooted at node 0, (i.e., t = 1) is a relaxation of the original model (1)- (30). It is solved for each of the e executions. Note 2: The variables related to the ancestor strategic scenario nodes in set A n \ {n} are fixed to the values retrieved from the appropriate submodels. In any case, observe that retaining the solution related to the stages t to t + t − 1 (that is, fixing it) implies the independence of the submodels supported by the subtrees rooted at the nodes in set N t+t . • The parameters α t and β t can have a value of 1. It could be the case where neither strategic, nor operational constraints are relaxed. Observe that the relaxation of the variables' integrality is allowed for the last t R stages, so that it is the FR classical algorithm for t R = T − (t + t) + 1.
• The weights, say, w n and w π (lines 12,15 and 19) to be used in n-submodel (36) for obtaining the SFR3 solution are such that ∑ n ∈Ñ n t w n = 1 and ∑ π ∈Π n w π = 1.
• The strategic node setÑ n t for t ∈ T : t ≥ t + (t + t * ) and the operational scenario set |Π n |, for n ∈ S n in n-submodel (36), are generated anew in any of the executions, for n ∈ N t , t ∈ T : t ≤ T − (t + t * ) (lines [10][11][12][13][14]. Note: the best solution is retrieved among the e executions. • At Step 1, for a given stage t, such that t + (t + t * ) − 1 < T (line 9), the parameter t R is reduced to t R := t R − 1 for t + t = T − t R + 1 (line 16). Recall that the parameter t R is used for deciding the stages where the relaxation of the variables' integrality is carried out. Moreover, as above, but for t + (t + t * ) − 1 = T, the resetting in lines 18 and 19 is performed, including t R := 0 (i.e., no integer relaxation) and the related weights. Note: in this case, it is assumed that the n-submodels (36) can be solved without further relaxations in an affordable computing time. • At Step 2, the SFR3 solution valueĈ (35) is computed for the original model model (1)-(30).

Computational Experience
This section reports the main computational results that have been obtained while experimenting with a broad testbed composed of a variety of instances from medium MS-Hub-NEP ones up to large-sized instances and very large-sized ones. Section 6.1 presents the instance dimensions and the main parameters used in the experiment. It also gives the structure of the multistage scenario tree under consideration. Section 6.2 presents the dimensions of the related model (1)- (30) and reports the main results that have been obtained by considering CPLEX straightforward use. Section 6.3 presents the strategies that are used by matheuristic SFR3, as well as a comparison of the results.
The computational experiment was conducted on a PC with a 2.9 gigahertz dualcore Intel Core i5 processor, 8 gigabyte of RAM and operating system OS X 10.12.1. The modeling approach, as well as the proposed matheuristic, have been implemented in a C++ experimental code. The default options of CPLEX v20.1.0 are used for the full model (1)- (30) solving, as well as for the n-submodels (36) to be optimized in SFR3. However, given the difficulty of the problem, the optimality tolerance has been set to 1%. Table 1 shows the dimensions of the instances, under the assumption that the parameter D is large enough to allow K → i = K ← i = K, H ij = H i = H. The headings for the instances and the structure of the scenario tree are as follows: inst, instance code; network dimensions: |V | = |H|, number of network nodes, it is also the number of candidates hubs in this experiment; |E | = |K|, number of network edges, it is also the number of candidate hub links in this experiment; γ, maximum number of hubs that can be available at any strategic node, so, γ t := γ ∀t ∈ T ; |Π|, number of operational scenarios at any stage, so, |Π t | := |Π| ∀t ∈ T . The other parameters are as follows: q = 5, maximum number of capacity modules that can be installed at any hub, so, q k := q ∀k ∈ H; λ = 5, maximum number of origin nodes that can be allocated to a hub in any strategic node at any stage, so, λ t := λ ∀t ∈ T ; T = 5, number of stages in the time horizon. The dimensions of the strategic scenario tree are as follows, see Figure 3: b = 2, number of immediate successor nodes for any strategic node (i.e., b = |S n 1 | ∀n ∈ N \ N T ); N = 31, number of strategic nodes in the multistage scenario tree; and |Ω| = 16, number of strategic scenarios (i.e., |N T |). Assumption: The weights w n and w π are equiprobable, so, w n = 1 |N t | ∀t ∈ T and w π = 1 |Π| ∀π = 1, · · ·, |Π|. The number of static stage deterministic hub network expansion planning submodels within the (deterministically equivalent) stochastic full model (1)- (30) is N × |Π| = 31 × 4 = 124 for |Π| = 4 and 31 × 8 = 248 for |Π| = 8. Those submodels are inter-related by initially sharing hub-network designing γ-binary variables in the strategic nodes from their immediate ancestor in the scenario tree.

CAB-Inspired Data Generation
The testbed has been partially generated based on the well-known CAB data for deterministic network problems, see [69]. For the experiment with MS-Hub-NEP, a data pattern has been inspired on the two-stage multiperiod data considered in [4]. Accordingly, the hub building setup cost, F 0 i , has been computed as 7 × 15 × log(R i ), where R i is an 'overall' flow demand in origin node i.
By taking a 2% increase in the hub building cost from one stage to the next one in the deterministic case, it has been considered in this experiment that F n i = p n F n i , for n ∈ S n 1 , n ∈ N : t n < T, where p n = 1.02 and 1.04 for the even and odd nodes n , resp. Recall that node i is a potential hub, since V = H in this experiment.
The flow demand F π ij is computed as the perturbed CAB-based expression where R ij is the static stage deterministic flow demand for node pair ij ∈ E as taken from the CAB data. So, F π ij = R ij for the operational scenario π = 1 in the strategic node n = 0 (i.e., stage t = 1).
The flow capacity K of the modules can be expressed The installation setup costs of any initial or additional capacity module can be expressed G n k = 0.2 × F n k and H n k = 1.2 × G n k , resp., ∀k ∈ H, n ∈ N . The other cost types are available upon request from the authors.  Table 2 shows the model's dimensions and CPLEX results. The headings for the dimensions are as follows: m, n01 and nc, number of constraints, binary variables and continuous variables, respectively, and nel and dens%, constraint matrix nonzero elements and density, respectively. The headings for the results are as follows: z CPX , lower bound of the solution value (i.e., value of the best node in the B&B tree up to the optimization's interruption); z CPX and t CPX , incumbent MILP solution value, and its computing time (in seconds, as for all experiments), respectively; and GAP CPX , optimality gap of the incumbent solution, being computed as 100.

CPLEX Straightforward Use. Results
. Based on the computational results shown in Table 2, the testbed in Table 1 can be classified into three categories: Cat 1 is included by the middle-scale instances from i1 to i7, whose dimensions are up to m = 175,983 constraints, n01 = 27,156 binary variables and nc = 134,292 continuous variables; Cat 2 is included by the large-scale instances from i8 to i19, whose dimensions are up to m = 881,643, n01 = 90,706 and nc = 750,882; and Cat 3 is included by the very large-scale instances from i20 to i32, whose dimensions are up to m = 3,473,631, n01 = 242,172 and nc = 3,134,844. The computing time limit has been set to 2h for the instances in the categories Cat 1 and Cat 2. Moreover, given the high dimensions of the instances in category Cat 3, the time limit has been set to 15 h.
The solution's 1% quasi-optimality can not be proved in 4 out of the 7 instances in Cat 1, even considering the model's tightness (based on the step type of the γ-,δ-and η-binary variables). Observe that the optimization was interrupted due to the 2 h and 15 h time limitation in all of the instances in Cat 2 and Cat 3, resp. As a matter of fact, that limit was reached while obtaining the lower bound z CPX at the B&B root node. The incumbent solution value z CPX has been provided by the CPLEX preprocessing heuristic algorithms. Both values contribute to the very high GAP CPX , showing that CPLEX straightforward use is useless. The situation worsens for Cat 3, where usually even no lower bound is obtained at the root node in some instances, since CPLEX is running out of memory in most of them.

SFR3 Matheuristic. Results
Besides the input parameters t, t * , t R and e presented in Section 5, the elements of the SFR3 strategies for solving the n-submodel (36), supported for a scenario tree rooted at strategic node n, for each of its executions at the iteration coded as stage t, for n ∈ N , t ∈ T , are as follows: α, probability of a scenario not to be relaxed in any operational set Π t n for non-relaxed strategic node n , where stage t n starts with t + (t + t * ) along the time horizon, so that α t n is fixed to α. β, probability of a strategic node n not to be relaxed, where stage t n starts with t + (t + t * ) along the time horizon, so that β t n is fixed to β.
The following four SFR3 strategies are considered in decreasing relaxation order: stra = 1: Strong myopic. It consists of a myopic one stage-based rolling horizon approach, where the submodel supported by a subtree rooted at strategic node n, is included by the constraints and variables that only belong to that node as well as the related scenarios in operational set Π n . Its components are t = 1, t * = 0, α = β = 0, t R = − and e = 1, Note that the scheme for the latest consecutive stages for variables' integrality relaxation does not apply here. Once the solution is fixed, the algorithm proceeds to update t := t + t = t + 1.

stra = 2:
Weaker myopic. It consists of a myopic two stage-based rolling horizon approach, where (a) The model, supported by a subtree rooted at strategic node n, is included by the constraints and variables that belong to that node n and its immediate successors n (i.e., set S n 1 ) as well as the related scenarios in the operational sets Π n , for n ∈ {n} ∪ S n 1 ; and (b) The variables' integrality is relaxed in the node sets S n 1 and Π n , n ∈ S n 1 . Its components are t = 1, t * = 1, α = β = 0, t R = ( * ) and e = 1. By convention, t R = ( * ) means that the variables' integrality relaxation is restricted to the number t * = 1 of consecutive stages starting at stage t + t, and in this case finishing at it, (i.e., the next one to stage t in this experiment). Once the solution is fixed, the algorithm proceeds to update t := t + t = t + 1. stra = 3: Scenario variables Fixing and variables' integrality iteratively Relaxation Reduction in the given latest consecutive stages T − t R + 1, · · ·, T. By construction, no constraints are relaxed. Its components are t = 1, t * = 4, α = β = 1.00, t R = 4 and e = 1, since T = 5 in this experiment. Note: t

stra = 4:
Scenario variables Fixing and iteratively Randomizing the Relaxation Reduction of the constraints and variables' integrality, for chosen t, t * , α, β, t R , e.
The computing time limit has been set to 2 h for each execution of submodel (36) in any iteration of the strategies.
For assessing the performance of the SFR3 strategies, Table 3 reports the aggregation by categories Cat 1, Cat 2 and Cat 3 of the instances' results. The new headings are as follows: z FR3 , SD(z FR3 ), GAP FR3 , t FR3 , GR FR3 , average and standard deviation of the cost of the SFR3 incumbent solution z FR3 , average of the optimality gap GAP FR3 , average of the computing time t FR3 , and average of the goodness ratio GR FR3 , resp. All the instances' results are taken from Table A1, but GAP FR3 that is computed as 100.
, where the lower bound z FR3 is taken from Table 4. For comparative purposes, it is also shown z CPX for each category. It stands for the average cost of the CPLEX incumbent solution z CPX in the instances as taken from Table 2.
It can be observed in Table 3 that strategy stra = 4 obtains an incumbent solution value very similar for category Cat 1 to the one obtained by CPLEX straightforward. On the other hand, it is the champion for medium-and large-scale instances (i.e., categories Cat 1 and Cat 2, resp.). Note that the higher the pair (α, β), the smaller the expected cost of the incumbent solution in the scenarios. An important observation is that SFR3 provides a lower bound, such that the optimality gap for the best pair (α, β) is around 6%. Table 4 shows the results obtained for the best of the four strategies (i.e., the one with the smaller solution value), whose computational comparison is shown in Table A1, see Appendix B. The headings are as follows: inst-stra, instance code-strategy, where stra goes from 1 to 4; z FR3 and t FR3 , lower bound of the solution value of model (1)- (30) obtained by SFR3 in an independent run to that based on the chosen strategy (see below), and computing time, resp.; z B FR3 ,z B FR3 and t B FR3 , incumbent solution value (i.e., smallest 'expected' cost (35) in the scenarios), median obtained by SFR3 in the set of e executions that have been carried out for each n-submodel (36), and overall computing time that is required, resp., for the whole set of the e executions; GAP FR3 , optimality gap of the best SFR3 incumbent solution z B FR3 , being computed as 100. . Note: the smaller the goodness ratio GR FR3 < 1, the better performance of SFR3 versus CPLEX straightforward use.
Remark. The lower bound provided by SFR3 stra = 4 can be very weak for small values of pair (α, β). So, z FR3 is the related lower bound of stra = 3 (i.e., the classical fix-and-relax approach), independently of its ability for obtaining a feasible solution for the original model (1)- (30). It is the lower bound of the solution value of the model attached to strategic node n = 0 (i.e., at its iteration t = 1). For that purpose, 2 h time limit has been imposed for instances in Cap 1 and Cap 2, and 4 h is the time limit for the instances in Cap 3.
Note: z FR3 = − means that no solution has been obtained in the B&B root node within the time allowed, as in instances i26, i27, i29 and i32.  Observe that the optimality gap allowed 1% is also satisfied by SFR3 for the instances i1, i2 and i3. Additionally, the solutions obtained by SFR3 are very close to the CPLEX ones for the other members of Cat 1. Probably, it is an indication of the quality of SFR3 as a solution provider. On the other hand, note that all of the 32 instances in the testbed have been solved by matheuristic SFR3, where the strategies stra = 3, 4, usually, provide comparable solutions with the CPLEX ones for the instances in Cat 1, see the goodness ratio GR FR3 , requiring very little computing time. On the other hand, SFR3 consistently provides much better solutions for the instances with higher dimensions. It can also be observed that strategy stra = 4 provides the best solution for most of the instances. However, strategy stra = 3 gives the best solution value of 9 out of the 32 instances in the testbed within a reasonable computing time. Additionally, note that the myopic strategies stra = 1, 2 were needed for two of the most difficult instances, stra = 2 for i32 and even stra = 1 for i29. Observe also the weakness of the lower bound for the instances in Cat 3, if any, reaching the time allowed in all of them. Moreover, note also that the goodness ratio GR FR3 is very small for the instances where CPLEX straightforward use provides a feasible solution; it is below 0.04 for most of the instances in Cat 2 and Cat 3, which is an impressive result obtained by SFR3 versus CPLEX straightforward use; the problem is so difficult that it is hopeless, except for Cat 1.
It can be observed in Appendix B that the median costz FR3 in the e =10 and 5 executions is very close to the incumbent one z FR3 in most of the instances when strategy SFR3 stra = 4 is considered. Obviously, it means that at least 50% of the e executions give very close costs to the incumbent one. So, besides those statistics, it is also important to consider the cost distribution in the set of the e executions, instead of just considering the expected one for some values of pair (α, β). For that purpose, Figure 5 depicts the boxplot for the 'expected' cost (35) distribution as well as the median in the e executions for instance i6; all of those costs are generally named z cost . By construction, strategies stra = 1, 2, 3 only have e = 1 execution, so, the median is the same cost. On the other hand, e = 10 executions have been carried out for strategy stra = 4. Note the impact of considering the two-stage myopic strategy stra = 2 versus the one-stage stra = 1. The randomized relaxation reduction related strategy stra = 4 (where α = β = 0.50) provides a solution with the smallest median.

Conclusions
In this work, a mixed integer linear optimization model is presented for the capacitated hub network expansion planning problem, MS-Hub-NEP, in a multistage multiscale framework under uncertainty. Two types of uncertainties are considered. Both are represented in finite sets of scenarios. The first type is related to the costs of the hub building and initial capacity modules and extensions at the strategic nodes of a multistage scenario tree; it is assumed to be stagewise-dependent and, then, it is attached to each strategic node and influences its successors. The strategic variables have been modeled by considering the step modeling object a tighter approach than its impulse counterpart. Additionally, it is very appropriate for the decomposition matheuristic algorithm SFR3 that is also introduced in this work. In contrast, the second type of uncertainty is related to the hub network operation and, then, it is only stage-dependent. It represents the stage flow demand and cost of flow hub allocation, collection, transfer and non-stop service from origin to destination. A two-stage tree rooted at the strategic nodes is considered for representing the operational scenarios.
MS-Hub-NEP is very difficult to solve, even for medium-scale instances. CPLEX straightforward use could only guarantee the solution's 1% quasi-optimality in the three smallest instances in the experiment that has been carried out, being composed by 32 instances. As a matter of fact, its straightforward use is hopeless for large-and very large-scale instances, where the execution is interrupted when reaching the computing time limit, 2 h and 15 h, resp. The number of hubs, hub edges, stages and strategic nodes for each instance go from 4, 12, 5, 31 to 20, 380, 5, 31, resp., the operational scenarios are 4 and 8 per stage depending on the instance, and the number of constraints, binary variables and continuous variables are 3,473,631, 242,172 and 3,134,844, resp.
The matheuristic SFR3 provides a solution for all of the 32 instances in the testbed. Two of the four strategies that have been experimented with are one-stage and two-stage myopic rolling time horizon-based approaches. They are a benchmark for the other two strategies as well as for solving 2 out of the 13 very large-scale instances; the computing time is affordable. The third strategy is the well-known Fix-and-Relax approach. Its usefulness for solving MS-Hub-NEP has been empirically proven in this experiment, provided that the hardware/software resources allow to solve the n-submodel (36) at the first iteration t = 1. The fourth strategy deals with the scheme for iteratively Randomizing the Relaxation Reduction of constraints and variables' integrality for solving MS-Hub-NEP; it could be expanded without much difficulty to solving other stochastic multistage multiscale problems. The α-and β-mechanisms for selecting operational scenarios and strategic nodes, resp., not to be relaxed in each execution of the submodel to be solved at each iteration of the decomposition approach, have shown to be very efficient for solving this very challenging HLP. The strategy is very flexible and its roots go back to the third strategy; on the other hand, the optimality gap of the attained incumbent solution is reasonable. The strategy usually provides the best results, within affordable computational resources. It outperforms CPLEX straightforward use for large and very large instances (i.e., categories Cat 2 and Cat 3). Note that the solution values of CPLEX and SFR3 are very similar for the medium scale instances Cat 1.
Anyway SFR3 provides the lower bound of the solution value by considering the third strategy for stage t = 1. Its optimality gap is around 6% for most of the instances.

Outline of Perspectives and Future Research Agenda
The structure of the model MS-Hub-NEP (1)- (30) can be considered on other types of system or network expansion problems as the facility location and expansion planning [68] and, in a more general setting, the distributional robust optimization multistage multiscale stochastic problems [70]. The proposed matheuristic methodology, including its scheme for obtaining lower bounds and the potentiality for considering multi-objective functions, could be extrapolated to solve other multistage multiscale MILP as those ones, among others, in a similar way as in [62].
The features of the strategic nodes and the operational scenario submodels that would extend the proposed model as items in a research agenda are as follows: In the strategic node submodel: (a) a moderate additional number of hubs to the two, at most, that right now are considered for any origin-destination flow, (b) strategic uncertainty on the (total or full) unavailability of hub edges and hubs themselves, due to disfunction interruptions; and (c) allowing the possibility of hub leasing for given stages. In the operational scenario submodels: operational uncertainty on the (total or full) unavailability of hub edges and hubs themselves, due to temporal interruptions in the stages. Those features may substantially increase the model's high dimensions, due to the higher cardinality of the operational scenario sets and additional strategic constraints. However, it seems that those modifications can be accepted by the algorithm SFR3 as it is, given its generality.
In any case, note that the computing time that is required by strategies stra = 3, 4 is relatively high for solving the n-submodel (36) for the set included by the first t stages, even for t = 1. So, that difficulty precludes the exclusion of decomposition algorithms that require the optimal solution of even one single scenario submodel as e.g., those approaches based on scenario Lagrangean decomposition, see [49]. Probably, given the character of the state step variables, the Stochastic Nested Decomposition (SND) research would take benefit from the splitting variable scheme introduced in [51] for state binary variables, see also [50]. As a matter of fact, it does not require solving any full scenario submodel. So, it could be interesting to computationally compare the performance of SND against the proposed matheuristic SFR3 alone, as well as considering it in the Front-to-Back step of the first SND iteration.
Those decisions are related to transporting the flow demand from origin to destination. That is, it shows the routes i → j, i → k → j or i → k → l → j, for the flow of each network node pair (ij) under the operational scenario in Π n assumed in the example for the strategic node set under consideration. Recall that i → j denotes the non-stop service mode for node pair (ij), so that in case of occurring then the other routes are prevented. For saving space purposes, let us assume that the operational decisions that have been made for strategic node n = 0 are as the ones considered in Figure 1, so that they have been brought to Figure A1. The operational decisions for the strategic nodes n = 2 and n = 5 in the example are also depicted in the figure. Observe that it is assumed that hub 1 is outbound under the given operational scenario related to strategic node n = 5, being the origin of a portion of the flow demand to destination node 5.
It is worth to point out that it is required to increase the randomized relaxation fraction up to (1 − α) = (1 − β) = 0.84 for some of the most difficult instances, say i31 and i32.
Observe also that the SFR3 strategies stra = 3, 4 usually provide comparable solutions with the CPLEX straightforward use for the instances in Cat 1, see the goodness ratio GR FR3 , requiring very small computing time. On the other hand, SFR3 consistently provides much better solutions for the instances with higher dimensions. Note that in most of the instances the goodness ratio G FR3 is below 0.04, which is an impressive result for SFR3 versus CPLEX straightforward use. On the other hand, it requires an affordable computing time up to instance i22, being usually smaller than the one allowed for CPLEX straightforward use. No solution is provided by the latter for most of the difficult problems in Cat 3. However, the strategies stra = 3, 4 do provide the best solution value, but for instances i29 and i32 where stra = 1, 2 give a better solution in a much smaller computing time.
As a reminder, SFR3 strategy stra = 3 is the classical Fix-and-Relax, which does not provide any solution for instances i25, i31 and i32, see Table A1. The main reason is that CPLEX cannot solve n-submodel (36) at iteration t = 1; note that the variables' integrality is relaxed for some of the submodels. On the other hand, note that it could happen that CPLEX straightforward use (for the full model) does provide a solution for the same instance. The main reason in large-scale instances is that CPLEX -as well as other state-of-the-art solvers-use the available information on the variables' integrality. For the straightforward use of the solver, it is the information on all of the binary variables from the first stage to the last. By contrast, CPLEX in SFR3 strategy stra = 3 does only use the information on the variables' integrality of the strategic nodes where the solution is sought for (i.e., set n ∈ {n} ∪S n and related sets Π n ), while the variables' integrality for the other successor nodes has been relaxed. However, observe that the model dimensions for CPLEX straightforward use could be increased by appending the additional cuts generated by the integrality implications detected in the preprocessing phase. Thus, that increase in the model could prevent attaining solutions at the B&B root node when the computer memory and time requirements are up to the imposed limitations, as it is the case for the testbed in the experiment. As an example, SFR3 stra = 3 gives the incumbent solution value z FR3 = 494,044 for instance i10, requiring t FR3 = 412 seconds. CPLEX straightforward use could not provide a solution at the B&B root node in the 2h time limit, but the lower bound z CPX = 472,112 and, on the other hand, z CPX = 2,266,526 is the very bad solution obtained by the CPLEX preprocessing heuristics; see Table 2 in the main body of the paper. Thus, GR FR3 = 0.22.
It can be observed for SFR3 stra = 4 that the median costz FR3 in the e =5 and 10 executions is very close to the incumbent z FR3 in most of the instances. Obviously, it means that at least 50% of the e executions give very close costs to the incumbent one. So, besides those statistics, it is also important to consider the cost distribution in the set of the e executions, instead of just the expected one for some values of pair (α, β). For clarification purposes, see Figure A1 above.  inst-stra t t * α β t