Network- and Demand-Driven Initialization Strategy for Enhanced Heuristic in Uncapacitated Facility Location Problem

Lin, Jayson; Yang, Shuo; Huang, Kai; Wang, Kun; Jang, Sunghoon

doi:10.3390/math13132138

by Jayson Lin ^1,2,3, Shuo Yang ^1,*, Kai Huang ^2,4,*, Kun Wang ^1 and Sunghoon Jang ^3

^1 School of Civil Engineering, Anhui JianZhu University, Hefei 230601, China

^2 School of Instrument Science and Engineering, Southeast University, Nanjing 210096, China

^3 Department of Civil and Environmental Engineering, The Hong Kong Polytechnic University, Hong Kong 999077, China

^4 Wuxi Campus, Southeast University, Wuxi 214100, China

^* Authors to whom correspondence should be addressed.

Mathematics 2025, 13(13), 2138; https://doi.org/10.3390/math13132138

Submission received: 13 May 2025 / Revised: 17 June 2025 / Accepted: 20 June 2025 / Published: 30 June 2025

(This article belongs to the Special Issue Advanced Approaches to Mathematical Programming: Exact Methods, Metaheuristics, and Machine Learning Synergies)

Abstract

As network scale and demand rise, the Uncapacitated Facility Location Problem (UFLP), a classical NP-hard problem widely studied in operations research, becomes increasingly challenging for traditional methods confined to formulation, construction, and benchmarking. This work generalizes the UFLP to a network setting that accounts for demand intensity and network topology. A new initialization technique called Network- and Demand-Weighted Roulette Wheel Initialization (NDWRWI) is introduced and shown to be a competitive alternative to random (RI) and greedy (GI) initializations. Experiments were carried out on the TRB dataset, comparing eight state-of-the-art methods. For instance, in the ultra-large-scale Gold Coast network, the NDWRWI-based Neighborhood Search (NS) achieved a competitive total cost (9,372,502), closely comparable to the best-performing baseline (RI-based: 9,189,353), while delivering superior clustering quality (Silhouette: 0.3859 vs. 0.3833 and 0.3752 for RI- and GI-based NS, respectively) and reducing computational time by nearly an order of magnitude relative to the GI-based baseline. Similarly, NDWRWI-based Variable Neighborhood Search (VNS) improved upon the RI-based baseline by reducing the overall cost by approximately 3.67%, increasing clustering quality, and achieving a 27% faster runtime. NDWRWI is found to prioritize high-demand and centrally located nodes, fostering high-quality initial solutions and robust performance across large-scale and heterogeneous networks.

Keywords:

stochastic optimization; heuristic algorithm; network-based uncapacitated facility location problem; constructive method; demand intensity; network topology

MSC:

90-10

1. Introduction

Facility location problems are a core area of operations research, underpinning practical progress in logistics, supply chains, and urban planning. These problems aim to determine optimal facility locations that reduce construction and transport costs while ensuring efficient service coverage [1,2,3]. The Uncapacitated Facility Location Problem (UFLP) and its variants are particularly important for large-scale, real-world networks with many demand nodes. Such problems require sophisticated algorithmic solutions to deliver high-quality results in reasonable computational times [4,5,6,7,8]. In 1979, Kariv and Hakimi proved that the p-median problem (pMP) is NP-hard, a finding that also applies to its variant, the UFLP [9]. This result underscores the difficulty of finding optimal solutions for general instances on arbitrary networks. From the perspective of optimization theory, both are recognized as general mathematical problems that have been extensively studied.

Contemporary solution approaches include exact, heuristic, and metaheuristic methods. Exact approaches, such as advanced branch-and-bound [10,11,12] and decomposition techniques [13,14,15,16,17], provide optimality guarantees for small- to medium-size instances, but they are not applicable to large networks [18]. Heuristic approaches like Greedy algorithms [19,20] offer improved scalability [21] but may compromise on solution quality or consistency [18]. Metaheuristics are very effective for solving complex UFLP instances. For example, genetic algorithms [22,23,24,25] outperform deterministic methods but are prone to becoming trapped in local optima [26]. In addition, Neighborhood Search (NS) [27] and Variable Neighborhood Search (VNS) [28] are among the state-of-the-art methods for solving large-scale UFLPs and related discrete facility location problems [18]. The classic NS [27] iteratively reallocates facility-customer pairs within pre-specified spatial clusters. Although it converges to a local optimum, its global search ability is restrained by its myopic moves. The more recent VNS [28] generalizes NS by embedding the exchange procedure in an iterative framework with a shaking step, a local search phase, and the evaluation of interchanges. This approach has been shown to work well at a large scale. These techniques trade off local intensification and global exploration, which leads to better solution quality in challenging high-dimensional environments.

Although a lot of effort has been devoted to the research on NS and VNS schemes for improvement, the quality of the initial solution provided by constructive methods is also important but sometimes neglected. Both NS and VNS need a feasible solution, and its quality directly impacts convergence rate and the result quality. Hansen and Mladenović [28], who introduced VNS, advocated using random initialization (RI) to encourage diversification. While RI enhances diversification through generating random or semi-random starting points, it often yields poor solutions without further improvement and typically requires multiple runs [29]. In contrast, greedy initialization (GI) [19] offers a flexible framework for addressing various facility location problems. For instance, Jacobsen [30] extended this approach to the capacitated facility location context by incorporating considerations of remaining facility capacity during selection. GI is simple to implement and produces a feasible solution rapidly. However, its inherently myopic strategy may lead to suboptimal outcomes, as it does not account for the global structure of the problem space [31]. More recently, advanced initialization methods like GRASP (Greedy Randomized Adaptive Search Procedure) have emerged [32,33,34], balancing greediness and randomness to generate higher-quality initial solutions prior to local search. Techniques such as roulette wheel initialization [35], commonly used in algorithms like K-means++, exemplify randomized greedy construction by selecting candidates based on weighted probabilities, offering a flexible and broadly applicable alternative to purely random or purely greedy methods.

Traditional methods such as RI [28] or GI [19,20] remain widely used, yet these approaches frequently generate suboptimal or heavily biased solutions, especially in large and heterogeneous networks [36]. While recent studies have introduced various improvements to exploration mechanisms [37,38,39] and hybridization with other metaheuristics [40,41,42], only limited attention has been given to the development of effective initialization strategies tailored for UFLPs. As a result, state-of-the-art metaheuristic performance is often inherently constrained by the quality of starting solutions, leading to increased computational overhead and reduced robustness. In addition to the challenges in initialization, two further research gaps persist. First, there remains a lack of clear distinction and formal definition between discrete and network-based UFLPs, despite substantial progress in applying VNS to pMP variants such as the Hamiltonian p-median problem (HpMP) [40], the α-neighbor p-median problem [41], the balanced p-median problem [42], and the capacitated p-median problem (CpMP). This gap limits both the development of new solution strategies and the formulation of appropriate models for network-based pMP variants. Second, there is a shortage of benchmark instances suitable for evaluating network-based UFLP solutions. The widely used OR-Library test sets [43] (Beasley, 1985; 1990) provide 40 standard UFLP instances, with problem sizes ranging from 100 to 900 nodes, but these datasets are strictly discrete: facilities and customers are indexed independently, and cost information is given only as explicit matrices, without regard to network topology or spatial structure. Additionally, other existing studies, such as those by Gwalani, Tiwari & Mikler [44] and Daskin & Maass [45], are limited both in terms of network size (small to medium scale) and the breadth of algorithmic comparison, further constraining the advancement and standardization of research in this field.

In response to the limitations of existing constructive methods, notably their failure to cope with large-scale, network-based UFLP instances, this study aims to contribute to this area through mathematical modeling, theoretical analysis, and experimental validation. The main contributions are as follows:

Formal definition of the N-UFLP: This study lays the basis for further research on the Network-Based Uncapacitated Facility Location Problem (N-UFLP), addressing the gap in the existing literature, which largely focuses on discrete UFLP models and benchmark datasets. In the case study and experiments, we use the TRB dataset [46], which is often used as a benchmark for transportation networks.
Development of highly competitive constructive methods: Although significant research effort has been devoted to the development of improvement algorithms/metaheuristics (e.g., NS and VNS), there has been limited exploration of alternative constructive algorithms beyond random or greedy initialization. We propose a new initialization method specific to the N-UFLP, which uses network topology and demand intensity and shows better performance than baseline initialization methods.
Comparative benchmark testing and evaluation: Multi-case comparative tests use large-scale network data (i.e., the Gold Coast network) and six additional networks with UFLP attributes. The study compares mainstream algorithms, such as branch-and-bound, genetic algorithms, Lagrangian relaxation, greedy heuristics, NS, VNS, and alternative initialization techniques. Through evaluation across total cost, computational efficiency, and clustering quality, this work not only provides new insights into the relative strengths of different constructive methods but also forms a reference for future research on large-scale network-based facility location.

The rest of this paper is structured as follows. Section 2 formally defines the N-UFLP. Section 3 describes the methodological setting, which integrates baseline algorithms, the proposed method, and the comparison of algorithmic mechanisms and computational complexity. Section 4 describes the experimental design and the results. Section 5 discusses the findings and their wider implications for research. Finally, Section 6 concludes the work and outlines possible avenues for future research.

2. Problem Definition

2.1. Network-Based Uncapacitated Facility Location Problem (N-UFLP)

The UFLP, also known as the Simple Plant Location Problem [47] or Warehouse Location Problem [48], is one of the classical location problems in operations research. It aims to determine an optimal subset of facility sites from a given set of candidates to minimize the total cost, which generally consists of two components: the setup costs of the opened facilities (fixed charge) and the sum of the transportation costs. Unlike routing problem formulations such as the Vehicle Routing Problem (VRP) and the Capacitated VRP (CVRP), which aim at finding the best transportation paths between nodes [11], the UFLP, as a prototypical location problem, jointly optimizes the opening of facilities and the assignment of customers to facilities so that all demand is satisfied at the least cost. This cost is typically represented as the total demand-weighted travel cost, computed from a cost matrix.

This matrix is typically estimated using shortest-path methods, making the UFLP fundamentally built upon the outcomes of classical routing problems.

Prior studies [4,5,9,11,45,47] have investigated and formalized the mathematical structure of the UFLP. Let I = {1, 2, …, M} denote the set of all potential facility nodes with |I| = M, and suppose every node is a candidate for facility location. Let J ⊆ I, J = {1, 2, …, N} (N ≤ M) specify the subset of customer nodes; the following parameters can be defined:

Trip attraction (h_j): h_j is the demand (trip attraction) at customer node (demand site) j; together, the demands form an N × 1 vector h.
Distance (d_ij): the effective (shortest-path) distance between facility i and customer j.
Facility setup cost (α): a scalar parameter α denotes the unit fixed cost of opening a facility. This cost is assumed to be identical for all candidate locations.
Transportation cost rate (β): a scalar parameter β multiplies the demand-weighted effective distance to yield the transportation cost.

The UFLP can then be formulated as follows:

\min \; α \sum_{i \in I} y_{i} + β \sum_{j \in J} \sum_{i \in I} h_{j} d_{ij} x_{ij}

(1)

\text{s.t.} \quad \sum_{i \in I} x_{ij} = 1, \quad \forall j \in J,

(2)

x_{ij} \leq y_{i}, \quad \forall i \in I, \forall j \in J,

(3)

y_{i} \in \{0, 1\}, \quad \forall i \in I,

(4)

x_{ij} \in \{0, 1\}, \quad \forall i \in I, \forall j \in J.

(5)

where y_i and x_ij are binary decision variables indicating whether facility i is opened and whether customer j is assigned to facility i, respectively. The objective function seeks to minimize total costs; constraint (2) requires each customer to be served exactly once; constraint (3) enforces that assignments can only occur to open facilities. All variables are binary, reflecting the combinatorial nature of the classical UFLP.
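As a minimal sketch of how objective (1) can be evaluated once the set of open facilities is fixed: because facilities are uncapacitated, each customer can simply be assigned to its nearest open facility, so constraints (2) and (3) hold by construction. The function name `uflp_cost`, the toy distance matrix, and the parameter values are illustrative assumptions and do not come from the paper.

```python
def uflp_cost(open_sites, d, h, alpha, beta):
    """Total cost of Eq. (1) for a fixed set of open facilities.

    d[i][j]: distance between facility i and customer j
    h[j]:    demand of customer j
    alpha:   coefficient of the facility term; beta: coefficient of the transport term
    """
    if not open_sites:
        return float("inf")  # with no open facility, no customer can be served
    fixed = alpha * len(open_sites)
    transport = 0
    for j, demand in enumerate(h):
        # Uncapacitated case: the cheapest assignment is the nearest open facility.
        transport += demand * min(d[i][j] for i in open_sites)
    return fixed + beta * transport


# Hypothetical 3-node instance (all values made up for illustration).
d = [[0, 4, 7],
     [4, 0, 3],
     [7, 3, 0]]
h = [10, 20, 30]
cost_one = uflp_cost({1}, d, h, alpha=100, beta=1)
cost_two = uflp_cost({0, 2}, d, h, alpha=100, beta=1)
```

In this toy instance, opening a second facility lowers the transport term but does not pay for the extra fixed cost, which is the trade-off the objective encodes.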

The UFLP takes customer locations, customer demand, transportation costs, and facility setup costs into consideration but does not explicitly account for network topology. This gap is addressed in this work by introducing the N-UFLP, an extension of the original formulation that explicitly incorporates network topology and its associated operational characteristics:

Node location integration: Both facilities and customers are located on a unified set of nodes, in which the first N nodes serve dual roles. This contrasts with the traditional UFLP, which separates customers from facilities and may restrict self-service (i.e., a node hosting a facility serving its own demand). The integrated setting better captures realistic applications, including telecommunication, logistics, and waste management systems, in which real-world nodes (e.g., intersections, hubs, or urban locations) can act both as service origins and destinations, improving model fidelity to network realities [5,48,49].
Path-dependent distance metric: Transportation costs depend on shortest paths on the network, computed as sums of edge weights between nodes using the Floyd-Warshall algorithm. Unlike classical UFLP models, which assume Euclidean distances or direct transportation costs, this approach derives path costs from the network topology. Additionally, we assume that the adjacency-based distance is undirected, which implies that the path-dependent distance is also undirected. For applications such as transportation and waste collection, shortest-path network distances (rather than geometric distances) reflect operational reality, since actual paths are defined by the geometry and connectivity of the network [2,50,51].
Network distance matrix: A pre-computed M × M matrix stores all-pairs shortest-path distances for M nodes, replacing conventional transportation cost matrices. The precomputation of shortest-path distances makes it feasible to solve big problems efficiently and to obtain fast cost estimates within metaheuristic or exact optimization frameworks widely adopted in both academic research and industrial applications [2,52].
Network connectivity constraints: The model accommodates disconnected subnetworks (e.g., isolated traffic zones) by creating virtual edges that attach isolated nodes to the closest connected component, ensuring full service coverage. This is critical in dense urban logistics, telecommunication, and emergency service applications, where service must be guaranteed in the presence of network disconnections and dynamic topological changes [49,53]. Virtual/auxiliary links are commonly used in practice to ensure full coverage.
Demand-weighted travelled cost: It combines customer demand volumes with path distances to generate a demand-weighted travelled cost metric. Weighting travel costs by demand mirrors operational objectives in logistics and supply chain management, where both the volume and the distance traveled significantly influence the optimal facility location and overall system efficiency [49].

The case in Figure 1 is adapted from the 6-point tree provided by Goldman [54], modified only to emphasize key assumptions in the N-UFLP. In the pre-location network (Figure 1a), every node {A, B, C, D, E, F} acts as a customer demand point, with trip attraction weights {100; 150; 125; 175; 250; 200}. Edges correspond to adjacent distances (undirected), ranging approximately from 3 to 9 units. In the post-location network, two nodes (labeled D and E) are selected as facilities, allowing self-service and reflecting real-world scenarios in which facilities and customers coexist on the same network nodes. The assignment costs between customer nodes and facilities are determined by the shortest-path distances along the network rather than by simple Euclidean distance (for example, assigning A to facility D incurs a cost equal to the sum of the edge weights along the shortest path, which here is 3 + 5 = 8); these costs change whenever the network’s topology or edge weights change. Before solving, all-pairs shortest-path distances are computed and stored in a distance matrix. The model also accommodates the possibility of disconnected network components; if, for example, node E is isolated, a virtual edge with a suitably high cost will be added to ensure it remains serviceable (virtually connected to point C here), thereby guaranteeing that the solution is defined for any network configuration, even in fragmented real-world settings. Finally, the cost of assigning customers to facilities is weighted by customer demand. For example, if the demand at F is large, the travelled cost of assigning F to the facility at D will also be large, so the cost function mirrors real logistical or supply chain problems.

Based on the assumptions and explanations outlined above, we further define the specific sets and indices. First, define the adjacency-based distance matrix as Δ = [δ_ir], for all i, r ∈ I:

δ_{ir} = \begin{cases} 0, & \text{if } i = r, \ \forall i \in I, \\ ϖ_{ir}, & \text{if } i \text{ and } r \text{ are adjacent nodes with edge weight } ϖ_{ir} \text{ (symmetry: } ϖ_{ir} = ϖ_{ri}), \\ γ(i, r), & \text{if } i \text{ is an isolated node but with customer demand}, \\ \infty, & \text{if } i \text{ and } r \text{ are not connected after connectivity adjustment}. \end{cases}

(6)

where γ(i, r) is the Euclidean distance used for isolated nodes, computed from the geographic coordinates given in the M × 2 matrix nodeCoords when a node i with customer demand is identified as isolated (i.e., δ_ir = ∞ for all r ≠ i), and ϖ_ir denotes the actual edge weight for directly connected nodes. The geodist function triggers the γ condition when either node i or r is an isolated node with customer demand. Such nodes are special cases within the network data, often corresponding to unique subareas in real-world transportation networks or OD points connected via feeder branches. To ensure effective routing for demand originating from these isolated nodes, we connect them “virtually” to their geographically nearest reachable node via the Euclidean distance.
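The connectivity adjustment in Eq. (6) can be sketched as follows. This is only an illustration under stated assumptions: a node counts as isolated when it has no finite off-diagonal entry, "nearest reachable node" is interpreted as the nearest non-isolated node, and the function name `connect_isolated_nodes` and all data are hypothetical rather than taken from the paper's implementation.

```python
import math

INF = float("inf")


def connect_isolated_nodes(delta, coords, demand):
    """Return a copy of delta in which every isolated demand node receives one
    virtual, undirected edge to its geographically nearest non-isolated node."""
    m = len(delta)

    def is_isolated(i):
        return all(delta[i][r] == INF for r in range(m) if r != i)

    anchors = [i for i in range(m) if not is_isolated(i)]
    patched = [row[:] for row in delta]
    for i in range(m):
        if is_isolated(i) and demand[i] > 0 and anchors:
            nearest = min(anchors, key=lambda r: math.dist(coords[i], coords[r]))
            gamma = math.dist(coords[i], coords[nearest])  # Euclidean length of the virtual edge
            patched[i][nearest] = patched[nearest][i] = gamma
    return patched


# Hypothetical data: nodes 0 and 1 share an edge of weight 5; node 2 has demand but no edges.
delta = [
    [0, 5, INF],
    [5, 0, INF],
    [INF, INF, 0],
]
coords = [(10.0, 0.0), (0.0, 0.0), (3.0, 4.0)]
demand = [100, 150, 125]
patched = connect_isolated_nodes(delta, coords, demand)
```

Here node 2 lies at Euclidean distance 5 from node 1 and farther from node 0, so the single virtual edge is attached to node 1 while the original matrix is left unchanged.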

Next, compute the shortest-path distance matrix D = [d_ir]_{M×M} using the adjacency matrix Δ:

d_{i r} = ψ (i, r; Δ), \forall i, r \in I

(7)

where ψ(i, r; Δ) denotes the length of the shortest path from i to r in the graph defined by Δ. This ensures that all transportation costs reflect the underlying network topology.
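For illustration, ψ in Eq. (7) can be realized with the Floyd-Warshall algorithm mentioned in Section 2.1. The sketch below is a plain-Python version on a hypothetical four-node network; the edge weights are made up and the function name is an assumption.

```python
INF = float("inf")


def floyd_warshall(delta):
    """All-pairs shortest-path lengths from an adjacency-based distance matrix."""
    m = len(delta)
    d = [row[:] for row in delta]  # work on a copy of the input matrix
    for k in range(m):             # k is the intermediate node allowed so far
        for i in range(m):
            d_ik = d[i][k]
            if d_ik == INF:
                continue           # no path i -> k, so k cannot shorten anything from i
            for r in range(m):
                if d_ik + d[k][r] < d[i][r]:
                    d[i][r] = d_ik + d[k][r]
    return d


# Hypothetical undirected network: 0-1 (3), 1-2 (5), 2-3 (4), 0-2 (9).
delta = [
    [0, 3, 9, INF],
    [3, 0, 5, INF],
    [9, 5, 0, 4],
    [INF, INF, 4, 0],
]
D = floyd_warshall(delta)
```

In this example the direct edge 0-2 of weight 9 is replaced by the two-edge path of length 3 + 5 = 8, and the symmetric input yields a symmetric D, matching the undirected assumption.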

To account for demand, we define the demand-weighted traveled distance (W_ij) as:

W_{i j} = h_{j} \cdot d_{i j}

(8)

where h_j is the demand at customer node j. This transportation cost is incurred when assigning customer demand h_j to a facility at node i.

After the demand-weighted travelled distance has been defined, we can further enhance the mathematical formulation. Similar to the general types, the N-UFLP aims to minimize total costs by optimizing both facility locations and assignment relationships.

\text{minimize} \; α \sum_{i \in I} y_{i} + β \sum_{j \in J} \sum_{i \in I} h_{j} d_{ij} x_{ij} = α \sum_{i \in I} y_{i} + β \sum_{j \in J} \sum_{i \in I} W_{ij} x_{ij}

(9)

where the first component aims to reduce facility setup costs, which are determined by construction rates and facility opening decision variables; the second component focuses on transportation costs, which depend on transportation rates, customer demand, travelled distances, and assignment decision variables.

In terms of constraints, additional considerations include node location integration and path-dependent distance metrics, corresponding to the assumptions above:

Self-service feasibility:

x_{j j} = y_{j}, \forall j \in J

(10)

which forces the assignment variable that represents “customer j serves itself” (x_jj) to equal the opening variable of the same node (y_j). Compared to the classical UFLP, although the standard assignment-opening link x_ij ≤ y_i, ∀ i ∈ I, ∀ j ∈ J already prevents assigning demand to a closed site, this formulation additionally incorporates node location integration, allowing facilities and customers to be located at the same point by employing a shared set J (J ⊆ I) for both facility and customer decision variables. This means that if node j is selected as a facility location, the demand at node j is automatically assigned to the facility at that point.

Path consistency (triangle inequality) constraint:

d_{ir} \leq d_{im} + d_{mr}, \quad \forall i, m, r \in I

(11)

where d_ir refers to the effective transportation distance (shortest-path length) between nodes i and r on the underlying network G = (I, E) with non-negative edge weights. The constraint enforces the triangle inequality for every ordered triple of nodes: for any detour that goes from i to r through an intermediate node m, the direct shortest-path distance d_ir cannot exceed the length of that detour, d_im + d_mr.
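A shortest-path matrix satisfies Eq. (11) automatically, so in practice the constraint mainly serves as a check that the distance data are valid. A minimal sketch of such a check, with a hypothetical function name and made-up matrices:

```python
def satisfies_triangle_inequality(d, tol=1e-9):
    """True if d[i][r] <= d[i][m] + d[m][r] for every ordered triple (i, m, r)."""
    n = len(d)
    return all(
        d[i][r] <= d[i][m] + d[m][r] + tol
        for i in range(n)
        for m in range(n)
        for r in range(n)
    )


# Raw edge weights may violate the inequality (9 > 3 + 5) ...
raw = [[0, 3, 9],
       [3, 0, 5],
       [9, 5, 0]]
# ... whereas the corresponding shortest-path distances do not.
shortest = [[0, 3, 8],
            [3, 0, 5],
            [8, 5, 0]]
raw_ok = satisfies_triangle_inequality(raw)
shortest_ok = satisfies_triangle_inequality(shortest)
```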

Variables x_ij and y_i are the decision variables representing the assignment relationships and facility location, respectively:

x_{ij} = \begin{cases} 1, & \text{if customer } j \text{ is assigned to facility } i, \\ 0, & \text{otherwise;} \end{cases}

(12)

y_{i} = \begin{cases} 1, & \text{if a facility is opened at node } i, \\ 0, & \text{otherwise.} \end{cases}

(13)

It should be noted that customer nodes may also be candidate facility sites, allowing co-location.

The N-UFLP model can be formulated as follows:

\min \; α \sum_{i \in I} y_{i} + β \sum_{j \in J} \sum_{i \in I} h_{j} d_{ij} x_{ij}

(14)

\text{s.t.} \quad \sum_{i \in I} x_{ij} = 1, \quad \forall j \in J,

(15)

\sum_{i \in I} y_{i} \leq p,

(16)

x_{ij} \leq y_{i}, \quad \forall i \in I, \forall j \in J,

(17)

x_{jj} = y_{j}, \quad \forall j \in J,

(18)

d_{ik} \leq d_{im} + d_{mk}, \quad \forall i, m, k \in I,

(19)

y_{i} \in \{0, 1\}, \quad \forall i \in I,

(20)

x_{ij} \in \{0, 1\}, \quad \forall i \in I, \forall j \in J.

(21)

where y_i and x_ij are the facility location and customer assignment decision variables, respectively. The objective function (14) minimizes the total cost. Constraint (15) requires each customer at demand site j to be served by exactly one facility. Constraint (16) limits the number of open facilities to at most p. In this research, we further set p equal to the square root of the number of customer sites N. Constraint (17) is the facility activation constraint: customer demand can only be assigned to open facilities. Constraint (18) links self-service to facility opening: a customer is self-served exactly when a facility is installed at that node, closing a loophole that arises when customer and facility sets overlap and self-service incurs zero travel distance. Constraint (19) secures the metric validity of the distance matrix, aligns transportation costs with physically realizable routes, and prevents the optimization model from exploiting impossible shortcuts when path-dependent distances are part of the decision process.
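To make constraints (15)-(18) concrete, the following sketch checks a candidate solution for feasibility. It assumes that customers are the first N nodes of I (so x[j][j] is well defined) and that the square root of N is rounded down to obtain p; the rounding rule and the function name `is_feasible` are assumptions, not details given in the text.

```python
import math


def is_feasible(y, x, p):
    """Check constraints (15)-(18) for binary y[i] and x[i][j]."""
    m, n = len(y), len(x[0])
    if sum(y) > p:                                   # (16): at most p open facilities
        return False
    for j in range(n):
        if sum(x[i][j] for i in range(m)) != 1:      # (15): exactly one facility per customer
            return False
        if x[j][j] != y[j]:                          # (18): self-service iff node j is open
            return False
    # (17): assignments only to open facilities
    return all(x[i][j] <= y[i] for i in range(m) for j in range(n))


n_customers = 4
p = math.isqrt(n_customers)  # assumed rounding: floor of sqrt(N)
y = [0, 1, 0, 1]
# Customers 0 and 1 use facility 1; customers 2 and 3 use facility 3.
x_ok = [[0, 0, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 1, 1]]
# Customer 1 is sent to facility 3 although node 1 itself is open: violates (18).
x_bad = [[0, 0, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 1, 1, 1]]
```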

2.2. Assumptions

In addition to the assumptions in Section 2.1, the model assumes the following:

All facilities incur identical fixed setup costs, i.e., no site-dependent cost variations.
Facilities are either fully open or closed; partial opening is prohibited.
Each customer is entirely served by a single facility.
There is no upper limit on the demand a facility can serve.
All costs, demands, and distances are deterministic and known with certainty.
The maximum number of facilities p is the square root of N (budget-constrained).

3. Methodology

3.1. Baseline Methods

3.1.1. Roulette Wheel Initialization (RWI)

Roulette wheel initialization, as described by Celebi [41], is widely adopted as a heuristic approach, most notably for enhancing iterative selection mechanisms in genetic algorithms and the initialization process of K-Means++. The essence of this method lies in its probabilistic selection scheme: at iteration t, the probability of selecting node k from the remaining candidates K\S_t₋₁ is proportional to its weight ω_k, normalized over all unselected nodes. Formally,

P(s_{t} = k \mid S_{t-1}) = \frac{ω_{k}}{\sum_{j \in K \setminus S_{t-1}} ω_{j}}, \quad s_{t} \sim \text{Discrete}(P(\cdot))

(22)

where S_t₋₁ is the set of facilities already selected before iteration t, and K\S_t₋₁ is the set of remaining candidate nodes. In the basic version, one may set ω_k = (δ_k)^min. The notation s_t ~ Discrete(P(·)) means that s_t is drawn from the discrete distribution defined by the probabilities P(·). This ensures that nodes with higher weights have a correspondingly greater chance of being selected, thereby guiding the initialization process in a data-driven manner.
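The selection rule in Eq. (22) can be sketched as sampling without replacement, with the weights renormalized over the unselected nodes at each draw. The sketch assumes static, non-negative weights supplied by the caller (a variant with ω_k = (δ_k)^min would recompute them after every draw); the function name and the uniform fallback for an all-zero weight vector are illustrative choices.

```python
import random


def roulette_wheel_select(weights, p, rng=None):
    """Draw up to p distinct nodes; each draw picks node k with probability
    weights[k] / (sum of weights of the nodes not yet selected)."""
    rng = rng or random.Random()
    remaining = list(range(len(weights)))
    selected = []
    while remaining and len(selected) < p:
        w = [weights[k] for k in remaining]
        if sum(w) > 0:
            k = rng.choices(remaining, weights=w, k=1)[0]
        else:
            k = rng.choice(remaining)  # assumed fallback: uniform draw if all weights are zero
        selected.append(k)
        remaining.remove(k)            # renormalization happens implicitly on the next draw
    return selected


# Node 2 carries all the weight, so it is always drawn first.
first = roulette_wheel_select([0, 0, 5, 0], p=1, rng=random.Random(1))
# With positive weights, three distinct nodes are returned.
three = roulette_wheel_select([1, 2, 3, 4], p=3, rng=random.Random(0))
```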

3.1.2. Greedy Initialization (GI)

The greedy heuristic initialization method [19,20], commonly applied in facility location problems, iteratively selects facilities to minimize the N-UFLP’s total cost. While prioritizing immediate cost reduction, it often lacks exploratory diversity, which may lead to convergence at local optima. Let S_t denote the set of facilities selected at iteration t (with S₀ = ∅) and K\S_t₋₁ represent the set of unselected candidate nodes, where K = {1, 2, …, N}. The basic process is as follows.

Phase 1 (initial facility selection)
Initialize S₀ = ∅ and t = 0, so that the candidate set is K\S₀ = K.
Phase 2 (iterative marginal-cost evaluation)
While |S_t| < p, set t = t + 1:

(a): Cost evaluation

For each candidate node k ∈ K\S_t₋₁, compute the temporary total cost Z(S_t₋₁ ∪ {k}) incurred by opening k:

Fixed facility cost:

FC(S_{t-1} \cup \{k\}) = α \cdot (|S_{t-1}| + 1)

(23)

Transportation cost: Assign each customer j ∈ J to its nearest facility in S_t₋₁ ∪ {k}

\mathrm{NN}(j \mid S_{t-1} \cup \{k\}) = \arg\min_{i \in S_{t-1} \cup \{k\}} d_{ji}

(24)

where d_ji is the shortest-path distance from j to i. The transportation cost becomes

TC(S_{t-1} \cup \{k\}) = β \sum_{j \in J} h_{j} \cdot d_{j, \mathrm{NN}(j \mid S_{t-1} \cup \{k\})}

(25)

Total cost:

Z(S_{t-1} \cup \{k\}) = FC(S_{t-1} \cup \{k\}) + TC(S_{t-1} \cup \{k\})

(26)

(b): Selection criterion

Select the candidate k* ∈ K\S_t₋₁ that maximizes the marginal cost reduction:

k^{*} = \arg\max_{k \in K \setminus S_{t-1}} [Z(S_{t-1}) - Z(S_{t-1} \cup \{k\})]

(27)

where Z(S_t₋₁) is the total cost of the current facility set S_t₋₁. It should be noted that when S_t₋₁ = ∅, define Z(S_t₋₁) = ∞ (no facilities imply infinite unassigned demand cost).

(c): Update

Add k* to the facility set and remove it from candidates:

S_{t} \leftarrow S_{t-1} \cup \{k^{*}\}

(28)

The greedy method directly optimizes the N-UFLP objective by sequentially minimizing

Z(S_{t-1} \cup \{k\}) = \underbrace{α (|S_{t-1}| + 1)}_{\text{Fixed Cost}} + \underbrace{β \sum_{j \in J} h_{j} \cdot \min_{i \in S_{t-1} \cup \{k\}} d_{ji}}_{\text{Transportation Cost}}.

(29)

which guarantees that each iteration adds the facility yielding the greatest reduction in total cost. Although commonly adopted as a constructive algorithm, the method requires evaluating every remaining candidate facility k ∈ K\S_t₋₁ and calculating the total cost Z(S_t₋₁ ∪ {k}). For each candidate, this involves assigning every customer j ∈ J to its closest facility within S_t₋₁ ∪ {k}, which requires an argmin over |S_t₋₁| + 1 facilities, and this process is repeated for all |K\S_t₋₁| candidates. With |J| = N, |I| = M, and a final number of facilities p, the computational cost per iteration is O(|K\S_t₋₁|·|J|·(|S_t₋₁| + 1)). Since K\S_t₋₁ shrinks as facilities are selected and p iterations are required, a typical worst-case bound is O(MNp); this bound assumes that each customer's distance to its current nearest facility is cached, so that evaluating one candidate costs O(N), whereas recomputing the argmin from scratch in every iteration gives O(MNp²). For large-scale networks or urban scenarios where both M and N are large, this complexity becomes computationally prohibitive, severely limiting the practicality of the greedy method for real-time or iterative metaheuristic frameworks.
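The greedy procedure of Eqs. (23)-(28) can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes finite distances, caches each customer's distance to its nearest selected facility so that one candidate evaluation costs O(N), breaks ties by lowest index, and always adds a facility until p are open, as in the loop condition above. The instance data are made up.

```python
def greedy_initialization(d, h, p, alpha, beta):
    """Greedy construction: repeatedly open the candidate with the lowest Z(S ∪ {k}).

    d[j][k]: shortest-path distance from customer j to candidate k (assumed finite)
    h[j]:    demand of customer j
    Returns the selected facilities in order of selection and the final total cost.
    """
    n_customers, n_candidates = len(h), len(d[0])
    selected = []
    nearest = [float("inf")] * n_customers  # cached distance to nearest selected facility
    total = float("inf")                    # Z of the empty set is defined as infinity
    while len(selected) < min(p, n_candidates):
        best_k, best_z, best_nearest = None, float("inf"), None
        for k in range(n_candidates):
            if k in selected:
                continue
            trial = [min(nearest[j], d[j][k]) for j in range(n_customers)]
            z = alpha * (len(selected) + 1) + beta * sum(
                h[j] * trial[j] for j in range(n_customers)
            )
            if z < best_z:  # lowest Z is the largest marginal reduction, Eq. (27)
                best_k, best_z, best_nearest = k, z, trial
        selected.append(best_k)
        nearest, total = best_nearest, best_z
    return selected, total


# Hypothetical symmetric 3-node instance.
d = [[0, 4, 7],
     [4, 0, 3],
     [7, 3, 0]]
h = [10, 20, 30]
facilities, cost = greedy_initialization(d, h, p=2, alpha=100, beta=1)
```

On this instance the first pick is node 1 (cost 230, tied with node 2 and resolved by index), and the second pick is node 2, giving a total cost of 240.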

3.1.3. Neighborhood Search Algorithm (NS) and Greedy-Initialized Neighborhood Search

Throughout this work, we use NS, originally proposed in [27], to minimize the total cost according to the N-UFLP objective. The general NS process starts from an initial facility set S, usually produced by greedy initialization or alternative techniques, and at each iteration searches neighboring solutions for a reduction in cost. In each iteration, all customers are reassigned to their closest open facility, and for each facility, the algorithm searches through the M potential locations to find the one that minimizes the sum of demand-weighted distances to the customers assigned to that facility. Subsequently, the best neighboring solution S′ is accepted if it yields a lower total cost Z(S′), and this process repeats until no further improvement can be found.

Given an initial facility set S (|S| = p), each NS iteration evaluates all possible swaps between i ∈ S and k ∈ K\S, giving approximately p × (M − p) ≈ pM neighboring solutions. For each neighbor, all customers are reassigned to their nearest open facility, requiring N operations. Thus, each iteration of NS has complexity O(pMN). For common swap/add/drop implementations, this is often simplified to O(p²N), as in the baseline. If NS executes for T total iterations, the overall complexity of greedy-initialized NS is:

O(M \cdot N \cdot p) \; \text{(GI)} + O(T \cdot p^{2} \cdot N) \; \text{(NS)}

where T denotes the number of NS improvement iterations, which depends on the convergence criteria.

3.1.4. Variable Neighborhood Search (VNS) and Greedy-Initialized Variable Neighborhood Search

Variable Neighborhood Search (VNS) [28] extends the NS framework to a sequence of systematically varying neighborhood structures, which helps the search avoid premature convergence to local optima and explore more of the solution space. The algorithm begins by generating an initial solution set S, typically via GI or other methods. Multiple neighborhoods N₁, N₂, …, N_k are defined, each representing increasingly complex move operators, such as single-swap, double-swap, and demand-weighted perturbations. In each outer iteration, the current solution S undergoes a random perturbation within the current neighborhood N_i, referred to as “shaking”, to diversify the search. This is followed by a local search step using NS to refine the perturbed solution. If a superior solution is not achieved, the algorithm transitions to the next, larger neighborhood, continuing this process sequentially through all k neighborhoods.

The analysis of VNS complexity is structured into three distinct sections: employing k distinct neighborhood structures; carrying out a local search in each neighborhood, each having complexity O(p²N); running the primary VNS process for L outer iterations, during which the full suite of k neighborhoods is explored. The total overall complexity of greedy-initialized VNS is

O(M \cdot N \cdot p)\ \text{(GI)} + O(L \cdot k \cdot p^{2} \cdot N)\ \text{(VNS)}

where L denotes the number of VNS main loop cycles (typically much smaller than M); the parameter k serves as a hyperparameter representing the number of neighborhood types utilized in the model.
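The shake-then-refine loop can be illustrated with the following Python sketch. It is a simplified reading of the scheme rather than the authors' code: the local search is a plain best-improvement swap NS that recomputes each neighbor's cost from scratch, the i-th neighborhood simply replaces i randomly chosen facilities, and the names `transport_cost`, `local_search`, `shake`, `vns`, `k_max`, and `outer_loops` are all hypothetical. Only the demand-weighted transport cost is evaluated, since the number of facilities stays fixed at p.

```python
import random

def transport_cost(D, h, S):
    """Demand-weighted distance from each customer to its nearest facility in S."""
    return sum(h[j] * min(D[i][j] for i in S) for j in range(len(h)))

def local_search(D, h, S):
    """Best-improvement swap NS: repeat until no single swap lowers the cost."""
    S, cost = list(S), transport_cost(D, h, S)
    while True:
        best_S, best_cost = None, cost
        for idx in range(len(S)):
            for k in range(len(D)):
                if k in S:
                    continue
                cand = S[:idx] + [k] + S[idx + 1:]
                c = transport_cost(D, h, cand)
                if c < best_cost:
                    best_S, best_cost = cand, c
        if best_S is None:
            return S, cost
        S, cost = best_S, best_cost

def shake(S, n, strength, rng):
    """Replace `strength` randomly chosen facilities with random closed nodes."""
    closed = [k for k in range(n) if k not in S]
    m = min(strength, len(S), len(closed))
    keep = rng.sample(S, len(S) - m)
    return keep + rng.sample(closed, m)

def vns(D, h, S0, k_max=3, outer_loops=5, rng=None):
    """Basic VNS: shake in neighborhood k, refine with NS, widen k on failure."""
    rng = rng or random.Random()
    best, best_cost = local_search(D, h, S0)
    for _ in range(outer_loops):
        k = 1
        while k <= k_max:
            cand, c = local_search(D, h, shake(best, len(D), k, rng))
            if c < best_cost:
                best, best_cost, k = cand, c, 1  # improvement: restart at N_1
            else:
                k += 1  # no improvement: move to the next, larger neighborhood
    return best, best_cost
```

The nested loops mirror the L outer iterations and k neighborhoods that appear in the complexity expression, with one local search per shake.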

3.2. Network- and Demand-Weighted Roulette Wheel Initialization (NDWRWI)

Although the greedy-initialized NS and VNS algorithms are popular because of their conceptual simplicity and their solution quality on small and mid-sized problems, they have serious drawbacks in computational complexity and speed, especially as the network grows. The RWI algorithm, in turn, is limited to continuous-space location problems and cannot account for demand data or network constraints. To tackle these drawbacks, we develop the Network- and Demand-Weighted Roulette Wheel Initialization (NDWRWI), which combines demand knowledge and network structure with a probabilistic selection scheme to produce a better initial facility distribution. We expect NS/VNS algorithms to have stronger exploration ability and better overall performance on the large-scale N-UFLP when the traditional initialization is replaced by NDWRWI. We introduce the following notation:

t—an iteration index for facility selection, t = 1, 2, …, p, where p ≤ N.
S_t = {s₁, s₂, …, s_t}—a set of currently selected facility locations by the end of iteration t.
s_t—a newly selected facility at the t-th iteration, where s_t ∈ K\S_t₋₁.

Phase 1 (initial facility selection)

The first facility s₁ is selected uniformly at random from the N candidate nodes in K to avoid bias toward densely populated regions. This ensures spatial dispersion in the initial configuration:

s_{1} \sim U(K), \quad S_{1} = \{s_{1}\}

(30)

Phase 2 (iterative roulette wheel selection)

For each subsequent iteration (t = 2, …, p), one additional facility s_t is selected from the remaining candidates K\S_t₋₁, as follows:

(a): Minimum distance calculation:

For each candidate node k ∈ K\S_t₋₁, calculate the shortest-path distance to the nearest already selected facility:

d_{k}^{\min} = \min_{i \in S_{t - 1}} D (k, i), \forall k \in K \setminus S_{t - 1}

(31)

(b): Weight assignment:

Assign weight ω_k to node k proportional to its unsatisfied demand h_k and its minimum distance from existing facilities d_k^min:

ω_{k} = d_{k}^{\min} \cdot h_{k}, \forall k \in K \setminus S_{t - 1}

(32)

which improves over the original non-demand-driven initialization method. In comparison, the original roulette wheel initialization used ω_k = δ_k^min, where the selection of nodes is determined solely by their Euclidean distances. The demand-weighted ω_k is designed to balance two objectives:

Demand satisfaction: prioritize nodes with higher demand (h_k) to maximize service coverage.
Spatial dispersion: favor nodes further from existing facilities (d_k^min) to avoid clustering while considering network topology.

Combining these terms multiplicatively ensures that both conditions must be satisfied for a node to receive a high weight. First, a high-demand node close to existing facilities (large h_k and small d_k^min) is less preferred than one with a similar demand but a greater distance. Second, a distant node with a low demand (large d_k^min and small h_k) is also deprioritized. Candidates with both a higher demand and a greater distance from current facilities receive larger weights, encouraging coverage expansion and demand satisfaction.

(c): Probability normalization:

Normalize the weights to obtain selection probabilities:

P_{k} = \frac{ω_{k}}{\sum_{k' \in K \setminus S_{t - 1}} ω_{k'}}, \forall k \in K \setminus S_{t - 1}

(33)

Any node already selected (j ∈ S_t₋₁) is excluded (P_j = 0). This normalization converts the absolute weights (ω_k) into selection probabilities (P_k) to achieve the following objectives:

Ensure valid probability distribution (∑P_k = 1);
Proportionally allocate selection likelihood based on relative weights.

This aligns with RWS principles in stochastic optimization, where higher-weighted candidates have proportionally higher chances of selection.

(d): Roulette wheel selection

Generate a random number θ ~ U(0, 1) and select the next facility s_t such that:

s_{t} = \arg \min_{k} \left\{ k : \sum_{k'' = 1}^{k} P_{k''} \geq θ \right\}

(34)

Phase 3 (termination criteria)

Iterative sampling stops when t = p, that is, when the required number of facilities has been reached. Otherwise, set t ← t + 1 and return to Phase 2. The pseudocode for the three phases is given in Algorithm 1.

Algorithm 1: Network- and Demand-Weighted Roulette Wheel Initialization

Input:
K ← {1, 2, …, N}            // Set of candidate facility nodes
h_k for all k ∈ K           // Demand at node k
D(k, i) for all k, i ∈ K    // Shortest-path distance between any pair of nodes
p                           // Number of facilities to select
Output:
S_p                         // Selected facility set of cardinality p

Phase 1: Randomly select the first facility
1. Select s₁ ∈ K uniformly at random:
     S₁ ← {s₁}

Phase 2: Iterative weighted roulette wheel selection
2. For t = 2 to p, perform the following steps:
     2.1 Set candidates ← K\S_t₋₁
     2.2 For each k ∈ candidates,
          compute the minimum distance to the currently selected facilities:
               d_min,k ← min_{i ∈ S_t₋₁} D(k, i)
          compute the demand-weighted distance:
               ω_k ← d_min,k × h_k
     2.3 Normalize weights to get probabilities:
          for all k ∈ candidates,
               P_k ← ω_k/(∑_{k′ ∈ candidates} ω_k′)
     2.4 Exclude already selected nodes (i.e., P_j = 0 for j ∈ S_t₋₁)
     2.5 Use roulette wheel selection to choose s_t:
          - Generate a random number θ ~ U(0, 1);
          - Set s_t to the first candidate k (in index order) whose cumulative probability satisfies
               ∑_{k″ ≤ k} P_k″ ≥ θ, where k″ indexes over candidates
     2.6 Update the selected facility set:
          S_t ← S_t₋₁ ∪ {s_t}

Phase 3: Termination
3. After t = p iterations, return the final facility set S_p.
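For readers who prefer executable code, Algorithm 1 can be transcribed into Python roughly as follows. This is a sketch under stated assumptions rather than the MATLAB implementation used in this study: `ndwrwi`, `D`, `h`, and `rng` are hypothetical names, nodes are indexed from 0, and two details that Algorithm 1 leaves open are filled in by assumption, namely that candidates with zero weight are skipped on the wheel and that the selection falls back to a uniform draw if every remaining weight is zero.

```python
import random

def ndwrwi(D, h, p, rng=None):
    """Network- and demand-weighted roulette wheel initialization (sketch).

    D[k][i]: shortest-path distance between nodes k and i; h[k]: demand at k.
    Returns a list of p distinct facility indices (requires p <= len(D)).
    """
    rng = rng or random.Random()
    n = len(D)

    # Phase 1: the first facility is drawn uniformly at random
    s1 = rng.randrange(n)
    selected = [s1]
    d_min = [D[k][s1] for k in range(n)]  # distance to nearest selected facility

    # Phase 2: iterative weighted roulette wheel selection
    for t in range(2, p + 1):
        taken = set(selected)
        # Step 2.2: weight = minimum distance x demand, for unselected nodes only
        pool = [(k, d_min[k] * h[k]) for k in range(n) if k not in taken]
        pool = [(k, w) for k, w in pool if w > 0]  # assumption: drop zero weights
        if pool:
            total = sum(w for _, w in pool)
            theta = rng.random()          # Step 2.5: theta ~ U(0, 1)
            cumulative = 0.0
            s_t = pool[-1][0]             # guard against floating-point round-off
            for k, w in pool:
                cumulative += w / total   # Step 2.3: normalized probability P_k
                if cumulative >= theta:
                    s_t = k
                    break
        else:
            # Assumption: all remaining weights are zero, so pick uniformly
            s_t = rng.choice([k for k in range(n) if k not in taken])
        selected.append(s_t)              # Step 2.6
        d_min = [min(d_min[k], D[k][s_t]) for k in range(n)]

    # Phase 3: p facilities have been selected
    return selected
```

Passing a seeded generator, for example `ndwrwi(D, h, 3, random.Random(42))`, makes repeated runs reproducible. Updating `d_min` incrementally after each selection is a design choice of this sketch; recomputing the minimum over all selected facilities in every iteration, as written in step 2.2, gives the same weights.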

3.3. Mechanism Comparison and Complexity Derivation

We theoretically analyze the practical competitiveness of the proposed NDWRWI-initialized NS and VNS frameworks for the N-UFLP, especially in comparison with the standard greedy-initialized methods. At the problem level, one important characteristic of NDWRWI is that it builds the initial layout from demand information and network topology through probabilistic selection. Mathematically, the selection probability for a candidate facility node k in NDWRWI is given by:

P_{k} = \frac{d_{k}^{\min} \cdot h_{k}}{\sum_{k' \in K \setminus S_{t - 1}} d_{k'}^{\min} \cdot h_{k'}}

(35)

In contrast, the GI selects facilities according to the addition of immediate cost reductions only as described by the following:

k^{*} = \arg \max_{k \in K \setminus S_{t - 1}} [Z (S_{t - 1}) - Z (S_{t - 1} \cup \{k\})]

(36)

While the GI explicitly maximizes the immediate cost reduction at each step, it ignores the overall spatial-demand relations, leading to locally optimal but globally suboptimal configurations. Similarly, traditional RWI, which selects nodes based solely on spatial distances without considering node-specific demand, uses probabilities defined as:

P_{k} = \frac{δ_{k}^{\min}}{\sum_{k' \in K \setminus S_{t - 1}} δ_{k'}^{\min}}

(37)

Because RWI considers only spatial distances and disregards demand, it may allocate facilities to low-demand areas. RI assigns uniform probabilities, ignoring both spatial and demand factors, leading to inefficient configurations.

Then, we derive the complexity of NDWRWI. In this method, p facilities are selected iteratively. In the t-th iteration, the following steps are performed:

For each remaining candidate node (k ∈ K\S_t₋₁), the minimum distance to current set S_t₋₁ is computed (t − 1 comparisons per candidate).
Demand-weighted scores and normalization are performed, with a total cost of O(|K\S_t₋₁|).
The computational cost per iteration t is O((M − t + 1)·(t − 1)), where M = |K| is the total number of candidate nodes.

Aggregating over p iterations gives:

\sum_{t = 2}^{p} O ((M - t + 1) (t - 1)) = O (M p^{2})

(38)
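The bound in Equation (38) follows from a short calculation. Substituting u = t − 1 (a change of index introduced here only for convenience) and using the standard closed forms for sums of integers and squares gives:

```latex
\sum_{t=2}^{p} (M - t + 1)(t - 1)
  = \sum_{u=1}^{p-1} (M - u)\,u
  = M \cdot \frac{p(p-1)}{2} - \frac{(p-1)\,p\,(2p-1)}{6}
  \le \frac{M p^{2}}{2}
  = O(M p^{2})
```

The subtracted term is non-negative, so dropping it yields the stated upper bound.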

After initialization, the subsequent NS or VNS search processes mirror those standard frameworks:

NDWRWI-initialized NS: O(Mp²) + O(Tp²N);
NDWRWI-initialized VNS: O(Mp²) + O(Lkp²N)

where N is the number of customer nodes, T is the number of neighborhood search iterations, L is the number of VNS outer loops, and k is the number of neighborhood structures; L and k are typically small constants.

From the methodological perspective, NDWRWI reduces computational complexity while addressing the complexity explosion bottleneck of greedy-initialized VNS. More precisely, the computational complexity for GI is O(M·N·p), and that for greedy-initialized VNS is O(M·N·p) + O(L·k·p²·N), which reflects repeated evaluations of total costs across all candidate nodes and customer assignments for every facility addition. The critical advantage of NDWRWI-initialized NS and VNS over greedy-initialized VNS lies in the comparison of initialization complexities, specifically O(Mp²) versus O(M·N·p). The network size is defined in terms of M, the total number of nodes. In applications, M is typically large, so the problem can become computationally intractable. Among these nodes, a subset of N nodes is treated as customer nodes, that is, the sites where demand originates or service is needed. Typically, N ≤ M, yet both parameters can be large, indicating the scale of the problem. The parameter p is the number of facilities to be sited, which is substantially smaller than N (N ≫ p). Therefore, the complexity associated with GI, proportional to the larger parameter N, becomes computationally prohibitive as the problem scales. Conversely, NDWRWI’s complexity, governed primarily by p², grows far more slowly, making it substantially more computationally feasible for larger instances.

The complexity of the VNS phase is uniformly given by O(Lkp²N). The overall complexity is the sum of the initialization and VNS phases. The complexity of the VNS phase is mainly determined by four factors: the outer iteration count L, typically a small constant; the number of neighborhoods k, also generally a small constant; the number of facilities p; and the number of customer nodes N, with the latter two being the primary variables in large-scale problems. Consequently, the overall complexity formula can be simplified as O(Lkp²N) ≈ O(p²N), reflecting that the complexity is mainly determined by the number of facilities and customer nodes.

If one considers the overall computational effort of optimization methods as a measure of the initialization effect, the difference between greedy-initialized VNS and NDWRWI-initialized VNS is evident. For the greedy-initialized VNS, the total complexity can be represented as O(MNp) + O(Lkp²N) ≈ O(MNp + p²N). In general, when the number of nodes M is of the same order as the number of customers (M ≈ N) and p is much smaller than N (p ≪ N), the computational complexity of the initialization phase is approximately O(N²p), and the total complexity becomes

O (N^{2} p + p^{2} N) \approx O (N^{2} p)

(39)

which implies that, for greedy-initialized VNS, the initialization phase may dominate the overall computational complexity, elevating it to the higher-order term O(N²p). In contrast, the NDWRWI-initialized VNS method, represented by O(Mp²) + O(Lkp²N) ≈ O(Mp² + p²N), presents a different pattern. Under the same assumption (M ≈ N), the initialization complexity reduces to O(Np²). Therefore, the total complexity can be approximated as

O (N p^{2} + p^{2} N) \approx O (p^{2} N)

(40)

Thus, the computational complexity of the initialization step of NDWRWI-initialized VNS is approximately the same as or lower than that of the VNS stage, so the NDWRWI initialization adds little to the global complexity and does not impose an additional computational burden. The theoretical analysis indicates that GI can become the computational bottleneck, especially in large-size network scenarios. The contribution of NDWRWI to the total computational complexity, in contrast, is minimal, and the total is governed primarily by the VNS stage. This provides NDWRWI with significant scalability and efficiency benefits.

3.4. Benchmark Methods

To benchmark the performance of the proposed NS/VNS framework, four state-of-the-art methods were implemented, configured with the settings reported in their original literature for the sake of reproducibility. The following methods were used for testing:

(i) Mixed-Integer Programming (MIP): an exact solution method that employs the Branch-and-Price algorithm [10] to solve UFLP instances. In most cases, the method can achieve optimality for small-scale instances and is widely used as a state-of-the-art improvement method. However, its exponential time complexity limits scalability, particularly for networks with N > 500. Key parameters include a relative optimality gap tolerance of MIPGap = 1 × 10⁻⁴ and a runtime limit of 3600 s.
(ii) Greedy heuristic: a heuristic that iteratively selects facilities to minimize the total cost objective [19,20]. The method employs a constructive (add) heuristic, sequentially adding facilities that maximize demand-weighted distance reduction. While computationally efficient, the method’s myopic selection process may overlook globally optimal solutions.
(iii) Genetic algorithm (GA): a metaheuristic that combines evolutionary operators such as crossover and mutation to explore the solution space [22,24]. The implementation uses tournament selection and path-based crossover, with parameters including a population size of 50, a crossover rate of 0.8, and a mutation rate of 0.1. While effective for global exploration, the method is sensitive to parameter tuning.
(iv) Lagrangian relaxation: a dual decomposition approach that relaxes customer assignment constraints and employs subgradient optimization to solve the relaxed problem [13,15]. The implementation iteratively updates Lagrangian multipliers to balance primal and dual bounds, achieving tight bounds for large-scale problems. However, recovering feasible solutions from the relaxed problem introduces overhead.

In addition to the above models, four other approaches share common improvement-stage algorithms (NS and VNS) but differ in their initial solution construction, which uses either RI or GI. These can be categorized into RI-based methods (NS1964 and VNS1997) and GI-based methods (NSGreedy and VNSGreedy).

4. Results

4.1. Setting up the Case Study

To evaluate the performance and applicability of the proposed N-UFLP model and algorithms, we designed experiments centered on large-scale, realistic network data. Classical UFLP studies often rely on discrete benchmarks, such as the Beasley OR-Library [43] synthetic instances. These instances lack explicit network structure and spatial connectivity. In contrast, our research uses datasets that explicitly encode network topology, link attributes, and origin-destination demand flows.

We use the open-access “Transportation Networks for Research” repository [46], curated by Ben Stabler and the TRB Network Modeling Committee, as our primary data source. This widely recognized benchmark dataset [55,56,57] features a diverse collection of real-world and synthetic transportation networks, spanning from small educational examples to large, complex metropolitan regions. Each network provides node coordinates and detailed link attributes, facilitating both algorithmic generality and benchmarking of network-dependent facility location methods. Representative instances include the Sioux Falls network and various subregions of Berlin, Anaheim, and Gold Coast, with node counts from 24 to 4807 and demand matrices featuring up to 1064 unique OD pairs. In total, seven instances are selected in this study (see Table 1).

This study presents an empirical evaluation of our proposed NS2025 and VNS2025 algorithms, both initialized with NDWRWI, against eight established improvement methods used as benchmarks (Table A1). While the rationale for multi-method and multi-case controls has been previously discussed, the multiple-run group involves executing each algorithm four times per instance. This repeated execution allows for rigorous evaluation of algorithm performance, enabling the identification of result variability under identical conditions and minimizing the influence of stochastic or probabilistic elements inherent in certain methods.

Seven primary output metrics are defined to comprehensively evaluate algorithmic performance, focusing on solution quality, computational efficiency, reliability, and clustering effectiveness. The optimal number of open facilities is a scalar value denoting the number of facilities selected to meet demand requirements under the defined constraints. The optimal total cost metric combines facility setup costs (set uniformly at 600,000 per facility) and transportation costs (at a rate of 10), resulting in a scalar total derived from optimized customer-to-facility assignments. To further facilitate comparative analysis across different methods within the same scenario, the LB GAP (%) metric is introduced. Taking the total cost metric in a given scenario as an example, let the 10 methods be indexed by the set Q = {1, …, 10}. Let TC_q denote the total cost obtained by method q (q ∈ Q) and TC_max = max_{q ∈ Q} TC_q represent the maximum total cost among all methods. Then, the Total Cost LB GAP (%) for method q is defined as GAP_q = (TC_max − TC_q)/TC_max × 100%, equivalently expressed as GAP_q = (1 − TC_q/TC_max) × 100%. The computation time for the best solution quantifies the average computational effort, measured in seconds, to achieve the optimal configuration. Stability and result dispersion are further evaluated by the cost interquartile range, calculated as the interquartile range (Q3–Q1) of total costs across multiple simulation runs. Last but not least, the Silhouette indicator [58] is incorporated to evaluate clustering quality in customer-to-facility assignments. This metric quantifies the compactness within each facility cluster (intra-cluster distance) and the separation between facility clusters (inter-cluster distance). Ranging from −1 to 1, a Silhouette value close to 1 indicates well-defined, spatially coherent clusters, demonstrating that customers assigned to each facility are closely grouped, with clear boundaries distinguishing different service areas. These metrics underscore the spatial rationality and clustering performance of the solutions beyond cost minimization.
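The LB GAP definition reduces to a few lines of code. The Python sketch below is only an illustration of the formula GAP_q = (TC_max − TC_q)/TC_max × 100%; the function name `lb_gap` and the dictionary layout (method label mapped to total cost) are assumptions made for this example.

```python
def lb_gap(total_costs):
    """Total Cost LB GAP (%) per method, relative to the worst (maximum) cost.

    total_costs maps a method label to its total cost in one scenario.
    """
    tc_max = max(total_costs.values())
    return {q: (tc_max - tc) / tc_max * 100.0 for q, tc in total_costs.items()}
```

Under this definition, the method with the highest total cost receives a gap of 0%, and larger gaps indicate lower (better) costs.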

In addition, because the case instances differ considerably in size and characteristics, comparing methods across them requires a common scale. To address this, a standardized statistical measure proposed by Balk et al. [59] that quantifies each method’s relative performance across all cases can be used, enabling a fair and meaningful cross-case performance comparison. The cross-case relative performance (CRP) metric is defined through the following steps. First, normalization within each case is performed by computing the performance ratio: for each case c (c = 1,…,6), the best observed Optimal Total Cost TC_c is identified (minimum across all methods). The performance ratio for method m on case c is thus defined as R_(m,c) = TC_(m,c)/TC_c, ensuring that R_(m,c) ≥ 1 and assigning a value of 1 to the best-performing method. Second, to aggregate these normalized ratios across all cases while mitigating the impact of outliers, a geometric mean is utilized. Consequently, the cross-case relative performance CRP_m for method m is computed as the geometric mean:

\mathrm{CRP}_{m} = \left(\prod_{c = 1}^{C} R_{m, c}\right)^{1 / C}

(41)

where smaller values of CRP_m indicate superior performance, with an ideal lower bound of 1. In addition to the primary cross-case geometric mean indicator, average ranking (Rank_m) is provided for further insights [60].
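The two-step CRP computation (per-case normalization followed by a geometric mean) can be sketched as follows. The function name and the nested-dictionary input are hypothetical, and the geometric mean is evaluated through logarithms, which is numerically equivalent to Equation (41) for positive ratios.

```python
import math

def cross_case_relative_performance(costs):
    """CRP per method, where costs[m][c] is the total cost of method m on case c."""
    methods = list(costs)
    n_cases = len(costs[methods[0]])
    # Step 1: best (minimum) total cost observed in each case
    best = [min(costs[m][c] for m in methods) for c in range(n_cases)]
    crp = {}
    for m in methods:
        ratios = [costs[m][c] / best[c] for c in range(n_cases)]  # R_(m,c) >= 1
        # Step 2: geometric mean of the ratios across all cases
        crp[m] = math.exp(sum(math.log(r) for r in ratios) / n_cases)
    return crp
```

A method that is best in every case obtains a CRP of exactly 1, which is the ideal lower bound mentioned above.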

Finally, the proposed method is implemented in MATLAB R2023a on a computer with an i7 processor @ 2.60 GHz, 8 GB RAM, and a Windows 10 64-bit operating system. Parallel computing with six processors is used in solving the problems.

4.2. Small-Scale Illustrative Scenario

This section includes step-by-step demonstrations of algorithmic processes using a small-scale illustrative scenario. The 6-point tree, mentioned in Section 2.1 (Figure 1), has been widely used as a benchmark problem for testing multiple methods (e.g., Daskin and Maass [45]). Using the shortest-path algorithm, the distance matrix D can be calculated. For p = 3, the task is to locate three facilities to minimize the total weighted transportation cost, considering network constraints. The global optimal solution for this case is shown in Figure 2b. NS1964 (RI-based method) produced {A, C, E} as the initial facility configuration. As evident in Figure 2d, this choice was misaligned with the spatial distribution of high-demand nodes such as D and E, leading to inefficient customer assignments and higher total transportation costs. Although iterative improvements allowed NS1964 to refine facilities to a final configuration of {A, D, E}, this was still suboptimal due to the inefficient placement of node A, a low-demand peripheral site. The results of the NS method with different initialization methods are shown in Table 2. NS1964 achieved a total cost of 2375, which was notably higher than other methods. The failure to balance the considerations of demand and spatial coverage in the initial phase led to a constrained capability for subsequent improvement.

In contrast, the NDWRWI algorithm incorporates both network topology and demand distribution to iteratively select optimal facility nodes, ensuring efficient service coverage. (i) First iteration: The process begins with the random selection of node D as the initial facility. From then on, the algorithm iterates to select the next facility by evaluating all remaining candidate nodes based on their proximity to already selected facilities and their associated demands. In the first iteration, with node D as the sole facility, the selection metric for each candidate k is determined as the product of the shortest-path distance d_k^min between k and D and the demand h_k at candidate k. For instance, node E demonstrates a high demand of 250 and a distance of 10 to node D, yielding a weight significantly higher than other nodes. Normalizing these weights into probabilities and employing a random sampling mechanism, node E emerges as the second facility, as its demand and distance strongly align with the algorithm’s prioritization criteria. The result highlights the effectiveness of NDWRWI in addressing high-demand nodes that are strategically located relative to the initially selected facilities. (ii) Second iteration: In the second iteration, the set of selected facilities expands to D and E, and the same evaluation process is repeated for the remaining unselected nodes. Each node’s weight is recalculated with its shortest-path distance to the closer of the two facilities, D or E, ensuring that the already-established spatial coverage is optimally extended. Node B exhibits a balanced combination of moderate demand (150) and proximity to node D, resulting in a high selection probability. Despite the presence of other candidates like F, their weights are marginally lower due to either suboptimal locations or smaller demand magnitudes. The probabilistic mechanism ultimately selects node B as the final facility, completing the facility configuration of {D, E, B}. 
(iii) Improvement: A further iterative refinement yields negligible adjustments since the initial configuration is already optimal, as shown in Figure 2e. The final total cost achieved was 2275, the lowest among all methods. This outcome underscores the superiority of NDWRWI in guiding the NS search process to converge efficiently on the global optimum within a comparatively shorter time.

4.3. Warehouse Location Analysis

This section presents the quantitative application of the N-UFLP to the classical Warehouse Location Problem (WLP) and shows how the N-UFLP can be solved by first constructing a solution with NDWRWI and then improving it through VNS. The WLP, as a version of the UFLP, is a fundamental decision-making problem in supply chain logistics that balances two goals: minimizing total operational costs while satisfying customer demands from distributed facilities. At the core of this problem are two primary sources of cost: the fixed costs of setting up facilities (including capital investment in facilities, wages, and ongoing operational costs of the facilities or warehouses) and the variable costs of transportation (for routing inventory from warehouses to demand nodes or customers in the network). With the Sioux Falls network, a classical 24-node, 76-edge transportation network, as the testbed, the WLP approximates the size and complexity of many urban distribution scenarios. Every node is simultaneously considered as both a possible warehouse location and a customer location. Formulating the WLP as an N-UFLP in this context makes the model more analytically manageable while retaining challenging features such as co-located service, self-fulfillment, and network-wide accessibility. Using the Floyd–Warshall algorithm to compute shortest-path distances over the graph ensures that transportation costs reflect realistic travel requirements (including the effects of the network topology, potential detours, and localized bottlenecks) rather than direct or Euclidean measurements. Furthermore, by integrating demand-weighted distances, each transported unit’s cost is scaled by both the required tonnage and the computed route length (W_ij = h_j·d_ij). The objective function consequently becomes the minimization of the sum of facility fixed costs and demand-weighted transportation costs, each parameterized to reflect realistic pricing, thus providing actionable insights for logistics planners. Comprehensive yet practical constraints ensure that every node’s demand is fulfilled, warehouses are only activated where necessary, and the network remains connected and viable, even in the presence of isolated nodes that are incorporated through strategic virtual links if needed.
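The distance computation and the objective can be made concrete with a short Python sketch. It assumes a directed edge list of `(u, v, length)` triples with 0-based node indices; the names `floyd_warshall`, `total_cost`, `fixed_cost`, and `rate` are hypothetical, and the code is meant as an illustration of the formulation rather than the implementation used for the experiments.

```python
INF = float("inf")

def floyd_warshall(n, edges):
    """All-pairs shortest-path distances for a directed graph with n nodes."""
    D = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        D[u][v] = min(D[u][v], w)  # keep the shortest of any parallel links
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D

def total_cost(D, h, S, fixed_cost, rate):
    """Fixed costs of the open warehouses plus demand-weighted transport cost.

    Each customer j is served by its nearest open warehouse, contributing
    rate * h[j] * d_ij, i.e. the weight W_ij = h_j * d_ij scaled by the rate.
    """
    transport = sum(h[j] * min(D[i][j] for i in S) for j in range(len(h)))
    return fixed_cost * len(S) + rate * transport
```

With such helpers, a candidate warehouse set `S` could be scored as `total_cost(D, h, S, fixed_cost=600000, rate=10)`, using the fixed cost of 600,000 ¥ per facility and the transportation rate of 10 ¥/(ton·kilometer) adopted in this section.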

In conducting the WLP, realistic cost parameters were introduced to reflect practical logistics expenses. Transportation costs were calculated based on a standardized average transportation rate of 10 ¥/(ton·kilometer), representative of a diverse range of transported commodities, from daily essentials to electronics. Facility setup costs were estimated by considering expenses associated with leasing warehouses capable of handling 30,000 tons of goods, with storage rates set at 10 ¥/(ton·day). This yields a two-day total fixed cost baseline of 600,000 ¥ per facility.

Figure 3 summarizes the experimental analysis performed using NDWRWI-based, RI-based, and GI-based approaches, as well as their subsequent VNS-driven improvements. While the previous section focused solely on initialization, this section aims to further refine the process based on that foundation, thereby forming a complete closed-loop solution to the problem (see Figure 3b–d). Initially, the NDWRWI selected warehouse locations at nodes {8, 9, 14, 22} (Figure 3b), demonstrating its capability of rapidly identifying critical network locations and high-demand nodes. Notably, node 22, identified as one of the top three nodes by demand (24,400 units), was directly selected by the NDWRWI algorithm. Node 9, located centrally, further underscores NDWRWI’s effectiveness in integrating both network topology and demand weighting into initial selections. However, node 10, which exhibited the highest trip attraction (45,100 units), was notably absent from this initial selection. Through subsequent optimization in the VNS2025 improvement phase (Figure 3b), the final warehouse locations were significantly refined to {10, 13, 16, 22}. This optimization notably incorporated node 10, thus addressing and rectifying the probabilistic gaps left by the initial NDWRWI construction. The refined facility locations selected through VNS2025 exhibit balanced geographic dispersion, covering all major sectors of the network, including node 13 (southwest), node 22 (southeast), node 16 (northeast), and node 10 (central). Consequently, the final solution ensures comprehensive and efficient coverage across the entire urban area, achieving a total cost of approximately 5,050,045 ¥, demonstrating effective cost optimization while maintaining full network accessibility.

In contrast, the RI approach (Figure 3c) generated an initial selection of warehouse nodes {2, 4, 19, 20}, clearly illustrating its lack of strategic alignment with the demand-weighted distribution. The subsequent traditional VNS1997 improvement process (Figure 3c), while partially mitigating RI’s inherent randomness, still yielded inferior spatial clustering and higher operational costs relative to the proposed NDWRWI-based approach. The GI-based method (Figure 3d), typically recognized for myopic optimization, similarly encountered limitations due to immediate cost-reduction biases and overlooked spatial distribution factors. Its improvement through VNSGreedy (VNSG; Figure 3d) offered marginal spatial rationalization but failed to match the comprehensive coverage and balanced cost efficiency achieved by VNS2025.

4.4. Ultra-Large-Scale Network Analysis

In this section, the algorithmic performance of the various methods is examined on an ultra-large-scale network. The Gold Coast network dataset represents an ultra-large-scale urban road system, widely recognized as a challenging and representative benchmark in contemporary transport network analysis. The network comprises 4807 nodes and 11,140 directed links and encompasses 1068 zones, with a demand matrix specifying 139,253 trips. The scale of this network far exceeds that of the OR-Library [43] commonly used in standard tests (with a maximum of 900 nodes), making it one of the few ultra-large-scale network tests to date. Considering the data scale, this section primarily employs six methods, namely NS1964, NS2025, NSG, VNS1997, VNS2025, and VNSG, for case analysis. Each method is executed four times to ensure the robustness and reliability of the results. Table 3 presents aggregated results for the Gold Coast network, enabling a robust comparison of NDWRWI-based methods (NS2025 and VNS2025) against state-of-the-art GI-based (NSG and VNSG) and RI-based (NS1964 and VNS1997) alternatives.

In terms of optimal total cost, NS1964 achieved the lowest value (9,189,353), followed by NSG (9,206,563) and NS2025 (9,372,502). Although NSG produced a slightly better total cost than NS2025, it required nearly ten times more computation time, highlighting a substantial efficiency gap. VNS1997 consistently resulted in higher costs (best at 10,066,266), underperforming relative to all other approaches. The proposed VNS2025, in contrast, delivered competitive total costs (best at 9,736,593), representing a notable 3.7% improvement over VNS1997 and approaching the performance of VNSG (9,497,958). Regarding computational efficiency, the NS-family methods demonstrated clear advantages: both NS1964 and NS2025 solved the problem within 8–19 s, significantly outperforming NSG, which required over 130 s. In the VNS category, while computation times increased due to neighborhood search complexity, VNS2025 reduced the required time by approximately 27% compared to VNS1997 (2600 s vs. 3562 s). Clustering quality, as measured by the Silhouette index, further distinguishes the NDWRWI-based methods. NS2025 achieved the highest Silhouette score (0.3859), marginally surpassing NS1964 (0.3833) and NSG (0.3752). VNS2025 similarly outperformed its competitors, attaining 0.3776, higher than both VNS1997 (0.3626) and VNSG (0.3197). These findings demonstrate that NDWRWI-based approaches consistently deliver superior spatial clustering. In summary, for ultra-large-scale facility location problems, NDWRWI-based methods offer a compelling balance of clustering quality and computational efficiency. While total costs may be marginally higher than those of GI-based algorithms, the resulting solutions remain highly competitive, particularly when factoring in algorithmic speed and the spatial rationality of facility assignments.
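The Silhouette index used above can be computed directly from a precomputed distance matrix. The following is a minimal sketch, assuming that each customer is labelled by its assigned facility and that dissimilarity is the shortest-path distance between nodes. The function name and the toy four-node data are hypothetical and are not taken from the article's implementation.

```python
from statistics import mean


def silhouette_score(dist, labels):
    """Mean Silhouette coefficient from a precomputed distance matrix.

    dist[i][j] is the dissimilarity between points i and j; labels[i] is the
    cluster (here: the assigned facility) of point i. Points in singleton
    clusters score 0, following the usual convention.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("Silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = mean([dist[i][j] for j in own])
        # b: smallest mean distance to the members of any other cluster.
        b = min(mean([dist[i][j] for j in members])
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Toy example (hypothetical): four nodes on a line at positions 0, 1, 10, 11.
toy_dist = [
    [0, 1, 10, 11],
    [1, 0, 9, 10],
    [10, 9, 0, 1],
    [11, 10, 1, 0],
]
good = silhouette_score(toy_dist, ["f1", "f1", "f2", "f2"])
bad = silhouette_score(toy_dist, ["f1", "f2", "f1", "f2"])
print(good, bad)
```

A compact, well-separated assignment scores close to 1, whereas an assignment that mixes distant customers in the same cluster scores below 0, which is the sense in which higher Silhouette values indicate more coherent customer-facility clusters.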

4.5. Comparative Analysis Across Case Cohorts: Multi-Method and Multi-Run Evaluation

This section presents an evaluation of algorithmic performance through experiments conducted on six large-scale network-based UFLP scenarios. Utilizing a nested-loop experimental framework, we systematically assessed ten distinct solution methods within each scenario, repeating each method four times to ensure statistical robustness. This approach resulted in a 10 (methods) × 6 (cases) performance matrix.

The comparative evaluation across multiple scenarios clearly confirms the methodological advantages of the NDWRWI-based methods over both RI-based and GI-based approaches. Notably, NS2025, while slightly lagging behind NSG in CRP (1.058 vs. 1.028), achieved a 28% reduction of average computation time and secured the best clustering quality, as evidenced by the highest average Silhouette rank (1.7) compared to NSGreedy’s 4.5. NS2025 also demonstrated the highest overall Silhouette score across all six benchmark scenarios, maintaining top rankings over both RI-based (NS1964, avg. rank = 3.3) and GI-based methods. This simultaneous improvement in speed and spatial clustering underscores the NDWRWI method’s capacity to accelerate solution convergence while enhancing the semantic coherence of customer-facility assignments. VNS2025 further validated this trend, outperforming VNSGreedy in four out of six Silhouette evaluations and reducing computation time by approximately 15%, thus demonstrating the robustness of demand-weighted initialization in metaheuristic frameworks. Beyond outperforming GI-based methods, NDWRWI-based approaches consistently proved superior to RI-based and classical VNS baselines. For example, NS2025 surpassed NS1964 in computational speed (rank 1.0 vs. 2.0), with lower CRP (1.058 vs. 1.059), and improved clustering (Silhouette rank: 1.7 vs. 3.3). Similarly, VNS2025 recorded a 3.2% CRP reduction against VNS1997 (1.121 vs. 1.158), while also achieving a substantial advance in Silhouette ranking (2.8 vs. 4.0–4.5 in baselines), indicating stronger spatial organization and reduced result volatility. 
While both NDWRWI-based and RI-based methods are inherently stochastic and exhibit greater inter-run variability than GI-based approaches, NDWRWI shows clear statistical gains in average rankings: for instance, NS2025 (average rank = 7.3) and VNS2025 (8.8) outperform their RI-based analogs, NS1964 (8.2) and VNS1997 (8.8), highlighting more reliable solution quality. These results position NDWRWI as an effective method for solving large-scale facility location problems, offering accelerated computation, competitively low costs, and consistently superior clustering, particularly in complex solution spaces where GI-based heuristics suffer from limited exploration.

Table 4 provides an overview of algorithmic performance, demonstrating the advantages of different solution strategies. The exact MIP method is used as the benchmark, with a CRP value of 1.000. For comparison, greedy, GA, and Lagrangian relaxation approaches were also evaluated. Figure 4 provides a visual summary of the data presented in Table 4. Specifically, Figure 4a illustrates clustering effectiveness via the Silhouette coefficient, computational efficiency through IterTime, and solution quality using the Performance Gap in total cost. Additionally, Figure 4b displays the total cost distribution to demonstrate the variability in total cost outcomes across different runs.

5. Discussion

The empirical study compares the NDWRWI-based methods (NS2025 and VNS2025) with the traditional RI-based (NS1964 and VNS1997) and GI-based (NSG and VNSG) baselines. The comparison shows a robust advantage of the NDWRWI-based methods in clustering quality, solution generation, and computational efficiency. A few critical points deserve further discussion:

Total cost: Results from the ultra-large-scale Gold Coast dataset show that the VNSG algorithm can sometimes achieve a lower total cost than the NDWRWI-based VNS2025. Specifically, VNSG obtained the lowest total cost among the VNS variants (9,497,958, compared with VNS2025's best value of 9,736,593, a gap of approximately 2.5%). This small advantage can be attributed to the myopic nature of greedy initialization, which focuses on short-sighted local cost reduction; the same property may lead to premature convergence with only local exploration. Although VNSG can yield slightly better costs in some situations, its applicability is limited on non-uniform or complicated networks, and it does not dominate on the other important performance metrics. Conversely, NDWRWI leads on those other measures while still providing a competitive total cost.
Clustering quality: NDWRWI-based methods outperform their GI- and RI-based counterparts. In experiments over multiple datasets and multiple runs, the Silhouette measure, which assesses the tightness and separation of customer-facility clusters, favored the NDWRWI-based methods. On the Gold Coast network, NS2025 achieved a Silhouette score of 0.3859, higher than both the GI-based NSG (0.3752) and the RI-based NS1964 (0.3833), showing that NDWRWI can efficiently form balanced clusters aligned with demand distribution and network topology. Although the NDWRWI-based methods lead in the majority of cases, different data distributions can result in different clustering quality. Additional study is therefore needed to determine how non-normal data distributions affect the results.
Constructive solution: The impact of initialization strategies is further corroborated by the small-scale illustrative scenario, where the RI-based NS1964 initially chose suboptimal facilities {A, C, E}, poorly aligned with the high-demand nodes D and E. Although iterative improvements moved the solution towards {A, D, E}, the suboptimal placement at the peripheral low-demand node A constrained the overall effectiveness and left an inflated total cost of 2375 (Table 2). In contrast, NDWRWI iteratively selected facilities {D, E, B} with high demand-distance weights, converging rapidly to the global optimum at a total cost of 2275. This example highlights the crucial role of demand- and topology-aware initialization in guiding neighborhood search algorithms towards high-quality solutions efficiently.
On the computational front: NDWRWI has lower initialization complexity than GI. The asymptotic analysis in Section 3.3 gives an initialization cost of O(Mp²) at best for NDWRWI, whereas greedy techniques incur O(MNp). This efficiency carries over to realistic large-scale situations. For example, in the Gold Coast experiments, NS2025 completed optimization in about 14 to 19 s (Table 3), nearly an order of magnitude faster than NSG (~130 s), with competitive solution quality. Within the VNS class, VNS2025 was about 27% faster (2600 s) than the classical VNS1997 (3562 s), demonstrating the ability of NDWRWI to accelerate convergence while maintaining solution quality. Note that although the NDWRWI-based methods are theoretically less complex than the GI-based methods, extension-based second-step algorithms may take longer to compute.
Solution stability: The trade-offs introduced by NDWRWI's probabilistic sampling must also be acknowledged. While facilitating more extensive exploration, the stochastic nature of the process leads to a certain degree of variation in solution quality, as evidenced by nonzero IQRs in the cost distributions across multiple runs. This contrasts with exact methods such as MIP and with deterministic greedy algorithms, which show zero variability. However, this variability remains moderate and acceptable for practical use, particularly when the gains in clustering quality, solution quality, and computational time are taken into account.
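The run-to-run variability discussed under solution stability can be illustrated with the four NS2025 total costs reported for the Gold Coast network in Table 3. The sketch below is only an illustration: the helper name `iqr` is hypothetical, the quartile convention is the default of Python's `statistics.quantiles` and may differ from the one used for Figure 4b, and the NSG list simply repeats its single reported cost to represent a deterministic method.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range (Q3 - Q1) using the default 'exclusive' method."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


# Total costs of the four NS2025 runs on the Gold Coast network (Table 3).
ns2025_costs = [9_373_934, 9_402_195, 9_403_390, 9_372_502]

# ASSUMPTION: a deterministic method returns the same cost on every run, so
# NSG's reported cost is repeated here purely to show the zero-spread case.
nsg_costs = [9_206_563] * 4

print(f"NS2025 IQR: {iqr(ns2025_costs):,.2f}")
print(f"NSG IQR:    {iqr(nsg_costs):,.2f}")
```

The stochastic method yields a strictly positive spread, while the deterministic one yields zero, which is the contrast drawn in the discussion.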

This also makes the NDWRWI method a viable alternative to constructive methods such as RI or GI, especially when dealing with large-scale networks. Unlike RI—which neglects both demand distribution and spatial characteristics—and GI, which focuses narrowly on immediate cost reductions, NDWRWI effectively integrates demand information and network topology from the outset. By assigning selection probabilities based on demand-weighted shortest-path distances, NDWRWI strategically directs the initial solution toward high-demand and central network locations. This not only reduces total costs by avoiding poorly positioned initial facility selections but also decreases computational complexity. The strong performance observed in clustering quality, evidenced by consistently higher Silhouette values compared to baseline methods, underscores the method’s capacity to generate solutions that are both economically efficient and spatially coherent.
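The idea of drawing initial facilities with probabilities tied to demand and network position can be sketched as follows. This is a minimal illustration under a stated assumption, not the article's NDWRWI procedure: the weight of each candidate is taken here to be its own demand divided by the demand-weighted sum of shortest-path distances to all customers, which is a stand-in for the weighting defined in Section 3. The function name and the toy line network are hypothetical.

```python
import random


def roulette_wheel_init(dist, demand, p, rng=None):
    """Pick p distinct facility nodes by roulette-wheel sampling.

    ASSUMPTION: the weight of candidate k is demand[k] divided by the
    demand-weighted sum of shortest-path distances from k to all customers,
    so that high-demand, centrally located nodes are favoured. This is an
    illustrative stand-in, not the article's exact NDWRWI weight.
    """
    rng = rng or random.Random()
    n = len(demand)
    if not 1 <= p <= n:
        raise ValueError("p must be between 1 and the number of candidates")

    weights = {}
    for k in range(n):
        weighted_dist = sum(demand[j] * dist[k][j] for j in range(n))
        # A small epsilon keeps every weight positive and avoids division by zero.
        weights[k] = (demand[k] + 1e-9) / (weighted_dist + 1e-9)

    selected = []
    for _ in range(p):
        candidates = list(weights)
        chosen = rng.choices(
            candidates, weights=[weights[k] for k in candidates])[0]
        selected.append(chosen)
        del weights[chosen]  # sample without replacement
    return selected


def line_graph_distances(n):
    """Shortest-path distances on a unit-length path graph with n nodes."""
    return [[abs(a - b) for b in range(n)] for a in range(n)]


# Toy example (hypothetical): five nodes on a line, heavy demand in the middle.
toy_dist = line_graph_distances(5)
toy_demand = [1, 1, 10, 1, 1]
print(roulette_wheel_init(toy_dist, toy_demand, 3, random.Random(0)))
```

Because selection is probabilistic rather than greedy, the central high-demand node is chosen most of the time but not always, which preserves some exploration across runs.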

6. Conclusions

This study formulated and solved the N-UFLP, an optimization problem with complex interaction between the network topology, customer demands, and the location of facilities. To solve this problem, we proposed a new constructive method, i.e., NDWRWI, which incorporates both demand intensity and network structure. We also compared our method with RI, GI, and other state-of-the-art methods by both numerical experiments and computational experiments (which covered a broad range of realistic network data).

As mentioned in Section 5, extending the NDWRWI framework to adaptive or hybrid initialization strategies may be promising to pursue a more robust optimization over a variety of datasets and complex network topologies. Further research on the scalability and generalization of NDWRWI-based algorithms in other facility location problems or instances with different distributions may also be promising.

The computational efficiency and fast convergence of NDWRWI are particularly desirable in dynamic siting applications (e.g., just-in-time logistics for disaster relief or emergency rescue), in which warehouse locations have to be adapted quickly to dynamic demand reallocation. In such settings, it would be of particular interest to relocate facilities easily based on real-time demand information, with the combined objectives of flexibility and efficiency. In addition, the demonstrated capability of the proposed approach to handle large-scale datasets makes it suitable for big data-driven designs such as e-commerce warehousing and national logistics networks. Last but not least, the applicability of NDWRWI warrants investigation in more complex traffic scenarios, such as those involving supply-demand imbalances [61]. The most important aspect is to account for road resistance factors and further strengthen the mathematical formulation [62]. In these situations, the scalability of NDWRWI suggests that networks with thousands of nodes and intricate demand distributions can still be tackled efficiently, indicating that it can be a good candidate for problems such as supply chains, urban infrastructure optimization, or smart city logistics.

Author Contributions

Conceptualization, J.L. and K.H.; data curation, J.L.; formal analysis, J.L. and S.Y.; funding acquisition, S.Y. and K.W.; resources, K.W.; investigation, S.J.; methodology, J.L. and K.H.; project administration, K.W.; software, J.L.; supervision, K.W. and S.J.; validation, S.Y. and K.H.; visualization, J.L.; writing—original draft, J.L. and S.J.; writing—review and editing, S.Y. and K.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Initial Scientific Research Fund of Young Teachers in Anhui Jianzhu University (grant numbers: 2020QDZ37 and 2022QDZ25).

Data Availability Statement

The original contributions of this study are included in the article; for further inquiries, please contact the corresponding author.

Acknowledgments

The authors would like to express their sincere gratitude to N’Guerekata for his invaluable detailed revision suggestions in improving the logic and detailed errors of the article.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

Abbrev.	Terminology
pMP	p-median problem
UFLP	Uncapacitated Facility Location Problem
N-UFLP	Network-Based Uncapacitated Facility Location Problem
MIP	Mixed Integer Programming with the Branch-and-Price Algorithm
Greedy	greedy algorithm with a drop strategy
GA	genetic algorithm
Lagrangian	Lagrangian relaxation algorithm
NS1964	Neighborhood Search Algorithm with RI
NS2025	Neighborhood Search with NDWRWI
VNS1997	Variable Neighborhood Search with RI
VNS2025	Variable Neighborhood Search with NDWRWI
NSG/NSGreedy	Neighborhood Search with GI
VNSG/VNSGreedy	Variable Neighborhood Search with GI
NDWRWI	Network- and Demand-Weighted Roulette Wheel Initialization
GI	greedy initialization
RI	random initialization
CRP	cross-case relative performance
Symbol	Definition
Sets
I	Set of all nodes (∣I∣ = M), including facilities and customers.
J	Subset of customer nodes/demand site (∣J∣ = N), also candidate facilities.
S	Final set of selected facilities. S = {s₁, s₂, …, s_p}. p × 1 vector of indices. Output.
Asgmt.	1 × N vector mapping customers to assigned facilities. Output.
K	Set of all candidate facility locations. K = {1, 2, …, N}.
Indices
i, r ∈I	Indices for general nodes (facilities or customers).
j∈J	Index specifically for customer nodes.
t	Iteration index for facility selection, t = 1, 2, …, p, where p ≤ N.
k ∈ K	Index for candidate facility locations; each k ∈ K corresponds to a potential facility.
Parameters
δ_ir	Adjacency-based distance between nodes i and r (direct edge length).
Δ	M × M adjacency distance matrix.
d_ir	Shortest-path distance between nodes i and r.
D	M × M shortest-path distance matrix.
(h_j)_{j∈J}	Demand weight (trip attraction) at customer j (N × 1 vector). Input.
W_ij	Demand-weighted traveled distance from node i to node j.
α	Uniform facility setup cost. Input components.
β	Transportation cost rate per unit-distance-demand. Input.
p	Maximum allowed facilities.
Z(S)	Total cost. Scalar objective value of solution S. Output.
Decision variables
y_i	y_i ∈ {0, 1}: 1 if a facility is opened at node i; 0 otherwise.
x_ij	x_ij ∈ {0, 1}: 1 if customer j is assigned to facility i; 0 otherwise.
Key matrices
NodeNames	M × 1 cell array of unique node identifiers. Input components^#.
NodeCoords	M × 2 matrix of geographic coordinates (longitude and latitude). Input.
DistTable	L × 3 matrix of adjacency distances (origin, destination, and edge length). Input.
DistanceMatrix	M × M matrix of precomputed shortest-path distances. Input.
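
To show how the symbols above fit together, the sketch below evaluates a candidate facility set S. It assumes the standard UFLP objective with a uniform setup cost, Z(S) = α·|S| + β·Σ_j h_j·min_{i∈S} d_ij, and assumes that customers and candidate facilities share one node index; the article's exact formulation is given in its methodology section and may differ. The function name and toy data are hypothetical.

```python
def evaluate_solution(dist, demand, facilities, alpha, beta):
    """Return (total_cost, assignment) for a set of open facilities.

    ASSUMED objective (standard UFLP with a uniform setup cost):
        Z(S) = alpha * |S| + beta * sum_j h_j * min_{i in S} d_ij
    dist[i][j] is the shortest-path distance from node i to customer j and
    demand[j] is the demand weight h_j.
    """
    if not facilities:
        raise ValueError("at least one facility must be open")
    assignment = []
    transport = 0.0
    for j, h_j in enumerate(demand):
        # Each customer is served by its nearest open facility.
        nearest = min(facilities, key=lambda i: dist[i][j])
        assignment.append(nearest)
        transport += h_j * dist[nearest][j]
    return alpha * len(facilities) + beta * transport, assignment


# Toy example (hypothetical): four nodes on a line with unit edge lengths.
toy_dist = [
    [0, 1, 2, 3],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]
toy_demand = [5, 1, 1, 5]
cost, asgmt = evaluate_solution(toy_dist, toy_demand, [0, 3], alpha=10, beta=2)
print(cost, asgmt)
```

Here the two open facilities contribute 2 × 10 in setup cost and the two unserved interior customers contribute 2 × (1 + 1) in transportation cost.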

Appendix A

Table A1. Overview of test methods.

Terminology	Abbrev.	Brief Explanation	Reference
Mixed Integer Programming with the Branch-and-Price Algorithm	MIP	Applies the Branch-and-Price algorithm	[10,11,12]
Greedy algorithm with a drop strategy	Greedy	A myopic algorithm that iteratively selects facility sites to minimize the demand-weighted total distance.	[19,20]
Genetic algorithm	GA	Metaheuristic algorithm involving selection, crossover, and mutation.	[22,24,25]
Lagrangian relaxation algorithm	Lagrangian	It generates feasible solutions and calculates lower bounds iteratively.	[13,14,15]
Neighborhood Search Algorithm with RI	NS1964	It iteratively updates facility locations and assignment relationships to minimize the demand-weighted total distance.	[27]
Neighborhood Search with NDWRWI	NS2025	It incorporates Network- and Demand-Weighted Roulette Wheel Initialization (NDWRWI).
Variable Neighborhood Search with RI	VNS1997	Metaheuristic method with key steps: shaking, local search, and swap evaluation.	[28]
Variable Neighborhood Search with NDWRWI	VNS2025	It incorporates NDWRWI and adjusts the p-range for enhanced performance.
Neighborhood Search with GI	NSG/ NSGreedy	It combines GI and adjusts the Neighborhood Search.	[19,20]
Variable Neighborhood Search with GI	VNSG/ VNSGreedy	It combines greedy initialization with the Variable Neighborhood Search algorithm.	[19,20]
Network- and Demand-Weighted Roulette Wheel Initialization	NDWRWI	A constructive method; NS2025 and VNS2025 belong to the NDWRWI-based methods.
Greedy initialization	GI	A constructive method; NSG and VNSG are both categorized as GI-based methods.	[19,20]
Random initialization	RI	A constructive method; NS1964 and VNS1997 belong to the RI-based methods.	[35]
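
The shaking, local search, and swap evaluation steps listed for the Variable Neighborhood Search in Table A1 can be sketched generically as follows. This is a basic textbook-style scheme for a fixed number of facilities p without setup costs; it is not the article's VNS2025, which additionally uses NDWRWI and adjusts the p-range. All function names, parameters (`k_max`, `max_iter`), and the toy network are hypothetical.

```python
import random


def total_cost(dist, demand, facilities):
    """Demand-weighted distance from each customer to its nearest facility."""
    return sum(h * min(dist[i][j] for i in facilities)
               for j, h in enumerate(demand))


def local_search(dist, demand, facilities):
    """First-improvement swap search: exchange one open and one closed node."""
    current = set(facilities)
    best = total_cost(dist, demand, current)
    improved = True
    while improved:
        improved = False
        for out in sorted(current):
            for cand in range(len(demand)):
                if cand in current:
                    continue
                trial = (current - {out}) | {cand}
                cost = total_cost(dist, demand, trial)
                if cost < best:  # swap evaluation: accept strict improvements
                    current, best, improved = trial, cost, True
                    break
            if improved:
                break
    return current, best


def shake(facilities, n_nodes, k, rng):
    """Randomly replace k open facilities with k closed nodes."""
    current = set(facilities)
    closed = [v for v in range(n_nodes) if v not in current]
    k = min(k, len(current), len(closed))
    removed = rng.sample(sorted(current), k)
    added = rng.sample(closed, k)
    return (current - set(removed)) | set(added)


def vns(dist, demand, initial, k_max=2, max_iter=50, rng=None):
    """Basic VNS: widen the shaking neighbourhood until an improvement is found."""
    rng = rng or random.Random()
    best, best_cost = local_search(dist, demand, initial)
    for _ in range(max_iter):
        k = 1
        while k <= k_max:
            candidate, cost = local_search(
                dist, demand, shake(best, len(demand), k, rng))
            if cost < best_cost:
                best, best_cost, k = candidate, cost, 1
            else:
                k += 1
    return sorted(best), best_cost


def line_graph_distances(n):
    """Shortest-path distances on a unit-length path graph with n nodes."""
    return [[abs(a - b) for b in range(n)] for a in range(n)]


# Toy example (hypothetical): six unit-demand nodes on a line, p = 2.
toy_dist = line_graph_distances(6)
toy_demand = [1] * 6
print(vns(toy_dist, toy_demand, [0, 1], rng=random.Random(1)))
```

On this toy network the swap-only local search can stall at a solution that needs two simultaneous swaps to improve, which is the situation the shaking step is meant to escape. The `initial` argument is where an RI, GI, or NDWRWI construction would be plugged in.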

References

1. Verter, V. Uncapacitated and capacitated facility location problems. In Foundations of Location Analysis; Eiselt, H., Marianov, V., Eds.; Springer: New York, NY, USA, 2011; Volume 155.
2. Daskin, M.S. Network and Discrete Location: Models, Algorithms, and Applications; John Wiley & Sons: Hoboken, NJ, USA, 1995.
3. Saldanha-da-Gama, F.; Wang, S. Discrete facility location problems. In Facility Location Under Uncertainty; Springer: Cham, Switzerland, 2024; Volume 356, pp. 25–45.
4. Weber, A. On the Location of Industries; University of Chicago Press: Chicago, IL, USA, 1909.
5. Hakimi, S.L. Optimum locations of switching centers and the absolute centers and medians of a graph. Oper. Res. 1964, 12, 450–459.
6. Tansel, B.C. A review and annotated bibliography of location problems in the public sector. Eur. J. Oper. Res. 1983, 12, 42–56.
7. Beasley, J.E. Lagrangean heuristics for location problems. Eur. J. Oper. Res. 1993, 65, 383–399.
8. Dantrakul, S.; Likasiri, C.; Pongvuthithum, R. Applied p-median and p-center algorithms for facility location problems. Expert Syst. Appl. 2014, 41, 3596–3604.
9. Kariv, O.; Hakimi, S.L. An algorithmic approach to network location problems. II: The p-medians. SIAM J. Appl. Math. 1979, 37, 539–560.
10. Corberán, Á.; Landete, M.; Peiró, J.; Saldanha-da-Gama, F. Improved polyhedral descriptions and exact procedures for a broad class of uncapacitated p-hub median problems. Transp. Res. Part B Methodol. 2019, 123, 38–63.
11. Arslan, O. The location-or-routing problem. Transp. Res. Part B Methodol. 2021, 147, 1–21.
12. Barbato, M.; Gouveia, L. The Hamiltonian p-median problem: Polyhedral results and branch-and-cut algorithms. Eur. J. Oper. Res. 2024, 316, 473–487.
13. Beltran, C.; Tadonki, C.; Vial, J.P. Solving the p-median problem with a semi-Lagrangian relaxation. Comput. Optim. Appl. 2006, 35, 239–260.
14. Beltran-Royo, C.; Vial, J.-P.; Alonso-Ayuso, A. Semi-Lagrangian relaxation applied to the uncapacitated facility location problem. Comput. Optim. Appl. 2012, 51, 387–409.
15. Nezhad, A.M.; Manzour, H.; Salhi, S. Lagrangian relaxation heuristics for the uncapacitated single-source multi-product facility location problem. Int. J. Prod. Econ. 2013, 145, 713–723.
16. Mokhtar, H.; Krishnamoorthy, M.; Ernst, A.T. The 2-allocation p-hub median problem and a modified Benders decomposition method for solving hub location problems. Comput. Oper. Res. 2019, 104, 375–393.
17. Ghaffarinasab, N.; Çavuş, Ö.; Kara, B.Y. A mean-CVaR approach to the risk-averse single allocation hub location problem with flow-dependent economies of scale. Transp. Res. Part B Methodol. 2023, 167, 32–53.
18. Ghosh, D. Neighborhood search heuristics for the uncapacitated facility location problem. Eur. J. Oper. Res. 2003, 150, 150–162.
19. Kuehn, A.A.; Hamburger, M.J. A heuristic program for locating warehouses. Manag. Sci. 1963, 9, 643–666.
20. Gokalp, O. An iterated greedy algorithm for the obnoxious p-median problem. Eng. Appl. Artif. Intell. 2020, 92, 103674.
21. Atta, S.; Mahapatra, P.R.; Mukhopadhyay, A. Solving uncapacitated facility location problem using heuristic algorithms. Int. J. Nat. Comput. Res. 2019, 8, 18–50.
22. Moreno-Perez, J.A.; Roda Garcia, J.L.; Moreno-Vega, J.M. A parallel genetic algorithm for the discrete p-median problem. Stud. Locat. Anal. 1994, 7, 131–141.
23. Kratica, J.; Tošic, D.; Filipović, V.; Ljubić, I. Solving the simple plant location problem by genetic algorithm. RAIRO Oper. Res. 2001, 35, 127–142.
24. Alp, O.; Erkut, E.; Drezner, Z. An efficient genetic algorithm for the p-median problem. Ann. Oper. Res. 2003, 122, 21–42.
25. Fathali, J. A genetic algorithm for the p-median problem with pos/neg weights. Appl. Math. Comput. 2006, 183, 1071–1083.
26. Mladenović, N.; Brimberg, J.; Hansen, P.; Moreno-Pérez, J.A. The p-median problem: A survey of metaheuristic approaches. Eur. J. Oper. Res. 2007, 179, 927–939.
27. Maranzana, F.E. On the location of supply points to minimize transport costs. J. Oper. Res. Soc. 1964, 15, 261–270.
28. Hansen, P.; Mladenović, N. Variable neighborhood search for the p-median. Locat. Sci. 1997, 5, 207–226.
29. Sutton, D.; Basiri, A.; Li, Z. Exploring a diagnostic test for missingness at random. Mathematics 2025, 13, 1728.
30. Jacobsen, S.K. Heuristics for the capacitated plant location model. Eur. J. Oper. Res. 1983, 12, 253–261.
31. Wang, H.; Zhou, J.; Zhou, L. A lattice Boltzmann method-like algorithm for the maximal covering location problem on the complex network: Application to location of railway emergency-rescue spot. Mathematics 2024, 12, 218.
32. de Armas, J.; Juan, A.A.; Marquès, J.M.; Pedroso, J.P. Solving the deterministic and stochastic uncapacitated facility location problem: From a heuristic to a simheuristic. J. Oper. Res. Soc. 2017, 68, 1161–1176.
33. Morán-Figueroa, G.-H.; Muñoz-Pérez, D.-F.; Rivera-Ibarra, J.-L.; Cobos-Lozada, C.-A. Model for predicting maize crop yield on small farms using clusterwise linear regression and GRASP. Mathematics 2024, 12, 3356.
34. Sánchez-Oro, J.; López-Sánchez, A.D.; Martínez-Gavara, A.; Hernández-Díaz, A.G.; Duarte, A. A hybrid strategic oscillation with path relinking algorithm for the multiobjective k-balanced center location problem. Mathematics 2021, 9, 853.
35. Celebi, M.E.; Kingravi, H.A.; Vela, P.A. A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst. Appl. 2013, 40, 200–210.
36. Bucci, M.J.; Kay, M.G.; Warsing, D.P. A comparison of meta-heuristics for large scale facility location problems with economies of scale. In Proceedings of the Industrial Engineering Research Conference (IERC), Nashville, TN, USA, 19–23 May 2007.
37. Herrán, A.; Colmenar, J.M.; Duarte, A. A variable neighborhood search approach for the Hamiltonian p-median problem. Appl. Soft Comput. 2019, 80, 603–616.
38. Croci, D.; Jabali, O.; Malucelli, F. The balanced p-median problem with unitary demand. Comput. Oper. Res. 2023, 155, 106242.
39. Chagas, G.O.; Lorena, L.A.N.; dos Santos, R.D.C.; Renaud, J.; Coelho, L.C. A parallel variable neighborhood search for α-neighbor facility location problems. Comput. Oper. Res. 2024, 165, 106589.
40. Amrani, H.; Martel, A.; Zufferey, N.; Makeeva, P. A Variable Neighborhood Search Heuristic for the Design of Multicommodity Production-Distribution Networks with Alternative Facility Configurations (CIRRELT-2008-35); Interuniversity Research Centre on Enterprise Networks, Logistics and Transportation (CIRRELT): Montreal, QC, Canada, 2008.
41. Hansen, P.; Brimberg, J.; Urošević, D.; Mladenović, N. Solving large p-median clustering problems by primal–dual variable neighborhood search. Data Min. Knowl. Discov. 2009, 19, 351–375.
42. Irawan, C.A.; Salhi, S. Solving large p-median problems by a multistage hybrid approach using demand points aggregation and variable neighbourhood search. J. Glob. Optim. 2015, 63, 537–554.
43. Beasley, J.E. OR-Library: Distributing test problems by electronic mail. J. Oper. Res. Soc. 1990, 41, 1069–1072.
44. Gwalani, H.; Tiwari, C.; Mikler, A.R. Evaluation of heuristics for the p-median problem: Scale and spatial demand distribution. Comput. Environ. Urban Syst. 2021, 88, 101656.
45. Daskin, M.S.; Maass, K.L. The p-median problem. In Location Science; Laporte, G., Nickel, S., Saldanha da Gama, F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 21–45.
46. Transportation Networks for Research Core Team [TRB Network Modeling Committee]. Transportation Networks for Research. Available online: https://github.com/bstabler/TransportationNetworks (accessed on 1 January 2025).
47. Krarup, J.; Pruzan, P.M. The simple plant location problem: Survey and synthesis. Eur. J. Oper. Res. 1983, 12, 36–81.
48. Khumawala, B.M. An efficient branch and bound algorithm for the warehouse location problem. Manag. Sci. 1972, 18, B-718–B-731.
49. Gendron, B.; Semet, F. Formulations and relaxations for two-level uncapacitated facility location problems. INFORMS J. Comput. 2009, 21, 490–506.
50. Adeleke, O.J.; Oladele, D.O. Facility location problems: Models, techniques, and applications in waste management. Recycling 2020, 5, 10.
51. Tsuya, K.; Takaya, M.; Yamamura, A. Application of the firefly algorithm to the uncapacitated facility location problem. J. Intell. Fuzzy Syst. 2017, 32, 3201–3208.
52. Sonuç, E.; Özcan, E. An adaptive parallel evolutionary algorithm for solving the uncapacitated facility location problem. Expert Syst. Appl. 2023, 224, 119956.
53. Cui, T.; Ouyang, Y.; Shen, Z.-J.M. Reliable facility location design under the risk of disruptions. Oper. Res. 2010, 58, 998–1011.
54. Goldman, A.J. Optimal center location in simple networks. Transp. Sci. 1971, 5, 212–221.
55. Bar-Gera, H.; Boyce, D. Origin-based algorithms for combined travel forecasting models. Transp. Res. Part B Methodol. 2003, 37, 405–422.
56. Boyce, D.; Bar-Gera, H. Multiclass combined models for urban travel forecasting. Netw. Spat. Econ. 2004, 4, 115–124.
57. Bar-Gera, H.; Boyce, D. Solving a non-convex combined travel forecasting model by the method of successive averages with constant step sizes. Transp. Res. Part B Methodol. 2006, 40, 351–367.
58. Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Comput. Appl. Math. 1987, 20, 53–65.
59. Balk, B.M.; De Koster, M.B.M.; Kaps, C.; Zofío, J.L. An evaluation of cross-efficiency methods: With an application to warehouse performance. Appl. Math. Comput. 2021, 406, 126261.
60. Dolan, E.D.; Moré, J.J. Benchmarking optimization software with performance profiles. Math. Program. 2002, 91, 201–213.
61. Liu, C.; Wang, Z.; Liu, Z.; Huang, K. Multi-agent reinforcement learning framework for addressing demand-supply imbalance of shared autonomous electric vehicle. Transp. Res. E 2025, 197, 104062.
62. Huang, K.; Zhang, Z.; Wang, X.; Tao, Y.; Liu, Z. Life-cycle carbon emissions of autonomous electric vehicles in varying traffic situations. Transp. Res. D 2025, 146, 104871.

Figure 1. Illustrative diagram of 6-point tree.

Figure 2. Illustrative scenario diagram: (a) 6-point tree; (b) 1st iteration of NDWRWI; (c) 2nd iteration of NDWRWI; (d) RI-based initialization results; (e) NDWRWI-based initialization results.

Figure 3. Warehouse location analysis on the Sioux Falls Network: (a) Sioux Falls Network; (b) VNS2025; (c) VNS1997; (d) VNSG.

Figure 4. Performance of methods across case cohorts: (a) Silhouette coefficient, IterTime, and Performance Gap in total cost; (b) total cost distribution. Note: Red-marked numbers indicate the best-performing method in each case; pink and blue markings denote our proposed methods.

Table 1. Overview of test case instances.

Instances	Zones (Demands)	Nodes	Links
6-points tree	6	6	5
Sioux Falls	24	24	76
Berlin-Friedrichshain	23	224	523
Berlin-Tiergarten	26	361	766
Berlin-Mitte-Center	36	398	871
Anaheim	38	416	914
Berlin-Mitte-Prenzlauerberg-Friedrichshain-Center	98	975	2184
Gold Coast	1068	4807	11,140

Table 2. Facility location results of the 6-point network (p = 3) using NS1964, NS2025, and NSG methods.

	InitTech	Init.SelectedFac	FinalSelectedFac	Assignments	Total Cost
NS1964	RI	{A, C, E}	{A, D, E}	{A, D, D, D, E, D}	2375
NS2025	NDWRWI	{B, D, E}	{B, D, E}	{D, B, D, D, E, D}	2275
NSG	GI	{B, D, E}	{B, D, E}	{D, B, D, D, E, D}	2275

Table 3. Facility location results of the Gold Coast network using GI-based, RI-based, and NDWRWI-based methods.

	p	Selected Facilities Indices	Total Cost	Iteration Time	Optimal Total Cost	Silhouette	Time for the Best Solution (s)
NS1964	6	{1204, 1347, 1556, 1940, 3191, 4603}	9,189,353	7.98	9,189,353	0.3833	7.98
	7	{561, 1141, 1299, 1425, 1647, 2864, 2918}	9,355,370	9.69
	8	{561, 1133, 1299, 1347, 1499, 1630, 1736, 1940}	9,557,402	7.52
	8	{1204, 1425, 1711, 2358, 2833, 2918, 2943, 4803}	9,661,646	7.69
NS2025	5	{2845, 2916, 3872, 3916, 4074}	9,373,934	18.98	9,372,502	0.3859	14.84
	6	{1425, 1879, 2833, 3872, 4074, 4102}	9,402,195	15.12
	6	{1425, 2918, 3872, 4074, 4102, 4550}	9,403,390	13.75
	5	{1347, 1940, 3872, 3916, 4074}	9,372,502	14.84
NSG	7	{561, 1133, 1299, 1556, 1940, 2845, 4602}	9,206,563	129.58~139.32	9,206,563	0.3752	132.39
VNS1997	8	{1499, 112, 817, 2916, 3700, 3930, 4603, 1275}	10,552,437	3628.89	10,066,266	0.3626	3562.11
VNS1997	8	{4007, 3219, 2155, 1567, 3085, 1724, 1969, 4413}	10,066,266	3562.11	10,066,266	0.3626	3562.11
VNS2025	5	{622, 81, 1529, 2155, 1940}	9,736,593	2661.41	9,736,593	0.3776	2666.41
	7	{1940, 202, 684, 430, 3632, 785, 958}	10,257,816	2642.73
	7	{1, 458, 1940, 561, 725, 3231, 975}	10,342,441	2615.61
	4	{4484, 3926, 3695, 2155}	10,113,549	2583.04
VNSG	7	{1647, 3559, 3496, 1299, 561, 2864, 1556}	9,497,958	3285~3457	9,497,958	0.3197	3361.59
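
As a quick arithmetic check on Table 3, the following sketch compares the best NDWRWI-based NS result (NS2025) with the RI-based (NS1964) and GI-based (NSG) baselines on the Gold Coast network. The numbers are copied from the table; the helper name relative_gap and the dictionary layout are illustrative assumptions and not part of the authors' code.

```python
# Best ("optimal") total cost and time for the best solution, copied from Table 3.
best_cost = {"NS1964": 9_189_353, "NS2025": 9_372_502, "NSG": 9_206_563}
best_time_s = {"NS1964": 7.98, "NS2025": 14.84, "NSG": 132.39}


def relative_gap(cost: float, reference: float) -> float:
    """Relative cost difference of `cost` with respect to `reference`."""
    return (cost - reference) / reference


# Cost penalty of NS2025 relative to the best-performing baseline (NS1964).
gap_vs_ri = relative_gap(best_cost["NS2025"], best_cost["NS1964"])

# How many times faster NS2025 reaches its best solution than NSG.
speedup_vs_gi = best_time_s["NSG"] / best_time_s["NS2025"]

print(f"NS2025 cost gap vs NS1964: {gap_vs_ri:.2%}")
print(f"NS2025 speed-up vs NSG:    {speedup_vs_gi:.1f}x")
```

With these table values the cost gap is roughly 2% and the speed-up is just under nine times, which agrees with the abstract's description of a comparable cost and a runtime reduction of nearly an order of magnitude relative to the GI-based baseline.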

Table 4. Performance of methods across case cohorts (partial). Note: For different tests, underlined numbers indicate our method, while bold numbers highlight the best-performing method.

		Instances						Statistics
		Sioux Falls	Berlin-Frd.	Berlin-Tgr.	Berlin-M-C	Anaheim	Berlin-M-P-F-C	CRP	Rank
Optimal Total Cost	MIP	4.848 × 10⁶	2.543 × 10⁷	3.408 × 10⁷	2.957 × 10⁷	1.047 × 10¹⁰	8.278 × 10⁷	1.000	1.0
	Greedy	1.494 × 10⁷	2.754 × 10⁷	3.466 × 10⁷	3.234 × 10⁷	1.178 × 10¹⁰	8.826 × 10⁷	1.073	6.2
	GA	1.462 × 10⁷	2.543 × 10⁷	3.429 × 10⁷	3.226 × 10⁷	1.354 × 10¹⁰	1.001 × 10⁸	1.094	5.2
	Lagrangian	1.883 × 10⁷	4.628 × 10⁷	7.365 × 10⁷	4.700 × 10⁷	2.398 × 10¹⁰	1.216 × 10⁸	1.743	10.0
	NS1964	1.530 × 10⁷	2.543 × 10⁷	4.061 × 10⁷	3.063 × 10⁷	1.109 × 10¹⁰	8.549 × 10⁷	1.059	4.7
	NS2025	1.786 × 10⁷	2.769 × 10⁷	3.654 × 10⁷	2.965 × 10⁷	1.119 × 10¹⁰	9.298 × 10⁷	1.058	5.2
	VNS1997	5.753 × 10⁶	2.762 × 10⁷	3.779 × 10⁷	3.596 × 10⁷	1.308 × 10¹⁰	9.847 × 10⁷	1.158	8.0
	VNS2025	5.050 × 10⁶	2.543 × 10⁷	3.815 × 10⁷	3.603 × 10⁷	1.242 × 10¹⁰	9.731 × 10⁷	1.121	7.0
	NSG	5.076 × 10⁶	2.647 × 10⁷	3.408 × 10⁷	3.039 × 10⁷	1.077 × 10¹⁰	8.560 × 10⁷	1.028	3.3
	VNSG	5.110 × 10⁶	2.669 × 10⁷	3.486 × 10⁷	3.039 × 10⁷	1.170 × 10¹⁰	8.744 × 10⁷	1.045	4.5
Total Cost LB Gap (%)	MIP	74.26%	45.05%	53.72%	37.08%	56.34%	31.93%	-	-
	Greedy	20.66%	40.50%	52.94%	31.18%	50.86%	27.43%	-	-
	GA	22.35%	45.05%	53.44%	31.37%	43.53%	17.68%	-	-
	Lagrangian	0.00%	0.00%	0.00%	0.00%	0.00%	0.00%	-	-
	NS1964	18.74%	45.05%	44.86%	34.82%	53.76%	29.71%	-	-
	NS2025	5.16%	40.16%	50.38%	36.91%	53.32%	23.55%	-	-
	VNS1997	69.45%	40.32%	48.69%	23.49%	45.45%	19.04%	-	-
	VNS2025	73.18%	45.05%	48.20%	23.33%	48.22%	19.99%	-	-
	NSG	73.05%	42.81%	53.72%	35.34%	55.10%	29.62%	-	-
	VNSG	72.87%	42.33%	52.67%	35.34%	51.20%	28.11%	-	-
CostIQR (%)	MIP	0.00%	0.00%	0.00%	0.00%	0.00%	0.00%	-	1.0
	Greedy	0.00%	0.00%	0.00%	0.00%	0.00%	0.00%	-	2.0
	GA	11.01%	8.12%	11.49%	2.87%	1.19%	7.20%	-	6.8
	Lagrangian	0.00%	0.00%	0.00%	0.00%	0.00%	0.00%	-	3.0
	NS1964	24.77%	4.09%	23.87%	11.84%	6.68%	9.20%	-	8.2
	NS2025	18.57%	11.00%	11.06%	13.13%	3.08%	6.24%	-	7.3
	VNS1997	23.53%	20.82%	4.49%	17.33%	8.38%	12.36%	-	8.8
	VNS2025	29.70%	12.24%	11.94%	14.10%	10.03%	4.52%	-	8.8
	NSG	0.00%	0.00%	0.00%	0.00%	0.00%	0.00%	-	4.0
	VNSG	2.82%	0.00%	0.00%	0.00%	0.00%	4.48%	-	5.0
Time for Best Solution (s)	MIP	0.16	0.10	0.07	0.12	0.15	1.34	50.330	5.0
	Greedy	0.03	0.69	2.24	3.79	4.51	112.79	606.742	8.0
	GA	0.02	0.03	0.04	0.08	0.05	0.27	15.780	3.0
	Lagrangian	0.04	0.44	1.31	2.71	3.25	86.00	464.278	7.0
	NS1964	0.00	0.00	0.00	0.00	0.00	0.03	1.000	1.0
	NS2025	0.00	0.00	0.01	0.01	0.01	0.05	2.151	2.0
	VNS1997	0.35	0.87	1.66	2.30	2.43	12.26	528.073	7.7
	VNS2025	0.44	0.84	1.40	2.04	2.42	9.85	500.429	7.0
	NSG	0.07	0.06	0.23	0.15	0.10	0.50	41.096	4.7
	VNSG	12.59	12.77	16.92	25.09	24.98	49.33	6107.573	9.7
Silhouette	MIP	0.3131	0.4505	−0.2623	−0.2657	−0.3434	−0.1849	1.420	4.0
	Greedy	0.2739	0.4505	−0.2761	−0.2379	−0.2838	−0.1983	1.348	5.8
	GA	0.3131	0.4505	−0.3260	−0.2197	−0.3412	−0.2198	1.466	5.8
	Lagrangian	0.1884	0.2518	−0.1235	−0.2790	−0.3714	−0.1339	1.011	6.7
	NS1964	0.2979	0.4505	−0.0807	−0.1764	−0.3343	−0.1951	1.085	4.2
	NS2025	0.3073	0.4720	−0.1541	−0.2837	−0.3592	−0.2184	1.367	6.0
	VNS1997	0.3013	0.4038	−0.1445	−0.1711	−0.3338	−0.2038	1.179	5.0
	VNS2025	0.2979	0.4505	−0.1500	−0.1853	−0.2788	−0.1897	1.172	4.3
	NSG	0.3067	0.4332	−0.2623	−0.3257	−0.3517	−0.1913	1.468	6.8
	VNSG	0.3131	0.4332	−0.2623	−0.3257	−0.3431	−0.1891	1.465	6.3
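
The "Total Cost LB Gap" rows of Table 4 appear to be consistent with a gap measured against the Lagrangian row, i.e., (C_Lagrangian − C_method) / C_Lagrangian. This relation is inferred here from the tabulated numbers and is not a definition stated in this part of the article. The sketch below recomputes the Sioux Falls column under that assumption; the function name lb_gap and the tolerance are illustrative choices.

```python
# Sioux Falls column of Table 4: optimal total cost and reported gap (%).
LAGRANGIAN_COST = 1.883e7

optimal_cost = {
    "MIP": 4.848e6, "Greedy": 1.494e7, "GA": 1.462e7,
    "NS1964": 1.530e7, "NS2025": 1.786e7, "VNS1997": 5.753e6,
    "VNS2025": 5.050e6, "NSG": 5.076e6, "VNSG": 5.110e6,
}
reported_gap_pct = {
    "MIP": 74.26, "Greedy": 20.66, "GA": 22.35,
    "NS1964": 18.74, "NS2025": 5.16, "VNS1997": 69.45,
    "VNS2025": 73.18, "NSG": 73.05, "VNSG": 72.87,
}


def lb_gap(cost: float, reference: float = LAGRANGIAN_COST) -> float:
    """Assumed gap formula: share of the reference cost saved by the method."""
    return (reference - cost) / reference


# Largest deviation (in percentage points) between recomputed and reported gaps.
max_deviation = max(
    abs(100 * lb_gap(optimal_cost[m]) - reported_gap_pct[m]) for m in optimal_cost
)

for method, cost in optimal_cost.items():
    print(f"{method:8s} recomputed {lb_gap(cost):7.2%}  reported {reported_gap_pct[method]:.2f}%")
```

The recomputed values differ from the reported ones only in the second decimal place, which is what one would expect when the inputs are rounded to four significant figures. Under this reading, the Lagrangian row is 0.00% by construction.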

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lin, J.; Yang, S.; Huang, K.; Wang, K.; Jang, S. Network- and Demand-Driven Initialization Strategy for Enhanced Heuristic in Uncapacitated Facility Location Problem. Mathematics 2025, 13, 2138. https://doi.org/10.3390/math13132138

