GIS-Integrated Data Analytics for Optimal Location-and-Routing Problems: The GD-ARISE Pipeline

Won, Jun-Jae; Lee, Jong-Seung; Ha, Hyung-Tae

doi:10.3390/math13213465

by Jun-Jae Won ¹, Jong-Seung Lee ² and Hyung-Tae Ha ¹,²,*

¹ Department of Applied Statistics, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si 13120, Gyeonggi-do, Republic of Korea

² Department of Next Generation Smart Energy System Convergence, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si 13120, Gyeonggi-do, Republic of Korea

* Author to whom correspondence should be addressed.

Mathematics 2025, 13(21), 3465; https://doi.org/10.3390/math13213465

Submission received: 30 September 2025 / Revised: 21 October 2025 / Accepted: 28 October 2025 / Published: 30 October 2025

(This article belongs to the Special Issue Theoretical and Applied Mathematics in Supply Chain Management)

Abstract

Optimizing the siting and servicing of urban facilities is a core operations research problem that must reconcile heterogeneous demand, spatial constraints, and network-realistic travel. We present GD-ARISE, a GIS-integrated and data analytics pipeline that maintains a pedestrian–road network metric from demand inference through siting to routing. The workflow has three modules: (i) GIS integration that unifies spatial layers on one network and distance metric; (ii) data analytics that builds multi-criteria suitability via the Analytic Hierarchy Process (AHP) and maps scores to adaptive service radii; (iii) optimal location-and-routing that selects nonoverlapping sites with a transparent greedy rule (SCASS) and computes depot-to-depot routes via simulated annealing on the same metric. A case study in Seoul’s Gangnam District yields a high-coverage portfolio and feasible collection routes. We add a theoretical framework that casts SCASS as a conflict-graph problem, document the AHP elicitation with consistency checks, and report robustness analyses including sensitivity to AHP weights and to radius bounds. Results indicate that core hotspots remain stable to weighting, whereas mid-range corridors shift as criteria priorities or spatial parameters change.

Keywords:

optimal location-and-routing problem; urban waste management; GIS integration; data analytics; analytic hierarchy process; maximal covering location problem; adaptive coverage; simulated annealing

MSC:

90B80

1. Introduction

1.1. Motivation

Urban waste disposal and storage can be framed as a rigorously defined operations research location-and-routing problem that is driven by demand data analytics and is GIS-native [1]. In megacities, rapid urbanization and rising population density intensify pressure on municipal services; public waste management often struggles to keep pace. A core cause is geometric mismatch: bins are not where people generate waste, and service routes ignore how people and vehicles actually move [1,2]. The result is a network-based operations research problem in which demand must be inferred, sites must be chosen, and service must be routed using a coherent metric and data stack. This is now feasible at the city scale because municipalities publish high-frequency geospatial datasets (e.g., floating populations, transit nodes, and points of interest), and open platforms such as OpenStreetMap, with OSMnx, enable the computation of network-consistent distances [3]. Planners increasingly require transparent analytics that map directly to policy levers—how many bins to deploy, where to place them, and how to service them given fleet and budget constraints—so there is a timely opportunity to connect demand estimation, siting, and routing under one geometry and one metric [2,4]. Despite practical need, algorithms for urban amenity siting that integrate heterogeneous demand data with geometry-consistent spatial and operational constraints have received limited attention. Few studies, to our knowledge, claim an end-to-end algorithm that explicitly integrates GIS-based geometry, demand-driven multi-criteria scoring that feeds back into spatial parameters, nonoverlap enforced with adaptive radii on a network metric, and routing on the same metric. Most prior workflows stop at ranking without altering geometry, mix Euclidean screening with network routing, or impose fixed buffers that ignore local demand. 
As a result, geometry-consistent siting-and-routing pipelines remain rare in the literature, leaving a gap for methods that are both operationally credible and analytically transparent.

1.2. Related Works

The literature on multi-criteria decision-making (MCDM), the maximal covering location problem (MCLP), and routing has been extensively developed for facility location in operations research. GIS strengthens these models by linking geospatial data and network analysis [3,5], enabling city-scale location allocation, coverage, and access studies [6]. For multi-criteria synthesis, the Analytic Hierarchy Process (AHP) [7] and related MCDM methods are widely used to build suitability surfaces for siting and infrastructure planning [6]. This approach typically involves weighting heterogeneous GIS layers to derive a composite suitability index. It has been widely applied across various domains, including landfill siting [8,9], healthcare facility planning [10], and renewable energy projects [11,12]. These studies formalize how multi-criteria weighting translates spatially into suitability gradients, underscoring MCDM’s capacity to operationalize complex environmental and social criteria.

Church and ReVelle introduced MCLP, which chooses sites to cover as much demand as possible within a set distance [13]. Many studies have extended this idea to include budgets, fairness goals, and changes over time or uncertainty [14,15,16,17]. Ref. [18] investigated model placement as a single goal of maximizing coverage. In solid-waste systems, recent reviews summarize how covering models are used and point out that results can change a lot depending on the distance measure and the service radius assumed [19,20]. A primary critique of the classical MCLP concerns its rigid, binary definition of coverage. Berman et al. (2003) [21] addressed the model’s unrealistic “all-or-nothing” assumption by introducing a gradual decay function, where service quality diminishes with distance rather than ceasing abruptly. Complementing this, Karasakal and Karasakal (2004) [22] highlighted that strict binary formulations can yield unjustified solutions where partial service is plausible, thereby motivating variants that permit partial coverage. More recent extensions have broadened the MCLP’s focus from pure efficiency to include equity. Blanco and Gázquez (2023) [23] formalized the integration of fairness, employing concepts such as

α

-fairness and ordered-weighted objectives. Their work underscores that distributional equity is not inherent in the basic model and must be incorporated as an explicit objective. Despite these theoretical advancements, practical siting decisions increasingly rely on spatial data and multi-criteria integration. However, using such derived scores to dynamically adjust geometric or operational parameters, such as service radii or coverage decay, remains limited [24,25,26]. Spatial feasibility often requires nonoverlap [27,28], which connects to disk packing [29] and independent set formulations [30]; in practice, spacing is frequently enforced with ad hoc buffers rather than with network metric optimization. In algorithms, greedy heuristics are common for NP-hard location models because they are fast and transparent [31], though not globally optimal [14,15,32].

For routing, simulated annealing [33,34] and other local search methods perform well on the traveling salesman problem (TSP) [35] and on modern VRP variants [36]. Recent applied work also moves toward joint decisions on where to locate facilities and how to route the service.

Rahmanifar et al. (2024) [37] present a non-linear multiobjective model that integrates warehouse location with vehicle routing in cold-chain logistics, and Hashemi-Amiri et al. (2023) [38] propose a tri-objective mixed-integer linear program (MILP) that unifies facility location, crew scheduling, and routing for municipal solid waste. These studies advance integrated planning, but they do not maintain a single, GIS-consistent distance metric from multi-criteria demand aggregation through siting constraints to downstream routing.

1.3. Contributions

This paper introduces GD-ARISE (GIS-integrated and Data analytic Adaptive Radius Integrated Siting and rEservicing), an end-to-end, GIS-integrated pipeline that maintains geometric consistency from demand analytics to operations. First, all spatial layers—administrative boundaries, pedestrian–road networks, floating populations, transit nodes, and waste-related points of interest—are reconciled onto a single pedestrian–road network and one distance metric, ensuring that measurements and decisions are expressed in the same geometry. Second, multi-criteria demand is constructed via AHP into a composite suitability score, and then mapped to geometry as an adaptive service radius, so that high-suitability areas receive smaller radii while lower-suitability areas receive larger radii to preserve access. Third, spatially constrained selection is formulated with site-specific radii on the network metric and implemented via a transparent greedy rule (SCASS) to produce a maximal nonoverlapping portfolio; the associated conflict structure admits an interpretation as a graph independent set problem. Fourth, depot-to-depot service routes are generated using simulated annealing on the identical network metric used upstream, closing the loop without changing geometry. Finally, the pipeline is demonstrated on waste bin siting and servicing in Seoul’s Gangnam District, where the walkable network is sampled at fine resolution, composite scores and adaptive radii are computed, a nonoverlapping portfolio of sites is selected, and operational routes are produced, including a single depot case in Samsung 1–dong. By maintaining one geometry and one metric across all stages, GD-ARISE turns GIS analytics into operations research decisions about counts, placement, spacing, and service effort under realistic constraints.

2. An Integrated Planning Algorithm: GD-ARISE

We present GD-ARISE, a unified, GIS-integrated workflow for optimal location-and-routing. Figure 1 shows the workflow of the GD-ARISE algorithm, which has three modules: GIS integration, data analytics, and optimal location-and-routing. All steps use one network distance metric on the pedestrian–road network. This keeps geometry and units consistent. In GIS integration, load the administrative boundaries, the pedestrian–road network, floating population layers, transit nodes, and waste-related points of interest. Harmonize coordinate reference systems, clean fields, and clip layers to the study region. In data analytics, generate dense candidate sites along walkable streets. For each candidate site, compute criterion scores from the GIS layers such as population, transit access, and proximity to points of interest. Normalize all scores to

[0, 1]

. Use AHP to set weights and combine the criteria into a composite suitability. In optimal location-and-routing, first select sites with SCASS. Sort candidates by suitability and add a site if its radius does not overlap any already selected radius. Then route service to build depot-to-depot tours. The workflow also reports the coverage attained, the number of selected sites, the route length, and the route time. Because all steps share one metric and one GIS substrate, results are easy to audit, map, and reproduce.

We work on a single geographic region

L \subset R^{2}

with a pedestrian–road network-induced distance

d : R^{2} \times R^{2} \to R_{\geq 0}

(shortest path on the network). The region is discretized into candidate sites

J = {j_{1}, \dots, j_{N}}

sampled from the network at resolution

δ > 0

, so adjacent samples are at most

δ

apart under d. A set of criteria

C = {C_{1}, \dots, C_{M}}

evaluates each site. We assign to every

j \in J

a composite suitability

S_{j}^{*} \in [0, 1]

and an adaptive service radius

r_{j} \in [R_{\min}, R_{\max}]

with

0 < R_{\min} < R_{\max} < \infty

. Then, we select a subset

J_{sel} \subseteq J

of size

P_{target} \in N

while enforcing nonoverlap based on d and

{r_{j}}

. Finally, we route the service from a designated depot

D_{0} \in L

. When routing is required, each served site carries demand

q_{j} \geq 0

, the fleet has

m \in N

vehicles, and each vehicle has capacity

Q > 0

. All distances and constraints use the same metric d.

Multi-criteria demand assessment maps heterogeneous raw measurements into a composite demand suitability score

S_{j}^{*}

and a site-specific spatial footprint

r_{j}

. The inputs are the raw criterion values or proximity-based syntheses for each

C_{k} \in C

, together with a pairwise comparison matrix that encodes decision priorities. Each criterion is normalized to

[0, 1]

via a monotone transformation so that larger values consistently mean greater desirability, and AHP weights are extracted from the principal eigenvector of the comparison matrix, subject to a consistency check. Aggregation by a convex combination yields

S_{j}^{*} = \sum_{k} α_{k} S_{j, k}

. This first stage resolves two design tensions in a data-driven yet interpretable way: it fuses many incommensurate predictors of demand into a single, dimensionless score, and it ties spatial influence to local desirability so that high-value sites are modeled with tighter catchments while low-value sites expand to preserve access.
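The weighting step can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it approximates the principal eigenvector by power iteration, computes Saaty's consistency ratio, and forms the convex combination. The comparison matrix `A`, the per-site scores, the function names, and the random-index table (commonly cited values, with a fallback for larger matrices) are all assumptions made for the example.

```python
# Commonly cited random consistency indices by matrix order (assumed table).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(A, iters=200):
    """Return (weights, consistency_ratio) for a positive reciprocal matrix A."""
    n = len(A)
    w = [1.0 / n] * n
    for _ in range(iters):  # power iteration toward the principal eigenvector
        v = [sum(A[i][k] * w[k] for k in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]  # keep the weights summing to one
    # Estimate the principal eigenvalue from A w = lambda_max w.
    Aw = [sum(A[i][k] * w[k] for k in range(n)) for i in range(n)]
    lam = sum(Aw[i] / w[i] for i in range(n)) / n
    ci = (lam - n) / (n - 1) if n > 1 else 0.0  # consistency index
    ri = RANDOM_INDEX.get(n, 1.45)  # fallback for n > 9 is an assumption
    cr = ci / ri if ri > 0 else 0.0  # consistency ratio
    return w, cr


def composite_score(scores, weights):
    """Convex combination S*_j = sum_k alpha_k * S_{j,k} for one site."""
    return sum(a * s for a, s in zip(weights, scores))


# Hypothetical 3-criterion comparison matrix (illustrative values only).
A = [[1.0, 3.0, 5.0],
     [1 / 3, 1.0, 2.0],
     [1 / 5, 1 / 2, 1.0]]
alpha, cr = ahp_weights(A)
s_star = composite_score([0.8, 0.5, 0.2], alpha)
```

A consistency ratio below 0.1 is the conventional acceptance threshold; a matrix that fails it would normally be returned to the decision makers for revision rather than used as is.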

For every criterion

C_{k} \in C

and site

j \in J

, the non-negative value

V_{j, k} \in R_{\geq 0}

quantifies the magnitude of

C_{k}

at j. Two constructions cover the settings of interest. When

C_{k}

is directly observed at the site—such as floating population, footfall, or local demand intensity—we set

V_{j, k} = P_{j}

, where

P_{j}

denotes the measured quantity at j. When

C_{k}

reflects the influence of external features distributed in space, we let

F_{k} = {f_{s}}_{s = 1}^{S_{k}} \subset L

denote the relevant feature set and assign non-negative weights

{ω_{k, s}}_{s = 1}^{S_{k}}

that sum to one and encode the relative importance of individual features or subtypes. An influence kernel

K : R_{\geq 0} \to R_{\geq 0}

then maps distances to contributions so that

V_{j, k} = \sum_{s = 1}^{S_{k}} ω_{k, s} K (d (j, f_{s})) .

(1)

In the case study, we adopt

K (ρ) = ρ

with

d (j, f_{s})

, measured as Euclidean distance.

Because the criteria are measured in heterogeneous units, each raw value is transformed to a standardized unit-interval score via a criterion-specific monotone standardization map

f_{k} : R_{\geq 0} \to [0, 1]

defined by

S_{j, k} = f_{k} (V_{j, k})

. When

C_{k}

is a benefit-type attribute for which larger values are more desirable, a min–max mapping places all sites on a common scale,

S_{j, k} = \begin{cases} \frac{V_{j, k} - \min_{ℓ \in J} V_{ℓ, k}}{\max_{ℓ \in J} V_{ℓ, k} - \min_{ℓ \in J} V_{ℓ, k}}, & \text{if } \max_{ℓ \in J} V_{ℓ, k} > \min_{ℓ \in J} V_{ℓ, k}, \\ 0, & \text{otherwise}, \end{cases}

(2)

sending the empirical minimum to 0 and the empirical maximum to 1. To translate composite suitability into a spatial service footprint, two design constants

0 < R_{\min} < R_{\max}

specify the admissible range of coverage radii. The adaptive coverage radius at site j is then defined by the affine mapping:

r_{j} = R_{\max} - (R_{\max} - R_{\min}) S_{j}^{*} .

(3)

Because

S_{j}^{*} \in [0, 1]

, this definition guarantees

r_{j} \in [R_{\min}, R_{\max}]

. Differentiating shows that

\partial r_{j} / \partial S_{j}^{*} = - (R_{\max} - R_{\min}) < 0

, so sites with higher suitability are assigned proportionally smaller catchments. This concentrates service in areas of strong demand while allowing sites in weaker areas to expand their reach to preserve access. The use of a network-based metric d is essential in urban contexts because it captures the true impedance of travel.
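As a small worked illustration of Equations (2) and (3), the sketch below normalizes a raw criterion and maps the resulting score to a radius. For brevity it treats a single normalized criterion as the composite score; the raw values and the bounds of 50 m and 150 m are hypothetical and are not the case-study settings.

```python
def min_max(values):
    """Benefit-type min-max scaling to [0, 1]; all zeros if values are constant."""
    lo, hi = min(values), max(values)
    if hi <= lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def adaptive_radius(s_star, r_min, r_max):
    """Affine map r_j = R_max - (R_max - R_min) * S*_j with 0 < R_min < R_max."""
    if not 0 < r_min < r_max:
        raise ValueError("require 0 < r_min < r_max")
    return r_max - (r_max - r_min) * s_star


raw = [120.0, 480.0, 300.0]           # hypothetical raw criterion values
scores = min_max(raw)                  # lowest value -> 0, highest -> 1
radii = [adaptive_radius(s, 50.0, 150.0) for s in scores]
```

The site with the highest score receives the smallest radius and the site with the lowest score receives the largest one, which is the monotone relationship stated above.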

We now design SCASS, an algorithm that selects locations via adaptive coverage and greedy nonoverlap, in the spirit of MCLP. The aim is to choose

P_{target}

target sites that maximize the amount of demand covered within the heterogeneous radii

{r_{j}}_{j \in J}

.

Proposition 1

(SCASS). Let

L \subset R^{2}

be a geographic region endowed with a network metric d, let

J = {j_{1}, \dots, j_{N}} \subset L

be candidate sites with adaptive radii

{r_{j}}_{j \in J}

determined from composite scores

{S_{j}^{*}}_{j \in J}

, and let

U \subseteq L

be finite demand nodes with weights

{w_{u}}_{u \in U}

. For each

j \in J

define the coverage set:

C (j) = {u \in U : d (u, j) \leq r_{j}}, and the service disk D_{j} = {x \in L : d (x, j) \leq r_{j}} .

Construct the conflict graph

G_{c} = (J, E)

with an edge

{i, j} \in E

iff

d (i, j) < r_{i} + r_{j}

.

(i): Nonoverlap ⟺ Independence. A subset $J_{sel} \subseteq J$ satisfies the SCASS nonoverlap requirement $D_{i} \cap D_{j} = ⌀$ for all distinct $i, j \in J_{sel}$ , if and only if $J_{sel}$ is an independent set in $G_{c}$ .
(ii): Adaptive radius MCLP with exclusions. For a given cardinality $P_{target} \in N$ , the problem of choosing $J_{sel} \subseteq J$ with $| J_{sel} | = P_{target}$ to maximize covered demand,

$\sum_{u \in U} w_{u} 1 \{u \in ⋃_{j \in J_{sel}} C (j)\}$

subject to SCASS nonoverlap is equivalent to the maximal covering location problem on U with site-specific radii ${r_{j}}$ and the additional constraint that $J_{sel}$ is an independent set of $G_{c}$ .
(iii): Maximum-weight independent set (MWIS). If the siting objective is to maximize $\sum_{j \in J_{sel}} S_{j}^{*}$ subject to nonoverlap and $| J_{sel} | = P_{target}$ , then the problem is a cardinality-constrained MWIS on $G_{c}$ :

$\max_{J_{sel} \subseteq J} \sum_{j \in J_{sel}} S_{j}^{*} \quad \text{s.t.} \quad | J_{sel} | = P_{target}, \; J_{sel} \text{ independent in } G_{c} .$

Equivalently, with binaries $x_{j} \in \{0, 1\}$ ,

$\begin{matrix} \max_{x \in \{0, 1\}^{N}} & \sum_{j = 1}^{N} S_{j}^{*} x_{j} \\ \text{s.t.} & x_{i} + x_{j} \leq 1 \quad \forall \{i, j\} \in E, \\ & \sum_{j = 1}^{N} x_{j} = P_{target} . \end{matrix}$

Proof.

For (i), if

D_{i} \cap D_{j} \neq ⌀

, then there exists x with

d (i, j) \leq d (i, x) + d (x, j) \leq r_{i} + r_{j}

by the triangle inequality; hence

{i, j} \in E

and the pair cannot be jointly selected. Conversely, if

d (i, j) \geq r_{i} + r_{j}

, then no x can lie in both disks, so

D_{i} \cap D_{j} = ⌀

. This establishes the equivalence between nonoverlap and independence in

G_{c}

. Statement (ii) inserts this feasibility into the adaptive radius MCLP coverage objective, so a feasible solution is exactly an independent set of prescribed size. Statement (iii) is a direct translation of the selection objective into a cardinality-constrained MWIS, yielding the stated 0–1 formulation. □
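The greedy rule described at the start of this section (sort by suitability, accept a site when it conflicts with no previously accepted site) can be sketched as follows. This is an illustrative reading of SCASS rather than the authors' code: the function names are hypothetical, the optional `p_target` cap is an assumption, and Euclidean distance stands in for the network metric d, which in practice would be a shortest-path lookup.

```python
import math


def scass_greedy(sites, scores, radii, dist, p_target=None):
    """Greedy independent set in the conflict graph (edge iff d(i, j) < r_i + r_j)."""
    # Visit candidates in order of decreasing composite suitability.
    order = sorted(range(len(sites)), key=lambda j: scores[j], reverse=True)
    selected = []
    for j in order:
        if p_target is not None and len(selected) >= p_target:
            break
        # Accept j only if it has no conflict edge to any accepted site.
        if all(dist(sites[j], sites[i]) >= radii[j] + radii[i] for i in selected):
            selected.append(j)
    return selected


def euclid(a, b):
    """Stand-in metric for the example; replace with network shortest-path distance."""
    return math.dist(a, b)


# Hypothetical toy instance: four candidates on a line.
sites = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.5, 0.0)]
scores = [0.9, 0.8, 0.4, 0.7]
radii = [1.0, 1.0, 2.0, 1.5]
chosen = scass_greedy(sites, scores, radii, euclid)
```

In this toy instance, site 1 is rejected because it conflicts with the higher-scoring site 0, and site 2 is rejected because it conflicts with site 3. The result is a maximal independent set, not necessarily a maximum-weight one.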

Finally, we optimize the service route problem over the selected sites and depot using a capacitated vehicle-routing model. The node set is

V = {0, 1, \dots, K}

with

K = | J_{sel} |

. Node 0 represents the depot

D_{0}

(an operations center or garage), and node i corresponds to site

j_{i} \in J_{sel}

. Edge costs are the network distances

d_{i j} = d (j_{i}, j_{j})

with

j_{0} \equiv D_{0}

. Demands

{q_{j_{i}}}_{i = 1}^{K}

and a common capacity Q define feasibility, and the goal is to partition

J_{sel}

into at most m depot-to-depot tours minimizing total travel cost while serving each site exactly once and respecting capacity. Because the problem is NP-hard and exact mixed-integer formulations become intractable at realistic scales, we adopt a simulated annealing search over permutations augmented with depot separators. A candidate solution is encoded as a sequence that starts and ends at 0 and contains each customer once, together with

m - 1

additional depot symbols; cutting at the depot symbols yields the m routes. The total cost is the sum of

d_{i j}

along the sequence, and capacity feasibility amounts to verifying that the sum of demands on each between-depot segment does not exceed Q. This stage converts a strategic siting outcome into operationally feasible tours that coincide with the same network metric d used in the first two stages.
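The encoding above can be illustrated with a compact simulated annealing sketch. It is a minimal example under stated assumptions, not the authors' implementation: the swap neighborhood, the geometric cooling schedule, the penalty for capacity violations, and all parameter values are illustrative choices, and the toy distance matrix is Euclidean rather than network-based.

```python
import math
import random


def split_routes(seq):
    """Cut a sequence at depot symbols (0) into customer routes, dropping empty ones."""
    routes, cur = [], []
    for node in seq:
        if node == 0:
            if cur:
                routes.append(cur)
            cur = []
        else:
            cur.append(node)
    if cur:
        routes.append(cur)
    return routes


def cost(seq, d, q, Q, penalty=1e6):
    """Total depot-to-depot travel cost plus a penalty per unit of overload."""
    total = 0.0
    for route in split_routes(seq):
        path = [0] + route + [0]
        total += sum(d[a][b] for a, b in zip(path, path[1:]))
        overload = sum(q[i] for i in route) - Q
        if overload > 0:
            total += penalty * overload
    return total


def anneal(d, q, Q, m, t0=10.0, cooling=0.995, steps=5000, seed=0):
    """Anneal over permutations of customers 1..K plus m - 1 depot separators."""
    rng = random.Random(seed)
    K = len(d) - 1
    seq = list(range(1, K + 1)) + [0] * (m - 1)
    rng.shuffle(seq)
    cur_cost = cost(seq, d, q, Q)
    best, best_cost, t = seq[:], cur_cost, t0
    for _ in range(steps):
        a, b = rng.sample(range(len(seq)), 2)
        seq[a], seq[b] = seq[b], seq[a]          # propose a swap move
        new_cost = cost(seq, d, q, Q)
        accept = new_cost <= cur_cost or \
            rng.random() < math.exp((cur_cost - new_cost) / t)
        if accept:
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = seq[:], cur_cost
        else:
            seq[a], seq[b] = seq[b], seq[a]      # undo the rejected move
        t *= cooling                              # geometric cooling
    return split_routes(best), best_cost


# Hypothetical toy instance: depot at the origin, four unit-demand sites on a line.
pts = [(0, 0), (1, 0), (2, 0), (-1, 0), (-2, 0)]
d = [[math.dist(a, b) for b in pts] for a in pts]
q = [0, 1, 1, 1, 1]
routes, total = anneal(d, q, Q=2, m=2)
```

Penalizing overload instead of forbidding it keeps every permutation a valid search state, which is one common design choice; rejecting infeasible moves outright is an alternative.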

Routing operates on these selected sites and the designated depot

D_{0} \in L

. To harmonize indexing, fix an arbitrary bijection between

{1, \dots, K}

and

J_{sel}

and write

j_{i}

for the site associated with index i. Define the node set

V = {0, 1, \dots, K}

, where node 0 corresponds to the depot

D_{0}

and node

i \in {1, \dots, K}

corresponds to site

j_{i}

. For any pair

(i, j) \in V \times V

, define

d_{i j} = d (j_{i}, j_{j})

with the convention

j_{0} \equiv D_{0}

and exclude self-loops via

x_{i i} = 0

. Each customer node

i \in {1, \dots, K}

is assigned a non-negative service demand

q_{i} \geq 0

(e.g., expected daily pickups or deliveries), and vehicles have a common capacity

Q > 0

in the same units as the

q_{i}

. Let

m \in N

be the available fleet size. A vehicle route is a directed cycle that starts and ends at the depot, visits a subset of customers exactly once, and respects capacity. We encode routing decisions with binary arc variables

x_{i j} \in {0, 1}

that indicate whether a vehicle travels directly from node i to node j. To enforce capacity and eliminate subtours, we introduce continuous load–flow variables using the classical single-commodity formulation: let

y_{i}

denote the cumulative load delivered up to and including customer i on the route that visits i, measured from zero at the depot. The capacitated vehicle routing problem on

(V, d_{i j})

is then

\min_{x, y} \sum_{i = 0}^{K} \sum_{j = 0}^{K} d_{i j} x_{i j}

(4)

subject to the depot degree constraints

\sum_{j = 1}^{K} x_{0 j} = m

and

\sum_{i = 1}^{K} x_{i 0} = m

, the customer in- and out-degree constraints

\sum_{i = 0}^{K} x_{i h} = 1

and

\sum_{j = 0}^{K} x_{h j} = 1

for every

h \in {1, \dots, K}

, and the load–flow constraints

y_{j} \geq y_{i} + q_{j} - Q (1 - x_{i j}) for all i \in V, j \in {1, \dots, K},

(5)

together with the bounds

y_{0} = 0

and

q_{i} \leq y_{i} \leq Q

for all

i \in {1, \dots, K}

, and the binary and non-negativity restrictions

x_{i j} \in {0, 1}

and

y_{i} \geq 0

. The degree constraints ensure that every customer is entered and left exactly once and that exactly m tours depart from and return to the depot. The load–flow inequalities propagate cumulative load along used arcs: if

x_{i j} = 1

, then

y_{j} \geq y_{i} + q_{j}

. So, when a vehicle traverses

(i, j)

, it must have delivered an additional

q_{j}

units by the time it leaves j; if

x_{i j} = 0

, the constraint is relaxed by Q and is implied by the bounds on y. The bounds

y_{0} = 0

and

y_{i} \leq Q

enforce vehicle capacity and, together with flow propagation, preclude subtours disconnected from the depot, since any positive delivery in a closed customer-only cycle would force y to grow without the possibility of resetting to 0. This mixed-integer program is NP-hard; indeed, when

m = 1

and

Q \geq \sum_{i = 1}^{K} q_{i}

, the problem reduces to the classical traveling salesman problem on

{0, 1, \dots, K}

.
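To make the role of the load variables concrete, the sketch below takes a candidate set of routes, reconstructs the cumulative loads y_i, and checks the degree, load-flow, and bound conditions of the formulation. It is a verification aid written for illustration; the function names and the toy demands are assumptions.

```python
def cumulative_loads(routes, q):
    """y_i = load accumulated up to and including customer i, with y_0 = 0."""
    y = {0: 0.0}
    for route in routes:
        load = 0.0
        for i in route:
            load += q[i]
            y[i] = load
    return y


def is_feasible(routes, q, Q, m):
    """Check a route set against the degree, load-flow, and bound constraints."""
    K = len(q) - 1
    # Depot degree: exactly m non-empty tours leave and re-enter node 0.
    if len(routes) != m or any(not r for r in routes):
        return False
    # Customer degree: every customer appears exactly once overall.
    if sorted(i for r in routes for i in r) != list(range(1, K + 1)):
        return False
    y = cumulative_loads(routes, q)
    # Load-flow inequality (5) on each used arc (prev, j).
    for route in routes:
        prev = 0
        for j in route:
            if y[j] < y[prev] + q[j]:
                return False
            prev = j
    # Bounds q_i <= y_i <= Q enforce vehicle capacity.
    return all(q[i] <= y[i] <= Q for i in range(1, K + 1))


# Hypothetical demands; index 0 is the depot.
q = [0.0, 2.0, 3.0, 4.0]
ok = is_feasible([[1, 2], [3]], q, Q=5.0, m=2)
too_heavy = is_feasible([[1, 2, 3]], q, Q=5.0, m=1)
```

The single-tour plan fails because the cumulative load at the last customer exceeds Q, which is exactly the situation the upper bound on y_i rules out.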

3. Application

Typical applications of the proposed algorithm include siting clinics, fire stations, micro-mobility docks, or public waste bins, each requiring a coherent pipeline. We illustrate the approach on public waste bin siting and servicing in the Gangnam District of Seoul, a dense mixed-use environment in which pedestrian flows, transit access, and commercial intensity co-produce spatially and temporally concentrated litter generation. Figure 2 presents the detailed computational workflow to apply the proposed GD-ARISE with the case study on public waste-bin siting and servicing in Seoul’s Gangnam District.

3.1. GIS Analytics

The application domain is the Gangnam District of Seoul. We set the study region of the framework to

L = G \subset R^{2}

, and we use a single distance metric

d : R^{2} \times R^{2} \to R_{\geq 0}

throughout all stages, instantiated in practice as the shortest-path metric induced by the pedestrian–road network

\mathcal{G}

so that distances reflect walk times and barriers rather than straight lines. Gangnam’s land use, which combines corridors along Teheran–ro, high-street retail near Gangnam Boulevard, entertainment clusters in Apgujeong and Cheongdam, and multiple subway interchanges, induces marked spatiotemporal variation in footfall. We represent this with a non-negative pedestrian density field

P : G \times [0, T] \to R_{\geq 0}

over a representative horizon

[0, T]

(e.g., one day). Administrative boundaries for G were obtained as polygonal census layers and ingested into a GeoPandas workflow. To ensure geometric consistency across heterogeneous sources, all layers were first reprojected to the common geographic CRS EPSG:4326 (WGS84) for integration with web data, and then to the projected CRS EPSG:5186 (Korea 2000 / Central Belt 2010), whose units are meters, for all computations that require metric accuracy. All geoprocessing and network analyses were performed in Python (v3.11.11) using GeoPandas (v1.0.1) for spatial data handling, OSMnx (v2.0.2) for extracting the pedestrian network from OpenStreetMap, Shapely (v2.1.0) for geometric operations (e.g., buffering and distance), and Folium (v0.19.5) for interactive map visualization. Table 1 summarizes the GIS integration for the Gangnam case. It shows each spatial layer, how we prepare it, and how it is used in the model.

Candidate facility sites are drawn from the walkable subgraph of OpenStreetMap within G. We extracted footways, pedestrian paths, sidewalks, and low-speed residential links using OSMnx, simplified the network to retain unique traversable edges, and then sampled points along these edges at a fixed network spacing

δ = 10

m. The result is a finite candidate set

J = {j_{1}, \dots, j_{N}} \subset G

with

N = 32,890

points for feasible bin locations. Each

j \in J

inherits attributes from intersecting administrative polygons via spatial join (e.g., sub-district codes) and is snapped to the nearest network node to avoid topological artifacts when computing

d (\cdot, \cdot)

. As shown in Figure 3, a random sample of candidate waste bin sites illustrates the spatial spread of the full feasible set across the walkable network. Blue points depict a random subset of 1000 candidates drawn from the full set of

32,890

network-anchored sites (black lines: pedestrian–road network; grey polygons: output-area boundaries). The sample visualizes coverage of feasible placements prior to scoring and selection.
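The fixed-spacing sampling step can be illustrated on a single edge geometry. The sketch below is a simplified stand-in for the actual workflow: it assumes the edge is a polyline in a projected CRS with meter units (such as EPSG:5186) and walks along it, emitting a point every δ meters of arc length. The function name and the toy polyline are hypothetical.

```python
import math


def sample_along(polyline, delta):
    """Return points spaced `delta` apart in arc length, starting at the first vertex."""
    points = [polyline[0]]
    carry = 0.0  # arc length travelled since the last emitted point
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg == 0:
            continue  # skip duplicate vertices
        pos = delta - carry  # offset along this segment of the next sample
        while pos <= seg:
            t = pos / seg
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            pos += delta
        carry = seg - (pos - delta)
    return points


# Hypothetical L-shaped edge: 25 m east, then 20 m north, sampled at delta = 10 m.
edge = [(0.0, 0.0), (25.0, 0.0), (25.0, 20.0)]
samples = sample_along(edge, 10.0)
```

Carrying the leftover length across vertices keeps the spacing uniform along the whole edge instead of restarting at every bend.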

3.2. Data Analytics: Demand Criteria and Feature Construction

Waste bin demand is driven by cumulative exposure to pedestrians and by proximity to attractors such as transit nodes or retail frontages. To support coverage modeling and validation, we also assemble a demand representation that is consistent with the framework’s notation. Let

U \subset G

denote a set of demand nodes at which pedestrian exposure and ancillary variables are tabulated; in practice, U may consist of vertices of the pedestrian–road network

\mathcal{G}

inside G or centroids of census micro-polygons. All exogenous point datasets (e.g., transit stops and points of interest) are cleaned, deduplicated within a tolerance in EPSG:5186, and converted to GeoDataFrame for nearest-neighbor and kernel computations. Where a dataset is temporally indexed, we aggregate to representative daily means so that demand scores represent a typical day on the planning horizon. The dense sampling of the pedestrian network at

δ = 10

m yields

| J | = 32,890

feasible bin locations, allowing the adaptive radii

{r_{j}}

to respond to fine-scale variation in the built environment. Distances used in all subsequent computations—spatial-exclusion checks

d (i, j) \geq r_{i} + r_{j}

and routing costs

d_{i j} = d (j_{i}, j_{j})

—are evaluated with the same network metric d on the pedestrian–road network

\mathcal{G}

. Table 2 summarizes the data analytics used in the Gangnam district case, which lists each variable category, how the data are sourced and preprocessed, and how they enter the model as criteria scores

S_{j, k}

or routing inputs.

Open data were acquired through the Seoul Open Data platform. First, Table 3 summarizes the demographic datasets integrated into the GD-ARISE pipeline for the Gangnam District case study. The datasets include Seoul’s living-population estimates for domestic and foreign residents. Each living-population dataset represents the estimated number of people present at a specific location and time, derived by combining administrative records (resident registry, transport, business, and building databases) with KT (Korea Telecom) big data. The census-block boundary layer (EPSG:5186) provides the spatial framework for aggregating and visualizing these population estimates. Together, these layers constitute the demographic foundation for quantifying spatial patterns of pedestrian exposure and population density within the GD-ARISE framework.

Table 4 summarizes the transit-related GIS layers integrated into the GD-ARISE pipeline to represent multimodal accessibility across Gangnam District. The datasets include bus stop locations and subway entrance coordinates, each obtained from verified public sources and re-projected to a unified coordinate reference system (EPSG:5186) for geometric consistency. The bus stop layer provides detailed node attributes, such as stop IDs, names, and coordinates, while the subway entrance layer contains manually extracted latitude–longitude pairs for all access points within the study area. Together, these layers constitute the transit node component of criterion

C_{2}

, serving as the spatial foundation for the proximity-based accessibility analysis in subsequent stages of the GD-ARISE framework. In addition, Table 5 details the source datasets and preprocessing for the waste-related POIs criterion

C_{3}

.

Now, we discretize G by sampling the pedestrian–road network

\mathcal{G}

at the spatial resolution

δ > 0

to obtain a candidate set of feasible bin sites

J = {j_{1}, \dots, j_{N}} \subset G

, and, independently, a demand lattice

U = {u_{1}, \dots, u_{M}} \subset G

. The criterion collection

C = {C_{1}, \dots, C_{M}}

is instantiated for waste bin siting using observable correlates of litter pressure and disposal opportunity, in a manner that matches the demand formulation. Floating population is treated as a direct, site-specific magnitude: block-level counts from the observation period are averaged to obtain a mean daily exposure and then spatially joined to both J and U. When a candidate point or demand node falls within overlapping administrative polygons, the exposure is taken as the mean of the overlapping values, which yields well-defined raw measurements

V_{j, k}

for the population criterion and preserves mass under areal interpolation. Public transit proximity is encoded via proximity to bus stops and subway exits. After loading both datasets and projecting to EPSG:5186, we compute for each

j \in J

the shortest path distance along the pedestrian–road network

\mathcal{G}

to the nearest stop and to the nearest exit. Local waste generation potential is represented by proximity to POIs that tend to generate street litter, such as convenience stores, cafés, food trucks, and public parks. Each POI set is harmonized into a single layer with subtype weights

ω_{k, s} \geq 0

that sum to one within the criterion.

As shown in Figure 4, population scores are highly right-skewed: most candidates have low values, while a small fraction of hotspots created by concentrated corridors dominates the upper tail. Floating population

C_{1}

is a direct site-specific magnitude derived from block-level daily counts aggregated over the observation period. Let

P_{j}

denote the mean daily floating population assigned to site j via areal interpolation from its containing (or overlapping) census blocks. To place

P_{j}

on a unit scale while preserving ranks, we apply benefit-type min–max normalization over all candidates,

S_{j, pop} = \frac{P_{j} - \min_{i \in J} P_{i}}{\max_{i \in J} P_{i} - \min_{i \in J} P_{i}},

(6)

which yields

S_{j, pop} \in [0, 1]

and encodes relative pedestrian exposure.
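As a small illustration of Equation (6), the sketch below normalizes a list of hypothetical floating-population values. The handling of the degenerate case in which all values coincide is an assumption, since the text does not specify it.

```python
def min_max_normalize(values):
    """Benefit-type min-max normalization onto [0, 1] (Equation (6))."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: if every candidate has the same value, score them all as 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical mean daily floating population at four candidate sites.
P = [120.0, 450.0, 80.0, 3900.0]
S_pop = min_max_normalize(P)
```

Because the transformation is affine and increasing, the ranking of candidates by P is unchanged; a single extreme hotspot, however, compresses all other scores toward zero, which is consistent with the right skew reported for Figure 4.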

As shown in Figure 5, disposal scores are broadly spread with a mode around the mid-range, plus a mass at zero for candidates that lie beyond the 300 m influence range of both bus stops and subway exits. Transit-proximate disposal opportunity

C_{2}

is modeled as proximity to the nearest bus stop and the nearest subway exit, recognizing that on–off flows around stations correlate with both waste generation and appropriate placement of receptacles. Let

d_{j, bus}

and

d_{j, sub}

denote the network distances from j to the nearest bus stop and the nearest subway exit, respectively. A convex combination of linear distance decay kernels with a common maximum influence range

D_{\max, disp} = 300

m produces a unit-interval score

S_{j, disp} = ω_{bus} \max \{0, 1 - \frac{d_{j, bus}}{300}\} + ω_{sub} \max \{0, 1 - \frac{d_{j, sub}}{300}\}, \quad ω_{bus} = 0.75, \; ω_{sub} = 0.25,

(7)

which reflects the stronger baseline frequency of bus stops relative to subway exits, while allowing both to contribute when they are nearby.

As shown in Figure 6, shop proximity scores exhibit two masses: a spike at zero for sites in POI-sparse areas and a broad peak around 0.5–0.6 for neighborhoods where several POIs lie within the decay radius. Waste source proximity

C_{3}

aggregates the influence of POIs associated with street litter. Let

K = {conv, cafe, truck, park}

index convenience stores, cafés, food trucks, and parks, and let

d_{j, k}

denote the network distance from j to the nearest POI of subtype k. Subtype weights

{w_{k}}_{k \in K}

encode relative propensities for waste exposure and satisfy

\sum_{k} w_{k} = 1

. Using the same 300 m influence range, we set

S_{j, shop} = \sum_{k \in K} w_{k} \max \{0, 1 - \frac{d_{j, k}}{300}\}, \quad (w_{conv}, w_{cafe}, w_{truck}, w_{park}) = (0.35, 0.35, 0.20, 0.10),

(8)

so that co-location near multiple waste-related attractors increases the score while contributions taper linearly to zero at 300 m.
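Equations (7) and (8) share the same structure, a weighted sum of linear distance-decay kernels, so both can be sketched with one helper. The weights and the 300 m range are those stated in the text; the function names and the example distances are hypothetical.

```python
D_MAX = 300.0  # common maximum influence range in metres, as stated in the text

def linear_decay(d, d_max=D_MAX):
    """Linear kernel: 1 at distance 0, tapering to 0 at d_max and beyond."""
    return max(0.0, 1.0 - d / d_max)

def weighted_decay_score(distances, weights, d_max=D_MAX):
    """Convex combination of decay kernels; `weights` must sum to one."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return sum(w * linear_decay(distances[k], d_max) for k, w in weights.items())

W_DISP = {"bus": 0.75, "sub": 0.25}                                   # Equation (7)
W_SHOP = {"conv": 0.35, "cafe": 0.35, "truck": 0.20, "park": 0.10}    # Equation (8)

# Hypothetical network distances (metres) from one candidate site j.
s_disp = weighted_decay_score({"bus": 60.0, "sub": 450.0}, W_DISP)
s_shop = weighted_decay_score(
    {"conv": 150.0, "cafe": 30.0, "truck": 300.0, "park": 600.0}, W_SHOP
)
```

In this example the subway exit at 450 m and the park at 600 m contribute nothing, while the bus stop at 60 m contributes 0.75 × 0.8, which shows how the score reaches exactly zero only when every facility type lies at or beyond 300 m.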

The empirical distributions of the component scores and the composite reveal substantial spatial heterogeneity across Gangnam’s pedestrian network and inform the subsequent radius mapping. The floating population score

S_{j, pop}

spans

[0, 1]

and is right-skewed, with a mean of

0.1133

, a standard deviation of

0.1380

, and a 75th percentile of

0.1339

, reflecting concentrated corridors of high exposure. The transit disposal score

S_{j, disp}

ranges over

[0, 0.9780]

and is more uniform, with a mean of

0.4170

, a standard deviation of

0.2257

, and a median of

0.4450

, consistent with the dense but heterogeneous distribution of bus stops and subway exits. The POI proximity score

S_{j, shop}

is generally small, with a maximum of

0.0987

, a mean of

0.0275

, a standard deviation of

0.0288

, and a 75th percentile of

0.0515

, indicating that only a minority of network points lie within short walks of multiple waste-generating attractors.

We specify three criteria

C = {C_{1}, C_{2}, C_{3}}

that capture complementary drivers of litter pressure and disposal opportunity:

C_{1}

encodes floating population intensity,

C_{2}

captures transit-proximate disposal likelihood, and

C_{3}

reflects proximity to waste-generating points of interest (POIs). For each

j \in J

and criterion

C_{k}

, a raw, non-negative measurement

V_{j, k}

is constructed and mapped to a unit-interval score

S_{j, k} \in [0, 1]

via a monotone transformation

f_{k}

so that larger values consistently indicate greater suitability for siting a bin at j. With the criterion scores in hand, the relative importance of population exposure, transit opportunity, and POI proximity is computed via the AHP. To clarify the AHP elicitation, we provide the pairwise comparison scale and the exact questions used for expert judgments.

AHP pairwise comparison scale used for expert elicitation.

Scale	Definition
1	Equally important
3	Slightly more important
5	Moderately more important
7	Strongly more important
9	Absolutely more important
2, 4, 6, 8	Intermediate between adjacent judgments
Reciprocals	Inverse when criterion B is preferred over A

Pairwise comparison questions used for expert elicitation were as follows:

Q1: How much more important is distance to waste-source facilities (cafés, convenience stores) than floating population? Answer: 1/5 (population is moderately more important than shop).
Q2: How much more important is disposal likelihood near transit stops (bus, subway) than floating population? Answer: 1/5 (population is moderately more important than disposal).
Q3: How much more important is disposal likelihood near transit stops than waste-source proximity? Answer: 3 (disposal is slightly more important than shop).

Two municipal officers completed the elicitation independently. Their responses were aggregated into the following pairwise comparison matrix:

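A = \begin{bmatrix} 1 & 5 & 5 \\ \frac{1}{5} & 1 & \frac{1}{3} \\ \frac{1}{5} & 3 & 1 \end{bmatrix},

with rows and columns ordered as (population, shop proximity, disposal likelihood), the ordering under which the entries match the answers to Q1–Q3 and the weights below. This matrix indicates that population was judged moderately more important (scale value 5) than both disposal likelihood and shop proximity, and that disposal likelihood was slightly more important (scale value 3) than shop proximity. Using the normalized average method on A, we obtain the AHP weights,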

α_{pop} = 0.6864, α_{disp} = 0.2114, α_{shop} = 0.1022,

reported to four decimal places. The composite suitability at site j is the convex combination given by

S_{j}^{*} = α_{pop} S_{j, pop} + α_{disp} S_{j, disp} + α_{shop} S_{j, shop} = 0.6864 S_{j, pop} + 0.2114 S_{j, disp} + 0.1022 S_{j, shop},

(9)

which, by construction, lies in

[0, 1]

and is strictly increasing in each constituent score. This

S_{j}^{*}

serves as the demand score output used downstream to assign adaptive radii and to prioritize candidates during siting. Aggregation by the AHP weights produces a composite

S_{j}^{*}

that remains right-skewed, with a mean of

0.2083

, a standard deviation of

0.1199

, a median of

0.1883

, a maximum of

0.9301

, and a 75th percentile of

0.2551

. Under the adaptive radius map

r_{j} = R_{\max} - (R_{\max} - R_{\min}) S_{j}^{*}

, these statistics translate into smaller service radii in the highest-scoring corridors and larger radii in peripheral or residential areas, ensuring fine spatial granularity where litter pressure is greatest while preserving baseline access elsewhere. All distances entering the kernels are evaluated with the same network metric d as used in the siting and routing stages, maintaining geometric consistency across the full GD-ARISE pipeline.
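The AHP weights and the composite of Equation (9) can be reproduced from the pairwise comparison matrix with a short sketch. It assumes the rows and columns of A are ordered (population, shop, disposal), which is the ordering that reproduces the reported weights, and it uses the normalized column average method named in the text. The consistency ratio helper uses the standard eigenvalue approximation and assumes the commonly tabulated random index of 0.58 for a 3 × 3 matrix; the text does not say which index the authors used.

```python
def ahp_weights(A):
    """Normalized average method: normalize each column, then average each row."""
    n = len(A)
    col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
    return [sum(A[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]

def consistency_ratio(A, w, random_index=0.58):
    """Approximate CR = CI / RI, with lambda_max estimated from (A w)_i / w_i."""
    n = len(A)
    lam = sum(sum(A[i][j] * w[j] for j in range(n)) / w[i] for i in range(n)) / n
    ci = (lam - n) / (n - 1)
    return ci / random_index

# Pairwise matrix from the text; assumed order: (population, shop, disposal).
A = [
    [1.0, 5.0, 5.0],
    [1.0 / 5.0, 1.0, 1.0 / 3.0],
    [1.0 / 5.0, 3.0, 1.0],
]
a_pop, a_shop, a_disp = ahp_weights(A)
cr = consistency_ratio(A, [a_pop, a_shop, a_disp])

def composite(s_pop, s_disp, s_shop):
    """Composite suitability S_j* as the AHP-weighted convex combination."""
    return a_pop * s_pop + a_disp * s_disp + a_shop * s_shop
```

Rounded to four decimals, the three weights agree with the values reported above, and because they sum to one the composite stays in [0, 1] whenever each component score does.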

Figure 7 shows a heatmap of the composite demand surface. Colors show percentile ranks from low (light) to high (dark); road segments and administrative polygons are overlaid for reference. The surface is highly clustered, with pronounced high-demand bands in the northern–central corridors that taper toward the southeast and peripheral areas.

3.3. SCASS

This stage specializes the SCASS formulation to the Gangnam candidate set

J = {j_{1}, \dots, j_{N}}

with

N = 32,890

points sampled every

δ = 10

m along the pedestrian–road network

G

, using the same network-based distance metric d as in the scoring stage. The input at each site

j \in J

is the composite suitability

S_{j}^{*} \in [0, 1]

obtained from the AHP-based demand assessment. Two preprocessing choices are made to align data coverage with plausible service contexts while preserving the mathematical structure of SCASS. First, a proximity sanity check is applied to screen out isolated candidates: for each j we test whether there exists at least one demand facility (bus stop, subway exit, or waste-related POI) within 1 km under d. In the Gangnam dataset, every candidate satisfies this condition, so the working set remains J. Second, each candidate is assigned an adaptive coverage radius

r_{j}

via the affine map

r_{j} = R_{\max} - (R_{\max} - R_{\min}) S_{j}^{*},

(10)

so that higher suitability implies a smaller catchment, consistent with dense placement in demand hotspots, while lower suitability enlarges catchments to preserve baseline access elsewhere. Applied to Gangnam with

(R_{\min}, R_{\max}) = (30 m, 150 m)

and

P_{target} = 500

, the greedy procedure selects

| J_{sel} | = 347

sites before all remaining candidates conflict with at least one already-selected site; the nonoverlap constraint, rather than the numerical target, is therefore binding and determines the achieved count. The realized

J_{sel}

concentrates in high-scoring corridors around major commercial and transit axes, while respecting minimum separations implied by the adaptive radii and thereby avoiding redundant service areas.
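The radius map of Equation (10) and the greedy nonoverlap rule can be sketched as follows. This is an illustrative reading rather than the authors' code: it assumes candidates are processed in descending order of composite score and that two sites conflict when their separation is smaller than the sum of their radii. The one-dimensional toy distance stands in for the network metric d.

```python
def adaptive_radius(score, r_min=30.0, r_max=150.0):
    """Equation (10): higher suitability gives a smaller catchment radius."""
    return r_max - (r_max - r_min) * score

def greedy_nonoverlap(scores, dist, r_min=30.0, r_max=150.0, p_target=500):
    """Select sites by descending score, skipping any that overlap a chosen site.

    dist(i, j) must return the distance between candidates i and j
    (the network metric d in the paper; any metric works for this sketch).
    """
    radii = [adaptive_radius(s, r_min, r_max) for s in scores]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    selected = []
    for j in order:
        if len(selected) >= p_target:
            break
        # Assumed conflict rule: catchments overlap if d(i, j) < r_i + r_j.
        if all(dist(j, i) >= radii[j] + radii[i] for i in selected):
            selected.append(j)
    return selected, radii

# Hypothetical candidates on a straight street (positions in metres).
positions = [0.0, 50.0, 120.0, 400.0]
scores = [0.9, 0.8, 0.2, 0.5]
selected, radii = greedy_nonoverlap(
    scores, dist=lambda i, j: abs(positions[i] - positions[j])
)
```

In the toy instance the second-best candidate is rejected because it sits 50 m from the best one, well inside their combined radii, which mirrors how the procedure can stop short of the target count once every remaining candidate conflicts with a selected site.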

Figure 8 displays the final set of selected sites. Network edges are shown in black; selections were obtained by the SCASS greedy nonoverlap procedure on the pedestrian–road network. Table 6 summarizes the key parameters and outcomes of the SCASS stage: the candidate set size and sampling, the proximity filter, the adaptive radius bounds and mapping, the target portfolio size, the number of selected sites, and the shared network distance metric.

3.4. Collection Route Optimization: Samsung 1–dong

To instantiate route optimization on a concrete sub-area, we focus on Samsung 1–dong within Gangnam and route a single collection vehicle to visit the

K = 37

waste bin sites that were previously selected there. The depot is fixed at

D_{0}

= (37.50867° N, 127.086466° E), and we retain the notation of the main framework by labeling the selected bins

{j_{1}, \dots, j_{37}}

and defining the node set

V = {0, 1, \dots, 37}

with

j_{0} \equiv D_{0}

. Consistent with the end-to-end pipeline, inter-node travel costs are evaluated using the same distance metric d induced by the pedestrian–road network

G

restricted to the Samsung 1–dong extent. Because all bins must be serviced exactly once and, for this small cluster, a single vehicle is sufficient, we solve the capacitated vehicle routing model in the special case

m = 1

and

Q \geq \sum_{i = 1}^{37} q_{i}

, which reduces to a traveling salesman problem on the node set V. A route is represented by a closed walk

(π_{0}, π_{1}, \dots, π_{37}, π_{38})

with

π_{0} = π_{38} = 0

and

{π_{1}, \dots, π_{37}} = {1, \dots, 37}

. Its network cost is

C (π) = \sum_{t = 0}^{37} d_{π_{t}, π_{t + 1}},

(11)

and the optimization objective is to find

π

that minimizes

C (π)

.

As shown in Figure 9, simulated annealing produces a single depot-to-depot tour that visits each of the 37 selected Samsung 1–dong bins exactly once and returns to the depot, evaluated on the same network metric used for siting. The green marker denotes the depot; blue points mark the 37 selected sites labeled in visit order; orange lines depict the annealed tour; the black polygon outlines the Samsung 1–dong boundary. Because the encoding, acceptance logic, and termination criteria conform exactly to the routing specification, this Samsung 1–dong case demonstrates that the GD-ARISE pipeline maintains geometric consistency from demand scoring through siting to routing, and that the final routing layer can be solved to high quality with modest computation even when distances respect real network impedances rather than idealized straight lines.
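A minimal simulated annealing sketch for the closed tour of Equation (11) is given below. Geometric cooling and Metropolis acceptance follow the parameters named in the text; the segment-reversal (2-opt style) move, the random initial order, and the toy instance with straight-line distances are assumptions made for illustration, since the neighborhood used by the authors is not described in this section.

```python
import math
import random

def tour_cost(order, d):
    """Closed-tour cost: depot 0, then `order`, then back to depot 0 (Equation (11))."""
    path = [0] + list(order) + [0]
    return sum(d[a][b] for a, b in zip(path, path[1:]))

def anneal(d, t0=1000.0, t_stop=1e-8, alpha=0.995, m=1000, seed=0):
    """Simulated annealing over visit orders of nodes 1..n-1 with the depot fixed at 0."""
    rng = random.Random(seed)
    cur = list(range(1, len(d)))
    rng.shuffle(cur)
    cur_cost = tour_cost(cur, d)
    best, best_cost = cur[:], cur_cost
    t = t0
    while t > t_stop:
        for _ in range(m):
            # Assumed move: reverse a randomly chosen segment of the visit order.
            i, j = sorted(rng.sample(range(len(cur)), 2))
            cand = cur[:i] + cur[i:j + 1][::-1] + cur[j + 1:]
            delta = tour_cost(cand, d) - cur_cost
            # Metropolis rule: always accept improvements, sometimes accept worse tours.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cur, cur_cost = cand, cur_cost + delta
                if cur_cost < best_cost:
                    best, best_cost = cur[:], cur_cost
        t *= alpha  # geometric cooling
    return best, best_cost

# Hypothetical toy instance: depot at index 0 plus 8 bins, straight-line distances.
pts = [(0, 0), (2, 1), (5, 0), (6, 3), (4, 6), (1, 5), (3, 3), (7, 6), (0, 3)]
d = [[math.dist(p, q) for q in pts] for p in pts]
best, best_cost = anneal(d, alpha=0.95, m=50)  # faster schedule for the demo
```

In the paper's setting the matrix d would hold shortest-path distances on the pedestrian–road network rather than straight-line distances; the annealing loop itself is unchanged by that substitution.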

The simulated annealing (SA) run is governed by the number of nodes in the route (n), the number of iterations per temperature stage (M), the initial temperature (

T_{0}

), the stopping temperature (

T_{stop}

), and the cooling rate (

α

,

0 < α < 1

). From these, the total number of temperature stages (

L_{a}

) and the time complexity (

L_{b}

) are computed, respectively, as

L_{a} = ⌈\frac{ln (T_{stop} / T_{0})}{ln (α)}⌉

and

L_{b} = O (L_{a} \cdot (M \cdot n + n^{2}))

. For the single-vehicle case, the parameters were set to

M = 1000

,

n = 39

,

T_{0} = 1000

,

T_{stop} = 10^{- 8}

, and

α = 0.995

, yielding

L_{a} = 5054

and

L_{b} = 204,793,134

. The optimized SA route resulted in a total travel distance of approximately

11.43 km

, with a mean leg distance of about

301 m

The route had no self-intersections, and its tortuosity was

2.44

times the straight round-trip length from the depot. Figure 10 shows that the total route length decreased sharply within roughly the first 1000 steps, from approximately

19,500 m

to

11,430 m

, and then stabilized.
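The reported stage count and operation estimate for the single-vehicle case can be checked with a few lines of arithmetic. The sketch assumes that the number of temperature stages is obtained by rounding the logarithm ratio up, which is the convention that reproduces the reported value of 5054, and that the operation estimate is the product of the stage count and the bracketed term of the complexity expression.

```python
import math

def stage_count(t0, t_stop, alpha):
    """Number of geometric cooling stages until the temperature falls below t_stop."""
    return math.ceil(math.log(t_stop / t0) / math.log(alpha))

def operation_estimate(n, m, stages):
    """Operation estimate stages * (M * n + n^2) from the complexity expression."""
    return stages * (m * n + n * n)

# Parameters stated in the text for the single-vehicle case.
L_a = stage_count(t0=1000.0, t_stop=1e-8, alpha=0.995)
L_b = operation_estimate(n=39, m=1000, stages=L_a)
```

Here n = 39 corresponds to the 37 bins plus the depot at both the start and the end of the tour.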

For the two-vehicle case, instead of solving one long tour (TSP), the sites are first divided into two subsets, and each subroute is then optimized independently by SA. Among the many available clustering methods, we partitioned the sites with K-means to minimize overlap between vehicle coverages and to balance total travel distances. As shown in Figure 11, K-means clustering partitioned the candidate sites between the two vehicles, and inspection of the routes confirms that the two vehicles divided the four census blocks in Samsung 1–dong, with each vehicle traversing two blocks. This process results in two depot-to-depot tours, one per vehicle, each visiting its assigned bins in sequence so that together they cover all trash bins in Samsung 1–dong, as in the single-vehicle case. The green marker denotes the depot; purple points mark the 37 selected sites; the blue lines depict the annealed tour of the 26 selected sites visited by Vehicle A, labeled in visit order; the orange lines depict the annealed tour of the 11 selected sites visited by Vehicle B, labeled in visit order; the black polygon outlines the Samsung 1–dong boundary.
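The cluster-first, route-second step can be sketched with a plain Lloyd-style K-means on site coordinates. The deterministic initialization from the first k points, the use of straight-line distance for clustering, and the toy coordinates are all assumptions made for illustration; the text does not specify these details.

```python
import math

def kmeans(points, k=2, iters=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    # Assumption: deterministic initialization from the first k points.
    centroids = [points[i] for i in range(k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by straight-line distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels

# Hypothetical bin coordinates (metres) forming two spatial groups.
bins = [(0, 0), (10, 0), (0, 10), (100, 100), (110, 100), (100, 110)]
labels = kmeans(bins, k=2)
routes = {c: [i for i, lab in enumerate(labels) if lab == c] for c in set(labels)}
```

Each entry of `routes` would then be passed, together with the depot, to the annealing routine for its vehicle. Plain K-means groups sites by proximity and does not by itself equalize route lengths, so any balancing between vehicles is a by-product of the site geometry.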

After clustering, Vehicle A was assigned

n_{A} = 28

nodes and Vehicle B was assigned

n_{B} = 13

nodes, each count including the depot at both the start and the end of the tour (26 + 2 and 11 + 2). All other SA parameters were kept identical to the single-vehicle case, except for n. The estimated number of operations was

145,474,336

for Vehicle A, and

66,556,126

for Vehicle B. When the two vehicles divided the candidate sites for collection, the total travel distance of both vehicles combined was approximately

15.81 km

, which is longer than the single-vehicle distance of about

11.43 km

. However, the individual travel burdens were reduced, with Vehicle A and Vehicle B traveling approximately

8.77 km

and

7.04 km

, respectively. The routes of Vehicle A and Vehicle B had mean leg distances of approximately

324.8 m

and

587.0 m

, tortuosity values of

1.83

and

1.84

, and a self-intersection count of zero for both.

Figure 12 shows that the total route lengths of both vehicles decreased rapidly during the early temperature steps, with Vehicle A dropping from approximately

14,000 m

to

8700 m

and Vehicle B from about

7600 m

to

7040 m

within the first 1000 steps, after which both stabilized. The number of crossings for Vehicle A followed a similar trend, falling from roughly 30 to nearly zero within 1000 steps, whereas the tour of Vehicle B had essentially no crossings from the start.

3.5. Sensitivity Analysis

We analyze the impact of parameter settings on GD-ARISE’s results. We conducted two sensitivity experiments, focusing on (i) the AHP weighting in the multi-criteria scoring stage and (ii) the adaptive coverage parameters in the SCASS site-selection stage. These experiments show how variations in subjective or geometric settings propagate through the pipeline and influence the resulting suitability distribution and selected facility portfolio.

3.5.1. AHP Sensitivity

To gauge the effect of AHP weighting, we repeated the analysis with near-uniform weights across the three criteria. Under uniform weights, the composite suitability distribution broadened and shifted upward (Figure 13), implying that more candidates attained moderate-to-high suitability. The feasible nonoverlapping selection also expanded and reconfigured, with changes concentrated in mid-range corridors while core hotspots remained stable (Figure 14). Overall, equal weighting increases admissible placements and shifts marginal sites rather than overturning the highest-demand areas.

3.5.2. SCASS Sensitivity

To assess sensitivity to spatial parameters in SCASS, we increased the adaptive-radius bounds from

(R_{\min}, R_{\max}) = (30 m, 150 m)

to

(100 m, 200 m)

. Larger radii produced fewer but broader catchments: the feasible nonoverlapping set shrank (from 347 to 180 sites) while coverage areas grew across the entire distribution (range, mean, and quartiles all increased). Spatially, intensified exclusions reduced site density and shifted feasible locations, with removals and additions concentrated where spacing became binding (Figure 15). Overall, expanding

(R_{\min}, R_{\max})

trades off count for reach, yielding sparser deployments with larger service footprints.

3.6. Managerial Implications

The Gangnam application converts analytics into direct choices about how many bins to deploy, where to place them, and how to service them under real network constraints. The realized portfolio

J_{sel}

provides a defensible deployment count given nonoverlap and adaptive radii; managers can tune

P_{target}

and

(R_{\min}, R_{\max})

to meet coverage or budget targets. Adaptive radii yield a clear siting rule—smaller in high-pressure corridors, larger in quieter areas—so spacing reflects actual walkability rather than Euclidean distance, and nearby nonconflicting alternatives can be identified when frontage or regulations prevent a proposed point.

Two indicators support oversight: the share of demand nodes U covered by

⋃_{j \in J_{sel}} D_{j}

, and the distribution of separations relative to

r_{i} + r_{j}

, revealing where spacing is tight or slack. The routing layer turns network distance into service hours with average speed and per-stop time, enabling staffing and shift design and making comparisons to incumbent routes straightforward. Sensitivity to AHP weights highlights “swing” sites and shows whether small preference shifts change coverage or route time; robust outcomes ease stakeholder agreement, while sensitive zones can be prioritized for pilots or field checks. Because all stages share one metric and dataset, the pipeline can be re-run seasonally as demand or networks change, with re-routing for minor updates and re-siting reserved for larger deviations. The same geometry-consistent workflow transfers to curb-level assets, such as micro-mobility docks or hydration stations, by swapping criteria and data sources while retaining the sensitivity and monitoring protocol.
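The coverage indicator and the conversion from route length to service hours can be sketched as follows. The sketch assumes that the catchment D_j consists of all demand nodes within distance r_j of site j, and the average speed and per-stop time are hypothetical values chosen only to show the arithmetic.

```python
def coverage_share(demand, sites, radii, dist):
    """Share of demand nodes lying within the radius of at least one selected site."""
    covered = sum(
        1 for u in demand
        if any(dist(u, s) <= r for s, r in zip(sites, radii))
    )
    return covered / len(demand)

def service_hours(route_km, n_stops, speed_kmh, minutes_per_stop):
    """Travel time at an average speed plus a fixed handling time per stop."""
    return route_km / speed_kmh + n_stops * minutes_per_stop / 60.0

# Hypothetical demand nodes and selected sites on a straight street (metres).
demand = [0.0, 40.0, 200.0, 500.0]
sites = [0.0, 400.0]
radii = [42.0, 90.0]
share = coverage_share(demand, sites, radii, dist=lambda u, s: abs(u - s))

# Reported single-vehicle tour (11.43 km, 37 stops) with assumed 15 km/h and 2 min per stop.
hours = service_hours(route_km=11.43, n_stops=37, speed_kmh=15.0, minutes_per_stop=2.0)
```

With these assumed operating values the per-stop handling time, not the driving time, accounts for most of the shift, which is the kind of comparison the indicator is meant to support.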

4. Conclusions

This paper framed public waste bin placement and servicing as a GIS-grounded operations research problem and developed GD-ARISE, a data analytics pipeline that unifies multi-criteria demand assessment, spatially constrained site selection, and operational routing on a shared network metric. The Gangnam district case demonstrated that, with readily available administrative layers, pedestrian network data, floating population estimates, transit nodes, and points of interest, the pipeline can sample the walkable network at fine resolution, construct composite suitability scores, translate them into adaptive service radii, select a maximal nonoverlapping portfolio of facility sites, and generate depot-to-depot routes that are feasible on the actual network.

Methodologically, four elements define our contribution. First, GIS integration unifies all spatial layers on one pedestrian–road network and one distance metric. Second, demand data analytics converts heterogeneous inputs into transparent multi-criteria scores. Third, site selection is cast as coverage with nonoverlap on the same network; a fast greedy procedure enforces spacing with a degree-based performance guarantee and scales on GIS graphs. Fourth, routing closes the loop on the identical metric used upstream. Together, these steps form a reproducible, end-to-end pipeline for location-and-routing that is defensible, scalable, and operationally relevant for municipal planning.

The Gangnam application highlights several substantive findings. Composite demand is highly skewed and spatially clustered along commercial corridors and transit interchanges; adaptive radii concentrate bins where pressure is greatest while preserving access in quieter areas; nonoverlap constraints prevent redundant service footprints and enforce minimum spacing that respects sidewalk conditions; and simulated annealing delivers shorter, operationally coherent tours once distances reflect real network impedances. These outcomes suggest that a GIS-integrated operations research approach can simultaneously improve perceived cleanliness, reduce reactive cleanup, and control service effort by aligning siting and routing with observed urban activity. While our case study centers on waste bins, the framework generalizes to other street-level amenities (micro-mobility docks, kiosks, sensors, hydration stations) where demand is heterogeneous, space is scarce, and operations matter.

The work also has limitations that point to avenues for extension. Criteria weights were elicited via AHP and calibrated to available data; while consistency checks were enforced, robustness to alternative weight sets and to additional criteria (e.g., complaints, street-furniture density, event schedules) merits further study. The adaptive radius mapping is intentionally simple and interpretable; in settings with strong regulatory or equity requirements, radius rules could be learned from outcomes, made time-dependent, or augmented with minimum-access guarantees. Finally, SCASS employs a greedy selection to ensure transparency and scalability, but we did not benchmark its potential suboptimality against exact or metaheuristic methods; future work should quantify optimality gaps (e.g., via MILP on subregions or lightweight local improvements) to bound performance under the stated constraints.

Author Contributions

Conceptualization, H.-T.H.; Methodology, J.-J.W., J.-S.L., and H.-T.H.; Software, J.-J.W. and J.-S.L.; Formal analysis, J.-J.W.; Data curation, J.-J.W. and J.-S.L.; Writing—original draft, J.-S.L. and H.-T.H.; Writing—review and editing, H.-T.H.; Visualization, J.-S.L.; Supervision, H.-T.H.; Funding acquisition, H.-T.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the MSIT (Ministry of Science and ICT), Republic of Korea, under the ITRC (Information Technology Research Center) support program (RS-2025-00259004) supervised by the IITP (Institute for Information and Communications Technology Planning and Evaluation) and by the Korea Institute of Energy Technology Evaluation and Planning (KETEP) grant funded by the Korea government (MOTIE) (20214000000060, Department of Next Generation Energy System Convergence based-on Techno-Economics-STEP).

Data Availability Statement

The data presented in this study are openly available from the Seoul Open Data Portal (public repository) at the following URLs (accessed on 2 July 2025): https://data.seoul.go.kr/dataList/OA-14979/F/1/datasetView.do; https://data.seoul.go.kr/dataList/OA-14980/F/1/datasetView.do; https://data.seoul.go.kr/dataList/OA-14978/F/1/datasetView.do; https://data.seoul.go.kr/dataVisual/seoul/seoulLivingPopulation.do; https://data.seoul.go.kr/dataList/OA-15067/S/1/datasetView.do; https://data.seoul.go.kr/dataList/OA-18699/S/1/datasetView.do; https://data.seoul.go.kr/dataList/OA-15004/F/1/datasetView.do. In addition, one auxiliary dataset (subway entrance coordinates used for accessibility analysis) was derived from a public-domain web map and is available at: http://map.esran.com/ (accessed on 2 July 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript (Figure 1):

CRS	Coordinate Reference System
MCDM	Multi-Criteria Decision-Making
AHP	Analytic Hierarchy Process
CSA	Coverage Suitability Analysis
MCLP	Maximal Covering Location Problem
SCASS	Spatially-Constrained Adaptive Site Selection
RO	Route Optimization
GD-ARISE	GIS-integrated and Data analytic Adaptive Radius Integrated Siting and rEservicing
OR	Operations Research
POI	Point of Interest
PRN	Pedestrian–Road Network

References

Hess, C.; Dragomir, A.G.; Doerner, K.F.; Vigo, D. Waste collection routing: A survey on problems and methods. Cent. Eur. J. Oper. Res. 2024, 32, 399–434. [Google Scholar] [CrossRef]
Han, J.; Zhang, J.; Guo, H.; Zhang, N. Optimizing location-routing and demand allocation in the household waste collection system using a branch-and-price algorithm. Eur. J. Oper. Res. 2024, 316, 958–975. [Google Scholar] [CrossRef]
Boeing, G. OSMnx: New methods for acquiring, modeling, analyzing, and visualizing complex street networks. Comput. Environ. Urban Syst. 2017, 65, 126–139. [Google Scholar] [CrossRef]
Mahéo, A.; Rossit, D.G.; Kilby, P. Solving the integrated bin allocation and collection routing problem for municipal solid waste: A Benders decomposition approach. Ann. Oper. Res. 2023, 322, 441–465. [Google Scholar] [CrossRef]
Haklay, M.; Weber, P. OpenStreetMap: User-generated street maps. IEEE Pervasive Comput. 2008, 7, 12–18. [Google Scholar] [CrossRef]
Nyimbili, P.H.; Erden, T. A hybrid approach integrating entropy–AHP and GIS for suitability assessment of urban emergency facilities. ISPRS Int. J. -Geo-Inf. 2020, 9, 419. [Google Scholar] [CrossRef]
Saaty, T.L. The Analytic Hierarchy Process; McGraw–Hill: New York, NY, USA, 1980. [Google Scholar]
Şener, B.; Süzen, M.L.; Doyuran, V. Landfill site selection by using geographic information systems. Environ. Geol. 2006, 49, 376–388. [Google Scholar] [CrossRef]
Rahmat, Z.G.; Niri, M.V.; Alavi, N.; Goudarzi, G.; Babaei, A.A.; Baboli, Z.; Hosseinzadeh, M. Landfill site selection using GIS and AHP: A case study: Behbahan, Iran. KSCE J. Civ. Eng. 2017, 21, 111–118. [Google Scholar] [CrossRef]
Ahmed, A.; Kheraj, T.; Mohammadi, A.; Bergquist, R. Hybrid GIS-MCDM approach for Hospital Site Selection Suitability Analysis in Poonch District, Jammu and Kashmir, India. GeoJournal 2024, 89, 186. [Google Scholar] [CrossRef]
Mostafaeipour, A.; Hosseini Dehshiri, S.S.; Hosseini Dehshiri, S.J.; Almutairi, K.; Taher, R.; Issakhov, A.; Techato, K. A thorough analysis of renewable hydrogen projects development in Uzbekistan using MCDM methods. Int. J. Hydrogen Energy 2021, 46, 31174–31190. [Google Scholar] [CrossRef]
Janmontree, J.; Zadek, H.; Ransikarbum, K. Analyzing solar location for green hydrogen using multi-criteria decision analysis. Renew. Sustain. Energy Rev. 2025, 209, 115102. [Google Scholar] [CrossRef]
Church, R.L.; ReVelle, C.S. The maximal covering location problem. Pap. Reg. Sci. 1974, 32, 101–118. [Google Scholar] [CrossRef]
Daskin, M.S. Network and Discrete Location: Models, Algorithms, and Applications; Wiley: Hoboken, NJ, USA, 1995. [Google Scholar]
ReVelle, C.; Eiselt, H.A. Location analysis: A synthesis and survey. Eur. J. Oper. Res. 2005, 165, 1–19. [Google Scholar] [CrossRef]
Marsh, M.T.; Schilling, D.A. Equity measurement in facility location analysis: A review and framework. Eur. J. Oper. Res. 1994, 74, 1–17. [Google Scholar] [CrossRef]
Daskin, M.S. A maximum expected covering location model: Formulation, properties and heuristic solution. Transp. Sci. 1983, 17, 48–70. [Google Scholar] [CrossRef]
Bonnet, B.; Dessavre, D.G.; Kraus, K.; Ramirez-Marquez, J.E. Optimal placement of public-access AEDs in urban environments. Comput. Ind. Eng. 2015, 90, 269–280. [Google Scholar] [CrossRef]
Farahani, R.Z.; Asgari, N.; Heidari, N.; Hosseininia, M.; Goh, M. Covering problems in facility location: A review. Comput. Ind. Eng. 2012, 62, 368–407. [Google Scholar] [CrossRef]
Adeleke, O.J.; Olukanni, D.O. Facility location problems: Models, techniques, and applications in waste management. Recycling 2020, 5, 10. [Google Scholar] [CrossRef]
Berman, O.; Krass, D.; Drezner, Z. The gradual covering decay location problem on a network. Eur. J. Oper. Res. 2003, 151, 474–480. [Google Scholar] [CrossRef]
Karasakal, O.; Karasakal, E.K. A maximal covering location model in the presence of partial coverage. Comput. Oper. Res. 2004, 31, 1515–1526. [Google Scholar] [CrossRef]
Blanco, V.; Gázquez, R. Fairness in maximal covering location problems. Comput. Oper. Res. 2023, 157, 106287. [Google Scholar] [CrossRef]
Berman, O.; Drezner, Z.; Krass, D.; Wesolowsky, G.O. The variable radius covering problem. Eur. J. Oper. Res. 2009, 196, 516–525. [Google Scholar] [CrossRef]
Özkan, B.; Özceylan, E.; Sarıçiçek, İ. GIS-based MCDM modeling for landfill site suitability analysis: A comprehensive review of the literature. Environ. Sci. Pollut. Res. 2019, 26, 30711–30730. [Google Scholar] [CrossRef] [PubMed]
Araújo, E.J.; Chaves, A.A.; Lorena, L.A.N. A mathematical model for the coverage location problem with overlap control. Comput. Ind. Eng. 2020, 146, 106548. [Google Scholar] [CrossRef]
Cherri, L.H.; Carravilla, M.A.; Ribeiro, C.; Toledo, F.M.B. Optimality in nesting problems: New constraint programming models and a new global constraint for non-overlap. Oper. Res. Perspect. 2019, 6, 100125. [Google Scholar] [CrossRef]
Gola, A.; Kłosowski, G.; Świć, A. Facility Layout Problem with Alternative Facility Variants. Appl. Sci. 2023, 13, 5032. [Google Scholar] [CrossRef]
Hifi, M.; M’Hallah, R. A literature review on circle and sphere packing problems: Models and methodologies. Adv. Oper. Res. 2009, 2009, 150624. [Google Scholar] [CrossRef]
Erlebach, T.; Jansen, K.; Seidel, E. Polynomial-time approximation schemes for geometric intersection graphs. SIAM J. Comput. 2005, 34, 1302–1323. [Google Scholar] [CrossRef]
Osman, I.H.; Laporte, G. Metaheuristics: A bibliography. Ann. Oper. Res. 1996, 63, 513–623. [Google Scholar] [CrossRef]
ReVelle, C.; Schlossberg, M.; Williams, J.C. Solving the maximal covering location problem with heuristic concentration. Comput. Oper. Res. 2008, 35, 427–435. [Google Scholar] [CrossRef]
Kirkpatrick, S.; Gelatt, C.D.; Vecchi, M.P. Optimization by simulated annealing. Science 1983, 220, 671–680. [Google Scholar] [CrossRef]
Aarts, E.; Korst, J. Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing; Wiley: Hoboken, NJ, USA, 1989. [Google Scholar]
Lin, S.; Kernighan, B.W. An effective heuristic algorithm for the traveling-salesman problem. Oper. Res. 1973, 21, 498–516. [Google Scholar] [CrossRef]
Yu, V.F.; Lin, C.-H.; Maglasang, R.S.; Lin, S.-W.; Chen, K.-F. An efficient simulated annealing algorithm for the vehicle routing problem in omnichannel distribution. Mathematics 2024, 12, 3664. [Google Scholar] [CrossRef]
Rahmanifar, G.; Mohammadi, M.; Golabian, M.; Sherafat, A.; Hajiaghaei-Keshteli, M.; Fusco, G.; Colombaroni, C. Integrated location and routing for cold chain logistics networks with heterogeneous customer demand. J. Ind. Inf. Integr. 2024, 38, 100573. [Google Scholar] [CrossRef]
Hashemi-Amiri, O.; Ji, R.; Tian, K. An integrated location–scheduling–routing framework for a smart municipal solid waste system. Sustainability 2023, 15, 7774. [Google Scholar] [CrossRef]

Figure 1. Workflow of GD-ARISE.

Figure 2. Computational workflow of GD-ARISE for the Gangnam District case.

Figure 3. A sample of 1000 candidate bin locations across Gangnam District.

Figure 4. Histogram of the population score (floating population, min–max normalized).

Figure 5. Histogram of the disposal score (transit proximity).

Figure 6. Histogram of the shop score (proximity to convenience stores, cafés, food trucks, and parks).

Figure 7. Composite demand heatmap across Gangnam District.

Figure 8. Final selected waste bin sites (red) and their adaptive coverage areas (green).

Figure 9. Optimized collection route in Samsung 1–dong.

Figure 10. Variation of total route length and crossings with temperature (iteration) steps.

Figure 11. Optimized collection routes for two vehicles in Samsung 1–dong.

Figure 12. Variation of total route length and crossings with temperature steps for the two-vehicle case.

Figure 13. Histograms of composite suitability scores $S_{j}^{*}$ of candidate sites under baseline AHP weights (left) and changed weights for sensitivity analysis (right).

Figure 14. Spatial comparison between the baseline and AHP-sensitivity selections.

Figure 15. Spatial comparison between the baseline and SCASS-sensitivity selections.

Table 1. GIS integration workflow for the Gangnam District case.

Variable Category	Source and Preprocessing	Role in Model
Administrative boundaries	Census polygons; reproject WGS84 → EPSG:5186; spatial joins to points	Study region G, attribution of J and U
Pedestrian network $G$	OSM walkable edges via `OSMnx`; simplification; snapping tolerance	Distance metric d; candidate sampling; routing graph
Candidate locations J	Points every $δ = 10$ m along $G$; $N = 32{,}890$	Feasible siting set
Demand nodes U	Network vertices or grid centroids within G	Coverage evaluation, calibration, validation
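
The candidate-sampling step in Table 1 (points every $δ = 10$ m along the network) can be illustrated with a minimal sketch. It assumes edge geometries are already in a projected CRS with metre units (such as EPSG:5186) and treats a single edge as a plain list of (x, y) vertices. The function name `sample_along_polyline` and the example edge are hypothetical and are not taken from the article's implementation.

```python
import math

def sample_along_polyline(coords, spacing):
    """Return points every `spacing` units along a polyline, starting at its first vertex.

    `coords` is a list of (x, y) tuples in a projected CRS (metres assumed).
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    points = [coords[0]]
    next_at = spacing   # arc length at which the next sample is due
    walked = 0.0        # arc length of the segments already traversed
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        while seg > 0 and next_at <= walked + seg + 1e-9:
            # Fraction of the current segment at which the sample falls.
            t = min(1.0, (next_at - walked) / seg)
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            next_at += spacing
        walked += seg
    return points

# Hypothetical L-shaped edge of total length 35 m, sampled every 10 m.
edge = [(0.0, 0.0), (25.0, 0.0), (25.0, 10.0)]
candidates = sample_along_polyline(edge, 10.0)
```

Applying the same routine to every edge of the walkable network and concatenating the results would give a candidate set analogous to J; duplicate points at shared vertices would still need to be removed.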

Table 2. Data analytics workflow for the Gangnam District case.

Variable Category	Source and Preprocessing	Role in Model
Floating population	Daily counts by block; temporal mean; areal join to J and U	Criterion $C_{1}$: raw $V_{j, 1} \to S_{j, 1}$
Transit nodes	Bus stops and subway exits; nearest-neighbor on $G$	Proximity criterion $C_{2}$: raw via kernel $\to S_{j, 2}$
Waste-related POIs	Convenience stores, cafés, food trucks, parks; subtype weights $ω$	Proximity criterion $C_{3}$: raw via kernel $\to S_{j, 3}$
Depot $D_{0}$, fleet and service parameters	Operations center location; $(m, Q, v_{speed}, τ)$	Routing inputs and constraints
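
As a rough illustration of how the three criteria in Table 2 could be turned into a composite score, the sketch below min–max normalizes each criterion and combines them with a weighted sum. Several pieces are assumptions made only for this example: the exponential form and 200 m bandwidth of `proximity_kernel`, the placeholder weights (these are not the article's AHP weights), and all input values.

```python
import math

def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def proximity_kernel(distance_m, bandwidth_m=200.0):
    """Assumed exponential distance decay; the article's kernel may differ."""
    return math.exp(-distance_m / bandwidth_m)

def composite_scores(criteria, weights):
    """Weighted sum of normalized criterion scores for each candidate site."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n_sites = len(criteria[0])
    return [sum(w * c[j] for w, c in zip(weights, criteria)) for j in range(n_sites)]

# Hypothetical raw inputs for three candidate sites.
population = [1200.0, 300.0, 2100.0]   # floating population (C1)
transit_dist = [50.0, 400.0, 120.0]    # network distance to nearest transit node, m (C2)
poi_dist = [30.0, 250.0, 500.0]        # network distance to nearest waste-related POI, m (C3)

s1 = min_max(population)
s2 = min_max([proximity_kernel(d) for d in transit_dist])
s3 = min_max([proximity_kernel(d) for d in poi_dist])

weights = [0.5, 0.3, 0.2]              # placeholder weights, not the AHP result
scores = composite_scores([s1, s2, s3], weights)
```

Because every criterion is rescaled to [0, 1] and the weights sum to one, the composite score also lies in [0, 1], which is what an affine score-to-radius map requires.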

Table 3. Population and boundary datasets used for demographic analysis ($C_{1}$).

Dataset	Format/Unit	Description	Source
Seoul Living Population (Domestic Residents)	CSV (by census block)	Daily population counts (15–21 May 2025). Variables: DateID, TimeType, DistrictCode, CensusBlockCode, TotalPopulation. Estimated using public data (resident registry, transport, business, and building DBs) combined with KT telecom big data.	https://data.seoul.go.kr/dataList/OA-14979/F/1/datasetView.do (accessed on 2 July 2025) (Seoul Open Data)
Seoul Living Population (Short-term Foreign Residents)	CSV (by census block)	Estimated short-term foreign resident population by census block. Method identical to domestic dataset; includes mobility-based estimation using transport and telecom data.	https://data.seoul.go.kr/dataList/OA-14980/F/1/datasetView.do (accessed on 2 July 2025) (Seoul Open Data)
Seoul Living Population (Long-term Foreign Residents)	CSV (by census block)	Long-term foreign residents measured via public and telecom data integration. Same variable structure as domestic dataset.	https://data.seoul.go.kr/dataList/OA-14978/F/1/datasetView.do (accessed on 2 July 2025) (Seoul Open Data)
Census Block Boundary	SHP/SBN (EPSG:5186)	Geospatial boundaries of census blocks where living population is estimated. Spatial unit for integrating demographic data.	https://data.seoul.go.kr/dataVisual/seoul/seoulLivingPopulation.do (accessed on 2 July 2025) (Seoul Open Data)

Table 4. Transit-related datasets used for transit-proximate disposal $C_{2}$.

Dataset	Format/Unit	Description	Source
Seoul Bus Stop Locations	CSV (EPSG:5186)	Fields: NodeID, StopID, StopName, coordinates (X, Y), StopType	https://data.seoul.go.kr/dataList/OA-15067/S/1/datasetView.do (accessed on 2 July 2025) (Seoul Open Data)
Subway Entrance Coordinates	CSV (manual extraction)	Latitude–longitude pairs of subway entrance locations for accessibility criterion $C_{2}$	http://map.esran.com/ (accessed on 2 July 2025) (Seoul Open Data)

Table 5. POI datasets used for waste source facilities $C_{3}$.

Dataset	Format/Unit	Description	Source
Gangnam Food-Service Licensing (cafe_conv_stfood)	CSV (EPSG:5186)	Fields: business name, type, address, coordinates (X, Y). Filtered to cafés, convenience stores, food trucks; active only. Updated: 26 May 2025.	https://data.seoul.go.kr/dataList/OA-18699/S/1/datasetView.do (accessed on 2 July 2025) (Seoul Open Data)
Gangnam Urban Parks	CSV (EPSG:5186)	Fields: park name, type, latitude–longitude. Subtype of $C_{3}$ (parks; weight $= 0.10$). Updated: 1 July 2024.	https://data.seoul.go.kr/dataList/OA-15004/F/1/datasetView.do (accessed on 2 July 2025) (Seoul Open Data)

Table 6. Parameters and outcomes for adaptive site selection.

Quantity	Value	Definition/Role
Candidate set size N	$32{,}890$	Points sampled every $δ = 10$ m along $G$ within G
Proximity sanity radius	1000 m	Filter to exclude candidates with no nearby facilities under d
Adaptive radius bounds	$R_{\min} = 30$ m, $R_{\max} = 150$ m	Affine map $r_{j} = R_{\max} - (R_{\max} - R_{\min}) S_{j}^{*}$
Target portfolio $P_{target}$	500	Desired number of nonoverlapping bins
Selected sites $|J_{sel}|$	347	Maximal nonoverlapping set obtained by greedy scan under d
Distance metric d	Network shortest path	Used for radii, nonoverlap checks, coverage, and routing
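
The affine radius map and the greedy nonoverlapping scan summarized in Table 6 can be sketched as follows. This is an illustration under stated assumptions, not the authors' SCASS code: candidates are scanned in descending score order, two service areas are treated as overlapping when their distance is less than the sum of their radii, and Euclidean distance stands in for the network shortest-path metric d. The site tuples and the helper names are hypothetical.

```python
import math

R_MIN, R_MAX = 30.0, 150.0  # radius bounds in metres, from Table 6

def adaptive_radius(score, r_min=R_MIN, r_max=R_MAX):
    """Affine map r_j = R_max - (R_max - R_min) * S_j: higher score, smaller radius."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return r_max - (r_max - r_min) * score

def greedy_nonoverlapping(sites, target, dist):
    """Scan sites by descending score; keep a site if it overlaps no kept site.

    `sites` is a list of (site_id, xy, score). The overlap test
    dist < r_i + r_j is an assumption of this sketch.
    """
    ranked = sorted(sites, key=lambda s: s[2], reverse=True)
    selected = []
    for site_id, xy, score in ranked:
        if len(selected) >= target:
            break
        r = adaptive_radius(score)
        if all(dist(xy, xy_k) >= r + r_k for _, xy_k, r_k in selected):
            selected.append((site_id, xy, r))
    return selected

def euclidean(a, b):
    """Stand-in for the network shortest-path distance d."""
    return math.dist(a, b)

# Hypothetical candidates: (id, coordinates in metres, composite score).
sites = [
    ("a", (0.0, 0.0), 1.0),
    ("b", (50.0, 0.0), 0.9),
    ("c", (200.0, 0.0), 0.5),
    ("d", (120.0, 0.0), 0.0),
]
chosen = greedy_nonoverlapping(sites, target=3, dist=euclidean)
```

In this toy run only two of the four candidates survive even though the target is three, which mirrors how a greedy scan can stop at a maximal nonoverlapping set smaller than the target portfolio (347 selected sites against a target of 500 in Table 6).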

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

