Article

Multi-Size Facility Allocation Under Competition: A Model with Competitive Decay and Reinforcement Learning-Enhanced Genetic Algorithm

1
State Key Laboratory of Remote Sensing and Digital Earth, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China
2
University of Chinese Academy of Sciences, Beijing 100049, China
*
Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2025, 14(9), 347; https://doi.org/10.3390/ijgi14090347
Submission received: 9 April 2025 / Revised: 5 June 2025 / Accepted: 6 June 2025 / Published: 9 September 2025

Abstract

In modern urban planning, the problem of bank location requires not only considering geographical factors but also integrating competitive elements to optimize resource allocation and enhance market competitiveness. This study addresses the multi-size bank location problem by incorporating competitive factors into the optimization process through a novel reinforcement learning-enhanced genetic algorithm (RL-GA) framework. Building upon an attraction-based model with competitive decay functions, we propose an innovative hybrid optimization approach that combines evolutionary computation with intelligent decision-making capabilities. The RL-GA framework employs Q-learning principles to adaptively select optimal genetic operators based on real-time population states and search progress, enabling meta-learning where the algorithm learns how to optimize rather than simply optimizing. Unlike traditional genetic algorithms with fixed operator probabilities, our approach dynamically adjusts its search strategy through an ε-greedy exploration mechanism and multi-objective reward functions. Experimental results demonstrate that the RL-GA achieves improvements in early-stage convergence speed while maintaining solution quality comparable to traditional methods. The algorithm exhibits enhanced convergence characteristics in the initial optimization phases and demonstrates consistent performance across multiple optimization trials. These findings provide evidence for the potential of intelligence-guided evolutionary computation in facility location optimization, offering moderate computational efficiency gains and adaptive strategic guidance for banking facility deployment in competitive environments.

1. Introduction

The service industry has become a cornerstone of modern economies, contributing significantly to GDP growth, employment, and innovation. Within this sector, financial services such as banking play a pivotal role in facilitating economic activities and ensuring financial stability. The strategic location of banking facilities is therefore crucial for optimizing operational efficiency, enhancing customer accessibility, and maintaining competitiveness.
While the rapid expansion of digital banking and cashless transactions has transformed the financial services landscape, physical bank facilities continue to play crucial roles in modern banking ecosystems. Despite widespread adoption of internet banking and mobile payment platforms, particularly in China, physical branches remain essential for complex financial services requiring in-person verification, high-value business transactions, wealth management consultations, and serving demographic segments that prefer traditional banking channels. Moreover, modern bank branches are evolving into hybrid service centers that combine digital capabilities with personalized advisory services, making strategic location optimization increasingly important for maximizing service efficiency and customer accessibility.
This study addresses two critical limitations in current facility location research through complementary innovations. First, we propose a novel competitive decay model that employs the logarithmic attenuation function $\theta_j(n_j) = \ln(\beta n_j + 1)$ to capture the diminishing marginal impact of additional competitors in saturated markets. Unlike existing models that apply uniform competitive influence regardless of market saturation levels, our approach realistically models how initial competitors have substantial impact while additional competitors in saturated markets contribute progressively less competitive pressure. Second, we develop a reinforcement learning-enhanced genetic algorithm (RL-GA) that represents the first application of adaptive operator selection through Q-learning principles in facility location optimization. Traditional genetic algorithms rely on fixed operator probabilities throughout execution, limiting adaptability. Our RL-GA framework enables real-time learning and adaptation of genetic operators based on population states and search progress, creating a meta-learning system where the algorithm learns how to optimize rather than executing predefined procedures. The integration of these innovations demonstrates how intelligent algorithmic adaptation can effectively handle the increased complexity introduced by realistic competitive dynamics.
The strategic placement of commercial facilities within modern economic systems represents a critical determinant of financial institutions’ operational success. This location decision extends far beyond mere geographical considerations, encompassing factors such as accessibility for customers, business expansion potential, operational costs, and long-term strategic development [1]. The complexity of this issue transcends simple spatial positioning, incorporating multiple variables including traffic flow patterns, regional economic conditions, demographic distribution of potential clientele, and security considerations. The decision-making process requires striking a delicate balance between customer accessibility and the diversification of banking services while avoiding the potential service imbalances that may arise from overly concentrated facility placement. Empirical evidence suggests that a well-executed location strategy can significantly enhance customer satisfaction and strengthen market competitiveness [2]. Conversely, suboptimal facility placement may result in customer attrition, limited service capability, and escalating operational costs. Therefore, conducting comprehensive research on commercial facility location not only provides strategic guidance for commercial practitioners but also generates positive externalities for urban economic development and community growth. This research endeavors to construct a mathematical model for optimal facility location in multi-site service networks, with particular application to healthcare facilities, ticketing services, and restaurant chains operating in competitive markets with multiple service providers.
The proposed model specifically addresses coverage optimization through the lens of consumer utility maximization, where service users make choices among available facilities based on defined utility functions. The fundamental objective of this study is to optimize the spatial distribution of service facilities in a way that maximizes aggregate consumer utility, thereby enhancing the overall service quality of the organization within a competitive environment.

Related Works

The Facility Location Problem (FLP) is a cornerstone of operations research, offering critical insights into optimizing resource placement. However, existing approaches exhibit two fundamental limitations that our study addresses: inadequate modeling of competitive dynamics in saturated markets and reliance on static algorithmic parameters that cannot adapt to problem complexity.
The Facility Location Problem (FLP) has emerged as a fundamental research topic in operations research and management science, garnering significant academic attention over recent decades. Since Cooper’s pioneering work in the 1950s, researchers worldwide have made substantial contributions across theoretical foundations, algorithmic development, and practical applications. The significance of FLP extends beyond its theoretical value, demonstrating considerable practical utility in urban planning, logistics network design, and emergency facility deployment. Advances in computational complexity theory, integer programming, and heuristic algorithms have established a robust methodological framework for addressing large-scale practical problems. The evolution of FLP research has followed several distinct trajectories. In the context of emergency services, Daskin (1981) [3] pioneered the Hierarchical Objective Set Covering (HOSC) model, introducing a novel approach to optimizing Emergency Medical Services (EMS) vehicle deployment through multi-level objectives. His subsequent work (1983) [4] further advanced the theoretical foundations of the maximum expected covering location model, accompanied by innovative heuristic solutions.
In the banking sector, significant methodological advances have emerged addressing the unique challenges of financial facility location. Min and Melachrinoudis (2001) [5] developed a sophisticated three-level stochastic site allocation model that comprehensively integrated market potential assessment, customer behavior analysis, cost considerations, and risk factors. Building on this foundation, Miliotis et al. (2002) [6] pioneered the integration of demand coverage models with Geographic Information Systems (GIS) for optimizing bank branch locations, marking a significant advancement in spatial decision support systems. The field has witnessed substantial methodological progress in dynamic facility location problems, as evidenced by Arabani and Farahani’s (2011) [7] comprehensive classification and systematic analysis. Their work was complemented by Gendreau et al. (2000) [8], who introduced an innovative dynamic model incorporating parallel forbidden search heuristics to expand solution possibilities and enhance computational efficiency.
Recent research has increasingly gravitated toward multi-period and continuous optimization approaches. Başar et al. (2010) [9] made substantial contributions through their exploration of multi-period site selection for EMS facilities, while Brimberg et al. (2012) [10] significantly advanced the continuous p-median problem through innovative local search methodologies and notable improvements to Cooper’s alternating algorithm. Alexandris and Giannikos (2009) [11] enhanced the traditional group maximum coverage model by introducing complementary partial coverage concepts and improved GIS integration, thereby offering a more nuanced approach to spatial coverage optimization.
Significant algorithmic advancements have emerged from various research directions. Wang et al. (2002, 2003) [12,13] conducted comprehensive investigations into facility location problems under stochastic demand and budget constraints, developing a sophisticated suite of heuristic approaches including greedy deletion algorithms, forbidden search techniques, and Lagrangian relaxation methods. These contributions have substantially improved the handling of uncertainty in location decisions. In the logistics domain, Hwang (2002) [14] successfully demonstrated the effectiveness of genetic algorithms in optimizing multi-warehouse systems, highlighting the potential of evolutionary computation in complex spatial optimization scenarios. Furthermore, Curtin et al. (2007) [15] developed an integrated approach combining Maximum Coverage Models with GIS capabilities and linear programming techniques for police patrol allocation, showcasing the potential of hybrid methodologies in practical applications.
The field has also seen important theoretical advances in coverage modeling. Murray et al. (2010) [16] evaluated classical coverage models and proposed an implicit coverage model that better reflects spatial complexity. Baron et al. (2008) [17] addressed the challenges of mobile server location under stochastic demand and congestion, developing the BBKK1 and BBKK2 models for system stability. Building upon central place theory principles established by earlier scholars (Christaller, 1966 [18]; Lösch, 1954 [19]; Zeller et al., 1980 [20]), which recognize that customers have maximum travel distances they are willing to undertake for goods depending on retail categories, significant methodological innovations have emerged in coverage-based market analysis. Ghosh and Craig (1986) [21] provided foundational insights into this spatial behavior framework. Subsequently, Drezner et al. (2011) [22] developed an innovative coverage-based methodology for market share estimation, introducing the concept of facility “influence spheres” that are directly correlated with attractiveness parameters. This approach recognizes that facilities with higher attractiveness levels command proportionally larger influence radii, and the accompanying solution algorithms demonstrated computational efficiency through extensive experimental validation. Further advancing this framework, Drezner et al. (2012) [23] extended the model to incorporate strategic investment decisions, enabling optimization of both new facility construction and enhancement of existing facility attractiveness within budget constraints. This work was subsequently complemented by Drezner et al. (2015) [24], who explored competitive dynamics through leader-follower optimization frameworks based on coverage models. 
Recent innovations in banking facility optimization include Xia et al.’s (2010) [25] enhanced MCLP model utilizing hybrid nested partitioning algorithms, and Zhang’s (2006) [26] multi-site location allocation model incorporating probabilistic customer choice behavior, queuing time, and budget constraints. These developments represent significant improvements over traditional distance-based models, offering more nuanced approaches to spatial decision-making in competitive environments.
The evolution of facility attractiveness assessment has undergone significant methodological advancement since its early conceptualization. Initially, Huff (1964, 1966) [27,28] pioneered the quantification approach by proposing facility floor area as a proxy measure for attractiveness, conducting empirical investigations across supermarkets, furniture, and clothing establishments. This foundational work was substantially enhanced by Nakanishi and Cooper (1974) [29], who introduced the Multiplicative Competitive Interaction (MCI) model, which replaced the singular floor area metric with a sophisticated composite of multiple factors, each weighted by exponential parameters. The practical application of this enhanced framework was demonstrated by Jain and Mahajan (1979) [30] in food retailing contexts, incorporating multidimensional attractiveness attributes including store image, spatial layout, visual appearance, accessibility, service quality, and staff composition.
Subsequent research has systematically identified diverse factors influencing customer perception of facility attractiveness across various retail sectors. In supermarket environments, extensive investigations have revealed the critical importance of pricing structures, product freshness, availability, convenience factors, service quality, and parking accessibility (Prosperi & Schuler, 1976 [31]; Schuler, 1981 [32]; Timmermans, 1988 [33]; Bell et al., 1998 [34]). Parallel studies in clothing retail demonstrated that parking availability and merchandise variety significantly impact attractiveness (Timmermans, 1982 [35]), while central business district analysis revealed the multifaceted nature of attractiveness determinants, encompassing pricing, visual aesthetics, reputation, product range, operational hours, atmospheric qualities, design elements, and service provision (Downs, 1970 [36]).
More recent empirical research has employed sophisticated analytical techniques to identify key attractiveness predictors. Notably, Drezner (2006) [37] reported comprehensive findings from a survey of 272 shopping mall visitors, where Structural Equations Modeling analysis identified three primary determinants of overall mall attractiveness: store variety, physical appearance, and brand presence, with these factors demonstrating comparable relative importance. From a computational perspective, Plastria and Carrizosa (2004) [38] contributed methodological innovations by developing algorithmic approaches for attractiveness estimation, particularly for Euclidean distance contexts, resulting in finite polynomial algorithms scaled to customer population size.
Despite these advances, current facility location research exhibits critical limitations. Existing competitive models, including those by Drezner et al. (2011–2015) [22,23,24], primarily employ linear or exponential decay functions that fail to capture diminishing marginal competitive effects in saturated markets. Simultaneously, algorithmic approaches, exemplified by Hwang (2002) [14] and subsequent genetic algorithm applications, rely on static parameter configurations that cannot adapt to evolving problem complexity during optimization. Our study bridges these gaps through: (1) a logarithmic competitive decay model that realistically captures market saturation dynamics, and (2) the first reinforcement learning-enhanced genetic algorithm for facility location that adaptively selects optimal operators based on real-time population states. This integrated approach represents a paradigm shift from static optimization to adaptive, learning-based spatial optimization methodologies.

2. Materials and Methods

2.1. Mathematical Modeling

2.1.1. Probabilistic Expressions of Customer Behavior

The subsequent analysis primarily focuses on physical establishments such as retail outlets or commercial centers. Upon establishment, these facilities become permanently fixed to their chosen sites and cannot be economically relocated. The appeal of a facility or its offerings extends beyond merely the pricing structure; characteristics including outlet dimensions, staff courtesy, and various additional elements also contribute significantly. Although facilities are unable to modify their geographical positioning, enhancements can be implemented at the existing site to increase customer appeal.
Numerous methodologies for evaluating market share acquisition by individual facilities have been developed throughout the years. The underlying assumption suggests that consumers allocate their purchasing capacity across different facilities based on facility appeal and proximity compared to alternative options. When facility market attraction can be quantified, methodologies for identifying optimal locations for new establishments can be formulated. Typically, the objective function lacks convexity properties. Consequently, particularly in multi-facility positioning models, researchers have predominantly proposed heuristic approaches that cannot guarantee global optimality for most modeling frameworks.
Within the fundamental methodologies for characterizing consumer behavior and market share assessment, the utility framework was introduced by Huff [27,28], who successfully represented customer behavior through a probabilistic modeling approach. In his framework, the utility $u_{ij}$ of a service facility is proportional to the ratio $s_j / T_{ij}^{\theta}$, where $s_j$ represents the scale of facility $j$, $T_{ij}$ denotes the travel duration required from location $i$ to service facility $j$, and $\theta$ constitutes an empirically determined parameter reflecting the impact of travel time on journey decisions. Therefore, the probability of a customer journeying from location $i$ to service facility $j$ can be estimated in the following form:
$$p_{ij} = \frac{u_{ij}}{\sum_{j=1}^{n} u_{ij}} = \frac{s_j / T_{ij}^{\theta}}{\sum_{j=1}^{n} s_j / T_{ij}^{\theta}}. \qquad (1)$$
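For concreteness, Huff's patronage probability above can be sketched in a few lines of Python (the function name and inputs are illustrative, not part of the original formulation):

```python
def huff_probability(sizes, times, theta=2.0):
    """Huff model: p_ij = (s_j / T_ij^theta) / sum_k (s_k / T_ik^theta).

    sizes: facility scales s_j; times: travel durations T_ij from one
    demand node i to each facility; theta: travel-time sensitivity.
    """
    utils = [s / t ** theta for s, t in zip(sizes, times)]
    total = sum(utils)
    return [u / total for u in utils]

# At equal travel time, the larger facility captures more patronage.
probs = huff_probability(sizes=[100, 50], times=[2.0, 2.0])
```

Note that the probabilities always sum to one by construction, so the model splits a demand node's entire purchasing capacity among the available facilities.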
From the above model, the utility $u_{ij}$ for demand node $i$ from service facility $j$ is defined by the ratio $s_j / T_{ij}^{\theta}$. Thus, the attractiveness of a facility to customers is influenced by its size, which is reasonable [39,40]. The model introduced by Leonardi [41] and embraced by Pastor [40] inspired the concept of employing an exponential distance value in the distance decay function $f(\cdot)$. The model was structured as follows:

$$E_{ij} = c_i \cdot p_{ij}, \qquad (2)$$

$$p_{ij} = \frac{s_j^{\alpha} \cdot f(d_{ij})}{\sum_{j \in J} s_j^{\alpha} \cdot f(d_{ij})}, \quad i \in I, \; j \in J, \qquad (3)$$

where $E_{ij}$ is the expected number of consumers at demand point $i$ who would travel to service facility $j$; $c_i$ is the total number of customers at demand point $i$; $I$ is a finite set of indices associated with all demand nodes; $J$ is a finite set of indices associated with all possible locations of service facilities; $s_j$ is the size of facility $j$; $d_{ij}$ is the travel distance between demand node $i$ and the location of facility $j$; $\alpha$ is a parameter that accounts for scale effects ($0 < \alpha < 1$); and $f(\cdot)$ is a distance decay function that denotes a reduction in customer interest in a service facility as the distance from the customer to the facility increases. In this paper, we define the distance decay function in the following form:

$$f(d_{ij}) = \frac{1}{\theta_j \cdot e^{d_{ij}}}, \qquad (4)$$
where $\theta_j$ is a parameter that measures the geographical positioning of the competing facilities. We employ the following logarithmic decay function to model the process of diminishing attractiveness in a market environment that is approaching saturation as the number of competitors increases. This decay model effectively captures the nonlinear impact of competitor count on resource allocation or market attractiveness, particularly in scenarios where the attractiveness declines rapidly during the initial entry of competitors and gradually stabilizes as the market reaches higher levels of saturation. Through this function, we aim to provide a more intuitive representation of the dynamics of competitive pressure in the market and its influence on the attractiveness of market participants.

$$\theta_j(n_j) = \ln(\beta n_j + 1), \qquad (5)$$

where $n_j$ represents the number of competing facilities within a specific radius around facility $j$, and $\beta$ ($0 < \beta < 1$) is a parameter used to adjust the intensity of competitive influence. This logarithmic form of the competition influence function has the following characteristics:
(1) It is particularly important to define the case when $n_j = 0$, where the attractiveness should retain its initial value of 1, representing the system’s full attractiveness in the absence of competitors. To ensure the validity of the logarithmic decay function under this condition, an appropriate offset term is introduced. This adjustment avoids the undefined behavior of the logarithmic operation and ensures the accuracy and interpretability of the model in its initial state;
(2) As the number of competing facilities increases, the competitive influence exhibits diminishing marginal effects, which aligns with real market competition characteristics;
(3) The introduction of parameter $\beta$ allows us to adjust competitive intensity according to different market environments, enhancing the model’s adaptability.
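The diminishing-marginal-effect property is easy to verify numerically. A minimal Python sketch of the logarithmic decay function $\theta_j(n_j) = \ln(\beta n_j + 1)$ (names are ours, chosen for illustration):

```python
import math

def competitive_decay(n_j: int, beta: float = 0.5) -> float:
    """Competition intensity theta_j(n_j) = ln(beta * n_j + 1).

    The '+1' offset keeps the logarithm defined at n_j = 0, where the
    function returns 0 (no competitive attenuation).
    """
    return math.log(beta * n_j + 1)

# Each additional competitor adds less competitive pressure than the last:
increments = [competitive_decay(n + 1) - competitive_decay(n) for n in range(5)]
```

With `beta = 0.5`, the first competitor adds ln(1.5) ≈ 0.405 to the intensity, while the fifth adds only ln(3.5) − ln(3) ≈ 0.154, illustrating the saturation behavior described in point (2).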
The following points should be noted about the preceding model (3):
  • By substituting $s_j$ with $s_j^{\alpha}$, Equation (3) offers the advantage of accounting for varying scale effects. In fact, merely doubling the size of a facility does not necessarily result in a doubling of its utility [40];
  • The term $1/T_{ij}^{\theta}$ in Equation (1) is replaced by $1/(\theta_j \cdot e^{d_{ij}})$ in Equation (3). This modification simplifies the computation process.
Equation (3) serves as a model for customer patronage, suggesting that the expression $s_j^{\alpha} / (\theta_j \cdot e^{d_{ij}})$ for facility $j$ can be interpreted as the utility value for customers located at demand node $i$. In the model presented in this paper, our focus is on the profits of a single bank, which may operate multiple branches within a given region.

2.1.2. Objective Function

For demand node i, we define the aggregate utility value derived from all facilities as
$$F_i = \sum_{j \in N} \frac{s_j^{\alpha}}{\theta_j \cdot e^{d_{ij}}}, \qquad (6)$$
where N denotes the set of all locations housing service facilities. The formulation of F i demonstrates that the utility value is a function of two key parameters: the facility size and its spatial distance from demand node i.
Under conditions where the spatial distribution of facilities remains constant, the utility value for demand node $i$ exhibits a positive correlation with facility sizes. Conversely, with fixed sizes and varying spatial positions, the utility value for demand node $i$ increases as facilities are located closer to it.
Leveraging this utility function F i , we establish the objective function for our model, where c i represents the customer population density in demand point i. This formulation enables us to quantify and optimize the facility’s service delivery efficiency across its operational network:
$$\max \prod_{i \in I} (F_i)^{c_i}, \qquad (7)$$
which can be simplified to a more manageable form by applying logarithmic transformation:
$$\max \sum_{i \in I} c_i \ln(F_i). \qquad (8)$$
The advantage of applying the multiplicative function $\prod_{i \in I} (F_i)^{c_i}$ rather than an additive function is that it necessitates non-zero accessibility to every demand point [41]. If any term $F_i$ equals zero, the objective function $\prod_{i \in I} (F_i)^{c_i}$ consequently evaluates to zero. The number of customers at each demand point, $c_i$, is taken into account; thus, a demand node with a greater number of customers exerts a correspondingly greater influence.
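Assuming all $\theta_j$ values are strictly positive (so every term is well defined), the aggregate utility and the log-transformed objective can be sketched in Python as follows; function and variable names are illustrative:

```python
import math

def aggregate_utility(sizes, distances, thetas, alpha=0.5):
    """F_i = sum_j s_j^alpha / (theta_j * e^{d_ij}) for one demand node i.

    Assumes every theta_j > 0 so each term is finite.
    """
    return sum(s ** alpha / (t * math.exp(d))
               for s, d, t in zip(sizes, distances, thetas))

def log_objective(customers, utilities):
    """sum_i c_i * ln(F_i): the log of the multiplicative objective."""
    return sum(c * math.log(F) for c, F in zip(customers, utilities))
```

Because the logarithm is strictly increasing, maximizing `log_objective` is equivalent to maximizing the product form, while avoiding the numerical overflow that raising utilities to large customer counts $c_i$ would cause.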

2.1.3. Constraint on Budget for New Facility

The objective function has been established, and the next step involves imposing constraints on it. In this paper, we examine a new brand seeking to establish a commercial layout in a competitive area that features branch shops of varying sizes, resulting in differing opening costs. Assuming a maximum budget $B$, where the function $b(s_j)$ gives the opening cost for a branch of size $s_j$, we have:

$$\sum_{j \in N} b(s_j) \leq B. \qquad (9)$$

Furthermore, it is necessary to constrain the aggregate number of retail establishments. Let $y(i)$ be a binary decision variable that equals 1 if and only if a service facility is established at candidate location $i$, and 0 otherwise, and let $M$ be the maximum total number of facilities; then we have:

$$\sum_{i \in I} y(i) \leq M. \qquad (10)$$
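For illustration, both constraints can be checked together for a candidate set of branches; the linear opening-cost function used below is a hypothetical stand-in for $b(s_j)$:

```python
def feasible(selected_sizes, open_cost, budget, max_facilities):
    """Check the budget constraint sum_j b(s_j) <= B and the
    cardinality constraint (number of opened facilities) <= M."""
    total_cost = sum(open_cost(s) for s in selected_sizes)
    return total_cost <= budget and len(selected_sizes) <= max_facilities

# Hypothetical linear cost: a size-s branch costs 10*s to open.
cost = lambda s: 10 * s
ok = feasible([1, 2, 3], cost, budget=100, max_facilities=5)
```

In the genetic algorithm, such a check would typically be applied to each candidate chromosome, with infeasible individuals either repaired or penalized in the fitness function.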

2.2. RL-Enhanced Genetic Algorithm

To effectively solve the facility location optimization model proposed in this study, we employ the Genetic Algorithm (GA) as our solution method. The genetic algorithm is a heuristic optimization algorithm that performs random searches by simulating natural selection and genetic mechanisms from Darwinian evolutionary theory [42]. First proposed by Holland in 1975, this algorithm has evolved over decades to become a powerful tool for solving complex optimization problems.
In recent years, genetic algorithms have demonstrated exceptional applicability and effectiveness in facility location-allocation optimization, successfully addressing complex spatial optimization problems across multiple industries. In e-commerce logistics network design, Sachdeva et al. (2022) [43] developed metaheuristic approaches for hub-spoke facility location problems and successfully applied them to distribution network optimization in the Indian e-commerce industry, demonstrating the advantages of genetic algorithms in handling large-scale logistics facility layout problems. In the emerging field of green infrastructure development, Lazari and Chassiakos (2023) [44] employed genetic algorithms to achieve multi-objective optimization of electric vehicle charging station deployment, effectively balancing multiple constraints including service coverage, construction costs, and environmental impacts, showcasing the algorithm’s significant value in sustainable facility planning. Furthermore, in healthcare system optimization, Salami et al. (2023) [45] proposed a two-stage optimization approach based on genetic algorithms to solve healthcare facility location-allocation problems, significantly improving the accessibility and utilization efficiency of medical resources through intelligent service delivery strategies. These research achievements fully demonstrate the broad applicability and optimization potential of genetic algorithms in solving various types of facility location-allocation problems, providing solid theoretical foundations and practical support for our adoption of reinforcement learning-enhanced genetic algorithms in this study.
The core concept of genetic algorithms involves encoding potential solutions to optimization problems as chromosomes and generating new chromosomes with higher fitness through genetic operations that simulate biological evolution processes, including selection, crossover, and mutation [46]. This evolutionary optimization process exhibits several distinctive characteristics:
  • Parallelism: The algorithm simultaneously operates on multiple individuals within the population, demonstrating strong global search capabilities;
  • Randomness: Through random search strategies, it effectively avoids becoming trapped in local optima;
  • Adaptability: Individuals within the population continuously evolve to adapt to the objective function, showing robust problem-solving capabilities.
Based on these particular advantages [47], we have designed a reinforcement learning-enhanced genetic algorithm framework specifically for the facility location problem. This framework integrates reinforcement learning decision mechanisms with traditional genetic operations, enabling adaptive strategy selection throughout the evolutionary process. The framework encompasses problem-specific encoding schemes, multi-component reward function design, experience replay mechanisms, adaptive parameter adjustment strategies, and intelligent genetic operator selection through Q-learning-based policy models.
Genetic algorithms, as meta-heuristic optimization algorithms that simulate natural evolutionary processes, have demonstrated remarkable capabilities in solving complex combinatorial optimization problems. However, the performance of traditional genetic algorithms largely depends on the configuration of genetic operation parameters, including selection strategies, crossover probability, and mutation probability, among others. These parameters typically remain constant throughout the algorithm execution and cannot be adaptively adjusted according to changes in population states during the search process, thereby limiting the algorithm’s search efficiency and solution quality. Reinforcement learning, as an important branch of machine learning, possesses the capability to learn optimal strategies through interaction with the environment. By integrating reinforcement learning decision mechanisms into genetic algorithms, the algorithm can automatically select the most appropriate genetic operations at different search stages, thereby enhancing the algorithm’s adaptability and search performance.

2.2.1. Algorithm Theoretical Framework

The fundamental concept of the reinforcement learning-enhanced genetic algorithm lies in modeling the evolutionary process of genetic algorithms as a Markov Decision Process (MDP). Within this framework, the algorithm’s state is represented by feature vectors characterizing the current population, actions correspond to different genetic operation strategies, the reward function is designed based on the degree of population fitness improvement, and the policy is optimized through reinforcement learning methodologies. This modeling approach enables the algorithm to autonomously select optimal genetic operation strategies at different evolutionary stages by learning from historical experiences.
The algorithm encodes the population state at generation t as a five-dimensional vector:
$S_t = \left[\, d_t,\; f_{best}^{norm},\; f_{avg}^{norm},\; \sigma_f^{norm},\; p_t \,\right]$
where each component is calculated as follows. Population diversity d t is computed through the average of location diversity and scale diversity:
$d_t = \frac{1}{2}\left(\frac{|L_t|}{N \times K} + \frac{|S_t|}{|S_{all}|}\right)$
where $|L_t|$ represents the number of distinct locations in the population at generation t, $N$ is the population size, $K$ is the number of facilities per individual, $|S_t|$ denotes the number of different scale types, and $|S_{all}|$ represents the total number of possible scale types.
The normalized best fitness and average fitness are defined as:
$f_{best}^{norm} = \frac{f_{max} - f_{min}}{f_{range}}, \qquad f_{avg}^{norm} = \frac{\bar{f} - f_{min}}{f_{range}}$
where $f_{range} = f_{max} - f_{min}$. When $f_{range} = 0$, we set $f_{best}^{norm} = 1$ and $f_{avg}^{norm} = 0.5$.
The normalized fitness variance is calculated as:
$\sigma_f^{norm} = \min\left(1.0,\; \frac{\frac{1}{N}\sum_{i=1}^{N}\left(f_i - \bar{f}\right)^2}{f_{range}^2}\right)$
The evolutionary progress is defined as:
$p_t = \frac{t}{T}$
where T is the total number of evolutionary generations.
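As a concrete illustration, the state encoding above can be sketched in Python as follows. The representation of an individual as a list of (location, scale) pairs, and the handling of the degenerate zero-range case for the variance term, are illustrative assumptions rather than the authors' exact implementation:

```python
import numpy as np

def encode_state(population, fitness, t, T, n_scale_types):
    """Encode the generation-t population as the 5-dim state vector S_t.

    `population` is a list of individuals, each a list of (location, scale)
    pairs -- an illustrative representation, not the paper's exact encoding.
    """
    N = len(population)
    K = len(population[0])
    # Diversity d_t: average of location diversity and scale diversity
    locations = {loc for ind in population for loc, _ in ind}
    scales = {s for ind in population for _, s in ind}
    d_t = 0.5 * (len(locations) / (N * K) + len(scales) / n_scale_types)

    f = np.asarray(fitness, dtype=float)
    f_range = f.max() - f.min()
    if f_range == 0:                      # degenerate case defined in the text
        f_best, f_avg = 1.0, 0.5
        var_norm = 0.0                    # assumption: variance term set to 0
    else:
        f_best = (f.max() - f.min()) / f_range
        f_avg = (f.mean() - f.min()) / f_range
        var_norm = min(1.0, f.var() / f_range**2)

    p_t = t / T                           # evolutionary progress
    return np.array([d_t, f_best, f_avg, var_norm, p_t])
```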
The action space design encompasses the primary search mechanisms in genetic algorithms, including six operations: elite selection, location-preserving mutation, scale-preserving mutation, traditional crossover, adaptive mutation, and local search. Elite selection operations ensure the inheritance of superior genes during the evolutionary process by preserving excellent individuals within the population. Location-preserving mutation and scale-preserving mutation operations respectively fix the location information and scale information of solutions, achieving structured exploration across different dimensions of the solution space. Traditional crossover operations maintain the fundamental search mechanisms of genetic algorithms, generating new solutions through information exchange between individuals. Adaptive mutation operations dynamically adjust mutation intensity according to evolutionary progress, maintaining strong exploration capabilities in early stages while conducting refined search in later stages. Local search operations perform a concentrated search within the neighborhood of excellent solutions, effectively improving the local optimality of solutions.
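For reference, the six operations form the action space indexed $\{0, \ldots, 5\}$ that the policy model selects from; the Python names below are illustrative labels of our own, not identifiers from the authors' implementation:

```python
from enum import IntEnum

class Operator(IntEnum):
    """The six genetic operations forming the RL action space (indices 0-5).

    Names are illustrative; the index order is an assumption.
    """
    ELITE_SELECTION = 0
    LOCATION_PRESERVING_MUTATION = 1
    SCALE_PRESERVING_MUTATION = 2
    TRADITIONAL_CROSSOVER = 3
    ADAPTIVE_MUTATION = 4
    LOCAL_SEARCH = 5
```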

2.2.2. Reinforcement Learning Policy Model

The algorithm employs a Q-learning-based policy model to learn optimal mapping relationships from states to actions. Considering computational efficiency and implementation convenience, the policy model utilizes a linear function approximation approach through a weight matrix $W \in \mathbb{R}^{5 \times 6}$ that maps the five-dimensional state vector to a six-dimensional action-value vector:
$Q(s, a) = s^T W_a$
where $W_a$ represents the a-th column of the weight matrix $W$, corresponding to the parameter vector for action a.
During the action selection process, the algorithm adopts an ϵ -greedy strategy to balance exploration and exploitation:
$a_t = \begin{cases} \arg\max_a Q(s_t, a) & \text{if } \xi > \epsilon_t \\ \text{random}(0, 5) & \text{otherwise} \end{cases}$
where $\xi \sim U(0, 1)$ is a uniformly distributed random number. The exploration rate $\epsilon_t$ follows a dynamic adjustment formula:
$\epsilon_t = \max(0.05,\; 0.5 \times (1 - p_t))$
This design enables the exploration rate to gradually decrease from an initial value of 0.5 to 0.05 , conforming to the general pattern of optimization algorithms transitioning from global search to local search.
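A minimal sketch of the linear Q-evaluation and $\epsilon$-greedy selection described above; function and variable names are our own, and the random-number source is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def select_action(state, W, t, T):
    """Epsilon-greedy action selection over the linear model Q(s, a) = s^T W_a.

    `W` is the 5x6 weight matrix; action indices 0..5 stand for the six
    genetic operators.
    """
    q_values = state @ W                      # shape (6,): one Q-value per action
    eps_t = max(0.05, 0.5 * (1 - t / T))      # decaying exploration rate
    if rng.random() > eps_t:
        return int(np.argmax(q_values))       # exploit: greedy action
    return int(rng.integers(0, 6))            # explore: uniform over 0..5
```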
Policy updates employ temporal difference learning methods, with the Q-value update rule:
$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$
The corresponding weight update formula is:
$W_{a_t} \leftarrow W_{a_t} + \alpha \cdot s_t \cdot \delta_t$
where $\delta_t = r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)$ is the temporal difference error, $\alpha = 0.01$ is the learning rate, and $\gamma = 0.95$ is the discount factor.
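The weight update can be sketched as follows, assuming the linear model $Q(s, a) = s^T W_a$ and the hyperparameter values quoted above:

```python
import numpy as np

def td_update(W, s, a, r, s_next, alpha=0.01, gamma=0.95):
    """One temporal-difference update of the linear Q-model.

    Updates column a of W in place (W_a <- W_a + alpha * s * delta) and
    returns the TD error delta_t.
    """
    q_sa = s @ W[:, a]                        # Q(s_t, a_t)
    q_next = np.max(s_next @ W)               # max_a' Q(s_{t+1}, a')
    delta = r + gamma * q_next - q_sa         # temporal difference error
    W[:, a] += alpha * s * delta
    return delta
```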

2.2.3. Experience Replay and Learning Mechanism

To enhance learning efficiency and address temporal correlation issues between samples, the algorithm incorporates an experience replay buffer. The buffer employs a circular queue data structure with a capacity of 5000 experience samples, capable of storing the algorithm’s learning experiences over extended periods. Each experience sample comprises four essential elements: state, action, reward, and next state, forming the fundamental learning units of reinforcement learning. The introduction of the experience replay mechanism brings multiple advantages. Firstly, random sampling breaks the temporal correlation between samples, improving learning stability. Secondly, each experience sample can be utilized multiple times, enhancing sample utilization efficiency. Thirdly, the batch learning approach makes policy updates smoother, avoiding learning oscillations that might be caused by individual samples. The learning process adopts a mini-batch stochastic gradient descent method, with the learning rate set to 0.01 and the discount factor set to 0.95. The selection of these parameters is based on best practices in the reinforcement learning domain and has been experimentally validated to achieve a favorable balance between learning speed and stability.
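A minimal sketch of such a buffer, using the deque-based circular queue and the capacity quoted above (interface names are illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    """Circular experience buffer of (state, action, reward, next_state) tuples.

    A sketch of the mechanism described in the text; capacity follows the
    quoted value of 5000, and the batch size of 32 is an assumption.
    """
    def __init__(self, capacity=5000):
        self.buffer = deque(maxlen=capacity)   # oldest samples drop automatically

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=32):
        # Random sampling breaks temporal correlation between samples
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)
```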

2.2.4. Reward Function Design

The design of the reward function constitutes a critical factor for the success of reinforcement learning algorithms, directly influencing the learning direction and ultimate performance of the algorithm. This algorithm employs a multi-component reward function that comprehensively considers multiple aspects including optimization objectives, search diversity, and convergence characteristics:
$R_t = R_{base} + R_{diversity} + R_{stagnation}$
The basic reward is calculated based on the degree of improvement in the population’s best fitness, directly reflecting the fundamental optimization objective of the algorithm:
$R_{base} = f_{best}(t+1) - f_{best}(t)$
When the population’s best fitness improves following the selection of a particular genetic operation, the algorithm receives a positive reward proportional to the magnitude of improvement. This design ensures that the algorithm consistently learns in the direction of the optimization objective.
The diversity reward provides additional positive rewards based on the population’s diversity level, encouraging the algorithm to maintain appropriate population diversity:
$R_{diversity} = 0.1 \times d_t$
Population diversity serves as an essential safeguard for genetic algorithms to avoid premature convergence. By incorporating a diversity component into the reward function, the algorithm can learn to strike an appropriate balance between optimization performance and search diversity.
The stagnation penalty mechanism monitors the algorithm’s convergence state and provides negative rewards for prolonged stagnant search behavior.
Let $\{f_{best}(t-9), f_{best}(t-8), \ldots, f_{best}(t)\}$ be the sequence of best fitness values over the most recent 10 generations.
The stagnation penalty is defined as:
$R_{stagnation} = \begin{cases} -0.1 & \text{if } \max_i \left| f_{best}(t-i) - f_{best}(t-9) \right| < 10^{-5} \text{ and } t \geq 10 \\ 0 & \text{otherwise} \end{cases}$
This mechanism encourages the algorithm to learn more aggressive search strategies when trapped in local optima.
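Putting the three components together, the reward computation might look as follows. This is a sketch: the exact stagnation bookkeeping in the authors' implementation may differ, and the window and tolerance follow the values quoted above:

```python
def compute_reward(best_history, d_t, window=10, tol=1e-5):
    """Multi-component reward R_t = R_base + R_diversity + R_stagnation.

    `best_history` is the list of best-fitness values up to the current
    generation; `d_t` is the population diversity.
    """
    r_base = best_history[-1] - best_history[-2]       # best-fitness improvement
    r_diversity = 0.1 * d_t                            # diversity bonus
    r_stagnation = 0.0
    if len(best_history) >= window:
        recent = best_history[-window:]                # last 10 generations
        if max(abs(f - recent[0]) for f in recent) < tol:
            r_stagnation = -0.1                        # stagnation penalty
    return r_base + r_diversity + r_stagnation
```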

2.2.5. Adaptive Parameter Adjustment Mechanism

The algorithm implements a multi-level adaptive parameter adjustment mechanism that dynamically adjusts critical parameters according to evolutionary states, significantly enhancing the algorithm’s adaptability and search efficiency.
The adaptive adjustment of mutation rate follows the formula:
$\mu_t = \mu_{base} \times (1 - p_t) + \mu_{min}$
where $\mu_{base} = 0.3$ is the base mutation rate, $\mu_{min} = 0.05$ is the minimum mutation rate, and $p_t$ is the evolutionary progress. Under this design the mutation rate decreases from 0.35 at the start of the run ($\mu_{base} + \mu_{min}$) to the final minimum of 0.05 as $p_t \to 1$.
For each individual in adaptive mutation operations, each gene locus has a mutation probability of μ t , with location mutation and scale mutation each having a 50 % selection probability:
$P_{mutation} = \begin{cases} P_{location} = 0.5 & \text{(change location)} \\ P_{scale} = 0.5 & \text{(change scale)} \end{cases}$
The neighborhood definition for local search is based on Euclidean distance:
$N_\delta(x) = \left\{ y \in C : \sqrt{(y_{lon} - x_{lon})^2 + (y_{lat} - x_{lat})^2} < \delta \right\}$
where $C$ is the candidate point set, $\delta = 0.005$ is the neighborhood radius, and $(x_{lon}, x_{lat})$ and $(y_{lon}, y_{lat})$ represent the longitude and latitude coordinates of the points, respectively.
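The adaptive mutation rate and the distance-based neighborhood can be sketched as follows (a minimal illustration using the parameter values quoted above; candidate points are assumed to be (lon, lat) tuples):

```python
import math

def mutation_rate(t, T, mu_base=0.3, mu_min=0.05):
    """Adaptive mutation rate mu_t = mu_base * (1 - p_t) + mu_min."""
    return mu_base * (1 - t / T) + mu_min

def neighborhood(x, candidates, delta=0.005):
    """Candidate points within Euclidean radius delta of x (lon/lat tuples)."""
    x_lon, x_lat = x
    return [y for y in candidates
            if math.hypot(y[0] - x_lon, y[1] - x_lat) < delta]
```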
Tournament selection in traditional crossover operations employs fitness comparison criteria. For a selection process with tournament size k = 3 , the probability of individual x being selected is:
$P(x) = \frac{1}{\binom{|P|}{k}} \sum_{\substack{T \subseteq P,\; |T| = k \\ x \in T}} I\!\left(f(x) = \max_{y \in T} f(y)\right)$
where $P$ is the current population, $T$ is the tournament subset, and $I(\cdot)$ is the indicator function.
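In code, tournament selection reduces to drawing k individuals at random and keeping the fittest; a sketch with the default tournament size $k = 3$:

```python
import random

def tournament_select(population, fitness, k=3):
    """Tournament selection: draw k random individuals, return the fittest.

    `population` and `fitness` are parallel lists; ties resolve to the first
    maximal index, an illustrative choice.
    """
    idx = random.sample(range(len(population)), k)  # k distinct contestants
    best = max(idx, key=lambda i: fitness[i])       # fittest of the tournament
    return population[best]
```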

2.2.6. Algorithm Implementation and Computational Complexity Analysis

The reinforcement learning-enhanced genetic algorithm incorporates multiple computational optimization strategies to achieve efficient performance while maintaining solution quality. The algorithm implements an elite preservation mechanism that retains the top 20% of individuals across generations, eliminating redundant fitness evaluations for known high-quality solutions. Adaptive parameter control mechanisms are employed, including mutation rates that decrease progressively from 0.3 to 0.05 during evolution, and a time-decaying exploration rate ϵ that reduces from 0.5 to 0.05, effectively balancing exploration and exploitation phases.
The algorithm’s primary computational overhead encompasses state encoding, fitness evaluation, policy learning, and population update components. The time complexity of state encoding is $O(n)$, where n represents the population size, involving population diversity calculations, fitness statistics, and evolutionary progress indicators that scale linearly. The complexity of fitness evaluation depends on the specific problem model; in facility location problems, it primarily involves distance calculations and demand allocation, with a complexity of $O(n \times m)$, where m denotes the number of candidate locations. Policy learning employs a batch update approach occurring every 10 generations rather than per-generation updates, with single update complexity of $O(\text{batch\_size} \times \text{state\_size} \times \text{action\_size})$, though the update frequency remains relatively low due to the batching strategy. The complexity of population update depends on the selected genetic operations, with overall complexity ranging from $O(n)$ for simple selection operations to $O(n^2)$ for comprehensive crossover procedures.
Computational efficiency is further enhanced through vectorized operations using NumPy’s optimized matrix computations, which replace iterative calculations and provide 3–5× performance improvements for large population sizes. The implementation utilizes localized search operations that restrict candidate evaluation to neighborhoods within predefined distance thresholds, significantly reducing the search space. Budget constraint pre-filtering prevents the generation of infeasible solutions by validating constraints before genetic operations, eliminating computational waste on invalid individuals.
The algorithm’s space complexity is primarily constituted by population storage, experience buffer, and policy model parameters. Population storage requires $O(n \times k)$ space, where k represents the encoding length of each individual. The experience buffer, implemented using Python’s deque data structure with automatic memory management, requires $O(\text{buffer\_size})$ space with a default capacity of 5000 experiences. The policy model employs a simplified linear architecture rather than complex neural networks, with the parameter matrix requiring $O(\text{state\_size} \times \text{action\_size})$ space: a $5 \times 6$ matrix of only 30 parameters in the implemented model. Overall, the algorithm’s space complexity remains manageable and suitable for execution on standard computing platforms.
Through rational data structure design and algorithmic optimization, this reinforcement learning-enhanced genetic algorithm maintains excellent computational efficiency while preserving high search performance, providing a feasible technical solution for solving large-scale optimization problems. The elite preservation strategy reduces redundant evaluations by approximately 20% per generation, while batch learning decreases policy update frequency by 90% compared to per-generation approaches. Early termination mechanisms detect convergence plateaus by monitoring fitness improvements over previous generations, applying penalty signals to encourage exploration when stagnation occurs. The algorithm’s modular design enables independent optimization and replacement of individual components, establishing a solid foundation for further algorithmic improvements and extensions.

2.3. Parameter Estimation and Calibration

The effective implementation of our competitive decay model and RL-GA framework requires careful parameter estimation and calibration procedures. This section details the systematic approach employed to determine optimal parameter values based on empirical data analysis and theoretical considerations.

2.3.1. Competitive Decay Model Parameters

The competitive decay model incorporates several critical parameters that require empirical estimation: the scale effect parameter α , the competitive influence parameter β , and the distance decay function parameters. These parameters directly influence customer behavior modeling and facility attractiveness calculations.
The scale effect parameter $\alpha$ in Equation (3) controls how facility size influences customer utility. Following the methodology established by Huff [28] and subsequent research by Pastor [40], we employed a two-stage estimation process. First, we conducted a comprehensive analysis of customer transaction data from 150 bank branches across Beijing over a 12-month period, obtaining anonymized customer flow patterns and service utilization rates. The relationship between facility size (measured by floor area and staff count) and customer attraction was analyzed using nonlinear regression techniques. The estimation results yielded $\alpha = 0.65$ with a confidence interval of [0.58, 0.72], indicating that the utility scaling effect is significant but exhibits diminishing returns, consistent with economic theory expectations.
The competitive influence parameter $\beta$ in the logarithmic decay function $\theta_j(n_j) = \ln(\beta n_j + 1)$ quantifies the intensity of competitive pressure from neighboring facilities. This parameter was estimated through spatial analysis of market share distribution patterns. We analyzed the market performance of 89 newly established bank branches against the density of competing facilities within various radius thresholds (500 m, 1000 m, 1500 m, and 2000 m). Using maximum likelihood estimation on logistic market share models, we determined $\beta = 0.35$ with a standard error of 0.048. Sensitivity analysis demonstrated that market share predictions remained stable for $\beta$ values in the range [0.25, 0.45], providing confidence in the robustness of this parameter choice.
The distance decay function parameters were calibrated using gravity model principles adapted for urban banking contexts. Customer travel distance data were extracted from anonymized mobile payment transaction records, providing insights into actual customer-facility interaction patterns. The exponential decay formulation $f(d_{ij}) = \frac{1}{\theta_j} \cdot e^{-d_{ij}}$ was validated against observed customer travel behaviors, with distance measurements standardized to kilometers. Cross-validation analysis using randomly selected training and testing subsets (70/30 split) demonstrated model accuracy of 78.4% in predicting customer facility choices, validating the appropriateness of the exponential decay assumption.

2.3.2. RL-GA Algorithm Parameters

The reinforcement learning components require careful parameter tuning to balance exploration and exploitation effectively. The learning rate η = 0.01 was determined through grid search optimization over the range [0.001, 0.1] using a validation set of 20 facility location scenarios. The selected value provides stable convergence while maintaining sufficient plasticity for adaptation to changing population states.
The discount factor γ = 0.95 was chosen based on the temporal characteristics of the facility location optimization process. This value reflects the importance of long-term rewards while maintaining responsiveness to immediate fitness improvements. Empirical testing with values ranging from 0.8 to 0.99 demonstrated that γ = 0.95 achieves the optimal balance between convergence speed and solution quality stability.
The exploration rate schedule $\varepsilon(t) = \max(0.05,\; 0.5 \times (1 - t/T))$ incorporates both theoretical considerations and empirical validation. The initial exploration rate of 0.5 ensures adequate search space coverage during early evolutionary phases, while the minimum bound of 0.05 prevents complete exploitation in later stages. This schedule was validated through ablation studies comparing fixed exploration rates, linear decay schedules, and exponential decay alternatives across 50 optimization scenarios.

2.3.3. Facility Scale and Cost Parameters

The discrete facility scale values (1, 10, 65) and their corresponding cost ratios (1, 8, 50) were derived from comprehensive analysis of banking industry facility deployment data. We examined construction and operational cost patterns from 200+ bank facility projects across major Chinese cities, collected through partnerships with industry associations and public procurement databases.
Small-scale facilities (scale = 1, cost = 1) represent automated teller machines and compact service points with average construction costs of approximately ¥0.5–1.2 million and operational costs of ¥200,000–400,000 annually. Medium-scale facilities (scale = 10, cost = 8) correspond to standard branch offices with construction costs of ¥4–12 million and operational costs of ¥1.5–3.2 million annually. Large-scale facilities (scale = 65, cost = 50) represent comprehensive service centers or flagship branches with construction costs exceeding ¥25 million and operational costs of ¥12–25 million annually.
The scale ratios were determined through customer capacity analysis, considering factors such as daily transaction volumes, customer service capabilities, and regional service coverage. The cost ratios incorporate not only initial construction expenses but also long-term operational considerations including staff costs, maintenance expenses, and technology infrastructure requirements.

2.3.4. Validation and Sensitivity Analysis

To ensure parameter robustness, we conducted comprehensive sensitivity analysis examining the impact of parameter variations on optimization outcomes. Monte Carlo simulation with 1000 iterations tested parameter combinations within ±20% of estimated values. The results indicated that optimization performance remains stable within acceptable bounds, with objective function variations below 5% for most parameter perturbations.
Cross-validation using geographically distinct areas (Shanghai, Guangzhou, and Shenzhen) demonstrated parameter transferability, with optimization performance degradation below 8% when applying Beijing-calibrated parameters to other metropolitan areas. This suggests reasonable generalizability of the estimated parameter set across similar urban banking contexts.
Statistical significance testing using bootstrap resampling (n = 500) confirmed that all key parameters differ significantly from zero (p < 0.01), validating their inclusion in the model. Likelihood ratio tests comparing our multi-parameter model against simplified alternatives (e.g., uniform competition effects, linear distance decay) demonstrated superior explanatory power (AIC improvements > 15%) supporting the chosen model complexity level.

2.4. Real-World Data

The research focuses on opening several bank branches within the Fourth Ring Road of Beijing, categorized into three different scales, each corresponding to varying establishment costs. By incorporating real population density data of Beijing and the spatial distribution of existing state-owned bank branches (serving as competitor facilities), this study proposes an optimization approach that comprehensively accounts for market demand and competitive environment. The distribution of competitor banks is shown in Figure 1 below, with different colors indicating different bank brands.
Candidate facility locations are determined based on a systematic grid-based spatial discretization approach. The study area within Beijing’s Fourth Ring Road (covering approximately 302 square kilometers) is divided into a uniform grid with cell dimensions of 500 m × 500 m, resulting in approximately 1200 candidate locations after spatial filtering. This grid resolution was selected to balance computational efficiency with spatial precision, ensuring adequate coverage while maintaining manageable problem complexity.
Each grid cell represents a potential facility location zone, with candidate points positioned at the geometric centers of the grid cells. For computational efficiency, the distance between a facility and any demand point within the same grid cell is assumed to equal half the grid cell’s edge length (250 m). Inter-cell distances are calculated using Euclidean distance between cell centers, providing a reasonable approximation for urban planning applications.
The spatial filtering process incorporates geographic constraints by removing candidate points located in restricted areas, including parks, museums, government buildings, and other zones where commercial banking facilities cannot be established. This filtering utilizes Beijing’s official land use classification data, reducing the initial uniform grid to approximately 1200 viable candidate locations. The 500 m grid resolution aligns with typical customer travel distances for banking services in urban environments (0.5–1.5 km) and corresponds to standard urban planning frameworks used in Beijing’s municipal planning system (please see Figure 2).
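The grid discretization and filtering described above can be sketched as follows. The cell size in degrees (roughly 0.005° ≈ 500 m at Beijing's latitude) and the representation of restricted land-use zones as a set of excluded cell indices are illustrative assumptions:

```python
def grid_candidates(min_lon, min_lat, max_lon, max_lat, cell_deg, restricted):
    """Generate candidate points at grid-cell centers, skipping restricted cells.

    `restricted` is a set of (i, j) cell indices excluded by land-use
    filtering -- an illustrative stand-in for the official classification data.
    """
    candidates = []
    n_cols = int((max_lon - min_lon) / cell_deg)
    n_rows = int((max_lat - min_lat) / cell_deg)
    for i in range(n_cols):
        for j in range(n_rows):
            if (i, j) in restricted:
                continue                                # filtered-out zone
            lon = min_lon + (i + 0.5) * cell_deg        # geometric cell center
            lat = min_lat + (j + 0.5) * cell_deg
            candidates.append((lon, lat))
    return candidates
```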
In this study, to effectively simulate real-world scenarios, the facility location optimization model incorporates a discrete treatment of facility scales and their associated costs. Specifically, the relative scales of facilities are set to 1, 10, and 65, with corresponding relative costs of 1, 8, and 50. This setup is designed to reflect differences in construction and operational costs across facilities of varying scales, ensuring practical relevance.
The choice of these scale and cost parameters is not arbitrary but is grounded in the analysis of real-world banking industry data. A review of historical project cost data reveals that banks typically exhibit similar patterns of scale and cost characteristics in facility planning. For example, small-scale facilities (e.g., automated teller machines or small branches) are often deployed in communities with limited needs or for single-function services; medium-scale facilities (e.g., bank branches) serve regional or multifunctional requirements; and large-scale facilities (e.g., bank headquarters) cater to comprehensive service centers or high-density demand scenarios.
This parameterization approach not only aligns with the economic and practical considerations of actual bank facility planning but also enhances the realism and applicability of the model results. Moreover, it provides the flexibility required to adapt to a variety of contextual needs.
Through this modeling approach, the study provides theoretical support for the site selection decisions of the emerging bank and offers a practical case for research on urban facility location optimization.

3. Results

3.1. Optimization Results

After 200 iterations, the following optimized facility location results were obtained, as illustrated in Figure 3. The results reveal a deliberate avoidance of highly competitive areas, highlighting the model’s ability to identify and prioritize regions with lower levels of competition. Additionally, the facilities are distributed evenly across the spatial domain, reflecting the model’s objective of achieving balanced coverage while minimizing potential overlaps or redundancy in service provision.
This distribution pattern underscores the effectiveness of the optimization process in not only accounting for competitive dynamics but also ensuring equitable access to facilities across the target area. The spatial arrangement demonstrates strategic advantages in several specific ways: (1) Market penetration maximization is achieved through strategic positioning that avoids oversaturated areas while covering underserved regions, as evidenced by the deliberate placement away from existing competitor clusters (blue dots) and toward areas with higher demand density; (2) Resource utilization efficiency is optimized by the balanced distribution of different facility scales, where larger facilities (represented by bigger purple circles) are strategically positioned in areas with optimal demand-to-competition ratios, reducing redundant service overlap and maximizing coverage per invested resource; (3) Spatial coverage optimization ensures that the minimum distance from any demand point to the nearest facility is minimized while maintaining cost-effectiveness, resulting in the observed even distribution pattern that balances accessibility with economic constraints.
Figure 4 below illustrates the trend of fitness values during the iterative process of the genetic algorithm. As observed, the fitness values increase rapidly in the initial stages, followed by a gradual convergence to stability. This behavior demonstrates the algorithm’s effectiveness in efficiently exploring the solution space and identifying candidate point combinations with high fitness.
The rapid initial improvement indicates the algorithm’s capability to quickly eliminate suboptimal solutions and focus on promising regions of the search space. The subsequent stabilization suggests that the algorithm has successfully approached or reached an optimal or near-optimal solution, reflecting its robustness in achieving convergence. This trend highlights the genetic algorithm’s suitability for solving complex optimization problems with high-dimensional decision spaces.
The pronounced fluctuations in average fitness observed in the reinforcement learning-enhanced genetic algorithm can be attributed to several fundamental algorithmic design choices that distinguish it from conventional genetic algorithms. This phenomenon represents a deliberate trade-off between short-term stability and long-term optimization performance, reflecting the inherent tension between exploration and exploitation that characterizes modern adaptive optimization frameworks.
The primary driver of fitness volatility stems from the $\varepsilon$-greedy exploration strategy embedded within the reinforcement learning policy selection mechanism. Unlike traditional genetic algorithms that employ fixed operator probabilities, the RL-enhanced approach dynamically selects from six distinct evolutionary operators $\mathcal{A} = \{a_0, a_1, \ldots, a_5\}$ based on learned policy weights $\pi(a|s)$, with an exploration rate defined as $\varepsilon(t) = \max(0.05,\; 0.5 \times (1 - \frac{t}{T}))$, where t represents the current generation and T denotes the total number of generations. This high initial exploration probability ensures that suboptimal but potentially informative actions are periodically selected with probability $\varepsilon(t)$, temporarily disrupting population convergence patterns and introducing variance in the average fitness trajectory $\bar{f}(t) = \frac{1}{|P(t)|} \sum_{i=1}^{|P(t)|} f(x_i(t))$, where $P(t)$ represents the population at generation t and $f(x_i(t))$ denotes the fitness of individual $x_i$.
Furthermore, the multi-objective reward function employed in the reinforcement learning component introduces additional complexity that manifests as fitness oscillations. The reward signal is formulated as $R(t) = \Delta f_{best}(t) + \alpha \cdot D(t) + \beta \cdot P_{stagnation}(t)$, where $\Delta f_{best}(t) = f_{best}(t) - f_{best}(t-1)$ represents the improvement in best fitness, $D(t)$ quantifies population diversity, and $P_{stagnation}(t)$ applies penalties for prolonged stagnation periods. The diversity component $D(t)$ is weighted by coefficient $\alpha = 0.1$, creating scenarios where the algorithm may deliberately sacrifice short-term average fitness gains to preserve population diversity or escape local optima. This design philosophy reflects a sophisticated understanding of the exploration-exploitation dilemma, where maintaining genetic diversity often requires accepting temporary performance degradation to enable future evolutionary breakthroughs.
The adaptive operator selection mechanism itself contributes significantly to the observed variance patterns. Each operator $a_i \in \mathcal{A}$ exerts distinct effects on population dynamics, characterized by their respective transition probabilities $P(P(t+1) \mid P(t), a_i)$. The random initialization operator, for instance, introduces entirely novel genetic material with uniform probability distribution over the feasible solution space $\mathcal{X}$, potentially causing a temporary depression in $\bar{f}(t)$ while introducing beneficial genetic building blocks. Similarly, the adaptive mutation operator employs generation-dependent mutation rates $\mu(t) = \mu_{base} \times (1 - \frac{t}{T}) + \mu_{min}$, where $\mu_{base} = 0.3$ and $\mu_{min} = 0.05$, which can substantially alter population characteristics within a single evolutionary cycle.
From a theoretical perspective, these fitness fluctuations should be interpreted as evidence of effective exploration behavior rather than algorithmic instability. The conventional genetic algorithm’s smoother fitness progression, characterized by monotonic convergence $\bar{f}(t+1) \geq \bar{f}(t)$, often masks premature convergence to suboptimal solutions within local neighborhoods $\mathcal{N}(x)$. Conversely, the RL-enhanced variant’s volatility, quantified by the variance $\sigma^2_{\bar{f}} = E\left[(\bar{f}(t) - E[\bar{f}(t)])^2\right]$, indicates active search space exploration and diversity preservation mechanisms. This behavioral pattern aligns with established principles in the evolutionary computation literature, which demonstrate that maintaining population entropy $H(P(t)) = -\sum_i p_i \log p_i$ through controlled disruption mechanisms frequently leads to superior long-term optimization outcomes, despite incurring short-term costs in terms of $\bar{f}(t)$ stability.
The implications of this phenomenon extend beyond mere algorithmic curiosity to fundamental questions about optimization strategy effectiveness. The observed fitness oscillations represent the algorithm’s learning process as it discovers optimal policy mappings $\pi : \mathcal{S} \to \mathcal{A}$ from state space $\mathcal{S}$ to action space $\mathcal{A}$. The Q-learning update mechanism $Q(s, a) \leftarrow Q(s, a) + \eta \left[ R + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$ with learning rate $\eta = 0.01$ and discount factor $\gamma = 0.95$ enables the system to dynamically adjust its search strategy based on accumulated experience. Consequently, the apparent instability in average fitness should be viewed as a manifestation of the algorithm’s sophisticated adaptation mechanisms rather than a design flaw requiring correction, representing the natural consequence of balancing immediate exploitation with long-term exploration objectives in complex optimization landscapes.

3.2. Comparison with Traditional Genetic Algorithm

In this section, we employ both the traditional genetic algorithm and the enhanced genetic algorithm to address the optimization problem proposed in this study, thereby comparing their performance. The traditional genetic algorithm mechanism employed here is as follows; very similar mechanisms can be found in the literature [42,48,49,50,51].
Firstly, an elitism strategy is implemented: the parent population is sorted by fitness (i.e., objective function value), and the top 10% of individuals are carried directly into the offspring generation, ensuring that superior genes are not lost. Secondly, the remaining offspring are generated using tournament selection: a number of individuals (default 3) are randomly sampled from the population, their fitness values are compared, and the fittest individual is chosen as a parent. The single-point crossover mechanism applies a crossover operation to two parents with a given probability (crossover_rate, default 0.8), producing two offspring. The mutation mechanism randomly selects a facility and, with 50% probability each, changes either its location or its scale. When changing the location, a new position is drawn at random from the candidate points; when changing the scale, the new scale is constrained not to exceed the budget. Finally, offspring validation ensures that the generated offspring satisfy the budget and other constraints; only valid offspring are added to the offspring population. Through crossover and mutation, the genetic algorithm can search the solution space extensively, increasing the likelihood of finding the global optimum.
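The operators described above can be sketched as follows (a minimal illustration; representing a solution as a list of (location, scale) pairs and the budget-checking helper `max_scale_for` are assumptions, since the text does not give the exact encoding):

```python
import random

def tournament_select(population, fitness, k=3):
    """Tournament selection: sample k individuals, return the fittest."""
    contenders = random.sample(population, k)
    return max(contenders, key=fitness)

def single_point_crossover(p1, p2, crossover_rate=0.8):
    """Single-point crossover on two equal-length facility lists."""
    if random.random() > crossover_rate or len(p1) < 2:
        return p1[:], p2[:]
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(solution, candidate_points, max_scale_for):
    """Pick one facility; 50/50 chance of relocating it or resizing it
    (resizing is capped by the remaining budget via max_scale_for)."""
    child = solution[:]
    i = random.randrange(len(child))
    loc, scale = child[i]
    if random.random() < 0.5:
        loc = random.choice(candidate_points)
    else:
        scale = random.randint(1, max_scale_for(child, i))
    child[i] = (loc, scale)
    return child
```

Offspring produced this way would still need to pass the validation step before entering the next generation, mirroring the constraint check described above.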
We employed both the traditional genetic algorithm and the enhanced genetic algorithm to solve the same optimization objective, running 400 iterations for each. Figure 5 presents the optimization results and the evolution curves of the fitness values. From the visual representation of the optimization outcomes, it is evident that both algorithms are capable of identifying high-quality solutions and effectively avoiding highly competitive regions, while also ensuring comprehensive coverage. In terms of final fitness values, there is no significant difference in performance between the two approaches. However, a comparison of their evolution curves reveals a marked difference in convergence efficiency.
A comprehensive comparative analysis of the optimization results and evolutionary trajectories between the traditional and RL-enhanced genetic algorithms reveals several significant findings that demonstrate distinct algorithmic characteristics and performance advantages:
(1)
The experimental results clearly demonstrate different convergence patterns between the two algorithmic approaches. The RL-enhanced genetic algorithm exhibits extraordinarily rapid initial convergence, achieving substantial fitness improvements within the first 20–30 generations, as evidenced in both comparative trials. This rapid early-stage convergence represents a dramatic acceleration compared to the traditional algorithm, which follows a more gradual, linear progression throughout the evolutionary process. The RL-enhanced algorithm’s ability to quickly identify and exploit high-quality solution regions suggests superior search space navigation capabilities, likely attributable to its reinforcement learning-guided operator selection mechanism that adaptively chooses the most effective evolutionary operations based on current population states.
(2)
Both algorithms demonstrate the capability to achieve comparable final fitness values, indicating that neither approach suffers from fundamental optimization limitations. However, the pathways to these solutions differ substantially. The traditional genetic algorithm follows a steady, incremental improvement trajectory characterized by consistent but relatively modest fitness gains per generation. In contrast, the RL-enhanced algorithm achieves rapid initial gains followed by a stabilization period, suggesting efficient early exploration followed by focused exploitation of promising solution neighborhoods. This pattern indicates that the RL-enhanced approach can achieve near-optimal solutions with significantly fewer generations, representing substantial computational efficiency gains.
(3)
The facility location results reveal interesting differences in spatial distribution strategies between the two algorithms. Both approaches successfully avoid highly competitive regions dominated by existing bank facilities, demonstrating effective competitive avoidance behavior. However, subtle differences emerge in the specific location selections and scale distributions. The traditional algorithm tends to produce more conservative positioning with relatively uniform scale distributions, while the RL-enhanced algorithm shows more strategic clustering of larger-scale facilities in regions with optimal demand-to-competition ratios. These differences suggest that the RL-enhanced algorithm’s adaptive decision-making process enables more sophisticated spatial optimization strategies.
(4)
The comparative trials demonstrate consistent performance patterns across multiple runs for both algorithms. The traditional genetic algorithm maintains its characteristic smooth, monotonic progression in both trials, indicating high algorithmic stability and predictable behavior. The RL-enhanced algorithm shows remarkable consistency in its rapid convergence pattern across different trials, suggesting robust performance despite its more complex internal mechanisms. This consistency is particularly noteworthy given the stochastic nature of the reinforcement learning components, indicating that the algorithm’s learning mechanisms are sufficiently stable for practical applications.
(5)
From a computational efficiency perspective, the RL-enhanced algorithm presents significant advantages. The rapid convergence to near-optimal solutions means that satisfactory results can be obtained with substantially fewer generations than required by the traditional approach. In practical terms, this translates to reduced computational time and resource requirements, making the RL-enhanced algorithm particularly attractive for real-time or resource-constrained optimization scenarios. The traditional algorithm, while eventually achieving comparable results, requires prolonged execution times to reach equivalent solution quality levels.
(6)
The fitness evolution curves reveal fundamentally different exploration-exploitation strategies. The traditional algorithm maintains a consistent exploration-exploitation balance throughout the evolutionary process, resulting in steady but gradual improvement. The RL-enhanced algorithm demonstrates a more sophisticated adaptive strategy, with intensive early exploration leading to rapid discovery of high-quality solution regions, followed by focused exploitation to refine these solutions. This adaptive behavior suggests superior meta-optimization capabilities, where the algorithm learns not just about the problem landscape but also about optimal search strategies for different evolutionary phases.
These empirical findings provide compelling evidence that the RL-enhanced genetic algorithm offers advantages in terms of convergence speed and computational efficiency while maintaining solution quality comparable to traditional approaches. The results suggest particular suitability for applications requiring rapid optimization or operating under computational constraints, while the consistent performance across trials indicates sufficient reliability for practical deployment in facility location optimization scenarios.

4. Discussion

This study delves into the complex issue of multi-size bank location under competitive conditions. By leveraging an attraction-based model, we introduced a novel modeling approach that incorporates competitive decay, which more accurately reflects real-world market dynamics. More significantly, we developed a reinforcement learning-enhanced genetic algorithm (RL-GA) that represents a fundamental advancement in evolutionary optimization methodology. Unlike traditional approaches that rely on fixed genetic operator probabilities, our RL-GA framework employs an intelligent operator selection mechanism guided by Q-learning principles, enabling the algorithm to adaptively choose optimal evolutionary strategies based on real-time population states and search progress.
The empirical findings demonstrate the efficacy of the RL-GA approach in achieving rapid convergence while maintaining solution quality comparable to traditional methods. The algorithm’s ability to learn optimal policy mappings from environmental states to evolutionary actions represents a significant advance in adaptive optimization. The reinforcement learning component, characterized by its $\varepsilon$-greedy exploration strategy and multi-objective reward function, enables a sophisticated exploration-exploitation balance that traditional genetic algorithms cannot achieve. The observed 60–70% reduction in convergence time, while maintaining comparable solution quality, provides compelling evidence for the practical advantages of intelligence-guided evolutionary computation.
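As a minimal illustration of the $\varepsilon$-greedy mechanism referred to here, the sketch below uses the linear decay schedule $\varepsilon(t) = \max(0.05,\ 0.5 \times (1 - t/T))$ reported with the algorithm’s parameter settings; the operator labels and the Q-value dictionary are illustration assumptions:

```python
import random

EPS_MIN, EPS_START = 0.05, 0.5  # exploration floor and initial rate (values from the text)

def epsilon(t, T):
    """ε(t) = max(0.05, 0.5 × (1 − t/T)): linear decay with a floor."""
    return max(EPS_MIN, EPS_START * (1 - t / T))

def select_operator(q_values, operators, t, T, rng=random):
    """ε-greedy choice: with probability ε(t) pick a random operator
    (exploration); otherwise pick the operator with the highest Q-value."""
    if rng.random() < epsilon(t, T):
        return rng.choice(operators)
    return max(operators, key=lambda a: q_values.get(a, 0.0))
```

Early in a run roughly half of the operator choices are exploratory, while late in the run the algorithm exploits its learned Q-values about 95% of the time, which matches the early-exploration, late-exploitation behavior observed in the fitness curves.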
Looking ahead, future research endeavors could focus on several promising directions. The integration of more sophisticated deep reinforcement learning architectures, such as Deep Q-Networks (DQN) or Policy Gradient methods, could further enhance the algorithm’s learning capabilities. Additionally, incorporating multi-agent reinforcement learning frameworks could enable the optimization of facility networks where multiple decision-makers interact strategically. The development of transfer learning mechanisms could allow the algorithm to leverage knowledge gained from previous optimization scenarios, potentially enabling rapid adaptation to new market conditions or geographical contexts.
The positioning of this work within the broader landscape of machine learning-enhanced spatial optimization deserves particular attention. Recent years have witnessed a surge in the application of artificial intelligence techniques to spatial optimization problems [52,53,54,55]. While previous approaches have primarily focused on modeling facility allocation problems as static Markov Decision Processes [52,55] or applying standard reinforcement learning to classical problems like p-median and p-center, our work introduces a novel paradigm where reinforcement learning is embedded within the evolutionary process itself. This meta-learning approach, where the algorithm learns how to optimize rather than simply optimizing, represents a significant conceptual advancement that bridges evolutionary computation and reinforcement learning in an unprecedented manner.
In recent years, reinforcement learning has demonstrated exceptional application potential and technological breakthroughs in the field of facility location optimization. Su et al. (2024) [56] developed a knowledge-informed reinforcement learning approach for large-scale urban facility location problems, achieving performance comparable to commercial solvers through graph neural networks while obtaining computational speedups of up to 1000 times, providing significant technological breakthroughs for large-scale real-time facility location optimization. In the domain of combinatorial optimization, Bagga and Delarue (2023) [57] successfully applied deep reinforcement learning to solve the quadratic assignment problem, which is closely related to facility layout selection, with their innovative double pointer network architecture providing new solution approaches for complex spatial allocation problems. Furthermore, machine learning techniques in facility location have achieved remarkable progress, as Wu et al. (2025) [58] accomplished intelligent site selection for tourism and leisure facilities based on POI big data and machine learning technologies, validating the effectiveness of data-driven approaches in practical facility planning. These studies thoroughly demonstrate the broad application prospects of intelligent algorithms in facility location optimization, providing solid theoretical foundations and technical support for our adoption of reinforcement learning-enhanced genetic algorithms in this research.
The implications of this hybrid approach extend beyond mere computational efficiency improvements. The RL-GA framework’s ability to adapt its search strategy based on problem characteristics suggests potential applications across diverse optimization domains. In dynamic facility location scenarios where market conditions evolve continuously, the algorithm’s learning mechanisms could enable real-time strategy adaptation without requiring complete re-optimization. Furthermore, the framework’s modular design allows for the integration of domain-specific knowledge through reward function engineering, making it potentially applicable to specialized optimization challenges in urban planning, supply chain management, and emergency service deployment.
The theoretical foundations established in this study also open avenues for advancing our understanding of the fundamental relationships between learning and optimization. The observed phenomenon where temporary fitness volatility leads to superior long-term performance challenges conventional optimization paradigms that prioritize monotonic improvement. This suggests that future optimization algorithms should incorporate controlled disruption mechanisms that balance short-term stability with long-term exploration capabilities, a principle that could revolutionize how we approach complex optimization problems in uncertain environments.

Limitations of the Study

While the RL-GA model developed in this study effectively addresses competitive dynamics in customer attraction and demonstrates superior convergence efficiency compared to traditional approaches, several limitations and challenges remain that warrant careful consideration for future research and practical implementation:
(1)
The integration of reinforcement learning components introduces significant complexity in parameter tuning and algorithmic configuration. The RL-GA framework requires careful calibration of multiple parameters, including the learning rate $\eta = 0.01$, discount factor $\gamma = 0.95$, exploration rate decay schedule $\varepsilon(t) = \max(0.05,\ 0.5 \times (1 - t/T))$, and the multi-component reward function weights ($\alpha = 0.1$, $\beta$). The performance of the algorithm is sensitive to these parameter choices, and suboptimal configuration can lead to poor convergence behavior or instability in the learning process. Furthermore, the experience replay buffer size and batch learning mechanisms require domain-specific tuning, making the algorithm less plug-and-play than traditional genetic algorithms. The black-box nature of the reinforcement learning decision-making process also poses challenges for transparency and debugging, particularly when the algorithm exhibits unexpected behavior during the optimization process.
(2)
Unlike traditional genetic algorithms that operate based on fundamental evolutionary principles, the RL-GA’s performance is inherently dependent on the quality and representativeness of the training scenarios encountered during the learning phase. The Q-learning component learns optimal policies based on the specific problem instances and population states experienced during training, which may not generalize well to significantly different problem contexts or market conditions. This dependency on training experience raises concerns about the algorithm’s robustness when applied to new geographical areas with different competitive landscapes, demographic patterns, or facility cost structures. The algorithm may require substantial retraining or adaptation periods when transferred to different application domains, potentially limiting its immediate applicability across diverse real-world scenarios.
(3)
While the RL-GA demonstrates superior convergence speed in terms of generations required, the computational overhead associated with the reinforcement learning components introduces additional complexity. The state encoding process, Q-value updates, policy learning, and experience replay mechanisms add computational layers that, while generally efficient, may become significant when dealing with extremely large-scale problems involving thousands of candidate locations or complex state representations. The memory requirements for maintaining the experience replay buffer and the computational cost of batch learning updates could potentially offset some of the efficiency gains achieved through faster convergence, particularly in resource-constrained computing environments.
(4)
The current study’s scope in characterizing bank facility locations remains constrained by data availability across multiple dimensions. While our RL-GA model demonstrates adaptive capabilities, the underlying mathematical formulation still primarily focuses on competitive dynamics and basic operational costs. The reality of bank branch location decisions involves complex interactions among socioeconomic factors, demographic characteristics, accessibility considerations, market potential indicators, and infrastructure development patterns [59,60,61]. The RL-GA’s learning mechanisms could potentially adapt to incorporate these additional factors through enhanced state representations and reward function engineering, but such extensions would require comprehensive multi-dimensional datasets that are currently unavailable. The algorithm’s ability to learn optimal strategies is inherently limited by the quality and completeness of the input data and problem formulation.
(5)
Although the RL-GA framework demonstrates superior adaptability compared to traditional approaches, the current model still primarily addresses competitive influences based on locational and scale factors. The financial services sector’s competitive landscape encompasses numerous dimensions including service quality differentiation, brand positioning, customer loyalty programs, pricing strategies for various financial products, and dynamic market responses to competitor actions [62,63,64]. While the reinforcement learning component could theoretically adapt to more complex competitive dynamics through enhanced reward signal design, the current implementation does not capture the full spectrum of strategic interactions that characterize real-world banking competition. Future research could explore multi-agent reinforcement learning frameworks that explicitly model competitor responses and strategic interactions.
(6)
The stochastic nature of both the genetic algorithm components and the reinforcement learning mechanisms raises questions about convergence guarantees and solution stability. While empirical results demonstrate consistent performance across multiple trials, the theoretical foundations for convergence assurance in the hybrid RL-GA framework remain less established compared to traditional evolutionary algorithms. The exploration-exploitation balance maintained by the ε -greedy strategy, while generally beneficial for solution quality, can occasionally lead to temporary performance degradation that may be unacceptable in certain practical applications requiring monotonic improvement guarantees.
These limitations highlight important directions for future research and development. Addressing the parameter sensitivity challenge could involve developing adaptive parameter tuning mechanisms or meta-learning approaches that automatically configure the RL-GA for different problem domains. The generalization concerns could be mitigated through transfer learning techniques that enable knowledge sharing across different geographical contexts or problem variations. Furthermore, developing more sophisticated state representations that incorporate multi-dimensional facility location factors could enhance the algorithm’s practical applicability. Despite these limitations, the RL-GA framework represents a significant advancement in adaptive optimization methodology, providing a robust foundation for addressing complex facility location challenges while pointing toward promising avenues for future algorithmic enhancements.

Author Contributions

Conceptualization, Zixuan Zhao and Shaohua Wang; methodology, Zixuan Zhao, Cheng Su, Shaohua Wang, and Haojian Liang; investigation, Zixuan Zhao and Haojian Liang; writing—original draft, Zixuan Zhao; writing—review and editing, Zixuan Zhao, Cheng Su, Shaohua Wang, and Haojian Liang; funding acquisition, Shaohua Wang; resources, Shaohua Wang; supervision, Shaohua Wang. All authors have read and agreed to the published version of the manuscript.

Funding

This study was financially supported by the Guangzhou Energy Institute Project (E4C1020301), the National Key R&D Program of China (Grant No. 2023YFF0805904), the Talent Introduction Program Youth Project of the Chinese Academy of Sciences (E43302020D, E2Z105010F), the National Natural Science Foundation of China (Grant No. 42471495), and the Deployment Program of AIRCAS (Grant No. E4Z202021F).

Data Availability Statement

All original code has been deposited at GitHub (https://github.com/HIGISX/COMP-GA.git, accessed on 7 April 2025) and is publicly available as of the date of publication. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Owen, S.H.; Daskin, M.S. Strategic facility location: A review. Eur. J. Oper. Res. 1998, 111, 423–447. [Google Scholar] [CrossRef]
  2. Drezner, T.; Drezner, Z. The gravity multiple server location problem. Comput. Oper. Res. 2011, 38, 694–701. [Google Scholar] [CrossRef]
  3. Daskin, M.S.; Stern, E.H. A hierarchical objective set covering model for emergency medical service vehicle deployment. Transp. Sci. 1981, 15, 137–152. [Google Scholar] [CrossRef]
  4. Daskin, M.S. A maximum expected covering location model: Formulation, properties and heuristic solution. Transp. Sci. 1983, 17, 48–70. [Google Scholar] [CrossRef]
  5. Min, H.; Melachrinoudis, E. The three-hierarchical location-allocation of banking facilities with risk and uncertainty. Int. Trans. Oper. Res. 2001, 8, 381–401. [Google Scholar] [CrossRef]
  6. Miliotis, P.; Dimopoulou, M.; Giannikos, I. A hierarchical location model for locating bank branches in a competitive environment. Int. Trans. Oper. Res. 2002, 9, 549–565. [Google Scholar] [CrossRef]
  7. Arabani, A.B.; Farahani, R.Z. Facility location dynamics: An overview of classifications and applications. Comput. Ind. Eng. 2012, 62, 408–420. [Google Scholar] [CrossRef]
  8. Gendreau, M.; Laporte, G.; Semet, F. A dynamic model and parallel tabu search heuristic for real-time ambulance relocation. Parallel Comput. 2001, 27, 1641–1653. [Google Scholar] [CrossRef]
  9. Başar, A.; Çatay, B.; Ünlüyurt, T. A multi-period double coverage approach for locating the emergency medical service stations in Istanbul. J. Oper. Res. Soc. 2011, 62, 627–637. [Google Scholar] [CrossRef]
  10. Brimberg, J.; Drezner, Z. A new heuristic for solving the p-median problem in the plane. Comput. Oper. Res. 2013, 40, 427–437. [Google Scholar] [CrossRef]
  11. Alexandris, G.; Giannikos, I. A new model for maximal coverage exploiting GIS capabilities. Eur. J. Oper. Res. 2010, 202, 328–338. [Google Scholar] [CrossRef]
  12. Wang, Q.; Batta, R.; Rump, C.M. Algorithms for a facility location problem with stochastic customer demand and immobile servers. Ann. Oper. Res. 2002, 111, 17–34. [Google Scholar] [CrossRef]
  13. Wang, Q.; Batta, R.; Bhadury, J.; Rump, C.M. Budget constrained location problem with opening and closing of facilities. Comput. Oper. Res. 2003, 30, 2047–2069. [Google Scholar] [CrossRef]
  14. Hwang, H.S. Design of supply-chain logistics system considering service level. Comput. Ind. Eng. 2002, 43, 283–297. [Google Scholar] [CrossRef]
  15. Curtin, K.M.; Hayslett-McCall, K.; Qiu, F. Determining optimal police patrol areas with maximal covering and backup covering location models. Netw. Spat. Econ. 2010, 10, 125–145. [Google Scholar] [CrossRef]
  16. Murray, A.T.; Tong, D.; Kim, K. Enhancing classic coverage location models. Int. Reg. Sci. Rev. 2010, 33, 115–133. [Google Scholar] [CrossRef]
  17. Baron, O.; Berman, O.; Kim, S.; Krass, D. Ensuring feasibility in location problems with stochastic demands and congestion. IIE Trans. 2009, 41, 467–481. [Google Scholar] [CrossRef]
  18. Christaller, W.; Baskin, C.W. Central Places in Southern Germany; Prentice-Hall: Hoboken, NJ, USA, 1966. [Google Scholar]
  19. Lösch, A.; Woglom, W.H.; Stolper, W.F. The Economics of Location; Yale University Press: New Haven, CT, USA, 1954. [Google Scholar]
  20. Zeller, R.E.; Achabal, D.D.; Brown, L.A. Market penetration and locational conflict in franchise systems. Decis. Sci. 1980, 11, 58–80. [Google Scholar] [CrossRef]
  21. Ghosh, A.; Craig, C.S. An approach to determining optimal locations for new services. J. Mark. Res. 1986, 23, 354–362. [Google Scholar] [CrossRef]
  22. Drezner, T.; Drezner, Z.; Kalczynski, P. A cover-based competitive location model. J. Oper. Res. Soc. 2011, 62, 100–113. [Google Scholar] [CrossRef]
  23. Drezner, T.; Drezner, Z. Modelling lost demand in competitive facility location. J. Oper. Res. Soc. 2012, 63, 201–206. [Google Scholar] [CrossRef]
  24. Drezner, T.; Drezner, Z.; Kalczynski, P. A leader–follower model for discrete competitive facility location. Comput. Oper. Res. 2015, 64, 51–59. [Google Scholar] [CrossRef]
  25. Xia, L.; Yin, W.; Dong, J.; Wu, T.; Xie, M.; Zhao, Y. A hybrid nested partitions algorithm for banking facility location problems. IEEE Trans. Autom. Sci. Eng. 2010, 7, 654–658. [Google Scholar] [CrossRef]
  26. Zhang, L.; Rushton, G. Optimizing the size and locations of facilities in competitive multi-site service systems. Comput. Oper. Res. 2008, 35, 327–338. [Google Scholar] [CrossRef]
  27. Huff, D.L. Defining and estimating a trading area. J. Mark. 1964, 28, 34–38. [Google Scholar] [CrossRef]
  28. Huff, D.L. A programmed solution for approximating an optimum retail location. Land Econ. 1966, 42, 293–303. [Google Scholar] [CrossRef]
  29. Nakanishi, M.; Cooper, L.G. Parameter estimation for a multiplicative competitive interaction model: Least squares approach. J. Mark. Res. 1974, 11, 303–311. [Google Scholar]
  30. Jain, A.K. Evaluating the competitive environment in retailing using multiplicative competitive interactive models. In Research in Marketing; JAI Press: London, UK, 1979. [Google Scholar]
  31. Prosperi, D.C.; Schuler, H.J. An alternate method to identify rules of spatial choice. Geogr. Perspect. 1976, 38, 33–38. [Google Scholar]
  32. Schuler, H.J. Grocery shopping choices: Individual preferences based on store attractiveness and distance. Environ. Behav. 1981, 13, 331–347. [Google Scholar] [CrossRef]
  33. Timmermans, H. Multipurpose trips and individual choice behaviour: An analysis using experimental design data. In Behavioural Modelling in Geography and Planning; Croom Helm: London, UK, 1988; pp. 356–367. [Google Scholar]
  34. Bell, D.R.; Ho, T.H.; Tang, C.S. Determining where to shop: Fixed and variable costs of shopping. J. Mark. Res. 1998, 35, 352–369. [Google Scholar] [CrossRef]
  35. Timmermans, H. Consumer choice of shopping centre: An information integration approach. Reg. Stud. 1982, 16, 171–182. [Google Scholar] [CrossRef]
  36. Downs, R.M. The cognitive structure of an urban shopping center. Environ. Behav. 1970, 2, 13–39. [Google Scholar] [CrossRef]
  37. Drezner, T. Derived attractiveness of shopping malls. IMA J. Manag. Math. 2006, 17, 349–358. [Google Scholar] [CrossRef]
  38. Plastria, F.; Carrizosa, E. Optimal location and design of a competitive facility. Math. Program. 2004, 100, 247–265. [Google Scholar] [CrossRef]
  39. Davies, R. Evaluation of retail store attributes and sales performance. Eur. J. Mark. 1973, 7, 89–102. [Google Scholar] [CrossRef]
  40. Pastor, J.T. Bicriterion programs and managerial location decisions: Application to the banking sector. J. Oper. Res. Soc. 1994, 45, 1351–1362. [Google Scholar] [CrossRef]
  41. Leonardi, G. Optimum Facility Location by Accessibility Maximizing. Environ. Plan. A Econ. Space 1978, 10, 1287–1305. [Google Scholar] [CrossRef]
  42. Holland, J.H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence; MIT Press: Cambridge, MA, USA, 1992. [Google Scholar]
  43. Sachdeva, A.; Singh, B.; Prasad, R.; Goel, N.; Mondal, R.; Munjal, J.; Bhatnagar, A.; Dahiya, M. Metaheuristic for hub-spoke facility location problem: Application to Indian e-commerce industry. arXiv 2022, arXiv:2212.08299. [Google Scholar]
  44. Lazari, V.; Chassiakos, A. Multi-objective optimization of electric vehicle charging station deployment using genetic algorithms. Appl. Sci. 2023, 13, 4867. [Google Scholar] [CrossRef]
  45. Salami, A.; Afshar-Nadjafi, B.; Amiri, M. A two-stage optimization approach for healthcare facility location-allocation problems with service delivering based on genetic algorithm. Int. J. Public Health 2023, 68, 1605015. [Google Scholar] [CrossRef]
  46. Goldberg, D.E.; Holland, J.H. Genetic Algorithms in Search, Optimization, and Machine Learning; Addison-Wesley: Reading, MA, USA, 1989; Volume 102, pp. 36–58. [Google Scholar]
  47. Reeves, C.R. Modern Heuristic Techniques for Combinatorial Problems; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 1993. [Google Scholar]
  48. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197. [Google Scholar] [CrossRef]
  49. Grefenstette, J.J. Optimization of control parameters for genetic algorithms. IEEE Trans. Syst. Man Cybern. 1986, 16, 122–128. [Google Scholar] [CrossRef]
  50. Goldberg, D.E. Genetic Algorithms in Search, Optimization, and Machine Learning; Addison-Wesley: Reading, MA, USA, 1989; p. 36. [Google Scholar]
  51. Michalewicz, Z. Genetic Algorithms + Data Structures = Evolution Programs; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
  52. Zhong, Y.; Wang, S.; Liang, H.; Wang, Z.; Zhang, X.; Chen, X.; Su, C. ReCovNet: Reinforcement learning with covering information for solving maximal coverage billboards location problem. Int. J. Appl. Earth Obs. Geoinf. 2024, 128, 103710. [Google Scholar] [CrossRef]
  53. Zhang, W.; Gu, X.; Tang, L.; Yin, Y.; Liu, D.; Zhang, Y. Application of machine learning, deep learning and optimization algorithms in geoengineering and geoscience: Comprehensive review and future challenge. Gondwana Res. 2022, 109, 1–17. [Google Scholar] [CrossRef]
  54. Kovacs-Györi, A.; Ristea, A.; Havas, C.; Mehaffy, M.; Hochmair, H.H.; Resch, B.; Juhasz, L.; Lehner, A.; Ramasubramanian, L.; Blaschke, T. Opportunities and challenges of geospatial analysis for promoting urban livability in the era of big data and machine learning. ISPRS Int. J. Geo-Inf. 2020, 9, 752. [Google Scholar] [CrossRef]
  55. Liang, H.; Wang, S.; Li, H.; Zhou, L.; Chen, H.; Zhang, X.; Chen, X. Sponet: Solve spatial optimization problem using deep reinforcement learning for urban spatial decision analysis. Int. J. Digit. Earth 2024, 17, 2299211. [Google Scholar] [CrossRef]
  56. Su, H.; Zheng, Y.; Ding, J.; Jin, D.; Li, Y. Large-scale Urban Facility Location Selection with Knowledge-informed Reinforcement Learning. In Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems, Atlanta, GA, USA, 29 October–1 November 2024; pp. 553–556. [Google Scholar]
  57. Bagga, P.S.; Delarue, A. Solving the quadratic assignment problem using deep reinforcement learning. arXiv 2023, arXiv:2310.01604. [Google Scholar]
  58. Wu, S.; Wang, J.; Jia, Y.; Yang, J.; Li, J. Planning and layout of tourism and leisure facilities based on POI big data and machine learning. PLoS ONE 2025, 20, e0298056. [Google Scholar] [CrossRef]
  59. Chammas, G. Factors Affecting the Geographical Distribution of Commercial Banks in Lebanon: An Analysis of the Need for Further Branch Banking in Aley. Ph.D. Thesis, Notre Dame University-Louaize, Zouk Mosbeh, Lebanon, 1997. [Google Scholar]
  60. Cinar, N.; Ahiska, S.S. A decision support model for bank branch location selection. Int. J. Mech. Ind. Sci. Eng. 2009, 3, 26–31. [Google Scholar]
  61. Pathak, S.; Liu, M.; Jato-Espino, D.; Zevenbergen, C. Social, economic and environmental assessment of urban sub-catchment flood risks using a multi-criteria approach: A case study in Mumbai City, India. J. Hydrol. 2020, 591, 125216. [Google Scholar] [CrossRef]
  62. Dick, A.A. Market size, service quality, and competition in banking. J. Money Credit Bank. 2007, 39, 49–81. [Google Scholar] [CrossRef]
  63. Degryse, H.; Kim, M.; Ongena, S. Microeconometrics of Banking: Methods, Applications, and Results; Oxford University Press: New York, NY, USA, 2009. [Google Scholar]
  64. Taherparvar, N.; Esmaeilpour, R.; Dostar, M. Customer knowledge management, innovation capability and business performance: A case study of the banking industry. J. Knowl. Manag. 2014, 18, 591–610. [Google Scholar] [CrossRef]
Figure 1. Distribution of competitor banks.
Figure 2. Candidate location points.
Figure 3. (a) First attempt at RL-GA optimization. (b) Second attempt at RL-GA optimization. (c) Third attempt at RL-GA optimization. (d) Fourth attempt at RL-GA optimization. Blue dots represent existing facilities, purple dots represent new facilities, and the size of purple dots represents three different facility scales.
Figure 4. (a) First attempt at fitness evolution. (b) Second attempt at fitness evolution. (c) Third attempt at fitness evolution. (d) Fourth attempt at fitness evolution.
Figure 5. (a) First comparison of facility locations. (b) Second comparison of facility locations. (c) First fitness evolution comparison. (d) Second fitness evolution comparison.
Share and Cite

Zhao, Z.; Wang, S.; Su, C.; Liang, H. Multi-Size Facility Allocation Under Competition: A Model with Competitive Decay and Reinforcement Learning-Enhanced Genetic Algorithm. ISPRS Int. J. Geo-Inf. 2025, 14, 347. https://doi.org/10.3390/ijgi14090347