Research and Modeling of Commercial Location Selection Based on Geographic Big Data and Mobile Signaling Data—A Case Study of the Central Urban Area of Beijing

Jin Zou; Xun Zhang; Yangxiao Cong; Zhentong Gao; Jinlian Shi

doi:10.3390/ijgi13120432

¹ School of Mathematics and Physics, Xinjiang Hetian College, Hetian 848000, China

² School of Computer and Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China

³ Railway Economic and Planning Research Institute, Beijing 100089, China

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2024, 13(12), 432; https://doi.org/10.3390/ijgi13120432

Abstract

The layout and site selection strategy of commercial facilities are crucial for both enterprise performance and market image, while also significantly impacting the overall planning of urban commercial environments. However, conventional methods of choosing sites sometimes depend on outdated management information systems or static statistical models, which may not take into account all relevant factors and have poor data quality. By utilizing geographical big data and geographical artificial intelligence, this study improves the viability of commercial layout and site selection methods. This study utilizes mobile phone signaling data from Beijing combined with point-of-interest (POI) data from within the Sixth Ring Road of Beijing to identify user behaviors using algorithms. Through a combination of BiLSTM-RF and reinforcement learning algorithms, a population location prediction algorithm is constructed to address the issues of inaccurate and outdated population flow data in commercial site selection. The forecast distribution has a high level of accuracy, with a prediction accuracy rate of 73.2%. Additionally, based on geographical big data, the urban landscape is reconstructed to create a 3D model of Beijing. An immersive interactive commercial site selection system is implemented using the Unreal Engine.

Keywords: geographical big data; mobile phone signaling; commercial site selection; algorithm; model

1. Introduction

Commercial site selection refers to the process of systematically analyzing and considering the optimal geographic location for opening a new business, aiming to maximize opportunities that best meet the company’s needs. With ongoing urbanization, commercial site selection plays an increasingly crucial role in urban development. Specifically, commercial site selection is a multi-criteria decision problem [1], where the best option is chosen from several potential solutions based on multiple criteria [2]. Methods for commercial site selection include those based on surveys [3] and those based on data analysis [4]. Surveys often lack real data support and rely on the opinions of a subset of the population, while most data analysis methods consider only a narrow set of factors.

This study argues that the accuracy of site selection is influenced by the distribution of populations, transportation convenience, and the vitality and competitiveness of the surrounding environment. It emphasizes the need to support these factors with real data. Hence, in this study, we present a model for predicting population distribution and a model for selecting commercial sites based on many factors. These models offer a range of potential locations to help choose the optimal place for launching a business.

Over the past decade, cities have become increasingly intelligent, with advancements in information and communication technology enabling the monitoring of urban activities [5]. In recent years, the proliferation of mobile devices and location-based services has generated a wealth of social data, offering methods and opportunities to analyze user locations [6]. Predicting user locations is beneficial for studying population distributions and uncovering potential commercial opportunities. While digital twins are primarily applied in manufacturing, other commercial fields are beginning to recognize their potential uses [7]. Commercial site selection solutions can create a digital twin that updates the virtual world in response to changes in the real world [8]. Digital twins allow for simulations in a virtual environment, thereby identifying the advantages and disadvantages of planned actions in reality [9]. This is especially crucial for choosing business sites because it allows decision-makers to lower risks by simulating scenarios using extensive data.

This study introduces a digital twin commercial site selection model for Beijing that utilizes technologies that include digital twins and geographic information to improve the visualization, management, and decision-making process of commercial site selection. This methodology is a unique and innovative analytical decision-making method. Section 2 reviews related work in various fields. Section 3 discusses the use of mobile signaling data for population distribution prediction and the design of the multi-factor commercial site selection model. Section 4 outlines the digital twin system for commercial site selection, the setup for simulation experiments, and feedback obtained from the site selection results. Section 5 presents the simulation results and discusses future work.

2. Related Work

2.1. Digital Twins

Digital twin (DT) technology, an emerging field based on digital technology and data modeling, aims to create a digital replica or representation of the physical world. The concept of digital twins can be traced back to 1969 [10], but it was not until 2003 that Professor Michael Grieves from the University of Michigan clearly defined the concept. Originally, digital twins were defined by NASA as prototypes for future NASA and U.S. Air Force spacecraft [11]. Initially, digital twins were viewed as digital representations of physical products, but due to technological and cognitive limitations, they did not receive the attention they deserved. However, with the establishment of the concept of digital twin cities, the features of virtual–real interaction, precise mapping, and data sharing provide strong technological support for building digital twin cities [12].

In urban management, the concept of digital twin cities and the technologies that could support them have been widely researched, with the aim of using city twins to assess and predict urban states, thereby aiding decision-makers’ choices [13]. Singapore made an initial attempt at a digital twin smart city in 2017 [14]. One study constructed 3D models of cities and provided urban planning data [7], publicly revealing a digital twin system for Dublin Port in Ireland, demonstrating how users could engage with the model for the urban planning of green spaces and enabling interaction and feedback on planning changes.

2.2. Geolocation Prediction

Geolocation prediction is of significant importance to operators and businesses, as it can enhance personalized service levels and bring substantial economic benefits, including in areas like advertising, information recommendation, and product marketing [15]. Twitter data, as social media data, include geotags and text information, making these data suitable for location prediction. Erica C [16] utilized probabilistic graphical models to achieve location inference from users’ posts and social network information. Another study [17] validated the feasibility of using a Markov chain model for location prediction and explored the construction of Markov chain models using geographic data from Twitter and Foursquare for future path prediction. Clustering methods can partition data with similar spatial attributes into different clusters, thereby uncovering correlations between different location points and more accurately predicting users’ next locations. Research utilizing the semantics of points of interest (POIs) for clustering [18], combined with spatiotemporal data and semantic information, predicts users’ future check-in locations. Additionally, the analysis of Twitter users’ tweet content, combined with Gaussian Mixture Models (GMMs) [19], has been used to predict users’ home addresses, improving the accuracy of home address predictions. Word embedding models, such as Word2Vec [20] and Doc2Vec [21], have been successfully applied to location prediction tasks. These models effectively capture the linguistic patterns within sentences and trajectory points in sequences, inspired by successful vocabulary embedding in natural language processing and text-mining fields, prompting many studies to use word embedding models to infer location embeddings.

2.3. Commercial Site Selection

In the context of the rapid development of urban big data and artificial intelligence, commercial site selection has become a highly focused research area. Commercial site selection not only has significant implications for business interests, but also directly affects the sustainability of future development [22]. For business decision-makers, the metrics for evaluating the value of a candidate location typically involve assessing how many potential customers the location can attract. Research that solely considers geographic location has now evolved [23], with numerous studies emerging that use diverse heterogeneous information from geographic information services, POI data, and more to select the optimal location.

One study [24] proposed a demand-driven method for commercial store location selection, combining multi-source spatiotemporal big data, clustering analysis, and customer count prediction to determine potential store locations and optimize store layouts. In another study [25], geographic location and spatial characteristics were integrated, and a spatial connectivity algorithm was introduced to identify candidate locations, aiming to maximize impact and minimize costs. Yang et al. [26] developed a HoLSAT (hotel location selection and analysis toolset) application, combining WebGIS and machine learning algorithms to obtain hotel locations. Lu et al. [27] combined neural network regression prediction and the MCDM model to predict hotel deployment locations based on taxi GPS data. Liu et al. [28] built an urban billboard site selection system named SmartAdP based on taxi trajectory data; Wang et al. [29] constructed a hybrid BP neural network (backpropagation neural network) algorithm based on urban point-of-interest data to perform shop location selection and visualized the location results.

Furthermore, researchers have also proposed commercial site selection decision-making methods based on geographic spatial data, considering multiple criteria such as competitors, traffic convenience, and traffic conditions. One study [30] utilized an AHP/TOPSIS hybrid method to determine commercial development sites. Another study [31] explored the need to consider various dimensions in commercial site selection decisions, using multi-criteria analysis methods for evaluation, including indicators like climate, safety, and technology, from a commercial perspective.

How to conduct commercial site selection is a valuable scientific question. By using models to predict the travel behavior and locations of crowds, the prediction results can provide important references for commercial site selection. Utilizing digital twin technology allows for the easy viewing and analysis of various forms of information in the target area, helping decision-makers better understand the regional environment and market potential. This not only enhances the scientific and accurate management and decision-making of commercial site selection, but also provides new tools and methods for the sustainable development of urban commercial site selection, which is of significant importance for the development of the commercial site selection field. In this paper, we use mobile signaling data to construct a hierarchical reinforcement learning model to predict crowd movement and subsequently design a multi-factor commercial site selection algorithm for assessment. The site selection results are presented using digital twin technology, where we integrate real geographic information data to construct a 3D model, creating a digital twin of Beijing city and incorporating the site selection results within the digital city.

3. Materials and Methods

3.1. Study Area and Data

This paper selects Beijing as the research area. Located in northern China, Beijing is the capital and a political and cultural center of the country. The total area of the city is 16,410.54 square kilometers, with the built-up area covering 4706.6 square kilometers. Beijing is administratively divided into 16 districts, and urban development primarily focuses on the central urban area and surrounding towns, forming a contiguous urban cluster. Economically, the central urban area is dominated by high-end services, such as finance, technology, and culture, while the surrounding districts are supported by manufacturing, logistics, and other industries. Generally, the urban planning cycle for developed cities spans about 20 years, with minimal changes in urban functions and structures [32]. This study focuses on the area within the Sixth Ring Road of Beijing, which is in the central urban area of Beijing. This study is of great significance to the commercial layout and development of cities.

Mobile signaling data can accurately reflect human movement patterns [33]. The research data in this paper come from the Jizhi network data platform, a SaaS (software-as-a-service) big data platform from which data are extracted using SQL. This study collected 190 million user mobile location records from 1 October to 5 October 2021. After data preprocessing, 48 million records covering 490,000 people were retained for the study. These data capture signals generated by residents traveling within the Sixth Ring Road of Beijing and are therefore bounded by the longitude and latitude range from [116.0977, 40.1730] to [116.7184, 39.7245] (WGS coordinate system). The collection period falls within a holiday season, when commercial vitality is strong and both foreign tourists and local residents generate data, which yields compounded commercial information. The data are therefore representative of the behavioral characteristics of a large number of people.

The data platform has a total of 17 data tables, which are mainly divided into 6 major types, namely, residence, travel, user attributes, code table, base location code table, and travel JVP. In this study, three data tables are used, including a location grid table from the basic location code table (grid), a mobile base station table from the travel table (move_vp), and a user attribute table from the user attribute table (user_attribute). The location grid table includes unique identifier grid_id, side length, and the latitude and longitude of the grid center; the mobile base station table includes the user’s unique identifier uid, travel number, travel order, and the location grid number of the base station passed; and the user attribute table includes the user’s unique identifier uid, gender, age, and place of origin, among other details. The field descriptions of these three data tables are shown in Table 1, Table 2 and Table 3.

Table 1. Location grid table.

Table 2. Traveling through the base station table.

Table 3. User attribute table.

A preliminary examination of the gathered data (Figure 1) shows that user routes follow clear and similar patterns, with the majority of movement taking place between 6 a.m. and 9 p.m. Between 10 a.m. and 1 p.m., there is a discernible drop in the volume of movement, probably because this period falls around lunchtime, when users tend to remain in place and therefore make fewer trips. Between 4 p.m. and 9 p.m., travel declines considerably, which might be attributed to users relaxing at home.

Figure 1. Daily average travel volume of users.

Through the analysis of POI (point-of-interest) data, suggestions for tourism, urban planning, and business decision-making may be made with greater ease and an improved understanding of geographic spatial features and distribution patterns.

In this paper, we collected POI data of Beijing city using Gaode map and Baidu map APIs, which contain information related to name, major category, medium category, latitude and longitude, province, city, and region. Among them, POI types include catering and food, companies and enterprises, shopping and consumption, transportation facilities, financial institutions, hotels and accommodations, science, education and culture, tourist attractions, business and residential, living services, leisure and entertainment, healthcare, and sports and fitness. The above POI types are categorized into four main categories: accommodation, work, dining, and entertainment.

Using the base station location information from the grid table and cell phone signaling data, the attributes of the points of interest within 250 m of each base station are calculated. The POI data covered under each base station are counted and the distance between the base station and the POI is calculated. The constructed POI dataset within the Sixth Ring Road of Beijing is shown in Table 4.

Table 4. POI dataset within the Sixth Ring Road of Beijing.
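The 250 m POI lookup described above can be illustrated with a minimal sketch. It assumes that base stations and POIs are given as longitude and latitude in degrees and that great-circle (haversine) distance is an acceptable approximation; the function names, the dictionary layout, and the Earth-radius constant are illustrative choices and are not taken from the study.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (approximation, assumed here)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pois_near_station(station, pois, radius_m=250.0):
    """Collect POIs within radius_m of a base station and count them per category."""
    nearby = []   # (name, category, distance in meters)
    counts = {}   # category -> number of POIs inside the radius
    for poi in pois:
        d = haversine_m(station["lon"], station["lat"], poi["lon"], poi["lat"])
        if d <= radius_m:
            nearby.append((poi["name"], poi["category"], d))
            counts[poi["category"]] = counts.get(poi["category"], 0) + 1
    return nearby, counts


# Hypothetical example data (not from the paper's dataset).
station = {"lon": 116.400, "lat": 39.900}
pois = [
    {"name": "A", "category": "dining", "lon": 116.401, "lat": 39.900},
    {"name": "B", "category": "work", "lon": 116.410, "lat": 39.900},
]
nearby, counts = pois_near_station(station, pois)
```

For a city-scale dataset, a spatial index or grid bucketing would likely be needed instead of this brute-force loop.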

3.2. Methods

3.2.1. Crowd Distribution Prediction

Stationary Point Identification

A stop point is a specific location where the user stays while moving through the city. As the user moves, mobile signaling data record the base stations that the user connects to at different times and locations; these data contain time information, location information, user information, etc., and can to a certain extent represent the user’s activity trajectory. When the user is traveling, multiple base stations are connected, the neighboring track points are usually far apart, and the track points are relatively dispersed in space. When the user stays, the phone connects to nearby base stations and the neighboring track points are closer together, forming a densely distributed stopping area. Stop points reflect the user’s activity patterns and frequently visited locations in the city, and further analysis of the stop-point data can help to understand the user’s travel behavior and activity characteristics.

Based on these characteristics, a speed-based stop-point recognition algorithm is designed to identify the sequences of stop points in user travel from the processed mobile signaling data. The first step is to order the user’s mobile signaling data, including time and location information, by start time. The second step is to obtain the distance and time difference between each pair of adjacent data points in the sequence and from these calculate the average moving speed between the two points. The third step sets a minimum speed threshold of $v$ meters per second and compares the average moving speed of each pair of adjacent data points with it. An average speed below the threshold indicates that the user has stayed at that location, in which case the start time, end time, and latitude and longitude of the two adjacent data points are added to the stop-point sequence. Repeating this over the whole sequence yields the full stop-point sequence.
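The three steps above can be sketched as follows. This is only an illustration under stated assumptions: records are `(timestamp_seconds, lon, lat)` tuples, distance is measured with the haversine formula, and the threshold value passed as `v_min` is hypothetical because the source does not report the value of the threshold.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6_371_000.0 * math.asin(math.sqrt(a))


def find_stop_points(records, v_min):
    """Return stop segments whose average speed is below v_min (m/s)."""
    # Step 1: order the signaling records by time.
    ordered = sorted(records, key=lambda r: r[0])
    stops = []
    for (t0, lon0, lat0), (t1, lon1, lat1) in zip(ordered, ordered[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate timestamps to avoid division by zero
        # Step 2: average speed between adjacent records.
        speed = haversine_m(lon0, lat0, lon1, lat1) / dt
        # Step 3: keep the pair when the speed is below the threshold.
        if speed < v_min:
            stops.append({"start": t0, "end": t1,
                          "from": (lon0, lat0), "to": (lon1, lat1)})
    return stops


# Hypothetical records: a 10-minute stay followed by a fast move.
records = [(900, 116.4500, 39.9), (0, 116.4000, 39.9), (600, 116.4001, 39.9)]
stops = find_stop_points(records, v_min=0.5)
```

Consecutive stop segments are left unmerged here; a production version would probably merge adjacent segments into a single stay area.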

User Travel Behavior Recognition

By using time and location data from activity stay zones, it is possible to deduce the purpose of individuals’ actions in certain regions, analyze their behaviors, and consequently create users’ activity chains. An in-depth understanding of users’ travel habits in urban environments yields valuable insights into their daily living patterns, preferences, and requirements. This information is crucial for urban planning, traffic management, and commercial decision-making processes. For instance, it is possible to deduce users’ daily work and leisure activities, as well as their lifestyles and purchasing patterns, by identifying often visited commercial places, entertainment venues, and other sites. This facilitates corporate decision-makers in gaining a more comprehensive understanding of the requirements and actions of inhabitants, therefore equipping enterprises with accurate positioning and marketing strategies. Hence, comprehending the travel habits of users is the primary objective of this study.

After extracting the user’s travel stops and POI dataset, in order to further obtain the user’s travel behavior, this paper constructs a user travel behavior recognition model and semantically constructs the user’s trajectory to give the user’s behavioral recognition results for each section of trajectory points. Based on the relationship between time and space, a spatiotemporal feature-based user travel behavior recognition model is proposed. The semantic trajectory of a user’s travel is calculated based on the nearby POI (point-of-interest) set, with varying weights based on distance and time. Closer proximity to a POI increases its weight, and alignment with typical times for activities such as lodging, work, dining, and entertainment also increases its weight. The formula for calculating semantic features is given in Equation (1):

Feature_{ij} = \sum_{n = 1}^{N} (α_{n} \times Distance_{n} + β_{n} \times Time_{n})

(1)

where $Feature_{ij}$ represents the semantic feature of the $j$-th record of the $i$-th user, $N$ is the number of POIs in the nearby POI set, $α_{n}$ and $β_{n}$ are the distance weight and the time weight of the $n$-th POI, $Distance_{n}$ is the distance between the $n$-th POI location and the user location, and $Time_{n}$ is the intersection between the time in the user’s semantic trajectory and the working time of the $n$-th POI.
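As a worked illustration of Equation (1), the sketch below evaluates the weighted sum for one stay record. It assumes that the weights are supplied per POI (the source does not state how they are derived) and that the time term is the overlap, in hours, between the stay interval and the POI's working hours; the field names are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two intervals (0 if they do not overlap)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def semantic_feature(stay_start, stay_end, pois):
    """Sum of alpha_n * Distance_n + beta_n * Time_n over the nearby POI set."""
    total = 0.0
    for p in pois:
        time_term = overlap(stay_start, stay_end, p["open"], p["close"])
        total += p["alpha"] * p["distance"] + p["beta"] * time_term
    return total


# Hypothetical stay from 12:00 to 13:00 near two POIs.
pois = [
    {"alpha": 0.01, "beta": 1.0, "distance": 100.0, "open": 11.0, "close": 14.0},
    {"alpha": 0.0, "beta": 2.0, "distance": 200.0, "open": 18.0, "close": 22.0},
]
feature = semantic_feature(12.0, 13.0, pois)
```

In practice one such value would be computed for each of the four behavior categories (accommodation, work, dining, entertainment), and the category with the largest value taken as the recognized behavior; that selection step is an assumption here.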

3.2.2. Location Prediction Model

Predicting users’ locations is a critical task in many applications, including navigation systems and point-of-interest recommendations. Numerous studies have proposed various algorithms to predict the next location people will move to, often based on open social media data or GPS trajectory data. However, there is relatively little research using mobile signaling data for this purpose. Utilizing mobile signaling data to predict users’ future locations has significant research value [34], as the information about when and where populations appear in geographic areas is crucial for urban planning, commercial site selection, and tourism resource development.

In this study, we propose a BiLSTM-RF model based on mobile signaling data. Using mobile signaling data as the foundational dataset, we construct a frequent semantic dataset for users as input to the BiLSTM-RF model, aiming to predict users’ next locations in their movement processes. This model integrates user behavior, a multi-layer attention mechanism, and a bidirectional long short-term memory network. The model framework is illustrated in Figure 2.

Figure 2. BiLSTM-RF position prediction model framework.

Spatiotemporal Semantic Vectors

The user’s trajectory records include time information represented as integer timestamps. To construct the spatiotemporal semantic vector corresponding to a location, we first create a time vector of length N. Then, we construct a functional semantic vector of length Z, where Z is the number of user behaviors. Using the TF-IDF (term frequency–inverse document frequency) algorithm, we determine the attractiveness of each behavior category for the user. The TF-IDF formula is given in Equation (2):

W_{d} = f_{w, d} \cdot \log \frac{|D|}{1 + f_{w, d}}

(2)

where $w$ represents the type of user behavior, $d$ represents the location, $D$ represents the location set in the human trajectory data, and $f_{w, d}$ represents the frequency of $w$ in $d$. The dimension of the semantic vector is the same as the number of user behaviors, i.e., Z.

Finally, the semantic attractiveness vector is appended to the time information vector, forming a spatiotemporal feature vector of dimensions (N + Z). This vector is connected with each of the user’s M trajectories to construct an M × (N + Z) spatiotemporal semantic sequence matrix.
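The construction of one (N + Z)-dimensional spatiotemporal vector can be sketched as below. The weight follows Equation (2) exactly as printed. The time encoding is an assumption: the source does not specify how the length-N time vector is built, so a one-hot hour-of-day vector with N = 24 is used purely for illustration.

```python
import math


def behavior_weights(counts, num_locations):
    """Apply Equation (2) to the Z behavior frequencies f_{w,d} of one location d."""
    return [f * math.log(num_locations / (1 + f)) for f in counts]


def time_vector(hour, n=24):
    """One-hot encoding of the hour of day (assumed time representation)."""
    vec = [0.0] * n
    vec[hour % n] = 1.0
    return vec


def spatiotemporal_vector(hour, counts, num_locations, n=24):
    """Append the Z semantic weights to the length-N time vector."""
    return time_vector(hour, n) + behavior_weights(counts, num_locations)


# Hypothetical location with Z = 4 behavior counts in a set of |D| = 100 locations.
vec = spatiotemporal_vector(hour=8, counts=[0, 3, 1, 0], num_locations=100)
```

Stacking one such vector for each of a user's M trajectory points gives the M × (N + Z) matrix described above.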

Spatiotemporal Mobility Pattern Learning

To extract contextual information from the trajectories and spatiotemporal semantic sequences, a BiLSTM-RF model is used in the respective modules, including the feature input layer and the training layer. In the trajectory prediction module, the user’s trajectory vector serves as the input to the BiLSTM-RF model. In the spatiotemporal feature module, the spatiotemporal semantic sequence is used as the input. Finally, a fully connected layer links the trajectory prediction module with the spatiotemporal feature module. During training, each sequence is divided into fixed-length segments of m, using the position information and spatiotemporal semantic features from the previous m − 1 segments for prediction.
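The segmentation used during training can be illustrated with a short sketch. It assumes overlapping windows with a stride of one, which the source does not specify; within each window of length m, the first m − 1 items form the input and the last item is the prediction target.

```python
def make_training_pairs(sequence, m, stride=1):
    """Split a sequence into (first m-1 items, m-th item) training pairs."""
    pairs = []
    for start in range(0, len(sequence) - m + 1, stride):
        window = sequence[start:start + m]
        pairs.append((window[:-1], window[-1]))
    return pairs


# Hypothetical sequence of location grid IDs.
pairs = make_training_pairs([1, 2, 3, 4, 5], m=3)
```

The same windowing would be applied in parallel to the spatiotemporal semantic sequence so that both modules see aligned inputs.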

Attention Mechanism

The attention mechanism mimics how humans observe objects, allowing the selection of key features from large amounts of information [35]. Within the task of location prediction, the attention mechanism has the ability to allocate weights to locations, spatiotemporal characteristics, and various human motions present in trajectories. The model utilizes a multi-layer attention model comprising a local attention layer and a global attention layer. The local attention layer integrates the outputs of the trajectory and spatiotemporal semantic models, while the global attention layer highlights the impact of location on prediction results.

Local Feature Fusion

User mobility is random and influenced by various factors, such as preferences related to specific times and locations [36], which increase the likelihood of users appearing in certain areas at specific times. There are implicit correlations between these factors, which need to be considered in the model. To address this, a local attention layer is used to integrate position and spatiotemporal features. Specifically, hidden state vectors from the trajectory model and the spatiotemporal model are merged to obtain a new hidden state vector $h_{f}$ with dimensions L × 2 × 2N, where L is the sequence length and N is the number of neurons in the hidden layer. Because the BiLSTM is bidirectional, each output hidden state vector has length 2N, twice the number of hidden-layer neurons, so the hidden state vectors from both the position model and the spatiotemporal semantic model are of length 2N. The local attention layer then computes the weights for the position and spatiotemporal semantic features, as represented in Equation (3):

u_{l i} = V_{l}^{T} \tanh (W_{l} \cdot h_{f} + b_{l})

(3)

Using the initialized coefficient matrix $W_{l}$ and bias term $b_{l}$, the input $h_{f}$ undergoes a linear transformation followed by a non-linear transformation via the tanh function. This approach determines the weights for the position and spatiotemporal semantic features, thereby better integrating the influence of different factors to improve the performance and accuracy of the mobility transition model. During training, the model iteratively optimizes and updates the parameters $W_{l}$, $b_{l}$, and $V_{l}^{T}$. The softmax function then normalizes the scores to give the weights $α_{l i}$ of the local features, as shown in Equation (4):

α_{l i} = softmax (u_{l i}) = \frac{\exp (u_{l i})}{\sum_{i = 1}^{2} \exp (u_{l i})}

(4)

Finally, the feature vectors are summed, weighted by their feature weights, to yield the integrated feature $h_{Atten} ∈ R^{L \times 2N}$, as shown in Equation (5):

h_{Atten} = \sum_{i = 1}^{2} α_{l i} \cdot h_{f i}

(5)

where $α_{l i}$ is the influence weight of the $i$-th feature and $h_{f i}$ is the $i$-th feature vector.
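Equations (3)–(5) can be traced with a small pure-Python sketch for a single time step. It assumes that the score in Equation (3) is computed separately for each of the two feature vectors (position and spatiotemporal semantics); the parameter shapes and the example values are illustrative, and the parameters are fixed rather than learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_score(V, W, b, h):
    """u = V^T tanh(W h + b) for one feature vector h (Equation (3))."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, h)) + bi)
              for row, bi in zip(W, b)]
    return sum(v * z for v, z in zip(V, hidden))


def local_fusion(h_features, V, W, b):
    """Weight the feature vectors by softmax scores and sum them (Equations (4)-(5))."""
    alphas = softmax([attention_score(V, W, b, h) for h in h_features])
    dim = len(h_features[0])
    fused = [sum(a * h[k] for a, h in zip(alphas, h_features)) for k in range(dim)]
    return alphas, fused


# Hypothetical 2-dimensional hidden states for the two modules.
h_position, h_semantic = [1.0, 0.0], [0.0, 1.0]
alphas, fused = local_fusion([h_position, h_semantic],
                             V=[1.0, 0.0],
                             W=[[1.0, 0.0], [0.0, 1.0]],
                             b=[0.0, 0.0])
```

Applying `local_fusion` at each of the L time steps would give the L × 2N integrated feature.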

Global Attention Allocation

The population dynamics in urban areas display erratic transitional patterns, suggesting that people’s mobility is driven not just by transportation requirements, but also by activities such as dining out or shopping at supermarkets. While transit hubs, such as bus and subway stations, are significant in facilitating commuting, they are not necessarily the decisive elements in forecasting individuals’ subsequent activities. Hence, the significance of each place must be weighed when forecasting the subsequent position. To tackle this issue, a global attention layer is implemented to investigate the influence of each location on the prediction. This approach enables the model to consider the relationships between different locations in the city more comprehensively, thereby providing a broader perspective. The global attention layer uses the integrated features $h_{Atten_{j}}$ as the input, and the computation is shown in Equations (6)–(8):

u_{g j} = V_{g}^{T} \tanh (W_{g} \cdot h_{Atten_{j}} + b_{g})

(6)

α_{g j} = softmax (u_{g j}) = \frac{\exp (u_{g j})}{\sum_{j = 1}^{L} \exp (u_{g j})}

(7)

H = \sum_{j = 1}^{T} α_{g j} \cdot h_{Atten_{j}}

(8)

where $W_{g}$ and $b_{g}$ are the coefficient matrix and bias term in the attention mechanism, T represents the total number of time steps, $α_{g j}$ is the influence weight for each location at the $j$-th time step, L is the trajectory sequence length, and H is the resulting global feature. In the final prediction module, based on the output of the feature from the global attention layer, a fully connected layer and softmax are used to obtain the probability of each candidate location, with the highest probability location chosen as the prediction result.
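The global pooling in Equations (6)–(8) and the final prediction step can be sketched as follows. This is a minimal illustration with fixed, hypothetical parameters; the output-layer weights `W_out` and `b_out` stand in for the fully connected layer, whose size is not given in the source.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(h_seq, V, W, b):
    """Score every time step, normalize over the sequence, and pool (Equations (6)-(8))."""
    scores = []
    for h in h_seq:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h)) + bi)
                  for row, bi in zip(W, b)]
        scores.append(sum(v * z for v, z in zip(V, hidden)))
    alphas = softmax(scores)
    dim = len(h_seq[0])
    pooled = [sum(a * h[k] for a, h in zip(alphas, h_seq)) for k in range(dim)]
    return alphas, pooled


def predict_location(pooled, W_out, b_out):
    """Fully connected layer plus softmax; return the most probable candidate index."""
    logits = [sum(w * x for w, x in zip(row, pooled)) + bi
              for row, bi in zip(W_out, b_out)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Hypothetical integrated features for three time steps.
h_seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alphas, H = global_attention(h_seq, V=[1.0, 1.0],
                             W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0])
best, probs = predict_location(H, W_out=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                               b_out=[0.0, 0.0, 0.0])
```

In this toy input the third time step receives the largest attention weight because both of its components are active.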

3.2.3. Hierarchical Reinforcement Learning

While basic user trajectory location prediction models can estimate the weights of each historical location, the influence related to the target location may be diluted by unrelated trajectories when users leave footprints in many different places. To address this issue, actions are divided into low-level actions and high-level actions, which, respectively, focus on individual adjustments of historical trajectories and global adjustments of the entire user profile. As shown in Figure 3, by embedding reinforcement learning into the BiLSTM-RF location prediction model, we use the location probabilities of the location prediction model as the input. Through hierarchical reinforcement learning, low-level and high-level actions modify the trajectory file, thus updating the trajectory, which ultimately updates the position data in the location prediction model. This process continually trains the model to enhance its user location prediction capability.

Figure 3. User location prediction model of reinforcement learning.

The reinforcement learning model for user location prediction consists of an environment, an agent, actions, and rewards, which together adjust the location prediction model to improve its performance. The environment comprises the user’s historical movement trajectory and the prediction model; it interacts with the agent, and its state is updated through the agent’s actions. The agent is the entity that makes decisions through reinforcement learning. Its task is to decide whether to keep the current output provided by the prediction model in the current environment, and it learns to extract information from the environment state and select actions that maximize the cumulative reward. The actions form a binary sequence: an action is set to 1 if the result provided by the location prediction model is retained and to 0 otherwise. The rewards are the feedback that the agent receives after executing its actions. In this experiment, rewards are values assigned based on the correctness of the predicted result: if the retained prediction is correct, the corresponding reward value is obtained. At each time step, the model calculates the reward for the current action and accumulates it to adjust the model’s decision-making ability. The agent performs high-level and low-level actions based on the adjustment strategy to correct the location prediction model.

Low-Level Actions

Low-level actions aim to optimize the user trajectory prediction model by adjusting the influence of historical trajectories to enhance the accuracy of predicting the target location. When determining whether to delete a historical trajectory $e_{t}^{u} \in E_{u}$, the influence of the user’s historical trajectory is adjusted. The defined state feature $s$ evaluates the similarity between the current historical trajectory $e_{t}^{u}$ and the target location $c$ through the cosine similarity of their embedding vectors. Based on the defined state feature, it is decided whether to delete $e_{t}^{u}$ to optimize the overall trajectory prediction performance. The formulas for low-level actions are given in Equations (9)–(11):

H_{t}^{l} = ReLU (W_{1}^{l} s_{t}^{l} + b^{l})

(9)

π (s_{t}^{l}, a_{t}^{l}) = P (a_{t}^{l}∣ s_{t}^{l}, Θ^{l})

(10)

π (s_{t}^{l}, a_{t}^{l}) = a_{t}^{l} σ (W_{2}^{l} H_{t}^{l}) + (1 - a_{t}^{l}) (1 - σ (W_{2}^{l} H_{t}^{l}))

(11)

where $H_{t}^{l}$ represents the hidden state of the low-level actions, using the ReLU activation function. $W_{1}^{l}$ is the weight matrix of the low-level hidden layer, with dimensions $d_{1}^{l} \times d_{2}^{l}$, where $d_{1}^{l}$ is the number of state features and $d_{2}^{l}$ is the dimension of the hidden layer. $W_{2}^{l}$ is the weight matrix of the output layer, with dimensions $d_{2}^{l} \times 1$. $b^{l}$ is the bias for the low-level hidden layer. $π (s_{t}^{l}, a_{t}^{l})$ is the policy function for low-level actions. $a_{t}^{l}$ is a binary value indicating whether to delete the historical trajectory $e_{t}^{u}$. The Sigmoid function σ is used to convert the input into a probability.
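Equations (9)–(11) describe a one-hidden-layer network that turns a state vector into the probability of a binary action. The following is a minimal sketch of that forward pass in plain Python, not the authors’ implementation: the function name `low_level_policy`, the list-based weight layout, and the example numbers are all assumptions made for illustration. Because the text gives $W_{1}^{l}$ dimensions $d_{1}^{l} \times d_{2}^{l}$, the sketch multiplies the state by each column of the matrix.

```python
import math


def relu(x):
    """ReLU activation used in Equation (9)."""
    return x if x > 0.0 else 0.0


def sigmoid(x):
    """Logistic function that maps a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def low_level_policy(state, action, w1, b, w2):
    """Probability of taking `action` (0 or 1) in `state`, per Equations (9)-(11).

    state: list of d1 state features
    w1:    d1 x d2 hidden-layer weights (list of d1 rows, each with d2 values)
    b:     list of d2 hidden-layer biases
    w2:    list of d2 output weights (the d2 x 1 matrix, flattened)
    """
    d1, d2 = len(state), len(b)
    # Equation (9): hidden state H = ReLU(W1 s + b), one hidden unit at a time.
    hidden = [
        relu(sum(state[i] * w1[i][j] for i in range(d1)) + b[j])
        for j in range(d2)
    ]
    # sigma(W2 H): probability that the action equals 1.
    p_one = sigmoid(sum(w2[j] * hidden[j] for j in range(d2)))
    # Equation (11): a * sigma(.) + (1 - a) * (1 - sigma(.)).
    return action * p_one + (1 - action) * (1.0 - p_one)


if __name__ == "__main__":
    # Hypothetical values: 2 state features, 3 hidden units.
    state = [0.8, 0.1]
    w1 = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]
    b = [0.0, 0.1, 0.0]
    w2 = [1.0, -1.0, 0.5]
    print(low_level_policy(state, 1, w1, b, w2))
    print(low_level_policy(state, 0, w1, b, w2))
```

The two printed probabilities always sum to 1, since Equation (11) is a Bernoulli distribution over the two possible actions.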

High-Level Actions

High-level actions determine whether to modify the user’s entire trajectory file $E_{u}$, thereby adjusting its overall impact. The state function $S$ is defined as the average cosine similarity between the embedding vectors of each historical trajectory and the target location, together with the average value of their element-wise product. Additionally, an extra state feature $P (y = 1| E, c)$ is introduced, representing the probability predicted by the basic trajectory prediction model based on $E$ and $c$, where $E$ is the user’s trajectory file and $c$ is the target location.
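One possible reading of the high-level state described above is sketched here in plain Python. It is an illustration under stated assumptions rather than the authors’ code: the function names are hypothetical, embeddings are taken to be equal-length lists of floats, and the "average value of the element-wise product" is interpreted as a per-dimension mean over all historical trajectories.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return dot / (norm_u * norm_v)


def high_level_state(trajectory_embeddings, target_embedding, base_probability):
    """Build the high-level state vector.

    Layout (an assumption): [mean cosine similarity,
                             per-dimension mean of element-wise products...,
                             probability from the basic prediction model]
    """
    if not trajectory_embeddings:
        raise ValueError("the trajectory file must contain at least one entry")
    n = len(trajectory_embeddings)
    mean_cosine = sum(
        cosine_similarity(e, target_embedding) for e in trajectory_embeddings
    ) / n
    mean_product = [
        sum(e[k] * target_embedding[k] for e in trajectory_embeddings) / n
        for k in range(len(target_embedding))
    ]
    return [mean_cosine] + mean_product + [base_probability]


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings for two historical trajectories.
    history = [[1.0, 0.0], [0.0, 1.0]]
    target = [1.0, 0.0]
    print(high_level_state(history, target, base_probability=0.6))
```

Under this layout, a d-dimensional embedding yields d + 2 state features; whether this matches the state dimensions used in the experiments depends on the embedding size, which is not stated here.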

Reward

In the user trajectory prediction task, the reward function signals whether an action is reasonable. Suppose each low-level action represents a step to modify the user’s historical trajectory. The reward is defined as shown in Equation (12):

R (a_{t}^{l}, s_{t}^{l}) = \begin{cases} \log p (E_{u}, c_{i}) - \log p (E_{u}^{'}, c_{i}), & t = t_{u}, \ dis < 1000 \\ 0, & \text{otherwise} \end{cases}

(12)

In this context, $p (E_{u}, c_{i})$ represents $p (y = 1, E_{u}, c_{i})$, and $E_{u}^{'}$ is the modified trajectory, a subset of $E_{u}$. If $t = t_{u}$, meaning the current action is the last low-level action, and the predicted position is within 1000 m of the actual position, the reward is determined by the difference in the log probabilities of the modified and unmodified trajectories; otherwise, the reward is 0. A positive reward indicates that the modification was beneficial.

In the special case where $E_{u}^{'} = Ø$, meaning all historical trajectories have been removed, a trajectory is randomly selected from the original set $E_{u}$. The reward is defined as the difference in log probability between the modified file and the previous one.

If the high-level task chooses to perform a correction action, it invokes the low-level task and receives the same delayed reward

R (a_{t}^{l}, s_{t}^{l})

after executing the last low-level action. Otherwise, it retains the original configuration and receives zero reward.

Ultimately, based on the defined state features, the decision is made whether to modify the entire user location profile to improve overall trajectory prediction performance.

3.2.4. Commercial Site Selection Calculation

The commercial site selection problem is defined as determining the optimal location within a given geographic grid and a set of candidate locations to maximize the impact of a new commercial facility on the target group. The types of commercial sites are categorized into four main types: accommodation, work, dining, and entertainment. The goal is to find a location where establishing a commercial facility can attract the maximum number of potential customers and influence a broader target group. When solving this problem, factors such as the popularity of candidate locations, geographical features, the surrounding environment, and the distribution of competitors are considered. These factors are converted into four calculation metrics: prosperity, competitiveness, traffic flow, and transportation accessibility [3,30,37]. By comprehensively considering these metrics, along with the geographic grid and candidate location information, the optimal location can be determined to achieve the best layout and operational efficiency of the commercial facility.

Prosperity

Prosperity reflects the thriving nature of an area through nearby points of interest (POIs). The more prosperous the area, the more it attracts tourists; thus, prosperity has a positive correlation with commercial site selection. Prosperity is represented by Equation (13), as follows:

P_{l, p} = - \sum_{k} (\frac{N_{l, k}}{N_{l}} \times \ln \frac{N_{l, k}}{N_{l}})

(13)

where $P_{l, p}$ represents the prosperity of cell $l$, $k$ represents the types of POIs within the cell, $N_{l}$ represents the total number of all types of POIs within the cell, and $N_{l, k}$ represents the number of POIs of type $k$ within the cell.
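Equation (13) has the form of the Shannon entropy of the POI-type proportions in a cell, so it is largest when a cell holds many POI types in roughly equal numbers and zero when only one type is present. The sketch below computes it from a list of POI type labels. The function name `prosperity` and the convention that an empty cell scores zero are assumptions; the sample labels are taken from the rows for grid 46,88 in Table 4.

```python
import math
from collections import Counter


def prosperity(poi_types):
    """Equation (13): entropy of POI-type proportions within one grid cell.

    poi_types: iterable with one type label per POI located in the cell.
    """
    counts = Counter(poi_types)   # N_{l,k} for each type k
    total = sum(counts.values())  # N_l
    if total == 0:
        return 0.0  # assumption: a cell without POIs has zero prosperity
    return -sum((n / total) * math.log(n / total) for n in counts.values())


if __name__ == "__main__":
    cell_46_88 = [
        "Tourist attractions",
        "Hotel accommodation",
        "Leisure and entertainment",
    ]
    # Three POIs of three different types give ln(3), about 1.0986.
    print(prosperity(cell_46_88))
```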

Competitiveness

Competitiveness represents the relationship between the types of POIs being considered and existing POIs of the same type in the area. The more POIs of the same type, the fiercer the competition; thus, competitiveness has a negative correlation with commercial site selection [38,39,40]. Competitiveness is represented by Equation (14), as follows:

P_{l, c} = \frac{α \cdot N}{N_{l, k}}

(14)

where $P_{l, c}$ represents the competitiveness of cell $l$, $α$ is a constant, $N_{l, k}$ represents the number of POIs of type $k$ within the cell, and $N$ represents the number of the same type of POIs in the current cell.

Traffic Flow

Traffic flow is the result of the population density calculated by the crowd location prediction model based on reinforcement learning. The higher the traffic flow, the higher the potential revenue from site selection in the area; thus, traffic flow has a positive correlation with commercial site selection [41]. Traffic flow is represented by Equation (15), as follows:

P_{l, f} = α \times count (n)

(15)

where $P_{l, f}$ represents the traffic flow of cell $l$, $α$ is a constant, and $count (n)$ represents the predicted population count within the cell.

Transportation Accessibility

Transportation accessibility represents the distance between the recommended site in the specified area and the main roads. The farther the distance, the less attractive it is to users; thus, transportation accessibility has a positive correlation with commercial site selection. Transportation accessibility is represented by Equation (16), as follows:

P_{l, t} = \begin{cases} α \times \frac{distance}{250}, & distance < 250 \\ 0, & distance \geq 250 \end{cases}

(16)

where $P_{l, t}$ represents the transportation accessibility within cell $l$, $α$ is a constant, and $distance$ represents the distance from the site to the main road.
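The piecewise definition in Equation (16) can be written as a small function. This sketch follows the equation exactly as printed; the function and parameter names are hypothetical, the distance is assumed to be in metres (the unit used in Table 4), and the value and sign of α are left to the caller because the text only describes it as a constant.

```python
def transportation_accessibility(distance_m, alpha=1.0, cutoff_m=250.0):
    """Equation (16): scaled distance to the main road, zero at or beyond the cutoff.

    distance_m: distance from the candidate site to the main road (assumed metres)
    alpha:      the constant in Equation (16); 1.0 is only a placeholder
    cutoff_m:   the 250 threshold from Equation (16)
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if distance_m < cutoff_m:
        return alpha * distance_m / cutoff_m
    return 0.0


if __name__ == "__main__":
    for d in (0.0, 125.0, 249.0, 250.0, 400.0):
        print(d, transportation_accessibility(d))
```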

Finally, the commercial site selection score that integrates the four factors is defined by Equation (17), as follows:

P = α P_{l, p} + β P_{l, c} + χ P_{l, f} + δ P_{l, t}

(17)

where $α, β, χ$, and $δ$ are weight coefficients, each taking a value in the range of [−1, 1]. $P$ represents the score obtained by establishing a specific type of commercial POI in the area. A higher score indicates a higher probability of profitability for that type of commercial POI in the area.
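Equation (17) is a weighted sum of the four metrics, and the recommended site is the candidate cell with the highest score. The sketch below illustrates that combination and selection step. The function names, the dictionary layout, and every number in the example are hypothetical; in particular, the weights shown are placeholders rather than values reported in this study.

```python
def site_score(prosperity, competitiveness, traffic_flow, accessibility, weights):
    """Equation (17): weighted sum of the four cell metrics.

    weights: (alpha, beta, chi, delta), each required to lie in [-1, 1].
    """
    if len(weights) != 4:
        raise ValueError("exactly four weights are required")
    if any(w < -1.0 or w > 1.0 for w in weights):
        raise ValueError("each weight must lie in [-1, 1]")
    alpha, beta, chi, delta = weights
    return (alpha * prosperity + beta * competitiveness
            + chi * traffic_flow + delta * accessibility)


def best_cell(cells, weights):
    """Return the id of the cell with the highest score.

    cells: dict mapping a cell id to a tuple
           (prosperity, competitiveness, traffic_flow, accessibility).
    """
    return max(cells, key=lambda cell_id: site_score(*cells[cell_id], weights))


if __name__ == "__main__":
    # Hypothetical metric values for three grid cells.
    cells = {
        "46,88": (1.10, 0.40, 0.90, 0.15),
        "47,88": (0.00, 0.10, 0.30, 0.36),
        "48,85": (0.69, 0.80, 0.60, 0.13),
    }
    # Placeholder weights; a negative beta penalizes competition.
    weights = (0.5, -0.5, 0.8, 0.3)
    print(best_cell(cells, weights))
```

A negative weight is one way to express the negative correlation that the text assigns to competitiveness, which is consistent with the weights being allowed to range over [−1, 1].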

4. Results and Analysis

During the training phase, the BiLSTM-RF-based location prediction network underwent an initial round of training. The setup included 3 input units, 12 standard hidden units, and 3 memory units, with a learning rate of 0.001 over 50 epochs. The reinforcement learning phase constituted the second round of training, spanning 30 epochs with an agent learning rate of 0.0005, a batch size of 24, a high-level state dimension of 18, and a low-level state dimension of 34. In the testing phase, the optimal weights obtained from the training phase were loaded, and user trajectory data were input to predict user locations in future timeframes.

4.1. Comparative Experiments

To validate the effectiveness of the proposed BiLSTM-RF model, comparisons were made with the Markov, LSTM, and BiLSTM models.

From Table 5, it can be observed that the BiLSTM-RF model achieves the lowest MAE (79.3) and RMSE (88.164) of the four models, compared with 87.711 and 103.129 for the next-best BiLSTM model, indicating its superior accuracy in predicting target values. In terms of accuracy, the BiLSTM-RF model also outperforms the other models, reaching 0.732 against 0.6295 for BiLSTM, 0.5472 for LSTM, and 0.5085 for Markov. BiLSTM-RF therefore shows a marked advantage over the baseline algorithms, which supports the soundness of the crowd prediction model constructed in this study.
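For readers who want to reproduce the error metrics in Table 5, MAE and RMSE follow their standard definitions. The sketch below shows both in plain Python; the function names and the sample values are hypothetical, and the quantity being compared (for example, a per-cell count or a distance) is an assumption, since the text does not specify it.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


if __name__ == "__main__":
    # Hypothetical observed and predicted values.
    observed = [10.0, 20.0, 30.0]
    forecast = [12.0, 18.0, 33.0]
    print(mae(observed, forecast), rmse(observed, forecast))
```

RMSE is never smaller than MAE on the same data, and the gap widens when a few errors are much larger than the rest, which is why the two are usually reported together.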

Table 5. Comparative experimental results.

4.2. Regional Analysis

To further analyze the model’s accuracy across different regions, calculations were performed individually for each district and each block in Beijing, focusing on the top 400 popular blocks.

In this experiment, the frequent semantic trajectories of users were divided into sample sets, with 80% of the data used as a training set and 20% as a test set. The model was trained and tested using one day’s worth of user data, and the prediction results for a specific time period were extracted. The results are illustrated in Figure 4.

Figure 4. (a) Real population distribution; (b) predicted population distribution.

To further determine whether the predicted kernel density calculations were accurate, the prediction results were processed and analyzed. The results are shown in Table 6.

Table 6. Analysis of the accuracy of the study area.

The experimental results indicate that the BiLSTM-RF model proposed in this study has an average accuracy of about 73% in predicting the future trajectory distribution of users and identifying their future movement patterns. The model performs well in Xicheng District (0.779), Tongzhou District (0.754), and Fengtai District (0.738), likely due to the greater number of statistical points in these areas and their even distribution, allowing the model to fully leverage the data for training. In contrast, Chaoyang District (0.716) and Dongcheng District (0.715), which have more statistical points, show slightly lower accuracy than the average level. Overall, the model yields satisfactory results in urban population prediction and predicts future user trajectory distributions with high accuracy and stability, providing reliable population density support for the subsequent commercial location models.

Based on the commercial location algorithm’s evaluation scores, the address with the highest score is selected as the target store location. Currently, preliminary recommendations have been made for commercial locations within the Sixth Ring Road of Beijing, involving 400 regions and 200 commercial points.

The site selection results are presented in Figure 5. The point location results indicate that the optimal areas for selecting commercial sites are primarily found in the central urban area within the Fourth Ring Road. These locations fall into four categories: (1) Olympic Park, Wangfujing, South Luogu Lane, and similar places, which have convenient transportation and complete catering and entertainment facilities and belong to Beijing’s well-known culture and entertainment zones. (2) Beijing West Railway Station, Beijing South Railway Station, and Beijing Railway Station, which are passenger and logistics distribution centers. These locations have high year-round population mobility and very dense crowds, which gives them a higher weighting in the algorithm and therefore more recommended points. (3) Liangxiang, Tongzhou, Daxing, and similar areas, where commercial development has gradually increased owing to the construction of Beijing’s subcenters and the relocation of some universities. These areas have high commercial value and layout potential. (4) Huiju Shopping Mall, Wangjing, and other mega shopping centers, which are recommended because of their high prosperity and large customer flow.

Figure 5. Recommended locations for site selection.

According to the analysis of the recommended points, the recommended business areas in Haidian District are near Zhongguancun, Golden Resources Shopping Mall, and universities, where consumer demand is higher. There are many recommended sites near the scenic spots in Xicheng District and Dongcheng District. In Chaoyang District, the recommended areas include Wangjing SOHO, Liangmaqiao, and the vicinity of Sanlitun, which are more prosperous and therefore have a higher probability of recommendation. The recommended business district in Fangshan District is concentrated near the university town, where the population and businesses are dense and the consumer base is correspondingly larger. In Daxing District, the recommended sites are likewise near the densely populated university town.

5. Digital Twin Business Location System

5.1. Simulation Software

Unreal Engine is an advanced game engine developed by Epic Games that is widely used to create games as well as virtual reality, augmented reality, simulation, visualization, and special-effects applications [42]. Using the built-in data transfer interface and blueprints of the Unreal Engine, real-world data can be connected to the virtual world and combined with a back-end program that performs real-time city data calculations. This makes it possible to create a digital twin model of the real world, analyze the candidate locations of potential business sites in the city, and present the results through the visualization of the digital twin system.

5.2. Urban Building Simulation

Digital City Construction

In order to narrow the gap between the digital city and the real city, the construction of digital assets should start with authenticity wherever possible. Geographic information data provide real and effective city data. By using geographic information data to restore the digital city model, the real scene can be replicated in the virtual scene with high accuracy to ensure the accuracy and authenticity of the scene.

Before creating a city model, it is necessary to collect geographic information data for buildings and roads, use QGIS 3.22.8 to add height fields to the building data, and retain the major sections in the road data. Next, the 2D geographic data are imported into Blender 2.93 and converted into a 3D model: the building heights are adjusted according to the height field, and a NURBS path model is generated from the road data (Figure 6a).

Figure 6. (a) Three-dimensional untextured building model (white model); (b) digital city scene.

Finally, the generated 3D model is imported into Unreal Engine for scene construction and rendering (Figure 6b). The size of the 3D model in Unreal Engine can be adjusted to match the terrain scene of the real city, ensuring a high degree of fidelity and authenticity of the scene and model. Lighting, materials, and special effects are then applied to the urban 3D building and road models to enhance the realism and visual quality of the scene.

5.3. Site Selection Area Simulation

Key Area Display

In this experiment, we used a VRest plug-in for data communication between the commercial site selection system and the back-end system, so that the system can quickly and accurately obtain external data and keep the virtual and real scenes synchronized. In the screening function, after the user carries out data screening, the data are transmitted through the connected node to the client, and the back-end system queries the geographic information data in the MySQL database and calculates the results in real time. The commercial location recommendation results are then transmitted from the back end to the Unreal Engine system, achieving real-time and accurate data interaction between the real scene and the digital twin scene, so that the user can intuitively perceive the commercial characteristics and geographic environment of the area. Business hot spots are finely modeled, allowing business decision-makers to move freely within the model and label locations in it. To locate business sites accurately, users can view hot spots in the image set and inspect the model in the virtual scene. Figure 7 shows the interface of key areas.

Figure 7. (a,b) Display of key areas.

Recommended Location for Business Siting

Presenting the recommended locations for business siting is a crucial part of the system. To this end, we provide users with a recommended business placement scheme by combining data analysis with simulated scenarios.

First, we created the Actor blueprint for displaying recommended points and configured a static mesh for it. Because a recommended location area has multiple attributes, we created a corresponding texture map for each attribute to label and distinguish among the recommended points.

Secondly, we developed a data conversion interface to convert the real geographic information data into the coordinate system of the virtual scene to realize the seamless connection of data. As shown in Figure 8, through the recommendation point in the Actor blueprint, the dynamic loading of the recommendation point in the virtual scene is realized.

Figure 8. Recommended Actor for site selection.

In addition, we designed a details window in the interactive control to display the details of each recommendation point. The window includes key data such as geographical location, surrounding environment, business potential analysis, etc., to provide users with comprehensive information support.

6. Discussion and Future Work

The results show that commercial siting simulations can be performed for digital cities and that the simulation results can be fed back through the digital twin model. Our business location system can support business decisions and urban planning; with geographical big data and mobile signaling data as data support, business decision-makers can analyze the business potential of a region and a city. By employing a recognition algorithm, we are able to identify the travel patterns of users. We enhance the accuracy of our predictions by integrating the BiLSTM-RF model with reinforcement learning to anticipate the locations of crowds, thereby addressing the issue of imprecise and outdated population flow data in commercial site selection. The forecast distribution reaches a prediction accuracy rate of 73.2%. Compared to the survey data used in other studies, the mobile signaling data used in this research are authentic and difficult to obtain, and they play a crucial role in accurate predictions. This study designed a stop-point recognition algorithm for the collected user mobility data and proposed a behavior recognition algorithm that combines time and space. It provides not only a modeling method but also a commercial site selection visualization system. On the one hand, the system supports real-time, accurate data interaction: when the geographic information data are updated, the system can quickly and accurately obtain the external data and keep the virtual scene synchronized with reality. On the other hand, the system’s parameters can be adjusted in response to the data and fed back into the model in order to realize sustainable development. Using Unreal Engine to build a commercial location system can provide comprehensive data support, model analysis, and decision support, helping business decision-makers to better understand a location environment, optimize their site selection schemes, and improve the efficiency and success rate of site selection.

This study also has some shortcomings. After identifying suitable commercial locations and hotspots, we did not combine different commercial types for quantitative and qualitative analysis. In our subsequent work, we will analyze and verify this in detail with different types of businesses in different areas to maximize economic and other benefits. In future work, we plan to expand on this with data from IoT services within cities. We will combine various factors—including the environment of an area and real-time information on traffic, noise pollution, rent, and other factors—to build a better location algorithm and further expand the functions of the commercial location system to make the scenes of the digital city more real.

Author Contributions

Conceptualization, Jin Zou; methodology, Jin Zou and Xun Zhang; software, Jin Zou and Xun Zhang; validation, Jin Zou and Zhentong Gao; formal analysis, Jin Zou and Jinlian Shi; resources, Jin Zou; data curation, Jin Zou and Yangxiao Cong; writing—original draft preparation, Jin Zou; writing—review and editing, Jin Zou and Xun Zhang; visualization, Jin Zou and Xun Zhang; supervision, Xun Zhang; project administration, Xun Zhang; funding acquisition, Jin Zou. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (42101470); the Project of Social Science Foundation of Xinjiang Uygur Autonomous Region (2023BTY128); and the Project of Natural Science Foundation of Xinjiang Uygur Autonomous Region (2023D01A57).

Data Availability Statement

Data can be provided upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Triantaphyllou, E. Multi-Criteria Decision Making Methods; Springer: Berlin/Heidelberg, Germany, 2000.
2. Önüt, S.; Efendigil, T.; Kara, S.S. A combined fuzzy MCDM approach for selecting shopping center site: An example from Istanbul, Turkey. Expert Syst. Appl. 2010, 37, 1973–1980.
3. Semih, T.; Seyhan, S. A multi-criteria factor evaluation model for gas station site selection. Evaluation 2011, 2, 12–21.
4. Karamshuk, D.; Noulas, A.; Scellato, S.; Nicosia, V.; Mascolo, C. Geo-spotting: Mining online location-based services for optimal retail store placement. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, 11–14 August 2013; Association for Computing Machinery: New York, NY, USA; pp. 793–801.
5. Neirotti, P.; Marco, A.D.; Cagliano, A.C.; Mangano, G.; Scorrano, F. Current trends in Smart City initiatives: Some stylised facts. Cities 2014, 38, 25–36.
6. Hou, J.; Zhao, H.; Zhao, X.; Zhang, J. Predicting mobile users’ behaviors and locations using dynamic Bayesian networks. J. Manag. Anal. 2016, 3, 191–205.
7. White, G.; Zink, A.; Codecá, L.; Clarke, S. A digital twin smart city for citizen feedback. Cities 2021, 110, 103064.
8. Kaur, M.J.; Mishra, V.P.; Maheshwari, P. The convergence of digital twin, IoT, and machine learning: Transforming data into action. In Digital Twin Technologies and Smart Cities; Springer: Berlin/Heidelberg, Germany, 2020; pp. 3–17.
9. Alam, K.M.; El Saddik, A. C2PS: A digital twin architecture reference model for the cloud-based cyber-physical systems. IEEE Access 2017, 5, 2050–2062.
10. Grieves, M. Digital twin: Manufacturing excellence through virtual factory replication. White Pap. 2014, 1, 1–7.
11. Glaessgen, E.; Stargel, D. The digital twin paradigm for future NASA and US Air Force vehicles. In Proceedings of the 53rd AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and Materials Conference 20th AIAA/ASME/AHS Adaptive Structures Conference 14th AIAA, Honolulu, HI, USA, 23–26 April 2012; p. 1818.
12. Qi, Q.; Tao, F. Digital twin and big data towards smart manufacturing and industry 4.0: 360 degree comparison. IEEE Access 2018, 6, 3585–3593.
13. Deng, T.; Zhang, K.; Shen, Z.-J.M. A systematic review of a digital twin city: A new pattern of urban governance toward smart cities. J. Manag. Sci. Eng. 2021, 6, 125–134.
14. Soon, K.; Khoo, V. CityGML modelling for Singapore 3D national mapping. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, 42, 37–42.
15. Pennacchiotti, M.; Popescu, A.-M. A machine learning approach to twitter user classification. In Proceedings of the International AAAI Conference on Web and Social Media, Barcelona, Spain, 17–21 July 2011; pp. 281–288.
16. Rodrigues, E.; Assunção, R.; Pappa, G.L.; Renno, D.; Meira, W., Jr. Exploring multiple evidence to infer users’ location in Twitter. Neurocomputing 2016, 171, 30–38.
17. Li, W.; Eickhoff, C.; de Vries, A.P. Want a coffee? Predicting users’ trails. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, Portland, OR, USA, 12–16 August 2012; Association for Computing Machinery: New York, NY, USA; pp. 1171–1172.
18. Yin, H.; Hu, Z.; Zhou, X.; Wang, H.; Zheng, K.; Nguyen, Q.V.H.; Sadiq, S. Discovering interpretable geo-social communities for user behavior prediction. In Proceedings of the 2016 IEEE 32nd International Conference on Data Engineering (ICDE), Helsinki, Finland, 16–20 May 2016; pp. 942–953.
19. Chang, H.-W.; Lee, D.; Eltaher, M.; Lee, J. @ Phillies tweeting from Philly? Predicting Twitter user locations with spatial word usage. In Proceedings of the 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Istanbul, Turkey, 26–29 August 2012; pp. 111–118.
20. Iso, H.; Wakamiya, S.; Aramaki, E. Density estimation for geolocation via convolutional mixture density network. arXiv 2017, arXiv:1705.02750.
21. Mousset, P.; Pitarch, Y.; Tamine, L. End-to-end neural matching for semantic location prediction of tweets. ACM Trans. Inf. Syst. TOIS 2020, 39, 1–35.
22. Lian, J.; Zhang, F.; Xie, X.; Sun, G. Restaurant survival analysis with heterogeneous information. In Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia, 3–7 April 2017; Association for Computing Machinery: New York, NY, USA; pp. 993–1002.
23. Cheema, M.A.; Lin, X.; Zhang, W.; Zhang, Y. Influence zone: Efficiently processing reverse k nearest neighbors queries. In Proceedings of the 2011 IEEE 27th International Conference on Data Engineering, Hannover, Germany, 11–16 April 2011; pp. 577–588.
24. Xu, M.; Wang, T.; Wu, Z.; Zhou, J.; Li, J.; Wu, H. Demand driven store site selection via multiple spatial-temporal data. In Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Burlingame, CA, USA, 31 October–3 November 2016; Association for Computing Machinery: New York, NY, USA; pp. 1–10.
25. Shi, J.; Lu, H.; Lu, J.; Liao, C. A skylining approach to optimize influence and cost in location selection. In Proceedings of the International Conference on Database Systems for Advanced Applications, Bali, Indonesia, 21–24 April 2014; pp. 61–76.
26. Yang, Y.; Tang, J.; Luo, H.; Law, R. Hotel location evaluation: A combination of machine learning tools and web GIS. Int. J. Hosp. Manag. 2015, 47, 14–24.
27. Lu, Y.; Zhu, S.; Zhang, L. A machine learning approach to trip purpose imputation in GPS-based travel surveys. In Proceedings of the 4th Conference on Innovations in Travel Modeling, Tampa, FL, USA, 30 April–2 May 2012.
28. Liu, D.; Weng, D.; Li, Y.; Bao, J.; Zheng, Y.; Qu, H.; Wu, Y. Smartadp: Visual analytics of large-scale taxi trajectories for selecting billboard locations. IEEE Trans. Vis. Comput. Graph. 2016, 23, 1–10.
29. Wang, L.; Fan, H.; Wang, Y. Site selection of retail shops based on spatial accessibility and hybrid BP neural network. ISPRS Int. J. Geo-Inf. 2018, 7, 202.
30. Shaikh, S.A.; Memon, M.A.; Prokop, M.; Kim, K.-S. An AHP/TOPSIS-based approach for an optimal site selection of a commercial opening utilizing geospatial data. In Proceedings of the 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), Busan, Republic of Korea, 19–20 February 2020; pp. 295–302.
31. Perez-Benitez, V.; Gemar, G.; Hernández, M. Multi-criteria analysis for business location decisions. Mathematics 2021, 9, 2615.
32. Liu, Y.; Li, J.; Yang, Y. Strategic adjustment of land use policy under the economic transformation. Land Use Policy 2018, 74, 5–14.
33. Li, M.; Gao, S.; Lu, F.; Zhang, H. Reconstruction of human movement trajectories from large-scale low-frequency mobile phone data. Comput. Environ. Urban Syst. 2019, 77, 101346.
34. Bao, Y.; Huang, Z.; Li, L.; Wang, Y.; Liu, Y. A BiLSTM-CNN model for predicting users’ next locations based on geotagged social media. Int. J. Geogr. Inf. Sci. 2021, 35, 639–660.
35. Luong, M.-T.; Pham, H.; Manning, C.D. Effective approaches to attention-based neural machine translation. arXiv 2015, arXiv:1508.04025.
36. Feng, J.; Li, Y.; Zhang, C.; Sun, F.; Meng, F.; Guo, A.; Jin, D. Deepmove: Predicting human mobility with attentional recurrent networks. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 1459–1468.
37. Ustinovičius, L.; Stasiulionis, A. Multicriteria-based estimation of selection of commercial property construction site. Statyba 2001, 7, 474–480.
38. Turhan, G.; Akalın, M.; Zehir, C. Literature review on selection criteria of store location based on performance measures. Procedia-Soc. Behav. Sci. 2013, 99, 391–402.
39. Hoch, S.J.; Kim, B.-D.; Montgomery, A.L.; Rossi, P.E. Determinants of store-level price elasticity. J. Mark. Res. 1995, 32, 17–29.
40. Durvasula, S. A Retail Store Location Model Based on Managerial Judgements. J. Retail. 1992, 68, 402–404.
41. Ahmed, M.; Muhammad, N.; Mohammed, M.; Idris, Y. A GIS-Based analysis of police stations distributions in kano metropolis. IOSR J. Comput. Eng. 2013, 8, 72–78.
42. Sanders, A. An Introduction to Unreal Engine 4; AK Peters/CRC Press: Natick, MA, USA, 2016.

Figure 1. Daily average travel volume of users.

Figure 2. BiLSTM-RF position prediction model framework.

Table 1. Location grid table.

Field	Data Examples	Type
grid_id	1001403016115366	bigint
length	data	int
centroid_lat	39.8883603308591	double
centroid_lon	115.548248843125	double
wkt	POLYGON((115.54676354663768 39.88946791718516, 115.54973413961343 39.88946791718516, 115.54973413961343 39.887252744533036, 115.54676354663768 39.887252744533036, 115.54676354663768 39.88946791718516))	string
zone_id	110109	string
province	011	string
city	V0110000	string
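
The wkt field in Table 1 stores each grid cell as a closed rectangle in well-known text, and the centroid fields can be recovered from it. The following is a minimal sketch, not the authors' pipeline: it uses only the Python standard library, assumes the standard WKT keyword POLYGON with a single outer ring, and the helper names parse_polygon_ring and bbox_center are hypothetical.

```python
import re

# Example cell taken from the Table 1 row (standard WKT keyword assumed).
WKT = (
    "POLYGON((115.54676354663768 39.88946791718516, "
    "115.54973413961343 39.88946791718516, "
    "115.54973413961343 39.887252744533036, "
    "115.54676354663768 39.887252744533036, "
    "115.54676354663768 39.88946791718516))"
)


def parse_polygon_ring(wkt):
    """Return the outer ring of a single-ring WKT POLYGON as (lon, lat) tuples."""
    match = re.fullmatch(
        r"\s*POLYGON\s*\(\(\s*(.*?)\s*\)\)\s*", wkt, flags=re.IGNORECASE
    )
    if match is None:
        raise ValueError("not a single-ring WKT POLYGON")
    points = []
    for pair in match.group(1).split(","):
        lon, lat = pair.split()
        points.append((float(lon), float(lat)))
    return points


def bbox_center(ring):
    """Center of the bounding box; equals the centroid for a rectangular cell."""
    lons = [p[0] for p in ring]
    lats = [p[1] for p in ring]
    return (min(lons) + max(lons)) / 2, (min(lats) + max(lats)) / 2


ring = parse_polygon_ring(WKT)
center_lon, center_lat = bbox_center(ring)
print(center_lon, center_lat)
```

For this row the computed center agrees with the listed centroid_lon and centroid_lat values to within floating-point rounding.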

Table 2. Base station travel record table.

Field	Data Examples	Type
uid	2038792652410140000	string
move_id	1	int
move_vp_id	37	bigint
stime	1 October 2021 17:34:10	timestamp
grid_id	3003403592115440	bigint
cid	116998	bigint
province	011	string
city	V0110000	string
date	20211001	int

Table 3. User attribute table.

Field	Data Examples	Type
uid	1594256208358330000	string
gender	02	string
age	09	string
arpu	26	double
area	V0310000	string
brand	iPhone	string
type	iPhone	string
weight	5.4741700725862	decimal(28,8)
gw	8.44297150858395	decimal(28,8)
province	011	string
is_core	Y	string
is_local	Y	string
home_district	110108	string
work_district	110108	string
home_lon	116.332570558765	double
home_lat	39.9142582560604	double
work_lon	116.322234356429	double
work_lat	39.9372904232794	double
id_area	130721	string
city	V0310000	string
date	20170101	int

Table 4. POI dataset within the Sixth Ring Road of Beijing.

Grid	Type	Distance
46,88	Tourist attractions	37 m
46,88	Hotel accommodation	62 m
46,88	Leisure and entertainment	138 m
47,88	Life services	90 m
48,85	Hotel accommodation	213 m
48,85	Company	32 m

Table 5. Comparative experimental results.

Method	MAE	RMSE	Accuracy
Markov	142.3	172.53	0.5085
LSTM	107.2301	131.65	0.5472
BiLSTM	87.711	103.129	0.6295
BiLSTM-RF	79.3	88.164	0.7323
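
For readers unfamiliar with the MAE and RMSE columns of Table 5, the sketch below shows the standard definitions of the two error measures. It is an illustration only: the grid counts are made-up toy values, not data from the study, and the Accuracy column is not reproduced because its exact computation is not given in this part of the article.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


# Hypothetical population counts for three grid cells (toy values).
actual_counts = [100, 200, 300]
predicted_counts = [110, 190, 330]

toy_mae = mae(actual_counts, predicted_counts)
toy_rmse = rmse(actual_counts, predicted_counts)
print(toy_mae, toy_rmse)
```

Because squaring penalizes large deviations more heavily, RMSE is never smaller than MAE on the same data, which is consistent with every row of Table 5.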

Table 6. Analysis of the accuracy of the study area.

Study Area	Statistical Point	Accuracy Rate
Haidian	52	0.721
Xicheng	31	0.779
Dongcheng	48	0.715
Chaoyang	90	0.716
Shijingshan	11	0.728
Fengtai	24	0.738
Tongzhou	63	0.754
Daxing	8	0.675
Fangshan	9	0.653
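
The per-district rates in Table 6 can be combined into a single summary figure by weighting each district by its number of statistical points. The sketch below is one possible aggregation, assuming the "Statistical Point" column is a suitable weight; how the overall accuracy reported elsewhere in the article was aggregated is not stated here.

```python
# (district, statistical points, accuracy rate) copied from Table 6.
TABLE_6 = [
    ("Haidian", 52, 0.721),
    ("Xicheng", 31, 0.779),
    ("Dongcheng", 48, 0.715),
    ("Chaoyang", 90, 0.716),
    ("Shijingshan", 11, 0.728),
    ("Fengtai", 24, 0.738),
    ("Tongzhou", 63, 0.754),
    ("Daxing", 8, 0.675),
    ("Fangshan", 9, 0.653),
]


def weighted_accuracy(rows):
    """Point-weighted mean of the accuracy rates."""
    total_points = sum(points for _, points, _ in rows)
    weighted_sum = sum(points * rate for _, points, rate in rows)
    return weighted_sum / total_points


total_points = sum(points for _, points, _ in TABLE_6)
overall = weighted_accuracy(TABLE_6)
print(total_points, round(overall, 4))
```

With the values in Table 6 this gives roughly 0.729 over 336 points, close to the BiLSTM-RF accuracy of 0.7323 in Table 5. The two districts with the fewest points, Daxing and Fangshan, also show the lowest rates.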

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
