Site Selection Analysis and Prediction of New Retail Stores from an Urban Commercial Space Perspective: A Case Study of Luckin Coffee and Starbucks in Shanghai

Zhao, Zhengxu; Chen, Gang; Duan, Jianshu; Xu, Youheng

doi:10.3390/ijgi14060217

Department of Geographic Information Science, School of Geographic and Oceanographic Sciences, Nanjing University, Nanjing 210023, China

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2025, 14(6), 217; https://doi.org/10.3390/ijgi14060217

Submission received: 8 March 2025 / Revised: 25 May 2025 / Accepted: 27 May 2025 / Published: 30 May 2025

(This article belongs to the Special Issue Spatial Information for Improved Living Spaces)

Abstract

In the context of digital transformation, examining the differences in commercial site selection and the factors influencing these decisions holds significant practical value for understanding market adaptation strategies across varying business models and predicting future industry trends. This study divides the research area into 100 m × 100 m grids and employs a random forest model and related interpretability methods to conduct an empirical analysis of the site selection and influencing factors of Luckin Coffee and Starbucks stores in Shanghai. By integrating the prediction results with existing planning documents, this study achieves a coupling between urban spatial structure and location strategies. The findings indicate the following: (1) The random forest model demonstrates high accuracy in predicting new retail store locations, with an accuracy rate of 90.0% for Luckin Coffee and 92.2% for Starbucks. (2) The influence of traditional factors on the expansion of new retail coffee stores is declining, while Luckin Coffee’s layout demonstrates a stronger reliance on urban functional zones. (3) Relative suitability is derived by calculating the difference between the predicted probability values and the normalized kernel density values. In the central activity areas of the city, the relationship between site selection probability and suitability exhibits an inverse correlation, with Starbucks generally showing higher relative suitability overall. (4) Suitable areas for both brands’ site selections are spatially contiguous and integrated within the urban fabric, which suggests significant growth potential for both brands in the main urban areas. This study not only focuses on commercial optimization but also offers theoretical and methodological insights by exploring how different retail models interact with urban spatial structures, thereby contributing to the fields of retail geography and spatial governance.

Keywords: new retail; business site selection; Shanghai; random forest

1. Introduction

In recent years, the global retail landscape has undergone a significant transformation driven by technological innovation and shifting consumer behaviors. International retail giants such as Amazon (with Amazon Go), Walmart, and Carrefour have adopted hybrid models that integrate online and offline channels, signaling the rise of “new retail” as a global trend. In China, new retail has been embraced rapidly and at scale, with Shanghai emerging as a leading testing ground. Shanghai, as China’s economic center, has developed a diversified business district structure and established a strong brand awareness, creating opportunities to experiment with new retail models. In 2023, the report “Highlights and Trends in Shanghai’s Digital Life Consumption” revealed that the average monthly scale of new retail in Shanghai exceeded 2.5 billion yuan. Furthermore, the Special Planning for Shanghai’s Commercial Spatial Layout (2022–2035) explicitly emphasized commercial digital transformation as a key component of Shanghai’s planning support system. With its rich commercial practices and forward-looking policies, Shanghai stands out as an ideal location for studying new retail models and their development.

The aim of “new retail” is to establish an offline channel and integrate it with the online retail channel [1]. This model merges physical and digital spaces, adjusting supply chains dynamically based on consumer data. The essence of new retail lies in retail itself, with its “new” aspect being the interactive methods it employs to enhance competitiveness in operational models, service concepts, and technological innovation [2]. However, new retail enterprises still face significant challenges in integrating online and offline resources, delivering seamless shopping experiences, and addressing rapidly changing market demands. For a retail store to succeed, choosing its location is arguably the most critical and costly decision that retailers must make [3]. Specifically, businesses need to select precise locations carefully to ensure efficient logistics and provide high-quality services. They must commit substantial resources and effort to position themselves strategically within urban spaces according to their business models, which is crucial for their survival and sustainable development. Therefore, when opening new stores or relocating existing ones, optimizing location becomes the top priority [4].

In this study, we select coffee shops in Shanghai as a research case. The coffee industry is characterized by rapid market expansion, everyday consumption scenarios, and a well-defined consumer profile [5,6]. Luckin Coffee and Starbucks, as two representative coffee shop brands, highlight contrasting business models. The former entered the market with an “Internet+” model from the outset, while the latter, as a traditional retail giant, has also begun introducing its own online channels and integrating its digital operation platforms [7]. Different business models exhibit unique distribution trends and preferences within urban areas, which has prompted numerous scholars to investigate and compare their spatial distribution patterns.

Traditional research has typically focused on intercity or street-level scales, using spatial analysis methods to examine the spatial distribution characteristics and spatial autocorrelation of commercial outlets within cities [8,9]. In recent years, scholars have increasingly adopted machine learning approaches—such as Random Forest (RF), Iterative Dichotomiser 3 (ID3), Artificial Neural Network (ANN), and Gradient Boosting Decision Tree (GBDT)—at the grid level to analyze distribution patterns, identify influencing factors, and predict site selection [10,11,12,13,14,15,16,17]. Among these methods, RF has been widely used [10,11,14,15,16]. Existing studies have demonstrated their accuracy and practical value in predicting the location of coffee shops in Beijing by integrating multi-source spatial data [18,19,20]. To explore the nonlinear relationships of influencing factors, researchers have applied methods such as the Geographical Detector and XGBoost combined with partial dependence plots (PDPs) to reveal the direction and mechanisms of these effects [21,22]. However, prior studies have largely focused on comparing model performance, while lacking forward-looking perspectives on the future trends of influencing factors. Moreover, few studies have explicitly integrated site selection predictions with formal spatial planning documents, which limits the practical applicability of these models in supporting urban commercial development strategies [23].

Therefore, this study aims to integrate site selection prediction for new retail stores with the analysis of influencing mechanisms and interpretation within the context of urban policy. Specifically, we use an RF model to predict site suitability at a 100 m grid scale, compare the influencing factors of two retail models—Luckin Coffee and Starbucks—using Shapley Additive Explanation (SHAP) values and PDPs, and link the resulting spatial patterns with the planned commercial structure outlined in Shanghai’s official urban spatial planning documents.

2. Literature Review

2.1. Evolution of Site Selection Methods

In addressing business site selection issues, three distinct waves of retail location decision-making approaches have emerged in the academic field [24]. Early retail site selection often combined contact diffusion and hierarchical diffusion [25]. As a result, the Analytic Hierarchy Process (AHP), which enables hierarchical and systematic operational analysis, has emerged as a leading traditional site selection model [26,27,28,29]. With the emergence of new retail models and the growing availability of data resources, scholars have increasingly turned to Geographic Information Systems (GIS) to analyze commercial store layouts [8,9,30,31]. Various spatial econometric models provide intuitive insights into the spatial distribution and influencing factors of new retail stores [32] while also identifying suitable site selection areas [33]. Moreover, as data volumes and computational demands rise in complex social and geographic environments, machine learning methods can improve the intelligence and precision of site selection models [18]. Previous research has examined site selection predictions for new retail stores using both single-model approaches [17] and comparative multi-model analyses [34], assessing the influence of various factors. Among these, the RF model often demonstrates relatively stable performance [10,18,19,20]. As a robust ensemble learning method, RF is well suited for handling high-dimensional, nonlinear, and collinear spatial data, which are common characteristics in urban site selection problems [35]. In addition, RF provides interpretable outputs, including variable importance scores, and is compatible with SHAP and PDPs, enabling deeper insights into the contribution and directionality of each predictive factor [36,37]. Compared to deep learning models such as neural networks, RF performs well with relatively small sample sizes and requires less parameter tuning, which makes it particularly suitable for exploratory spatial analysis with moderate amounts of data [38].

2.2. Selection of Influencing Factors

The form and spatial distribution of new retail stores differ fundamentally from those of traditional retail, reflecting shifts in consumer behavior, technological dependence, and business strategies. Unlike traditional retail, which often relies on centralized locations such as shopping malls and commercial streets, new retail emphasizes decentralization, integration with online platforms, and rapid adaptation to local demand. This shift has led to the emergence of smaller, more dispersed store layouts—often co-located with transportation hubs, office areas, or residential neighborhoods.

Therefore, site selection and the optimization of new retail store layouts require a comprehensive analysis of the interactions among multiple factors. In addition to conventional factors that reflect suitability for commercial development—such as the natural environment, transportation accessibility, economic conditions, and market demand [39]—this study also incorporates urban functional zoning and the competitive landscape into feature construction, as these better align with everyday consumption scenarios and target consumer behavior.

The natural environment—particularly topographic features such as elevation and land type—can constrain the use of land for commercial purposes due to differences in environmental carrying capacity and physical accessibility [10]. Land categories such as water bodies, farmland, and forests can be excluded from commercial development based on land use classification standards [40].

Transportation accessibility is crucial for attracting customer traffic and ensuring efficient logistics. High road network density and proximity to major roads or public transportation—such as metro and bus stations—are strongly preferred by commercial retailers [41,42].

Economic conditions significantly influence commercial investment decisions. Nighttime light intensity is strongly correlated with GDP and serves as an intuitive indicator of regional economic development [43]. Population density and housing prices further reflect land value and local purchasing power [44,45].

Market demand is represented by specific types of buildings near which coffee shops commonly locate, such as commercial centers, schools, and office buildings. Commercial centers are typically surrounded by well-developed industries and facilities, offering strong consumer appeal [46], while schools and office buildings concentrate the key target groups for coffee shops, namely students and office workers, who are primarily younger consumers [47].

Urban functional zoning is defined by the density of specific categories of points of interest (POIs) [48]. Identifying and comparing the preferred functional zones of the studied retail models helps analyze their site selection differences from an urban planning perspective.

The competitive landscape helps assess market saturation and spatial inefficiency. In this study, coffee shops of different brands are treated as competitors. The presence and overlapping service areas of same-brand or rival coffee shops indicate local demand for coffee, while also signaling the risk of over-distribution and potential inefficiencies in store location allocation.

2.3. Theoretical Framework

Based on different optimization objectives for site selection decisions, the commercial site selection process can be explained by four primary theories: central place theory [49], the retail gravity model [50], the theory of minimum differentiation [51], and bid rent theory [52]. From a consumer behavior perspective, the first two theories assume that consumers in different regions are equal and uniform and that the retail industry merely needs to respond to consumer demand [53]. However, many challenges faced by modern retail enterprises stem from increasingly complex and unpredictable consumer behaviors [54,55], driven by socioeconomic differentiation among consumer groups. From the perspective of spatial diffusion and location, the spatial expansion of new retail disrupts traditional assumptions of homogeneous plains, as urban spatial structures undergo significant changes [56]. In this context, new retail models gradually decentralize retail away from traditional shopping centers, leading to the rising prominence of smaller centers. With the emergence of multi-centered urban structures, the monocentric scenarios discussed in bid rent theory no longer fully capture the site selection processes of new retail models.

Nevertheless, traditional theories still offer salient lessons for the layout of new retail businesses. For instance, the retail agglomeration theory, which evolved from the theory of minimum differentiation [57], provides a valuable explanation for the benefits of clustering. Clusters can reduce uncertainty and risk while fostering healthy competition. These benefits are equally attractive to new retail models. On the other hand, Spatial Interaction Models further emphasize that the interaction of goods and information flows between regions leads to shifts in socio-economic relationships, rather than assuming consumer homogeneity. This highlights that attention to the spatiality of retail regions should not focus solely on economic factors but must also account for the role of urban morphology in analysis [58]. Thus, explaining the site preferences of new retail businesses requires a nuanced analysis of locational contexts and urban vitality.

In summary, the theory of site selection for new retail should be expanded to encompass a more realistic and universal framework. This framework must account for the “reverse hierarchical” spatial distribution pattern often exhibited by new retail, minimize subjective factors in the complex site selection process, and incorporate the external environmental characteristics reflected in site selection trends. Accordingly, by leveraging multi-source data and machine learning algorithms, we can predict and compare the site selection trends of these two brands, enabling an objective analysis of their distribution preferences within urban spaces. To ensure the feasibility of the site selection results, we align our evaluation with the requirements outlined in urban planning documents.

3. Materials and Methods

3.1. Study Area and Data

3.1.1. Overview of the Study Area and Research Subjects

Shanghai, situated at the mouth of the Yangtze River, spans an area of 6341 km². According to Shanghai’s commercial planning, the city is divided into the Central Activity Zone and the main urban area (Figure 1). Shanghai is the economic center of the Yangtze River Delta and a global hub for economics, finance, trade, and shipping, and its geographical advantages and economic vitality have provided new opportunities for the growth of the coffee retail industry. In 2023, Shanghai hosted a total of 9553 coffee shops, ranking first in the nation, with Luckin Coffee and Starbucks holding the top two positions in the city’s coffee market [59].

Before conducting the site selection model analysis, we performed a comparative analysis of the brand backgrounds and store operation characteristics of Starbucks and Luckin Coffee, as shown in Table 1. According to Luckin’s 2024 annual report [60] and Starbucks’s financial statements [61], there are significant differences between the two brands in terms of store composition and order structure. In terms of store composition, 98.9% of Luckin stores are small-format pick-up stores with a floor area of 20–60 square meters, while only 1.1% are larger relax stores occupying around 120 square meters. In contrast, Starbucks primarily operates relax stores. Regarding order composition, Luckin operates entirely under a mobile internet-driven model, with nearly all orders placed through its app, WeChat mini-programs, or third-party delivery platforms—17.1% of which are delivery orders and 82.9% mobile pick-up orders. By contrast, even during its digital transformation, Starbucks still recorded 48% of its orders as on-site purchases. In addition, the two brands differ significantly in market positioning and marketing strategies, which further highlights the value of comparing their site selection patterns.

3.1.2. Data Sources and Preprocessing

The types, time dimensions, and sources of data used in this study are summarized in Table 2. This study draws on two primary data sources: offline store data for Luckin Coffee and Starbucks in Shanghai, and data on relevant influencing factors. The POI data for the stores were sourced from Amap. First, Python and the Amap API were used to collect the addresses and geographic coordinates of all Luckin Coffee and Starbucks stores in Shanghai. Second, the obtained addresses were verified and filtered against the official websites of Luckin Coffee and Starbucks. After verification, the store data underwent coordinate transformation and cleaning, yielding a final dataset of 1035 Luckin Coffee stores and 1167 Starbucks stores for analysis.

Gridded (raster) data were first georeferenced, then converted to a common projection and resampled to a uniform resolution. After cleaning, the POI data retrieved via Amap comprised 1,508,610 valid entries. In line with the research objectives, the Shanghai metropolitan area was subdivided into 100 m × 100 m grid cells to support overlay analysis of multi-source spatial data. Grid division ensures consistency across study units [62], which facilitates comparisons and tests and aids the integration or decomposition of multi-source spatial data [63].
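The grid division described above can be illustrated with a minimal sketch. It assumes projected coordinates in metres and a rectangular bounding box; the extent, function names, and sample point are hypothetical and are not taken from the study, which performed this step in ArcGIS.

```python
import math

def grid_index(x, y, x_min, y_min, cell=100.0):
    """Return (col, row) of the cell containing the projected point (x, y)."""
    return (int(math.floor((x - x_min) / cell)),
            int(math.floor((y - y_min) / cell)))

def cell_center(col, row, x_min, y_min, cell=100.0):
    """Return the projected coordinates of a cell's center point."""
    return x_min + (col + 0.5) * cell, y_min + (row + 0.5) * cell

def build_grid(x_min, y_min, x_max, y_max, cell=100.0):
    """List the centers of all cells needed to cover the bounding box."""
    n_cols = math.ceil((x_max - x_min) / cell)
    n_rows = math.ceil((y_max - y_min) / cell)
    return [cell_center(c, r, x_min, y_min, cell)
            for r in range(n_rows) for c in range(n_cols)]

# Hypothetical 1 km x 0.5 km extent in metres (not the study's real extent).
centers = build_grid(0.0, 0.0, 1000.0, 500.0)
# A hypothetical store location is assigned to the cell that contains it.
store_cell = grid_index(250.0, 130.0, 0.0, 0.0)
```

Working with cell centers keeps every later feature (distances, densities, land use flags) attached to one consistent unit of analysis.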

3.2. Methods

3.2.1. Research Framework

The overall research framework of this study is illustrated in Figure 2 and is divided into five main steps: dataset construction, feature selection, model building, model comparison, and spatial comparative analysis. First, the study area was divided into grids and overlaid with multiple spatial datasets to construct a geographic dataset. A feature matrix was then created by extracting the grids containing the research objects along with an equal number of negative sample points. Next, the dataset was split into training and testing sets at a 70:30 ratio [64]. Based on the initial RF model, features were ranked according to their importance scores derived from SHAP values. Starting with the top-ranked feature, variables were added one at a time to build models of increasing dimensionality. For each subset of variables, 20% of the training set was held out as a validation set, and model performance was evaluated using accuracy. The subset that achieved the highest accuracy on the validation set was selected for the final model. Subsequently, the RF model was trained on the filtered training set, and its performance was evaluated on the test set. The best-performing model was then selected for final use. Lastly, the suitability for site selection across all grids within the study area was predicted. A set of highly suitable sample points was selected as potential site locations. The distribution of these high-probability points was examined against Shanghai’s commercial spatial planning, and a comparative analysis was conducted with actual distribution patterns to test the relative suitability of their layouts.
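The incremental feature selection step can be sketched as follows. This is a minimal illustration, not the authors' code: `score_fn` is a hypothetical stand-in for training an RF model on a feature subset and returning its validation accuracy, and the toy gains attached to the feature codes are invented for demonstration only.

```python
def forward_select(ranked_features, score_fn):
    """Add features in rank order and keep the prefix with the best score.

    ranked_features: feature names sorted from most to least important
                     (the paper ranks them by SHAP value).
    score_fn: callable taking a list of features and returning validation
              accuracy; a hypothetical stand-in for fitting an RF model.
    """
    best_subset, best_score, history = [], float("-inf"), []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = score_fn(subset)
        history.append((k, score))
        if score > best_score:  # strict '>' means ties keep the smaller subset
            best_subset, best_score = list(subset), score
    return best_subset, best_score, history

# Toy scorer (invented numbers): accuracy rises for three features, then drops.
toy_gain = {"X11": 0.80, "X16": 0.06, "X13": 0.04, "X24": -0.01, "X3": -0.02}

def toy_score(subset):
    return round(sum(toy_gain[f] for f in subset), 4)

subset, score, history = forward_select(list(toy_gain), toy_score)
```

Because features are only ever added in rank order, the search evaluates one model per feature count rather than every possible subset.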

All data preprocessing, spatial analysis, and machine learning modeling in this study were conducted using the following software: Python 3.9 was used for data processing, model training, and evaluation, primarily relying on the pandas, scikit-learn, matplotlib, and SHAP libraries; and ArcGIS 10.8 was used for spatial data extraction, layer processing, and result visualization.

3.2.2. Feature Extraction

Initially, irrelevant place names were removed from the POI data. Building on the research of Zhao et al. [11], this study selected universities, shopping malls, office buildings, subway stations, and bus stations for the Euclidean distance analysis to examine the layout preferences of the two types of coffee retail stores. In addition, eleven categories of POI data (transportation facilities; dining services; residential areas; science, education, and cultural facilities; shopping services; life services; medical facilities; scenic spots; sports and leisure facilities; government institutions; and corporate businesses) were chosen for kernel density analysis to reflect the division of functional areas within the city. Land use data were used to extract built-up areas. Using Stata MP 18, a binary variable was created to indicate whether the center of each grid cell falls within a built-up area, with a value of 1 representing built-up and 0 representing non-built-up. To prevent redundancy in the feature factors, subway and bus stations within transportation facilities, higher education institutions within science and education, office buildings within residential areas, and shopping malls within shopping services were excluded during POI processing. A circular area with a radius of 1000 m was defined as the basis for analysis, referencing the stores’ delivery distance and the walking accessibility range of consumers. Finally, the various data types were overlaid with the grid center points and standardized to derive feature variables for each grid. In summary, this study categorizes the data into 26 feature types, as shown in Table 3.
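The two POI-based feature types described above, nearest-facility distance and kernel density within a 1000 m radius, might be computed along the following lines. The quartic kernel and the min-max standardization are assumptions, since the paper states only the radius and that features were standardized; the coordinates are hypothetical projected values in metres.

```python
import math

def nearest_distance(center, pois):
    """Euclidean distance (m) from a grid center to the closest POI."""
    return min(math.dist(center, p) for p in pois)

def kernel_density(center, pois, radius=1000.0):
    """Quartic-kernel density (points per m^2) of POIs around a grid center.

    The quartic kernel is an assumption; the paper only gives the radius.
    """
    total = 0.0
    for p in pois:
        d = math.dist(center, p)
        if d < radius:  # POIs beyond the search radius contribute nothing
            total += (1.0 - (d / radius) ** 2) ** 2
    return 3.0 / (math.pi * radius ** 2) * total

def min_max(values):
    """Rescale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

# Hypothetical office-building POIs and one grid center, in metres.
offices = [(300.0, 400.0), (2000.0, 0.0), (0.0, 600.0)]
center = (0.0, 0.0)
d_office = nearest_distance(center, offices)
density = kernel_density(center, offices)
```

In this toy case only the two POIs within 1000 m contribute to the density, and the closer one carries more weight.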

3.2.3. Feature Selection

In practical machine learning applications, having an excessive number of features does not necessarily improve performance [65]. A large number of features often leads to increased interdependence among variables and longer model-training times [36]. SHAP indicates how a given feature contributes to the prediction of a specific data point and determines the direction and magnitude of its influence. This method can be applied to any supervised learning problem requiring feature selection, helping to identify features related to the dependent variable and, thus, improving data understanding and enabling more accurate predictive models. The selected features were further tested using the forward selection method to evaluate how changes in the feature count affected model performance on the validation set.

3.2.4. Model Construction and Interpretation Methods

This study applies the RF model to predict site selection probabilities for Luckin Coffee and Starbucks in Shanghai. First, for each brand, grid cells containing its stores (positive samples) are combined with an equal number of negative sample points, together with their feature values, to create the sample dataset. The sample dataset is then split into a training set (D) and a test set (S) at a 70:30 ratio, according to practical requirements [64].
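A minimal sketch of the balanced sample construction and the 70:30 split is given below. It assumes that negative cells are drawn uniformly at random from cells without a store and that the split is a simple shuffle; the paper does not specify either detail, and the cell identifiers are hypothetical.

```python
import random

def build_samples(positive_cells, all_cells, seed=42):
    """Pair each store cell (label 1) with a random non-store cell (label 0)."""
    rng = random.Random(seed)
    positives = set(positive_cells)
    candidates = [c for c in all_cells if c not in positives]
    negatives = rng.sample(candidates, len(positives))  # without replacement
    return [(c, 1) for c in sorted(positives)] + [(c, 0) for c in negatives]

def split_70_30(samples, seed=42):
    """Shuffle and split samples into training set D and test set S."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

all_cells = list(range(1000))          # hypothetical grid cell identifiers
store_cells = list(range(0, 100, 2))   # 50 hypothetical store cells
samples = build_samples(store_cells, all_cells)
D, S = split_70_30(samples)
```

Fixing the seed makes the sampled negatives and the split reproducible across runs.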

The SHAP value is used to determine the feature importance for each research object. SHAP is a game-theoretic approach that explains the contribution of each feature to the model’s prediction by computing the marginal contribution of each feature across all possible combinations. The SHAP value for a feature i is defined as Equation (1):

\phi_{i} = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (|N| - |S| - 1)!}{|N|!} \left[ f(S \cup \{i\}) - f(S) \right]

(1)

where N is the set of all features, S is a subset of features not containing i, and f(S) is the model output when only features in S are present.
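Equation (1) can be evaluated exactly for a small feature set by enumerating every subset. The sketch below does this for a toy value function with three hypothetical features; it is meant only to make the weighting explicit and is not how the SHAP library computes values for tree models, which relies on faster tree-specific algorithms.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values by enumerating every subset S of N minus {i}."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (|N| - |S| - 1)! / |N|! from Equation (1).
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi[i] = total
    return phi

def toy_value(subset):
    """Toy f(S): additive effects plus an interaction between a and b."""
    v = 0.0
    if "a" in subset:
        v += 0.3
    if "b" in subset:
        v += 0.1
    if "a" in subset and "b" in subset:
        v += 0.2
    return v

phi = shapley_values(["a", "b", "c"], toy_value)
```

In this toy case the interaction term is shared equally between a and b, the unused feature c receives zero, and the three values sum to the output for the full feature set.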

The PDPs illustrate the dependency between the target response and a set of “target” features while marginalizing the values of other features [66]. Intuitively, partial dependence can be interpreted as a function of the expected target response with respect to the “target” feature. Although feature importance reflects the contribution of a feature to the model, the use of PDPs provides insights into how a feature influences predictions.
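The marginalization behind a partial dependence curve can be sketched in a few lines: the target feature is fixed at each grid value for every sample, and the predictions are averaged. The scoring rule, feature names, and data below are hypothetical stand-ins for a fitted RF model and the study's feature matrix.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite the target feature in every row, keep the others as observed.
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve

def toy_predict(row):
    """Hypothetical scoring rule standing in for a fitted RF model."""
    near_office = 1.0 if row["dist_office"] < 500 else 0.0
    return 0.2 + 0.5 * near_office + 0.001 * row["pop_density"]

rows = [
    {"dist_office": 200, "pop_density": 100},
    {"dist_office": 900, "pop_density": 200},
]
curve = partial_dependence(toy_predict, rows, "dist_office", [100, 1000])
```

The drop between the two grid values reflects only the effect of distance, because the population density of each row is left unchanged.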

3.2.5. Model Validation and Evaluation

Machine learning requires quantitative evaluation metrics to assess model performance. For classification model results, this study uses accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve (AUC) to evaluate performance. The first four metrics are based on the confusion matrix, while the AUC represents the probability that a randomly selected positive sample has a higher score than a negative sample. Although a classification model was used, the model outputs the predicted probability of each grid cell being classified as the positive class. To evaluate model performance using standard classification metrics, a threshold of 0.5 was applied to convert the predicted probabilities into binary classification results. The formulas are as follows:

Accuracy = \frac{TP + TN}{TP + FP + FN + TN}

(2)

Precision = \frac{TP}{TP + FP}

(3)

Recall = \frac{TP}{TP + FN}

(4)

F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}

(5)

AUC = \frac{\sum_{s_{i} \in \text{positive class}} rank_{s_{i}} - \frac{M \times (M + 1)}{2}}{M \times N}

(6)

where TP represents the number of true positive samples, TN represents the number of true negative samples, FP represents the number of false positive samples, and FN represents the number of false negative samples. rank_{s_{i}} represents the rank (index) of positive sample s_{i} when all samples are sorted by predicted score in ascending order; M and N denote the numbers of positive and negative samples, respectively; and \sum_{s_{i} \in \text{positive class}} rank_{s_{i}} is the sum of the ranks of the positive samples.
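As a worked illustration of Equations (2) to (6), the following sketch computes the threshold-based metrics and the rank-based AUC for six invented samples. Treating a probability equal to 0.5 as positive and giving tied scores their average rank are assumptions; the paper specifies neither.

```python
def rank_auc(y_true, y_prob):
    """AUC via the rank-sum form of Equation (6).

    Ranks are 1-based in ascending score order; tied scores share the
    average rank (a common convention, assumed here).
    """
    order = sorted(range(len(y_prob)), key=lambda k: y_prob[k])
    ranks = [0.0] * len(y_prob)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and y_prob[order[end + 1]] == y_prob[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in order[start:end + 1]:
            ranks[k] = shared
        start = end + 1
    m = sum(y_true)            # number of positive samples (M)
    n = len(y_true) - m        # number of negative samples (N)
    rank_sum = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (rank_sum - m * (m + 1) / 2) / (m * n)

def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, and F1 at a threshold, plus AUC."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1, "auc": rank_auc(y_true, y_prob)}

# Invented labels and predicted probabilities for six grid cells.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
metrics = classification_metrics(y_true, y_prob)
```

Here the positive samples hold ranks 3, 5, and 6, so the AUC is (14 - 6) / 9 = 8/9, which matches the share of positive-negative pairs in which the positive sample scores higher.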

4. Results

4.1. Variable Selection

The SHAP plot reflects both the magnitude and direction of each feature’s contribution to the model, with wider color bands indicating greater feature importance. As illustrated in Figure 3, the key influencing factors contributing significantly to the spatial layout of both brands exhibit a certain degree of similarity and consistent directional effects, highlighting the commonalities in site selection for coffee shops under the new retail model.

In terms of urban functional areas, both brands show a preference for locations near science and education cultural facilities (X16) and transportation facilities (X13). This spatial arrangement helps attract highly educated consumers, enhances brand image, and improves store accessibility while increasing customer foot traffic. The distance to the nearest office building (X11) is a major contributing factor for both brands, whereas the distance to the nearest university (X12) is more significant for Luckin Coffee. As an early adopter of the new retail model, Luckin targets a broader customer base compared to Starbucks.

Notably, population density (X15) has a positive impact on Luckin’s distribution but does not significantly affect Starbucks. High population density can generate substantial customer traffic, which Luckin leverages by establishing multiple convenient pickup locations in these areas to meet the instant demands of consumers with fast-paced urban lifestyles. By contrast, Starbucks may prioritize the quality of site selection over quantity, favoring locations such as business districts and high-end shopping centers. Although these locations may not have the highest population density, they offer stronger purchasing power and higher brand loyalty among consumers.

According to our comparison, dining service (X14), distance to the nearest competitor (X24), and competitor density (X25) exert varying degrees of influence on both brands. Moreover, market agglomeration positively impacts the spatial distribution of both brands. According to spatial competition theory, the differences in transportation costs arising from geographic dispersion can weaken the intensity of market competition to varying degrees [67]. Coffee shops of similar types tend to cluster around core commercial areas to maximize market share, while the symbiotic spatial layout of different coffee brands also demonstrates a high degree of complementarity.

Decision tree-based methods are particularly advantageous in capturing nonlinear relationships [37]. Therefore, we compared several tree-based algorithms—GBDT, RF, Extra Trees (ET), and AdaBoost (ADA)—as well as the Stacking ensemble method to assess the appropriateness of the model selection. As shown in Figure 4, RF demonstrates greater stability across the different feature set sizes compared to the other models. Although Stacking achieves peak accuracy with fewer features, its overall improvement is marginal. Moreover, the complexity of the Stacking model makes it less interpretable in terms of feature importance. For these reasons, RF was ultimately selected as the primary model in this study.

The selected RF model was applied in the forward feature selection process. As shown in Figure 5, the accuracy of the Luckin Coffee model reaches its peak at 0.910 with 14 features, while the Starbucks model achieves its highest accuracy of 0.924 with 18 features.

4.2. Feature Importance

Based on feature importance and model accuracy, the following variables were identified as key influencers of location preferences (see Table 4). PDPs were generated for the selected features to quantitatively analyze their impact on the spatial distribution of both brands.

Although RF is generally robust to multicollinearity, this study further examined multicollinearity among the selected features to ensure data quality. Variance Inflation Factors (VIFs) were calculated, and most features were found to have VIF values below 10, indicating no serious multicollinearity issues overall. Therefore, all features were retained in the model to preserve information completeness. The feature importance rankings are shown in Figure 6.
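For reference, the VIF of each feature equals the corresponding diagonal entry of the inverse of the feature correlation matrix. The sketch below uses that identity on invented data with plain Python; it is an illustration under that formulation, not the authors' procedure, and it will fail with a division by zero if two features are perfectly collinear.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long feature columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF per feature: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

# Invented feature columns: x1 and x2 are correlated (r = 0.6), x3 is
# uncorrelated with x1.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]
x3 = [1.0, -1.0, -1.0, 1.0]
vif_correlated = vif([x1, x2])
vif_uncorrelated = vif([x1, x3])
```

With two features the result reduces to 1 / (1 - r²), so a correlation of 0.6 gives a VIF of 1.5625, well below the threshold of 10 used above.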

The results of the PDPs are shown below (see Figure 7). In terms of urban structure, nighttime light data (X7) and road network density (X3) have a positive impact on the spatial distribution of both brands, while topography (X1) exerts a negative influence. Notably, these three factors have a stronger impact on Starbucks than on Luckin Coffee. Additionally, the peak effect of land rent (X9) on Starbucks is positioned to the right of that for Luckin, indicating that Starbucks has a higher tolerance for selecting locations in prime commercial areas compared to Luckin. Even after integrating online sales channels, Starbucks continues to maintain high operational costs. This strategy reflects its reliance on brand strength and customer loyalty to sustain competitiveness in high-cost locations. Such an approach benefits Starbucks in terms of brand promotion and market influence.

Population density (X8) has opposite effects on Luckin Coffee and Starbucks. In areas where the population density exceeds 95 people per 10,000 m² (that is, per 100 m × 100 m grid cell), its contribution to Luckin’s site selection surpasses that of Starbucks. In densely populated areas, such as city centers or university neighborhoods, Luckin benefits from its low pricing and convenience-oriented model, effectively attracting customers. By contrast, Starbucks’s performance in these regions may be constrained by its premium brand positioning and pricing strategy, which make it less competitive than Luckin in high-density areas.

In terms of proximity to nearby facilities, distance to the nearest subway entrance (X5), distance to the nearest mall (X10), and distance to the nearest office building (X11) have a similar level of influence on both brands. However, distance to the nearest bus stop (X6) and distance to the nearest university (X12) show significant differences in their impact, with Luckin’s fitted curve having a greater absolute slope than that of Starbucks, indicating a higher sensitivity to these factors. This suggests that, compared to Starbucks, Luckin places greater emphasis on location factors that cater to students and commuters who require fast and convenient service.

In terms of urban functional zones, the fitted curves of both brands show similar distribution patterns for dining services (X14), science and education cultural facilities (X16), and life services (X18), indicating that their market strategies and consumer appeal are relatively aligned in these areas. However, there are notable differences in peak values for transportation facilities (X13), residential areas (X15), and corporate businesses (X23). Starbucks exhibits a stronger attraction to high-traffic and office-dense areas, whereas residential areas contribute more to Luckin’s distribution. Notably, leisure facilities (X21) have completely opposite effects on the two brands. This contrast may stem from the fact that Starbucks’s store locations often signify a vibrant commercial atmosphere and high population activity. Areas with dense transportation, office spaces, and leisure facilities present stronger growth potential compared to residential neighborhoods, as visitors to these locations tend to have higher spending power.

In terms of industry competition, the fitted curves for distance to the nearest competitor (X24) and competitor density (X25) indicate that both brands exhibit a high level of tolerance for competition. Additionally, areas with a high concentration of coffee shops contribute more significantly to Starbucks’s site selection. This strategy allows Starbucks to leverage the existing coffee culture and customer flow while benefiting from the clustering effect. Moreover, the highly competitive environment encourages each store to enhance its service quality and product innovation to attract and retain customers.

4.3. Predictive Results and Spatial Analysis

4.3.1. Potential Site Locations

Hyperparameter tuning was conducted using a grid search with five-fold cross-validation on the training set. Cross-validation further partitions the training data to evaluate the performance of different hyperparameter combinations. After identifying the best-performing combination, the model was retrained on the full training set using the selected hyperparameters. The selected hyperparameters are presented in Table 5.
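The tuning procedure can be sketched with scikit-learn's `GridSearchCV`, which performs the cross-validated search and, with `refit=True`, retrains the best configuration on the full training set. The parameter grid, the 70/30 split, and the synthetic data below are assumptions made for illustration; the values actually selected are those reported in Table 5.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data; the split ratio is an assumption of this sketch.
X, y = make_classification(n_samples=400, n_features=14, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Hypothetical search space; not the grid used in the study.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [5, None],
    "min_samples_leaf": [1, 3],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,                # five-fold cross-validation on the training set only
    scoring="accuracy",
    refit=True,          # retrain on the full training set with the best combination
)
search.fit(X_train, y_train)

best_model = search.best_estimator_
print(search.best_params_, round(search.best_score_, 3))
```

Because the held-out split is never seen during the search, it remains available for an unbiased evaluation afterwards.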

Using the trained models, predictions were made on the training set, and the results are presented in Table 6. The site selection model for Luckin Coffee achieved an accuracy of 0.900, while the model for Starbucks reached 0.922. For reference, an area under the ROC curve (AUC) of 0.5 indicates that a model has no predictive capability. Measured against this baseline, both models demonstrate strong performance in predicting site selection outcomes. The prediction results are presented in Figure 8.
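The two evaluation metrics mentioned here can be computed as in the following sketch. It uses synthetic data and evaluates on a held-out split, which is an assumption of the example and not a statement about the evaluation protocol of this study; the constant-score baseline shows why an AUC of 0.5 corresponds to no predictive capability.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for store / no-store grid labels.
X, y = make_classification(n_samples=600, n_features=14, n_informative=6, random_state=0)
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
prob = model.predict_proba(X_eval)[:, 1]  # predicted site selection probability

accuracy = accuracy_score(y_eval, (prob >= 0.5).astype(int))
auc = roc_auc_score(y_eval, prob)

# A model that assigns the same score to every grid cannot rank sites at all.
baseline_auc = roc_auc_score(y_eval, np.zeros(len(y_eval)))

print("accuracy:", round(accuracy, 3), "AUC:", round(auc, 3), "baseline AUC:", baseline_auc)
```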

Commercial consumption clusters, as the core framework for constructing urban consumption space networks, play a crucial role in supporting economic development. Therefore, this study uses the international-, municipal-, and district-level commercial centers, as defined in Shanghai’s commercial spatial planning, as reference urban planning landmarks for site selection. The average probabilities for these centers were calculated. As shown in Table 7, both brands exhibit probabilities exceeding 0.7 around international and municipal commercial centers, which are significantly higher than their respective probabilities around district-level centers. Consequently, this study adopts international-level commercial centers and actual commercial centers as landmarks for subsequent fine-scale site selection.
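One way to compute such level-wise averages is sketched below: the predicted probabilities of all grid cells within a fixed radius of each center are averaged, and the center means are then averaged per level. The radius, the coordinate values, and the function name `mean_probability_by_level` are assumptions for illustration; the text does not specify how the neighborhoods of the centers were delineated.

```python
import numpy as np


def mean_probability_by_level(grid_xy, prob, centers, radius=1000.0):
    """Average predicted probability around centers, grouped by center level.

    centers is a list of (level, x, y) tuples in the same projected units as grid_xy.
    """
    sums, counts = {}, {}
    for level, cx, cy in centers:
        dist = np.hypot(grid_xy[:, 0] - cx, grid_xy[:, 1] - cy)
        inside = dist <= radius
        if inside.any():
            sums[level] = sums.get(level, 0.0) + float(prob[inside].mean())
            counts[level] = counts.get(level, 0) + 1
    return {level: sums[level] / counts[level] for level in sums}


# Toy example: four grid centroids (metres) and two hypothetical centers.
grid_xy = np.array([[0.0, 0.0], [100.0, 0.0], [5000.0, 0.0], [5100.0, 0.0]])
prob = np.array([0.8, 0.6, 0.3, 0.5])
centers = [("international", 0.0, 0.0), ("district", 5000.0, 0.0)]

level_means = mean_probability_by_level(grid_xy, prob, centers, radius=500.0)
print(level_means)
```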

4.3.2. Site Selection Recommendations

From September 2022 to November 2023, the number of Starbucks stores in Shanghai increased from 1000 to 1113. Based on this, the number of new stores in this study was set to 100. Excluding grids with existing stores, the remaining grids were evaluated, and the top 100 grids with the highest predicted probabilities were identified as prime candidates for future store locations. Because most high-probability points are located in the city center and significantly overlap with the locations of international- and municipal-level commercial centers, these two levels of commercial centers are selected for further site selection prediction and validation. This approach aims to provide recommendations for new retail store site selection and the sustainable development of commercial consumption spatial patterns.
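The candidate screening step amounts to masking out occupied grids and ranking the rest by predicted probability. The sketch below shows this with NumPy; the array names and the randomly generated inputs are hypothetical, and only the count of 100 new stores comes from the text.

```python
import numpy as np


def top_candidate_grids(prob, has_store, n_new=100):
    """Indices of the n_new highest-probability grids that have no existing store."""
    prob = np.asarray(prob, dtype=float)
    candidates = np.flatnonzero(~np.asarray(has_store, dtype=bool))
    order = np.argsort(-prob[candidates], kind="stable")  # descending probability
    return candidates[order[:n_new]]


# Small deterministic check: grids 1 and 2 are occupied and therefore skipped.
small = top_candidate_grids(
    [0.9, 0.8, 0.95, 0.1, 0.7], [False, True, True, False, False], n_new=2
)

# Larger illustration with random stand-in values for 10,000 grids.
rng = np.random.default_rng(0)
prob = rng.random(10_000)
has_store = rng.random(10_000) < 0.05
top100 = top_candidate_grids(prob, has_store, n_new=100)
print(small.tolist(), len(top100))
```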

We delineated the aggregation zones for each non-future commercial center in Figure 9. For international-level commercial centers, due to the overlapping aggregation zones of West Nanjing Road, East Nanjing Road, North Bund, Little Lujiazui, Yuyuan Garden Mall, and Middle Huaihai Road, these areas are combined into the Central International Business District (CIBD). Within this area, there are 65 candidate sites for Starbucks and 44 for Luckin Coffee. Starbucks’s candidate sites are primarily concentrated around West Nanjing Road, East Nanjing Road, and Middle Huaihai Road, whereas Luckin Coffee’s candidate sites are predominantly located around Middle Huaihai Road. Additionally, Starbucks’s other high-probability points are distributed within the Central Activity Zone, without clustering around other international commercial centers. By contrast, Luckin Coffee has 10 high-probability points in the Xujiahui commercial center aggregation zone and 35 in the Central Activity Zone, and the remainder are scattered throughout the main urban area.

For municipal-level commercial centers, we merged the overlapping aggregation zones of East Nanjing Road and West Nanjing Road into Nanjing Road, where Luckin Coffee and Starbucks show similar numbers of candidate sites, with 37 and 35, respectively. Luckin Coffee’s other high-probability points are primarily located in Xujiahui and Wujiaochang, whereas Starbucks exhibits a higher site selection probability along Middle Huaihai Road.

To further refine the high-probability points, this study calculates kernel density values for existing stores and normalizes them as realistic suitability. By subtracting realistic suitability from the estimated suitability derived from the calculated probabilities, we obtain relative suitability, which is used to evaluate whether high-probability points around commercial centers are viable for site selection. From Figure 10, it can be observed that Luckin Coffee exhibits relative suitability values below 0 in international commercial centers such as East Nanjing Road, West Nanjing Road, and Middle Huaihai Road. In the Xujiahui international commercial center, the relative suitability approaches 0, indicating that although these areas have high site selection probabilities, opening new stores in these regions would face considerable pressure. On the other hand, Starbucks still has the potential to open new stores in the West Nanjing Road international commercial center.
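The relative suitability calculation can be sketched as follows. The example assumes a Gaussian kernel, a 500 m bandwidth, and min-max normalization; these choices, the coordinates, and the function names are illustrative assumptions, since the text specifies only that the kernel density of existing stores is normalized and subtracted from the predicted probability.

```python
import numpy as np


def kernel_density(grid_xy, store_xy, bandwidth):
    """Unnormalized Gaussian kernel density of stores evaluated at grid centroids."""
    d2 = ((grid_xy[:, None, :] - store_xy[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * bandwidth ** 2)).sum(axis=1)


def min_max(values):
    """Rescale values to the range [0, 1]."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    return np.zeros_like(values) if span == 0 else (values - values.min()) / span


def relative_suitability(prob, grid_xy, store_xy, bandwidth=500.0):
    """Predicted probability minus normalized density of existing stores."""
    realistic = min_max(kernel_density(grid_xy, store_xy, bandwidth))
    return np.asarray(prob, dtype=float) - realistic


# Toy example: two grids next to existing stores and one grid far away (metres).
grid_xy = np.array([[0.0, 0.0], [100.0, 0.0], [5000.0, 0.0]])
store_xy = np.array([[0.0, 0.0], [50.0, 0.0]])
prob = np.array([0.9, 0.9, 0.6])

rel = relative_suitability(prob, grid_xy, store_xy)
print(np.round(rel, 3))
```

In the toy example, the two high-probability grids next to existing stores receive values below 0, whereas the distant grid retains a positive value, mirroring the contrast between saturated centers and less contested areas.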

For municipal-level commercial centers, East Nanjing Road and West Nanjing Road similarly attract a large number of high-probability points for both brands. Starbucks also has a notable number of high-probability points on Middle Huaihai Road. However, opening stores in these commercial aggregation zones would similarly face significant challenges. Conversely, the high-probability points for Luckin Coffee in Wujiaochang face less competitive pressure, which makes it a suitable area for new store openings.

5. Discussion

From the perspective of influencing factors, the influence of natural environment, transportation accessibility [41,42], and economic conditions [43,44,45] on store layout is gradually diminishing for both brands. In terms of urban functional zones, science and education cultural facilities (X16) and transportation facilities (X13) have a positive influence on site selection for both brands but exhibit clear marginal effects. This suggests that the two brands share similar spatial preferences regarding urban functional zones and that in areas with excessive concentration or intense competition, stores may face diluted customer traffic and overlapping services, thereby reducing the marginal value of site selection. Distance to the nearest competitor (X24) and competitor density (X25) represent micro- and macro-level competitive factors, respectively. These factors show threshold and marginal effects on the two brands. The impact of competitor density (X25) is weaker for Luckin, which indicates that Starbucks is better positioned to leverage high-density commercial clusters to generate brand agglomeration effects and market recognition spillovers. This aligns with Starbucks’s strategic emphasis on “social experiences” and “brand value”. By contrast, Luckin tends to pursue rapid market penetration by avoiding saturated areas, thereby enhancing operational efficiency and reducing customer acquisition costs. The observed nonlinear effects are also consistent with the findings of Gao [21]. This shift from traditional factors to urban functional variables further underscores the enduring relevance of foundational concepts from location and diffusion theory—such as market demand, market competition, and agglomeration—even in the era of new retail, albeit in evolved forms [68].

In terms of urban spatial planning, Starbucks and Luckin Coffee exhibit stark differences in probability and suitability within the Central Activity Zone and the main urban area. Both brands currently face the issue of high probability but low suitability in their layouts within the Central Activity Zone, indicating increasingly limited opportunities for growth by simply expanding the number of stores. The Eastern International Consumption Cluster, as a commercial hub of Shanghai, includes East Nanjing Road, Middle Huaihai Road, and Xujiahui, all of which contain numerous high-probability site selection points. However, these areas already have dense store layouts, which leads to challenges such as homogeneous competition, service overflow, and sharply diminishing marginal returns for both brands. Notably, West Nanjing Road still offers development potential for Starbucks. The area’s well-crafted commercial spaces, combining artistic comfort with recognizable public spaces, and its strong brand presence create a favorable environment for Starbucks to further grow and differentiate itself in the market. In addition, the International Tourism Resort commercial center within the main urban area demonstrates high suitability exclusively for Starbucks. Centered around Disneyland, this area benefits from a consumer base with high spending power. Even without relying on an integrated online–offline approach, Starbucks can leverage the new retail model to provide customized products and services, capitalizing on the area’s unique market characteristics.

The location strategies of Starbucks and Luckin Coffee should not be viewed as a rejection of classical location theory but rather as adaptations and extensions under the logic of new retail. Starbucks consistently favors high-rent, high-footfall areas such as central activity zones, aligning with central place theory and bid rent theory, both of which emphasize accessibility and the willingness to pay for centrality. By contrast, Luckin Coffee’s dispersed layout supports the emerging polycentric urban spatial model. This indicates that new retail formats are challenging the monocentric assumption by increasingly prioritizing convenience, density, and localized demand over central prestige. The threshold effect at the micro scale (distance to the nearest competitor) and the marginal effect of saturation at the macro scale (competitor density) reflect the logic of the theory of minimum differentiation and its extensions into retail agglomeration theory. Starbucks thrives in highly saturated clusters, benefiting from brand-driven agglomeration and synergistic foot traffic. Meanwhile, Luckin Coffee actively avoids oversaturated areas, which reflects a strategy of competitive avoidance.

Overall, addressing site selection tasks through an integrated planning approach will not only sustain and encourage the layout and development of new retail stores in urban centers but also extend these efforts to suburban areas. This strategy promotes mutual reinforcement between suburban and central commercial centers, fostering balanced urban development.

6. Conclusions

This paper offers a novel perspective on the comparative analysis of the new retail model. The study area is divided into 100 m × 100 m grids, and a feature matrix is constructed from the spatial data of the two brands. Machine learning methods are used to select the key features for each brand and to predict location suitability. SHAP and PDPs are employed to compare and interpret the spatial drivers underlying the site selection mechanisms of the two brands. Furthermore, high-probability grids are overlaid with urban planning maps to guide future site selection strategies in alignment with formal spatial planning frameworks.

The findings reveal that urban functional variables—such as transportation and educational facilities—as well as the competitive landscape increasingly influence store location decisions. In Shanghai’s nearly saturated international business centers, West Nanjing Road still presents growth potential and is well suited for further expansion by Starbucks. By contrast, Luckin Coffee is better positioned to expand in municipal-level commercial centers, where the competitive intensity is relatively lower. These observations suggest that the new retail model necessitates a reconfiguration of spatial layout strategies in response to varying degrees of market saturation and urban functional structures. Rather than relying exclusively on central business districts, new retail brands are increasingly integrating localized demand, competitive dynamics, and functional zoning into their site selection strategies—signaling a broader shift toward more data-driven, adaptive, and decentralized approaches to urban commercial planning.

Overall, this study demonstrates the value of integrating machine learning and GIS in understanding and optimizing urban retail site selection, while also highlighting the importance of aligning site selection outcomes with formal urban planning documents. The proposed approach offers a scalable, data-driven framework that contributes to both academic research and practical urban commercial planning.

However, this study also has several limitations. This research divides the area into 100 m × 100 m grids. Although this scale is fine-grained, there is a potential issue of weak spatial heterogeneity between adjacent grids. A comparative analysis with larger-scale grids could be conducted in future studies. The analysis assumes that all site selection decisions are driven by brand-level strategic considerations. However, in practice, many new store openings—particularly for Starbucks—may be initiated by franchisees or influenced by local partners. These decentralized decisions may not strictly follow the optimization logic reflected in the model. Additionally, due to the deterioration of the global economic environment, the accompanying issue of consumption downgrading is likely to have a greater impact on Starbucks compared to Luckin Coffee.

Author Contributions

Methodology, Zhengxu Zhao and Youheng Xu; Validation, Zhengxu Zhao and Jianshu Duan; Formal analysis, Zhengxu Zhao; Writing—original draft, Zhengxu Zhao; Writing—review and editing, Zhengxu Zhao, Gang Chen, Jianshu Duan, and Youheng Xu; Visualization, Gang Chen; Supervision, Gang Chen. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (No. 42071172).

Data Availability Statement

Data are available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Wang, X.; Ng, C.T. New retail versus traditional retail in e-commerce: Channel establishment, price competition, and consumer recognition. Ann. Oper. Res. 2020, 291, 921–937.
Kang, W.; Shao, B. The impact of voice assistants’ intelligent attributes on consumer well-being: Findings from PLS-SEM and fsQCA. J. Retail. Consum. Serv. 2023, 70, 103130.
Öner, Ö. Retail Location. Ph.D. Thesis, Jönköping International Business School, Jönköping, Sweden, 2014.
Golovnin, O.; Igonina, A. Decision support system for location selection of convenience stores and retail facilities using optimization techniques and GIS. J. Phys. Conf. Ser. 2021, 2134, 012015.
Pozos-Brewer, R. Coffee Shops: Exploring Urban Sociability and Social Class in the Intersection of Public and Private Space. Bachelor’s Thesis, Department of Sociology & Anthropology, Swarthmore College, Swarthmore, PA, USA, 2015.
Montgomery, J. Café culture and the city: The role of pavement cafés in urban public social life. J. Urban Des. 1997, 2, 83–102.
Ratchford, B.; Soysal, G.; Zentner, A.; Gauri, D.K. Online and offline retailing: What we know and directions for future research. J. Retail. 2022, 98, 152–177.
You, Y.; Zhang, J. Analysis of new retail location based on GIS spatial analysis—Take Starbucks and Luckin Coffee for example. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2022, 48, 79–84.
Xu, L.; Li, F.; Huang, K.; Ning, J. A two-layer location choice model reveals what’s new in the “New Retail”. Ann. Am. Assoc. Geogr. 2023, 113, 635–657.
Yesilnacar, M.I.; Cetin, H. Site selection for hazardous wastes: A case study from the GAP area, Turkey. Eng. Geol. 2005, 81, 371–388.
Zhao, J.; Zong, B.; Wu, L. Site Selection Prediction for Coffee Shops Based on Multi-Source Space Data Using Machine Learning Techniques. ISPRS Int. J. Geo-Inf. 2023, 12, 329.
Tang, X.; Zou, C.; Shu, C.; Zhang, M.; Feng, H. Research on Site Selection Planning of Urban Parks Based on POI and Machine Learning—Taking Guangzhou City as an Example. Land 2024, 13, 1362.
Zhao, B.; Zheng, H.; Cheng, X. A Machine Learning Approach to Predict Site Selection from the Perspective of Vitality Improvement. Land 2024, 13, 2113.
Wang, Y.; Han, Y.; Luo, A.; Xu, S.; Chen, J.; Liu, W. Site selection and prediction of urban emergency shelter based on VGAE-RF model. Sci. Rep. 2024, 14, 14368.
Niu, Q.; Wang, G.; Liu, B.; Zhang, R.; Lei, J.; Wang, H.; Liu, M. Selection and prediction of metro station sites based on spatial data and random forest: A study of Lanzhou, China. Sci. Rep. 2023, 13, 22542.
Yao, Y.; Feng, C.; Xie, J.; Yan, X.; Guan, Q.; Han, J.; Zhang, J.; Ren, S.; Liang, Y.; Luo, P. A site selection framework for urban power substation at micro-scale using spatial optimization strategy and geospatial big data. Trans. GIS 2023, 27, 1662–1679.
Lu, J.; Zheng, X.; Nervino, E.; Li, Y.; Xu, Z.; Xu, Y. Retail store location screening: A machine learning-based approach. J. Retail. Consum. Serv. 2024, 77, 103620.
Mazhi, K.Z.; Suryana, L.E.; Davi, A.; Dewi, W.R. Site selection of retail shop based on spatial analysis and machine learning. In Proceedings of the 2020 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Depok, Indonesia, 17–18 October 2020; pp. 135–140.
Ganguly, P.; Mukherjee, I. Enhancing Retail Sales Forecasting with Optimized Machine Learning Models. In Proceedings of the 2024 4th International Conference on Sustainable Expert Systems (ICSES), Kaski, Nepal, 15–17 October 2024; pp. 884–889.
Yee, H.-J.; Ting, C.-Y.; Ho, C.C. Retail site selection using machine learning algorithms. Int. J. Recent Technol. Eng. 2019, 8, 2422–2431.
Feng, G.; Shunyi, L.; Zhenzhi, J.; Zhisai, H.; Yang, L.; Hongbao, L.; Jiemin, W.; Wangyang, C.; Guanyao, L. Location differs between traditional and new retail: A comparison analysis of Starbucks and Luckin Coffee in China using machine learning. Cities 2025, 158, 105668.
Feng, G.; Zexia, W.; Shunyi, L.; Wangyang, C.; Guanyao, L.; Zhenzhi, J. Cafe geography tells how locations vary across retail models. J. Retail. Consum. Serv. 2025, 84, 104174.
Yao, L.; Gao, C.; Xu, Y.; Zhang, X.; Wang, X.; Hu, Y. Prediction of Commercial Street Location Based on Point of Interest (POI) Big Data and Machine Learning. ISPRS Int. J. Geo-Inf. 2024, 13, 371.
Aversa, J.; Doherty, S.; Hernandez, T. Big data analytics: The new boundaries of retail location decision making. Pap. Appl. Geogr. 2018, 4, 390–408.
Bronnenberg, B.J.; Mela, C.F. Market roll-out and retailer adoption for new brands. Mark. Sci. 2004, 23, 500–518.
Saaty, T.L. A scaling method for priorities in hierarchical structures. J. Math. Psychol. 1977, 15, 234–281.
Duarte, L.; Teodoro, A.C.; Santos, P.; Rodrigues de Almeida, C.; Cardoso-Fernandes, J.; Flores, D. An Interactive WebGIS Integrating Environmental Susceptibility Mapping in a Self-Burning Waste Pile Using a Multi-Criteria Decision Analysis Approach. Geosciences 2022, 12, 352.
Miller, H.J.; Goodchild, M.F. Data-driven geography. GeoJournal 2015, 80, 449–461.
Abdelouhed, F.; Ahmed, A.; Abdellah, A.; Yassine, B.; Mohammed, I. GIS and remote sensing coupled with analytical hierarchy process (AHP) for the selection of appropriate sites for landfills: A case study in the province of Ouarzazate, Morocco. J. Eng. Appl. Sci. 2022, 69, 19.
Aboulola, O.I. A Literature Review of Spatial Location Analysis for Retail Site Selection. In Proceedings of the AMCIS, Boston, MA, USA, 10–12 August 2017.
Xiang, Y.; Chang, D.; Feng, X. Leveraging Urban Big Data for Informed Business Location Decisions: A Case Study of Starbucks in Tianhe District, Guangzhou City. In Proceedings of the 2023 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), Singapore, 18–21 December 2023; pp. 1012–1016.
Zhu, T.; Singh, V. Spatial competition with endogenous location choices: An application to discount retailing. QME 2009, 7, 1–35.
Mendes, A.B.; Themido, I.H. Multi-outlet retail site location assessment. Int. Trans. Oper. Res. 2004, 11, 1–18.
Si, W.; Yang, X. Medicine retail terminal layout and site selection problems based on machine learning research: Take S enterprise as an example. In Proceedings of the 2021 6th International Conference on Cloud Computing and Internet of Things, Okinawa, Japan, 22–24 September 2021; pp. 22–28.
Duan, J.; Zhao, Z.; Xu, Y.; You, X.; Yang, F.; Chen, G. Spatial Distribution Characteristics and Driving Factors of Little Giant Enterprises in China’s Megacity Clusters Based on Random Forest and MGWR. Land 2024, 13, 1105.
Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: San Francisco, CA, USA, 2017; Volume 30.
Molnar, C. Interpretable Machine Learning; Lulu.com: Morrisville, NC, USA, 2020.
Probst, P.; Wright, M.N.; Boulesteix, A.L. Hyperparameters and tuning strategies for random forest. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1301.
Araldi, A.; Fusco, G. Retail fabric assessment: Describing retail patterns within urban space. Cities 2019, 85, 51–62.
Jiang, H.; He, G. Analysis of spatial and temporal evolution of regional water resources carrying capacity and influencing factors—Anhui Province as an example. Sustainability 2023, 15, 11255.
Murray, A.T.; Wu, X. Accessibility tradeoffs in public transit planning. J. Geogr. Syst. 2003, 5, 93–107.
Tinessa, F.; Pagliara, F.; Biggiero, L.; Veneri, G.D. Walkability, accessibility to metro stations and retail location choice: Some evidence from the case study of Naples. Res. Transp. Bus. Manag. 2021, 40, 100549.
Shi, K.; Yu, B.; Huang, Y.; Hu, Y.; Yin, B.; Chen, Z.; Chen, L.; Wu, J. Evaluating the ability of NPP-VIIRS nighttime light data to estimate the gross domestic product and the electric power consumption of China at multiple scales: A comparison with DMSP-OLS data. Remote Sens. 2014, 6, 1705–1724.
Tsiotsou, R.H.; Rigopoulou, I.D.; Kehagias, J.D. Tracing customer orientation and marketing capabilities through retailers’ websites: A strategic approach to internet marketing. J. Target. Meas. Anal. Mark. 2010, 18, 79–94.
Havel, M.B. Delineation of property rights as institutional foundations for urban land markets in transition. Land Use Policy 2014, 38, 615–626.
Yao, H.; Zhang, G. A review of commercial geography: Theoretical foundations, practical applications, and prospects. Geogr. Res. Bull. 2024, 3, 183–214.
Donner, H.; Loh, T.H. Does the Starbucks effect exist? Searching for a relationship between Starbucks and adjacent rents. Prop. Manag. 2019, 37, 562–578.
Wang, Z.; Ma, D.; Sun, D.; Zhang, J. Identification and analysis of urban functional area in Hangzhou based on OSM and POI data. PLoS ONE 2021, 16, e0251988.
Berry, B.J. Central places in southern Germany. Econ. Geogr. 1967, 43, 275–276.
Huff, D.L. A probabilistic analysis of shopping center trade areas. Land Econ. 1963, 39, 81–90.
Hotelling, H. Stability in competition. Econ. J. 1929, 39, 41–57.
Haig, R.M. Regional Survey of New York and Its Environs; New York City Planning Commission: New York, NY, USA, 1927.
Foxall, G. Consumer Psychology in Behavioral Perspective; Beard Books: Fairless Hills, PA, USA, 2004.
Dolega, L.; Reynolds, J.; Singleton, A.; Pavlis, M. Beyond retail: New ways of classifying UK shopping and consumption spaces. Environ. Plan. B Urban Anal. City Sci. 2021, 48, 132–150.
Roig-Tierno, N.; Baviera-Puig, A.; Buitrago-Vera, J.; Mas-Verdu, F. The retail site location decision process using GIS and the analytical hierarchy process. Appl. Geogr. 2013, 40, 191–198.
Dadashpoor, H.; Yousefi, Z. Centralization or decentralization? A review on the effects of information and communication technology on urban spatial structure. Cities 2018, 78, 194–205.
Nelson, R.L. The Selection of Retail Locations; FW Dodge Corporation: Cincinnati, OH, USA, 1958.
Efeoglu, H.E.; Joutsiniemi, A.; Mozuriunaite, S. Exploring the plot patterns of the retail landscape: The case of the Helsinki Metropolitan area. Environ. Plan. B Urban Anal. City Sci. 2024, 51, 1210–1226.
China Daily. Shanghai Tops the World with Over 9500 Coffee Shops. Available online: https://www.chinadailyhk.com/hk/article/582889 (accessed on 3 March 2025).
Luckin Coffee Inc. 2024 Annual Report. Available online: https://investor.luckincoffee.co/static-files/a171f854-0b83-4a0b-b00c-bfa752887010 (accessed on 3 March 2025).
Starbucks Corporation. Q1 Fiscal 2024 Results. 2024. Available online: https://s203.q4cdn.com/326826266/files/doc_financials/2024/ar/Starbucks-Fiscal-2024-Annual-Report (accessed on 3 March 2025).
Guan, C.; Rowe, P.G. The concept of urban intensity and China’s townization policy: Cases from Zhejiang Province. Cities 2016, 55, 22–41.
Gong, Z.; Ma, Q.; Kan, C.; Qi, Q. Classifying street spaces with street view images for a spatial indicator of urban functions. Sustainability 2019, 11, 6424.
Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: Berlin/Heidelberg, Germany, 2009; Volume 2.
Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
Baldwin, R.E.; Okubo, T. Heterogeneous firms, agglomeration and economic geography: Spatial selection and sorting. J. Econ. Geogr. 2006, 6, 323–346.
Zhou, L.; Wang, S.; Li, H. Store network expansion in the era of online consumption: Evidence from the Suning Appliance retail chain in China. Appl. Geogr. 2024, 165, 103225.

Figure 1. Location of the study area.

Figure 2. Flowchart of the Research Framework.

Figure 3. SHAP Plots of Luckin Coffee and Starbucks.

Figure 4. The accuracy of different models under varying numbers of input features.

Figure 5. Performance Variation in Models with Different Numbers of Feature Variables.

Figure 6. Feature Importance.

Figure 7. Partial Dependence Plot.

Figure 8. Spatial Distribution of Site Suitability for Luckin Coffee and Starbucks Stores.

Figure 9. Distribution of High-Probability Points within Commercial Aggregation Zones.

Figure 10. Results of Relative Suitability Testing.

Table 1. Brand Backgrounds and Core Operational Differences.

Dimension	Luckin Coffee	Starbucks
Year established	2017 [59]	1971 (USA), entered China in 1999
Market positioning	Affordable, convenient, high-frequency consumption	Premium, experience-oriented consumption
Store composition	98.9% pick-up stores (20–60 m²), 1.1% relax stores (>120 m²)	Primarily relax stores
Order structure	17.1% delivery orders, 82.9% mobile orders	48% on-site orders, 52% delivery and mobile orders
Marketing strategy	App-centered, emphasizing “traffic before quality” with aggressive promotions	Store-centered, relying on a membership system and trend-driven products

Table 2. Data Sources of This Study.

Data Types	Temporal Scope	Data Formats	Data Source Name	URL
Nighttime light data	2023	Raster data (100 m resolution)	Visible Infrared Imaging Radiometer Suite (VIIRS) Nighttime Light	https://eogdata.mines.edu/products/vnl/ (accessed on 30 January 2024)
Land use data	2022 ¹	Raster data (100 m resolution)	Sentinel-2 Land Cover Explorer (Esri)	https://livingatlas.arcgis.com/landcoverexplorer/ (accessed on 30 January 2024)
Road network data	2023	Vector data (Line features)	National Catalogue Service For Geographic Information	https://www.webmap.cn/
Population data	2022 ¹	Raster data (100 m resolution)	LandScan Global Population Database (ORNL)	https://landscan.ornl.gov/
Listed Housing Prices	2023	Vector data (Point features)	Listed Housing Prices (Anjuke Real Estate Platform)	https://shanghai.anjuke.com/
Elevation data	2022 ¹	Raster data (30 m resolution)	Copernicus Digital Elevation Model (DEM)	https://dataspace.copernicus.eu/explore-data/data-collections/copernicus-contributing-missions (accessed on 30 January 2024)
POI data	2023	Vector data (Point features)	AutoNavi Map (Amap)	https://lbs.amap.com/

¹ Most datasets used in this study are from 2023. However, due to limited availability, some variables were sourced from 2022 as the most recent reliable data at the time of research.

Table 3. Factors Influencing the Location Selection of Luckin Coffee and Starbucks in Shanghai.

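Dimension	Features	Identifier	Data Types
Natural environment	Elevation	X1	Elevation data
Natural environment	Land use type	X2	Land use data
Transportation accessibility	Road network density	X3	Road network data
Transportation accessibility	Distance to nearest major road	X4	Road network data
Transportation accessibility	Distance to nearest subway entrance	X5	POI data
Transportation accessibility	Distance to nearest bus stop	X6	POI data
Economic conditions	Nighttime light intensity	X7	Nighttime light data
Economic conditions	Population density	X8	Population data
Economic conditions	Land rent	X9	Listed Housing Prices
Market demand	Distance to nearest mall	X10	POI data
Market demand	Distance to nearest office building	X11	POI data
Market demand	Distance to nearest university	X12	POI data
Urban functional zoning	Transportation facilities	X13	POI data
Urban functional zoning	Dining services	X14	POI data
Urban functional zoning	Residential areas	X15	POI data
Urban functional zoning	Science and education cultural facilities	X16	POI data
Urban functional zoning	Shopping services	X17	POI data
Urban functional zoning	Life services	X18	POI data
Urban functional zoning	Medical facilities	X19	POI data
Urban functional zoning	Scenic spots	X20	POI data
Urban functional zoning	Sports and leisure facilities	X21	POI data
Urban functional zoning	Government institutions	X22	POI data
Urban functional zoning	Corporate businesses	X23	POI data
The competitive landscape	Distance to nearest competitor	X24	POI data
The competitive landscape	Competitor density	X25	POI data
The competitive landscape	Number of same-brand coffee shops within walking distance	X26	POI data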

Table 4. The features utilized in the location prediction model for Luckin Coffee and Starbucks.

Research Subject	Selected Features (Sorted from Highest to Lowest Influence)
Luckin Coffee	X16, X24, X11, X13, X14, X18, X7, X15, X6, X10, X25, X5, X12, X8
Starbucks	X13, X24, X16, X25, X7, X11, X14, X10, X15, X18, X5, X6, X3, X9, X1, X23, X21, X12
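
Table 4 lists features only by identifier, so reading it requires cross-referencing Table 3. The following sketch decodes the identifiers into feature names and compares the two selections. The dictionary contents are transcribed from Tables 3 and 4; the function names `decode` and `only_in` are hypothetical helpers, not part of the study's code.

```python
# Identifier-to-name mapping transcribed from Table 3.
FEATURE_NAMES = {
    "X1": "Elevation",
    "X2": "Land use type",
    "X3": "Road network density",
    "X4": "Distance to nearest major road",
    "X5": "Distance to nearest subway entrance",
    "X6": "Distance to nearest bus stop",
    "X7": "Nighttime light intensity",
    "X8": "Population density",
    "X9": "Land rent",
    "X10": "Distance to nearest mall",
    "X11": "Distance to nearest office building",
    "X12": "Distance to nearest university",
    "X13": "Transportation facilities",
    "X14": "Dining services",
    "X15": "Residential areas",
    "X16": "Science and education cultural facilities",
    "X17": "Shopping services",
    "X18": "Life services",
    "X19": "Medical facilities",
    "X20": "Scenic spots",
    "X21": "Sports and leisure facilities",
    "X22": "Government institutions",
    "X23": "Corporate businesses",
    "X24": "Distance to nearest competitor",
    "X25": "Competitor density",
    "X26": "Number of same-brand coffee shops within walking distance",
}

# Selected features per brand, in the order given in Table 4
# (highest to lowest influence).
SELECTED = {
    "Luckin Coffee": ["X16", "X24", "X11", "X13", "X14", "X18", "X7",
                      "X15", "X6", "X10", "X25", "X5", "X12", "X8"],
    "Starbucks": ["X13", "X24", "X16", "X25", "X7", "X11", "X14", "X10",
                  "X15", "X18", "X5", "X6", "X3", "X9", "X1", "X23",
                  "X21", "X12"],
}


def decode(brand):
    """Return the selected feature names for a brand, most influential first."""
    return [FEATURE_NAMES[x] for x in SELECTED[brand]]


def only_in(brand, other):
    """Identifiers selected for `brand` but not for `other`, in Table 4 order."""
    other_set = set(SELECTED[other])
    return [x for x in SELECTED[brand] if x not in other_set]


if __name__ == "__main__":
    for brand in SELECTED:
        print(brand, "top three:", decode(brand)[:3])
    print("Luckin only:", only_in("Luckin Coffee", "Starbucks"))
    print("Starbucks only:", only_in("Starbucks", "Luckin Coffee"))
```

Decoded this way, the two selections share 13 features. Population density (X8) is retained only for Luckin Coffee, while road network density, land rent, elevation, corporate businesses, and sports and leisure facilities are retained only for Starbucks.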

Table 5. Model Hyperparameters and Descriptions.

Hyperparameters	Luckin Coffee	Starbucks
n_estimators	420	300
min_samples_split	2	5
min_samples_leaf	1	2
max_depth	15	16
max_features	sqrt	sqrt
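
The hyperparameter names in Table 5 match the constructor arguments of scikit-learn's `RandomForestClassifier`, so the two configurations could be instantiated roughly as follows. This is a sketch under the assumption that scikit-learn was the library used; the `random_state` argument and the `build_classifier` helper are hypothetical additions for reproducibility and are not reported in the table.

```python
# Hyperparameters transcribed from Table 5.
RF_PARAMS = {
    "Luckin Coffee": {
        "n_estimators": 420,
        "min_samples_split": 2,
        "min_samples_leaf": 1,
        "max_depth": 15,
        "max_features": "sqrt",
    },
    "Starbucks": {
        "n_estimators": 300,
        "min_samples_split": 5,
        "min_samples_leaf": 2,
        "max_depth": 16,
        "max_features": "sqrt",
    },
}


def build_classifier(brand, random_state=0):
    """Create an untrained random forest with the Table 5 settings.

    scikit-learn is imported lazily so the parameter table can be
    inspected without the library installed. `random_state` is an
    assumption; the paper's seed is not given in this section.
    """
    from sklearn.ensemble import RandomForestClassifier

    return RandomForestClassifier(random_state=random_state, **RF_PARAMS[brand])


if __name__ == "__main__":
    for brand, params in RF_PARAMS.items():
        print(brand, params)
```

A fitted model would then be obtained with `build_classifier("Starbucks").fit(X_train, y_train)`, where `X_train` holds the Table 4 features for each grid cell and `y_train` marks whether the cell contains a store.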

Table 6. Model Evaluation Metric Results.

Research Subject	Accuracy	Precision	Recall	F1	ROC
Luckin Coffee (training set)	0.986	0.979	0.993	0.986	0.986
Luckin Coffee (validation set)	0.900	0.893	0.918	0.906	0.900
Starbucks (training set)	0.982	0.976	0.990	0.983	0.982
Starbucks (validation set)	0.922	0.909	0.951	0.929	0.920
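
As a quick consistency check on Table 6, F1 is the harmonic mean of precision and recall, so each reported F1 can be recomputed from the other two columns. The sketch below does this for all four rows using the values transcribed from the table. The helper names are hypothetical, and small gaps are expected because the inputs are already rounded to three decimals.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) transcribed from Table 6.
TABLE_6 = {
    "Luckin Coffee (training set)": (0.979, 0.993, 0.986),
    "Luckin Coffee (validation set)": (0.893, 0.918, 0.906),
    "Starbucks (training set)": (0.976, 0.990, 0.983),
    "Starbucks (validation set)": (0.909, 0.951, 0.929),
}


def max_f1_gap(rows):
    """Largest absolute difference between recomputed and reported F1."""
    return max(abs(f1_from(p, r) - f1) for p, r, f1 in rows.values())


if __name__ == "__main__":
    for name, (p, r, reported) in TABLE_6.items():
        print(f"{name}: recomputed {f1_from(p, r):.3f}, reported {reported:.3f}")
    print(f"largest gap: {max_f1_gap(TABLE_6):.4f}")
```

All four recomputed values agree with the reported F1 to within rounding of the published precision and recall.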

Table 7. The Average Site Selection Probabilities for Commercial Centers of Different Levels.

Commercial Center System	Average Probability of Site Selection (Luckin Coffee)	Average Probability of Site Selection (Starbucks)
International business center	0.756	0.775
Municipal commercial center	0.702	0.755
District-level commercial center	0.554	0.578


© 2025 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

