Exploring the Spatial Association Between Spatial Categorical Data Using a Fuzzy Geographically Weighted Colocation Quotient Method

Li, Ling; Duan, Lian; Li, Meiyi; Mai, Xiongfa

doi:10.3390/ijgi14080296


¹ Key Laboratory of Environment Change and Resources Use in Beibu, Nanning Normal University, Nanning 530000, China

² School of Natural Resources and Surveying, Nanning Normal University, Nanning 530000, China

³ Joint Centre for Urban Health and Security Intelligent Data Analytics, Nanning Normal University, Nanning 530000, China

⁴ School of Mathematics and Statistics, Nanning Normal University, Nanning 530000, China

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2025, 14(8), 296; https://doi.org/10.3390/ijgi14080296

Submission received: 18 May 2025 / Revised: 17 July 2025 / Accepted: 24 July 2025 / Published: 29 July 2025

(This article belongs to the Special Issue Spatial Data Science and Knowledge Discovery)


Abstract

Spatial association analysis is essential for understanding interdependencies, spatial proximity, and distribution patterns within spatial data. The spatial scale is a key factor that significantly affects the result of spatial association mining. Traditional methods often rely on a fixed distance threshold (bandwidth) to define the scale effect, which can lead to scale sensitivity and discontinuous results. To address these limitations, this study introduces the Fuzzy Geographically Weighted Colocation Quotient (FGWCLQ) method. By integrating fuzzy theory, FGWCLQ replaces binary distance cutoffs with continuous membership functions, providing a more flexible and stable approach to spatial association mining. Using Point of Interest (POI) data from the Beijing urban area, FGWCLQ was applied to explore both intra- and inter-category spatial association patterns among star hotels, transportation facilities, and tourist attractions at different fuzzy neighborhoods. The results indicate that FGWCLQ can reliably discover globally prevalent spatial associations among diverse facility types and visualize the spatial heterogeneity at various spatial scales. Compared to the deterministic GWCLQ method, FGWCLQ delivers more stable and robust results across varying spatial scales and generates more continuous association surfaces, which enable clear visualization of hierarchical clustering. Empirical findings provide valuable insights for optimizing the location of star hotels and supporting decision-making in urban planning. The method is available as an open-source MATLAB package, providing a practical tool for diverse spatial association investigations.

Keywords:

spatial association; fuzzy neighborhood; fuzzy geographically weighted colocation quotient; spatial categorical data; Beijing

1. Introduction

With advancements in geographic information observation technology, there has been an explosion in the growth of spatial categorical data [1,2]. This type of data, which can be nominal, ordinal, or interval, often exhibits interrelationships and interdependencies in both spatial and temporal dimensions [3]. Spatial association, defined as the process of identifying the relationships between geographical locations and the features present at those locations [4], is a central focus in spatial data mining [5,6]. Understanding the spatial association between categories or features is crucial for identifying spatial patterns, trends, and relationships in specific locations. This analysis helps researchers explore how spatial factors influence various outcomes [7,8]. Unlike the analysis of numeric data, spatial association analysis in spatial categorical data must consider not only the characteristics of different categories, but also the inter-spatial relationship among the categories [9].

Traditional spatial association mining primarily focuses on spatial data comprising continuous variables, such as measurements, counts, or quantities associated with specific locations. These analyses aim to uncover how values change across space or how they correlate with other continuous attributes [10,11]. However, the spatial association rules in spatial categorical data, which consists of discrete categories or classes assigned to specific locations, often investigate the relationships between different categories and how they colocate or spatially influence each other. Mining spatial association rules in categorical data is a more complex task than mining in other spatial data. Two key factors contribute to this complexity: (1) the complex interaction between multiple categories and (2) the scale issues of the spatial objects. The former arises from the fact that different categories interact with each other with varying intensities and directions, at different places or times, presenting heterogeneity [12]. The latter refers to the fact that the association relationship may vary across different spatial scales [13]. Therefore, studying the spatial heterogeneity of relationships between categories and selecting an appropriate scale remain challenging tasks in analyzing spatial associations of categorical data.

Techniques for analyzing spatial association in spatial categorical data often involve colocation pattern analysis. A spatial colocation pattern represents a subset of spatial features whose instances are often located in close geographic proximity [14,15]. Existing approaches for discovering colocation rules can be primarily categorized into data mining methods and spatial statistical methods [15]. Data mining methods, such as the Apriori algorithm [16], FP-Growth algorithm [17], and decision tree methods [18], primarily focus on identifying frequent item sets and deriving association rules between different categories of spatial data. While these methods offer efficiency and flexibility, they can be algorithmically complex and may yield less interpretable results [16]. Spatial statistical methods, on the other hand, offer a robust theoretical foundation and intuitive results, and have been widely used in geographic analysis [19,20]. Several spatial statistical methods have been proposed to study spatial association rules, including Nearest Neighbor Analysis [21], the cross-K function [22], and the Colocation Quotient [23,24]. These methods focus on finding statistically significant colocation patterns between two spatial categories using significance tests under the null hypothesis of independence [25]. The Colocation Quotient (CLQ) is a common method that measures the bivariate association between category pairs. It is defined as a ratio of probabilities, where the denominator calculates the expected proportion by chance, and the numerator is the observed number of category B points that are nearest neighbors of category A points [26]. The CLQ provides detailed spatial association analysis and visualizes the heterogeneity of relationships within the study area [27,28]. However, traditional spatial statistical methods often depend on the spatial scale and are sensitive to scale parameters.

In spatial colocation pattern mining, defining spatial scale or neighborhood is a critical step, where all points within that scale are considered close [21,29]. Different neighborhood definitions can lead to significant disparities in the analysis results [30,31,32]. A common approach to defining the neighbor relationship (K-neighbor) between instances is to set a distance threshold (referred to as bandwidth), which aligns with spatial data and provides realistic geographic context. Most spatial statistical methods use the kernel function to measure the decay of spatial effect. The scale parameter is a key parameter that influences the result. Traditional spatial statistical methods are highly sensitive to bandwidth selection, which poses challenges for determining an optimal bandwidth in practical applications [33]. Moreover, the optimal bandwidth always leads to a hard boundary (e.g., correlated within the bandwidth, uncorrelated beyond it) and overlooks the fuzziness of the concept of “adjacency” in geographical space, as the distance between objects is not a simple binary relationship but rather involves a gradual, fuzzy degree of proximity [34]. Therefore, using a single bandwidth not only leads to the loss of potential neighborhood associations but also undermines the objectivity of characterizing correlations among spatial features. As a result, the analysis becomes highly sensitive to bandwidth selection, introducing potential bias and instability [35].

To address the limitations of hard boundaries in spatial relationships, fuzzy theory has been introduced into spatial analysis [36]. It employs fuzzy concepts instead of rigid distance definitions [37,38], and has proved reliable and stable in spatial analysis, such as clustering [39,40], classification [41,42] and association rules [43,44]. Compared to traditional statistical methods, the fuzzy approach offers several advantages: (1) it is computationally more efficient [45]; (2) fuzzy neighborhoods more naturally and flexibly express the gradual nature of spatial proximity [34]; (3) fuzzy methods are less affected by parameters [46]. Due to these advantages, fuzzy methods have been incorporated into spatial statistical techniques, yielding reliable results. For example, Mason and Jacobson [47] proposed a fuzzy geographically weighted clustering method for geodemographic segmentation analysis. They incorporated space information into a clustering algorithm through spatial interaction, handling the geographical neighborhood effect, and improving the standard fuzzy c-means algorithm. Grekousis [48] improved the original FGWC and proposed a local fuzzy geographically weighted clustering method by applying a local spatial weight metric to handle spatial dependence and spatial heterogeneity, which is particularly important in spatial analysis. Although these methods demonstrated the flexibility and efficiency of fuzzy theory in spatial analysis, existing research still lacks a comprehensive framework that fully addresses the complex interactions among multiple categorical variables and the scale-related challenges inherent in spatial categorical data analysis.

Based on the above analysis, critical research gaps remain in spatial association analysis of categorical data. First, traditional spatial association mining primarily focuses on global associations but overlooks the spatial non-stationarity of these associations and fails to account for scale-dependent associations. Second, traditional statistical methods for categorical data, such as the Colocation Quotient (CLQ), are constrained by rigid single-bandwidth thresholds, which introduce distance bias and overlook the fuzzy nature of geographic adjacency. Lastly, few studies compare fuzzy versus deterministic methods (e.g., CLQ and GWCLQ) for capturing spatial heterogeneity, and most applications lack cross-domain generalizability. Collectively, these gaps highlight the need for a framework that integrates the flexibility of fuzzy methods with objective parameterization and multi-scale validation to effectively analyze complex urban data.

To fill the gap above, this study proposes an innovative Fuzzy Geographically Weighted Colocation Quotient (FGWCLQ) method to address the limitations of rigid single-bandwidth thresholds, significantly enhancing the precision and applicability of spatial association analysis from global and local perspectives. This method first defines a fuzzy neighborhood between spatial category objects and then proposes a new method to study both global and local associations in the category data. Finally, the method is validated through an empirical analysis of Point of Interest (POI) data in Beijing. The results indicate that FGWCLQ can explore complex associations between various categorical variables, yielding more robust and stable results. The innovations of this paper are threefold:

(1): This study defines a fuzzy neighborhood between spatial categorical points and constructs a fuzzy geographically weighted matrix to measure spatial proximity.
(2): This study proposes a new method, Fuzzy Geographically Weighted Colocation Quotient (FGWCLQ), to explore the association relationship between various categorical variables and visualize the spatial heterogeneity of these relationships.
(3): This study proposes a multi-scale framework to explore the hidden spatial association rules among the different facility categories in Beijing using FGWCLQ.

This paper is structured as follows. The methodology is given in Section 2. Section 3 presents the study area and data sources. Section 4 presents the experimental results of the FGWCLQ method in analyzing the spatial relationships between star hotels, transportation facilities, and tourist attractions in Beijing, and compares the empirical results of the FGWCLQ with those of the GWCLQ. Section 5 discusses and summarizes the conclusions of this paper.

2. Methodology

In this section, the details of FGWCLQ are introduced. To formally describe the algorithm, Table 1 lists the main notations used in this paper.

2.1. Fuzzy Neighbor Relationship Between Spatial Categorical Data

Definition 1.

(Spatial categorical data) Spatial categorical data are represented as discrete spatial points, each with a location and a category/feature, as in Equation (1):

O = \{ o_{i} \mid o_{i} = (id, l(x_{i}, y_{i}), f_{i}) \},

(1)

where l(x_{i}, y_{i}) is the location, usually represented by the latitude and longitude, and f_{i} \in F (i = 1, 2, \dots, K) is the category/feature of o_{i}.

Definition 2.

(Fuzzy relationship between spatial categorical points, FR) Suppose D ⊆ R⁺ is the set of distances between spatial points; then the fuzzy relationship FR between o_i and o_j in the spatial category point set O is defined as in Equation (2):

FR(O) = \{ U(d(o_{i}, o_{j})) \mid o_{i}, o_{j} \in O, \ d(o_{i}, o_{j}) \in D \},

(2)

where U(d(o_{i}, o_{j})) is the fuzzy membership function of FR, indicating the probability that d(o_{i}, o_{j}) belongs to FR [38]. The FR maps the distance between two points to a value within the interval [0, 1].

The fuzzy membership function U(·) is critical for FR(O). There are various definitions for membership functions, including triangular and trapezoidal membership functions in the piecewise linear category, as well as other forms such as Gaussian and S-shaped membership functions [49,50]. Among them, the piecewise linear membership function is simple to calculate, implement, and understand, making it suitable for different data analyses [51]. Considering the multi-scale analysis of the spatial pattern, this study uses a distance range [d₁, d₂] (d₂ > d₁) to define the fuzzy membership function, as in Equation (3) [33,52]:

U(d(o_{i}, o_{j})) = \begin{cases} 1, & d(o_{i}, o_{j}) \leq d_{1} \\ 1 - \frac{d(o_{i}, o_{j}) - d_{1}}{d_{2} - d_{1}}, & d_{1} < d(o_{i}, o_{j}) \leq d_{2} \\ 0, & d(o_{i}, o_{j}) > d_{2} \end{cases}

(3)
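The piecewise linear membership function of Equation (3) can be sketched in a few lines. This is only an illustrative Python version (the released package is in MATLAB); the function name `fuzzy_membership` and the sample distances are assumptions made for the example, using the 500–1000 m neighborhood as the range [d₁, d₂].

```python
def fuzzy_membership(d, d1, d2):
    """Piecewise linear membership U(d) following Equation (3).

    d  : distance between two points (same unit as d1 and d2)
    d1 : inner bound, full membership up to this distance
    d2 : outer bound, zero membership beyond this distance
    """
    if not 0 <= d1 < d2:
        raise ValueError("expected 0 <= d1 < d2")
    if d <= d1:
        return 1.0
    if d <= d2:
        return 1.0 - (d - d1) / (d2 - d1)
    return 0.0


# Illustrative distances in metres for a [500 m, 1000 m] fuzzy neighborhood.
for d in (300, 500, 750, 1000, 1200):
    print(d, fuzzy_membership(d, 500, 1000))
```

Membership stays at 1 up to d₁, falls linearly to 0 at d₂, and is 0 beyond it, so a pair at exactly d₂ has zero membership.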

Definition 3.

(Fuzzy neighborhood set, FNS) Based on the definition of the fuzzy relationship FR(O), the fuzzy neighborhood set FNS(O) is defined as in Equation (4):

FNS(O) = \{ (o_{i}, o_{j}) \mid U(d(o_{i}, o_{j})) > 0 \},

(4)

where U(d(o_{i}, o_{j})) > 0 indicates that the fuzzy neighborhood membership degree is greater than 0 in the fuzzy neighbor relationship FR(O) [52].

Definition 4.

(Fuzzy geographical weight, FGW) The fuzzy geographical weight FGW(o_i, o_j) combines the traditional geographical weight with the fuzzy membership degree U(d(o_{i}, o_{j})) and describes the strength and uncertainty of the spatial relationship between points o_i and o_j. FGW(o_i, o_j) is defined as in Equation (5):

\begin{aligned} FGW(o_{i}, o_{j}) &= U(d(o_{i}, o_{j})) \times W(o_{i}, o_{j}) \\ &= \begin{cases} \exp\left(-0.5 \times \frac{d(o_{i}, o_{j})^{2}}{d_{1}^{2}}\right), & d(o_{i}, o_{j}) \leq d_{1} \\ \left(1 - \frac{d(o_{i}, o_{j}) - d_{1}}{d_{2} - d_{1}}\right) \times \exp\left(-0.5 \times \frac{d(o_{i}, o_{j})^{2}}{d_{1}^{2}}\right), & d_{1} < d(o_{i}, o_{j}) \leq d_{2} \\ 0, & \text{otherwise} \end{cases} \end{aligned}

(5)

W(o_{i}, o_{j}) = \exp\left(-\frac{1}{2} \cdot \frac{d(o_{i}, o_{j})^{2}}{d_{1}^{2}}\right)

(6)

where W(o_i, o_j) is the geographical weight matrix, computed with a Gaussian kernel function, and d_{1} is the bandwidth. The fuzzy geographical weight integrates the traditional geographical weight with fuzzy neighborhoods to redefine the decay pattern of spatial associations through a soft boundary interval [d₁, d₂]. Specifically, in the core region (d(o_{i}, o_{j}) ≤ d₁), a Gaussian kernel function is employed to maintain the locality of the core area. In the transition zone (d₁ < d(o_{i}, o_{j}) ≤ d₂), an extra linear decay factor is applied on top of the original Gaussian kernel, which accelerates the decline of the weights. This approach mitigates the influence of distant points while avoiding abrupt changes in weight. At long distances (d(o_{i}, o_{j}) > d₂), weights are set to zero to eliminate interference from distant outliers.
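As a rough illustration of Equations (5) and (6), the weight can be computed as the product of the linear membership and the Gaussian kernel with bandwidth d₁. The sketch below is a plain-Python approximation, not the authors' MATLAB code; the name `fgw` and the sample distances are assumptions.

```python
import math


def fgw(d, d1, d2):
    """Fuzzy geographical weight FGW = U(d) * W(d), Equations (5)-(6).

    Assumes 0 < d1 < d2; d1 also serves as the Gaussian bandwidth.
    """
    if not 0 < d1 < d2:
        raise ValueError("expected 0 < d1 < d2")
    if d > d2:
        return 0.0  # beyond the soft boundary: no influence
    # Linear membership: 1 in the core, decaying to 0 across [d1, d2].
    u = 1.0 if d <= d1 else 1.0 - (d - d1) / (d2 - d1)
    # Gaussian kernel with bandwidth d1.
    w = math.exp(-0.5 * (d / d1) ** 2)
    return u * w


# Illustrative distances in metres for d1 = 500 m, d2 = 1000 m.
for d in (0, 250, 500, 750, 1000, 1500):
    print(d, round(fgw(d, 500, 1000), 4))
```

The weight is continuous at d₁ (both branches give exp(−0.5)) and reaches zero at d₂, which is the soft-boundary behavior described above.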

2.2. The Fuzzy Neighborhood Boundary Setting Approach

The definition of fuzzy neighborhoods is fundamental to the fuzzy geographically weighted approach, which seeks to quantify graded spatial associations between target units and their surrounding entities. This is typically operationalized through two key parameters, d1 and d2. Establishing these parameters requires balancing theoretical rigor, data characteristics, and the specific study context. Two primary methodological approaches are commonly employed.

Knowledge-driven definition: This approach begins with a clear articulation of research objectives and contextual background, followed by a comprehensive review of relevant literature. It is further refined through consultations with domain experts and adjustments to neighborhood parameters based on specific characteristics of the study area, such as the spatial distribution and density of points.

Data-driven approaches: Two statistical methods are typically used to define fuzzy neighborhoods:

(1) Cumulative frequency analysis: This method utilizes distance histograms to identify structural breakpoints in distance distributions via cumulative frequency curves. Such breakpoints correspond to inflection points where the slope of the curve—that is, the rate of frequency change—shifts abruptly, indicating qualitative transitions in spatial interaction density across distance intervals. For histograms with approximately linear distribution characteristics, critical thresholds are determined using percentiles of cumulative frequencies (e.g., 10%, 20%, and 70%) of inter-entity distances, chosen to align with natural data divisions and reflect empirical spatial patterns [53].

(2) Rule-of-thumb: This method operationalizes parameter selection through statistically grounded heuristics, ensuring thresholds are anchored in quantifiable properties of the distance dataset. For example, Baride proposed a “Rule-of-thumb” for selecting these two parameters based on the mean and standard deviation of distances, defined as follows [54]:

d_{1} = \max(0, m - \sigma)

(7)

d_{2} = m + \sigma

(8)

where m represents the mean distance between all data points, and σ is the standard deviation of these distances.

In this study, we adopt both approaches to determine the bandwidth parameters and explore multi-scale spatial effects by testing different values.
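Both data-driven options can be sketched with the Python standard library. The example below makes several assumptions that the text does not specify: coordinates are taken to be projected (metres), the population standard deviation is used for σ, and percentiles use the nearest-rank rule. The function names are hypothetical.

```python
import math
import statistics
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distances between all unordered pairs of (x, y) points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def rule_of_thumb(distances):
    """d1 = max(0, m - sigma), d2 = m + sigma, as in Equations (7)-(8)."""
    m = statistics.mean(distances)
    sigma = statistics.pstdev(distances)  # assumption: population std. dev.
    return max(0.0, m - sigma), m + sigma


def percentile_thresholds(distances, percents=(10, 20, 70)):
    """Distances at the given cumulative-frequency percentiles (nearest rank)."""
    ordered = sorted(distances)
    n = len(ordered)
    return [ordered[max(0, math.ceil(p / 100 * n) - 1)] for p in percents]


# Toy coordinates in metres (hypothetical data, for illustration only).
pts = [(0, 0), (3, 4), (6, 8)]
dists = pairwise_distances(pts)
print(rule_of_thumb(dists))
print(percentile_thresholds(dists))
```

In practice the percentile breakpoints would still be adjusted with domain knowledge, as described for the knowledge-driven definition.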

2.3. The Fuzzy Geographically Weighted Colocation Quotient

Building on the fuzzy geographical weights introduced in Section 2.1, this study proposes the FGWCLQ, an extension of the Geographically Weighted Colocation Quotient (GWCLQ). Unlike the GWCLQ, FGWCLQ provides a more flexible representation of spatial proximity by recognizing gradual variations in the strength of spatial relationships, rather than adhering to a fixed distance threshold. Similar to the traditional colocation quotient (CLQ), the FGWCLQ also includes both global and local versions.

In general, the global FGWCLQ value between two categories, f_i and f_j, is defined as the ratio of the observed to the expected proportion of category f_j points within the fuzzy neighborhoods of category f_i points. The formulation of the global FGWCLQ is presented in Equation (9).

FGWCLQ_{f_{i} \to f_{j}}^{global} = \frac{\sum_{o_{i} \in O_{f_{i}}} \sum_{o_{j} \in O, o_{j} \neq o_{i}} FGW(o_{i}, o_{j}) \times x_{FNS_{j}}}{\frac{N_{f_{j}}}{N - 1} \sum_{o_{i} \in O_{f_{i}}} \sum_{o_{j} \in O, o_{j} \neq o_{i}} FGW(o_{i}, o_{j})}, \quad N_{f_{j}} = N_{f_{i}} - 1 \ \text{if} \ f_{j} = f_{i}

(9)

where x_{FNS_{j}} is a binary variable that equals 1 if the j-th fuzzy neighbor of the f_i-type point belongs to category f_{j}, and 0 otherwise; N_{f_{j}} is the number of points of category f_{j}, and N is the total number of points. The global FGWCLQ provides a comprehensive overview of spatial association patterns.
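A direct, unoptimized reading of Equation (9) is sketched below in Python. It is an illustration under stated assumptions rather than the released MATLAB implementation: the names `fgw` and `global_fgwclq`, the category labels, and the toy coordinates (in metres) are all hypothetical, and the function returns NaN when the quotient is undefined.

```python
import math


def fgw(d, d1, d2):
    """Fuzzy geographical weight of Equations (5)-(6) for distance d."""
    if d > d2:
        return 0.0
    u = 1.0 if d <= d1 else 1.0 - (d - d1) / (d2 - d1)
    return u * math.exp(-0.5 * (d / d1) ** 2)


def global_fgwclq(points, labels, cat_a, cat_b, d1, d2):
    """Global FGWCLQ from cat_a to cat_b, following Equation (9)."""
    n = len(points)
    # Same-category case: a point cannot be its own neighbor.
    n_b = labels.count(cat_b) - (1 if cat_a == cat_b else 0)
    observed = total = 0.0
    for i, p in enumerate(points):
        if labels[i] != cat_a:
            continue
        for j, q in enumerate(points):
            if i == j:
                continue
            w = fgw(math.dist(p, q), d1, d2)
            total += w                 # weight of all fuzzy neighbors
            if labels[j] == cat_b:
                observed += w          # weight of cat_b fuzzy neighbors
    if total == 0.0 or n_b == 0:
        return float("nan")  # no fuzzy neighbors or no cat_b points
    return observed / (n_b / (n - 1) * total)


# Toy data: two hotel/attraction pairs, 100 m within a pair, 5 km apart.
points = [(0, 0), (100, 0), (5000, 0), (5100, 0)]
labels = ["H", "TS", "H", "TS"]
print(global_fgwclq(points, labels, "H", "TS", 500, 1000))  # about 1.5
print(global_fgwclq(points, labels, "H", "H", 500, 1000))   # 0.0
```

In the toy data every hotel's only fuzzy neighbor is an attraction, so the observed proportion is 1 against an expected 2/3, giving a quotient of about 1.5; no hotel has another hotel nearby, so the within-category value is 0.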

The local Fuzzy Geographically Weighted Colocation Quotient, which assesses the spatial heterogeneity of association patterns across the study area, is defined as in Equation (10).

FGWCLQ_{f_{i} \to f_{j}}^{local}(i) = \frac{\sum_{o_{j} \in O, o_{j} \neq o_{i}} FGW(o_{i}, o_{j}) \times x_{FNS_{j}}}{\frac{N_{f_{j}}}{N - 1} \sum_{o_{j} \in O, o_{j} \neq o_{i}} FGW(o_{i}, o_{j})}

(10)

The local FGWCLQ calculates the CLQ value at each individual point, rather than for a broader region. This index not only identifies whether a point pattern is clustered or dispersed but also facilitates mapping and visualization for interpretation, in a manner similar to local indicators of spatial association (LISA).
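The local version in Equation (10) restricts the sums to a single focal point. The following Python sketch illustrates this under the same assumptions as before (hypothetical function names, labels, and toy coordinates in metres); it is not the authors' code.

```python
import math


def fgw(d, d1, d2):
    """Fuzzy geographical weight of Equations (5)-(6) for distance d."""
    if d > d2:
        return 0.0
    u = 1.0 if d <= d1 else 1.0 - (d - d1) / (d2 - d1)
    return u * math.exp(-0.5 * (d / d1) ** 2)


def local_fgwclq(points, labels, i, cat_b, d1, d2):
    """Local FGWCLQ at point i towards category cat_b (Equation (10))."""
    n = len(points)
    # Exclude the focal point itself when it belongs to cat_b.
    n_b = labels.count(cat_b) - (1 if labels[i] == cat_b else 0)
    observed = total = 0.0
    for j, q in enumerate(points):
        if j == i:
            continue
        w = fgw(math.dist(points[i], q), d1, d2)
        total += w
        if labels[j] == cat_b:
            observed += w
    if total == 0.0 or n_b == 0:
        return float("nan")  # point i has no fuzzy neighbors
    return observed / (n_b / (n - 1) * total)


# Toy data: one hotel next to an attraction, two hotels next to each other.
points = [(0, 0), (100, 0), (5000, 0), (5100, 0)]
labels = ["H", "TS", "H", "H"]
for i in (0, 2, 3):
    print(i, local_fgwclq(points, labels, i, "TS", 500, 1000))
```

The first hotel obtains a value of about 3 (attracted to the attraction), while the other two hotels obtain 0, which is the kind of location-dependent contrast the local maps are meant to reveal.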

The value of FGWCLQ is always non-negative. A higher FGWCLQ value indicates a stronger spatial attraction from category f_i to f_j. Specifically, when the value of FGWCLQ_{f_i → f_j} is less than 1, it suggests that the points of category f_i are not spatially associated with category f_j. Conversely, a value greater than 1 indicates an attraction from category f_i to category f_j, while a value equal to 1 signifies a random spatial distribution between the two categories. Furthermore, FGWCLQ captures the directionality of spatial attraction. If both FGWCLQ_{f_i → f_j} and FGWCLQ_{f_j → f_i} exceed 1, the attraction between category f_i and category f_j is symmetrical. If only one of them exceeds 1, the spatial association is asymmetrical. Notably, FGWCLQ_{f_i → f_i} represents autocorrelation within category f_i.
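The interpretation rules above amount to a small decision procedure. The sketch below encodes them in Python; the function name, the returned labels, and the example values are illustrative assumptions, and statistical significance is deliberately not considered here.

```python
def classify_association(clq_ab, clq_ba, threshold=1.0):
    """Classify a category pair from its two directional FGWCLQ values.

    clq_ab : FGWCLQ from category a to category b
    clq_ba : FGWCLQ from category b to category a
    """
    a_to_b = clq_ab > threshold
    b_to_a = clq_ba > threshold
    if a_to_b and b_to_a:
        return "symmetric attraction"
    if a_to_b or b_to_a:
        return "asymmetric attraction"
    return "no attraction"


# Hypothetical quotient values, for illustration only.
print(classify_association(1.8, 1.3))
print(classify_association(1.8, 0.7))
print(classify_association(0.9, 0.6))
```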

To test the significance of FGWCLQ values, this study uses a Monte Carlo simulation method to calculate statistical test values. A p-value smaller than 0.05 indicates that the spatial association is statistically significant at the 95% confidence level [19].
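The text does not spell out the simulation design, so the following Python sketch should be read as one plausible setup rather than the authors' procedure. It assumes that category labels are randomly permuted over fixed locations, that the test is one-sided (attraction), and that the pseudo p-value is (k + 1) / (n_sim + 1), where k counts simulated values at least as large as the observed one. All function names and the toy data are hypothetical.

```python
import math
import random


def fgw(d, d1, d2):
    """Fuzzy geographical weight of Equations (5)-(6) for distance d."""
    if d > d2:
        return 0.0
    u = 1.0 if d <= d1 else 1.0 - (d - d1) / (d2 - d1)
    return u * math.exp(-0.5 * (d / d1) ** 2)


def global_fgwclq(points, labels, cat_a, cat_b, d1, d2):
    """Global FGWCLQ from cat_a to cat_b, following Equation (9)."""
    n = len(points)
    n_b = labels.count(cat_b) - (1 if cat_a == cat_b else 0)
    observed = total = 0.0
    for i, p in enumerate(points):
        if labels[i] != cat_a:
            continue
        for j, q in enumerate(points):
            if i != j:
                w = fgw(math.dist(p, q), d1, d2)
                total += w
                if labels[j] == cat_b:
                    observed += w
    if total == 0.0 or n_b == 0:
        return float("nan")
    return observed / (n_b / (n - 1) * total)


def monte_carlo_p_value(points, labels, cat_a, cat_b, d1, d2,
                        n_sim=999, seed=0):
    """One-sided pseudo p-value under random relabeling (assumed design)."""
    rng = random.Random(seed)
    observed = global_fgwclq(points, labels, cat_a, cat_b, d1, d2)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)  # permute categories, keep locations fixed
        simulated = global_fgwclq(points, shuffled, cat_a, cat_b, d1, d2)
        if simulated >= observed:  # NaN comparisons count as False
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


# Toy data: six hotel/attraction pairs, 100 m within a pair, 5 km apart.
points = [(5000 * k + dx, 0) for k in range(6) for dx in (0, 100)]
labels = ["H", "TS"] * 6
print(monte_carlo_p_value(points, labels, "H", "TS", 500, 1000))
```

A value below 0.05 would then be read as significant attraction at the 95% confidence level, in line with the criterion stated above.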

3. Study Area and Data

3.1. Study Area

Beijing, as China’s political and economic center, boasts substantial economic and tourism resources that attract millions of visitors annually for both leisure and business purposes. The central urban area functions as the city’s heart, characterized by its dense population and clusters of infrastructure, making it a key focus of urban development and planning research. Due to well-developed transportation and a high concentration of tourist attractions, most hotels in Beijing are located in this area, offering tailored accommodation services. Thus, examining the hotel characteristics in this region is crucial for optimizing location selection and improving the tourist experience. This study covers the urban area of Beijing, which includes six districts: Dongcheng, Xicheng, Chaoyang, Fengtai, Shijingshan, and Haidian.

Hotels in urban areas are not randomly distributed. Instead, they tend to be highly clustered [55]. Several studies have focused on factors influencing the spatial and temporal distribution of hotels in Beijing, which show that traffic accessibility, commercial prosperity, and tourist attraction distribution are generally considered the key factors affecting hotel spatial layout [56]. However, these studies do not address the differences in patterns between different categories of star hotels and their surrounding infrastructures across various spatial scales. Accurately grasping the spatial correlation structure of distribution in star-rated hotels and its influencing factors is of great significance for promoting high-quality coordinated development of star-rated hotels and the regional tourism and transportation industry. This study will examine the spatial association between different types of star hotels and their surrounding infrastructures in the urban area of Beijing using the proposed FGWCLQ method.

3.2. Data Collection and Pre-Processing

The data used in this study were point of interest (POI) data, including star hotels, transportation facilities, and tourist attractions within Beijing, which were collected from Baidu Maps in April 2022. We developed a data collection module using the Baidu Maps API (http://api.map.baidu.com (accessed on 15 April 2022)), collecting attributes such as facility name, location, category, time, and district.

Since the raw data were unsuitable for analysis, several pre-processing steps were applied. First, missing features were filled in using the Baidu engine. Next, abnormal geo-location data were corrected by adjusting coordinates, interpolating with surrounding data, and excluding obviously erroneous entries. Consistency in naming conventions for fields such as category and district was also ensured. Finally, duplicate POIs were identified and removed by matching locations and other attributes. After pre-processing, a total of 6810 instances were obtained, including 5545 star hotels, 432 transportation facilities (TP), and 923 tourist attractions (TS). In this study, star hotels were classified into four quality levels: 2-star and below (2-STR), 3-star (3-STR), 4-star (4-STR), and 5-star (5-STR). The statistical characteristics and spatial distribution of these facilities are presented in Table 2 and Figure 1.

4. Results

4.1. Experiment Setting

Urban facilities inherently exhibit distinct service ranges and influence radii. To quantify this for target facilities in Beijing, we first computed pairwise spatial distances between all data points and visualized their distribution via a histogram (Figure 2a). This histogram revealed a dominant peak at 5000 m, indicating this distance as the most prevalent among facility pairs. This study therefore selected 5000 m as the service range for point facilities. This choice is justified because it both corresponds to the peak of the distance distribution and ensures that each point has at least one neighboring point within this radius. Furthermore, this distance aligns with previous hotel research [55]. Drawing on both the characteristics of the dataset and prior knowledge, this study adopts a 5 km service radius, beyond which the hotel’s influence is assumed to be negligible.

Within the influence range of the facilities, for single-scale analysis, the fuzzy neighborhood can be calculated through Equations (7) and (8), resulting in the range [2013 m, 4457 m]. However, as this study aims to explore the multi-scale distribution pattern of the facilities, three scales of fuzzy neighborhoods are selected for analysis to compare the distribution differences among these scales.
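As a quick consistency check, and assuming the reported range follows directly from Equations (7) and (8) with the max(0, ·) clause inactive (which holds here because d₁ = 2013 m > 0), the implied mean and standard deviation of the distances can be recovered, up to rounding of the reported bounds:

m = \frac{d_{1} + d_{2}}{2} = \frac{2013 + 4457}{2} = 3235 \ \text{m}, \qquad \sigma = \frac{d_{2} - d_{1}}{2} = \frac{4457 - 2013}{2} = 1222 \ \text{m}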

A histogram (Figure 2b) was constructed to characterize the distance distributions between diverse facilities within the 5 km influence radius. To operationalize the fuzzy neighborhoods, three critical thresholds (10%, 20%, and 70%) were identified based on the cumulative frequency of this histogram, corresponding to the intuitive cognitive distinctions of “nearby,” “medium,” and “faraway” in everyday distance perception. To enhance alignment with real-world behavioral contexts, these statistically derived points were further refined by integrating passenger travel behavior and decision-making processes, including typical commuting ranges and distance thresholds for facility utilization. The final definition of the three fuzzy neighborhoods, which synthesizes both spatial statistical distribution features and behavioral constraints, is presented in Table 3. This table details the classification of fuzzy neighborhoods alongside their geographical characteristics.

In addition to the fuzzy neighborhood setting, the Monte Carlo simulation for the significance test used a sample size of 1000 and was performed with 1000 random iterations, consistent with approaches employed in previous studies [59].

To validate the proposed method, the experiments were conducted using MATLAB R2022b on a 64-bit platform with 32 GB of RAM and a 2.00 GHz Intel Core i7 CPU. The code and data are available at https://doi.org/10.6084/m9.figshare.26788822 (accessed on 21 August 2024).

4.2. The Results of Global FGWCLQ

After parameter setting, this study calculated the global values for six types of facilities in the urban area of Beijing using Equation (9). The results are shown in Table 4.

First, the global FGWCLQ results reveal distinct spatial association relationships of star hotels across three fuzzy neighborhoods (shaded areas). Intra-category autocorrelation (main diagonal values) indicates that high-star hotels cluster significantly (autocorrelation > 1), with clustering intensity diminishing as the neighborhood scale increases. Specifically, 5-STR hotels exhibit the strongest clustering across all scales, peaking at the 1000 m radius (highest CLQ value). In contrast, 2-STR hotels display a random distribution at all scales, likely due to their large number and dispersion, which creates service gaps in suburban tourism zones. Beyond intra-category dynamics, inter-category associations strengthen with increasing neighborhood scale, peaking at 2000 m, which indicates that mixed star-rated clusters function as “hierarchical service systems” (e.g., 3-STR complementing 5-STR by catering to extended stays). A critical observation is that star hotels show stronger associations with tourist attractions (TS) than with transportation facilities (TP), as shown in the last two columns of Table 4, with this gap widening at larger scales. This highlights attraction proximity as a primary locational driver over mere transit access. These findings confirm that high-star hotels in Beijing’s core cluster more intensely than lower-star types, and that diverse hotels aggregate around high-star establishments, which are disproportionately proximate to tourist attractions, underscoring their role as spatial anchors for star-rated hotel distribution [60]. Consequently, planners could upgrade infrastructure within 1000 m of 5-STR clusters to reinforce agglomeration, incentivize 2-STR development near suburban transit and attractions to fill gaps, zone 2000 m mixed-star corridors for complementarity, and strengthen attraction-hotel connectivity via scale-appropriate infrastructure. These measures would help optimize tourism resource allocation using high-star hotels as anchors.

In addition to the intensity of spatial association, the symmetrical and asymmetrical dependences between the six categories, identified and classified based on the global results in Table 4, are visualized in Figure 3. As the spatial scale increases, spatial correlations between hotel categories become stronger, aligning with findings from previous studies [61]. Within the 500–1000 m range, spatial clustering is observed for 3-STR, 4-STR, and 5-STR hotels, each displaying significant autocorrelation (Figure 3a). Within 1000–2000 m, 3-STR and 4-STR hotels tend to cluster around the periphery of tourism attractions (TS), suggesting that the distribution of medium-star hotels is notably influenced by tourism facilities at this scale (Figure 3b). At the 2000–5000 m scale, relationships between star hotels and surrounding facilities become more complex and are characterized by mutual influence (Figure 3c). Notably, 5-STR hotels cluster near the periphery of other hotel categories and exhibit emerging associations with both transportation and scenic areas, indicating a marked increase in their spatial influence at broader scales.

The global results demonstrate the dependence and independence relationships among the six facility categories, providing an overall association pattern for Beijing. However, the facilities are heterogeneously distributed, so the spatial association pattern varies by location. To examine the differences in spatial association relationships across various districts, the local FGWCLQ results are analyzed in the following section.

4.3. The Results of Local FGWCLQ

The local FGWCLQ analysis allows the spatial heterogeneity of associations among the six types of facilities across Beijing to be examined. Using Equation (10), the local values for all category points were calculated. According to the global results (Table 4), the spatial associations between 3-STR hotels and hotels of the same or higher star rating, as well as tourist attractions, are particularly strong. Therefore, this study presents only the local spatial association maps for these relationships in Figure 4.

As illustrated in Figure 4, the significant spatial associations between 3-STR hotels and other facilities exhibit distinct patterns at the 500–1000 m scale. First, high autocorrelation values (greater than 2) for 3-STR hotels are concentrated on the periphery of the city center (Figure 4a), while their associations with other facilities vary markedly by star rating: clustering with 4-STR hotels is centered on the urban core (e.g., near Tiananmen Square, Figure 4b), whereas clustering with 5-STR hotels is confined to the junction of the Dongcheng and Chaoyang districts (Figure 4c). These divergences indicate that 3-STR hotels function as “complementary alternatives” to 4-STR hotels in core commercial zones (serving budget-conscious business travelers) but as “peripheral supplements” to 5-STR hotels in upscale mixed-use areas (supporting extended family or tour group stays). Beyond inter-hotel associations, 3-STR hotels show weakened spatial links to tourism attractions in the urban core, with associations intensifying toward the periphery (Figure 4d). This reflects a critical “spatial mismatch” between mid-range accommodation demand and attraction accessibility, likely exacerbated by strict land-use regulations limiting new hotel development. These findings inform targeted urban planning interventions. In the city core, 3-star hotels proximate to 4-star clusters could be incentivized to adopt family-friendly amenities (e.g., kitchenettes) to mitigate direct competition. For the Dongcheng-Chaoyang corridor, 3-star and 5-star hotels could be integrated into cohesive “hospitality ecosystems,” with shared shuttles linking these establishments to convention centers. In peripheral areas, priority should be given to enhancing last-mile connectivity (e.g., metro feeders and bike-sharing) between 3-star clusters and emerging attractions, alongside targeted subsidies for 3-star development, to accelerate decentralized tourism and alleviate core-area congestion.

Given the strong spatial association between 4-STR hotels and TS, this study analyzed their relationship across fuzzy neighborhoods, as presented in Figure 5. Results indicate that peak association occurs at 1000–2000 m (Figure 5b), where clusters in historic districts and transit hubs define a “functional sweet spot”, balancing accessibility and mobility. Sparser clustering in <1000 m zones (Figure 5a) reflects niche proximity demand, while >2000 m associations (Figure 5c) align with business–tourism hybrids near offices/conventions. The result indicates that the optimal functional distance between 4-STR hotels and tourist attractions lies within a 1000–2000 m radius, balancing accessibility and land-use efficiency. Urban planners can capitalize on this spatial range to design integrated tourism zones by prioritizing 4-STR hotel development within 1000–2000 m of key tourist sites, thereby minimizing traveler transit distances and enhancing experiential quality. For zones < 1000 m, targeted incentives should be deployed to preserve niche functions while precluding overdevelopment. In zones >2000 m, 4-STR hotel density should be aligned with office and convention center capacity, supplemented by dedicated shuttle services connecting to attractions. These interventions leverage the distance-sensitive nature of 4-STR hotel-tourist attraction associations to optimize tourism system functionality across spatial scales.

4.4. Comparison with Deterministic Methods

To demonstrate the superior performance of FGWCLQ, this section compares it with deterministic methods in both global and local versions. Geographically Weighted Colocation Quotient (GWCLQ) is a typical deterministic method that uses a single threshold to define the spatial neighborhood [24]. For comparability, the GWCLQ employed a bandwidth of 1500 m, while the FGWCLQ method utilized a fuzzy neighborhood of 1000–2000 m. Both methods used the same parameters (sample size = 1000 and random iterations = 1000). The results are presented in Table 5 and Figure 6.
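To make the contrast between the two neighborhood definitions concrete, the following Python sketch compares a binary cutoff at 1500 m with a fuzzy membership over [1000 m, 2000 m]. The linear decay between d1 and d2 is an assumption chosen for illustration; the membership function actually used by FGWCLQ is the one defined in the methodology section, and the function names here are hypothetical.

```python
import numpy as np

def binary_weight(dist, bandwidth):
    """Deterministic neighborhood: weight 1 inside the bandwidth, 0 outside."""
    return (np.asarray(dist, dtype=float) <= bandwidth).astype(float)

def fuzzy_weight(dist, d1, d2):
    """Assumed linear membership: 1 up to d1, falling linearly to 0 at d2."""
    dist = np.asarray(dist, dtype=float)
    return np.clip((d2 - dist) / (d2 - d1), 0.0, 1.0)

# Illustrative distances in metres, including two points just either side of 1500 m.
distances = np.array([900.0, 1400.0, 1499.0, 1501.0, 1900.0, 2100.0])
w_binary = binary_weight(distances, 1500.0)
w_fuzzy = fuzzy_weight(distances, 1000.0, 2000.0)
print(w_binary)
print(w_fuzzy)
```

Under these assumptions, the two points at 1499 m and 1501 m receive weights of 1 and 0 from the binary rule but nearly identical weights (about 0.5) from the fuzzy rule, which is the discontinuity that the fuzzy neighborhood is meant to remove.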

From Table 5, it can be seen that the global values for both methods are quite similar. However, there are some specific differences. First, the autocorrelation values for the six types of facilities in FGWCLQ are greater than those in GWCLQ. Additionally, the symmetric association relationships between 4-STR and 5-STR hotels are absent in the GWCLQ results. Lastly, the proportion of significant associations in the FGWCLQ is higher than that in GWCLQ.

To further investigate the differences between the two methods, this study selects five bandwidths for GWCLQ (ranging from 1000 m to 2000 m) and compares their corresponding global values. For clarity, five representative combinations have been selected for display, with the results presented in Figure 6. The global values of GWCLQ vary considerably across the five bandwidths. Specifically, these variations include declines (e.g., 3-STR to 4-STR), increases (e.g., 4-STR to TS, and 2-STR to 3-STR), and patterns of initial decline followed by increase and stabilization (e.g., 3-STR autocorrelation). In contrast, FGWCLQ covers the entire 1000–2000 m range with a single fuzzy neighborhood, so its values remain constant across the bandwidth range, demonstrating its greater stability and robustness in identifying spatial association relationships. The GWCLQ method, relying on a single fixed distance, is thus shown to be sensitive to bandwidth changes.

Additionally, FGWCLQ and GWCLQ were compared at the local scale. This comparison focuses on the local values between 5-STR hotels and transportation facilities, with the results presented in Figure 7. Figure 7 shows that FGWCLQ outperforms GWCLQ in visualizing fine-grained spatial heterogeneity. While GWCLQ produces fragmented hotspot maps, FGWCLQ generates continuous association surfaces in which correlation intensity spreads from the city center to the suburbs (Figure 7a). Furthermore, there are fewer non-significant points in the FGWCLQ results. These findings indicate that the FGWCLQ method, by incorporating fuzzy geographic weights, is better equipped to capture local variations and complex spatial relationships. In contrast, the deterministic GWCLQ method, relying on a fixed bandwidth, tends to overlook potential neighborhood relationships and to smooth out spatial heterogeneity.

Finally, the statistics of the local values are summarized in Table 6, which reports the number of local values that are greater than 1 and statistically significant for “5-STR to TP” and “5-STR to TS” under the fuzzy bandwidth [1000 m, 2000 m] for FGWCLQ, as well as under a bandwidth of 1500 m for GWCLQ. Notably, FGWCLQ identifies more local clusters when fuzzy neighborhoods are considered.
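The counting rule behind such a summary can be sketched in a few lines of Python. The significance level of 0.05, the function name, and the sample arrays are assumptions for illustration; the actual counts are those reported in Table 6.

```python
import numpy as np

def count_significant_clusters(local_values, p_values, alpha=0.05):
    """Count points whose local value exceeds 1 and whose p-value is below alpha.

    alpha = 0.05 is an assumed significance level, not one stated in the text.
    """
    local_values = np.asarray(local_values, dtype=float)
    p_values = np.asarray(p_values, dtype=float)
    mask = (local_values > 1.0) & (p_values < alpha)
    return int(mask.sum())

# Made-up local values and p-values for five points.
demo_count = count_significant_clusters(
    [1.8, 0.9, 1.2, 2.4, 1.1],
    [0.01, 0.02, 0.20, 0.04, 0.049],
)
print(demo_count)
```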

The comparative analysis results above reveal that the FGWCLQ method is more accurate and robust than the deterministic method in exploring spatial association rules in categorical data. It can identify more association patterns overlooked by the traditional GWCLQ method, thereby providing a more comprehensive description of spatial association patterns.

4.5. Sensitivity Analysis

In this study, the configuration of fuzzy neighborhoods was identified as a critical parameter. To evaluate its impact on model results, a series of consecutive fuzzy distance thresholds, [d1, d2], were defined, ranging from 0 m to 5000 m in 500 m increments, resulting in ten distinct fuzzy neighborhoods (e.g., [0, 500 m], [500 m, 1000 m], etc.). The FGWCLQ algorithm was applied to each neighborhood to calculate the corresponding global values. For clarity, six representative combinations have been selected for display, with the results presented in Figure 8. The results indicate that as the fuzzy neighborhood distance increases, the growth in the global values gradually decelerates, becoming negligible beyond 2500 m. This suggests that the algorithm exhibits low sensitivity to large bandwidths, likely due to the influence range of various facilities being limited, beyond which their impact nearly stabilizes.
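For reference, the ten consecutive fuzzy neighborhoods described above can be enumerated as follows. This is a small Python sketch; the function name is hypothetical.

```python
def consecutive_neighborhoods(start=0, stop=5000, step=500):
    """Return consecutive (d1, d2) pairs in metres covering [start, stop]."""
    return [(d1, d1 + step) for d1 in range(start, stop, step)]

neighborhoods = consecutive_neighborhoods()
print(len(neighborhoods))
print(neighborhoods[0], neighborhoods[-1])
```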

To comprehensively evaluate the sensitivity of the relationships among various facilities to the fuzzy neighborhood settings, this study utilized a widely accepted sensitivity index: Spatial Scale Sensitivity Index (SSSI) [62,63]. The formula for calculating this index is presented below:

\mathrm{SSSI} = \frac{1}{n} \sum_{i=1}^{10} \frac{|R_{i} - R_{\mathrm{avg}}|}{R_{\mathrm{avg}}}

(11)

Using this formula, the SSSI was calculated and compared at different fuzzy neighborhoods. The results are summarized in Table 7. The table indicates that most facilities exhibit low sensitivity to the fuzzy neighborhood settings, with sensitivity indices below 0.1. However, certain facilities show higher sensitivity, particularly the autocorrelation of transportation, followed by that of 5-STR hotels. Furthermore, the influence of transportation on tourism attractions is also notably affected by the fuzzy neighborhood settings, demonstrating strong sensitivity. These results suggest that the effect of fuzzy neighborhood settings on the algorithm is largely determined by the semantic attributes and spatial distribution of the geographic objects, rather than by the specific spatial scale.
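Equation (11) can be evaluated with the short Python sketch below. It assumes that R_i denotes the global value obtained at the i-th fuzzy neighborhood and that R_avg is the mean of those values, with n equal to the number of neighborhoods (ten in this study); the function name and the sample inputs are hypothetical.

```python
import numpy as np

def sssi(values):
    """Mean absolute relative deviation of the values from their average."""
    r = np.asarray(values, dtype=float)
    r_avg = r.mean()
    return float(np.mean(np.abs(r - r_avg) / r_avg))

# Identical values across neighborhoods give no sensitivity at all.
print(sssi([1.2] * 10))
# Values that swing between 0.5 and 1.5 deviate by half of their mean.
print(sssi([0.5, 1.5] * 5))
```

Because the deviations are divided by the average, the index is dimensionless, which is what makes a common cutoff such as 0.1 comparable across category pairs with different association strengths.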

5. Discussion

Traditional spatial statistical methods (such as geographically weighted approaches) typically use a fixed bandwidth (distance threshold or K-nearest neighbors) to define spatial relationships. Such a single bandwidth imposes an abrupt cutoff, thereby disrupting the continuous nature of geographic processes. Additionally, model performance demonstrates high sensitivity to bandwidth selection. This study proposes a Fuzzy Geographically Weighted Colocation Quotient (FGWCLQ) method, which employs fuzzy membership functions to model spatial continuity in a more flexible and natural manner. By defining fuzzy neighborhoods, this approach overcomes the constraints of traditional single-bandwidth methods. The results demonstrate that FGWCLQ can reliably and stably discover prevalent spatial associations, as well as visualize the spatial heterogeneity under different fuzzy neighborhood definitions. Sensitivity analyses confirm the robustness of FGWCLQ to changes in fuzzy neighborhood parameters.

Compared to other methods, the FGWCLQ approach differs fundamentally in its assumptions, performance, and interpretability. In terms of theoretical assumptions, unlike classical CLQ and GWCLQ, which rely on rigid distance thresholds to define spatial neighbors, FGWCLQ employs fuzzy membership functions to model the gradual decay of spatial relationships. This aligns with theoretical advancements in fuzzy spatial analysis, which recognize that geographic adjacency exists on a continuum rather than a binary boundary [33,36]. This assumption also differentiates FGWCLQ from fuzzy clustering techniques (e.g., FGWC or LFGWC), which define fuzzy neighborhoods based on feature similarity rather than spatial distances [48]. In terms of performance, FGWCLQ outperforms GWCLQ in capturing scale-varying associations and exhibits greater stability and robustness (Figure 5). For example, in analyzing the association between 4-STR hotels and TS in Beijing, GWCLQ exhibited unstable performance across bandwidths (e.g., 1000 m to 2000 m), with significant fluctuations in association (variance of the global value >0.25). In contrast, FGWCLQ’s fuzzy neighborhood framework maintained consistent performance, as the gradual membership decay mitigated sensitivity to arbitrary scale settings. This finding is consistent with prior studies showing that fuzzy methods reduce parameter-induced variance in spatial analyses [64]. Compared to FGWCLQ, fuzzy clustering methods emphasize attribute stability over spatial fidelity. Their use of static weight parameters disregards scale-related processes, producing overly smooth results that obscure subtle heterogeneous features. Regarding interpretability, the CLQ and GWCLQ methods exhibit pronounced boundary effects that disrupt the continuity of geographic processes. While fuzzy clustering methods offer interpretability within the attribute space, their membership values do not correspond to measurable geographic processes, resulting in a lack of interpretability at the spatial level.
FGWCLQ addresses the limitation by grounding fuzzy memberships in distance decay, allowing local association values to be directly linked to real-world spatial phenomena. This enhances the method’s practical value for urban planners and policymakers. Collectively, these distinctions position FGWCLQ as a method that advances beyond the limitations of existing approaches by integrating theoretically grounded fuzzy spatial decay with robust performance and dual spatial-attribute interpretability [65].

The case study results demonstrate the effectiveness of FGWCLQ and enhance our understanding of spatial associations in urban facilities. By defining the linguistic predicates, such as “nearby,” “medium,” and “faraway,” using domain knowledge-based models for fuzzy neighborhoods and spatial effects, we identify the dependence and independence relationships among the six facilities and visualize the heterogeneity of spatial distribution patterns. For example, high-star hotels tend to show greater clustering over larger spatial ranges, whereas low-star hotels demonstrate stronger clustering within smaller spatial ranges. These findings are consistent with previous studies [66], which also reported significant spatial associations between hotels and surrounding facilities. However, notable distinctions also emerge. Traditional approaches by Yang et al. [53] identified discrete hotspot zones based on rigid bandwidths, while FGWCLQ reveals continuous association surfaces, highlighting hierarchical clustering patterns. This not only provides a more detailed visualization of spatial distributions but also better aligns with cognitive theories of human spatial perception.

However, this study has some limitations. While it demonstrates the effectiveness of the FGWCLQ method in analyzing the spatial correlations among three types of facilities in Beijing, there is currently insufficient evidence to confirm its universality and reproducibility across different contexts. In practice, factors such as urban environment, research objectives, and data structure can substantially influence the method’s performance. To address this limitation, a generalizable and adaptable application framework is essential to facilitate broader adoption of the FGWCLQ method. First, for spatial scale adaptation, fuzzy neighborhoods need to be calibrated to reflect spatial interaction patterns in different scenarios. In areas with dense point distributions, smaller-scale fuzzy neighborhoods are used to analyze fine-grained spatial dependencies; conversely, in sparse areas, larger-scale fuzzy neighborhoods are used to accommodate large-scale spatial patterns. Regarding the research objectives, parameter optimization aims to align with the interaction mechanisms specific to the domain. For example, in this study, we adopt a 5-km radius to represent the influence range of facilities. In contrast, in epidemiological applications such as disease transmission modeling, the fuzzy thresholds (d1,d2) are established based on epidemiological precedents; for instance, for locally transmitted airborne pathogens, the thresholds typically range from 500 m to 1.5 km [42]. Regarding data structure, parameterization strategies can be adapted to differences in spatial resolution and data sparsity. High-resolution datasets (e.g., POI points or crime incidents) employ refined, data-driven thresholds based on cumulative frequency breakpoints to preserve local spatial heterogeneity. 
In contrast, coarse-grained or sparse data (such as county-level disease incidence rates) utilize statistically robust thresholds (e.g., mean ± 1.5σ) or expert knowledge to mitigate the effects of random noise.
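The two parameterization strategies mentioned above can be sketched in Python as follows. The sketch assumes that both are applied to a sample of inter-point distances (for example, nearest-neighbor distances), which the text does not specify; the quantile levels, the function names, and the sample distances are likewise hypothetical.

```python
import numpy as np

def thresholds_from_quantiles(distances, lower_q=0.25, upper_q=0.75):
    """Data-driven (d1, d2) taken from cumulative-frequency breakpoints."""
    d = np.asarray(distances, dtype=float)
    return float(np.quantile(d, lower_q)), float(np.quantile(d, upper_q))

def thresholds_from_moments(distances, k=1.5):
    """Statistical (d1, d2) as mean -/+ k standard deviations, with d1 floored at 0."""
    d = np.asarray(distances, dtype=float)
    mean, sigma = d.mean(), d.std()
    return float(max(mean - k * sigma, 0.0)), float(mean + k * sigma)

# Made-up distances in metres.
sample = [100.0, 200.0, 300.0, 400.0, 500.0]
q_d1, q_d2 = thresholds_from_quantiles(sample)
m_d1, m_d2 = thresholds_from_moments(sample)
print(q_d1, q_d2)
print(round(m_d1, 2), round(m_d2, 2))
```

The quantile variant follows the shape of the observed distance distribution, whereas the moment-based variant depends only on two summary statistics and is therefore less affected by noise in sparse samples.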

Second, this method relies on dual parameters (d1, d2) to define fuzzy neighborhood boundaries, introducing potential subjective bias that substantially affects analytical outcomes. The selection of fuzzy boundaries represents the “Achilles’ heel” of fuzzy mathematics, directly influencing model robustness, result interpretability, and cross-study reproducibility. As evidenced in our Beijing case study, such subjectivity manifests tangibly: a 50% variation in d2 − d1 leads to 18–27% fluctuations in the association indices between 4-STR hotels and tourist attractions, with hotspot zones shifting spatially by up to 500 m (Table 4). This sensitivity may either overstate inter-category associations due to excessively broad neighborhood ranges or diminish the theoretical advantages of the fuzzy method when ranges are overly narrow. In this study, fuzzy boundaries were selected using a data-driven approach supplemented by prior knowledge, providing a scientific and reproducible framework for boundary determination; our findings further validate the feasibility of fuzzy neighborhoods. However, this research solely utilizes the spatial distribution and statistical characteristics of data for calculations. More precise and universally applicable methods could enhance the scientific rigor of fuzzy neighborhood determination. For example, cross-validation could be employed to select d1 and d2 parameters that maximize the model’s fit to observed spatial patterns [33], or machine learning-based techniques (e.g., Bayesian optimization) could enable automatic parameter tuning [36]. Additionally, integrating scale sensitivity indices (e.g., SSSI) would quantify the impact of parameter variations, facilitating robust sensitivity analysis.

Furthermore, this study relies exclusively on spatial proximity to measure spatial associations, neglecting deeper functional couplings and semantic similarities between categorical data, an oversight that can readily create discrepancies between derived spatial association patterns and real-world scenarios. For instance, proximity-based metrics treat the 500-m distance between a 3-star hotel and a metro station as equivalent to the same distance between that hotel and a scenic spot, despite their fundamentally distinct roles in tourism functionality. Similarly, these metrics fail to account for semantic gradients. Two-star and three-star hotels, which share semantic and functional attributes aligned with budget travelers’ needs, should theoretically exhibit stronger spatial associations than either would with five-star hotels (catering to high-end tourists), yet proximity-only models conflate these relationships. Such limitations risk generating spatial association outcomes misaligned with real-world tourism dynamics. To address this, we propose integrating semantic similarity weights between categories via three approaches: expert knowledge (e.g., rating 4-star hotels’ relevance to scenic spots as 0.8 versus 0.5 for general transportation facilities), co-occurrence statistics (e.g., “5-star hotels” co-occur with “world heritage sites” 42 times per 1000 tourism reviews versus 11 times with “bus stations”), and ontological frameworks (e.g., “luxury hotels” share tourism service attributes with “scenic spot visitor centers” [semantic distance = 0.2] but not with “cargo terminals” [0.7]). 
Linear or non-linear integration of these semantic weights with geography weights will enable the model to distinguish meaningful tourism-related associations (e.g., hotels and scenic spots) from trivial spatial adjacencies (e.g., hotels and maintenance facilities), more accurately reflect functional and semantic affinities between tourism entities, and capture the multidimensional interactions among hotels, scenic spots, and transportation facilities that extend beyond mere spatial proximity.
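A minimal Python sketch of the proposed integration is given below. Both combination rules, the mixing parameter alpha, and the function names are assumptions for illustration, since this study does not yet implement semantic weighting; the semantic weights 0.8 and 0.5 are the expert-knowledge examples quoted above, and the geographic weight of 0.6 is made up.

```python
def combine_linear(w_geo, w_sem, alpha=0.5):
    """Linear integration: a convex mix of geographic and semantic weights."""
    return alpha * w_geo + (1.0 - alpha) * w_sem

def combine_multiplicative(w_geo, w_sem):
    """Non-linear integration: the semantic weight rescales the geographic weight."""
    return w_geo * w_sem

# Same distance (hence the same geographic weight) to two different facility types.
w_geo = 0.6
linear_scenic = combine_linear(w_geo, 0.8)
linear_transport = combine_linear(w_geo, 0.5)
product_scenic = combine_multiplicative(w_geo, 0.8)
product_transport = combine_multiplicative(w_geo, 0.5)
print(linear_scenic, linear_transport)
print(product_scenic, product_transport)
```

With either rule, two neighbors at the same distance no longer contribute equally: the scenic spot receives the larger combined weight, which is the distinction that a proximity-only model cannot make.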

In summary, FGWCLQ not only addresses methodological limitations in spatial analysis but also advances theoretical frontiers in geographic information science. By integrating fuzzy theory with spatial statistics, FGWCLQ provides a formal framework to model the “geographic principle of distance decay” in its continuous form, resolving the long-standing contradiction between deterministic scale assumptions and real-world spatial complexity [34,36]. This theoretical innovation has profound implications for urban studies and offers policymakers a data-driven tool to design “fuzzy urban districts”. In practical applications, the method is not only applicable to exploring spatial association rules between different categories of urban facilities but can also be extended to other spatial categorical data, such as spatial associations between crime and disease, between housing and workplaces, or among industrial spatial patterns.

6. Conclusions

In this study, a novel fuzzy geographically weighted colocation quotient method was proposed for spatial association mining in spatial categorical data. Instead of selecting a single distance parameter, our approach uses a spatial range and introduces fuzzy theory to define geographic neighborhoods, constructing fuzzy geographic weights. This offers a new way to represent spatial proximity and closeness in spatial statistical models. Additionally, this study provides a framework for analyzing multi-scale patterns in spatial categorical data, which is valuable for spatial association mining. The proposed method has been validated through the analysis of the associations between hotels and their surrounding facilities in the Beijing urban area. The results demonstrate that the FGWCLQ method effectively uncovers spatial associations between categorical data and is more robust than GWCLQ, showing little sensitivity to variations in bandwidth.

Several directions can be explored to further develop this research. First, this study considers only the statistical characteristics of data to define fuzzy neighborhoods. Incorporating both the spatial distribution patterns of data and prior knowledge would yield more refined results. Future work could leverage Moran’s I or sensitivity indices to optimize the selection of fuzzy neighborhoods. Second, future extensions could incorporate domain knowledge through semantic networks or knowledge graphs, assigning weighted memberships based on category-level functional similarity (e.g., using WordNet to measure semantic distances between “hotel” and “tourist attraction” concepts). Alternatively, integrating text mining of business descriptions (e.g., from POI metadata) could derive data-driven semantic weights, enhancing the model’s ecological validity. Finally, this study’s case analysis examines the spatial associations between hotels and facilities in Beijing. While the results clearly demonstrate the effectiveness of FGWCLQ in the Beijing urban area, spatial association patterns may vary significantly across different geographical contexts and application domains. Future research should investigate its application in various contexts to continuously improve the algorithm’s universality and efficacy.

Author Contributions

Conceptualization, Ling Li and Lian Duan; methodology, Ling Li and Meiyi Li; software, Xiongfa Mai; validation, Ling Li and Meiyi Li; formal analysis, Ling Li and Meiyi Li; investigation, Ling Li and Meiyi Li; resources, Ling Li; data curation, Ling Li and Meiyi Li; writing—original draft preparation, Ling Li and Meiyi Li; writing—review and editing, Ling Li and Lian Duan; visualization, Ling Li and Meiyi Li; supervision, Xiongfa Mai; project administration, Ling Li; funding acquisition, Ling Li and Lian Duan. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by the Guangxi Science and Technology Major Program [grant number AA23062039-2] and the Major Talent Project in Guangxi Province.

Data Availability Statement

The code and the POI data are available at https://doi.org/10.6084/m9.figshare.26788822 (accessed on 21 August 2024). The algorithms are currently implemented in Matlab; a standalone program will be made available in the near future.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Deng, Y.; Liu, J.; Liu, Y.; Luo, A. Detecting Urban Polycentric Structure from POI Data. ISPRS Int. J. Geo-Inf. 2019, 8, 283. [Google Scholar] [CrossRef]
Gan, M.; Gao, L. Discovering Memory-Based Preferences for POI Recommendation in Location-Based Social Networks. ISPRS Int. J. Geo-Inf. 2019, 8, 279. [Google Scholar] [CrossRef]
Cao, G.; Kyriakidis, P.C.; Goodchild, M.F. A multinomial logistic mixed model for the prediction of categorical spatial data. Int. J. Geogr. Inf. Sci. 2011, 25, 2071–2086. [Google Scholar] [CrossRef]
Goodchild, M.F.; Yuan, M.; Cova, T.J. Towards a general theory of geographic representation in GIS. Int. J. Geogr. Inf. Sci. 2007, 21, 239–260. [Google Scholar] [CrossRef]
Liu, J.; Liu, C.; Liu, Z.; Zhou, Y.; Li, X.; Yang, Y. Spatial analysis of air pollutant exposure and its association with metabolic diseases using machine learning. BMC Public Health 2025, 25, 831. [Google Scholar] [CrossRef]
Wang, H.; Liang, G. Association Rules Between Urban Road Traffic Accidents and Violations Considering Temporal and Spatial Constraints: A Case Study of Beijing. Sustainability 2025, 17, 1680. [Google Scholar] [CrossRef]
Li, L.; Cheng, J.; Bannister, J.; Mai, X. Geographically and temporally weighted co-location quotient: An analysis of spatiotemporal crime patterns in greater Manchester. Int. J. Geogr. Inf. Sci. 2022, 36, 918–942. [Google Scholar] [CrossRef]
Zhou, M.; Yang, M.; Chen, Z. Flow colocation quotient: Measuring bivariate spatial association for flow data. Comput. Environ. Urban Syst. 2023, 99, 101916. [Google Scholar] [CrossRef]
Andrzejewski, W.; Boinski, P. Co-location pattern mining using approximate Euclidean measure. Inf. Sci. 2025, 706, 122000. [Google Scholar] [CrossRef]
Krishnasamy, S.; Rajiah, M.; SenthilKumar, K.K.; Nagalingam Rajendiran, S. Association rule-based multilevel regression pricing and artificial neural networks based land selling price prediction based on market value. Concurr. Comput. Pract. Exp. 2023, 35, e7550. [Google Scholar] [CrossRef]
Chakraborty, J. Revisiting Tobler’s first law of geography: Spatial regression models for assessing environmental justice and health risk disparities. Geospat. Anal. Environ. Health 2011, 4, 337–356. [Google Scholar]
Yoo, J.S.; Park, S.J.; Raman, A. Micro-Level Incident Analysis using Spatial Association Rule Mining. In Proceedings of the 2019 IEEE International Conference on Big Knowledge (ICBK), Beijing, China, 10–11 November 2019; pp. 310–317. [Google Scholar]
Petelin, B.; Kononenko, I.; Malačič, V.; Kukar, M. Multi-level association rules and directed graphs for spatial data analysis. Expert Syst. Appl. 2013, 40, 4957–4970. [Google Scholar] [CrossRef]
Ge, Y.; Yao, Z.; Li, H. Computing Co-location Patterns in Spatial Data with Extended Objects: A Scalable Buffer-based Approach. IEEE Trans. Knowl. Data Eng. 2021, 33, 401–414. [Google Scholar] [CrossRef]
Huang, Y.; Shekhar, S.; Xiong, H. Discovering colocation patterns from spatial data sets: A general approach. IEEE Trans. Knowl. Data Eng. 2004, 16, 1472–1485. [Google Scholar] [CrossRef]
Chen, H.; Yang, M.; Tang, X. Association rule mining of aircraft event causes based on the Apriori algorithm. Sci. Rep. 2024, 14, 13440. [Google Scholar] [CrossRef]
Smith, J.M. An Efficient Parallel FP-Growth Algorithm for Big Data Association Rule Mining. J. Comput. Sci. Softw. Appl. 2024, 4, 1–8. [Google Scholar] [CrossRef]
Han, D.; Shi, Y.; Wang, W.; Dai, Y. Research on Multi-Level Association Rules Based on Geosciences Data. J. Softw. 2013, 8, 3269–3276. [Google Scholar] [CrossRef]
Wang, F.; Hu, Y.; Wang, S.; Li, X. Local Indicator of Colocation Quotient with a Statistical Significance Test: Examining Spatial Association of Crime and Facilities. Prof. Geogr. 2016, 69, 22–31. [Google Scholar] [CrossRef]
Shekhar, S.; Yan, H. Discovering Spatial Co-location Patterns: A Summary of Results. In Proceedings of the Advances in Spatial & Temporal Databases, International Symposium, SSTD, Redondo Beach, CA, USA, 12–15 July 2001. [Google Scholar]
Yoo, J.S.; Bow, M. Mining spatial colocation patterns: A different framework. Data Min. Knowl. Discov. 2012, 24, 159–194. [Google Scholar] [CrossRef]
Cressie, N.A.C. Statistics for Spatial Data; Wiley: New York, NY, USA, 1991. [Google Scholar]
Leslie, T.F.; Kronenfeld, B.J. The Colocation Quotient: A New Measure of Spatial Association Between Categorical Subsets of Points. Geogr. Anal. 2011, 43, 306–326. [Google Scholar] [CrossRef]
Cromley, R.G.; Hanink, D.M.; Bentley, G.C. Geographically Weighted Colocation Quotients: Specification and Application. Prof. Geogr. 2014, 66, 138–148. [Google Scholar] [CrossRef]
Cai, J.; Kwan, M.-P. Discovering co-location patterns in multivariate spatial flow data. Int. J. Geogr. Inf. Sci. 2022, 36, 720–748. [Google Scholar] [CrossRef]
Xia, Z.; Li, H.; Chen, Y.; Yu, W. Detecting urban fire high-risk regions using colocation pattern measures. Sustain. Cities Soc. 2019, 49, 101607. [Google Scholar] [CrossRef]
Liu, H.; Kwan, M.-P.; Hu, M.; Wang, H.; Zheng, J. Application of the local colocation quotient method in jobs-housing balance measurement based on mobile phone data: A case study of Nanjing City. Comput. Environ. Urban Syst. 2024, 109, 102079. [Google Scholar] [CrossRef]
Zhou, L.; Wang, C. Detecting the Spatial Association between Commercial Sites and Residences in Beijing on the Basis of the Colocation Quotient. ISPRS Int. J. Geo-Inf. 2023, 13, 7. [Google Scholar] [CrossRef]
Yu, W. Spatial co-location pattern mining for location-based services in road networks. Expert Syst. Appl. 2016, 46, 324–335. [Google Scholar] [CrossRef]
Mennis, J.; Liu, J.W. Mining Association Rules in Spatio-Temporal Data: An Analysis of Urban Socioeconomic and Land Cover Change. Trans. GIS 2010, 9, 5–17. [Google Scholar] [CrossRef]
Santos, M.Y.; Amaral, L.A. Geo-spatial data mining in the analysis of a demographic database. Soft Comput. 2005, 9, 374–384.
Bembenik, R.; Rybinski, H. FARICS: A method of mining spatial association rules and collocations using clustering and Delaunay diagrams. J. Intell. Inf. Syst. 2009, 33, 41–64.
Akbari, M.; Samadzadegan, F. Identification of air pollution patterns using a modified fuzzy co-occurrence pattern mining method. Int. J. Environ. Sci. Technol. 2015, 12, 3551–3562.
Zadeh, L.A. Fuzzy Sets. Inf. Control 1965, 8, 338–353.
Hu, Y.; Miller, H.J.; Li, X. Detecting and Analyzing Mobility Hotspots using Surface Networks. Trans. GIS 2014, 18, 911–935.
Jadidi, A.; Mostafavi, M.; Bédard, Y.; Shahriari, K. Spatial Representation of Coastal Risk: A Fuzzy Approach to Deal with Uncertainty. ISPRS Int. J. Geo-Inf. 2014, 3, 1077–1100.
Wang, M.; Chen, Y.; Wu, Y.; He, L. Spatial co-location pattern mining based on the improved density peak clustering and the fuzzy neighbor relationship. Math. Biosci. Eng. 2021, 18, 8223–8244.
Wang, M.J.; Wang, L.Z.; Zhao, L.H. Spatial Co-location Pattern Mining Based on Fuzzy Neighbor Relationship. J. Inf. Sci. Eng. 2019, 35, 1343–1363.
Cui, X.; Wang, J.; Wu, F.; Li, J.; Gong, X.; Zhao, Y.; Zhu, R. Extracting Main Center Pattern from Road Networks Using Density-Based Clustering with Fuzzy Neighborhood. ISPRS Int. J. Geo-Inf. 2019, 8, 238.
Zheng, K.; Huo, X.; Jasimuddin, S.; Zhang, J.Z.; Battaïa, O. Logistics distribution optimization: Fuzzy clustering analysis of e-commerce customers’ demands. Comput. Ind. 2023, 151, 103960.
Baser, F.; Koc, O.; Selcuk-Kestel, A.S. Credit risk evaluation using clustering based fuzzy classification method. Expert Syst. Appl. 2023, 223, 119882.
Kalia, H.; Dehuri, S.; Ghosh, A.; Cho, S.-B. Surrogate-Assisted Multi-objective Genetic Algorithms for Fuzzy Rule-Based Classification. Int. J. Fuzzy Syst. 2018, 20, 1938–1955.
Anari, Z.; Hatamlou, A.; Anari, B. Finding Suitable Membership Functions for Mining Fuzzy Association Rules in Web Data Using Learning Automata. Int. J. Pattern Recognit. Artif. Intell. 2021, 35, 2159026.
Zhang, Z.; Huang, J.; Hao, J.; Gong, J.; Chen, H. Extracting relations of crime rates through fuzzy association rules mining. Appl. Intell. 2019, 50, 448–467.
McBratney, A.B.; Moore, A.W. Application of fuzzy sets to climatic classification. Agric. For. Meteorol. 1985, 35, 165–185.
Dubois, D.; Prade, H. The three semantics of fuzzy sets. Fuzzy Sets Syst. 1997, 90, 141–150.
Velmurugan, S.; Kumar, S.A.; Udhayakumar, R. Analysis of Fuzzy Membership Function on Greenhouse Gas Emission Estimation by Triangular and Trapezoidal Membership Functions in Indian Smart Cities. Contemp. Math. 2024, 5, 2508–2530.
Jain, A.; Sharma, A. Membership function formulation methods for fuzzy logic systems: A comprehensive review. J. Crit. Rev. 2020, 7, 8717–8733.
Ali, O.A.M.; Ali, A.Y.; Sumait, B.S. Comparison between the effects of different types of membership functions on fuzzy logic controller performance. Int. J. Emerg. Eng. Res. Technol. 2015, 3, 76–83.
Wang, X.; Lei, L.; Wang, L.; Yang, P.; Chen, H. Spatial Colocation Pattern Discovery Incorporating Fuzzy Theory. IEEE Trans. Fuzzy Syst. 2021, 30, 2055–2072.
Xu, Z.; Gautam, M.; Mehta, S. Cumulative frequency fit for particle size distribution. Appl. Occup. Environ. Hyg. 2002, 17, 538–542.
Baride, S. Algorithms for Spatial Colocation Pattern Mining. Ph.D. Thesis, The Department of Computer Science and Engineering Indraprastha Institute of Information Technology, New Delhi, India, 7 December 2023.
Yang, Y.; Wong, K.K.F.; Wang, T. How do hotels choose their location? Evidence from hotels in Beijing. Int. J. Hosp. Manag. 2012, 31, 675–685.
Li, G.; Jin, F.; Chen, Y.; Jiao, J.; Liu, S. Location characteristics and differentiation mechanism of logistics nodes and logistics enterprises based on points of interest (POI): A case study of Beijing. J. Geogr. Sci. 2017, 27, 879–896.
Li, M.; Fang, L.; Huang, X.; Goh, C. A spatial–temporal analysis of hotels in urban tourism destination. Int. J. Hosp. Manag. 2015, 45, 34–43.
Di Marino, M.; Tomaz, E.; Henriques, C.; Chavoshi, S.H. The 15-minute city concept and new working spaces: A planning perspective from Oslo and Lisbon. Eur. Plan. Stud. 2023, 31, 598–620.
Zacharov, P.; Rezacova, D.; Brozkova, R. Evaluation of the QPF of convective flash flood rainfalls over the Czech territory in 2009. Atmos. Res. 2013, 131, 95–107.
Wei, M. Spatial distribution and the agglomeration performance of high-star hotels. Tour. Anal. 2017, 22, 31–43.
Lee, K.H.; Kang, S.; Terry, W.C.; Schuett, M.A. A spatial relationship between the distribution patterns of hotels and amenities in the United States. Cogent Soc. Sci. 2018, 4, 1444918.
Chen, L.; Chen, S.; Li, S.; Shen, Z. Temporal and spatial scaling effects of parameter sensitivity in relation to non-point source pollution simulation. J. Hydrol. 2019, 571, 36–49.
Lilburne, L.; Tarantola, S. Sensitivity analysis of spatial models. Int. J. Geogr. Inf. Sci. 2009, 23, 151–168.
Grekousis, G. Local fuzzy geographically weighted clustering: A new method for geodemographic segmentation. Int. J. Geogr. Inf. Sci. 2021, 35, 152–174.
Zhang, J.X.; Stuart, N. Fuzzy methods for categorical mapping with image-based land cover data. Int. J. Geogr. Inf. Sci. 2001, 15, 175–195.
Guo, J.F.; Mao, J.; Cui, T.J.; Li, C.W. A Multi-Scale Fuzzy Spatial Analysis Framework for Large Data Based on IT2 FS. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 2015, 23, 73–104.
Zhang, X.; Han, D.; Zhang, C.; Feng, W.; Wu, J.; Xie, Y.; He, Y. Spatial Pattern Evolution and Influencing Factors of Foreign Star-Rated Hotels in Chinese Cities. Reg. Sci. Environ. Econ. 2024, 2, 1.
Qin, Y.; Qin, J.; Liu, C. Spatial-temporal evolution patterns of hotels in China: 1978–2018. Int. J. Contemp. Hosp. Manag. 2021, 33, 2194–2218.

Figure 1. The spatial distribution of the six types of facilities in the Beijing urban area. The points in different colors represent different facility POI: (a) 2-star; (b) 3-star; (c) 4-star; (d) 5-star; (e) Transport facilities; (f) Tourist attractions.

Figure 2. The distribution of pairwise distances between data points: (a) all pairs of points; (b) pairs of points less than 5000 m apart.

Figure 3. Symmetrical and asymmetrical dependence relationships among diverse facilities across different fuzzy neighborhoods: (a) 500–1000 m; (b) 1000–2000 m; (c) 2000–5000 m. Circles represent the six types of facilities. Unidirectional line segments indicate an asymmetric relationship from facility A to facility B, while bidirectional line segments signify a symmetric attraction between A and B. A line segment pointing back to the same facility denotes an autocorrelation relationship.

Figure 4. The distribution of local values at the spatial scale 500–1000 m from 3-STR hotels to (a) 3-STR hotels themselves, (b) 4-STR hotels, (c) 5-STR hotels, and (d) tourist attractions (TS). The colors of the dots denote the class of the local value.

Figure 6. Global values by FGWCLQ and GWCLQ at different bandwidths.

Figure 7. The distribution of local values from 5-STR hotels to transportation facilities by FGWCLQ (1000–2000 m) and GWCLQ (1500 m).

Figure 8. Effect on global CLQ value of FGWCLQ at different fuzzy neighborhoods.

Table 1. Important notations used in this paper.

Notation	Definition
F	The set of categories of spatial objects
O	The set of spatial objects within the study area
f_i, f_j	Two different categories from the set F
o_i, o_j	Two different objects from the set O
d(o_i, o_j)	Distance between two objects o_i and o_j
$O_{f_{i}}$	The set of objects of category f_i
$N_{f_{i}}$	The number of objects of category f_i
N	The total number of objects in O

Table 2. Statistical characteristics of six categories of facilities in Beijing.

Categories	Description	Count	Proportion
2-STR	Two Star Hotel and Economy Hotel	4051	59.49%
3-STR	Three Star Hotel	647	9.50%
4-STR	Four Star Hotel	525	7.71%
5-STR	Five Star Hotel	232	3.41%
Transportation (TP)	Subway Stations, Railway stations, Bus stations, and Airports	432	6.34%
Tourism (TS)	Urban squares, National scenic spots, and Famous scenic spots	923	13.55%
Total		6810	100.00%
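
The proportions in Table 2 can be reproduced from the counts alone. The short Python sketch below is an illustration only: it assumes each proportion is the category count divided by the total of 6810 and rounded to two decimals, and the variable names are hypothetical.

```python
# Counts copied from Table 2; the dictionary name is illustrative.
counts = {
    "2-STR": 4051,
    "3-STR": 647,
    "4-STR": 525,
    "5-STR": 232,
    "TP": 432,
    "TS": 923,
}

total = sum(counts.values())

# Share of each category as a percentage, rounded to two decimals
# (assumed to be how the table's proportion column was derived).
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {counts[name]} of {total} = {share:.2f}%")
```

Under this assumption, the computed shares agree with the proportion column of the table.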

Table 3. The division of spatial scale and the meaning of the fuzzy neighborhood.

Spatial Scale [d1, d2]	Fuzzy Neighborhood	Explanation
500–1000 m	nearby	5–10 min neighborhood: the walking distance at an average walking speed of around 1 m/s [57]
1000–2000 m	medium	15–20 min neighborhood: people’s daily needs can be satisfied within walkable or cyclable distances [58]
2000–5000 m	faraway	The service radius of the hotel that can be reached by car [55]
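
To illustrate how a spatial scale [d1, d2] can replace a single distance cutoff, the following Python sketch assigns a neighbor membership degree to a pair of points. It is a minimal sketch under stated assumptions, not the membership function of the FGWCLQ package: it assumes full membership up to d1 and a linear decay to zero at d2. The names `fuzzy_membership`, `SCALES`, and `memberships` are hypothetical.

```python
def fuzzy_membership(distance_m: float, d1: float, d2: float) -> float:
    """Degree (0 to 1) to which two points `distance_m` apart are neighbors.

    Assumption for illustration: membership is 1 up to d1, falls linearly
    between d1 and d2, and is 0 from d2 onward.
    """
    if d1 >= d2:
        raise ValueError("d1 must be smaller than d2")
    if distance_m <= d1:
        return 1.0
    if distance_m >= d2:
        return 0.0
    return (d2 - distance_m) / (d2 - d1)


# Spatial scales taken from Table 3 (metres).
SCALES = {
    "nearby": (500.0, 1000.0),
    "medium": (1000.0, 2000.0),
    "faraway": (2000.0, 5000.0),
}


def memberships(distance_m: float) -> dict:
    """Membership of one pairwise distance in each fuzzy neighborhood."""
    return {
        name: fuzzy_membership(distance_m, d1, d2)
        for name, (d1, d2) in SCALES.items()
    }


print(memberships(1500.0))
```

With this linear form, a pair 1500 m apart is outside the nearby neighborhood, half inside the medium one, and fully inside the faraway one, whereas a deterministic 1500 m bandwidth would treat pairs at 1499 m and 1501 m as categorically different.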

Table 4. Global FGWCLQ values of four types of hotels and surrounding facilities at different fuzzy neighborhoods.

Fuzzy Neighborhood	Categories	2-STR	3-STR	4-STR	5-STR	TP	TS
500–1000 m	2-STR		0.943	0.938	0.935	0.946	0.907
	3-STR		1.084	1.130	1.247	1.010	0.926
	4-STR		1.070	1.411	1.555		0.996
	5-STR	0.858	1.111	1.463	2.611	0.881	0.934
	TP	0.859			0.872
	TS	0.798		0.898	0.895
1000–2000 m	2-STR	1.010	0.971	1.008	1.039	0.908
	3-STR	0.963	1.066	1.126	1.265	0.898	1.027
	4-STR	0.948	1.068	1.232		0.850	1.017
	5-STR	0.911	1.119		1.917	0.851	0.969
	TP	0.969		0.964	1.034	1.852	0.773
	TS	0.869	0.895	0.935	0.955	0.627	1.871
2000–5000 m	2-STR	0.992	1.020	1.049	1.100		1.036
	3-STR	0.965	1.061	1.089	1.238	0.882	1.056
	4-STR	0.958	1.051	1.130	1.264		1.072
	5-STR	0.936	1.112		1.517	0.792
	TP		1.024		1.024	1.132	0.933
	TS	0.906	0.976	1.027	1.099	0.743	1.513

Note: Blank cells indicate values that are not significant at the 5% level.

Table 5. Global value for six categories by two methods.

Method	Category	2-STR	3-STR	4-STR	5-STR	TP	TS
FGWCLQ (1000–2000 m)	2-STR	1.010	0.971	1.008	1.039	0.908
	3-STR	0.963	1.066	1.126	1.265	0.898	1.027
	4-STR	0.948	1.068	1.232		0.850	1.017
	5-STR	0.911	1.119		1.917	0.851	0.969
	TP	0.969		0.964	1.034	1.852	0.773
	TS	0.869	0.895	0.935	0.955	0.627	1.871
GWCLQ (1500 m)	2-STR	1.003	0.981	1.018	1.060	0.915	1.015
	3-STR	0.962	1.068	1.114	1.263		1.043
	4-STR		1.065	1.202		0.836	1.024
	5-STR	0.925	1.125	1.284	1.737	0.829
	TP		0.983	0.970	1.032	1.522	0.803
	TS	0.880		0.947	0.968		1.789

Note: Blank cells indicate values that are not significant at the 5% level.

Table 6. Comparison of statistical quantities between two methods.

Method	5-STR → TP: Number	5-STR → TP: Proportion	5-STR → TS: Number	5-STR → TS: Proportion	Bandwidth
FGWCLQ	84	36.21%	73	31.46%	[1000 m, 2000 m]
GWCLQ	71	30.60%	62	26.72%	1500 m

Table 7. The SSSI between various facilities in different fuzzy neighborhoods.

Category	2-STR	3-STR	4-STR	5-STR	TP	TS
2-STR	0.025	0.031	0.042	0.066	0.024	0.063
3-STR	0.004	0.034	0.019	0.025	0.084	0.057
4-STR	0.023	0.012	0.110	0.097	0.043	0.040
5-STR	0.039	0.005	0.105	0.262	0.059	0.068
TP	0.064	0.050	0.076	0.067	0.564	0.157
TS	0.054	0.075	0.058	0.090	0.095	0.200

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
