1. Introduction
Urban spatial structure represents the relative spatial relationships and distribution patterns of geographical elements within a certain geographic area [1]. On the one hand, the spatial structure of a city, as the core representation of its functional layout, plays a significant role in guiding and shaping human activities [2]. For instance, commercial districts often attract residents during leisure time, encouraging them to engage in consumption activities such as shopping and dining. However, areas with high levels of noise pollution may deter pedestrian movement and outdoor activities, thereby affecting the functionality and attractiveness of certain urban zones [3,4]. On the other hand, the dynamic changes in human mobility also feed back into the urban spatial structure, prompting continuous optimization and adjustment of the city’s functional layout [5]. As urban residents’ lifestyles and consumption needs evolve, the hotspots of human activity may shift or expand, driving the urban spatial layout to adapt to better meet demands. Therefore, through the study of urban spatial structure, researchers can grasp the current distribution and interrelations of various functional zones, optimize the functional layout, enhance the efficiency of the transportation system, and ensure the equitable distribution of public service resources. This has significant implications for urban planning [6], traffic management [7], and public transportation route selection [8].
With the rapid development of mobile positioning technology, transportation devices such as taxis, buses, and subways can record detailed movement trajectories of individuals in different areas, generating a large amount of human mobility data [9,10]. The rise of micromobility infrastructure has further diversified the sources and scale of human mobility data [11]. These data establish a connection between urban spatial structures and human activities, leading to a key shift in the perspective of urban spatial structure research from static to dynamic. For example, with the help of daily travel data such as subway transit data and check-in information, we can accurately assess the functional attributes and living environment quality of different urban areas, allowing for a more comprehensive and in-depth understanding of urban spatial structures [12]. Human mobility plays a significant role in driving the optimization and adjustment of urban functional layouts, thereby having a broad and profound impact on urban spatial structures. Given the important driving force of human behavior in shaping changes in urban functional layouts, numerous scholars have initiated related research and introduced the concept of “urban mobility structure,” which aims to analyze the spatial and temporal distribution, combination, and interrelationships of various activities within the city [13].
The urban mobility structure reflects the mobility patterns of urban residents in different regions and time periods, including areas for work, residence, leisure, and entertainment [14]. To capture the changes in urban layout and network attributes induced by human activities, researchers have explored various network structure partitioning methods. Among these, modularity-based network substructure partitioning methods, such as Louvain [15], FastNewman [16], and InfoMap [17], are widely used for effectively identifying community structures in urban activities. However, with the increasing scale and complexity of data, these traditional methods face challenges in terms of computational efficiency and accuracy. In recent years, with the rapid development of deep learning technologies, data-driven approaches have provided new perspectives and solutions for urban mobility structure partitioning. These methods process massive data, enabling models to automatically learn the intrinsic patterns, features, and regularities within the data, thereby allowing for the precise segmentation of urban activity structures. For example, deep clustering [18] and graph embedding [19], with their powerful feature extraction capabilities, have shown unique advantages in handling large-scale and complex urban data. Compared to traditional methods, data-driven approaches not only effectively handle large-scale data but also uncover more detailed and comprehensive urban mobility structure information. As a result, these methods have garnered widespread attention from researchers in recent years.
The urban mobility structure is complex, encompassing both crowd activity features and spatial structural features. However, current data-driven methods for detecting urban community structures mainly focus on learning local crowd activity features through node proximity, while learning global spatial structural features presents significant challenges. At the same time, urban networks are scale-free networks, and traditional community division metrics such as modularity make it difficult to measure the quality of urban mobility structures, neglecting the key factor that crowd movement is a direct representation of urban functional demands [20]. To address these limitations, we proposed a city activity structure embedding method that considers multi-scale features. The proposed method uses road network data and crowd travel data to construct an urban spatial activity structure map and designs a multi-scale comparison method to learn both local crowd activity features and global spatial structure features, generating region embedding vectors for urban mobility structure detection. Based on this, the region embedding vectors are further used to introduce a correlation matrix to explicitly encode the functional synergy and competitive relationships of POIs, enabling the learning of urban functional distribution and establishing a nonlinear mapping relationship between crowd activities and urban functional distribution. Finally, by quantifying the balance between crowd activity and service coverage under different urban spatial functional distributions, the method provides multi-objective optimization decision support for planners to address the complex challenges in urban spatial planning. The main contributions are summarized as follows:
1. A city activity structure embedding method considering multi-scale features is designed. Building upon the urban spatial mobility network, we designed a learning framework that incorporates both local node contrast and global activity structure contrast. Additionally, we introduced an epoch decay coefficient to integrate the local node contrast loss with the global activity structure contrast loss, thereby significantly enhancing the performance of urban mobility structure detection;
2. Functional distribution learning based on urban mobility structures is conducted. We introduced a correlation matrix to explicitly encode the functional synergy and competitive relationships of POIs, enabling urban functional distribution learning, constructing a nonlinear mapping relationship between crowd activity structures and urban functions, and thereby evaluating the quality of regional embedding vectors and the balance between crowd activities and functional coverage;
3. Experiments in Haikou City have shown that our method can accurately detect the urban mobility structure and functional distribution. The analysis reveals that the functions of Haikou’s central urban area are highly concentrated, with a clear tidal movement effect, while the suburban functions are relatively weak, and residents have a high level of dependence on the central urban area. In the process of urban development, it is important to optimize the functional layout of the central urban area, improve suburban functions, and promote the balanced development of urban space.
The paper is organized as follows:
Section 2 introduces the related work.
Section 3 introduces the proposed method.
Section 4 implements the method on the Haikou City dataset and compares it with baseline methods.
Section 5 discusses the travel patterns of residents under different community structures.
Section 6 provides the conclusion.
3. Method
Urban mobility structure detection categorizes regions into distinct communities based on the travel patterns observed in the data. These communities are composed of regions with similar mobility characteristics, revealing the spatial structure and functional distribution of the city. We assume that the study area can be divided into $N$ regions, represented as nodes $V = \{v_1, v_2, \ldots, v_N\}$, with the mobility OD flows $E$ serving as the edges. The urban mobility network is represented as $G = (V, E)$ with $E = \{(v_o, v_d, t_s, t_e)\}$, where $v_o$ and $v_d$ are the starting and destination regions of an OD flow, respectively, and $t_s$ and $t_e$ are the start and end times of travel. Through contrastive learning, the mobility network $G$ is mapped to a low-dimensional vector space, generating the region vectors $Z$, while urban regions are grouped into communities $C = \{c_1, c_2, \ldots, c_K\}$, each exhibiting similar internal characteristics. Next, the region embedding vectors are used to fit the regional functional distribution $Y$, building a complex relationship between urban mobility structures and urban functional distribution. Finally, we analyze the flow pattern, the spatio-temporal pattern of travel, and the functional distributions among different communities, providing multi-objective optimization decision support for planners, as shown in Figure 1.
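As a rough illustration of how OD records can be turned into a mobility network, the following minimal sketch aggregates trips into a directed, flow-weighted adjacency matrix. The record layout, the toy trips, and the function name `build_mobility_network` are assumptions made for this example and are not taken from the dataset used in the paper.

```python
from collections import Counter

# Hypothetical OD records: (origin_region, destination_region, start_time, end_time).
trips = [
    (0, 1, "08:05", "08:30"),
    (0, 1, "08:40", "09:02"),
    (1, 2, "18:10", "18:25"),
    (2, 0, "20:00", "20:35"),
]

def build_mobility_network(records, n_regions):
    """Aggregate OD flows into a directed, flow-weighted adjacency matrix."""
    # Count how many trips go from each origin region to each destination region.
    flows = Counter((origin, dest) for origin, dest, _, _ in records)
    adjacency = [[0] * n_regions for _ in range(n_regions)]
    for (origin, dest), count in flows.items():
        adjacency[origin][dest] = count
    return adjacency

adjacency = build_mobility_network(trips, n_regions=3)
for row in adjacency:
    print(row)
```

The time stamps are carried in the records but ignored here; a time-sliced variant could build one such matrix per period.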
3.1. Model 1: Urban Mobility Structure Embedding with Multi-Scale Features
The objective of embedding urban mobility structures is to map a mobility network into a low-dimensional vector space, thereby generating vector representations for each region. However, existing self-supervised learning methods primarily rely on contrastive strategies to learn the proximity features of nodes, which have limitations in capturing the global mobility structure features [31]. To address this issue, we proposed a city activity structure embedding model that incorporates multi-scale features. We simultaneously performed local node contrast and global mobility structure learning on the enhanced mobility network. Furthermore, an epoch decay coefficient is introduced to integrate the local node contrast loss with the global contrast loss. The model architecture, as depicted in Figure 2, consists of two key components: graph enhancement and multi-scale contrastive learning.
Step 1. Graph enhancement
We enhanced the mobility network by randomly masking the node features and randomly discarding the edges. The node feature masking is obtained by multiplying by the random mask vector $m$, as follows:
$$\tilde{X} = X \odot m$$
where $\odot$ is the Hadamard product, $X$ is the original node feature matrix, and $m$ is the random noise vector, following the Bernoulli distribution $m \sim \mathrm{Bernoulli}(1 - p_f)$, where $p_f$ is the random discard rate of node features. Random edge discarding is obtained by multiplying by the random mask matrix $M$, as follows:
$$\tilde{A} = A \odot M$$
where $A$ is the original adjacency matrix and $M$ follows a Bernoulli distribution, $M_{ij} \sim \mathrm{Bernoulli}(1 - p_e)$, with $p_e$ the random discard rate of edges.
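The two augmentation operations can be illustrated with a minimal NumPy sketch. The function name `augment`, the toy matrices, and the discard rates 0.3 and 0.2 are assumptions made for illustration. The sketch also assumes one mask entry per feature column, shared across all nodes, which is one common reading of a mask vector.

```python
import numpy as np

def augment(X, A, p_feat, p_edge, rng):
    """Return one augmented view via feature masking and edge discarding."""
    # One Bernoulli(1 - p_feat) draw per feature column, shared by all nodes.
    m = rng.binomial(1, 1.0 - p_feat, size=X.shape[1])
    X_aug = X * m  # Hadamard product, broadcast over the rows of X
    # One independent Bernoulli(1 - p_edge) draw per adjacency entry
    # (OD flows are directed, so entries (i, j) and (j, i) are masked separately).
    M = rng.binomial(1, 1.0 - p_edge, size=A.shape)
    A_aug = A * M
    return X_aug, A_aug

rng = np.random.default_rng(0)
X = np.arange(12, dtype=float).reshape(4, 3) + 1.0  # toy node features
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)           # toy adjacency matrix

X1, A1 = augment(X, A, p_feat=0.3, p_edge=0.2, rng=rng)  # first view
X2, A2 = augment(X, A, p_feat=0.3, p_edge=0.2, rng=rng)  # second view
```

Each call draws fresh masks, so the two views differ while sharing the same underlying network.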
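Through the graph enhancement operations of feature masking and edge discarding, we obtained two enhanced views, $G_1 = (\tilde{X}_1, \tilde{A}_1)$ and $G_2 = (\tilde{X}_2, \tilde{A}_2)$. Then, we obtained the node embedding vectors of the two enhanced views through a parameter-sharing GCN, as follows:
$$Z_1 = \mathrm{GCN}(\tilde{X}_1, \tilde{A}_1), \quad Z_2 = \mathrm{GCN}(\tilde{X}_2, \tilde{A}_2)$$
where $\mathrm{GCN}(\cdot)$ is a graph convolution, calculated as follows:
$$H^{(l+1)} = \sigma\left(\hat{A} H^{(l)} W^{(l)}\right)$$
where $\hat{A} = \hat{D}^{-1/2}(A + I)\hat{D}^{-1/2}$ is the normalized adjacency matrix, $A$ is the adjacency matrix, $I$ is the identity matrix, $\hat{D}$ is the degree matrix of $A + I$, $H^{(l)}$ is the node feature matrix of the $l$-th layer, $H^{(0)} = X$ is the initial node feature, $W^{(l)}$ is the weight matrix of the $l$-th layer, and $\sigma(\cdot)$ is the activation function.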
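A single graph convolution step of this kind can be sketched in NumPy as below. The symmetric normalization with self-loops follows the standard GCN formulation; the choice of ReLU as the activation and the toy inputs are assumptions for illustration, since the activation is not specified here.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalize A with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_loop = A + np.eye(A.shape[0])
    degree = A_loop.sum(axis=1)          # at least 1 because of the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    return A_loop * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_norm, H, W):
    """One propagation step: activation(A_norm @ H @ W), with ReLU assumed."""
    return np.maximum(A_norm @ H @ W, 0.0)

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])               # two regions linked to each other
H0 = np.array([[1.0, -1.0],
               [3.0, 1.0]])              # toy initial node features
W0 = np.eye(2)                           # identity weights keep the example readable

A_norm = normalize_adjacency(A)
H1 = gcn_layer(A_norm, H0, W0)
```

Sharing parameters across the two views simply means calling `gcn_layer` with the same weight matrices on both augmented inputs.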
Step 2. Multi-scale contrastive learning
① Local contrastive learning
Upon generating the vector embeddings for the two enhanced views, we employed the node contrast loss function, InfoNCE, to learn the node embeddings, effectively capturing the mobility patterns of the crowd at the nodes. Specifically, the goal is to minimize the contrast loss, pulling the embeddings of the same node in the two enhanced views closer together while pushing the embeddings of different nodes apart, as follows:
$$\mathcal{L}_{\mathrm{local}} = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{\exp\left(\theta(z_i^{(1)}, z_i^{(2)})\right)}{\sum_{j=1}^{N} \exp\left(\theta(z_i^{(1)}, z_j^{(2)})\right)}$$
where $\theta(\cdot,\cdot)$ is the Gaussian RBF similarity, and $z_i^{(1)}$ and $z_i^{(2)}$ are the node embedding vectors of the two enhanced views, respectively.
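The node-level contrast can be sketched as an InfoNCE loss over a Gaussian RBF similarity matrix. This is a minimal illustration and not the authors' implementation: the kernel bandwidth `sigma`, the temperature `tau`, and the use of cross-view negatives only are assumptions.

```python
import numpy as np

def rbf_similarity(U, V, sigma=1.0):
    """Pairwise Gaussian RBF similarity between rows of U and rows of V."""
    sq_dist = ((U[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def info_nce(Z1, Z2, tau=0.5):
    """InfoNCE: node i in view 1 should match node i in view 2."""
    logits = rbf_similarity(Z1, Z2) / tau
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; every other entry acts as a negative.
    return float(-np.mean(np.diag(log_prob)))

Z = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])  # toy embeddings
aligned = info_nce(Z, Z)                 # views agree node by node
shuffled = info_nce(Z, Z[[1, 2, 0]])     # views disagree on every node
```

In this toy case the aligned views give a much smaller loss than the shuffled ones, which is the behavior the objective rewards.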
② Global contrastive learning
To learn the global structure features, we determined the community ownership of each node by calculating the distance between each node and the community centroid, as follows:
$$c_i = \arg\min_{k} \lVert z_i - \mu_k \rVert$$
where $c_i = k$ indicates that node $i$ belongs to the community $k$, and $\mu$ is a trainable community centroid matrix whose $k$-th row is $\mu_k$. Then, we adopted the cross-comparison objective to compare the node representation of one view with the community centroid of the other view, to maximize the community consistency between the two views. In this objective, $z_i$ represents the embedded representation of the $i$-th node, $\mu_{c_i}$ represents the centroid of the community $c_i$ to which the $i$-th node belongs, and $w(\cdot,\cdot)$ is the RBF weight function.
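One plausible form of the community assignment and the cross-view comparison is sketched below. It is an assumption-laden illustration, not the paper's exact objective: nodes are hard-assigned to their nearest centroid, a single centroid matrix is shared by both views, and the RBF reweighting of negatives is omitted for brevity.

```python
import numpy as np

def sq_dists(Z, C):
    """Squared Euclidean distance between every node and every centroid."""
    return ((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)

def assign_communities(Z, C):
    """Hard assignment: each node goes to its nearest centroid."""
    return sq_dists(Z, C).argmin(axis=1)

def cross_view_community_loss(Z_a, Z_b, C):
    """Contrast nodes of view a against the communities chosen by view b."""
    labels_b = assign_communities(Z_b, C)
    logits = -sq_dists(Z_a, C)           # RBF kernel expressed in log space
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(log_prob[np.arange(len(Z_a)), labels_b]))

C = np.array([[0.0, 0.0], [4.0, 4.0]])                       # toy centroids
Z_a = np.array([[0.0, 0.0], [0.1, 0.0], [4.0, 4.0], [4.0, 4.1]])
labels = assign_communities(Z_a, C)
consistent = cross_view_community_loss(Z_a, Z_a, C)
flipped = cross_view_community_loss(Z_a, Z_a[::-1], C)       # views disagree
```

In training, the centroids would be updated by gradient descent together with the encoder; here they are fixed to keep the example self-contained.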
To train the node contrast loss and the community contrast loss in a single loss function, we proposed the epoch decay coefficient $\lambda(t)$, so that the weight of the loss term it scales decreases as the number of training epochs increases. The overall training loss is the combination of the two losses weighted by $\lambda(t)$, where $\lambda(t)$ decreases as the epoch $t$ increases.
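A decaying weight of this kind can be sketched as follows. The exponential schedule, the time constant `eta`, and the convex combination of the two losses are all assumptions for illustration; the text only states that the coefficient decreases with the epoch, and which of the two losses it scales is left generic here (`loss_decayed` versus `loss_other`).

```python
import math

def decay_coefficient(epoch, eta=50.0):
    """Hypothetical schedule: equals 1 at epoch 0 and decays toward 0."""
    return math.exp(-epoch / eta)

def combined_loss(loss_decayed, loss_other, epoch, eta=50.0):
    """Blend two losses; the first term's weight shrinks as training proceeds."""
    lam = decay_coefficient(epoch, eta)
    return lam * loss_decayed + (1.0 - lam) * loss_other

early = combined_loss(2.0, 4.0, epoch=0)      # dominated by the first loss
late = combined_loss(2.0, 4.0, epoch=500)     # dominated by the second loss
```

Under this design the model is driven mainly by one objective early in training and gradually hands over to the other.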
3.2. Model 2: Urban Functional Distribution Learning with the Correlation Matrix
The urban mobility structure embeddings learned through contrastive learning effectively capture the mobility patterns of urban residents across different regions and periods. However, objectively evaluating these embeddings remains a challenge. Given the complex relationship between urban mobility structures and the urban functional distribution, we argue that the structure of urban activities can reflect the functional distribution of a region [32]. Therefore, by using the embedding vector of urban activity structures as input, we transformed the task of learning urban functional distribution into a problem of predicting the region’s POI distribution.
We believed that different types of POIs exhibit a synergistic relationship. For instance, regions with a high number of “shopping” POIs tend to also have a larger number of “food and beverages” POIs. Based on this insight, we proposed an enhanced urban functional distribution learning model that incorporates these collaborative relationships. The synergy between POIs is quantified by analyzing the correlation in the distribution of different POI types.
The POI distribution prediction for regions can be defined as follows: let $X \in \mathbb{R}^{N \times d}$ be the urban mobility structure embedding matrix of the regions (with $d$ being the embedding dimension of each region), and let $L = \{l_1, l_2, \ldots, l_M\}$ denote the set of POI types. The label distribution matrix is $Y \in \mathbb{R}^{N \times M}$, where $y_{ij}$ represents the true proportion of POI type $l_j$ in region $i$, with $y_{ij} \in [0, 1]$ and $\sum_{j=1}^{M} y_{ij} = 1$. Given the training set $\{(x_i, y_i)\}_{i=1}^{N}$, the goal is to model the relationship between $X$ and $Y$.
Figure 3 illustrates the three key steps involved in the urban functional distribution learning model. First, the embedding matrix $X$ is fed into a multilayer perceptron (MLP) to obtain the initially predicted distribution $\hat{Y}^{(0)}$. Second, an affinity matrix is constructed to explicitly encode POI functional synergies, which adjusts the initial predictions through matrix multiplication. Finally, the output is passed through a SoftMax function to generate the normalized POI type distribution prediction, ensuring the sum of all probabilities equals 1. The core of this method lies in defining POI functional synergies, where cosine similarity $s_{ij}$ is used to quantify the functional relationship between region $i$ and region $j$, as follows:
$$s_{ij} = \frac{a_i \cdot a_j}{\lVert a_i \rVert \, \lVert a_j \rVert}$$
where $a_i$ and $a_j$ are the POI distribution vectors associated with $i$ and $j$.
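The three steps (MLP, affinity adjustment, SoftMax) can be sketched in NumPy as below. The sketch assumes the affinity matrix is computed between POI types, that is, between the columns of the label distribution matrix, so that it has one row and column per POI type and can right-multiply the initial prediction. The one-hidden-layer MLP, the random weights, and all function names are likewise illustrative assumptions.

```python
import numpy as np

def poi_affinity(Y):
    """Cosine similarity between POI-type columns of Y (regions x types)."""
    norms = np.linalg.norm(Y, axis=0, keepdims=True)
    Y_unit = Y / np.clip(norms, 1e-12, None)
    return Y_unit.T @ Y_unit             # shape: (types, types)

def softmax(Z):
    """Row-wise softmax with max subtraction for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def predict_distribution(X, W1, b1, W2, b2, S):
    """MLP -> affinity adjustment -> softmax."""
    hidden = np.maximum(X @ W1 + b1, 0.0)    # hidden layer with ReLU
    initial = hidden @ W2 + b2               # initial (unnormalized) prediction
    return softmax(initial @ S)              # adjust by POI synergies, normalize

rng = np.random.default_rng(0)
Y = np.array([[0.6, 0.3, 0.1],
              [0.5, 0.4, 0.1],
              [0.1, 0.2, 0.7]])              # toy label distributions
S = poi_affinity(Y)
X = rng.normal(size=(3, 4))                  # toy region embeddings
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)
P = predict_distribution(X, W1, b1, W2, b2, S)
```

Each row of `P` is a valid probability distribution over POI types, matching the normalization requirement stated above.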
The KL divergence can be used to measure the difference between two probability distributions. Therefore, we adopted the KL divergence as the loss function of the urban functional distribution learning model, as follows:
$$\mathcal{L}_{\mathrm{KL}} = \frac{1}{N} \sum_{i=1}^{N} \sum_{j} y_{ij} \log \frac{y_{ij}}{\hat{y}_{ij}}$$
where $N$ is the number of regions, $\hat{y}_{ij}$ is the predicted regional label distribution, and $y_{ij}$ is the actual regional label distribution, with $j$ indexing the POI types.
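A KL divergence loss between actual and predicted label distributions can be computed as in the sketch below. Averaging over regions and clipping with a small `eps` to avoid taking the logarithm of zero are assumptions of this example.

```python
import numpy as np

def kl_loss(Y_true, Y_pred, eps=1e-12):
    """Mean over regions of KL(actual || predicted) across POI types."""
    Y_true = np.clip(Y_true, eps, 1.0)
    Y_pred = np.clip(Y_pred, eps, 1.0)
    per_region = np.sum(Y_true * np.log(Y_true / Y_pred), axis=1)
    return float(np.mean(per_region))

Y_true = np.array([[0.5, 0.5]])
Y_pred = np.array([[0.25, 0.75]])

perfect = kl_loss(Y_true, Y_true)    # identical distributions
mismatch = kl_loss(Y_true, Y_pred)   # positive whenever the two differ
```

The loss is zero only when the predicted distribution matches the actual one, and it penalizes underestimating a POI type that is actually common more heavily than the reverse.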
5. Discussion
5.1. Flow Pattern Between Communities
As depicted in Figure 7, the flow patterns between communities reveal distinct characteristics on weekdays (a) and weekends (b). On weekdays, there is a pronounced unidirectional convergence of flows from suburbs to the central regions (Type I). This stems from the necessity for residents to travel to central hubs for work, business meetings, or commuting to office buildings, commercial zones, or transportation hubs. Such a pattern underscores the central area’s dominant role and the significant reliance of suburban residents on its employment and commercial resources.
On weekends, the flow between suburbs significantly increases, and more bidirectional arrows appear. This indicates that during weekends, people have more leisure time for diverse activities, no longer limited to traveling to the central area. There may be more recreational, entertainment, commercial, or social activities between peripheral nodes. For example, on weekends, people may choose to engage in consumption and leisure activities in surrounding commercial streets, parks, or cultural venues, which enhances the interaction between peripheral nodes. It shows the potential for functional transformation in suburban spatial structures on non-working days.
The change from (a) to (b) shows that the flow pattern on weekdays is more concentrated and singular, while the flow pattern on weekends is more dispersed and diversified. This reflects the different living rhythms and activity demands of urban residents on weekdays and weekends, which is consistent with the trend of functional diversification in suburban and fringe areas within urban spatial forms. With urban development, suburban areas are gradually transitioning from a single residential function to a multifunctional one, increasing the supply of leisure, entertainment, and commercial facilities. The transformation not only relieves pressure on the central area but also promotes the balanced development of the urban spatial structure. However, this balanced development is still in its early stages, and there is still a significant gap in the degree of functional improvement in suburban areas compared to the central urban areas. This is particularly evident in the travel patterns of Type IV and Type V communities.
5.2. The Spatio-Temporal Pattern of Travel
To analyze the travel time patterns of Haikou City, we counted orders from different communities by the hour, separately for weekdays and weekends, as shown in Figure 8. On weekdays, the travel volumes of Type I and Type II communities display similar time-distribution patterns with slight numerical differences. From 00:00 to 03:00, volumes stay low; 04:00–06:00 sees a gradual rise; and a small peak emerges at 06:00–09:00. Volumes remain high and stable between 09:00 and 15:00, climb again to another daily peak between 15:00 and 18:00, and then decrease from 18:00 to 21:00.
Over the weekend, the difference in travel volumes between Type I and Type II communities narrowed compared to weekdays. The travel demand in Type II communities rises, and both types see a peak at 18:00. This might be due to Haikou’s tropical monsoon climate, where high daytime temperatures prompt residents to schedule leisure activities for the cooler evenings.
The order quantities for Type I, Type III, Type IV, and Type V communities remain at a lower level, both on weekdays and weekends, consistently ranging between 0 and 200 orders. The volatility is also low, with no significant fluctuations at either the upper or lower extremes, indicating that travel demand in these communities is relatively low and stable.
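The hourly counting described in this subsection can be sketched with the standard library alone. The order records, the community labels, and the dates below are hypothetical; only the procedure (group by community, split weekdays from weekends, count per hour) mirrors the text.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical orders: (community type, pickup time). 2024-01-01 is a Monday.
orders = [
    ("I", "2024-01-01 08:15"),
    ("I", "2024-01-01 08:50"),
    ("II", "2024-01-01 17:20"),
    ("I", "2024-01-06 18:05"),   # Saturday
    ("II", "2024-01-07 18:40"),  # Sunday
]

def hourly_counts(records):
    """Count orders per hour for each (community, weekday/weekend) pair."""
    counts = defaultdict(lambda: [0] * 24)
    for community, timestamp in records:
        t = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
        day_type = "weekend" if t.weekday() >= 5 else "weekday"
        counts[(community, day_type)][t.hour] += 1
    return counts

counts = hourly_counts(orders)
```

Plotting each 24-element list against the hour of day gives curves of the kind compared across community types.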
To explore the characteristics of each community, we calculated the average travel distance, average travel time, and average travel speed for different communities, as shown in Figure 9. On weekdays, the travel distance in most communities is greater than that on weekends, with Type III, Type IV, and Type V communities exhibiting longer travel distances compared to Type I and Type II communities. This can be attributed to the centralized urban structure of Haikou, where Type I and Type II areas encompass various functions such as commerce, culture, administration, and residential purposes. Residents in these areas do not need to travel long distances for daily activities. In contrast, residents of Type III, Type IV, and Type V communities need to commute long distances to reach their workplaces. However, on weekends, residents tend to choose leisure, entertainment, and other nearby destinations, resulting in shorter travel distances.
In terms of travel time, communities of Type I, Type II, and Type IV on weekdays have shorter travel times compared to their corresponding communities on weekends, while Type III and Type V communities show the opposite trend. This is because, on weekdays, residents have a clear travel purpose due to work, leading to shorter travel times in Type I, Type II, and Type IV communities. Conversely, residents in Type III and Type V communities need to commute longer distances on weekdays, thus resulting in longer travel times compared to weekends. On weekends, people’s travel purposes are more diverse, and the travel time is relatively more flexible. As a result, travel speeds on weekdays are higher than on weekends across all areas.
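The per-community averages of distance, time, and speed can be computed as in the following sketch. The trip records are hypothetical, and average speed is taken here as total distance divided by total time, which is an assumption; averaging per-trip speeds would give a different figure.

```python
# Hypothetical trips: (community type, distance in km, duration in minutes).
trip_records = [
    ("I", 4.0, 12.0),
    ("I", 6.0, 18.0),
    ("IV", 12.0, 30.0),
]

def community_travel_stats(records):
    """Average distance, average time, and average speed per community."""
    totals = {}
    for community, dist_km, dur_min in records:
        d, t, n = totals.get(community, (0.0, 0.0, 0))
        totals[community] = (d + dist_km, t + dur_min, n + 1)
    stats = {}
    for community, (d, t, n) in totals.items():
        stats[community] = {
            "avg_distance_km": d / n,
            "avg_time_min": t / n,
            "avg_speed_kmh": d / (t / 60.0),  # total distance over total hours
        }
    return stats

stats = community_travel_stats(trip_records)
```

Running the same function separately on weekday and weekend trips yields the two sets of values being compared.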
5.3. Urban Functional Distribution
In Figure 10, we conducted a visual analysis of the functional distribution prediction of different regions in the city. The results show that the urban functional distribution prediction model performs well in most regions and reflects the actual distribution of urban functions relatively accurately, indicating that residents’ travel activities and the distribution of urban functions are in a relatively balanced state. We specifically selected four regions and conducted an in-depth visualization study of the predicted values of their functional distribution. Among them, the cosine similarity between the model outputs and the actual distributions for regions (a) and (b) exceeds 0.9, indicating that the model can effectively fit the POI type distribution of these regions with the help of the mobility structure embedding vectors. However, the cosine similarity for regions (c) and (d) is less than 0.5, and the model shows significant overprediction biases for three types of POIs in these regions, namely daily life service, shopping, and food and beverages.
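The per-region agreement between predicted and actual POI distributions can be measured with a row-wise cosine similarity, as sketched below. The toy distributions and the function name `region_cosine` are assumptions for illustration; the values are not results from the paper.

```python
import numpy as np

def region_cosine(Y_true, Y_pred):
    """Cosine similarity between actual and predicted distributions, per region."""
    dot = (Y_true * Y_pred).sum(axis=1)
    norms = np.linalg.norm(Y_true, axis=1) * np.linalg.norm(Y_pred, axis=1)
    return dot / np.clip(norms, 1e-12, None)

Y_true = np.array([[0.5, 0.3, 0.2],
                   [1.0, 0.0, 0.0]])
Y_pred = np.array([[0.5, 0.3, 0.2],    # perfect match
                   [0.0, 1.0, 0.0]])   # all mass on the wrong POI type

similarity = region_cosine(Y_true, Y_pred)
well_fitted = similarity > 0.9         # threshold used in the discussion above
```

Regions falling below a chosen threshold can then be inspected individually, as done for the four selected regions.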
To explore the reasons behind this, we conducted further statistics and analyses of the travel flow conditions of the four regions, and the results are shown in Figure 11. Both region (a) and region (b) are commercial areas with diverse functions. The flow of residents’ activities is not only regular but also large in scale, which provides rich characteristic information for the model and enables it to effectively conduct functional distribution learning. In contrast, region (c) is a tourist attraction. Although the flow of residents’ activities in the region is also considerable, the lack of other regions (tourist attractions) with a similar functional distribution to serve as a reference increases the difficulty of accurate prediction by the model. Region (d), on the other hand, is an industrial area mainly composed of factories, whose activity flow is inherently at a relatively low level, making it difficult to extract sufficient effective features from the urban mobility structure vector. From the perspective of urban planning, there are indeed unreasonable aspects in the distribution of functional services in region (d), which lacks basic supporting facilities such as daily life services.
6. Conclusions
We proposed a method for embedding urban mobility structures that incorporates multi-scale features, which is used to detect urban mobility structures and learn the distribution of urban functions. To begin with, we gathered road network data and population travel data to construct the urban mobility network. We then designed a multi-scale contrastive learning model that generates the embedding vectors of the regions by learning both local population activity features and global spatial structure features. In addition, a correlation matrix is employed to explicitly encode the coordination and competition relationships of POI functions. By integrating the urban mobility structure embedding vectors with the learning of urban functional distribution, we established a nonlinear mapping relationship between population activities and urban functions, which helps evaluate the balance between functional distribution and population activities in urban regions. The experimental results in Haikou City demonstrate that the proposed method has achieved excellent performance in both the detection of urban mobility structure and the learning of functional distribution.
Through the analysis of urban mobility structure, it was found that the urban structure of Haikou City has a central clustering characteristic. The central area attracts a large amount of surrounding mobility due to high-intensity development and multi-functional agglomeration, reflecting a clear functional zoning in urban space and the dominant role of the central area. However, this also leads to traffic congestion and excessive population concentration. The mobility structure on weekends shows a decentralized and diversified pattern, indicating that the suburbs are transitioning from a single residential function to a multi-functional one, with the addition of leisure, entertainment, and commercial facilities. It has eased the pressure on the central area and fostered a more balanced spatial structure. However, the functional development of the suburbs remains less advanced than that of the central area, and balanced development is still in its early stages. In urban planning, it is essential to optimize the functional layout of the central urban area, enhance suburban functions, and promote their diversified development. It will help reduce residents’ dependence on the central area and support the balanced growth of urban space. Owing to the limitations of the available data sources, this study exclusively utilizes DiDi data. For future work, we plan to expand our data scope to include micromobility data, such as shared scooters and bikes, as well as sensor-based urban quality metrics like noise and vibration levels. We also intend to adopt multimodal mobility data fusion techniques to more accurately capture and reflect the complex dynamics of modern urban environments.