Commute Networks as a Signature of Urban Socioeconomic Performance: Evaluating Mobility Structures with Deep Learning Models

Khulbe, Devashish; Belyi, Alexander; Sobolevsky, Stanislav

doi:10.3390/smartcities8040125



Devashish Khulbe ¹,*, Alexander Belyi ¹ and Stanislav Sobolevsky ²,³,⁴,*

¹ Department of Mathematics and Statistics, Faculty of Science, Masaryk University, 602 00 Brno, Czech Republic
² Institute of Computer Science, Masaryk University, 602 00 Brno, Czech Republic
³ Center for Urban Science + Progress, New York University, Brooklyn, NY 11201, USA
⁴ Center for Interacting Urban Networks, New York University Abu Dhabi, Abu Dhabi P.O. Box 129188, United Arab Emirates
* Authors to whom correspondence should be addressed.

Smart Cities 2025, 8(4), 125; https://doi.org/10.3390/smartcities8040125

Submission received: 14 May 2025 / Revised: 18 July 2025 / Accepted: 23 July 2025 / Published: 29 July 2025



Highlights

What are the main findings?

Mobility network structures derived from census-based commute data significantly improve predictive performance in socioeconomic modeling, even without region-specific features.
The proposed deep learning framework employing Graph Neural Networks outperforms traditional models using regional-level features across 12 major U.S. cities.

What is the implication of the main finding?

Network-based representations derived from deep learning methods offer a powerful alternative to traditional location-based approaches for urban analysis and forecasting of population-based metrics.
The approach provides urban researchers and policymakers with scalable tools to incorporate mobility-driven structural and topological network effects—derived purely from commuting patterns—into socioeconomic planning and decision-making.

Abstract

Urban socioeconomic modeling has predominantly concentrated on extensive location- and neighborhood-based features that rely on the localized population footprint. However, networks are pervasive in urban systems, and many urban modeling methods do not account for network-based effects. Network-based research, in turn, has drawn on a multitude of data from urban landscapes, yet achieving a comprehensive understanding of urban mobility proves challenging without exhaustive datasets. In this study, we propose using census commute records as a reliable and comprehensive source for constructing mobility networks across cities. Leveraging deep learning architectures, we employ these commute networks across U.S. metro areas for socioeconomic modeling. We show that mobility network structures provide significant predictive performance without any node features. Building on this, we present a supervised learning framework that uses mobility networks to model a city's socioeconomic indicator directly, combining Graph Neural Network and Vanilla Neural Network models so that all parameters are learned in a single pipeline. In experiments across 12 major U.S. cities, the proposed model achieves considerable explanatory performance and outperforms previous conventional machine learning models based on extensive regional-level features. By providing researchers with methods to incorporate network effects in urban modeling, this work also informs stakeholders of wider network-based effects in urban policymaking and planning.

Keywords:

urban mobility; Graph Neural Networks; socioeconomic modeling

1. Introduction

Interactions among urban neighborhoods are increasingly common in the modern world. These interactions can take the form of physical movement of people and goods or manifest in intangible forms such as social networks. A location's characteristics may be shaped by its interactions with other places within the city, so evaluating a region's position in a large urban network could reveal many important indicators, from the social well-being of its residents to its general economic status within the city. Networks based on the geographical, cultural, and social aspects of urban systems have been well established and studied [1,2]. In particular, urban street networks are widely studied in the literature [3,4], and their impact on socioeconomic outcomes has also been established [5]. However, networks are rarely used in socioeconomic modeling. Urban scientists have focused more on aggregated physical quantities [6] and contextual local features [7]. Extensive urban variables such as 311 service requests have proved to be crucial signatures of an urban neighborhood [8]. However, local regional variables capture only the information within a location. Traditional feature-based models often overlook structural and relational dependencies between urban regions, even though such mobility-driven network effects, which reflect how regions are interconnected through commuting flows, can implicitly influence where people choose to live, work, or relocate within a city. Urban network representations have therefore become increasingly important in many aspects of urban modeling, since they can capture intricate relations and a region's position within the topology of a vast graph.
Moreover, evaluating predictive models built on these representations, with and without contextual regional variables, helps isolate the explanatory power of the network alone. This is useful for urban scientists who rely on static regional features that are updated infrequently, whereas many urban networks can be constructed almost in real time. Leveraging urban networks in socioeconomic modeling could thus be an efficient way to model city dynamics.

Methods for generating low-dimensional node embeddings of graphs are well established in the literature. Methods such as node2vec [9] and LINE [10] have been used to capture the structural properties of networks; however, because they rely on local structures, they are limited in capturing the global topology of a graph. More recently, Graph Neural Networks (GNNs) [11] have shown promising results in graph representation learning for many downstream tasks, such as node classification [11] and community detection [12]. GNNs propagate information through convolution operations within a network, with the goal of preserving important structural properties of the network, such as local connectivity and symmetry. The convolution operation is implemented in Graph Convolutional Networks (GCNs), while the attention mechanism has been incorporated in Graph Attention Networks (GATs) [13]. Hereafter, we use GNN as an umbrella term covering both GCN and GAT.

Recent work in network representation modeling has also extended to urban networks, although in a limited way. Urban networks are typically constructed by defining nodes as geographic regions (such as grid cells, administrative zones, or neighborhoods) and edges as interactions or relationships between these regions. Depending on the application, these interactions may include commuting flows, taxi trip counts, social connections, or even the co-occurrence of events. This spatially grounded formulation encodes urban dynamics and functional connectivity patterns into graph structures that GNNs can exploit. Machine learning techniques such as Variational Autoencoders have been used to represent urban street networks and compare their metrics [14]. For street networks, researchers have also proposed different path (edge) representations for predictive modeling [15]. For node representation, GATs have shown promising results in learning embeddings for regions in an urban network [16,17]. GATs have been used in an unsupervised fashion to learn vector embeddings from initial Point-of-Interest (POI) features [16]. POI features have also been combined with GATs on heterogeneous graphs, where researchers used multiple node categories to construct the urban network [17]. More recently, GAT [13] has been used to capture population–facility interactions and dependencies in a bipartite urban graph [18]. GNN-based node embeddings have shown promising results in supervised classification tasks, such as classifying building patterns from building footprints [19] and classifying urban scenes using both visual and semantic features [20]. GNNs are also increasingly applied in policy-relevant contexts, with recent methods providing effective frameworks for traffic simulation [21], road-safety sensing [22], and land-use inference [23].

In the domain of socioeconomic modeling, recent work has demonstrated the potential of graph convolutional network (GCN)-based embeddings—trained to reconstruct network edges—for predicting variables such as median income and housing characteristics across city districts [24]. These approaches have achieved promising results by leveraging the structural information of urban networks; however, they typically rely on heterogeneous graphs that incorporate multiple categories of points of interest (POIs) and their links to spatial regions. Such models thus depend heavily on rich auxiliary datasets and fine-grained contextual information, including POI annotations and regional attributes. Furthermore, many existing studies integrate extensive node-level features during training, often overlooking the question of how much predictive power lies in the network structure itself. This raises a critical research question: To what extent can urban networks alone, without additional contextual data, support reliable and interpretable socioeconomic modeling? By isolating the structural signal in mobility networks, we aim to assess whether they suffice for modeling key socioeconomic indicators, or if context-dependent features are indispensable for accurate predictions.

For downstream modeling, most existing GNN approaches follow a two-stage pipeline: first, a model is trained, in either a supervised or unsupervised fashion, to generate node embeddings from the network; then, a second supervised model is trained to predict target variables using these learned embeddings as input features [16,17,24]. While effective in some cases, this decoupled framework introduces a disconnect between representation learning and the end prediction task. It also requires careful tuning of two separate models with distinct objectives (embedding quality and predictive accuracy), which may not align optimally. This presents a notable gap: the absence of end-to-end models that learn to predict downstream outcomes directly from the network structure. A more integrated approach, in which the model is trained directly on the target variable using graph-based signals, could offer improved performance and interpretability, particularly in urban contexts where network structure may already encode meaningful spatial and demographic patterns. Exploring whether such direct modeling approaches outperform two-stage pipelines is a key direction for advancing urban attribute inference from mobility networks.

Our main contributions can be summarized as follows:

We demonstrate that mobility (commute) networks alone, without reliance on auxiliary contextual data, can provide sufficient structural signal for effective socioeconomic modeling.
We propose a unified GNN + VNN architecture that jointly learns network embeddings and performs downstream prediction of urban location characteristics in a single end-to-end training pipeline.

2. Materials and Methods

2.1. Data Overview

We define the urban mobility network by the commute flows among a city's neighborhoods. Specifically, nodes are the geographical units in a city, and edges are weighted by the number of people who commute for work from one unit to another per day. The commute flow data is retrieved from the Longitudinal Employer-Household Dynamics (LEHD) program of the U.S. Census Bureau [25]. LEHD creates a detailed picture of labor market dynamics by integrating federal, state, and Census Bureau data on employers and employees. The commute flows are published in the LEHD Origin-Destination Employment Statistics (LODES), which are updated annually. We use these commute flow values to populate our networks, as they reflect a comprehensive picture of mobility in a city. Although LEHD data may not be updated as frequently as other information about a city, it nevertheless yields an inclusive mobility network compared with urban networks built on data from platforms such as social media, which may exclude significant portions of the population. Table 1 shows key network statistics for three of the cities. For the socioeconomic variable, we model median income. While income may not be a complete measure of a neighborhood's socioeconomic status, it captures essential socioeconomic features; moreover, other variables such as unemployment status and housing profile are not easily available in many urban areas. We retrieve income data for the cities from the U.S. Census Bureau's American Community Survey (ACS) [26], which is updated annually. All data is aggregated at the census tract level. Figure 1 shows the network structure and income distribution for NYC across census tracts.
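As an illustration of how such a network could be assembled, the sketch below aggregates LODES-style origin-destination rows into a tract-level weighted adjacency matrix. It assumes each row gives a home block GEOID, a work block GEOID, and a job count (in LODES OD files these correspond to the `h_geocode`, `w_geocode`, and `S000` columns), and that a tract is identified by the first 11 digits of the 15-digit block GEOID. The function names and the home-to-work edge direction are choices made for this example rather than details taken from the paper, and the GEOIDs in the usage lines are made up.

```python
from collections import defaultdict

import numpy as np


def tract_of(block_geoid: str) -> str:
    # A 15-digit census block GEOID is state (2) + county (3) + tract (6) + block (4),
    # so its first 11 digits identify the census tract.
    return block_geoid[:11]


def build_commute_adjacency(od_rows):
    """Aggregate (home_block, work_block, jobs) rows into a tract-level matrix.

    Returns (tract_ids, A), where A[i, j] counts workers living in tract_ids[i]
    and working in tract_ids[j]; intra-tract commutes land on the diagonal.
    """
    flows = defaultdict(int)
    for home_block, work_block, jobs in od_rows:
        flows[(tract_of(home_block), tract_of(work_block))] += int(jobs)
    tract_ids = sorted({tract for pair in flows for tract in pair})
    index = {tract: k for k, tract in enumerate(tract_ids)}
    A = np.zeros((len(tract_ids), len(tract_ids)))
    for (home, work), count in flows.items():
        A[index[home], index[work]] = count
    return tract_ids, A


# Made-up rows: blocks in two tracts of the same county.
rows = [
    ("360610001001000", "360610002001001", 5),
    ("360610001001002", "360610002001003", 3),
    ("360610002001000", "360610001001000", 2),
    ("360610002001000", "360610002001004", 4),
]
tract_ids, A = build_commute_adjacency(rows)
```

Block-level rows from the same pair of tracts are summed, so the resulting matrix is directly usable as the weighted adjacency of the tract network.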

For comparison, we also use the 311 complaint dataset [27], where available, as features for an area. The data consists of all non-emergency complaints across a city and spans a large number of complaint categories, making it a rich representation of a given area's socioeconomic status. Results from 311 data-based modeling serve as a robust and proven benchmark [8] for comparison with our proposed methods. More details on the 311 data are provided in Appendix A.1: Data.

2.2. Methods

We first investigate mobility network embeddings as predictors of a demographic target variable. The goal is to consider only the network structure within a city and evaluate its utility in modeling a socioeconomic indicator across the city's census tracts. In the following sections, we first propose learning node representations with a VNN architecture trained to reconstruct the network edges. This approach follows a two-step learning process based on two MLP models. In our second approach, we stack GNN and MLP layers to build a model that directly learns the socioeconomic target variable. This unified pipeline (hereafter referred to as the GNN + VNN framework) combines graph convolution operations with fully connected MLP layers to jointly learn all parameters while optimizing node embeddings for socioeconomic modeling.

2.2.1. Network Embedding as Model Inputs

Graph embeddings have been extensively studied and used in graph-based models in the literature. In recent work on graph Transformer (GT) models, network embeddings have been used as positional encodings. Techniques for generating such embeddings include SVD [28], Laplacian Eigenvectors [29,30,31], and shortest path distances [32]. Random walk encodings have also recently been used successfully as structural encodings in GNN-based models [33,34,35]. For mobility networks, network embeddings have proven useful for downstream tasks on heterogeneous networks [36,37].
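The sketch below shows one common way to compute SVD, Laplacian-eigenvector, and random-walk encodings from a commute adjacency matrix with NumPy. It is an illustration under stated assumptions rather than the authors' code: flows are symmetrized as $A + A^T$ for the Laplacian and random-walk variants, the SVD embedding scales the left singular vectors by the square root of the singular values, and the random-walk variant is a return-probability encoding (the diagonal of powers of the transition matrix) of the kind used as a structural encoding in GNN models. The paper does not specify its exact random-walk method in this section.

```python
import numpy as np


def svd_embedding(A, d):
    # Truncated SVD: top-d left singular vectors scaled by sqrt(singular values).
    U, s, _ = np.linalg.svd(A)
    return U[:, :d] * np.sqrt(s[:d])


def laplacian_eigenvector_embedding(A, d):
    # Normalized Laplacian L = I - D^{-1/2} W D^{-1/2} of the symmetrized flows W;
    # keep eigenvectors of the d smallest eigenvalues after the trivial one.
    W = A + A.T
    deg = W.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.where(deg > 0, deg, 1.0)), 0.0)
    L = np.eye(len(W)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    return eigvecs[:, 1 : d + 1]


def random_walk_encoding(A, k):
    # Return probabilities diag(P^1), ..., diag(P^k) with P = D^{-1} W.
    W = A + A.T
    deg = W.sum(axis=1, keepdims=True)
    P = np.divide(W, deg, out=np.zeros_like(W, dtype=float), where=deg > 0)
    features, Pk = [], np.eye(len(W))
    for _ in range(k):
        Pk = Pk @ P
        features.append(np.diag(Pk).copy())
    return np.stack(features, axis=1)


rng = np.random.default_rng(0)
A = rng.random((8, 8))  # stand-in for a weighted commute matrix
svd_E = svd_embedding(A, 3)
le_E = laplacian_eigenvector_embedding(A, 3)
rw_E = random_walk_encoding(A, 4)
```

Each function returns one row per tract, so any of the three matrices can serve as an initial node input of shape $(N, d)$.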

Notably, when clustering node embeddings of mobility networks, we observe that they can distinguish regions by socioeconomic profile. In all cities considered in this work, the embeddings differentiate between high-income and low-income districts. Figure 2 shows the results of a K-means clustering model [38] applied to the SVD embedding; the clusters' median incomes reflect the distinction between regions with different socioeconomic profiles. Particularly striking is the clear separation of areas such as Lower Manhattan, the Bronx, and Inner Brooklyn/Queens in NYC, and of Chicago's South Side from the city's central and northern neighborhoods. Clustering results for additional embedding methods are discussed in Appendix A.2: Network Embedding Clustering.
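To make the clustering step concrete, the following sketch runs K-means (Lloyd's algorithm with a farthest-point initialization, chosen here for determinism rather than taken from the paper) on an embedding matrix and reports each cluster's median income. The toy data stands in for tract embeddings and ACS incomes; in practice one would likely use a library implementation such as scikit-learn's `KMeans`.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means with farthest-point initialization; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Next centroid: the row farthest from every centroid chosen so far.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d2.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def cluster_median_income(labels, income):
    # Median tract income within each cluster.
    return {int(c): float(np.median(income[labels == c])) for c in np.unique(labels)}


# Toy stand-in: two well-separated groups of "tract embeddings".
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
income = np.array([30000.0] * 20 + [100000.0] * 20)
labels, centroids = kmeans(X, k=2)
medians = cluster_median_income(labels, income)
```

Comparing the per-cluster medians is what makes the income contrast between clusters visible, as in the maps described above.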

A recent study has shown that cell phone mobility data can be used to delineate regions in cities [39]. The ability of mobility embeddings to discern regions by socioeconomic profile is particularly interesting in our experiments, as recent studies have also found that mobility is strongly shaped by commuters' income status [40,41]. Embeddings, as low-dimensional representations of the larger mobility network, can thus be particularly useful for modeling socioeconomic indicators of regions, including income. We therefore propose using these embeddings as inputs to our models. More specifically, we experiment with spatial embeddings (region locations), SVD, and Laplacian Eigenvectors as initial inputs.

2.2.2. Evaluating Mobility Networks—VNN Based Embedding

We aim to learn vector representations of nodes (regions) in a city's mobility network, which can then be used as features for socioeconomic modeling. Urban network representations have been established using various methods, including deep learning-based architectures trained in a supervised or unsupervised manner [15,16,19]. We begin by training a non-graph-based neural network as a baseline for comparison with the proposed graph-based models. Specifically, we use a VNN to reconstruct the edges of the mobility network, thereby learning structural representations of nodes. This training is self-supervised: the objective is to reconstruct the original adjacency structure of the network. The resulting internal states of the model serve as learned embeddings of the mobility graph, which can then be used for downstream tasks. Figure 3 shows the model with its inputs and outputs.

Let $G = (V, \mathcal{E})$ be the mobility network, with nodes $V$ representing census tracts and edges $\mathcal{E}$ weighted by the volume of commute flow between census tracts in a given city, and let $N = |V|$. We consider an initial $d$-dimensional vector $e = [e_1, e_2, \dots, e_d]$, populated by the node embedding and random values, denoting the embedding for a given geographical entity in a city. The embedding matrix $E \in \mathbb{R}^{N \times d}$ is then initialized with these vectors, so that each row is the $d$-dimensional embedding of a specific entity (e.g., an area in the city). We then augment this matrix into a pairwise interaction matrix by concatenating each row with every other row and with itself:

E_{augmented} = \mathrm{concatenate}([E_i, E_j]), \quad E_i, E_j \in E

The concatenation is performed for all $i, j \in V$ (including $i = j$), accounting for self-loops in the network. We then compute the element-wise squared difference between the embedding vectors of each pair, which serves as an interaction (dissimilarity) measure between the two entities:

(e'[:d] - e'[d:])^{2}, \quad e' \in E_{augmented}

The resulting matrix $E_{augmented}$ has dimensions $(N^2, d)$, capturing all pairwise combinations of the original embedding vectors.
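The augmentation can be written compactly with NumPy by repeating and tiling the rows of $E$. The sketch below (a hypothetical helper, not the authors' code) builds the $(N^2, 2d)$ concatenation and then the $(N^2, d)$ squared-difference matrix, with row $iN + j$ holding the pair $(i, j)$.

```python
import numpy as np


def pairwise_squared_difference(E):
    """Map an (N, d) embedding matrix to the (N*N, d) augmented input.

    Row i*N + j first holds the concatenation [E_i, E_j] (all ordered pairs,
    including i == j), then the element-wise squared difference (E_i - E_j)^2.
    """
    N, d = E.shape
    concatenated = np.concatenate([np.repeat(E, N, axis=0), np.tile(E, (N, 1))], axis=1)
    return (concatenated[:, :d] - concatenated[:, d:]) ** 2


E = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 7.0]])
X = pairwise_squared_difference(E)  # shape (9, 2)
```

The self-pair rows ($i = j$) are all zeros, so any self-loop weight must be produced by the network's biases rather than by the input.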

With the augmented matrix as input, we consider a three-layer VNN model $f$ with hidden layers of sizes $(4d, 3d, d)$, where $d$ is the embedding dimensionality, and Rectified Linear Unit (ReLU) activations. Specifically, the output of the VNN model is

Y_{ij} = f(E_{augmented}[ij])  (1)

where the output $Y$ is an $N^2 \times 1$ vector whose element $Y_{ij}$ represents the mobility between the $i$th and $j$th geographical entities. Importantly, we specify the node embedding matrix $E$ as a trainable parameter of the model, so the input vectors adapt during training to better represent the characteristics of the data. The output $Y$ can then be reshaped into an $N \times N$ estimate of the adjacency matrix $A$, which fully describes the connectivity between nodes in the network. The model is thus trained to reconstruct the adjacency matrix of the mobility network by minimizing the mean squared error (MSE):

\frac{1}{N^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N} (A_{ij} - \hat{A}_{ij})^{2}  (2)

Here, $A_{ij}$ is the element at row $i$ and column $j$ of the true adjacency matrix $A$, and $\hat{A}_{ij}$ is the corresponding element of the predicted adjacency matrix.

The trained matrix E can thus be regarded as an embedding of the network nodes. Each d-dimensional vector within this embedding corresponds to a specific region (node) within the mobility network, capturing its unique characteristics and interactions.
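A minimal end-to-end sketch of the edge-reconstruction model is given below in plain NumPy, with gradients written out by hand. It assumes full-batch gradient descent (the optimizer and learning rate are not stated in this section), He-style weight initialization, and flows rescaled to [0, 1]; the embedding table $E$ is updated alongside the MLP weights, matching the description of $E$ as a trainable parameter. All function names and hyperparameters are illustrative.

```python
import numpy as np


def init_mlp(sizes, rng):
    # He-style initialization for ReLU layers, e.g. sizes = [d, 4d, 3d, d, 1].
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp_forward(params, x):
    # ReLU on hidden layers, linear output; keeps every layer's output for backprop.
    acts = [x]
    for k, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(np.maximum(z, 0.0) if k < len(params) - 1 else z)
    return acts[-1], acts


def train_edge_reconstruction(A, d=8, epochs=300, lr=1e-2, E_init=None, seed=0):
    """Learn node embeddings E by reconstructing A with the MSE of Equation (2)."""
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    # Trainable embedding table: a precomputed embedding if given, else random values.
    E = rng.normal(0.0, 0.1, (N, d)) if E_init is None else np.array(E_init, dtype=float)
    params = init_mlp([d, 4 * d, 3 * d, d, 1], rng)
    ii, jj = np.divmod(np.arange(N * N), N)  # all ordered pairs (i, j), incl. i == j
    target = A.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        diff = E[ii] - E[jj]
        y_hat, acts = mlp_forward(params, diff ** 2)
        err = y_hat - target
        losses.append(float(np.mean(err ** 2)))
        grad = 2.0 * err / err.size  # dLoss / dy_hat
        grads = []
        for k in range(len(params) - 1, -1, -1):
            W, _ = params[k]
            grads.append((acts[k].T @ grad, grad.sum(axis=0)))
            grad = grad @ W.T  # gradient w.r.t. this layer's input
            if k > 0:
                grad = grad * (acts[k] > 0)  # back through the preceding ReLU
        grads.reverse()
        # Chain rule through x = (E_i - E_j)^2 into the embedding table.
        g_diff = grad * 2.0 * diff
        g_E = np.zeros_like(E)
        np.add.at(g_E, ii, g_diff)
        np.add.at(g_E, jj, -g_diff)
        params = [(W - lr * gW, b - lr * gb) for (W, b), (gW, gb) in zip(params, grads)]
        E = E - lr * g_E
    return E, losses


rng = np.random.default_rng(42)
A = rng.random((6, 6))
A = A / A.max()  # rescaled flows (an assumption made for stable training)
E, losses = train_edge_reconstruction(A, d=4, epochs=200)
```

`np.add.at` is used instead of `g_E[ii] += g_diff` because each node appears in many pairs, and unbuffered accumulation is needed to sum all of their gradient contributions.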

In a second step, we model the socioeconomic variable with a supervised MLP that takes the learned embeddings as inputs. The configuration of this model is discussed in Section 3, while performance with different embedding dimensions is discussed in Appendix A.3: Experiments with Embedding: Dimensionality.

While the VNN-based approach effectively captures structural embeddings of nodes in the mobility network, it requires a two-stage pipeline: first learning embeddings through self-supervised network reconstruction, and then using these embeddings as inputs to a separate supervised model for socioeconomic prediction. This separation introduces potential inefficiencies, such as suboptimal alignment between the embedding objective and the final prediction task. In contrast, a unified model that jointly learns node representations and predicts the target variable in a single training pipeline can offer several advantages: it eliminates the need for embedding-specific supervision, ensures that learned representations are directly optimized for the downstream task, and simplifies the modeling process. In the next section, we propose such an architecture combining the GNN and VNN models in a single training pipeline.

2.2.3. Single-Pipeline Modeling: GNN + VNN Framework

The use of pre-trained network representations as features for downstream tasks has been prevalent in the literature [16,17,24]. However, these methods require learning embeddings with a separate objective (e.g., edge reconstruction), which may be only loosely related to the overall objective of urban socioeconomic modeling. Hence, we propose a model architecture that learns the target variable in a single learning pipeline, without learning the network embedding with a separate model. Figure 4 depicts the complete model architecture with all layers involved.

We adopt a two-layer GNN architecture for modeling. Specifically, the input feature matrix is defined as $X = I_N$, where $I_N$ is the $N \times N$ identity matrix and $N$ is the number of nodes (regions) in the mobility network. This choice gives each node a unique but uninformative one-hot feature vector, placing the burden of learning relevant representations entirely on the network structure.

Two-layer GNNs have been widely recognized as effective for a range of graph-based learning tasks [11,42]. They strike a balance between expressiveness and efficiency, capturing rich local topological information while avoiding excessive computational overhead. Importantly, deeper GNN architectures are often prone to the oversmoothing problem, where node embeddings converge to similar values and become less discriminative [42,43]. By limiting the depth to two layers, we mitigate this issue and maintain meaningful structural differentiation among nodes.

A typical GNN layer involves graph convolutions taking into account the topology and connectivity of nodes in the network. These layers operate by aggregating information from neighboring nodes and updating node representations accordingly. The graph convolution operation for a single layer can be defined as follows:

H^{l} = \sigma(\hat{A} H^{l-1} W^{l} + B^{l}), \quad H^{0} = X  (3)

where $H^{l} \in \mathbb{R}^{N \times d}$ is the output of layer $l$, $W^{l}$ is the weight matrix, $\hat{A} = \tilde{D}^{-1/2} (A + I_N) \tilde{D}^{-1/2}$ is the normalized adjacency matrix with self-loops ($\tilde{D}$ being the diagonal degree matrix of $A + I_N$), $B^{l}$ is the bias at layer $l$, and $\sigma$ is an activation function. We use ReLU as the activation in our configuration. Introducing normalization to account for the scale of sub-regions in the network, the convolution operation at the node level is given by the following:

h_{i}^{l} = \sigma\left(\sum_{j \in N(i)} \frac{1}{|N(i)|} (W^{l})^{T} h_{j}^{l-1}\right)  (4)

where $N(i)$ is the connected neighborhood of node $i$ and $h_{i}^{l}$ is its feature vector in the network embedding $H^{l}$.
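The sketch below implements a standard GCN propagation rule of the form in Equation (3), with ReLU, for two layers on $X = I_N$. It assumes the commute matrix has been symmetrized (e.g., $A + A^T$) so that the symmetric normalization is well defined; the random weights are placeholders for trained parameters, and the helper names are illustrative.

```python
import numpy as np


def normalized_adjacency(A):
    # A_hat = D~^{-1/2} (A + I) D~^{-1/2}, where D~ is the degree matrix of A + I.
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]


def gcn_layer(A_hat, H, W, b):
    # One graph convolution with sigma = ReLU: ReLU(A_hat @ H @ W + b).
    return np.maximum(A_hat @ H @ W + b, 0.0)


rng = np.random.default_rng(0)
N, d = 5, 3
flows = rng.random((N, N))
A_hat = normalized_adjacency(flows + flows.T)  # symmetrized commute flows
X = np.eye(N)  # H^0 = X = I_N: no node features
H1 = gcn_layer(A_hat, X, rng.normal(size=(N, d)), np.zeros(d))
H2 = gcn_layer(A_hat, H1, rng.normal(size=(d, d)), np.zeros(d))
```

With $X = I_N$, the first layer reduces to $\hat{A} W^{1}$, so $W^{1}$ effectively acts as a learnable per-node embedding that is then smoothed over commute neighbors.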

While GCNs apply graph convolutions, the attention mechanism has been highly successful in recent applications. The GAT model assigns different attention weights to neighboring nodes, aggregating information in a more nuanced manner and updating node representations accordingly. The graph attention operation for a single head can be defined as follows:

h_{i}^{l} = \sigma\left(\sum_{j \in N(i)} \alpha_{ij} W^{l} h_{j}^{l-1}\right)  (5)

where $h_{i}^{l} \in \mathbb{R}^{d}$ is the output feature vector of node $i$ at layer $l$, $W^{l}$ is the weight matrix, and $\alpha_{ij}$ are the attention coefficients, computed as follows:

\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^{T} [W^{l} h_{i}^{l-1} \,\|\, W^{l} h_{j}^{l-1}]\right)\right)}{\sum_{k \in N(i)} \exp\left(\mathrm{LeakyReLU}\left(a^{T} [W^{l} h_{i}^{l-1} \,\|\, W^{l} h_{k}^{l-1}]\right)\right)}  (6)

where $a$ is a learnable weight vector and $\|$ denotes concatenation. The LeakyReLU activation function introduces non-linearity into the attention mechanism. To enhance the model's capability, we employ multi-head attention, where $K$ independent attention mechanisms (heads) are applied in parallel and their outputs are concatenated or averaged to form the final output.
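A single attention head following Equations (5) and (6) can be sketched as below. As in the original GAT formulation, each node is assumed to attend to itself as well as to its neighbors; ReLU is used for $\sigma$ to mirror the GCN layers, and LeakyReLU uses a negative slope of 0.2. Both choices are assumptions for this illustration. Multi-head attention is shown by concatenating several independently parameterized heads.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_head(A, H, W, a):
    """One attention head, Equations (5)-(6), with sigma = ReLU.

    A: (N, N) adjacency (nonzero = edge), H: (N, f) inputs, W: (f, d), a: (2d,).
    Returns the (N, d) outputs and the (N, N) attention coefficients alpha.
    """
    N = A.shape[0]
    Z = H @ W  # row j holds W h_j (row-vector convention)
    d = Z.shape[1]
    # a^T [z_i || z_j] splits into a[:d] . z_i + a[d:] . z_j
    scores = leaky_relu((Z @ a[:d])[:, None] + (Z @ a[d:])[None, :])
    neighbors = (A + np.eye(N)) > 0  # N(i) includes i itself here
    scores = np.where(neighbors, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return np.maximum(alpha @ Z, 0.0), alpha


rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
H = np.eye(4)
out, alpha = gat_head(A, H, rng.normal(size=(4, 3)), rng.normal(size=6))
# Multi-head attention (K = 2): concatenate the outputs of independent heads.
multi = np.concatenate(
    [gat_head(A, H, rng.normal(size=(4, 3)), rng.normal(size=6))[0] for _ in range(2)],
    axis=1,
)
```

Masking non-neighbors with $-\infty$ before the softmax restricts the normalization in Equation (6) to $k \in N(i)$ without building per-node lists.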

The GCN/GAT layers are followed by a VNN that maps the model's output to the socioeconomic target variable of each region:

\hat{Y}_{i} = f(h_{i}^{(2)})

where $h_{i}^{(2)}$ is the vector embedding of node $i$ produced by the second GCN/GAT layer and $f$ is the mapping function (VNN). Accordingly, the MSE objective in Equation (2) becomes

\frac{1}{N} \sum_{i=1}^{N} (Y_{i} - \hat{Y}_{i})^{2}

where $Y_{i}$ and $\hat{Y}_{i}$ are the ground-truth and predicted values of the target variable. The parameters $\theta_{GNN}$ and $\theta_{VNN}$ of the GNN and VNN layers are learned jointly in a single backpropagation pipeline. Notably, the embedding $H^{(2)}$ learned by the two-layer GCN/GAT is thus optimized specifically for socioeconomic modeling, ensuring that it captures the network features most conducive to accurately modeling a city's socioeconomic indicator.
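To illustrate the single-pipeline idea, the sketch below trains a two-layer GCN on $X = I_N$ together with its read-out in one backpropagation loop in NumPy, using a mean squared error computed on training nodes only. For brevity, the VNN head is reduced to a single linear layer and plain gradient descent is used; both are simplifications relative to the framework described above, and all names, data, and hyperparameters are hypothetical.

```python
import numpy as np


def normalized_adjacency(A):
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]


def train_gcn_regressor(A, y, train_mask, d=16, epochs=300, lr=0.05, seed=0):
    """Two GCN layers on X = I_N plus a linear read-out, trained jointly.

    A must be symmetric; the MSE is computed on nodes where train_mask is True.
    Returns predictions for all nodes (from the last forward pass) and the loss curve.
    """
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    A_hat = normalized_adjacency(A)
    W1 = rng.normal(0.0, np.sqrt(2.0 / N), (N, d))  # A_hat @ I_N @ W1 == A_hat @ W1
    W2 = rng.normal(0.0, np.sqrt(2.0 / d), (d, d))
    w3 = rng.normal(0.0, np.sqrt(1.0 / d), (d, 1))
    b3 = np.zeros(1)
    t = y.reshape(-1, 1)
    m = train_mask.reshape(-1, 1).astype(float)
    n_train = m.sum()
    losses = []
    for _ in range(epochs):
        # Forward pass: GCN layer 1, GCN layer 2, linear read-out.
        Z1 = A_hat @ W1
        H1 = np.maximum(Z1, 0.0)
        P1 = A_hat @ H1
        Z2 = P1 @ W2
        H2 = np.maximum(Z2, 0.0)
        y_hat = H2 @ w3 + b3
        err = (y_hat - t) * m  # test nodes contribute nothing to the loss
        losses.append(float((err ** 2).sum() / n_train))
        # Backward pass: one chain of gradients through read-out and both GCN layers.
        g_y = 2.0 * err / n_train
        g_w3, g_b3 = H2.T @ g_y, g_y.sum(axis=0)
        g_Z2 = (g_y @ w3.T) * (Z2 > 0)
        g_W2 = P1.T @ g_Z2
        g_Z1 = (A_hat.T @ (g_Z2 @ W2.T)) * (Z1 > 0)
        g_W1 = A_hat.T @ g_Z1
        W1 -= lr * g_W1
        W2 -= lr * g_W2
        w3 -= lr * g_w3
        b3 -= lr * g_b3
    return y_hat.ravel(), losses


rng = np.random.default_rng(7)
N = 30
flows = rng.random((N, N))
A = (flows + flows.T) / 2  # symmetrized stand-in for commute flows
y = 1.0 + 0.5 * rng.normal(size=N)  # stand-in target values
train_mask = np.zeros(N, dtype=bool)
train_mask[rng.permutation(N)[:21]] = True  # 70:30 split
y_hat, losses = train_gcn_regressor(A, y, train_mask)
```

Because the graph weights and the read-out share one loss, the gradient reaching `W1` is driven by the target variable itself rather than by an edge-reconstruction objective, which is the distinction drawn above between the two-stage and single-pipeline designs.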

2.3. Experiments

The broader aim of our experiments is to investigate the utility of mobility networks for socioeconomic modeling in a city. Graph-based modeling with mobility networks has demonstrated effectiveness on smaller urban networks that treat large urban regions (ZIP codes) as nodes [24]. In this study, we extend our focus to larger urban networks, with smaller geographical units (census tracts) as nodes. This expansion in scale allows us to evaluate the stability and consistency of mobility networks as predictors in more extensive urban areas with complex mobility dynamics. Recent works have considered regional variables such as crime statistics, personal income, and bike flow as socioeconomic indicators [17]. We focus on median household income as the target variable in this work. Income is not only a direct measure of a neighborhood's economic health but also an indicator of development and investment in urban areas [44]. Moreover, income data is consistently available and easily accessible for urban areas, typically through the census.

As a comparison benchmark, the 311 service request dataset has been shown to be a strong proxy for various socioeconomic characteristics of urban areas. Studies have demonstrated its utility in modeling diverse variables such as median income, housing prices, and neighborhood distress [8,45]. By aggregating the types, frequencies, and temporal patterns of resident complaints, researchers have been able to construct high-resolution spatial models of urban socioeconomic conditions, which makes the dataset a suitable benchmark for our proposed approaches. In Appendix A.4, we show in detail the performance of 311 data against node-level attributes in all cities where the data is available; the results confirm the utility of the 311 dataset as a benchmark for our proposed methods. It is important to note, however, that 311 data is subject to potential reporting bias. For instance, disparities in reporting between low- and high-income groups have been found in NYC's 311 dataset [46]. Such biases can skew socioeconomic models and, if not properly addressed, lead to misinformed policy decisions. In contrast, network-based approaches such as those leveraging commute flow graphs may offer a more equitable alternative, as they rely on structural, behavior-independent data rather than voluntary reporting.

With VNN-based embeddings, a separate supervised learning model is trained to predict the target variable from the embeddings. We use a VNN with hidden layers of sizes (32, 64, 32), a mean squared error (MSE) objective, and Rectified Linear Unit (ReLU) activations, with parameters optimized through backpropagation. Having established the utility of mobility embeddings across all cities, we next train the GNN + VNN single-pipeline architecture, further establishing the importance of network topology without any regional node features.

3. Results

We present the R-squared (R²) scores for each configuration of median income modeling in Table 2. The R² score is calculated as

R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}}

and all reported values are out-of-sample, with the data split 70:30 into training and test sets. Also shown are the R² scores obtained with 311 complaint features used as inputs to an MLP model for income modeling. We consider the MLP with the 311 feature set as the main comparison benchmark, while results with other models and features are discussed in Appendix A.4. For our proposed VNN-based embedding model, we present results with four different network embeddings as inputs: Spatial (location coordinates of nodes), SVD, Laplacian Eigenvectors (LE), and Random Walk. The mobility network embeddings consistently outperform the benchmark 311 features in all 9 of the 12 cities where 311 data is available; the remaining 3 cities (Houston, San Antonio, and Phoenix) do not have 311 data. In all cities, the proposed methods also yield better modeling results than local socioeconomic variables such as job and population density.
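For reference, the $R^2$ definition above and a random 70:30 split can be computed as follows; the helper names and the seeded NumPy permutation are illustrative assumptions rather than the authors' evaluation code.

```python
import numpy as np


def r2_score(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot, following the definition above.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def train_test_mask(n, train_frac=0.7, seed=0):
    # Boolean mask marking a random 70% of the n tracts for training.
    rng = np.random.default_rng(seed)
    mask = np.zeros(n, dtype=bool)
    mask[rng.permutation(n)[: int(round(train_frac * n))]] = True
    return mask


mask = train_test_mask(100)
score = r2_score([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
```

Out-of-sample scores correspond to evaluating `r2_score` only on the tracts where the mask is `False`; a prediction equal to the test-set mean scores exactly 0, and negative values indicate a model worse than that baseline.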

While spatial embeddings yield the strongest performance, other types of network-derived embeddings also exhibit substantial explanatory power in socioeconomic modeling. Importantly, our models do not incorporate any region-specific features; the embeddings are learned purely from the structure of the mobility network. This design choice is central to our objective: to evaluate whether mobility patterns alone, without any contextual or demographic information, are sufficient for downstream socioeconomic prediction tasks. By excluding explicit node features, we can directly assess the predictive value of the network structure itself. Performance improves further with the GNN + VNN framework, which directly models the target socioeconomic variable. This improvement is plausible, as graph convolution and attention layers can capture network topology in greater detail. However, adding attention with GAT does not improve the results further; in many cases, it performs similarly to using convolution layers alone. Cross-validation indicates that the improvement of mobility networks over contextual node features as input is consistent and stable.

4. Discussion

Our experiments across 12 cities show (1) the effectiveness of mobility networks in modeling median income and, consequently, (2) the ability of a GNN + VNN-based architecture to model a socioeconomic indicator in a single pipeline. While traditional network embedding methods fail to obtain meaningful representations for socioeconomic modeling, Vanilla Neural Network-based embeddings serve as good predictors in all cities. VNN-based embeddings also capture network effects to some extent, because their training objective is to model the commute network weights in a city.

4.1. Interpretation of Embedding Configurations

In many of the cities studied, spatial embeddings (used as initial configurations) show considerably better predictive performance than the other approaches. Even when contrasted with local variables such as job and population density, network embeddings demonstrate a superior ability to model income structures across cities. The embedding spaces provided as input to our models capture distinct aspects of the network: SVD and Laplacian Eigenvectors (LE) emphasize structural and connectivity patterns, while random walk embeddings focus on connectivity and proximity. Although spatial embeddings do not directly represent network structure, tuning them within our models incorporates certain network characteristics during training. The superior performance of spatial embeddings in many cities suggests that inherent spatial patterns of residence are critical in predicting median income. However, this does not discount the role of network interactions and structure, as the other embeddings still outperform purely local variables. SVD embeddings likely excel because they capture core connectivity and community structure, which are important for modeling economic indicators. In contrast, the relatively lower performance of random walk and LE embeddings could be due to their focus on specific aspects of network connectivity that may not align directly with socioeconomic factors. Random walk embeddings prioritize proximity and frequently visited paths, which could lead to an overemphasis on highly localized areas of the network, potentially missing broader spatial patterns relevant to income distribution. LE embeddings, while capturing global connectivity patterns, may introduce noise by highlighting structural nuances that are less pertinent to income prediction.
Further analysis of the relative importance of network structure versus residential location could be carried out by modeling additional socioeconomic variables, such as the unemployment rate and housing availability.
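For readers who want to reproduce SVD and Laplacian Eigenvector inputs, here is a minimal sketch of how such embeddings are commonly computed from a commute matrix W. The specific choices are assumptions, not the paper's exact settings: SVD factors are scaled by the square roots of the singular values, giving separate outgoing and incoming vectors per tract, and the LE variant uses the unnormalized Laplacian L = D - A of the symmetrized network, dropping the trivial constant eigenvector.

```python
import numpy as np

def svd_embedding(W, d):
    """Rank-d SVD embedding of a (directed) flow matrix W = U S V^T.

    Returns outgoing (U_d sqrt(S_d)) and incoming (V_d sqrt(S_d)) vectors,
    whose inner products give the best rank-d approximation of W.
    """
    U, S, Vt = np.linalg.svd(W)
    scale = np.sqrt(S[:d])
    return U[:, :d] * scale, Vt[:d].T * scale

def laplacian_eigenmap(W, d):
    """d-dimensional Laplacian eigenvector embedding of the symmetrized network."""
    A = 0.5 * (W + W.T)
    L = np.diag(A.sum(axis=1)) - A      # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)         # eigenvalues come back in ascending order
    return vecs[:, 1:d + 1]             # skip the first (constant, for a connected graph) eigenvector

# Hypothetical network: two tightly linked groups of tracts joined by one weak commute link.
W = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                W[i, j] = 10.0
W[2, 3] = W[3, 2] = 1.0

out_vecs, in_vecs = svd_embedding(W, 2)
fiedler = laplacian_eigenmap(W, 1)[:, 0]
print(np.sign(fiedler))   # the two groups receive opposite signs
```

The example shows the "global connectivity" character of LE vectors: the first non-trivial eigenvector splits the network at its weakest link, regardless of where the tracts lie in space.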

4.2. Graph vs. Non-Graph Models

Generally, combining GNN or GAT layers with VNN models outperforms standalone VNN models, suggesting that graph-based models capture spatial dependencies effectively. Furthermore, the GAT + VNN combination with random walk features consistently yields competitive performance, indicating that the attention mechanism in GATs effectively incorporates neighborhood information. However, in some cases, such as Chicago and San Jose, VNN-only models outperform the graph-based combinations, suggesting that for certain cities neighborhood relational data may be less informative for income modeling. This could be due to the particular urban layouts or spatial distributions of these cities, and it indicates that the usefulness of incorporating graph structure depends on the spatial characteristics of the city. Notably, in certain cities, such as Los Angeles and Phoenix, even GNN-based models do not achieve strong predictive performance. This suggests that mobility patterns in these cities may differ substantially from those in other urban areas and may not correlate strongly with socioeconomic indicators. Distinct urban forms and spatial layouts likely contribute to this divergence. For example, the extensive urban sprawl of Los Angeles has been widely studied and contrasted with more compact city structures [47]. Similarly, the polycentric structure of Phoenix, characterized by multiple, loosely connected urban centers, has been shown to influence its distinct mobility behavior [48]. Such structural differences may weaken the predictive power of models that rely solely on commute-based network information.
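The attention mechanism discussed above can be illustrated with a single-head sketch of the GAT scoring rule of Velickovic et al. [13]: e_ij = LeakyReLU(aᵀ[W h_i ‖ W h_j]), normalized with a softmax over each node's neighbors. The neighbor mask, dimensions, and random parameters below are illustrative assumptions; a full GAT adds multiple heads, nonlinearities, and trained parameters.

```python
import numpy as np

def gat_attention(H, neighbor_mask, W, a, negative_slope=0.2):
    """Single-head GAT attention.

    H: (n, f) node inputs; neighbor_mask: (n, n) boolean, True where j is a
    neighbor of i (self-loops included); W: (f, f_out) projection;
    a: (2 * f_out,) attention vector. Returns attention weights and new features.
    """
    Z = H @ W
    f_out = Z.shape[1]
    # a^T [z_i || z_j] splits into a term for node i plus a term for node j.
    scores = (Z @ a[:f_out])[:, None] + (Z @ a[f_out:])[None, :]
    scores = np.where(scores > 0, scores, negative_slope * scores)   # LeakyReLU
    scores = np.where(neighbor_mask, scores, -np.inf)                # non-neighbors get zero weight
    scores -= scores.max(axis=1, keepdims=True)                      # softmax stabilization
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha, alpha @ Z

# Hypothetical path network 0-1-2-3 with self-loops, e.g. tracts linked by commute flows.
mask = np.eye(4, dtype=bool)
for i, j in [(0, 1), (1, 2), (2, 3)]:
    mask[i, j] = mask[j, i] = True
rng = np.random.default_rng(0)
alpha, H_new = gat_attention(rng.normal(size=(4, 3)), mask,
                             rng.normal(size=(3, 2)), rng.normal(size=4))
```

Unlike the fixed degree-based weights of a GCN, each row of `alpha` is learned from node content, which is the extra flexibility attention adds; when that flexibility is not needed, GAT and GCN can perform similarly.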

Notably, mobility networks as input outperform both the 311 data features and demographic variables such as population and job density, which are comprehensive regional-level variables. When the optimal network embedding is combined with the 311 features as modeling input, notable improvements are observed in New York City and Chicago. However, combining population and job density with the network embedding does not yield similar improvements (refer to Appendix A.5: Concatenation and Modeling with Neighborhood-Level Features). The strong results obtained from network embeddings alone indicate that network-based representations capture much of the information carried by node-level features. We have focused only on the mobility network; it would be interesting to see how embeddings of other urban networks compare to node features. The GNN + VNN model learns all parameters in a single gradient descent pipeline, so the intermediate network representations (the outputs of the GNN layers) are learned with the objective of modeling the socioeconomic target variable. This contrasts with traditional representation learning, where network embedding vectors are typically learned with an objective based on network reconstruction.
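The single-pipeline idea (GNN representations optimized directly for the target rather than for network reconstruction) can be sketched with one GCN layer and a linear head trained jointly by full-batch gradient descent on mean squared error, with gradients written out by hand. This is a simplified, hypothetical stand-in for the GNN + VNN model: the depth, hidden size, learning rate, symmetrization, and synthetic data are assumptions, and in practice an autodiff framework would compute the gradients.

```python
import numpy as np

def normalized_adjacency(W):
    """D^-1/2 (A + I) D^-1/2 for the symmetrized flow matrix (assumption of this sketch)."""
    A = 0.5 * (W + W.T) + np.eye(W.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def train_gcn_regressor(W, X, y, hidden=16, lr=0.05, epochs=300, seed=0):
    """Jointly fit a GCN layer (W1) and a linear head (w2, b) to y by minimizing MSE."""
    rng = np.random.default_rng(seed)
    AX = normalized_adjacency(W) @ X          # neighborhood aggregation of the inputs
    W1 = rng.normal(scale=0.1, size=(X.shape[1], hidden))
    w2 = rng.normal(scale=0.1, size=hidden)
    b = 0.0
    losses = []
    for _ in range(epochs):
        Z = AX @ W1                           # pre-activations
        H = np.maximum(Z, 0.0)                # intermediate node representations
        resid = H @ w2 + b - y
        losses.append(float(np.mean(resid ** 2)))
        g = 2.0 * resid / len(y)              # d(MSE)/d(prediction)
        # All gradients use the current parameters before any update.
        grad_W1 = AX.T @ (np.outer(g, w2) * (Z > 0))   # backprop through ReLU and the GCN layer
        grad_w2 = H.T @ g
        grad_b = g.sum()
        W1 -= lr * grad_W1
        w2 -= lr * grad_w2
        b -= lr * grad_b
    return (W1, w2, b), losses

# Hypothetical data: 30 tracts, sparse random commute flows, one-hot node IDs as the only input.
rng = np.random.default_rng(1)
n = 30
W = rng.random((n, n)) * (rng.random((n, n)) < 0.2)
y = 5.0 + rng.normal(size=n)                  # synthetic target, e.g. scaled median income
params, losses = train_gcn_regressor(W, np.eye(n), y)
print(f"MSE: {losses[0]:.2f} -> {losses[-1]:.2f}")
```

Because `grad_W1` flows back from the prediction error, the hidden representation `H` is shaped by the target variable itself; in a two-step pipeline, the embedding would instead be frozen after a reconstruction objective.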

4.3. Policy Implications and Limitations

The ability of mobility networks to effectively model socioeconomic indicators has significant policy implications. First, these models offer a scalable and data-efficient way to assess urban inequality by identifying structurally marginalized or disconnected regions based solely on commuting patterns. This can inform targeted interventions in under-served areas, even when fine-grained demographic or economic data is unavailable. Second, the framework enables planners to simulate and evaluate the potential socioeconomic impact of proposed infrastructure projects by modifying the network structure and observing the predictions of the downstream model. For instance, the addition of a transit link or road segment can be tested for its potential to improve modeled income predictions in peripheral neighborhoods. Finally, the end-to-end trainability of the model allows rapid retraining with updated mobility data, making it suitable for integration into real-time monitoring and response systems, which are especially useful in dynamic contexts such as post-disaster recovery, pandemic-related disruptions, or changes in commuter behavior. By learning directly from urban structure, the proposed GNN + VNN pipeline provides an interpretable and transferable tool for data-driven decision-making in urban policy.

While the results are promising, several limitations remain. First, the generalizability of our approach to cities outside the U.S. has yet to be validated, especially in regions with different urban forms, commuting patterns, or socioeconomic structures. Second, although census data is collected worldwide, the model's performance may be inherently sensitive to the quality and granularity of the mobility data; investigating the robustness of the model to inconsistent and inaccurate mobility data is therefore a possible follow-up research direction. Finally, although our deep learning framework offers strong predictive performance, its interpretability remains limited for non-technical stakeholders, so integrating an interpretability mechanism into the model architecture is another direction for future work.

5. Conclusions

Urban networks have demonstrated strong modeling capabilities across a wide range of downstream tasks, such as traffic forecasting, land-use inference, and road-safety sensing [3,15,22]. In this work, we show that large-scale urban mobility networks—constructed from census-based commute flows—hold significant predictive power for socioeconomic modeling. Crucially, our models rely solely on the structure of the mobility network, without incorporating any node-level features, allowing us to isolate and evaluate the value of network topology alone. We first demonstrate that node embeddings learned through a VNN trained to reconstruct the network can serve as powerful feature representations for urban areas. These embeddings outperform models trained on extensive handcrafted node features, achieving consistent improvements across all evaluated cities. Notably, our findings also echo broader trends in the literature, where deep learning-based embeddings have been shown to overcome the limitations of traditional network embedding methods [49,50]. Building on this, we propose a dedicated Graph Neural Network (GNN)-based model for socioeconomic prediction that jointly learns network structure and downstream modeling in a single, end-to-end pipeline. This unified approach eliminates the need for a two-step training process and directly optimizes all model parameters for the final prediction task. As a result, it not only improves efficiency but also delivers superior performance compared to both node feature-based models and separate embedding-based pipelines.

The results underscore a key implication: the effectiveness of graph-based modeling is closely tied to the spatial and structural characteristics of the urban environment. In cities with more compact or centralized commuting structures, such as New York, Chicago, or San Jose, the network topology captures substantial socioeconomic signal, leading to consistently high predictive performance. However, in cities like Los Angeles and Phoenix, even GNN-based models show comparatively lower accuracy. This divergence likely stems from their distinct urban forms: Los Angeles is characterized by extensive urban sprawl, resulting in diffuse commuting patterns that may not align neatly with socioeconomic gradients, while Phoenix has a polycentric structure of loosely connected urban centers [48]. In such settings, mobility connectivity alone may not correlate strongly with socioeconomic indicators, and additional contextual or spatial features may be needed to improve the model's effectiveness.

While the proposed methods show significant benefits of network-based modeling over contextual node features, other graph-based deep learning methods could potentially be useful even for homogeneous urban networks. Graph transformer architectures have achieved significant improvements in network tasks such as modeling molecular structures [29]. Additionally, many more urban networks, such as street networks, POIs, and building footprints, could be evaluated, individually or in combination, for obtaining node representations [2,51,52]. Moreover, other socioeconomic indicators, such as the housing profile of residents and the unemployment rate, could also be modeled as functions of heterogeneous networks. Such models could help build a more complete socioeconomic picture of a neighborhood.

Urban scientists have long explored the prediction and analysis of socioeconomic status in cities using a wide range of social, economic, and spatial features [6,7]. While these approaches have yielded valuable insights, they often overlook the structural interactions between regions—interactions that are naturally captured by urban mobility networks. Recent work has begun to explore the role of network-based representations in urban modeling, suggesting their potential for capturing latent spatial dynamics [24]. In our experiments in twelve major U.S. metro areas, we show that deep learning architectures have powerful modeling capabilities with mobility networks. Our work offers urban researchers a scalable and generalizable framework for incorporating inter-neighborhood interactions into their analyses. Beyond static modeling, the proposed methods can be adapted to dynamic or real-time mobility networks, offering tools for responsive urban planning and policy decision-making. Future researchers and practitioners should recognize the potential of mobility-based network signals as standalone predictors, and consider integrating such structural representations into broader frameworks for socioeconomic forecasting, urban resilience planning, and real-time intervention strategies.

Author Contributions

Conceptualization, D.K. and S.S.; methodology, D.K. and S.S.; formal analysis, D.K.; investigation, D.K. and A.B.; data curation, D.K.; writing—original draft preparation, D.K.; writing—review and editing, A.B. and S.S.; visualization, D.K.; supervision, S.S.; project administration, S.S. and A.B.; funding acquisition, S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the MUNI Award in Science and Humanities (MASH Belarus) of the Grant Agency of Masaryk University under the Digital City project (MUNI/J/0008/2021). This work was also partially supported by the NYUAD Center for Interacting Urban Networks (CITIES), funded by Tamkeen under the NYUAD Research Institute Award CG001.

Data Availability Statement

The data used in this study can be found in a Zenodo repository: https://doi.org/10.5281/zenodo.11494208. Specifically, the data contains daily origin–destination commute flow information among the census tracts in all 12 U.S. cities considered in this work.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

GNN	Graph Neural Network
LEHD	Longitudinal Employer-Household Dynamics
SVD	Singular Value Decomposition
GAT	Graph Attention Network
VNN	Vanilla Neural Network
MLP	Multilayer Perceptron
NYC	New York City

Appendix A

Appendix A.1. Data

The mobility information for Chicago and Boston was retrieved from the LEHD [25]. The 311 complaint datasets for the two cities are available from the official city data portals [53,54]. The 311 data comprises records of non-emergency complaints submitted by residents to various city agencies. Figure A1 shows the distribution of complaints across the top categories in each city.
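As an illustration of how origin–destination commute records can be turned into the weighted network used throughout this work, the sketch below aggregates (origin tract, destination tract, commuters) triples into a directed adjacency matrix. The record layout and IDs are hypothetical. If one starts instead from raw LEHD LODES origin–destination files, which are published at the census-block level with home (h_geocode) and workplace (w_geocode) block codes and a total job count (S000), tract IDs can be obtained from the first 11 digits of the 15-digit block GEOID; the helper below assumes that convention.

```python
import numpy as np

def block_to_tract(block_geoid):
    """Tract GEOID = first 11 digits (state 2 + county 3 + tract 6) of a 15-digit block GEOID."""
    return block_geoid[:11]

def od_to_matrix(flows):
    """Aggregate (origin, destination, count) triples into a directed flow matrix.

    Returns W (W[i, j] = commuters from tract i to tract j) and the
    tract-ID -> row/column index mapping. Repeated pairs are summed.
    """
    tracts = sorted({o for o, _, _ in flows} | {d for _, d, _ in flows})
    index = {t: i for i, t in enumerate(tracts)}
    W = np.zeros((len(tracts), len(tracts)))
    for origin, dest, count in flows:
        W[index[origin], index[dest]] += count
    return W, index

# Hypothetical block-level records: (home block, work block, commuters).
records = [
    ("360610001001000", "360610002002000", 12),
    ("360610001001001", "360610002002005", 3),   # same tract pair as above after aggregation
    ("360610002002000", "360470003001000", 7),
]
flows = [(block_to_tract(h), block_to_tract(w), c) for h, w, c in records]
W, index = od_to_matrix(flows)
```

Keeping `W` directed (home to work) preserves the asymmetry of commuting; models that require an undirected graph can symmetrize it later.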

For modeling, the 311 data is aggregated at the census tract level and normalized across complaint categories, so that each feature represents the proportion of a tract's complaints that falls into that category. This approach was proposed in [8] to model income, and we apply the same pre-processing to the 311 data used as inputs in our models.
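The normalization described above amounts to converting each tract's complaint counts into category shares. A minimal sketch, assuming the raw data has already been reduced to hypothetical (tract, category) pairs; tract identifiers and category names are illustrative only.

```python
from collections import Counter, defaultdict

def complaint_shares(records):
    """Turn (tract, category) complaint records into per-tract category proportions.

    Returns the sorted category list and a dict mapping each tract to a list of
    shares (one per category) that sums to 1.
    """
    counts = defaultdict(Counter)
    for tract, category in records:
        counts[tract][category] += 1
    categories = sorted({category for _, category in records})
    shares = {}
    for tract, ctr in counts.items():
        total = sum(ctr.values())
        shares[tract] = [ctr[c] / total for c in categories]
    return categories, shares

records = [("T1", "Noise"), ("T1", "Noise"), ("T1", "Heat/Hot Water"), ("T2", "Heat/Hot Water")]
categories, shares = complaint_shares(records)
print(categories)   # ['Heat/Hot Water', 'Noise']
print(shares["T1"])
```

Using proportions rather than raw counts removes the effect of tract size and overall complaint volume, so the features describe the composition of complaints in a location.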

Figure A1. Top 10 complaint categories in the 311 data, as a percentage of total complaints in each city.

We also consider two more demographic variables—population density and job density—as modeling inputs for comparison purposes. These features are available from the U.S. census [26] and aggregated to the census tract level for our study. Figure A2 shows population density across the three cities.

Figure A2. Population density (per 100 m²) across census tracts in each city.

Appendix A.2. Network Embedding Clustering

To identify spatial patterns (if any) related to socioeconomic status, we clustered mobility embeddings derived from the Random Walk and Laplacian Eigenvector (LE) methods. We use PageRank [55] as the Random Walk embedding, owing to its success in urban network studies [56,57]. Results of K-means clustering are shown in Figure A3 and Figure A4 for the PageRank and LE embeddings, respectively.
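A minimal sketch of the two steps named above: weighted PageRank computed by power iteration on the commute network (damping factor 0.85, the conventional default, assumed here) and Lloyd's K-means algorithm. How PageRank is turned into a multi-dimensional embedding is not specified in this section, so the example simply clusters the one-dimensional score vector; any n × d embedding matrix (e.g., LE vectors) can be passed to `kmeans` instead. Libraries such as scikit-learn provide production implementations of K-means.

```python
import numpy as np

def weighted_pagerank(W, damping=0.85, tol=1e-12, max_iter=1000):
    """PageRank on a weighted directed graph; W[i, j] = flow from i to j."""
    n = W.shape[0]
    out = W.sum(axis=1)
    safe_out = np.where(out > 0, out, 1.0)
    # Row-stochastic transitions; nodes with no outflow jump uniformly.
    P = np.where(out[:, None] > 0, W / safe_out[:, None], 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = damping * (r @ P) + (1.0 - damping) / n
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means on the rows of X; returns labels and centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical network: tracts 0-2 send most commuters to hub tract 3.
W = np.array([[0, 1, 0, 9, 0],
              [1, 0, 0, 9, 0],
              [0, 0, 0, 9, 1],
              [2, 2, 2, 0, 2],
              [0, 0, 1, 1, 0]], dtype=float)
scores = weighted_pagerank(W)
labels, _ = kmeans(scores.reshape(-1, 1), k=2)
print(labels)
```

Weighting transitions by commuter counts lets major employment hubs accumulate high scores, which is one reason PageRank-style measures can separate neighborhoods by their role in the commute network.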

As with SVD, the PageRank embedding for census tracts distinguishes regions by their income profile. In all three cities considered, we see distinct neighborhoods identified by differences in median income. However, clustering results from the LE embedding are less discernible. While spatial patterns are still visible in the clusters, differences in income profile among most clusters are small in all cities. Nevertheless, the LE embedding can still distinguish some high-income neighborhoods from others; for instance, it separates the Lower Manhattan area of NYC and central Chicago into distinct clusters.

GNN-based embeddings, on the other hand, result in significantly more meaningful community partitions. These clusters exhibit strong spatial cohesiveness while also aligning more clearly with gradients of socioeconomic status. As shown in Figure A5, the income density maps overlaid with GNN-derived community boundaries reveal that these embeddings are capable of delineating urban subregions with high internal socioeconomic similarity and sharp external contrasts. This demonstrates an important strength of GNN-based representations: they not only encode the structural and relational properties of the mobility network but also learn to abstract relevant patterns that correlate with real-world socioeconomic variables.

Figure A3. Clustering of random walk (PageRank) embeddings of mobility networks. As with SVD, we obtain neighborhood distinctions based on income profiles.

Figure A4. Clustering of Laplacian embeddings of mobility networks. While the results are not as spatially cohesive as those of the SVD and PageRank methods, high-income neighborhoods can still be distinguished from the rest.

Figure A5. High- and low-income-density areas, captured by communities resulting from GNN-based embeddings.

These findings underscore the potential of GNN embeddings for data-driven urban delineation, where neighborhoods can be defined not just by arbitrary administrative boundaries but by their embedded position in mobility-driven social and economic space.

Appendix A.3. Experiments with Embedding: Dimensionality

We experimented with different node embedding dimensions for modeling median income at the node (census tract) level. Table A1 shows the R² scores across cities for each VNN-based embedding configuration. Note that the embeddings are used as inputs to a separate VNN model that predicts the target socioeconomic variable.

Table A1. Embedding dimension and stability: R² scores (mean ± standard deviation) from modeling median income with a d-dimensional VNN-based embedding as input.

VNN embedding	NYC	Boston	Chicago	San Jose	San Diego	Austin	Dallas	LA	San Antonio	Phoenix
d = 2	0.47 ± 0.04	0.03 ± 0.02	0.3 ± 0.05	0.47 ± 0.06	0.36 ± 0.03	0.1 ± 0.04	0.09 ± 0.01	0.13 ± 0.02	0.23 ± 0.04	0.16 ± 0.05
d = 5	0.40 ± 0.01	0.32 ± 0.01	0.68 ± 0.015	0.63 ± 0.03	0.43 ± 0.01	0.59 ± 0.03	0.58 ± 0.025	0.28 ± 0.01	0.49 ± 0.03	0.44 ± 0.01
d = 10	0.53 ± 0.02	0.18 ± 0.01	0.68 ± 0.01	0.75 ± 0.02	0.41 ± 0.02	0.54 ± 0.03	0.54 ± 0.02	0.28 ± 0.015	0.49 ± 0.025	0.45 ± 0.01
d = 15	0.53 ± 0.01	0.26 ± 0.005	0.63 ± 0.02	0.70 ± 0.03	0.41 ± 0.01	0.54 ± 0.03	0.58 ± 0.005	0.26 ± 0.01	0.52 ± 0.01	0.42 ± 0.005

Notably, d = 5 is optimal for many of the cities, whereas for NYC, San Jose, and Phoenix the optimal embedding dimension is 10. In contrast, increasing the dimensionality to d = 15 yields only marginal improvements, or even slightly reduced performance, in several cities. This may be attributed to overfitting or the “curse of dimensionality,” where additional embedding dimensions contribute no useful information and instead introduce noise. Moreover, in medium-sized cities such as Boston and San Antonio, the improvements from d = 10 to d = 15 are minimal, possibly indicating that the underlying mobility structures are relatively simple or less fragmented.

Another potential explanation lies in the capacity of the downstream MLP model: beyond a certain embedding size, the model may be unable to effectively utilize the additional latent information without increased depth or regularization. Finally, we note that embedding performance is relatively stable across runs, with low standard deviation in most cases, indicating the robustness of the learned structural features to random initialization and training variation.

Appendix A.4. Comparison with Classical ML Models

In Table A2, we present R² scores for baseline models using node-level features. We evaluate three widely used classical machine learning approaches: Linear Regression with Lasso regularization (LR) [58], Random Forests (RF) [59], and Gradient Boosting Trees (GB) [60]. The models are trained on three different sets of node-level features: normalized 311 service request data, population density, and job density.

Our results show that, when using 311 features, these classical methods generally underperform the MLP model reported in Table 2. In certain cities, RF and GB models achieve performance comparable to the MLP; overall, however, the MLP provides the most robust predictive performance with 311 features. These results establish 311 features as a strong baseline for evaluating the performance of our proposed models.

Table A2. R² scores of baseline models with node attribute features.

City	LR: 311	LR: Pop. density	LR: Job density	RF: 311	RF: Pop. density	RF: Job density	GB: 311	GB: Pop. density	GB: Job density
NYC	0.49 ± 0.01	0.01 ± 0.001	0.02 ± 0.001	0.51 ± 0.005	0.02 ± 0.001	0.05 ± 0.005	0.41 ± 0.03	0.05 ± 0.003	0.07 ± 0.01
LA	0.12 ± 0.01	0.03 ± 0.001	0.01 ± 0.001	0.16 ± 0.001	0.03 ± 0.001	0.02 ± 0.001	0.16 ± 0.001	0.03 ± 0.01	0.01 ± 0.001
Chicago	0.41 ± 0.02	0.1 ± 0.01	0.08 ± 0.001	0.55 ± 0.01	0.16 ± 0.01	0.09 ± 0.001	0.5 ± 0.03	0.16 ± 0.01	0.12 ± 0.01
Boston	0.15 ± 0.005	0.01 ± 0.001	0.03 ± 0.005	0.15 ± 0.01	0.01 ± 0.005	0.05 ± 0.001	0.15 ± 0.01	0.03 ± 0.01	0.08 ± 0.01
Philadelphia	0.3 ± 0.02	0.12 ± 0.01	0.08 ± 0.01	0.5 ± 0.02	0.15 ± 0.01	0.11 ± 0.01	0.51 ± 0.02	0.2 ± 0.01	0.16 ± 0.02
Dallas	0.32 ± 0.02	0.05 ± 0.001	0.03 ± 0.001	0.37 ± 0.005	0.04 ± 0.01	0.03 ± 0.001	0.29 ± 0.001	0.02 ± 0.001	0.02 ± 0.001
Austin	0.25 ± 0.01	0.09 ± 0.001	0.1 ± 0.001	0.25 ± 0.01	0.1 ± 0.005	0.11 ± 0.001	0.28 ± 0.001	0.09 ± 0.01	0.1 ± 0.005
San Jose	0.29 ± 0.02	0.05 ± 0.001	0.07 ± 0.001	0.45 ± 0.01	0.06 ± 0.005	0.07 ± 0.005	0.46 ± 0.001	0.05 ± 0.01	0.06 ± 0.01
San Diego	0.25 ± 0.03	0.09 ± 0.01	0.11 ± 0.001	0.32 ± 0.01	0.12 ± 0.001	0.15 ± 0.005	0.34 ± 0.02	0.1 ± 0.01	0.12 ± 0.02

Appendix A.5. Concatenation and Modeling with Neighborhood-Level Features

To evaluate the combined modeling capability of node embeddings and neighborhood-level features, we experimented with using them together as inputs to a supervised VNN model. Specifically, we concatenate the optimal VNN-based embedding with the 311 complaint feature set, the population density, or the job density of the neighborhoods. Table A3 shows results for three major cities.
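The concatenation step can be sketched as follows. The feature scaling used in the study is not stated, so z-scoring each column with statistics computed on the training tracts only (to avoid test-set leakage) is an assumption of this sketch, as are the function name and the synthetic arrays.

```python
import numpy as np

def concat_features(embedding, *feature_blocks, train_idx=None):
    """Stack node embeddings with neighborhood-level features and z-score each column.

    Each feature block may be 1-D (e.g. population density) or 2-D (e.g. 311 shares).
    Means and standard deviations are computed on train_idx rows only, if given.
    """
    n = len(embedding)
    stacked = np.hstack([np.asarray(b, dtype=float).reshape(n, -1)
                         for b in (embedding, *feature_blocks)])
    ref = stacked if train_idx is None else stacked[train_idx]
    mu = ref.mean(axis=0)
    sd = ref.std(axis=0)
    sd[sd == 0] = 1.0                      # leave constant columns unscaled
    return (stacked - mu) / sd

rng = np.random.default_rng(0)
emb = rng.normal(size=(50, 10))                    # hypothetical 10-d VNN embedding, 50 tracts
shares_311 = rng.dirichlet(np.ones(8), size=50)    # hypothetical 8-category 311 shares
pop_density = rng.lognormal(size=50)
X_311 = concat_features(emb, shares_311, train_idx=np.arange(35))
X_pop = concat_features(emb, pop_density)
print(X_311.shape, X_pop.shape)   # (50, 18) (50, 11)
```

Standardizing puts a single density column on the same scale as the embedding dimensions; without it, a feature with large raw values could dominate early training of the downstream model.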

Table A3. R² scores of a supervised model predicting median income from various input feature sets.

Model inputs	NYC	Boston	Chicago
Population density	0.02	0.01	0.16
Job density	0.07	0.08	0.12
Embedding + Population density	0.53	0.33	0.68
Embedding + Job density	0.53	0.33	0.68
Embedding + 311 data	0.64	0.33	0.75

References

Rosvall, M.; Trusina, A.; Minnhagen, P.; Sneppen, K. Networks and cities: An information perspective. Phys. Rev. Lett. 2005, 94, 028701.
Pflieger, G.; Rozenblat, C. Introduction. Urban networks and network theory: The city as the connector of multiple networks. Urban Stud. 2010, 47, 2723–2735.
Jiang, B.; Claramunt, C. Topological analysis of urban street networks. Environ. Plan. B Plan. Des. 2004, 31, 151–162.
Li, X.; Lv, Z.; Zheng, Z.; Zhong, C.; Hijazi, I.H.; Cheng, S. Assessment of lively street network based on geographic information system and space syntax. Multimed. Tools Appl. 2017, 76, 17801–17819.
Shi, G.; Shan, J.; Ding, L.; Ye, P.; Li, Y.; Jiang, N. Urban Road Network Expansion and Its Driving Variables: A Case Study of Nanjing City. Int. J. Environ. Res. Public Health 2019, 16, 2318.
Xu, Y.; Belyi, A.; Bojic, I.; Ratti, C. Human mobility and socioeconomic status: Analysis of Singapore and Boston. Comput. Environ. Urban Syst. 2018, 72, 51–67.
Lee, S.; Lin, J. Natural amenities, neighbourhood dynamics, and persistence in the spatial distribution of income. Rev. Econ. Stud. 2018, 85, 663–694.
Wang, L.; Qian, C.; Kats, P.; Kontokosta, C.; Sobolevsky, S. Structure of 311 service requests as a signature of urban location. PLoS ONE 2017, 12, e0186314.
Grover, A.; Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 855–864.
Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Yan, J.; Mei, Q. LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, 18–22 May 2015; pp. 1067–1077.
Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
Sobolevsky, S.; Belyi, A. Graph neural network inspired algorithm for unsupervised network community detection. Appl. Netw. Sci. 2022, 7, 63.
Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
Kempinska, K.; Murcio, R. Modelling urban networks using Variational Autoencoders. Appl. Netw. Sci. 2019, 4, 114.
Pagani, A.; Mehrotra, A.; Musolesi, M. Graph input representations for machine learning applications in urban network analysis. Environ. Plan. B Urban Anal. City Sci. 2021, 48, 741–758.
Huang, W.; Zhang, D.; Mai, G.; Guo, X.; Cui, L. Learning urban region representations with POIs and hierarchical graph infomax. ISPRS J. Photogramm. Remote Sens. 2023, 196, 134–145.
Kim, N.; Yoon, Y. Effective urban region representation learning using heterogeneous urban graph attention network (hugat). arXiv 2022, arXiv:2202.09021.
Mishina, M.; Sobolevsky, S.; Kovtun, E.; Khrulkov, A.; Belyi, A.; Budennyy, S.; Mityagin, S. Prediction of Urban Population-Facilities Interactions with Graph Neural Network. In Computational Science and Its Applications—ICCSA 2023; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2023; Volume 13956.
Yan, X.; Ai, T.; Yang, M.; Yin, H. A graph convolutional neural network for classification of building patterns using spatial vector data. ISPRS J. Photogramm. Remote. Sens. 2019, 150, 259–273.
Xu, Y.; Jin, S.; Chen, Z.; Xie, X.; Hu, S.; Xie, Z. Application of a graph convolutional network with visual and semantic features to classify urban scenes. Int. J. Geogr. Inf. Sci. 2022, 36, 2009–2034.
Natterer, E.; Engelhardt, R.; Hörl, S.; Bogenberger, K. Graph Neural Network Approach to Predict the Effects of Road Capacity Reduction Policies: A Case Study for Paris, France. arXiv 2024, arXiv:2408.06762.
Zhang, Y.; Dong, X.; Shang, L.; Zhang, D.; Wang, D. A multi-modal graph neural network approach to traffic risk forecasting in smart urban sensing. In Proceedings of the 2020 17th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), Como, Italy, 22–25 June 2020; pp. 1–9.
Zhai, X.; Jiang, J.; Dejl, A.; Rago, A.; Guo, F.; Toni, F.; Sivakumar, A. Heterogeneous Graph Neural Networks with Post-hoc Explanations for Multi-modal and Explainable Land Use Inference. arXiv 2024, arXiv:2406.13724.
Khulbe, D.; Belyi, A.; Mikeš, O.; Sobolevsky, S. Mobility Networks as a Predictor of Socioeconomic Status in Urban Systems. In Computational Science and Its Applications—ICCSA 2023; Springer: Cham, Switzerland, 2023; pp. 139–157.
Center for Economic Studies at the U.S. Census Bureau. Longitudinal Employer-Household Dynamics. 2023. Available online: https://lehd.ces.census.gov/ (accessed on 9 August 2023).
U.S. Census Bureau. American Community Survey Data. 2023. Available online: https://www.census.gov/programs-surveys/acs/data.html (accessed on 9 August 2023).
NYC Open Data. NYC 311 Data. 2023. Available online: https://data.cityofnewyork.us/Social-Services/NYC-311-Data/jrb2-thup (accessed on 9 August 2023).
Sium, Y.; Kollias, G.; Idé, T.; Das, P.; Abe, N.; Lozano, A.; Li, Q. Direction aware positional and structural encoding for directed graph neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023.
Dwivedi, V.; Bresson, X. A Generalization of Transformer Networks to Graphs. arXiv 2020, arXiv:2012.09699.
Rampášek, L.; Galkin, M.; Dwivedi, V.P.; Luu, A.T.; Wolf, G.; Beaini, D. Recipe for a General, Powerful, Scalable graph transformer. Adv. Neural Inf. Process. Syst. 2022, 35, 14501–14515.
Kreuzer, D.; Beaini, D.; Hamilton, W.; Létourneau, V.; Tossou, P. Rethinking graph transformers with spectral attention. Adv. Neural Inf. Process. Syst. 2021, 34, 21618–21629.
Ying, C.; Cai, T.; Luo, S.; Zheng, S.; Ke, G.; He, D.; Shen, Y.; Liu, T.Y. Do transformers really perform badly for graph representation? In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–14 December 2021.
Li, P.; Wang, Y.; Wang, H.; Leskovec, J. Distance encoding: Design provably more powerful neural networks for graph representation learning. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–12 December 2020; Volume 33, pp. 4465–4478.
Dwivedi, V.P.; Luu, A.T.; Laurent, T.; Bengio, Y.; Bresson, X. Graph neural networks with learnable structural and positional representations. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022.
Brüel-Gabrielsson, R.; Yurochkin, M.; Solomon, J. Rewiring with positional encodings for graph neural networks. arXiv 2022, arXiv:2201.12674.
Xu, F.; Lin, Z.; Xia, T.; Guo, D.; Li, Y. SUME: Semantic-enhanced Urban Mobility Network Embedding for User Demographic Inference. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2020, 4, 98.
Chandra, D.K.; Leopold, J.; Fu, Y. NodeSense2Vec: Spatiotemporal Context-Aware Network Embedding for Heterogeneous Urban Mobility Data. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Virtual, 15–18 December 2021; pp. 2884–2893.
Jain, A.K.; Dubes, R.C. Algorithms for Clustering Data; Prentice-Hall, Inc.: Englewood Cliffs, NJ, USA, 1988.
Dong, L.; Duarte, F.; Duranton, G.; Santi, P.; Barthelemy, M.; Batty, M.; Bettencourt, L.; Goodchild, M.; Hack, G.; Liu, Y.; et al. Defining a city—Delineating urban areas using cell-phone data. Nat. Cities 2024, 1, 117–125.
He, M.; Bogomolov, Y.; Khulbe, D.; Sobolevsky, S. Distance deterrence comparison in urban commute among different socioeconomic groups: A normalized linear piece-wise gravity model. J. Transp. Geogr. 2023, 113, 103732.
Bogomolov, Y.; He, M.; Khulbe, D.; Sobolevsky, S. Impact of income on urban commute across major cities in US. Procedia Comput. Sci. 2021, 193, 325–332.
Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How powerful are graph neural networks? arXiv 2018, arXiv:1810.00826.
Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. Adv. Neural Inf. Process. Syst. 2017, 30, 1025–1035.
Hicks, N.; Streeten, P. Indicators of development: The search for a basic needs yardstick. World Dev. 1979, 7, 567–580.
Li, Y.; Hyder, A.; Southerland, L.T.; Hammond, G.; Porr, A.; Miller, H.J. 311 Service Requests as Indicators of Neighborhood Distress and Opioid Use Disorder. Sci. Rep. 2020, 10, 14334.
Kontokosta, C.; Hong, B.; Korsberg, K. Equity in 311 reporting: Understanding socio-spatial differentials in the propensity to complain. arXiv 2017, arXiv:1710.02452.
Chen, D.T. The Science of Smart Growth. Sci. Am. 2000, 283, 84–91.
Leslie, T.F.; Ó hUallacháin, B. Polycentric Phoenix. Econ. Geogr. 2006, 82, 167–192.
Wang, D.; Cui, P.; Zhu, W. Structural Deep Network Embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’16), San Francisco, CA, USA, 13–17 August 2016; pp. 1225–1234.
Chang, S.; Han, W.; Tang, J.; Qi, G.J.; Aggarwal, C.C.; Huang, T.S. Heterogeneous Network Embedding via Deep Architectures. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’15), Sydney, Australia, 10–13 August 2015; pp. 119–128.
Yap, W.; Stouffs, R.; Biljecki, F. Urbanity: Automated modelling and analysis of multidimensional networks in cities. npj Urban Sustain. 2023, 3, 45.
Yap, W.; Biljecki, F. A Global Feature-Rich Network Dataset of Cities and Dashboard for Comprehensive Urban Analyses. Sci. Data 2023, 10, 667.
Boston 311 Data. 2023. Available online: https://data.boston.gov/dataset/311-service-requests/resource/f53ebccd-bc61-49f9-83db-625f209c95f5 (accessed on 9 August 2023).
Chicago 311 Data. 2023. Available online: https://data.cityofchicago.org/Service-Requests/311-Service-Requests/v6vf-nfxy (accessed on 9 August 2023).
Brin, S. The PageRank citation ranking: Bringing order to the web. Proc. ASIS 1998, 98, 161–172.
Jia, C.; Du, Y.; Wang, S.; Bai, T.; Fei, T. Measuring the vibrancy of urban neighborhoods using mobile phone data with an improved PageRank algorithm. Trans. GIS 2019, 23, 241–258.
Jiang, B. Ranking spaces for predicting human movement in an urban environment. Int. J. Geogr. Inf. Sci. 2009, 23, 823–837.
Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.

Figure 1. Median income distribution across NYC census tracts. (a) LEHD commute flow network. (b) Network structure and median income data for NYC.

Figure 2. Clustering of census tracts in cities based on SVD embeddings of their mobility networks. The spatial patterns reveal a distinction between high-income and low-income neighborhoods.
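The caption does not spell out how the SVD embedding or the clustering is computed. The NumPy sketch below shows one common convention under stated assumptions: flows are log1p-scaled, the top `dim` left and right singular vectors are each scaled by the square root of their singular values and concatenated, and plain k-means groups the resulting vectors. The function names `svd_embedding` and `kmeans`, the scaling, and the farthest-first initialization are illustrative choices, not the authors' code.

```python
import numpy as np


def svd_embedding(flows, dim=4):
    """Embed each region from a truncated SVD of the log-scaled O-D matrix."""
    M = np.log1p(flows)                       # damp heavy-tailed commute counts
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    root = np.sqrt(s[:dim])                   # split each singular value between factors
    out_part = U[:, :dim] * root              # origin-side profile (rows of M)
    in_part = Vt[:dim].T * root               # destination-side profile (columns of M)
    return np.hstack([out_part, in_part])     # shape (n, 2 * dim)


def kmeans(X, k, iters=100, seed=0):
    """Lloyd's k-means with farthest-first initialization; one label per row."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

On a toy network with two groups of tracts that commute only within their own group, the two groups land in separate clusters; on real O-D data, the embedding dimension and cluster count are tuning choices.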

Figure 3. VNN-based embedding model. Socioeconomic modeling is achieved in a two-step process: (1) trainable region embeddings are processed and fed to an MLP that reconstructs the original O-D mobility flow matrix; (2) the learned embeddings are then fed as input to an MLP that models median income.
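A minimal NumPy sketch of the two-step idea in Figure 3 follows. It rests on several assumptions that are not taken from the paper: flows are log1p-scaled, the decoder is a single-hidden-layer MLP applied to the concatenated origin and destination embeddings, training is full-batch gradient descent with hand-written backpropagation, and step 2 uses closed-form ridge regression as a simplified stand-in for the income MLP. The names `learn_flow_embeddings` and `fit_ridge` and all hyperparameters are hypothetical.

```python
import numpy as np


def learn_flow_embeddings(flows, dim=8, hidden=16, epochs=300, lr=0.05, seed=0):
    """Step 1: learn one trainable vector per region by reconstructing the
    log-scaled O-D matrix with a one-hidden-layer MLP decoder on [z_i, z_j]."""
    rng = np.random.default_rng(seed)
    n = flows.shape[0]
    target = np.log1p(flows).ravel()            # row-major: entry k is pair (k // n, k % n)
    src, dst = np.divmod(np.arange(n * n), n)   # origin and destination index of each pair
    Z = 0.1 * rng.standard_normal((n, dim))     # trainable region embeddings
    W1 = 0.1 * rng.standard_normal((2 * dim, hidden))
    b1 = np.zeros(hidden)
    w2 = 0.1 * rng.standard_normal(hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        X = np.hstack([Z[src], Z[dst]])         # pair features, shape (n*n, 2*dim)
        pre = X @ W1 + b1
        H = np.maximum(pre, 0.0)                # ReLU hidden layer
        err = H @ w2 + b2 - target
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error by hand (all grads use old params).
        g = 2.0 * err / err.size
        g_pre = np.outer(g, w2) * (pre > 0)
        g_X = g_pre @ W1.T
        g_Z = np.zeros_like(Z)
        np.add.at(g_Z, src, g_X[:, :dim])       # gradient w.r.t. origin embeddings
        np.add.at(g_Z, dst, g_X[:, dim:])       # gradient w.r.t. destination embeddings
        w2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X.T @ g_pre)
        b1 -= lr * g_pre.sum(axis=0)
        Z -= lr * g_Z
    return Z, losses


def fit_ridge(Z, y, alpha=1.0):
    """Step 2 (simplified stand-in for the income MLP): closed-form ridge
    regression from embeddings to a regional target such as median income."""
    X = np.hstack([Z, np.ones((len(Z), 1))])    # append an intercept column
    reg = alpha * np.eye(X.shape[1])
    reg[-1, -1] = 0.0                           # do not penalize the intercept
    coef = np.linalg.solve(X.T @ X + reg, X.T @ y)
    return lambda Znew: np.hstack([Znew, np.ones((len(Znew), 1))]) @ coef
```

Full-batch training over all n² pairs keeps the sketch short but scales poorly: Table 1 lists up to 2341 tracts, or about 5.5 million ordered pairs, so sampling mini-batches of pairs would be the practical choice at city scale.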

Figure 4. Graph-based models—GCN/GAT layers are stacked with the MLP layers to directly model the socioeconomic feature.
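Figure 4 names the layer types but not their propagation rule. The NumPy sketch below shows the forward pass of a standard GCN layer, H' = ReLU(Â H W) with the symmetric normalization Â = D^(-1/2)(A + I)D^(-1/2), applied to a commute network. Symmetrizing the directed O-D matrix as A = F + Fᵀ, the unit self-loops, the node features, and the single linear output layer standing in for the MLP head are all assumptions, and the function names are hypothetical. Training is omitted, and GAT layers, which replace the fixed normalized weights with learned attention coefficients over neighbors, are not shown.

```python
import numpy as np


def normalized_adjacency(flows):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 of a commute network.
    Assumes directed flows are symmetrized as A = F + F^T."""
    A = flows + flows.T + np.eye(len(flows))    # undirected weights plus self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))   # degrees are >= 1 thanks to self-loops
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(A_hat, X, weights, head):
    """Stacked GCN layers with ReLU, followed by a linear regression head."""
    H = X
    for W in weights:
        H = np.maximum(A_hat @ H @ W, 0.0)      # H' = ReLU(A_hat H W)
    return H @ head                             # one prediction per region
```

Because Â is similar to the row-stochastic matrix D⁻¹(A + I), its largest eigenvalue is 1, which keeps repeated propagation from blowing up feature magnitudes.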

Table 1. Commute network statistics for 12 cities.

City	Nodes	Non-Zero Edges	Avg. Edge Weight
New York	2157	976,832	0.69
Chicago	1318	439,553	1.06
Boston	520	127,357	3.5
Austin	218	34,777	8.63
Dallas	529	129,352	2.83
Los Angeles	2341	1,171,362	0.65
San Antonio	366	83,192	4.77
San Diego	627	180,781	2.97
San Jose	372	81,938	4.73
Philadelphia	384	68,119	2.57
Phoenix	916	349,894	2.10
Houston	786	290,496	2.50
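Statistics of the kind in Table 1 can be computed from an n × n O-D matrix along the lines of the sketch below. The table does not state how the average edge weight is taken. If weights are raw commuter counts, an average over non-zero edges alone could not fall below 1, so the New York and Los Angeles values suggest that zero entries are included. That is an inference rather than something the paper states, so the hypothetical `commute_network_stats` helper reports both readings.

```python
import numpy as np


def commute_network_stats(flows):
    """Table 1-style summary of an n x n origin-destination flow matrix."""
    n = flows.shape[0]
    return {
        "nodes": n,
        "non_zero_edges": int(np.count_nonzero(flows)),
        # Assumption: averaged over all n*n ordered pairs, zeros included.
        "avg_edge_weight": float(flows.mean()),
        # Alternative reading: averaged over non-zero edges only.
        "avg_nonzero_weight": float(flows[flows > 0].mean()),
    }
```

For the toy matrix `[[0, 4, 0], [2, 0, 0], [0, 3, 0]]`, the two readings give 1.0 and 3.0, which shows how much the choice matters for sparse networks.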

Table 2. Out-of-sample R² values of the proposed methods for modeling median income across 12 U.S. cities. In every city where the 311-feature benchmark is available, the best proposed model outperforms it. Four sets of network embeddings are considered as model inputs; the spatial and SVD embeddings are found to be the most effective. The stability of the results across embeddings (with margins of error) is presented in Appendix A.3: Experiments with Embedding: Dimensionality. (Bold represents the best R² score.)

	Comparison Benchmark [8]	Embedding inputs (each cell: VNN | GNN + VNN | GAT + VNN)
City	311 Features	Spatial	SVD	LE	Random Walk
NYC	0.49	0.55 | 0.58 | 0.58	0.30 | 0.46 | 0.29	0.33 | 0.20 | 0.20	0.35 | 0.45 | 0.44
LA	0.16	0.13 | 0.32 | 0.28	0.11 | 0.31 | 0.31	0.10 | 0.24 | 0.24	0.10 | 0.21 | 0.20
Chicago	0.59	0.70 | 0.69 | 0.68	0.30 | 0.44 | 0.44	0.50 | 0.15 | 0.14	0.56 | 0.61 | 0.60
Boston	0.15	0.35 | 0.50 | 0.50	0.30 | 0.28 | 0.25	0.44 | 0.14 | 0.10	0.35 | 0.21 | 0.15
Philadelphia	0.51	0.28 | 0.33 | 0.33	0.51 | 0.55 | 0.52	0.30 | 0.31 | 0.30	0.30 | 0.45 | 0.45
Houston	NA	0.23 | 0.17 | 0.15	0.36 | 0.42 | 0.40	0.19 | 0.18 | 0.18	0.25 | 0.39 | 0.39
Dallas	0.37	0.33 | 0.27 | 0.26	0.58 | 0.58 | 0.56	0.38 | 0.08 | 0.05	0.36 | 0.41 | 0.40
Austin	0.28	0.43 | 0.38 | 0.38	0.59 | 0.57 | 0.53	0.33 | 0.04 | 0.07	0.36 | 0.31 | 0.34
San Jose	0.46	0.21 | 0.40 | 0.40	0.75 | 0.42 | 0.40	0.21 | 0.21 | 0.24	0.36 | 0.03 | 0.09
San Diego	0.36	0.16 | 0.26 | 0.28	0.43 | 0.34 | 0.32	0.27 | 0.26 | 0.25	0.36 | 0.41 | 0.33
San Antonio	NA	0.30 | 0.48 | 0.41	0.52 | 0.52 | 0.33	0.30 | 0.04 | 0.03	0.14 | 0.34 | 0.30
Phoenix	NA	0.14 | 0.25 | 0.25	0.15 | 0.26 | 0.25	0.21 | 0.25 | 0.24	0.15 | 0.26 | 0.25
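The scores in Table 2 are out-of-sample coefficients of determination. The short Python sketch below writes that metric out and adds a helper that reads off the best embedding and model for a city from rows transcribed from the table. The hand-written `r2_score` follows the same definition as scikit-learn's `sklearn.metrics.r2_score`; the `best_configuration` helper and the dictionary layout are illustrative assumptions.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Out-of-sample R^2 = 1 - SS_res / SS_tot, computed on held-out regions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


MODELS = ("VNN", "GNN + VNN", "GAT + VNN")


def best_configuration(row):
    """Return (embedding, model, R^2) with the highest score in one table row.
    Ties are resolved in favor of the first configuration encountered."""
    return max(
        ((emb, MODELS[i], score) for emb, scores in row.items()
         for i, score in enumerate(scores)),
        key=lambda t: t[2],
    )


# Two rows transcribed from Table 2: (311-feature benchmark, embedding scores).
table = {
    "Boston": (0.15, {"Spatial": (0.35, 0.50, 0.50), "SVD": (0.30, 0.28, 0.25),
                      "LE": (0.44, 0.14, 0.10), "Random Walk": (0.35, 0.21, 0.15)}),
    "San Jose": (0.46, {"Spatial": (0.21, 0.40, 0.40), "SVD": (0.75, 0.42, 0.40),
                        "LE": (0.21, 0.21, 0.24), "Random Walk": (0.36, 0.03, 0.09)}),
}
for city, (benchmark, row) in table.items():
    emb, model, score = best_configuration(row)
    print(f"{city}: best {emb} / {model}, R^2 = {score:.2f} vs 311 features {benchmark:.2f}")
```

Note that R² computed this way can be negative on held-out data when a model predicts worse than the test-set mean, so values near zero in the table (for example, some LE entries) indicate little explanatory power rather than an error.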

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Khulbe, D.; Belyi, A.; Sobolevsky, S. Commute Networks as a Signature of Urban Socioeconomic Performance: Evaluating Mobility Structures with Deep Learning Models. Smart Cities 2025, 8, 125. https://doi.org/10.3390/smartcities8040125


