Article

Data–Knowledge Collaborative Learning Framework for Cellular Traffic Forecasting via Enhanced Correlation Modeling

1 School of Geosciences and Info-Physics, Central South University (CSU), Changsha 410083, China
2 Yunnan Key Laboratory of Intelligent Monitoring and Spatio-Temporal Big Data Governance of Natural Resources, Kunming 650093, China
3 Yunnan Institute of Geology and Mineral Surveying and Mapping Co., Ltd., Kunming 650011, China
4 Hunan Geospatial Information Engineering and Technology Research Center, Changsha 410119, China
5 School of Computer Science and Technology, Central South University (CSU), Changsha 410083, China
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2026, 15(1), 43; https://doi.org/10.3390/ijgi15010043
Submission received: 20 November 2025 / Revised: 10 January 2026 / Accepted: 12 January 2026 / Published: 16 January 2026

Abstract

Forecasting the spatio-temporal evolutions of cellular traffic is crucial for urban management. However, achieving accurate forecasting is challenging due to “complex correlation modeling” and “model-blindness” issues. Specifically, cellular traffic is generated within complex urban systems characterized by an intricate structure and human mobility. Existing approaches, often based on proximity or attributes, struggle to learn the latent correlation matrix governing traffic evolution, which limits forecasting accuracy. Furthermore, while substantial knowledge about urban systems can supplement the modeling of correlations, existing methods for integrating this knowledge—typically via loss functions or embeddings—overlook the synergistic collaboration between data and knowledge, resulting in weak model robustness. To address these challenges, we develop a data–knowledge collaborative learning framework termed the knowledge-empowered spatio-temporal neural network (KESTNN). This framework first extracts knowledge triplets representing urban structures to construct a knowledge graph. Representation learning is then conducted to learn the correlation matrix. Throughout this process, data and knowledge are integrated collaboratively via backpropagation, contrasting with the forward feature injection methods typical of existing approaches. This mechanism ensures that data and knowledge directly guide the dynamic updating of model parameters through backpropagation, rather than merely serving as a static feature prompt, thereby fundamentally alleviating the “model-blindness” issue. Finally, the optimized matrix is embedded into a forecasting module. Experiments on the Milan dataset demonstrate that the KESTNN exhibits excellent forecast performance, reducing RMSE by up to 23.91%, 16.73%, and 10.40% for 3-, 6-, and 9-step forecasts, respectively, compared to the best baseline.

1. Introduction

Cellular traffic comprises records of activities conducted via mobile devices on cellular networks. These activities primarily include call operations (incoming and outgoing), SMS transactions (inbound and outbound), and internet usage [1]. Cellular traffic data collected from cellular networks are typically processed as spatio-temporal distributions of activity intensity. Given the near-universal adoption of mobile phones, cellular traffic effectively reflects urban dynamics. Simultaneously, it provides an accurate and granular description of the spatio-temporal distribution of human activity intensity within urban spaces [2]. As shown in Figure 1a, three areas in Milan exhibit distinct cellular traffic patterns, each with unique dynamic characteristics. Among them, the Leonardo Hotel, located in the city center, demonstrates more active nighttime cellular traffic. Parco Sempione, situated in a park area, generally exhibits lower cellular traffic throughout the day. Centrale FS, a railway station, exhibits higher cellular traffic on weekends than weekdays. Thus, the differential dynamic variations in cellular traffic reflect specific communication patterns across different areas, thereby revealing disparities and dynamic changes in human activity intensity. Consequently, establishing spatio-temporal process models for cellular traffic and predicting its evolution trends holds significant implications for network resource management and urban planning [3].
In GIS research, cellular traffic forecasting constitutes a spatio-temporal forecasting task. Although various forecasting models, such as Kriging [4], geographically weighted regression [5], and spatio-temporal autoregressive integrated moving average (STARIMA) [6], have been developed in GIS research, they remain limited in modeling complex urban systems. Specifically, these GIS models rely heavily on a correlation matrix that records similarities among spatial locations [7]. According to Tobler’s first law of geography, “Everything is related to everything else, but near things are more related than distant things” [8], so proximity-based similarity holds significant importance in physical geography (e.g., topography, climate, and vegetation). However, cellular traffic is distributed across urban spaces, a complex system shaped by diverse human mobility patterns [9,10,11]. In this context, the similarity structure becomes more intricate, making the correlation matrix harder to estimate and consequently limiting cellular traffic forecasting performance [12]. For example, in Figure 1b, all three areas are located within parks. Although Arco della Pace is geographically closer to Parco Sempione, its traffic variation is less similar to that of Parco Sempione than the variation at Parco Indro Montanelli is. Therefore, Parco Indro Montanelli provides greater forecasting gain for Parco Sempione’s traffic. Similarly, as illustrated in Figure 1c, locations A and B on the same road exhibit similar internet usage patterns, whereas location C, which is not on the same road, displays distinct traffic patterns. Therefore, the characteristics of location A prove highly useful when forecasting traffic at location B. We term this challenge “complex correlation modeling”. Addressing this gap requires consideration of not only spatial proximity (per Tobler’s law) but also underlying connections among urban facilities.
Although diverse data-driven methods, such as transformers and graph attention networks, have been developed to generate correlation matrices adaptively, they remain constrained by data availability [13,14,15]. Incomplete data, a common issue in urban research, yield non-informative matrices. For example, MVCV-Traffic [16] integrates traffic, environmental, and time series data for dynamic correlation modeling, so a missing data source may lead to incorrect results. Therefore, how to learn the underlying correlations among urban spaces from accessible urban facility data remains an unresolved challenge in cellular traffic forecasting research.
Another challenge in cellular traffic forecasting lies in effectively incorporating knowledge. With the rapid advancement of deep learning, neural network-based data-driven models have greatly outperformed traditional statistical approaches. As a result, they have become widely adopted in forecasting studies, particularly for tasks involving human-related forecasts [17,18]. The reason is that the parameters and structure of statistical models must satisfy certain assumptions (such as the stationarity and isotropy assumptions in Kriging models), making these models inapplicable to human-related forecasting, as human mobility is highly uncertain and variable [19]. In this work, the spatio-temporal pattern of cellular traffic is regarded as a representative example of human-related activity. To forecast this phenomenon accurately, various neural network models have been employed, including convolutional neural networks [20], graph neural networks [21,22], meta-learning networks [23], and transfer learning networks [24]. These studies tend to develop deeper (more hidden layers) and wider (more trainable parameters) neural networks in pursuit of better performance; however, most of them overlook informative knowledge. Knowledge in cellular traffic forecasting refers to prior understanding and experience of human mobility or urban configurations, such as spatial heterogeneity and urban topology [25]. Leveraging this knowledge can help to design more effective and efficient models that produce accurate and interpretable forecasts.
Although some studies attempt to integrate conceptual knowledge (e.g., spatial heterogeneity) or rule-based knowledge (e.g., physical equations) through loss functions or vector embeddings, these methods are often overly rigid and simplistic (loss functions struggle to model higher-order, nonlinear knowledge, and knowledge utilization is crude) or exert too limited an influence on the model (embedding knowledge into the forward propagation process means it only serves as a prompt, unable to guide dynamic model updates) [26,27]. A common limitation arises when knowledge representations are simply concatenated with data features, obscuring the important role of knowledge in the spatial learning process—a challenge we term the “model-blindness” problem. The recently proposed DKNN framework [28] integrates the Kriging interpolation mechanism from geostatistics with deep neural networks by constructing an asymmetric encoder–decoder structure. This enables the explicit modeling and backpropagation of structural information during the interpolation process. Bai et al. [29] introduced a technique for learning correlation matrices via backpropagation. The data–knowledge collaborative mechanism and correlation matrix learning approach they introduced provide us with theoretical support. Based on this, our research focuses on developing a data–knowledge collaborative learning framework to correctly distinguish and leverage these two information types, thereby enhancing forecasting accuracy and interpretability.
To address both the “complex correlation modeling” and “model-blindness” challenges in cellular traffic forecasting, we present a fresh perspective: incorporating knowledge allows data-driven models to more effectively grasp the intrinsic correlations within urban systems. For example, given two knowledge triplets (Region A is close to Commercial center) and (Region B is close to Commercial center), the model can adaptively capture connections between Regions A and B through data-driven learning. In this way, data and knowledge play distinct yet complementary roles in forecasting. Building on this foundation, we introduce KESTNN, a knowledge-empowered spatio-temporal neural network tailored for cellular traffic forecasting. This framework comprises three key components: (1) a geospatial knowledge graph, (2) a knowledge-guided attention module, and (3) a data–knowledge collaborative learning forecasting network. The framework incorporates prior knowledge about urban structure, particularly spatial relationships between urban facilities, represented as triplets in the knowledge graph. These triplets feed into an attention module that, via backpropagation, uses cellular traffic data and established knowledge to estimate and update the correlation matrix for urban spaces. Finally, a temporal neural network uses this refined correlation matrix for accurate cellular traffic forecasting.
Our key contributions are distilled below:
  • We devise a hybrid learning paradigm that fuses data-driven signals with prior knowledge for spatio-temporal forecasting. Data and knowledge are two main information sources for forecast learning. While previous studies have integrated both data and knowledge, the application of knowledge and its collaboration with data-driven processes must be more targeted to mitigate the “model-blindness” challenge. In our framework, we extract knowledge triplets representing urban structures. These triplets characterize latent similarities across urban spaces and are embedded into the forecast learning process, enabling the collaborative learning of data and knowledge through backpropagation while estimating a correlation matrix. This approach to knowledge-enhanced correlation modeling offers a new perspective for collaborative data–knowledge forecast learning.
  • Tests on the Milan communication dataset reveal that the proposed scheme sharply lifts cellular traffic forecasting quality: KESTNN cuts the RMSE of the best rival by 23.91% at horizon-3, 16.73% at horizon-6, and 10.40% at horizon-9, while also surpassing prior work in MAE and R², evidencing superior stability and generalization. Furthermore, spatio-temporal error analysis validated the model’s performance advantages in complex scenarios involving sudden changes and weak correlations. Knowledge ablation experiments analyzed the critical roles of POI semantics, road networks, and administrative hierarchy knowledge in achieving stable short-term, long-term, and multi-step forecasts, demonstrating the effectiveness of knowledge. Moreover, KESTNN maintains low error rates during both daytime high-dynamic periods and nighttime stable periods, significantly outperforming the comparison methods. This demonstrates the effectiveness and necessity of the data–knowledge collaborative learning framework in cellular traffic forecasting.
Section 2 surveys prior studies on cellular traffic forecasting. Section 3 unpacks the design of our new architecture. Experimental validation with Milan’s cellular traffic dataset is reported and discussed in Section 4, while Section 5 closes the paper and outlines future research avenues.

2. Related Works

Current cellular traffic forecasting models fall mainly into two camps: classical machine learning approaches and deep learning systems [30,31,32].

2.1. Machine Learning Methods for Cellular Traffic Forecasting

Ongoing progress in machine learning techniques has markedly elevated the precision of cellular traffic forecasting [33,34,35]. Initial forecasting frameworks focused mainly on classic time series extrapolation. Models such as the historical average (HA) model [36] and the autoregressive integrated moving average (ARIMA) model [37] demonstrated stable performance in capturing linear trends and cyclical patterns, leading to their widespread adoption. Subsequently, more methods focused on addressing the multilinear nature of time series forecasting tasks. The multilinear autoregressive (AR) model demonstrated superior performance, typically outperforming the simpler AR and autoregressive moving average (ARMA) models [38]. To better tackle challenges such as capturing time-varying cycles in cellular traffic, Tran et al. [39] proposed the exponential smoothing method. This approach not only features lower computational complexity but also effectively accommodates multi-seasonal cycles. To address temporal heterogeneity, local linear regression (LLR) employs locally weighted regression to capture time dependence, significantly enhancing forecasting performance [40]. To address temporal nonlinear dependencies, the support vector regression (SVR) model fits data by identifying an optimal hyperplane and employs kernel functions for regression. This approach maps nonlinear data to high-dimensional spaces, demonstrating strong performance in cellular traffic forecasting tasks [41].

2.2. Deep Learning Methods for Cellular Traffic Forecasting

Driven by deep learning’s swift progress, numerous neural architectures now tackle cellular traffic forecasting. Many scholars utilize the recurrent neural network (RNN), the long short-term memory network (LSTM), and the smoothed long short-term memory network (SLSTM) to forecast cellular traffic, underscoring the pivotal role that RNNs play in modeling temporal dynamics [3,42,43,44]. Compared with the relative maturity of temporal dimension modeling, the effective modeling of spatial dependencies in cellular traffic forecasting still faces significant challenges. Current methods for capturing spatial dependencies primarily include GNN-based methods, attention-based methods, and knowledge graph-based methods. The GNN-based methods mainly include the global-local spatio-temporal transformer network (GLSTTN) [13] and forecasting cellular traffic by leveraging explicit inductive graph-based learning (FLEXIBLE) [45]. GLSTTN distills cellular traffic dynamics by intertwining global and local spatio-temporal blocks and performs well on real cellular traffic datasets [13]. FLEXIBLE devises an inductive graph-learning strategy tailored for data-sparse conditions [45]. The temporal sequence enhancement network (TSENet) model is an attention-based approach. The model consists of a transformer and a self-attention network specialized in extracting time series features of cellular traffic [46]. Knowledge graph-based approaches usually model spatial objects in a knowledge graph. Then, corresponding embedded representations are generated based on the knowledge graph to provide information gain for spatio-temporal forecasting tasks. However, such methods do not consider the complex interactions and multiple semantic attributes of spatial objects, making it difficult to mine the correct implicit spatial structure [47,48]. Some knowledge graph-based methods do model the complex semantic relationships and social attributes of spatial objects. However, they do not effectively leverage this knowledge to guide the learning of spatial structures [49].

2.3. Large Language Model Methods for Cellular Traffic Forecasting

With the rapid evolution of large language models (LLMs), a new paradigm has emerged in spatio-temporal forecasting. The core challenge in enabling LLMs for spatio-temporal forecasting lies in coupling textual symbols with spatio-temporal signals [50]. Current research primarily encompasses two approaches: LLM-oriented temporal methods and temporal-oriented LLM methods. LLM-oriented temporal methods minimize intervention on pre-trained LLMs. Time-LLM [51] maps time series to textual prototype embedding spaces via learnable reprogramming modules, converting numerical values into signals understandable by LLMs. TEST [52] employs contrastive learning to align time series segments with textual prototypes in a shared semantic space, activating the LLM’s comprehension capabilities. TEMPO [53] constructs forecasting tasks for transformer frameworks by designing specialized prompt templates. Temporal-oriented LLM methods adopt deeper architectural integration. ST-LLM [54] combines GNNs with LLM networks to explicitly capture dependencies within traffic networks. STH-SepNet [55] enhances efficiency by decoupling temporal dynamics learning from complex spatial correlation learning. Furthermore, the agent-based collaborative paradigm treats LLMs as more than forecasting tools. For instance, the Timecap [56] framework deploys two cooperating LLM agents: one for translating spatio-temporal data into textual semantics, and another for performing semantically grounded forecasts. This paradigm explicitly separates knowledge generation from forecasting functions, representing a seminal work in data–knowledge collaboration.

3. Methodology

3.1. Construction of Urban Geospatial Structure Knowledge Graph

In this research, knowledge is defined as prior understanding and experience of urban structure, which is encoded as a triplet (head, relation, tail). To surface such knowledge, we first define the relevant entities, relationships, and attribute information, as summarized in Table 1, Table 2 and Table 3. Specifically, we consider four entity types: (1) region, (2) district, (3) POI, and (4) road. Based on these entities, we define five relationship types: (1) adjacency, (2) location-at, (3) spatial intersection, (4) spatial containment, and (5) semantic subordination. The POI, road, and district data are sourced from OpenStreetMap (OSM). We extract spatial relationships using spatial analysis and align entities by geographic coordinates and name similarity. For example, we use spatial intersection analysis to extract “adjacent to”, “located at”, and “intersects with” relationships; spatial containment analysis to extract “contained within” relationships; and spatial query analysis to extract “subordinate to” relationships. Duplicate triplets are removed, and a unified ID system is applied to ensure consistency.
These structured triplets provide multidimensional prior knowledge for correlation matrix learning: spatial continuity (Region adjacent to Region), administrative boundary constraint effects (District adjacent to District), semantic information and cell mapping (POI located at Region), cross-regional environmental similarity (POI located at District), physical accessibility of cells (Road intersects with Region), and administrative affiliation of urban entities (Region contained within District and Road subordinate to District). Collectively, these elements enable the model to capture complex dependencies in cellular traffic that transcend mere spatial proximity.
We construct the urban structural knowledge graph $G = \{(h, r, t) \mid h, t \in \mathcal{E},\ r \in \mathcal{R}\}$, where $\mathcal{E}$ and $\mathcal{R}$ denote the entity and relation sets, by extracting triples from multi-source geospatial data (OSM POIs, road networks, administrative boundaries) according to the schema in Table 1, Table 2 and Table 3. Spatial analysis operations (intersection, containment, adjacency) and semantic rules are applied to instantiate the relationships defined in Table 2. Entity alignment is performed based on geographic coordinates and name similarity, followed by deduplication and consistency checks to ensure a clean and unified graph. The resulting knowledge graph is depicted in Figure 2.
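The deduplication and unified-ID step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the entity names, the relation strings, and the helper `build_graph` are hypothetical, and the spatial analysis that produces the raw triples is taken as already done.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def build_graph(raw: List[Triple]):
    """Deduplicate raw triples and map entities and relations to integer IDs."""
    entity_ids: Dict[str, int] = {}
    relation_ids: Dict[str, int] = {}
    graph: Set[Tuple[int, int, int]] = set()
    for head, rel, tail in raw:
        # setdefault assigns the next free ID the first time a name is seen
        h = entity_ids.setdefault(head, len(entity_ids))
        t = entity_ids.setdefault(tail, len(entity_ids))
        r = relation_ids.setdefault(rel, len(relation_ids))
        graph.add((h, r, t))  # the set silently drops repeated triples
    return graph, entity_ids, relation_ids


# Hypothetical triples following the schema of regions, POIs, roads, districts
raw_triples = [
    ("region_5151", "adjacent_to", "region_5152"),
    ("poi_cafe_1", "located_at", "region_5151"),
    ("road_a", "intersects_with", "region_5151"),
    ("region_5151", "contained_within", "district_1"),
    ("poi_cafe_1", "located_at", "region_5151"),  # duplicate on purpose
]
graph, entity_ids, relation_ids = build_graph(raw_triples)
```

Integer IDs of this kind are what an embedding table is typically indexed by, which is why the ID assignment is done once and shared by all triples.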

3.2. Data–Knowledge Collaborative Representation Learning

The prerequisite for data–knowledge collaborative learning is to project heterogeneous data and knowledge into a unified computable vector space. We employ transformative embedding to project entity and relation embedding vectors $e_h \in \mathbb{R}^{m}$, $e_t \in \mathbb{R}^{m}$, and $e_r \in \mathbb{R}^{m}$ into the relation space [57], as shown in Figure 3. For each triplet $(e_h^r, e_r, e_t^r)$ in the relation space, embedding vectors are learned by minimizing the distance-based scoring function $\varepsilon(h, r, t)$. This process can be expressed as follows:
$$e_h^r = e_h M_r, \quad e_t^r = e_t M_r \tag{1}$$
$$\varepsilon(h, r, t) = \lVert e_h^r + e_r - e_t^r \rVert_2^2 \tag{2}$$
$$\arg\min_{e_h, e_t, e_r} L_{embed} = \sum_{(h, r, t) \in S} \varepsilon(h, r, t) \tag{3}$$
where $M_r \in \mathbb{R}^{m \times n}$ is the relation-specific projection matrix. The overall embedding loss $L_{embed}$ is trained via negative sampling [57]:
$$L_{embed} = \sum_{(h, r, t) \in S} \sum_{(h', r, t') \in S'} \max\left[0,\ \varepsilon(h, r, t) + \gamma - \varepsilon(h', r, t')\right] \tag{4}$$
Optimizing $L_{embed}$ yields the entity embedding matrix $X^{(e)} \in \mathbb{R}^{N \times M}$ (where $N$ is the number of cellular units and $M$ is the embedding dimension), which simultaneously encapsulates urban functional attributes and spatial structural semantics.
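As a rough illustration of the scoring function and the margin-based loss with negative sampling, the NumPy sketch below evaluates both for a single positive and a single corrupted triplet. The dimensions, the margin value, and the function names `score` and `margin_loss` are assumptions made for the example. The relation vector is placed in the $n$-dimensional relation space so that the sum with the projected entities is defined, which is also an assumption of this sketch.

```python
import numpy as np


def score(e_h, e_r, e_t, M_r):
    """Squared L2 distance ||e_h M_r + e_r - e_t M_r||^2 in the relation space."""
    diff = e_h @ M_r + e_r - e_t @ M_r
    return float(diff @ diff)


def margin_loss(pos, neg, gamma=1.0):
    """Hinge term max(0, score(pos) + gamma - score(neg)) for one triplet pair."""
    return max(0.0, score(*pos) + gamma - score(*neg))


rng = np.random.default_rng(0)
m, n = 4, 3                      # entity and relation-space sizes (illustrative)
M_r = rng.normal(size=(m, n))    # relation-specific projection matrix
e_h = rng.normal(size=m)
e_t = rng.normal(size=m)
e_r = e_t @ M_r - e_h @ M_r      # relation vector that fits the true triplet
e_t_bad = rng.normal(size=m)     # corrupted tail, standing in for a negative sample

pos = (e_h, e_r, e_t, M_r)
neg = (e_h, e_r, e_t_bad, M_r)
pair_loss = margin_loss(pos, neg)
```

The hinge term vanishes once the corrupted triplet scores at least `gamma` worse than the true one, so training effort concentrates on pairs that are still confused.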
To jointly model historical traffic patterns and urban prior knowledge, we fuse the knowledge embeddings with the raw cellular traffic features. The traffic data are represented as $X^{(f)} \in \mathbb{R}^{T \times I \times N \times F}$ (sequence length $T$, input window $I$, feature dimension $F$). For each cell at every time step, we concatenate its traffic feature $X^{(f)}$ with its corresponding knowledge embedding from $X^{(e)}$:
$$X^{(z)}_{t,i,n} = concat\left(X^{(f)}_{t,i,n},\ X^{(e)}_{n}\right), \quad \forall\, t \in T,\ i \in I,\ n \in N \tag{5}$$
where $concat(\cdot)$ is the concatenation operation. This yields a fused representation $X^{(z)} \in \mathbb{R}^{T \times I \times N \times (F+M)}$ that integrates spatio-temporal signals with urban semantics, as shown in Figure 4.
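The shape bookkeeping of this fusion can be checked with a short NumPy sketch. The sizes below are illustrative only and are not the paper's settings; the point is that the static per-cell embedding is repeated along the two leading axes before concatenation on the feature axis.

```python
import numpy as np

T, I, N, F, M = 5, 12, 9, 1, 4          # illustrative sizes only
rng = np.random.default_rng(0)
X_f = rng.normal(size=(T, I, N, F))     # traffic features
X_e = rng.normal(size=(N, M))           # one knowledge embedding per cell

# Repeat the static embedding over the sample and window axes (a view, no copy),
# then concatenate along the last (feature) axis.
X_e_tiled = np.broadcast_to(X_e, (T, I, N, M))
X_z = np.concatenate([X_f, X_e_tiled], axis=-1)

print(X_z.shape)  # (T, I, N, F + M)
```

Every time step of a given cell therefore carries the same knowledge vector, while the traffic part varies over time.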
To enable the model to learn information from both data and knowledge through $X^{(z)}$, we construct a correlation matrix to capture the dynamic dependencies between cellular units. Given the strong dynamic nature of cellular traffic, the correlation matrix varies across different timestamps. Therefore, we need to construct a dynamic correlation matrix $A \in \mathbb{R}^{N \times N}$ to replace the traditional static correlation matrix. Since $X^{(z)}$ is high-dimensional and contains substantial noise and redundancy, a graph structure generated directly from it exhibits low stability. Consequently, we do not directly use $X^{(z)}$ to generate $A$. Instead, we introduce a learnable parameter $E \in \mathbb{R}^{N \times d}$ (where $d$ is the embedding dimension) to relate the fused representation $X^{(z)}$ to $A$. Here, $E$ can be understood as a low-dimensional embedding that encodes $X^{(z)}$ through gradient optimization, enabling the generation of a stable, task-driven $A$. Its expression is [29]
$$A = softmax\left(ReLU\left(E \cdot E^{T}\right)\right) \tag{6}$$
where $ReLU(\cdot)$ serves as the activation function. During training for the forecasting task $\hat{Y} = f_{\theta}(X^{(z)})$, $E$ is optimized by gradient descent (see Equations (15) and (21)), which updates the structural parameters of $A$ end-to-end. At each timestamp, the model thus adaptively generates $A$ via backpropagation through the forecasting process of $X^{(z)}$, driven by minimizing the forecasting error.
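A minimal NumPy sketch of the correlation matrix generated from node embeddings follows. The softmax is applied row-wise here, which is an assumption of the example, and the function name `adaptive_adjacency` and the sizes are illustrative.

```python
import numpy as np


def adaptive_adjacency(E):
    """Row-wise softmax over ReLU(E E^T), giving an N x N correlation matrix."""
    logits = np.maximum(E @ E.T, 0.0)                      # ReLU
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
E = rng.normal(size=(6, 3))   # N = 6 cells, d = 3 (illustrative)
A = adaptive_adjacency(E)
```

Each row of `A` sums to one, so it can be read as how strongly one cell attends to every other cell. Because `A` is a function of `E` alone, any gradient that reaches `A` during training is passed on to `E`.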

3.3. Spatio-Temporal Model for Cellular Traffic Forecasting

The core of KESTNN’s spatial dependency modeling lies in all nodes sharing the same convolutional kernel while maintaining node-specific weights. However, traditional GCN weight matrices suffer from excessive parameter complexity and low model efficiency. We decompose the weight matrix $\Theta \in \mathbb{R}^{N \times C \times F}$ for each node into the node embedding $E$ and a shared weight pool $W_G \in \mathbb{R}^{d \times C \times F}$, where $\Theta = E \cdot W_G$. Here, $E$ remains consistent with the node embedding described earlier, ensuring node specificity. $W_G$ enables all nodes to share weights. Similarly, we decompose the bias term $b$ into $b = E \cdot b_G$. Thus, the spatial convolution process is expressed as follows [29]:
$$Z = (I_N + A)\, X^{(z)} E\, W_G + E\, b_G \tag{7}$$
where $Z$ denotes the output of the convolution, and $I_N$ represents the identity matrix of dimension $N$.
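One way to read the decomposition $\Theta = E \cdot W_G$ is as a per-node weight matrix assembled from a shared pool. The NumPy sketch below spells that reading out for a single time step. The sizes, the variable names, and the einsum formulation are assumptions for illustration rather than the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, C, F_out = 6, 3, 5, 2              # illustrative sizes only
E = rng.normal(size=(N, d))              # node embeddings
W_G = rng.normal(size=(d, C, F_out))     # shared weight pool
b_G = rng.normal(size=(d, F_out))        # shared bias pool
X = rng.normal(size=(N, C))              # fused features for one time step

# Correlation matrix A = softmax(ReLU(E E^T)), softmax taken row-wise
S = np.maximum(E @ E.T, 0.0)
A = np.exp(S - S.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)

theta = np.einsum("nd,dcf->ncf", E, W_G)    # per-node weights, Theta = E W_G
bias = E @ b_G                              # per-node bias, b = E b_G
support = (np.eye(N) + A) @ X               # (I_N + A) X
Z = np.einsum("nc,ncf->nf", support, theta) + bias
```

The pool holds `d * C * F_out` weights instead of `N * C * F_out`, so the saving grows with the number of cells as long as `d` is much smaller than `N`.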
KESTNN models temporal dependencies with a gated recurrent unit (GRU). We construct an adaptive graph convolutional recurrent network (AGCRN) based on the GRU. At each time step, it receives the current fused features $x_t^{(z)}$ and the previous hidden state $h_{t-1}$ as inputs. Through multiple gating mechanisms, it outputs the updated hidden state $h_t$:
$$A = softmax\left(ReLU\left(E \cdot E^{T}\right)\right) \tag{8}$$
$$z_t = \sigma\left(A\,[x_t^{(z)};\ h_{t-1}]\, E\, W_z + E\, b_z\right) \tag{9}$$
$$r_t = \sigma\left(A\,[x_t^{(z)};\ h_{t-1}]\, E\, W_r + E\, b_r\right) \tag{10}$$
$$\hat{h}_t = \tanh\left(A\,[x_t^{(z)};\ r_t \odot h_{t-1}]\, E\, W_{\hat{h}} + E\, b_{\hat{h}}\right) \tag{11}$$
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \hat{h}_t \tag{12}$$
where $\sigma(\cdot)$ denotes the sigmoid activation function, $\odot$ denotes the Hadamard product, $E$ is the uniform representation of all learnable embeddings, $z_t$, $r_t$, and $\hat{h}_t$ denote the forgetting gate, the reset gate, and the candidate state, respectively, $W_z$, $W_r$, and $W_{\hat{h}}$ denote their corresponding weight parameters, and $b_z$, $b_r$, and $b_{\hat{h}}$ denote their corresponding bias terms.
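The gated update can be traced step by step with a small NumPy sketch. It reuses the per-node weight reading of $E \cdot W$ and runs one recurrent cell over a 12-step window. All sizes, the 0.1 weight scale, and the helper names `node_adaptive` and `gru_step` are assumptions for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def node_adaptive(A, X, E, W, b):
    """Compute A X (E W) + E b with per-node weights drawn from a shared pool."""
    theta = np.einsum("nd,dio->nio", E, W)            # (N, in, out)
    return np.einsum("ni,nio->no", A @ X, theta) + E @ b


def gru_step(A, x_t, h_prev, E, p):
    xh = np.concatenate([x_t, h_prev], axis=1)        # [x_t ; h_{t-1}]
    z = sigmoid(node_adaptive(A, xh, E, p["W_z"], p["b_z"]))
    r = sigmoid(node_adaptive(A, xh, E, p["W_r"], p["b_r"]))
    xrh = np.concatenate([x_t, r * h_prev], axis=1)   # [x_t ; r_t * h_{t-1}]
    h_cand = np.tanh(node_adaptive(A, xrh, E, p["W_h"], p["b_h"]))
    return z * h_prev + (1.0 - z) * h_cand


rng = np.random.default_rng(0)
N, d, C, H = 6, 3, 5, 4                               # illustrative sizes only
E = rng.normal(size=(N, d))
S = np.maximum(E @ E.T, 0.0)
A = np.exp(S - S.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)                     # softmax(ReLU(E E^T))

p = {}
for gate in ("z", "r", "h"):
    p["W_" + gate] = 0.1 * rng.normal(size=(d, C + H, H))
    p["b_" + gate] = 0.1 * rng.normal(size=(d, H))

h = np.zeros((N, H))
for x_t in rng.normal(size=(12, N, C)):               # a 12-step input window
    h = gru_step(A, x_t, h, E, p)
```

Because the new state is a gate-weighted mix of the previous state and a `tanh` output, the hidden state stays bounded in magnitude by one when it starts from zero.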
Combining the above spatio-temporal modeling methods, we construct the KESTNN shown in Figure 5 to achieve data–knowledge collaborative learning of complex spatio-temporal relationships. We stack AGCRN layers as the encoder, which captures complex urban spatial information together with historical feature trends. Next, a CNN decoder processes the encoded representation to forecast upcoming traffic flow in each cellular unit. The decoding formula is expressed as follows:
$$\hat{Y} = CNN(H) \tag{13}$$
where $\hat{Y}$ denotes the model’s forecasting result, $H$ represents the encoder’s output, and $CNN(\cdot)$ refers to the convolutional neural network.
To tighten the gap between forecasts and ground truth, we construct the following loss function to constrain the spatio-temporal forecasting process:
$$L_{Pred} = \frac{1}{NT} \sum_{i=1}^{T} \sum_{j=1}^{N} \left(y_{ij} - \hat{y}_{ij}\right)^{2} \tag{14}$$
where $y_{ij}$ is the true value, $\hat{y}_{ij}$ is the predicted value, $N$ is the number of cellular units, and $T$ is the number of time steps.

3.4. Data–Knowledge Collaborative Learning Mechanism in Backpropagation

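During backpropagation based on this loss function, the key parameters, namely the embedded representation $E$, the shared weight matrix $W_G$, and the shared bias term $b_G$, are updated as follows:
$$E \leftarrow E - \eta \cdot \frac{\partial L_{Pred}}{\partial E} \tag{15}$$
$$W_G \leftarrow W_G - \eta \cdot \frac{\partial L_{Pred}}{\partial W_G} \tag{16}$$
$$b_G \leftarrow b_G - \eta \cdot \frac{\partial L_{Pred}}{\partial b_G} \tag{17}$$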
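where $\eta$ is the learning rate. The gradients of the above parameters can be expressed as follows:
$$\Delta = \frac{\partial L_{Pred}}{\partial Z} = \frac{2}{NT} \cdot (Z - Y) \tag{18}$$
$$\frac{\partial L_{Pred}}{\partial W_G} = \left[(I_N + A) \cdot X^{(z)} \cdot E\right]^{T} \cdot \Delta \tag{19}$$
$$\frac{\partial L_{Pred}}{\partial b_G} = E^{T} \cdot \Delta \tag{20}$$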
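$$\frac{\partial L_{Pred}}{\partial E} = \left[(I_N + A)\, X^{(z)}\right]^{T} (\Delta \cdot W_G)^{T} + (\Delta \cdot b_G)^{T} + 2\left[\left((\Delta \cdot W_G)^{T} E^{T} (X^{(z)})^{T}\right) \odot A \odot R \odot S\right] E \tag{21}$$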
where Z denotes the output of the graph convolution.
As the gradients in Equations (18)–(21) show, the parameter updates in the spatio-temporal forecasting module involve the fused feature $X^{(z)}$: the gradients depend not only on $Z$ and $E$ but also on $X^{(z)}$. This indicates that $X^{(z)}$ is effectively used during training and influences the update of model parameters. Specifically, $X^{(z)}$ is no longer merely a forward-propagated feature prompt but becomes an explicit driver of error feedback. On this basis, the spatial prior is backpropagated into each layer’s learnable tensors, enabling the network to learn stable, task-specific correlation matrices while minimizing forecasting errors. Compared to forward concatenation or fixed loss weighting, backpropagation allows knowledge to directly drive dynamic updates of network parameters. This achieves dynamic data–knowledge synergy without requiring additional hyperparameters, enhancing model forecasting accuracy and stability.
In summary, the gradient flow in Equations (18)–(21) demonstrates that the fused feature $X^{(z)}$ (incorporating both data and knowledge embeddings) directly participates in updating the node embeddings $E$ and the adjacency matrix $A$. This enables knowledge to dynamically refine the graph structure during backpropagation, achieving genuine synergy between data-driven signals and prior knowledge rather than simple feature concatenation. Consequently, it effectively mitigates the “model-blindness” inherent in traditional forward concatenation or fixed loss weighting methods.
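The claim that the knowledge part of the fused feature reaches the update of $E$ can be checked numerically. The sketch below is a simplified stand-in for the real network: a single graph convolution followed by a mean squared error. It estimates the gradient with respect to $E$ by central finite differences for two different knowledge embeddings attached to identical traffic features. All sizes and names are illustrative assumptions.

```python
import numpy as np


def forward(E, X, W_G, b_G):
    """Z = (I_N + A) X (E W_G) + E b_G with A = softmax(ReLU(E E^T))."""
    S = np.maximum(E @ E.T, 0.0)
    A = np.exp(S - S.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)
    theta = np.einsum("nd,dcf->ncf", E, W_G)
    return np.einsum("nc,ncf->nf", (np.eye(len(E)) + A) @ X, theta) + E @ b_G


def loss(E, X, Y, W_G, b_G):
    return float(np.mean((forward(E, X, W_G, b_G) - Y) ** 2))


def numeric_grad_E(E, X, Y, W_G, b_G, eps=1e-6):
    """Central finite differences of the loss with respect to each entry of E."""
    grad = np.zeros_like(E)
    for idx in np.ndindex(*E.shape):
        step = np.zeros_like(E)
        step[idx] = eps
        up = loss(E + step, X, Y, W_G, b_G)
        down = loss(E - step, X, Y, W_G, b_G)
        grad[idx] = (up - down) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
N, d, F, M, F_out = 5, 2, 1, 3, 1            # illustrative sizes only
E = rng.normal(size=(N, d))
W_G = rng.normal(size=(d, F + M, F_out))
b_G = rng.normal(size=(d, F_out))
Y = rng.normal(size=(N, F_out))

X_f = rng.normal(size=(N, F))                # identical traffic features
X_e_a = rng.normal(size=(N, M))              # knowledge embedding, variant a
X_e_b = rng.normal(size=(N, M))              # knowledge embedding, variant b

grad_a = numeric_grad_E(E, np.concatenate([X_f, X_e_a], axis=1), Y, W_G, b_G)
grad_b = numeric_grad_E(E, np.concatenate([X_f, X_e_b], axis=1), Y, W_G, b_G)
knowledge_changes_update = not np.allclose(grad_a, grad_b)
```

With the traffic features held fixed, swapping the knowledge embedding changes the gradient on `E`, and therefore the next value of the correlation matrix built from `E`.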

4. Results and Discussions

4.1. Datasets

The cellular traffic data for this study were provided by Telecom Italia [2]. The dataset records the Internet usage of Milan’s population. Temporally, the logs span 2 months, from 1 November 2013 to 1 January 2014. Spatially, the dataset is partitioned into a 100 × 100 grid, with each cell measuring 235 m × 235 m. The data are sampled at 10 min intervals, and one traffic record is logged for any grid cell whose users stay online ≥15 min or exchange ≥5 MB. The study area selected for this paper covers a 30 × 30 rectangular grid formed by Square IDs from 5151 to 8080, as shown in Figure 6.
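The 30 × 30 extent follows from the two corner IDs if squares are numbered row by row from 1 across a grid that is 100 squares wide. That numbering scheme is an assumption of the sketch below, and the helper names are hypothetical.

```python
GRID_WIDTH = 100  # the full grid is 100 x 100 squares


def square_id(row: int, col: int) -> int:
    """1-based square ID under assumed row-major numbering (row, col are 0-based)."""
    return row * GRID_WIDTH + col + 1


def study_area_ids(first_id: int = 5151, last_id: int = 8080) -> list:
    """List every square ID in the rectangle spanned by two corner IDs."""
    r0, c0 = divmod(first_id - 1, GRID_WIDTH)   # corner with the smallest ID
    r1, c1 = divmod(last_id - 1, GRID_WIDTH)    # corner with the largest ID
    return [square_id(r, c)
            for r in range(r0, r1 + 1)
            for c in range(c0, c1 + 1)]


ids = study_area_ids()
print(len(ids))  # 900 squares, i.e. 30 rows x 30 columns
```

Under this numbering, ID 5151 sits at 0-based row 51, column 50 and ID 8080 at row 80, column 79, which gives 30 rows and 30 columns.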

4.2. Evaluation Metrics

The forecasting performance of the cellular traffic model is evaluated using a set of established metrics, including root mean square error (RMSE), mean absolute error (MAE), R-squared (R2), and mean absolute percentage error (MAPE). These criteria collectively gauge the model’s accuracy and dependability. For the optimization procedure, the mean square error (MSE) was utilized as the loss function for all models, aiming to minimize the average of the squared differences between the model’s forecasts and the actual observed data. The formulas are as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i(t) - \hat{y}_i(t)\right)^2}$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i(t) - \hat{y}_i(t)\right|$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i(t) - \hat{y}_i(t)\right)^2}{\sum_{i=1}^{N}\left(y_i(t) - \bar{y}_i\right)^2}$$
$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{y_i(t) - \hat{y}_i(t)}{y_i(t)}\right|$$
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i(t) - \hat{y}_i(t)\right)^2$$
where $y_i(t)$, $\hat{y}_i(t)$, and $\bar{y}_i$ are the observed value, the estimated value, and the average of the observed values at cellular cell $i$, respectively, and $N$ is the number of cellular cells.
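The five metrics can be written as a minimal plain-Python sketch. The function names and the three-value example are illustrative only, and the mean in the R² denominator is taken over the supplied observations, which is one reading of the definition above.

```python
import math

def mse(y, y_hat):
    """Mean square error, also used as the training loss."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    return math.sqrt(mse(y, y_hat))

def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def mape(y, y_hat):
    # Undefined when an observation equals zero; filter such cells beforehand.
    return 100.0 / len(y) * sum(abs((a - b) / a) for a, b in zip(y, y_hat))

def r2(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Illustrative values, not taken from the Milan dataset.
observed = [2.0, 4.0, 6.0]
forecast = [1.0, 4.0, 8.0]
scores = {
    "RMSE": rmse(observed, forecast),
    "MAE": mae(observed, forecast),
    "R2": r2(observed, forecast),
    "MAPE": mape(observed, forecast),
}
```

Note that MAPE divides by the observation, so cells with zero traffic need to be masked or excluded before it is evaluated.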

4.3. Training Settings

The data are split into training and test sets in an 8:2 ratio. The training set covers the period from 1 November 2013 to 18 December 2013, and the test set spans 19 December 2013 to 1 January 2014. The hyperparameters were chosen according to validation scores; the hidden size was fixed at 16. The model was implemented with PyTorch 2.9 on an NVIDIA RTX 5070 workstation. For stable convergence, we trained with the Adam optimizer [58], using a learning rate of 0.001 and mini-batches of 32 samples. The input window is 12 time steps (i.e., 2 h), and the forecasting horizons are 3, 6, and 9 steps (30, 60, and 90 min). The data are normalized using Z-score standardization, and samples are drawn sequentially with a stride of 1 step.
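The sampling scheme described above can be sketched for a single cell as follows. This is an illustration under stated assumptions rather than the released pipeline: the series is a toy ramp, the helper names are hypothetical, and fitting the Z-score statistics on the training portion only is a common choice that the text does not specify.

```python
def zscore_fit(values):
    """Return the mean and population standard deviation of a series."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var ** 0.5

def make_windows(series, window=12, horizon=3, stride=1):
    """Slice a series into (input window, forecast target) pairs."""
    samples = []
    last_start = len(series) - window - horizon
    for start in range(0, last_start + 1, stride):
        x = series[start:start + window]
        y = series[start + window:start + window + horizon]
        samples.append((x, y))
    return samples

series = [float(v) for v in range(200)]   # toy series for one cell
split = len(series) * 8 // 10             # chronological 8:2 split
train_raw, test_raw = series[:split], series[split:]

mean, std = zscore_fit(train_raw)         # statistics from training data only (assumption)
train = [(v - mean) / std for v in train_raw]
test = [(v - mean) / std for v in test_raw]

train_samples = make_windows(train, window=12, horizon=3)
test_samples = make_windows(test, window=12, horizon=9)
```

With a stride of 1, a series of length T yields T − window − horizon + 1 samples, so longer horizons produce slightly fewer samples from the same data.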

4.4. Accuracy Comparison with Baselines

We benchmark KESTNN against a spectrum of baselines. These baselines are categorized into four groups to systematically evaluate the challenges of “complex correlation modeling” and “model-blindness”:
  • Graph-free models avoid spatial assumptions entirely, thus isolating temporal modeling capability but ignoring spatial dependencies.
  • Static graph models rely on predefined adjacency (e.g., spatial proximity), which fails to capture complex urban functional correlations.
  • Dynamic graph models learn graph structures and are data-driven but lack semantic guidance, struggling to infer meaningful correlations under data scarcity.
  • Knowledge-embedded models incorporate knowledge as auxiliary features but do not allow knowledge to dynamically guide graph learning, representing a typical “model-blindness” scenario.
Our KESTNN aims to address both limitations by integrating knowledge as a differentiable guide for correlation learning through backpropagation. These four types of models are as follows:
Graph-free models include gradient boosting regression (GBR) [59] and GRU [60]. Static graph models include the temporal graph convolutional network (TGCN) [61], the diffusion convolutional recurrent neural network (DCRNN) [62], and the graph convolutional network sequence-to-sequence model (GCNSeq2Seq). The dynamic graph model is AGCRN [29]. The knowledge-embedded model is the knowledge-driven spatial-temporal graph convolutional network (KSTGCN) [63].
We analyzed the accuracy of multiple baseline models in comparison with the proposed KESTNN model and summarized the specific results in Table 4.
Table 4 indicates that the most accurate baseline is a graph-free model (GRU), as such models (GBR, GRU) avoid spatial adjacency assumptions entirely and thus eliminate erroneous spatial information propagation. Static graph methods (e.g., TGCN, DCRNN, GCNSeq2Seq) encode regular grid cells as spatial adjacency, introducing structural noise that causes spatial information transmission errors and reduces accuracy. Dynamic graph methods reconstruct graph topology in a data-driven manner but lack urban semantic constraints, making it difficult to approximate true spatial dependencies and resulting in relatively low accuracy. KSTGCN enhances node features with knowledge graphs, improving upon TGCN; however, it treats knowledge solely as an auxiliary signal without using prior knowledge to correct spatial structures, which limits its predictive accuracy. KESTNN mitigates spatial structure noise by using data–knowledge collaboration to guide graph construction, so that the learned adjacency matrix reflects semantic and functional relationships rather than mere spatial proximity or data artifacts. As a result, KESTNN achieves the best performance.
As shown in Table 4, KESTNN consistently outperforms all baselines across all step lengths. Compared to the best graph-free model (GRU), KESTNN reduces RMSE by 23.91%, 16.73%, and 10.40% for 3-, 6-, and 9-step forecasts, respectively. Against the best static graph model (GCNSeq2Seq), KESTNN achieves an RMSE reduction of 28.14% for 3-step forecasting. Notably, compared to the dynamic graph model AGCRN and the knowledge-embedded model KSTGCN, KESTNN attains RMSE reductions of 49.01% and 58.32%, respectively, for 3-step forecasting. These gains stem from KESTNN’s ability to collaboratively learn spatial correlations from both data and structured knowledge, whereas AGCRN lacks semantic guidance and KSTGCN treats knowledge as a static feature. The results confirm that our backpropagation-based knowledge integration effectively captures complex urban dependencies and overcomes “model-blindness”.
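The percentage reductions quoted above are presumably relative improvements over the baseline RMSE. A minimal sketch of that calculation follows; the formula is an assumption about how the figures were derived, and the two RMSE values are hypothetical placeholders rather than entries from Table 4.

```python
def rmse_reduction(baseline_rmse, model_rmse):
    """Relative RMSE reduction of a model over a baseline, in percent."""
    return (baseline_rmse - model_rmse) / baseline_rmse * 100.0

# Hypothetical values for illustration only.
hypothetical_baseline_rmse = 40.0
hypothetical_model_rmse = 30.0
reduction = rmse_reduction(hypothetical_baseline_rmse, hypothetical_model_rmse)
```

Because the reduction is normalized by the baseline error, the same absolute gain counts for less against a weaker baseline with a larger RMSE.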
Figure 7 further shows that the forecasting accuracy of all models decreases as the forecasting horizon increases, indicating that long-range forecasting is more challenging than short-range forecasting. The experimental results suggest that the knowledge graph embedding in KESTNN extracts and represents information effectively. In addition, the collaborative data–knowledge learning mechanism captures the complex latent spatial structure and transmits information correctly, which improves the accuracy and interpretability of the forecasting model. Overall, the results demonstrate the strong spatio-temporal modeling capability of KESTNN in the cellular traffic forecasting task.
To further clarify the methodological positioning of KESTNN, we compare it with several representative models in Table 5. These include adaptive graph models (AGCRN, GTS [64]) and knowledge-enhanced models (KSTGCN, ATDM [65], MTGNN [66]). Among them, KSTGCN, ATDM, and MTGNN integrate knowledge via forward concatenation or embedding injection, whereas KESTNN employs collaborative learning through backpropagation. The results show that KESTNN achieves the best performance, confirming that the proposed collaborative mechanism more effectively utilizes knowledge to guide spatial structure learning.

4.5. Spatio-Temporal Analysis of Forecasting Error

4.5.1. Temporal Distribution of Forecasting Error

To analyze how forecasting accuracy fluctuates over a full day, we plot the temporal distribution of RMSE in Figure 8.
Error patterns remain largely consistent across the three forecast horizons. During nighttime (20:00 to 08:00 the following day), the higher-performing models (KESTNN, GRU, DCRNN, GCNSeq2Seq) yield lower RMSE values than in other periods, whereas the poorer-performing models (AGCRN, KSTGCN, TGCN, GBR) yield higher RMSE values than in other periods. At night, traffic flow in urban areas decreases and people concentrate in residential zones. The traffic flow curve becomes smoother, allowing simpler models such as GRU to capture its periodicity and temporal patterns directly, which results in higher forecasting accuracy. In daytime hours, by contrast, traffic flow exhibits sharper temporal swings and faster spatial spread. Models such as AGCRN and KSTGCN can capture long-range spatial information transmission and therefore perform better in the daytime than at night.
Notably, KESTNN maintained a low RMSE in both periods, including the daytime (08:00–20:00), when urban activity intensifies and most models suffer a significant decline in accuracy. At night, prior knowledge embedded in the knowledge graph, such as residential POIs, enabled the model to track the gentle traffic decline in living zones. During the daytime, diverse multi-level semantic relationships (including roads, districts, and administrative zones) guide the dynamic graph convolutions, enabling the timely capture of spatial traffic propagation and functional shifts within the city. KESTNN’s adaptive adjacency matrix module updates online at each time step, allowing the model to continue delivering high-precision forecasts in complex, non-stationary scenarios.
In summary, KESTNN achieves high-accuracy forecasting of cellular traffic through the collaborative learning between data and knowledge.
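The day and night comparison discussed in this subsection amounts to grouping squared errors by time of day before taking the root mean. The sketch below uses the 20:00–08:00 night window from the text; the timestamps, observations, and forecasts are synthetic, and the function names are hypothetical.

```python
import math
from datetime import datetime, timedelta

def is_night(ts):
    """Night is 20:00-08:00, following the split used in the text."""
    return ts.hour >= 20 or ts.hour < 8

def rmse_by_period(timestamps, observed, forecast):
    """RMSE computed separately for daytime and nighttime samples."""
    squared = {"day": [], "night": []}
    for ts, y, y_hat in zip(timestamps, observed, forecast):
        squared["night" if is_night(ts) else "day"].append((y - y_hat) ** 2)
    return {k: math.sqrt(sum(v) / len(v)) for k, v in squared.items() if v}

# One synthetic day at 10 min resolution with larger daytime errors.
start = datetime(2013, 12, 19, 0, 0)
timestamps = [start + timedelta(minutes=10 * i) for i in range(144)]
observed = [10.0] * 144
forecast = [10.0 + (1.0 if is_night(ts) else 3.0) for ts in timestamps]
period_rmse = rmse_by_period(timestamps, observed, forecast)
```

Replacing the two-way split with `ts.hour` as the grouping key would give an hourly curve of the kind a full-day RMSE plot typically shows.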

4.5.2. Spatial Distribution of Forecasting Error

To further reveal the spatial distribution of the KESTNN model’s forecasting performance and its underlying physical logic, we evaluated the forecasting results at 3-, 6-, and 9-step lengths within the study area using MAE. Figure 9 displays the spatial distribution of MAE for the forecasting results and the statistical analysis outcomes.
Figure 9a displays the spatial distribution of MAE across cellular cells, and Figure 9b presents the LISA clustering results for MAE under the p < 0.05 threshold. Across the three step lengths, the errors show a “core-periphery” gradient diffusion pattern. Low-value areas form connected patches across the northern and southwestern sectors (away from the central urban district), exhibiting a distinct low-low (LL) clustering pattern. This indicates that cellular traffic demand in these areas has a relatively simple spatial structure whose spatio-temporal patterns the forecasting model can capture adequately. In contrast, the southeastern part of the study area exhibits a significant high-high (HH) cluster. This region constitutes the urban core, featuring diverse and complex functional buildings such as commercial spaces, residential areas, restaurants, churches, hospitals, and schools, as well as two large parks. The complex urban morphology, frequent human activities, and diverse functional transitions make cellular traffic in this area markedly non-stationary and abrupt, which increases forecasting difficulty and produces the pronounced high-value clustering of MAE. Additionally, high-low (HL) and low-high (LH) anomaly patterns occur rarely, indicating a stable spatial polarization of errors and a continuous model response to urban structure.
Figure 9c shows that, at the p < 0.001 significance level, the global Moran’s I values for step lengths of 3, 6, and 9 are 0.633, 0.731, and 0.674, respectively, all exhibiting strong positive spatial autocorrelation. This indicates that the forecasting errors are not randomly distributed in space, which is consistent with urban functional layouts being driven by significant spatial diffusion-aggregation mechanisms rather than arising at random.
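For readers unfamiliar with the statistic, global Moran’s I on a gridded error map can be sketched as follows. The sketch assumes binary rook-contiguity weights (the four edge-sharing neighbours of each cell), which the text does not specify, and the two 4 × 4 grids are synthetic examples rather than MAE values from the study.

```python
def morans_i(grid):
    """Global Moran's I on a rectangular grid with binary rook-contiguity weights."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(r) for r in grid) / n
    z = [[v - mean for v in r] for r in grid]
    cross, weight_sum = 0.0, 0
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                if 0 <= a < rows and 0 <= b < cols:
                    cross += z[i][j] * z[a][b]   # w_ij = 1 for rook neighbours
                    weight_sum += 1
    denom = sum(v * v for r in z for v in r)
    return (n / weight_sum) * (cross / denom)

# Synthetic maps: errors clustered in one half versus alternating cell by cell.
clustered = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
checkerboard = [[float((i + j) % 2) for j in range(4)] for i in range(4)]
i_clustered = morans_i(clustered)
i_checkerboard = morans_i(checkerboard)
```

Values near +1 indicate that similar errors sit next to each other, as in the clustered map, while values near −1 indicate alternation; significance testing (for example by permutation) is a separate step not shown here.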
Figure 9d uses the natural breaks method to partition the MAE into four classes (Q1–Q4) and draws a box plot for each class. The medians of Q1 to Q4 increase sequentially at all three step lengths, and the box heights expand in step with them. This indicates that areas with larger errors exhibit higher dispersion in their error distribution, a higher probability of extreme values, and poorer forecasting robustness.
In summary, the spatial pattern of KESTNN’s forecasting errors essentially reflects the complex mapping of human–land interactions. The errors exhibit significant clustering in spatial distribution and non-stationary patterns in numerical distribution. However, compared to other models, KESTNN still better models complex spatial relationships and delivers higher-precision forecasting results.

4.6. Assessment of Knowledge Validity

To assess how embedding urban knowledge improves forecasting accuracy, we designed several variants of the KESTNN model and used them to evaluate the effectiveness of the different types of urban knowledge and of knowledge-guided implicit spatial structure modeling. The three variants are w/o road (the attributes of city roads and their semantic association information are omitted when constructing the city relationship mapping), w/o POI (the attributes of city POIs and their semantic association information are omitted), and w/o district (the attributes of city administrative districts and their semantic association information are omitted). Table 6 presents the knowledge validity assessment results for KESTNN and its variants.
Table 6 shows that the full KESTNN achieves the best performance. Among the variants, w/o POI performs worst in the 3-step and 6-step forecasts, which indicates that these horizons are most affected by urban POI knowledge. The w/o road variant performs worst in the 9-step forecast, indicating that the 9-step forecast is most influenced by road knowledge.
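One simple way to realize such ablation variants is to filter the knowledge triplets by knowledge type before the graph is built. The sketch below is purely illustrative: the triplets, entity names, relation names, and the type tag in the fourth field are hypothetical and do not come from the paper’s knowledge graph.

```python
# Hypothetical triplets: (head, relation, tail, knowledge_type).
TRIPLETS = [
    ("cell_5151", "contains", "poi_restaurant", "poi"),
    ("cell_5151", "crossed_by", "road_a", "road"),
    ("cell_5152", "crossed_by", "road_a", "road"),
    ("cell_5151", "located_in", "district_1", "district"),
    ("cell_5152", "contains", "poi_school", "poi"),
]

def ablate(triplets, drop_kind=None):
    """Return the triplets that remain after removing one knowledge type."""
    return [t for t in triplets if t[3] != drop_kind]

variants = {
    "KESTNN": ablate(TRIPLETS),
    "w/o road": ablate(TRIPLETS, "road"),
    "w/o POI": ablate(TRIPLETS, "poi"),
    "w/o district": ablate(TRIPLETS, "district"),
}
```

Each variant would then be trained and evaluated with the same settings as the full model, so that any accuracy difference can be attributed to the removed knowledge type.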
In the 3- and 6-step forecasts, the POI information covers the distribution of various functional areas such as commercial, residential, and recreational areas. People’s activity ranges are relatively fixed in a short period of time, and the distribution of cellular traffic is closely related to the attractiveness and population density of these areas. The spatial distribution and correlation between POIs form a specific traffic pattern. Adjacent or nearby POIs, as well as their combinations, influence cellular traffic patterns. The model is able to use POI spatial correlation information to infer traffic trends in neighboring areas during forecasting.
In the 9-step forecast, the propagation of cellular traffic flow is not only affected by the population activities in the region, but also closely related to the inter-regional traffic flow. The road network determines the path and time cost of people moving between different regions. The clustering and spreading effects of cellular traffic along the road network are more significant when the forecasting time span is longer. The road network reflects the spatial structure and layout of the city.

4.7. Limitation Analysis

This section aims to analyze the limitations of the proposed method—its vulnerability to sudden non-stationary events. By examining the impact patterns of typical event disturbances on forecasting errors, we clarify the applicability of this method and point the way forward for future research.
During the forecasting process, numerous events exert significant influence on the spatio-temporal distribution of forecasting accuracy. In Figure 10, we outline the perturbation mechanisms of three typical events affecting the spatio-temporal variation in cellular traffic forecasting accuracy, alongside their time lag effect diagrams. Figure 10a illustrates nationwide holiday gatherings such as the New Year’s Eve concert at Cathedral Square. The figure indicates that such events trigger abnormal surges in cellular traffic over the subsequent period, thereby degrading forecasting accuracy. This may occur because the dense crowd during the concert engages in multiple communication behaviors—such as photo uploads, live streaming, and instant social interactions—within a confined geographic area, disrupting the stationarity of the time series and reducing model precision. Figure 10b depicts large-scale ceremonial events with social influence, such as the memorial activities organized by the Parliamentary Anti-Mafia Commission. This event also leads to a decline in forecasting accuracy. Events with relatively concentrated audiences tend to attract media livestreaming, journalist coverage, and public discussion, creating localized traffic surges within short timeframes. Figure 10c illustrates urban-scale sudden disturbances like traffic congestion caused by accidents. Such incidents trigger rapid gatherings of vehicles and pedestrians, resulting in localized spikes in cellular traffic over brief periods.
In summary, the above analysis reveals a key limitation of the method presented in this paper: KESTNN remains susceptible to non-stationary nonlinear shocks triggered by sudden events. This delineates the boundaries for the predictive applications of our approach. Therefore, explicitly characterizing event-driven dynamics and developing interpretable models that incorporate sudden demand fluctuations (see the fourth limitation listed in Section 5, Conclusions and Future Works) will be a critical focus of our future research. This is essential for enhancing the robustness and scalability of cellular traffic forecasting methods.

5. Conclusions and Future Works

Forecasting urban cellular traffic is an important and widespread task. However, current spatio-temporal forecasting methods still face two primary challenges: “complex correlation modeling” and “model-blindness”. First, complex urban systems with high human activity make it difficult to learn an accurate correlation matrix, as existing methods typically rely only on proximity or simple attributes, thereby limiting forecasting accuracy. Second, while integrating extensive urban knowledge is essential, current integration methods (often implemented through loss functions or embeddings) neglect the synergistic interactions between data and knowledge, resulting in insufficient model robustness.
Therefore, we develop a novel data–knowledge collaborative learning framework termed the KESTNN. This framework constructs a knowledge graph using triples representing urban structure and learns the correlation matrix through representation learning, thereby tackling the “complex correlation modeling” challenge. By integrating the urban structure knowledge graph with data-driven processes within a unified backpropagation mechanism, this framework enables data–knowledge collaborative learning to overcome “model-blindness”. Finally, the learned correlation matrix is embedded into the forecasting process to achieve precise forecasting. Our framework represents a more efficient and precise data–knowledge collaborative approach, offering a novel forecasting methodology that advances existing theoretical research in cellular traffic forecasting.
Findings from experiments conducted on the Milan cellular traffic real-world dataset indicate the following:
  • KESTNN exhibits optimal performance in 3-step, 6-step, and 9-step forecasting tasks. Compared to the best baseline model, KESTNN reduces RMSE by 23.91%, 16.73%, and 10.40% in 3-step, 6-step, and 9-step forecasts, respectively.
  • KESTNN demonstrates strong generalization capabilities and robustness. Spatio-temporal error analysis validates the auxiliary gains from knowledge embedding in KESTNN for scenarios involving sudden changes and weak correlations.
  • The spatial structural knowledge modeled by KESTNN is effective. Knowledge ablation experiments indicate that short-term forecasting accuracy is primarily constrained by POI semantics, while long-term forecasting relies on road network topology. The hierarchical semantics of administrative divisions provide sustained gains for multi-step forecasts.
The current work still has certain limitations, and we will address these limitations in our future works, specifically the following:
  • The completeness and granularity of the knowledge graph directly determine the upper limit of forecasting accuracy. Future efforts could integrate real-time event streams from social media, emergency dispatch systems, and meteorological environments to construct richer, semantically diverse hybrid knowledge graphs combining static and dynamic data.
  • Current knowledge embedding primarily relies on linear projection and concatenation. Subsequent research may explore incorporating triplet constraints directly into the generation of graph convolution kernels to achieve deeper physical consistency.
  • Although our work validated the effectiveness of KESTNN using the Milan dataset, its framework is generalizable and can be extended to other urban environments. By replacing local POIs, road networks, cellular cells, and administrative boundaries, this method can adapt to diverse urban structures. Future work will further validate its generalization capabilities across multi-city, multi-source datasets. Additionally, we will explore knowledge transfer strategies to enable the model to rapidly adapt to new dataset distributions through a pre-training-fine-tuning approach, thereby enhancing its generalization performance.
  • The analysis results of event perturbations can be used for further research. Future work may incorporate point processes or stochastic differential equations to perform the generative modeling of chain reactions involving events, crowds, and traffic flows, enabling the interpretable and quantifiable modeling of sudden incidents.
  • The Milan dataset used in this study was collected in 2013–2014. While it effectively validates the model’s superiority in modeling complex urban spatial correlations and knowledge collaborative learning mechanisms, it does not reflect the impacts of emerging network technologies like 5G or recent shifts in user behavior. Future research will further validate the framework’s generalization capability and temporal adaptability across multiple cities using updated multi-source datasets.

Author Contributions

Conceptualization, Keyi An, Kaiqi Chen, Min Deng and Kaiyuan Lei; Data Curation, Keyi An and Kaiyuan Lei; Data Analysis, Keyi An; Funding Acquisition, Qiangjun Li, Kaiqi Chen, Min Deng and Yafei Liu; Investigation, Keyi An, Kaiqi Chen and Senzhang Wang; Methodology, Keyi An, Qiangjun Li, Kaiqi Chen and Min Deng; Project Administration, Keyi An; Resources, Qiangjun Li, Kaiqi Chen and Min Deng; Supervision, Kaiqi Chen and Min Deng; Validation, Keyi An; Visualization, Keyi An; Writing—Original Draft, Keyi An and Kaiqi Chen; Writing—Review and Editing, Keyi An, Qiangjun Li and Kaiqi Chen. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation (NSFC) of China, [42501584] and the Open Fund Program of Yunnan Key Laboratory of Intelligent Monitoring and Spatio-temporal Big Data Governance of Natural Resources, [202449CE340023].

Data Availability Statement

The source data and codes that support the findings of this study are available at https://figshare.com/s/9e39b5732b024a25b8fa (accessed on 5 January 2026).

Conflicts of Interest

Authors Qiangjun Li and Yafei Liu were employed by the Yunnan Institute of Geology and Mineral Surveying and Mapping Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Amini, M.; Stanica, R.; Rosenberg, C. Where are the (cellular) data? ACM Comput. Surv. 2023, 56, 1–25.
  2. Barlacchi, G.; De Nadai, M.; Larcher, R.; Casella, A.; Chitic, C.; Torrisi, G.; Antonelli, F.; Vespignani, A.; Pentland, A.; Lepri, B. A multi-source dataset of urban life in the city of Milan and the Province of Trentino. Sci. Data 2015, 2, 150055.
  3. Gao, Z. 5G traffic prediction based on deep learning. Comput. Intell. Neurosci. 2022, 2022, 3174530.
  4. Krige, D.G. A statistical approach to some basic mine valuation problems on the Witwatersrand. J. South. Afr. Inst. Min. Metall. 1951, 52, 119–139.
  5. Brunsdon, C.; Fotheringham, A.S.; Charlton, M.E. Geographically Weighted Regression: A Method for Exploring Spatial Nonstationarity. Geogr. Anal. 1996, 28, 281–298.
  6. Martin, R.L.; Oeppen, J.E. The identification of regional forecasting models using space: Time correlation functions. Trans. Inst. Br. Geogr. 1975, 66, 95–118.
  7. Xu, L.; Chen, N.; Chen, Z.; Zhang, C.; Yu, H. Spatio-temporal forecasting in earth system science: Methods, uncertainties, predictability and future directions. Earth-Sci. Rev. 2021, 222, 103828.
  8. Tobler, W.R. A computer movie simulating urban growth in the Detroit region. Econ. Geogr. 1970, 46, 234–240.
  9. Noulas, A.; Scellato, S.; Lambiotte, R.; Pontil, M.; Mascolo, C. A tale of many cities: Universal patterns in human urban mobility. PLoS ONE 2012, 7, e37027.
  10. Zhao, B.; Deng, M.; Shi, Y. Inferring nonwork travel semantics and revealing the nonlinear relationships with the community built environment. Sustain. Cities Soc. 2023, 99, 104889.
  11. Zhao, B.; Deng, Y.; Luo, L.; Deng, M.; Yang, X. Preferred streets: Assessing the impact of the street environment on cycling behaviors using the geographically weighted regression. Transportation 2025, 52, 1485–1511.
  12. Wang, X.; Wang, Z.; Yang, K.; Song, Z.; Bian, C.; Feng, J.; Deng, C. A survey on deep learning for cellular traffic prediction. Intell. Comput. 2024, 3, 0054.
  13. Gu, B.; Zhan, J.; Gong, S.; Liu, W.; Su, Z.; Guizani, M. A spatial-temporal transformer network for city-level cellular traffic analysis and prediction. IEEE Trans. Wirel. Commun. 2023, 22, 9412–9423.
  14. Lu, J.; Chen, Z. Spatio-temporal graph attention network and graph-based Transformer architecture for distributed urban wind sequence reconstruction and forecasting. Measurement 2025, 252, 117400.
  15. Chen, K.; Chu, G.; Yang, X.; Shi, Y.; Lei, K.; Deng, M. HSETA: A heterogeneous and sparse data learning hybrid framework for estimating time of arrival. IEEE Trans. Intell. Transp. Syst. 2022, 23, 21873–21884.
  16. Deng, M.; Chen, K.; Lei, K.; Chen, Y.; Shi, Y. MVCV-Traffic: Multiview road traffic state estimation via cross-view learning. Int. J. Geogr. Inf. Sci. 2023, 37, 2205–2237.
  17. Santos Escriche, E.; Vassaki, S.; Peters, G. A comparative study of cellular traffic prediction mechanisms. Wirel. Netw. 2023, 29, 2371–2389.
  18. Chen, G.; Guo, Y.; Zeng, Q.; Zhang, Y. A novel cellular network traffic prediction algorithm based on graph convolution neural networks and long short-term memory through extraction of spatial-temporal characteristics. Processes 2023, 11, 2257.
  19. Alzate Mejia, N.; Perelló, J.; Santos-Boada, G.; de Almeida-Amazonas, J.R. Evaluating a Multidisciplinary Model for Managing Human Uncertainty in 5G Cyber–Physical–Social Systems. Appl. Sci. 2024, 14, 8786.
  20. Zhang, C.; Zhang, H.; Yuan, D.; Zhang, M. Citywide cellular traffic prediction based on densely connected convolutional neural networks. IEEE Commun. Lett. 2018, 22, 1656–1659.
  21. Zhao, N.; Ye, Z.; Pei, Y.; Liang, Y.; Niyato, D. Spatial-temporal attention-convolution network for citywide cellular traffic prediction. IEEE Commun. Lett. 2020, 24, 2532–2536.
  22. Sun, F.; Wang, P.; Zhao, J.; Xu, N.; Zeng, J.; Tao, J. Mobile data traffic prediction by exploiting time-evolving user mobility patterns. IEEE Trans. Mob. Comput. 2021, 21, 4456–4470.
  23. Chen, K.; Tan, X.; Deng, M.; Lei, K.; Yang, W.; Liu, H.; Huang, C. Learning dynamic relational heterogeneity for spatio-temporal prediction with geographical meta-knowledge. Int. J. Geogr. Inf. Sci. 2025, 39, 2913–2942.
  24. Lei, K.; Chen, K.; Deng, M.; Tan, X.; Yang, W.; Liu, H.; Huang, C. CSSKL: Collaborative Specific-Shared Knowledge Learning framework for cross-city spatio-temporal forecasting in cellular networks. Int. J. Geogr. Inf. Sci. 2025, 39, 1391–1428.
  25. Calabrese, F.; Diao, M.; Di Lorenzo, G.; Ferreira, J., Jr.; Ratti, C. Understanding individual mobility patterns from urban sensing data: A mobile phone trace example. Transp. Res. Part C Emerg. Technol. 2013, 26, 301–313.
  26. Zhang, D.; Liu, L.; Xie, C.; Yang, B.; Liu, Q. Citywide cellular traffic prediction based on a hybrid spatio-temporal network. Algorithms 2020, 13, 20.
  27. Zhao, N.; Wu, A.; Pei, Y.; Liang, Y.; Niyato, D. Spatial-temporal aggregation graph convolution network for efficient mobile cellular traffic prediction. IEEE Commun. Lett. 2021, 26, 587–591.
  28. Chen, K.; Liu, E.; Deng, M.; Tan, X.; Wang, J.; Shi, Y.; Wang, Z. DKNN: Deep kriging neural network for interpretable geospatial interpolation. Int. J. Geogr. Inf. Sci. 2024, 38, 1486–1530.
  29. Bai, L.; Yao, L.; Li, C.; Wang, X.; Wang, C. Adaptive graph convolutional recurrent network for traffic forecasting. arXiv 2020, arXiv:2007.02842.
  30. Wang, X.; Wang, Z.; Yang, K.; Song, Z.; Feng, J.; Zhu, L.; Deng, C. Deep learning based traffic prediction in mobile network-a survey. TechRxiv 2023.
  31. Wang, X.; Yang, K.; Wang, Z.; Feng, J.; Zhu, L.; Zhao, J. Adaptive hybrid spatial-temporal graph neural network for cellular traffic prediction. In Proceedings of the ICC 2023-IEEE International Conference on Communications, Rome, Italy, 28 May–1 June 2023; pp. 4026–4032.
  32. Jiang, W. Cellular traffic prediction with machine learning: A survey. Expert Syst. Appl. 2022, 201, 117163.
  33. Kaur, J.; Khan, M.A.; Iftikhar, M.; Imran, M.; Emad Ul Haq, Q. Machine learning techniques for 5G and beyond. IEEE Access 2021, 9, 23472–23488.
  34. Haidine, A.; Salmam, F.Z.; Aqqal, A.; Dahbi, A. Artificial intelligence and machine learning in 5G and beyond: A survey and perspectives. In Moving Broadband Mobile Communications Forward: Intelligent Technologies for 5G and Beyond; IntechOpen: London, UK, 2021; p. 47.
  35. Rzeszótko, J.; Nguyen, S.H. Machine learning for traffic prediction. Fundam. Informaticae 2012, 119, 407–420.
  36. Pan, B.; Demiryurek, U.; Shahabi, C. Utilizing real-world transportation data for accurate traffic prediction. In Proceedings of the 2012 IEEE 12th International Conference on Data Mining, Brussels, Belgium, 10–13 December 2012; pp. 595–604.
  37. Wang, L.; Zang, C.; Cheng, Y. The short-term prediction of the mobile communication traffic based on the product seasonal model. SN Appl. Sci. 2022, 2, 399.
  38. Dang, X.; Yan, L. Traffic flow prediction based on multivariate linear AR model. Comput. Eng. 2012, 38, 84–86, 89.
  39. Tran, Q.T.; Hao, L.; Trinh, Q.K. A novel procedure to model and forecast mobile communication traffic by ARIMA/GARCH combination models. In Proceedings of the 2016 International Conference on Modeling, Simulation and Optimization Technologies and Applications (MSOTA2016), Xiamen, China, 18–19 December 2016; pp. 29–34.
  40. Sun, H.; Liu, H.; Xiao, H.; Ran, B. Short Term Traffic Forecasting Using the Local Linear Regression Model. 2002. Available online: https://escholarship.org/uc/item/540301xx (accessed on 20 May 2024).
  41. Rizwan, A.; Arshad, K.; Fioranelli, F.; Imran, A.; Imran, M.A. Mobile internet activity estimation and analysis at high granularity: SVR model approach. In Proceedings of the 2018 IEEE 29th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), Bologna, Italy, 9–12 September 2018; pp. 1–7.
  42. Qiu, C.; Zhang, L.Y.; Feng, Z.; Zhang, P.; Cui, S. Spatio-temporal wireless traffic prediction with recurrent neural network. IEEE Wirel. Commun. Lett. 2018, 7, 554–557.
  43. Wang, J.; Tang, J.; Xu, Z.; Wang, Y.; Xue, G.; Zhang, X. Spatio-temporal modeling and prediction in cellular networks: A big data enabled deep learning approach. In Proceedings of the IEEE INFOCOM 2017—IEEE Conference on Computer Communications, Atlanta, GA, USA, 1–4 May 2017; pp. 1–9.
  44. Trinh, H.D.; Giupponi, L.; Dini, P. Mobile traffic prediction from raw data using LSTM networks. In Proceedings of the 2018 IEEE 29th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), Bologna, Italy, 9–12 September 2018; pp. 1827–1832.
  45. Ngo, D.; Piamrat, K.; Aouedi, O.; Hassan, T.; Parvédy, P.R. FLEXIBLE: Forecasting Cellular Traffic by Leveraging Explicit Inductive Graph-Based Learning. In Proceedings of the 2024 IEEE 35th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), Valencia, Spain, 2–5 September 2024; pp. 1–6.
  46. Wang, J.; Shen, L.; Fan, W. A TSENet Model for Predicting Cellular Network Traffic. Sensors 2024, 24, 1713.
  47. Mao, H.; Zhao, X.; Du, S.; Teng, F.; Li, T. Short-term Subway Passenger Flow Forecasting Based on Graphical Embedding of Temporal Knowledge. Comput. Sci. 2023, 50, 213–220.
  48. Li, J.; Li, Y. Spatial-temporal multi-graph convolution for traffic flow prediction by integrating knowledge graphs. J. Zhejiang Univ. (Eng. Sci.) 2024, 58, 1366–1376.
  49. Gong, J.; Li, T.; Wang, H.; Liu, Y.; Wang, X.; Wang, Z.; Deng, C.; Feng, J.; Jin, D.; Li, Y. Kgda: A knowledge graph driven decomposition approach for cellular traffic prediction. ACM Trans. Intell. Syst. Technol. 2024, 15, 1–22.
  50. Liu, L.; Yu, S.; Wang, R.; Ma, Z.; Shen, Y. How can large language models understand spatial-temporal data? arXiv 2024, arXiv:2401.14192. [Google Scholar] [CrossRef]
  51. Jin, M.; Wang, S.; Ma, L.; Chu, Z.; Zhang, J.Y.; Shi, X.; Chen, P.; Liang, Y.; Li, Y.; Pan, S.; et al. Time-llm: Time series forecasting by reprogramming large language models. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 7–11 May 2024. [Google Scholar] [CrossRef]
  52. Sun, C.; Li, Y.; Li, H.; Hong, S. Test: Text prototype aligned embedding to activate llm’s ability for time series. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 7–11 May 2024. [Google Scholar] [CrossRef]
  53. Cao, D.; Jia, F.; Arik, S.O.; Pfister, T.; Zheng, Y.; Ye, W.; Liu, Y. Tempo: Prompt-based generative pre-trained transformer for time series forecasting. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 7–11 May 2024. [Google Scholar] [CrossRef]
  54. Liu, C.; Yang, S.; Xu, Q.; Li, Z.; Long, C.; Li, Z. Spatial-temporal large language model for traffic prediction. In Proceedings of the 2024 25th IEEE International Conference on Mobile Data Management (MDM), Brussels, Belgium, 24–27 June 2024; IEEE: New York, NY, USA, 2024; pp. 31–40. [Google Scholar] [CrossRef]
  55. Chen, J.; Shao, Q.; Chen, D.; Yu, W. Decoupling Spatio-Temporal Prediction: When Lightweight Large Models Meet Adaptive Hypergraphs. arXiv 2025, arXiv:2505.19620. [Google Scholar] [CrossRef]
  56. Lee, G.; Yu, W.; Shin, K.; Cheng, W.; Chen, H. Timecap: Learning to contextualize, augment, and predict time series events with large language model agents. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 18082–18090. [Google Scholar] [CrossRef]
  57. Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI 2015), Austin, TX, USA, 25–30 January 2015. [Google Scholar]
  58. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar] [CrossRef]
  59. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
  60. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar] [CrossRef]
  61. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-GCN: A temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3848–3858. [Google Scholar] [CrossRef]
  62. Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv 2017, arXiv:1707.01926. [Google Scholar] [CrossRef]
  63. Zhu, J.; Han, X.; Deng, H.; Tao, C.; Zhao, L.; Wang, P.; Lin, T.; Li, H. KST-GCN: A knowledge-driven spatial-temporal graph convolutional network for traffic forecasting. IEEE Trans. Intell. Transp. Syst. 2022, 23, 15055–15065. [Google Scholar] [CrossRef]
  64. Han, P.; Wang, J.; Yao, D.; Shang, S.; Zhang, X. A graph-based approach for trajectory similarity computation in spatial networks. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Virtual Event, 14–18 August 2021; pp. 556–564. [Google Scholar] [CrossRef]
  65. Medrano, R.; Aznarte, J.L. On the inclusion of spatial information for spatio-temporal neural networks. Neural Comput. Appl. 2021, 33, 14723–14740. [Google Scholar] [CrossRef]
  66. Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Chang, X.; Zhang, C. Connecting the dots: Multivariate time series forecasting with graph neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; pp. 753–763. [Google Scholar] [CrossRef]
Figure 1. Main challenges in cellular traffic forecasting. (a) Cellular traffic patterns in three different areas of the city of Milan. (b) Cellular traffic patterns for regions near Parco Sempione versus regions farther away, illustrating the importance of correlation. (c) Regions A and B are farther apart but lie on the same highway, whereas Regions B and C are closer together but Region B lies on the highway and Region C on the railroad, illustrating the importance of urban knowledge.
Figure 2. Knowledge graph of urban structure.
Figure 3. Projecting knowledge onto the relational space.
Figure 4. Splicing and fusion of urban prior knowledge and historical data features.
Figure 5. Spatio-temporal learning model for cellular traffic forecasting.
Figure 6. Study area of the dataset. (a) Location of Italy in Europe. (b) Location of Milan in Italy. (c) Regions represent the study area.
Figure 7. Accuracy comparison of cellular traffic forecasting models. (a) Comparison of RMSE metrics across different forecasting models at various step lengths. (b) Comparison of R² metrics across different forecasting models at various step lengths.
Figure 8. Temporal distribution of the RMSE for forecasting results (from left to right: errors for 3-step, 6-step, and 9-step forecasts).
Figure 9. Spatial distribution and statistical analysis of MAE for forecasting results. (a) Spatial distribution of MAE for forecasting results (from left to right: errors for 3-step, 6-step, and 9-step forecasts; subsequent figures follow the same order). (b) LISA significance clustering of MAE for forecasting results (p < 0.05). (c) Global Moran’s I scatter plot distribution of MAE for forecasting results (p < 0.001). (d) Box plot grouped by quartiles of MAE for forecasting results.
Figure 10. Three typical events affecting forecasting accuracy and their delayed effects. (a) Delayed impact of the New Year’s Eve concert at Cathedral Square on forecasting accuracy. (b) Delayed impact of the Parliamentary Anti-Mafia Commission memorial event on forecasting accuracy. (c) Delayed impact of traffic congestion caused by an accident on forecasting accuracy.
Table 1. Entity definition table.
| Entity | Description | Symbol |
|---|---|---|
| Region | Cellular traffic cell | [image i001] |
| District | Administrative division area | [image i002] |
| POI | City points of interest | [image i003] |
| Road | City road segment | [image i004] |
Table 2. Entity relationship table.
| Head Entity | Tail Entity | Relationship | Symbol |
|---|---|---|---|
| Region | Region | Region adjacent to Region | [image i005] |
| District | District | District adjacent to District | [image i006] |
| POI | Region | POI located at Region | [image i007] |
| POI | District | POI located at District | [image i008] |
| Road | Region | Road intersects with Region | [image i009] |
| Region | District | Region contained within District | [image i010] |
| Road | District | Road subordinate to District | [image i011] |
Table 3. Entity attribute information table.
| Entity | Attribute Information | Symbol |
|---|---|---|
| POI | POI belongs to POI category | [image i012] |
| POI | POI belongs to POI subcategory | [image i013] |
| Road | Road belongs to Road type | [image i014] |
| District | District belongs to District level 1 (the first level) | [image i015] |
| District | District belongs to District level 2 (the second level) | [image i016] |
Table 4. Comparison of accuracy with baseline models.
| Model | RMSE (3 steps) | MAE (3 steps) | R² (3 steps) | RMSE (6 steps) | MAE (6 steps) | R² (6 steps) | RMSE (9 steps) | MAE (9 steps) | R² (9 steps) |
|---|---|---|---|---|---|---|---|---|---|
| GBR | 75.1217 | 47.1178 | 0.7372 | 77.7935 | 48.8656 | 0.7172 | 81.2973 | 50.9675 | 0.6901 |
| GRU | 34.1902 | 17.4987 | 0.9456 | 39.2240 | 17.5402 | 0.9281 | 42.9152 | 18.8079 | 0.9137 |
| TGCN | 62.8802 | 40.8986 | 0.8159 | 65.8941 | 42.7381 | 0.7971 | 70.7121 | 44.0781 | 0.7656 |
| DCRNN | 41.0592 | 23.2137 | 0.9215 | 45.0818 | 20.4092 | 0.9050 | 46.8489 | 25.6928 | 0.8971 |
| GCNSeq2Seq | 36.2036 | 16.7790 | 0.9390 | 45.6397 | 24.3235 | 0.9027 | 53.0462 | 27.7382 | 0.8681 |
| AGCRN | 51.0185 | 33.6359 | 0.8788 | 53.5359 | 35.3194 | 0.8661 | 57.2496 | 37.6483 | 0.8463 |
| KSTGCN | 62.4183 | 41.1469 | 0.8186 | 69.0864 | 45.0657 | 0.7770 | 69.8529 | 44.2113 | 0.7712 |
| KESTNN | 26.0137 | 14.4724 | 0.9685 | 32.6603 | 18.3016 | 0.9502 | 38.4522 | 21.6237 | 0.9307 |
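The RMSE reductions quoted in the abstract can be cross-checked against Table 4. The following is a minimal sketch, assuming the reduction is computed as (baseline − KESTNN) / baseline against the lowest-RMSE baseline at each horizon; the RMSE values are transcribed from Table 4, and the function names are illustrative rather than taken from the paper.

```python
# RMSE values transcribed from Table 4 for the 3-, 6-, and 9-step forecasts.
baseline_rmse = {
    "GBR": [75.1217, 77.7935, 81.2973],
    "GRU": [34.1902, 39.2240, 42.9152],
    "TGCN": [62.8802, 65.8941, 70.7121],
    "DCRNN": [41.0592, 45.0818, 46.8489],
    "GCNSeq2Seq": [36.2036, 45.6397, 53.0462],
    "AGCRN": [51.0185, 53.5359, 57.2496],
    "KSTGCN": [62.4183, 69.0864, 69.8529],
}
kestnn_rmse = [26.0137, 32.6603, 38.4522]


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline` (assumed definition)."""
    return 100.0 * (baseline - proposed) / baseline


def reductions_vs_best(baselines, proposed):
    """For each horizon, find the lowest-RMSE baseline and the reduction against it."""
    results = []
    for i, value in enumerate(proposed):
        best = min(baselines, key=lambda name: baselines[name][i])
        results.append((best, relative_reduction(baselines[best][i], value)))
    return results


for steps, (name, pct) in zip((3, 6, 9), reductions_vs_best(baseline_rmse, kestnn_rmse)):
    print(f"{steps}-step: RMSE {pct:.2f}% lower than {name}")
```

Under this definition, GRU is the lowest-RMSE baseline at all three horizons, and the computed reductions are 23.91%, 16.73%, and 10.40%, which agrees with the figures in the abstract.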
Table 5. Comparison of accuracy with adaptive graph models and knowledge-enhanced models.
| Model | RMSE (3 steps) | MAE (3 steps) | R² (3 steps) | RMSE (6 steps) | MAE (6 steps) | R² (6 steps) | RMSE (9 steps) | MAE (9 steps) | R² (9 steps) |
|---|---|---|---|---|---|---|---|---|---|
| ATDM | 71.5012 | 42.7988 | 0.7620 | 67.1569 | 37.0764 | 0.7893 | 69.6654 | 41.1105 | 0.7725 |
| KSTGCN | 62.4183 | 41.1469 | 0.8186 | 69.0864 | 45.0657 | 0.7770 | 69.8529 | 44.2113 | 0.7712 |
| AGCRN | 51.0185 | 33.6359 | 0.8788 | 53.5359 | 35.3194 | 0.8661 | 57.2496 | 37.6483 | 0.8463 |
| GTS | 41.1444 | 21.4566 | 0.9212 | 46.3424 | 25.4622 | 0.8997 | 51.9245 | 28.9784 | 0.8736 |
| MTGNN | 38.8506 | 18.4130 | 0.9297 | 39.9455 | 21.6301 | 0.9254 | 45.0419 | 23.4755 | 0.9049 |
| KESTNN | 26.0137 | 14.4724 | 0.9685 | 32.6603 | 18.3016 | 0.9502 | 38.4522 | 21.6237 | 0.9307 |
Table 6. Results of the evaluation of KESTNN and its variants.
| Model | Steps | RMSE | MAE | MAPE |
|---|---|---|---|---|
| w/o road | 3 | 26.0405 | 14.4912 | 11.5307% |
| w/o POI | 3 | 25.9929 | 15.1774 | 12.0768% |
| w/o district | 3 | 25.9311 | 14.7910 | 11.7692% |
| KESTNN | 3 | 26.0137 | 14.4724 | 11.5158% |
| w/o road | 6 | 32.8046 | 18.3050 | 14.6380% |
| w/o POI | 6 | 32.7416 | 18.3178 | 14.6482% |
| w/o district | 6 | 32.7240 | 18.2855 | 14.6224% |
| KESTNN | 6 | 32.6603 | 18.3016 | 14.5742% |
| w/o road | 9 | 38.8598 | 21.8964 | 17.4506% |
| w/o POI | 9 | 38.7657 | 21.6568 | 17.2597% |
| w/o district | 9 | 38.8894 | 21.7707 | 17.3505% |
| KESTNN | 9 | 38.4522 | 21.6237 | 17.2333% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

An, K.; Li, Q.; Chen, K.; Deng, M.; Liu, Y.; Wang, S.; Lei, K. Data–Knowledge Collaborative Learning Framework for Cellular Traffic Forecasting via Enhanced Correlation Modeling. ISPRS Int. J. Geo-Inf. 2026, 15, 43. https://doi.org/10.3390/ijgi15010043
