You are currently viewing a new version of our website. To view the old version click .
Electronics
  • Article
  • Open Access

13 October 2025

A Traffic Forecasting Framework for Cellular Networks Based on a Dynamic Component Management Mechanism

,
,
,
,
,
,
and
1
School of Space Information, Space Engineering University, Beijing 101416, China
2
China Unicom Research Institute, Beijing 100176, China
3
School of Traffic and Civil Engineering, Shandong Traffic Institute of Civil Engineering, Jinan 250357, China
*
Author to whom correspondence should be addressed.

Abstract

Accurate forecasting of cellular traffic in non-stationary environments remains a formidable challenge, as real-world traffic patterns dynamically evolve, emerge, and vanish over time. To tackle this, we propose a novel meta-learning framework, GMM-SCM-DCM, which features a Dynamic Component Management (DCM) mechanism. This framework employs a Gaussian Mixture Model (GMM) for probabilistic meta-feature representation. The core innovation, the DCM mechanism, enables online structural evolution of the meta-learner by dynamically splitting, merging, or pruning Gaussian components based on a bimodal similarity metric, ensuring sustained alignment with shifting data distributions. A Single-Component Mechanism (SCM) is utilized for precise base learner initialisation. To ensure a rigorous and realistic validation, we reconstructed the Telecom Italia Milan dataset by applying unsupervised clustering and meta-feature engineering to identify and label four distinct functional zones: residential, commercial, mixed use, and crucially, non-stationary areas. This curated dataset provides a critical testbed for non-stationary forecasting. Comprehensive experiments demonstrate that our model significantly outperforms traditional methods and meta-learning baselines, achieving a 9.3% reduction in MAE and approximately 70% faster convergence. The model’s superiority is further confirmed through extensive ablation studies, robustness tests across base learners and data scales, and successful cross-dataset validation on the Shanghai Telecom dataset, showcasing its exceptional generalization capability and practical utility for real-world network management.

1. Introduction

With the rapid advancement of fifth-generation mobile communications technology (5G) and Internet of Things (IoT), mobile network traffic demonstrates explosive growth and high dynamism. 5G and its evolving technology, B5G (Beyond 5G) networks, have introduced three typical application scenarios: enhanced Mobile Broadband (eMBB), massive Machine-Type Communications (mMTCs), and Ultra-Reliable Low-Latency Communications (URLLCs). These scenarios pose new challenges for traffic prediction. The eMBB scenario, characterized by high-speed data transmission, exhibits bursty traffic patterns and high bandwidth demands, with the primary prediction target being throughput. The mMTC scenario, characterized by massive device connectivity, shifts the prediction focus from throughput to the number of connections. The URLLC scenario requires extremely low latency and high reliability, necessitating the prediction of network load to ensure quality of service. Typical applications of these scenarios include smart cities, connected vehicles, and the industrial Internet. Accurate traffic prediction not only enhances user experience but also significantly reduces operational costs, improving the autonomy and intelligence of networks.
Traditional traffic prediction methods are mostly based on statistical models or shallow machine learning approaches. The Seasonal Autoregressive Integrated Moving Average (SARIMA) model has long been used for communication traffic prediction due to its effective characterization of seasonality and trend components []. Support Vector Regression (SVR), on the other hand, handles nonlinear relationships through kernel functions and has demonstrated advantages in early traffic prediction []. Although these methods perform well in stationary time series prediction, their assumptions of linear or static modeling make it difficult to capture the non-stationarity and multimodal evolution characteristics caused by mixed services, user mobility, and sudden events in 5G environments []. In recent years, deep learning models such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) have been widely applied to traffic prediction tasks due to their powerful sequence modeling capabilities [,]. However, these methods typically rely on large amounts of labeled data for training and exhibit limited generalization capability when faced with data distribution shifts or unseen traffic patterns, often leading to performance degradation []. This makes it challenging for them to adapt to the dynamic and variable environment of 5G networks.
Meta-learning, as a “learning to learn” paradigm, enables models to quickly adapt to new tasks by extracting shared knowledge from multiple related tasks, making it particularly suitable for addressing model initialisation problems in few-shot scenarios []. In the field of traffic prediction, meta-learning constructs abstract representations of task distributions to provide personalized model initialisation strategies for different base stations or regions, thereby significantly improving prediction efficiency and accuracy. However, most existing meta-learning traffic prediction models are based on the assumption of stationary traffic patterns. Their meta-learners (such as K-Nearest Neighbors, deep neural networks, or Gaussian mixture models) often employ fixed structures or a fixed number of components, making it difficult to effectively handle concept drift issues caused by factors like changes in base station functionality or the introduction of new services in real-world networks.
The core challenge of non-stationary traffic patterns lies in their underlying data distributions, which are not static but undergo structural shifts over time, space, and external events (such as holidays, emergencies, or cyberattacks). These manifest as: (1) the emergence of new patterns (e.g., newly built metro stations or schools); (2) the disappearance of old patterns (e.g., decommissioned factories); (3) the merging and splitting of patterns (e.g., evolution of mixed commercial–residential zones); and (4) the ambiguity of pattern boundaries (e.g., functionally mixed areas). These changes cause meta-learning models based on fixed component numbers (e.g., GMM with fixed K values) to suffer from pattern omission, component degradation, and inefficient resource allocation, severely limiting their applicability and scalability in real-world networks.
To address these challenges, this paper proposes a meta-learning traffic forecasting model based on a Dynamic Component Management (DCM) mechanism, termed GMM-SCM-DCM. Building upon the Gaussian Mixture Model (GMM) as its meta-learning backbone, the model introduces dynamic component splitting and merging mechanisms to achieve adaptive tracking and modeling of non-stationary traffic patterns. Specifically, a bimodal similarity metric combining probabilistic and spatial similarity is designed to evaluate pattern novelty in new tasks, thereby triggering component generation, merging, or removal operations. Furthermore, a Single-Component Mechanism (SCM) is employed as the initial weight allocation strategy for the base learner, mitigating pattern confusion and training instability potentially caused by multi-component synthesis.
The major contributions of this paper include:
  • Proposing a meta-learning framework with a DCM mechanism, enabling the model to perform online identification, adaptation, and evolution of 5G traffic patterns.
  • Designing a dual-modal similarity metric and a three-layer response strategy to achieve fine-grained initialisation and rapid adaptation for different service modes (such as eMBB and mMTC).
  • Constructing a non-stationary traffic evaluation benchmark incorporating typical 5G service modes based on a real cellular dataset, and systematically validating the model’s advantages in prediction accuracy, convergence speed, and generalization capability.
The paper is structured as follows. Section 2 systematically reviews 5G/B5G requirements and existing traffic prediction approaches. Section 3 details the GMM-SCM-DCM architecture and algorithms. Section 4 describes the experimental setup and datasets. Section 5 presents and discusses the results. Finally, Section 6 concludes the paper and suggests future research directions.

3. Model Architecture

In this section, we first outline the proposed traffic forecasting framework, GMM-SCM-DCM. Building upon this, we define the foundational learner, the meta-learner, and the DCM mechanism for long-term forecasting tasks.
As illustrated in Figure 1, the GMM-SCM-DCM model comprises three steps. The first step subjects cellular traffic payload data to processing through operations such as the Fast Fourier Transform (FFT) to derive the traffic feature frequencies for each cellular network, which are then used to train GMMs. The second step feeds the frequency–domain features of new traffic tasks into the DCM for novelty detection. If the traffic pattern is identified as existing, the SCM mechanism assigns initial weights and executes the prediction task. If the pattern is classified as a similar pattern, it provides similarity weights and then fine-tunes using a fine-tuned dataset. If the pattern is classified as a completely new traffic pattern, a new component must be created; this step is termed component splitting. The third step utilizes the soft weights produced by the GMM to select an appropriate base learner for network traffic load prediction.
Figure 1. GMM-SCM-DCM model architecture.

3.1. Problem Definition

This work addresses the prediction of total throughput at the base station level in 5G cellular networks, a critical input for network resource scheduling and capacity planning. We formulate the cellular traffic load prediction problem as a multivariate time series forecasting task. Consider N cellular base stations where the traffic load of each base station at time t is denoted as x t . Given an observed historical sequence X = [ x t T + 1 , , x t ] of T time steps, the objective is to predict the traffic load for the next τ time steps:
Y = [ x t + 1 , x t + 2 , , x t + τ ] = f ( X | Θ )
where f is the prediction model and Θ represents the model parameters. In the meta-learning framework, each base station or region is treated as a task T i , with its training data D i t r a i n and test data D i t e s t . The objective of the meta-learner is to learn shared knowledge from multiple tasks, enabling new task T N e w to rapidly adapt with only a few samples:
Θ N e w = Meta - Learner ( D n e w t r a i n ; Φ )
where Φ is the parameter of the meta-learner.

3.2. Analysis of Dataset and Spatio-Temporal Characteristics

The experiments in this study are based on a publicly available mobile network traffic dataset collected by Telecom Italia in Milan. This dataset contains traffic records from November 2013 to January 2014, covering approximately 10,000 grid cells.
To construct a set of feature vectors suitable for this study, the time series data within the dataset was converted into frequency–domain information through the following steps:
(1)
Time series construction: The entire time span of the dataset was divided into consecutive one-hour time intervals.
(2)
Data normalization: The traffic load was normalized to the range [0, 1] using the Min-Max algorithm. The normalized sequences distinctly reveal the diverse traffic patterns exhibited by cells across different functional zones. As illustrated in Figure 2, three representative cells display markedly different temporal characteristics during identical time periods, demonstrating the significant impact of functional differentiation within urban spaces on communication behavior. Residential areas exhibit stronger evening peak pulses (18:00–21:00), with a notable increase in baseline traffic throughout weekends. Commercial districts exhibit sharper morning peak traffic (8–11 a.m.) with markedly reduced weekend volumes. Mixed-use areas display distinct dual peaks during morning and evening hours, exhibiting moderate weekend effects.
Figure 2. Time–domain characteristics across different cells.
(3)
Feature extraction: Leveraging the periodic nature of cellular network traffic, each cell’s traffic load data is treated as a discrete signal with a period T = 168 h (one week). Its frequency domain characteristics are analyzed via Fast Fourier Transform (FFT). Five principal frequency components (corresponding to periods of 1 week, 1 day, 12 h, 8 h, and 6 h) are selected from the frequency domain information. Their real and imaginary parts collectively form a 10-dimensional feature vector:
Γ p = R ( F p ( π / 84 ) ) , S ( F p ( π / 84 ) ) , R ( F p ( π / 12 ) ) , S ( F p ( π / 12 ) ) , R ( F p ( π / 6 ) ) , S ( F p ( π / 6 ) ) , R ( F p ( π / 4 ) ) , S ( F p ( π / 4 ) ) , R ( F p ( π / 3 ) ) , S ˜ ( F p ( π / 3 ) )
(4)
Foundational sample construction: Formulating the traffic forecasting task as a supervised learning problem, where each foundational sample comprises data from the preceding three hours as input, with the fourth hour serving as the forecast output.
Unlike previous studies which assumed a steady-state environment, this paper focuses on the non-stationary evolution of flow patterns. To this end, we undertook a critical reconstruction and annotation of the dataset (see Section 4.4 for details), thereby establishing an evaluation benchmark that more accurately reflects genuine dynamic variations.

3.3. Probabilistic Modeling of the Feature Space

Let the feature vector for the historical baseline task be { Γ p } p = 1 N R D ( D = 10 ) , which follows a mixture model composed of K Gaussian distributions:
p ( Γ p | θ ) = k = 1 K π k N ( Γ p | μ k , k )
The model parameters are θ = π k , μ k , k k = 1 K , where π k represents the mixture coefficients in the Gaussian mixture model, denoting the probability of selecting the kth Gaussian model during data generation.
μ k denotes the mean of the kth Gaussian model, and k denotes its variance. Each Gaussian distribution corresponds to a category of base station traffic features exhibiting similar traffic patterns (e.g., commercial areas, residential areas, etc.).
For the kth Gaussian component (k), the corresponding optimal set of weight vectors is
W k = { w i | Γ i θ k }
The class centers of the weight vectors are
w ¯ k = 1 | W k | w j W k w j
When inputting the meta-feature Γ q for a new task q, the posterior probabilities for its membership in each component are as follows:
γ k ( Γ q ) = π k N ( Γ q | μ k , k ) j = 1 K π j N ( Γ q | μ j , j )
The component k * = arg max k γ k ( Γ q ) with the highest posterior probability is selected, and initial weights are assigned according to the following rules.
The initial weight allocation employs an SCM, selecting the base model weight whose probability distribution most closely approximates that of the meta-feature vector as the initial weight:
w q i n i t = arg w i W k * Γ q Γ i k 1
· k 1 is employed to compute the Mahalanobis distance, enhancing the measurement of local similarity by introducing the covariance structure of the component k * .
Unlike traditional GMM component learners, the number of components K in this study is not a pre-set and fixed hyperparameter. Instead, the proposed DCM mechanism (Section 3.2) dynamically adjusts the GMM structure—splitting, merging, or eliminating components—based on the novelty of patterns exhibited by new tasks. This enables the model to continuously adapt to non-stationary environments.

3.4. Dynamic Component Management Mechanism

DCM directly acts upon the meta-learner, indirectly optimizing the initialisation of the base learner by managing the evolution of GMM components. To determine whether an existing meta-learner is suitable for a new task q—i.e., whether the meta-learner requires evolution—this paper designs a dual-modal similarity metric based on probabilistic and spatial similarity. This metric primarily reflects the novelty of the traffic pattern in the traffic data stream for the new task, with specific rules as shown in the formula.
ν ( Γ q ) = 1 max k γ k ( Γ q ) exp 1 D Γ q μ k k 1
where ν ( Γ q ) denotes the novelty metric, Γ q represents the meta-feature vector for new task q, γ k ( Γ q ) denotes the posterior probability, μ k is the mean vector for component k, · k 1 denotes the Mahalanobis distance, and D = 10 represents the meta-feature dimension.
Based on the novelty metric for traffic patterns, this paper designs a three-tier response strategy utilizing DCM, as illustrated in Figure 3. When a new task requires prediction, DCM first assesses pattern novelty. If the pattern is classified as an existing traffic pattern, the SCM mechanism directly assigns initial weights and enters operational status. If judged as a similar pattern, it provides similarity weights before undergoing fine-tuning using the fine-tuning dataset. If the pattern is classified as an entirely novel traffic pattern, a new component must be created—a process termed component splitting. The specific splitting methodology is detailed in the subsequent sections.
Figure 3. Mechanism Design: Three-Tier Response Strategy.
When existing components fail to meet the prediction requirements of new tasks, the DCM mechanism is triggered. Its operational flow is illustrated in Figure 4.
Figure 4. GMM Component variation mechanism.
The splitting threshold is set at τ split , the merging threshold at τ merge , and the minimum survival mixture coefficient at π min , with decisions being made via the formula.
Split if ν > τ split Merge if max k γ k < τ split Eliminate if π k < π min
When triggering a component splitting decision, parameters are updated as follows:
k near = arg min k Γ q μ k k 1 μ new = α Γ q + ( 1 α ) μ k near new = β k near + ( 1 β ) diag ( σ global 2 ) π new = η π k near π k near ( 1 η ) π k near W new * = w k near * + N ( 0 , σ w 2 I )
where k near denotes the latest component index; α , β represent the mean and covariance interpolation coefficients, respectively; σ global 2 denotes the global feature variance; η denotes the mixture coefficient transfer rate; σ w denotes the weighting noise standard deviation; and N ( 0 , σ w 2 I ) denotes Gaussian noise.
When triggering component merging decisions, parameters are updated as follows:
π merged = π i + π j μ merged = π i μ i + π j μ j π merged merged = i π i π merged + j π j π merged + δ δ T W merged * = π i W i * + π j W j * π merged
where δ = μ i μ j (the mean difference vector), ρ i j denotes the component similarity, D KL represents the KL divergence, and cos ( · , · ) stands for the cosine similarity.
When the component contribution reaches the exclusion threshold, the exclusion mechanism is triggered. The exclusion criteria are as follows:
π k < π min 1 t c t 0 t 0 t c π k ( t ) d t < k min
The design of the DCM fully accounts for the dynamic characteristics of 5G networks. The operational logic can be formalized as follows:
  • Component merging may correspond to the consolidation of functionally similar areas during network optimization or the decline of certain service patterns. When two components (e.g., patterns of two commercial areas) become similar due to urban development, merging them simplifies the model and prevents overfitting.
  • Component elimination may correspond to the decommissioning of a base station or the obsolescence of outdated service patterns (e.g., relocation of an industrial zone). This ensures efficient utilization of the meta-learner’s resources by focusing only on currently active patterns.

3.5. Initial Weighting Mechanism for Basic Learners

In the preceding section, both the SCM and Multi-Component Mechanism (MCM) were discussed. For meta-learning traffic forecasting tasks under stationary flow patterns, selecting MCM may outperform SCM in certain scenarios. However, for non-stationary flow patterns, this study adopts a “split–merge” path based on Gaussian Mixture Models. Within this path, a bijective relationship exists between “component-flow pattern-weight set”. This bijective relationship ensures that each component corresponds uniquely to a specific flow pattern and its associated weight. The MCM’s composite weights may disrupt this bijection. In contrast, the DCM aligns more closely with the logical requirements of this path. Furthermore, MCM composite weights may induce oscillation in the evolutionary path. Consequently, this section’s research adopts the SCM as the initial weighting mechanism for the base learner, a choice grounded in both the model framework and engineering practice considerations.
The SCM mechanism operates on the principle that if the meta-features of two tasks exhibit similar probability distributions, their optimal model weights should likewise be similar. It selects the base model weight whose probability distribution most closely approximates that of the meta-feature vector as the initial weight:
w q init = arg w i W k * Γ q Γ i k 1
where · k 1 is used to compute the Mahalanobis distance, incorporating the covariance structure k * to enhance the measurement of local similarity.

4. Experiments and Analysis

4.1. Experimental Objectives

This section aims to validate the proposed GMM-SCM-DCM meta-learning framework for non-stationary cellular traffic forecasting. This includes:
(1) Validating the effectiveness of the meta-learning initialisation strategy: Demonstrating the framework’s superiority in prediction accuracy and convergence speed compared with random initialisation, fixed initialisation, and other meta-learning approaches.
(2) Validate adaptability to non-stationary environments: By constructing a non-stationary cellular traffic test set, demonstrate that the DCM mechanism handles distribution drift more effectively than traditional methods.
(3) Validating model robustness: Testing model performance across different base learner architectures and training set sizes.

4.2. Baseline Algorithms and Parameter Settings

For a comprehensive evaluation, five categories of baseline methods are selected for comparison in this section.
(1) Traditional Statistical/Shallow Learning Methods
  • SVR (RBF): Support Vector Regression with Radial Basis Function kernel. Parameters: kernel coefficient γ = 0.1 , regularization parameter C = 1.0 , epsilon tube ϵ = 0.1 , implemented using LIBSVM.
  • SARIMA: Seasonal Autoregressive Integrated Moving Average model. Parameter configuration: ( p , d , q ) = ( 2 , 1 , 2 ) , seasonal parameters ( P , D , Q , s ) = ( 1 , 1 , 1 , 24 ) , where p, P represent autoregressive orders, d, D represent differencing orders, q, Q represent moving average orders, and s = 24 indicates daily periodicity.
(2) Deep Learning Methods
  • CLN-FIWV: LSTM with fixed initialisation weights. Parameters: weight initialisation using Xavier uniform distribution, learning rate = 0.001 , batch size = 8 , hidden layer dimensions = [ 5 , 5 , 1 ] , total trainable parameters: 428.
  • CLN-RSIWV: LSTM with randomly sampled initialisation weights. Parameters: weight initialisation N ( 0 , 0.01 ) , learning rate = 0.001 , batch size = 8 , hidden layer dimensions = [ 5 , 5 , 1 ] , and total trainable parameters: 428.
(3) Meta-Learning Methods
  • ML-TP (KNN): Meta-learning framework using K-Nearest Neighbors as the meta-learner. Parameters: number of neighbors K = 3 , distance metric: Euclidean distance, weight assignment: uniform weights, feature dimension d = 10 .
  • dmTP (DNN): Meta-learning framework using deep neural network as the meta-learner. Architecture: three fully connected layers [ 300 , 300 , 400 ] , activation function: ReLU, output dimension matches base learner’s weight dimension, optimizer: Adam with learning rate = 0.0005 .
  • GMM-SCM: Gaussian Mixture Model meta-learner with Single-Component Mechanism. Parameters: number of components K selected automatically via BIC in the range [ 5 , 50 ] , covariance type: full, initialisation: K-Means++, maximum iterations = 100 .
  • GMM-SCM-DCM: The complete proposed model extending GMM-SCM with a Dynamic Component Mechanism. Parameters: splitting threshold τ split = 0.8 , merging threshold τ merge = 0.2 , minimum survival mixture coefficient π min = 0.05 , weight noise standard deviation σ w = 0.01 , mixture coefficient transition rate η = 0.7 .

4.3. Parameter Settings

(1) Baseline learner configuration.
To test the robustness of the entire model, this experiment employs LSTM networks of two distinct architectures as base learners, utilizing mean squared error (MSE) as the loss function. Table 3 details the parameter settings for the two LSTM network architectures.
Table 3. LSTM architecture.
(2) Meta-learner and correction parameters
KNN model: K-nearest neighbors set to 3.
dmTP’s DNN architecture: Configuration (sets a three-layer hidden network [300, 300, 400].)
GMM model: Implemented using scikit-learn’s GaussianMixture. Number of Gaussian components K automatically selected within the range [ 5 , 50 ] via Bayesian Information Criterion (BIC).
Feedback strength coefficient: Hyperparameter in the MCM correction mechanism β set to 1.0.
(3) Training settings
Meta-training dataset: Primarily tested at two scales—1000 samples and 7000 samples—to validate model effectiveness under small-sample conditions.
Fine-tuning set: Simulates data scarcity scenarios, primarily testing four scales: 24, 168, 500, and 840 (corresponding to 1 day, 1 week, approximately 3 weeks, and 5 weeks of data, respectively).
Training Epochs: 100.

4.4. Dataset Construction

To reflect traffic non-stationarity, this section reconstructs stationary and non-stationary test sets based on the Telecom Italia Milan dataset. As the Telecom Italia Milan dataset does not directly provide labels such as “residential area” or “commercial area”, we must classify cells according to their traffic pattern characteristics. This study employs a data-driven unsupervised learning algorithm for clustering analysis. The core principle is that cellular traffic from different functional zones (e.g., residential, commercial) exhibits distinct, quantifiable pattern characteristics. We define a set of well-specified meta-features to characterize these patterns, apply the K-Means++ algorithm for cell clustering, and finally assign functional zone labels based on the statistical features of each cluster.

4.4.1. Feature Engineering

Using I p = ( I p [ 1 ] , I p [ 2 ] , , I p [ N ] ) to represent the normalized traffic load time series of a cell P assuming traffic from different functional zones exhibits distinct periodic intensities and contour profiles. Based on this, we define five meta-features f p R 5 to characterize each cell area:
(1)
f 1 ( p ) : Ratio of average weekend traffic to average weekday traffic. Typically, this ratio exceeds 1 in residential areas and falls below 1 in commercial zones.
f 1 ( p ) = mean ( I p [ t ] t { weekends } ) mean ( I p [ t ] t { weekdays } )
(2)
f 2 ( p ) : Average traffic volume during the weekday morning peak period (e.g., 8:00–10:00).
f 2 ( p ) = mean ( I p [ t ] t [ 8 : 00 , 10 : 00 ] on weekdays )
(3)
f 3 ( p ) : Average traffic volume during the evening peak period on weekdays (e.g., 18:00–20:00).
f 3 ( p ) = mean ( I p [ t ] t [ 18 : 00 , 20 : 00 ] on weekdays )
(4)
f 4 ( p ) : Information entropy of the flow sequence, used to measure flow uncertainty.
f 4 ( p ) = i = 1 B p ( x i ) log 2 p ( x i )
Here, we partition the range of into B equal-width bins (bins); p ( x i ) denotes the probability that the traffic value falls within the ith bin.
(5)
f 5 ( p ) : 24-h lag autocorrelation coefficient, measuring the strength of diurnal periodicity.
f 5 ( p ) = t = 1 N 24 ( I p [ t ] I ¯ p ) ( I p [ t + 24 ] I ¯ p ) t = 1 N ( I p [ t ] I ¯ p ) 2
Compute the above five feature components for all cells to obtain the feature matrix F = [ f 1 , f 2 , , f P ] T R P × 5 , where P denotes the total number of cells. Subsequently perform Z-score standardization to eliminate dimensional effects. Here, μ d and σ d represent, respectively, the mean and standard deviation of the dth feature across all cells.
f ˜ p , d = ( f p , d μ d ) / σ d

4.4.2. K-Means++ Clustering Algorithm

This section primarily focuses on applying the K-Means++ clustering algorithm for cluster analysis. Given the standardized feature matrix F ˜ , to partition P cells into K clusters C = { C 1 , C 2 , , C K } , minimizing the sum of squares within clusters, the problem can be defined as:
minimize C k = 1 K f ˜ p C k f ˜ p μ k 2 subject to k = 1 K C k = { 1 , , P } , C k C k = for k k
where μ k = 1 | C k | f ˜ p C k f ˜ p represents the centroid of cluster C k .
For the aforementioned NP-complete problem, this study employs the standard K-Means algorithm combined with K-Means++ intelligent initialisation to obtain approximate solutions. K-Means++ selects initial centroids via the following probability distribution:
(1)
Randomly select the first centroid uniformly μ 1 .
(2)
For each non-cluster data point f ˜ p , calculate the squared distance to the nearest selected cluster center D ( f ˜ p ) 2 .
(3)
Select the next centroid probabilistically D ( f ˜ p ) 2 p D ( f ˜ p ) 2 .
(4)
Repeat steps (2) to (3) until K centers are selected.
(5)
Run standard K-Means algorithm iterations until centers stabilize or maximum iterations reached.
Finally, employing the elbow method and contour coefficient analysis, the optimal number of clusters is determined by synthesizing the peak values of both elbow points and contour coefficients.

4.4.3. Classification of Cellular Networks

Based on the calculations in Section 4.4.2, when K = 4, the following clustering results are obtained, as detailed in Table 4. This clustering outcome provides the foundation for constructing subsequent steady-state and non-steady-state test sets.
Table 4. Functional explanation and label assignment for different cellular partitions.

4.4.4. Construction of the Dataset

(1) Training set
A selection shall be made from residential and commercial areas, with these two categories of cells being randomly scrambled before a 70% proportion is randomly sampled.
(2) Stationary Test Set
Select residential and commercial areas to constitute the remaining 30% of cells outside the training set. Traffic patterns in these zones exhibit pronounced, reproducible periodicity (such as morning and evening rush hours, weekend patterns) and relatively stable distribution.
(3) Non-Stationary Test Set:
Selected from all non-stationary cell areas.
As shown in Table 5, The meta-training set comprises 70% randomly selected residential and commercial cells (Data from 1 November 2013 to 1 December 2013. The test portion consists of both stationary and non-stationary test sets, formed from the remaining 30% stationary cells and all non-stationary cells, respectively. For these test cells, this paper utilized data from 16 December 2013 to 22 December 2013 for fine-tuning, with all comparative models ultimately being evaluated using data from 23 December 2013 to 29 December 2013. This design ensures complete spatial and temporal isolation of test data from training data.
Table 5. Composition and purpose of datasets.

4.4.5. Simulation and Mapping of 5G Service Patterns

Although the original dataset originates from the 4G era, its diverse traffic patterns demonstrate high comparability and foresight regarding early 5G service characteristics. The four region types identified through clustering can be mapped to typical 5G scenarios:
Commercial Areas: Mapped to eMBB-dominated scenarios, characterized by sharp morning peaks on workdays with high throughput demands.
Residential Areas: Mapped to hybrid eMBB and partial mMTC scenarios, exhibiting evening peaks on workdays and stable all-day traffic on weekends, reflecting applications like home broadband and smart home devices.
Mixed-Use Areas: Represent the spatiotemporal mixture of eMBB services, constituting the most common complex scenario in urban environments.
Non-Stationary Areas: Represent traffic anomalies potentially caused by emergency events or cyberattacks, or unpredictable traffic in new service pilot zones, serving as critical testbeds for model adaptability.
Through this construction method, the non-stationary test set in this paper effectively simulates the concept drift problems caused by service dynamics and urban evolution in 5G networks.

5. Results and Analysis

5.1. Foundational Performance Evaluation

As shown in Table 6:
Table 6. Performance comparison of different methods on the stabilized test set (MAE).
(1) Under stable conditions, all meta-learning methods (KNN, DNN, GMM) still outperform traditional methods, though with more reasonable margins. For instance, the optimal meta-learning method (GMM-SCM) achieves a relative reduction of approximately 13.6% in MAE compared with the optimal traditional deep learning method (CLN-FIWV).
(2) GMM-SCM performs marginally better than KNN and DNN, demonstrating the efficacy of probabilistic modeling.
(3) GMM-SCM-DCM performed very closely to GMM-SCM (with an MAE difference of merely 0.0004), consistent with expectations. In stationary environments, the advantages of dynamic fusion are negligible and may even yield slightly inferior results to GMM-SCM due to the introduction of minor noise.
As Table 7 indicates:
Table 7. Performance comparison of different methods on the non-stationary test set (MAE).
(1) In non-stationary environments, all methods exhibit significant performance degradation, with MAE generally increasing and R2 markedly decreasing, which aligns with expectations for non-stationary flow forecasting tasks.
(2) The advantage of GMM-SCM-DCM becomes more pronounced. MAE is reduced by approximately 9.3% compared with GMM-SCM and improves by approximately 14.4% over the best baseline method (dmTP).
(3) KNN and DNN meta-learners exhibited severe performance degradation, as they struggled to handle novel patterns (non-stationarity) unseen in the training set. Conversely, the GMM-SCM-DCM model demonstrated superior generalization capability and robustness through its probabilistic framework and the dynamic weighting of DCM.
Figure 5 illustrates the impact of fine-tuning dataset scale on performance (non-stationary test set). As shown in Figure 5, the CLN-RSIWV and CLN-FIWV curves exhibit the highest performance, declining only gradually with increasing data volume. The KNN, DNN, and GMM-SCM curves occupy intermediate positions. The GMM-SCM-DCM curve consistently remains at the bottom, maintaining low error even with minimal data, demonstrating robust performance under small-sample conditions. As data volume increases, all curves gradually converge, but GMM-SCM-DCM ultimately achieves the best performance.
Figure 5. Impact of fine-tuning set scale on performance (non-stationary test set).
Figure 6 illustrates the convergence behavior of different models. As shown, the error curve for the CLN method starts from an extremely high value and requires approximately 80 iterations to converge slowly. GMM-SCM begins at a lower starting point. GMM-SCM-DCM exhibits the lowest starting point and converges rapidly to a stable state within approximately 10–15 iterations, demonstrating that its initial weights are very close to the optimal solution and effectively enhancing training efficiency.
Figure 6. Convergence behaviour of different models.

5.2. Model Robustness Testing

To demonstrate that the proposed framework is independent of specific base learner architectures and maintains stable performance across varying data scales, this section conducts robustness testing using a non-stationary test set.
Analysis of the results in Table 8 reveals that regardless of whether the base learner employs LSTM or GRU, the GMM-SCM-DCM framework consistently and significantly outperforms the conventional initialisation method (CLN-FIWV). GRU, owing to its fewer parameters and simpler structure, exhibits a marginal advantage under the limited data conditions of this experiment. This indicates that the proposed meta-learning framework constitutes a generalized architecture whose effectiveness is independent of specific underlying network choices, demonstrating robust performance.
Table 8. Robustness testing based on different base learners and training scales.
Figure 7 illustrates the sensitivity of different models to meta-training set size. As shown, dmTP (DNN) performance is highly dependent on the number of meta-training samples, even underperforming KNN at 1000 samples before improving with increased data. GMM-SCM and GMM-SCM-DCM demonstrate robust performance even with small meta-training sets, exhibiting stable improvement with increasing data volume. This validates the superior data efficiency and small-sample meta-learning capabilities of the GMM-SCM-DCM model.
Figure 7. Sensitivity analysis of meta-training set size.
Moreover, the performance of all models improves with an increase in the meta-training set size. Notably, even under extremely small sample conditions (1000 samples), GMM-SCM-DCM still provides high-quality initialisation, significantly outperforming traditional methods with larger datasets (7000 samples).

5.3. Ablation Studies

To validate the effectiveness of the proposed DCM and SCM mechanisms individually, as well as their synergistic effect when combined, ablation experiments were conducted on non-stationary traffic datasets. The comparison models are as follows:
GMM-MCM-DCM: Employing the Multi-Component Mechanism in place of SCM as the negative control group.
(GMM-SCM (Static): employs the SCM mechanism but disables DCM functionality (i.e., the number of GMM components remains fixed and is not dynamically adjusted). This group serves to validate the role of DCM.)
GMM-SCM-DCM (Ours): This paper’s proposed complete model.
The results of the ablation experiments are shown in Table 9. Analysis reveals:
Table 9. Ablation experiment results.
GMM-MCM-DCM vs. GMM-SCM-DCM (Static): GMM-MCM-DCM (exhibited the poorest performance (highest MAE, lowest R2), coupled with slow convergence. This validates this paper’s view: under multi-component fusion mechanisms, synthetic weights disrupt the bijective relationship between “component-pattern-weight set“, causing training oscillations and performance degradation, demonstrating the efficacy and necessity of the SCM mechanism.
GMM-SCM vs. GMM-SCM-DCM: Disabling DCM resulted in a noticeable decline in model performance on the non-stationary test set (MAE increased from 0.0835 to 0.0898), alongside requiring more iterations to converge. This indicates that when confronted with novel patterns unseen in the training set, the fixed GMM structure fails to effectively characterize and adapt, leading to a performance bottleneck. This directly demonstrates the core contribution of the DCM mechanism in addressing non-stationarity.
Overall, GMM-SCM-DCM achieved optimal performance, demonstrating the synergistic effect of SCM and DCM mechanisms: SCM ensures precise initialisation within components, while DCM ensures the component structure can dynamically evolve to adapt to external changes.

5.4. Cross-Dataset Generalization Capability Validation

This section’s experiments validate the GMM-SCM-DCM framework’s performance on another independent public dataset, demonstrating its effectiveness is not dataset-specific and possesses strong generalization capabilities.
The dataset selected for this section’s experiments is the Telecom Shanghai Dataset. Like Telecom Italia, this dataset is another commonly used public benchmark dataset within the field. Data preprocessing procedures are consistent with those applied to the Telecom Italia dataset.
As shown in Table 10 and Figure 8, the GMM-SCM-DCM framework demonstrated exceptional robustness when confronted with the more complex distribution and potentially higher noise levels of the Shanghai dataset. Although all models exhibited varying degrees of performance degradation, GMM-SCM-DCM still delivered superior results, indicating it is a highly reliable, versatile, and robust framework.
Table 10. Performance of different models on the Shanghai Dataset (MAE).
Figure 8. Cross-dataset performance comparison of models.

5.5. Visualization of GMM Component Dynamic Evolution Process

The objective of this section’s experiments is to visually demonstrate the DCM mechanism’s management of the entire lifecycle of GMM components (generation, splitting, merging, and elimination) within a long-duration simulated task flow, thereby validating its capability to handle non-stationary traffic fluctuations.
Using non-stationary flow to create a simulated scenario:
Initial stable period (T0–T50): The model has learned through the meta-training set two stable patterns: ”commercial zone” (marked by morning rush hour) and “residential zone” (marked by evening rush hour).
Non-stationary zone emerges (T50–T100): A large-scale construction project (e.g., new underground station) commences in one area, causing its traffic flow to lose periodicity and become disordered with violent fluctuations. The DCM must recognize this as a new “non-stationary zone” pattern.
Functional evolution resulting information of hybrid zones (T100–T200): An original “commercial zone” gradually evolves in function due to the completion of surrounding residential buildings, beginning to exhibit characteristics of both morning and evening rush hours. The DCM must detect this change and merge the old commercial pattern with the new hybrid pattern.
Figure 9 clearly demonstrates the DCM mechanism’s pivotal role in addressing non-stationary evolution of urban functions, specifically:
Figure 9. Dynamic evolution of GMM component weights.
Capturing new non-stationary patterns (T = 75): When construction zones induce non-stationary flow patterns (high entropy values, weak periodicity), existing commercial and residential components could not adequately characterize these patterns. The DCM generated a new component (Comp.3) via splitting operations, whose weight steadily increased, demonstrating the model’s successful identification and adaptation to this abrupt change.
Adapting to functional evolution (T = 175): As an area evolves from purely commercial to mixed use, its flow characteristics begin incorporating both morning and evening peak periods. DCM detects feature overlap and convergence between the original commercial component (Comp.1) and the emerging mixed use pattern, triggering a merging operation. The original commercial component is absorbed, with a new, more accurate mixed use Component (Comp.4) becoming dominant, optimizing the model structure and eliminating redundancy.
Clearing extinct patterns (T = 250): A residential zone component (Comp.2) experiences persistently declining traffic activity, its weight eventually falling below the minimum survival threshold. DCM then triggers an elimination mechanism, removing this component to concentrate model resources on currently active patterns.
Achieving a new steady state: Ultimately, the model is dominated by Comp.3 (non-stationary zone) and Comp.4 (mixed zone), accurately reflecting the city’s entirely new functional state following its evolution.

5.6. Computational Complexity Analysis

This section provides both theoretical analysis and experimental comparison of the computational complexity for the proposed method and baseline algorithms during both training and inference phases.
Theoretical Analysis:
SARIMA: Time complexity is O ( ( p + q + P + Q ) L 2 ) , where p, q are non-seasonal AR and MA orders, P, Q are seasonal AR and MA orders, and L is the sequence length. Space complexity is O ( L ) .
SVR: Training using the sequential minimal optimization algorithm has complexity between O ( n 2 d ) and O ( n 3 d ) , where n is the number of samples and d is the feature dimension. Inference complexity is O ( n s d ) , where n s is the number of support vectors.
LSTM (CLN-RSIWV/FIWV): Training complexity is O ( T · L · h 2 + T · h · h i n ) , where T is the sequence length, L is the number of LSTM layers, h is the hidden dimension, and h i n is the input dimension. The complexity for a single time step forward pass is O ( h i n h + h 2 ) .
ML-TP (KNN): Training only requires storing meta-features with complexity O ( 1 ) . Inference requires computing similarities between the new task and all historical tasks with complexity O ( N d ) , where N is the number of tasks and d is the feature dimension.
dmTP (DNN): Training complexity is O ( E · B · l = 1 L h l 1 h l ) , where E is the number of epochs, B is the batch size, L is the number of layers, and h l is the number of neurons in layer l. Inference complexity is O ( l = 1 L h l 1 h l ) .
GMM-SCM: Training using the Expectation Maximization algorithm has per-iteration complexity O ( K N d 2 ) , where K is the number of components, N is the number of samples, and d is the feature dimension. Inference complexity for computing posterior probabilities is O ( K d 2 ) .
GMM-SCM-DCM: The training complexity is the same as GMM-SCM. During online adaptation, the DCM mechanism adds operations for pattern novelty detection and component management: pattern novelty detection complexity O ( K d 2 ) , component splitting O ( d 3 ) , component merging O ( K d 2 ) , and component elimination O ( K ) .
Experimental Results:
All models were evaluated under identical hardware conditions, recording average training and inference times on the non-stationary test set. The test set contains 100 tasks, each comprising 168 h (1 week) of traffic data with a time slot length of 1 h. Experimental results are shown in Table 11.
Table 11. Computational efficiency comparison of different models.
The experimental results show that traditional statistical methods (SARIMA, SVR) have advantages in training and inference speed but limited prediction accuracy. Deep learning methods (CLN series) require longer training times but offer fast inference. Among meta-learning methods, the KNN meta-learner has the fastest training but slower inference; the DNN meta-learner has the highest training complexity; and GMM-based methods achieve a good balance between training and inference efficiency. GMM-SCM-DCM significantly improves prediction accuracy and adaptation capability in non-stationary environments with only a modest increase in computational overhead (approximately 19% increase in training time and 7% increase in inference time compared with GMM-SCM), demonstrating good engineering feasibility.
Furthermore, this paper analyzes the relationship between computational complexity and task scale for each model. As the number of tasks N increases, the inference time of the KNN meta-learner grows linearly ( O ( N ) ), while the inference time of GMM-based methods remains stable ( O ( 1 ) ), making them more suitable for large-scale deployment scenarios.

6. Conclusions

The experiments in this chapter demonstrate that the GMM-SCM-DCM model outperforms traditional deep learning methods and other meta-learning baseline algorithms in cellular traffic forecasting tasks, whether for stationary or non-stationary traffic, in terms of prediction accuracy, convergence speed, and robustness.
Analysis reveals that within the GMM-SCM-DCM framework, the GMM provides probabilistic modeling of weight distributions, offering greater interpretability and statistical efficiency than KNN and DNN. SCM delivers precise initialisation under stationary conditions. DCM is pivotal for handling non-stationary traffic, with its dynamic fusion mechanism significantly enhancing the model’s generalization capability and robustness. This framework is particularly suited for scenarios involving new base stations or sudden events, where historical data is scarce or diverges substantially from past patterns, rendering traditional methods ineffective. Our approach enables rapid provision of high-quality initialisation, achieving an efficient “cold start”.
The findings of this research not only present a novel and efficient meta-learning solution for cellular network traffic forecasting but also offer valuable insights and implementation pathways for broader non-stationary time series prediction challenges. Moving forward, we shall further explore integrating the DCM mechanism with paradigms such as online learning and continuous learning to enhance the model’s adaptability and stability within long-term evolving environments.
The findings of this study not only provide an efficient meta-learning solution for cellular network traffic prediction but also offer valuable modeling insights and implementation pathways for broader non-stationary time series forecasting problems. Although GMM-SCM-DCM demonstrates excellent performance in non-stationary traffic prediction, several promising research directions remain:
Online and continual learning: Integrating the DCM mechanism with online learning frameworks to achieve lifelong learning on continuously arriving data streams while mitigating catastrophic forgetting.
Federated meta-learning: Extending the proposed framework to federated learning environments, enabling collaborative meta-learning across cellular networks under data privacy constraints.
Interpretability and visualization: Incorporating explainable AI techniques (e.g., SHAP, LIME) to provide visual interpretations of the meta-learner’s component assignment and pattern evolution processes.
Cross-modal fusion: Enhancing the perception and prediction capability for non-stationary patterns by incorporating external multimodal data (e.g., meteorological, social events, traffic flow).
Lightweight design and edge deployment: Optimizing the model architecture and component management strategies to adapt to the resource constraints of edge computing devices, enabling real-time prediction on the end side.
Future work will focus on these directions to further enhance the model’s applicability and intelligence in practical network environments.

Author Contributions

Conceptualization, C.L. and X.L.; Methodology, C.L.; Investigation, J.Z. (Jingyu Zhang); Data Curation, J.Z. (Jiao Zhu) and J.Z. (Jingyu Zhang); Writing—Original Draft Preparation, X.L.; Writing—Review & Editing, Y.L. and J.D.; Supervision, S.Z. and Q.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The dataset analyzed in this study is publicly available from the repository at https://doi.org/10.1038/sdata.2015.55.

Conflicts of Interest

Author Jiao Zhu was employed by the company China Unicom Research Institute. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SARIMASeasonal Autoregressive Integrated Moving Average
SVRSupport Vector Regression
LSTMLong Short-Term Memory
GRUGated Recurrent Unit
CNNConvolutional Neural Network
RNNRecurrent Neural Network
GNNGraph Neural Network
TransformerTransformer Architecture
SNNSpiking Neural Network
GMMGaussian Mixture Model
DCMDynamic Component Management
SCMSingle-Component Mechanism
MCMMulti-Component Mechanism
FFTFast Fourier Transform
MAEMean Absolute Error
MSEMean Squared Error
R2Coefficient of Determination
BICBayesian Information Criterion
KNNsK-Nearest Neighbors
DNNDeep Neural Network
DCGRUDiffusion Convolutional Gated Recurrent Unit
KLKullback–Leibler divergence
QoSQuality of Service
FFLFederated Learning
eMBBEnhanced Mobile Broadband
mMTCsMassive Machine-Type Communications
URLLCsUltra-Reliable Low-Latency Communications
B5GBeyond 5G
IoTInternet of Things
AIGCArtificial Intelligence-Generated Content
MDRNNMulti-Dimensional Recurrent Neural Network
HSTNetHybrid Spatial–Temporal Network
STP-GLNSpatial–Temporal Parallel Graph Learning Network
HGCRNHybrid Graph Convolutional Recurrent Network
DBSTGNN-AttDual-Branch Spatial–Temporal GNN with Attention
KGDAKnowledge Graph-Driven Decomposition Approach
DIC-STDecomposition with Independent Causal Structure Learning for Time Series
ML-TPMeta-Learning for Traffic Prediction
MGSTCMulti-Granularity Spatio-Temporal Complementarity
TSENetTemporal–Spatial Embedding Network
DCG-MAMDiffusion Convolutional GRU with Multi-Head Attention Mechanism

References

  1. Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken, NJ, USA, 2015. [Google Scholar]
  2. Smola, A.J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222. [Google Scholar] [CrossRef]
  3. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice; OTexts: Melbourne, Australia, 2018. [Google Scholar]
  4. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  5. Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31, 1235–1270. [Google Scholar] [CrossRef]
  6. Wang, J.; Zheng, V.W.; Liu, Y.; Yang, Q. Deep learning for spatio-temporal data mining: A survey. IEEE Trans. Knowl. Data Eng. 2020, 34, 3681–3700. [Google Scholar] [CrossRef]
  7. Hospedales, T.; Antoniou, A.; Micaelli, P.; Storkey, A. Meta-learning in neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 5149–5169. [Google Scholar] [CrossRef]
  8. Aouedi, O.; Le, V.A.; Piamrat, K.; Ji, Y. Deep learning on network traffic prediction: Recent advances, analysis, and future directions. ACM Comput. Surv. 2025, 57, 1–37. [Google Scholar] [CrossRef]
  9. Mohan, R.R.; Vijayalakshmi, K.; Augustine, P.J.; Venkatesh, R.; Nayagam, M.G.; Jegajohi, B. A comprehensive survey of machine learning-based mobile data traffic prediction models for 5G cellular networks. In AIP Conference Proceedings; AIP Publishing LLC: New York, NY, USA, 2024; Volume 2816, p. 100002. [Google Scholar]
  10. Andrews, J.G.; Buzzi, S.; Choi, W.; Hanly, S.V.; Lozano, A.; Soong, A.C.K. What Will 5G Be? IEEE J. Sel. Areas Commun. 2014, 32, 1065–1082. [Google Scholar] [CrossRef]
  11. Ordonez-Lucena, J.; Chavarria, J.F.; Contreras, L.M.; Pastor, A. The Use of 5G Non-Public Networks to Support Industry 4.0 Scenarios. IEEE Commun. Mag. 2019, 57, 108–114. [Google Scholar] [CrossRef]
  12. Mao, Y.; You, C.; Zhang, J.; Huang, K.; Letaief, K.B. A Survey on Mobile Edge Computing: The Communication Perspective. IEEE Commun. Surv. Tutor. 2017, 19, 2322–2358. [Google Scholar] [CrossRef]
  13. Popovski, P.; Trillingsgaard, K.F.; Simeone, O.; Durisi, G. 5G Wireless Network Slicing for eMBB, URLLC, and mMTC: A Communication-Theoretic View. IEEE Access 2018, 6, 55765–55779. [Google Scholar] [CrossRef]
  14. Chen, X.; Jin, Y.; Qiang, S.; Hu, W.; Jiang, K. Analyzing and Modeling Spatio-Temporal Dependence of Cellular Traffic at City Scale. In Proceedings of the 2015 IEEE International Conference on Communications (ICC), London, UK, 8–12 June 2015; pp. 3585–3591. [Google Scholar] [CrossRef]
  15. Tran, Q.T.; Hao, L.; Trinh, Q.K. Cellular network traffic prediction using exponential smoothing methods. J. Inf. Commun. Technol. 2019, 18, 1–18. [Google Scholar] [CrossRef]
  16. Tran, Q.T.; Hao, L.; Trinh, Q.K. A comprehensive research on exponential smoothing methods in modelling and forecasting cellular traffic. Concurr. Comput. Pract. Exp. 2020, 32, e5602. [Google Scholar] [CrossRef]
  17. Jaffry, S. Cellular traffic prediction with recurrent neural network. arXiv 2020, arXiv:2003.02807. [Google Scholar] [CrossRef]
  18. Santos Escriche, E.; Vassaki, S.; Peters, G. A comparative study of cellular traffic prediction mechanisms. Wirel. Netw. 2023, 29, 2371–2389. [Google Scholar] [CrossRef]
  19. Cao, S.; Liu, W. LSTM network-based traffic flow prediction for cellular networks. In International Conference on Simulation Tools and Techniques; Springer International Publishing: Cham, Switzerland, 2019; pp. 643–653. [Google Scholar]
  20. Kurri, V.; Raja, V.; Prakasam, P. Cellular traffic prediction on blockchain-based mobile networks using LSTM model in 4G LTE network. Peer-to-Peer Netw. Appl. 2021, 14, 1088–1105. [Google Scholar] [CrossRef]
  21. Zhang, D.; Liu, L.; Xie, C.; Yang, B.; Liu, Q. Citywide cellular traffic prediction based on a hybrid spatiotemporal network. Algorithms 2020, 13, 20. [Google Scholar] [CrossRef]
  22. Zhan, S.; Yu, L.; Wang, Z.; Du, Y.; Yu, Y.; Cao, Q.; Dang, S.; Khan, Z. Cell traffic prediction based on convolutional neural network for software-defined ultra-dense visible light communication networks. Secur. Commun. Netw. 2021, 2021, 9223965. [Google Scholar] [CrossRef]
  23. Hu, X.; Liu, W.; Huo, H. An intelligent network traffic prediction method based on Butterworth filter and CNN–LSTM. Comput. Netw. 2024, 240, 110172. [Google Scholar] [CrossRef]
  24. Khalid, M. Traffic prediction in cellular networks using graph neural networks. arXiv 2023, arXiv:2301.12605. [Google Scholar] [CrossRef]
  25. Chen, G.; Guo, Y.; Zeng, Q.; Zhang, Y. A novel cellular network traffic prediction algorithm based on graph convolution neural networks and long short-term memory through extraction of spatio-temporal characteristics. Processes 2023, 11, 2257. [Google Scholar] [CrossRef]
  26. Zhang, M.; Zhou, H.; Yu, K.; Wu, X. Cellular network traffic prediction with hybrid graph convolutional recurrent network. Wirel. Pers. Commun. 2024, 138, 1867–1892. [Google Scholar] [CrossRef]
  27. Cai, Z.; Tan, C.; Zhang, J.; Zhu, L.; Feng, Y. DbSTGNN-ATT: Dual Branch Spatio-Temporal Graph Neural Network with an Attention Mechanism for Cellular Network Traffic Prediction. Appl. Sci. 2024, 14, 2173. [Google Scholar] [CrossRef]
  28. Lin, C.Y.; Su, H.T.; Tung, S.L.; Hsu, W.H. Multivariate and propagation graph attention network for spatial-temporal prediction with outdoor cellular traffic. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual Event, 1–5 November 2021; pp. 3248–3252. [Google Scholar]
  29. Xiao, J.; Cong, Y.; Zhang, W.; Weng, W. A cellular traffic prediction method based on diffusion convolutional GRU and multi-head attention mechanism. Cluster Comput. 2025, 28, 125. [Google Scholar] [CrossRef]
  30. Ma, H.; Yang, K.; Pun, M.O. Cellular traffic prediction via deep state space models with attention mechanism. Comput. Commun. 2023, 197, 276–283. [Google Scholar] [CrossRef]
  31. Wang, J.; Shen, L.; Fan, W. A T-SENet model for predicting cellular network traffic. Sensors 2024, 24, 1713. [Google Scholar] [CrossRef]
  32. Tang, C.; Lu, J.; Yang, W.; Xing, W.; Tang, C.; Zou, A.; Guo, J. A frequency-domain multilayer perceptron with multiscale attention network for cellular traffic prediction. Comput. Netw. 2025, 271, 111593. [Google Scholar] [CrossRef]
  33. Gong, J.; Li, T.; Wang, H.; Liu, Y.; Wang, X.; Wang, Z.; Deng, C.; Feng, J.; Jin, D.; Li, Y. KGDA: A knowledge graph driven decomposition approach for cellular traffic prediction. ACM Trans. Intell. Syst. Technol. 2024, 15, 1–22. [Google Scholar] [CrossRef]
  34. Zhang, K.; Chuai, G.; Zhang, J.; Chen, X.; Si, Z.; Maimaiti, S. Dic-st: A hybrid prediction framework based on causal structure learning for cellular traffic and its application in urban computing. Remote Sens. 2022, 14, 1439. [Google Scholar] [CrossRef]
  35. Wang, H.; Wang, L.; Zhao, S.; Yue, X. Multi-dimensional prediction model for cell traffic at city scale. Int. J. Pattern Recognit. Artif. Intell. 2021, 35, 2150010. [Google Scholar] [CrossRef]
  36. Zhang, S.; Zhao, S.; Yuan, M.; Zeng, J.; Yao, J.; Lyu, M.R.; King, I. Traffic prediction-based power saving in cellular networks: A machine learning method. In Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Redondo Beach, CA, USA, 7–10 November 2017; pp. 1–10. [Google Scholar]
  37. Liu, C.; Wu, T.; Li, Z.; Wang, B. Individual traffic prediction in cellular networks based on tensor completion. Int. J. Commun. Syst. 2021, 34, e4952. [Google Scholar] [CrossRef]
  38. Chen, G. Spatiotemporal Individual Mobile Data Traffic Prediction. Ph.D. Thesis, INRIA Saclay-Ile-de-France, Palaiseau, France, 2018. [Google Scholar]
  39. Wu, J.; Zeng, M.; Chen, X.; Li, Y.; Jin, D. Characterising and predicting individual traffic usage of mobile applications in cellular networks. In Proceedings of the 2018 ACM International Joint Conference and 2018 International Symposium on Pervasive and Ubiquitous Computing and Wearable Computers, Singapore, 8–12 October 2018; pp. 852–861. [Google Scholar]
  40. Qi, Y.; Wang, H. QoS-aware cell association based on traffic prediction in heterogeneous cellular networks. IET Commun. 2017, 11, 2775–2782. [Google Scholar] [CrossRef]
  41. Li, F.; Zhang, Z.; Chu, X.; Zhang, J.; Qiu, S.; Zhang, J. A meta-learning based framework for cell-level mobile network traffic prediction. IEEE Trans. Wirel. Commun. 2023, 22, 4264–4280. [Google Scholar] [CrossRef]
  42. Dainotti, A.; Pescapé, A.; Rossi, P.S.; Palmieri, F.; Ventre, G. Internet traffic modeling by means of Hidden Markov Models. Comput. Netw. 2008, 52, 2645–2662. [Google Scholar] [CrossRef]
  43. Guarino, I.; Aceto, G.; Ciuonzo, D.; Montieri, A.; Persico, V.; Pescapè, A. Explainable deep-learning approaches for packet-level traffic prediction of collaboration and communication mobile apps. IEEE Open J. Commun. Soc. 2024, 5, 1299–1324. [Google Scholar] [CrossRef]
  44. Bolla, R.; Repetto, M. A new model for network traffic forecast based on user’s mobility in cellular networks with highway stretches. Int. J. Commun. Syst. 2004, 17, 911–934. [Google Scholar] [CrossRef]
  45. Vesselinova, N.V. On the road to more accurate mobile cellular traffic predictions. arXiv 2023, arXiv:2305.15234. [Google Scholar] [CrossRef]
  46. Basyoni, Y.; Abbas, H.M.; Talaat, H.; Dimeery, I.E. Speed prediction from mobile sensors using cellular phone-based traffic data. IET Intell. Transp. Syst. 2017, 11, 387–396. [Google Scholar] [CrossRef]
  47. Lin, C.Y.; Tung, S.L.; Hsu, W.H. Geographical Cellular Traffic Prediction with Multivariate Spatio-Temporal Modelling. In Proceedings of the STRL@ IJCAI, Macao, China, 21 August 2023. [Google Scholar]
  48. Gao, X.; Zhao, Y.; Wang, J.; Zhang, C. Federated learning cellular traffic prediction based on multi-time scale information. In International Conference on Intelligent Computing; Springer Nature Singapore: Singapore, 2024; pp. 276–289. [Google Scholar]
  49. Fu, N.; Liu, S.; Xie, W.; Huang, Y. Multi-grained spatial-temporal feature complementarity for accurate online cellular traffic prediction. arXiv 2025, arXiv:2508.08281. [Google Scholar] [CrossRef]
  50. Tsiolakis, T.; Pavlidis, N.; Perifanis, V.; Efraimidis, P.S. Carbon-aware machine learning: A case study on cellular traffic forecasting with spiking neural networks. In IFIP International Conference on Artificial Intelligence Applications and Innovations; Springer Nature: Cham, Switzerland, 2024; pp. 178–191. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Article Metrics

Citations

Article Access Statistics

Multiple requests from the same IP address are counted as one view.