A TSENet Model for Predicting Cellular Network Traffic

Wireless sensor networks (WSNs) are gaining traction in the realm of network communication, renowned for their adaptability, configuration, and flexibility. The forthcoming network traffic within WSNs can be forecasted through temporal sequence models. In this correspondence, we present a method (TSENet) that can accurately predict the traffic in the cellular network. TSENet is composed of transformers and self-attention network. We have designed a temporal transformer module specifically for extracting temporal features. This module accomplishes this by modeling the traffic flow within each grid of the communication network at both near-term and periodical intervals. Simultaneously, we amalgamate the spatial features of each grid with information from its correlated grids, generating spatial predictions within the spatial transformer. Furthermore, we employ self-attention aggregation to capture dependencies between external factor features and cellular data features. Empirical assessments performed on a genuine cellular traffic dataset offer compelling evidence substantiating the efficacy of TSENet.


Introduction
In recent years, the swift progress in distributed computing and micro-electromechanical systems (MEMS) has propelled wireless sensor networks into numerous domains, encompassing military [1], intelligent architecture [2], and industrial sectors [3,4].To observe diverse phenomena like network traffic, sensors are equipped with sensing, computing, communication, and additional functionalities.
With the continuous development of wireless communication technologies, higher requirements are being imposed on wireless communication networks.Among these requirements, accurately predicting the future change trend of key performance indicators of wireless cells holds crucial significance.It profoundly impacts the network planning, network optimization, and network energy-saving practices of mobile communication operators.
Predicting the change trend of the key performance index of the wireless cell is very important for the mobile communication operator, which can help the operator to plan the network.By accurately predicting future KPI changes, operators can properly plan network resource allocation, ensure stable network connection during peak hours, and avoid congestion and service quality deterioration.By learning the change trends of performance counters, operators can adjust network parameters and configurations in a timely manner to optimize network performance and provide better user experience.For example, according to the prediction result, the operator may decide whether to increase the number of base stations, adjust the power allocation, or change the channel allocation policy, thereby increasing the network capacity and coverage area.
In recent studies, Yu et al. [5] tackled the challenge of cellular data prediction with STEP, a spatiotemporal fine-grained mechanism utilizing GCGRN.The deep graph convolution network outperformed existing methods, offering efficiency and reduced resource consumption in cellular networks.Santos et al. [6] evaluated DL and ML models for short-term mobile Internet traffic prediction.LSTM exhibited excellent performance across diverse clusters in Milan.Hardegen et al. [7] presented a pipeline using deep neural networks (DNNs) to predict information of real-world cellular data flows.Unlike traditional binary classifications, their system quantizer predicted flow features into three classes.They emphasized the potential for flow-based routing scenarios.Furthermore, Dodan et al. [8] proposed an efficient Internet traffic prediction system using optimized recurrent neural networks (RNNs).They addressed the challenges in large-scale networks, emphasizing the need for scalable solutions to anticipate traffic fluctuations, avoid congestion, and optimize resource allocation.Experimental outcomes showcased the enhanced efficacy of their RNN-based methodology compared to conventional approaches such as ARIMA and AdaBoost regressor.
In the realm of deep learning methodologies, both convolutional neural networks (CNNs) [9] and recurrent neural networks (RNNs) [10] stand as foundational structures for modeling temporal information.Cellular communication data exhibit temporal periodicity in their spatial distribution, bearing distinct characteristics [11,12].Consequently, contemporary deep learning models aim to integrate temporal and spatial information in cellular data.Zhang et al. [13] came up with STDenseNet, a convolutional neural network approach for cellular data prediction.In RNN models, elements in the input sequence are processed sequentially, leading to inefficiencies in handling lengthy sequences.Conversely, methodologies primarily reliant on CNNs may fall short in fully capturing spatial dependencies between adjacent features.Additionally, approaches leveraging RNNs for extracting prolonged temporal relationships exhibit inherent weaknesses.These constraints may result in an insufficient description of continuous and temporal features.
In response to these challenges, we present a model named TSENet, founded on the principles of transformers and self-attention.Initially introduced in 2017 within the realm of natural language processing, the transformer architecture boasts a parallel encoderdecoder architecture, later extended to various domains.Primarily, we harness and tailor its robust sequential modeling capabilities.Concretely, we devise a time-sequence module (TSM) and a spatial-sequence module (SSM), adept at simultaneously and accurately extracting temporal and spatial features within short time intervals.Subsequently, we employ self-attention mechanisms to amalgamate the model's prowess in capturing dependencies between external factors and cellular traffic characteristics.Our comprehensive experiments conducted on a bespoke dataset aim to substantiate the efficacy of our approach.Subsequent analyses and discussions delve into a nuanced exposition of the TSENet model.
In summary, the primary contributions of this study can be delineated as follows: (1) We pioneer the introduction of a spatiotemporal traffic forecasting model, christened TSENet, which harmoniously integrates external elements.This model, predicated on harnessing spatiotemporal features, takes into account the nuanced characteristics of external factors.( 2) By employing attention mechanisms, TSENet adeptly captures the multifaceted global spatiotemporal correlations within cellular traffic networks.Notably, TSENet extracts both enduring and fleeting temporal features, culminating in the establishment of a spatial nexus model among nodes.(3) Through the incorporation of self-attention mechanisms, TSENet effectively excavates the intricate interdependencies between external factors and cellular traffic, thereby further elevating the predictive precision of TSENet.

Wireless Sensor Networks
Network sensing focuses on capturing the dynamic state and performance metrics of the cellular network itself, such as signal strength, latency, and packet loss.On the other hand, environmental sensing involves monitoring external factors like weather con-ditions, interference, and population density.The features extracted from both network and environmental sensing contribute to the comprehensive set of parameters used for accurate network traffic estimations.Wireless sensor networks epitomize the avant-garde paradigm in ubiquitous computing, embodying the fusion of sophisticated sensor technologies and wireless communication.The structural backbone of these networks comprises diminutive yet potent sensor nodes.Communication among these sensors unfolds through wireless protocols, fostering seamless interconnectivity between nodes [14].This mirrors the traditional single-receiver WSN (refer to Figure 1, left segment).However, this singular receiver scenario lacks scalability.Due to the substantial number of nodes, the collector accumulates more data, impeding the network's scalability when it reaches capacity.This is a frequent scenario, especially with multiple receivers in a network (depicted on the right side of Figure 1) [15].This decentralized approach exhibits advantages in terms of scalability, fault tolerance, and load distribution.In practical applications, wireless sensors serve diverse purposes in the realm of cellular traffic monitoring.Krishnasamy et al. [16] provided an overview of wireless sensor networks, comprising multiple sensors with sensing and communication functionalities.These sensors both receive and transmit data to designated nodes.The core purpose of a sensor node is to monitor environmental conditions, process the acquired data, and then transmit it to an analysis center.Sensor deployment locations are typically nonuniformly designed, often installed randomly in irregularly shaped specific locations based on their transmission range.Hence, it is important that the deployed algorithms can be adapted to each different geographical area.In essence, deploying wireless sensors facilitates the analysis of traffic patterns, detection of anomalous behavior, optimization of cellular networks, and predictive modeling.

Network Traffic Prediction
Network traffic prediction has been the subject of prolonged inquiry.ARIMA [17] and Holt-Winters (HW) [18] are both univariate time series models.ARIMA leverages autoregressive, difference, and moving average components, with the component order determined by Akaike or Bayesian information criteria.Kim et al. [19] introduced the integer-valued generalized autoregressive conditional heteroscedasticity (INGARCH) as a nonlinear model for network traffic prediction to study traffic arrival processes in various network topologies, including IoT and VANETs.The results suggested that INGARCH was a suitable and tractable model for network traffic prediction, performing competitively or superiorly to other statistical models, particularly in data-insufficient scenarios compared to deep neural networks.In contrast to statistical models, ST-GPKL models often show superior capabilities [20].Xia et al. [21] applied RF and LightGBM to the network traffic prediction model.Zhang et al. [22] employed a model based on Gaussian functions and a weighted EM method to optimize UAV deployment for reduced power consumption in downlink transmission and mobility.Traffic prediction in machine learning often relies on supervised neural networks like MLP and RNN in time series forecasting (TSF) approaches, demonstrating high accuracy with limited complexity [23].Non-TSF methods explore inferring traffic volumes from flow count and packet header fields, showing lower but acceptable error rates, motivating further research in ensemble learning for enhanced accuracy and automation of network management activities.Related work in machine learning traffic prediction involves addressing challenges related to ground truth (GT), emphasizing the scarcity of labeled data in dynamic environments like network traffic.Strategies such as PCA and active Learning are explored to overcome this limitation [24].In recent years, neural networks have emerged as potent tools, attaining significant success across various prediction domains, notably in traffic prediction challenges.Basic yet pivotal, the feed-forward neural network serves as the cornerstone for cellular traffic prediction, incorporating hidden layers and diverse activation functions like tanh, sigmoid, and ReLU to enhance learning capabilities.However, for spatial-temporal prediction challenges, a flattening calculation is required, potentially leading to a lack of spatial information and a decline in prediction performance.

Transformer
In recent years, endeavors rooted in the transformative power of transformers have burgeoned across domains such as computer vision, signal processing [25], and bioinformatics [26], yielding superlative outcomes in these realms.An array of studies has been conducted to unravel the potential of transformer networks in forecasting network traffic, demonstrating their superiority over conventional machine learning methodologies.Wen et al. [27] proffered a transformer-based model meticulously designed for the prediction of short-term traffic congestion.This model, when juxtaposed against traditional machine learning methods like LSTM [28], exhibited a pronounced prowess.Qi et al. [29] expounded upon the performer model, an innovative transformer-based predictive methodology for clustering voluminous time series data, showcasing conspicuous efficacy in comparison to conventional methods such as random forest and SVM.Chuanting Zhang et al. [30] presented a groundbreaking framework for wireless traffic prediction, termed federated dual attention (FedDA).FedDA facilitates collaborative training among multiple edge clients to cultivate a high-fidelity predictive model, eclipsing state-of-the-art methods with substantial enhancements in average mean squared error performance.Rao et al. [31], leveraging deep learning techniques, undertook an exhaustive exploration of challenges and recent strides in cell load balancing.Feng et al. [32] propounded an end-to-end framework christened as DeepTP, meticulously designed for predicting traffic demands from mobile network data through the prism of deep learning techniques.DeepTP not only demonstrated a superiority over other cutting-edge traffic prediction models by exceeding them but also adeptly captured the spatiotemporal characteristics and external factors' influence for a more precise predictive outcome.
Despite achieving commendable milestones in internet traffic prediction, further exploration is requisite regarding the application of cellular traffic data models based on wireless sensor networks.Diverging from prior endeavors, the utilization of self-attention mechanisms has been employed to congregate the extraction of external factors and dependencies in network traffic features, thereby judiciously harnessing external influences.Moreover, we employ the transformer architecture to capture temporal and spatial information within concise intervals.Through the efficacy of multi-head self-attention mechanisms, we adeptly discern spatial relationships effectively interlaced between grids.This novel structure holds promise in mitigating, to some extent, the limitations inherent in existing methodologies.

Method
In this paper, we use the self-attention mechanism to aggregate external factors and dependencies in network traffic features and use a transformer to capture temporal and spatial information in concise intervals.Through the effectiveness of multi-head self-attention mechanism, we can skillfully identify effective interlaced spatial relationships between grids.This section describes the TSENet framework, including its spatial-sequence module, time-sequence module, external factor module, and fusion output module.

The Proposed TSENet Framework
This article introduces TSENet, which comprises four primary modules: the timesequence module, spatial-sequence module, external-factor module, and fusion output module, as illustrated in Figure 2. The time-sequence module and spatial-sequence module are designed based on the fundamental structures of the transformer.The TSM extracts recent temporal trends in cellular traffic M c (t) and periodic features in M p (t).The calculation block fuses them into temporal features.Concerning spatial feature modeling, the multi-head attention mechanism in a series of networks can effectively capture spatial relationships between network features.Real-time spatial features are captured in the SSM.The EFM is designed using a self-attention structure to extract external factor features.These three types of features complement each other, resulting in the generation of the final prediction through the FOM.The transformer enables elements of the input sequence to contribute to the result, incorporating target sequence prediction values in the decoding layer, thus achieving more effective predictions using additional information.

Spatial-Sequence Module
In the SSM, the cellular traffic data of the pertinent network serve as elements within the input sequence.We use a multi-head attention structure in order to engender spatial predictions.It processes information from each network and its related networks.For each network, analyzing its relationship with all other networks yields a considerably extensive grid sequence.Indeed, only a subset of networks significantly influences the traffic states of others.Hence, we select the L networks most correlated with the current network (inclusive of the network itself).The chosen traffic data from M c (t) is linked to a new sequence, M s (t) ∈ R H×L×J , serving as input to the spatial-sequence block, wherein L is the length of the spatial array, and J denotes the dimension of each array.
(1) Grid arrangement: We sort each grid according to its correlation with other grids and select the top L largest grids.The value of L can be chosen based on experimental results.We employ the Pearson correlation coefficient to assess the correlation.The correlation between two grids A and B with equal traffic sequence lengths is calculated by the covariance of the variables and the product of their standard deviations: We concatenate the incoming data A in (t) ∈ R H×J and outgoing data A out (t) ∈ R H×J to form a matrix of dimensions H × 2J.Subsequently, the correlation matrix P(t) ∈ R H×H is computed according to Equation.P i,j signifies the relevance between the i and j grids at the t interval.(2) Spatial sequence: As illustrated in Figure 2, the source sequence and the initial sequence serve as the principal inputs to the essential transformer.In natural language processing (NLP), the elements of the sequence involve more informative word embeddings, but the array of cellular traffic data results in the scarcity of information.
We choose the two complementary input arrays to ensure the predictive performance of the transformer.The proximity matrix M c (t) signifies the cellular communication states close to the target interval.The M s (t) designed by M c (t) encompasses real-time spatial features for the sequence with a length of L, serving as the source array, and M c (t) ∈ R L×1×J is taken as the initial array with a length of 1, yielding the output of the SSM, Y spatial (t) ∈ R L×J .We omit positional encoding because there is no chronological order between spatial networks.

Time-Sequence Module
As depicted in Figure 2, the TSM employs two temporal transformers to individually acquire features from near-term and periodic continuous time.The calculation block utilizes a ⊕ operation and an SSM to aggregate these two features into a predictive time sequence.
(1) Closeness sequence: To harness additional information from the input data, M c (t) is extended as the source sequence of the proximity sequence.During the enhancement procedure, we use the correlation matrix P t to select the topmost U correlated networks for each grid.We combine the selected data and the grid data and create a scalar with dimension L × J × (1 + U), where U is a parameter determined through experimentation.This process enhances both the temporal features within the sequence elements and introduces a few space information in the original sequence, thereby enhancing the accuracy of predictions.
In the closeness sequence, the averaging of the second dimension of M p (t) ∈ R L×G×J yields an initial sequence of dimensions L × J × 1.We employ these averaged historical data as indicators for the traffic data levels in future time intervals.The closeness sequence produces the proximity time prediction, denoted as Y c (t) ∈ R L×J .(2) Periodic sequence: The periodic sequence is akin to the proximity sequence but lacks critical information.We employ M p (t) ∈ R L×G×J as the original sequence, with M c (t) ∈ R L×1×J serving as the initial sequence for the temporal sequence, thereby utilizing the compact time information included in M c (t) to complement the period sequence.The outcome of the periodic time prediction is expressed as Y p (t) ∈ R L×J .The time-sequence module mixes Y c (t) and Y p (t) up into the final temporal prediction Y time (t) ∈ R L×J .

External-Factor Module
The quantity of base stations (BSs), points of interest (POIs), and social activities within a specific region serves as a direct indicator of users' capacity to request telecommunication services.By leveraging the geographical location details of each community, we can derive a mapping of their locations through the processing of respective geographical information.This facilitates the capture of information related to the distribution of POIs.Furthermore, preprocessing steps enable the extraction of the social activity level from the dataset.
As illustrated in Figure 3, we utilize a self-attention mechanism to capture the feature representation of exterior factor ( f ∈ BSs, POIs, Socials).Mapping the input M f to distinct feature spaces, with query , where {C q , C k , C v } represents a set of 1 × 1 convolutional weights, and z k and z v denote channel numbers.Utilizing matrix multiplication to compute the similarity scores for each pair of points yields: The similarity between point i and point j can be expressed as: where M f ,i and M f ,j are feature scalars of shape z k × 1. Subsequently, the index of similarity is column-normalized: The aggregated feature at position i is computed as the weighted sum of all positions: represents the jth column of the value V f .Thus, in the fully connected layer, the input X ∈ R z v ×O results in the output vector w f ∈ R pO×1 .Finally, the vector is reshaped into an output W f ∈ R p×O .w f = σ(C e X + a f ), W f = Resahpe(w f ), where C e and a f constitute the learnable parameters of the fully connected layer.

Fusion Output Module
From the above analysis, it can be discerned that the traffic data of the community are not only related to the continuous flow data but also associated with the time periods.Initially, we integrate the relevant features Z to capture this relationship.Subsequently, In the TSM, we adeptly employed both hourly and daily data to extract temporal features, encompassing the recent and epochal aspects of time.Within our experimental domain, we further formulated a weekly dataset, denoted as M t (t), derived from the original matrix.We analyzed diverse time series to identify crucial temporal features for predicting cellular traffic on the TSM.Results in Table 2 demonstrate that combining proximity and daily periodic time features yielded the most effective outcomes.Longer cycle sequences tended to introduce disruptions to predictions, thereby reducing forecasting accuracy.

Analysis of Useful Spatial Features
In our model, each transformer necessitates two sequences as inputs.To enhance precision, we conducted experiments with varied combinations of input sequences.In Figure 5, the optimal inputs for each transformer are depicted.Spatial transformers leverage traffic data from pertinent grids and M c (t) to capture spatial information and self-information.Enclosed transformers and periodic transformers benefit from input sequences that incorporate recent and cyclical characteristics.

Experiment Analysis
We conducted experiments on our in-house dataset of cellular network traffic to contrast the proposed TSENet against alternative models, and the assessment outcomes are illustrated in Table 3.As shown, ARIMA exhibited an elevated MAE and RMSE on the dataset compared to alternative models, primarily because it disregarded other dependencies and concentrated solely on temporal features.SLTM's performance was inferior to deep learning approaches but surpassed statistical methods.HSTNet exclusively attended to temporal attributes, overlooking external factors such as BS information and POI distribution.Similarly, STDenseNet neglected external factors.Although TWACNet utilizes convolution-based networks, its performance fell short of ConvLSTM-based methods.STC-Net employs ConvLSTM-based networks but lacks the incorporation of self-attention and correlation layers for improved feature extraction.The aforementioned techniques heavily rely on networks for uncovering latent information, potentially inadequately modeling continuous spatiotemporal features.Compared to other methods, TSENet achieved the best performance.The TSENet model took 2765 s to complete 10 epochs during the training phase.On one side, the TSM and SSM amplify the capacity to capture continuous spatiotemporal features.Furthermore, TSENet introduces a self-attention mechanism, aggregating the capability to capture dependencies between extracted external information and network traffic information.(1) Augmentation of data: When setting Q to zero, it is equivalent to omitting data augmentation from the TSM.As discerned from Figure 5, the augmentation of data enhanced the predictive outcomes, with the minimum value of MAE observed when Q was set to 15. (2) Grid selection: Setting K to one is comparable to removing grid selection from the set-top box.As evidenced in Figure 5, grid selection markedly enhanced predictive performance.Optimal performance was achieved when selecting 20 grids.(3) Spatial features in the SSM: In order to substantiate the significance of extracting spatial features, we conducted experiments solely utilizing the TSM.The outcomes of TSENet (TSM only) presented in Table 4 underscore the efficacy of capturing spatial relationships.Additionally, we utilized a graph convolutional network (GCN) [33] in lieu of the SSM to represent spatial features in TSENet, assessing the spatial modeling efficacy of the SSM through experiments.The results in Table 4 demonstrate that incorporating the SSM yielded better performance compared to a GCN, enhancing the overall model performance.(4) Temporal fusion with the SSM: We also trained TSENet (without SSM fusion), where SSM was excluded from TSM.As illustrated in Table 4, TSENet (excluding SSM fusion) demonstrated superior performance in anomalous scenarios, confirming the spatial transformer's effectiveness in capturing real-time spatial dependence.

Conclusions
This paper introduced a novel TSENet to simultaneously explore spatial sequences, temporal sequences, and external factor information.TSENet treats cellular data across grids within a defined time interval as a spatial sequence, facilitating the modeling of realtime spatial correlations globally.Additionally, incorporating a self-attention mechanism, this method integrates the capability to capture dependencies among external factors and network traffic features.Experimental results showed the efficacy of TSENet, affirming the utility of the transformer architecture in cellular traffic prediction.This suggests that our proposed approach can enhance the accuracy of cellular network traffic forecasting.Future endeavors will involve developing more efficient transformers, unearthing additional valuable external information, and further refining predictions.

Figure 3 .
Figure 3. Structures of the external-factor module.

Figure 5 .
Figure 5. (a) MAE values when different transformers employed distinct initial sequences, (b) MAE values for SSM and enclosed transformers with varied K and Q.

Table 1 .
Parameters of different transformers.

Table 2 .
Different combinations of time converters.

Table 3 .
Experimental results of the RMSE and MAE.

Table 4 .
Transformer-based verification of key components.