Long-Term Data Traffic Forecasting for Network Dimensioning in LTE with Short Time Series

Abstract: Network dimensioning is a critical task in current mobile networks, as any failure in this process leads to degraded user experience or unnecessary upgrades of network resources. For this purpose, radio planning tools often predict monthly busy-hour data traffic to detect capacity bottlenecks in advance. Supervised Learning (SL) arises as a promising solution to improve predictions obtained with legacy approaches. Previous works have shown that deep learning outperforms classical time series analysis when predicting data traffic in cellular networks in the short term (seconds/minutes) and medium term (hours/days) from long historical data series. However, long-term forecasting (several months horizon) performed in radio planning tools relies on short and noisy time series, thus requiring a separate analysis. In this work, we present the first study comparing SL and time series analysis approaches to predict monthly busy-hour data traffic on a cell basis in a live LTE network. To this end, an extensive dataset is collected, comprising data traffic per cell for a whole country during 30 months. The considered methods include Random Forest, different Neural Networks, Support Vector Regression, Seasonal Auto Regressive Integrated Moving Average and Additive Holt-Winters. Results show that SL models outperform time series approaches, while reducing data storage capacity requirements. More importantly, unlike in short-term and medium-term traffic forecasting, non-deep SL approaches are competitive with deep learning while being more computationally efficient.


Introduction
In future 5G networks, it is expected that the rapid traffic growth and the coexistence of services with very different requirements will lead to constantly changing traffic patterns and network capacity requirements [1,2]. As a result of such dynamism, network (re-)dimensioning will become a critical task in these networks. Consequently, operators will have to constantly revise and update their capacity plans to predict capacity bottlenecks in advance, avoiding problems before user experience is degraded. To ease this task, smart capacity planning tools are developed within a Self-Organizing Networks (SON) framework [3,4]. Such tools, often implemented in the Operation Support System (OSS), require accurate prediction of upcoming traffic demand and network capabilities to anticipate traffic variations, so that network configuration can be timely upgraded to guarantee an adequate level of user satisfaction in this changing environment.
In the radio interface, capacity bottlenecks are detected by comparing traffic forecasts with some pre-defined threshold reflecting cell capacity, so that an alarm is activated to notify the future lack of resources. Then, different re-planning actions can be executed depending on how much in advance the problem is detected. Imminent problems detected with short-term forecasts often trigger temporary changes of network parameters (e.g., a more efficient voice coding scheme [5], new handover margin settings for traffic sharing between adjacent cells [6] or naïve packet schedulers for a lower computational load [7]).
Such quick actions, dealing with fast fluctuations of traffic demand, tend to be provisional, so that parameters return to their original values when normal network state is recovered, or act as temporary solution if the problem persists in the hope of more stable solutions relying on network capacity extensions. In contrast, long-term forecasts predict the lack of resources ahead in time (e.g., several months), so that more future proof solutions can be implemented (e.g., bandwidth extension [8], license extension for the maximum number of channel elements and/or simultaneous users [9] or new carriers/co-sited cells).
In the literature, the earliest works address circuit-switched traffic prediction by deriving statistical models based on historical data (e.g., Auto Regressive Integrated Moving Average-ARIMA) [10,11]. More modern approaches tackle packet-data traffic forecasting with sophisticated models based on supervised machine learning to take advantage of massive data collected in cellular networks [12,13]. Several models have been proposed to predict traffic in the short term (seconds, minutes) based on deep learning [14,15]. With these models, advanced dynamic radio resource management schemes and proactive self-tuning algorithms can be implemented [16]. However, some re-planning actions (e.g., deployment of a new cell) may take several months to be implemented (e.g., radio frequency planning, site acquisition, civil works, licenses, installation/commissioning, pre-launch optimization...) [17]. Thus, the upcoming traffic must be predicted with much longer time horizons (i.e., several months) [18]. For this purpose, a monthly traffic indicator is often computed per cell from busy-hour measurements, limiting the number of historical data samples used for prediction [19,20]. Moreover, some studies [21,22] have shown that the influence of past measurements quickly diminishes after a few weeks, due to changes in user trends (e.g., new terminals, new hot spots...) and re-planning actions by the operator (e.g., new site, equipment upgrades...). As a consequence, long-term traffic forecasting relies on short and noisy time series, different from those used in short-term traffic forecasting, based on minute/hourly data. It is still to be checked if SL methods outperform classical time series analysis under these constraints.
In this work, a comprehensive analysis is carried out to compare the performance of SL against time series analysis schemes for predicting monthly busy-hour data traffic per cell in the long term. For this purpose, a large dataset is collected during 30 months from a live Long Term Evolution (LTE) network covering an entire country. All prediction techniques considered here are included in most data analytics packages and have already been used in several fields. Hence, the main novelty is the assessment of well-established SL methods for long-term data traffic forecasting based on short and noisy time series taken from current mobile networks offering a heterogeneous service mix. Specifically, the main contributions are: • The first comprehensive comparison of the performance of SL algorithms against classical time series analysis in long-term data traffic forecasting on a cell basis, relying on short and noisy time series. Algorithms are compared in terms of accuracy and computational complexity. • A detailed analysis of the impact of key design parameters, namely the observation window, the prediction horizon and the number of models to be created (one per cell or one for the whole network).
The rest of the document is organized as follows. Section 2 presents related work. Section 3 formulates the problem of predicting monthly busy-hour data traffic per cell, highlighting the properties of the time series involved. Section 4 describes the considered dataset. Section 5 outlines the compared forecasting methods. Section 6 presents the performance assessment. Finally, Section 7 summarizes the main conclusions.

Related Work
Traffic forecasting in telecommunication networks can be treated as a time series analysis problem. In the earliest works, circuit-switched traffic prediction is addressed by deriving statistical models based on historical data. Linear time series models, such as Auto Regressive Integrated Moving Average (ARIMA), capture trend and short-range dependencies in traffic demand. More complex models, such as Seasonal ARIMA (SARIMA) [10,23] and exponential smoothing (e.g., Holt-Winters) [24,25], include seasonality. These can be extended with non-linear models, such as Generalized Auto Regressive Conditionally Heteroskedastic (GARCH) [11], to reflect long-range dependencies.
The previous works show that it is possible to predict cellular traffic at different geographical scales (e.g., network operator [10], province [23], cell [24]) and time resolutions (e.g., minutes [11], hourly [24], daily [10], monthly [23]), provided that traffic is originated by circuit-switched services (e.g., voice and text messages). However, predicting packet data traffic is much more challenging [26]. As pointed out in [25], data traffic is more influenced by abnormal events and changes in network configuration than circuit-switched traffic. In [27], short-term traffic volume in a 3G network is predicted via Kalman filtering. In [28], ARIMA is used to predict the achievable user rate in four cells located in areas with different land use. In [29], application-level traffic is predicted by deriving an α-stable model for 3 different service types and then dictionary learning is implemented to refine forecasts. As an alternative, more modern approaches tackle data traffic forecasting with sophisticated models based on Supervised Learning (SL). Most efforts have been focused on short-term prediction (seconds, minutes) to solve the limitations of time series analysis approaches to capture rapid fluctuations of the time series. A common approach is to use deep learning to model the spatio-temporal dependence of traffic demand. The temporal aspect of traffic variations is often captured with recurrent neural networks based on Long Short-Term Memory (LSTM) units [14,[30][31][32]. Alternatively, in [33], a deep belief network and a Gaussian model are used to capture temporal dependencies of network traffic in a mesh wireless network. The spatial dependence is captured by different approaches. In [14], the scenario is divided into a regular grid, and a convolutional neural network is used to model traffic spatial dependencies among grid points.
A similar approach is considered in [34], where extra branches are added to the network for fusing external factors such as crowd mobility patterns or temporal functional regions. In [35], convolutional LSTM units and 3D convolutional layers are fused to encode the spatio-temporal dependencies of traffic carried in the grid points. Alternatively, other authors model spatial dependencies of traffic carried in different cells. In [32], a general feature extractor is used with a correlation selection mechanism for modeling spatial dependencies among cells and an embedding mechanism to encode external information. In [15], to deal with an irregular cell distribution, the spatial relevancy among cells is modeled with a graph neural network based on distance among cell towers. A graph-based approach is also considered in [36], where traffic is decomposed in inter-tower and in-tower. Deep learning schemes such as LSTM [37], convolutional neural networks [38] or recurrent neural networks [39] have also been applied to coarser time resolutions (i.e., an hour) to extend the forecasting horizon to several days.
The above works show that advanced deep learning models perform well if data are collected with fine-grained time resolution to build long time series (i.e., thousands of samples) of correlated data. However, long-term traffic prediction performed in radio planning tools relies on short and noisy time series, which might prevent operators from using complex deep learning models. As an alternative, it is appropriate to check if simpler SL methods outperform classical time series analysis for time series with these constraints. For this purpose, a large dataset containing data collected on a per-cell basis during years is required. Such information is an extremely valuable asset for operators, which is seldom shared. For this reason, to the authors' knowledge, no recent work has evaluated long-term traffic forecasting in mobile networks considering SL techniques.

Problem Formulation
The problem of forecasting the traffic carried in a cell c with a time horizon H at time t from historical data collected during an observation window of W samples can be formulated as

T̂(c, t + H) = f(T(c, t), T(c, t − 1), ..., T(c, t − W + 1)),

where t − W + 1 denotes the oldest available data sample. The way to tackle traffic prediction strongly depends on the time granularity of the data, which determines two key factors. A first factor is the length of the available time series, which upper-bounds the observation window W. Note that, when training a prediction model, the number of observations must be higher than the number of model parameters [40]. Thus, complex deep learning models with hundreds of internal parameters cannot be considered if only a short time series (i.e., dozens of samples) is available due to data aggregation on a coarse time resolution (e.g., monthly data).
A second factor is data predictability, which may be degraded by the aggregation operation needed to compute monthly busy-hour traffic. To illustrate this fact, Figure 1 shows the evolution of monthly busy-hour traffic in the Down Link (DL) of a live LTE cell. As described later in Section 4, data aggregation is performed by selecting the busy hour per week and then averaging the weekly traffic measurements in a month. Weekly and monthly aggregation eliminates hourly/daily fluctuations and the impact of sporadic events (e.g., cell outage, cell barring...), leaving only the monthly variations needed for detecting capacity issues. The latter consist of: (a) a trend component, influenced by the traffic growth rate, (b) a seasonal component, given by the month of the year and (c) a remainder component, due to abnormal events taking place locally (e.g., new nearby site, new hot spot...) or network wide (e.g., launch of new terminals, change of network release...). In the example of the figure, it is observed that the trend component prevails over the seasonal component, making the time series non-stationary. Equally important, network events cause sudden trend changes, which decrease the value of past knowledge and, ultimately, degrade the performance of prediction algorithms. For a deeper analysis of how time resolution affects predictability, Figure 2a,b show a box-and-whisker diagram of the autocorrelation function of the DL traffic pattern in 310 cells from a live LTE network on an hourly and monthly basis, respectively. To ease comparison, two seasonal periods are represented in both cases (2 days for hourly data and 2 years for monthly data). The shadowed area depicts the 95% confidence interval for each lag, suggesting that values outside this area are very likely a true correlation and not a statistical fluke. The smaller interval width for the hourly series is due to the larger number of samples in those series.
In Figure 2a, the interquartile boxes show a cyclical pattern with maxima (strong positive autocorrelation) at lags 24 and 48 and minima (strong negative correlation) at lags 12 and 36. Such a behavior reveals that hourly traffic is seasonal. Moreover, in most lags, the whole box is out of the shadowed area, suggesting that past information is relevant for the current traffic value. In contrast, Figure 2b shows that the autocorrelation of monthly traffic does not suggest a seasonal pattern, due to the absence of local maxima or minima. On the contrary, it quickly diminishes to 0, so that, in most cells, only information from lags 1 to 4 is significantly correlated with the current traffic value. This fact suggests that, even for time series with the same available observation window, monthly busy-hour data are less predictable than hourly data. It is especially relevant that, unexpectedly, correlation with lags 12 and 24 (i.e., with data collected in the same month of previous years) is not significant, reducing data predictability. A closer analysis (not presented here) shows that seasonality is not observed in the de-trended monthly traffic series either.
The above analysis confirms that long-term traffic forecasting for network dimensioning, based on monthly busy-hour traffic data, requires a separate analysis from short-term traffic forecasting based on higher time resolution data.

Dataset
The analysis carried out in this work requires a large dataset covering as many cells and network states as possible. The considered dataset contains data collected from January 2015 to June 2017 (i.e., 30 months) in a large live LTE network serving an entire country. The network comprises 7160 cells (also known as evolved Nodes B) covering a geographical area of approximately 500,000 km², including cells of different sizes and environments, with millions of subscribers. Analysis is restricted to the DL, as it has the largest utilization in current cellular networks.
In the network, traffic measurements are gathered on a per-cell and hourly basis. As explained in Section 1, such raw data are pre-processed to obtain a single traffic measurement per cell/month to be stored in the long term for network dimensioning tasks. The resulting dataset consists of 7160 time series (1 per cell) with 30 measurements (1 per month) of the monthly DL traffic volume in the busy hour, expressed as a rate (i.e., in kbps). The monthly DL traffic volume carried in cell c and month m during the busy hour, T(c, m), is calculated as follows:

1. The average DL traffic volume (in kbps) and the average number of active users (i.e., those users demanding resources) are measured and collected per cell and hour.

2. The weekly busy hour is selected per cell as the hour with the highest number of active users in week w, where each week w belongs to a month m. The DL traffic volume (in kbps) during that busy hour is taken as the weekly busy-hour DL traffic volume per cell, week and month, T_w(c, w, m).

3. Finally, the monthly busy-hour DL traffic volume per cell and month, T(c, m), is computed as the average of T_w(c, w, m) across the weeks in month m, as

T(c, m) = (1 / N_w(m)) Σ_{w=1}^{N_w(m)} T_w(c, w, m),

where N_w(m) is the number of weeks in month m. For simplicity, a week spanning two months is considered to belong to only one month (the month including more days of that week).
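Steps 1-3 above can be sketched with pandas as follows. The hourly measurements are synthetic stand-ins (column names are hypothetical), and, for brevity, the week-to-month assignment is approximated by the busy-hour timestamp rather than the majority-of-days rule described above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Step 1 (hypothetical data): hourly DL traffic (kbps) and active users for
# a single cell over 9 full ISO weeks starting on a Monday.
hours = pd.date_range("2017-01-02", periods=24 * 7 * 9, freq="h")
df = pd.DataFrame({
    "time": hours,
    "dl_kbps": rng.uniform(500, 5000, size=len(hours)),
    "active_users": rng.integers(0, 200, size=len(hours)),
})

# Step 2: per week, keep the hour with the most active users (weekly busy
# hour) and take its DL traffic as the weekly busy-hour traffic T_w(c, w, m).
df["week"] = df["time"].dt.isocalendar().week
busy = df.loc[df.groupby("week")["active_users"].idxmax()].copy()

# Week-to-month assignment, approximated here by the busy-hour timestamp
# (the paper assigns a week to the month containing more of its days).
busy["month"] = busy["time"].dt.month

# Step 3: T(c, m) = average of weekly busy-hour traffic across the weeks of
# month m.
monthly = busy.groupby("month")["dl_kbps"].mean()
print(monthly)
```

In the real pipeline this runs per cell, producing the 30-sample monthly series described above.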
The considered dataset combines a large temporal and spatial scale (30 months, entire country) with a fine-grained spatial resolution (cell), which guarantees the reliability and significance of results. Moreover, it allows testing prediction methods with different observation windows and time horizons, broadening the scope of the analysis.

Traffic Forecasting Methods
Six prediction methods are tested. A first group consists of classical statistical time series analysis schemes, namely Seasonal Auto Regressive Integrated Moving Average (SARIMA) and Additive Holt-Winters (AHW). A second group consists of state-of-the-art SL algorithms, namely Random Forest (RF), Artificial Neural Networks (ANN) and Support Vector Regression (SVR). The basics of these techniques are outlined next: • SARIMA computes the current value of a time series difference as the combination of previous difference values and the present and previous values of the series. As detailed in [41], a SARIMA process is described as SARIMA(p, d, q)(P, D, Q)_s. (p, d, q) describe the non-seasonal part of the model, where p is the auto-regressive order, d is the level of differencing and q is the moving average order, with p, d and q non-negative integers. (P, D, Q) describe the seasonal part of the model, where P, D and Q are similar to p, d and q, but with backshifts of the seasonal period s (e.g., for monthly data, s = 12). • AHW calculates the future value of a time series with recursive equations by aggregating its typical level (average), trend (slope) and seasonality (cyclical pattern) [42]. These three components are expressed as three types of exponential smoothing, with smoothing parameters α, β and γ, respectively. As in SARIMA, the seasonal period is denoted as s (for monthly data, s = 12). In this work, Additive Holt-Winters (AHW) is chosen, since, as shown in Figure 1, the seasonal effect is nearly constant through the time series (i.e., it does not increase with the level). • RF is an ensemble learning method where several decision trees are created with different subsets of the training data (also known as aggregating or bagging). To reduce the correlation among trees, a different random subset of input attributes is selected at each candidate split in the learning process (feature bagging) [43]. To avoid model overfitting, trees are pruned.
Then, predictions obtained from different trees are averaged to perform a robust regression [44]. • ANN is a statistical learning method inspired by the structure of the human brain. In ANN, entities called nodes act as neurons, performing non-linear computations by means of activation functions [45]. Two ANNs are considered in this work. The first one, denoted as ANN-MLP, is a feed-forward network (i.e., without memory) consisting of a Multi-Layer Perceptron (MLP) [46]. The second one, denoted as ANN-LSTM, is a deep recurrent network (i.e., with memory) based on LSTM units, capable of capturing long-term dependencies thanks to the use of information control gates [47]. The architectures of such networks (number of layers, number of neurons per layer, activation functions, etc.) are detailed in Section 6.1. • SVR maps a set of inputs into a higher dimensional feature space to find the regression hyperplane that best fits every sample in the dataset. For this purpose, a linear or non-linear (also known as kernel) mapping function can be used. Unlike traditional multiple linear regression, SVR neglects all deviations below an error sensitivity parameter, ε. Moreover, the regularization parameter, C, restricts the absolute value of regression coefficients. Both parameters control the trade-off between regression accuracy and model complexity (i.e., the smaller ε and the larger C, the better the model fits the training data, but overfitting is more likely) [44].
Several issues must be considered when using these methods for long-term data traffic forecasting in cellular networks:

1. Observation window: time series corresponding to recently deployed cells may not have many historical measurements. Thus, it is important to check the capability of the methods to work with small observation windows. Such a feature is especially critical during the network deployment stage, when the network structure is constantly evolving (e.g., new cells are activated every month). At this early stage, robust traffic forecasting is crucial to avoid under-/over-estimating traffic in the new cells.

2. Number of models: recursive models such as SARIMA, AHW and ANN-LSTM are conceived to build a different model per cell based on historical data of that particular cell. Thus, the short period available in data warehouse systems for long-term forecasting (typically, less than 24 months) may jeopardize prediction capability in these methods, since it is always necessary to have more observations than model parameters [40]. In contrast, in RF, ANN-MLP and SVR, a single model can be derived for the whole network from historical data of all the cells. The latter ensures that enough training data are available to adjust model parameters, avoiding model overfitting. Likewise, sharing past knowledge across cells in the system increases the robustness of predictions in cells with limited data or abnormal events.

3. Time horizon: the earlier a capacity bottleneck can be predicted, the more likely the problem will be fixed without any service degradation, since some network re-planning actions (e.g., building a new site) may take several months. Such a delay forces operators to foresee traffic demand several months in advance (referred to as multi-step prediction). In classical time series analysis methods, such as SARIMA and AHW, multi-step prediction is carried out recursively by using a one-step model multiple times (i.e., the prediction for the previous month is used as an input for predicting the following month). Such a recursive approach reduces the number of models needed, but quickly increases the prediction errors originated by the use of predictions instead of observations as inputs [48]. This is a critical issue when using recursive methods for series with large random components, such as those used in long-term forecasting. In contrast, SL algorithms can directly train a separate model for each future step. Such an approach does not entail an increase in computational load if the set of predicted steps is small (e.g., 3 and 6 months ahead).

4. Interpretability: ideally, prediction models should be simple enough to have an intuitive explanation of their output values [49]. Models built with SARIMA and AHW are easier to understand, since their behavior is described by a simple closed-form expression, whereas models built with RF, ANN and SVR cannot be explained intuitively. In long-term traffic forecasting, interpretability is not an issue, and it is thus neglected in this work.
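The contrast between recursive and direct multi-step prediction described above can be sketched as follows. The series, window length and horizons are illustrative, and Random Forest stands in for any one-step SL regressor; the recursive branch mimics how SARIMA/AHW reuse a one-step model, while the direct branch trains a dedicated model per horizon.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
# Long synthetic series just to have enough training pairs for the sketch.
series = 1000 + 30 * np.arange(60) + rng.normal(0, 100, 60)
W = 12  # observation window (months)

def windows(y, horizon):
    """Build (inputs, target) pairs: W past samples -> value `horizon` steps ahead."""
    X, t = [], []
    for i in range(len(y) - W - horizon + 1):
        X.append(y[i:i + W])
        t.append(y[i + W + horizon - 1])
    return np.array(X), np.array(t)

# Direct approach: a dedicated model per horizon (here 3 and 6 months ahead).
direct = {}
for h in (3, 6):
    X, t = windows(series, h)
    direct[h] = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, t)

# Recursive approach: a single one-step model applied repeatedly, feeding its
# own predictions back as inputs, so errors accumulate with the horizon.
X1, t1 = windows(series, 1)
one_step = RandomForestRegressor(n_estimators=50, random_state=0).fit(X1, t1)

hist = list(series[-W:])
for _ in range(6):
    hist.append(one_step.predict([hist[-W:]])[0])
recursive_6 = hist[-1]

direct_6 = direct[6].predict([series[-W:]])[0]
print(recursive_6, direct_6)
```

The direct model sees only observations at prediction time, which is why it degrades more gracefully on noisy series.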

Performance Assessment
This section presents the evaluation of the different forecasting approaches with the dataset described in Section 4. For clarity, model construction is first explained, assessment methodology is then detailed and results are presented later.

Model Construction
Assessment is carried out with IBM SPSS Modeler [50], a commercial tool for predictive analytics extensively used in several fields [51][52][53]. The tool has a visual interface that allows users to leverage statistical and data mining algorithms without programming. Likewise, it offers an expert mode to find the optimal settings for certain model hyperparameters. These features are extremely valuable for network planners without a deep knowledge of prediction methods. SPSS Modeler 18.1 includes all the forecasting methods considered in this work except ANN-LSTM, which is implemented using the Keras Python library [54]. Table 1 summarizes the hyperparameter settings selected for the six prediction methods. To get the best of classical time series approaches, p, d, q, P, D and Q in SARIMA and α, β and γ in AHW must be set on a cell basis. For this purpose, the Expert mode offered by SPSS Modeler is used to find the optimal settings per cell. In this mode, the input data series is first transformed when appropriate (e.g., differencing, square root or natural log) and model parameters are then estimated by minimizing the one-step ahead prediction error. Such an Expert mode is also used to set the number of neurons per layer in ANN-MLP. In models comprising neural networks, overfitting is avoided through the use of early stopping callbacks and validation data. The rest of the hyperparameters are fixed manually following a grid search in the parameter space. The reader is referred to [54,55] for further information on the algorithms in SPSS Modeler and Keras. To check the sensitivity of the methods to the data collection period, the considered approaches are tested in three different cases. In a first case, it is assumed that the operator collects traffic data on a cell basis during 24 months. In the second and third cases, it is assumed that the operator collects traffic data during 18 and 12 months, respectively (e.g., network deployment stage).
This reduction in the observation window is an important constraint for SARIMA and AHW, which require dozens of samples to predict monthly data from time series with some randomness, as those considered in this work [40]. Hence, only SL methods are compared in these cases.
To check the impact of prediction time horizon, two horizons are considered: 3 months (i.e., traffic is forecast 3 months in advance) and 6 months (i.e., 6 months in advance). Thus, the combination of factors results in 6 cases, hereafter referred to as cases 12-3, 12-6, 18-3, 18-6, 24-3 and 24-6, where the first number denotes the number of months when data are collected by the operator and the second number denotes the prediction horizon. From the operator point of view, the less data stored and the more in advance traffic is forecast (i.e., case 12-6), the better, but, from an accuracy point of view, the opposite is likely true (i.e., case 24-3).
Forecasting methods are trained and tested emulating the behavior of network planning tools. For clarity, an example of how the operator would apply the methods in each case is given in Figure 3. Specifically, the example illustrates how to predict the traffic carried in June 2017 with a 3-month time horizon (i.e., the prediction is made in March 2017) based on measurements from previous months. Similarly, Figure 3c illustrates the timeline in case 12-3 (12-month collection period). Now, only data from April 2016 to March 2017 are necessary. In this case, SARIMA and AHW methods cannot be used due to the limited length of the time series. In SL approaches, the process is identical to that explained in case 24-3, except for a reduction in the number of input attributes in every data point from 21 to 9 months (3 months less than the new observation window). Likewise, data points from case 18-3 have 15 traffic measurements as input attributes.
Data forecasting for cases 24-6, 18-6 and 12-6 (i.e., with a 6-month horizon) follows a similar procedure with a 6-month gap between the end of the observation window and the target/predicted month (e.g., in case 24-6, traffic in June 2017 is predicted in December 2016 based on measurements from January 2015 to December 2016).
It should be pointed out that, as stated in Section 5, ANN-LSTM is conceived to build a model per time series (i.e., per cell). However, the short length of the time series for the three considered observation windows (i.e., 12/18/24 samples) does not allow having as many data points (i.e., time lapses) as parameters. Note that the ANN-LSTM model with the architecture in Table 1 has 1331 parameters. To circumvent this problem, in this work, ANN-LSTM is used like the other SL methods, by creating a single model for the whole network, as explained in Figure 3b,c. To this end, a single time series is generated by concatenating time series from the different cells in the network and only time lapses where the target month is the output are considered (in the example above, June 2017).
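The construction of a single network-wide training set can be sketched as follows for case 12-3: each cell contributes a data point whose inputs are its W = 9 oldest collected months and whose target lies 3 months later. The traffic values are synthetic, and, for simplicity, only one data point per cell is built (the target-month time lapse described above).

```python
import numpy as np

rng = np.random.default_rng(4)
n_cells, n_months = 100, 12   # case 12-3: 12-month collection period
H = 3                         # prediction horizon (months)
W = n_months - H              # 9 input attributes per data point

# Hypothetical monthly busy-hour traffic (kbps), one row per cell.
traffic = rng.uniform(500, 5000, size=(n_cells, n_months))

# Single network-wide training set: each cell contributes a data point whose
# inputs are its first W months and whose target is the month H steps later
# (i.e., the last collected month).
X_train = traffic[:, :W]
y_train = traffic[:, W + H - 1]

# At prediction time, the most recent W months of each cell feed the model
# to forecast H months beyond the collection period.
X_pred = traffic[:, -W:]

print(X_train.shape, y_train.shape, X_pred.shape)
```

Pooling rows across cells is what lets a single SL model be trained even though each individual series is only a dozen samples long.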
Finally, it is worth mentioning that in this work, all available time series are used for both training and testing the SL models. However, training and test datasets used for each target month are disjoint due to a shift in the observation window. This workflow emulates how data become available in a network planning tool as time progresses.

Assessment Methodology
Three experiments are performed sequentially. Results of each experiment motivate the execution of the following one.

Experiment 1-Selection of the Observation Window
The first experiment aims to determine how many months of data (i.e., 12 months, 18 months or 24 months) are required to forecast cellular traffic with enough accuracy with the 2 considered prediction horizons (i.e., 3 and 6 months). For this purpose, the 6 considered methods are evaluated in cases 12-3, 12-6, 18-3, 18-6, 24-3 and 24-6. The data to be forecast are cell traffic in June 2017, as shown in Figure 3. As explained above, a model is created per cell with the time series analysis methods, whereas a single network-wide model is built with all SL methods. Note that assessing case 24-6 (the most data-demanding case) requires collecting data 30 months in advance, and thus June 2017 is the only possible target month for this experiment in the considered dataset.

Experiment 2-Method Comparison
The second experiment aims to check: (a) how much time in advance prediction can be made (3 or 6 months), (b) the dependence of prediction accuracy on the target month and (c) which is the best prediction algorithm. For this purpose, cell traffic from July 2016 to June 2017 (i.e., for a year) is forecast 3 or 6 months in advance with an observation window of 12 months. Models used to predict traffic in the different months are created as explained in Section 6.1, but with the corresponding months displaced (e.g., in case 12-3, if the target month is May 2017, the observation window starts/ends a month earlier than in Figure 3c). SARIMA and AHW approaches cannot be included in this experiment due to the small observation window (12 months).

Experiment 3-Creation of Specific Models for High-Traffic Cells
In capacity planning, accurate traffic prediction is especially important in cells with high traffic, as these are more likely to suffer capacity problems. The aim of this experiment is to evaluate the possibility of creating a differentiated model for such cells. The idea is to discard underutilized cells, which often show noisy traffic measurements, when training the model. To this end, a model to forecast traffic in June 2017 is trained only with data from high-traffic cells. In this work, a cell is considered a high-traffic cell if its traffic at the time the prediction is made (e.g., in case 12-3, 3 months before, i.e., March 2017) exceeds the 85th percentile of the monthly cell traffic in the network, i.e., T(c', Mar17) > P_85({T(c, Mar17)}).

Performance Metrics
Prediction accuracy is assessed with two main indicators:
1. The Mean Absolute Error (MAE), computed as MAE(m) = (1/N_c) · Σ_c |T̂(c, m) − T(c, m)|, where T̂(c, m) is the predicted traffic for month m in cell c, T(c, m) is the measured traffic and N_c is the number of cells.
2. The Mean Absolute Percentage Error (MAPE), computed as MAPE(m) = (100/N_c) · Σ_c |T̂(c, m) − T(c, m)| / T(c, m).
Additionally, two secondary indicators are used for a more detailed assessment: (a) the Mean Error (ME), computed as ME(m) = (1/N_c) · Σ_c (T̂(c, m) − T(c, m)), and (b) the execution time, as a measure of computational load.
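These indicators, together with the 85th-percentile rule for selecting high-traffic cells, take only a few lines with NumPy (a minimal sketch using the standard MAE/MAPE/ME definitions; the array values are illustrative):

```python
import numpy as np

def mae(t_hat, t):
    """Mean Absolute Error (kbps)."""
    return float(np.mean(np.abs(t_hat - t)))

def mape(t_hat, t):
    """Mean Absolute Percentage Error (%)."""
    return float(100.0 * np.mean(np.abs(t_hat - t) / t))

def me(t_hat, t):
    """Mean Error; negative values indicate traffic underestimation."""
    return float(np.mean(t_hat - t))

# Measured and predicted busy-hour traffic per cell for one target month.
t = np.array([100.0, 200.0, 400.0, 800.0])
t_hat = np.array([90.0, 220.0, 380.0, 700.0])

# High-traffic cells: monthly traffic above the 85th network-wide percentile.
high_traffic = t > np.percentile(t, 85)
```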
Results

Experiment 1
Table 2 breaks down the performance of the prediction methods in cases 12-3, 18-3 and 24-3 (3-month horizon). As explained above, SARIMA and AHW are not tested in cases 12-3 and 18-3. Results show that, in case 24-3, SARIMA achieves the worst performance, with an extremely large MAPE (43.25%) and MAE (2069.72 kbps). An analysis not shown here reveals that SARIMA fails to capture trend and level changes in the historical data of some time series. Such behavior may be due to the reduced number of historical samples in the considered time series, which prevents the model from filtering out noise, thus causing a significant error when estimating future traffic. AHW outperforms SARIMA, but it still performs poorly (MAPE = 29.28% and MAE = 1780.71 kbps). Such poor performance is due to the recursive nature of these models, where noisy input data severely degrade the accuracy of predictions beyond the next step. All SL techniques outperform AHW, with MAPE below 28% and MAE below 1600 kbps. Thus, it can be concluded that SL techniques are more accurate than SARIMA and AHW. When comparing case 24-3 (the longest tested window) against case 12-3 (the shortest tested window), it is observed that, unexpectedly, SL algorithms perform similarly or even better in case 12-3 (e.g., for ANN-MLP, MAE increases from 1023.55 kbps with 12 months to 1339.91 kbps with 24 months). This fact confirms that the influence of past measurements quickly diminishes in the long term due to changes in user trends and re-planning actions by the operator. This statement is reinforced by results from case 18-3, in which no algorithm obtains any significant improvement compared to case 12-3. Table 3 shows the comparison between cases 12-6, 18-6 and 24-6 (6-month prediction horizon). In case 24-6, all SL techniques but SVR again outperform both SARIMA and AHW (SVR outperforms SARIMA, but not AHW).
When comparing cases 12-6 and 24-6, all SL algorithms achieve a better MAE in case 12-6, as in cases 12-3 vs. 24-3 (e.g., for SVR, MAE decreases from 2517.43 kbps in case 24-6 to 1372.89 kbps in case 12-6, i.e., 45.47% in relative terms). Results from case 18-6 only improve the performance of SVR. Nonetheless, that method is still the worst SL approach. From the above results, it can be concluded that: (a) SL approaches outperform SARIMA and AHW when predicting traffic in cellular networks in the long term and (b) there is not much benefit in storing more than 1 year of traffic measurements (unless SARIMA, AHW or SVR are the only option). Thus, AHW and SARIMA are not considered in further experiments. Likewise, a 12-month observation window is chosen as optimal for SL methods. Table 4 presents the average MAE, MAPE and ME across months for each algorithm and case. For a more detailed analysis, Figure 4 breaks down the MAE obtained in cases 12-3 (solid lines) and 12-6 (dotted lines) for each target month. Recall that cases 24-3 and 24-6 are not considered based on the conclusions of experiment 1. Table 4 shows that, in case 12-3, RF, ANN-MLP and ANN-LSTM perform similarly (MAPE ≈ 27% and MAE ≈ 1000 kbps), outperforming SVR (MAPE = 30.26% and MAE = 1059.86 kbps). Moreover, Figure 4 shows that prediction accuracy for the former algorithms degrades significantly when predicting traffic in July and August 2016 (summer holidays) compared to the rest of the months (working months). This might be due to isolated events taking place during summer months in the country where data were collected (e.g., tourism, festivals, etc.) that change traffic patterns unpredictably, making information collected 3 months in advance unrepresentative of the traffic in the months to come. In contrast, SVR shows a more stable behavior during summer months (i.e., its MAE does not degrade).
By comparing cases 12-3 and 12-6 in Table 4, it is observed that, for all algorithms, there is a significant degradation in accuracy if traffic predictions are made more than 3 months in advance (e.g., for ANN-LSTM, MAPE and MAE increase by 18.89% and 20.85% in relative terms, respectively). Moreover, dotted lines in Figure 4 show a strong variation in MAE across months in case 12-6. Thus, when possible, it is recommended to use a 3-month prediction horizon.

Experiment 2
ANN-LSTM shows the best overall results, with a MAPE of 26.37% and an MAE of 986.68 kbps in case 12-3, and the smallest degradation in accuracy from case 12-3 to case 12-6. Nonetheless, for a 3-month horizon, the ANN-MLP and RF algorithms can be used alternatively with similar prediction accuracy (MAPE ≈ 27.70% and MAE ≈ 990 kbps). Recall that, unlike ANN-LSTM, the latter algorithms are included in most data analytics tools (e.g., SPSS Modeler), which provide automatic schemes that relieve network operators from the complex hyperparameter tuning task.
It should be pointed out that, even for the best method and case, prediction results are not very accurate (i.e., MAE ≈ 1000 kbps or, expressed more intuitively, a deviation of 0.39 GB per hour and cell). This error can be explained by re-planning actions taken by the operator in the considered network area during the data collection period, which lead to unpredictable traffic changes in neighbor cells. For a closer analysis, Figure 5 illustrates traffic predictions from January 2017 to June 2017 with a 3-month horizon for two cells, referred to as Cell A and Cell B, with the compared methods. No significant re-planning actions were taken in the surroundings of Cell A during the data collection period, whereas Cell B is a cell with a new neighbor cell deployed in October 2016. In Figure 5a, it is observed that, for Cell A, all methods approximate real traffic quite well (e.g., predictions fluctuate between 4 and 900 kbps for ANN-MLP, and between 314 and 824 kbps for RF). In contrast, in Cell B, the abrupt decrease in traffic in October 2016, caused by the deployment of the nearby cell, leads to large prediction errors for all models.
It is also remarkable that ME values in Table 4 for case 12-3 are negative or close to 0, i.e., models tend to underestimate traffic. This behavior is risky, especially for high-traffic cells, which are more likely to suffer from capacity problems and, hence, require re-dimensioning actions. For a closer analysis of bias, Figure 6 shows the scatter plot of error versus measured cell traffic obtained when predicting traffic carried in March 2017 with ANN-LSTM in case 12-3 (the combination of month/algorithm/horizon with the lowest MAE in this experiment). The regression line shows that the more loaded cells are, the more negative the error is. This trend points out the need for a more accurate model for high-traffic cells. This problem is addressed in experiment 3.

Experiment 3
Table 5 breaks down MAE and MAPE in case 12-3 for the 1074 (15%) cells with the largest traffic in March 2017, obtained with two different models: (a) the model built in experiment 1 (denoted as the network-wide model) and (b) a specific model trained exclusively with data collected from the high-traffic cells (denoted as the specific model). Note that the same set of cells is used to assess both models. To isolate the effect of building a differentiated model, high-traffic cells affected by re-planning actions before June 2017 have not been considered when evaluating the accuracy of the models. In the table, it is observed that the specific model outperforms the network-wide model for the RF, ANN-MLP and SVR methods, whereas the specific model created with ANN-LSTM does not improve on the network-wide model built with this method. SVR experiences the largest improvement with the specific model in absolute terms, with MAE decreasing from 2223.88 kbps to 1725.20 kbps (22.42% in relative terms) and MAPE from 20.44% to 14.19% (29.11% in relative terms). Nonetheless, it is still the worst method. Specific models created with RF and ANN-MLP show the best results, achieving MAPE ≈ 11% and MAE ≈ 1200 kbps.
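The load-dependent bias just described can be checked numerically by fitting a regression line to the error-versus-traffic cloud, as done in Figure 6 (a sketch on synthetic data; the traffic range and the bias slope below are invented for illustration):

```python
import numpy as np

# Regress prediction error on measured traffic and inspect the slope.
rng = np.random.default_rng(0)
traffic = rng.uniform(100, 5000, size=500)              # measured kbps (synthetic)
error = -0.05 * traffic + rng.normal(0, 50, size=500)   # synthetic load-dependent bias
slope, intercept = np.polyfit(traffic, error, 1)
# A negative slope reproduces the trend reported above: the busier the cell,
# the more its traffic is underestimated, motivating a specific model for
# high-traffic cells.
```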
For a closer analysis, Figure 7a-d shows the error cumulative distribution functions (CDFs) obtained with the different methods for high-traffic cells with the network-wide model (solid lines) and the specific model (dashed lines) in case 12-3 for the selected target month (June 2017). It is observed that, for all methods, error curves with the specific models are shifted to the right compared to those with the network-wide models, with median values closer to 0. Thus, the specific models for high-traffic cells not only increase prediction accuracy, but also reduce the bias. Nevertheless, the CDFs show that the prediction error is still negative in many highly loaded cells (≈45% for RF, ANN-MLP and ANN-LSTM, and ≈75% for SVR). This issue should be addressed by other means (e.g., models based on predictors other than cell traffic).
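The share of cells with negative error read off the CDFs corresponds to the empirical CDF evaluated at 0 (a minimal sketch; the error values are synthetic):

```python
import numpy as np

# Share of cells whose prediction error is negative, i.e., whose traffic is
# underestimated: the empirical CDF of the error evaluated at 0.
errors = np.array([-300.0, -120.0, -40.0, 15.0, 60.0, 250.0])  # kbps (toy sample)
share_negative = float(np.mean(errors < 0))
```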

Computational Complexity
The different methods are compared by their total execution time. For SARIMA and AHW, execution time comprises computing a cell-specific model and extending it until the target month, which must be repeated for all cells in the system. For SL algorithms, execution time comprises training a network-wide model with the series from all cells and computing predictions from historical traffic values (i.e., no model extension is needed).
For SARIMA and AHW, running time is expected to grow linearly with the number of predictors (months), n_m, and the number of models built (cells), n_c. Thus, their worst-case time complexity is O(n_m · n_c). For SL methods, running time increases with the number of data points (cells), n_c, and the number of predictors/inputs (months), n_m. Specifically, the worst-case time complexity of the backpropagation algorithm used to train ANNs with n_m inputs, 1 output and k hidden layers is O(n_c · n_m · h^k · i), where h is the size of the hidden layers and i is the number of iterations. The time complexity of the sequential minimal optimization algorithm used to train SVR is quadratic with the training set size, O(n_c^2). Likewise, the worst-case time complexity of RF is given by the time of building a complete decision tree, O(n_m · n_c · log n_c). Table 6 summarizes the time taken to train the models of the different approaches in experiment 1 (7160 cells) on a centralized server with an Intel Xeon octa-core processor, a clock frequency of 2.4 GHz and 64 GB of RAM. The number of predictors per case is shown in the upper part of the table. Results show that, in cases 24-3 and 24-6, AHW and SARIMA are the most time-consuming methods, since they build a model per cell. Among SL methods, ANN-MLP is the fastest approach, whereas ANN-LSTM is the slowest. For SL methods, it is observed that the longer the data collection period, the larger the number of predictors, and hence the longer the runtime. For a given collection period, the longer the time horizon, the smaller the number of predictors in the model, and hence the shorter the runtime. In contrast, for SARIMA and AHW, the longer the time horizon, the longer the runtime, since the model must be extended until the target month. Nonetheless, execution times in all cases are negligible in long-term traffic forecasting, where traffic must be predicted at most once a month.
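Plugging the dataset size into these worst-case complexities gives a rough feel for the relative growth of each method (constants are dropped, so only orders of magnitude are meaningful; the hidden-layer size, depth and iteration count are hypothetical placeholders):

```python
import math

n_c, n_m = 7160, 12          # cells, predictors (case 12-3)
h, k, i = 32, 2, 100         # hypothetical hidden size, layers, iterations

tsa = n_c * n_m                       # SARIMA/AHW: O(n_m * n_c), one model per cell
ann = n_c * n_m * (h ** k) * i        # backpropagation: O(n_c * n_m * h^k * i)
svr = n_c ** 2                        # SMO: O(n_c^2)
rf = n_m * n_c * math.log2(n_c)       # one full decision tree: O(n_m * n_c * log n_c)

# At this scale, SVR's quadratic term already exceeds the per-cell TSA cost,
# which is consistent with SVR scaling worse as the network grows.
assert svr > tsa
```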

Conclusions
Accurate long-term traffic forecasting will be crucial for network (re)dimensioning in 5G networks. However, monthly busy-hour time series are short and noisy, which makes long-term traffic prediction a challenging task. In this work, a comparative study has been carried out to assess different approaches for predicting cellular traffic in the long term (i.e., several months in advance). Six methods have been compared, including classical time series analysis schemes (SARIMA and AHW) and supervised machine learning algorithms (RF, ANN-MLP, ANN-LSTM and SVR). To this end, three experiments have been carried out with a dataset taken from a live LTE network, covering an entire country (7160 cells) and traffic data for two and a half years.
Results have shown that SL methods outperform classical time series analysis in terms of accuracy and required storage capacity. Specifically, with SL algorithms, the traffic carried per cell can be predicted with an MAE ≈ 1000 kbps with a 3-month time horizon and a 12-month observation window (i.e., data collection period). It has also been shown that it is convenient to develop specific models for high-traffic cells, where prediction accuracy is critical. Overall, RF and ANN-MLP have shown the best results, providing acceptable accuracy (MAPE ≈ 11%) to detect capacity bottlenecks in high-load cells with a 3-month prediction horizon by using data from the past 12 months. It is remarkable that these non-deep algorithms perform very similarly to deep neural networks based on LSTM units, used to model time dependencies in short- and medium-term traffic forecasting. This is due to the monthly busy-hour aggregation of data when forecasting traffic for network dimensioning, which reduces time series length and predictability compared to the series used in short/medium-term forecasting. Nonetheless, none of the considered algorithms is extremely accurate, especially for summer months, due to changes in user trends, social events or temporary re-planning actions by the operator.
The use of SL algorithms proposed in this work, based on the creation of a single model for the whole network, is suitable for a centralized network dimensioning tool running in the OSS, where historical traffic measurements are currently stored. Alternatively, SARIMA and AHW could also be executed in base stations if historical measurements were available in each cell.
Future work will extend the models to account for events that drastically change traffic patterns in a cell (e.g., a new neighbor site, equipment upgrades, social events, etc.). Likewise, the traffic of nearby cells (e.g., co-sited cells) can be jointly predicted via multi-task learning to cope with noise in the time series.