A Deep Dive into AI-Based Network Traffic Prediction Using Heterogeneous Real Datasets

Kim, Jungyun; Ryoo, Intae

doi:10.3390/app16010367

Open AccessArticle

A Deep Dive into AI-Based Network Traffic Prediction Using Heterogeneous Real Datasets

by

Jungyun Kim

and

Intae Ryoo

^*

Department of Computer Science & Engineering, Kyung Hee University, Yongin-si 17104, Gyeonggi-do, Republic of Korea

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(1), 367; https://doi.org/10.3390/app16010367

Submission received: 28 October 2025 / Revised: 16 December 2025 / Accepted: 17 December 2025 / Published: 29 December 2025

(This article belongs to the Special Issue Applications of Artificial Intelligence in Wireless Communications and Networks)

Download

Browse Figures

Versions Notes

Abstract

Recent studies have highlighted that network traffic may be influenced by various external factors such as weather conditions and user behavior, making it challenging to achieve precise predictions using only historical traffic data. To address this limitation, this study proposes a multivariate time series prediction model that incorporates environmental variables, such as meteorological information, to improve the accuracy of network traffic forecasting. Five deep learning models—RNN, GRU, LSTM, CNN, and Transformer—were evaluated under the same experimental conditions. Performance was assessed using metrics such as MSE, RMSE, MAE, R², and MAPE. In addition, ANOVA and Tukey HSD post hoc tests were conducted to analyze the statistical significance of performance differences between models, and the contribution of each environmental variable was evaluated using the Permutation Importance method, which demonstrated a significant impact on model performance. Experimental results indicated that the GRU and RNN models achieved the best overall prediction accuracy. Additionally, some weather variables, such as temperature and sunlight duration, positively impacted performance improvement. This study empirically demonstrates the generalization capabilities of simple recurrent architectures and the effectiveness of integrating environmental variables. Furthermore, it suggests future research directions, including cross-domain model adaptation and the application of large language model (LLM)-based time series forecasting frameworks.

Keywords:

network traffic prediction; environmental variables; time series forecasting; deep learning models; permutation importance; artificial intelligence

1. Introduction

As the hyper-connected society emerges, network infrastructure is experiencing continuous increases in structural complexity and network traffic handling demands. The commercialization of 5G, advancements in cloud and edge computing, and the proliferation of the Internet of Things (IoT) have diversified both the number of devices connected to the network and the types of data traffic. Consequently, the importance of real-time and large-scale network traffic processing is becoming increasingly prominent [1]. Against this backdrop, the surge in network traffic cannot be effectively addressed by merely expanding physical resources. Instead, an intelligent network traffic prediction-based proactive resource management system is required [2,3]. In response to this need, traditional statistical models such as the Autoregressive Integrated Moving Average (ARIMA) model and regression-based methods have been used for traffic prediction. However, these models have limitations in accurately capturing the complex and nonlinear patterns of actual network traffic [4]. Time series forecasting techniques utilizing Deep Learning (DL) have recently gained attention. Recurrent neural network-based models such as Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) have been applied to various traffic prediction problems due to their structures that effectively capture time dependencies [5,6].

However, these recurrent structures are prone to the vanishing gradient problem during the training process for long sequence inputs and have structural limitations that make them inefficient for parallel processing due to their sequential learning approach. To overcome these limitations, Transformer-based models have recently been extended to the field of time series forecasting [7]. Transformers are effective for time series data with long-term dependencies because they utilize a self-attention mechanism that simultaneously considers relationships across all time steps. Additionally, their parallel processing structure, which eliminates sequential connections, offers advantages in terms of training efficiency. These traits, such as handling long-term dependencies and enabling efficient parallel processing, have been demonstrated in natural language processing, and recently, various tailored structures for time series forecasting have been proposed [8].

Recent deep learning–based approaches have significantly advanced time series forecasting, yet several limitations remain in their application to network traffic prediction. For instance, while RNN-based models (RNN, LSTM, GRU) effectively capture temporal dependencies, they are vulnerable to gradient vanishing and suffer from slow training on long sequences. CNN-based models demonstrate high efficiency in local feature extraction but often fail to learn global temporal dependencies [9,10]. Transformer architectures, on the other hand, address long-range dependency modeling through self-attention but introduce heavy computational complexity and instability when applied to small or noisy datasets [11,12].

Existing comparative studies—such as Autoformer [13] and Informer [14]—primarily evaluate model accuracy using benchmark datasets such as the Electricity Transformer Temperature dataset (ETTh/ETTm) [15] and the Performance Measurement System traffic dataset (PEMS) [16]. However, these studies do not consider domain-specific external influences such as meteorological or environmental factors. Moreover, few studies have examined how the composition of input variables (e.g., the inclusion of exogenous weather information) interacts with different model architectures. Consequently, empirical evidence remains limited regarding which model type achieves the best trade-off between predictive accuracy, generalization capability, and computational efficiency under heterogeneous real-world conditions.

This study aims to fill these gaps by performing a controlled comparison of five representative deep learning models—RNN, LSTM, GRU, CNN, and Transformer—under identical experimental configurations. In particular, we critically evaluate how each model responds to the inclusion of environmental variables, analyze statistical significance through Analysis of Variance (ANOVA) and Tukey’s Honest Significant Difference (HSD) tests, and quantify variable contribution via Permutation Importance (PI) [17,18]. Unlike previous works that focus solely on accuracy, our study emphasizes interpretability, model robustness, and the structural rationale behind performance differences. This comprehensive and statistically validated comparison provides new insight into the suitability of each architecture for real-world, environment-aware network traffic prediction.

Meanwhile, Convolutional Neural Networks (CNNs), originally used in the field of image recognition, have recently been effectively employed to extract and predict local patterns within time series data using a 1D kernel [19].

CNN-based time series models require fewer learning parameters and are suitable for parallel processing. They are particularly advantageous for learning features within short time intervals. This makes them well-suited for detecting sudden traffic changes or periodic predictions over short durations.

Another limitation of existing research is the composition of input variables. Most studies have been limited to predicting and utilizing traffic based on historical traffic usage. However, actual traffic can be influenced by extra-temporal factors, such as environmental variables like temperature, humidity, wind speed, and sunlight [20]. Weather conditions can affect human online activities, the operating times of industrial equipment, and network usage, highlighting the need for multivariate prediction models that integrate such external information. However, not all environmental variables positively impact prediction performance. Some may act as noise, potentially degrading the model’s effectiveness. Therefore, a precise analysis of variable selection strategies and their interaction with model structures is necessary.

In this study, we construct a multivariate time series prediction model that integrates weather data based on the actual network traffic data from Korea South-East Power Co., Ltd. We train RNN, LSTM, GRU, CNN, and Transformer models under the same conditions to quantitatively compare and analyze the performance of each model.

Performance evaluation is conducted using standard forecasting metrics, including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), the coefficient of determination (R²), and Mean Absolute Percentage Error (MAPE), to examine differences in predictive performance depending on the inclusion of environmental variables. Additionally, we apply the Permutation Importance method to analyze the contribution of each environmental variable and assess the validity of a correlation-based variable selection strategy. Through such an analysis, this study aims to address the structural limitations of existing time series prediction research and empirically demonstrate the applicability of Transformer and CNN-based models. Furthermore, by quantitatively analyzing the impact of integrating external environmental information and variable selection strategies on prediction accuracy, the study is expected to provide practical insights for the design of intelligent network operation systems, autonomous traffic control structures, and distributed infrastructure management technologies in the future, focusing exclusively on the forecasting of network data traffic.

The novelty of this study lies in its comprehensive and statistically validated comparison of multiple deep learning architectures for network traffic prediction under identical experimental conditions using real-world data.

Several prior studies have demonstrated that environmental factors can materially influence wireless/network performance and forecasting. For example, one study reported that combining real meteorological data (e.g., temperature, humidity, and precipitation) with deep learning models (LSTM and CNN–LSTM) improved the prediction accuracy of signal strength, which is one of the key indicators of network QoS [21].

Another work decomposed the predictors for cellular traffic forecasting into temporal (time-series), spatial (neighboring-cell), and auxiliary/external components, and incorporated precipitation information (rain/snow occurrence and precipitation intensity) as auxiliary variables; when these environmental variables were included in an LSTM-based framework, long-horizon prediction performance improved in terms of RMSE and MAE compared with the setting without environmental variables [22].

In addition, an empirical study examining the effects of altitude and weather conditions on cellular network performance found that rainy weather has a statistically meaningful impact on QoS—reducing download throughput, increasing packet loss, and lowering mobile signal quality indices [23].

A field investigation of cellular QoS in Dehradun, India further showed that altitude significantly affects signal strength, and that when rainfall intensity exceeds 100 mm/h, the received signal level and signal-to-noise ratio deteriorate sharply, leading to reduced connectivity and increased packet loss; the authors also reported that high summer humidity and temperature exacerbate QoS degradation by increasing thermal noise and propagation delay, and suggested that future work could extend toward cellular optimization by integrating real-time meteorological data with machine learning models [24].

Moreover, in Beyond-5G scenarios where proactive resource allocation depends on accurate traffic forecasting, one study statistically verified—using real-world data—that meteorological factors such as precipitation, wind, and temperature strongly affect traffic volume, and proposed a hierarchical GCN–GRU model that jointly learns temporal, spatial, and meteorological patterns (a GRU for traffic, a GRU for meteorology, and a fusion GRU, with graph convolutions capturing spatial correlations), achieving performance gains of up to 24.8% [25].

Taken together, these findings provide objective evidence that environmental factors can influence traffic prediction. Accordingly, unlike previous studies that focused primarily on model accuracy or relied on standardized benchmark datasets, this paper integrates real Simple Network Management Protocol (SNMP)-based network traffic logs provided by Korea South-East Power Co., Ltd., with publicly available meteorological variables from the Korea Meteorological Administration (KMA) [26] to evaluate the impact of environmental factors on predictive performance.

Furthermore, this study emphasizes interpretability and reproducibility by applying rigorous statistical analyses—such as ANOVA and Tukey HSD tests—to assess the significance of performance differences, and by employing Permutation Importance to quantify the contribution of each input feature. Through this framework, the study not only identifies the most suitable model (GRU and RNN) for short-term network traffic forecasting but also clarifies how environmental variable selection interacts with model architectures.

These contributions collectively extend beyond mere model benchmarking by establishing an empirically grounded methodology for evaluating deep learning models in domain-specific, environment-aware traffic prediction scenarios. This methodological framework can be applied to other real-world infrastructure domains, offering practical insights for intelligent network management and predictive resource allocation.

2. Related Work

2.1. Graph-Based Traffic Prediction

Graph-based traffic prediction models have increasingly focused on learning dynamic spatial–temporal relationships, adapting the graph structure over time, and capturing interactions between nodes under rapidly changing network conditions [27,28,29,30,31,32,33]. These approaches are effective in domains such as vehicular traffic flow and cellular base station coordination, where spatial topology and inter-node dependencies are explicit and evolve over time. However, they typically assume (i) a dense and well-defined spatial graph, (ii) access to rich spatial context such as road layout or base-station adjacency, and (iii) large-scale regional datasets spanning many sensing nodes. Such assumptions do not necessarily hold in operational communication networks within industrial power infrastructures, where traffic patterns are driven more by equipment operation schedules, maintenance activities, and localized load conditions than by human mobility or road congestion. Moreover, prior GNN-based studies rarely perform a controlled, architecture-agnostic comparison across multiple deep learning models under identical experimental settings, and they seldom quantify how exogenous variables affect forecasting accuracy using formal statistical validation. A comparative overview of representative graph-based studies and their structural characteristics is summarized in Table 1.

In contrast, the present work does not construct or assume an explicit spatial graph. Instead, we evaluate recurrent, convolutional, and attention-based architectures (RNN, GRU, LSTM, CNN, Transformer) under the same training and evaluation pipeline using real SNMP-based network traffic data. We further analyze how integrating environmental variables such as temperature, wind, and sunlight duration influences predictive accuracy and generalization. This design shifts the focus from proposing a new spatial graph model to providing an interpretable, statistically validated, and practically deployable framework for environment-aware network traffic prediction.

Table 1. Summary of Graph-Based Traffic Prediction Models.

Problem Definition	Research Objectives	Proposed Method	Dataset	Performance Evaluation Metrics
Reflecting Changes in Spatiotemporal Interactions Using a Static Graph Structure [27]	To develop a GNN architecture capable of adapting to dynamically changing traffic patterns	DSTAGNN: distance-adaptive graph construction + multiscale gated graph convolution + dynamic attention for spatio-temporal modeling	PeMSD7 [34], PEMS-BAY [35]	MAE, RMSE, MAPE
Reflecting Real Interaction Relationships Using a Fixed Adjacency Matrix [28]	Learning-Based Spatial Structure and Attention-Based Spatiotemporal Pattern Detection	Node Embedding + Spatial Attention-Based WaveNet	PeMSD7, METR-LA [36]	MAE, RMSE, MAPE
Representation of complex inter-cell relationships as a simple grid [29]	Design of a GNN prediction architecture that reflects cell-based spatiotemporal relationships	Sequence Similarity Graph + Attention GNN Encoder/Decoder to model temporal dynamics and inter-cell dependencies	China Mobile 5G Cellular Traffic	MSE, MAE, RMSE, R²
Limitations of Spatial Generalization Based on Static Graphs [30]	Improving Spatiotemporal Prediction Accuracy through Self-Evolving Dynamic Graphs	Time-based Graph Generation + Interactive Learning Mechanism	METR-LA, PeMSD4	MAE, RMSE, MAPE
Performance Degradation Issues in Environments with Insufficient Labeled Data [31]	Ensuring the Potential for Traffic Prediction through Self-Supervised Learning	Ensuring the Potential for Traffic Prediction through Self-Supervised Learning	METR-LA, PEMS-BAY	MAE, RMSE, MAPE
Prediction Performance Degradation and Generalization Across Domains [32]	Enhancing Predictive Scalability through Domain-Adaptive Transfer Learning	Transfer learning with spatio-temporal GCN; adversarial domain adaption to reduce distribution mismatch	PEMSD7, PEMS-BAY	MAE, RMSE, MAPE
Underutilization of External Variables, Lack of Lightweight Models, and Processing Time [33]	Analysis of Limitations in Traffic Prediction Technologies and Suggestions for Future Directions	Lightweight GNN, Integration of External Variables, GNN + Transformer Hybrid	Various Open Traffic and Communication Data	MAE, RMSE, R²

2.2. Non-Graph Based Traffic Prediction

As summarized in Table 2, non-graph-based traffic prediction studies have mainly advanced Transformer and hybrid LSTM–Transformer architectures to overcome the scalability and sequence-length bottlenecks inherent in conventional attention mechanisms [37,38,39,40,41,42,43]. These approaches have achieved impressive progress in long-horizon forecasting by decomposing temporal components, improving attention sparsity, and introducing multi-scale temporal encoders. Nevertheless, most were validated on large public benchmarks such as ETTh, ETTm, and PEMS, which provide abundant, homogeneous, and spatially regular data. They seldom reflect domain-specific external influences—particularly environmental or meteorological variables—that can critically affect network-traffic dynamics. Moreover, previous research has rarely offered a statistically validated cross-model comparison under identical configurations, leaving uncertainty as to whether performance gaps arise from genuine model superiority or dataset bias.

In contrast, the present work establishes a controlled experimental framework to evaluate representative deep-learning architectures—RNN, GRU, LSTM, CNN, and Transformer—under consistent training and evaluation settings. Rather than introducing a new Transformer variant, our study emphasizes interpretability, reproducibility, and practical generalization. We specifically analyze how environmental factors such as temperature, wind speed, and sunlight duration interact with each model architecture and quantify their effects through formal statistical validation (ANOVA and Tukey HSD). This framework provides a transparent and empirically grounded comparison, clarifying the trade-offs among model complexity, predictive accuracy, and deployability for real-world industrial network-traffic forecasting.

Table 2. Summary of None-Graph Based Traffic Prediction Models.

Problem Definition	Research Objectives	Proposed Method	Dataset	Performance Evaluation Metrics
In Transformer-based long-term time series forecasting, there are issues of inadequate reflection of trends and seasonality, as well as information loss [37]	Improving long-term forecasting accuracy through a framework based on time series decomposition and autocorrelation.	Series Decomposition + Auto-Correlation Attention	ETTh1, ETTh2, ETTm1, and Weather, etc.	MSE, MAE
In traditional Transformers, the Self-Attention mechanism is inefficient for long time series data [38]	Design of Efficient Transformers for Long-Sequence Forecasting	ProbSparse Attention + Distilling Operation	TTh1, ETTh2, ETTm1, and Exchange, etc.	MSE, MAE
It is challenging to capture both long-term and short-term time series patterns simultaneously with a single model [39]	Integration of Spatiotemporal Features through the Fusion of LSTM and Transformer Models	LSTM + Transformer-Based Parallel Spatiotemporal Network	PeMSD7, METR-LA	RMSE, MAE, MAPE
The Predictive Limitations of a Single Temporal Scale and Spatial Structure [40]	Enhancing Predictions with Multiple Temporal Scales and Augmented Spatial Information	Multi-Scale Temporal Encoder + Enhanced Spatial Block	PEMS03, PEMS04, PEMS07, PEMS08 [43]	MAE, RMSE, MAPE
The fixed spatiotemporal attention structure is not flexible for short-term forecasting [41]	Reflecting Dynamic Spatiotemporal Relationships with a Self-Progressive Architecture	Progressive Space-Time Attention + Temporal Refinement	PEMS-BAY, METR-LA	MAE, RMSE, MAPE
Lack of Incorporation of Interregional Heterogeneity and Time Series Non-Stationarity [42]	Enhancing Long-Term Forecasting Through Regional Spatial Token Learning	Location-wise Token + Spatial Multi-head Attention	PEMS03, PEMS04, PEMS07, PEMS08	MAE, RMSE, MAPE

2.3. Environmental Information Based Traffic Prediction

As summarized in Table 3, recent studies on environmental information–based traffic prediction have sought to integrate meteorological, infrastructural, and contextual factors into deep learning frameworks to better capture real-world variability [44,45,46,47,48,49,50].

These works collectively highlight a growing recognition that traffic flow is not solely determined by temporal dynamics or spatial topology but is also substantially influenced by external conditions such as temperature, precipitation, humidity, and nearby facilities.

To achieve this, prior research has expanded traditional ST-GCN and CNN–LSTM models through attribute augmentation, selective attention, and dynamic weighting mechanisms that enable models to reflect nonlinear and heterogeneous external effects. However, most of these approaches remain tailored to urban transportation contexts—particularly road or subway systems—where spatial adjacency and physical node connections are explicit. Consequently, their applicability to non-transportation domains, such as industrial communication networks, remains uncertain due to differences in data granularity, topological definition, and external factor relevance.

In contrast, the present study focuses on network traffic within industrial infrastructure, where environmental variables indirectly affect communication behavior rather than physical mobility. Instead of constructing spatial graphs or augmenting external attributes within a GCN, this work isolates and evaluates environmental variables as independent explanatory factors. By systematically analyzing how temperature, wind speed, and sunlight duration influence predictive accuracy across multiple model architectures under identical experimental conditions, we provide an interpretable and statistically validated framework for understanding environment-aware traffic prediction. This approach bridges the gap between environmental sensitivity and model simplicity, emphasizing generalizability and operational feasibility over architectural complexity.

Table 3. Summary of Traffic Prediction Models Based on Environmental Information.

Problem Definition	Research Objectives	Proposed Method	Dataset	Performance Evaluation Metrics
Conventional metro flow prediction methods fail to capture complex correlations as they separate temporal and spatial interactions [44]	Incorporating spatiotemporal continuity of metro flow using an integrated model based on CNN and LSTM	A spatiotemporal deep learning architecture combining CNN and LSTM	Real-world usage data from urban metro systems	RMSE, MAE
Static graphs fail to incorporate external attributes [45]	Enhancing spatiotemporal prediction performance through the integration of Attribute information	Attribute-Augmented ST-GCN	Traffic flow spatiotemporal graph data	MAE, RMSE, MAPE
Euclidean-based structures struggle to represent non-Euclidean relationships effectively [46]	Learning non-Euclidean structures and incorporating external factors	An attention-based non-Euclidean spatiotemporal model	Traffic flow data incorporating weather factors	RMSE, MAE, MAPE
Information loss occurs when external factors are simply merged [47]	Reflecting the temporal and spatial significance of external factors	ST-GCN (Spatio-Temporal Graph Convolutional Network) combined with Selective Attention	Traffic and weather data for multi-step traffic forecasting	RMSE, MAE, MAPE
Meteorological factors have a nonlinear impact on predictions, which is inadequately captured by existing models [48]	Improving prediction accuracy based on weather data	ST-Fusion GCN (Spatio-Temporal Fusion Graph Convolutional Network integrating Weather Data)	Traffic volume combined with meteorological data (temperature, precipitation, humidity)	MAE, RMSE
Insufficient incorporation of the spatial impact of urban infrastructure [49]	Refinement of spatial relationships based on facility proximity	Application of graph weights based on Urban Features	Traffic flow combined with surrounding facility location information.	MAE, RMSE
High-precision models are inefficient in edge environments [50]	Lightweight LLM-based traffic prediction	Lightweight LLM combined with spatiotemporal features	Large-scale traffic time series data	MAE, RMSE, MAPE, Model size, Computational speed

3. Proposed Experimental Design and Methodology for Network Traffic Prediction

3.1. Data Collection and Preprocessing

The study aims to evaluate the performance of a time-series-based network traffic prediction model by integrating actual operational data with external environmental information. Additionally, it seeks to verify the applicability of various deep learning architectures. The data used in this experiment is based on a two-year SNMP-based real network traffic log provided by Korea South-East Power Co., Ltd., a major Korean public energy enterprise located at 78 Sadeul-ro, Jinju-si, Gyeongsangnam-do 52852, Republic of Korea. Among the hourly measurements, the maximum daily traffic value (in Mbps) is designated as the prediction target, serving as a key indicator for network infrastructure design and capacity planning.

3.1.1. Data Description

In this study, the prediction target (output variable) is the maximum daily network traffic value, measured in megabits per second (Mbps) and derived from hourly SNMP logs. The input features include both network-related and environmental variables to comprehensively capture internal temporal trends and external environmental influences. Specifically, the network-related feature represents the historical traffic record from the preceding seven days, while the environmental features consist of average temperature, minimum temperature, maximum temperature, average wind speed, maximum wind speed, maximum instantaneous wind speed, and total sunlight hours, and days. Here, days represents the day of the week (integer index 0–6) to capture weekly seasonality.

Consequently, each input sample forms an eight-dimensional multivariate feature vector for every day within a seven-day sequence, and the model is trained to predict the traffic value of the subsequent day. This configuration allows the models to simultaneously learn short-term temporal dependencies in network behavior and the contextual impact of meteorological factors, thereby enhancing both predictive accuracy and interpretability. To ensure reproducibility and transparency, detailed characteristics of the dataset are summarized in Table 4.

3.1.2. Data Preprocessing

All input variables were normalized to the range of [0, 1] using the Min–Max scaling method to ensure consistency in numerical magnitude across features [51,52].

Each input feature and the target variable are Min–Max normalized using parameters computed from the training split only. For the target, we apply

y^{'} = (y - y_{m i n}) / (y_{m a x} - y_{m i n})

during training and inverse-transform predictions for evaluation as

\hat{y} = y_{m i n} + {\hat{y}}^{'} (y_{m a x} - y_{m i n})

. Therefore, squared-error scales as

{S E}_{orig} = (y_{m a x} - y_{m i n})^{2} {S E}_{norm}

and similarly for MSE. Numerical example: assume

y_{m i n} = 100

,

y_{m a x} = 600

(Mbps). If

y_{true}^{'} = 0.50

and

y_{pred}^{'} = 0.52

, then

y_{true} = 350

Mbps and

y_{pred} = 360

Mbps. The absolute error is 10 Mbps and the squared error is 100 (Mbps²). In normalized space, the absolute error is 0.02 and the squared error is 0.0004; multiplying by

(600 - 100)^{2} = 250,000

yields 100 (Mbps²). In this study, all reported MSE/MAE/RMSE/MAPE are computed in the original Mbps scale after inverse transformation.

Following normalization, missing values and anomalous observations were identified and corrected through a standard data cleansing pipeline, thereby improving overall data quality and ensuring stable model convergence. After preprocessing, the time-series dataset was chronologically partitioned into 60% for training, 20% for validation, and 20% for testing to prevent data leakage and preserve temporal order. To explicitly prevent data leakage, we employed a strict sliding-window methodology on chronologically sorted data. Let

X_{t} = {x_{t - 6}, \dots, x_{t}}

represent the input vector containing traffic and environmental data observed strictly up to day t. The model is trained to predict

y_{t + 1}

(traffic of the next day). The Test Set consists solely of data from the final 20% of the timeline, ensuring that no future information is accessible during the training phase.

For supervised learning, the dataset was then restructured using a sliding-window approach, where a consecutive 7-day sequence of multivariate features is used to predict the Maximum Traffic Value of the next day (one-day-ahead forecasting). To clarify the model input format, the final tensor structure is explicitly defined as:

X \in R^{(B, 7, F)}

(1)

In this formulation,

B

denotes the batch size used during training, 7 represents the seven consecutive days included in each input sequence, and

F = 8

represents the total number of daily input features, consisting of one network-related variable and seven meteorological variables.

The target output is expressed as:

y \in R^{(B, 1)}

(2)

representing the Maximum Traffic Value on day

t + 1

.

3.2. Model Architecture and Configuration

After data preprocessing, this study selected five deep learning models for comparison to effectively learn the complex patterns of time-series data: Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Convolutional Neural Network (CNN), and Transformer. These models differ in their structural characteristics and how they learn temporal and spatial information. Training and evaluation were conducted under consistent experimental conditions to compare predictive performance and structural interpretability.

Specifically, to ensure a fair evaluation and prevent overfitting, the dataset was partitioned into training (60%), validation (20%), and testing (20%) sets. The training set was used to optimize model parameters, while the validation set was employed for hyperparameter tuning and model selection, such as identifying the optimal number of epochs and batch sizes. The test set was strictly reserved for the final performance assessment of the selected models.

RNN is effective for modeling short-term temporal dependencies but suffers from difficulty in retaining information across long sequences. LSTM alleviates this limitation by introducing memory cells and gating mechanisms designed to capture long-term dependencies. GRU simplifies the LSTM structure by reducing the number of gates, achieving competitive performance while improving computational efficiency. CNN treats time-series data as one-dimensional spatial signals, enabling efficient extraction of local temporal features through convolutional filters and supporting parallel computation. The Transformer model employs a self-attention mechanism that simultaneously processes the entire sequence and effectively models long-range dependencies, making it suitable for complex forecasting tasks.

All models were configured with an input sequence length of seven days and trained to predict one day ahead. The recurrent architectures used 64 hidden units in the first layer and 32 units in the second layer, applying Tanh as the activation function for the recurrent layers and Linear for the output layer. Training employed the MSE loss function and the Adam optimizer with a learning rate of 0.001. Each model was trained for 100 to 1000 epochs in increments of 100, with batch sizes of 7, 14, and 21, and each configuration was repeated five times to ensure statistical reliability. Initial weights and random seeds were fixed to guarantee reproducibility.

To maintain fairness across model families, the input format, sequence length, and output configuration were kept identical for all architectures. Although various Transformer variants such as Autoformer, Informer, and MTESformer exist, this study intentionally focused on evaluating the performance of the basic Transformer structure [53,54,55].

The CNN model was implemented using a 1D convolutional architecture to extract localized features along the time axis. By experimenting with multiple architectures under identical conditions, the study aimed to quantitatively and visually analyze how each model’s learning mechanism interacts with external environmental variables and influences the complexity of the time-series structure.

3.3. Architectural Details and Training Configuration

All models were implemented using the Keras 3.0 framework with a TensorFlow 2.16 backend [56,57]. To ensure fair comparison, each architecture was configured to maintain a comparable number of trainable parameters. The RNN, LSTM, and GRU models consisted of two stacked recurrent layers with 64 and 32 hidden units, respectively. Consistent with Section 3.2, the recurrent layers employed the Tanh activation function, while the internal gating mechanisms of LSTM and GRU used their standard sigmoid–Tanh combinations. A final Dense (1) layer was used to output the next-day Maximum Traffic Value.

The CNN model employed two 1D convolutional layers with 64 and 32 filters and a kernel size of 3, each followed by ReLU activation, and a Global Average Pooling layer before the final Dense (1) output. The Transformer architecture consisted of a Dense (64) embedding layer, Layer Normalization, a Multi-Head Attention module with two heads and a key dimension of 32, and a feed-forward block composed of Dense (64, ReLU), with residual connections (Add) and Global Average Pooling prior to the output layer.

All models were trained using the Adam optimizer (learning rate = 0.001) with mean squared error (MSE) as the loss function. Training epochs ranged from 100 to 1000 in increments of 100, and batch sizes were set to 7, 14, or 21 to align with the time-series sequence length [58,59]. No dropout or early stopping was applied so that the effect of architectural differences could be isolated. All input features were normalized to the [0, 1] range before training, and evaluation metrics (MSE, RMSE, MAE) were computed on the denormalized scale for interpretability.

Classical forecasting models such as persistence and ARIMA were not included as baselines because they are fundamentally incompatible with the multivariate, nonlinear, and aggregated nature of the Maximum Traffic Value series. Moreover, their inability to incorporate environmental variables would break the controlled comparison framework established for evaluating deep learning architectures.

3.4. Comparative Experimental Design and Integration Analysis of External Factors

To quantitatively analyze the performance of previously established deep learning-based time-series forecasting models (RNN, GRU, LSTM, CNN, Transformer), this study designed comparative experiments focusing on the inclusion of external environmental factors. By maintaining consistent model architectures and input formats across all experiments, the study systematically verified how predictive performance varies under different compositions of external variables.

To examine the influence of meteorological attributes such as average temperature, minimum temperature, maximum temperature, average wind speed, maximum wind speed, peak gust speed, and total sunshine duration, the experiments were conducted under three distinct input configurations. First, a model without external factors was constructed using only traffic data to establish a baseline representing the predictive power of traffic variables alone. Second, a model with external factors incorporated all available variables, including both traffic and meteorological information, to determine whether environmental context enhances forecasting accuracy. Finally, a reduced model using key variables only was developed by removing input variables that exhibited low correlation or low predictive contribution based on variable importance analysis. This reduced configuration aimed to assess whether a more compact, refined feature set improves model efficiency while mitigating unnecessary noise.

The influence of external factors was evaluated using the Permutation Importance method, which quantifies the relative contribution of each variable by randomly shuffling its values and measuring the resulting increase in prediction error (MSE). Beyond measuring individual contributions, this analysis also provided a basis for redesigning the reduced-variable model.

While Shapley Additive Explanations (SHAP) [60] or Partial Dependence Plot (PDP) [61] analyses could offer additional insight into nonlinear interactions and marginal effects, these methods were not conducted in this study because the primary objective was to compare predictive architectures under controlled conditions rather than to generate explainable AI (XAI) visualizations [60,62]. Nevertheless, the proposed framework can be readily extended to incorporate SHAP- or PDP-based analyses in future research to provide deeper interpretability regarding how environmental variables affect model predictions.

Based on the variable importance analysis, key features with strong predictive contributions were identified, and the models were retrained using these simplified input combinations to verify whether similar performance could be achieved with fewer variables. All experiments were repeated five times under identical conditions, and the average values of five evaluation metrics—MSE, MAE, RMSE, MAPE, and R²—were compared.

3.5. Performance Evaluation and Variable Importance Analysis

This study employed multiple evaluation metrics rather than relying on a single indicator to comprehensively assess model performance and quantify the influence of external environmental variables. Five representative regression metrics were used to examine different perspectives of predictive accuracy, including absolute error magnitude, sensitivity to variance, and explanatory power. The metrics are mathematically defined as follows. For true values

y_{i}

and predicted values

{\hat{y}}_{i}

across

n

samples:

M S E = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2},

(3)

R M S E = \sqrt{M S E},

(4)

M A E = \frac{1}{n} \sum_{i = 1}^{n} | (y_{i} - {\hat{y}}_{i}) |,

(5)

M A P E = 100 \times \frac{1}{n} \sum_{i = 1}^{n} | \frac{y_{i} - {\hat{y}}_{i}}{y_{i}} |,

(6)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}},

(7)

These formulations replace the previously included table and provide formal definitions of the criteria used to evaluate predictive performance throughout the study.

The predictive contributions of individual environmental factors such as temperature, wind speed, and total sunlight hours were assessed using the Permutation Importance method. For a given variable

X_{j}

, its importance is defined as:

P I (X_{j}) = {M S E}_{p e r m (X_{j})} - {M S E}_{b a s e l i n e}

(8)

where

{MSE}_{perm (X_{j})}

denotes the error after randomly shuffling the variable to break its relationship with the target. A larger positive value indicates a greater contribution to predictive accuracy, while negative scores suggest noise or adverse influence.

The models were retrained using only the key variables (e.g., average temperature, maximum wind speed, and total sunlight hours) to assess whether reducing the input dimensionality could maintain comparable forecasting performance while improving interpretability, efficiency, and computational resource usage.

All experiments were repeated five times under identical hyperparameter settings and input configurations, and the mean values of the evaluation metrics were used for comparison. This repeated-measure design enabled statistically robust assessment of how different variable selection strategies impact model performance and provided empirical evidence supporting their practical applicability.

4. Experimental Results and Analysis

4.1. Performance Analysis of Deep Learning-Based Predictive Models

In this section, we compare the predictive performance of five deep learning models—GRU, RNN, LSTM, CNN, and Transformer—under identical conditions. The experiments used only the ‘Maximum Traffic Value’ as the input feature, excluding any external environmental information. The number of epochs varied from 100 to 1000 in increments of 100, and batch sizes were set at 7, 14, and 21, resulting in a total of 150 experiments. To ensure consistency in experimental conditions, the mean squared error (MSE) was used as the loss function, Adam was the optimizer. Regarding activation functions, Tanh was applied to the recurrent layers of the RNN, LSTM, and GRU models, while ReLU was used for the CNN and Transformer models, consistent with the architectural details provided in Section 3.3. Model performance was evaluated using five metrics: MSE, mean absolute error (MAE), root mean squared error (RMSE), R², and mean absolute percentage error (MAPE). Key results are summarized in Table 5 and Table 6.

Under the conditions of Epoch 800 and Batch Size 14, the RNN achieved an MSE of 750.19, recording the lowest prediction error among all conditions. The GRU, under Epoch 1000 and Batch Size 7, recorded an MSE of 3400.24, demonstrating relatively stable and consistent predictive performance. The LSTM showed favorable results with an MSE of 4226.99 at Epoch 800 and Batch Size 7, though it showed some sensitivity to training convergence. The CNN recorded its lowest MSE of 6539.10 at Epoch 100 and Batch Size 7 but showed limitations in time series learning structures. The Transformer had its lowest MSE at 9506.48, indicating a need for improvement in learning stability and convergence speed.

Table 6. Optimal Performance in the Traffic-Only Scenario: Five Deep Learning Models and a Persistence Baseline under Identical Parameters (Epoch = 1000, Batch Size = 7).

Model	MSE	MAE	RMSE	R²	MAPE
GRU	3400.24	39.34	58.31	0.99	3.46
RNN	1015.46	20.12	31.87	0.99	1.56
LSTM	5083.90	49.11	71.30	0.98	4.07
CNN	7149.71	58.55	84.56	0.97	4.55
Transformer	9908.81	60.92	99.54	0.96	4.88
Persistence	2636.06	44.17	51.34	−0.07	2.1

Persistence (one-step naive) denotes

{\hat{y}}_{t + 1} = y_{t}

. Metrics are computed in Mbps after inverse transformation;

R^{2}

may be slightly negative for naive baselines on short windows.

Under the common conditions of Epoch 1000 and Batch Size 7, the RNN exhibited the most outstanding performance among all models, with an MSE of 1015.46 and an MAE of 20.12. This suggests that a simpler structure may enhance generalization capabilities. The GRU demonstrated stability with an MSE of 3400.24 and an MAE of 39.34, although it had slightly higher errors compared to the RNN. The LSTM showed an MSE of 5083.90 and an MAE of 49.11, indicating a tendency for lower convergence stability. The CNN, with an MSE of 7149.71 and an MAE of 58.55, revealed structural limitations in time series prediction. The Transformer had the lowest performance among all models, with an MSE of 9908.81 and an MAE of 60.92, suggesting that the basic Transformer structure may struggle with convergence when applied to small-scale time series datasets.

These results empirically demonstrate that recurrent architectures—particularly RNN and GRU—outperform more complex structures such as LSTM, CNN, and Transformer in short-term network traffic forecasting. Among them, the RNN achieved the most stable and reproducible performance across all experimental conditions, showing the lowest average MSE and MAE with minimal sensitivity to hyperparameter variation. The GRU, while occasionally attaining comparable or slightly lower minimum errors in specific configurations, required more careful tuning to maintain consistent results. Therefore, the plain RNN is identified as the most reliable and practically applicable model for real-world deployment, offering an optimal trade-off between prediction accuracy, computational efficiency, and implementation simplicity.

4.2. Verification of Performance Differences and Statistical Significance Across Models

To statistically analyze the performance differences among deep learning-based predictive models (GRU, RNN, LSTM, CNN, Transformer), a normality test and analysis of variance (ANOVA) were conducted based on the mean squared error (MSE) obtained from each model. The Shapiro–Wilk test was applied to verify whether the MSE data of the models follow a normal distribution, with results presented in Table 7 [63].

The normality test results indicated that under conditions without environmental variables, most models satisfied normality, except for the RNN and Transformer, which did not at a significance level of 0.05. When all environmental variables were included, only the CNN satisfied normality. It is important to note that the Shapiro–Wilk test can be sensitive with small sample sizes, and ANOVA is relatively robust to violations of normality when group sizes are similar.

To eliminate ambiguity in the comparison structure, we clarify that the ANOVA and Tukey HSD analyses were performed only on the MSE values from the two primary experimental scenarios: (1) the Traffic-only condition and (2) the Include8 condition, which incorporates all environmental variables. The Include4 configuration, which involves feature reduction based on variable importance, was analyzed separately in Section 4.5 and was therefore not included in the ANOVA framework.

For the ANOVA procedure, each model contributed 10 independent MSE samples (five repetitions under the baseline condition and five repetitions under the environmental-variables condition), resulting in a total of 50 observations included in the analysis. Homoscedasticity was verified using Levene’s test [64], which produced a non-significant result (p > 0.05), confirming that the variances across model groups can be considered statistically equivalent. Therefore, the assumptions for one-way ANOVA were adequately satisfied. The results indicated that the differences in MSE among the deep learning-based prediction models were statistically significant, with significance confirmed at a p-value < 0.001. Consequently, Tukey’s Honest Significant Difference (HSD) test was performed as a post hoc analysis, revealing the statistical differences among the major models as shown in Table 8 [65,66].

The mean MSE difference between the GRU and CNN was −2028.58, statistically significant at the p = 0.0123 level. Between the RNN and CNN, the mean MSE difference was −6071.34, significant at the p = 0.0472 level. Comparisons between the LSTM and CNN, as well as the Transformer and CNN, also showed significant performance differences with p < 0.05. These results suggest that relatively simple recurrent neural network-based models (RNN, GRU) are statistically superior in predictive performance based on MSE compared to more complex models like CNN and Transformer. Notably, the RNN demonstrated excellent results not only in average performance but also in performance consistency (in terms of variability), indicating that simple recurrent structures can be effectively applied to time series prediction problems.

4.3. Analysis of Predictive Performance Variations with Environmental Variable Integration

Integrating environmental information into deep learning-based time series prediction models can significantly enhance or degrade performance, depending on the model. This suggests that issues such as low correlation between environmental variables or multicollinearity may lead to the model learning unnecessary information, potentially causing overfitting.

For instance, in the GRU model, the mean MSE was relatively high at 7009.41 without environmental information but drastically decreased to 1151.46 after integration. The MAE and RMSE also decreased from 54.48 to 25.05 and from 82.66 to 18.82, respectively, while the MAPE improved from 4.41% to 1.492%. Although the R² increased from 0.98 to 0.99, the change was smaller compared to other metrics.

In the RNN architecture, introducing environmental variables had the most positive effect. The mean MSE decreased from 2966.65 to 407.32, RMSE from 51.43 to 13.63, and R² improved from 0.99, indicating uniform performance enhancement across all metrics. These results suggest that RNNs effectively capture changes in external factors over time.

The LSTM model showed substantial improvements in MSE and RMSE but only limited gains in R², and some experimental conditions exhibited unstable fluctuations. This suggests that while LSTMs are strong at capturing long-term patterns, they may be relatively limited in reflecting short-term variations such as meteorological factors.

These results are summarized in Table 9, which quantitatively compares how integrating environmental variables influences predictive performance across different model architectures. The analysis indicates that while RNNs demonstrated the most stable and significant performance gains, GRU and LSTM models exhibited mixed responses, including performance degradation under certain conditions. Therefore, rather than uniformly applying environmental variables to all models, employing a selective integration strategy that aligns variable characteristics with model architectures can yield more reliable results.

4.4. Correlation Analysis Among Environmental Variables

Figure 1 presents a heatmap visualization of Pearson correlation coefficients between the maximum traffic value and various environmental variables, allowing for an intuitive interpretation of the strength of linear relationships among the variables [67]. The analysis revealed strong positive correlations exceeding 0.90 among average, minimum, and maximum temperatures, indicating that these variables share similar seasonal fluctuation patterns.

In contrast, the target variable, Maximum Traffic, showed very weak correlations with all environmental variables, with absolute correlation coefficients remaining below 0.07. Specifically, the correlations with average temperature (−0.06), maximum temperature (−0.06), and total sunlight hours (−0.07) were negligible. This suggests that simple linear relationships are insufficient to explain the influence of environmental conditions on traffic variations, highlighting the need to consider nonlinear or interaction-based complex relationship structures.

Furthermore, wind-related variables (average wind speed, maximum wind speed, and instantaneous wind speed) exhibited strong mutual correlations but showed weak correlations with the target variable. This implies a potential risk of multicollinearity, which could affect model stability. Therefore, during model refinement, variables with low informational contribution or high multicollinearity were excluded, and only key features were selected. As a result, the model was retrained using a refined input set consisting of four features: Maximum Traffic, Average Temperature (°C), Maximum Temperature (°C), and Total Sunlight Hours (hr).

Table 10 summarizes the prediction performance of GRU, LSTM, RNN, CNN, and Transformer models trained using only core features, after removing variables with low correlation and high multicollinearity among environmental factors. The results are organized according to various combinations of epoch and batch size. Performance evaluation showed a clear improvement in most models compared to previous experiments, with GRU and RNN models demonstrating particularly high predictive accuracy.

The GRU model achieved its best performance with Epoch = 700 and Batch Size = 7, yielding an MSE of 11.95, MAE of 2.73, RMSE of 3.46, MAPE of 0.23%, and R² = 0.99. The LSTM model showed comparable results under Epoch = 700 and Batch Size = 14, recording an MSE of 11.43, MAE of 2.60, RMSE of 3.38, MAPE of 0.216%, and R² = 1.0000. Although the LSTM model showed relatively higher error rates—with MSE = 42.92, MAE = 5.05, and RMSE = 6.55 under Epoch = 1000 and Batch Size = 21—it maintained strong prediction stability, achieving R² = 0.99. For the CNN model, optimal performance was observed at Epoch = 900 and Batch Size = 7, with MSE = 5891.24, MAE = 53.32, RMSE = 76.75, and R² = 0.99. The Transformer model performed best under Epoch = 800 and Batch Size = 7, with MSE = 9621.59, MAE = 61.70, RMSE = 98.09, and R² = 0.97. However, both CNN and Transformer models exhibited substantially higher prediction errors compared to GRU and RNN, suggesting that they are relatively less suitable for time series-based traffic prediction tasks.

The selection of epoch and batch size followed a systematic exploration process to analyze model convergence behavior and learning stability. Specifically, the number of epochs varied from 100 to 1000 in increments of 100 to capture the transition from underfitting to potential overfitting and to evaluate performance trends at different convergence stages. Batch sizes of 7, 14, and 21 were chosen to correspond to one-, two-, and three-week temporal groupings, respectively, reflecting typical periodic patterns observed in network traffic. This configuration also ensured stable gradient updates and consistent memory utilization across models. The ranges were determined through preliminary pilot training, which identified the boundaries of stable learning without gradient explosion or vanishing effects. All combinations were trained under identical conditions, and average results from repeated runs were reported to ensure reliability and reproducibility.

Prior to visualization, two distinct experimental settings were defined to analyze the effect of environmental variable inclusion. The term Include8 refers to the configuration that incorporates all eight environmental variables—average temperature, minimum temperature, maximum temperature, average wind speed, maximum wind speed, maximum instantaneous wind speed, total sunlight duration, and days—alongside traffic data. In contrast, Include4 denotes the model trained using only four key variables selected through correlation and multicollinearity analysis: Maximum Traffic, Average Temperature (°C), Maximum Temperature (°C), and Total Sunlight Hours (hr). This distinction enables a direct comparison between full-variable and reduced-variable models to examine how selective environmental feature integration influences model stability and convergence behavior.

Figure 2 compares the changes in average MSE according to increasing epochs, illustrating the impact of including environmental information on model convergence speed and prediction stability under training conditions using only core variables. In the range of Epochs 100–400, models incorporating environmental information (Include4, Include8) showed a rapid improvement in performance, whereas models excluding environmental variables maintained relatively high error levels and exhibited a more gradual convergence trend. This indicates that selective integration of environmental variables can effectively enhance model performance without inducing overfitting.

According to the quantitative performance comparison in Table 11, the GRU model achieved the best performance with an MSE of 66.30, MAE of 6.32, RMSE of 8.14, MAPE of 0.52%, and R² = 0.99, demonstrating the lowest error and highest prediction accuracy. The RNN model showed slightly higher error rates (MSE = 172.62, MAE = 9.67, RMSE = 13.14) but maintained a high level of explanatory power with R² = 0.99. The LSTM model yielded comparable performance to GRU, with MSE = 68.81, MAE = 6.19, RMSE = 8.30, MAPE = 0.46%, and R² = 0.99. In contrast, the CNN and Transformer models showed significantly higher error levels, with MSEs of 6677.75 and 10,869.45, respectively. Their R² values were also relatively lower at 0.98 and 0.96, suggesting limited predictive suitability, even with the inclusion of environmental variables. These results suggest that the selection of core variables contributes to preventing overfitting and enhancing the generalization capability of the models. In particular, RNN and GRU models are effective in incorporating environmental factors in short-term time series prediction tasks and can achieve optimized performance through the elimination of unnecessary variables. Furthermore, the findings highlight the importance of considering the interaction between variable selection strategies and model architectures when designing predictive models. Consequently, the appropriate integration of environmental information is shown to be a key factor in enhancing the performance of deep learning-based time series models, contributing to faster convergence in the early stages of training and improved overall prediction stability.

4.5. Variable Importance Analysis

In this section, the Permutation Importance method was applied to quantitatively evaluate the predictive contribution of each input feature for five deep learning models: RNN, LSTM, GRU, CNN, and Transformer. All experiments were conducted under identical hyperparameter settings (Epoch = 1000, Batch Size = 7), using four environmental variables as input features: Maximum Traffic Value, Average Temperature, Maximum Temperature, and Total Sunlight Hours. Permutation Importance estimates the relative contribution of each variable by randomly shuffling its values—thereby disrupting the association between the variable and the target—and measuring the resulting change in the model’s prediction performance (MSE). A higher importance score indicates a greater contribution to predictive accuracy, whereas a low or negative score suggests limited usefulness or even a detrimental effect.

Figure 3 compares the Permutation Importance results across the five models. Overall, Average Temperature and Maximum Temperature demonstrated relatively high importance in most architectures. Maximum Temperature emerged as the most influential variable in the RNN and GRU models, suggesting that these recurrent structures effectively capture temporal fluctuations in temperature-related patterns.

In contrast, the Total Sunlight Hours variable in the GRU model exhibited near-zero or negative importance, indicating that shuffling this variable actually reduced prediction error. This suggests that Total Sunlight Hours may lack meaningful predictive information in this context or may act as noise. Similarly, the days variable in the GRU model displayed negative importance scores, implying that it may introduce unnecessary variance, thereby impairing generalization.

For the LSTM model, all variables showed relatively low importance. This behavior aligns with LSTM’s architectural optimization for capturing long-term dependencies, which may not align well with the short-term fluctuations reflected in the environmental variables used in this study.

Although the Pearson correlation coefficients between individual environmental variables and the Maximum Traffic Value were close to zero, this does not imply that such variables lack predictive relevance. Pearson correlation captures only linear and instantaneous relationships, whereas real-world network traffic often responds to weather patterns through nonlinear effects, seasonal dynamics, and delayed influence. Recurrent models such as RNN and GRU can capture these nonlinear, higher-order dependencies even when simple linear correlation statistics suggest weak relationships. This explains why environmental variables improved predictive accuracy despite their near-zero Pearson correlations.

In Table 12, the permutation importance values represent the absolute increase in MSE, computed by measuring the difference between the model’s baseline loss and the loss obtained after shuffling a specific variable. No normalization or scaling was applied to the MSE values; thus, each score directly reflects the magnitude of prediction degradation attributable to disrupting the information contained in that feature. Importantly, negative values indicate that the model’s performance improved when the variable was permuted. Such negative importance does not imply that the variable is inherently detrimental but instead reflects issues such as multicollinearity, temporal misalignment, or structural incompatibility between the feature and the model architecture. These patterns were especially notable in CNN and Transformer models, which displayed sensitivity to weakly correlated or noisy environmental variables under small-data conditions.

Additionally, GRU and RNN models showed relatively stable importance scores for Total Sunlight Hours and days variables, whereas the LSTM architecture consistently recorded lower values across all variables. These patterns reinforce the observation that recurrent models are better aligned with short-horizon, daily multivariate forecasting tasks, while more complex architectures may fail to leverage environmental information efficiently under limited data.

In conclusion, the variable importance analysis highlights that including only core features with demonstrated predictive contribution—rather than indiscriminately incorporating all environmental variables—offers a practical strategy for improving forecasting performance and mitigating overfitting. These findings provide meaningful guidance for feature selection in deep learning-based time-series forecasting and offer a foundation for more advanced variable selection strategies in future research.

4.6. Comprehensive Analysis

In this study, we quantitatively compared the performance of five deep learning-based time series forecasting models—GRU, RNN, LSTM, CNN, and Transformer—under various conditions, including different hyperparameter settings, the inclusion of environmental variables, and feature selection strategies. The experimental results showed that RNN and GRU models, both employing relatively simple recurrent structures, consistently demonstrated superior prediction accuracy and stability. Their performance was particularly enhanced when environmental information was selectively integrated. In contrast, CNN and Transformer models, with their more complex architectures, showed limitations in learning temporal characteristics. Negative Permutation Importance scores for certain variables further suggested the presence of informational noise in these models.

The superior performance of GRU and RNN can be attributed to their architectural alignment with the characteristics of the dataset. Both models are optimized for capturing short-term temporal dependencies and sequential continuity, which dominate the dynamics of daily network traffic. The GRU’s simplified gating mechanism, compared with the more complex structure of LSTM, facilitates efficient parameter updating and prevents overfitting in relatively small-scale data, leading to faster convergence and stable generalization.

In contrast, the CNN and Transformer models exhibited relatively lower accuracy because their architectural strengths were not fully leveraged in this experimental context. The CNN’s convolutional filters effectively extract local temporal patterns but struggle to preserve long-range dependencies without recurrence, resulting in a loss of contextual information across days. The Transformer, while theoretically powerful for long-horizon sequence modeling, tends to overfit or become unstable when applied to short sequences (seven-day inputs) and limited datasets. Moreover, attention-based models are more sensitive to noisy or weakly correlated environmental variables, which may explain their negative Permutation Importance values.

Collectively, these findings highlight that, under moderate data scales and short forecasting windows, recurrent structures such as GRU and RNN achieve an optimal balance between temporal representation capacity, computational efficiency, and robustness to environmental variability.

In addition to these structural insights, statistical validation using the Shapiro–Wilk normality test, one-way ANOVA, and Tukey HSD post hoc analysis confirmed that the differences in model performance were statistically significant, indicating that the observed numerical differences were not due to random chance. Furthermore, among the input features, Average Temperature and Maximum Temperature were identified as major contributing variables, while the removal of irrelevant or highly collinear variables contributed to improved model generalization.

Beyond predictive accuracy, computational performance metrics were also evaluated to assess the feasibility of each model for deployment in network or embedded environments. These metrics included the average training time per epoch, the total number of trainable parameters, and the estimated floating-point operations per second (FLOPs) required for a single forward pass. Experimental profiling showed that GRU and RNN models required significantly fewer parameters and achieved faster training times compared to LSTM, CNN, and Transformer. In contrast, the Transformer model exhibited the highest computational cost due to its multi-head attention and dense connections, leading to increased FLOPs and memory requirements.

These results emphasize that, under real-world constraints such as limited hardware capacity or energy efficiency requirements, lightweight recurrent models provide a more balanced trade-off between accuracy and computational complexity. Future research should extend this analysis by benchmarking inference latency, throughput, and energy consumption across hardware platforms to evaluate deployment feasibility in edge computing and real-time network monitoring systems.

Overall, this study systematically analyzed how model selection, input feature composition, and hyperparameter tuning affect performance in time series–based traffic prediction tasks that incorporate environmental information. It empirically demonstrated the practical superiority of GRU and RNN models and the effectiveness of selective environmental variable integration in enhancing prediction accuracy. Future work may involve expanding practical insights through visualizations of model-specific predictions, comparisons of computational complexity, and evaluations of generalizability in real-world network applications.

4.7. Limitations and Future Work

Although this study provides meaningful insights into the integration of environmental information for network traffic prediction, several limitations should be acknowledged.

First, the dataset used in this research was derived from a specific regional network operated by Korea South-East Power Co., Ltd. While it reflects real operational characteristics, such regional dependency may introduce bias and limit the generalizability of the results to other network environments or countries. Future work should expand the dataset to include multi-regional or cross-organizational traffic data to enhance model robustness and external validity.

Second, the current framework primarily focuses on meteorological and environmental variables while excluding potential social and behavioral factors such as population density, human mobility, and event-driven variations in network demand. Incorporating these social variables could provide a more comprehensive understanding of external influences on network traffic patterns, particularly in urban or consumer-oriented systems.

Third, this study only examined conventional deep learning architectures. Future extensions could explore hybrid models that integrate graph neural networks (GNNs) with Transformer-based attention mechanisms to better capture spatial–temporal and relational dependencies across network nodes. A hybrid GNN–Transformer architecture could enhance the ability to model complex network topologies and improve predictive accuracy in distributed and interconnected systems.

In addition, the feature importance analysis in this study was based solely on the Permutation Importance method, which, although effective for quantifying variable contribution, does not account for complex feature interactions or nonlinear dependencies among variables. To provide more interpretable and context-aware insights, future work should incorporate advanced explainable AI (XAI) techniques such as SHapley Additive exPlanations (SHAP) and Partial Dependence Plots (PDP). These methods can help visualize how each feature influences model output both independently and in interaction with others, thereby enhancing transparency and interpretability of deep learning–based network traffic prediction models.

Addressing these limitations in future work will contribute to developing more generalizable, scalable, and context-aware frameworks for network traffic forecasting and intelligent resource management.

5. Conclusions

This study quantitatively compared and analyzed the network traffic prediction performance of five deep learning models—RNN, GRU, LSTM, CNN, and Transformer—using multivariate time series forecasting models that incorporate environmental information. The experimental results showed that GRU and RNN achieved the best performance, with average MSEs of 3821.15 and 3610.90, and R² scores of 0.877 and 0.884, respectively. In contrast, CNN and Transformer models showed relatively lower prediction accuracy, with MSEs of 5850.16 and 8251.19, likely due to structural mismatches with the characteristics of time series data. Furthermore, the performance differences among models were statistically significant, as confirmed by ANOVA and Tukey HSD post hoc tests.

This study identified key factors contributing to model performance, particularly through an analysis of the impact of including environmental variables and evaluating variable importance. The findings indicate that certain meteorological features—such as average temperature, maximum temperature, and total sunlight hours—substantially improve predictive accuracy, whereas variables with high multicollinearity or weak correlations can induce overfitting or degrade model performance.

Through this analysis, the study empirically validated that relatively simple recurrent neural networks, such as the RNN structure, can achieve high generalization performance in time series-based network traffic prediction. It also experimentally demonstrated that an integrated input design, including environmental information, can enhance the convergence speed and accuracy of predictive models. Additionally, by quantitatively analyzing the impact of input variable selection strategies on model performance, the study provides a reference for designing practical and interpretable network traffic prediction systems.

Future research could consider the following approaches: First, expanding the potential for generalization across diverse regions and network conditions by introducing cross-domain learning and domain adaptation techniques. Second, exploring predictive structures that apply time-series models based on Large Language Models (LLMs) or foundation models to enhance the semantic interpretation of complex environmental factors and improve prediction accuracy. Third, designing lightweight models that consider real-time prediction and computational efficiency, along with further validation of their applicability in practical environments.

Author Contributions

Conceptualization, J.K. and I.R.; Methodology, J.K. and I.R.; Software, J.K.; Validation, J.K. and I.R.; Formal analysis, J.K.; Investigation, I.R.; Resources, J.K.; Writing—original draft, J.K.; Writing—review & editing, I.R.; Visualization, J.K.; Supervision, I.R.; Project administration, I.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the first author.

Acknowledgments

During the preparation of this manuscript/study, the author(s) used GPT 4o and GPT 5 for the purposes of English grammar checking and Korean-to-English translation. The authors would like to express their sincere appreciation to Korea South-East Power Co., Ltd. for providing the real network traffic datasets essential to this study. We also extend our thanks to Gwanghoon Park, whose valuable discussions helped us substantially improve this study and enhance the objectivity of the research findings. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Gao, Z. 5G traffic prediction based on deep learning. Comput. Intell. Neurosci. 2022, 2022, 3174530. [Google Scholar] [CrossRef] [PubMed]
Harir, M.A.N.; Ataro, E.; Nyah, C.T. Machine learning-based fifth-generation network traffic prediction using federated learning. Int. J. Adv. Comput. Sci. 2025, 16, 1. [Google Scholar] [CrossRef]
Xu, L.; Liu, H.; Song, J.; Li, R.; Hu, Y.; Zhou, X.; Patras, P. TransMUSE: Transferable traffic prediction in multi-service edge networks. Comput. Netw. 2023, 221, 109518. [Google Scholar] [CrossRef] [PubMed]
Bousqaoui, H.; Slimani, I.; Achchab, S. Comparative analysis of short-term demand predicting models using ARIMA and deep learning. Int. J. Electr. Comput. Eng. 2021, 11, 3319–3328. [Google Scholar] [CrossRef]
Ghojogh, B.; Ghodsi, A. Recurrent neural networks and long short-term memory networks: Tutorial and survey. arXiv 2023, arXiv:2304.11461. [Google Scholar] [CrossRef]
Shiri, F.M.; Perumal, T.; Mustapha, N.; Mohamed, R. A comprehensive overview and comparative analysis on deep learning models: CNN, RNN, LSTM, GRU. arXiv 2023, arXiv:2305.17473. [Google Scholar] [CrossRef]
Zerveas, G.; Jayaraman, S.; Patel, D.; Bhamidipaty, A.; Eickhoff, C. A transformer-based framework for multivariate time series representation learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Virtual, 14–18 August 2021. [Google Scholar]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaria, J.; Fadhel, M.A.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53. [Google Scholar] [CrossRef]
Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6999–7019. [Google Scholar] [CrossRef]
Lin, T.; Wang, Y.; Liu, X.; Qiu, X. A survey of transformers. AI Open 2022, 3, 111–132. [Google Scholar] [CrossRef]
Islam, S.; Elmekki, H.; Elsebai, A.; Bentahar, J.; Drawel, N.; Rjoub, G.; Pedrycz, W. A comprehensive survey on applications of transformers for deep learning tasks. Expert Syst. Appl. 2024, 241, 122666. [Google Scholar] [CrossRef]
Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. FEDformer: Frequency Enhanced Decomposed Transformer for Long-Term Series Forecasting. In Proceedings of the 39th International Conference on Machine Learning, PMLR 2022, Baltimore, MD, USA, 17–23 July 2022; Volume 162, pp. 27268–27286. [Google Scholar]
Oliveira, J.M.; Ramos, P. Evaluating the Effectiveness of Time Series Transformers for Demand Forecasting in Retail. Mathematics 2024, 12, 2728. [Google Scholar] [CrossRef]
Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–6 February 2021; pp. 11106–11115. [Google Scholar]
California Department of Transportation (Caltrans). PeMS Data Source (Performance Measurement System). Available online: https://dot.ca.gov/about-caltrans (accessed on 14 December 2025).
Biswas, S.; Grundlingh, N.; Boardman, J.; White, J.; Le, L. A Target Permutation Test for Statistical Significance of Feature Importance in Differentiable Models. Electronics 2025, 14, 571. [Google Scholar] [CrossRef]
Molnar, C.; König, G.; Bischl, B.; Casalicchio, G. Model-agnostic feature importance and effects with dependent features: A conditional subgroup approach. Data Min. Knowl. Discov. 2024, 38, 2903–2941. [Google Scholar] [CrossRef]
Zhao, B.; Lu, H.; Chen, S.; Liu, J.; Wu, D. Convolutional neural networks for time series classification. J. Syst. Eng. Electron. 2017, 28, 162–169. [Google Scholar] [CrossRef]
Pan, Z.; Liang, Y.; Wang, W.; Yu, Y.; Zheng, Y.; Zhang, J. Urban traffic prediction from spatio-temporal data using deep meta learning. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AL, USA, 4–8 August 2019. [Google Scholar]
Madariaga, D.; Panza, M.; Bustos-Jiménez, J. I’m only unhappy when it rains: Forecasting mobile QoS with weather conditions. In Proceedings of the 2018 Network Traffic Measurement and Analysis Conference (TMA), Vienna, Austria, 26–29 June 2018; IEEE: New York, NY, USA; pp. 1–6. [Google Scholar]
Kuber, T.; Seskar, I.; Mandayam, N. Traffic prediction by augmenting cellular data with non-cellular attributes. In Proceedings of the 2021 IEEE Wireless Communications and Networking Conference (WCNC), Nanjing, China, 29 March–1 April 2021; pp. 1–6. [Google Scholar]
Igbekele, O.J.; Zhimwang, J.T.; Ogherohwo, E.P.; Kwaha, B.J. Impact of Altitude and Weather Conditions on Cellular Networks: A Comprehensive Analysis of Quality of Service. Int. J. Adv. Netw. Appl. 2024, 15, 6169–6173. [Google Scholar] [CrossRef]
Joshi, T.; Semwal, S.; Rawat, A. Assessing Quality of Service in Cellular Networks: Effects of Altitude and Weather Conditions in Dehradun, Uttarakhand. In Proceedings of the 2024 International Conference on Advances in Computing, Communication and Materials (ICACCM), Dehradun, India, 22–24 November 2024; IEEE: New York, NY, USA; pp. 1–6. [Google Scholar]
Abdullah, M.; He, J.; Wang, K. Weather-aware fiber-wireless traffic prediction using graph convolutional networks. IEEE Access 2022, 10, 95908–95918. [Google Scholar] [CrossRef]
Korea Meteorological Administration (KMA). Meteorological Data Open Portal. Available online: https://data.kma.go.kr/ (accessed on 14 December 2025).
Lan, S.; Ma, Y.; Huang, W.; Wang, W.; Yang, H.; Li, P. DSTAGNN: Dynamic spatial-temporal aware graph neural network for traffic flow forecasting. In Proceedings of the International Conference on Machine Learning (ICML), Baltimore, MD, USA, 17–23 July 2022; pp. 11906–11917. [Google Scholar]
Tian, C.; Chan, W.K. Spatial-temporal attention wavenet: A deep learning framework for traffic prediction considering spatial-temporal dependencies. IET Intell. Transp. Syst. 2021, 15, 4. [Google Scholar] [CrossRef]
Wang, Z.; Hu, J.; Min, G.; Zhao, Z.; Chang, Z.; Wang, Z. Spatial-temporal cellular traffic prediction for 5G and beyond: A graph neural networks-based approach. IEEE Trans. Ind. Inform. 2022, 19, 4. [Google Scholar] [CrossRef]
Liu, A.; Zhang, Y. Spatial–temporal dynamic graph convolutional network with interactive learning for traffic forecasting. IEEE Trans. Intell. Transp. Syst. 2024, 25, 7. [Google Scholar] [CrossRef]
Ji, J.; Wang, J.; Huang, C.; Wu, J.; Xu, B.; Wu, Z.; Zhang, J.; Zheng, Y. Spatio-temporal self-supervised learning for traffic flow prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 13–14 February 2023; pp. 4356–4364. [Google Scholar]
Yao, Z.; Xia, S.; Li, Y.; Wu, G.; Zuo, L. Transfer learning with spatial–temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst. 2023, 24, 8. [Google Scholar] [CrossRef]
Aouedi, O.; Le, V.A.; Piamrat, K.; Ji, Y. Deep learning on network traffic prediction: Recent advances, analysis, and future directions. ACM Comput. Surv. 2025, 57, 6. [Google Scholar] [CrossRef]
Song, C.; Lin, R.; Zheng, X.; Chen, Z. PeMSD7 and PeMSD4 Traffic Datasets. Available online: https://zenodo.org/records/7816008 (accessed on 3 December 2025).
Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. PEMS-BAY Traffic Dataset. Available online: https://zenodo.org/records/4263971 (accessed on 3 December 2025).
Jagadish, V.; Mori, U.; Mendiburu, A.; Otaegi, A. METR-LA Traffic Dataset. Available online: https://zenodo.org/records/5146275 (accessed on 3 December 2025).
Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 22419–22430. [Google Scholar]
Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. arXiv 2021, arXiv:2012.07436. [Google Scholar] [CrossRef]
Luo, Q.; He, S.; Han, X.; Wang, Y.; Li, H. LSTTN: A long-short term transformer-based spatiotemporal neural network for traffic flow forecasting. Knowl. Based Syst. 2024, 293, 111637. [Google Scholar] [CrossRef]
Dong, X.; Zhao, W.; Han, H.; Zhu, Z.; Zhang, H. MTESformer: Multi-scale temporal and enhance spatial transformer for traffic flow prediction. IEEE Access 2024, 12, 47231–47245. [Google Scholar] [CrossRef]
Yan, X.; Gan, X.; Tang, J.; Zhang, D.; Wang, R. ProSTformer: Progressive space-time self-attention model for short-term traffic flow forecasting. IEEE Trans. Intell. Transp. Syst. 2024, 25, 9. [Google Scholar] [CrossRef]
Tedjopurnomo, D.A.; Choudhury, F.M.; Qin, A.K. TrafFormer: A transformer model for predicting long-term traffic. arXiv 2023, arXiv:2302.12388. [Google Scholar]
Juyongjiang. TimeSeriesDatasets: Datasets for Time Series Forecasting. Available online: https://github.com/juyongjiang/TimeSeriesDatasets (accessed on 3 December 2025).
Shi, B.; Wang, Z.; Yan, J.; Yang, Q.; Yang, N. A novel spatial–temporal deep learning method for metro flow prediction considering external factors and periodicity. Appl. Sci. 2024, 14, 1949. [Google Scholar] [CrossRef]
Zhu, J.; Wang, Q.; Tao, C.; Deng, H.; Zhao, L.; Li, H. AST-GCN: Attribute-augmented spatiotemporal graph convolutional network for traffic forecasting. IEEE Access 2021, 9, 35973–35983. [Google Scholar] [CrossRef]
Nadarajan, J.; Sivanraj, R. Attention-based multiscale spatiotemporal network for traffic forecast with fusion of external factors. ISPRS Int. J. Geo-Inf. 2022, 11, 619. [Google Scholar] [CrossRef]
Ye, J.; Xue, S.; Jiang, A. Attention-based spatio-temporal graph convolutional network considering external factors for multi-step traffic flow prediction. Digit. Commun. Netw. 2022, 8, 3. [Google Scholar] [CrossRef]
Qi, X.; Yao, J.; Wang, P.; Shi, T.; Zhang, Y.; Zhao, X. Combining weather factors to predict traffic flow: A spatial-temporal fusion graph convolutional network-based deep learning approach. IET Intell. Transp. Syst. 2024, 18, 3. [Google Scholar] [CrossRef]
Rajha, R.; Shiode, S.; Shiode, N. Improving traffic-flow prediction using proximity to urban features and public space. Sustainability 2025, 17, 68. [Google Scholar] [CrossRef]
Rong, Y.; Mao, Y.; He, X.; Chen, M. Large-scale traffic flow forecast with lightweight LLM in edge intelligence. IEEE Internet Things Mag. 2024, 8, 12–18. [Google Scholar] [CrossRef]
Patro, S.G.; Sahu, K.K. Normalization: A preprocessing stage. arXiv 2015, arXiv:1503.06462. [Google Scholar] [CrossRef]
Ali, P.J.M. Investigating the impact of min-max data normalization on the regression performance of K-nearest neighbor with different similarity measurements. ARO Sci. J. Koya Univ. 2022, 10, 85–91. [Google Scholar]
Kim, J.; Ryoo, I. A Study on Network Traffic Prediction Based on Deep Learning Training Using Environmental In-formation. J. Broadcast Eng. 2025, 30.3, 461–479. [Google Scholar] [CrossRef]
Wen, Q.; Zhou, T.; Zhang, C.; Chen, W.; Ma, Z.; Yan, J.; Sun, L. Transformers in time series: A survey. arXiv 2024, arXiv:2202.07125. [Google Scholar]
Kim, J.; Ryoo, I. A Comparative Study of Multivariate Time Series Deep Learning Models for Network Traffic Pre-diction Incorporating Environmental Variables. J. Broadcast Eng. 2025, 30.6, 1093–1106. [Google Scholar] [CrossRef]
Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, USA, 2–4 November 2016; pp. 265–283. [Google Scholar]
Zollanvari, A. Deep learning with Keras–TensorFlow. In Machine Learning with Python: Theory and Implementation; Springer: Cham, Switzerland, 2023; pp. 351–391. [Google Scholar]
Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2017, arXiv:1412.6980. [Google Scholar] [CrossRef]
Ahn, K.; Zhang, Z.; Kook, Y.; Dai, Y. Understanding Adam optimizer via online learning of updates: Adam is FTRL in disguise. In Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024. [Google Scholar]
Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; p. 30. [Google Scholar]
Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
Van den Broeck, G.; Lykov, A.; Schleich, M.; Suciu, D. On the tractability of SHAP explanations. J. Artif. Intell. Res. 2022, 74, 851–886. [Google Scholar] [CrossRef]
Arnastauskaitė, J.; Ruzgas, T.; Bražėnas, M. An exhaustive power comparison of normality tests. Mathematics 2021, 9, 788. [Google Scholar] [CrossRef]
Odoi, B. Performance of Levene’s Test with Various Residuals and Correction Factors for Homogeneity of Variance in Single and Factorial ANOVA—A Simulation Approach. Adv. Knowl. Based Syst. Data Sci. Cybersecur. Res. 2024, 2, 149–171. [Google Scholar] [CrossRef]
Nanda, A.; Mohapatra, B.B.; Mahapatra, A.P.K.; Mahapatra, A.P.K.; Mahapatra, A.P.K. Multiple comparison test by Tukey’s honestly significant difference (HSD): Do the confidence level control type I error. Int. J. Stat. Appl. Math. 2021, 6, 59–65. [Google Scholar] [CrossRef]
Mohamad, D.A.; Goeman, J.J.; van Zwet, E.W. An improvement of Tukey’s HSD with application to ranking institutions. arXiv 2017, arXiv:1708.02428. [Google Scholar]
Karch, J.D.; Perez-Alonso, A.F.; Bergsma, W.P. Beyond Pearson’s Correlation: Modern Nonparametric Independence Tests for Psychological Research. Multivar. Behav. Res. 2024, 59, 957–977. [Google Scholar] [CrossRef]

Figure 1. Correlation Heatmap between Environmental Variables and Maximum Traffic.

Figure 2. MSE Comparison by Epoch Based on the Inclusion of Environmental Variables.

Figure 3. Permutation Importance Comparison of Environmental Variables Across GRU, LSTM, RNN, CNN, and Transformer Models.

Table 4. Dataset Summary.

Category	Description
Data Source	SNMP-based network traffic monitoring logs from Korea South-East Power Co., Ltd.
Monitoring Period	January 2022–December 2023 (2 years)
Sampling Frequency	1 h intervals (24 measurements per day)
Total Records	Approximately 17,520 traffic measurements (730 days × 24 h)
Prediction Target	Maximum daily outbound network traffic (Mbps)
Network Variables	Hourly traffic volume, aggregated packet count, and link utilization rate
Environmental Variables	Average temperature (°C), minimum temperature (°C), maximum temperature (°C), average wind speed (m/s), maximum wind speed (m/s), maximum instantaneous wind speed (m/s), total sunlight hours (hr), days
Data Sources (Environment)	Public API of the Korea Meteorological Administration (KMA)
Matching Method	Each day’s environmental record is aligned with the corresponding date of network traffic measurement
Data Size after Merging	730 composite samples (daily aggregation, 8 features per sample)

Table 5. Model Performance: Optimal and Common Condition (Sorted by MSE).

Model	Epoch	Batch Size	MSE	MAE	RMSE	R²	MAPE
GRU	1000	7	3400.24	39.34	58.31	0.99	3.49
	1000	21	3648.71	42.48	60.41	0.99	3.60
	900	7	3742.44	42.84	61.18	0.99	3.56
RNN	800	14	750.19	15.78	27.39	0.99	1.15
	900	7	805.16	18.11	28.38	0.99	1.48
	900	14	1013.50	20.49	31.84	0.99	1.55
LSTM	800	7	4226.99	44.33	65.02	0.99	3.68
	700	7	4633.59	49.13	68.07	0.98	4.06
	1000	7	5083.90	49.11	71.30	0.98	4.07
CNN	1000	14	6539.10	55.19	80.87	0.98	4.38
	900	7	6758.62	55.70	82.21	0.98	4.40
	900	14	7045.47	58.53	83.94	0.98	4.73
Transformer	800	7	9506.48	59.34	97.50	0.97	4.80
	1000	7	9908.81	60.92	99.54	0.96	4.88
	600	7	10,335.02	61.97	101.66	0.96	4.94

Table 7. Results of Shapiro–Wilk Normality Test for Deep Learning Prediction Models.

	Model	W Statistic	p-Value	Analysis
Environmental Information Excluded	GRU	0.97	5.1462 × 10⁻¹	acceptance
	RNN	0.86	8.3924 × 10⁻⁴	rejection
	LSTM	0.94	9.1412 × 10⁻²	acceptance
	CNN	0.98	6.9992 × 10⁻¹	acceptance
	Transformer	0.63	1.5523 × 10⁻⁷	rejection
Environmental Information Included	GRU	0.56	2.3127 × 10⁻⁸	rejection
	RNN	0.77	2.0642 × 10⁻⁵	rejection
	LSTM	0.65	3.4425 × 10⁻⁷	rejection
	CNN	0.95	1.4675 × 10⁻¹	acceptance
	Transformer	0.92	2.0710 × 10⁻²	rejection

Table 8. Results of Tukey HSD Post hoc Test for Pairwise MSE Differences Across Deep Learning Models Under Various Environmental Input Conditions.

	Group 1	Group 2	Mean Diff	P-adj	Lower	Upper	Reject
Environmental Information Excluded	GRU	CNN	−2028.58	0.01	−3752.56	−304.60	TRUE
	RNN	CNN	−6071.34	0.05	−3461.73	−13.77	TRUE
	LSTM	CNN	−1737.75	0	−7795.32	−4347.36	TRUE
	Transformer	CNN	4400.04	0	2676.06	6124.02	TRUE
	LSTM	GRU	290.83	0.99	−1433.15	2014.81	FALSE
	RNN	GRU	−4042.76	0	−5766.74	−2318.78	TRUE
	Transformer	GRU	6428.61	0	4704.63	8152.59	TRUE
	RNN	LSTM	−4333.60	0	−6057.58	−2609.62	TRUE
	Transformer	LSTM	6137.78	0	4413.80	7861.76	TRUE
	Transformer	RNN	10,471.38	0	8747.40	12,195.36	TRUE
Environmental Information Included	GRU	CNN	−7629.67	0	−8821.12	−6438.22	TRUE
	LSTM	CNN	−7386.85	0	−8578.30	−6195.40	TRUE
	RNN	CNN	−8373.81	0	−9565.26	−7182.36	TRUE
	Transformer	CNN	4399.27	0	3207.82	5590.72	TRUE
	LSTM	GRU	242.82	0.98	−948.63	1434.27	FALSE
	RNN	GRU	−744.14	0.42	−1935.59	447.31	FALSE
	Transformer	GRU	12,028.94	0	10,837.49	13,220.39	TRUE
	RNN	LSTM	−986.96	0.15	−2178.41	204.49	FALSE
	Transformer	LSTM	11,786.12	0	10,594.67	12,977.57	TRUE
	Transformer	RNN	12,773.08	0	11,581.63	13,964.53	TRUE

Table 9. Prediction Performance Comparison for GRU, LSTM, RNN, CNN, and Transformer Models with and without Environmental Information.

Model	Epoch	Batch Size	MSE (Excl.)	MSE (Incl.)	Relative Improvement (%)	MAE (Excl.)	MAE (Incl.)	R2 (Incl.)	Remark
GRU	700	7	7009.41	1151.46	83.57	54.48	25.05	0.99	Fast convergence and stable generalization
RNN	700	14	2966.65	407.32	86.27	34.75	17.92	1.00	Most consistent and lightweight
LSTM	1000	21	7300.25	1394.28	80.90	55.87	28.05	0.99	Higher variance and slower convergence
CNN	900	7	9037.99	8781.13	2.84	63.12	62.48	0.99	Limited temporal learning
Transformer	800	7	13,438.03	13,180.44	1.92	73.34	73.34	0.99	High computational cost

Table 10. Optimal and Common Condition Performance by Model (Include4: Selected Environmental Variables, Sorted by MSE Criteria).

Model	Epoch	Batch Size	MSE	MAE	RMSE	R²	MAPE
GRU	700	7	11.95	2.73	3.46	0.99	0.23
	800	21	58.51	6.03	7.65	0.99	0.48
	900	14	63.08	5.75	7.94	0.99	0.41
RNN	1000	21	42.92	5.05	6.55	0.99	0.41
	800	7	44.37	5.22	6.66	0.99	0.44
	1000	7	68.81	6.19	8.30	0.99	0.46
LSTM	700	14	11.43	2.60	3.38	0.99	0.22
	600	21	33.79	4.55	5.81	0.99	0.38
	900	14	34.40	4.42	5.86	0.99	0.35
CNN	900	7	5891.24	53.32	76.75	0.99	4.24
	800	7	6313.98	53.47	79.46	0.99	4.23
	1000	14	6417.44	54.74	80.11	0.99	4.36
Transformer	800	7	9621.59	61.70	98.09	0.97	4.91
	1000	14	10,374.42	62.53	101.85	0.96	4.98
	1000	7	10,869.45	68.11	104.26	0.96	5.12

Table 11. Performance Comparison of GRU, LSTM, RNN, CNN, and Transformer Models with Environmental Features (Epoch = 1000, Batch Size = 7).

Model	MSE	MAE	RMSE	R²	MAPE
GRU	66.30	6.32	8.14	0.99	0.52
RNN	172.62	9.67	13.14	0.99	0.67
LSTM	68.81	6.19	8.30	0.99	0.46
CNN	6677.75	58.35	81.72	0.98	4.84
Transformer	10,869.45	68.11	104.26	0.96	5.12

Table 12. Permutation Importance of Key Environmental Features Across GRU, LSTM, RNN, CNN, and Transformer Models.

Feature	GRU	LSTM	RNN	CNN	Transformer
Average Temperature (°C)	0.00725	0.00721	0.00430	−0.00740	−0.00575
Maximum Temperature (°C)	0.00567	0.00651	0.00183	−0.00312	−0.00216
Total Sunlight Hours (hr)	0.00267	−0.00065	0.00101	0.00292	0.00149
days	0.00216	0.00233	−0.00209	0.00079	0.00160

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kim, J.; Ryoo, I. A Deep Dive into AI-Based Network Traffic Prediction Using Heterogeneous Real Datasets. Appl. Sci. 2026, 16, 367. https://doi.org/10.3390/app16010367

AMA Style

Kim J, Ryoo I. A Deep Dive into AI-Based Network Traffic Prediction Using Heterogeneous Real Datasets. Applied Sciences. 2026; 16(1):367. https://doi.org/10.3390/app16010367

Chicago/Turabian Style

Kim, Jungyun, and Intae Ryoo. 2026. "A Deep Dive into AI-Based Network Traffic Prediction Using Heterogeneous Real Datasets" Applied Sciences 16, no. 1: 367. https://doi.org/10.3390/app16010367

APA Style

Kim, J., & Ryoo, I. (2026). A Deep Dive into AI-Based Network Traffic Prediction Using Heterogeneous Real Datasets. Applied Sciences, 16(1), 367. https://doi.org/10.3390/app16010367

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Deep Dive into AI-Based Network Traffic Prediction Using Heterogeneous Real Datasets

Abstract

1. Introduction

2. Related Work

2.1. Graph-Based Traffic Prediction

2.2. Non-Graph Based Traffic Prediction

2.3. Environmental Information Based Traffic Prediction

3. Proposed Experimental Design and Methodology for Network Traffic Prediction

3.1. Data Collection and Preprocessing

3.1.1. Data Description

3.1.2. Data Preprocessing

3.2. Model Architecture and Configuration

3.3. Architectural Details and Training Configuration

3.4. Comparative Experimental Design and Integration Analysis of External Factors

3.5. Performance Evaluation and Variable Importance Analysis

4. Experimental Results and Analysis

4.1. Performance Analysis of Deep Learning-Based Predictive Models

4.2. Verification of Performance Differences and Statistical Significance Across Models

4.3. Analysis of Predictive Performance Variations with Environmental Variable Integration

4.4. Correlation Analysis Among Environmental Variables

4.5. Variable Importance Analysis

4.6. Comprehensive Analysis

4.7. Limitations and Future Work

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI