1. Introduction
Services that require high bandwidth, such as ultra-high-definition video streaming, cloud gaming, telepresence, and virtual reality, are increasing the need for fast optical networks. Traditional optical networks use fixed-grid Wavelength Division Multiplexing (WDM), usually with channel widths of 50 or 100 GHz. This method often leads to unused spectrum when the bandwidth requested is smaller than the channel size, particularly when different types of traffic are combined.
Elastic Optical Networks (EONs) were developed to overcome the limitations of traditional optical networks. They divide large frequency bands into smaller frequency slots, such as 12.5 GHz or 6.25 GHz, which can be combined flexibly according to the bandwidth needed for each connection. By allowing dynamic spectrum allocation, EONs offer a more flexible and cost-effective way to handle growing network and internet traffic demands [1].
However, the flexibility of EONs can also result in spectrum fragmentation. When connections are repeatedly set up and terminated, the available frequency slots become scattered. As a result, even if a large amount of free spectrum remains, it may not contain enough contiguous slots to accommodate new connection requests [2]. Spectrum fragmentation therefore increases the connection blocking probability and reduces the overall efficiency of the network [3]. Although many metrics have been proposed to represent fragmentation [4], its impact on blocking probability is difficult to quantify. Therefore, using a comprehensive view of the spectral status of links, as proposed in this work, can be valuable in mitigating the effects of fragmentation.
The issue of spectrum fragmentation can be addressed by performing spectrum defragmentation or rerouting to restore contiguous frequency slots [5]. However, repeated execution of these actions may lead to unnecessary resource consumption, while delaying them could result in blocked connections [6,7]. Therefore, it is important to correctly predict when the risk of blocking increases and to intervene at the appropriate time [8].
Predicting network blocking events is a critical challenge. Deep learning techniques can provide powerful predictive capabilities to optimize performance and enhance network reliability. Early machine learning attempts modeled blocking as a classification problem fed with specifically designed traffic features or spectrum occupancy statistics [9,10,11,12].
Blocking prediction has been an important topic in recent EON research. Singh and Jukan [13] proposed a reduced-state Markov model for blocking probability estimation in EONs. Analytical traffic models such as Erlang B have also been evaluated for their applicability in EONs: Ujjwal and Thangaraj [14] investigated the limitations of Erlang B in estimating blocking probability under dynamic RSA scenarios. Cheng and Qiu [15] introduced a Long Short-Term Memory (LSTM)-based routing and spectrum assignment (RSA) framework, which learns temporal network dynamics to anticipate spectrum usage and predict blocking trends. Dávalos et al. [16] proposed a machine learning-based triggering strategy for spectrum defragmentation in EONs. Their method estimates the blocking rate using neural networks and activates defragmentation only when necessary, outperforming fixed-time and metric-based triggering approaches by reducing both blocked demands and unnecessary reconfigurations.
The authors in [1] demonstrated that two-dimensional Convolutional Neural Networks (2D-CNNs) trained directly on spectrum status matrices can reach an accuracy of 92.17% in blocking prediction for EONs based on the NSFNET topology, significantly outperforming support vector machine and k-Nearest Neighbors baselines.
Recent works on routing, spectrum assignment, and traffic forecasting indicate that combining spatial convolutions with recurrent units able to model time series yields superior results in optical networks [17]. Current blocking predictors either neglect the temporal dynamics of fragmentation or rely on handcrafted features that fail to generalize across topologies and traffic patterns. As a consequence, network operators still lack actionable and timely alarms that would allow them to schedule defragmentation exactly when it is needed.
Traffic prediction and dynamic resource management in EONs have gained substantial attention with the advent of machine learning (ML) techniques [18,19]. Deep reinforcement learning (DRL) methods have emerged as particularly powerful tools for routing and spectrum assignment (RSA) challenges [20,21]. Panayiotou et al. [17] propose a DRL-based RSA solution leveraging Graph Convolutional Networks (GCNs) for topological feature extraction and Recurrent Neural Networks (RNNs) for aggregating path-level features, addressing the limitations of Convolutional Neural Networks (CNNs) in capturing network topology.
The broader survey by Panayiotou et al. [17] discusses the role of CNNs and Long Short-Term Memory (LSTM) networks in traffic prediction for wireless networks, emphasizing their capacity to model spatial and temporal dependencies, respectively. Deep Convolutional Recurrent Neural Networks (DCRNNs) effectively capture spatio-temporal traffic patterns, outperforming traditional methods [22].
Azzouni and Pujolle [23] introduced NeuTM, an LSTM-based framework for predicting traffic matrices in Software-Defined Networks (SDNs), demonstrating low mean square error on the GÉANT backbone. For EON-based cloud networks, Monte Carlo Tree Search (MCTS) and artificial neural network (ANN) approaches were evaluated by Aibin and Walkowiak [24] to predict load and minimize blocking. Similarly, Poupart et al. [25] predicted flow sizes to proactively manage routing decisions, while Nie et al. [26] applied Deep Belief Networks (DBNs) to predict traffic in wireless mesh backbones, decomposing traffic dynamics into predictable and irregular components. Notably, most high-speed wireless networks, such as 4G and 5G, rely on EONs as their backbone.
Fault detection and resilience in EONs have also benefited from ML integration. Reddy and Kumar [27] proposed a deep neural network (DNN) combined with a Fuzzy Inference System (FIS) to enhance Shared Backup Path Protection (SBPP), improving the fault restoration ratio while reducing blocking probabilities. Multi-step traffic forecasting has been shown to outperform single-step prediction. Maryam et al. [28] utilized an Encoder–Decoder LSTM (ED-LSTM) model to anticipate future traffic over multiple periods, proposing heuristics such as Multi-step Maximum Demand Spectrum Allocation (MMD-SA) and Multi-step Average Demand Spectrum Allocation (MAD-SA), which significantly reduce service disruptions. Vinchoff et al. [29] introduced the GCN-GAN model, integrating GCNs and Generative Adversarial Networks (GANs) to predict bursty traffic patterns with superior mean square error performance compared to LSTM baselines. Aibin et al. [30] further demonstrated the applicability of GCN-GAN to both short- and long-term traffic forecasting.
Emerging research also highlights the importance of explainability. Goścień [31] combined traffic prediction with explainable AI (XAI) techniques (SHAP values), showing that interpreting ML model outputs can lead to improved demand ordering policies and reduced blocking probabilities in software-defined EONs. Topology-aware models have become essential for accurate traffic prediction in elastic cognitive optical networks (ECONs). Li et al. [32] proposed a GCN-Transformer model that jointly captures spectral (spatial features of the spectrum) and temporal traffic features. In multicore optical network environments, Pinto-Ríos et al. [8] employed DRL, specifically Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization 2 (PPO2) agents, to solve the Routing, Modulation, Spectrum, and Core Assignment (RMSCA) problem, outperforming traditional heuristic methods. Finally, reinforcement learning approaches such as DeepRMSA [33] and IRRS [34] have been pivotal in addressing spectrum fragmentation and fairness, improving blocking probability and fairness indexes across variable traffic scenarios.
We propose two AI-based predictors (CNN–LSTM and CNN–BiLSTM) [35,36] that process a series of consecutive spectrum snapshots. First, convolutional layers uncover local fragmentation patterns in each snapshot. Then, a Long Short-Term Memory (LSTM) or Bidirectional Long Short-Term Memory (BiLSTM) layer learns how these patterns evolve over time, in the forward direction only or in both directions. Finally, the model classifies the snapshot sequence according to the probability of blocking. The proposed models provide actionable blocking predictions that enable proactive spectrum defragmentation in software-defined optical networks. By predicting blocking events in advance, network operators can schedule just-in-time defragmentation to minimize service disruption while avoiding unnecessary reconfigurations.
The key contributions of this paper include the following:
- We propose two hybrid CNN–LSTM/BiLSTM models for analyzing spectrum data. These models extract spatial features using CNNs and temporal features using LSTM/BiLSTM layers.
- We evaluate the CNN–LSTM and CNN–BiLSTM models and find that CNN–BiLSTM achieves higher accuracy (94.1%) than CNN–LSTM (92.65%) and existing CNN models (92.17%).
- We conduct extensive comparisons with methods including 1D CNN, 2D CNN, KNN, and SVM, using accuracy, recall, F1-score, training time, and complexity.
- We present a correlation analysis of model hyperparameters with output performance.
The remainder of this paper is organized as follows: Section 2 presents the methodology and problem definition, while Section 3 describes the network simulation and data generation. Section 4 introduces the proposed CNN–LSTM and CNN–BiLSTM architectures. Section 5 details hyperparameter tuning, and Section 6 describes the validation metrics used. Training procedures are summarized in Section 7, and Section 8 presents the evaluation results. Section 9 provides a comparative discussion, and Section 10 concludes the paper with key findings and future directions.
2. Methodology
2.1. Problem Definition
Given a stream of connection requests (source, destination) in an EON, the goal is to allocate sufficient network resources to establish each requested connection. In EONs, two key constraints limit how spectrum can be allocated to new connections: the contiguity constraint and the continuity constraint.
The contiguity constraint requires that the frequency slots assigned to a connection be adjacent in the spectrum (a single block of slots without gaps), ensuring the optical signal can be transmitted efficiently. The continuity constraint requires that the same set of frequency slots be available on every fiber span (hop) along the end-to-end path between the source and destination, guaranteeing that the signal stays on the same spectral band throughout the route. Jointly, these constraints make routing and spectrum assignment (RSA) challenging, especially under fragmentation: even if enough total free capacity exists, it may be impossible to find a contiguous set of slots that is free across the entire path, leading to connection blocking.
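To make these constraints concrete, the following minimal sketch (with hypothetical helper and variable names) checks whether a request for a given number of slots can be served on a candidate path under both the continuity and contiguity constraints; it is an illustration, not the RSA algorithm used in the simulator.

```python
import numpy as np

def find_feasible_block(spectrum, path_links, demand_slots):
    """Return the first start index of a block of `demand_slots` contiguous slots
    that is free on every link of `path_links`, or None if the request is blocked.
    spectrum: binary matrix (num_links x num_slots); 1 = occupied, 0 = free."""
    # Continuity: a slot is usable only if it is free on all links of the path.
    free_on_path = (spectrum[path_links, :] == 0).all(axis=0)
    # Contiguity: slide a window of the demanded width and look for an all-free block.
    for start in range(free_on_path.size - demand_slots + 1):
        if free_on_path[start:start + demand_slots].all():
            return start
    return None  # no contiguous block free on the whole path -> blocking

# Toy example: a 3-link path with 8 slots per link and a 3-slot request
spectrum = np.array([[1, 0, 0, 0, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 0, 0, 0],
                     [1, 1, 0, 0, 0, 0, 0, 0]])
print(find_feasible_block(spectrum, path_links=[0, 1, 2], demand_slots=3))  # -> 5 (slots 5-7)
```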
The status of EON networks changes after the establishment or termination of each connection. At each time unit, the state of the network can be represented by a snapshot $X_t$. The aim is to identify network snapshots that are likely to lead to blocking in the near future; in this way, we can predict network blocking. Given a stream of snapshots $\{X_t\}$, the problem can be formulated as learning the mapping

$$f: X_t \in \{0,1\}^{l \times s} \longmapsto y_t \in \{0,1\},$$

where $X_t$ is the occupancy snapshot capturing the current spectrum allocation, $l$ is the number of links in the network, $s$ is the number of frequency slots per link, and $y_t$ is the binary blocking label ("0" = not-leading-to-blocking, "1" = leading-to-blocking).
2.2. Approach
In this study, the Knowledge Discovery in Databases (KDD) process [37] is adapted to address the prediction of connection blocking events in EONs. The main steps of this process are detailed as follows:
Data collection: Data are generated by simulating an EON, capturing the network state as snapshots (matrices of link and frequency slot usage) for each connection request and resource allocation/release event.
Data preparation: From all collected snapshots, relevant data is selected: specifically, snapshots just prior to blocking events (13, 14, and 15 time units before) and snapshots not associated with blocking for at least 100 subsequent time units.
Feature selection and data analysis: Each snapshot is represented as an $l \times s$ matrix (links × frequency slots), which serves as the feature set. Data analysis is performed to identify patterns associated with upcoming blocking events.
Data preprocessing: Snapshots are reshaped to match the deep learning input requirements and organized into temporal sequences. The dataset is split into training (80%), validation (10%), and testing (10%) sets and labeled as leading-to-blocking or not-leading-to-blocking for 100 time units. This split ratio balances the need for a sufficiently large training set to prevent underfitting while retaining enough validation samples for hyperparameter tuning and early stopping. The traffic samples are generated under homogeneous stochastic processes (Poisson arrivals, exponential holding times), making the 10% validation set representative.
Model training: Deep learning models are trained on the processed data to learn mappings from network state features to the probability of a future blocking event.
Model testing: The trained CNN–BiLSTM and CNN–LSTM models are evaluated on unseen data to assess their predictive performance in identifying blocking events at least 13 time units before they occur.
CNN–LSTM and CNN–BiLSTM hybrid models combine spatial pattern detection (CNN) with temporal sequence modeling (LSTM/BiLSTM).
3. Network Simulator and Data Generation
The blocking rate in elastic optical networks depends on both topology (e.g., the number of nodes and links, path lengths, and the number of frequency slots per link) and traffic characteristics (e.g., offered load, connection size distribution, holding times, and inter-arrival processes). These factors jointly determine the level of spectrum fragmentation and the likelihood of rejecting new requests.
In this study, an EON is simulated using the NSFNET topology (Figure 1), which has been used in the United States and consists of 14 nodes and 21 bidirectional links (42 directed links). The 14 nodes of the NSFNET topology represent 14 cities in the USA, and each bidirectional link between cities has independent uplink and downlink channels with their own frequency slots.
Figure 2 depicts the NSFNET adjacency matrix, which contains 42 ones: while there are 21 physical links, each link has separate and independent frequency spectra for the uplink and downlink directions, resulting in 42 directed links. The topology matrix is symmetric due to the bidirectional nature of the links. In the adjacency matrix, a one indicates the presence of a direct link between a source node (row) and a destination node (column). For instance, the first row is (0 1 0 1 0 0 0 1 0 0 0 0 0 0), which indicates that Seattle, WA (node 1) is directly connected to Palo Alto, CA1 (node 2), San Diego, CA2 (node 4), and Champaign, IL (node 8).
For each directed link, a bandwidth of 2 THz is allocated. This bandwidth is divided into equal-sized frequency slots (FS) of 12.5 GHz each. Therefore, each directed link has 160 frequency slots, with each slot providing 12.5 GHz of bandwidth.
The Routing and Spectrum Allocation (RSA) algorithm selects appropriate routes and allocates spectral resources based on network requests [38,39]. The continuity and contiguity of spectrum slots are ensured during spectrum allocation. Traffic simulations were performed using a MATLAB-based simulator (MATLAB R2024a). The statistical characteristics of the simulated traffic profiles are as follows:
Bandwidth: The connection sizes are uniformly distributed between 1 FS and 10 FS, thereby capturing the traffic heterogeneity typical of EONs.
Arrival rate: Connection arrivals follow a Poisson process, so the inter-arrival time (IAT) is exponentially distributed with a mean of 1 time unit.
Holding time: The connection holding times (HTs) follow a negative exponential distribution whose mean is adjusted to reach different network load conditions, which in turn affect the blocking probability. The HT values used throughout this work are around 250 time units, which leads to a network blocking probability between 1% and 5%. Changing the connection duration parameters would shift the blocking probability, and the prediction time spans explained below would need to be adjusted accordingly.
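A minimal request generator following this traffic profile is sketched below; the function name and the source/destination sampling are illustrative assumptions and do not reproduce the MATLAB simulator itself.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def generate_requests(num_requests, mean_iat=1.0, mean_ht=250.0, max_fs=10, num_nodes=14):
    """Generate (arrival_time, holding_time, demand_fs, src, dst) tuples:
    Poisson arrivals (exponential inter-arrival times), exponential holding times,
    and demand sizes uniformly distributed between 1 and 10 frequency slots."""
    arrivals = np.cumsum(rng.exponential(mean_iat, num_requests))       # Poisson process
    holding = rng.exponential(mean_ht, num_requests)                    # ~250 t.u. on average
    demands = rng.integers(1, max_fs + 1, num_requests)                 # 1..10 FS, uniform
    src = rng.integers(0, num_nodes, num_requests)
    dst = (src + rng.integers(1, num_nodes, num_requests)) % num_nodes  # ensures dst != src
    return list(zip(arrivals, holding, demands, src, dst))

requests = generate_requests(1000)
```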
The network state is represented by matrices (42 links × 160 FSs), with snapshots collected continuously, totaling 90,000 samples stored as .csv files.
Data Preparation
The dataset is divided into two classes:
Three consecutive snapshots taken 15, 14, and 13 time units before a blocking event are labeled as leading-to-blocking (Figure 3). Conversely, three consecutive snapshots for which blocking does not occur within the subsequent 100 time units (taken 102, 101, and 100 time units in the future) are labeled as not-leading-to-blocking (Figure 4). These time parameters can be adjusted according to the network load conditions that affect blocking.
To allow sufficient time for the network management system to respond and prevent blocking, the blockage should be predicted as early as possible. In this study, for the preliminary analysis, and as suggested in [1], we assume that an arbitrary horizon of 13 time units (t.u.) is sufficient for the network to react. This value is selected based on the size of the simulated network to illustrate and analyze the concept; in practice, it should be adjusted according to the dynamics of the actual network and the response time of the network administrator. We use the three snapshots taken before the 13 t.u. horizon (i.e., before the blocking) as the training input for blocking prediction. Using three snapshots, rather than a single one, provides the deep learning models with additional information for improved analysis; this number is motivated by a trade-off between predictive accuracy and model complexity [1].
We define not-leading-to-blocking snapshots as those followed by at least 100 t.u. without blocking. This value is arbitrary and serves to illustrate the concept; choosing a much larger value would make the two classes overly distinct. The deep learning models are designed to continuously analyze the network and trigger an alarm if blocking is expected within 13 t.u. These values are specific to the IAT and HT used in the simulations of this study and should be adjusted accordingly. To evaluate the models' performance, we compare their predictions with the actual simulation results.
Data splits include training (80%), validation (10%), and testing (10%). The input to the models is a numerical matrix of size 42 × 160, capturing a snapshot of the network's spectral occupancy at a specific time unit (Figure 5). Each element of the matrix indicates the occupancy status of a frequency slot on a given network link, with occupied frequency slots marked by ones and empty frequency slots represented by zeros.
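The labeling and splitting procedure described above can be sketched as follows; the helper name, the in-memory layout of the snapshots, and the tolerance used to pick the three pre-blocking snapshots are assumptions made for illustration.

```python
import numpy as np

def build_dataset(snapshots, snapshot_times, blocking_times, horizon=13, safe_gap=100, seed=0):
    """snapshots: array of shape (T, 42, 160); snapshot_times: time of each snapshot;
    blocking_times: times at which blocking events occurred in the simulation."""
    blocking_times = np.asarray(blocking_times)
    X, y = [], []
    for snap, t in zip(snapshots, snapshot_times):
        future = blocking_times[blocking_times > t] - t
        if future.size and horizon <= future.min() <= horizon + 2:
            X.append(snap); y.append(1)          # taken 13-15 t.u. before a blocking event
        elif not future.size or future.min() > safe_gap:
            X.append(snap); y.append(0)          # no blocking within the next 100 t.u.
    X, y = np.array(X), np.array(y)

    # 80% / 10% / 10% split into training, validation, and test sets
    idx = np.random.default_rng(seed).permutation(len(X))
    tr, va = int(0.8 * len(X)), int(0.9 * len(X))
    return ((X[idx[:tr]], y[idx[:tr]]),
            (X[idx[tr:va]], y[idx[tr:va]]),
            (X[idx[va:]], y[idx[va:]]))
```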
4. Proposed Models
Two deep learning architectures were explored. One model is based on the CNN–LSTM architecture, denoted as Algorithm 1, and the other utilizes the CNN–BiLSTM architecture, referred to as Algorithm 2. The two algorithms differ in complexity, accuracy, and interpretability. All models process occupancy snapshots $X_t$ representing the instantaneous optical network state, but differ in their convolutional depth and recurrent capacity.
CNN–LSTM is a family of hybrid deep learning models that combines Convolutional Neural Networks (CNNs) with Long Short-Term Memory (LSTM) networks to analyze data that contains both spatial and temporal patterns. The CNN component extracts spatial features from input data by applying convolution operations, where small filters called kernels move across the input matrix to detect specific patterns. These kernels are trainable parameters that learn to identify important features during the training process.
The Rectified Linear Unit (ReLU) activation function is employed in convolutional layers to introduce nonlinearity into the network, enabling the model to effectively learn complex spatial patterns by activating neurons selectively. Following each convolutional layer, batch normalization is applied to stabilize and accelerate the training process. Batch normalization normalizes the output of each convolutional layer by adjusting and scaling feature distributions, thereby reducing internal covariate shifts, mitigating issues like vanishing gradients, and ultimately leading to improved convergence speed and model stability.
The output of the batch normalization layer is passed to a max pooling layer to simplify and reduce the dimensions of the resulting feature maps. Max pooling divides the feature maps into smaller patches and selects the highest value from each patch, effectively preserving the most relevant information. This step reduces the dimensionality of the feature map, lowering computational complexity, decreasing the number of parameters, and improving the model's robustness by reducing its sensitivity to small variations in the input data.
The extracted spatial features are processed by the LSTM component to capture temporal dependencies. LSTM networks use memory cells and gating mechanisms to remember information across time steps, making them effective for understanding how patterns change over time. Global average pooling is used to convert the variable-length LSTM outputs into fixed-size vectors that can be processed by the final classification layers.
CNN–BiLSTM extends the basic CNN–LSTM architecture by replacing unidirectional LSTM with Bidirectional LSTM (BiLSTM). The BiLSTM processes sequences in both forward and backward directions. This bidirectional processing allows the model to provide a more complete understanding of temporal patterns.
In BiLSTM, one LSTM processes the sequence from start to end, while another processes it from end to start. The outputs from both directions are combined, through concatenating and averaging, to create a comprehensive representation. This approach captures more temporal context compared to unidirectional processing, often leading to better performance in sequence modeling tasks.
After spatial and temporal feature extraction, fully connected layers perform the classification task. These layers consist of neurons that connect to all outputs from the previous layer through learnable weights. To prevent overfitting, dropout is employed, which randomly disables a fraction of neurons during training. The ReLU (Rectified Linear Unit) activation function introduces non-linearity in the hidden dense layers, while the output layer uses a single neuron with a sigmoid activation for classification.
4.1. Proposed CNN–LSTM (Algorithm 1)
In our proposed CNN–LSTM model, we introduce a reduced-complexity hybrid deep learning algorithm specifically designed for efficient analysis of NSFNET EON data containing both spatial and temporal features, while significantly reducing computational resources. In a deep learning model, the sizes of the filters and kernels significantly affect the classification results. Our architecture is therefore optimized using GridSearchCV, a hyperparameter tuning technique for finding the parameter values that yield accurate feature extraction and classification.
Figure 6 depicts the structure of our proposed CNN–LSTM model. The input to the model is a tensor of shape (42, 160), where 42 represents the number of optical links and 160 denotes the number of frequency slots. This structure allows the model to analyze the spectral usage across multiple links simultaneously. The data is reshaped to ensure compatibility with 2D convolutional operations.
Initially, our method uses a specialized CNN to capture spatial patterns from the input data. The CNN employs three convolutional layers, each using small, row-wise kernels (1 × 3). The first convolutional layer applies eight filters, followed by batch normalization and max pooling (1 × 2), reducing the spectral dimension. Similarly, the second convolutional layer uses 16 filters, and the third uses 32 filters, each followed by batch normalization and max pooling, progressively refining and compressing spectral information.
Subsequently, the extracted features from the CNN layers are reshaped into a sequence format suitable for processing by the LSTM network. The LSTM consists of a single layer with 64 units and is designed to capture dependencies across the set of optical links. A global average pooling operation is applied over the LSTM outputs, reducing the sequence dimension and producing a fixed-size feature vector. This vector is then passed through a dense layer with 32 units and ReLU activation, followed by dropout (rate = 0.3) to mitigate overfitting. The final dense output layer with sigmoid activation performs binary classification (leading-to-blocking vs. not-leading-to-blocking).
The model is trained using the Adam optimizer (learning rate = 0.0001), with early stopping based on validation loss (patience = 40 epochs), and incorporates class weighting based on the frequency of each class in the training set.
The pseudocode of the CNN–LSTM pipeline (Algorithm 1) is as follows:
Algorithm 1: CNN–LSTM (pseudocode figure)
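Since the pseudocode itself is provided as a figure, a minimal Keras sketch of the CNN–LSTM pipeline is given below for illustration. It follows the layer sizes described above (1 × 3 kernels with 8/16/32 filters, LSTM with 64 units, dense layer with 32 units, dropout 0.3, Adam with learning rate 0.0001); details such as 'same' padding and the exact reshape between the convolutional and recurrent stages are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_lstm(num_links=42, num_slots=160):
    inputs = layers.Input(shape=(num_links, num_slots))
    x = layers.Reshape((num_links, num_slots, 1))(inputs)      # add a channel axis for 2D convs

    # Three row-wise convolutional blocks: 8, 16, 32 filters with 1x3 kernels
    for filters in (8, 16, 32):
        x = layers.Conv2D(filters, kernel_size=(1, 3), padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D(pool_size=(1, 2))(x)            # compress the spectral dimension

    # Treat the 42 links as a sequence of feature vectors for the recurrent stage
    x = layers.Reshape((num_links, -1))(x)
    x = layers.LSTM(64, return_sequences=True)(x)
    x = layers.GlobalAveragePooling1D()(x)

    x = layers.Dense(32, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)          # leading-to-blocking probability

    model = models.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```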
4.2. Proposed CNN–BiLSTM (Algorithm 2)
To complement the CNN–LSTM model, we introduce a lightweight CNN–BiLSTM architecture.
Figure 7 illustrates the architecture of the proposed model. The input is a tensor of shape (42, 160), where 42 denotes the number of optical links and 160 the number of frequency slots. As in the previous model, the data is reshaped to accommodate 2D convolutional operations.
After performing hyperparameter tuning as described in Section 5 (within the hyperparameter search space defined in Table 1), we obtained the optimal values for the proposed architectures. These optimal hyperparameter values are summarized in Table 2 and illustrated in Figure 7. The model begins with spatial feature extraction layers. This feature extraction module consists of three convolutional layers with progressively increasing filter counts: 16, 32, and 64. These values are selected based on the hyperparameter tuning analysis explained in Section 5. Each layer uses 3 × 3 kernels with ReLU activation, followed by batch normalization and max pooling (1 × 2) to reduce dimensionality and ensure efficient spectral compression. The output of the CNN layers is reshaped in preparation for the recurrent layer.
A Bidirectional LSTM (BiLSTM) with 64 units is employed to model bidirectional dependencies across optical links. The BiLSTM output is aggregated using global average pooling, followed by a dense layer with 64 units and ReLU activation. A dropout (rate = 0.3) is applied, followed by an output neuron with sigmoid activation for binary classification. Both CNN–LSTM and CNN–BiLSTM models use 64 units in the recurrent layer. In the BiLSTM, the 64 units work in both forward and backward directions, which increases the temporal learning capacity while keeping the same number of units for consistency. The model is trained using Adam optimizer (learning rate = 0.0001) and binary cross-entropy loss, incorporating early stopping (patience = 40 epochs).
The BiLSTM is adopted because spectrum fragmentation evolves with both short- and long-range temporal dependencies; by processing sequences in forward and backward directions, the BiLSTM can capture richer temporal context than a unidirectional LSTM (which processes sequences in a forward direction), leading to improved prediction accuracy.
The pseudocode of the CNN–BiLSTM pipeline is shown in Algorithm 2.
Algorithm 2: CNN–BiLSTM (pseudocode figure)
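As with Algorithm 1, the pseudocode is provided as a figure, so a minimal Keras sketch of the CNN–BiLSTM pipeline is given below. It mirrors the tuned values described above (3 × 3 kernels with 16/32/64 filters, BiLSTM with 64 units, dense layer with 64 units, dropout 0.3, Adam with learning rate 0.0001); padding and reshaping details are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_bilstm(num_links=42, num_slots=160):
    inputs = layers.Input(shape=(num_links, num_slots))
    x = layers.Reshape((num_links, num_slots, 1))(inputs)

    # Deeper convolutional stack: 16, 32, 64 filters with 3x3 kernels
    for filters in (16, 32, 64):
        x = layers.Conv2D(filters, kernel_size=(3, 3), padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D(pool_size=(1, 2))(x)

    x = layers.Reshape((num_links, -1))(x)
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)  # forward + backward passes
    x = layers.GlobalAveragePooling1D()(x)

    x = layers.Dense(64, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```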
4.3. Key Differences Between Proposed Algorithms
The two proposed architectures differ primarily in their CNN depth, recurrent structure, and fully connected layers. Algorithm 1 employs a lightweight CNN architecture, whereas Algorithm 2 utilizes a deeper CNN, impacting their spatial feature extraction capabilities. Additionally, Algorithm 1 incorporates a unidirectional LSTM for temporal modeling, while Algorithm 2 enhances temporal modeling by using a bidirectional LSTM (BiLSTM). Finally, Algorithm 1 uses a fully connected dense layer with 32 units for blocking prediction, whereas Algorithm 2 includes a larger dense layer with 64 units.
Together, these two pipelines systematically explore different neural designs, allowing a clear assessment of their impacts on complexity, accuracy, and interpretability.
5. Hyperparameter Tuning
The performance of deep learning models depends significantly on proper hyperparameter selection. To optimize the CNN–LSTM and CNN–BiLSTM architectures for blocking prediction in elastic optical networks (EONs), we conducted systematic hyperparameter tuning using the GridSearchCV method. This approach ensures that our models achieve optimal performance while maintaining computational efficiency.
GridSearchCV is a systematic technique that explores different combinations of hyperparameters by evaluating model performance across a predefined parameter grid. This method provides a comprehensive search strategy that identifies the optimal parameter configuration for our specific blocking prediction task.
The hyperparameter tuning process follows this structured approach (a minimal code sketch is given after the list):
Define the search space for each hyperparameter.
Create a grid of all possible parameter combinations.
Train and evaluate models for each combination using cross-validation.
Select the hyperparameter configuration (including number of filters, kernel size, number of LSTM/BiLSTM units, dense layer size, learning rate, batch size, dropout rate, and pooling size) that yields the best validation performance.
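The sketch below illustrates this procedure with a plain exhaustive search over a small, illustrative subset of the search space. The `build_toy_model` builder is a stand-in (the actual search was run over the Section 4 architectures with GridSearchCV), and the availability of `X_train`, `y_train`, `X_val`, and `y_val` from the data-preparation step is assumed.

```python
import itertools
import tensorflow as tf
from tensorflow.keras import layers, models

def build_toy_model(lstm_units, dropout, learning_rate, num_links=42, num_slots=160):
    """Small stand-in model used only to illustrate the grid search loop."""
    model = models.Sequential([
        tf.keras.Input(shape=(num_links, num_slots)),
        layers.Conv1D(16, 3, padding="same", activation="relu"),
        layers.LSTM(lstm_units),
        layers.Dropout(dropout),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

search_space = {"lstm_units": [32, 64, 128],
                "dropout": [0.2, 0.3, 0.4],
                "learning_rate": [1e-4, 1e-3]}

best_acc, best_params = 0.0, None
for values in itertools.product(*search_space.values()):
    params = dict(zip(search_space, values))
    model = build_toy_model(**params)
    model.fit(X_train, y_train, validation_data=(X_val, y_val),
              epochs=30, batch_size=64, verbose=0)
    val_acc = model.evaluate(X_val, y_val, verbose=0)[1]
    if val_acc > best_acc:
        best_acc, best_params = val_acc, params
print("best configuration:", best_params, "validation accuracy:", best_acc)
```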
Table 1 presents the hyperparameters investigated for both CNN–LSTM and CNN–BiLSTM models, along with their respective search ranges.
The total hyperparameter tuning process required approximately 72 h of computation on a machine with 16 GB of RAM and an AMD Ryzen 7 Microsoft Surface® Edition CPU at 2.00 GHz (Microsoft Corporation, Redmond, WA, USA), evaluating over 18,000 different parameter combinations across both model architectures.
Based on the GridSearchCV results, the optimal hyperparameter configurations for both models are presented in Table 2. These selected configurations represent the final tuned hyperparameters for the CNN–LSTM and CNN–BiLSTM models, and they were used in all subsequent training and evaluation experiments.
It should be noted that the hyperparameters tuned in this work are specific to the NSFNET topology. In general, the optimal values are influenced by both network topology and traffic characteristics: topological factors, such as the number of nodes, link density, and number of frequency slots per link, together with the traffic profile, determine the input dimensionality and the feature-extraction depth required. Consequently, for a larger network or a network with a different topology, hyperparameter re-optimization will be necessary to ensure generalization and robust predictive performance.
Hyper-Parameter Correlation Analysis
Figure 8 shows the Pearson correlation coefficients between the tuned hyperparameters and the validation accuracy of the CNN–LSTM model. The strongest positive association is observed for the number of training epochs, suggesting that longer training within the tested range is linked to higher validation accuracy. The learning rate magnitude also shows a positive association, with step sizes in the moderate part of the tested range tending to achieve better results than smaller values. Increasing the width of the deepest convolutional block (conv3_filters) was also positively associated with accuracy, while enlarging the first convolutional layer (conv1_filters) was negatively associated, indicating potential sensitivity to over-parameterization in early layers. Dropout rates above 0.3 showed a negative correlation, which may reflect the removal of too many informative features. The remaining parameters (dense-layer size, intermediate convolutional width, and LSTM hidden units) show only weak correlations.
Figure 9 shows the Pearson correlation coefficients between the hyperparameters and validation accuracy for the CNN–BiLSTM model. The number of training epochs has the strongest positive association, indicating that within the tested range, longer training was linked to higher validation accuracy. Moderate positive associations were also observed for the dense layer units, the learning rate, the second convolutional layer filters, and the dropout rate, suggesting that adjustments to these parameters can influence performance. By contrast, the third convolutional layer filters showed a negative correlation, suggesting that excessive depth in the convolutional layers may reduce performance in this setting. Hyperparameters such as the LSTM units and the first convolutional layer filters showed only weak associations, implying limited impact on accuracy within the explored ranges. Overall, these results should be interpreted as descriptive trends rather than causal relationships, providing useful guidance on which hyperparameters were most influential during tuning.
6. Validation Criteria
To ensure a comprehensive assessment of the models' performance in the binary classification task (leading-to-blocking vs. not-leading-to-blocking), we utilize the validation metrics explained below.
6.1. Confusion Matrix
The confusion matrix is a table representation that summarizes the model’s classification performance across both classes:
True Positives (TPs): Correctly predicted “leading-to-blocking” instances;
True Negatives (TNs): Correctly predicted “Not-leading-to-blocking” instances;
False Positives (FPs): Incorrectly predicted “leading-to-blocking” when actually “Not-leading-to-blocking”;
False Negatives (FNs): Incorrectly predicted “Not-leading-to-blocking” when actually “leading-to-blocking”.
It provides deeper insight into the types of classification errors and is especially useful for analyzing class-specific performance.
6.2. Accuracy
Accuracy measures the proportion of correctly classified samples among all samples:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

where $TP$, $TN$, $FP$, and $FN$ denote true positives, true negatives, false positives, and false negatives, respectively.
6.3. Loss
The model is trained using binary cross-entropy loss, defined as follows:

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right],$$

where $y_i$ is the true label, $\hat{y}_i$ is the predicted probability, and $N$ is the number of samples in the batch. Lower loss values indicate better alignment between predictions and ground truth. The loss is tracked for both the training and validation sets to monitor convergence and overfitting.
6.4. Precision
Precision indicates how many of the predicted positive instances are actually positive:

$$\text{Precision} = \frac{TP}{TP + FP}.$$

High precision means fewer false positives, which is important in scenarios where false alarms must be minimized. In our case, precision indicates how many of the "leading-to-blocking" predictions were actually correct; high precision therefore reduces false alarms.
6.5. Recall (Sensitivity)
Recall measures how many of the actual positive instances were correctly identified:

$$\text{Recall} = \frac{TP}{TP + FN}.$$

Recall reflects the model's ability to detect all actual "leading-to-blocking" cases. High recall indicates that most such cases are detected, which is critical in ensuring that blocked connections are not missed.
6.6. F1-Score
The F1-score is the harmonic mean of precision and recall:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.$$

It provides a balanced measure when there is a trade-off between precision and recall, accounting for both false positives and false negatives in a single metric.
By analyzing all these metrics together, we can be confident that the model works well in terms of accuracy, stability, and error.
7. Training
All models were trained using Adaptive Moment Estimation (Adam), a type of mini-batch stochastic gradient descent algorithm, with a learning rate of 0.0001 and mini-batch sizes of 32 and 128 for Algorithm 1 (CNN–LSTM) and Algorithm 2 (CNN–BiLSTM), respectively. To avoid overfitting and enhance training efficiency, early stopping was applied based on validation accuracy with a patience of 40 epochs. The complete Keras models were saved after convergence. The CNN–LSTM and CNN–BiLSTM models were trained and validated using the prepared dataset. Predictions were compared with actual simulation outcomes, emphasizing early detection of potential blocking to facilitate proactive network management.
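A minimal training sketch reflecting this setup is shown below. It assumes the `build_cnn_bilstm` builder and the `X_train`/`y_train`/`X_val`/`y_val` splits from the earlier sketches, and the class-weighting formula is one common choice (weights inversely proportional to class frequencies) rather than the paper's exact implementation.

```python
import numpy as np
import tensorflow as tf

model = build_cnn_bilstm()          # or build_cnn_lstm() with batch_size=32 for Algorithm 1

# Class weights inversely proportional to class frequencies in the training set
counts = np.bincount(y_train.astype(int))
class_weight = {0: len(y_train) / (2 * counts[0]), 1: len(y_train) / (2 * counts[1])}

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=40,
                                              restore_best_weights=True)
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=200, batch_size=128,
                    class_weight=class_weight,
                    callbacks=[early_stop])
model.save("cnn_bilstm_blocking_predictor.keras")   # keep the converged Keras model
```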
Figure 10 illustrates the training and validation accuracy curves for the CNN–LSTM model over 50 epochs, demonstrating the learning dynamics of the proposed architecture. The training accuracy rises rapidly during the initial epochs, reaching over 90% within the first 10 epochs and gradually converging near 98%. The validation accuracy also exhibits rapid initial improvement, stabilizing around 93%, slightly lower than the training accuracy.
Figure 11 depicts the training and validation accuracy curves of the CNN–BiLSTM model across 50 epochs, highlighting its efficient training dynamics and superior generalization performance. The training accuracy steadily approaches slightly more than 98%, while the validation accuracy experiences initial fluctuations before stabilizing around 95%. The small gap between training and validation curves confirms lower overfitting compared to the CNN–LSTM model.
8. Evaluation and Result
Figure 12 shows the confusion matrix obtained from evaluating the CNN–LSTM model on the test dataset, providing a clear visualization of the model’s predictive performance. The model correctly predicted 839 instances as blocking (True Positives) and accurately classified 1078 instances as non-blocking (True Negatives), demonstrating its robustness. However, it incorrectly classified 77 non-blocking events as blocking (False Positives), and 75 blocking events were misclassified as non-blocking (False Negatives). The overall accuracy reached approximately 92.65%, with precision and recall values of approximately 91.59% and 91.79%, respectively, highlighting the model’s strong capability to accurately detect blocking scenarios while maintaining a balanced error distribution between false alarms and missed detections.
Figure 13 presents the confusion matrix for the CNN–BiLSTM model evaluated on the test dataset, clearly illustrating its superior predictive accuracy. The model correctly identified 841 blocking events as True Positives and 1106 non-blocking events as True Negatives, indicating strong predictive performance with high accuracy, precision, and recall. The model generated fewer incorrect predictions, with only 49 False Positives and 73 False Negatives. Consequently, the CNN–BiLSTM achieved an impressive accuracy of approximately 94.10%, with high precision and recall rates of approximately 94.49% and 92.01%, respectively, underscoring its effectiveness in accurately and consistently predicting blocking scenarios while significantly minimizing false alerts and missed events.
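As a quick cross-check, the reported accuracy, precision, and recall follow directly from the confusion-matrix counts in Figure 12 and Figure 13; the short script below re-derives them (the F1 values are computed here for completeness).

```python
def summarize(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(summarize(tp=839, tn=1078, fp=77, fn=75))   # CNN-LSTM   -> ~0.9265, 0.9159, 0.9179, 0.9169
print(summarize(tp=841, tn=1106, fp=49, fn=73))   # CNN-BiLSTM -> ~0.9410, 0.9449, 0.9201, 0.9324
```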
8.1. Complexity, Memory, and Accuracy Comparison
Table 3 presents an empirical comparison of the proposed CNN–LSTM and CNN–BiLSTM models in terms of complexity, memory usage, validation accuracy, test accuracy, and computational cost measured by FLOPs (floating-point operations per inference request). The CNN–BiLSTM (Algorithm 2) exhibits better predictive performance, achieving higher validation accuracy (0.9511) and test accuracy (0.9410), compared to the CNN–LSTM (Algorithm 1), which achieves validation and test accuracies of 0.9356 and 0.9265, respectively. However, this improved performance comes at the cost of increased model complexity and computational demands: CNN–BiLSTM consists of approximately 0.72 million parameters, requiring around 2.9 MB of memory and approximately 150 million FLOPs per inference request. In contrast, CNN–LSTM offers a more lightweight alternative with roughly 0.18 million parameters, consuming about 0.71 MB of memory and significantly fewer computational resources (∼23 million FLOPs). Thus, Algorithm 1 significantly reduces complexity to suit resource-constrained environments, compromising some predictive power for faster training. Conversely, Algorithm 2 balances complexity and accuracy by reintroducing bidirectional recurrent layers with moderate convolutional depth.
8.2. Complexity Analysis
Table 4 and Table 5 provide the detailed complexity analysis of the CNN–LSTM and CNN–BiLSTM models. Table 4 shows that the CNN–LSTM model has a lightweight design with about 185k trainable parameters in total. By contrast, Table 5 shows that the CNN–BiLSTM model contains more than 720k trainable parameters; its deeper convolutional stack and larger dense layer further increase complexity. Although this leads to higher memory requirements and computational cost, the richer temporal modeling capacity improves predictive accuracy, as confirmed in the evaluation results. The following notation is used in Table 4 and Table 5:
$K_h, K_w$: kernel height and width in convolutional layers.
$C_{in}, C_{out}$: number of input and output channels (filters).
$C$: number of channels in a batch normalization layer.
$D$: input dimension of the recurrent unit.
$H$: number of hidden units in the LSTM/BiLSTM layer.
$N_{in}, N_{out}$: number of input and output neurons in a dense (fully connected) layer.
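Using this notation, the totals reported in Table 3, Table 4 and Table 5 can be approximated with the standard per-layer formulas. The sketch below assumes the layer shapes described in Section 4 ('same' padding, three (1 × 2) pooling stages along the spectral dimension) and counts batch-normalization parameters including their non-trainable statistics.

```python
def conv2d_params(kh, kw, c_in, c_out):      # (Kh*Kw*Cin + 1) * Cout
    return (kh * kw * c_in + 1) * c_out

def lstm_params(d, h, bidirectional=False):  # 4*H*(D + H + 1), doubled for a BiLSTM
    p = 4 * h * (d + h + 1)
    return 2 * p if bidirectional else p

def dense_params(n_in, n_out):               # (Nin + 1) * Nout
    return (n_in + 1) * n_out

def batchnorm_params(c):                     # gamma, beta, moving mean, moving variance
    return 4 * c

slots_after_pooling = 160 // 2 // 2 // 2     # three (1x2) max-pooling stages -> 20 slots

# CNN-LSTM (Algorithm 1): 1x3 convs with 8/16/32 filters, LSTM(64), Dense(32), Dense(1)
cnn_lstm = (conv2d_params(1, 3, 1, 8) + conv2d_params(1, 3, 8, 16) + conv2d_params(1, 3, 16, 32)
            + batchnorm_params(8) + batchnorm_params(16) + batchnorm_params(32)
            + lstm_params(slots_after_pooling * 32, 64)
            + dense_params(64, 32) + dense_params(32, 1))

# CNN-BiLSTM (Algorithm 2): 3x3 convs with 16/32/64 filters, BiLSTM(64), Dense(64), Dense(1)
cnn_bilstm = (conv2d_params(3, 3, 1, 16) + conv2d_params(3, 3, 16, 32) + conv2d_params(3, 3, 32, 64)
              + batchnorm_params(16) + batchnorm_params(32) + batchnorm_params(64)
              + lstm_params(slots_after_pooling * 64, 64, bidirectional=True)
              + dense_params(2 * 64, 64) + dense_params(64, 1))

print(cnn_lstm, cnn_bilstm)   # roughly 0.18 M and 0.72 M parameters, consistent with Table 3
```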
9. Discussion
To achieve a fair comparison, all baseline methods (e.g., KNN, SVM, 1D CNN, and 2D CNN) were re-implemented following the original papers to ensure methodological consistency. Where source code was available, we used it directly; otherwise, we adhered to the algorithmic descriptions. All methods were trained and tested on the same input format (network state matrices with identical slot granularity) and under identical traffic patterns (Poisson arrivals, uniform bandwidth distribution). Performance metrics (blocking probability, execution time) were measured consistently across methods.
Table 6 and Figure 14 compare the performance of the proposed CNN–BiLSTM and CNN–LSTM models with several baselines, including the 2D CNN and 1D CNN from [1], as well as traditional machine learning approaches such as SVM and KNN. The main metrics (test accuracy, test loss, macro F1-score, and weighted F1-score) show that the CNN–BiLSTM achieves the best results, with a test accuracy of 94.01% and a macro F1-score of 94.01%. This indicates that using 2D CNNs together with a bidirectional LSTM to learn how spectrum fragmentation changes over time provides a clear advantage over unidirectional methods.
CNN–LSTM also performs well, reaching 92.65% test accuracy with comparable weighted F1-scores, but with a slightly higher test loss (21.68%) than CNN–BiLSTM. The 2D CNN from [1] achieves a similar accuracy (92.17%) but a higher loss (22.81%) than our hybrid models, and its training time (23,561 s) is much longer. The 1D CNN from [1] shows lower performance across all metrics than the hybrid models and the 2D CNN, and it requires even more training time (135,960 s). The traditional machine learning models, SVM and KNN, perform worse than the hybrid deep learning methods, with KNN achieving the lowest accuracy (77.29%). These results show that the proposed hybrid CNN–BiLSTM model improves accuracy across the different evaluation metrics.
Although the proposed CNN–BiLSTM model needs more training time (3771 s) than the simpler CNN–LSTM (1966 s), its high macro F1-score (94.01%) translates to more accurate alerts for blocking events. These results show that a hybrid approach using both CNNs and recurrent layers can significantly improve blocking predictions while keeping computation manageable. Capturing the local spatial patterns alongside temporal changes in fragmentation is key to achieving a high performance.
10. Conclusions
This paper proposed two hybrid models based on CNN–LSTM/BiLSTM architectures. These models are designed to capture spatial fragmentation patterns and their temporal evolution in elastic optical networks in order to predict blocking. Using a combination of three convolutional layers and a bidirectional LSTM, the proposed method achieved an accuracy of 94.1% and a macro F1-score of 0.94 on the NSFNET dataset, outperforming both purely convolutional and standard CNN–LSTM variants.
The model is trained to predict potential blocking events 13 time units ahead, enabling proactive resource management actions such as spectrum defragmentation or connection rerouting. Additionally, the model is compact, consisting of around 0.72 million parameters, making it suitable for use on standard GPU hardware. The other proposed model, CNN–LSTM, is a lightweight variant with about 0.18 million parameters that provides reasonably competitive accuracy, and its much shorter training time makes it suitable for faster and more adaptive deployment.
Future work for this research involves expanding the applicability of the proposed model to more complex and larger networks, such as GEANT or other continental-scale mesh topologies, to verify its generalization capability. Applying domain-adaptation methods, such as fine-tuning on selected snapshots, could further enhance the model’s adaptability. Additionally, investigating predictive capabilities over multiple future horizons, such as 5, 10, and 20 steps ahead, would significantly enhance the model’s adaptability and relevance to dynamic, real-world network conditions. Finally, employing graph-aware neural network architectures, such as graph convolutional layers or attention mechanisms on nodes and links, would likely improve the representation of routing constraints that conventional two-dimensional occupancy matrices fail to capture.