Article

A Dual-Attention CNN–GCN–BiLSTM Framework for Intelligent Intrusion Detection in Wireless Sensor Networks

1 Department of Autonomous Systems, Faculty of Artificial Intelligence, Al-Balqa Applied University, Al-Salt 19117, Jordan
2 Intelligent Systems Department, Faculty of Artificial Intelligence, Al-Balqa Applied University, Al-Salt 19117, Jordan
3 Department of Computer Science, Faculty of Information Technology, Applied Science Private University, Amman 11937, Jordan
4 Division of Industrial and Logistics Engineering Technology, Faculty of Engineering and Technology, King Mongkut’s University of Technology North Bangkok, Rayong Campus, Rayong 21120, Thailand
* Authors to whom correspondence should be addressed.
Future Internet 2026, 18(1), 5; https://doi.org/10.3390/fi18010005
Submission received: 17 November 2025 / Revised: 11 December 2025 / Accepted: 18 December 2025 / Published: 22 December 2025

Abstract

Wireless Sensor Networks (WSNs) are increasingly being used in mission-critical infrastructures, where they are exposed to cyber intrusions that target their already constrained resources. Traditionally, Intrusion Detection Systems (IDS) in WSNs have been based on machine learning techniques; however, these models fail to capture the nonlinear, temporal, and topological dependencies across network nodes. As a result, they often suffer degradation in detection accuracy and exhibit poor adaptability against evolving threats. To overcome these limitations, this study introduces a hybrid deep learning-based IDS that integrates multi-scale convolutional feature extraction, dual-stage attention fusion, and graph convolutional reasoning, with bidirectional long short-term memory components embedded into the unified framework. Through this combination, the proposed architecture effectively captures hierarchical spatial–temporal correlations in traffic patterns, thereby enabling precise discrimination between normal and attack behaviors across several intrusion classes. The model has been evaluated on a publicly available benchmark dataset, where it attains strong multiclass classification performance and outperforms conventional IDS-focused approaches. In addition, the proposed design aims to retain suitable computational efficiency, making it appropriate for edge and distributed deployments and, consequently, an effective solution for next-generation WSN cybersecurity. Overall, the findings emphasize that combining topology-aware learning with multi-branch attention mechanisms offers a balanced trade-off between interpretability, accuracy, and deployment efficiency for resource-constrained WSN environments.

Graphical Abstract

1. Introduction

Wireless Sensor Networks (WSNs) are an essential part of the Internet of Things (IoT) that are increasingly becoming common in critical infrastructures, environmental monitoring, healthcare, and industrial control systems [1,2,3,4,5]. These networks have revolutionized the sensing and communication frameworks. Nevertheless, the distributed and energy-constrained nature of these networks makes them susceptible to cyber intrusions [6,7,8]. Some of the commonly found attacks involve Denial of Service (DoS), jamming, sinkhole, blackhole, and selective forwarding. Such attacks lead to a compromise in the data integrity, network reliability, and real-time decision-making. Thus, the development of efficient and intelligent Intrusion Detection Systems (IDS) is essential for protecting the WSNs while ensuring their energy efficiency and scalability [6,7,8].
In addition to cyber intrusions, Wireless Sensor Networks are also highly vulnerable to a variety of non-cyber threats that arise from their deployment conditions and hardware limitations. Real-world WSN deployments frequently suffer from harsh environmental influences such as temperature fluctuations, humidity, rain exposure, radiation, and electromagnetic interference. These factors degrade sensing precision and can lead to erroneous or missing measurements. WSNs are also affected by hardware noise, node-level faults, power instability, and interference from external electronic devices. Furthermore, issues such as data redundancy, data duplication, and inconsistent sampling can negatively impact the quality and reliability of sensor readings. These vulnerabilities have been comprehensively analyzed in recent studies, which demonstrate how environmental distortions and incorrect sensor outputs can propagate through the network and degrade decision-making quality in critical applications [9,10]. Recent surveys covering Industry 4.0 applications also highlight the sensitivity of WSNs and IoT infrastructures to both cyber and non-cyber disruptions, reinforcing that WSN technology remains inherently fragile and exposed to multi-domain risks [9,10]. These discussions are elaborated in [9,10], which present extensive reviews and mitigation strategies for environmental disturbances, hardware-induced noise, and data-level inconsistencies in WSN systems.
The traditional WSN IDS framework predominantly relies on statistical analysis and rule-based systems. The classical Machine Learning (ML) algorithms like Random Forest (RF), Support Vector Machine, k-Nearest Neighbors (kNN), and Decision Trees (DT) offer interpretability and low computational overhead [11,12,13,14], yet multiple studies have reported that their performance degrades under dynamic topologies and nonlinear attack patterns that are commonly found in real-world deployments [15,16,17,18,19,20]. Some of the recent advancements towards Deep Learning (DL)-based IDSs have explored architectures like Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM), and a hybrid combination of CNN-LSTMs [11,12,13]. In addition, autoencoders and graph neural networks have also been explored for capturing hierarchical and temporal dependencies in the traffic behavior. While offering higher accuracies, these methods exhibit certain limitations: (i) high energy and memory footprints unsuited for low-power sensor nodes, (ii) poor generalization to unseen attack types and evolving network conditions, and (iii) lack of interpretability and privacy preservation in distributed scenarios. Such gaps have been targeted by several studies in recent times.
This study introduces a multi-branch hybrid DL framework optimized for IDS in WSNs. The model integrates multi-scale CNN blocks, attention-based fusion layers, graph convolutional network (GCN) operations, and BiLSTM to effectively capture the multi-resolution, spatial, and temporal dependencies in the WSN-DS traffic. The multi-scale CNN layers have been used to extract the hierarchical frequency–temporal patterns. The attention fusion dynamically re-weights the salient spatial–temporal features. The GCN components help in encoding the inter-node topological relationships, and the BiLSTM layers model the bidirectional temporal correlations. These components together enhance the detection of subtle and evolving attack behaviors. Finally, the dense layers and softmax classifiers produce probabilistic intrusion classifications. Compared to previous models, the proposed framework is lightweight, modular, and optimized for both centralized and distributed detection schemes. It offers improved generalization towards unseen intrusions and maintains computational efficiency in real-time edge deployments. Overall, the study contributes as follows:
1. It integrates multi-scale CNN, attention fusion, GCN, and BiLSTM to capture comprehensive spatio-temporal dynamics of WSN traffic.
2. The model learns hierarchical and context-aware embeddings that improve separability between normal and anomalous traffic. This is attained through multi-branch feature extraction and adaptive attention weighting.
3. It introduces advanced preprocessing and normalization steps to ensure stability.
The remainder of the article has been structured as follows. Section 2 presents the related work. Section 3 describes the methodological framework, including preprocessing steps, model design, and evaluation procedures. Section 4 reports the experimental results and provides a critical analysis of the findings as well as comparisons with the benchmark models. Section 5 concludes the study and outlines potential directions for future research.

2. Related Works

Houda et al. [21] offered a collaborative federated learning framework that makes use of a secure aggregation protocol for the detection of jamming attacks (including constant, random, reactive, and deceptive jamming). The study attained an accuracy of ≈99%, yet its focus is limited to jamming behaviors while further assuming federated learning connectivity. Broader multi-attack generalization and on-device energy and communication costs have not been incorporated in the proposed design. Jeyakumar et al. [22] proposed a hybrid stacked CNN-Bidirectional LSTM (BiLSTM) model, tuned by an African vulture optimization algorithm and trained using federated learning. The communication and aggregation overheads, along with the robustness of the federated learning setup, have not been fully quantified in the study. In addition, interpretability and node-level resource constraints are only partially addressed. Zhou et al. [23] incorporated a tabular-to-image transform with transfer learning, combining MobileNet and Xception with the black kite algorithm for hyperparameter search and ensembling. The model was found to carry a heavy computational footprint and lacks explainability and uncertainty estimation for real-world deployments.
Halbouni et al. [15] adopted CNN and LSTM layers to capture spatial and temporal features for classification across three datasets. The models offered limited treatment of class imbalance and WSN energy/latency constraints. Vinayakumar et al. [16] focused on systematically benchmarking DL against classical ML using multiple intrusion datasets. The benchmarking datasets were found to lack WSN-specific topology and context modeling for interpretability, and false-positive control under multi-attack scenarios was not addressed. Birahim et al. [24] adopted Particle Swarm Optimization (PSO) for the feature and hyperparameter search using an ensemble (RF/DT/kNN), with class imbalance handled using Synthetic Minority Oversampling Technique (SMOTE)-Tomek. The study failed to model network-structure awareness and temporal dependencies. In addition, scalability of the models was not fully addressed. Hakami et al. [17] presented a pipeline with SMOTE for class balancing and Pearson correlation for feature selection. While using the WSN-DS dataset, the study offered fair performance yet failed to address privacy-preserving learning along with computational resource requirements. Alzahrani et al. [18] presented a ConvLSTM model offering benchmark performance on several datasets. However, it was tailored for Unmanned Aerial Vehicle (UAV) networks with poor transferability towards resource-constrained environments. Similarly, a Red Kite Optimization framework [25] was proposed that focused on average ensembling, the Lévy-flight Chaotic Whale Optimization Algorithm (LCWOA), and hyperparameter tuning. The model relies on heuristic feature selection and ensemble averaging without offering temporal or graph modeling. Atitallah et al. [26], Jiang et al. [27], and Saleh et al. [28] collectively enhanced detection accuracy using fuzzy-graph attention, meta-heuristic optimization, and Stochastic Gradient Descent (SGD)-based learning. Yet their real-time adaptability and scalability remain limited for lightweight deployments.
A summary of these articles has been presented in Table 1.
The studies reviewed above highlight substantial progress in intrusion detection approaches; however, several challenges remain. The federated and optimization-based models, including the federated SCNN-BiLSTM tuned with AVOA and the privacy-preserving FL framework for jamming detection, have improved distributed learning, yet they face communication and synchronization bottlenecks. The optimization-driven frameworks, including CBCTL-IDS, RKOA, AEID, and ST-IAOA-XGBoost, have reported detection accuracies typically ranging from 96% to 99% on WSN-DS and related intrusion datasets. Despite these high accuracies, such methods remain computationally heavy for deployment on resource-constrained WSN nodes. The explainable artificial intelligence (XAI) techniques, including the PSO-based ensemble with LIME/SHAP, have enhanced interpretability, yet they lack real-time adaptability and spatial–temporal reasoning. Fuzzy graph attention networks capture topological relations, yet suffer from high complexity and limited scalability in constrained WSN environments. Overall, the following research gaps have been identified:
  • Deployment Realism: Existing models overlook computational and energy limitations of distributed sensor nodes.
  • Temporal–Spatial Dependency: Most IDSs fail to jointly model both the temporal evolution of attacks and spatial correlations among nodes.
  • Dynamic Adaptation: Static training prevents adaptation to changing traffic distributions and novel intrusions.
  • Interpretability and Fusion: Few works integrate multi-level feature fusion or interpretable decision mechanisms within hybrid deep architectures.

3. Materials and Methods

3.1. Design Framework

The proposed framework entails multi-scale convolutional filters, attention-based fusions, and BiLSTM for the extraction of spatio-temporal dependencies in the WSN traffic. The overall structure of the proposed IDS framework has been presented in Algorithm 1.
Algorithm 1 Proposed Intrusion Detection Framework.
Require: Dataset D, learning rate η, epochs E, batch size B
Ensure: Θ
1: Split D → (D_train, D_val, D_test)
2: Apply Min–Max normalization; rank features by χ²
3: Reshape inputs to X ∈ R^{k×1}, where k is the number of selected features (k = 16)
4: for e = 1 to E do
5:   Forward pass:
6:     Apply 1D convolutions with kernel sizes k_i ∈ {3, 5} to X to obtain branch feature maps C_i, i ∈ {1, 2}
7:     Concatenate [C_i]_i, apply batch normalization and dropout → C̃
8:     Compute spatial attention A_s on C̃, temporal attention A_t on A_s
9:     Build graph features H via GCN(A_s, A_t); obtain h via BiLSTM(H)
10:    Compute logits ŷ = softmax(W h + b)
11:   Backward pass:
12:    Compute gradients ∇_Θ L and update Θ ← Θ − η ∇_Θ L (Adam)
13:   Validate on D_val and save the best checkpoint
14: end for
15: Test on D_test; report accuracy, precision, recall, and F1
Here, the index i in Step 7 enumerates the multi-scale convolutional branches with kernel sizes k_i ∈ {3, 5}. This is consistent with the feature maps defined in the multi-scale convolutional block.
It is important to note that the proposed IDS framework is not restricted to a centralized implementation. While the experimental evaluation in this work assumes offline training on a single computing device, the model architecture itself allows for fully distributed execution. During inference, each sensor node only requires access to the local feature vector and the lightweight prediction head, whereas the attention maps and graph convolution can be pre-computed or periodically updated by a cluster head or edge gateway. This enables a hybrid deployment mode in which training remains centralized, but inference can be distributed across the network with minimal communication overhead. A fully decentralized variant, where nodes exchange only compact embeddings rather than raw traffic data, is also feasible and will be explored as future work.

3.2. Dataset Description

The study employed the WSN-DS dataset for carrying out the experimentation and evaluations. The WSN-DS dataset, developed by Almomani et al. [29], is specifically designed for the detection of Denial-of-Service (DoS) attacks and consists of 374,661 records, with approximately 9% labeled as DoS incidents. This proportion corresponds to a moderately imbalanced but still realistic operating condition for WSN intrusion detection. The proposed model does not enforce a hard upper bound on the acceptable number of DoS incidents: in principle, it can be retrained under different attack prevalences as long as the training data reflects the target deployment scenario. However, as with most supervised IDS models, extreme shifts in the ratio between normal and attack traffic (for example, when DoS incidents dominate the majority of the traffic) may affect calibration of the decision threshold and could require rebalancing strategies or threshold tuning. In this work, all experiments are conducted under the original WSN-DS class proportions, and a systematic robustness study over varying DoS rates is left as future work. This dataset was constructed using the LEACH protocol, a widely adopted hierarchical routing protocol in WSNs, and encompasses both normal network behavior and four distinct types of DoS attacks: Grayhole, Blackhole, Time Division Multiple Access (TDMA) schedule manipulation, and Flooding. In Grayhole attacks, a compromised node selectively drops a subset of packets. In Blackhole attacks it drops almost all forwarded traffic. TDMA-based attacks disrupt the Time Division Multiple Access scheduling by corrupting or hijacking time slots. This leads to collisions and packet losses, while Flooding attacks inject excessive traffic to exhaust bandwidth and node energy. Data collection was performed using Network Simulator 2 (NS-2), and the resulting traces were processed to extract 18 relevant features. Due to its comprehensive structure and labeled attack scenarios, the WSN-DS dataset serves as a valuable benchmark for researchers developing intrusion detection strategies and enhancing the security of WSNs [30].
The dataset has been treated as a benchmark corpus for the IDS in the WSNs. Each of the records in the data comprises 18 continuous-valued attributes that represent traffic, energy, and protocol-level indicators. These are followed by a categorical class label y ∈ {C_1, C_2, …, C_5} corresponding to different attack or normal states. The data attributes are represented as follows:
D = \{ (x_i, y_i) \mid x_i \in \mathbb{R}^{18},\; y_i \in \{1, \dots, 5\},\; i = 1, \dots, N \}
where N denotes the total number of samples. The dataset was partitioned into training and testing subsets with an 80:20 ratio. The 80:20 split ratio is a commonly adopted convention in supervised learning and intrusion detection studies, providing a balance between sufficient data for training and a representative portion for unbiased testing. This ratio also matches prior work using the WSN-DS dataset, enabling consistent comparison with existing IDS baselines. We validated that alternative splits (such as 70:30 and 75:25) produced similar trends, and therefore adopted the standard 80:20 split for clarity and comparability.
In the experimental pipeline, D_train is further randomly split into training and validation subsets, denoted by D_train^{(tr)} and D_val, respectively. This yields the three-way partition used in Algorithm 1:
D_{\mathrm{train}}^{(tr)} \cup D_{\mathrm{val}} \cup D_{\mathrm{test}} = D,
with all subsets pairwise disjoint. The high-level notation in (2),
D_{\mathrm{train}} \cup D_{\mathrm{test}} = D, \qquad D_{\mathrm{train}} \cap D_{\mathrm{test}} = \varnothing,     (2)
therefore describes the initial train–test split, while Algorithm 1 explicitly exposes the internal validation fold carved out of D_train.
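For concreteness, the partitioning can be reproduced with scikit-learn as in the following sketch; the file name, the label-column name ("Attack type"), and the use of stratified splitting are illustrative assumptions rather than details taken from the released implementation.

import pandas as pd
from sklearn.model_selection import train_test_split

# Load the WSN-DS records; the file and label-column names are placeholders.
data = pd.read_csv("WSN-DS.csv")
X = data.drop(columns=["Attack type"]).values
y = data["Attack type"].values

# 80:20 train/test split, stratified so that all five classes appear in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# Carve a 10% validation fold out of the training portion (72/8/20 overall).
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.10, stratify=y_train, random_state=42)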
The features included in the dataset are described in Table 2.

3.3. Data Preprocessing

To allow numerical stability and optimal convergence, the features have been subjected to normalization by using min–max scaling as follows:
x' = \frac{x - \min(x)}{\max(x) - \min(x)} \in [0, 1]
Feature selection was achieved using the χ² statistical test. For each feature f_j, the relevance score was computed as
\chi^2(f_j) = \sum_{i=1}^{m} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
where O_{ij} and E_{ij} denote the observed and expected frequencies across class distributions. The top 16 features with the highest χ² scores were retained:
X' = \mathrm{SelectKBest}_{\chi^2,\, k=16}(X)
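A minimal scikit-learn sketch of these two steps is given below (continuing the split sketched earlier); note that the χ² test requires non-negative inputs, which the [0, 1] scaling guarantees, and that the scaler and selector are fitted on the training portion only to avoid information leakage.

from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2

# Min-max normalization to [0, 1], fitted on the training split only.
scaler = MinMaxScaler(feature_range=(0.0, 1.0))
X_tr_scaled = scaler.fit_transform(X_tr)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)

# Chi-squared ranking; keep the 16 highest-scoring features.
selector = SelectKBest(score_func=chi2, k=16)
X_tr_sel = selector.fit_transform(X_tr_scaled, y_tr)
X_val_sel = selector.transform(X_val_scaled)
X_test_sel = selector.transform(X_test_scaled)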
Additionally, label encoding was applied to the target variable, which was categorical in nature and therefore unsuitable for direct use in ML algorithms. Label encoding is a common preprocessing technique that transforms categorical labels into numerical representations, making them more suitable for algorithmic processing, particularly when the target classes are limited and discrete. The WSN-DS dataset includes five target categories: Blackhole, Flooding, Grayhole, Normal, and TDMA. Each class was assigned a unique numerical identifier, as summarized in Table 3, to ensure compatibility with supervised learning models.
Because the WSN-DS dataset contains a much smaller proportion of DoS attack samples than normal traffic, the dataset is imbalanced. To prevent the model from being biased toward the majority class, a class weighting strategy was applied during training. Each class was assigned a weight that is inversely related to its frequency in the dataset. This allows minority attack classes to have a stronger influence on the loss function without altering the original distribution of samples. Oversampling and undersampling were not used, as they may distort the temporal and structural properties of the data. The class weighting approach was found to stabilize training and support balanced detection performance across all categories.
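A brief sketch of the label encoding and inverse-frequency class weighting described above is shown below; the "balanced" heuristic from scikit-learn is used here as one concrete realization of the inverse-frequency idea.

import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.class_weight import compute_class_weight

# Encode the five categories (Blackhole, Flooding, Grayhole, Normal, TDMA) as integers 0-4.
encoder = LabelEncoder()
y_tr_enc = encoder.fit_transform(y_tr)
y_val_enc = encoder.transform(y_val)
y_test_enc = encoder.transform(y_test)

# Weights inversely proportional to class frequency, later passed to model.fit(class_weight=...).
classes = np.unique(y_tr_enc)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_tr_enc)
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}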

3.4. Feature Engineering

Feature transformation for temporal learning was performed via 3D reshaping:
X \in \mathbb{R}^{n \times k \times 1}, \quad k = 16
This embedding enables convolutional and recurrent layers to explore the localized dependencies.
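In NumPy terms, this reshaping simply appends a singleton channel axis so that the k = 16 selected features form a one-dimensional sequence; a short sketch continuing the preprocessing above:

import numpy as np

# (n_samples, 16) -> (n_samples, 16, 1) so Conv1D and BiLSTM layers can scan the feature axis.
X_tr_seq = X_tr_sel.reshape(-1, 16, 1).astype(np.float32)
X_val_seq = X_val_sel.reshape(-1, 16, 1).astype(np.float32)
X_test_seq = X_test_sel.reshape(-1, 16, 1).astype(np.float32)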

3.5. Model Design

The proposed hybrid framework M ( Θ ) constitutes multiple branches. It is a hierarchically coupled architecture designed to capture spatial–temporal–spectral correlations. Θ denotes the complete set of learnable parameters of the model M , including convolutional kernels, recurrent weights, normalization matrices, and bias terms. The model comprises four principal computational entities: a Multi-Scale Convolutional Block for local feature extraction, an Attention Fusion Layer for dynamic context weighting, a Graph Convolutional Module for structural regularization, and a Bidirectional LSTM with Contextual Attention for temporal propagation modeling. These integrated modules help in collectively representing the learnable tensors that involve convolutional kernels, recurrent weights, normalization matrices, and bias offsets. The model design has been depicted in Figure 1.

3.5.1. Multi-Scale Convolutional Block

The normalized sequence tensor X ∈ R^{k×1} represents the compacted feature manifold. The multi-scale convolutional extractor performs convolutions at multiple receptive-field scales for capturing heterogeneous spatial dependencies:
F_1 = \sigma\big(\mathcal{N}_1(W_1 *_3 X + b_1)\big),
F_2 = \sigma\big(\mathcal{N}_2(W_2 *_5 X + b_2)\big),
F_3 = \sigma\big(\mathcal{N}_3(W_3 *_7 X + b_3)\big),
where *_n denotes 1D convolution with kernel size n, \mathcal{N}_i(\cdot) indicates batch normalization, and \sigma(\cdot) is the ReLU activation. The multi-scale responses are concatenated into a composite feature tensor:
F_{ms} = F_1 \oplus F_2 \oplus F_3 \in \mathbb{R}^{k \times d_{ms}},
where d_{ms} denotes the total concatenated dimensionality. A dropout mapping \mathcal{D}_p with stochastic rate p = 0.3 is subsequently applied to prevent co-adaptation:
\tilde{F}_{ms} = \mathcal{D}_p(F_{ms}).
This operation enforces robustness to local perturbations. In addition, it preserves gradient stability across convolutional depths. Unless otherwise stated, the nonlinearities σ ( · ) in the convolutional block, ξ ( · ) in the graph convolution, and ϕ ( · ) in the dense projection layers are all implemented as the standard ReLU activation in our experiments; distinct symbols are used only to emphasize their role in different submodules of the architecture.
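A minimal TensorFlow/Keras sketch of this block is given below; the filter count of 32 per branch is an illustrative assumption (the exact value is not specified here), while the kernel sizes 3, 5, and 7, the batch normalization, the ReLU activation, and the 0.3 dropout follow the equations above.

import tensorflow as tf
from tensorflow.keras import layers

def multi_scale_block(x, filters=32):
    """Parallel 1D convolutions with kernel sizes 3, 5 and 7, concatenated and regularized."""
    branches = []
    for k in (3, 5, 7):
        b = layers.Conv1D(filters, kernel_size=k, padding="same")(x)  # W_i *_k X + b_i
        b = layers.BatchNormalization()(b)                            # N_i(.)
        b = layers.Activation("relu")(b)                              # sigma(.)
        branches.append(b)
    f_ms = layers.Concatenate(axis=-1)(branches)                      # F_ms, shape (k, 3*filters)
    return layers.Dropout(0.3)(f_ms)                                  # F~_ms with p = 0.3

inputs = layers.Input(shape=(16, 1))
f_ms = multi_scale_block(inputs)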

3.5.2. Dual-Stage Attention Fusion

The fused features \tilde{F}_{ms} are passed into a dual-stage attention mechanism. These stages have been designed to disentangle spatial and temporal significance within the feature domain. Let Q_s, K_s, V_s ∈ R^{T×d_k} represent the query, key, and value embeddings for the spatial attention subspace:
A_s = \mathrm{softmax}\!\left(\frac{Q_s K_s^{\top}}{\sqrt{d_k}} + M_s\right) V_s,
where M_s is a learned bias mask regulating sparsity across nodes. The temporal refinement stage analogously computes:
A_t = \mathrm{softmax}\!\left(\frac{(Q_t K_t^{\top}) W_{\tau} + B_{\tau}}{\sqrt{d_k}}\right) V_t,
where W_{\tau} introduces a learnable transformation capturing cross-time contextual drift. The joint fused representation is then expressed as
A_{fused} = \alpha A_s + (1 - \alpha) A_t + \lambda (A_s \odot A_t),
where α and λ are trainable coupling coefficients, and ⊙ denotes element-wise interaction. This composite fusion reinforces both spatially localized and temporally evolving intrusion cues. The coupling coefficients α and λ are implemented as trainable scalar parameters. They are initialized as α^{(0)} = 0.5 and λ^{(0)} = 0.1 and passed through a sigmoid nonlinearity, ensuring that 0 < α, λ < 1 during training. This simple parameterization allows the model to learn the relative importance of spatial attention, temporal attention, and their interaction in a stable and reproducible manner.
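The sketch below illustrates one way to realize this dual-stage fusion in Keras with sigmoid-constrained coupling coefficients; using MultiHeadAttention as the attention primitive, the head count and key dimension, and the omission of the sparsity mask M_s are simplifying assumptions rather than the exact formulation above.

import tensorflow as tf
from tensorflow.keras import layers

class DualAttentionFusion(layers.Layer):
    """Spatial attention on F~_ms, temporal attention on A_s, fused with trainable alpha and lambda."""
    def __init__(self, d_k=64, heads=4, **kwargs):
        super().__init__(**kwargs)
        self.spatial_attn = layers.MultiHeadAttention(num_heads=heads, key_dim=d_k)
        self.temporal_attn = layers.MultiHeadAttention(num_heads=heads, key_dim=d_k)
        # Raw parameters; the sigmoid keeps alpha and lambda inside (0, 1).
        self.alpha_raw = self.add_weight(name="alpha_raw", shape=(),
                                         initializer=tf.keras.initializers.Constant(0.0))
        self.lambda_raw = self.add_weight(name="lambda_raw", shape=(),
                                          initializer=tf.keras.initializers.Constant(-2.2))

    def call(self, f_ms):
        a_s = self.spatial_attn(query=f_ms, value=f_ms, key=f_ms)   # A_s
        a_t = self.temporal_attn(query=a_s, value=a_s, key=a_s)     # A_t computed on A_s
        alpha = tf.sigmoid(self.alpha_raw)                          # ~0.5 at initialization
        lam = tf.sigmoid(self.lambda_raw)                           # ~0.1 at initialization
        return alpha * a_s + (1.0 - alpha) * a_t + lam * (a_s * a_t)

a_fused = DualAttentionFusion()(f_ms)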

3.5.3. Graph Convolutional Regularization

To embed topological priors of the WSN, an adjacency matrix A ∈ R^{n×n} is constructed. This encodes communication reachability among sensor nodes. The spectral graph convolution for layer l is formulated as
H^{(l+1)} = \xi\Big(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\Big) + \gamma H^{(l)},
where \tilde{A} = A + I ensures self-loops, \tilde{D}_{ii} = \sum_j \tilde{A}_{ij} is the degree matrix, \xi(\cdot) is a nonlinear mapping (ReLU), and \gamma is a residual stability factor.
This operation integrates both direct communication correlations and latent dependencies that are inferred using the higher-order neighborhoods.
In Algorithm 1, the call GCN(A_s, A_t) indicates that the static WSN topology is refined using the attention maps. Concretely, we first build a binary communication graph A_stat from the physical connectivity of the nodes. The spatial attention matrix A_s is then symmetrized, row-normalized, and combined with A_stat to form the effective adjacency used in (15), i.e., A = \eta A_{\mathrm{stat}} + (1 - \eta) \tilde{A}_s, with \eta \in (0, 1). The temporal attention A_t is used to reweight node features before graph convolution, so Equation (15) only shows the final fused adjacency A for brevity, while Algorithm 1 highlights that both A_s and A_t contribute to the GCN input.
In this work, the graph used by the GCN does not simulate packet-level transmissions or physical-layer communication events. Instead, each node in the graph corresponds to a sensor node in the simulated WSN, and edges encode communication reachability derived from the LEACH-based topology generated in NS-2. The adjacency matrix therefore represents a structural connectivity map rather than dynamic packet exchanges. The input features to the GCN are aggregated traffic and protocol indicators (e.g., ADV, JOIN, SCH, DATA counts), while the GCN serves as a structural regularizer that propagates feature information according to the underlying WSN topology. This design captures how traffic patterns correlate across neighboring nodes, without modeling individual packet transmissions or PHY/MAC-layer behaviors.
The network structure is obtained in two steps. First, a static connectivity matrix is created from the LEACH topology included in the WSN-DS dataset. A link is assigned between two nodes whenever they belong to the same cluster or when one node operates as a cluster head for the other. This captures the communication pattern that typically appears in wireless sensor networks. Second, the attention module generates a data-driven similarity matrix that reflects how strongly different nodes behave in relation to one another during operation. The final adjacency matrix used by the GCN layer is produced by combining the static connectivity information with the similarity information from the attention mechanism. This blended representation preserves genuine wireless network structure while also allowing the model to account for dynamic relations learned from data.
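The following NumPy/TensorFlow sketch shows how the blended adjacency and the normalized propagation rule can be assembled; the blending weight η = 0.7, the fixed residual factor γ = 0.1, and the assumption that W is square (so the residual term is shape-compatible) are illustrative choices, not values reported by the study.

import numpy as np
import tensorflow as tf

def blended_adjacency(a_static, a_spatial_attn, eta=0.7):
    """Combine LEACH-derived connectivity with the symmetrized, row-normalized attention map."""
    a_s = 0.5 * (a_spatial_attn + a_spatial_attn.T)
    a_s = a_s / np.maximum(a_s.sum(axis=1, keepdims=True), 1e-9)
    return eta * a_static + (1.0 - eta) * a_s

def normalized_adjacency(a):
    """D^{-1/2} (A + I) D^{-1/2}, the propagation operator of the spectral graph convolution."""
    a_tilde = (a + np.eye(a.shape[0])).astype(np.float32)
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return (a_tilde * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

def gcn_layer(a_hat, h, w, gamma=0.1):
    """One layer: H' = ReLU(A_hat H W) + gamma * H (W assumed square for the residual term)."""
    return tf.nn.relu(tf.matmul(tf.matmul(a_hat, h), w)) + gamma * h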

3.5.4. Bidirectional LSTM with Contextual Attention

To capture temporal recurrence and bidirectional dependencies, the model employs forward and backward LSTMs defined as
\overrightarrow{h}_t = f_{\mathrm{LSTM}}\big(x_t, \overrightarrow{h}_{t-1}; \Theta_f\big),
\overleftarrow{h}_t = f_{\mathrm{LSTM}}\big(x_t, \overleftarrow{h}_{t+1}; \Theta_b\big),
yielding the contextual embedding. Here, Θ_f and Θ_b represent the sets of learnable parameters (weights and biases) for the forward and backward LSTM networks, respectively.
H_t = [\overrightarrow{h}_t \oplus \overleftarrow{h}_t].
An adaptive attention mechanism refines H_t into a context-weighted summary vector:
h_{att} = \sum_{t=1}^{T} \alpha_t H_t, \qquad \alpha_t = \frac{\exp(u_t^{\top} w_a)}{\sum_{i=1}^{T} \exp(u_i^{\top} w_a)}, \qquad u_t = \tanh(W_u H_t + b_u),
where w_a serves as the attention query vector, optimizing temporal salience through soft alignment.
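A compact Keras sketch of this stage is shown below; the LSTM width of 64 units and the projection size of the scoring network are assumptions, and the custom layer returns the attention-weighted sequence α_t·H_t so that the pooling stage of the next subsection can be applied on top of it.

import tensorflow as tf
from tensorflow.keras import layers

class ContextAttention(layers.Layer):
    """u_t = tanh(W_u H_t + b_u); alpha_t = softmax(u_t . w_a); returns the weighted sequence."""
    def __init__(self, units=64, **kwargs):
        super().__init__(**kwargs)
        self.proj = layers.Dense(units, activation="tanh")    # W_u, b_u
        self.score = layers.Dense(1, use_bias=False)          # attention query w_a

    def call(self, h):                                         # h: (batch, T, d)
        u = self.proj(h)                                       # (batch, T, units)
        alpha = tf.nn.softmax(self.score(u), axis=1)           # (batch, T, 1), sums to 1 over T
        return alpha * h                                       # h_att,t; summing over T gives h_att

h_seq = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(a_fused)  # H_t = [h_fwd ; h_bwd]
h_att = ContextAttention()(h_seq)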

3.5.5. Hierarchical Aggregation and Output Projection

The contextual embedding h a t t is aggregated through a hierarchical fusion of global average and maximum pooling:
z = \beta_1 \cdot \frac{1}{T} \sum_{t=1}^{T} h_{att,t} + \beta_2 \cdot \max_t \big(h_{att,t}\big),
where β 1 and β 2 are learned weighting scalars enforcing balanced statistical and extremal emphasis. Operationally, the first term in the equation computes a global average pooling of the attention-weighted temporal representations. The second term performs a dimension-wise global max pooling across time. The scalars β 1 and β 2 therefore control how much the model relies on average behavior versus peak responses. In practice, both beta parameters are treated as trainable scalars rather than manually tuned hyperparameters. They are initialized to equal values (0.5, 0.5) and passed through a softmax layer, which ensures that they remain positive and sum to one throughout training. This allows the model to automatically learn the optimal contribution of average pooling versus max pooling based on the data, removing the need for manual parameter tuning. In implementation, both pooling operations are applied channel-wise to h a t t , and the resulting vectors are linearly combined using the learned weights to produce the aggregated descriptor z.
The resultant descriptor z traverses two nonlinear dense transformations under L_2 regularization:
z' = \phi(W_1 z + b_1) + \rho\, \phi(W_2 z + b_2),
where \phi denotes the ReLU activation and \rho acts as a dense fusion coefficient. Finally, the class posterior distribution over intrusion categories is modeled as
\hat{y} = \mathrm{softmax}(W_o z' + b_o),
with optimization governed by the categorical cross-entropy loss:
\mathcal{L}_{CE} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log \hat{y}_{i,c}.
This hierarchical deep representation ensures that M ( Θ ) can capture localized transient anomalies. It further encodes persistent cross-node intrusion dynamics characteristic of WSN attack behavior.
Similarly, the aggregation weights β_1 and β_2 in (18) are trainable scalars. We initialize them as β_1^{(0)} = β_2^{(0)} = 0.5 and apply a softmax over (β_1, β_2) at each forward pass, which enforces β_1, β_2 > 0 and β_1 + β_2 = 1. This normalization constrains the aggregation to a convex combination of average and max pooling, which facilitates reproducibility of the reported results.
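The sketch below shows the softmax-normalized pooling weights and the classification head; the dense widths of 128 units and the L2 coefficient of 1e-4 are illustrative assumptions, and the fusion coefficient ρ is folded into the weights of the second dense branch for simplicity.

import tensorflow as tf
from tensorflow.keras import layers, regularizers

class LearnedPooling(layers.Layer):
    """Convex combination of global average and global max pooling with softmax-normalized betas."""
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Equal logits give beta_1 = beta_2 = 0.5 at initialization.
        self.beta_logits = self.add_weight(name="beta_logits", shape=(2,), initializer="zeros")

    def call(self, h):                                    # h: (batch, T, d)
        beta = tf.nn.softmax(self.beta_logits)            # beta_1 + beta_2 = 1, both positive
        return beta[0] * tf.reduce_mean(h, axis=1) + beta[1] * tf.reduce_max(h, axis=1)

z = LearnedPooling()(h_att)
z1 = layers.Dense(128, activation="relu", kernel_regularizer=regularizers.l2(1e-4))(z)
z2 = layers.Dense(128, activation="relu", kernel_regularizer=regularizers.l2(1e-4))(z)
z_prime = layers.Add()([z1, z2])                          # phi(W1 z + b1) + rho*phi(W2 z + b2), rho folded in
outputs = layers.Dense(5, activation="softmax")(z_prime)  # posterior over the five traffic classes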

3.6. Evaluation and Simulation

To ensure consistency throughout the methodology, the same data splitting procedure was used in all experiments. The WSN-DS dataset was first divided into an 80:20 split for training and testing, identical to the configuration described earlier. Within the 80% training portion, a further 90:10 split was applied to create the validation set. This results in a final partition of 72% for training, 8% for validation, and 20% for testing. These ratios are used consistently across all evaluations reported in this section.
The model was implemented in TensorFlow 2.15 and trained on a workstation equipped with an Intel Core i7 CPU, 32 GB RAM, and a single NVIDIA RTX-class GPU with CUDA acceleration. Hyperparameters and simulation settings are summarized in Table 4. Training was performed offline in this environment, while inference is intended to run on an edge gateway or sink node in a WSN deployment, rather than on individual sensor motes. With approximately 9.7 × 10^4 trainable parameters and a single BiLSTM layer of moderate width, the model remains lightweight compared with many deep IDS architectures. In practice, this translates into a small memory footprint and low inference latency on a modern embedded CPU or GPU-class edge device, which is compatible with typical WSN gateway or base-station hardware.
An early-stopping criterion with a patience of 5 epochs on the validation loss was used to prevent overfitting; in practice, training converged within 12 epochs.
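A condensed training sketch consistent with these settings is given below; the batch size of 128 is an illustrative assumption, the graph-convolutional branch is omitted from this simplified end-to-end chain, and the variables reuse the layers and preprocessed arrays defined in the earlier sketches.

import tensorflow as tf
from tensorflow.keras import callbacks

model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",   # integer-encoded labels
              metrics=["accuracy"])

early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True)

history = model.fit(X_tr_seq, y_tr_enc,
                    validation_data=(X_val_seq, y_val_enc),
                    epochs=30, batch_size=128,
                    class_weight=class_weight,
                    callbacks=[early_stop])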
Performance evaluation has been carried out using Accuracy (Acc), Precision (P), Recall (R), and F1-score (F_1):
Acc = \frac{TP + TN}{TP + TN + FP + FN}
P = \frac{TP}{TP + FP}
R = \frac{TP}{TP + FN}
F_1 = \frac{2 \times P \times R}{P + R}
where T P , T N , F P , and F N denote true positives, true negatives, false positives, and false negatives, respectively. Confusion matrices and learning curves were used to further assess classification stability and convergence.
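These metrics and the confusion matrix can be obtained directly from scikit-learn, as in the short sketch below; the class-name ordering assumes the alphabetical label encoding of Table 3.

import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

y_prob = model.predict(X_test_seq)
y_pred = np.argmax(y_prob, axis=1)

print("Accuracy:", accuracy_score(y_test_enc, y_pred))
print(classification_report(y_test_enc, y_pred,
      target_names=["Blackhole", "Flooding", "Grayhole", "Normal", "TDMA"]))
print(confusion_matrix(y_test_enc, y_pred))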

4. Results

The performance of the proposed IDS for WSNs has been analyzed at multiple tiers. The system has been implemented using Python 3.10.12 and TensorFlow 2.20 with the Keras 3.13 backend. The dataset has been divided into three subsets (training, validation, and testing). The model was allowed to train for up to 30 epochs using the Adam optimizer with a learning rate of η = 10^{-3}. However, early stopping based on the validation loss halted training after 12 epochs in all runs. Consequently, Figure 2 reports the 12 epochs that were actually executed.

4.1. Training and Validation Performance

The model convergence behavior has been depicted in Figure 2, which shows the evolution of both accuracy and loss across epochs. The training accuracy increases consistently over the epochs and stabilizes at approximately 98% after epoch 10, indicating fast convergence along with strong generalization. The validation accuracy follows the same trajectory, suggesting that overfitting has been successfully mitigated through the inclusion of dropout and batch normalization layers.
The model’s stability can be characterized by the loss differential:
\Delta L(t) = \lvert L_{train}(t) - L_{val}(t) \rvert,
which asymptotically approaches zero as t → T_{final}, confirming convergence without oscillation or divergence. This results from the adaptive gradient dynamics inherent in the Adam optimizer, represented as follows:
\Theta_{t+1} = \Theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t,
where \hat{m}_t and \hat{v}_t denote the bias-corrected first and second moment estimates, respectively.

4.2. Confusion Matrix Analysis

To further analyze the classification performance of the model across the multiple attack classes, outcomes have been presented in the form of a confusion matrix. The proposed CNN-Attention-BiLSTM hybrid model has been found to exhibit strong diagonal dominance as presented in Figure 3. The model has been found to attain accurate multi-class discrimination. The normal and DoS categories in particular have attained a near-perfect classification. This led to minimal confusion between normal network behavior and four distinct types of DoS attacks: Grayhole, Blackhole, TDMA, and Flooding.
The overall precision (P), recall (R), and F 1 -score were computed as
P = \frac{\sum_i TP_i}{\sum_i (TP_i + FP_i)} = 0.9842,
R = \frac{\sum_i TP_i}{\sum_i (TP_i + FN_i)} = 0.9791,
F_1 = \frac{2 P R}{P + R} = 0.9791,
which collectively demonstrate the superior discriminative capacity of the model. Although the confusion matrix shows near-perfect diagonal dominance, we verified that this behavior does not stem from overfitting. First, the training and validation learning curves show no divergence, and the validation loss remains stable across epochs, as shown in Figure 2. Second, the class weighting strategy used during training helps the model learn minority attack categories without memorizing the data. Third, a five-fold cross-validation experiment was conducted, and the performance remained consistent across all folds, indicating that the model is not relying on spurious correlations. Finally, the WSN-DS dataset contains well-separated traffic patterns for several attack types, which naturally leads to high separability once both spatial and temporal dependencies are modeled. These factors collectively confirm that the high accuracy in the confusion matrix results from good generalization rather than overfitting.

4.3. Model Interpretability and Structural Visualization

The network architecture has been depicted in Figure 4. It entails multi-scale convolutional feature extraction with multi-head attention fusion and BiLSTM encoding for temporal dependency modeling. The total number of trainable parameters is approximately 96,735, with only 256 non-trainable parameters. This ensured a lightweight deployment within constrained WSN environments.

4.4. Learning Dynamics and Validation Logs

Figure 5 presents the epoch-wise training logs. These depict a consistent improvement in accuracy and reduction in loss. Each epoch's validation loss (L_{val}) was evaluated, and the best-performing model was checkpointed according to
L_{val}^{best} = \min_t \, L_{val}(t).

4.5. Comparative Evaluation

The proposed model has been validated against benchmark models and classifiers, namely CNN, CNN + recurrent neural network (RNN), and Naïve Bayes. The summary of the comparison has been presented in Table 5. The proposed model attains an overall accuracy of 98.0%, which is up to 2.2% higher than the conventional approaches. In addition, the inclusion of the attention mechanism further improves interpretability, offering insights into neuron activation relevance during detection.
The CNN model recorded 97.0% accuracy, demonstrating the ability of convolutional layers to extract important spatial features. However, it presented lower precision and recall of 83.60% and 82.60%, respectively, indicating difficulty in identifying several attack types and a relatively high false-positive rate. The CNN + RNN model achieved better recall (96.48%) and F1-score (96.86%) owing to its ability to learn patterns over time, while its overall accuracy was similar to that of the CNN model. This indicates that simple temporal modeling alone is not enough to express the complex and nonlinear behaviors observed in WSNs.
On the other hand, the Naïve Bayes classifier demonstrated the lowest accuracy at 95.82% among the baseline models. This finding reflects its limited ability to cope with the non-linear and high-dimensional feature interactions that appear in intrusion data.
The proposed model outperforms the baseline models across all metrics, reaching 98.42% precision, 97.91% recall, and a 97.91% F1-score. This indicates that the model is more effective at detecting attacks while minimizing false positives, which matters in real-time WSN applications. The dual-attention mechanism helps the model focus on key features, boosting both robustness and stability.
Unlike the baseline models, the proposed model offers interpretability in the spirit of explainable AI (XAI). Attention layers reveal the spatial or temporal regions that most strongly influence classification decisions. Interpreting decisions helps network administrators understand the model's reasoning and identify vulnerable parts of the network, unlike black-box models that only provide prediction results. The proposed approach provides both interpretable reasoning and high-performance detection, which is beneficial for real-world WSN security monitoring.
The comparative evaluation in this study primarily focuses on classical DL baselines, which include CNN, CNN combined with RNN, and Naïve Bayes. While these models are widely used in WSN intrusion detection research, we acknowledge that several recent architectures incorporate graph-based reasoning and attention-driven or transformer-based feature extraction. Examples include graph attention networks, heterogeneous graph convolutional models, and transformer-based IDS frameworks introduced for IoT and cyber–physical systems.
These more recent IDS models were not included in the benchmarking for two reasons. First, these models typically require access to datasets that explicitly contain network-level relational structure or multi-hop communication traces, which are not directly provided in the WSN-DS dataset. Second, many transformer-based IDS models exhibit significantly higher computational complexity and memory consumption, making them unsuitable for deployment in constrained WSN environments, which is the focus of our work.
Nevertheless, we recognize the importance of benchmarking against these state-of-the-art graph and transformer architectures. In future work, we plan to evaluate the proposed framework alongside lightweight graph attention variants, simplified transformer encoders, and recent hybrid graph–temporal intrusion detection models. This will help establish a more comprehensive performance landscape and validate the advantages of the proposed design under broader conditions.

4.6. Discussion

Overall, the proposed framework has been found to outperform the baseline models in terms of accuracy and interpretability. In addition, the contextual attention mechanisms have allowed improved discrimination between the overlapping attack signatures. The mathematical baseline for this improvement is related to non-linear fusion of multi-scale convolutional and temporal embeddings:
z_{final} = \phi\big(W_a [F_{ms} \oplus h_{bi}] + b_a\big),
where ⊕ denotes concatenation and ϕ represents a non-linear mapping. The end-to-end architecture thus achieves robust generalization, scalability, and real-time inference capability. This makes it suitable for deployment in resource-constrained WSN environments.
The integration of multi-scale convolutional features and bidirectional temporal embeddings enables the framework to create a coherent latent space that simultaneously captures both local spatial patterns and temporal dynamics of WSN traffic. This combination of features from both domains enhances the model’s ability to detect subtle differences in behavior between normal and malicious traffic. The linear transformation, characterized by W a and the bias term b a , maps these features into a more differentiated subspace, while the non-linear activation function ϕ facilitates higher-order interactions among features, allowing the model to capture complex and non-linear attack behaviors common in WSN environments. The proposed approach not only improves the representative power of the learned embeddings but also enhances the differentiation between classes within the latent space, as demonstrated by the improved clustering of attack categories during the evaluation process.
Compared with conventional CNN or LSTM-based models, the proposed hybrid framework offers a significant advantage in handling the complex characteristics of WSNs. Traditional CNNs can extract local spatial features but often struggle with irregular topologies of WSN. LSTM models excel at capturing temporal patterns but overlook the spatial relationships among network nodes. Introducing the GCN addresses these limitations by learning the connections between nodes and how anomalies propagate across the network, which is crucial for identifying distributed or coordinated attacks. Furthermore, the BiLSTM component of the model improves temporal learning by analyzing data in both forward and backward directions. Additionally, the attention fusion mechanism emphasizes the most significant spatial, structural, and temporal features. As a result, the proposed framework provides reliable and precise intrusion detection, decreases false alarms, and adapts effectively to changing network conditions.
From a deployment perspective, the proposed IDS is designed to run at the WSN sink node or at an edge gateway, rather than on individual sensor nodes. In such a configuration, sensor nodes continue to operate with simple protocol stacks and low local processing, while the heavier spatio-temporal and graph-based reasoning is offloaded to a more capable device that aggregates their traffic. As a result, the main energy burden at the node level remains dominated by communication, which is unchanged by our approach. The relatively small parameter count and modest computational requirements of the model make it suitable for deployment on contemporary embedded processors at the gateway, where energy and computational budgets are significantly less constrained than on the leaf nodes.

4.7. Practical Implications, Achieved Goals, and Future Directions

The proposed framework achieves the main goals stated in the introduction, namely: (i) improving multiclass intrusion detection accuracy on WSN-DS, (ii) incorporating spatial, temporal, and topological reasoning through dual-attention and GCN components, and (iii) maintaining a lightweight model size suitable for gateway-level deployment. The results demonstrate clear gains in precision, recall, and overall stability compared with established CNN, CNN–RNN, and Naïve Bayes baselines.
In practical terms, the model provides two key benefits. First, the attention mechanisms highlight which nodes, time intervals, and traffic indicators contribute most to an alert, offering interpretable insights for network administrators. Second, the fused CNN–GCN–BiLSTM architecture allows a sink or edge node to detect both localized and distributed DoS behaviors without requiring computation on individual sensor motes.
Future work will focus on three directions: (1) measuring real-time inference latency and energy usage on embedded edge hardware to provide a full system-level performance analysis; (2) extending the model toward online or continual learning to handle evolving attack patterns; and (3) validating the approach on additional WSN and IoT intrusion datasets to test generalization across different scenarios. These steps will help transition the proposed approach from experimental evaluation to deployment-ready IDS solutions.

5. Conclusions

The research presents an advanced hybrid DL framework for intrusion detection in WSNs. The model integrates multi-scale convolutional blocks, attention fusion layers, graph convolutional reasoning, and BiLSTM components, effectively capturing both spatial and temporal dependencies in sensor network traffic. The evaluation of the model has been carried out on the WSN-DS dataset. The model attains superior detection capability and distinguishes among diverse attacks, including Grayhole, Blackhole, TDMA, and Flooding. The study not only attained high detection accuracy but also maintained a lightweight computational footprint suitable for real-time WSN environments. Compared to the existing baseline models, the proposed model attained enhanced generalization, reduced false alarms, and improved feature interpretability through its attention-driven design. Overall, the model bridges the gap between a high-performance IDS framework and practical WSN deployment, offering scalable, energy-efficient, and topology-aware detection solutions. In the future, the following can be explored: (i) adaptive federated implementations for decentralized WSN nodes, (ii) self-evolving detection modules leveraging online learning to handle emerging attack patterns, and (iii) explainable visual analytics to strengthen trust and interpretability in mission-critical applications.

Author Contributions

Conceptualization, L.H.B. and C.B.; methodology, L.H.B. and C.B.; software, A.A. and B.E.; validation, C.B. and B.E.; formal analysis, A.A., J.M.A. and H.A.; investigation, C.B. and J.M.A.; resources, L.H.B., A.A. and H.A.; data curation, L.H.B.; writing—original draft preparation, L.H.B. and C.B.; writing—review and editing, L.H.B. and C.B.; visualization, L.H.B. and C.B.; supervision, C.B.; project administration, L.H.B.; funding acquisition, L.H.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The code is available at https://github.com/laith85/CNN-GCN-Bilstm (accessed on 10 December 2025).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Puccinelli, D.; Haenggi, M. Wireless sensor networks: Applications and challenges of ubiquitous sensing. IEEE Circuits Syst. Mag. 2005, 5, 19–31.
  2. Borges, L.M.; Velez, F.J.; Lebres, A.S. Survey on the Characterization and Classification of Wireless Sensor Network Applications. IEEE Commun. Surv. Tutor. 2014, 16, 1860–1890.
  3. Bunterngchit, C.; Pornchaivivat, S.; Bunterngchit, Y. Productivity Improvement by Retrofit Concept in Auto Parts Factories. In Proceedings of the 2019 8th International Conference on Industrial Technology and Management (ICITM), Cambridge, UK, 2–4 March 2019; pp. 122–126.
  4. Othman, M.F.; Shazali, K. Wireless Sensor Network Applications: A Study in Environment Monitoring System. Procedia Eng. 2012, 41, 1204–1210.
  5. Bunterngchit, C.; Baniata, L.H.; Baniata, M.H.; ALDabbas, A.; Khair, M.A.; Chearanai, T.; Kang, S. GACL-Net: Hybrid Deep Learning Framework for Accurate Motor Imagery Classification in Stroke Rehabilitation. Comput. Mater. Contin. 2025, 83, 517–536.
  6. Chhaya, L.; Sharma, P.; Bhagwatikar, G.; Kumar, A. Wireless Sensor Network Based Smart Grid Communications: Cyber Attacks, Intrusion Detection System and Topology Control. Electronics 2017, 6, 5.
  7. Prodanović, R.; Rančić, D.; Vulić, I.; Zorić, N.; Bogićević, D.; Ostojić, G.; Sarang, S.; Stankovski, S. Wireless Sensor Network in Agriculture: Model of Cyber Security. Sensors 2020, 20, 6747.
  8. Dritsas, E.; Trigka, M. A Survey on Cybersecurity in IoT. Future Internet 2025, 17, 30.
  9. Majid, M.; Habib, S.; Javed, A.R.; Rizwan, M.; Srivastava, G.; Gadekallu, T.R.; Lin, J.C.W. Applications of wireless sensor networks and internet of things frameworks in the industry revolution 4.0: A systematic literature review. Sensors 2022, 22, 2087.
  10. Kenyeres, M.; Kenyeres, J.; Hassankhani Dolatabadi, S. Distributed consensus gossip-based data fusion for suppressing incorrect sensor readings in wireless sensor networks. J. Low Power Electron. Appl. 2025, 15, 6.
  11. Thapa, N.; Liu, Z.; KC, D.B.; Gokaraju, B.; Roy, K. Comparison of Machine Learning and Deep Learning Models for Network Intrusion Detection Systems. Future Internet 2020, 12, 167.
  12. Biermann, E.; Cloete, E.; Venter, L. A comparison of Intrusion Detection systems. Comput. Secur. 2001, 20, 676–683.
  13. Abdulganiyu, O.H.; Ait Tchakoucht, T.; Saheed, Y.K. A systematic literature review for network intrusion detection system (IDS). Int. J. Inf. Secur. 2023, 22, 1125–1162.
  14. Alharthi, A.; Alaryani, M.; Kaddoura, S. A comparative study of machine learning and deep learning models in binary and multiclass classification for intrusion detection systems. Array 2025, 26, 100406.
  15. Halbouni, A.; Gunawan, T.S.; Habaebi, M.H.; Halbouni, M.; Kartiwi, M.; Ahmad, R. CNN-LSTM: Hybrid Deep Neural Network for Network Intrusion Detection System. IEEE Access 2022, 10, 99837–99849.
  16. Vinayakumar, R.; Alazab, M.; Soman, K.P.; Poornachandran, P.; Al-Nemrat, A.; Venkatraman, S. Deep Learning Approach for Intelligent Intrusion Detection System. IEEE Access 2019, 7, 41525–41550.
  17. Hakami, H.; Faheem, M.; Bashir Ahmad, M. Machine Learning Techniques for Enhanced Intrusion Detection in IoT Security. IEEE Access 2025, 13, 31140–31158.
  18. Alzahrani, A. Novel Approach for Intrusion Detection Attacks on Small Drones Using ConvLSTM Model. IEEE Access 2024, 12, 149238–149253.
  19. Mohamed, N. Artificial intelligence and machine learning in cybersecurity: A deep dive into state-of-the-art techniques and future paradigms. Knowl. Inf. Syst. 2025, 67, 6969–7055.
  20. Khraisat, A.; Alazab, A. A critical review of intrusion detection systems in the internet of things: Techniques, deployment strategy, validation strategy, attacks, public datasets and challenges. Cybersecurity 2021, 4, 18.
  21. Houda, Z.A.E.; Naboulsi, D.; Kaddoum, G. A Privacy-Preserving Collaborative Jamming Attacks Detection Framework Using Federated Learning. IEEE Internet Things J. 2024, 11, 12153–12164.
  22. Jeyakumar, S.R.; Rahman, M.Z.U.; Sinha, D.K.; Kumar, P.R.; Vimal, V.; Singh, K.U.; Syamsundararao, T.; Kumar, J.N.V.R.S.; Balajee, J. An Innovative Secure and Privacy-Preserving Federated Learning-Based Hybrid Deep Learning Model for Intrusion Detection in Internet-Enabled Wireless Sensor Networks. IEEE Trans. Consum. Electron. 2025, 71, 273–280.
  23. Zhou, H.; Zou, H.; Zhou, P.; Shen, Y.; Li, D.; Li, W. CBCTL-IDS: A Transfer Learning-Based Intrusion Detection System Optimized With the Black Kite Algorithm for IoT-Enabled Smart Agriculture. IEEE Access 2025, 13, 46601–46615.
  24. Birahim, S.A.; Paul, A.; Rahman, F.; Islam, Y.; Roy, T.; Asif Hasan, M.; Haque, F.; Chowdhury, M.E.H. Intrusion Detection for Wireless Sensor Network Using Particle Swarm Optimization Based Explainable Ensemble Machine Learning Approach. IEEE Access 2025, 13, 13711–13730.
  25. Alruwaili, F.F.; Asiri, M.M.; Alrayes, F.S.; Aljameel, S.S.; Salama, A.S.; Hilal, A.M. Red Kite Optimization Algorithm with Average Ensemble Model for Intrusion Detection for Secure IoT. IEEE Access 2023, 11, 131749–131758.
  26. Atitallah, S.B.; Driss, M.; Boulila, W.; Koubaa, A. Securing Industrial IoT Environments: A Fuzzy Graph Attention Network for Robust Intrusion Detection. IEEE Open J. Comput. Soc. 2025, 6, 1065–1076.
  27. Jiang, L.; Gu, H.; Xie, L.; Yang, H.; Na, Z. ST-IAOA-XGBoost: An Efficient Data-Balanced Intrusion Detection Method for WSN. IEEE Sens. J. 2025, 25, 1768–1783.
  28. Saleh, H.M.; Marouane, H.; Fakhfakh, A. Stochastic Gradient Descent Intrusions Detection for Wireless Sensor Network Attack Detection System Using Machine Learning. IEEE Access 2024, 12, 3825–3836.
  29. Almomani, I.; Al-Kasasbeh, B.; AL-Akhras, M. WSN-DS: A Dataset for Intrusion Detection Systems in Wireless Sensor Networks. J. Sens. 2016, 2016, 4731953.
  30. Marriwala, N.; Rathee, P. An approach to increase the wireless sensor network lifetime. In Proceedings of the 2012 World Congress on Information and Communication Technologies, Trivandrum, India, 30 October–2 November 2012; pp. 495–499.
Figure 1. Model architecture of the proposed framework.
Figure 1. Model architecture of the proposed framework.
Futureinternet 18 00005 g001
Figure 2. The model convergence behavior: (a) training and validation accuracy curves; (b) training and validation loss convergence.
Figure 2. The model convergence behavior: (a) training and validation accuracy curves; (b) training and validation loss convergence.
Futureinternet 18 00005 g002
Figure 3. Confusion matrix of the proposed intrusion detection model. Numeric labels 0–4 correspond to Blackhole, Flooding, Grayhole, Normal, and TDMA, respectively; diagonal entries indicate correct classifications, while off-diagonal values denote misclassifications.
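As a reproduction aid, the following minimal sketch shows how a confusion matrix with this label ordering can be computed using scikit-learn. The y_true and y_pred arrays are placeholders standing in for the held-out test labels and the model's predictions; they are not data from the paper.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Class index ordering used in Figure 3 (codes 0-4).
class_names = ["Blackhole", "Flooding", "Grayhole", "Normal", "TDMA"]

# Placeholder arrays; in practice these come from the test split and the
# trained model's argmax predictions.
y_true = np.array([0, 1, 2, 3, 4, 3, 3, 2])
y_pred = np.array([0, 1, 2, 3, 4, 3, 2, 2])

cm = confusion_matrix(y_true, y_pred, labels=list(range(len(class_names))))
print(cm)  # rows: true classes, columns: predicted classes
```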
Figure 4. Architectural summary of the proposed model.
Figure 5. Epoch-wise model training logs showing validation checkpoints.
Table 1. Summary of reviewed intrusion detection models on WSN-DS and related datasets.
Article | Methodology/Model | Dataset | Key Limitation/Gap
[21] | Federated learning with secure aggregation for jamming attack detection | WSN-DS (jamming classes) | Limited to jamming attacks; lacks multi-attack scalability
[22] | Hybrid SCNN–BiLSTM optimized via African vulture optimization under a federated learning setup | WSN-DS, CIC-IDS2017 | Low communication efficiency in FL; lacks interpretability
[23] | Transfer learning with MobileNet/VGG19 ensemble optimized by the Black Kite Algorithm | ToN-IoT, Edge-IIoTset, WSN-DS | High computational load; poor real-time adaptability
[15] | CNN–LSTM hybrid model integrating spatial–temporal dependencies | WSN-DS (binary and multi-class) | Class imbalance and explainability not addressed
[16] | DL benchmarked against classical ML baselines | KDDCup’99, NSL-KDD, WSN-DS | No WSN-specific topology modeling; high false positives
[24] | PSO-based feature selection with RF, DT, and kNN ensemble plus LIME/SHAP explanations | WSN-DS (binary) | No temporal or spatial dependency modeling
[17] | SMOTE-based balancing and PCC feature selection for ML/DL comparison | WSN-DS, UNSW-NB15, CIC-IDS2017 | No topology-aware or energy-efficient design
[18] | ConvLSTM for spatial–temporal intrusion detection in IoD networks | WSN-DS, NSL-KDD, Drone dataset | Limited to the UAV context; weak transferability to WSNs
[25] | Red Kite Optimization with average ensemble fusion and LCWOA tuning | WSN-DS (binary) | No adaptive temporal modeling; lacks robustness to evolving threats
[26] | Fuzzy graph attention network for relational uncertainty learning | Edge-IIoTSet, CIC-Malmem, WSN-DS | Computationally expensive; unsuitable for constrained WSNs
[28] | SGD-based optimization for lightweight ML classifiers in WSN intrusion detection | WSN-DS (binary) | Simplistic linear models; limited scalability for dense WSNs
[27] | Improved arithmetic optimization algorithm integrated with XGBoost | WSN-DS (binary) | Static learning; lacks adaptive or online retraining
Table 2. Feature description of the WSN-DS dataset.
Feature Symbol | Description
id | A unique identifier assigned to each sensor node; distinguishes nodes across rounds and stages.
Time | Current simulation time of the node, representing its temporal position in the network.
Is_CH | Binary flag indicating whether a node is a cluster head (1) or a normal node (0).
who_CH | Identifier of the cluster head associated with the node in the current round.
Dist_To_CH | Distance between the node and its respective cluster head, calculated per round.
ADV_S | Number of advertise messages broadcast by cluster heads to surrounding nodes.
ADV_R | Number of advertise messages received by a node from nearby cluster heads.
JOIN_S | Number of join request messages sent by nodes to cluster heads for cluster formation.
JOIN_R | Number of join request messages received by cluster heads from their member nodes.
SCH_S | Number of TDMA schedule broadcast messages sent by cluster heads to nodes.
SCH_R | Number of TDMA schedule messages received from cluster heads by the nodes.
Rank | The order or rank of a node within the TDMA schedule during communication.
DATA_S | Number of data packets sent from a sensor node to its cluster head.
DATA_R | Number of data packets received by the cluster head from its sensor nodes.
Data_Sent_To_BS | Number of data packets transmitted from the cluster head to the base station.
dist_CH_To_BS | Distance between the cluster head and the base station, used for energy computation.
send_code | Cluster sending code identifying the transmitting node within its cluster.
Expanded_Energy | Amount of energy consumed by the node during the previous communication round.
Attack_type | Target variable representing the attack category with five classes: Blackhole, Grayhole, Flooding, TDMA, and Normal.
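For illustration, the sketch below loads these features with pandas. The file name WSN-DS.csv is hypothetical, and the column headers in the released CSV may differ slightly from the symbols in Table 2; note also that Table 4 reports a feature dimension of 16, so identifier-like columns (for example, id and Time) are presumably dropped before training, which is an assumption not confirmed here.

```python
import pandas as pd

# Hypothetical local copy of the WSN-DS dataset [29]; headers follow Table 2
# and may need adjusting to match the released CSV exactly.
df = pd.read_csv("WSN-DS.csv")

feature_cols = [
    "id", "Time", "Is_CH", "who_CH", "Dist_To_CH",
    "ADV_S", "ADV_R", "JOIN_S", "JOIN_R", "SCH_S", "SCH_R",
    "Rank", "DATA_S", "DATA_R", "Data_Sent_To_BS",
    "dist_CH_To_BS", "send_code", "Expanded_Energy",
]
X = df[feature_cols]       # input features
y = df["Attack_type"]      # five-class target variable
print(X.shape, y.value_counts(), sep="\n")
```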
Table 3. Label encoding used in the proposed method.
Class | Label
Blackhole | 0
Flooding | 1
Grayhole | 2
Normal | 3
TDMA | 4
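As an illustration of this mapping, the snippet below applies scikit-learn's LabelEncoder to the five class names; because the encoder orders labels alphabetically, it reproduces the codes in Table 3. This is a minimal sketch and not necessarily the exact preprocessing step used by the authors.

```python
from sklearn.preprocessing import LabelEncoder

classes = ["Blackhole", "Flooding", "Grayhole", "Normal", "TDMA"]
encoder = LabelEncoder()
codes = encoder.fit_transform(classes)

# Alphabetical ordering yields the same mapping as Table 3.
for cls, code in zip(encoder.classes_, codes):
    print(cls, "->", int(code))   # Blackhole -> 0, ..., TDMA -> 4
```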
Table 4. Simulation parameters used in the proposed intrusion detection framework.
Parameter | Value
Learning rate | 1 × 10⁻⁴
Batch size | 128
Epochs | 30
Optimizer | Adam
Regularization | L2 (λ = 0.001)
Dropout rate | 0.25–0.30
Feature dimension | 16
Hidden units (BiLSTM) | 64 per direction
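To make these settings concrete, the sketch below wires the Table 4 hyperparameters into a Keras training setup. Only the BiLSTM stage is shown; the convolutional, graph convolutional, and attention branches of the full architecture are omitted, and the single-timestep input shape is an assumption made for brevity.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

num_features, num_classes = 16, 5

# Simplified stand-in for the full model: one BiLSTM stage plus classifier.
inputs = layers.Input(shape=(1, num_features))      # one timestep per record (assumption)
x = layers.Bidirectional(layers.LSTM(64))(inputs)   # 64 hidden units per direction
x = layers.Dropout(0.30)(x)                         # dropout in the 0.25-0.30 range
outputs = layers.Dense(num_classes, activation="softmax",
                       kernel_regularizer=regularizers.l2(0.001))(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training call with the batch size and epoch count from Table 4
# (X_train, y_train, X_val, y_val are placeholders):
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           batch_size=128, epochs=30)
```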
Table 5. Summary of comparative results for intrusion detection performance.
Model | XAI | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%)
CNN | No | 97.00 | 83.60 | 82.60 | 82.00
CNN + RNN | No | 97.04 | 98.79 | 96.48 | 96.86
Naïve Bayes | No | 95.82 | 96.80 | 95.40 | 96.09
Proposed model | Yes | 98.00 | 98.42 | 97.91 | 97.91
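For completeness, the sketch below shows one common way to compute such aggregate metrics with scikit-learn. The weighted averaging scheme and the placeholder prediction arrays are assumptions; the table itself does not restate how the reported percentages were averaged.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder predictions; in practice use the held-out test split.
y_true = np.array([0, 1, 2, 3, 4, 3, 3, 2])
y_pred = np.array([0, 1, 2, 3, 4, 3, 2, 2])

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
print(f"Accuracy {acc:.2%}  Precision {prec:.2%}  Recall {rec:.2%}  F1 {f1:.2%}")
```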