Reinforcement Q-Learning-Based Adaptive Encryption Model for Cyberthreat Mitigation in Wireless Sensor Networks
Abstract
1. Introduction
1.1. Security Issues in WSN
1.2. Motivation
1.3. Research Contributions
- Development of a reinforcement learning-driven adaptive encryption framework that utilizes Q-learning-based policy optimization to select encryption levels based on network threat conditions. The model incorporates a Markov Decision Process (MDP) formulation to define state transitions and reward functions for optimizing energy efficiency and security robustness [16].
- Integration of dynamic Q-learning (Algorithm 1) for energy-efficient encryption scaling in low-threat conditions. The algorithm dynamically adjusts encryption complexity by evaluating the energy cost–security trade-off, ensuring resource conservation while maintaining sufficient security levels. It employs an $\epsilon$-greedy exploration–exploitation strategy with an adaptive decay mechanism to improve learning convergence in dynamic WSN environments.
- Implementation of Double Q-Learning (Algorithm 2) for robust security adaptation under high-threat scenarios. The use of dual Q-value function approximation reduces the overestimation bias inherent in traditional Q-learning. By maintaining two separate Q-value estimations and alternating updates, the framework enhances security decision-making, effectively mitigating advanced cyberattacks such as DDoS, black-hole, and data injection attacks.
- Design of a Hybrid Policy Derivation Algorithm (Algorithm 3) that optimally balances encryption levels by combining the strengths of dynamic Q-learning and double Q-learning. The hybrid policy integrates real-time threat assessment using a feedforward neural network-based anomaly detection model (Algorithm 4) to ensure adaptive encryption decision-making without excessive computational overhead.
- Integration of a deep learning-based anomaly detection system (Algorithm 4) for real-time threat classification using packet delivery ratio (PDR), latency, and anomaly scores. The model utilizes a feedforward neural network trained on historical network traffic to classify low-, moderate-, and high-threat states, guiding the encryption adaptation process.
- Introduction of a dynamic hyperparameter tuning mechanism for reinforcement learning updates, ensuring adaptive learning rate adjustment based on network conditions. This optimizes Q-learning convergence speed, enhancing the model’s adaptability to dynamic WSN environments.
- Extensive evaluation in a simulated wireless sensor network environment, demonstrating a 30.5% reduction in energy consumption, a 92.5% packet delivery ratio, and a 37% reduction in transmission latency.
- Enhanced security effectiveness with a 94% attack mitigation efficiency against DDoS, black-hole, and data injection attacks.
- Practical applicability and future scalability considerations, including integration with blockchain-based encryption key management for decentralized security enhancement, deployment in low-power IoT hardware such as ARM Cortex and Raspberry Pi for real-world energy profiling, and extension to multi-agent reinforcement learning for decentralized encryption decision-making.
Algorithm 1 Enhanced Q-Learning with Dynamic Parameter Adjustment for Adaptive Encryption.
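The $\epsilon$-greedy Q-learning scheme with decaying exploration that this algorithm builds on can be illustrated with a minimal tabular sketch. The state and action names, the reward shape, the random threat transitions, and every hyperparameter value below are assumptions chosen for illustration, not settings of the evaluated model.

```python
import random

# Hypothetical labels: three threat states and three encryption levels.
STATES = ["low_threat", "moderate_threat", "high_threat"]
ACTIONS = ["light", "standard", "strong"]


def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_update(q, s, a, r, s_next, alpha, gamma):
    """Tabular Q-learning step: move Q(s, a) toward r + gamma * max Q(s', .)."""
    td_target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (td_target - q[s][a])


def toy_reward(s, a):
    """Assumed reward: energy cost grows with the encryption level, and
    encryption weaker than the threat level is penalised."""
    energy_cost = 0.3 * a
    exposure = max(0, s - a)
    return 1.0 - energy_cost - exposure


def train(steps=5000, alpha=0.1, gamma=0.5,
          eps=1.0, eps_min=0.1, decay=0.995, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in STATES]
    s = rng.randrange(len(STATES))
    for _ in range(steps):
        a = epsilon_greedy(q[s], eps, rng)
        r = toy_reward(s, a)
        s_next = rng.randrange(len(STATES))  # assumed: threat level changes at random
        q_update(q, s, a, r, s_next, alpha, gamma)
        eps = max(eps_min, eps * decay)      # exploration decays toward a floor
        s = s_next
    return q, eps


q_table, final_eps = train()
policy = [max(range(len(ACTIONS)), key=lambda a: row[a]) for row in q_table]
print([ACTIONS[a] for a in policy], final_eps)
```

Under this toy reward, the learned greedy policy should match encryption strength to the threat level, since weaker encryption is penalised for exposure and stronger encryption for wasted energy.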
Algorithm 2 Double Q-Learning for Adaptive Encryption Scaling.
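The idea of keeping two Q-value estimates and alternating their updates can be sketched as follows. This is a generic Double Q-learning step under stated assumptions: the table sizes and numeric values are arbitrary, and the `update_a` flag is a hypothetical way to let the caller alternate between the two tables (the textbook formulation usually picks the table at random).

```python
def greedy(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda a: values[a])


def double_q_update(q_a, q_b, s, a, r, s_next, alpha, gamma, update_a):
    """One Double Q-learning step.

    The table being updated selects the best next action; the other
    table supplies the value of that action, which decouples action
    selection from action evaluation.
    """
    selector, evaluator = (q_a, q_b) if update_a else (q_b, q_a)
    best_next = greedy(selector[s_next])
    td_target = r + gamma * evaluator[s_next][best_next]
    selector[s][a] += alpha * (td_target - selector[s][a])


# Toy tables: 2 states x 2 actions (values are arbitrary).
q_a = [[0.0, 0.0], [1.0, 3.0]]
q_b = [[0.0, 0.0], [5.0, 2.0]]

# Update table A: A picks action 1 in state 1, and B values that action at 2.0.
double_q_update(q_a, q_b, s=0, a=0, r=1.0, s_next=1,
                alpha=0.5, gamma=0.9, update_a=True)
print(q_a[0][0])
```

In this toy case the target is 1.0 + 0.9 × 2.0 = 2.8, whereas a single-table update on A would use its own maximum and target 1.0 + 0.9 × 3.0 = 3.7. The gap between the two targets is the overestimation that the second table is meant to damp.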
Algorithm 3 Adaptive Cyberattack Mitigation with Hybrid Encryption.
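One way a hybrid policy could combine the two learners is to route the decision by the detected threat level. The sketch below is only an assumed arrangement: the routing rule (single table for low threat, averaged double tables for high threat, a blend for moderate threat), the function name `hybrid_action`, and the table values are all hypothetical.

```python
def greedy(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda a: values[a])


def hybrid_action(threat, s, q_dyn, q_a, q_b):
    """Pick an encryption level for state s given a threat label.

    q_dyn     : table from the single (dynamic) Q-learner
    q_a, q_b  : the two tables of the Double Q-learner
    """
    if threat == "low":
        return greedy(q_dyn[s])
    # Average the two double-Q estimates for a less biased value.
    double_vals = [(x + y) / 2 for x, y in zip(q_a[s], q_b[s])]
    if threat == "high":
        return greedy(double_vals)
    # Moderate threat (assumed rule): blend both learners equally.
    blended = [(d + v) / 2 for d, v in zip(q_dyn[s], double_vals)]
    return greedy(blended)


# One state, three encryption levels (0 = light, 1 = standard, 2 = strong).
q_dyn = [[1.0, 0.9, 0.1]]
q_a = [[0.0, 0.8, 1.0]]
q_b = [[0.0, 0.6, 1.0]]

for level in ("low", "moderate", "high"):
    print(level, hybrid_action(level, 0, q_dyn, q_a, q_b))
```

With these toy values the selected level rises from light to standard to strong as the threat label moves from low to high.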
Algorithm 4 Deep Learning-Based Anomaly Detection.
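The structure of a feedforward classifier that maps packet delivery ratio, latency, and an anomaly score to a threat class can be sketched in plain Python. The layer sizes, the feature scaling, and the randomly initialised (untrained) weights are assumptions, so the predicted label here is not meaningful; only the forward pass is illustrated.

```python
import math
import random

THREAT_CLASSES = ["low", "moderate", "high"]


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (placeholder for trained parameters)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def classify(features, params):
    """Forward pass: input -> hidden (ReLU) -> class probabilities."""
    (w1, b1), (w2, b2) = params
    hidden = relu(dense(features, w1, b1))
    probs = softmax(dense(hidden, w2, b2))
    return THREAT_CLASSES[probs.index(max(probs))], probs


rng = random.Random(0)
params = [init_layer(3, 8, rng), init_layer(8, 3, rng)]

# Assumed feature order: PDR as a fraction, normalised latency, anomaly score.
label, probs = classify([0.92, 0.12, 0.05], params)
print(label, [round(p, 3) for p in probs])
```

In a full pipeline the weights would come from training on labelled historical traffic, and the predicted class would feed the encryption-level decision.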
2. Literature Review
2.1. Reinforcement Learning Models for Attack Mitigation
2.2. Adaptive Encryption Models for Attack Mitigation
2.3. Issues with Existing Models
2.4. Research Gaps Identified
3. Reinforcement Q-Learning Framework for Adaptive Encryption
3.1. Methodology
3.2. System Model and Assumptions
- S is the state space that captures the energy levels of the nodes and the levels of threat of the network.
- A is the action space that corresponds to possible encryption levels.
- P is the state transition probability, denoted as $P(s' \mid s, a)$.
- R is the reward function that balances energy efficiency and security.
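To make the tuple concrete, the sketch below encodes a state as a residual-energy fraction plus a threat level, an action as an encryption level, and a reward that trades security against energy. The class names, the per-level energy costs, the weights, and the rule that energy is more expensive when the battery is low are all illustrative assumptions; the transition probability P is not modelled here.

```python
from dataclasses import dataclass
from enum import IntEnum


class Threat(IntEnum):
    LOW = 0
    MODERATE = 1
    HIGH = 2


class Encryption(IntEnum):
    LIGHT = 0
    STANDARD = 1
    STRONG = 2


@dataclass(frozen=True)
class State:
    energy: float   # residual energy as a fraction in [0, 1]
    threat: Threat  # current network threat level


# Assumed relative energy cost of each encryption level.
ENERGY_COST = {
    Encryption.LIGHT: 0.1,
    Encryption.STANDARD: 0.3,
    Encryption.STRONG: 0.6,
}


def reward(state, action, w_security=1.0, w_energy=0.5):
    """Security term minus weighted energy term.

    The node counts as protected when the encryption level is at least
    the threat level; energy is penalised more as the battery drains.
    """
    protected = 1.0 if action >= state.threat else 0.0
    scaled_cost = ENERGY_COST[action] * (2.0 - state.energy)
    return w_security * protected - w_energy * scaled_cost


s = State(energy=0.5, threat=Threat.HIGH)
for a in Encryption:
    print(a.name, round(reward(s, a), 3))
```

For a half-depleted node under high threat, this reward favours strong encryption despite its cost, because the weaker levels lose the security term entirely.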
3.3. Deep Learning-Based Anomaly Detection
3.4. Q-Learning Algorithm
- $r_{t+k}$ is the reward received $k$ steps after taking action a in state s at time t.
- $\gamma$ is the discount factor, which determines the importance of future rewards relative to immediate rewards.
- r is the immediate reward for taking action a in state s.
- $s'$ is the next state resulting from action a.
- $\max_{a'} Q(s', a')$ is the maximum Q-value achievable from state $s'$.
- $\alpha$ is the learning rate, which determines the extent to which newly acquired information overrides the old information.
- $r_t$ is the reward received after taking action $a_t$ in state $s_t$.
- $\gamma$ is the discount factor, as defined earlier.
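In standard textbook notation (an assumption; the symbols may differ from those used elsewhere in this article), the quantities listed above fit together as the action-value definition, the Bellman optimality relation, and the Q-learning update. A one-step numeric example with arbitrary values follows the update rule.

```latex
% Action-value function: expected discounted return from (s, a)
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k} \,\middle|\, s_t = s,\ a_t = a \right]

% Bellman optimality relation
Q^{*}(s, a) = \mathbb{E}\!\left[ r + \gamma \max_{a'} Q^{*}(s', a') \,\middle|\, s,\ a \right]

% Q-learning update with learning rate \alpha
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]

% Worked step (illustrative values):
% Q(s_t, a_t) = 0.5,\ \alpha = 0.1,\ r_t = 1,\ \gamma = 0.9,\ \max_a Q(s_{t+1}, a) = 2
Q(s_t, a_t) \leftarrow 0.5 + 0.1\,(1 + 0.9 \cdot 2 - 0.5) = 0.5 + 0.1 \cdot 2.3 = 0.73
```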
4. Adaptive Cyberattack Mitigation with Hybrid Encryption
5. Performance Evaluation
5.1. Metrics for Evaluation
5.2. Comparison Analysis and Discussion
- AES-128 Fixed Encryption: Standard fixed 128-bit encryption, widely used but non-adaptive.
- Lightweight Block Cipher (LBC): Optimized for low-energy environments, with minimal computational overhead.
- Elliptic Curve Cryptography (ECC): Public-key encryption with low energy requirements, suitable for constrained devices.
- Hybrid AES-RSA: Uses AES for encryption and RSA for key exchange, balancing energy use and security.
- Dynamic Threshold Encryption (DTE): An adaptive encryption approach that scales on the basis of estimated threat levels.
- Blockchain-based Lightweight Encryption (BLE): Integrates blockchain for enhanced security, suitable for distributed WSN applications.
6. Evaluation Against Cyberattack Mitigation
- Distributed Denial-of-Service (DDoS) Attack: Overwhelms the network by flooding it with excessive requests, depleting node resources, and disrupting communication. The goal is to degrade the availability and performance of the network.
- Data Injection Attack: Malicious nodes inject falsified or altered data, compromising data integrity and misleading decision-making processes, potentially causing incorrect routing and vulnerabilities.
- Black-Hole Attack: A malicious node intercepts and drops all packets, disrupting communication flow, reducing the packet delivery ratio, and compromising network reliability.
- Wormhole Attack: Malicious nodes create a tunnel (wormhole) to forward packets to distant parts of the network, bypassing normal routes. This disrupts routing protocols and reroutes traffic through malicious nodes.
- Selective Forwarding Attack: A malicious node selectively drops specific packets, targeting particular types of data. This makes the attack harder to detect compared to black-hole attacks.
6.1. Experimental Setup for Cyberattack Mitigation
6.2. Results and Discussion
6.3. Numerical Analysis of the Proposed System
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Kim, J.M.; Lee, H.S.; Yi, J.; Park, M. Power Adaptive Data Encryption for Energy-Efficient and Secure Communication in Solar-Powered Wireless Sensor Networks. J. Sens. 2016, 2016, 2678269.
- Jamshed, M.A.; Ali, K.; Abbasi, Q.H.; Imran, M.A.; Ur-Rehman, M. Challenges, applications, and future of wireless sensors in Internet of Things: A review. IEEE Sens. J. 2022, 22, 5482–5494.
- Manikandan, S.; Suganthi, S.; Gayathiri, R. Optimal Energy Efficiency Techniques and Security Enhancement in Wireless Sensor Network Using Machine Learning. In Proceedings of the 2022 International Conference on Power, Energy, Control and Transmission Systems (ICPECTS), Chennai, India, 8–9 December 2022; pp. 1–5.
- Mousavi, S.K.; Ghaffari, A.; Besharat, S.; Afshari, H. Security of internet of things based on cryptographic algorithms: A survey. Wirel. Netw. 2021, 27, 1515–1555.
- Ahmad, R.; Wazirali, R.; Abu-Ain, T. Machine Learning for Wireless Sensor Networks Security: An Overview of Challenges and Issues. Sensors 2022, 22, 4730.
- Dhanda, S.S.; Singh, B.; Jindal, P. Lightweight cryptography: A solution to secure IoT. Wirel. Pers. Commun. 2020, 112, 1947–1980.
- Phalaagae, P.; Zungeru, A.; Sigweni, B.; Rajalakshmi, S. Authentication schemes in wireless internet of things sensor networks: A survey and comparison. Indones. J. Electr. Eng. Comput. Sci. 2024, 33, 1876–1888.
- Altaweel, A.; Aslam, S.; Kamel, I. Security attacks in Opportunistic Mobile Networks: A systematic literature review. J. Netw. Comput. Appl. 2024, 221, 103782.
- Kheddar, H.; Dawoud, D.; Awad, A.; Himeur, Y.; Khan, M. Reinforcement-Learning-Based Intrusion Detection in Communication Networks: A Review. IEEE Commun. Surv. Tutor. 2024; early access.
- Jain, A.; Shukla, H.; Goel, D. A comprehensive survey on DDoS detection, mitigation, and defense strategies in software-defined networks. Clust. Comput. 2024, 27, 13129–13164.
- Chen, X.; Liu, Y. Q-Learning-Based Adaptive Encryption Level Adjustment for Resource-Constrained IoT Devices. Sensors 2019, 19, 2228.
- Gao, Y.; Xian, H.; Yu, A. Secure data deduplication for Internet-of-Things sensor networks based on threshold dynamic adjustment. Int. J. Distrib. Sens. Netw. 2020, 16, 1550147720911003.
- Anitha, R.; Tapas Bapu, B.R. Blockchain-based light-weight authentication approach for a multiple wireless sensor network. IETE J. Res. 2024, 70, 1480–1494.
- Rehman, A.; Abdullah, S.; Fatima, M.; Iqbal, M.W.; Almarhabi, K.A.; Ashraf, M.U.; Ali, S. Ensuring security and energy efficiency of wireless sensor network by using blockchain. Appl. Sci. 2022, 12, 10794.
- Fascista, A. Toward integrated large-scale environmental monitoring using WSN/UAV/Crowdsensing: A review of applications, signal processing, and future perspectives. Sensors 2022, 22, 1824.
- Ruan, K.; Zhang, J.; Di, X.; Bareinboim, E. Causal imitation for markov decision processes: A partial identification approach. Adv. Neural Inf. Process. Syst. 2025, 37, 87592–87620.
- Mohamed Anwar, A.; Pavalarajan, S. Spider Web-based Dynamic Key for Secured Transmission and Data-Aware Blockchain Encryption for the Internet of Things. IETE J. Res. 2024, 70, 499–514.
- Arivumani, S.; Nagarajan, M. Adaptive convolutional-LSTM neural network with NADAM optimization for intrusion detection in underwater IoT wireless sensor networks. Eng. Res. Express 2024, 6, 035243.
- Wang, B.; Yue, X.; Liu, Y.; Hao, K.; Li, Z.; Zhao, X. A Dynamic Trust Model for Underwater Sensor Networks Fusing Deep Reinforcement Learning and Random Forest Algorithm. Appl. Sci. 2024, 14, 3374.
- Rasool, H.; Najim, A.; Alsadh, M.; Hariz, H. Recognition of Threats in Hybrid Wireless Sensor Networks by Integrating Harris Hawks with Gradient Boosting Algorithm. Int. J. Intell. Eng. Syst. 2025, 18, 675–892.
- Saveetha, D.; Maragatham, G.; Ponnusamy, V.; Zdravkovic, N. An Integrated Federated Machine Learning and Blockchain Framework With Optimal Miner Selection for Reliable DDOS Attack Detection. IEEE Access 2024, 12, 127903–127915.
- Suhag, S.; Aarti. Challenges and Potential Approaches in Wireless Sensor Network Security. J. Electr. Eng. Technol. 2024, 19, 2693–2700.
- Devi, S.; Kumar, A. Establishment of secure and authentic data security framework in wireless sensor network using key reconciliation. Int. J. Inf. Technol. 2024, 16, 3325–3336.
- Yesodha, K.; Krishnamurthy, M.; Thangaramya, K.; Kannan, A. Elliptic curve encryption-based energy-efficient secured ACO routing protocol for wireless sensor networks. J. Supercomput. 2024, 80, 18866–18899.
- Jagwani, N.; Poornima, G. Machine Learning Algorithms to Detect Attacks in Wireless Sensor Networks. Int. J. Intell. Syst. Appl. Eng. 2024, 12, 417–431.
- Aziz, A.; Mirzaliev, S. Optimizing Intrusion Detection Mechanisms for IoT Network Security. J. Cybersecur. Inf. Manag. 2024, 13, 60–68.
- Kumar, A.; Kumar, S. An Advance Encryption and Attack Detection Framework for Securing Smart Cities Data in Blockchain Using Deep Learning Approach. Wirel. Pers. Commun. 2024, 135, 1329–1362.
- Singh, A.; Kaur, H.; Kaur, N. A novel DDoS detection and mitigation technique using hybrid machine learning model and redirect illegitimate traffic in SDN network. Clust. Comput. 2024, 27, 3537–3557.
- Ramalakshmi, R.; Kavitha, D. DDoS Attack Mitigation using Distributed SDN Multi Controllers for Fog Based IoT Systems. Int. J. Intell. Syst. Appl. Eng. 2024, 12, 57–69.
- Han, Y.; Wang, H.; Li, Y.; Zhang, L. Trust-aware and improved density peaks clustering algorithm for fast and secure models in wireless sensor networks. Pervasive Mob. Comput. 2024, 105, 101993.
- Saleh, H.; Marouane, H.; Fakhfakh, A. A Comprehensive Analysis of Security Challenges and Countermeasures in Wireless Sensor Networks Enhanced by Machine Learning and Deep Learning Technologies. Int. J. Saf. Secur. Eng. 2024, 14, 373–386.
- Singh, R.; Sharma, K.; Awasthi, L. A machine learning-based ensemble model for securing the IoT network. Clust. Comput. 2024, 27, 10883–10897.
- National Institute of Standards and Technology (NIST). Advanced Encryption Standard (AES). In FIPS Publication 197; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2001.
- National Institute of Standards and Technology (NIST); Ron, R.; Adi, S.; Leonard, A. Hybrid Cryptographic Standard: AES and RSA; AES Established as FIPS PUB 197 in 2001; RSA Cryptography Specifications Version 2.2, November 2016; National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2016.
- Chatzoglou, E.; Kambourakis, G.; Kolias, C. Empirical evaluation of attacks against IEEE 802.11 enterprise networks: The AWID3 dataset. IEEE Access 2021, 9, 34188–34205.
- IEEE Std 802.11ah-2016; IEEE Standard for Information Technology–Telecommunications and Information Exchange Between Systems–Local and Metropolitan Area Networks–Specific Requirements–Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications Amendment 2: Sub 1 GHz License Exempt Operation. IEEE Computer Society: Piscataway, NJ, USA, 2016; pp. 1–594.
- Garcia, S.; Parmisano, A.; Erquiaga, M.J. IoT-23: A labeled dataset with malicious and benign IoT network traffic. In A Dataset Designed for Intrusion Detection in IoT Networks, Containing Labeled Malicious and Benign Traffic Scenarios; Zenodo: Geneva, Switzerland, 2020.
- Dener, M.; Okur, C.; Al, S.; Orman, A. Wsn-bfsf: A new dataset for attacks detection in wireless sensor networks. IEEE Internet Things J. 2023, 11, 2109–2125.
Ref. | Methodology | Advantages | Limitations | Results |
---|---|---|---|---|
Mohamed Anwar et al. [17] (2024) | Blockchain-based encryption with dynamic key generation | Improved security and throughput | High computational cost | Latency reduced by 15%, accuracy 91% |
Arivumani et al. [18] (2024) | Convolutional LSTM for intrusion detection | High detection accuracy | Requires large training data | 98% precision, few false positives |
Wang et al. [19] (2024) | Deep reinforcement learning for trust model in underwater WSNs | Mitigates malicious node attacks | Requires online learning updates | 95% attack detection accuracy, energy-efficient routing |
Rasool et al. [20] (2025) | Hybrid optimization for WSN security | Efficient threat detection | High computational complexity | 96% threat detection rate, moderate energy efficiency |
Saveetha et al. [21] (2024) | Federated learning with blockchain for DDoS mitigation | Decentralized security | High resource consumption | 92% accuracy, reduced latency |
Rehman et al. [14] (2022) | Blockchain-enhanced reinforcement learning | Secure distributed encryption | Increased computation overhead | 93% anomaly detection accuracy, low latency overhead |
Fascista et al. [15] (2022) | Adaptive encryption with real-time feedback mechanism | Optimized energy usage | Limited scalability in large networks | 90% improvement in network resilience, energy savings |
Han et al. [30] (2024) | Trust-aware clustering algorithm for IoT networks | Enhances data security | Higher energy consumption | 88% security effectiveness, 20% reduced overhead |
Saleh et al. [31] (2024) | Blockchain and machine learning integration for security | Improved anomaly detection | High storage and processing requirements | 97% anomaly detection accuracy, enhanced privacy protection |
Singh et al. [32] (2024) | Hybrid SVM and RF model for DDoS detection in SDN | High detection accuracy | Model complexity increases processing time | 94% detection rate, improved network resilience |
Ramalakshmi et al. [29] (2024) | Distributed multi-controller approach for mitigating DDoS in fog computing | Scalability in large networks | Requires extensive resource allocation | 90% mitigation efficiency, reduced latency |
Altaweel et al. [8] (2024) | Security strategies for opportunistic mobile networks | Detects various attacks like black-hole and wormhole | High false positive rates in anomaly detection | 91% threat classification accuracy, moderate latency |
Symbol | Definition | Symbol | Definition |
---|---|---|---|
$Q(s, a)$ | Q-value for state s and action a | $\alpha$ | Learning rate in Q-learning |
$\gamma$ | Discount factor in reinforcement learning | $R(s, a)$ | Reward function for state s and action a |
$\pi(s)$ | Policy function mapping state s to action probabilities | $V(s)$ | Value function of state s |
$\epsilon$ | Exploration–exploitation parameter in the $\epsilon$-greedy strategy | $\delta$ | Temporal Difference (TD) error |
$\theta$ | Model parameters in deep reinforcement learning (DRL) | $\nabla$ | Gradient operator used in optimization |
$\mathbb{E}$ | Expectation operator | $\tau$ | Target update parameter in Deep Q-Networks (DQNs) |
$\lambda$ | Eligibility trace decay factor in TD learning | $\sigma$ | Standard deviation in Gaussian exploration |
$L(\theta)$ | Loss function for policy optimization | $P(s' \mid s, a)$ | State transition probability from s to $s'$ given action a |
$A(s, a)$ | Advantage function in Actor–Critic methods | $\beta$ | Entropy regularization parameter |
Parameter | Description |
---|---|
Number of Sensor Nodes | 50 sensor nodes with distinct energy levels and sensing capabilities |
Threat Levels | Low, medium, and high threat levels simulated as random attack intervals |
Energy Levels | Uniformly distributed between and |
Simulation Duration | Fixed period of 300 s during which nodes adaptively transmitted data |
Model | Energy Consumption (J) | PDR (%) | Latency (ms) | Security Effectiveness (Threats Detected) |
---|---|---|---|---|
AES-128 Fixed | 72.5 | 80.1 | 190 | 6 |
Lightweight Block Cipher | 55.0 | 84.3 | 160 | 8 |
Elliptic Curve Cryptography | 65.2 | 86.7 | 175 | 7 |
Hybrid AES-RSA | 70.4 | 88.5 | 150 | 10 |
Dynamic Threshold Encryption | 52.8 | 89.9 | 140 | 12 |
Blockchain-Based Lightweight Encryption | 60.7 | 88.9 | 135 | 11 |
Proposed Model | 45.2 | 92.5 | 120 | 15 |
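The relative gains implied by this table depend on which baseline is chosen, and they can be recomputed directly from the reported values. The short script below does that arithmetic; the dictionary layout and function name are illustrative, while the numbers are copied from the table.

```python
# (energy in J, PDR in %, latency in ms) as reported in the comparison table.
RESULTS = {
    "AES-128 Fixed": (72.5, 80.1, 190),
    "Lightweight Block Cipher": (55.0, 84.3, 160),
    "Elliptic Curve Cryptography": (65.2, 86.7, 175),
    "Hybrid AES-RSA": (70.4, 88.5, 150),
    "Dynamic Threshold Encryption": (52.8, 89.9, 140),
    "Blockchain-Based Lightweight Encryption": (60.7, 88.9, 135),
    "Proposed Model": (45.2, 92.5, 120),
}


def pct_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


proposed_energy, proposed_pdr, proposed_latency = RESULTS["Proposed Model"]

reductions = {}
for name, (energy, pdr, latency) in RESULTS.items():
    if name == "Proposed Model":
        continue
    reductions[name] = {
        "energy": pct_reduction(energy, proposed_energy),
        "latency": pct_reduction(latency, proposed_latency),
        "pdr_gain_points": proposed_pdr - pdr,  # absolute gain in percentage points
    }

for name, r in reductions.items():
    print(f"{name}: energy -{r['energy']:.1f}%, "
          f"latency -{r['latency']:.1f}%, PDR +{r['pdr_gain_points']:.1f} pts")
```

Against AES-128 Fixed, for example, the proposed model uses about 37.7% less energy and has about 36.8% lower latency. Against Dynamic Threshold Encryption, the closest adaptive baseline, both reductions are roughly 14%.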
Metric | Dataset | DDoS | Data Injection | Black-Hole | Wormhole | Selective Forwarding |
---|---|---|---|---|---|---|
Accuracy (%) | AWID Dataset | 98 | 97 | 97 | 96 | 96 |
Accuracy (%) | IoT-23 Dataset | 98 | 96 | 97 | 94 | 96 |
Accuracy (%) | WSN-BFSF Dataset | 97 | 95 | 95 | 93 | 95 |
PDR (%) | AWID Dataset | 95 | 94 | 94 | 93 | 93 |
PDR (%) | IoT-23 Dataset | 95 | 92 | 93 | 91 | 92 |
PDR (%) | WSN-BFSF Dataset | 93 | 91 | 92 | 91 | 92 |
Energy (J) | AWID Dataset | 73 | 70 | 71 | 69 | 70 |
Energy (J) | IoT-23 Dataset | 73 | 70 | 71 | 68 | 69 |
Energy (J) | WSN-BFSF Dataset | 74 | 70 | 71 | 69 | 70 |
Latency (ms) | AWID Dataset | 120 | 115 | 118 | 122 | 116 |
Latency (ms) | IoT-23 Dataset | 122 | 116 | 119 | 124 | 117 |
Latency (ms) | WSN-BFSF Dataset | 125 | 118 | 121 | 126 | 119 |
Mitigation (%) | AWID Dataset | 91 | 89 | 90 | 87 | 88 |
Mitigation (%) | IoT-23 Dataset | 90 | 88 | 89 | 86 | 87 |
Mitigation (%) | WSN-BFSF Dataset | 89 | 87 | 88 | 85 | 86 |
Attack Type | Mitigation Success Rate |
---|---|
DDoS | 95% |
Data Injection | 92% |
Black-Hole | 94% |
Wormhole | 90% |
Selective Forwarding | 91% |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Premakumari, S.B.N.; Sundaram, G.; Rivera, M.; Wheeler, P.; Guzmán, R.E.P. Reinforcement Q-Learning-Based Adaptive Encryption Model for Cyberthreat Mitigation in Wireless Sensor Networks. Sensors 2025, 25, 2056. https://doi.org/10.3390/s25072056