Optimal Security Task Offloading in Cognitive IoT Networks: Provably Optimal Threshold Policies and Model-Free Learning
Abstract
1. Introduction
1.1. Cognitive IoT Networks and Mobile Edge Computing
1.2. Research Challenge and Problem Formulation
- Multiple heterogeneous MEC servers with different processing capabilities serve the IoT device;
- Critical threats arrive randomly and preempt security tasks on MEC servers;
- Security tasks arrive continuously and must be either processed locally or queued for MEC offloading;
- Queue congestion creates security risks as pending tasks leave vulnerabilities unaddressed;
- System dynamics are stochastic, with random arrivals and processing times.
1.3. Our Main Contributions
1.4. Practical Impact
- Deploy analytically grounded policies with simple implementation requiring only queue length monitoring;
- Benchmark AI-based controllers against theoretical performance bounds;
- Design hybrid architectures combining structural insights with neural adaptation;
- Inform quality-of-service provisioning for latency-sensitive security applications under the modeled conditions;
- Reduce the computational overhead by using simple threshold rules instead of complex neural networks for moderate-scale systems.
1.5. Paper Organization
2. System Model
2.1. System Model Assumptions
| Symbol | Domain | Description |
|---|---|---|
| System Parameters | | |
| N | positive integer | Number of MEC servers |
| M | positive integer | Maximum queue capacity |
| λ | rate | Security task arrival rate (Poisson) |
| λ_i | rate | Critical threat arrival rate, MEC i |
| μ_i | rate | Critical threat processing rate, MEC i |
| ν_i | rate | Security task processing rate, MEC i |
| R | positive real | Reward for processing a security task |
| γ | (0, 1) | Discount factor |
| c(k) | convex, non-decreasing | Queuing cost function |
| State Variables | | |
| s | — | Complete system state |
| x | {0, 1, 2}^N | MEC server state vector |
| x_i | {0, 1, 2} | MEC server i state |
| k | {0, 1, …, M} | Queue length |
| e | event set | Event type |
| Functions | | |
| V(s) | — | Value function |
| V*(s) | — | Optimal value function |
| π | — | Policy |
| k* | {0, 1, …, M} | Optimal threshold |
2.2. State Space
- The MEC server state vector is x = (x_1, …, x_N) with x_i ∈ {0, 1, 2}, where
  - x_i = 0: MEC server i is idle;
  - x_i = 1: MEC server i is processing a critical threat;
  - x_i = 2: MEC server i is processing a security task.
- The queue length is k ∈ {0, 1, …, M}, denoting waiting security tasks.
- The event type is e, recording which transition triggered the current decision epoch.

As an example, consider the state in which:
- MEC server 1 is idle (x_1 = 0);
- MEC server 2 is processing a critical threat (x_2 = 1);
- three security tasks are waiting in the queue (k = 3);
- a new security task arrives (e = ST arrival).
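The state description above can be mirrored directly in code. A minimal sketch (class and field names are ours, not the paper's):

```python
from dataclasses import dataclass
from enum import IntEnum

class ServerState(IntEnum):
    IDLE = 0       # x_i = 0: MEC server i is idle
    CRITICAL = 1   # x_i = 1: processing a critical threat (CT)
    SECURITY = 2   # x_i = 2: processing a security task (ST)

@dataclass(frozen=True)
class SystemState:
    servers: tuple   # x = (x_1, ..., x_N)
    queue_len: int   # k, number of waiting security tasks
    event: str       # e, the event that triggered this epoch

# The example state from the text: server 1 idle, server 2 handling a
# critical threat, three STs waiting, and a new ST arrival.
s = SystemState(
    servers=(ServerState.IDLE, ServerState.CRITICAL),
    queue_len=3,
    event="ST_ARRIVAL",
)
```

Freezing the dataclass makes states hashable, which is convenient later when they serve as keys of a tabular Q-function.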
2.3. Action Space and Decision Epochs
- Security task arrivals (rate λ) (an offloading decision is required only when all MEC servers are occupied; otherwise, the arriving ST is automatically assigned to the highest-priority idle server);
- Critical threat arrivals at MEC server i (rate λ_i);
- Critical threat completions at MEC server i (rate μ_i);
- Security task completions at MEC server i (rate ν_i).
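Because all of these clocks are exponential, the next decision epoch is the minimum of independent exponential delays. A sketch of that sampling step (event names and rate values are illustrative assumptions, not the paper's calibrated parameters):

```python
import random

def next_event(rates, rng=random):
    """Sample the next transition of the CTMC: each candidate event fires
    after an independent Exp(rate) delay, and the minimum wins the race.
    Returns (event_name, elapsed_time)."""
    samples = {name: rng.expovariate(r) for name, r in rates.items() if r > 0}
    event = min(samples, key=samples.get)
    return event, samples[event]

# Illustrative rates (our assumptions for this sketch):
rates = {
    "ST_ARRIVAL": 2.0,     # security task arrivals, rate lambda
    "CT_ARRIVAL_1": 1.0,   # critical threat hits MEC server 1
    "CT_DONE_2": 2.0,      # MEC server 2 finishes a critical threat
}
event, dt = next_event(rates, rng=random.Random(0))
```

Equivalently, the total sojourn time is Exp(Σ rates) and the winning event is chosen with probability proportional to its rate; the race formulation above is the more direct simulation.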
2.4. Transition Dynamics
2.5. Reward Structure
- c(0) = 0 (no cost when the queue is empty);
- c(k) is non-decreasing: c(k + 1) ≥ c(k);
- c(k) is convex: c(k + 1) − c(k) ≥ c(k) − c(k − 1) for all k ≥ 1.
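These three properties are easy to verify numerically for a candidate cost. A sketch with an assumed quadratic cost c(k) = a·k² (the coefficient and exponent are illustrative, not the paper's calibrated values):

```python
def c(k, a=0.5, b=2):
    """Illustrative convex holding cost c(k) = a * k**b; the coefficient
    and exponent are assumptions, not the paper's calibrated values."""
    return a * k ** b

def is_convex_nondecreasing(cost, M):
    """Check c(0) = 0, monotonicity, and convexity on {0, 1, ..., M}
    via first differences."""
    diffs = [cost(k + 1) - cost(k) for k in range(M)]
    nondecreasing = all(d >= 0 for d in diffs)
    convex = all(d2 >= d1 for d1, d2 in zip(diffs, diffs[1:]))
    return cost(0) == 0 and nondecreasing and convex

assert is_convex_nondecreasing(c, M=100)
```

Any cost passing this check satisfies the assumptions of the concavity-preservation lemma below.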
2.6. Optimization Objective
2.7. Threshold Policy
3. Optimal Policy
3.1. Threshold Policy Structure
3.2. Value Function at Decision States
- OFFLOAD: The arriving task joins the queue, so there are k + 1 waiting tasks plus one in-service task, i.e., k + 2 tasks remaining in the system at the next epoch. Hence the post-decision queue length is k + 1.
- LOCAL: The arriving task is processed locally and does not enter the queue, so there are k waiting tasks plus one in-service task, i.e., k + 1 in total. Hence the post-decision queue length is k.
3.3. Concavity Preservation Lemma
3.4. Proof of Threshold Structure
- If LOCAL is preferred at some queue length k, then by the monotonicity of the action-value difference in (22), LOCAL is preferred at every k′ ≥ k.
- Therefore, if LOCAL is optimal at k, it remains optimal for all k′ ≥ k.
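The two steps above can be stated compactly. Writing Δ(k) for the advantage of LOCAL over OFFLOAD at queue length k (Δ is our shorthand for this sketch, not notation from the paper):

```latex
\Delta(k) \;=\; V_{\mathrm{LOCAL}}(k) - V_{\mathrm{OFFLOAD}}(k),
\qquad
\Delta(k) \ge 0 \;\Longrightarrow\; \Delta(k') \ge 0 \ \ \forall\, k' \ge k,
\qquad
k^{*} \;=\; \min\{\, k \;:\; \Delta(k) \ge 0 \,\}.
```

Concavity of the value function makes Δ(k) non-decreasing in k (the monotonicity invoked in (22)); this single-crossing property is exactly what yields the threshold k*.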
3.5. Value Iteration and Concavity Preservation
- (i) The value function is concave and non-increasing in k;
- (ii) The optimal policy is a threshold policy.
3.6. Extension to General MEC Server Configurations
- (a) x = (1, 2): MEC server 1 has a critical threat and MEC server 2 processes a security task.
- (b) x = (2, 1): MEC server 1 processes a security task and MEC server 2 has a critical threat.
- (c) x = (1, 1): Both MEC servers process critical threats.
- The base case holds; V_0 is concave due to the convexity of c(k).
- The inductive step holds; Lemma 1 ensures that concavity is preserved.
- Convergence follows from standard CTMDP theory.
3.7. Computing the Optimal Threshold
- State Space Size: O(3^N (M + 1)) (server status vectors × queue lengths).
- Per-Iteration Cost: proportional to the state space size times the number of event types (due to transition probability calculations).
- Convergence Rate: Geometric with rate γ (the discount factor).
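The threshold computation can be sketched on a deliberately reduced model: a single aggregated server, a uniformized discrete-time chain, and an assumed smaller local reward R_local < R that creates the offload/local trade-off. All parameter values and the reward asymmetry are our illustrative assumptions, not the paper's full N-server CTMDP:

```python
import numpy as np

def compute_threshold(lam=2.0, mu=2.5, R=10.0, R_local=4.0,
                      gamma=0.95, M=50, a=0.5, tol=1e-9):
    """Value iteration on a reduced single-queue model (uniformized to
    discrete time). At an arrival, OFFLOAD earns R but grows the queue;
    LOCAL earns the smaller R_local and leaves the queue unchanged."""
    Lam = lam + mu                        # uniformization constant
    p_arr, p_dep = lam / Lam, mu / Lam    # embedded-chain event probabilities
    cost = a * np.arange(M + 1) ** 2      # convex holding cost c(k) = a*k^2
    V = np.zeros(M + 1)
    while True:
        V_up = np.concatenate([V[1:], V[-1:]])   # V(k+1), clamped at k = M
        V_dn = np.concatenate([V[:1], V[:-1]])   # V(k-1), clamped at k = 0
        q_off = R + V_up            # OFFLOAD: full reward R, queue grows
        q_loc = R_local + V         # LOCAL: smaller reward, queue unchanged
        q_off[-1] = -np.inf         # queue full: offloading is infeasible
        V_new = -cost + gamma * (p_arr * np.maximum(q_off, q_loc)
                                 + p_dep * V_dn)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    policy = (q_loc >= q_off).astype(int)    # 1 = LOCAL, 0 = OFFLOAD
    k_star = int(np.argmax(policy))          # smallest k where LOCAL wins
    return k_star, policy, V

k_star, policy, V = compute_threshold()
```

Because the maximum of the two action values preserves concavity here (a Lippman-style argument of the same flavor as Lemma 1), the computed policy is a prefix of OFFLOAD decisions followed by LOCAL, i.e., a threshold at k*.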
3.8. Summary and Discussion
1. Structural result (Theorem 1): The concavity of the value function implies threshold policy optimality.
2. Concavity proof (Theorem 2): Value iteration preserves concavity when c(k) is convex.
3. General applicability (Theorem 3): Threshold policies are optimal for all the MEC server configurations.
- Convex queuing cost: is convex and non-decreasing.
- Exponential processing times: Critical threat and security task processing durations are exponentially distributed.
- Poisson arrivals: Both critical threat and security task arrivals follow Poisson processes.
- Preemptive priority: Critical threats can preempt security tasks without penalties.
1. Simplicity: Optimal policies have a simple threshold structure, which is easy to implement on IoT devices.
2. Robustness: Threshold policies are robust to parameter estimation errors.
3. Computation: Value iteration converges efficiently for practical system sizes.
4. Learning: Model-free methods (Section 4) can discover the optimal threshold k* through experience.
- Multiple classes of security tasks with different priorities.
- Finite capacity constraints on individual MEC servers.
- Time-varying threat arrival rates (within slowly varying regimes).
- Non-convex cost functions (e.g., step costs).
- Non-exponential processing time distributions.
- Imperfect threat detection with false positives/negatives.
- Network latency and communication overhead variations.
- Bursty and correlated attack arrival patterns (e.g., coordinated DDoS).
- The joint optimization of offloading decisions with energy consumption and communication resource allocation.
- Per-server or hierarchical queue architectures replacing the single shared FIFO queue, which may better reflect deployments where MEC servers are geographically distributed.
4. The Q-Learning Based Optimization Algorithm
| Algorithm 1 Q-Learning Algorithm |
| Input: state space, action space, learning rate α, exploration rate ε, discount factor γ. |
| Output: action-value function Q and the induced greedy policy. |
| Initialize: Q(s, a) ← 0 for all state–action pairs; the initial state s; α and ε. |
| For episode = 1 to n do |
| Repeat: at epoch k, choose an action a (ε-greedy), then observe |
| the reward r and the next state s′. |
| Update: Q(s, a) according to Equation (34); set s ← s′. |
| Until the episode ends (all epochs processed; evaluated on the test data). |
| If the greedy policy has stabilized then |
| Update: record the learned threshold from the greedy policy. |
| End |
| Until convergence. |
| End. |
- The action-value function satisfies the Bellman optimality equation Q*(s, a) = r(s, a) + γ Σ_{s′} p(s′ | s, a) max_{a′} Q*(s′, a′), which Q-learning solves by stochastic approximation without knowledge of the transition probabilities.
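The update in Equation (34) is presumably the standard Watkins Q-learning rule [24]; a minimal tabular sketch (the state encoding and function names are ours):

```python
import random
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions_next, alpha=0.1, gamma=0.95):
    """Watkins' Q-learning update, presumably the rule behind Equation (34):
        Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]"""
    best_next = max(Q[(s_next, a2)] for a2 in actions_next)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def epsilon_greedy(Q, s, actions, eps, rng=random):
    """Exploration step of Algorithm 1: random action with probability eps,
    otherwise the greedy action under the current Q-table."""
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

# One update from a zero-initialized table (state encoding is ours):
Q = defaultdict(float)
q_update(Q, s=("idle", 3), a="OFFLOAD", r=10.0,
         s_next=("idle", 4), actions_next=["OFFLOAD", "LOCAL"])
```

With a zero-initialized table and reward 10.0, the first update moves Q(s, OFFLOAD) to α·r = 1.0; repeated updates converge to Q* under the usual step-size and exploration conditions.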
5. Simulation Analysis and Results
5.1. Simulation and Methodology
5.1.1. Simulation Environment
5.1.2. Baseline System Configuration
- Arrival rates: ST arrival rate λ = 2.0; CT arrival rates λ_i = 1.0 per server.
- Service rates: CT service rates μ_i = 2.0 per server; ST service rates ν_1 = 3.0, ν_2 = 2.5 (priority ordering ν_1 > ν_2).
- Economic parameters: immediate reward R = 10.0; discount factor γ = 0.95.
- Cost function: convex holding cost c(k) with c(0) = 0, and maximum queue capacity M = 100.
- Learning parameters: learning rate α; the exploration rate decays from 0.3 to 0.01 with factor 0.995 per episode.
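The baseline configuration can be collected in one place. Values are taken from this section and the parameter table in the Appendix; the dictionary key names are ours:

```python
# Baseline configuration mirroring Section 5.1.2 and the Appendix table.
BASELINE = {
    "N": 2,                          # MEC servers
    "M": 100,                        # max queue capacity
    "st_arrival_rate": 2.0,          # security task arrivals (Poisson)
    "ct_arrival_rate": 1.0,          # critical threats, per server
    "ct_service_rate": 2.0,          # per server
    "st_service_rates": (3.0, 2.5),  # priority ordering: server 1 faster
    "reward_R": 10.0,
    "gamma": 0.95,
    "eps_start": 0.3,
    "eps_end": 0.01,
    "eps_decay": 0.995,              # per episode
    "episodes": 20_000,
    "runs": 10,
    "seeds": list(range(10)),
}
```

With the 0.995 per-episode decay, ε falls from 0.3 to the 0.01 floor after roughly 680 episodes (0.3 × 0.995^n ≤ 0.01 ⇒ n ≈ 679), so exploration is concentrated early in the 20,000-episode budget.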
5.1.3. Training Procedure
5.1.4. Reproducibility Summary
5.2. Baseline Policy Comparison
1. Q-learning (Baseline learner): Our algorithm (Algorithm 1) from Section 4, selected as the primary learner because it converged fastest in our experiments.
2. SARSA: On-policy temporal-difference learning.
3. DQN: Deep Q-Network with experience replay [25].
4. FIXED-k: Fixed threshold policies with k ∈ {5, 10, 15, 20}.
5. GREEDY: Always accept an arriving task whenever queue capacity is available.
6. RANDOM: Accept with probability 0.5 (a lower-bound baseline).
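The non-learning baselines are one-liners. A sketch (the exact accept/offload semantics are our reading of the baselines; action names are ours):

```python
import random

def fixed_k_policy(k_threshold):
    """FIXED-k baseline: offload (accept into the queue) while the queue
    is below the fixed threshold, otherwise process locally."""
    def policy(queue_len):
        return "OFFLOAD" if queue_len < k_threshold else "LOCAL"
    return policy

def random_policy(queue_len, rng=random):
    """RANDOM baseline: accept (offload) with probability 0.5."""
    return "OFFLOAD" if rng.random() < 0.5 else "LOCAL"

p5 = fixed_k_policy(5)
```

FIXED-k probes how sensitive performance is to the threshold location: if the learned k* ≈ 10, FIXED-k = 10 should track the learners closely while FIXED-k = 5 and 20 should lag.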
| Policy | Avg Reward | Blocking Prob. | Avg Queue | Server Util. | Conv. Episodes | Time (s) |
|---|---|---|---|---|---|---|
| Q-learning (Baseline) | | | | | | |
| SARSA | | | | | 3500 | |
| DQN | | | | | 4000 | |
| FIXED-k = 5 | | | | | N/A | |
| FIXED-k = 10 | | | | | N/A | |
| FIXED-k = 15 | | | | | N/A | |
| FIXED-k = 20 | | | | | N/A | |
| GREEDY | | | | | N/A | |
| RANDOM | | | | | N/A | |
5.3. Sensitivity Analysis
5.3.1. Load Parameter Sensitivity
5.3.2. Discount Factor Sensitivity
5.3.3. Cost Function Sensitivity
5.4. Robustness Under Realistic Imperfections
5.4.1. Imperfect Sensing
5.4.2. Parameter Estimation Errors
5.4.3. MEC Server Switching Costs
5.4.4. Practical Deployment Considerations
5.4.5. Comparison with Non-Poisson and Bursty Traffic Scenarios
5.5. Scalability Analysis
5.5.1. MEC Server Scalability
5.5.2. Function Approximation for Large-Scale Systems ()
5.5.3. Queue Capacity Sensitivity
5.6. Statistical Validation
5.6.1. Hypothesis Testing
5.6.2. Effect Size Analysis
5.6.3. Convergence Stability
5.6.4. Sensitivity to Initial Conditions
5.7. Discussion and Practical Implications
5.7.1. Threshold Policy Validation
5.7.2. Computational Feasibility
5.7.3. Adaptability
5.7.4. Scalability Limits
5.7.5. Design Guidelines
5.7.6. Scalable Learning for Large MEC Systems
6. Conclusions
6.1. Contributions
6.2. Limitations
6.3. Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Lueth, K.L. State of IoT 2024: Number of Connected IoT Devices Growing 13% to 18.8 Billion Globally. IoT Analytics Research. 28 October 2025. Available online: https://iot-analytics.com/number-connected-iot-devices/ (accessed on 15 January 2026).
- Gkonis, P.; Giannopoulos, A.; Trakadas, P.; Masip-Bruin, X.; D’Andria, F. A survey on IoT-edge-cloud continuum systems: Status, challenges, use cases, and open issues. Future Internet 2023, 15, 383.
- Ni, J.; Zhang, K.; Lin, X.; Shen, X.S. Securing fog computing for internet of things applications: Challenges and solutions. IEEE Commun. Surv. Tutor. 2017, 20, 601–628.
- Rajesh, R.; Hemalatha, S.; Nagarajan, S.M.; Devarajan, G.G.; Omar, M.; Bashir, A.K. Threat detection and mitigation for tactile internet driven consumer IoT-healthcare system. IEEE Trans. Consum. Electron. 2024, 70, 4249–4257.
- Xiao, Y.; Jia, Y.; Liu, C.; Cheng, X.; Yu, J.; Lv, W. Edge computing security: State of the art and challenges. Proc. IEEE 2019, 107, 1608–1631.
- Wu, Q.; Ding, G.; Xu, Y.; Feng, S.; Du, Z.; Wang, J.; Long, K. Cognitive Internet of Things: A new paradigm beyond connection. IEEE Internet Things J. 2014, 1, 129–143.
- Zhang, Y.; Ma, X.; Zhang, J.; Hossain, M.S.; Muhammad, G.; Amin, S.U. Edge Intelligence in the Cognitive Internet of Things: Improving Sensitivity and Interactivity. IEEE Netw. 2019, 33, 58–64.
- Wang, D.; Bakar, K.B.A.; Isyaku, B.; Eisa, T.A.E.; Abdelmaboud, A. A comprehensive review on internet of things task offloading in multi-access edge computing. Heliyon 2024, 10, e29916.
- Hosseinpour, M.; Yaghmaee, M.H. Quality of experience aware computation offloading in MEC-enabled blockchain-based IoT networks. IEEE Internet Things J. 2023, 11, 14483–14493.
- Tang, M.; Wong, V.W.S. Deep reinforcement learning for task offloading in mobile edge computing systems. IEEE Trans. Mob. Comput. 2020, 21, 1985–1997.
- Dong, S.; Zhou, H. Task offloading strategies for mobile edge computing: A survey. Comput. Netw. 2024, 254, 110791.
- Wang, D.; Bakar, K.B.A.; Isyaku, B. Two-stage IoT computational task offloading decision-making in MEC with request holding and dynamic eviction. Comput. Mater. Contin. 2024, 80, 2065.
- Hossain, M.A.; Liu, W.; Ansari, N. Computation-efficient offloading and power control for MEC in IoT networks by meta reinforcement learning. IEEE Internet Things J. 2024, 11, 16722–16730.
- Zhao, Y.; Xiang, Z.; Lu, Q. Performance evaluation for secondary users in finite-source cognitive radio networks with dynamic preemption limit. AEU-Int. J. Electron. Commun. 2022, 149, 154183.
- Chi, J.; Qiu, T.; Xiao, F.; Zhou, X. ATOM: Adaptive task offloading with two-stage hybrid matching in MEC-enabled industrial IoT. IEEE Trans. Mob. Comput. 2023, 23, 4861–4877.
- Naparstek, O.; Cohen, K. Deep multi-user reinforcement learning for distributed dynamic spectrum access. IEEE Trans. Wirel. Commun. 2018, 18, 310–323.
- Kaur, A.; Kumar, K. Intelligent spectrum management based on reinforcement learning schemes in cooperative cognitive radio networks. Phys. Commun. 2020, 43, 101226.
- Zhao, D.; Qin, H.; Song, B.; Han, B.; Du, X.; Guizani, M. A graph convolutional network-based deep reinforcement learning approach for resource allocation in a cognitive radio network. Sensors 2020, 20, 5216.
- Liu, S.; Pan, C.; Zhang, C.; Yang, F.; Song, J. Dynamic spectrum sharing based on deep reinforcement learning in mobile communication systems. Sensors 2023, 23, 2622.
- Ukpong, U.C.; Idowu-Bismark, O.; Adetiba, E.; Kala, J.R.; Owolabi, E.; Oshin, O.; Abayomi, A.; Dare, O.E. Deep reinforcement learning agents for dynamic spectrum access in television whitespace cognitive radio networks. Sci. Afr. 2025, 27, e02523.
- Puterman, M.L. Markov Decision Processes: Discrete Stochastic Dynamic Programming; John Wiley & Sons: Hoboken, NJ, USA, 2014.
- Zhao, Y.; Xiang, Z. A multichannel allocation strategy based on preemption threshold and preemption probability in cognitive radio networks. Mob. Inf. Syst. 2021, 2021, 6190872.
- Lippman, S. Applying a new device in the optimization of exponential queuing systems. Oper. Res. 1975, 23, 687–710.
- Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Bashar, S.; Ding, Z. Admission control and resource allocation in a heterogeneous OFDMA wireless network. IEEE Trans. Wirel. Commun. 2009, 8, 4200–4210.
- Nieto, G.; de la Iglesia, I.; Lopez-Novoa, U.; Perfecto, C. Deep reinforcement learning techniques for dynamic task offloading in the 5G edge-cloud continuum. J. Cloud Comput. 2024, 13, 94.


| Category | Parameter | Value |
|---|---|---|
| Environment | MEC servers N | 2 (baseline); {2, 4, 8, 16} (tabular scalability); {4, 8, 16, 32} (DQN-agg.) |
| | Max queue capacity M | 100 (baseline); {10, 50, 100, 500} (sensitivity) |
| | ST arrival rate λ | 2.0 (baseline); [0.5, 8.0] (load sensitivity); (scalability, Table 6) |
| | CT arrival rates λ_i | 1.0 per server |
| | CT service rates μ_i | 2.0 per server |
| | ST service rates ν_1, ν_2 | 3.0, 2.5 |
| | Reward R | 10.0 |
| | Discount factor γ | 0.95 (baseline); sensitivity sweep {0.8, 0.9, 0.95, 0.99} |
| | Cost function c(k) | (baseline); with , (sensitivity) |
| Training | Episodes | 20,000 (baseline, N = 2); increased for larger N (see Table 6) |
| | Decision epochs per episode | 50 (max) |
| | Independent runs | 10 |
| | Random seeds | {0, 1, 2, …, 9} |
| | Bootstrap samples (CI) | 1000 |
| Exploration (all learning algorithms) | Exploration ε (initial) | 0.3 |
| | Exploration ε (final) | 0.01 |
| | ε-decay factor | 0.995 per episode |
| Q-Learning/SARSA | Learning rate α | |
| | Q-table initialization | (default; see Section 5.6 for alternatives) |
| DQN-Aggregate (Section 5.5) † | Hidden layers | 3 (128-64-32 neurons, ReLU) |
| | Experience replay buffer | 10,000 transitions |
| | Mini-batch size | 32 |
| | Target network update | Every 500 steps |
| Hardware | Processor | Intel Core i7-10700K (8 cores, 3.8 GHz) |
| | RAM | 32 GB DDR4 |
| | OS | Ubuntu 20.04 LTS |
| γ | Avg Reward | Learned k* | Blocking Prob. |
|---|---|---|---|
| 0.80 | | 12 | |
| 0.90 | | 11 | |
| 0.95 | | 10 | |
| 0.99 | | 9 | |
| Sensing Error (FP) | Sensing Error (FN) | Avg Reward | Performance Retained |
|---|---|---|---|
| 0.00 | 0.00 | | 100% (baseline) |
| 0.01 | 0.01 | | 98.5% |
| 0.05 | 0.05 | | 93.0% |
| 0.10 | 0.10 | | 85.2% |
| N | State Space | Conv. Episodes | Time (min) | Learned |
|---|---|---|---|---|
| 2 | ∼ | 2500 | 0.14 | 10 |
| 4 | ∼ | 8500 | 2.45 | 21 |
| 8 | ∼ | 35,000 | 18.3 | 42 |
| 16 | ∼ | 145,000 | 112.5 | 85 |
| N | Method | Learned | Avg Reward | Conv. Episodes | Time |
|---|---|---|---|---|---|
| 4 | Tabular Q | 21 | 8500 | 2.45 min | |
| 4 | DQN-Agg | 21 | 12,000 | 3.1 min | |
| 8 | Tabular Q | 42 | 35,000 | 18.3 min | |
| 8 | DQN-Agg | 42 | 28,000 | 12.8 min | |
| 16 | Tabular Q | 85 | 145,000 | 112.5 min | |
| 16 | DQN-Agg | 84 | 55,000 | 38.7 min | |
| 32 | Tabular Q | Infeasible (state space ∼) | |||
| 32 | DQN-Agg | 163 | 50,000 | 45.2 min | |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Wang, N.; Ren, Y. Optimal Security Task Offloading in Cognitive IoT Networks: Provably Optimal Threshold Policies and Model-Free Learning. IoT 2026, 7, 30. https://doi.org/10.3390/iot7020030

