IoT Network Security Threat Detection Algorithm Integrating Symmetric Routing and a Sparse Mixture-of-Experts Model
Abstract
1. Introduction
- (I)
- In terms of base model design, ConvNeXt is effectively combined with a Sparse MoEs framework to form an integrated model that unifies feature extraction and expert decision-making. ConvNeXt is employed to automatically extract multi-scale, hierarchical spatial features from network traffic data, thereby improving the efficiency and accuracy of recognizing complex attack patterns. On this basis, the Sparse MoEs architecture aggregates multiple parallel BiLSTM experts to capture sequence-level dependencies among ordered features.
- (II)
- For IoT network intrusion detection scenarios, we introduce a symmetric linear routing mechanism into the gating network, which effectively mitigates the strong perturbation sensitivity of routing gates and the expert load imbalance commonly observed in conventional MoE models. This design enables efficient collaborative inference among multiple experts and leads to improved classification accuracy in the final decision.
- (III)
- To address the asymmetric class distribution in IoT intrusion datasets, we integrate random undersampling with SMOTE–Tomek oversampling. This strategy enhances the representation of minority classes while preserving high recognition rates for majority classes, thereby improving class balance, boosting overall accuracy, and alleviating bias induced by data imbalance.
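The symmetric Top-K routing sketched in (II) can be illustrated with a short, self-contained NumPy example. This is a minimal sketch, not the paper's exact implementation: it assumes the gate is a single bias-free linear layer and that the forward and reverse gating probabilities are simply averaged before Top-K selection and renormalization.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def symmetric_topk_gate(x, x_rev, W, k=2):
    """Average forward/reverse gate probabilities, then keep the Top-K experts.

    x, x_rev : (batch, dim) features from the forward and reversed passes
    W        : (dim, n_experts) linear gate weights (bias-free, an assumption)
    Returns renormalized sparse gate weights of shape (batch, n_experts),
    with exactly k nonzero entries per row.
    """
    # Symmetric averaging of the two gating distributions
    p = 0.5 * (softmax(x @ W) + softmax(x_rev @ W))
    idx = np.argsort(p, axis=1)[:, -k:]            # indices of the Top-K experts
    mask = np.zeros_like(p)
    np.put_along_axis(mask, idx, 1.0, axis=1)
    sparse = p * mask
    return sparse / sparse.sum(axis=1, keepdims=True)  # renormalize over the K experts
```

Averaging the two distributions before the hard Top-K cut is what smooths the gate: a small perturbation that flips the ranking in one direction is damped by the other, which is consistent with the reduced routing sensitivity claimed above.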
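The class-balancing strategy in (III) can likewise be sketched in plain NumPy. In practice one would use imbalanced-learn's RandomUnderSampler and SMOTETomek; the version below is a simplified illustration that omits the Tomek-link cleaning step, and the function names and parameters are our own, not the paper's.

```python
import numpy as np

def random_undersample(X, y, cls, n_keep, rng):
    """Randomly keep n_keep samples of majority class `cls`; keep all others."""
    idx = np.flatnonzero(y == cls)
    keep = rng.choice(idx, size=n_keep, replace=False)
    sel = np.concatenate([np.flatnonzero(y != cls), keep])
    return X[sel], y[sel]

def smote_oversample(X, y, cls, n_new, k, rng):
    """SMOTE-style synthesis: interpolate each chosen minority sample toward
    one of its k nearest minority-class neighbors."""
    Xc = X[y == cls]
    base = rng.integers(0, len(Xc), size=n_new)
    synth = np.empty((n_new, X.shape[1]))
    for i, b in enumerate(base):
        d = np.linalg.norm(Xc - Xc[b], axis=1)
        nn = np.argsort(d)[1:k + 1]        # k nearest neighbors (excluding self)
        j = rng.choice(nn)
        lam = rng.random()                  # interpolation coefficient in [0, 1)
        synth[i] = Xc[b] + lam * (Xc[j] - Xc[b])
    return np.vstack([X, synth]), np.concatenate([y, np.full(n_new, cls)])
```

Undersampling the dominant classes first keeps the synthetic-sample budget focused on the minorities, which is the intent of combining the two steps.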
2. Related Work
2.1. IoT Network Intrusion Detection Technology
2.2. Sparse Mixture-of-Experts
3. Proposed Model Framework
3.1. ConvNeXt Feature Extraction Module
3.2. Sparse Mixture-of-Experts Classification Module
3.2.1. Routing Gate Network
3.2.2. Expert Model
3.2.3. Auxiliary Loss
3.3. Algorithmic Implementation Workflow
Calculation of the Model's Computational Cost and Parameter Quantities
- (I)
- Complexity analysis of the ConvNeXt-1D feature extraction module: The ConvNeXt-1D feature extraction network begins with a stem layer for initial representation learning, implemented as a strided 1D convolution that maps the single input channel to the first-stage channel width and downsamples the input sequence length by the stride. Accordingly, the parameter count and computational cost (FLOPs) of the stem module are given in Equations (15) and (16), respectively. In the ConvNeXt Block-1D, let the input channel dimension be d. This block comprises a 1D depthwise convolution with kernel size k (with groups = d), a LayerNorm (LN) layer, two pointwise convolutions (a channel expansion followed by a projection back to d), and a learnable per-channel scaling factor. The corresponding parameter count and FLOPs are given in Equations (17) and (18), respectively. The ConvNeXt-1D network consists of multiple stages. For the s-th stage, let the channel width be C_s, the number of ConvNeXt Block-1D units be B_s, and the input sequence length be L_s. Then, the parameter count and computational cost (FLOPs) of this stage are given in Equations (19) and (20), respectively. A Downsample-1D module is inserted between adjacent stages to perform channel transition and sequence downsampling. Let its input and output channels be C_s and C_{s+1}, respectively, with a kernel size of 2; the output sequence length is then halved. The parameter count and computational cost (FLOPs) of the Downsample-1D module are given by Equations (21) and (22), respectively. Therefore, assuming that the ConvNeXt-1D feature extraction network contains S stages, its total number of parameters is given by Equation (23), where the final output LayerNorm contributes 2·C_S learnable parameters (gain and bias). The corresponding total computational cost (FLOPs) is given by Equation (24).
Note that the FLOPs of the output normalization are typically negligible compared with those of the convolutional backbone and the expert modules and are thus omitted in standard complexity accounting.
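The per-block accounting above can be reproduced with a short helper. The sketch below assumes the conventional ConvNeXt inverted-bottleneck structure (expansion ratio 4, biased convolutions) and counts 2 FLOPs per multiply-accumulate; the paper's exact Equations (17) and (18) may differ in these constants.

```python
def convnext1d_block_cost(d, k, L, expand=4):
    """Approximate parameter and FLOP counts for one ConvNeXt Block-1D.

    d      : channel width
    k      : depthwise kernel size
    L      : sequence length at this stage
    expand : pointwise expansion ratio (4 is the standard ConvNeXt choice)
    """
    dw    = d * k + d                      # depthwise conv (groups = d) + bias
    ln    = 2 * d                          # LayerNorm gain + bias
    pw1   = d * expand * d + expand * d    # pointwise expansion d -> 4d, + bias
    pw2   = expand * d * d + d             # pointwise projection 4d -> d, + bias
    gamma = d                              # learnable per-channel scaling factor
    params = dw + ln + pw1 + pw2 + gamma
    # 2 FLOPs (multiply + add) per weight per output position;
    # LayerNorm/GELU treated as negligible, as in the text above.
    flops = 2 * L * (d * k + d * expand * d + expand * d * d)
    return params, flops
```

For example, a block with d = 64, k = 7, and L = 128 has 33,792 parameters under this accounting, matching the pattern that the two pointwise convolutions dominate the parameter budget.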
- (II)
- Complexity analysis of the sparse mixture-of-experts module: First, for the gating linear layer (Gate), let the input feature dimension be D and the number of experts be M. Then, the parameter count and computational cost (FLOPs) of the gate are given by Equations (25) and (26), respectively. For a single BiLSTM expert, let the input feature dimension be D, the sequence length be L′, the hidden size be H, and the number of layers be 1. Then, the parameter count and computational cost (FLOPs) of this expert are given by Equations (27) and (28), respectively. The classification head is implemented as a linear layer with input dimension 2H (the concatenated forward and backward hidden states), which maps the features to the output classes. Its parameter count and computational cost (FLOPs) are given by Equations (29) and (30), respectively. Although the model contains M experts, under the Top-K sparse routing mechanism only K experts are activated for each sample during the forward pass. Therefore, the parameter count and computational cost (FLOPs) of the Sparse MoEs module are given by Equations (31) and (32), respectively. The total parameter count and FLOPs of the model during inference are then given by Equations (33) and (34), respectively. It can be observed that the Sparse MoEs module increases conditional capacity by introducing more experts, while the inference-time computational cost is mainly governed by the Top-K value K. During training, the proposed symmetric gating requires an additional forward pass through the backbone and averages the forward and reverse gating probabilities; during inference, only the forward gate is used, so the inference FLOPs do not increase. Overall, the computational cost of the proposed model is dominated by two parts: (i) the pointwise and downsampling convolutions in the ConvNeXt-1D backbone and (ii) the BiLSTM expert computations in the Sparse MoEs module over the backbone's output sequence length.
In contrast, operations such as global average pooling, LayerNorm, GELU, softmax, and Top-K selection are typically non-dominant and are thus treated approximately in the FLOPs accounting (reported using "≈"). Notably, Sparse MoEs increases the parameter count with the number of experts M to enhance capacity, while the inference-time computation is primarily governed by the Top-K value K, enabling performance gains under a bounded inference budget. Moreover, the proposed symmetric linear gating mainly increases training-time cost but introduces negligible overhead during inference.
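To make the capacity-versus-compute trade-off concrete, the following sketch counts gate, BiLSTM-expert, and classification-head costs under the same conventions as above (PyTorch-style LSTM parameterization with two bias vectors per gate, 2 FLOPs per multiply-accumulate); the symbol choices mirror the text but the constants are our assumptions, not the paper's exact Equations (25)-(34).

```python
def bilstm_expert_cost(d_in, H, L):
    """Approximate parameter/FLOP counts for a 1-layer BiLSTM expert.

    Each direction has 4 gates: input weights (d_in*H), recurrent weights
    (H*H), and two bias vectors, as in PyTorch's LSTM parameterization.
    """
    per_dir = 4 * (d_in * H + H * H + 2 * H)
    params = 2 * per_dir                        # forward + backward directions
    flops = 2 * L * 2 * 4 * (d_in * H + H * H)  # 2 FLOPs/MAC, both directions
    return params, flops

def sparse_moe_cost(d_in, H, L, M, K, n_cls):
    """Total capacity grows with M experts; compute grows only with Top-K."""
    gate_p, gate_f = d_in * M + M, 2 * d_in * M
    exp_p, exp_f = bilstm_expert_cost(d_in, H, L)
    head_p, head_f = 2 * H * n_cls + n_cls, 2 * 2 * H * n_cls  # input dim 2H
    params = gate_p + M * exp_p + head_p        # all M experts are stored
    flops = gate_f + K * exp_f + head_f         # only K experts are executed
    return params, flops
```

Doubling K roughly doubles the expert FLOPs while the parameter count stays fixed by M, which is exactly the bounded-inference-budget behavior described above.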
4. Experiments
4.1. Experimental Setup
- (I)
- Experimental hardware environment: The experiments were conducted on a workstation running Windows 11, equipped with an Intel® Core™ i9-9900K processor (3.60 GHz), 32 GB of RAM, and an NVIDIA GeForce RTX 2080 Ti GPU.
- (II)
- Experimental software environment: The proposed sparse-expert-based intrusion detection method for IoT networks was implemented using the PyTorch (version 2.5.1) framework. Data processing, numerical computation, and visualization were mainly carried out with libraries such as Pandas (version 2.2.3), NumPy (version 1.26.4), and Matplotlib (version 3.10.0).
4.2. Datasets and Pre-Processing
4.3. Training, Evaluation, and Results Analysis
4.3.1. Training Loss Function
4.3.2. Evaluation Metrics
4.3.3. Comparative Analysis Experiments on the CIC-IDS2018 Dataset
4.3.4. Comparative Analysis Experiments on the TON-IoT Dataset
4.3.5. Comparative Analysis Experiments on the BoT-IoT Dataset
4.3.6. Ablation Study
4.3.7. Analysis of Results
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Danladi, M.S.; Baykara, M. Low Power Wide Area Network Technologies: Open Problems, Challenges, and Potential Applications. Rev. Comput. Eng. Stud. 2022, 9, 2. [Google Scholar] [CrossRef]
- Zhou, Y.; Chen, X. Edge Intelligence: Edge Computing for 5G and the Internet of Things. Future Internet 2025, 17, 101. [Google Scholar] [CrossRef]
- Adewale, T.; Paul, J. AI, 5G, and IoT: How These Technologies Are Creating the Perfect Storm for Smart Systems. Available online: https://www.researchgate.net/publication/385855348_AI_5G_and_IoT_How_These_Technologies_Are_Creating_the_Perfect_Storm_for_Smart_Systems (accessed on 1 November 2024).
- Le Jeune, L.; Goedeme, T.; Mentens, N. Machine Learning for Misuse-Based Network Intrusion Detection: Overview, Unified Evaluation and Feature Choice Comparison Framework. IEEE Access 2021, 9, 63995–64015. [Google Scholar] [CrossRef]
- Jyothsna, V.; Prasad, R.; Prasad, K.M. A Review of Anomaly Based Intrusion Detection Systems. Int. J. Comput. Appl. 2011, 28, 26–35. [Google Scholar] [CrossRef]
- Ahmad, R.; Alsmadi, I.; Alhamdani, W.; Tawalbeh, L.A. Zero-Day Attack Detection: A Systematic Literature Review. Artif. Intell. Rev. 2023, 56, 10733–10811. [Google Scholar] [CrossRef]
- Buchta, R.; Gkoktsis, G.; Heine, F.; Kleiner, C. Advanced Persistent Threat Attack Detection Systems: A Review of Approaches, Challenges, and Trends. Digit. Threat. Res. Pract. 2024, 5, 1–37. [Google Scholar] [CrossRef]
- Debar, H.; Dacier, M.; Wespi, A. Towards a Taxonomy of Intrusion-Detection Systems. Comput. Netw. 1999, 31, 805–822. [Google Scholar] [CrossRef]
- Jordan, M.I.; Mitchell, T.M. Machine Learning: Trends, Perspectives, and Prospects. Science 2015, 349, 255–260. [Google Scholar] [CrossRef]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
- Saleem, T.J.; Chishti, M.A. Deep Learning for Internet of Things Data Analytics. Procedia Comput. Sci. 2019, 163, 381–390. [Google Scholar] [CrossRef]
- Idrissi, I.; Boukabous, M.; Azizi, M.; Moussaoui, O.; El Fadili, H. Toward a Deep Learning-Based Intrusion Detection System for IoT against Botnet Attacks. IAES Int. J. Artif. Intell. 2021, 10, 110. [Google Scholar] [CrossRef]
- Ghurab, M.; Gaphari, G.; Alshami, F.; Alshamy, R.; Othman, S. A Detailed Analysis of Benchmark Datasets for Network Intrusion Detection System. Asian J. Res. Comput. Sci. 2021, 7, 14–33. [Google Scholar] [CrossRef]
- Hussain, N.; Rani, P. Comparative Studied Based on Attack Resilient and Efficient Protocol with Intrusion Detection System Based on Deep Neural Network for Vehicular System Security. In Distributed Artificial Intelligence; Taylor & Francis: Abingdon, UK, 2020; pp. 217–236. [Google Scholar]
- Shazeer, N.; Mirhoseini, A.; Maziarz, K.; Davis, A.; Le, Q.; Hinton, G.; Dean, J. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. arXiv 2017, arXiv:1701.06538. [Google Scholar]
- Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986. [Google Scholar]
- Siami-Namini, S.; Tavakoli, N.; Namin, A.S. The Performance of LSTM and BiLSTM in Forecasting Time Series. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; IEEE: New York, NY, USA, 2019; pp. 3285–3292. [Google Scholar]
- Rokach, L.; Maimon, O. Decision Trees. In Data Mining and Knowledge Discovery Handbook; Springer: Boston, MA, USA, 2005; pp. 165–192. [Google Scholar]
- Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support Vector Machines. IEEE Intell. Syst. Appl. 1998, 13, 18–28. [Google Scholar] [CrossRef]
- Peterson, L.E. K-Nearest Neighbor. Scholarpedia 2009, 4, 1883. [Google Scholar] [CrossRef]
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Tahri, R.; Jarrar, A.; Lasbahani, A.; Balouki, Y. A Comparative Study of Machine Learning Algorithms on the UNSW-NB15 Dataset. ITM Web Conf. 2022, 48, 03002. [Google Scholar] [CrossRef]
- Kilincer, I.F.; Ertam, F.; Sengur, A. Machine Learning Methods for Cyber Security Intrusion Detection: Datasets and Comparative Study. Comput. Netw. 2021, 188, 107840. [Google Scholar] [CrossRef]
- Koonce, B. ResNet 50. In Convolutional Neural Networks with Swift for TensorFlow: Image Recognition and Dataset Categorization; Apress: Berkeley, CA, USA, 2021; pp. 63–72. [Google Scholar]
- He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks, Hong Kong, China, 1–8 June 2008; IEEE: New York, NY, USA, 2008; pp. 1322–1328. [Google Scholar]
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-Sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
- Yen, S.-J.; Lee, Y.-S. Cluster-Based Under-Sampling Approaches for Imbalanced Data Distributions. Expert Syst. Appl. 2009, 36, 5718–5727. [Google Scholar] [CrossRef]
- Hasanin, T.; Khoshgoftaar, T. The Effects of Random Undersampling with Simulated Class Imbalance for Big Data. In Proceedings of the 2018 IEEE International Conference on Information Reuse and Integration (IRI), Salt Lake City, UT, USA, 6–9 July 2018; IEEE: New York, NY, USA, 2018; pp. 70–79. [Google Scholar]
- Zhang, K.; Zheng, R.; Li, C.; Zhang, S.; Wu, X.; Sun, S.; Yang, J.; Zheng, J. SE-DWNet: An Advanced ResNet-Based Model for Intrusion Detection with Symmetric Data Distribution. Symmetry 2025, 17, 526. [Google Scholar] [CrossRef]
- Dangol, N.; Eaman, A.; Shakshuki, E.; Hassan, E. Impact of Resampling Techniques in Deep Learning Based Intrusion Detection: A Comparative Study on NSL-KDD and UNSW-NB15. Procedia Comput. Sci. 2025, 272, 84–91. [Google Scholar] [CrossRef]
- Zhang, J.; Ling, Y.; Fu, X.; Yang, X.; Xiong, G.; Zhang, R. Model of the Intrusion Detection System Based on the Integration of Spatial–Temporal Features. Comput. Secur. 2020, 97, 101946. [Google Scholar] [CrossRef]
- Li, Y.; Zhang, S.; Yang, H. Feature-Space Transformations for Robust Network Intrusion Detection. Expert Syst. Appl. 2023, 224, 119927. [Google Scholar]
- Sayem, I.M.; Sayed, M.I.; Saha, S.; Haque, A. ENIDS: A Deep Learning-Based Ensemble Framework for Network Intrusion Detection Systems. IEEE Trans. Netw. Serv. Manag. 2024, 21, 5809–5825. [Google Scholar] [CrossRef]
- Wang, Z.; Liu, Y.; He, D.; Chan, S. Intrusion Detection Methods Based on Integrated Deep Learning Model. Comput. Secur. 2021, 103, 102177. [Google Scholar] [CrossRef]
- Ncir, C.E.B.; HajKacem, M.A.B.; Alattas, M. Enhancing Intrusion Detection Performance Using Explainable Ensemble Deep Learning. PeerJ Comput. Sci. 2024, 10, e2289. [Google Scholar] [CrossRef]
- Ilias, L.; Doukas, G.; Lamprou, V.; Ntanos, C.; Askounis, D. Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and Beyond. arXiv 2024, arXiv:2412.03483. [Google Scholar] [CrossRef]
- Shanka, S.; Singh, D.; Badoni, A.; Shukla, M.K. Towards Robust IDS in Network Security: Handling Class Imbalance with Deep Hybrid Architectures. IEEE Netw. Lett. 2025, 7, 120–124. [Google Scholar] [CrossRef]
- Wang, L.; Sikdar, B.; Zhang, K.; Wang, Y. MoE-TransDLD: A Transformer-Driven Mixture of Experts for Cyber-Attack Detection in Power Systems. In Proceedings of the 2025 IEEE 19th International Conference on Control & Automation (ICCA), Tallinn, Estonia, 30 June–3 July 2025; IEEE: New York, NY, USA, 2025; pp. 511–516. [Google Scholar]
- Rahim, K.; Nasir, Z.U.I.; Ikram, N.; Qureshi, H.K. Integrating Contextual Intelligence with Mixture of Experts for Signature and Anomaly-Based Intrusion Detection in CPS Security. Neural Comput. Appl. 2025, 37, 5991–6007. [Google Scholar] [CrossRef]
- Mu, S.; Lin, S. A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications. arXiv 2025, arXiv:2503.07137. [Google Scholar]
- Zhao, G.; Zhao, Y.; Yin, X.; Lin, L.; Zhu, J. Beyond Spurious Cues: Adaptive Multi-Modal Fusion via Mixture-of-Experts for Robust Sarcasm Detection. Mathematics 2025, 13, 3250. [Google Scholar] [CrossRef]
- Gao, Y.; Zhao, B.; Peng, H.; Bao, H.; Zhao, J.; Cui, Z. Bidirectional Temporal-Aware Modeling with Multi-Scale Mixture-of-Experts for Multivariate Time Series Forecasting. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management, Seoul, Republic of Korea, 10–14 November 2025; pp. 696–706. [Google Scholar]
- Xu, J.; Sun, X.; Zhang, Z.; Zhao, G.; Lin, J. Understanding and Improving Layer Normalization. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32. [Google Scholar]
- Hendrycks, D.; Gimpel, K. Gaussian Error Linear Units (GELUs). arXiv 2016, arXiv:1606.08415. [Google Scholar]
- Zhao, Y.; Ge, L.; Cui, G.; Fang, T. Improved ConvNeXt Facial Expression Recognition Embedded with Attention Mechanism. In Proceedings of the International Conference on Applied Intelligence, Nanning, China, 8–12 December 2023; Springer Nature: Singapore, 2023; pp. 89–100. [Google Scholar]
- Wang, Z. Combining UPerNet and ConvNeXt for Contrails Identification to Reduce Global Warming. arXiv 2023, arXiv:2310.04808. [Google Scholar]
- A Realistic Cyber Defense Dataset (CSE-CIC-IDS2018). Canadian Institute for Cybersecurity. Available online: https://registry.opendata.aws/cse-cic-ids2018 (accessed on 17 February 2025).
- Moustafa, N. A New Distributed Architecture for Evaluating AI-Based Security Systems at the Edge: Network TON_IoT Datasets. Sustain. Cities Soc. 2021, 72, 102994. [Google Scholar] [CrossRef]
- Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the Development of Realistic Botnet Dataset in the Internet of Things for Network Forensic Analytics: BoT-IoT Dataset. Future Gener. Comput. Syst. 2019, 100, 779–796. [Google Scholar] [CrossRef]
- Hasan, M.A.M.; Nasser, M.; Ahmad, S.; Molla, K.I. Feature Selection for Intrusion Detection Using Random Forest. J. Inf. Secur. 2016, 7, 129–140. [Google Scholar] [CrossRef]
- Hancock, J.T., III; Khoshgoftaar, T.M. Exploring Maximum Tree Depth and Random Undersampling in Ensemble Trees to Optimize the Classification of Imbalanced Big Data. SN Comput. Sci. 2023, 4, 462. [Google Scholar] [CrossRef]
- Swana, E.F.; Doorsamy, W.; Bokoro, P. Tomek Link and SMOTE Approaches for Machine Fault Classification with an Imbalanced Dataset. Sensors 2022, 22, 3246. [Google Scholar] [CrossRef]
- Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
- Kim, J.; Kim, J.; Kim, H.; Shim, M.; Choi, E. CNN-Based Network Intrusion Detection against Denial-of-Service Attacks. Electronics 2020, 9, 916. [Google Scholar] [CrossRef]
- Jumabek, A.; Yang, S.S.; Noh, Y.T. CatBoost-Based Network Intrusion Detection on Imbalanced CIC-IDS-2018 Dataset. Korean Soc. Commun. Commun. J. 2021, 46, 2191–2197. [Google Scholar] [CrossRef]
- Umman Varghese, M.; Taghiyarrenani, Z. Intrusion Detection in Heterogeneous Networks with Domain-Adaptive Multi-Modal Learning. arXiv 2025, arXiv:2508.03517. [Google Scholar]
- Cherfi, S.; Boulaiche, A.; Lemouari, A. Enhancing IoT Security: A Deep Learning Approach with Autoencoder-DNN Intrusion Detection Model. In Proceedings of the 2024 6th International Conference on Pattern Analysis and Intelligent Systems (PAIS), El Oued, Algeria, 24–25 April 2024; IEEE: New York, NY, USA, 2024; pp. 1–7. [Google Scholar]
- Chishti, F.; Rathee, G. ToN-IOT Set: Classification and Prediction for DDoS Attacks Using AdaBoost and RUSBoost. In Proceedings of the 2023 3rd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 12–13 May 2023; IEEE: New York, NY, USA, 2023; pp. 2842–2847. [Google Scholar]
- Tareq, I.; Elbagoury, B.M.; El-Regaily, S.; El-Horbaty, E.S. Analysis of TON-IoT, UNW-NB15, and Edge-IIoT Datasets Using DL in Cybersecurity for IoT. Appl. Sci. 2022, 12, 9572. [Google Scholar] [CrossRef]
- Esmaeilyfard, R.; Shoaei, Z.; Javidan, R. A Lightweight and Efficient Model for Botnet Detection in IoT Using Stacked Ensemble Learning. Soft Comput. 2025, 29, 89–101. [Google Scholar] [CrossRef]
- Hussan, M.I.T.; Reddy, G.V.; Anitha, P.T.; Kanagaraj, A.; Naresh, P. DDoS Attack Detection in IoT Environment Using Optimized Elman Recurrent Neural Networks Based on Chaotic Bacterial Colony Optimization. Clust. Comput. 2024, 27, 4469–4490. [Google Scholar] [CrossRef]
- Syed, N.F.; Ge, M.; Baig, Z. Fog-Cloud Based Intrusion Detection System Using Recurrent Neural Networks and Feature Selection for IoT Networks. Comput. Netw. 2023, 225, 109662. [Google Scholar] [CrossRef]
- Zhou, Y.; Lei, T.; Liu, H.; Du, N.; Huang, Y.; Zhao, V.; Dai, A.M.; Le, Q.V.; Laudon, J. Mixture-of-Experts with Expert Choice Routing. Adv. Neural Inf. Process. Syst. 2022, 35, 7103–7114. [Google Scholar]
- Saabni, R.; Asi, A.; El-Sana, J. Text Line Extraction for Historical Document Images. Pattern Recognit. Lett. 2014, 35, 23–33. [Google Scholar] [CrossRef]
- van der Maaten, L.; Hinton, G. Visualizing Data Using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]











| Index | Attack Class | Count | Index | Attack Class | Count |
|---|---|---|---|---|---|
| 0 | Benign | 67,424 | 8 | DoS attacks—Hulk | 23,096 |
| 1 | Bot | 14,310 | 9 | DoS attacks—SlowHTTPTest | 6994 |
| 2 | Brute Force—Web | 611 | 10 | DoS attacks—Slowloris | 550 |
| 3 | Brute Force—XSS | 230 | 11 | FTP—BruteForce | 9668 |
| 4 | DDoS attack—HOIC | 34,301 | 12 | Infiltration | 8097 |
| 5 | DDoS attack—LOIC-UDP | 1730 | 13 | SQL Injection | 87 |
| 6 | DDoS attacks—LOIC-HTTP | 28,810 | 14 | SSH—Bruteforce | 9379 |
| 7 | DoS attacks—GoldenEye | 2075 | | | |
| Method | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| CNN | 92.87% | 93.83% | 92.87% | 92.02% |
| BiLSTM | 92.09% | 91.54% | 92.09% | 90.94% |
| CNN-Image [54] | 91.50% | - | - | - |
| CatBoost-Based [55] | 91.95% | - | - | - |
| MM-DNN [56] | 93.40% | - | - | - |
| ConvNeXt–MoEs | 94.08% | 93.68% | 94.08% | 93.22% |
| Index | Attack Class | Count | Index | Attack Class | Count |
|---|---|---|---|---|---|
| 0 | DoS | 4539 | 4 | Normal | 9740 |
| 1 | DDoS | 505 | 5 | Password | 3594 |
| 2 | Injection | 606 | 6 | Scanning | 434 |
| 3 | XSS | 13 | 7 | MITM | 1240 |
| Method | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| CNN | 99.96% | 99.96% | 99.96% | 99.96% |
| BiLSTM | 99.90% | 99.84% | 99.90% | 99.87% |
| SA+DNN [57] | 99.86% | 99.86% | 99.86% | 99.86% |
| AdaBoost [58] | 99.70% | - | - | - |
| Inception Time [59] | 98.30% | 98.30% | 98.30% | 98.30% |
| ConvNeXt–MoEs | | | | |
| Index | Attack Class | Count |
|---|---|---|
| 0 | DDoS | 19,266 |
| 1 | DoS | 16,502 |
| 2 | Normal | 477 |
| 3 | Reconnaissance | 9108 |
| 4 | Theft | 79 |
| Dataset | Random Seed | Accuracy | Fluctuation Range |
|---|---|---|---|
| BoT-IoT | 0 | 99.66% | |
| | 42 | 99.78% | Baseline |
| | 84 | 99.79% | |
| | 128 | 99.42% | |
| Method | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| CNN | 97.70% | 97.81% | 97.70% | 97.69% |
| BiLSTM | 96.55% | 96.76% | 96.55% | 96.54% |
| Stacked Ensemble Learning [60] | 99.30% | 99.20% | 99.00% | 99.10% |
| CBCO-ERNN [61] | 99.02% | 99.75% | 98.59% | 98.35% |
| RNN [62] | 99.55% | 99.99% | 99.02% | 99.49% |
| ConvNeXt–MoEs | 99.78% | 99.78% | 99.78% | 99.78% |
| ConvNeXt | Symmetric Routing Gate | Sparse MoEs | Auxiliary Loss | RUS+ SMOTE-Tomek | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|---|---|---|---|
| × | ✓ | ✓ | ✓ | ✓ | 92.94% | 92.90% | 92.94% | 92.26% |
| ✓ | × | × | ✓ | ✓ | 93.33% | 94.06% | 93.33% | 92.57% |
| ✓ | × | ✓ | ✓ | ✓ | 93.69% | 93.24% | 93.69% | 93.11% |
| ✓ | ✓ | ✓ | × | ✓ | 93.17% | 93.25% | 93.17% | 92.50% |
| ✓ | ✓ | ✓ | ✓ | × | 93.88% | 93.54% | 93.88% | 92.91% |
| ✓ | ✓ | ✓ | ✓ | ✓ | 94.08% | 93.68% | 94.08% | 93.22% |
| Dataset | Classes | Gini |
|---|---|---|
| CIC-IDS2018 | 15 | 0.823 |
| TON-IoT | 8 | 0.694 |
| BoT-IoT | 5 | 0.648 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Yang, J.; Zhang, K.; Zheng, R.; Li, C.; Zheng, J. IoT Network Security Threat Detection Algorithm Integrating Symmetric Routing and a Sparse Mixture-of-Experts Model. Symmetry 2026, 18, 63. https://doi.org/10.3390/sym18010063

