A Hybrid Approach for IoT Security: Combining Ensemble Learning with Fuzzy Logic
Abstract
1. Introduction
1.1. Related Works
1.2. Motivation and Contributions
- An innovative hybrid framework for malware detection and assessment in IoT environments, built on a majority-voting ensemble architecture that integrates multiple machine learning algorithms.
- The automatically generated rule base derived from the high-accuracy labeling outputs of the ensemble model eliminates the need for manual rule definition in the fuzzy inference system, thereby enhancing the scalability and consistency of the process.
- A three-level security status model that goes beyond binary classification and expresses the status through membership degrees, providing a more accurate representation of risks in gray areas. Unlike traditional machine learning or fixed-threshold methods, this approach offers decision-makers a more flexible assessment that aligns with human intuition.
- The comprehensive dataset, encompassing realistic IoT network behaviors, various device types, and different attack scenarios, supports the applicability of the method under real-world conditions.
1.3. Organization
2. Preliminaries
2.1. SelectKBest Feature Selection
2.2. Conversion from Categorical to Numerical Data
Fleiss’ Kappa
2.3. Ensemble Learning
2.3.1. Decision Tree
2.3.2. Random Forest
- Bootstrap samples are drawn from the training data, and one tree is grown on each sample. At each node of a tree, the best split is chosen from a randomly selected subset of features. Splitting continues until the minimum node size is reached. Once all the trees are constructed, their results are combined to make a prediction.
- As the number of trees increases, the error rate generally decreases. The size of the feature subset is a critical parameter for model performance.
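The two steps above (bootstrap sampling and a random feature subset at each split) can be illustrated with a minimal sketch. It is a simplified stand-in rather than a full random forest: each "tree" is a depth-one stump instead of a tree grown to a minimum node size, and the toy data and the names `best_stump`, `fit_forest`, and `predict` are hypothetical, introduced only for this illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_stump(X, y, feature_ids):
    """Best single-threshold split, searching only the given feature subset."""
    best = None  # (weighted impurity, feature, threshold, left label, right label)
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no valid split: predict the majority class everywhere
        top = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), top, top)
    return best[1:]


def fit_forest(X, y, n_trees=25, max_features=1, seed=42):
    """Grow n_trees stumps, each on its own bootstrap sample and feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        feats = rng.sample(range(d), max_features)   # random feature subset
        forest.append(best_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Combine the stumps by majority vote."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Toy data (hypothetical): class 1 has high values on both features.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.7], [0.9, 0.9]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict(forest, [0.05, 0.05]), predict(forest, [0.95, 0.95]))
```

Here `max_features` plays the role of the feature-subset size mentioned above: smaller values decorrelate the trees at the cost of weaker individual splits, which is why it is usually tuned.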
2.3.3. K-Nearest Neighbors
2.4. Fuzzy Logic System
3. Proposed Method
3.1. Dataset Description
3.2. Preprocessing
3.3. Automated Rule Extraction and Agreement Analysis Using Ensemble Learning
- ML Accuracy Detection: After the dataset-preprocessing steps, the machine learning models are trained and their performance rates are compared. The candidate methods are RF, DT, KNN, Gaussian Naive Bayes (NB), Logistic Regression (LR), and Support Vector Machine (SVM). The k-fold value is set to 10, and the performance rates of the classifiers selected for the ensemble model are evaluated. In k-fold cross-validation, each data point appears in the test fold exactly once and in the training folds k − 1 times. Model performance is reported by averaging the results from the k runs. This ensures that the model is tested in a fairer, more reliable, and more generalizable way. Some of the parameters used in this study are as follows: k-fold cross-validation with n_splits = 10, shuffle = True, and random_state = 42. KNN classifier with n_neighbors = 5. DT with criterion = ‘gini’ and random_state = 42. RF with n_estimators = 100, criterion = ‘gini’, and random_state = 42. LR with solver = ‘lbfgs’ and max_iter = 1000. For all models, the random_state value is fixed to ensure reproducibility of data splits, model initialization, and results. The performance rates of the models are shown in Table 2.
- As shown in Table 2, DT, RF, and KNN were chosen as the base classifiers. The reason is that, in our preliminary experiments, these three algorithms achieved higher performance compared to other candidates such as SVM, Naive Bayes, and Logistic Regression. Moreover, these algorithms provide a good balance between predictive accuracy and interpretability, which ensures compatibility with the fuzzy rule extraction step. DT produces transparent decision rules, RF reduces overfitting and improves stability through bagging, and KNN captures local neighborhood patterns that are particularly useful at class boundaries. The complementary nature of their error patterns made it appropriate to use these three methods together in the ensemble.
- Statistical Transformations: After determining the performance rates of the ML methods, the actual test data is generated. A categorical dataset consisting of 81 rows, created from combinations of low, medium, and high for four features, is constructed. The models are trained with numerical data, but the test data consists of categorical values. Therefore, the categorical data needs to be numerically transformed. For each feature, the boundaries corresponding to low, medium, and high states under a balanced distribution are determined as shown in Table 3. Based on these boundaries, the categorical values within the relevant range for each feature are represented numerically by the median value of that group. The median is robust against outliers and, especially in skewed distributions, best represents the class center [18]. The median values for each feature are shown in Table 4. In this way, a numerical dataset is created for the test data.
- Labeling Test Data with ML Models: The test data, converted from categorical to numerical values, is labeled separately by the highest-performing DT, RF, and KNN models. In this way, the machine learning models provide decision support by assigning a system label to every combination of low, medium, and high feature values.
- Fleiss’ Kappa Agreement Analysis: Fleiss’ Kappa is used to test the agreement among the labels that the DT, RF, and KNN models assign to the test data. The agreement among the three classifiers on the same dataset is 0.8175949557123559 (approximately 0.82). On the commonly used interpretation scale for kappa, values above 0.80 indicate almost perfect agreement, so the three high-accuracy classifiers make highly consistent decisions.
- Application of Majority Voting from Post-Model Ensemble Learning Methods: To improve the labeling decisions of the DT, RF, and KNN classifiers with the highest accuracy values, the majority voting method is applied as a post-model ensemble between these three classifiers. This method is shown in Figure 5.
- Obtaining the Fuzzy Rule Base Based on Ensemble Learning: Using the post-model ensemble method, majority voting is applied to the decisions made by the DT, RF, and KNN base classifiers, and the final decision is the label chosen by the majority. In this way, the 81 data points formed by the combinations of low, medium, and high for the four features are labeled. Each labeled row in this dataset becomes one rule in the rule base. An example of an obtained rule is shown in Figure 6.
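The labeling, agreement, and voting steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the three labelers are hypothetical threshold functions standing in for the trained DT, RF, and KNN models, the label names are taken from the decision table in this article, and the tie-breaking rule in `majority_vote` is an assumption, since the text does not say how a three-way split is resolved.

```python
from collections import Counter
from itertools import product

LEVELS = ("low", "medium", "high")
FEATURES = ("Idle Max", "Fwd Seg Size Min", "Flow IAT Min", "Flow IAT Mean")
LABELS = ("secure", "partially insecure", "insecure")


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for per-item rating tuples (one rating per rater).

    Undefined (division by zero) when every rating is the same category.
    """
    n_items, n_raters = len(ratings), len(ratings[0])
    # counts[i][j]: number of raters that assigned category j to item i
    counts = [[row.count(c) for c in categories] for row in ratings]
    # Observed agreement, averaged over items
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall category proportions
    totals = [sum(row[j] for row in counts) for j in range(len(categories))]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)


def majority_vote(votes):
    """Most common label; a tie falls back to the first rater (assumption)."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else votes[0]


def make_toy_labeler(low_cut, high_cut):
    """Hypothetical stand-in for a trained classifier: thresholds on the level sum."""
    def label(combo):
        score = sum(LEVELS.index(level) for level in combo)  # ranges over 0..8
        if score <= low_cut:
            return "secure"
        if score <= high_cut:
            return "partially insecure"
        return "insecure"
    return label


labelers = [make_toy_labeler(2, 5), make_toy_labeler(3, 5), make_toy_labeler(2, 5)]

# All low/medium/high combinations of the four features: 3**4 = 81 rows.
combos = list(product(LEVELS, repeat=len(FEATURES)))
ratings = [tuple(labeler(combo) for labeler in labelers) for combo in combos]
kappa = fleiss_kappa(ratings, LABELS)

# One rule per row: the antecedent is the combination, the consequent is the vote.
rules = [(combo, majority_vote(votes)) for combo, votes in zip(combos, ratings)]

combo, status = rules[0]
antecedent = " AND ".join(f"{name} is {level}" for name, level in zip(FEATURES, combo))
print(f"kappa = {kappa:.3f}")
print(f"IF {antecedent} THEN status is {status}")
```

In a real pipeline the toy labelers would be replaced by the predictions of the trained models on the median-encoded rows, and the resulting 81 rules would be loaded into the fuzzy inference system.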
3.4. Fuzzy Logic
4. Results and Discussion
4.1. Experimental Setup
4.2. Preprocessing Results for Proposed Ensemble Model
4.3. Automated Rule Base
4.4. Fuzzy Logic Results
4.5. Ablation Study and Computational Cost
5. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
ARP | Address Resolution Protocol |
CPU | Central Processing Unit |
CNN | Convolutional Neural Network |
DDoS | Distributed Denial of Service |
DNS | Domain Name System |
IAT | Inter-Arrival Time |
ICMP | Internet Control Message Protocol |
MCDM | Multi-Criteria Decision Making |
ML | Machine Learning |
OS | Operating System |
RAM | Random Access Memory |
SYN | Synchronize (TCP flag) |
UDP | User Datagram Protocol |
WSN | Wireless Sensor Network |
References
- Karakaya, A.; Ulu, A. A survey on post-quantum based approaches for edge computing security. WIREs Comput. Stat. 2024, 16, e1644.
- Varol, M.; İskefiyeli, M. A low cost compact network TAP device with Raspberry Pi 4. Eng. Sci. Technol. Int. J. 2025, 70, 102118.
- Makkar, A.; Ghosh, U.; Sharma, P.K.; Javed, A. A Fuzzy-Based Approach to Enhance Cyber Defence Security for Next-Generation IoT. IEEE Internet Things J. 2023, 10, 2079–2086.
- Yazdinejad, A.; Dehghantanha, A.; Parizi, R.M.; Srivastava, G.; Karimipour, H. Secure intelligent fuzzy blockchain framework: Effective threat detection in IoT networks. Comput. Ind. 2023, 144, 103801.
- Akbari, Y.; Tabatabaei, S. A new method to find a high reliable route in IoT by using reinforcement learning and fuzzy logic. Wirel. Pers. Commun. 2020, 112, 967–983.
- Hashemi, S.Y.; Shams Aliee, F. Fuzzy, dynamic and trust based routing protocol for IoT. J. Netw. Syst. Manag. 2020, 28, 1248–1278.
- Verma, R.; Chandra, S. Interval-valued intuitionistic fuzzy-analytic hierarchy process for evaluating the impact of security attributes in fog based internet of things paradigm. Comput. Commun. 2021, 175, 35–46.
- Zahra, S.R.; Chishti, M.A. Fuzzy logic and fog based secure architecture for internet of things (flfsiot). J. Ambient. Intell. Humaniz. Comput. 2023, 14, 5903–5927.
- Farhin, F.; Sultana, I.; Islam, N.; Kaiser, M.S.; Rahman, M.S.; Mahmud, M. Attack detection in internet of things using software defined network and fuzzy neural network. In Proceedings of the 2020 Joint 9th International Conference on Informatics, Electronics & Vision (ICIEV) and 2020 4th International Conference on Imaging, Vision & Pattern Recognition (icIVPR), Kitakyushu, Japan, 26–29 August 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6.
- Zahra, S.R.; Chishti, M.A. A generic and lightweight security mechanism for detecting malicious behavior in the uncertain Internet of Things using fuzzy logic- and fog-based approach. Neural Comput. Appl. 2022, 34, 6927–6952.
- Kerimkhulle, S.; Dildebayeva, Z.; Tokhmetov, A.; Amirova, A.; Tussupov, J.; Makhazhanova, U.; Adalbek, A.; Taberkhan, R.; Zakirova, A.; Salykbayeva, A. Fuzzy Logic and Its Application in the Assessment of Information Security Risk of Industrial Internet of Things. Symmetry 2023, 15, 1958.
- Alalhareth, M.; Hong, S.C. Enhancing the internet of medical things (IoMT) security with meta-learning: A performance-driven approach for ensemble intrusion detection systems. Sensors 2024, 24, 3519.
- Wu, N.I.; Feng, T.H.; Hwang, M.S. A Fuzzy-Based Relay Security Algorithm for Wireless Sensor Networks. Sensors 2025, 25, 4422.
- Qiu, X.; Shi, L.; Fan, P. A cooperative intrusion detection system for internet of things using fuzzy logic and ensemble of convolutional neural networks. Sci. Rep. 2025, 15, 15934.
- Zulfiker, M.S.; Kabir, N.; Biswas, A.A.; Nazneen, T.; Uddin, M.S. An in-depth analysis of machine learning approaches to predict depression. Curr. Res. Behav. Sci. 2021, 2, 100044.
- Brownlee, J. How to choose a feature selection method for machine learning. Mach. Learn. Mastery 2019, 10, 1–7.
- Hancock, J.T.; Khoshgoftaar, T.M. Survey on categorical data for neural networks. J. Big Data 2020, 7, 28.
- Forbes, C.; Evans, M.; Hastings, N.; Peacock, B. Statistical Distributions; John Wiley & Sons: Hoboken, NJ, USA, 2011.
- Landis, J.R.; Koch, G.G. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics 1977, 33, 363–374.
- Raza, K. Chapter 8—Improving the prediction accuracy of heart disease with ensemble learning and majority voting rule. In U-Healthcare Monitoring Systems; Dey, N., Ashour, A.S., Fong, S.J., Borra, S., Eds.; Advances in Ubiquitous Sensing Applications for Healthcare; Academic Press: Cambridge, MA, USA, 2019; pp. 179–196.
- Quinlan, J.R. Learning decision tree classifiers. ACM Comput. Surv. (CSUR) 1996, 28, 71–72.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Schonlau, M.; Zou, R.Y. The random forest algorithm for statistical learning. Stata J. 2020, 20, 3–29.
- Almomany, A.; Ayyad, W.R.; Jarrah, A. Optimized implementation of an improved KNN classification algorithm using Intel FPGA platform: Covid-19 case study. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 3815–3827.
- Mahabir, C.; Hicks, F.; Fayek, A.R. Application of fuzzy logic to forecast seasonal runoff. Hydrol. Process. 2003, 17, 3749–3762.
- Özger, M. Comparison of fuzzy inference systems for streamflow prediction. Hydrol. Sci. J. 2009, 54, 261–273.
- Mpallas, L.; Tzimopoulos, C.; Evangelides, C. Comparison between neural networks and adaptive neuro-fuzzy inference system in modeling lake Kerkini water level fluctuation lake management using artificial intelligence. J. Environ. Sci. Technol. 2011, 4, 366–376.
- Precup, R.E.; Roman, R.C.; Hedrea, E.L.; Petriu, E.M.; Bojan-Dragos, C.A. Data-driven model-free sliding mode and fuzzy control with experimental validation. Int. J. Comput. Commun. Control 2021, 16, 4076.
- Xiong, L.; Shamseldin, A.Y.; O’Connor, K.M. A non-linear combination of the forecasts of rainfall-runoff models by the first-order Takagi–Sugeno fuzzy system. J. Hydrol. 2001, 245, 196–217.
- Tabbussum, R.; Dar, A.Q. Performance evaluation of artificial intelligence paradigms—Artificial neural networks, fuzzy logic, and adaptive neuro-fuzzy inference system for flood prediction. Environ. Sci. Pollut. Res. 2021, 28, 25265–25282.
- Bastian, N.; Bierbrauer, D.; McKenzie, M.; Nack, E. ACI IoT Network Traffic Dataset 2023. IEEE Dataport, 29 December 2023.
- Koutris, A.; Siozos, T.; Kopsinis, Y.; Pikrakis, A.; Merk, T.; Mahlig, M.; Papaharalabos, S.; Karlsson, P. Deep Learning-Based Indoor Localization Using Multi-View BLE Signal. Sensors 2022, 22, 2759.
- Thakur, N.; Han, C.Y. Multimodal Approaches for Indoor Localization for Ambient Assisted Living in Smart Homes. Information 2021, 12, 114.
Ref. | Year | Description | Method | IoT Area | Results |
---|---|---|---|---|---|
[6] | 2020 | Routing protocol for IoT | Fuzzy Logic, Multi-Fuzzy Model, Trust Model | IoT Routing Security | Improved network performance and attack detection |
[5] | 2020 | Routing optimization in IoT | Fuzzy Logic, Reinforcement Learning | IoT Routing Optimization | Improved network lifetime and energy efficiency |
[7] | 2021 | Security assessment in fog-based IoT | IVIFS-AHP, MCDM | Fog-IoT Security | Prioritization of security factors |
[10] | 2022 | Security mechanism for detecting malicious behavior | Fuzzy Logic, Fog Computing, Zero-Trust Policy | Fog-IoT Security | High accuracy, low resource overhead |
[9] | 2023 | Attack detection in IoT | Fuzzy Neural Network, SDN | IoT Security | Enhanced attack detection accuracy |
[11] | 2023 | Security risk assessment in industrial IoT | Fuzzy Inference System, Linguistic Variables | IIoT Security | Improved decision support |
[8] | 2023 | Secure architecture for IoT | Fuzzy Logic, Fog Computing, DDoS Prevention | IoT Security | Enhanced real-time attack detection accuracy |
[3] | 2023 | Cyber defense security for next-gen IoT | Fuzzy rule-based classifiers, Machine learning | CIoT | High web spam detection accuracy |
[4] | 2023 | Threat detection in IoT Networks | Fuzzy DL, Optimized ANFIS, Fuzzy Matching | Blockchain-based IoT Security | Superior performance in accuracy and F1-score |
[12] | 2024 | Hybrid attack detection system for IoT networks | Fuzzy Logic, Decision Tree, Clustering | IoT Security | High accuracy, performance, and detection success |
[13] | 2025 | Secure relay selection algorithm | Fuzzy Logic, Tabu Algorithm | WSN Security | Energy efficiency, reduced retransmissions, isolation of malicious nodes |
[14] | 2025 | Intrusion detection for IoT | Fuzzy Logic, CNN, Entropy, Feature Selection | IoT Security | Successful attack detection, reliable classification results |
ML Model | Precision | Recall | Accuracy | F-Score |
---|---|---|---|---|
DT | 0.9968 | 0.9968 | 0.9968 | 0.9968 |
RF | 0.9964 | 0.9964 | 0.9964 | 0.9964 |
KNN | 0.9935 | 0.9935 | 0.9935 | 0.9934 |
SVM | 0.9545 | 0.9542 | 0.9542 | 0.9539 |
NB | 0.8897 | 0.8853 | 0.8853 | 0.8848 |
LR | 0.8599 | 0.8602 | 0.8602 | 0.8567 |
Features | B (Low) | B (Low–Medium) | B (Medium–High) | B (High) |
---|---|---|---|---|
Idle Max | 0.0 | 0.0327408205 | 0.4778552243 | 1.0 |
Fwd Seg Size Min | 0.0 | 0.1818181818 | 0.5454545454 | 1.0 |
Flow IAT Min | 0.0 | 0.0000532440 | 0.0001277126 | 1.0 |
Flow IAT Mean | 0.0 | 0.0000634909 | 0.0007005925 | 1.0 |
Features | Low | Medium | High |
---|---|---|---|
Idle Max | 0.0326968103818217 | 0.0484954050170927 | 0.4888663728675055 |
Fwd Seg Size Min | 0.0 | 0.5454545454545454 | 0.7272727272727273 |
Flow IAT Min | 0.0000054639068678 | 0.0000737378312857 | 0.0306466882538042 |
Flow IAT Mean | 0.0000491087370405 | 0.0001005757563467 | 0.0830450854138886 |
Values | Idle Max | Fwd Seg Size Min | Flow IAT Min | Flow IAT Mean |
---|---|---|---|---|
min | 0.0 | 0.0 | 0.0 | 0.0 |
q1 | 0.032698084930 | 0.0 | 6.4935792867 × 10⁻⁶ | 4.9573750445 × 10⁻⁵ |
q2 | 0.032717498164 | 0.181818181818 | 4.1020820558 × 10⁻⁵ | 5.3825301572 × 10⁻⁵ |
q3 | 0.051290879003 | 0.545454545455 | 7.4734288465 × 10⁻⁵ | 0.000102170088 |
q4 | 0.478048911682 | 0.545454545455 | 0.000134654580 | 0.000947735184 |
q5 | 0.944537272628 | 0.727272727273 | 0.207061827262 | 0.214860366937 |
max | 1.0 | 1.0 | 1.0 | 1.0 |
Idle Max | Fwd Seg Size Min | Flow IAT Min | Flow IAT Mean | Security Status Score |
---|---|---|---|---|
0.006221688797 | 0.094042473326 | 0.000000000000 | 0.000000000000 | 0.305145532867 |
0.264933761292 | 0.776827809269 | 0.008247201600 | 0.018999679827 | 0.127784707465 |
0.021500476863 | 0.402127314907 | 0.005735505107 | 0.011737656402 | 0.500000000000 |
0.264295998754 | 0.093972995210 | 0.008219071188 | 0.506646538419 | 0.305009254350 |
0.723005723412 | 0.357606249770 | 0.500116286997 | 0.000501555938 | 0.660724529715 |
0.900000000000 | 0.500000000000 | 0.700000000000 | 0.000080000000 | 0.817686940117 |
0.999900000000 | 0.533000000000 | 0.999000000000 | 0.990000000000 | 0.889943930530 |
0.550000000000 | 0.500000000000 | 0.000250000000 | 0.002000000000 | 0.457817362860 |
Figure 11 | Input Values (Idle Max, Fwd Seg Size Min, Flow IAT Min, Flow IAT Mean) | Fuzzy Score | Fuzzy Decision | Security Analyst Insight |
---|---|---|---|---|
(a) | 0.0062,0.0940,0.0000,0.0000 | 0.3051 | Secure | Low risk. Routine monitoring is sufficient. |
(b) | 0.2649,0.7768,0.0082,0.0190 | 0.1278 | Secure | Normal network behavior. No intervention required. |
(c) | 0.0215,0.4021,0.0057,0.0117 | 0.5000 | Partially insecure | Gray area. Potentially suspicious traffic. Analyst should conduct additional log analysis. |
(d) | 0.2643,0.0940,0.0082,0.5066 | 0.3050 | Secure | Mostly safe, but high traffic load detected. Monitoring is recommended. |
(e) | 0.7230,0.3576,0.5001,0.0005 | 0.6607 | Partially insecure | Potential transition toward attack. Early warning. Anomaly monitoring should be increased. |
(f) | 0.9000,0.5000,0.7000,0.00008 | 0.8177 | Insecure | Serious risk. Device should be quarantined or disconnected from the network. |
(g) | 0.9999,0.5330,0.9990,0.9900 | 0.8899 | Insecure | High risk of security breach. Immediate isolation and incident response required. |
(h) | 0.5500,0.5000,0.00025,0.0020 | 0.4578 | Partially insecure | Uncertain case. Analyst should further investigate abnormal traffic patterns. |
Results | (a) | (b) | (c) | (d) | (e) | (f) | (g) | (h) |
---|---|---|---|---|---|---|---|---|
Fuzzy | Secure | Secure | Partially Insecure | Secure | Partially Insecure | Insecure | Insecure | Partially Insecure |
Ensemble | Secure | Secure | Partially Insecure | Secure | Insecure | Insecure | Insecure | Insecure |
Stage | 25% dataset: Time (s) | 25% dataset: Memory (MB) | 25% dataset: CPU (%) | 50% dataset: Time (s) | 50% dataset: Memory (MB) | 50% dataset: CPU (%) |
---|---|---|---|---|---|---|
I | 12.23 | 948.84 | 12.9 | 14.93 | 1044.69 | 9.9 |
II | 47.03 | 77.20 | 5.5 | 107.77 | 31.78 | 4.3 |
III | 53.95 | 84.13 | 6.2 | 95.21 | 147.49 | 6.0 |
IV | 0.135 | 1.02 | 5.01 | 0.124 | 0.95 | 4.09 |
Stage | 75% dataset: Time (s) | 75% dataset: Memory (MB) | 75% dataset: CPU (%) | 100% dataset: Time (s) | 100% dataset: Memory (MB) | 100% dataset: CPU (%) |
---|---|---|---|---|---|---|
I | 17.85 | 1102.26 | 14.3 | 19.75 | 1056.17 | 8.4 |
II | 130.57 | 170.66 | 3.3 | 158.94 | 67.64 | 7.2 |
III | 145.95 | 191.7 | 8.8 | 171.08 | 240.02 | 11.4 |
IV | 0.143 | 0.85 | 5.33 | 0.128 | 0.86 | 5.88 |
Dataset size | Precision | Recall | Accuracy | F-Score |
---|---|---|---|---|
25% | 0.9946 | 0.9946 | 0.9946 | 0.9946 |
50% | 0.9955 | 0.9955 | 0.9955 | 0.9955 |
75% | 0.9961 | 0.9961 | 0.9961 | 0.9960 |
100% | 0.9967 | 0.9967 | 0.9967 | 0.9967 |
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Karakaya, A. A Hybrid Approach for IoT Security: Combining Ensemble Learning with Fuzzy Logic. Sensors 2025, 25, 5668. https://doi.org/10.3390/s25185668