Towards a Lightweight Detection System for Cyber Attacks in the IoT Environment Using Corresponding Features
Abstract
1. Introduction
1.1. Intrusion Detection System (IDS)
1.2. Common Attacks in the IoT Environment
- Probing attacks (reconnaissance): These are malicious activities that gather information about targets through remote scanning. They are often divided into two subclasses: port scanning and OS fingerprinting.
- DDoS attacks: These are launched collaboratively by many compromised hosts (called bots). Such attacks try to disrupt the availability of services to legitimate users.
- Information theft attacks: In these attacks, an adversary seeks to obtain sensitive data. They can be subcategorized into data theft and keylogging.
1.3. Our Contributions
- A machine learning (ML)-based lightweight IDS is proposed and implemented on a Raspberry Pi system.
- To cope with the resource constraints of IoT devices, a novel feature selection algorithm called correlated-set thresholding on gain-ratio (CST-GR) is proposed for selecting essential features. In our experiments, this algorithm greatly reduces the number of features.
- The essential features are selected for each specific kind of attack. Thus, good detection performance can be expected.
- The detection performance of our proposal is examined in detail using the botnet dataset Bot-IoT [6], which was collected in a simulated IoT environment. We observe that the CST-GR algorithm can significantly reduce the processing time with almost no sacrifice in detection accuracy. We also observe that, without the help of the CST-GR algorithm, the Raspberry Pi cannot handle the entire dataset used in our experiments.
- We try four tree-based classifiers: J48, the Hoeffding tree (VFDT, very fast decision tree), the logistic model tree (LMT), and random forest (RF). The aim is to determine which classifier best suits the IoT environment in terms of both how lightweight it is and its detection performance. According to our discussion and experimental results, the J48 algorithm is found to be the most suitable for our detection system.
- We measure the degree to which the processing time for training and testing can be decreased when the Raspberry Pi device is used in multithreading mode.
1.4. Organization of the Paper
2. Related Works
2.1. Public IDS
2.2. Machine Learning-Based IDS for the IoT Environment
2.3. Raspberry Pi-Based IDS
2.4. Feature Selection
3. Our IDS Proposal
3.1. A New Algorithm for Feature Selection
Algorithm 1 Correlated-Set Thresholding on Gain-Ratio (CST-GR)
Input: Feature Set (FI)
Output: Selected Feature Set (FS)
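The individual steps of Algorithm 1 are not reproduced in this text. As a rough illustration of the gain-ratio measure on which CST-GR places its threshold, the following Python sketch computes the gain ratio of a discrete feature and keeps the features that reach a cutoff. The helper names (`entropy`, `gain_ratio`, `select_by_threshold`), the toy data, and the threshold value are assumptions made for illustration only. The correlated-set step is not modeled, so this should not be read as the authors' algorithm.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """Gain ratio of one discrete feature with respect to the class labels."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Information gain = H(class) - H(class | feature).
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information = entropy of the feature's own value distribution.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


def select_by_threshold(features, labels, threshold):
    """Hypothetical filter: keep features whose gain ratio reaches the threshold."""
    return [name for name, values in features.items()
            if gain_ratio(values, labels) >= threshold]


# Toy data (not taken from Bot-IoT): feature "a" separates the classes, "b" does not.
labels = ["attack", "attack", "normal", "normal"]
features = {
    "a": ["hi", "hi", "lo", "lo"],
    "b": ["x", "y", "x", "y"],
}
selected = select_by_threshold(features, labels, threshold=0.5)
print(selected)
```

On the toy data, feature `a` determines the class completely while feature `b` carries no class information, so only `a` passes the cutoff. Continuous features such as packet rates would first need to be discretized, which this sketch leaves out.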
3.2. The General Flow of the IDS Proposal
3.3. Tree-Based Classifiers
3.3.1. J48
3.3.2. Hoeffding Tree
3.3.3. Logistic Model Tree
3.3.4. Random Forest
4. Experiments
4.1. Dataset
4.2. Features Selected by the CST-GR Algorithm
4.3. Performance Evaluation
4.3.1. Detection Accuracy
4.3.2. Evaluation of Processing Time
4.3.3. Processing Time on the Parallel Mode
4.3.4. CPU and Memory Usage on the Parallel Mode
4.4. Observations
- Using our proposed feature selection algorithm (CST-GR) for each kind of attack, the number of features can be greatly decreased, and the detection system becomes much lighter and faster with almost no sacrifice in detection accuracy (see Table 2 and Table 4, Figure 4 and Figure 5). Moreover, the Raspberry Pi device can handle many more instances.
- When J48 or RF is used as the classifier, the detection accuracy (TPR) still reaches up to 99.4%, even with only the few features selected by the CST-GR algorithm.
- The detection system can be run in parallel mode on the Raspberry Pi. However, it cannot handle all the data in parallel mode if the original features are used without the help of the CST-GR algorithm.
- With J48 as the classifier, the response time for detection is the shortest. Its training time is a little longer than that of the VFDT (but still shorter than that of the other two classifiers), and its overall detection accuracy is better than that of the VFDT.
- Although the detection accuracy (TPR and FPR) of the RF is slightly better than that of the J48, detection with the J48 is around ten times faster than with the RF. Therefore, J48 is the best choice for the classifier in our detection system.
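The TPR and FPR figures referred to in these observations follow the standard confusion-matrix definitions. The short Python sketch below states them explicitly. The function names and the example counts are made up for illustration and are not results from our experiments.

```python
def true_positive_rate(tp, fn):
    """TPR = TP / (TP + FN): fraction of attack instances that are detected."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """FPR = FP / (FP + TN): fraction of normal instances flagged as attacks."""
    return fp / (fp + tn)


# Illustrative counts only: 90 of 100 attacks detected, 5 of 100 normal flows flagged.
print(true_positive_rate(tp=90, fn=10))
print(false_positive_rate(fp=5, tn=95))
```

A classifier is preferable when its TPR is high and its FPR is low at the same time, which is why both values are reported when J48 and RF are compared.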
5. Conclusions and Future Work
Author Contributions
Funding
Conflicts of Interest
References
- Cisco. Cisco Visual Networking Index (VNI) Global Mobile Data Traffic Forecast Update, 2017–2022 White Paper; Cisco Systems Inc.: San Jose, CA, USA, 2019.
- Spamhaus Malware Labs. Spamhaus Botnet Threat Report 2019; Spamhaus Malware Labs: Geneva, Switzerland, 2018.
- Haider, W.; Creech, G.; Xie, Y.; Hu, J. Windows based data sets for evaluation of robustness of Host based Intrusion Detection Systems (IDS) to zero-day and stealth attacks. Future Internet 2016, 8, 29.
- CPS Technologies. Cyber Attack Trends Analysis Report; CPS Technologies: Norton, MA, USA, 2019; Volume 1.
- Zitta, T.; Neruda, M.; Vojtech, L. The security of RFID readers with IDS/IPS solution using Raspberry Pi. In Proceedings of the 2017 18th International Carpathian Control Conference, Sinaia, Romania, 28–31 May 2017; pp. 316–320.
- Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the Development of Realistic Botnet Dataset in the Internet of Things for Network Forensic Analytics: Bot-IoT Dataset. arXiv 2018, arXiv:1811.0070.
- Shah, S.A.R.; Issac, B. Performance comparison of intrusion detection systems and application of machine learning to Snort system. Future Gener. Comput. Syst. 2018, 80, 157–170.
- Amini, P.; Araghizadeh, M.A.; Azmi, R. A survey on Botnet: Classification, detection and defense. In Proceedings of the 2015 International Electronics Symposium (IES), Surabaya, Indonesia, 29–30 September 2016; pp. 233–238.
- Hassija, V.; Chamola, V.; Saxena, V.; Jain, D.; Goyal, P.; Sikdar, B. A Survey on IoT Security: Application Areas, Security Threats, and Solution Architectures. IEEE Access 2019, 7, 82721–82743.
- Amin, F.; Ahmad, A.; Choi, G.S. Towards Trust and Friendliness Approaches in the Social Internet of Things. Appl. Sci. 2019, 9, 166.
- Baker, A.R.; Esler, J. Snort IDS, IPS Toolkit; 30 Corporate Dr.; Elsevier Inc.: Burlington, MA, USA, 2007; ISBN 9783540449119.
- OISF. Suricata User Guide; Open Information Security Foundation: Boston, MA, USA, 2019.
- Tirumala, S.S.; Sathu, H.; Sarrafzadeh, A. Free and open source intrusion detection systems: A study. In Proceedings of the 2015 International Conference on Machine Learning and Cybernetics (ICMLC), Guangzhou, China, 12–15 July 2015; Volume 1, pp. 205–210.
- Sforzin, A.; Marmol, F.G.; Conti, M.; Bohli, J.M. RPiDS: Raspberry Pi IDS—A Fruitful Intrusion Detection System for IoT. In Proceedings of the 2016 International IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Toulouse, France, 18–21 July 2016; pp. 440–448.
- Cervantes, C.; Poplade, D.; Nogueira, M.; Santos, A. Detection of sinkhole attacks for supporting secure routing on 6LoWPAN for Internet of Things. In Proceedings of the 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM), Ottawa, ON, Canada, 11–15 May 2015; pp. 606–611.
- Guo, Z.; Harris, I.G.; Jiang, Y.; Tsaur, L.F. An efficient approach to prevent battery exhaustion attack on BLE-based mesh networks. In Proceedings of the 2017 International Conference on Computing, Networking and Communications (ICNC), Santa Clara, CA, USA, 26–29 January 2017; pp. 1–5.
- Anthi, E.; Williams, L.; Burnap, P. Pulse: An Adaptive Intrusion Detection for the Internet of Things. In Proceedings of the Living in the Internet of Things: Cybersecurity of the IoT-2018, London, UK, 28–29 March 2018; p. 35.
- Nobakht, M.; Sivaraman, V.; Boreli, R. A host-based intrusion detection and mitigation framework for smart home IoT using OpenFlow. In Proceedings of the 2016 11th International Conference on Availability, Reliability and Security (ARES), Salzburg, Austria, 31 August–2 September 2016; pp. 147–156.
- Fu, Y.; Yan, Z.; Cao, J.; Koné, O.; Cao, X. An Automata Based Intrusion Detection Method for Internet of Things. Mob. Inf. Syst. 2017, 2017, 1750637.
- Kyaw, A.K.; Chen, Y.; Joseph, J. Pi-IDS: Evaluation of open-source intrusion detection systems on Raspberry Pi 2. In Proceedings of the 2015 Second International Conference on Information Security and Cyber Forensics (InfoSec), Cape Town, South Africa, 15–17 November 2015; pp. 165–170.
- Da Silva Cardoso, A.M.; Lopes, R.F.; Teles, A.S.; Magalhaes, F.B.V. Real-time DDoS detection based on complex event processing for IoT. In Proceedings of the Third International Conference on Internet-of-Things Design and Implementation (IoTDI 2018), Orlando, FL, USA, 17–20 April 2018; pp. 273–274.
- von Sperling, T.L.; de Caldas Filho, F.L.; de Sousa, R.T.; e Martins, L.M.C.; Rocha, R.L. Tracking intruders in IoT networks by means of DNS traffic analysis. In Proceedings of the 2017 Workshop on Communication Networks and Power Systems (WCNPS), Brasília, Brazil, 16–17 November 2017; pp. 1–4.
- Aspernäs, A.; Simonsson, T. IDS on Raspberry Pi: A Performance Evaluation; Linnaeus University: Vaxjo, Sweden, 2015.
- Khater, B.S.; Wahid, A.; Abdul, B.; Yamani, M.; Bin, I.; Hussain, M.A.; Ibrahim, A.A. A Lightweight Perceptron-Based Intrusion Detection System for Fog Computing. Appl. Sci. 2019, 9, 178.
- Creech, G.; Hu, J. Generation of a new IDS test dataset: Time to retire the KDD collection. In Proceedings of the 2013 IEEE Wireless Communications and Networking Conference (WCNC), Shanghai, China, 7–10 April 2013; pp. 4487–4492.
- Coşar, M.; Kiram, H.E. Performance Comparison of Open Source IDSs via Raspberry Pi. In Proceedings of the 2018 International Conference on Artificial Intelligence and Data Processing (IDAP), Malatya, Turkey, 16–17 September 2018.
- Tripathi, S. Raspberry Pi as an Intrusion Detection System, a Honeypot and a Packet Analyzer. In Proceedings of the 2018 International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS), Belgaum, India, 21–23 December 2018; pp. 80–85.
- Kohavi, R.; John, G.H. Wrappers for Feature Subset Selection. Artif. Intell. 1997, 97, 273–324.
- Kohavi, R.; Sommerfield, D. Feature subset selection using the wrapper method: Overfitting and dynamic search space topology. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining, Montreal, QC, Canada, 20–21 August 1995; pp. 192–197.
- Feng, Y.; Akiyama, H.; Lu, L.; Sakurai, K. Feature Selection for Machine Learning-Based Early Detection of Distributed Cyber Attacks. In Proceedings of the 2018 IEEE 16th International Conference on Dependable, Autonomic and Secure Computing, 16th International Conference on Pervasive Intelligence and Computing, 4th International Conference on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), Athens, Greece, 12–15 August 2018; pp. 173–180.
- Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16–28.
- Guyon, I.; Elisseeff, A. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
- Battiti, R. Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Netw. 1994, 5, 537–550.
- Karegowda, A.G.; Manjunath, A.S.; Ratio, G.; Evaluation, C.F. Comparative study of Attribute Selection Using Gain Ratio. Int. J. Inf. Technol. Knowl. Manag. 2010, 2, 271–277.
- Hall, M. Correlation-Based Feature Selection for Machine Learning. Ph.D. Thesis, University of Waikato, Hamilton, New Zealand, 1999.
- Soe, Y.N.; Feng, Y.; Santosa, P.I.; Hartanto, R.; Sakurai, K. Implementing Lightweight IoT-IDS on Raspberry Pi Using Correlation-Based Feature Selection and Its Performance Evaluation. In Advanced Information Networking and Applications, Proceedings of the 33rd International Conference on Advanced Information Networking and Applications AINA-2019; Advances in Intelligent Systems and Computing, Matsue, Japan, 27–29 March 2019; Springer: Cham, Switzerland, 2019; Volume 926, pp. 458–469.
- Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, Australia, 10–12 November 2015.
- Kuhn, M.; Johnson, K. An Introduction to Feature Selection. In Applied Predictive Modeling; Springer: New York, NY, USA, 2013; pp. 487–519.
- Witten, I.H.; Frank, E. Data Mining. Practical Machine Learning Tools and Techniques with Java Implementations, 2nd ed.; Morgan Kaufmann Series in Data Management Systems; Elsevier Inc.: San Francisco, CA, USA, 2005; ISBN 0080890369.
- Ashari, A.; Paryudi, I.; Min, A. Performance Comparison between Naïve Bayes, Decision Tree and k-Nearest Neighbor in Searching Alternative Design in an Energy Simulation Tool. Int. J. Adv. Comput. Sci. Appl. 2013, 4, 33–39.
- Domingos, P.; Hulten, G. Mining high-speed data streams. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2000; pp. 71–80.
- Hulten, G.; Spencer, L. Mining Time-Changing Data Streams. Comput. Sci. 2001.
- Landwehr, N.; Hall, M.; Frank, E. Logistic model trees. Lect. Notes Artif. Intell. (Subseries Lect. Notes Comput. Sci.) 2005, 2837, 241–252.
- Friedman, J.; Hastie, T.; Tibshirani, R. Additive Logistic Regression: A Statistical View of Boosting. Ann. Stat. 2000, 28, 337–407.
- Breiman, L. Random Forests—Random Feature; University of California: Berkeley, CA, USA, 2001; pp. 1–33.
- Oshiro, T.M.; Perez, P.S.; Baranauskas, J.A. How Many Trees in a Random Forest? In Machine Learning and Data Mining in Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2012; Volume 3587, pp. 154–168. ISBN 978-3-540-26923-6.
- Node-RED. Available online: https://nodered.org/ (accessed on 1 August 2019).
- Ostinato. Available online: https://ostinato.org/ (accessed on 1 August 2019).
- Hping. Available online: http://hping.org/ (accessed on 1 August 2019).
- GoldenEye. Available online: https://github.com/jseidl/GoldenEye (accessed on 1 August 2019).
- Lyon, G.F. Nmap Network Scanning: The official Nmap Project Guide to Network Discovery and Security Scanning; Insecure. Com LLC: Sunnyvale, CA, USA, 1990.
- Xprobe2. Available online: https://www.aldeid.com/wiki/Xprobe2 (accessed on 7 August 2019).
- Hisham Htop—An Interactive Process Viewer for Unix. Available online: http://hisham.hm/htop/ (accessed on 15 July 2019).
References | Detection Method | Pi Model | Tools | Threats | Environment |
---|---|---|---|---|---|
Kyaw et al. [20] | Misuse-based | Pi 2-B | Snort, Bro | SYN flood, ARP spoofing, port scanning | Conventional |
Coşar et al. [26] | Misuse-based | - | Snort, Suricata | SYN flood, Smurf, UDP flood | Conventional |
Tripathi et al. [27] | Misuse-based | Pi 3-B | Snort | ICMP ping, brute-force | Conventional |
Sforzin et al. [14] | Misuse-based | Pi 2-B | Snort | - | IoT |
Cardoso et al. [21] | Pattern matching | Pi 3-B | Complex Event Processing | SYN flood, UDP flood, ICMP flood, port scanning | IoT |
Zitta et al. [5] | Misuse-based | Pi 3 | Suricata | Port scanning | IoT |
Sperling et al. [22] | Traffic analyzing | Pi 3-B | Python, DPKT | MITM, DoS, DNS cache poisoning | IoT |
Feature Name | Description | Attacks |
---|---|---|
TnBPDstIP | Total number of bytes per destination IP | DDoS |
drate | Destination-to-source packets per second | DDoS |
N_IN_Conn_P_DstIP | Number of inbound connections per destination IP | Reconnaissance |
AR_P_Proto_P_SrcIP | Average rate per protocol per source IP | Reconnaissance |
AR_P_Proto_P_Dport | Average rate per protocol per dport | Reconnaissance, Theft |
TnP_PDstIP | Total number of packets per destination IP | Reconnaissance |
TnP_PerProto | Total number of packets per protocol | Reconnaissance, Theft |
state_number | Numerical representation of feature state | Theft |
Attack Type | Number of Instances for Training | Number of Instances for Testing | Total Number of Instances |
---|---|---|---|
DDoS | 54,651 | 27,326 | 81,977 |
Reconnaissance | 54,706 | 27,354 | 82,060 |
Theft | 367 | 189 | 556 |
Classifier | All Features: Training (s) | All Features: Testing (s) | CST-GR: Training (s) | CST-GR: Testing (s) |
---|---|---|---|---|
J48 | 57.65 | 1.22 | 8.61 | 0.81 |
VFDT | 33.72 | 1.58 | 4.97 | 0.92 |
LMT | N/A | N/A | 1198.37 | 0.99 |
RF | 422.4 | 11.95 | 184.22 | 10.76 |
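To make the effect of feature selection on processing time concrete, the following Python sketch computes the relative time reduction from the values in the table above. It assumes the times are in seconds (the same values appear in the sequential columns of the parallel-mode table, which are given in seconds). The dictionary layout and the helper name `reduction_percent` are illustrative choices. LMT is left out because no all-features measurement is available for it.

```python
# Processing times copied from the table above: (training, testing).
times = {
    "J48": {"all": (57.65, 1.22), "cst_gr": (8.61, 0.81)},
    "VFDT": {"all": (33.72, 1.58), "cst_gr": (4.97, 0.92)},
    "RF": {"all": (422.4, 11.95), "cst_gr": (184.22, 10.76)},
}


def reduction_percent(before, after):
    """Relative reduction of a processing time, in percent."""
    return 100.0 * (before - after) / before


for name, t in times.items():
    train = reduction_percent(t["all"][0], t["cst_gr"][0])
    test = reduction_percent(t["all"][1], t["cst_gr"][1])
    print(f"{name}: training reduced by {train:.1f}%, testing by {test:.1f}%")

# Ratio of RF to J48 testing time with the CST-GR features.
rf_to_j48 = times["RF"]["cst_gr"][1] / times["J48"]["cst_gr"][1]
print(f"RF/J48 testing-time ratio: {rf_to_j48:.1f}")
```

The last value can be compared with the statement in Section 4.4 that detection with J48 is around ten times faster than with RF.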
Rank | All Features: Training | All Features: Testing | CST-GR Features: Training | CST-GR Features: Testing |
---|---|---|---|---|
1 | VFDT | J48 | VFDT | J48 |
2 | J48 | VFDT | J48 | VFDT |
3 | RF | RF | RF | LMT |
4 | N/A | N/A | LMT | RF |
Classifier | Training, Sequential (s) | Training, Parallel (s) | Testing, Sequential (s) | Testing, Parallel (s) |
---|---|---|---|---|
J48 | 8.61 | 4.47 | 0.81 | 0.42 |
VFDT | 4.97 | 3.03 | 0.92 | 0.48 |
LMT | 1198.37 | 867.49 | 0.99 | 0.5 |
RF | 184.22 | 102.14 | 10.76 | 5.92 |
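The gain from the parallel mode can be expressed as a speedup factor, that is, sequential time divided by parallel time. The Python sketch below derives these factors from the table above. The variable names and the helper `speedup` are illustrative assumptions, and nothing is assumed about the number of threads used.

```python
# Times in seconds copied from the table above: (training, testing).
sequential = {
    "J48": (8.61, 0.81),
    "VFDT": (4.97, 0.92),
    "LMT": (1198.37, 0.99),
    "RF": (184.22, 10.76),
}
parallel = {
    "J48": (4.47, 0.42),
    "VFDT": (3.03, 0.48),
    "LMT": (867.49, 0.5),
    "RF": (102.14, 5.92),
}


def speedup(seq_time, par_time):
    """Speedup factor: how many times faster the parallel run is."""
    return seq_time / par_time


speedups = {
    name: (speedup(sequential[name][0], parallel[name][0]),
           speedup(sequential[name][1], parallel[name][1]))
    for name in sequential
}

for name, (train, test) in speedups.items():
    print(f"{name}: training x{train:.2f}, testing x{test:.2f}")
```

A factor close to 2 means the parallel run takes about half the sequential time. Factors well below that, as for LMT training, indicate that a larger share of the work did not benefit from the parallel mode.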
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Soe, Y.N.; Feng, Y.; Santosa, P.I.; Hartanto, R.; Sakurai, K. Towards a Lightweight Detection System for Cyber Attacks in the IoT Environment Using Corresponding Features. Electronics 2020, 9, 144. https://doi.org/10.3390/electronics9010144