Computational System to Classify Cyber Crime Offenses using Machine Learning
Abstract
1. Introduction
2. Related Works
3. Proposed Methodology
3.1. Information Gathering (Reconnaissance)
3.2. Preprocessing
Python code for calculating the tf-idf vector of the Incident field

    from sklearn.feature_extraction.text import TfidfVectorizer

    # Build L2-normalised tf-idf features (unigrams and bigrams) from the Incident column
    tfidf = TfidfVectorizer(sublinear_tf=True, min_df=5, norm='l2', encoding='latin-1',
                            ngram_range=(1, 2), stop_words='english')
    feat_crime = tfidf.fit_transform(df.Incident).toarray()
    feat_crime.shape  # (number of incidents, number of tf-idf features)
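Python code for calculating the words most correlated with each class

    from sklearn.feature_selection import chi2
    import numpy as np

    N = 2  # number of top terms reported per class
    # labels is assumed to hold the Category_id of each incident
    for Cyber_crime, c_id in sorted(c_to_id.items()):
        # chi-squared score of every tf-idf feature against this class (one-vs-rest)
        feat_crime_chi2 = chi2(feat_crime, labels == c_id)
        indices_crime = np.argsort(feat_crime_chi2[0])
        # get_feature_names_out() replaces get_feature_names() in scikit-learn >= 1.0
        feat_crime_names = np.array(tfidf.get_feature_names_out())[indices_crime]
        uni_grams = [j for j in feat_crime_names if len(j.split(' ')) == 1]
        bi_grams = [j for j in feat_crime_names if len(j.split(' ')) == 2]
        # label text in the print calls is illustrative
        print("# '{}':".format(Cyber_crime))
        print("Most correlated unigrams:\n. {}".format('\n. '.join(uni_grams[-N:])))
        print("Most correlated bigrams:\n. {}".format('\n. '.join(bi_grams[-N:])))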
3.3. Clustering and Classification
3.4. Prediction Analysis
4. Results and Analysis
Test Cases
5. Conclusions and Future Scope
Future Scope
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Incident | Offender | Access violation | Victim | Harm | Year | Location | Age of offender |
---|---|---|---|---|---|---|---|
Illegal downloading | CC | TI | Company | Loss of proprietary | 2013 | Delhi | 27 |
Pirated textbook | CC | TI | Individual | Loss of copyright | 2012 | Maharashtra | 38 |
Illegal downloading of application | CC | TI | Company | Loss of proprietary | 2013 | Hyderabad | 26 |
Pirated software | CC | TI | Company | Loss of intellectual rights | 2012 | Hyderabad | 22 |
Illegal downloading of music | CC | TI | Industry | Loss of proprietary | 2015 | Hyderabad | 20 |
Hacking of power plant communication | CH | TI | State | Infrastructure loss | 2013 | Delhi | 34 |
Hacking of smart phone | CH | TT | Individual | Loss of proprietary | 2014 | Gujarat | 23 |
Hacking of government website | CH | TI | State | Economic loss | 2015 | Maharashtra | 32 |
Stealing of credit card information | CC | TI | Individual | Financial loss | 2015 | Bangalore | 33 |
Illegal purchase of goods | CC | TI | Company | Loss of proprietary | 2013 | Bangalore | 29 |
Creating a fake account of reputed person | CC | TI | Individual | Loss of reputation | 2015 | Hyderabad | 21 |
Siphoned money from an individual account | CT | TI | Individual | Financial loss | 2015 | Tamil Nadu | 22 |
OTP theft | CC | TI | Individual | Financial loss | 2012 | Tamil Nadu | 24 |
Illegal purchase of goods | CC | TI | Company | Loss of proprietary | 2011 | Maharashtra | 26 |
KYC theft | CT | TI | Individual | Financial loss | 2013 | Maharashtra | 22 |
Creating a fake ID | CH | TS | Individual | Loss of reputation | 2014 | Bihar | 23 |
Spoof calling | CC | TT | Individual | Loss of privacy | 2011 | Gujarat | 24 |
Hacking of password of an account | CH | TI | Individual | Loss of reputation | 2013 | Bihar | 21 |
Illegal downloading of movie | CH | TI | Industry | Loss of proprietary | 2012 | Maharashtra | 22 |
Hacking of smart phone | CH | TT | Individual | Loss of security | 2011 | Hyderabad | 24 |
Illegal access of social account | CC | TI | Individual | Loss of proprietary | 2014 | Hyderabad | 21 |
OTP theft | CT | TI | Individual | Financial loss | 2012 | Maharashtra | 32 |
Illegal access of college website | CC | TI | Organisation | Financial loss | 2015 | Delhi | 23 |
Stealing of bank account details | CC | TI | Individual | Financial loss | 2013 | Maharashtra | 30 |
Content-Based Features |
---|
Incident |
Offender |
Harm |
Access Violation |
Year |
Victim |
Index | Cybercrime | Incident | Category_id |
---|---|---|---|
0 | Identity Theft | Email Id Theft | 0 |
1 | Copyright attack | Pirated application | 1 |
2 | Identity Theft | Illegal purchase of goods | 0 |
3 | Copyright attack | Posting an article without permission | 1 |
4 | Copyright attack | Making piracy of an application | 1 |
5 | Identity Theft | KYC theft | 0 |
6 | Identity Theft | Online shopping fraud | 0 |
7 | Hacking | Hacking of Smart phone | 2 |
8 | Copyright attack | Illegal downloading of movie | 1 |
9 | Identity Theft | Illegal access of bank account | 0 |
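Integer category ids of this kind can be derived directly from the class names. The sketch below assumes pandas and uses `pd.factorize`, which numbers classes in order of first appearance; that order matches the ids in the sample rows (Identity Theft 0, Copyright attack 1, Hacking 2), although the source does not state how the ids were actually assigned. The `c_to_id` name mirrors the dictionary used in the correlation listing.

```python
import pandas as pd

# A few rows from the sample table (assumed column names).
df = pd.DataFrame({
    "Cybercrime": ["Identity Theft", "Copyright attack", "Identity Theft", "Hacking"],
    "Incident": ["Email Id Theft", "Pirated application", "KYC theft",
                 "Hacking of Smart phone"],
})

# factorize assigns 0, 1, 2, ... in order of first appearance.
codes, uniques = pd.factorize(df["Cybercrime"])
df["Category_id"] = codes

# Lookup tables in both directions, for training and for reporting predictions.
c_to_id = {name: i for i, name in enumerate(uniques)}
id_to_c = {i: name for name, i in c_to_id.items()}

print(df)
print(c_to_id)
```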
Index | Crime Type | Total Crimes |
---|---|---|
0 | Copyright attack | 466 |
1 | Hacking | 282 |
2 | ID theft | 868 |
3 | Others | 1923 |
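The counts above are uneven across classes, which matters when reading any accuracy figure. As a rough illustration (an added check, not an analysis from the paper), the share of each class and the accuracy of always predicting the largest class can be computed from the table values alone:

```python
# Class counts copied from the crime-type summary table.
crime_counts = {"Copyright attack": 466, "Hacking": 282, "ID theft": 868, "Others": 1923}

total = sum(crime_counts.values())
shares = {name: count / total for name, count in crime_counts.items()}

# Print classes from largest to smallest with their share of all incidents.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<17}{crime_counts[name]:>6}{share:>8.1%}")

# Accuracy obtained by always predicting the most frequent class.
majority_class = max(crime_counts, key=crime_counts.get)
majority_baseline = crime_counts[majority_class] / total
print(majority_class, round(majority_baseline, 3))
```

A classifier is only informative to the extent that it beats this majority-class baseline, so per-class precision and recall are worth reporting alongside overall accuracy.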
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Ch, R.; Gadekallu, T.R.; Abidi, M.H.; Al-Ahmari, A. Computational System to Classify Cyber Crime Offenses using Machine Learning. Sustainability 2020, 12, 4087. https://doi.org/10.3390/su12104087