Ransomware Detection Using Machine Learning: A Survey
Abstract
1. Introduction
2. Background
3. Survey Planning
4. Literature Review
5. Evolution of Ransomware
- Distribution campaign: The attacker silently induces the victim to download the infection-starting dropper code. The attacker uses methods including email phishing, social engineering, and others.
- Malicious code injection: During this phase, the target’s computer is infected with ransomware, and malicious code is downloaded.
- Malicious payload staging: The ransomware establishes persistence by embedding itself in the system.
- Scanning: The ransomware scans the target computer and any network-accessible resources for files to encrypt.
- Encryption: The process of encrypting all of the selected documents begins.
- Payday: Victims cannot access their data, and a notification seeking payment is visible on the screen of the targeted device.
6. Ransomware Detection
6.1. Ransomware-Detection Methods
6.1.1. Manual Ransomware Detection
Scanning
6.1.2. Automated Ransomware Detection
Artificial-Intelligence-Based Approaches
- 1. Machine Learning Approaches
  - a. Machine Learning Algorithms for Ransomware Detection
- 2. Deep Learning Approaches
- 3. Artificial Neural Network Approaches
Non-Artificial-Intelligence-Based Methods
- 1. Packet Inspection
- 2. Traffic Analysis
6.2. Ransomware-Detection Techniques
6.2.1. Signature-Based Detection
6.2.2. Heuristic-Based Detection
6.2.3. Network-Based Detection
6.2.4. Hybrid Detection
6.3. Feature Extraction and Selection
6.3.1. Features Used for Ransomware Detection
- File access patterns are a common feature used to detect ransomware. Ransomware often accesses and encrypts files in a specific pattern, such as alphabetical order, extension type, or creation date. This behavior can be detected using file access patterns as features. For example, analysis of file access patterns may reveal that a large number of files are being accessed and modified in a short period of time, indicating a potential ransomware attack [48].
- System calls are another feature commonly used for ransomware detection. Ransomware frequently uses system calls to perform malicious activities, such as reading and writing files, creating processes, and network communication. System-call traces can be extracted and used as features for detection. For example, analysis of system-call traces may reveal that a process is making an unusually high number of system calls, which could indicate ransomware activity [34].
- Network traffic analysis is another valuable source of features for detecting ransomware. Typically, ransomware communicates with a command-and-control (C&C) server to receive commands. For example, analysis of network traffic may reveal that a large amount of data is being sent to an unusual IP address, which could indicate that the system is infected with ransomware [49].
- Behavioral analysis is another approach to ransomware detection. This involves monitoring the behavior of running processes and identifying anomalies that indicate malicious activity. Features such as process creation, termination, and file access can be used for this type of analysis. For example, the analysis of process creation and termination events may reveal that a process is spawning multiple child processes, which could indicate ransomware activity [1].
- Static analysis examines an executable file’s code without running it to spot signs of malicious activity. Features such as code size, entropy, and string patterns can be used for this purpose. For example, analysis of code size and entropy may reveal that a file contains obfuscated code, which could indicate ransomware activity [32]. Behavioral analysis and dynamic analysis are similar in that both monitor running processes to identify malicious activity. However, there are some key differences between the two approaches.
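The entropy feature mentioned in the last bullet can be illustrated with a short calculation. The following is a minimal sketch, not a method taken from the surveyed papers: it computes the Shannon entropy of a byte string, and the function names and the 7.5 bits-per-byte cut-off are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical cut-off; a real detector would tune it on labeled samples.
HIGH_ENTROPY_THRESHOLD = 7.5


def looks_packed_or_encrypted(data: bytes) -> bool:
    """Flag content whose byte distribution is close to uniform."""
    return shannon_entropy(data) >= HIGH_ENTROPY_THRESHOLD


if __name__ == "__main__":
    repetitive_text = b"hello world " * 100      # few distinct byte values
    uniform_bytes = bytes(range(256)) * 16       # every byte value equally often
    print(looks_packed_or_encrypted(repetitive_text))  # False
    print(shannon_entropy(uniform_bytes))               # 8.0
```

High entropy alone is not proof of ransomware, since compressed archives and media files also score close to 8 bits per byte, so a feature like this would normally be combined with the others listed above.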
6.3.2. Feature Selection Techniques
- Principal component analysis: This technique reduces the dimensionality of a dataset by projecting it onto a small number of components that explain most of the variance in the data. Examining how strongly each original feature contributes to those components can help identify redundant or irrelevant features and select the most informative ones for ransomware detection [50].
- Correlation analysis: This technique measures the pairwise correlation between features in a dataset. Highly correlated features may be redundant and can be removed to simplify the model and improve performance [27].
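As a rough illustration of correlation-based selection, the sketch below greedily drops any feature whose absolute Pearson correlation with an already-kept feature reaches a threshold. The feature names, the sample values, and the 0.95 threshold are hypothetical and are not drawn from the cited studies.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no linear relationship to report
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.95):
    """Keep a feature only if |r| with every already-kept feature is below threshold.

    `features` maps a feature name to its column of values.
    """
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical per-process measurements, for illustration only.
    features = {
        "files_written": [1, 5, 9, 40, 80],
        "kb_written": [2, 10, 18, 80, 160],      # exactly 2x files_written
        "child_processes": [3, 1, 4, 1, 5],
    }
    print(drop_correlated(features))  # ['files_written', 'child_processes']
```

The greedy pass depends on feature order, so which member of a correlated pair survives is a design choice; ranking features by relevance to the label first is one common alternative.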
6.4. Performance Evaluation of Machine Learning Models for Ransomware Detection
- Accuracy: Accuracy is the most straightforward evaluation metric, representing the proportion of correct predictions made by the model. It is calculated as the ratio of correct predictions to the total number of predictions. However, accuracy can be misleading on imbalanced datasets, where negative samples greatly outnumber positive samples [51,52].
- Precision: Precision is the proportion of samples predicted to be positive (flagged as ransomware by the model) that are true positives (samples correctly identified as ransomware). It is calculated as the ratio of true positives to the sum of true positives and false positives. A model with high precision raises few false alarms, making it less likely to mistakenly label benign files as ransomware [52].
- Recall: Recall is the proportion of actual positive samples that the model correctly identifies. It is calculated as the ratio of true positives to the sum of true positives and false negatives. A high recall score means that the model produces few false negatives, making it less likely to miss actual ransomware samples [13,52].
- ROC curve: A receiver operating characteristic (ROC) curve graphically represents the performance of a binary classifier as its discrimination threshold is varied. It plots the true-positive rate (TPR) against the false-positive rate (FPR) at each threshold value. The area under the ROC curve (AUC) summarizes the model’s overall performance, with higher AUC values indicating better performance [53].
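The four metrics above can be computed directly from labels and predictions. The sketch below is illustrative only: the function names are our own, labels are assumed to be 1 for ransomware and 0 for benign, and AUC is computed through its rank interpretation (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one) rather than by integrating the curve.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = ransomware, 0 = benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Imbalanced toy data: 95 benign samples, 5 ransomware samples,
    # and a degenerate model that labels everything benign.
    y_true = [0] * 95 + [1] * 5
    y_pred = [0] * 100
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    print(accuracy(tp, fp, tn, fn))  # 0.95
    print(recall(tp, fn))            # 0.0
```

The toy run shows why accuracy alone is misleading on imbalanced data: the model scores 95% accuracy while detecting none of the ransomware samples.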
7. Challenges and Future Directions
7.1. Challenges in Developing Effective Machine-Learning-Based Ransomware-Detection Systems
- Rapidly evolving ransomware—Ransomware is a constantly changing threat, with new variants and attack techniques being developed regularly. This makes it challenging to build machine learning models that can detect all ransomware accurately and quickly [56].
- Adversarial attacks involve modifying the input data to bypass the machine learning model’s detection capabilities. Attackers can use such attacks to evade ransomware-detection systems, making the systems less effective [56].
- Real-time detection requirements—Ransomware can spread rapidly and cause significant damage within a short time-frame. Therefore, ransomware-detection systems must be able to detect ransomware in real-time to prevent further spread and damage [57].
- One of the main challenges in collecting data for ransomware detection is the lack of publicly available datasets that include real-world ransomware samples. This is due to the sensitive nature of the data and the fact that many victims are reluctant to report ransomware attacks. As a result, researchers often rely on synthetic datasets or datasets generated from sandbox environments, which may not accurately reflect the complexity and variability of real-world ransomware attacks [3].
- Another challenge is the diversity of ransomware families and variants, which require a large and diverse dataset to ensure adequate coverage. Ransomware behavior can also vary depending on the victim’s system and network environment, making generalizing detection models across different contexts challenging [2,54].
- Preprocessing data for ransomware detection also presents several challenges. Ransomware often employs obfuscation techniques to evade detection, such as encrypting the payload or using anti-analysis mechanisms. This can make extracting relevant data features and identifying patterns that distinguish ransomware from benign software difficult. In addition, ransomware may use legitimate system functions that are difficult to distinguish from malicious behavior, requiring advanced feature engineering and modeling techniques [54].
- Despite these challenges, several datasets have been used to train and evaluate ransomware-detection models.
- Collecting and preprocessing data for ransomware detection using machine learning presents several challenges, including the lack of real-world datasets, the diversity of ransomware families and variants, and the obfuscation techniques used by ransomware. However, several datasets have been developed to address these challenges, providing valuable resources for training and evaluating ransomware-detection models [54].
7.2. Future Work
- Incorporating real-time detection capabilities—Ransomware-detection systems must incorporate real-time detection capabilities to quickly identify and prevent ransomware attacks. This can be achieved through the use of real-time monitoring and analysis techniques [55].
- Collaboration and sharing of data—Collaboration and sharing of data among researchers and organizations can help develop more effective ransomware-detection systems. This can help build more comprehensive datasets for training and testing machine learning models [56].
- Developing effective machine-learning-based ransomware-detection systems is challenging for several reasons. However, with advanced techniques and collaboration among researchers and organizations, it is possible to develop more robust and accurate ransomware-detection systems [54].
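One simple form of the real-time monitoring mentioned above is a sliding-window counter over file-modification events. The sketch below is an assumption-laden illustration rather than a technique from the cited works: the class name, the 10-second window, and the 100-event limit are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class FileEventRateMonitor:
    """Flag a burst of file modifications inside a sliding time window."""

    def __init__(self, window_seconds=10.0, max_events=100):
        self.window_seconds = window_seconds
        self.max_events = max_events
        self._timestamps = deque()

    def record(self, timestamp):
        """Record one event; return True if the current window exceeds the limit."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Discard events that have fallen out of the window (cutoff, timestamp].
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_events


if __name__ == "__main__":
    monitor = FileEventRateMonitor(window_seconds=10.0, max_events=3)
    for t in (0.0, 1.0, 2.0, 3.0):
        print(t, monitor.record(t))  # only the fourth event returns True
```

Because each event is appended and removed at most once, the per-event cost stays constant on average, which matters when the monitor sits on a busy file system. A fixed threshold will also flag legitimate bulk operations such as backups, so in practice it would feed a classifier rather than act as the sole decision rule.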
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Celdrán, A.H.; Sánchez, P.M.S.; Castillo, M.A.; Bovet, G.; Pérez, G.M.; Stiller, B. Intelligent and behavioral-based detection of malware in IoT spectrum sensors. Int. J. Inf. Secur. 2022, 22, 541–561.
- Chesti, I.A.; Humayun, M.; Sama, N.U.; Jhanjhi, N. Evolution, mitigation, and prevention of ransomware. In Proceedings of the 2020 2nd International Conference on Computer and Information Sciences (ICCIS), Sakaka, Saudi Arabia, 13–15 October 2020; pp. 1–6.
- Philip, K.; Sakir, S.; Domhnall, C. Evolution of ransomware. IET Netw. 2018, 7, 321–327.
- Jegede, A.; Fadele, A.; Onoja, M.; Aimufua, G.; Mazadu, I.J. Trends and Future Directions in Automated Ransomware Detection. J. Comput. Soc. Inform. 2022, 1, 17–41.
- Brewer, R. Ransomware attacks: Detection, prevention and cure. Netw. Secur. 2016, 2016, 5–9.
- Bello, I.; Chiroma, H.; Abdullahi, U.A.; Gital, A.Y.; Jauro, F.; Khan, A.; Okesola, J.O.; Abdulhamid, S.M. Detecting ransomware attacks using intelligent algorithms: Recent development and next direction from deep learning and big data perspectives. J. Ambient Intell. Humaniz. Comput. 2021, 12, 8699–8717.
- Zahra, A.; Shah, M.A. IoT based ransomware growth rate evaluation and detection using command and control blacklisting. In Proceedings of the 2017 23rd International Conference on Automation and Computing (ICAC), Huddersfield, UK, 7–8 September 2017; pp. 1–6.
- Shaukat, S.K.; Ribeiro, V.J. RansomWall: A layered defense system against cryptographic ransomware attacks using machine learning. In Proceedings of the 2018 10th International Conference on Communication Systems & Networks (COMSNETS), Bengaluru, India, 3–7 January 2018; pp. 356–363.
- Makinde, O.; Sangodoyin, A.; Mohammed, B.; Neagu, D.; Adamu, U. Distributed network behaviour prediction using machine learning and agent-based micro simulation. In Proceedings of the 2019 7th International Conference on Future Internet of Things and Cloud (FiCloud), Istanbul, Turkey, 26–28 August 2019; pp. 182–188.
- Almashhadani, A.O.; Kaiiali, M.; Sezer, S.; O’Kane, P. A multi-classifier network-based crypto ransomware detection system: A case study of locky ransomware. IEEE Access 2019, 7, 47053–47067.
- Singh, A.; Ikuesan, R.A.; Venter, H. Ransomware detection using process memory. arXiv 2022, arXiv:2203.16871.
- Silva, J.A.H.; Hernández-Alvarez, M. Large scale ransomware detection by cognitive security. In Proceedings of the 2017 IEEE Second Ecuador Technical Chapters Meeting (ETCM), Salinas, Ecuador, 16–20 October 2017; pp. 1–4.
- Azmoodeh, A.; Dehghantanha, A.; Conti, M.; Choo, K.K.R. Detecting crypto-ransomware in IoT networks based on energy consumption footprint. J. Ambient Intell. Humaniz. Comput. 2018, 9, 1141–1152.
- Ghouti, L.; Imam, M. Malware classification using compact image features and multiclass support vector machines. IET Inf. Secur. 2020, 14, 419–429.
- Modi, J. Detecting Ransomware in Encrypted Network Traffic Using Machine Learning. Ph.D. Thesis, University of Victoria, Saanich, BC, Canada, 2019.
- Ameer, M. Android Ransomware Detection Using Machine Learning Techniques to Mitigate Adversarial Evasion Attacks. Master’s Thesis, Capital University of Science and Technology, Islamabad, Pakistan, 2019.
- Khammas, B.M. Ransomware detection using random forest technique. ICT Express 2020, 6, 325–331.
- Hwang, J.; Kim, J.; Lee, S.; Kim, K. Two-stage ransomware detection using dynamic analysis and machine learning techniques. Wirel. Pers. Commun. 2020, 112, 2597–2609.
- Talabani, H.S.; Abdulhadi, H.M.T. Bitcoin ransomware detection employing rule-based algorithms. Sci. J. Univ. Zakho 2022, 10, 5–10.
- Adamu, U.; Awan, I. Ransomware prediction using supervised learning algorithms. In Proceedings of the 2019 7th International Conference on Future Internet of Things and Cloud (FiCloud), Istanbul, Turkey, 26–28 August 2019; pp. 57–63.
- Wan, Y.L.; Chang, J.C.; Chen, R.J.; Wang, S.J. Feature-selection-based ransomware detection with machine learning of data analysis. In Proceedings of the 2018 3rd International Conference on Computer and Communication Systems (ICCCS), Nagoya, Japan, 27–30 April 2018; pp. 85–88.
- Alzahrani, A.; Alshehri, A.; Alshahrani, H.; Alharthi, R.; Fu, H.; Liu, A.; Zhu, Y. Randroid: Structural similarity approach for detecting ransomware applications in android platform. In Proceedings of the 2018 IEEE International Conference on Electro/Information Technology (EIT), Rochester, MI, USA, 3–5 May 2018; pp. 0892–0897.
- Scaife, N.; Carter, H.; Traynor, P.; Butler, K.R. Cryptolock (and drop it): Stopping ransomware attacks on user data. In Proceedings of the 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS), Nara, Japan, 27–30 June 2016; pp. 303–312.
- Sgandurra, D.; Muñoz-González, L.; Mohsen, R.; Lupu, E.C. Automated dynamic analysis of ransomware: Benefits, limitations and use for detection. arXiv 2016, arXiv:1609.03020.
- Prakash, K.P.; Nafis, T.; Biswas, S.S. Preventive Measures and Incident Response for Locky Ransomware. Int. J. Adv. Res. Comput. Sci. 2017, 8, 392–395.
- Paquet-Clouston, M.; Haslhofer, B.; Dupont, B. Ransomware payments in the bitcoin ecosystem. J. Cybersecur. 2019, 5, tyz003.
- Kok, S.; Abdullah, A.; Jhanjhi, N.; Supramaniam, M. Ransomware, threat and detection techniques: A review. Int. J. Comput. Sci. Netw. Secur. 2019, 19, 136.
- Thakran, E.; Kumari, A. Impact of “Ransomware” on Critical Infrastructure Due to Pandemic. 2023, p. 5. Available online: https://ssrn.com/abstract=4361110 (accessed on 3 July 2023).
- Ahmed, Y.A.; Huda, S.; Al-rimy, B.A.S.; Alharbi, N.; Saeed, F.; Ghaleb, F.A.; Ali, I.M. A weighted minimum redundancy maximum relevance technique for ransomware early detection in industrial IoT. Sustainability 2022, 14, 1231.
- Aslan, Ö.A.; Samet, R. A comprehensive review on malware detection approaches. IEEE Access 2020, 8, 6249–6271.
- Akhtar, M.S.; Feng, T. Malware Analysis and Detection Using Machine Learning Algorithms. Symmetry 2022, 14, 2304.
- Yamany, B.; Elsayed, M.S.; Jurcut, A.D.; Abdelbaki, N.; Azer, M.A. A New Scheme for Ransomware Classification and Clustering Using Static Features. Electronics 2022, 11, 3307.
- Yamany, B.; Azer, M.A.; Abdelbaki, N. Ransomware Clustering and Classification using Similarity Matrix. In Proceedings of the 2022 2nd International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC), Cairo, Egypt, 8–9 May 2022; pp. 41–46.
- Ullah, F.; Javaid, Q.; Salam, A.; Ahmad, M.; Sarwar, N.; Shah, D.; Abrar, M. Modified decision tree technique for ransomware detection at runtime through API Calls. Sci. Program. 2020, 2020, 8845833.
- Arunkumar, M.; Kumar, K.A. GOSVM: Gannet optimization based support vector machine for malicious attack detection in cloud environment. Int. J. Inf. Technol. 2023, 15, 1653–1660.
- Selamat, N.; Ali, F. Comparison of malware detection techniques using machine learning algorithm. Indones. J. Electr. Eng. Comput. Sci. 2019, 16, 435.
- Mezquita, Y.; Alonso, R.S.; Casado-Vara, R.; Prieto, J.; Corchado, J.M. A review of k-nn algorithm based on classical and quantum machine learning. In Distributed Computing and Artificial Intelligence, Special Sessions, 17th International Conference; Springer: Berlin/Heidelberg, Germany, 2021; pp. 189–198.
- Saadat, S.; Joseph Raymond, V. Malware classification using CNN-XGBoost model. In Artificial Intelligence Techniques for Advanced Computing Applications: Proceedings of ICACT 2020; Springer: Berlin/Heidelberg, Germany, 2021; pp. 191–202.
- Noorbehbahani, F.; Rasouli, F.; Saberi, M. Analysis of machine learning techniques for ransomware detection. In Proceedings of the 2019 16th International ISC (Iranian Society of Cryptology) Conference on Information Security and Cryptology (ISCISC), Mashhad, Iran, 28–29 August 2019; pp. 128–133.
- Sharmeen, S.; Ahmed, Y.A.; Huda, S.; Koçer, B.Ş.; Hassan, M.M. Avoiding future digital extortion through robust protection against ransomware threats using deep learning based adaptive approaches. IEEE Access 2020, 8, 24522–24534.
- Swami, S.; Swami, M.; Nidhi, N. Ransomware Detection System and Analysis Using Latest Tool. Int. J. Adv. Res. Sci. Commun. Technol. 2021, 7, 2581–9429.
- Wang, X.b.; Yang, G.y.; Li, Y.c.; Liu, D. Review on the application of artificial intelligence in antivirus detection system i. In Proceedings of the 2008 IEEE Conference on Cybernetics and Intelligent Systems, Chengdu, China, 21–24 September 2008; pp. 506–509.
- Yang, B.; Liu, D. Research on Network Traffic Identification based on Machine Learning and Deep Packet Inspection. In Proceedings of the 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chengdu, China, 15–17 March 2019; pp. 1887–1891.
- Pimenta Rodrigues, G.A.; de Oliveira Albuquerque, R.; Gomes de Deus, F.E.; de Sousa Jr, R.T.; de Oliveira Júnior, G.A.; Garcia Villalba, L.J.; Kim, T.H. Cybersecurity and network forensics: Analysis of malicious traffic towards a honeynet with deep packet inspection. Appl. Sci. 2017, 7, 1082.
- Song, W.; Beshley, M.; Przystupa, K.; Beshley, H.; Kochan, O.; Pryslupskyi, A.; Pieniak, D.; Su, J. A software deep packet inspection system for network traffic analysis and anomaly detection. Sensors 2020, 20, 1637.
- Cascarano, N.; Ciminiera, L.; Risso, F. Optimizing deep packet inspection for high-speed traffic analysis. J. Netw. Syst. Manag. 2011, 19, 7–31.
- Dargahi, T.; Dehghantanha, A.; Bahrami, P.N.; Conti, M.; Bianchi, G.; Benedetto, L. A Cyber-Kill-Chain based taxonomy of crypto-ransomware features. J. Comput. Virol. Hacking Tech. 2019, 15, 277–305.
- Sheen, S.; Asmitha, K.; Venkatesan, S. R-Sentry: Deception based ransomware detection using file access patterns. Comput. Electr. Eng. 2022, 103, 108346.
- Madani, H.; Ouerdi, N.; Boumesaoud, A.; Azizi, A. Classification of ransomware using different types of neural networks. Sci. Rep. 2022, 12, 4770.
- Arivudainambi, D.; Varun Kumar, K.A.; Visu, P.; Sibi Chakkaravarthy, S. Malware traffic classification using principal component analysis and artificial neural network for extreme surveillance. Comput. Commun. 2019, 147, 50–57.
- Kok, S.; Azween, A.; Jhanjhi, N. Evaluation metric for crypto-ransomware detection using machine learning. J. Inf. Secur. Appl. 2020, 55, 102646.
- Masum, M.; Faruk, M.J.H.; Shahriar, H.; Qian, K.; Lo, D.; Adnan, M.I. Ransomware classification and detection with machine learning algorithms. In Proceedings of the 2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 26–29 January 2022; pp. 0316–0322.
- Edis, D.; Hayman, T.; Vatsa, A. Understanding Complex Malware. In Proceedings of the 2021 IEEE Integrated STEM Education Conference (ISEC), Princeton, NJ, USA, 13 March 2021; pp. 1–2.
- Beaman, C.; Barkworth, A.; Akande, T.D.; Hakak, S.; Khan, M.K. Ransomware: Recent advances, analysis, challenges and future research directions. Comput. Secur. 2021, 111, 102490.
- McIntosh, T.; Kayes, A.; Chen, Y.P.P.; Ng, A.; Watters, P. Ransomware mitigation in the modern era: A comprehensive review, research challenges, and future directions. ACM Comput. Surv. (CSUR) 2021, 54, 1–36.
- Aboaoja, F.A.; Zainal, A.; Ghaleb, F.A.; Al-rimy, B.A.S.; Eisa, T.A.E.; Elnour, A.A.H. Malware detection issues, challenges, and future directions: A survey. Appl. Sci. 2022, 12, 8482.
- Gorment, N.Z.; Selamat, A.; Cheng, L.K.; Krejcar, O. Machine Learning Algorithm for Malware Detection: Taxonomy, Current Challenges and Future Directions. IEEE Access 2023, 1.
- Kapoor, A.; Gupta, A.; Gupta, R.; Tanwar, S.; Sharma, G.; Davidson, I.E. Ransomware detection, avoidance, and mitigation scheme: A review and future directions. Sustainability 2021, 14, 8.
Reference | Year | Author | Resolved the Issue | Utilized Technique | Result | Limitation |
---|---|---|---|---|---|---|
[7] | 2017 | Zahra and Shah | Detecting CryptoWall ransomware attacks. | Blocklisting of command-and-control (C&C) servers. | The web proxy server, which acts as the TCP/IP traffic gateway, extracts the TCP/IP header. | The model’s efficacy and precision in identifying ransomware and its attack techniques against various operating system environments were not demonstrated through implementation. |
[8] | 2018 | Shaukat and Ribeiro | Detection of ransomware. | RansomWall, a layered and hybrid mechanism. | Effective at identifying zero-day attacks. | N/A |
[9] | 2019 | Makinde et al. | To determine whether an actual network system is vulnerable to a ransomware assault. | Machine learning. | Correlation greater than 0.8. | It imitated the behavior of a small group of users. |
[10] | 2019 | Almashhadani et al. | Differentiating Locky ransomware users. | A behavioral approach to ransomware detection using parallel classifiers. | Highly reliable detection with a low proportion of false positives. | N/A |
[11] | 2022 | Singh et al. | Discovery of new ransomware families and classification of newly discovered ransomware assaults. | Checks process memory access privileges to enable rapid and accurate malware detection. | Between 81.38% and 96.28% accuracy. | N/A |
Reference | Year | Author | Problem Addressed | Method Used | Result |
---|---|---|---|---|---|
[14] | 2017 | Rahman and Hasan | Enhanced ransomware-detection method. | Using support vector machines as an analysis tool. | Better ransomware detection is achieved with an integrated approach than static or dynamic analysis used separately. |
[13] | 2018 | Dehghantanha et al. | Windows ransomware detection that is quick and accurate. | Netconverse (classifier using J48 decision tree). | 97.1% true-positive detection rate. |
[15] | 2019 | Modi | Separating ransomware traffic from regular traffic. | Logistic regression, random forest, and support vector machine algorithms. | The best detection rate is 99.9% for the random forest, with 0% false positives. |
[16] | 2019 | Ameer | Detection of ransomware. | Static and dynamic analysis. | 100% detection and classification precision. |
[17] | 2020 | Khammas | Detection of ransomware. | Random forest method. | 97.74% of samples are detected. |
[18] | 2020 | Hwang et al. | An improved method of detecting ransomware. | Random forest and Markov models. | 97.3% overall accuracy, 4.8% for false positives, and 1.5% for false negatives. |
[19] | 2022 | Talabani and Abdulhadi | Poor accuracy of existing ransomware-detection tools based on data mining and machine learning. | Decision Table and PART (partial decision tree) algorithms. | Recall (96%), accuracy (96.01%), F-measure (95.6%), and precision (95.9%). |
Reference | Year | Name of the Ransomware | Description |
---|---|---|---|
[4] | 1989 | AIDS Trojan | The first known ransomware attack, the AIDS Trojan, was distributed on floppy disks and demanded a payment of USD 189 to unlock infected files. |
[5] | 2012 | Reveton | Ransomware that posed as law enforcement and demanded payment for supposed illegal activities. |
[23] | 2013 | CryptoLocker | One of the first widespread ransomware attacks that used encryption to lock victims’ files. |
[24] | 2014 | CryptoWall | A variant of CryptoLocker that caused millions of dollars in damages. |
[3] | 2015 | TeslaCrypt | A ransomware strain that targeted gamers and encrypted game-related files. |
[25] | 2016 | Locky | Ransomware that was spread through malicious email attachments. |
[3] | 2017 | WannaCry | A ransomware attack affecting over 200,000 systems across 150 different countries. |
[26] | 2018 | SamSam | A ransomware attack that targeted hospitals, municipalities, and other organizations. |
[3] | 2019 | Ryuk | A ransomware attack that caused significant damage to several companies and organizations. |
[27] | 2020 | Maze | A ransomware attack that encrypted victims’ files and threatened to leak sensitive data if the ransom was not paid. |
[3] | 2021 | REvil/Sodinokibi | A ransomware attack that targeted Kaseya, a software company, and affected over 1500 businesses worldwide. |
[28] | 2022 | Royal Ransomware | A ransomware attack that encrypted victims’ files and demanded a ransom payment to decrypt them, targeting businesses, governments, and healthcare organizations, with victims mostly in the United States. |
[28] | 2023 | LockBit Ransomware | A ransomware attack that encrypts the files and demands payment in exchange for the decryption key, often in conjunction with phishing emails or other social engineering techniques. |
References | Algorithm | Characteristics |
---|---|---|
[17,34] | Decision tree | Decision trees can be trained on features such as file modifications, network traffic, and system calls to distinguish between ransomware and benign software behavior. The resulting decision tree can then be used to determine whether new data contain ransomware. |
[17,34] | Random forest | Random forest is an ensemble method that combines tree predictors, where each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. Performance may be enhanced in comparison to standalone decision trees. The forest classifies new input data by aggregating the predictions of its individual trees. |
[14,35] | Support vector machine | Support vector machines can be trained on features such as system calls, network traffic, and file behavior to distinguish between ransomware and benign software behavior. The trained model can then be used to determine whether new data constitute ransomware. Support vector machines are useful when the data are high-dimensional and, with kernel functions, when the classes are not linearly separable, as is often the case in ransomware detection. |
[36,37] | k-nearest neighbor | k-nearest neighbor (KNN) is a popular machine learning algorithm used in various research fields. It is a non-parametric approach that can be used for both classification and regression tasks. KNN is known for its simplicity and its small number of hyperparameters, but prediction can be computationally expensive on large datasets. |
[38] | XGBoost | Extreme gradient boosting is a powerful machine learning algorithm that has gained widespread popularity in research. It is an ensemble method that combines multiple decision trees to improve the accuracy of the model. XGBoost is known for its scalability, speed, and ability to handle complex datasets. |
[39] | Logistic regression | Logistic regression is a widely used machine learning algorithm in various research fields. It is a linear model that can be used for binary classification tasks. Logistic regression is known for its simplicity, interpretability, and ability to handle small datasets. |
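To make the k-nearest neighbor row of the table above concrete, the following is a minimal sketch of majority-vote KNN classification. The two features (files modified per minute and mean entropy of written data) and all sample values are hypothetical, chosen only to illustrate the mechanics rather than taken from any cited dataset.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Distance is Euclidean; in practice features should be scaled first so that
    no single feature dominates the distance.
    """
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical samples: (files modified per minute, mean entropy of writes).
    train_x = [(2, 3.1), (5, 4.0), (1, 2.5), (120, 7.8), (90, 7.9), (150, 7.6)]
    train_y = ["benign", "benign", "benign", "ransomware", "ransomware", "ransomware"]
    print(knn_predict(train_x, train_y, (100, 7.7)))  # ransomware
    print(knn_predict(train_x, train_y, (3, 3.0)))    # benign
```

Every prediction scans and sorts the full training set, which is the computational cost the table refers to; spatial indexes or approximate neighbor search are the usual remedies on large datasets.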
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alraizza, A.; Algarni, A. Ransomware Detection Using Machine Learning: A Survey. Big Data Cogn. Comput. 2023, 7, 143. https://doi.org/10.3390/bdcc7030143