Securing IoT Devices Running PureOS from Ransomware Attacks: Leveraging Hybrid Machine Learning Techniques
Abstract
1. Introduction
1.1. Background on Ransomware Attacks
1.2. The Need for Effective Defense Mechanisms against Ransomware Attacks
- (a)
- Due to their compact and low-cost form factors, many IoT devices have limited processing power and memory. They may lack the resources to run compute-intensive security programs or to communicate at high bandwidth. As the number of linked devices grows, they therefore become increasingly susceptible to ransomware attacks.
- (b)
- Because of a lack of robust security measures and standards, many IoT devices are vulnerable to attacks. This is a real concern, especially for older devices that were not always built with safety in mind.
- (c)
- Sensitive information, such as medical records, financial records, and personal preferences, is frequently collected by IoT devices. If these devices were compromised, the data they collect could be stolen and used for malicious purposes.
- (d)
- The hardware, software, and network architecture that make up an IoT system can be rather complicated. Because of this complexity, proactively spotting and preventing ransomware is challenging. Due to heterogeneous operational and functional requirements, integrating IoT equipment into older, less secure systems is widespread. Therefore, it could be challenging to protect these systems without causing operational disruptions.
1.3. The Role of Machine Learning in Ransomware Defense
- (a)
- The Random Forest algorithm, an ensemble of decision trees, was used to classify malware samples in [4].
- (b)
- (c)
- The Naive Bayes algorithm, which is based on Bayes’ theorem from probability theory, has been used in spam detection and to identify malware [6].
- (d)
- Decision trees [7] are another ML technique that has been frequently employed in combination with other supportive algorithms for malware detection.
- (e)
- Logistic regression [8] is a statistical method that estimates the probability of a binary outcome. It has been used successfully in malware detection programs.
- (f)
- Neural Networks [9] have also been used successfully in malware detection applications.
- (a)
- Unusual or high-volume network traffic [10], as well as traffic from unknown sources, ports, or protocols, is among the indicators uncovered by ML models monitoring network activity.
- (b)
- Malware uses system calls to communicate with the operating system, and anomalous system calls were a telltale sign of malicious software [11]. ML-trained models monitored system calls closely for signs of malicious activity.
- (c)
- Resource use anomalies [12] caused by malware, such as high central processing unit (CPU) or memory usage, were easily detectable by ML models.
- (d)
- Anomalous activity, such as changes to system settings [13] or user behavior that does not make sense, might be a telltale sign of malware and was detected using ML models.
- (e)
- The software on IoT devices was analyzed by ML models for the presence of recognized malware signatures or dangerous patterns.
1.4. PureOS
- We investigated 15,000 sample instances (both ransomware and benign), detailing previously unreported facets of ransomware attacks with an emphasis on traits shared among malware families.
- We outlined the design process behind the fundamental components of ransomware samples and discussed how this knowledge can be leveraged to prevent future intrusion. In devastating ransomware cyberattacks of varying degrees of complexity, our research demonstrated that aberrant control efforts should be reliably monitored.
- We proposed methods to counter the widespread threat of dissimilar ransomware attacks. We have suggested a generic approach to detecting such risks, one that makes no presumptions about the specific methods through which user records are maliciously made unavailable.
2. Literature Review
3. Data Collection and Preparation
3.1. Data Collection and Processing Techniques
3.2. Preprocessing and Feature Engineering
3.3. Data Augmentation and Balancing Techniques
3.4. Focused Ransomware Variants
- i.
- The Kryptik [24] ransomware is a type of malware that is often disseminated through email phishing campaigns and exploit kits. This advanced form of ransomware uses encryption algorithms (i.e., RSA-2048 and AES-256) to lock the victim’s files, rendering them inaccessible. Kryptik was re-designed to evade detection by antivirus software (e.g., Virus Chaser [25]); it uses obfuscation techniques to conceal its activities and command-and-control (C&C) servers to obtain instructions from the attacker. The impact of Kryptik ransomware can be catastrophic, resulting in critical data loss and disrupted business operations.
- ii.
- We re-implemented the Cloud Snooper [26] ransomware to target cloud-based systems and services (i.e., the Tonido cloud platform [27] through the Nautilus file manager plugin). It exploited weaknesses in cloud infrastructure to gain unauthorized access to the victim’s network. Notable features of Cloud Snooper include its ability to bypass firewalls and intrusion detection systems and to encrypt files. It operated covertly to evade detection and caused severe damage to the victim (i.e., the sandbox experimental setup). The impact of Cloud Snooper was particularly devastating, as it resulted in the loss of sensitive information and the disruption of normal OS operations (i.e., encrypting or locking files, modifying system settings, and interfering with the normal functioning of applications and system processes).
- iii.
- The WannaCry [28] ransomware was first identified in May 2017. It spread rapidly, infecting over 230,000 computers in over 150 countries within just a few days. Originally, the ransomware used a vulnerability in Microsoft Windows known as EternalBlue to spread from one computer to another, making it particularly dangerous. Key features of WannaCry were as follows:
- It encrypts files on the infected system using the AES encryption algorithm, making them inaccessible to the user.
- It can spread rapidly across a network, infecting other vulnerable computers without any user interaction.
- A “kill switch” was built into the code of WannaCry, allowing researchers to halt the spread of the ransomware by registering a domain name that the malware checked before encrypting files.
- WannaCry originally targeted systems running Microsoft Windows operating systems, with a particular focus on older, unsupported versions such as Windows Server 2003 and Server 2022. For this study, it was altered and reprogrammed to accommodate the PureOS functional requirements. The ransomware payload was delivered as a PureOS executable file disguised as a software update. Once the file was executed, it installed the ransomware on the system and began encrypting files. We used AES encryption to encrypt files on the infected system, with a unique key generated for each system. The re-implemented ransomware also encrypted the key itself using RSA encryption, making it infeasible to decrypt the files without the private key, which would presumably be held by the attackers.
- iv.
- LockBit [29] is a file-encrypting ransomware that uses a combination of RSA and AES encryption algorithms to encrypt the victim’s files. Once the files are encrypted, the ransomware displays a ransom note demanding payment in exchange for the decryption key. We re-designed the malware by granting it the ability to spread across a network and infect multiple connected devices. The revised implementation was equipped with a timer that deletes files after a set amount of time, so the attack must be countered quickly to limit its impact. This ransomware targeted critical files, such as documents, images, and databases.
- v.
- The re-programmed Black Basta [30] ransomware used AES-256 encryption to encrypt files on the victim’s PureOS-based computer (including desktops, laptops, and servers). It appended a unique extension to encrypted files, making them unusable until they are decrypted. The encryption process took several minutes or, in some iterations, even hours, depending on the size of the files.
- vi.
- The revised Hive [31] ransomware used a combination of RSA and AES encryption algorithms to lock the victim’s files (i.e., those of the experimental setup). It entered the system through an exploit kit and could spread to other connected devices on the sandbox network. The ransomware could erase shadow copies and backup files to obstruct the victim’s efforts to recover their encrypted data.
- vii.
- ALPHV, BlackCat, and Noberus [31] are three distinct ransomware families, each with its own features, system and network targets, technical details, and impact. Their common features included double extortion tactics, which involve not only encrypting a victim’s files (i.e., with AES-256 and RSA) but also stealing sensitive data. We re-implemented these ransomware variants using multiple techniques to evade detection, including code obfuscation, anti-debugging techniques, and process injection. During certain experimental iterations, we appended the “.noberus” extension to encrypted files. We observed that ransomware typically appends a unique extension to encrypted files to differentiate them from their original unencrypted state.
- viii.
- The PureOS-focused AvosLocker [32] used strong encryption algorithms (such as AES-256 and RSA) to encrypt files on a victim’s computer or network. AvosLocker targeted the honeypot computer and network, which were vulnerable to its distribution method (such as outdated Remote Desktop Protocol (RDP)) and contained exploitable vulnerabilities. In the revised implementation, AvosLocker generated a unique encryption key for each infected computer, which was stored on the attacker’s (i.e., anomaly) server. The impact of this ransomware was severe, as it caused the victim to lose access to important files and data.
- ix.
- The Conti [33] ransomware is a highly advanced and complex malware that uses a sophisticated encryption algorithm to encrypt files on a victim’s computer system. It can spread through a network, infecting other connected systems. The vulnerabilities that Conti exploits in PureOS include weaknesses in the RDP protocol, which give access to internet-connected systems; vulnerabilities in VPN and remote access software such as Pulse Secure VPN, Fortinet VPN, and Citrix ADC; and vulnerabilities in web servers such as Apache and Nginx, which give unauthorized access to victims’ systems. In our implementation, Conti used a combination of symmetric and asymmetric encryption to encrypt the files of its victims (i.e., a random 256-bit ChaCha symmetric key for each file and RSA asymmetric encryption of that ChaCha key). Furthermore, it communicated with its C&C server over an encrypted channel, making its activities difficult to track.
- x.
- We implemented REvil [34] ransomware more powerfully by using stronger encryption algorithms such as RSA-2048 and AES-256. This allowed the ransomware to encrypt not only local files but also files on network shares and mapped drives. As a result, any PureOS-based computing system, including the Librem Server, workstations (such as the Librem 14 and Librem Mini), and cellular devices (such as the Librem 5) could potentially be targeted [35]. After infecting a victim’s computer, the ransomware was designed to remain there by creating a scheduled task or modifying the registry. Furthermore, we made the ransomware even more malicious by adding the ability to exfiltrate sensitive data before encrypting it.
- xi.
- We implemented DarkSide [35] ransomware with strong encryption algorithms, such as RSA and AES, to encrypt files on a victim’s computer and prevent access to them unless the adversary’s criteria were met. Various obfuscation techniques were introduced to evade detection: (a) code encryption and obfuscation, (b) polymorphic code, (c) dynamic linking to system libraries, (d) malware code compression, and (e) an anti-debugging capability that detects when the malware is being analyzed or debugged and takes action to evade or disable the analysis.
- xii.
- The Babuk [36] ransomware was re-designed to use a combination of symmetric and asymmetric encryption algorithms to encrypt data on the target system. It used a random 256-bit ChaCha symmetric key for each file’s encryption and an asymmetric encryption algorithm such as RSA for the encryption of the ChaCha key. The asymmetric encryption algorithm is used to securely transmit the ChaCha key to the ransomware operator, allowing them to decrypt the files. Babuk was also able to steal data from infected systems. These data were then encrypted and sent to the ransomware operator (i.e., adversarial process). It was also capable of terminating running processes, deleting shadow volume copies, and disabling the PureOS System Restore feature.
- xiii.
- To satisfy the experimental requirements, we redesigned the Egregor [37] ransomware to use a mix of symmetric and asymmetric encryption algorithms to encrypt files on the targeted computer. The process involved generating a unique 256-bit ChaCha symmetric key for each file and using the RSA algorithm to encrypt the ChaCha key for secure transmission to the attacker (i.e., an automated process), which could then decrypt the files. Moreover, the ransomware had various capabilities, such as appending a random extension to the encrypted files, exploiting vulnerabilities in RDP connections and exploit kits, stealing data from infected computers, terminating processes, removing shadow volume copies, and disabling the PureOS System Restore function. Upon infecting the computer, the ransomware compressed the encrypted files into a single archive using an encryption and compression technique (i.e., “Lossless” and “Huffman coding” [35] compression).
- xiv.
- The updated/re-designed version of the Avaddon [37] ransomware had numerous functions, such as employing both symmetric and asymmetric encryption methods to encrypt files. It used the RSA algorithm to encrypt files and used an exclusive AES-256 key for each file, making it challenging to decrypt without the key. The ransomware also added a distinct extension to each encrypted file, making it hard to recognize and retrieve the files. Moreover, the malware was equipped with an extended capacity to extract sensitive data from the infected system and forward it to the attacker node (i.e., an automated process). It could terminate ongoing processes and deactivate various operating system security features, including PureBoot [38].
4. Applied Machine Learning Models for Ransomware Defense
4.1. Overview of Different Machine Learning Models
- (a)
- The first scenario was very strict, and only a very well-characterized set of samples was included.
- (b)
- The second scenario was less strict and included a broader range of well-studied samples.
- (c)
- The third scenario was the most realistic, representing the actual conditions faced by vendors of ransomware detection solutions.
4.2. Selection of Appropriate Machine Learning Models
- (a)
- We wanted to find the best machine-learning model for detecting ransomware. Therefore, we created a set of criteria for our search. The criteria are not exhaustive but include the following: the selected model should have high accuracy in detecting ransomware and be able to minimize false positives and false negatives.
- (b)
- The model should be scalable and perform well even when dealing with small or large datasets.
- (c)
- It should be able to generalize well to new and unseen ransomware samples.
- (d)
- The model should be robust and able to perform well in the presence of noise, adversarial attacks, and other anomalies.
- (e)
- The model should provide clear and interpretable explanations for its decisions and predictions.
- (f)
- It should be efficient in terms of computation time, memory usage, and power consumption.
- (g)
- The model should be flexible and easily adaptable to changing ransomware attack patterns with the ability to incorporate new data.
- i.
- “n_samples” is the number of samples in the dataset.
- ii.
- “y” is the target variable in the dataset.
- iii.
- “X” is the matrix of features in the dataset.
- iv.
- “W” is the vector of coefficients that are learned by the model.
- v.
- “l1_ratio” is a hyperparameter that controls the balance between L1 and L2 regularization. L1 regularization promotes sparsity in the learned coefficients, while L2 regularization promotes small, non-zero coefficients.
- vi.
- “alpha” is a hyperparameter that controls the strength of the regularization. Higher values of alpha lead to more regularization.
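The symbols listed above match the ElasticNet objective as it is commonly written (for example, in the scikit-learn documentation). As a hedged reconstruction, assuming that this is the formulation the list refers to and writing the mixing hyperparameter as l1_ratio, the objective minimized over W would be:

```latex
\min_{W} \;
  \frac{1}{2\, n_{\text{samples}}} \, \lVert y - XW \rVert_2^2
  \;+\; \alpha \cdot \text{l1\_ratio} \cdot \lVert W \rVert_1
  \;+\; \frac{\alpha \, (1 - \text{l1\_ratio})}{2} \, \lVert W \rVert_2^2
```

Under this form, setting l1_ratio to 1 leaves only the L1 penalty (Lasso-like behavior), while setting it to 0 leaves only the L2 penalty (Ridge-like behavior).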
4.3. Feature Selection and Model Tuning
- Network activity that is not typical for the system.
- Alterations to file extensions that are not typical for the system.
- Suspicious processes with names that are random or located in unusual directories.
- Changes to the registry.
- CPU or disk usage that is not typical for the system.
- Pop-up messages or warnings.
- Atypical system crashes or errors.
- Encryption key generation by malware.
- Usage of non-standard encryption algorithms.
- Unusual behavior, such as modification of file timestamps or the creation of decoy files to deceive the victim.
- File access patterns that are not typical for the system.
- Large numbers of file deletions.
- Changes to file permissions that are not typical for the system.
- Random file names appearing across large sets of files all at once.
- Large numbers of failed login attempts.
- File sizes that are not typical for the system.
4.4. Evaluation of Model Performance
- Split the data into training and testing sets.
- Perform feature selection using ElasticNet to identify the most important features for predicting ransomware.
- Train an XGBoost model on the selected features using the training set.
- Predict the labels of the test set using the trained model.
- Evaluate the performance of the model.
- The test_size parameter specifies the proportion of the data that will be used for testing, while the remaining data are used for training. For example, a test_size of 0.2 means that 20% of the data will be used for testing, and 80% will be used for training.
- The random_state parameter is used to set the seed for the random number generator, which ensures that the results are reproducible. This is important because the random sampling of data for training and testing can affect the performance metrics of the model. By setting the random_state parameter to a specific value, the same random sampling will occur every time the code is run, ensuring that the results are consistent and reproducible.
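The effect of test_size and random_state can be illustrated with a minimal, standard-library sketch. The function name `train_test_split_indices` is hypothetical and is not the library routine used in the study; it only mimics the seeded-shuffle idea, and its rounding of the test-set size is an assumption.

```python
import random


def train_test_split_indices(n_samples, test_size=0.2, random_state=42):
    """Return (train_idx, test_idx) for a shuffled, reproducible split.

    Hypothetical helper for illustration only.
    """
    # A private generator seeded with random_state makes the shuffle repeatable.
    rng = random.Random(random_state)
    indices = list(range(n_samples))
    rng.shuffle(indices)

    # test_size is the fraction of samples held out for testing.
    n_test = round(n_samples * test_size)
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return train_idx, test_idx


train_idx, test_idx = train_test_split_indices(10, test_size=0.2, random_state=42)
print(len(train_idx), len(test_idx))  # 8 2

# The same seed always yields the same partition.
assert train_test_split_indices(10, 0.2, 42) == (train_idx, test_idx)
```

Changing random_state generally produces a different partition, which is why reported metrics should state the seed that was used.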
- S is the original dataset.
- v is a specific value of the feature being considered.
- p(v) is the proportion of elements in S that have the value v.
- Sv is the subset of S where the feature has the value v.
- Entropy(S) is the entropy of the original dataset S.
- Entropy(Sv) is the entropy of the subset Sv.
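With the definitions above, information gain is conventionally computed as Entropy(S) minus the sum over v of p(v) multiplied by Entropy(Sv). The sketch below implements that standard form in plain Python; the feature name `extension_changed` and the toy labels are hypothetical and are not taken from the study's dataset.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature_values, labels):
    """Entropy(S) - sum_v p(v) * Entropy(S_v) for one discrete feature."""
    n = len(labels)
    subsets = defaultdict(list)  # S_v: labels of the samples where feature == v
    for v, y in zip(feature_values, labels):
        subsets[v].append(y)
    weighted = sum((len(s_v) / n) * entropy(s_v) for s_v in subsets.values())
    return entropy(labels) - weighted


# Hypothetical toy data: 1 = ransomware, 0 = benign.
labels = [1, 1, 0, 0]
extension_changed = ["yes", "yes", "no", "no"]  # separates the classes perfectly
constant_feature = ["x", "x", "x", "x"]         # carries no information

print(information_gain(extension_changed, labels))  # 1.0
print(information_gain(constant_feature, labels))   # 0.0
```

A feature that splits the classes perfectly recovers the full entropy of S, while a constant feature yields zero gain, which is the ranking behavior that feature selection relies on.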
- O is the observed frequency for a given feature and the presence of ransomware.
- E is the expected frequency for the same feature and ransomware presence.
- ∑ is the summation over all possible values of the feature and ransomware presence.
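These symbols correspond to the usual Pearson chi-square statistic, the sum of (O − E)² / E over all combinations of feature value and ransomware presence. The following is a minimal sketch of that standard statistic, assuming discrete feature values; the function name and toy data are hypothetical.

```python
from collections import Counter


def chi_square(feature_values, labels):
    """Pearson chi-square statistic between a discrete feature and the class label."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))  # O for each (value, label) cell
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)

    stat = 0.0
    for v in feature_totals:
        for y in label_totals:
            # E: count expected if the feature and the label were independent.
            expected = feature_totals[v] * label_totals[y] / n
            stat += (observed[(v, y)] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: 1 = ransomware, 0 = benign.
labels = [1, 1, 0, 0]
print(chi_square([1, 1, 0, 0], labels))  # 4.0 (feature tracks the label exactly)
print(chi_square([1, 0, 1, 0], labels))  # 0.0 (feature independent of the label)
```

Larger values indicate a stronger dependence between the feature and ransomware presence, so features can be ranked by this score and the top k retained.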
5. Implementation and Testing
- The first one involved a Librem 14 laptop equipped with an Intel Core i7 10710U processor with 6 cores and 12 threads, DDR4 RAM of 64 GB, Intel UHD Graphics 620 GPU, M.2 SSD storage of 2 TB (NVMe), PureBoot firmware, and a PureOS operating system.
- The second configuration used a Librem 5 smartphone, which had an NXP® i.MX 8M Quad core Cortex A53 processor with 64-bit ARM architecture running at a maximum of 1.5 GHz (along with an auxiliary Cortex M4), Vivante GC7000Lite GPU, 3 GB of RAM, 32 GB eMMC internal storage, and a PureOS operating system.
- (a)
- Ransomware attackers are using advanced encryption algorithms such as RSA-2048, AES-256, and ChaCha-256 to encrypt victim data, making it inaccessible without the decryption key.
- (b)
- The speed at which encryption algorithms operate can impact the success of a ransomware attack. In this case, ChaCha-256 was found to be the fastest among the three encryption algorithms, making it a potentially more effective choice for attackers.
- (c)
- As a result of the faster encryption speed, ChaCha-256 may become more prevalent in future ransomware attacks.
Limitations
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Lawal, K.; Rafsanjani, H.N. Trends, benefits, risks, and challenges of IoT implementation in residential and commercial buildings. Energy Built Environ. 2022, 3, 251–266.
- Ransomware at Colorado IT Provider Affects 100+ Dental Offices—Krebs on Security. 7 December 2019. Available online: https://krebsonsecurity.com/2019/12/ransomware-at-colorado-it-provider-affects-100-dental-offices/ (accessed on 27 March 2023).
- NATO Countries Hit with Unprecedented Cyber Attacks. GovTech. 4 September 2022. Available online: https://www.govtech.com/blogs/lohrmann-on-cybersecurity/nato-countries-hit-with-unprecedented-cyber-attacks (accessed on 28 March 2023).
- Cui, J. Malware Detection Algorithm for Wireless Sensor Networks in a Smart City Based on Random Forest. J. Test. Eval. 2022, 51, 20220100.
- Singh, T.; Di Troia, F.; Corrado, V.A.; Austin, T.H.; Stamp, M. Support Vector Machines and Malware Detection. J. Comput. Virol. Hacking Tech. 2015, 12, 203–212.
- Yilmaz, A.B.; Taspinar, Y.S.; Koklu, M. Classification of Malicious Android Applications Using Naive Bayes and Support Vector Machine Algorithms. Int. J. Intell. Syst. Appl. Eng. 2022, 10, 269–274. Available online: https://ijisae.org/index.php/IJISAE/article/view/2010 (accessed on 29 March 2023).
- Abu Al-Haija, Q.; Odeh, A.; Qattous, H. PDF Malware Detection Based on Optimizable Decision Trees. Electronics 2022, 11, 3142.
- Gao, Y.; Hasegawa, H.; Yamaguchi, Y.; Shimada, H. Malware Detection Using LightGBM with a Custom Logistic Loss Function. IEEE Access 2022, 10, 47792–47804.
- Xie, N. Andro_MD: Android Malware Detection based on Convolutional Neural Networks. Int. J. Perform. Eng. 2018, 14, 547–558.
- Liu, T.; Li, Z.; Long, H.; Bilal, A. NT-GNN: Network Traffic Graph for 5G Mobile IoT Android Malware Detection. Electronics 2023, 12, 789.
- Manoharan, S.; Sugumaran, P.; Kumar, K. Multichannel Based IoT Malware Detection System Using System Calls and Opcode Sequences. Int. Arab. J. Inf. Technol. 2022, 19, 261–271.
- Sun, H.; Wang, X.; Buyya, R.; Su, J. CloudEyes: Cloud-based malware detection with reversible sketch for resource-constrained internet of things (IoT) devices. Softw. Pract. Exp. 2016, 47, 421–441.
- Ahmed, U.; Lin, J.C.W.; Srivastava, G. Mitigating adversarial evasion attacks of ransomware using ensemble learning. Comput. Electr. Eng. 2022, 100, 107903.
- Ibrahim, A.; Tariq, U.; Ahamed Ahanger, T.; Tariq, B.; Gebali, F. Retaliation against Ransomware in Cloud-Enabled PureOS System. Mathematics 2023, 11, 249.
- Barrett, M.P. Framework for Improving Critical Infrastructure Cybersecurity Version 1.1. NIST. 16 April 2018. Available online: https://www.nist.gov/publications/framework-improving-critical-infrastructure-cybersecurity-version-11 (accessed on 27 March 2023).
- Hull, G.; Jhon, H.; Arief, B. Ransomware deployment methods and analysis: Views from a predictive model and human responses. Crime Sci. 2019, 8, 2.
- Kharraz, A.; Robertson, W.; Kirda, E. Protecting against Ransomware: A New Line of Research or Restating Classic Ideas? IEEE Secur. Priv. 2018, 16, 103–107.
- Upadhyaya, R.; Jain, A. Cyber ethics and cyber crime: A deep dwelved study into legality, ransomware, underground web and bitcoin wallet. In Proceedings of the 2016 International Conference on Computing, Communication and Automation (ICCCA), Greater Noida, India, 29–30 April 2016; pp. 143–148.
- Gagneja, K.K. Knowing the ransomware and building defense against it—Specific to healthcare institutes. In Proceedings of the 2017 Third International Conference on Mobile and Secure Services (MobiSecServ), Miami Beach, FL, USA, 11–12 February 2017; pp. 1–5.
- Celdrán, A.H.; Sánchez, P.M.S.; Castillo, M.A.; Bovet, G.; Pérez, G.M.; Stiller, B. Intelligent and behavioral-based detection of malware in IoT spectrum sensors. Int. J. Inf. Secur. 2022, 22, 541–561.
- Moon, D.; Lee, J.; Yoon, M. Compact feature hashing for machine learning based malware detection. ICT Express 2022, 8, 124–129.
- Dargahi, T.; Dehghantanha, A.; Bahrami, P.N.; Conti, M.; Bianchi, G.; Benedetto, L. A Cyber-Kill-Chain based taxonomy of crypto-ransomware features. J. Comput. Virol. Hacking Tech. 2019, 15, 277–305.
- ESET: Threat Report Q2 2020. Comput. Fraud. Secur. 2020, 2020, 4.
- Yang, W.; Gao, M.; Chen, L.; Liu, Z.; Ying, L. RecMaL: Rectify the malware family label via hybrid analysis. Comput. Secur. 2023, 128, 103177.
- VirusChaser: A Comprehensive Antivirus Solution Equipped with Powerful System Protection Features. VirusChaser. 18 February 2023. Available online: https://www.ncloud.com/marketplace/viruschaser (accessed on 16 April 2023).
- FKIE, F. Cloud Snooper (Malware Family). 21 December 2020. Available online: https://malpedia.caad.fkie.fraunhofer.de/details/elf.cloud_snooper (accessed on 1 March 2023).
- Tonido—Run Your Personal Cloud. A Free Private Cloud Server. (n.d.). Available online: https://www.tonido.com/ (accessed on 2 March 2023).
- Ghafur, S.; Kristensen, S.; Honeyford, K.; Martin, G.; Darzi, A.; Aylin, P. A retrospective impact analysis of the WannaCry cyberattack on the NHS. npj Digit. Med. 2019, 2, 98.
- Eliando, E.; Purnomo, Y. LockBit 2.0 Ransomware: Analysis of infection, persistence, prevention mechanism. CogITo Smart J. 2022, 8, 232–243.
- Kajave, A.; Nismy, S.A.H. How Cyber Criminal Use Social Engineering to Target Organizations. arXiv 2022, arXiv:2212.12309.
- Tanner, D.A.; Hinchliffe, A.; Santos, D. Threat Assessment: Blackcat Ransomware. 2022. Available online: https://shorturl.at/cdV37 (accessed on 2 March 2023).
- Kara, I.; Aydos, M. The rise of ransomware: Forensic analysis for windows based ransomware attacks. Expert Syst. Appl. 2022, 190, 116198.
- Umar, R.; Riadi, I.; Kusuma, R.S. Analysis of Conti Ransomware Attack on Computer Network with Live Forensic Method. IJID Int. J. Inform. Dev. 2021, 10, 53–61.
- Datta, P.M.; Acton, T. From disruption to ransomware: Lessons from hackers. J. Inf. Technol. Teach. Cases 2022.
- Purism Products. Available online: https://puri.sm/products/ (accessed on 3 March 2023).
- Zou, S.; Zhang, J.; Jiang, S.; Cheng, Y.; Ji, X.; Xu, W. OutletGuarder: Detecting DarkSide Ransomware by Power Factor Correction Signals in an Electrical Outlet. In Proceedings of the 2022 IEEE 28th International Conference on Parallel and Distributed Systems (ICPADS), Nanjing, China, 10–12 January 2023; pp. 419–426.
- Lin, C.; Kimberly, G.; Daniel, R.; Henry, U. Blockchain Forensics and Crypto-Related Cybercrimes. SSRN 2023.
- PureBoot – Purism. (n.d.). Purism. Available online: https://puri.sm/projects/pureboot/ (accessed on 1 March 2023).
- Palša, J.; Ádám, N.; Hurtuk, J.; Chovancová, E.; Madoš, B.; Chovanec, M.; Kocan, S. MLMD—A Malware-Detecting Antivirus Tool Based on the XGBoost Machine Learning Algorithm. Appl. Sci. 2022, 12, 6672.
- Srinivasan, S.; Deepalakshmi, P. ENetRM: ElasticNet Regression Model based malicious cyber-attacks prediction in real-time server. Meas. Sens. 2023, 25, 100654.
- VMware. NSX Sandbox|VMware. Available online: https://www.vmware.com/products/nsx-sandbox.html (accessed on 4 March 2023).
- Wahidin, G.W.; Syaifuddin, S.; Sari, Z. Analisis Ransomware Wannacry Menggunakan Aplikasi Cuckoo Sandbox. J. Repos. 2022, 4, 83–94.
- Lee, C.; Han, S.M.; Chae, Y.H.; Seong, P.H. Development of a cyberattack response planning method for nuclear power plants by using the Markov decision process model. Ann. Nucl. Energy 2022, 166, 108725.
- Sahin, D.O.; Akleylek, S.; Kilic, E. LinRegDroid: Detection of Android Malware Using Multiple Linear Regression Models-Based Classifiers. IEEE Access 2022, 10, 14246–14259.
- Singh, P.; Borgohain, S.K.; Kumar, J. Performance Enhancement of SVM-based ML Malware Detection Model Using Data Preprocessing. In Proceedings of the 2022 2nd International Conference on Emerging Frontiers in Electrical and Electronic Technologies (ICEFEET), Patna, India, 24–25 June 2022; pp. 1–4.
- Mowri, R.A.; Siddula, M.; Roy, K. Interpretable Machine Learning for Detection and Classification of Ransomware Families Based on API Calls. arXiv 2022, arXiv:2210.11235.
Detection Technique | Features | Advantages | Disadvantages |
---|---|---|---|
Signature-based | Hash values, file names, behavior patterns | High accuracy, low false positive rate | Inability to detect new, unknown ransomware variants, ineffective against polymorphic ransomware |
Heuristic-based | Behavior patterns, file access patterns, network traffic | Ability to detect new, unknown ransomware variants, low false positive rate, effective against polymorphic ransomware | Higher false negative rate, limited ability to differentiate between benign and malicious activity |
Machine learning-based | Dynamic behavior analysis, system calls, network traffic, entropy, header information | Ability to detect new, unknown ransomware variants, ability to differentiate between benign and malicious activity, high accuracy, effective against polymorphic ransomware | Requires large, representative datasets for training, may be susceptible to adversarial attacks, may produce false positives due to benign software with similar behavior |
Hybrid approach | Combination of signature-based and machine-learning-based techniques | Improved accuracy and ability to detect new, unknown ransomware variants, effective against polymorphic ransomware | May be more complex and resource-intensive, may still miss new, unknown ransomware variants |
SMOTE Applicability to Identify Ransomware in a Dataset |
---|
|
Pseudocode for Detecting Ransomware Using XGBoost and ElasticNet |
---|
|
Pseudocode for Evaluating the Performance of the Model

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Perform feature selection using ElasticNet
    important_features = perform_elasticnet(X_train, y_train)

    # Select the important features
    X_train_selected = select_features(X_train, important_features)
    X_test_selected = select_features(X_test, important_features)

    # Train an XGBoost model on the selected features
    model = train_xgboost(X_train_selected, y_train)

    # Predict the labels of the test set
    y_pred = model.predict(X_test_selected)

    # Compute the evaluation metrics
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
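The four metrics named at the end of the evaluation pseudocode (accuracy, precision, recall, and F1) can be derived directly from confusion-matrix counts. The sketch below is a plain-Python illustration of the standard definitions, not the library calls used in the study; the function name `classification_metrics` and the toy labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy labels: 1 = ransomware, 0 = benign.
y_test = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(classification_metrics(y_test, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

Because F1 is the harmonic mean of precision and recall, it always lies between those two values, which is a quick sanity check for any reported results.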
Ransomware | False Positive | False Negative | Accuracy (%) | Precision | Recall | F-Score |
---|---|---|---|---|---|---|
Kryptik | 3.21 | 1.95 | 85 | 0.823 | 0.853 | 0.869 |
Cloud Snooper | 1.67 | 2.84 | 92 | 0.882 | 0.830 | 0.863 |
WannaCry | 0.95 | 2.18 | 81 | 0.854 | 0.818 | 0.861 |
LockBit | 3.53 | 3.34 | 88 | 0.801 | 0.846 | 0.832 |
Black Basta | 2.47 | 1.17 | 84 | 0.888 | 0.839 | 0.854 |
Revised Hive | 1.92 | 2.53 | 89 | 0.820 | 0.817 | 0.829 |
ALPHV/BlackCat/Noberus | 2.99 | 2.28 | 95 | 0.847 | 0.856 | 0.844 |
AvosLocker | 1.42 | 1.11 | 83 | 0.897 | 0.824 | 0.819 |
Conti | 3.76 | 3.89 | 87 | 0.876 | 0.811 | 0.876 |
REvil | 1.08 | 3.48 | 80 | 0.815 | 0.857 | 0.877 |
DarkSide | 2.27 | 1.73 | 91 | 0.809 | 0.814 | 0.816 |
Babuk | 0.85 | 3.29 | 94 | 0.865 | 0.847 | 0.823 |
Egregor | 3.94 | 3.747 | 82 | 0.839 | 0.819 | 0.881 |
Avaddon | 2.04 | 1.09 | 90 | 0.896 | 0.862 | 0.858 |
Feature Optimization | Applied Feature Count | TP Rate (%) | FP Rate | Precision | F-Score |
---|---|---|---|---|---|
Information Gain | 140 | 82.96 | 2.74 | 0.928 | 0.844 |
Information Gain | 112 | 86.02 | 2.12 | 0.867 | 0.804 |
Information Gain | 84 | 81.75 | 3.14 | 0.923 | 0.823 |
Information Gain | 56 | 85.14 | 2.58 | 0.849 | 0.862 |
Information Gain | 28 | 83 | 1.98 | 0.882 | 0.818 |
Chi-Square | 140 | 91.94 | 2.54 | 0.798 | 0.844 |
Chi-Square | 112 | 92.13 | 3.10 | 0.826 | 0.804 |
Chi-Square | 84 | 91.56 | 2.28 | 0.771 | 0.823 |
Chi-Square | 56 | 92.01 | 3.30 | 0.793 | 0.862 |
Chi-Square | 28 | 92.32 | 1.99 | 0.810 | 0.818 |
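
Both ranking criteria in the table above score a feature by how strongly it is associated with the class label, after which the top-ranked features are retained. A minimal sketch for discrete features is given below, assuming each feature is passed as a column of values; the function names and toy data are illustrative, not taken from the experiments.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(labels) minus the entropy of labels conditioned on the feature value."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature/label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for x, fx in feature_totals.items():
        for y, ly in label_totals.items():
            expected = fx * ly / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def top_k(columns, labels, score, k):
    """Indices of the k feature columns with the highest score."""
    ranked = sorted(range(len(columns)), key=lambda j: score(columns[j], labels), reverse=True)
    return ranked[:k]


labels = [0, 0, 1, 1]
informative = [0, 0, 1, 1]   # matches the label exactly
noise = [0, 1, 0, 1]         # independent of the label
best = top_k([noise, informative], labels, information_gain, k=1)
```

The `k` argument plays the role of the "Applied Feature Count" column: the same ranking is truncated at different depths.
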
Ransomware | Encoding | Lock | Remote Access Trojan | Sample Size (%) |
---|---|---|---|---|
Kryptik | ✓ | ✓ | ✓ | 4 |
Cloud Snooper | ✓ | ✓ | ✓ | 9 |
WannaCry | ✓ | - | - | 7 |
LockBit | ✓ | ✓ | - | 5 |
Black Basta | ✓ | - | - | 11 |
Revised Hive | ✓ | ✓ | - | 8 |
ALPHV/BlackCat/Noberus | ✓ | - | - | 10 |
AvosLocker | ✓ | ✓ | ✓ | 6 |
Conti | ✓ | - | ✓ | 12 |
REvil | ✓ | - | ✓ | 3 |
DarkSide | ✓ | - | ✓ | 8 |
Babuk | ✓ | ✓ | - | 2 |
Egregor | ✓ | ✓ | ✓ | 9 |
Avaddon | ✓ | ✓ | - | 6 |
Sr# | ML Algorithm | Accuracy (25 Folds) | Accuracy (60% Split) | Precision (25 Folds) | Precision (60% Split) | Recall (25 Folds) | Recall (60% Split) | F-Score (25 Folds) | F-Score (60% Split) |
---|---|---|---|---|---|---|---|---|---|
1 | Reinforcement learning (Markov Decision Process + Q-Learning) [43] | 0.867 | 0.865 | 0.867 | 0.865 | 0.845 | 0.842 | 0.874 | 0.872 |
2 | K-Nearest Neighbors Algorithm [44] | 0.872 | 0.870 | 0.872 | 0.871 | 0.855 | 0.853 | 0.880 | 0.882 |
3 | Support Vector Machine [45] | 0.845 | 0.846 | 0.846 | 0.842 | 0.803 | 0.806 | 0.845 | 0.842 |
4 | Stochastic Gradient Descent [46] | 0.811 | 0.816 | 0.813 | 0.817 | 0.733 | 0.725 | 0.804 | 0.818 |
5 | Naive Bayes [44] | 0.512 | 0.532 | 0.672 | 0.666 | 0.551 | 0.533 | 0.865 | 0.847 |
6 | Hybrid XGBoost and ElasticNet | 0.901 | 0.907 | 0.921 | 0.917 | 0.920 | 0.933 | 0.921 | 0.927 |
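
The comparison above reports each metric under two evaluation protocols. Assuming "25 Folds" denotes 25-fold cross-validation and "60% Split" denotes a holdout split with 60% of the samples used for training (the table does not state which side of the split is 60%), the index bookkeeping can be sketched as follows. The function names and seed are illustrative.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Return k (train_indices, test_indices) pairs covering every sample once as test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def holdout_split(n_samples, train_fraction=0.6, seed=0):
    """Return one (train_indices, test_indices) pair of the requested proportion."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


cv_splits = k_fold_indices(100, k=25)
train_idx, test_idx = holdout_split(100, train_fraction=0.6)
```

Under cross-validation each reported score would be the mean over the 25 test folds, whereas the holdout protocol yields a single estimate from one partition.
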
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ahanger, T.A.; Tariq, U.; Dahan, F.; Chaudhry, S.A.; Malik, Y. Securing IoT Devices Running PureOS from Ransomware Attacks: Leveraging Hybrid Machine Learning Techniques. Mathematics 2023, 11, 2481. https://doi.org/10.3390/math11112481