A Machine Learning-Based Ransomware Detection Method for Attackers’ Neutralization Techniques Using Format-Preserving Encryption
Abstract
1. Introduction
- To prevent ransomware infections and minimize damage, we analyzed existing ransomware detection and neutralization technologies and, taking into account the neutralization technologies employed by attackers, derived countermeasures from a defender’s perspective. We proposed a method for detecting files encrypted with format-preserving encryption (FPE), a technique that can neutralize entropy-based ransomware detection. By addressing such neutralization technologies, this article is expected to provide a means of detecting files infected by various types of ransomware.
- By analyzing the technologies that can neutralize existing ransomware detection methods, we identified the limitations of those methods. Furthermore, by applying various machine learning models, we verified that ransomware detection remains possible even when such neutralization technologies are used.
- By comparing and evaluating ransomware detection performance in the presence of neutralization technologies, we found that the proposed method detected ransomware more effectively than the entropy-based detection method.
- Based on the experimental results, these preliminary findings are expected to support the development of countermeasures against additional ransomware neutralization technologies devised from an attacker’s perspective, beyond the neutralization of detection methods based on file entropy measurement.
2. Prior Research Studies
2.1. Neutralization Methods for Entropy-Based Ransomware Detection Technology Using Encoding Algorithms and Countermeasures
2.2. Neutralization Method for Entropy-Based Ransomware Detection Using FPE
3. Proposed Ransomware Detection Method
3.1. Configuring the System for Ransomware Detection
- Data Acquisition Step
- Feature Extraction Step
- Pre-processing Step
- Dataset Configuration Step
- Training Step
- Classification Step
3.2. Experimental Design and Verification Based on Dataset
3.3. Deriving Optimal Hyperparameters According to the Model
4. Experimental Results
4.1. Performance of the Proposed Method
- Performance Evaluation Metrics Using Machine Learning Models
- Performance Evaluation Results by Feature
4.2. Performance Comparison of the Proposed Method with Other Ransomware Neutralization Technologies
4.3. Performance Evaluation Results Based on Different Data Ratios
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Korea Internet & Security Agency. 2024 Q1 Ransomware Trends Report. Available online: https://seed.kisa.or.kr/kisa/Board/174/detailView.do (accessed on 13 June 2024).
- Sophos. The State of Ransomware 2024. A Sophos Whitepaper. Available online: https://www.sophos.com/en-us/content/state-of-ransomware (accessed on 26 February 2024).
- Bang, J.; Kim, J.N.; Lee, S. Entropy Sharing in Ransomware: Bypassing Entropy-Based Detection of Cryptographic Operations. Sensors 2024, 24, 1446.
- Lee, K.; Lee, J.; Lee, S.-Y.; Yim, K. Effective Ransomware Detection Using Entropy Estimation of Files for Cloud Services. Sensors 2023, 23, 3023.
- Timothy, M.; Julian, J.; Paul, W.; Teo, S. The inadequacy of entropy-based ransomware detection. In Communications in Computer and Information Science; Springer: Cham, Switzerland, 2019; pp. 181–189.
- Digital Corpora. Govdocs1—(Nearly) 1 Million Freely-Redistributable Files. Available online: https://digitalcorpora.org/corpora/file-corpora/files/ (accessed on 13 April 2024).
- Lee, J.; Lee, K. A Method for Neutralizing Entropy Measurement-Based Ransomware Detection Technologies Using Encoding Algorithms. Entropy 2022, 24, 239.
- Lee, J.; Yun, J.; Lee, K. A Study on Countermeasures against Neutralizing Technology: Encoding Algorithm-Based Ransomware Detection Methods Using Machine Learning. Electronics 2024, 13, 1030.
- Lee, J.; Lee, S.-Y.; Yim, K.; Lee, K. Neutralization Method of Ransomware Detection Technology Using Format Preserving Encryption. Sensors 2023, 23, 4728.
- Kim, D.; Kim, H.; Jang, K.; Yoon, S.; Seo, H. Deep-Learning-Based Neural Distinguisher for FPE Schemes FF1 and FF3. Electronics 2024, 13, 1196.
- Garfinkel, S.; Farrell, P.; Roussev, V.; Dinolt, G. Bringing science to digital forensics with standardized forensic corpora. Digit. Investig. 2009, 6, S2–S11.
- GitHub. Trending. Available online: https://github.com/trending/c?since=daily&spoken_language_code= (accessed on 30 April 2024).
- Suhardjono, S.; Handayani, P.; Sugiarto, H.; Aisyah, N.; Putra, A.S. Forensic Analysis Video Metadata Authenticity Detection Using ExifTool. J. Innov. Res. Knowl. 2022, 1, 1727–1734.
- Alotaibi, F.M.; Al-Dhaqm, A.; Al-Otaibi, Y.D.; Alsewari, A.A. A Comprehensive Collection and Analysis Model for the Drone Forensics Field. Sensors 2022, 22, 6486.
- Lee, K.; Lee, S.-Y.; Yim, K. Machine Learning Based File Entropy Analysis for Ransomware Detection in Backup Systems. IEEE Access 2019, 7, 110205–110215.
- Schneier, B. Applied Cryptography: Protocols, Algorithms and Source Code in C, 2nd ed.; Wiley: Hoboken, NJ, USA, 1996; p. 251. ISBN 9780471117094.
- Cho, E.; Chang, T.-W.; Hwang, G. Data Preprocessing Combination to Improve the Performance of Quality Classification in the Manufacturing Process. Electronics 2022, 11, 477.
- Fan, Q.; Li, X.; Wang, P.; Jin, X.; Yao, S.; Miao, S. BDIP: An Efficient Big Data-Driven Information Processing Framework and Its Application in DDoS Attack Detection. IEEE Trans. Netw. Serv. Manag. 2024, 22, 284–298.
- Zhang, M.-L.; Zhou, Z.-H. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognit. 2007, 40, 2038–2048.
- Cheng, W.; Hüllermeier, E. Combining instance-based learning and logistic regression for multilabel classification. Mach. Learn. 2009, 76, 211–225.
- Strelcenia, E.; Prakoonwit, S. Effective Feature Engineering and Classification of Breast Cancer Diagnosis: A Comparative Study. BioMedInformatics 2023, 3, 616–631.
- Cusack, G.; Michel, O.; Keller, E. Machine Learning-Based Detection of Ransomware Using SDN. In Proceedings of the ACM International Workshop on Security in Software Defined Networks & Network Function Virtualization (SDN-NFV Sec’18), Tempe, AZ, USA, 21 March 2018; pp. 1–6.
- Gono, D.N.; Napitupulu, H.; Firdaniza. Silver Price Forecasting Using Extreme Gradient Boosting (XGBoost) Method. Mathematics 2023, 11, 3813.
- Noble, W.S. What is a support vector machine? Nat. Biotechnol. 2006, 24, 1565–1567.
- Mirjalili, S. How effective is the grey wolf optimizer in training Multi-Layer Perceptrons. Appl. Intell. 2015, 43, 150–161.
- Olaniran, O.R.; Alzahrani, A.R.R.; Alzahrani, M.R. Eigenvalue Distributions in Random Confusion Matrices: Applications to Machine Learning Evaluation. Mathematics 2024, 12, 1425.
- Katal, N.; Gupta, S.; Verma, P.; Sharma, B. Deep-Learning-Based Arrhythmia Detection Using ECG Signals: A Comparative Study and Performance Evaluation. Diagnostics 2023, 13, 3605.
- Singh, A.; Mushtaq, Z.; Abosaq, H.A.; Mursal, S.N.F.; Irfan, M.; Nowakowski, G. Enhancing Ransomware Attack Detection Using Transfer Learning and Deep Learning Ensemble Models on Cloud-Encrypted Data. Electronics 2023, 12, 3899.
- Su, L.; Cheng, H.; Li, L.; Zhang, C.; Wang, Y.; Zhao, J. A Novel Approach of Ransomware Detection with Dynamic Obfuscation Signature Analysis. Res. Sq. 2024, preprint.
- Altais, B.; Arkwright, B.; Ashbourne, T.; Middleham, E. Novel Algorithmic Framework for High-Fidelity Ransomware Detection Using Entropy-Based Behavioural Signatures. OSF 2024, preprint.
- Li, J.; Yang, G.; Shao, Y. Ransomware Detection Model Based on Adaptive Graph Neural Network Learning. Appl. Sci. 2024, 14, 4579.
- Albin Ahmed, A.; Shaahid, A.; Alnasser, F.; Alfaddagh, S.; Binagag, S.; Alqahtani, D. Android Ransomware Detection Using Supervised Machine Learning Techniques Based on Traffic Analysis. Sensors 2024, 24, 189.
Comparison Criteria | Study A [5] | Study B [7] | Study C [8] | Study D [9] | This Study |
---|---|---|---|---|---|
Research Objective | Neutralization of entropy-based ransomware detection | Neutralization of entropy-based ransomware detection | Detection countermeasures for Study B | Neutralization of entropy-based ransomware detection | Detection countermeasures for Study D |
Entropy Manipulation Method | Base64 | Base64, Base32, ASCII85, URL | N/A | FPE | N/A |
Detection Method | N/A | N/A | Machine learning-based method | N/A | Machine learning-based method |
Dataset | Govdocs1 | Govdocs1 | Govdocs1 | Govdocs1 | Govdocs1 |
Contribution | Identification of the limitations of entropy-based ransomware detection using simple encryption algorithms | Improvement of neutralization techniques by diversifying entropy values with Base64 algorithms | Introduction of a counter-detection method for neutralization techniques using encoding algorithms | Resolution of the limitations of encoding-based neutralization techniques | Introduction of a counter-detection method for neutralization techniques using FPE |
Limitations | Produces fixed entropy values | Can be decoded and detected using machine learning-based algorithms | X | Can be detected using machine learning-based algorithms | X |
File Type | File Format | Number of Files | Radix |
---|---|---|---|
Text file | CSV | 800 | Radix 5 |
Text file | TXT | 800 | Radix 4 |
System file | SYS | 450 | Radix 10 |
System file | DLL | 800 | Radix 8 |
Document file | | 450 | Radix 16 |
Document file | DOC | 450 | Radix 5 |
Document file | DOCX | 150 | Radix 16 |
Document file | PPT | 450 | Radix 10 |
Document file | PPTX | 150 | Radix 16 |
Document file | XLS | 150 | Radix 4 |
Document file | XLSX | 30 | Radix 16 |
Image file | JPG | 450 | Radix 16 |
Webpage file | HTML | 800 | Radix 6 |
Compressed file | ZIP | 6 | Radix 16 |
Source code file | C | 150 | Radix 6 |
Source code file | CPP | 150 | Radix 5 |
Dataset | Feature Set | Total Number of Files | Number of Files Infected with Ransomware | Number of Plaintext Files | Ratio |
---|---|---|---|---|---|
Dataset 1 | Entropy, file type | 12,472 | 6236 | 6236 | 1:1 |
Dataset 2 | Entropy, file type, file size | 12,472 | 6236 | 6236 | 1:1 |
Dataset 3 | Entropy, file type, file size, file MAC data | 12,472 | 6236 | 6236 | 1:1 |
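
As a rough illustration of the features listed in the table above, the following sketch computes byte-level Shannon entropy, file size, and a file-type label for an in-memory file. It is not the authors' implementation: the function names, the use of the file extension as the file-type label, and the base-2 byte entropy (0 to 8 bits per byte) are assumptions, and the file MAC data feature is omitted.

```python
import math
from collections import Counter
from pathlib import PurePath


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_features(name: str, data: bytes) -> dict:
    """Build one feature row (hypothetical layout) from a file name and its bytes."""
    return {
        "entropy": shannon_entropy(data),
        # Assumption: the lowercase extension stands in for the file type.
        "file_type": PurePath(name).suffix.lower().lstrip("."),
        "file_size": len(data),
    }


row = extract_features("report.CSV", b"a,b\n1,2\n")
print(row)
```

A row like this would be produced once for the plaintext version of a file and once for its encrypted version, which is how a balanced 1:1 dataset of the kind shown above could be assembled.
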
Dataset | Model | Hyperparameter |
---|---|---|
Dataset 1 | KNN | n_neighbors: 15 |
Dataset 1 | Logistic Regression | C: 0.01, penalty: l2 |
Dataset 1 | Decision Tree | max_depth: 12 |
Dataset 1 | Random Forest | n_estimators: 4 |
Dataset 1 | Gradient Boosting | max_depth: 4, learning_rate: 0.1 |
Dataset 1 | MLP | max_iter: 1000, alpha: 0.00001 |
Dataset 1 | SVM | C: 10,000,000 |
Dataset 2 | KNN | n_neighbors: 15 |
Dataset 2 | Logistic Regression | C: 0.01, penalty: l2 |
Dataset 2 | Decision Tree | max_depth: 16 |
Dataset 2 | Random Forest | n_estimators: 11 |
Dataset 2 | Gradient Boosting | max_depth: 13, learning_rate: 0.001 |
Dataset 2 | MLP | max_iter: 1000, alpha: 0.00001 |
Dataset 2 | SVM | C: 1,000,000 |
Dataset 3 | KNN | n_neighbors: 1 |
Dataset 3 | Logistic Regression | C: 10,000, penalty: l2 |
Dataset 3 | Decision Tree | max_depth: 3 |
Dataset 3 | Random Forest | n_estimators: 1 |
Dataset 3 | Gradient Boosting | max_depth: 1, learning_rate: 0.001 |
Dataset 3 | MLP | max_iter: 1000, alpha: 0.00001 |
Dataset 3 | SVM | C: 1,000,000 |
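
The tuning procedure behind these values is not described in this section, so the following is only a generic exhaustive grid-search sketch of how one hyperparameter setting per model could be selected. The candidate grid and the `toy_score` function are hypothetical stand-ins; in practice the scorer would train the model with the given parameters and return a validation metric.

```python
from itertools import product


def grid_search(param_grid: dict, score_fn):
    """Return (best_params, best_score) over the Cartesian product of the grid."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:  # keep the first best-scoring combination
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate values for a gradient-boosting model.
grid = {"max_depth": [1, 4, 13], "learning_rate": [0.001, 0.1]}


def toy_score(params: dict) -> float:
    """Stand-in scorer that peaks at max_depth=4, learning_rate=0.1."""
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)


best, score = grid_search(grid, toy_score)
print(best)
```

Running the search separately for each dataset is one way the same model could end up with different settings per dataset, as in the table above.
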
Classification | Description |
---|---|
True Positive (TP) | A file infected with ransomware using FPE is correctly classified as infected. |
True Negative (TN) | A plaintext file is correctly classified as plaintext. |
False Positive (FP) | A plaintext file is misclassified as infected with ransomware using FPE. |
False Negative (FN) | A file infected with ransomware using FPE is misclassified as plaintext. |
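
The four counts defined above feed the usual evaluation metrics. The sketch below uses the standard definitions of accuracy, precision, recall, and F1-score; the function name and the example counts are illustrative assumptions, not values from the experiments.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts: 100 FPE-encrypted files and 100 plaintext files.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
print(metrics)
```

Precision penalizes plaintext files flagged as infected, while recall penalizes infected files that slip through as plaintext, so the two are reported separately.
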
File Type | File Format | Entropy-Based Detection Method: Threshold | Entropy-Based Detection Method: Precision | Proposed Method: Precision |
---|---|---|---|---|
Text file | csv | 0.2 | 54.26% | 98.59% |
Text file | txt | 0.1 | 50.89% | 98.76% |
System file | sys | 0.0 | 50.00% | 96.36% |
System file | dll | 0.0 | 50.00% | 97.51% |
Document file | | 0.4 | 80.58% | 97.36% |
Document file | doc | 0.1 | 51.84% | 98.60% |
Document file | docx | 0.4 | 85.23% | 98.56% |
Document file | ppt | 0.0 | 50.00% | 98.87% |
Document file | pptx | 0.1 | 64.38% | 99.12% |
Document file | xls | 0.0 | 50.00% | 98.22% |
Document file | xlsx | 0.6 | 90.91% | 100.00% |
Image file | jpg | 0.1 | 59.95% | 92.83% |
Webpage file | html | 0.0 | 50.00% | 97.67% |
Compressed file | zip | 0.0 | 50.00% | 46.67% |
Source code file | c | 0.1 | 60.98% | 98.26% |
Source code file | cpp | 0.3 | 74.63% | 96.80% |
Average | | | 60.85% | 94.64% |
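
The Average row is consistent with an unweighted mean over the 16 per-format rows, which the short check below reproduces. Treating the average as unweighted (every format counts equally, regardless of how many files it has) is an inference from the numbers, not something the table states.

```python
from statistics import mean

# Precision (%) per row, in table order; the zip row is the 14th entry.
entropy_based = [54.26, 50.89, 50.00, 50.00, 80.58, 51.84, 85.23, 50.00,
                 64.38, 50.00, 90.91, 59.95, 50.00, 50.00, 60.98, 74.63]
proposed = [98.59, 98.76, 96.36, 97.51, 97.36, 98.60, 98.56, 98.87,
            99.12, 98.22, 100.00, 92.83, 97.67, 46.67, 98.26, 96.80]

avg_entropy = round(mean(entropy_based), 2)
avg_proposed = round(mean(proposed), 2)
print(avg_entropy, avg_proposed)  # 60.85 94.64

# Mean for the proposed method with the zip row left out.
avg_proposed_no_zip = round(mean(proposed[:13] + proposed[14:]), 2)
print(avg_proposed_no_zip)
```

Because the mean is unweighted, the zip row carries the same weight as formats with hundreds of files and pulls the proposed method's average down noticeably.
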
File Format | Files at 1:99 (Ciphertext:Plaintext) | Files at 1:9 | Files at 5:5 | Files at 9:1 | Files at 99:1 |
---|---|---|---|---|---|
CSV | 8:792 | 80:720 | 400:400 | 720:80 | 792:8 |
TXT | 8:792 | 80:720 | 400:400 | 720:80 | 792:8 |
SYS | 5:445 | 45:405 | 225:225 | 405:45 | 445:5 |
DLL | 8:792 | 80:720 | 400:400 | 720:80 | 792:8 |
 | 5:445 | 45:405 | 225:225 | 405:45 | 445:5 |
DOC | 5:445 | 45:405 | 225:225 | 405:45 | 445:5 |
DOCX | 2:148 | 15:135 | 75:75 | 135:15 | 148:2 |
PPT | 5:445 | 45:405 | 225:225 | 405:45 | 445:5 |
PPTX | 2:148 | 15:135 | 75:75 | 135:15 | 148:2 |
XLS | 2:148 | 15:135 | 75:75 | 135:15 | 148:2 |
XLSX | 1:29 | 3:27 | 15:15 | 27:3 | 29:1 |
JPG | 5:445 | 45:405 | 225:225 | 405:45 | 445:5 |
HTML | 8:792 | 80:720 | 400:400 | 720:80 | 792:8 |
C | 2:148 | 15:135 | 75:75 | 135:15 | 148:2 |
CPP | 2:148 | 15:135 | 75:75 | 135:15 | 148:2 |
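
The imbalanced counts in this table are consistent with splitting each format's file total by the target ratio and rounding the smaller class up to a whole file (for example, 450 files at 1:99 gives 5:445). That rounding rule is inferred from the counts and is not stated in the source; the function name below is hypothetical.

```python
from typing import Tuple


def split_by_ratio(total: int, cipher_part: int, plain_part: int) -> Tuple[int, int]:
    """Split `total` files into (ciphertext, plaintext) counts for a given ratio.

    Assumption: the smaller class is rounded up to a whole file.
    """
    whole = cipher_part + plain_part
    minority_part = min(cipher_part, plain_part)
    # Integer ceiling division avoids floating-point rounding surprises.
    minority = -(-total * minority_part // whole)
    if cipher_part <= plain_part:
        return minority, total - minority
    return total - minority, minority


for ratio in [(1, 99), (1, 9), (5, 5), (9, 1), (99, 1)]:
    print(ratio, split_by_ratio(450, *ratio))
```

Rounding the minority class up guarantees at least one sample of each class even for small formats, such as 30 files at 1:99 giving 1:29.
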
Comparison Criteria | Study A [28] | Study B [29] | Study C [30] | Study D [31] | Study E [32] | Ours |
---|---|---|---|---|---|---|
Accuracy | 89~99.1% (avg. 94.05%) | 92.3~97.8% (avg. 95.05%) | 89.2~98.7% (avg. 93.95%) | 92.7~96.6% (avg. 94.65%) | 86.84~97.24% (avg. 92.04%) | 95.67~99.97% (avg. 97.82%) |
Precision | 89.73~99.2% (avg. 94.465%) | 93.5~98.6% (avg. 96.05%) | X | 91.3~94.3% (avg. 92.8%) | 88.96~98.5% (avg. 93.73%) | 99.54~100% (avg. 99.77%) |
Recall | 87.43~98.9% (avg. 93.165%) | X | X | 90.2~91.4% (avg. 90.8%) | 97.28~100% (avg. 98.64%) | 91.19~99.93% (avg. 95.56%) |
F1-Score | 88.74~97.64% (avg. 93.19%) | X | X | 90.7~92.8% (avg. 91.75%) | 92.94~98.45% (avg. 95.695%) | 95.39~99.97% (avg. 97.68%) |
Dataset | Kaggle | Ransomware samples collected from multiple sources | Real-world ransomware samples | VirusShare, VirusTotal, and other well-known repositories | Kaggle | Govdocs1 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lee, J.; Kim, J.; Jeong, H.; Lee, K. A Machine Learning-Based Ransomware Detection Method for Attackers’ Neutralization Techniques Using Format-Preserving Encryption. Sensors 2025, 25, 2406. https://doi.org/10.3390/s25082406