EVuLLM: Ethereum Smart Contract Vulnerability Detection Using Large Language Models
Abstract
1. Introduction
- C1: We propose an enhanced vulnerability detection approach for Ethereum smart contracts through the fine-tuning of Large Language Models (LLMs) with parameter-efficient fine-tuning (PEFT) techniques, including Quantized Low-Rank Adaptation (QLoRA), achieving improved accuracy and efficiency.
- C2: We demonstrate the viability of open-source, lightweight LLMs for smart contract analysis, significantly lowering the financial and computational barriers to adoption.
- C3: We introduce a novel, combined evaluation dataset to address the scarcity of diverse, code-snippet-based benchmarks in smart contract security research.
- C4: We develop fine-tuned LLMs capable of effectively identifying vulnerabilities in Ethereum smart contracts across a range of common patterns.
2. Background
2.1. Large Language Models
2.2. Fine-Tuning Large Language Models
2.3. Retrieval-Augmented Generation
3. Related Work
3.1. Static and Machine Learning-Based Analysis
3.2. Large Language Models (LLMs)
3.3. Fine-Tuned LLMs
3.4. Retrieval-Augmented Generation
3.5. Summary
- High Cost: Proprietary models such as GPT-4 and Claude-v1.3 incur significant computational and financial costs, particularly for large-scale analysis of smart contracts.
- Limited Customisation: Pre-trained models cannot easily be adapted to domain-specific vulnerability patterns without fine-tuning, leading to higher false-positive rates and lower precision.
4. Methodology
4.1. TrustLLM Dataset
4.2. EVuLLM Dataset
4.3. Fine-Tuning LLMs for Vulnerability Detection
4.3.1. Model Selection
4.3.2. Fine-Tuning Methodology
4.4. Implementation of a RAG Framework for Vulnerability Detection Using Ollama
Architecture Design for the RAG Framework
- Instructions: Task description and classification guidance.
- Input Code Snippet: The Solidity code under analysis.
- Response Placeholder: Marks where the model generates its output.
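The retrieval step prepends similar labelled snippets from the vector store to this prompt before it is sent to the local model. The following is a minimal sketch of that flow, assuming the chromadb and ollama Python clients for brevity (the paper's implementation is built with LangChain, Chroma, and Ollama); the collection name, instruction wording, and the codegemma model tag are illustrative placeholders.

```python
import chromadb
import ollama

# Task description and classification guidance (the "Instructions" component).
INSTRUCTIONS = (
    "You are a smart contract security auditor. Classify the Solidity code "
    "below as vulnerable or safe, and name the vulnerability pattern if one is present."
)

def build_prompt(code_snippet: str, retrieved_examples: list[str]) -> str:
    """Assemble Instructions, retrieved context, the Input Code Snippet, and a Response placeholder."""
    context = "\n\n".join(retrieved_examples)
    return (
        f"### Instructions:\n{INSTRUCTIONS}\n\n"
        f"### Similar labelled examples:\n{context}\n\n"
        f"### Input Code Snippet:\n{code_snippet}\n\n"
        f"### Response:\n"
    )

def classify(code_snippet: str, n_results: int = 3) -> str:
    # Retrieve the most similar labelled snippets from a pre-populated Chroma collection.
    client = chromadb.PersistentClient(path="./vulndb")          # illustrative path
    collection = client.get_collection("labelled_snippets")      # illustrative name
    hits = collection.query(query_texts=[code_snippet], n_results=n_results)
    examples = hits["documents"][0] if hits["documents"] else []

    # Generate the classification with a locally served Ollama model.
    response = ollama.generate(model="codegemma", prompt=build_prompt(code_snippet, examples))
    return response["response"]
```

Injecting the retrieved examples between the instructions and the code under analysis lets the model compare the snippet against known vulnerable and safe patterns before committing to a label.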
5. Results and Discussion
5.1. Baseline Performance
5.1.1. RAG Models Without Retrieval
5.1.2. Models Used for Fine-Tuning (Pre-Fine-Tuning)
5.2. Retrieval-Augmented Generation Results
5.3. Fine-Tuning Experiments
5.3.1. TrustLLM Dataset
- TrustLLM Qwen 2.5 Coder: https://huggingface.co/emandana/TrustLLM_Qwen2_5_Coder_7B_Instruct_4b (accessed on 11 August 2025),
- TrustLLM CodeGemma: https://huggingface.co/emandana/TrustLLM_codegemma_7b_4b (accessed on 11 August 2025),
- TrustLLM Mistral v0.3: https://huggingface.co/emandana/TrustLLM_mistral_7b_Instruct_v03_4b (accessed on 11 August 2025),
- TrustLLM Mistral Nemo: https://huggingface.co/emandana/TrustLLM_Mistral_Nemo_Instruct_2407_4b (accessed on 11 August 2025),
- TrustLLM LLaMA 3.1: https://huggingface.co/emandana/TrustLLM_Meta_Llama_3.1_8B_Instruct_4b (accessed on 11 August 2025).
5.3.2. EVuLLM Dataset
- EVuLLM Qwen 2.5 Coder: https://huggingface.co/emandana/EVuLLM_Qwen2_5_Coder_7B_Instruct_4b_d200 (accessed on 11 August 2025),
- EVuLLM Mistral v0.3: https://huggingface.co/emandana/EVuLLM_mistral_7b_Instruct_v03_4b_d200 (accessed on 11 August 2025),
- EVuLLM Mistral Nemo: https://huggingface.co/emandana/EVuLLM_Mistral_Nemo_Instruct_2407_4b_d200 (accessed on 11 August 2025),
- EVuLLM LLaMA 3.1: https://huggingface.co/emandana/EVuLLM_Meta_Llama_3.1_8B_Instruct_4b_d200 (accessed on 11 August 2025),
- EVuLLM CodeGemma: https://huggingface.co/emandana/EVuLLM_codegemma_7b_4b_d200 (accessed on 11 August 2025).
5.4. Classification Examples
5.5. Cost and Performance Analysis
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
CoT | Chain of Thought
DAO | Decentralized Autonomous Organization
DeFi | Decentralized Finance
HPC | High Performance Computing
LLM | Large Language Model
LoRA | Low-Rank Adaptation
PEFT | Parameter-Efficient Fine-Tuning
QLoRA | Quantized Low-Rank Adaptation
RAG | Retrieval-Augmented Generation
References
- Nakamoto, S. Bitcoin: A Peer-to-Peer Electronic Cash System. 2008. Available online: https://assets.pubpub.org/d8wct41f/31611263538139.pdf (accessed on 11 August 2025).
- Kshetri, N. Blockchain’s roles in meeting key supply chain management objectives. Int. J. Inf. Manag. 2018, 39, 80–89. [Google Scholar] [CrossRef]
- Mengelkamp, E.; Gärttner, J.; Rock, K.; Weinhardt, C.; Bretschneider, B. Designing microgrid energy markets: A case study: The Brooklyn Microgrid. Appl. Energy 2018, 210, 870–880. [Google Scholar] [CrossRef]
- Azaria, A.; Ekblaw, A.; Vieira, T.; Lippman, A. MedRec: Using Blockchain for Medical Data Access and Permission Management. In Proceedings of the 2016 2nd International Conference on Open and Big Data (OBD), Vienna, Austria, 22–24 August 2016; pp. 25–30. [Google Scholar] [CrossRef]
- Zheng, Z.; Xie, S.; Dai, H.N.; Chen, X.; Wang, H. An overview of blockchain technology: Architecture, consensus, and future trends. In Proceedings of the 2017 IEEE International Congress on Big Data (BigData Congress), Boston, MA, USA, 11–14 December 2017; pp. 557–564. [Google Scholar] [CrossRef]
- Wood, G. Ethereum: A secure decentralised generalised transaction ledger. Ethereum Proj. Yellow Pap. 2014, 151, 1–32. [Google Scholar]
- Hewa, T.; Ylianttila, M.; Liyanage, M. Survey on Blockchain Based Smart Contracts: Applications, Opportunities and Challenges. J. Netw. Comput. Appl. 2021, 177, 102857. [Google Scholar] [CrossRef]
- Zheng, Z.; Xie, S.; Dai, H.N.; Chen, W.; Chen, X.; Weng, J.; Imran, M. An Overview on Smart Contracts: Challenges, Advances and Platforms. Future Gener. Comput. Syst. 2020, 105, 475–491. [Google Scholar] [CrossRef]
- Macrinici, D.; Cartofeanu, C.; Gao, S. Smart Contract Applications within Blockchain Technology: A Systematic Mapping Study. Telemat. Inform. 2018, 35, 2337–2354. [Google Scholar] [CrossRef]
- Zhang, Z.; Zhang, B.; Xu, W.; Lin, Z. Demystifying Exploitable Bugs in Smart Contracts. In Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, Australia, 14–20 May 2023; pp. 615–627. [Google Scholar] [CrossRef]
- Zhou, X.; Cao, S.; Sun, X.; Lo, D. Large Language Model for Vulnerability Detection and Repair: Literature Review and the Road Ahead. arXiv 2024, arXiv:2404.02525. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar] [CrossRef]
- Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M.S.; Bohg, J.; Bosselut, A.; Brunskill, E.; et al. On the Opportunities and Risks of Foundation Models. arXiv 2022, arXiv:2108.07258. [Google Scholar] [CrossRef]
- Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Howard, J.; Ruder, S. Universal Language Model Fine-tuning for Text Classification. arXiv 2018, arXiv:1801.06146. [Google Scholar] [CrossRef]
- Kumar, A.B.V. Fine-Tuning LLM: Parameter Efficient Fine-Tuning (PEFT) —LoRA, QLoRA (Part 1). 2024. Available online: https://abvijaykumar.medium.com/fine-tuning-llm-parameter-efficient-fine-tuning-peft-lora-qlora-part-1-571a472612c4 (accessed on 27 September 2012).
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. arXiv 2021, arXiv:2106.09685. [Google Scholar] [CrossRef]
- Dettmers, T.; Pagnoni, A.; Holtzman, A.; Zettlemoyer, L. QLoRA: Efficient Finetuning of Quantized LLMs. arXiv 2023, arXiv:2305.14314. [Google Scholar] [CrossRef]
- Lewis, P.S.H.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.; Rocktäschel, T.; et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv 2020, arXiv:2005.11401. [Google Scholar] [CrossRef]
- Borgeaud, S.; Mensch, A.; Hoffmann, J.; Cai, T.; Rutherford, E.; Millican, K.; van den Driessche, G.; Lespiau, J.B.; Damoc, B.; Clark, A.; et al. Improving language models by retrieving from trillions of tokens. arXiv 2022, arXiv:2112.04426. [Google Scholar] [CrossRef]
- Schär, F. Decentralized Finance: On Blockchain- and Smart Contract-Based Financial Markets. In Federal Reserve Bank of St. Louis Review, Second Quarter; Federal Reserve Bank of St. Louis: St. Louis, MO, USA, 2021; pp. 153–174. [Google Scholar] [CrossRef]
- Gemini. Why Are Most dApps Built on Ethereum? 2023. Available online: https://www.gemini.com/cryptopedia/dapps-ethereum-decentralized-application (accessed on 4 September 2024).
- Brent, L.; Grech, N.; Lagouvardos, S.; Scholz, B.; Smaragdakis, Y. Ethainter: A Smart Contract Security Analyzer for Composite Vulnerabilities. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, New York, NY, USA, 15–20 June 2020; PLDI: Seoul, Republic of Korea, 2020; pp. 454–469. [Google Scholar] [CrossRef]
- Liu, H.; Liu, C.; Zhao, W.; Jiang, Y.; Sun, J. S-gram: Towards Semantic-Aware Security Auditing for Ethereum Smart Contracts. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, New York, NY, USA, 3–7 September 2018; ASE ’18. pp. 814–819. [Google Scholar] [CrossRef]
- Bose, P.; Das, D.; Chen, Y.; Feng, Y.; Kruegel, C.; Vigna, G. SAILFISH: Vetting Smart Contract State-Inconsistency Bugs in Seconds. arXiv 2021, arXiv:2104.08638. [Google Scholar] [CrossRef]
- Liao, Z.; Zheng, Z.; Chen, X.; Nan, Y. SmartDagger: A Bytecode-Based Static Analysis Approach for Detecting Cross-Contract Vulnerability. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, New York, NY, USA, 18–22 July 2022; ISSTA: Trondheim, Norway, 2022; pp. 752–764. [Google Scholar] [CrossRef]
- So, S.; Hong, S.; Oh, H. SmarTest: Effectively Hunting Vulnerable Transaction Sequences in Smart Contracts through Language Model-Guided Symbolic Execution. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Vancouver, BC, Canada, 11–13 August 2021; pp. 1361–1378. [Google Scholar]
- Wang, W.; Song, J.; Xu, G.; Li, Y.; Wang, H.; Su, C. ContractWard: Automated Vulnerability Detection Models for Ethereum Smart Contracts. IEEE Trans. Netw. Sci. Eng. 2021, 8, 1133–1144. [Google Scholar] [CrossRef]
- So, S.; Oh, H. SmartFix: Fixing Vulnerable Smart Contracts by Accelerating Generate-and-Verify Repair Using Statistical Models. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, CA, USA, 3–9 December 2023; pp. 185–197. [Google Scholar] [CrossRef]
- Sun, Y.; Wu, D.; Xue, Y.; Liu, H.; Wang, H.; Xu, Z.; Xie, X.; Liu, Y. GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, Lisbon, Portugal, 14–20 April 2024; pp. 1–13. [Google Scholar] [CrossRef]
- Li, Z.; Dutta, S.; Naik, M. LLM-Assisted Static Analysis for Detecting Security Vulnerabilities. arXiv 2024, arXiv:2405.17238. [Google Scholar] [CrossRef]
- Shou, C.; Liu, J.; Lu, D.; Sen, K. LLM4Fuzz: Guided Fuzzing of Smart Contracts with Large Language Models. arXiv 2024, arXiv:2401.11108. [Google Scholar] [CrossRef]
- Ma, W.; Wu, D.; Sun, Y.; Wang, T.; Liu, S.; Zhang, J.; Xue, Y.; Liu, Y. Combining Fine-Tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications. arXiv 2024, arXiv:2403.16073. [Google Scholar] [CrossRef]
- Boi, B.; Esposito, C.; Lee, S. Smart Contract Vulnerability Detection: The Role of Large Language Model (LLM). SIGAPP Appl. Comput. Rev. 2024, 24, 19–29. [Google Scholar] [CrossRef]
- Yang, Z.; Man, G.; Yue, S. Automated Smart Contract Vulnerability Detection Using Fine-tuned Large Language Models. In Proceedings of the 2023 6th International Conference on Blockchain Technology and Applications, New York, NY, USA, 26–28 June 2024; ICBTA ’23. pp. 19–23. [Google Scholar] [CrossRef]
- Soud, M.; Nuutinen, W.; Liebel, G. Soley: Identification and Automated Detection of Logic Vulnerabilities in Ethereum Smart Contracts Using Large Language Models. arXiv 2024, arXiv:2406.16244. [Google Scholar] [CrossRef]
- Sun, Y.; Wu, D.; Xue, Y.; Liu, H.; Ma, W.; Zhang, L.; Shi, M.; Liu, Y. LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs’ Vulnerability Reasoning. arXiv 2024, arXiv:2401.16185. [Google Scholar] [CrossRef]
- Boi, B.; Esposito, C.; Lee, S. VulnHunt-GPT: A Smart Contract Vulnerabilities Detector Based on OpenAI chatGPT. In Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, New York, NY, USA, 8–12 April 2024; SAC ’24. pp. 1517–1524. [Google Scholar] [CrossRef]
- Daneshvar, S.S.; Nong, Y.; Yang, X.; Wang, S.; Cai, H. Exploring RAG-based Vulnerability Augmentation with LLMs. arXiv 2024, arXiv:2408.04125. [Google Scholar] [CrossRef]
- Yu, J. Retrieval Augmented Generation Integrated Large Language Models in Smart Contract Vulnerability Detection. arXiv 2024, arXiv:2407.14838. [Google Scholar] [CrossRef]
- David, I.; Zhou, L.; Qin, K.; Song, D.; Cavallaro, L.; Gervais, A. Do You Still Need a Manual Smart Contract Audit? arXiv 2023, arXiv:2306.12338. [Google Scholar] [CrossRef]
- Code4rena. 2024. Available online: https://code4rena.com (accessed on 5 January 2025).
- MetaTrustLabs. Github Repository–GPTScan Top200. 2024. Available online: https://github.com/MetaTrustLabs/GPTScan-Top200 (accessed on 23 November 2024).
- Unsloth AI. Unsloth: Fine-Tuning and Reinforcement Learning for LLMs. 2024. Available online: https://github.com/unslothai/unsloth (accessed on 25 May 2025).
- Dettmers, T. Bitsandbytes: 8-Bit Optimizers and Quantization Routines. 2022. Available online: https://github.com/TimDettmers/bitsandbytes (accessed on 25 May 2025).
- Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, 16–20 November 2020; Liu, Q., Schlangen, D., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 38–45. [Google Scholar] [CrossRef]
- Lhoest, Q.; Villanova del Moral, A.; Jernite, Y.; Thakur, A.; von Platen, P.; Patil, S.; Chaumond, J.; Drame, M.; Plu, J.; Tunstall, L.; et al. Datasets: A Community Library for Natural Language Processing. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online/Punta Cana, Dominican Republic, 7–11 November 2021; Adel, H., Shi, S., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 175–184. [Google Scholar] [CrossRef]
- Mangrulkar, S.; Gugger, S.; Debut, L.; Belkada, Y.; Paul, S.; Bossan, B. PEFT: State-of-the-Art Parameter-Efficient Fine-Tuning Methods. 2022. Available online: https://github.com/huggingface/peft (accessed on 25 May 2025).
- Unsloth AI. Codegemma 7B Instruct 4bit. 2024. Available online: https://huggingface.co/unsloth/codegemma-7b-it-bnb-4bit (accessed on 5 January 2025).
- Unsloth AI. Unsloth Qwen 2.5 Coder 7B 4bit. 2024. Available online: https://huggingface.co/unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit (accessed on 14 January 2025).
- Unsloth AI. Unsloth Meta Llama 3.1 8B Instruct 4bit. 2024. Available online: https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit (accessed on 5 January 2025).
- Unsloth AI. Mistral 7B Instruct v0.3 4bit. 2024. Available online: https://huggingface.co/unsloth/mistral-7b-instruct-v0.3-bnb-4bit (accessed on 5 January 2025).
- Unsloth AI. Mistral Nemo Instruct 2407 4bit. 2024. Available online: https://huggingface.co/unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit (accessed on 5 January 2025).
- Unsloth AI. What Model Should I Use? 2024. Available online: https://docs.unsloth.ai/get-started/beginner-start-here/what-model-should-i-use (accessed on 5 January 2025).
- Chroma. Chroma—AI-Native Vector Database. 2025. Available online: https://www.trychroma.com (accessed on 8 January 2025).
- Ollama. Ollama—Open Language Model Management. 2025. Available online: https://ollama.com (accessed on 8 January 2025).
- Ollama. CodeGemma: Latest—Ollama Library. 2025. Available online: https://ollama.com/library/codegemma:latest (accessed on 8 January 2025).
- Ollama. Granite Code: 8B—Ollama Library. 2025. Available online: https://ollama.com/library/granite-code:8b (accessed on 8 January 2025).
- Ollama. Gemma2: Latest—Ollama Library. 2025. Available online: https://ollama.com/library/gemma2:latest (accessed on 8 January 2025).
- LangChain. LangChain—Building Applications with LLMs. 2025. Available online: https://www.langchain.com (accessed on 8 January 2025).
Name | Method | Details |
---|---|---|
Ethainter [24] | Static Analysis | Modeling information flow and data sanitization. |
S-gram [25] | Static Analysis | N-gram modeling and static semantic analysis. |
SAILFISH [26] | Static Analysis | Detects state inconsistency bugs using storage dependency graphs and symbolic evaluation analysis. |
SmartDagger [27] | Static Analysis | Bytecode analysis, neural translation, and cross-contract flow graphs. |
SmarTest [28] | Static Analysis | Detects vulnerable transaction sequences using symbolic execution guided by statistical language models. |
ContractWard [29] | ML | Using opcode bigrams, one-vs-rest methods, and XGBoost with balancing techniques. |
SmartFix [30] | ML | Generate-and-verify approach with formal verification and statistical models. |
GPTScan [31] | LLMs | Detecting logic vulnerabilities with GPT-based analysis and static checks. |
David et al. [42] | LLMs | Utilizing pre-trained models and experimenting with prompt engineering. |
LLM4FUZZ [33] | LLMs | Combining LLM-driven insights with static analysis and AST metrics. |
Ma et al. [34] | LLMs | Multi-agent classification and justification using prompt engineering and fine-tuning. |
Sóley [37] | LLMs | Focusing on pre-processed contract segments and fine-tuning. |
Boi et al. [35] | LLMs | Maps OWASP Top 10 to SWC vulnerabilities, performs fine-tuning on annotated datasets. |
Yang et al. [36] | LLMs | Uses PEFT. |
LLM4Vuln [38] | LLMs | Framework with knowledge retrieval, tool invocation, prompt schemes, and instruction following. Utilizes CoT techniques and vector-based retrieval systems. |
VulnHunt-GPT [39] | LLMs | Apply prompt engineering and context enrichment. |
VulScribeR [40] | LLMs | Augments vulnerable code samples for DLVD models using RAG with Mutation, Injection, and Extension strategies. Includes prompt templates and fuzzy parsing. |
Yu [41] | LLMs | Combines RAG with GPT-4, leveraging vector stores and embeddings to test guided and blind vulnerability detection in smart contract auditing. |
Paper | Model | Parameters | Free | PEFT | QLoRA | Performance (%) |
---|---|---|---|---|---|---|
Sun et al. [31] | GPT-3.5 | 175 B | Yes | - | - | 73.9 (F) |
David et al. [42] | GPT-4, Claude-v1.3 | 1.76 T | No | - | - | 40 (A) |
Shou et al. [33] | Llama2 | 70 B | No | - | - | - |
Ma et al. [34] | CodeLlama, Mixtral | 13 B, B | Yes | No | No | 91.21 (F) |
Soud et al. [37] | CodeBERTa | 84 M | Yes | - | - | 91.5 (F) |
Boi et al. [35] | Llama-2, GPT-2-XL | 7 B, 1.5 B | Yes | Yes | Yes | 59.9 (A) |
Yang et al. [36] | Llama-2, CodeLlama | 13 B | Yes | Yes | No | 34 (A) |
Sun et al. [38] | GPT-4, Mixtral, CodeLlama | 1.76 T, B, 7 B, 34 B, 13 B | No | - | - | 18.9 (F) |
Boi et al. [39] | GPT-3.5 | 175 B | No | - | - | 72.3 (F) |
Yu [41] | GPT-4 | 1.76 T | No | - | - | 62.7 (SR) |
Dataset Name | Type | Available |
---|---|---|
GPTScan [31] | Contracts | Yes |
DeFi Attack SoK [42] | Contracts | Yes |
S-gram [25] | Contracts | Yes |
SmartBugs [39] | Contracts | Yes |
Not so Smart Contracts [41] | Contracts | Yes |
TrustLLM [34] | Functions | Yes |
Yang et al. [36] | Functions | No |
DeFiHackLabs [41] | Functions | Yes |
DeFiVulnLabs [41] | Functions | Yes |
Smart Contract VulnDB [41] | Snippets | Yes |
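These snippet- and function-level records are consumed as instruction-tuning examples during fine-tuning. Below is a minimal sketch of that preparation step using the Hugging Face datasets library cited in the paper; the JSON file name and the code/label field names are hypothetical, and the template mirrors the Instructions / Input Code Snippet / Response structure described in Section 4.4.

```python
from datasets import load_dataset

# Hypothetical file name and field names ("code", "label"); the actual
# TrustLLM/EVuLLM schemas may differ.
raw = load_dataset("json", data_files="evullm_snippets.json", split="train")

PROMPT = (
    "### Instructions:\nClassify the following Solidity code as vulnerable or safe.\n\n"
    "### Input Code Snippet:\n{code}\n\n"
    "### Response:\n{label}"
)

def to_example(record):
    # Fill the three-part template used for supervised fine-tuning.
    return {"text": PROMPT.format(code=record["code"], label=record["label"])}

train_ds = raw.map(to_example, remove_columns=raw.column_names)
print(train_ds[0]["text"][:200])
```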
Model | Parameters | 4-Bit | Instruct |
---|---|---|---|
CodeGemma 7B [50] | 7B | Yes | Yes |
Qwen 2.5 Coder [51] | 7B | Yes | Yes |
Llama 3.1 [52] | 8B | Yes | Yes |
Mistral v0.3 [53] | 7B | Yes | No |
Mistral Nemo [54] | 12B | Yes | No |
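The "4-Bit" column refers to pre-quantized checkpoints published by Unsloth. As a rough illustration of what 4-bit loading involves, the sketch below loads one of these models with Hugging Face transformers and bitsandbytes; the paper itself loads and fine-tunes through Unsloth, and the NF4/double-quantization settings shown here are common defaults rather than values reported in the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit (QLoRA-style) loading of one of the checkpoints listed above.
model_id = "unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU(s)
)
```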
Hyperparameter | TrustLLM (Qwen 2.5 Coder) | TrustLLM (Other Models) | EVuLLM (All Models)
---|---|---|---
Batch Size | 4 | 4 | 2 |
Gradient Accumulation Steps | 4 | 4 | 2 |
Learning Rate | 0.0001 | 0.0005 | 0.00001 |
Training Epochs | 5 | 10 | 4 |
Optimizer | AdamW_8bit | AdamW_8bit | AdamW_8bit |
Weight Decay | 0.01 | 0.01 | 0.01 |
Learning Rate Scheduler Type | linear | linear | linear |
Warmup Steps | 100 | 100 | 100 |
BFloat16 | True | True | True |
4-Bit Loading | True | True | True |
LoRA Rank | 32 | 32 | 32 |
LoRA alpha | 32 | 32 | 32 |
LoRA dropout | 0 | 0 | 0 |
Seed | 3407 | 3407 | 3407 |
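A minimal sketch of how the hyperparameters above map onto a QLoRA fine-tuning setup (shown for the TrustLLM / Qwen 2.5 Coder column). The paper fine-tunes through Unsloth; the use of peft and transformers here, and the choice of LoRA target modules, are assumptions made for illustration.

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, TrainingArguments

# Base model: one of the pre-quantized 4-bit checkpoints listed earlier
# (its quantization settings are read from the checkpoint itself).
model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen2.5-Coder-7B-Instruct-bnb-4bit",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# LoRA adapter matching the table: rank 32, alpha 32, dropout 0.
# The target_modules list is a common choice and is not taken from the paper.
lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)

# Training arguments for the TrustLLM / Qwen 2.5 Coder column.
training_args = TrainingArguments(
    output_dir="trustllm-qwen25-coder-qlora",  # illustrative path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=1e-4,
    num_train_epochs=5,
    optim="adamw_bnb_8bit",        # the table's AdamW_8bit optimizer
    weight_decay=0.01,
    lr_scheduler_type="linear",
    warmup_steps=100,
    bf16=True,
    seed=3407,
)
# Training then proceeds with a supervised fine-tuning trainer
# (e.g., TRL's SFTTrainer) over the instruction-formatted examples.
```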
Model | Parameters |
---|---|
CodeGemma 8B [58] | 8B |
Granite Code [59] | 8B |
Gemma 2 [60] | 9B |
Dataset | Model | Accuracy | Precision | Recall | F1-Score
---|---|---|---|---|---
TrustLLM | GPT-4o (2024-08-06) | 76.39 | 76.22 | 72.89 | 75.61
TrustLLM | CodeGemma 8B | | | |
TrustLLM | Gemma 2 | 62.91 | 65.47 | 59.84 | 62.58
TrustLLM | Granite Code | | | |
EVuLLM | GPT-4o (2024-08-06) | 82.21 | 81.80 | 86.71 | 85.77
EVuLLM | CodeGemma 8B | | | |
EVuLLM | Gemma 2 | 77.61 | 73.42 | 86.57 | 79.45
EVuLLM | Granite Code | | | |
Dataset | Model | Accuracy | Precision | Recall | F1-Score
---|---|---|---|---|---
TrustLLM | GPT-4o (2024-08-06) | 76.39 | 76.22 | 72.89 | 75.61
TrustLLM | CodeGemma 7B | | | |
TrustLLM | Qwen 2.5 Coder | | | |
TrustLLM | Llama 3.1 | 51.88 | | |
TrustLLM | Mistral v0.3 | 51.77 | | |
TrustLLM | Mistral Nemo | 100.00 | 68.09 | |
EVuLLM | GPT-4o (2024-08-06) | 82.21 | 81.80 | 86.71 | 85.77
EVuLLM | CodeGemma 7B | 100.00 | | |
EVuLLM | Qwen 2.5 Coder | | | |
EVuLLM | Llama 3.1 | | | |
EVuLLM | Mistral v0.3 | 68.66 | 68.66 | 68.66 |
EVuLLM | Mistral Nemo | | | |
Dataset | Model | Accuracy | Precision | Recall | F1-Score
---|---|---|---|---|---
TrustLLM | CodeGemma | 65.9 | 71.3 | 68.3 |
TrustLLM | Gemma 2 | 83.7 | | |
TrustLLM | Granite Code | | | |
EVuLLM | CodeGemma | 86.6 | | |
EVuLLM | Gemma 2 | 79.1 | 80.0 | 78.8 |
EVuLLM | Granite Code | | | |
Model | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
CodeGemma 7B 4-bit | 92.52 | 92.87 | 92.52 | 92.52 |
Qwen 2.5 Coder | ||||
LLaMA 3.1 8B | ||||
Mistral 7B v0.3 | ||||
Mistral Nemo 12B |
Model | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
CodeGemma 7B 4-bit | 94.78 | 95.27 | 94.78 | 94.76 |
Qwen 2.5 Coder | ||||
LLaMA 3.1 8B | ||||
Mistral 7B | ||||
Mistral Nemo |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Mandana, E.; Vlahavas, G.; Vakali, A. EVuLLM: Ethereum Smart Contract Vulnerability Detection Using Large Language Models. Electronics 2025, 14, 3226. https://doi.org/10.3390/electronics14163226