Feasibility Evaluation of Secure Offline Large Language Models with Retrieval-Augmented Generation for CPU-Only Inference
Abstract
1. Introduction
Objectives
- Identify and apply appropriate model evaluation metrics.
- Assess a range of models using the selected metrics.
- Develop a methodology to test text summarization and question-answering models.
- Develop code to test and evaluate models.
- Recommend the most effective text summarization and question-answering models based on performance.
2. Materials and Methods
2.1. Framework Selection
2.2. Hardware Selection
2.3. Model Selection
2.4. Resource Selection
2.5. Question Selection
2.6. Metric Selection
2.7. Implementation
3. Results
3.1. Workstation Results
3.2. Laptop Results
3.3. Latency Analysis
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Question | Llama 2 Chat 7B (Model 2) Response | Textbook Excerpt for Comparison [22] | Metrics BERTScore/METEOR |
---|---|---|---|
1. What are the five functions of managers? | Based on the provided context, the five functions of managers are: | (page 70) The activities of managers, including project managers, can be classified into the five functions. First is deciding what has to be done and how it will be done. This is the planning function, which involves setting a purpose or goal and establishing the means for achieving it consistent with higher-level organizational goals, resources, and constraints in the environment. Second and related to planning is arranging for the work to be done; this is the organizing function. This involves (1) hiring, training, and gathering people into a team with specified authority, responsibility, and accountability relationships; (2) acquiring and allocating materials, capital, and other resources; and (3) creating an organization structure with policies, procedures, reporting patterns, and communication channels. Third is directing and motivating people to attain the goal. This is the leadership function. Fourth is monitoring work performance with respect to the goal and taking necessary action whenever work deviates from the goal; this is the control function. The four functions are aimed at the goal, which implies a fifth function: assessing the four functions to determine how well each of the functions is doing and whether the functions or the goals need to be changed. | 0.9053/0.5509 |
2. What are the four phases of the systems development lifecycle? | Based on the provided context, the four phases of the systems development lifecycle are: | (page 170) The project life cycle is part of a larger life cycle called the systems development cycle (SDC). Virtually all human-made systems follow the four phases of this cycle: | 0.8783/0.4179 |
3. What are the four costs of quality? | According to the given context, the four costs of quality are: | (page 650) The costs of quality are classified as prevention, appraisal and control (costs of conformance), internal failure, and external failure (costs of nonconformance): | 0.9704/0.8134 |
4. What are the five responsibilities of the project engineer? | Based on the provided context, the five responsibilities of the project engineer are: | (page 1047) The project engineer (also known as systems engineer or systems designer) shoulders responsibility for coordinating technological areas and assures integrated design of the end-item. When several functional areas or subcontractors are involved, the project engineer: | 0.9564/0.7493 |
5. What are the three common types of programs? | Based on the provided context, the three common types of programs are: | (page 1161) Among common types of programs are goal-oriented, improvement, and portfolio. | 0.8853/0.4704 |
Question | Llama 2 Chat 7B (Model 2) Response | Excerpt for Comparison—ChatGPT 4o | Metrics BERTScore/METEOR |
---|---|---|---|
6. What is the name of Chapter 2 of the project management book and make a 100-word summary of the chapter? | Chapter 2 provides an overview of project management principles, systems methodologies, and the systems approach. It covers the origins and concepts of project management, situations where it is needed, and examples of applications. The chapter also introduces the idea that project management is a philosophy that underlies all human-made systems, from simple tools to complex organizations. | Chapter 2 of ‘Project Management for Engineering, Business, and Technology’ discusses the systems approach, viewing projects as systems comprising people, equipment, materials, and facilities organized to achieve goals. This chapter introduces the systems approach and its methodologies, including systems analysis and systems engineering, crucial for managing technical projects. It explains a system’s thinking, which involves understanding a system’s parts, their interactions, and their contribution to the system’s purpose. Key concepts include defining system objectives, resources, and constraints, and using models to understand and manage systems effectively. The chapter emphasizes a holistic view, ensuring project elements are integrated to achieve overall objectives. | 0.8770/0.1556 |
7. What is the name of Chapter 4 of the project management book and make a 100-word summary of the chapter? | Chapter 4 focuses on planning and control methods, procedures, and systems used in project management. The chapter covers various techniques for creating a detailed project plan, including Wideman’s’ Project Management Method and the Delphi Technique. It also discusses different types of project scheduling, such as Gantt charts and critical path method, and explains how to estimate project costs and create a project budget. Additionally, the chapter covers important control procedures, including progress reporting, performance measurement, and conflict resolution. | Chapter 4 of ‘Project Management for Engineering, Business, and Technology’ covers Project and System Definition. It focuses on Phase B, which involves defining the project and system to ensure they meet user requirements and stakeholder needs. This phase includes detailed planning, specifying work tasks, resources, schedules, and budgets. It emphasizes the importance of thorough definition to avoid problems during execution. The chapter also discusses phased or rolling wave project planning, allowing for iterative refinement of plans. A successful project requires a well-defined system and project plan, integrating user and system requirements comprehensively. | 0.8729/0.2887 |
8. What is the name of Chapter 10 of the project management book and make a 100-word summary of the chapter? | Chapter 10 focuses on the importance of organizational structure, leadership, and human behavior in project management. The chapter covers topics such as organizational design, team building, communication, motivation, and conflict resolution. It also discusses the role of stakeholders and their impact on project success. The chapter highlights the need for effective management to ensure that projects are completed on time, within budget, and with desired outcomes. | Chapter 10 of ‘Project Management for Engineering, Business, and Technology’ focuses on Project Risk Management. It explains the concepts of risk, emphasizing that risk is inherent in all projects and can lead to outcomes deviating from the plan. Key components include risk identification, assessment, response planning, and monitoring. Risk identification involves predicting potential problems, while assessment evaluates their likelihood and impact. Response planning outlines strategies to mitigate, transfer, or accept risks, and monitoring involves tracking risks throughout the project lifecycle. The chapter underscores the importance of integrating risk management into overall project management to ensure successful project outcomes. | 0.8821/0.2774 |
9. What is the name of Chapter 16 of the project management book and make a 100-word summary of the chapter? | Chapter 16 focuses on organizational structure and culture, leadership styles, motivation, communication, and teamwork. The chapter covers topics such as organizational design, project management office (PMO) roles and responsibilities, leadership skills, and employee motivation. It also discusses the impact of cultural differences on project management and provides guidelines for managing multicultural teams. | Chapter 16 of “Project Management for Engineering, Business, and Technology” emphasizes the importance of managing participation, teamwork, and conflict within project teams. Effective leadership in project management involves a balance of task-oriented and relations-oriented approaches, with an emphasis on participative management to foster commitment and reduce stress. Teamwork is crucial for project success, and methods for building and improving teams are discussed, including techniques for virtual teams. Conflict, inevitable in projects, can be beneficial if managed properly through confrontation and problem-solving. The chapter also addresses managing emotional stress, highlighting the need for social support and participative management to mitigate stress. | 0.8749/0.1911 |
10. What is the name of Chapter 19 of the project management book and make a 100-word summary of the chapter? | Chapter 19 focuses on the corporate context and international project management. It highlights the importance of understanding the cultural, political, and economic differences between countries when managing projects internationally. The chapter covers topics such as cross-cultural communication, team diversity, and global project management standards. It also provides an overview of the various challenges and opportunities associated with international project management. | Chapter 19 of “Project Management for Engineering, Business, and Technology” addresses International Project Management, emphasizing the complexities and challenges inherent in managing projects across borders. It discusses issues like cultural differences, local institutions, geo-national issues, and the importance of understanding local stakeholders. The chapter highlights the role of the project manager and the necessity of having a local representative to navigate these challenges effectively. It underscores the significance of communication, risk management, and building strong team relationships. Additionally, it provides strategies for handling the unique aspects of international projects to ensure their successful execution and completion. | 0.9102/0.2482 |
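The per-question scores in the two Appendix A tables can be condensed into per-task averages. The following is a minimal sketch, not the authors' evaluation code: it only averages the BERTScore/METEOR pairs listed above, and the names `QA_SCORES`, `SUMMARY_SCORES`, and `column_means` are introduced here for illustration.

```python
from statistics import mean

# (BERTScore, METEOR) pairs copied from the two Appendix A tables above.
# Questions 1-5 are the list-style questions; questions 6-10 are the summaries.
QA_SCORES = [
    (0.9053, 0.5509),
    (0.8783, 0.4179),
    (0.9704, 0.8134),
    (0.9564, 0.7493),
    (0.8853, 0.4704),
]
SUMMARY_SCORES = [
    (0.8770, 0.1556),
    (0.8729, 0.2887),
    (0.8821, 0.2774),
    (0.8749, 0.1911),
    (0.9102, 0.2482),
]


def column_means(pairs):
    """Return (mean BERTScore, mean METEOR) for a list of score pairs."""
    bert, meteor = zip(*pairs)
    return mean(bert), mean(meteor)


qa_bert, qa_meteor = column_means(QA_SCORES)
sum_bert, sum_meteor = column_means(SUMMARY_SCORES)

print(f"Questions 1-5:  BERTScore {qa_bert:.4f}, METEOR {qa_meteor:.4f}")
print(f"Questions 6-10: BERTScore {sum_bert:.4f}, METEOR {sum_meteor:.4f}")
```

With only five responses per task, these averages are descriptive rather than statistically robust. They do show that METEOR separates the two tasks far more than BERTScore does for this model.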
References
1. Tian, K.; Mitchell, E.; Yao, H.; Manning, C.D.; Finn, C. Fine-tuning Language Models for Factuality. arXiv 2023, arXiv:2311.08401.
2. Naveed, H.; Khan, A.U.; Qiu, S.; Saqib, M.; Anwar, S.; Usman, M.; Akhtar, N.; Barnes, N.; Mian, A. A Comprehensive Overview of Large Language Models. arXiv 2024, arXiv:2307.06435.
3. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.U.; Polosukhin, I. Attention Is All You Need. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Available online: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html (accessed on 7 April 2025).
4. PromptEngineer. LocalGPT. Python. Available online: https://github.com/PromtEngineer/localGPT (accessed on 7 April 2025).
5. Zylon by PrivateGPT. PrivateGPT. Python. Available online: https://github.com/zylon-ai/private-gpt (accessed on 7 April 2025).
6. Nomic AI. gpt4all. C++. Available online: https://github.com/nomic-ai/gpt4all (accessed on 7 April 2025).
7. ggml-org. llama.cpp. C++. Available online: https://github.com/ggml-org/llama.cpp (accessed on 7 April 2025).
8. Ollama. ollama. Go. Available online: https://github.com/ollama/ollama (accessed on 7 April 2025).
9. Intel® Core™ i7-8650U Processor. Available online: https://www.intel.com/content/www/us/en/products/sku/124968/intel-core-i78650u-processor-8m-cache-up-to-4-20-ghz/specifications.html (accessed on 7 April 2025).
10. Intel® Xeon® Processor E5-2667 v2. Available online: https://www.intel.com/content/www/us/en/products/sku/75273/intel-xeon-processor-e52667-v2-25m-cache-3-30-ghz/specifications.html (accessed on 7 April 2025).
11. Tuggener, L.; Sager, P.; Taoudi-Benchekroun, Y.; Grewe, B.F.; Stadelmann, T. So you want your private LLM at home? A survey and benchmark of methods for efficient GPTs. In Proceedings of the 2024 11th IEEE Swiss Conference on Data Science (SDS), Zurich, Switzerland, 30–31 May 2024; pp. 205–212.
12. Tripathi, O.M. GGUF Models and Quantization. SSRN 2024. SSRN:5044207.
13. microsoft/Phi-3-mini-4k-instruct · Hugging Face. Available online: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct (accessed on 7 April 2025).
14. meta-llama (Meta Llama). Available online: https://huggingface.co/meta-llama (accessed on 7 April 2025).
15. Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. LLaMA: Open and Efficient Foundation Language Models. arXiv 2023, arXiv:2302.13971.
16. meta-llama/Llama-2-7b-chat · Hugging Face. Available online: https://huggingface.co/meta-llama/Llama-2-7b-chat (accessed on 7 April 2025).
17. Lang, J.; Guo, Z.; Huang, S. A Comprehensive Study on Quantization Techniques for Large Language Models. arXiv 2024, arXiv:2411.02530.
18. TheBloke (Tom Jobbins). Available online: https://huggingface.co/TheBloke (accessed on 7 April 2025).
19. meta-llama/Llama-2-13b-chat · Hugging Face. Available online: https://huggingface.co/meta-llama/Llama-2-13b-chat (accessed on 7 April 2025).
20. mistralai/Mistral-7B-Instruct-v0.2 · Hugging Face. Available online: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2 (accessed on 7 April 2025).
21. Jiang, A.Q.; Sablayrolles, A.; Mensch, A.; Bamford, C.; Chaplot, D.S.; de las Casas, D.; Bressand, F.; Lengyel, G.; Lample, G.; Saulnier, L.; et al. Mistral 7B. arXiv 2023, arXiv:2310.06825.
22. Nicholas, J.M.; Steyn, H. Project Management for Engineering, Business and Technology, 5th ed.; Routledge: New York, NY, USA, 2017; Available online: https://routledgetextbooks.com/textbooks/9781138937345/ (accessed on 7 April 2025).
23. Na, S.; Jeong, G.; Ahn, B.H.; Young, J.; Krishna, T.; Kim, H. Understanding Performance Implications of LLM Inference on CPUs. In Proceedings of the 2024 IEEE International Symposium on Workload Characterization (IISWC), Vancouver, BC, Canada, 15–17 September 2024; pp. 169–180.
24. Ganesan, K. ROUGE 2.0: Updated and Improved Measures for Evaluation of Summarization Tasks. arXiv 2018, arXiv:1803.01937.
25. Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.J. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics - ACL ’02, Philadelphia, PA, USA, 7–12 July 2002; Association for Computational Linguistics: Stroudsburg, PA, USA, 2002; p. 311.
26. Hu, T.; Zhou, X.H. Unveiling LLM Evaluation Focused on Metrics: Challenges and Solutions. arXiv 2024, arXiv:2404.09135.
27. Banerjee, S.; Lavie, A. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization; Goldstein, J., Lavie, A., Lin, C.Y., Voss, C., Eds.; Association for Computational Linguistics: Ann Arbor, MI, USA, 2005; pp. 65–72. Available online: https://aclanthology.org/W05-0909/ (accessed on 7 April 2025).
28. Zhang, T.; Kishore, V.; Wu, F.; Weinberger, K.Q.; Artzi, Y. BERTScore: Evaluating Text Generation with BERT. arXiv 2020, arXiv:1904.09675.
Model | Model Name | Parameters | Memory | Creator | Quantization |
---|---|---|---|---|---|
1 | Phi-3-Mini-4k-Instruct | 3.8 billion | 7.2 GB | Microsoft | None |
2 | Llama 2 Chat 7B | 7 billion | 3.3 GB | Meta AI/TheBloke | 4-bit Medium |
3 | Mistral 7B Instruct v0.2 | 7 billion | 4.37 GB | Mistral/TheBloke | 4-bit Medium |
4 | Llama 2 Chat 13B | 13 billion | 7.87 GB | Meta AI/TheBloke | 4-bit Medium |
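The relationship between parameter count, memory footprint, and quantization in the table above can be made explicit by converting each row to an effective number of bits per parameter. This is a rough sketch under two assumptions that are not stated in the table: that 1 GB means 10^9 bytes, and that the nominal parameter counts are exact. The names `MODELS` and `bits_per_parameter` are hypothetical.

```python
# (model name, parameters in billions, memory in GB) copied from the table above.
MODELS = [
    ("Phi-3-Mini-4k-Instruct", 3.8, 7.2),
    ("Llama 2 Chat 7B", 7.0, 3.3),
    ("Mistral 7B Instruct v0.2", 7.0, 4.37),
    ("Llama 2 Chat 13B", 13.0, 7.87),
]


def bits_per_parameter(params_billion, size_gb):
    """Effective storage per weight in bits.

    Assumes 1 GB = 1e9 bytes, so the factors of 1e9 in bytes and in
    parameters cancel and only the byte-to-bit factor of 8 remains.
    """
    return size_gb * 8 / params_billion


BITS = {name: bits_per_parameter(p, s) for name, p, s in MODELS}

for name, bits in BITS.items():
    print(f"{name}: about {bits:.1f} bits per parameter")
```

Under these assumptions the unquantized model comes out near 15 bits per parameter, which is consistent with 16-bit weights, while the three quantized models fall roughly between 4 and 5 bits per parameter.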
Number | Question | Source |
---|---|---|
1 | What are the five functions of managers? | Chapter 1.1 |
2 | What are the four phases of the systems development lifecycle? | Chapter 3.2 |
3 | What are the four costs of quality? | Chapter 9.2 |
4 | What are the five responsibilities of the project engineer? | Chapter 15.5 |
5 | What are the three common types of programs? | Chapter 17.5 |
6 | What is the name of Chapter 2 of the project management book and make a 100-word summary of the chapter? | Chapter 2 |
7 | What is the name of Chapter 4 of the project management book and make a 100-word summary of the chapter? | Chapter 4 |
8 | What is the name of Chapter 10 of the project management book and make a 100-word summary of the chapter? | Chapter 10 |
9 | What is the name of Chapter 16 of the project management book and make a 100-word summary of the chapter? | Chapter 16 |
10 | What is the name of Chapter 19 of the project management book and make a 100-word summary of the chapter? | Chapter 19 |
Statement Type | Prediction | METEOR | BERTScore |
---|---|---|---|
Reference statement | Artificial intelligence is an increasingly prominent topic across multiple domains. | 1.0 | 1.0 |
Minor modification | Artificial intelligence is a progressively popular topic across various domains. | 0.672 | 0.959 |
Negative | Artificial intelligence is not an increasingly prominent topic across multiple domains. | 0.986 | 0.975 |
Unrelated statement | Language models generate text based on patterns they learned during training. | 0.0 | 0.859 |
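The high METEOR score for the negated sentence in the table above follows from how the metric is built on word overlap. The sketch below is a simplified, exact-match-only variant of the formula described by Banerjee and Lavie, not the implementation used in this study: it omits the stemming and synonym matching stages, and it assumes lowercasing and whitespace tokenization. The function name `exact_match_meteor` is hypothetical.

```python
def exact_match_meteor(reference, hypothesis):
    """Simplified METEOR: exact unigram matches only (no stems or synonyms)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # Greedy left-to-right alignment: each hypothesis token is paired with
    # the first unused reference token that has the same surface form.
    used = [False] * len(ref)
    matches = []  # (hypothesis index, reference index), in hypothesis order
    for i, token in enumerate(hyp):
        for j, ref_token in enumerate(ref):
            if not used[j] and token == ref_token:
                used[j] = True
                matches.append((i, j))
                break

    m = len(matches)
    if m == 0:
        return 0.0

    precision = m / len(hyp)
    recall = m / len(ref)
    # Recall-weighted harmonic mean from the original METEOR formulation.
    f_mean = 10 * precision * recall / (recall + 9 * precision)

    # A chunk is a run of matches that is contiguous in both sentences.
    chunks = 1
    for (h0, r0), (h1, r1) in zip(matches, matches[1:]):
        if h1 != h0 + 1 or r1 != r0 + 1:
            chunks += 1
    penalty = 0.5 * (chunks / m) ** 3
    return f_mean * (1 - penalty)


REFERENCE = "Artificial intelligence is an increasingly prominent topic across multiple domains."
NEGATIVE = "Artificial intelligence is not an increasingly prominent topic across multiple domains."
UNRELATED = "Language models generate text based on patterns they learned during training."

negative_score = exact_match_meteor(REFERENCE, NEGATIVE)
unrelated_score = exact_match_meteor(REFERENCE, UNRELATED)
print(f"negative:  {negative_score:.3f}")
print(f"unrelated: {unrelated_score:.3f}")
```

Inserting "not" adds one unmatched word and splits the match into two chunks, so precision drops only from 10/10 to 10/11 and the fragmentation penalty is small. Under the stated tokenization assumption this sketch gives about 0.986 for the negated sentence and 0.0 for the unrelated one. It will not reproduce the score for the minor-modification row, because that row depends on the stem and synonym matching omitted here.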
Platform | Laptop | Laptop | Laptop | Laptop | Workstation | Workstation | Workstation | Workstation |
---|---|---|---|---|---|---|---|---|
model | Model 1 | Model 2 | Model 3 | Model 4 | Model 1 | Model 2 | Model 3 | Model 4 |
name | Phi-3-Mini-4k 3.8B | Llama 2 Chat 7B | Mistral 7B Instruct | Llama 2 Chat 13B | Phi-3-Mini-4k 3.8B | Llama 2 Chat 7B | Mistral 7B Instruct | Llama 2 Chat 13B |
prompt time (s) | 62.9 ± 31.8 | 70.1 ± 37.1 | 71.9 ± 37.8 | 149 ± 74 | 10.7 ± 5.4 | 21.2 ± 11.0 | 21.9 ± 11.3 | 40.7 ± 21.1 |
generation time (s) | 172 ± 66 | 55.4 ± 21.7 | 40.5 ± 18.8 | 120 ± 35 | 61.3 ± 26.9 | 18.6 ± 6.9 | 14.2 ± 7.5 | 32.9 ± 10.3 |
per-token time (ms) | 533 ± 14 | 316 ± 7 | 311 ± 6 | 683 ± 6 | 184 ± 3 | 106 ± 2 | 103 ± 1 | 187 ± 2 |
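The per-token times in the table above can be restated as generation throughput and as a workstation-over-laptop speedup. The following is a minimal sketch that uses only the mean per-token values from the table and ignores the reported standard deviations. The names `PER_TOKEN_MS`, `tokens_per_second`, and `SPEEDUP` are introduced here for illustration.

```python
# Mean per-token time in milliseconds, copied from the table above:
# model name -> (laptop, workstation).
PER_TOKEN_MS = {
    "Phi-3-Mini-4k 3.8B": (533, 184),
    "Llama 2 Chat 7B": (316, 106),
    "Mistral 7B Instruct": (311, 103),
    "Llama 2 Chat 13B": (683, 187),
}


def tokens_per_second(per_token_ms):
    """Convert a per-token latency in milliseconds to tokens per second."""
    return 1000.0 / per_token_ms


# Ratio of laptop to workstation per-token time (values above 1 favor the workstation).
SPEEDUP = {name: laptop / ws for name, (laptop, ws) in PER_TOKEN_MS.items()}

for name, (laptop, ws) in PER_TOKEN_MS.items():
    print(
        f"{name}: laptop {tokens_per_second(laptop):.2f} tok/s, "
        f"workstation {tokens_per_second(ws):.2f} tok/s, "
        f"speedup {SPEEDUP[name]:.2f}x"
    )
```

On these figures the workstation generates tokens roughly three times faster than the laptop for the three smaller models, and somewhat more than that for the 13B model.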
Model | Model Name | Parameters | Workstation Load Time (s) | Laptop Load Time (s) |
---|---|---|---|---|
1 | Phi-3-Mini-4k-Instruct | 3.8 billion | 11.5 | 58.8 |
2 | Llama 2 Chat 7B | 7 billion | 19.6 | 60.1 |
3 | Mistral 7B Instruct v0.2 | 7 billion | 20.8 | 64.4 |
4 | Llama 2 Chat 13B | 13 billion | 37.0 | 135.6 |
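Combining the load times above with the mean prompt and generation times from the latency table gives a rough estimate of how long a first query takes from a cold start. This sketch assumes the three phases run sequentially without overlap, which the tables do not state, and it uses mean values only. The names `TIMINGS` and `cold_start_seconds` are hypothetical.

```python
# (load, mean prompt, mean generation) times in seconds per model number,
# copied from the load-time table above and the latency table.
TIMINGS = {
    "workstation": {
        1: (11.5, 10.7, 61.3),
        2: (19.6, 21.2, 18.6),
        3: (20.8, 21.9, 14.2),
        4: (37.0, 40.7, 32.9),
    },
    "laptop": {
        1: (58.8, 62.9, 172.0),
        2: (60.1, 70.1, 55.4),
        3: (64.4, 71.9, 40.5),
        4: (135.6, 149.0, 120.0),
    },
}


def cold_start_seconds(platform, model):
    """Estimated time to the first complete answer.

    Assumption: load, prompt processing, and generation happen one after
    another, so their mean durations can simply be added.
    """
    return sum(TIMINGS[platform][model])


for platform, models in TIMINGS.items():
    for model in models:
        total = cold_start_seconds(platform, model)
        print(f"{platform}, Model {model}: about {total:.1f} s")
```

Because the load time is paid once per session, later queries in the same session would cost only the prompt and generation components.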
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tyndall, E.; Wagner, T.; Gayheart, C.; Some, A.; Langhals, B. Feasibility Evaluation of Secure Offline Large Language Models with Retrieval-Augmented Generation for CPU-Only Inference. Information 2025, 16, 744. https://doi.org/10.3390/info16090744