Introducing LEAF: LLM Edge Assessment Framework for Generative AI on the Edge
Abstract
1. Introduction
- Circular Economy Score: This score quantifies the sustainability value of repurposing existing hardware to reduce e-waste.
- Energy Efficiency: This metric calculates the energy cost per inference (Joules/Token).
- Performance Speed: This metric evaluates the token generation throughput (Tokens/Second).
- Model Accuracy: This metric uses automated semantic measures (F1/BERTScore), rather than manual human verification, to assess semantic coherence.
- End-to-End Latency (Tlat): This metric assesses the total wall-clock time from request submission to final token generation, representing the actual delay experienced by the user.
2. Background
2.1. The Evolution: From Discriminative to Generative Edge AI
2.2. Challenges in Benchmarking Edge LLMs
2.3. Optimization Techniques Enabling Edge Deployment
2.4. The Sustainability Gap: Energy vs. Circular Economy
3. Literature Survey
3.1. LLMs in Industrial and IoT Applications
3.2. Cybersecurity and Explainability Frameworks
3.3. Edge Optimization and Benchmarking
3.4. Identification of the Research Gap
3.5. Metric Selection Rationale
- Semantic Accuracy: Because medical and legal LLM studies report strict hallucination thresholds, we treat BERTScore as an essential metric.
- Latency and Speed: Guided by the QoS constraints of real-time IoT systems such as SOLAR, we include Time-to-First-Token and Tokens/Second.
- Circular Economy: The striking lack of sustainability metrics in common benchmarks, such as DeepEdgeBench, motivated the addition of the ‘Circular Economy Score.’
| Paper | Main Focus | Benchmarking | LLM on Edge |
|---|---|---|---|
| Huang (2025) [1] | Ollama on Edge Benchmark | Yes | Yes |
| Nezami (2025) [2] | GenAI Edge Benchmark Dataset (BeDGED) | Yes (Dataset) | Yes |
| Dettmers (2022) [4] | 8-bit Optimizers (Training) | No | Indirect |
| Adnyana (2026) [7] | PLC Code Gen Prompting | No | No |
| Al-Masri (2025) [8] | Edge API Discovery | No | No |
| Alasmari (2025) [9] | IoT Phishing Detection | No | Partial/Ambiguous |
| Alsuwaiket (2025) [10] | Zero-Day Threat Detection | No | No |
| Ataman (2025) [11] | Machine Translation Survey | No | No |
| Baller (2021) [12] | DNN Hardware Benchmarking | Yes (DNNs) | No (CNNs) |
| Bao (2025) [13] | Wireless Network Routing | Yes (Network) | Yes |
| Benmeziane (2024) [14] | NAS for Edge via LLM | Indirect | No |
| Çetinkaya (2025) [15] | Validation Framework | No | No |
| Chakraborty (2025) [16] | Hallucination Evaluation | No | No |
| Chen (2025) [17] | Length Control Fine-Tuning | No | No |
| Han (2016) [18] | Deep Compression (Pruning) | Yes (Mobile) | No (CNNs) |
| Hao (2023) [19] | Distributed Benchmarking | Yes (System) | No |
| Jain (2025) [20] | Chatbot Arch. Scaling | No (Simulation) | Yes (Theoretical) |
| Jebli (2025) [21] | Fog Computing LLM Survey | No | Yes |
| Jin (2025) [22] | Cloud-Edge Collaboration | Yes | Yes |
| Kim (2025) [23] | Mobile Korean LLM | Yes | Yes |
| Kohli (2025) [24] | Heterogeneous Edge Profiling | Yes | No (DNNs) |
| Krishnamurthy (2025) [25] | Fog Resource Provisioning | Yes (System) | No (LLM guides) |
| Lee (2025) [26] | Edge GPU Holistics (Thermal) | Yes | No (CNNs) |
| Li (2025) [27] | Efficient LLM Survey | No | Yes (Techniques) |
| Liu (2025) [28] | DeepSeek-R1 Fintech Eval | No | No |
| Liu (2025) [29] | Robotics and LLM Review | No | Yes (Concept) |
| Liu (2025) [30] | Smart Home Edge Routing | Yes (System) | Yes |
| Minott (2025) [31] | GenAI Edge Dataset | Yes | No (Coral/CNNs) |
| Nezami (2025) [32] | GenAI Edge Perf. Eval | Yes | Yes |
| Pozi (2025) [33] | Data-Augmented Routing | Yes (System) | Yes |
| Ranjan (2025) [34] | Vision Transformers | Yes (Efficiency) | No (Vision) |
| Ray (2025) [35] | P2P CPU-Only LLM | Yes | Yes |
| Ren (2025) [36] | Edge Expert Deployment Cost | Yes (Simulation) | Yes |
| Saha (2025) [37] | Medical LLM Accuracy Eval | No | No |
| Shaikh (2025) [38] | Agriculture LLM Review | No | No |
| Sun (2025) [39] | Satellite Edge Optimization | Yes (System) | No (LLM guides) |
| Sun (2025) [40] | Trusted 6G LLM | Yes | Yes |
| Thapa (2025) [41] | Social Science LLM Review | No | No |
| Wang (2025) [42] | Federated Learning | Yes (Simulation) | No |
| Wang (2025) [43] | Edge LLM Survey | No | Yes (Concept) |
| Yang (2025) [44] | IoT + LLM + Privacy Review | No | Yes (Concept) |
| Yin (2024) [45] | Edge PM2.5 Forecasting | Yes (System) | Yes |
| Yuan (2025) [46] | Smart City Offloading (LLM) | Yes (System) | No (LLM guides) |
| Zhang (2025) [47] | 5G Spec Contradiction Detect | No | No |
| Zhang (2025) [48] | Wireless Edge GenLLM | Yes | Yes |
| Zhang (2025) [49] | HLS Code Correction | No | No |
| Zhu (2024) [50] | UAV Task Offloading (MARL) | Yes (System) | No (LLM guides) |
| Surianarayanan (2023) [51] | Edge AI Optimization Survey | No | No (DL/CNNs) |
| Stadnicka (2022) [52] | Industrial AI Needs Survey | No | No |
| Rupanetti (2024) [53] | Edge IoT Security (Intrusion) | Yes (System) | No (ML) |
| Liang (2025) [54] | Math Methods for Edge AI | No | No |
| Lawal (2024) [55] | Railroad Bridge Monitoring | Yes (Sensor) | No (TinyML) |
| Gültekin (2022) [56] | Vehicle Fault Detection | Yes (System) | No (ML) |
| Chen (2024) [57] | Feasibility of Edge AI | Yes (Simulation) | No |
| Bourechak (2023) [58] | AI/Edge Convergence | No | No |
| Abdulkadhim (2025) [59] | Automation of Benchmarking | Yes | No |
4. Methodology and Framework Design
4.1. The LEAF Architecture
- The Input Layer (Edge Environment): This layer represents the physical hardware (e.g., Raspberry Pi, Jetson, Legacy GPU Servers) as well as the Edge AI software stack (quantized models via Ollama).
- The Assessment Core (LEAF Engine): The central component, which monitors system telemetry (power, latency) and evaluates model output quality against a gold standard.
- The Visualization Layer (Output): This layer generates a 5-point radar chart, which makes the trade-offs between sustainability, speed, and accuracy visible at a glance.
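As a rough illustration of the visualization layer, the sketch below maps five normalized scores onto the vertices of a radar polygon. The axis order, the `radar_vertices` name, and the choice to start at the top and proceed clockwise are assumptions made for this example; the scores are the AI Server row of the normalized results table.

```python
import math

# Axis order is an assumption for this sketch; the paper only states that
# the chart has five points.
AXES = ["Circular Economy", "Energy Efficiency", "Speed", "Accuracy", "Latency"]


def radar_vertices(scores):
    """Map five normalized scores (0..1) to (x, y) vertices of a radar polygon."""
    n = len(AXES)
    points = []
    for i, axis in enumerate(AXES):
        # First axis points straight up; later axes follow clockwise.
        angle = math.pi / 2 - 2 * math.pi * i / n
        r = scores[axis]
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points


# AI Server row from the normalized results table.
ai_server = {
    "Circular Economy": 0.9,
    "Energy Efficiency": 0.81,
    "Speed": 1.0,
    "Accuracy": 0.74,
    "Latency": 1.0,
}
for axis, (x, y) in zip(AXES, radar_vertices(ai_server)):
    print(f"{axis:18s} x={x:+.2f} y={y:+.2f}")
```

The resulting points can be handed to any plotting library that draws closed polygons.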
4.2. Evaluation Metrics Definition
- Circular Economy Score ($S_{CE}$): This metric quantifies the environmental benefit of repurposing hardware. It is a discrete score assigned based on the hardware’s lifecycle stage.
- Definition: $S_{CE} \in [0, 1]$, where 1.0 represents fully repurposed e-waste (e.g., a 5+ year old GPU) and the low end of the scale represents newly manufactured silicon with high embodied carbon.
- Rationale: Incentivizes the extension of device lifespan, aligning with green computing principles.
- Energy Efficiency (Eeff)
- Definition: This is the energy cost to generate a complete response, $E_{eff} = P_{avg} \times T_{total}$, where $P_{avg}$ is the average power consumption (Watts) during load, and $T_{total}$ is the total time taken (seconds).
- Normalization: Higher efficiency (lower Joules) results in a higher score.
- Performance Speed (Rgen)
- Definition: The rate of text generation, measured in Tokens Per Second (TPS).
- Relevance: Critical for user experience in interactive applications like chatbots.
- Model Accuracy (F1BERT)
- Definition: We utilize BERTScore to evaluate semantic similarity between the LLM-generated summary and a human-verified reference summary.
- Rationale: Unlike n-gram metrics (ROUGE/BLEU), BERTScore captures contextual meaning, which is essential for evaluating generative reasoning.
- End-to-End Latency (Tlat)
- Definition: The total wall-clock time measured from the initial request submission to the completion of the task.
- Normalization: Inverted scale, where lower time yields a higher score.
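The raw quantities behind the energy and speed metrics can be sketched in a few lines. The function names below are hypothetical, and the token count in the usage example is an invented figure used only to show the Joules/Token arithmetic; the power and time values come from the Raspberry Pi 5 row of the energy table (about 9 W for 2.20 s).

```python
def energy_joules(avg_power_watts, duration_s):
    """Energy for one response: average power under load times elapsed time."""
    return avg_power_watts * duration_s


def tokens_per_second(token_count, duration_s):
    """Generation throughput in tokens per second."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return token_count / duration_s


def joules_per_token(avg_power_watts, duration_s, token_count):
    """Energy cost per generated token."""
    if token_count <= 0:
        raise ValueError("token_count must be positive")
    return energy_joules(avg_power_watts, duration_s) / token_count


# Raspberry Pi 5: ~9 W system power for 2.20 s, as listed in the energy table.
print(round(energy_joules(9, 2.20), 1), "J")
# Hypothetical token count, for illustration only.
print(round(joules_per_token(9, 2.20, 120), 3), "J/token")
```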
4.3. Experimental Setup
- Testbed Hardware:
- Low-Power Edge: Raspberry Pi 4 (4 GB) and Raspberry Pi 5 (8 GB) (Raspberry Pi Foundation, Cambridge, UK)—representing ARM-based CPU inference.
- Specialized Edge: NVIDIA Jetson Nano (NVIDIA Corporation, Santa Clara, CA, USA)—representing older, dedicated edge accelerators.
- Repurposed Workstation: AI Server with NVIDIA GTX 1050 Ti (4 GB) (NVIDIA Corporation, Santa Clara, CA, USA)—representing the “Circular Economy” candidate.
- Professional Edge: Physical Server with NVIDIA T400 (4 GB)—representing modern entry-level professional workstations.
- Software Stack:
- Inference Engine: Ollama (v0.1.29) (Ollama, San Francisco, CA, USA) serving GGUF quantized models (q4_k_m).
- Models Evaluated: granite3.3:2b, llama3.2:3b, gemma:2b, tinyllama, qwen2:0.5b, and deepseek-r1:1.5b.
- Benchmarking Tool: A custom Python (Python Software Foundation, Wilmington, DE, USA) pipeline using BERTScore (v0.3.13) for accuracy and system timers for latency.
- Procedure: Each device processed a standardized prompt (“Summarize the history of artificial intelligence”) across all models. Metrics were recorded over multiple runs to ensure statistical stability, capturing inference time, output text, and system telemetry.
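Repeating a measurement and summarizing it can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `time_runs` and its `task` argument are hypothetical names, the run count of five is arbitrary, and `time.perf_counter()` is used here as a monotonic alternative to plain system timestamps.

```python
import statistics
import time


def time_runs(task, runs=5):
    """Call task() `runs` times and return (mean, stdev) of wall-clock seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    # Sample standard deviation needs at least two observations.
    spread = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return statistics.mean(durations), spread


# Stand-in workload; in a real run this would be one inference request.
mean_s, stdev_s = time_runs(lambda: sum(range(100_000)), runs=5)
print(f"mean={mean_s:.4f}s stdev={stdev_s:.4f}s")
```

Reporting the spread alongside the mean makes it easier to spot runs affected by warm-up or thermal throttling.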
5. Implementation
5.1. Hardware Testbed Configuration
- NVIDIA Jetson Nano:
- CPU: Quad-core ARM Cortex-A57 @ 1.43 GHz.
- GPU: 128-core Maxwell.
- RAM: 4 GB LPDDR4.
- Role: Represents older, GPU-accelerated edge devices common in edge-industrial deployments.
- Raspberry Pi 4:
- CPU: Quad-core Broadcom BCM2711 (Cortex-A72) @ 1.5 GHz.
- RAM: 4 GB LPDDR4.
- Role: Represents the baseline for CPU-based edge inference.
- Raspberry Pi 5:
- CPU: Quad-core Broadcom BCM2712 (Cortex-A76) @ 2.4 GHz.
- RAM: 4 GB LPDDR4X.
- Role: Represents the new generation of high-performance CPU edge nodes.
- Physical Server (NVIDIA T400):
- CPU: 16-Core Processor.
- RAM: 128 GB.
- GPU: NVIDIA T400 (Professional Low-Profile).
- Role: Represents a professional-grade edge gateway.
- AI Server (NVIDIA GTX 1050 Ti):
- CPU: Intel Core i5 (4 Cores).
- RAM: 32 GB.
- GPU: NVIDIA GTX 1050 Ti (Consumer Legacy).
- Role: Represents the “Circular Economy” approach, utilizing repurposed consumer hardware.
5.2. Software Stack and Orchestration
- Operating System: Linux-based environments (Ubuntu 20.04/22.04 LTS for Servers/Jetson; Raspberry Pi OS Bookworm for the Raspberry Pis).
- LLM Runtime Engine: Ollama (v0.1.x) was used for its lightweight footprint and efficient handling of GGUF quantized models.
- Benchmarking Agent: A custom Python script (score_llm.py) was deployed on each node. The script uses Python's subprocess module to invoke the LLM and a customized version of the BERTScore library (v0.3.13) for semantic evaluation on edge devices. The original BERTScore cannot run on the edge nodes because of its large memory requirements. The code for the Python script is available from the authors upon request.
5.3. Automated Testing Pipeline
The Efficiency Paradox (75 W vs. 5 W)
- Model Pull: The target model (e.g., ibm/granite3.3:2b) is explicitly pulled to local storage so that download time is excluded from the inference metric.
- Inference Execution: The script sends a standardized prompt (“Summarize the history of artificial intelligence”) to the Ollama API.
- Telemetry Capture:
- Latency: This is captured via system timestamps (time.time()) immediately before sending the prompt and after receiving the final token (End-of-Sequence).
- Output Text: The full generated string is captured from stdout.
- Quality Assessment: The generated text is compared against a pre-defined “Gold Standard” summary using a BERTScore implementation customized for edge devices to calculate precision, recall, and F1. The gold-standard summary was generated by Google’s Gemini 3 Pro model.
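The precision, recall, and F1 values in the quality step follow BERTScore's greedy-matching idea, which can be shown on toy vectors. This sketch is not the authors' customized implementation: real BERTScore obtains contextual token embeddings from a transformer model and can apply IDF weighting and baseline rescaling, all of which are omitted here. The two-dimensional vectors are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_match_scores(cand_vecs, ref_vecs):
    """Return (precision, recall, f1) using BERTScore-style greedy matching.

    Precision: each candidate token is matched to its most similar reference token.
    Recall: each reference token is matched to its most similar candidate token.
    """
    precision = sum(max(cosine(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    recall = sum(max(cosine(r, c) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy embeddings (hypothetical): two candidate tokens, one reference token.
candidate = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0]]
p, r, f1 = greedy_match_scores(candidate, reference)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

In this toy case the extra candidate token lowers precision while recall stays perfect, which is the behavior that separates the two scores.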
5.4. Limitations and Organizational Adoption
6. Results and Discussion
6.1. Hardware Speed and Latency Analysis
Impact of Thermal Throttling on Active Runtime
6.2. Semantic Accuracy (F1 Score) Stability
6.3. Holistic Assessment: The LEAF Radar Chart
6.4. Metrics Calculation Methodology
- Definition: A subjective score (0–10) representing the hardware’s sustainability status (Old/Refurbished = High, New = Low).
- Formula: $S_{CE} = C / 10$, where $C$ is the assigned raw score.
- Example (AI Server): The GTX 1050 Ti is older hardware (high reuse) and is assigned a raw score of 9, giving $S_{CE} = 9 / 10 = 0.9$.
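- Step 1: Calculate Raw Energy (Joules), $E = P \times t$.
- AI Server: $E = 100\ \text{W} \times 0.29\ \text{s} = 29.0$ J.
- Jetson Nano: $E = 10\ \text{W} \times 8.04\ \text{s} = 80.4$ J (Highest Energy).
- Step 2: Normalize (Inverted). Since lower energy is better, we invert the scale, so the lowest Joules gets 1.0: $S_{E} = (E_{max} - E) / (E_{max} - E_{min})$.
- Max ($E_{max}$): 80.4 J (Jetson).
- Min ($E_{min}$): 17.2 J (T400).
- AI Server Score: $(80.4 - 29.0) / (80.4 - 17.2) \approx 0.81$.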
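- AI Server: shortest average inference time, 0.29 s (Fastest).
- RPi 4: longest average inference time, 10.96 s (Slowest).
- AI Server Score: 1.00, since the fastest device sits at the top of the normalized scale.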
- Step 1: Identify Raw F1 Scores.
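- Best ($F1_{max}$): 0.764 (RPi 4).
- Worst ($F1_{min}$): 0.745 (RPi 5).
- Step 2: Normalize (Standard): $S_{F1} = (F1 - F1_{min}) / (F1_{max} - F1_{min})$.
- RPi 4 score: $(0.764 - 0.745) / (0.764 - 0.745) = 1.00$.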
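- Step 1: Identify Raw Times (longest: 10.96 s on the RPi 4; shortest: 0.29 s on the AI Server).
- Step 2: Normalize (Inverted). Lower time is better: $S_{T} = (T_{max} - T) / (T_{max} - T_{min})$.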
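The normalization steps above can be checked end to end with a short script. The raw inputs are copied from the raw measurements table (average time, average F1, estimated power, assigned circular score). One assumption is labeled in the code: speed is approximated as the reciprocal of average time, because token counts are not listed; this choice happens to reproduce the tabulated speed scores, but the paper defines speed in tokens per second. The names `DEVICES`, `min_max`, and `leaf_scores` are hypothetical.

```python
# name: (avg time in s, avg F1, est. power in W, assigned circular score 0-10)
DEVICES = {
    "Jetson Nano": (8.04, 0.758, 10, 8),
    "RPi 4": (10.96, 0.764, 5, 9),
    "RPi 5": (2.20, 0.745, 8, 4),
    "T400": (0.43, 0.748, 40, 3),
    "AI Server": (0.29, 0.759, 100, 9),
}


def min_max(values, invert=False):
    """Min-max normalize a {name: value} dict to 0..1; invert when lower is better."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: ((hi - v) if invert else (v - lo)) / span for k, v in values.items()}


def leaf_scores(devices):
    """Compute the five normalized LEAF scores for every device."""
    energy = min_max({k: p * t for k, (t, f, p, c) in devices.items()}, invert=True)
    # Assumption: speed ~ 1 / average time (token counts are not available here).
    speed = min_max({k: 1.0 / t for k, (t, f, p, c) in devices.items()})
    accuracy = min_max({k: f for k, (t, f, p, c) in devices.items()})
    latency = min_max({k: t for k, (t, f, p, c) in devices.items()}, invert=True)
    return {
        k: {
            "circular": c / 10,
            "energy": energy[k],
            "speed": speed[k],
            "accuracy": accuracy[k],
            "latency": latency[k],
        }
        for k, (t, f, p, c) in devices.items()
    }


for name, row in leaf_scores(DEVICES).items():
    print(name, {metric: round(value, 2) for metric, value in row.items()})
```

Because min-max scaling is relative to the devices in the set, adding or removing a device changes every score on that axis.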
6.5. Key Observations for the LEAF Assessment Chart
7. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AI | Artificial Intelligence |
| API | Application Programming Interface |
| ARM | Advanced RISC Machines (Processor Architecture) |
| BERT | Bidirectional Encoder Representations from Transformers |
| CNN | Convolutional Neural Network |
| CPU | Central Processing Unit |
| DL | Deep Learning |
| F1 | F-Measure (Harmonic mean of precision and recall) |
| GenAI | Generative Artificial Intelligence |
| GGUF | GPT-Generated Unified Format |
| GPU | Graphics Processing Unit |
| IoT | Internet of Things |
| LEAF | LLM Edge Assessment Framework |
| LLM | Large Language Model |
| LPDDR | Low-Power Double Data Rate (Memory) |
| LTS | Long-Term Support (Software Version) |
| MEC | Mobile Edge Computing/Multi-Access Edge Computing |
| NAS | Neural Architecture Search |
| P2P | Peer-to-Peer |
| RAM | Random Access Memory |
| RLHF | Reinforcement Learning from Human Feedback |
| RNN | Recurrent Neural Network |
| SoC | System on a Chip |
| TDP | Thermal Design Power |
| TPS | Tokens Per Second |
| TPU | Tensor Processing Unit |
| TTFT | Time To First Token |
| ViT | Vision Transformer |
References
- Huang, D.; Wang, Z. LLMs at the Edge: Performance and Efficiency Evaluation with Ollama on Diverse Hardware. In Proceedings of the 2025 International Joint Conference on Neural Networks (IJCNN), Rome, Italy, 30 June–5 July 2025.
- Nezami, Z.; Hafeez, M.; Djemame, K.; Zaidi, S.A.R.; Xu, J. Descriptor: Benchmark Dataset for Generative AI on Edge Devices (BeDGED). IEEE Data Descr. 2025, 2.
- Menon, S.; Addula, S.R.; Parkavi, A.; Subbalakshmi, C.; Dhandayuthapani, V.B.; Pokkuluri, K.S.; Soni, A. Streamlining Task Planning Systems for Improved Enactment in Contemporary Computing Surroundings. SN Comput. Sci. 2024, 5, 993.
- Dettmers, T.; Lewis, M.; Shleifer, S.; Zettlemoyer, L. 8-bit Optimizers via Block-wise Quantization. arXiv 2022.
- Lin, J.; Tang, J.; Tang, H.; Yang, S.; Chen, W.-M.; Wang, W.-C.; Xiao, G.; Dang, X.; Gan, C.; Han, S. AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration. arXiv 2024.
- Cai, H.; Gan, C.; Wang, T.; Zhang, Z.; Han, S. Once-for-All: Train One Network and Specialize it for Efficient Deployment. arXiv 2020.
- Adnyana, K.; Schwung, A. Benchmarking and validation of prompting techniques for AI-assisted industrial PLC programming. Mach. Learn. Appl. 2026, 23, 100804.
- Al-Masri, E.; Subramanian, I.N. SOLAR: Illuminating LLM performance in API discovery and service ranking for edge AI and IoT. Internet Things 2025, 32, 101630.
- Alasmari, S.M.; Sakly, H.; Kraiem, N.; Algarni, A. Phishing detection in IoT: An integrated CNN-LSTM framework with explainable AI and LLM-enhanced analysis. Discov. Internet Things 2025, 5, 102.
- Alsuwaiket, M.A. ZeroDay-LLM: A Large Language Model Framework for Zero-Day Threat Detection in Cybersecurity. Information 2025, 16, 939.
- Ataman, D.; Birch, A.; Habash, N.; Federico, M.; Koehn, P.; Cho, K. Machine Translation in the Era of Large Language Models: A Survey of Historical and Emerging Problems. Information 2025, 16, 723.
- Baller, S.P.; Jindal, A.; Chadha, M.; Gerndt, M. DeepEdgeBench: Benchmarking Deep Neural Networks on Edge Devices. arXiv 2021.
- Bao, R.; Xue, N.; Sun, Y.; Chen, Z. Dynamic Quality-Latency Aware Routing for LLM Inference in Wireless Edge-Device Networks. In Proceedings of the 2025 IEEE/CIC International Conference on Communications in China (ICCC Workshops), Shanghai, China, 10–13 August 2025; pp. 1–6.
- Benmeziane, H.; Maghraoui, K.E. Are Large Language Models Good Neural Architecture Generators for Edge? In Proceedings of the 2024 IEEE International Conference on Edge Computing and Communications (EDGE), Shenzhen, China, 7–13 July 2024; pp. 162–165.
- Çetinkaya, A. A Systems Approach to Validating Large Language Model Information Extraction: The Learnability Framework Applied to Historical Legal Texts. Information 2025, 16, 960.
- Chakraborty, S.; Chowdhury, R.; Shuvo, S.R.; Chatterjee, R.; Roy, S. A scalable framework for evaluating multiple language models through cross-domain generation and hallucination detection. Sci. Rep. 2025, 15, 29981.
- Chen, P.; Li, Z. Length Instruction Fine-Tuning with Chain-of-Thought (LIFT-COT): Enhancing Length Control and Reasoning in Edge-Deployed Large Language Models. Electronics 2025, 14, 1662.
- Han, S.; Mao, H.; Dally, W.J. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv 2016.
- Hao, T.; Hwang, K.; Zhan, J.; Li, Y.; Cao, Y. Scenario-Based AI Benchmark Evaluation of Distributed Cloud/Edge Computing Systems. IEEE Trans. Comput. 2023, 72, 719–731.
- Jain, A.M.; Jain, A. Scaling LLM Inference Architectures: A Performance Analysis for Chatbot Applications. In Proceedings of the 2025 6th International Conference on Artificial Intelligence, Robotics and Control (AIRC), Savannah, GA, USA, 7–9 May 2025; pp. 8–16.
- Jebli, A.; Fourati, R.; Drira, F. Resource Management and Security Challenges for Deploying and Adapting Large Language Models in Fog Computing. In Proceedings of the 2025 IEEE 9th Forum on Research and Technologies for Society and Industry (RTSI), Tunis, Tunisia, 24–26 August 2025; pp. 174–179.
- Jin, X.; Katsis, C.; Sang, F.; Sun, J.; Kundu, A.; Kompella, R. Edge Security: Challenges and Issues. arXiv 2022.
- Kim, J.-H.; Choi, Y.-S. Lightweight Pre-Trained Korean Language Model Based on Knowledge Distillation and Low-Rank Factorization. Entropy 2025, 27, 379.
- Kohli, P.; Jayanth, R.; Gupta, N.; Fan, H.; Prasanna, V. Performance-Energy Characterization of ML Inference on Heterogeneous Edge AI Platforms. In Proceedings of the 2025 IEEE High Performance Extreme Computing Conference (HPEC), Virtual, 15–19 September 2025; pp. 1–7.
- Krishnamurthy, B.; Shiva, S.G. Scalable Resource Provisioning Framework for Fog Computing Using LLM-Guided Q-Learning Approach. Algorithms 2025, 18, 230.
- Lee, H.; Kang, P. Performance Evaluation of Modern GPU Accelerator-Based Edge Systems: A Holistic Approach. IEEE Internet Things J. 2025, 12, 51716–51729.
- Li, R.; Fu, D.; Shi, C.; Huang, Z.; Lu, G. Efficient LLMs Training and Inference: An Introduction. IEEE Access 2025, 13, 32944–32970.
- Liu, S.; Chen, L.; Yan, J.; Jiang, Y.; Wang, X.; Li, X.; Yang, Q. When DeepSeek-R1 meets financial applications: Benchmarking, opportunities, and limitations. Front. Inf. Technol. Electron. Eng. 2025, 26, 1862–1870.
- Liu, Y.; Sun, Q.; Kapadia, D.R. Integrating Large Language Models into Robotic Autonomy: A Review of Motion, Voice, and Training Pipelines. AI 2025, 6, 158.
- Liu, Z.; Guo, P.; Wang, P. LLMSwitchBench: A New Edge-Cloud Routing Benchmark for Smart Home LLM Inference. IEEE Access 2025.
- Minott, D.; Siddiqui, S.; Haddad, R.J. Benchmarking Edge AI Platforms: Performance Analysis of NVIDIA Jetson and Raspberry Pi 5 with Coral TPU. In Proceedings of the SoutheastCon 2025, Concord, NC, USA, 22–30 March 2025; pp. 1384–1389.
- Nezami, Z.; Hafeez, M.; Djemame, K.; Zaidi, S.A.R. Generative AI on the Edge: Architecture and Performance Evaluation. In Proceedings of the ICC 2025-IEEE International Conference on Communications, Montreal, QC, Canada, 8–12 June 2025; pp. 4595–4602.
- Pozi, M.S.M.; Sato, Y. A data-augmented model routing framework for efficient LLM deployment in edge–cloud environments. J. Supercomput. 2025, 81, 1573.
- Ranjan, N.; Savakis, A. Mix-QViT: Mixed-Precision Vision Transformer Quantization Driven by Layer Importance and Quantization Sensitivity. arXiv 2025.
- Ray, P.P.; Pradhan, M.P. P2PLLMEdge: Peer-to-Peer Framework for Localized Large Language Models using CPU only Resource-Constrained Edge. EAI Endorsed Trans. AI Robot. 2025, 4, 1–27.
- Ren, J.; Wang, C.; Zhong, Y.; Cao, S.; Zheng, D.; Cao, X. Towards Expert Models Deployment Cost Optimization in Edge Computing Networks. In Proceedings of the ICC 2025-IEEE International Conference on Communications, Montreal, QC, Canada, 8–12 June 2025; pp. 838–843.
- Saha, H.; Bhattacharya, D.; Dutta, S.; Bera, A.; Basuray, S.; Changdar, S.; Banerjee, S.; Turdiev, J. Transforming Healthcare with State-of-the-Art Medical-LLMs: A Comprehensive Evaluation of Current Advances Using Benchmarking Framework. Comput. Mater. Contin. 2025, 86, 1–56.
- Shaikh, T.A.; Rasool, T.; Veningston, K.; Yaseen, S.M. The role of large language models in agriculture: Harvesting the future with LLM intelligence. Prog. Artif. Intell. 2025, 14, 117–164.
- Sun, M.; Hou, J.; Qiu, K.; Wang, K.; Chu, X.; Zhang, Z. LLM-based Task Offloading and Resource Allocation in Satellite Edge Computing Networks. IEEE Trans. Veh. Technol. 2025, 74, 1–6.
- Sun, Y.; Liu, J.; Xiong, G.; Song, Q.; Liu, J.; Wang, G.; Wang, R. Towards Trusted 6G Mobile Edge Computing: A Secure Batch Large Language Models Deployment Framework. IEEE Trans. Mob. Comput. 2025, 25, 3328–3346.
- Thapa, S.; Shiwakoti, S.; Shah, S.B.; Adhikari, S.; Veeramani, H.; Nasim, M.; Naseem, U. Large language models (LLM) in computational social science: Prospects, current state, and challenges. Soc. Netw. Anal. Min. 2025, 15, 4.
- Wang, J.; Wu, Y.; Xiong, X.; Zhang, Y.; Lyu, Z.; Ghoneim, A.; Zhao, H. FedLMA: A Federated Learning Framework Integrating LLM-Based Multi-Agent Reasoning With Knowledge Distillation. IEEE Trans. Consum. Electron. 2025, 71, 11339–11349.
- Wang, R.; Gao, Z.; Zhang, L.; Yue, S.; Gao, Z. Empowering large language models to edge intelligence: A survey of edge efficient LLMs and techniques. Comput. Sci. Rev. 2025, 57, 100755.
- Yang, H.; Liu, H.; Yuan, X.; Wu, K.; Ni, W.; Zhang, J.A.; Liu, R.P. Synergizing Intelligence and Privacy: A Review of Integrating Internet of Things, Large Language Models, and Federated Learning in Advanced Networked Systems. Appl. Sci. 2025, 15, 6587.
- Yin, C.; Mao, Y.; He, Z.; Chen, M.; He, X.; Rong, Y. Edge Computing-Enabled Secure Forecasting Nationwide Industry PM2.5 with LLM in the Heterogeneous Network. Electronics 2024, 13, 2581.
- Yuan, X.; Li, H. LLM-Driven Offloading Decisions for Edge Object Detection in Smart City Deployments. Smart Cities 2025, 8, 169.
- Zhang, W.; Wei, Q.; Chen, H.; Wang, Y. Automated detection of contradictions in 5G network specifications using reinforcement learning-trained small LLM. EURASIP J. Wirel. Commun. Netw. 2025, 2025, 85.
- Zhang, X.; Nie, J.; Huang, Y.; Xie, G.; Xiong, Z.; Liu, J.; Niyato, D.; Shen, X. Beyond the Cloud: Edge Inference for Generative Large Language Models in Wireless Networks. IEEE Trans. Wirel. Commun. 2025, 24, 643–658.
- Zhang, Z.; Fu, Y.; Li, J.; Ma, S.L.; Sham, C.-W. Enhancing Synthesis Efficiency in HLS through LLM-Based Automated Code Correction. In Proceedings of the 2025 IEEE 14th Global Conference on Consumer Electronics (GCCE), Osaka, Japan, 23–26 September 2025; pp. 382–384.
- Zhu, F.; Huang, F.; Yu, Y.; Liu, G.; Huang, T. Task Offloading with LLM-Enhanced Multi-Agent Reinforcement Learning in UAV-Assisted Edge Computing. Sensors 2024, 25, 175.
- Surianarayanan, C.; Lawrence, J.J.; Chelliah, P.R.; Prakash, E.; Hewage, C. A Survey on Optimization Techniques for Edge Artificial Intelligence (AI). Sensors 2023, 23, 1279.
- Stadnicka, D.; Sęp, J.; Amadio, R.; Mazzei, D.; Tyrovolas, M.; Stylios, C.; Carreras-Coch, A.; Merino, J.A.; Żabiński, T.; Navarro, J.; et al. Industrial Needs in the Fields of Artificial Intelligence, Internet of Things and Edge Computing. Sensors 2022, 22, 4501.
- Rupanetti, D.; Kaabouch, N. Combining Edge Computing-Assisted Internet of Things Security with Artificial Intelligence: Applications, Challenges, and Opportunities. Appl. Sci. 2024, 14, 7104.
- Liang, Y.; Bi, X.; Shen, R.; He, Z.; Wang, Y.; Xu, J.; Zhang, Y.; Fan, X. When Mathematical Methods Meet Artificial Intelligence and Mobile Edge Computing. Mathematics 2025, 13, 1779.
- Lawal, O.; Shajihan, S.A.V.; Mechitov, K.; Billie, F.; Spencer, J. Edge Integration of Artificial Intelligence into Wireless Smart Sensor Platforms for Railroad Bridge Impact Detection. Sensors 2024, 24, 5633.
- Gültekin, Ö.; Cinar, E.; Özkan, K.; Yazıcı, A. Real-Time Fault Detection and Condition Monitoring for Industrial Autonomous Transfer Vehicles Utilizing Edge Artificial Intelligence. Sensors 2022, 22, 3208.
- Chen, Y.; Wu, C.; Sui, R.; Zhang, J. Feasibility Study of Edge Computing Empowered by Artificial Intelligence—A Quantitative Analysis Based on Large Models. Big Data Cogn. Comput. 2024, 8, 94.
- Bourechak, A.; Zedadra, O.; Kouahla, M.N.; Guerrieri, A.; Seridi, H.; Fortino, G. At the Confluence of Artificial Intelligence and Edge Computing in IoT-Based Applications: A Review and New Perspectives. Sensors 2023, 23, 1639.
- Abdulkadhim, M.; Repas, S.R. SHEAB: A Novel Automated Benchmarking Framework for Edge AI. Technologies 2025, 13, 515.





| Hardware Class | Device Node | Component TDP | Est. Total System Power (Load) | Inference Time (Avg) | Total Energy Cost (System) |
|---|---|---|---|---|---|
| Embedded | Raspberry Pi 4 | ~4 W (SoC) | ~6 W | 10.96 s | 65.7 J |
| Embedded | Raspberry Pi 5 | ~7 W (SoC) | ~9 W | 2.20 s | 19.8 J |
| Workstation | T400 Server | 30 W (GPU) | ~65 W | 0.43 s | 27.9 J |
| Circular | GTX 1050 Ti | 75 W (GPU) | ~110 W | 0.29 s | 31.9 J |
| Legacy | Jetson Nano | 10 W (Module) | ~12 W | 8.04 s | 96.4 J |

| Device | Avg Time (t) | Avg F1 (f) | Est. Power (P) | Circular Eco (C) [Assigned] |
|---|---|---|---|---|
| Jetson Nano | 8.04 s | 0.758 | 10 W | 8 (High Reuse) |
| RPi 4 | 10.96 s | 0.764 | 5 W | 9 (High Reuse) |
| RPi 5 | 2.20 s | 0.745 | 8 W | 4 (New HW) |
| Server (T400) | 0.43 s | 0.748 | 40 W | 3 (New HW) |
| AI Server (1050 Ti) | 0.29 s | 0.759 | 100 W | 9 (High Reuse) |

| Device | Circular Eco | Energy Eff. | Speed | Accuracy | Proc. Time |
|---|---|---|---|---|---|
| Jetson | 0.8 | 0.00 | 0.01 | 0.68 | 0.27 |
| RPi 4 | 0.9 | 0.41 | 0.00 | 1.00 | 0.00 |
| RPi 5 | 0.4 | 0.99 | 0.11 | 0.00 | 0.82 |
| T400 | 0.3 | 1.00 | 0.67 | 0.16 | 0.99 |
| AI Server | 0.9 | 0.81 | 1.00 | 0.74 | 1.00 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Abdulkadhim, M.; Repas, S.R. Introducing LEAF: LLM Edge Assessment Framework for Generative AI on the Edge. Mach. Learn. Knowl. Extr. 2026, 8, 48. https://doi.org/10.3390/make8020048

