This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

A Deployment-Aware Framework for Carbon- and Water- Efficient LLM Serving

1 College of Engineering and Technology, American University of the Middle East, Egaila 54200, Kuwait
2 College of Business Administration, American University of the Middle East, Egaila 54200, Kuwait
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(23), 10473; https://doi.org/10.3390/su172310473 (registering DOI)
Submission received: 10 October 2025 / Revised: 15 November 2025 / Accepted: 18 November 2025 / Published: 22 November 2025

Abstract

Inference now dominates the lifecycle footprint of large language models, yet published estimates often use inconsistent boundaries and optimize carbon while ignoring water. We present a provider-agnostic framework that unifies scope-transparent measurement with time-resolved, SLO-aware orchestration and jointly optimizes carbon and consumptive water. Measurement reports daily medians at a comprehensive serving boundary that includes accelerators, host CPU/DRAM, provisioned idle capacity, and PUE uplift, with accelerator-only whiskers provided for reconciliation. Optimization uses a mixed-integer linear program solved over five-minute windows; it selects the region, batch size, and phase-aware hardware for prefill and decode while enforcing p95 time-to-first-token (TTFT) and time-per-output-token (TPOT) targets as well as capacity constraints. Applied to four representative models, a single SLO-aware policy reduces comprehensive-boundary medians by 57 to 59 percent for energy, 59 to 60 percent for water, and 78 to 80 percent for location-based CO2, with SLOs met in every window. For a day with 500 million queries on GPT-4o, totals fall from 0.344 to 0.145 GWh of energy, from 1.196 to 0.490 ML of water, and from 121 to 25 t of CO2 (location-based). The framework offers a deployable template for carbon- and water-aware LLM serving with auditable, scope-transparent reporting.
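The per-query figures and percentage reductions implied by the GPT-4o day can be checked with a few lines of arithmetic. The sketch below uses only the daily totals stated in the abstract, reading ML as megalitres and t as metric tonnes; the helper names (`per_query`, `reduction_pct`, `implied_intensity`) are illustrative and not part of the article.

```python
# Daily totals for 500 million GPT-4o queries, as stated in the abstract
# (baseline vs. SLO-aware policy, comprehensive serving boundary).
QUERIES = 500_000_000

baseline = {"energy_GWh": 0.344, "water_ML": 1.196, "co2_t": 121.0}
optimized = {"energy_GWh": 0.145, "water_ML": 0.490, "co2_t": 25.0}


def per_query(totals: dict[str, float]) -> dict[str, float]:
    """Convert daily totals to per-query Wh, mL and g CO2."""
    return {
        "energy_Wh": totals["energy_GWh"] * 1e9 / QUERIES,  # 1 GWh = 1e9 Wh
        "water_mL": totals["water_ML"] * 1e9 / QUERIES,     # 1 ML = 1e6 L = 1e9 mL
        "co2_g": totals["co2_t"] * 1e6 / QUERIES,           # 1 t = 1e6 g
    }


def reduction_pct(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


def implied_intensity(totals: dict[str, float]) -> tuple[float, float]:
    """Average water (L/kWh) and carbon (g CO2/kWh) per unit of boundary energy."""
    kwh = totals["energy_GWh"] * 1e6
    return totals["water_ML"] * 1e6 / kwh, totals["co2_t"] * 1e6 / kwh


if __name__ == "__main__":
    b, o = per_query(baseline), per_query(optimized)
    for key in b:
        print(f"{key}: {b[key]:.3f} -> {o[key]:.3f} "
              f"({reduction_pct(b[key], o[key]):.1f}% lower)")
    for label, totals in (("baseline", baseline), ("optimized", optimized)):
        water, co2 = implied_intensity(totals)
        print(f"{label}: {water:.2f} L/kWh, {co2:.0f} g CO2/kWh")
```

Run as a script, it reports about 0.688 to 0.290 Wh, 2.392 to 0.980 mL and 0.242 to 0.050 g CO2 per query, i.e. reductions of roughly 57.8, 59.0 and 79.3 percent, which fall inside the median ranges quoted in the abstract. The implied average carbon intensity drops from about 352 to 172 g CO2 per kWh while water per kWh barely moves (about 3.48 to 3.38 L/kWh). That pattern is consistent with, though not proof of, part of the CO2 saving coming from region selection rather than from energy savings alone.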
Keywords: LLM inference; carbon-aware routing; water-aware routing; service-level objectives (SLO); mixed-integer linear programming (MILP); power usage effectiveness (PUE); water usage effectiveness (WUE)
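The per-window optimization described in the abstract can be illustrated at toy scale. The sketch below is an assumption-laden stand-in, not the authors' MILP. For one five-minute window it computes per-query energy, water and location-based CO2 at a comprehensive boundary: facility energy is IT energy times PUE, and water is assumed to be on-site WUE times IT energy plus a grid water-intensity term on facility energy. It then keeps only configurations whose p95 TTFT (plus a fixed network round trip) and p95 TPOT meet the SLO, picks the cheapest feasible configuration per region, and fills regions in order of weighted cost up to capacity. All class names, fields (`rtt_s`, `ewif_l_per_kwh`), weights and numbers are hypothetical placeholders. Because costs here are linear per query and no switching or batching interactions are modelled, this greedy fill is optimal for the simplified problem; the full formulation would need an MILP solver such as SciPy's `scipy.optimize.milp` or PuLP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    pue: float             # facility energy / IT energy
    wue_l_per_kwh: float   # on-site water per kWh of IT energy
    ewif_l_per_kwh: float  # assumed water intensity of grid electricity (L/kWh)
    gco2_per_kwh: float    # location-based grid intensity for this window
    capacity_qps: float    # spare serving capacity in this window
    rtt_s: float           # fixed network round trip added to TTFT


@dataclass(frozen=True)
class Config:
    prefill_hw: str
    decode_hw: str
    batch: int
    it_wh_per_query: float  # accelerators + host CPU/DRAM + idle share (IT side)
    p95_ttft_s: float       # serving-side p95 time to first token
    p95_tpot_s: float       # p95 time per output token


def footprint(r: Region, c: Config) -> tuple[float, float, float]:
    """Per-query (facility Wh, water L, g CO2) at a comprehensive boundary."""
    it_kwh = c.it_wh_per_query / 1000.0
    facility_kwh = it_kwh * r.pue  # PUE uplift
    water_l = it_kwh * r.wue_l_per_kwh + facility_kwh * r.ewif_l_per_kwh
    co2_g = facility_kwh * r.gco2_per_kwh
    return facility_kwh * 1000.0, water_l, co2_g


def weighted_cost(r: Region, c: Config, w_co2: float, w_water: float) -> float:
    """Hypothetical scalarization: w_co2 per g CO2 plus w_water per litre."""
    _, water_l, co2_g = footprint(r, c)
    return w_co2 * co2_g + w_water * water_l


def plan_window(regions, configs, demand_qps, ttft_slo_s, tpot_slo_s,
                w_co2=1.0, w_water=1.0):
    """Greedy routing for one window; returns [(region name, config, qps)]."""
    options = []
    for r in regions:
        # A fixed RTT shifts the latency distribution, so it adds to the p95.
        feasible = [c for c in configs
                    if c.p95_ttft_s + r.rtt_s <= ttft_slo_s
                    and c.p95_tpot_s <= tpot_slo_s]
        if feasible:
            best = min(feasible, key=lambda c: weighted_cost(r, c, w_co2, w_water))
            options.append((weighted_cost(r, best, w_co2, w_water), r, best))
    plan, remaining = [], demand_qps
    for _, r, c in sorted(options, key=lambda o: o[0]):
        share = min(remaining, r.capacity_qps)  # respect regional capacity
        if share > 0:
            plan.append((r.name, c, share))
            remaining -= share
    if remaining > 1e-9:
        raise ValueError("demand exceeds SLO-feasible capacity in this window")
    return plan


# Hypothetical placeholder data, not measurements from the article.
EXAMPLE_REGIONS = [
    Region("region-a", pue=1.4, wue_l_per_kwh=1.8, ewif_l_per_kwh=3.1,
           gco2_per_kwh=450, capacity_qps=4000, rtt_s=0.02),
    Region("region-b", pue=1.1, wue_l_per_kwh=0.2, ewif_l_per_kwh=1.0,
           gco2_per_kwh=60, capacity_qps=2500, rtt_s=0.08),
]
EXAMPLE_CONFIGS = [
    Config("gpu-x", "gpu-x", batch=8, it_wh_per_query=0.60,
           p95_ttft_s=0.35, p95_tpot_s=0.030),
    Config("gpu-x", "gpu-y", batch=32, it_wh_per_query=0.35,
           p95_ttft_s=0.55, p95_tpot_s=0.045),
]

if __name__ == "__main__":
    for name, cfg, qps in plan_window(EXAMPLE_REGIONS, EXAMPLE_CONFIGS,
                                      demand_qps=5000, ttft_slo_s=0.6,
                                      tpot_slo_s=0.05):
        print(name, f"batch={cfg.batch}", cfg.prefill_hw, cfg.decode_hw, qps)
```

With the placeholder data, region-b is filled first (2,500 qps) using the batch-8 configuration, because the otherwise cheaper batch-32 configuration misses the 0.6 s TTFT target once region-b's 80 ms round trip is added; the remaining 2,500 qps go to region-a with batch 32. The example shows, in miniature, how an SLO constraint can override the lowest-energy choice in a given region.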

Share and Cite

MDPI and ACS Style

Hoxha, J.; Thanasi-Boçe, M.; Khalifa, T. A Deployment-Aware Framework for Carbon- and Water- Efficient LLM Serving. Sustainability 2025, 17, 10473. https://doi.org/10.3390/su172310473

AMA Style

Hoxha J, Thanasi-Boçe M, Khalifa T. A Deployment-Aware Framework for Carbon- and Water- Efficient LLM Serving. Sustainability. 2025; 17(23):10473. https://doi.org/10.3390/su172310473

Chicago/Turabian Style

Hoxha, Julian, Marsela Thanasi-Boçe, and Tarek Khalifa. 2025. "A Deployment-Aware Framework for Carbon- and Water- Efficient LLM Serving" Sustainability 17, no. 23: 10473. https://doi.org/10.3390/su172310473

APA Style

Hoxha, J., Thanasi-Boçe, M., & Khalifa, T. (2025). A Deployment-Aware Framework for Carbon- and Water- Efficient LLM Serving. Sustainability, 17(23), 10473. https://doi.org/10.3390/su172310473

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
