Open Access Article
A Deployment-Aware Framework for Carbon- and Water-Efficient LLM Serving
by Julian Hoxha 1,*, Marsela Thanasi-Boçe 2 and Tarek Khalifa 1
1 College of Engineering and Technology, American University of the Middle East, Egaila 54200, Kuwait
2 College of Business Administration, American University of the Middle East, Egaila 54200, Kuwait
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(23), 10473; https://doi.org/10.3390/su172310473 (registering DOI)
Submission received: 10 October 2025 / Revised: 15 November 2025 / Accepted: 18 November 2025 / Published: 22 November 2025
Abstract
Inference now dominates the lifecycle footprint of large language models, yet published estimates often use inconsistent boundaries and optimize carbon while ignoring water. We present a provider-agnostic framework that unifies scope-transparent measurement with time-resolved, SLO-aware orchestration and jointly optimizes carbon and consumptive water. Measurement reports daily medians at a comprehensive serving boundary that includes accelerators, host CPU/DRAM, provisioned idle, and PUE uplift, and provides accelerator-only whiskers for reconciliation. Optimization uses a mixed-integer linear program solved over five-minute windows; it selects region, batch size, and phase-aware hardware for prefill and decode while enforcing TTFT and TPOT as well as capacity constraints. Applied to four representative models, a single SLO-aware policy reduces comprehensive-boundary medians by 57 to 59 percent for energy, 59 to 60 percent for water, and 78 to 80 percent for location-based CO₂, with SLOs met in every window. For a day with 500 million queries on GPT-4o, totals fall from 0.344 to 0.145 GWh, 1.196 to 0.490 ML, and 121 to 25 t CO₂ (location-based). The framework offers a deployable template for carbon- and water-aware LLM serving with auditable and scope-transparent reporting.
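As a quick sanity check on the GPT-4o day totals quoted in the abstract, the short Python sketch below recomputes the implied percentage reductions and converts the totals to per-query values. It is an illustrative calculation, not part of the authors' framework. The names `DAILY_TOTALS`, `percent_reduction`, and `per_query` are hypothetical, and the sketch assumes that "ML" denotes megalitres and "t" denotes metric tonnes.

```python
# Arithmetic check on the abstract's GPT-4o example (500 million queries per day).
# Assumptions (not stated in the source): "ML" = megalitres, "t" = metric tonnes.
QUERIES_PER_DAY = 500_000_000

# metric: (baseline total, optimized total, factor converting the total to a per-query unit)
DAILY_TOTALS = {
    "energy_Wh": (0.344, 0.145, 1e9),  # GWh -> Wh
    "water_mL": (1.196, 0.490, 1e9),   # ML -> mL
    "co2_g": (121.0, 25.0, 1e6),       # t -> g (location-based)
}


def percent_reduction(baseline, optimized):
    """Relative saving of the optimized total versus the baseline, in percent."""
    return 100.0 * (baseline - optimized) / baseline


def per_query(total, to_small_unit):
    """Spread a daily total evenly over all queries, in the smaller unit."""
    return total * to_small_unit / QUERIES_PER_DAY


summary = {}
for metric, (baseline, optimized, factor) in DAILY_TOTALS.items():
    summary[metric] = {
        "reduction_pct": round(percent_reduction(baseline, optimized), 1),
        "baseline_per_query": round(per_query(baseline, factor), 3),
        "optimized_per_query": round(per_query(optimized, factor), 3),
    }

for metric, row in summary.items():
    print(metric, row)
```

Under these assumptions the implied reductions are roughly 57.8 percent for energy, 59.0 percent for water, and 79.3 percent for location-based CO₂, which fall inside the ranges the abstract reports for the comprehensive-boundary medians. The per-query figures are simple averages over the day and say nothing about variation across windows or regions.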