Dynamic Water-Energy-Carbon Trade-Off Optimization for Heavy Industry Decarbonization via Deep Reinforcement Learning: A UK Case Study

Hassan, M.; Rasheed, M. B.; Khan, Inam Ullah; Gamage, K. A. A.

doi:10.3390/w18091112

¹ Bestway Pvt. Ltd., Park Royal London, London NW10 7BW, UK

² Hattar Plant, Haripur 22600, Pakistan

³ College of Science and Engineering, University of Derby, Derby DE22 3AW, UK

⁴ Lyle School of Engineering, Southern Methodist University, Dallas, TX 75205, USA

⁵ James Watt School of Engineering, James Watt South Building, University of Glasgow, Glasgow G12 8QQ, UK

* Author to whom correspondence should be addressed.

Water 2026, 18(9), 1112; https://doi.org/10.3390/w18091112

Submission received: 4 April 2026 / Revised: 28 April 2026 / Accepted: 29 April 2026 / Published: 6 May 2026

(This article belongs to the Section Water-Energy Nexus)

Abstract

In recent years, industrial decarbonization in the cement sector has introduced secondary environmental impacts through increased power and water demand. Deploying carbon capture, utilization, and distributed storage requires an uninterrupted supply of power and water to achieve net-zero targets. However, traditional static optimization algorithms appear insufficient for addressing high-frequency, dynamic renewable networks. To overcome these issues, this work develops a dynamic water-energy-carbon trade-off optimization model for industrial decarbonization, with the deployment of a Carbon Capture, Utilization, and Storage (CCUS) system in the cement sector within a United Kingdom industrial cluster. The key objective is to quantify and control the secondary burden that low-carbon interventions can impose on electricity systems and local water resources. First, the Water-Energy-Carbon problem is treated as a tri-lemma and formulated as a continuous Markov Decision Process. The optimization problem is then solved via a Soft Actor-Critic Deep Reinforcement Learning algorithm under coupled and resource-constrained abstraction inputs. This work further introduces the Water-Carbon Mitigation Penalty Index as a diagnostic metric for measuring the marginal increase in water burden associated with carbon mitigation. The results show that unmanaged distributed carbon-mitigation pathways increase local hydrological stress by 2.15–5.17% relative to baseline operating conditions. In contrast, the proposed algorithm reduces the nexus cost by up to 70.5% and achieves a 13.83% carbon reduction by shifting from freshwater abstraction to reclaimed municipal wastewater and by coordinating operation with low-carbon hydropower availability. These results show that dynamic AI-based scheduling can support net-zero transitions while reducing pressure on regional hydro-ecological systems.

Keywords:

deep reinforcement learning; Water-Energy-Carbon Nexus; CCUS; mathematical optimization; Soft Actor-Critic; industrial decarbonization

1. Introduction

Due to the recent increase in global warming driven by massive fossil fuel consumption, the transition to a net-zero economy requires decarbonization across all sectors, including transportation, power generation, industrial processes, and particularly cement production [1], which is expected to remain central to the short- and long-term development of future smart cities. Renewable energy generation and battery storage technologies (e.g., Li-ion, fuel cell) have advanced rapidly. In contrast, decarbonizing heavy industries, such as cement manufacturing, poses a major challenge [2], because the cement industry alone is responsible for approximately 7–8% of global CO₂ emissions. Therefore, achieving the UK government’s net-zero targets [3] primarily relies on alternative fuels [2], the optimal utilization of available energy resources, and the deployment of Carbon Capture, Utilization and Storage (CCUS). However, CCUS integration can introduce severe secondary impacts within the Water-Energy-Carbon (WEC) nexus.

This problem can be viewed as a tri-lemma of industrial decarbonization, as mitigation technologies create competing resource demands. Post-combustion, amine-based CCUS typically requires comparatively low-grade thermal energy for solvent regeneration, a large amount of energy for compression, and large volumes of water for cooling to manage exothermic absorption reactions [4]. In addition, installing CCUS can increase the energy demand of a standard cement plant by more than 50% and raw-water consumption by up to 90% [2]. Consequently, as stack-level carbon emissions decline, the wider system can experience greater pressure on the electricity supply and water resources. This shift arises because the additional power needed for capture and compression must be supplied by the regional grid, while the added cooling demand increases water abstraction or treatment requirements. In practice, part of the carbon burden is therefore transferred upstream to the supporting power system, and part of the environmental burden is transferred laterally to regional water systems [4,5]. Meeting these additional electrical loads may require dispatchable generation, including hydropower or thermal plants, which can impose significant water footprints through reservoir evaporation and/or cooling withdrawals [5]. In water-stressed regions, meeting cooling demand through fresh surface abstraction may therefore become ecologically unsustainable, which motivates wastewater reuse and energy-intensive water reclamation.

It is also worth noting that CCUS is not the only route for minimizing CO₂ emissions in the cement industry. Other main mitigation pathways include: (i) reducing the clinker-to-cement ratio via supplementary cementitious materials, (ii) increasing the integration of alternative and low-carbon fuels, (iii) improving kiln and grinding efficiency, (iv) deploying waste-heat recovery, (v) electrifying the process steps, and (vi) lowering material demand through more efficient design and construction practices [6,7]. These options can significantly reduce the fuel-related emissions and the embodied carbon of cement products. Nevertheless, since a large share of cement emissions arises from limestone calcination, deep-decarbonization pathways may still identify carbon capture as important for addressing residual process emissions [6,7]. Accordingly, this work focuses on CCUS within a holistic portfolio of cement-sector mitigation mechanisms rather than treating it as the only decarbonization option.

The background literature on the WEC nexus covers multiple methodological solutions and applications in broad geographical contexts. From a methodological perspective, various studies rely on either deterministic or mixed-integer optimization to model the coupled resource flows of energy hubs and related infrastructures. For instance, Salimi et al. [8] modelled interconnected natural-gas and electricity hubs using deterministic energy-hub optimization techniques. Similarly, Roustaei et al. [9] extended this concept via hub-based planning toward more integrated smart energy-water systems. Stochastic extensions were then introduced to explicitly capture uncertainty. Ji et al. [10], for instance, developed a two-stage fuzzy stochastic model for water-supply management from a water–energy nexus perspective, showing how uncertainty in supply and demand alters resource-allocation decisions. In parallel, sector-specific environmental studies examined the nexus from a footprint perspective rather than an operational-control perspective. Bakken et al. [5] assessed the water footprint of hydropower production at broader regional and methodological scales, whereas Rosa et al. [4] focused on the water implications of carbon-capture systems.

The existing literature shows pronounced fragmentation across both thematic and geographic contexts. Investigations have been carried out at the scale of utility-level water infrastructure [5], multi-energy hub architectures, and individual low-carbon technologies such as hydropower or post-combustion capture [4]. However, these threads remain largely unintegrated within a unified industrial decision-making framework. A further structural gap concerns application context: published models are typically validated against generic test networks and/or planning scenarios rather than against the operational contexts of hard-to-abate industrial sectors in regionally constrained settings. Moreover, recently developed data-driven control techniques have been effective in addressing the rigidity of deterministic models. For instance, Zhang et al. [11] develop a cooperative multi-agent deep reinforcement learning algorithm to schedule distributed and variable energy resources in energy-hub environments operating under dynamic exogenous conditions. However, such models remain mainly situated within power-system and generic hub domains; the coupled electricity, water, and carbon interdependencies arising specifically from CCUS integration in cement manufacturing remain outside their scope. Accordingly, the research gap is both contextual and methodological: no comparable framework cohesively combines real-time dynamic control, CCUS-induced parasitic loads, water-sourcing flexibility, and region-specific regulatory constraints in a single, operationally grounded WEC-nexus model.

To address this gap, we develop a multi-objective DRL framework. Unlike traditional input-output models solved with static nonlinear programming, we formulate dynamic coupling as a Markov Decision Process (MDP) and solve it using the Soft Actor-Critic (SAC) algorithm. We also propose the Water-Carbon Mitigation Penalty Index (WCMPI) (see Section 3.1) as a new study-specific metric for quantifying the trade-off cost of atmospheric mitigation, building on the broader literature on water footprints, carbon footprints, and CCUS-related environmental trade-offs [4,12]. The framework is then evaluated in a detailed United Kingdom (UK) case study.

2. Research Motivation, Gap and Proposed Framework

While the necessity of CCUS deployment in cement production is well established [1,2], less attention has been given to the secondary environmental costs it induces. Prioritizing carbon reduction without a systems-level perspective introduces a coupled Water-Energy-Carbon tri-lemma [4,12]: expanded cooling-water demand and additional electricity requirements that can transfer environmental pressure from atmospheric CO₂ to regional hydro-ecological stress and grid instability [4,5]. This burden is especially acute in water-stressed industrial regions such as the GCC, where uncoordinated CCUS rollout risks compounding, rather than resolving, environmental trade-offs. This study is therefore motivated by the need for a dynamic, multi-objective scheduling framework capable of jointly managing these interdependencies.

Current WEC-nexus studies mainly use static Energy/Water Hub formulations solved through deterministic nonlinear programming [5,8,9]. Although these methods are useful for structured planning, they are less suitable for real-time decision-making under deep uncertainty. Two limitations are especially critical:

High-frequency volatility gap: Static formulations do not adequately represent intra-day stochasticity in renewable generation and grid carbon-intensity signals, both of which strongly influence operationally optimal dispatch in practice [3,10].
Constraint brittleness under coupled objectives: Traditional solvers can become brittle when strict hydrological limits, nonlinear process penalties, and multiple environmental objectives are enforced simultaneously [12,13]. In practice, this can lead either to non-convergent schedules or to static baseload recommendations that appear optimal in aggregate but are ecologically harmful during stressed periods such as drought events.

Proposed Framework and Contributions

To overcome these limitations, we replace static scheduling with a dynamic AI-based control architecture [12] (Figure 1). Specifically, we formulate the WEC tri-lemma as a continuous Markov Decision Process and train an off-policy Soft Actor-Critic (SAC) agent [14], while using PPO as a benchmark policy-gradient baseline [15]. This design choice is consistent with growing evidence that DRL can manage high-dimensional, stochastic energy-dispatch problems more effectively than fixed-policy heuristics [11]. Through maximum-entropy exploration, SAC learns robust policies that adapt to weather variability, carbon-intensity fluctuations, and abstraction constraints in real time. Instead of fixed baseload operation, the learned policy discovers non-intuitive strategies, such as synchronizing energy-intensive wastewater reclamation with low-carbon hydropower availability, to maintain Pareto-efficient WEC performance.

3. Methodology: The DRL-Enabled WEC Framework

This section explains the mathematical problem formulation, modeling assumptions and limits, DRL learning architecture, sources of WEC datasets [3], and simulation pipeline used to evaluate WEC trade-offs under deep decarbonization and high stochastic uncertainty. The model operates at hourly resolution over a 24-h time horizon and uses observed exogenous signals (e.g., renewable output and grid carbon intensity) at each decision step. The list of variables and their definitions can be found in the Nomenclature (Table 1).

3.1. The Water-Carbon Mitigation Penalty Index (WCMPI)

To quantify the system-level environmental cost of decarbonization, we propose the WCMPI as a study-specific indicator derived from the broader literature on water footprints, carbon footprints, and CCUS-related trade-offs [4,12]. It is defined here as the marginal increase in total water footprint required to achieve a unit decrease in total carbon footprint:

$\mathrm{WCMPI}(t) = \frac{\Delta WF_{sys}(t)}{|\Delta CF_{sys}(t)|} = \frac{WF_{mitigated}(t) - WF_{baseline}(t)}{CF_{baseline}(t) - CF_{mitigated}(t)}$

(1)

A higher WCMPI value in Equation (1) indicates stronger water-stress penalties per unit carbon mitigation. The reinforcement-learning objective is therefore to steer operation toward Pareto-efficient regimes with lower WCMPI values.
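As a minimal illustration of Equation (1), the index can be evaluated for a single time step as sketched below. The function name `wcmpi`, the tolerance `eps`, and the choice to return NaN when no carbon is mitigated are assumptions made for this sketch; the source does not state how a zero denominator is handled, and the numbers are illustrative only.

```python
import math


def wcmpi(wf_baseline, wf_mitigated, cf_baseline, cf_mitigated, eps=1e-9):
    """Water-Carbon Mitigation Penalty Index for one time step (Equation (1)).

    Returns the additional water footprint per unit of carbon footprint avoided.
    Assumption of this sketch: NaN is returned when the carbon footprint does
    not change, because the ratio is undefined in that case.
    """
    delta_wf = wf_mitigated - wf_baseline   # increase in water footprint
    delta_cf = cf_baseline - cf_mitigated   # decrease in carbon footprint
    if abs(delta_cf) < eps:
        return math.nan
    return delta_wf / abs(delta_cf)


# Illustrative values: water footprint rises by 30 units while
# the carbon footprint falls by 10 units.
print(wcmpi(100.0, 130.0, 50.0, 40.0))
```

A lower returned value corresponds to a smaller water penalty per unit of carbon mitigated.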

3.2. Matrix-Based Sectoral Coupling Dynamics

The continuous resource demand of the industrial hub is driven by the amine-based capture unit. Capturing a target carbon mass ($CO_{t}^{cap}$) from the cement kiln induces a nonlinear cascade of secondary electrical ($E_{t}^{ccus}$) and thermal demand, which then drives cooling-water requirements ($W_{t}^{cooling}$) [2,4].

In this study, the CCUS chain is represented as a post-combustion capture system coupled to the cement plant flue-gas stream. Operationally, the process includes: (i) flue-gas capture in an amine-based absorber, (ii) solvent regeneration in a stripper using low-grade thermal energy, (iii) CO₂ compression for conditioning and downstream handling, and (iv) associated cooling and auxiliary pumping requirements [2,4]. The transport and permanent storage stages are acknowledged as part of the wider CCUS chain, but the present model focuses on the on-site capture and compression stages because these dominate the immediate electricity, heat, and water interactions within the plant-level WEC nexus.

Unlike static linear input-output models, the WEC tri-lemma is represented in this study through a dynamic coupling matrix $M_{cpl}$ that maps emission-control decisions to parasitic utility demand $D_{t}$, informed by the broader CCUS burden literature [2,4], as shown in Equation (2):

$D_{t} = \begin{bmatrix} E_{t}^{ccus} \\ W_{t}^{cooling} \end{bmatrix} = \begin{bmatrix} \alpha (CO_{t}^{cap})^{\beta - 1} & 0 \\ c_{1} & c_{2} \end{bmatrix} \begin{bmatrix} CO_{t}^{cap} \\ E_{t}^{ccus} \end{bmatrix}$

(2)

where $\beta > 1$ captures nonlinear efficiency degradation at higher capture rates, and $c_{1}$ and $c_{2}$ map regeneration and intercooling effects into volumetric water demand.
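Because $E_{t}^{ccus}$ appears on both sides of Equation (2), the two rows can be evaluated sequentially: the first row gives the electrical demand, which the second row then uses for the cooling-water demand. The sketch below shows this under assumed, purely illustrative parameter values; the function name and all numbers are hypothetical and not taken from the study.

```python
def ccus_utility_demand(co2_captured, alpha, beta, c1, c2):
    """Evaluate the two rows of the coupling relation in Equation (2).

    Row 1: E_ccus    = alpha * CO_cap**(beta - 1) * CO_cap
    Row 2: W_cooling = c1 * CO_cap + c2 * E_ccus
    Parameter values are supplied by the caller; none come from the source.
    """
    if co2_captured < 0:
        raise ValueError("captured carbon mass must be non-negative")
    if beta <= 1:
        raise ValueError("the text requires beta > 1")
    e_ccus = alpha * co2_captured ** (beta - 1) * co2_captured
    w_cooling = c1 * co2_captured + c2 * e_ccus
    return e_ccus, w_cooling


# Illustrative parameters only (hypothetical units).
e_ccus, w_cooling = ccus_utility_demand(4.0, alpha=0.5, beta=2.0, c1=1.0, c2=2.0)
print(e_ccus, w_cooling)
```

With $\beta > 1$, doubling the capture rate more than doubles the electrical demand, which is the efficiency degradation described in the text.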

3.3. Physical Constraints and Nodal Balancing

To satisfy $D_{t}$, the framework enforces real-time nodal conservation across coupled electricity and water subsystems.

Electrical Energy Balance: The aggregate industrial load ($E_{base} + E_{t}^{ccus}$) and water reclamation energy ($E_{t}^{water}$) must be perfectly matched by dispatchable thermal plants ($E_{t}^{therm}$), pumped-storage hydropower ($E_{t}^{hydro}$), and variable renewable generation ($E_{t}^{ren}$). In this study, hydropower is treated separately because it is modelled as a controllable dispatchable storage-backed source with an internal reservoir state, whereas $E_{t}^{ren}$ represents exogenous non-dispatchable renewable availability. The renewable term specifically corresponds to offshore wind input profiles from the UK dataset used in this work [3]. This nodal balancing form follows standard energy-hub and smart water-energy hub formulations [5,8,9], as enforced by Equation (3):

$E_{t}^{hydro} + E_{t}^{therm} + E_{t}^{ren} - E_{t}^{water} = E_{base} + E_{t}^{ccus}$

(3)

Cooling-water provision introduces its own energy burden. The water-hub demand ($E_{t}^{water}$) is defined by a specific-energy matrix $\Lambda$ for fresh-water abstraction and wastewater reclamation in Equation (4), consistent with water-energy nexus formulations used in prior studies [9,10]:

$E_{t}^{water} = \Lambda^{T} W_{source,t} = \begin{bmatrix} \lambda_{fresh} & \lambda_{reclaim} \end{bmatrix} \begin{bmatrix} W_{t}^{fresh} \\ W_{t}^{reclaim} \end{bmatrix}$

(4)

where $\lambda_{fresh}$ and $\lambda_{reclaim}$ are the specific electricity intensities of fresh-water abstraction and wastewater reclamation, respectively. Wastewater reclamation has substantially higher specific electricity demand than fresh abstraction ($\lambda_{reclaim} \gg \lambda_{fresh}$), which increases net grid burden.
Hydrological Balance and Regulatory Bounds: Total regional cooling demands are satisfied via the source vector $W_{source,t}$ according to Equation (5). This balance follows standard source-allocation constraints used in coupled water-energy system models [9,10].

$W_{t}^{fresh} + W_{t}^{reclaim} \geq W_{t}^{cooling}$

(5)

To preserve hydro-ecological safety, daily fresh-water abstraction is bounded by Environment Agency (EA) limits ($\Omega_{EA\_Limit}$) [13] through Equation (6):

$\sum_{\tau = 1}^{24} W_{\tau}^{fresh} \leq \Omega_{EA\_Limit}$

(6)
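The balances in Equations (3) to (6) can be checked numerically at each hour. The following sketch expresses each relation as a small function; the function names and the example numbers are assumptions for illustration, and the units are left abstract.

```python
def water_energy(w_fresh, w_reclaim, lam_fresh, lam_reclaim):
    """Equation (4): electricity needed to supply the two water sources."""
    return lam_fresh * w_fresh + lam_reclaim * w_reclaim


def power_balance_residual(e_hydro, e_therm, e_ren, e_water, e_base, e_ccus):
    """Equation (3) as a residual: zero means supply exactly matches demand."""
    return (e_hydro + e_therm + e_ren - e_water) - (e_base + e_ccus)


def cooling_demand_met(w_fresh, w_reclaim, w_cooling):
    """Equation (5): the two sources must cover the cooling requirement."""
    return w_fresh + w_reclaim >= w_cooling


def within_daily_limit(w_fresh_hourly, ea_limit):
    """Equation (6): cumulative fresh abstraction over the day stays within the limit."""
    return sum(w_fresh_hourly) <= ea_limit


# Illustrative hour with hypothetical values.
e_water = water_energy(w_fresh=10.0, w_reclaim=5.0, lam_fresh=0.5, lam_reclaim=2.0)
residual = power_balance_residual(20.0, 30.0, 15.0, e_water, e_base=40.0, e_ccus=10.0)
print(e_water, residual)
```

In a simulation loop, a non-zero residual or a failed check would be the natural trigger for a constraint penalty.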

3.4. Proposed Mathematical Model: Continuous MDP Formalization

Because static solvers struggle with stochastic $E_{t}^{ren}$ and time-varying grid carbon intensity $C_{t}^{int}$ under nonlinear constraints, we formulate the control problem as a continuous Markov Decision Process $(S, A, P, R)$, following the standard MDP framework used in reinforcement learning [16,17].

State Space ($S$): The observation space in Equation (7) represents the continuous physical context of the cluster at hour $t$. Its structure is study-specific, but it is defined within the standard MDP state formalism [16,17]:

$s_{t} = [C_{t}^{int}, E_{t}^{ren}, (\Omega_{EA\_Limit} - \sum_{\tau = 1}^{t - 1} W_{\tau}^{fresh}), H_{t}^{vol}, t] \in \mathbb{R}^{5}$

(7)

where $C_{t}^{int}$ is the grid carbon intensity, $E_{t}^{ren}$ is renewable generation availability, $H_{t}^{vol}$ denotes the available state of charge of the pumped-storage hydro reservoir, and the third term tracks the remaining daily water-abstraction allowance.
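A minimal sketch of how the five-element state of Equation (7) might be assembled at hour $t$ is given below. The function name `build_state` and the plain-list representation are assumptions of this sketch, and the input values are illustrative.

```python
def build_state(hour, carbon_intensity, renewable, fresh_so_far, ea_limit, hydro_volume):
    """Assemble s_t following the element order of Equation (7).

    `fresh_so_far` holds the hourly fresh-water abstraction for the hours
    before `hour`, so the third element is the remaining daily allowance.
    """
    remaining_allowance = ea_limit - sum(fresh_so_far)
    return [carbon_intensity, renewable, remaining_allowance, hydro_volume, float(hour)]


# Hour 3 of the day, with 30 units of fresh water already abstracted.
state = build_state(3, carbon_intensity=200.0, renewable=50.0,
                    fresh_so_far=[10.0, 20.0], ea_limit=100.0, hydro_volume=0.5)
print(state)
```

In practice the elements would usually be rescaled to comparable ranges before being passed to a neural-network policy; the source does not describe its normalization, so none is applied here.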

Action Space ($A$): The DRL agent controls the continuous operational dispatch of the coupled infrastructure through Equation (8). This action parameterization is study-specific, but it follows the continuous-control formulation commonly used in modern actor-critic methods [14,18]:

$a_{t} = [CO_{t}^{cap\_target}, \delta_{t}^{hydro\_dispatch}, \psi_{t}^{reclaim\_ratio}] \in [0, 1]^{3}$

(8)

where $CO_{t}^{cap\_target}$ is the instantaneous carbon-capture setpoint, while $\delta$ and $\psi$ are proportional control ratios for meeting the grid-energy deficit and cooling-water deficit, respectively.
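One plausible way to turn the normalized action of Equation (8) into physical dispatch quantities is sketched below. The interpretation is an assumption of this sketch, not a statement of the authors' implementation: $\delta$ is read as the share of the grid-energy deficit met by hydropower (the remainder by thermal generation), $\psi$ as the share of the cooling-water deficit met by reclaimed water (the remainder by fresh abstraction), and `max_capture` is a hypothetical scaling constant.

```python
def clip01(x):
    """Clamp a value to the unit interval, matching the [0, 1]^3 action box."""
    return min(1.0, max(0.0, x))


def decode_action(action, max_capture, energy_deficit, cooling_deficit):
    """Map a = [capture target, delta, psi] to dispatch quantities.

    Assumed interpretation: delta splits the energy deficit between hydro and
    thermal; psi splits the cooling deficit between reclaimed and fresh water.
    """
    capture, delta, psi = (clip01(a) for a in action)
    return {
        "co2_captured": capture * max_capture,
        "e_hydro": delta * energy_deficit,
        "e_therm": (1.0 - delta) * energy_deficit,
        "w_reclaim": psi * cooling_deficit,
        "w_fresh": (1.0 - psi) * cooling_deficit,
    }


# The third entry exceeds 1 and is clipped, so all cooling water is reclaimed.
print(decode_action([0.5, 0.25, 1.5], max_capture=100.0,
                    energy_deficit=40.0, cooling_deficit=20.0))
```

Because the cooling deficit itself depends on the capture setpoint, a full environment would compute the capture-driven demands first and then apply the two ratios.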

Objective (Reward) Function ($R$): The reward in Equation (9) combines carbon, water, and operating-cost terms with hard-constraint penalties. This reward is custom-built for the present study, while following the standard reward-based optimization structure of reinforcement learning [14,15,16]:

$R_{t} = - (\omega_{c} C_{t}^{total} + \omega_{w} W_{t}^{total} + \phi_{cost} C_{t}^{op\_fuel} + \Phi (s_{t}, a_{t}))$

(9)

where $C_{t}^{total}$ is total carbon burden, $W_{t}^{total}$ is total water burden, $C_{t}^{op\_fuel}$ is operational fuel-cost burden, and $\Phi (s_{t}, a_{t})$ is a large penalty activated when actions violate physical constraints (e.g., hydro depletion or EA-limit breach), which steers exploration toward feasible policies. In this study, the operating-cost term is derived from energy-dispatch quantities and plant fuel use, consistent with energy-hub operating-cost formulations [8,9]. The carbon-related term is linked to grid carbon-intensity and CCUS-related emission trade-offs [1,2], while the water-related term reflects abstraction and reclamation burdens informed by water-energy nexus and CCUS water-footprint literature [4,10]. The penalty term is study-specific and encodes regulatory and feasibility violations using Environment Agency abstraction limits and system-operability bounds [13]. The weights $\omega_{c}$, $\omega_{w}$, and $\phi_{cost}$ are selected through calibration sweeps to balance emission reduction, water protection, and operational affordability.
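The structure of Equation (9) can be sketched as a weighted sum plus a constraint penalty. In the example below, the penalty magnitude `big_m`, the two violation checks, and all weights are illustrative assumptions; the calibrated values used in the study are not given in this section.

```python
def constraint_penalty(hydro_volume_after, fresh_used_today, ea_limit, big_m=1.0e6):
    """Phi(s_t, a_t): large penalty for each violated constraint.

    The two checks (reservoir depletion and EA-limit breach) follow the
    examples named in the text; `big_m` is a hypothetical magnitude.
    """
    penalty = 0.0
    if hydro_volume_after < 0.0:
        penalty += big_m
    if fresh_used_today > ea_limit:
        penalty += big_m
    return penalty


def reward(c_total, w_total, c_op_fuel, w_c, w_w, phi_cost, penalty):
    """Equation (9): negative weighted burden plus constraint penalty."""
    return -(w_c * c_total + w_w * w_total + phi_cost * c_op_fuel + penalty)


# Feasible step with illustrative burdens and weights.
phi = constraint_penalty(hydro_volume_after=0.2, fresh_used_today=50.0, ea_limit=100.0)
print(reward(10.0, 4.0, 2.0, w_c=1.0, w_w=0.5, phi_cost=2.0, penalty=phi))
```

Keeping the penalty as a separate function makes it straightforward to log violations independently of the weighted burden terms.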

3.5. Proposed DRL Algorithm

To solve this continuous-control problem, we evaluate two policy-gradient architectures. Figure 2 summarizes the proposed SAC-based DRL control loop used in this study, including environment interaction, replay-buffer learning, and entropy-regularized actor-critic updates [14,18].

Figure 2 illustrates the proposed closed-loop SAC learning mechanism employed in this work [14]. The control process starts from the state $s_{t}$ of the system, where actions $a_{t}$ are generated by the actor and executed in the constrained WEC environment to produce reward and transition outputs $(r_{t}, s_{t + 1})$. These outcomes are saved in the output block and stored in replay memory for off-policy training [14,18]. Value estimates are learned by the twin critics from replayed transitions, while entropy-regularized updates improve the actor and target networks stabilize critic learning [14]. This architecture supports stable policy improvement under coupled WEC constraints while preserving exploration and operational feasibility [11].
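The replay-memory step of this loop can be illustrated with a small fixed-capacity buffer that stores transitions and returns uniformly sampled mini-batches. This is a generic sketch written for illustration; reinforcement-learning libraries ship their own buffer implementations, and the class and method names here are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity, seed=None):
        # A bounded deque drops the oldest transition once capacity is reached.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Uniformly sample distinct transitions for one off-policy update."""
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.add([float(step)], [0.5], -float(step), [float(step + 1)], False)
print(len(buffer), buffer.sample(2))
```

Sampling from stored experience, rather than only from the latest trajectory, is what allows the off-policy critics to reuse each environment interaction many times.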

Proximal Policy Optimization (PPO) Baseline: As a comparative baseline, we implement PPO [15], an on-policy algorithm that relies on a clipped surrogate objective function in Equation (10) to prevent destructively large policy updates:

$L^{CLIP} (\theta) = \hat{\mathbb{E}}_{t} [\min (r_{t} (\theta) \hat{A}_{t}, \mathrm{clip} (r_{t} (\theta), 1 - \epsilon, 1 + \epsilon) \hat{A}_{t})]$

(10)

where $\theta$ are policy parameters, $r_{t} (\theta)$ is the policy probability ratio, $\hat{A}_{t}$ is the estimated advantage, and $\epsilon$ is the clipping threshold. Although PPO is stable in many benchmark environments, its clipped on-policy updates can be less robust under steep nonlinear penalties from hard WEC constraints.
Soft Actor-Critic (SAC) Framework: Our primary method is Soft Actor-Critic (SAC) [14], an off-policy maximum-entropy formulation given in Equation (11):

$J (\pi) = \sum_{t = 0}^{T} \mathbb{E}_{(s_{t}, a_{t}) \sim \rho_{\pi}} [r (s_{t}, a_{t}) + \alpha \mathcal{H} (\pi (\cdot | s_{t}))]$

(11)

where $\rho_{\pi}$ is the state-action visitation distribution under policy $\pi$, and $\alpha$ is the temperature parameter that weights entropy regularization. By maximizing policy entropy $\mathcal{H}$, SAC promotes robust exploration, improves adaptation to stochastic operating conditions, and reduces premature convergence to poor local optima.
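Both objectives can be evaluated on sampled data with a few lines of code. The sketch below computes the clipped surrogate of Equation (10) for a batch of probability ratios and advantages, and a single-trajectory estimate of Equation (11) in which the entropy term is approximated by $-\log \pi(a_{t}|s_{t})$ at the sampled action. The function names and input values are illustrative assumptions; in a real training run these quantities would be produced by the policy network.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Batch average of min(r * A, clip(r, 1 - eps, 1 + eps) * A), Equation (10)."""
    terms = []
    for r, adv in zip(ratios, advantages):
        clipped_ratio = min(max(r, 1.0 - eps), 1.0 + eps)
        terms.append(min(r * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)


def soft_return(rewards, log_probs, alpha):
    """Single-trajectory estimate of Equation (11).

    The entropy H(pi(.|s_t)) is estimated by -log pi(a_t|s_t), so each step
    contributes r_t - alpha * log pi(a_t|s_t).
    """
    return sum(r - alpha * lp for r, lp in zip(rewards, log_probs))


# A ratio of 1.5 with positive advantage is clipped to 1.2; a ratio of 0.5
# with negative advantage is clipped to 0.8, giving the more pessimistic term.
print(ppo_clip_objective([1.5, 0.5], [1.0, -1.0], eps=0.2))
print(soft_return([1.0, 2.0], [-0.5, -1.0], alpha=0.5))
```

The clipping removes the incentive to move the ratio outside $[1 - \epsilon, 1 + \epsilon]$, whereas the entropy bonus rewards the policy for keeping its action distribution broad.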

4. Results and Scenario Analysis

4.1. Dataset & Modelling Methodology

For DRL training and validation, several types of real-world data were integrated into the proposed WEC simulation model [3,13]. First, the energy-system dataset contains the hourly grid carbon-intensity vector ($C_{t}^{int}$), offshore wind availability, and baseline thermal generation levels, which are obtained from dispatch records of the National Energy System Operator (NESO) and the UK Energy Research Centre (UKERC) and accessed via the cited UK dataset source [3,19]. In the proposed model, offshore wind serves as the renewable input term $E_{t}^{ren}$. Hydropower is treated separately from $E_{t}^{ren}$ because it is represented as a dispatchable pumped-storage resource with an explicit internal state ($H_{t}^{vol}$) rather than as a purely exogenous renewable time series. Second, the water-regulation dataset includes daily freshwater abstraction limits ($\Omega_{EA\_Limit}$), derived from Environment Agency planning thresholds [13]. Third, scenario-definition inputs were constructed from these base datasets to represent drought conditions, reduced renewable availability, and high-carbon-price stress cases. These time-series inputs are used directly in the state construction, constraint enforcement, and reward evaluation of the MDP environment. During training, the above variables are repeatedly sampled with Gaussian perturbations, yielding more than 100,000 stochastic episodes and encouraging robust policies rather than static-pattern memorization.
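The perturbation step can be sketched as follows. The source states only that Gaussian perturbations are applied; the multiplicative form, the relative standard deviation `rel_sigma`, the clipping at zero, and the per-episode seeding are assumptions of this sketch, and the base profiles are synthetic placeholders.

```python
import random


def perturb_profile(profile, rel_sigma, rng, lower=0.0):
    """Apply multiplicative Gaussian noise to each hourly value (assumed form)."""
    return [max(lower, x * (1.0 + rng.gauss(0.0, rel_sigma))) for x in profile]


def sample_episode_inputs(base_carbon, base_wind, seed, rel_sigma=0.1):
    """Draw one stochastic 24-h episode from the base profiles.

    A dedicated generator per seed keeps every episode reproducible.
    """
    rng = random.Random(seed)
    return {
        "carbon_intensity": perturb_profile(base_carbon, rel_sigma, rng),
        "wind": perturb_profile(base_wind, rel_sigma, rng),
    }


# Synthetic flat base profiles, used here only to exercise the functions.
base_carbon = [200.0] * 24
base_wind = [50.0] * 24
episode = sample_episode_inputs(base_carbon, base_wind, seed=1)
print(episode["carbon_intensity"][:3])
```

Resampling the inputs for every episode exposes the agent to a distribution of days rather than one fixed trajectory, which is the stated purpose of the perturbations.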

The simulation results are obtained by solving the continuous-MDP problem developed within the integrated framework described in Section 3, where each episode covers a 24-h operating horizon and the decision is updated every hour. At each time step $t$, the controller observes grid carbon intensity, renewable energy availability, remaining abstraction capacity, the hydro state, and the time index, and selects the capture, hydro-dispatch, and reclamation-ratio actions. The SAC outcomes are compared with PPO and a rule-based baseline across identical scenario definitions and constraint sets, ensuring methodological consistency. In addition to cost performance, policy quality is assessed using robustness indicators, such as oscillation behavior and variance bands, to evaluate operational reliability under stochastic conditions.

The continuous-MDP WEC formalism was implemented in a custom environment using Python 3.14 and Gymnasium [17]. The governing equations were not solved using a separate symbolic or algebraic equation solver. Instead, they were evaluated numerically at each hourly step inside the custom simulation environment, where the code updates the state $s_{t}$, enforces matrix-based resource constraints ($\Phi$), and computes the WCMPI-aware reward. The optimization and control policy were then learned using the SAC implementation in PyTorch 2.11.0 [20] and Stable-Baselines3 [18]. For comparative analysis, PPO was handled through the same reinforcement-learning toolchain. Training spans a broad operating-condition matrix and millions of environment interactions to identify Pareto-improving continuous dispatch policies. This pipeline links analytical WEC modeling with data-driven control to generate executable policies from high-dimensional grid and water signals.
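To make the step-wise evaluation concrete, the sketch below shows a deliberately reduced toy environment in plain Python. It mirrors the return shapes of the Gymnasium `reset()` and `step()` methods without importing the library, omits the hydro reservoir, and uses placeholder dynamics and reward terms, so it should be read as an illustration of the control flow rather than as the environment used in the study. The class name `ToyWECEnv` and all parameters are hypothetical.

```python
class ToyWECEnv:
    """Reduced hourly environment: state update, constraint check, reward."""

    HORIZON = 24

    def __init__(self, carbon_intensity, renewable, ea_limit, penalty=1000.0):
        assert len(carbon_intensity) == len(renewable) == self.HORIZON
        self.carbon_intensity = carbon_intensity
        self.renewable = renewable
        self.ea_limit = ea_limit
        self.penalty = penalty
        self.t = 0
        self.fresh_used = 0.0

    def _observe(self):
        i = min(self.t, self.HORIZON - 1)  # keep the index valid after the last step
        return [self.carbon_intensity[i], self.renewable[i],
                self.ea_limit - self.fresh_used, float(self.t)]

    def reset(self, *, seed=None, options=None):
        self.t = 0
        self.fresh_used = 0.0
        return self._observe(), {}

    def step(self, action):
        capture, reclaim_ratio = (min(1.0, max(0.0, a)) for a in action)
        cooling = capture  # placeholder: one unit of water per unit of capture
        fresh = (1.0 - reclaim_ratio) * cooling
        self.fresh_used += fresh
        violated = self.fresh_used > self.ea_limit
        # Placeholder reward: uncaptured carbon, fresh-water use, violation penalty.
        reward = -((1.0 - capture) + fresh + (self.penalty if violated else 0.0))
        self.t += 1
        terminated = self.t >= self.HORIZON
        return self._observe(), reward, terminated, False, {"violated": violated}


env = ToyWECEnv([200.0] * 24, [50.0] * 24, ea_limit=10.0)
obs, info = env.reset()
terminated = False
total = 0.0
while not terminated:
    obs, r, terminated, truncated, info = env.step([1.0, 1.0])
    total += r
print(total)
```

A class with this interface, once expressed as a Gymnasium environment with declared observation and action spaces, is the form that off-the-shelf SAC and PPO implementations expect.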

4.2. Algorithmic Convergence and Learning Robustness

The SAC agent was benchmarked against PPO and a static rule-based heuristic baseline. Figure 3 shows that SAC reaches a stable, low system cost (in thousand GBP) within approximately 80,000 training steps and maintains smoother post-convergence behavior than PPO. This stability is important because the environment exhibits sharp penalty transitions imposed by abstraction limits and reservoir constraints; under these conditions, oscillatory policies can repeatedly enter high-penalty regions. SAC also exhibits more consistent learning dynamics, suggesting that it is better suited to operational control in the proposed constrained WEC system.

4.3. Dynamic Trade-Off Dispatch Verification

Figure 4 and Figure 5 compare the 24-h dispatch profiles and show how the SAC agent handles the WEC tri-lemma. Across the three scenarios in Figure 4 and Figure 5, the agent avoids rigid baseload behavior and adapts to changing boundary conditions. Figure 4 shows that SAC rebalances thermal and hydropower procurement in response to both instantaneous carbon intensity and downstream water implications. Figure 5 further shows coordinated source switching between fresh abstraction and reclaimed water, demonstrating that energy and water decisions are co-optimized rather than handled sequentially.

Under stringent environmental restrictions (drought scenario), the agent substantially decouples operations from fresh-river abstraction. It discovers a strategy that schedules energy-intensive municipal wastewater reclamation during periods of high hydropower availability, minimizing WCMPI (Figure 6) despite high aggregate throughput. Operationally, this means the policy does not simply minimize carbon at each instant; instead, it redistributes mitigation effort over time to avoid simultaneous stress on the grid and hydrological system.

4.4. Scenario Sensitivity

Figure 7 and Figure 8 present a comparative analysis of policy-driven economic behaviour, in which $1 \times 3$ scenario-based comparisons (i.e., base, drought penalty, and low renewable) of cost categories and grid procurement across controllers are performed. The cost decompositions (thousand GBP on a log scale) show that the gain of the SAC approach is not limited to a single category. Instead, the improvements appear jointly in water-related penalties, carbon-related charges, and total operational expenditures. These cross-category improvements indicate that the learned policy does not merely shift burden from one subsystem to another. Figure 8 further supports this interpretation by indicating cleaner procurement under the SAC model, with increased use of low-carbon hydropower during strategically favorable periods. When climatic constraints tighten, the rule-based baseline incurs severe violation penalties, whereas SAC dynamically adjusts CCUS targets to preserve operating margin.

4.5. Monte Carlo-Based Stochastic Evaluation

Environmental optimization models must perform robustly in the presence of substantial physical noise. Figure 9 summarizes the cost-related performance across the different algorithms, while Figure 10 shows the aggregated cost profiles over 24 h under dynamic weather conditions. The uncertainty bounds are computed using a Monte Carlo method across 100 independently seeded episodes per scenario. Beyond the lower average cost, the important finding in these results is distributional robustness: the proposed SAC maintains tight standard-deviation bands and reduces tail-risk behavior associated with constraint violations. This matters for practical deployment because industrial operators must manage worst-case compliance outcomes rather than only expected-value performance.
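A seeded Monte Carlo evaluation of this kind can be sketched as follows. The function `simulate_episode_cost` is a synthetic stand-in for one 24-h policy rollout and does not reproduce the study's environment; the choice of the 95th percentile as a tail indicator is likewise an assumption of this sketch.

```python
import random
import statistics


def simulate_episode_cost(seed):
    """Hypothetical stand-in for one 24-h rollout; returns a synthetic cost."""
    rng = random.Random(seed)
    return sum(max(0.0, rng.gauss(100.0, 10.0)) for _ in range(24))


def monte_carlo_summary(cost_fn, n_episodes=100, tail_pct=95):
    """Run independently seeded episodes and summarize the cost distribution."""
    costs = sorted(cost_fn(seed) for seed in range(n_episodes))
    tail_index = min(n_episodes - 1, (tail_pct * n_episodes) // 100)
    return {
        "mean": statistics.mean(costs),
        "std": statistics.stdev(costs),
        "tail": costs[tail_index],   # empirical upper-tail cost
        "worst": costs[-1],
    }


summary = monte_carlo_summary(simulate_episode_cost)
print(summary)
```

Reporting the standard deviation and an upper-tail value alongside the mean makes it possible to compare controllers on worst-case behavior as well as on average cost.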

4.6. Quantitative Benchmarking

Table 2 summarizes the comparative results obtained from the proposed and benchmark algorithms. The choice of comparison models is deliberate: SAC is adopted as the main method because the problem involves continuous control, stochastic exogenous signals, and tightly coupled nonlinear constraints, whereas PPO is included as a widely used policy-gradient benchmark and the rule-based controller is retained as an engineering baseline for interpretability. To ensure consistent comparison, the reported total nexus cost aggregates three cost-component categories used throughout the study: (i) operational expenditure associated with thermal generation and plant energy use, following standard energy-hub cost accounting [8,9]; (ii) regulatory and feasibility fines triggered by abstraction or dispatch violations; and (iii) carbon- and water-related valuation terms derived from the carbon-intensity and water-burden formulations defined in the methodology [1,4,10]. Operationally, the total nexus cost is obtained by evaluating these three components at each simulation time step and then summing them over the full dispatch horizon for each scenario, so the reported values represent cumulative system-level cost rather than a single-hour snapshot. Notably, although the PPO agent appears to achieve low direct operating cost and carbon output, this behavior is identified here as a characteristic “policy collapse” under extreme constraints. Because PPO fails to identify a technically feasible policy during learning, it maintains near-zero thermal generation and thereby avoids the full trade-off among fuel consumption, water sourcing, and CCUS activation. This “stay-off” strategy leads to severe energy-deficit fines (approximately £5.0–6.7 million across the scenarios in Table 2). These results suggest that the on-policy PPO baseline is insufficient for handling the high-dimensional, nonlinear penalties in the WEC nexus framework.
By contrast, the proposed SAC model identifies an industrial-symbiosis operating point that can meet demand while reducing total nexus cost in the base case by 70% compared with PPO and 83% compared with the rule-based baseline.
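The percentage reductions quoted above can be reproduced from the Total Nexus Cost column of Table 2. The short sketch below transcribes those values and computes the relative reduction of SAC against each benchmark; the dictionary layout and the function name `reduction_pct` are illustrative choices, not part of the study's code.

```python
# Total nexus cost in GBP, transcribed from Table 2.
TOTAL_NEXUS_COST = {
    "Base Case": {"rule": 8_894_834, "ppo": 5_002_234, "sac": 1_487_753},
    "Drought Scenario": {"rule": 11_282_834, "ppo": 4_976_591, "sac": 3_371_307},
    "Low Renewables": {"rule": 8_894_834, "ppo": 6_654_320, "sac": 2_374_096},
    "Hyper Carbon Tax": {"rule": 8_894_834, "ppo": 5_002_234, "sac": 1_487_753},
}


def reduction_pct(reference: float, candidate: float) -> float:
    """Relative cost reduction of `candidate` against `reference`, in percent."""
    return 100.0 * (reference - candidate) / reference


# Reduction achieved by SAC against each benchmark, per scenario.
reductions = {
    scenario: {
        "vs_ppo": reduction_pct(costs["ppo"], costs["sac"]),
        "vs_rule": reduction_pct(costs["rule"], costs["sac"]),
    }
    for scenario, costs in TOTAL_NEXUS_COST.items()
}

for scenario, r in reductions.items():
    print(f"{scenario}: {r['vs_ppo']:.1f}% vs PPO, {r['vs_rule']:.1f}% vs rule-based")
```

For the base case this yields roughly 70% against PPO and 83% against the rule-based baseline, matching the figures in the text; the drought and low-renewables scenarios give smaller margins.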

4.7. Implications of Modelling Methodology

The modelling methodology directly affects system outcomes, carbon-emission reduction, and the other performance metrics reported in this study. The proposed framework enforces tightly coupled constraints across the energy and water domains at each hourly step. As a result, DRL policy performance reflects real operational trade-offs rather than unconstrained economic dispatch alone. The use of stochastic Monte Carlo evaluation also means that the reported outcomes represent policy behavior across a distribution of scenarios rather than along a single deterministic trajectory. Consequently, the advantage of SAC should be interpreted as both a reliability gain and an efficiency gain under uncertainty. The simulation results further reveal a policy blind spot in the UK’s net-zero pathway: the water-carbon mitigation penalty. Decarbonizing heavy industry therefore requires a transition from isolated carbon management toward integrated, AI-driven nexus planning. The findings also show that CCUS deployment may become ecologically unsustainable in water-stressed regions when it relies heavily on freshwater abstraction. In contrast, the SAC policy suggests that industrial symbiosis, particularly coupling cement plants with municipal wastewater treatment, offers a practical and resilient pathway. The developed framework further indicates that pumped storage and hydropower should be treated not only as frequency-balancing assets but also as nexus-balancing resources that support energy-intensive water reclamation. From a real-time implementation perspective, the methodology is most suitable as an operational decision-support layer because, once trained, the DRL policy can generate near-instant control decisions from live plant and market signals. 
However, industrial deployment in cement plants would still require plant-specific calibration, integration with SCADA or supervisory control systems, sensor-quality assurance, hard safety interlocks, and phased pilot validation before autonomous closed-loop use.
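One way to picture the hard safety interlocks mentioned above is as a thin filter between the trained policy and the plant, which clips any proposed action to licensed limits before it is dispatched. The sketch below is a hypothetical illustration only: the action keys (`fresh_m3`, `reclaimed_m3`, `thermal_mw`), the `HardLimits` class, the numerical limits, and the rule of moving curtailed freshwater demand to reclaimed water are all assumptions and do not describe the study's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardLimits:
    """Illustrative plant limits (assumed values, not taken from the study)."""
    daily_abstraction_m3: float  # licensed daily freshwater abstraction cap
    max_thermal_mw: float        # upper bound on thermal generation


def apply_interlocks(action: dict, abstracted_today_m3: float,
                     limits: HardLimits) -> dict:
    """Clip a proposed policy action to hard limits before dispatch."""
    # Remaining licence headroom for the current day.
    headroom = max(0.0, limits.daily_abstraction_m3 - abstracted_today_m3)
    requested_fresh = max(0.0, action["fresh_m3"])
    fresh = min(requested_fresh, headroom)
    # Assumption: curtailed freshwater demand is met by reclaimed water instead.
    curtailed = requested_fresh - fresh
    return {
        "fresh_m3": fresh,
        "reclaimed_m3": max(0.0, action["reclaimed_m3"]) + curtailed,
        "thermal_mw": min(max(0.0, action["thermal_mw"]), limits.max_thermal_mw),
    }


limits = HardLimits(daily_abstraction_m3=500.0, max_thermal_mw=40.0)
proposed = {"fresh_m3": 50.0, "reclaimed_m3": 10.0, "thermal_mw": 55.0}
safe = apply_interlocks(proposed, abstracted_today_m3=480.0, limits=limits)
print(safe)
```

Keeping the filter outside the learned policy means the limits hold regardless of what the agent proposes, which is in keeping with phased pilot validation before any autonomous closed-loop use.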

4.8. Discussion in Relation to Existing Literature

The findings of this study align with, but also extend, several strands of the existing WEC-nexus and industrial-decarbonization literature. First, prior energy-hub and smart water–energy hub studies have shown that integrated optimization can reduce system cost and improve cross-sector coordination under coupled infrastructure constraints [8,9]. The present results are consistent with that general insight, but they go further by showing that, under CCUS-induced nonlinear water and carbon burdens, static planning alone is insufficient to reveal the full operational trade-off space. Second, the importance of uncertainty highlighted here is consistent with stochastic water–energy studies such as Ji et al. [10], which demonstrated that resource decisions change materially when uncertainty is represented explicitly. In the present case, this point is reinforced by the stochastic DRL setting, where the policy’s quality depends not only on average performance but also on robustness across disturbed operating trajectories.

The study also connects operational control with the environmental-burden literature. Rosa et al. [4] highlighted that carbon-capture systems can increase water demand, while Bakken et al. [5] emphasized that even low-carbon electricity sources such as hydropower may carry non-negligible water implications. Our findings support both observations and reveal how they interact in an industrial setting, where aggressive carbon-mitigation strategies, without coordinated management, can shift pressure from atmospheric emissions to regional hydrology. In that sense, the proposed WCMPI and the learned SAC policy contribute beyond prior footprint studies by converting those trade-offs into a real-time operational decision problem. Finally, compared with recent DRL-based hub-scheduling studies such as that of Zhang et al. [11], the present work provides a more application-specific nexus formulation for heavy-industry decarbonization. Rather than focusing only on generic distributed energy-resource scheduling, the model explicitly integrates CCUS electricity and cooling-water loads, wastewater reuse, abstraction limits, and regional carbon-intensity signals. This sharper alignment with cement-sector decarbonization is the paper’s main contribution relative to the existing literature.

4.9. Limitations

Although this work contributes to managing carbon emissions by developing a deep-learning-enabled WEC model, it has several limitations. First, the study is designed specifically for the UK industrial and cement sector, so the simulation results may not transfer directly to other industries with different power mixes, fuel prices, and hydrological regimes. Second, the CCUS representation focuses on plant-level capture and compression loads; it does not model the transport and long-term storage supply chain in operational detail. Third, the renewable inputs are simplified to offshore wind time series, while broader multi-renewable portfolios and network congestion effects are outside the scope of this study. In addition, various coefficients in the coupled WEC model have been derived from engineering assumptions and calibration, which introduces structural uncertainty despite the stochastic training procedure. Finally, although the DRL framework performs well in simulation, real-world deployment would still require plant validation, cyber-physical integration, and robustness checks under operational disturbances that are not fully represented in the current environment.

5. Conclusions

This work develops a novel WEC model that formulates the industrial tri-lemma as an MDP and uses a DRL algorithm to learn policies for net-zero operation in the cement manufacturing industry. Because net-zero industrial transitions are tightly coupled with regional power and water security, this work adopts a continuous-control DRL methodology based on Soft Actor-Critic to solve the WEC tri-lemma under dynamic and stochastic conditions. The mathematical model incorporates hydrological constraints, stochastic exogenous inputs, and policy limits. This formulation enables the coupled WEC system to optimize infrastructure interactions and power-water demand within regulatory limits. The developed system is implemented on publicly available datasets, and the results reveal that the learned policies fulfill the operational requirements while adapting CCUS targets dynamically, achieving a 13.83% carbon reduction and a total nexus cost reduction of about 70% relative to PPO and 83% relative to the rule-based baseline. The results also demonstrate the water-saving potential of the proposed strategy, which shifts from freshwater abstraction to reclaimed municipal wastewater. The framework also avoids the 2.15–5.17% increase in hydrological stress observed under unmanaged carbon-mitigation pathways. In terms of application, the representative UK industrial cluster shows that aggressive carbon capture can shift ecological pressure toward aquatic systems if it is left unmanaged. However, the proposed AI-enabled model indicates that replacing potable or fresh water with reclaimed wastewater, while coordinating low-carbon operations, can maintain a sustainable and Pareto-efficient industrial-symbiosis system.
In conclusion, the methodology offers a practical roadmap for policymakers seeking to meet net-zero obligations without triggering regional hydro-ecological failure. The framework is therefore positioned as a decision-support tool for industrial operators and policymakers navigating coupled decarbonization and water-security obligations. In the future, the authors would like to extend this framework to multi-site industrial-symbiosis networks with full-chain CCUS representations, to incorporate broader renewable portfolios, hydrogen energy sources, and transmission constraints, and to validate the learned control policies through plant-level pilot deployment in collaboration with industrial and regulatory partners.

Author Contributions

Conceptualization, M.H., M.B.R. and I.U.K.; methodology, M.H. and M.B.R.; software, M.H. and M.B.R.; validation, M.B.R., I.U.K., K.A.A.G. and M.H.; formal analysis, M.H. and M.B.R.; investigation, M.H., M.B.R. and K.A.A.G.; resources, M.H. and M.B.R.; data curation, M.H.; writing—original draft preparation, M.H. and M.B.R.; writing—review and editing, M.B.R., I.U.K. and K.A.A.G.; visualization, M.B.R., I.U.K. and K.A.A.G.; supervision, M.B.R. and K.A.A.G.; project administration, M.B.R. and K.A.A.G.; funding acquisition, M.B.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original data presented in the study are openly available from NESO at https://api.neso.energy/dataset/88313ae5-94e4-4ddc-a790-593554d8c6b9/resource/f93d1835-75bc-43e5-84ad-12472b180a98/download/df_fuel_ckan.csv (accessed on 1 April 2026).

Conflicts of Interest

Author M. Hassan was employed by the company “Bestway Pvt. Ltd.”. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Andrew, R.M. Global CO₂ emissions from cement production. Earth Syst. Sci. Data 2018, 10, 195–217.
2. Pereira, E.G.; Fossa, A.J.; Muinzer, T.L. (Eds.) Carbon Capture Utilization and Storage: Law, Policy and Standardization Perspectives; Palgrave Macmillan: London, UK, 2025.
3. National Energy System Operator (NESO), Data Portal, Open Data from Great Britain’s System Operator. 2026. Available online: https://www.neso.energy/data-portal (accessed on 11 January 2026).
4. Rosa, L.; Sanchez, D.L.; Realmonte, G.; Baldocchi, D.; D’Odorico, P. The water footprint of carbon capture and storage technologies. Renew. Sustain. Energy Rev. 2021, 138, 110511.
5. Bakken, T.H.; Killingtveit, A.; Alfredsen, K. The water footprint of hydropower production-state of the art and methodological challenges. Glob. Chall. 2017, 1, 1600018.
6. International Energy Agency. Cement. IEA Tracking Report 2024. Available online: https://www.iea.org/energy-system/industry/cement (accessed on 6 February 2026).
7. Li, N.; Unluer, C. Towards a net zero built environment: Decarbonisation of the UK cement industry. npj Mater. Sustain. 2025, 3, 10.
8. Salimi, M.; Ghasemi, H.; Adelpour, M.; Vaez-Zadeh, S. Optimal planning of energy hubs in interconnected energy systems: A case study for natural gas and electricity. IET Gener. Transm. Distrib. 2015, 9, 695–707.
9. Roustaei, M.; Niknam, T.; Salari, S.; Chabok, H.; Sheikh, M.; Kavousi-Fard, A.; Aghaei, J. A scenario-based approach for the design of Smart Energy and Water Hub. Energy 2020, 195, 116931.
10. Ji, L.; Wu, T.; Xie, Y.; Huang, G.; Sun, L. A novel two-stage fuzzy stochastic model for water supply management from a water-energy nexus perspective. J. Clean. Prod. 2020, 277, 123386.
11. Zhang, X.; Wang, Q.; Yu, J.; Sun, Q.; Hu, H.; Liu, X. A multi-agent deep-reinforcement-learning-based strategy for safe distributed energy resource scheduling in energy hubs. Electronics 2023, 12, 4763.
12. Davarpanah, A. Comparative evaluation of carbon capture, utilization, and storage (CCUS) technologies using multi-criteria decision-making approaches. ACS Sustain. Chem. Eng. 2024, 12, 9498–9510.
13. Environment Agency. Water Resources Planning Guideline; UK Government: London, UK, 2026. Available online: https://www.gov.uk/government/publications/water-resources-planning-guideline/water-resources-planning-guideline (accessed on 7 January 2026).
14. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2018; pp. 1861–1870.
15. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
16. Puterman, M.L. Markov Decision Processes: Discrete Stochastic Dynamic Programming; John Wiley & Sons: Hoboken, NJ, USA, 2014.
17. Towers, M.; Kwiatkowski, A.; Terry, J.; Balis, J.U.; De Cola, G.; Deleu, T.; Goulão, M.; Kallinteris, A.; Krimmel, M.; KG, A.; et al. Gymnasium: A standard interface for reinforcement learning environments. arXiv 2024, arXiv:2407.17032.
18. Raffin, A.; Hill, A.; Gleave, A.; Kanervisto, A.; Ernestus, M.; Dormann, N. Stable-Baselines3: Reliable reinforcement learning implementations. J. Mach. Learn. Res. 2021, 22, 1–8.
19. UK Energy Research Centre. 2026. Available online: https://ukerc.rl.ac.uk/cgi-bin/dataDiscover.pl (accessed on 11 January 2026).
20. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026–8037.

Figure 1. Detailed flow-based representation of the integrated WEC optimization framework: exogenous data ingestion, MDP formalization, coupled constrained environment dynamics, SAC/PPO learning pathways, operational dispatch outputs, and closed-loop performance evaluation.

Figure 2. Proposed DRL method (SAC): closed-loop interaction with the constrained WEC environment and off-policy actor-critic learning updates.

Figure 3. Training convergence of SAC, PPO, and rule-based controllers in episodic system-cost minimization.

Figure 4. Dynamic power-procurement stack showing thermal-versus-hydro allocation across scenarios.

Figure 5. Dynamic water-sourcing stack demonstrating algorithmic throttling of fresh-water abstraction.

Figure 6. Carbon-abatement and regional-WCMPI trajectories showing avoidance of high-penalty operating zones.

Figure 7. Cost comparison and breakdown for the baseline, PPO, and SAC controllers across different scenarios.

Figure 8. Comparison of the total power generation mix across sources, showing the optimized reliance of SAC on low-carbon hydropower.

Figure 9. Comparison of total system cost over the three different scenarios.

Figure 10. 24-h stochastic cumulative-cost profiles with standard-deviation bounds.

Table 1. Nomenclature of key variables and symbols used in the manuscript.

Variable	Definition	Variable	Definition
$WCMPI(t)$	Water-Carbon Mitigation Penalty Index at time t	$WF_{sys}(t)$	Total system water footprint at time t
$CF_{sys}(t)$	Total system carbon footprint at time t	$CO_{t}^{cap}$	Carbon captured by the CCUS process at time t
$E_{t}^{ccus}$	CCUS electricity demand at time t	$W_{t}^{cooling}$	Cooling-water demand at time t
$D_{t}$	Parasitic utility demand vector at time t	$M_{cpl}$	Dynamic coupling matrix linking capture to utility demand
$E_{base}$	Baseline industrial electricity demand	$E_{t}^{water}$	Electricity demand of water sourcing/treatment at time t
$E_{t}^{hydro}$	Hydropower electricity supply at time t	$E_{t}^{therm}$	Thermal generation supply at time t
$E_{t}^{ren}$	Renewable electricity supply at time t	$W_{source,t}$	Water-source vector (fresh and reclaimed) at time t
$W_{t}^{fresh}$	Freshwater abstraction at time t	$W_{t}^{reclaim}$	Reclaimed wastewater used at time t
$\Omega_{EA\_Limit}$	Daily Environment Agency abstraction limit	$C_{t}^{int}$	Grid carbon intensity at time t
$H_{t}^{vol}$	Available hydropower reservoir volume/state of charge	$s_{t}$	MDP state vector at time t
$a_{t}$	MDP action vector at time t	$R_{t}$	Reward value at time t
$\Phi(s_{t}, a_{t})$	Constraint-violation penalty function	$\omega_{c}, \omega_{w}, \phi_{cost}$	Reward weights for carbon, water, and operational cost terms

Table 2. Comprehensive Multi-Scenario Performance Analytics for WEC Nexus Optimization.

Environmental Scenario	Algorithmic Dispatch Strategy	Operational Cost (£)	Carbon (kg)	Water ($m^{3}$)	Fines (£)	Total Nexus Cost (£)
Base Case	Rule-Based Baseline	361,800	2,281,000	37,532	8,533,034	8,894,834
	PPO Agent	2887	85,000	38,325	4,999,347	5,002,234
	SAC Agent (Proposed)	378,779	1,864,668	36,722	1,108,973	1,487,753
Drought Scenario (500 $m^{3}$)	Rule-Based Baseline	361,800	2,281,000	13,532	10,921,034	11,282,834
	PPO Agent	2887	85,000	14,433	4,973,705	4,976,591
	SAC Agent (Proposed)	378,779	1,864,668	13,997	2,992,528	3,371,307
Low Renewables (−70%)	Rule-Based Baseline	361,800	2,281,000	37,532	8,533,034	8,894,834
	PPO Agent	2776	85,000	37,762	6,651,544	6,654,320
	SAC Agent (Proposed)	405,334	1,965,458	37,397	1,968,762	2,374,096
Hyper Carbon Tax (£150/t)	Rule-Based Baseline	361,800	2,281,000	37,532	8,533,034	8,894,834
	PPO Agent	2887	85,000	38,325	4,999,347	5,002,234
	SAC Agent (Proposed)	378,779	1,864,668	36,722	1,108,973	1,487,753
