1. Introduction
Because global warming is being driven by massive fossil fuel consumption, the transition to a net-zero economy requires decarbonization across all sectors, including transportation, power generation, industrial processes, and particularly cement production [1]. Cement production is expected to remain central to the short- and long-term development of future smart cities. Renewable energy generation and energy storage technologies (e.g., Li-ion batteries, fuel cells) have advanced rapidly. In contrast, decarbonizing heavy industries, such as cement manufacturing, poses a major challenge [2], because the cement industry alone is responsible for approximately 7–8% of global CO2 emissions. Achieving the UK government’s net-zero targets [3] therefore relies primarily on alternative fuels [2], the optimal utilization of available energy resources, and the deployment of Carbon Capture, Utilization and Storage (CCUS). However, CCUS integration can introduce severe secondary impacts within the Water-Energy-Carbon (WEC) nexus.
This problem can be viewed as a tri-lemma of industrial decarbonization, because mitigation technologies create competing resource demands. Post-combustion, amine-based CCUS typically requires low-grade thermal energy for solvent regeneration, a large amount of energy for compression, and large volumes of cooling water to manage the exothermic absorption reactions [
4]. In addition, installing CCUS can eventually increase the energy demand of a standard cement plant by more than 50% and raw-water consumption by up to 90% [
2]. Consequently, as stack-level carbon emissions decline, the wider system can experience greater pressure on the electricity supply and water resources. This shift is due to the additional power needed for capture and compression, which must be supplied by the regional grid, while the added cooling demand increases water abstraction or treatment requirements. In practice, part of the carbon burden is therefore transferred upstream to the supporting power system, and part of the environmental burden is transferred laterally to regional water systems [
4,
5]. Meeting these additional electrical loads may require dispatchable generation, including hydropower or thermal plants, which can impose significant water footprints through reservoir evaporation and/or cooling withdrawals [
5]. In water-stressed regions, meeting cooling demand through fresh surface abstraction may therefore become ecologically unsustainable, which motivates wastewater reuse and energy-intensive water reclamation.
It is also worth noting that CCUS is not the only route for minimizing CO2 emissions in the cement industry. Other main mitigation pathways include: (i) reducing the clinker-to-cement ratio via supplementary cementitious materials, (ii) increasing the use of alternative and low-carbon fuels, (iii) improving kiln and grinding efficiency, (iv) deploying waste-heat recovery, (v) electrifying process steps, and (vi) lowering material demand through more efficient design and construction practices [
6,
7]. The abovementioned options can significantly reduce the fuel-related emissions and the embodied carbon of cement products. Nevertheless, since a large share of cement emissions arises from limestone calcination, deep-decarbonization pathways could still identify carbon capture as important for addressing residual process emissions [
6,
7]. For this reason, this work treats CCUS as one element within a broader portfolio of cement-sector mitigation mechanisms rather than as the only decarbonization option.
The background literature on the WEC nexus covers multiple methodological approaches and applications across broad geographical contexts. Methodologically, many studies rely on deterministic or mixed-integer optimization to model the coupled resource flows of energy hubs and related infrastructures. For instance, Salimi et al. [8] modelled interconnected natural-gas and electricity hubs using deterministic energy-hub optimization techniques. Similarly, Roustaei et al. [
9] extended this concept via hub-based planning toward more integrated smart energy-water systems. Stochastic extensions were then introduced to explicitly capture uncertainty. Ji et al. [
10], for instance, developed a two-stage fuzzy stochastic model for water-supply management from a water–energy nexus perspective, showing how uncertainty in supply and demand alters resource-allocation decisions. In parallel, sector-specific environmental studies examined the nexus from a footprint perspective rather than an operational-control perspective. Bakken et al. [
5] assessed the water footprint of hydropower production at broader regional and methodological scales, whereas Rosa et al. [
4] focused on the water implications of carbon-capture systems.
The existing literature is markedly fragmented across both thematic and geographic contexts. Investigations have been carried out on utility-level water infrastructure [5], multi-energy hub architectures, and individual low-carbon technologies such as hydropower or post-combustion capture [4]. However, these threads remain largely unintegrated into a unified industrial decision-making framework. A further structural gap concerns application contexts: published models are often validated against generic test networks and/or planning scenarios rather than against the operational contexts of hard-to-abate industrial sectors in regionally constrained settings. More recently, advanced data-driven control techniques have been effective in addressing the rigidity of deterministic models. For instance, Zhang et al. [11] developed a cooperative multi-agent deep reinforcement learning algorithm to schedule distributed and variable energy resources in energy-hub environments operating under dynamic exogenous conditions. However, such models remain mainly situated within power-system and generic hub domains; the coupled electricity, water, and carbon interdependencies arising specifically from CCUS integration in cement manufacturing remain outside their scope. Accordingly, the research gap is both contextual and methodological: no comparable framework cohesively combines real-time dynamic control, CCUS-induced parasitic loads, water-sourcing flexibility, and region-specific regulatory constraints in a single, operationally grounded WEC-nexus model.
To address this gap, we develop a multi-objective deep reinforcement learning (DRL) framework. In contrast to traditional input-output models solved with static nonlinear programming, the framework formulates the dynamic coupling as a Markov Decision Process (MDP) and solves it using the Soft Actor-Critic (SAC) algorithm. We also propose the Water-Carbon Mitigation Penalty Index (WCMPI) (see
Section 3.1) as a new study-specific metric for quantifying the trade-off cost of atmospheric mitigation, building on the broader literature on water footprints, carbon footprints, and CCUS-related environmental trade-offs [
4,
12]. The framework is then evaluated in a detailed United Kingdom (UK) case study.
4. Results and Scenario Analysis
4.1. Dataset & Modelling Methodology
For DRL training and validation, several types of real-world data were integrated into the proposed WEC simulation model [
3,
13]. Firstly, the energy-system dataset contains the hourly grid carbon-intensity vector (
), offshore wind availability, and baseline thermal generation levels, which are obtained from dispatch records of the National Energy System Operator (NESO) and the UK Energy Research Centre (UKERC) and accessed via the cited UK dataset sources [
3,
19]. In the proposed model, the offshore wind serves as the renewable input term
. Hydropower is treated separately from
because it is represented as a dispatchable pumped-storage resource with an explicit internal state (
) rather than as a purely exogenous renewable time series. Secondly, the water-regulation dataset includes daily freshwater abstraction limits (
), derived from Environment Agency planning thresholds [
13]. Thirdly, scenario-definition inputs were constructed from these base datasets to represent drought conditions, reduced renewable availability, and high-carbon-price stress cases. These time-series inputs are used directly in the state construction, constraint enforcement, and reward evaluation of the MDP environment. During training, the above variables are repeatedly sampled with Gaussian perturbations, yielding more than 100,000 stochastic episodes and encouraging robust policies rather than static-pattern memorization.
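The perturbation step described above can be sketched as follows. This is a minimal illustration rather than the study's implementation: the multiplicative noise form, the 5% relative standard deviation, the clipping at zero, and the names `perturb_series`, `sample_episode`, and `base_inputs` are all assumptions, and the profile values are placeholders rather than NESO or Environment Agency data.

```python
import random

def perturb_series(base, rel_sigma, rng):
    """Apply multiplicative Gaussian noise to each value, clipped at zero."""
    return [max(0.0, v * (1.0 + rng.gauss(0.0, rel_sigma))) for v in base]

def sample_episode(base_inputs, rel_sigma=0.05, seed=None):
    """Draw one perturbed copy of every base time series (one stochastic episode)."""
    rng = random.Random(seed)
    return {name: perturb_series(series, rel_sigma, rng)
            for name, series in base_inputs.items()}

# Hypothetical 24-h base profiles (placeholder numbers, not the study's data).
base_inputs = {
    "grid_carbon_intensity": [200.0 + 5.0 * h for h in range(24)],
    "offshore_wind": [50.0] * 24,
}

episode = sample_episode(base_inputs, rel_sigma=0.05, seed=0)
```

Seeding each episode separately keeps every draw reproducible, which is convenient when the same perturbed scenarios have to be replayed for several controllers.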
The simulation results are obtained by solving the continuous-MDP problem developed within the integrated framework described in Section 4, where each episode covers a 24-h operating horizon and the decision is updated every hour. At each time step (t), the controller observes renewable energy availability, grid carbon intensity, remaining abstraction capacity, the time index, and the hydro state, and then selects the hydro-dispatch, capture, and reclamation-ratio actions. The SAC outcomes are compared with PPO and a rule-based baseline under identical scenario definitions and constraint sets, ensuring methodological consistency. In addition to cost performance, policy quality is assessed using robustness indicators, such as oscillation behavior and variance bands, to gauge operational reliability under stochastic conditions.
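The observation and action structure described above can be illustrated with a toy environment. This is only a sketch under stated assumptions: the class `ToyWECEnv`, the state container `WECState`, every coefficient, and the placeholder cost are hypothetical and do not reproduce the paper's governing equations or its WCMPI-aware reward. Only the five observed quantities, the three actions, and the 24 hourly steps come from the text.

```python
from dataclasses import dataclass

def clip01(x):
    """Clamp an action component to the unit interval."""
    return max(0.0, min(1.0, x))

@dataclass
class WECState:
    renewable: float         # renewable availability at the current hour
    carbon_intensity: float  # grid carbon intensity at the current hour
    abstraction_left: float  # remaining freshwater abstraction allowance
    hour: int                # time index within the episode
    hydro_level: float       # pumped-storage state

class ToyWECEnv:
    HORIZON = 24  # hourly decisions over a 24-h episode, as in the text

    def __init__(self, renewable, carbon_intensity, daily_abstraction_limit):
        self.renewable = list(renewable)
        self.carbon_intensity = list(carbon_intensity)
        self.daily_limit = float(daily_abstraction_limit)
        self.state = None

    def reset(self):
        self.state = WECState(self.renewable[0], self.carbon_intensity[0],
                              self.daily_limit, 0, 0.5)
        return self.state

    def step(self, action):
        hydro, capture, reclaim = (clip01(a) for a in action)
        s = self.state
        # Illustrative placeholder dynamics (assumed coefficients).
        hydro_used = min(0.1 * hydro, s.hydro_level)
        water_demand = 1.0 + capture              # capture raises cooling demand
        fresh = water_demand * (1.0 - reclaim)    # share met by fresh abstraction
        overdraw = max(0.0, fresh - s.abstraction_left)
        load = 1.0 + 0.5 * capture + 0.3 * (water_demand - fresh)
        grid_energy = max(0.0, load - s.renewable - hydro_used)
        cost = (grid_energy * s.carbon_intensity  # grid-related carbon term
                + (1.0 - capture)                 # uncaptured stack emissions term
                + 10.0 * overdraw)                # abstraction-violation penalty
        hour = s.hour + 1
        done = hour >= self.HORIZON
        idx = min(hour, self.HORIZON - 1)
        self.state = WECState(self.renewable[idx], self.carbon_intensity[idx],
                              max(0.0, s.abstraction_left - fresh), hour,
                              s.hydro_level - hydro_used)
        return self.state, -cost, done, {"overdraw": overdraw}

# Roll out a fixed mid-range action for one episode.
env = ToyWECEnv([0.5] * 24, [0.2] * 24, daily_abstraction_limit=20.0)
state, done, steps, total_reward = env.reset(), False, 0, 0.0
while not done:
    state, reward, done, info = env.step((0.5, 0.5, 0.5))
    steps += 1
    total_reward += reward
```

The `reset`/`step` pair loosely mirrors the interface convention used by reinforcement-learning toolkits, so a sketch of this kind could be wrapped for a library agent, although that wrapping is not shown here.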
The continuous-MDP WEC formalism was implemented as a custom environment in Python 3.14 using Gymnasium [
17]. The governing equations were not solved using a separate symbolic or algebraic equation solver. Instead, they were evaluated numerically at each hourly step inside the custom simulation environment, where the code updates the state
, enforces matrix-based resource constraints (
), and computes the WCMPI-aware reward. The optimization and control policy were then learned using the SAC implementation in PyTorch 2.11.0 [
20] and
Stable-Baselines3 [
18]. For comparative analysis, PPO was handled through the same reinforcement-learning toolchain. Training spans a broad operating-condition matrix and millions of environment interactions to identify Pareto-improving continuous dispatch policies. This pipeline links analytical WEC modeling with data-driven control to generate executable policies from high-dimensional grid and water signals.
4.2. Algorithmic Convergence and Learning Robustness
The SAC agent was benchmarked against PPO and a static rule-based heuristic baseline. Figure 3 shows that SAC reaches a stable, low system cost (in thousand GBP) within approximately 80,000 training steps and exhibits smoother post-convergence behavior than PPO. This stability matters because the environment contains sharp penalty transitions imposed by abstraction limits and reservoir constraints; under these conditions, oscillatory policies can repeatedly enter high-penalty regions. SAC also exhibits more consistent learning dynamics, suggesting that it is better suited to operational control in the proposed constraint-coupled WEC system.
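For reference, the standard maximum-entropy objective that SAC optimizes can be written as below. This is the general form from the SAC literature and is not restated from this study; here $r(s_t, a_t)$ would correspond to the WCMPI-aware reward defined in the methodology, $\alpha$ is the entropy temperature, $\mathcal{H}$ is the policy entropy, and $\rho_\pi$ is the state-action distribution induced by the policy $\pi$.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```

The entropy term rewards policies that remain stochastic while still collecting high reward, which is the feature that distinguishes SAC from a purely return-maximizing objective.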
4.3. Dynamic Trade-Off Dispatch Verification
Figure 4 and Figure 5 compare the 24-h dispatch profiles and show how SAC handles the tri-lemma. Across the three scenarios in Figure 4 and Figure 5, the agent avoids rigid baseload behavior and adapts to changing boundary conditions.
Figure 4 shows that SAC rebalances thermal and hydropower procurement in response to both instantaneous carbon intensity and downstream water implications.
Figure 5 further shows coordinated source switching between fresh abstraction and reclaimed water, demonstrating that energy and water decisions are co-optimized rather than handled sequentially.
Under stringent environmental restrictions (drought scenario), the agent substantially decouples operations from fresh-river abstraction. It discovers a strategy that schedules energy-intensive municipal wastewater reclamation during periods of high hydropower availability, minimizing WCMPI (
Figure 6) despite high aggregate throughput. Operationally, this means the policy does not simply minimize carbon at each instant; instead, it redistributes mitigation effort over time to avoid simultaneous stress on the grid and hydrological system.
4.4. Scenario Sensitivity
Figure 7 and Figure 8 compare the policy-driven economic behaviour of the controllers, presenting scenario-based (i.e., base, drought, penalty, and low-renewable) comparisons of the cost categories and grid procurement. The cost decompositions (thousand GBP on a log scale) show that the gain of the SAC approach is not limited to a single category. Instead, the cost improvements appear jointly in water-related penalties, carbon-related charges, and total operational expenditures. These cross-category improvements indicate that the learned policy does not merely shift the burden from one subsystem to another. Figure 8 further supports this interpretation by indicating cleaner procurement under the SAC model, with increased use of low-carbon hydropower during strategically favorable periods. When climatic constraints tighten, the rule-based baseline incurs severe violation penalties, whereas SAC dynamically adjusts CCUS targets to preserve operating margin.
4.5. Monte Carlo-Based Stochastic Evaluation
Environmental optimization models must perform robustly in the presence of substantial physical noise. Figure 9 summarizes the cost-related performance of the different algorithms under such noise, while Figure 10 shows the aggregated cost profiles over 24 h under dynamic weather conditions. The uncertainty bounds are computed with a Monte Carlo procedure across 100 independently seeded episodes per scenario. Beyond the lower average cost, the key finding is distributional robustness: the proposed SAC policy maintains tight standard-deviation bands and reduces the tail-risk behavior associated with constraint violations. This matters for practical deployment because industrial operators must manage worst-case compliance outcomes, not only expected-value performance.
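The seeded Monte Carlo evaluation can be sketched as follows. The 100 episodes per scenario come from the text; everything else is an assumption for illustration, including the function names, the use of the mean of the worst 5% of episodes as a tail-risk indicator (the text does not state which tail metric is used), and the toy cost generator that stands in for a real policy rollout.

```python
import random
import statistics

def evaluate_policy(run_episode, n_episodes=100, base_seed=0, tail=0.95):
    """Summarize episode costs over independently seeded runs."""
    costs = sorted(run_episode(base_seed + i) for i in range(n_episodes))
    k = int(tail * n_episodes)            # index where the worst-case share begins
    tail_costs = costs[k:] or costs[-1:]  # fall back to the single worst episode
    return {
        "mean": statistics.fmean(costs),
        "std": statistics.stdev(costs),
        "tail_mean": statistics.fmean(tail_costs),
    }

def toy_episode(seed):
    """Hypothetical stand-in for one 24-h rollout: returns a cumulative cost."""
    rng = random.Random(seed)
    return sum(100.0 + rng.gauss(0.0, 5.0) for _ in range(24))

summary = evaluate_policy(toy_episode)
```

Reporting a tail statistic next to the mean and standard deviation makes it possible to compare controllers on worst-case behavior as well as on average cost.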
4.6. Quantitative Benchmarking
Table 2 summarizes the comparative results obtained from the proposed and benchmark algorithms. The choice of comparison models is deliberate: SAC is adopted as the main method because the problem involves continuous control, stochastic exogenous signals, and tightly coupled nonlinear constraints, whereas PPO is included as a widely used policy-gradient benchmark and the rule-based controller is retained as an engineering baseline for interpretability. To ensure consistent comparison, the reported total nexus cost aggregates three cost-component categories used throughout the study: (i) operational expenditure associated with thermal generation and plant energy use, following standard energy-hub cost accounting [
8,
9]; (ii) regulatory and feasibility fines triggered by abstraction or dispatch violations; and (iii) carbon- and water-related valuation terms derived from the carbon-intensity and water-burden formulations defined in the methodology [
1,
4,
10]. Operationally, the total nexus cost is obtained by evaluating these three components at each simulation time step and then summing them over the full dispatch horizon for each scenario, so the reported values represent cumulative system-level cost rather than a single-hour snapshot. Notably, although the PPO agent appears to achieve low direct operating cost and carbon output, this behavior is identified here as a characteristic “policy collapse” under extreme constraints. Because PPO fails to identify a technically feasible policy during learning, it maintains near-zero thermal generation and thereby avoids the full trade-off among fuel consumption, water sourcing, and CCUS activation. This “stay-off” strategy leads to severe energy-deficit fines (exceeding 5M). These results suggest that the on-policy PPO baseline struggles with the high-dimensional, nonlinear penalties of the WEC nexus framework. By contrast, the proposed SAC model identifies an industrial-symbiosis operating point that meets demand while reducing total nexus cost by 70% compared with PPO and 83% compared with the rule-based baseline.
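The aggregation of the total nexus cost and the relative-reduction figures can be illustrated with a short sketch. The three component categories and the summation over the dispatch horizon follow the description above; the function names and all numbers are hypothetical placeholders and are not the values reported in Table 2.

```python
def total_nexus_cost(opex, fines, valuation):
    """Sum the three cost components over every time step of the horizon."""
    if not (len(opex) == len(fines) == len(valuation)):
        raise ValueError("component series must cover the same time steps")
    return sum(o + f + v for o, f, v in zip(opex, fines, valuation))

def percent_reduction(baseline_cost, new_cost):
    """Relative cost reduction of a controller versus a baseline, in percent."""
    return 100.0 * (baseline_cost - new_cost) / baseline_cost

# Placeholder hourly components for one 24-h scenario (illustrative only).
opex = [1.0] * 24        # operational expenditure
fines = [0.0] * 24       # regulatory and feasibility fines
valuation = [0.25] * 24  # carbon- and water-related valuation terms

candidate = total_nexus_cost(opex, fines, valuation)
reduction = percent_reduction(120.0, candidate)
```

With these placeholder inputs the cumulative cost is 30.0, and against a hypothetical baseline of 120.0 the reduction is 75%.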
4.7. Implications of Modelling Methodology
The modelling methodology directly affects system outcomes, carbon-emission reduction, and the other performance metrics reported in this study. The proposed framework enforces tightly coupled constraints across the energy and water domains at each hourly step. As a result, DRL policy performance reflects real operational trade-offs rather than unconstrained economic dispatch alone. The use of stochastic Monte Carlo evaluation also means that the reported outcomes represent policy behavior across a distribution of scenarios rather than along a single deterministic trajectory. Consequently, the advantage of SAC should be interpreted as both a reliability gain and an efficiency gain under uncertainty. The simulation results further reveal a policy blind spot in the UK’s net-zero pathway: the water-carbon mitigation penalty. Decarbonizing heavy industry therefore requires a transition from isolated carbon management toward integrated, AI-driven nexus planning. The findings also show that CCUS deployment may become ecologically unsustainable in water-stressed regions when it relies heavily on freshwater abstraction. In contrast, the SAC policy suggests that industrial symbiosis, particularly coupling cement plants with municipal wastewater treatment, offers a practical and resilient pathway. The developed framework further indicates that pumped storage and hydropower should be treated not only as frequency-balancing assets but also as nexus-balancing resources that support energy-intensive water reclamation. From a real-time implementation perspective, the methodology is most suitable as an operational decision-support layer because, once trained, the DRL policy can generate near-instant control decisions from live plant and market signals. 
However, industrial deployment in cement plants would still require plant-specific calibration, integration with SCADA or supervisory control systems, sensor-quality assurance, hard safety interlocks, and phased pilot validation before autonomous closed-loop use.
4.8. Discussion in Relation to Existing Literature
The findings of this study align with, but also extend, several strands of existing WEC-nexus and industrial-decarbonization literature. First, prior energy-hub and smart water–energy hub studies have shown that integrated optimization can reduce system cost and improve cross-sector coordination under coupled infrastructure constraints [
8,
9]. The present results are consistent with that general insight, but they go further by showing that, under CCUS-induced nonlinear water and carbon burdens, static planning alone is insufficient to reveal the full operational trade-off space. Second, the importance of uncertainty highlighted here is consistent with stochastic water–energy studies such as Ji et al. [
10], which demonstrated that resource decisions change materially when uncertainty is represented explicitly. In the present case, this point is reinforced by the stochastic DRL setting, where policy quality depends not only on average performance but also on robustness across disturbed operating trajectories.
The study also connects operational control with environmental burden literature. Rosa et al. [
4] highlighted that carbon-capture systems can increase water demand, while Bakken et al. [5] emphasized that even low-carbon electricity sources such as hydropower may carry non-negligible water implications. Our findings support both observations and reveal how they interact in an industrial setting: without coordinated management, aggressive carbon mitigation strategies can shift pressure from atmospheric emissions to regional hydrology. In that sense, the proposed WCMPI and the learned SAC policy contribute beyond prior footprint studies by converting those trade-offs into a real-time operational decision problem. Finally, compared with recent DRL-based hub-scheduling studies such as those of Zhang et al. [
11], the present work provides a more application-specific nexus formulation for heavy-industry decarbonization. Rather than focusing only on generic distributed energy-resource scheduling, the model explicitly integrates CCUS electricity and cooling-water loads, wastewater reuse, abstraction limits, and regional carbon-intensity signals. This sharper alignment with cement-sector decarbonization is the paper’s main contribution relative to the existing literature.
4.9. Limitations
Although this work demonstrates the potential of a deep-learning-enabled WEC model for managing carbon emissions, it has several limitations. First, the study is designed specifically for the UK industrial and cement sector, so the simulation results may not transfer directly to other industries or regions with different power mixes, fuel prices, and hydrological regimes. Second, the CCUS representation focuses on plant-level capture and compression loads; the transport and long-term storage supply chain is not modelled in operational detail. Third, the renewable inputs are simplified to an offshore wind time series, while broader multi-renewable portfolios and network congestion effects are outside the scope of this study. In addition, various coefficients in the coupled WEC model were derived through engineering assumptions and calibration, which introduces structural uncertainty despite the stochastic training procedure. Finally, although the DRL framework performs well in simulation, real-world deployment would still require plant validation, cyber-physical integration, and robustness checks under operational disturbances that are not fully represented in the current environment.
5. Conclusions
This work develops a novel WEC model that formulates the industrial tri-lemma as an MDP and uses DRL to learn operating policies for net-zero operation in the cement manufacturing industry. Because net-zero industrial transitions are tightly coupled with regional power and water security, the work adopts a continuous-control DRL methodology based on Soft Actor-Critic to solve the WEC tri-lemma under dynamic and stochastic conditions. The mathematical formulation incorporates hydrological dynamics, exogenous stochastic inputs, and policy limits. As a consequence, the coupled WEC system can be optimized with respect to infrastructure interactions, power and water demand, and regulatory limits. The framework is evaluated on publicly available datasets, and the results indicate that the learned policies fulfill the operational requirements while dynamically adjusting CCUS targets, achieving a 13.83% carbon reduction and total nexus cost reductions of 70% relative to PPO and 83% relative to the rule-based baseline. The results also demonstrate the water-saving potential of the proposed strategy, which shifts from freshwater abstraction to reclaimed municipal wastewater. The framework also avoids the 2.15–5.17% increase in hydrological stress that arises under unmanaged carbon-mitigation pathways. Applied to a representative UK industrial cluster, the model shows that aggressive carbon capture can shift ecological pressure toward aquatic systems if it is left unmanaged. However, the proposed AI-enabled model indicates that replacing potable/fresh water with wastewater, while coordinating low-carbon operations, can maintain a sustainable and Pareto-efficient industrial symbiosis system.
In conclusion, the methodology offers a practical roadmap for policymakers seeking to meet net-zero obligations without triggering regional hydro-ecological failure. The framework is therefore positioned as a decision-support tool for industrial operators and policymakers navigating coupled decarbonization and water-security obligations. In future work, the authors intend to extend this framework to multi-site industrial symbiosis networks with full-chain CCUS representations, to incorporate broader renewable portfolios, hydrogen energy sources, and transmission constraints, and to validate the learned control policies through plant-level pilot deployment in collaboration with industrial and regulatory partners.