Article

Energy Management for Integrated Energy System Based on Coordinated Optimization of Electric–Thermal Multi-Energy Retention and Reinforcement Learning

State Grid Shandong Electric Power Company, Electric Power Science Research Institute, Jinan 250003, China
* Author to whom correspondence should be addressed.
Processes 2025, 13(9), 2693; https://doi.org/10.3390/pr13092693
Submission received: 12 May 2025 / Revised: 1 June 2025 / Accepted: 5 June 2025 / Published: 24 August 2025

Abstract

With the large-scale integration of distributed electric and thermal flexible resources and diverse loads on the user side, energy management of the integrated energy system (IES) has become an effective way to achieve efficient, low-carbon, and economic operation. To explore a new IES energy management mode involving energy service providers (ESPs) and user clusters (UCs), this paper proposes an energy management method for electric–thermal microgrids that considers the optimization of user energy consumption characteristics. First, an energy management framework with multi-agent participation of the ESP and user clusters is proposed, and a user energy preference model is established that accounts for users’ electricity and heat consumption preferences. Second, considering the operating benefits of the ESP and the user clusters, an energy management model between the ESP and users is established within a reinforcement learning (RL) framework, and a distributed solution algorithm combining Q-learning and quadratic programming is proposed. Finally, IESs with different user scales and energy units are taken as test systems, and the optimal energy management strategy of the system, considering users’ energy preferences, is analyzed. The simulation results demonstrate that the proposed energy management model enhances the economic efficiency of IES operation and reduces emissions. In a test system with two UCs, the optimized system achieves a 5.05% reduction in carbon emissions. The RL-based distributed solution algorithm efficiently solves the energy management model for systems with varying UC scales, requiring only 6.55 s for the system with two UCs and 13.26 s for the system with six UCs.

1. Introduction

Amid growing environmental pressures and advancements in renewable energy technologies, countries worldwide are restructuring their energy mix to reduce dependence on conventional fossil fuels. Integrated energy systems (IESs) present an innovative solution to promote large-scale renewable energy adoption, leverage the complementary advantages of diverse energy sources, and ensure cost-effective, efficient energy network operations [1,2]. Within the IES framework, energy service providers (ESPs) and users serve as the primary stakeholders in energy transactions. ESP pricing strategies directly shape user consumption patterns, while users, in turn, respond dynamically to these pricing mechanisms. Since ESPs and users represent distinct stakeholders with differing interests, IES energy management involves complex benefit distribution and multi-stakeholder optimization challenges [3,4]. Consequently, establishing a fair energy pricing mechanism and system operation strategy—one that fully considers the interests of all stakeholders—has become a critical issue requiring urgent resolution.
In integrated energy systems (IES), users’ independent energy optimization behavior, known as integrated demand response (IDR), enables them to adjust multi-energy demand to achieve peak shaving and cost-saving effects [5,6]. Recent studies have explored various aspects of IDR in IES. For instance, the literature [7] proposes a two-stage “increase supply and reduce consumption” optimization model that fully exploits the potential of multiple load-side energy sources to participate in demand response (DR), thereby realizing low-carbon and economic operation of the integrated energy system. Meanwhile, the literature [8] establishes a low-carbon economic scheduling model for integrated energy systems considering flexible supply–demand response and diversified utilization of hydrogen. In the literature [9], optimal hierarchical energy management in an integrated energy system is introduced, considering the variabilities associated with renewable energy resources, uncertain loads like electric vehicles, energy market interaction uncertainties, and a demand response program that relies on a robust optimization technique. The literature [10] provides a multi-objective solution that includes demand response scheduling and optimizes factors such as PV and WT capacities, energy storage strategies, battery usage, power exchange with the grid, and overall costs and environmental impacts. The literature [11] addresses these challenges by proposing a three-level demand response model for integrated energy systems structured to align grid and integrated energy system objectives through a hierarchical incentive system using a Stackelberg game framework. The literature [12] establishes a data center cluster framework composed of a DCC operator and data center prosumers. Furthermore, a two-stage energy-sharing model is developed, incorporating the IDR across multiple loads. In summary, the exploration of IDR in IES has demonstrated its significant potential to enhance energy efficiency, reduce operational costs, and promote sustainable energy utilization through diverse optimization strategies and models.
The effective management of integrated energy systems (IESs) depends on two crucial factors: (1) formulating optimal pricing strategies for energy service providers (ESPs) to maintain a balance between supply and demand and (2) achieving operational optimization to improve efficiency, cost-effectiveness, and reliability. These challenges become more intricate with the integration of distributed energy resources (DERs), demand-side management (DSM) strategies, and multi-energy coupling within contemporary smart grids. The literature [13] presents a real-time monitoring approach to electric vehicle (EV) charging dynamics with battery storage support over a 24 h period. By simulating EV demand, state of charge (SOC), and charging/discharging events, this study provides insights into operational strategies for energy storage systems to maximize the charging simultaneity factor through internal power enhancement. The literature [14] investigates the practical impact of price-based demand-side management (DSM) for occupants in an office building connected to a renewable energy microgrid. It analyzes occupant reactions in terms of perceived practicality regarding DSM implementation, considering factors such as renewable energy generation, load shifting, and energy costs. The literature [15] proposes a blockchain-based smart solution for DSM in residential buildings within a neighborhood, aiming to improve the peak-to-average ratio (PAR) of power load, reduce energy consumption, and enhance occupant thermal comfort by modeling heating, illumination, and appliance systems. The literature [16] focuses on designing and operating a standalone hybrid renewable energy system (HRES) for residential building loads. The proposed HRES integrates photovoltaic panels and wind turbines as energy sources, coupled with a hydrogen subsystem and battery bank for energy storage. The literature [17] introduces a grey-box model-based DSM method for rooftop photovoltaics (PVs) in buildings to achieve peak shaving, thermal comfort, and mechanical cost savings. The literature [18] employs an archetype-based approach to model generic building clusters, targeting diverse building types, compositions, and sizes, and applies this approach to a case study of 54 buildings. The literature [19] designs a decentralized transactive energy management framework for a real-world hybrid building cluster, capturing strategic interactions between distributed energy resource owners and consumers via a multi-leader–multi-follower Stackelberg game. The literature [20] integrates a multi-stage reinforcement learning algorithm with imitation learning within the MAPPO framework to optimize system energy performance, enhancing solar PV self-consumption, reducing energy costs, and maintaining indoor thermal comfort. The literature [21] explores the benefits of deep reinforcement learning, a hybrid method combining RL and deep learning, for online optimization of building energy management system schedules, marking the first such application in the smart grid context. The literature [22] proposes a machine learning-driven robust model predictive control framework for sustainable multi-zone buildings using renewable energies, addressing weather forecast uncertainties in energy management, reducing overall electricity expenses, and ensuring occupant thermal comfort. 
The literature [23] introduces a novel energy management framework, deep–fuzzy logic control (Deep-FLC), which combines predictive modeling using long short-term memory (LSTM) networks with adaptive fuzzy logic to optimize energy allocation, minimize grid dependency, and preserve battery health in grid-connected microgrid (MG) systems. The literature [24] designs an efficient energy management system for smart homes with battery electric vehicles (BEVs) and bidirectional chargers, addressing the optimal control problem of determining battery charging/discharging strategies to minimize energy expenditure and costs. Generally speaking, these studies show a development trend from single-technology application to multi-technology integration, from single-system optimization to cluster-level collaborative management, and from deterministic control to responses under uncertainty. Together, they improve the consumption efficiency of renewable energy, reduce energy consumption costs, and ensure user comfort, providing theoretical support and practical references for smart grids and comprehensive energy management.
As artificial intelligence technology advances, reinforcement learning (RL) has gained significant attention and found applications in various power system domains, including power generation control, market competition, and load response [25]. RL, a pivotal machine learning technique, centers on enabling agents to take actions within an environment to maximize their cumulative rewards [26,27]. This aligns seamlessly with the objective of dynamic economic scheduling in integrated energy systems (IESs), which aims to optimize scheduling decisions for minimizing operational costs [28,29]. Several studies have explored real-time pricing strategies and demand response optimization in smart grids using advanced computational frameworks. The literature [30] proposes a deep reinforcement learning method to enhance network flexibility in an integrated energy system scheduling with demand response. The literature [31] proposes an energy management optimization method based on RL for an integrated electric–thermal energy system based on the improved proximal policy optimization algorithm, which effectively mitigates the problems of the traditional heuristic algorithms. Meanwhile, the literature [32] formulates the three-stage energy management problem as a Markov Decision Process and establishes a hierarchical energy management framework for virtual power plants based on deep RL techniques. The literature [33] introduces a collaborative bidding decision framework that leverages a multi-agent deep deterministic policy gradient algorithm, specifically targeting the optimization of decision-making in multi-market coupling scenarios for thermal power suppliers. The literature [34] proposes an adaptive personalized federated reinforcement learning (FRL) for multiple ESS optimal dispatch in various electricity markets with electric vehicles and renewable energy, achieving both the joint optimization of multiple ESSs and avoiding the degraded performance of FRL’s local model. In summary, the integration of RL techniques into IES management showcases remarkable potential for enhancing operational efficiency, reducing costs, and promoting sustainable energy practices across diverse applications.
While substantial advancements have been made in IES management, particularly in demand response, distributed energy integration, and multi-energy optimization, several critical research gaps persist. First, current studies predominantly address single-agent optimization (e.g., ESP pricing or user response) while neglecting the dynamic interplay between energy service providers and consumers, particularly the influence of user energy preferences on system decarbonization. Second, the two-stage energy transaction process (pricing–response) remains underexplored regarding its holistic impacts on economic performance and carbon emissions. Third, despite reinforcement learning’s demonstrated potential in IES scheduling, existing approaches largely focus on single-objective optimization (e.g., cost reduction) without effectively balancing user preferences with ESP benefits or developing robust distributed algorithms. This study addresses these limitations by introducing an innovative energy management framework integrating user preference modeling with reinforcement learning optimization to simultaneously enhance economic efficiency and environmental sustainability in IES operations. The comparison between this article and the other state-of-the-art literature contributions is shown in Appendix A. The main innovations of this paper are as follows.
(1) A quantitative model for users’ electricity and thermal consumption preferences is established to capture their dynamic response behaviors to prices and services. Through a multi-agent (ESP and user clusters) participation framework, user preferences are integrated into energy management decision-making.
(2) An energy management model based on reinforcement learning (RL) is established. Within the RL framework, a game-theoretic model between the ESP and users is constructed to collaboratively optimize the ESP’s revenue and user satisfaction. A distributed solution algorithm combining Q-learning and quadratic programming is proposed to achieve efficient two-stage interactive decision-making.
(3) By filling the gap in the interaction mechanism between the ESP and users, this paper provides a novel approach for low-carbon economic scheduling in IES. The proposed method not only enhances the economic efficiency of the system but also reduces carbon emissions through user preference guidance, offering theoretical support and practical references for the collaborative optimization of future smart grids and multi-energy systems.
The remainder of this paper is organized as follows: Section 2 introduces the energy management framework and user preference model. Section 3 establishes the energy service provider and user benefit model. Section 4 details the energy management model based on reinforcement learning. Section 5 discusses the simulation results. Finally, Section 6 presents the conclusions.

2. Energy Management Framework and User Preference Model

2.1. Energy Management Framework

The subjects involved in the IES include the power grid, the ESP, and user clusters. Figure 1 shows the system structure with the ESP and multiple users.
On the energy supply side, the ESP has an independent energy management center, which purchases electricity from the power grid and guides various IESs to participate in energy management by optimizing internal electric and thermal prices and multi-energy unit operation strategies. The multi-energy units of the ESP include combined heat and power (CHP), gas boiler (GB), electricity storage (ES), and heat storage (HS).
On the energy demand side, each user cluster optimizes its internal energy consumption mode by integrating load optimization, unit operation optimization, and the purchasing of electric and thermal power. Specifically, user clusters adjust their electric and thermal load demand response based on the electric and thermal prices set by the ESP, aiming to minimize costs while meeting energy needs. This involves dynamically shifting or reducing loads during peak price periods and increasing consumption during off-peak times. Additionally, user clusters optimize the operation of their internal energy units, such as distributed photovoltaics (PVs) for electricity generation, heat pumps (HPs) for heating and cooling, and ES systems for load balancing. By strategically charging and discharging the ES systems, user clusters can further reduce their reliance on grid power and manage energy costs. Finally, user clusters decide on the optimal amount of electric and thermal power to purchase from ESP and the power grid, considering both price signals and their own energy production capabilities, thereby achieving a cost-effective and efficient energy management strategy.
Based on the above energy interaction characteristics, this section establishes an ESP–multi-user energy management framework based on reinforcement learning, as shown in Figure 2.
The established IES energy management framework takes the ESP as the Agent, the ESP’s electricity and heat prices as the Action, the multiple user clusters as the Environment, and the user clusters’ electric and thermal trading demand with the ESP as the State. Through iterative optimization within the RL framework, the ESP’s electricity and heat prices and unit operation strategy, together with the user clusters’ energy consumption strategies and unit output power, are jointly optimized so as to improve the operating economy of the system and reduce its carbon emissions.
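As an illustration of this agent–environment mapping (not code from the paper), the interface between the ESP agent and the user-cluster environment could be sketched as follows in Python; the class and field names are hypothetical.

```python
# Illustrative sketch of the RL mapping described above: the ESP is the agent,
# its electricity/heat prices are the action, the user clusters are the environment,
# and their traded electric/thermal power is the state.
from dataclasses import dataclass
from typing import List

@dataclass
class EspAction:
    price_e_sell: float   # ESP electricity selling price (CNY/kWh)
    price_h_sell: float   # ESP thermal selling price (CNY/kWh)
    price_e_buy: float    # price at which the ESP buys surplus electricity from users

@dataclass
class UserClusterState:
    p_e_buy_from_esp: float   # electricity the cluster buys from the ESP (kW)
    p_e_sell_to_esp: float    # surplus electricity the cluster sells back (kW)
    p_h_buy_from_esp: float   # heat the cluster buys from the ESP (kW)

class UserClusterEnv:
    """Environment wrapper: given an ESP price action, each user cluster
    re-optimizes its consumption and reports the resulting traded powers."""
    def step(self, action: EspAction, t: int) -> List[UserClusterState]:
        raise NotImplementedError  # solved by the per-cluster QP of Section 4.2
```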

2.2. User Energy Preference Model

2.2.1. Electricity Energy Preference Model

As can be seen from Figure 1, users can trade electricity with the ESP and power grid at the same time. For users’ electric load demand, the main external power supply sources include the ESP and power grid, and it is proposed to use the price reduction ratio to reflect the probability of users choosing electric energy suppliers [5,6,7,35]. Based on this, considering the user’s electric energy preference, a user’s electric load source selection model is proposed. The probability that the user chooses the ESP or power grid as the power load source at time t in a day can be expressed as
\rho_{ESP,t}^{e,buy} = \frac{\mu_{grid,t}^{e,sell} - \mu_{ESP,t}^{e,sell}}{\mu_{grid,t}^{e,sell} - \mu_{grid,t}^{e,buy}}   (1)
\rho_{grid,t}^{e,buy} = 1 - \rho_{ESP,t}^{e,buy}   (2)
where \rho_{ESP,t}^{e,buy} is the probability that the user chooses the ESP as the electricity source at time t; \rho_{grid,t}^{e,buy} is the probability that the user chooses the power grid as the electricity source at time t; \mu_{grid,t}^{e,sell} is the selling electricity price of the power grid; \mu_{ESP,t}^{e,sell} is the selling electricity price of the ESP; \mu_{grid,t}^{e,buy} is the purchasing electricity price of the power grid.
According to the importance priority of the electric load, the user’s electric load can be divided into a non-adjustable part and an adjustable part. When users choose different power supply sources, they adjust the demand response of the adjustable electric load according to the corresponding selling prices [36,37,38].
P_{i,t}^{eload} = P_{i,t}^{eload,cr} + P_{i,t}^{eload,ad}   (3)
\Delta P_{i,t,ESP}^{eload,ad} = \rho_{ESP,t}^{e,buy} P_{i,t}^{eload,ad} \left(1 - \frac{\mu_{grid,t}^{e,sell} - \mu_{ESP,t}^{e,sell}}{\mu_{grid,t}^{e,sell}}\right)   (4)
\Delta P_{i,t,grid}^{eload,ad} = \rho_{grid,t}^{e,buy} P_{i,t}^{eload,ad} \left(1 - \frac{\mu_{grid,t-1}^{e,sell} - \mu_{grid,t}^{e,sell}}{\mu_{grid,t-1}^{e,sell}}\right)   (5)
\Delta P_{i,t}^{eload,DR} = \Delta P_{i,t,ESP}^{eload,ad} + \Delta P_{i,t,grid}^{eload,ad}   (6)
where P_{i,t}^{eload} is the user’s electric load; P_{i,t}^{eload,cr} is the non-adjustable electric load; P_{i,t}^{eload,ad} is the adjustable electric load; \Delta P_{i,t}^{eload,DR} is the actual electric load adjustment; \Delta P_{i,t,ESP}^{eload,ad} is the load adjustment responding to the ESP electricity price; \Delta P_{i,t,grid}^{eload,ad} is the load adjustment responding to the power grid electricity price.
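The following Python sketch shows how the source-selection probabilities and adjustable-load response of Equations (1)–(6), as reconstructed above, could be evaluated. The function names, the clipping of the probability to [0, 1], and the numerical handling are illustrative assumptions, not the paper’s implementation.

```python
def electricity_source_probabilities(mu_grid_sell: float, mu_esp_sell: float,
                                     mu_grid_buy: float) -> tuple:
    """Eqs. (1)-(2): probability of choosing the ESP vs. the grid as electricity source."""
    rho_esp = (mu_grid_sell - mu_esp_sell) / (mu_grid_sell - mu_grid_buy)
    rho_esp = min(max(rho_esp, 0.0), 1.0)   # assumption: clip to a valid probability
    return rho_esp, 1.0 - rho_esp

def electric_load_adjustment(p_ad: float, rho_esp: float, rho_grid: float,
                             mu_grid_sell_t: float, mu_esp_sell_t: float,
                             mu_grid_sell_prev: float) -> float:
    """Eqs. (4)-(6): total demand response of the adjustable electric load."""
    d_esp = rho_esp * p_ad * (1.0 - (mu_grid_sell_t - mu_esp_sell_t) / mu_grid_sell_t)
    d_grid = rho_grid * p_ad * (1.0 - (mu_grid_sell_prev - mu_grid_sell_t) / mu_grid_sell_prev)
    return d_esp + d_grid
```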

2.2.2. Thermal Energy Preference Model

In the traditional IES operation mode, the user’s thermal load is generally assumed to be supplied only by the ESP. In the IES framework considered here, the thermal load can be supplied not only by the ESP but also by the user’s self-built heat pump system. Based on the user’s thermal preference, this section therefore proposes a user thermal source selection model.
Users have a psychological threshold for acceptable thermal prices, and this paper takes the market thermal price of the urban energy system as that threshold. When the ESP’s selling thermal price exceeds the threshold, users no longer purchase heat from the ESP; when it is below the threshold, the lower the thermal price set by the ESP, the greater the probability that users choose the ESP as their thermal source. Based on this, considering user thermal preferences, a user thermal source selection model is proposed. The probability of users choosing the ESP or their own heat pump as the thermal source at time t within a day can be expressed as
\rho_{ESP,t}^{h,buy} = \frac{\max\left(\mu_{grid,t}^{h,sell} - \mu_{ESP,t}^{h,sell},\, 0\right)}{\mu_{grid,t}^{h,sell} - \mu_{ESP,t}^{h,sell,min}}   (7)
\rho_{HP,t}^{h,buy} = 1 - \rho_{ESP,t}^{h,buy}   (8)
where \rho_{ESP,t}^{h,buy} is the probability that the user chooses the ESP as the thermal source at time t; \rho_{HP,t}^{h,buy} is the probability that the user chooses the heat pump as the thermal source at time t; \mu_{grid,t}^{h,sell} is the market thermal price of the urban energy system; \mu_{ESP,t}^{h,sell} is the selling thermal price of the ESP; \mu_{ESP,t}^{h,sell,min} is the minimum selling thermal price of the ESP.
According to the importance priority of the thermal load, the user’s thermal load can be divided into a non-adjustable part and an adjustable part. When users choose the ESP as the thermal source, the adjustable thermal load responds to the ESP’s thermal price [36,37,38].
P_{i,t}^{hload} = P_{i,t}^{hload,cr} + P_{i,t}^{hload,ad}   (9)
\Delta P_{i,t}^{hload,ad} = P_{i,t}^{hload,ad} \left(1 - \frac{\mu_{grid,t}^{h,sell} - \mu_{ESP,t}^{h,sell}}{\mu_{grid,t}^{h,sell}}\right)   (10)
where P_{i,t}^{hload} is the user’s thermal load; P_{i,t}^{hload,cr} is the non-adjustable thermal load; P_{i,t}^{hload,ad} is the adjustable thermal load; \Delta P_{i,t}^{hload,ad} is the user’s actual thermal load adjustment.
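A corresponding sketch for the thermal source selection of Equations (7) and (8) is given below; capping the probability at 1 is an added assumption for the case of very low ESP prices.

```python
def thermal_source_probabilities(mu_market_h: float, mu_esp_h: float,
                                 mu_esp_h_min: float) -> tuple:
    """Eqs. (7)-(8): probability of choosing the ESP vs. the user's own heat pump."""
    rho_esp = max(mu_market_h - mu_esp_h, 0.0) / (mu_market_h - mu_esp_h_min)
    rho_esp = min(rho_esp, 1.0)   # assumption: cap at 1 when the ESP price is very low
    return rho_esp, 1.0 - rho_esp
```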

3. ESP and User Benefit Model

3.1. ESP Benefit Model

The ESP is the energy manager of the IES; it optimizes the prices of its electricity and heat transactions with users according to the power grid price and guides users to participate in the energy management of the system. The ESP benefit model takes the maximization of operation benefit as its objective function, where the operation benefit consists of the transaction income with the power grid and with users minus the operation cost and the environmental cost [15,16,17,18,19,20,39,40].
\max R_t^{ESP} = R_t^{grid} + R_t^{users} - C_t^{op} - C_t^{en}   (11)
where R_t^{ESP} is the operation benefit of the ESP; R_t^{grid} is the transaction income between the ESP and the power grid; R_t^{users} is the transaction income between the ESP and users; C_t^{op} is the operating cost of the ESP; C_t^{en} is the environmental cost.
The transaction income between ESP and the power grid takes into account the cost of ESP purchasing electricity from the power grid and the income of selling electricity to the power grid.
R_t^{grid} = \mu_{grid,t}^{e,buy} P_{grid,t}^{e,buy} - \mu_{grid,t}^{e,sell} P_{grid,t}^{e,sell}   (12)
where \mu_{grid,t}^{e,buy} is the price at which the power grid purchases electricity from the ESP; \mu_{grid,t}^{e,sell} is the price at which the power grid sells electricity to the ESP; P_{grid,t}^{e,buy} is the electric power sold by the ESP to the power grid; P_{grid,t}^{e,sell} is the electric power purchased by the ESP from the power grid.
The transaction income between the ESP and users takes into account the cost of purchasing electricity from users and the income of supplying electricity and thermal to users [41].
R_t^{users} = \sum_{i=1}^{I} \left(\mu_{ESP,t}^{e,sell} \rho_{ESP,t}^{e,buy} P_{i,t}^{e,sell} + \mu_{ESP,t}^{h,sell} \rho_{ESP,t}^{h,buy} P_{i,t}^{h,sell}\right) - \sum_{i=1}^{I} \mu_{ESP,t}^{e,buy} P_{i,t}^{e,buy}   (13)
where I is the number of users; \mu_{ESP,t}^{e,sell} and \mu_{ESP,t}^{h,sell} are the prices of electricity and heat sold by the ESP to users; \mu_{ESP,t}^{e,buy} is the price at which the ESP purchases electricity from users; P_{i,t}^{e,sell} and P_{i,t}^{h,sell} are the electricity and heat purchased by users; P_{i,t}^{e,buy} is the electricity sold by users.
The operation cost of the ESP mainly includes natural gas costs and equipment operation and maintenance costs.
C_t^{op} = \mu^{ng} \left(\frac{P_t^{CHP}}{\eta^{CHP} LHV^{ng}} + \frac{P_t^{GB}}{\eta^{GB} LHV^{ng}}\right) + \sum_{n=1}^{N} c_n P_t^{n}   (14)
where \mu^{ng} is the price of natural gas; P_t^{CHP} and P_t^{GB} are the output power of the CHP and GB; \eta^{CHP} and \eta^{GB} are the thermal efficiencies of the CHP and GB; LHV^{ng} is the low calorific value of natural gas; c_n is the maintenance cost coefficient of equipment n; P_t^{n} is the output power of equipment n.
The environmental cost of the ESP arises from the carbon dioxide emitted by the CHP and GB when consuming natural gas.
C_t^{en} = \mu_t^{co_2} \varpi^{ng} \mu^{ng} \left(\frac{P_t^{CHP}}{\eta^{CHP} LHV^{ng}} + \frac{P_t^{GB}}{\eta^{GB} LHV^{ng}}\right)   (15)
where \mu_t^{co_2} is the carbon emission price; \varpi^{ng} is the carbon emission intensity of natural gas.
Formulas (16)–(24) define the operation constraints of the ESP, including power balance constraints, interactive power constraints, energy storage constraints, and pricing decision constraints [15,16,17,18,19,20,21,22,23,24].
P_t^{CHP} + P_{grid,t}^{e,sell} + \sum_{i=1}^{I} \rho_{ESP,t}^{e,buy} P_{i,t}^{e,buy} - \sum_{i=1}^{I} \rho_{ESP,t}^{e,buy} P_{i,t}^{e,sell} - P_{grid,t}^{e,buy} + P_t^{ees,c} - P_t^{ees,d} = 0   (16)
P_t^{CHP} \theta^{CHP} + P_t^{GB} + P_t^{hes,c} - P_t^{hes,d} = 0   (17)
-P_{grid,max}^{e,sell} \le P_{grid,t}^{e,sell} \le P_{grid,max}^{e,sell}   (18)
E_t^{ees} = E_{t-1}^{ees} + \eta^{ees,c} P_t^{ees,c} - P_t^{ees,d} / \eta^{ees,d}   (19)
10\% E^{ees} \le E_t^{ees} \le 90\% E^{ees}   (20)
E_t^{hes} = E_{t-1}^{hes} + \eta^{hes,c} P_t^{hes,c} - P_t^{hes,d} / \eta^{hes,d}   (21)
10\% E^{hes} \le E_t^{hes} \le 90\% E^{hes}   (22)
0 \le \mu_{ESP,t}^{e,sell} \le \mu_{grid,t}^{e,sell}   (23)
0 \le \mu_{ESP,t}^{h,sell} \le \mu_{grid,t}^{h,sell}   (24)
where E_t^{ees} and E_t^{hes} are the energy stored in the ES and HS; P_t^{ees,c} and P_t^{hes,c} are the charging power of the ES and HS; P_t^{ees,d} and P_t^{hes,d} are the discharging power of the ES and HS; \eta^{ees,c} and \eta^{hes,c} are the charging efficiencies of the ES and HS; \eta^{ees,d} and \eta^{hes,d} are the discharging efficiencies of the ES and HS; E^{ees} and E^{hes} are the capacities of the ES and HS; P_{grid,max}^{e,sell} is the maximum transmission power of the electricity transaction between the ESP and the power grid.
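To make the ESP model concrete, the sketch below evaluates the storage dynamics and limits of Equations (19)–(22) and the operation benefit of Equations (11) and (12). It is an illustrative reading of the reconstructed formulas, not the authors’ code; function names are hypothetical and the remaining benefit and cost terms are passed in as precomputed values.

```python
def storage_energy_update(e_prev: float, p_charge: float, p_discharge: float,
                          eta_c: float, eta_d: float, capacity: float) -> float:
    """Eqs. (19)-(22): energy-storage dynamics with 10-90% state-of-charge limits."""
    e_next = e_prev + eta_c * p_charge - p_discharge / eta_d
    if not (0.1 * capacity <= e_next <= 0.9 * capacity):
        raise ValueError("state-of-charge limit violated")
    return e_next

def esp_operation_benefit(mu_grid_buy: float, mu_grid_sell: float,
                          p_to_grid: float, p_from_grid: float,
                          r_users: float, c_op: float, c_en: float) -> float:
    """Eqs. (11)-(12): grid trading income plus user trading income,
    minus operating and environmental cost."""
    r_grid = mu_grid_buy * p_to_grid - mu_grid_sell * p_from_grid
    return r_grid + r_users - c_op - c_en
```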

3.2. User Benefit Model

According to the ESP’s price strategy, users in the IES optimize their own energy consumption strategies, and the user benefit model aims to minimize the comprehensive operation cost. The comprehensive operation cost of a user includes the transaction cost with the power grid, the electricity and heat transaction cost with the ESP, the equipment operation cost, and the satisfaction loss cost [42,43].
\min C_{i,t}^{user} = C_{i,t}^{e,grid} + C_{i,t}^{eh,ESP} + C_{i,t}^{user,op} + C_{i,t}^{user,uti}   (25)
where C_{i,t}^{user} is the comprehensive cost of user i; C_{i,t}^{e,grid} is the transaction cost between the user and the power grid; C_{i,t}^{eh,ESP} is the transaction cost between the user and the ESP; C_{i,t}^{user,op} is the operation cost of the user’s equipment; C_{i,t}^{user,uti} is the satisfaction loss cost.
The transaction cost between users and the power grid takes into account the cost of purchasing electricity from the power grid and the income of selling electricity to the power grid.
C_{i,t}^{e,grid} = \mu_{grid,t}^{e,sell} \rho_{grid,t}^{e,buy} P_{i,t}^{e,sell} - \mu_{grid,t}^{e,buy} P_{i,t}^{e,buy}   (26)
The transaction cost between users and the ESP takes into account the cost of purchasing electricity and heat from the ESP and the income from selling electricity to the ESP.
C_{i,t}^{eh,ESP} = \mu_{ESP,t}^{e,sell} \rho_{ESP,t}^{e,buy} P_{i,t}^{e,sell} + \mu_{ESP,t}^{h,sell} \rho_{ESP,t}^{h,buy} P_{i,t}^{h,sell} - \mu_{ESP,t}^{e,buy} P_{i,t}^{e,buy}   (27)
The operation cost of equipment is mainly the operation cost of the user heat pump and energy storage.
C_{i,t}^{user,op} = c^{HP} \eta^{HP} P_{i,t}^{HP} + c^{ES} P_{i,t}^{ES}   (28)
where \eta^{HP} is the thermal efficiency of the HP; c^{HP} is the maintenance cost coefficient of the HP; P_{i,t}^{HP} is the electric output power of the HP; c^{ES} is the maintenance cost coefficient of the energy storage; P_{i,t}^{ES} is the output power of the energy storage.
The loss of customer satisfaction is caused by the user’s optimization and adjustment of the electric and thermal load according to the ESP price.
C_{i,t}^{user,uti} = \alpha_{e,i} \Delta P_{i,t}^{eload} + \frac{1}{2} \beta_{e,i} \left(\Delta P_{i,t}^{eload}\right)^2 + \alpha_{h,i} \Delta P_{i,t}^{hload} + \frac{1}{2} \beta_{h,i} \left(\Delta P_{i,t}^{hload}\right)^2   (29)
where \alpha_{e,i} and \beta_{e,i} are the satisfaction loss coefficients of the user’s electric load; \alpha_{h,i} and \beta_{h,i} are the satisfaction loss coefficients of the user’s thermal load.
Formulas (30)–(33) define user’s operation constraints, including power balance constraints and equipment operation constraints [42,43].
P_{i,t}^{e,sell} - P_{i,t}^{e,buy} + P_{i,t}^{PV} + P_{i,t}^{ES} - P_{i,t}^{HP} = P_{i,t}^{eload} - \Delta P_{i,t}^{eload,DR}   (30)
P_{i,t}^{h,sell} + \eta^{HP} P_{i,t}^{HP} = P_{i,t}^{hload} - \Delta P_{i,t}^{hload,ad}   (31)
P_{i,t}^{h,sell} = \rho_{ESP,t}^{h,buy} P_{i,t}^{hload,DR}   (32)
\eta^{HP} P_{i,t}^{HP} = \rho_{HP,t}^{h,buy} P_{i,t}^{hload,DR}   (33)
where P_{i,t}^{PV} is the photovoltaic power of the user; P_{i,t}^{hload,DR} is the thermal load after demand response.
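Since the satisfaction loss in Equation (29) is quadratic and the constraints (30)–(33) are linear, each user cluster’s problem is a quadratic program. The snippet below sketches a simplified single-cluster version using the third-party cvxpy package (a modeling-tool choice of this illustration, not necessarily the authors’). All prices, load profiles, coefficients, and the 15% adjustable-load cap are placeholder values, and thermal demand response and user energy storage are omitted for brevity.

```python
import cvxpy as cp
import numpy as np

T = 24
# Hypothetical inputs for one user cluster (placeholders, not the paper's data)
mu_e_esp = np.full(T, 0.60)    # ESP electricity selling price, CNY/kWh
mu_h_esp = np.full(T, 0.35)    # ESP thermal selling price, CNY/kWh
p_eload  = np.full(T, 100.0)   # baseline electric load, kW
p_hload  = np.full(T, 80.0)    # baseline thermal load, kW
p_pv     = np.clip(60 * np.sin(np.linspace(0, np.pi, T)), 0, None)  # rough PV profile
alpha_e, beta_e = 0.05, 0.01   # satisfaction-loss coefficients (illustrative)
eta_hp = 3.0                   # heat-pump coefficient of performance

p_buy_e = cp.Variable(T, nonneg=True)  # electricity purchased externally
p_buy_h = cp.Variable(T, nonneg=True)  # heat purchased from the ESP
p_hp    = cp.Variable(T, nonneg=True)  # heat-pump electric input
d_e     = cp.Variable(T)               # electric demand-response adjustment

cost = (cp.sum(cp.multiply(mu_e_esp, p_buy_e))
        + cp.sum(cp.multiply(mu_h_esp, p_buy_h))
        + cp.sum(alpha_e * cp.abs(d_e) + 0.5 * beta_e * cp.square(d_e)))  # Eq. (29)-style loss

constraints = [
    p_buy_e + p_pv - p_hp == p_eload - d_e,   # Eq. (30)-style electric balance
    p_buy_h + eta_hp * p_hp == p_hload,       # Eqs. (31)-(33)-style thermal balance (simplified)
    cp.abs(d_e) <= 0.15 * p_eload,            # adjustable-load cap (15%, cf. Section 5.1)
]
prob = cp.Problem(cp.Minimize(cost), constraints)
prob.solve()                                  # handled by a QP-capable solver
print(f"optimal daily cost: {prob.value:.2f} CNY")
```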

4. Energy Management Model Based on Reinforcement Learning

Based on reinforcement learning theory, this section establishes an energy management model of reinforcement learning between the ESP and users and optimizes the electricity and thermal price of ESP and the electricity and thermal demand strategy of users [31,32,33,34].

4.1. Energy Management Model

4.1.1. Action Space Model

In the reinforcement learning model established in this paper, the action space consists of the ESP’s electricity and thermal prices. In the electricity and heat transactions between the ESP and users, the ESP is in the dominant position and obtains the maximum benefit by adjusting its pricing strategy. For the ESP, the action space can be defined as
\mu_t = \left\{\mu_{ESP,t}^{e,sell},\, \mu_{ESP,t}^{h,sell},\, \mu_{ESP,t}^{e,buy}\right\}, \quad \forall t   (34)
At each trading moment, the ESP calculates the reward function, updates its electricity and heat prices according to the electricity and heat trading demand fed back by users, and then updates the action space.

4.1.2. State Space Model

In the energy management model of reinforcement learning established in this paper, the state space is the electricity and thermal energy transaction between the ESP and users. Under the guidance of the ESP electricity and thermal price, users will optimize their electricity and thermal energy consumption strategy and feed back the optimized electricity and thermal energy trading demand to the ESP. The state space can be defined as
P_{i,t} = \left\{P_{i,t}^{e,sell},\, P_{i,t}^{e,buy},\, P_{i,t}^{h,sell}\right\}, \quad \forall t   (35)
At each trading moment, the user will optimize the trading strategy according to the ESP electricity and thermal energy pricing information and then update the state space.

4.1.3. Reward Function Model

The energy trading interaction between ESPs and users resembles a game-theoretic scenario, where both parties’ interests must be considered in energy management decisions. To address this, the study establishes a weighted reward function that incorporates dual benefit objectives: the ESP’s profit function and the user benefit function.
R_t = \max\left(r R_t^{ESP} - (1 - r) \sum_{i=1}^{I} C_{i,t}^{user}\right)   (36)
Here, r \in [0, 1] is the weight coefficient that balances the ESP benefit objective against the aggregate user cost objective.
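A literal reading of Equation (36) as a Python helper, with the weight r defaulting to the value 0.5 used in Section 5.1, might look like the following (illustrative only):

```python
from typing import Iterable

def weighted_reward(r_esp: float, user_costs: Iterable[float], r: float = 0.5) -> float:
    """Eq. (36): weighted trade-off between the ESP benefit and the total user cost."""
    assert 0.0 <= r <= 1.0
    return r * r_esp - (1.0 - r) * sum(user_costs)
```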

4.2. Model Solving Method Based on Q-Learning–QP

In conventional centralized optimization approaches, comprehensive data from all participants—including equipment specifications and energy consumption preferences—must be obtained. However, in today’s competitive electricity market, such information is often non-transparent, necessitating independent optimization by each stakeholder. To address this challenge, this paper proposes a distributed equilibrium solution framework that integrates Q-learning with quadratic programming (Q-learning–QP) [31,32,33,34,36,44]. The computational workflow of this algorithm is illustrated in Figure 3.
The decision-making process of ESP, as the leader in integrated energy system management, involves a large-scale nonlinear optimization problem. The Q-learning algorithm can effectively address this by reducing computational complexity and enhancing optimization performance. Since the user’s optimization objective is formulated as a quadratic function, quadratic programming can be employed to improve solution speed and accuracy. By integrating quadratic programming into the Q-learning iterative process, users only need to respond to the ESP’s price signals by providing their electric trading power feedback. This approach not only safeguards against information leakage but also ensures the privacy of all participating parties.
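The following minimal sketch illustrates the Q-learning–QP interaction described above: the ESP agent selects a discretized price action ε-greedily, a placeholder function stands in for the user clusters’ QP response, and the Q-table is updated with the weighted reward of Equation (36). The discretization, learning parameters, toy demand curve, and benefit expression are illustrative assumptions, not the paper’s settings.

```python
import numpy as np

# Hypothetical discretization of the ESP price action (Eq. (34)); "solve_user_qp"
# stands in for the per-cluster quadratic program of Section 3.2.
PRICE_LEVELS = np.linspace(0.3, 0.8, 6)   # candidate ESP electricity prices, CNY/kWh
N_ACTIONS = len(PRICE_LEVELS)
N_STATES = 10                              # discretized total traded power
q_table = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, eps = 0.1, 0.9, 0.2          # learning rate, discount, exploration rate

def solve_user_qp(price: float):
    """Placeholder for the user clusters' QP: returns (traded power, user cost)."""
    demand = max(0.0, 120.0 - 100.0 * price)   # toy price-responsive demand
    return demand, price * demand

def discretize(power: float) -> int:
    return min(int(power // 20), N_STATES - 1)

state = 0
for episode in range(500):
    # epsilon-greedy selection over the discretized price set
    if np.random.rand() < eps:
        a = np.random.randint(N_ACTIONS)
    else:
        a = int(np.argmax(q_table[state]))
    price = PRICE_LEVELS[a]

    # users respond independently; only traded power is fed back, preserving privacy
    traded, user_cost = solve_user_qp(price)
    esp_benefit = price * traded                   # stand-in for Eq. (11)
    reward = 0.5 * esp_benefit - 0.5 * user_cost   # Eq. (36) with r = 0.5

    next_state = discretize(traded)
    # standard Q-learning temporal-difference update
    q_table[state, a] += alpha * (reward + gamma * q_table[next_state].max()
                                  - q_table[state, a])
    state = next_state
```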

5. Case Study

5.1. Basic Data

In this paper, the electricity–thermal integrated energy system composed of the ESP and two user clusters, as shown in Figure 1, is selected for simulation analysis. In this example, the given optimal scheduling period is 24 h. IES unit parameters [1,4,39,45] are shown in Table 1. The price of natural gas is 3.25 CNY/m3, the low calorific value of natural gas is 38 MJ/m3, the carbon emission intensity of natural gas is 1.885 kg/m3, the environmental cost coefficient is 0.25 CNY/kg [39], and the time-of-use electricity price [39,46] of the power grid is shown in Table 2. The weight coefficient between the ESP and user benefit function target is 0.5. The electricity, thermal load, and photovoltaic power of each user cluster are shown in Figure 4. The maximum adjustable capacity of power load and thermal load of user cluster 1 are 15% and 10%, respectively, and those of user cluster 2 are 8% and 10%, respectively.
Based on the above basic parameters and the established energy management model of a comprehensive energy system considering users’ energy preferences, this paper carries out a simulation test on a computer with i5-9400F CPU, 32 GB memory, and a Windows 64-bit system.

5.2. Simulation Results

5.2.1. Optimization Result of ESP Pricing and Operation Strategy

Figure 5 shows the optimal electricity price strategy of the ESP, and Figure 6 shows the optimal heating price strategy of the ESP. Figure 7 shows the optimal electric and thermal energy interaction strategy between the ESP and user cluster.
From the optimal electricity and heat price curves of the ESP in Figure 5 and Figure 6, it can be seen that the optimal electricity and heat prices of the ESP are always within the constraints of power grid electricity price and heat market price. In most time periods, the ESP adopts the same pricing strategy as the grid electricity price and thermal market price in order to ensure its own operating efficiency, while in some time periods, the ESP adopts a pricing strategy lower than the thermal market price in order to guide users to participate in the ESP electricity–thermal transactions.
Since the ESP is the only external thermal supply source for users, the optimization of ESP operation should first meet the heating needs of user cluster 1 and user cluster 2 in Figure 7 by optimizing the cooperative heating strategy of the CHP, GB, and HE. On this basis, the ES charging and discharging strategy and the power interaction strategy with the power grid are further optimized to meet the needs of its own low-carbon economic operation. Figure 8 shows the optimal unit operation strategy of the ESP.

5.2.2. Optimization Result of User Energy Management

In the process of energy management, after receiving the electricity–thermal price signal of the ESP, the user cluster will optimize its energy consumption strategy with the goal of minimizing its own operating cost, including optimizing ES and HP operating strategies and optimizing the electricity–thermal load according to the electricity–thermal energy preference. Figure 9 and Figure 10 show the load optimization results and optimal operation strategy of user cluster 1, and Figure 11 and Figure 12 show the load optimization results and optimal operation strategy of user cluster 2, respectively.
As can be seen from Figure 9, the load curve of user cluster 1 shows obvious residential load characteristics. The electric load peaks at 7:00–9:00, 11:00–13:00, and 18:00–20:00, and the heat load is relatively high at 1:00–10:00 and 1:00–24:00. Therefore, during load optimization, user cluster 1 shifts its adjustable load toward off-peak periods.
Under the dual constraints of ESP price strategy and user energy preference, user cluster 1 simultaneously optimizes PV, HP equipment operation strategy, and interaction power with the ESP. As can be seen from Figure 10, user cluster 1 starts the HP for heating from 7:00 to 16:00, gives priority to its own heating through PV power generation, and provides heating through the ESP at night.
As can be seen from Figure 11, the load curve of user cluster 2 shows obvious commercial/office load characteristics. Both the electric load and the heat load peak at 8:00–20:00, varying with business/office hours. Compared with user cluster 1, user cluster 2 performs almost no load regulation during the peak hours of the electric load but greatly reduces its load during non-business/non-office hours, and its thermal load regulation shows characteristics similar to its electric load regulation.
Compared with user cluster 1, user cluster 2 shows obvious differences in its operation optimization strategy because of its ES configuration and different load characteristics. User cluster 2 gives priority to PV power generation to meet the power supply demand during the peak hours of 8:00–20:00 and stores the surplus power in the ES to meet the power demand of the HP heating at night. However, in the peak period of thermal load, considering the economy, user cluster 2 mainly supplies thermal through the ESP in addition to reducing part of the thermal load to ensure the basic thermal load demand.

5.3. Discussion and Analysis

5.3.1. Analysis of Operation Economy and Carbon Emission

Figure 13 is the convergence curve of each user and ESP objective function. According to the calculation data in Figure 13, the algorithm converges after 14 iterations, which takes 6.55 s; when the algorithm converges, the objective function values of user cluster 1, user cluster 2, and ESP are 24,758.50 CNY, 21,182.61 CNY, and 3103.72 CNY, respectively.
In order to further verify the effectiveness of the model established in this paper in improving system operation economy and reducing carbon emissions, this section compares the results of the proposed IDR optimization considering user preferences with those of conventional IDR optimization, in which users directly optimize their loads according to the ESP price without considering their preferences for power supply and heating sources. Table 3 shows the cost–benefit analysis results of the user clusters, Table 4 shows the cost–benefit analysis results of the ESP, and Figure 14 shows the carbon emission results of the system.
As can be seen from the data in Table 3, compared with the conventional IDR optimization, the total operating costs of user cluster 1 and user cluster 2 are reduced by 6.98% and 0.36%, respectively, and the loss costs of user satisfaction are reduced by 11.45% and 17.18%, respectively. For user cluster 1, after considering the energy preference, the adjustment flexibility of load optimization becomes greater, the synergy between load adjustment and unit operation optimization is stronger, and its operation cost and satisfaction loss cost are greatly reduced.
Compared with user cluster 1, after considering the energy preference, user cluster 2 can adjust its electric and thermal loads to better match its energy supply preference, and its user satisfaction loss cost is greatly reduced. However, this also leads to a smaller load adjustment range during the peak period of the electric and thermal loads, so it needs to optimize the PV, ES, and trading power with the ESP and the power grid to meet its energy demand. Its unit operation and maintenance cost increases by 4.04%, the total electricity and heat transaction cost increases by 3.38%, and the total operating cost decreases by 0.36%.
As can be seen from the data in Table 4, compared with the conventional IDR optimization, the total operating benefit of the ESP improved by 10.79% after IDR optimization considering user preferences, and the environmental cost was reduced by 5.05%. Figure 14 shows the analysis of carbon emission results, and Table 5 shows the carbon emission source of the test system.
According to the calculation data in Figure 14 and Table 5, in the case of conventional IDR optimization, the flexibility of ESP operation optimization is limited due to the rigid adjustment of user cluster load, and the total carbon emission of the system reaches 22,946.39 kg. After considering the user’s energy preference, the flexibility of the user’s load adjustment is enhanced, which improves the collaborative optimization ability between the user cluster and ESP, and the flexibility of ESP operation optimization is enhanced. The total carbon emission of the system is 21,787.64 kg, which is 5.05% lower than that of conventional IDR optimization.

5.3.2. Scalability Verification of the Proposed Method

To demonstrate the scalability of the proposed method, a sophisticated test system comprising six user clusters (UCs) is employed for simulation. Specifically, UC1–4 represent residential building clusters, UC5 denotes an office building cluster, and UC6 corresponds to a commercial building cluster. Figure 15 illustrates the electric load profile of this complex test system, while Figure 16 depicts its thermal load profile. Table 6 outlines the equipment parameters for the ESP [1,4,39,45], and Table 7 provides the equipment parameters for each UC [1,4,39,45]. The pricing and operational parameters utilized in testing the complex system remain consistent with those specified in Section 5.1.
Figure 17 shows the iterative process of optimizing the benefits of the ESP and UCs. According to the calculation data in Figure 17, the algorithm converges after 65 iterations, which takes 13.26 s. And when the algorithm converges, the objective function values of the ESP and UC1–UC6 are 1494.31 CNY, 946.71 CNY, 1202.20 CNY, 1271.92 CNY, 1399.38 CNY, 1238.98 CNY, and 1567.32 CNY, respectively.
Figure 18 shows the ESP pricing strategy corresponding to the optimization result in the complex test system, Figure 19 shows the optimized electricity and thermal energy demand of UCs, and Figure 20 shows the energy unit operation strategy of the ESP in the complex test system.
Figure 19 demonstrates a distinct peak-valley pattern in the ESP’s electricity supply curve to UCs. UC power consumption decreases during high-price periods while showing significant increases during low-price periods. Simulation results indicate that between 9:00 and 15:00, influenced by both high electricity prices and sufficient photovoltaic generation, UCs supplied 207.15 kWh of electricity back to ESP.
To ensure energy supply–demand equilibrium, the ESP dynamically adjusts equipment operations to maintain optimal performance. Figure 20 shows the output strategies of electric and thermal units: while ESP satisfies most UC demands internally, it strategically purchases grid electricity during low-price periods to improve cost-efficiency.

5.3.3. Sensitivity Analysis of Key Parameters

In the design of the energy trading system, the weight coefficient r is the core adjustment parameter to balance the interests of the ESP and users, and its value directly affects the operation efficiency and sustainability of the whole system. Through in-depth analysis, we can find that the setting of r value is essentially to find the optimal balance between efficiency and fairness, and many factors need to be considered comprehensively. Figure 21 shows the sensitivity analysis result of weight coefficients in different test systems.
From the basic mechanism, the design of the r value reflects the unique characteristics of the supply and demand game in the energy trading market. As shown in Figure 21, when the r value increases, the system will tilt to the ESP, giving priority to ensuring its profit margin; when the r value decreases, it will pay more attention to protecting the interests of users. This two-way adjustment mechanism provides a flexible means of interest distribution for the system. In actual operation, extreme values often bring negative effects: when r approaches 1, although the short-term income of ESP increases rapidly, an aggressive pricing strategy will lead to soaring user costs, which may lead to user loss in the long run; when r approaches 0, over-protection of users’ interests will inhibit the ESP’s investment enthusiasm and ultimately affect the quality of service.
The most ideal operating state appears in the median range of the r value, namely 0.4–0.7. This interval can produce a significant two-way incentive effect: the ESP can obtain a reasonable profit of about 150% of the benchmark value and, at the same time, control the increase of UC cost within an acceptable range of 20–30%. It is worth noting that the scale of the system plays an important role in adjusting the choice of the r value. The data show that in a large-scale system with six UCs, the reward function can be changed from negative to positive when the r value reaches 0.6, while a small-scale system with two UCs needs a higher r value (0.8+) to achieve the same effect.
From the perspective of development trends, the setting of the r value should be adapted to the market development stage. Using a low r value (0.3–0.5) in the cultivation period is helpful to attract users to participate; after entering the mature stage, appropriately increasing it to 0.6–0.7 can maintain the market vitality. In addition, policymakers can implement precise regulation by setting the floating range of the r value, and it is suggested to maintain a policy intervention space of 0.1, which can not only ensure market autonomy but also prevent systemic risks.
To sum up, setting the weight coefficient is essentially a process of finding the optimal “efficiency–fairness” trade-off. The ideal weight value should keep the system in a dynamic equilibrium in which the ESP has the motivation to keep innovating and users are willing to participate. The simulation results show that keeping the weight in the golden range of 0.4–0.7, combined with a moderate dynamic adjustment mechanism, can promote the efficient operation of integrated energy systems.

6. Conclusions

In this paper, an energy management method for integrated energy systems considering users’ energy preferences is proposed. Accounting for users’ energy preferences for electric and thermal loads, an energy management model of the integrated energy system based on a reinforcement learning framework is established to optimize users’ electric and thermal loads and the system operation strategy, so as to improve the economy of system operation and reduce the carbon emissions of the system.
(1) Considering the two-way energy–information interaction process between the ESP and user clusters, as well as the optimal pricing decision of the ESP and users’ energy preference, an energy management framework based on reinforcement learning is established.
(2) Based on reinforcement learning theory, the relationship model among ESP pricing decision (Action), user feedback (Environment), and energy management objective function (Reward) is established to optimize ESP energy price, user load, and system operation optimization strategy, and a distributed solution algorithm combining Q-learning and QP is proposed.
(3) Simulation analysis verifies the effectiveness of the established energy management model considering user preferences on a typical system. The simulation results demonstrate that the proposed energy management model enhances the economic efficiency of IES operations and reduces carbon emissions. In a test system with two UCs, the optimized system achieves a 5.05% reduction in carbon emissions. The RL-based distributed solution algorithm efficiently solves the energy management model for systems with varying UC scales, requiring only 6.55 s for the system with two UCs and 13.26 s for the system with six UCs. Moreover, keeping the weight coefficient in the golden range of 0.4–0.7, combined with a moderate dynamic adjustment mechanism, promotes the efficient operation of integrated energy systems.

Author Contributions

Conceptualization, Y.C.; Methodology, Y.C., S.Y. and J.X.; Software, P.Y.; Formal analysis, S.Y. and J.X.; Resources, Y.C. and S.S.; Writing – original draft, Y.C., S.Y., P.Y. and J.X.; Supervision, S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the project “Research on User-side Distributed Optical Storage Evaluation and Its Cooperative Operation with Distribution Network” (SGSDDK00PDJS2310147).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

All authors were employed by the State Grid Shandong Electric Power Company.

Appendix A

Table A1. Comparison between this paper and state-of-the-art studies.

| Viewpoint | Reference | Participating Subject | Energy Form | DR Price | Pricing Decision | Optimization Method |
| --- | --- | --- | --- | --- | --- | --- |
| Demand response | Ref. [5] | User cluster | Electricity; thermal; hydrogen | Electricity price | No | Distributionally robust optimization model |
| | Ref. [6] | User cluster | Electricity | Electricity price | No | Multi-objective optimization; horse herd optimization algorithm |
| | Ref. [7] | User cluster | Electricity; thermal | Electricity price; thermal price | No | Two-stage optimal operation; nondominated sorting genetic algorithm (NSGA-II) |
| | Ref. [8] | User cluster | Electricity | Electricity price | No | Segmented linearization approach |
| | Ref. [9] | User cluster | Electricity; thermal | Electricity price | No | Robust optimization technique |
| | Ref. [10] | User cluster | Electricity; thermal | Electricity price | No | Mixed-integer linear programming approach |
| | Ref. [11] | ESP; user cluster | Electricity | Electricity price | Electricity pricing | Stackelberg game approach; distributed algorithm |
| | Ref. [12] | ESP; user cluster | Electricity | Electricity price | Electricity pricing | Two-stage energy sharing; Stackelberg game approach |
| | This paper | ESP; user cluster | Electricity; thermal | Electricity price; thermal price | Electricity and thermal pricing | Reinforcement learning |
| Energy management and operation optimization | Ref. [13] | User cluster | Electricity | Electricity price | No | Hybrid deep learning |
| | Ref. [15] | User cluster | Electricity | Electricity price | Electricity pricing | DR optimization of electric load; mathematical programming |
| | Ref. [16] | User cluster | Electricity | Electricity price | No | Blockchain-based solution |
| | Ref. [17] | User cluster | Electricity | Electricity price | No | Bi-level mixed integer nonlinear programming |
| | Ref. [18] | User cluster | Electricity | Electricity price | No | PSO algorithm |
| | Ref. [19] | User cluster | Electricity | Electricity price | No | Mathematical programming |
| | Ref. [20] | ESP; user cluster | Electricity | Electricity price | Electricity pricing | DR optimization of electric load; Stackelberg game approach |
| | Ref. [21] | User cluster | Electricity | Electricity price | No | Multi-stage RL algorithm |
| | Ref. [22] | User cluster | Electricity | Electricity price | No | Deep reinforcement learning |
| | Ref. [23] | User cluster | Electricity | Electricity price | No | Deep–fuzzy logic control |
| | Ref. [24] | User cluster | Electricity; thermal | Electricity price | No | Electric–thermal collaborative optimization; multiscale multiphysics model |
| | This paper | ESP; user cluster | Electricity; thermal | Electricity price; thermal price | Electricity and thermal pricing | Electric–thermal collaborative optimization; DR optimization of electric–thermal load; reinforcement learning |
| Solution method | Refs. [31,32] | User cluster | Electricity | Electricity price | No | Markov decision making; deep reinforcement learning |
| | Ref. [33] | User cluster | Electricity | Electricity price | No | Markov decision making; multi-agent reinforcement learning |
| | Ref. [34] | User cluster | Electricity | Electricity price | No | Markov decision making; adaptive personalized federated reinforcement learning |
| | This paper | ESP; user cluster | Electricity; thermal | Electricity price; thermal price | Electricity and thermal pricing | Markov decision making; reinforcement learning |

References

  1. Hung, Y.; Liu, N.; Chen, Z.; Xu, J. Collaborative optimization for interconnected energy hubs based on cluster partition of electric-thermal energy network. Energy 2025, 321, 135403. [Google Scholar] [CrossRef]
  2. Wang, Y.; Zhao, X.; Huang, Y. Low-Carbon-Oriented Capacity Optimization Method for Electric–Thermal Integrated Energy System Considering Construction Time Sequence and Uncertainty. Processes 2024, 12, 648. [Google Scholar] [CrossRef]
  3. Hu, J.; Wang, Y.; Dong, L. Low carbon-oriented planning of shared energy storage station for multiple integrated energy systems considering energy-carbon flow and carbon emission reduction. Energy 2024, 290, 130139. [Google Scholar] [CrossRef]
  4. Wang, Y.; Hu, J. Two-stage energy management method of integrated energy system considering pre-transaction behavior of energy service provider and users. Energy 2023, 271, 127065. [Google Scholar] [CrossRef]
  5. Zhang, Z.; Niu, D.; Yun, J.; Siqin, Z. Distributionally Robust Optimization Model of the Integrated Energy System with Integrated Demand Response. J. Energy Eng. 2025, 151, 04025022. [Google Scholar] [CrossRef]
  6. Du, X.; Yang, Y.; Guo, H. Optimizing integrated hydrogen technologies and demand response for sustainable multi-energy microgrids. Electr. Eng. 2025, 107, 2621–2643. [Google Scholar] [CrossRef]
  7. Wang, X.; Jiao, Y.; Cui, B.; Zhu, H.; Wang, R. Two-Stage Optimal Operation of Integrated Energy System Considering Electricity–Heat Demand Response and Time-of-Use Energy Price. Int. J. Energy Res. 2025, 2025, 6106019. [Google Scholar] [CrossRef]
  8. Ma, C.; Hu, Z. Low-Carbon Economic Scheduling of Integrated Energy System Considering Flexible Supply–Demand Response and Diversified Utilization of Hydrogen. Sustainability 2025, 17, 1749. [Google Scholar] [CrossRef]
  9. Mobarakeh, S.I.; Sadeghi, R.; Saghafi, H.; Delshad, M. Hierarchical integrated energy system management considering energy market, demand response and uncertainties: A robust optimization approach. Comput. Electr. Eng. 2025, 123, 110138. [Google Scholar] [CrossRef]
  10. Moosavi, M.; Olamaei, J.; Shourkaei, H.M. Optimizing microgrid performance a multi-objective strategy for integrated energy management with hybrid sources and demand response. Sci. Rep. 2025, 15, 17827. [Google Scholar] [CrossRef]
  11. Lin, W.-T.; Chen, G.; Li, J.; Lei, Y.; Zhang, W.; Yang, D.; Ming, T. Privacy-preserving incentive mechanism for integrated demand response: A homomorphic encryption-based approach. Int. J. Electr. Power Energy Syst. 2025, 164, 110407. [Google Scholar] [CrossRef]
  12. Bian, Y.; Xie, L.; Ma, L.; Cui, C. A novel two-stage energy sharing model for data center cluster considering integrated demand response of multiple loads. Appl. Energy 2025, 384, 125454. [Google Scholar] [CrossRef]
  13. Ibude, F.; Otebolaku, A.; Ameh, J.E.; Ikpehai, A. Multi-Timescale Energy Consumption Management in Smart Buildings Using Hybrid Deep Artificial Neural Networks. J. Low Power Electron. Appl. 2024, 14, 54. [Google Scholar] [CrossRef]
  14. Ahmad, W.; Lucas, A.; Carvalhosa, S.M.P. Battery Control for Node Capacity Increase for Electric Vehicle Charging Support. Energies 2024, 17, 5554. [Google Scholar] [CrossRef]
  15. Asaleye, D.A.; Murphy, D.J.; Shine, P.; Murphy, M.D. The Practical Impact of Price-Based Demand-Side Management for Occupants of an Office Building Connected to a Renewable Energy Microgrid. Sustainability 2024, 16, 8120. [Google Scholar] [CrossRef]
  16. Kolahan, A.; Maadi, S.R.; Teymouri, Z.; Schenone, C. Blockchain-based solution for energy demand-side management of residential buildings. Sustain. Cities Soc. 2021, 75, 103316. [Google Scholar] [CrossRef]
  17. Al-Quraan, A.; Al-Mhairat, B. Sizing and energy management of standalone hybrid renewable energy systems based on economic predictive control. Energy Convers. Manag. 2024, 300, 117948. [Google Scholar] [CrossRef]
  18. Sun, Y.; Luo, Z.; Li, Y.; Zhao, T. Grey-box model-based demand side management for rooftop PV and air conditioning systems in public buildings using PSO algorithm. Energy 2024, 296, 131052. [Google Scholar] [CrossRef]
  19. Zhang, K.; Saloux, E.; Candanedo, J.A. Enhancing energy flexibility of building clusters via supervisory room temperature control: Quantification and evaluation of benefits. Energy Build. 2024, 302, 113750. [Google Scholar] [CrossRef]
  20. Ying, C.; Zou, Y.; Xu, Y. Decentralized energy management of a hybrid building cluster via peer-to-peer transactive energy trading. Appl. Energy 2024, 372, 123803. [Google Scholar] [CrossRef]
  21. Wang, Z.; Xiao, F.; Ran, Y.; Li, Y.; Xu, Y. Scalable energy management approach of residential hybrid energy system using multi-agent deep reinforcement learning. Appl. Energy 2024, 367, 123414. [Google Scholar] [CrossRef]
  22. Mocanu, E.; Mocanu, D.C.; Nguyen, P.H.; Liotta, A.; Webber, M.E.; Gibescu, M.; Slootweg, J.G. On-Line Building Energy Optimization Using Deep Reinforcement Learning. IEEE Trans. Smart Grid 2019, 10, 3698–3708. [Google Scholar] [CrossRef]
  23. Cavus, M.; Dissanayake, D.; Bell, M. Deep-Fuzzy Logic Control for Optimal Energy Management: A Predictive and Adaptive Framework for Grid-Connected Microgrids. Energies 2025, 18, 995. [Google Scholar] [CrossRef]
  24. Zhou, Y.P.; Wang, L.X.; Wang, B.Y.; Chen, Y.; Ran, C.X.; Wu, Z.B. Polarization management of photonic crystals to achieve synergistic optimization of optical, thermal, and electrical performance of building-integrated photovoltaic glazing. Appl. Energy 2024, 372, 123827. [Google Scholar] [CrossRef]
  25. Liang, H.; Lin, C.; Pang, A. Expert knowledge data-driven based actor–critic reinforcement learning framework to solve computationally expensive unit commitment problems with uncertain wind energy. Int. J. Electr. Power Energy Syst. 2024, 159, 110033. [Google Scholar] [CrossRef]
  26. Alharbi, M.; Alghamdi, A.S. Hybrid CNN-GRU Forecasting and Improved Teaching–Learning-Based Optimization for Cost-Efficient Microgrid Energy Management. Processes 2025, 13, 1452. [Google Scholar] [CrossRef]
  27. Ding, L.; Cui, Y.; Yan, G.; Huang, Y.; Fan, Z. Distributed energy management of multi-area integrated energy system based on multi-agent deep reinforcement learning. Int. J. Electr. Power Energy Syst. 2024, 157, 109867. [Google Scholar] [CrossRef]
  28. Shuai, Q.; Yin, Y.; Huang, S.; Chen, C. Deep Reinforcement Learning-Based Real-Time Energy Management for an Integrated Electric–Thermal Energy System. Sustainability 2025, 17, 407. [Google Scholar] [CrossRef]
  29. Li, Y.; Ma, W.; Li, Y.; Li, S.; Chen, Z.; Shahidehpour, M. Enhancing cyber-resilience in integrated energy system scheduling with demand response using deep reinforcement learning. Appl. Energy 2025, 379, 124831. [Google Scholar] [CrossRef]
  30. Gong, J.; Yu, N.; Han, F.; Tang, B.; Wu, H.; Ge, Y. Bias Correction of Data-Driven Deep Reinforcement Learning in Economic Scheduling of Integrated Energy Systems. In Proceedings of the International Conference on Life System Modeling and Simulation, International Conference on Intelligent Computing for Sustainable Energy and Environment, Suzhou, China, 13–15 September 2024; Springer: Singapore, 2025. [Google Scholar] [CrossRef]
  31. Lian, J.; Li, D.; Li, L. Real-Time Energy Management Strategy for Fuel Cell/Battery Plug-In Hybrid Electric Buses Based on Deep Reinforcement Learning and State of Charge Descent Curve Trajectory Control. Energy Technol. 2025, 13, 2401696. [Google Scholar] [CrossRef]
  32. Li, Y.; Chang, W.; Yang, Q. Deep reinforcement learning based hierarchical energy management for virtual power plant with aggregated multiple heterogeneous microgrids. Appl. Energy 2025, 382, 125333. [Google Scholar] [CrossRef]
  33. Liao, Z.; Li, C.; Zhang, X.; Hu, Q.; Wang, B. A Bidding Strategy for Power Suppliers Based on Multi-Agent Reinforcement Learning in Carbon–Electricity–Coal Coupling Market. Energies 2025, 18, 2388. [Google Scholar] [CrossRef]
  34. Wang, T.; Dong, Z.Y. Adaptive personalized federated reinforcement learning for multiple-ESS optimal market dispatch strategy with electric vehicles and photovoltaic power generations. Appl. Energy 2024, 365, 123107. [Google Scholar] [CrossRef]
  35. Wang, L.; Tao, Z.; Zhu, L.; Wang, X.; Yin, C.; Cong, H.; Bi, R.; Qi, X. Optimal dispatch of integrated energy system considering integrated demand response resource trading. IET Gener. Transm. Distrib. 2022, 16, 1727–1742. [Google Scholar] [CrossRef]
  36. Lu, J.; Liu, T.; He, C.; Nan, L.; Hu, X. Robust day-ahead coordinated scheduling of multi-energy systems with integrated heat-electricity demand response and high penetration of renewable energy. Renew. Energy 2021, 178, 466–482. [Google Scholar] [CrossRef]
  37. Lu, R.; Hong, S.H.; Zhang, X. A Dynamic pricing demand response algorithm for smart grid: Reinforcement learning approach. Appl. Energy 2018, 220, 220–230. [Google Scholar] [CrossRef]
  38. Ma, L.; Liu, J.; Wang, Q. Energy Management and Pricing Strategy of Building Cluster Energy System Based on Two-Stage Optimization. Front. Energy Res. 2022, 10, 865190. [Google Scholar] [CrossRef]
  39. Huang, Y.; Wang, Y.; Liu, N. Low-carbon economic dispatch and energy sharing method of multiple Integrated Energy Systems from the perspective of System of Systems. Energy 2022, 244, 122717. [Google Scholar] [CrossRef]
  40. Lin, Z.; Wang, S.; Wang, S.; Zhao, Q. Distributed coordinated dispatching of district electric-thermal integrated energy system considering ladder-type carbon trading mechanism. Power Syst. Syst. Syst. Technol. 2023, 47, 217–229. [Google Scholar] [CrossRef]
  41. Gao, X.; Tao, H.; Miao, G.; Ping, Z. Joint optimization control of energy storage system management and demand response. J. Syst. Simul. 2016, 28, 1165–1172. [Google Scholar] [CrossRef]
  42. Ma, L.; Cheng, Y. Optimal operation for park integrated energy system considering interruptible loads. J. Syst. Simul. 2022, 34, 817–825. [Google Scholar] [CrossRef]
  43. Huang, H.; Chen, X.; Cha, J. Partition autonomous energy cooperation community and its joint optimal scheduling for multi-park integrated energy system. Power Syst. Technol. 2022, 46, 2955–2965. [Google Scholar] [CrossRef]
  44. Zeng, X.; Liu, T.; Li, Q.; He, C.; Xiao, H.; Qin, H. A Mixed Integer Quadratic Programming Model and Algorithm Study for Power Balance Problem of High Hydropower Proportion’s System. Proc. CSEE 2017, 37, 1114–1125. [Google Scholar] [CrossRef]
  45. Luo, J.; Li, Y.; Wang, H.; Li, G.; Song, Y.; Long, W. Optimal Scheduling of Integrated Energy System Considering Integrated Demand Response. In Proceedings of the 2023 8th Asia Conference on Power and Electrical Engineering (ACPEE), Tianjin, China, 14–16 April 2023; pp. 110–116. [Google Scholar] [CrossRef]
  46. Huang, Y.; Wang, Y.; Liu, N. A two-stage energy management for heat-electricity integrated energy system considering dynamic pricing of Stackelberg game and operation strategy optimization. Energy 2022, 244, 122576. [Google Scholar] [CrossRef]
Figure 1. Typical structure of the IES with the ESP and users.
Figure 2. Energy management framework based on RL.
Figure 3. Flow chart of the Q-learning–QP algorithm.
Figure 4. Electricity load, thermal load, and PV power of users.
Figure 5. Electricity pricing strategy for the ESP.
Figure 6. Thermal pricing strategy for the ESP.
Figure 7. Energy interaction between the ESP and users.
Figure 8. Optimal unit operation strategy of the ESP.
Figure 9. Load optimization results of user 1.
Figure 10. Optimal operation strategy of user 1.
Figure 11. Load optimization results of user 2.
Figure 12. Optimal operation strategy of user 2.
Figure 13. Convergence curves of all parties.
Figure 14. Analysis of carbon emission results.
Figure 15. Electric load in the test system.
Figure 16. Thermal load in the test system.
Figure 17. Convergence curves of all parties in the test system.
Figure 18. Energy pricing strategy for the ESP in the test system.
Figure 19. Electricity and thermal energy demand of UCs in the test system.
Figure 20. Optimal unit operation strategy of the ESP in the test system.
Figure 21. Sensitivity analysis results of weight coefficients in different test systems.
Table 1. Equipment parameters in the IES.
Equipment | Parameters | ESP | User Cluster 1 | User Cluster 2
CHP | Capacity | 1500 kW | - | -
CHP | Operating cost coefficient | 0.05 CNY/kWh | - | -
CHP | Thermal–electric ratio | 1.35 | - | -
HP | Capacity | - | 1500 kW | 1000 kW
HP | Operating cost coefficient | - | 0.026 CNY/kWh | 0.026 CNY/kWh
HP | Heating efficiency | - | 0.90 | 0.86
GB | Capacity | 1500 kW | - | -
GB | Operating cost coefficient | 0.15 CNY/kWh | - | -
GB | Heating efficiency | 0.85 | - | -
PV | Capacity | - | 1000 kW | 2500 kW
PV | Operating cost coefficient | - | 0.025 CNY/kWh | 0.025 CNY/kWh
ES | Capacity | 1000 kW | - | 2000 kW
ES | Operating cost coefficient | 0.02 CNY/kWh | - | 0.02 CNY/kWh
HS | Capacity | 1000 kW | - | -
HS | Operating cost coefficient | 0.02 CNY/kWh | - | -
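For readers who wish to reproduce the case study, the parameters in Table 1 can be collected into a single configuration object. The sketch below is a minimal illustration in Python; the dictionary layout and identifier names (e.g., IES_EQUIPMENT, capacity_kW) are our own and are not taken from the authors' implementation, while all values are copied directly from Table 1.

```python
# Minimal sketch: Table 1 equipment parameters as a nested dictionary,
# keyed by owner -> unit -> parameter. Values are copied from Table 1.
IES_EQUIPMENT = {
    "ESP": {
        "CHP": {"capacity_kW": 1500, "cost_CNY_per_kWh": 0.05, "thermal_electric_ratio": 1.35},
        "GB":  {"capacity_kW": 1500, "cost_CNY_per_kWh": 0.15, "heating_efficiency": 0.85},
        "ES":  {"capacity_kW": 1000, "cost_CNY_per_kWh": 0.02},
        "HS":  {"capacity_kW": 1000, "cost_CNY_per_kWh": 0.02},
    },
    "UC1": {
        "HP": {"capacity_kW": 1500, "cost_CNY_per_kWh": 0.026, "heating_efficiency": 0.90},
        "PV": {"capacity_kW": 1000, "cost_CNY_per_kWh": 0.025},
    },
    "UC2": {
        "HP": {"capacity_kW": 1000, "cost_CNY_per_kWh": 0.026, "heating_efficiency": 0.86},
        "PV": {"capacity_kW": 2500, "cost_CNY_per_kWh": 0.025},
        "ES": {"capacity_kW": 2000, "cost_CNY_per_kWh": 0.02},
    },
}
```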
Table 2. Wholesale price and elasticity of electricity.
Price Type | Off-Peak | Mid-Peak | On-Peak
Time | 1:00–8:00 | 9:00–11:00; 16:00–17:00; 23:00–24:00 | 12:00–15:00; 20:00–22:00
Selling Price | 0.45 CNY/kWh | 0.86 CNY/kWh | 1.35 CNY/kWh
Buying Price | 0.40 CNY/kWh (all periods)
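For simulation purposes, the time-of-use schedule in Table 2 can be encoded as a simple lookup. The Python sketch below reflects only the periods listed in the table; the function name and error handling are ours, and hours 18:00–19:00, which Table 2 does not assign to any period, are deliberately left uncovered rather than given an assumed price.

```python
# Minimal sketch of the Table 2 time-of-use selling prices (CNY/kWh).
ON_PEAK_HOURS = set(range(12, 16)) | set(range(20, 23))   # 12:00-15:00, 20:00-22:00
MID_PEAK_HOURS = set(range(9, 12)) | {16, 17, 23, 24}     # 9:00-11:00, 16:00-17:00, 23:00-24:00
OFF_PEAK_HOURS = set(range(1, 9))                         # 1:00-8:00
BUYING_PRICE = 0.40                                       # flat buying price, CNY/kWh

def selling_price(hour: int) -> float:
    """Return the wholesale selling price for an hour index in 1..24."""
    if hour in ON_PEAK_HOURS:
        return 1.35
    if hour in MID_PEAK_HOURS:
        return 0.86
    if hour in OFF_PEAK_HOURS:
        return 0.45
    # Hours 18:00-19:00 are not assigned a period in Table 2.
    raise ValueError(f"hour {hour} is not covered by Table 2")
```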
Table 3. Cost–benefit analysis of users.
User Cost/CNY | General Integrated Demand Response Optimization (User Cluster 1 / User Cluster 2) | Optimization of Integrated Demand Response Considering User Preference (User Cluster 1 / User Cluster 2)
Electricity transaction cost with ESP | 13,322.06 / 5968.64 | 13,489.56 / 4892.61
Thermal transaction cost with ESP | 10,244.04 / 8644.99 | 8536.97 / 8536.97
Electricity transaction cost with power grid | 0.00 / 2044.99 | 0.00 / 3793.04
Satisfaction loss | 2780.52 / 3892.95 | 2462.21 / 3224.27
Operation and maintenance cost | 269.77 / 707.12 | 269.77 / 735.72
Total operating cost | 26,616.39 / 21,258.70 | 24,758.50 / 21,182.61
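In each column of Table 3, the total operating cost is the sum of the five cost items above it; for user cluster 1 under general integrated demand response optimization, for example, 13,322.06 + 10,244.04 + 0.00 + 2780.52 + 269.77 = 26,616.39 CNY.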
Table 4. Cost–benefit analysis of the ESP.
ESP Benefit Composition/CNY | General Integrated Demand Response Optimization | Optimization of Integrated Demand Response Considering User Preference
Revenue from electricity interaction with power grid | 3601.02 | 3245.98
Revenue from electricity interaction with users | 19,290.70 | 18,382.17
Revenue from thermal energy interaction with users | 18,889.03 | 17,073.93
Natural gas cost | 29,778.75 | 28,274.97
Operation and maintenance cost | 3464.13 | 1876.47
Environmental cost | 5736.60 | 5446.91
Total operating benefit | 2801.27 | 3103.72
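In Table 4, the total operating benefit equals the three revenue items minus the natural gas, operation and maintenance, and environmental costs; under general integrated demand response optimization, for example, 3601.02 + 19,290.70 + 18,889.03 − 29,778.75 − 3464.13 − 5736.60 = 2801.27 CNY.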
Table 5. Carbon emission source of the test system.
Carbon Emission | General Integrated Demand Response Optimization | Optimization of Integrated Demand Response Considering User Preference
Carbon emission of CHP | 15,080.48 | 14,426.84
Carbon emission of GB | 7865.91 | 7360.80
Total carbon emission | 22,946.39 | 21,787.64
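From the totals in Table 5, the relative emission reduction achieved by the preference-aware optimization can be checked directly. The short Python sketch below is only a worked arithmetic example based on the table values; the variable names are ours.

```python
# Worked check of the emission reduction implied by Table 5.
total_general = 22946.39      # total carbon emission, general IDR optimization
total_preference = 21787.64   # total carbon emission, preference-aware IDR optimization
reduction_pct = (total_general - total_preference) / total_general * 100
print(f"Carbon emission reduction: {reduction_pct:.2f}%")  # about 5.05%
```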
Table 6. Equipment parameters of the ESP.
Equipment | Parameters | Value
CHP | Capacity | 700 kW
CHP | Operating cost coefficient | 0.05 CNY/kWh
CHP | Thermal–electric ratio | 1.35
HP | Capacity | 200 kW
HP | Operating cost coefficient | 0.026 CNY/kWh
HP | Heating efficiency | 0.90
ES | Capacity | 500 kW
ES | Operating cost coefficient | 0.02 CNY/kWh
HS | Capacity | 500 kW
HS | Operating cost coefficient | 0.02 CNY/kWh
Table 7. Equipment parameters of UCs.
Equipment | Parameters | UC 1–UC 4 | UC 5 | UC 6
HP | Capacity | - | - | 150 kW
HP | Operating cost coefficient | - | - | 0.026 CNY/kWh
HP | Heating efficiency | - | - | 0.86
PV | Capacity | 50 kW | 100 kW | 100 kW
PV | Operating cost coefficient | 0.025 CNY/kWh | 0.025 CNY/kWh | 0.025 CNY/kWh
ES | Capacity | - | 100 kW | -
ES | Operating cost coefficient | - | 0.02 CNY/kWh | -
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
