The proposed game model can be treated as a bi-level optimization problem. In the upper-level problem, the MGO (leader) aims to maximize its daily profit by determining the optimal electricity and heat prices. In the lower-level problem, the user aggregator (follower) aims to maximize its daily profit by optimizing its flexible electricity load schedule, heat load curtailment, and its participation in the shared energy storage system.
To handle the non-linearity of the upper-level problem (MGO pricing), the Harris Hawks Optimization (HHO) algorithm is employed for its robust global search capability. The lower-level problem, in contrast, is formulated as a mixed-integer linear programming (MILP) model and solved exactly with the CPLEX solver.
6.3.1. Harris Hawks Optimization Method
The HHO algorithm is inspired by the cooperative behavior and surprise attacks of Harris hawks [28]. It operates in the following three phases: global exploration, a transition from exploration to exploitation, and local exploitation. Within the algorithm, each hawk’s position is a candidate solution, while the prey represents the best solution.
(1) Global exploration
In the exploration phase, the Harris hawks perch randomly at different locations, using their sharp eyesight to track and detect prey across the search space. A global search for the prey is conducted using one of two strategies, each chosen with equal probability. If q < 0.5, each hawk moves based on the positions of other members and the prey. If q ≥ 0.5, each hawk perches on a random tree within the range of the population. The corresponding equations are as follows:
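In the standard HHO notation of [28], this two-branch update can be written as below. LB and UB, the lower and upper bounds of the decision variables (here, the price limits), are assumed symbols that this section does not define.

```latex
X(t+1) =
\begin{cases}
X_{rand}(t) - r_1 \left| X_{rand}(t) - 2 r_2 X(t) \right|, & q \ge 0.5 \\[4pt]
\left( X_{rabbit}(t) - X_m(t) \right) - r_3 \left( LB + r_4 \,(UB - LB) \right), & q < 0.5
\end{cases}
```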
where X(t) and X(t + 1) are the positions of an individual in the current and the next iteration, respectively, and t is the current iteration number. Xrand(t) is the position of a randomly selected individual, and Xrabbit(t) is the position of the prey (i.e., the individual with the best fitness). r1, r2, r3, r4, and q are random numbers in the interval [0, 1]. The parameter q is used to randomly select the strategy. Xm(t) is the average position of the population, expressed as
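A reconstruction in the usual HHO form, assuming N denotes the number of hawks (the population size, written m in Section 6.3.2) and X_i(t) the position of the i-th hawk at iteration t:

```latex
X_m(t) = \frac{1}{N} \sum_{i=1}^{N} X_i(t)
```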
(2) Transition from exploration to exploitation
The HHO algorithm divides the hunting process into exploration and exploitation behaviors based on the predatory habits of Harris hawks. During its escape, the prey’s energy gradually decreases. Therefore, the prey’s escape energy is used to dynamically select between exploration and exploitation. The prey’s escape energy is defined as
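Consistent with the symbols defined for this equation, the standard HHO expression from [28] is:

```latex
E = 2 E_0 \left( 1 - \frac{t}{M} \right)
```

Because |E0| ≤ 1, this form bounds |E| by 2(1 − t/M), so |E| ≥ 1 can only occur while t ≤ M/2; in the second half of the run every hawk is in the exploitation phase.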
where E0 is the initial escape energy of the prey, a random number in the interval [−1, 1]; t is the current iteration number; and M is the maximum number of iterations. When |E| ≥ 1, the algorithm is in the exploration phase. When |E| < 1, it transitions to the exploitation phase.
(3) Local exploitation
In the exploitation phase, after spotting the prey, the hawks encircle it, awaiting a chance to strike. The hunt is complex, as the prey can still escape, forcing the hawks to adapt. To model this, HHO uses the following four strategies: soft besiege, hard besiege, soft besiege with progressive rapid dives, and hard besiege with progressive rapid dives.
Let Sp be the prey’s escape probability, a random number in the interval (0, 1). Sp < 0.5 indicates that the prey has an opportunity to escape. The hunting strategy is then determined by combining the prey’s escape energy, |E|, and its escape probability, Sp.
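The selection rule over |E| and Sp can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the thresholds are the ones stated in the four cases of this subsection, and the escape-energy formula is the standard HHO form E = 2·E0·(1 − t/M), assumed here.

```python
def escape_energy(e0, t, max_iter):
    """Escape energy in the standard HHO form (assumed): E = 2 * E0 * (1 - t / M)."""
    return 2 * e0 * (1 - t / max_iter)


def select_strategy(energy, sp):
    """Map escape energy E and escape probability Sp to a phase or besiege strategy."""
    e = abs(energy)
    if e >= 1:
        return "exploration"
    if sp >= 0.5:
        # Cases 1 and 2: no rapid dives.
        return "soft besiege" if e >= 0.5 else "hard besiege"
    # Cases 3 and 4: besiege combined with progressive rapid dives.
    if e >= 0.5:
        return "soft besiege with progressive rapid dives"
    return "hard besiege with progressive rapid dives"
```

In a full implementation each hawk would draw its own E0 and Sp at every iteration, so different hawks can follow different strategies within the same iteration.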
Case 1: Soft Besiege (0.5 ≤ |E| < 1 and Sp ≥ 0.5)
The prey still has enough energy to escape and attempts to break out of the encirclement through random jumps. In this scenario, the hawks employ a soft besiege to exhaust the prey, creating an opportunity for a sudden surprise attack. The position update formula is as follows:
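In the notation of [28], the soft-besiege update is usually written as follows, with ΔX(t) the difference between the prey's position and the hawk's current position:

```latex
X(t+1) = \Delta X(t) - E \left| J \, X_{rabbit}(t) - X(t) \right|,
\qquad
\Delta X(t) = X_{rabbit}(t) - X(t)
```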
where ΔX(t) represents the difference between the prey’s position and the individual’s current position, and J is a random number in the interval [0, 2] that models the strength of the prey’s random jumps.
Case 2: Hard Besiege (|E| < 0.5 and Sp ≥ 0.5)
The prey has no energy left to escape. The Harris hawks then employ a hard besiege to capture the prey for the final surprise attack. The position update formula is as follows:
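Assuming the standard form from [28], where ΔX(t) = X_rabbit(t) − X(t) as in Case 1, the hard-besiege update reads:

```latex
X(t+1) = X_{rabbit}(t) - E \left| \Delta X(t) \right|
```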
Case 3: Soft Besiege with Progressive Rapid Dives (0.5 ≤ |E| < 1 and Sp < 0.5)
The prey has sufficient energy to evade the hawks. However, the hawks will employ a soft besiege with progressive rapid dives, gradually correcting their position and direction based on the prey’s deceptive maneuvers. This is implemented by comparing two potential moves and selecting the better one. The update formula is as follows:
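A reconstruction following [28] is given below. Note the assumption: [28] states the comparison for a minimization problem, so for a profit-maximizing fitness the inequalities would be reversed (or the fitness negated). If neither candidate improves on the current position, the hawk stays where it is.

```latex
Y = X_{rabbit}(t) - E \left| J \, X_{rabbit}(t) - X(t) \right|
\\
Z = Y + S \times LF(D)
\\
X(t+1) =
\begin{cases}
Y, & \text{if } F(Y) < F(X(t)) \\
Z, & \text{if } F(Z) < F(X(t))
\end{cases}
```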
where D is the dimension; F(·) is the fitness function; S is a D-dimensional random vector with elements that are random numbers in [0, 1]; and LF(·) is the Lévy flight function.
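The section does not spell out LF(·). A common choice, used in the original HHO paper, is a Mantegna-style step with β = 1.5 and a 0.01 scaling factor; the sketch below assumes that choice, and the function name and normal draws for u and v are illustrative assumptions rather than details taken from this paper.

```python
import math
import random


def levy_flight(dim, beta=1.5, rng=random):
    """Draw a dim-dimensional Levy-flight step with Mantegna's algorithm (assumed form)."""
    # Scale of the numerator draw, chosen so the step follows a Levy-stable tail.
    sigma = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    step = []
    for _ in range(dim):
        u = rng.gauss(0.0, sigma)  # numerator: normal with standard deviation sigma
        v = rng.gauss(0.0, 1.0)    # denominator: standard normal
        step.append(0.01 * u / abs(v) ** (1 / beta))
    return step
```

Most draws are small, but the heavy tail occasionally produces a long jump, which is what lets the dive candidate Z escape a poor local region.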
Case 4: Hard Besiege with Progressive Rapid Dives (|E| < 0.5 and Sp < 0.5)
The prey is exhausted but still has a chance to escape. The hawks employ a hard besiege with progressive rapid dives. The position update for this strategy is similar to Case 3, except that the hawks now try to reduce the distance between the population’s average position and the prey. The update formula is as follows:
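Following [28], the only change from Case 3 is that the average position X_m(t) replaces X(t) inside the absolute value. As in Case 3, the inequalities are written for minimization and would be reversed for a profit-maximizing fitness.

```latex
Y = X_{rabbit}(t) - E \left| J \, X_{rabbit}(t) - X_m(t) \right|
\\
Z = Y + S \times LF(D)
\\
X(t+1) =
\begin{cases}
Y, & \text{if } F(Y) < F(X(t)) \\
Z, & \text{if } F(Z) < F(X(t))
\end{cases}
```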
6.3.2. The Solution Procedure
The detailed solution procedure is as follows:
(1) Initialization: Set up the parameters for the MGO, the shared energy storage system, and the user aggregator. Initialize the iteration counter k = 0, the population size m = 30, and the maximum number of iterations to 10.
(2) Initial price generation: Use the HHO algorithm to randomly generate an initial population of m candidate solutions, where each solution is a set of electricity and heat prices. These price sets are transmitted to the user aggregator.
(3) Iteration update: Increment the iteration counter: k = k + 1.
(4) Follower optimization: For each of the m price sets, the profit-maximization problem in Equation (30) is solved with the CPLEX solver. This determines the optimal flexible electricity load schedule, heat load curtailment, and shared energy storage participation. The user aggregator then calculates and stores its current daily profit and returns the purchased electricity and heat quantities to the MGO.
(5) Leader profit evaluation: For each of the m scenarios, the MGO calculates its daily profit based on the purchased electricity and heat quantities reported by the user aggregator.
(6) HHO update and re-evaluation: The HHO algorithm updates the population through its exploration and exploitation mechanisms, with Equation (32) as the fitness function. Steps (4)–(5) are repeated for the new population to calculate the corresponding MGO profit and user aggregator profit.
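(7) Solution update: If the MGO’s profit for a new candidate exceeds the best profit found so far, update the stored best MGO profit and the corresponding user aggregator profit; otherwise, both remain unchanged.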
(8) Convergence check: If k reaches the maximum number of iterations, the algorithm terminates and outputs the best solution. Otherwise, return to step (3).
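The nesting of the two levels can be illustrated with a small runnable sketch. Everything in it is a stand-in chosen for illustration: a single scalar price replaces the electricity and heat price vectors, a closed-form linear demand (`follower_response`) replaces the CPLEX-solved MILP of Equation (30), `leader_profit` replaces Equation (32), and the population update uses only the random-perch and hard-besiege moves rather than full HHO.

```python
import random


def follower_response(price, a=10.0, b=1.0):
    """Toy stand-in for the follower MILP: purchased quantity falls linearly with price."""
    return max(0.0, a - b * price)


def leader_profit(price, quantity, unit_cost=2.0):
    """Toy leader profit: price margin times the quantity reported by the follower."""
    return (price - unit_cost) * quantity


def evaluate(population):
    """Steps (4)-(5): solve the follower for each price, then score the leader."""
    return [leader_profit(p, follower_response(p)) for p in population]


def solve_bilevel(m=30, max_iter=10, lb=2.0, ub=10.0, seed=0):
    rng = random.Random(seed)
    # Step (2): random initial prices, evaluated once before iterating.
    population = [rng.uniform(lb, ub) for _ in range(m)]
    profits = evaluate(population)
    best_profit = max(profits)
    best_price = population[profits.index(best_profit)]
    for k in range(1, max_iter + 1):  # steps (3) and (8)
        # Step (6): reduced update, random perch or hard besiege only.
        new_population = []
        for x in population:
            energy = 2 * rng.uniform(-1, 1) * (1 - k / max_iter)
            if abs(energy) >= 1:
                x_new = rng.uniform(lb, ub)
            else:
                x_new = best_price - energy * abs(best_price - x)
            new_population.append(min(ub, max(lb, x_new)))  # keep prices in bounds
        population = new_population
        profits = evaluate(population)
        # Step (7): keep the incumbent unless a candidate is strictly better.
        candidate = max(profits)
        if candidate > best_profit:
            best_profit = candidate
            best_price = population[profits.index(candidate)]
    return best_price, best_profit
```

For this toy model the leader's profit is (p − 2)(10 − p), whose analytic maximum is 16 at p = 6, which gives a simple reference for checking that `solve_bilevel()` behaves sensibly. In the actual procedure, each call to `follower_response` corresponds to one MILP solve, so the per-iteration cost is dominated by the m follower problems.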