Article

A Multi-Agent Closed-Loop Decision-Making Framework for Joint Forecasting and Bidding in Electricity Spot Markets

1
Jiangsu Institute of Smart Energy Utilization and Low-Carbon Technologies Co., Ltd., Nanjing 211168, China
2
Shenzhen Shenneng Innovation Technology Co., Ltd., Shenzhen 518028, China
3
School of Electrical and Power Engineering, Hohai University, Nanjing 211100, China
4
Electric Power Research Institute, State Grid Jiangsu Electric Power Co., Ltd., Nanjing 211103, China
5
College of Electrical and Electronic Engineering, North China Electric Power University, Beijing 102206, China
6
Shenzhen Energy Group Co., Ltd., Shenzhen 518000, China
*
Authors to whom correspondence should be addressed.
Energies 2025, 18(24), 6486; https://doi.org/10.3390/en18246486
Submission received: 11 October 2025 / Revised: 27 November 2025 / Accepted: 5 December 2025 / Published: 11 December 2025

Abstract

With increasing renewable energy integration, electricity spot markets exhibit high volatility and uncertainty, making it difficult to balance profit and risk. To address this challenge, this paper proposes Joint (Version 1.0), a multi-agent closed-loop framework that integrates forecasting, strategy, and feedback for coordinated decision-making. The Prediction Agent learns statistical patterns of price spreads to generate distributional forecasts, directional probabilities, and extreme-value indicators; the Strategy Agent adaptively maps these signals into executable bidding ratios through a hybrid mechanism; and the Feedback Agent incorporates settlement results for performance evaluation, CVaR-based risk control, and preference-driven optimization. These agents form a dynamic “forecast–strategy–feedback” loop enabling self-improving trading. Experimental results show that Joint achieves a monthly profit of 146,933.46 CNY with strong classification performance (Precision = 53.25%, Recall = 40.45%, AA = 56.05%, SWA = 57.36%), and the complete model in ablation experiments reaches 157,746.64 CNY, demonstrating the indispensable contributions of each component and confirming its robustness and practical value in volatile electricity spot markets.

1. Introduction

1.1. Motivation

In recent years, the global energy sector has been undergoing a profound transformation. Driven by decarbonization imperatives and the broader energy transition, the rapid deployment and large-scale integration of renewable energy resources have substantially enhanced the cleanliness and sustainability of modern power systems. Nevertheless, the inherent intermittency and stochastic nature of these resources have introduced significant challenges to supply–demand balancing, resulting in elevated and fluctuating electricity price levels [1]. For instance, since mid-2021, Europe has witnessed a sharp escalation in electricity prices due to geopolitical tensions, extreme fuel price fluctuations, and the steadily increasing penetration of renewables [2]. This phenomenon not only creates new challenges for energy policy and market design but also exposes market participants including generators, retailers, and end-users, to uncertainties in operation and risk management.
Electricity market structures are typically organized into medium- and long-term markets and spot markets. Medium- and long-term markets, dominated by bilateral contracts and annual or monthly agreements, serve to hedge against price volatility by securing traded volumes and prices in advance, albeit at the cost of reduced operational flexibility [3]. In contrast, the spot market directly reflects the real-time operational conditions of the power system and constitutes the most dynamic and volatile component of electricity trading. The spot market is generally divided into the day-ahead market (DAM) and the real-time market (RTM). The DAM clears the majority of energy transactions through centralized bidding conducted one day ahead, thereby providing essential price signals and scheduling references for generators and consumers. The RTM complements this process by enabling continuous adjustments in response to renewable output deviations, load forecasting errors, and unforeseen contingencies, thereby safeguarding system reliability and enhancing operational flexibility. Together, the DAM and the RTM form the core price formation mechanism of the spot market, exerting decisive influence on the strategic decisions and economic outcomes of market participants.
Electricity prices, as a distinctive form of economic signal, exhibit several inherent characteristics: supply and demand must be balanced in real time, and prices display high volatility and frequent spikes [4]. These complexities render electricity price forecasting more challenging than load forecasting, while also implying that price prediction alone is insufficient to meet the practical requirements of market participants. In competitive market environments, the main objective of forecasting is to transform price information into rational trading strategies that both maximize returns and effectively control risks [5]. Consequently, developing an intelligent decision-making framework that can jointly ensure profitability and robustness under the highly volatile and uncertain conditions of spot markets has become a pressing issue for both academia and industry.
Under this context, research on multi-agent systems (MASs) offers new opportunities for optimizing decision-making in electricity markets [6]. In particular, the rapid advancement of large language models (LLMs) has significantly strengthened the reasoning, interaction, and collaboration capabilities of intelligent agents, enabling them to perform role differentiation, cooperative decision-making, and strategic competition in trading environments. In financial markets, recent studies have proposed LLM-based multi-agent trading frameworks that draw inspiration from the organizational structures of real trading firms, decomposing complex trading tasks into specialized role agents [7]. These agents collaborate through structured report generation, multi-round debates, and risk assessment mechanisms, which enhances interpretability and transparency of decisions. Empirical results further demonstrate superior performance over conventional single-model approaches, as measured by cumulative returns, Sharpe ratio, and maximum drawdown. This research trajectory highlights the potential of agent-based multi-role cooperation and strategic interaction as an effective pathway to address market volatility and uncertainty.

1.2. Literature Review

Electricity price forecasting, a core component of electricity spot market research, has long attracted extensive attention. Forecasting methods have generally evolved from traditional econometric approaches, such as Autoregressive Integrated Moving Average (ARIMA), Generalized Autoregressive Conditional Heteroskedasticity (GARCH), and Vector Autoregression (VAR), to the more recent adoption of machine learning and deep learning techniques [8]. Among these, studies employing deep belief networks (DBNs) have demonstrated that DBNs effectively capture the nonlinear characteristics of electricity prices in day-ahead markets, outperforming conventional neural network models in terms of accuracy [9].
Methodologically, recent research has increasingly shifted towards probabilistic and distributional forecasting, with the goal of not only predicting expected price values but also characterizing distributional features and tail risks. For instance, the distributional deep neural network places distributional parameters in its output layer so that the price distribution is modeled directly, and it outperforms benchmark models such as the Least Absolute Shrinkage and Selection Operator (LASSO) and Quantile Regression Averaging (QRA) on German day-ahead price datasets [10]. Meanwhile, hybrid and ensemble learning methods have gained prominence. A notable example is the decomposition–optimization–ensemble (DOE) framework, which integrates signal decomposition, optimization, and model fusion to mitigate non-stationarity in price series, demonstrating superior performance in both medium- and short-term forecasting in the PJM market [11]. Overall, while electricity price forecasting has developed into a relatively mature research field, existing approaches remain limited in robustness, spike detection, and uncertainty quantification, particularly under the conditions of high renewable penetration and increased volatility in the RTM. These limitations highlight the need for closer integration of forecasting outcomes with trading strategy design.
Consequently, electricity price forecasts must be translated into executable trading and scheduling strategies to maximize profits and control risks for retail companies. As retailers are increasingly exposed to volatile wholesale markets, their bidding strategies in day-ahead and intra-day trading have become a focal point of recent research. Some studies have formulated Stackelberg game-based frameworks to characterize the interactions among market operators, generators, and retailers, showing that optimized bidding can reduce trading costs [12]. More recent research has extended these approaches in the context of electricity spot markets. A master–slave game-based trading strategy has been studied to enhance profitability and maintain system stability through the interaction between retailers and market operators [13]. Preference-aware trading models that incorporate the behavioral tendencies of energy storage systems have also been developed to improve market efficiency and overall welfare through risk-adjusted optimization [14]. In addition, a two-stage optimization framework combining robust optimization and risk constraints has been proposed to address renewable uncertainty, demonstrating improved adaptability and reduced imbalance costs in dynamic market conditions [15]. Overall, these studies reflect a recent shift toward risk-aware, preference-adaptive bidding mechanisms that strengthen both flexibility and economic performance in retailer-oriented spot market operations.
Traditional centralized optimization and single-agent modeling approaches are limited in capturing the complexities of multi-stakeholder participation, rule adaptivity, and dynamic competition inherent in spot markets. With the rapid development of learning-based market models, recent studies have increasingly adopted multi-agent reinforcement learning (MARL) to better characterize strategic interactions and adaptive behaviors in electricity markets. One stream of research shows that when agents simultaneously learn profit-maximizing strategies in DAM, the inherently volatile market environment could cause learned behaviors to deviate from rational market responses. To address this issue, new methods have been proposed to stabilize MARL training and establish convergence conditions under centralized-training–decentralized-execution paradigms, thereby improving the realism of market simulations [16]. Another stream focuses on multi-market coupling, where MARL, particularly MADDPG-based frameworks, has been widely used to coordinate bidding decisions for power suppliers across carbon, coal, and electricity markets, effectively capturing multi-timescale decision processes and yielding higher revenues and faster convergence than traditional methods [17]. On the retail side, MARL has also been applied to model heterogeneous bidding behaviors of retail companies with different operational objectives and profit-sharing schemes, demonstrating its ability to reproduce realistic decision diversity and enhance wholesale market simulations [18]. Overall, these advances highlight that multi-agent reinforcement learning has become a promising paradigm for capturing strategic interactions and optimizing bidding behavior under uncertainty in modern electricity spot markets.
Despite these advances, critical research gaps persist. First, most electricity price forecasting models emphasize predictive accuracy but pay limited attention to how forecasts can directly support profit optimization for retailers. Second, while retailer bidding strategies can reduce procurement costs or improve profitability, their robustness and dynamic adaptability remain insufficient under conditions of high renewable penetration and extreme market volatility. Third, although MAS approaches have been applied to market simulation and rule evaluation, much of the existing work has remained at the mechanism level, lacking systematic exploration of profit maximization for retailers. Therefore, this paper proposes a multi-agent framework tailored for electricity spot trading. Through the closed loop coupling of forecasting, strategy generation, and feedback, the framework enables intelligent decision-making that jointly pursues profit maximization with consideration of risk management, offering new insights and methodologies for addressing the challenges of increasingly complex electricity markets.

1.3. Contributions

Against this backdrop, this paper proposes Joint (Version 1.0), a multi-agent closed-loop decision-making framework tailored for retailer participation in electricity spot markets. The framework integrates forecasting, strategy generation, and adaptive feedback into a unified decision-making cycle, enabling retailers to jointly pursue profit maximization and risk control in highly volatile environments.
Within this framework, three specialized agents collaborate to achieve closed-loop optimization. The prediction agent learns the statistical patterns of day-ahead and real-time price spreads from historical and external features. It produces three complementary types of predictive signals: distributional forecasts that quantify uncertainty ranges, directional probabilities that estimate the likelihood of price increases or decreases, and structural indicators that capture peak and trough patterns in price trends. Together, these outputs provide multi-dimensional information that links predictive uncertainty with actionable decision signals.
Building upon these signals, the strategy agent determines the corresponding bidding ratios under market and operational constraints. When predictive confidence is high, it adopts boundary bidding strategies to capture potential profit opportunities. Under uncertain conditions, it applies a smooth decision mechanism that integrates weighted probabilistic, magnitude, and structural signals to avoid excessive volatility and maintain the balance between profitability and robustness.
After market settlement, the feedback agent evaluates strategy performance based on real outcomes and refines both the forecasting and decision modules through preference-based optimization. By incorporating realized profit–risk feedback, the agent continuously improves the framework’s decision parameters and ensures long-term adaptability in dynamic markets.
Through this dynamic “forecast–strategy–feedback” interaction, the proposed Joint (Version 1.0) establishes a self-evolving optimization loop that bridges forecasting accuracy and economic efficiency. It not only improves the interpretability and stability of decision-making under uncertainty but also demonstrates superior robustness and adaptability compared with conventional forecasting or single-agent bidding approaches.

2. Materials and Methods

2.1. Technical Framework and Mathematical Modeling

The framework developed in this study consists of three key components: a Prediction Agent, a Strategy Agent, and a Feedback Agent. These agents operate sequentially within the same decision-making cycle. Specifically, the Prediction Agent generates numerical and directional information on future market price spreads; the Strategy Agent, subject to physical constraints, formulates bidding ratios based on the forecasted information; and the Feedback Agent adjusts and evolves strategies by incorporating market settlement results. Through this closed-loop interaction, the framework achieves a unified balance between profit maximization and risk control, as illustrated in Figure 1.

2.1.1. Problem Formulation

In the intra-day electricity market, each trading day is discretized into $T$ intervals of 15 min. For any given interval $t$, a retail company submits its bidding quantity based on the load demand $L_t$. Typically, the bid is determined as a proportion $y_t$ of the load, that is:
$$q_t = y_t L_t, \qquad 0.9 \le y_t \le 1.1$$
Here, $L_t$ denotes the electricity demand in interval $t$, while the bidding ratio $y_t$ is constrained within the interval [0.9, 1.1] to ensure compliance with market regulations. This bound is derived from the bidding limits specified in the spot-market rules of a provincial electricity market in China, where retailers are required to submit quantities within this permissible deviation range. $q_t$ denotes the actual bidding quantity submitted to the market in that interval. The market clearing price is determined by the day-ahead price $\lambda_t^{DA}$ and the real-time price $\lambda_t^{RT}$. Their difference can be regarded as the spread signal reflecting market movements:
$$S_t = \lambda_t^{RT} - \lambda_t^{DA}$$
The settlement cost function can then be expressed as:
$$C_t(y_t) = \lambda_t^{RT} L_t + \left(\lambda_t^{DA} - \lambda_t^{RT}\right) y_t L_t$$
where the first term corresponds to the expenditure for purchasing the actual load at the real-time price, while the second term captures the compensation cost arising from deviations between submitted bids and realized consumption. Because the cost is linear in the decision variable $y_t$, the optimal solution lies at the boundaries of the feasible interval whenever the sign of the spread is known. Specifically, when the spread remains strictly positive, the optimal bidding strategy tends to adopt the upper bound ($y_t = 1.1$); conversely, when the spread is strictly negative, the lower bound ($y_t = 0.9$) should be selected. These boundary values are defined according to the bidding ratio constraints of a specific provincial electricity spot market in China, where participants are allowed to adjust their declared load within ±10% of the forecasted value under local regulatory rules. However, when the predicted spread interval crosses zero, the optimal bidding direction becomes uncertain, thereby necessitating a more flexible decision-making mechanism.
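The boundary argument can be checked numerically with a minimal Python sketch of the settlement cost. The function names and the example load and prices are illustrative assumptions and are not taken from the paper; returning 1.0 when the spread is exactly zero is also an assumption, since the cost is then independent of the bidding ratio.

```python
def settlement_cost(y, load, price_da, price_rt):
    """Settlement cost: price_rt * load + (price_da - price_rt) * y * load."""
    return price_rt * load + (price_da - price_rt) * y * load


def boundary_ratio(spread, y_min=0.9, y_max=1.1):
    """Cost-minimising ratio when the spread (RT minus DA price) is known."""
    if spread > 0:
        return y_max
    if spread < 0:
        return y_min
    return 1.0  # assumption: any ratio is optimal when the spread is zero


# Illustrative numbers only: 100 MWh load, DA 300 and RT 350 CNY/MWh.
load, price_da, price_rt = 100.0, 300.0, 350.0
costs = {y: settlement_cost(y, load, price_da, price_rt) for y in (0.9, 1.0, 1.1)}
best = boundary_ratio(price_rt - price_da)
```

With a positive spread, the cost falls as the ratio rises, so the upper bound is selected; swapping the two prices reverses the ordering.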

2.1.2. Forecasting Agent

The primary role of the Prediction Agent is to learn the statistical patterns of price signals from historical data $X_h$ and external environmental features $X_t$, and to provide forecasts and uncertainty quantification of the future spread $S_t$ for each interval, as illustrated in Figure 2.
In the implementation, bootstrap resampling is employed to construct $B$ CatBoost-based regression models $\{f^{(b)}\}$. For a given feature vector $x_t$, each model produces a spread prediction $\hat{S}_t^{(b)}$. The median of these predictions is taken as the representative forecast:
$$\hat{S}_t^{med} = \operatorname{median}_{b=1,\dots,B} \hat{S}_t^{(b)}$$
To further suppress random fluctuations across adjacent intervals, a moving-average operation is applied, with the smoothing window adaptively adjusted according to the data’s temporal granularity and variability, yielding a smoothed sequence $\tilde{S}_t$.
To quantify predictive uncertainty, empirical quantiles are extracted from the ensemble distribution. Specifically, the 2.5% and 97.5% [19] quantiles define the lower and upper bounds of the prediction interval:
$$\hat{S}_t^{lo} = P_{2.5\%}\left(\{\hat{S}_t^{(b)}\}\right), \qquad \hat{S}_t^{hi} = P_{97.5\%}\left(\{\hat{S}_t^{(b)}\}\right)$$
If the entire interval lies above zero, the company is exposed to a high probability of positive spreads during period t; conversely, if the interval lies entirely below zero, negative spreads are highly likely.
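The ensemble summary described above can be sketched in plain Python, independent of how the individual models are trained. The list `preds` stands for the B bootstrap predictions of one interval. The fixed smoothing window and the linear-interpolation percentile are assumptions made for illustration; the text describes an adaptive window and does not specify the quantile estimator.

```python
import statistics


def percentile(values, q):
    """Empirical q-th percentile (0 to 100) with linear interpolation."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarize_ensemble(preds):
    """Return (median, 2.5% bound, 97.5% bound) for one interval."""
    return statistics.median(preds), percentile(preds, 2.5), percentile(preds, 97.5)


def moving_average(series, window=3):
    """Centered moving average; the window shrinks at the series edges."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def spread_regime(s_lo, s_hi):
    """Classify the prediction interval relative to zero."""
    if s_lo > 0:
        return "positive"
    if s_hi < 0:
        return "negative"
    return "gray"


preds = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative ensemble outputs
s_med, s_lo, s_hi = summarize_ensemble(preds)
```

An interval labelled "gray" is the case in which the prediction interval straddles zero and the direction of the spread is not settled by the regression ensemble alone.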
In parallel, another ensemble of $B$ CatBoost classifiers $f_{cls}^{(b)}$ is trained to estimate the probability of spread direction:
$$\bar{p}_t^{cls} = \frac{1}{B} \sum_{b=1}^{B} \Pr\left(S_t > 0 \mid x_t; \phi_b\right)$$
The resulting probability $\bar{p}_t^{cls}$ reflects the likelihood that upward bidding yields an advantage for retailers in interval $t$, thereby serving as a directional signal for subsequent decision-making. In addition, a peak-detection method is applied to the smoothed series $\tilde{S}_t$ to identify local maxima and minima in price trends, denoted by indicator variables $I_t^{peak}$ and $I_t^{trough}$, respectively. These indicators capture relative price extremes and provide complementary reference signals for the Strategy Agent.
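The two auxiliary signals can be illustrated with a short Python sketch. The text does not name the peak-detection method, so the strict comparison with both neighbours used here is an assumption chosen for simplicity; the function names are likewise hypothetical.

```python
def mean_probability(probs):
    """Average the per-classifier probabilities of a positive spread."""
    return sum(probs) / len(probs)


def peak_trough_indicators(series):
    """Mark strict local maxima and minima of a smoothed series.

    Assumption: a point is a peak (trough) if it is strictly greater
    (smaller) than both neighbours; end points are never marked.
    """
    n = len(series)
    peak = [0] * n
    trough = [0] * n
    for i in range(1, n - 1):
        if series[i] > series[i - 1] and series[i] > series[i + 1]:
            peak[i] = 1
        elif series[i] < series[i - 1] and series[i] < series[i + 1]:
            trough[i] = 1
    return peak, trough


p_bar = mean_probability([0.6, 0.7, 0.8])          # illustrative classifier outputs
peak, trough = peak_trough_indicators([0.0, 2.0, 1.0, -1.0, 0.5])
```

A production implementation would more likely add prominence or distance thresholds so that small ripples in the smoothed series are not flagged.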

2.1.3. Strategy Agent

Building upon the distributional forecasts provided by the Prediction Agent, the Strategy Agent generates the corresponding bidding ratios. Since the settlement cost function exhibits an approximately linear dependence on the bidding ratio, optimal solutions often occur at the boundaries of the feasible region. However, in the presence of uncertainty in the spread signals, relying excessively on extreme bids may amplify risks. Therefore, differentiated decision rules are adopted under strong-signal and gray-area scenarios, as illustrated in Figure 3.
When the predicted spread interval lies entirely above zero, or when the probability of a positive spread estimated by the classification model exceeds a threshold $\tau_{hi}$, the market can be considered as being in a highly certain positive-spread regime. In this case, adopting the upper bound $y_t = 1.1$ is more appropriate. Conversely, if the interval lies entirely below zero or the probability of a positive spread is below $1 - \tau_{hi}$, the market shows a clear inclination toward day-ahead prices, and the bidding ratio should be tightened to $y_t = 0.9$.
When the predicted spread interval straddles zero, the results exhibit high uncertainty. To achieve more robust decision-making in such “gray area” scenarios, this study introduces a continuous mapping based on multi-dimensional signals. First, a probability component is constructed from directional probabilities:
$$s_t^{prob} = 2\,\bar{p}_t^{cls} - 1$$
which reflects the likelihood of real-time prices holding a relative advantage. Second, a magnitude component is derived from the smoothed spread $\tilde{S}_t$:
$$s_t^{reg} = \operatorname{sign}\left(\tilde{S}_t\right) \cdot \min\left(\frac{|\tilde{S}_t|}{\mathrm{spread}_{th}},\ 1\right)$$
which captures the adjustment intensity implied by the smoothed spread magnitude, normalized by the threshold $\mathrm{spread}_{th}$ and capped at 1. Third, a structural component is introduced:
$$s_t^{peak} = \gamma \left(I_t^{peak} - I_t^{trough}\right)$$
to strengthen or attenuate the decision signal when the price lies at local peaks or troughs. These three components are linearly aggregated with back-propagation–optimized weights $(w_1, w_2, w_3)$ to form a composite score:
$$h_t = w_1 s_t^{prob} + w_2 s_t^{reg} + w_3 s_t^{peak}, \qquad w_1 + w_2 + w_3 = 1$$
To avoid overly aggressive adjustments, the composite score is subjected to a shrinkage operation:
$$\tilde{h}_t = \operatorname{clip}\left(\kappa h_t,\ -1,\ 1\right)$$
which constrains the amplitude of decisions in gray areas. The final bidding ratio is then determined as:
$$y_t = 1 + 0.1\,\tilde{h}_t, \qquad 0.9 \le y_t \le 1.1$$
This design ensures that the bidding ratio remains within the feasible domain, while enabling boundary strategies under high-confidence signals and smooth transitions under uncertain scenarios. To further comply with physical operational constraints, the bidding sequence must also satisfy a ramping condition, such that the ratio difference between consecutive intervals does not exceed a predefined limit $r$, thereby ensuring implementability in real-world dispatch operations.
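The decision rules of this subsection can be collected into a compact Python sketch. All numeric defaults (the weights, the spread threshold, γ, κ, τ_hi and the ramp limit r) are placeholder assumptions, since the text does not report their values. Checking the positive-spread rule before the negative one and enforcing the ramping condition by sequential clipping are also illustrative choices, not the paper's stated procedure.

```python
def clip(value, lower, upper):
    return max(lower, min(upper, value))


def gray_area_ratio(p_cls, s_smooth, i_peak=0, i_trough=0,
                    weights=(0.5, 0.3, 0.2), spread_th=50.0,
                    gamma=1.0, kappa=1.0):
    """Continuous mapping used when the spread interval straddles zero."""
    w1, w2, w3 = weights                                  # assumed to sum to 1
    s_prob = 2.0 * p_cls - 1.0                            # probability component
    sign = (s_smooth > 0) - (s_smooth < 0)
    s_reg = sign * min(abs(s_smooth) / spread_th, 1.0)    # magnitude component
    s_peak = gamma * (i_peak - i_trough)                  # structural component
    h = w1 * s_prob + w2 * s_reg + w3 * s_peak
    return 1.0 + 0.1 * clip(kappa * h, -1.0, 1.0)


def bidding_ratio(s_lo, s_hi, p_cls, s_smooth, tau_hi=0.8, **gray_kwargs):
    """Boundary bids under strong signals, smooth mapping otherwise."""
    if s_lo > 0 or p_cls > tau_hi:
        return 1.1
    if s_hi < 0 or p_cls < 1.0 - tau_hi:
        return 0.9
    return gray_area_ratio(p_cls, s_smooth, **gray_kwargs)


def apply_ramp(ratios, r=0.05):
    """Limit the change between consecutive ratios to at most r."""
    limited = [ratios[0]]
    for y in ratios[1:]:
        prev = limited[-1]
        limited.append(clip(y, prev - r, prev + r))
    return limited


y_gray = bidding_ratio(-5.0, 40.0, 0.75, 25.0, i_peak=1)
y_ramped = apply_ramp([0.9, 1.1, 1.1], r=0.05)
```

Because every ramp-limited value lies between the previous ratio and the requested one, the sequence stays inside [0.9, 1.1] as long as the inputs do.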

2.1.4. Feedback Agent

After market operations conclude, the publication of real-time settlement prices provides external feedback signals for strategy optimization, as illustrated in Figure 4.
The core function of the Feedback Agent is to evaluate the effectiveness of submitted strategies using the settlement outcome $C_t(y_t)$, and to achieve adaptive parameter updates under the dual objectives of profit maximization and risk control. By incorporating upward and downward adjustment coefficients $\eta^{+}$ and $\eta^{-}$, an adjustment factor $\phi(y_t)$ is introduced, with the corresponding load deviation defined as:
$$\Delta L_t = L_t \left(\phi(y_t) - 1\right)$$
The profit function is then expressed as:
$$\mathrm{profit}_t = \left(\lambda_t^{RT} - \lambda_t^{DA}\right) \Delta L_t$$
where the first factor captures the market spread and the second reflects deviations in bidding relative to actual load.
To simultaneously pursue profit maximization and tail-risk control, the Feedback Agent employs Conditional Value at Risk (CVaR) as a risk metric. At a given confidence level $\alpha$:
$$\mathrm{CVaR}_\alpha = E\left[\mathrm{profit}_t \mid \mathrm{profit}_t \le q_\alpha\right]$$
where $q_\alpha$ is the $\alpha$-quantile of the profit distribution. Based on this, a composite score function is constructed:
$$\mathrm{Score} = \sum_t \mathrm{profit}_t - \lambda \cdot \mathrm{CVaR}_\alpha$$
with $\lambda > 0$ denoting a risk-aversion coefficient to balance returns and risks. Candidate parameter generation follows a three-layer mechanism: (i) local perturbation samples near historically optimal solutions to achieve fine-grained improvements; (ii) global exploration samples randomly in continuous space to enhance diversity; (iii) grid fallback preserves pre-defined discrete points to ensure feasibility under extreme conditions.
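The profit, CVaR and score definitions can be illustrated with a small Python sketch. The adjustment factor is passed in as a callable because its exact form, including the role of the upward and downward coefficients, is not spelled out in the text; the identity default is purely an assumption. The tail estimator (mean of the worst ⌈αn⌉ profits), the value of α and the value of λ are also illustrative choices. The score is coded as written in the text.

```python
import math


def interval_profit(spread, load, y, phi=lambda ratio: ratio):
    """Profit for one interval: spread * load * (phi(y) - 1).

    Assumption: phi defaults to the identity; the paper's adjustment
    factor with upward/downward coefficients is not reproduced here.
    """
    return spread * load * (phi(y) - 1.0)


def cvar(profits, alpha=0.05):
    """Empirical lower-tail CVaR: mean of the worst ceil(alpha * n) profits."""
    xs = sorted(profits)
    k = max(1, math.ceil(alpha * len(xs)))
    return sum(xs[:k]) / k


def composite_score(profits, lam=0.5, alpha=0.05):
    """Total profit minus lam times the tail measure, as stated in the text."""
    return sum(profits) - lam * cvar(profits, alpha)


profits = [-10.0, -4.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # illustrative
tail = cvar(profits, alpha=0.2)
score = composite_score(profits, lam=0.5, alpha=0.2)
```

Candidate parameter sets produced by the three-layer mechanism could each be ranked by such a score over the same evaluation window.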
During this process, each candidate parameter set produces corresponding profit–risk evaluations. By comparing the settlement results of different bidding ratios under the same state $x_t$, preference pairs can be formed: if bidding ratio $y_w$ yields a better outcome than $y_l$, this is denoted as:
$$\left(x_t,\ y_w \succ y_l\right)$$
These preference relations serve as crucial signals for strategy updates. The Feedback Agent incorporates them into training through Direct Preference Optimization (DPO) [20], directly embedding preferences into the learning objective. The loss function is defined as:
$$L_{DPO}\left(\phi; f_\theta(x)\right) = -E_{(x_t,\, y_w \succ y_l)}\left[\log \sigma\left(\beta\left(\Delta_\phi\left(f_\theta(x); y_w, y_l\right) - \Delta_{ref}\left(x; y_w, y_l\right)\right)\right)\right]$$
where $\Delta_\phi$ denotes the Strategy Agent’s scoring difference between two bidding ratios, $\Delta_{ref}$ is the baseline preference of a reference model, and $\beta$ is a temperature parameter. By minimizing this loss, the model adaptively adjusts its strategy while maintaining feasibility, progressively converging toward more profitable directions.
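A minimal numerical sketch of the preference loss follows, assuming the standard DPO form (negative mean log-sigmoid of the scaled difference between the policy margin and the reference margin). Each pair is reduced to two precomputed numbers, the policy scoring difference and the reference scoring difference; how those differences are obtained from the Strategy Agent is outside the scope of this example, and the function names are hypothetical.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(pairs, beta=1.0):
    """Average DPO loss over preference pairs.

    pairs: iterable of (delta_policy, delta_ref), where each delta is the
    score of the preferred ratio minus the score of the rejected ratio.
    """
    losses = [-log_sigmoid(beta * (d_pol - d_ref)) for d_pol, d_ref in pairs]
    return sum(losses) / len(losses)


neutral = dpo_loss([(0.0, 0.0)])      # policy agrees with the reference
aligned = dpo_loss([(2.0, 0.0)])      # policy favours the preferred ratio more
reversed_ = dpo_loss([(-2.0, 0.0)])   # policy favours the rejected ratio
```

The loss shrinks as the policy separates the preferred ratio from the rejected one by more than the reference does, which is the direction the Feedback Agent pushes the strategy parameters.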
Through iterative interactions, the Feedback Agent continuously accumulates preference information from real market outcomes and transforms it into optimization signals, thereby driving the long-term evolution and robust optimization of the prediction–decision system.

2.1.5. Joint Optimization of Forecasting and Strategy

In the proposed framework, the prediction model, decision model, and feedback mechanism do not operate independently but are integrated into a closed-loop system through shared feature representations and optimization signals. The prediction module constructs temporal feature representations $f_\theta(x_t)$ from historical load, market prices, and external environmental factors, and on this basis outputs point estimates of future spreads, confidence intervals, and classification probabilities. The decision module then takes this representation as input, and under bidding ratio constraints of the electricity market, maps the predictive outputs into concrete bidding schemes $y_t$. Specifically, it selects boundary ratios under high-confidence spread signals, while in uncertain intervals it employs directional probabilities, spread magnitudes, and extreme-value indicators to generate continuous decision intensities.
The feedback mechanism incorporates real settlement outcomes after trading, computes realized profits and risk measures, and constructs a Direct Preference Optimization (DPO) loss from preference relations. Since the input difference of DPO, $\Delta_\phi(f_\theta(x); y_w, y_l)$, directly depends on the predictive representation $f_\theta(x)$, the feedback optimization not only adjusts the strategy parameters $\phi$ but also imposes indirect constraints on the prediction model through a joint objective function:
$$\min_{\theta,\,\phi}\ L_{sup}(\theta) + \lambda_{RL}\, L_{DPO}\left(\phi; f_\theta(x_t)\right)$$
where $L_{sup}$ ensures the accuracy of spread forecasting, and $L_{DPO}$ guides the strategy toward more profitable directions via preference-based feedback. Through this joint training mechanism, the prediction–decision–feedback loop evolves dynamically: prediction results drive decision generation, decision outcomes are evaluated via feedback to form preference signals, and these signals in turn refine both the prediction and decision models. This closed-loop interaction enables continuous adaptation and evolution in complex and volatile electricity market environments.

2.2. Experimental Design

2.2.1. Data and Experimental Setup

This study employs electricity spot market data from Shanxi Province, China, covering the period from July 2024 to June 2025, with a temporal resolution of 15 min (96 intervals per day). The dataset comprises day-ahead and real-time price curves, load data, and meteorological features. The price spread, defined as the difference between the real-time and day-ahead prices, is used as the regression prediction target, while its sign is employed as the classification label. For data partitioning, a walk-forward strategy is adopted: in each round, the most recent historical data are used as the training set, the subsequent days as the validation set for hyperparameter tuning and early stopping, and the following days as the test set, thereby ensuring consistency with the “predicting the future based only on past information” logic of real trading. Within each training round, the samples are further split sequentially, with the first 90% reserved for training and the last 10% for validation, to prevent information leakage and ensure robust evaluation. Unless otherwise specified, the CatBoost parameters are set to a tree depth of 8, a learning rate of 0.05, a maximum of 2000 iterations, and an L2 regularization coefficient of 3.0.
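One possible reading of the walk-forward scheme is sketched below in Python: each round takes a block of past days, splits it sequentially into the first 90% for training and the last 10% for validation, and tests on the days that follow. The window lengths (`history_days`, `test_days`) are hypothetical, since the text does not report them.

```python
def walk_forward_splits(n_days, history_days=30, test_days=1,
                        val_frac=0.1, per_day=96):
    """Yield (train, val, test) index ranges over 15-min samples.

    Assumptions: fixed-length rolling history and a step equal to the
    test length; only the sequential 90/10 split comes from the text.
    """
    splits = []
    start = 0
    while start + history_days + test_days <= n_days:
        hist_lo = start * per_day
        hist_hi = (start + history_days) * per_day
        n_val = int(round((hist_hi - hist_lo) * val_frac))
        cut = hist_hi - n_val                      # validation is the latest slice
        test_hi = (start + history_days + test_days) * per_day
        splits.append((range(hist_lo, cut), range(cut, hist_hi),
                       range(hist_hi, test_hi)))
        start += test_days
    return splits


splits = walk_forward_splits(32, history_days=30, test_days=1)
train, val, test = splits[0]
```

Every index in a training range precedes every validation index, which in turn precedes every test index, so no future information reaches the model in any round.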
All experiments were conducted on the same hardware platform equipped with an Intel Xeon Gold 6330 CPU, an NVIDIA A100 GPU, and 512 GB of memory, running Ubuntu 22.04. The models were implemented using Python 3.10 and the CatBoost library (version 1.2.8), with multi-processing enabled to improve training efficiency. To ensure reproducibility, all random number generators were initialized with fixed seeds.

2.2.2. Evaluation Metrics

In this study, both economic and classification-oriented metrics are employed to evaluate model performance. For the comparison experiments, the evaluation includes monthly cumulative profit, spread-weighted accuracy (SWA), which evaluates the directional prediction accuracy under different price spread conditions; precision and recall, which respectively measure the reliability and completeness of positive signal identification; and avoidance accuracy (AA), which reflects the model’s capability to correctly avoid disadvantageous trading situations. Monthly profit directly measures the economic benefits of trading strategies in real market contexts, while the other indicators assess the accuracy and robustness of predictions and decisions.
The SWA accounts for the fact that prediction errors during periods with larger price spreads have greater economic consequences. For the $i$-th interval, let the real-time price be $\lambda_i^{RT}$ and the day-ahead price be $\lambda_i^{DA}$, so that the spread is defined as $S_i = \lambda_i^{RT} - \lambda_i^{DA}$. If the predicted direction $\hat{y}_i \in \{-1, 1\}$ matches the true direction $y_i$, the indicator function $I(\hat{y}_i = y_i) = 1$, and 0 otherwise. SWA is then calculated as:
$$\mathrm{SWA} = \frac{\sum_{i=1}^{n} |S_i| \cdot I\left(\hat{y}_i = y_i\right)}{\sum_{i=1}^{n} |S_i|}$$
where n denotes the total number of samples.
Classification metrics are derived from the confusion matrix, where TP, FP, TN, and FN represent true positives, false positives, true negatives, and false negatives, respectively. Based on these, precision, recall, and avoidance accuracy are defined as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{AA} = \frac{TN}{TN + FN}$$
Precision reflects the reliability of upward signals, recall measures the ability to capture upward opportunities, and avoidance accuracy—also referred to as negative predictive value (NPV)—assesses the reliability of downward signals in helping avoid potential losses.
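The three ratios can be checked on a small example. The sketch below counts the confusion-matrix cells directly, treating +1 (upward spread) as the positive class; the function name and the sample labels are illustrative assumptions.

```python
import math
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, float]:
    """Precision, recall, and avoidance accuracy (NPV) for labels in {-1, +1}."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == -1:
            fp += 1
        elif pred == -1 and truth == -1:
            tn += 1
        elif pred == -1 and truth == 1:
            fn += 1

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a class was never predicted.
        return num / den if den else math.nan

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "avoidance_accuracy": ratio(tn, tn + fn),
    }


y_true = [1, 1, 1, -1, -1, -1, -1, 1]
y_pred = [1, -1, 1, -1, 1, -1, -1, -1]
print(classification_metrics(y_true, y_pred))
```

Here TP = 2, FP = 1, TN = 3, and FN = 2, giving a precision of 2/3, a recall of 1/2, and an avoidance accuracy of 3/5.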
For the ablation studies, the evaluation focuses primarily on monthly cumulative profit, which directly reflects the economic effectiveness of different model configurations in trading scenarios.

2.2.3. Comparative Experiments

To rigorously verify the accuracy of the forecasting component, the comparison experiments were designed with a set of baseline methods focusing on modeling electricity price spreads, rather than directly comparing trading strategies. The selected baselines cover traditional statistical models, time-series approaches, and state-of-the-art machine learning methods. Specifically, XGBoost [21] and LightGBM [22] are widely used gradient boosting tree frameworks that efficiently capture nonlinear relationships in large-scale feature spaces, serving as representative machine learning models; Prophet, proposed by Meta, is capable of modeling long-term trends and seasonality, making it particularly suitable for capturing cyclical and periodic fluctuations in electricity prices; and ARIMA [23], as a classical statistical method, has long been applied to electricity load and price forecasting. These models were chosen because they represent mainstream paradigms in forecasting and provide a comprehensive basis for evaluating differences in predictive accuracy. It is important to emphasize that the purpose of the comparison experiments is to highlight the contribution of the forecasting module to the overall framework. Therefore, the decision-making stage is standardized across all baselines using a hard-coded strategy: when the predicted signal indicates a positive spread, the bidding ratio is fixed at 1.1, and when the predicted signal indicates a negative spread, the bidding ratio is fixed at 0.9. This setup eliminates confounding factors introduced by heterogeneous strategy designs, ensuring that all models are compared under identical decision rules, thereby allowing for a fair and objective assessment of forecasting accuracy and its impact on overall profit and risk performance.
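The hard-coded decision rule shared by all baselines is simple enough to state in a few lines. In this sketch the function name is hypothetical, and the handling of a predicted spread of exactly zero (treated as non-positive) is an assumption that the text does not specify.

```python
UP_RATIO = 1.1    # bidding ratio when a positive spread is predicted
DOWN_RATIO = 0.9  # bidding ratio when a negative spread is predicted


def fixed_bidding_ratio(predicted_spread: float) -> float:
    """Map a predicted spread to the fixed bidding ratio used for all baselines."""
    # Assumption: a predicted spread of exactly zero falls on the 0.9 side.
    return UP_RATIO if predicted_spread > 0 else DOWN_RATIO


print([fixed_bidding_ratio(s) for s in (12.5, -3.0, 0.0)])
```

Because the rule depends only on the sign of the forecast, any difference in profit between baselines comes from the forecasts themselves.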

2.2.4. Ablation Study

The ablation study is conducted to validate the contribution of the proposed innovations to the overall framework. Unlike the comparison experiments, the ablation study does not involve replacing forecasting models; instead, it is performed on the same set of trained models, with specific modules gradually removed or disabled to observe changes in performance. Monthly cumulative profit is used as the core evaluation metric, directly reflecting the economic value of different configurations in trading scenarios. To further dissect the role of each component, this paper systematically removes the uncertainty interval shrinkage, extreme-value signal adjustment, continuous mapping mechanism, and feedback optimization module from the full framework and evaluates their impact on strategy performance.

3. Results

3.1. Comparative Experiment Results

As shown in Table 1, the results reveal clear differences among the models in terms of both classification performance and profitability. Prophet achieves the highest Recall (89.16%) because it labels most intervals as upward, but its Precision is only 48.48%, leading to a very low monthly profit of 8669.41 CNY due to amplified misjudgments under the fixed bidding strategy. In contrast, XGBoost and LightGBM provide a more balanced trade-off between Precision and Recall, achieving SWA scores of 57.36% and 54.60% and monthly profits of 140,801.04 CNY and 105,464.19 CNY, respectively. ARIMA performs worst among the baselines on every metric and is the only model that produces a loss.
Although our method shows a relatively low Recall (40.45%), this does not indicate inferior performance. Instead, it reflects the conservative design of our forecasting mechanism. Unlike conventional models, it integrates regression-based spread signals, prediction intervals, and probabilistic constraints. When signals are weak or uncertainty is high, the model deliberately abstains from giving directional forecasts, providing decisions only when spreads are strong and confidence is high. This robust filtering improves Precision to 53.25% and maintains a competitive SWA (57.36%) and AA (56.05%), effectively reducing costly mispredictions in economically significant periods.
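One way to picture this abstention behavior is a gate that emits a direction only when the prediction interval excludes zero and the directional probability is high enough. The sketch below is a hypothetical illustration, not the paper's mechanism: the function name, the `min_confidence` threshold of 0.6, and the use of 0 to encode "abstain" are all assumptions.

```python
def gated_direction(
    interval_lower: float,
    interval_upper: float,
    prob_up: float,
    min_confidence: float = 0.6,  # hypothetical threshold
) -> int:
    """Return +1 or -1 for a confident directional call, or 0 to abstain.

    A call is made only when the spread prediction interval lies entirely on
    one side of zero and the directional probability agrees with it.
    """
    if interval_lower > 0 and prob_up >= min_confidence:
        return 1
    if interval_upper < 0 and (1.0 - prob_up) >= min_confidence:
        return -1
    return 0  # weak signal or interval crossing zero: no directional forecast


print(gated_direction(5.0, 40.0, 0.8))    # confident upward call
print(gated_direction(-10.0, 15.0, 0.7))  # interval crosses zero: abstain
```

Under such a gate, fewer upward opportunities are flagged, which lowers Recall, while the calls that remain are more reliable, which raises Precision.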

3.2. Ablation Experiment Results

To further evaluate the contribution of each component in the proposed method, this work conducted an ablation study. As shown in Table 2, Joint achieves the highest monthly profit of 157,746.64 CNY, outperforming all ablated variants. This indicates that each module plays a positive role in enhancing both predictive performance and economic outcomes.
When the uncertainty interval shrinkage is removed, the profit drops to 149,763.89 CNY. This demonstrates that interval shrinkage serves as a crucial risk-control mechanism. By constraining the residual distribution, the model can effectively suppress over-predictions under low-confidence conditions and avoid erroneous directional decisions caused by excessive uncertainty. Without this module, weak signals and noise are more likely to be amplified into wrong forecasts, thereby increasing loss risk.
When the peak–valley correction is removed, the profit decreases sharply to 123,559.44 CNY, which represents the largest decline among all ablation variants. This result highlights that correcting extreme signals is essential for capturing drastic spread fluctuations. In electricity spot markets, large peaks and valleys usually correspond to the most economically significant intervals. Without effective correction, the model is prone to deviating from real market movements, leading to severe losses in critical periods.
Removing the continuous mapping reduces the profit to 146,616.57 CNY. The primary role of this module is to maintain the smoothness of predictions and strategy generation, thereby preventing excessive volatility caused by discrete decisions. Without it, the model’s predictions and bidding strategy become more sensitive to short-term noise, weakening profit stability and reducing overall returns.
Finally, removing the feedback optimization reduces the profit to 146,933.46 CNY. Feedback optimization leverages settlement results to dynamically update the model, allowing predictions and decisions to continuously improve over time. Without feedback adjustment, the model cannot fully exploit posterior market information, resulting in reduced adaptability to changing market conditions and a subsequent drop in profitability.
In summary, all four modules contribute positively to the final profit, with peak–valley correction and interval shrinkage being particularly critical as they directly affect performance in high-spread intervals. The full model integrates uncertainty control, extreme-value correction, continuous mapping, and feedback optimization to achieve the best economic outcome. These results not only validate the rationality of each module’s design but also confirm the practical value and robustness of the proposed method in electricity spot market applications.
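The size of each module's contribution can be read off directly from the profits reported in Table 2. The following sketch computes the absolute and relative drop for each ablated variant; only the function name is introduced here, and the profit figures are those quoted in the text.

```python
from typing import Dict, Tuple

FULL_MODEL_PROFIT = 157_746.64  # Joint, monthly profit from Table 2

ABLATED_PROFITS: Dict[str, float] = {
    "Joint-noCI": 149_763.89,  # without uncertainty interval shrinkage
    "Joint-noPV": 123_559.44,  # without peak-valley correction
    "Joint-noCM": 146_616.57,  # without continuous mapping
    "Joint-noFO": 146_933.46,  # without feedback optimization
}


def profit_drops(
    full: float, ablated: Dict[str, float]
) -> Dict[str, Tuple[float, float]]:
    """Return {variant: (absolute drop, drop as % of the full model's profit)}."""
    return {
        name: (round(full - profit, 2), round(100.0 * (full - profit) / full, 2))
        for name, profit in ablated.items()
    }


# Print the variants from the largest to the smallest profit drop.
drops = profit_drops(FULL_MODEL_PROFIT, ABLATED_PROFITS)
for name, (absolute, percent) in sorted(drops.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: -{absolute:,.2f} ({percent:.2f}%)")
```

With these figures, removing the peak–valley correction costs 34,187.20 (about 21.7% of the full model's profit), followed by continuous mapping (11,130.07, about 7.1%), feedback optimization (10,813.18, about 6.9%), and interval shrinkage (7,982.75, about 5.1%).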

4. Discussion

4.1. Comparison with Existing Research

In the context of highly volatile and uncertain electricity spot markets, this study introduces targeted innovations across price forecasting, bidding strategy formation, multi-agent coordination, and feedback learning, forming a closed-loop intelligent decision-making framework that differs substantially from existing research.
Regarding price forecasting, prior studies primarily aim to improve point accuracy or distributional modeling, yet rarely provide mechanisms that translate predictive signals into economically actionable bidding decisions [8,9,10,11]. The proposed framework integrates distributional intervals, directional probabilities, and extreme-value indicators and maps them directly to bidding ratios, enabling a coherent transition from predictive information to profit-oriented strategy formation.
Regarding retailer bidding strategies, traditional approaches typically rely on fixed risk parameters or static decision rules, limiting adaptivity under rapidly changing spot-market conditions [12,13,14,15]. The strategy module in this framework employs a continuous, multi-signal mapping mechanism that adjusts bidding decisions dynamically based on uncertainty levels, thereby enhancing decision stability in volatile environments.
Regarding multi-agent learning and market interaction modeling, existing MARL research largely concentrates on mechanism simulation or single-stage optimization [16,17,18]. The present framework incorporates a dedicated Feedback Agent that updates forecasting and bidding components based on settlement outcomes, forming a self-correcting forecast–decision–feedback loop rather than a one-directional learning structure.
Regarding retail-side profit maximization, prior multi-agent or reinforcement learning studies generally focus on system-level market behavior and do not offer a complete workflow tailored to retailer operations [16,17,18]. The integrated architecture in this study combines forecasting, strategy generation, and preference-based feedback to create a deployable, interpretable, and iteratively improving decision process for retail participants.
Overall, the proposed approach establishes clear methodological advancements in workflow integration, adaptivity, and retailer-oriented operational design, offering a new perspective for intelligent decision-making in complex electricity spot markets.

4.2. Limitations

The forecasting–decision–feedback closed-loop mechanism proposed in this study aims to enhance the profitability and robustness of retail electricity companies in the electricity spot market, and it extends conventional forecasting–decision frameworks by incorporating an adaptive feedback process that enables continuous refinement between the forecasting and bidding stages. However, it still faces several limitations in practical application. First, the method is highly dependent on forecasting accuracy. When the market undergoes sharp fluctuations or extreme conditions, deviations in spread forecasts may be directly amplified in the decision-making stage, leading to suboptimal bidding strategies and elevated operational risks. Moreover, under strong-signal scenarios the strategy often converges to boundary values (0.9 or 1.1). While such behavior can yield high profits in deterministic environments, any forecasting error in direction may exacerbate losses and undermine stable operation. In addition, for gray zones where the spread interval crosses zero, the proposed approach employs a multi-signal weighted continuous mapping mechanism to generate bidding ratios. However, this design relies on manually set weight parameters and lacks dynamic adaptability, which may result in unstable performance across different market environments. Furthermore, risk control in this framework is primarily based on the Conditional Value at Risk (CVaR) metric. Although effective in capturing tail risks, it fails to reflect multi-dimensional factors that retail companies face in practice, such as contract fulfillment obligations, deviation penalties, liquidity pressures, and customer heterogeneity. It should also be noted that the current experimental validation is conducted using data from a provincial electricity spot market in China.
Different provinces and countries adopt varying market rules, pricing mechanisms, and trading intervals, which may significantly influence bidding behavior and model applicability. Therefore, the generalizability of the proposed framework to other regional or international markets remains limited and requires further cross-market validation and adaptive mechanism design. Finally, since the implementation involves multi-model integration, confidence interval quantification, and preference optimization, the computational complexity remains high, raising challenges for real-time application in high-frequency decision-making scenarios.
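For readers unfamiliar with the CVaR metric mentioned among the limitations, the sketch below shows the standard empirical estimate: the average of the worst tail of a loss sample. It assumes the textbook definition rather than the paper's specific formulation; the function name, the confidence level, and the sample losses are illustrative.

```python
import math
from typing import Sequence


def empirical_cvar(losses: Sequence[float], alpha: float = 0.95) -> float:
    """Average of the worst (1 - alpha) share of losses (larger value = worse).

    This is the usual sample estimate of Conditional Value at Risk; at least
    one observation is always included in the tail.
    """
    if not losses:
        raise ValueError("losses must not be empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(losses, reverse=True)  # worst losses first
    # Rounding guards against float noise such as 20 * (1 - 0.95) > 1.
    tail_size = max(1, math.ceil(round(len(ordered) * (1.0 - alpha), 9)))
    tail = ordered[:tail_size]
    return sum(tail) / len(tail)


sample_losses = [float(x) for x in range(1, 21)]  # hypothetical losses 1..20
print(empirical_cvar(sample_losses, alpha=0.90))
```

Because the estimate depends only on the tail of a single loss distribution, it cannot by itself express constraints such as contract fulfillment or deviation penalties, which is the limitation noted above.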

4.3. Future Work

To address these limitations, future research should focus on improving the robustness, adaptability, and practicality of the framework. In the forecasting stage, deep temporal models combined with external knowledge graphs can be introduced to integrate weather, policy, and cross-regional electricity price information, thereby enhancing both the stability and interpretability of spread predictions. In addition, a policy interpretation agent can be developed based on knowledge graph construction, enabling the framework to automatically extract, understand, and reason over policy documents, market regulations, and regional rule variations, thus providing structured policy intelligence to support adaptive decision-making. In strategy generation, adaptive boundary-adjustment mechanisms should be developed to dynamically learn from market conditions and historical feedback, reducing reliance on fixed thresholds and achieving a more balanced trade-off between profit and risk. The continuous mapping mechanism in gray zones can be improved through adaptive weight learning or reinforcement learning, replacing manually set parameters to ensure flexibility and stability under diverse market conditions. In risk management, future work should extend beyond CVaR to incorporate a multi-dimensional risk-control framework that accounts for contract fulfillment, deviation penalties, liquidity constraints, and customer heterogeneity, thus better aligning with real-world operational needs. Moreover, deeper integration of market rules and physical constraints, such as margin management, transmission capacity limits, and coordinated contract optimization, will improve the applicability of the approach in real electricity markets. 
Finally, to tackle computational complexity and ensure real-time applicability, methods such as model distillation, parallel computing, and incremental updating should be adopted, enabling the proposed framework to operate effectively in high-frequency trading and rolling optimization environments.

5. Conclusions

This study addresses the challenges of high volatility and uncertainty in electricity spot markets under increasing renewable energy penetration through an integrated approach that jointly optimizes forecasting accuracy, decision robustness, and risk control. Experimental results show that Joint achieves a monthly profit of 146,933.46 CNY in prediction validation with favorable classification performance (Precision = 53.25%, Recall = 40.45%, AA = 56.05%, SWA = 57.36%), while the complete model in ablation experiments reaches the highest monthly profit of 157,746.64 CNY, confirming the indispensable contributions of each component to robustness and profitability. These findings demonstrate that Joint is both theoretically sound and practically valuable, and future work will focus on enhancing computational efficiency through lightweight modeling and knowledge distillation, incorporating advanced risk management metrics such as CVaR and dynamic risk-preference modeling, and extending applications across multiple markets and diverse energy resources to support renewable integration and market-based trading.

Author Contributions

Conceptualization, S.Z. and W.D.; methodology, S.Z. and M.L.; software, Y.Z.; validation, Y.Z. and B.W.; formal analysis, B.W. and Z.J.; investigation, S.Z. and M.L.; resources, N.G. and W.D.; data curation, N.G., J.Y. and Z.J.; writing—original draft preparation, S.Z. and N.G.; writing—review and editing, W.D. and J.Y.; visualization, Y.Z. and Z.J.; supervision, N.G. and J.Y.; project administration, M.L. and N.G.; funding acquisition, M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shenzhen Science and Technology Program (Grant No. KJZD20241122161901002).

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author. The data is not publicly available due to privacy and ethical restrictions.

Conflicts of Interest

Authors Shicheng Zhang and Yuqin Zhang were employed by the company Jiangsu Institute of Smart Energy Utilization and Low-Carbon Technologies Co., Ltd.; author Wangli Deng was employed by the company Shenzhen Shenneng Innovation Technology Co., Ltd.; authors Ning Guo and Jianyu Yu were employed by the company State Grid Jiangsu Electric Power Co., Ltd.; and author Mei Liao was employed by the company Shenzhen Energy Group Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Florini, A. The International Energy Agency in global energy governance. Glob. Policy 2011, 2, 40–50.
  2. Narajewski, M.; Ziel, F. Optimal bidding in hourly and quarter-hourly electricity price auctions: Trading large volumes of power with market impact and transaction costs. Energy Econ. 2022, 110, 105974.
  3. Conejo, A.J.; Carrión, M.; Morales, J.M. Decision Making Under Uncertainty in Electricity Markets; Springer: New York, NY, USA, 2010; Volume 1.
  4. Weron, R. Electricity price forecasting: A review of the state-of-the-art with a look into the future. Int. J. Forecast. 2014, 30, 1030–1081.
  5. Kristiansen, T. Forecasting Nord Pool day-ahead prices with an autoregressive model. Energy Policy 2012, 49, 328–332.
  6. Ramchurn, S.D.; Vytelingum, P.; Rogers, A.; Jennings, N.R. Putting the ‘smarts’ into the smart grid: A grand challenge for artificial intelligence. Commun. ACM 2012, 55, 86–97.
  7. Gao, S.; Wen, Y.; Zhu, M.; Wei, J.; Cheng, Y.; Zhang, Q.; Shang, S. Simulating financial market via large language model based agents. arXiv 2024, arXiv:2406.19966.
  8. O’Connor, C.; Bahloul, M.; Prestwich, S.; Visentin, A. A Review of Electricity Price Forecasting Models in the Day-Ahead, Intra-Day, and Balancing Markets. Energies 2025, 18, 3097.
  9. Cao, M.; Wang, Y.; Liu, J.; Yin, Z.; Guo, X.; Ren, X. Day ahead electricity price forecasting based on the deep belief network. Wirel. Commun. Mob. Comput. 2022, 2022, 3960597.
  10. Marcjasz, G.; Narajewski, M.; Weron, R.; Ziel, F. Distributional neural networks for electricity price forecasting. Energy Econ. 2023, 125, 106843.
  11. Ribeiro, M.H.D.M.; Stefenon, S.F.; de Lima, J.D.; Nied, A.; Mariani, V.C.; Coelho, L.d.S. Electricity Price Forecasting Based on Self-Adaptive Decomposition and Heterogeneous Ensemble Learning. Energies 2020, 13, 5190.
  12. Li, Y.; Yang, Y.; Zhang, F.; Li, Y. A Stackelberg game-based approach to load aggregator bidding strategies in electricity spot markets. J. Energy Storage 2024, 95, 112509.
  13. Yang, N.; Zhu, L.; Wang, B.; Fu, R.; Qi, L.; Jiang, X.; Sun, C. A Master–Slave Game-Based Strategy for Trading and Allocation of Virtual Power Plants in the Electricity Spot Market. Energies 2025, 18, 442.
  14. Zhang, L.; Tian, C.; Li, Z.; Yin, S.; Xie, A.; Wang, P.; Ding, Y. The Impact of Participation Ratio and Bidding Strategies on New Energy’s Involvement in Electricity Spot Market Trading under Marketization Trends—An Empirical Analysis Based on Henan Province, China. Energies 2024, 17, 4463.
  15. Ma, Q.; Liu, B.; Li, J. A Trading Model for the Electricity Spot Market That Takes into Account the Preference for Energy Storage Trading. Energies 2025, 18, 2322.
  16. Renshaw-Whitman, C.; Zobernig, V.; Cremer, J.L.; de Vries, L. Non-stationarity in multiagent reinforcement learning in electricity market simulation. Electr. Power Syst. Res. 2024, 235, 110712.
  17. Liao, Z.; Li, C.; Zhang, X.; Hu, Q.; Wang, B. A Bidding Strategy for Power Suppliers Based on Multi-Agent Reinforcement Learning in Carbon–Electricity–Coal Coupling Market. Energies 2025, 18, 2388.
  18. Wang, Y.; Liu, C.; Yuan, W.; Li, L. MRL-Based Model for Diverse Bidding Decision-Makings of Power Retail Company in the Wholesale Electricity Market of China. Axioms 2023, 12, 142.
  19. Nowotarski, J.; Weron, R. Recent advances in electricity price forecasting: A review of probabilistic forecasting. Renew. Sustain. Energy Rev. 2018, 81, 1548–1568.
  20. Bai, C.; Zhang, Y.; Qiu, S.; Zhang, Q.; Xu, K.; Li, X. Online Preference Alignment for Language Models via Count-based Exploration. arXiv 2025, arXiv:2501.12735.
  21. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System; ACM: New York, NY, USA, 2016.
  22. Meng, Q. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
  23. Box, G. Box and Jenkins: Time Series Analysis, Forecasting and Control. In A Very British Affair: Six Britons and the Development of Time Series Analysis During the 20th Century; Palgrave Macmillan UK: London, UK, 2013; pp. 161–215.
Figure 1. Multi-agent decision-making framework.
Figure 2. Structure and workflow of the Prediction Agent.
Figure 3. Structure of the Strategy Agent and its decision logic.
Figure 4. Structure of the Feedback Agent.
Table 1. Overall Performance Comparison of Different Forecasting Methods on the Test Set.
| Method | Monthly Profit (×10⁴ CNY) | SWA | Precision | Recall | AA |
|---|---|---|---|---|---|
| XGBoost | 140,801.04 | 57.36% | 50.88% | 62.95% | 58.27% |
| LightGBM | 105,464.19 | 54.60% | 49.18% | 51.73% | 55.02% |
| Prophet | 8669.41 | 55.04% | 48.48% | 89.16% | 62.11% |
| ARIMA | −18,500.30 | 49.21% | 45.10% | 42.32% | 51.40% |
| Prediction Agent | 146,933.46 | 57.36% | 53.25% | 40.45% | 56.05% |
Table 2. Ablation Study Results.
| Method | Monthly Profit (×10⁴ CNY) |
|---|---|
| Joint | 157,746.64 |
| Joint-noCI | 149,763.89 |
| Joint-noPV | 123,559.44 |
| Joint-noCM | 146,616.57 |
| Joint-noFO | 146,933.46 |

Share and Cite

MDPI and ACS Style

Zhang, S.; Deng, W.; Zhang, Y.; Jing, Z.; Guo, N.; Yu, J.; Wang, B.; Liao, M. A Multi-Agent Closed-Loop Decision-Making Framework for Joint Forecasting and Bidding in Electricity Spot Markets. Energies 2025, 18, 6486. https://doi.org/10.3390/en18246486
