1. Introduction
In Japan, the transition to renewable energy as the main power source has been strongly promoted to achieve carbon neutrality by 2050. In particular, expanding photovoltaic (PV) deployment has become an important policy issue [1]. Although PV systems are relatively easy to install as distributed energy resources, their output strongly depends on weather conditions, particularly solar irradiance. Therefore, short-term output fluctuations and prediction errors can significantly affect the power supply–demand balance. As PV penetration increases, the operational uncertainty of power systems also increases, as does the demand for the reserve power required to maintain a stable electricity supply.
In addition, the widespread adoption of PV systems has substantially influenced electricity market prices. As more renewable generation enters the market, periods in which daytime generation exceeds electricity demand become more likely, leading to significant price declines in energy markets such as the day-ahead spot market [2]. Negative prices have been reported in Europe, and in Japan, market prices occasionally approach zero during high-PV hours [3,4]. These price declines reduce the profitability of PV operators and may hinder large-scale PV deployment. Therefore, to support the sustainable expansion of PV systems, new revenue opportunities that do not rely solely on electricity sales in the energy market need to be secured.
From this perspective, participation in a balancing market [5], specifically, the possibility that PV facilities can create and provide reserve power, is of particular interest. The balancing market is a framework through which transmission system operators procure the reserve power required to maintain the supply–demand balance and the power system frequency. The importance of this market has grown with the expansion of renewable energy integration. Traditionally, PV systems have been considered variable generation resources that require reserve power. However, this study aimed to explore the possibility of PV systems providing reserve power by absorbing their own uncertainty through proper planning and control.
Recent studies have increasingly linked renewable energy forecasting to market-oriented decision-making. Wang et al. proposed a risk-averse forecasting method for renewable energy trading based on the conditional value at risk (CVaR) assessment of forecast errors [6]. Their work demonstrates that, in imbalance-sensitive markets, forecast models should be evaluated not only by their average accuracy but also by their ability to reduce extreme errors and financial risk. This perspective is closely related to the present study, which focused on identifying rare, high-impact forecast errors. In addition to forecast-based risk assessments, several studies have investigated the economic participation of PV resources in ancillary services and electricity markets. Petkovski et al. evaluated the economic viability of PV systems providing frequency containment reserve services and demonstrated their potential to participate in flexibility markets when operational constraints and reserve requirements were considered [7]. Zhou et al. developed a learning-based joint bidding strategy for a PV and battery system in Singapore’s electricity market, demonstrating the importance of data-driven bidding strategies for coordinating renewable energy generation and storage under market uncertainty [8].
Our team has developed a coherent line of research to enable PV systems to participate in electricity markets under forecast uncertainty, with a consistent focus on operational risk, particularly shortage-imbalance exposure, and on practical mechanisms for providing flexibility. Early studies established a headroom-control concept to absorb PV prediction errors and systematically compared statistical, machine learning, and combined models of error absorption. That work clarified that, for market-facing operation, reducing the average point-forecast error alone is insufficient; rather, an explicit operational “buffer” (headroom) can substantially improve reliability by limiting shortage-driven deviations when forecasts fail [9]. Building on this foundation, we proposed an implementable headroom-setting algorithm for reserve creation in PV power plants. By representing the mapping from operating conditions to the required headroom using polynomial surfaces, this method converts error-absorption requirements into actionable rules suitable for real-time operation and reserve provisioning [10].
In parallel, another of our studies connected forecasting uncertainty to participation in Japan’s balancing market more explicitly by formulating day-ahead planning and evaluating the shortage-occurrence risk under market participation constraints. That study quantified how plan submission under forecast uncertainty can propagate into shortage events and operational penalties, and highlighted the importance of risk-aware planning when withdrawal-related constraints are imposed [11]. Recently, we proposed a probabilistic modeling framework for prediction errors aimed at enhancing the balancing-market participation of PV systems. This work introduced error-threshold estimation and incorporated multisite aggregation effects and inverter overloading, which are practical factors that materially shape the distribution of prediction errors and the tail-risk profile relevant to imbalance exposure [12]. Collectively, these studies established (i) operational mechanisms for absorbing prediction errors via headroom control, (ii) implementable reserve creation and planning rules aligned with market participation, and (iii) risk evaluation frameworks that incorporate uncertainty, aggregation, and equipment constraints. However, infrequent but severe forecast failures (rare-event tail errors) can disproportionately dominate market outcomes. In particular, certain error types, such as sign-crossing mistakes that invert the direction of over-/under-estimation risk, can lead to operationally critical decisions and heightened shortage-imbalance exposure even when the overall accuracy differences are small. Motivated by this gap, the present study focused on discriminability-driven feature design and selection within a hierarchical classification pipeline, using the area under the receiver operating characteristic (ROC) curve (AUC) as a separability criterion, and evaluated the performance not only via aggregate accuracy but also via operationally critical misclassification patterns and simulated bidding outcomes.
PV systems require advanced forecasting techniques to participate in the balancing market and reliably submit planned values [5]. PV power forecasting methods based on numerical weather prediction (NWP) data, such as the Japan Meteorological Agency (JMA) GPV-GSM data, are widely used [13,14]. NWP-based irradiance and PV forecasting have long been recognized as essential for power system operations under high PV penetration. Nagoya et al. [15] emphasized that, when PV is widely distributed across numerous sites, system operation requires not only point-wise forecast accuracy but also the verification of forecast performance on an area-wide scale. They evaluated an approach to predict the area-total irradiance using regional NWP data from the JMA (GPV) and compared the forecasted area-total irradiance with a presumed total irradiance derived from numerous points corresponding to the PV distribution, thereby providing an evaluation framework relevant to wide-area balancing needs. Suzuki et al. [16] proposed a solar irradiance forecasting method that combined NWP information (GPV-based physical model inputs) with a data-driven black-box model via just-in-time modeling. They assessed the prediction accuracy over wide areas using measured data from multiple observation points (44 sites across Japan provided by the JMA and 64 sites around Kanto provided by the New Energy and Industrial Technology Development Organization) and also examined how different GPV initialization times affect prediction performance. That study is informative in that it demonstrated, at an early stage, a hybrid perspective that leverages NWP while compensating for local variability through data-driven modeling. Although these studies provide important insights into wide-area forecasting and NWP-based prediction frameworks, they mainly focus on average accuracy measures and general forecasting performance. In contrast, for market participation and operational risk management, infrequent but severe forecast failures (rare-event errors) can be disproportionately consequential, for example, by triggering shortage imbalances. Accordingly, our study builds on NWP (GPV-GSM) inputs but shifts the emphasis from average error reduction to rare-event risk detection and its integration into balancing-market bidding decisions.
Although NWP-based methods are suitable for predicting large-scale weather fields, they may not accurately capture sudden changes in solar irradiance caused by local cloud formation and movement. Consequently, large discrepancies may arise between the day-ahead planned values and actual generation, resulting in imbalances.
Most previous studies have focused on improving the metrics that measure the average forecast error, such as the root mean square error, in PV power forecasting. However, from the viewpoint of electricity market operations, the factors that cause significant economic losses and operational disruptions are often not average errors, but rather low-frequency, large deviations from forecasts, that is, errors in forecasting rare events. In particular, in balancing-market bidding planning, submitting an excessively high planned value during time periods in which such rare events occur may increase shortage imbalances and associated penalties. Therefore, developing a framework that can detect rare large prediction errors in advance and adjust market participation volumes according to the associated risks is an important challenge.
Accordingly, this paper proposes a machine-learning-based method that uses GPV-GSM numerical weather prediction data to classify and detect, in advance, the rare-event risk of solar irradiance forecast errors in PV systems, and then reflects the results in balancing-market bidding plans. Specifically, rather than treating prediction errors as a single continuous variable and performing point forecasting, the proposed method classifies prediction errors into multiple risk categories, such as overestimation, underestimation, and rare large forecast deviations. Planned values are then determined according to each category. The effectiveness of the proposed method was evaluated using indicators such as the number of imbalance events, with particular emphasis on shortage-imbalance events, to verify whether bidding planning that accounts for rare-event risk contributes to reducing operational risk in market participation. Through this approach, this study aimed to provide practical insights for improving the profitability of PV operators and securing reserve power in power systems.
The remainder of this paper is organized as follows. Section 2 presents the research approach, including data preprocessing and the definition of rare events. Section 3 describes rare-event classification modeling and AUC-based feature extraction. Section 4 presents the numerical simulation conditions and results used to verify the effectiveness of the proposed method and the bidding planning strategies. Finally, Section 5 concludes the paper and discusses future work.
3. Rare-Event Classification Modeling
In this section, we present a multilayer perceptron (MLP) model that incorporates AUC-based feature selection to capture temporal features. To validate the AUC-based model, we compared it with a model that did not use AUC-based feature selection.
3.1. Extraction of Candidate Features
In this study, we examined the extent to which input features contribute to discriminating between classes, that is, class separability. We quantitatively evaluated each feature using the AUC, which is a separability metric commonly used in binary classification. We also examined the effectiveness of an AUC-based feature selection scheme. The definitions of the four classes (Classes 1–4) and their corresponding data distributions are the same as those described in the previous section. To compute the AUC stage by stage, we formulated multiple binary classification tasks by pairing and/or grouping the classes.
Figure 2 shows an overview of the hierarchical (stagewise) structure. The primary objective of this section is to assess the influence of feature selection on classification performance. Therefore, we kept the classification procedure and training conditions identical across experiments and varied only the input feature sets. Specifically, we compared a model that used the full set of available features with one that used a subset of features selected on the basis of their AUC.
In this study, the full candidate feature set was defined as the combination of basic and additional candidate features. The basic features comprised solar radiation, total cloud cover, and periodic features representing diurnal and seasonal cycles (sine/cosine encodings of the day of the year and the time of day). These features capture the fundamental factors underlying solar radiation prediction errors (solar radiation level, cloud attenuation, and periodic variation) and constitute the minimum essential information. Additional candidate features included humidity, precipitation, cloud cover at various levels (low, middle, and high), air temperature, barometric pressure, and wind speed. For each binary classification task at each stage, the univariate AUC was calculated for every candidate feature, and features with high discriminative capability were selected for model construction.
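Periodic features are commonly built as sine/cosine pairs so that, for example, 23:00 and 00:00 end up close together in feature space. The following is a minimal sketch, assuming a 365-day year and a 24-hour day as cycle lengths; the function name `periodic_features` is hypothetical and not taken from the study.

```python
import numpy as np


def periodic_features(day_of_year, hour, days_per_year=365.0, hours_per_day=24.0):
    """Return sin/cos encodings of the seasonal and diurnal cycles.

    Columns: sin(day), cos(day), sin(hour), cos(hour). The cycle lengths are
    assumptions (365-day year, 24-hour day), not values taken from the study.
    """
    day = np.asarray(day_of_year, dtype=float)
    hr = np.asarray(hour, dtype=float)
    day_angle = 2.0 * np.pi * day / days_per_year
    hour_angle = 2.0 * np.pi * hr / hours_per_day
    return np.column_stack(
        [np.sin(day_angle), np.cos(day_angle), np.sin(hour_angle), np.cos(hour_angle)]
    )


# Example: two hourly samples inside the 09:00-15:00 evaluation window.
features = periodic_features(day_of_year=[1, 183], hour=[9, 15])
print(features.shape)  # (2, 4)
```

Each pair lies on the unit circle, so the encoding has no artificial discontinuity at midnight or at the turn of the year.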
3.2. Definition of ROC and AUC
In a sample set, if a higher classification score indicates a higher probability of being positive, samples with a score $s > t$ are classified as positive using a threshold $t$. Let $X_0$ and $X_1$ denote the classification scores for the negative and positive groups, respectively, and let $F_0$ and $F_1$ denote the corresponding cumulative distribution functions.
The false positive rate (FPR), FPR = FP/(FP + TN), represents the probability that a negative sample is incorrectly classified as positive, and is defined as follows:
$$\mathrm{FPR}(t) = P(X_0 > t) = 1 - F_0(t).$$
Here, by setting the FPR to $u$, we obtain $u = 1 - F_0(t)$, and thus the threshold is expressed as a function of $u$ as follows: $t(u) = F_0^{-1}(1 - u)$.
In contrast, the true positive rate (TPR), TPR = TP/(TP + FN), represents the probability that a positive sample is correctly classified as positive, corresponding to the sensitivity, and is defined as follows:
$$\mathrm{TPR}(t) = P(X_1 > t) = 1 - F_1(t).$$
The ROC curve characterizes the relationship between the FPR and TPR as the classification threshold $t$ varies. Based on the distributions of the classification scores in the negative and positive groups, the ROC curve can be expressed as a function of the FPR, denoted by $u$, as follows:
$$\mathrm{ROC}(u) = \mathrm{TPR}\bigl(t(u)\bigr) = 1 - F_1\bigl(F_0^{-1}(1 - u)\bigr), \qquad 0 \le u \le 1.$$
The AUC is defined as the area under the ROC curve and is obtained by integrating $\mathrm{ROC}(u)$ with respect to the FPR $u$ over the interval from 0 to 1:
$$\mathrm{AUC} = \int_0^1 \mathrm{ROC}(u)\,du.$$
In addition to this geometric interpretation, the AUC can be interpreted as the probability that a randomly selected positive sample receives a higher score than a randomly selected negative sample, that is, $\mathrm{AUC} = P(X_1 > X_0)$.
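The agreement between the geometric and probabilistic interpretations can be checked numerically. The sketch below, assuming NumPy and continuous (tie-free) synthetic scores, sweeps the threshold over all observed scores to build an empirical ROC curve and integrates it with the trapezoidal rule; the helper names `empirical_roc` and `area_under_curve` are hypothetical. When ties occur, the trapezoidal area counts each tied negative–positive pair as one half, so it then differs slightly from a strict-inequality pair count.

```python
import numpy as np


def empirical_roc(scores_neg, scores_pos):
    """Empirical ROC curve: (FPR, TPR) pairs for a decreasing threshold t.

    A sample is classified as positive when its score is strictly greater than t.
    """
    neg = np.asarray(scores_neg, dtype=float)
    pos = np.asarray(scores_pos, dtype=float)
    # Start above every score (nothing positive) and end below every score
    # (everything positive), visiting each observed score in between.
    observed = np.unique(np.concatenate([neg, pos]))[::-1]
    thresholds = np.concatenate(([np.inf], observed, [-np.inf]))
    fpr = np.array([np.mean(neg > t) for t in thresholds])
    tpr = np.array([np.mean(pos > t) for t in thresholds])
    return fpr, tpr


def area_under_curve(fpr, tpr):
    """Trapezoidal integral of TPR with respect to FPR over [0, 1]."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


rng = np.random.default_rng(0)
neg_scores = rng.normal(0.0, 1.0, size=200)  # synthetic negative-group scores
pos_scores = rng.normal(1.0, 1.0, size=150)  # synthetic positive-group scores
fpr, tpr = empirical_roc(neg_scores, pos_scores)
print(round(area_under_curve(fpr, tpr), 3))
```

With tie-free scores, each step of the curve adds exactly one sample, so the area equals the fraction of correctly ordered (negative, positive) pairs.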
To estimate this probability, all possible pairs of positive and negative samples were compared, and the proportion of pairs in which the positive sample had a higher score than the negative sample was calculated. In this study, the AUC was computed by treating each feature value as a discrimination score between positive and negative samples:
$$\mathrm{AUC}_k = \frac{1}{n_0 n_1} \sum_{i=1}^{n_0} \sum_{j=1}^{n_1} \mathbb{1}\left(x_{1j}^{(k)} > x_{0i}^{(k)}\right),$$
where $n_0$ and $n_1$ represent the total numbers of negative and positive samples, respectively; $\mathbb{1}(\cdot)$ represents the indicator function, which takes the value of 1 if the condition is true and 0 otherwise; and $x_{0i}^{(k)}$ and $x_{1j}^{(k)}$ represent the values of feature $k$ for the $i$-th negative and the $j$-th positive samples, respectively.
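The pairwise comparison described above maps directly onto array broadcasting. The sketch below, assuming NumPy, a label vector in which 1 marks the positive group of a stage-defined task, and hypothetical function names, computes the univariate AUC of every column of a feature matrix using the strict inequality of the indicator. It builds an $n_0 \times n_1$ comparison matrix, so memory grows with $n_0 n_1$; for large datasets, a rank-based formulation of the same statistic would be preferable.

```python
import numpy as np


def univariate_auc(values_neg, values_pos):
    """Fraction of (negative, positive) pairs in which the positive value is strictly larger."""
    x0 = np.asarray(values_neg, dtype=float)  # feature values of the n0 negative samples
    x1 = np.asarray(values_pos, dtype=float)  # feature values of the n1 positive samples
    # (n0, n1) boolean matrix: entry [i, j] is the indicator 1(x1[j] > x0[i]).
    wins = x1[np.newaxis, :] > x0[:, np.newaxis]
    return float(wins.sum()) / (x0.size * x1.size)


def auc_per_feature(X, y):
    """Univariate AUC of each column of X, treating y == 1 as the positive group."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    neg, pos = X[y == 0], X[y == 1]
    return np.array([univariate_auc(neg[:, k], pos[:, k]) for k in range(X.shape[1])])


# Example: 4 of the 6 (negative, positive) pairs are correctly ordered.
print(univariate_auc([1.0, 2.0, 3.0], [2.0, 4.0]))  # 0.6666666666666666
```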
Figure 3 illustrates representative score distributions corresponding to different AUC values in binary classification. An AUC of 0.5 corresponds to random ordering, and an AUC of 1 indicates perfect separability.
In this study, each candidate feature was treated as a univariate score, and its AUC was used as a quantitative separability measure. Features whose AUCs deviate further from 0.5 provide stronger discriminative power for the corresponding binary task. Therefore, we computed the univariate AUC for each stage-defined binary task using training data and selected features with high separability to augment the baseline feature set.
3.3. Stepwise AUC Calculation and Feature Selection
As the AUC is a metric for binary classification, this study decomposed the four-class classification into three binary tasks for AUC calculation (see Figure 2).
In Stage 1 (sign determination), the underprediction (Classes 1 and 2) and overprediction (Classes 3 and 4) sides were distinguished. Group 1-0 was assigned to the underprediction side and Group 1-1 to the overprediction side.
Stage 2 (underprediction degree determination) distinguished between Classes 1 and 2. Group 2-0 was assigned to Class 1 and Group 2-1 to Class 2.
Stage 3 (overprediction degree determination) distinguished between Classes 3 and 4. Here, Group 3-0 was assigned to Class 3 and Group 3-1 to Class 4.
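The three stagewise tasks can be expressed as label mappings from the four classes to binary groups, with each task using only the classes it concerns. The sketch below is one way to encode them, assuming NumPy and integer class labels 1–4; the dictionary `STAGE_TASKS` and the function `stage_dataset` are hypothetical names.

```python
import numpy as np

# Binary group assigned to each class in each stage; classes absent from a
# stage's mapping are excluded from that stage's task.
STAGE_TASKS = {
    "stage1_sign": {1: 0, 2: 0, 3: 1, 4: 1},  # Group 1-0: under side, Group 1-1: over side
    "stage2_under_degree": {1: 0, 2: 1},     # Group 2-0: Class 1, Group 2-1: Class 2
    "stage3_over_degree": {3: 0, 4: 1},      # Group 3-0: Class 3, Group 3-1: Class 4
}


def stage_dataset(X, classes, stage):
    """Return the rows of X that belong to a stage and their binary group labels."""
    mapping = STAGE_TASKS[stage]
    classes = np.asarray(classes, dtype=int)
    mask = np.isin(classes, list(mapping))
    y = np.array([mapping[int(c)] for c in classes[mask]], dtype=int)
    return np.asarray(X)[mask], y


X = np.arange(5).reshape(-1, 1)  # one dummy feature per sample
classes = [1, 2, 3, 4, 2]
X2, y2 = stage_dataset(X, classes, "stage2_under_degree")
print(X2.ravel(), y2)  # [0 1 4] [0 1 1]
```

The same per-stage subsets can then be passed to a univariate AUC computation and, later, to the corresponding stage classifier.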
The univariate AUC for each stage was calculated using only the training data. To make the evaluation independent of the separation direction, we defined the separation index of feature $k$ as $S_k = \max(\mathrm{AUC}_k, 1 - \mathrm{AUC}_k)$, and features whose separation index was greater than or equal to a specified threshold were selected. The selection criterion was set to $S_k \ge 0.6$, with periodic features selected regardless of their AUC values.
The reason for setting the threshold at 0.6 is that, although an AUC of 0.5 corresponds to random ordering (i.e., no discrimination), $S_k \ge 0.6$ indicates that one group tends to take higher values than the other with a probability of at least 0.6, so a certain degree of discrimination can be expected from a single feature. This empirical threshold was adopted to balance the number of selected features against the risk of overfitting.
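Taken together, the selection rule keeps a feature for a stage when its direction-independent separation index reaches the threshold and always keeps the periodic features. A minimal sketch follows, assuming NumPy, per-feature AUC values already computed on training data, and the 0.6 threshold; the function name and the feature names and AUC values in the example are illustrative only, not results from this study.

```python
import numpy as np


def select_features(auc_values, feature_names, always_keep, threshold=0.6):
    """Keep features with max(AUC, 1 - AUC) >= threshold, plus any in always_keep."""
    auc = np.asarray(auc_values, dtype=float)
    # Direction-independent separation index: AUC = 0.3 separates as well as AUC = 0.7.
    separation = np.maximum(auc, 1.0 - auc)
    return [
        name
        for name, s in zip(feature_names, separation)
        if s >= threshold or name in always_keep
    ]


names = ["radiation", "humidity", "pressure", "sin_hour"]
auc = [0.72, 0.35, 0.52, 0.50]  # illustrative values, not results from this study
print(select_features(auc, names, always_keep={"sin_hour"}))
# ['radiation', 'humidity', 'sin_hour']
```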
3.4. MLP Model Structure
A binary classifier was trained using an MLP for each stage [20]. Stage 1 determined whether a sample was under- or overpredicted. Based on this result, Stage 2 (underprediction) or Stage 3 (overprediction) was applied, ultimately determining the class (Classes 1–4).
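The routing from Stage 1 to Stage 2 or Stage 3 can be sketched as follows. This assumes NumPy, that each stage model outputs the probability of its Group 1 label (overprediction side for Stage 1, Class 2 for Stage 2, Class 4 for Stage 3), and a common decision threshold of 0.5; the threshold value and the function name `route_to_class` are assumptions, as the text states only that the decision threshold was kept constant.

```python
import numpy as np


def route_to_class(p_over, p_class2, p_class4, threshold=0.5):
    """Combine the three stage outputs into a final class label (1-4).

    p_over   : Stage 1 probability that the sample is on the overprediction side
    p_class2 : Stage 2 probability of Class 2 (vs Class 1), used on the under side
    p_class4 : Stage 3 probability of Class 4 (vs Class 3), used on the over side
    """
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p_over, p_class2, p_class4))
    under_label = np.where(p2 >= threshold, 2, 1)
    over_label = np.where(p3 >= threshold, 4, 3)
    # Stage 1 decides which degree model's answer is used for each sample.
    return np.where(p1 >= threshold, over_label, under_label)


print(route_to_class([0.2, 0.8, 0.9, 0.1], [0.7, 0.1, 0.1, 0.3], [0.1, 0.2, 0.6, 0.9]))
# [2 3 4 1]
```

In deployment, only the branch selected by Stage 1 needs to be evaluated; computing both degree outputs for every sample and then selecting, as here, yields the same labels.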
For the performance comparison, the classification procedure, MLP architecture, training conditions, and decision threshold were kept constant. The input features were varied under the following two conditions:
- (1) All-features case: all candidate features were used as inputs.
- (2) AUC-based case: the features selected from the pool of candidate features using the stagewise AUC criterion were used as inputs.
We configured the MLP classifier as a hierarchical binary classifier and trained three independent MLPs. Each MLP consisted of three hidden layers of 128 units each, using the rectified linear unit (ReLU) activation function with dropout (rate 0.3) and L2 regularization (1 × 10⁻⁴). Training was performed using the Adam optimizer with a learning rate of 1 × 10⁻³, a batch size of 128, and 100 epochs. To mitigate the effects of class imbalance, we adopted the focal loss function with a focusing parameter $\gamma$. The other settings are listed in Table 1.
To ensure stable training on imbalanced datasets containing rare events while mitigating the risk of overlooking minority classes, we adopted focal loss as the training objective. Focal loss reduces the contribution of samples that are easy to classify and focuses learning on samples that are difficult to classify, which is effective under class imbalance. The focal loss for binary classification is defined as follows:
$$\mathrm{FL}(p_t) = -(1 - p_t)^{\gamma} \log(p_t), \qquad p_t = \begin{cases} p, & y = 1, \\ 1 - p, & y = 0. \end{cases}$$
Here, $y \in \{0, 1\}$ represents the ground-truth label, $p$ represents the predicted probability of the positive class ($y = 1$), and $p_t$ represents the predicted probability of the correct class. Furthermore, the focusing parameter $\gamma \ge 0$ reduces the contribution of easy samples; a larger value of $\gamma$ places greater emphasis on difficult examples. When $\gamma = 0$, the focal loss reduces to the standard binary cross-entropy.
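The focal loss can be evaluated directly from predicted probabilities. The sketch below assumes NumPy and averages the per-sample loss over a batch; the default $\gamma = 2$ is an assumption (a common choice in the focal loss literature) because the $\gamma$ value used in this study is not restated in this section, and the clipping constant `eps` is added only to avoid $\log(0)$. The function name `binary_focal_loss` is hypothetical.

```python
import numpy as np


def binary_focal_loss(y_true, p_pos, gamma=2.0, eps=1e-7):
    """Mean binary focal loss -(1 - p_t)^gamma * log(p_t).

    y_true : ground-truth labels in {0, 1}
    p_pos  : predicted probabilities of the positive class (y = 1)
    gamma  : focusing parameter (2.0 is an assumed default, not the study's value)
    """
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pos, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    p_t = np.where(y == 1.0, p, 1.0 - p)  # probability assigned to the correct class
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))


y = [1, 0, 1]
p = [0.95, 0.30, 0.40]  # an easy positive, a moderate negative, a hard positive
print(binary_focal_loss(y, p, gamma=0.0))  # equals binary cross-entropy
print(binary_focal_loss(y, p, gamma=2.0))  # easy samples are down-weighted
```

Comparing the two printed values shows how the factor $(1 - p_t)^{\gamma}$ shrinks the loss of the well-classified sample far more than that of the hard one.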
4. Numerical Simulation and Results
4.1. Data, Study Period, and Site
We selected three target sites with different meteorological characteristics: Sendai (38.2617° N, 140.8967° E) in the Tohoku region, located near the sea; Fukuoka (33.5817° N, 130.3750° E) in the Kyushu region, characterized by relatively large amounts of cloud cover and water vapor; and Tateno (36.0567° N, 140.1250° E) in the Kanto region, located on a plain. This selection allowed us to evaluate the effectiveness of the proposed method under various meteorological conditions. Classification was performed using the GPV-GSM forecast values, and a simulated bid evaluation was conducted for the daily period from 09:00 to 15:00 JST. One year of data was used for validation because the proposed model was designed to capture the characteristics of sudden changes in solar radiation, and a one-year period sufficiently represents the seasonal range of meteorological variation. In addition, model training requires as many samples as possible. Therefore, the study period spanned five years, from 1 April 2019 to 31 March 2024. Data from 1 April 2019 to 31 March 2023 were used for model training, and data from 1 April 2023 to 31 March 2024 were used for validation.
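The study period and daily evaluation window can be encoded as a simple filter on hourly timestamps. In the sketch below, timestamps are assumed to be naive datetimes in JST, both 09:00 and 15:00 are assumed to be inclusive, and the same window is assumed to apply to the training and validation periods; the function name `assign_split` is hypothetical.

```python
from datetime import date, datetime

TRAIN_PERIOD = (date(2019, 4, 1), date(2023, 3, 31))
VALIDATION_PERIOD = (date(2023, 4, 1), date(2024, 3, 31))


def assign_split(ts: datetime, first_hour: int = 9, last_hour: int = 15):
    """Return 'train', 'validation', or None for an hourly JST timestamp."""
    if not first_hour <= ts.hour <= last_hour:
        return None  # outside the daily evaluation window
    day = ts.date()
    if TRAIN_PERIOD[0] <= day <= TRAIN_PERIOD[1]:
        return "train"
    if VALIDATION_PERIOD[0] <= day <= VALIDATION_PERIOD[1]:
        return "validation"
    return None  # outside the study period


print(assign_split(datetime(2021, 7, 15, 12)))  # train
print(assign_split(datetime(2023, 4, 1, 9)))    # validation
print(assign_split(datetime(2023, 4, 1, 18)))   # None
```

Splitting by calendar date rather than at random keeps the validation year strictly after the training years, which mirrors how a model would be used in practice.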
Although high-resolution solar radiation data at short time intervals are available from satellite-based sources, these data are usually estimates rather than forecasts. In contrast, forecast values more closely correspond to the information available to market participants during the bidding and scheduling process. Because this study focuses on whether fluctuations in solar radiation are reflected in the predicted and planned values used for market transactions, GPV forecast meteorological data were used. In actual power plant operations, bidding and imbalance settlement are usually conducted at 30 min intervals. However, the temporal resolution of the available GPV meteorological data was limited to hourly intervals. Although interpolation could be applied to approximate 30 min values, it could introduce additional errors. Therefore, the analysis was performed at the native hourly resolution of the GPV data rather than on interpolated values.
For the power generation data used in the bid evaluation, the power output estimated from the GPV-GSM forecast values was used as the predicted power generation, whereas the power output estimated from meteorological observation data [14] was treated as the equivalent of the actual measured power generation. The meteorological data were converted to power generation data using the physical model described in [10]. The following analysis presents bid planning and imbalance evaluation for a PV system with an installed capacity of 1 kW.
4.2. Results and Discussion
Figure 4 shows histograms of the power generation prediction errors during the training period at each site. As shown in Figure 4, although both the mode and mean of the forecast errors were near zero, the distributions exhibited right-skewed tails, indicating an overall tendency toward overprediction in this dataset.
Figure 5 shows the ROC curves for each stage at Fukuoka, which was selected as the representative site. All the ROC curves were located above the reference line corresponding to random classification, indicating that each binary classification model exhibited a certain level of discriminative ability. Among them, the Stage 3 model, which detects extreme overprediction, demonstrated the highest discriminative performance.
Table 2 presents the classification of the solar irradiance forecast errors. Class 1 indicates extreme underprediction, and Class 4 indicates extreme overprediction. Because the dataset was biased toward overprediction, the 20% thresholds on the negative and positive sides were asymmetric, resulting in different cutoff values.
Table 3 lists the features selected for each stage in Sendai. In Stage 1, features representing atmospheric conditions, such as cloud cover, humidity, and precipitation, were ranked highly, indicating that these features may have contributed to distinguishing error signs. For the underprediction degree classification, features such as the clearness index, humidity, low cloud cover, and pressure ranked highly, indicating that they may have contributed to separating Class 1 from Class 2. For the overprediction degree classification, features such as solar radiation, air temperature, high cloud cover, and wind speed ranked highly, indicating that they may have contributed to separating Class 4 from Class 3.
- (a) Stage 1 (sign determination: underside versus overside)
In Stage 1, we distinguished between underprediction (Classes 1 and 2) and overprediction (Classes 3 and 4). The evaluation metrics are presented in Table 4.
In Stage 1, the AUC-based feature selection model showed performance comparable to that of the model using all features. Although slight decreases in the F1-score were observed at some sites, such as Sendai, the differences were limited, indicating that the AUC-based model retained sufficient discriminative ability for separating under- and over-predicted samples. In Fukuoka, the F1-scores were unchanged for both classes, suggesting that the selected features preserved the classification performance while reducing the feature set. Notably, in Tateno, the AUC-based model improved the F1-score for both under- and over-prediction compared with the all-feature model. In particular, the F1-score for under-prediction increased from 0.48 to 0.53, whereas that for over-prediction increased from 0.62 to 0.67. These results suggest that AUC-based feature selection can reduce the influence of less informative features and improve class separability at certain sites.
- (b) Stages 2 and 3 (degree determination: underside and overside)
Based on the results of Stage 1, Stage 2 was applied to further distinguish between normal and extreme cases within the underprediction branch. As shown in Table 5, the AUC-based feature selection model achieved performance comparable to that of the model that used all the features for the under-normal class. In particular, the F1-scores for the under-normal cases were almost unchanged across the three sites, indicating that the selected features retained sufficient information to identify normal underprediction cases. For the under-extreme class, slight decreases in the F1-score were observed at some sites, such as Sendai and Tateno. However, in Fukuoka, the AUC-based model improved the F1-score from 0.35 to 0.40, mainly owing to improved recall. This suggests that AUC-based feature selection can enhance the detection of extreme cases under certain site-specific conditions.
In Stage 3, the AUC-based feature selection model showed performance comparable to that of the model using all the features for the over-normal class. As shown in Table 6, the F1-scores were nearly unchanged in Sendai and Fukuoka and slightly improved in Tateno, increasing from 0.91 to 0.92. This indicates that the selected features retained sufficient information to identify normal overprediction cases. For the over-extreme class, the AUC-based model improved the F1-score at Tateno from 0.28 to 0.35, mainly owing to improvements in both precision and recall. In Sendai, the F1-score remained unchanged, whereas a slight decrease was observed in Tateno. Nevertheless, the AUC-based model achieved comparable overall performance with a reduced feature set.
Figure 6 shows the four-class confusion matrices for each site. The confusion matrices also show that the overall classification structure was largely preserved after AUC-based feature selection.
As shown in Table 7, whereas the all-feature model used 17 features in all stages, the AUC-based model reduced the number of features to 7–14, depending on the site and stage. This reduction was achieved without substantial degradation in classification accuracy. These results suggest that AUC-based feature selection can remove less informative variables while preserving discriminative performance.
4.3. Bidding Plan and Imbalance Evaluation
Based on the above risk classes, planned values were generated according to three bidding strategies, namely, Strategies A, B, and C, and the resulting imbalance was evaluated. These strategies were designed to dynamically adjust the planned values submitted to the balancing market according to the inferred risk level, with particular attention paid to reducing the shortage-imbalance risk associated with overprediction events (Classes 3 and 4). This section evaluates the effectiveness of the proposed risk-class-based bidding strategies through an imbalance assessment.
Strategy A: This strategy applies an intuitive “risk-consistent” mapping and assumes that the risk classification is highly reliable. When a time step is classified as Class 4 (extreme overprediction risk), no bid is submitted to avoid a potential shortage imbalance. For Class 3 (overprediction), the bid is conservatively reduced by applying a reduction factor of 0.5 to the point forecast. For Classes 1 and 2 (underprediction), bids follow the point forecast because underprediction does not directly increase shortage exposure when the planned value is derived from the forecast. This strategy represents an idealized benchmark under near-perfect classification.
Strategy B: Because shortage risk is primarily driven by overprediction, this strategy submits no bid for time steps classified as Class 3 or 4. For Classes 1 and 2, the bid is set equal to the forecasted generation. Strategy B is intentionally simple and prioritizes risk avoidance over market participation during potentially hazardous periods.
Strategy C: This strategy additionally accounts for possible misclassification by also being conservative for the underprediction classes. As in Strategy B, no bids are submitted for Classes 3 and 4. In addition, a reduction factor of 0.5 is applied when a time step is classified as Class 2, reflecting the possibility that some time steps with overprediction risk are incorrectly labeled as mildly underpredicted. For Class 1, which corresponds to extreme underprediction, a slightly larger bid (a reduction factor of 0.7) than for Class 2 is considered acceptable, because the operational consequences of underprediction are generally less critical than those of overprediction in terms of shortage-imbalance risk. Overall, Strategy C aims to suppress sign-crossing risk at the expense of increased bid curtailment.
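The three strategies reduce to class-dependent multipliers applied to the point forecast, where a multiplier of 0 means that no bid is submitted. The sketch below, assuming NumPy and predicted classes encoded as integers 1–4, tabulates the factors stated above; the dictionary `BID_FACTORS` and the function `planned_values` are hypothetical names, and the forecast values in the example are illustrative.

```python
import numpy as np

# Multiplier applied to the point forecast for each predicted class
# (1: extreme under, 2: under, 3: over, 4: extreme over); 0.0 = no bid.
BID_FACTORS = {
    "A": {1: 1.0, 2: 1.0, 3: 0.5, 4: 0.0},
    "B": {1: 1.0, 2: 1.0, 3: 0.0, 4: 0.0},
    "C": {1: 0.7, 2: 0.5, 3: 0.0, 4: 0.0},
}


def planned_values(forecast, predicted_class, strategy):
    """Planned (bid) values obtained by scaling the forecast by the class factor."""
    factors = BID_FACTORS[strategy]
    forecast = np.asarray(forecast, dtype=float)
    scale = np.array([factors[int(c)] for c in predicted_class], dtype=float)
    return forecast * scale


forecast_kw = [0.6, 0.6, 0.6, 0.6]  # hypothetical hourly forecasts for a 1 kW plant
classes = [1, 2, 3, 4]
for name in "ABC":
    print(name, planned_values(forecast_kw, classes, name))
```

Keeping the factors in a table separates the policy from the code, so alternative reduction factors can be compared without changing the bidding logic.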
Figure 7 illustrates the relationship among the predicted, planned, and actual values, as well as the representative cases in which shortage and surplus imbalances occur.
An imbalance evaluation was conducted using the number of shortage events and the total surplus amount as evaluation metrics. As shown in Table 8, the evaluation also included, as a reference, the case in which the original predicted values were submitted directly as bid values. Compared with this reference case, all the proposed strategies substantially reduced the number of shortage events. For example, in the all-feature case, the number of shortages (Sendai/Fukuoka/Tateno) decreased from 1261/1412/1324 for the predicted values to 92/180/77 under Strategy C. Similarly, in the AUC-based case, Strategy C reduced the number of shortages to 99/168/81. These results indicate that risk-class-based bid adjustment is effective in mitigating shortage-imbalance risks. This improvement was accompanied by an increase in surplus energy, indicating a trade-off between shortage reduction and surplus generation. Among these strategies, Strategy C achieved the largest reduction in shortage events, whereas Strategy A provided a more moderate adjustment with a smaller increase in surplus energy. The AUC-based model showed trends comparable to those of the all-feature model despite the use of selected features. In certain cases, such as Fukuoka under Strategy C and Tateno under Strategy B, the AUC-based model further reduced the number of shortages.
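The two evaluation metrics can be computed per time step from the planned and actual values. The sketch below assumes NumPy and hourly steps (so that a surplus in kW over one hour is in kWh), and counts a shortage event whenever the actual output falls below the planned value, with no tolerance band; these conventions and the function name `imbalance_metrics` are assumptions rather than the study's exact settlement rules, and the example values are hypothetical.

```python
import numpy as np


def imbalance_metrics(planned, actual):
    """Count shortage events and sum surplus energy over hourly time steps.

    Shortage: actual generation below the planned value (no tolerance assumed).
    Surplus: actual generation above the planned value, summed in kWh for 1 h steps.
    """
    planned = np.asarray(planned, dtype=float)
    actual = np.asarray(actual, dtype=float)
    shortage_events = int(np.sum(actual < planned))
    surplus_kwh = float(np.sum(np.clip(actual - planned, 0.0, None)))
    return shortage_events, surplus_kwh


planned = [0.50, 0.30, 0.00]  # e.g. bid values after a risk-based adjustment
actual = [0.40, 0.50, 0.20]   # hypothetical observation-based generation
print(imbalance_metrics(planned, actual))
```

Lowering the planned values can only turn shortage steps into surplus steps, which is the trade-off between shortage reduction and surplus growth described above.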
5. Conclusions
This study examined the risk of rare events in solar radiation forecasting from the perspective of balancing-market participation by PV power plants. Using GPV-GSM numerical weather forecast data, we developed a four-class labeling scheme to distinguish between underprediction and overprediction and to further separate extreme cases. Particular emphasis was placed on the extreme overprediction class, which can be directly linked to shortage-imbalance risk. A hierarchical classification framework was constructed using MLP models and evaluated at three representative sites: Sendai, Fukuoka, and Tateno. Two feature settings were compared: an all-feature model using the full candidate feature set, and an AUC-based model using features selected according to their univariate discriminative capability.
The results showed that the AUC-based model achieved a classification performance comparable to that of the all-feature model while reducing the number of input features. Although slight decreases in performance were observed at some sites and stages, the AUC-based model maintained the main classification structure and improved detection performance in some cases. These findings indicate that AUC-based feature selection is effective for constructing a more compact and interpretable classification model without substantially degrading predictive performance.
Furthermore, a bidding simulation was conducted to evaluate the practical applicability of the proposed risk classification framework. Based on the classified risk levels, planned bid values were generated according to the three bidding strategies, and the imbalance performance was evaluated using the number of shortage events and the surplus energy.
The simulation results showed that the proposed risk-class-based bidding strategies substantially reduced the number of shortage events compared with the case in which the original predicted values were directly submitted as bid values. This reduction was accompanied by an increase in surplus energy, indicating a trade-off between shortage mitigation and surplus generation. Among these strategies, the most conservative strategy achieved the greatest reduction in shortage events, whereas the more moderate strategies provided a better balance between shortage reduction and surplus increase. The AUC-based model showed imbalance evaluation results comparable to those of the all-feature model, demonstrating that the selected features were sufficient for practical bidding support.
Future studies will focus on optimizing bidding strategies using market-oriented indicators, including imbalance costs, opportunity loss, and profitability, as well as on evaluating the robustness of the framework across additional years, regions, and operational conditions. Further extensions may include rolling model updates, intraday market integration, and the use of battery energy storage to improve the bidding flexibility and reduce the imbalance risk.