Rare-Event Risk-Based Bidding Strategy for Photovoltaic Systems in the Balancing Market

Cui, Jindan; Yanagida, Ren; Yamanaka, Shuzo; Ueda, Yuzuru

doi:10.3390/solar6030032

Department of Electrical Engineering, Faculty of Engineering, Tokyo University of Science, 6-3-1, Niijuku, Katsushika-ku, Tokyo 125-8585, Japan

Authors to whom correspondence should be addressed.

Solar 2026, 6(3), 32; https://doi.org/10.3390/solar6030032

Submission received: 30 March 2026 / Revised: 20 May 2026 / Accepted: 25 May 2026 / Published: 2 June 2026

(This article belongs to the Special Issue Connecting Photovoltaic Systems to the Distribution Grid: Solar Power Integration)

Abstract

The growing deployment of photovoltaic (PV) technology has increased the demand for grid-balancing capacity owing to greater short-term variability and forecast uncertainty. Simultaneously, higher PV penetration can lead to daytime energy market oversupply, pushing day-ahead prices toward zero and undermining PV revenues. Against this backdrop, this study investigated a market participation paradigm in which PV power plants supply reserve power themselves while actively absorbing their own uncertainty, rather than merely relying on balancing services provided by external resources. We propose a risk-aware framework that classifies solar irradiance prediction errors into four risk categories using GPV-GSM numerical weather forecast data and translates the inferred risk level into practical bidding rules for balancing market participation. We adopted a hierarchical classification pipeline consisting of sign determination (Stage 1, under- vs. overprediction) followed by degree determination (Stages 2 and 3), implemented with a multi-layer perceptron. To enhance class separability and reduce the number of features, we introduced a stage-wise area under the curve (AUC)-based feature selection and compared AUC-selected and all-features settings under identical training conditions. The proposed strategies substantially reduce shortage events compared with directly using the original predictions as bids, although they increase surplus energy. The AUC-based model achieves comparable imbalance evaluation results, indicating that the selected features are sufficient for practical bidding support.

Keywords:

photovoltaic; reserve power; balancing market; bidding planning; rare event; area under the curve (AUC)

1. Introduction

In Japan, the transition to renewable energy as the main power source has been strongly promoted to achieve carbon neutrality by 2050. In particular, expanding photovoltaic (PV) deployment has become an important policy issue [1]. Although PV systems are relatively easy to install as distributed energy resources, their output strongly depends on weather conditions, particularly solar irradiance. Therefore, short-term output fluctuations and prediction errors can significantly affect the power supply–demand balance. As PV penetration increases, the operational uncertainty of power systems grows, and so does the demand for the reserve power required to maintain a stable electricity supply.

In addition, the widespread adoption of PV systems has substantially influenced electricity market prices. As renewable generation is increasingly introduced into the market, periods during which daytime generation exceeds electricity demand become more likely, leading to significant price declines in energy markets, such as the day-ahead spot market [2]. Negative prices have been reported in Europe, and in Japan, market prices occasionally approach zero during high-PV hours [3,4]. The price decline reduces the profitability of PV operators and may hinder large-scale PV deployment. Therefore, to support the sustainable expansion of PV systems, new revenue opportunities that do not rely solely on electricity sales in the energy market need to be secured.

From this perspective, participation in a balancing market [5], specifically, the possibility that PV facilities can create and provide reserve power, is of particular interest. The balancing market is a framework through which transmission system operators procure the reserve power required to maintain the supply–demand balance and the power system frequency. Its importance has increased with the expansion of renewable energy integration. Traditionally, PV systems have been considered variable generation resources that require reserve power. However, this study aimed to explore the possibility of PV systems providing reserve power by absorbing their own uncertainty through proper planning and control.

Recent studies have increasingly linked renewable energy forecasting to market-oriented decision-making. Wang et al. proposed a risk-averse forecasting method for renewable energy trading based on the conditional value at risk (CVaR) assessment of forecast errors [6]. This demonstrates that, in imbalance-sensitive markets, forecast models should be evaluated not only by their average accuracy but also by their ability to reduce extreme errors and financial risk. This perspective is closely related to the present study, which focused on identifying rare, high-impact forecast errors. In addition to risk assessments based on forecasts, several studies have investigated the economic participation of PV resources in ancillary services and electricity markets. Petkovski et al. evaluated the economic viability of PV systems providing frequency containment reserve services and demonstrated their potential to participate in flexible markets when operational constraints and reserve requirements were considered [7]. Zhou et al. further developed a learning-based joint bidding strategy for a PV and battery system in Singapore’s electricity market, demonstrating the importance of data-driven bidding strategies for coordinating renewable energy generation and storage in the face of market uncertainty [8].

Our team has developed a coherent line of research to enable PV systems to participate in electricity markets under forecast uncertainty, with a consistent focus on operational risk, particularly shortage-imbalance exposure, and practical mechanisms to provide flexibility. Early studies established a headroom-control concept to absorb PV prediction errors and systematically compared statistical, machine learning, and combined models of error absorption. This work clarified that, for market-facing operation, reducing average point-forecast error alone is insufficient; rather, an explicit operational “buffer” (headroom) can substantially improve reliability by limiting shortage-driven deviations when forecasts fail [9]. Building on this foundation, we proposed an implementable headroom-setting algorithm for reserve creation in PV power plants. By representing the mapping from the operating conditions to the required headroom using polynomial surfaces, this method converts error-absorption requirements into actionable rules suitable for real-time operation and reserve provisioning [10].

In parallel, our study connected forecasting uncertainty to Japan’s balancing market participation more explicitly by formulating day-ahead planning and evaluating the shortage-occurrence risk under market participation constraints. This study quantified how plan submission under forecast uncertainty can propagate into shortage events and operational penalties and highlighted the importance of risk-aware planning when withdrawal-related constraints are imposed [11]. Recently, we have proposed a probabilistic modeling framework for prediction errors aimed at enhancing the balancing market participation of PV systems. This line of work introduced error-threshold estimation and incorporated multisite aggregation effects and inverter overloading considerations, which are practical factors that materially shape the distribution of prediction errors and the tail-risk profile relevant to imbalance exposure [12]. Collectively, these studies established (i) operational mechanisms for absorbing prediction errors via headroom control, (ii) implementable reserve creation and planning rules aligned with market participation, and (iii) risk evaluation frameworks that incorporate uncertainty, aggregation, and equipment constraints. However, infrequent but severe forecast failures (rare-event tail errors) can disproportionately dominate market outcomes. In particular, certain error types, such as sign-crossing mistakes that invert the direction of over-/under-estimation risk, can lead to operationally critical decisions and heightened shortage-imbalance exposure even when the overall accuracy differences are small. 
Motivated by this gap, the present study focused on discriminability-driven feature design and selection within a hierarchical classification pipeline, using the area under the receiver operating characteristic (ROC) curve (AUC) as a separability criterion, and evaluated the performance not only via aggregate accuracy but also via operationally critical misclassification patterns and simulated bidding outcomes.

PV systems require advanced forecasting techniques to participate in the balancing market and reliably submit planned values [5]. PV power forecasting methods based on numerical weather prediction data, such as the Japan Meteorological Agency (JMA) GPV-GSM data, are widely used [13,14]. Numerical weather prediction (NWP)-based irradiance and PV forecasting have long been recognized as essential for power system operations under high PV penetration. Nagoya et al. [15] emphasized that, when PV is widely distributed across numerous sites, system operation requires not only point-wise forecast accuracy but also the verification of forecast performance on an area-wide scale. They evaluated an approach to predict the area-total irradiance using regional NWP data from the JMA (GPV), and compared the forecasted area-total irradiance with a presumed total irradiance derived from numerous points corresponding to the PV distribution, thereby providing an evaluation framework relevant to wide-area balancing needs. Suzuki et al. [16] proposed a solar irradiance forecasting method that combined NWP information (GPV-based physical model inputs) with a data-driven black-box model via just-in-time modeling. They assessed the prediction accuracy over wide areas using measured data from multiple observation points (44 sites across Japan provided by the JMA and 64 sites around Kanto provided by the New Energy and Industrial Technology Development Organization) and also examined how different GPV initialization times affect prediction performance. This study is informative in that it demonstrates, at an early stage, a hybrid perspective that leverages NWP while compensating for local variability through data-driven modeling. Although these studies provide important insights into wide-area forecasting and NWP-based prediction frameworks, they mainly focus on average accuracy measures and general forecasting performance. 
In contrast, for market participation and operational risk management, infrequent but severe forecast failures (rare-event errors) can be disproportionately consequential, for example, by triggering shortage imbalances. Accordingly, our study builds on NWP (GPV-GSM) inputs, but shifts the emphasis from average error reduction to rare-event risk detection and its integration into balancing-market bidding decisions.

Although these methods are suitable for predicting large-scale weather fields, they may not accurately capture the sudden changes in solar irradiance caused by local cloud formation and movement. Consequently, large discrepancies may arise between the day-ahead planned values and actual generation, resulting in imbalances.

Most previous studies have focused on improving the metrics that measure the average forecast error, such as the root mean square error, in PV power forecasting. However, from the viewpoint of electricity market operations, the factors that cause significant economic losses and operational disruptions are often not average errors, but rather low-frequency, large deviations from forecasts, that is, errors in forecasting rare events. In particular, in balancing-market bidding planning, submitting an excessively high planned value during time periods in which such rare events occur may increase shortage imbalances and associated penalties. Therefore, developing a framework that can detect rare large prediction errors in advance and adjust market participation volumes according to the associated risks is an important challenge.

Accordingly, this paper proposes a machine-learning-based method that classifies and detects the rare-event risk of solar irradiance forecast errors in PV systems in advance using GPV-GSM numerical weather prediction data and then reflects the results in balancing-market bidding planning. Specifically, rather than treating prediction errors as a single continuous variable and performing point forecasting, the proposed method classifies prediction errors into multiple risk categories, such as overestimation, underestimation, and rare large forecast deviations. Planned values are then determined according to each category. The effectiveness of the proposed method was evaluated using indicators such as the number of imbalance events, with particular emphasis on shortage-imbalance events, to verify whether bidding planning that accounts for rare-event risk contributes to reducing operational risk in market participation. Through this approach, this study aimed to provide practical insights for improving the profitability of PV operators and securing reserve power in power systems.

The remainder of this paper is organized as follows. Section 2 presents the research approach, including data preprocessing and the definition of rare events. Section 3 describes rare-event classification modeling and AUC-based feature extraction. Section 4 presents the numerical simulation conditions and results used to verify the effectiveness of the proposed method and the bidding planning strategies. Finally, Section 5 concludes the paper and discusses future scope.

2. Research Framework and Problem Setting

2.1. Data Source and Preprocessing

In this study, we used data from the Japanese region obtained from the GPV-GSM global numerical weather prediction model operated by the JMA and converted them into a format suitable as an input for machine-learning models. First, time-series data for each meteorological variable at the target location were extracted from the GRIB2-format files. The variables used in this study included solar radiation, total cloud cover, cloud cover per layer (upper, middle, and lower), air temperature, humidity, precipitation, and wind speed (U and V components). For each time series, we used forecasts initialized at 00:00 UTC (09:00 JST) and extracted the period in which PV generation is expected the following day (09:00–15:00 JST). The data had a time resolution of 1 h, and we retained this original resolution to avoid introducing additional interpolation errors in the rare-event classification model.

2.2. Rare-Event Definition and Labeling

In this study, to quantify the risk of errors in solar radiation forecasts, we define the errors using Equation (1).

$$E = R_{f} - R_{m} \tag{1}$$

Here, $R_{m}$ represents the measured solar radiation [14], and $R_{f}$ represents the solar radiation predicted from the GPV-GSM. An error of less than zero indicates that the prediction is less than the measured value (underprediction), whereas an error of greater than zero indicates that the prediction is higher than the measured value (overprediction).

Forecast errors of solar radiation were classified into four categories based on the error distribution to identify meteorological conditions that are difficult to predict. Underprediction ($E \leq 0$) and overprediction ($E > 0$) groups were treated separately, with thresholds defined at the lower and upper 20% of the cumulative distribution of absolute error values within each group. Consequently, each extreme-error category corresponded to approximately 10% of all samples. Previous studies adopted thresholds corresponding to approximately 7–8% [11] and 1.5% [17] of samples to define rare or large-error events. Taking these criteria into account, a threshold of 20% was adopted within each prediction-error group to maintain a sufficient sample size for stable classification while preserving the rarity of high-error events. Thresholds smaller than 20% resulted in too few samples for stable classification, whereas larger thresholds tended to include events that were not sufficiently rare. Therefore, the selected threshold was considered to provide a reasonable balance between sample size and event rarity.

Class 1: Cases in which the actual results significantly exceed the forecasts and the error falls within the top 20% in the negative direction.
Class 2: General cases in which the actual results exceed the forecasts, but the error is moderate.
Class 3: General cases in which the forecasts exceed the actual results, but the error is moderate.
Class 4: Cases in which the forecast significantly exceeds the actual result and the error is in the top 20% in the positive direction. As overpredicting the planned value can trigger a shortage imbalance, this study classifies Class 4 as a rare outlier directly linked to shortage risk and prioritizes its detection.
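The labeling rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`quantile`, `fit_thresholds`, `label_error`) are hypothetical, the cutoffs are assumed to be the 80th percentile of the error magnitude within each sign group (i.e., the top 20% of each group), and linear interpolation is assumed for the percentile. The strict inequality for Class 1 and the non-strict one for Class 4 follow the form of the thresholds reported later in Table 2.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    if not sorted_vals:
        raise ValueError("cannot take a quantile of an empty list")
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def fit_thresholds(errors, tail=0.20):
    """Return (under_cut, over_cut): magnitude cutoffs for the largest
    `tail` share of errors within each sign group (assumed interpretation)."""
    under = sorted(-e for e in errors if e <= 0)  # underprediction magnitudes
    over = sorted(e for e in errors if e > 0)     # overprediction magnitudes
    return quantile(under, 1 - tail), quantile(over, 1 - tail)


def label_error(e, under_cut, over_cut):
    """Map an error E = R_f - R_m to Classes 1-4."""
    if e <= 0:
        return 1 if -e > under_cut else 2   # extreme vs. moderate underprediction
    return 4 if e >= over_cut else 3        # extreme vs. moderate overprediction
```

The cutoffs should be fitted on training errors only and then reused unchanged when labeling validation samples, so that the class definitions do not leak information from the evaluation period.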

2.3. Overall Workflow

This study consisted of three steps. First, we constructed a classification model to predict the risk of forecast errors caused by solar radiation. Regarding the classification model, we refined the input features using feature selection based on the AUC [18,19,20] and performed classification using a multi-layer perceptron (MLP) [21,22]. Second, we formulated bid plans (planned values) for the balancing market based on the results of the classification model. Finally, we evaluated the effectiveness of market bids using imbalance metrics. Figure 1 illustrates the overall research flow.

3. Rare-Event Classification Modeling

In this paper, we present an MLP model that incorporates AUC-based feature selection to capture temporal features. To validate the AUC-based model, it was compared with a case in which the AUC was not used.

3.1. Extraction of Candidate Features

In this study, we examined the extent to which input features contribute to discriminating between classes, that is, class separability. We quantitatively evaluated each feature using the AUC, which is a separability metric commonly used in binary classification. We also examined the effectiveness of an AUC-based feature selection scheme. The definitions of the four classes (Classes 1–4) and their corresponding data distributions are the same as those described in the previous section. To compute the AUC stage by stage, we formulated multiple binary classification tasks by pairing and/or grouping the classes. Figure 2 shows an overview of the hierarchical (stagewise) structure. The primary objective of this section is to assess the influence of feature selection on the classification performance. Therefore, we kept the classification procedure and training conditions identical across the experiments and varied only the input feature sets. Specifically, we compared a model that used the full set of available features with one that used a subset of features selected according to the AUC-based criterion.

In this study, the full candidate feature set was defined as the combination of basic and additional candidate features. The basic features comprised solar radiation, total cloud cover, and periodic features representing diurnal and seasonal cycles (sine/cosine of day and time). These features capture the fundamental factors underlying solar radiation prediction errors (solar radiation level, cloud attenuation, and periodic variation) and constitute the minimum essential information. Additional candidate features included humidity, precipitation, cloud cover at various levels (low, middle, and high), air temperature, barometric pressure, and wind speed. For each binary classification task at each stage, the univariate AUC was calculated for every candidate feature, and features with high discriminative capability were selected for model construction.
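As an illustration of the periodic features mentioned above, the following sketch encodes the day of the year and the hour of the day as sine/cosine pairs. The exact encoding used in the study is not specified in the text, so the period lengths (365.25 days, 24 h), the function name `periodic_features`, and the output keys are assumptions made for this example.

```python
import math
from datetime import datetime


def periodic_features(ts):
    """Sine/cosine encoding of the seasonal and diurnal cycles for a timestamp."""
    # Day of year starts at 1, so subtract 1 to map 1 January to angle 0.
    day_angle = 2 * math.pi * (ts.timetuple().tm_yday - 1) / 365.25
    hour_angle = 2 * math.pi * ts.hour / 24
    return {
        "day_sin": math.sin(day_angle),
        "day_cos": math.cos(day_angle),
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
    }
```

Using both the sine and cosine of each angle keeps 23:00 and 00:00 (or 31 December and 1 January) close together in feature space, which a raw hour or day index would not.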

3.2. Definition of ROC and AUC

In a sample set, if a higher classification score indicates a higher probability of being positive, samples with $X > u$ are classified as positive using a threshold $u$. Let $X_{N}$ and $X_{P}$ denote the classification scores for the negative and positive groups, respectively, and let $F_{N}$ and $F_{P}$ denote the corresponding cumulative distribution functions.

The false positive rate (FPR), FPR = FP/(FP + TN), represents the probability that a negative sample is incorrectly classified as positive, and is defined as follows:

$$\mathrm{FPR}(u) = P(X_{N} > u) = 1 - F_{N}(u) \tag{2}$$

Here, setting the FPR to $t$, the threshold $u$ can be expressed as a function of $t$ as $u = F_{N}^{-1}(1 - t)$.

In contrast, the true positive rate (TPR), TPR = TP/(TP + FN), represents the probability that a positive sample is correctly classified as positive, corresponding to the sensitivity, and is defined as follows:

$$\mathrm{TPR}(u) = P(X_{P} > u) = 1 - F_{P}(u) \tag{3}$$

The ROC curve characterizes the relationship between the FPR and TPR as the classification threshold $u$ varies. Based on the distributions of the classification scores in the negative and positive groups, the ROC curve can be expressed as a function of the FPR, denoted by $t$, as follows:

$$\mathrm{ROC}(t) = 1 - F_{P}\left(F_{N}^{-1}(1 - t)\right) \tag{4}$$

The AUC is defined as the area under the ROC curve and is obtained by integrating $\mathrm{ROC}(t)$ with respect to the FPR $t$ over the interval from 0 to 1. In addition to this geometric interpretation, the AUC can be interpreted as the probability that a randomly selected positive sample receives a higher score than a randomly selected negative sample.

$$\mathrm{AUC} = \int_{0}^{1} \mathrm{ROC}(t)\, dt = P(X_{P} > X_{N}) \tag{5}$$
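The second equality in Equation (5) can be checked with a change of variables. The short derivation below is a sketch under two assumptions that the text does not state explicitly: the negative-group score has a continuous distribution with density $f_{N}$, and $X_{P}$ and $X_{N}$ are independent.

```latex
\begin{aligned}
\mathrm{AUC}
  &= \int_{0}^{1} \mathrm{ROC}(t)\,dt
   = \int_{0}^{1} \Bigl[1 - F_{P}\bigl(F_{N}^{-1}(1 - t)\bigr)\Bigr]\,dt \\
  &= \int_{-\infty}^{\infty} \bigl[1 - F_{P}(u)\bigr]\, f_{N}(u)\,du
   \qquad \bigl(t = 1 - F_{N}(u),\ dt = -f_{N}(u)\,du\bigr) \\
  &= \int_{-\infty}^{\infty} P(X_{P} > u)\, f_{N}(u)\,du
   = P(X_{P} > X_{N}).
\end{aligned}
```

The substitution maps $t = 0$ to $u \to \infty$ and $t = 1$ to $u \to -\infty$; the minus sign in $dt$ reverses the limits, which gives the integral over the whole real line.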

To estimate this probability, all possible pairs of positive and negative samples were compared, and the proportion of pairs in which the positive sample had a higher score than the negative sample was calculated. In this study, the AUC was computed by treating each feature value as a discrimination score between positive and negative samples.

$$\mathrm{AUC} = \frac{\sum_{i = 1}^{n_{N}} \sum_{j = 1}^{n_{P}} I\left(X_{f}^{P_{j}} > X_{f}^{N_{i}}\right)}{n_{N}\, n_{P}} \tag{6}$$

where $n_{N}$ and $n_{P}$ represent the total numbers of negative and positive samples, respectively. $I(\cdot)$ represents the indicator function, which takes the value of 1 if the condition is true and 0 otherwise. $X_{f}^{N_{i}}$ and $X_{f}^{P_{j}}$ represent the values of feature $f$ for the $i$-th negative and the $j$-th positive samples, respectively.

Figure 3 illustrates representative score distributions corresponding to different AUC values in binary classification. $\mathrm{AUC} = 0.5$ corresponds to random ordering, and $\mathrm{AUC} = 1.0$ indicates perfect separability.

In this study, each candidate feature was treated as a univariate score, and its AUC was used as a quantitative separability measure. Features whose AUCs deviate further from 0.5 provide stronger discriminative power for the corresponding binary task. Therefore, we computed the univariate AUC for each stage-defined binary task using training data and selected features with high separability to augment the baseline feature set.
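The pairwise estimator in Equation (6) can be written directly as a double loop over positive and negative samples. The sketch below is illustrative only; the function name `pairwise_auc` is hypothetical, and it follows Equation (6) literally, so tied pairs contribute 0 (some common AUC implementations instead count ties as one half).

```python
def pairwise_auc(pos_values, neg_values):
    """Univariate AUC of a feature, computed as in Equation (6).

    pos_values: feature values of the positive samples (X_f^{P_j})
    neg_values: feature values of the negative samples (X_f^{N_i})
    """
    if not pos_values or not neg_values:
        raise ValueError("both groups must contain at least one sample")
    # Count the pairs in which the positive sample has the strictly larger value.
    wins = sum(1 for p in pos_values for n in neg_values if p > n)
    return wins / (len(pos_values) * len(neg_values))
```

The cost is proportional to $n_{N} n_{P}$, which is acceptable for a sketch; for large datasets a rank-based formulation gives the same quantity more efficiently.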

3.3. Stepwise AUC Calculation and Feature Selection

As the AUC is a metric for binary classification, this study decomposed the four-class classification into three binary tasks to calculate the AUC (see Figure 2).

In Stage 1 (sign determination), the underprediction (Classes 1 and 2) and overprediction (Classes 3 and 4) sides were distinguished. Group 1-0 was assigned to the underprediction side and Group 1-1 to the overprediction side.

Stage 2 (underprediction degree determination) involves distinguishing between Classes 1 and 2. Group 2-0 was assigned to Class 1 and Group 2-1 to Class 2.

Stage 3 (overprediction degree determination) distinguished between Classes 3 and 4. Here, Group 3-0 was assigned to Class 3 and Group 3-1 to Class 4.

The univariate AUC for each stage was calculated using only the training data. To make the evaluation independent of the separation direction, we defined the separation index $D$ as in Equation (7) and selected the features whose $D$ was greater than or equal to a specified threshold. The selection criterion was set to $D \geq 0.6$, with periodic features selected regardless of their AUC values.

$$D = \max(\mathrm{AUC},\ 1 - \mathrm{AUC}) \tag{7}$$

The threshold was set at $D = 0.6$ because, although an AUC of 0.5 corresponds to random ordering (i.e., no discrimination), $D \geq 0.6$ indicates that a sample drawn from one group has a higher feature value than a sample drawn from the other group with a probability of at least 0.6, so a certain degree of discrimination can be expected from a single feature. We therefore adopted an empirical threshold of 0.6, balancing the number of features against the risk of overfitting.
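A minimal sketch of the selection rule based on Equation (7) is given below. The helper names (`separation_index`, `select_features`) and the `always_keep` argument are hypothetical; the latter stands in for the periodic features, which the text says are retained regardless of their AUC.

```python
def separation_index(auc):
    """Direction-independent separability D = max(AUC, 1 - AUC), Equation (7)."""
    return max(auc, 1.0 - auc)


def select_features(auc_by_feature, always_keep=(), threshold=0.6):
    """Return the names of features with D >= threshold, plus forced ones.

    auc_by_feature: mapping from feature name to its univariate AUC for one
                    stage, computed on training data only
    always_keep:    names retained regardless of AUC (e.g., periodic features)
    """
    selected = []
    for name, auc in auc_by_feature.items():
        if name in always_keep or separation_index(auc) >= threshold:
            selected.append(name)
    return selected
```

For example, with purely illustrative AUC values of 0.35 for a humidity feature and 0.52 for a wind-speed feature, the former is kept ($D = 0.65$) and the latter dropped ($D = 0.52$), even though the humidity AUC is below 0.5. The rule is applied separately for each of the three stages, so each stage may end up with a different feature subset.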

3.4. MLP Model Structure

A binary classifier was trained using an MLP for each stage [20]. Stage 1 determined whether a sample was under- or overpredicted. Based on this result, Stage 2 (underprediction) or Stage 3 (overprediction) was applied, ultimately determining the class (Classes 1–4).
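The routing logic of the three-stage pipeline can be sketched as follows. The stage models are represented by hypothetical callables that return the probability of the "-1" group of each stage (Group 1-1, 2-1, or 3-1, as defined in Section 3.3); the decision threshold of 0.5 is an assumption for illustration, since the value used in the study is not given in this section.

```python
def classify_hierarchical(x, stage1, stage2, stage3, threshold=0.5):
    """Route a sample through the stage-wise binary classifiers.

    stage1(x): probability of Group 1-1 (overprediction side)
    stage2(x): probability of Group 2-1 (Class 2, moderate underprediction)
    stage3(x): probability of Group 3-1 (Class 4, extreme overprediction)
    """
    if stage1(x) >= threshold:
        # Overprediction branch: Stage 3 separates Class 3 from Class 4.
        return 4 if stage3(x) >= threshold else 3
    # Underprediction branch: Stage 2 separates Class 1 from Class 2.
    return 2 if stage2(x) >= threshold else 1
```

Because only one of Stage 2 and Stage 3 is evaluated per sample, an error in the Stage 1 sign decision cannot be corrected downstream, which is why sign-crossing mistakes are treated as operationally critical.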

For the performance comparison, the classification procedure, MLP architecture, training conditions, and decision threshold were maintained constant. The input features were varied according to the following two conditions:

(1): All-features case: The case in which all the features were used as inputs.
(2): AUC-based case: The features selected from the pool of candidate features using the stage-based AUC were included.

We configured the MLP classifier as a hierarchical binary classifier and trained three independent MLPs. Each MLP consisted of three hidden layers with 128 units, using the rectified linear unit (ReLU) activation function with dropout (0.3) and L2 regularization (1 × 10⁻⁴). Training was performed using the Adam optimizer with a learning rate of 1 × 10⁻³, a batch size of 128, and 100 epochs. To mitigate the effects of class imbalance, we adopted the focal loss function with $\gamma = 2.0$. The other settings are listed in Table 1.

To ensure stable training on imbalanced datasets containing rare events, while mitigating the risk of overlooking minority classes, we adopted focal loss as the training objective. Focal loss reduces the contribution of samples that are easy to classify and focuses learning on samples that are difficult to classify, which is effective under class imbalance. The focal loss for binary classification is defined as follows:

$$\mathrm{Loss} = -(1 - p_{t})^{\gamma} \log(p_{t}) \tag{8}$$

$$p_{t} = \begin{cases} p & (y = 1) \\ 1 - p & (y = 0) \end{cases} \tag{9}$$

Here, $y \in \{0, 1\}$ represents the ground-truth label, $p \in (0, 1)$ represents the predicted probability of the positive class $(y = 1)$, and $p_{t}$ represents the predicted probability of the correct class. Furthermore, the focusing parameter $\gamma \geq 0$ reduces the contribution of easy samples. A larger value of $\gamma$ places greater emphasis on difficult examples.
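Equations (8) and (9) translate into a short per-sample function. The sketch below is a plain-Python illustration rather than the training code used in the study; the function name `focal_loss` and the clipping constant `eps`, added only to avoid taking the logarithm of zero, are assumptions.

```python
import math


def focal_loss(p, y, gamma=2.0, eps=1e-12):
    """Binary focal loss for one sample, following Equations (8) and (9).

    p: predicted probability of the positive class (y = 1)
    y: ground-truth label, 0 or 1
    """
    p_t = p if y == 1 else 1.0 - p      # probability assigned to the true class
    p_t = min(max(p_t, eps), 1.0)       # numerical safeguard (assumption)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)
```

With $\gamma = 0$ the expression reduces to the ordinary cross-entropy $-\log(p_{t})$. With $\gamma = 2$, a well-classified sample with $p_{t} = 0.9$ has its cross-entropy scaled by $(1 - 0.9)^{2} = 0.01$, whereas a poorly classified sample with $p_{t} = 0.1$ is scaled by only $0.81$, which is how the loss shifts weight toward difficult samples.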

4. Numerical Simulation and Results

4.1. Data, Study Period, and Site

We selected three target sites with different meteorological characteristics: Sendai (38.2617° N, 140.8967° E) in the Tohoku region, located near the sea; Fukuoka (33.5817° N, 130.3750° E) in the Kyushu region, characterized by relatively large amounts of cloud cover and water vapor; and Tateno (36.0567° N, 140.1250° E) in the Kanto region, located on a plain. Subsequently, we evaluated the effectiveness of the proposed method under various meteorological conditions. Classification was performed using the GPV-GSM forecast values, and a simulated bid evaluation was conducted for the period from 09:00 to 15:00 JST. We used one year for validation because the proposed model was designed to capture the characteristics of sudden changes in solar radiation, and meteorological variations can be sufficiently represented over a one-year period. In addition, as many samples as possible are required for model learning. Therefore, the study period spanned five years, from 1 April 2019 to 31 March 2024. Data from 1 April 2019 to 31 March 2023 were used for model training, whereas data from 1 April 2023 to 31 March 2024 were used for validation.
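The chronological split described above can be expressed as a simple date filter. This is a sketch only: the function name `split_by_date` and the representation of samples as `(date, record)` pairs are assumptions made for this example, while the boundary dates are those stated in the text.

```python
from datetime import date

TRAIN_START = date(2019, 4, 1)   # first day of the training period
VALID_START = date(2023, 4, 1)   # first day of the validation period
VALID_END = date(2024, 3, 31)    # last day of the validation period


def split_by_date(samples):
    """Split (date, record) pairs into training and validation lists."""
    train, valid = [], []
    for day, record in samples:
        if TRAIN_START <= day < VALID_START:
            train.append((day, record))
        elif VALID_START <= day <= VALID_END:
            valid.append((day, record))
        # Samples outside the five-year study period are ignored.
    return train, valid
```

Splitting by time rather than at random keeps the validation year strictly after the training years, which mirrors how a model would be used for day-ahead bidding.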

Although high-resolution solar radiation data with a short time span is available from satellite-based sources, this data is usually presented as an estimated rather than forecast value. In contrast, forecast values more closely correspond to the information available to market participants during the bidding and scheduling process. Because this study focuses on whether fluctuations in solar radiation are reflected in the predicted and planned values used for market transactions, GPV forecast meteorological data were used in this research. In actual power plant operations, bidding and imbalance settlement are usually conducted at 30 min intervals. However, the temporal resolution of the available GPV meteorological data was limited to hourly intervals. While interpolation could be applied to approximate 30 min values, this could introduce errors caused by the interpolation process. Therefore, the analysis was performed using the native hourly resolution of the GPV data rather than interpolated values.

For the power generation data used in the bid evaluation, the power output estimated from the GPV-GSM forecast values was used as the predicted power generation, whereas the power output estimated from meteorological observation data [14] was treated as the equivalent of the actual measured power generation. The meteorological data were converted to power generation data using the physical model described in [10]. The following analysis presents bid planning and imbalance evaluation for a PV system with an installed capacity of 1 kW.

4.2. Results and Discussion

Figure 4 shows the histograms of the power generation prediction errors during the training period at each site. As shown in Figure 4, although both the mode and mean of the forecast errors were near zero, the distributions exhibited right-skewed tails, indicating an overall tendency toward overprediction in this dataset.

Figure 5 shows the ROC curves for each stage at Fukuoka, which was selected as the representative site. All the ROC curves were located above the reference line corresponding to random classification, indicating that each binary classification model exhibited a certain level of discriminative ability. Among them, the Stage 3 model, which detects extreme overprediction, demonstrated the highest discriminative performance.

Table 2 presents the classification of the solar irradiance forecast errors. $E < -158.58$ (Class 1) indicated extreme underprediction, and $E \geq 261.04$ (Class 4) indicated extreme overprediction. As the dataset was biased toward overprediction, the 20% thresholds on the negative and positive sides were asymmetric, resulting in different cutoff values.

Table 3 lists the features selected for each stage in Sendai. In Stage 1, features representing atmospheric conditions, such as cloud cover, humidity, and precipitation, were ranked highly, indicating that these features may have contributed to distinguishing error signs. For the underprediction degree classification, features such as clearness index, humidity, low cloud, and pressure were ranked highly, indicating that they may have contributed to the separation of Class 1 from Class 2. For the overprediction degree classification, features such as radiation, air temperature, high cloud, and wind speed were ranked highly, indicating that they may have contributed to the separation of Class 4 from Class 3.

(a) Stage 1 (sign determination: underprediction versus overprediction)

In Stage 1, we distinguished between underprediction (Classes 1 and 2) and overprediction (Classes 3 and 4). The evaluation metrics are presented in Table 4.

In Stage 1, the AUC-based feature selection model showed performance comparable to that of the model using all features. Although slight decreases in the F1-score were observed at some sites, such as Sendai, the differences were limited, indicating that the AUC-based model retained sufficient discriminative ability for separating underpredicted and overpredicted samples. In Fukuoka, the F1-scores were unchanged for both classes, suggesting that the selected features preserved the classification performance while reducing the feature set. Notably, in Tateno, the AUC-based model improved the F1-score for both underprediction and overprediction compared with the all-feature model: the F1-score for underprediction increased from 0.48 to 0.53, and that for overprediction increased from 0.62 to 0.67. These results suggest that AUC-based feature selection can reduce the influence of less informative features and improve class separability at certain sites.
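As a quick consistency check, the F1-scores can be recomputed from the precision $P$ and recall $R$ in Table 4 using the standard definition of the F1-score as their harmonic mean. Because the tabulated values are rounded to two decimals, agreement is expected only up to rounding. For the underprediction class at Tateno, the all-feature and AUC-based values are:

$$
F_1 = \frac{2PR}{P + R}, \qquad
F_1^{\mathrm{All}} = \frac{2(0.50)(0.47)}{0.50 + 0.47} \approx 0.48, \qquad
F_1^{\mathrm{AUC}} = \frac{2(0.57)(0.49)}{0.57 + 0.49} \approx 0.53 .
$$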

(b) Stages 2 and 3 (degree determination: underprediction and overprediction sides)

Based on the results of Stage 1, Stage 2 was applied to further distinguish between normal and extreme cases within the underprediction branch. As shown in Table 5, the AUC-based feature selection model achieved a performance comparable to that of the model that used all the features for the under-normal class. In particular, the F1-scores for the under-normal cases were almost unchanged across the three sites, indicating that the selected features retained sufficient information to identify normal underprediction cases. For the under-extreme class, slight decreases in the F1-score were observed at some sites, such as Sendai and Tateno. However, in Fukuoka, the AUC-based model showed an improvement in the F1-score from 0.35 to 0.40, mainly owing to improved recall. This suggests that the AUC-based feature selection can enhance the detection of extreme cases under certain site-specific conditions.

In Stage 3, the AUC-based feature selection model showed a performance comparable to that of the model using all the features for the over-normal class. As shown in Table 6, the F1-scores were nearly unchanged in Sendai and Fukuoka, and slightly improved in Tateno, increasing from 0.91 to 0.92. This indicates that the selected features retained sufficient information to identify normal overprediction cases. For the over-extreme class, the AUC-based model showed an improvement in the F1-score at Fukuoka from 0.28 to 0.35, owing to improvements in both precision and recall. In Sendai, the F1-score remained unchanged, whereas a decrease from 0.46 to 0.41 was observed in Tateno. Nevertheless, the AUC-based model achieved comparable overall performance when using a reduced feature set.

Figure 6 shows the four-class confusion matrices for each location. These matrices also show that the overall classification structure was largely preserved after the AUC-based feature selection.
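The four-class predictions summarized in the confusion matrices result from chaining the three binary stages. A minimal sketch of this combination step is given below. It assumes that each stage outputs a sigmoid probability (consistent with the output layer in Table 1) and that a decision threshold of 0.5 is applied. The threshold and the function name `combine_stages` are illustrative assumptions, not details confirmed by the text.

```python
import numpy as np


def combine_stages(p_over, p_under_extreme, p_over_extreme, threshold=0.5):
    """Merge the outputs of the three binary stages into classes 1..4.

    p_over           Stage 1: probability that the error sign is 'over'
    p_under_extreme  Stage 2: probability of 'extreme' within the under branch
    p_over_extreme   Stage 3: probability of 'extreme' within the over branch
    """
    is_over = np.asarray(p_over, dtype=float) >= threshold
    # Under branch: extreme -> Class 1, normal -> Class 2.
    under_cls = np.where(np.asarray(p_under_extreme) >= threshold, 1, 2)
    # Over branch: extreme -> Class 4, normal -> Class 3.
    over_cls = np.where(np.asarray(p_over_extreme) >= threshold, 4, 3)
    # Only the branch chosen by Stage 1 determines the final class.
    return np.where(is_over, over_cls, under_cls)
```

Because Stage 1 gates the two degree classifiers, a sign error in Stage 1 cannot be corrected downstream, which is the misclassification risk that Strategy C is designed to hedge against.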

As shown in Table 7, whereas the all-feature model used 17 features in all stages, the AUC-based model reduced the number of features to 7–14, depending on the location and stage. This reduction was achieved without a substantial degradation in classification accuracy: the largest decrease was 0.03 (Sendai, Stage 2), and the accuracy at Tateno in Stage 1 increased from 0.56 to 0.61. These results suggest that AUC-based feature selection can remove less informative variables while preserving the discriminative performance.

4.3. Bidding Plan and Imbalance Evaluation

Based on the above risk classes, planned values were generated according to three bidding strategies, namely, Strategies A, B, and C, and the resulting imbalance was evaluated. These strategies were designed to dynamically adjust the planned values submitted to the balancing market according to the inferred risk level, with particular attention paid to reducing the shortage-imbalance risk associated with overprediction events (Classes 3 and 4). This section evaluates the effectiveness of the proposed risk-class-based bidding strategies through an imbalance assessment.

Strategy A (ideal accuracy rule)

This strategy assumes that risk classification is highly reliable by applying an intuitive “risk-consistent” mapping. When a time step is classified as Class 4 (extreme overprediction risk), no bid is submitted to avoid a potential shortage imbalance. For Class 3 (overprediction), the bid is conservatively reduced using a reduction factor of 0.5 in relation to the point forecast. For Classes 1 and 2 (underprediction), bids follow the point forecast because underprediction does not directly increase shortage exposure when the planned value is derived from the forecast. This strategy represents an idealized benchmark under a near-perfect classification.

Strategy B (overprediction-avoidance rule)

As shortage risk is primarily driven by overprediction, this strategy excludes market participation for Classes 3 and 4 in each time step. In contrast, for Classes 1 and 2, the bid is set to match the forecasted generation. Strategy B is intentionally simple and prioritizes risk avoidance over market participation during potentially hazardous periods.

Strategy C (conservative rule under misclassification)

This strategy also considers the possibility of misclassification by being more conservative in the case of underprediction. As in Strategy B, no bids are submitted for Classes 3 and 4. Additionally, a reduction factor of 0.5 is applied when the time step is classified as Class 2, reflecting the fact that some instances of overprediction risk may be incorrectly labeled as mildly underpredicted. For Class 1, which corresponds to extreme underprediction, a slightly larger bid (a reduction factor of 0.7) than for Class 2 may be acceptable because the operational consequences of underprediction are generally less critical than those of overprediction in the context of shortage-imbalance risk. Overall, Strategy C aims to suppress sign-crossing risk at the expense of increased bid curtailment.
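The three rules can be summarized as a lookup from risk class to bid multiplier. The sketch below uses the factors stated above for Strategies A, B, and C, and assumes that "no bid" can be represented as a planned value of zero. The names `STRATEGY_FACTORS` and `planned_bid` are hypothetical.

```python
import numpy as np

# Bid multiplier per risk class. Index 0 is unused so that class k maps to
# entry k; 0.0 represents "no bid" (an assumption of this sketch).
STRATEGY_FACTORS = {
    #              unused  Cls 1  Cls 2  Cls 3  Cls 4
    "A": np.array([np.nan, 1.0,   1.0,   0.5,   0.0]),
    "B": np.array([np.nan, 1.0,   1.0,   0.0,   0.0]),
    "C": np.array([np.nan, 0.7,   0.5,   0.0,   0.0]),
}


def planned_bid(forecast, risk_class, strategy):
    """Scale the point forecast by the multiplier of the inferred risk class."""
    forecast = np.asarray(forecast, dtype=float)
    risk_class = np.asarray(risk_class, dtype=int)
    return forecast * STRATEGY_FACTORS[strategy][risk_class]
```

For example, `planned_bid([0.6, 0.6], [2, 3], "A")` keeps the first forecast unchanged and halves the second, whereas the same call with `"C"` halves the first and withholds the second.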

Figure 7 illustrates the relationship among the predicted, planned, and actual values, as well as the representative cases in which shortage and surplus imbalances occur.

An imbalance evaluation was conducted using the number of shortage events and the total surplus amount as evaluation metrics. Table 8 presents the results and includes, as a reference, the case in which the original predicted values were directly used as bid values. Compared with this reference case, all the proposed strategies substantially reduced the number of shortage events. For example, in the all-feature case, the number of shortages (Sendai/Fukuoka/Tateno) decreased from 1261/1412/1324 for the predicted values to 92/180/77 under Strategy C. Similarly, in the AUC-based case, Strategy C reduced the number of shortages to 99/168/81. These results indicate that risk-class-based bid adjustment is effective in mitigating shortage-imbalance risks. This improvement was accompanied by an increase in surplus energy, indicating a trade-off between shortage reduction and surplus generation. Among these strategies, Strategy C achieved the largest reduction in shortage events, whereas Strategy A provided a more moderate adjustment with a smaller increase in surplus energy. The AUC-based model showed trends comparable to those of the all-feature model despite the use of selected features. In certain cases, such as Fukuoka under Strategy C and Tateno under Strategy B, the AUC-based model further reduced the number of shortages.
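The two evaluation metrics can be sketched as follows. The definitions used here are assumptions inferred from the description of shortage and surplus imbalances: a shortage event is a time step in which the actual generation falls below the planned value, and the surplus amount is the sum of the positive differences between actual and planned values, expressed as energy per time step. The function name `evaluate_imbalance` is hypothetical.

```python
import numpy as np


def evaluate_imbalance(planned, actual):
    """Return (number of shortage events, total surplus energy).

    Both inputs are per-time-step energies in the same unit (e.g., kWh).
    """
    planned = np.asarray(planned, dtype=float)
    actual = np.asarray(actual, dtype=float)
    deviation = actual - planned
    # Shortage: actual generation is below the planned (bid) value.
    n_shortage = int(np.sum(deviation < 0))
    # Surplus: energy generated above the planned value, summed over time.
    surplus = float(np.sum(np.clip(deviation, 0.0, None)))
    return n_shortage, surplus
```

Under these definitions, lowering the planned value can only reduce the shortage count and increase the surplus, which matches the trade-off observed in Table 8.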

5. Conclusions

This study examined the risk of rare events in solar radiation forecasting from the perspective of the balancing-market participation of PV power plants. Using GPV-GSM numerical weather forecast data, we developed a four-class labeling scheme to distinguish between underprediction and overprediction, and to further separate extreme cases. Particular emphasis was placed on the extreme overprediction class, which can be directly linked to the shortage-imbalance risk. A hierarchical classification framework was constructed using MLP models and evaluated at three representative sites: Sendai, Fukuoka, and Tateno. Two feature settings were compared: an all-feature model using the full candidate feature set, and an AUC-based model using features selected according to their univariate discriminative capability.

The results showed that the AUC-based model achieved a classification performance comparable to that of the all-feature model while reducing the number of input features. Although slight decreases in performance were observed at some sites and stages, the AUC-based model maintained the main classification structure and improved detection performance in some cases. These findings indicate that AUC-based feature selection is effective for constructing a more compact and interpretable classification model without substantially degrading predictive performance.

Furthermore, a bidding simulation was conducted to evaluate the practical applicability of the proposed risk classification framework. Based on the classified risk levels, planned bid values were generated according to the three bidding strategies, and the imbalance performance was evaluated using the number of shortage events and the surplus energy.

The simulation results showed that the proposed risk-class-based bidding strategies substantially reduced the number of shortage events compared with the case in which the original predicted values were directly submitted as bid values. This reduction was accompanied by an increase in surplus energy, indicating a trade-off between shortage mitigation and surplus generation. Among these strategies, the most conservative strategy achieved the greatest reduction in shortage events, whereas the more moderate strategies provided a better balance between shortage reduction and surplus increase. The AUC-based model showed imbalance evaluation results comparable to those of the all-feature model, demonstrating that the selected features were sufficient for practical bidding support.

Future studies will focus on optimizing bidding strategies using market-oriented indicators, including imbalance costs, opportunity loss, and profitability, as well as on evaluating the robustness of the framework across additional years, regions, and operational conditions. Further extensions may include rolling model updates, intraday market integration, and the use of battery energy storage to improve the bidding flexibility and reduce the imbalance risk.

Author Contributions

J.C. Data curation, investigation, software, validation, writing—original draft, and writing—review and editing; R.Y. Data curation, investigation, methodology, software, validation, and writing—original draft; S.Y. Data curation, investigation, software, and writing—original draft; Y.U. Conceptualization, resources, and supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Some of the data used in this study are not publicly available because they were obtained from a third-party provider under a paid license, and the authors do not have the right to redistribute them.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Agency for Natural Resources and Energy (ANRE). Ministry of Economy, Trade, and Industry (METI). The 7th Strategic Energy Plan. Available online: https://www.enecho.meti.go.jp/category/others/basic_plan/pdf/2025_strategic_energy_plan.pdf (accessed on 24 March 2026).
Maekawa, J.; Hai, B.H.; Shinkuma, S.; Shimada, K. The Effect of Renewable Energy Generation on the Electric Power Spot Price of the Japan Electric Power Exchange. Energies 2018, 11, 2215.
Agency for the Cooperation of Energy Regulators (ACER). Contain the Rise in Electricity Network Cost? ‘Getting the Signals Right’ in the Network Tariffs Is Key. Grid Conference—Council Presidency of Poland. March 2025. Available online: https://www.acer.europa.eu/sites/default/files/documents/en/The_agency/Documents/20250326-ACER-Grid-Conference-Polish-Presidency.pdf (accessed on 24 March 2026).
Electricity and Gas Market Surveillance Commission (EGC), Japan. Available online: https://www.egc.meti.go.jp (accessed on 24 March 2026).
Electric Power Reserve Exchange of Japan (EPRX). Trading Rules for the Balancing Market. Available online: https://www.eprx.or.jp/outline/docs/kitei_231201.pdf (accessed on 24 March 2026).
Wang, J.; Zhou, Y.; Zhang, Y.; Lin, F.; Wang, J. Risk-Averse Optimal Combining Forecasts for Renewable Energy Trading under CVaR Assessment of Forecast Errors. IEEE Trans. Power Syst. 2024, 39, 2296–2309.
Petkovski, E.; Bogen, M.S.; Bosma, T.; Eijgelaar, M. Economic Viability of Photovoltaic Systems Providing Frequency Containment Reserve. Adv. Energ. Sust. Res. 2025, 7, e202500295.
Zhou, Q.; Xia, Y.; Xu, Y. A Learning-Based Joint Bidding Strategy for Photovoltaic-Energy Storage System Power Plant in Singapore Electricity Market. Energy Internet 2025, 2, 218–228.
Cui, J.; Jie, B.; Fang, X.; Oozeki, T.; Ueda, Y. Absorption of PV Power Prediction Errors with Headroom Control by Statistical, Machine Learning and Combined Models. IEEJ Trans. Electr. Electron. Eng. 2023, 19, 200–207.
Cui, J.; Fang, X.; Oozeki, T.; Ueda, Y. Development of Error Absorption Headroom Setting Algorithm using Polynomial Surfaces to Create Reserve Power in PV Power Plants. IEEJ Trans. Electr. Electron. Eng. 2025, 20, 2100–2109.
Cui, J.; Fang, X.; Jie, B.; Oozeki, T.; Ueda, Y. Day-ahead Planning and Shortfall Risk Assessment in Balancing Market for Solar Power Plants. J. Jpn. Sol. Energy Soc. 2025, 51, 85–94. (In Japanese)
Cui, J.; Fang, X.; Oozeki, T.; Ueda, Y. Probabilistic Modeling for Prediction Errors to Enhance Balancing Market Participation of PV Systems: Error Threshold Estimation, Multi-Site Aggregation, and Overloading Effects. Adv. Energy Sustain. Res. 2026, 7, e202500301.
Japan Meteorological Business Support Center. Meteorological Data. Available online: http://www.jmbsc.or.jp/jp/ (accessed on 24 March 2026).
Japan Meteorological Agency (JMA). Available online: https://www.data.jma.go.jp/developer/gpv_sample.html (accessed on 24 March 2026).
Nagoya, H.; Saji, K.; Aoki, I.; Tanikawa, R.; Komami, S.; Ogimoto, K.; Iwafune, Y. A Study on Irradiance Forecast Accuracy on Numerous Points Correspond to Highly Penetrated PVs Using Numerical Weather Prediction Data by Japan Meteorological Agency. IEEJ Trans. Power Energy 2013, 133, 531–540. (In Japanese)
Suzuki, T.; Goto, Y.; Terazono, T.; Wakao, S.; Oozeki, T. Forecasting of Solar Irradiance with Just-in-Time Modeling. Electr. Eng. Jpn. 2013, 182, 19–28. (In Japanese)
Tu, Y.; Fang, X.; Cui, J.; Ueda, Y. Prediction of Radiation Forecast Error Range Due to GPV Surrounding Cloud Conditions and Changes in Radiation Forecast. In Proceedings Japan Solar Energy Society Conference; Japan Solar Energy Society: Tokyo, Japan, 2022; pp. 279–282. (In Japanese)
Pepe, M.S. The Statistical Evaluation of Medical Tests for Classification and Prediction; Oxford Statistical Science Series, 28; Oxford University Press: Oxford, UK, 2003.
Hanley, J.A.; McNeil, B.J. The Meaning and Use of the Area under a Receiver Operating Characteristic (ROC) Curve. Radiology 1982, 143, 29–36.
Bradley, A.P. The Use of the Area under the ROC Curve in the Evaluation of Machine Learning Algorithms. Pattern Recognit. 1997, 30, 1145–1159.
Hornik, K. Approximation Capabilities of Multilayer Feedforward Networks. Neural Netw. 1991, 4, 251–257.
Cybenko, G. Approximation by Superpositions of a Sigmoidal Function. Math. Control Signal Syst. 1989, 2, 303–314.

Figure 1. Research flow.

Figure 2. Classification flow.

Figure 3. AUC score distributions.

Figure 4. Error distributions.

Figure 5. ROC curves by stage for Fukuoka.

Figure 6. Four-class confusion matrices. Note: The color intensity in each confusion matrix indicates the number of samples in each cell, with darker blue representing larger values.

Figure 7. Comparison of predicted, planned, and actual values under shortage and surplus imbalance cases.

Table 1. MLP hyperparameters.

Hidden layers	3
Units per layer	128
Activation	ReLU
L2 regularization	1 × 10⁻⁴
Output layer	Dense (1), sigmoid
Optimizer	Adam
Learning rate	1 × 10⁻³
Epochs	100
Batch size	128
Dropout	0.3
Loss	Focal loss ($\gamma = 2.0$)
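For reference, the binary focal loss named in Table 1 can be written as $-(1 - p_t)^{\gamma} \log p_t$, where $p_t$ is the predicted probability of the true class. The NumPy sketch below illustrates this form only. It is not the training code of the study, and it omits the optional class-balancing weight because no such parameter is listed in Table 1. The function name `binary_focal_loss` and the clipping constant `eps` are assumptions.

```python
import numpy as np


def binary_focal_loss(y_true, p_pred, gamma=2.0, eps=1e-7):
    """Mean binary focal loss, -(1 - p_t)^gamma * log(p_t)."""
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities to avoid log(0).
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    # p_t is the probability assigned to the true class of each sample.
    p_t = np.where(y == 1.0, p, 1.0 - p)
    # The factor (1 - p_t)^gamma down-weights easy, well-classified samples.
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))
```

With `gamma=0` the expression reduces to the ordinary binary cross-entropy. Larger values of `gamma` shift the emphasis toward hard samples, which is a common motivation for using this loss when one class (here, the extreme class) is rare.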

Table 2. Label classification (representative site: Sendai).

Class	1	2	3	4
Error ($\mathrm{Wh/m^2}$)	$E < -158.58$	$-158.58 \leq E < 0$	$0 \leq E < 261.04$	$261.04 \leq E$
Description	Extreme underprediction	Underprediction	Overprediction	Extreme overprediction

Table 3. Feature list of AUC-based features model (representative site: Sendai).

Feature	Unit	AUC, Stage 1	AUC, Stage 2 (Under)	AUC, Stage 3 (Over)
Clearness index	-	0.580	0.690	0.551
Humidity	%	0.608	0.705	0.530
Precipitation	mm/h	0.614	0.581	0.535
MSLP (Mean sea level pressure)	hPa	0.575	0.629	0.612
Psfc (Surface pressure)	hPa	0.572	0.615	0.588
Air temperature	K	0.528	0.622	0.679
Wind U component	m/s	0.568	0.527	0.601
Wind V component	m/s	0.551	0.502	0.597
Cloud	-	0.720	0.726	0.587
High_cloud	-	0.680	0.524	0.623
Mid_cloud	-	0.653	0.666	0.520
Low_cloud	-	0.625	0.748	0.525
Radiation	Wh/m²	0.581	0.562	0.746
Day_sine	-	0.523	0.572	0.524
Day_cosine	-	0.542	0.641	0.712
Hour_sine	-	0.524	0.525	0.519
Hour_cosine	-	0.511	0.549	0.609

Table 4. Classification of Stage 1.

(a) All-feature
Stage 1 _All	Precision	Recall	F1-score
Under (Error ≤ 0)	0.65/0.60/0.50	0.59/0.64/0.47	0.62/0.62/0.48
Over (Error > 0)	0.63/0.66/0.60	0.69/0.62/0.63	0.66/0.64/0.62
(b) AUC-based
Stage 1 _AUC	Precision	Recall	F1-score
Under (Error ≤ 0)	0.63/0.60/0.57	0.57/0.64/0.49	0.60/0.62/0.53
Over (Error > 0)	0.62/0.66/0.64	0.68/0.62/0.71	0.65/0.64/0.67

Note: Slash-separated values correspond to Sendai, Fukuoka, and Tateno, respectively. The same convention is applied hereafter.

Table 5. Classification of Stage 2.

(a) All-feature
Stage 2 _All	Precision	Recall	F1-score
Under normal	0.90/0.81/0.89	0.92/0.96/0.90	0.91/0.88/0.90
Under extreme	0.60/0.63/0.56	0.53/0.24/0.52	0.56/0.35/0.54
(b) AUC-based
Stage 2 _AUC	Precision	Recall	F1-score
Under normal	0.88/0.82/0.88	0.89/0.95/0.92	0.89/0.88/0.90
Under extreme	0.50/0.64/0.57	0.48/0.29/0.47	0.49/0.40/0.51

Table 6. Classification of Stage 3.

(a) All-feature
Stage 3 _All	Precision	Recall	F1-score
Over normal	0.88/0.85/0.89	0.91/0.94/0.94	0.90/0.89/0.91
Over extreme	0.44/0.43/0.57	0.36/0.21/0.39	0.40/0.28/0.46
(b) AUC-based
Stage 3 _AUC	Precision	Recall	F1-score
Over normal	0.88/0.86/0.88	0.89/0.93/0.96	0.89/0.89/0.92
Over extreme	0.41/0.46/0.60	0.39/0.28/0.32	0.40/0.35/0.41

Table 7. Classification accuracy and number of selected features.

Location	Model	Accuracy, Stage 1	Accuracy, Stage 2 (Under)	Accuracy, Stage 3 (Over)	Number of features, Stage 1	Number of features, Stage 2 (Under)	Number of features, Stage 3 (Over)
Sendai	ALL	0.64	0.85	0.82	17	17	17
Sendai	AUC	0.62	0.82	0.81	11	13	10
Fukuoka	ALL	0.63	0.80	0.81	17	17	17
Fukuoka	AUC	0.63	0.80	0.81	8	13	11
Tateno	ALL	0.56	0.83	0.85	17	17	17
Tateno	AUC	0.61	0.83	0.85	7	14	13

Table 8. Annual imbalance evaluation metrics.

(a) All-feature
All	Predicted	Strategy A	Strategy B	Strategy C
Number of shortages	1261/1412/1324	654/819/618	378/544/448	92/180/77
Surplus amount [kWh]	123.5/103.8/88.4	375.4/310.4/414.7	637.3/540.4/742.7	900.2/790.6/965.3
(b) AUC-based
AUC	Predicted	Strategy A	Strategy B	Strategy C
Number of shortages	1261/1412/1324	661/842/553	388/563/375	99/168/81
Surplus amount [kWh]	123.5/103.8/88.4	381.8/301.9/423.9	628.8/504.7/796.4	888.6/771.5/997.1
