Assessing the Impact of Features on Probabilistic Modeling of Photovoltaic Power Generation

Abstract: Photovoltaic power generation has high variability and uncertainty because it is affected by uncertain factors such as weather conditions. Probabilistic forecasting is therefore useful for optimal operation and risk hedging in power systems with large amounts of photovoltaic generation. However, deterministic forecasting remains the mainstay of photovoltaic generation forecasting, and there are few studies on probabilistic forecasting or on selecting weather and time-oriented features for it. In this study, prediction intervals (PIs) were generated by lower upper bound estimation (LUBE) using neural networks with two outputs to construct probabilistic models for prediction. The objective was to improve the prediction interval coverage probability (PICP), mean prediction interval width (MPIW), continuous ranked probability score (CRPS), and Loss, which integrates PICP and MPIW, by removing unnecessary features through feature selection. When features with high gain were selected by random forest (RF), in the modeling of a 14.7 kW PV system, Loss improved by 1.57 kW, CRPS by 0.03 kW, PICP by 0.057, and MPIW by 0.12 kW on average over two weeks compared with the case where all features were used without feature selection. The low-gain features from RF therefore act as noise and reduce modeling accuracy.


Introduction
Renewable energy sources, including photovoltaic (PV) generation, are being developed in many countries as the need for clean energy increases [1]. In particular, the number of PV installations has grown significantly in recent years due to the low cost of the modules, the absence of carbon dioxide emissions, and the ease of installing the panels. However, PV power generation is highly variable and uncertain, as it is affected by weather conditions and other uncertain factors. This variability and uncertainty significantly impact the operation of power systems in which large amounts of PV generation are installed. As a countermeasure, the impact on the power system can be mitigated by forecasting PV power output and considering the forecast in the operational plans of thermal, hydroelectric, and other power sources whose output can be adjusted. Probabilistic forecasts of PV generation are especially useful for battery systems [2][3][4] in the distribution network: the operator of a battery system needs to treat the risk of peaks probabilistically to determine how much to charge or discharge.
PV forecasting models can be divided into three main categories: physical, statistical, and hybrid models [5]. Physical models are constructed using numerical weather prediction (NWP) and satellite imagery; Miyazaki et al. [6] used optical flow to estimate the geographical motion of PV output lumps related to cloud motion, and Saint-Drenan et al. [7] probabilistically predicted PV output from a derived reference PV output. The contributions of this study are as follows: (1) the effects of 14 variables were evaluated by RF; features were selected according to their gains, and PIs were generated by LUBE and quantile regression (QR). (2) When features with high gain were selected, the accuracy of the PIs improved in both QR and LUBE compared with using all features, whereas features with low gain rarely contributed to, or even worsened, PI accuracy. According to Loss, the low-gain features reduced the accuracy of the PIs in LUBE, especially when PV output fluctuations were large.
The remainder of the paper is organized as follows: Section 2 describes feature selection with RF. Section 3 describes LUBE. Section 4 evaluates the accuracy of PIs by LUBE and QR based on feature selection by RF. Section 5 concludes the study.

Feature Selection Using Random Forest
Because PV power output fluctuates with weather factors, modeling accuracy varies greatly depending on the features used in the model. If features that are effective for modeling are used, accuracy improves; otherwise, accuracy deteriorates, and unnecessary features also increase the complexity of the model and its computational cost. In this study, feature selection was performed by RF, an ensemble model built by fitting a "decision tree", which repeats binary splits, to each of multiple samples generated by the bootstrap method and averaging their predictions [29]. RF has proved useful in forecasting renewable energy sources such as solar and wind power, which are sensitive to environmental factors [30,31]. The conceptual diagram of feature selection by RF is shown in Figure 1, and the following procedure is used [32,33].
(i). From the training data consisting of n sets of P predictors and corresponding target variables, n sets are extracted with replacement. The extraction is repeated to generate K bootstrap samples. When generating a bootstrap sample, approximately two-thirds of the training data are extracted at least once and about one-third are never extracted; the group of samples that are not extracted is referred to as out-of-bag (OOB).
(ii). A decision tree is modeled for each of the K bootstrap samples, and the mean square error (MSE) is obtained using the OOB samples as test data:

MSE_{OOB_t} = (1/n_t) Σ_{i ∈ OOB_t} (Y_i − Ŷ_i)²,

where MSE_{OOB_t} is the MSE when OOB_t is the test data, Y is the target variable, the hat symbol denotes the predicted value, and n_t is the number of samples in OOB_t.
(iii). One feature from the OOB data is selected, randomly permuted, and the MSE is obtained again; this is repeated for all features. Here X_j denotes the feature, j = 1, 2, . . . , P, and MSE_{OOB_t}^{X_j permuted} denotes the MSE at OOB_t when the feature X_j is permuted.
(iv). The change in MSE before and after permuting is obtained, and the K results are averaged:

Importance(X_j) = (1/K) Σ_{t=1}^{K} (MSE_{OOB_t}^{X_j permuted} − MSE_{OOB_t}).

Normalization is applied so that the importances of the features sum to 1. If a feature is important to forecast accuracy, permuting it significantly reduces the accuracy; for unimportant features, the accuracy is not affected.
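The permutation-importance procedure (i)–(iv) can be sketched in Python. This is a minimal illustration only: ordinary least squares stands in for the decision trees, and the synthetic data, sample size, and number of bootstrap rounds K are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y depends strongly on x0, weakly on x1, not at all on x2.
n, P = 500, 3
X = rng.normal(size=(n, P))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

def fit_model(X, y):
    # Ordinary least squares as a stand-in for a single decision tree.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

K = 25
importance = np.zeros(P)
for _ in range(K):
    # (i) bootstrap sample; the unsampled rows form the out-of-bag (OOB) set
    idx = rng.integers(0, n, size=n)
    oob = np.setdiff1d(np.arange(n), idx)
    coef = fit_model(X[idx], y[idx])
    # (ii) baseline OOB error
    base_mse = np.mean((y[oob] - X[oob] @ coef) ** 2)
    # (iii) permute each feature in the OOB set and re-measure the error
    for j in range(P):
        Xp = X[oob].copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        perm_mse = np.mean((y[oob] - Xp @ coef) ** 2)
        # (iv) accumulate the increase in MSE caused by permutation
        importance[j] += (perm_mse - base_mse) / K

importance = np.clip(importance, 0, None)
gain = importance / importance.sum()  # normalize so the gains sum to 1
print(gain)
```

As expected, permuting the strongly used feature x0 raises the OOB error the most, so its normalized gain dominates, while the irrelevant feature x2 receives a gain near zero.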

Lower Upper Bound Estimation (LUBE)
LUBE is a nonparametric method that directly generates PIs using NNs with two outputs corresponding to the upper and lower bounds of the PIs [19]. Traditional methods such as the Delta and Bayesian methods are parametric: they first perform point estimation and then generate PIs by assuming a distribution over the data [34,35]. LUBE generates PIs directly, which is simple and fast, and requires neither special assumptions about the distribution nor a large amount of computation [21]. In LUBE, PIs are evaluated using the following three indicators. The conceptual diagram of the neural network in LUBE is shown in Figure 2.
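As a rough structural illustration, the sketch below builds an untrained two-output MLP whose outputs are read as the lower and upper PI bounds. The random weights, the 4 input features, and the min/max ordering of the two outputs are assumptions for illustration; only the single hidden layer with 100 neurons follows the setup described later in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def lube_forward(x, W1, b1, W2, b2):
    # One hidden layer, two outputs; the two outputs are interpreted
    # as the lower and upper bounds of the prediction interval.
    h = np.tanh(x @ W1 + b1)
    out = h @ W2 + b2
    lower = np.minimum(out[:, 0], out[:, 1])
    upper = np.maximum(out[:, 0], out[:, 1])
    return lower, upper

# Hypothetical shapes: 4 input features, 100 hidden neurons.
W1 = rng.normal(scale=0.1, size=(4, 100)); b1 = np.zeros(100)
W2 = rng.normal(scale=0.1, size=(100, 2)); b2 = np.array([-1.0, 1.0])
lo, up = lube_forward(rng.normal(size=(8, 4)), W1, b1, W2, b2)
```

In LUBE proper, these weights would be trained to minimize the Loss function defined below rather than left random.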


Prediction Interval Coverage Probability (PICP)
PICP is an index that evaluates the percentage of measured values that fall within the PIs and is one of the important evaluation indicators. The predicted lower and upper PI bounds are ŷ_i^L and ŷ_i^U. A vector k of length n represents whether each data point is captured by the estimated PIs:

k_i = 1 if ŷ_i^L ≤ y_i ≤ ŷ_i^U, and k_i = 0 otherwise.

Defining the total number of captured data points as c = Σ_{i=1}^{n} k_i, PICP is given by

PICP = c / n.

Mean Prediction Interval Width (MPIW)
MPIW represents the mean interval width and is an important metric for evaluating PIs; even if all measured values fall within the intervals and PICP is 100%, a too-wide MPIW implies high uncertainty and is meaningless as a forecast:

MPIW = (1/n) Σ_{i=1}^{n} (ŷ_i^U − ŷ_i^L).

Loss
The two key metrics, PICP and MPIW, need to be evaluated simultaneously when generating PIs using LUBE. However, if the width of the PIs is narrowed, PICP is likely to decrease because some measured values will fall outside the PIs. In LUBE, the two trade-off indicators are evaluated simultaneously using Loss [28]:

Loss = MPIW + λ · (n / (α(1 − α))) · max(0, (1 − α) − PICP)²,
where n is the number of data samples and α sets the confidence level of the PIs; for example, when α = 0.05, the confidence level 1 − α is 95%. Qualitatively, if PICP does not exceed 0.95, a penalty scaled by λ is imposed, and Loss is the sum of MPIW and the PICP term including λ; if PICP exceeds 0.95, Loss equals MPIW. λ is thus both a penalty parameter and a tuning parameter that balances MPIW against PICP. When PICP is much lower than the established confidence level, the PIs can be considered to lack validity [14]. Therefore, in determining λ, PICP should be adjusted first and the combination with MPIW considered afterward.
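The three indicators can be written compactly in NumPy. This is a sketch: the loss follows the penalty behavior described above (penalty only when coverage falls below 1 − α), and the default values of `alpha` and `lam` are illustrative, not the paper's tuned settings.

```python
import numpy as np

def picp(y, lower, upper):
    # Fraction of observations captured by their prediction interval.
    k = (lower <= y) & (y <= upper)
    return k.mean()

def mpiw(lower, upper):
    # Mean width of the prediction intervals.
    return (upper - lower).mean()

def lube_loss(y, lower, upper, alpha=0.05, lam=15.0):
    # Width term plus a penalty when coverage falls below 1 - alpha;
    # when PICP >= 1 - alpha, the loss reduces to MPIW.
    n = len(y)
    p = picp(y, lower, upper)
    penalty = lam * n / (alpha * (1 - alpha)) * max(0.0, (1 - alpha) - p) ** 2
    return mpiw(lower, upper) + penalty
```

For intervals that capture every point, `lube_loss` returns exactly the MPIW; once points fall outside, the quadratic coverage penalty dominates.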

Continuous Ranked Probability Score (CRPS)
In LUBE, the neural network learns to minimize Loss. Since the balance of PICP and MPIW changes depending on the value of λ in Loss, the PIs were also evaluated using CRPS [36]. The CRPS measures the difference between the predicted and observed cumulative distribution functions:

CRPS = (1/n) Σ_{i=1}^{n} ∫ [F(t) − H(t − y_i)]² dt, (11)

where F is the predictive cumulative distribution function (CDF), y is the verifying observation, H(t − y) is the Heaviside function, which takes the value 0 when t < y and 1 otherwise, and n is the total number of points. When F follows a Gaussian distribution with mean µ and variance σ², Equation (11) has the closed form

CRPS = σ { z [2Φ(z) − 1] + 2φ(z) − 1/√π }, z = (y − µ)/σ, (12)

where φ and Φ denote the probability density function (PDF) and the CDF, respectively, of the standard normal distribution evaluated at the normalized prediction error z. To calculate Equation (12) for LUBE, we used the external Python library properscoring [37]; G. Mitrentsis et al. [18] also used properscoring to calculate CRPS for LUBE. However, since LUBE outputs PIs directly without assuming a distribution, care must be taken when evaluating it with calculations that assume a Gaussian distribution.
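The Gaussian closed form of Equation (12) can be implemented with only the standard library; this is a minimal sketch of that formula (the study itself used the properscoring library).

```python
import math

def crps_gaussian(y, mu, sigma):
    # Closed-form CRPS for a Gaussian predictive distribution N(mu, sigma^2):
    # sigma * ( z*(2*Phi(z) - 1) + 2*phi(z) - 1/sqrt(pi) ), z = (y - mu)/sigma.
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)          # phi(z)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))                 # Phi(z)
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))
```

A quick sanity check: for an observation at the forecast mean of a standard normal, the CRPS is about 0.234, and the score grows as the observation moves away from the mean.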

Case Study
For the probabilistic modeling of solar power generation, excluding unnecessary features improves modeling accuracy, but missing necessary features reduces it. In particular, PV modeling is difficult when output fluctuation is large, and if unnecessary features are included, the modeling accuracy is likely to be even lower. In this case study, we evaluated the importance of each feature for probabilistic modeling. We first used RF to evaluate the gain of 14 features, consisting of weather and time-oriented features, for modeling PV power output. Next, PIs were generated using LUBE with a confidence level of 95%. In LUBE, we used 14 feature sets, adding variables in descending order of gain. The 2-week average modeling accuracy of the PIs for each feature set was evaluated, and the optimal features were considered. We also compared the PIs of QR, using the gradient boosted regression trees method, with those of LUBE; see L. Massidda's study [38] for details on the principle of QR. We also evaluated the robustness of the modeling by setting the number of simulations to 85 in LUBE. Finally, in relation to output fluctuation, we considered each day to evaluate when modeling succeeded and when it failed, with and without feature selection.
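QR fits conditional quantiles by minimizing the pinball (quantile) loss; the gradient-boosted model itself is not reproduced here, but the loss that drives it can be sketched as follows (function name and test values are ours).

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    # Pinball loss for the tau-quantile: under-prediction is weighted by tau,
    # over-prediction by (1 - tau); minimizing it yields the tau-quantile.
    diff = y - q_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))
```

Fitting one model with tau = 0.025 and one with tau = 0.975 yields the lower and upper bounds of a 95% prediction interval, which is how QR produces PIs comparable to LUBE's.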
The PV power plant subject of this study is in Tokyo, Japan, and has a capacity of 14.7 kW. PV power generation is observed every 30 min for 24 h. In addition to power generation data, atmospheric temperature, precipitation, degree of cloudiness, solar radiation, wind speed, and humidity were obtained from nearby meteorological observatories.
The corresponding year, month, day, and hour were also added to the data set. The time data were expressed as trigonometric functions to account for periodicity; for example, the trigonometric representation of hour treats 24 h as one cycle, and that of day treats the number of days in a month as one cycle. Azimuth, zenith, and declination [39] are also considered effective features but are not included in this study. All features were standardized for uniformity of scale. Adding the most recent data to the training and validation data was expected to improve modeling accuracy. Figure 4 shows a flowchart of LUBE using feature selection by random forest. The NN used in LUBE consists of an input layer, an intermediate layer with 100 neurons, and an output layer. Regarding the NN parameters, from the combinations of λ in Equation (9), learning rate, and epochs listed in Table 1, the parameters giving high PICP and narrow MPIW were adopted when all features were used. In QR, a grid search was performed over the parameters in Table 1 to minimize CRPS.
Table 2 shows the results of feature selection using RF. The gain for solar radiation is 0.764, indicating that it is an extremely important feature. The hour sine and hour cosine have a combined gain of 0.172, and the annual cosine has a gain of 0.013. The gains for the other features are almost zero, indicating that they are of low importance.
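The cyclic (trigonometric) time encoding described above can be sketched as follows; the function name is ours, and the 24 h cycle matches the hour encoding used in this study.

```python
import numpy as np

def encode_hour(hour):
    # Cyclic encoding: 24 h is one cycle, so hour 23 sits next to hour 0
    # in the (sin, cos) plane instead of being 23 units away.
    angle = 2 * np.pi * np.asarray(hour) / 24.0
    return np.sin(angle), np.cos(angle)
```

The same pattern applies to the daily and annual cycles by replacing 24 with the number of days in the month or in the year.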
Figure 5 shows the correspondence between the number of features and the two-week average of Loss in LUBE, together with the total gain. In Figure 5, the green triangle marks the median and the small circles mark outliers. LUBE(1) includes only solar radiation, the feature with the highest gain; LUBE(4) includes solar radiation, hour sine, hour cosine, and annual cosine; and LUBE(14) includes all features. The same notation applies to QR(1), QR(4), QR(14), and so on. Table 3 shows the Loss statistics for LUBE(1), (4), and (14), and Figure 6 shows a histogram of the 85 simulations of the 2-week average Loss for LUBE(1), (4), and (14).
Figure 5 shows that features with a gain of nearly zero act as noise, reducing the accuracy and robustness of the modeling in LUBE: Loss is smallest at LUBE(4), increases thereafter, and is largest at LUBE(14), and the distribution of Loss in the box-and-whisker diagram widens from LUBE(4) to LUBE(14). To evaluate modeling accuracy and robustness, we compared LUBE(14) with LUBE(1), which includes only solar radiation, and with LUBE(4), which has the lowest Loss. In Table 3, LUBE(14) is significantly worse than LUBE(1) and LUBE(4) in all statistics: at the median, its Loss is 1.38 kW greater than that of LUBE(1) and 1.57 kW greater than that of LUBE(4). In addition, as shown by the standard deviations in Table 3 and the histograms in Figure 6, the distribution of LUBE(14) is wide and its modeling accuracy varies. These results indicate that low-gain features should be removed, because they reduce not only modeling accuracy but also robustness.
We then evaluated PICP and MPIW in the same way. Figures 7 and 8 show the 2-week average PICP and MPIW in LUBE and QR, respectively; again, the green triangle marks the median and the small circles mark outliers. Tables 4 and 5 show the PICP and MPIW statistics for LUBE(1), (4), (14), and QR(2), and Figures 9 and 10 show the corresponding histograms for LUBE(1), (4), and (14). Figures 7 and 8 show that the features after LUBE(4) act as noise in LUBE, because PICP tends to decrease and MPIW tends to increase after LUBE(4). In QR, the features after QR(4) likewise do not contribute to prediction accuracy, because PICP and MPIW barely change after QR(4). Compared to LUBE, QR has wider MPIW and higher PICP.

Evaluation of Prediction Intervals at 2-Week Average
In Table 4, comparing median values, PICP is 0.072 and 0.057 higher for LUBE(1) and LUBE(4), respectively, than for LUBE(14). In Table 5, MPIW is 0.12 kW narrower for LUBE(4) than for LUBE(14). However, in LUBE(1), MPIW is 0.42 kW wider than in LUBE(14), meaning that LUBE(1) is highly uncertain because it contains only one feature. LUBE(4) has the narrowest MPIW in Table 5 because it contains enough high-gain features; LUBE(4) therefore outperforms LUBE(14) in MPIW and PICP as well. The standard deviations in Tables 4 and 5 and the histograms in Figures 9 and 10 show that low-gain features reduce the robustness of the modeling in PICP and MPIW, just as they do in Loss. These results indicate that, for PICP and MPIW as for Loss, the features after LUBE(4) are noise.
Finally, we evaluated LUBE and QR using CRPS. Figure 11 shows the correspondence between the number of features and the two-week average of CRPS in LUBE and QR, together with the total gain; the green triangle marks the median and the small circles mark outliers. Table 6 shows the CRPS statistics for the 2-week average of LUBE(1), (4), (14), and QR(2).

Figure 11 shows that both LUBE(1) and QR(1) have poor modeling accuracy because they include only solar radiation as a feature. The CRPS after LUBE(4) and QR(4) is almost constant, indicating that low-gain features do not contribute to modeling accuracy. LUBE has the smallest CRPS at LUBE(4); QR, however, has the smallest CRPS at QR(2), not QR(4), because QR has a PICP close to 95% for any number of features in Figure 7 and QR(2) has the narrowest MPIW in Figure 8. Therefore, QR(2) and LUBE(4) use enough features as needed and have high modeling accuracy, and low-gain features do not contribute to improving it.

Accuracy Evaluation by Forecast Target Date
To evaluate the relationship between output fluctuation and modeling accuracy according to the features used, we analyzed LUBE(14), which includes all features, LUBE(1), LUBE(4), and QR(2) by day, as shown in Tables 7 and 8. In Tables 7 and 8, out of the 85 simulations in LUBE, the median of Loss and the corresponding MPIW, PICP, and CRPS for each day are shown. We evaluated the modeling accuracy using an index named Fluctuation, the average of the output fluctuations at 30-min intervals over a day, defined as

Fluctuation = (1/(N − 1)) Σ_{i=2}^{N} |v(t_i) − v(t_{i−1})|, N = 48, (13)
where v(t_i) represents the power output at time t_i. Since output power is observed at 30-min intervals, there are 48 points i in a day; when Fluctuation is 0 kW, the daily generation output is constant. In Tables 7 and 8, the threshold for Fluctuation is 0.5 kW, the average of the output fluctuations over all days, and days on which the output fluctuation exceeds 0.5 kW are highlighted in gray. For each day, the highest PICP and the lowest MPIW, Loss, and CRPS among LUBE(1), (4), (14), and QR(2) are shown in bold. For example, on 2 June, Fluctuation is 0.582 kW; QR(2) has the highest PICP of 0.958, LUBE(14) has the narrowest MPIW of 1.40 kW, and LUBE(4) has the lowest Loss and CRPS at 2.30 kW and 0.48 kW, respectively. Table 9 shows the correlation coefficients between the 30-min output fluctuations and PICP, MPIW, Loss, and CRPS for LUBE(1), (4), (14), and QR(2). The strong correlation of Loss and CRPS with the 30-min output fluctuation for all methods in Table 9 shows that modeling is easy when the output fluctuation is small and difficult when it is large. When Fluctuation does not exceed 0.5 kW in Tables 7 and 8, LUBE(4) is best: from the Average (Fluctuation < 0.5 kW) rows, PICP, MPIW, Loss, and CRPS over those 7 days are all best for LUBE(4). When Fluctuation exceeds 0.5 kW, LUBE(4) or QR(2) is suitable for modeling. As indicated by the strong correlation between MPIW and output fluctuation for QR(2) in Table 9, QR(2) has a wide MPIW and high uncertainty on days when Fluctuation is large; however, its PIs include many real values, and it has the largest average PICP in Table 7. In the Average (Fluctuation > 0.5 kW) rows of Table 8, CRPS is smallest for LUBE(4), but there are several days on which its PICP is more than 0.1 below the confidence level; since PIs with a PICP far below the confidence level are not considered valid, LUBE(4) or QR(2) is appropriate for modeling.
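Under the reading that Equation (13) averages the absolute changes between consecutive 30-min samples, the Fluctuation index can be sketched as follows (the function name is ours).

```python
import numpy as np

def fluctuation(v):
    # Mean absolute change of PV output between consecutive 30-min samples;
    # 0 kW means the daily generation output is constant.
    v = np.asarray(v, dtype=float)
    return np.abs(np.diff(v)).mean()
```

Days whose value exceeds the two-week average (0.5 kW in this study) would then be flagged as high-fluctuation days.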
Throughout the 14 days, the CRPS and Loss of LUBE(14) are larger than those of LUBE(4) on every day. In particular, according to Loss, low-gain features act as noise in LUBE on days with large output fluctuation; we therefore conclude that low-gain features should not be used. Figure 12a shows the PIs for LUBE(1), (4), (14), and QR(2) on 6 June, the day with the lowest Fluctuation (0.193 kW) in Tables 7 and 8. PICP is above 0.9 for all methods in Table 7; LUBE(1) has the smallest MPIW and Loss, with LUBE(4) second, in Tables 7 and 8, and CRPS is smallest for LUBE(1) and (4) in Table 8. Therefore, LUBE(1) is appropriate, followed by LUBE(4). Figure 12b shows the PIs for LUBE(1), (4), (14), and QR(2) on 9 June, the day with the largest Fluctuation (0.968 kW) in Tables 7 and 8. CRPS is smallest for LUBE(4); however, the PICP of LUBE(4) is 0.833, more than 0.1 below the confidence level. The CRPS of LUBE(1) and QR(2) is 0.70, the second smallest, and their PICP is about 0.875. Therefore, LUBE(1) or QR(2) is most appropriate for 9 June.

Conclusions
The objective of this study was to improve the PICP, MPIW, Loss, and CRPS of PIs generated by LUBE and QR by removing unnecessary features. We examined the change in PI accuracy by incorporating features into LUBE and QR in descending order of the gain evaluated by RF. With only one feature, solar radiation, which was evaluated as the most important, uncertainty was high: MPIW was wide, PICP high, and CRPS large in both LUBE and QR. With all 14 features and no feature selection, features with a gain of nearly zero decreased the accuracy of the PIs in LUBE and QR; in particular, according to Loss, low-gain features act as noise in LUBE on days with large output fluctuation. In QR, two features (solar radiation and hour sine), and in LUBE, four features (adding hour cosine and annual cosine to those two), contained enough information and achieved the lowest CRPS and Loss. In LUBE, using four features improved Loss by 1.57 kW, CRPS by 0.03 kW, PICP by 0.057, and MPIW by 0.12 kW compared with using all 14 features on a 14.7 kW PV system. However, on days with large output fluctuations, the PICP in LUBE is sometimes more than 0.1 below the confidence level. The challenge therefore remains to generate PIs with PICP close to the confidence level even on days with strong fluctuations.
