Stacking Ensemble Method with the RNN Meta-Learner for Short-Term PV Power Forecasting

Abstract: Photovoltaic (PV) power forecasting is urgently needed for the economic and secure operation of power systems. To avoid relying on an inaccurate individual forecasting model, we propose an approach for one-day to three-day ahead hourly PV power forecasting based on a stacking ensemble model with a recurrent neural network (RNN) as the meta-learner. The proposed approach is built using real weather data in the training stage and forecasted weather data in the testing stage. To accommodate uncertain weather, a daily clustering method based on statistical features, e.g., the daily average, maximum, and standard deviation of PV power, is applied to the data sets. Historical PV power output and weather data are used to train and test the model. The single learners considered in this research are the artificial neural network, deep neural network, support vector regression, long short-term memory, and convolutional neural network. Then, an RNN is used to combine the forecasting results of each single learner. The best combination of the single learners is also observed in this paper. Furthermore, to compare the performance of the proposed method, a random forest ensemble is used instead of the RNN as a benchmark. The mean relative error (MRE) and mean absolute error (MAE) are used as criteria to validate the accuracy of the different forecasting models. The MRE of the proposed RNN ensemble learner model is 4.29%, a significant improvement of about 7–40%, 7–30%, and 8% compared to the single models, the combinations of fewer single learners, and the benchmark method, respectively. The results show that the proposed method is promising for use in real PV power forecasting systems.


Introduction
Photovoltaic (PV) power generation forecasting has become one of the most important aspects of renewable energy integration to the smart grid. PV power integration to a low-voltage weak grid system is beneficial to strengthen the grid and improve the performance of the grid-integrated PV system [1][2][3]. However, as the PV power is vastly implemented, the harvested PV generation needs to be projected accurately. The difficulties in predicting the PV power comprise uncertain weather variables, such as temperature, wind speed, relative humidity, and irradiance, which are important for having a strong correlation to targeted PV power [4][5][6]. In addition, a forecasting algorithm is urgently required to build an optimal forecasting model that can predict the PV power accurately.
PV power forecasting has been researched by constructing a model that uses regression [7], a statistical method [8], a meta-heuristic method [9], or a hybrid method [10]. In Reference [7], Gaussian kernel support vector regression (SVR) is used as a forecasting model of PV energy based on the combination of numerical weather prediction and satellite data. However, this method provides forecasting only for a short time horizon (6 h ahead). In Reference [8], an autoregressive moving average with exogenous input (ARMAX) model is used as an improvement of the autoregressive integrated moving average (ARIMA) model, which has limitations because the model cannot take the climate condition into consideration and results in notable validation error. Furthermore, pattern classification and particle swarm optimization [9] are used to optimize the weights of sky-image cloud motion calculation to improve the accuracy of PV power forecasting. The drawback of this method is the high computational cost caused by the implementation of the optimization method relative to its low prediction accuracy, which needs to be corrected. Hybrid meta-heuristic and machine learning techniques for parameter optimization have been widely used in PV power forecasting, as reviewed in Reference [10].
Single forecasting algorithms for PV power generation are well developed, such as the deep neural network (DNN) [11], convolutional neural network (CNN) [12], SVR [13], and random forest (RF) [14]. In Reference [11], the model is constructed using only historical PV power output data. Mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²) are used as evaluation criteria. However, without including the meteorological variables, it is still difficult to obtain better accuracy over a longer time period.
Regarding the input variables, in Reference [12], the forecasting model is constructed using module temperature, solar radiance, and PV power output. MAE and RMSE are used as error validation. In Reference [13], the forecasting model is constructed using the same day of historical PV output power, the maximum, minimum, and average temperature. Mean relative error (MRE) and RMSE are used to evaluate accuracy. The results show that the MRE is still relatively high, especially for cloudy, foggy, and rainy days. In Reference [14], the forecasting model is constructed using temperature, atmospheric pressure, relative humidity, wind speed, PM2.5 air pollution concentration, and total solar radiation. MAE, mean absolute percentage error (MAPE), RMSE, and explained variance (EV) are used as performance evaluation indicators. The results show that the accuracy on rainy or snowy days is higher than on clear and cloudy days.
In addition, probabilistic PV power forecasting has also been developed. A high-order Markov chain is implemented to model the generated PV power and used to forecast the probability distribution function (PDF) of PV power [15], using historical PV output power, ambient temperature, and solar irradiance as input. The continuous ranked probability score (CRPS) is used to evaluate the performance of the forecast PDF, while MAE is used to measure the error of point forecasting. In Reference [16], the prediction model is constructed by an improved Markov chain, having historical PV power, radiation, wind speed, temperature, humidity, and wind direction as input variables. MAE, RMSE, and the quantile score are used as evaluation criteria, and the width of the forecasting interval is calculated.
In Reference [17], a forecasting model is constructed using ensemble learning of the linear method, the normal distribution method, and the normal distribution method with additional features. Twelve environmental variables are used as input to the model, and RMSE and MAE are used as evaluation metrics for point forecasting, while the pinball-loss function is used for probabilistic forecasting. Even though the probabilistic PV power is promising, it is difficult for the system operators to understand the result with no prior experience.
Point forecasting is desirable since we need to further process the results of PV power forecasting in subsequent processes, such as sending the power to the grid [18] or to the energy management system [19]. To improve forecasting accuracy, the ensemble method is preferable to a single forecasting algorithm [20]. The ensemble method combines single learners through a meta-learner forecasting algorithm [21]. In Raza et al. [22], the ensemble method is built from five FFNN structures, each of which contains 20 different structures. Historical data for PV power output, solar irradiance, wind speed, temperature, and humidity are used as input. The performance indices used are MAPE and EV.
In Reference [23], an ensemble method is built based on the artificial neural network (ANN) models with different training data sets using the bagging technique. RMSE, MAE, and weighted mean absolute error are used as performance metrics. In addition, there are ensemble prediction intervals using the same ensemble model. The model uses prediction intervals coverage probability and prediction intervals width to validate this model. In Reference [24], the ensemble method builds from a group of decision trees to construct a random forest ensemble model. Normalized mean bias error (nMBE), normalized mean absolute error (nMAE), normalized RMSE (nRMSE), and skill score are used as evaluation metrics.
In this paper, we propose an accurate ensemble method combining five single models, i.e., ANN, DNN, SVR, long short-term memory (LSTM), and CNN, into a meta-learner algorithm for the PV generation forecast of a real 200 kWp PV power plant in Taiwan. The meta-learner algorithm used in this paper is the recurrent neural network (RNN). ANN has a good capability for solving non-linear problems [25], while DNN is similar to ANN but with two or more hidden layers. SVR requires lower computation, is easy to implement, and is robust to outliers [26]. LSTM has a memory to store long-term information and is good at handling sequence data [27,28]. CNN has a good capability to handle image input and can learn relevant features from an image. CNN also performs well in forecasting time-series data when the data are reshaped into 3D form as the input to the CNN [29]. By incorporating these advantages, a stacking ensemble method with RNN as the meta-learner is formulated. RNN can capture the sequential information in the input data better than the other models. In addition, RNN has parameter sharing, thus having fewer parameters to train and easing the computational burden.
In this study, the ensemble of PV power forecasting with RNN as a meta-learner is compared to RF [30] as the benchmark model. To build the proposed ensemble forecasting method, the input variables are preprocessed through normalization and missing-data correction and then clustered into five weather types. Then, these data are fed to each single forecasting method. The result of each method is sent to the proposed RNN to obtain the final forecasting results. Our contributions can be highlighted as follows: (1) the implementation of RNN as a meta-learner in the stacking ensemble method is used for PV power forecasting for the first time; (2) the proposed ensemble method is established based on the latest well-known single forecasting methods (i.e., ANN, DNN, SVR, LSTM, and CNN) before they are fed to the RNN meta-learner, and the combinations of the best three methods among them are observed; (3) the clustering method is implemented based on the daily average, maximum, and standard deviation of PV power output, a scheme that further improves the forecasting accuracy; and (4) real case studies are performed using data from a PV power plant in Taiwan with a nominal capacity of 200 kWp, providing useful contributions to the PV industry.
The remainder of this paper is organized as follows. The ensemble methods are reviewed in Section 2. Section 3 describes the proposed ensemble forecasting model. Section 4 describes the dataset and performance evaluation. The case study and discussion are described in Section 5. Finally, conclusions are discussed in Section 6.

Ensemble Methods
Ensemble techniques train multiple learners to increase forecasting accuracy. Unlike ordinary learning approaches that attempt to develop one learner, ensemble approaches aim to create a collection of learners from training data and merge them [31]. There are various techniques such as simple averaging [32], weighted averaging [32], bagging [33], boosting [34], and stacking [32] used to perform ensemble methods. Unlike algorithms for bagging and boosting, which originally combine single learners of the same type, the algorithm for stacking is typically applied to single learners constructed by various learning algorithms. Described below are the reviewed ensemble methods.

Simple Averaging
Simple averaging [32] is acquired by directly averaging the outputs of every single learner as follows:

$Y(x) = \frac{1}{N}\sum_{i=1}^{N} \hat{y}_i(x)$    (1)

where $Y(x)$ is the combined forecasting result, $\hat{y}_i(x)$ is the $i$-th single learner's forecasting result, and $N$ is the number of single learners.

Weighted Averaging
Weighted averaging [32] is acquired by averaging the outputs of the single learners with dissimilar weights as follows:

$Y(x) = \sum_{i=1}^{N} w_i\, \hat{y}_i(x)$    (2)

where $Y(x)$ is the combined forecasting result, $w_i$ is the weight for $\hat{y}_i(x)$, $\hat{y}_i(x)$ is the $i$-th single learner's forecasting result, and $N$ is the number of single learners.
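As a sketch of Equations (1) and (2), the two averaging rules can be written in a few lines of NumPy; the forecast values below are illustrative placeholders, not values from the paper's dataset:

```python
import numpy as np

def simple_average(preds):
    """Eq. (1): combine N single-learner forecasts by direct averaging.

    preds: array of shape (N, n) -- N single learners, n forecast points.
    """
    return np.mean(preds, axis=0)

def weighted_average(preds, weights):
    """Eq. (2): combine forecasts with per-learner weights w_i."""
    return np.average(preds, axis=0, weights=np.asarray(weights, dtype=float))

# Hypothetical forecasts from three single learners over four hours
preds = np.array([[10.0, 20.0, 30.0, 40.0],
                  [12.0, 18.0, 33.0, 41.0],
                  [11.0, 19.0, 27.0, 39.0]])
combined = simple_average(preds)                    # [11., 19., 30., 40.]
weighted = weighted_average(preds, [0.5, 0.3, 0.2])
```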

Bagging
Bagging (or bootstrap aggregating) [33] is the representation of the parallel ensemble technique. In this technique, the single learners are created in parallel. Bagging uses the bootstrap sampling technique to create multiple subsets (bags) of samples from the original dataset. The final prediction results are obtained by combining the prediction results from all subsets through averaging (1). Random forest is the most common algorithm used in the bagging technique.
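A minimal bootstrap-aggregating sketch, assuming a toy least-squares base learner (not one of the paper's models):

```python
import numpy as np

rng = np.random.default_rng(0)

def bagging_predict(X, y, X_test, fit, predict, n_bags=10):
    """Bootstrap aggregating: train one learner per bootstrap subset (bag)
    and average their predictions, as in Eq. (1)."""
    preds = []
    for _ in range(n_bags):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        model = fit(X[idx], y[idx])
        preds.append(predict(model, X_test))
    return np.mean(preds, axis=0)

# Toy base learner: ordinary least squares on a single feature
fit = lambda X, y: np.polyfit(X, y, deg=1)          # returns (slope, intercept)
predict = lambda m, X: np.polyval(m, X)

X = np.arange(20, dtype=float)
y = 2.0 * X + 1.0 + rng.normal(0, 0.1, size=20)     # nearly linear data
y_hat = bagging_predict(X, y, np.array([5.0, 10.0]), fit, predict)
```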

Boosting
Boosting is a sequential ensemble process [34]. From the original dataset, several subsets are created, and a single method is created on each subset. Each subsequent model tries to correct the errors of the previous model, and the most inaccurate predictions are assigned higher weights. Similar to the bagging technique, numerous models are created, each correcting the errors of the previous model. Finally, the results are obtained from the weighted average as used in (2). Algorithms used in the boosting technique are AdaBoost [35], gradient boosting machine (GBM) [36], extreme gradient boosting (XGBM) [37], and light gradient boosting machine (LGBM) [38].
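The sequential error-correcting idea can be sketched as a tiny squared-loss gradient-boosting loop over depth-1 stumps (an illustrative base learner, not one used in the paper):

```python
import numpy as np

def fit_stump(X, residual):
    """Fit a depth-1 regression tree (stump): one threshold, two leaf means."""
    best = None
    for t in np.unique(X):
        left, right = residual[X <= t], residual[X > t]
        if len(left) == 0 or len(right) == 0:
            continue
        err = left.var() * len(left) + right.var() * len(right)
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda x: np.where(x <= t, lv, rv)

def boost(X, y, n_rounds=20, lr=0.5):
    """Each new stump fits the residual errors left by the previous rounds
    (gradient boosting with squared loss)."""
    pred = np.full_like(y, y.mean())
    for _ in range(n_rounds):
        stump = fit_stump(X, y - pred)   # fit the current residuals
        pred = pred + lr * stump(X)      # correct the previous prediction
    return pred

X = np.arange(10, dtype=float)
y = np.where(X < 5, 1.0, 3.0)            # step-function target
pred = boost(X, y)
```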

Stacking
Stacking, also known as stacked generalization, is an ensemble learning technique that runs the single learners (first-level learners) and combines them by training the meta-learner (second-level learner) to output the prediction results [32]. Stacking uses the predictions of multiple models to construct a new model. Generally, any learning algorithm can be used as a single learner or meta-learner. The stacking procedure is shown in Algorithm 1.

Algorithm 1. Stacking.
Input: Training set D = {(x_1, y_1), ..., (x_m, y_m)}; first-level learning algorithms L_1, ..., L_T; second-level learning algorithm L.
Process:
1. for t = 1, ..., T:  % Train a first-level learner by applying the first-level learning algorithm L_t
2.     h_t = L_t(D);
3. end for
4. D' = ∅;  % Generate a new data set
5. for i = 1, ..., m:
6.     for t = 1, ..., T:
7.         z_it = h_t(x_i);  % Use h_t to predict the training example x_i
8.     end for
9.     D' = D' ∪ {((z_i1, ..., z_iT), y_i)};
10. end for
11. h' = L(D');  % Train the second-level learner h' by applying L to D'
Output: H(x) = h'(h_1(x), ..., h_T(x))
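The stacking procedure can be sketched in NumPy with two toy first-level learners and a least-squares second-level learner standing in for the paper's models (all learners here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: y depends non-linearly on x
X = np.linspace(0, 6, 60)
y = np.sin(X) + 0.1 * rng.normal(size=60)

# First-level learners h_1, h_2: polynomial fits of different degrees
h1 = np.polyfit(X, y, deg=1)
h2 = np.polyfit(X, y, deg=3)
Z = np.column_stack([np.polyval(h1, X), np.polyval(h2, X)])  # new data set D'

# Second-level (meta) learner: least-squares combination of the
# first-level predictions, trained on D' = {(z_i, y_i)}
A = np.column_stack([Z, np.ones(len(Z))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

def H(x):
    """Final stacked prediction H(x) = h'(h_1(x), h_2(x))."""
    z = np.array([np.polyval(h1, x), np.polyval(h2, x), 1.0])
    return z @ w

pred = H(1.5)  # stacked forecast at x = 1.5
```

Because the meta-learner searches over all linear combinations of the first-level predictions, its training error can never exceed that of the best single learner it combines.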

The Proposed Ensemble PV Power Forecasting Algorithm
The proposed ensemble forecasting algorithm for PV power generation consists of the data preprocessing, five single algorithms, and the meta-learner algorithm. Figure 2 shows the RNN-based ensemble method. The input of the single learner algorithms is the hourly historical PV power output and weather variables such as air temperature, relative humidity, and average wind speed, with a duration of 12 h in one day from 06:00-17:00. n is the length of the training or testing dataset of the single learner for each of the five weather types; for example, for the sunny weather type, the training dataset (n) is 576 h (12 h × 48 days) in the training stage, and the testing dataset (n) is 156 h (12 h × 13 days) in the testing stage.
The forecasting results of the single learners are fed to the RNN meta-learner as the training and testing dataset. In the RNN, n is 144 h (12 h × 12 days) in the training stage and 12 h in the testing stage. As illustrated in Figure 2, the output of each single learner model forms one vector as input to the ensemble learner. In our case, we have five single learner models, so the RNN input is a matrix of 5 by n. The output of the RNN meta-learner is the final ensemble forecasting result of 12 h for one-day ahead PV power forecasting.
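Assembling the 5-by-n ensemble input described above can be sketched as follows (the forecast values are random placeholders, not real PV data):

```python
import numpy as np

# Hypothetical hourly forecasts (kW) from the five single learners over
# one 12 h day (06:00-17:00); values are illustrative only.
hours = 12
single_learners = ["ANN", "DNN", "SVR", "LSTM", "CNN"]
forecasts = {name: np.random.default_rng(i).uniform(0, 200, hours)
             for i, name in enumerate(single_learners)}

# Stack the five forecast vectors into the 5-by-n matrix fed to the RNN
ensemble_input = np.vstack([forecasts[name] for name in single_learners])
```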
To perform the three-day ahead PV power forecasting, the one-day ahead PV power forecasting is implemented and followed by the rolling-window technique shown in Figure 3. For example, in step 1, we use the five-day training datasets as input to predict the next day's PV power output D_{t+1}. In step 2, the five-day training datasets, including the prediction results obtained in step 1 (i.e., D_{t+1}), are employed as input to predict D_{t+2}. The third step employs the five-day training datasets, including the previous two days' predictions from steps 1 and 2, as input to predict D_{t+3}. The three-day ahead rolling-window technique is performed each day to increase the accuracy of PV power forecasting. The forecasting horizon used in this study is 1-3 days ahead with hourly resolution.
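The three-step rolling-window procedure can be sketched as follows, assuming a placeholder one-day-ahead model (`predict_one_day` is a hypothetical stand-in for the trained ensemble):

```python
import numpy as np

def rolling_three_day_forecast(history, predict_one_day, window_days=5):
    """Three-day ahead rolling-window sketch: predict D_{t+1} from the last
    `window_days` days, then append each prediction to the window before
    predicting the next day, as described in steps 1-3.

    history: array of shape (days, 12) -- one row of hourly PV power per day.
    predict_one_day: model taking a (window_days, 12) window, returning 12 h.
    """
    window = list(history[-window_days:])
    predictions = []
    for _ in range(3):                      # steps 1, 2, 3
        next_day = predict_one_day(np.array(window))
        predictions.append(next_day)
        window = window[1:] + [next_day]    # roll the window forward
    return np.array(predictions)            # shape (3, 12)

# Toy "model": persistence of the window's mean daily profile
predict_one_day = lambda w: w.mean(axis=0)
profile = np.linspace(0, 100, 12)
history = np.tile(profile, (7, 1))          # 7 identical toy days
out = rolling_three_day_forecast(history, predict_one_day)
```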
Energies 2021, 14, x FOR PEER REVIEW
In this paper, as an example, the capacity of the PV power plant in Taiwan used for demonstration is 200 kWp. The data are taken from January to December 2019 with hourly observations. There are 20 days without weather prediction data; thus, a total of 345 days of data are used for constructing the proposed ensemble algorithm. We take only the 12 h in each day during which the PV power generation has output.
The training and testing stages are shown in Figure 4. Before preparing the training and testing data sets, the overall input data are classified into five different weather types using the fuzzy c-means (FCM) method based on the Calinski-Harabasz (CH) index. This treatment is performed to obtain better forecasting performance. The classified weather data sets are sunny (61 days), light-cloudy (101 days), cloudy (96 days), heavy-cloudy (62 days), and rainy (25 days).
Figure 3. A three-day ahead rolling window technique.

Figure 4. Flowchart for training and testing stages of each single model.

The detailed explanations regarding the data preprocessing, single algorithms, and meta-learner are described as follows.

Data Preprocessing
There are three steps in the data preprocessing to obtain good data as model input. First, the missing or broken data are replaced by new data obtained from the cubic spline interpolation method. Second, the data are normalized to [0, 1] as follows:

$\bar{x} = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$

where $\bar{x}$ is the normalized data and $x$ is the original data, with $x_{\max}$ and $x_{\min}$ as the maximum and minimum values of $x$, respectively. Third, the FCM method [39] is implemented to divide the data into five clusters according to different weather types. The optimal number of clusters is determined using the Calinski-Harabasz (CH) index [40]. Then, the treated data are used to train every single model. FCM aims to minimize the objective function as follows:

$J = \sum_{i=1}^{n} \sum_{j=1}^{C} u_{ij}^{m} \, \| x_i - \mu_j \|^2$

where $u_{ij}$ is the degree to which an observation $x_i$ belongs to a cluster $c_j$, $\mu_j$ is the center of cluster $j$, and $m$ is the fuzzifier. $m$ is a real number greater than one, and it defines the level of cluster fuzziness.

In fuzzy clustering, the centroid of a cluster is the mean of all points, weighted by their degree of belonging to the cluster:

$C_j = \dfrac{\sum_{i=1}^{n} u_{ij}^{m} x_i}{\sum_{i=1}^{n} u_{ij}^{m}}$

where $C_j$ is the centroid of cluster $j$. Figure 5 shows the five different weather types based on the clustering method.
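A minimal sketch of the min-max normalization and the FCM membership/centroid updates described above, assuming toy two-regime daily features (the real study uses five clusters of daily PV power statistics):

```python
import numpy as np

def normalize(X):
    """Min-max normalization of each feature to [0, 1]."""
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def fcm(X, n_clusters=5, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means sketch.

    X: (n_samples, n_features) daily statistical features, e.g.
       [daily average, daily maximum, daily std] of PV power.
    Returns membership degrees u (n_samples, n_clusters) and centroids.
    """
    rng = np.random.default_rng(seed)
    u = rng.dirichlet(np.ones(n_clusters), size=len(X))  # random memberships
    for _ in range(n_iter):
        um = u ** m
        centroids = (um.T @ X) / um.sum(axis=0)[:, None]     # weighted means
        d = np.linalg.norm(X[:, None, :] - centroids[None], axis=2) + 1e-12
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        u = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
    return u, centroids

# Toy daily features: two well-separated weather regimes
X = np.vstack([np.random.default_rng(1).normal(0.2, 0.02, (20, 3)),
               np.random.default_rng(2).normal(0.8, 0.02, (20, 3))])
u, c = fcm(normalize(X), n_clusters=2)
labels = u.argmax(axis=1)
```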

The single learner methods used in this study are summarized in Table 1. The ANN, DNN, SVR, LSTM, and CNN are briefly described as follows:

Artificial Neural Network
The type of ANN is a feedforward neural network, which is a particular form of early artificial neural network known for its design simplicity. An ANN has an input layer, hidden layers, and an output layer. The detailed network of the ANN is referred to Reference [41].

Table 1. Summary of the single learner methods used.

Method | Description | Advantage | Disadvantage
ANN [41] | A feedforward neural network with input, hidden, and output layers. | Identifies the non-linear relationship between input and output with high accuracy. | Requires a long processing time to train and might encounter the over-fitting problem in the training process.
DNN [42] | An ANN with more complexity and many hidden layers. | Has the ability to solve non-linear problems even with relatively few historical samples. | Needs to be fine-tuned to achieve the best performance.
SVR [7] | The SVM model used for regression problems. | Robust to outliers. | Does not perform well when the dataset has more noise.
LSTM [27,28] | A refined RNN with memory to store long-term information and handle sequence data. | Can handle noise, distributed representations, and continuous values. | As a sequence-based model, LSTM cannot predict well over a long time horizon.
CNN [11,12] | A DNN based on image applications. | Has the capability to extract features from time-series data with various lengths. | Needs large training data, and the training process takes much time.

Deep Neural Network
The type of DNN used in this study is the cascade-forward network with three hidden layers. The difference between the cascade-forward network and the ordinary feedforward network is the connection from the input layer and every previous layer to the following layers. A bias can also be added to the input layer; the detailed model is referred to Reference [42]. The basic principle of the cascade-forward network is the same as that of the feedforward network.

Support Vector Regression
SVR is used to solve non-linear regression problems as follows:

$f(x) = w \cdot \phi(x) + b$

where $f(x)$ denotes the forecast values and $\phi(x)$ is the non-linear mapping of the inputs, paired with the RBF function as the kernel. $w$ and $b$ are the adjustable weight and bias coefficients, respectively. By employing a penalty function to estimate the coefficients $w$, the problem can be solved by a quadratic program as in Reference [43].
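For illustration, the dual-form RBF-kernel prediction can be sketched as follows, with hypothetical (untrained) dual coefficients `alphas`:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def svr_predict(x, support_vectors, alphas, b, gamma=1.0):
    """Dual-form SVR prediction sketch:
    f(x) = sum_i alpha_i * K(x_i, x) + b,
    where alphas are hypothetical trained dual coefficients."""
    return sum(a * rbf_kernel(sv, x, gamma)
               for a, sv in zip(alphas, support_vectors)) + b

# Hypothetical trained model with two support vectors
svs = [np.array([0.0]), np.array([1.0])]
alphas = [0.5, -0.25]
f = svr_predict(np.array([0.0]), svs, alphas, b=0.1)
```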

Long Short-Term Memory
LSTM [44] is the refined RNN that can memorize patterns for long durations of time and can handle or learn long-term dependencies. LSTM consists of a cell state, forget gate, input gate, and output gate. The forget, input, cell, and output gates can be mathematically described as follows:

$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$
$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$
$h_t = o_t * \tanh(C_t)$

where $W_f$, $W_i$, $W_C$, and $W_o$ represent the weights for the forget gate, input gate, cell state, and output gate, respectively. $b_f$, $b_i$, $b_C$, and $b_o$ represent the biases of the forget gate, input gate, cell state, and output gate, respectively. $\tilde{C}_t$ represents the new candidate for the cell state, $C_t$ represents the cell state, and $\sigma$ represents the sigmoid function.
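The gate equations above map directly to one LSTM time step; the sketch below uses random, untrained weights purely to show the data flow:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM time step implementing the gate equations above.
    W and b hold the four weight matrices/bias vectors, each acting on the
    concatenated vector [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])          # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])          # input gate
    C_tilde = np.tanh(W["C"] @ z + b["C"])      # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde          # new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])          # output gate
    h_t = o_t * np.tanh(C_t)                    # new hidden state
    return h_t, C_t

# Toy dimensions: 2 inputs, 3 hidden units; random (untrained) weights
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(3, 5)) for k in "fiCo"}
b = {k: np.zeros(3) for k in "fiCo"}
h, C = lstm_step(rng.normal(size=2), np.zeros(3), np.zeros(3), W, b)
```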

Convolutional Neural Network
In the convolution step, a filter logic is used to extract features from the input image to create a feature map. The convolution layer and non-linear function calculate the previous layer through a convolution operation to obtain the output feature map.
The convolutional layer output can be described as:

$C_r(t) = \sigma\!\left( \sum_{i=1}^{l} \sum_{j=1}^{k} \omega_r(i, j)\, x\big((t-1)s + i,\; j\big) + b(r) \right)$

where $x \in R^{N \times k}$ represents the input time series or the output of the preceding layer, $s$ represents the convolution stride, $C_r(t)$ represents the $t$-th component of the $r$-th feature map, $\sigma$ denotes the non-linear activation function, and $\omega_r \in R^{l \times k}$ and $b(r)$ represent the weights and bias of the $r$-th convolution filter, respectively. The detailed model of the CNN can be referred to Reference [45].
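A minimal sketch of one feature map of this convolutional layer, assuming ReLU as the non-linearity (the paper does not specify the activation):

```python
import numpy as np

def conv1d_feature_map(x, w_r, b_r, stride=1):
    """Compute one feature map C_r of a 1-D convolution over a multichannel
    time series, matching the layer equation above.

    x:   (N, k) input time series with k channels.
    w_r: (l, k) weights of the r-th filter.
    b_r: scalar bias of the r-th filter.
    """
    N, k = x.shape
    l = w_r.shape[0]
    out = []
    for t in range(0, N - l + 1, stride):
        out.append(np.sum(w_r * x[t:t + l]) + b_r)   # sliding dot product
    return np.maximum(np.array(out), 0.0)            # ReLU activation (assumed)

x = np.arange(12, dtype=float).reshape(6, 2)         # 6 time steps, 2 channels
w = np.ones((2, 2))                                  # summing-style filter
fmap = conv1d_feature_map(x, w, b_r=0.0)
```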

Ensemble Forecasting Model
RNN is a type of dynamic neural network in which the output depends on the current input to the network as well as on previous inputs, outputs, and hidden states of the network [46]. RNN contains hidden states that are distributed across time. This structure allows the RNN to efficiently store a vast quantity of information about the past. The core idea is that recurrent connections allow the memory of previous inputs to persist in the network's internal state and thereby influence the network output.
The formula for the current state can be written as:

$h_t = f(h_{t-1}, x_t)$

where $h_t$ represents the new state, $h_{t-1}$ represents the previous state, and $x_t$ represents the current input. The equation for the state at time $t$ is shown in Equation (16):

$h_t = \tanh(w_{hh} h_{t-1} + w_{xh} x_t)$    (16)

where tanh is the hyperbolic tangent activation function, $w_{hh}$ represents the recurrent neuron weights, and $w_{xh}$ represents the input neuron weights.
Once the current state is calculated, the output state can be obtained as:

$y_t = w_{hy} h_t$

where $w_{hy}$ represents the output neuron weights. The complete structure of the RNN used for the ensemble is shown in Figure 6. Unlike other neural networks, the RNN is utilized as the meta-learner because of its reduced parameter complexity. Furthermore, the RNN shares its parameters across inputs, performing the same task on all inputs and hidden layers to produce the output, which makes the RNN suitable for our application.
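The state and output equations can be unrolled over the 12 hourly steps of the ensemble input; the weights below are random stand-ins for trained parameters:

```python
import numpy as np

def rnn_forward(x_seq, w_xh, w_hh, w_hy, h0=None):
    """Unrolled vanilla RNN implementing the state and output equations above:
    h_t = tanh(w_hh @ h_{t-1} + w_xh @ x_t),  y_t = w_hy @ h_t.

    x_seq: (T, n_in) input sequence, e.g. the five single-learner
    forecasts per hour in the ensemble (n_in = 5).
    """
    h = np.zeros(w_hh.shape[0]) if h0 is None else h0
    outputs = []
    for x_t in x_seq:
        h = np.tanh(w_hh @ h + w_xh @ x_t)   # same (shared) weights each step
        outputs.append(w_hy @ h)
    return np.array(outputs), h

rng = np.random.default_rng(0)
T, n_in, n_hidden = 12, 5, 8                 # 12 hourly steps, 5 learners
x_seq = rng.uniform(0, 1, size=(T, n_in))
w_xh = rng.normal(size=(n_hidden, n_in))
w_hh = rng.normal(size=(n_hidden, n_hidden))
w_hy = rng.normal(size=(1, n_hidden))
y, h_T = rnn_forward(x_seq, w_xh, w_hh, w_hy)
```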
Figure 7 shows the detailed process of the proposed ensemble forecasting algorithm. We take the three-day ahead ensemble forecasting as an example to describe the detailed process. The input variables for the five single learners are preprocessed first. For each weather type, the original data set is split into training and testing sets to build up and test the single learners, respectively, i.e., ANN, DNN, SVR, LSTM, and CNN. The forecasting results of all the single learners are taken as the ensemble input variables. The ensemble input variables are also split into training and testing sets for each weather type. The corresponding numbers of days for the different weather types are shown in Table 2.
Figure 6. The RNN structure.

Datasets and Performance Evaluation
This section introduces the data sets for training and testing, as well as the performance evaluation.


Datasets
In this study, the data were collected from the 200 kWp PV site of You Ji Industrial Co., Ltd. in Taiwan, installed on a building rooftop. The whole-year PV power output data of 2019 were collected.
The available data consisted of meteorological data and measured data from the PV site. The meteorological data were provided by the Coastal Ocean Monitoring Center, National Cheng Kung University (NCKU), Tainan, Taiwan. These data contained hourly weather predictions for three days ahead, updated four times per day at 00:00, 06:00, 12:00, and 18:00. Each time the weather predictions were updated, the newer prediction data overwrote the older ones; in other words, only the newest data were accepted. The data features used in this study were hourly air temperature, relative humidity, and average wind speed. The actual measurement of the hourly PV power output generated from each rooftop PV module of the building was also used as a data feature.
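The rule of keeping only the newest issued forecast for each target hour can be sketched with pandas; the timestamps and temperature values below are hypothetical and only illustrate the overwrite behavior:

```python
import pandas as pd

# Hypothetical forecast records: each row is one (issue_time, target_time) pair.
records = pd.DataFrame({
    "issue_time":  pd.to_datetime(["2019-06-01 00:00", "2019-06-01 06:00",
                                   "2019-06-01 00:00", "2019-06-01 06:00"]),
    "target_time": pd.to_datetime(["2019-06-01 12:00", "2019-06-01 12:00",
                                   "2019-06-01 13:00", "2019-06-01 13:00"]),
    "temperature": [29.0, 30.5, 28.5, 30.0],
})

# Keep only the most recently issued forecast for each target hour:
# sort by issue time, then drop older duplicates per target time.
latest = (records.sort_values("issue_time")
                 .drop_duplicates("target_time", keep="last")
                 .set_index("target_time"))
```

After this step, each target hour carries exactly one forecast row, taken from the most recent six-hourly update.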
Correlation analysis was mainly used to select the proper data features. In this case, the Pearson correlation coefficient, which takes a value between −1 and 1, was used to calculate the correlation between each variable and the target [4]. Table 3 shows the correlation coefficients, t statistics, and p values between the weather variables and the PV power output. Although the correlation coefficients of temperature, humidity, and wind speed were low, based on the t-test results, the input variables with p values lower than 0.05 were still considered significant and used as input variables. In Reference [4], the use of temperature, humidity, and wind speed gave a lower error in irradiance forecasting, which is significantly related to PV power.
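A minimal sketch of this feature-selection step, using scipy's `pearsonr` to obtain the correlation coefficient and p value for each candidate variable; the data here are synthetic and only illustrative, not the actual site measurements:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
pv_power = rng.uniform(0, 200, size=300)   # synthetic hourly PV output (kW)
features = {
    # Weakly correlated variables (low r, but statistically significant):
    "temperature": 0.1 * pv_power + rng.normal(0, 5, 300),
    "humidity":    -0.05 * pv_power + rng.normal(0, 5, 300),
    # Pure noise, unrelated to the target:
    "wind_speed":  rng.normal(3, 1, 300),
}

selected = []
for name, values in features.items():
    r, p = pearsonr(values, pv_power)   # correlation coefficient and p value
    if p < 0.05:                        # keep statistically significant features
        selected.append(name)
```

This mirrors the selection rule in the text: a variable is retained when its p value falls below 0.05, even if its correlation coefficient is small in magnitude.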

Training and Testing Sets
The data collected, as mentioned, were classified into five weather types based on the FCM clustering method and Calinski-Harabasz index. The weather types were sunny, light-cloudy, cloudy, heavy-cloudy, and rainy days.
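The cluster-count selection can be sketched as follows. Since FCM is not available in scikit-learn, KMeans is used here as a stand-in clustering algorithm, and the daily features are synthetic; the sketch only illustrates choosing the number of clusters by the maximum Calinski-Harabasz index:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(2)
# Daily statistical features: [mean, max, std] of PV power for each day.
hourly = rng.uniform(0, 200, size=(75, 24))            # 75 synthetic days
daily_features = np.column_stack([hourly.mean(axis=1),
                                  hourly.max(axis=1),
                                  hourly.std(axis=1)])

# Score each candidate cluster count with the Calinski-Harabasz (CH) index.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10,
                    random_state=0).fit_predict(daily_features)
    scores[k] = calinski_harabasz_score(daily_features, labels)

best_k = max(scores, key=scores.get)   # cluster count with the highest CH index
```

On the real daily PV features, this procedure yields the five weather-type clusters used in the paper; here `best_k` depends on the synthetic data.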
The training and testing datasets of different weather types, as listed in Table 1, were used to train and test the single and ensemble learner, respectively. For the ensemble forecasting performance validation, the last three days of the testing data set were used in the ensemble model to perform one-day and three-day ahead PV power forecasting for each weather type. The detailed illustration of data preparation for the single learner and ensemble learner is described in Figure 7.

Performance Evaluation
To verify the PV power prediction in this study, MRE was used as the main index to compare the performance of the different models. In MRE, the real and forecast values are divided by the nominal capacity of the PV site, which was 200 kWp in our case, as computed below:

MRE = (1/N) Σ_{i=1}^{N} (|y_i − ŷ_i| / N_p) × 100% (18)

where y_i and ŷ_i are the forecast value and true value of the PV power output at the i-th point, respectively, N is the number of prediction points, and N_p is the nominal power capacity of the PV site.
Performance evaluators, such as MRE, MAE, MAPE, RMSE, and R² [47][48][49], have been used to verify forecasting methods. In this work, it was necessary to evaluate the volatility of the prediction error and the forecasting accuracy before placing the results of the single learners into the ensemble model. Thus, we implemented additional performance evaluations, namely MAE, nRMSE, and R². MRE and MAE represent the accuracy of the prediction. nRMSE shows the volatility, or uncertainty, of the prediction error. R² is the coefficient of determination of the predicted model and lies between 0 and 1; a higher R² means a smaller difference between the predicted and real values. MAE, nRMSE, and R² are described in Equations (19)-(21), respectively.
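Under the normalization by nominal capacity described above, the evaluation indices can be sketched as follows (MAPE is omitted; the formulas follow the standard definitions, with MRE and nRMSE normalized by N_p = 200 kWp):

```python
import numpy as np

NOMINAL_KWP = 200.0   # nominal capacity N_p of the PV site

def mre(y_true, y_pred):
    """Mean relative error, normalized by nominal capacity, in percent."""
    return np.mean(np.abs(y_true - y_pred)) / NOMINAL_KWP * 100

def mae(y_true, y_pred):
    """Mean absolute error in the units of the data (kW)."""
    return np.mean(np.abs(y_true - y_pred))

def nrmse(y_true, y_pred):
    """RMSE normalized by nominal capacity, in percent."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / NOMINAL_KWP * 100

def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total variance."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot
```

For example, with true values (100, 150, 50) kW and forecasts (110, 140, 50) kW, the MAE is 20/3 ≈ 6.67 kW and R² is 0.96.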

Numerical Results
Short-term PV power forecasting covers forecasting horizons of up to 2-3 days ahead. In our case, we performed one-day and three-day ahead hourly PV power forecasting. One-day ahead PV power forecasting is applied in demand response, ancillary services, and unit commitment. For a longer time horizon, three-day ahead PV power forecasting can be used practically for energy scheduling, energy trading, and plant maintenance [50].
In this section, we conducted simulations to verify our proposed ensemble forecasting model. MATLAB 2020a on an Intel Core i7 3.60-GHz computer was employed for the simulation. As a comparison, the RF ensemble method was used as a benchmark model for one-day to three-day ahead PV power forecasting. As a supervised learning algorithm that consists of many decision trees and uses bagging, RF is considered an ensemble of decision trees, using bootstrap sampling to subsample the input data and aggregation to combine the results. The detailed model of RF can be found in Reference [51].
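A minimal sketch of such an RF benchmark using scikit-learn's `RandomForestRegressor`, which implements the bagging of decision trees described above; the stacked single-learner forecasts below are synthetic placeholders, not the actual model outputs:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
# Hypothetical inputs for the benchmark: forecasts of the 5 single learners.
single_preds = rng.uniform(0, 200, size=(240, 5))           # 10 days x 24 h
actual = single_preds.mean(axis=1) + rng.normal(0, 5, 240)  # synthetic target

# Trees are grown on bootstrap subsamples; predictions are aggregated (averaged).
rf = RandomForestRegressor(n_estimators=500, min_samples_leaf=3, random_state=0)
rf.fit(single_preds[:-24], actual[:-24])      # train on all but the last day
day_ahead = rf.predict(single_preds[-24:])    # one-day ahead combined forecast
```

The `n_estimators` and `min_samples_leaf` values here match two of the grid points reported for the benchmark tuning in the next subsection.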

Hyperparameter Setting of Single and Ensemble Models
Prediction performance was highly affected by hyperparameter tuning. Table 4 shows the parameters of the five single forecasting models, obtained empirically according to the accuracy of each single model. Because the focus of this research was on the ensemble model, the single models were tuned without an optimization scheme to avoid a high computational cost. The hyperparameters tuned in the proposed ensemble model were the number of hidden layers, the number of neurons, the input delay, and the learning rate. The number of hidden layers was set from 1 to 5, and the number of hidden neurons from 4 to 10. The input delay was set from 1 to 4, and the learning rate to 0.001, 0.005, 0.01, or 0.05. The benchmark RF model was tuned by setting the number of trees and the minimal leaf size: the number of trees was 100, 500, or 1000, with minimal leaf sizes of 1, 3, 5, and 7. The optimal hyperparameters of the proposed ensemble model and the benchmark model are shown in Table 5.
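The search over the grid described above can be enumerated as follows; the scoring function here is a placeholder, since training the actual RNN for each of the 560 configurations is outside the scope of this sketch:

```python
from itertools import product

# Search grid mirroring the ranges reported for the RNN meta-learner.
hidden_layers = [1, 2, 3, 4, 5]
hidden_neurons = [4, 5, 6, 7, 8, 9, 10]
input_delays = [1, 2, 3, 4]
learning_rates = [0.001, 0.005, 0.01, 0.05]

def validation_mre(layers, neurons, delay, lr):
    """Placeholder score; in practice, train the RNN and return its validation MRE."""
    return abs(layers - 1) + abs(neurons - 7) + abs(delay - 2) + lr * 10

# Exhaustively evaluate every configuration and keep the lowest-error one.
best = min(product(hidden_layers, hidden_neurons, input_delays, learning_rates),
           key=lambda cfg: validation_mre(*cfg))
```

With the placeholder score, the selected configuration is (1 hidden layer, 7 neurons, input delay 2, learning rate 0.001); with a real training loop, `best` would be the entries reported in Table 5.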

Performance of The Proposed Ensemble Model
The FCM clustering method was performed to obtain the best number of clusters. Based on the CH index shown in Figure 8, the maximum value was chosen as the optimal number of clusters. Therefore, five clusters, named sunny, light-cloudy, cloudy, heavy-cloudy, and rainy days, were determined based on their characteristics. On sunny days, the PV power output reached its maximum value. Cloudy days had medium PV power output, while on rainy days, the PV power output was low. Light-cloudy lay between the sunny and cloudy days, while heavy-cloudy lay between the cloudy and rainy days. The overall weather types are shown in Figure 4. The proposed ensemble model was conducted using the stacking ensemble method, with the RNN meta-learner combining the forecasting results of the five single learners. First, we performed forecasting with every single learner using the training and testing datasets mentioned earlier. Second, the forecasting results from every single learner were used as the training and testing datasets of the ensemble learner. The numbers of days for the ensemble learner and the different weather types are shown in Table 1: there were 13, 23, 18, 14, and 7 days of forecasting results for sunny, light-cloudy, cloudy, heavy-cloudy, and rainy days, respectively. The last three days of every weather type obtained from the single learners were used as the testing dataset for the ensemble learner, while the remaining days were used as the training dataset.
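The two-stage stacking workflow above can be sketched as follows. The five deep-learning single learners and the RNN meta-learner are replaced with lightweight scikit-learn stand-ins, and the data are synthetic, so this only illustrates the data flow: train the single learners, stack their forecasts as meta-features, train the meta-learner, and test on the last three days:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(20 * 24, 3))                         # weather features, 20 days
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(0, 0.5, 20 * 24)

# Stage 1: train the single learners (stand-ins for ANN/DNN/SVR/LSTM/CNN).
learners = [MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
            SVR(), DecisionTreeRegressor(max_depth=5, random_state=0),
            KNeighborsRegressor(), Ridge()]
train, test = slice(0, 17 * 24), slice(17 * 24, None)     # last 3 days for testing
for m in learners:
    m.fit(X[train], y[train])

# Stage 2: stack the single-learner forecasts as the meta-learner's inputs.
meta_X = np.column_stack([m.predict(X) for m in learners])
meta = Ridge()                                  # stand-in for the RNN meta-learner
meta.fit(meta_X[train], y[train])
ensemble_pred = meta.predict(meta_X[test])      # combined 3-day ahead forecast
```

In the paper's setup, this pipeline is run once per weather type, with the dataset sizes listed in Table 1.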
As shown in Table 5, the optimal hyperparameters of the RNN ensemble model had the same number of hidden layers and input delays for every weather type. As the training and testing datasets were relatively short, up to 23 days, one hidden layer was sufficient to train the ensemble PV power forecasting model. The key difference in the performance of the RNN as a meta-learner for PV power forecasting was the number of hidden neurons, which related to the length of the training and testing datasets. For the light-cloudy and cloudy weather types, the number of hidden neurons was seven, the highest among the weather types; the numbers of training and testing data in these two weather types were also the highest. Meanwhile, the lowest number of hidden neurons belonged to the rainy weather type, which had only seven days of training and testing data. The number of hidden neurons in the RNN structure needed to be adjusted to achieve a good fitting ability, which led to an accurate forecasting model.
The comparison of predicted and actual values for one-day ahead PV power output between the single and ensemble models is shown in Figures 10-12 for sunny, cloudy, and rainy days. In terms of MRE as our main index, the MREs of DNN were 5.15%, 3.06%, and 9.11% on sunny, cloudy, and rainy days, respectively. For every weather type, DNN showed higher prediction performance than the other single forecasting models based on the weighted average, while ANN and LSTM showed lower prediction performance than the other methods. The RNN ensemble achieved the best performance compared with RF and any single forecasting method, with an MRE of 4.70% on sunny days, while the RF ensemble showed good performance with an MRE of 2.42% on cloudy days.
LSTM and CNN showed the lowest performance. The MREs of DNN were 4.99%, 2.56%, and 7.26% on sunny, cloudy, and rainy days, respectively, and DNN performed better than any single or ensemble method with an MRE of 4.99% on sunny days. However, the RNN ensemble, our proposed method, showed consistent performance with the lowest MREs of 2.53% and 4.77% on cloudy and rainy days, respectively. Table 6 shows the performance of the single and proposed ensemble methods in one-day ahead forecasting with different error validations.
As observed, the performance of our proposed method compares favorably with the RF ensemble method for one-day ahead PV power forecasting. The weighted average MREs of the proposed RNN ensemble method and the RF method were 3.87% and 4.53%, respectively. The proposed ensemble method is better, especially when the number of data points is low; RF achieved a good prediction only on cloudy days because RF performs better with a higher number of data points for one-day ahead PV power forecasting. Overall, the superiority of our proposed method is prominent. Several single models were able to produce good results for certain weather types; however, they cannot perform well in all situations because each single method has its own advantages. The proposed ensemble method combines the advantages of the single methods into a better result in terms of MRE, nRMSE, MAE, and R². Table 7 shows the performance evaluation index comparison for three-day ahead PV power forecasting between the single and ensemble methods. The weighted average MREs of the proposed RNN ensemble method and the RF method were 4.29% and 4.68%, respectively. RF showed a good result on heavy-cloudy days since RF has the ability to reduce the variance part of the error, versus the bias, in the heavy-cloudy dataset. Nevertheless, the proposed RNN ensemble method performed with more consistent accuracy on the sunny, light-cloudy, cloudy, and rainy datasets than RF, proving that the proposed RNN ensemble method is better than RF at forecasting time series data. Although every single method and ensemble method showed prediction ability for the longer time horizon, the weighted average prediction for each weather type of the proposed ensemble method was better than those of the benchmark and single methods.
The overall comparison between one-day ahead and three-day ahead PV power forecasting shows that the proposed ensemble method can have consistency in terms of MRE, nRMSE, and MAE because the accuracy mismatch is relatively small, except for R 2 , which shows a lower prediction accuracy in three-day ahead forecasting than in one-day ahead forecasting.
In addition, to compare the proposed method with combinations of fewer single learner methods, the best three methods based on the lowest MRE, nRMSE, and MAE in Table 7, namely DNN, SVR, and ANN, were chosen in this paper. Tables 8 and 9 show the comparison of the MRE and MAE among the combinations of the best three single learner methods, i.e., DNN+SVR, DNN+ANN, SVR+ANN, and DNN+SVR+ANN, and the proposed complete five single learner methods with the RNN ensemble, DNN+SVR+ANN+LSTM+CNN, for three-day ahead hourly PV power forecasting. In the tables, the weighted results of all the methods are based on the number of days for each weather type. Figure 15 shows the comparison among the single methods, the corresponding combinations of the best three, the benchmark, and the proposed method in terms of the weighted average MRE and MAE. The combinations of the best three methods, DNN, SVR, and ANN, are superior to any single method. The RNN ensemble of DNN+SVR+ANN has slightly better accuracy than the benchmark method. The results prove that the RNN ensemble of the five single learner methods is superior to the combinations of the best three single learner methods, with an MRE of 4.29% and an MAE of 8.59 kW for a 200 kWp PV farm.
Furthermore, to evaluate the MAE improvement of the proposed method compared to the single methods, the combinations of fewer single learners via the RNN ensemble, and the benchmark method, Equations (22) and (23) are used [52,53]:

∆ = MAE_{s,c,b} − MAE_p (22)

Improvement (%) = (∆ / MAE_{s,c,b}) × 100 (23)

where MAE_{s,c,b} and MAE_p are the MAE of the single, combination-of-fewer-singles, or benchmark method and the MAE of the proposed method, respectively, and ∆ is the difference between MAE_{s,c,b} and MAE_p.

Table 10. The improvement of the proposed method compared to the single, combination, and benchmark methods.
The overall results are shown in Table 10. The proposed method had significant improvements of about 7-40% and 7-30% compared to the single models and the combinations of single models, respectively. Compared to the benchmark model, an 8% improvement was obtained. The performance of both RF and RNN depended on how they process time-series data: RF performs well on tabular data, but its performance decreases on multivariate time series data, while the RNN, although it has difficulty predicting long-term dependencies, is good at predicting sequential data of any length. Thus, the overall results show that our proposed method outperforms all single and benchmark methods.
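Equations (22) and (23) amount to a relative-improvement calculation, which can be sketched as follows; the 9.34 kW comparison MAE below is a hypothetical value chosen only to illustrate an approximately 8% improvement:

```python
def mae_improvement(mae_other, mae_proposed):
    """Percentage MAE improvement per Equations (22)-(23)."""
    delta = mae_other - mae_proposed          # Eq. (22): absolute MAE gap
    return delta / mae_other * 100            # Eq. (23): relative improvement (%)

# Illustrative: proposed ensemble MAE of 8.59 kW vs a hypothetical 9.34 kW.
improvement = mae_improvement(9.34, 8.59)     # roughly an 8% improvement
```

The same function applies whether the comparison MAE comes from a single model, a combination of fewer single learners, or the benchmark.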

Conclusions
In this paper, a stacking ensemble method with an RNN as the meta-learner was proposed to improve the prediction accuracy of one-day to three-day ahead PV power forecasting. The forecasting results of ANN, DNN, SVR, LSTM, and CNN were combined by the RNN meta-learner to construct the ensemble model. Our proposed method was compared with the well-known benchmark ensemble model of random forest. This paper also compared the RNN-ensemble combinations of the three best single learner methods, i.e., DNN, SVR, and ANN. Through intensive experiments, we found that the proposed ensemble method with the clustering method outperforms the single learners, the combinations of the single learners, and the benchmark ensemble method in terms of MRE, nRMSE, MAE, and R². The proposed method reached the lowest MRE of 4.29%, a significant improvement of about 7-40%, 7-30%, and 8% compared to the single models, the combinations of fewer single learners, and the benchmark method, respectively. Furthermore, the results show that the proposed ensemble method is reliable enough to be implemented in practical industrial applications.
The proposed method does not achieve as good accuracy on heavy-cloudy and rainy days as on the other day types, as indicated by the coefficient of determination R². As in many existing studies, these weather types are harder to predict accurately; part of the reason is that there are too few training data for these two weather types to train the corresponding models. In future work, it should be helpful to consider more weather variables that have a high correlation with PV power, such as cloudiness level [54] and irradiance data, to increase the accuracy of the proposed model. In addition to the random forest method, other ensemble forecasting methods, such as weighted averaging and ridge regression techniques [24], can also be used for comparison. Furthermore, combinations of other machine learning methods can be studied to establish an ensemble method.