Solar Power Interval Prediction via Lower and Upper Bound Estimation with a New Model Initialization Approach

Abstract: This paper proposes a new model initialization approach for solar power prediction intervals based on the lower and upper bound estimation (LUBE) structure. The linear regression interval estimation (LRIE) is first used to initialize the prediction interval, and the extreme learning machine auto encoder (ELM-AE) is then employed to initialize the input weight matrix of the LUBE. Based on the initialized prediction interval and input weight matrix, the output weight matrix of the LUBE can be obtained, which is close to the optimal values. A heuristic algorithm is employed to train the LUBE prediction model because the traditional training approach is not applicable. The proposed model initialization approach was compared with the point prediction initialization and random initialization approaches. To validate its performance, four heuristic algorithms, including particle swarm optimization (PSO), simulated annealing (SA), harmony search (HS), and differential evolution (DE), were introduced. Based on the experiment results, the proposed model initialization approach with different heuristic algorithms was better than the point prediction initialization and random initialization approaches. Among the four heuristic algorithms, PSO achieved the best efficiency and effectiveness in searching for the optimal solution. Besides, the ELM-AE can weaken the over-fitting of the trained model, which is brought in by the heuristic algorithm, and guarantee a stable model output.


Introduction
With the increasing global energy consumption, renewable energy and its application technologies have received extensive attention and are being studied enthusiastically. The intermittent nature and volatility of renewable energy, as significant factors, restrict its exploitation and penetration. An accurate forecast is required to guarantee the stability and economy of power systems. However, the randomness and indeterminacy of natural resources bring great difficulties for solar power predictions.
Traditional solar power point prediction provides limited forecast information, which causes risk [1]. Solar power interval prediction, which offers interval information under a certain confidence level, opens a new pathway to handle forecasting uncertainty. Interval prediction aims at predicting a narrow interval that encompasses as many predicted points as possible. High-quality prediction intervals benefit static safety analysis and risk evaluation in power systems. However, solar power interval prediction has attracted less attention than point prediction. The existing prominent interval prediction methods include statistical methods and data-driven methods.
Statistical methods were the first to be employed to construct the prediction interval. They usually require prior knowledge or a distribution assumption for the forecasting errors [2][3][4][5], and often assume that the forecast errors follow a zero-mean normal distribution or a Student's t-distribution [6]. The bootstrap [7], Bayesian [8], mean-variance estimation [5], and delta methods [9] are the four prominent and traditional methods. These four methods were compared in terms of calculation, interval precision, and interval width, which revealed that each method had its shortcomings [10]. The prediction errors display different characteristics in different application fields. Thus, it is important to make an appropriate distribution assumption, since an inappropriate one might result in poor forecasting performance. Li et al. acquired a precise distribution characteristic based on a dataset divided by the envelope-based clustering algorithm. There are also several statistical methods for probabilistic prediction that need no prior assumption, such as kernel density estimation [11], ensemble simulations [12], and quantile regression [3].
Data-driven methods were gradually introduced to avoid distribution assumptions. The lower and upper bound estimation (LUBE) structure for interval prediction was first developed by Khosravi et al. [13]. Two output units of the neural network (NN) model are employed to represent the upper and lower bounds of the predicted interval. Such nonparametric models have been widely utilized in many research works [14][15][16]. In the process of training the LUBE, two prominent evaluation metrics, coverage probability and interval width, are considered. Because these two metrics conflict, the LUBE training can be considered as a multi-objective or single-objective optimization model [17][18][19]. In [20], a new multi-objective optimization method using a multi-objective swarm algorithm was proposed to adjust the machine learning model, which revealed superior forecasting performance to the single-objective one. In [21], the Pareto optimal solutions were used to construct a multi-objective framework, and the Pareto solutions provided an ensemble of optimal solutions. Because the cost function is not continuously differentiable, it is hard to train the NN through traditional analytical algorithms. Heuristic algorithms such as particle swarm optimization (PSO) and simulated annealing (SA) are employed in this situation.
Most previous interval prediction methods based on LUBE models concentrate on the building of optimization objective and the selection of intelligent algorithms. The initialization method of the NN parameters is rarely studied. However, the initial solution of heuristic algorithms significantly affects their evolution process and performance.
ELM-AE employed in this paper aims at enhancing the generalization capability of the forecasting model. Besides this, current application objects of interval prediction mainly include wind speed, wind power, electricity load, and electricity price prediction. Solar energy, as a representative renewable resource, also deserves some discussion for interval prediction.
This paper proposes a new model initialization approach for the prediction interval based on the LUBE structure. The ELM-AE is first utilized to initialize the input weight matrix of the LUBE model and the linear regression interval estimation (LRIE) is then used to initialize the prediction interval. The initial prediction interval obtained by LRIE is then employed to update the initial parameters of the LUBE model. Numerous comparison experiments are conducted to validate the performance of the proposed model initialization approach.
Some experiments using the proposed initialization approach, traditional initialization approach, and random initialization approach are implemented with the same sample data. Different heuristic algorithms, including particle swarm optimization (PSO), simulated annealing (SA), harmony search (HS), and differential evolution (DE), are conducted to evaluate the impact of the initial solution on different heuristic algorithms.
The remainder of this paper is organized as follows. Section 2 introduces the LUBE method employing ELM and two primary evaluation indices of forecasting intervals. The proposed model initialization approach is described in Section 3. Experiments and results are discussed in Section 4. Finally, Section 5 draws the conclusions of this work and discusses some guidelines for future work.

Lower and Upper Bound Estimation
The LUBE method utilizing the neural network structure has been widely used to estimate the prediction interval. The schematic diagram of the LUBE method is shown in Figure 1. The ELM with two output nodes is regarded as the prediction model of LUBE. The outputs of the two output nodes represent the predicted upper and lower bounds. Because the actual prediction interval is unknown and uncertain, the traditional backpropagation algorithm cannot be used to train the ELM. The training of the ELM is converted to a parameter optimization problem, and a heuristic algorithm is utilized to obtain the optimal parameters of the LUBE.

ELM
The ELM introduced by Huang, et al. [22] is a single-hidden layer feed forward neural network with excellent generalization performance and fast learning speed. Thus, the ELM is utilized as the prediction model in this work. In Figure 1, the ELM only has three layers, the input layer, hidden layer, and output layer. Two neuron units in the output layer separately represent the upper and lower bounds of the predicted interval.
In the normal ELM model, suppose that $N$ samples $\{(x_j, t_j)\}_{j=1}^{N}$ are given, where $x_j \in \mathbb{R}^n$ is the input vector and $t_j \in \mathbb{R}^m$ is the target vector. The input data are mapped to the $L$-dimensional feature space constructed by the hidden layer, and the output of the network is obtained by Equation (1):

$$f(x) = \sum_{i=1}^{L} \beta_i h_i(x) = h(x)\beta \tag{1}$$

where $h(x) = [h_1(x), \ldots, h_L(x)]$ indicates the outputs of the hidden neuron nodes, and its $L$ elements correspond to the outputs of the $L$ hidden nodes generated from the activation function. Likewise, $\beta = [\beta_1, \ldots, \beta_L]^T$ is the output weight matrix. The goal of the single hidden layer neural network is to minimize the error between the output value and the actual quantity. In matrix form, the target of the network is achieved by Equation (2):

$$H\beta = T \tag{2}$$

where $H = [h^T(x_1), \ldots, h^T(x_N)]^T$ and $T = [t_1, \ldots, t_N]^T$. Thus, in ELM, the output weight $\beta$ can be expressed as Equation (3):

$$\beta = H^{\dagger} T \tag{3}$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of the matrix $H$.
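The ELM training step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the sigmoid activation, the uniform range of the random weights, and the function names `elm_fit` and `elm_predict` are assumptions made for the example.

```python
import numpy as np


def _hidden(X, A, b):
    """Hidden-layer output matrix H for inputs X (sigmoid activation assumed)."""
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))


def elm_fit(X, T, n_hidden, rng):
    """Fit a basic ELM: random hidden layer, least-squares output weights."""
    n_features = X.shape[1]
    A = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = _hidden(X, A, b)
    beta = np.linalg.pinv(H) @ T                             # beta = pinv(H) T, as in (3)
    return A, b, beta


def elm_predict(X, A, b, beta):
    """Network output f(x) = h(x) beta, as in (1)."""
    return _hidden(X, A, b) @ beta
```

For LUBE, `T` would have two columns (upper and lower bounds), so `beta` has shape `(n_hidden, 2)`.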

The Evaluation and Training of LUBE
To evaluate the prediction performance, the mean prediction interval width, $PI_{mean}$, and the prediction interval coverage probability (PICP) in (4)-(5) are introduced. $PI_{mean}$ quantifies the width of the prediction interval. PICP indicates the percentage of targets covered by the corresponding prediction intervals.

$$PI_{mean} = \frac{1}{N}\sum_{i=1}^{N}\left(\overline{t}_i - \underline{t}_i\right) \tag{4}$$

$$PICP = \frac{1}{N}\sum_{i=1}^{N} c_i, \qquad c_i = \begin{cases} 1, & t_i \in [\underline{t}_i, \overline{t}_i] \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

where $\overline{t}_i$ and $\underline{t}_i$ are the predicted upper and lower bounds for the dataset $\{(x_i, t_i), i = 1, \ldots, N\}$. Since the forecasting interval width is strongly associated with the range of the targets, a normalized width index is more suitable for intuitive comparison. A normalized index, called the prediction interval normalized root-mean-square width (PINRW), is employed as in (6) [14]:

$$PINRW = \frac{1}{R}\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\overline{t}_i - \underline{t}_i\right)^2} \tag{6}$$

where $R$ is the range of the forecasting targets. In general, $R$ is equal to the difference between the maximum and minimum values of the training set. PICP and $PI_{mean}$ (or PINRW) are conflicting indexes. An ideal interval aims to maximize PICP and minimize $PI_{mean}$ simultaneously. However, a balance and a compromise are required in practice. The coverage width-based criterion (CWC) is introduced as the cost function to evaluate the predicted interval. This flexible index combines the prediction interval coverage probability and width, so it can evaluate the overall performance of the prediction intervals and guide the generation of intervals:

$$CWC = PINRW\left(1 + \gamma(PICP)\, e^{-\eta\,(PICP - \delta)}\right), \qquad \gamma(PICP) = \begin{cases} 0, & PICP \ge \delta \\ 1, & PICP < \delta \end{cases} \tag{7}$$

where $\delta$ is the required coverage level and the hyper-parameter $\eta$ magnifies the difference between PICP and $\delta$, which should be a large value.

The training of LUBE can be regarded as an optimization problem. The minimization of CWC is the optimization objective, and the output weight matrix of the ELM is the independent variable. The heuristic algorithm is employed to obtain the optimal output weight matrix by minimizing CWC. The output weight matrix can be initialized randomly, which is called the random initialization (RI) approach. The output weight matrix obtained by the point prediction approach can also be utilized to initialize the output weight matrix of the LUBE, which is called the point initialization (PI) approach [13].
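The evaluation indices can be computed directly from the predicted bounds. The sketch below is illustrative only: it assumes the common LUBE form of the cost, CWC = PINRW · (1 + γ · exp(−η (PICP − δ))) with γ = 0 when PICP ≥ δ and γ = 1 otherwise, and the function names are hypothetical.

```python
import numpy as np


def picp(t, lower, upper):
    """Fraction of targets that fall inside [lower, upper]."""
    return float(np.mean((t >= lower) & (t <= upper)))


def pi_mean(lower, upper):
    """Mean prediction interval width."""
    return float(np.mean(upper - lower))


def pinrw(lower, upper, target_range):
    """Root-mean-square interval width normalized by the target range R."""
    return float(np.sqrt(np.mean((upper - lower) ** 2)) / target_range)


def cwc(t, lower, upper, target_range, delta=0.90, eta=50.0):
    """Coverage width-based criterion (assumed form, see lead-in)."""
    p = picp(t, lower, upper)
    gamma = 0.0 if p >= delta else 1.0        # penalty only when coverage < delta
    penalty = 1.0 + gamma * np.exp(-eta * (p - delta))
    return float(pinrw(lower, upper, target_range) * penalty)
```

With η = 50, a coverage shortfall of only a few percentage points already multiplies the width term several times over, which is why intervals that miss δ are strongly discouraged during the heuristic search.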

Proposed Model Initialization Approach
In the traditional LUBE interval prediction model, the random input weight matrix and the search capacity of the heuristic algorithm significantly impact the final prediction performance. In this section, the proposed model initialization approach is introduced, including prediction interval initialization and input weight matrix initialization, as shown in Figure 2. The initial prediction interval $\{T^U, T^L\}_0$ is first obtained by the prediction interval width initialization method. The input weight matrix $\beta^T$ is then generated by the ELM-AE. The initial output weight matrix $w_0$ is finally obtained by training the LUBE prediction model based on the initial prediction interval and the input weight matrix.

Figure 2. The model initialization approach.

Prediction Interval Initialization
In order to initialize the interval width and estimate the initial prediction interval of the whole training dataset, $\{T^U, T^L\}_0$, cross-validation was utilized. In Figure 2, the training dataset $\{X, T\}$ is first divided for cross-validation. In each part, suppose $T = X\Phi + \mu$, $E(\mu) = 0$, and $\mathrm{Var}(\mu) = \sigma^2 I$. Then, the prediction error $e_0$ on a single future observation $\{X_0, T_0\}$ follows the normal distribution $e_0 \sim N\left(0, \sigma^2\left(1 + X_0 (X^T X)^{-1} X_0^T\right)\right)$, as shown in (8) and (9), from which the prediction interval is derived. The $PI_{mean}$ of $\{X, T\}$ is then calculated as the initial interval width, denoted as $B_0$. To guarantee that the expected prediction interval coverage probability $\phi$ is satisfied, $B_0$ is further adjusted through the binary search algorithm [18]. The actual value of the target $T$ and the initial interval width $B$ compose the initial prediction interval, $\{T^U, T^L\}_0$, as shown in (10). The details of the prediction interval width initialization are presented in Algorithm 1.
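A rough sketch of this width initialization follows. It is a simplified reading of the procedure under several assumptions that are not taken from the paper: the regression is fitted on the whole training set instead of cross-validation folds, a normal quantile is used for the interval, the bounds are placed symmetrically at T ± B/2, and coverage is measured on the bounds reproduced by a least-squares output layer over a hidden feature matrix `H`. All function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def lrie_initial_width(X, T, alpha=0.05):
    """Mean width B0 of the linear-regression prediction intervals on (X, T)."""
    Xb = np.column_stack([np.ones(len(X)), X])            # add an intercept column
    coef, *_ = np.linalg.lstsq(Xb, T, rcond=None)         # least-squares estimate of Phi
    resid = T - Xb @ coef
    dof = Xb.shape[0] - Xb.shape[1]
    sigma2 = float(resid @ resid) / dof                   # estimate of sigma^2
    xtx_inv = np.linalg.pinv(Xb.T @ Xb)
    leverage = np.einsum("ij,jk,ik->i", Xb, xtx_inv, Xb)  # X0 (X^T X)^-1 X0^T per row
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)           # normal quantile (assumption)
    half_width = z * np.sqrt(sigma2 * (1.0 + leverage))
    return float(np.mean(2.0 * half_width))


def fitted_picp(H, T, B):
    """Coverage of the bounds a least-squares output layer reproduces for width B."""
    targets = np.column_stack([T + B / 2.0, T - B / 2.0])  # assumed symmetric bounds
    out = H @ (np.linalg.pinv(H) @ targets)                # fitted upper/lower bounds
    return float(np.mean((T <= out[:, 0]) & (T >= out[:, 1])))


def adjust_width(H, T, B0, phi=0.95, n_iter=40):
    """Binary search for a small width B whose fitted bounds reach coverage phi."""
    lo, hi = 0.0, B0
    for _ in range(60):                                    # grow hi until phi is reached
        if fitted_picp(H, T, hi) >= phi:
            break
        hi *= 2.0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if fitted_picp(H, T, mid) >= phi:
            hi = mid
        else:
            lo = mid
    return hi
```

The binary search relies on coverage growing with B, which holds in this sketch when `H` contains a constant (bias) column so that the fitted bounds shift by exactly ±B/2.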

Input Weight Matrix Initialization
In conventional training of the ELM model, the input weights are randomly generated. However, the random input weights influence the training of the output weights, which further impacts the prediction performance, especially when the model is trained through a heuristic algorithm.
The ELM-AE is capable of learning a useful feature representation [23], which can improve the generalization of the prediction model by projecting the input data into a different dimensional space [24]. The feature transformation reduces the unique differentiation of the specific input data.
In ELM-AE, the output data are the same as the input data, as shown in Figure 3. The output weight β represents the information transformation from the feature space to the input data. The steps of initializing the input weights of the ELM through ELM-AE are described in Algorithm 2.

Algorithm 2. Input weight matrix initialization through ELM-AE.
Input: The number of hidden layer nodes of ELM-AE, L.
Output: Input weight matrix of LUBE.
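One way to realize this initialization is sketched below. It follows the usual ELM-AE recipe (random hidden layer, ridge-regularized output weights that reconstruct the input, then the transpose used as input weights) and is not the authors' Algorithm 2. The sigmoid activation, the regularization value `C`, the omission of orthogonalized random weights, and the function name are assumptions.

```python
import numpy as np


def elm_ae_input_weights(X, n_hidden, C=512.0, seed=0):
    """Return an (n_features, n_hidden) input weight matrix learned by an ELM-AE."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    A = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))  # random encoder weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random encoder biases
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))                   # hidden features of X
    # Ridge-regularized least squares so that H @ beta reconstructs X
    # (the autoencoder target equals the input).
    gram = np.eye(n_hidden) / C + H.T @ H
    beta = np.linalg.solve(gram, H.T @ X)                    # shape (n_hidden, n_features)
    return beta.T                                            # beta^T as input weights
```

The returned matrix can replace the random input weights of the LUBE network, so that `X @ W` projects the inputs with weights learned from the data.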

Experiment and Results
The bi-hourly solar power data utilized in this paper were collected from a grid-connected photovoltaic (PV) system over two years, from 1 July 2010 to 16 June 2012. The PV system was installed on the rooftop of an academic building located on Coloane island, Macau. The two-year data recorded in real time by the environmental detector and the PV power monitor were employed to validate the methods. The data included the date, time, solar radiation, temperature, wind speed, and solar power. In the interval prediction model, the historical time series data of the solar power, $P_{t-2}$ and $P_{t-1}$, and the weather data were used as input variables to predict $P_t$. One-step-ahead prediction was carried out in this section. The majority of the data (70%) were regarded as the training dataset, while the rest formed the test dataset.
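The input construction described above can be illustrated as follows. This is a sketch under assumptions not stated in the paper: the weather columns are aligned with the target time step, the 70/30 split is chronological, and `build_dataset` is a hypothetical helper name.

```python
import numpy as np


def build_dataset(power, weather, train_frac=0.7):
    """Build one-step-ahead samples from lagged power and weather inputs."""
    power = np.asarray(power, dtype=float)
    weather = np.asarray(weather, dtype=float)
    # Each row for time t holds P_{t-2}, P_{t-1} and the weather at time t
    # (the weather alignment is an assumption of this sketch).
    X = np.column_stack([power[:-2], power[1:-1], weather[2:]])
    y = power[2:]                                  # target P_t
    n_train = int(train_frac * len(y))             # chronological split (assumption)
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])
```

A random partition could be used instead of the chronological one by shuffling the row indices before the split.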

Parameter Settings
To evaluate the proposed LUBE interval prediction model, several widely used heuristic algorithms, including PSO, DE, SA, and HS, were utilized. The PSO algorithm developed by Kennedy and Eberhart [25] has been applied to various fields for its strong convergence performance. The DE algorithm evolves the population by combining the evolution mechanism of the genetic algorithm with crossover and mutation operations, and it is suitable for non-differentiable problems, such as discrete ones [26]. SA can accept a worse solution to replace the current optimum through a probabilistic technique, which contributes to a high search capacity in a large solution space [27]. HS is a simple meta-heuristic algorithm inspired by the improvisation process of jazz musicians, which has been strongly criticized as a special case of the well-established evolution strategies algorithm [28].
The parameter settings of the four heuristic algorithms are shown in Table 1. In PSO, the inertia weight decreased linearly from 0.7 to 0.1 during the iteration process. In DE, the crossover constant decreased linearly from 0.3 to 0.1 as the iterations increased. In HS, the pitch adjusting rate and bandwidth descended linearly within the ranges of (0.05, 1) and (1, 50), respectively. These algorithms, with their different characteristics, require different maximum numbers of iterations to reach a satisfactory result. PSO, which is good at local search, can converge within fewer iterations. However, SA and HS, as global search algorithms, require more iterations to optimize intensively. Thus, the maximum numbers of iterations of the PSO, SA, HS, and DE algorithms were set to 500, 10,000, 2500, and 500, respectively.
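To make the linearly decreasing inertia weight concrete, a bare-bones PSO loop is sketched below. Only the 0.7 to 0.1 schedule comes from the text; the acceleration constants `c1` and `c2`, the swarm size, the search bounds, and the function name are illustrative assumptions, since Table 1 is not reproduced here.

```python
import numpy as np


def pso_minimize(cost, dim, n_particles=20, n_iter=200, w_start=0.7, w_end=0.1,
                 c1=1.5, c2=1.5, bound=5.0, seed=0):
    """Minimize cost over [-bound, bound]^dim with a linearly decreasing inertia."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-bound, bound, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([cost(p) for p in pos])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    history = [gbest_val]
    for k in range(n_iter):
        # Inertia weight decreases linearly from w_start to w_end.
        w = w_start + (w_end - w_start) * k / max(n_iter - 1, 1)
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, -bound, bound)
        vals = np.array([cost(p) for p in pos])
        improved = vals < pbest_val                # update personal bests
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:               # update global best
            gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
        history.append(gbest_val)
    return gbest, gbest_val, history
```

In the LUBE setting, `cost` would evaluate CWC for a flattened output weight matrix, and the initial positions would be seeded around the initialized weights instead of being drawn uniformly.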
In ELM, the number of hidden layer neurons and the tradeoff parameter C were set to 188 and 512, respectively, through point prediction and the 5-fold cross-validation technique.
Considering the slight difference between the training and test data, the δ of CWC was set to 93% for the training set and 90% for the test set. The η was selected as 50 to greatly penalize prediction intervals with a coverage probability lower than δ. In order to leave a certain margin for optimization and avoid being trapped in a local optimum, the expected PICP, ϕ, was set at 95%.
Experiments with different initialization approaches and heuristic algorithms were conducted. Each case was repeated five times to reduce the random influence of the dataset partitioning and heuristic algorithms. All experiments in this paper were implemented on a personal notebook computer with an i5-4210U CPU and 8 GB of memory.

Computational Results
In the experiments, the width initialization, point initialization, and random initialization approaches were abbreviated as WI, PI and RI. The terms w/ ELM-AE and w/o ELM-AE mean the initialization approach with ELM-AE and without ELM-AE, respectively.
Tables 2-5 summarize the average and worst values of the different cases w/ and w/o ELM-AE. Because of the randomness of the heuristic algorithm, each optimization run obtained a different result, so the average and worst cases together give a comprehensive picture of the performance and robustness of the algorithms. In Tables 2 and 3, the average case of HS for PI obtained a CWC of 49.36%, but the worst result was 66.4%. In Tables 4 and 5, the average case of SA for WI acquired a CWC of 67.86%, but the worst result was 145.29%, which was almost twice as much as the average case. The model combining PSO with WI w/ ELM-AE produced the best and the most stable prediction result among all the cases.
The training accuracy of WI and PI was similar in Table 2, but WI behaved more stably than PI on the test set. In general, WI was superior to PI and RI. RI performed the worst in all the cases.
Comparing Table 2 with Table 3, the initialization approach with ELM-AE was better than the one without ELM-AE. The ELM-AE can significantly improve the prediction performance of RI in both of the training and test datasets. In PI and WI, the CWC of the training set with ELM-AE was higher than the one without ELM-AE. However, the performance of the test set was the reverse. It is implied that the ELM-AE can reduce the over-fitting phenomenon in the training process, and improve the stability of the test set by impairing the random impact of initial weight.

Result Analysis 1-Initialization Approach
The prediction interval results employing different initialization approaches with ELM-AE and PSO are shown in Figures 4-6. It is clear that most actual power points can be covered in the interval due to the expected PICP equal to 0.93.
In the enlarged views of Figures 4 and 5, the predicted boundaries of both WI and PI can accurately trace the fluctuation of the power curve and perform similarly.
However, in the turning points, such as the 8th and 20th points in the left view and the 6th and 18th points in the right view, the predicted interval of WI was narrower than PI. Thus, the whole predicted interval of WI was more uniform than PI and its predicted result was better, which is in accordance with Table 2.
In Tables 2-5, for the average test result, the best PINRW for RI was 114.58%, while the worst result for PI and WI was 49.36%. The PINRW of RI was much larger than that of WI and PI. Thus, the predicted interval of RI tends to employ a universal upper and lower limit to cover as many points as possible, as shown in Figure 6, which offers no guidance.
The CWC convergence curves of PSO for different cases are shown as representative in Figure 7. It is apparent that the initial CWC values in RI were significantly larger than those of the non-random initialization approaches. The curves of RI almost converged at around 250 iterations, whereas WI and PI reached stable values at around 100 iterations. Besides, the converged value of RI was much larger than that of WI and PI. Thus, it is concluded that RI is not a good choice for LUBE initialization.

Result Analysis 2-ELM-AE
From Figures 8 and 9, it is apparent that no matter whether or not WI and PI utilized ELM-AE, their performances were generally close. In the enlarged views, the predicted interval of WI and PI w/o ELM-AE was narrower than the one with ELM-AE, especially for points at night. This is because the ELM-AE impaired the randomness impact of LUBE, which also reduced the diversity of the solutions and further impacted the evolution of the optimal solution. Thus, the initialization approach w/o ELM-AE had a higher chance of obtaining the global optimal solution than the one w/ ELM-AE, but it also caused unstable performance due to over-fitting.
When ELM-AE was not utilized in RI in Figure 10, the performance dropped drastically, resulting in the fluctuation range of the interval reaching ±200. Thus, the employment of ELM-AE can facilitate RI by reducing the divergence of the model.
To clearly explain the role of ELM-AE, the characteristics of the input weight matrix of the ELM in LUBE were analyzed in detail. The rank of the input weight matrix was not influenced. The mean absolute value of the input weight matrix decreased from 0.5014 to 0.1146 and the matrix sparsity dropped from 0.2451 to 0.1368 after adding ELM-AE. Thus, the ELM-AE plays the role of feature extraction and weakens the over-fitting of the trained model.
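As a quick check on the reported mean absolute value before ELM-AE (0.5014): if the random input weights are drawn uniformly from (−1, 1), which is the setting mentioned in the Conclusions and is treated here as an assumption, the expected absolute value of a single entry $w$ is

$$\mathbb{E}\,|w| = \int_{-1}^{1} |w| \cdot \frac{1}{2}\, dw = 2\int_{0}^{1} \frac{w}{2}\, dw = \frac{1}{2},$$

so a sample mean close to 0.5 is what one would expect for a randomly generated input weight matrix.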

Result Analysis 3-Heuristic Algorithm
To display the performances of different heuristic algorithms, the prediction intervals of the WI w/ ELM-AE model optimized by SA and HS are shown in Figures 11 and 12. It is obvious that the lower bounds in Figures 11 and 12 are lower than the one in Figure 4, resulting in a wider prediction interval. PSO performed the best among all the heuristic algorithms. In theory, the SA, HS, and DE algorithms have a better global search capacity than PSO. However, in the case of the LUBE prediction interval, their evolutionary efficiencies were too low to obtain a good result in the limited computational time. In Tables 2 and 4, the prediction results of HS and DE are the same for WI. This is because their optimal solutions, obtained in the initialization, stayed the same during the whole process due to their low evolutionary efficiency.
Figures 13 and 14 display the predicted intervals optimized by SA and HS based on WI w/o ELM-AE. Combined with Tables 3 and 5, the trained LUBE prediction model displayed obvious over-fitting. PSO had the most serious over-fitting among all the heuristic algorithms due to its good capacity for solving optimization problems.
The iteration time of the various heuristic algorithms is another factor affecting the model performance, especially for online prediction. Average computational times for different heuristic algorithms and initialization approaches are shown in Tables 6 and 7. The training time is directly determined by the number of cost function evaluations. The numbers of evaluations for PSO, SA, HS, and DE were 50,000, 50,000, 62,500, and 50,000, respectively. The running times of PSO and DE were close, and SA cost the most computational time. Comparing Table 6 with Table 7, it is obvious that the experiments without ELM-AE ran longer than the ones with ELM-AE in all cases. This is because the ELM-AE makes the input weight matrix of the LUBE sparse, which reduces the computational load and cuts down the time.

Conclusions
Renewable energy generation forecasting technology helps to decrease the uncertainty and randomness of renewable resources and can provide essential reference information for the scheduling and operation of the power system. Interval prediction with a statistical confidence level is well suited to quantifying the uncertainties of the forecast power. This paper proposed a new LUBE interval prediction framework based on the point prediction technology of ELM. The ELM-AE was employed to generate the input weight matrix β^T; the PI width initialization approach then produced the initial output weight matrix w_0, which satisfies the preset PICP. Finally, the output weights of the ELM were further optimized through a heuristic algorithm. Four algorithms, PSO, DE, SA, and HS, were implemented to verify the performance of the proposed mechanism. Different experimental settings were combined into contrast experiments to validate and analyze their impacts on the model performance.
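As an illustration of the first step summarized above, the following is a minimal sketch of how an ELM auto-encoder can produce an input weight matrix. It is not the authors' implementation: the function name `elm_ae_input_weights`, the parameters `n_hidden` and `reg`, the sigmoid activation, and the uniform random initialization are assumptions made for this example, and the orthogonalization of the random weights that is often applied in ELM-AE is omitted for brevity.

```python
import numpy as np

def elm_ae_input_weights(X, n_hidden, reg=1e3, seed=0):
    """Sketch of an ELM auto-encoder (hypothetical helper, not the paper's code).

    X        : (n_samples, n_features) input data
    n_hidden : number of hidden nodes
    reg      : regularization constant C of the ridge solution (assumed value)
    Returns an (n_features, n_hidden) matrix usable as input weights.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]

    # Random, untrained hidden layer of the auto-encoder.
    A = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))  # (n_samples, n_hidden)

    # Ridge-regularized least squares: beta maps H back to X,
    # i.e. the auto-encoder target is the input itself.
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / reg, H.T @ X)

    # beta has shape (n_hidden, n_features); its transpose plays the
    # role of the input weight matrix of the prediction network.
    return beta.T
```

With the returned matrix `W`, the hidden-layer input of the prediction network would be `X @ W`, so only the output weights remain to be initialized and then tuned by the heuristic algorithm.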
The prediction performance of WI was generally slightly superior to that of PI. At some power curve turning points, WI could constrain the prediction interval more reasonably and avoid a large prediction margin. The simulation experiments revealed that ELM-AE could significantly decrease the matrix sparsity and the mean absolute value of the input weight matrix, both of which are statistically equal to 0.5 when the matrix is randomly generated from a uniform distribution on (−1, 1). The over-fitting of the learned model was weakened and the generalization ability of the model improved when using ELM-AE. The PSO algorithm achieved the best prediction performance among the four algorithms under various situations. The SA, HS, and DE algorithms performed poorly within the limited computational time, and the HS and DE algorithms could hardly optimize the output weight matrix any further. The performance of the model was also constrained by the limitations of the heuristic algorithms and was related to the algorithm parameters. However, the PSO resulted in the most severe over-fitting phenomenon for a sharp prediction interval. In general, the proposed LUBE model with the new model initialization approach could acquire a faithful prediction interval with more detailed optimization and stable generalization performance.
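The reference value of 0.5 quoted above for the mean absolute value of a randomly generated weight follows directly from the uniform distribution. For a single weight $w \sim U(-1, 1)$, whose density is $1/2$ on the interval, a short derivation gives the result below. It covers only the mean absolute value; the sparsity measure is the one defined in the experimental section and is not re-derived here.

```latex
\mathbb{E}\,|w|
  = \int_{-1}^{1} |w| \cdot \frac{1}{2}\, dw
  = 2 \int_{0}^{1} \frac{w}{2}\, dw
  = \int_{0}^{1} w \, dw
  = \frac{1}{2}
```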
Although the LUBE approach can forecast an interval that covers the solar power accurately, the width of the prediction intervals was the same at different times of day. However, the power value is zero at night, so the nighttime interval could be narrower. The mechanism of LUBE makes the interval width consistent across periods, which deserves improvement in further research. Common optimization techniques for neural networks, such as ensemble learning of multiple neural networks, can also be added to the prediction model framework to improve the learning performance. The evaluation fitness function transforms the original multi-objective problem into a single-objective problem for simplification. CWC could effectively guarantee the PICP of the prediction intervals, but the penalty term also restricts and interferes with the search for an optimal solution, which makes some feasible solutions unreachable. In the future, it is expected to explore a new evaluation mechanism that could systematically balance the coverage probability and the width of the prediction interval.
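To make the trade-off described above concrete, the sketch below computes the coverage probability, a normalized average width, and a coverage width-based criterion in the form commonly used in the LUBE literature. The exact CWC expression and hyperparameters used in this paper may differ; the function names and the default values `mu=0.9` and `eta=50.0` are assumptions chosen for illustration only.

```python
import numpy as np

def picp(y, lower, upper):
    """Fraction of targets that fall inside [lower, upper]."""
    return float(np.mean((y >= lower) & (y <= upper)))

def pinaw(y, lower, upper):
    """Average interval width normalized by the range of the targets."""
    return float(np.mean(upper - lower) / (np.max(y) - np.min(y)))

def cwc(y, lower, upper, mu=0.9, eta=50.0):
    """Coverage width-based criterion (commonly used form, assumed here).

    mu  : nominal confidence level (assumed value)
    eta : penalty strength for under-coverage (assumed value)
    """
    p = picp(y, lower, upper)
    w = pinaw(y, lower, upper)
    # The exponential penalty is active only when coverage is below mu.
    gamma = 1.0 if p < mu else 0.0
    return w * (1.0 + gamma * np.exp(-eta * (p - mu)))
```

Because the penalty grows exponentially as soon as the coverage drops below `mu`, a narrow interval with insufficient coverage scores far worse than a wider interval that meets the target, which is how a single-objective search ends up dominated by the coverage constraint.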

Conflicts of Interest:
The authors declare no conflict of interest.