A Short-Term Photovoltaic Power Forecasting Method Combining a Deep Learning Model with Trend Feature Extraction and Feature Selection

High-precision short-term photovoltaic (PV) power prediction can reduce the damage that large-scale PV grid connection causes to the power system. In this paper, a combined deep learning forecasting method based on variational mode decomposition (VMD), a fast correlation-based filter (FCBF) and a bidirectional long short-term memory (BiLSTM) network is developed to minimize PV power forecasting error. In this model, VMD is used to extract the trend feature of PV power, and FCBF is then adopted to select the optimal input-set, which reduces the forecasting error caused by redundant features. Finally, the input-set is fed into the BiLSTM network for training and testing. The performance of this model is tested in a case study using the public data-set provided by a PV station in Australia. Comparisons with common short-term PV power forecasting models are also presented. The results show that, with trend feature extraction and feature selection, the proposed methodology provides more stable and accurate forecasts than the other forecasting models.


Introduction
World energy demand has increased steadily over the years. Today's principal energy resources are oil and coal, but fossil fuel supplies are finite and there is strong evidence of their negative environmental impact [1]. In addition to being non-renewable, fossil fuel combustion causes serious environmental pollution, which further leads to greenhouse gas emissions, acid rain and ozone depletion [2]. Therefore, there is an urgent need to seek alternative sources of energy that are cleaner and more sustainable. Solar energy is a clean and renewable energy source. With the growing global demand for clean energy, photovoltaic (PV) power will play an important role. In past decades, PV power has attracted more and more attention [3,4]. PV power brings notable environmental and economic benefits. However, PV power is uncertain and intermittent, because the output power of a PV system depends on many random factors, including global horizontal radiation, temperature, wind, and system components with their own non-linear behavior, among others [5]. As a result, large-scale integration of PV into the power grid brings many new risks to the operation of the existing power grid. Increasing the precision of PV power prediction is an effective way to address these challenges.
There are various methods to predict the output of PV power. According to the prediction process, short-term PV prediction is divided into direct prediction and indirect prediction [6]. Ref. [7] describes the use of Feature Attention Deep Forecasting (FADF) in a deep neural network to generate global horizontal irradiance forecasts. However, this kind of FADF based on a neural network is very sensitive to the disturbance of training data,

1. An effective trend feature extraction method is developed to extract the trend feature of PV power;
2. Trend feature, meteorological data and historical PV power data are used to select the optimal input feature by a FCBF algorithm;
3. A BiLSTM model is adopted to predict PV power with high accuracy;
4. The proposed model is compared with different PV forecasting models.
The rest of this paper is organized as follows. Section 2 presents the framework of the proposed method. Section 3 describes the methodology. A case study and comparative analysis are provided and discussed in Section 4. The conclusion is given in Section 5.

Framework of the Proposed Methodology
The hybrid model combines trend feature extraction and feature selection with BiLSTM for day-ahead PV power forecasting, and the proposed methodology was employed to predict PV power in Australia. Figure 1 illustrates the general framework of the proposed methodology, and the procedure is as follows:
1. Standard normalization and data processing of the original data are required before performing trend feature extraction.
2. The processed data are decomposed into multiple intrinsic mode functions (IMFs) by VMD to extract the trend feature that can reflect the short-term effect of PV power.
3. The optimal feature-sets of the trend feature and the original data are selected by FCBF, and are then integrated as a new input-matrix.
4. Finally, the optimal input-set is used in the standard BiLSTM model with a 1-D CNN layer to forecast the PV power.

Variational Mode Decomposition (VMD)
The VMD algorithm considers that successive data are composed of sub-series with different frequencies. The essence of the VMD algorithm is to decompose the successive data into sub-modes that have different center frequencies [31]. PV power may be composed of a trend feature and noise, so the trend feature can be extracted from the PV power by using the VMD algorithm. The flow of the VMD algorithm consisted of the following steps.
Step 1: The number of modes (K) to be decomposed needs to be preset, and then the original PV power $x(t)$ is decomposed into K modes $u_k$ ($k = 1, 2, ..., K$). The analytic signal of each mode is obtained by the Hilbert transform. By mixing with an exponential term tuned to the estimated center frequency, the spectrum of each intrinsic mode function (IMF) is shifted to the baseband. The bandwidth of each IMF is estimated by calculating the squared L2 norm of the gradient of the demodulated signal. The expression is as follows:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = x(t) \qquad (1)$$

where K is the number of modes to be decomposed, $u_k$ is the k-th decomposed IMF of the original data, $\omega_k$ is the k-th center frequency, $\partial_t$ is the partial derivative with respect to time t, $\delta(t)$ is the Dirac distribution, j is the imaginary unit, and "*" denotes convolution. In the constraint of Equation (1), $x(t)$ represents the original data to be decomposed. In addition, $\{u_k\} = \{u_1, u_2, ..., u_K\}$ is the collection of all modes, $\{\omega_k\} = \{\omega_1, \omega_2, ..., \omega_K\}$ is the set of center frequencies of the modes, and $\sum_k := \sum_{k=1}^{K}$ denotes summation over all modes.
Step 2: Using a quadratic penalty term and a Lagrange multiplier, the constrained problem is transformed into an unconstrained problem. The quadratic penalty term is used to ensure that the key information of the PV data is not lost in the decomposition process. The purpose of the Lagrange multiplier is to enforce the constraint strictly.
The augmented Lagrangian is as follows:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| x(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\; x(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \qquad (2)$$

where $\alpha$ is the quadratic penalty factor and $\lambda(t)$ is the Lagrange multiplier. The alternating direction method of multipliers (ADMM) is used to update $u_k^{n+1}$, $\omega_k^{n+1}$ and $\lambda^{n+1}$ alternately, so that the minimization problem in Equation (1) is transformed into finding the saddle point of the augmented Lagrangian in Equation (2) through a sequence of iterative sub-optimizations. The update of $u_k^{n+1}$ can be expressed as follows:

$$u_k^{n+1} = \arg\min_{u_k} \left\{ \alpha \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| x(t) - \sum_{i} u_i(t) + \frac{\lambda(t)}{2} \right\|_2^2 \right\} \qquad (3)$$

where $\omega_k$ is shorthand for $\omega_k^{n+1}$, and $\sum_i u_i(t)$ is shorthand for $\sum_{i \neq k} u_i^{n+1}(t)$.
Step 3: By using the Parseval/Plancherel Fourier isometry, Equation (3) is transformed from the time domain to the frequency domain, and the expression of each mode in the frequency domain is as follows:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{x}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \frac{\hat{\lambda}(\omega)}{2}}{1 + 2\alpha(\omega - \omega_k)^2} \qquad (4)$$

To obtain the updated center frequency $\omega_k^{n+1}$ of each mode $u_k^{n+1}$, it is also necessary to transform the center-frequency problem to the frequency domain. The expression is as follows:

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega} \qquad (5)$$

In Equation (5), $\omega_k^{n+1}$ is the center frequency of the k-th IMF. The mode $\hat{u}_k^{n+1}(\omega)$ in Equation (4) is equivalent to a Wiener filtering of the current residual $\hat{x}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega)$; this Wiener filter structure updates the mode directly in the frequency domain. Equation (6) is used for updating the Lagrange multiplier $\hat{\lambda}^{n+1}(\omega)$:

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{x}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right) \qquad (6)$$

where $\tau$ is the update step of the multiplier. Figure 2 shows the flowchart of the VMD algorithm, in which the iteration stops once the relative change of the modes falls below a tolerance $\varepsilon > 0$.
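The update loop described in Steps 1 to 3 can be sketched in a few lines of NumPy. This is a simplified illustration rather than the implementation used in the experiments: the function name `vmd`, the default values of `alpha`, `tau`, `tol` and `max_iter`, the evenly spread initial center frequencies, and the use of a one-sided spectrum without boundary mirroring are all assumptions made for the sketch.

```python
import numpy as np

def vmd(x, K=2, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD of a real 1-D signal (assumption: no boundary mirroring)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    freqs = np.fft.rfftfreq(n)            # normalized frequencies in [0, 0.5]
    x_hat = np.fft.rfft(x)                # one-sided spectrum of the signal
    u_hat = np.zeros((K, freqs.size), dtype=complex)
    omega = 0.5 * np.arange(K) / K        # assumed initial center frequencies
    lam_hat = np.zeros(freqs.size, dtype=complex)

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Spectrum of all other modes; the residual is the Wiener-filter input.
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (x_hat - others + lam_hat / 2) / (
                1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Center frequency: power-weighted mean frequency of the mode.
            power = np.abs(u_hat[k]) ** 2
            if power.sum() > 0:
                omega[k] = (freqs * power).sum() / power.sum()
        # Multiplier update; tau = 0 leaves the multiplier at zero.
        lam_hat = lam_hat + tau * (x_hat - u_hat.sum(axis=0))
        # Stop when the summed relative change of the modes is below tol.
        num = (np.abs(u_hat - u_prev) ** 2).sum(axis=1)
        den = (np.abs(u_prev) ** 2).sum(axis=1) + 1e-12
        if (num / den).sum() < tol:
            break

    modes = np.fft.irfft(u_hat, n=n, axis=1)   # back to the time domain
    order = np.argsort(omega)                  # lowest-frequency mode first
    return modes[order], omega[order]
```

With `K=2`, the first returned mode is the low-frequency component, which plays the role of the trend feature, and the second collects the higher-frequency content.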

Fast Correlation-Based Filter (FCBF)
The FCBF algorithm takes symmetrical uncertainty (SU) as the measurement index to decide whether a feature is related to the target data or whether it is a redundant feature. Assume that the target quantity is $Y = [y_1, y_2, ..., y_j]$ and that the historical PV power and meteorological data are $X = [x_1, x_2, ..., x_i]$. The information entropy, conditional entropy and mutual information are:

$$H(X) = -\sum_{i} P(x_i) \log_2 P(x_i)$$

$$H(X|Y) = -\sum_{j} P(y_j) \sum_{i} P(x_i|y_j) \log_2 P(x_i|y_j)$$

$$I(X;Y) = H(X) - H(X|Y)$$

where $H(X)$ is the information entropy of variable X, $H(X|Y)$ is the conditional entropy of X given Y, and $I(X;Y)$ is the mutual information, that is, the decrease in the information entropy of X after the random variable Y is observed. $P(x_i)$ and $P(y_j)$ are the probabilities of $X = x_i$ and $Y = y_j$, respectively, and $P(x_i|y_j)$ is the conditional probability of $X = x_i$ given $Y = y_j$. However, the size of $I(X;Y)$ is affected by the variables, values and units. In order to eliminate this influence, the normalized mutual information SU is used to represent the correlation between variables X and Y. SU is defined by Equation (9):

$$SU(X,Y) = \frac{2\, I(X;Y)}{H(X) + H(Y)} \qquad (9)$$

It can be seen from Equation (9) that the value of $SU(X,Y)$ lies in [0, 1]. The greater the value, the greater the correlation between the two random variables. When $SU(X,Y) = 0$, the two random variables X and Y are not related. On the contrary, when $SU(X,Y) = 1$, the random variables X and Y are completely correlated.
Energies 2022, 15, 5410
The steps of feature selection of original data using the FCBF algorithm are as follows.

1. Delete the features that are weakly relevant to the target variable. Take the i-th feature ($v_i$) in the original data as variable X and the target variable Y as category C. Calculate $SU(v_i, Y)$ between each input feature and Y. If $SU(v_i, Y) < \xi$ ($\xi$ is a threshold), delete the variable $v_i$; otherwise, put the retained feature variable into the set G, whose initial state is empty.
2. Analyze redundancy between feature variables. The feature variables in set G are arranged in descending order of $SU(v_i, Y)$. Take the feature variable $v_i$ with the largest correlation and put it into set Q, whose initial state is empty, and then calculate $SU(v_i, v_j)$ between $v_i$ and each remaining feature variable $v_j$ in set G.
Return to step 2 and repeat the operation to finally obtain the optimized input feature set Q.
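The selection procedure can be illustrated with a small pure-Python sketch. Several parts are assumptions rather than details given in the text: the redundancy rule (a feature $v_j$ is dropped when $SU(v_i, v_j) \geq SU(v_j, Y)$) is the criterion of the standard FCBF algorithm, the equal-width `discretize` helper is only one possible way to turn continuous measurements into discrete symbols, and the function names and the default `threshold` are hypothetical. The mutual information is computed through the equivalent identity $I(X;Y) = H(X) + H(Y) - H(X,Y)$.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y))."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)

def discretize(values, bins=10):
    """Equal-width binning (an assumption; SU needs discrete symbols)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]

def fcbf(features, target, threshold=0.1):
    """features: dict of name -> discrete sequence. Returns the selected names (set Q)."""
    relevance = {name: symmetrical_uncertainty(col, target)
                 for name, col in features.items()}
    # Step 1: keep features with SU(v_i, Y) >= threshold, most relevant first (set G).
    g = sorted((n for n in relevance if relevance[n] >= threshold),
               key=lambda n: -relevance[n])
    q = []
    # Step 2: move the most relevant feature to Q, then drop the features in G
    # that are redundant to it (standard FCBF rule, assumed here).
    while g:
        best = g.pop(0)
        q.append(best)
        g = [n for n in g
             if symmetrical_uncertainty(features[best], features[n]) < relevance[n]]
    return q
```

In this sketch a feature that merely duplicates an already selected one is removed, while a feature that carries no information about the target never enters set G.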

Improved BiLSTM Short-Term PV Power Forecasting Model
Long short-term memory (LSTM) greatly improves the model's ability to store historical data by adding storage units and gate mechanisms. The LSTM model consists of a series of identical modules [32]. LSTM is mainly composed of three gate structures: the input gate, the memory (forget) gate and the output gate. The input gate is used to input the historical PV power, the memory gate retains the useful information of historical data and filters out irrelevant information, and the output gate controls the predicted PV power [33].
The bidirectional long short-term memory (BiLSTM) model is a variant of LSTM. It combines an LSTM network that processes the data from the beginning to the end with an LSTM network that processes the data from the end to the beginning. The biggest difference between BiLSTM and LSTM is that the hidden state $H_t$ of BiLSTM at time t includes both the forward hidden state $\overrightarrow{h}_t$ and the backward hidden state $\overleftarrow{h}_t$:

$$\overrightarrow{h}_t = \overrightarrow{\mathrm{LSTM}}(x_t, c_{t-1}), \qquad \overleftarrow{h}_t = \overleftarrow{\mathrm{LSTM}}(x_t, c_{t+1}), \qquad H_t = [\overrightarrow{h}_t, \overleftarrow{h}_t]$$

where $x_t$ is the input at time t, $c_{t-1}$ is the memory component at time $t - 1$, and $c_{t+1}$ is the memory component at time $t + 1$. The structures of BiLSTM and LSTM are presented in Figure 3. The improved BiLSTM adds a convolutional (CNN) layer to the standard BiLSTM. The newly added CNN layer can select the characteristics of the input data, reduce the redundancy of the input features to a certain extent, and improve the accuracy of short-term PV forecasting.
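To make the bidirectional structure concrete, the following NumPy sketch runs a plain LSTM cell over a sequence once forwards and once backwards and concatenates the two hidden states at each time step. It is a forward pass only, without training and without the CNN layer, and it is not the network used in this study: the gate ordering inside the stacked weight matrices, the helper `init_params`, and the weight scale are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(d_in, d_h, rng):
    """Hypothetical random weights: W (4*d_h, d_in), U (4*d_h, d_h), bias b."""
    return (0.5 * rng.normal(size=(4 * d_h, d_in)),
            0.5 * rng.normal(size=(4 * d_h, d_h)),
            np.zeros(4 * d_h))

def lstm_pass(xs, W, U, b, reverse=False):
    """Run one LSTM direction over xs of shape (T, d_in); return states (T, d_h)."""
    d_h = U.shape[1]
    h, c = np.zeros(d_h), np.zeros(d_h)
    steps = range(len(xs) - 1, -1, -1) if reverse else range(len(xs))
    out = np.zeros((len(xs), d_h))
    for t in steps:
        z = W @ xs[t] + U @ h + b              # stacked pre-activations
        i, f, o = (sigmoid(z[k * d_h:(k + 1) * d_h]) for k in range(3))
        g = np.tanh(z[3 * d_h:])               # candidate memory
        c = f * c + i * g                      # memory component update
        h = o * np.tanh(c)                     # hidden state of this direction
        out[t] = h
    return out

def bilstm(xs, fwd_params, bwd_params):
    """H_t is the concatenation of the forward and backward hidden states."""
    forward = lstm_pass(xs, *fwd_params)
    backward = lstm_pass(xs, *bwd_params, reverse=True)
    return np.concatenate([forward, backward], axis=1)
```

The forward half of $H_t$ depends only on inputs up to time t, whereas the backward half depends on inputs from time t onwards, which is why the concatenated state can use context from both sides of the sequence.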

Data-Set
The DKASC Alice Springs PV system data were selected for experiments in this study [34]. There were 288 observations every day (time interval of 5 min). The data included active power (kW), wind speed (m/s), weather temperature (°C), weather relative humidity (%), global horizontal radiation (W/m² × sr), and diffuse horizontal radiation (W/m² × sr), among others, with 10 dimensions of data in total. The cases used in this paper can comprehensively and systematically evaluate the validity and usefulness of the proposed methodology. PV power has a certain regularity, and the prediction model needs to learn from the historical PV power data to find its trend feature. The data-sets included the real PV power from December 2021 to February 2022 and from June 2021 to August 2021. The data-set was continuous but contained some missing values. Before conducting experiments, the Lagrange interpolation method was used to process the original data and fill the missing values. The original data and processed data in February are shown in Figures 4 and 5. Because of the large amount of data, only the February data are shown as an example of the processed result.
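As an illustration of the gap-filling step, the sketch below fills each missing sample with the value of the Lagrange polynomial through its nearest valid neighbours. The text does not state how many neighbours were used, so the window of `k` points on each side, the use of `None` to mark missing samples, and the function names are assumptions.

```python
def lagrange_value(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total

def fill_missing(series, k=2):
    """Fill None entries using up to k valid neighbours on each side (assumed window)."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for idx, value in enumerate(series):
        if value is not None:
            continue
        left = [i for i in known if i < idx][-k:]
        right = [i for i in known if i > idx][:k]
        nodes = left + right
        if nodes:  # leave the gap untouched if no valid sample exists
            filled[idx] = lagrange_value(nodes, [series[i] for i in nodes], idx)
    return filled
```

Keeping the window small is a deliberate choice in this sketch, since high-degree Lagrange polynomials tend to oscillate between widely spaced nodes.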

Two sets of experiments were carried out to show the performance of the proposed model in both summer and winter, representing 1-day ahead and 3-day ahead short-term PV power forecasting. In both the 1-day ahead and the 3-day ahead experiments, the training data accounted for 80% of the data-set, of which the training-set accounted for 80% and the verification-set accounted for 20%; the rest of the data-set was the testing data.
Before carrying out the experiments, the original data were normalized. The equation for normalization is as follows:

$$x_g = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (13)$$

where $x$ is the original PV power data, $x_{\max}$ and $x_{\min}$ are the maximum and minimum of the data, and $x_g$ is the normalized data. From Equation (13), $x_g$ lies in the interval [0, 1].
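A minimal sketch of the min-max normalization and of the 80/20 partitioning is given below. The function names are hypothetical, and splitting the samples in chronological order is an assumption; the text only gives the proportions.

```python
def min_max_normalize(values):
    """Map values to [0, 1] using the minimum and maximum of the data."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant series: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def chronological_split(samples, fit_ratio=0.8, train_ratio=0.8):
    """Return (training, verification, testing) parts in time order (assumed)."""
    n_fit = round(len(samples) * fit_ratio)   # 80% of the data for model fitting
    n_train = round(n_fit * train_ratio)      # 80% of that part for training
    return samples[:n_train], samples[n_train:n_fit], samples[n_fit:]
```

For 100 samples this yields 64 training, 16 verification and 20 testing samples.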

Evaluation Criteria
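In order to evaluate the prediction performance of the fusion model for PV power prediction, the mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE) were applied as evaluation indexes, which can be described as:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}$$

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{\hat{y}_i} \right| \times 100\%$$

where $\hat{y}_i$ is the actual value of PV data, $y_i$ is the predicted value, and n is the number of testing points.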
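The three indexes can be computed with a few lines of Python. One detail is an assumption: PV power is zero at night, which makes the percentage error undefined, so this sketch skips points whose actual value is zero when computing MAPE; the text does not say how such points were treated.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error in percent; zero actual values are skipped (assumption)."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)
```

For example, actual values of 100, 200 and 400 kW with predictions of 110, 190 and 400 kW give absolute errors of 10, 10 and 0 kW and relative errors of 10%, 5% and 0%.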

Comparison of Different Trend Feature Extraction Models
Different trend feature extraction models were compared, taking the summer data-set as an example, to verify the effectiveness of variational mode decomposition (VMD). Repeated experiments showed that if the number of modes decomposed by VMD was too large, essential information in the intrinsic mode functions (IMFs) was lost, which manifested as a trend feature that did not fit the changes in the original data. When the number of decomposition modes was 2, the trend feature extraction performance of the VMD model was best. It can be concluded that VMD can accurately separate noise and trend variables without losing significant information of photovoltaic (PV) power. The original data covered a wide time range of three months of PV power. In order to reveal the extraction effect of the trend feature more intuitively, Figures 6-8 show the original data in three groups. The actual extraction process was carried out with continuous original data. Since the research direction of this paper was short-term PV power forecasting (time interval of 5 min), the trend feature should be extracted from recent PV power. After many experiments, the trend feature selected from the PV power in the same season was found to be most suitable. All IMFs decomposed from the original data by VMD, and their comparison with the original data, are shown in Figures 6 and 7. Figure 6 shows that, in addition to the IMF that reflects the trend of PV power, the original data also contain a high-frequency IMF (IMF1). IMF1 may be generated by the equipment measuring PV power during the collection process, and it affects the performance of the prediction model. From the comparison between each IMF and the original data in Figure 7, it can be seen that IMF1 has almost no relationship with the change of the original data, whereas IMF2 closely follows the pattern of the original data. Therefore, IMF2 was selected as the trend feature.

The VMD model was compared with more common trend extraction algorithms, such as empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD). These models took the real PV power of a PV power station in Australia as the input, and the trend feature extraction effect of each model was checked by observing the fit between the trend feature and the true data. Symmetrical uncertainty (SU) was also used to measure the correlation between the trend feature and the true values. The comparison results between the trend variables extracted by each model and the original data are shown in Figure 8. The operation time and SU of each model are given in Table 1. From Figure 8 and Table 1, the trend feature extracted by VMD fit the PV power best, VMD ran fastest, and its extracted trend feature had the highest correlation with the original data. The trend extracted by EEMD was good and almost consistent with the change of PV power; however, its fit in amplitude was not as good as that of VMD. The VMD model used in this paper fitted the change of the original data in both amplitude and tendency.

Selection of the Models' Input-Set
During the experiments, this study used 17 feature dimensions, which included various meteorological information, historical photovoltaic (PV) power, and trend features. The target value was the PV power on the prediction day, and the fast correlation-based filter (FCBF) algorithm was used to find the most appropriate input-set for all short-term PV power forecasting models. Based on experience and guided by the prediction error, the optimal input-set selected by the FCBF algorithm is shown in Table 2.

Comparison and Analysis of Prediction Results
Different short-term photovoltaic (PV) power forecasting models were introduced to predict PV power over different horizons. The improved bidirectional long short-term memory (BiLSTM) model was compared with the extreme learning machine (ELM), BiLSTM, long short-term memory (LSTM) and gated recurrent unit (GRU) models at 1-day ahead and 3-day ahead horizons for short-term PV power prediction. All experiments were repeated more than 10 times, and the results were averaged over the repeated runs.
The historical data of PV power between June 2021 and August 2021 (winter), and between December 2021 and February 2022 (summer), were used to predict daily PV power 1-day ahead (288 points in total) and 3-days ahead (288 × 3 points in total), respectively. Short-term PV power prediction results of the different models are shown in Figure 9. According to the evaluation measures of mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE), the short-term PV power prediction results of the different forecasting models are reported in Tables 3 and 4 and in Figure 10. From the experimental results, the 1-day ahead and 3-day ahead prediction results of the proposed methodology were the closest to the true values in both summer and winter. The comprehensive error index analysis results in Tables 3 and 4 and Figure 10 show that the MAE, RMSE and MAPE of the improved BiLSTM model are the lowest, indicating that the prediction performance of this model was the best among the five forecasting models. In summer, when trend feature extraction was added to the forecasting model, the MAE, RMSE and MAPE of the model were reduced by 7.69%, 13.0% and 19.0%, respectively, for 1-day ahead forecasting, and by 11.1%, 24.2% and 22.23%, respectively, for 3-day ahead forecasting. When feature selection was added to the model, the MAE, RMSE and MAPE of the model were reduced by 13.3%, 28.6% and 1.63%, respectively, for 1-day ahead forecasting, and by 15.8%, 28.6% and 25.95%, respectively, for 3-day ahead forecasting. In winter, the model performed even better, and the MAPE of 1-day ahead prediction reached 5.21%.
The prediction precision in winter was better than that in summer, which may be because the weather changes less intensely in winter than in summer and the PV power feature is more stable, leading to higher accuracy. Figure 9 shows that the error between the predicted value and the true value is large in high-radiation situations. This may be related to the radiation-diffusion-limited feature, which tends to be stable at the peak of PV power, resulting in predicted values that are lower than the true values.

The experiments were conducted with Python 3.9 on a personal computer with a 64-bit operating system, 16.00 GB of RAM, and an Intel(R) Core(TM) i7-10510U CPU @ 1.80 GHz 2.30 GHz. The performance of the improved BiLSTM model based on trend feature extraction and FCBF feature selection was improved, as reflected mainly in the MAE, RMSE and MAPE of the forecasting results in summer and winter. Even when the improved BiLSTM model did not carry out feature selection and trend feature extraction, its prediction error was still the lowest among all models in summer, indicating that the model is superior and promising for short-term PV power prediction. When feature selection was added to the model, the accuracy of both 1-day ahead and 3-day ahead prediction was further improved, which shows that feature selection cannot be ignored before prediction. In the experiment concerning the model's trend feature input, the addition of the trend feature greatly improved the 3-day ahead prediction results. The trend feature captures the current effects of PV power change in the region, which reduces the influence of outliers on the prediction model and effectively improves the performance and accuracy of PV power forecasting. In general, the trend feature can account for the timing characteristics of PV power, and plays a great role in multi-day ahead PV power forecasting.

Conclusions
This article proposes an improved bidirectional long short-term memory (BiLSTM) model combined with trend feature extraction and feature selection for accurate short-term photovoltaic (PV) power forecasting. First, the original PV data are decomposed by variational mode decomposition (VMD) to extract the trend feature of the PV power. Second, the optimal feature set is selected from the trend feature, historical PV data and meteorological factors by a fast correlation-based filter (FCBF) algorithm, which lays a foundation for accurate short-term PV power prediction. Finally, the optimal feature set is applied to the improved BiLSTM model for training and testing. The proposed method was compared with commonly used short-term forecasting models, such as BiLSTM, LSTM, GRU and ELM.
As illustrated by the simulation results in Tables 1-4 and Figures 6-10, the main conclusions are as follows:
1. The collection process of PV power is complex, and the problem of outliers cannot be avoided. Therefore, the extraction of the trend feature with the VMD algorithm can reduce the prediction error caused by outliers and help the prediction model to fully learn the long-term time characteristics and short-term effects on PV power in this region.
2. Using the FCBF algorithm to extract the optimal input feature set of the prediction model reduces the error caused by the high dimension of input variables, which is conducive to improving the performance and efficiency of the prediction model.
3. Compared with the commonly used short-term PV power forecasting models, the improved BiLSTM forecasting model can selectively combine historical information and trend information for PV power forecasting because of its unique structure.
In summary, this study starts from the time characteristics of PV data and considers the influence of relevant characteristics on the prediction results. Trend feature extraction and feature selection can provide high-quality data for the follow-up training of the prediction model, preventing the prediction model from learning from outliers and improving the prediction accuracy. With its special structure, the BiLSTM model can better extract long-term information in PV power. The performance of the proposed forecasting methodology is relatively stable; it can provide an important reference for power grid planning and stable operation, and can assist technicians in arranging a reasonable dispatching plan for the power grid.
There are still many aspects that require further research.
(1) A symmetrical loss function is used in this model, and the influence of an asymmetric loss function on the prediction performance of the model is not considered. Next, we will conduct in-depth research in this direction.
(2) An FCBF algorithm is used to extract the best feature set for PV prediction. In the FCBF algorithm, it is necessary to set the threshold of symmetrical uncertainty (SU) to extract features. A trial-and-error method was used to select the appropriate threshold in this paper. Next, we will try to combine an optimization algorithm with the FCBF algorithm to select the optimal feature-set effectively.