Short-Term Prediction of PV Power Based on Combined Modal Decomposition and NARX-LSTM-LightGBM

Abstract: Recently, solar energy has been gaining attention as one of the most promising renewable energy sources. Accurate PV power prediction models can mitigate the impact that the non-linearity and randomness of PV power generation have on the power system, and they play a crucial role in the operation and scheduling of power plants. This paper proposes a novel machine learning network framework to predict short-term PV power in a time-series manner. A prediction model (NARX-LSTM-LightGBM) combining a nonlinear auto-regressive neural network with exogenous input (NARX), a long short-term memory (LSTM) neural network, and a light gradient boosting machine (LightGBM) was constructed based on combined modal decomposition. Specifically, this paper uses a dataset that includes ambient temperature, irradiance, inverter temperature, module temperature, etc. Firstly, the feature variables highly correlated with PV power were selected by Pearson correlation analysis. Furthermore, the PV power is decomposed into a new feature matrix by empirical mode decomposition (EMD), ensemble empirical mode decomposition (EEMD) and complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), i.e., the combination decomposition (CD), which deeply explores the intrinsic connection of the PV power historical series information and reduces the non-smoothness of PV power. Finally, preliminary PV power prediction values and an error correction vector are obtained by NARX prediction. Both are embedded into the NARX-LSTM-LightGBM model for PV power prediction, and then the error inverse method is used for weighted optimization to improve the accuracy of the PV power prediction. The experiments were conducted with measured data from Andre Agassi College, USA. The root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) of the model under different weather conditions were lower than 1.665 kW, 0.892 kW and 0.211, respectively, which is better than the prediction results of the other models and demonstrates the effectiveness of the model.


Introduction
Solar power generation is safe and reliable and is not affected by energy crises or fuel market instability. Photovoltaic power generation is one of the most important forms of solar power generation among the many renewable energy sources because of its cleanliness, low cost, high efficiency, abundant reserves and low maintenance. Moreover, solar energy is the main renewable distributed energy source used to generate electricity worldwide. Italy reports that grid-connected electricity generation through solar PV power generation is 10 GW, the highest in the world. Thus, whether from the perspective of protecting the Earth's environment or of the sustainable development of the Earth's resources, the installed capacity of photovoltaic power generation will see significant growth in the future. With the advancement of science and technology in recent years, PV power generation is growing quickly and accounting for an increasing share of power generation. PV systems play an important role, particularly in remote locations with large-scale PV power plants and in residential power systems in rural areas [1][2][3][4].
However, solar PV power generation depends on its input, which is essentially stochastic and depends on the solar irradiation intensity. PV electricity output ceases when the sun sets and the solar panels are not illuminated. Even during the day, the output of photovoltaic power can fluctuate greatly due to cloudy and rainy weather. PV grid connection has a substantial impact on the power system; therefore, in order to better utilize solar energy and vigorously develop PV power generation, accurate and reliable PV power prediction is necessary for power dispatch and allocation [5][6][7][8][9].
At present, there are several techniques for forecasting PV power, which are broadly categorized as conventional techniques and artificial-intelligence-based algorithms. Physical prediction techniques and time series prediction techniques are the two main traditional methodologies. The time series prediction method was used extensively in the early stages of PV power prediction, but it has lower prediction accuracy [10,11]. Applications of artificial intelligence algorithms have primarily used traditional machine learning models, such as random forest (RF) [12] and support vector machine (SVM) [13,14], which have higher prediction accuracy. Neural networks are gradually becoming more popular in the field of PV prediction as deep learning advances. Numerous studies and experiments on the prediction of PV power have demonstrated that combined prediction models typically produce better prediction outcomes than individual models. A combined model can also help when an individual prediction model produces large prediction errors at specific points. In [15], PV power was predicted with relatively high accuracy by a recurrent neural network (RNN) model trained on historical time series data. However, gradient vanishing and gradient explosion limit the prediction time range. In addition, the RNN carries a high computational burden in the training phase as it needs a complete database for training, and poor data quality has a great impact on the prediction accuracy [16]. It is therefore important to choose a model with strong performance. LSTM is widely used for longer time sequences to overcome information loss, and it is a good solution to the gradient problem in RNN models [17][18][19]. In [20], the authors demonstrate that LSTM improves significantly in prediction accuracy compared to RNN, with lower RMSE and MAE than other models.

The back propagation (BP) neural network technique with time-delayed inputs serves as the foundation for the NARX neural network. By adding a time-delayed feedback connection from output to input, the NARX neural network can model complicated dynamic interactions and react more swiftly to past state information. NARX has been used to resolve non-linear series forecasting issues in several disciplines and is appropriate for time series forecasting [21,22]. In [23], the authors proposed a dual-stage attention-based RNN (DA-RNN). The concept of the DA-RNN mechanism is to use NARX to attend to the input sequence, followed by an LSTM to investigate the temporal instances. However, NARX may miss the interpretation of parts of the first-level attention under non-smooth weather conditions. In addition, the continuous computation of the pre-weighted inputs obtained from the encoder leads to a large computational effort.

Although deep learning models can learn quickly and predict better outcomes, their structure is typically complex and computationally time-consuming, which results in low efficiency. An established and popular ensemble learning strategy is the gradient boosting machine (GBM). Extreme gradient boosting (XGBoost) is a gradient boosting tool that is highly scalable, adaptable, and versatile. XGBoost employs regularization to control overfitting, utilizes resources efficiently, and gets around the drawbacks of earlier gradient boosting algorithms [24]. In [25], the authors convert the weather feature vector into a Gram matrix, fully exploiting the intrinsic connection of the data, and use the particle swarm optimization algorithm to find the optimal hyperparameters of XGBoost to complete the PV power prediction; however, it has been concluded that XGBoost is time-consuming in model training, leading to high memory usage and computational cost [26]. In comparison to XGBoost, LightGBM can speed up model training, uses less memory, allows for parallelized learning, and analyzes massive amounts of data without compromising accuracy [27][28][29]. In [26], the authors propose the use of LightGBM to predict PV power and show that it completes the training in a much shorter time (1.39 s, compared to 16.83 s for XGBoost) with similar accuracy.

The accuracy of the combined model prediction outputs can be increased by giving the individual models different weights. Using different weights for each time point of the various forecasting models, also known as variable weight combination forecasting [30], and weighting the forecasts of two models using the inverse of error method [31] can both improve accuracy, and the choice of weights is a crucial issue in these studies. All of the aforementioned literature provides direct predictions from the original PV power series, but because of the complicated nonlinear and stochastic nature of the meteorological elements affecting PV power, it is challenging to produce precise forecasts using the original PV power alone.
To increase prediction accuracy, modal decomposition techniques combined with machine learning models can delve deeper into the latent information in the data. Wavelet analysis and empirical mode decomposition are two frequently employed techniques. Both methods can decompose the original signal and improve the prediction accuracy; however, they both have drawbacks. Wavelet analysis is poorly adaptive, and EMD has problems such as mode aliasing and over-envelope effects [32,33]. In [33], the PV power is decomposed by EMD, an LSTM neural network is built for each intrinsic mode function (IMF) sequence to predict the PV power separately, and finally the sub-series prediction results are superimposed to obtain the final prediction results; however, the prediction accuracy is limited because the modal aliasing problem leaves some IMFs with low correlation, and the sampling frequency of once an hour is not fine-grained enough. The purpose of EEMD [34] is to effectively reduce the modal aliasing problem by introducing Gaussian white noise of equal amplitude to smooth the distribution of extreme points. However, the presence of noise signals of a certain amplitude in the decomposition components affects the IMF quality, leading to a degradation of prediction accuracy. CEEMDAN is created by enhancing the EMD algorithm; it incorporates the benefits of both EEMD and EMD while also speeding up the decomposition process [35][36][37]. In [36], a prediction framework that combines a gated recurrent unit (GRU) with CEEMDAN is proposed. All of the above literature demonstrates the effectiveness of modal decomposition in PV power prediction; however, most studies have used a single machine learning model, and the generalizability and reliability of these studies remain inadequate. Combinatorial approaches in machine learning are often used to deal with the shortcomings of individual machine learning models as well.
In response to the above issues, the objective of the paper is to propose a new short-term prediction approach: a combinatorial machine learning prediction model with CD. The performance of the proposed model is shown to be better than that of other models under different weather types. The main contributions of this paper can be summarized as follows:

• The decomposition of PV power by a combination of EMD, EEMD, and CEEMDAN was integrated with Pearson correlation analysis. The filtered high-correlation IMF features substantially increase the accuracy of the model for PV power prediction;
• Preliminary NARX predictions and error correction vector features with high correlation coefficients are added to the NARX-LSTM-LightGBM model. This enhances the ability to parse PV data and the predictive performance of the proposed model in capturing actual PV power trends. Moreover, the models are combined by the inverse of error method, which greatly improves the accuracy when individual models produce large errors;
• A novel model for PV power prediction is analyzed in depth. The new approach is evaluated using a real dataset. A comparative study of the model performance on six test days (three types of weather) has been performed to reveal the high reliability of the proposed CD-NARX-LSTM-LightGBM model as a competitive model in capturing the time dependence with high accuracy.

The rest of the paper is structured as follows: Section 2 provides the conceptual foundation for constructing the combined prediction model. Section 3 presents the simulation results and discussion, and Section 4 concludes the paper.

Method of the Optimal Weight Determination
Features such as ambient temperature, PV inverter temperature, PV module temperature, solar irradiance, and wind speed influence PV power generation. To avoid the negative impact of weakly related features on PV power prediction, the correlation of each feature variable with PV power is calculated using the Pearson correlation coefficient. The Pearson correlation coefficient evaluates the linear relationship between two continuous variables with the following formula [38][39][40]:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{1}$$

where $\bar{x}$ is the mean value of the characteristic variable $x$, and $\bar{y}$ is the mean value of the characteristic variable $y$.
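As an illustration of how the Pearson coefficient can be computed for one feature against PV power, the following is a minimal sketch in plain Python. The function name `pearson` and the toy readings are assumptions made for this example; they are not taken from the paper's dataset or code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: product of the root sums of squared deviations.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical toy readings, for illustration only (not the paper's data).
irradiance = [0.0, 200.0, 450.0, 800.0, 600.0, 100.0]
pv_power = [0.0, 1.1, 2.4, 4.3, 3.2, 0.5]
r = pearson(irradiance, pv_power)
```

A feature whose coefficient falls below a chosen threshold would then be dropped before model training.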

Modal Decomposition
Because PV power signals are non-stationary, a decomposition method that requires no preset parameters is needed. EMD, proposed by N. E. Huang et al., is a method for decomposing and examining nonlinear or non-stationary time series data. Its advantage is that the signal is decomposed on the time scale of the data itself, using adaptively created intrinsic mode functions rather than fixed basis functions. EEMD is based on the EMD algorithm and adds normally distributed white noise to the original signal so that the signal is uniformly distributed throughout the frequency band at the interval of extreme points, which effectively reduces the modal aliasing problem. CEEMDAN is an optimization algorithm based on both EMD and EEMD. It enhances the speed of signal decomposition by adding a small amount of adaptive white noise, which fixes the incompleteness and reconstruction error of the EEMD algorithm after white noise is added. The implementation steps of the decomposition are as follows [41][42][43]:

1. White noise $v_i(t)$ with a standard normal distribution is added to the raw signal $s(t)$. The $i$-th signal is denoted as $s_i(t) = s(t) + v_i(t)$, $i = 1, 2, \ldots, N$, where $N$ is the number of noise realizations. EMD decomposes each signal to obtain the corresponding sequence $IMF_1^i$, and averaging gives the first component and the residual error vector $r_1(t)$:

$$IMF_1(t) = \frac{1}{N}\sum_{i=1}^{N} IMF_1^i(t), \qquad r_1(t) = s(t) - IMF_1(t)$$

2. Adaptive white noise $v_i(t)$ is added to the error, and the experiment is performed $N$ times. Each time, the result $r_1^i(t) = r_1(t) + v_i(t)$ is decomposed using EMD to obtain its first-order component, denoted $E_1\left(r_1^i(t)\right)$. Averaging these components gives the 2nd sequence $IMF_2$, which is removed from $r_1(t)$ to obtain the error of the 2nd sequence $r_2(t)$:

$$IMF_2(t) = \frac{1}{N}\sum_{i=1}^{N} E_1\left(r_1^i(t)\right), \qquad r_2(t) = r_1(t) - IMF_2(t)$$

3. The decomposition procedure is repeated to acquire the components that satisfy the conditions and the corresponding errors. The repetition ends when the error is a monotonic function and cannot be decomposed further by EMD. With $K$ components in total, the original signal $s(t)$ can be expressed as:

$$s(t) = \sum_{k=1}^{K} IMF_k(t) + r_K(t)$$

NARX Neural Network
NARX is a recurrent dynamic neural network. It has feedback connections that enclose several layers of the network, and it incorporates two time-delay structures, one on the input signal and one on the output signal, to describe the model of a nonlinear discrete system. The parametric formula for the NARX neural network is as follows [44,45]:

$$y(n+1) = f\left[\,y(n), \ldots, y(n-d_y);\; x(n), \ldots, x(n-d_E)\,\right] \tag{7}$$

where $x(n)$ and $y(n)$ denote the input and output values of the NARX neural network at the discrete moment $n$, respectively, and $d_E \geq 0$ and $d_y \geq 0$ are the maximum delay orders of the input and output, respectively. NARX's feedback loop and delay mechanism improve the ability to retain historical time series data, which enables better exploration of the non-linear sequence relationships of time series data. The construction of the NARX neural network is shown in Figure 1.

The NARX neural network has two layers of feedforward networks, with a sigmoid function $\sigma(x)$ in the hidden layer and a linear transfer function in the output layer. The sigmoid function is calculated as:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

The network has a time-delay structure to store sequential prior values of $x(n)$ and $y(n)$, and the output $y(n)$ is fed back to the input of the network. The input vectors are inserted through the two time-delay structures of the input and output signals, which act as jump connections in the time-unfolded network and allow gradients to propagate back along shorter paths. This increases the NARX neural network's capacity for historical data analysis [46].
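The two time-delay structures can be pictured as building a lagged regressor for each target value. The sketch below assumes the regressor form $y(n+1) = f[y(n), \ldots, y(n-d_y);\, x(n), \ldots, x(n-d_E)]$ and only constructs the input rows and targets; the nonlinear map $f$ (the network itself) is not implemented. The function name `narx_regressors` and the toy series are hypothetical.

```python
def narx_regressors(x, y, d_x, d_y):
    """Build (row, target) pairs for y(n+1) = f[y(n..n-d_y); x(n..n-d_x)].

    x is the exogenous input series and y the output series; d_x and d_y
    are the maximum delay orders (assumed >= 0) of input and output.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if d_x < 0 or d_y < 0:
        raise ValueError("delay orders must be non-negative")
    start = max(d_x, d_y)  # first index with a full delay history
    rows, targets = [], []
    for n in range(start, len(y) - 1):
        past_y = [y[n - k] for k in range(d_y + 1)]  # y(n), ..., y(n-d_y)
        past_x = [x[n - k] for k in range(d_x + 1)]  # x(n), ..., x(n-d_x)
        rows.append(past_y + past_x)
        targets.append(y[n + 1])
    return rows, targets


# Hypothetical toy series, for illustration only.
x_series = [10, 11, 12, 13, 14]
y_series = [0, 1, 2, 3, 4]
rows, targets = narx_regressors(x_series, y_series, d_x=1, d_y=2)
```

During training the fed-back values of `y` can be the measured outputs; in free-running prediction they would be replaced by the network's own earlier predictions.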

Long Short-Term Memory
One of the major disadvantages of traditional neural networks is that they cannot relate the current prediction results to historical time series data well in modeling. The RNN introduces the concept of time series and alleviates this problem through its recurrent loop, but it is difficult to train on long sequences because it has no forgetting mechanism, which causes the gradient to explode or vanish. LSTM is a useful solution to this gradient problem. Through its special gate control and memory system, the LSTM neural network can make full use of time series data [47,48]. The structure of the LSTM neural network is shown in Figure 2.

The LSTM mitigates the gradient explosion and gradient vanishing problems of the RNN through three gating structures: the input gate $I_t$, the forget gate $f_t$, and the output gate $O_t$. These gates are responsible for managing the interaction between memory cells and are implemented through tanh functions, sigmoid functions, and matrix multiplication. In addition, the forget gate can remove irrelevant states that mislead the prediction process, keeping only the important information to be forwarded to the hidden layer. The value of the forget gate ranges from 0 to 1, where a higher value means that more of the information is retained, and a value of 0 means that it is discarded completely [49].

Here, contrary to the input gate, the output gate checks its effect on the state of other memory cells. The LSTM gates, hidden outputs, and cell states are given as follows [50]:

$$f_t = \sigma\left(U_f x_t + W_f h_{t-1} + b_f\right)$$
$$I_t = \sigma\left(U_I x_t + W_I h_{t-1} + b_I\right)$$
$$O_t = \sigma\left(U_O x_t + W_O h_{t-1} + b_O\right)$$
$$\tilde{C}_t = \tanh\left(U_C x_t + W_C h_{t-1} + b_C\right)$$
$$C_t = f_t \odot C_{t-1} + I_t \odot \tilde{C}_t$$
$$h_t = O_t \odot \tanh\left(C_t\right)$$

where $x_t$ and $C_t$ denote the input and the storage unit (cell state) at time $t$, respectively; $\tilde{C}_t$ is the candidate cell state; $b$, $W$, and $U$ are the bias, recurrent weight, and input weight of each gate, respectively; $h_{t-1}$ is the hidden state at time $t-1$; and $\odot$ denotes element-wise multiplication. The flow of the LSTM neural network operation is shown in Figure 2. Firstly, $h_{t-1}$, $C_{t-1}$, and $x_t$ are input to the LSTM cell. The LSTM gates interact with the input to generate a logic function. The input goes through $f_t$, which quantifies the importance of the stored information with a value between 0 and 1 to decide whether it is kept or not. Then, the input gate $I_t$ updates the cell state with the new important information, constructing the new cell state $C_t$. Finally, the remaining state values are calculated by the hidden layer of the LSTM.
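To make the gate interactions concrete, the following minimal sketch performs one LSTM step with scalar states, assuming the standard LSTM formulation in which each gate has an input weight `U`, a recurrent weight `W`, and a bias `b`. The function `lstm_step` and the parameter layout are illustrative assumptions, not the paper's implementation, which would use vector states and a deep learning library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One scalar LSTM step.

    params maps a gate name ("f", "i", "o", "c") to a tuple (U, W, b):
    input weight, recurrent weight, and bias of that gate.
    """
    def pre_activation(gate):
        U, W, b = params[gate]
        return U * x_t + W * h_prev + b

    f_t = sigmoid(pre_activation("f"))      # forget gate, in (0, 1)
    i_t = sigmoid(pre_activation("i"))      # input gate, in (0, 1)
    o_t = sigmoid(pre_activation("o"))      # output gate, in (0, 1)
    c_hat = math.tanh(pre_activation("c"))  # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat        # keep old memory, add new
    h_t = o_t * math.tanh(c_t)              # exposed hidden state
    return h_t, c_t


# With all parameters zero, every gate equals 0.5 and the candidate is 0,
# so the cell state is simply halved.
zero_params = {g: (0.0, 0.0, 0.0) for g in ("f", "i", "o", "c")}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
```

Because the cell state is updated additively and scaled by the forget gate, the gradient along `c_t` does not have to pass through a squashing nonlinearity at every step, which is the mechanism behind the improved gradient behavior.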

LightGBM
The LightGBM base learner is a decision tree, and the algorithm supports efficient parallel training. The advantages of LightGBM include faster training, lower memory consumption, better accuracy, and distributed support for the fast processing of large amounts of data. Its histogram-based decision tree algorithm obtains the histogram of a leaf by subtracting the histogram of its sibling node from the histogram of its parent node, which can double the computational speed. To calculate the information gain, the LightGBM algorithm employs gradient-based one-side sampling (GOSS), which reduces the number of samples used and speeds up computation. Instances with larger gradients are all kept, while instances with smaller gradients are randomly down-sampled. Exclusive feature bundling (EFB) binds several mutually exclusive features together, which reduces the data dimension [51]. The LightGBM algorithm improves operational efficiency through a leaf-wise growth strategy with a depth limitation. Most GBDT tools use a level-wise decision tree growth strategy, which is inefficient because it continues to split and explore nodes whose split gain is low; for the same number of splits, leaf-wise growth achieves higher accuracy [52].
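The histogram subtraction idea can be sketched in a few lines: once the parent histogram and one child histogram are known, the other child follows by subtraction without scanning its rows. This is a simplified illustration with hypothetical helper names and toy values; LightGBM's real histograms also accumulate second-order statistics and counts per bin.

```python
def build_histogram(bin_ids, gradients, n_bins):
    """Accumulate the gradient sum per feature bin."""
    hist = [0.0] * n_bins
    for b, g in zip(bin_ids, gradients):
        hist[b] += g
    return hist


def subtract_histogram(parent_hist, child_hist):
    """Sibling histogram = parent histogram - known child histogram."""
    return [p - c for p, c in zip(parent_hist, child_hist)]


# Hypothetical toy data: bin index of one feature and gradient per row.
bin_ids = [0, 1, 1, 2, 0, 2]
gradients = [0.5, -1.0, 0.25, 2.0, -0.5, 1.0]
left_rows = [0, 1, 3]   # rows sent to the left child by some split
right_rows = [2, 4, 5]  # remaining rows go to the right child

parent = build_histogram(bin_ids, gradients, n_bins=3)
left = build_histogram([bin_ids[i] for i in left_rows],
                       [gradients[i] for i in left_rows], n_bins=3)
right_by_subtraction = subtract_histogram(parent, left)
right_direct = build_histogram([bin_ids[i] for i in right_rows],
                               [gradients[i] for i in right_rows], n_bins=3)
```

In practice the histogram is built explicitly only for the smaller child, so the larger child costs a pass over the bins instead of a pass over its rows.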

Combined Forecasting Model and Process
The proposed model in this paper is a combination of the NARX, LSTM, and LightGBM models. The NARX-LSTM-LightGBM model merges the properties of each prediction model to obtain better results. In the proposed model, NARX is associated with the embedded memory that makes jump connections in the network. NARX is applied to calculate the error correction vector, which is utilized to reduce the dependence and sensitivity of the network structure on the time series. Let $E_n = [e_1, \ldots, e_n]$ be the error vector between the actual value $Y_t = [y_1, \ldots, y_n]$ and the predicted value of NARX. The residuals are calculated as follows:

$$e_t = y_t - \sum_{i} w_i F_{it}$$

where $F_{it}$ and $w_i$ denote the nonlinear mapping function of the NARX and the corresponding weight values. Firstly, the initial PV power prediction from NARX is used to obtain the error correction vector. Then, the error correction vector and the predictions are used as new input features. Finally, the LSTM and NARX each predict the PV power with the data incorporating the new features as input. The NARX-LSTM model predictions are obtained by integrating the two models through the inverse of error method. The formula is as follows:

$$w_1 = \frac{e_2}{e_1 + e_2}, \qquad w_2 = \frac{e_1}{e_1 + e_2}, \qquad f_P = w_1 v_1 + w_2 v_2$$

where $e_1$ is the error of model 1 and $e_2$ is the error of model 2; $w_1$ is the weight of model 1 and $w_2$ is the weight of model 2; $v_1$ and $v_2$ are the predicted values of model 1 and model 2, respectively; and $f_P$ is the error inverse method weighted average model. NARX-LSTM is a combined deep learning model, whereas LightGBM is a boosted-tree machine learning model, so the principles of the two are weakly correlated; LightGBM has also achieved good results in the field of prediction, so the two are combined. To obtain the final prediction results, the prediction results of the LightGBM algorithm and the combined NARX-LSTM algorithm are combined using the error inverse method. To better understand the proposed model, the combined prediction model is shown in Figure 3.

In Figure 3, $w_1$ and $w_2$ are the weights of the NARX neural network and the LSTM neural network, respectively; $w_3$ and $w_4$ are the weights of the combined NARX-LSTM model and the LightGBM algorithm, respectively. $\Delta Data$ is the segmented data set. $f_{P1}$ is the error inverse method weighted average model of NARX and LSTM; $f_{P2}$ is the error inverse method weighted average model of the combined NARX-LSTM and LightGBM.
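The two-stage weighting can be sketched as follows, assuming the common form of the inverse of error method in which each model's weight is the other model's share of the total error, so the model with the smaller error gets the larger weight. The function names and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def inverse_error_weights(e1, e2):
    """Weights for two models: the smaller the error, the larger the weight."""
    if e1 < 0 or e2 < 0 or e1 + e2 == 0:
        raise ValueError("errors must be non-negative and not both zero")
    total = e1 + e2
    return e2 / total, e1 / total


def combine(v1, v2, e1, e2):
    """Weighted average of two prediction series using inverse-error weights."""
    w1, w2 = inverse_error_weights(e1, e2)
    return [w1 * a + w2 * b for a, b in zip(v1, v2)]


# Stage 1: combine NARX and LSTM predictions (hypothetical values).
narx_pred, narx_err = [4.0, 6.0], 1.0
lstm_pred, lstm_err = [8.0, 10.0], 3.0
narx_lstm_pred = combine(narx_pred, lstm_pred, narx_err, lstm_err)

# Stage 2: combine the NARX-LSTM output with LightGBM predictions.
lgbm_pred, lgbm_err = [5.0, 7.0], 2.0
narx_lstm_err = 2.0  # hypothetical error of the stage-1 combination
final_pred = combine(narx_lstm_pred, lgbm_pred, narx_lstm_err, lgbm_err)
```

Which error measure feeds the weights (for example a validation MAE or RMSE) is a design choice not fixed by this sketch.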
To further enhance the exploration of the internal linkage of the historical time series, a prediction method based on the combined modal decomposition is proposed. The flow chart is shown in Figure 4, and the method mainly consists of the following parts:

1. After pre-processing the data, only the data in the period of 5:00-20:00 are retained. Pearson correlation analysis is applied to the environmental features, the environmental variables with strong correlation are selected as the features of the combined prediction model, and these features are normalized to improve the convergence speed and efficiency of the model.

2. The EMD, EEMD, and CEEMDAN modal decomposition methods are used to decompose the original PV power, and the respective modal sequences are combined to construct the feature matrix for correlation analysis. The sequence features with high correlation and the environmental features with strong correlation are then integrated into the NARX-LSTM-LightGBM prediction model.

3. The proposed model predicts three typical types of weather (six test days), and the performance of NARX-LSTM-LightGBM is evaluated.

Model Performance Evaluation Indicators
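The indicators used in this paper to evaluate prediction performance are the MAE, the mean absolute percentage error (MAPE), and the root mean square error (RMSE). These error measures are used because they allow an independent and efficient assessment of the model. The formulas are as follows [53]:

$$MAE = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$MAPE = \frac{1}{n}\sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$

where $y_i$ and $\hat{y}_i$ are the actual and predicted values, and $n$ is the number of samples.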
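The three indicators are straightforward to compute. The sketch below uses plain Python and reports MAPE as a fraction (multiply by 100 for a percentage); it assumes that actual values of zero are excluded beforehand, since MAPE is undefined for them. The function names and sample values are illustrative only.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Hypothetical values, for illustration only.
actual = [2.0, 4.0, 8.0]
predicted = [1.0, 5.0, 10.0]
scores = (mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

MAE and RMSE carry the unit of the power signal (kW here), whereas MAPE is dimensionless; RMSE penalizes large individual errors more heavily than MAE.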

Data Preprocessing
The performance of the proposed model was evaluated using a real data set: the experimental study used measured data from Andre Agassi College, USA, to verify the generalization capability of the proposed model. The data set includes seven environmental characteristic variables: ambient temperature, PV inverter temperature, module temperature, irradiance, ambient humidity, wind speed, and wind direction.
The dataset covers 1 January 2012 to 31 December 2014 with a time interval of 15 min, giving a total of 96 sampling points a day. Time points without PV power are removed to increase the efficiency of the model calculation, so only the daily period of 5:00-20:00 is kept, which gives 60 sampling points per day at the 15 min interval. Missing data are filled by the mean fill method, and min-max normalization is applied to the filled data.
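The preprocessing steps can be sketched as below. Two assumptions are made for this illustration: the 5:00-20:00 window is treated as half-open (5:00 up to 19:45), which is what yields 60 points at a 15 min step, and missing readings are represented by `None` and filled with the mean of the known values of the same series. All function names and sample values are hypothetical.

```python
def daytime_slots(start_hour=5, end_hour=20, step_min=15):
    """(hour, minute) sampling slots in the half-open window [start, end)."""
    return [(h, m) for h in range(start_hour, end_hour)
            for m in range(0, 60, step_min)]


def mean_fill(values):
    """Replace missing entries (None) by the mean of the known entries."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot fill a series with no known values")
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant series: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]


slots = daytime_slots()
# Hypothetical temperature readings with one missing value.
filled = mean_fill([1.0, None, 3.0])
scaled = min_max(filled)
```

In a train/test setting, the minimum and maximum would normally be computed on the training portion only and reused for the test portion, so that no test information leaks into the scaling.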

PV Power Characteristics Correlation Analysis
The correlation of the characteristic variables has considerable significance for the accuracy of PV power prediction. In this paper, Table 1 [54] is used to define the degree of correlation between each environmental characteristic variable and PV power, and the Pearson correlation coefficients between the environmental characteristic variables in the dataset and PV power are calculated by Equation (1), as shown in Table 2. The relative humidity and wind direction are not correlated with PV power, which is unfavorable to PV power prediction, and they are therefore discarded. The wind speed is only weakly correlated with PV power and is also neglected. Thus, we choose the strongly correlated environmental variables, i.e., ambient temperature, PV inverter temperature, PV module temperature, and irradiance, as the characteristic variables for the PV prediction model.

Combinatorial Decomposition to Build New Features
This paper decomposes the original power series using the EMD, EEMD, and CEEMDAN methods to reduce the model's complexity and thoroughly mine the intrinsic information, obtaining 45 sequences from which the feature matrix is constructed. The decomposition results are illustrated in Figures 5-7. For example, in the EEMD decomposition, IMF1-IMF9 are high-frequency sequences in which the non-smooth data are concentrated, IMF10-IMF12 are medium-frequency sequences that vary with a certain period, and IMF13-IMF16 are low-frequency components with a larger period, which have little impact on the overall data fluctuations.
Too many sequences reduce the computational efficiency of the combined model, and uncorrelated sequences degrade its prediction accuracy, so correlation analysis is used to retain only the sequences that are highly correlated with PV power. This paper selects the sequences with a correlation greater than 0.4 and combines them into a feature matrix. Table 3 shows that the correlation of the IMF5 sequence of EMD is 0.802; the correlations of the IMF4, IMF5, and IMF6 sequences of EEMD are 0.614, 0.925, and 0.474, respectively; and the correlations of the IMF5 and IMF6 sequences of CEEMDAN are 0.684 and 0.574, respectively. These six sequences are therefore selected to form the new feature matrix, and the remaining sequences are excluded from the model.
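The correlation-based screening described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `select_correlated_components`, the component labels, and the synthetic series are all assumptions, and the sketch compares the raw Pearson coefficient against the 0.4 threshold because the text does not say whether absolute values were used.

```python
import numpy as np

def select_correlated_components(components, target, threshold=0.4):
    """Keep the components whose Pearson correlation with `target` exceeds `threshold`.

    `components` maps a label (hypothetical, e.g. "EEMD_IMF5") to a 1-D series
    of the same length as `target`.
    """
    correlations = {}
    for name, series in components.items():
        r = np.corrcoef(series, target)[0, 1]
        if r > threshold:  # assumption: raw coefficient, not its absolute value
            correlations[name] = r
    names = list(correlations)
    if names:
        matrix = np.column_stack([components[n] for n in names])
    else:
        matrix = np.empty((len(target), 0))
    return names, correlations, matrix

# Synthetic stand-ins for decomposed components (not the paper's data).
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 500)
power = np.sin(t) + 0.1 * rng.standard_normal(t.size)
components = {
    "EMD_IMF5": np.sin(t),                      # follows the power trend
    "EEMD_IMF1": rng.standard_normal(t.size),   # high-frequency noise
}
names, correlations, feature_matrix = select_correlated_components(components, power)
```

In this toy case only the trend-like component survives the threshold, which mirrors how the six retained IMFs form the new feature matrix.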

Model Parameters Setting
After several experiments, the prediction model was built as follows. The initial 66,870 sets of data were divided into a training set and a test set in a ratio of 7:3 and fed into a NARX neural network model with 12 hidden-layer neurons and a time-delay order of 8. This yields the PV power predictions for the 20,059-sample test set together with the residual vector, which are input jointly with the last 20,059 sets of the original data into the LSTM neural network and the LightGBM algorithm. Correlation analysis shows that the predicted test-set power and the residual vector have high correlations of 0.87 and 0.51, respectively. The LSTM neural network model uses the ReLU function as the activation function and the Adam optimizer, with a batch_size of 32, a maximum of 32 iterations, and a learning rate of 0.1. The LightGBM hyperparameters are optimized using the grid search algorithm, giving a learning rate of 0.01 and 15,000 base learners (n_estimators); the number of leaf nodes (num_leaves) is left at its default of 31.
The data from 6 February 2014 to 17 December 2014 are used as training samples; the PV power is shown in Figure 8. To avoid chance error and to demonstrate the generalization capability of the model, two days of each weather type, six days in total, were selected as the test samples: December 23 (sunny day 1) and December 25 (sunny day 2); December 18 (cloudy day 1) and December 20 (cloudy day 2); and December 17 (rainy day 1) and December 31 (rainy day 2).
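To make the data preparation concrete, the sketch below collects the reported settings in one place and builds the delayed inputs that a NARX-style model with a delay order of 8 would consume. It is an illustration under stated assumptions: the dictionary layout, the helper names `chronological_split` and `make_lagged_inputs`, the chronological (unshuffled) split, and the toy series are not taken from the paper.

```python
import numpy as np

# Values as reported in the text; the dictionary layout is illustrative only.
REPORTED_SETTINGS = {
    "narx": {"hidden_neurons": 12, "delay_order": 8},
    "lstm": {"activation": "relu", "optimizer": "adam", "batch_size": 32,
             "max_iterations": 32, "learning_rate": 0.1},
    "lightgbm": {"learning_rate": 0.01, "n_estimators": 15000, "num_leaves": 31},
}

def chronological_split(n_samples, train_ratio=0.7):
    """Return (train, test) slices for a 7:3 split that preserves time order (assumed)."""
    n_train = int(round(n_samples * train_ratio))
    return slice(0, n_train), slice(n_train, n_samples)

def make_lagged_inputs(power, exogenous, delay=8):
    """Build rows [y(t-delay..t-1), x(t-delay..t-1)] with target y(t)."""
    power = np.asarray(power, dtype=float)
    exogenous = np.asarray(exogenous, dtype=float)
    if exogenous.ndim == 1:
        exogenous = exogenous[:, None]
    rows, targets = [], []
    for t in range(delay, len(power)):
        past_power = power[t - delay:t]
        past_exog = exogenous[t - delay:t].ravel()
        rows.append(np.concatenate([past_power, past_exog]))
        targets.append(power[t])
    return np.array(rows), np.array(targets)

# Toy series standing in for PV power and one exogenous feature.
toy_power = np.arange(20, dtype=float)
toy_exog = 2.0 * toy_power
X, y = make_lagged_inputs(toy_power, toy_exog,
                          delay=REPORTED_SETTINGS["narx"]["delay_order"])
train_idx, test_idx = chronological_split(len(y))
```

The first `delay` samples have no complete history and produce no training row.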

Validation of Combined Modal Decomposition
This paper adopts combined decomposition to decompose the PV power and thereby deeply explore the intrinsic connection between PV power and its historical time series. The average run time of the NARX-LSTM-LightGBM model is 45.23 min, and that of the CD-NARX-LSTM-LightGBM model is 67.59 min.
For the three weather types (sunny, cloudy, and rainy days), four models, namely LSTM, CD-LSTM, NARX-LSTM-LightGBM, and CD-NARX-LSTM-LightGBM, are constructed for comparative PV power prediction experiments. Predictions were made for the above test days. Figure 9 shows that under sunny weather the prediction curves of all four models match the actual PV power. In cloudy and rainy weather, the models with combined modal decomposition perform better: their prediction curves fit the real values more closely, and the overall curve shapes are consistent.

Table 4 lists the error indices, and Figure 10 shows a stacked histogram of the prediction errors of the four models (different colored areas indicate different prediction methods, and a smaller area indicates a smaller error for the corresponding method). For example, for the MAE on cloudy day 1, the rectangle representing CD-NARX-LSTM-LightGBM (red) is smaller than those of the other models, and the rectangle representing CD-LSTM (light blue) is smaller than that of LSTM (dark blue) and even smaller than that of NARX-LSTM-LightGBM (white). In terms of RMSE, MAE, and MAPE, the CD-LSTM model reduces the error by 24.68%, 29.82%, and 29.82%, respectively, compared to the LSTM model; the CD-NARX-LSTM-LightGBM model reduces it by 56.30%, 58.45%, and 63.04%, respectively, compared to the NARX-LSTM-LightGBM model; and the CD-LSTM model reduces it by 8.52%, 2.56%, and 13.04%, respectively, compared to the NARX-LSTM-LightGBM model. Overall, adding the highly correlated sequences from the combined modal decomposition improves the prediction performance significantly, and in some cases the error of a single model with CD is even smaller than that of the combined prediction model without it. This demonstrates that incorporating the combined modal decomposition into the machine learning model creates favorable conditions for improving the accuracy and reducing the uncertainty of PV power prediction.
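For reference, the three error indices used in this comparison can be computed as in the sketch below. The function names are illustrative. Two details are assumptions rather than statements from the paper: MAPE is returned as a fraction (consistent with reported values such as 0.211), and samples with zero actual power are skipped to avoid division by zero.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, in the unit of the inputs (kW here)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error, in the unit of the inputs (kW here)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction.

    Assumption: samples where the actual power is zero (e.g. at night)
    are excluded, since the ratio is undefined there.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mask = y_true != 0
    return float(np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])))

# Toy values, not measurements from the paper.
actual = [2.0, 4.0]
predicted = [1.0, 5.0]
toy_rmse = rmse(actual, predicted)
toy_mae = mae(actual, predicted)
toy_mape = mape(actual, predicted)
```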

Validation of the NARX-LSTM-LightGBM Model
This section further validates the performance of the CD-NARX-LSTM-LightGBM model for PV power prediction. For the three weather types (sunny, cloudy, and rainy days), combined-decomposition reference models based on NARX, LSTM, LightGBM, RNN, and GRU are constructed for comparative PV power prediction experiments. The simulation results are shown in Figure 11.

According to Figure 11, the prediction performance depends greatly on the type of weather. Specifically, the prediction performance of the LightGBM model decreases significantly on cloudy and rainy days. The PV power shows large amplitude variations at certain times, owing to the high volatility and strong nonlinearity of environmental variables on cloudy and rainy days, which directly affect the prediction results. A more detailed comparison of the original LSTM with the proposed model indicates that the predicted values and the error correction vector obtained from the initial NARX prediction, together with the combined machine learning models, significantly enhance its performance in capturing the trend of the actual PV power. Overall, the proposed model performs best; its advantage is most pronounced on cloudy and rainy days, where its predicted power curve fits the real power curve more closely.
Figure 12 shows a stacked histogram of the prediction errors of the six models, and Tables 5-7 list the prediction errors. As can be seen from the figure, the rectangle representing the error of the CD-NARX-LSTM-LightGBM model (red) is the smallest among the six models under all weather conditions. Specifically, according to the numerical results presented in Tables 5-7, LightGBM has poor prediction results compared to other models during sunny days, but it achieves good results during cloudy and rainy days. The mean RMSE of CD-NARX-LSTM-LightGBM is 1.654 kW and 0.884 kW for cloudy and rainy days, respectively, so the mean RMSE for rainy days is 46.55% lower than that for cloudy days. Taking rainy days as an example, the mean RMSE of CD-NARX-LSTM-LightGBM is 0.884 kW, while the closest model (CD-RNN) achieves a mean RMSE of 1.585 kW, so the proposed model reduces the RMSE by 44.23% compared to CD-RNN. In summary, the RMSE, MAE, and MAPE of CD-NARX-LSTM-LightGBM are lower than 1.665 kW, 0.892 kW, and 0.211, respectively, for the different weather types, which is better than the results of the individual models. These results confirm the high performance of the proposed model in PV power prediction.
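The two percentage reductions quoted above follow from the usual relative-reduction formula applied to the reported mean RMSE values. The symbols E_ref and E_new are introduced here only for illustration.

```latex
\[
\text{reduction} = \frac{E_{\text{ref}} - E_{\text{new}}}{E_{\text{ref}}} \times 100\%
\]
\[
\frac{1.654 - 0.884}{1.654} \times 100\% \approx 46.55\%,
\qquad
\frac{1.585 - 0.884}{1.585} \times 100\% \approx 44.23\%
\]
```

The first value compares the proposed model's rainy-day RMSE with its own cloudy-day RMSE; the second compares the proposed model with CD-RNN on rainy days.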

Conclusions
The main objective of this study is to predict PV power accurately, in order to support the integration of additional PV systems into the grid and to further improve energy management. This paper proposed a novel prediction framework that combines NARX, LSTM, and LightGBM with combined modal decomposition to predict PV power. The conclusions are as follows:

• The proposed model predicts more accurately than an individual model and has two apparent advantages. Firstly, the preliminary NARX prediction values and the error correction vector, which have high correlation coefficients (0.87 and 0.51, respectively), enhance the proposed model's ability to capture the trend of the actual PV power. Furthermore, when large errors occur in individual models, combining the prediction models by the inverse error method largely reduces their impact on the prediction accuracy;
• The original PV power was decomposed by EMD, EEMD, and CEEMDAN, and six groups of strongly correlated sequences were obtained by correlation analysis, among which the highest correlation coefficient, 0.925 in EEMD, was second only to that of irradiance. The error of the model is clearly reduced after integrating the combined modal decomposition features. Taking rainy day 2 as an example, for RMSE, MAE, and MAPE, CD-LSTM is reduced by 55.12%, 39.88%, and 39.85%, respectively, compared to LSTM, and CD-NARX-LSTM-LightGBM is reduced by 57.19%, 25.43%, and 25.34%, respectively, compared to NARX-LSTM-LightGBM. This demonstrates that the combined modal decomposition can substantially reduce the complexity of the original power curve and improve the accuracy of PV power prediction;
• The experimental results demonstrated that the proposed CD-NARX-LSTM-LightGBM achieved the lowest RMSE, MAE, and MAPE compared to the other models. Its prediction stability is also higher than that of the other methods: over the six test days, the RMSE stays within 0.399-1.664 kW, the MAE within 0.136-0.892 kW, and the MAPE within 0.015-0.221. The prediction performance of GRU fluctuates the most; its best RMSE is 0.897 kW, and its worst is 3.690 kW;
• For all investigated PV systems, the proposed CD-NARX-LSTM-LightGBM model consistently performed better than the other models under different climatic conditions, indicating that the proposed model is superior and acceptable. The proposed model helps control the operation of grid-connected photovoltaic systems, making solar energy a more economical, efficient, and reliable source of energy. Future work will extend the approach to other smart grid applications, including wind power and load prediction.
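The inverse error method mentioned in the first conclusion can be illustrated with the common reciprocal-of-error weighting, in which each model's weight is proportional to the inverse of its error so that a model with a large error contributes less to the combined prediction. The sketch below is an assumption about the general form only; the function names, the choice of error measure, and the toy numbers are not taken from the paper.

```python
import numpy as np

def inverse_error_weights(errors):
    """Weights proportional to 1/error, normalized to sum to 1.

    `errors` holds one positive error value per model (e.g. a validation RMSE).
    """
    errors = np.asarray(errors, dtype=float)
    inverse = 1.0 / errors
    return inverse / inverse.sum()

def combine_predictions(predictions, weights):
    """Weighted sum of per-model predictions; `predictions` has shape (models, samples)."""
    predictions = np.asarray(predictions, dtype=float)
    return np.tensordot(weights, predictions, axes=1)

# Toy example: model A has error 1.0, model B has error 3.0.
weights = inverse_error_weights([1.0, 3.0])
combined = combine_predictions([[10.0, 10.0], [20.0, 20.0]], weights)
```

Here the more accurate model receives a weight of 0.75 and the less accurate one 0.25. A two-stage combination, first weighting NARX and LSTM and then weighting that result against LightGBM, could apply the same function twice.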

Figure 1. NARX neural network structure diagram. The NARX neural network has two layers of feedforward networks, with a linear transfer function in the output layer and a sigmoid function σ(x) in the hidden layer.

Figure 3. Structure of combined prediction network, where w1 and w2 are the weights of the NARX neural network and the LSTM neural network, respectively, and w3 and w4 are the weights of the combined NARX-LSTM model and the LightGBM algorithm, respectively.

Figure 5. Decomposition results of the PV series using EMD.

Figure 6. Decomposition results of the PV series using EEMD.

Figure 7. Decomposition results of the PV series using CEEMDAN.

Sustainability 2023, 24
Figure 8. Original PV power.

Figure 9. The contrast of predicted and true values before and after modal decomposition.

Figure 11. Contrast of predicted and real values for the CD-prediction models.

Table 2. Correlation coefficients between power and individual features.

Table 3. The contrast of sequence correlations of different decomposition methods.

Table 4. The error of prediction results before and after combined modal decomposition.