An Effective Hybrid Symbolic Regression–Deep Multilayer Perceptron Technique for PV Power Forecasting

Trabelsi, Mohamed; Massaoudi, Mohamed; Chihi, Ines; Sidhom, Lilia; Refaat, Shady S.; Huang, Tingwen; Oueslati, Fakhreddine S.

doi:10.3390/en15239008


by Mohamed Trabelsi ^1,*, Mohamed Massaoudi ^2,3, Ines Chihi ^4, Lilia Sidhom ^5,6, Shady S. Refaat ^7, Tingwen Huang ^8 and Fakhreddine S. Oueslati ^9

^1 Electronics and Communications Engineering Department, Kuwait College of Science and Technology, Doha P.O. Box 27235, Kuwait
^2 Department of Electrical and Computer Engineering, Texas A&M University at Qatar, Doha 23874, Qatar
^3 Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
^4 Department of Engineering, Faculty of Science, Technology and Medicine, University of Luxembourg, L-135 Luxembourg, Luxembourg
^5 LAPER, Faculty of Sciences of Tunis, El Manar University, Tunis 1068, Tunisia
^6 National Engineering School of Bizerta, Carthage University, Tunis 7035, Tunisia
^7 Engineering, and Computer Science, University of Hertfordshire, Hatfield AL10 9AB, UK
^8 Arts and Sciences Department, Texas A&M University at Qatar, Doha 23874, Qatar
^9 Laboratoire Matériaux, Molécules, et Applications (LMMA) à l’IPEST, Carthage University, Tunis 1054, Tunisia
^* Author to whom correspondence should be addressed.

Energies 2022, 15(23), 9008; https://doi.org/10.3390/en15239008

Submission received: 9 November 2022 / Revised: 21 November 2022 / Accepted: 23 November 2022 / Published: 28 November 2022

(This article belongs to the Section F: Electrical Engineering)


Abstract

The integration of Photovoltaic (PV) systems requires the implementation of effective PV power forecasting techniques to deal with the high intermittency of weather parameters. In the PV power prediction process, Genetic Programming (GP) based on the Symbolic Regression (SR) model is widely deployed since it provides an effective solution for nonlinear problems. However, during the training process, SR models might miss optimal solutions due to the large search space for the leaf generations. This paper proposes a novel hybrid model that combines SR and Deep Multi-Layer Perceptron (MLP) for one-month-ahead PV power forecasting. A case study analysis using a real Australian weather dataset was conducted, where the employed input features were the solar irradiation and the historical PV power data. The main contributions of the proposed hybrid SR-MLP algorithm are as follows: (1) The training speed was significantly improved by eliminating unimportant inputs during the feature selection process performed by the Extreme Boosting and Elastic Net techniques; (2) The hyperparameters were preserved throughout the training and testing phases; (3) The proposed hybrid model made use of a reduced number of layers and neurons while guaranteeing a high forecasting accuracy; (4) The number of iterations due to the use of SR was reduced. The presented simulation results demonstrate the higher forecasting accuracy (reductions of more than 20% for Root Mean Square Error (RMSE) and 30% for Mean Absolute Error (MAE), in addition to an improvement in the R^{2} evaluation metric) and robustness (preventing the SR from converging to local minima with the help of the ANN branch) of the proposed SR-MLP model as compared to individual SR and MLP models.

Keywords:

hybrid model; genetic algorithm; PV power forecasting; symbolic regression; deep multi-layer perceptron; MLP

1. Introduction

Recently, the world has been witnessing an increasing interest in Renewable Energy (RE). RE refers to ecological resources that produce electricity from free and inexhaustible energy, with no emission of greenhouse gases. On the one hand, energy security, together with environmental concerns, is a major issue when utilizing fossil fuels for electricity production. On the other hand, the deployment of RE ensures the supply of electricity to isolated sites without the creation of new transmission and distribution lines. These are the reasons why most countries have made the use of RE a priority in their energy policy toward achieving Sustainable Development Goals (SDGs). For instance, a total of 509 GW of generated Photovoltaic (PV) power was recorded by the end of 2018, with an increase of 102 GW in comparison with 2017 [1]. The wide accessibility of this type of RE in many locations in the world would enormously impact energy security, the environment, and economic growth, which justifies intensive research and development efforts in this direction. Thus, the use of PV power appears to be an obvious choice when targeting a massive reduction in CO_{2} emissions worldwide in the next decade. Nevertheless, the generated PV power depends mainly on the weather parameters, which continuously vary during the day [2,3,4]. Usually, PV power is forecasted over single or multiple steps ahead [5]. Accurate power forecasting is thus mandatory as it might prevent PV power plants from sudden interruptions and total collapse [6,7].

Forecasting models ensure the effective operation of unit commitments and fast proactive dispatches to the grid utility [8]. These techniques are classified into short, medium, and long-term prediction, depending on the forecasting horizon [9]. PV power is predicted using a comprehensive analysis of weather parameters, including temperature, irradiance, and dust [10]. The prediction process is usually implemented using numerical weather prediction techniques [11]. Markov chains have been widely employed for weather and power prediction because the estimated power is unaffected by prior forecasts. Moreover, data-driven approaches using domain knowledge have been widely implemented to estimate the prospective behavior of energy systems [12].

On the other hand, physical models have been extensively used (despite their poorer accuracy) to define the actual PV power output [13,14]. The stochastic prediction of weather data is performed by statistical time series algorithms such as Auto-Regressive Moving Average (ARMA) and exogenous input-based Auto-Regressive Moving Average (ARMAX) to indirectly predict PV power [15]. The work in [16] presented many ARMA-based techniques for short-term PV power prediction such as Seasonal Auto-Regressive Integrated Moving Average (SARIMA), modified SARIMA, exogenous inputs-based SARIMA (SARIMAX), and optimized SARIMAX. The numerical simulations verified that statistical methods are only reliable in the summer season, with the superiority of the Optimized Combined SARIMAX [16]. Moreover, the accuracy decreases significantly due to unstable weather conditions.

The authors of [17] investigated the performance of several hybrid PV power forecasting techniques (nonlinear models) such as least squares support vector regression (LSSVR), feedforward neural network (FFNN), and nonlinear auto-regressive models with exogenous inputs (NARX). The comparison showed that the applied FFNN marginally outperforms the standard models. Alternatively, genetic algorithms (GAs) have shown promise in time series forecasting for various applications. GAs mimic biological evolution by using various duplications of their components. The architectural structure is carried out by individual selection, mutation, and crossover. GAs offer key insights for developing effective models such as machine learning (ML), intelligent search, and deep learning (DL) [18] for many applications. For instance, the authors of [19] presented a comprehensive review on the use of Artificial Neural Networks (ANN), Support Vector Machines (SVM), and Extreme Learning Machines (ELM) in smart grids. The work in [20] discussed an interesting application of DL Neural Networks that improves the accuracy of short-term PV power forecasting using sky images. The authors concluded that the cloud coverage rate was a key feature that led to the improvement of the short-term forecasting accuracy by 2%. In [21], the application of supervised, unsupervised, and statistical ML techniques with minimum input features was discussed for PV fault detection. Another application of ML was proposed in [22], which investigated the energy storage properties of novel ionic liquid–MXene hybrid nanofluids. Two modern ensemble ML techniques (quadratic support vector regression and Matern 5/2 Gaussian process regression) were employed to predict the specific heat, viscosity, and thermal conductivity of the above-mentioned nanofluids. Numerous statistical indicators (such as correlation coefficient and RMSE) were taken into consideration to assess the performance of the proposed ML techniques. Moreover, a Random Forest (RF) algorithm was proposed in [23] to predict wind power generation. Then, the wind power forecasting data were correlated with the predicted load demand to determine the day-ahead optimal energy of a pumped hydro energy system.

For PV power forecasting, it has been widely observed that, in most cases, the accuracy falls when the time horizon increases. Improving the forecasting accuracy by using DL models with multiple layers has gained increasing interest in PV power prediction. For instance, the authors of [15] investigated the use of DL algorithms such as Long Short Term Memory (LSTM), Convolutional Neural Network (CNN), and a hybrid LSTM–CNN model in PV power prediction. It is worth mentioning that the hybrid LSTM–CNN model showed a higher prediction accuracy in comparison with the LSTM and CNN predictors. Indeed, the model selection criteria depend on the complexity, computational cost, forecasting horizon, and accuracy requirements.

Hybrid structures and ensemble models are usually applied to improve the performance of the current approaches. For instance, a hybrid model using the Radial Basis Function Neural Network (RBFNN), Wavelet Transform (WT), and Particle Swarm Optimization (PSO) has been proposed in the literature to increase the effectiveness of prediction. Hybrid models aim to predict the PV power for various time ranges (from 1 h to 3 days across multiple seasons, and for sunny, cloudy, and rainy days). In the analyzed case study, WT has a major contribution to the precision of the hybrid system by smoothing the input data. In [16], the authors combined PSO, GA, and Adaptive Neuro-Fuzzy Inference Systems (ANFIS) to reduce the error value. The GA-PSO-ANFIS hybrid method was tested on real-world data from the Goldwind microgrid system in Beijing. According to the current literature study, most of the work achieved to date tends to emphasize short-term (e.g., hourly and daily) PV power forecasting. Only a few studies have been conducted to address a longer time horizon due to the decrease in the forecasting performance with time horizon expansion. In this study, monthly PV power forecasting was achieved by using a perfectly tailored hybrid model with excellent accuracy. The extrapolation ability of the proposed model was verified on a real PV power plant. Table 1 illustrates the state-of-the-art work for PV power forecasting.

This paper’s main contributions are the following:

A novel feature selection technique was employed to investigate the feature patterns;
A novel hybrid algorithm was explored for PV power forecasting;
A fair evaluation was presented by showing the numerical and graphical performances of the proposed hybrid model.

The rest of the paper is structured as follows: Section 2 briefly defines the models employed in the simulation process and formulates the problem statement for the paper. Section 3 presents the proposed architecture and explains the adopted structure. In Section 4, a case study is provided for the validation of the proposed model along with a comprehensive interpretation of the simulation results. Finally, Section 5 summarizes the proposed techniques and concludes the study.

2. Background and Proposed Architecture

The separate stages of the proposed forecasting system, including the Symbolic Regression (SR) and Multi-Layer Perceptron (MLP), are comprehensively explained in this section. The main emphasis is on the problem statement of the proposed architecture.

2.1. Symbolic Regression

Unlike most of the ML techniques that make use of a pre-defined parametric function (prior assumptions of the fitness procedure), SR, which is considered to be an evolutionary algorithm, creates a formula that is fit to a proposed database [38] throughout the training phase. The structure of the SR is seen as a set of coordinating tree functions (Figure 1).

As illustrated in Figure 1, each inner node represents a mathematical operation that is completed by its leaves. During the training phase, the technique examines the parameter patterns in the dataset using analytic functions and state variables. From a hierarchical perspective, the inputs are supplied to the system, and the fitness functions are built during the initial iteration. The error values are generated by mutations and crossovers. Gene replication then generates the offspring, which replace the first generation and produce the final symbolic function.

The goal is to create new entities from stronger genes, following the Darwinian survival-of-the-fittest strategy [39]. Mutations are then employed randomly to reduce the root mean square error (RMSE) values. The symbolic function is fixed once the error hits the minimum threshold, and the training part ends to allow for the evaluation process to take place. It is worth noting that only supervised problems are compatible with SR as the database is the main factor in designing the symbolic functions. The biggest advantage of most of the SR models is the fact that they allow the dataset itself to select the best function that corresponds to the lowest RMSE. The most important parameters of SR are generation, stopping criteria, population size, and mutation point. Despite the aforementioned features of SR, the main drawback of such models is the large search space with infinite generation. A large number of local minima slows down the search process. Therefore, there is a risk that the model will be misled by false sub-optimal solutions.
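The tree representation and the RMSE-based fitness described above can be illustrated with a minimal sketch. It is not the authors' implementation: the tuple encoding of the tree, the operator set `OPS`, the variable names `g` and `p`, and the toy data are all assumptions made purely for illustration.

```python
import math
import operator

# Binary operators allowed at the inner nodes (an assumed, minimal set).
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def evaluate(tree, variables):
    """Recursively evaluate an expression tree for one input sample.

    A tree is either a leaf (a variable name or a numeric constant) or a
    tuple (op_name, left_subtree, right_subtree).
    """
    if isinstance(tree, tuple):
        op_name, left, right = tree
        return OPS[op_name](evaluate(left, variables), evaluate(right, variables))
    if isinstance(tree, str):
        return variables[tree]
    return float(tree)


def rmse_fitness(tree, samples, targets):
    """RMSE of a candidate tree over a dataset (lower is better)."""
    squared = [(evaluate(tree, s) - t) ** 2 for s, t in zip(samples, targets)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data generated by y = 2 * g + p (not taken from the paper).
samples = [{"g": 0.0, "p": 1.0}, {"g": 1.0, "p": 0.5}, {"g": 2.0, "p": 0.0}]
targets = [1.0, 2.5, 4.0]

candidate_a = ("add", ("mul", 2.0, "g"), "p")  # encodes 2 * g + p
candidate_b = ("add", "g", "p")                # encodes g + p

# The dataset itself selects the candidate with the lowest RMSE.
best = min([candidate_a, candidate_b], key=lambda t: rmse_fitness(t, samples, targets))
```

Here the data select `candidate_a`, since it reproduces the toy targets exactly, while `candidate_b` leaves a nonzero RMSE.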

2.2. Deep Multi-Layer Perceptron

The MLP model is a deep FFNN consisting of input, hidden, and output layers [40]. In such a model, the information transmission is unidirectional. It was shown in [41] that the perceptrons are triggered by nonlinear activation functions, including sigmoid, hyperbolic tangent function (tanh), Rectified Linear Unit (ReLU), and a normalized exponential function (Softmax) [17]. These functions are computed as follows [17]:

\mathrm{Sigmoid}(x) = \frac{1}{1 + \exp(-x)}

(1)

\tanh(x) = \frac{2}{1 + \exp(-2x)} - 1

(2)

\mathrm{ReLU}(x) = \max\{0, x\}

(3)

\mathrm{Softmax}(x)_{j} = \frac{e^{x_{j}}}{\sum_{i=1}^{k} e^{x_{i}}}

(4)

where x = (x_{1}, …, x_{k}) represents the input samples, k denotes the total number of values, and x_{j} is the input sample at time step j. Each neuron is characterized by a bias, while the importance of each connection is defined by its specific weight. During the transmission, the weighted inputs w_{i} x_{i} are summed up with the bias value b, as per [42]:

y_{i} = w_{i} x_{i} + b

(5)

After applying the nonlinear activation function to the resulting sum, the output value is conveyed to the next layer. The same procedure is repeated until the final output is obtained using the formula below [42]:

y = \Phi\left(\sum_{i=1}^{n} w_{i} x_{i} + b\right) = \Phi(w^{T} x + b)

(6)

where y denotes the system output and \Phi(\cdot): \mathbb{R} \to \mathbb{R} is the nonlinear activation function. The design of the MLP is presented in Figure 2, where the nodes are interconnected via weighted links.
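As an illustration of Equations (1)–(4) and (6), the following minimal sketch implements the four activation functions and the output of a single neuron using only the Python standard library. The function names and the example weights are assumptions for illustration; the max-subtraction inside `softmax` is a common numerical-stability step that is not part of Equation (4) but leaves its result unchanged.

```python
import math


def sigmoid(x):
    """Equation (1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Equation (2), written exactly as in the text."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


def relu(x):
    """Equation (3)."""
    return max(0.0, x)


def softmax(xs):
    """Equation (4) applied to a whole vector.

    Subtracting the maximum before exponentiating avoids overflow and
    does not change the result.
    """
    shift = max(xs)
    exps = [math.exp(v - shift) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def neuron(weights, inputs, bias, activation):
    """Equation (6): activation of the weighted sum plus the bias."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Hypothetical weights and inputs whose weighted sum is zero.
output = neuron([0.5, -0.5], [1.0, 1.0], 0.0, sigmoid)
```

Stacking such neurons into layers, and feeding each layer's outputs to the next, gives the unidirectional forward pass described in this subsection.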

Finally, the back-propagation concludes the training process, where the bias and weights are tuned based on the disparity between the actual and predicted values (loss function). The learning rate is identified by the gradient-based optimization algorithm at each iteration. MLP, considered as the initial form of DL due to the high number of neurons and layers [43], has been used in many applications to address supervised problems involving Natural Language Processing (NLP), regression, and classification algorithms. This model is characterized by its ability to effectively handle nonlinear problems. However, its main disadvantages are its sensitivity to the input scaling and redundancy in high-dimensional space. In short, the hyperparameter tuning, initial bias, and weights’ values as well as the type of activation function have a big impact on the model architecture and the accuracy/rapidity of convergence.
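The weight and bias tuning described above can be sketched for the simplest possible case: a single linear neuron trained by gradient descent on the mean squared error. This is a deliberately reduced illustration and not full back-propagation through a deep network; the function name, the fixed learning rate, the epoch count, and the toy data are all assumptions.

```python
def train_linear_neuron(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move the parameters against the gradient of the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical inputs already scaled to [0, 1], with targets y = 2 * x + 1.
w, b = train_linear_neuron([0.0, 0.5, 1.0], [1.0, 2.0, 3.0])
```

In a multi-layer network the same update is applied to every weight and bias, with the gradients obtained layer by layer through the chain rule.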

2.3. Genetic Programming

Genetic programming (GP) is very similar to GA. GP was first introduced in [44], where the presented architecture consisted of a series of tree structures, and the final function was constructed from the operational functions (nodes). Moreover, GP offers more flexibility with fewer invalid states as compared to GAs. GP makes use of a set of commands such as Auto-Defined Loop (ADL), Auto-Defined Recursion (ADR), Auto-Defined Iteration (ADI), and Auto-Defined Function (ADF) [45]. The iterative solution-finding process of GP is illustrated in Figure 3.
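A compact sketch of an iterative GP search loop is given below. It is an assumption-laden toy and not the procedure of Figure 3: it uses a single input variable, a hypothetical terminal set, truncation selection with elitism, and subtree mutation only (crossover is omitted for brevity). The default of 25 generations simply mirrors the number of SR iterations reported later in the case study.

```python
import math
import operator
import random

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
LEAVES = ["x", 1.0, 2.0]  # hypothetical terminal set


def evaluate(tree, x):
    """Evaluate a tree that is a leaf or a tuple (op, left, right)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def rmse(tree, xs, ys):
    return math.sqrt(sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def random_tree(rng, depth):
    """Grow a random tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(LEAVES)
    return (rng.choice(sorted(OPS)), random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def mutate(tree, rng):
    """Replace one randomly chosen subtree with a freshly grown one."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, 2)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))


def evolve(xs, ys, generations=25, pop_size=30, seed=0):
    """Return the best tree found and the best RMSE recorded per generation."""
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda t: rmse(t, xs, ys))
        history.append(rmse(population[0], xs, ys))
        survivors = population[: pop_size // 2]  # selection (the best are kept)
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    population.sort(key=lambda t: rmse(t, xs, ys))
    return population[0], history


# Hypothetical target function y = x * x + 1 sampled on a small grid.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [x * x + 1.0 for x in xs]
best_tree, history = evolve(xs, ys)
```

Because the best individuals always survive, the recorded best RMSE can never increase from one generation to the next, although the search may still stall in a local minimum.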

2.4. Problem Formulation

The nonlinearity of PV power parameters is due to the used meteorological data, which are linearly independent. The mathematical relationship between the weather parameters is represented by the formula below [46]:

P_{PV} = V_{pv} N_{p} \frac{I_{sc} + K_{i} (T - T_{ref}) G}{G_{ref}} - I_{d} - I_{sh}

(7)

where P_{PV} is the generated PV power, G is the irradiation, and T is the cell temperature. Moreover, G_{ref}, T_{ref}, K_{i}, I_{d}, V_{pv}, N_{p}, and I_{sc} are the reference irradiation, reference temperature, temperature coefficient, diode current, PV voltage, number of parallel cells, and the short-circuit current, respectively. The generated PV power from a single module using the Australian weather dataset is illustrated in Figure 4.

As can be seen in Figure 4, the PV power output is highly correlated with the temperature and the irradiation, which has adverse effects on the grid utility. Numerous weather parameters affect the seasonality of the PV power generation, which in turn affects the grid stability and unit commitment. This paper proposes efficient and accurate PV power forecasting for 30 days ahead in order to lessen the effects of weather variation on the utility grid by providing information about future PV generation.

3. Hybrid Model

The hybrid model proposed in this paper is built by combining SR, GA, and deep MLP models. The SR-MLP is represented by a group of sub-trees with heterogeneous units. The role of the mutation is to find a local minimum between the models, taking into account the residuals of the offspring. In contrast to the ensemble models that combine homogeneous models, the proposed predictor merges two heterogeneous tree structures. The first model uses an iteration of mathematical operators, while the second one makes use of multiple neurons and weight/bias adjustments. The final output is obtained in the last layer with the use of a single operator. The application of this technique implies a kind of transfer learning that reduces losses and noise. The dataset undergoes a feature selection process to eliminate missing or erroneous values resulting from sensor damage or recording errors. MLP and SR are then trained separately, and the voting method is used to average the two predictors’ outputs. Extensive simulation analysis has been conducted to identify the ML technique that best meets the performance requirements of the underlying forecasting task. To the authors’ best knowledge, the symbolic regression model has never been combined with other models for PV power forecasting. Moreover, the hybridization of the MLP and SR has never been tailored to perform a prediction task. Figure 5 shows the flowchart of the proposed predictor.

The proposed forecasting algorithm performs as follows: The weather station generates precise information about the meteorological parameters of the PV system, such as temperature, irradiation, and relative humidity. The data collected are fed to the second block, where an extensive feature selection process is employed to clean the data of erroneous values and outliers. The proposed feature selection tool classifies the inputs according to their importance to optimize the data processing. Then, the selected feature vectors are fed to the third block, which represents the SR-MLP model. The proposed model generates monthly PV power forecasts that are to be used in the scheduling of the operative conditions control for the grid utility (Figure 5).
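The final voting step, in which the outputs of the two separately trained predictors are averaged, can be sketched as follows. The two stand-in predictor functions, the feature name `irradiation`, and the numeric values are hypothetical placeholders; in practice they would be replaced by the trained SR and MLP models.

```python
def hybrid_forecast(sr_predict, mlp_predict, features):
    """Average the SR and MLP predictions for each feature vector."""
    return [0.5 * (sr_predict(f) + mlp_predict(f)) for f in features]


# Hypothetical stand-ins for the two trained branches (not the paper's models).
def sr_model(f):
    return 2.0 * f["irradiation"] + 1.0


def mlp_model(f):
    return 2.0 * f["irradiation"] - 1.0


features = [{"irradiation": 0.0}, {"irradiation": 0.5}]
forecast = hybrid_forecast(sr_model, mlp_model, features)
```

In this toy case the two branches err in opposite directions, so the average cancels their individual errors; this is the situation in which averaging helps most.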

4. Case Study

4.1. Features Selection

The dataset includes open-source measurements of meteorological parameters taken with a 5-min time step in a solar farm in Australia [47]. The training was performed using data recorded between 1 January 2017 and 31 December 2018, while the testing was performed using the data from January 2019. To optimize the number of feature inputs and limit the database size in order to speed up the training process, two effective feature selection methods were applied at the same time (Extreme Boosting and Elastic Net) [48,49]. The attribute selection contributes to the system by removing the irrelevant and correlated features from the dataset. Having these techniques combined leads to a more reliable collection (a different way of measuring the parameter magnitude). Figure 6 illustrates the feature selection results, while the irradiation and former PV power values over two years are illustrated in Figure 7 and Figure 8.

According to Figure 6, the horizontal radiation (irradiation) and the former PV power values (historical values recorded at the same time in the previous year) are the most important features compared to the other parameters. Figure 7 shows the high seasonality of the irradiation. However, this seasonality does not match the behavior of the variation of the recorded PV power, which is characterized by high variations and nonlinear trends (Figure 8). This analysis is crucial for the determination of the most significant indicators for the next PV power predictions. It is worth noting that 70% of the data was used for training, while the remaining 30% was devoted to testing and validation.
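The two data-handling steps described in this subsection, attaching the PV power recorded at the same time in the previous year and splitting the records chronologically, can be sketched as follows. The record layout, the field names, and the sample values are assumptions for illustration only; the actual dataset has a 5-min time step.

```python
from datetime import datetime


def add_previous_year_power(records):
    """Attach the power recorded at the same timestamp one year earlier.

    Records without a counterpart in the previous year are dropped.
    """
    by_time = {r["time"]: r["power"] for r in records}
    enriched = []
    for r in records:
        t = r["time"]
        try:
            previous = t.replace(year=t.year - 1)
        except ValueError:
            continue  # 29 February has no counterpart in a non-leap year
        if previous in by_time:
            enriched.append({**r, "power_prev_year": by_time[previous]})
    return enriched


def split_by_date(records, test_start):
    """Chronological split: everything before test_start is training data."""
    train = [r for r in records if r["time"] < test_start]
    test = [r for r in records if r["time"] >= test_start]
    return train, test


# Hypothetical records (one sample per year instead of the real 5-min step).
records = [
    {"time": datetime(2017, 1, 1, 12, 0), "irradiation": 800.0, "power": 50.0},
    {"time": datetime(2018, 1, 1, 12, 0), "irradiation": 820.0, "power": 52.0},
    {"time": datetime(2019, 1, 1, 12, 0), "irradiation": 790.0, "power": 49.0},
]
enriched = add_previous_year_power(records)
train, test = split_by_date(enriched, datetime(2019, 1, 1))
```

A chronological split keeps future samples out of the training set, which matters for time series because a random shuffle would leak information from the test month.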

4.2. Training and Simulation Results

The performance of the proposed model is investigated in this section. For an improved learning process, data were pre-processed to eliminate missing and erroneous data. The resulting data were then normalized to the range between 0 and 1. This re-scaling allowed for a better understanding of the functional (features) behavior of the MLP model in particular. Both training and testing phases were performed using Python. The hyperparameters of each model were selected with the use of a Random Search (RS) method. The RS was selected due to its excellent performance for hyperparameter optimization as compared to several benchmarks, including PSO and Bayesian Optimization (BO). The MLP consisted of 3000 layers, while the SR had 25 iterations. The performance evaluation was implemented through simulation graphs, cross-validation, and score metrics (mean absolute error (MAE), RMSE, and coefficient of determination (R^{2})), as per [50,51,52]:

MAE = \frac{1}{n} \sum_{i=0}^{n-1} | y_{i} - \hat{y}_{i} |

(8)

RMSE = \sqrt{\frac{1}{n} \sum_{i=0}^{n-1} (y_{i} - \hat{y}_{i})^{2}}

(9)

R^{2} = 1 - \frac{\sum_{i=0}^{n-1} (\hat{y}_{i} - y_{i})^{2}}{\sum_{i=0}^{n-1} (\bar{y} - y_{i})^{2}}, \quad \bar{y} = \frac{1}{n} \sum_{i=0}^{n-1} y_{i}

(10)

where n denotes the total number of samples, and \hat{y} and y refer to the forecasted and real observations, respectively. The experimental results were taken from real time-series data. The results of the predicted PV power against the real PV power values over one month are presented in Figure 9.
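The re-scaling to the range between 0 and 1 and the three score metrics of Equations (8)–(10) can be written in a few lines of standard-library Python. This is a minimal sketch: the function names and the toy values are assumptions, and the R^{2} implementation uses the sample mean of the real observations in its denominator.

```python
import math


def min_max_scale(values):
    """Rescale a list of values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]


def mae(y_true, y_pred):
    """Equation (8): mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Equation (9): root mean square error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Equation (10): coefficient of determination."""
    mean_y = sum(y_true) / len(y_true)
    residual = sum((p - y) ** 2 for y, p in zip(y_true, y_pred))
    total = sum((mean_y - y) ** 2 for y in y_true)
    return 1.0 - residual / total


# Hypothetical observations and forecasts (not results from the paper).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
scores = {"MAE": mae(y_true, y_pred), "RMSE": rmse(y_true, y_pred), "R2": r2(y_true, y_pred)}
```

MAE and RMSE are expressed in the units of the forecasted quantity, whereas R^{2} is dimensionless and equals 1 for a perfect forecast.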

As shown in Figure 9, the proposed algorithm presents highly accurate predicted PV power values. It is worth mentioning that the training process only lasted for 6 min. Thus, the proposed algorithm works well with time series data and time periods of up to 1 month. To separately examine the improvement rate of the proposed SR and MLP models, Figure 10 is presented. Additionally, Figure 11 shows a high-resolution zoom of the prediction results using the different models.

As shown in Figure 10 and Figure 11, the proposed SR-MLP technique outperformed the individual SR and MLP algorithms. Moreover, the employed transfer learning, through a voted technique, improved the prediction accuracy. The hybrid tree joined two sub-branches at an average connected leaf point. Using only irradiance and previous PV power values, the model was able to produce accurate estimates. The proposed predictor avoided an overfitting of the system and maintained high efficiency over the prediction horizon.

Figure 12 displays the cross-validation visualization of the proposed approach. The two curves of the real and forecasted PV power nearly have a perfect match, with a small difference during the PV generation peaks. The highest error was registered at time step 43 at 37 kW, while the rest of the forecasting points remained close to the ground truth. Therefore, the proposed approach was successfully cross-validated. Table 2 details the performance metrics of each model, while Figure 13 illustrates their graphical representations.

4.3. Discussions

Table 2 and Figure 13 present the PV power forecasting accuracy of the different predictors in terms of the RMSE, MAE, and R^{2} metrics. One can notice that the proposed SR-MLP algorithm clearly outperforms the individual SR and MLP algorithms. Moreover, it is worth mentioning that parallel computing greatly reduces the computational cost (only 7 min for a 2-year historical database at a 5-min time step, on a LENOVO Ideapad 720S-15IKB computer using Python version 3.7).

The training speed of the hybrid algorithm was significantly improved by eliminating unimportant inputs during the feature selection process performed by the Extreme Boosting and Elastic Net techniques. The training as well as the testing datasets were equally supplied to the system. Moreover, the hyperparameters were preserved throughout the training and testing phases. Thus, the proposed algorithm is characterized by higher forecasting performance as compared to the individual algorithms. Indeed, the reliability of the proposed SR-MLP is based on two predictors, which greatly improves the overall forecasting accuracy. The robustness of the SR-MLP algorithm prevented the SR from converging to local minima with the help of the ANN branch. The system output was computed by averaging the results for each predictor. Further research is needed on the effect of varying the contribution ratio of each predictor on the optimal results. However, the proposed forecasting system is very effective in keeping grid-connected PV systems protected from unexpected disturbances.
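The open question raised above, namely how the contribution ratio of each predictor affects the result, can be explored with a simple grid over a blending weight. The sketch below is one possible way to do this and is not part of the reported method, which uses a plain average; the weight `alpha`, the grid resolution, and the toy predictions are assumptions.

```python
import math


def blend(sr_pred, mlp_pred, alpha):
    """Weighted combination: alpha for the SR branch, 1 - alpha for the MLP."""
    return [alpha * s + (1.0 - alpha) * m for s, m in zip(sr_pred, mlp_pred)]


def rmse(y_true, y_pred):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def best_ratio(sr_pred, mlp_pred, y_true, steps=10):
    """Grid search over alpha in {0, 1/steps, ..., 1} for the lowest RMSE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda a: rmse(y_true, blend(sr_pred, mlp_pred, a)))


# Hypothetical predictions: one branch overshoots by 1, the other undershoots by 1.
y_true = [1.0, 2.0, 3.0]
sr_pred = [2.0, 3.0, 4.0]
mlp_pred = [0.0, 1.0, 2.0]
alpha = best_ratio(sr_pred, mlp_pred, y_true)
```

With these symmetric toy errors the search returns the plain average (alpha = 0.5). On real data the ratio should be chosen on a validation set rather than on the test month, to avoid an optimistic bias.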

5. Conclusions

This paper presented an effective hybrid model that supports the Symbolic Regressor (SR) model in the search for local minima and optimal solutions. The proposed hybrid method consists of a combination of SR and Deep Multi-Layer Perceptron (MLP). At each forecasting time step, the hybrid SR-MLP creates an optimum by averaging the results of each predictor. The proposed SR-MLP is characterized by the following features: (1) its easy implementation and training rapidity (eliminating unimportant inputs during the feature selection process with the use of Extreme Boosting and Elastic Net techniques); (2) a reduced number of layers and neurons while guaranteeing high accuracy; (3) a reduction in the number of iterations due to the use of SR; and (4) the preservation of the hyperparameters throughout the training and testing phases. The presented simulation results demonstrated that the proposed SR-MLP is characterized by its high effectiveness through different test scenarios. The proposed technique could be of significant interest to grid utilities, including unit commitment and economic dispatch. However, if the SR does not inversely follow the MLP, the error will dramatically increase. For this reason, the development of an indicator that creates a warning and guides the mixture to an accurate forecast is needed to prevent the hybrid predictor from providing any misleading information.

Author Contributions

Conceptualization, M.T., M.M., I.C. and L.S.; Methodology, M.T., M.M., L.S. and S.S.R.; Software, M.M.; Validation, M.T., M.M. and I.C.; Formal analysis, M.T., M.M. and F.S.O.; Investigation, M.T., M.M. and S.S.R.; Resources, T.H.; Data curation, M.M.; Writing—original draft, M.T. and M.M.; Writing—review & editing, M.T., I.C., L.S., S.S.R., T.H. and F.S.O.; Visualization, M.T., I.C. and L.S.; Supervision, I.C., L.S. and F.S.O.; Project administration, T.H.; Funding acquisition, T.H. All authors have read and agreed to the published version of the manuscript.

Funding

This publication was made possible by NPRP12C-33905-SP-220 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are the sole responsibility of the authors.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Feldman, D.J.; Margolis, R.M. Q4 2018/Q1 2019 Solar Industry Update; Technical report; National Renewable Energy Lab. (NREL): Golden, CO, USA, 2019.
2. Shi, J.; Lee, W.J.; Liu, Y.; Yang, Y.; Wang, P. Forecasting power output of photovoltaic systems based on weather classification and support vector machines. IEEE Trans. Ind. Appl. 2012, 48, 1064–1069.
3. Guo, B.; Javed, W.; Figgis, B.; Mirza, T. Effect of dust and weather conditions on photovoltaic performance in Doha, Qatar. In Proceedings of the 2015 First Workshop on Smart Grid and Renewable Energy (SGRE), Doha, Qatar, 22–23 March 2015; pp. 1–6.
4. Chaichan, M.T.; Kazem, H.A. Experimental analysis of solar intensity on photovoltaic in hot and humid weather conditions. Int. J. Sci. Eng. Res. 2016, 7, 91–96.
5. Antonanzas, J.; Osorio, N.; Escobar, R.; Urraca, R.; Martinez-de Pison, F.J.; Antonanzas-Torres, F. Review of photovoltaic power forecasting. Sol. Energy 2016, 136, 78–111.
6. Massaoudi, M.; Chihi, I.; Sidhom, L.; Trabelsi, M.; Refaat, S.S.; Abu-Rub, H.; Oueslati, F.S. An effective hybrid NARX-LSTM model for point and interval PV power forecasting. IEEE Access 2021, 9, 36571–36588.
7. Massaoudi, M.; Chihi, I.; Sidhom, L.; Trabelsi, M.; Refaat, S.S.; Oueslati, F.S. Performance Evaluation of Deep Recurrent Neural Networks Architectures: Application to PV Power Forecasting. In Proceedings of the 2019 2nd International Conference on Smart Grid and Renewable Energy (SGRE), Doha, Qatar, 19–21 November 2019; pp. 1–6.
8. Das, U.K.; Tey, K.S.; Seyedmahmoudian, M.; Mekhilef, S.; Idris, M.Y.I.; Van Deventer, W.; Horan, B.; Stojcevski, A. Forecasting of photovoltaic power generation and model optimization: A review. Renew. Sustain. Energy Rev. 2018, 81, 912–928.
9. Yadav, H.K.; Pal, Y.; Tripathi, M.M. Photovoltaic power forecasting methods in smart power grid. In Proceedings of the 2015 Annual IEEE India Conference (INDICON), New Delhi, India, 17–20 December 2015; pp. 1–6.
10. Thevenard, D.; Pelland, S. Estimating the uncertainty in long-term photovoltaic yield predictions. Sol. Energy 2013, 91, 432–445.
11. Wolff, B.; Kühnert, J.; Lorenz, E.; Kramer, O.; Heinemann, D. Comparing support vector regression for PV power forecasting to a physical modeling approach using measurement, numerical weather prediction, and cloud motion data. Sol. Energy 2016, 135, 197–208.
12. Wan, C.; Zhao, J.; Song, Y.; Xu, Z.; Lin, J.; Hu, Z. Photovoltaic and solar power forecasting for smart grid energy management. CSEE J. Power Energy Syst. 2015, 1, 38–46.
13. Ogliari, E.; Dolara, A.; Manzolini, G.; Leva, S. Physical and hybrid methods comparison for the day ahead PV output power forecast. Renew. Energy 2017, 113, 11–21.
14. Dolara, A.; Leva, S.; Manzolini, G. Comparison of different physical models for PV power output prediction. Sol. Energy 2015, 119, 83–99.
15. Wang, K.; Qi, X.; Liu, H. A comparison of day-ahead photovoltaic power forecasting models based on deep learning neural network. Appl. Energy 2019, 251, 113315.
16. Zheng, D.; Semero, Y.K.; Zhang, J.; Wei, D. Short-term wind power prediction in microgrids using a hybrid approach integrating genetic algorithm, particle swarm optimization, and adaptive neuro-fuzzy inference systems. IEEJ Trans. Electr. Electron. Eng. 2018, 13, 1561–1567.
17. Fentis, A.; Bahatti, L.; Tabaa, M.; Mestari, M. Short-term nonlinear autoregressive photovoltaic power forecasting using statistical learning approaches and in-situ observations. Int. J. Energy Environ. Eng. 2019, 10, 189–206.
18. Mellit, A.; Kalogirou, S.A. Artificial intelligence techniques for photovoltaic applications: A review. Prog. Energy Combust. Sci. 2008, 34, 574–632.
19. Akhter, M.N.; Mekhilef, S.; Mokhlis, H.; Shah, N.M. Review on forecasting of photovoltaic power generation based on machine learning and metaheuristic techniques. IET Renew. Power Gener. 2019, 13, 1009–1023.
Kuo, W.C.; Chen, C.H.; Chen, S.Y.; Wang, C.C. Deep Learning Neural Networks for Short-Term PV Power Forecasting via Sky Image Method. Energies 2022, 15, 4779. [Google Scholar] [CrossRef]
Hussain, M.; Al-Aqrabi, H.; Hill, R. Statistical Analysis and Development of an Ensemble-Based Machine Learning Model for Photovoltaic Fault Detection. Energies 2022, 15, 5492. [Google Scholar] [CrossRef]
Said, Z.; Sharma, P.; Aslfattahi, N.; Ghodbane, M. Experimental analysis of novel ionic liquid-MXene hybrid nanofluid’s energy storage properties: Model-prediction using modern ensemble machine learning methods. J. Energy Storage 2022, 52, 104858. [Google Scholar] [CrossRef]
Jamii, J.; Trabelsi, M.; Mansouri, M.; Mimouni, M.F.; Shatanawi, W. Non-Linear Programming-Based Energy Management for a Wind Farm Coupled with Pumped Hydro Storage System. Sustainability 2022, 14, 11287. [Google Scholar] [CrossRef]
Kumari, P.; Toshniwal, D. Extreme gradient boosting and deep neural network based ensemble learning approach to forecast hourly solar irradiance. J. Clean. Prod. 2021, 279, 123285. [Google Scholar] [CrossRef]
Ramsami, P.; Oree, V. A hybrid method for forecasting the energy output of photovoltaic systems. Energy Convers. Manag. 2015, 95, 406–413. [Google Scholar] [CrossRef]
Acharya, S.K.; Wi, Y.M.; Lee, J. Day-Ahead Forecasting for Small-Scale Photovoltaic Power Based on Similar Day Detection with Selective Weather Variables. Electronics 2020, 9, 1117. [Google Scholar] [CrossRef]
Son, N.; Jung, M. Analysis of Meteorological Factor Multivariate Models for Medium-and Long-Term Photovoltaic Solar Power Forecasting Using Long Short-Term Memory. Appl. Sci. 2021, 11, 316. [Google Scholar] [CrossRef]
Kim, Y.; Seo, K.; Harrington, R.J.; Lee, Y.; Kim, H.; Kim, S. High accuracy modeling for solar PV power generation using Noble BD-LSTM-based neural networks with EMA. Appl. Sci. 2020, 10, 7339. [Google Scholar] [CrossRef]
Gigoni, L.; Betti, A.; Crisostomi, E.; Franco, A.; Tucci, M.; Bizzarri, F.; Mucci, D. Day-ahead hourly forecasting of power generation from photovoltaic plants. IEEE Trans. Sustain. Energy 2017, 9, 831–842. [Google Scholar] [CrossRef]
Semero, Y.K.; Zhang, J.; Zheng, D. PV power forecasting using an integrated GA-PSO-ANFIS approach and Gaussian process regression based feature selection strategy. CSEE J. Power Energy Syst. 2018, 4, 210–218. [Google Scholar] [CrossRef]
Yang, H.T.; Huang, C.M.; Huang, Y.C.; Pai, Y.S. A weather-based hybrid method for 1-day ahead hourly forecasting of PV power output. IEEE Trans. Sustain. Energy 2014, 5, 917–926. [Google Scholar] [CrossRef]
Wang, G.; Su, Y.; Shu, L. One-day-ahead daily power forecasting of photovoltaic systems based on partial functional linear regression models. Renew. Energy 2016, 96, 469–478. [Google Scholar] [CrossRef]
Shuvho, M.B.A.; Chowdhury, M.A.; Ahmed, S.; Kashem, M.A. Prediction of solar irradiation and performance evaluation of grid connected solar 80KWp PV plant in Bangladesh. Energy Rep. 2019, 5, 714–722. [Google Scholar] [CrossRef]
Yang, M.; Huang, X. Ultra-short-term prediction of photovoltaic power based on periodic extraction of PV energy and LSH algorithm. IEEE Access 2018, 6, 51200–51205. [Google Scholar] [CrossRef]
Lee, W.; Kim, K.; Park, J.; Kim, J.; Kim, Y. Forecasting solar power using long-short term memory and convolutional neural networks. IEEE Access 2018, 6, 73068–73080. [Google Scholar] [CrossRef]
Asrari, A.; Wu, T.X.; Ramos, B. A hybrid algorithm for short-term solar power prediction—Sunshine state case study. IEEE Trans. Sustain. Energy 2016, 8, 582–591. [Google Scholar] [CrossRef]
Wang, F.; Pang, S.; Zhen, Z.; Li, K.; Ren, H.; Shafie-Khah, M.; Catalão, J.P. Pattern classification and pso optimal weights based sky images cloud motion speed calculation method for solar pv power forecasting. In Proceedings of the 2018 IEEE Industry Applications Society Annual Meeting (IAS), Portland, OR, USA, 23–27 September 2018; pp. 1–9. [Google Scholar]
Hokoi, S.; Matsumoto, M.; Ihara, T. Statistical time series models of solar radiation and outdoor temperature—Identification of seasonal models by Kalman filter. Energy Build. 1990, 15, 373–383. [Google Scholar] [CrossRef]
Cohen, I.R. Updating Darwin: Information and entropy drive the evolution of life. F1000Research 2016, 5, 2808. [Google Scholar] [CrossRef]
Massaoudi, M.; Abu-Rub, H.; Refaat, S.S.; Chihi, I.; Oueslati, F.S. Deep learning in smart grid technology: A review of recent advancements and future prospects. IEEE Access 2021, 9, 54558–54578. [Google Scholar] [CrossRef]
Elsheikh, A.H.; Sharshir, S.W.; Abd Elaziz, M.; Kabeel, A.; Guilan, W.; Haiou, Z. Modeling of solar energy systems using artificial neural network: A comprehensive review. Sol. Energy 2019, 180, 622–639. [Google Scholar] [CrossRef]
Isa, I.S.; Omar, S.; Saad, Z.; Noor, N.M.; Osman, M.K. Weather forecasting using photovoltaic system and neural network. In Proceedings of the 2010 2nd International Conference on Computational Intelligence, Communication Systems and Networks, Liverpool, UK, 28–30 July 2010; pp. 96–100. [Google Scholar]
Massaoudi, M.; Refaat, S.S.; Chihi, I.; Trabelsi, M.; Oueslati, F.S.; Abu-Rub, H. A novel stacked generalization ensemble-based hybrid LGBM-XGB-MLP model for Short-Term Load Forecasting. Energy 2021, 214, 118874. [Google Scholar] [CrossRef]
Koza, J.R.; Poli, R. Genetic programming. In Search Methodologies; Springer: Berlin/Heidelberg, Germany, 2005; pp. 127–164. [Google Scholar]
Brabazon, A.; O’Neill, M.; McGarraghy, S. Natural Computing Algorithms; Springer: Berlin/Heidelberg, Germany, 2015; Volume 554. [Google Scholar]
Bhuvaneswari, G.; Annamalai, R. Development of a solar cell model in MATLAB for PV based generation system. In Proceedings of the 2011 Annual IEEE India Conference, Hyderabad, India, 16–18 December 2011; pp. 1–5. [Google Scholar]
DKA Solar Centre. Available online: http://dkasolarcentre.com (accessed on 23 September 2019).
Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2005, 67, 301–320. [Google Scholar] [CrossRef]
Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K. Xgboost: Extreme Gradient Boosting; R package version 0.4-2. 2015. Available online: https://cran.microsoft.com/snapshot/2017-12-11/web/packages/xgboost/vignettes/xgboost.pdf (accessed on 23 September 2019).
Shah, I.; Iftikhar, H.; Ali, S. Modeling and Forecasting Electricity Demand and Prices: A Comparison of Alternative Approaches. J. Math. 2022, 2022, 3581037. [Google Scholar] [CrossRef]
Shah, I.; Jan, F.; Ali, S. Functional data approach for short-term electricity demand forecasting. Math. Probl. Eng. 2022, 2022, 6709779. [Google Scholar] [CrossRef]
Lisi, F.; Shah, I. Forecasting next-day electricity demand and prices based on functional models. Energy Syst. 2020, 11, 947–979. [Google Scholar] [CrossRef]

Figure 1. Binary genetic tree programming representation: $f(x, y) = \sin\left(\frac{\pi x + 0.5 + y}{z}\right)$, $z \in \mathbb{R}^{*}$.
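
The tree encoding named in the Figure 1 caption can be illustrated with a minimal Python sketch. The node layout, the `Node` class, and the `evaluate` helper are assumptions made for illustration only; the exact tree shape drawn in Figure 1 is not recoverable from the caption, and `z` is treated here as a non-zero value supplied by the caller.

```python
import math
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Node:
    """One node of a binary expression tree (hypothetical helper)."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


# Function set and constants assumed for this sketch.
BINARY = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}
UNARY = {"sin": math.sin}
CONSTANTS = {"pi": math.pi}


def evaluate(node: Node, env: Mapping[str, float]) -> float:
    """Recursively evaluate the tree for the variable values in env."""
    if node.label in BINARY:
        return BINARY[node.label](evaluate(node.left, env),
                                  evaluate(node.right, env))
    if node.label in UNARY:
        # Unary operators only use the left child.
        return UNARY[node.label](evaluate(node.left, env))
    if node.label in CONSTANTS:
        return CONSTANTS[node.label]
    if node.label in env:
        return env[node.label]
    # Any other leaf is read as a numeric literal such as "0.5".
    return float(node.label)


# One possible tree for f(x, y) = sin((pi * x + 0.5 + y) / z).
numerator = Node("+",
                 Node("*", Node("pi"), Node("x")),
                 Node("+", Node("0.5"), Node("y")))
tree = Node("sin", Node("/", numerator, Node("z")))

value = evaluate(tree, {"x": 1.0, "y": 0.0, "z": 2.0})
print(value)
```

In a genetic programming run, crossover and mutation would swap or replace subtrees of such a structure; that step is not shown here.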

Figure 2. MLP model architecture.

Figure 3. GP flowchart.

Figure 4. Typical 1-day PV power generation versus the temperature and irradiation.

Figure 5. Flowchart of the proposed PV power forecasting algorithm.

Figure 6. Coefficients of the nonlinear correlation between the PV power and related system attributes.

Figure 7. Recorded irradiation variation over 2 years (W/m²).

Figure 8. Recorded PV power variation over 2 years (kW).

Figure 9. One-month predicted vs. real PV power (kW).

Figure 10. Performance investigation of the PV power forecasting using different models.

Figure 11. One-day PV power forecasting using the different models (kW) (high resolution).

Figure 12. Snapshot of the SR-MLP cross-validation over the range [0, 100] (high resolution).

Figure 13. MAE and RMSE comparison.

Table 1. State-of-the-art PV power forecasting techniques.

Model	Reference	Score Metrics	Lowest Score	Dataset
XGBF $^{1}$ -DNN $^{2}$	[24]	RMSE, MBE $^{3}$ , FS $^{4}$	RMSE = 51.35 W	PV data in Limburg, Belgium
SR-FFNN	[25]	RMSE, MBE $^{3}$ , MAE, $R^{2}$	$R^{2}$ = 0.932	Solar power in Flanders, Belgium
LSTM	[26]	NMAE, RMSE	RMSE = 38.13 kWh	1 MW PV site in Goheung, Korea
Modified LSTM	[27]	MAE, RMSE	RMSE = 0.55 kW	Ansan, Gyeonggi-do, Korea
LSTM-EMA $^{5}$	[28]	RMSE, $R^{2}$ , MAPE	$R^{2}$ = 0.96	Yeonseong-gun, Gyeonggi-do, South Korea
ENS $^{6}$	[29]	NRMSE, nMBE, MAE, nMAE	MAE = 74.1 kW	32 PV plants installed at different latitudes in Italy
GA-PSO-ANFIS	[30]	RMSE, MAE, NMAE, FS $^{4}$	RMSE = 2.08 kW	Goldwind microgrid system in Beijing
SOM $^{7}$ , LVQ $^{8}$ , SVR $^{9}$	[31]	MRE $^{10}$ and RMSE	MRE = 1.79%	Taiwan Central Weather Bureau
PFLRM $^{11}$	[32]	RMSE, MAD $^{12}$ , MAPE	RMSE = 59.38 kW	Coloane island of Macau
ANN	[33]	RMSE, $R^{2}$	$R^{2}$ = 0.999	Solar power plant in Dhaka
LSH $^{13}$	[34]	RMSE, MRE, QR $^{14}$	RMSE = 4.23 kW	PV power station in Ashland
AE $^{15}$ -LSTM	[35]	MAPE, RMSE, MAE	RMSE = 0.14 kW	PV inverter installed in Haenam, South Korea
SFLA $^{16}$ -ANN	[36]	MAPE	MAPE = 5.38%	PV sites in Florida
PCPOW $^{17}$	[37]	$R^{2}$	$R^{2}$ = 0.938	Yunnan Electric Power Research Institute

¹ Extreme Gradient Boosting Forest. ² Deep Neural Network. ³ Mean Bias Error. ⁴ Forecast Skill. ⁵ Exponential Moving Average. ⁶ Ensemble of Methods. ⁷ Self-Organizing Map. ⁸ Learning Vector Quantization. ⁹ Support Vector Regression. ¹⁰ Mean Relative Error. ¹¹ Partial Functional Linear Regression Model. ¹² Mean Absolute Deviation. ¹³ Locality-Sensitive Hashing. ¹⁴ QR pass rate. ¹⁵ Auto-Encoder. ¹⁶ Shuffled Frog Leaping Algorithm. ¹⁷ PSO-based sky images cloud motion speed calculation method for PV power.

Table 2. Comparison of Score Metrics.

Metric	SR	MLP	SR-MLP
RMSE (kW)	7.21	6.48	5.58
MAE (kW)	4.92	3.81	3.3
$R^{2}$	0.988	0.990	0.993
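
As a quick check on how the Table 2 scores relate to the error reductions quoted in the abstract, the relative reduction of the hybrid model against each baseline can be computed directly from the tabulated values. The sketch below uses only the numbers in Table 2; the `relative_reduction` helper and the dictionary layout are illustrative names, not part of the authors' code.

```python
# Scores copied from Table 2 (kW).
scores = {
    "RMSE": {"SR": 7.21, "MLP": 6.48, "SR-MLP": 5.58},
    "MAE": {"SR": 4.92, "MLP": 3.81, "SR-MLP": 3.3},
}


def relative_reduction(baseline: float, hybrid: float) -> float:
    """Percentage by which the hybrid error is lower than the baseline error."""
    return 100.0 * (baseline - hybrid) / baseline


# Reduction of the SR-MLP error relative to each standalone model.
reductions = {
    (metric, base): relative_reduction(row[base], row["SR-MLP"])
    for metric, row in scores.items()
    for base in ("SR", "MLP")
}

for (metric, base), pct in reductions.items():
    print(f"{metric} reduction vs. {base}: {pct:.1f}%")
```

Relative to the standalone SR model, the reductions come out at roughly 22.6% for RMSE and 32.9% for MAE, which matches the "more than 20%" and "30%" figures in the abstract. Relative to the standalone MLP, they are smaller, roughly 13.9% and 13.4%.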

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Trabelsi, M.; Massaoudi, M.; Chihi, I.; Sidhom, L.; Refaat, S.S.; Huang, T.; Oueslati, F.S. An Effective Hybrid Symbolic Regression–Deep Multilayer Perceptron Technique for PV Power Forecasting. Energies 2022, 15, 9008. https://doi.org/10.3390/en15239008
