Article

Day-Ahead Hourly Solar Photovoltaic Output Forecasting Using SARIMAX, Long Short-Term Memory, and Extreme Gradient Boosting: Case of the Philippines

by Ian B. Benitez 1, Jessa A. Ibañez 1, Cenon III D. Lumabad 1, Jayson M. Cañete 1 and Jeark A. Principe 2,*
1 National Engineering Center, University of the Philippines Diliman, Quezon City 1101, Philippines
2 Department of Geodetic Engineering, University of the Philippines Diliman, Quezon City 1101, Philippines
* Author to whom correspondence should be addressed.
Energies 2023, 16(23), 7823; https://doi.org/10.3390/en16237823
Submission received: 3 October 2023 / Revised: 16 October 2023 / Accepted: 20 October 2023 / Published: 28 November 2023
(This article belongs to the Section A2: Solar Energy and Photovoltaic Systems)

Abstract
This study explores the forecasting accuracy of SARIMAX, LSTM, and XGBoost models in predicting solar PV output using one-year data from three solar PV installations in the Philippines. The research compares the performance of these individual models with that of their hybrid counterparts. The study utilizes the adjusted shortwave radiation (SWR) product from the Advanced Himawari Imager 8 (AHI-8) as a proxy for in situ solar irradiance, together with weather parameters, to improve the accuracy of the forecasting models. The results show that SARIMAX outperforms LSTM, XGBoost, and their combinations for Plants 1 and 2, while XGBoost performs best for Plant 3. Contrary to previous studies, the hybrid models did not provide more accurate forecasts than the individual methods. The performance of the models varied depending on the forecasted month and installation site. Using adjusted SWR and other weather parameters as inputs in forecasting solar PV output adds novelty to this research. Future research should compare the accuracy obtained using adjusted SWR alone and combined with other weather parameters. This study contributes to solar PV output forecasting by utilizing adjusted satellite-derived solar radiation and combining SARIMAX, LSTM, and XGBoost models, including their hybrid counterparts, in a single, comprehensive analysis.

1. Introduction

Solar photovoltaic (PV) systems are one of the most widely used renewable energy (RE) technologies, contributing to global RE generation targets [1,2]. As one of the least costly new alternatives for electricity generation, solar PV is anticipated to spur investments in the next few years. In 2021 alone, new solar PV plants were reported to account for about 52% of global solar capacity [3]. While policy support drives solar PV deployment globally, one of the main challenges to integrating solar PV into the electricity grid is its variable and intermittent nature, resulting in technical and economic challenges [4,5,6]. Furthermore, solar power production depends on weather conditions, such as the temperature, humidity, wind speed, cloud cover, and solar irradiance, which can vary significantly over short periods [7,8,9,10,11,12,13,14]. This variability makes it difficult for grid operators to balance supply and demand in real time. However, solar PV output forecasting can help ensure grid stability by allowing grid operators to accurately predict the amount of energy their PV systems will produce during a given period. Moreover, accurate solar PV forecasts help reduce the need for balancing supply and demand and, thereby, optimize operations.
There are two main approaches to solar PV output forecasting: indirect and direct. On one hand, using the indirect approach, solar irradiance is forecasted and then used to predict solar PV output. On the other hand, the direct approach forecasts solar PV output directly using historical data [14]. Both approaches use various methods that are widely discussed in the literature, such as statistical methods, machine learning, a combination of two or more methods (i.e., hybrid), and optimization techniques, to improve forecasting model accuracy [7,14,15,16,17,18,19].
Statistical methods, which are data driven and rely on historical data, are commonly used for solar PV output forecasting [15]. It is most preferred when data availability is limited, and the relationships between the variables of interest are already well understood [16]. Meanwhile, machine learning (ML) methods use algorithms that learn data patterns and perform predictions based on these patterns [15,16]. ML methods are preferred when data is extensive, and the relationships between the variables of interest are complex and need to be understood [16]. Although statistical models are good at capturing trends and seasonality, machine learning models often achieve higher accuracy because they are better at handling complex patterns [15]. Hence, combining statistical and ML methods, as in hybrid models, often provides more accurate forecasts than any single method alone [14,15,16]. Table 1 lists recent studies that used statistical, machine learning, and hybrid models to predict solar PV output, as well as the location of the solar PV plants, the train-test ratio, and the error metrics used in each study.
Several studies demonstrated the satisfactory performance of extreme gradient boosting (XGBoost) for solar PV forecasting. In the study by Grzebyk et al. [10], XGBoost performed better than the commercial software Solar Monkey in forecasting solar PV output. Meanwhile, in the study by Zhong and Wu [21], XGBoost outperformed an artificial neural network (ANN) and a long short-term memory (LSTM) network. Dimitropoulos et al. [22] showed that XGBoost outperformed LSTM, support vector regression (SVR), and multiple linear regression (MLR), which they employed to predict solar PV output. In the study by Sharma et al. [20], the LSTM with a Nadam optimizer ranked first in improving the forecasting accuracy for solar PV output, followed by the autoregressive integrated moving average (ARIMA) and the seasonal ARIMA with exogenous variables (SARIMAX). In forecasting multisite solar PV output, Kim et al. [9] found that the hybrid model SARIMAX-LSTM ranked first in improving forecasting accuracy, followed by the random forest (RF), deep neural network (DNN), SARIMAX, LSTM, and linear SVR. Another study [25] that looked into predicting load demand also found that the SARIMAX-LSTM model ranked first in improving the forecasting accuracy, followed by the LSTM, SARIMAX with back propagation (SARIMAX-BP), and SARIMAX. These studies show that while some single models perform better than others, their hybrid counterparts outperform them.
Since most of the studies in Table 1 showed that SARIMAX, LSTM, and XGBoost outperform other methods, this work aims to investigate their forecasting accuracy in predicting solar PV output using data specific to the Philippines and compare them with their hybrid counterparts. There is no consensus in the existing literature on the dataset period, train-test ratio, and error metrics used. Since solar PV output is dependent on weather conditions, which are also location specific, it is important to examine how these methodologies affect the solar PV output forecasting accuracy when used to predict a day-ahead forecast for identified locations in the Philippines. What makes the Philippines a relevant case study for solar PV output forecasting is that the country has been experiencing growth in its solar PV market, requiring accurate forecasting for optimal energy utilization. Additionally, solar PV output forecasting models developed for other regions may not be directly applicable to the unique conditions in the Philippines. Hence, locally adapted forecasting models considering the unique conditions in the Philippines are needed for the effective integration of solar PV systems into the energy grid.
At the time of writing, no existing literature has investigated solar PV output forecasting using SARIMAX, LSTM, XGBoost, and their hybrid counterparts in one analysis. Hence, this work aims to fill this gap by developing locally adapted forecasting models to accurately predict the output power of specified solar PV power plants in the Philippines. This work also offers novelty by using the adjusted Advanced Himawari Imager (AHI-8) shortwave radiation (SWR) for solar irradiance, hereon called R’, which uses a cloud optical thickness (CLOT)-derived correction factor and is well-documented in the works by Sotto et al. (2023) [26] and Principe and Takeuchi (2019) [27]. Their methodology has improved the consistency of SWR values in solar PV potential assessment by lessening the variations caused by weather and clouds. The use of reanalysis data for the weather parameters is due to the unavailability of in situ data. This study is also limited to the solar PV output obtained from solar PV installations in the Philippines, the time series covering the period from January 2021 to December 2021, and parameters with an hourly temporal resolution. The pre-processing of R’ values is also not covered in this study.

2. Materials and Methods

This section presents the methodology used in this study to develop solar PV forecasting models using statistical, machine learning, and hybrid techniques. Figure 1 shows the process flow from the data processing, feature engineering, forecasting, and model evaluation.

2.1. Data

2.1.1. Solar PV Output Data

This study considered solar PV output data from 45 solar PV installations in the Philippines, 8 of which were from the Philippines Department of Energy (DOE), 2 from university installations, 1 from industry, and 32 from an online database. The data sets were further narrowed down based on the completeness of the data from January 2021 to December 2021 (one year), since forecast accuracy heavily relies on the quality of historical data [15].

2.1.2. Weather Parameters and Solar Irradiance

The fifth generation of the ECMWF’s global climate and weather reanalysis (ERA5) provides hourly estimates for a wide range of atmospheric, oceanic, and land–surface parameters. It replaces the ERA-Interim reanalysis by combining model data with global observations, according to physical laws [28]. This study used the ERA5 data on the u and v components of wind, ambient and dewpoint temperature, total precipitation, and different levels of cloud cover (low, medium, high, and total). Wind speed, wind direction, and relative humidity were obtained using Equations (1)–(3):
$ws = \sqrt{u^2 + v^2}$ (1)
$wd = \mathrm{mod}\!\left(180 + \frac{180}{\pi}\,\mathrm{atan2}(v, u),\ 360\right)$ (2)
$rh = 100 \cdot \exp\!\left(\frac{17.625\, t_d}{243.04 + t_d}\right) \Big/ \exp\!\left(\frac{17.625\, t}{243.04 + t}\right)$ (3)
where $u$ and $v$ are the 10 m u and v wind components [29], $t_d$ is the dewpoint temperature, and $t$ is the ambient temperature [30]. Table 2 shows a summary of the exogenous variables used in the forecasting models and the related literature, which used the same variables in solar PV studies.
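As an illustration, the sketch below applies Equations (1)–(3) to ERA5 fields in Python. The variable names (u10, v10, t2m, d2m) and the assumption that the temperatures have already been converted from Kelvin to degrees Celsius are ours, not the study's.

```python
import numpy as np

def derive_weather_features(u10, v10, t2m, d2m):
    """Compute wind speed, wind direction, and relative humidity from ERA5 fields.

    u10, v10: 10 m wind components (m/s); t2m, d2m: ambient and dewpoint
    temperature, assumed to be in degrees Celsius.
    """
    ws = np.sqrt(u10**2 + v10**2)                                   # Equation (1)
    wd = np.mod(180 + (180 / np.pi) * np.arctan2(v10, u10), 360)    # Equation (2)
    # Magnus-form saturation vapour pressure ratio, Equation (3)
    rh = 100 * np.exp(17.625 * d2m / (243.04 + d2m)) / np.exp(17.625 * t2m / (243.04 + t2m))
    return ws, wd, rh
```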

2.2. Data Processing and Feature Engineering

2.2.1. Outlier Detection and Data Gaps Filling

To ensure the quality of the input data for the models, outliers were removed and data gaps were filled using a modified column mean imputation (CMI) [38]. The modified CMI employed in this work begins by searching the entire dataset for solar irradiance (R’) values for the same hour. The next step is to scan these R’ values and determine whether they fall within the acceptable range of R’ at the timestamp of the missing PV output data. If they are outside of the acceptable range, the corresponding solar PV output data are discarded. The same steps are followed for each meteorological parameter, and the process loop is repeated until the gap is filled. This process loop employs a limiting multiplier with an increment of 1% up to 10%. Once the solar PV output of all remaining records passes the set parameters, it is averaged and used as a stand-in for the missing PV plant output data. The weather parameters were first evaluated using the Pearson correlation coefficient, and only the statistically significant parameters were used in the next step. The missing solar PV output data were supplemented by scanning and averaging the values of the solar PV output data and the significant weather parameters from the same timestamp. Studies also show that removing outliers and reducing the missing values in the data set enhances forecasting model performance [9,24].
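The simplified sketch below illustrates the gap-filling idea only; the full method is described in [38]. The data layout, the column names, and the way the tolerance band around R’ is widened in 1% steps up to 10% are our assumptions.

```python
import numpy as np
import pandas as pd

def fill_pv_gap(df: pd.DataFrame, ts: pd.Timestamp, max_widen=0.10, step=0.01):
    """Simplified sketch of the modified CMI gap filling: for a timestamp `ts`
    with missing PV output, average the PV output of other records at the same
    hour whose R' lies within a tolerance band around R' at `ts`.

    `df` is assumed to have a DatetimeIndex and columns 'pv_output' and 'r_prime'.
    """
    target_r = df.loc[ts, "r_prime"]
    same_hour = df[(df.index.hour == ts.hour) & df["pv_output"].notna()]
    for widen in np.arange(step, max_widen + step, step):
        lo, hi = target_r * (1 - widen), target_r * (1 + widen)
        candidates = same_hour[same_hour["r_prime"].between(lo, hi)]
        if not candidates.empty:
            return candidates["pv_output"].mean()
    return np.nan  # leave the gap if no comparable records are found
```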

2.2.2. Decomposition

The decomposition of time series data is a prerequisite for employing an ARIMA process, where the seasonality and trend are removed and only the residuals are retained for analysis. If the seasonality and trend are not removed, the variance and mean of the time series keep changing over time, which may result in spurious forecasting accuracy results. This study used the multiplicative model in decomposing all of the time series data.
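A minimal sketch of this step is shown below, assuming an hourly pandas Series named pv with a DatetimeIndex. The daily period of 24 is our assumption, and because a multiplicative model requires strictly positive values, the sketch clips night-time zeros to a small positive value; how the study handled this is not stated.

```python
from statsmodels.tsa.seasonal import seasonal_decompose

# pv: hourly solar PV output as a pandas Series with a DatetimeIndex (assumed).
pv_pos = pv.clip(lower=1e-3)  # multiplicative decomposition needs positive values

result = seasonal_decompose(pv_pos, model="multiplicative", period=24)
residuals = result.resid.dropna()  # residual component retained for the ARIMA step
```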

2.2.3. Feature Selection

When dealing with multiple variables in a regression analysis, it is important to address multicollinearity, where one independent variable, x1, could already explain another variable, x2. If highly correlated variables are retained in a regression analysis, specification errors will occur since there would be no way to identify whether the variation in the dependent variable y is attributable to the unique variation in x1 or in x2. To address this problem, this study employed the variance inflation factor (VIF) to exclude redundant variables: when an independent variable has a VIF greater than 5, it is considered largely explained by the other variables and is removed. After all conditions were satisfied and all redundant variables removed, only the remaining variables were used in the forecasting models.
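The sketch below shows an iterative VIF filter consistent with the threshold of 5 described above; the function name and the drop-one-variable-at-a-time strategy are our assumptions.

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

def drop_high_vif(X: pd.DataFrame, threshold: float = 5.0) -> pd.DataFrame:
    """Iteratively drop the variable with the largest VIF until all VIFs <= threshold."""
    X = X.copy()
    while True:
        vifs = pd.Series(
            [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
            index=X.columns,
        )
        if vifs.max() <= threshold:
            return X
        X = X.drop(columns=[vifs.idxmax()])  # remove the most redundant variable
```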

2.2.4. Unit Root Testing

With regression analysis, it is also important to address non-stationarity in the form of unit roots. Otherwise, the analysis could produce unreliable coefficient estimates and spurious significance test results. To identify whether the variables considered in this study contain a unit root, this work employed three existing and well-documented methods, namely, the Augmented Dickey–Fuller (ADF) test [39], the Phillips–Perron (PP) test [40], and the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test [41]. Both the ADF and PP tests share the null hypothesis that the time series contains a unit root, while the KPSS test has the null hypothesis that the series is stationary.
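The ADF and KPSS tests are available in statsmodels, as sketched below; the PP test is not included there and would require a separate package (for example, arch), which we only note as an assumption rather than the study's actual tooling.

```python
from statsmodels.tsa.stattools import adfuller, kpss

def stationarity_checks(series):
    """Run ADF and KPSS tests on a time series.

    ADF null hypothesis: the series has a unit root.
    KPSS null hypothesis: the series is stationary.
    A stationary series should reject the ADF null and fail to reject the KPSS null.
    """
    clean = series.dropna()
    adf_stat, adf_p, *_ = adfuller(clean, autolag="AIC")
    kpss_stat, kpss_p, *_ = kpss(clean, regression="c", nlags="auto")
    return {"adf_p": adf_p, "kpss_p": kpss_p}
```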

2.2.5. Data Splitting

There has been no consensus on what train-test ratio should be used in forecasting time series data. For this study, the models were built with a train-test split of 80–20, following the study by Gholamy et al. [42], whose empirical results suggest that allocating 70–80% of the data for training and 20–30% for testing yields the best results. To further evaluate the accuracy of the derived models, all the models were tested to forecast the 15th day of every month for each location identified in this study.
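Because the data are a time series, the split must preserve chronological order rather than shuffle the rows. The small sketch below shows one way to do this; the variable `data` stands for an hourly per-plant dataset and is an assumption.

```python
def chronological_split(df, train_frac: float = 0.8):
    """Split a time-ordered DataFrame into train and test sets without shuffling,
    so the test period always follows the training period."""
    cut = int(len(df) * train_frac)
    return df.iloc[:cut], df.iloc[cut:]

# `data` is the hourly dataset for one plant (assumed variable name).
train, test = chronological_split(data)
```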

2.3. Forecasting Techniques

This study developed locally adapted forecasting models using SARIMAX, LSTM, and XGBoost, and their hybrid counterparts in one analysis. Each technique is discussed in the following subsections.

2.3.1. SARIMAX

The seasonal autoregressive integrated moving average with exogenous variables (SARIMAX) model is well documented in the literature; it has been used by Manigandan et al. (2021) to forecast natural gas production and consumption in the United States [43], by Au et al. (2020) to forecast power consumption in Pennsylvania during the COVID-19 pandemic [44], and by Xie et al. (2013) to forecast day-ahead electricity spot market prices in Sweden [45]. SARIMAX is a variation of the autoregressive integrated moving average (ARIMA) model that accounts for exogenous variables related to the response variable in the regression model. It is defined in Equation (4):
$\phi_p(B)\,\Phi_P(B^s)\,Y_t = \alpha + \beta_k X_{k,t} + \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t$ (4)
where $\phi_p(B)$ is the nonseasonal AR(p) term, $\Phi_P(B^s)$ is the seasonal AR(P) term, $\alpha$ is a constant term, $\beta_k X_{k,t}$ is the exogenous variable of the kth input at time $t$, $\theta_q(B)$ is the nonseasonal MA(q) term, $\Theta_Q(B^s)$ is the seasonal MA(Q) term, and $\varepsilon_t$ is the error term. SARIMAX models were identified using the pmdarima.arima.auto_arima library in Python [46]. Table 3 shows the summary of the SARIMAX hyperparameters used in the study.
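A hedged sketch of the automatic order selection with pmdarima is shown below. The seasonal period m=24 (daily seasonality of hourly data), the variable names, and the keyword names (which follow recent pmdarima versions) are our assumptions; the hyperparameter grid actually used is the one listed in Table 3.

```python
import pmdarima as pm

# y_train: solar PV output (training portion); X_train/X_test: exogenous
# weather variables and R' aligned with the target (assumed variable names).
model = pm.auto_arima(
    y_train,
    X=X_train,                 # exogenous regressors
    seasonal=True, m=24,       # assumed daily seasonality for hourly data
    stepwise=True,
    suppress_warnings=True,
    error_action="ignore",
)

# Day-ahead forecast over the test horizon, conditioned on the exogenous inputs.
forecast = model.predict(n_periods=len(y_test), X=X_test)
```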

2.3.2. Long Short-Term Memory

Long short-term memory (LSTM) is one of the most common deep learning techniques utilized for solar PV output forecasting [20,21,22]. LSTM is a recurrent neural network (RNN) architecture that has emerged as a basic architecture for time series data analysis and forecasting. LSTM excels at capturing long-term dependencies and temporal patterns and is, therefore, recommended when processing sequential data with inherent time dependence. By combining memory cells and gating processes, LSTM models successfully maintain and update information over long periods, allowing them to replicate complex relationships within time series data [46]. This capability is practical in applications such as forecasting solar PV output. Table 4 shows the summary of LSTM hyperparameters used in the study.
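To make the architecture concrete, the sketch below builds a small LSTM forecaster in Keras. The window length, layer size, feature count, and training settings are placeholders rather than the tuned values in Table 4, and the random arrays only stand in for the real windowed training sequences.

```python
import numpy as np
from tensorflow import keras

lookback, n_features = 24, 10  # assumed 24 h input window over 10 features

# Placeholder data with shape (samples, lookback, n_features); replace with the
# real windowed sequences of PV output, R', and weather features.
X_seq = np.random.rand(500, lookback, n_features)
y_seq = np.random.rand(500)

model = keras.Sequential([
    keras.layers.Input(shape=(lookback, n_features)),
    keras.layers.LSTM(64),      # memory cells with gating, as described above
    keras.layers.Dense(1),      # next-hour PV output
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_seq, y_seq, epochs=50, batch_size=32, validation_split=0.1, verbose=0)
```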

2.3.3. Extreme Gradient Boosting (XGBoost)

Extreme gradient boosting (XGBoost) is another widely used machine learning technique for solar PV output forecasting [10,21,22,23,24,47]. XGBoost is a distributed gradient boosting toolkit that builds on the basic gradient boosting framework with system and algorithmic improvements, making it extremely adaptable, portable, and effective [48]. It builds new trees iteratively to correct the errors of the previous trees, and the final forecasts are made by combining these trees. Each feature is assigned an importance within the trees to indicate its significance in making predictions; a feature's importance grows with its usage, making it easier to understand how much it contributes to the forecasts. The models were built with various hyperparameter combinations and were rigorously trained to avoid overfitting. Table 5 shows a summary of the hyperparameters used in the study.
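A hedged sketch of the XGBoost regressor is given below; the hyperparameter values are placeholders and not the tuned combinations summarized in Table 5, and X_train/y_train are assumed to hold the selected features and PV output.

```python
from xgboost import XGBRegressor

# X_train, y_train, X_test: selected weather features, R', and PV output (assumed).
xgb = XGBRegressor(
    n_estimators=500,
    max_depth=6,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
    objective="reg:squarederror",
)
xgb.fit(X_train, y_train)

y_pred = xgb.predict(X_test)
importance = xgb.feature_importances_  # per-feature contribution across the trees
```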

2.3.4. Hybrid Models

Four hybrid models (HM) using the combination of SARIMAX, LSTM, and XGBoost were tested in this study (Table 6). These hybrid models were created to assess whether combinations of the three forecasting techniques perform better than their individual counterparts. HM1, HM2, and HM3 were derived using Equation (5), while HM4 was derived using Equation (6):
$HM_P = M1_P\,\frac{M1_A}{M1_A + M2_A} + M2_P\,\frac{M2_A}{M1_A + M2_A}$ (5)
$HM_P = M1_P\,\frac{M1_A}{M1_A + M2_A + M3_A} + M2_P\,\frac{M2_A}{M1_A + M2_A + M3_A} + M3_P\,\frac{M3_A}{M1_A + M2_A + M3_A}$ (6)
where $HM_P$ is the hybrid model prediction; $M1_P$, $M2_P$, and $M3_P$ are the single-model predictions in kW; and $M1_A$, $M2_A$, and $M3_A$ are the single-model accuracies in percent (%).
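The accuracy-weighted combination of Equations (5) and (6) can be written as a single helper, sketched below; the variable names for the individual predictions and accuracies are assumptions.

```python
import numpy as np

def hybrid_prediction(preds, accuracies):
    """Accuracy-weighted average of single-model predictions, following
    Equations (5) and (6): each model's prediction (kW) is weighted by its
    accuracy (%) divided by the sum of the accuracies of the combined models."""
    preds = np.asarray(preds, dtype=float)        # shape: (n_models, n_timesteps)
    weights = np.asarray(accuracies, dtype=float)
    weights = weights / weights.sum()
    return weights @ preds

# Example: HM1 combines the SARIMAX and LSTM predictions (assumed variable names).
hm1 = hybrid_prediction([sarimax_pred, lstm_pred], [sarimax_acc, lstm_acc])
```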

2.4. Model Evaluation

This work adopted the two most commonly used error metrics to evaluate the performance of the models [49], namely, the root mean square error (RMSE) and the mean absolute error (MAE), presented in Equations (7) and (8). The mean absolute percentage error (MAPE) in Equation (9) was also used in this study, considering that the Philippines’ wholesale electricity spot market (WESM) uses this error metric.
$RMSE = \sqrt{\frac{\sum_{i=1}^{n}(y_i - x_i)^2}{n}}$ (7)
$MAE = \frac{\sum_{i=1}^{n}\left|y_i - x_i\right|}{n}$ (8)
$MAPE = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$ (9)
For a proper comparison, the RMSE and MAE values per plant were divided by the installed capacity to obtain the percentage (%) equivalent, as shown in Equations (10) and (11).
$RMSE\,(\%) = \frac{RMSE\,(\mathrm{kW})}{\mathrm{Installed\ Capacity}\,(\mathrm{kW})} \times 100$ (10)
$MAE\,(\%) = \frac{MAE\,(\mathrm{kW})}{\mathrm{Installed\ Capacity}\,(\mathrm{kW})} \times 100$ (11)
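The metrics of Equations (7)–(11) can be computed for one forecast day as sketched below; masking out zero-output (night-time) hours before taking the MAPE is our assumption, since Equation (9) is undefined when the actual output is zero.

```python
import numpy as np

def evaluate(y_true, y_pred, installed_capacity_kw):
    """RMSE and MAE normalised by installed capacity (Equations (10)-(11))
    and MAPE (Equation (9)) for one forecast day."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    mae = np.mean(np.abs(y_true - y_pred))
    nonzero = y_true != 0  # assumed: skip night-time hours to avoid division by zero
    mape = 100 * np.mean(np.abs((y_true[nonzero] - y_pred[nonzero]) / y_true[nonzero]))
    return {
        "RMSE_%": 100 * rmse / installed_capacity_kw,
        "MAE_%": 100 * mae / installed_capacity_kw,
        "MAPE_%": mape,
    }
```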

3. Results and Discussions

3.1. Data Evaluation

Three out of the 45 solar PV installations in the Philippines were selected based on the completeness of their data. The three identified solar PV installations are located in each of the major island groups of the Philippines. These installations are summarized in Table 7, together with their locations and installed capacities in kW.

3.2. Data Processing and Feature Engineering

During the data processing, outliers were detected and removed, and gaps in the data sets were then filled using an imputation method. The results show the absence of outliers. However, there were 18, 35, and 16 missing values for Plants 1, 2, and 3, respectively. To fill these gaps, the relationship between the weather variables and the solar PV output was calculated using the Pearson correlation. The significant weather parameters were then used in the data gap-filling methodology adopted in this study. Table 8 summarizes the Pearson correlation results, wherein only R’, rh, and t2m are considered significant for the three solar PV installations.
The residuals were evaluated using the variance inflation factor (VIF) to identify highly correlated variables and remove the redundant ones. The results are summarized in Table 9, which show that total cloud cover (TCC) should be removed for all plants, while ambient temperature (T) should be removed for Plant 2 and Plant 3. Only the remaining variables were used for the modeling process.
After employing three unit root tests (i.e., ADF, PP, and KPSS), the results confirm that the data sets do not contain a unit root and are, thus, qualified to be used for the SARIMAX models.

3.3. Forecasting

Three forecasting techniques and four hybrid models were employed for the final three solar PV installations. The model accuracies are presented in Table 10, Table 11 and Table 12. The following subsections discuss the model accuracies for each solar PV installation considered in this study.

3.3.1. Plant 1

For HM1 (a hybrid of SARIMAX and LSTM), the LSTM model predictions improved in terms of the RMSE when combined with SARIMAX, indicating that HM1, with an average value of 6.01%, reduced the overall error compared to the LSTM, having an average value of 7.73%. However, SARIMAX yielded the lowest average RMSE value at 4.95%.
SARIMAX also outperformed HM1 in most months, except for July and September, when HM1 outperformed both the LSTM and SARIMAX. Considering the MAE values, the LSTM model predictions also improved when combined with SARIMAX, except for July, where the LSTM alone produced better results. Furthermore, SARIMAX performed better than HM1 in terms of the MAE, except in January, July, and September, when HM1 produced lower errors than SARIMAX. Overall, the SARIMAX model had the best MAE accuracy at an average of 2.44%, followed by HM1 at 2.86%, and LSTM at 3.41%. Regarding the MAPE, HM1 outperformed the LSTM with mean error values of 15.19% and 17.50%, respectively. However, SARIMAX outperformed HM1, except for January, September, and October, when HM1 outperformed both the LSTM and SARIMAX. Therefore, the combination of LSTM and SARIMAX improved the overall predictions, but SARIMAX performed better alone in most months. However, HM1 performed better than both individual models in some months (July and September for the RMSE; January, September, and October for the MAPE). Looking at the average MAPE values, the SARIMAX model had the highest accuracy at 10.15%, followed by HM1 at 15.19%, and LSTM at 17.50%.
For HM2, the SARIMAX model predictions, with an average RMSE value of 4.95%, improved when combined with XGBoost, where HM2 yielded an average RMSE value of 4.14%. Still, using the XGBoost model alone can produce better results than combining it with SARIMAX, except in April, June, and December, wherein HM2 performed better compared to the individual models. The SARIMAX model predictions also improved in terms of the MAE when combined with XGBoost, except for December. However, using the XGBoost model alone can produce better results than combining it with SARIMAX, except in March and April, where HM2 performed better compared to SARIMAX and XGBoost. Overall, XGBoost had the highest model accuracy at an average MAE value of 1.79%, followed by HM2 at 2.01%, and SARIMAX at 2.44%. Regarding the MAPE, the SARIMAX model predictions improved when combined with XGBoost, except during November and December, and the XGBoost model alone yielded better forecasting accuracy than HM2 in most months. The SARIMAX model, having an average MAPE value of 10.15%, still performed better compared to HM2 at 12.35% and XGBoost at 12.47%.
For HM3, the LSTM model predictions improved in terms of the RMSE when combined with XGBoost, except in March, May, October, and December. Still, using the XGBoost model alone, having an average RMSE value of 3.91%, can produce better results than HM3 at 5.86% and LSTM at 7.73%. In addition, the LSTM model predictions improved in terms of the MAE when combined with XGBoost, except in March, May, October, November, and December. However, using the XGBoost model with an average MAE value of 1.79% can produce better results than HM3, with an average MAE value of 2.72%. The LSTM model predictions also improved the MAPE when combined with XGBoost, except in January, March, April, May, and June. Still, using the XGBoost model alone can produce better results than when combined with the LSTM, except for December, when the LSTM outperformed both HM3 and XGBoost. Based on the results, HM3 yielded better forecasting results, with an average MAPE value of 15.59%, than LSTM alone at 17.50%, mainly due to the performance of XGBoost in forecasting solar PV output, which was at 12.47%.
For HM4, the LSTM model predictions improved the RMSE when combined with XGBoost and SARIMAX, with HM4 having an average RMSE value of 5.21%. Meanwhile, the SARIMAX model improved when combined with LSTM and XGBoost, except for January, July, and September. Still, XGBoost had better model accuracy performance than HM4, except for July. Regarding the MAE, the LSTM model predictions improved when combined with XGBoost and SARIMAX, with HM4 having an average MAE value of 2.26%. The SARIMAX model also improved when combined with the LSTM and XGBoost, except for January and September. However, using the XGBoost model had better accuracy, with an average MAE value of 1.79%, than HM4 at 2.55%. Regarding the MAPE, the LSTM model predictions improved when combined with XGBoost and SARIMAX, except in December. The SARIMAX model improved when combined with LSTM and XGBoost, except for January, February, July, September, and October. Still, the XGBoost model alone can produce better results than HM4, except during November and December. The SARIMAX model still had the highest accuracy, with an average MAPE value of 10.15%, followed by XGBoost at 12.47%, HM4 at 14.71%, and LSTM at 17.50%.
Figure 2 shows the comparison between the monthly average for the Plant 1 solar PV output values and the model predictions.

3.3.2. Plant 2

For HM1, the SARIMAX model predictions improved in terms of the RMSE when combined with the LSTM, except for March, August, October, and November. Meanwhile, the LSTM model predictions also improved when combined with SARIMAX, except for January, June, and July. HM1 had the best model accuracy with an average RMSE value of 4.98%, followed by SARIMAX at 5.02%, and LSTM at 5.61%. Regarding the MAE, the SARIMAX model prediction improved when combined with the LSTM, except for March, April, August, October, and November. It also had a better MAE average at 2.46% than HM1 at 2.57%. The LSTM model predictions also improved when combined with SARIMAX, except in January and June. Overall, the SARIMAX model performed better with an average MAE value of 2.46%, followed by HM1 at 2.57%, and LSTM at 2.99%. Regarding the MAPE, the SARIMAX model predictions improved when combined with the LSTM in January and May. However, SARIMAX, having an average MAPE value of 13.16%, outperformed HM1 at 23.25% and LSTM at 28.09%. The LSTM model predictions improved when combined with SARIMAX, except for May.
For HM2, the SARIMAX model predictions improved regarding the RMSE when combined with XGBoost, except for August, September, and November. Meanwhile, the XGBoost model predictions improved when combined with SARIMAX, except for January, February, April to July, and October. Regarding the average RMSE values, XGBoost had the highest model accuracy at 4.06%, followed by HM2 at 4.33%, and SARIMAX at 5.02%. Regarding the MAE, the SARIMAX model predictions improved when combined with XGBoost, except for August, September, and November. The XGBoost model predictions also improved when combined with SARIMAX, except for January, February, and April to July. However, the XGBoost model was still better than HM2, with an average MAE value of 2.11%. Regarding the MAPE, the SARIMAX model predictions improved when combined with XGBoost, except for August, September, November, and December. Meanwhile, SARIMAX still had a lower average MAPE, with a value of 13.16%, than HM2 at 16.74%. The XGBoost model predictions also improved when combined with SARIMAX, except for January, February, April, June, and July.
For HM3, the LSTM model predictions improved in terms of the RMSE when combined with XGBoost, except for July. Meanwhile, the XGBoost model predictions improved when combined with the LSTM for August, September, and December. However, the average RMSE value for XGBoost at 4.06% was still better than HM3 at 4.56% and LSTM at 5.61%. Regarding the MAE, the LSTM model predictions improved when combined with XGBoost, except for May. The XGBoost model predictions also improved when combined with the LSTM for January, August, and September. It also yielded better model accuracy, at an average MAE value of 2.11%, than HM3 at 2.41% and LSTM at 2.99%. Regarding the MAPE, except for May, the LSTM model predictions improved when combined with XGBoost. Like the MAE results, the XGBoost model predictions also improved when combined with the LSTM in January, August, and December. With the average MAPE values, XGBoost remained the model with the highest accuracy at 19.05%, with LSTM having the lowest at 28.09%.
For HM4, the LSTM model predictions improved in terms of the RMSE when combined with XGBoost and SARIMAX, except for January. The SARIMAX model also improved when combined with LSTM and XGBoost, except for February, March, August, and November. Meanwhile, the XGBoost model alone can produce better results than combining it with LSTM and SARIMAX, except for August, September, and December. Across all the models, XGBoost had the highest model accuracy at an average RMSE value of 4.06%, followed by HM4 at 4.51%, SARIMAX at 5.02%, and LSTM at 5.61%. Regarding the MAE, the SARIMAX model also improved when combined with LSTM and XGBoost, except for February, March, August, October, and November. The improvement can be observed with HM4 having an average MAE value of 2.34%, while for SARIMAX it was 2.46%. Except for January, the LSTM model predictions also improved when combined with XGBoost and SARIMAX. Meanwhile, the XGBoost model alone can produce better results than combining it with LSTM and SARIMAX, except for May, August, September, and December. The XGBoost model remained the highest-performing model at 2.11%. Regarding the MAPE, the SARIMAX model improved when combined with the LSTM and XGBoost in January, May to July, and September. The results show that using the HM4 model, with an average MAPE value of 22.27%, resulted in a significant increase in accuracy compared to the LSTM, with an average MAPE value of 28.09%. However, the XGBoost model can produce better results than combining it with LSTM and SARIMAX, except for January to April, June, and October.
Figure 3 shows the comparison between the monthly average for the Plant 2 solar PV output values and the model predictions.

3.3.3. Plant 3

For HM1, the SARIMAX model predictions improved when combined with the LSTM, except for July, August, and September. The LSTM model predictions also improved when combined with SARIMAX, except for March to June and October. In terms of the average RMSE value, however, the LSTM model had a higher model accuracy at 4.40% than the HM1 model at 4.54% and SARIMAX at 5.25%. Regarding the MAPE, the SARIMAX model predictions improved when combined with the LSTM, except for January, February, July, August, September, and December. The LSTM model predictions improved when combined with SARIMAX, except for March, April, May, and June. This time, SARIMAX had the highest model accuracy at an average MAPE value of 16.17%, followed by HM1 at 16.52%, and LSTM at 17.77%.
For HM2, the SARIMAX model predictions improved regarding the RMSE when combined with XGBoost, except for December. The XGBoost model predictions improved when combined with SARIMAX in September, November, and December. Comparing their average RMSE values, XGBoost had a higher model accuracy at 2.98% than SARIMAX at 5.25%, making their hybrid model better than employing SARIMAX alone. Regarding the MAE and MAPE, the SARIMAX model predictions improved when combined with XGBoost, except in December. Meanwhile, the XGBoost model predictions improved when combined with SARIMAX in September and December.
For HM3, the LSTM model predictions improved regarding the RMSE when combined with XGBoost in January and July. The XGBoost model predictions also improved when combined with the LSTM in July and December. The model accuracy improvement can be attributed to the XGBoost model, which had an average RMSE value of 2.98%, bringing HM3 to 3.81% even though the LSTM model was at 4.40%. Regarding the MAE, the LSTM model predictions improved when combined with XGBoost in March and July. The XGBoost model predictions also improved when combined with the LSTM in December. Regarding the MAPE, the LSTM model predictions improved when combined with XGBoost in July and October. Similar to the model accuracy improvement in the RMSE, combining the LSTM with XGBoost yielded better results for the MAE and MAPE. However, when comparing the individual models and their hybrid, the results show that the XGBoost model predictions were better than those of the LSTM and HM3.
For HM4, the LSTM model predictions improved regarding the RMSE when combined with XGBoost and SARIMAX, except in June, October, and December. The SARIMAX model also improved when combined with LSTM and XGBoost, except for September and December. Furthermore, the XGBoost model alone produced better predictions than HM4, except for November and December. The XGBoost model yielded the highest accuracy, with an average RMSE value of 2.98%, followed by HM4 at 4.20%, LSTM at 4.40%, and SARIMAX at 5.25%. Regarding the MAE, the SARIMAX model improved when combined with LSTM and XGBoost, except in December. The LSTM model predictions also improved when combined with XGBoost and SARIMAX, except in May, June, October, and December. The XGBoost model alone again produced better predictions than HM4, except for November and December. XGBoost also generated the lowest average MAE at 1.49%, followed by HM4 at 2.16%, LSTM at 2.25%, and SARIMAX at 2.67%. Regarding the MAPE, the SARIMAX model improved when combined with LSTM and XGBoost, except in January, February, July, September, and December. The LSTM model predictions improved when combined with XGBoost and SARIMAX, except in May and June. The XGBoost model alone had a higher accuracy than HM4, except for November and December. The XGBoost model also yielded the lowest average MAPE at 9.42%, which means that it can forecast solar PV output more accurately than SARIMAX at 16.17%, LSTM at 17.77%, and the hybrid HM4 at 15.15%.
Figure 4 shows the comparison between the monthly average for the Plant 3 solar PV output values and the model predictions.

3.3.4. Model Performance for the Three Plants

In general, the results show that XGBoost outperformed the SARIMAX and LSTM models for some months, while SARIMAX outperformed the rest of the models in other months. Meanwhile, LSTM was the worst-performing model among the three, which indicates that its predictive capabilities need to be improved by further refining the feature selection process. The Philippines’ wholesale electricity spot market (WESM) requires solar power plants to have a maximum annual average MAPE not exceeding 18.00%. Based on the results, the model accuracies for SARIMAX, LSTM, and XGBoost vary for the three solar PV installations. For Plant 1, all the models have average MAPE values lower than 18.00%, with SARIMAX outperforming the other models at 10.15%. For Plant 2, only the SARIMAX model and HM2 (a hybrid of SARIMAX and XGBoost) met the 18.00% WESM requirement. Even when HM2 is at 16.74%, SARIMAX still has a better model forecasting accuracy at 13.16%, which means that using SARIMAX is still better than combining it with XGBoost. For Plant 3, all the models have an MAPE lower than 18.00%, with XGBoost outperforming the other models. Table 13 shows the forecasting model with the highest accuracy (based on the MAPE) per month and per plant.

4. Conclusions

This work investigates the forecasting accuracy of SARIMAX, LSTM, and XGBoost, including their hybrids, in predicting solar PV output, using data from three solar PV installations in the Philippines. This research shows that SARIMAX outperforms LSTM, XGBoost, and their combinations for Plants 1 and 2, while XGBoost outperforms the other two models for Plant 3. Although the literature cited in this paper suggests that hybrid models often provide more accurate forecasts than any single method alone, the findings in this work revealed otherwise. One possible explanation for this discrepancy could be the quality of the solar PV output data, which depends on location-specific factors, such as the meteorological conditions. These factors can significantly impact the performance of hybrid models in different settings. The performance of the forecasting techniques also varies based on the forecasted month and the location of the solar PV plant. Throughout the analysis, the LSTM model consistently underperformed, which might indicate that its predictive capabilities need to be further improved.
This study offers novelty by using the cloud optical thickness (CLOT)-adjusted Advanced Himawari Imager (AHI-8) shortwave radiation (SWR) as a proxy for in situ solar irradiance data. R’ was used to forecast the solar PV output together with other weather parameters. While this study has provided valuable insights about the performance of SARIMAX, LSTM, and XGBoost in predicting solar PV output, it is important to acknowledge its limitations. This research focused solely on solar PV outputs obtained from solar PV installations in the Philippines and used the available one-year data from January 2021 to December 2021. For future studies, it is recommended that a comparison is considered between the accuracies of the models that used solely the adjusted solar irradiance data against one that used both the adjusted irradiance data and other weather parameters to predict solar PV output. While the inclusion of adjusted SWR in forecasting models is a novel approach, it is important to assess whether the addition of other weather parameters improves the accuracy of the predictions. This comparison can provide insights into the relative importance of different weather variables and guide the development of more comprehensive forecasting models.
This study emphasized the importance of ensuring the reliability, completeness, and consistency of the solar PV output data, as well as the weather data used as input variables, since they are crucial for obtaining reliable and meaningful results. The researchers also learned about the strengths and limitations of each model and gained insights into which models are more suitable for solar PV output forecasting in the Philippines. The availability of comprehensive and long-term solar PV output data was also a challenge.
This research contributes to the existing solar PV output forecasting knowledge by combining SARIMAX, LSTM, and XGBoost in one analysis and utilizing the same in the Philippines’ context. The practical implications and potential benefits of this study include improved forecasting accuracy, enhanced decision-making, cost optimization, informed planning and investment decisions, and the facilitation of renewable energy integration. Commercial applications of the outputs from this study include optimization of solar PV power plant operations and support for energy trading, project development, energy management, grid stability, and research on the renewable energy sector.

Author Contributions

Conceptualization and methodology, I.B.B.; software, I.B.B., C.I.D.L. and J.M.C.; validation, I.B.B. and C.I.D.L.; formal analysis, I.B.B. and J.A.I.; resources, I.B.B. and J.M.C.; data curation, J.A.I.; visualization, I.B.B. and J.A.I.; writing—original draft preparation, I.B.B. and J.A.I.; writing—review and editing, I.B.B., J.A.I. and J.A.P.; supervision, project administration, and funding acquisition, J.A.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Department of Science and Technology—Philippine Council for Industry, Energy and Emerging Technology Research and Development (DOST-PCIEERD) under Project SINAG (Solar PV Resource and Installation Assessment Using Geospatial Technologies). The APC was funded by DOST-PCIEERD.

Data Availability Statement

The data and source codes presented in this study are available on request from the corresponding author. The data are not publicly available due to the authors’ agreement with the Philippines Department of Energy (DOE).

Acknowledgments

This study is implemented under the OutSolar component of Project SINAG (Solar PV Resource and Installation Assessment Using Geospatial Technologies). The authors would also like to acknowledge Engr. Ron-Ron Madera from the Philippines Department of Energy (DOE) for providing the solar PV output data used in the study.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Summary of Plant 1 model accuracies. RMSE, MAE, and MAPE values are in %.

| Month | Metric | SARIMAX | LSTM | XGBoost | HM1 | HM2 | HM3 | HM4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| January | RMSE | 2.92 | 4.38 | 1.91 | 3.20 | 2.46 | 3.12 | 2.74 |
| January | MAE | 1.61 | 1.91 | 0.95 | 1.53 | 1.37 | 1.44 | 1.31 |
| January | MAPE | 7.52 | 8.93 | 4.29 | 6.40 | 6.25 | 6.80 | 5.57 |
| February | RMSE | 5.18 | 7.21 | 3.72 | 6.018 | 4.12 | 5.66 | 5.29 |
| February | MAE | 2.68 | 3.66 | 2.01 | 3.016 | 2.17 | 3.01 | 2.74 |
| February | MAPE | 10.94 | 13.93 | 8.16 | 11.68 | 9.01 | 11.44 | 10.57 |
| March | RMSE | 3.59 | 7.59 | 4.46 | 5.88 | 3.72 | 6.18 | 5.35 |
| March | MAE | 1.72 | 3.33 | 2.08 | 2.56 | 1.84 | 2.73 | 2.39 |
| March | MAPE | 4.82 | 7.57 | 5.00 | 5.89 | 4.77 | 6.15 | 5.49 |
| April | RMSE | 2.58 | 6.43 | 2.78 | 4.85 | 2.30 | 5.09 | 4.19 |
| April | MAE | 1.33 | 2.80 | 1.18 | 2.17 | 1.12 | 2.30 | 1.91 |
| April | MAPE | 3.58 | 7.56 | 2.90 | 5.94 | 2.98 | 6.18 | 5.18 |
| May | RMSE | 3.14 | 5.47 | 1.78 | 4.16 | 2.48 | 4.43 | 3.64 |
| May | MAE | 1.43 | 2.71 | 0.93 | 2.15 | 1.18 | 2.26 | 1.93 |
| May | MAPE | 4.63 | 8.33 | 2.64 | 6.86 | 3.93 | 6.92 | 6.12 |
| June | RMSE | 4.72 | 11.01 | 4.76 | 7.04 | 4.23 | 8.46 | 6.18 |
| June | MAE | 2.60 | 4.92 | 2.20 | 3.49 | 2.27 | 4.00 | 3.19 |
| June | MAPE | 9.13 | 14.63 | 5.83 | 11.17 | 7.33 | 11.75 | 9.99 |
| July | RMSE | 12.49 | 14.29 | 10.26 | 10.94 | 11.15 | 10.66 | 10.23 |
| July | MAE | 5.96 | 5.33 | 3.43 | 5.45 | 4.58 | 4.57 | 4.99 |
| July | MAPE | 17.29 | 16.00 | 8.11 | 15.84 | 12.17 | 12.78 | 14.00 |
| August | RMSE | 3.74 | 6.27 | 2.04 | 4.54 | 2.73 | 5.09 | 4.05 |
| August | MAE | 1.99 | 2.70 | 1.00 | 2.35 | 1.38 | 2.22 | 2.06 |
| August | MAPE | 10.41 | 15.01 | 4.90 | 12.84 | 7.03 | 12.25 | 11.19 |
| September | RMSE | 13.14 | 13.00 | 8.30 | 12.07 | 9.86 | 10.27 | 10.98 |
| September | MAE | 6.17 | 6.26 | 4.08 | 5.50 | 4.79 | 4.65 | 5.08 |
| September | MAPE | 24.31 | 21.35 | 14.70 | 20.81 | 17.95 | 16.48 | 19.09 |
| October | RMSE | 4.05 | 7.90 | 2.30 | 5.14 | 2.70 | 5.63 | 4.38 |
| October | MAE | 1.95 | 3.61 | 1.16 | 2.69 | 1.38 | 2.70 | 2.31 |
| October | MAPE | 19.39 | 21.80 | 12.35 | 17.94 | 12.34 | 18.34 | 15.67 |
| November | RMSE | 2.30 | 5.29 | 2.89 | 4.82 | 2.55 | 3.32 | 3.19 |
| November | MAE | 1.17 | 2.06 | 1.51 | 1.94 | 1.34 | 1.67 | 1.63 |
| November | MAPE | 6.31 | 53.16 | 51.77 | 48.12 | 43.61 | 52.28 | 49.64 |
| December | RMSE | 1.52 | 3.89 | 1.76 | 3.40 | 1.42 | 2.37 | 2.24 |
| December | MAE | 0.70 | 1.65 | 0.92 | 1.51 | 0.72 | 1.13 | 1.10 |
| December | MAPE | 3.47 | 21.74 | 29.02 | 18.75 | 20.87 | 25.70 | 24.06 |
| Average | RMSE | 4.95 | 7.73 | 3.91 | 6.01 | 4.14 | 5.86 | 5.21 |
| Average | MAE | 2.44 | 3.41 | 1.79 | 2.86 | 2.01 | 2.72 | 2.55 |
| Average | MAPE | 10.15 | 17.50 | 12.47 | 15.19 | 12.35 | 15.59 | 14.71 |
Table A2. Summary of Plant 2 model accuracies. RMSE, MAE, and MAPE values are in %.

| Month | Metric | SARIMAX | LSTM | XGBoost | HM1 | HM2 | HM3 | HM4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| January | RMSE | 4.86 | 2.53 | 2.28 | 2.99 | 3.52 | 2.31 | 2.65 |
| January | MAE | 2.73 | 1.27 | 1.20 | 1.55 | 1.89 | 1.11 | 1.44 |
| January | MAPE | 8.41 | 7.49 | 4.25 | 5.46 | 5.56 | 5.70 | 5.12 |
| February | RMSE | 6.45 | 8.42 | 3.10 | 7.61 | 3.91 | 7.35 | 6.93 |
| February | MAE | 2.76 | 4.49 | 1.52 | 3.90 | 1.80 | 3.91 | 3.55 |
| February | MAPE | 39.40 | 90.31 | 21.98 | 74.67 | 26.64 | 76.94 | 67.04 |
| March | RMSE | 4.05 | 5.76 | 4.13 | 4.93 | 3.82 | 5.17 | 4.69 |
| March | MAE | 2.15 | 2.96 | 2.01 | 2.41 | 1.96 | 2.64 | 2.28 |
| March | MAPE | 7.23 | 14.65 | 6.36 | 10.95 | 6.23 | 12.04 | 9.81 |
| April | RMSE | 4.11 | 4.23 | 2.77 | 3.74 | 3.58 | 3.30 | 3.31 |
| April | MAE | 1.92 | 2.28 | 1.54 | 2.02 | 1.79 | 1.84 | 1.80 |
| April | MAPE | 5.30 | 9.23 | 4.23 | 6.98 | 4.88 | 7.00 | 5.96 |
| May | RMSE | 5.24 | 4.95 | 3.53 | 4.54 | 4.08 | 3.38 | 3.90 |
| May | MAE | 2.79 | 2.40 | 2.11 | 2.24 | 2.36 | 1.73 | 2.09 |
| May | MAPE | 11.06 | 8.25 | 10.93 | 8.39 | 10.04 | 6.64 | 7.82 |
| June | RMSE | 6.86 | 6.09 | 3.58 | 6.21 | 5.53 | 5.13 | 5.66 |
| June | MAE | 3.15 | 3.06 | 1.53 | 3.08 | 2.45 | 2.56 | 2.77 |
| June | MAPE | 11.50 | 13.11 | 5.29 | 12.23 | 8.87 | 10.65 | 10.87 |
| July | RMSE | 7.90 | 6.93 | 5.31 | 7.18 | 6.31 | 5.95 | 6.54 |
| July | MAE | 3.87 | 3.53 | 2.84 | 3.41 | 3.25 | 2.92 | 3.16 |
| July | MAPE | 13.69 | 14.02 | 11.47 | 12.23 | 12.32 | 11.01 | 11.20 |
| August | RMSE | 4.22 | 5.90 | 7.41 | 4.43 | 5.53 | 5.34 | 4.66 |
| August | MAE | 2.10 | 3.54 | 3.86 | 2.70 | 3.05 | 3.02 | 2.68 |
| August | MAPE | 6.94 | 15.65 | 17.04 | 11.89 | 13.04 | 14.69 | 12.85 |
| September | RMSE | 4.63 | 5.62 | 4.98 | 3.59 | 4.74 | 3.44 | 3.22 |
| September | MAE | 2.19 | 3.45 | 2.71 | 2.04 | 2.45 | 1.74 | 1.70 |
| September | MAPE | 14.53 | 22.82 | 18.22 | 15.95 | 16.13 | 13.95 | 13.49 |
| October | RMSE | 3.92 | 4.98 | 2.76 | 4.00 | 3.03 | 3.71 | 3.45 |
| October | MAE | 1.65 | 2.62 | 1.29 | 2.00 | 1.26 | 2.02 | 1.72 |
| October | MAPE | 13.16 | 22.32 | 13.13 | 17.73 | 12.25 | 18.51 | 16.11 |
| November | RMSE | 4.19 | 7.38 | 4.87 | 6.73 | 4.56 | 5.70 | 5.52 |
| November | MAE | 2.01 | 3.85 | 2.55 | 3.52 | 2.27 | 3.08 | 2.96 |
| November | MAPE | 17.59 | 104.18 | 99.17 | 91.45 | 75.68 | 100.77 | 94.01 |
| December | RMSE | 3.83 | 4.53 | 3.98 | 3.83 | 3.40 | 3.97 | 3.62 |
| December | MAE | 2.16 | 2.48 | 2.19 | 2.00 | 1.91 | 2.29 | 2.00 |
| December | MAPE | 9.11 | 15.10 | 16.53 | 11.13 | 9.31 | 15.71 | 13.01 |
| Average | RMSE | 5.02 | 5.61 | 4.06 | 4.98 | 4.33 | 4.56 | 4.51 |
| Average | MAE | 2.46 | 2.99 | 2.11 | 2.57 | 2.20 | 2.41 | 2.34 |
| Average | MAPE | 13.16 | 28.09 | 19.05 | 23.25 | 16.74 | 24.47 | 22.27 |
Table A3. Summary of Plant 3 model accuracies. RMSE, MAE, and MAPE values are in %.

| Month | Metric | SARIMAX | LSTM | XGBoost | HM1 | HM2 | HM3 | HM4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| January | RMSE | 5.14 | 5.08 | 3.16 | 5.03 | 4.42 | 4.46 | 4.65 |
| January | MAE | 2.57 | 2.51 | 1.49 | 2.54 | 2.19 | 2.20 | 2.33 |
| January | MAPE | 30.55 | 37.68 | 16.58 | 34.49 | 25.63 | 31.24 | 30.99 |
| February | RMSE | 4.64 | 3.55 | 2.05 | 3.50 | 3.93 | 2.98 | 3.11 |
| February | MAE | 2.57 | 1.93 | 1.09 | 1.85 | 2.16 | 1.56 | 1.68 |
| February | MAPE | 11.48 | 19.37 | 8.25 | 15.59 | 10.59 | 15.68 | 14.02 |
| March | RMSE | 5.34 | 4.59 | 2.42 | 4.70 | 4.57 | 3.72 | 4.26 |
| March | MAE | 2.36 | 2.39 | 1.20 | 2.11 | 2.04 | 1.84 | 1.93 |
| March | MAPE | 26.30 | 21.47 | 10.67 | 22.20 | 22.20 | 17.02 | 20.00 |
| April | RMSE | 6.10 | 5.12 | 3.73 | 5.38 | 5.44 | 4.51 | 4.90 |
| April | MAE | 2.89 | 2.63 | 1.68 | 2.63 | 2.57 | 2.24 | 2.39 |
| April | MAPE | 13.99 | 11.77 | 8.07 | 12.71 | 12.63 | 10.27 | 11.57 |
| May | RMSE | 7.95 | 6.97 | 3.05 | 7.33 | 7.10 | 5.59 | 6.54 |
| May | MAE | 3.87 | 2.80 | 1.62 | 3.27 | 3.44 | 2.38 | 2.90 |
| May | MAPE | 15.85 | 11.65 | 5.72 | 13.59 | 13.97 | 9.61 | 11.92 |
| June | RMSE | 7.11 | 5.52 | 4.57 | 6.33 | 6.54 | 5.03 | 5.88 |
| June | MAE | 3.95 | 2.75 | 2.32 | 3.34 | 3.61 | 2.44 | 3.10 |
| June | MAPE | 15.15 | 10.68 | 8.08 | 13.04 | 13.67 | 9.27 | 11.85 |
| July | RMSE | 5.28 | 5.57 | 4.32 | 5.33 | 4.80 | 4.61 | 4.80 |
| July | MAE | 3.06 | 3.28 | 2.09 | 3.19 | 2.77 | 2.81 | 2.90 |
| July | MAPE | 15.57 | 20.43 | 10.56 | 18.33 | 14.10 | 16.82 | 16.40 |
| August | RMSE | 2.10 | 2.43 | 1.29 | 2.14 | 1.93 | 1.85 | 1.82 |
| August | MAE | 1.11 | 1.17 | 0.74 | 1.09 | 1.00 | 0.91 | 0.94 |
| August | MAPE | 5.82 | 8.09 | 4.42 | 6.35 | 5.40 | 5.73 | 5.59 |
| September | RMSE | 1.51 | 2.33 | 1.38 | 1.78 | 1.34 | 1.89 | 1.57 |
| September | MAE | 0.83 | 1.35 | 0.74 | 0.97 | 0.71 | 1.02 | 0.81 |
| September | MAPE | 4.76 | 11.49 | 5.40 | 8.06 | 4.17 | 8.99 | 6.96 |
| October | RMSE | 11.43 | 5.00 | 2.06 | 7.00 | 9.56 | 4.41 | 6.43 |
| October | MAE | 5.36 | 2.77 | 1.04 | 3.89 | 4.51 | 2.45 | 3.60 |
| October | MAPE | 30.90 | 28.61 | 6.43 | 28.10 | 26.18 | 24.53 | 25.97 |
| November | RMSE | 2.59 | 2.66 | 2.44 | 2.32 | 2.40 | 2.46 | 2.31 |
| November | MAE | 1.46 | 1.44 | 1.21 | 1.19 | 1.29 | 1.25 | 1.16 |
| November | MAPE | 9.77 | 9.87 | 7.79 | 7.17 | 8.58 | 7.52 | 7.13 |
| December | RMSE | 3.87 | 3.90 | 5.25 | 3.69 | 4.60 | 4.25 | 4.07 |
| December | MAE | 2.02 | 1.94 | 2.71 | 1.90 | 2.43 | 2.20 | 2.16 |
| December | MAPE | 13.88 | 22.14 | 21.13 | 18.67 | 18.47 | 21.22 | 19.44 |
| Average | RMSE | 5.25 | 4.40 | 2.98 | 4.54 | 4.72 | 3.81 | 4.20 |
| Average | MAE | 2.67 | 2.25 | 1.49 | 2.33 | 2.39 | 1.94 | 2.16 |
| Average | MAPE | 16.17 | 17.77 | 9.42 | 16.52 | 14.63 | 14.83 | 15.15 |

References

  1. International Energy Agency. Global Energy Review 2021. 2021. Available online: https://www.iea.org/reports/global-energy-review-2021/renewables (accessed on 31 March 2023).
  2. International Renewable Energy Agency. Renewable Energy and Climate Pledges: Five Years after the Paris Agreement. Abu Dhabi. Available online: https://www.irena.org/-/media/Files/IRENA/Agency/Publication/2020/Dec/IRENA_NDC_update_2020.pdf?rev=cdad99bc95ce4bff98ee48e157847a9f (accessed on 31 March 2023).
  3. International Energy Agency. “Solar PV”, Paris, Sep. Available online: https://www.iea.org/reports/solar-pv (accessed on 31 March 2023).
  4. Nwaigwe, K.; Mutabilwa, P.; Dintwa, E. An overview of solar power (PV systems) integration into electricity grids. Mater. Sci. Energy Technol. 2019, 2, 629–633. [Google Scholar] [CrossRef]
  5. Shafiullah; Ahmed, S.D.; Al-Sulaiman, F.A. Grid Integration Challenges and Solution Strategies for Solar PV Systems: A Review. IEEE Access 2022, 10, 52233–52257. [Google Scholar] [CrossRef]
  6. Zaporozhets, A.; Sverdlova, A. Photovoltaic technologies: Problems, technical and economic losses, prospects. In Proceedings of the 1st International Workshop on Information Technologies: Theoretical and Applied Problems, Ternopil, Ukraine, 16–18 November 2021. [Google Scholar]
  7. Raza, M.Q.; Nadarajah, M.; Ekanayake, C. On recent advances in PV output power forecast. Sol. Energy 2016, 136, 125–144. [Google Scholar] [CrossRef]
  8. Pawar, P.; Mithulananthan, N.; Raza, M.Q. Solar PV Power Forecasting Using Modified SVR with Gauss-Newton Method. In Proceedings of the 2020 2nd Global Power, Energy and Communication Conference (GPECOM), Izmir, Turkey, 20–23 October 2020; pp. 226–231. [Google Scholar] [CrossRef]
  9. Kim, B.; Suh, D.; Otto, M.-O.; Huh, J.-S. A Novel Hybrid Spatio-Temporal Forecasting of Multisite Solar Photovoltaic Generation. Remote Sens. 2021, 13, 2605. [Google Scholar] [CrossRef]
  10. Grzebyk, D.; Alcañiz, A.; Donker, J.C.; Zeman, M.; Ziar, H.; Isabella, O. Individual yield nowcasting for residential PV systems. Sol. Energy 2023, 251, 325–336. [Google Scholar] [CrossRef]
  11. Paulescu, M.; Paulescu, E.; Gravila, P.; Badescu, V. Weather Modeling and Forecasting of PV Systems Operation; Green Energy and Technology; Springer: London, UK, 2013. [Google Scholar] [CrossRef]
  12. Harrou, F.; Kadri, F.; Sun, Y.; Harrou, F.; Kadri, F.; Sun, Y. Forecasting of Photovoltaic Solar Power Production Using LSTM Approach. In Advanced Statistical Modeling, Forecasting, and Fault Detection in Renewable Energy Systems; IntechOpen: London, UK, 2020. [Google Scholar] [CrossRef]
  13. Gupta, A.K.; Singh, R.K. Short-term day-ahead photovoltaic output forecasting using PCA-SFLA-GRNN algorithm. Front. Energy Res. 2022, 10, 1029449. [Google Scholar] [CrossRef]
  14. Das, U.K.; Tey, K.S.; Seyedmahmoudian, M.; Mekhilef, S.; Idris, M.Y.I.; Van Deventer, W.; Horan, B.; Stojcevski, A. Forecasting of photovoltaic power generation and model optimization: A review. Renew. Sustain. Energy Rev. 2018, 81, 912–928. [Google Scholar] [CrossRef]
  15. Antonanzas, J.; Osorio, N.; Escobar, R.; Urraca, R.; Martinez-De-Pison, F.J.; Antonanzas-Torres, F. Review of photovoltaic power forecasting. Sol. Energy 2016, 136, 78–111. [Google Scholar] [CrossRef]
  16. Mayer, M.J.; Gróf, G. Extensive comparison of physical models for photovoltaic power forecasting. Appl. Energy 2021, 283, 116239. [Google Scholar] [CrossRef]
  17. Yang, D.; Kleissl, J.; Gueymard, C.A.; Pedro, H.T.; Coimbra, C.F. History and trends in solar irradiance and PV power forecasting: A preliminary assessment and review using text mining. Sol. Energy 2018, 168, 60–101. [Google Scholar] [CrossRef]
  18. Sobri, S.; Koohi-Kamali, S.; Rahim, N.A. Solar photovoltaic generation forecasting methods: A review. Energy Convers. Manag. 2018, 156, 459–497. [Google Scholar] [CrossRef]
  19. Inman, R.H.; Pedro, H.T.; Coimbra, C.F. Solar forecasting methods for renewable energy integration. Prog. Energy Combust. Sci. 2013, 39, 535–576. [Google Scholar] [CrossRef]
  20. Sharma, J.; Soni, S.; Paliwal, P.; Saboor, S.; Chaurasiya, P.K.; Sharifpur, M.; Khalilpoor, N.; Afzal, A. A novel long term solar photovoltaic power forecasting approach using LSTM with Nadam optimizer: A case study of India. Energy Sci. Eng. 2022, 10, 2909–2929. [Google Scholar] [CrossRef]
  21. Zhong, Y.-J.; Wu, Y.-K. Short-Term Solar Power Forecasts Considering Various Weather Variables. In Proceedings of the 2020 International Symposium on Computer, Consumer and Control (IS3C), Taichung City, Taiwan, 13–16 November 2020; pp. 432–435. [Google Scholar] [CrossRef]
  22. Dimitropoulos, N.; Sofias, N.; Kapsalis, P.; Mylona, Z.; Marinakis, V.; Primo, N.; Doukas, H. Forecasting of short-term PV production in energy communities through Machine Learning and Deep Learning algorithms. In Proceedings of the 2021 12th International Conference on Information, Intelligence, Systems & Applications (IISA), Chania Crete, Greece, 12–14 July 2021; pp. 1–6. [Google Scholar] [CrossRef]
  23. Santos, M.L.; García-Santiago, X.; Camarero, F.E.; Gil, G.B.; Ortega, P.C. Application of Temporal Fusion Transformer for Day-Ahead PV Power Forecasting. Energies 2022, 15, 5232. [Google Scholar] [CrossRef]
  24. Zhu, R.; Guo, W.; Gong, X. Short-Term Photovoltaic Power Output Prediction Based on k-Fold Cross-Validation and an Ensemble Model. Energies 2019, 12, 1220. [Google Scholar] [CrossRef]
  25. Sheng, F.; Jia, L. Short-Term Load Forecasting Based on SARIMAX-LSTM. In Proceedings of the 2020 5th International Conference on Power and Renewable Energy (ICPRE), Shanghai, China, 12–14 September 2020; pp. 90–94. [Google Scholar] [CrossRef]
  26. Sotto, M.; Bauzon, M.; Cañete, J.; Principe, J. AHI-8 SWR Adjustment using CLOT-derived Correction Factor for Solar PV Power Potential Assessment in the Philippines. In Proceedings of the 31st IIS Forum Earth Observation, Disaster Monitoring and Risk Assessment from Space, Tokyo, Japan, 6–7 March 2023. [Google Scholar]
  27. Principe, J.; Takeuchi, W. Assessment of solar PV power potential over Asia Pacific region with remote sensing considering meteorological factors. J. Renew. Sustain. Energy 2019, 11, 013502. [Google Scholar] [CrossRef]
  28. Hersbach, H.; Bell, B.; Berrisford, P.; Biavati, G.; Horányi, A.; Muñoz Sabater, J.; Thépaut, J.N. ERA5 hourly data on single levels from 1979 to present. In Copernicus Climate Change Service (C3S) Climate Data Store (CDS); European Commission: Brussels, Belgium, 2018. [Google Scholar] [CrossRef]
  29. Copernicus Knowledge Base. ERA5: How to Calculate Wind Speed and Wind Direction from u and v Components of the Wind?—Copernicus Knowledge Base—ECMWF Confluence Wiki, May 16. 2022. Available online: https://confluence.ecmwf.int/pages/viewpage.action?pageId=133262398 (accessed on 20 January 2023).
  30. Alduchov, O.A.; Eskridge, R.E. Improved Magnus’ form approximation of saturation vapor pressure. J. Appl. Meteorol. 1996, 35, 601–609. [Google Scholar] [CrossRef]
  31. Schwingshackl, C.; Petitta, M.; Wagner, J.; Belluardo, G.; Moser, D.; Castelli, M.; Zebisch, M.; Tetzlaff, A. Wind Effect on PV Module Temperature: Analysis of Different Techniques for an Accurate Estimation. Energy Procedia 2013, 40, 77–86. [Google Scholar] [CrossRef]
  32. Waterworth, D.; Armstrong, A. Southerly winds increase the electricity generated by solar photovoltaic systems. Sol. Energy 2020, 202, 123–135. [Google Scholar] [CrossRef]
  33. Alam, M.; Nahid-Al-Masood; Razee, I.A.; Zunaed, M. Solar PV Power Forecasting Using Traditional Methods and Machine Learning Techniques. In Proceedings of the 2021 IEEE Kansas Power and Energy Conference (KPEC), Manhattan, KS, USA, 19–20 April 2021; pp. 1–5. [Google Scholar] [CrossRef]
  34. Suresh, V.; Janik, P.; Rezmer, J.; Leonowicz, Z. Forecasting Solar PV Output Using Convolutional Neural Networks with a Sliding Window Algorithm. Energies 2020, 13, 723. [Google Scholar] [CrossRef]
  35. Sharma, N.; Puri, V.; Mahajan, S.; Abualigah, L.; Abu Zitar, R.; Gandomi, A.H. Solar power forecasting beneath diverse weather conditions using GD and LM-artificial neural networks. Sci. Rep. 2023, 13, 8517. [Google Scholar] [CrossRef] [PubMed]
  36. Essam, Y.; Ahmed, A.N.; Ramli, R.; Chau, K.-W.; Ibrahim, M.S.I.; Sherif, M.; Sefelnasr, A.; El-Shafie, A. Investigating photovoltaic solar power output forecasting using machine learning algorithms. Eng. Appl. Comput. Fluid Mech. 2022, 16, 2002–2034. [Google Scholar] [CrossRef]
  37. Jung, Y.; Jung, J.; Kim, B.; Han, S. Long short-term memory recurrent neural network for modeling temporal patterns in long-term power forecasting for solar PV facilities: Case study of South Korea. J. Clean. Prod. 2020, 250, 119476. [Google Scholar] [CrossRef]
  38. Benitez, I.B.; Ibañez, J.A.; Lumabad, C.D.; Cañete, J.M.; Reyes, F.N.D.L.; Principe, J.A. A novel data gaps filling method for solar PV output forecasting. J. Renew. Sustain. Energy 2023, 15, 046102. [Google Scholar] [CrossRef]
  39. Dickey, D.A.; Fuller, W.A. Likelihood Ratio Statistics for Autoregressive Time Series with a Unit Root. Econometrica 1981, 49, 1057–1072. [Google Scholar] [CrossRef]
  40. Phillips, P.C.B.; Perron, P. Testing for a unit root in time series regression. Biometrika 1988, 75, 335–346. [Google Scholar] [CrossRef]
  41. Kwiatkowski, D.; Phillips, P.C.B.; Schmidt, P.; Shin, Y. Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? J. Econom. 1992, 54, 159–178. [Google Scholar] [CrossRef]
  42. Gholamy; Kreinovich, V.; Kosheleva, O. Why 70/30 or 80/20 Relation between Training and Testing Sets: A Pedagogical Explanation Departmental Technical Reports (CS), Feb. 2018. Available online: https://scholarworks.utep.edu/cs_techrep/1209 (accessed on 13 June 2023).
  43. Manigandan, P.; Alam, M.D.S.; Alharthi, M.; Khan, U.; Alagirisamy, K.; Pachiyappan, D.; Rehman, A. Forecasting Natural Gas Production and Consumption in United States-Evidence from SARIMA and SARIMAX Models. Energies 2021, 14, 6021. [Google Scholar] [CrossRef]
  44. Au, J.; Saldaña, J., Jr.; Spanswick, B.; Santerre, J. Forecasting Power Consumption in Pennsylvania During the COVID-19 Pandemic: A SARIMAX Model with External COVID-19 and Unemployment Variables. SMU Data Sci. Rev. 2020, 3, 6. [Google Scholar]
  45. Xie, M.; Sandels, C.; Zhu, K.; Nordström, L. A seasonal ARIMA model with exogenous variables for elspot electricity prices in Sweden. In Proceedings of the 2013 10th International Conference on the European Energy Market (EEM), Stockholm, Sweden, 27–31 May 2013; pp. 1–4. [Google Scholar] [CrossRef]
  46. Smith, T.G. Pmdarima: ARIMA Estimators for Python, alkaline-ml, Sep. 03. 2017. Available online: https://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.auto_arima.html#pmdarima-arima-auto-arima (accessed on 27 March 2023).
  47. Bamisile, O.; Ejiyi, C.J.; Osei-Mensah, E.; Chikwendu, I.A.; Li, J.; Huang, Q. Long-Term Prediction of Solar Radiation Using XGboost, LSTM, and Machine Learning Algorithms. In Proceedings of the 2022 4th Asia Energy and Electrical Engineering Symposium (AEEES), Chengdu, China, 25–28 March 2022; pp. 214–218. [Google Scholar] [CrossRef]
  48. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the KDD ’16: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794. [Google Scholar] [CrossRef]
  49. Alcañiz, A.; Grzebyk, D.; Ziar, H.; Isabella, O. Trends and gaps in photovoltaic power forecasting with machine learning. Energy Rep. 2023, 9, 447–471. [Google Scholar] [CrossRef]
Figure 1. Methodology for solar PV output forecasting using SARIMAX, LSTM, and XGBoost.
Figure 2. Comparison between Plant 1 monthly average solar PV output and model predicted values.
Figure 3. Comparison between Plant 2 monthly average solar PV output and model predicted values.
Figure 4. Comparison between Plant 3 monthly average solar PV output and model predicted values.
Table 1. List of recent studies that used statistical, machine learning, and hybrid models to predict solar PV output.
Study | Year | Location | Methodology | Train-Test Ratio | Error Metrics
[10] | 2023 | Netherlands and Belgium | XGBoost | 50–50 | MAE, MAPE, RMSE
[20] | 2022 | India | LSTM, ARIMA, SARIMA | Variable, from 75–25 to 90–10 | RMSE, MSE
[21] | 2021 | Taiwan | ANN, LSTM, XGBoost | 60–40 | MAPE, RMSE, NRMSE
[9] | 2021 | Incheon, Busan, and Yeongam | SARIMAX, linear SVR, LSTM, DNN, RF, SARIMAX-LSTM | 75–25 | MAE, RMSE, sMAPE, MBE, Cv
[22] | 2021 | Undisclosed location | LSTM, SVR, MLR, XGBoost | 80–20 | Rsq., RMSE
[23] | 2022 | Germany and Australia | TFT, ARIMA, LSTM, MLP, XGBoost | 70–20–20 | RMSE, MAE, MASE, Rsq., QuantileLoss
[24] | 2019 | China | MLP, GRU, XGBoost, Ensemble | 80–10 (random sampling) | MAE, MAPE, RMSE
Table 2. Summary of exogenous variables used in the forecasting models.
Variable | Unit | Short Name | Related Literature
Wind speed | m·s−1 | WS | [31,32]
Wind direction | Cardinal direction | WD | [9,33]
Ambient temperature | °C | T | [32,34]
Relative humidity | % | RH | [32,35]
Total precipitation | mm | TP | [36,37]
Cloud cover (low, medium, high, total) | okta | LCC, MCC, HCC, TCC | [9]
Solar irradiance | W·m−2 | R’ | [9,12]
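Several of these inputs are not archived directly but can be derived from reanalysis fields: wind speed and direction from the ERA5 u- and v-wind components [29], and relative humidity from air and dew-point temperature via the improved Magnus approximation [30]. The sketch below illustrates those derivations; the variable names (u, v, t2m_c, d2m_c) are illustrative assumptions, not the study's actual code.

```python
import numpy as np

def wind_speed_direction(u, v):
    """Wind speed (m/s) and meteorological wind direction (deg, direction the
    wind blows FROM) from u/v components, following the convention in [29]."""
    ws = np.sqrt(u**2 + v**2)
    wd = (180.0 + np.degrees(np.arctan2(u, v))) % 360.0
    return ws, wd

def relative_humidity(t2m_c, d2m_c):
    """Relative humidity (%) from 2 m air temperature and dew point (deg C)
    using the improved Magnus approximation [30]; the 6.1094 hPa factor cancels."""
    a, b = 17.625, 243.04
    e_sat = np.exp(a * t2m_c / (b + t2m_c))  # saturation vapour pressure (scaled)
    e_act = np.exp(a * d2m_c / (b + d2m_c))  # actual vapour pressure (scaled)
    return 100.0 * e_act / e_sat
```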
Table 3. Summary of hyperparameters used in the SARIMAX forecasting models.
Hyperparameter | Plant 1 | Plant 2 | Plant 3
p | 1 | 3 | 1
d | 0 | 0 | 0
q | 0 | 5 | 1
P | 5 | 3 | 5
D | 0 | 0 | 0
Q | 1 | 1 | 1
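For reference, a minimal sketch of how the Table 3 orders could be passed to a SARIMAX implementation (here, statsmodels) is shown below for Plant 1. The daily seasonal period s = 24 for hourly data is an assumption, since Table 3 lists only (p, d, q) and (P, D, Q); the function and variable names are illustrative.

```python
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

def fit_sarimax_plant1(y_train: pd.Series, exog_train: pd.DataFrame):
    """Plant 1 orders from Table 3: (p, d, q) = (1, 0, 0), (P, D, Q) = (5, 0, 1).
    A 24-hour seasonal period is assumed for the hourly series."""
    model = SARIMAX(
        y_train,
        exog=exog_train,
        order=(1, 0, 0),
        seasonal_order=(5, 0, 1, 24),
        enforce_stationarity=False,
        enforce_invertibility=False,
    )
    return model.fit(disp=False)

# Day-ahead forecast: the next 24 hours, given the next day's exogenous inputs.
# result = fit_sarimax_plant1(y_train, exog_train)
# forecast = result.forecast(steps=24, exog=exog_next_day)
```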
Table 4. Summary of LSTM hyperparameters.
Hyperparameter | Plant 1 | Plant 2 | Plant 3
Optimizer | Adam | Adam | Adam
Learning rate ¹ | 0.01 | 0.01 | 0.01
Epochs ² | 500 | 500 | 500
Loss function | MAE | MAE | MAE
Hidden layers
 LSTM | 3 | 3 | 3
 LSTM units | 32 | 32 | 32
 Activation | tanh | tanh | tanh
Output layers
 Layer | Dense | Dense | Dense
 Activation | Linear | Linear | Linear
¹ With auto-reduction on plateau. ² With save-best-model and early stopping.
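A minimal Keras sketch consistent with Table 4 is given below. The input window length, the callback settings realizing footnotes 1 and 2 (plateau factor, patience values, checkpoint file name), and the choice of framework are assumptions; only the layer count, unit count, activations, optimizer, learning rate, loss, and epoch budget come from the table.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lstm(timesteps: int, n_features: int) -> keras.Model:
    """Three stacked 32-unit tanh LSTM layers and a linear Dense output (Table 4)."""
    model = keras.Sequential([
        layers.Input(shape=(timesteps, n_features)),
        layers.LSTM(32, activation="tanh", return_sequences=True),
        layers.LSTM(32, activation="tanh", return_sequences=True),
        layers.LSTM(32, activation="tanh"),
        layers.Dense(1, activation="linear"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01), loss="mae")
    return model

# Footnotes 1 and 2 map onto standard Keras callbacks (settings here are assumed):
callbacks = [
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=10),
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=25, restore_best_weights=True),
    keras.callbacks.ModelCheckpoint("best_lstm.keras", save_best_only=True),
]
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=500, callbacks=callbacks)
```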
Table 5. Summary of XGBoost hyperparameters.
Hyperparameter | Plant 1 | Plant 2 | Plant 3
learning_rate | 0.3 | 0.01 | 0.01
n_estimators | 1000 | 2000 | 2000
subsample | 1 | 1 | 1
colsample_bytree | 1 | 1 | 1
colsample_bylevel | 1 | 1 | 1
min_child_weight | 1 | 1 | 1
max_depth | 6 | 6 | 6
objective | reg:squarederror | reg:squarederror | reg:squarederror
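The Table 5 settings map directly onto the scikit-learn-style XGBoost interface, as in the sketch below (shown with the Plant 2/Plant 3 values; Plant 1 differs only in learning_rate and n_estimators). The feature matrix and training call are illustrative.

```python
from xgboost import XGBRegressor

# Plant 2/3 configuration from Table 5.
model = XGBRegressor(
    learning_rate=0.01,
    n_estimators=2000,
    subsample=1,
    colsample_bytree=1,
    colsample_bylevel=1,
    min_child_weight=1,
    max_depth=6,
    objective="reg:squarederror",
)
# model.fit(X_train, y_train)
# y_pred = model.predict(X_test)
```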
Table 6. Summary of hybrid model configurations.
Model Name | Algorithm
HM1 | SARIMAX + LSTM
HM2 | SARIMAX + XGBoost
HM3 | LSTM + XGBoost
HM4 | SARIMAX + LSTM + XGBoost
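The table does not specify how the member forecasts are combined; one simple and common way to realize such hybrids is an unweighted average of the individual model outputs, sketched below purely for illustration.

```python
import numpy as np

def hybrid_forecast(*member_forecasts: np.ndarray) -> np.ndarray:
    """Combine individual model forecasts by simple averaging
    (one possible hybridization; the paper's exact rule is not stated here)."""
    return np.mean(np.vstack(member_forecasts), axis=0)

# HM2 (SARIMAX + XGBoost):          y_hm2 = hybrid_forecast(y_sarimax, y_xgb)
# HM4 (SARIMAX + LSTM + XGBoost):   y_hm4 = hybrid_forecast(y_sarimax, y_lstm, y_xgb)
```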
Table 7. Summary of the representative solar PV installations used in the forecasting models.
Plant Number | Major Island | Location | Installed Capacity (kW)
1 | Luzon | Pangasinan | 40.92
2 | Visayas | Negros Occidental | 605.00
3 | Mindanao | Davao del Norte | 1110.00
Table 8. Correlation of the exogenous variables with solar PV output at each plant.
Parameter | Plant 1 | Plant 2 | Plant 3
R’ | 0.98 * | 0.78 * | 0.75 *
WS | 0.10 | 0.34 | 0.30
WD | −0.08 | 0.11 | −0.31
RH | −0.66 * | −0.78 * | −0.72 *
T | 0.73 * | 0.78 * | 0.76 *
TP | −0.09 | −0.13 | −0.04
HCC | −0.05 | 0.09 | −0.11
LCC | −0.08 | 0.09 | −0.02
MCC | −0.09 | −0.06 | −0.05
TCC | 0.03 | 0.21 | 0.06
* |r| ≥ 0.5 and statistically significant.
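A minimal pandas sketch of this correlation screening is shown below; the DataFrame layout, column names, and target name (pv_output) are hypothetical, and the 0.5 threshold follows the table footnote.

```python
import pandas as pd

# df is assumed to hold hourly columns for the Table 2 variables plus the PV output.
EXOG_COLS = ["R", "WS", "WD", "RH", "T", "TP", "HCC", "LCC", "MCC", "TCC"]

def correlation_screen(df: pd.DataFrame, target: str = "pv_output",
                       threshold: float = 0.5) -> pd.DataFrame:
    """Pearson correlation of each candidate variable with PV output;
    flag variables with |r| >= threshold as strong predictors."""
    r = df[EXOG_COLS].corrwith(df[target], method="pearson")
    return pd.DataFrame({"r": r, "strong": r.abs() >= threshold})
```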
Table 9. Summary of VIF results.
Parameter | Plant 1, Step 1 | Plant 1, Step 2 | Plant 2, Step 1 | Plant 2, Step 2 | Plant 2, Step 3 | Plant 3, Step 1 | Plant 3, Step 2 | Plant 3, Step 3
R’ | 1.55 | 1.55 | 1.85 | 1.85 | 1.68 | 1.58 | 1.58 | 1.46
WS | 1.12 | 1.12 | 1.11 | 1.11 | 1.09 | 1.12 | 1.12 | 1.07
WD | 1.28 | 1.27 | 1.20 | 1.19 | 1.10 | 1.10 | 1.10 | 1.09
RH | 2.57 | 2.55 | 5.39 | 5.39 | 1.88 | 9.40 | 9.40 | 1.69
T | 2.03 | 2.02 | 5.45 | 5.45 * | rmvd | 10.02 | 10.02 * | rmvd
TP | 1.31 | 1.27 | 1.13 | 1.13 | 1.13 | 1.22 | 1.20 | 1.20
HCC | 15.59 | 1.31 | 10.38 | 1.21 | 1.21 | 16.03 | 1.16 | 1.16
LCC | 1.41 | 1.24 | 1.26 | 1.13 | 1.12 | 1.35 | 1.19 | 1.19
MCC | 1.73 | 1.30 | 1.27 | 1.17 | 1.17 | 1.56 | 1.44 | 1.43
TCC | 17.82 * | rmvd | 10.73 * | rmvd | - | 16.99 * | rmvd | -
* VIF > 5; the flagged variable has the highest VIF at that step and is removed (rmvd) next. The steps end once all VIF values are below 5.
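The stepwise elimination summarized in Table 9 can be reproduced with a short routine based on statsmodels' variance_inflation_factor, as sketched below. Adding a constant column before computing VIF is an implementation assumption; the threshold of 5 follows the table footnote.

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

def vif_elimination(X: pd.DataFrame, threshold: float = 5.0) -> pd.Series:
    """Iteratively drop the predictor with the highest VIF until all remaining
    VIF values fall below the threshold (the procedure behind Table 9)."""
    X = X.copy()
    while True:
        Xc = add_constant(X)  # constant prepended; skip it when indexing below
        vif = pd.Series(
            [variance_inflation_factor(Xc.values, i) for i in range(1, Xc.shape[1])],
            index=X.columns,
        )
        if vif.max() < threshold:
            return vif.sort_values(ascending=False)
        X = X.drop(columns=[vif.idxmax()])  # e.g., TCC first, then T, as in Table 9
```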
Table 10. Summary of Plant 1 average model accuracies.
Metric | SARIMAX | LSTM | XGBoost | HM1 | HM2 | HM3 | HM4
RMSE | 4.95 | 7.73 | 3.91 | 6.01 | 4.14 | 5.86 | 5.21
MAE | 2.44 | 3.41 | 1.79 | 2.86 | 2.01 | 2.72 | 2.55
MAPE | 10.15 | 17.50 | 12.47 | 15.19 | 12.35 | 15.59 | 14.71
RMSE, MAE, and MAPE values are in %. The monthly accuracies for each model are summarized in Table A1 in Appendix A.
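For clarity, a sketch of the three error metrics is given below. Expressing RMSE and MAE as a percentage of installed capacity is an assumed reading of the "values are in %" footnote, and restricting MAPE to nonzero (daytime) observations is likewise an assumption.

```python
import numpy as np

def forecast_errors(y_true: np.ndarray, y_pred: np.ndarray, capacity_kw: float) -> dict:
    """RMSE and MAE expressed as % of installed capacity (assumed normalization)
    and MAPE computed over nonzero (daytime) observations."""
    err = y_pred - y_true
    rmse_pct = 100.0 * np.sqrt(np.mean(err**2)) / capacity_kw
    mae_pct = 100.0 * np.mean(np.abs(err)) / capacity_kw
    day = y_true > 0
    mape_pct = 100.0 * np.mean(np.abs(err[day] / y_true[day]))
    return {"RMSE_%": rmse_pct, "MAE_%": mae_pct, "MAPE_%": mape_pct}

# Example for Plant 1 (installed capacity 40.92 kW, Table 7):
# forecast_errors(y_true, y_pred, capacity_kw=40.92)
```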
Table 11. Summary of Plant 2 average model accuracies.
Metric | SARIMAX | LSTM | XGBoost | HM1 | HM2 | HM3 | HM4
RMSE | 5.02 | 5.61 | 4.06 | 4.98 | 4.33 | 4.56 | 4.51
MAE | 2.46 | 2.99 | 2.11 | 2.57 | 2.20 | 2.41 | 2.34
MAPE | 13.16 | 28.09 | 19.05 | 23.25 | 16.74 | 24.47 | 22.27
RMSE, MAE, and MAPE values are in %. The monthly accuracies for each model are summarized in Table A2 in Appendix A.
Table 12. Summary of Plant 3 average model accuracies.
Metric | SARIMAX | LSTM | XGBoost | HM1 | HM2 | HM3 | HM4
RMSE | 5.25 | 4.40 | 2.98 | 4.54 | 4.72 | 3.81 | 4.20
MAE | 2.67 | 2.25 | 1.49 | 2.33 | 2.39 | 1.94 | 2.16
MAPE | 16.17 | 17.77 | 9.42 | 16.52 | 14.63 | 14.83 | 15.15
RMSE, MAE, and MAPE values are in %. The monthly accuracies for each model are summarized in Table A3 in Appendix A.
Table 13. Forecasting models with the highest accuracy based on the MAPE for each month at each power plant.
Month | Plant 1 | Plant 2 | Plant 3
January | XGBoost | XGBoost | XGBoost
February | XGBoost | XGBoost | XGBoost
March | HM2 | HM2 | XGBoost
April | XGBoost | XGBoost | XGBoost
May | XGBoost | HM3 | XGBoost
June | XGBoost | XGBoost | XGBoost
July | XGBoost | HM3 | XGBoost
August | XGBoost | SARIMAX | XGBoost
September | XGBoost | HM4 | HM2
October | HM2 | HM3 | XGBoost
November | SARIMAX | SARIMAX | HM4
December | SARIMAX | SARIMAX | SARIMAX
Annual Average | SARIMAX | SARIMAX | XGBoost
