Regional Photovoltaic Power Forecasting Using Vector Autoregression Model in South Korea

Abstract: Renewable energy forecasting is key to efficient resource use in terms of power generation and safe grid control. In this study, we investigated a short-term statistical forecasting model with 1 to 3 h horizons using photovoltaic operation data from 215 power plants throughout South Korea. A vector autoregression (VAR) model-based regional photovoltaic power forecasting system is proposed for seven clusters of power plants in South Korea. This method showed better predictability than the autoregressive integrated moving average (ARIMA) model. The normalized root-mean-square errors of hourly photovoltaic generation predictions obtained from VAR (ARIMA) were 8.5–10.9% (9.8–13.0%) and 18.5–22.8% (21.3–26.3%) for the 1 h and 3 h horizons, respectively, at 215 power plants. The coefficient of determination, R², was 4–5% higher for VAR than for ARIMA. Given its greater accuracy, the VAR model will be useful for economical and efficient grid management.


Introduction
The use of renewable energy is increasing due to the environmental impacts of fossil fuels and concerns about energy security. The European Union recently announced the REPowerEU plan for developing more affordable, secure, and sustainable energy sources such as solar power to displace consumption of natural gas [1]. The most commonly used renewable energy type in South Korea is solar energy, with national targets of 20% renewable energy by 2030 and 42% by 2034 [2]. Increased shares of solar power must be considered in advance for distributed grid operations and planning [3,4]. In Jeju, South Korea, curtailment of solar power has already started under a 20% share of the energy mix. Reliable forecasting of renewable energy is key to the optimization and technical development of management methods based on load balancing, which can reduce annual energy operating costs and ensure safe grid operation [5]. In addition, the volatility of power generation can be high due to weather conditions over the oceans surrounding South Korea. Thus, accurate forecasting is essential.
Numerous studies have predicted solar power generation using various methods. Studies that have employed artificial neural network (ANN) or autoregressive integrated moving average (ARIMA) models with variables such as solar radiation, generator module temperature, and module ambient temperature have been reported [6]. A support vector machine (SVM) model that predicts 10 min solar power generation has been designed [7]. In addition, statistical models such as time-series analysis, neural networks, SVMs, and numerical weather prediction models based on satellite data have been developed for short-term solar power forecasting [8][9][10][11][12][13]. The time-series method has been used with a neural network model to predict ultra-short-term solar power generation [14]. Other studies have established models to predict power consumption based on ARIMA and various ANNs [15]. Additional studies have employed several extended models based on the ANN and ARIMA models. The adaptive neuro-fuzzy inference system (ANFIS) [16], infinite impulse response multilayer perceptron (IIR-MLP) [17], locally feedback dynamic fuzzy neural network (LF-DFNN) [18], and deep learning models with several layers, such as the convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory neural network (LSTM), and CNN-LSTM and RNN-LSTM hybrids [19][20][21][22][23][24], are examples of extended models based on ANN. These models have demonstrated good predictability. The ARIMA model's predictive performance can be improved by applying multiple fractal noise to the noise of the time series [25]. Accuracy of the results can also be improved through application of an ensemble model and ANN to leverage the principal components between independent variables [26]. Predictive models using machine learning methods, such as ANNs, have the advantage of high accuracy and the disadvantage of low interpretability. In comparison, predictive models that use statistical techniques have the advantage of easy interpretation [27,28].
In this study, we propose a solar power forecasting system using a spatiotemporal time-series analysis model. The proposed model is novel in that it efficiently manages groups of photovoltaic power plants rather than individual plants. We designed the forecasting models using solar power generation data from 215 power plants throughout South Korea. We established vector autoregression (VAR) and ARIMA models for forecasting solar power generation 1 to 3 h in advance and then compared the accuracy of the two models. Section 2 provides detailed descriptions of the model and data. The constructed forecasting system is presented in Section 3. A summary and conclusions are given in Section 4.

Regional Solar Power Forecasting System
Energies 2022, 15, 7853

Regional Management Area
Cluster analysis is used to classify observations based on their similarity. Euclidean distance is commonly used as a measure of similarity, and the Ward method, a hierarchical clustering method, was used here. The hierarchical K-means clustering method is a clustering analysis method that complements K-means clustering with hierarchical cluster analysis. First, the number of clusters and their center points are determined through hierarchical cluster analysis, and then K-means clustering is performed using the calculated values as the initial center points. In hierarchical clustering, an observation that has been assigned to a specific cluster is excluded from further consideration; the subsequent K-means step compensates for this disadvantage. Furthermore, this method avoids the difficulty of selecting initial center points for K-means clustering.
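The two-stage procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a naive NumPy version of Ward's merge criterion, the function names `ward_labels` and `hierarchical_kmeans` are hypothetical, and the toy input (rows as plants, columns as features) stands in for the standardized generation data.

```python
import numpy as np

def ward_labels(X, k):
    """Naive agglomerative clustering with Ward's criterion, stopped at k clusters."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = X[clusters[a]], X[clusters[b]]
                na, nb = len(ca), len(cb)
                # Ward cost: increase in within-cluster sum of squares after merging
                cost = na * nb / (na + nb) * np.sum((ca.mean(0) - cb.mean(0)) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for j, members in enumerate(clusters):
        labels[members] = j
    return labels

def hierarchical_kmeans(X, k, n_iter=100):
    """K-means whose initial centers come from a Ward hierarchical clustering."""
    labels = ward_labels(X, k)
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    for _ in range(n_iter):
        # Reassign every observation to its nearest center (squared Euclidean distance)
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy data (assumption): 30 "plants" described by 2 features, in 3 separated groups
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(10, 2))
               for m in ([0, 0], [5, 5], [10, 0])])
labels, centers = hierarchical_kmeans(X, 3)
print(labels)
```

Because the K-means stage starts from the hierarchical centers, observations can still move to a different cluster after the hierarchical assignment, which is the property the text highlights.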

Forecasting Models
Figure 2 describes the data movement for 1 to 3 h ahead prediction.
Algorithm 1 briefly outlines how solar power generation is predicted using the VAR and ARIMA models.
Algorithm 1: Prediction of solar power generation using the VAR and ARIMA models.
Step 1. Build the models at each point using solar power generation data for the training period to construct the VAR (p) and ARIMA (p, d, q) models.
Step 2. Update the coefficients of the VAR and ARIMA models using data from the previous 30 days.
Step 3. Predict solar power generation 1 to 3 h in advance using the updated VAR and ARIMA models.
Step 4. Repeat Steps 2 and 3 to estimate solar power generation during the model evaluation period.
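The rolling update in Steps 2 to 4 can be sketched as below for the VAR case only. This is a minimal sketch under several assumptions: the coefficients are estimated by ordinary least squares with NumPy, the model is refitted at every hourly forecast origin (the text does not state the update frequency), the lag order p is taken as already fixed in Step 1, and the function names and simulated data are hypothetical.

```python
import numpy as np

def fit_var(Y, p):
    """OLS fit of a VAR(p). Y has shape (T, k). Returns intercept c and lag matrices A."""
    T, k = Y.shape
    # Each regressor row is [1, y_{t-1}, ..., y_{t-p}]
    Z = np.hstack([np.ones((T - p, 1))] + [Y[p - i:T - i] for i in range(1, p + 1)])
    B, *_ = np.linalg.lstsq(Z, Y[p:], rcond=None)
    c = B[0]
    A = [B[1 + (i - 1) * k:1 + i * k].T for i in range(1, p + 1)]
    return c, A

def forecast_var(c, A, history, steps):
    """Iterate the fitted VAR forward from the most recent observations."""
    p = len(A)
    hist = [row for row in history[-p:]]
    out = []
    for _ in range(steps):
        y = c + sum(A[i] @ hist[-1 - i] for i in range(p))
        out.append(y)
        hist.append(y)
    return np.array(out)

def rolling_forecast(Y, p, start, window=30 * 24, horizons=(1, 2, 3)):
    """Refit on the previous `window` hours at each origin t, then forecast ahead."""
    preds = {h: {} for h in horizons}
    for t in range(start, len(Y)):
        c, A = fit_var(Y[t - window:t], p)               # Step 2: update coefficients
        path = forecast_var(c, A, Y[:t], max(horizons))  # Step 3: 1 to 3 h ahead
        for h in horizons:
            if t + h - 1 < len(Y):
                preds[h][t + h - 1] = path[h - 1]        # forecast for hour t + h - 1
    return preds

# Simulated two-plant hourly series (assumption): a stable VAR(1) with small noise
rng = np.random.default_rng(0)
A_true = np.array([[0.6, 0.2], [0.1, 0.5]])
c_true = np.array([0.5, 0.3])
Y = np.zeros((24 * 40, 2))
for t in range(1, len(Y)):
    Y[t] = c_true + A_true @ Y[t - 1] + rng.normal(scale=0.1, size=2)

preds = rolling_forecast(Y, p=1, start=len(Y) - 24)
print(len(preds[1]), len(preds[3]))
```

Refitting on a fixed 30-day window keeps the lag order from Step 1 but lets the coefficients track recent conditions; an ARIMA counterpart would follow the same loop with a univariate model per plant.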
VAR is a model that extends the AR model, which is a univariate time-series analysis model, to a multivariate model, and reflects the correlations among variables [29]. VAR models are generally denoted as VAR (p), where the parameter p is a non-negative integer representing the maximum lag order. To determine the order of a VAR model, we often use Akaike's Information Criterion (AIC, [14]), the corrected AIC (AICc), and Schwarz's Bayesian Criterion (SBC, [15]).
ARIMA is a statistical model that considers both autoregression and moving averages. The AR part of ARIMA represents the current observation as a function of past observations, for time-series processes in which the current state depends on past states. The MA part of ARIMA represents the current observation as a function of past errors. The ARIMA model is used when data exhibit non-stationarity in the form of deterministic or stochastic trends, and variable transformation or differencing is performed to eliminate the non-stationarity. A non-seasonal ARIMA model is expressed as ARIMA (p,d,q), where p represents the order of the AR model, q represents the order of the MA model, and d represents the order of differencing.
To determine the order of an ARIMA model, we can also use AIC, AICc, and SBC.
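For reference, the standard textbook forms of the two models and of the selection criteria can be written as follows. The symbols used here ($\mathbf{c}$, $A_i$, $\boldsymbol{\varepsilon}_t$, $\phi_i$, $\theta_j$, the backshift operator $B$, the maximized likelihood $L$, the number of estimated parameters $m$, and the sample size $n$) are conventional notation assumed for illustration and are not necessarily the notation of the original equations.

```latex
% VAR(p) for a k-dimensional series y_t with k x k coefficient matrices A_i
\mathbf{y}_t = \mathbf{c} + \sum_{i=1}^{p} A_i \, \mathbf{y}_{t-i} + \boldsymbol{\varepsilon}_t

% Non-seasonal ARIMA(p,d,q), with B y_t = y_{t-1}
\Bigl(1 - \sum_{i=1}^{p} \phi_i B^{i}\Bigr) (1 - B)^{d} \, y_t
  = \Bigl(1 + \sum_{j=1}^{q} \theta_j B^{j}\Bigr) \varepsilon_t

% Information criteria (lower is better)
\mathrm{AIC}  = -2 \ln L + 2m, \qquad
\mathrm{AICc} = \mathrm{AIC} + \frac{2m(m+1)}{n - m - 1}, \qquad
\mathrm{SBC}  = -2 \ln L + m \ln n
```

Because the SBC penalty $m \ln n$ exceeds the AIC penalty $2m$ once $n$ is larger than about 7, SBC tends to select lower orders than AIC on long hourly series.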

Data and Evaluation Method
Solar power generation data from 2014 to 2016 were used for development and evaluation of the forecasting system. The data were collected at 1 h intervals from 00:00 on 1 January 2014 to 23:00 on 31 December 2016. Twelve power plants with zero solar power generation for periods longer than two weeks were excluded, leaving 215 power plants for the analysis. In addition, gaps in the observations during the dusk and night hours of 7 p.m. to 7 a.m., when no observations were available, were filled with 0. The analysis was performed using standardized data obtained by dividing the raw solar power generation data by the capacity of each power plant. Figure 3 shows the 215 power plants on a national map.
To compare the accuracy of the two models, we used the normalized root mean square error (nRMSE). This indicator is calculated by dividing the root mean squared error by the capacity of the solar power plant. A lower nRMSE value indicates a more accurate model. The formula for nRMSE is presented as Equation (3), where $y_i$ and $\hat{y}_i$ denote the actual and predicted values and $N$ is the number of predictions. In addition, p-values were calculated using Student's t-test to determine the significance of the prediction errors of the VAR and ARIMA models. The actual and predicted values used for calculating the prediction error were not standardized.
$$\mathrm{nRMSE} = \frac{1}{\text{Solar generator capacity}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2} \qquad (3)$$
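As a small worked illustration of the evaluation measures, the sketch below computes nRMSE as RMSE divided by plant capacity, together with a coefficient of determination. The R² definition used here (one minus the ratio of residual to total sum of squares) is an assumption, since the text does not spell out which definition was used, and the numbers are made up for the example.

```python
import numpy as np

def nrmse(actual, predicted, capacity):
    """Root mean squared error divided by the plant capacity (same unit as the data)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((actual - predicted) ** 2))
    return rmse / capacity

def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical hourly generation (kW) for a plant with 200 kW capacity
actual = [0, 50, 100, 50]
predicted = [0, 40, 90, 60]
print(nrmse(actual, predicted, capacity=200))  # sqrt(75) / 200, about 0.0433
print(r_squared(actual, predicted))            # 1 - 300 / 5000 = 0.94
```

Dividing by capacity makes errors comparable across plants of different sizes, which is what allows the cluster-level comparison between VAR and ARIMA.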

Regional Photovoltaic Power Management Areas
The results of hierarchical K-means clustering are shown in Figure 4. Based on Figure 4, the most appropriate number of clusters is seven. Mapping the plants by cluster showed that the clusters are divided by region. The first group contains 45 power plants, mainly in Jeollabuk-do. The second group includes 37 power plants in Jeollanam-do near the West Sea, and the third group has 22 power plants in Jeollanam-do near the South Sea. The fourth group consists of 30 power plants in Gyeongsangnam-do, and the fifth group contains 34 power plants in Gyeongsangbuk-do and Gangwon-do. The sixth group includes 36 power plants in Seoul, Gyeonggi-do, and Chungcheong-do. The seventh group comprises 11 power plants on Jeju Island.

Construction of a Regional Photovoltaic Power Prediction System
The VAR and ARIMA models were used for time-series analysis, with prediction horizons ranging from 1 to 3 h. The VAR-based forecasting system contained seven forecast models, one for each cluster, while 215 forecast models representing individual power plants were constructed for ARIMA.

VAR
Table 2 shows the AIC and SBC values obtained by varying the lag order p to identify the appropriate VAR (p) model for Group 2. As shown in Table 2, the most suitable VAR model for Group 2 is the VAR (2) model, which has the lowest SBC value. Although the VAR (5) model has the smallest AIC value, the simpler VAR (2) model is preferred based on the principle of parsimony.
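The order selection described above can be illustrated with a short sketch that fits VAR (p) for several lag orders and compares AIC and SBC. It assumes OLS estimation and one common multivariate form of the criteria (log-determinant of the residual covariance plus a penalty); the function name, the simulated VAR (2) data, and the exact criterion formulas are assumptions and may differ from what the authors' software reports.

```python
import numpy as np

def var_information_criteria(Y, max_lag):
    """Return {p: (aic, sbc)} for VAR(p), p = 1..max_lag, fitted on a common sample."""
    T_full, k = Y.shape
    T = T_full - max_lag                      # same effective sample for every p
    target = Y[max_lag:]
    results = {}
    for p in range(1, max_lag + 1):
        # Regressor rows: [1, y_{t-1}, ..., y_{t-p}]
        Z = np.hstack([np.ones((T, 1))] +
                      [Y[max_lag - i:T_full - i] for i in range(1, p + 1)])
        B, *_ = np.linalg.lstsq(Z, target, rcond=None)
        resid = target - Z @ B
        sigma = resid.T @ resid / T           # residual covariance estimate
        _, logdet = np.linalg.slogdet(sigma)
        n_params = k * (k * p + 1)            # lag coefficients plus intercepts
        aic = logdet + 2 * n_params / T
        sbc = logdet + np.log(T) * n_params / T
        results[p] = (aic, sbc)
    return results

# Simulated bivariate VAR(2) series (assumption, for illustration only)
rng = np.random.default_rng(0)
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
A2 = np.array([[-0.5, 0.0], [0.1, -0.4]])
Y = np.zeros((3000, 2))
for t in range(2, len(Y)):
    Y[t] = A1 @ Y[t - 1] + A2 @ Y[t - 2] + rng.normal(size=2)

crit = var_information_criteria(Y, max_lag=5)
best_p = min(crit, key=lambda p: crit[p][1])  # choose the order with the lowest SBC
for p, (aic, sbc) in crit.items():
    print(p, round(aic, 4), round(sbc, 4))
print("order selected by SBC:", best_p)
```

When AIC and SBC disagree, as in Table 2, choosing by SBC favors the smaller model because its penalty grows with the logarithm of the sample size.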

Group      VAR (p)
Group 1    VAR (2)
Group 2    VAR (2)
Group 3    VAR (2)
Group 4    VAR (2)
Group 5    VAR (2)
Group 6    VAR (2)
Group 7    VAR (3)

Figure 6 is a time-series plot comparing values predicted using the VAR model from 00:00 on 1 October 2016 to 23:00 on 7 October 2016. The black line represents actual observations, the red line represents the 1 h predicted values calculated using the model, the green line represents the 2 h predicted values, and the blue line represents the 3 h predicted values.

ARIMA
The model fit measures AIC and SBC were used to determine the p, d, and q values of the ARIMA (p,d,q) model. Lower values of both AIC and SBC indicate a better model fit. Table 4 shows the results of ARIMA model construction using the training dataset from 1 January 2014 to 31 December 2015 at power plant #9289 in Group 2. As shown in Table 4, the most suitable ARIMA model for solar power plant #9289 was the one with the lowest AIC and SBC values. Table 5 presents the ARIMA models fitted for the solar power plants, with their frequencies in each cluster.
Figure 7 is a time-series plot comparing values predicted using the ARIMA model from 00:00 on 1 October 2016 to 23:00 on 7 October 2016. The black line represents actual observations, the red line represents the 1 h predicted values calculated using the model, the green line represents the 2 h predicted values, and the blue line represents the 3 h predicted values.

Evaluation
Predicted solar power generation levels for 1 to 3 h horizons were evaluated from 00:00 on 1 January 2016 to 23:00 on 31 December 2016. Tables 6 and 7 compare the VAR and ARIMA models based on nRMSE and R² values, which were calculated by comparing the values predicted 1 to 3 h in advance with actual observations. Tables 6 and 7 also present the p-values obtained from Student's t-test to assess the significance of the difference between the prediction errors of the VAR and ARIMA models. Figures 8 and 9 show bar plots of nRMSE and R² for 1 to 3 h advance predictions obtained from the VAR and ARIMA models.
In all clusters, the prediction errors were smaller for the VAR model than for the ARIMA model for 1, 2, and 3 h advance prediction of solar power generation. In addition, R² values were higher for the VAR model than for the ARIMA model at all three horizons. This pattern suggests that the VAR model is more suitable than the ARIMA model for forecasting solar power generation 1 to 3 h in advance.
The VAR forecasting models constructed for the seven clustered regions showed slightly different prediction results. To investigate the possible causes of this difference, the properties of solar resources were compared. Groups 2 and 3 showed the worst and best model performance, respectively. Group 3 predictions had relatively good accuracy, with smaller nRMSE and larger R² than the corresponding average values. Averaged hourly variation in irradiance was 0.378–0.402 kWh for Group 3. On the other hand, Group 2 showed larger nRMSE and smaller R² than other groups for 1 to 3 h prediction times. Averaged hourly variation in irradiance was 0.374–0.774 kWh for Group 2. Group 2 also exhibited greater spatial variation in hourly changes in irradiance than Group 3. Thus, VAR performance may be affected by regional differences in the spatiotemporal variation in hourly solar resources.
In the future, we plan to study the following topics. Due to the climate of Korea, the patterns of solar radiation and solar power generation are strongly affected by seasonal changes. In addition, the regional presence of clouds affects solar radiation and solar power generation. Therefore, to improve the accuracy of solar radiation and solar power generation prediction models, a VAR model that considers seasonality or a prediction model in the form of a transition function including the presence of clouds as an exogenous variable should be used. Photovoltaic system operation in relation to various solar radiation parameters requires consideration in future advanced forecasting models, as suggested previously [30]. Recent research trends on solar radiation and solar power generation prediction show that both time-series and machine learning methods are applicable. Thus, more accurate predictions may be obtained using machine learning in future studies.

Figure 1
Figure 1 is a schematic diagram showing the development of the photovoltaic power forecasting model. The forecasting model employs long-term memory for time-series analysis. The process of model construction was divided into three phases: definition of regional management areas using cluster analysis, forecasting model construction, and model evaluation. The datasets used for the three phases differed slightly. Cluster analysis was conducted for the period from 1 January 2014 at 00:00 LST to 31 December 2016 at 23:00 LST. The forecasting models were then constructed using the training data from 1 January 2014 at 00:00 LST to 31 December 2015 at 23:00 LST. Finally, model evaluation used the data from 1 January 2016 at 00:00 LST to 31 December 2016 at 23:00 LST.

Figure 1. Schematic diagram of the photovoltaic power forecasting system based on cluster and time-series analyses.

Figure 2. Data management for prediction 1 to 3 h in advance.

Figure 3. Spatial distribution of 215 power plants (black dots) on a national map.

Figure 4. Dendrogram of clusters based on standardized solar power generation.

Figure 5
Figure 5 shows the clusters, presented in different colors, on a map showing the latitude and longitude of the power plants, and Table 1 lists the number of plants and regions associated with each of the seven clusters.

Figure 8. nRMSE bar plot comparing the VAR and ARIMA models for (a) 1 h, (b) 2 h, and (c) 3 h advance forecasting. Blue and red indicate error statistics obtained from the VAR and ARIMA forecasting models, respectively.

Table 1. The numbers of solar power plants and regions of the seven groups.

Table 2. Fitting results for p of the VAR model for Group 2.

Table 3 presents the results of VAR model construction for each group, obtained using a training dataset from 1 January 2014 to 31 December 2015. The optimal VAR model for each group was selected using AIC and SBC.

Table 3. Results of VAR model selection for each cluster.

Table 4. Fitting of p, d, q of the ARIMA model for power plant #9289.

Table 5. Frequencies of ARIMA model results for solar power plants in each cluster.
[Table columns: # of Power Plants; ARIMA (p,d,q)]

Table 6. Comparison of the VAR and ARIMA models based on nRMSE for 1–3 h advance power forecasting.

Table 7. Comparison of the VAR and ARIMA models based on R² for 1–3 h advance power forecasting.