1. Introduction
Rapid climate change caused by industrialization and carbon emissions, together with fossil fuel depletion, has emerged as a worldwide issue [1]. In response, the Kyoto Protocol (1997) and the Paris Agreement (2016) were concluded to promote decarbonization globally [2,3]. South Korea is one of the top 10 countries with the highest per capita carbon emissions. Accordingly, the South Korean government announced the Renewable Energy 3020 Plan (2017), which aims to achieve 20% renewable energy generation by 2030 and to supply more than 95% of new facilities with clean energy, such as solar PV and wind power [4]. Solar PV is the most popular form of clean energy, and large-scale solar PV farms have been constructed worldwide as the cost of solar panels and power generation facilities has declined over the past decade [5]. The United States, Germany, and China have representative gigawatt-scale solar PV farms. In South Korea, solar PV capacity grew from 467 MW in 2013 to 5.7 GW in 2017, constituting 38% of the country's total renewable energy capacity [6].
Solar PV generation converts sunlight into electricity through the photovoltaic effect: light energy from the sun passes through the atmosphere and is absorbed by the solar panel. Its advantage is that it relies on a clean and inexhaustible resource [7]. Compared with other renewable energy generation technologies, its installation and maintenance costs are low, and its life expectancy exceeds 20 years. Furthermore, installing a solar PV power plant causes minimal damage to the surrounding natural environment. However, solar PV generation requires a large installation area because of its low energy density, and its output responds sensitively to fluctuations in external meteorological factors, such as clouds moved by the wind, naturally occurring yellow dust, or particulate matter (PM) generated in urban centers. Because these meteorological factors change in fluid and complex ways, solar PV generation is difficult to predict, which threatens the stability of the Smart Grid, a technology that combines information and communication technology with the power grid [8]. Consequently, accurate forecasting technology that helps stabilize power supply and demand is critical. Without an accurate supply and demand plan, huge financial and social losses can occur, such as blackouts or the consumption of more resources than necessary. Therefore, accurate forecasting of renewable energy generation is critical for establishing an efficient power supply and demand plan.
Recently, air pollution caused by PM has emerged as a social issue in South Korea [9]. As the PM concentration in the atmosphere increases, PM absorbs or scatters solar radiation before the radiation passes through the atmosphere and reaches the surface, reducing the irradiance reaching the solar panel. Most previous studies have been conducted in the dry regions of the Middle East, where the effects of red soil have been analyzed, or in Southeast Asia, where natural and anthropogenic PM emissions are higher than in other regions [10,11,12]. Furthermore, these studies analyzed the effect of various types of dust accumulated on solar panels rather than the influence of PM concentrations distributed in the atmosphere. Therefore, this study analyzes the effects of atmospheric air pollutant concentrations, including PM10 and PM2.5, on solar PV generation and incorporates them into the prediction model.
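To make the attenuation mechanism concrete, the following sketch applies the Beer-Lambert law to the direct solar beam using an aerosol optical depth (AOD) value, the quantity provided by the AOD satellite image. It is a simplified illustration under stated assumptions, not the model used in this study: it considers aerosols only (ignoring Rayleigh scattering, water vapor, ozone, and clouds), uses a plane-parallel air mass of 1/cos(zenith), and the function name `direct_beam_irradiance` is hypothetical.

```python
import math


def direct_beam_irradiance(i0: float, aod: float, solar_zenith_deg: float) -> float:
    """Aerosol-only Beer-Lambert attenuation of the direct beam (simplified sketch).

    i0: top-of-atmosphere direct irradiance in W/m^2 (about 1361 W/m^2)
    aod: aerosol optical depth (dimensionless), e.g. read from an AOD image pixel
    solar_zenith_deg: solar zenith angle in degrees; the sun must be above the horizon
    """
    if not 0 <= solar_zenith_deg < 90:
        raise ValueError("solar zenith angle must be in [0, 90) degrees")
    # Plane-parallel air mass; ignores Earth curvature and refraction near the horizon.
    air_mass = 1.0 / math.cos(math.radians(solar_zenith_deg))
    # Beer-Lambert law: transmittance decays exponentially with the optical path.
    return i0 * math.exp(-aod * air_mass)


# Higher aerosol loading (a hazier, more polluted sky) leaves less direct irradiance.
for aod in (0.1, 0.5, 1.0):
    print(aod, round(direct_beam_irradiance(1361.0, aod, 30.0), 1))
```

Because the optical depth enters the exponent, the same increase in AOD removes a larger share of the beam when the sun is low and the path through the atmosphere is long.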
Solar PV generation prediction methods can be classified into direct methods, which predict solar PV generation from various independent parameters, and indirect methods, which use predicted irradiance as an independent parameter. The prediction parameters can also be classified into two types. The first type is numerical text data composed of parameters such as temperature, humidity, and precipitation, provided by the Meteorological Agency [13,14,15,16,17]. These data are available at various time resolutions, including hourly data, and solar PV generation is predicted using the time-series characteristics contained in the chronologically organized data. However, this approach does not reflect the spatial characteristics of parameters such as clouds and PM displaced by the wind. The second type is motion vectors or indices of clouds and aerosols derived from satellite images [18,19,20,21,22]. Shading by clouds and scattering of light by yellow dust or PM cause significant fluctuations in insolation, which has the most direct influence on solar PV generation prediction. The resulting increase or decrease in irradiance can be captured by tracking the motion vectors of clouds and aerosols in satellite images. However, because satellite images cover a large area, it is challenging to obtain the detailed information about a specific area that is needed to predict solar PV generation.
Clouds and PM values change with time at an observation point. However, when the observation area is expanded, clouds and PM also exhibit spatial characteristics because they are moved by the wind. Therefore, unlike previous studies that used numerical text data or satellite images individually [13,14,15,16,17,18,19,20,21,22], a hybrid spatio-temporal model was previously developed to predict solar PV generation by combining numerical text data with information extracted from satellite images [23]. It simultaneously combines the time-series characteristics of numerical text data and the spatial characteristics of satellite images. However, that hybrid spatio-temporal prediction model targeted a solar PV power plant in a single region [23]. The output of a single site fluctuates sensitively with changes in weather; if the generation of multiple distant regions is aggregated, however, the smoothing effect can prevent extreme fluctuations in total solar PV generation, enabling an efficient power supply and demand plan. Therefore, to address the weather sensitivity of single-site solar PV generation and to surpass the performance of the single-site prediction model, this study analyzed multiple regions in South Korea and developed an advanced integrated solar PV generation prediction model. Because the single-site model predicted the generation of only one solar PV power plant, located in Incheon, solar PV power plants in two additional regions, Busan and Yeongam, were added to enable multisite prediction. With an advanced multisite integrated model for South Korea, the generation of future solar PV power plants can also be predicted simply by entering the facility and geographic information of each plant. Therefore, this study proposes an advanced multisite integrated hybrid spatio-temporal solar PV generation prediction model for South Korea. It combines spatial information extracted from satellite images, which captures spatial characteristics over a wider area, with the numerical weather data mainly used in conventional solar PV generation prediction studies.
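The smoothing effect of aggregating distant sites can be illustrated with synthetic data. The sketch below assumes a hypothetical clear-sky-like daily profile and independent random cloud factors for three sites; neither comes from the study's data, and `relative_variability` (standard deviation of hour-to-hour changes divided by mean output) is a hypothetical metric chosen for illustration.

```python
import numpy as np


def relative_variability(power: np.ndarray) -> float:
    """Std. dev. of hour-to-hour changes divided by mean output (scale-free)."""
    return float(np.std(np.diff(power)) / np.mean(power))


rng = np.random.default_rng(0)
hours = 24 * 365
n_sites = 3  # e.g. three plants in distant regions (synthetic, not measured data)

# Hypothetical clear-sky-like daily shape: zero at night, peak at noon.
t = np.arange(hours)
clear_sky = np.clip(np.sin(2 * np.pi * ((t % 24) - 6) / 24), 0, None)

# Independent cloud factors per site: distant sites see different clouds.
cloud_factor = rng.uniform(0.2, 1.0, size=(n_sites, hours))
site_power = clear_sky * cloud_factor  # shape (n_sites, hours), per-unit capacity
aggregate = site_power.sum(axis=0)

single = [relative_variability(p) for p in site_power]
print("single-site:", np.round(single, 3))
print("aggregate:  ", round(relative_variability(aggregate), 3))
```

With fully independent site fluctuations, the fluctuating component of the aggregate shrinks roughly in proportion to 1/sqrt(n) relative to its mean; real sites are partly correlated, so the actual reduction is smaller.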
Various machine learning algorithms and prediction techniques have been used to predict solar PV generation [24,25,26,27,28,29]. In this study, an hourly advanced multisite integrated hybrid spatio-temporal solar PV generation prediction model was developed to be more accurate and precise than a single-site solar PV generation prediction model. Prediction models based on statistical and machine learning algorithms, namely the SARIMAX, SVR, DNN, LSTM, Random Forest, and SARIMAX-LSTM models, were used.
Research Framework
This study develops an hourly advanced multisite integrated hybrid spatio-temporal solar PV generation prediction model for South Korea. The prediction model uses meteorological numerical text data provided by the Korea Meteorological Agency (KMA) and spatial information extracted from satellite images to reflect both temporal and spatial characteristics. By reflecting these spatio-temporal characteristics, the model can achieve higher prediction accuracy than models that use only numerical text data or only satellite images.
Figure 1 shows the overall flow of this study. The first step was to select solar PV power plants in three cities in South Korea, namely, Incheon, Busan, and Yeongam. A database (DB) was built by collecting and preprocessing the meteorological information provided by the KMA for each region and the satellite images provided by the National Meteorological Satellite Center (NMSC). The second step was to extract the necessary spatial information from four satellite images: the wind direction vector and wind speed from the atmospheric motion vector (AMV) image, the amount and thickness of clouds from the cloud optical thickness (COT) image, the amount and concentration of PM from the aerosol optical depth (AOD) image, and the amount of irradiance from the insolation (INS) image. The third step was to set the center coordinates of each region and the region of interest (ROI) around them. In addition, adjacent regions (ROI_adj), each the same size as the ROI, were set in the eight directions surrounding the ROI. To enable the prediction models to learn spatial information, the movement of clouds and PM according to the wind direction was analyzed in the ROI and ROI_adj. The fourth step was to combine the meteorological numerical text data DB built in the first step with the DB of data extracted from the satellite images and to perform a correlation analysis between each meteorological parameter, including clouds and PM, and solar PV generation. Finally, the fifth step was to develop the hourly advanced multisite integrated hybrid spatio-temporal solar PV generation prediction model by applying SARIMAX, a traditional time-series analysis method; SVR; DNN; LSTM; Random Forest; and the SARIMAX-LSTM model, which combines the advantages of SARIMAX and LSTM. Parameter optimization was then performed for each technique to improve prediction performance.
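The third step (an ROI around each plant plus eight equally sized adjacent regions) can be sketched as follows. This is a minimal illustration, assuming each satellite product has already been resampled to a 2-D pixel grid for one time step and that missing pixels are stored as NaN; the ROI size used in the study is not given here, so `half`, the direction labels, and the function name `roi_means` are hypothetical.

```python
import numpy as np

# Offsets (row, col) of the eight adjacent regions, in units of one ROI width.
DIRECTIONS = {
    "N": (-1, 0), "NE": (-1, 1), "E": (0, 1), "SE": (1, 1),
    "S": (1, 0), "SW": (1, -1), "W": (0, -1), "NW": (-1, -1),
}


def roi_means(image: np.ndarray, center: tuple[int, int], half: int) -> dict[str, float]:
    """Mean pixel value of the ROI and of the eight same-sized adjacent regions.

    image: 2-D array for one time step (e.g. a COT or AOD field)
    center: (row, col) pixel of the plant's center coordinate
    half: half-width of the square ROI in pixels (side = 2 * half + 1)
    """
    size = 2 * half + 1
    r0, c0 = center
    out = {}
    for name, (dr, dc) in {"ROI": (0, 0), **DIRECTIONS}.items():
        r, c = r0 + dr * size, c0 + dc * size
        # Reject windows that fall off the grid (negative indices would wrap around).
        if r - half < 0 or c - half < 0:
            raise ValueError(f"{name} window falls outside the image")
        window = image[r - half : r + half + 1, c - half : c + half + 1]
        if window.shape != (size, size):
            raise ValueError(f"{name} window falls outside the image")
        # nanmean ignores pixels flagged as missing (NaN) in the satellite product.
        out[name] = float(np.nanmean(window))
    return out


image = np.arange(81, dtype=float).reshape(9, 9)  # toy 9 x 9 field
means = roi_means(image, center=(4, 4), half=1)
print(means)
```

Computing these nine values per hour yields a feature vector in which, for example, a rising COT value in the upwind ROI_adj can precede a drop in irradiance over the ROI.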
4. Experimental Results
To compare the performance of the single-site and multisite solar PV generation prediction models, the single-site model from a previous study [23] was validated using 21 of the 36 parameters, excluding the facility and geographic parameters, which the single-site model does not use.
Table 13 shows the evaluation results obtained by applying the data of the three regions to the single-site solar PV generation prediction model from the previous study. Based on SMAPE, the models ranked in prediction performance as follows: DNN, ARIMAX, SVR_Linear, SVR_RBF, and ANN. Among the five models, the ARIMAX model, which handles multivariate time-series data, performed best on all error metrics except SMAPE and MBE. Because the ARIMAX model captures time-series characteristics, it achieves a certain level of predictive performance, but not optimal performance. The SVR_Linear, ARIMAX, and DNN models show satisfactory performance, whereas the ANN model shows severe performance degradation. However, none of the five models met the criteria of ASHRAE Guideline 14.
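The error metrics and the ASHRAE Guideline 14 check can be sketched as below. The text does not state the exact formulas, so common definitions are assumed: SMAPE uses the denominator (|y| + |ŷ|)/2 and skips hours where both values are zero (e.g. night); MBE and Cv are percentages of the observed total and mean, with MBE = 100·Σ(y − ŷ)/Σy, so a positive MBE indicates under-prediction; and the n − p degrees-of-freedom correction in the Guideline is simplified to n. The thresholds are the Guideline's hourly calibration criteria (|MBE| ≤ 10%, Cv(RMSE) ≤ 30%); the function names are hypothetical.

```python
import numpy as np


def evaluate(y_true, y_pred) -> dict[str, float]:
    """MAE, RMSE, SMAPE (%), MBE (%), and Cv(RMSE) (%) under common definitions."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    err = y - p
    rmse = float(np.sqrt(np.mean(err ** 2)))
    denom = (np.abs(y) + np.abs(p)) / 2
    nz = denom > 0  # skip hours where observation and prediction are both zero
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": rmse,
        "SMAPE": float(100 * np.mean(np.abs(err[nz]) / denom[nz])),
        "MBE": float(100 * err.sum() / y.sum()),
        "Cv": float(100 * rmse / y.mean()),
    }


def meets_ashrae14_hourly(m: dict[str, float]) -> bool:
    """ASHRAE Guideline 14 hourly criteria: |MBE| <= 10 % and Cv(RMSE) <= 30 %."""
    return abs(m["MBE"]) <= 10.0 and m["Cv"] <= 30.0


obs = [0, 100, 200, 300, 0]
pred = [0, 110, 190, 330, 0]
m = evaluate(obs, pred)
print(m, meets_ashrae14_hourly(m))
```

Under these definitions, the SARIMAX-LSTM values reported in the text (MBE 2.65, Cv 29.92) pass both thresholds, with Cv only just inside the 30% limit.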
Table 14 shows the prediction results of the six models proposed for multisite solar PV generation in this study. Based on SMAPE, the models ranked in prediction performance as follows: Random Forest, SARIMAX-LSTM, DNN, LSTM, SARIMAX, and SVR_Linear. The Random Forest model performs best in terms of SMAPE but does not meet ASHRAE Guideline 14. The SARIMAX model performs better than the ARIMAX model. Compared with the existing models, the SVR_Linear and DNN models improve by 3.96% and 10.5%, respectively, in terms of RMSE. Although the LSTM model performs worse than the newly proposed DNN model, the SARIMAX-LSTM model, which combines the LSTM and SARIMAX models through a stacking ensemble technique, performs best among all proposed models. Furthermore, with an MBE of 2.65 and a Cv of 29.92, the SARIMAX-LSTM model is the only model in Tables 13 and 14 that meets the criteria of ASHRAE Guideline 14.
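The stacking idea behind the SARIMAX-LSTM model can be sketched in a few lines: base learners produce forecasts, and a meta-learner fitted on a held-out segment learns how to combine them. To keep the example self-contained, the sketch substitutes simple stand-ins, a seasonal naive forecast (same hour on the previous day) for SARIMAX and a linear autoregression on the last few lags for LSTM, with a linear meta-learner. It is not the authors' implementation; all function names and the synthetic series are hypothetical, and the forecasts are one step ahead using observed lags.

```python
import numpy as np


def lag_matrix(y, lags, start, stop):
    """Rows [y[t-1], ..., y[t-lags]] for t in range(start, stop)."""
    return np.column_stack([y[start - k : stop - k] for k in range(1, lags + 1)])


def fit_linear(X, target):
    """Least-squares coefficients, intercept first."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, target, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def base_predictions(y, ar_coef, lags, start, stop):
    """One-step-ahead forecasts of two base learners for t in range(start, stop)."""
    seasonal_naive = y[start - 24 : stop - 24]  # stand-in for SARIMAX: same hour yesterday
    autoregressive = predict_linear(ar_coef, lag_matrix(y, lags, start, stop))  # stand-in for LSTM
    return np.column_stack([seasonal_naive, autoregressive])


def stacked_forecast(y, lags, train_end, val_end, test_end):
    # Level 0: fit the autoregressive base learner on the training segment only.
    ar_coef = fit_linear(lag_matrix(y, lags, 24, train_end), y[24:train_end])
    # Level 1: fit the meta-learner on base forecasts for a held-out validation
    # segment, so it learns the weights on data the base learner was not fitted to.
    meta_coef = fit_linear(base_predictions(y, ar_coef, lags, train_end, val_end),
                           y[train_end:val_end])
    # Apply both levels to the test segment.
    return predict_linear(meta_coef, base_predictions(y, ar_coef, lags, val_end, test_end))


# Synthetic hourly series: hypothetical daily PV shape plus noise (not study data).
rng = np.random.default_rng(1)
t = np.arange(24 * 60)
y = 500 * np.clip(np.sin(2 * np.pi * ((t % 24) - 6) / 24), 0, None) + rng.normal(0, 20, t.size)
train_end, val_end, test_end = 24 * 40, 24 * 50, 24 * 60
forecast = stacked_forecast(y, lags=3, train_end=train_end, val_end=val_end, test_end=test_end)
rmse = float(np.sqrt(np.mean((y[val_end:test_end] - forecast) ** 2)))
print(f"stacked one-step RMSE on test days: {rmse:.1f}")
```

Fitting the meta-learner on a separate validation segment, rather than on the base learner's own training data, keeps it from over-trusting a base model that merely memorized its training set.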
Figure 8 shows 50 h of prediction results for the SARIMAX, SVR_Linear, LSTM, DNN, Random Forest, and SARIMAX-LSTM models. The thick black line represents the observed values, which closely match the predictions of all models. The SARIMAX-LSTM model, marked with a solid red line, performs better than the other models.
5. Discussion
The single-site solar PV generation prediction model has limitations when applied to multisite data. The ARIMAX model in the single-site model and the SARIMAX model in the multisite model, both of which capture multivariate time-series characteristics, perform better than the other models but do not fulfill the criteria of ASHRAE Guideline 14. When applied to the multisite data set, the single-site model performs similarly to the multisite model but does not achieve optimal results, because it cannot learn from several factors included in the multisite data, such as the facility and geographic information of the solar PV power plants. To improve the performance of the proposed model, the factors hindering prediction performance must be identified and addressed. The main inhibiting factor is considered to be the missing values in the AMV data. In the preprocessing step, the wind direction arrows in the AMV image must be recognized before proceeding to the next step. However, if no wind direction arrow appears within the ROI of the AMV image, the corresponding time step is recorded as a missing value. Therefore, if the number of missing values can be reduced, either by applying interpolation methods or by extracting the satellite image data with other methods, the models could achieve better performance.
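One of the interpolation options mentioned above could look like the following sketch, which assumes the AMV data have already been reduced to an hourly wind direction and speed series per ROI, with NaN where no arrow was detected. Interpolating direction angles directly fails across north (between 350° and 10° a naive average gives 180°), so the sketch interpolates the eastward (u) and northward (v) wind components linearly in time and converts back. The meteorological convention (direction the wind blows from, clockwise from north) and the function name `interpolate_wind` are assumptions.

```python
import numpy as np


def interpolate_wind(direction_deg, speed):
    """Fill NaN gaps in an hourly wind series by interpolating u/v components."""
    d = np.radians(np.asarray(direction_deg, dtype=float))
    s = np.asarray(speed, dtype=float)
    # Meteorological convention: direction the wind blows *from*, clockwise from north.
    u, v = -s * np.sin(d), -s * np.cos(d)
    t = np.arange(len(s))
    ok = ~(np.isnan(u) | np.isnan(v))
    if ok.sum() < 2:
        raise ValueError("need at least two valid observations")
    # Linear interpolation in time; np.interp holds end values constant beyond the data.
    u_f = np.interp(t, t[ok], u[ok])
    v_f = np.interp(t, t[ok], v[ok])
    speed_f = np.hypot(u_f, v_f)
    direction_f = np.degrees(np.arctan2(-u_f, -v_f)) % 360.0
    return direction_f, speed_f


direction, speed = interpolate_wind([350.0, np.nan, 10.0], [5.0, np.nan, 5.0])
print(direction, speed)  # the gap is filled with a northerly wind, not a southerly one
```

A known trade-off of component interpolation is that the filled speed is slightly lower than the neighboring speeds when the directions differ; for long gaps, a spatial method that borrows arrows from the ROI_adj regions may be preferable.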
6. Conclusions
This study proposed an advanced multisite integrated hybrid spatio-temporal solar PV generation prediction model that combines time-series-based meteorological numerical text data with satellite image data carrying spatial information, in order to predict solar PV generation precisely and accurately for solar PV power plants in multiple regions. The existing data provided by the KMA contain time-series characteristics but do not reflect the spatial characteristics of clouds and PM moving according to the wind direction. Therefore, data on clouds and PM moving with the wind were extracted from satellite images to capture their spatial characteristics as well. The model predicted the solar PV generation of existing solar PV power plants both in a single region and across multiple regions. Data from 2015 to 2018 were used for three solar PV power plants in Incheon, Busan, and Yeongam, South Korea. To reflect the spatial characteristics of clouds and PM, data from 2015 to 2017 were first used for training to predict the amount of clouds and PM in 2018, and solar PV generation in 2018 was then predicted using the predicted cloud and PM data. To develop the optimal prediction model, SARIMAX, a traditional time-series analysis method, and the SVR_Linear, DNN, LSTM, Random Forest, and SARIMAX-LSTM models based on machine learning algorithms were used.
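The two-stage procedure described above (train on 2015 to 2017, first predict 2018 cloud and PM features, then predict 2018 generation from those predicted features) can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: stage 1 is assumed to use the KMA numerical weather variables as inputs, a linear least-squares model stands in for the SARIMAX, LSTM, and other models used in the study, and all names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_least_squares(X, target):
    """Hypothetical stand-in model: linear least squares with intercept.
    Returns a predict function; `target` may be 1-D or 2-D (multi-output)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, target, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def two_stage_forecast(years, weather, spatial, pv, fit=fit_least_squares, test_year=2018):
    """years: (n,) year of each hourly row; weather: (n, p) numerical weather data;
    spatial: (n, q) satellite-derived cloud/PM features; pv: (n,) generation."""
    train, test = years < test_year, years == test_year
    # Stage 1: learn cloud/PM features on the training years, predict them for the test year.
    predict_spatial = fit(weather[train], spatial[train])
    spatial_hat = predict_spatial(weather[test])
    # Stage 2: learn PV generation from weather plus observed cloud/PM features...
    predict_pv = fit(np.column_stack([weather[train], spatial[train]]), pv[train])
    # ...then forecast the test year with the *predicted* cloud/PM features,
    # mirroring the order described in the text.
    return predict_pv(np.column_stack([weather[test], spatial_hat]))


# Synthetic example (not study data): 100 rows per year, 2015 to 2018.
rng = np.random.default_rng(0)
years = np.repeat([2015, 2016, 2017, 2018], 100)
weather = rng.normal(size=(400, 3))
spatial = np.column_stack([weather[:, 0] - weather[:, 1], 0.5 * weather[:, 2]])
spatial += rng.normal(0, 0.1, (400, 2))
pv = 300 + 40 * weather[:, 0] - 60 * spatial[:, 0] - 30 * spatial[:, 1] + rng.normal(0, 5, 400)
pv_2018 = two_stage_forecast(years, weather, spatial, pv)
print(pv_2018.shape)
```

Splitting by year rather than at random keeps every 2018 hour out of training, so the evaluation reflects a genuine forecast of an unseen year.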
Consequently, overall performance improved compared with the single-site solar PV generation prediction model. The SARIMAX-LSTM model, which uses a stacking ensemble technique to make the most of the temporal characteristics of solar PV generation data, achieved MAE: 64.730; RMSE: 95.800; SMAPE: 19.891; MBE: 2.650; and Cv: 29.923. It showed the best performance among the proposed models and is the only one that satisfies ASHRAE Guideline 14.
The proposed advanced multisite integrated hybrid spatio-temporal solar PV generation prediction model can predict integrated solar PV power generation for solar PV power plants in various regions in South Korea using numerical text data and satellite images. Therefore, it enables the prediction of solar PV generation for both existing and newly constructed solar PV power plants. By learning the facility and geographic information of each solar PV power plant, and the meteorological and air pollutant data of the area where the solar PV power plant is located, the amount of solar PV generation can be predicted. This reflects the spatio-temporal characteristics of solar PV generation, thereby providing guidelines for developing a precise and accurate solar PV generation prediction model for a stable power supply and demand plan.