Forecasting Photovoltaic Power Generation Using Satellite Images

As the relative importance of renewable energy in electric power systems increases, the prediction of photovoltaic (PV) power generation has become a crucial technology for improving the operational stability of next-generation power systems, such as microgrids and virtual power plants (VPPs). To improve the accuracy of PV power generation forecasting, a considerable amount of research has applied weather forecast data to the learning process. Despite these efforts, forecasting PV power generation remains challenging, since existing methods show limited accuracy due to inappropriate cloud amount forecast data, which are strongly correlated with PV power generation. To address this problem, we propose a PV power forecasting model that includes a cloud amount forecasting network trained with satellite images. In addition, our proposed model adopts convolutional self-attention to effectively capture historical features and thus acquire helpful information from weather forecasts. To show the efficacy of the proposed cloud amount forecasting network, we conduct extensive experiments on PV power generation forecasting with and without it. The experimental results show that the Mean Absolute Percentage Error (MAPE) of our proposed prediction model, combined with the cloud amount forecasting network, is reduced by 22.5% compared to the model without the cloud amount forecasting network.


Introduction
As the penetration level of distributed energy resources (DERs) increases in microgrids and virtual power plants (VPPs), the problem of photovoltaic (PV) power generation forecasting is becoming crucial for such next generation power systems [1]. Microgrids are power systems consisting of DERs and electrical end users, with controllable elastic loads, all distributed in limited areas [2]. A VPP [3] is a flexible representation of the DER portfolio that can be used to make contracts in the wholesale market and provide services to system operators [4]. The increasing penetration of intermittent and variable renewable energy resources (e.g., wind and solar) has significantly complicated energy system management for microgrids and VPPs [5,6]. Next-generation power systems must be able to supply electricity that meets consumer demand. For these reasons, PV power generation forecasting is an important and challenging topic for the fields of microgrids and VPPs.
PV power generation, a method that produces electricity by converting sunlight into direct current electricity, is a key renewable energy source for microgrids and VPPs. The amount of sunlight mainly depends on weather variables, such as temperature, solar radiation, wind speed, cloud changes, and relative humidity. Therefore, the problem of PV power generation prediction is becoming equivalent to the problem of forecasting weather, which means that it is difficult to predict the exact amount of PV power generation [7,8]. Furthermore, a solar eclipse may also have a non-trivial impact on PV power generation [9].
To deal with this problem, many previous studies have attempted to use historical weather data to improve the accuracy of forecasting PV power generation [10][11][12][13][14][15]. These studies constructed predictive models by using recurrent neural networks (RNNs) and convolutional neural networks (CNNs), either independently or in combination. Although these approaches used weather data to learn the correlation with PV power generation, cloud amount forecasts from the meteorological administration, which are strongly correlated with PV power generation, might be inappropriate for PV power generation prediction, because they provide only long-term (e.g., 3-h or more) and wide-area forecasts, instead of a prediction for the specific location where PV panels are installed. To address this problem, we first propose a cloud amount forecasting model based on Eidetic three-dimensional (E3D)-long short-term memory (LSTM) [16] to predict the short-term cloud amount for a specific location using satellite images. Then, using this short-term and location-specific cloud amount forecast, our proposed PV power generation prediction model generates a future profile using the convolutional self-attention-based LSTM [17,18]. The rationale of the proposed model design is to combine a cloud amount forecasting network with the ensemble potential of a convolutional self-attention-based context learning approach.
The rest of the paper is organized as follows. In Section 2, we introduce related research on PV forecasting. In Section 3, we provide an overview of the proposed cloud amount forecasting model using satellite images, including a method for preprocessing satellite images. Section 4 describes the proposed PV power prediction model, and Section 5 presents the experimental results. Lastly, Section 6 concludes the paper.

Related Work
Based on the type of dataset used for training a prediction model, we divide the existing PV power generation prediction models into two categories: (i) models using historical PV power data and weather data; and (ii) models using historical PV power data, weather data, and weather forecasts.

Models Using Only Historical Data
The PV power prediction models in the first category use historical PV power data and weather data. In [19], in order to discover nonlinear features and invariant structures from adjacent days and intra-day data, the authors apply an RNN, by dividing solar power time series dataset into inter-day data and intra-day data. In addition, they propose a point prediction model using the PV power generation data as input without historical weather data.
The wavelet transform (WT)-LSTM model [20] transforms the historical solar energy data into different frequency series using the WT. The authors then adopt an LSTM-with-dropout deep learning model that combines the statistical WT-based features with weather factors, such as temperature, humidity, wind speed, pressure, and cloud index.
In [21], the authors argue that solar power generation is difficult to predict due to the randomness, volatility, and intermittent nature of solar power generation. To solve this problem, they proposed the Adaptive Hyper Parameter Adjustment LSTM (AHPA-LSTM) [21] to improve the temporal correlation of the LSTM network through temporal learning weights. In addition, this model adopts the Fusion Activation Function (FAF) to solve the gradient loss and uses PV maximum output power, PV module rear temperature, bevel illumination, relative humidity, and other weather factors as input.
The high-precision deep neural network model (PVPNet) [22] applies a CNN to generate a 24-h forecast of PV power output using weather information, such as temperature and solar radiation. In [23], the authors propose a CNN-based model by constructing a 2D feature map extracted from PV power generation data and past weather data. They then apply ResNet and DenseNet with transfer learning, fine-tuning, and the quantile loss function.
Energies 2020, 13, 6603
In [24], the authors claim that the rapid fluctuations in PV power generation are mainly caused by changes in cloud coverage. They apply a CNN-based model to forecast global horizontal irradiance (GHI) 5-20 min ahead using total-sky images (TSI) with cloud information and lagged GHI.
To overcome the limitations of LSTM in a one-dimensional forecasting problem, the authors in [25] propose a method for forecasting solar irradiation by encoding one-dimensional time-series data into images using the Gramian Angular Field (GAF), and thus apply the Convolutional LSTM (ConvLSTM) network.
In [26], the authors propose an LSTM attention embedding model based on the Bayesian optimization to predict one-day ahead PV power output by exploring various features, such as the statistical features, combined features, time features, and wind speed categorical features.
On the other hand, a hybrid ensemble deep learning framework [27] is proposed to forecast short-term photovoltaic power generation. This model uses two LSTM networks for temperature and power output forecasting. In addition, this model uses the attention mechanism to effectively focus on more significant input features.

Models Using Both Past and Future Data
The following research attempts to improve forecasting accuracy by using not only past data, but also future forecast data. In [28], the authors propose a similarity-based forecasting model (SBFM). To effectively generalize a model for different scenarios of available weather data, three forecasting models (i.e., basic SBFM, categorical SBFM, and hierarchical SBFM) are proposed. These models use PV power, temperature, humidity, dew point, and wind speed as input data to find similar weather conditions in the past by considering weather forecasts, and to calculate weights.
In [29], a dataset from SolarTechLab at the Politecnico di Milano is used as input to a multi-layer perceptron (MLP) to predict the insolation of a day. If the average of the predicted insolation is 150 W/m² or more, the day is clustered as sunny; otherwise, it is clustered as cloudy. Two MLP models, one per cluster, are then trained with data such as past PV power generation. The experimental results show a Weighted Mean Absolute Error (WMAE) of 30.39% on a clear day and 750.01% on a cloudy day, which shows that it is especially difficult to predict PV power on a cloudy day.
In [30], the authors propose a hybrid deep learning model using LSTM and CNN for long-term PV forecasting. The hybrid model first forecasts n-years PV power generation by using global horizontal irradiance (GHI), direct normal irradiance (DNI), diffuse horizontal irradiance (DHI), and solar thermal data over one year as inputs to LSTM. Finally, the PV power generation is forecasted by using the forecasted value of LSTM, past PV power generation, and meteorological data as inputs to CNN. This model shows 2% improvement in MAPE.
In [18], the authors propose the convolutional self-attention LSTM model using both historical and future data. This model improves forecasting performance by calculating the similarity with their local context information among input features using the convolutional self-attention technique. However, this model does not use accurate cloud amount, resulting in failure to efficiently capture solar energy features affected by cloud amount.
Compared with the existing works summarized in Table 1, our proposed method has two prominent advantages. First, our method predicts the future cloud amount using satellite images and supplies it to the prediction model to improve the prediction accuracy. Second, in addition to this forecasted cloud amount, we adopt convolutional self-attention to efficiently capture features of both historical and future data by modeling the local context of time-series data. To the best of our knowledge, our proposed model is the first effort to use satellite images to enhance the accuracy of PV power generation forecasting by exploring the future profile of cloud amount.

Forecasting Cloud Amount Using Satellite Images
In this section, we first describe the details of our two-step preprocessing method for satellite images. Then, we present the process of forecasting the future cloud amount and the structure of our prediction model in detail.

Two-Step Preprocessing of Satellite Images
To forecast cloud amount, we use the infrared channel (8.7 µm) images of the GK2A (GEO-KOMPSAT-2A) satellite provided by the National Meteorological Satellite Center in Korea. The size of an image is 900 × 900. The GK2A provides images observed since 25 July 2019, with two-kilometer resolution. Observations take place every two minutes; however, they are sometimes stopped for various reasons. Therefore, the missing images must be interpolated, but without teaching false information to the prediction model. Figure 1 shows an example of a GK2A satellite image (infrared channel), including yellow border lines. Because these border lines can be misinterpreted by the prediction model, we first remove them as shown in Figure 2.

Figure 2 shows the process of removing border lines. After memorizing the coordinates with the values indicated in yellow, the pixel values are set to zero. Afterwards, these empty coordinates are filled by the average vector of the 3 × 3 neighborhood around the memorized coordinates. By doing this, we can obtain a satellite image with border lines removed.
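The border-line removal described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `remove_border_lines`, the RGB value `(255, 255, 0)` assumed for the yellow lines, and the choice to average over the zeroed image (so that zeroed pixels are included in the 3 × 3 mean, following the literal order of the steps in the text) are all assumptions.

```python
import numpy as np

def remove_border_lines(image, line_color=(255, 255, 0)):
    """Replace pixels of an assumed line colour by the mean of their 3x3 window.

    image: (H, W, 3) array. line_color is a hypothetical RGB value for the
    yellow border lines; the real value depends on the image product.
    """
    img = image.astype(np.float64)
    # Step 1: memorize the coordinates whose value equals the line colour.
    mask = np.all(image == np.array(line_color), axis=-1)
    # Step 2: set those pixel values to zero.
    img[mask] = 0.0
    # Step 3: fill each memorized coordinate with the average vector of the
    # 3x3 window around it, clipped at the image boundary. Averages are read
    # from the zeroed image, so the result does not depend on visiting order.
    out = img.copy()
    h, w = mask.shape
    for y, x in zip(*np.nonzero(mask)):
        y0, y1 = max(y - 1, 0), min(y + 2, h)
        x0, x1 = max(x - 1, 0), min(x + 2, w)
        out[y, x] = img[y0:y1, x0:x1].mean(axis=(0, 1))
    return out
```

A variant that excludes the zeroed pixels from the average would avoid darkening the filled pixels; the text does not say which variant is used.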
Under normal operation, an observation occurs every two minutes, so 720 satellite images are collected per day. However, during events such as station keeping and wheel offloading, observations are not performed. Consequently, even though we use only hourly satellite images for training the prediction model, a considerable number of hourly images may be missing. Figure 3a shows the number of missing images per day before linear interpolation. These missing images have a non-trivial impact on prediction performance. Specifically, if we simply discard them, the number of training images decreases significantly. On the other hand, if we interpolate all of them without any restriction, the prediction model will learn false information from the interpolated images. Therefore, a method for resolving this trade-off is necessary. In this paper, we heuristically apply linear interpolation only when three or fewer consecutive images are missing. Figure 3b shows the number of missing images per day after linear interpolation.
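The restricted interpolation rule can be illustrated with a short sketch. It assumes that the hourly images are held in a list in which a missing image is represented by `None`, and that a gap is filled only when observed images exist on both sides; the function name `interpolate_short_gaps` and the parameter `max_gap` are hypothetical names introduced here.

```python
import numpy as np

def interpolate_short_gaps(frames, max_gap=3):
    """Linearly interpolate runs of at most `max_gap` missing frames (None)."""
    filled = list(frames)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i                      # first missing index of this run
        while i < n and filled[i] is None:
            i += 1
        end = i                        # first observed index after the run, or n
        gap = end - start
        # Fill only short gaps that are bounded by observed frames on both sides.
        if gap <= max_gap and start > 0 and end < n:
            left, right = filled[start - 1], filled[end]
            for k in range(gap):
                w = (k + 1) / (gap + 1)    # relative position inside the gap
                filled[start + k] = (1 - w) * left + w * right
    return filled
```

Gaps longer than `max_gap` are left as `None`, so that a later step can discard the windows that contain them.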
Figure 3. (a) Before linear interpolation; (b) After linear interpolation.
Even after applying linear interpolation, missing images may remain, as shown in Figure 3b. To handle them, as the second step, we apply a sliding-window method that checks whether an input window (blue box in Figure 4) or output window (green box in Figure 4) includes missing images. As shown in Figure 4, we divide the given dataset into two parts, i.e., 90% as a training dataset and 10% as a test dataset. For the training dataset, we set the stride to 1 h, whereas the stride for the test dataset is 24 h. Each window in both datasets consists of two parts: the blue part for the past 24-h images and the green part for the cloud amount of the future 5 h. If the blue or green part includes one or more missing images, we exclude the window from the dataset to prevent the prediction model from learning misleading features. By doing this, we can maintain not only the number of data samples, but also the quality of the data.
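The window selection can be sketched in a few lines of plain Python. The sketch assumes an hourly sequence of booleans, `available`, that marks whether the data for each hour exist after interpolation, and it applies the same availability flag to the 24-h input part and the 5-h output part; the function names and this representation are assumptions made for illustration.

```python
def make_windows(available, input_len=24, output_len=5, stride=1):
    """Return start indices of windows that contain no missing hour."""
    span = input_len + output_len          # 24 h of input plus 5 h of output
    starts = []
    for s in range(0, len(available) - span + 1, stride):
        if all(available[s:s + span]):     # drop windows with any missing hour
            starts.append(s)
    return starts

def split_and_window(available, train_ratio=0.9):
    """Split chronologically (90% / 10%) and window each part separately."""
    cut = int(len(available) * train_ratio)
    train = make_windows(available[:cut], stride=1)     # 1-h stride
    test = [cut + s for s in make_windows(available[cut:], stride=24)]  # 24-h stride
    return train, test
```

With 400 available hours and a single missing hour at index 50, for example, the 29 training windows that cover index 50 are dropped and all others are kept.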

Forecasting Model of Cloud Amount Using Satellite Images
Our first goal is to forecast future 5-h cloud amount profile using satellite images for the past 24 h. The cloud amount forecasting model should be able to simultaneously capture the temporal features and the spatial features of cloud movement in satellite images.
ConvLSTM was proposed to model spatiotemporal features by representing spatial information as tensors, in order to overcome the limited expressiveness of the vector variables in a vanilla LSTM, which cannot capture spatial information [31]. However, the memory cells of each layer of ConvLSTM are used independently, and the weights are updated only in the time domain. Due to these characteristics, the information stored in the last layer of the previous time step cannot be used in the first layer of the current time step [32].
To address these disadvantages of ConvLSTM, the spatiotemporal-LSTM was proposed; Figure 5 shows its structure. The memory M of the spatiotemporal-LSTM is updated in a zigzag direction. This update structure can utilize the last layer of the previous time step and enables effective modeling of short-term motion [32]. However, the forget gate of memory C of the spatiotemporal-LSTM, which is used to ease the vanishing gradient problem in the Fully Connected LSTM (FC-LSTM) structure, tends to react strongly to short-term features. Therefore, it easily enters the saturation region and hinders the flow of long-range information.
Eidetic three-dimensional (E3D)-LSTM is a model that improves the forgetting mechanism used to update the temporal memory C, which is a drawback of the spatiotemporal-LSTM. Figure 6 shows the structure of the E3D-LSTM. Unlike FC-LSTM, which uses only the memory C at the previous timestamp, E3D-LSTM effectively uses the memory stored over several timestamps by referring to all previous memories C, as many as τ [16].
In this paper, we design a cloud amount forecasting model based on this E3D-LSTM model to effectively model the spatiotemporal features of satellite images for the past 24 h. Figure 7 shows the architecture of our cloud amount forecasting model. The input data of this model is a set of satellite images for the past 24 h, resized to (32, 32) after removing the border lines as in Figure 2. We stack four E3D-LSTM cells with 64 hidden state channels. Moreover, all 3D-Conv operators in the E3D-LSTM cell are composed of 2 × 3 × 3 (time × height × width) convolutions. Our model first receives an input of size (1, 24, 32, 32), converts channel 1 to channel 64 by 3D convolution operations with 64 hidden state channels inside the E3D-LSTM, and outputs a result of size (64, 24, 32, 32).
From these context vectors encoded by the 4-cell E3D-LSTMs, followed by a Conv3D layer and a permutation layer, we can get a future 5-h cloud amount profile through a fully connected layer. The spatiotemporal features of past satellite images are extracted through the E3D-LSTM. Then, these features are decoded through the two 3D convolution layers and the fully connected layer. The output from our model is a future 5-h profile of cloud amount at a specific location, e.g., Gangneung-si, Korea.
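To make the idea of referring to the previous τ memories more concrete, the following sketch shows an attention-style recall over a stack of past memory states. It is a simplified illustration under our own assumptions, not the exact E3D-LSTM cell of [16]: gates, 3D convolutions, and layer normalization are omitted, the memories are assumed to be flattened to shape (τ, n, d), and the names `recall`, `r_t`, and `c_hist` are hypothetical.

```python
import numpy as np

def recall(r_t, c_hist):
    """Attention-style read over the previous tau memory states.

    r_t:    (n, d) query features for the current time step.
    c_hist: (tau, n, d) memory states of the previous tau time steps.
    Returns an (n, d) array: for each query position, a softmax-weighted
    average of all stored memory vectors.
    """
    tau, n, d = c_hist.shape
    keys = c_hist.reshape(tau * n, d)             # all stored memory vectors
    scores = r_t @ keys.T                         # (n, tau * n) similarities
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ keys
```

Because the weights are computed from similarity rather than from a saturating forget gate, a memory from several steps back can still contribute when it matches the current query.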

Forecasting PV Power Generation
Once we obtain a 5-h cloud amount profile, we can forecast the PV power generation by using the future cloud amount as one piece of context information. In this section, we describe a PV power generation forecasting model that can effectively capture context features from context data, such as weather forecasts, the solar elevation angle, and our cloud amount forecasts.

Context Features for Forecasting
PV power generation produces electricity by converting sunlight into direct current electricity. Therefore, it is bound to depend on weather conditions. Specifically, meteorological factors, such as solar radiation, temperature, cloud change, and wind speed, mainly affect the amount of PV power generation. Moreover, if we use only historical data, we cannot reach a sufficient level of prediction accuracy. To deal with this problem, we should use not only temporal features extracted from past weather, but also weather forecasts such as precipitation, temperature, relative humidity, and wind speed.
The solar elevation angle is the angle between the sun's rays and a horizontal plane. During the day it ranges from 0° at sunrise and sunset to at most 90° at solar noon. As the solar elevation angle increases, the amount of solar radiation energy reaching a unit area increases. Therefore, it can be used as an important feature for forecasting PV power generation. The solar elevation angle can simply be obtained by Equation (1), which means that the future solar elevation angle can be obtained without a forecast.
$$\sin E = \sin\delta \, \sin\phi + \cos\delta \, \cos\phi \, \cos H \qquad (1)$$
where E is the solar elevation angle, δ is the current declination of the Sun, ϕ is the local latitude, and H is the hour angle in local solar time.
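As a worked example of Equation (1), the following sketch computes the elevation angle from the declination, latitude, and hour angle given in degrees. The function name and the numerical values in the usage note are illustrative assumptions; computing the declination and hour angle from a date and time is outside the scope of this sketch.

```python
import math

def solar_elevation_deg(declination_deg, latitude_deg, hour_angle_deg):
    """Solar elevation angle E in degrees, from Equation (1)."""
    delta = math.radians(declination_deg)
    phi = math.radians(latitude_deg)
    h = math.radians(hour_angle_deg)
    sin_e = (math.sin(delta) * math.sin(phi)
             + math.cos(delta) * math.cos(phi) * math.cos(h))
    # Guard against rounding slightly outside [-1, 1] before taking asin.
    sin_e = max(-1.0, min(1.0, sin_e))
    return math.degrees(math.asin(sin_e))
```

At solar noon (H = 0), Equation (1) reduces to E = 90° − |ϕ − δ|. For an illustrative latitude of 37.75° and a declination of 23.44°, `solar_elevation_deg(23.44, 37.75, 0.0)` gives approximately 75.69°.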
To further improve the accuracy of PV power generation forecasting, we use the 5-h cloud amount forecast from our cloud amount forecasting model. The goal of our model is to increase the accuracy of forecasting PV power generation by providing an accurate future cloud amount to the prediction model.

PV Power Generation Forecasting Model
Using these context data including our cloud amount forecasts, we design a PV power generation prediction model based on the convolutional self-attention mechanism, as shown in Figure 8. satellite images. From these context vectors encoded by 4-cell E3D-LSTMs, followed by a Conv3D layer and a permutation layer, we can get a future 5-h cloud amount profile through a fully connected layer. The spatiotemporal features of past satellite images are extracted through E3D LSTM. Then, these features are decoded through the two 3D Convolution layers and fully connected layer. The output from our model is a future 5-h profile of cloud amount at a specific location, e.g., Gangneungsi, Korea.

Forecasting PV Power Generation
Once we obtain a 5-h cloud amount profile, we can forecast the PV power generation by using the future cloud amount as one of context information. In Chapter 4, we describe a PV power generation forecasting model that can effectively capture context features from context data, such as weather forecasts, the solar elevation angle, and our cloud amount forecasts.

Context Features for Forecasting
PV power generation produces electricity by converting sunlight into direct current. It therefore depends strongly on weather conditions. Specifically, meteorological factors such as solar radiation, temperature, cloud change, and wind speed mainly determine the amount of PV power generated. Historical data alone do not yield a sufficient level of prediction accuracy. To deal with this problem, we use not only temporal features extracted from past weather, but also weather forecasts such as precipitation, temperature, relative humidity, and wind speed.
The solar elevation angle is the angle between the sun's rays and the horizontal plane. It is 0° at sunrise and sunset, peaks at solar noon, and can reach at most 90° (when the Sun is directly overhead). As the solar elevation angle increases, the amount of solar radiation energy reaching a unit area increases. Therefore, it is an important feature for forecasting PV power generation. The solar elevation angle can be computed directly from Equation (1), which means that future values are available without a forecast.
$$\sin E = \sin\delta \,\sin\varphi + \cos\delta \,\cos\varphi \,\cos H \qquad (1)$$

where E is the solar elevation angle, δ is the current declination of the Sun, φ is the local latitude, and H is the hour angle in local solar time.
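As an illustration of Equation (1), the following minimal sketch computes the elevation angle from the declination, latitude, and hour angle. The function name and the sample inputs (a latitude of 37.75° N and a declination of 0°, i.e., an equinox) are assumptions chosen for the example, not values from our dataset. The sketch relies on the standard convention that the hour angle advances by 15° per hour from local solar noon.

```python
import math

def solar_elevation_deg(declination_deg, latitude_deg, hour_angle_deg):
    """Solar elevation angle E in degrees, following Equation (1)."""
    delta = math.radians(declination_deg)
    phi = math.radians(latitude_deg)
    h = math.radians(hour_angle_deg)
    sin_e = (math.sin(delta) * math.sin(phi)
             + math.cos(delta) * math.cos(phi) * math.cos(h))
    # Clamp against floating-point drift before taking the arcsine.
    sin_e = max(-1.0, min(1.0, sin_e))
    return math.degrees(math.asin(sin_e))

# Hypothetical inputs: latitude 37.75 deg N, declination 0 deg (equinox).
noon = solar_elevation_deg(0.0, 37.75, 0.0)        # hour angle 0 = solar noon
afternoon = solar_elevation_deg(0.0, 37.75, 45.0)  # 3 h after solar noon
horizon = solar_elevation_deg(0.0, 37.75, 90.0)    # 6 h after solar noon
```

Because every input is known in advance for any future hour, the feature can be generated for the whole forecast horizon without relying on a weather forecast.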
To further improve the accuracy of PV power generation forecasting, we use a 5-h cloud amount forecast produced by our cloud amount forecasting model. The goal is to increase forecasting accuracy by supplying the prediction model with accurate future cloud amounts.

PV Power Generation Forecasting Model
Using these context data, including our cloud amount forecasts, we design a PV power generation prediction model based on the convolutional self-attention mechanism, as shown in Figure 8. The self-attention method proposed in [33] improves accuracy by calculating the similarity between the words of an input sentence and assigning each pair a weight that reflects their degree of association. However, canonical self-attention generates queries and keys from simple point-wise values that do not reflect the local context, which can degrade the model's performance. Convolutional self-attention [18] was proposed to ease this problem by generating queries and keys that are suited to the local context.
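The idea can be sketched on a scalar time series as follows. This is a simplified illustration, not the model in Figure 8: it assumes that queries and keys are produced by causal 1-D convolutions, that values use the raw series, and that a causal mask is applied. The function names and kernel weights are hypothetical.

```python
import math

def causal_conv1d(series, kernel):
    """Causal 1-D convolution: output[t] depends only on series[t-k+1..t]."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(series)
    return [sum(kernel[j] * padded[t + j] for j in range(k))
            for t in range(len(series))]

def conv_self_attention(series, q_kernel, k_kernel):
    """Scalar-feature self-attention with convolutional queries and keys."""
    queries = causal_conv1d(series, q_kernel)
    keys = causal_conv1d(series, k_kernel)
    outputs, weights = [], []
    for t, q in enumerate(queries):
        # Causal mask: position t attends to positions 0..t only.
        scores = [q * keys[s] for s in range(t + 1)]
        peak = max(scores)  # subtract the maximum for a stable softmax
        exps = [math.exp(sc - peak) for sc in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        # Values are the raw series (kernel size 1), a simplification.
        outputs.append(sum(w * series[s] for s, w in enumerate(row)))
    return outputs, weights

# Hypothetical normalized PV profile and smoothing kernels of size 3.
series = [0.0, 0.2, 0.8, 1.0, 0.7, 0.1]
outputs, weights = conv_self_attention(series, [0.25, 0.5, 0.25], [0.25, 0.5, 0.25])
```

With a kernel of size 1 (for example `[1.0]`), the queries and keys equal the raw values and the sketch reduces to point-wise dot-product attention. Larger kernels let each query and key summarize the shape of its local neighborhood.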

Experiment
We use hourly PV power generation data from 25 July 2019 to 30 June 2020, obtained from a PV power plant in Jebi-ri, Gujeong-myeon, Gangneung-si, Gangwon-do, Korea. We obtain the GK2A satellite images (infrared channel) from the National Meteorological Satellite Center in Korea. From the Korea Meteorological Administration (KMA), we also collect hourly weather measurement data and the 3-hourly weather forecast data used for testing. Linear interpolation is used to transform the 3-hourly weather forecast data into hourly data.
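The interpolation step can be sketched as below. This is a minimal illustration that assumes a non-empty, evenly spaced 3-hourly series; the function name and the sample temperatures are hypothetical.

```python
def interpolate_to_hourly(values_3h):
    """Linearly interpolate a non-empty 3-hourly series onto an hourly grid."""
    hourly = []
    for left, right in zip(values_3h, values_3h[1:]):
        # Emit the left point and the two interior hourly points.
        for step in range(3):
            hourly.append(left + (right - left) * step / 3)
    hourly.append(values_3h[-1])  # keep the final forecast point
    return hourly

# Hypothetical 3-hourly temperature forecasts in degrees Celsius.
temps_3h = [12.0, 15.0, 21.0]
temps_1h = interpolate_to_hourly(temps_3h)
# temps_1h == [12.0, 13.0, 14.0, 15.0, 17.0, 19.0, 21.0]
```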

Performance of Cloud Amount Forecasting Model
This experiment evaluates our 5-h cloud amount forecasting model using GK2A satellite image data. To check that the model does not overfit the training data, we divide the dataset into training data and testing data. Of the entire dataset, which covers 344 days, 90% is used as the training dataset and 10% as the test dataset. More specifically, satellite images from 25 July 2019 to 27 May 2020 are used as training data, and satellite images from 28 May 2020 to 30 June 2020 are used as testing data.
Among the 34 windows corresponding to the 34 days in the test dataset, 33 windows contain no missing images. Note that the stride for the test dataset is 24 h, as described in Section 3.1. The x-axis of Figure 9 has a length of 165 because it concatenates the 5-h predictions for each of the 33 days. Figure 9 shows the forecasting results of cloud amount, which acts as an important feature to further enhance the prediction accuracy, as will be shown in Section 5.4. Although the predicted values (dotted-red lines) may differ from the actual values (dotted-black lines), the predicted patterns are very similar to the actual ones. As shown in Section 5.5, these cloud amount forecasts play an important role in the prediction of PV power generation by giving the model valuable information about the future cloud amount.

Details of Input Data of PV Power Generation Forecasting Model
In this section, we present the details of the input data that our model uses to forecast PV power generation 5 h ahead (i.e., from t + 1 to t + 5). We use 120 h of data, equivalent to 5 days, as input. The input consists of the PV power generation from t − 119 to t and the context features from t − 114 to t + 5. In the testing phase, the context features from t + 1 to t + 5 are replaced by weather forecast data, because the future weather is not known at that time. Figure 10 shows an example of the input used in the testing phase. The features consist of PV power generation, precipitation, temperature, relative humidity, wind speed, solar elevation angle, and cloud amount forecasts. In this example, the model uses the PV power generation from hours 0 to 119 and the context features from hours 5 to 124 to forecast the PV power generation for the next 5 h (i.e., hours 120 to 124). The blue box in Figure 10 marks the context features from hours 120 to 124 that are replaced by weather forecast data. Note that the cloud amount forecasts are produced by our cloud amount forecast model.
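The index arithmetic of this window can be sketched as follows. This is an illustration of the slicing only, under the assumption that the PV series, the observed context features, and the forecast context features are hourly sequences sharing the same absolute hour index; the function and variable names are hypothetical.

```python
HISTORY = 120  # hours of input (5 days)
HORIZON = 5    # hours to forecast

def build_sample(pv, observed_ctx, forecast_ctx, t):
    """Assemble one test-phase input whose last observed hour is index t."""
    if t - (HISTORY - 1) < 0 or t + HORIZON >= len(forecast_ctx):
        raise ValueError("window does not fit inside the available data")
    pv_window = pv[t - HISTORY + 1 : t + 1]                     # t-119 .. t
    past_ctx = observed_ctx[t - HISTORY + HORIZON + 1 : t + 1]  # t-114 .. t
    future_ctx = forecast_ctx[t + 1 : t + HORIZON + 1]          # t+1 .. t+5
    return pv_window, past_ctx + future_ctx

# Toy data: tagged tuples make it visible which source each hour comes from.
pv = list(range(200))
observed = [("obs", i) for i in range(200)]
forecast = [("fc", i) for i in range(200)]
pv_window, ctx_window = build_sample(pv, observed, forecast, t=119)
```

With t = 119, the PV window covers hours 0 to 119 and the context window covers hours 5 to 124, with the last five entries taken from the forecasts, matching the example in Figure 10.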


Performance Metric
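In this paper, Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE), and normalized Mean Absolute Error (nMAE) are used to measure the accuracy of the 1-day prediction. The MAE, MAPE, and RMSE are defined as:

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_{\mathrm{true},t} - y_{\mathrm{pred},t}\right|$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{y_{\mathrm{true},t} - y_{\mathrm{pred},t}}{y_{\mathrm{true},t}}\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_{\mathrm{true},t} - y_{\mathrm{pred},t}\right)^{2}}$$

where n is the number of data points, y_true,t is the t-th actual PV power generation value, and y_pred,t is the t-th predicted PV power generation value. The nMAE is the normalized form of the MAE.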
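For reference, the four metrics can be computed as in the sketch below. Two choices here are assumptions rather than definitions taken from this paper: hours with zero actual generation are skipped in the MAPE (the ratio is undefined there), and the nMAE is normalized by a `capacity` argument, which is one common convention. The function name and the sample values are hypothetical.

```python
import math

def forecast_metrics(y_true, y_pred, capacity):
    """Return MAE, MAPE (%), RMSE, and nMAE (%) for paired sequences."""
    n = len(y_true)
    abs_err = [abs(a - p) for a, p in zip(y_true, y_pred)]
    mae = sum(abs_err) / n
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    # Assumption: skip hours where the actual value is zero (e.g., at night),
    # because the percentage error is undefined there.
    pct = [e / abs(a) for e, a in zip(abs_err, y_true) if a != 0]
    mape = 100.0 * sum(pct) / len(pct) if pct else float("nan")
    # Assumption: normalize the MAE by the plant capacity.
    nmae = 100.0 * mae / capacity
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "nMAE": nmae}

# Hypothetical hourly values in kWh and a hypothetical 50 kW capacity.
metrics = forecast_metrics([10.0, 20.0, 0.0, 40.0], [12.0, 18.0, 1.0, 36.0], 50.0)
```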


Results of PV Power Generation Forecasting
To validate the applicability of our prediction model, we conduct two sets of experiments, using the model with and without cloud amount forecasts. All experiments are conducted on a machine running Ubuntu 18.04.3 LTS with an Intel Xeon E5-2630 v4 at 2.20 GHz, 128 GB of main memory, and an NVIDIA GeForce RTX 2080 graphics card. Note that all prediction processes took less than 0.85 s, which suggests that our model can be applied to real-time services, such as one-minute or five-minute ahead prediction services.
To further analyze the effect of cloud amount forecasts, we present three different scenarios, i.e., a sunny day in Figure 11, a cloudy day in Figure 12, and a rainy day in Figure 13. Figure 11a shows the results of forecasting PV power generation on a sunny day, and Figure 11b shows the PV data and weather data on that day.
The dotted-red lines in Figure 11a show the forecasts made with cloud amount forecasts, whereas the dotted-blue lines show the forecasts made without them. As shown in Figure 11, the predicted PV power generation has a typical bell shape on a sunny day. The model using cloud amount forecasts produces patterns closer to the actual PV power generation (dotted-black lines) than the model without cloud amount forecasts. This indicates that the model is able to learn valuable future information from the cloud amount forecasts in the cyan box of Figure 11b.
Figure 12a shows the results of forecasting PV power generation on a cloudy day. Note that the weather conditions in the red box of Figure 12b are very similar to those in the orange box of Figure 11b, except for the cloud amount in the purple box. As a result, the model without cloud amount forecasts wrongly produces the same pattern as the dotted-blue lines in Figure 11a. In contrast, the model using cloud amount forecasts accurately predicts PV power generation on a cloudy day because it learns that the sunlight will be blocked by a high cloud amount, as shown by the dotted-red lines in Figure 12a.
Figure 13a shows the results of forecasting PV power generation on a rainy day. PV power generation is greatly affected by rainfall, high humidity (green boxes), and high cloud amount (pink box). Nevertheless, by capturing the temporal features of the meteorological flow and the future cloud amount, the model using cloud amount forecasts predicts a pattern (dotted-red lines) similar to the actual pattern (dotted-black lines).
The results of PV power generation forecasting are summarized in Table 2.


Seasonal Effects
To further analyze the seasonal performance of our proposed model, we conduct an additional set of experiments with the dataset divided by season. Concretely, we test our model with four test sets, i.e., the autumn test set from 1 October 2019 to 30 October 2019, the winter test set from 1 January 2020 to 30 January 2020, the spring test set from 1 March 2020 to 30 March 2020, and the summer test set from 28 May 2020 to 30 June 2020.
The results of this experiment are summarized in Table 3. The accuracy on the spring test set is similar to that on the summer test set. As expected, the winter test set shows the lowest accuracy, mainly because there are many winter days on which the total solar power generation is less than 5 kWh. Furthermore, many days with zero power production can also be observed in winter. For these reasons, it is difficult for a prediction model to learn the common features of power generation and weather information in winter.

Conclusions
In this paper, we propose a PV power generation prediction model using satellite images. To efficiently handle missing data, we devise a two-step preprocessing method for satellite images, which maintains the quality of the satellite image dataset without sacrificing the quantity of available images.
Using the satellite image dataset, the proposed model first forecasts future cloud amount and uses it as a context feature to enhance prediction accuracy. To accurately forecast cloud amount, we apply the E3D-LSTM model proposed to overcome the drawbacks of ConvLSTM and spatiotemporal-LSTM.
Through extensive experiments, we show that our cloud amount forecasting model generates an accurate 5-h profile of cloud amount by simultaneously extracting the temporal and spatial features of cloud amount in the satellite images. Using this forecasted cloud amount as input data, we can inform the model about how much solar energy will be blocked in the near future due to cloud cover. In addition, the convolutional self-attention based LSTM model is used to capture the local context from features by generating keys and queries that are more aware of the local context.
To show the efficacy of the cloud amount forecast model, we conduct experiments with and without future cloud amount forecasts. The experiments using the cloud amount forecast show an accuracy improvement of 22.5% in MAPE, 16.6% in MAE, 15.7% in RMSE, and 13.6% in nMAE, compared to the experiments without cloud amount forecasts. These results indicate that forecasting the cloud amount is essential for improving the accuracy of the PV power generation prediction model.
For future work, we plan to augment our method with long-term (e.g., 12-h ahead, 24-h ahead etc.) cloud amount prediction ability. We also plan to extend our model by applying various attention mechanisms to further enhance the prediction accuracy.