Cloud Cover Forecast Based on Correlation Analysis on Satellite Images for Short-Term Photovoltaic Power Forecasting

Abstract: Photovoltaic power generation must be predicted to counter the system instability caused by the increasing number of photovoltaic power plants connected to the grid. In this study, a method for predicting cloud volume and power generation using satellite images is proposed. Solar irradiance and cloud cover are generally highly correlated. However, because the Meteorological Administration and weather sites do not provide solar irradiance forecasts, cloud cover can be used in place of predicted solar radiation. Satellite images contain a great deal of information, such as the direction and speed of cloud movement. Therefore, the spatio-temporal correlation of clouds is obtained from satellite images, and this correlation is presented pictorially. Once learning is complete, the satellite image for the current time can be input to obtain the cloud value for the desired future time. For the predictive model, an artificial neural network (ANN) with identical hyperparameters and settings is used in every case so that the data can be evaluated on equal terms. Four forecasting models are tested, using as inputs cloud cover, the visible image, the infrared image, and a combination of the three variables. According to the results, the multivariable model showed the best performance for all test periods. Among the single-variable models, cloud cover showed fair performance for short-term forecasting, and the visible image showed good performance for ultra-short-term forecasting.


Motivation and Aims
According to the Paris Climate Agreement, more than 200 countries, that is, the countries responsible for 87% of the world's carbon emissions, are implementing agreements aimed at reducing greenhouse gas emissions. To this end, efforts are being made to reduce the use of fossil fuels, the largest source of greenhouse gas emissions, and reducing the number of fossil-fuel power plants has naturally become an important task. Accordingly, the share of power plants using renewable energy is increasing rapidly. In particular, the total installed solar capacity in 2020 was 707.5 GW, 21.5% higher than in the previous year [1].
However, as the share of renewable generation increases, its volatility and intermittency degrade the stability of the power system [2]. Because the system must maintain a balance between generation and load, the output of power plants is generally scheduled according to the predicted load. However, because the output of variable renewable energy sources cannot be controlled, the larger the share of renewable energy, the more likely the balance between generation and load is to break down [3]. To respond flexibly to these imbalances and this variability, renewable energy generation must be predicted. This is because, by predicting the generation of each variable resource in advance, the system operator

• Correlation analysis between satellite images with respect to time and space.
• Extraction of the cloud value of the target area in the correlation-based satellite image.
• Presentation of methodology for data performance comparison.
Section 2 describes the data characteristics and the data processing, and Section 3 describes the process for predicting solar power and the technique used in the prediction model. Section 4 presents the simulation results, including the evaluation criteria, simulation method, results, and discussion, and Section 5 presents the conclusion.

Meteorological Data
Weather data can be obtained from the Meteorological Administration. These data were collected using the Automated Surface Observing System at meteorological observation stations throughout Korea. The system provides major weather variables such as temperature, humidity, and wind speed, as well as data such as air pressure, dryness, and wind direction. However, only a few observation stations provide all variables. Therefore, the three stations located in Heuksando, Mokpo, and Yeosu were selected to collect cloud cover and irradiance data. The location of each observation station is shown in Figure 1. Based on the solar irradiance and cloud cover measured at each station, the correlation between PV power generation in Jeollanam-do and the meteorological data is analyzed in Section 4.
Sustainability 2022, 14, 4427
The data period used in this study was from 1 January 2017 to 31 December 2019, and the weather data had a resolution of 1 h. Because the observations were obtained with measuring devices, they included missing values and outliers, which were therefore preprocessed before the correlation analysis. Missing values were interpolated as the average of the values at the previous and following time steps. Data points outside the range between the 1st and 99th percentiles of the overall data distribution were considered outliers and were replaced with the 1st or 99th percentile value, respectively.
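The two cleaning rules above can be sketched in Python as follows. This is a minimal sketch, not the authors' MATLAB code: it fills only isolated one-hour gaps, it reads the 1%/99% rule as percentile clipping (an assumption), and the function names are hypothetical.

```python
import numpy as np

def fill_missing_hourly(values):
    """Fill an isolated NaN with the mean of the previous and next hourly values.

    Assumption: longer gaps (two or more consecutive NaNs) are left untouched.
    """
    x = np.asarray(values, dtype=float).copy()
    for i in range(1, len(x) - 1):
        if np.isnan(x[i]) and not np.isnan(x[i - 1]) and not np.isnan(x[i + 1]):
            x[i] = 0.5 * (x[i - 1] + x[i + 1])
    return x

def clip_outliers(values, lower_pct=1.0, upper_pct=99.0):
    """Replace values below the 1st / above the 99th percentile by those percentiles."""
    x = np.asarray(values, dtype=float)
    lo, hi = np.nanpercentile(x, [lower_pct, upper_pct])  # NaNs ignored when locating bounds
    return np.clip(x, lo, hi)
```

Applying `fill_missing_hourly` before `clip_outliers` keeps the interpolated values inside the same clipping bounds as the measured ones.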

Satellite Image
Satellite images for the three years from 1 January 2017 to 31 December 2019 were used. The images were taken by Korea's geostationary satellite "Chollian 1" and released by the National Meteorological Satellite Center. Images at various wavelength bands are provided; in this study, infrared images at 10.8 µm and visible images at 0.87 µm were collected to track the movement of clouds. A visible image represents the intensity of sunlight reflected by clouds and the ground: the thicker the cloud, the stronger the reflection and the brighter it appears in the image. Visible images are not available at night because there is no sunlight. Infrared images represent the amount of infrared energy emitted by an object, so observation is possible 24 h a day. The amount of infrared energy depends on the temperature of the object: the higher the temperature, the lower the cloud, and the darker it appears.
A visible image is useful for observing daytime clouds, yellow dust, forest fires, and fog, and for deriving atmospheric motion vectors, whereas an infrared image is useful for observing clouds, sea surface temperature, and yellow dust. Each image collected in this study is gray-scaled, with pixel values between 0 and 255. The images are 1500 pixels wide and 1300 pixels high, and each pixel represents a 4 km × 4 km area, that is, 16 km². An example of each image is shown in Figure 2; all images are shown for the same time, 2 PM, when sunlight is strong. Although the time instant is the same, the cloud shapes observed in the visible and infrared images differ. The accuracy of the data obtained from each image is presented in Section 4.
Table 1 shows the weather and satellite data during daylight hours corresponding to Figure 2. The weather data were measured at the Mokpo Weather Station, and the satellite data are the pixel values at the pixel coordinates corresponding to the Mokpo area. The maximum cloud cover value is 10, and the maximum pixel value of the satellite images is 255. For cloud cover, the visible image, and the infrared image alike, a larger value indicates more cloud. Note that some visible-image measurements on 1 January 2017 are missing because no data were available owing to the short daylight hours in winter, as can be confirmed from Figure 3.

Photovoltaic Data
Historical power generation data are essential for analyzing the characteristics of PV power generation. In this study, PV power generation data were acquired from power plants in Jeollanam-do, Korea; these data are publicly available. The data cover 1 January 2017 to 31 December 2019 at the same 1 h interval as the weather data. The maximum PV power generation in the region was approximately 533 MW. Figure 4 shows the distributions of the visible-light pixel values, the infrared pixel values, the cloud cover measured at the weather station, and the power generation of the plants in Jeollanam-do. Figure 4a,b shows that the values obtained from the satellite images are widely and finely distributed, whereas Figure 4c shows that the cloud cover has a narrow range, with the data concentrated at both ends. Figure 4d shows an approximately uniform distribution from 0 to 533 MW. The finer distribution of the satellite-derived data does not by itself imply that these variables are always highly correlated with power generation. However, because the satellite data can distinguish more varied cloud conditions than the station cloud data, they may allow a more precise power generation prediction.

Image Processing
To extract data from the satellite images and correlate them with other data, the images must be converted into numerical data and their times synchronized. This process is depicted in Figure 5. First, images were retrieved sequentially from a database of satellite images. Each file name contains 12 digits, such as "201701010100", consisting of a 4-digit year followed by a 2-digit month, day, hour, and minute. The acquisition date and time of each image were checked against its file name, which reduces errors when correlating the images with other data such as weather data and power generation. If no image existed for the exact hour, the image taken 15 min later was used instead: because images are taken every 15 min, this is the closest available image, and clouds do not change rapidly within 15 min. If there was no image 15 min later, the image taken 30 min or 45 min later was used, and if none of these existed, the data for that time were withheld. Table 2 shows, for the target area, the average change between the image at the target time and the images taken 15 min, 1 h, 2 h, and 6 h later.
The next step was to remove the guidelines drawn on the satellite images; the guideline pixels were set to the same value in all images so as not to affect the calculation. As shown in Figure 6, the original image contains yellow guidelines, which were eliminated by setting them to 0, as shown in Figure 7. The image was then gray-scaled for convenience of calculation, as shown in Figure 8. The infrared and visible images were stored with three identical RGB channels, which made them unnecessarily three-dimensional.
Because this redundancy would cause additional computation, the channels were unified into a single matrix, converting each image into a two-dimensional image.
When the color conversion was completed, the correlation analysis for tracking the clouds followed. However, analyzing all correlations for 1500 px × 1300 px images was difficult because of resource limitations. Therefore, each image was divided into several grids, as shown in Figure 9, and whether a cloud exists was determined from the representative value of each grid.
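Before the grid reduction, the file-name check with the 15 min fallback and the guideline and channel clean-up described above can be sketched in Python as follows. This is a minimal sketch, not the authors' MATLAB implementation; detecting guideline pixels as those whose RGB channels differ, and the helper names `parse_timestamp`, `pick_image`, and `to_single_channel`, are assumptions.

```python
from datetime import datetime, timedelta
import numpy as np

STEM_FORMAT = "%Y%m%d%H%M"  # 12 digits: year, month, day, hour, minute

def parse_timestamp(stem):
    """Parse a file-name stem such as '201701010100' into a datetime."""
    return datetime.strptime(stem, STEM_FORMAT)

def pick_image(available_stems, target):
    """Return the stem at `target`, else the first one 15, 30, or 45 min later.

    Returns None when no candidate exists (the data for that time are withheld).
    """
    for offset_min in (0, 15, 30, 45):
        stem = (target + timedelta(minutes=offset_min)).strftime(STEM_FORMAT)
        if stem in available_stems:
            return stem
    return None

def to_single_channel(rgb):
    """Collapse an RGB image with identical channels to 2-D and zero the guidelines.

    Assumption: colored overlay pixels (e.g., yellow guidelines) are exactly
    those whose R, G, and B values are not all equal.
    """
    rgb = np.asarray(rgb)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    overlay = (r != g) | (g != b)
    gray = r.astype(np.uint8)  # copy of one channel; the channels are identical elsewhere
    gray[overlay] = 0          # same fixed value in every image
    return gray
```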
In this study, the satellite images were divided into grids of 10 px × 10 px and thereby reduced to 150 px × 130 px images. During the reduction, the representative value of each grid was the average of its 100 pixels, as expressed in Equation (1); the results are shown in Figure 10.
$$\text{Average value of Grid}_n = \frac{1}{x_{\text{grid size}} \times y_{\text{grid size}}} \sum_{x=1}^{x_{\text{grid size}}} \sum_{y=1}^{y_{\text{grid size}}} p_n(x, y) \tag{1}$$
where $p_n(x, y)$ is the value of the pixel at position $(x, y)$ within grid $n$, and $x_{\text{grid size}} = y_{\text{grid size}} = 10$.
When the size conversion was completed, each two-dimensional image was converted into a one-dimensional array and stored in a database to facilitate the subsequent correlation analyses. Table 3 shows, for the target area after image processing, the average change in value after 15 min, 1 h, 2 h, and 6 h. Compared with Table 2, the change decreased by approximately 1% for the 15 min interval and increased by 2% for the 1 h interval.
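Equation (1) with 10 px × 10 px cells, followed by flattening to a one-dimensional array, can be sketched with NumPy as below. The helper names are hypothetical, and the sketch assumes the image height and width are exact multiples of the cell size, as 1300 × 1500 is for 10 px cells.

```python
import numpy as np

def grid_average(image, cell=10):
    """Reduce an (H, W) image to (H/cell, W/cell) by averaging each cell x cell block."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    if h % cell or w % cell:
        raise ValueError("image size must be a multiple of the cell size")
    # Axis 1 indexes rows inside a block, axis 3 columns inside a block.
    return img.reshape(h // cell, cell, w // cell, cell).mean(axis=(1, 3))

def flatten_for_storage(grid):
    """Row-major 1-D array of grid values, as stored before the correlation analysis."""
    return np.asarray(grid).ravel()
```

For a 1300 × 1500 image this yields a 130 × 150 grid, i.e., 19,500 values per image.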

Correlation Analysis
The correlation analysis is the core process of this study and is shown in Figure 11. The algorithm takes three inputs: the target time, the target area, and T. The target time is the reference time, selected from a 24 h period at 1 h intervals. The target area is the reference area, i.e., the pixel whose coordinates were found by matching the targeted region in the preprocessed satellite image. T is an earlier time, at most 23 h before the reference time, and the correlation analysis was performed at 1 h intervals as necessary.
In this paper, the Mokpo area of Jeollanam-do was selected as the reference area, so the target area is the pixel (65, 75) matched with that area in the preprocessed images. To analyze how the clouds varied between 11 AM and 2 PM, we input 2 PM as the target time and 11 AM as T.
Figure 11. Process of correlation analysis.
Next, the image at the target time is loaded, and the value of the pixel indicated by the target area is extracted. Then the image at time T is loaded, the value at the target-area position in that image is replaced by the extracted value, and the result is stored in the workspace. Applying the same steps to the same time interval over the three years of satellite images yields data from approximately 1000 images. Given these N scalar observations, the Pearson correlation coefficient is defined as
$$\rho(\text{target area}, \text{grid}_n) = \frac{1}{N-1} \sum_{i=1}^{N} \left( \frac{\text{target area}_i - \mu_{\text{target area}}}{\sigma_{\text{target area}}} \right) \left( \frac{\text{grid}_{n,i} - \mu_{\text{grid}_n}}{\sigma_{\text{grid}_n}} \right) \tag{2}$$
where $\text{target area}_i$ and $\text{grid}_{n,i}$ denote the cloud value of the designated area and of the nth grid in the ith observation, respectively. n ranges from 1 to the total number of pixels, 19,500 (150 × 130) per image, and N is the number of observations used in the correlation analysis. $\mu_{\text{target area}}$ and $\mu_{\text{grid}_n}$ are the means, and $\sigma_{\text{target area}}$ and $\sigma_{\text{grid}_n}$ are the standard deviations, of the target area and the nth grid, respectively.
Taking 2 PM as the target time, the results of the correlation analysis with 11 AM, 3 h earlier, are shown in Figure 12. The map has 150 pixels along the x-axis and 130 along the y-axis, the same size as the reduced satellite image, so each point represents the correlation between that region and the target region. The black dot in the middle of Figure 12 marks the target area; yellow pixels indicate a high cloud correlation with the target area, and blue pixels a low one.
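Equation (2) can be evaluated for all 19,500 grids at once. The following NumPy sketch assumes the N observations are stacked into a series of target-area values and an array of reduced images at time T; the function name `correlation_map` and this array layout are assumptions, and grids with constant values are returned as NaN because their correlation is undefined.

```python
import numpy as np

def correlation_map(target_values, lagged_grids):
    """Pearson correlation, as in Eq. (2), between the target series and every grid.

    target_values: shape (N,), target-area value at the target time on each of N days.
    lagged_grids:  shape (N, H, W), reduced images taken at time T on the same N days.
    Returns an (H, W) map of correlation coefficients.
    """
    t = np.asarray(target_values, dtype=float)
    g = np.asarray(lagged_grids, dtype=float)
    n = t.shape[0]
    t_z = (t - t.mean()) / t.std(ddof=1)  # standardized target series
    g_std = g.std(axis=0, ddof=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        g_z = (g - g.mean(axis=0)) / g_std  # standardized series per grid
        rho = np.tensordot(t_z, g_z, axes=(0, 0)) / (n - 1)
    return np.where(g_std > 0, rho, np.nan)
```

Vectorizing over the grid avoids 19,500 separate calls to a pairwise correlation routine.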
When the correlation analysis was finished, the coordinates of the pixel with the highest correlation with the target area, among the other 19,499 pixels, were stored in the database. Because the correlation analysis was performed for lags of up to 23 h, performing it for every hour of the 24 h period produced a 24 × 24 correlation coefficient matrix.
These values were classified and stored by reference time and target time zone, and each stored value indicates the pixel coordinates to read for the designated time when a new satellite image is input. For example, when the current time is 11 AM and the cloud volume of the target area is to be predicted for 2 PM, the satellite image value is extracted at the pixel coordinates stored for the 3 h interval starting at 11 AM. This relies on the assumption that if there are many clouds in the correlated area at a given time, there will also be many clouds in the target area a few hours later.
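Selecting and storing the best-correlated pixel for each pair of target hour and lag can be sketched as follows. The dictionary keyed by `(target_hour, lag_h)` is a hypothetical stand-in for the paper's 24 × 24 table, and the target pixel is excluded so that only the remaining 19,499 grids compete.

```python
import numpy as np

def best_pixel(corr_map, target_rc):
    """(row, col) of the highest-correlation grid, excluding the target pixel itself."""
    m = np.array(corr_map, dtype=float)  # copy, so the caller's map is left untouched
    m[target_rc] = np.nan                # the target always correlates with itself
    return tuple(int(i) for i in np.unravel_index(np.nanargmax(m), m.shape))

def build_lookup(corr_maps, target_rc):
    """Map each (target_hour, lag_h) correlation map to its best pixel coordinates."""
    return {key: best_pixel(m, target_rc) for key, m in corr_maps.items()}
```

In the example from the text, the entry for `(14, 3)` would give the pixel to read from the 11 AM image when forecasting 2 PM.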

Prediction Process
In the actual prediction process, the time direction of the correlation analysis was reversed. If the prediction was made at 9 AM, the cloud forecast for the next 24 h was produced from the correlation matrix: 10 AM used the entry for an image taken 1 h earlier, 11 AM the entry for an image taken 2 h earlier, noon the entry for 3 h earlier, and so on. Thus, when the satellite image for the current time was input, the pixel coordinates of the highly correlated regions were retrieved for each lead time, and the corresponding pixel values were extracted from the current image. In this way, the cloud cover of the target region could be predicted for 24 h and used as an input to the power generation prediction model to obtain the solar power forecast for the target region. The process is shown in Figure 13.
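Reading the forecast out of the current reduced image can be sketched as below. It assumes a lookup table mapping `(target_hour, lag_h)` to pixel coordinates (a hypothetical layout for the stored matrix) and lags of 1 to 23 h; the function name and this layout are assumptions.

```python
import numpy as np

def forecast_cloud(current_grid, current_hour, lookup, lags=range(1, 24)):
    """Predict the target-area cloud value for each lead time from one image.

    lookup[(target_hour, lag_h)] -> (row, col) of the best-correlated grid.
    Returns {target_hour: value}; lead times without a stored entry are skipped.
    """
    grid = np.asarray(current_grid)
    forecast = {}
    for lag in lags:
        target_hour = (current_hour + lag) % 24
        key = (target_hour, lag)
        if key in lookup:
            r, c = lookup[key]
            forecast[target_hour] = float(grid[r, c])
    return forecast
```

The returned values can then be fed, hour by hour, to the power generation model as its cloud input.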

Forecasting Model with ANN
To evaluate the impact of the input data on performance, the prediction model was fixed to an ANN, and only the inputs were varied. The ANN model defined in this paper is fully connected and consists of one input layer, two hidden layers, and one output layer. The input layer uses one neuron for a single-variable input and three neurons for the multivariable input. Each hidden layer consists of 10 neurons.

The first hidden layer is expressed by Equations (3) and (4):
$$a_j^{(1)} = \sum_{i=1}^{m} w_{ji}^{(1)} x_i + b_j^{(1)} \tag{3}$$
$$z_j^{(1)} = \mathrm{ReLU}\left(a_j^{(1)}\right) = \max\left(0, a_j^{(1)}\right) \tag{4}$$
Here, $w_{ji}^{(1)}$ denotes the weight connecting the ith neuron of the input layer to the jth neuron of the first hidden layer, $b_j^{(1)}$ denotes the bias of the jth neuron, $x_i$ is the ith neuron of the input layer, and $m$ is the number of neurons in the input layer. Because the first hidden layer has 10 neurons, j ranges from 1 to 10. $a_j^{(1)}$ is the weighted sum of the inputs plus the bias, and $z_j^{(1)}$ is the signal passed on through the activation function. The second hidden layer is calculated in the same way as the first, as shown in Equations (6) and (7), and both hidden layers use the same activation function, the rectified linear unit (ReLU).

The output layer consists of one neuron with an identity activation because the task is to forecast continuous values from the input data. The weighted sum of the output layer is given by Equation (8), and the predicted output y is given by Equation (9).
The ANN structure corresponding to these equations is shown in Figure 14. The ANN works as a regression model that learns nonlinear characteristics. In the input layer, x is the cloud cover, the pixel value of the infrared image, or the pixel value of the visible image (all three for the multivariable model). The signal from the input layer is transformed by the weights, biases, and activation function as it passes through each neuron of the hidden layers. In the output layer, the signals are combined without an activation function. The output signal y is the PV power predicted from the input variables. Because predicted and actual PV power differ early in training, the forecasting model reduces the error by repeatedly updating the weights and biases through backpropagation.
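The network of Equations (3)-(9) can be written out in NumPy as below to make the forward pass and backpropagation concrete. This is a sketch, not the MATLAB model used in the paper: the He-style initialization, the plain full-batch gradient descent, the learning rate, and the class name `TinyANN` are all assumptions.

```python
import numpy as np

def relu(a):
    return np.maximum(0.0, a)

class TinyANN:
    """m inputs -> 10 ReLU -> 10 ReLU -> 1 identity output, trained on the MSE."""

    def __init__(self, n_inputs, n_hidden=10, seed=0):
        rng = np.random.default_rng(seed)
        # He-style initialization (an assumption; the source does not specify one).
        self.W1 = rng.normal(0.0, np.sqrt(2 / n_inputs), (n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, np.sqrt(2 / n_hidden), (n_hidden, n_hidden))
        self.b2 = np.zeros(n_hidden)
        self.W3 = rng.normal(0.0, np.sqrt(2 / n_hidden), (1, n_hidden))
        self.b3 = np.zeros(1)

    def forward(self, X):
        a1 = X @ self.W1.T + self.b1   # Eq. (3): weighted sum plus bias
        z1 = relu(a1)                  # Eq. (4): ReLU activation
        a2 = z1 @ self.W2.T + self.b2  # second hidden layer, same form
        z2 = relu(a2)
        y = z2 @ self.W3.T + self.b3   # output layer with identity activation
        return y[:, 0], (a1, z1, a2, z2)

    def train_step(self, X, y_true, lr=0.01):
        """One full-batch gradient-descent step; returns the MSE before the update."""
        y_pred, (a1, z1, a2, z2) = self.forward(X)
        n = X.shape[0]
        d_y = (2.0 / n) * (y_pred - y_true)[:, None]  # dMSE/dy, shape (n, 1)
        gW3, gb3 = d_y.T @ z2, d_y.sum(axis=0)
        d_a2 = (d_y @ self.W3) * (a2 > 0)             # backprop through ReLU
        gW2, gb2 = d_a2.T @ z1, d_a2.sum(axis=0)
        d_a1 = (d_a2 @ self.W2) * (a1 > 0)
        gW1, gb1 = d_a1.T @ X, d_a1.sum(axis=0)
        for p, g in ((self.W1, gW1), (self.b1, gb1), (self.W2, gW2),
                     (self.b2, gb2), (self.W3, gW3), (self.b3, gb3)):
            p -= lr * g                               # in-place parameter update
        return float(np.mean((y_pred - y_true) ** 2))
```

For the multivariable case, `TinyANN(n_inputs=3)` takes the cloud cover, infrared, and visible pixel values as its three input columns; the single-variable models use `n_inputs=1`.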
The period of the data was from 1 January 2019 to 31 December 2019. The training period was 1 January to 30 September, and the test period was 1 October to 31 December. The data consist of the set of input values and actual PV power.
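The chronological split described above can be expressed as a short Python sketch that keeps the test months strictly after the training months, without shuffling. It works at the level of calendar days, an illustrative simplification: in practice each day would carry its hourly input values and PV power. Assuming one forecast per day at a given horizon, the 92 test days would be consistent with the 92 data points reported for Table 5.

```python
from datetime import date, timedelta


def daily_range(start, end):
    """Return every calendar date from start to end, inclusive."""
    return [start + timedelta(days=k) for k in range((end - start).days + 1)]


days_2019 = daily_range(date(2019, 1, 1), date(2019, 12, 31))
split_day = date(2019, 10, 1)  # first day of the test period

# Chronological split: no shuffling, so no test day precedes a training day
train_days = [d for d in days_2019 if d < split_day]
test_days = [d for d in days_2019 if d >= split_day]

print(len(train_days), len(test_days))  # 273 92
```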

Performance Evaluation Metric and Equipment
This section evaluates the accuracy of the PV power forecasting model based on satellite images. The mean squared error (MSE) was adopted as the accuracy metric:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_{i,\mathrm{real}} - y_{i,\mathrm{predict}} \right)^2$$

where $y_{i,\mathrm{real}}$ and $y_{i,\mathrm{predict}}$ denote the actual PV power and the value predicted by the forecasting model, respectively, and $n$ is the total number of data points. All procedures and models were implemented in MATLAB 2021b on a computer running Windows 10 Pro with an NVIDIA GeForce RTX 2070 Super, an Intel Core i7-9700K CPU, and 32 GB of RAM.
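As a small worked example of the metric, the sketch below computes the MSE in Python with NumPy; the function name `mse` and the sample values are illustrative, not taken from the paper, whose calculations were done in MATLAB.

```python
import numpy as np


def mse(y_real, y_predict):
    """Mean squared error between measured and forecast PV power."""
    y_real = np.asarray(y_real, dtype=float)
    y_predict = np.asarray(y_predict, dtype=float)
    if y_real.shape != y_predict.shape:
        raise ValueError("y_real and y_predict must have the same shape")
    return float(np.mean((y_real - y_predict) ** 2))


# Errors of 0, 1 and -2 give (0 + 1 + 4) / 3 = 5/3
print(mse([3.0, 5.0, 2.0], [3.0, 4.0, 4.0]))
```

Squaring penalises large errors more heavily than small ones, so a few badly missed hours can dominate the score for a test period.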

Simulation Results
First, the correlation between each weather station and solar power generation was analyzed. Of the 26,279 data points collected over three years, 10,950 data points within the sunshine hours of 7 AM to 6 PM were used for the correlation analysis. Among the measurement stations in Jeollanam-do, the targets of the analysis were the Mokpo, Yeosu, and Heuksando Weather Stations, which provide solar irradiance data. As an example, Figure 15 shows 168 data points from 1 January 2017 to 7 January 2017, with each variable scaled to the range 0 to 1 for comparison. The figure indicates that the insolation at each station follows a trend similar to that of solar power generation in Jeollanam-do. As the correlation results in Figure 16 show, the solar irradiance measured at the Mokpo Meteorological Station had the highest correlation with power generation, at 0.9079. Therefore, the cloud cover reported by the meteorological site and the cloud cover extracted from satellite images were evaluated for the Mokpo area.
Cloud cover is reported as a grade from 0 to 10, where values closer to 10 indicate more cloud. Because power generation generally decreases as cloud increases, each value was converted to 10 minus the cloud grade so that higher values correspond to clearer sky. The converted values were then scaled to the range 0 to 1 for correlation analysis. Figure 17 shows the cloud data and power generation collected for the same period and region. The cloud cover of each meteorological station traces a curve of a different shape from that of solar irradiance, because its range of values is coarse and its values fluctuate strongly. The correlation results in Figure 18 show that, unlike solar irradiance, cloud cover has a low correlation coefficient with power generation. Moreover, whereas solar irradiance directly measures the amount of light reaching the ground, cloud cover describes only the amount of cloud, so it does not appear to reflect differences in power generation caused by the altitude of the sun.
To evaluate the correlation of the satellite images, a comparison was made using the data of the Mokpo weather station, which has the highest correlation with solar power in Jeollanam-do. Two types of satellite images were used: visible and infrared. The pixel values corresponding to the Mokpo area were extracted using the methodology described in Section 3. As with the cloud cover provided by the weather station, the satellite pixel value is inversely related to power generation: the more cloud, the higher the pixel value. Therefore, each extracted pixel value was subtracted from 255, the maximum pixel value of the image, before the correlation analysis.
Figure 19 shows that the infrared image produces the curve most similar to power generation. The visible image tends to have high values around sunrise, possibly because it does not reflect the seasonal change in sunrise time. The visible image measures the intensity of sunlight reflected from clouds and the ground, so it cannot measure intensity at night, and only half of the image is illuminated at sunrise and sunset; if the target area lies in the unlit half, its pixel value is low. Consequently, the converted visible image value (255 minus the pixel value) tends to be high at those times. The cloud cover appears to follow changes in power generation; however, its correlation is unlikely to be high because its values fluctuate strongly and it cannot reflect the intensity of light. Figure 20 presents the correlation results for the variables considered in this study. Excluding insolation, the pixel values extracted from infrared images had the highest correlation with power generation (0.6572), followed by the visible images (0.6116) and the cloud cover measured at the weather station (0.5711). The variable most highly correlated with insolation was the infrared image, and the highest correlation between input variables was 0.8396, between the infrared and visible images.
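The conversions described above (cloud grades replaced by 10 minus the grade, pixel values by 255 minus the pixel value, both scaled to [0, 1] and then correlated with PV power) can be sketched as follows. Min-max scaling and the Pearson coefficient are assumptions, since the text says only that values were scaled from 0 to 1 and correlated; the function names and sample arrays are hypothetical.

```python
import numpy as np


def min_max_scale(v):
    """Scale a series to [0, 1]; min-max scaling is an assumption here."""
    v = np.asarray(v, dtype=float)
    span = v.max() - v.min()
    if span == 0:
        return np.zeros_like(v)  # constant series: nothing to scale
    return (v - v.min()) / span


def invert_cloud_grade(grade):
    # Grades run 0 (clear) to 10 (overcast); 10 - grade makes larger = clearer
    return 10.0 - np.asarray(grade, dtype=float)


def invert_pixel(pixel):
    # Cloudier pixels are brighter; 255 - pixel makes larger = clearer
    return 255.0 - np.asarray(pixel, dtype=float)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical hourly samples for one site
pv_power = np.array([0.0, 1.2, 3.5, 2.0, 0.4])
cloud_grade = np.array([9, 6, 1, 4, 8])
ir_pixel = np.array([200, 150, 40, 90, 180])

r_cloud = pearson_r(min_max_scale(invert_cloud_grade(cloud_grade)),
                    min_max_scale(pv_power))
r_ir = pearson_r(min_max_scale(invert_pixel(ir_pixel)),
                 min_max_scale(pv_power))
```

Because the Pearson coefficient is unchanged by a positive linear rescaling, the 0-to-1 scaling matters for plotting (as in Figures 15 and 17) but not for the coefficient itself; the inversion only flips its sign, so that clearer sky and higher PV power yield a positive correlation.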
As the correlation analysis shows, the satellite images behave similarly to cloud cover but have higher correlation coefficients. In actual forecasting, the weather variables available to users from the Meteorological Administration or weather sites are cloud cover forecasts and real-time satellite images. However, unlike satellite images, forecast cloud cover values are not stored in the database, so the experiment used measured cloud cover in place of forecast values.
First, because forecast values derived from the satellite images were required, the correlation between the target area and other areas was analyzed at 1 h intervals using the method proposed in Section 3. For comparison, the infrared and visible images were analyzed in the same way. Figure 21 shows the correlation results at 10 AM, 1 PM, and 4 PM for forecasts made at 7 AM. In Figure 21, the closer a pixel's color is to yellow, the higher the correlation, and the closer it is to blue, the lower the correlation. Figure 21a,c,e shows the results for the visible images, and Figure 21b,d,f shows the results for the infrared images.
In both types of images, the region of highly correlated pixels widens as the interval between the current time and the forecast time increases, which also helps infer the main direction and distance of cloud movement. Table 4 lists, for the infrared and visible images, the highest correlation coefficient at each forecast time: the value at the pixel closest to yellow in each panel of Figure 21, that is, the correlation between the value at that pixel's coordinates at the current time and the target point. These results were obtained from the correlation analysis at 1 h intervals, with 7 AM as the forecast start and 8 AM to 5 PM as the sunlight hours. The coordinates of the selected maximum-correlation pixel are stored and later used for prediction. For the infrared images, as Figure 21 also shows, the maximum correlation coefficient decreased as the forecast horizon increased. The visible images behaved differently at the ends of the day: in the 7 AM image, only half of the scene is displayed because the sun has not yet illuminated the whole area, so the correlation apparently could not be inferred accurately; likewise, at 5 PM the sun begins to set and half of the image appears black, which also produces an anomalous correlation.
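The per-pixel correlation maps of Figure 21 and the maximum-correlation pixel of Table 4 could be computed along the lines of the following NumPy sketch. It assumes one reading of the procedure: for forecasts issued at 7 AM, each pixel's 7 AM value is correlated, across days, with the target point's value at the forecast hour. The exact method is given in Section 3 and is not reproduced here; the array layout and the function names `correlation_map` and `max_correlation_pixel` are hypothetical.

```python
import numpy as np


def correlation_map(images_now, target_later):
    """Per-pixel Pearson correlation between images at the current time and
    the target-point value at the forecast time.

    images_now:   array (n_days, H, W), e.g. pixel values at 7 AM each day
    target_later: array (n_days,), target-point value at the forecast hour
    returns:      array (H, W) of correlation coefficients
    """
    X = np.asarray(images_now, dtype=float)
    y = np.asarray(target_later, dtype=float)
    Xc = X - X.mean(axis=0)                     # centre each pixel over days
    yc = y - y.mean()
    cov = np.tensordot(yc, Xc, axes=(0, 0))     # sum over days -> (H, W)
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(invalid="ignore", divide="ignore"):
        r = cov / denom
    return np.where(denom > 0, r, np.nan)       # constant pixels -> NaN


def max_correlation_pixel(r_map):
    """(row, col) of the highest correlation, ignoring NaN pixels."""
    flat_index = np.nanargmax(r_map)
    return np.unravel_index(flat_index, r_map.shape)
```

Computing the whole map in one vectorized pass avoids calling `np.corrcoef` once per pixel. Pixels whose value never changes receive NaN and are skipped by `np.nanargmax`, which would otherwise treat them as valid candidates.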
As shown in Figure 22, the measured cloud cover, the satellite images, the cloud cover predicted from satellite images, and the PV power were compared for three days from 2 December to 5 December 2019. The PV power decreases once during the day and then increases again, and the cloud cover measured by the Korea Meteorological Administration shows the same trend. However, because the cloud cover values have coarse resolution, the trend is not captured in detail. Green and red show the cloud cover predicted from the infrared and visible images, respectively. The predicted values also follow the decrease in power generation during the day, although errors occur. Table 5 lists the results of forecasting power generation 1 h to 6 h ahead from 7 AM. All the ANN-based models have identical structures and hyperparameters; the test period was 1 October 2019 to 31 December 2019, with a total of 92 data points. Among the single-variable models, the visible image data gave the best results up to 2 h ahead, whereas cloud cover gave better results from 3 h ahead, and the gap widened as the horizon increased. However, the best results overall were obtained when all the variables were used together: the multivariable case showed the highest accuracy from 1 h to 6 h, which indicates that the variables complement one another.

Discussion
In the correlation analysis, the infrared images had the highest correlation, so they were expected to perform best when applied to the actual prediction model. In practice, however, the visible images performed better in the ultra-short term, and the measured cloud cover gave better predictions from 3 h ahead. In addition, to assess the potential of the data as a whole, images captured over three years were analyzed without considering seasonality; further research is needed on this point because cloud movement is generally related to seasonal factors. Future work should also forecast the cloud volume itself and compare a variety of prediction models.

Conclusions
With the increasing proportion of solar power plants, more sophisticated prediction models than before are required. Forecast accuracy can be improved either by improving the quality of the data or by using a suitable, well-optimized prediction model; this study focused on improving the quality of the data. In particular, because solar irradiance, which has the highest correlation with PV power, is not provided as a forecast by the Meteorological Administration or weather sites, cloud cover, which is also correlated with PV power generation, was selected as an alternative.
This study selected the weather station most highly correlated with PV generation among the stations that record cloud cover and used its data to compare cloud cover with satellite images. To this end, the pixel values of the satellite images for the Mokpo area were compared with the weather station data, confirming that the satellite images had a correlation with PV power generation higher by 0.04-0.08. However, when cloud cover and satellite images were applied to actual forecasting, the satellite image inputs gave better results in the ultra-short term, whereas cloud cover gave better results from 3 h ahead. Although the cloud cover inputs had the advantage of being measured rather than forecast values, the multivariable case showed the best predictive performance.
In conclusion, among single inputs, satellite images performed best in the ultra-short term, whereas cloud cover may be preferable for short-term forecasting. When multiple inputs were used together, the multivariable case outperformed any single meteorological or satellite input.