1. Introduction
Clouds have a strong impact on photovoltaic (PV) power production. By limiting the solar irradiance that reaches the PV panels, clouds are a major cause of the high variability of PV power output. It is therefore necessary to monitor cloud formation and obtain cloud information so that backup energy sources can be prepared to cover reductions in energy output. Cloud information is used in weather analyses and meteorological data, and also in energy applications such as solar irradiance and PV power estimation. Cloud information, such as cloud cover and cloud motion, has also been examined frequently in renewable energy research. Chow et al. [1] used a sky camera installed in California to develop a method for estimating intra-hour cloud motion and forecasting global horizontal irradiance (GHI). Kim et al. [2] retrieved cloud cover from sky images, validated it against human observations, and used it to estimate solar irradiance. Lothon et al. [3] investigated an algorithm to estimate cloud cover from sky images.
Cloud information is a key input variable for solar irradiance forecasting, which is critical for managing the uncontrollable production of PV power. Many methods are available to forecast solar irradiance, and they are generally grouped according to the forecast horizon. For long-term forecast horizons, numerical weather prediction, which relies on the numerical solution of the governing equations of meteorology, is most useful. Because predictions can be made up to 15 days in advance, such long-term forecasts support operation optimization and market participation. Sky images and satellite images, on the other hand, are used to forecast solar irradiance over short-term horizons, from several minutes to several hours in advance. Short-term forecasting is useful for anticipating ramp events caused by variations in solar irradiance. Compared with satellite images, sky images offer more detailed cloud information at higher spatial resolution. Solar irradiance forecasting based on sky images is therefore appropriate for managing a specific PV system, despite its shorter time horizon.
In general, information about clouds, especially cloud cover, is collected through human observations, and the World Meteorological Organization provides the rules for recording cloud cover. Observers estimate the cloud cover in oktas or tenths: the sky is divided into 8 or 10 regions, and the regions covered by clouds are counted. However, the accuracy of this traditional method is deemed unsatisfactory: it provides low temporal and spatial resolution, and errors can be introduced through the subjective nature of observers' judgments. A hemispherical sky camera is an alternative that can address some of these problems. The camera gathers images at high frequency during the hours of daylight.
Low-cost hemispherical sky cameras have evolved rapidly and are increasingly preferred for gathering cloud information [4]; they have become very popular in the fields of solar energy and cloud motion detection. A sky camera produces cloud cover information by processing the pixel distribution of its images. The cloud cover is calculated by an algorithm that uses the red, green, and blue (RGB) channels of an image together with an adaptive threshold to distinguish cloud pixels from sky pixels. These data can also be combined with physical models to estimate solar irradiance. For example, Kim et al. [5] used cloud cover to estimate solar irradiance in South Korea with the cloud cover radiation model, a regression-type model developed by Kasten et al. [6] to estimate GHI on an hourly basis. Some researchers have also used sky images to forecast solar irradiance over short-term forecast horizons. Caldas et al. [7] forecasted GHI using sky images and real-time GHI measurements. They applied the cross-correlation method (CCM) to these images to predict GHI up to 10 min in advance. Their results indicate that the model was able to predict GHI over a very short-term forecast horizon under high variability of solar irradiance.
So far, GHI forecasting from sky images has relied on conventional methods such as CCM and the cloud motion vector (CMV), which estimate cloud motion from the displacement of pixels between images. However, these methods do not provide good accuracy because cloud motion is nonlinear and hard to predict. Meanwhile, deep learning, a subfield of artificial intelligence (AI), has been increasingly used in solar irradiance and PV power forecasting [8,9,10,11]. Various deep learning models, such as the recurrent neural network (RNN), long short-term memory (LSTM), gated recurrent unit (GRU), and deep belief network (DBN), provide high accuracy in forecasting solar irradiance for both short-term and long-term forecast horizons. Nevertheless, no studies have applied deep learning models to forecast solar irradiance using cloud cover data obtained from sky images.
In this study, we propose a method to estimate solar irradiance under highly variable conditions, such as partly cloudy days. The proposed method combines the cloud cover obtained from sky images, a deep learning model that forecasts the cloud cover ten minutes ahead, and a physical model that estimates GHI from the forecasted cloud cover. LSTM was selected among the deep learning algorithms because it is better suited to predicting time-series data than other deep learning models [8]. The forecasting was conducted for clear, partly cloudy, and overcast days on a minute basis and validated by comparison with GHI measurements at a ground station in Seoul, South Korea. Because no previous study has used a deep learning model to forecast the cloud cover from sky images, this study investigates its applicability.
3. Methods
This section presents the method to forecast solar irradiance, which comprises two steps. In the first step, the cloud cover is calculated from sky images, and the future cloud cover is then predicted using LSTM. In the second step, the GHI is forecasted by using the predicted cloud cover as the input to the solar radiation model.
3.1. Cloud Cover Algorithm
The algorithm used to detect the cloud and sky pixels in the images is summarized as a simple flow chart in Figure 5. The flow chart outlines the process of obtaining the cloud cover from sky images by using the distribution of the pixel values. The first step is to collect the sky images from the sky camera. These images are converted into a one-channel image by applying the red-blue ratio (RBR) method, which improves the contrast and reduces the noise of the image; this method was proposed by Shield et al. [13] and is able to successfully detect the cloud and sky pixels in an image. Mathematically, the RBR is defined as the ratio of the red channel to the blue channel of an image, with pixel values ranging from 0 to 255. It should be noted that the value of the blue channel is increased by 1 if it equals 0 to avoid division by zero [14].
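The RBR conversion described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `red_blue_ratio`, the RGB channel order, and the tiny synthetic image are assumptions made for the example.

```python
import numpy as np

def red_blue_ratio(rgb):
    """Return the per-pixel red/blue ratio of an H x W x 3 RGB image.

    Assumes the channel order is (red, green, blue) with values in 0..255.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    red = rgb[..., 0]
    blue = rgb[..., 2].copy()
    # A blue value of 0 is raised to 1 so that the division is always defined.
    blue[blue == 0] = 1.0
    return red / blue

# Synthetic 1 x 3 image: a blue sky pixel, a white cloud pixel,
# and a pixel whose blue channel is 0 (hypothetical values).
sample = np.array([[[50, 100, 200], [240, 240, 240], [10, 0, 0]]], dtype=np.uint8)
rbr = red_blue_ratio(sample)
print(rbr)
```

Sky pixels scatter more blue than red light and therefore give a low ratio, while cloud pixels are close to white and give a ratio near 1, which is what makes the one-channel image easy to threshold.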
After that, a threshold value is applied to the one-channel image to distinguish the clouds from the sky. We calculated the value using the Otsu thresholding method, proposed by Nobuyuki Otsu in 1979 [15]. This method was chosen on account of its flexibility and robustness in identifying the cloud and sky pixels. Another advantage of this method is the simple process involved in determining the threshold values: since the calculation requires only one-dimensional intensity data, other parameters, such as the shape or geometric components of an object, do not affect the accuracy of the threshold value.
It is worth noting that the basic concept in thresholding is to segment an image based on the differences between pixel values. A fixed threshold value cannot be applied to every sky image because the brightness of sky images differs with the ever-changing position of the sun. To address this problem, the threshold value is computed adaptively for each image, so the value differs from image to image.
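A per-image Otsu threshold can be sketched as follows. The code is a plain NumPy version of the standard between-class-variance criterion, written for illustration; the function name `otsu_threshold`, the number of histogram bins, and the synthetic ratio values are assumptions and are not taken from the study.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Return the threshold that maximizes the between-class variance."""
    values = np.asarray(values, dtype=np.float64).ravel()
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0

    w_low = np.cumsum(hist).astype(np.float64)  # pixels in bins 0..k
    w_high = w_low[-1] - w_low                  # pixels in bins k+1..end
    cum = np.cumsum(hist * centers)             # running intensity sum

    # Only splits that leave pixels on both sides are meaningful.
    valid = (w_low > 0) & (w_high > 0)
    between = np.zeros_like(centers)
    mean_low = cum[valid] / w_low[valid]
    mean_high = (cum[-1] - cum[valid]) / w_high[valid]
    between[valid] = w_low[valid] * w_high[valid] * (mean_low - mean_high) ** 2

    best = int(np.argmax(between))
    # Use the upper edge of the best bin so that "value > threshold" is the high class.
    return edges[best + 1]

# Synthetic one-channel values: 50 sky-like ratios and 50 cloud-like ratios.
rbr_values = np.concatenate([np.linspace(0.2, 0.4, 50), np.linspace(0.8, 1.0, 50)])
threshold = otsu_threshold(rbr_values)
cloud_mask = rbr_values > threshold
print(threshold, int(cloud_mask.sum()))
```

Because the threshold is derived from the histogram of each image, calling the function once per image gives the image-specific value described above.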
Finally, the cloud cover was calculated as the ratio of cloud pixels to total pixels in the binary image. Cloud cover is reported as the number of parts of the sky covered by clouds, with the sky split into 8 (oktas) or 10 (tenths) parts. It should also be noted that cloud cover does not describe the cloud thickness; it only refers to the amount of the sky covered by clouds at a particular location.
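As a small worked example of this last step, the sketch below computes the cloud fraction of a binary mask and expresses it in oktas. The function names and the rounding rule used for the okta conversion are illustrative assumptions; the text only specifies the pixel ratio.

```python
import numpy as np

def cloud_cover_fraction(cloud_mask):
    """Fraction of pixels flagged as cloud in a binary mask (True = cloud)."""
    mask = np.asarray(cloud_mask, dtype=bool)
    return mask.sum() / mask.size

def to_oktas(fraction):
    """Convert a cloud fraction in [0, 1] to oktas by simple rounding (assumed rule)."""
    return int(round(fraction * 8))

# Hypothetical 4 x 4 binary image with 6 cloud pixels.
mask = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
], dtype=bool)

fraction = cloud_cover_fraction(mask)
oktas = to_oktas(fraction)
print(fraction, oktas)
```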
3.2. Solar Radiation Model
Solar radiation models estimate GHI from various input data. One solar radiation model that uses cloud cover as the input is the Kasten model [6]. This regression-type model estimates GHI by using a correlation between the cloud cover and clear-sky irradiance. In this model, the cloud cover is divided into 9 classes, where 0 refers to a clear sky and 8 to an overcast sky. The model has been widely used in different locations, but some researchers had to modify the coefficients in order to obtain good results at their location. For the location in Seoul, the coefficients were obtained from research conducted by Yoo et al. [16]. The GHI is calculated using this formula:

$$\mathrm{GHI} = \mathrm{GHI}_{clr}\left[1 - A\left(\frac{N}{8}\right)^{B}\right] \tag{3}$$

where $N$ is the cloud cover, $\mathrm{GHI}_{clr} = C\sin\alpha - D$ is the solar irradiance under clear-sky conditions, $\alpha$ is the solar elevation, A = 0.75, B = 2.6, C = 963, and D = 106. However, with these coefficients the model cannot reproduce the GHI under overcast sky conditions, since its minimum value for an overcast sky is a quarter of the clear-sky irradiance. Therefore, we proposed a new model by modifying the coefficient in Kasten's model and the clear-sky irradiance model. The coefficients used to calculate the GHI for each model are presented in Table 1.
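The Kasten-type relation can be illustrated with a short function. This sketch assumes the usual form of the model, in which the clear-sky irradiance C sin(elevation) − D is scaled by 1 − A (N/8)^B, and uses the coefficient values quoted above; the function name and the clipping of negative clear-sky values to zero are assumptions made for the example.

```python
import math

def kasten_ghi(cloud_cover_okta, solar_elevation_deg,
               a=0.75, b=2.6, c=963.0, d=106.0):
    """Estimate GHI (W m^-2) from cloud cover in oktas and solar elevation in degrees."""
    # Clear-sky irradiance of the regression model, clipped at zero for a low sun.
    ghi_clear = max(c * math.sin(math.radians(solar_elevation_deg)) - d, 0.0)
    # Reduction by cloud cover: N = 0 leaves clear-sky GHI, N = 8 leaves (1 - a) of it.
    return ghi_clear * (1.0 - a * (cloud_cover_okta / 8.0) ** b)

clear_sky = kasten_ghi(0, 90.0)
overcast = kasten_ghi(8, 90.0)
print(clear_sky, overcast, overcast / clear_sky)
```

With A = 0.75 the overcast value is exactly one quarter of the clear-sky value, which is the lower bound discussed in the text.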
Many clear-sky irradiance models are available to calculate the solar irradiance under clear-sky conditions, and each requires different input parameters. For example, Dazhi et al. [17] calculated clear-sky irradiance from the solar zenith angle and the eccentricity of the earth. Antonanzas-Torres et al. [18] estimated solar irradiance based on commonly measured variables, such as temperature, rainfall, and humidity. Yang et al. [19] proposed a model to calculate solar irradiance based on ozone absorption, water vapor absorption, permanent gas absorption, aerosol extinction, and Rayleigh scattering. In this work, we used the Ineichen clear-sky model as modified by Reno et al. [20], as this model affords good accuracy and is fairly easy to implement. The equations for the clear-sky irradiance are expressed as follows:
where $I_0$ is the extraterrestrial normal incident irradiance, $z$ is the solar zenith angle, $AM$ is the air mass, $T_L$ is the Linke turbidity factor, and $h$ is the ground elevation expressed in meters.
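For orientation, the sketch below implements the Ineichen clear-sky GHI in the form it is commonly coded in open-source solar libraries. The numerical coefficients, the Kasten and Young air mass formula, and the default extraterrestrial irradiance of 1367 W m^-2 are assumptions taken from that general formulation, not values confirmed by this text, and some published versions multiply the result by an additional correction factor that is omitted here.

```python
import math

def kasten_young_air_mass(zenith_deg):
    """Relative optical air mass as a function of zenith angle only (assumed formula)."""
    cos_z = math.cos(math.radians(zenith_deg))
    return 1.0 / (cos_z + 0.50572 * (96.07995 - zenith_deg) ** -1.6364)

def ineichen_clear_sky_ghi(zenith_deg, linke_turbidity, elevation_m, i0=1367.0):
    """Clear-sky GHI (W m^-2) from zenith angle, Linke turbidity, and elevation."""
    if zenith_deg >= 90.0:
        return 0.0  # sun at or below the horizon
    am = kasten_young_air_mass(zenith_deg)
    # Elevation-dependent coefficients (assumed values, see lead-in).
    cg1 = 5.09e-5 * elevation_m + 0.868
    cg2 = 3.92e-5 * elevation_m + 0.0387
    fh1 = math.exp(-elevation_m / 8000.0)
    fh2 = math.exp(-elevation_m / 1250.0)
    cos_z = math.cos(math.radians(zenith_deg))
    attenuation = math.exp(-cg2 * am * (fh1 + fh2 * (linke_turbidity - 1.0)))
    return cg1 * i0 * cos_z * attenuation

clean_air = ineichen_clear_sky_ghi(30.0, 2.0, 0.0)
turbid_air = ineichen_clear_sky_ghi(30.0, 5.0, 0.0)
print(clean_air, turbid_air)
```

A larger Linke turbidity lowers the computed irradiance, and the air mass grows as the sun moves away from the zenith, which matches the roles these two inputs play in the description of the model.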
Here, the Linke turbidity refers to the optical thickness of the atmosphere due to the presence of water vapor and to absorption and scattering by aerosols [21]. It expresses the transparency of the cloudless atmosphere. When the sky is perfectly blue (clean), the $T_L$ value is close to 1. However, if the atmosphere contains a large amount of water vapor and the sky color is closer to white, $T_L$ becomes larger. The air mass refers to the relative path length of the direct solar beam through the atmosphere; it is the ratio of the distance traveled by solar radiation through the atmosphere to the distance it would travel with the sun directly overhead. Note that, in this solar irradiance model, the air mass depends solely on the solar zenith angle.
3.3. Deep Learning Model
In this study, we used LSTM as the deep learning model to forecast the cloud cover up to several minutes ahead. LSTM was chosen because it provides satisfactory results in handling time-series data. Rajagukguk et al. [8] found that this model performs better than other deep learning models, such as RNN and GRU. Furthermore, it demonstrates good performance in solar irradiance forecasting over both short-term and long-term forecast horizons. LSTM was developed to overcome the vanishing and exploding gradient problems that often occur in other deep learning models. For instance, when these problems occur while training an RNN, the learning performance fails to improve [22].
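The LSTM model was proposed by Hochreiter and Schmidhuber to capture long-term dependencies in the information [23]. The unit, illustrated in Figure 6, consists of a forget gate, an input gate, an output gate, and a cell state. For simplicity's sake, the structure can be formulated as follows:

$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \tag{11}$$

$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \tag{12}$$

$$\tilde{c}_t = \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \tag{13}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{14}$$

$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \tag{15}$$

$$h_t = o_t \odot \tanh\left(c_t\right) \tag{16}$$

where $f_t$ is the forget gate, $i_t$ is the input gate, $o_t$ is the output gate, $x_t$ is the input data, $b$ is the bias, $W$ and $U$ are weight matrices, $h_t$ is the value of the memory cell, $\tilde{c}_t$ is the candidate state of the memory cell, $\sigma$ and $\tanh$ are the activation functions, $\odot$ denotes the Hadamard product, and $c_t$ is the state of the memory cell.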
In Equations (11), (12) and (15), the sigmoid function is used to calculate the amount of information that passes through each gate, with values from 0 to 1. The candidate state of the memory cell in Equation (13) uses the hyperbolic tangent function, which keeps the value between −1 and 1, to calculate the new information. In Equation (14), the forget and input gates are applied through the Hadamard product to calculate the state of the memory cell. The final output of the memory cell in Equation (16) is obtained by multiplying with the output gate.
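The gate computations referred to in Equations (11) to (16) can be traced in a single forward step written with NumPy. This is a sketch of the standard LSTM cell and not the trained network of the study: the stacked weight layout, the sizes, and the random parameter values are assumptions chosen only to make the example run.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One forward step of a standard LSTM cell.

    W has shape (4 * H, H + D) and acts on the concatenation [h_prev, x_t];
    its rows are stacked as forget, input, candidate, output (assumed layout).
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0:H])                 # forget gate, Eq. (11)
    i_t = sigmoid(z[H:2 * H])             # input gate, Eq. (12)
    c_tilde = np.tanh(z[2 * H:3 * H])     # candidate state, Eq. (13)
    c_t = f_t * c_prev + i_t * c_tilde    # Hadamard products, Eq. (14)
    o_t = sigmoid(z[3 * H:4 * H])         # output gate, Eq. (15)
    h_t = o_t * np.tanh(c_t)              # cell output, Eq. (16)
    return h_t, c_t

# Hypothetical sizes: one input feature (cloud cover) and four hidden units.
rng = np.random.default_rng(0)
D, H = 1, 4
W = rng.normal(scale=0.5, size=(4 * H, H + D))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for cloud_cover in [0.2, 0.3, 0.5, 0.4]:
    h, c = lstm_step(np.array([cloud_cover]), h, c, W, b)
print(h)
```

Since the output gate lies between 0 and 1 and the hyperbolic tangent between −1 and 1, every component of the cell output stays strictly inside (−1, 1).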
A total of 12,000 images were processed to obtain minute-by-minute cloud cover information in this study. These calculated data provided the input for the LSTM to forecast the future cloud cover. The hyperparameters of the model design are listed in Table 2.
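Before a minute-by-minute cloud cover series can be passed to an LSTM, it is usually cut into fixed-length input windows paired with a target value some minutes ahead. The sketch below shows one way to do this; the function name, the look-back length of 5 samples, and the synthetic series are hypothetical and do not reflect the settings of the study.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Split a 1-D series into (samples, lookback, 1) inputs and horizon-ahead targets."""
    series = np.asarray(series, dtype=np.float64)
    n_samples = len(series) - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this lookback and horizon")
    X = np.stack([series[i:i + lookback] for i in range(n_samples)])
    # The target lies `horizon` steps after the last value of each input window.
    y = np.array([series[i + lookback + horizon - 1] for i in range(n_samples)])
    return X[..., np.newaxis], y

# Synthetic series standing in for one-minute cloud cover values.
series = np.arange(20, dtype=np.float64)
X, y = make_windows(series, lookback=5, horizon=10)
print(X.shape, y.shape, X[0, :, 0], y[0])
```

In this example the first window covers samples 0 to 4 and its target is sample 14, that is, ten steps after the last input value.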
The optimum hyperparameters of a deep learning model, such as the number of epochs, batch size, learning rate, optimization algorithm, and activation function, depend on the dataset. The number of epochs is the number of complete passes (forward and backward) of the training data through the neural network. The batch size denotes the number of training examples processed in one forward and backward pass. The learning rate controls how much the weights of the network are adjusted with respect to the loss gradient. The optimization algorithm is used to find the attributes of the network, such as the weights, that reduce the losses. Along with this structure, the LSTM model is also equipped with an activation function known as the rectified linear unit (ReLU). ReLU is a non-linear function that sets any value below 0 to exactly 0 while remaining linear for values above 0. Because of this linear characteristic, a deep network using ReLU is easy to train, and the function also mitigates the gradient problem by ignoring the negative values. It should be noted that no fixed set of values gives the optimum design for every deep learning model, because the networks are trained iteratively. Therefore, the best way to determine the optimal hyperparameters was to use the errors during validation and training to assess the algorithm's accuracy.
3.4. Evaluation Metric
To validate the forecasting performance, several common evaluation metrics were used to calculate the accuracy of the model: mean bias difference (MBD), root mean square difference (RMSD), relative root mean square difference (rRMSD), and relative mean bias difference (rMBD). We prefer to use differences rather than errors because measurement data by nature include uncertainty; hence, the true values are unknown [24]. The evaluation metrics are given in the following equations:
where $\hat{y}_i$ is the estimated value at each time, $y_i$ is the measured value at each time, and $N$ is the number of sample data for the period.
The RMSD describes the deviation from the measurement and is always non-negative. It measures how close the prediction is to the measurement; thus, a smaller value is better. The MBD shows the average bias of the prediction and reflects the long-term performance of the model [25]; positive and negative values represent overprediction and underprediction, respectively. Furthermore, in cases where the data vary with location and time scale, relative metrics such as rRMSD and rMBD are more useful, as they provide the percentage difference.
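The four metrics can be computed as in the sketch below. MBD and RMSD follow their usual definitions, with the difference taken as estimate minus measurement so that a positive MBD means overprediction; normalizing the relative metrics by the mean measured value is a common convention that is assumed here, and the sample values are invented for the example.

```python
import numpy as np

def difference_metrics(estimated, measured):
    """Return MBD, RMSD (same unit as the input) and rMBD, rRMSD (percent)."""
    est = np.asarray(estimated, dtype=np.float64)
    meas = np.asarray(measured, dtype=np.float64)
    diff = est - meas                      # positive values mean overprediction
    mbd = diff.mean()
    rmsd = np.sqrt(np.mean(diff ** 2))
    mean_measured = meas.mean()            # assumed normalization for relative metrics
    return {
        "MBD": mbd,
        "RMSD": rmsd,
        "rMBD": 100.0 * mbd / mean_measured,
        "rRMSD": 100.0 * rmsd / mean_measured,
    }

# Hypothetical GHI values in W m^-2.
metrics = difference_metrics([110.0, 190.0, 300.0], [100.0, 200.0, 300.0])
print(metrics)
```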
4. Results and Discussion
This section provides the results from both the RBR method used to obtain the cloud cover and the deep learning model used to forecast solar irradiance. The RBR method detects the presence of clouds by considering the ratio of the red channel to the blue channel in a color image. These one-channel images, combined with the threshold values, were able to distinguish cloud pixels from the background. Each original image was transformed into a binary image in which cloud and sky pixels appear as white and black, respectively, to facilitate their identification. However, in some cases, errors occurred when the value of a cloud pixel was almost the same as that of a sky pixel. In addition, if the clouds are extremely thin, their pixel values fall below the threshold value, and they are not detected as clouds. Hence, it is essential to determine an accurate threshold value to reduce such classification errors.
To understand the correlation between cloud cover and the clear-sky index, the distribution of the daily calculated cloud cover data is presented in Figure 7, which illustrates the variation of the mean cloud cover for each day. For this comparison, the mean clear-sky index has been inverted because cloud cover and the clear-sky index define sky conditions in opposite directions: a cloud cover of 0 is a clear-sky condition, while a clear-sky index of 0 refers to an overcast sky. The cloud cover and the inverted clear-sky index show similar trends, which indicates that the method provides satisfactory results in obtaining the cloud cover from sky images.
The cloud cover forecast by the LSTM model is converted into GHI using the correlation between cloud cover and clear-sky irradiance described in Equation (3). The minute-by-minute forecasts were compared with the GHI measurements in terms of the evaluation metrics. The solar irradiance forecasting results are grouped into three categories based on sky conditions: partly cloudy, clear, and overcast days. The proposed model, obtained by modifying the coefficient in Kasten's model, outperformed the Yoo model and the persistence model for partly cloudy days, as shown in Table 3. For the 10 min ahead forecast, the RMSDs of the proposed, Yoo, and persistence models are 199.75 Wm−2, 214.97 Wm−2, and 317.94 Wm−2, respectively. For reference, another forecasting model by Caldas et al. [7], which used the cross-correlation method (CCM) in all-sky imaging, attained an RMSD of 251 Wm−2 for partly cloudy days when forecasting solar irradiance up to 10 min in advance.
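A common definition of a persistence baseline, assumed here because the comparison does not spell it out, is to reuse the most recent measurement as the forecast for the target time. The sketch below applies that rule to a short series and scores it with the RMSD; the function name and the GHI values are invented for illustration.

```python
import numpy as np

def persistence_forecast(ghi, horizon):
    """Pair each measurement with the value observed `horizon` steps later.

    Returns (forecast, observed): the forecast for time t + horizon is the value at t.
    """
    ghi = np.asarray(ghi, dtype=np.float64)
    if horizon <= 0 or horizon >= len(ghi):
        raise ValueError("horizon must be between 1 and len(ghi) - 1")
    return ghi[:-horizon], ghi[horizon:]

def rmsd(estimated, measured):
    diff = np.asarray(estimated) - np.asarray(measured)
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical one-minute GHI values (W m^-2) for a variable sky and a steady sky.
variable_sky = [500.0, 520.0, 300.0, 310.0, 600.0, 590.0]
steady_sky = [120.0, 120.0, 120.0, 120.0, 120.0, 120.0]

forecast, observed = persistence_forecast(variable_sky, horizon=2)
rmsd_variable = rmsd(forecast, observed)
rmsd_steady = rmsd(*persistence_forecast(steady_sky, horizon=2))
print(rmsd_variable, rmsd_steady)
```

The baseline is exact when the irradiance does not change and degrades as soon as it fluctuates, which is why it is a demanding reference under steady skies and a weak one under partly cloudy skies.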
The results for clear days and overcast days are shown in Table 4. Under clear day conditions, the proposed model outperforms the persistence and Yoo models for forecast horizons longer than 5 min. The RMSDs of the 10 min ahead forecast for the proposed, Yoo, and persistence models are 43.08 Wm−2, 63.74 Wm−2, and 52.49 Wm−2, respectively. However, the results were different on overcast days. In this case, the persistence model had the smallest error, with an RMSD of 9.98 Wm−2 for the 10 min ahead forecast, which demonstrates that the persistence model performs well under overcast conditions. The proposed and Yoo models provide RMSDs of 17.37 Wm−2 and 169.86 Wm−2, respectively. The result for overcast days is not surprising, since the solar irradiance does not change significantly. Caldas et al. [7] also reported that the persistence model performed better than their proposed model under overcast conditions, with an RMSD of 110 Wm−2 for the 10 min ahead forecast horizon.
To further analyze the variability of the solar irradiance under different sky conditions, a metric of solar irradiance variability is presented in Figure 8. The universal variability index (UVI) was used to quantify the variation of solar irradiance on clear, partly cloudy, and overcast days by comparing the measured irradiance with the calculated clear-sky irradiance. As expected, the small variations on clear and overcast days resulted in UVI values close to one. However, the UVI for partly cloudy days is much higher and far from constant because of the high variability of solar irradiance, with values ranging from 2.68 to 61.16.
The performance of the forecast models differs significantly with solar irradiance variability. Figure 9 shows the mean RMSD and the mean UVI of the 10-min-ahead forecast results under different sky conditions. It shows clearly that the proposed model markedly outperformed the persistence model on partly cloudy days with a mean UVI > 20. On the other hand, when the mean UVI < 5, the forecast performance of the proposed model is comparable to that of the persistence model. On overcast days, the proposed model is slightly less accurate than the persistence model, probably because the solar radiation model is less accurate under overcast sky conditions.
5. Conclusions
In this study, a model to forecast solar irradiance from sky images was proposed and evaluated. The sky camera was set to capture a hemispherical image every minute. Image processing with the RBR method was used to enhance the images for detecting the cloud pixels. The LSTM, as a deep learning model, was applied to forecast the cloud cover several minutes ahead. Then, the physical model was used to estimate the GHI from the forecasted cloud cover.
When the sky is partly cloudy, i.e., the UVI is larger than 20, the proposed model is able to forecast GHI 10 min in advance with RMSD, rRMSD, MBD, and rMBD values of 199.75 Wm−2, 25.1%, −60.65 Wm−2, and −12.70%, respectively. These evaluation metrics demonstrate that the proposed model performs better than the Yoo and persistence models. On the other hand, for clear and overcast sky conditions, when the UVI is smaller than 5, the performances of the proposed and persistence models are comparable.