Short-Term Photovoltaic Power Forecasting Using a Convolutional Neural Network–Salp Swarm Algorithm

Abstract: The high utilization of renewable energy to manage climate change and provide green energy requires short-term photovoltaic (PV) power forecasting. In this paper, a novel forecasting strategy that combines a convolutional neural network (CNN) and a salp swarm algorithm (SSA) is proposed to forecast PV power output. First, the historical PV power data and associated weather information are classified into five weather types: rainy, heavy cloudy, cloudy, light cloudy, and sunny. The CNN classification is then used to determine the prediction for the next day's weather type. Five models of CNN regression are established to accommodate the prediction for different weather types. Each CNN regression is optimized using the SSA to tune the best parameter. To evaluate the performance of the proposed method, comparisons were made to the SSA-based support vector machine (SVM-SSA) and long short-term memory neural network (LSTM-SSA) methods. The proposed method was tested on a PV power generation system with a 500 kWp capacity located in south Taiwan. The results showed that the proposed CNN-SSA could accommodate the actual generation pattern better than the SVM-SSA and LSTM-SSA methods.


Introduction
As photovoltaic (PV) power harvesting becomes more affordable, PV systems are increasingly being integrated into and used for existing power systems. For example, PV power is used in reactive power planning [1], energy storage systems (ESSs) [2], and various billing mechanisms in demand-side management [3]. Techno-economic evaluation was conducted in Gao et al. [4]. PV power systems are more accessible to the demand side but create new problems due to their variability, particularly in larger PV power systems. In Mills and Wiser [5], short-term power forecasting was used to determine the balancing reserve from renewables. The forecasting algorithm accuracy determines the solar variability, which significantly increases the cost.
In Zhang et al. [6], PV power injection reduces the costs of the spinning reserve more than the improved unit commitment and economic dispatch do. The paper also emphasizes that some forecasting errors must be managed and that these increase as the plant size increases. In Yang and Liao [7], the optimal PV reactive power regulation was researched, whereby accurate PV power forecasting leads to more reactive power regulation. In Di Piazza et al. [8], the online replanning task was used to minimize the maximum deviation due to forecasting errors for the PV-ESS hybrid system in reshaping the daily extract both long-range and short-time local features. The CNN was manually tuned with filter sizes of (2,4), (4,8), and (8,16) using NWPs. In Riaz et al. [28], the fuzzy rough C-mean with an unsupervised CNN was used for the image clustering of large-scale image data. An image as input was fed into AlexNet with five convolution layers, two adjustment layers, three feature maps, a max-pooling layer, one fully connected layer, and one soft-max layer before the cluster-label production.
In Nam and Hur [29], naïve Bayes classifier and Kriging models were used to forecast the hourly uncertainty in PV power, whereby the Kriging technique allowed for the spatial modeling of weather information and the location of PV power output through the estimations of irradiance, humidity, and temperature at neighboring points. In Suresh et al. [30], a manually tuned CNN model was used for PV power forecasting, which consisted of four convolutional layers. Those convolutional layers were assigned for each input variable and fed to the max-pooling layer. The input variables consisted of irradiation, module temperature, ambient temperature, and wind speed.
In Zhao et al. [31], the use of the CNN was a time-consuming method for the time-series classification, particularly for trial-and-error CNN parameter determination. Research in Miyazaki et al. [32] offered an alternative approach to PV power forecasting by using the relationship between the movement of the geographical distribution and the harvested PV power. This approach is less practical for individual PV system users because a powerful weather station is required. In this study, an optimization tool called the salp swarm algorithm (SSA) [33] was used to tune the best parameter for the CNN.
In contrast with point forecasting, probabilistic PV power forecasting enables operators to perform flexible analyses in one plot. The worst and best conditions for prediction can be observed, which facilitates decision-making by considering the prediction uncertainty. The various probabilistic PV power forecasting methods include the modeling of random PV prediction errors with fuzzy prediction intervals [34], the modeling of PV power generation using quantile regression [35,36], and Monte Carlo simulations [37]. Moreover, random PV power generation can be related to the uncertainty of input variables modeled using a cumulant probability distribution [38]. Although probabilistic load forecasting is outside the scope of this work, the proposed method has implications for the development of probabilistic load forecasting using CNNs, which accommodates the flexible use of multivariate inputs.
Many studies have identified the drawbacks of developments in PV power forecasting: (1) ineffective use of trial and error for PV power forecasting optimization; (2) extensive historical data requirements for forecasting model accuracy; (3) the requirement for temporal- and periodicity-related forecasting models to use minute fractions of NWP or any meteorological image, which are unavailable in practice for small plants; and (4) a gradual decrease in forecasting accuracy toward the end of a prediction window. Thus, this paper proposes a novel forecasting strategy to address the aforementioned problems. The proposed method uses a CNN optimized with an SSA to seamlessly generate the forecasting model without the requirement of time-consuming trial and error. The CNN classification is used for the prediction of the next day's weather type. The CNN-SSA then produces five CNN regression models to accommodate the unique characteristics of several weather types, which are run with an hourly prediction horizon. The proposed method is compared with an optimal support vector machine and an LSTM neural network by using an SSA (SVM-SSA and LSTM-SSA).
The state-of-the-art features of the proposed method are as follows: (1) simple arrangement of input variables that moderate features and training datasets for an accurate day-ahead forecasting result, (2) a modified state-of-the-art forecasting algorithm to produce a fine-tuned CNN model that requires no time-consuming trial and error, and (3) consistent accuracy in short-term PV power forecasting from day-ahead to three-days-ahead forecasting windows.
The rest of the paper is organized as follows. In Section 2, factors affecting PV power forecasting are described. Section 3 describes the proposed CNN-SSA forecasting method and the benchmark algorithms. Section 4 details the testing performance of the proposed method. Comparisons with other well-established methods are also provided in this section. Finally, conclusions are provided in Section 5.

Modeling Historical Data for CNN Predictors
Preparing predictors for CNN-based forecasting algorithms is essential for ensuring accuracy. In general, the accuracy of PV power forecasting is affected by weather information [39], which is described by clear-sky models, clearness indices, the origin of inputs, and persistence models. Weather information also enables the data set to be classified into conditions, such as sunny and cloudy [40]. The studies in question indicate that the choice of available historical data and data preconditions matters for exposing the features needed to obtain the expected PV power pattern. Moreover, periodically updating input data to the recent time guarantees reliable power output estimations in the long term [13], which demonstrates the value of choosing a suitable training duration to improve forecasting results.
For an individual PV power system, since a weather service value may not be preferred over a weather forecast-based PV forecasting using irradiance-weather models [41] and weather forecast variables [42], CNN classification is developed to determine the next day's weather, where this uses the historical PV power and site-related weather information regarding the temperature, precipitation, relative humidity, clear-sky radiation, module temperature, and wind speed, for example. The historical PV power and on-site weather information are classified into five weather types: rainy, heavy cloudy, cloudy, light cloudy, and sunny. In addition to preparing the historical data itself, data preprocessing is required for the forecasting model structure [43]. Max-min normalization and standard deviation normalization are thereby applied to predictors.
In this study's short-term PV power forecasting, historical data were used as input variables. The data consisted of daily and hourly datasets. The hourly dataset consisted of historical PV power output and temperature. The daily dataset consisted of average PV power, standard deviation of PV power, peak PV power, maximum temperature, minimum temperature, precipitation, and weather type. In addition, the hour of the day was also accounted for as an input variable. Per the sunlight duration, the period of observation was set to 12 h per day, from 6 a.m. to 5 p.m. The arrangement of input variables is described in Figure 1. The ten potential input variables were indexed from 1 to 10 for PV power output, temperature, average PV power, standard deviation of PV power, peak PV power, maximum temperature, minimum temperature, precipitation, weather type, and hour of the day, respectively. The potential preconditions were indexed with the letters a to h for 1 h lag, 2 h lag, 3 h lag, same hour as last day, same hour as last 2 days, same hour as last 3 days, same hour as last 4 days, and present (recent), respectively. The available data with correlation coefficients used in Zhong et al. [44] were then tested to establish whether input variables were closely correlated with the PV power generation. Based on the t-test result, the input vectors with p-values lower than 0.05 were classified as significant and were used as input variables. The correlation coefficient results are listed in Table 1.
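The significance screening described above can be written out as a short worked formula. This is a sketch that assumes the standard t-test for a Pearson correlation coefficient; the text does not spell out the test statistic, so the symbols $r$, $t$, $n$, $x_i$, and $y_i$ are illustrative notation rather than the paper's own.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
t = r \sqrt{\frac{n - 2}{1 - r^2}}
```

Here $x_i$ is a candidate input (a variable combined with a precondition), $y_i$ is the PV power output, and $n$ is the number of paired observations. Under the null hypothesis of no correlation, $t$ follows a Student's t distribution with $n - 2$ degrees of freedom, and the input is retained when the resulting two-sided p-value is below 0.05.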
Energies 2020, 13, 1879

From the available input variables and preconditions that exhibited a close correlation with the PV power output, the model with the fewest input variables was constructed to generalize the practicability of the model. The computations between the input variables and preconditions were as follows. As an example, we take PV power output, temperature, average PV power, and the standard deviation of PV power as the chosen input variables (labeled 1, 2, 3, and 4, respectively) and use them as features of CNN predictors. Subsequently, 1 h lag, same hour as last day, same hour as last 2 days, same hour as last 3 days, and same hour as last 4 days are selected as the preconditions and labeled as a, d, e, f, and g, respectively. The CNN forecasting model requires a set of predictors and expected outputs.
The CNN predictor arrangement is described in Figure 2. For the CNN's input, the set of chosen preconditions is applied to each feature, which yields a 3-D matrix in which the number of observations, the number of chosen preconditions, and the number of features represent the row, column, and width sizes, respectively. The CNN's expected output constitutes a 2-D matrix for which the number of observations and the number of responses (expected PV power outputs) represent its row and column sizes, respectively. Each predictor and expected-output pair is then normalized, categorized, and sequenced. The data are normalized with max-min normalization. Subsequently, the expected output is categorized according to the historical weather type. To some extent, data categorization can be used to recognize crucial times during the observations, such as the peak time or huge decreases in the load demand. Last, the predictor and expected-output pairs are rearranged based on their sequence per the observation times. The predictor matrix is transformed into a 4-D matrix such that the sizes of the predictors are given by the number of observations, the number of features, the number of responses, and the number of observations for the row, column, layer, and sequence, respectively. The arrangement of the expected output is also the same, but the sizes of the rows and columns are all set to 1 per the size of the response.
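The 3-D predictor arrangement described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that the daily records are concatenated into one hourly series with 12 observations per day, that daily features are repeated for every hour of their day, and that the first feature is the PV power output. The names `LAGS`, `min_max`, and `build_predictors` are hypothetical.

```python
from typing import Dict, List, Tuple

HOURS_PER_DAY = 12  # observation window of 12 h per day, as stated in the text

# Preconditions a, d, e, f, g expressed as steps back in the flattened
# hourly series (assumption: days are concatenated, 12 observations each).
LAGS: Dict[str, int] = {
    "a": 1,                  # 1 h lag
    "d": 1 * HOURS_PER_DAY,  # same hour as last day
    "e": 2 * HOURS_PER_DAY,  # same hour as last 2 days
    "f": 3 * HOURS_PER_DAY,  # same hour as last 3 days
    "g": 4 * HOURS_PER_DAY,  # same hour as last 4 days
}


def min_max(values: List[float]) -> List[float]:
    """Max-min normalization to [0, 1]; a constant series maps to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def build_predictors(
    features: List[List[float]],
) -> Tuple[List[List[List[float]]], List[List[float]]]:
    """Return (X, y) with X[observation][precondition][feature].

    y[observation] holds the single response, taken from features[0],
    which is assumed to be the PV power output series.
    """
    scaled = [min_max(series) for series in features]
    start = max(LAGS.values())  # first index at which every lag exists
    n = len(scaled[0])
    X = [
        [[series[t - lag] for series in scaled] for lag in LAGS.values()]
        for t in range(start, n)
    ]
    y = [[scaled[0][t]] for t in range(start, n)]
    return X, y
```

With four features and the five preconditions above, each observation in `X` is a 5 × 4 slice, which matches the row, column, and width description of the 3-D matrix.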

Proposed Forecasting Strategy
The proposed forecasting strategy consists of a CNN classification and an SSA-based CNN regression. The CNN classification is used to determine the weather type for the following day. The SSA is used to tune the CNN regression parameters. The CNN regression is used as the short-term forecasting model for PV power and the predictors, using the optimal parameter obtained from SSA. In the training stage, as depicted in Figure 3, the historical PV power and weather information is processed using the steps described in Section 2, with a set of CNN predictors closely correlated to the training PV power output expected. A set of CNN predictors and training outputs are subjected to CNN classification to label each of the training data sets with suitable weather types. In addition to the label, the structure of this CNN classification is recorded for use in the testing stage. The labeled training data set is then classified into the five weather types of rain, heavy cloudy, cloudy, light cloudy, and sunny. To obtain a close correlation and less variation in the data set, the grouped dataset is classified into observation hours. Subsequently, the SSA initializes the parameters of the CNN regression and the forecasting model is trained using SSA initial parameters. The forecasting result is anti-normalized and evaluated using the mean absolute percentage error (MAPE) and the mean relative error (MRE). The SSA identifies the best CNN parameters using the MRE as the objective function. If the MRE mismatch between the iterations is within the tolerance threshold, the SSA iteration is terminated and the best parameter of the CNN for the respective weather type and hour is recorded. For the rest of the weather-type model, the same process is applied for each hour of observation.

For the testing stage, as depicted in Figure 4, the recent PV power and weather information are used. The resulting data set is processed and subjected to the CNN classification to establish the weather type that is applicable for the day in question. The input variables are then fed into the respective CNN regression model based on the weather type. After the anti-normalization process, the forecasting model result is evaluated using the MAPE and the MRE. The remainder of this section is divided into three parts: CNN classification, SSA-CNN regression, and benchmark algorithms.
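The evaluation step used in both the training and testing stages can be illustrated with a small sketch. The metric definitions are not given in this part of the text, so the forms below are assumptions: the MAPE divides each absolute error by the actual value, and the MRE divides it by the rated plant capacity (500 kWp for the test system). The function names are hypothetical.

```python
from typing import List, Sequence


def denormalize(scaled: Sequence[float], lo: float, hi: float) -> List[float]:
    """Invert max-min normalization (the anti-normalization step)."""
    return [v * (hi - lo) + lo for v in scaled]


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error in percent.

    Hours with zero actual output are skipped to avoid division by zero
    (an assumption; the text does not say how such hours are handled).
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def mre(actual: Sequence[float], forecast: Sequence[float], capacity: float) -> float:
    """Mean relative error in percent, relative to rated capacity (assumed form)."""
    total = sum(abs(a - f) for a, f in zip(actual, forecast))
    return 100.0 * total / (capacity * len(actual))
```

Under these assumed definitions, the MRE stays well defined in hours with little or no generation, which is one plausible reason for using it rather than the MAPE as the SSA objective.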

CNN Classification For Suitable Weather-Type Identification
The CNN classification is constructed for investigating weather conditions before the forecasting approach is used. The CNN is constructed with multiple layers, such as the convolutional layer, max-pooling layer, rectified linear unit (ReLU) layer, batch normalization layer, softmax layer, and fully connected layer, as depicted in Figure 5. The batch normalization layer is inserted between the convolutional layer and the ReLU layer to accelerate the training of the CNN and to reduce the sensitivity to the network initialization. The ReLU layer applies a threshold operation to each element of the input, where any value less than zero is set to zero. The max-pooling layer spatially rearranges the output of the convolutional layer and extracts the feature map. In the pooling process, the number of parameters is reduced. The softmax layer produces a discrete probability distribution over the classes of a multiclass classification. According to the following CNN classification structure, the neurons in the convolutional layers must connect to the subregions of the layers (such as the max-pooling layer, batch normalization layer, and ReLU layer) instead of being fully connected as in other types of neural networks [46]. The neurons in the convolutional layer are spatially arranged in accordance with the model-correlated outcomes from the subregions [47] to reflect weight sharing among neurons, unlike in other types of neural networks, which include no weight sharing among the connections but instead produce the outcomes directly. In this research, the fully connected layer has five neurons corresponding to the five weather types.
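The element-wise behavior of three of the layers described above can be shown with a small, framework-free Python sketch. It is only an illustration of the ReLU threshold, the size reduction of max pooling, and the softmax output over five weather types; it is not the trained network of Figure 5, and the function names and the 1-D pooling layout are assumptions.

```python
import math
from typing import List, Sequence, Tuple

WEATHER_TYPES = ["rainy", "heavy cloudy", "cloudy", "light cloudy", "sunny"]


def relu(values: Sequence[float]) -> List[float]:
    """Threshold operation: any value below zero is set to zero."""
    return [max(0.0, v) for v in values]


def max_pool_1d(values: Sequence[float], size: int = 2, stride: int = 2) -> List[float]:
    """Keep the maximum of each window, which shrinks the feature map."""
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, stride)
    ]


def softmax(scores: Sequence[float]) -> List[float]:
    """Discrete probability distribution over the classes."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores: Sequence[float]) -> Tuple[str, List[float]]:
    """Map five fully connected outputs to the most probable weather type."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return WEATHER_TYPES[best], probs
```

For example, `classify([0.1, 0.2, 0.3, 0.4, 2.0])` selects the last class, "sunny", because the fifth score dominates the softmax output.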


SSA-CNN Regression For Short-Term PV Power Forecasting
The CNN regression [30] is used as the forecasting model for the PV power output. The structure of the CNN regression, shown in Figure 6, consists of the convolutional layer, batch normalization layer, ReLU layer, max-pooling layer, dropout layer, and fully connected layer, which is called the regression layer. The dropout layer is inserted between the ReLU layer and the fully connected layer to avoid overfitting, which may produce inaccurate forecasting results. The fully connected layer has only one neuron because the only forecasted response is the PV power output. In the CNN regression, the size of each layer must be initialized to obtain high-performance forecasting results. Due to the small dimension of the input layer, the convolutional layer and the max-pooling layer are each set to a kernel size of two, corresponding to the two pixels in the input layout. In this research, the SSA is integrated to obtain the best CNN regression initialization, particularly the size of the dropout layer, the initial learning rate, and the mini-batch size.
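One way to connect a continuous optimizer to the three tuned quantities is to decode each candidate position into a set of CNN training options. The sketch below assumes a three-dimensional position vector; the bounds in `BOUNDS`, the class `CnnHyperparameters`, and the function `decode` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class CnnHyperparameters:
    dropout_probability: float
    initial_learning_rate: float
    mini_batch_size: int


# Hypothetical search bounds (lower, upper) for each dimension, in the order
# dropout probability, initial learning rate, mini-batch size.
BOUNDS: List[Tuple[float, float]] = [(0.1, 0.9), (1e-4, 1e-1), (8.0, 128.0)]


def decode(position: Sequence[float]) -> CnnHyperparameters:
    """Clamp a 3-D position into the bounds and round the batch size."""
    clamped = [min(max(p, lo), hi) for p, (lo, hi) in zip(position, BOUNDS)]
    return CnnHyperparameters(
        dropout_probability=clamped[0],
        initial_learning_rate=clamped[1],
        mini_batch_size=int(round(clamped[2])),
    )
```

Each decoded candidate would then be used to train one CNN regression, and the resulting error would be returned to the optimizer as that candidate's cost.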
The SSA [33] is developed using the behavior of salps in a chain formation regarding their hunting tactics. For optimization, these chains exert a significant effect on the SSA's balancing of the exploration and exploitation inclinations by assisting it in escaping from local optima and preventing the problem of stagnation. In SSA, the population is formed using the chains of salps, of which two types exist: leaders and followers. When the agent is the front-runner, it is classified as the leader, whereas other salps are classified as followers. The role of the leader salp is to guide and direct the population's next steps, and the follower salps pay attention to other peers.
As shown in Figure 7, the salps' positions during the exploration and exploitation phases are defined in an n-dimensional space, where n is the total number of variables. For a set of salps X consisting of N salps with d dimensions, an SSA population is recorded in an (N × d)-dimensional matrix. The salp population is then divided into two groups for leaders and followers. The leader is the salp at the front of the chain, and the rest of the salps are considered followers. The salp chains are computed by updating the positions of the leader and follower salps. The updating process of the leader salp position is related to the food source. In each dimension of the search space, the leader position is updated according to the food source position, which lies within the upper and lower bounds of that dimension. The leader position is also updated based on three coefficients, as in Equation (1).
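The symbols of the leader update did not survive in this copy of the text. For orientation, the form in which the SSA leader update is commonly stated [33] is given below. The notation ($x_j^1$, $F_j$, $ub_j$, $lb_j$, $c_1$, $c_2$, $c_3$, $l$, $L$) is that of the common presentation and is an assumption here, as is the 0.5 threshold on $c_3$, which is the value typically used in implementations.

```latex
x_j^{1} =
\begin{cases}
F_j + c_1\bigl((ub_j - lb_j)\,c_2 + lb_j\bigr), & c_3 \ge 0.5,\\[4pt]
F_j - c_1\bigl((ub_j - lb_j)\,c_2 + lb_j\bigr), & c_3 < 0.5,
\end{cases}
\qquad
c_1 = 2\,e^{-\left(4l/L\right)^{2}}
```

Here $x_j^1$ is the leader position in the $j$th dimension, $F_j$ is the food source position, $ub_j$ and $lb_j$ are the upper and lower bounds, $c_2$ and $c_3$ are random numbers drawn uniformly from $[0, 1]$, $l$ is the current iteration, and $L$ is the maximum number of iterations. The coefficient $c_1$ decays over the iterations, which shifts the search from exploration toward exploitation.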
As shown in Figure 7, the salps' positions during the exploration and exploitation phases are defined as an n-dimensional space, where n is the total number of variables. For a set of salps X consisting of N salps with d dimensions, an SSA population is recorded in an (N × d)-dimensional matrix. The salp population is then divided into two groups for leaders and followers. The leader is the salp at the front of the chain, and the rest of the salps are considered followers. The CNN regression [30] is used as the forecasting model for the PV power output. The structure of the CNN regression, shown in Figure 6, consists of the convolutional layer, batch normalization layer, ReLU layer, max-pooling layer, drop-out layer, and fully connected layer, which is called the regression layer. The dropout layer is inserted between the ReLU layer and the fully connected layer to avoid overfitting, which may produce inaccurate forecasting results. The fully connected layer only has one neuron because the forecasted response is only the PV power output. In the CNN regression, the size of each layer must be initialized to obtain high-performance forecasting results. Due to the small dimension of the input layer, the convolutional layer and the max-pooling layer are set to having kernel size of two, corresponding to the two pixels in the input layout. In this research, SSA is integrated to obtain the best CNN regression initialization, particularly the size of the drop-out layer, the initial learning rate, and the mini-batch size. The SSA [33] is developed using the behavior of salps in a chain formation regarding their hunting tactics. For optimization, these chains exert a significant effect on the SSA's balancing of the exploration and exploitation inclinations by assisting it in escaping from local optima and preventing the problem of stagnation. In SSA, the population is formed using the chains of salps, of which two types exist: leaders and followers. 
When the agent is the front-runner, it is classified as the leader, whereas other salps are classified as followers. The role of the leader salp is to guide and direct the population's next steps, and the follower salps pay attention to other peers.
As shown in Figure 7, the salps' positions during the exploration and exploitation phases are defined as an n-dimensional space, where n is the total number of variables. For a set of salps X consisting of N salps with d dimensions, an SSA population is recorded in an (N × d)-dimensional matrix. The salp population is then divided into two groups for leaders and followers. The leader is the salp at the front of the chain, and the rest of the salps are considered followers. The salp chains are computed by updating the positions of the leader and follower salps. The updating process of the leader salp position is related to the food source. In the th dimension of the search space, the leader position is updated according to the food source position , which lies within the upper and lower bounds of and , respectively. The leader position is also updated based on the coefficients , , and , as in Equation (1): The salp chains are computed by updating the positions of the leader and follower salps. The updating process of the leader salp position is related to the food source. In the jth dimension of the search space, the leader position x 1 j is updated according to the food source position F j , which lies within the upper and lower bounds of ub j and lb j , respectively. The leader position is also updated based on the coefficients c 1 , c 2 , and c 3 , as in Equation (1): Energies 2020, 13, 1879 9 of 20 where the coefficients c 2 and c 3 are uniform random numbers between [0,1]. For c 1 in particular, this coefficient represents the balance portion of exploration and exploitation, which is defined as follows: where l is the current iteration and L is the maximum number of iterations. The incorporation of c 1 , c 2 , and c 3 demonstrates that the next position in the jth dimension should move toward positive infinity or negative infinity, as well as the step size. 
The updating process of the follower salps' positions can then be modeled as follows: where i ≥ 2 is the number of follower salps, t is the iteration, and v 0 is the initial speed.
t , is a ratio of the final speed to the initial speed. Accordingly, the follower salps' positions are obtained as follows: where x i j is the position of the ith follower salp in the jth dimension. In this study, SSA was used to identify optimal parameters for the proposed method and benchmark algorithms.
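The leader and follower updates described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the study: the function name `ssa_minimize`, the 0.5 switching threshold on c3, the clipping of the leader to the bounds, and the greedy update of the food source are all assumptions made for the sketch.

```python
import numpy as np


def ssa_minimize(objective, lb, ub, n_salps=15, max_iter=100, seed=0):
    """Minimal salp swarm sketch (hypothetical helper, not the authors' code).

    Returns the best position found, its objective value, and the
    best-so-far value after each iteration.
    """
    rng = np.random.default_rng(seed)
    lb = np.asarray(lb, dtype=float)
    ub = np.asarray(ub, dtype=float)
    dim = lb.size

    # (N x d) population matrix, initialised uniformly inside the bounds.
    X = rng.uniform(lb, ub, size=(n_salps, dim))
    fitness = np.array([objective(x) for x in X])
    k = int(np.argmin(fitness))
    food, food_val = X[k].copy(), float(fitness[k])

    history = []
    for l in range(1, max_iter + 1):
        # c1 decays with the iteration count: exploration -> exploitation.
        c1 = 2.0 * np.exp(-((4.0 * l / max_iter) ** 2))

        # Leader update around the food source.
        # Assumption: a threshold of 0.5 on c3 selects the direction.
        c2 = rng.random(dim)
        c3 = rng.random(dim)
        step = c1 * ((ub - lb) * c2 + lb)
        leader = np.where(c3 >= 0.5, food + step, food - step)
        X[0] = np.clip(leader, lb, ub)

        # Follower update: each salp moves halfway toward its predecessor.
        for i in range(1, n_salps):
            X[i] = 0.5 * (X[i] + X[i - 1])

        # Keep the best position seen so far as the food source.
        fitness = np.array([objective(x) for x in X])
        k = int(np.argmin(fitness))
        if fitness[k] < food_val:
            food, food_val = X[k].copy(), float(fitness[k])
        history.append(food_val)

    return food, food_val, history


if __name__ == "__main__":
    best_x, best_val, _ = ssa_minimize(
        lambda x: float(np.sum(x ** 2)), lb=[-5, -5, -5], ub=[5, 5, 5]
    )
    print(best_x, best_val)
```

In a hyperparameter search, `objective` would train a model with the parameters encoded in the position vector and return its validation error; the sphere function in the usage example is only a stand-in.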

Benchmark Algorithms and Evaluation Index
To validate the performance of the proposed method, an SVM and an LSTM were used as benchmarks. These methods have been applied as forecasting models and have shown accurate results. The algorithms are summarized as follows.

Long Short-Term Memory-SSA (LSTM-SSA)
An LSTM [46] is a type of recurrent neural network that can collect information and determine whether to accumulate new information or to forget the information once a gate is triggered. The interaction between these gates enables an LSTM to model long-term dependencies and prevents gradient vanishing in time-series prediction. The LSTM structure is depicted in Figure 8. The sequence input is essentially the sequence arrangement of LSTM predictors that was shown in Section 2. For an LSTM, the predictors are arranged as a two-dimensional time series, whereas for a CNN they must be arranged as three-dimensional matrices. When the SSA is integrated with the LSTM, seven parameters are searched by the SSA: (1) the number of hidden units, (2) max epoch, (3) gradient threshold, (4) initial learning rate, (5) learning rate decrease period, (6) learning rate decrease factor, and (7) mini-batch size.

Support Vector Machine-SSA (SVM-SSA)
An SVM [47,48] is a nonparametric technique that uses sequential minimal optimization to solve a decomposed equation for the input variables. In each iteration, a working set of two points is used to find a function f(x) that deviates from $y_t$ by a value not greater than the error in each previous training point for x. The result of the iteration process can be regarded as the mapping of the training data x into a high-dimensional feature space to represent the nonlinear relationships between the input variables and the targeted output. In the SVM-SSA, the weights α are solved using the SSA.

Evaluation Index
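The mean absolute percentage error (MAPE) and the mean relative error (MRE) are used to evaluate the forecasting results $\hat{Y}_t$ relative to the actual PV power output $Y_t$ at the observed hour $t$. The MAPE and MRE are calculated as follows:

$$\mathrm{MAPE} = \frac{1}{N_h}\sum_{t=1}^{N_h}\frac{\left|Y_t - \hat{Y}_t\right|}{Y_t} \times 100\% \tag{5}$$

$$\mathrm{MRE} = \frac{1}{N_h}\sum_{t=1}^{N_h}\frac{\left|Y_t - \hat{Y}_t\right|}{PV_{\mathrm{capacity}}} \times 100\% \tag{6}$$

where $N_h$ is the number of observation hours, and $PV_{\mathrm{capacity}}$ is the PV power plant capacity.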
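As a small worked example, the two indices can be computed as below, assuming the usual definitions: the MAPE normalizes each hourly absolute error by the actual output, and the MRE normalizes it by the plant capacity. The function names and the sample values are hypothetical and serve only to illustrate the arithmetic.

```python
import numpy as np


def mape(actual, forecast):
    """Mean absolute percentage error in percent.

    Assumes every actual value is nonzero (e.g. daylight hours only).
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * float(np.mean(np.abs(actual - forecast) / actual))


def mre(actual, forecast, capacity):
    """Mean relative error in percent, normalised by the plant capacity."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * float(np.mean(np.abs(actual - forecast))) / capacity


if __name__ == "__main__":
    y_true = [100.0, 200.0, 400.0]   # illustrative hourly output in kW
    y_pred = [110.0, 190.0, 400.0]   # illustrative forecasts in kW
    print(mape(y_true, y_pred))              # percent of actual output
    print(mre(y_true, y_pred, capacity=500.0))  # percent of plant capacity
```

Because the MAPE divides by the actual output, low-generation hours (typical of rainy days) inflate it far more than they inflate the MRE, which is one reason the two indices can rank the weather models differently.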

Simulation Results
The proposed method was simulated in a MATLAB 2018 environment running on an Intel Core i7-3770 CPU operating at 3.40 GHz with 8 GB RAM. The SSA used 15 agents with a maximum of 100 iterations. For each tuning model, the best performance and error distribution were observed in ten trials. The SVM was set using the sequential minimal optimization solver using five kernels.

Test System
The test system had a capacity of 500 kWp and was located in the south of Taiwan, at a latitude of 22.71° N, a longitude of 120.54° E, and an altitude of 43 m. The historical hourly data were collected from January to December of 2017. The test system consisted of historical data for the PV power output, average temperature, relative humidity, clear-sky radiation, wind speed, and day stamp. These historical data were grouped by the CNN classification into five weather types: rain, heavy cloudy, cloudy, light cloudy, and sunny. The forecasting model required five preconditions, which were the same hour of the previous 1, 2, 3, and 4 days; the output lagged by 2 h; and the hour stamp. For the cloudy weather model, seven preconditions were used, which were the same hour of the previous 1, 2, 3, and 4 days; the output lagged by 2 and 3 h; and the hour stamp. The arrangement of the CNN predictors followed the explanation given in Section 2. The datasets for each weather type included 14, 23, 36, 60, and 42 days for rain, heavy cloudy, cloudy, light cloudy, and sunny, respectively. The training and testing datasets were set to 70% and 30%, respectively, of the total dataset length. To approximate a real-time situation, in which longer histories of PV power and weather information may be unavailable, only the four previous days' historical data were incorporated as the CNN predictors for forecasting the next day.
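The lagged predictors and the 70/30 split described above can be sketched as follows. The sketch makes several assumptions that the text does not confirm: it uses only the PV power series (the weather variables are omitted), it takes a fixed number of observation hours per day, and it splits the samples chronologically. The helper names `build_predictors` and `chronological_split` are hypothetical.

```python
import numpy as np


def build_predictors(power, hours_per_day=12, day_lags=(1, 2, 3, 4), hour_lags=(2,)):
    """Build lagged predictors from an hourly PV power series.

    Each row holds the power at the same hour of the previous days in
    `day_lags`, the power lagged by the hours in `hour_lags`, and the
    hour stamp (index of the hour within the day). The target is the
    power at the current hour.
    """
    power = np.asarray(power, dtype=float)
    start = max(day_lags) * hours_per_day  # first hour with a full history
    rows, targets = [], []
    for t in range(start, power.size):
        same_hour = [power[t - d * hours_per_day] for d in day_lags]
        lagged = [power[t - h] for h in hour_lags]
        hour_stamp = t % hours_per_day
        rows.append(same_hour + lagged + [hour_stamp])
        targets.append(power[t])
    return np.array(rows, dtype=float), np.array(targets, dtype=float)


def chronological_split(X, y, train_fraction=0.7):
    """Split samples in time order (assumed; the split rule is not stated)."""
    n_train = int(round(train_fraction * len(X)))
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])


if __name__ == "__main__":
    series = np.arange(72.0)  # six synthetic days of 12 observation hours
    X, y = build_predictors(series)
    (X_train, y_train), (X_test, y_test) = chronological_split(X, y)
    print(X.shape, X_train.shape, X_test.shape)
```

Passing `hour_lags=(2, 3)` gives the seven-predictor arrangement described for the cloudy model.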

Short-Term PV Power Forecasting
The optimal CNN, LSTM, and SVM parameters found by the SSA for each weather type are shown in Tables 2 and 3 and Figure 9, respectively. For the CNN-SSA method, five parameters were tuned, as shown in Table 2. The kernel sizes of the convolutional layer and the max-pooling layer ranged between 2 and 3 because the input layer was only 5 × 5. The dropout layer value ranged between 0.315 and 0.5, the initial learning rate between 0.012 and 0.01875, and the mini-batch size between 2 and 4.

For the LSTM-SSA method, seven parameters were tuned: hidden units, max epoch, gradient threshold, initial learning rate, learning rate decrease period, learning rate decrease factor, and mini-batch size, as shown in Table 3. The number of hidden units was 7 for the rainy, cloudy, and sunny models, and 8 and 3 for the heavy cloudy and light cloudy models, respectively. The max epoch value varied between 178 and 200 epochs. The gradient threshold ranged from 479 to 546, the initial learning rate from 0.010172 to 0.028262, the learning rate decrease period from 55 to 96, and the learning rate decrease factor from 0.505356 to 0.819285.
A set of parameters α represents the SVM-SSA weights. The SVM kernel was set to 5. The sets of α contained 50, 54, 58, 55, and 58 values for rain, heavy cloudy, cloudy, light cloudy, and sunny, respectively. The weights were bounded within [−1, 1]. The tuned SVM-SSA parameters are shown in Figure 9.
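Because the SSA searches a continuous space while some of the tuned parameters are integers, each salp position has to be decoded into a valid parameter set before a model is trained. The sketch below shows one way to do this for the three CNN regression parameters named earlier (dropout, initial learning rate, and mini-batch size). The search bounds are not stated in the text, so the reported ranges of the tuned values are reused here purely as illustrative bounds; `CnnHyperparameters`, `BOUNDS`, and `decode_position` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CnnHyperparameters:
    dropout: float
    initial_learning_rate: float
    mini_batch_size: int


# Illustrative bounds only: the ranges of the tuned values reported in the
# text, not the actual SSA search bounds (which are not given).
BOUNDS = {
    "dropout": (0.315, 0.5),
    "initial_learning_rate": (0.012, 0.01875),
    "mini_batch_size": (2, 4),
}


def _clip(value, low, high):
    """Limit a value to the closed interval [low, high]."""
    return max(low, min(high, value))


def decode_position(position):
    """Map a 3-element salp position to a valid CNN parameter set."""
    dropout, learning_rate, batch = position
    return CnnHyperparameters(
        dropout=_clip(float(dropout), *BOUNDS["dropout"]),
        initial_learning_rate=_clip(
            float(learning_rate), *BOUNDS["initial_learning_rate"]
        ),
        # The mini-batch size must be an integer, so round before clipping.
        mini_batch_size=int(_clip(round(float(batch)), *BOUNDS["mini_batch_size"])),
    )


if __name__ == "__main__":
    print(decode_position([0.4, 0.015, 2.6]))
    print(decode_position([0.9, 0.0, 7.2]))  # out-of-range values are clipped
```

Rounding inside the decoder keeps the optimizer itself unchanged: the SSA still updates real-valued positions, and only the objective function sees the integer mini-batch size.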
The results of the proposed CNN-SSA PV power forecasting in the training stage are listed in Table 4, including one-day-ahead and three-days-ahead forecasts for the five weather types. To obtain a broader view of the forecasting performance, the observation time was extended to three days ahead, which exhibited variations due to the nonstationary nature of PV power generation. In the day-ahead forecasting results, the sunny model achieved the lowest MRE value of 1.43% and MAPE value of 5.34%. The light cloudy model achieved the highest MRE value of 3.8%; the rain model achieved the highest MAPE value of 42.55%. To obtain the optimal forecasting model, the computation times ranged between 14.57 and 16.76 min, which are plausible values for the training stage. In the three-days-ahead observation, the lowest MRE value of 2.33% and MAPE value of 15.30% were achieved for the sunny and light cloudy models, respectively. The heavy cloudy model achieved the worst MRE value of 4.41% and MAPE value of 59.61%. The computation cost for the three-days-ahead model varied from 13.62 to 15.93 min.

The day-ahead and three-days-ahead forecasting results are depicted in Figures 10 and 11, respectively. In Figure 10, the y-axis represents the PV power in kW and the x-axis represents the hours of observation within a day. Each day, the PV power was observed between 6 a.m. and 5 p.m. because this was the time during which the PV power plant generated power. When the forecasting horizon was extended to three days, the observation period was extended from 12 to 36 h. Accordingly, in Figure 11 the x-axis represents the hours of observation within three days (3 days × 12 observation hours between 6 a.m. and 5 p.m.) and thus comprises 36 points.
Using the proposed CNN-SSA method, the actual PV power pattern is represented, particularly for the sunny and cloudy models in the day-ahead observation in Figure 10, and for the cloudy, heavy cloudy, and sunny models in the three-days-ahead observation in Figure 11. In the rainy and heavy cloudy models, the gap between the actual PV power and the predictions was perceptible, particularly after the first peak of the PV power generation was reached, due to nonstationary variation at this time; this was more visible in the three-days-ahead observation.

[Table 5]

Compared with the general methods, the integration of an SSA significantly improved the forecasting accuracy of the traditional CNN, LSTM, and SVM. The only exception was the rainy model, for which the traditional SVM showed better accuracy than the SVM-SSA. For the rest of the weather models, the forecasting accuracy of the SVM-SSA surpassed that of the traditional SVM, with MAPE differences of 3.65%, 8.47%, 2.38%, and 2.94% for the heavy cloudy, cloudy, light cloudy, and sunny models, respectively. For the LSTM, the MAPE differences between the traditional and SSA-integrated versions were 3.75%, 4.53%, 5.38%, 8.32%, and 6.91% for the rainy, heavy cloudy, cloudy, light cloudy, and sunny models, respectively. For the CNN, the SSA integration improved the forecasting accuracy by 8.55%, 38.60%, 8.37%, 0.04%, and 7.41% for the rainy, heavy cloudy, cloudy, light cloudy, and sunny models, respectively.
In terms of the MRE, the forecasting accuracy of the CNN-SSA improved on that of the traditional CNN by 0.32%, 0.88%, 0.19%, 0.59%, and 0.08% for each weather model. For the LSTM-SSA relative to the traditional LSTM, the MRE also decreased by 2.29%, 0.12%, 0.66%, 0.74%, and 1.85% for the rainy, heavy cloudy, cloudy, light cloudy, and sunny models, respectively. For the SVM-SSA relative to the traditional SVM, the improvements shown in the heavy cloudy, cloudy, light cloudy, and sunny models were 3.65%, 8.47%, 2.38%, and 2.94%, respectively; in contrast, the rainy model showed an increase in MRE of 0.10%.
The plots for the three-days-ahead forecasting results are shown in Figure 12. In the rainy and heavy cloudy models, the CNN-SSA method provided an accurate representation of the actual PV power, especially at peak times. In the cloudy and light cloudy models, the actual PV power exhibited one peak. The CNN-SSA and LSTM-SSA methods represented this pattern, whereas the SVM-SSA tended to predict two peaks. The CNN-SSA performance was better than the LSTM-SSA performance from the beginning of the forecast period until the last of the forecasting hours. In the sunny model, the CNN-SSA and SVM-SSA methods represented the actual PV power pattern better than the LSTM-SSA did. The LSTM-SSA method performed poorly at the beginning and end of the observation time, during which the CNN-SSA method outperformed the benchmark algorithms.

The proposed method also performed well in comparison with the traditional benchmark algorithms, as seen in Figure 13. The traditional LSTM could not follow the actual PV power pattern in the beginning and peak observation hours. This happened because of the general input variables applied to the LSTM for each hour, which made the LSTM more sensitive to sudden changes in the PV power output. Unlike the LSTM, the CNN and SVM had a more generalized ability to handle sudden changes in each observation hour.

Figure 12. Forecasting results comparison with the SSA integration.

Figure 13. Forecasting results comparison without the SSA integration.

Discussions
The proposed CNN-SSA method was designed to provide an accurate short-term PV power forecasting model that overcomes the drawbacks of requiring large sets of historical data [16] and intensive spatial [29] and geographical information [32]. To examine its performance, the proposed CNN-SSA method was observed for a day ahead and extended to three days ahead to establish whether nonstationary variation affected its accuracy. For both day-ahead and three-days-ahead forecasting, the proposed CNN-SSA method outperformed the benchmark algorithms, achieving MAPE values of 1.43% and 5.05% in the training and testing stages, respectively, for the sunny model. The performance of the rainy and heavy cloudy models was expected to be inferior to that of the other models because of their greater PV power variation. Nevertheless, the proposed method maintained an MAPE value of 21.17% and an MRE value of 2.62%, which were superior to the benchmark algorithms' corresponding values. Moreover, the proposed method flexibly identified the optimal forecasting model because the SSA was integrated into the CNN structure, which resolved the inefficiency of tuning the CNN parameters by trial and error [23,24]. The historical data arrangement in the proposed method enabled the time series dataset to be rearranged as CNN predictors, which accommodated multiple input variables associated with the expected forecasting outcomes without the need for bounding within specific image formats [22]. Despite its accuracy, the proposed CNN-SSA method required more time to produce an accurate model, ranging between 12 and 28 min, whereas the benchmark algorithms required a quarter of the proposed method's computation time.
As seen in Figure 12, the CNN-SSA forecasts for the cloudy, light cloudy, and sunny models fitted the actual PV power pattern closely, although the forecasting model struggled to follow the peak time in the sunny model. For the rainy and heavy cloudy models, the proposed method could not maintain a gap as small as in the sunny model. This implies that the uncertainty of the PV power output in the rainy and heavy cloudy models, and at the peak time of the sunny model, needs to be addressed further. This condition also arose because these forecasting models were built from less training data, which limited the ability of the CNN to learn the uncertainty of the PV power. Nevertheless, the proposed CNN-SSA provided an accurate PV power forecasting algorithm that could accommodate multiple inputs and responses at once, in comparison to the benchmark algorithms.
To compare the proposed method with other CNN applications in PV power forecasting, consider the forecasting strategy in Jeong and Kim [49], which accommodated the temporal PV power generation at multiple-site PV power plants. The CNN in Jeong and Kim [49] took a space-time matrix as the input, which consisted of the historical PV power generation of the preferred observation time collected from multiple PV sites. The CNN was applied to two-hours-ahead and six-hours-ahead predictions. By contrast, the proposed CNN-SSA method adopts multiple input variables, including the related historical (for training) and forecast (for prediction) weather data, for longer forecasting horizons of one day ahead and up to three days ahead.
In addition, when a longer forecasting horizon is needed, a larger CNN structure should be employed, with many more parameters to be tuned. Since there is no general rule for determining the best parameters of a CNN, the smallest number of CNN parameters was chosen for the input size in Jeong and Kim [49], which was much smaller than the number used in AlexNet [50] and VGGNet [51]. In the proposed method, the SSA was used to overcome this issue instead of intuitively choosing suitable parameters. Therefore, more suitable parameters for the proposed CNN predictors can be obtained even for a larger structure.
In Jeong and Kim [49], a greedy adjoining algorithm (GAA) was used to indirectly capture cloud cover and movements from the geographic location and historical dataset of multi-site PV generation, an approach that is suitable for a very short forecasting horizon. The proposed method uses the weather classification approach to improve the forecasting accuracy under the PV power generation uncertainty due to daily weather variations. The proposed method is thus more appropriate for a longer forecasting horizon of one to three days ahead.

Conclusions
This paper proposes a short-term PV power forecasting algorithm based on a CNN-SSA. CNN regression is used to construct the prediction model, and the SSA is used to identify the optimal CNN parameters. CNN classification is used in the CNN-SSA to obtain the correct weather type. The results show that the proposed method provided better accuracy than the benchmark algorithms did. The proposed algorithm provides a simple approach to building the forecasting model, yet maintains consistent accuracy within the day-ahead to three-days-ahead forecasting windows. Although only five CNN regression models were used to establish the forecasting models, the proposed method can be extended to other models for more accurate predictions. Addressing the uncertainty, especially for rainy weather, heavy cloudy weather, and the peak time, is left as future work. Furthermore, forecasting on typhoon days represents a potential challenge for future research.