High Accuracy Modeling for Solar PV Power Generation Using Noble BD-LSTM-Based Neural Networks with EMA

More accurate self-forecasting not only provides a better-integrated solution for electricity grids but also reduces the operating cost of the entire power system. To predict solar photovoltaic (PV) power generation (SPVG) for a specific hour, this paper proposes the combination of a two-step neural network bidirectional long short-term memory (BD-LSTM) model with an artificial neural network (ANN) model using exponential moving average (EMA) preprocessing. In this study, four types of historical input data are used: hourly PV generation for the previous week (168 h), hourly horizontal radiation, hourly ambient temperature, and hourly device (surface) temperature, downloaded from the Korea Open Data Portal. The first strategy employs the LSTM prediction model, which forecasts the SPVG of the desired hour from the previous week's data, preprocessed with the EMA approach to smooth the dynamic SPVG. The LSTM model predicts the SPVG according to the trend of the previous time-series data. However, slight errors still occur because the weather conditions at the desired time are not reflected. Therefore, we propose a second strategy, an ANN estimation model, which compensates for this slight error using the four inputs predicted by the LSTM model. As a result, the LSTM prediction model with the ANN estimation model using EMA preprocessing exhibited higher accuracy than the other options for SPVG.


Introduction
A report from the International Energy Agency (IEA) revealed that solar, wind, and hydropower energy are growing at a fast rate. In Renewables 2018, the IEA's market analysis and forecast [1], the share of global energy demand satisfied by renewables is expected to grow by one-fifth over the following five years to reach 12.4% in 2023. South Korea is one of the most developed countries in Asia and the eighth largest electricity consumer in the world [2]. South Korea has been making a great effort to increase the renewable energy portion of its energy mix [3]. The country has a strong solar photovoltaic (PV) manufacturing industry and supportive policies to achieve the national renewable energy target of 20% by 2030. Moreover, South Korea

Review of Recent Works for SPVG Forecasting
Recently, various SPVG prediction models have been developed to provide a better-integrated solution for the electricity grid, as introduced in [10]. Solar PV forecasting methods can be classified into three approaches [13]: (1) physical, (2) statistical, and (3) artificial intelligence models. The physical approach is a traditional technique generally based on meteorological and geological measurements obtained through numerical weather prediction [14], satellite remote sensing [15], and ground measurement devices [16].
However, these techniques require a large number of data samples [17-21], and their fitting results are sensitive to pathological factors [22,23] and rely on many different types of data [18-21]. The statistical and machine learning approaches are data-driven methods that use historical data, such as solar PV power, irradiation, humidity, and atmospheric temperature, to build the prediction model [6]. The key factor in the statistical approach is minimizing the error of future PV power output by extracting high-quality historical samples [24]. However, these data usually contain irregular errors due to unrefined input data. Furthermore, heavy preprocessing is required, which raises complexity and cost issues [25], such as requiring advanced sensors for accurate input data [26]; moreover, the relationship between input features and PV power output may go undescribed even though it is relatively clear and simple [27,28].
Machine learning is a subset of artificial intelligence (AI) comprising techniques that enable computers to make estimations from data and deliver AI applications [29]. Machine learning is also a popular forecasting method for time-series data and is classified as supervised or unsupervised. Machine learning algorithms for solar PV forecasting have generally been supervised learning algorithms, such as k-nearest neighbors [30], multiple linear regression [31,32], support vector machines [33-35], decision trees [36,37], and random forest regression [38]. However, the machine learning approach also requires a large dataset to forecast SPVG accurately [7].
With the rapid development of deep learning algorithms and structures, the deep neural network (DNN) is an emerging area of machine learning research that has become one of the most popular fields in academia and industry [27]. Deep learning algorithms form an artificial neural network (ANN) in each layer and can learn and make intelligent decisions on their own. Existing short-term solar PV forecasting techniques based on deep learning overcome traditional machine learning problems, including slow learning speed, falling into local extrema, gradient disappearance, and lack of temporal correlation [9]. Moreover, the deep learning approach uses an ANN with more layers than classical machine learning to analyze the data and find patterns [12]. Exploiting these characteristics, several authors have applied deep learning to improve the accuracy of solar PV power forecasting, using feed-forward neural networks, convolutional neural networks (CNNs), recurrent neural networks (RNNs), long short-term memory (LSTM), gated recurrent units (GRUs), restricted Boltzmann machines, deep belief networks, and autoencoders, as shown in Table 1. For the nonlinear and periodic behavior of SPVG, traditional methods do not guarantee precise forecasting results. According to the literature summarized in Table 1, most researchers have used one of three types of forecasting model to improve prediction accuracy: hybrid/ensemble models, DNNs with optimized parameters, and DNNs with preprocessing. The authors of [13] studied a hybrid LSTM-convolutional network that performs temporal-spatial feature extraction in two steps: the LSTM model extracts the temporal feature information of the historical data, whereas the convolutional neural network extracts the spatial feature information.
Chai et al. [9] proposed an ultra-short-term PV power forecasting method that reduces the hyperparameter problem in LSTM using an adaptive hyperparameter adjustment-LSTM framework. Wang et al. [27] proposed two preprocessing methods based on GRU modeling: the Pearson coefficient extracts the main features that affect solar PV power and then examines the relationship between the input data and future PV power output; the K-means method is then used for cluster analysis, dividing the input data into groups with similar patterns.

Exponential Moving Average
An exponential moving average (EMA) is a type of moving average (MA) that emphasizes the most recent data points (i.e., it is a type of weighted MA where more weight is given to the latest data points as the weight exponentially decreases for older data points) [45]. The EMA is also called the exponential weighted moving average (EWMA) or the exponential smoothing method [7].
The EMA model can be described using the following equation [45]:

$\hat{E}_{t+1} = K E_t + (1 - K)\hat{E}_t$ (1)

where the weighting multiplier K is given by K = 2/(N + 1), with N as the period of the EMA calculation. Moreover, $E_t$ is the SPVG at the current time, $\hat{E}_t$ is the estimated EMA of the SPVG at the previous time, and $\hat{E}_{t+1}$ is the future value of the estimated SPVG.
The EWMA [46] expands the EMA formula when more weighting factors exist, through the repeated application of (1) at successive times. It is expressed using the following equation [46]:

$\hat{E}_{t+1} = K \sum_{i=0}^{n} (1-K)^{i} E_{t-i} + (1-K)^{n+1}\hat{E}_{t-n}$ (2)

for any suitable n ∈ {0, 1, 2, . . .}.
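The recursion in (1) can be sketched in a few lines of Python. The function name and the choice of seeding the first estimate with the first observation are illustrative, not taken from the paper:

```python
def ema(series, n):
    """Exponential moving average with K = 2/(N + 1).

    series: iterable of observations E_t (e.g., hourly SPVG values).
    n: period of the EMA calculation.
    The first estimate is seeded with the first observation.
    """
    k = 2.0 / (n + 1)
    est = series[0]
    out = [est]
    for e in series[1:]:
        # E_hat(t+1) = K * E(t) + (1 - K) * E_hat(t)
        est = k * e + (1 - k) * est
        out.append(est)
    return out

smoothed = ema([0.0, 3.0, 6.0, 3.0, 0.0], n=5)  # weights recent hours more
```

A larger period N gives a smaller K, so the smoothed series reacts more slowly to new observations, which is the smoothing effect used before the LSTM stage.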

Long Short-Term Modeling
Neural networks are a set of algorithms that artificially imitate biological neurons, which underlie human cognitive functions. An RNN is a type of neural network in which the output from the previous step is fed as input to the current step. The RNN can process a sequence of data such that each sample can be assumed to depend on the previous elements of the sequence, which in principle makes RNNs extremely useful for sequential data. However, as the sequence grows and the gap between the input and the reference data increases, it becomes difficult for RNNs to keep past data in memory because of the vanishing gradient problem: the gradient decays over long-term dependencies and eventually disappears.
The LSTM model or architecture extends the memory of RNNs. It was introduced by Hochreiter and Schmidhuber in 1997. This model mitigates the vanishing gradient problem, which is caused by the repeated application of the recurrent weight matrix in the RNN. In the LSTM model, the recurrent weight matrix is not only replaced by the identity function in the carousel, but its flow is also controlled by a few gates.

Structure of the LSTM
All RNNs are in the form of a chain of repeating modules of neural networks. In standard RNNs, this repeating module has a basic structure, such as a single tanh layer. However, LSTMs have a chain-like structure (called a carousel) with different repeating modules of the RNN. In the LSTM model, the repeating module has three different types of gates, namely, the forget gate, fg(t), the input gate, ip(t) × Gi(t), and the output gate, O(t), as shown in Figure 1.

Operation of LSTM
The first step in the LSTM model is to determine which piece of information to discard from the cell state (CS), considering the current input vector, x(t), and the previous short-term state vector, h(t − 1). This step is accomplished using a sigmoid layer called the "forget gate layer". If the output of the sigmoid activation is 0, the previous CS(t − 1) values are driven toward 0 and do not affect future results. However, if the output of the sigmoid activation is 1, the previous CS(t − 1) value is completely remembered and affects the next step for CS(t). The forget gate is given by the following equation [11,13,47,48]:

$fg(t) = \sigma\!\left(w_x^{fg} x(t) + w_h^{fg} h(t-1) + b_f\right)$ (3)

In (3), $w_x^{fg}$ and $w_h^{fg}$ are the weighting vectors of the forget layer connected to the input vector x(t) and the previous short-term state h(t − 1), respectively. In addition, $b_f$ is defined as the bias of the forget layer.
The next step determines the new information to be stored in the CS, which is divided into two subprocesses. First, a sigmoid layer called the input gate layer, ip(t), determines which values to update. Next, a tanh layer creates a vector of new candidate values, Gi(t), that could be added to the state. These two steps are then combined to update the cell state, as shown in (4) to (6) [11,13,47,48]:

$ip(t) = \sigma\!\left(w_x^{ip} x(t) + w_h^{ip} h(t-1) + b_i\right)$ (4)

$Gi(t) = \tanh\!\left(w_x^{Gi} x(t) + w_h^{Gi} h(t-1) + b_{Gi}\right)$ (5)

$CS(t) = fg(t) \otimes CS(t-1) + ip(t) \otimes Gi(t)$ (6)

where $w_x^{ip}$ and $w_h^{ip}$ are the weighting vectors of the input gate layer connected to the input vector x(t) and the previous short-term state h(t − 1), respectively. The terms $b_i$ and $b_{Gi}$ are defined as the biases of the input gate and candidate layers.
The final decision step for the optimal CS(t) is based on the previous two steps and can be readily obtained as follows [11,13,47,48]. First, a sigmoid function σ determines whether a CS outputs 0 or 1 for the neuron, denoted as O(t):

$O(t) = \sigma\!\left(w_x^{O} x(t) + w_h^{O} h(t-1) + b_o\right)$ (7)

where $w_x^{O}$ and $w_h^{O}$ are the weighting vectors of the output gate connected to the input vector x(t) and the previous short-term state h(t − 1), respectively. In addition, $b_o$ is defined as the bias of the output gate layer. Then, the tanh function assigns weights to the values that pass through, determining their level of importance in the range −1 to 1 of the CS, and the result is multiplied by the output of the sigmoid gate to give h(t) [11,13,47,48]:

$h(t) = O(t) \otimes \tanh(CS(t))$ (8)
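The gate computations described above can be collected into a single forward step. The following NumPy sketch is an illustrative implementation of a standard LSTM step, with hypothetical weight names and toy dimensions; it is not the paper's code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, cs_prev, p):
    """One LSTM forward step: forget, input, candidate, cell-state
    update, and output gates. p holds weight matrices and biases."""
    fg = sigmoid(p["w_x_fg"] @ x_t + p["w_h_fg"] @ h_prev + p["b_f"])   # forget gate
    ip = sigmoid(p["w_x_ip"] @ x_t + p["w_h_ip"] @ h_prev + p["b_i"])   # input gate
    gi = np.tanh(p["w_x_gi"] @ x_t + p["w_h_gi"] @ h_prev + p["b_gi"])  # candidate values
    cs = fg * cs_prev + ip * gi                                         # new cell state
    o  = sigmoid(p["w_x_o"] @ x_t + p["w_h_o"] @ h_prev + p["b_o"])     # output gate
    h  = o * np.tanh(cs)                                                # new short-term state
    return h, cs

# Toy dimensions: 4 input features (as in this paper), 8 hidden units.
rng = np.random.default_rng(0)
n_in, n_h = 4, 8
p = {k: rng.standard_normal((n_h, n_in)) * 0.1
     for k in ("w_x_fg", "w_x_ip", "w_x_gi", "w_x_o")}
p.update({k: rng.standard_normal((n_h, n_h)) * 0.1
          for k in ("w_h_fg", "w_h_ip", "w_h_gi", "w_h_o")})
p.update({k: np.zeros(n_h) for k in ("b_f", "b_i", "b_gi", "b_o")})

h, cs = lstm_step(rng.standard_normal(n_in), np.zeros(n_h), np.zeros(n_h), p)
```

Because h(t) is the product of a sigmoid output (in (0, 1)) and a tanh of the cell state (in (−1, 1)), every entry of the short-term state stays strictly inside (−1, 1).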

Proposed SolPVELA

The proposed model (SolPVELA) is a two-stage neural network model for short-term forecasting of PV output power. The first stage uses the LSTM model to predict the target SPVG from the previous week of EMA-preprocessed data. The second stage uses an ANN estimation model to refine the result, using the prediction obtained from the LSTM model together with the three other types of input data at the target time. With this approach, the first model (LSTM) produces a value that follows the previous trend, and the second model (ANN) obtains a value closer to the actual value by reflecting the input data at the desired time, as described in Figure 2.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 8 of 16 weights to the values that are passed through, determining the level of importance, ranging from -1 to 1 of the CSs, and multiplying it by the output of the sigmoid gate denoted as h(t) [11,13,49,50]:


Data Processing (Training and Testing Procedures)
To forecast one hour of solar PV output recorded at a sampling interval of 1 h, each input sample consisted of a 168 h sequence of the four input variables. In this way, 32,040 input samples were generated from the given 32,208 data points, each sample having the form of a 168 × 4 matrix. The dataset was divided into two subsets: 80% (25,632 samples) as the training set, used to train the proposed model, and 20% (6408 samples) as the testing set, used to evaluate the trained model, as shown in Figure 3.
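The windowing arithmetic above (32,208 hourly points yielding 32,040 samples of shape 168 × 4, split 25,632/6,408) can be sketched as follows; the function name and the toy data are illustrative, not from the paper:

```python
import numpy as np

def make_windows(data, seq_len=168):
    """Slide a one-week (168 h) window over hourly records.

    data: (T, 4) array of [PV, irradiation, ambient temp, surface temp].
    Returns x with shape (T - seq_len, seq_len, 4) and y, the PV value
    (column 0) at the hour immediately after each window.
    """
    x = np.stack([data[i:i + seq_len] for i in range(len(data) - seq_len)])
    y = data[seq_len:, 0]
    return x, y

# Toy run with 1,000 hourly points; the paper's 32,208 points would
# yield 32,208 - 168 = 32,040 samples, split 80/20 into 25,632 / 6,408.
x, y = make_windows(np.zeros((1000, 4)))
split = int(0.8 * len(x))
x_train, x_test = x[:split], x[split:]
```

Each target hour thus appears once as a label and up to 168 times inside input windows, which is the standard sliding-window construction for one-step-ahead forecasting.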


LSTM Deep Learning Algorithm of the SolPVELA
As discussed in Section 2, the LSTM architecture effectively "extends" the memory of the RNN. A unidirectional LSTM model preserves only past information because its inputs originate solely from the preceding sequence. In contrast, the bidirectional LSTM (BD-LSTM) runs each training sequence forward and backward through two separate recurrent nets, both of which are connected to the same output layer, as presented in Figure 4. In other words, BD-LSTM does not require any prior knowledge or predesign because information is kept in both directions [49]. In this study, BD-LSTM was used to improve the accuracy of the prediction output by removing sequence classification problems [49]. The shapes of the input and output data were (168, 4) and (1, 1), respectively, and 64 nodes were arranged in each layer. Tanh was used as the activation function of the LSTM model, which is also known as the transfer function. This transfer function determines the output of the neural network by mapping the resulting values to the range (−1, 1), as given by the following equation:

$\tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
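To make the forward/backward idea concrete, the following sketch replaces each LSTM direction with a minimal tanh recurrence (an illustrative simplification, not the paper's model) and concatenates the two final states, mirroring how BD-LSTM feeds both directions into one shared output layer:

```python
import numpy as np

def rnn_pass(seq, w_x, w_h):
    """Minimal tanh recurrent pass; a stand-in for one LSTM direction."""
    h = np.zeros(w_h.shape[0])
    for x_t in seq:
        h = np.tanh(w_x @ x_t + w_h @ h)
    return h

def bidirectional(seq, fwd, bwd):
    """Run the sequence forward through one net and backward through a
    second, separate net, then concatenate both final states so a
    shared output layer sees information from both directions."""
    return np.concatenate([rnn_pass(seq, *fwd), rnn_pass(seq[::-1], *bwd)])

rng = np.random.default_rng(1)
seq = rng.standard_normal((168, 4))   # one week of 4-feature hourly data
fwd = (rng.standard_normal((64, 4)) * 0.1, rng.standard_normal((64, 64)) * 0.1)
bwd = (rng.standard_normal((64, 4)) * 0.1, rng.standard_normal((64, 64)) * 0.1)
out = bidirectional(seq, fwd, bwd)    # 64 units per direction
```

Because tanh is bounded, every entry of the concatenated state lies in [−1, 1]; the (168, 4) input and 64 nodes per direction mirror the configuration described in the text.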




Figure 4. Configuration of bidirectional (BD) long short-term memory (LSTM) for the proposed framework.

ANN Estimation with Target Selection
The model was trained for 50 epochs using the mean squared error (MSE) as the loss function, with the NADAM optimizer updating the weight parameters to minimize the loss. To achieve high accuracy on the training set, 20% of the training data was set aside for validation to check for overfitting and underfitting.
The trained BD-LSTM prediction model can reduce the MSE, i.e., the difference between the real and predicted values. However, problems remain when environmental changes at the target time cannot be reflected, owing to unexpected trends such as a sudden weather change. To solve this issue, this study refines the predicted value from the BD-LSTM model to improve accuracy by combining it with the solar radiation and the two temperature readings at the target time. The input and output layers of the ANN model have shapes (4, 1) and (1, 1), respectively, as illustrated in Figure 5. Sixty-four nodes were placed in each layer except the output layer. The nodes use the rectified linear unit activation, which computes f(x) = max(0, x) and is zero for x < 0; in other words, the activation is simply a threshold at zero. Like the BD-LSTM model, the ANN model was trained for 50 epochs using the MSE loss function and NADAM as the optimizer, with a 20% validation setting.
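A minimal forward pass of this estimation stage might look as follows. The weight shapes follow the (4, 1)-input, 64-node, single-output description above, but all names and initial values are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def relu(z):
    """Rectified linear unit: f(x) = max(0, x), thresholded at zero."""
    return np.maximum(0.0, z)

def ann_estimate(x, w1, b1, w2, b2):
    """Estimation stage sketch: a (4,) input vector of
    [LSTM-predicted PV, irradiation, ambient temp, surface temp]
    through one 64-node ReLU hidden layer to a single PV estimate."""
    return w2 @ relu(w1 @ x + b1) + b2

rng = np.random.default_rng(2)
w1, b1 = rng.standard_normal((64, 4)) * 0.1, np.zeros(64)  # hidden layer
w2, b2 = rng.standard_normal(64) * 0.1, 0.0                # output layer
y_hat = ann_estimate(rng.standard_normal(4), w1, b1, w2, b2)
```

In the full pipeline, the first input component would come from the BD-LSTM prediction while the other three are the measured weather variables at the target hour, which is how the second stage corrects trend-only errors.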

Data Description
The testing data were acquired from a 1490 kW (rated capacity) PV system in Yeonseong-gun, Gyeonggi-do, South Korea, which generates power while the sun is up (usually between 6:00 and 19:00). The historical data were downloaded from the Korea Open Data Portal and used as input data covering January 2015 to October 2018. To reduce the average prediction error, we excluded abnormal data and filled in data missing due to PV system failure and data loss. The original data consisted of four parameters: SPVG (kWh), solar PV irradiation (W/m²), device surface temperature (°C; measured on the surface of the solar cell), and ambient temperature (°C). In addition, the dataset comprised 1342 days (32,208 h), which were divided into training and testing sets. Figure 6 depicts the SPVG (kWh), which is densest when the solar PV irradiation (W/m²) peaks (usually between 12:00 and 14:00). Furthermore, the device surface temperature (°C) follows a growth pattern similar to that of the ambient temperature (°C); at peak times, the device surface temperature is 10 to 20 °C higher, with a similar seasonal pattern.
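The gap-filling step is not specified in detail in the text; one common choice, shown here purely as an illustrative sketch (the function name is hypothetical), is linear interpolation over the hourly index:

```python
import numpy as np

def fill_missing(values):
    """Fill NaN gaps (e.g., from PV system failure or data loss) by
    linear interpolation over the hourly index."""
    v = np.asarray(values, dtype=float)
    idx = np.arange(len(v))
    ok = ~np.isnan(v)                              # positions with valid data
    v[~ok] = np.interp(idx[~ok], idx[ok], v[ok])   # interpolate the gaps
    return v

cleaned = fill_missing([10.0, float("nan"), 30.0])
```

Linear interpolation is only reasonable for short gaps; longer outages (e.g., whole missing days) would distort the diurnal SPVG shape and were presumably handled by exclusion, as the text mentions.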



Performance Metrics in Terms of the Evaluation Index
Several evaluation indices are applied to assess the accuracy of SPVG. In this work, we measured the accuracy of the difference between the measured value and the predicted output using the root mean squared error (RMSE, in %), the mean absolute percentage error (MAPE, in %), and Pearson's correlation coefficient (R²).

The RMSE is the most common metric used to measure the accuracy of SPVG for continuous variables and is defined as follows:

$\mathrm{RMSE} = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}\left(P_i - \hat{P}_i\right)^2}$

where $P_i$ is the actual value, $\hat{P}_i$ is the forecast value, and N is the size of the test dataset, with i indexing the test samples.

The MAPE is also known as the mean absolute percentage deviation [26]. This index calculates the average error ratio as a measure of prediction accuracy using the following formula:

$\mathrm{MAPE} = \dfrac{100}{N}\sum_{i=1}^{N}\left|\dfrac{P_i - \hat{P}_i}{P_i}\right|$

Pearson's correlation coefficient (R²) is the test statistic that measures the statistical relationship or association between two continuous variables [28]. If $x_1, x_2, \ldots, x_n$ are the numbers of one set and $y_1, y_2, \ldots, y_n$ are the numbers of another set, the coefficient of correlation between the two sets is as follows [50,51]:

$r = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$

This quantity must lie between −1 and 1. The Pearson correlation coefficient was calculated between solar PV and the three other input variables used for training, identifying the most relevant parameters affecting the solar PV output power, as presented in Figure 7. The coefficients were: solar irradiation (0.92), device surface temperature (0.62), and ambient temperature (0.34), as depicted in Figure 7. All cases indicate a positive correlation, which has a positive effect on SPVG, and solar irradiation has the highest correlation of the three. The device surface and ambient temperatures have similar pattern distributions, although the device surface temperature's correlation is 0.28 higher than that of the ambient temperature.
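The three metrics can be computed directly from the definitions above; this NumPy sketch uses illustrative function names and follows the standard formulas:

```python
import numpy as np

def rmse(p, p_hat):
    """Root mean squared error between actual and forecast values."""
    p, p_hat = np.asarray(p, float), np.asarray(p_hat, float)
    return np.sqrt(np.mean((p - p_hat) ** 2))

def mape(p, p_hat):
    """Mean absolute percentage error; actual values must be nonzero,
    so nighttime hours with zero SPVG would need to be excluded."""
    p, p_hat = np.asarray(p, float), np.asarray(p_hat, float)
    return 100.0 * np.mean(np.abs((p - p_hat) / p))

def pearson_r(x, y):
    """Pearson correlation coefficient between two variable sets."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xm, ym = x - x.mean(), y - y.mean()
    return (xm @ ym) / np.sqrt((xm @ xm) * (ym @ ym))

err = rmse([3.0, 4.0], [2.0, 6.0])  # example comparison of two forecasts
```

Note that MAPE is undefined wherever the actual value is zero, which is why PV evaluation typically restricts it to daylight hours.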


Performance Analysis
For a target hour, the SolPVELA (ELA) model was verified by comparison with different approaches: LSTM, CNN, and LSTM with an EMA (EL). We computed the performance indices RMSE, MAE, and R² to compare our proposed algorithm with the other models and present the results in Table 2. Table 2 indicates that the EMA alone did not reduce the error rate for one-hour-ahead PV forecasting, as revealed by the comparative performances of EL and LSTM. Figure 8 illustrates that the ELA model has high accuracy and high stability under the different weather conditions of all four seasons. In Case I, winter, the mid-day shape was broken by a passing cloud; the ELA model still tracks the output better than the other options. Cases III and IV show normal condition patterns, while two days in Case II have lower PV output with a shape distorted by the rainy season. Even so, the ELA model can still track the real data under bad weather conditions.


Conclusions
This paper proposed a new approach (ELA) that improves accuracy with a two-step model: an LSTM prediction model with EMA smoothing preprocessing, followed by an ANN that estimates the output from four types of input data at the designated time. The ELA model was validated using real input data from the Korea Open Data Portal covering January 2015 to October 2018. These input data were used to forecast one hour of solar PV power output recorded at a sampling interval of 1 h.
The numerical results indicated that the ELA model outperforms the worst case, the CNN model (by R² = 0.16, RMSE = 102, and MAE = 60). The other neural-network-based solutions (CNN, LSTM, and EL) could not perform better than our proposed theory-based method (i.e., the ELA model) due to insufficient data and their inability to leverage a priori knowledge. Furthermore, the performance of the ELA model was visualized across different cases in all four seasons, demonstrating reliable PV power forecasting for an actual power plant. In future studies, we plan to consider how to combine our method with other neural-network-based methods so that a data-driven gain can enhance the theoretical advantage.