The initial steps of this study were data collection and pre-processing; the data obtained from these steps are described in Section 2.1. After obtaining the data, the next step prior to making forecasts was training the forecasting models, which are described in Section 2.2. Finally, the trained models were tested on unseen data, and the best-performing models were deployed. The results obtained from this step are described in Section 4. All of these steps are visualized in Figure 1, and the locations of some of the PV sites used in this study are shown in Figure 2.
2.1. Data Description
The data used in this study came from 50 solar photovoltaic (PV) sites located in Suncheon, South Korea. The data comprise readings of the PV panel output, collected in a time series format at regular 15 min intervals from 1 January 2020 to 31 July 2020. A total of 11,142 data points were used. A small portion of the dataset is shown in Table 1.
The obtained data were pre-processed. The first pre-processing step was handling missing data points and removing inconsistencies. For example, there should be no PV output at all during nighttime, so any nonzero nighttime readings were set to zero, and missing data points were replaced by the average of nearby data points. Because an inconsistent dataset can lead to poor results, sites missing large blocks of data were discarded: of the 50 sites originally in the dataset, 10 were removed during pre-processing due to inconsistencies and long intervals of missing data. Finally, the data were split into training, validation, and test sets and scaled.
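These pre-processing steps can be illustrated with a short sketch. This is not the exact pipeline used in the study; the column names (`timestamp`, `output`), the assumed daylight window of 06:00–19:00, and the 70/15/15 chronological split are illustrative assumptions.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Illustrative pre-processing sketch; column names, the daylight window,
# and the split ratios are assumptions, not values from the study.
df = pd.read_csv("pv_site.csv", parse_dates=["timestamp"])

# Nighttime output should be zero; force any nonzero nighttime readings to zero.
night = (df["timestamp"].dt.hour < 6) | (df["timestamp"].dt.hour >= 19)
df.loc[night, "output"] = 0.0

# Fill missing readings from neighbouring points.
df["output"] = df["output"].interpolate(method="linear", limit_direction="both")

# Chronological train/validation/test split (assumed 70/15/15).
n = len(df)
train = df.iloc[: int(0.7 * n)]
val = df.iloc[int(0.7 * n) : int(0.85 * n)]
test = df.iloc[int(0.85 * n) :]

# Scale with statistics from the training split only, to avoid leakage.
scaler = MinMaxScaler()
train_s = scaler.fit_transform(train[["output"]])
val_s = scaler.transform(val[["output"]])
test_s = scaler.transform(test[["output"]])
```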
It is important to note that this study does not utilize weather data, such as temperature, humidity and wind speed, which are often used in forecasting the output of solar PV panels. This highlights the need for alternative approaches when weather data are not available.
In this study, I focus on multi-step time series forecasting, where the goal is to predict the output of the PV panels at several future time steps based on the readings at multiple time steps in the past. Despite the absence of weather data, this study aims to demonstrate that the output of solar PV panels can be forecast from historical output readings alone.
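Framed this way, the forecasting task reduces to sliding a window over each site's series to form (past window, future window) pairs. The window lengths in the sketch below (96 past steps, i.e., one day of 15 min readings, and 4 future steps) are illustrative assumptions, not the horizons used in the study.

```python
import numpy as np

def make_windows(series: np.ndarray, n_past: int = 96, n_future: int = 4):
    """Turn a 1-D series into (past window, future window) training pairs."""
    X, y = [], []
    for i in range(len(series) - n_past - n_future + 1):
        X.append(series[i : i + n_past])                        # model input
        y.append(series[i + n_past : i + n_past + n_future])    # multi-step target
    return np.asarray(X), np.asarray(y)

# Example on a synthetic series: X has shape (n_samples, 96), y has shape (n_samples, 4).
X, y = make_windows(np.sin(np.linspace(0, 50, 2000)))
```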
2.2. Forecasting Models Used
For the purpose of this study, I used four popular machine learning models to forecast the solar PV output: the Recurrent Neural Network (RNN) [38], the Gated Recurrent Unit (GRU) [39], Long Short-Term Memory (LSTM) [37], and the Transformer [24].
2.2.1. Recurrent Neural Network (RNN) [38]
A Recurrent Neural Network (RNN) is a type of artificial neural network that is well suited for processing sequential data. RNNs are used for tasks such as natural language processing and speech recognition, where the input data have a temporal structure. An RNN maintains an internal hidden state that can capture information from the entire sequence of inputs up to a given time step, which allows it to model temporal dependencies in sequential data.
An RNN consists of a series of interconnected nodes or neurons whose hidden state is fed back into the network at every step: the hidden state is updated at each time step based on the previous hidden state and the input at that time step. This recurrence is what allows RNNs to model the dependencies between the data at different time steps, which is crucial for processing sequential data.
The main drawback of traditional RNNs is that they are prone to the vanishing gradient problem, in which the gradient of the error signal with respect to the network parameters shrinks exponentially as it is propagated back through time. This makes RNNs difficult to train on long sequences, because the gradient becomes too small to update the network parameters effectively. To overcome this, variants of the RNN have been developed, such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs), which are more robust to the vanishing gradient problem. The architecture of a general RNN is shown in Figure 3.
The equations for an RNN model at time step $t$ can be expressed as follows:
$$h_t = f(W_{hh} h_{t-1} + W_{xh} x_t + b_h),$$
$$y_t = g(W_{hy} h_t + b_y),$$
where $h_t$ is the hidden state at time step $t$, $x_t$ is the input at time step $t$, $y_t$ is the output at time step $t$, $W_{hh}$ and $W_{xh}$ are weight matrices, $b_h$ is the bias vector for the hidden layer, $W_{hy}$ is the weight matrix and $b_y$ is the bias vector for the output layer, and $f$ and $g$ are activation functions, which are often the hyperbolic tangent or the rectified linear unit (ReLU) function.
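As a concrete illustration, a minimal multi-step forecaster built around a recurrent layer could look like the PyTorch sketch below. The hidden size and the 96-step input / 4-step output horizon are illustrative assumptions, not the configuration used in this study.

```python
import torch
import torch.nn as nn

class RNNForecaster(nn.Module):
    """Minimal RNN-based multi-step forecaster (illustrative sizes)."""

    def __init__(self, n_future: int = 4, hidden_size: int = 64):
        super().__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_future)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_past, 1); the last hidden state summarizes the window.
        _, h_n = self.rnn(x)
        return self.head(h_n[-1])  # predict n_future steps at once

# Example: 8 windows of 96 past readings, each mapped to 4 future readings.
model = RNNForecaster()
y_hat = model(torch.randn(8, 96, 1))  # shape: (8, 4)
```

The GRU and LSTM variants described below can be used in the same way by replacing the recurrent layer (noting that `nn.LSTM` additionally returns a cell state).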
2.2.2. Gated Recurrent Unit (GRU) [39]
Gated Recurrent Units (GRUs) are a type of Recurrent Neural Network (RNN) designed to capture long-term dependencies between time steps in sequential data. Unlike traditional RNNs, which use a single recurrent update to relate one time step to the next, GRUs have a gating mechanism that allows them to selectively choose which information to preserve from previous time steps and which to discard.
The GRU has two gates, the reset gate and the update gate, which control the flow of information from previous time steps. The reset gate decides how much of the previous hidden state is to be forgotten, while the update gate decides how much of the previous hidden state is to be combined with the current input. The resulting hidden state is then used to predict the output at the current time step.
The GRU’s gating mechanism allows it to efficiently handle long sequences of data, as it can selectively preserve the most relevant information and discard the rest. Additionally, GRUs require fewer parameters than traditional RNNs, which can reduce overfitting and improve model training efficiency.
GRUs have been applied to various time series forecasting problems, including solar PV output forecasting. In such applications, the GRU is trained on historical time series data to capture the relationships between time steps and then used to make predictions for future time steps. The input to the GRU is the time series data, and the output is a prediction of the future values of the time series. The architecture of a single GRU block is shown in Figure 4.
The update gate, reset gate, and new memory cell vector of a Gated Recurrent Unit (GRU) can be computed using the following equations:
$$z_t = \sigma(W_z [h_{t-1}, x_t] + b_z),$$
$$r_t = \sigma(W_r [h_{t-1}, x_t] + b_r),$$
$$\tilde{h}_t = \tanh(W [r_t \odot h_{t-1}, x_t] + b),$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t,$$
where $x_t$ is the input at time step $t$; $h_{t-1}$ is the hidden state at the previous time step; $W_z$ and $b_z$ are the weights and bias for the update gate; $W_r$ and $b_r$ are the weights and bias for the reset gate; $W$ and $b$ are the weights and bias for computing the new memory cell vector; $z_t$ is the update gate output; $r_t$ is the reset gate output; $\tilde{h}_t$ is the candidate hidden state; and $h_t$ is the current hidden state. The $\sigma$ function is the sigmoid function, and $\odot$ is the element-wise product (Hadamard product).
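A single GRU step can be written directly from these equations. The NumPy sketch below uses the concatenated $[h_{t-1}, x_t]$ form with randomly initialized parameters and illustrative sizes; it is meant to mirror the equations, not the library implementation used in the study.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, b_z, W_r, b_r, W, b):
    """One GRU step following the update/reset/candidate equations above."""
    hx = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ hx + b_z)                                    # update gate
    r_t = sigmoid(W_r @ hx + b_r)                                    # reset gate
    h_tilde = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]) + b)   # candidate state
    return (1 - z_t) * h_prev + z_t * h_tilde                        # new hidden state

# Illustrative sizes: 1 input feature, 8 hidden units.
n_in, n_h = 1, 8
rng = np.random.default_rng(0)
W_z, W_r, W = (rng.normal(size=(n_h, n_h + n_in)) for _ in range(3))
b_z, b_r, b = (np.zeros(n_h) for _ in range(3))
h_t = gru_step(rng.normal(size=n_in), np.zeros(n_h), W_z, b_z, W_r, b_r, W, b)
```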
2.2.3. Long Short-Term Memory (LSTM) [37]
Long Short-Term Memory (LSTM) is a type of Recurrent Neural Network (RNN) architecture that is specifically designed to overcome the vanishing gradient problem faced by traditional RNNs. LSTM is widely used for time-series analysis, sequential prediction and natural language processing tasks.
LSTMs consist of memory cells that store information over extended periods of time and gates that control the flow of information into and out of the memory cells. The three gates in the LSTM are the input gate, the forget gate, and the output gate. The input gate controls how much new information is allowed to enter the memory cell, the forget gate decides what information should be discarded from the memory cell, and the output gate controls what information from the memory cell is passed on as the output.
LSTMs are trained using backpropagation through time (BPTT) to minimize a loss function that represents the difference between the predicted output and the true output. The weights of the gates and the memory cells are updated during the training process to minimize the loss function.
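Training an LSTM (or any of the recurrent models above) on the windowed data reduces to a standard gradient-descent loop; in frameworks such as PyTorch, backpropagation through time is handled automatically when `loss.backward()` is called. The optimizer, learning rate, and epoch count below are illustrative assumptions.

```python
import torch
import torch.nn as nn

def train(model, X_train, y_train, epochs: int = 50, lr: float = 1e-3):
    """Minimal training loop; BPTT happens inside loss.backward()."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        y_pred = model(X_train)            # forward pass over the input windows
        loss = loss_fn(y_pred, y_train)    # difference between prediction and target
        loss.backward()                    # backpropagation through time
        optimizer.step()                   # update the gate and cell weights
    return model
```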
LSTMs are able to capture long-term dependencies in time-series data and to outperform traditional RNNs in tasks that require memory and sequential prediction. They have been widely used for time-series forecasting, natural language processing, and speech recognition. The architecture of a single LSTM block is shown in Figure 5.
The equations for the different gates of the LSTM are as follows:
$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f),$$
$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i),$$
$$\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C),$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t,$$
$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o),$$
$$h_t = o_t \odot \tanh(C_t),$$
where $f_t$ is the forget gate, $i_t$ is the input gate, $\tilde{C}_t$ is the candidate cell state, $C_t$ is the cell state, $o_t$ is the output gate, $h_t$ is the hidden state at time $t$, $x_t$ is the input at time $t$, $W$ and $b$ are the weight and bias matrices, and $\sigma$ is the sigmoid activation function.
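As with the GRU, a single LSTM step can be written directly from these equations; the NumPy sketch below again uses the concatenated $[h_{t-1}, x_t]$ form and is purely illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
    """One LSTM step following the gate equations above."""
    hx = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ hx + b_f)          # forget gate
    i_t = sigmoid(W_i @ hx + b_i)          # input gate
    c_tilde = np.tanh(W_C @ hx + b_C)      # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde     # new cell state
    o_t = sigmoid(W_o @ hx + b_o)          # output gate
    h_t = o_t * np.tanh(c_t)               # new hidden state
    return h_t, c_t
```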
2.2.4. Transformer [24]
The Transformer is a neural network architecture that was introduced in 2017 by Vaswani et al. in the paper “Attention Is All You Need” [24]. The Transformer is designed to handle sequential data and has revolutionized the field of natural language processing (NLP). Its key innovation is the self-attention mechanism, which allows the model to weigh the importance of each element in the input sequence when making predictions. The original Transformer and its variants have also been used in multiple time series forecasting applications [24,40].
A traditional Recurrent Neural Network (RNN) operates on a sequence by processing one element at a time while maintaining an internal state. In contrast, the Transformer operates on the entire sequence at once, allowing it to capture long-term dependencies between elements. This capability of Transformer makes it well suited for sequence-to-sequence problems, such as time series forecasting.
The Transformer architecture consists of an encoder and a decoder, both of which are made up of a series of stacked attention and feed-forward layers. The encoder takes the input sequence and computes a sequence of hidden states. The decoder then takes the hidden states and produces the output sequence.
The attention mechanism in Transformer computes a weight for each element in the input sequence, indicating its importance in the current prediction. These weights are used to compute a weighted sum of the hidden states, which is then used to make the prediction. This allows Transformer to focus on the most relevant parts of the input sequence when making predictions.
In addition to the self-attention mechanism, Transformer also uses multi-head attention, which allows the model to attend to multiple aspects of the input sequence at once. This allows the model to capture complex relationships between elements in the input sequence.
Figure 6 shows the architecture of the Transformer.
The self-attention in the Transformer architecture is given by the following equation:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V.$$
This equation calculates the output of the self-attention mechanism, which is used to compute the relationship between each position in the input sequence. Here, $Q$ represents the queries, $K$ represents the keys, and $V$ represents the values. The dot product of the queries and keys is divided by the square root of the dimensionality of the keys, $d_k$, to scale the gradients. The result is passed through a softmax activation function to obtain a probability distribution over the keys, which is then used to weight the values.
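The equation maps directly to a few lines of code. The NumPy sketch below computes scaled dot-product attention for a small random example; the sequence length and dimensions are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # scaled similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                  # weighted sum of the values

# Illustrative example: 5 positions with d_k = d_v = 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)             # shape: (5, 8)
```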
The multi-head attention in the Transformer architecture is given by the following equation:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h) W^{O},$$
where
$$\mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q}, K W_i^{K}, V W_i^{V}).$$
These equations apply the self-attention mechanism multiple times in parallel to capture different relationships between the input sequence elements. Here, $Q$, $K$, and $V$ are the same as in the self-attention equation but are projected using the weight matrices $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ to create $h$ different subspaces, or "heads". The self-attention operation is then applied to each of these subspaces to produce $h$ different outputs, which are concatenated and linearly transformed using the weight matrix $W^{O}$ to produce the final output.
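In practice, multi-head self-attention is usually taken from a library rather than written by hand. The short PyTorch sketch below applies the built-in `nn.MultiheadAttention` layer to a batch of embedded sequences; the embedding dimension, number of heads, batch size, and sequence length are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sizes: embedding dimension 16, 4 heads, batch of 8 sequences of length 96.
mha = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
x = torch.randn(8, 96, 16)         # (batch, sequence length, embedding dimension)
out, attn_weights = mha(x, x, x)   # self-attention: queries, keys, and values are all x
# out has shape (8, 96, 16); attn_weights has shape (8, 96, 96), averaged over heads.
```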