One-Day-Ahead Solar Irradiation and Windspeed Forecasting with Advanced Deep Learning Techniques

In recent years, demand for electric energy has steadily increased; therefore, the large-scale integration of renewable energy sources (RES) into power systems is a major concern. Wind and solar energy are among the most widely used alternative sources of energy. However, there is intense variability both in solar irradiation and, even more so, in windspeed, which causes solar and wind power generation to fluctuate highly. As a result, the penetration of RES technologies into electricity networks is a difficult task. Therefore, more accurate one-day-ahead forecasting of solar irradiation and windspeed is crucial for the safe and reliable operation of electrical systems, the management of RES power plants, and the supply of high-quality electric power at the lowest possible cost. Previous work has not taken into consideration the influence of clouds on solar irradiation forecasting, the categorization of data per month over successive years (motivated by the similarity of monthly solar irradiation patterns during the year), or the relative seasonal similarity of windspeed patterns. In this study, three deep learning techniques, i.e., multi-head CNN, multi-channel CNN, and encoder-decoder LSTM, were adopted for medium-term windspeed and solar irradiance forecasting based on a real-time measurement dataset and were compared with two well-known conventional methods, i.e., RegARMA and NARX. A walk-forward validation forecast strategy was combined, firstly with a recursive multistep forecast strategy and secondly with a multiple-output forecast strategy, using a specific cloud index introduced for the first time. Moreover, it is demonstrated that the similarity of monthly solar irradiation patterns during the year and the relative seasonal similarity of windspeed patterns, in a timeseries measurement dataset spanning several successive years, contribute to very high one-day-ahead windspeed and solar irradiation forecasting performance.


Introduction
A significant share of global and domestic energy requirements is covered by fossil fuel consumption. It is widely accepted that consuming fossil fuels such as oil, coal, and natural gas releases a large amount of greenhouse gases into the atmosphere, leading to extremely negative effects on the environment. The production of "cleaner", carbon-free energy can be achieved by utilizing renewable energy sources such as the wind and sun, which have begun to be used to cover the globe's increasing energy needs. Electric energy market liberalization, in conjunction with the increasing need for sustainable energy, has turned political and investment interest toward further utilizing RES to cover electricity needs [1,2].
Energy produced from the wind and the sun depends largely on local weather conditions, such as temperature, windspeed, air pressure, humidity, sunlight, etc., and their fluctuations. Thus, wind and solar power generation is often difficult to control and predict, as weather conditions constantly change. This makes the integration of wind and solar energy into power grids, especially isolated grids, a significant challenge [3,4].
To tackle this challenge, it is essential to improve the performance of one-day-ahead windspeed and solar irradiation forecasting in order to minimize uncertainty about the amount of renewable power that can be generated in any operational situation of the electric grid. Given the inherent relationship between solar irradiation and the electric power produced from photovoltaics, and between windspeed and wind turbine power generation, it is necessary to create computational models that accurately predict solar irradiation and windspeed on medium- and/or short-term time scales [5][6][7][8][9][10][11].
Windspeed forecasting can be separated into four temporal ranges: very short-term (from a few seconds to 30 min), short-term (from 30 min to 6 h ahead), medium-term (from 6 h to 1 day ahead), and long-term (more than 1 day ahead) [6]. Solar irradiation forecasting can also be divided into four temporal ranges: very short-term (a few minutes to 1 h), short-term (1-4 h), medium-term (1 day ahead), and long-term (more than 1 day ahead) [7].
Over the last few years, various tools have been established to predict windspeed and solar irradiation. These tools can be separated into three main groups: (1) data-driven models, such as statistical models and machine learning models, which are the most prevalent tools used for predicting such timeseries; (2) physical models that use meteorological and topographical data; and (3) hybrid algorithms, which have found great success in a number of research areas [3,6,8].
Hybrid methods found in the literature include variational mode decomposition with Gram-Schmidt orthogonalization and extreme learning machines, enhanced at the same time by a gravitational search algorithm [42]; nonlinear neural network architectural models combined with a modified firefly algorithm and particle swarm optimization (PSO) [43]; the hybrid model decomposition (HMD) method and online sequential outlier robust extreme learning machine (OSORELM) [44]; empirical mode decomposition and Elman neural networks (EMD-ENN) [45]; wavelet transform combined with ARIMA (WT-ARIMA) [46]; empirical wavelet transform (EWT) and least-square support vector machines (LSSVM) improved by coupled simulated annealing [47]; and variational mode decomposition (VMD) combined with several ML methods, including SVM and back propagation neural networks (BPNN). Moreover, ELMs and ENNs were implemented to perform advanced data preprocessing based on complementary ensemble empirical mode decomposition (CEEMD) [48], while sample entropy and VMD forecasting methods based on ENNs and on a multi-objective "satin bowerbird" optimization algorithm have been introduced [49]. Bidirectional long short-term memory neural networks with an effective hierarchical evolutionary decomposition technique and an improved generalized normal distribution optimization algorithm for hyperparameter tuning have been presented in [50]. A combined model system for windspeed forecasting, including an improved hybrid timeseries decomposition strategy (HTD), a novel multi-objective binary backtracking search algorithm (MOBBSA), and an advanced sequence-to-sequence (Seq2Seq) predictor, has been presented in [51]. Further, recurrent neural network prediction algorithms combined with error decomposition correction methods have been presented in [52].
The purpose of this paper is to develop models for high-performance, medium-term forecasting (i.e., for the next 24 h) of windspeed and solar irradiation, based on hourly data recorded on Dia Island, which is located north of Heraklion city in Crete, Greece. To achieve this, the efficacies of three deep learning techniques, i.e., multi-channel CNN, multi-head CNN, and encoder-decoder LSTM, are investigated and compared with two conventional methods, i.e., RegARMA and NARX, in order, among other things, to demonstrate the improved forecasting performance of the deep learning techniques and to highlight the most effective among them. All the presented methodologies were tested on a benchmarked dataset of real measurements for the purpose of predicting, with the highest possible statistical accuracy, the windspeed and solar irradiation for a forecasting period of 24 h, i.e., one day ahead.
The main contributions of this paper are:

• Data were categorized by month for successive years, firstly due to the similarity of solar irradiation patterns by month during the year, and secondly because of the relative seasonal similarity of the windspeed patterns, resulting in a monthly timeseries dataset, which is more significant for high-performance forecasting.
• A walk-forward validation forecast strategy, in combination first with a recursive multistep forecast strategy and secondly with a multiple-output forecast strategy, was successfully implemented in order to significantly improve medium-term windspeed and solar irradiation forecasts.
• The recursive multistep forecast strategy was compared to the multiple-output forecast strategy.
The paper is organized as follows: In Section 2, we present the theory behind the proposed deep learning forecasting methods, the dataset of real measurements categorized by month, the model configurations, the methodology followed, and the algorithms for medium-term windspeed and solar irradiation forecasting. In Section 3, the simulation results and the discussion of these results are presented, while in Section 4, the conclusions of the paper are summarized.

Dataset Presentation
The dataset used in this research is derived from measurements carried out on Dia Island, Crete, Greece. Table 1 includes the required parameters given in hourly values for every day of the years 2005-2016 at a height of 10 m above the ground. All these parameters were recorded except for the beam/direct irradiance on a plane always normal to the sun's rays and the diffuse irradiance on the horizontal plane, which were estimated from the global irradiance on the horizontal plane using the anisotropic model described in [53]. The beam/direct irradiance on a plane always normal to the sun's rays was considered for two main reasons: (1) it improves the forecasting performance of the examined models, and (2) it is an essential parameter for the estimation of a photovoltaic system's performance in a specific location. Moreover, extraterrestrial irradiation is calculated using the typical solar geometry equations presented in [54]. Table 2 includes some statistical data for solar irradiation and windspeed, including maximum, minimum, and mean values and standard deviations (Std).
For solar irradiation forecasting, due to the lack of a cloud index, the normalized discrete index for each day (NDD(d)) and for each hour of the day (NDD(h,d)) were introduced and calculated by Equations (1) and (2), given the extraterrestrial solar irradiation for Dia Island and the solar irradiation on the horizontal plane [36]. Due to the periodicity of solar irradiation, we constructed two columns: (1) the number of days in the month (31, 30 or 28); and (2) the hour of the day for every observation. For solar irradiation forecasting, the following parameters were used as inputs from the initial measurements dataset: air temperature, NDD(d), NDD(h,d), the number of days in the month, and the hour of the day. The nighttime values (zero solar irradiation) were removed from the initial dataset of measurements because night hours do not contribute to solar irradiation forecasting.
In the expressions for NDD(d) and NDD(h,d) (Equations (1) and (2)), "d" is the day of the year (1 to 365), "i" is the hour number of each day (1 to 24), "h" is the specific hour of the day for which the cloud index NDD(h,d) is calculated, G_on is the normalized extraterrestrial irradiance, and G_sn is the normalized surface irradiance. Global irradiance data on the horizontal plane are presented in Table 1, whereas extraterrestrial irradiance data were calculated from well-known solar geometry equations, using as parameters the solar constant (1367 W/m²), day of the year, latitude and longitude of the location, solar hour angle, and declination angle of the Sun [54]. For normalization of G_on and G_sn, their corresponding maximum values for each year of the dataset were used. Even if the value of extraterrestrial or surface irradiance exceeds its historical maximum (so that the normalized irradiance could slightly exceed 1), this does not affect the performance of the forecasting. In addition, statistical parameters such as the maxima, minima, means, and standard deviations for windspeed and solar irradiation data are shown in Table 2.
For windspeed forecasting, the following parameters were used as inputs from the initial dataset: air temperature (°C), relative humidity (%), and global irradiance on the horizontal plane (W/m²) [5].
A typical CNN consists of at least one convolutional layer together with pooling layers, dropout layers, flatten layers, and fully connected layers. The purpose of the convolutional layer is to convolve the input image and generate the feature maps. The convolution is carried out by sliding a group of small-sized filters (kernels), each of which contains a number of learnable weights, over the input image and implementing elementwise multiplication at each possible position. Each kernel generates a completely new layer, which contains the results of applying that particular kernel to the input image. The number of generated feature maps (the convolutional layer depth) is defined by the number of kernels and is one of the CNN hyperparameters, which must be chosen correctly based on the available data. The resulting group of layers then undergoes a pooling process. Pooling is a down-sampling operation in which sets of elements in the feature maps are reduced to a single value based on some criterion or calculation (e.g., the maximum value or the average of all values). As a result, noise is reduced and better performance is achieved. By repeating the two aforementioned layers multiple times with kernels of different sizes and depths, successively higher-level features are extracted, which constitutes one of the assets of CNNs. Dropout layers can be used after convolutional layers and pooling layers to protect neural networks from overfitting.

Presentation of the
Finally, the last pooled layer can be flattened into a single vector that contains all of its values and is connected to a fully connected layer, which is in turn connected to the output layer. The output layer holds one output for every possible class, thus providing the classification estimate for the given input [55][56][57][58].
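The convolution, pooling, and flattening steps described above can be illustrated with a minimal one-dimensional sketch in NumPy. The text describes the image case; the 1-D form is used here because the inputs in this work are timeseries. The toy signal and the two kernels are made-up values chosen for illustration, not parameters from the paper.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Slide `kernel` over `x`; at each position multiply elementwise and sum."""
    n = len(x) - len(kernel) + 1
    return np.array([np.sum(x[i:i + len(kernel)] * kernel) for i in range(n)])

def max_pool1d(feature_map, size=2):
    """Down-sample by keeping the maximum of each non-overlapping window."""
    n = len(feature_map) // size
    return feature_map[:n * size].reshape(n, size).max(axis=1)

# Toy input sequence and two hypothetical kernels (illustrative values only).
x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
kernels = [np.array([1.0, 0.0, -1.0]), np.array([0.5, 0.5, 0.5])]

feature_maps = [conv1d_valid(x, k) for k in kernels]   # one feature map per kernel
pooled = [max_pool1d(fm, size=2) for fm in feature_maps]
flat = np.concatenate(pooled)   # single vector passed on to fully connected layers
print(flat)
```

In a trained network the kernel weights are learned rather than fixed, and the number of kernels sets the depth of the convolutional layer.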
The multi-channel approach applied in this paper is based on the aforementioned typical CNN architecture and extends it by adding a further embedding layer to the model in order to raise the number of channels to match the degree of semantic enrichment of the present paper's data. Multi-channel CNNs use each of the input timeseries variables for solar irradiation and windspeed forecasting to predict the windspeed and solar irradiation of the next day. This is implemented by entering each one-dimensional timeseries into the model as a separate input channel. The CNN then uses a distinct kernel to read each input sequence onto a separate set of filter maps, essentially learning features from each input timeseries variable. This is useful in situations where the output sequence is some function of the observations at prior timesteps derived from multiple different features, and also when the output sequence does not contain only the feature to be forecasted [57,59].
Another extension of the CNN model is to use a separate sub-CNN model, or, in other words, a head, for each input variable; this structure is referred to as a multi-head CNN model. This extension requires changes to the model definition and, in turn, to the preparation of the training and test datasets. Regarding the model, a separate CNN model must be defined for each of the input variables: solar irradiation and windspeed. Feeding each input into an independent CNN has a number of advantages: feature extraction is improved by focusing on only one input, and each convolutional head can be configured for the specific nature of its input. The configuration of the model, in terms of the number of layers and their hyperparameters, was also modified to better suit this approach [57].
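The difference in data preparation between the two variants can be sketched as follows. This is an illustration of the input layout only, assuming the usual (samples, timesteps, features) convention; the number of samples and the random values are arbitrary, and no network is built.

```python
import numpy as np

# 24 timesteps and 3 input variables follow the windspeed case in the text;
# the sample count (4) and the random data are placeholders.
n_samples, n_steps, n_features = 4, 24, 3
rng = np.random.default_rng(0)
X = rng.random((n_samples, n_steps, n_features))

# Multi-channel CNN: one model, one input tensor, each variable is a channel.
multi_channel_input = X                                   # shape (4, 24, 3)

# Multi-head CNN: one sub-model per variable, so the same data is split
# into a list of single-channel tensors, one per head.
multi_head_inputs = [X[:, :, i:i + 1] for i in range(n_features)]

print(multi_channel_input.shape, [a.shape for a in multi_head_inputs])
```

The split is lossless: concatenating the per-head tensors along the last axis recovers the multi-channel tensor.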

Encoder-Decoder LSTM
Long short-term memory (LSTM) is a modified version of the artificial recurrent neural network (RNN) architecture that is widely used in deep learning. In contrast to standard feed-forward neural networks, LSTMs use feedback connections, which enhance the memory recovery of a given network. LSTMs can process not only single data points (such as images) but also entire sequences of data (such as speech or video); therefore, LSTMs are suitable for applications such as unsegmented, connected handwriting recognition, speech recognition, anomaly detection in network traffic or intrusion detection systems (IDSs), etc. [60].
A common LSTM unit consists of a cell, an input gate (which determines what information should be used to modify the memory), an output gate, and a forget gate (which decides what information is to be discarded). The cell remembers values over arbitrary time intervals, and the three gates regulate the information flow into and out of the cell.
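The gating mechanism can be made concrete with a minimal NumPy sketch of one LSTM timestep in its standard textbook form. The stacked weight layout (gate order input, forget, output, candidate), the sizes, and the random weights are assumptions made for illustration; they are not taken from the paper's models.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM timestep. Shapes: W (4H, D), U (4H, H), b (4H,)."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: what to write into the cell
    f = sigmoid(z[H:2 * H])        # forget gate: what to discard from the cell
    o = sigmoid(z[2 * H:3 * H])    # output gate: what to expose as hidden state
    g = np.tanh(z[3 * H:4 * H])    # candidate cell values
    c = f * c_prev + i * g         # updated cell state (the memory)
    h = o * np.tanh(c)             # updated hidden state
    return h, c

# Run the cell over a 24-step sequence with 3 features (arbitrary sizes/weights).
rng = np.random.default_rng(0)
D, H = 3, 5
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.random((24, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

After the loop, `h` and `c` are the final hidden and cell states, which is the pair of vectors an encoder hands to a decoder in the encoder-decoder arrangement.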
LSTM networks are appropriate for forecasting, classifying, and processing timeseries data, since lags of unknown duration may exist between important events in timeseries problems. LSTMs are able to cope with the vanishing gradient problem that can arise during the training of traditional RNNs. Their relative insensitivity to gap lengths is an advantage of LSTMs over RNNs, hidden Markov models, and other sequence learning methods in numerous applications [61].
Encoder-decoder LSTM is a recurrent neural network designed to cope with sequence-to-sequence (seq2seq) problems (text translation, learning program execution, etc.). Because the number of items in the inputs and outputs can vary, sequence-to-sequence prediction problems have attracted considerable study. One advantage of an encoder-decoder LSTM is its use of a fixed-size internal representation in the core of the model [59].
The encoder and the decoder are usually LSTM units or gated recurrent units. The purpose of the encoder is to read the input sequence and to summarize the information in the internal state vectors (the hidden state and cell state vectors in the case of LSTMs). The outputs of the encoder can be discarded; only the internal states need to be retained. The decoder is an LSTM whose initial states are set to the final states of the encoder LSTM. Using these initial states, the decoder starts to generate the output sequence (see Figure 1).

The encoder transforms the input sequence into state vectors (known as thought vectors), which are then passed to the decoder in order to start generating the output sequence according to the thought vectors. The decoder is just a language model conditioned on the initial states [61].

Solar and Wind Data Preprocessing and Forecasting Model Configurations
To appropriately train the model, two data preprocessing procedures were carried out. The first procedure normalized the data, and the second accommodated missing data. For the latter, missing values were filled with the average of nearby values during the same week. Furthermore, it is worth noting that normalizing the input data before feeding them into the network is good practice, since variables with both large and small magnitudes have negative effects on the performance of the learning algorithm. For data normalization, the well-known formula of Equation (3) was used, where y is the normalized value, x_i is the current value, and x_min and x_max are the minimum and the maximum of the original parameters, respectively.
These data were categorized by month, resulting in a monthly timeseries for the years 2005-2016, which was then followed by model training and medium-term forecasting. Data were separated by month mainly because of the similarity of solar irradiation patterns, and secondly because of the relative similarity of windspeed patterns.
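A minimal sketch of the two preprocessing steps follows. Equation (3) is not reproduced in this text, so the standard min-max form y = (x_i − x_min) / (x_max − x_min) is assumed from the symbol definitions. The gap-filling rule (mean of the valid values in the same 168-hour block) is likewise only one plausible reading of "nearby values during the same week"; both function names are hypothetical.

```python
import numpy as np

def fill_missing_weekly(series, hours_per_week=168):
    """Replace NaNs with the mean of the valid values in the same weekly block.
    The block-based rule is an assumption, not the paper's exact procedure."""
    filled = np.array(series, dtype=float)           # copy of the input
    for start in range(0, len(filled), hours_per_week):
        block = filled[start:start + hours_per_week]  # view into `filled`
        missing = np.isnan(block)
        if missing.any() and not missing.all():
            block[missing] = block[~missing].mean()
    return filled

def min_max_normalize(x):
    """Assumed form of Equation (3): scale values to the range [0, 1]."""
    x_min, x_max = np.min(x), np.max(x)
    return (x - x_min) / (x_max - x_min)

raw = np.array([2.0, np.nan, 4.0, 6.0, 10.0])   # toy hourly values with one gap
clean = fill_missing_weekly(raw)
y = min_max_normalize(clean)
print(clean, y)
```

When forecasting, the minimum and maximum should come from the training portion only, so that no information from the test period leaks into the scaling.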
The most commonly used strategies for making multistep forecasts are [6,28,30,62]:
• Direct strategy: For every timestep forecast, a new model is developed. This strategy demands large computational time since there are as many models to learn as the size of the forecasting horizon.
• Recursive strategy: The recursive multistep strategy first trains a one-step model and then uses this single model for each horizon, but the prediction of the prior timestep is used as an input in place of the original dataset value when making a prediction at the following timestep. The recursive approach is less computationally intensive than the direct strategy, as only one model is fitted. However, this type of strategy suffers from error accumulation because the predictions of prior steps are fed into the model instead of the real values. This results in poor algorithm performance as the prediction time horizon increases.
• Combined direct-recursive strategy: In this strategy, a combination of the direct and recursive strategies is used in order to take advantage of both methods. This method computes the forecasts with different models for every forecasting horizon (direct strategy), and at each timestep it enlarges the set of inputs by adding variables corresponding to the forecasts of the previous step (recursive strategy).
• Multiple-output strategy: One model is developed in order to predict the whole forecast sequence in a one-shot manner.
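The contrast between the recursive and the multiple-output strategies can be sketched with a deliberately simple stand-in model. An ordinary least-squares linear model on a synthetic noisy sinusoid replaces the deep networks here purely for illustration; all function names and the data are hypothetical.

```python
import numpy as np

def make_windows(series, n_lags, horizon):
    """Pair each window of `n_lags` past values with the next `horizon` values."""
    X, Y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])
        Y.append(series[t:t + horizon])
    return np.array(X), np.array(Y)

def fit_linear(X, Y):
    """Least-squares linear model with a bias term (stand-in for a network)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef

def predict_linear(coef, x):
    return np.append(x, 1.0) @ coef

def forecast_recursive(coef_one_step, last_window, horizon):
    """One-step model applied repeatedly; each prediction is fed back as input."""
    window, preds = list(last_window), []
    for _ in range(horizon):
        y_hat = float(predict_linear(coef_one_step, np.array(window)))
        preds.append(y_hat)
        window = window[1:] + [y_hat]    # the prediction replaces a measured value
    return np.array(preds)

def forecast_multi_output(coef_multi, last_window):
    """Single model that emits the whole horizon in one shot."""
    return predict_linear(coef_multi, np.asarray(last_window))

# Synthetic hourly series with a 24 h cycle (placeholder data).
rng = np.random.default_rng(0)
t = np.arange(24 * 20)
series = np.sin(2 * np.pi * t / 24) + 0.05 * rng.normal(size=t.size)
train, truth = series[:-24], series[-24:]

X, Y = make_windows(train, n_lags=24, horizon=24)
coef_one = fit_linear(X, Y[:, 0])      # one-step-ahead target only
coef_multi = fit_linear(X, Y)          # all 24 targets at once
rec = forecast_recursive(coef_one, train[-24:], 24)
mo = forecast_multi_output(coef_multi, train[-24:])
```

The feedback line inside `forecast_recursive` is where error accumulation enters: from the second step onward the model sees its own outputs rather than measurements.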
In this study, a walk-forward validation forecast strategy is used with an adaptive training window that expands after each forecast horizon (of 24 h) to include the most recent actual (measured) values; it was applied with improved success for a prediction horizon of 24 h. The walk-forward validation forecast strategy splits the monthly timeseries dataset into predefined sub-fragments. Walk-forward validation is based on the sliding window method, where the data are used in ascending order of time rather than being randomly shuffled into training and test datasets. This validation approach is essential for timeseries analysis methods in general, where observations with future timestamps cannot be used to predict past values. Thus, it is crucial to assess model forecasting performance by recursively augmenting the training data with recent observations and reevaluating the model over the extended horizon [62].
The recursive multistep forecast strategy and the multiple-output forecast strategy are applied over the expanded timeseries fragments with a fixed sliding window of 24 h. The recursive multistep forecast strategy computes one-step-ahead forecasts (i.e., 1 h ahead) recursively until the desired forecast horizon (24 h) is reached, while the multiple-output forecast strategy predicts the whole forecast horizon (i.e., 24 h ahead) in a one-shot manner. Then, the training set is expanded to incorporate the recent actual (measured) values. For solar irradiation forecasting in particular, the sliding window is shorter than 24 h because the zero-irradiation (night) hours of every day are removed, and its length depends on the variable length of the night during the year. Although the sliding window is shorter than 24 h, for the forecasting procedure it still represents the window of the previous 24 h.
For the training set, the months from the 2005-2014 monthly timeseries dataset were used in order to forecast the values for the corresponding months of 2015 and 2016. For instance, in order to forecast the windspeed and solar irradiation for January of 2015 and January of 2016, the measurements for January in the years 2005-2014 were used to train the forecasting model. For every 24 h ahead forecast, the real measurements available until midnight of the previous day were used to train the forecasting models [59].
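The expanding-window procedure can be sketched as a short loop. A persistence forecast (repeat the previous 24 h) stands in for the forecasting model so that the example stays self-contained; the function names, the placeholder model, and the synthetic series are assumptions for illustration.

```python
import numpy as np

def walk_forward(series, n_train, horizon, fit, predict):
    """Expanding-window walk-forward validation.
    Before each forecast block, the model is fitted on all earlier observations."""
    forecasts, actuals = [], []
    for start in range(n_train, len(series) - horizon + 1, horizon):
        history = series[:start]                 # past observations only
        model = fit(history)
        forecasts.append(predict(model, history, horizon))
        actuals.append(series[start:start + horizon])
        # The next iteration's history includes this block's measured values.
    return np.array(forecasts), np.array(actuals)

# Placeholder model: persistence, i.e. tomorrow equals the last 24 h.
def fit_persistence(history):
    return None

def predict_persistence(model, history, horizon):
    return history[-horizon:]

# Ten days of perfectly periodic hourly data (synthetic).
series = np.tile(np.arange(24.0), 10)
forecasts, actuals = walk_forward(series, n_train=24 * 7, horizon=24,
                                  fit=fit_persistence, predict=predict_persistence)
print(forecasts.shape)
```

Because the split always respects time order, no observation from the forecast day or later is visible to the model when that day is predicted.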
The methodologies presented above for solar irradiation and windspeed medium-term forecasting with the recursive multistep forecast strategy and the multiple-output forecast strategy are each described formally by a forecasting equation, in which "ŷ" is the predicted value for hours "h", ..., "h − (k − 1)" (i.e., h − k + 1) of day "d"; y(h − k, d − 1), ..., y(h − 24, d − 1) are the historical measured values; "u_i" represents the other external inputs (i.e., air temperature, relative humidity, and global irradiance on the horizontal plane for windspeed forecasting, and air temperature, NDD(d), NDD(h,d), the number of days in the month, and the hour of the day for solar irradiation forecasting); and k is the time instant sliding index.
In Table 3, the configuration of each layer for each model used is presented. Concerning the data shapes of encoder-decoder LSTM, multi-channel CNN, and multi-head CNN, one sample consists of 24 timesteps (i.e., 24 h ahead), with three features for windspeed forecasting and five features for solar irradiation. The training dataset has 300 days (7200 h) or 310 days (7440 h) of data, so the shape of the training dataset is [7200/7440, 24, 3/5].
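The [samples, timesteps, features] layout can be illustrated by slicing a multivariate hourly table into input and target windows. This is a sketch under assumptions: windows are stepped one hour at a time, the target is column 0, and the data are synthetic. The exact number of samples depends on how the windows are stepped, so the first dimension below is not meant to reproduce the figure quoted above.

```python
import numpy as np

def to_supervised(data, n_in=24, n_out=24, target_col=0):
    """Turn an (hours, features) array into (samples, n_in, features) inputs
    and (samples, n_out) targets taken from `target_col`."""
    X, y = [], []
    for start in range(len(data) - n_in - n_out + 1):
        mid = start + n_in
        X.append(data[start:mid, :])                   # previous 24 h, all features
        y.append(data[mid:mid + n_out, target_col])    # next 24 h of the target
    return np.array(X), np.array(y)

# Ten days of hourly data with three features (placeholder values).
data = np.arange(240 * 3, dtype=float).reshape(240, 3)
X, y = to_supervised(data)
print(X.shape, y.shape)
```

For solar irradiation the last dimension would be 5 instead of 3, and the window length would follow the number of daylight hours retained after the night values are removed.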
The encoder-decoder LSTM model consists of two sub-models, the encoder and the decoder. The purpose of the encoder is to read and encode the input sequence; the decoder then reads the encoded input sequence and makes a one-step prediction for each element in the output sequence. After the encoder reads the input sequence, it produces a 200-element output vector (one output per unit) that captures features from the input sequence. First, this internal representation of the input sequence is repeated multiple times, once for each timestep in the output sequence. This sequence of vectors is passed to the LSTM decoder, which is defined as an LSTM hidden layer with 200 units. It is worth mentioning that the decoder outputs the entire sequence, not just the output at the end of the sequence, as was done with the encoder. This means that each of the 200 units outputs a value for each of the 24 h, representing the basis of what to predict for each hour in the output sequence. Then, a fully connected layer is used to interpret each timestep in the output sequence before the final output layer. It is important to note that the output layer predicts a single step in the output sequence, not all of the 24 h at a time.
In multi-head CNN, a different CNN sub-model reads each input with two convolutional layers with 32 filters and a kernel size of 3, a max pooling layer, and a flatten layer. The internal representations are concatenated before being interpreted by two fully connected layers of 200 and 100 nodes, respectively, and used to make a prediction.
In multi-channel CNN, a separate channel is linked to each input, similar to different image channels (e.g., red, green, and blue). A model that shows excellent performance consists of two convolutional layers with 32 filter maps and a kernel size of 3 followed by pooling, then another convolutional layer with 16 feature maps and pooling. The fully connected layer that interprets the features consists of 100 nodes.
The choice of hyperparameter values is of great importance [63][64][65][66]; for this reason, the well-known grid search method was adopted [49,67,68]. In this study, a grid search was performed over the number of prior inputs, the number of training epochs, the number of samples in each mini-batch, the optimizer type, the type of activation function, and the learning rate. In more detail, for the number of prior inputs, the set {6, 12, 24, 48} was examined; for the number of training epochs, the range {5-100} was examined; for the mini-batch size, the range {8-512} was examined; the optimizer types {RMSProp, ADAM, SGD, AdaGrad, AdaDelta, AdaMax, NADAM} were applied; the activation functions {Relu, Elu, Tanh, Sigmoid} were applied; and the learning rate took values within the range 10^-5 to 10^-1; see refs [49,67,68]. The grid search ended up with the optimal hyperparameters shown in Table 4.
In this research, 12 monthly models were applied for each deep learning technique for one-day-ahead solar irradiation and windspeed forecasting, and were developed with their corresponding optimal parameter configurations. Each model was run 20 times in order to obtain stable forecasting error statistics; this number of runs was found to be sufficient for the present work's case studies. The findings were then recorded as the mean values of the forecasting performance statistical metrics. Computations were carried out on a desktop computer with the following characteristics: 64 bit OS, CPU i5 2.30 GHz, and 8.00 GB of RAM. The forecasting run time for each test set was about 8 min.
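An exhaustive grid search of this kind can be sketched with `itertools.product`. The grid below is a reduced subset of the options listed above (the learning-rate points are sample values inside the stated range), and `evaluate` is a made-up placeholder for "train the model and return its validation error". Its optimum is arbitrary and says nothing about the values reported in Table 4.

```python
from itertools import product

grid = {
    "n_input": [6, 12, 24, 48],                 # set given in the text
    "optimizer": ["RMSProp", "ADAM", "SGD"],    # subset of the listed optimizers
    "activation": ["Relu", "Tanh"],             # subset of the listed activations
    "learning_rate": [1e-5, 1e-3, 1e-1],        # sample points within the range
}

def evaluate(config):
    """Hypothetical scoring function standing in for model training.
    Lower is better; the formula is invented for this example."""
    score = abs(config["n_input"] - 24) / 24 + abs(config["learning_rate"] - 1e-3)
    score += 0.0 if config["optimizer"] == "ADAM" else 0.1
    score += 0.0 if config["activation"] == "Relu" else 0.05
    return score

def grid_search(grid, evaluate):
    """Try every combination and keep the one with the lowest score."""
    keys = list(grid)
    best_config, best_score, n_tried = None, float("inf"), 0
    for values in product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        score = evaluate(config)
        n_tried += 1
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score, n_tried

best_config, best_score, n_tried = grid_search(grid, evaluate)
print(n_tried, best_config)
```

The cost grows multiplicatively with every added hyperparameter, which is why the epoch and batch-size ranges would normally be sampled at a few points rather than enumerated in full.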

Deep Learning Forecasting Performance Evaluation Using Well-Established Error Metrics
Having arrived at the optimal hyperparameters of the forecasting models, the results of windspeed and solar irradiation forecasting were evaluated using well-known forecasting error statistical metrics, which measure the deviation (error) between predicted and real (measured) values [1]. These metrics, which are used extensively to evaluate forecasting methods in such prediction problems, are shown in Table 5, where Y is the actual value and Ŷ is the forecasted value.
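As an illustration of such metrics, three common ones are sketched below in their standard definitions. Table 5 is not reproduced in this text, so the choice of MAE and RMSE alongside MAPE is an assumption; only MAPE is named explicitly in the discussion of the results. The sample values are invented.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))

def rmse(y, y_hat):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mape(y, y_hat):
    """Mean absolute percentage error, in percent. Undefined where y == 0."""
    return float(100.0 * np.mean(np.abs((y - y_hat) / y)))

y = np.array([100.0, 200.0, 400.0])       # actual values Y (toy data)
y_hat = np.array([110.0, 180.0, 400.0])   # forecasted values Ŷ (toy data)
print(mae(y, y_hat), rmse(y, y_hat), mape(y, y_hat))
```

MAPE divides by the actual value, so it inflates when measurements are close to zero, which is relevant for low windspeeds.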
In Figures 2-5, hourly solar irradiation predictions and hourly windspeed predictions are presented for July and November of 2016 for all the deep learning models applied in this study. The figures labeled with the letter 'a' (e.g., Figure 2a) refer to the recursive multistep forecast strategy, while the figures labeled with the letter 'b' refer to the multiple-output forecast strategy. In Figures 2-5, the time unit on the horizontal axes is the hour; because individual hours cannot be shown legibly, the axis intervals mark days, so each one-day interval contains 24 hourly values. The fluctuations in solar irradiation observed in Figure 3a,b are due to the cloudy weather during November, in contrast with Figure 2a,b, where the clear sky during July gives an almost periodic curve. In both Figure 4a,b and Figure 5a,b, both small and large variations in the windspeed were observed.
The average daily performance metrics for each of the three deep learning algorithms applied for each month of 2015 and 2016 for solar irradiation forecasting and windspeed forecasting are presented in Tables 6 and 7, respectively, in order to determine which method is more appropriate for solar irradiation and windspeed forecasting. In Tables 6 and 7, CNN1 and CNN2 refer to multi-head CNN and multi-channel CNN, respectively.
Concerning the three deep learning techniques, the encoder-decoder LSTM method showed improved forecasting performance for solar irradiation forecasting, while multi-head CNN (CNN1) gave higher success rates for windspeed forecasting according to the performance metrics shown above for both strategies. Comparing the recursive multistep forecast strategy with the multiple-output forecast strategy, the latter outperformed the former in all cases studied. Moreover, Table 6 clearly shows that, for solar irradiation forecasting, the deep learning models performed better in the summer months than in the remaining months of the year due to the absence of clouds, as expected. Encoder-decoder LSTM presents a strong competitive advantage, especially in the summer months, while in the remaining months it performs slightly better than CNN1 and CNN2. In Table 7, CNN1 performs a little better than encoder-decoder LSTM and CNN2 in all the months of the year for windspeed forecasting. Taking into account the increased variability of windspeed in contrast to solar irradiation and the 24 h forecasting horizon, the MAPE index values are justified (see similar results in refs [69][70][71]). Moreover, April and March are the windiest months of the year, which justifies the high MAPE index values of these months compared to the other months of the year.

Evaluation of Conventional Forecasting Performance Methods Using Error Metrics
Tables 8 and 9 present the average daily performance metrics for the two well-proven conventional methods examined (RegARMA and NARX) together with the deep learning technique that gave the more accurate forecasting performance for solar irradiation (i.e., encoder-decoder LSTM) and for windspeed (i.e., CNN1), respectively [72][73][74][75][76][77]. NARX is a nonlinear autoregressive exogenous model that has become popular in the last few years for its performance in timeseries forecasting problems, and RegARMA is a regression model with autoregressive-moving average (ARMA) timeseries errors.
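For reference, a regression model with ARMA errors is commonly written in the following standard form. The notation is introduced here for illustration: the exogenous input vector x_t, the coefficients β, and the orders p and q are not values reported in this section.

```latex
\begin{aligned}
y_t &= c + \mathbf{x}_t^{\top}\boldsymbol{\beta} + u_t,\\
u_t &= \sum_{i=1}^{p}\phi_i\,u_{t-i} + \varepsilon_t + \sum_{j=1}^{q}\theta_j\,\varepsilon_{t-j},
\end{aligned}
```

Here y_t is the forecast variable, u_t is the regression error that follows an ARMA(p, q) process, and ε_t is white noise.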
The architecture that was developed based on NARX is series-parallel. This architecture is used when the output of the NARX network is considered to be an estimate of the output of a nonlinear dynamic system. Specifically, the model was created with the following parameters: input delays (1:24), feedback delays (1:24), hidden layer size: 20, and training learning algorithm (Levenberg-Marquardt).
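The series-parallel (open-loop) arrangement can be sketched as follows: the regressors at hour t are the delayed exogenous inputs and the delayed measured outputs, rather than the network's own past predictions. This is a minimal sketch, not the authors' implementation; the function names `narx_open_loop_design` and `narx_forward`, the tanh activation, and the random weights are assumptions, and Levenberg-Marquardt training is not shown.

```python
import numpy as np

def narx_open_loop_design(y, x, input_delays=range(1, 25),
                          feedback_delays=range(1, 25)):
    """Build regressors for a series-parallel NARX model.

    Row for time t holds x[t-d] for each input delay d, followed by the
    *measured* y[t-d] for each feedback delay d. The target is y[t].
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a single exogenous series as one column
    max_d = max(max(input_delays), max(feedback_delays))
    rows, targets = [], []
    for t in range(max_d, len(y)):
        x_lags = np.concatenate([x[t - d] for d in input_delays])
        y_lags = np.array([y[t - d] for d in feedback_delays])
        rows.append(np.concatenate([x_lags, y_lags]))
        targets.append(y[t])
    return np.array(rows), np.array(targets)

def narx_forward(R, W1, b1, w2, b2):
    """One hidden layer with tanh units and a linear output (assumed)."""
    return np.tanh(R @ W1 + b1) @ w2 + b2

# Illustrative data: 100 hourly samples of one input and one output.
y_demo = np.arange(100, dtype=float)
x_demo = 10.0 * np.arange(100, dtype=float)
R, T = narx_open_loop_design(y_demo, x_demo)

# Untrained network with 20 hidden units, matching the stated layer size.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.01, size=(R.shape[1], 20))
b1 = np.zeros(20)
w2 = rng.normal(scale=0.01, size=20)
y_hat = narx_forward(R, W1, b1, w2, 0.0)
print(R.shape, T.shape, y_hat.shape)
```

With delays 1:24 on one input and one output, each regressor row has 48 entries, and the first usable target is the 25th sample.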
The inputs used for NARX and RegARMA were the same as those used in the deep learning techniques. Regarding the comparison of the conventional methods (Tables 8 and 9), NARX had slightly better performance than RegARMA for the majority of cases.
The comparison between these two categories of forecasting methods (conventional vs. deep learning, as presented in Tables 8 and 9) clearly showed the improved forecasting performance of the deep learning techniques in all of the cases presented and for both forecasting strategies (i.e., recursive multistep forecast strategy and multiple-output forecast strategy). Tables 10 and 11 compare the MAPE performance of the best-performing method in each category with respect to turbulence intensity (TI) and clearness index (CI). TI is defined as the ratio of the standard deviation of the fluctuating wind velocity to the mean windspeed, and it represents the intensity of wind velocity fluctuation [78]. CI is defined as the ratio of the monthly average daily irradiation on a horizontal surface to the monthly average daily extraterrestrial irradiation, and its value (which lies between 0 and 1) represents a measure of the clearness of the atmosphere: higher CI values appear under clear and sunny conditions, and lower CI values appear under cloudy conditions [54]. More specifically, Table 10 compares the performance improvement of CNN1 over NARX (i.e., the conventional method with the best average forecasting performance) with respect to the TI value for the windspeed data of 2015-2016. From Table 10, it can be seen that CNN1 tends to have lower MAPE values, with slight MAPE index improvement compared to NARX for the months with lower TI (i.e., July to September) and high MAPE index improvement for the months with higher TI (i.e., April and October). Table 11 compares the performance improvement of encoder-decoder LSTM over NARX (i.e., the conventional method with the best average forecasting performance) with respect to the CI value for the solar irradiation data of 2015-2016. Regarding Table 11, it can be seen that for months with higher CI (i.e., summer months), MAPE index improvement is significantly lower.

[...] conventional forecasting methods also examined. However, given the extremely large differences in the number of parameters and in the use of information between deep learning and conventional forecasting techniques, this result was somewhat expected. Finally, comparison of the recursive multistep forecast strategy versus the multiple-output forecast strategy was thoroughly performed.
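As a worked illustration of the two indices used in Tables 10 and 11, the definitions of TI and CI given in the text translate directly into code. This is a sketch under stated assumptions: the function names are hypothetical, the population standard deviation is used for TI, and the MAPE shown is the standard definition with zero-valued measurements (e.g., night-time irradiation) masked out, which may differ from the exact form listed in Table 5.

```python
import numpy as np

def turbulence_intensity(windspeed):
    """TI: standard deviation of windspeed divided by mean windspeed.
    Population standard deviation (ddof=0) is an assumption."""
    w = np.asarray(windspeed, dtype=float)
    return w.std() / w.mean()

def clearness_index(h_surface, h_extraterrestrial):
    """CI: monthly average daily irradiation on a horizontal surface
    divided by the monthly average daily extraterrestrial irradiation."""
    return h_surface / h_extraterrestrial

def mape(actual, forecast):
    """Mean absolute percentage error in percent.
    Zero-valued actuals are skipped to avoid division by zero (assumed)."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    mask = a != 0
    return 100.0 * np.mean(np.abs((a[mask] - f[mask]) / a[mask]))

# Illustrative numbers only, not measurements from the dataset.
ti_demo = turbulence_intensity([4.0, 6.0])
ci_demo = clearness_index(3.0, 6.0)
mape_demo = mape([100.0, 200.0, 0.0], [110.0, 180.0, 5.0])
print(ti_demo, ci_demo, mape_demo)
```

Grouping hourly MAPE values by the monthly TI or CI computed this way gives the kind of comparison summarized in Tables 10 and 11.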
The deep learning forecasting models presented in this paper, improved with the slight modifications proposed above, were shown to perform better than conventional deep learning and autoregressive methods [69][70][71][72][73]. Moreover, they can also be applied to forecasting the electric power generated by photovoltaic panels and wind turbines. It must be noted that errors of the measuring equipment were not taken into account. If their measurements are available, additional meteorological and site determination factors such as the amount of rain, azimuth for solar irradiation, wind direction, and the terrain's form and roughness for windspeed forecasting could also be considered for further improvement of forecasting performance. Accurate one-day-ahead forecasting of solar irradiation and windspeed constitutes the first indispensable module, together with the energy storage and management module, of a smart energy management system (SEMS) that optimizes the operation of a microgrid incorporating RES.

Figure 1. Encoder-decoder LSTM basic architecture. The decoder operates slightly differently during training and inference. During training, teacher forcing is used, which accelerates decoder training: the input to the decoder at each timestep is the actual (ground-truth) output from the previous timestep, whereas during inference the decoder's own previous prediction is used instead. The encoder transforms the input sequence into state vectors (known as thought vectors), which are then passed to the decoder in order to start output sequence generation according to the thought vectors. The decoder is essentially a language model conditioned on the initial states [61].

Figures 2-5. In the horizontal axes the time unit is 'hour', but this cannot be shown graphically; thus, the time interval appearing is 'day', so within each interval of 'one day', 24 hourly values are depicted. The fluctuations in solar irradiation observed in Figure 3a,b are due to the cloudy weather during November, in contrast with Figure 2a,b, where the clear sky during July gives an almost periodic curve. In both Figures 4a,b and 5a,b, both small and large variations in the windspeed were observed.

Table 3. Model configurations for windspeed and solar irradiation forecasting.

Table 4. Optimal hyperparameters of the models.

Table 5. The performance metrics used.

Table 6. Solar irradiation forecasting results: (a) average daily forecasting results for 2015 and 2016 with the recursive multistep forecast strategy; (b) average daily forecasting results for 2015 and 2016 with the multiple-output forecast strategy.

Table 7. Windspeed forecasting results: (a) average daily forecasting results for 2015 and 2016 with the recursive multistep forecast strategy; (b) average daily forecasting results for 2015 and 2016 with the multiple-output forecast strategy.

Table 8. Solar irradiation forecasting results: (a) average daily forecasting results for 2015 and 2016 with the conventional methods and the best deep learning technique via the recursive multistep forecast strategy; (b) average daily forecasting results for 2015 and 2016 with the conventional methods and the best deep learning technique via the multiple-output forecast strategy.

Table 9. Windspeed forecasting results: (a) average daily forecasting results for 2015 and 2016 with the conventional methods and the best deep learning technique via the recursive multistep forecast strategy; (b) average daily forecasting results for 2015 and 2016 with the conventional methods and the best deep learning technique via the multiple-output forecast strategy.

Table 10. CNN1 and NARX forecasting performance comparison: (a) windspeed average daily forecasting MAPE with respect to the turbulence intensity (TI) monthly average for years 2015-2016 via the recursive multistep forecast strategy; (b) windspeed average daily forecasting MAPE with respect to the turbulence intensity (TI) monthly average for years 2015-2016 via the multiple-output forecast strategy.

Table 11. LSTM and NARX forecasting performance comparison: (a) solar irradiation average daily forecasting MAPE with respect to the clearness index (CI) monthly average for years 2015-2016 via the recursive multistep forecast strategy; (b) solar irradiation average daily forecasting MAPE with respect to the clearness index (CI) monthly average for years 2015-2016 via the multiple-output forecast strategy.