Algorithms for Hyperparameter Tuning of LSTMs for Time Series Forecasting

The rapid growth in the use of Solar Energy for sustaining energy demand around the world requires accurate forecasts of Solar Irradiance to estimate the contribution of solar power to the power grid. Accurate forecasts for higher time horizons help to balance the power grid effectively and efficiently. Traditional forecasting techniques rely on physical weather parameters and complex mathematical models. However, these techniques are time-consuming and produce accurate results only for short forecast horizons. Deep Learning Techniques like Long Short Term Memory (LSTM) networks are employed to learn and predict complex varying time series data. However, LSTM networks are susceptible to poor performance due to improper configuration of hyperparameters. This work introduces two new algorithms for hyperparameter tuning of LSTM networks and a Fast Fourier Transform (FFT) based data decomposition technique. This work also proposes an optimised workflow for training LSTM networks based on the above techniques. The results show a significant fitness increase from 81.20% to 95.23% and a 53.42% reduction in RMSE for 90 min ahead forecast after using the optimised training workflow. The results were compared to several other techniques for forecasting solar energy for multiple forecast horizons.


Introduction
Over recent years, the focus has shifted to exploiting sustainable sources of energy. Forecasting Solar Irradiance several timesteps ahead can prove beneficial to power stations in grid management and in balancing conventional and non-conventional energy in the grid [1]. Machine Learning (ML), Neural Networks (NNs) and Stochastic Models like Autoregressive Integrated Moving Average (ARIMA) Models have been applied to learn and understand the patterns of variations in Solar Irradiance to forecast the production of Solar Energy. According to Wang et al., 2018 [2], there are 4 commonly used techniques for Solar Energy forecasting: Statistical, Physical, ML-based and Artificial Intelligence (AI) based methods. Physical Models are deterministic models which use lower atmospheric parameters to understand the data and its patterns before making the forecast. Thus the physical methods are more complex, time-consuming and less accurate for long-term forecasts. Statistical Methods use mathematical models to forecast and have been widely used for time series forecasting [3]. These models may be linear or non-linear but give good performance only for short-duration forecasts [4,5]. Colak et al., 2015 [6] employed the ARIMA model for forecasting Solar Radiation for up to 3 h ahead and obtained a mean absolute percentage error of 92.37%. The Seasonal ARIMA (SARIMA) model was used to forecast monthly average insolation around the regions of Delhi, India, obtaining an $R^2$ score of 92.93% in Shadab et al., 2020 [7]. It was observed that ML and AI methods produce good long-term forecasting results compared to statistical and physical methods, despite having no clear understanding of the data [8].
NNs perform better than traditional ML forecasting algorithms and techniques like ARIMA for time series forecasting [9,10]. Recurrent Neural Networks (RNNs) were used initially for forecasting, but RNNs are unable to learn the relevant information from input data when the input gap is large. Long Short Term Memory (LSTM) networks can handle long-term dependencies owing to their gate functions [11]. LSTMs form a neural network with the nodes unrolled over time and allow the network to take advantage of previous calculations. LSTMs are used to forecast Solar Energy because of the intermittent nature of solar energy [12,13].
LSTM Networks are highly configurable through several hyperparameters. Choosing the correct set of hyperparameters for the network is crucial because it directly impacts the model's performance. According to Bischl et al., 2021 [14], the brute force search for hyperparameters is time-consuming and irreproducible for different runs of the model. This highly unreliable trial-and-error technique slows the entire machine-learning process pipeline. Falkner et al., 2018 [15] explore several techniques like Bayesian optimisation and bandit-based methods in the domain of hyperparameter tuning, providing a practical solution for several desired properties of ML models: strong anytime performance, strong final performance, effective use of parallel resources, scalability, robustness and flexibility. Hyperparameter Tuning is the process of finding the optimal set of hyperparameters which generate a network with maximum performance. Hyperparameter Tuning and Feature Selection should be used alongside each other, as this helps in minimising the ratio of resources allocated to the performance obtained from the model [16]. Hyperparameter Tuning algorithms usually work on balancing the following two factors which affect the optimisation power of the algorithm:

• Trade-off between the exploration and exploitation of the search space of hyperparameters. This affects the bias of the algorithm to perform a local search near the current best-selected search agents or perform a random search to attain a new set of search agents. Such trade-offs are generally affected by resource constraints.

• Trade-off between inference and search of hyperparameters. The algorithm generates a new set of hyperparameters from the currently available set of hyperparameters either by searching for better hyperparameters or inferring from the existing set. This refers to the exploitation characteristic of the algorithm.

The Genetic Algorithm is an optimisation algorithm based on the evolution principle found in nature. The algorithm consists of 6 fundamental steps: Population Initialisation, Fitness Evaluation, Termination Condition check, Random Selection, Breeding or Crossover, and Random Mutation. Population Initialisation is responsible for generating the initial population which has to be optimised. Fitness Evaluation finds out the fitness of each member of the population using the objective or the fitness function. This fitness value helps in obtaining the best member out of the population. The Random Selection step ensures that two unique members are chosen randomly (following a uniform random distribution pattern) for breeding. The Breeding or Crossover step is responsible for generating children from the randomly chosen parent members. These children may randomly mutate in the Random Mutation step and are added to the population thereafter. Finally, the algorithm terminates if the number of generations exceeds the specified value or any termination condition mentioned has been satisfied [17]. Gorgolis et al., 2019 [17] also explore the use of the Genetic Algorithm for tuning the hyperparameters for LSTM Network Models and use an n-dimensional configuration space for hyperparameter optimisation, where n is the number of configurable hyperparameters of the network. LSTMs are highly sensitive towards network parameters like the number of hidden layers, number of cell units in each layer, activation functions, size of history time input and so on, which drastically affect the model's performance. Genetic Algorithm based optimisers have also been explored in several other time-series forecasting problems (like stock market prediction) using LSTMs and have produced good results [18].
Heap Based Optimiser (HBO) has been used extensively for Feature Selection for Time Series Applications using LSTMs [19] and for extraction of parameters of various models [20,21], while works like Ginidi et al., 2021 [22] use HBO to solve complex optimisation problems. HBO uses a heap data structure to organise the input data population and applies the Corporate Ranking Hierarchy (CRH) to it. CRH ensures that the algorithm does not get stuck on local optima, which is a common problem faced by several optimisation algorithms [23].
This paper makes the following contributions:

• Decomposing Time Series Data using Fast Fourier Transform to extract logical and meaningful information from the raw data.

• Two algorithms and an optimized workflow for tuning hyperparameters of LSTM networks using HBO and GA have been designed and developed for potential operational-ready applications.

• The effect of two different optimizers on LSTM Networks and the effect of data decomposition on the network's forecast performance and efficiency have been analyzed and inter-compared.

• The optimized LSTM Network has been evaluated and compared with the default network using 90 min ahead forecasts. The impact and the necessity of hyperparameter tuning are highlighted and depicted through the comparisons performed.

The rest of the paper is structured as follows. Section 2.1 describes the data processing involved in the system, which contains feature selection, data scaling and supervised time series conversion in Sections 2.1.1 and 2.1.2 and the data decomposition in Section 2.1.3. Section 2.2 describes the Optimiser Design for Hyperparameter Tuning, which consists of Section 2.2.1 elucidating HBO and Section 2.2.2 describing Genetic Algorithm Based Optimiser (GAO). The results and observations are presented in Section 3. Conclusions are presented in Section 4.

Data Processing
This paper used raw data similar to the data used in Kumar et al., 2022 [24]. The collected raw data contained 154,277 entries of solar irradiance (W m$^{-2}$) received from 22 different locations of a solar power plant in Southern India and the net irradiance received. The raw data was timestamped, with records at an interval of 5 min. Each of the 22 inputs is labelled as a feature and the net irradiance is labelled as output. The desired output of the forecast was the 18th point from the current one, i.e., a 90 min ahead forecast (18 × 5 min). The raw data was further processed as described in the following sections.

Feature Selection and Data Processing
Feature Selection is done after plotting each of these features individually and in a correlation matrix to understand their influence on the data and their relationships with the target variable as well as with other variables.The data contained input features which were highly correlated with the output as well as with each other.Hence, all of the features were selected to minimise the loss of information after feature selection.To avoid bias towards the features having higher numerical values, each of the features was scaled to the range [0, 1].This scaling is applied to the input as well as the output to maintain the original relation between them.This scaling is inverted after the forecasting process to obtain the actual value of the output instead of the normalised value.Also, the NULL values were filled with the average value of the feature to avoid deviancy in data and retain the feature correlation with others.Figure 1 shows the correlation matrix obtained for the raw data.
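The scaling and gap-filling steps described above can be sketched as follows. This is a minimal illustration assuming NumPy and a small made-up array; the helper names (`fill_nan_with_column_mean`, `minmax_fit`, `minmax_transform`, `minmax_inverse`) are hypothetical and are not taken from the paper's implementation.

```python
import numpy as np

def fill_nan_with_column_mean(data):
    """Replace missing (NaN) entries with the mean of their feature column."""
    data = np.asarray(data, dtype=float).copy()
    col_means = np.nanmean(data, axis=0)
    rows, cols = np.where(np.isnan(data))
    data[rows, cols] = col_means[cols]
    return data

def minmax_fit(data):
    """Return the per-column minimum and range used for [0, 1] scaling."""
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # guard against constant columns
    return col_min, col_range

def minmax_transform(data, col_min, col_range):
    """Scale every column to the range [0, 1]."""
    return (data - col_min) / col_range

def minmax_inverse(scaled, col_min, col_range):
    """Undo the scaling to recover values in the original units."""
    return scaled * col_range + col_min

# Made-up example: two features, one missing value.
raw = np.array([[100.0, 5.0], [np.nan, 7.0], [300.0, 9.0]])
filled = fill_nan_with_column_mean(raw)
col_min, col_range = minmax_fit(filled)
scaled = minmax_transform(filled, col_min, col_range)
restored = minmax_inverse(scaled, col_min, col_range)
```

Keeping `col_min` and `col_range` from the fitting step is what allows the forecast to be mapped back to irradiance units after prediction.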

Conversion to Supervised Time Series
An LSTM network requires data in a supervised time series format. A supervised data set implies that the input features and the output variable have to be separated. A time series implies that data must form a series with its terms separated over time. The output variable is assumed to be influenced by previous data of all the input features as well as the previous values of the output variable itself. The steps followed to convert the data to supervised time series are as follows:

1. Column Axis Shifting to generate Time Series: All the input features were shifted by 18 to 1 records on the column axis to create a time series of length 18 points, $x(t-18), x(t-17), \ldots, x(t-1)$. The past value of the output variable is also considered as an input feature and forms the input pattern similar to one observed in Jursa and Rohrig, 2008 [25]. The output variable is shifted forward by 18 points, i.e., output(t + 18) is the actual time series output which is to be forecasted.

2. Reshaping the Data for LSTM Network: After creating the time series, the data consisted of 18 × 23 = 414 input features and 1 output column. This input is two-dimensional (samples × 414) and is not compatible with LSTM networks. The input data needs to be reshaped into a three-dimensional form (samples, 18 time steps, 23 features).
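The two steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the paper's code: the function name `make_supervised` is hypothetical, the window is taken as the 18 most recent records, and the target is placed `horizon` records after the last record of the window (a literal reading of the shifts in the text may differ by one record).

```python
import numpy as np

def make_supervised(data, target_col, window=18, horizon=18):
    """Build (samples, window, variables) inputs and horizon-ahead targets.

    data: 2-D array of shape (records, variables), the output column included.
    """
    data = np.asarray(data, dtype=float)
    n_samples = data.shape[0] - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("not enough records for this window and horizon")
    # Each sample is a block of `window` consecutive records.
    X = np.stack([data[i:i + window] for i in range(n_samples)])
    # The target lies `horizon` records after the last record of each window.
    first_target = window - 1 + horizon
    y = data[first_target:first_target + n_samples, target_col]
    return X, y

# Made-up data: 50 records of 22 features plus the output in column 22.
data = np.arange(50 * 23, dtype=float).reshape(50, 23)
X, y = make_supervised(data, target_col=22)
flat = X.reshape(len(X), -1)  # the flat 414-column layout
```

Here `X` has the three-dimensional shape expected by recurrent layers, while `flat` corresponds to the 18 × 23 = 414 column layout before reshaping.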

Data Decomposition
The data was decomposed into two components: High Frequency (HF) Data and Low Frequency (LF) data, using the Fourier Transform to extract long-term trends and other information which would aid the LSTM network for forecasting. The data was analysed using the Fast Fourier Transform technique [26] to obtain an optimal decomposition. FFT is an efficient approach to calculating the Fourier Transform of a signal. FFT considers any complex signal as a combination of several sine waves with varying amplitudes and frequencies [27]. The optimal frequency for splitting the signal into two components was obtained by minimising the difference of $R^2$ scores of LF and HF data with the original raw signal. Figure 2 shows the FFT plot obtained for the data. Section 3.1 shows the results obtained after data decomposition.
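A frequency-domain split of this kind can be sketched as below. The code is an assumption-laden illustration, not the paper's implementation: it uses a hard cutoff on the real FFT, keeps the mean (zero-frequency term) in the LF component, and reads the selection criterion as minimising the absolute difference between the two $R^2$ scores. The names `fft_split`, `r2_score` and `best_cutoff` and the synthetic signal are hypothetical.

```python
import numpy as np

def fft_split(signal, sample_interval_s, cutoff_hz):
    """Split a signal into low- and high-frequency parts at cutoff_hz."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=sample_interval_s)
    low_spectrum = np.where(freqs <= cutoff_hz, spectrum, 0.0)
    high_spectrum = spectrum - low_spectrum
    low = np.fft.irfft(low_spectrum, n=signal.size)
    high = np.fft.irfft(high_spectrum, n=signal.size)
    return low, high

def r2_score(actual, predicted):
    """Coefficient of determination of `predicted` against `actual`."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def best_cutoff(signal, sample_interval_s, candidates):
    """Pick the candidate cutoff with the smallest gap between R^2 scores."""
    def gap(cutoff):
        low, high = fft_split(signal, sample_interval_s, cutoff)
        return abs(r2_score(signal, low) - r2_score(signal, high))
    return min(candidates, key=gap)

# Synthetic stand-in: a daily swing plus noise, sampled every 300 s.
rng = np.random.default_rng(0)
t = np.arange(2048) * 300.0
signal = 500 + 300 * np.sin(2 * np.pi * t / 86400) + 20 * rng.standard_normal(t.size)
low, high = fft_split(signal, 300.0, 0.000225)
chosen = best_cutoff(signal, 300.0, [0.0001, 0.000225, 0.0005])
```

Because the split is linear, the two components always sum back to the original signal, so no information is lost by the decomposition itself.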

Optimiser Design
Hyperparameter Tuning can be considered an optimization problem with the objective of improving the model's forecasting performance by changing the hyperparameters.This problem needs to be tackled in a systematic way which would assure proper tuning with optimal space and time complexities.The HBO proposed in Askari et al., 2020 [23] has been adapted into several works to solve numerous optimization problems and has performed considerably well.Similarly, the GA has also been adapted into different works to solve similar optimization problems.Thus, both of these algorithms have matured in this domain making them fit for hyperparameter tuning.Inspired by HBO and GA, this work proposes custom versions of HBO and GAO for hyperparameter tuning of LSTM networks.These optimiser algorithms have been explained in the following sections.

Heap Based Optimiser
The Heap Based Optimizer updates the search space of the next best node from the current node using the most impacting neighbour node. The search space of the next best node from the current node is updated according to the following influences:

• Impact of the immediate parent or the superior node: It implies that the next optimal point may be located in the neighbourhood of the parent node and hence the search area is modified to match the neighbourhood of the parent heap node.

• Impact of cousin nodes or colleagues: Nodes at the same level as the current node balance the exploration and exploitation of the search space for the optimal solution.

• Impact of the heap node on itself (Self-contribution): The node has some effect on itself for the next iteration and remains unchanged for the particular iteration.

In the HBO update equations, $x_i^k(t)$ is the $k$th component of the $i$th node in the current iteration and $x_i^k(t+1)$ is the updated component. $\lambda^k$ is the $k$th component of a randomly generated vector $\vec{\lambda}$. $\gamma$ is an optimiser parameter which helps in escaping local optima and exploiting the region around it. $B^k$ refers to the $k$th component of the parent/superior node. $S_r^k$ refers to the $k$th component of a randomly selected cousin/same-level node. $f(\vec{x}_i(t))$ is the objective function and $p_1$, $p_2$, $p_3$ are probabilities calculated from $t$, the current iteration number, and $T$, the maximum number of iterations.
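The three influences listed above can be illustrated with a short sketch of a single component update. The piecewise rule and the probability schedule below are an assumption based on the general form of the original HBO formulation [23] and should be checked against that reference; $\gamma$ is passed in as a plain parameter rather than computed, and the function names are hypothetical.

```python
import random

def hbo_probabilities(t, T):
    """Selection probabilities for iteration t of T (assumed schedule)."""
    p1 = 1.0 - t / T
    p2 = p1 + (1.0 - p1) / 2.0
    p3 = p2 + (1.0 - p1) / 2.0  # always evaluates to 1
    return p1, p2, p3

def hbo_update_component(x, parent, colleague, colleague_is_better,
                         gamma, t, T, rng=random):
    """Return the updated value of one component of a heap node.

    x: current component, parent: component B^k of the superior node,
    colleague: component S_r^k of a random same-level node.
    """
    p1, p2, _ = hbo_probabilities(t, T)
    p = rng.random()
    lam = 2.0 * rng.random() - 1.0  # random factor in [-1, 1)
    if p <= p1:
        return x  # self-contribution: component stays unchanged
    if p <= p2:
        return parent + gamma * lam * abs(parent - x)  # follow the parent
    if colleague_is_better:
        return colleague + gamma * lam * abs(colleague - x)  # move towards colleague
    return x + gamma * lam * abs(colleague - x)  # explore around itself
```

Early iterations mostly leave components unchanged (large $p_1$), while later iterations increasingly follow the parent or a colleague. For a discrete hyperparameter space the returned value would still need rounding and clipping to a valid index, which this sketch leaves out.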
Based on HBO, this work proposes HBO-algorithm to tune the hyperparameters of an LSTM network. The algorithm is enumerated below.

1. Initialise the algorithm with the optimiser parameters like Initial Population Size (Heap Size) and Maximum Iterations (T).

2. Randomly generate the initial population and build the heap structure of Configuration Vectors (Section 2.3).

3. Update each component of the configuration vector using HBO's update policy. After updating all the components in the configuration vector, a new configuration vector that implies a new possible LSTM network has been obtained.

4. Repeat step 3 until T iterations are done.

After the termination of the algorithm, the apex node of the heap denotes the configuration vector which generates the LSTM network with the highest performance. Zhang and Wen, 2022 [28] explore 3 different node updating policies for HBO, which shows the versatility of the algorithm. Figure 3 summarises the algorithm as a flowchart.

Genetic Algorithm Based Optimiser
Based on GA [17,18,29,30], this work proposes GAO-algorithm for tuning hyperparameters of an LSTM network. The algorithm is enumerated below.

1. Configure the optimiser with its parameters like the number of generations, population size, mutation chance and random selection chance.

2. Generate the initial population and evaluate the fitness of each of the configuration vectors in the population.

3. Breed children from two randomly selected parents and either evolve or mutate them to obtain new configuration vectors based on GA parameters Mutation Chance and Random Select Chance. The parents are selected using a uniform random distribution.

4. Add the children obtained to the population and sort the population based on fitness.

5. Repeat steps 3-4 until the required number of generations is reached.

After the termination of the algorithm, the first element in the population denotes the configuration vector which generates the LSTM network with the highest performance. Figure 4 summarises the algorithm in a flowchart.
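The steps above can be sketched in a few lines of plain Python. This is a simplified illustration, not the GAO-algorithm itself: configuration vectors are tuples of indices, one child is bred per generation, the population is truncated to its original size after sorting, and the Random Select Chance parameter is omitted because its exact role is not described here. The names `gao_sketch` and `toy_fitness` and the toy objective are hypothetical stand-ins for training and scoring an LSTM network.

```python
import random

def gao_sketch(space_sizes, fitness, generations=5, population_size=8,
               mutation_chance=0.2, seed=0):
    """Evolve index vectors; returns the population sorted best-first."""
    rng = random.Random(seed)

    def random_vector():
        return tuple(rng.randrange(n) for n in space_sizes)

    # Step 2: initial population, evaluated and sorted by fitness.
    population = sorted((random_vector() for _ in range(population_size)),
                        key=fitness, reverse=True)
    for _ in range(generations):
        # Step 3: two distinct parents chosen uniformly at random.
        mother, father = rng.sample(population, 2)
        child = [rng.choice(pair) for pair in zip(mother, father)]
        if rng.random() < mutation_chance:
            k = rng.randrange(len(child))
            child[k] = rng.randrange(space_sizes[k])  # mutate one gene
        # Step 4: add the child, sort by fitness, keep the size fixed.
        population.append(tuple(child))
        population.sort(key=fitness, reverse=True)
        population = population[:population_size]
    return population

# Toy objective: closeness to a made-up target vector (higher is better).
space_sizes = (6, 4, 7, 10, 6)
target = (3, 1, 4, 7, 2)

def toy_fitness(vector):
    return -sum(abs(a - b) for a, b in zip(vector, target))

result = gao_sketch(space_sizes, toy_fitness)
best = result[0]
```

As in the description, the first element of the returned population is the best configuration vector found.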

Hyperparameter Configuration Space Setup
The hyperparameter configuration space is an n-dimensional functional space which contains the set of all possible combinations of the hyperparameters for the given network. A configuration vector is a vector in the hyperparameter configuration space which represents a unique set of hyperparameters of an LSTM network. The hyperparameter configuration space can become very extensive and may cause the system to become resource intensive. Hence we defined a configuration space for the data used with five hyperparameters having 6, 4, 7, 10 and 6 candidate values respectively. The total number of possible configuration vectors in this configuration space is

$$\text{Total Configuration Vectors} = 6 \times 4 \times 7 \times 10 \times 6 = 10{,}080.$$

Hence, over 10,000 different LSTM networks can be generated from this configuration space. Training all of these networks would be time-consuming and inefficient; hence HBO-algorithm and GAO-algorithm are required to find the optimal configuration vector.
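The size of such a space, and the mapping from a configuration vector to concrete hyperparameters, can be sketched as follows. The hyperparameter names and candidate values are hypothetical placeholders chosen only so that the dimension sizes match 6, 4, 7, 10 and 6; they are not the values used in this work.

```python
from itertools import product
from math import prod

# Hypothetical placeholder space; only the sizes (6, 4, 7, 10, 6) follow the text.
space = {
    "hidden_layers": [1, 2, 3, 4, 5, 6],
    "activation": ["relu", "tanh", "sigmoid", "elu"],
    "units": [8, 16, 32, 64, 128, 256, 512],
    "epochs": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    "batch_size": [16, 32, 64, 128, 256, 512],
}

# Number of distinct configuration vectors: product of the dimension sizes.
total = prod(len(values) for values in space.values())

def decode(index_vector):
    """Map a vector of indices to a dictionary of hyperparameter values."""
    return {name: values[i]
            for (name, values), i in zip(space.items(), index_vector)}

# Exhaustive enumeration is possible but is exactly what tuning tries to avoid.
enumerated = sum(1 for _ in product(*space.values()))
example = decode((0, 1, 2, 3, 4))
```

Representing a configuration vector as a tuple of indices keeps the optimisers independent of what each dimension actually means.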

LSTM Network Properties
After processing the data (Sections 2.1.1-2.1.3) and designing the algorithms for tuning the hyperparameters (Sections 2.2.1 and 2.2.2), the next step was to define the LSTM Network's properties. This is divided as follows:

Section 2.4.1: Network Generation using the Configuration Vector.
Section 2.4.2: LSTM Network's Performance Parameters.
Section 2.4.3: Optimised LSTM Network Training Setup.

Network Generation
LSTM Network is generated from a configuration vector by using the hyperparameters represented by the vector.Each component of the configuration vector represents a parameter required to construct and train the LSTM network.

Network Performance Parameters
Network performance parameters are the measures used to analyse and compare the LSTM networks. The following parameters were used in the system:

• Root Mean Squared Error (RMSE)

• Mean Squared Logarithmic Error (MSLE)

• Model Fitness: Model Fitness is a custom metric designed to give a balanced $R^2$ Score in the range of [−100, 100]. Model Fitness was used as the objective/fitness function in both HBO-algorithm and GAO-algorithm for updating the population.

• Iteration Fitness: The Hyperparameter Tuning algorithms (Sections 2.2.1 and 2.2.2) carry out their operations in multiple steps which are repeated a certain number of times. Each such cycle is known as an Iteration. Hence, Iteration Fitness is defined as the average Model Fitness of the entire population for any particular iteration. This metric helps to understand the performance of the algorithm in improving the Model Fitness of the population collectively.

$$\text{Iteration Fitness} = \frac{1}{n(\text{population})} \sum_{i=1}^{n(\text{population})} \text{Model Fitness}_i$$

where $\text{Model Fitness}_i$ refers to the fitness of the $i$th configuration vector in the population and $n(\text{population})$ is the total number of elements in the population.
In Equations (3)-(6), $f_i$ represents the forecast obtained from the model and $\hat{f}_i$ refers to the actual solar irradiance value. $N$ refers to the total length of the forecast made.
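For reference, RMSE, MSLE and Iteration Fitness can be computed as in the sketch below. RMSE and MSLE follow their standard textbook definitions (MSLE using $\log(1 + x)$), which are assumed to match the definitions used here; Model Fitness is deliberately left out because its formula is not given in this text. The function names are hypothetical.

```python
import math

def rmse(actual, forecast):
    """Root Mean Squared Error over N paired values."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)

def msle(actual, forecast):
    """Mean Squared Logarithmic Error; inputs must be greater than -1."""
    n = len(actual)
    return sum((math.log1p(a) - math.log1p(f)) ** 2
               for a, f in zip(actual, forecast)) / n

def iteration_fitness(model_fitness_values):
    """Average Model Fitness of the whole population in one iteration."""
    return sum(model_fitness_values) / len(model_fitness_values)
```

For example, `iteration_fitness([80, 90, 100])` averages three population members to 90.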

Optimised LSTM Network Training Setup
The entire data set consisted of 414 columns of time series data with an interval of 5 min between each record.The dataset was divided into a 75-25% (3:1) training-to-testing split ratio.Finally, Python (and its libraries) was used to process the input data, split the data into HF and LF components, design and develop the hyperparameter tuning algorithms and define the hyperparameter configuration space.Python-Keras was used to generate, train and test the LSTM networks.Once the LSTM Network properties have been defined, the next step was to set up the training process using the hyperparameter tuning algorithms designed in Sections 2.2.1 and 2.2.2.Before starting with the training of the network the optimiser must be configured with its parameters to aid it in finding the optimal hyperparameters.The first block of the flowcharts in Figures 3 and 4 represents this step.No network is trained explicitly by the system to reduce the time required to search the optimal hyperparameters.After randomly generating the initial population (Step 2 in algorithms in Sections 2.2.1 and 2.2.2), the networks are trained only when the algorithm requires the Model Fitness of the network to evaluate its position and viability in the population.This helps in avoiding unnecessary training of multiple networks while tuning the hyperparameters of the network, which saves time and resources.
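The idea of training a network only when its Model Fitness is actually requested can be sketched as a thin wrapper around the training routine. Caching results for repeated configuration vectors is an added assumption, not something stated above, and `LazyFitness` and the stand-in scoring function are hypothetical names.

```python
class LazyFitness:
    """Train and score a configuration only on first request, then reuse it."""

    def __init__(self, train_and_score):
        self._train_and_score = train_and_score  # expensive callable
        self._cache = {}
        self.trainings = 0  # number of networks actually trained

    def __call__(self, config_vector):
        key = tuple(config_vector)
        if key not in self._cache:
            self.trainings += 1
            self._cache[key] = self._train_and_score(key)
        return self._cache[key]

# Stand-in for building, training and evaluating an LSTM network.
fitness = LazyFitness(lambda vector: float(sum(vector)))
first = fitness((1, 2, 3))
again = fitness((1, 2, 3))   # served from the cache, no retraining
other = fitness((2, 2, 3))
```

Passing such an object as the fitness function of either optimiser means that configuration vectors which are generated but never compared are never trained.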

Results and Discussions
After processing the data (Section 2.1), designing optimisers (Section 2.2), setting up hyperparameter configuration space (Section 2.3), setting the LSTM Network Properties and the optimised training process for the LSTM networks (Section 2.4), the next step was evaluation and analysis of the entire process and its outcome.This section is divided into the following sub-sections:

Comparison of Frequencies for Data Decomposition
In Section 2.1.3, the procedure for analysing the data using FFT and splitting the data into HF and LF components is defined. Figure 2 shows the FFT plot for the raw data. The frequencies are very low in value because the digital signal is sampled every 5 min or 300 s. Hence the sampling frequency is 1/300 samples/second. To identify the frequency which would best split the data into two unique components, the $R^2$ Score between the original signal and the HF and LF signals was plotted for different frequencies. Figure 5 shows the plot of $R^2$ Score versus different frequencies for HF and LF components. Table 1 enumerates the same for a wider range of frequencies. After a detailed analysis of the results obtained, it was found that 0.000225 Hz was the best frequency to split the data. The HF and LF components post data decomposition using FFT with a frequency of 0.000225 Hz are shown in Figure 6.
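To put these frequencies in context, the following short calculation converts the sampling interval and the chosen cutoff into more familiar units. Only the 300 s interval and the 0.000225 Hz cutoff come from the text; the Nyquist limit and the cutoff period are derived here.

```latex
\begin{align*}
f_s &= \frac{1}{300\ \mathrm{s}} \approx 3.33 \times 10^{-3}\ \mathrm{Hz}, &
f_{\mathrm{Nyquist}} &= \frac{f_s}{2} \approx 1.67 \times 10^{-3}\ \mathrm{Hz}, \\
T_c &= \frac{1}{0.000225\ \mathrm{Hz}} \approx 4444\ \mathrm{s} \approx 74\ \mathrm{min}, &
\frac{T_c}{300\ \mathrm{s}} &\approx 14.8\ \text{samples}.
\end{align*}
```

Under this reading, variations slower than roughly 74 min fall into the LF component and faster ones into the HF component.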
These HF and LF components correspond to two different aspects of the input signal. The LF component can be perceived as the daily trend (daily swing) in solar irradiance. Thus, it helps in understanding the approximate change in irradiance as the day progresses. The HF component can be perceived as random changes in solar parameters (like illumination, cloud cover, attenuation, etc.) which affect the irradiance constantly. However, no relevant works were found to support these arguments.

Comparison of Optimisers
The LSTM Networks were trained using the algorithms designed in Sections 2.2.1 and 2.2.2.The performance of the algorithms over each iteration was compared using the Iteration Fitness metric defined in Section 2.4.2.Iteration Fitness for each iteration was obtained from both algorithms and compared to observe the improvement in the fitness of the entire population as the algorithm processed it in successions.Figure 7 shows the Iteration Fitness plot for HF, LF and Original data sets.
From Figure 7 it can be observed that the algorithms significantly improve the average fitness of the population of models. HBO-algorithm uses a carefully designed parameter C which is calculated as C = T/25 [23]. Hence it assumes that the Total Number of Iterations is well over 25, and HBO's performance may be undermined by the limited number of iterations used here (5 iterations). However, the number of iterations must be limited on the upper end as well. Similar to Bischl et al., 2021 [14], it was observed that long hyperparameter tuning algorithm runs may lead to biased performance estimators and to choosing an incorrect set of hyperparameters. Longer runs also imply that a higher amount of resources must be allocated. Hence, the number of iterations must be chosen carefully to avoid either of the above-mentioned conditions.
The performance of these optimisers is also compared by using networks trained with 3 different input time series data: High Frequency Data, Low Frequency Data and Original Data. These time series data were obtained from the procedures in Sections 2.1.3 and 3.1. Table 2 shows the error comparisons between the optimisers in detail. The HF data, decomposed from the Original data, gave similar results in all three cases presented in Table 2. This phenomenon can be attributed to the highly varying nature of the data (shown in Figure 6a). Each of the models used in the study forecasted HF data close to the mean value, as shown in Figure 8a.

Effect of Data Decomposition versus Tuning Algorithms
It was also important to understand the effect of data decomposition along with the effect of different tuning algorithms. Furthermore, it is crucial to compare these effects against each other to ensure that the best possible effect on the network is obtained and hence the best possible performance from the network is achieved. This section details the effect of data decomposition against the effect of tuning algorithms using the fitness function defined in Section 2.4.2, Model Fitness. Table 3 shows the Model Fitness based comparison. It was observed that the model fitness for High-Frequency Data is much lower than what was obtained for Low Frequency and Original data sets, similar to the results obtained in Section 3.2. The results obtained in Table 3 have been analysed using previous works done for short-term and long-term solar energy forecasting. In Fentis et al., 2017 [31], Feed Forward Neural Networks trained with the Levenberg-Marquardt algorithm were used for 15 min ahead short-term forecasts, achieving a best $R^2$ Score of 0.96 (equivalent to 96% Model Fitness). In Elsaraiti and Merabet, 2022 [32], LSTMs have been employed to obtain 30 min ahead forecasts with an $R^2$ Score of 0.745 (equivalent to 74.5% Model Fitness). Serttas et al., 2018 [33] proposed and implemented the Mycielski-Markov model, achieving an $R^2$ Score of 0.8749 (equivalent to 87.49% Model Fitness). In Haider et al., 2022 [34], Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN) and LSTM have been used to forecast 1 h, 3 h, 6 h and 12 h ahead. Haider et al., 2022 [34] achieved the highest $R^2$ Scores of 0.984 (98.4%) and 0.913 (91.3%) for 1 h and 3 h ahead forecasts using LSTM. LSTM struggled to perform adequately for forecast horizons of 6 and 12 h ahead, obtaining low $R^2$ Scores of less than 0.5 (50%). ANN and CNN performed similarly to LSTM for 1 h and 3 h ahead forecasts but performed significantly better for 6 h and 12 h ahead forecast horizons, achieving consistent $R^2$ Scores above 0.8 (80%). According to the survey conducted by Lai et al., 2020 [35], the average $R^2$ Score obtained in the solar energy forecasting literature was 0.9240 (92.4%). Liu et al., 2022 [36] used an LSTM fusion model for Leaf-Area Index (LAI) and obtained the highest $R^2$ score of 0.73 (73.0%) between retrieved and aggregated reference LAI on multiple ground-measured dates in 2016. Along similar lines, the HBO-algorithm LSTM Network trained with decomposed data in our work achieved the best Model Fitness of 95.278% for 90 min ahead forecasts. However, the Model Fitness (and hence $R^2$ Scores) was found to vary with the nature of data input to the system and hence is data-dependent, similar to the conclusions drawn in [32-35].
To study these effects even further, Figure 8 shows the forecasted values by the networks generated against the actual values.
From Tables 2 and 3, it can be observed that High Frequency Data proved to be difficult to forecast. It can be seen that HBO-algorithm gave a better network for more data inputs. The following observations were made from the experiments carried out:

1. High-Frequency Data produces the lowest amount of errors (from Table 2) but this can be misleading. After checking the Model Forecast Plot in Figure 8 and Model Fitness from Table 3, it was verified that the High-Frequency Data causes the system to forecast the mean value of the data, which explains the low error alongside a flat line forecast and very low model fitness.

2. Low-Frequency Data produces the best Model Fitness (from Table 3). This can be verified using Figure 8.

3. Original Data produces the highest error (amongst the 3 data sets). This is explained by the fact that original data contains characteristics from both high-frequency and low-frequency data. The Model Fitness is also a little lower than for Low-Frequency Data but much higher than for High-Frequency Data.

Further Discussions
The analysed and assessed workings and results indicate that the optimised training process based on Tuning Algorithms/Optimisers presented in this paper along with Data Decomposition improves the LSTM network's performance significantly.In comparison to the references and previously done works, HBO-algorithm optimised LSTM Network with decomposed data offered reductions in RMSE by 53.42%, 78.30% in MSE and 79.07% in MSLE as compared to the default LSTM network (without data decomposition and no optimiser applied to it).It achieved the highest fitness of 95.278% for 90 min ahead forecast.GAO-algorithm LSTM Network with data decomposition also gave promising results slightly behind HBO with 94.38% fitness.Furthermore, the tuning algorithms may be combined and used in parallel to allow the system to train the Network which captures a wider variety of trends and characteristics as both the optimisers are sensitive to data differently (Figure 8).Zou and Yang, 2004 [3] and Wang et al., 2022 [37] also share similar insights on combining different models in order to capture more information from the available data.
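As a consistency check on the reported reductions, the MSE reduction follows from the RMSE reduction, because MSE is the square of RMSE. Only the two percentages come from the text; the intermediate ratio is derived here.

```latex
\begin{align*}
\frac{\mathrm{MSE}_{\text{optimised}}}{\mathrm{MSE}_{\text{default}}}
  &= \left(\frac{\mathrm{RMSE}_{\text{optimised}}}{\mathrm{RMSE}_{\text{default}}}\right)^{2}
   = (1 - 0.5342)^{2} = 0.4658^{2} \approx 0.2170, \\
\text{MSE reduction} &\approx 1 - 0.2170 = 0.7830 = 78.30\%.
\end{align*}
```

The derived value agrees with the reported 78.30% reduction in MSE.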

Conclusions
The majority of the works conducted in this domain previously have concluded that the performance can be improved by tuning the hyperparameters of their models.However, Hyperparameter Tuning faces several problems due to the absence of a proper system which can tune the hyperparameters efficiently and hassle-free.For the first time, this paper introduced two different algorithms, based on proven optimisation algorithms, to efficiently find the optimal set of hyperparameters for an LSTM Network and the Data Decomposition technique using FFT to improve the performance of the network.Similar to results obtained in Shahid et al., 2021 [38] this work also concluded that adding optimising algorithms positively and significantly impacts the network's performance provided the optimal number of iterations and optimiser parameters are used.

Figure 1. Heat Map of Correlation Matrix between each of the input features and the output variable.

Figure 2. Fourier Transform plot of an input feature, denoting the frequency distribution present in the input signal. The plot is obtained by using the Fast Fourier Transform technique.

Section 3.1: Comparison of Frequencies for Data Decomposition
Section 3.2: Comparison of Optimisers
Section 3.3: Effect of Data Decomposition versus Tuning Algorithms
Section 3.4: Further Discussions

Figure 5. $R^2$ Score versus Cutoff Frequency used for decomposing the data signal.

Table 1.

Figure 6. Comparison between the original signal and the signals obtained after data decomposition using FFT filtering: (a) High Frequency Component of the original data (b) Low Frequency Component of the original data.

Figure 7. Comparison between HBO-algorithm and GAO-algorithm using Iteration Fitness Performance parameter for 5 iterations (5 generations for GAO-algorithm): (a) Iteration Fitness for High Frequency Component of the original data (b) Iteration Fitness for Low Frequency Component of the original data (c) Iteration Fitness for the original data.

Figure 8. Comparison of HBO-algorithm, GAO-algorithm and No Optimiser LSTM Networks' 90 min ahead forecasts with actual output values: (a) Forecast of High Frequency Component of original data (b) Forecast of Low Frequency Component of original data (c) Forecast of original data.

Table 2. Comparison of Optimisers using Error Parameters for different data inputs.
* No Optimiser based LSTM Network implies the default network generated by Python-Keras.

Table 3. Comparison between Effect of Data Decomposition and Effect of Tuning Algorithms using Model Fitness.
* No Optimiser based LSTM Network implies the default network generated by Python-Keras.