Article

Short-Term Energy Consumption Forecasting Analysis Using Different Optimization and Activation Functions with Deep Learning Models

by Mehmet Tahir Ucar 1,* and Asim Kaygusuz 2
1 Ergani Vocational School, Dicle University, 21280 Diyarbakır, Türkiye
2 Department of Electrical and Electronics Engineering, Engineering Faculty, Inonu University, 44280 Malatya, Türkiye
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(12), 6839; https://doi.org/10.3390/app15126839
Submission received: 2 May 2025 / Revised: 8 June 2025 / Accepted: 11 June 2025 / Published: 18 June 2025
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Modelling events that change over time is one of the most difficult problems in data analysis, and forecasting time-varying electric power values is an important instance of it. Regression methods, machine learning, and deep learning methods are used to learn different patterns from data and develop a consumption prediction model. The aim of this study is to determine the most successful models for short-term power consumption prediction with deep learning and to achieve the highest prediction accuracy. In this study, the data was first evaluated and organized with exploratory data analysis (EDA) on a publicly available dataset, and the features of the data were extracted. Studies were carried out on long short-term memory (LSTM), gated recurrent unit (GRU), simple recurrent neural network (SimpleRNN), and bidirectional long short-term memory (BiLSTM) architectures. First, the four architectures were combined with 11 different optimization methods; a high success rate of 0.9972 was achieved according to the R2 score. This first study was then repeated with different epoch numbers. Afterwards, 264 separate models were produced by combining, in order, the four architectures, 11 optimization methods, and six activation functions. The results of all these studies were evaluated with the root mean square error (RMSE), mean absolute error (MAE), and R2_score indices, and the R2_score graphs are presented. Finally, the 10 most successful applications are listed.

1. Introduction

Information and communication technologies and artificial intelligence applications are increasingly used in smart grids. As a result, networks are built more robustly, faults are detected and resolved more quickly, needs are identified more accurately, outages are reduced, forecasts are improved, illegal use is curtailed, and quality rises. In the infrastructure of forward-looking programs, new inferences and calculations are made in light of previous information, and planning improves as a result. These inferences, predictions, and calculations about future situations, which we call forecasts, form the basis of all the planning we do.
Forecasting is an indispensable element of planning. Successful and consistent planning depends on the success of forecasts, predictions, and calculations; the accuracy of a prediction strongly affects the success of the resulting plan. It is therefore important for planners to focus on prediction, to specialize in it, and to generate and develop new methods in this area.
Studies in the field of forecasting are very diverse. The majority of these studies have focused on stock market forecasting [1,2,3,4]. In addition, Cheng et al. worked on a stock-trading system [5], Vargas et al. on predicting the intraday movement direction of a selected index [6], and Li et al. on air temperature forecasting [7]. In addition, studies on the prediction of meteorological conditions such as air temperature, humidity, solar radiation values, air pollution, etc., are also noteworthy [8,9,10,11].
In the field of electricity, there are studies on remaining useful life (RUL) prediction and degradation process prediction [12,13,14,15]. When we look at the field of smart grids, there are studies such as energy-consumed estimation, energy-to-be-produced estimation, and solar or wind potential estimation [16,17,18].
In electricity planning, the planning of electrical energy for the following hours and days is very important. Incorrect planning, interruptions, bottlenecks, power fluctuations, and similar negativities directly cause great damage to production, the economy, and the country.
If power plants can estimate the amount of energy they will produce or sell, it will be easier for load dispatch units to make programming for the following days. Thus, the demanded and produced energy estimates will be balanced and will meet each other, and unplanned energy outages, voltage fluctuations, and system crashes will be prevented. In addition, these estimates are important in the operation, maintenance, fuel use, and similar planning of electricity production facilities and in electricity energy pricing. It should not be forgotten that increasing electricity efficiency will have important effects on the development of the country. Therefore, making the plans and estimates with high accuracy and minimum error rates will direct the measures to be taken and the studies to be carried out.
Electricity load forecasts can be grouped under four headings: 1. very short-term load forecast (VSTLF); 2. short-term load forecast (STLF); 3. medium-term load forecast (MTLF); and 4. long-term load forecast (LTLF). This study focuses on short-term electricity consumption forecasting.
VSTLF requires only historical loads; STLF usually requires historical loads and weather information; MTLF requires weather and economic information; and LTLF needs weather, economic, demographic, and sometimes land-use information [19].

2. Literature Review

While statistical methods work well under normal conditions, their performance decreases during changing weather conditions, different sociological and economic conditions, and holidays. For this reason, researchers have turned to new approaches, and artificial intelligence methods have gained importance. The main artificial intelligence methods used for load forecasting are artificial neural networks (ANN), fuzzy logic, support vector machines (SVM), support vector regression (SVR), genetic algorithms (GA), particle swarm optimization (PSO), and ant colony optimization (ACO) [20,21,22].
Xiaoou Monica Zhang et al. and Meiyan Yang et al. used SVM-based models as energy consumption prediction algorithms, analysing consumption according to weather conditions, the calendar, and time-of-use prices. Their analyses show that estimating residential energy consumption from weather, calendar, and time-of-use price data is feasible and, for some individual residences, sufficiently accurate for daily or hourly estimation [23,24].
According to the literature, artificial intelligence methods are more successful than statistical methods in capturing different features that affect the prediction and in making more accurate predictions.
In time-series studies, artificial intelligence methods show success despite various limitations such as the poor fit of statistical forecasting models to nonlinear data, the sensitivity of model parameters, and low model generalization ability. However, most methods typically adopt a predefined nonlinear shape and cannot simulate the real nonlinear relationship [25].
Although machine learning methods were known in the field of artificial intelligence, they were not widely used, largely because of a lack of data. With the widespread adoption of smart meters, however, large volumes of data became available. As a result, machine learning methods began to replace earlier approaches and to yield successful results.
In fact, thanks to the recorded data and developing data science, deep learning, a sub-branch of machine learning, has developed rapidly and has surpassed traditional machine learning in terms of prediction accuracy and efficiency in some areas [26].
An extensive literature study has also been conducted on artificial intelligence, machine learning, and deep learning applications [27].
There are different architectures designed to solve different problems in deep learning. The most well-known of these architectures are auto-encoders, convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory (LSTM), restricted Boltzmann machines (RBM), deep belief networks (DBN), generative adversarial networks (GAN), transfer learning, and deep reinforcement learning (DRL).
In deep learning, RNNs, which process sequences with temporal relationships using self-feedback neurons, are used specifically for time series data [28]. However, although power consumption is shaped by long-term dependencies, the prediction of a plain RNN is dominated by the most recent values, and vanishing or exploding gradients occur frequently, which reduces prediction accuracy. In recent years, the gated LSTM architecture, a sub-branch of RNN, has been widely used to solve this problem [25].
LSTM has established itself in most time series forecasting studies and has outperformed other short-term load forecasting algorithms used in related studies [29,30,31].
Some studies have shown that deep learning-based algorithms such as the LSTM outperform traditional-based algorithms such as the autoregressive integrated moving average (ARIMA) model [32,33].
As seen in the literature, the LSTM model provides suitable solutions for time series. For this reason, LSTM has dominated recent deep learning studies on time series and electrical energy forecasting. Researchers, however, have not been satisfied with this performance and have tried to improve it by varying the LSTM model, the forecast data, the training method, the hyperparameters, and other settings. In our work, we use a total of four different architectures: SimpleRNN and LSTM, as well as GRU and BiLSTM. Rather than settling on these architectures alone, we attempted to observe the effect of the hyperparameters and other variables on the system and to increase its performance by varying them. Briefly, we examined studies that improve performance over the LSTM architecture or use different architectures:
Lin et al. used a two-stage attention-based LSTM network for short-term regional load probability forecasting, taking into account feature correlation and temporal dependencies. In the first stage, a feature attention-based encoder was built to calculate the correlation of input features with the electric charge at each time step. The most relevant input features were adaptively selected. In the second phase, a temporal attention-based decoder was developed to investigate time dependencies. Then, an LSTM model integrated these attention results, and probabilistic predictions could be made using a pinball loss function [34].
Ding et al. proposed an evolutionary dual attention-based long short-term memory model and introduced binary features using feature combination. This study compared the prediction performance of eight prediction methods. The results show that an attention mechanism can improve the efficiency of the LSTM algorithm when the model uses input time series data [25].
Wang et al. proposed a new approach based on an LSTM network to predict periodic energy consumption. First, hidden features were extracted by an autocorrelation plot among real industrial data. Correlation analysis and mechanism analysis contributed to finding appropriate secondary variables as model inputs. In addition, the time variable was complemented to fully capture periodicity. Experimental results on a specific cooling system show that the proposed method has higher prediction performance compared with back propagation neural network (BPNN), autoregressive and moving average (ARMA), and autoregressive fractionally integrated moving average (ARFIMA) [35].
A different model is the gated recurrent unit (GRU) network. In some studies, GRU is considered to provide more accuracy, improve prediction, produce better performance results, and be faster than LSTM [36,37,38,39].
Another method to be evaluated is bidirectional LSTM (Bi-LSTM). Bi-LSTMs provide additional training by passing over the input data twice (i.e., once left to right and once right to left). Siami-Namini et al. show that Bi-LSTM-based models provide better predictions than standard LSTM-based models; more specifically, Bi-LSTM models forecast better than both ARIMA and LSTM models. It has also been observed that Bi-LSTM models reach equilibrium much more slowly than LSTM-based models [40]. In another short-term energy consumption study, the BiLSTM method was more successful than the SVR, CNN, and GRU methods [41].
There are also some studies on short-term electricity load consumption forecasting that use all four of the LSTM, GRU, SimpleRNN, and Bi-LSTM models. However, similar four-model comparisons are mostly found in the fields of pandemics, production forecasting, and stock markets. In [42], a study was conducted on deaths and recoveries in ten major countries affected by COVID-19, and the performance of the models was measured using the root mean squared error (RMSE), mean absolute error (MAE), and R2_score indices. In most cases, the Bi-LSTM model performed best on the validated indices; ranked from best to worst in all scenarios, the models were Bi-LSTM, LSTM, GRU, SVR, and ARIMA. In [43], a price forecasting system was developed through a comparative analysis of LSTM, GRU, and Bi-LSTM models. The Bi-LSTM model, which used the last 5 days of trading data of a stock as input, reached an accuracy of 63.54%, the best result among the compared methods.
If we perform a short meta-analysis and look critically at three similar studies using the same dataset, we can see the following:
In a study using the same dataset as ours, the missing and duplicated data in the dataset were not analyzed, and the data was not sorted according to time. Despite this, a successful study was performed with a multi-layered and complex SimpleRNN and LSTM architecture, reaching an R2 metric of 0.9814 [44]. Although we used a simple model with a single hidden layer in our study, 114 of our models exceeded this value.
In another study using the same dataset, the missing and duplicated data were likewise not analyzed. The models were compared using mean squared error (MSE) as the loss function, and the lowest error rates reported were 0.4 for RNN, 0.31 for LSTM, and 0.24 for GRU [45]. In our study, this metric drops to 0.002.
A third study using the same dataset did analyze the missing and duplicated data. Models were compared using MSE, MAE, and RMSE as loss functions. In the experiments with the MSE loss function, the lowest error rate belonged to LSTM with 0.103; the order of success for the other models was multi-layer perceptron (MLP), CNN, linear regression, SVR, decision tree, extreme learning machine (ELM), and GRU [46].
It has been observed that studies using LSTM, GRU, and Bi-LSTM architectures, which are a sub-branch of RNN that has recently been studied extensively in the literature, have yielded successful results. Therefore, in this paper, Simple RNN, LSTM, GRU, and Bi-LSTM models were used and an increase of the prediction performance was attempted with different hyper-parameter values. Experiments were also conducted with many hyper-parameters that are not commonly seen in the literature.
The next part of the paper is organized as follows: Section 3 presents the models, methods, and aspects that differentiate the proposed study from other studies. Section 4 presents the findings obtained according to the methods used. Section 5 presents the findings and results of our study and visualizes them with graphs. Section 6 ends with the conclusion.

3. Materials and Methods

In this study, forecasting experiments are carried out using deep learning algorithms, a sub-branch of machine learning whose performance in electricity consumption estimation has been tested in the literature. In most studies, joint tasks such as electricity demand, production, or consumption estimation are addressed. However, generalizable methods have not yet been achieved; even when the same methods are applied to similar datasets, different results are obtained. In other words, new and different applications and experiments are frequently needed, and many areas remain to be studied and developed. Therefore, different techniques need to be compared and different models produced in order to make model recommendations for specific datasets.
Some studies were limited because they were carried out without organizing the dataset and because they reused classical hyperparameter choices that had been found successful before. Developing models by addressing these constraints constitutes the most important and distinctive part of this study.
The aspects that make this study different and meaningful are listed below.
  • Data will be organized.
  • 4 different architectures will be used.
  • The effect of epoch numbers on the study will be examined.
  • 6 different methods will be used as activation functions.
  • 11 different methods will be used as optimization functions.
Using the above five elements, different models will be created in the field of deep learning, and hourly energy consumption estimates will be made.
As data, hourly energy consumption data in megawatts for 17 years (1 January 2002–3 August 2018) of PJM, a regional transmission organization in the USA, will be used.
First, the data will be examined and organized. Excess data will be removed and missing data will be completed.
In addition to consumption values, new columns are created for year, month, week, day, and hour values. Then, the 24-h data for each day in the day-ahead market is taken and used in the next hour’s consumption forecast. In this way, a supervised learning step is implemented to increase success. With this method, our single-column consumption data grows to 6 columns, reaching 144 columns when the 24-h sample values are taken. Thus, our dataset consists of 145 columns and 145,368 rows, including the datetime column. With this technique, our simple, single-column data is adapted to the MLP model.
As is standard, the data is divided into two parts: the first part (80%) is used for training, and the second part (20%) is used to test the performance of the trained models. In this way, models are trained on the training set and their performance is observed by making predictions on the test data.
As a method, RNN, a model designed to work with time series in the field of deep learning, LSTM, GRU, and Bi-LSTM architectures, which are a sub-branch of RNN, are used, and the results are compared. It is seen that in most studies, restrictions are imposed on the dataset and hyper-parameters. Developing models by considering these constraints and comparing the results by changing these hyper-parameters are among the main topics of this study. For this purpose, experiments are carried out with different activation functions for each model. For each model, 10, 50, and 100 epoch trials are performed, and the results are interpreted. In addition, a comparative analysis is made between the models by using different activation functions and optimization methods for each model. Finally, the 10 models that provided the highest performance in all tests conducted with 50 epochs are presented.

3.1. Data Analysis

3.1.1. Data

For the purpose of this article, we used hourly energy consumption data for the years 2002–2018, which is freely available on Kaggle and belongs to PJM Interconnection LLC (PJM) (Norristown, PA, USA). The datasets are hourly energy consumption data from PJM, a regional transmission organization in the United States that supplies electricity to Delaware, Illinois, Indiana, Kentucky, Maryland, Michigan, New Jersey, North Carolina, Ohio, Pennsylvania, Tennessee, Virginia, West Virginia, and the District of Columbia. We did not use the entire dataset but only a part of it: the energy consumption of the PJM East Region (PJME). This PJME subset is a time series dataset consisting of 145,366 rows and 2 features, with the consumption data given in megawatts (MW) [47].

3.1.2. Exploratory Data Analysis

Exploratory data analysis (EDA) involves presenting, summarizing, and graphically displaying data in an understandable and accessible way in order to obtain relevant information from it using probability and statistical methods. This work provides the foundations of feature engineering, that is, creating, transforming, and inferring features from the dataset so that the model can work as efficiently as possible.
First, a clear exploratory data analysis template is created that extracts and summarizes the most important features of the dataset and focuses on the time series. Python 3.8.20 with some common libraries, such as Pandas 1.4.4, Matplotlib 3.6.2, NumPy 1.19.2, Seaborn 0.12.2, and Statsmodels 0.13.5, is used for this.
The first thing to do when working with time series is to roughly observe and check the type of data, its values, and whether there are any incorrect or missing values. One of the important things we do is plot the data. Thus, many features such as graphs, patterns, unusual observations, changes over time, and relationships between variables can be observed here. The results from these observations should be incorporated into the forecasting model as much as possible. In addition, some mathematical tools, such as descriptive statistics and time series decomposition, also provide us with great benefits [48,49].
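As a minimal sketch of these first checks (the file name PJME_hourly.csv and the column names Datetime and PJME_MW follow the Kaggle dataset; this is an illustration, not the authors’ exact script):

import pandas as pd
import matplotlib.pyplot as plt

# Load the hourly PJME consumption data
df = pd.read_csv("PJME_hourly.csv", parse_dates=["Datetime"])

print(df.dtypes)                 # check data types
print(df["PJME_MW"].describe())  # descriptive statistics
print(df.isna().sum())           # count missing values per column

# Time plot of the raw series
df.set_index("Datetime")["PJME_MW"].plot(figsize=(14, 4), title="PJME hourly consumption (MW)")
plt.show()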
The next section presents the studies carried out for some of the EDA steps and the conclusions drawn from them.

Descriptive Statistics, Data Organization, and Time Plot

The first and last 5 rows of our raw data are shown in Table 1a. Our data consists of two columns: datetime and consumption amount.
As can be seen from Table 1a, the values are sorted by year and hour but not by month and day; the data is not ordered from past to future. Moreover, our data consists of 145,366 rows and 2 columns, excluding the header row. Examination of the timestamps shows that 30 rows are missing and 4 rows are duplicated. Missing values were filled in by calculating the mean value, the more appropriate of each duplicated pair of rows was kept and the other deleted, and the data was then sorted from past to future. The final version of our data is shown in Table 1b.
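This cleaning step can be sketched as follows, continuing from the snippet above (keeping the first of each duplicated pair simplifies the authors’ manual selection, and linear interpolation, which equals the mean of the neighbouring hours for single gaps, stands in for their mean-value fill):

# Sort chronologically and resolve the 4 duplicated timestamps
df = df.sort_values("Datetime").drop_duplicates(subset="Datetime", keep="first")

# Reindex onto a complete hourly grid so the 30 missing hours become explicit NaNs
df = df.set_index("Datetime").asfreq("H")

# Fill the missing hours
df["PJME_MW"] = df["PJME_MW"].interpolate(method="linear")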
Figure 1a shows the time-dependent change in consumption according to the raw data, and Figure 1b shows it according to the edited data. Here it can be seen that the unedited raw data would mislead us, because incomplete and unordered data shift both the peak times and the holiday hours.
In Figure 1b, there is no major increase/decrease trend over the years and the average consumption remains more stable compared with Figure 1a.

Seasonal Plots

When we plot graphs annually, monthly, weekly, and daily, the recurring events or differences that we call seasonality become clearer.
According to Figure 1b, it can be seen that all years have similar patterns. It can be observed that consumption values increase significantly in winter and summer (for heating/cooling purposes) and reach the highest levels in summer. However, in the mild spring and autumn months, consumption can be seen to be at a minimum.
As seen in Figure 2 and Figure 3, consumption values reached maximum values in the 7th and 8th months and minimum values in the 4th and 10th months.
Figure 4 shows that consumption on weekdays is higher than consumption on weekends. Moreover, the highest difference occurs during daytime hours. The highest consumption is seen at 18:00 and 19:00 (peak hours), while the lowest consumption is seen at 3:00, 4:00, and 5:00 (midnight).
The values to consider in a time series are the time intervals that exhibit seasonal behavior and the most variability. For this reason, the most suitable sample values in our study are the last 24 h values: looking at the annual, monthly, weekly, daily, and hourly values in Figure 2, Figure 3 and Figure 4, the data shows similar behavior and there is no extreme difference between the values, while the largest differences occur between intraday hourly values. At the same time, for example, the consumption value at 11:00 on one day and the consumption value at 11:00 the next day are similar. Therefore, when estimating today’s consumption, yesterday should be taken as the most similar reference.
According to Figure 2, Figure 3 and Figure 4, it can be seen that year, month, week, day, and hour affect consumption values. Therefore, when estimating consumption, this information should be converted into columns and used when estimating. That is, the columns to be analyzed should be year, month, week, hour, day, and PJME_MW columns. Accordingly, the first 5 rows of the final version of our data are shown in Table 2.
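Continuing the sketch, these calendar columns can be derived from the datetime index as shown below (the ISO week number and the Monday-based day encoding are assumptions; the paper does not state which definitions were used):

# Derive calendar features from the datetime index
df["year"] = df.index.year
df["month"] = df.index.month
df["week"] = df.index.isocalendar().week.astype(int)  # ISO week of the year (assumed)
df["day"] = df.index.dayofweek                        # 0 = Monday ... 6 = Sunday (assumed)
df["hour"] = df.index.hour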

3.2. Deep Learning Study

Some of the parameters we will use in our deep learning model are shown below.
  • SEED = 123
  • batch_size = 128
  • return_sequence = True
  • target = ‘PJME_MW’  *
  • n_hours = 24          **
  • train_size = 0.8         ***
  • cols_to_analyze = [“PJME_MW”, “year”, “month”, “week”, “day”, “hour”]
(* We took one column as the target; more than one could have been used. ** The number of past samples used. *** 80% of the data will be used for training and 20% for testing.)
In order to produce the same values every time, we fix the random seed to the value determined above, i.e., 123.
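A minimal sketch of this seed fixing (the exact calls depend on the library versions; shown here for Python, NumPy, and TensorFlow/Keras):

import os, random
import numpy as np
import tensorflow as tf

SEED = 123
os.environ["PYTHONHASHSEED"] = str(SEED)  # hash-based operations
random.seed(SEED)                         # Python's built-in RNG
np.random.seed(SEED)                      # NumPy RNG
tf.random.set_seed(SEED)                  # TensorFlow/Keras initialization and shuffling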
We are trying to estimate the energy consumption for the next hour. Naturally, the energy consumption values of the previous day are important for us; past values are known as lags, and the value at time t is strongly affected by the value at time t-1. This application is called framing in the field of feature engineering. In our study, the 24 h data for each day in the day-ahead market is taken and used to forecast consumption for the next hour. In other words, the aim is to predict the 25th hour from 24 h of historical data (with n_hours = 24). Then we need a frame that estimates the 26th hour from hours 2–25, and so on. Our time series columns are arranged by year, month, week, day, hour, and consumption value as in Table 2. For each of the 6 columns, all values from t-1 back to t-24 are taken and included in the calculation; arranging the 24 h values of the 6 columns side by side forms 144 columns. Thus, our dataset consists of 145 columns and 145,368 rows, including the datetime column. With this technique, our simple single-column data was adapted to the MLP model, applying a supervised learning method through a ready-made function.
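The ready-made function follows the well-known series-to-supervised pattern; a simplified sketch of the same framing, chronological split, and reshape is given below (an illustration under the assumptions above, not the authors’ exact code):

import pandas as pd

def frame_supervised(data, n_hours=24, target="PJME_MW"):
    """For each of the 6 columns, take the values at t-n_hours ... t-1 as inputs
    (6 x 24 = 144 columns) and the target value at time t as the output."""
    frames, names = [], []
    for lag in range(n_hours, 0, -1):
        frames.append(data.shift(lag))
        names += [f"{col}(t-{lag})" for col in data.columns]
    X = pd.concat(frames, axis=1)
    X.columns = names
    merged = pd.concat([X, data[target].rename("y")], axis=1).dropna()
    return merged.drop(columns="y"), merged["y"]

cols_to_analyze = ["PJME_MW", "year", "month", "week", "day", "hour"]
X, y = frame_supervised(df[cols_to_analyze])

# Chronological 80/20 split (no shuffling: the data is a time series)
split = int(len(X) * 0.8)
X_train, y_train = X.iloc[:split], y.iloc[:split]
X_test, y_test = X.iloc[split:], y.iloc[split:]

# Reshape to (samples, timesteps, features) = (n, 24, 6) for the recurrent layers
X_train = X_train.values.reshape(-1, 24, 6)
X_test = X_test.values.reshape(-1, 24, 6)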
In addition to the libraries given in the previous sections for deep learning, some common Python libraries, such as Scikit-learn 1.6.1, TensorFlow 2.3.0, and Keras 3.10.0, are also needed.
The following model is used:
  • model = Sequential()
  • model.add(LSTM(50, activation='relu', input_shape=(X_train.shape[1], X_train.shape[2])))
  • model.add(Dense(1))
  • model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
The next part is continued through the program, and as a result of changing some parameters, the performance of the models can be measured with MAE, RMSE, and R2_score indices.
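Assuming a model fitted on the framed data above, the three indices can be computed with scikit-learn as follows:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_pred = model.predict(X_test).ravel()  # model, X_test, y_test from the sketches above

rmse = np.sqrt(mean_squared_error(y_test, y_pred))
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"RMSE = {rmse:.2f}  MAE = {mae:.2f}  R2 = {r2:.4f}")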

3.3. Creating the Program Interface

First of all, Python was used in all studies via the Jupyter integrated development environment (IDE) within Anaconda. The graphs in Section 3 were drawn with Python, and the graphs in Section 5 were drawn with Microsoft Excel. Microsoft Excel was also used in some studies and tables. The RMSE, MAE, and R2 metrics were calculated via Python.

4. Experimental Studies

Within the scope of this study, 32 group studies consisting of 352 trials in total were conducted: 44 trials with 10 epochs, 11 with 70 epochs, 33 with 100 epochs, and 264 with 50 epochs. In addition, the computer was shut down and restarted before each study to ensure that what was learned in the previous study would not affect the next one and that the memory was completely cleared.
LSTM, GRU, SimpleRNN, and Bi-LSTM models were used in all studies, respectively. Along with the four models, 11 different optimization methods from the Keras library were tried: Adam, Nadam, Adamax, RMSprop, RMSprop*, SGD, SGDtrue, SGDfalse, Adagrad, FTRL, and Adadelta. A few more optimization methods were tried but were not added to the lists because they did not yield results. In the figures, the seven most successful of these methods are plotted in their own distinct colors. Default values were used for the standard methods, but some values for the RMSprop*, SGDtrue, and SGDfalse methods were changed; the details of these three methods are listed in Table 3.
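As an illustration of how such modified variants can be instantiated in Keras (the actual parameter values are those of Table 3; the numbers below are placeholders, and the Nesterov interpretation of SGDtrue/SGDfalse is an assumption suggested by the names):

from tensorflow.keras.optimizers import RMSprop, SGD

# Standard methods use Keras defaults and are passed by name, e.g. optimizer='adam'.
# Modified variants are built explicitly; the values here are illustrative only.
rmsprop_star = RMSprop(learning_rate=0.01, rho=0.95)               # "RMSprop*"
sgd_true = SGD(learning_rate=0.01, momentum=0.9, nesterov=True)    # "SGDtrue"
sgd_false = SGD(learning_rate=0.01, momentum=0.9, nesterov=False)  # "SGDfalse"

model.compile(optimizer=sgd_true, loss='mse')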

4.1. Analysis of Consumption Forecast Using 11 Different Optimizer Methods on Four Different Models

Using four different deep learning models, LSTM, GRU, SimpleRNN, and BiLSTM, 11 different optimization methods were tested for each model. The results of this study were measured with RMSE, MAE, and R2 metrics and are given in Table 4. In the table, the models are listed from top to bottom according to their success status. In these experiments, the epoch value was taken as 50.

4.2. Analysis of Consumption Forecast Using 10, 50, and 100 Epochs with 11 Different Optimizer Methods on Four Different Models

Using four different deep learning models, LSTM, GRU, SimpleRNN, and BiLSTM, 11 different optimization methods were tested with 10, 50, and 100 epochs. However, in the study conducted with the BiLSTM model, the computer produced an error before reaching 100 epochs, so the BiLSTM experiments were run with 70 epochs instead of 100. This study was measured with the RMSE, MAE, and R2 metrics, and the results are given in Table 5. The values in the table are listed from top to bottom according to their success for each model. In all experiments in this section, ReLU was used as the activation function. Table 5 includes the results of Table 4 for comparison.

4.3. Analysis of Consumption Forecast Using Four Different Models, Six Different Activation Functions, and 11 Different Optimization Methods

In our study, four different models (LSTM, GRU, SimpleRNN, and BiLSTM), six different activation functions (ReLU, tanh, ELU, LeakyReLU, sigmoid, and softmax), and 11 different optimization methods were tested. Each experiment used one combination of model, activation function, and optimization method, giving 264 different models in total. The epoch value was taken as 50 in these experiments. The results were measured with the RMSE, MAE, and R2 metrics and are given in Table 6, where the values are listed from top to bottom according to their success for each model. Table 6 includes the results of Table 4 for comparison.
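A sketch of the experiment loop over the 4 × 6 × 11 = 264 combinations is given below (simplified: the three modified optimizer variants of Table 3 are omitted from the optimizer list, and in the paper each run was executed in isolation after a restart rather than in a single loop):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, GRU, SimpleRNN, Bidirectional, Dense
from sklearn.metrics import r2_score

cells = {"LSTM": LSTM, "GRU": GRU, "SimpleRNN": SimpleRNN, "BiLSTM": LSTM}
activations = ["relu", "tanh", "elu", "leaky_relu", "sigmoid", "softmax"]
optimizers = ["adam", "nadam", "adamax", "rmsprop", "sgd", "adagrad", "ftrl", "adadelta"]

results = []
for arch, cell in cells.items():
    for act in activations:  # 'leaky_relu' by name requires a recent Keras version
        for opt in optimizers:
            model = Sequential()
            if arch == "BiLSTM":
                model.add(Bidirectional(cell(50, activation=act),
                                        input_shape=(X_train.shape[1], X_train.shape[2])))
            else:
                model.add(cell(50, activation=act,
                               input_shape=(X_train.shape[1], X_train.shape[2])))
            model.add(Dense(1))
            model.compile(optimizer=opt, loss='mse')
            model.fit(X_train, y_train, epochs=50, batch_size=128, verbose=0)
            results.append((arch, act, opt,
                            r2_score(y_test, model.predict(X_test).ravel())))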

5. Results of the Study

The values in the tables above are listed from top to bottom according to their success for each model. The successful models are generally successful in all three metrics (RMSE, MAE, and R2). In a few studies, however, of two adjacent trials in the ranking, one was better in one metric and the other in another. Therefore, for simplicity, the R2 metric was used as the criterion both in the rankings and in the graphs.
The background of values with an R2 of 0.99 and above is shaded green, that of values between 0.98 and 0.99 pink, and that of values with an R2 of 0 or below brown.
In most of the studies in the literature, Adam was used as the optimizer, and ReLU was used as the activation function. In this study, methods that can compete with Adam in terms of performance and achieve higher success in some trials have been noted. According to the study results, Nadam obtained the highest values in six groups of studies, Adamax in three groups of studies, and RMSprop in one group of studies. In other groups, the Adam method reached the highest values. In this respect, Nadam and Adamax, which are not prevalent in the literature, attract attention.
The majority of the results obtained with the FTRL, Adadelta, and SGDfalse methods have a brown background and were unsuccessful. The FTRL method obtained positive values in only five trials and reached an R2 of 0.91 in only one; the Adadelta method obtained positive values in only six trials and reached an R2 of 0.76 in one; and the SGDfalse method obtained a positive value in only one trial, reaching an R2 of 0.95. Therefore, these three methods are not included in the graphs. Since the Adagrad method produced lower values than the other methods, its results have also been removed from some graphs. In the remainder of Section 5, the studies of Section 4 are analyzed and the findings evaluated, so the work continues under the same subsection titles. However, in order to compare the successful models more clearly, models with low performance have been removed from some graphs.
We have also used some methods to solve the overfitting problem in our models. Cross-validation and sliding window splitting are two of these methods. In cross-validation, the data is divided into several subsets and the performance of the machine learning model is evaluated. Dividing the data into training and test data serves this purpose. We also tried to predict the next hour using the last 24 h of data for six columns. In fact, we created a separate dataset for each prediction and cross-validated it. We also used the sliding window method by shifting the data we use in each prediction by one hour and using the next 24 h of values.

5.1. Results of the Estimation Made Using Four Different Models and 11 Different Optimization Methods

Figure 5 and Figure 6 plot the results of the seven most successful optimization methods for the four models. In this study, ReLU was used as the activation function, and the experiments were run for 50 epochs. According to Figure 5, Adam and Nadam give the most successful results, followed by Adamax and RMSprop. The Adagrad and SGD methods are not shown in Figure 5 because of their lower performance, and the Adagrad, SGD, and SGDtrue methods are not shown in Figure 6. According to Figure 6, BiLSTM and LSTM stand out as the most successful models, although there is no clear distinction. This conclusion is supported by similar studies conducted with different datasets, which reached R2 values of 0.99 and MAE values of 0.092 [50,51].

5.2. Results of the Study Using Four Different Models, 11 Different Optimizers, and 10–50–100 Epochs

Figure 7, Figure 8, Figure 9 and Figure 10 below evaluate eight different optimizers with the LSTM, GRU, SimpleRNN, and BiLSTM models, respectively, using the ReLU activation function. All experiments performed for epoch values of 10, 50, and 100 were evaluated. In most trials, increasing the number of epochs has a positive effect on the results, although performance decreases after a certain number of epochs for each model. Since 100 epochs was considered sufficient for now, the epoch values were not increased further. Another study likewise observed that performance increased with the number of epochs but stopped improving beyond a certain point [52]. To save time and avoid poor results, the epoch value was therefore taken as 50 in all studies in the other sections. The Adagrad method was removed from Figure 8 and Figure 9, and the Adagrad and SGD methods from Figure 7 and Figure 10, due to their poor performance.
In Figure 7, the 100-epoch value of the RMSprop* method was removed from the graph because it was too low. In general, the R2 value increases with the number of epochs, but for both RMSprop and RMSprop* it rises at 50 epochs and falls again at 100 epochs.
In Figure 8, the R2 value generally increases with the number of epochs, while for RMSprop it rises at 50 epochs and falls at 100 epochs.
In Figure 9, the R2 value generally increases with the number of epochs, while for both RMSprop* and Nadam it rises at 50 epochs and falls at 100 epochs. Despite this, the Nadam method stands out as the most successful.
In Figure 10, the R2 value generally increases with the number of epochs, while for Adam it rises at 50 epochs and falls at 70 epochs. In addition, the 70-epoch value could not be measured for the RMSprop* method because it produced a NaN error due to the exploding gradients problem.

5.3. Results of the Study Using Four Different Models, Six Different Activation Functions, and 11 Different Optimizers

In Figure 11, Figure 12, Figure 13 and Figure 14 below, the results of the experiments with four different architectures (LSTM, GRU, SimpleRNN, and BiLSTM), six different activation functions (ReLU, tanh, ELU, LeakyReLU, sigmoid, softmax), and six different optimizers are plotted. The epoch value is taken as 50. The Adagrad method is not shown in the graph because it obtained negative values in 8 out of 24 trials. The SGD method is not shown in the graph due to its low performance. Additionally, negative values in other methods are not shown.
As can be seen from Figure 11a, Figure 12a, Figure 13a, and Figure 14a, the most successful activation functions are ReLU, tanh, ELU, and LeakyReLU. It can be seen that sigmoid and softmax are less successful. In Figure 13a,b, the study using the SimpleRNN–LeakyReLU–RMSprop* methods was deleted because its result value was low. Moreover, the SimpleRNN–ELU–RMSprop* model in Figure 13a,b and the BiLSTM–ELU–RMSprop* model in Figure 14a,b suffer from the exploding gradients problem.
As can be seen from Figure 11b, Figure 12b, Figure 13b, and Figure 14b, the most successful optimizers are Adam, Nadam, Adamax, and RMSprop. It can be seen that RMSprop* and SGDtrue are less successful.
It can be seen that the SGDtrue method is more successful than the SGD method in all of the studies except one trial. When compared with these two methods, the SGDfalse method is seen to be quite unsuccessful. Additionally, throughout the study, it can be observed that the RMSprop method is more successful than the RMSprop* method, except in two trials.

5.4. Best Results Achieved

The 10 most successful studies according to the MAE, RMSE, and R2_score indices are listed in Table 7. In all of these experiments, the epoch value was taken as 50. The most successful study was obtained with the BiLSTM model, the ReLU activation function, and the Adam optimization method, reaching an R2 value of 0.9976. Another study using the same dataset reached an R2 value of 0.9913, presented as the highest value obtained [53], and a further study on the same dataset reached an R2 value of 0.98 [54]. In a similar study, applying random hyperparameter optimization to the LSTM model reduced the error rate to below 1.5% [52]. The graphs of the real and predicted values for the first 200 and the first 30,000 values of our best model are shown in Figure 15a,b, and Figure 16 shows how close the predictions are to the real-value line. The other models follow in decreasing order of success.
Figure 17 shows the 10 most successful models according to Table 7, grouped by model and optimizer. Accordingly, the successful combinations can be listed as BiLSTM–Adam, LSTM–Adam, BiLSTM–Nadam, Simple RNN–Nadam, and GRU–Adam, respectively.
Figure 18 shows the same 10 models according to Table 7, grouped by activation function. Accordingly, the most successful activation functions can be listed as ReLU, LeakyReLU, ELU, and tanh, respectively.
In our study, we observed that performance varies according to the architectures and models employed. The same dataset produces different results depending on the method used. However, applying the same methods to different datasets will also produce different results. Indeed, studies have emphasized that data-driven approaches will play a significant role in estimation accuracy, and that activation function performance largely depends on the dataset [55,56].

6. Discussion and Conclusions

The contributions of this article to the literature are listed as follows:
  • Emphasizing the importance of electricity consumption estimation in smart grid energy systems.
  • Showing that more accurate data is obtained by organizing the data through EDA.
  • Showing that there are actually many successful options and models, even though single options are usually offered in similar studies.
  • Showing the success of the proposed models in estimating energy consumption correctly. Also, obtaining different models and successful results with different variations.
  • Outlining statistical metrics with RMSE, MAE, and R2 to evaluate model performances.
In this study, 89 of the 352 experiments yielded R2 values of 0.95 and above. This corresponds to a rate of 25%. In five trials, the problem of exploding gradients was encountered. Three of these occurred while working with the RMSprop* method and two while working with the SGDfalse method.
First of all, it can be seen that the BiLSTM and the LSTM models are the most successful models.
The most successful optimizers are Adam, Nadam, Adamax, and RMSprop, respectively. RMSprop* and SGDtrue seem to be less successful. It was seen that 96.5% of the studies with FTRL, Adadelta, and SGDfalse methods offered very poor results. In the literature, the Adam optimization method is generally used. Of the 32 group studies, Nadam gave the highest values in six, Adamax in three, and RMSprop in one. In this respect, Nadam and Adamax, which do not feature much in the literature, attract attention.
It can be seen that increasing the number of epochs has a positive effect on the results.
The most successful activation functions can be listed as ReLU, LeakyReLU, ELU, and tanh, respectively. Sigmoid and softmax appear to be less successful. Most of the studies in the literature were conducted with the ReLU method. When we look at the most successful models in the study, the use of 40% ReLU, 30% LeakyReLU, 20% ELU, and 10% tanh is noteworthy.
It can be seen that the SGDtrue method is more successful than the SGD method, while the SGDfalse method is quite unsuccessful. It can also be seen that the RMSprop method is more successful than the RMSprop* method.
Looking at the 10 most successful studies, the R2 metric reaches 0.9976 with the use of BiLSTM, ReLU, and Adam.
In most studies, joint studies were carried out, such as forecasting electricity demand, production, or consumption. However, a common method or rule for model development has not been produced. Therefore, in this study, different techniques were compared, different models were applied, and model suggestions were made using different hyperparameters.
The reason for the high success in this study is that there is seasonality in our data, and the consumption data is more regular. It is proposed that the methods here will be successful when applied to similar datasets, but their performance will decrease on different datasets.

Author Contributions

Conceptualization, M.T.U. and A.K.; methodology, M.T.U. and A.K.; software, M.T.U. and A.K.; validation, M.T.U. and A.K.; formal analysis, M.T.U. and A.K.; investigation, M.T.U. and A.K.; resources, M.T.U. and A.K.; data curation, M.T.U. and A.K.; writing—original draft preparation, M.T.U. and A.K.; writing—review and editing, M.T.U. and A.K.; visualization, M.T.U. and A.K.; supervision, M.T.U. and A.K.; project administration, M.T.U. and A.K.; funding acquisition, M.T.U. and A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: https://github.com/yusuficolab/MDPI-Makalesi/tree/main accessed on 1 May 2025.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ali, M.; Khan, D.M.; Alshanbari, H.M.; El-Bagoury, A.A.-A.H. Prediction of Complex Stock Market Data Using an Improved Hybrid EMD-LSTM Model. Appl. Sci. 2023, 13, 1429. [Google Scholar] [CrossRef]
  2. Mndawe, S.T.; Paul, B.S.; Doorsamy, W. Development of a Stock Price Prediction Framework for Intelligent Media and Technical Analysis. Appl. Sci. 2022, 12, 719. [Google Scholar] [CrossRef]
  3. Jarrah, M.; Derbali, M. Predicting Saudi Stock Market Index by Using Multivariate Time Series Based on Deep Learning. Appl. Sci. 2023, 13, 8356. [Google Scholar] [CrossRef]
  4. Huang, D.; Zhang, Q.; Wen, Z.; Hu, M.; Xu, W. Research on a Time Series Data Prediction Model Based on Causal Feature Weight Adjustment. Appl. Sci. 2023, 13, 10782. [Google Scholar] [CrossRef]
  5. Cheng, C.-H.; Chen, Y.-S. Fundamental Analysis of Stock Trading Systems using Classification Techniques. In Proceedings of the 2007 International Conference on Machine Learning and Cybernetics, Hong Kong, China, 19–22 August 2007; pp. 1377–1382. [Google Scholar]
  6. Vargas, M.R.; de Lima, B.S.L.P.; Evsukoff, A.G. Deep Learning for Stock Market Prediction from Financial News Articles. In Proceedings of the 2017 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), Annecy, France, 26–28 June 2017; pp. 60–65. [Google Scholar] [CrossRef]
  7. Li, C.; Zhao, M.; Liu, Y.; Xu, F. Air Temperature Forecasting using Traditional and Deep Learning Algorithms. In Proceedings of the 7th International Conference on Information Science and Control Engineering (ICISCE), Changsha, China, 18–20 December 2020; pp. 189–194. [Google Scholar] [CrossRef]
  8. Escalona-Llaguno, M.I.; Solís-Sánchez, L.O.; Castañeda-Miranda, C.L.; Olvera-Olvera, C.A.; Martinez-Blanco, M.d.R.; Guerrero-Osuna, H.A.; Castañeda-Miranda, R.; Díaz-Flórez, G.; Ornelas-Vargas, G. Comparative Analysis of Solar Radiation Forecasting Techniques in Zacatecas, Mexico. Appl. Sci. 2024, 14, 7449. [Google Scholar] [CrossRef]
  9. Lyu, C.; Eftekharnejad, S. Probabilistic Solar Generation Forecasting for Rapidly Changing Weather Conditions. IEEE Access 2024, 12, 79091–79103. [Google Scholar] [CrossRef]
  10. Wu, D.; Jia, Z.; Zhang, Y.; Wang, J. Predicting Temperature and Humidity in Roadway with Water Trickling Using Principal Component Analysis-Long Short-Term Memory-Genetic Algorithm Method. Appl. Sci. 2023, 13, 13343. [Google Scholar] [CrossRef]
  11. Rosca, C.-M.; Carbureanu, M.; Stancu, A. Data-Driven Approaches for Predicting and Forecasting Air Quality in Urban Areas. Appl. Sci. 2025, 15, 4390. [Google Scholar] [CrossRef]
  12. Swain, D.; Kumar, M.; Nour, A.; Patel, K.; Bhatt, A.; Acharya, B.; Bostani, A. Remaining Useful Life Predictor for EV Batteries Using Machine Learning. IEEE Access 2024, 12, 134418–134426. [Google Scholar] [CrossRef]
  13. Wang, H.; Wang, H.; Jiang, G.; Li, J.; Wang, Y. Early fault detection of wind turbines based on operational condition clustering and optimized deep belief network modeling. Energies 2019, 12, 984. [Google Scholar] [CrossRef]
  14. Zou, Y.; Sun, W.; Wang, H.; Xu, T.; Wang, B. Research on Bearing Remaining Useful Life Prediction Method Based on Double Bidirectional Long Short-Term Memory. Appl. Sci. 2025, 15, 4441. [Google Scholar] [CrossRef]
  15. Wu, M.; Yue, C.; Zhang, F.; Sun, R.; Tang, J.; Hu, S.; Zhao, N.; Wang, J. State of Health Estimation and Remaining Useful Life Prediction of Lithium-Ion Batteries by Charging Feature Extraction and Ridge Regression. Appl. Sci. 2024, 14, 3153. [Google Scholar] [CrossRef]
  16. Alsmadi, L.; Lei, G.; Li, L. Forecasting Day-Ahead Electricity Demand in Australia Using a CNN-LSTM Model with an Attention Mechanism. Appl. Sci. 2025, 15, 3829. [Google Scholar] [CrossRef]
  17. Mukhtar, M.; Oluwasanmi, A.; Yimen, N.; Qinxiu, Z.; Ukwuoma, C.C.; Ezurike, B.; Bamisile, O. Development and Comparison of Two Novel Hybrid Neural Network Models for Hourly Solar Radiation Prediction. Appl. Sci. 2022, 12, 1435. [Google Scholar] [CrossRef]
  18. Ayaz Atalan, Y.; Atalan, A. Testing the Wind Energy Data Based on Environmental Factors Predicted by Machine Learning with Analysis of Variance. Appl. Sci. 2025, 15, 241. [Google Scholar] [CrossRef]
  19. Energy Forecasting. A Blog by Dr. Tao Hong. Available online: http://blog.drhongtao.com/2014/10/very-short-short-medium-long-term-load-forecasting.html (accessed on 29 May 2025).
  20. Kuster, C.; Rezgui, Y.; Mourshed, M. Electrical load forecasting models: A critical systematic review. Sustain. Cities Soc. 2017, 35, 257–270. [Google Scholar] [CrossRef]
  21. Tan, Z.; Zhang, J.; Wang, J.; Xu, J. Day-ahead electricity price forecasting using wavelet transform combined with ARIMA and GARCH models. Appl. Energy 2010, 87, 3606–3610. [Google Scholar] [CrossRef]
  22. Esener, I.I.; Yüksel, T.; Kurban, M. Artificial Intelligence Based Hybrid Structures for Short-Term Load Forecasting Without Temperature Data. In Proceedings of the 11th International Conference on Machine Learning and Applications, Boca Raton, FL, USA, 12–15 December 2012; pp. 457–462. [Google Scholar] [CrossRef]
  23. Zhang, X.M.; Grolinger, K.; Capretz, M.A.M. Forecasting Residential Energy Consumption Using Support Vector Regressions. In Proceedings of the IEEE International Conference on Machine Learning and Applications, Orlando, FL, USA, 17–20 December 2018; pp. 110–117. [Google Scholar]
  24. Yang, M.; Li, W.; Zhang, H.; Wang, H. Parameters Optimization Improvement of SVM on Load Forecasting. In Proceedings of the 8th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), Hangzhou, China, 27–28 August 2016; IEEE: New York, NY, USA, 2016; pp. 257–260. [Google Scholar] [CrossRef]
  25. Ding, Z.; Chen, W.; Hu, T.; Xu, X. Evolutionary double attention-based long short-term memory model for building energy prediction: Case study of a green building. Appl. Energy 2021, 288, 116660. [Google Scholar] [CrossRef]
  26. Fan, C.; Sun, Y.; Zhao, Y.; Song, M.; Wang, J. Deep learning-based feature engineering methods for improved building energy prediction. Appl. Energy 2019, 240, 35–45. [Google Scholar] [CrossRef]
  27. Zhu, T.; Ran, Y.; Zhou, X.; Wen, Y. A Survey of Predictive Maintenance: Systems, Purposes and Approaches. arXiv 2024, arXiv:1912.07383v2. [Google Scholar] [CrossRef]
  28. Fan, C.; Wang, J.; Gang, W.; Li, S. Assessment of deep recurrent neural network-based strategies for short-term building energy predictions. Appl. Energy 2019, 236, 700–710. [Google Scholar] [CrossRef]
  29. Kong, W.; Dong, Z.Y.; Jia, Y.; Hill, D.J.; Xu, Y.; Zhang, Y. Short-Term Residential Load Forecasting Based on LSTM Recurrent Neural Network. IEEE Trans. Smart Grid 2019, 10, 841–851. [Google Scholar] [CrossRef]
  30. Muzaffar, S.; Afshari, A. Short-Term Load Forecasts Using LSTM Networks. Energy Procedia 2019, 158, 2922–2927. [Google Scholar] [CrossRef]
  31. Wang, C.; Yan, Z.; Li, Q.; Zhu, Z.; Zhang, C. Energy Consumption Prediction for Drilling Pumps Based on a Long Short-Term Memory Attention Method. Appl. Sci. 2024, 14, 10750. [Google Scholar] [CrossRef]
  32. Siami-Namini, S.; Tavakoli, N.; Namin, A.S. A Comparison of ARIMA and LSTM in Forecasting Time Series. In Proceedings of the 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 17–20 December 2018; pp. 1394–1401. [Google Scholar] [CrossRef]
  33. Dahlan, A.; Ariateja, D.; Hamami, F.; Heryanto. The Implementation of Building Intelligent Smart Energy using LSTM Neural Network. In Proceedings of the 2021 International Conference on Artificial Intelligence and Mechatronics Systems (AIMS), Bandung, Indonesia, 28–30 April 2021; pp. 1–5. [Google Scholar] [CrossRef]
  34. Lin, J.; Ma, J.; Zhu, J.; Cui, Y. Short-term load forecasting based on LSTM networks considering attention mechanism. Int. J. Electr. Power Energy Syst. 2022, 137, 107818. [Google Scholar] [CrossRef]
  35. Wang, J.Q.; Du, Y.; Wang, J. LSTM-based long-term energy consumption prediction with periodicity. Energy 2020, 197, 117197. [Google Scholar] [CrossRef]
  36. Ungureanu, S.; Topa, V.; Cziker, A.C. Deep Learning for Short-Term Load Forecasting—Industrial Consumer Case Study. Appl. Sci. 2021, 11, 10126. [Google Scholar] [CrossRef]
  37. Almalki, A.J.; Wocjan, P. Forecasting Method based upon GRU-based Deep Learning Model. In Proceedings of the 2020 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 16–18 December 2020; pp. 534–538. [Google Scholar] [CrossRef]
  38. Karim, F.; Majumdar, S.; Darabi, H. Insights Into LSTM Fully Convolutional Networks for Time Series Classification. IEEE Access 2019, 7, 67718–67725. [Google Scholar] [CrossRef]
  39. Yang, S.; Yu, X.; Zhou, Y. LSTM and GRU Neural Network Performance Comparison Study: Taking Yelp Review Dataset as an Example. In Proceedings of the 2020 International Workshop on Electronic Communication and Artificial Intelligence (IWECAI), Shanghai, China, 1–4 June 2020; pp. 98–101. [Google Scholar] [CrossRef]
  40. Siami-Namini, S.; Tavakoli, N.; Namin, A.S. The Performance of LSTM and BiLSTM in Forecasting Time Series. In Proceedings of the IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 3285–3292. [Google Scholar] [CrossRef]
  41. Xu, J.; Zeng, P. Short-term Load Forecasting by BiLSTM Model Based on Multidimensional Time-domain Feature. In Proceedings of the 4th International Conference on Neural Networks, Information and Communication Engineering (NNICE), Guangzhou, China, 19–21 January 2024; pp. 1526–1530. [Google Scholar] [CrossRef]
  42. Shahid, F.; Zameer, A.; Muneeb, M. Predictions for COVID-19 with deep learning models of LSTM, GRU, and Bi-LSTM. Chaos Solitons Fractals 2020, 140, 110212. [Google Scholar] [CrossRef]
  43. Şişmanoğlu, G.; Koçer, F.; Önde, M.A.; Sahingoz, O.K. Price Forecasting in Stock Exchange with Deep Learning Methods. BEU J. Sci. 2020, 9, 434–445. [Google Scholar] [CrossRef]
  44. Irfan, M.; Shaf, A.; Ali, T.; Zafar, M.; Rahman, S.; Mursal, S.N.F.; AlThobiani, F.; Almas, M.A.; Attar, H.M.; Abdussamiee, N.; et al. Multi-region electricity demand prediction with ensemble deep neural networks. PLoS ONE 2023, 18, e0285456. [Google Scholar] [CrossRef] [PubMed]
  45. Alıoghlı, A.A.; Yıldırım Okay, F. IoT-Based Energy Consumption Prediction Using Transformers. Gazi Univ. J. Sci. Part A Eng. Innov. 2024, 11, 304–323. [Google Scholar] [CrossRef]
  46. Khan, Z.A.; Ullah, A.; Haq, I.U.; Hamdy, M.; Mauro, G.M.; Muhammad, K.; Hijji, M.; Baik, S.W. Efficient Short-Term Electricity Load Forecasting for Effective Energy Management. Sustain. Energy Technol. Assess. 2022, 53 Part A, 102337. [Google Scholar] [CrossRef]
  47. Energy Consumption Dataset, Kaggle. Available online: https://www.kaggle.com/datasets/raminhuseyn/energy-consumption-dataset/data (accessed on 29 May 2025).
  48. Time Series Forecasting: Exploratory Data Analysis, Kaggle. Available online: https://www.kaggle.com/code/raminhuseyn/time-series-forecasting-exploratory-data-analysis (accessed on 29 May 2025).
  49. Time Series Forecasting: A Practical Guide to Exploratory Data Analysis, Towards Data Science. Available online: https://towardsdatascience.com/time-series-forecasting-a-practical-guide-to-exploratory-data-analysis-a101dc5f85b1/ (accessed on 29 May 2025).
  50. Mellit, A.; Pavan, A.M.; Lughi, V. Deep learning neural networks for short-term photovoltaic power forecasting. Renew. Energy 2021, 172, 276–288. [Google Scholar] [CrossRef]
  51. Alizadegan, H.; Rashidi Malki, B.; Radmehr, A.; Karimi, H.; Ilani, M.A. Comparative study of long short-term memory (LSTM), bidirectional LSTM, and traditional machine learning approaches for energy consumption prediction. Energy Explor. Exploit. 2024, 43, 281–301. [Google Scholar] [CrossRef]
  52. Torres, J.F.; Martínez-Álvarez, F.; Troncoso, A. A deep LSTM network for the Spanish electricity consumption forecasting. Neural Comput. Applic 2022, 34, 10533–10545. [Google Scholar] [CrossRef]
  53. Alsharekh, M.F.; Habib, S.; Dewi, D.A.; Albattah, W.; Islam, M.; Albahli, S. Improving the Efficiency of Multistep Short-Term Electricity Load Forecasting via R-CNN with ML-LSTM. Sensors 2022, 22, 6913. [Google Scholar] [CrossRef] [PubMed]
  54. Ali, A.N.; Etem, T. Hourly energy consumption forecasting by LSTM and ARIMA methods. J. Comput. Electr. Electron. Eng. Sci. 2025, 3, 14–20. [Google Scholar] [CrossRef]
  55. Khalil, M.; McGough, A.S.; Pourmirza, Z.; Pazhoohesh, M.; Walker, S. Machine Learning, Deep Learning and Statistical Analysis for forecasting building energy consumption—A systematic review. Eng. Appl. Artif. Intell. 2022, 115, 105287. [Google Scholar] [CrossRef]
  56. Liu, J.; Ahmad, F.A.; Samsudin, K.; Hashim, F.; Kadir, M.Z.A.A. Performance Evaluation of Activation Functions in Deep Residual Networks for Short-Term Load Forecasting. IEEE Access 2025, 13, 78618–78633. [Google Scholar] [CrossRef]
Figure 1. (a) Unedited raw data time plot; (b) edited raw data time plot.
Figure 2. PJME yearly–monthly consumption seasonal plot.
Figure 3. PJME monthly–weekly–daily consumption seasonal plot.
Figure 4. PJME daily–hourly consumption seasonal plot.
Figure 5. Four models–six optimizers–R2 graph.
Figure 6. Four models–five optimizers–R2 graph.
Figure 7. LSTM: six optimizers–10–50–100 epochs–R2 graph.
Figure 8. GRU: seven optimizers–10–50–100 epochs–R2 graph.
Figure 9. SimpleRNN: seven optimizers–10–50–100 epochs–R2 graph.
Figure 10. BiLSTM: six optimizers–10–50–70 epochs–R2 graph.
Figure 11. (a) Six optimizers–R2 plots of six activation functions with LSTM; (b) LSTM–six activation functions–R2 plots of six optimizers.
Figure 12. (a) Six optimizers–R2 plots of six activation functions with GRU; (b) GRU–six activation functions–R2 plots of six optimizers.
Figure 13. (a) Six optimizers–R2 plots of six activation functions with SimpleRNN; (b) SimpleRNN–six activation functions–R2 plots of six optimizers.
Figure 14. (a) Six optimizers–R2 plots of six activation functions with BiLSTM; (b) BiLSTM–six activation functions–R2 plots of six optimizers.
Figure 15. Actual value–prediction graphs of the first 200 (a) and 30,000 (b) values for the most successful model.
Figure 16. True-value line and prediction scatter for all values according to the most successful model; the blue line shows the true values and the red dots show the predictions.
Figure 17. Activation function–R2 plots for the top 10 results.
Figure 18. Model–optimizer–R2 plots for the top 10 results.
Table 1. (a) First and last five rows of the unedited raw data; (b) first and last five rows of the edited data.

(a)

| Line No | Date Time | PJME_MW |
| --- | --- | --- |
| 1 | 2002.12.31 01:00 | 26,498.0 |
| 2 | 2002.12.31 02:00 | 25,147.0 |
| 3 | 2002.12.31 03:00 | 24,574.0 |
| 4 | 2002.12.31 04:00 | 24,393.0 |
| 5 | 2002.12.31 05:00 | 24,860.0 |
| 145362 | 2018.01.1 20:00 | 44,284.0 |
| 145363 | 2018.01.1 21:00 | 43,751.0 |
| 145364 | 2018.01.1 22:00 | 42,402.0 |
| 145365 | 2018.01.1 23:00 | 40,164.0 |
| 145366 | 2018.01.2 00:00 | 38,608.0 |

(b)

| Line No | Date Time | PJME_MW |
| --- | --- | --- |
| 1 | 1.01.2002 01:00 | 30,393.0 |
| 2 | 1.01.2002 02:00 | 29,265.0 |
| 3 | 1.01.2002 03:00 | 28,357.0 |
| 4 | 1.01.2002 04:00 | 27,899.0 |
| 5 | 1.01.2002 05:00 | 28,057.0 |
| 145388 | 2.08.2018 20:00 | 44,057.0 |
| 145389 | 2.08.2018 21:00 | 43,256.0 |
| 145390 | 2.08.2018 22:00 | 41,552.0 |
| 145391 | 2.08.2018 23:00 | 38,500.0 |
| 145392 | 3.08.2018 00:00 | 35,486.0 |
Table 2. First five rows of the edited data.

| Date Time | PJME_MW | Year | Month | Week | Hour | Day | day_str | Year_Month |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1.01.2002 01:00 | 30,393.0 | 2002 | 1 | 1 | 01:00:00 | 1 | Tue | 2002_1 |
| 1.01.2002 02:00 | 29,265.0 | 2002 | 1 | 1 | 02:00:00 | 1 | Tue | 2002_1 |
| 1.01.2002 03:00 | 28,357.0 | 2002 | 1 | 1 | 03:00:00 | 1 | Tue | 2002_1 |
| 1.01.2002 04:00 | 27,899.0 | 2002 | 1 | 1 | 04:00:00 | 1 | Tue | 2002_1 |
| 1.01.2002 05:00 | 28,057.0 | 2002 | 1 | 1 | 05:00:00 | 1 | Tue | 2002_1 |
Table 3. Details of the methods used.

| Method | Usage in the program |
| --- | --- |
| RMSprop* | optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.01, rho=0.9) |
| SGDTrue | optimizer = tf.keras.optimizers.SGD(lr=0.01, decay=1e-5, momentum=0.9, nesterov=True) |
| SGDFalse | optimizer = tf.keras.optimizers.SGD(lr=0.001, decay=1e-5, momentum=1.0, nesterov=False) |
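For context, the sketch below shows how an optimizer object configured as in Table 3 is passed to a compiled Keras model. It is a minimal illustration rather than the authors' exact training script: the look-back window, feature count, and loss choice are assumptions added here.

```python
# Minimal sketch (assumed setup, not the authors' exact script) of how the
# Table 3 optimizer objects plug into a Keras recurrent model.
import tensorflow as tf

n_steps, n_features = 24, 1  # hypothetical 24-hour look-back window

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(50, activation="relu", input_shape=(n_steps, n_features)),
    tf.keras.layers.Dense(1),
])

# "RMSprop*" row of Table 3: RMSprop with a non-default learning rate.
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.01, rho=0.9)
# The SGD rows use the legacy `lr`/`decay` argument names; current Keras
# spells the learning rate `learning_rate` and replaces `decay` with a
# schedule or `weight_decay`, e.g.:
# optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)

model.compile(optimizer=optimizer, loss="mse")  # errors are then scored on the test set
```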
Table 4. Four models–11 optimizers study results. R2 values of 0.99 and above are shaded green, values between 0.98 and 0.99 pink, and zero or negative values brown.

| Model | Optimizer | test_rmse | test_mae | test_R2 |
| --- | --- | --- | --- | --- |
| LSTM(50, activation = 'relu') | Adam | 339.33 | 242.54 | 0.9972 |
| LSTM(50, activation = 'relu') | Nadam | 428.91 | 321.56 | 0.9954 |
| LSTM(50, activation = 'relu') | RMSprop | 420.57 | 288.89 | 0.9957 |
| LSTM(50, activation = 'relu') | RMSprop* | 500.28 | 376.88 | 0.9938 |
| LSTM(50, activation = 'relu') | adamax | 517.6 | 406.57 | 0.9935 |
| LSTM(50, activation = 'relu') | SGDTrue | 768.55 | 586.43 | 0.9853 |
| LSTM(50, activation = 'relu') | SGD | 1602.39 | 1284.1 | 0.9294 |
| LSTM(50, activation = 'relu') | adagrad | 2278.76 | 1838.13 | 0.8518 |
| LSTM(50, activation = 'relu') | Ftrl | 5526.37 | 4603.25 | −5.2028 |
| LSTM(50, activation = 'relu') | adadelta | 6749.2 | 5649.35 | −5.44 |
| LSTM(50, activation = 'relu') | SGDFalse | 13,125.06 | 11,938.01 | −1 × 10^13 |
| GRU(50, activation = 'relu') | Adam | 412.18 | 296.07 | 0.996 |
| GRU(50, activation = 'relu') | Nadam | 404.5 | 296.4 | 0.996 |
| GRU(50, activation = 'relu') | adamax | 481.16 | 366.41 | 0.9945 |
| GRU(50, activation = 'relu') | RMSprop | 474.57 | 347.39 | 0.9944 |
| GRU(50, activation = 'relu') | RMSprop* | 659.23 | 537.33 | 0.9887 |
| GRU(50, activation = 'relu') | SGDTrue | 961.56 | 721.47 | 0.9754 |
| GRU(50, activation = 'relu') | SGD | 1336.02 | 1082.17 | 0.9546 |
| GRU(50, activation = 'relu') | adagrad | 1935.03 | 1535.2 | 0.8824 |
| GRU(50, activation = 'relu') | Ftrl | 4800.64 | 3921.81 | −1.7584 |
| GRU(50, activation = 'relu') | adadelta | 6497.18 | 5390.8 | −5.6841 |
| GRU(50, activation = 'relu') | SGDFalse | 12,246.21 | 10,396.46 | −4 × 10^13 |
| SimpleRNN(50, activation = 'relu') | Nadam | 368.15 | 265.96 | 0.9968 |
| SimpleRNN(50, activation = 'relu') | Adam | 425.05 | 307.47 | 0.9956 |
| SimpleRNN(50, activation = 'relu') | RMSprop | 486.2 | 357.39 | 0.9942 |
| SimpleRNN(50, activation = 'relu') | adamax | 567.34 | 440.71 | 0.9921 |
| SimpleRNN(50, activation = 'relu') | SGDTrue | 667.51 | 503.7 | 0.9889 |
| SimpleRNN(50, activation = 'relu') | RMSprop* | 812.6 | 664.71 | 0.9832 |
| SimpleRNN(50, activation = 'relu') | SGD | 904.34 | 684.67 | 0.9792 |
| SimpleRNN(50, activation = 'relu') | adagrad | 1857.1 | 1488.04 | 0.9031 |
| SimpleRNN(50, activation = 'relu') | Ftrl | 3173.83 | 2492.14 | 0.4742 |
| SimpleRNN(50, activation = 'relu') | adadelta | 5071.94 | 4243.3 | −0.5113 |
| SimpleRNN(50, activation = 'relu') | SGDFalse | 17,577.15 | 16,466.97 | −2 × 10^12 |
| BiLSTM(50, activation = 'relu') | Adam | 313.04 | 223.66 | 0.9976 |
| BiLSTM(50, activation = 'relu') | Nadam | 345.03 | 255.65 | 0.9971 |
| BiLSTM(50, activation = 'relu') | adamax | 458.3 | 342.28 | 0.9948 |
| BiLSTM(50, activation = 'relu') | RMSprop | 564.17 | 458.31 | 0.9923 |
| BiLSTM(50, activation = 'relu') | RMSprop* | 655.86 | 518.52 | 0.989 |
| BiLSTM(50, activation = 'relu') | SGDTrue | 831.81 | 635.59 | 0.9829 |
| BiLSTM(50, activation = 'relu') | SGD | 1480.73 | 1165.06 | 0.9419 |
| BiLSTM(50, activation = 'relu') | adagrad | 2212.38 | 1727.57 | 0.865 |
| BiLSTM(50, activation = 'relu') | Ftrl | 4940.55 | 4151.66 | −2.66 |
| BiLSTM(50, activation = 'relu') | adadelta | 6296.76 | 5378.12 | −8.6355 |
| BiLSTM(50, activation = 'relu') | SGDFalse | 11,978.96 | 10,797.69 | 0 |
Table 5. Four models–11 optimizers: 10-, 50-, and 100-epoch test results (for BiLSTM, the third setting is 70 epochs rather than 100). R2 values of 0.99 and above are shaded green, values between 0.98 and 0.99 pink, and zero or negative values brown.

| Model | Optimizer | test_rmse (10) | test_mae (10) | test_R2 (10) | test_rmse (50) | test_mae (50) | test_R2 (50) | test_rmse (100/70) | test_mae (100/70) | test_R2 (100/70) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LSTM(50, activation = 'relu') | Adam | 571 | 448 | 0.9918 | 339 | 243 | 0.9972 | 289 | 208 | 0.9979 |
| LSTM(50, activation = 'relu') | Nadam | 660 | 515 | 0.9885 | 429 | 322 | 0.9954 | 311 | 229 | 0.9976 |
| LSTM(50, activation = 'relu') | RMSprop | 810 | 653 | 0.9838 | 421 | 289 | 0.9957 | 536 | 430 | 0.9931 |
| LSTM(50, activation = 'relu') | RMSprop* | 828 | 691 | 0.9819 | 500 | 377 | 0.9938 | 2602 | 2198 | 0.7719 |
| LSTM(50, activation = 'relu') | adamax | 1282 | 985 | 0.9542 | 518 | 407 | 0.9935 | 345 | 256 | 0.9971 |
| LSTM(50, activation = 'relu') | SGDTrue | 1727 | 1348 | 0.9095 | 769 | 586 | 0.9853 | 607 | 448 | 0.9909 |
| LSTM(50, activation = 'relu') | SGD | 2219 | 1792 | 0.8494 | 1602 | 1284 | 0.9294 | 965 | 753 | 0.9761 |
| LSTM(50, activation = 'relu') | adagrad | 4293 | 3464 | −0.6808 | 2279 | 1838 | 0.8518 | 2076 | 1678 | 0.8796 |
| LSTM(50, activation = 'relu') | Ftrl | 6401 | 5272 | −8.7777 | 5526 | 4603 | −5.2028 | 4754 | 3896 | −1.4893 |
| LSTM(50, activation = 'relu') | adadelta | 9462 | 7775 | −2.6307 | 6749 | 5649 | −5.44 | 4214 | 3415 | −0.5044 |
| LSTM(50, activation = 'relu') | SGDFalse | 6851 | 6365 | −0.2154 | 13,125 | 11,938 | −1 × 10^13 | 33,342 | 32,704 | −7.4604 |
| GRU(50, activation = 'relu') | Adam | 601 | 472 | 0.991 | 412 | 296 | 0.996 | 339 | 237 | 0.9972 |
| GRU(50, activation = 'relu') | Nadam | 643 | 488 | 0.9896 | 405 | 296 | 0.996 | 358 | 266 | 0.9969 |
| GRU(50, activation = 'relu') | adamax | 908 | 707 | 0.9787 | 481 | 366 | 0.9945 | 418 | 308 | 0.9959 |
| GRU(50, activation = 'relu') | RMSprop | 729 | 587 | 0.9868 | 475 | 347 | 0.9944 | 538 | 431 | 0.993 |
| GRU(50, activation = 'relu') | RMSprop* | 826 | 625 | 0.9809 | 659 | 537 | 0.9887 | 690 | 564 | 0.9873 |
| GRU(50, activation = 'relu') | SGDTrue | 1328 | 1033 | 0.9534 | 962 | 721 | 0.9754 | 733 | 546 | 0.9862 |
| GRU(50, activation = 'relu') | SGD | 1725 | 1378 | 0.9207 | 1336 | 1082 | 0.9546 | 1173 | 942 | 0.965 |
| GRU(50, activation = 'relu') | adagrad | 3868 | 3063 | −0.3151 | 1935 | 1535 | 0.8824 | 1713 | 1386 | 0.9248 |
| GRU(50, activation = 'relu') | Ftrl | 6174 | 5109 | −9.9445 | 4801 | 3922 | −1.7584 | 3567 | 2837 | 0.1859 |
| GRU(50, activation = 'relu') | adadelta | 7176 | 5724 | −4.2308 | 6497 | 5391 | −5.6841 | 4605 | 3768 | −1.6678 |
| GRU(50, activation = 'relu') | SGDFalse | 12,883 | 11,131 | −4 × 10^13 | 12,246 | 10,396 | −4 × 10^13 | 47,910 | 47,468 | −3 × 10^14 |
| SimpleRNN(50, activation = 'relu') | Nadam | 583 | 444 | 0.9912 | 368 | 266 | 0.9968 | 397 | 300 | 0.9962 |
| SimpleRNN(50, activation = 'relu') | Adam | 765 | 592 | 0.985 | 425 | 307 | 0.9956 | 396 | 275 | 0.9962 |
| SimpleRNN(50, activation = 'relu') | RMSprop | 926 | 786 | 0.9783 | 486 | 357 | 0.9942 | 440 | 306 | 0.9953 |
| SimpleRNN(50, activation = 'relu') | adamax | 789 | 602 | 0.9844 | 567 | 441 | 0.9921 | 416 | 310 | 0.9958 |
| SimpleRNN(50, activation = 'relu') | SGDTrue | 930 | 716 | 0.9779 | 668 | 504 | 0.9889 | 615 | 457 | 0.9906 |
| SimpleRNN(50, activation = 'relu') | RMSprop* | 832 | 615 | 0.9814 | 813 | 665 | 0.9832 | 1150 | 941 | 0.963 |
| SimpleRNN(50, activation = 'relu') | SGD | 1640 | 1300 | 0.9215 | 904 | 685 | 0.9792 | 744 | 555 | 0.9861 |
| SimpleRNN(50, activation = 'relu') | adagrad | 2566 | 2035 | 0.7921 | 1857 | 1488 | 0.9031 | 1550 | 1242 | 0.9355 |
| SimpleRNN(50, activation = 'relu') | Ftrl | 5403 | 4469 | −3.4172 | 3174 | 2492 | 0.4742 | 1673 | 1350 | 0.9184 |
| SimpleRNN(50, activation = 'relu') | adadelta | 9941 | 8009 | −0.7133 | 5072 | 4243 | −0.5113 | 2686 | 2132 | 0.7625 |
| SimpleRNN(50, activation = 'relu') | SGDFalse | 14,965 | 13,796 | −4 × 10^15 | 17,577 | 16,467 | −2 × 10^12 | 19,404 | 18,287 | 0 |
| BiLSTM(50, activation = 'relu') | Adam | 532 | 404 | 0.9929 | 313 | 224 | 0.9976 | 321 | 228 | 0.9975 |
| BiLSTM(50, activation = 'relu') | Nadam | 563 | 438 | 0.992 | 345 | 256 | 0.9971 | 337 | 253 | 0.9972 |
| BiLSTM(50, activation = 'relu') | adamax | 749 | 614 | 0.9855 | 458 | 342 | 0.9948 | 432 | 320 | 0.9954 |
| BiLSTM(50, activation = 'relu') | RMSprop | 748 | 597 | 0.9857 | 564 | 458 | 0.9923 | 486 | 344 | 0.9943 |
| BiLSTM(50, activation = 'relu') | RMSprop* | 893 | 754 | 0.9796 | 656 | 519 | 0.989 | error | error | error |
| BiLSTM(50, activation = 'relu') | SGDTrue | 1381 | 1071 | 0.9479 | 832 | 636 | 0.9829 | 718 | 544 | 0.9873 |
| BiLSTM(50, activation = 'relu') | SGD | 2036 | 1568 | 0.8873 | 1481 | 1165 | 0.9419 | 1357 | 1072 | 0.9515 |
| BiLSTM(50, activation = 'relu') | adagrad | 3599 | 2861 | 0.0957 | 2212 | 1728 | 0.8649 | 2117 | 1649 | 0.8806 |
| BiLSTM(50, activation = 'relu') | Ftrl | 6267 | 5193 | −9.7917 | 4941 | 4152 | −2.66 | 4371 | 3639 | −0.8450 |
| BiLSTM(50, activation = 'relu') | adadelta | 10,693 | 8918 | −3.7992 | 6297 | 5378 | −8.6355 | 4993 | 4202 | −2.7761 |
| BiLSTM(50, activation = 'relu') | SGDFalse | 10,526 | 9370 | −7 × 10^15 | 11,979 | 10,798 | 0 | 18,981 | 17,837 | −4 × 10^14 |
Table 6. Four models–six activation functions–11 optimizers study results. R2 values of 0.99 and above are shaded green, values between 0.98 and 0.99 pink, and zero or negative values brown. In the LeakyReLU configurations, the activation is added as separate LeakyReLU(alpha = 0.05) layers after the recurrent layer and after the Dense(1) output.

| Model | Activation | Optimizer | test_rmse | test_mae | test_R2 |
| --- | --- | --- | --- | --- | --- |
| LSTM(50) | relu | Adam | 339 | 243 | 0.9972 |
| LSTM(50) | relu | Nadam | 429 | 322 | 0.9954 |
| LSTM(50) | relu | RMSprop | 421 | 289 | 0.9957 |
| LSTM(50) | relu | RMSprop* | 500 | 377 | 0.9938 |
| LSTM(50) | relu | adamax | 518 | 407 | 0.9935 |
| LSTM(50) | relu | SGDTrue | 769 | 586 | 0.9853 |
| LSTM(50) | relu | SGD | 1602 | 1284 | 0.9294 |
| LSTM(50) | relu | adagrad | 2279 | 1838 | 0.8518 |
| LSTM(50) | relu | Ftrl | 5526 | 4603 | −5.2028 |
| LSTM(50) | relu | adadelta | 6749 | 5649 | −5.44 |
| LSTM(50) | relu | SGDFalse | 13,125 | 11,938 | −1 × 10^13 |
| LSTM(50) | tanh | Adam | 348 | 248 | 0.997 |
| LSTM(50) | tanh | Nadam | 438 | 327 | 0.9953 |
| LSTM(50) | tanh | adamax | 487 | 378 | 0.9942 |
| LSTM(50) | tanh | RMSprop | 512 | 381 | 0.9937 |
| LSTM(50) | tanh | RMSprop* | 855 | 723 | 0.9814 |
| LSTM(50) | tanh | SGDTrue | 1039 | 801 | 0.9714 |
| LSTM(50) | tanh | SGD | 1781 | 1437 | 0.9119 |
| LSTM(50) | tanh | adagrad | 2341 | 1879 | 0.8342 |
| LSTM(50) | tanh | adadelta | 4415 | 3535 | −1.1026 |
| LSTM(50) | tanh | Ftrl | 5196 | 4283 | −34704 |
| LSTM(50) | tanh | SGDFalse | 9368 | 8245 | −6 × 10^12 |
| LSTM(50) | elu | Adam | 345 | 249 | 0.997 |
| LSTM(50) | elu | adamax | 444 | 341 | 0.9951 |
| LSTM(50) | elu | Nadam | 503 | 394 | 0.9937 |
| LSTM(50) | elu | RMSprop | 515 | 387 | 0.9935 |
| LSTM(50) | elu | SGDTrue | 640 | 473 | 0.9898 |
| LSTM(50) | elu | SGD | 1722 | 1386 | 0.9179 |
| LSTM(50) | elu | adagrad | 2337 | 1879 | 0.8366 |
| LSTM(50) | elu | RMSprop* | 1,963,629 | 166,538 | −0.0027 |
| LSTM(50) | elu | adadelta | 4540 | 3641 | −1.4515 |
| LSTM(50) | elu | Ftrl | 5233 | 4319 | −3.6805 |
| LSTM(50) | elu | SGDFalse | 27,722 | 26,952 | −5 × 10^13 |
| LSTM(50) | LeakyReLU | Adam | 356 | 258 | 0.9969 |
| LSTM(50) | LeakyReLU | RMSprop | 476 | 361 | 0.9943 |
| LSTM(50) | LeakyReLU | Nadam | 480 | 373 | 0.9943 |
| LSTM(50) | LeakyReLU | adamax | 527 | 415 | 0.9932 |
| LSTM(50) | LeakyReLU | RMSprop* | 735 | 612 | 0.9871 |
| LSTM(50) | LeakyReLU | SGDTrue | 931 | 706 | 0.9777 |
| LSTM(50) | LeakyReLU | SGD | 1854 | 1502 | 0.8977 |
| LSTM(50) | LeakyReLU | adagrad | 2497 | 1982 | 0.7888 |
| LSTM(50) | LeakyReLU | adadelta | 5582 | 4530 | −10.04 |
| LSTM(50) | LeakyReLU | Ftrl | 5595 | 4650 | −6.63 |
| LSTM(50) | LeakyReLU | SGDFalse | 19,644 | 18,541 | −2 × 10^9 |
| LSTM(50) | sigmoid | Adam | 572 | 425 | 0.9919 |
| LSTM(50) | sigmoid | adamax | 604 | 478 | 0.9909 |
| LSTM(50) | sigmoid | Nadam | 651 | 528 | 0.9893 |
| LSTM(50) | sigmoid | RMSprop | 677 | 555 | 0.9883 |
| LSTM(50) | sigmoid | RMSprop* | 735 | 603 | 0.9857 |
| LSTM(50) | sigmoid | SGDTrue | 954 | 742 | 0.9772 |
| LSTM(50) | sigmoid | SGD | 1311 | 1040 | 0.9514 |
| LSTM(50) | sigmoid | adagrad | 4911 | 3931 | −4.5018 |
| LSTM(50) | sigmoid | adadelta | 6292 | 5116 | −87.23 |
| LSTM(50) | sigmoid | Ftrl | 6314 | 5103 | −219.58 |
| LSTM(50) | sigmoid | SGDFalse | 7131 | 5998 | −3 × 10^8 |
| LSTM(50) | softmax | RMSprop | 674 | 514 | 0.9883 |
| LSTM(50) | softmax | RMSprop* | 707 | 582 | 0.9869 |
| LSTM(50) | softmax | Nadam | 738 | 551 | 0.9861 |
| LSTM(50) | softmax | Adam | 750 | 576 | 0.9856 |
| LSTM(50) | softmax | adamax | 1010 | 804 | 0.9736 |
| LSTM(50) | softmax | SGDFalse | 7976 | 7769 | −1.0162 |
| LSTM(50) | softmax | SGDTrue | 7285 | 6206 | −1506.18 |
| LSTM(50) | softmax | adagrad | 6478 | 5000 | −6 × 10^4 |
| LSTM(50) | softmax | SGD | 6547 | 5237 | −2 × 10^5 |
| LSTM(50) | softmax | adadelta | 11,096 | 9103 | −2 × 10^5 |
| LSTM(50) | softmax | Ftrl | 6484 | 5017 | −2 × 10^5 |
| GRU(50) | relu | Adam | 412 | 296 | 0.996 |
| GRU(50) | relu | Nadam | 405 | 296 | 0.996 |
| GRU(50) | relu | adamax | 481 | 366 | 0.9945 |
| GRU(50) | relu | RMSprop | 475 | 347 | 0.9944 |
| GRU(50) | relu | RMSprop* | 659 | 537 | 0.9887 |
| GRU(50) | relu | SGDTrue | 962 | 721 | 0.9754 |
| GRU(50) | relu | SGD | 1336 | 1082 | 0.9546 |
| GRU(50) | relu | adagrad | 1935 | 1535 | 0.8824 |
| GRU(50) | relu | Ftrl | 4801 | 3922 | −1.7584 |
| GRU(50) | relu | adadelta | 6497 | 5391 | −5.6841 |
| GRU(50) | relu | SGDFalse | 12,246 | 10,396 | −4 × 10^13 |
| GRU(50) | tanh | Adam | 385 | 291 | 0.9964 |
| GRU(50) | tanh | RMSprop | 457 | 339 | 0.995 |
| GRU(50) | tanh | Nadam | 454 | 355 | 0.9948 |
| GRU(50) | tanh | adamax | 504 | 393 | 0.9939 |
| GRU(50) | tanh | SGDTrue | 988 | 747 | 0.9743 |
| GRU(50) | tanh | SGD | 1304 | 1059 | 0.9563 |
| GRU(50) | tanh | SGDFalse | 1354 | 1105 | 0.953 |
| GRU(50) | tanh | adagrad | 1772 | 1445 | 0.9156 |
| GRU(50) | tanh | adadelta | 4080 | 3355 | −0.4021 |
| GRU(50) | tanh | RMSprop* | 5444 | 4511 | −0.5166 |
| GRU(50) | tanh | Ftrl | 4342 | 3477 | −0.6307 |
| GRU(50) | elu | Adam | 373 | 276 | 0.9965 |
| GRU(50) | elu | Nadam | 492 | 379 | 0.9938 |
| GRU(50) | elu | adamax | 535 | 417 | 0.993 |
| GRU(50) | elu | RMSprop | 593 | 483 | 0.9914 |
| GRU(50) | elu | SGDTrue | 1064 | 815 | 0.9701 |
| GRU(50) | elu | SGD | 1316 | 1070 | 0.9559 |
| GRU(50) | elu | adagrad | 1809 | 1471 | 0.9117 |
| GRU(50) | elu | Ftrl | 4373 | 3503 | −0.6797 |
| GRU(50) | elu | adadelta | 4296 | 3513 | −0.8821 |
| GRU(50) | elu | RMSprop* | 38,569 | 38,093 | −19.488 |
| GRU(50) | elu | SGDFalse | error | error | error |
| GRU(50) | LeakyReLU | Adam | 407 | 313 | 0.996 |
| GRU(50) | LeakyReLU | adamax | 434 | 330 | 0.9954 |
| GRU(50) | LeakyReLU | Nadam | 435 | 337 | 0.9954 |
| GRU(50) | LeakyReLU | RMSprop | 485 | 365 | 0.9944 |
| GRU(50) | LeakyReLU | SGDTrue | 889 | 664 | 0.9794 |
| GRU(50) | LeakyReLU | RMSprop* | 1181 | 1045 | 0.9601 |
| GRU(50) | LeakyReLU | SGD | 1349 | 1098 | 0.9531 |
| GRU(50) | LeakyReLU | adagrad | 1995 | 1606 | 0.8833 |
| GRU(50) | LeakyReLU | Ftrl | 4923 | 4032 | −2.1815 |
| GRU(50) | LeakyReLU | adadelta | 5488 | 4556 | −3.2035 |
| GRU(50) | LeakyReLU | SGDFalse | 34,730 | 34,118 | 0 |
| GRU(50) | sigmoid | Adam | 486 | 365 | 0.9942 |
| GRU(50) | sigmoid | adamax | 557 | 404 | 0.9923 |
| GRU(50) | sigmoid | RMSprop* | 586 | 473 | 0.9909 |
| GRU(50) | sigmoid | Nadam | 684 | 563 | 0.9881 |
| GRU(50) | sigmoid | SGDTrue | 1099 | 875 | 0.9702 |
| GRU(50) | sigmoid | RMSprop | 1566 | 1440 | 0.9373 |
| GRU(50) | sigmoid | SGD | 1729 | 1373 | 0.9118 |
| GRU(50) | sigmoid | adagrad | 5009 | 3992 | −4.7411 |
| GRU(50) | sigmoid | adadelta | 6401 | 5206 | −25.411 |
| GRU(50) | sigmoid | Ftrl | 5875 | 4734 | −28.589 |
| GRU(50) | sigmoid | SGDFalse | 17,976 | 16,764 | −3 × 10^14 |
| GRU(50) | softmax | Nadam | 580 | 427 | 0.9914 |
| GRU(50) | softmax | Adam | 582 | 437 | 0.9913 |
| GRU(50) | softmax | RMSprop | 605 | 480 | 0.9907 |
| GRU(50) | softmax | adamax | 760 | 588 | 0.9851 |
| GRU(50) | softmax | RMSprop* | 772 | 590 | 0.984 |
| GRU(50) | softmax | SGDTrue | 1467 | 1163 | 0.9472 |
| GRU(50) | softmax | adagrad | 6507 | 5231 | −589.92 |
| GRU(50) | softmax | SGD | 6418 | 5136 | −1050.57 |
| GRU(50) | softmax | adadelta | 9056 | 6966 | −1384.89 |
| GRU(50) | softmax | Ftrl | 6550 | 5234 | −3 × 10^12 |
| GRU(50) | softmax | SGDFalse | 16,742 | 15,610 | −2 × 10^13 |
| SimpleRNN(50) | relu | Nadam | 368 | 266 | 0.9968 |
| SimpleRNN(50) | relu | Adam | 425 | 307 | 0.9956 |
| SimpleRNN(50) | relu | RMSprop | 486 | 357 | 0.9942 |
| SimpleRNN(50) | relu | adamax | 567 | 441 | 0.9921 |
| SimpleRNN(50) | relu | SGDTrue | 668 | 504 | 0.9889 |
| SimpleRNN(50) | relu | RMSprop* | 813 | 665 | 0.9832 |
| SimpleRNN(50) | relu | SGD | 904 | 685 | 0.9792 |
| SimpleRNN(50) | relu | adagrad | 1857 | 1488 | 0.9031 |
| SimpleRNN(50) | relu | Ftrl | 3174 | 2492 | 0.4742 |
| SimpleRNN(50) | relu | adadelta | 5072 | 4243 | −0.5113 |
| SimpleRNN(50) | relu | SGDFalse | 17,577 | 16,467 | −2 × 10^12 |
| SimpleRNN(50) | tanh | Nadam | 444 | 332 | 0.9952 |
| SimpleRNN(50) | tanh | adamax | 499 | 391 | 0.9939 |
| SimpleRNN(50) | tanh | Adam | 504 | 381 | 0.9938 |
| SimpleRNN(50) | tanh | SGDTrue | 579 | 430 | 0.9917 |
| SimpleRNN(50) | tanh | RMSprop | 637 | 511 | 0.9899 |
| SimpleRNN(50) | tanh | SGD | 988 | 743 | 0.9748 |
| SimpleRNN(50) | tanh | adagrad | 1818 | 1385 | 0.911 |
| SimpleRNN(50) | tanh | Ftrl | 2619 | 2056 | 0.6899 |
| SimpleRNN(50) | tanh | adadelta | 3752 | 2969 | 0.295 |
| SimpleRNN(50) | tanh | RMSprop* | 8553 | 7295 | −9.0302 |
| SimpleRNN(50) | tanh | SGDFalse | 16,187 | 15,041 | 0 |
| SimpleRNN(50) | elu | adamax | 430 | 329 | 0.9956 |
| SimpleRNN(50) | elu | Nadam | 454 | 348 | 0.9949 |
| SimpleRNN(50) | elu | Adam | 504 | 391 | 0.9937 |
| SimpleRNN(50) | elu | RMSprop | 627 | 487 | 0.9902 |
| SimpleRNN(50) | elu | SGDTrue | 637 | 474 | 0.9896 |
| SimpleRNN(50) | elu | SGD | 903 | 684 | 0.9789 |
| SimpleRNN(50) | elu | adagrad | 1771 | 1351 | 0.9182 |
| SimpleRNN(50) | elu | Ftrl | 2352 | 1839 | 0.7707 |
| SimpleRNN(50) | elu | adadelta | 3982 | 3055 | 0.5751 |
| SimpleRNN(50) | elu | RMSprop* | error | error | error |
| SimpleRNN(50) | elu | SGDFalse | error | error | error |
| SimpleRNN(50) | LeakyReLU | adamax | 434 | 330 | 0.9955 |
| SimpleRNN(50) | LeakyReLU | Adam | 472 | 370 | 0.9946 |
| SimpleRNN(50) | LeakyReLU | Nadam | 481 | 365 | 0.9944 |
| SimpleRNN(50) | LeakyReLU | RMSprop | 496 | 355 | 0.9941 |
| SimpleRNN(50) | LeakyReLU | SGDTrue | 702 | 549 | 0.988 |
| SimpleRNN(50) | LeakyReLU | SGD | 1147 | 881 | 0.966 |
| SimpleRNN(50) | LeakyReLU | adagrad | 2053 | 1611 | 0.8802 |
| SimpleRNN(50) | LeakyReLU | RMSprop* | 3541 | 3189 | 0.4623 |
| SimpleRNN(50) | LeakyReLU | adadelta | 4410 | 3462 | 0.0894 |
| SimpleRNN(50) | LeakyReLU | Ftrl | 4199 | 3377 | −0.3491 |
| SimpleRNN(50) | LeakyReLU | SGDFalse | 47,858 | 47,417 | −2 × 10^15 |
| SimpleRNN(50) | sigmoid | Adam | 575 | 430 | 0.9918 |
| SimpleRNN(50) | sigmoid | adamax | 587 | 436 | 0.9913 |
| SimpleRNN(50) | sigmoid | Nadam | 603 | 462 | 0.991 |
| SimpleRNN(50) | sigmoid | RMSprop | 767 | 622 | 0.9851 |
| SimpleRNN(50) | sigmoid | SGDTrue | 907 | 718 | 0.9798 |
| SimpleRNN(50) | sigmoid | RMSprop* | 981 | 849 | 0.9759 |
| SimpleRNN(50) | sigmoid | SGD | 1407 | 1123 | 0.9477 |
| SimpleRNN(50) | sigmoid | adagrad | 4848 | 3833 | −3.4409 |
| SimpleRNN(50) | sigmoid | Ftrl | 5448 | 4472 | −11.308 |
| SimpleRNN(50) | sigmoid | adadelta | 6623 | 5303 | −16.200 |
| SimpleRNN(50) | sigmoid | SGDFalse | 8772 | 6677 | −2 × 10^13 |
| SimpleRNN(50) | softmax | Adam | 744 | 568 | 0.9859 |
| SimpleRNN(50) | softmax | adamax | 759 | 595 | 0.9856 |
| SimpleRNN(50) | softmax | RMSprop* | 802 | 626 | 0.9832 |
| SimpleRNN(50) | softmax | Nadam | 822 | 637 | 0.9828 |
| SimpleRNN(50) | softmax | RMSprop | 836 | 649 | 0.9827 |
| SimpleRNN(50) | softmax | SGDTrue | 1469 | 1174 | 0.9484 |
| SimpleRNN(50) | softmax | SGD | 6319 | 5068 | −280.15 |
| SimpleRNN(50) | softmax | adagrad | 6492 | 5215 | −363.93 |
| SimpleRNN(50) | softmax | adadelta | 9207 | 7114 | −957.72 |
| SimpleRNN(50) | softmax | SGDFalse | 6535 | 5203 | −1 × 10^12 |
| SimpleRNN(50) | softmax | Ftrl | 6550 | 5234 | −2 × 10^13 |
| BiLSTM(50) | relu | Adam | 313 | 224 | 0.9976 |
| BiLSTM(50) | relu | Nadam | 345 | 256 | 0.9971 |
| BiLSTM(50) | relu | adamax | 458 | 342 | 0.9948 |
| BiLSTM(50) | relu | RMSprop | 564 | 458 | 0.9923 |
| BiLSTM(50) | relu | RMSprop* | 656 | 519 | 0.989 |
| BiLSTM(50) | relu | SGDTrue | 832 | 636 | 0.9829 |
| BiLSTM(50) | relu | SGD | 1481 | 1165 | 0.9419 |
| BiLSTM(50) | relu | adagrad | 2212 | 1728 | 0.8649 |
| BiLSTM(50) | relu | Ftrl | 4941 | 4152 | −2.66 |
| BiLSTM(50) | relu | adadelta | 6297 | 5378 | −8.6355 |
| BiLSTM(50) | relu | SGDFalse | 11,979 | 10,798 | 0 |
| BiLSTM(50) | tanh | Adam | 449 | 327 | 0.9953 |
| BiLSTM(50) | tanh | adamax | 469 | 356 | 0.9947 |
| BiLSTM(50) | tanh | Nadam | 486 | 385 | 0.9942 |
| BiLSTM(50) | tanh | RMSprop | 547 | 421 | 0.9927 |
| BiLSTM(50) | tanh | RMSprop* | 759 | 650 | 0.9854 |
| BiLSTM(50) | tanh | SGDTrue | 1114 | 847 | 0.9678 |
| BiLSTM(50) | tanh | SGD | 1521 | 1182 | 0.9395 |
| BiLSTM(50) | tanh | adagrad | 2135 | 1653 | 0.8748 |
| BiLSTM(50) | tanh | adadelta | 3372 | 2585 | 0.3352 |
| BiLSTM(50) | tanh | Ftrl | 4398 | 3642 | −0.8918 |
| BiLSTM(50) | tanh | SGDFalse | 26,463 | 25,656 | −3 × 10^15 |
| BiLSTM(50) | elu | adamax | 407 | 310 | 0.996 |
| BiLSTM(50) | elu | Adam | 409 | 295 | 0.996 |
| BiLSTM(50) | elu | Nadam | 457 | 361 | 0.9949 |
| BiLSTM(50) | elu | RMSprop | 490 | 368 | 0.9941 |
| BiLSTM(50) | elu | SGDTrue | 788 | 604 | 0.9844 |
| BiLSTM(50) | elu | adagrad | 2147 | 1657 | 0.8749 |
| BiLSTM(50) | elu | SGD | 1479 | 1148 | 0.9423 |
| BiLSTM(50) | elu | adadelta | 3501 | 2709 | 0.1778 |
| BiLSTM(50) | elu | Ftrl | 4475 | 3706 | −1.0879 |
| BiLSTM(50) | elu | SGDFalse | 25,001 | 24,144 | −3 × 10^14 |
| BiLSTM(50) | elu | RMSprop* | error | error | error |
| BiLSTM(50) | LeakyReLU | Adam | 346 | 255 | 0.9971 |
| BiLSTM(50) | LeakyReLU | Nadam | 376 | 275 | 0.9966 |
| BiLSTM(50) | LeakyReLU | adamax | 423 | 320 | 0.9956 |
| BiLSTM(50) | LeakyReLU | RMSprop | 569 | 443 | 0.9923 |
| BiLSTM(50) | LeakyReLU | SGDTrue | 1044 | 781 | 0.9706 |
| BiLSTM(50) | LeakyReLU | SGD | 1617 | 1272 | 0.9252 |
| BiLSTM(50) | LeakyReLU | adagrad | 2356 | 1826 | 0.8345 |
| BiLSTM(50) | LeakyReLU | RMSprop* | 4743 | 3693 | −1.185 |
| BiLSTM(50) | LeakyReLU | adadelta | 4762 | 3745 | −2.6703 |
| BiLSTM(50) | LeakyReLU | Ftrl | 5080 | 4236 | −3.9808 |
| BiLSTM(50) | LeakyReLU | SGDFalse | 36,371 | 35,825 | −14357 |
| BiLSTM(50) | sigmoid | Adam | 533 | 386 | 0.9929 |
| BiLSTM(50) | sigmoid | adamax | 551 | 395 | 0.9924 |
| BiLSTM(50) | sigmoid | Nadam | 583 | 475 | 0.9914 |
| BiLSTM(50) | sigmoid | RMSprop | 648 | 484 | 0.9893 |
| BiLSTM(50) | sigmoid | RMSprop* | 718 | 601 | 0.9868 |
| BiLSTM(50) | sigmoid | SGDTrue | 942 | 742 | 0.9783 |
| BiLSTM(50) | sigmoid | SGD | 1796 | 1359 | 0.8969 |
| BiLSTM(50) | sigmoid | adagrad | 4410 | 3545 | −0.9066 |
| BiLSTM(50) | sigmoid | adadelta | 5906 | 4933 | −14.449 |
| BiLSTM(50) | sigmoid | Ftrl | 5809 | 4697 | −25.309 |
| BiLSTM(50) | sigmoid | SGDFalse | 32,292 | 31,633 | −7 × 10^13 |
| BiLSTM(50) | softmax | Nadam | 584 | 422 | 0.9914 |
| BiLSTM(50) | softmax | RMSprop | 625 | 495 | 0.9901 |
| BiLSTM(50) | softmax | Adam | 660 | 479 | 0.9889 |
| BiLSTM(50) | softmax | RMSprop* | 669 | 524 | 0.9882 |
| BiLSTM(50) | softmax | adamax | 1082 | 839 | 0.97 |
| BiLSTM(50) | softmax | SGDTrue | 7147 | 6101 | −323.77 |
| BiLSTM(50) | softmax | adagrad | 6490 | 5150 | −19493 |
| BiLSTM(50) | softmax | SGD | 6525 | 5222 | −20221 |
| BiLSTM(50) | softmax | Ftrl | 6525 | 5189 | −3 × 10^5 |
| BiLSTM(50) | softmax | adadelta | 9916 | 7837 | −77048 |
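The LeakyReLU rows of Table 6 list the activation as separate layers because LeakyReLU(alpha = 0.05) is a Keras layer rather than a string activation name. The sketch below is one plausible reading of that layer listing, with the same assumed input shape as the earlier sketch; note that newer Keras releases spell the `alpha` argument `negative_slope`.

```python
# Sketch of the LeakyReLU variant from Table 6: the activation is attached
# as layers after the recurrent block and after the Dense(1) output.
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense, LeakyReLU

model = tf.keras.Sequential([
    LSTM(50, input_shape=(24, 1)),  # recurrent layer; LeakyReLU applied below
    LeakyReLU(alpha=0.05),          # leaky activation on the LSTM output
    Dense(1),
    LeakyReLU(alpha=0.05),          # leaky activation on the final output
])
model.compile(optimizer="adam", loss="mse")
```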
Table 7. Top 10 results of the studies conducted with 50 epochs. R2 values of 0.99 and above are shaded green.

| Model | Layers | Activation | Optimizer | Epochs | test_rmse | test_mae | test_R2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BiLSTM | 50 | relu | Adam | 50 | 313.04 | 223.66 | 0.9976 |
| LSTM | 50 | relu | Adam | 50 | 339.33 | 242.54 | 0.9972 |
| BiLSTM | 50 | relu | Nadam | 50 | 345.03 | 255.65 | 0.9971 |
| BiLSTM | 50 | LeakyReLU | Adam | 50 | 345.7 | 255.37 | 0.9971 |
| LSTM | 50 | elu | Adam | 50 | 345.19 | 249 | 0.997 |
| LSTM | 50 | tanh | Adam | 50 | 348.22 | 248.38 | 0.997 |
| LSTM | 50 | LeakyReLU | Adam | 50 | 356.31 | 257.81 | 0.9969 |
| SimpleRNN | 50 | relu | Nadam | 50 | 368.15 | 265.96 | 0.9968 |
| BiLSTM | 50 | LeakyReLU | Nadam | 50 | 375.89 | 275.41 | 0.9966 |
| GRU | 50 | elu | Adam | 50 | 373.14 | 275.81 | 0.9965 |
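The three indexes reported in Tables 4–7 are standard regression metrics. A minimal sketch of how they can be computed from held-out predictions, assuming scikit-learn (function and variable names here are illustrative, not from the original code):

```python
# Sketch of the three report indexes (test_rmse, test_mae, test_R2),
# assuming scikit-learn is available.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def report(y_true, y_pred):
    """Return the error indexes used in the results tables."""
    return {
        "test_rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "test_mae": float(mean_absolute_error(y_true, y_pred)),
        "test_R2": float(r2_score(y_true, y_pred)),
    }
```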