AB-Net: A Novel Deep Learning Assisted Framework for Renewable Energy Generation Forecasting

Renewable energy (RE) power plants are deployed globally because the renewable energy sources (RESs) are sustainable, clean, and environmentally friendly. However, the demand for power increases on a daily basis due to population growth, technology, marketing, and the number of installed industries. This challenge has raised a critical issue of how to intelligently match the power generation with the consumption for efficient energy management. To handle this issue, we propose a novel architecture called ‘AB-Net’: a one-step forecast of RE generation for short-term horizons by incorporating an autoencoder (AE) with bidirectional long short-term memory (BiLSTM). Firstly, the data acquisition step is applied, where the data are acquired from various RESs such as wind and solar. The second step performs deep preprocessing of the acquired data via several de-noising and cleansing filters to clean the data and normalize them prior to actual processing. Thirdly, an AE is employed to extract the discriminative features from the cleaned data sequence through its encoder part. BiLSTM is used to learn these features to provide a final forecast of power generation. The proposed AB-Net was evaluated using two publicly available benchmark datasets where the proposed method obtains state-of-the-art results in terms of the error metrics.


Introduction
In recent years, an exponential increase in power consumption has been noted due to the growth of the population and economy, which requires a continual demand for energy resources [1]. Globally, fossil fuels have been utilized as a primary and vital source of power generation throughout the years. The extensive usage of fossil fuels for energy production has instigated their shortage and many other serious environmental issues that cause living health threats as well as an alarming case for global climate change [2]. Further, it takes several decades for fossil fuels to be developed, while the existing supplied energy is consumed faster than the new fossil fuels. For this reason, power generation industries are showing a keen interest in RESs for energy generation [3]. The main resources of RE are photovoltaics (PV), wind power, hydropower, and geothermal power [4]. These RESs are plentiful, inexhaustible, and renewable in the real world and are clean, efficient, and helpful for the protection of the natural environment by decreasing the threat of atmospheric contamination and the greenhouse effect [5]. Similarly, the usage of RESs helps to reduce the burden on power stations and the demand for natural fossil fuels. These resources contribute to reducing carbon emissions as well as natural energy resource conservation. In recent years, power generation from renewable resources has been developed on a large scale. In 2016, the total production of RE accounted for 24.5% of the electric power generation and 19.3% of the overall global energy consumption [6].
RESs are considered the most promising replacements for fossil fuels since they are naturally replenished over a huge geographical region, and their energy conversion is possible [7]. However, their use also involves unpredictable uncertainty that adversely affects the stability and reliability of large-scale RE power plants [8]. Forecasting energy production at RE generation plants is a key factor towards future settlement and enhancement [9]. Due to the inconsistent, unpredictable, and irregular character of RE data, precise energy generation and consumption forecasting remains a difficult challenge. On this account, RE forecasting has been investigated in recent decades to address the issues that have arisen due to the significant increase in RES power plants around the world [10]. Different techniques for RE forecasting over short- and long-term time intervals have been documented in the literature. Future prediction techniques for RE are generally based on physical models which estimate the energy using weather and power station information [11]. Physical techniques are mostly based on numerical weather simulation of atmospheric phenomena using scientific parameters and geographic conditions to simulate atmospheric dynamics [12]. For short-term forecasting intervals, physical techniques are ineffective and not suitable for efficient and accurate predictions [13]. In the literature, different statistical approaches such as the Bayesian-based adaptive model, autoregressive moving average technique, Kalman filter (KF), Hammerstein model, Markov chain model, and other regression models are frequently incorporated for the prediction of future power generation [14-16]. The statistical approaches yield the most accurate predictions; however, most of them are linear in nature and are unable to handle the predictions with long-term forecasting demands [13].
Due to the development and enhancement in the field of artificial intelligence (AI)-based prediction models, machine learning (ML) and deep learning (DL) have proven to be successful tools for RE prediction. Research reveals that various ML and DL algorithms have been used for the purpose of RE forecasting [10]. Different assembled AI-based models have been developed to enhance the RE forecast accuracy [17]. To predict RE generation, several time horizons have been investigated such as minutely, hourly, daily, weekly, and monthly depending on the objective of the forecast [18]. Data-driven prediction models based on ML techniques including support vector machines (SVMs), k-nearest neighbors (k-NNs), support vector regression (SVR), multiple linear regression (LR), regression tree, gradient boosting (GB), and random forest (RF) are frequently utilized for the RE prediction domain. Deep neural networks (DNNs), long short-term memory (LSTM), and gated recurrent units (GRUs) are DL-based models that have been utilized for the prediction of power consumption and RE generation for different horizons with adequate results [8,19]; further, LSTM along with AE has been incorporated with satisfactory results [20].
Due to the large-scale applications and prominent role of RE, there is a wide range of literature published on RE forecasting [21,22]. However, there exist several challenges in artificial neural networks (ANNs) and traditional ML methods that work with only a fixed length of input data. Similarly, DL models such as convolutional neural networks (CNNs) are limited in their ability to extract meaningful and suitable features from time series data. However, AI-based ML and DL models have shown satisfactory performance for real-time expected power generation predictions, particularly when learning from dynamic changes in environmental circumstances is crucial to improve the forecasting accuracy [23]. Thus, our study aimed to use a DL ensemble approach based on an AE and BiLSTM to improve the one-step forecast accuracy of RE systems for short-term horizons. The proposed AB-Net network possesses the ability to extract the complex and most discriminative features from sequential data with the AE and feed them to the BiLSTM to learn the sequence for prediction via the internal memory concept. Following are the main contributions of our research:
• Initially, acquiring power generation data through meters introduces different abnormalities and noise in the data such as missing values, outliers, and redundancy, due to the environmental conditions. Processing such data yields incorrect energy generation forecasting. To overcome this issue, the raw data are passed through the preprocessing layer where they are cleaned, normalized, and de-noised to make them suitable for effective processing.
• The established literature reveals that the sequential learning approaches have a strong performance in time series prediction data. Inspired by their reasonable and accurate performances for prediction problems, for the first time, a novel hybrid network composed of an AE and BiLSTM is proposed for single-step forecasting of RE power generation.
• Short-term RES power production forecasting is very useful, and this information can improve the performance of existing energy systems. Furthermore, short-term forecasting of power allows for efficient integration, trading, storage unit management, and control systems of energy. Therefore, in this paper, we propose a model that has the ability to forecast short-term horizons for one-step RE forecasting.
• To confirm and verify the effectiveness of the proposed method, we conduct an extensive set of experiments on publicly available power generation datasets. We experimentally show that the proposed method outperforms state-of-the-art methods by comparing it with competitive models including BiLSTM, CNN-BiLSTM, and an encoder-decoder (ED) via basic evaluation metrics such as mean absolute error (MAE), mean squared error (MSE), and root mean square error (RMSE), where the proposed AB-Net obtains the lowest error rate.
The remaining part of this paper is structured as follows: Briefly, the literature on RE forecasting is discussed in Section 2. Section 3 provides the detailed research methodology and its full overview. Section 4 discusses the experiments and visual representation of the results, while Section 5 concludes the paper along with further research directions.

Literature Review
In recent years, researchers have shifted their attention towards RESs for estimating power generation. These sources have been widely utilized for power generation due to the ease of their availability and renewable nature. One of the challenges in power production from RESs is its sustainability. Prediction of power generation from RESs mainly depends on environmental variables such as wind speed, wind direction, and weather conditions. These parameters are beyond human control, which makes the prediction problem more challenging. Different types of prediction techniques have been utilized for estimating power generation from renewable sources, and they are discussed in the following subsections.

Wind Power Generation
In the domain of wind power generation forecasting, different statistical, DL, and ML methods have traditionally been used. For instance, Liu et al. [24] proposed a combined model for short-term wind speed forecasting that utilized a multi-objective optimization algorithm to tackle wind speed issues such as nonlinearity, irregularity, and non-stationarity. Similarly, Sun et al. [25] introduced a hybrid approach by incorporating various techniques such as LSTM principle computing, secondary decomposition, and random forest to tackle issues related to wind energy generation such as sustainability and sanitation. Next, Hu et al. [26] proposed a stacked hierarchy of reservoirs which introduces the basic echo state network and DL framework for power production and consumption forecasting. Sharifzadeh et al. [27] conducted a study using an ANN, Gaussian process regression, and SVR for the prediction of future energy from wind resources. Similarly, Demolli et al. [28] used extreme GB (XGB) regression, SVR, and RF approaches for wind power energy forecasting using daily wind speed data. Li et al. [29] applied the least square SVM for short-term wind speed prediction, while Andrade et al. [30] presented a wind and solar power prediction model using the GB decision tree (DT) algorithm with feature engineering techniques. Moreover, Khosravi et al. [31] investigated the fuzzy inference system (FIS), SVR, and other ML approaches to forecast wind speed data for a power plant located in Brazil. Guoyang et al. [32] analyzed time series data of wind speed using the autoregressive moving and autoregressive integrated moving average approaches to predict wind power. Furthermore, Ding et al. [33] used the KF model for online prediction of wind speed and power generation for an efficient grid management system. Manero et al. [34] evaluated different DL approaches using wind speed time series data for the prediction of wind power generation. Khan et al. [35] combined DL and principal component analysis approaches for forecasting wind power using datasets of hourly, monthly, and yearly wind speed data. Eze et al. [36] introduced LSTM networks for the prediction of power generated at a wind power plant. Liu et al. [37] practiced wavelet packets along with other DL approaches for wind speed prediction, with outstanding results.

Solar Power Generation
Solar energy is a limitless RES that does not emit carbon or other greenhouse gases, since it does not need fuel or other resources. This property makes it one of the most ecofriendly energy generation technologies. In solar energy, radiation is considered to be an important parameter with different intervals of time scales. Different ML and DL approaches based on data-driven methods have been used for the purpose of effective management at solar power plants. For instance, Aslam et al. [38] analyzed different DL approaches for the prediction of solar radiation for one year ahead in intervals of hours and days through a recurrent neural network (RNN), GRU, LSTM, feedforward neural network (FFNN), and SVR. Next, Torres-Barrán et al. [39] utilized the methods of RF regression, GB regression, and XGB for the prediction of power generation from the renewable sources of solar and wind. Another group, Saloux et al. [40], investigated DT, SVM, and ANN for the prediction of the heating demand at a solar power plant, while Sun et al. [41] presented a CNN-based prediction approach for PV power generation. Torres et al. [42] proposed an FFNN to predict the day ahead electricity generated by PV solar systems, while Kamadinata et al. [43] forecasted the solar radiation from sky images using the ANN architecture. Similarly, Correa-Jullian et al. [44] explored the techniques of ANN, RNN, and LSTM and found these methods reliable for solar energy prediction. AlKandari et al. [45] used both ML and statistical methods for the prediction of future solar power generation in solar plants. Liu et al. [46] comparatively analyzed the SVM and copula-based nonlinear quantile regression (CNQR) approaches in terms of predicting solar radiation and proved the efficiency of CNQR over SVM.

Hydropower Generation
Among RESs, hydropower is also one of the most widely used power generation sources. Water sources are used for energy production due to their efficient characteristics, economic viability, and availability [47]. Different resources of water such as rivers and stored water are used for power generation purposes. Rainfall is also considered an important parameter affecting the power generation process [48]. Different types of prediction approaches have been presented for the better planning and management of hydropower plants [49]. For instance, Sapitang et al. [50] predicted the water level at a hydropower generation plant using the supervised ML approaches of Bayesian linear regression (LR), boosted DT regression, neural network regression, and decision forest regression. Similarly, Dehghani et al. [51] presented a promising approach using gray wolf optimization and an adaptive neuro-fuzzy inference system for hydropower generation prediction. Further, Zhang et al. [52] presented a multi-step hybrid approach of long-, medium-, and short-term Bayesian stochastic dynamic programming for the purpose of forecasting hydropower inflows. Hong et al. [53] forecasted rainfall with the hybrid approach of RNN and SVR along with the chaotic particle swarm optimization approach, while Wang et al. [54] presented a seasonal decomposition-based least square SVR approach for power generation prediction in hydropower plants. Lansberry et al. [55] utilized the genetic algorithm approach for optimization of the gains of governors that are plant parameters of the conduit constant and load self-legalization at a hydropower plant. Similarly, the authors in [56] used wavelet transform and SVR to predict tidal current speed and direction at a tidal power generation plant. Safari et al. [57] predicted tidal current speed and direction using least square SVR and ensemble empirical mode decomposition. Ozbas et al. [58] predicted hydrogen production through biomass gasification using ML-based approaches of LR, SVM regression, k-NN regression, and DT regression.

Methodology
This section discusses the proposed framework for power generation prediction. First, we discuss the data acquisition and preprocessing steps. Then, the technical details of the proposed AB-Net architecture are presented, and, finally, the model evaluation strategy is explained. The overall framework of the proposed system is shown in Figure 1.

Figure 1.
Overall framework of the proposed architecture. In step 1, the power generation data are acquired, which are further preprocessed in step 2. In step 3, the features are extracted and passed through the BiLSTM for decoding. In step 4, the predictions are obtained based on the trained model that is evaluated through basic error metrics and graphs.

Data Acquisition and Preprocessing
In this section, we discuss the data acquisition and preprocessing steps in detail. Power is generated from different RESs such as wind, hydro, solar, geothermal, tidal, and biomass. The generated power from renewable sources is provided to consumers through a power distribution system such as a smart grid. In the proposed method, solar and wind power generation data are considered. Detailed descriptions of each dataset such as location, time, samples, duration, interval, and other attributes are presented in Table 1 (Section 4.1). Several smart sensors are installed in smart grids that measure the power generation and consumption information, and they keep records for future analysis. These previous data such as power generation and consumption are utilized for training ML and DL models for future power generation forecasting and consumption prediction. During the acquisition of data, there are some uncertainties in the data such as noise and missing values. To remove these abnormalities, preprocessing techniques are applied. The moving average filter is an important technique that is utilized to smooth data and make them appropriate for model training. For handling missing values, the substitution method can be applied, where missing values are filled with previous time values. ML and DL models learn to map the input data to the output data [59]. There are multiple variables in input data that have different distributions and scale ranges. The difference between the scale and distribution of the input variables makes it difficult to model a particular problem. Hence, DL models learn huge values for weights when the input variable values are large and the values are in different ranges. As a result, the model becomes unstable and yields a poor performance. Similarly, a model with large values for weights has a higher generalization error and suffers from a poor performance throughout the learning. 
Furthermore, a large difference in output variable values makes the learning process unstable and results in a large error gradient. Therefore, it is very important to scale the input and output data before training ML and DL models. To tackle the above-mentioned problems, the input and output variable data can be normalized to the range from 0 to 1.
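The three preprocessing operations described in this subsection (substituting missing values with the previous reading, smoothing with a moving average filter, and normalizing to the range from 0 to 1) can be illustrated with a minimal pure-Python sketch. The function names, the trailing window of three samples, and the use of min-max scaling are assumptions made for illustration only; the text does not specify the filter length or the exact normalization formula.

```python
from typing import List, Optional


def forward_fill(values: List[Optional[float]]) -> List[float]:
    """Replace each missing reading (None) with the previous observed value."""
    filled: List[float] = []
    last: Optional[float] = None
    for v in values:
        if v is None:
            if last is None:
                # Assumption: a series that starts with a gap cannot be filled.
                raise ValueError("series starts with a missing value")
            v = last
        filled.append(v)
        last = v
    return filled


def moving_average(values: List[float], window: int = 3) -> List[float]:
    """Trailing moving average; early points average over the samples seen so far."""
    smoothed: List[float] = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def min_max_scale(values: List[float]) -> List[float]:
    """Map the series linearly onto [0, 1] (assumed min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw meter readings with two missing samples.
raw = [10.0, None, 14.0, 40.0, None, 12.0]
clean = min_max_scale(moving_average(forward_fill(raw), window=3))
```

In practice, the minimum and maximum used for scaling would typically be computed on the training portion only and then reused for the test portion, so that no information from the test data leaks into training.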

Proposed Network for Power Generation
This section discusses the proposed AB-Net framework, which is a hybrid network of an AE and BiLSTM. Its building blocks, namely the RNN, LSTM, BiLSTM, and autoencoder, are discussed in the following subsections.

Recurrent Neural Network
RNNs are an important type of DNN that deals with sequential data using the internal memory concept and loops. Figure 2a shows the basic structure of an RNN that is similar to the architecture of LSTM, while Figure 2b illustrates the unfolded structure. The calculation process of a hidden layer state is presented in Equation (1). The hidden state $h_t$ of a hidden layer is modified and retained on the basis of the previous hidden state $h_{t-1}$ and the layer input $x_t$ at every time interval $t$.
$$h_t = \sigma_h(W_{xh} x_t + W_{hh} h_{t-1} + b_h) \qquad (1)$$
$$y_t = \sigma_y(W_{hy} h_t + b_y) \qquad (2)$$
In Equation (1), $\sigma_h$ is the activation function, $W_{xh}$ is the weight matrix for the input to the hidden layer, $W_{hh}$ is the weight matrix between successive hidden states, and $b_h$ is the hidden layer bias vector to produce the hidden state. The output of the network is shown in Equation (2), where $\sigma_y$ is the output layer activation function, and $W_{hy}$ is the weight matrix for the hidden layer to the output layer, while the output layer bias vector is represented by $b_y$. In nonlinear time series problems, RNNs have shown a good performance compared to traditional ML methods. However, general RNNs have some problems during backpropagation such as exploding gradients and vanishing gradients, due to which these sequential models become incapable of learning long-term dependencies and long time lags.
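As a minimal sketch of the recurrence described for Equations (1) and (2), the following pure-Python code updates the hidden state from the current input and the previous hidden state and then maps it to an output. The choice of tanh for the hidden activation and an identity output activation are assumptions made here for illustration, as are the toy weight values; the text does not fix either.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(x_t: Vector, h_prev: Vector,
             W_xh: Matrix, W_hh: Matrix, b_h: Vector) -> Vector:
    """Hidden-state update: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    from_input = matvec(W_xh, x_t)
    from_state = matvec(W_hh, h_prev)
    return [math.tanh(a + r + b) for a, r, b in zip(from_input, from_state, b_h)]


def rnn_output(h_t: Vector, W_hy: Matrix, b_y: Vector) -> Vector:
    """Output layer with an identity activation (assumed, as in regression)."""
    return [y + b for y, b in zip(matvec(W_hy, h_t), b_y)]


# Toy weights (hypothetical): one input feature, two hidden units, one output.
W_xh = [[0.5], [-0.3]]
W_hh = [[0.1, 0.0], [0.0, 0.1]]
b_h = [0.0, 0.0]
W_hy = [[1.0, 1.0]]
b_y = [0.0]

h = [0.0, 0.0]
for value in [0.2, 0.4, 0.6]:
    h = rnn_step([value], h, W_xh, W_hh, b_h)  # the same weights are reused at every step
y = rnn_output(h, W_hy, b_y)
```

Because the same weight matrices are applied at every time step, gradients are multiplied through them repeatedly during backpropagation, which is where the vanishing and exploding gradient problems arise.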

Long Short-Term Memory
RNNs suffer from vanishing and exploding gradient problems; therefore, to handle these issues, the architecture of LSTM has been introduced, which is well known for its good performance on sequential problems with long-term dependencies [60]. The hidden layer of LSTM, which is also called the LSTM cell, makes it different from the general RNN architecture [61]. Figure 3a shows the hidden layer of LSTM, where $x_t$ is the input of the cell at time $t$, and $h_t$ is the output. During weight updating and training, the hidden layer of LSTM also considers different cell states including the input $\tilde{C}_t$, output $C_t$, and previous output $C_{t-1}$. The gate concept is present in LSTM compared to general RNNs, due to which LSTM is capable of learning useful information from long-term as well as short-term dependencies. LSTM includes three types of gates: input, forget, and output gates, which make it an effective and scalable model for various sequence-based tasks. In Figure 3a, for time $t$, the input gate of the LSTM cell is represented by $i_t$ and the forget gate by $f_t$, while the output gate is represented by $o_t$. Equations (3)-(6) are used to calculate the gates of a cell [62].
$$f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f) \qquad (3)$$
$$i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i) \qquad (4)$$
$$o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o) \qquad (5)$$
$$\tilde{C}_t = \tanh(W_C x_t + U_C h_{t-1} + b_C) \qquad (6)$$
In the above equations, $\sigma_g$ is the activation function for each gate, which is normally a sigmoid function, while the hyperbolic tangent function is represented by $\tanh$. Weight matrices are represented by $W_f$, $W_i$, and $W_o$ for mapping from the cell input to the LSTM gates, while $W_C$ is the weight matrix for mapping the cell input to the input cell state. Similarly, for connecting the prior hidden layer output state to the gates and the input cell state, weight matrices are represented by $U_f$, $U_i$, $U_o$, and $U_C$. The bias vectors are represented by $b_f$, $b_i$, $b_o$, and $b_C$ in each equation. At time interval $t$, the layer output $h_t$ and cell output state $C_t$ can be calculated using Equations (7) and (8), where $\odot$ denotes element-wise multiplication:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \qquad (7)$$
$$h_t = o_t \odot \tanh(C_t) \qquad (8)$$
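The gate computations described in this subsection can be sketched as a single LSTM cell step in pure Python. This is an illustrative sketch in the standard LSTM formulation rather than the authors' implementation: the parameter layout (a dictionary holding one input weight matrix, one recurrent weight matrix, and one bias vector per gate) and the helper names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
GateParams = Tuple[Matrix, Matrix, Vector]  # (input weights, recurrent weights, bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, U: Matrix, b: Vector, x_t: Vector, h_prev: Vector) -> Vector:
    """Compute W x_t + U h_{t-1} + b for one gate."""
    return [
        sum(w * x for w, x in zip(w_row, x_t))
        + sum(u * h for u, h in zip(u_row, h_prev))
        + b_k
        for w_row, u_row, b_k in zip(W, U, b)
    ]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, GateParams]) -> Tuple[Vector, Vector]:
    """One LSTM step; returns the new hidden state h_t and cell state C_t."""
    f_t = [sigmoid(z) for z in affine(*params["f"], x_t, h_prev)]    # forget gate
    i_t = [sigmoid(z) for z in affine(*params["i"], x_t, h_prev)]    # input gate
    o_t = [sigmoid(z) for z in affine(*params["o"], x_t, h_prev)]    # output gate
    g_t = [math.tanh(z) for z in affine(*params["c"], x_t, h_prev)]  # candidate cell state
    # Keep part of the old cell state and add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# One input feature and one hidden unit, with all parameters set to zero
# so that every gate evaluates to 0.5 and the candidate state to 0.
zero: GateParams = ([[0.0]], [[0.0]], [0.0])
params = {"f": zero, "i": zero, "o": zero, "c": zero}
h_t, c_t = lstm_step([1.0], [0.0], [2.0], params)
```

With the zero parameters above, the forget gate lets through half of the previous cell state, which shows how the additive cell update can carry information across time steps without passing it through a squashing nonlinearity at each step.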

Bidirectional LSTM
The bidirectional RNN and BiLSTM ideas are similar, as both involve the processing of sequential data with separate hidden layers in both directions, i.e., forward and backward [63]. These two hidden layers are connected to the same output layer in a BiLSTM network, and it has been shown that these bidirectional networks are considerably better than unidirectional models in many domains such as speech classification and gene sequence classification. Figure 3b shows the unfolded structure of BiLSTM, which contains the forward and backward LSTM layers. The output sequence of the forward layer $\overrightarrow{h}$ is repeatedly calculated from time $T-n$ to time $T-1$, utilizing inputs in a positive sequence. Similarly, by means of reversed inputs from $T-n$ to $T-1$, the output sequence of the backward layer $\overleftarrow{h}$ is iteratively calculated. The bidirectional layer produces an output vector in which every element $y_t$ is calculated using the following Equation (9):
$$y_t = \sigma(\overrightarrow{h}_t, \overleftarrow{h}_t) \qquad (9)$$
To combine the forward and backward layer output sequences, the function $\sigma$ is used. This function can take different forms such as summation, average, concatenation, or multiplication.
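The way the forward and backward output sequences are combined can be sketched as follows. To keep the example small, a running sum stands in for an LSTM cell as the recurrent step; this toy step, the function names, and the mode labels are assumptions for illustration only.

```python
from typing import Callable, Dict, List

Vector = List[float]
Step = Callable[[float, Vector], Vector]

# The four combination options named in the text.
MERGE: Dict[str, Callable[[Vector, Vector], Vector]] = {
    "sum": lambda f, b: [p + q for p, q in zip(f, b)],
    "ave": lambda f, b: [(p + q) / 2.0 for p, q in zip(f, b)],
    "mul": lambda f, b: [p * q for p, q in zip(f, b)],
    "concat": lambda f, b: f + b,
}


def run_direction(step: Step, inputs: List[float], h0: Vector) -> List[Vector]:
    """Apply a recurrent step along the sequence and collect every hidden state."""
    states: List[Vector] = []
    h = h0
    for x in inputs:
        h = step(x, h)
        states.append(h)
    return states


def bidirectional(step_fwd: Step, step_bwd: Step, inputs: List[float],
                  h0: Vector, mode: str = "concat") -> List[Vector]:
    """Run both directions and merge the two states belonging to each time step."""
    forward = run_direction(step_fwd, inputs, h0)
    # The backward layer reads the reversed inputs; its states are reversed
    # again so that index t refers to the same time step in both lists.
    backward = run_direction(step_bwd, inputs[::-1], h0)[::-1]
    merge = MERGE[mode]
    return [merge(f, b) for f, b in zip(forward, backward)]


def running_sum(x: float, h: Vector) -> Vector:
    """Toy recurrent step standing in for an LSTM cell."""
    return [h[0] + x]


outputs = bidirectional(running_sum, running_sum, [1.0, 2.0, 3.0], [0.0], mode="concat")
```

With concatenation, each output element holds a summary of the past (forward state) next to a summary of the future (backward state) for the same time step, which is the property that distinguishes bidirectional from unidirectional models.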

Bidirectional Autoencoder
An AE performs the task of learning the compact representation of data using an unsupervised learning approach. In this technique, a neural network architecture is designed in such a way that imposes a compressed information representation of the original input data. Figure 4 shows that unlabeled data can be framed as a supervised learning task with the reconstruction of the original input data. There are three types of layers in an AE, which are the input, hidden, and output layers, where the hidden layers learn to encode the data, while the output layers reconstruct the original data from the encoded data [64]. An AE is trained in order to reduce the reconstruction error, which is the difference between the reconstructed data and original input data. The important attribute in the AE architecture design is the bottleneck, which is utilized to obtain the compressed form of the original input data. An AE simply learns to memorize the input data by passing the data through the model with the presence of bottleneck information. The bottleneck is responsible for restraining the required information by traversing the whole architecture, forcing the original input data into a compressed representation. A small number of nodes are maintained in the hidden layer of our network architecture due to which the information flow is also reduced through the network. The AE is trained according to the reconstruction error and tries to learn the key attributes from the original input data, which is called data encoding, and then it tries to reconstruct the real original data from the encoded data, which is called data decoding. A BiLSTM-based ED structure can be used to implement a BiLSTM-based AE for time series data. A BiLSTM-based ED is constructed for sequential input data in such a way that it can read the input data properly, encode it, and finally reconstruct it. 
The efficiency of the architecture is then computed from its capability to reconstruct the original input time series data. During unsupervised learning, when the model obtains the preferred accuracy, the encoder part of the model is used to encode the input data to a fixed length vector, while the decoder part of the model is removed.

Model Evaluation
In this work, an ablation study was conducted using four different sequential models on two publicly available power generation datasets. In the proposed AB-Net model, first, a BiLSTM autoencoder is trained, then its decoder part is removed, and the encoder part is used for extracting the meaningful features from the data. Finally, the extracted features are passed through another BiLSTM network for one-step forecasting of power generation. All the forecasting methods were evaluated using the basic error metrics presented in Equations (10)-(12) and visual graphs.
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \tilde{y}_i \right| \qquad (10)$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left( y_i - \tilde{y}_i \right)^2 \qquad (11)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \tilde{y}_i \right)^2} \qquad (12)$$
Here, $\tilde{y}_i$ denotes the predicted value for the $i$-th of the $n$ power generation samples, while $y_i$ denotes the corresponding observed value. The MAE is the average of the absolute errors, and the MSE is the average of the squared errors between the estimated and observed values. Similarly, the RMSE is the square root of the MSE. The details of the ablation study are presented in the Experimental Results section.
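For reference, the three error metrics named in this subsection can be computed with a few lines of pure Python. The function names and the sample values are illustrative assumptions and are not results from the paper.

```python
import math
from typing import List


def _check(y_true: List[float], y_pred: List[float]) -> None:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true: List[float], y_pred: List[float]) -> float:
    """Mean absolute error: average of |observed - predicted|."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: List[float], y_pred: List[float]) -> float:
    """Mean squared error: average of (observed - predicted)^2."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean square error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical observed and predicted values (not taken from the datasets).
observed = [3.0, 5.0, 2.0]
predicted = [2.0, 5.0, 4.0]
scores = (mae(observed, predicted), mse(observed, predicted), rmse(observed, predicted))
```

Because the MSE squares each error, a single large miss weighs more heavily in the MSE and RMSE than in the MAE, which is one reason for reporting all three.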

System Settings and Hyperparameters
The sequential models used for power generation forecasting were implemented in Python (version 3.8.5) with a popular DL framework (Keras) with TensorFlow at the backend. Each model was trained for up to 100 epochs on each dataset with the Adam optimizer, a learning rate of 0.001, and a batch size of 16. In the BiLSTM network, two BiLSTM layers with 200 and 100 neurons are used as the first and second layers, respectively, followed by a fully connected layer with 50 neurons. Similarly, in the CNN-BiLSTM network, two one-dimensional CNN layers are used with a 1 × 3 filter size and with 128 and 256 filters in the first and second layers, respectively, followed by a max pooling layer. After the CNN layers, two BiLSTM layers with 200 and 100 neurons are used, followed by a fully connected layer with 50 neurons. Furthermore, the encoder part of the ED model has two BiLSTM layers with 200 and 100 neurons in the first and second layers, respectively, while the decoder part also comprises two BiLSTM layers, with 100 and 200 neurons. After the decoder part, there is one fully connected layer with 50 neurons. The proposed model is a hybrid connection of two networks: it uses the encoder part of the AE for feature extraction and passes the features to the BiLSTM layers for forecasting. The encoder part of the AE in the proposed model contains two BiLSTM layers with 200 and 100 neurons. After that, two BiLSTM layers with 200 and 100 neurons are used, followed by a fully connected layer with 50 neurons.
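To give a rough sense of the size of the stacked BiLSTM layers described above, the sketch below counts trainable weights using the standard LSTM formula (four gates, each with input weights, recurrent weights, and a bias). Parameter counts are not reported in this paper; the number of input features (five, matching the wind input variables) and the concatenation of forward and backward outputs are assumptions made for illustration.

```python
def lstm_params(input_dim, units):
    """Weights in one standard LSTM layer: 4 gates x (input + recurrent + bias)."""
    return 4 * (units * (input_dim + units) + units)


def bilstm_stack_params(input_dim, layer_units):
    """Per-layer weight counts for stacked bidirectional LSTM layers."""
    counts = []
    for units in layer_units:
        # A bidirectional layer holds one forward and one backward LSTM.
        counts.append(2 * lstm_params(input_dim, units))
        # Assumption: forward and backward outputs are concatenated,
        # so the next layer sees 2 * units features per time step.
        input_dim = 2 * units
    return counts


N_FEATURES = 5  # assumption: the five wind input variables
counts = bilstm_stack_params(N_FEATURES, [200, 100])
print(counts)  # [329600, 400800]
```

Under these assumptions, the second layer is larger than the first despite having fewer neurons, because its input width is set by the concatenated output of the 200-neuron layer.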

Experimental Results
This section explains the experiments performed for power generation forecasting using publicly available datasets, with the hold-out method used to evaluate the performance of the proposed method. We used 70% and 30% of the data for training and testing, respectively, which is a standard data splitting procedure. For classification problems, model validation is typically performed via accuracy, recall, and precision. However, time series forecasting is a regression problem; therefore, basic error metrics such as the MSE, RMSE, and MAE were assessed, which are widely used to validate and verify the effectiveness of regression models.
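The 70%/30% hold-out split can be sketched as follows. The paper does not state whether the split is chronological or shuffled; this sketch assumes a chronological split, which is common for time series, and the function name `holdout_split` and the placeholder data are hypothetical.

```python
def holdout_split(samples, train_percent=70):
    """Split a time-ordered sequence into contiguous training and test parts."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(samples) * train_percent // 100
    return samples[:cut], samples[cut:]


readings = list(range(10))  # placeholder for time-ordered records
train, test = holdout_split(readings)
print(len(train), len(test))  # 7 3
```

Keeping the test portion strictly after the training portion prevents future observations from leaking into training.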

Datasets
To verify and evaluate the performance of the proposed method, two publicly available datasets, namely, a solar dataset [65] and a wind dataset [66], were used. The description of each dataset is presented in Table 1. The solar dataset was obtained from [65] and was collected at a solar plant located at the stadium of the Yeongam F1. These data cover three years and ten months (i.e.,

Wind Dataset
This dataset was obtained from NREL [66] and was gathered in New Kirk. The wind dataset consists of power, wind speed, wind direction, surface air pressure, air temperature, and air density. In this dataset, five variables (wind direction, air temperature, wind speed, air density, and surface air pressure) are used as input variables, while the power is the variable to be forecasted.
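One common way to turn such multivariate records into samples for one-step forecasting is a sliding window, sketched below. The window length, the function name `make_windows`, and the sample rows are assumptions for illustration; the paper does not specify how its input sequences are framed.

```python
def make_windows(inputs, target, window):
    """Pair each block of `window` consecutive input rows with the next target value."""
    if len(inputs) != len(target):
        raise ValueError("inputs and target must have the same length")
    X, y = [], []
    for start in range(len(inputs) - window):
        X.append(inputs[start:start + window])  # past `window` time steps
        y.append(target[start + window])        # value one step ahead
    return X, y


# Hypothetical rows with two made-up input variables per time step
rows = [[float(t), float(t) * 0.1] for t in range(6)]
power = [float(t) * 10 for t in range(6)]
X, y = make_windows(rows, power, window=3)
print(len(X), y)  # 3 [30.0, 40.0, 50.0]
```

Each sample in `X` has shape (window, number of variables), which is the layout that sequential models such as BiLSTM usually expect.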

Results on Solar Dataset
This section discusses the results obtained on the solar dataset with popular competitive DL networks, namely BiLSTM, CNN-BiLSTM, and ED, alongside the proposed AB-Net.
Several studies have used different DL approaches for forecasting purposes. The RNN architecture, which can remember preceding input data while learning the weights of the network, is one of the most widely employed techniques for forecasting problems. Variants of the RNN such as LSTM and BiLSTM improve the network's ability to preserve its state by capturing long-term sequential dependencies. LSTM was originally devised to extend the memory state of RNNs and to enable them to deal with long-term dependencies. BiLSTM is a further variant in which the input sequences are learned in both the forward and backward directions, and several BiLSTM layers can be stacked to capture complex features in time series.
In the experiments, we first analyzed the results obtained with BiLSTM using its predefined settings. Two layers are stacked to process the input data, where each layer also processes the sequence in the reverse direction. The outputs of the BiLSTM are combined in the final layer to produce the final forecast. BiLSTM has been found to be effective in the literature. The MSE obtained by BiLSTM on the solar dataset was 0.0112; this value is presented in Table 2, together with the RMSE and MAE. The forecasting graph obtained with BiLSTM is presented in Figure 5a.
Next, experiments were performed on the hybrid network in which a CNN and BiLSTM are combined to extract the most important and discriminative features. In this network, the CNN layers extract features from the multivariate data that contain the most important details about the sequential data. These features are then forward propagated into the BiLSTM, which learns them for forecasting. The MSE obtained on the solar dataset was 0.0111, while the RMSE and MAE are presented in Table 2.
The forecasting graph obtained with CNN-BiLSTM is presented in Figure 5b. Next, the ED model was applied, which is also a technique that uses BiLSTM for sequence-to-sequence forecasting problems. This technique involves two BiLSTM networks: one network, known as the encoder, encodes the input sequence, while the other, called the decoder, decodes the encoded sequence into a target. At every time step, the encoder takes a single element of the input sequence, processes it, collects the information, and propagates it forward. The encoder produces an internal state that contains information about the entire sequence, which helps the decoder to carry out accurate forecasting. Finally, the decoder provides the final prediction at each time step. The MSE obtained with the ED on the solar dataset was 0.0107; this value is presented in Table 2, together with the RMSE and MAE. The forecasting graph obtained with the ED is presented in Figure 6a.
The proposed method is a hybrid connection of an AE and BiLSTM, rendering the network more capable of extracting the most important and hierarchical features from the multivariate data. The initial part of the network consists of an AE that takes the input sequence and analyzes it for detailed information collection. Once the information from the AE part is collected, it is forward propagated into the BiLSTM for final forecasting.
In traditional time series problems, the AE is usually formed by stacking simple LSTM layers, which are not effective in encoding long-term dependencies. In the proposed method, however, we build the AE part from BiLSTM layers. The output from the AE is fed forward into the BiLSTM, which learns the sequence and provides the final forecast. The first input layer is a BiLSTM, followed by another, smaller BiLSTM layer. The output of the encoder part of the AE is fed into a repeat vector, which takes the single encoded vector and reshapes it for our BiLSTM network. The MSE obtained on the solar dataset was 0.0106; this value is presented in Table 2, together with the RMSE and MAE. The forecasting graph obtained with AB-Net is presented in Figure 6b.

Results on Wind Dataset
This section explains the results obtained on the wind dataset. As with the solar dataset, we followed the same strategy that was applied for the ablation study.
First, BiLSTM was applied to the wind dataset, where we observed that it performs better than it does on the solar dataset. The wind blows continuously and the wind turbines operate around the clock (24 h), whereas solar panels only work in the daytime, when sunlight is available for a specific period; therefore, some values are not recorded for the solar dataset. The MSE obtained by BiLSTM on the wind dataset was 0.0005, while the RMSE and MAE were 0.0219 and 0.0142, respectively. The forecasting graph obtained with BiLSTM is presented in Figure 7a. The hybrid connection of CNN and BiLSTM was also evaluated on the wind dataset, and it was likewise found to perform better than on the solar dataset, for the reason stated above. Moreover, its results on the wind dataset are slightly better than those of the simple BiLSTM: the MSE was again 0.0005, while the RMSE and MAE were 0.0216 and 0.0133, respectively. The forecasting graph obtained with CNN-BiLSTM is presented in Figure 7b.
The ED formed from the BiLSTM variants was also evaluated on the wind dataset and obtained promising results; its architectural details were discussed previously. The ED performed better than BiLSTM and CNN-BiLSTM: its MSE was also 0.0005 at the reported precision, but its RMSE and MAE were lower, at 0.0198 and 0.0130, respectively. The forecasting graph obtained with the ED is presented in Figure 8a. Finally, the proposed AB-Net architecture was evaluated on the wind dataset, where it outperformed all the previously evaluated networks. The network settings of the proposed AB-Net have already been explained in the previous section, and further details are out of the scope of this paper. The MSE of the proposed method on the wind dataset was 0.0004, while the RMSE and MAE were 0.0189 and 0.0109, respectively, as shown in Table 3. The forecasting graph obtained with AB-Net is presented in Figure 8b.

Assessment with State of the Art
In this section, we compare the proposed method with recent research carried out for power generation forecasting. Both the solar and wind datasets were considered for the comparative study. The comparison was performed with the most recent method [67], where a mode-adaptive ANN algorithm is proposed via Spearman's rank order and population-based algorithms. The authors evaluate different models such as advanced particle swarm optimization (APSO) and the fine-tuning metaheuristic algorithm (FTMA). We considered their best results for the comparison, which were obtained using FTMA. The MSE values obtained by FTMA on the solar and wind datasets were 0.0207 and 0.4944, while the RMSE values were 0.1438 and 0.7031, respectively. For the proposed method, the MSE, RMSE, and MAE on the solar dataset were 0.0106, 0.1028, and 0.0743, respectively, while the MSE, RMSE, and MAE on the wind dataset were 0.0004, 0.0189, and 0.0109, respectively. Comparative results are shown in Figure 9a,b for both datasets.
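The size of the gap between the two methods can be expressed as a relative error reduction. The sketch below simply applies that arithmetic to the values quoted above; the helper name `relative_reduction` is hypothetical, and because the reported errors are rounded to four decimals, the resulting percentages are approximate.

```python
# Error values as quoted in the text for FTMA [67] and the proposed AB-Net
reported = {
    "solar": {"FTMA": {"MSE": 0.0207, "RMSE": 0.1438},
              "AB-Net": {"MSE": 0.0106, "RMSE": 0.1028}},
    "wind": {"FTMA": {"MSE": 0.4944, "RMSE": 0.7031},
             "AB-Net": {"MSE": 0.0004, "RMSE": 0.0189}},
}


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` lowers the error relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


reductions = {
    dataset: {
        metric: relative_reduction(values["FTMA"][metric], values["AB-Net"][metric])
        for metric in ("MSE", "RMSE")
    }
    for dataset, values in reported.items()
}

for dataset, metrics in reductions.items():
    for metric, pct in metrics.items():
        print(f"{dataset} {metric}: {pct:.2f}% lower than FTMA")
```

On these figures, the reduction is roughly 49% in MSE and 29% in RMSE on the solar dataset, and above 97% for both metrics on the wind dataset.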

Conclusions
To mitigate the impacts of climate change and global warming, RE usage is increasing significantly. A considerable amount of power has been generated by different RESs in recent decades, and the power generated by these plants is used by consumers for different applications. However, the power produced needs to be predicted so that the right amount of power is produced in the future. To address this forecasting problem, several techniques have come to the foreground, the majority of which are based on traditional learning techniques. To this end, we developed a novel architecture that creates a hybrid connection between an AE and a BiLSTM network. Initially, the data are cleaned through a refinement procedure in the preprocessing step, and the refined sequence is passed into the AE for feature collection. The features obtained from the AE are fed into the BiLSTM for final forecasting. The proposed approach is capable of learning a compressed representation from the sequential input data and of forecasting RES power accurately. The proposed method helps to avoid the overproduction of power and its wastage, and it can help the smart grid and the consumer side to cooperate smoothly. Further, on publicly available datasets, the proposed method was shown to outperform state-of-the-art techniques.
In the future, we aim to consider different scenarios for power generation and its consumption by residential areas, industries, and the commercial side for proper energy management. Moreover, lightweight models will be investigated for their deployment as prediction models over resource-constrained devices by reducing the computation and cost.