1. Introduction
Global demand for clean energy resources that can meet ever-increasing energy needs is rising. Since the early 1990s, natural gas has increasingly been used to meet this demand. While household users consume natural gas for heating, cooking and hot water, industrial users utilize it for power generation, transportation, processing, heating, cooling and cooking. The cost and selling price of natural gas are affected by the consumption of high-use industrial subscribers, for whom energy is a major expenditure item. Therefore, forecasting year-ahead natural gas demand close to actual consumption is important to industrial subscribers.
Although industrial subscribers’ consumption must be predictable, household and other low-consuming subscribers do not need to know their consumption in advance. This makes consumption estimation for low-consuming subscribers difficult. Demand forecasting methods have been, and continue to be, developed to perform this difficult estimation. Decision makers in the energy sector use these methods to predict future demand, and supply and demand must overlap as much as possible; the balance should be maintained with high accuracy. As a result, this stabilization process is an important sub-discipline of energy sectors including electricity, gas, water and wind.
Privatization of the electricity and natural gas sectors in Turkey led to the formation of a market structure in which large demand forecast errors result in penalties. The operation of the market and the penalties are discussed in the following section. Natural gas demand is forecast over various ranges, such as year-ahead monthly, month-ahead daily capacity reservation, and day-ahead forecasting. Day-ahead forecasting is the most difficult of these, since it is hard to implement and must achieve a low prediction error. In this study, day-ahead natural gas demand is forecast using low-consuming subscribers’ data, with a hybrid method that applies the artificial bee colony (ABC) algorithm to train the artificial neural network (ANN).
1.1. Related Work
The literature for this study can be roughly grouped into two categories according to the methods applied: daily natural gas consumption demand forecasting, and ANN and hybrid methods for energy demand. There are many studies on daily natural gas demand forecasting [1,2,3,4,5,6,7,8,9,10,11,12]. Khotanzad et al. worked on a combination of ANN forecasters for predicting natural gas consumption at a citywide distribution level [1]. Gorucu et al. used an ANN to forecast gas consumption at a citywide distribution level [2]. Potocnik et al. proposed a strategy to estimate the forecasting risk at the citywide distribution level using hourly consumption data [3]. Akpinar and Yumusak divided monthly consumption by season and tried to forecast consumption [4]. Sanchez-Ubeda and Berzosa presented a novel prediction model providing end-use industrial consumption forecasts for Spain at the national level over a medium-term horizon (1–3 years) with very high resolution (days), based on a decomposition approach [5]. Yokoyama et al. proposed a global optimization method, called the model trimming method, to identify model parameters [6]; they used neural networks with predicted air temperature and relative humidity as inputs to forecast energy demand. Akpinar and Yumusak used linear regression with the sliding window technique [8]; they slid windows of different sizes over the data and searched for the best solution for natural gas demand. In [9], natural gas consumption is forecast from daily consumption data through several methods, including the seasonal autoregressive integrated moving average model with exogenous inputs (SARIMAX), the multi-layer perceptron ANN (ANN-MLP), the ANN with radial basis functions (ANN-RBF), and multivariate ordinary least squares (OLS); SARIMAX gave more accurate results than the others. Soldo et al. used the linear autoregressive model with exogenous inputs (ARX), ANN and support vector machines (SVM) to forecast daily natural gas consumption with solar radiation [11]; their results confirm that solar radiation improves forecast accuracy. Similar to Soldo’s study, a simulation work on energy consumption was carried out and valuable results were obtained [13].
ANN and hybrid methods are frequently used in the energy sector [1,2,6,7,9,10,11,12,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33]. The adaptive network-based fuzzy inference system (ANFIS) for estimating natural gas demand is one of the hybrid methods in these studies [7]; Azadeh et al. used historical data in that work. Karimi and Dastranj used an ANN-based genetic algorithm (GA) to predict natural gas consumption [10]; the GA was used to optimize the parameters of the neural network topology. Yalcinoz and Eminoglu forecasted the electricity load of Nigde province in Turkey [15]; they used past data for mid-term monthly forecasting, and weather data along with historical data to forecast daily loads with an ANN. Amjady attempted one-day-ahead hourly price forecasting of electricity markets with a new fuzzy NN [16]; the proposed method was examined on the Spanish electricity market and was more accurate than the autoregressive integrated moving average (ARIMA), wavelet-ARIMA, multilayer perceptron (MLP) and radial basis function NN (RBF) models. Saini investigated feedforward ANNs based on steepest descent, Bayesian regularization, and resilient and adaptive backpropagation (BP) learning methods to forecast the seven-day peak load from weather and past peak load information [25]; the best performance was achieved with adaptive BP learning. Azadeh et al. proposed ANFIS-fuzzy data envelopment analysis (FDEA) [30]; FDEA is used to examine the behavior of gas consumption, and the algorithm can deal with both complexity and uncertainty. Szoplik analyzed seasonal and diurnal variation [31]; that work studied the design and training of an MLP model to forecast hourly natural gas demand in a city. In another study, Azadeh et al. showed how to model sharp drops/jumps in natural gas consumption [32], proposing an emotional learning-neuro-fuzzy inference approach for optimum training and forecasting of gas consumption.
1.2. Motivation
The optimization algorithms mentioned above are used with ANNs. The relatively new artificial bee colony (ABC) optimization technique is one of them and has a wide range of optimization uses, as outlined below. The ABC algorithm can also be used in combination with other algorithms [12,33,34,35]. Akpinar et al. forecasted day-ahead natural gas demand using hybrid ANN-ABC and ANN-BP [12]; using various ANN structures and hidden layers, they obtained an 18% mean absolute percentage error (MAPE) and a coefficient of determination of 0.891. Uzlu et al. estimated hydroelectric generation for Turkey using an ANN with the ABC algorithm [33] and found the ANN-ABC model more accurate than a classical ANN. Li et al. studied optimal power flow problems using differential evolution (DE) and ABC algorithms [34]; they noted that the DE algorithm solves problems with a large population size, as opposed to ABC, and proposed a hybrid DE-ABC algorithm, which converged in less time than DE and proved effective. They also studied an energy-efficient optimal deployment strategy. Adak and Yumusak studied the classification of aroma data for four fruits using ABC [35] and found that an ANN trained by ABC successfully classified the aroma data.
Suganthi presented a survey of energy demand forecasting [36]. A review of natural gas demand forecasting was given by Soldo [37], and a review of the ABC algorithm used in this study was presented by Karaboga [38].
1.3. Our Contribution
This paper studies day-ahead natural gas demand forecasting. As the studies above show, ANNs are widely used for predicting energy demand, and the ABC algorithm has been applied to optimization in several fields. In this paper, the ABC algorithm is applied in the training stage of the ANN as an alternative to the BP algorithm.
The main reasons for selecting ABC are that it is easier to apply and requires fewer parameters than other algorithms. Since the exploration capability of ABC is stronger than that of many alternatives, it can reach the global minimum without getting stuck in a shallow local minimum. In addition, the use of the ANN-ABC algorithm with univariate data and the use of the sliding window technique are important aspects of this study in predicting day-ahead demand. ABC-optimized feedforward ANNs (ANN-ABC) and ANN-BP with three different hidden layer structures and various numbers of neurons are applied to day-ahead demand prediction for natural gas, a sub-branch of the energy sector. These methods have not previously been used together for day-ahead natural gas prediction. The data contain no variables other than past consumption itself.
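The core idea of ANN-ABC, treating the network weights as a food source position and letting the bee colony search for low-error weights instead of backpropagating gradients, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the one-hidden-layer network, the colony size, the trial limit, the weight bounds and the plain-MSE fitness are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(w, X, n_in, n_hid):
    """Tiny 1-hidden-layer feedforward net; `w` is one flat weight vector."""
    k = n_in * n_hid
    W1 = w[:k].reshape(n_in, n_hid)       # input -> hidden weights
    b1 = w[k:k + n_hid]                   # hidden biases
    W2 = w[k + n_hid:k + 2 * n_hid]       # hidden -> output weights
    b2 = w[-1]                            # output bias
    h = np.tanh(X @ W1 + b1)
    return h @ W2 + b2

def train_ann_abc(X, y, n_hid=5, n_food=20, limit=30, epochs=200, bound=2.0):
    """Train the flat weight vector with the artificial bee colony algorithm."""
    n_in = X.shape[1]
    dim = n_in * n_hid + n_hid + n_hid + 1
    # Each food source is one candidate network (a flat weight vector).
    foods = rng.uniform(-bound, bound, (n_food, dim))
    errs = np.array([np.mean((forward(f, X, n_in, n_hid) - y) ** 2) for f in foods])
    trials = np.zeros(n_food, dtype=int)

    def try_neighbour(i):
        k = rng.integers(n_food - 1)
        k = k if k < i else k + 1                       # random partner != i
        j = rng.integers(dim)                           # perturb one dimension
        cand = foods[i].copy()
        cand[j] += rng.uniform(-1, 1) * (foods[i][j] - foods[k][j])
        e = np.mean((forward(cand, X, n_in, n_hid) - y) ** 2)
        if e < errs[i]:                                 # greedy selection
            foods[i], errs[i], trials[i] = cand, e, 0
        else:
            trials[i] += 1

    for _ in range(epochs):
        for i in range(n_food):                         # employed bee phase
            try_neighbour(i)
        fit = 1.0 / (1.0 + errs)                        # fitness for onlookers
        probs = fit / fit.sum()
        for i in rng.choice(n_food, size=n_food, p=probs):  # onlooker phase
            try_neighbour(i)
        worn = np.argmax(trials)                        # scout phase: abandon
        if trials[worn] > limit:                        # an exhausted source
            foods[worn] = rng.uniform(-bound, bound, dim)
            errs[worn] = np.mean((forward(foods[worn], X, n_in, n_hid) - y) ** 2)
            trials[worn] = 0

    best = np.argmin(errs)
    return foods[best], errs[best]
```

The greedy selection step is what gives ABC the behavior discussed later in the results: a food source is only replaced by a better neighbour, so the best training error can never increase from one epoch to the next.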
The rest of the paper is organized as follows: the natural gas market of Turkey and the collected natural gas data are presented in Section 2. A theoretical description of the methods is provided in Section 3. Section 4 gives detailed information about modeling, definitions, scenarios and results. The key findings and future work are given in Section 5, the conclusions.
4. Scenarios and Results
In this study, different scenarios are prepared to forecast natural gas consumption. These scenarios have two parts: the first trains the ANN using the BP algorithm, and the second trains the ANN using the ABC algorithm. All trainings are run with the same numbers of neurons and hidden layers. The MSE, MAPE and R² measures are used in the training stage for the ABC, and the same measures are also calculated for BP training. The ANNs are tested on the year 2014 using the trained weights.
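The three evaluation measures can be sketched as follows; a minimal illustration (MAPE is expressed as a fraction, so the 0.2 performance criterion used later corresponds to 20%, and errors are computed on real, denormalized consumption values):

```python
import numpy as np

def mse(actual, predicted):
    """Mean squared error."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.mean((actual - predicted) ** 2)

def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction (0.16 == 16%)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.mean(np.abs((actual - predicted) / actual))

def r2(actual, predicted):
    """Coefficient of determination (R^2): 1 means a perfect fit."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - np.mean(actual)) ** 2)
    return 1.0 - ss_res / ss_tot
```

Since R² increases toward 1 for better fits, a training routine that minimizes error would presumably minimize 1 − R² for this criterion.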
Table 2 shows the training parameters for the ABC and BP. In this table, the food source limit is 365, corresponding to the daily natural gas consumption values, because the forecast is daily.
The ANN structure is shown in Figure 6. The seven days before the forecast day are used as input values for the network, and day-ahead consumption is forecast. The training stages for the ANNs are prepared individually with MSE, MAPE and R² error calculations. Normalized consumptions are transformed back to real consumptions before the errors are calculated, so the error calculations (MSE, MAPE and R²) are done on real consumptions and training is effectively performed with real data. Various ANNs are designed to determine the best-performing one. For this purpose, a one-hidden-layer network with 20, 40, 60, 80 and 100 neurons is trained for 500, 1000, 3000, 5000, 7000 and 10,000 epochs, giving 30 different results for BP training and 90 different results for ABC training. The networks with suitable performance on the test data, defined as a MAPE of 0.2 or less, are selected. These networks are redesigned with two hidden layers and trained again with the same numbers of epochs; in the two-hidden-layer structure, the second hidden layer’s neuron count is incremented in tens from 10 to 60. For each of the MSE, MAPE and R² training errors for ABC, and for the MSE training error for BP, 120 individual network structures are prepared and trained. For two hidden layers, the performance criterion is a MAPE of 0.16 or less. The networks meeting this criterion are redesigned with a third hidden layer of 5, 15 or 30 neurons and trained with the same numbers of epochs as for one and two hidden layers.
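The seven-days-in, one-day-ahead setup described above can be sketched as a sliding window over the univariate consumption series; a minimal illustration with a hypothetical ten-day toy series:

```python
import numpy as np

def make_sliding_windows(series, window=7):
    """Build (inputs, target) pairs: each sample uses `window` consecutive
    days of consumption to predict the following day's consumption."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])   # 7 past days -> network inputs
        y.append(series[i + window])     # day-ahead consumption -> target
    return np.array(X), np.array(y)

# Toy 10-day series yields 3 training samples:
daily = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
X, y = make_sliding_windows(daily, window=7)
# X.shape == (3, 7); y == [12.0, 13.0, 14.0]
```

A full year of daily data therefore yields 365 − 7 = 358 such samples, one per forecastable day.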
The numbers of prepared ANN structures meeting the performance criteria are given in Table 3. The rows indicate the number of hidden layers, and the columns indicate the number of epochs. The most efficient training is seen with 7000 epochs.
The error terms MSE, MAPE and R² are used for training the ANN-ABC; due to the nature of BP, only MSE is used in the training stage of the ANN-BP. The network structures with the best MAPE results on the test dataset, and their abbreviations, are given in Table 4. The best network structures are shown for three criteria: the number of hidden layers, the training error used for the ABC, and the test results for BP and ABC. The ANN models with the lowest MSE, MAPE and R² values on the test dataset are shown in Table 4. On the test dataset, for the one-hidden-layer BP training model, 40 neurons and 1000 epochs give the best MSE value; for ABC training with the MSE error in one hidden layer, the best test result is obtained with 20 neurons and 3000 epochs. The other cases can be read from the table. Abbreviations are written as the training type followed by two parameters in parentheses. The first parameter is the error type: for BP, training can only be done with MSE, while for ABC it shows the error type used during and at the end of training; S stands for the MSE value, M for the MAPE value, and R for the R² value. The second parameter is the number of hidden layers.
The best test dataset results are found with different ANN structures and different numbers of epochs (Table 4). All results mentioned in this paragraph are based on the best test dataset outcomes (Figure 7). When MSE is used as the error criterion, the best results occur at 500 epochs for ABC training (Figure 7(a2)) and 1000 epochs for BP training. The lowest MSE value for BP training is 646,201,826.5 with three hidden layers, while the lowest MSE value for ABC training is 2,185,385,306 with three hidden layers; the BP training MSE is thus about 3.5 times lower than the lowest ABC training MSE (Figure 7(a3)). For the MAPE criterion, the lowest values occur at 7000 epochs for ABC training and 10,000 epochs for BP training (Figure 7(b1–b3)). The lowest MAPE values with three hidden layers are 0.0894 for BP and 0.1412 for ABC (Figure 7(b3)); at the end of training, the BP MAPE is less than half that of ABC. For the R² criterion, the lowest values occur at 7000 and 3000 epochs for BP and ABC, respectively (Figure 7(c1,c2)). The 1 − R² training error for BP with three hidden layers is vanishingly small, so R² is effectively 1, whereas the lowest 1 − R² value for ABC training with three hidden layers is 0.0734, giving an R² of 0.9266 (Figure 7(c3)). During the training stage, by the very structure of BP, the differentiated errors are propagated back to the network weights; thus, BP training produces lower errors than ABC training for all error criteria and hidden layer structures (Figure 7). These error results indicate that BP is very successful in training, while the ABC training results are less so. One might expect the same to apply to the forecasting results of BP and ABC.
As depicted in Figure 7, most of the training takes place in the first 100 epochs of BP training; after 100 epochs, the training slope falls to nearly zero, and with the MAPE criterion on one hidden layer the training errors even increase (Figure 7(b1)). Unlike BP training, the error values in ABC decrease continuously at every epoch. This reflects the fact that, after finding the most effective nectar, the ABC keeps searching for better sources nearby, so it continues to improve its learning; BP, by comparison, tends to over-fit the training data. After the training step, day-ahead forecasts are made from 01.01.2014 to 31.12.2014 by the trained networks, using the seven days of consumption before each forecast day. Based on the training error criteria, the forecasting results for the lowest-error ANNs are shown in Figure 8 for the three different hidden layer structures. The most noticeable point, independent of the error criterion, is that the BP and ABC errors visibly drop as the number of hidden layers increases. For BP, one and two hidden layers give high errors regardless of the error criterion: the tested BP-trained network has roughly 100% MAPE with a single hidden layer, approximately 63% MAPE with two hidden layers, and 33% with three hidden layers. It can be said that each added hidden layer decreases the MAPE by about 33 percentage points for the BP algorithm. The insensitivity of the ABC-trained network to the number of layers is also notable: independent of the error criterion, one and two hidden layers give 16.8% MAPE on average, while three hidden layers give nearly 16.6% MAPE. In other words, in the ABC-trained network, the error barely varies with the number of hidden layers, regardless of the error criterion. The fact that different results are obtained in the same structure by changing the network weights shows the success level of the training; since the ABC-trained network achieves better results than the BP-trained network, the accuracy of the approach applied in this study is confirmed.
Tests in all scenarios showed that ANN-ABC produces lower error values than ANN-BP. The error values at the end of the test process are presented in Table 5 according to the number of hidden layers. Among the single-hidden-layer structures in all scenarios, the network trained for 7000 epochs with 20 neurons performs best, with a MAPE of 16.29%. Among the two-hidden-layer networks, the best performer, with a MAPE of 15.36%, is the network run for 7000 epochs with 20 + 20 neurons. The best three-hidden-layer network achieves a 14.94% MAPE, run for 7000 epochs with 20 + 10 + 5 neurons. All of these values are for ANN-ABC. Since ANN-BP performs worse than ANN-ABC in all scenarios, the ANN is also trained in a different BP examination, with one, two and three hidden layers on a dataset normalized to [0, 1], where the values are not converted back to real numbers during training. The lowest error values obtained are 41.7% with a single hidden layer, 30.21% with two hidden layers, and 29.97% with three hidden layers.
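The [0, 1] normalization mentioned above is presumably standard min-max scaling; the exact transform is not specified in the text, so the following is a hedged sketch of scaling and the back-transformation to real consumptions used before error calculation:

```python
import numpy as np

def minmax_fit(series):
    """Return the (lo, hi) range of the training series for [0, 1] scaling."""
    s = np.asarray(series, dtype=float)
    return s.min(), s.max()

def minmax_scale(series, lo, hi):
    """Map real consumptions into [0, 1]."""
    return (np.asarray(series, dtype=float) - lo) / (hi - lo)

def minmax_inverse(scaled, lo, hi):
    """Map normalized values back to real consumption before computing errors."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo
```

Fitting `lo` and `hi` on the training series only, and reusing them for the test year, avoids leaking test-set information into the scaling.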
The lowest MAPE values based on the test data predictions for the BP and ABC trainings are given in Figure 9. Across the forecast series, the BP algorithm’s error is usually higher than ABC’s. The high prediction values for the summer period indicate that the influence of the winter months on the BP algorithm is stronger. This strong winter influence leads BP to sharp increases or decreases in its predictions for the summer, spring and fall seasons, when the previous days’ consumption values are used as input; the sudden jumps of the BP algorithm can be seen clearly from February–March and November–December. In October, for instance, the BP forecast is entirely different from the actual consumption: the slowly increasing consumption triggers a very high predicted consumption, which implies that BP memorizes rather than learns. In contrast, all seasons appear to have a similar effect on the ABC algorithm, and the prediction and the realization clearly overlap in summer. The fact that the small consumption increase in October yields a prediction at almost the same level clearly shows the success of the ABC training. Unlike the BP algorithm, towards the end of April the ABC predictions follow the real consumption decrease proportionally.
Figure 10 shows the states with the lowest MAPE values for the BP and ABC models during training. The lowest error for the BP algorithm is obtained with the MSE criterion, while for the ABC algorithm it is obtained with the MAPE criterion. Therefore, the graph has a two-sided y-axis: the left axis represents MAPE, and the right axis represents MSE. In the detailed BP graph, the MSE drops steeply from its initial value within the first 10 epochs, and the local minimum point is reached only once, by the 110th epoch. The training then stays almost unchanged until about 4000 epochs, improves again afterwards, and at the end of 10,000 epochs reaches an MSE of 6.46 × 10⁸. Overall, the BP algorithm reduced the MSE to roughly one tenth of its initial value; about 7/10 of the total error reduction takes place before 4000 epochs and 3/10 between 4000 and 10,000 epochs, indicating that the majority of the training is completed in the first half. In the ABC model, by contrast, MAPE decreased at every epoch during the training stage, from 0.67 at the start to below 15% at the 6097th epoch and 14.68% at the end of the 7000th epoch. In the test, the ANN-ABC gives 14.9% MAPE for one-day-ahead forecasting, which shows that the ABC does not memorize the consumption data and that training with the ABC succeeds.
MSE value. The BP algorithm reduced the MSE error by one tenth in training. In the BP algorithm, 7/10 of the total reduced errors take place until 4000 epochs, while 3/10 of reduced errors occur between 4000 and 10,000 epochs. This state indicates that the majority of the training is completed until the first half of training. However, in the ABC model, MAPE decreased for each epoch during the training stage, which had a value of 0.67 at the beginning. It is also analyzed that, in training, the ABC has below 15% MAPE for the 6097th epoch, and 14.68% MAPE at the end of the 7000th Epoch. In the test, the ANN-ABC gives 14.9% MAPE for one-day forecasting, which proves that the ABC does not memorize the consumption data, and the training with the ABC succeeds.