Exploring an Ensemble of Methods that Combines Fuzzy Cognitive Maps and Neural Networks in Solving the Time Series Prediction Problem of Gas Consumption in Greece

This paper introduced a new ensemble learning approach, based on evolutionary fuzzy cognitive maps (FCMs), artificial neural networks (ANNs), and their hybrid structure (FCM-ANN), for time series prediction. The main aim of time series forecasting is to obtain reasonably accurate forecasts of future data by analyzing records of past data. In the paper, we proposed an ensemble-based forecast combination methodology as an alternative approach to individual forecasting methods for time series prediction. The ensemble learning technique combines various learning algorithms, including SOGA (structure optimization genetic algorithm)-based FCMs, RCGA (real coded genetic algorithm)-based FCMs, efficient and adaptive ANN architectures, and the hybrid FCM-ANN model.


Introduction
Time series forecasting is a highly important and dynamic research domain with wide applicability to many diverse scientific fields, ranging from ecological modeling to energy [1]. A forecasting method based on the efficient capabilities of evolutionary fuzzy cognitive maps (FCMs), enhanced by structure optimization algorithms and artificial neural networks (ANNs), was introduced in [69]. Furthermore, the researchers in [21,60] recently conducted a preliminary study on implementing FCMs with NNs for natural gas prediction.

Research Aim and Approach
The purpose of this paper was to propose a new forecast combination approach built from FCMs, ANNs, and hybrid models. This ensemble forecasting method, including the two most popular ensemble methods, the Average and the Error-based, relies on ANNs, FCMs with learning capabilities, and a hybrid FCM-ANN model with different configurations, to produce an accurate non-linear time series model for the prediction of natural gas consumption. A real case study of natural gas consumption in Greece was performed to show the applicability of the proposed approach. Furthermore, to validate the proposed forecasting combination approach, a comparison analysis was conducted between the ensemble methods and an innovative machine learning technique devoted to time series forecasting, the long short-term memory (LSTM) algorithm; the results demonstrated sufficient evidence that the proposed approach could be used effectively to conduct forecasting based on multivariate time series. The LSTM algorithm, as an advanced recurrent NN method, was previously used for short-term natural gas demand forecasting in Greece [70]. In that research paper, LSTM was applied to one-day-ahead natural gas consumption forecasting for the same three Greek cities that were also examined in the case study presented in the current paper. Many similar works in the literature examine various forecast combinations in terms of accuracy and error variability but, in the present work, an innovative approach that combines FCMs, ANNs, and hybrid FCM-ANN models, producing a non-linear time series model for the prediction of natural gas consumption, was studied exclusively, contributing to the novelty of the current study. The results demonstrated clearly that the proposed approach attained better accuracies than the individual models.
This study justified the superiority of the selective ensemble method, which combines the important features and capabilities of the constituent models of the overall approach, making it a useful tool for future work.
The outline of the paper is as follows. Section 2 describes the material and methods of our research study; Section 2.1 describes the case study problem and refers to the datasets of natural gas demand that are used, whereas Section 2.2 presents the analyzed approaches for time series forecasting based on ANNs, FCMs with evolutionary learning algorithms, and their hybrid combinations. The most widely used ensemble methods for forecasting problems (i.e., the error-based and the simple average method) are also presented in Section 2.2. In Section 3, the proposed forecasting combination approach is described. The same Section presents the evaluation criteria, which we have used to analyze the performance of the analyzed approaches for natural gas prediction. Section 4 presents the results of simulation analysis for three different Greek cities, as well as the conducted comparative analysis of the proposed approach with other intelligent techniques. A discussion of the results highlights the main findings of the proposed ensemble forecasts approach. Section 5 summarizes the main conclusions of the paper with further discussion and suggestions about future research expansion.

Material-Dataset
In the considered case study, three different prediction datasets of natural gas demand, derived from different districts in Greece, were analyzed from the records of the Hellenic Gas Transmission System Operator S.A. (www.desfa.gr, DESFA). DESFA is responsible for the operation, management, exploitation, and development of the Greek Natural Gas System and its interconnections in a technically sound and economically viable way. Since 2008, DESFA has provided historical data of transmission system operation and natural gas deliveries/off-takes. In this research work, historical data with the values of gas consumption for a period of five years, from 2013 to 2017, were used as initial data to accomplish forecasting. These data were split into training and testing data: the training data came from the first four years and were used for learning the models, whereas the data of the last year were used for testing the applied artificial intelligence models.
It is crucial for an efficient forecast to properly select the number and types of inputs. Thus, we emphasized defining proper input candidates. Six different inputs for time series prediction were considered. The first three inputs were devoted to a month indicator, a day indicator, and the mean temperature. Specifically, concerning the calendar indicators, we used one input for month coding and one input for day coding. Let m = 1, 2, . . . , 12 be the number of the month. We considered the following matching: January/1, February/2, . . . , December/12. Let l = 1, 2, . . . , 7 be the number of the day. The day type matching was as follows: Monday/1, Tuesday/2, . . . , Sunday/7. The temperature data were obtained from the station nearest to the gas distribution point. The remaining three inputs were the previously measured values of natural gas demand: one day before, two days before, and on the current day. These six variables were used to form the input pattern of the FCM. The output referred to the total daily demand for the specific distribution point.
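For illustration, the six-element input pattern described above could be assembled as in the following sketch (the function and argument names are hypothetical, not taken from the paper):

```python
def build_input_pattern(month, day_of_week, mean_temp,
                        demand_2d_ago, demand_1d_ago, demand_today):
    """Form the six inputs: month (1-12), day type (1-7), mean temperature,
    and three natural gas demand values (two days before, one day before,
    current day)."""
    assert 1 <= month <= 12 and 1 <= day_of_week <= 7
    return [month, day_of_week, mean_temp,
            demand_2d_ago, demand_1d_ago, demand_today]

# Example: a Monday in January with mean temperature 8.5 degrees C
pattern = build_input_pattern(1, 1, 8.5, 1200.0, 1150.0, 1180.0)
```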
The features that were gathered and used in our study to form the FCM model were sufficient and properly selected according to the relevant literature. From a recent literature review regarding the prediction of natural gas consumption [40], it can be seen that past gas consumption combined with meteorological data (especially temperature) are the most commonly used input variables for the prediction of natural gas consumption. A recent study [41] used past consumption, temperature, months, and days of the week, while in [55], the day of the week and the demand of the same day in the previous year were used as input variables for natural gas forecasting. Considering the above practices described in the literature, it can be concluded that the features used in the current work were sufficient to predict the consumption of natural gas for the selected areas.
The Greek cities of Thessaloniki, Athens, and Larissa were selected for the conducted simulation analysis and comparison of the best performing algorithms. These different natural gas consumption datasets may offer insight into whether the analyzed algorithms perform equally in different locations, where the energy demand could be completely different for the same days.

Fuzzy Cognitive Maps Overview
A fuzzy cognitive map (FCM) is a directed graph in which nodes denote concepts important for the analyzed problem, and links represent the causal relationships between concepts [71]. It is an effective tool for modeling decision support systems. FCMs have been applied in many research domains, e.g., in business performance analysis [72], strategy planning [73], modeling virtual worlds [74], time series prediction [69], and adoption of educational software [75].
The FCM model can be used to perform simulations by utilizing its dynamic model. The values of the concepts change in time as the simulation goes on [68]. The new values of the concepts can be calculated based on the popular dynamic model described as follows [59]:

$$X_i(t+1) = F\left(\sum_{j=1,\, j \neq i}^{n} w_{j,i} X_j(t)\right)$$

where $X_i(t)$ is the value of the $i$th concept at the $t$th iteration, $w_{j,i}$ is the weight of the connection (relationship) between the $j$th concept and the $i$th concept, $t$ is discrete time, $i, j = 1, 2, \ldots, n$, $n$ is the number of concepts, and $F(x)$ is the sigmoidal transformation function [58]:

$$F(x) = \frac{1}{1 + e^{-cx}}$$

where $c$ is a parameter, $c > 0$. The weights of the relationships show how causal concepts affect one another. If $w_{j,i} > 0$, then an increase/decrease in the value of the $j$th concept will increase/decrease the value of the $i$th concept. If $w_{j,i} < 0$, then an increase/decrease in the value of the $j$th concept will decrease/increase the value of the $i$th concept. If $w_{j,i} = 0$, there is no causal relationship between the $j$th and the $i$th concepts [74]. The FCM structure is often constructed based on expert knowledge or surveys [74]. We could also use machine learning algorithms and available historical data to construct the FCM model and determine the weights of the relationships between the FCM's concepts.
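A minimal sketch of one iteration of this dynamic model follows; the 3-concept weight matrix is purely illustrative:

```python
import math

def fcm_step(X, W, c=5.0):
    """One FCM iteration: each concept's new value is the sigmoid F of the
    weighted sum of the other concepts' current values. W[j][i] is the
    weight of the connection from concept j to concept i."""
    F = lambda x: 1.0 / (1.0 + math.exp(-c * x))
    n = len(X)
    return [F(sum(W[j][i] * X[j] for j in range(n) if j != i))
            for i in range(n)]

# Tiny 3-concept map with illustrative weights in [-1, 1]
W = [[0.0, 0.6, -0.3],
     [0.4, 0.0, 0.5],
     [-0.2, 0.1, 0.0]]
X_next = fcm_step([0.5, 0.4, 0.7], W)
```

Because F is a sigmoid, every concept value stays within (0, 1), which matches the normalized range the FCM concepts operate on.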

Fuzzy Cognitive Maps Evolutionary Learning
Evolutionary algorithms are popular techniques for FCMs learning. In this paper, we explored two effective techniques: the real-coded genetic algorithm (RCGA) [68] and the structure optimization genetic algorithm (SOGA) [69].

Real-Coded Genetic Algorithm (RCGA)
The RCGA algorithm defines an individual in the population as a vector of the weights of the relationships [24]:

$$W = [w_{1,1}, w_{1,2}, \ldots, w_{n,n}]^T$$

where $w_{j,i}$ is the weight of the relationship between the $j$th concept and the $i$th concept.

Each individual in the population is evaluated with the use of a fitness function based on the data error [66]:

$$fitness_p(MSE_{tr}(l)) = \frac{1}{a \cdot MSE_{tr}(l) + 1}$$

where $a$ is a parameter, $l$ is the number of the generation, $l = 1, \ldots, L$, $L$ is the maximum number of generations, $p$ is the number of the individual, $p = 1, \ldots, P$, $P$ is the population size, and $MSE_{tr}(l)$ is the data error, described as follows:

$$MSE_{tr}(l) = \frac{1}{N_{tr}} \sum_{t=1}^{N_{tr}} (e_t)^2$$

where $t = 1, \ldots, N_{tr}$, $N_{tr}$ is the number of training records, and $e_t$ is the one-step-ahead prediction error at the $t$th iteration, described as follows:

$$e_t = X(t) - Z(t)$$

where $X(t)$ is the predicted value of the output concept, and $Z(t)$ is the desired value of the output concept. The RCGA stops when the maximum number of generations $L$ is reached, or when condition (7) is met, which means that the learning process is successful:

$$fitness_p(MSE_{tr}(l)) > fitness_{max} \quad (7)$$

where $fitness_p(MSE_{tr}(l))$ is the fitness function value for the best individual, and $fitness_{max}$ is a parameter.
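The data error and fitness computation can be sketched as follows; the value of the parameter a is an illustrative assumption:

```python
def mse_tr(predicted, desired):
    """Data error: mean of the squared one-step-ahead errors e_t = X(t) - Z(t)."""
    return sum((x - z) ** 2 for x, z in zip(predicted, desired)) / len(predicted)

def fitness(mse, a=100.0):
    """RCGA fitness: 1 / (a * MSE + 1); a = 100 is an illustrative value."""
    return 1.0 / (a * mse + 1.0)

# Small illustrative example: one prediction off by 0.1
error = mse_tr([0.2, 0.6, 0.5], [0.2, 0.5, 0.5])
fit = fitness(error)
```

The fitness grows toward 1 as the data error shrinks toward 0, so maximizing fitness is equivalent to minimizing MSE.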

Structure Optimization Genetic Algorithm (SOGA)
The SOGA algorithm is an extension of the RCGA algorithm [65,66] that allows the decision-maker to determine the most significant concepts and the relationships between them.
Each individual is evaluated with a fitness function based on a new data error, described as follows [66]:

$$MSE'_{tr}(l) = MSE_{tr}(l) + b_1 \frac{n_r}{n^2} MSE_{tr}(l) + b_2 \frac{n_c}{n} MSE_{tr}(l) \quad (8)$$

where $b_1, b_2$ are the parameters of the fitness function, $n_r$ is the number of the non-zero relationships, $n_c$ is the number of the concepts in the analyzed model, $n$ is the number of all possible concepts, $l$ is the number of the generation, $l = 1, \ldots, L$, and $L$ is the maximum number of generations. The fitness function (9) calculates the quality of each individual in the population:

$$fitness_p(MSE'_{tr}(l)) = \frac{1}{a \cdot MSE'_{tr}(l) + 1} \quad (9)$$

where $a$ is an experimentally defined parameter, $p$ is the number of the individual, $p = 1, \ldots, P$, $P$ is the population size, and $MSE'_{tr}(l)$ is the new error measure. We could construct a less complex time series prediction model by removing the redundant concepts and the connections between them with the use of a binary vector $C$ and the proposed error function.
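A small sketch of the penalized error of Eq. (8); the values of b1 and b2 are illustrative assumptions:

```python
def soga_error(mse, n_r, n_c, n, b1=0.1, b2=0.1):
    """SOGA's new data error: the base MSE plus penalties proportional to
    the number of non-zero relationships n_r and the number of concepts n_c,
    relative to the n possible concepts. b1 and b2 are illustrative values."""
    return mse + b1 * (n_r / n ** 2) * mse + b2 * (n_c / n) * mse

# A sparser candidate map is penalized less than a dense one
dense = soga_error(0.01, n_r=30, n_c=6, n=6)
sparse = soga_error(0.01, n_r=12, n_c=4, n=6)
```

This is what drives SOGA toward simpler maps: of two individuals with equal MSE, the one with fewer concepts and relationships gets the higher fitness.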
The algorithmic steps of the learning and analysis of the FCM in modeling prediction systems with the use of population-based algorithms (SOGA and RCGA) were analytically presented in [69].
For our experiments, the evolutionary operators of (a) ranking selection, (b) uniform crossover, and (c) random mutation were used [76,77]. In addition, we applied the elite strategy selection, while a probability of crossover $P_c$ and of mutation $P_m$ was assigned to each population.

Artificial Neural Networks
An artificial neural network (ANN) is a collection of artificial neurons organized in the form of layers [25]. Neurons are connected by weighted connections to form a NN. The most widely used ANNs in time series prediction are the multilayer perceptrons with an input layer, an output layer, and a single hidden layer lying between them. The most common structure is an ANN with one or two hidden layers, as a feed-forward neural network with one hidden layer is able to approximate any continuous function. Supervised learning algorithms and historical data can be used for the learning process of ANNs. The output of each neuron can be calculated based on the following formula:

$$Y(t) = F\left(\sum_{j=1}^{m} w_j X_j(t) + b\right)$$

where $X_j(t)$ is the value of the $j$th input signal, $t = 1, \ldots, N_{tr}$, $N_{tr}$ is the number of training records, $w_j$ is the synaptic weight, $m$ is the number of input signals, $b$ is the bias, and $F$ is the sigmoid activation function. Training a neural network requires determining the values of the connection weights and the biases of the neurons. There are many neural network learning algorithms; the most popular algorithm for ANN learning is the back-propagation method. In this learning method, the weights change their values according to the learning records until one epoch (a pass through the entire learning dataset) is completed. This method aims to minimize the error function, described as follows [14,78,79]:

$$MSE_{tr}(l) = \frac{1}{N_{tr}} \sum_{t=1}^{N_{tr}} (e_t)^2$$

where $t = 1, \ldots, N_{tr}$, $N_{tr}$ is the number of training records, $l$ is the number of the epoch, $l = 1, \ldots, L$, $L$ is the maximum number of epochs, and $e_t$ is the one-step-ahead prediction error at the $t$th iteration, which is equal to:

$$e_t = X(t) - Z(t)$$

where $X(t)$ is the output value of the ANN, and $Z(t)$ is the desired value. The modification of the weights in the back-propagation algorithm can be calculated by the formula:

$$\Delta w_{kj}(l) = -\gamma \frac{\partial MSE_{tr}(l)}{\partial w_{kj}}$$

where $\Delta w_{kj}(l)$ is the change of the weight $w_{kj}$ at the $l$th epoch, and $\gamma$ is a learning coefficient.
The backpropagation algorithm with momentum modifies the weights according to the formula:

$$\Delta w_{kj}(l) = -\gamma \frac{\partial MSE_{tr}(l)}{\partial w_{kj}} + \alpha \Delta w_{kj}(l-1)$$

where $\alpha$ is a momentum parameter.
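The momentum update can be sketched as follows for a single weight; the gradient values and the coefficients gamma and alpha are illustrative:

```python
def momentum_update(w, grad, prev_delta, gamma=0.1, alpha=0.9):
    """Backpropagation-with-momentum step: the weight change is
    -gamma * dE/dw plus alpha times the previous change."""
    delta = -gamma * grad + alpha * prev_delta
    return w + delta, delta

# Two consecutive updates of one weight (illustrative gradient values)
w, d = momentum_update(0.5, grad=0.2, prev_delta=0.0)
w, d = momentum_update(w, grad=0.1, prev_delta=d)
```

The momentum term keeps part of the previous step, which smooths the trajectory and speeds up convergence along consistent gradient directions.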

Hybrid Approach Based on FCMs, SOGA, and ANNs
The hybrid approach for time series prediction is based on FCMs, the SOGA algorithm, and ANNs [68]. This approach consists of two stages:

1. Construction of the FCM model based on the SOGA algorithm to reduce the concepts that have no significant influence on the data error.
2. Considering the selected concepts (data attributes) as the inputs for the ANN, and ANN learning with the use of the backpropagation method with momentum.
This hybrid structure allows the decision-maker to select the most significant concepts for an FCM model using the SOGA algorithm. These concepts are used as inputs for the ANN model. Such a hybrid approach aims to find the most accurate model for time series prediction problems.
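A minimal sketch of the two-stage idea, where a binary vector C stands in for the outcome of SOGA's concept selection; all names and values here are illustrative:

```python
def select_concepts(patterns, C):
    """Apply SOGA's binary selection vector C: concept j is kept as an
    ANN input iff C[j] == 1."""
    return [[x for x, keep in zip(row, C) if keep] for row in patterns]

# Illustrative patterns (month, day, temperature, three demand values)
patterns = [[1, 1, 8.5, 1200.0, 1150.0, 1180.0],
            [1, 2, 7.9, 1150.0, 1180.0, 1210.0]]
C = [1, 0, 1, 0, 1, 1]   # e.g., SOGA dropped the 2nd and 4th concepts
ann_inputs = select_concepts(patterns, C)
```

The reduced patterns in `ann_inputs` would then be fed to the ANN trained with backpropagation with momentum, as in stage 2.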

The Ensemble Forecasting Method
The most intuitive and popular way of forecast aggregation is to linearly combine the constituent forecasts [80]. There are various methods proposed in the literature for selecting the combining weights [81]. The most popular and widely used ensemble methods are the error-based and the simple average [82]. The easiest among them is the simple average in which all forecasts are weighted equally, often remarkably improving overall forecasting accuracy [82,83].
Considering that $Y = [y_1, y_2, y_3, \ldots, y_N]^T$ is the actual out-of-sample testing dataset of a time series and $\hat{Y}^i = [\hat{y}^i_1, \hat{y}^i_2, \ldots, \hat{y}^i_N]^T$ is the forecast of the $i$th model, the linear combination of $n$ forecasts is produced by [15]:

$$\hat{Y} = \sum_{i=1}^{n} w_i \hat{Y}^i$$

where $w_i$ is the combining weight assigned to the $i$th forecast. Here, our analysis is based on these most popular ensemble methods. A brief discussion follows for each one.

•
The simple average (AVG) method [82] is an unambiguous technique, which assigns the same weight to every single forecast. Based on empirical studies in the literature, it has been observed that the AVG method is robust and able to generate reliable predictions, while it can be characterized as remarkably accurate and impartial. Applied across several models, the AVG improved the average accuracy as the number of combined single methods increased [82]. Comparing this method with the weighted combination techniques in terms of forecasting performance, the researchers in [84] concluded that a simple average combination might be more robust than weighted average combinations. In the simple average combination, the weights are specified as follows:

$$w_i = \frac{1}{n}, \quad i = 1, 2, \ldots, n$$

•
The error-based (EB) method [16] gives the component forecasts weights that are inversely proportional to their in-sample forecasting errors. For instance, researchers may give a higher weight to a model with a lower error, while they may assign a smaller weight value to a model that presents a larger error. In most cases, the forecasting error is calculated using a total absolute error statistic, such as the sum of squared errors (SSE) [80,83]. The combining weight for an individual prediction is mathematically given by:

$$w_i = \frac{1/SSE_i}{\sum_{j=1}^{n} (1/SSE_j)}$$
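Both weighting schemes can be sketched as follows; the SSE values in the example are illustrative:

```python
def avg_weights(n):
    """Simple average: every component forecast gets the weight 1/n."""
    return [1.0 / n] * n

def eb_weights(sse):
    """Error-based: weights inversely proportional to in-sample SSE,
    normalized so that they sum to 1."""
    inv = [1.0 / e for e in sse]
    total = sum(inv)
    return [v / total for v in inv]

def combine(forecasts, weights):
    """Linear combination of the component forecasts, step by step."""
    return [sum(w * f[t] for w, f in zip(weights, forecasts))
            for t in range(len(forecasts[0]))]

# The second model's SSE is half the first's, so it gets twice the weight
w = eb_weights([0.4, 0.2])
y_hat = combine([[1.0, 2.0], [3.0, 4.0]], w)
```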

The Proposed Forecast Combination Methodology
In the rest of the paper, we explored a new advanced forecasting approach by introducing a different split of the dataset in the case of daily, weekly, or monthly forecasting, as well as a combination of forecasts from multiple structurally different models, such as ANN and FCM with various efficient learning algorithms and hybrid configurations of them. Also, the two most popular ensemble methods, the AVG and the EB methods, were applied to the ensemble forecasts to improve the prediction accuracy.
In the described ensemble scheme, the selection of the appropriate validation set, i.e., the selection of the parameter N vd and the group size N tr, is very important. The validation set should reflect the characteristics of the testing dataset, which is practically unknown in advance. As such, in this study, we set the following data split process: 15% of the total dataset N is removed and saved for later use as testing data. The remaining 85% of the dataset is then split again in an 82/18 ratio, resulting in approximately 70% of the total for training and 15% for validation. Also, the group size N tr (i.e., the training data) should be appropriately selected so that it is neither too small nor too large.
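The described split can be sketched as follows; rounding to whole records is our assumption:

```python
def split_sizes(N):
    """Hold out 15% of the N records for testing, then split the remaining
    85% in an 82/18 ratio (~70% train / ~15% validation of the total)."""
    n_ts = round(0.15 * N)
    n_in = N - n_ts               # total in-sample size
    n_vd = round(0.18 * n_in)
    n_tr = n_in - n_vd
    return n_tr, n_vd, n_ts

# Illustrative dataset of 1000 records
n_tr, n_vd, n_ts = split_sizes(1000)
```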
Due to the problem nature, as we work with time-series data, the most efficient method for resampling is the boosting/bootstrapping method [85]. In boosting, resampling is strategically geared to provide the most informative training data for each consecutive predictor. Therefore, in this study, an appropriate bootstrapping method was applied, so that the training dataset had the same size at each resampling step, and the validation and testing sets kept the same size (after excluding the k values from the in-sample dataset).
The proposed effective forecast combination methodology for time series forecasting, presented in the paper, includes three main processing steps: data pre-processing to handle missing values, normalize the collected time-series data, and split the dataset; the various forecasting methods of ANNs, RCGA-FCMs, SOGA-FCMs, and hybrid SOGA FCM-ANN with their ensembles; and evaluation of the prediction results, implementing the two most popular and used ensemble methods of simple average (AVG) and error-based (EB). Figure 1 visually illustrates the suggested methodology.
In the followed approach, data preprocessing included outlier detection and removal, handling missing data, and data normalization, all in accordance with the principles of Data Science practices described in the corresponding literature. For outlier detection, the Z-score was first calculated for each sample in the dataset (using the standard deviation value presented in the descriptive statistics Tables A1 and A2 in Appendix A). Then, a threshold was specified, and the data points that lay beyond this threshold were classified as outliers and removed. Mean imputation was performed to handle missing values; specifically, for numerical features, missing values were replaced by the mean feature value.
Each dataset was normalized to [0,1] before the forecasting models were applied; the normalized datasets were converted back to their original values during the testing phase. The data normalization was carried out mathematically as follows:

$$y_t^{(new)} = \frac{y_t - y^{(min)}}{y^{(max)} - y^{(min)}}$$

where $Y = [y_1, y_2, y_3, \ldots, y_{N_{train}}]^T$ is the training dataset, $Y^{(new)} = [y_1^{(new)}, \ldots, y_{N_{train}}^{(new)}]^T$ is the normalized dataset, and $y^{(min)}$ and $y^{(max)}$ are, respectively, the minimum and maximum values of the training dataset $Y$.

We selected the Min-Max normalization method [86] as it is one of the most popular and comprehensible methods in terms of the performance of the examined systems, while several researchers showed that it produces better (if not equally good) results with high accuracy, compared to other normalization methods [87,88]. In [88], Min-Max was rated as the second-best normalization method for the backpropagation NN model, justifying our choice to deploy this method for data normalization. Moreover, since the FCM concepts take values within the range [0,1] for the conducted simulations and do not deal with real values, the selected method is proper for our study. This normalization approach was also previously used in [66,69].
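A minimal Min-Max sketch, fitting the scaling on the training data only and inverting it for the testing phase:

```python
def min_max_fit(train):
    """Fit Min-Max scaling on the training data; return a scaler to [0,1]
    and an inverse scaler back to the original units."""
    lo, hi = min(train), max(train)
    scale = lambda y: (y - lo) / (hi - lo)
    unscale = lambda y: y * (hi - lo) + lo
    return scale, unscale

# Illustrative training values
scale, unscale = min_max_fit([10.0, 30.0, 20.0])
z = scale(25.0)    # value mapped into [0, 1]
y = unscale(z)     # recovered original value
```

Fitting the scaler on the training data alone avoids leaking information from the validation and testing sets into the models.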
Due to our intention to suggest a generic forecasting combination approach (with ANNs, FCMs, and their hybrid structures) able to be applied in any time series dataset, the following steps are thoroughly presented and executed.
Step 1. (Split Dataset) We divided the original time series $Y = [y_1, y_2, y_3, \ldots, y_N]^T$ into the in-sample training dataset $Y_{tr} = [y_1, \ldots, y_{N_{tr}}]^T$, the in-sample validation dataset $Y_{vd} = [y_{N_{tr}+1}, y_{N_{tr}+2}, \ldots, y_{N_{tr}+N_{vd}}]^T$, and the out-of-sample testing dataset $Y_{ts} = [y_{N_{in}+1}, y_{N_{in}+2}, \ldots, y_{N_{in}+N_{ts}}]^T$, so that $N_{in} = N_{tr} + N_{vd}$ is the size of the total in-sample dataset and $N_{in} + N_{ts} = N$, where $N$ is the number of days, weeks, or months, according to the short- or long-term prediction based on the time series horizon.
Step 2. (Resampling method/Bootstrapping) Let us consider k training sets drawn from the whole dataset each time. For example, in monthly forecasting, we excluded one month each time from the initial in-sample dataset, starting from the first month of the time series values and proceeding month by month until k = 12 (i.e., 1 to 12 months were excluded from the initial in-sample dataset). Therefore, k subsets of training data were created and used for training. The remaining values of the in-sample dataset were used for validation, whereas the testing set remained the same. Figure 2 shows an example of this bootstrapping method for the ensemble SOGA-FCM approach. In particular, Figure 2a represents the individual forecasters' prediction values and their average error calculation, whereas Figure 2b depicts the proposed forecasting combination approach for SOGA-FCM for both ensemble methods. If we needed to accomplish daily forecasting, then we preselected the number of days excluded in each subset k. For simplicity (as in the case of monthly forecasting), we could consider that one day was excluded in each subset from the initial in-sample dataset. The overall approach, including ANN, FCMs, and hybrid configurations of them, is illustrated in Figure 3.
In Figure 3, the four ensemble forecasters were produced after the validation process and used for testing through the proposed approach.

Step 4. We implemented each model on $Y_{tr}$ and used it to predict $Y_{vd}$. Let $\hat{Y}^i_{vd} = [\hat{y}^i_{N_{tr}+1}, \hat{y}^i_{N_{tr}+2}, \ldots, \hat{y}^i_{N_{tr}+N_{vd}}]^T$ be the prediction of $Y_{vd}$ through the $i$th model.

Step 5. We found the in-sample forecasting error of each model through suitable error measures. We used the mean absolute error (MAE) and the mean squared error (MSE). These are widely popular error statistics [68], and their mathematical formulation is presented later in this paper. In the present study, we adopted the MSE and MAE to find the in-sample forecasting errors of the component models.
Step 6. Based on the obtained in-sample forecasting errors, we assigned a score $\gamma_i$ to each component model. The scores are assigned to be inversely proportional to the respective errors, so that a model with a comparatively smaller in-sample error receives a higher score and vice versa.
Step 7. We assigned a rank $r_i \in \{1, 2, \ldots, n\}$ to the $i$th model, based on its score, so that $r_i \geq r_j$ if $\gamma_i \leq \gamma_j$, $\forall i, j = 1, 2, \ldots, n$. The minimum, i.e., the best, rank is equal to 1, and the maximum, i.e., the worst, rank is at most equal to n.
Step 8. We chose a number $n_r$ so that $1 \leq n_r \leq n$, and let $I = \{i_1, i_2, \ldots, i_{n_r}\}$ be the index set of the $n_r$ component models whose ranks are in the range $[1, n_r]$. So, we selected the subgroup of the $n_r$ smallest-ranked component models.
Step 9. Finally, we obtained the weighted linear combination of these selected $n_r$ component forecasts, as follows:

$$\hat{Y} = \sum_{k=1}^{n_r} w_{i_k} \hat{Y}^{i_k}$$

Here, $w_{i_k} = \frac{\gamma_{i_k}}{\sum_{k=1}^{n_r} \gamma_{i_k}}$ is the normalized weight of the selected component model, so that $\sum_{k=1}^{n_r} w_{i_k} = 1$.
Step 10. The simple average method could also be adopted, as an alternative to Steps 6-9, to calculate the forecasted value.
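Steps 6-9 above can be sketched together as follows; the validation errors and component forecasts are illustrative:

```python
def selective_ensemble(val_errors, forecasts, n_r):
    """Steps 6-9: score models inversely to their validation error, rank
    them, keep the n_r best, and combine them with normalized weights."""
    scores = [1.0 / e for e in val_errors]                         # Step 6
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])  # Step 7
    chosen = ranked[:n_r]                                          # Step 8
    total = sum(scores[i] for i in chosen)
    weights = {i: scores[i] / total for i in chosen}               # Step 9
    steps = len(forecasts[0])
    combined = [sum(weights[i] * forecasts[i][t] for i in chosen)
                for t in range(steps)]
    return chosen, combined

# Three illustrative models; with n_r = 2, the worst (error 0.9) is dropped
chosen, combined = selective_ensemble(
    [0.1, 0.3, 0.9],
    [[1.0, 1.0], [2.0, 2.0], [9.0, 9.0]], n_r=2)
```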
The validation set was used during the training process to update the algorithm weights appropriately, thus improving performance and avoiding overfitting. After training a model, we ran it on the testing data, which had remained hidden during training and validation, to verify whether it predicted them correctly.
The most popular and widely used performance metrics or evaluation criteria for time series prediction are the following: coefficient of determination (R2), mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The mathematical equations of all these statistical indicators were described in [69]. The goodness of fit and the performance of the studied models, when applied to the natural gas prediction process, were evaluated and compared using two of these five commonly used statistical indicators, namely, the MSE and the MAE [9]. In particular, the performance of the analyzed approaches for natural gas prediction was evaluated based on the following criteria:

1. Mean squared error:

$$MSE = \frac{1}{N_{ts}} \sum_{t=1}^{N_{ts}} (X(t) - Z(t))^2$$

2. Mean absolute error:

$$MAE = \frac{1}{N_{ts}} \sum_{t=1}^{N_{ts}} |X(t) - Z(t)|$$

where $X(t)$ is the predicted value of the natural gas demand at the $t$th iteration, $Z(t)$ is the desired value of the natural gas demand at the $t$th iteration, $t = 1, \ldots, N_{ts}$, and $N_{ts}$ is the number of testing records. Lower values of the MSE and MAE indicate that the model performance is better with respect to prediction accuracy and that the regression line fits the data well.

All the modeling approaches, tests, and evaluations were performed with the use of the ISEMK (intelligent expert system based on cognitive maps) software tool [66], in which all the algorithms based on ANNs, FCMs, and their hybrid combinations were developed. The C# programming language was used for implementing the ensemble models and also for developing ISEMK, which incorporates FCM construction from data and learning, both for the RCGA and SOGA implementations [69].
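The two criteria can be computed directly from the testing records, as in the following sketch:

```python
def mse(pred, actual):
    """Mean squared error over the testing records."""
    return sum((x - z) ** 2 for x, z in zip(pred, actual)) / len(pred)

def mae(pred, actual):
    """Mean absolute error over the testing records."""
    return sum(abs(x - z) for x, z in zip(pred, actual)) / len(pred)

# Illustrative predicted vs. desired values
errors = (mse([1.0, 2.0], [1.0, 4.0]), mae([1.0, 2.0], [1.0, 4.0]))
```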

Case Study and Datasets
The natural gas consumption datasets used in this research work to examine the applicability and effectiveness of the proposed forecast methodology corresponded to five years (2013-2017), as described in Section 3. Following the first step of the methodology, we split our dataset into training, validation, and testing sets. For convenience in handling the dataset, we defined the data of the first three years as the training dataset (1095 days), the data of the fourth year as the validation dataset (365 days), and the remaining data (5th year) as the testing dataset (365 days), which approximately corresponded to 60%, 20%, and 20%, respectively, as presented in Section 3. Thus, it was easier for our analysis to handle the above values as annual datasets and to have a clearer perception of the whole process.
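This annual split can be expressed directly as index slices over the daily series (a sketch with a placeholder series; leap days are ignored for simplicity):

```python
# Splitting a 5-year daily series (2013-2017) into training (years 1-3),
# validation (year 4), and testing (year 5): roughly a 60/20/20 split.
series = list(range(1825))  # placeholder: 5 years x 365 days

train = series[:1095]            # years 1-3 (1095 days)
validation = series[1095:1460]   # year 4 (365 days)
test = series[1460:1825]         # year 5 (365 days)

print(len(train), len(validation), len(test))  # 1095 365 365
```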
Out of the three years of the defined training dataset, we used the first two as the initial training dataset, while the third (3rd) year served as a data reservoir for the bootstrapping procedure. This year was deliberately made part of the initial dataset because, for each value of k (the bootstrapping step), a corresponding number of days/weeks/months needed to be added to the training dataset during the bootstrapping process, thus avoiding any possible data shortage and/or deterioration that would lead to inaccurate results.
The proposed forecast combination approach, presented in Section 3, offered generalization capabilities and, thus, could be applied to various time series datasets, for different values of k, according to daily, weekly, or monthly prediction. Taking the case of a month-ahead prediction as an example, for each bootstrapping step k, the training dataset shifted one month ahead, gaining one additional month each time from the reserved third year of the initial training dataset. In this case, k additional months in total were needed for implementing this approach efficiently. If we considered k = 12, then 12 additional months of the initial dataset needed to be added and reserved. This justified our choice of adding one year (i.e., the third year) to the initial training dataset and reserving it for the purposes of the proposed methodology. Different values of k were also examined without noticing significant divergences in forecasting compared to the selected k value.
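The month-ahead bootstrapping step can be sketched as follows. Two details are our assumptions, not taken from the paper: the training window expands (rather than slides) at each step, and a "month" is fixed at 30 days for simplicity.

```python
DAYS_PER_MONTH = 30  # simplifying assumption
BASE_DAYS = 730      # the initial two-year training dataset

def training_window(series, step_k):
    """Training data at bootstrapping step k (month-ahead case): the initial
    two years plus k additional months drawn from the reserved third year."""
    return series[: BASE_DAYS + step_k * DAYS_PER_MONTH]

series = list(range(1095))  # the full three-year training dataset
print(len(training_window(series, 0)))   # 730: initial two years
print(len(training_window(series, 12)))  # 1090: 12 extra months consumed
```

With k = 12, the twelve extra months all fit inside the reserved third year (365 days), which matches the reservation argument made above.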
In the next step, the validation procedure (comprising one year of data) was implemented to calculate the in-sample forecasting errors (MSE and MAE) for each ensemble forecasting algorithm (ensemble ANN, ensemble hybrid, ensemble RCGA-FCM, and ensemble SOGA-FCM). The same process was followed for the testing procedure by considering the data of the last year. The two examined ensemble forecasting methods, i.e., the simple average (AVG) and the error-based (EB), were then applied to the calculated validation and testing vectors (Yvd) of each forecast combination method (Yvd-ANN, Yvd-Hybrid, Yvd-RCGA, Yvd-SOGA).

Case Study Results
In this study, we applied both the AVG and the EB method in two different cases: case (A), where scores were calculated for each individual forecaster of the methods ANN, hybrid, RCGA-FCM, and SOGA-FCM, and case (B), where scores were calculated for each ensemble forecaster (ANN ensemble, hybrid ensemble, RCGA-FCM ensemble, and SOGA-FCM ensemble).
Considering case (A), Table 1 shows the calculated errors and scores based on the EB method for each individual forecaster of two of the forecasting methods, ANN and hybrid, for the city of Athens. The remaining errors and scores based on the EB method for the individual forecasters of the other two forecasting methods, RCGA-FCM and SOGA-FCM, for Athens can be found in Appendix A of the paper (Table A3). In Appendix A, parts of the corresponding results for the other two examined cities (Larissa and Thessaloniki) are also presented (Tables A4 and A5). Considering case (B), Table 2 presents the calculated weights based on scores for each ensemble forecaster (ANN ensemble, hybrid ensemble, RCGA ensemble, and SOGA ensemble) for all three cities.
The calculated weights, based on scores for the EB method, were computed using Equation (17). According to this equation, the weights of the component forecasts are inversely proportional to their in-sample forecasting errors, so that a model with a larger error is assigned a smaller weight and vice versa [80]. In this work, as the error values were high for certain ensemble forecasters, the corresponding weights were approximately zero, so they were taken as zero for further predictions. The obtained forecasting results of the individual and combination methods are depicted in Tables 3-8, respectively, for the three cities. In each of these tables, the best results (i.e., those associated with the lowest values of the error measures) are presented in bold. In Figures 4 and 5, the forecasting results concerning Thessaloniki and Larissa are visually illustrated for both ensemble methods (AVG, EB). Moreover, Figure 6 gathers the forecasting results for all three cities considering the best ensemble method.
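Since Equation (17) itself is not reproduced in this excerpt, the sketch below assumes the common inverse-error form of error-based weighting, with near-zero weights truncated to zero as described above; the function name, cutoff value, and example errors are illustrative.

```python
def error_based_weights(errors, cutoff=1e-3):
    """Combination weights inversely proportional to in-sample errors
    (an assumed form of Equation (17)); weights below `cutoff` are set
    to zero, as done in the paper for high-error ensemble forecasters."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    weights = [v / total for v in inverse]
    return [w if w >= cutoff else 0.0 for w in weights]

# Example: four ensemble forecasters with different validation MSEs;
# the last one has a very large error and ends up with weight zero.
errors = [0.10, 0.20, 0.40, 400.0]
weights = error_based_weights(errors)
print(weights)
```

A forecaster's contribution to the combined prediction is then its weight times its forecast, so the high-error forecaster is effectively excluded.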

Discussion of Results
The following important observations were made after careful analysis of the tables and figures above.

1.
After a thorough analysis of Tables 3-8, on the basis of the MAE and MSE errors, it can be clearly stated that the EB method presented lower errors for the individual forecasters (ANN, hybrid, RCGA-FCM, and SOGA-FCM) for all three cities (Athens, Thessaloniki, and Larissa). EB outperformed the AVG method in terms of achieving overall better forecasting results when applied to individual forecasters (see Figure 6).

2.
Considering the ensemble forecasters, it can be seen from the obtained results that neither of the two forecast combination methods attained consistently better accuracy than the other, as far as the cities of Athens and Thessaloniki were concerned. Specifically, from Tables 3-6, it was observed that the MAE and MSE values of the two combination methods were similar for the two cities; however, their errors were lower than those produced by each separate ensemble forecaster.

3.
Although the AVG and the EB methods performed similarly for the Athens and Thessaloniki datasets, the EB forecast combination technique presented lower MAE and MSE errors than the AVG for the examined dataset of Larissa (see Figure 5).
When one forecasting method presented lower MAE and MSE errors than another, the accuracy of its consumption predictions was higher than that of the methods it was compared to, and so was the overall performance of the corresponding ensemble method. Regarding the amount of improvement observed, both ensemble forecasting methods performed slightly better than the individual forecasters, which constitutes evidence for the efficiency of the examined approach in the domain of natural gas demand forecasting.
In order to examine the efficiency of the proposed algorithm, a statistical test was conducted to check whether the differences among the methods were statistically significant. Concerning the individual methods, a paired two-sample t-test for means was previously conducted in [60] for the cities of Thessaly (Larissa, Volos, Trikala, and Karditsa) for the year 2016, showing that there was no statistically significant difference among these techniques. In the current work, a paired two-sample t-test for means was also performed for the ensemble methods (average and error-based) for the examined cities (Athens, Thessaloniki, and Larissa), using the dataset of the same year. The results of the hypothesis tests (Tables A6-A8 in Appendix A) revealed no statistically significant difference between these techniques. In all cases, the calculated p-value exceeded 0.05. Therefore, there was no particular need to conduct a post hoc statistical test, since a post hoc test should only be run when there is an overall statistically significant difference in group means, according to the relevant literature [89,90].
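The paired t-test used above can be sketched as follows; the forecast values are hypothetical, and the p-value reported in the paper would additionally require the t distribution, which is omitted here for brevity.

```python
import math

def paired_t_statistic(a, b):
    """Paired two-sample t-test for means: returns the t statistic and the
    degrees of freedom (n - 1) for two aligned forecast series."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)  # sample variance
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1

# Hypothetical day-ahead forecasts from the AVG and EB ensemble methods
avg_forecast = [10.1, 11.9, 11.6, 9.2, 10.8]
eb_forecast = [10.0, 12.0, 11.5, 9.1, 10.9]
t, df = paired_t_statistic(avg_forecast, eb_forecast)
print(round(t, 3), df)  # a small |t| is consistent with no significant difference
```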
Furthermore, for comparison purposes, to show the effectiveness of the proposed forecasting combination approach for multivariate time series, an experimental analysis was conducted with a new and well-known effective machine learning technique for time series forecasting, the LSTM (long short-term memory). The LSTM algorithm has the characteristics of advanced recurrent neural network methods and is mainly applied to time series prediction problems in diverse domains [91].
LSTM was applied to one day-ahead natural gas consumption prediction on the same dataset of the three Greek cities (Athens, Thessaloniki, and Larissa) in [70]. For the LSTM implementation, one feature of the dataset was selected as a time series. As explained in [70], LSTM was fed previous values; in that case, the time-step was set to 364 values to predict the next 364. For validation, a random 20% of the training dataset was used, and for testing, the same dataset as for the ANN, RCGA-FCM, SOGA-FCM, and hybrid FCM-ANN models and their ensemble structures. In [70], various experiments with different numbers of units, numbers of layers, and dropout rates were conducted. Through the provided experimental analysis, the best results of LSTM emerged for one layer, 200 units, and a dropout rate of 0.2. These results are gathered in Table 9 for the three cities.
From Table 9, it is clear that both ensemble forecasting methods can achieve high accuracy in predicting energy consumption patterns on a day-ahead timescale. Additional exploratory analysis and investigation of other types of ensemble methods, as well as other types of neural networks, such as convolutional neural networks (CNNs), could lead to better insight into modeling the particular problem and achieve higher prediction accuracy.

Conclusions
To sum up, we applied a time series forecasting method for natural gas demand in three Greek cities, implementing an efficient ensemble forecasting approach through combining ANN, RCGA-FCM, SOGA-FCM, and hybrid FCM-ANN. The proposed forecasting combination approach incorporates the two most popular ensemble methods for error calculation in forecasting problems and is deployed in certain steps offering generalization capabilities. The whole framework seems to be a promising approach for ensemble time series forecasting that can easily be applied in many scientific domains. An initial comparison analysis was conducted with benchmark methods of ANN, FCM, and their different configurations. Next, further comparison analysis was conducted with new promising LSTM networks previously used for time series prediction.
Through the experimental analysis, two error statistics (MAE, MSE) were calculated in order to examine the effectiveness of the ensemble learning approach in time series prediction. The results of this study showed that the examined ensemble approach, which designs an ensemble structure of various ANN and SOGA-FCM models with different learning parameters, together with their hybrid structures, could significantly improve forecasting. Moreover, the obtained results clearly demonstrated a relatively higher forecasting accuracy when the applied ensemble approach was compared against independent forecasting approaches, such as ANN or FCM, as well as against LSTM.
Future work will be devoted to applying the advantageous forecast combination approach to a larger number of distribution points composing the natural gas grid of Greek regions (larger and smaller cities), as well as to investigating a new forecast combination structure of efficient convolutional neural networks (CNNs) and LSTM networks for time series prediction in various application domains. Furthermore, an extensive comparative analysis with various LSTM structures, as well as with other advanced machine learning and time series prediction methods, will be conducted in future work. The presented research work could also contribute to the explainability, transparency, and re-traceability of artificial intelligence (AI) and machine learning systems. These systems are being applied in various fields, and the decisions they make are not always clear, due to the use of complicated algorithms to achieve power, performance, and accuracy. The authors, while using complicated but powerful algorithms such as neural networks and ensemble methods, tried to describe all the steps and models involved in the decision-making process in order to attain explainability; in future work, they will further explore ways to make the best-performing methods more transparent, re-traceable, and understandable, explaining why certain decisions have been made [92].
Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A

Table A1. Descriptive statistics values for the real dataset Z(t) and the forecasting values of the AVG and EB methods for the three cities (validation).