Energy Performance Forecasting of Residential Buildings Using Fuzzy Approaches

The energy consumption used for domestic purposes in Europe is, to a considerable extent, due to heating and cooling. This energy is produced mostly by burning fossil fuels, which has a high negative environmental impact. The characteristics of a building are an important factor to determine the necessities of heating and cooling loads. Therefore, the study of the relevant characteristics of the buildings, regarding the heating and cooling needed to maintain comfortable indoor air conditions, could be very useful in order to design and construct energy-efficient buildings. In previous studies, different machine-learning approaches have been used to predict heating and cooling loads from the set of variables: relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area and glazing area distribution. However, none of these methods are based on fuzzy logic. In this research, we study two fuzzy logic approaches, i.e., fuzzy inductive reasoning (FIR) and adaptive neuro fuzzy inference system (ANFIS), to deal with the same problem. Fuzzy approaches obtain very good results, outperforming all the methods described in previous studies except one. In this work, we also study the feature selection process of FIR methodology as a pre-processing tool to select the more relevant variables before the use of any predictive modelling methodology. It is proven that FIR feature selection provides interesting insights into the main building variables causally related to heating and cooling loads. This allows better decision making and design strategies, since accurate cooling and heating load estimations and correct identification of parameters that affect building energy demands are of high importance to optimize building designs and equipment specifications.


Introduction
In recent years there has been a substantial increase of research in the area of energy performance in buildings. The aim is to design and construct more energy-efficient buildings in order to reduce their energy consumption and CO2 emissions. Buildings are responsible for approximately 40% of energy consumption and 36% of CO2 emissions in the EU, making them the single largest energy consumer in Europe. Residential buildings comprise the biggest segment of the EU's building stock and are responsible for the majority of the sector's energy consumption [1]. The European Commission (EC) boosted the research in this area with a program framed in Horizon 2020 [1]. In the Energy Efficiency Plan in Europe, the EC states that a greatest energy saving potential lies in buildings. The minimum energy savings in buildings can generate a reduction of 60-80 Mtoe in final energy consumption by 2020, and make a considerable contribution to the reduction of greenhouse gas (GHG) emissions [2].
Machine-learning (ML) strategies, within the field of artificial intelligence, have already been used to deal with estimation and performance of energy consumption in buildings, as summarized in the

Fuzzy Methodologies
Both FIR and ANFIS machine-learning approaches proposed in this research are hybrid methodologies that combine mainly soft computing techniques. FIR combines fuzzy logic with k-nearest neighbours while ANFIS combines fuzzy logic with neural networks.
FIR and ANFIS, as fuzzy methodologies, have some main differences that are listed below: -ANFIS is a pure fuzzy inference system, i.e., a fuzzy system is trained and a fuzzy inference process is performed for prediction. FIR identifies a fuzzy model that represents the system under study, but the prediction is performed applying a k-nearest neighbour (KNN) hybrid approach. -ANFIS models are trained by means of neural network training algorithms. FIR models are synthesized rather than trained, meaning that an optimization process is performed to organize the training data appropriately. -ANFIS models are Sugeno-type fuzzy systems where the antecedents are fuzzy sets and the consequent is not a fuzzy set, but a function. FIR, on the contrary, uses fuzzy sets to represent both the antecedents and the consequent, being more useful for obtaining explainable models that can be used for decision making. -FIR has an internal feature selection (FS) process that allows identifying the input variables that are most relevant in the inference process and their degree of individual relevance. ANFIS has no FS integrated in its structure, and it is necessary to apply a FS algorithm as a pre-process, if desired.
The main characteristics of these two methodologies are summarized in this section.

Fuzzy Inductive Reasoning (FIR)
The conceptualization of the FIR methodology arises from the General System Problem Solving (GSPS) approach proposed by Klir [22]. FIR offers a model-based approach to predicting either univariate or multi-variate time series [23][24][25][26]. Visual-FIR is a tool based on the FIR methodology (runs under Matlab environment), that offers a new perspective on the modelling and simulation of complex systems.
FIR is composed of four main processes: fuzzification, qualitative model identification, fuzzy forecasting, and defuzzification. The fuzzification process converts quantitative data stemming from the system into fuzzy data. The qualitative model identification process is responsible for finding causal and temporal relations between variables (feature selection process) and is used to obtain the best model that represents the system. Once the FIR model is available, the prediction system can take place using the FIR inference engine. This process is called fuzzy forecasting. The FIR inference engine is a specialization of the KNN algorithm, commonly used in the pattern recognition field. Defuzzification is the inverse process of fuzzification, it allows converting the qualitative predicted output into quantitative values that can then be used as inputs to an external quantitative model. Figure 1 illustrates the process of fuzzification by means of an example. A quantitative value is fuzzified into a qualitative triple. The first element of the triple is the class value, the second element is the fuzzy membership value, and the third element is the side value. The side value indicates whether the qualitative value is to the left or right of the peak value of the associated membership function. The side value is responsible for preserving, in the qualitative triple, the complete knowledge that had been contained in the original quantitative value. According to Figure 1, a human age of 50 years old would be fuzzified into the class Middle Age (class where a value of 50 years has the highest membership value), a fuzzy membership value of 0.75 and a side value of right (because 50 years is to the right hand side of the maximum membership value of the class Middle Age).

Qualitative Model Identification
A FIR model is composed by a structure, called mask, and a pattern rule base, named behaviour matrix. A mask denotes a dynamic relationship among qualitative variables. An example of a mask is presented in Figure 2.
Example of a mask in FIR methodology.
Each negative element in the mask is called m-input (mask input). It denotes a causal relation with the output, that is, it influences the output up to a certain degree. The enumeration of the minputs is immaterial and has no relevance. The single positive value denotes the m-output. The mask of Figure 2 contains four m-inputs. In this example, the first and second m-inputs, i1 and i2, correspond to the input variables u1 and u4 two sampling intervals back in time, u1(t-2δt) and u4(t-2δt). The third m-input, i3, refers to the output variable y1 one sampling interval into the past, y1(t-δt), etc.
Finding a mask that optimizes the predictivness of the model can be done either by a mechanism of exhaustive search of exponential complexity or by one among a number of suboptimal search strategies of polynomial complexity. The optimality of the mask is evaluated with respect to the maximisation of its forecasting power using the Shannon entropy measure [27]. The m-inputs of the optimal mask are the features that will be used in the prediction process. Therefore, the mask identification can be viewed as a feature selection algorithm.
Once the best mask is identified it can be used to obtain the pattern rule base. This process is illustrated in Figure 3.
The mask is used to 'flatten' dynamic relationships into pseudo-static relationships. The left side of Figure 3 shows an excerpt of the qualitative data matrix that stores the class values. It shows the numerical rather than the symbolic class values. The box symbolizes the optimal mask that is shifted downwards along the qualitative data matrix. The round "holes" in the mask denote the positions of the m-inputs, whereas the square "hole" indicates the position of the m-output. The class values are read out from the class value matrix through the "holes" of the mask, and are placed next to each other in the behaviour matrix that is shown on the right side of Figure 3. Each row of the behaviour matrix represents one pseudo-static qualitative state or qualitative rule (also called pattern rule). For example, the framed rule of Figure 3 can be read as follows, "If all m-inputs (i1, i2, i3, i4) have a value of 2 (corresponding to medium) then the m-output, O1, assumes a value of 1 (corresponding to low)".

Qualitative Model Identification
A FIR model is composed by a structure, called mask, and a pattern rule base, named behaviour matrix. A mask denotes a dynamic relationship among qualitative variables. An example of a mask is presented in Figure 2.

Qualitative Model Identification
A FIR model is composed by a structure, called mask, and a pattern rule base, named behaviour matrix. A mask denotes a dynamic relationship among qualitative variables. An example of a mask is presented in Figure 2.
Example of a mask in FIR methodology.
Each negative element in the mask is called m-input (mask input). It denotes a causal relation with the output, that is, it influences the output up to a certain degree. The enumeration of the minputs is immaterial and has no relevance. The single positive value denotes the m-output. The mask of Figure 2 contains four m-inputs. In this example, the first and second m-inputs, i1 and i2, correspond to the input variables u1 and u4 two sampling intervals back in time, u1(t-2δt) and u4(t-2δt). The third m-input, i3, refers to the output variable y1 one sampling interval into the past, y1(t-δt), etc.
Finding a mask that optimizes the predictivness of the model can be done either by a mechanism of exhaustive search of exponential complexity or by one among a number of suboptimal search strategies of polynomial complexity. The optimality of the mask is evaluated with respect to the maximisation of its forecasting power using the Shannon entropy measure [27]. The m-inputs of the optimal mask are the features that will be used in the prediction process. Therefore, the mask identification can be viewed as a feature selection algorithm.
Once the best mask is identified it can be used to obtain the pattern rule base. This process is illustrated in Figure 3.
The mask is used to 'flatten' dynamic relationships into pseudo-static relationships. The left side of Figure 3 shows an excerpt of the qualitative data matrix that stores the class values. It shows the numerical rather than the symbolic class values. The box symbolizes the optimal mask that is shifted downwards along the qualitative data matrix. The round "holes" in the mask denote the positions of the m-inputs, whereas the square "hole" indicates the position of the m-output. The class values are read out from the class value matrix through the "holes" of the mask, and are placed next to each other in the behaviour matrix that is shown on the right side of Figure 3. Each row of the behaviour matrix represents one pseudo-static qualitative state or qualitative rule (also called pattern rule). For example, the framed rule of Figure 3 can be read as follows, "If all m-inputs (i1, i2, i3, i4) have a value of 2 (corresponding to medium) then the m-output, O1, assumes a value of 1 (corresponding to low)". Each negative element in the mask is called m-input (mask input). It denotes a causal relation with the output, that is, it influences the output up to a certain degree. The enumeration of the m-inputs is immaterial and has no relevance. The single positive value denotes the m-output. The mask of Figure 2 contains four m-inputs. In this example, the first and second m-inputs, i 1 and i 2 , correspond to the input variables u 1 and u 4 two sampling intervals back in time, u 1 (t-2δt) and u 4 (t-2δt). The third m-input, i 3 , refers to the output variable y 1 one sampling interval into the past, y 1 (t-δt), etc.
Finding a mask that optimizes the predictivness of the model can be done either by a mechanism of exhaustive search of exponential complexity or by one among a number of suboptimal search strategies of polynomial complexity. The optimality of the mask is evaluated with respect to the maximisation of its forecasting power using the Shannon entropy measure [27]. The m-inputs of the optimal mask are the features that will be used in the prediction process. Therefore, the mask identification can be viewed as a feature selection algorithm.
Once the best mask is identified it can be used to obtain the pattern rule base. This process is illustrated in Figure 3.
The mask is used to 'flatten' dynamic relationships into pseudo-static relationships. The left side of Figure 3 shows an excerpt of the qualitative data matrix that stores the class values. It shows the numerical rather than the symbolic class values. The box symbolizes the optimal mask that is shifted downwards along the qualitative data matrix. The round "holes" in the mask denote the positions of the m-inputs, whereas the square "hole" indicates the position of the m-output. The class values are read out from the class value matrix through the "holes" of the mask, and are placed next to each other in the behaviour matrix that is shown on the right side of Figure 3. Each row of the behaviour matrix represents one pseudo-static qualitative state or qualitative rule (also called pattern rule). For example, the framed rule of Figure 3 can be read as follows, "If all m-inputs (i 1 , i 2 , i 3 , i 4 ) have a value of 2 (corresponding to medium) then the m-output, O 1 , assumes a value of 1 (corresponding to low)".

Fuzzy Forecasting
In the forecasting phase, future qualitative output states can be predicted using the model by means of the inference engine [23]. The FIR inference engine is based on a variant of the k-nearest neighbour algorithm. Using the KNN fuzzy inference algorithm, the k pattern rules with the smallest distance measure are selected from the pattern rule base that is part of the FIR model, and a distanceweighted average of their fuzzy components (class, membership, and side values) is computed as shown in Figure 4. The forecast of the output variable is obtained as a weighted average of the potential conclusions of the k pattern rules with smallest distances. The qualitative simulation process predicts an entire qualitative triple, from which a quantitative value can be obtained whenever needed by means of the FIR defuzzification process (inverse process of fuzzification). For a deeper and more detailed insight into the FIR methodology, the reader is referred to [23][24][25][26].

Fuzzy Forecasting
In the forecasting phase, future qualitative output states can be predicted using the model by means of the inference engine [23]. The FIR inference engine is based on a variant of the k-nearest neighbour algorithm. Using the KNN fuzzy inference algorithm, the k pattern rules with the smallest distance measure are selected from the pattern rule base that is part of the FIR model, and a distance-weighted average of their fuzzy components (class, membership, and side values) is computed as shown in Figure 4. The forecast of the output variable is obtained as a weighted average of the potential conclusions of the k pattern rules with smallest distances. The qualitative simulation process predicts an entire qualitative triple, from which a quantitative value can be obtained whenever needed by means of the FIR defuzzification process (inverse process of fuzzification).

Fuzzy Forecasting
In the forecasting phase, future qualitative output states can be predicted using the model by means of the inference engine [23]. The FIR inference engine is based on a variant of the k-nearest neighbour algorithm. Using the KNN fuzzy inference algorithm, the k pattern rules with the smallest distance measure are selected from the pattern rule base that is part of the FIR model, and a distanceweighted average of their fuzzy components (class, membership, and side values) is computed as shown in Figure 4. The forecast of the output variable is obtained as a weighted average of the potential conclusions of the k pattern rules with smallest distances. The qualitative simulation process predicts an entire qualitative triple, from which a quantitative value can be obtained whenever needed by means of the FIR defuzzification process (inverse process of fuzzification). For a deeper and more detailed insight into the FIR methodology, the reader is referred to [23][24][25][26]. For a deeper and more detailed insight into the FIR methodology, the reader is referred to [23][24][25][26].

Adaptive Neuro-Fuzzy Inference System (ANFIS)
The adaptive neuro-fuzzy inference system, developed by Jang, is one of the most popular hybrid neuro-fuzzy systems for function approximation [28]. ANFIS represents a Sugeno-type neuro-fuzzy system. A neuro-fuzzy system is a fuzzy system that uses learning methods derived from neural networks to find its own parameters. It is relevant that the learning process is not knowledge-based but data-driven, as FIR methodology.
The main characteristic of the Sugeno inference system is that the consequent, or output, of the fuzzy rules is not a fuzzy set (as are the inputs of the fuzzy rules) but a function, as shown in where a and b are linguistic variables; A 1 , A 2 , B 1 , and B 2 are linguistic values determined by fuzzy sets on the universe of discourse of A and B, and are represented in the form of membership functions; p 1 , q 1 , r 1 , p 2 , q 2 , and r 2 are parameters of the output functions. Figure 5 graphically describes the inference process of a Sugeno model composed by the two rules described in Equation (1). The first step of the Sugeno inference is to combine a given input tuple (in the example of Figure 5: a = 3 and b = 2) with the rule's antecedents to determine the degree to which each input belongs to each corresponding fuzzy set (left panel of Figure 5). The min operator is then used to obtain the weight of each rule, w i , which is used in the final output computation, z (right panel of Figure 5). Notice that the Sugeno inference has two differentiated sets of parameters. The first set corresponds to the membership function's parameters of the input variables. The second set corresponds to the parameters associated with the output function of each rule, that is, p i , q i , and r i .

Adaptive Neuro-Fuzzy Inference System (ANFIS)
The adaptive neuro-fuzzy inference system, developed by Jang, is one of the most popular hybrid neuro-fuzzy systems for function approximation [28]. ANFIS represents a Sugeno-type neurofuzzy system. A neuro-fuzzy system is a fuzzy system that uses learning methods derived from neural networks to find its own parameters. It is relevant that the learning process is not knowledgebased but data-driven, as FIR methodology.
The main characteristic of the Sugeno inference system is that the consequent, or output, of the fuzzy rules is not a fuzzy set (as are the inputs of the fuzzy rules) but a function, as shown in Equation (1) where a and b are linguistic variables; A1, A2, B1, and B2 are linguistic values determined by fuzzy sets on the universe of discourse of A and B, and are represented in the form of membership functions; p1, q1, r1, p2, q2, and r2 are parameters of the output functions. Figure 5 graphically describes the inference process of a Sugeno model composed by the two rules described in Equation (1). The first step of the Sugeno inference is to combine a given input tuple (in the example of Figure 5: a = 3 and b = 2) with the rule's antecedents to determine the degree to which each input belongs to each corresponding fuzzy set (left panel of Figure 5). The min operator is then used to obtain the weight of each rule, wi, which is used in the final output computation, z (right panel of Figure 5). Notice that the Sugeno inference has two differentiated sets of parameters. The first set corresponds to the membership function's parameters of the input variables. The second set corresponds to the parameters associated with the output function of each rule, that is, pi, qi, and ri. ANFIS is responsible for automatically adjusting these two sets of parameters by means of two optimization algorithms: back-propagation (gradient descendent) and least square estimation. The back-propagation is used to learn the parameters of the antecedents (membership functions) and the least squares estimation is used to determine the coefficients of the linear combinations in the rules' consequents. The ANFIS structure is composed of five layers as shown in Figure 6. ANFIS has n input nodes, where n is the number of inputs to the system. In the example of Figure 6, n is 2 and corresponds to a and b.
Layer 1 is the fuzzification layer in which each node represents a membership value to a linguistic term as a Gaussian, triangular, etc. function. The parameters of the selected function are adapted by means of the back-propagation algorithm during the learning stage. ANFIS is responsible for automatically adjusting these two sets of parameters by means of two optimization algorithms: back-propagation (gradient descendent) and least square estimation. The back-propagation is used to learn the parameters of the antecedents (membership functions) and the least squares estimation is used to determine the coefficients of the linear combinations in the rules' consequents. The ANFIS structure is composed of five layers as shown in Figure 6. ANFIS has n input nodes, where n is the number of inputs to the system. In the example of Figure 6, n is 2 and corresponds to a and b.
Layer 1 is the fuzzification layer in which each node represents a membership value to a linguistic term as a Gaussian, triangular, etc. function. The parameters of the selected function are adapted by means of the back-propagation algorithm during the learning stage.
As the values of the parameters change, the membership function of the linguistic terms, A i and B i , changes. Layer 2 applies a T-norm operator that performs fuzzy intersection. Layer 3 is a normalization layer that normalizes the strength of all rules, i.e., computes ω i = ω i i ω i . Layer 4 updates the parameters of the consequents of the rules by a learning step based on the least squares approximation algorithm. Finally, Layer 5 performs the summation of the outputs of the nodes in Layer 4. ANFIS is a function of the Fuzzy toolbox that runs under the Matlab environment. For a more detailed explanation of the ANFIS methodology refer to Nauck et al. [28].
As the values of the parameters change, the membership function of the linguistic terms, Ai and Bi, changes. Layer 2 applies a T-norm operator that performs fuzzy intersection. Layer 3 is a normalization layer that normalizes the strength of all rules, i.e., computes = ∑ . Layer 4 updates the parameters of the consequents of the rules by a learning step based on the least squares approximation algorithm. Finally, Layer 5 performs the summation of the outputs of the nodes in Layer 4. ANFIS is a function of the Fuzzy toolbox that runs under the Matlab environment. For a more detailed explanation of the ANFIS methodology refer to Nauck et al. [28].

Data Collection
The data used for this study stems from the UCI machine learning repository [18] under the name: Energy efficiency data set. Tsanas and Xifara [17] created the data by generating 768 simulated buildings using Ecotet. Ecotet is a sustainable building design software tool that allows the design of buildings by performing a complete analysis on building energy, thermal performance, and water usage, among other functionalities [29].
All buildings in the simulation have a volume of 771.75 m 3 , but different surface areas and dimensions. All of them are designed with the newest and most common building materials in the construction industry and with building elements achieving the lowest heat loss through the wall, floor or roof (U-value). The simulation assumes that the buildings are located in Athens, Greece, and are residential buildings.
Three types of glazing areas were used, expressed as percentages of the floor area: 10%, 25%, and 40%. Furthermore, five different distribution scenarios for each glazing area were simulated: (1) uniform: 25% glazing on each side, (2) north: 55% on the north side and 15% on each of the other sides, (3) east: 55% on the east side and 15% on each of the other sides, (4) south: 55% on the south side and 15% on each of the other sides, and (5) west: 55% on the west side and 15% on each of the other sides. In addition, they obtained samples with no glazing areas. Finally, all shapes were rotated to face the four cardinal points.
Each of the 768 simulated buildings can be characterized by eight building parameters, which are summarized in Table 1. These parameters correspond to the input variables.

Data Collection
The data used for this study stems from the UCI machine learning repository [18] under the name: Energy efficiency data set. Tsanas and Xifara [17] created the data by generating 768 simulated buildings using Ecotet. Ecotet is a sustainable building design software tool that allows the design of buildings by performing a complete analysis on building energy, thermal performance, and water usage, among other functionalities [29].
All buildings in the simulation have a volume of 771.75 m 3 , but different surface areas and dimensions. All of them are designed with the newest and most common building materials in the construction industry and with building elements achieving the lowest heat loss through the wall, floor or roof (U-value). The simulation assumes that the buildings are located in Athens, Greece, and are residential buildings.
Three types of glazing areas were used, expressed as percentages of the floor area: 10%, 25%, and 40%. Furthermore, five different distribution scenarios for each glazing area were simulated: (1) uniform: 25% glazing on each side, (2) north: 55% on the north side and 15% on each of the other sides, (3) east: 55% on the east side and 15% on each of the other sides, (4) south: 55% on the south side and 15% on each of the other sides, and (5) west: 55% on the west side and 15% on each of the other sides. In addition, they obtained samples with no glazing areas. Finally, all shapes were rotated to face the four cardinal points.
Each of the 768 simulated buildings can be characterized by eight building parameters, which are summarized in Table 1. These parameters correspond to the input variables. They also recorded the heating load (HL) and the cooling load (CL), which correspond to the output variables. Basic statistical analysis of the data was also performed, showing that linear modelling techniques are not appropriate for this data, since the scatter and density plots do not follow a Gaussian distribution. As mentioned earlier, the data generated represent with high probability actual real data, enabling energy comparisons between buildings [17]. Moreover, the use of simulated building data is useful and justified when dealing with energy prediction challenges, since it allows us to reduce considerably the time needed to obtain the data, design easily different building configurations and work with a lower level of detail.

Performance Criteria
In order to test the performance of FIR and ANFIS fuzzy models we use 10-fold cross validation (10-fold CV). Model parameters are obtained using the training subset and model validation is performed using the testing subset. For statistical confidence, the training and testing processes are repeated 100 times to the whole dataset, randomly permuted in each run prior to splitting in training and testing subsets.
Three evaluation criteria are used to evaluate the performance of each of the models. Two of them are error measures, i.e., root mean square error (RMSE) and the mean absolute error (MAE), described in Equations (2) and (3), respectively. These errors have been chosen because they are the ones most frequently used in previous studies [17,[19][20][21], and it is intended to compare the fuzzy proposed techniques with the methodologies presented in those works.
In Equations (2) and (3),ŷ(t) corresponds to the predicted output, y(t) to the system output, and N is the number of samples. Both RMSE and MAE are commonly used accuracy measures that are useful when comparing different methods on the same data, as is the case in this research. The MAE is a linear score, which means all individual differences are weighted equally in the average. The RMSE gives a relatively high weight to large errors, since the errors are squared before they are averaged. This means that the RMSE is most useful when large errors are particularly undesirable. The MAE and the RMSE can be used together to diagnose the variation in the errors in a set of forecasts.
The third evaluation criteria has the goal to obtain a comprehensive performance measure. To this end, RMSE and MAE measures are combined into a synthesis index, SI, as defined in Equation (4).
where Meas i is the performance measure, i.e., RMSE, MAE, and Meas min,i and Meas max,i are the minimum and maximum values of the corresponding measure for the set of prediction models, respectively. The SI range is [0, 1] and values close to 0 indicates a highly accurate prediction model.

Fuzzy Models Identification
In this section the development of ANFIS and FIR models is described. As mentioned before, in this application we have two output variables: HL and CL. We want to study if it is possible to estimate each output by using the eight input variables that represent different building parameters (see Table 1). Both ANFIS and FIR methodologies allow developing models that have a single output, that is, SISO (single input single output) or MISO (multiple inputs single output) models. Therefore, for each methodology two sets of models are obtained, one for each possible output. We refer to sets of models because for each methodology and each output we obtain a model for each of the 10-fold CV, and this is repeated 100 times. Therefore, 1000 models are obtained and validated for each of the two methodologies and outputs studied.
Both fuzzy approaches need to discretize the quantitative data into qualitative data. To this end, it becomes necessary to define at least two parameters: the number of classes (also called granularity) chosen for each input variable (and also for the output variables in the case of the FIR models), and the shape of the membership functions of the input variables (and also for the output variables in the case of the FIR models).
In this research we decided to discretize the input variables relative compactness (RC), surface area (SA), wall area (WA), and glazing area distribution (GAD) into three classes and roof area (RA), overall height (OH), orientation (O), and glazing area (GA) into two classes. The output variables are discretized into three classes in the case of the FIR models. Remember that ANFIS does not have fuzzy consequents, that is, the rules' output is a function (see Figure 5). These discretization parameters have been chosen based on the analysis of the data. The variable OH can take only two possible values and is, therefore, represented in two classes. Variables RA, O, and GA can take four different values, and some of these values appear only a few times. A discretization with more than two classes does not enhance the model's prediction power; instead, a higher number of classes can lead to a curse of dimensionality problem. Therefore, it was found that two classes are enough for these variables to have a good representation and to obtain efficient models. The variables that are discretized into three classes have a uniform distribution in their dimensionality space; therefore, an odd number of classes seem more reasonable. Three is the lowest number of classes that give good results for these variables. A Gaussian shape has been used to discretize all variables, since slightly better results are obtained.

ANFIS Models
In order to obtain ANFIS models it is necessary to define two sets of parameters: one related to the discretization process of the input variables, which has been explained before, and the parameters related to the training process. The parameters needed to perform the training process are: the type of the output function (i.e., constant or linear), the optimization method to train the fuzzy inference system, and the number of training epochs.
ANFIS uses the eight input variables to predict each output, HL and CL, and does not perform any kind of feature selection. In this experiment we use a constant output function, two different numbers of training epochs, and a hybrid optimization method (default option). A constant output has been chosen instead of a linear one because the number of parameters that ANFIS needs to optimize when a linear output function is used causes Matlab to run out of memory. Notice there are 8 input variables, 4 discretized into 3 classes and 4 into 2 classes; therefore, each ANFIS model is composed of a rule set of 3 4 × 2 4 = 1296 rules. If the output of each rule is a constant output function then the number of output parameters to be optimized is 1296. However, if each rule has as output a first-order linear function the number of output parameters to optimize becomes higher: 11,664 (= 1296 × 9). Two different values, i.e., 50 and 300, are studied to set the number of epochs. The time needed to run a 10-fold CV using an Intel Core2 Duo (3GHz, 4GB of RAM) during 50 epochs is more than 7 h and during 300 epochs is more than 41 h. However, the errors obtained in both cases are virtually identical, for instance, in the case of the HL the RMSE errors obtained are 0.5205 (50 epochs) versus 0.5203 (300 epochs). Therefore, in this case, it does not make sense to increase more than 50 the number of epochs. With respect to the optimization method, the hybrid approach, i.e., a combination of backpropagation and least square estimation (see Section 2.2), has been chosen because it is the original one designed to optimize ANFIS models, and from our point of view is better suited for this task.

FIR Models
As in the case of ANFIS, the first step in order to obtain the FIR models is to discretize the data by converting quantitative values into fuzzy data. For this reason, it is necessary to specify the two parameters described in the previous section, granularity and shape of the membership functions, as well as a third parameter that refers to the discretization algorithm. Depending on the algorithm chosen, the distribution of the membership functions in the variable space may vary, which has a direct impact to the reasoning process and, therefore, to the model predictions. Contrary to FIR, ANFIS does not have this discretization parameter since ANFIS distributes uniformly all the membership functions that describe a specific variable and then performs an optimization process. Figure 7 shows an example of uniform (upper plot) and non-uniform (lower plot) distribution of the membership functions of a variable.
chosen, the distribution of the membership functions in the variable space may vary, which has a direct impact to the reasoning process and, therefore, to the model predictions. Contrary to FIR, ANFIS does not have this discretization parameter since ANFIS distributes uniformly all the membership functions that describe a specific variable and then performs an optimization process. Figure 7 shows an example of uniform (upper plot) and non-uniform (lower plot) distribution of the membership functions of a variable.
In this research, based on the observation of the data distribution, the equal with partition (EWP) algorithm has been used for the discretization of the RA and OH variables, and the equal frequency partition (EFP) algorithm is used for the discretization of the rest of the variables. EWP performs a uniform distribution of the membership functions. EFP distributes the membership functions of a variable in such a way that all the classes contain the same number of data points. Visual-FIR allows the modeller to choose between 15 discretization algorithms, some of them belonging to the hierarchical family and others to the fuzzy family [26]. Once the data has been discretized, FIR methodology performs a feature selection process where the most relevant causal relations between the input variables and the output variable are identified. To this end, the model structure identification process of FIR is used, which performs a feature selection based on the entropy reduction measure, as described in Section 2.1.
FIR founds as optimal HL and CL masks the one shown in Figure 8. That means that for both outputs (HL and CL), the features that have higher relevant causal relation with the output are RC (relative compactness) and GA (glazing area). The use of the other variables does not improve the predictive power of FIR models. Therefore, these two variables represent the minimum subset of variables needed to accurately estimate the heating and cooling loads.  In this research, based on the observation of the data distribution, the equal with partition (EWP) algorithm has been used for the discretization of the RA and OH variables, and the equal frequency partition (EFP) algorithm is used for the discretization of the rest of the variables. EWP performs a uniform distribution of the membership functions. EFP distributes the membership functions of a variable in such a way that all the classes contain the same number of data points. Visual-FIR allows the modeller to choose between 15 discretization algorithms, some of them belonging to the hierarchical family and others to the fuzzy family [26].

RC
Once the data has been discretized, FIR methodology performs a feature selection process where the most relevant causal relations between the input variables and the output variable are identified. To this end, the model structure identification process of FIR is used, which performs a feature selection based on the entropy reduction measure, as described in Section 2.1.
FIR founds as optimal HL and CL masks the one shown in Figure 8. That means that for both outputs (HL and CL), the features that have higher relevant causal relation with the output are RC (relative compactness) and GA (glazing area). The use of the other variables does not improve the predictive power of FIR models. Therefore, these two variables represent the minimum subset of variables needed to accurately estimate the heating and cooling loads.

FIR Models
As in the case of ANFIS, the first step in order to obtain the FIR models is to discretize the data by converting quantitative values into fuzzy data. For this reason, it is necessary to specify the two parameters described in the previous section, granularity and shape of the membership functions, as well as a third parameter that refers to the discretization algorithm. Depending on the algorithm chosen, the distribution of the membership functions in the variable space may vary, which has a direct impact to the reasoning process and, therefore, to the model predictions. Contrary to FIR, ANFIS does not have this discretization parameter since ANFIS distributes uniformly all the membership functions that describe a specific variable and then performs an optimization process. Figure 7 shows an example of uniform (upper plot) and non-uniform (lower plot) distribution of the membership functions of a variable.
In this research, based on the observation of the data distribution, the equal with partition (EWP) algorithm has been used for the discretization of the RA and OH variables, and the equal frequency partition (EFP) algorithm is used for the discretization of the rest of the variables. EWP performs a uniform distribution of the membership functions. EFP distributes the membership functions of a variable in such a way that all the classes contain the same number of data points. Visual-FIR allows the modeller to choose between 15 discretization algorithms, some of them belonging to the hierarchical family and others to the fuzzy family [26]. Once the data has been discretized, FIR methodology performs a feature selection process where the most relevant causal relations between the input variables and the output variable are identified. To this end, the model structure identification process of FIR is used, which performs a feature selection based on the entropy reduction measure, as described in Section 2.1.
FIR founds as optimal HL and CL masks the one shown in Figure 8. That means that for both outputs (HL and CL), the features that have higher relevant causal relation with the output are RC (relative compactness) and GA (glazing area). The use of the other variables does not improve the predictive power of FIR models. Therefore, these two variables represent the minimum subset of variables needed to accurately estimate the heating and cooling loads.

Fuzzy Approaches Results and Discussion
The RMSE and MAE obtained by ANFIS and FIR models, for both HL and CL output variables, are summarized in the last two rows of Table 2. To demonstrate the performance of the developed fuzzy models, we compare the results with those obtained using 13 other techniques, namely IRLS, RF, SVR, GLR, CHAID, CART, SVM, RBFNN, BPNN, MARS, GSGP, GSGP-LS, GSGP-LS-LIN [17,[19][20][21]. All these machine-learning methodologies have in common that they generate individual predictive models and are not an ensemble of models. The full results obtained for all the machine-learning approaches are also shown in Table 2. Table 2. RMSE (root mean square prediction errors), MAE (mean absolute prediction errors) and SI (synthesis index) obtained by all the machine learning methodologies, including the fuzzy approaches, ANFIS and FIR, at the bottom of the table. Near the name of each model the reference of the paper is included. The best prediction performances are highlighted. IRLS is a linear regression algorithm that adjusts weights in the coefficients of the classical regression scheme. RF is a method that builds a set of classification and regression trees using the Bagging algorithm. SVR is a support vector machine that solves regression problems. While other linear regression models try to minimize the error between the predicted and the actual value, SVR tries to fit the best line within a predefined or threshold error value.

Model
GLR, general linear regression, is a more flexible version of standard linear regression that assumes that data points have an arbitrary distribution pattern. CHAID, Chi-squared automatic interaction detector, is a decision tree classification technique that uses the Chi-square as test of independence. The CART is a decision binary tree method of constructing a classification or regression tree according to its dependent variable. The split is chosen using a greedy algorithm to minimize a cost function. The support vector machines, SVM, construct a hyperplane or a set of hyperplanes in a high or infinite dimensional space, which can be used for classification or other tasks like outliers detection. A radial basis function network, RBFNN, is an artificial neural network that uses radial basis functions as activation functions. BPNN is a back-propagation neural network. These networks can be trained with a training dataset propagating the error back to input until the minimal mean square error is achieved. MARS, multivariate adaptive regression splines, fits a basis function (term) to distinct independent variable intervals.
The next three techniques, i.e., GSGP, GSGP-LS and GSGP-LS-LIN, are genetic programming approaches based on evolutionary computation. All of them use geometric semantic operators to produce new candidate solutions. GSGP-LS includes a local search strategy within mutation operator. GSGP-LS-LIN modifies the fitness function including linear scaling. For knowing the parameter settings of these methodologies for the problem at hand refer to [17,[19][20][21].
The best prediction performances are highlighted in Table 2. The best results for CL and HL are obtained when the SVR algorithm is used. The second and third best results are obtained by the fuzzy approaches. FIR obtains a slightly better synthesis index (SI) than ANFIS when modelling the HL, whereas for CL it is ANFIS that obtains a slightly better index. Following SVR, FIR and ANFIS performances (with SI lower than 0.1), the next-based alternative is the MARS algorithm. MARS is able to get equivalent performances than FIR for CL, but quite worse for HL. The remaining 12 modelling methodologies have results that are far away from the ones obtained by SVR, FIR and ANFIS. Figure 9 shows the synthesis indices, SI, obtained for all the modelling methodologies. From this figure it can be seen that IRLS, SVM and BPNN have by far the lowest performance, with SI values higher than 0.65. GSGP and GSGP-LS techniques have poor indices too, i.e., SI values around 0.40. Finally, RF, GLR, CHAID, CART, RBFNN and GSGP-LS-LIN obtain better results than the aforementioned methodologies, with SI values usually lower than 0.24 for CL and 0.26 for HL. However, these results are still away from the values obtained by fuzzy approaches and SVR methodology that are lower than 0.1.
The best prediction performances are highlighted in Table 2. The best results for CL and HL are obtained when the SVR algorithm is used. The second and third best results are obtained by the fuzzy approaches. FIR obtains a slightly better synthesis index (SI) than ANFIS when modelling the HL, whereas for CL it is ANFIS that obtains a slightly better index. Following SVR, FIR and ANFIS performances (with SI lower than 0.1), the next-based alternative is the MARS algorithm. MARS is able to get equivalent performances than FIR for CL, but quite worse for HL. The remaining 12 modelling methodologies have results that are far away from the ones obtained by SVR, FIR and ANFIS. Figure 9 shows the synthesis indices, SI, obtained for all the modelling methodologies. From this figure it can be seen that IRLS, SVM and BPNN have by far the lowest performance, with SI values higher than 0.65. GSGP and GSGP-LS techniques have poor indices too, i.e., SI values around 0.40. Finally, RF, GLR, CHAID, CART, RBFNN and GSGP-LS-LIN obtain better results than the aforementioned methodologies, with SI values usually lower than 0.24 for CL and 0.26 for HL. However, these results are still away from the values obtained by fuzzy approaches and SVR methodology that are lower than 0.1.  Figure 10 shows real versus predicted ANFIS and FIR results for HL and CL models. In both cases we present the fold that gives larger RMSE, in order to show that even for the worse prediction results, the difference from the real data is almost indistinguishable, especially in the case of the heating load model. Each point in Figure 10 represents a building.  Figure 10 shows real versus predicted ANFIS and FIR results for HL and CL models. In both cases we present the fold that gives larger RMSE, in order to show that even for the worse prediction results, the difference from the real data is almost indistinguishable, especially in the case of the heating load model. Each point in Figure 10 represents a building.
Both models perform well for the task at hand. However, it is important to note the huge difference in computational time between the model identification/training process of ANFIS and FIR methodologies. While ANFIS needs more than 7 h to perform a unique run (for a whole 10-fold CV), FIR needs only a few minutes to perform the same task, both running on the same machine (Intel Core2 Duo, 3 GHz, 4 GB of RAM).
Comparing RMSE versus MAE errors, we can conclude that for both outputs, HL and CL, there is variation in the errors, since RMSE > MAE. However, this variation is not sizeable enough to indicate the presence of very significant errors.
Therefore, a first contribution of this research is to show that fuzzy approaches, like ANFIS and FIR, are good alternatives to estimate energy performance of residential buildings based on some building design parameters. Moreover, the fuzzy approaches studied here, have shown better throughput than twelve powerful machine learning approaches well suited for prediction problems. Both models perform well for the task at hand. However, it is important to note the huge difference in computational time between the model identification/training process of ANFIS and FIR methodologies. While ANFIS needs more than 7 h to perform a unique run (for a whole 10-fold CV), FIR needs only a few minutes to perform the same task, both running on the same machine (Intel Core2 Duo, 3 GHz, 4 GB of RAM).
Comparing RMSE versus MAE errors, we can conclude that for both outputs, HL and CL, there is variation in the errors, since > . However, this variation is not sizeable enough to indicate the presence of very significant errors. Therefore, a first contribution of this research is to show that fuzzy approaches, like ANFIS and FIR, are good alternatives to estimate energy performance of residential buildings based on some building design parameters. Moreover, the fuzzy approaches studied here, have shown better throughput than twelve powerful machine learning approaches well suited for prediction problems.

Feature Selection Results and Discussion
As described in the previous section, FIR finds that two of the eight input variables, RC and GA, have the highest causal relation with the outputs and are enough to predict the heating and cooling loads. This is an interesting result. On the one hand, it is consistent with Tsanas and Xifara's outcomes that claim the GA is the most important predictor for both HL and CL. On the other hand, it allows the conclusion that the remaining six variables, SA, WA, RA, OH, O and GAD, are redundant or irrelevant. Again, this is consistent with the previous work that concludes variables RC, SA, WA, RA, and OH appear reasonably strongly correlated to the output variables, and also finds that some input variables are highly correlated between them. Based on the FIR feature selection process, it is reasonable to think that the relative compactness variable, RC, includes the information of other relevant variables involved in the study, such as SA or RA. In fact, this makes sense because there is an analytic formula linking RC, SA and the volume [17]. It is clear that the WA variable is directly

Feature Selection Results and Discussion
As described in the previous section, FIR finds that two of the eight input variables, RC and GA, have the highest causal relation with the outputs and are enough to predict the heating and cooling loads. This is an interesting result. On the one hand, it is consistent with Tsanas and Xifara's outcomes that claim the GA is the most important predictor for both HL and CL. On the other hand, it allows the conclusion that the remaining six variables, SA, WA, RA, OH, O and GAD, are redundant or irrelevant. Again, this is consistent with the previous work that concludes variables RC, SA, WA, RA, and OH appear reasonably strongly correlated to the output variables, and also finds that some input variables are highly correlated between them. Based on the FIR feature selection process, it is reasonable to think that the relative compactness variable, RC, includes the information of other relevant variables involved in the study, such as SA or RA. In fact, this makes sense because there is an analytic formula linking RC, SA and the volume [17]. It is clear that the WA variable is directly related to the GA, making it redundant. Therefore, the five variables that appear reasonably strongly correlated to the output variables contain redundant information if RC and GA are already selected.
Based on these results, and with the goal to validate the FIR feature selection process, it has been decided to use the selected features, RC and GA, as the only input variables to find a new ANFIS model. If the rest of the variables are redundant, as the feature selection process of FIR states, ANFIS should be able to model the heating and cooling loads using only RC and GA. Note that the ANFIS models presented in Section 3 used the whole set of input variables. To perform this study, different parameters of the ANFIS methodology are evaluated. Now the number of input variables is reduced from 8 to 2, allowing experimentation with different parameter sets involved in ANFIS modelling: number of epochs (i.e., 50, 100, 500, 1000, 2000) and number of classes of the input variables or partition (i.e., 3-2 and 5-2). All of the experiments use Gaussian membership functions, linear output function, and the hybrid optimization method. Notice that in this experiment it is possible to work with a linear output function since the number of rules for the rule set is 6 in the case that the chosen partition is 3-2, and 10 if the chosen partition is 5-2.
The results obtained for the heating and cooling loads using the different set of parameters and performing an average of the errors obtained when executing the 10-fold CV 100 times are summarized in Tables 3 and 4, respectively. Comparing Table 2 vs. Tables 3 and 4 it is clear that there are ANFIS models that reach the same level of accuracy, or even a little higher, than the ANFIS models that use the whole set of 8 features. The best results (shaded cells) are obtained when RC and GA variables are discretized into 5 and 2 classes (5-2), respectively. Fifty epochs is enough to derive a good CL model; increasing the number of optimization steps do not improve the model. The HL models need at least 100 epochs to be optimized properly.
Note that in these experiments a linear output function is always selected. Since the model has only two features, it becomes useful to use those ANFIS parameters that allow a higher tuning level in order to extract as much information as possible from the data available. That is why a 5-2 discretization (instead of the previous 3-2) and a linear output function (instead of the constant previously used) are the parameters that obtain better results. Increasing the granulation to 7-2 does not improve the models obtained. The computational time needed when using only the 2 features selected by FIR is in the order of minutes, not hours, as was the case when all 8 features were used. Specifically, the number of output parameters to be optimized has been reduced from 1296 to only 30. It is relevant to recall that a quick and accurate prediction model is of great importance for buildings design and decision making.
Therefore, a second contribution of this research is to illustrate that FIR methodology is able to identify the key parameters (from a set of building design characteristics) governing the energy use of residential buildings. That is, the model identification process of FIR methodology has been revealed as an efficient feature selection approach which can be used by other modelling methodologies, such as ANFIS, to enhance its skills.

Conclusions
The main goal of this work is to study the feasibility of fuzzy approaches to estimate the energy performance of buildings. The characteristics of a building are an important factor in determining the necessities of heating and cooling loads. Therefore, it becomes very useful to determine the most relevant building characteristics, related to the heating and cooling, needed to maintain comfortable indoor conditions in order to design and build energy-efficient buildings. This research starts from a previous study [17], which designed a set of buildings with the aim to predict the heating and cooling load of buildings taking into account variables: relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area and glazing area distribution.
Two fuzzy methodologies have been studied in this research, i.e., the fuzzy inductive reasoning (FIR) and the adaptive neuro-fuzzy inference system (ANFIS). In order to test the generalization performance of FIR and ANFIS fuzzy models we use 10-fold cross validation. The training and testing processes were repeated 100 times to the whole dataset, which was randomly permuted in each run prior to splitting it into training and testing subsets. The results achieved by ANFIS and FIR methodologies were compared to those obtained by thirteen machine learning methodologies presented in previous works, i.e., IRLS, RF, SVR, GLR, CHAID, CART, SVM, RBFNN, BPNN, MARS, GSGP, GSGP-LS, GSGP-LS-LIN [17,[19][20][21]. From the results, it can be concluded that SVR, together with the two fuzzy approaches (ANFIS and FIR), works much better than the rest of the methodologies. An interesting result is that the feature selection process of the FIR methodology finds that only two input variables, i.e., relative compactness and glazing area, contain the relevant information needed to accurately predict heating and cooling loads. Obtaining ANFIS and FIR models that use only these two variables, and are just as accurate as the models that use the whole data set of eight input variables, proves this finding. Moreover, these quick and accurate prediction models can be very useful for buildings design and decision making.