An Octahedric Regression Model of Energy Efficiency on Residential Buildings

Featured Application: The use of this new regression method enlarges the available toolkit of system modeling and machine learning techniques. Its main advantages are simplicity and a clear geometric meaning, allowing the treatment of a broad class of modelling problems. The application to a problem related to energy efficiency in buildings revisits a widely used dataset, introducing some additional considerations that serve as a benchmark of the proposed methodology.

Abstract: System modeling is a main task in several research fields. The development of numerical models is of crucial importance at present because of their wide use in the applications of the generically named machine learning technology, including different kinds of neural networks, random field models, and kernel-based methodologies. However, some problems involving the reliability of their predictions are common to their use in the real world. Octahedric regression is a kernel-averaged methodology developed by the authors that tries to simplify the entire process from raw data acquisition to model generation. A discussion about the treatment and prevention of overfitting is presented and, as a result, models are obtained that allow for the measurement of this effect. In this paper, this methodology is applied to the problem of estimating the energy needs of different buildings according to their principal characteristics, a problem of importance in architecture and in civil and environmental engineering due to increasing concerns about energy efficiency and the ecological footprint.


Introduction
Energy consumption rates are increasing to the point of causing environmental problems associated with the use of non-renewable energy sources. Two main aspects should be considered in order to improve the situation. First, the development and improvement of renewable energy sources is receiving growing interest from researchers. Second, the reduction of waste and the increase of the efficiency of energy use is a crucial point to take into account. Buildings contribute significantly to this situation (between 20% and 40% of the overall energy consumption, depending on the country), so the number of studies devoted to its reduction has increased in the last few years.
Some papers focus on working with real data obtained at different sampling rates, trying to estimate and schedule a more sustainable energy consumption pattern [1,2], or on physical and statistical models [3-6].
However, the diversity of buildings and the randomness of weather conditions complicate the problem of finding and modeling the influence of the different factors on the overall consumption in order to give advice aimed at reducing the environmental impact of existing or new constructions. One of the biggest problems in this case is the availability of data to work with, because of the difficulty of its collection. The existence of publicly available data at the UCI Machine Learning Repository [7] may explain the number of studies based on the dataset of reference [8], where a variety of machine learning (ML) techniques have been applied, as Table 1 presents.
Table 1. Machine learning (ML) techniques used to model the reference [8] dataset.
The finite element method (FEM) is a numerical method for finding approximate solutions to differential equations and boundary value problems in partial differential equations, initially used for solving structural problems in civil and aeronautical engineering [46-51]. It can be described as follows. Given a differential equation defined by a differential operator D, with D u = f and f, u ∈ V, V being a function space, the finite element method replaces V by a finite-dimensional subspace V_h ⊂ V, composed of continuous piecewise polynomial functions of degree K (a Sobolev space), associated with a division of the domain [0, 1]^d, where the problem is defined, into N_e parts called elements. The problem is then solved in V_h. Let us consider a basis for the functions of V_h satisfying φ_i(x_j) = δ_ij at a set of Q points called nodes. The functions of the basis are called shape functions, and they are used to interpolate at points different from the nodes. Thus, any function f can be developed in terms of the basis with components u_i, taking the form f(x) ≈ Σ_i u_i φ_i(x). According to Equation (9), the sum is restricted to the nodes that determine the element which contains the point x. Several conditions can be defined to obtain the optimum values of u_i as an approximation for the solution to the differential equation.
The selection of the nodes is a crucial point for the precision of the results. The operation that constructs the set of nodes and elements is called meshing. The resulting mesh can be locally characterized by a parameter related to the size of each element and the error of its approximated solution.
The numerical regression model for the relation of Equation (3) consists of the set formed by the nodal values {u_i}, which is called a representation model for the system. Then, the value of the relationship at any point can be estimated using the interpolation expression u(x) = Σ_i u_i φ_i(x). For simplicity in the subsequent calculations, let us use a mesh formed by regular hyper-cubic elements with edge length h. Introducing the complexity c as the number of elements in each dimension, h = 1/c, the total number of elements is c^d, and the total number of nodes is (c + 1)^d.
The key point in FEM-based methods is the determination of the values {u_i}, considering the definition of an error function and its minimization. That problem is equivalent to solving a linear system of size (c + 1)^d.
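As a minimal illustration of the interpolation just described, the following sketch evaluates a piecewise-linear FEM interpolant on a regular one-dimensional mesh; the function names are illustrative, not taken from the paper.

```python
import numpy as np

def interpolate(x, nodes, u):
    """Evaluate the piecewise-linear FEM interpolant at x given nodal values u."""
    h = nodes[1] - nodes[0]                  # element size h = 1/c
    e = min(int(x / h), len(nodes) - 2)      # index of the element containing x
    t = (x - nodes[e]) / h                   # local coordinate in [0, 1]
    # Only the two nodes of the containing element contribute (hat functions).
    return (1.0 - t) * u[e] + t * u[e + 1]

c = 4                                        # complexity: elements per dimension
nodes = np.linspace(0.0, 1.0, c + 1)         # (c + 1) nodes for c elements
u = nodes ** 2                               # nodal values of f(x) = x^2
print(interpolate(0.5, nodes, u))            # exact at a node: 0.25
```

Between nodes the interpolant is linear, so `interpolate(0.375, nodes, u)` returns the average of the neighbouring nodal values, 0.15625.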
The authors have been developing regression techniques based on the properties of the finite elements as an approximation method [52-55]. To guide the reader through the methods presented ahead, a short summary is included at this point:

1. Minimization of the least squared error defined on the dataset.
   a. Problems: The derived linear system is usually underdetermined.
2. Minimization of an energy associated with the deformation of the mesh defined on the problem domain.
   b. Advantages: The associated linear system is always determined.
   c. Problems: Computational time of the order of solving a symmetric linear system of size (c + 1)^d.
3. Introduction of multi-index dimensional decomposition over a Galerkin optimization scheme.
   d. Advantages: Computational time order is O((c + 1)^d).
   e. Problems: Dependence of the computational time as a dimensional power of the complexity.
4. Local average of simple radial-function-based estimators calculated on the finite element meshing nodes.
   f. Advantages: Independence between the computation time and the complexity.

Consider now the problem of numeric model estimation given the experimental dataset. If Equation (12) is calculated at each point of the dataset, it results in an error expression where Ξ represents a function acting on the local error (usually the absolute value or the square). Presented in that form, the modeling problem is in some way equivalent to the problem of solving a differential equation, considered in Equation (5), and the same techniques based on the reduction of the dimension of the functional space of the solutions can be developed. One natural way to do this is applying the finite element method, in which the problem is reduced to the calculation of the nodal values, so the model is determined by the values at the Q nodes. This basic idea has been developed by the authors in references [52,53], with the error of a model defined as in Equation (15). Usually, the obtained linear system is underdetermined, so additional conditions over the nodal values are required. The cause of these degrees of freedom is the absence of sample points in some elements of the discretized domain. The additional constraints come from the "minimum deformation" or "rigidization" condition, where the value of an undetermined node is calculated as the average of the values of the adjacent nodes. A deeper use of the minimum deformation principle is made in reference [54], where the point of view is slightly different. There, the geometric image of the mesh deformation introduced in the previous paragraph is developed up to the definition of an energy associated with the model. The energy U is directly inspired by the elastic energy of a two-dimensional mesh composed of vertices joined by springs (with elongation V_elong and flexion V_flex components for the energy of the mesh) and an interaction term V_interact accounting for the attractive effect between the surface of the defined mesh and the experimental points.
To simplify, under the assumption of a general principle of smoothness of the model, the last expression can be approximated by a form in which c_f corresponds to the relative coupling between the different energy components. Using this global energy for the system, the equilibrium point can be obtained as a minimization problem whose solution takes the form of a linear system, Equation (19). The main advantage over the first methodology is that the system obtained has a unique solution and no additional time-consuming processes, like rigidization, are needed.
Both methods suffer from a common bottleneck related to the algorithms available to solve the linear system. However, given that the discretization is composed of hyper-cubes, it is possible to introduce a multi-index system (i_1, ..., i_d) for each node and element. Using this notation, the shape function in any dimension can be calculated as the product of one-dimensional shape functions ϕ (Equation (20)). Then, the system of Equation (19) and that derived from Equation (15) can be built following a structure that allows for a faster solution. In particular, in reference [55], the system is obtained from Galerkin's method using a definition of the error in which z{E(x)} is an approximation of the unknown function as a weighted average of the experimental points for each element, defined in terms of a radial function φ, where η_E is the geometric center of the corresponding element.
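The tensor-product construction of Equation (20) can be sketched as follows: a d-dimensional shape function of a hyper-cube node is the product of one-dimensional linear shape functions, one per coordinate (function names are illustrative).

```python
import numpy as np

def phi_1d(t, corner):
    """1-D linear shape function on [0, 1]: corner 0 -> 1 - t, corner 1 -> t."""
    return t if corner == 1 else 1.0 - t

def phi_nd(t, corners):
    """Shape function of the hyper-cube node indexed by the multi-index `corners`."""
    return float(np.prod([phi_1d(ti, ci) for ti, ci in zip(t, corners)]))

# The 2**d shape functions of a d-cube form a partition of unity at any point.
d = 3
t = np.array([0.2, 0.7, 0.5])
corners = [(i, j, k) for i in (0, 1) for j in (0, 1) for k in (0, 1)]
total = sum(phi_nd(t, c) for c in corners)
print(round(total, 12))  # partition of unity: 1.0
```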
The method can then be seen as a two-step process. In the first part, the use of element averages approximates the model by a piecewise function that is afterwards smoothed.
The use of a radially weighted average is in fact a simple model that is used to compose the joint structure of the linear system through Galerkin's error minimization formula. The linear system appears because the model is considered in terms of the determination of all the nodes of the domain in the FEM discretization. However, obtaining the best estimation for a point is in fact more a local than a global problem, as suggested by Equation (22). This global character causes all the presented methods to have computing times that depend dramatically on the complexity c, which is related to the goodness of fit of the model.
To escape from this global requirement, one possibility is to construct only local models such as that of Equation (22). Nevertheless, the calculation of the average is influenced by the experimental errors, and it is desirable to reduce the impact of the individual errors at the experimental points. Therefore, if the model is estimated at different points, the influence of these errors will vary from one point to another.
Given that the finite element model introduces a structure composed of points separated by a distance parametrized by the model complexity, h = 1/c, together with a method for interpolating the values defined on the nodes of an element, a natural option is to use radial functions Φ(x) in the calculation of simple models on each node and, afterwards, to interpolate these values through the shape functions. Although this methodology improves some of the problems commented on previously, it also has some points that should be considered.
First, the effectiveness of the error smoothing process of Equation (24) is optimal for points near the center of the element, but points close to a node are mainly determined by the value of the radial average on that node. Second, the use of hyper-cubic elements presents an advantage from the point of view of shape function calculations, as considered in Equation (20), but the number of nodes is 2^d, introducing a dependence on the dimension in the computational complexity of the algorithm of O(d · 2^d).
These problems are solved with the methodology presented in the following section, called octahedric regression. It improves on the O(d · 2^d) computational order of the algorithm, a feature that allows the modelling of systems of higher dimension.
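The gain in support size can be checked with a few lines: a hyper-cubic element involves 2^d vertices, while an octahedric support uses only 2·d points, so the per-point cost stays linear in the dimension.

```python
# Points touched per estimation: octahedric support (2*d) vs hyper-cube (2**d).
for d in (2, 4, 8, 16):
    print(d, 2 * d, 2 ** d)
```

Already at d = 16 the hyper-cube needs 65,536 vertices against 32 octahedric support points.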
The presented methodology (octahedric regression) is a hybrid methodology, including the main characteristics of the finite element method, radial basis functions, and nearest neighbors. It can be explained as a two-step algorithm where the result is calculated after an FEM-like interpolation process acting on a set of simple estimators obtained with a weighted version of a (1+ε)-approximate nearest neighbor search.
Octahedric regression presents some degree of similarity with the technique of kriging. The main difference is that the set of points used to calculate the final result has a fixed structure (forming an octahedron around the objective point), and the quantities involved in the interpolation are not the sample values, but predictors obtained using a radially weighted average of the experimental data.
The rest of the paper is organized as follows: Section 2 introduces the octahedric regression methodology and its computational algorithm. The last part of Section 2 is devoted to the treatment of overfitting from the point of view of the new methodology. Section 3 presents the application of octahedric regression to the problem of determining heating and cooling loads using the characteristics of a group of buildings. Finally, Section 4 presents the conclusions and future work.

Materials and Methods
Definition 1. A parametrised radial function [56,57] is a function Φ : R⁺ × R⁺ → R⁺, characterized by a parameter ω, that satisfies certain positivity and decay conditions. An example of a parametrised radial function is the exponential. The following definitions introduce the basic tools used afterwards to develop the methodology.
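A minimal sketch of such a function, assuming the common exponential decay law e^(−r/ω); the paper's exact expression for Equation (26) is not reproduced here, so this form is an assumption for illustration.

```python
import math

def radial_exp(r, omega):
    """Exponential parametrised radial function Phi(r, omega) = exp(-r / omega).
    The decay law is an assumed example, not the paper's exact Equation (26)."""
    return math.exp(-r / omega)

# Phi is positive, equals 1 at the origin, and decreases with the radius r;
# smaller omega concentrates the weight near the origin.
print(radial_exp(0.0, 0.1))                          # 1.0 at the origin
print(radial_exp(0.5, 0.1) < radial_exp(0.1, 0.1))   # True: decreasing in r
```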
Definition 2. Given a function f(x) defined on a domain Ω = [0, 1]^d, the weighted average regression of f(x) at a point x_o, c*(x_o), is the number implicitly defined by ∫_Ω Φ(‖x − x_o‖, ω) [f(x) − c*(x_o)] dx = 0, where Φ(·, ·) is a parametrised radial function.
Definition 3. The J-th moment of the radial function Φ, W_J, is defined as the corresponding weighted integral over the domain. Using this expression, Equation (27) can be rewritten in terms of these moments.

Definitions 4 and 5 introduce the concept of support of an interpolation, to be used in the following definitions.

Definition 5. An octahedric support for interpolation of size h around x_0 is the support for interpolation around x_0 given by the 2·d vectors {±h·e_i}, i = 1, ..., d, where the e_i are the canonical basis vectors.

Definitions 6 and 7 present the main objective of the research in the form of an interpolation of several radial-function-based estimations.

Definition 6. Given an octahedric support for interpolation of size h around a point x_o and a function f(x), the interpolated function is defined through the shape functions of the support.

Definition 7. Given a function defined on Ω and a point x_o ∈ Ω, the octahedric estimation ẑ(x) of width h of f(x) at x_o is defined as the interpolation of the weighted averages on its octahedric support of size h, given by

ẑ(x_o) = (1/(2d)) · Σ_{i=1}^{d} [c*(x_o + h·e_i) + c*(x_o − h·e_i)],

where the c*(·) are the numbers defined in Definition 2. By similarity with the methods introduced in Section 1, the value 1/h is called the pseudo-complexity.
To study the meaning of Definition 7, let us consider the case when h is small. Starting from Equation (29) and developing W_0 and Φ around x_o up to second order in h, Equation (33) can be expanded; summing the negative and positive components of the support on each dimension i, then summing over every dimension index i and dividing by 2·d, the octahedric regression is obtained. Taking the radial function given by Equation (26), for points far from the hypercube's boundary, by symmetry, Equation (40) shows that the octahedric regression is a correction of order h² to the weighted average for central objective points. As a consequence, when h → 0, both values tend to coincide, ẑ(x_o) → c*_s(x_o). The following definition and propositions are related to the behaviour of the estimations in relation to the experimental error distributions, considered normal and uncorrelated over the problem domain.
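Definition 7 can be sketched directly: compute the weighted average c*(·) at the 2·d support points x_o ± h·e_i and average the results. The exponential kernel and the function names are assumptions for illustration.

```python
import numpy as np

def weighted_average(x0, X, y, omega):
    """Sample version of the weighted average regression c*(x0) of Definition 2,
    using an exponential radial kernel (an assumed choice)."""
    w = np.exp(-np.linalg.norm(X - x0, axis=1) / omega)
    return np.dot(w, y) / np.sum(w)

def octahedric_estimate(x0, X, y, h, omega):
    """Average of the 2*d weighted averages on the octahedric support of size h."""
    d = x0.size
    total = 0.0
    for i in range(d):
        e = np.zeros(d); e[i] = h
        total += weighted_average(x0 + e, X, y, omega)
        total += weighted_average(x0 - e, X, y, omega)
    return total / (2 * d)

rng = np.random.default_rng(0)
X = rng.random((200, 2))                       # samples on [0, 1]^2
y = X[:, 0] + X[:, 1]                          # a smooth target function
z = octahedric_estimate(np.array([0.5, 0.5]), X, y, h=0.05, omega=0.1)
print(round(z, 3))                             # close to f(0.5, 0.5) = 1.0
```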
Definition 8. Given a function f(x) defined on a domain Ω and a random field e(x) ∼ N(0, σ(x)) defined on Ω, an experimental realisation of f associated with a sample e_S(x) of e(x) is a function y_s : Ω → R given by y_s(x) = f(x) + e_S(x).

Proposition 1. The expected value (denoted as E[...]) of the weighted averaged regression corresponding to the values y_s at any point coincides with the weighted averaged regression of f at that point.

Proof. Following Equation (27), the regression of the experimental realisation can be decomposed, using Equation (44), into the regression of f plus the regression of the error term. Under the conditions of Definition 8, E[e_S(x)] = 0, and the expected value of Equation (48) reduces to the regression of f.

Proposition 2. Let e(x) be a distribution uncorrelated at different points of Ω, E[e_S(x) · e_S(y)] = σ²(x) · δ(x − y). Then the variance of the weighted averaged regression corresponding to the values y_s is given by the corresponding weighted integral of σ²(x).

Proof. Taking Equation (48) to the square and calculating expected values, the result follows.

Now, the computational algorithm is presented with some considerations relative to its implementation.

Computational Algorithm
In real cases, complete sets of values of y_S(x) on the domain Ω are not available, so one must work with a subset of P points. For this sample, the integrals are calculated using finite sums, leading to the finite-sample version of Equation (46). Following Equation (56), the proposed algorithm (Algorithm 1) can be condensed into the corresponding schema. Let us introduce the size of the problem data as ℓ = P · d. Given that the calculation of the radial kernel function has a computational cost of O(d), where d is the dimension, the number of operations of the algorithm is O(ℓ²), while the memory requirements are O(ℓ).
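The quadratic cost claim can be sanity-checked by counting kernel evaluations: estimating the model at all P sample points requires 2·d weighted averages per point, each touching all P samples. Since each kernel evaluation itself costs O(d), the total is O(ℓ²) for ℓ = P·d.

```python
def kernel_evaluations(P, d):
    """Kernel evaluations needed to estimate the model at every sample point:
    P target points x (2*d) support points x P samples per weighted average."""
    return P * (2 * d) * P

print(kernel_evaluations(100, 2))   # 40000
print(kernel_evaluations(200, 2))   # 160000: doubling P quadruples the count
```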

Full and Restricted Models as a Prevention of Overfitting
Equation (43) shows the dependence of the estimator on the parameter h. However, the algorithm includes one additional parameter, ω, used in the radial function. Let us order the points by their distance to x_o. Taking Equation (54) for the function W_0(x) at the support points, the term in brackets represents the relative weight of each sample point in the weighted average. Summing up the contributions on each support point, the result can be written in terms of the weight of all the points in the estimation of the model calculated at x_r. At zero order in h, Equation (59) implies that Equation (60) is approximately a weighted mean over the sample. By Equation (25), the fractions converge very quickly to 0 as the distance to x_r grows, and the only contribution depends on the points that are at a distance similar to the nearest point, that is, when Φ(x_k − x_r, ω) ≈ Φ(h/2, ω). The number q of points involved is determined by the value of the parameter ω, and c_S(x_o) is a weighted mean of the q nearest points. So, the octahedric regression presented in this paper corresponds to a mean of a simpler estimator calculated on the support of size h defined around x_o. These simple estimators correspond to the weighted average of the q(x_o, ω) nearest neighbours. In the limit ω → 0, the c_S(x_o ± h·e_i) are obtained from the nearest point. In the case of considering all the experimental points, the model is called full. In that case, when h ≪ 1, the point nearest to the support points will very frequently be x_o itself, and then c_S(x_o ± h·e_i) → y_o and, according to Equation (43), ẑ(x_o) → y_o, confirming the trend to overfitting.
Overfitting is caused by the incorporation of noise into the model. If the point being estimated is not included in the calculation, the points used to obtain the values of c_S(x_o ± h·e_i) have a greater probability of presenting independent noise contributions, thus diminishing the overfitting. The model calculated in this form is called restricted, and it corresponds to performing a cross-validation for each experimental point where the test set is formed by the point itself.
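The contrast between the full and restricted models can be sketched as follows: on a regular grid sample of a smooth function with noise injected at a single point, the full model with small h and ω chases that point's own noise, while the restricted (leave-one-out) model cannot. The kernel choice and function names are illustrative assumptions.

```python
import numpy as np

def weighted_average(s, X, y, omega):
    """Weighted average with an exponential radial kernel (assumed choice)."""
    w = np.exp(-np.linalg.norm(X - s, axis=1) / omega)
    return np.dot(w, y) / np.sum(w)

def octahedric_point(x0, X, y, h, omega):
    """Octahedric estimate at x0: mean of weighted averages at x0 +/- h*e_i."""
    d = x0.size
    vals = []
    for i in range(d):
        for sign in (+1.0, -1.0):
            e = np.zeros(d); e[i] = sign * h
            vals.append(weighted_average(x0 + e, X, y, omega))
    return float(np.mean(vals))

# Regular grid sample of f(x1, x2) = x1 + x2, with noise injected at one point.
g = np.linspace(0.0, 1.0, 13)
X = np.array([(a, b) for a in g for b in g])
y = X.sum(axis=1)
r = 84                        # index of the point (0.5, 0.5)
y[r] += 0.5                   # simulated experimental error at that point

full = octahedric_point(X[r], X, y, h=0.02, omega=0.01)
mask = np.arange(len(y)) != r
restricted = octahedric_point(X[r], X[mask], y[mask], h=0.02, omega=0.01)
# The full model reproduces the noisy sample; the restricted model, relying
# only on the neighbours, stays near the noise-free value f(0.5, 0.5) = 1.0.
print(round(abs(full - y[r]), 3), round(abs(restricted - y[r]), 3))
```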

Results
The fit of a model can be measured using different parameters. If y_i and ŷ_i are the observed and estimated values:

1. Mean squared error: MSE = (1/P) · Σ_{i=1}^{P} (y_i − ŷ_i)²
2. Mean absolute error: MAE = (1/P) · Σ_{i=1}^{P} |y_i − ŷ_i|
3. Mean absolute percentage error: MAPE = (100/P) · Σ_{i=1}^{P} |(y_i − ŷ_i)/y_i|
4. Regression error characteristic (REC) curve: a curve obtained by plotting the error tolerance on the X-axis versus the percentage of points predicted within the tolerance on the Y-axis.
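The four measures above can be implemented in a few lines; `rec_curve` returns the (tolerance, accuracy) pairs that trace the REC curve (function names are illustrative).

```python
import numpy as np

def mse(y, yhat):
    return float(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2))

def mae(y, yhat):
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(yhat))))

def mape(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(100.0 * np.mean(np.abs((y - yhat) / y)))

def rec_curve(y, yhat, tolerances):
    """Percentage of points whose absolute error is within each tolerance."""
    err = np.abs(np.asarray(y) - np.asarray(yhat))
    return [(t, 100.0 * float(np.mean(err <= t))) for t in tolerances]

y    = [10.0, 20.0, 30.0, 40.0]
yhat = [11.0, 19.0, 33.0, 40.0]
print(mae(y, yhat))                      # (1 + 1 + 3 + 0) / 4 = 1.25
print(mse(y, yhat))                      # (1 + 1 + 9 + 0) / 4 = 2.75
print(rec_curve(y, yhat, [0.0, 1.0, 3.0]))
```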
As a case study, the energy efficiency dataset [8] from the UCI Machine Learning Repository [7] has been selected. The data correspond to the input values of two simulations generated with the software Ecotect [58] from Autodesk (San Rafael, CA, USA), representing the heating and cooling loads necessary to achieve comfortable indoor conditions. The dataset variables are presented in Table 2. The studied buildings are generated using 18 cubes with a side length of 3.5 m, forming 12 different building shapes with equal volume but different areas and dimensions. Each side can act as a different element (wall, floor, roof, or window) with different thermodynamic properties. The combinations of values allowed in the model can be consulted in more detail in reference [8]. As a result of the process, 768 buildings are simulated, and the results for both simulations (heating and cooling cases) are presented in the dataset.
The quality parameters obtained for the learning techniques used in reference [8] are shown in Table 3. Iteratively reweighted least squares (IRLS) is a method used to diminish the effect of outliers in classical linear regression [59], while RF stands for random forest. Given the large number of studies based on this dataset, the median and minimum values for each applied machine learning algorithm are shown in Table 4, where bold text represents the best case for each quality parameter.

Initial Dataset Variables
The variables in reference [8] have been used in previous studies, but some considerations can be made about them prior to their inclusion in the proposed model. A more detailed analysis will be carried out in Section 3.1.2. The relationships between the variables can be seen in Figure 1. With respect to the dependent variable, the relationship between the heating load and the independent descriptors can be seen in Figure 2. An analysis of the data using octahedric regression with different values of the h and ω parameters gives the results shown in Figure 3. The selection of the parameters h and ω can be done from the results shown in Figure 3. However, the total process can be accelerated by diminishing the number of models to evaluate using a simple relationship between the parameters ω and h, namely ω = h. This model is called the normalized octahedric regression; it increases the errors by a small amount with respect to the best case of free selection of ω and h, but the model selection is easier, as can be seen in Figure 4. As was previously observed from Equation (61), the full model tends to overfit when h → 0, while restricted models tend to overestimate the stochastic errors. For these reasons, a good indicator of the model behavior is the mixed model, defined as a combination of the full and restricted estimates. Following the behavior of the mixed model in Figure 4, the selection of ω = h = 0.033 seems a logical option, corresponding to a pseudo-complexity of 30. Figure 5 shows the estimated versus the experimental values.
Figure 6 shows the error over each independent variable. The REC curve for the model is shown in Figure 7.

Reduced Models-Separated Models by Number of Floors
A detailed revision of the model obtained in Section 3.1.1 shows two points that should be considered in more detail. First, the plots of Figure 1a-d,h-j,n,o,s,w-y show some kind of relationship between the involved variables. Considering that the horizontal interfaces of the buildings are of floor or roof type (and given that the building unit is a cube, both variables must have the same value, summarized by Floor), the total surface can be written as in Equation (63). Also, the relative compactness is defined as in reference [60], taking the cube as the elemental compactness measure (Equation (64)). Equations (63) and (64) introduce two constraints that, together with the constant-volume condition imposed by the algorithm of design of the buildings, reduce by two the number of independent or basic variables related to the building's geometry to be considered. In this study, these two variables will be the wall and floor areas. Moreover, the overall height can be eliminated from the variable set because, given that all the basic cubic elements have one lower side, if the floor surface equals 18 · 12.25 = 220.5 m², the overall height must be 3.5 m. Selection of variables has been done, for example, in [28], where different models are compared for the variable sets (roof area, overall height), (relative compactness, roof area, overall height), (relative compactness, surface area, wall area, roof area, overall height, glazing area), and the full dataset. In reference [18], the considered variables are (surface area, wall area, roof area, overall height, glazing area), while reference [35] uses (relative compactness, surface area, wall area, overall height, glazing area) to model the cooling problem. The importance of each variable is studied in [9] using ANOVA, with the result of (relative compactness, surface area, wall area, overall height, glazing area) and (relative compactness, wall area, roof area, overall height, glazing area) as the most important variable sets for the heating and cooling problems, respectively.
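The geometric constraints discussed above can be checked with simple arithmetic: 18 cubic cells of side 3.5 m give a constant volume, and the one-floor layout fixes the floor (and roof) area at 18 · 12.25 = 220.5 m².

```python
side = 3.5
cells = 18

volume = cells * side ** 3                   # constant for all 12 building forms
one_floor_area = cells * side ** 2           # 220.5 m^2, overall height 3.5 m
two_floor_area = (cells // 2) * side ** 2    # 110.25 m^2, overall height 7.0 m

print(volume, one_floor_area, two_floor_area)  # 771.75 220.5 110.25
```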
However, any extension to a more general case should consider whether any of the constraints remain or can be ignored, introducing additional independent descriptors. So, the reduced set of variables considered in the modelling with the new methodology is that presented in Table 5. A second detail to consider is the results of Figure 5, especially plot (e), which shows different behavior for the two values of the height, physically corresponding to buildings with one and two floors.
Moreover, from Figure 2, plots (d) and (e) show two groups for the values of the dependent variable, this behaviour being more remarkable in the case of the variable height, corresponding to buildings with one and two floors.
So, a separate study of two different models, called 1_floor and 2_floors, defined by the number of floors, will be carried out (Table 6). In the case of one-floor buildings, the roof area corresponds to 220.5 m², as was noted previously, so this variable can also be omitted. The results for the 1_floor normalised model are shown in Figure 8 for different values of the h parameter.
The different behaviors of the relative error and the mean absolute error (MAE) recommend the use of an intermediate value, h = 0.02, for the selected model, whose results are shown in Figure 9.
The 2_floors model results can be seen in Figure 10. A selection of h = 0.02 gives as results the models shown in Figure 11. To compare with the joint dataset introduced in Table 5, a group of models has been generated to observe their behavior.
A selection of h = 0.04 gives the minimum error value for the models represented in Figure 12. The summary of the resulting model is shown in Figure 13. A summary of the mean absolute error (MAE), mean square error (MSE), and mean absolute percentage error (MAPE) obtained by the octahedric regression models for the heating load model is shown in Table 7.

Cooling Load Models
A similar study to the heating load case can be done for the cooling load problem.

Complete and Reduced Datasets
Including the constraints introduced by Equations (63) and (64), the reduced variable model introduced in Table 5 is applied to the cooling load problem.
The models for the complete and reduced variable datasets are shown in Figure 14.

Separated Models by Number of Floors
An analysis of errors for the complete problem is shown in Figure 16. Following the reasoning in Section 3.1.2, the models defined in Table 6 can be studied for the cooling load case, as shown in Figure 17. The selection of the optimum value of the h parameter could be h = 0.02 in both cases, obtaining the corresponding models shown in Figure 18. A summary of the available models for the cooling load problem can be seen in Table 8.

Discussion
The proposed methodology has been compared with other machine learning methods applied to the estimation of energy loads in buildings from the dataset of reference [8]. In the heating load problem, the results for the complete dataset are modest, both in the complete and in the reduced variables cases. For the separate study of buildings depending on their number of floors, the result for one floor is in line with the median values for the different ML techniques (see Table 4). However, the results for the two-floor buildings are again modest. The results are, in general, better for the MAPE than for the MAE indicator.
In the case of the cooling load problem, the results are slightly worse, something that also happens with most of the ML methods presented in Table 4. However, this worsening is relatively minor, with results in general in line with the median of the techniques.
The effect of the additional variables on the results can be seen in Tables 7 and 8, where the quality parameters are better in all the categories for the complete models than for the reduced ones. Using mutually correlated variables can deteriorate the performance of ML methods, and detecting and treating the associated overfitting is a difficult problem. The proposed methodology shows a relative stability against the effect of these spurious predictors.
Also, an interesting characteristic of the method is the capability of estimating the overfitting at a low computational cost through the full and restricted models. A compromise between both values is taken as the output of the predictive process. However, more in-depth research must be done in order to adequately evaluate the behaviour and fitness of the octahedric regression technique. For example, the characteristics of the computational algorithm make it an ideal candidate for parallel computing versions. Measurements of the speedups obtained using some benchmarking datasets will be one of the future developments.
Another promising characteristic of the methodology is that it is derived from the geometrical properties of the octahedron. The spatial distribution of the symmetry axes allows studying the effect of different orientations on the partial calculation of the estimation. The way this property can be exploited to select the principal components of the dataset is another field of research.
With respect to the problem of energy efficiency of buildings, two main lines of study open from this paper. First, the existence of different models beyond those defined by the considered number of floors could be deduced from the behaviour of the error in Figures 5 and 13. The origin and causes of this different behaviour should be studied in more depth. Related to this problem, the second aspect to consider is the existence of additional variables, not among those in Table 5, that could affect the way the software Ecotect calculates the heating and cooling loads and are not considered in the present study.
g. Problems: Computational time of order O(d · 2^d), with a power dependence on the dimension.

Definition 4. Given a point x_o ∈ Ω and a set of Q d-dimensional vectors {ζ_r}, r = 1, ..., Q, with Q ≥ d = dim(Ω), the set of points obtained by the combinations {x_o + ζ_r} ⊂ Ω is called a support for interpolation around x_o.

Figure 3. Results for the heating load model depending on the h-parameter. The values of the ω parameter are shown in the legend. Continuous lines represent full models, while discontinuous lines correspond to restricted or partial models: (a) MAPE; (b) R².

Figure 4. Results of the heating load problem for the full, partial, and mixed models with the parameter selection h = ω. (a) Mean absolute percentage error (MAPE) depending on h; (b) mean absolute error (MAE) depending on h.

Figure 5. Results of the heating load problem for the full and restricted models with the parameter selection h = ω = 0.033. (a) Estimated vs. experimental values; (b) estimated vs. experimental values (sorted).

Figure 7. Regression error characteristic (REC) curve for the heating load full, restricted, and null (mean) models with h = ω = 0.033.

Figure 8. Results of the heating load 1_floor problem for the full, restricted, and mixed models with the parameter selection h = ω. (a) MAPE depending on h; (b) MAE depending on h.

Figure 9. Results of the heating load 1_floor problem for the full and restricted models with the parameter selection h = ω = 0.02. (a) Estimated vs. experimental values; (b) REC curve for the full, restricted, and null (mean) models.

Figure 10. Results of the heating load 2_floors problem for the full, restricted, and mixed models with the parameter selection h = ω. (a) MAPE depending on h; (b) MAE depending on h.

Figure 11. Results of the heating load 2_floors problem for the full and restricted models with the parameter selection h = ω = 0.02. (a) Estimated vs. experimental values; (b) REC curve for the full, restricted, and null (mean) models.

Figure 12. Results of the heating load (reduced variable set) problem for the full, restricted, and mixed models with the parameter selection h = ω. (a) MAPE depending on h; (b) MAE depending on h.

Figure 13. Results of the heating load (reduced variable set) problem for the full and restricted models with the parameter selection h = ω = 0.04. (a) Estimated vs. experimental values; (b) REC curve for the full, restricted, and null (mean) models.

Figure 14. Results of the cooling load problem for the complete and reduced variable dataset models with the parameter selection h = ω. (a) MAPE depending on h for the model with 8 variables; (b) MAE depending on h for the model with 8 variables; (c) MAPE depending on h for the model with 5 variables; (d) MAE depending on h for the model with 5 variables. The results are very similar: comparing Figure 14a,b, h = 0.033 seems a good selection for the complete dataset model, and Figure 14c,d also seems to recommend the value h = 0.033 for the reduced variable dataset. The results for both models are shown in Figure 15.

Figure 15. Results of the cooling load problem. (a) Estimated vs. experimental values for the full and restricted models for the eight-variable dataset with parameter selection h = ω = 0.033; (b) REC curve for the full, restricted, and null (mean) models for the eight-variable dataset with h = ω = 0.033; (c) estimated vs. experimental values for the model with five variables and h = ω = 0.033; (d) REC curve for the model with five variables and h = ω = 0.033.

Figure 16. Errors over the independent variables for the cooling problem including all the variables. (a) Relative compactness; (b) surface area; (c) wall area; (d) roof area; (e) overall height; (f) orientation; (g) glazing area; (h) glazing area distribution.

Figure 17. Results of the cooling load problem for the complete and reduced variable dataset models with the parameter selection h = ω. (a) MAPE depending on h for the one-floor model with four variables; (b) MAE depending on h for the one-floor model; (c) MAPE depending on h for the two-floors model with five variables; (d) MAE depending on h for the two-floors model.

Figure 18. Results of the cooling load reduced problem for the full and restricted models with the parameter selection h = ω = 0.02. (a) Estimated vs. experimental values for the 1_floor model with five variables; (b) REC curve for the full, restricted, and null (mean) 1_floor model; (c) estimated vs. experimental values for the 2_floors model; (d) REC curve for the full, restricted, and null (mean) 2_floors model.
Determining a model y = f(x_1, …, x_d) that has an unknown expression is equivalent to determining a function z(x_1, …, x_d), defined in terms of an algorithm for its calculation, that results in a minimum of some kind of global error E. This error is obtained as a function H of the individual errors e^[k], k = 1, 2, …, P, which come from measurements of a system determined by the relationship y → e:

E = H(e^[1], e^[2], …, e^[P])
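The quality measures reported in Tables 3 and onward (MAE, MSE, MAPE) are all particular choices of the aggregation function H above. A minimal sketch, with the dispatch on a string argument being an illustrative convenience rather than anything from the paper:

```python
import numpy as np

def global_error(y_true, y_pred, H="MAE"):
    """Global error E = H(e[1], ..., e[P]) built from the
    individual errors e[k] = y_true[k] - y_pred[k]."""
    e = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    if H == "MAE":
        return float(np.mean(np.abs(e)))
    if H == "MSE":
        return float(np.mean(e ** 2))
    if H == "MAPE":
        return float(np.mean(np.abs(e / np.asarray(y_true))) * 100)
    raise ValueError(f"unknown aggregation {H!r}")

y_true = [20.0, 25.0, 30.0]
y_pred = [22.0, 24.0, 33.0]
print(global_error(y_true, y_pred, "MAE"))   # 2.0
print(global_error(y_true, y_pred, "MAPE"))  # 8.0
```

Choosing H fixes which deviations are penalised most: MSE emphasises large individual errors, while MAPE weights errors relative to the magnitude of the measured load.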

Table 2. Input and output variables for the energy efficiency problem.

Table 3. Parameters of quality for models of the energy efficiency dataset in reference [8]. MAE: mean absolute error; MSE: mean square error; MAPE: mean absolute percentage error.

Table 4. Median and minimum MAE and MAPE for different machine learning techniques. Bold values represent the smallest value of each model's quality parameter.

Table 5. Input and output variables considered for the modelling of the energy efficiency problem with octahedric regression.

Table 6. Input variables for the 1_floor and 2_floors models of the heating load problem.

Table 7. Quality parameters for the models of the heating load problem.

Table 8. Quality parameters for the mixed and restricted models of the cooling load problem.