Numerical Non-Linear Modelling Algorithm Using Radial Kernels on Local Mesh Support

Estimation problems are frequent in several fields such as engineering, economics, and physics, etc. Linear and non-linear regression are powerful techniques based on optimizing an error defined over a dataset. Although they have a strong theoretical background, the need of supposing an analytical expression sometimes makes them impractical. Consequently, a group of other approaches and methodologies are available, from neural networks to random forest, etc. This work presents a new methodology to increase the number of available numerical techniques and corresponds to a natural evolution of the previous algorithms for regression based on finite elements developed by the authors improving the computational behavior and allowing the study of problems with a greater number of points. It possesses an interesting characteristic: Its direct and clear geometrical meaning. The modelling problem is presented from the point of view of the statistical analysis of the data noise considered as a random field. The goodness of fit of the generated models has been tested and compared with some other methodologies validating the results with some experimental campaigns obtained from bibliography in the engineering field, showing good approximation. In addition, a small variation on the data estimation algorithm allows studying overfitting in a model, that it is a problematic fact when numerical methods are used to model experimental values.


Introduction
Obtaining a numerical or analytical representation of a given unknown relationship is a frequent problem in the study and modelling of systems, with application in a wide variety of fields ranging from experimental and social sciences to engineering. In addition, in most cases, the only information available is a set of experimental data from the variables that define the relationship. In other words, without loss of generality, given y = f (x 1 , . . . , x d ) (1) with → x = (x 1 , . . . , x d ) ∈ Ω ⊂ R d (2) usually only one set of experimental data is known for the variables involved in the relationship (1): Mathematics 2020, 8 These values are usually affected by several factors that induce variations between the real and the observed quantities. That is, for any k: If Ω is a bounded domain of R d , and f ∈ C 1 , developing the function f ( (4): where Defining a new value e [k] that includes all perturbations: Therefore, Expression (7) is frequently assumed implying that y is the only variable affected by the error, while the variables correspond to the exact values. It is also frequent to include some considerations on the distribution of probability that generates the values of the perturbation, especially the assumption of normality.
Sometimes the form of the relations is known as occurs in linear regression where the equation corresponding to the studied system is: The problem to determine the values of lambda parameters, that give the best fit between the linear expression and the data, is now reduced minimizing a previously defined error function.
In real cases, the effects of non-linearity dominate the dynamics of the system and far from the equilibrium points linear relations as in (8) are not useful. In the cases when the relation is not linear but known: y = z(x 1 , x 2 , . . . ., x d , . . . .., λ 1 , λ 2 , . . . .., λ Q ) The problem consists in the calculation of the unknown parameters (λ 1 , λ 2 , . . . .., λ Q ) using the data.
In (8) and (9), determining the values of the parameters of the model can be done by minimizing an error function defined over the parameters. One common method is to consider the mean square error (MSE) of the estimated values on the experimental points.
However, in real practice, the expression for the relationship is unknown and the problem is to obtain an analytical expression or a numerical approximation to the relation (1).
The finite element method (FEM) is a numerical method used initially for solving structural problems in civil and aeronautical engineering that is applied in a wide variety of fields, see [1,2]. This method allows finding approximate solutions to differential equations and boundary problems in partial differential equations. Given a differential equation defined through a differential operator, D: (11) where f , v ∈ V with V a function space the finite elements method replaces V by a finite dimensional subspace V h ⊂ V whose elements are continuous piecewise polynomial functions of degree K (this is a Sobolev Space). In addition, the domain of the problem, Ω is divided in N e parts called elements Ω i .
The original Equation (11) changes now to: where f h, v h V h . If dimV h = Q, taking a basis of V h : the solution can be approximated as: The values of u i must be determined to get a good approximation for the solution to the differential equation.
The basis (14) is selected considering the behaviour of their functions in a set of Q points called nodes (ζ 1 , ζ 2 , . . . .., ζ Q ) that satisfy the following conditions: In addition, an important property is that ϕ i (x) = 0 if the node ζ i does not form part of the element containing the point x.
The functions of the basis are named shape functions and they are used to interpolate in points different from the nodes.
The selection of the nodes is a crucial point for the precision of the results. The operation that construct the set of nodes and elements is called meshing and is determined by the set of elements and ). The resulting mesh can be locally characterized by a parameter related with the size of the corresponding local element and the error of the approximated solution in the element.
The authors in previous papers, using different methodologies, have studied regression methods based on the FEM. In [3], the regression algorithm is based on a least squares approach comparing experimental data and the model generated using FEM. In [4], the model is built from a physical analogy that solves some problems on the linear system. One approach more on the line of the original FEM can be seen in [5] and applied by an example in [6,7]. Although different modifications of the initial algorithm have been developed, see [8] in order to improve the computing speeds for the regression model, the basic methodology can be resumed as follows.
The numerical regression model for the relation consists of the set formed by the nodal values {u i }, which is called a representation model for the system. Then, the value of the relationship in any point can be estimated using the expression: By (16), the sum is restricted to the nodes that determine the element which contains point x. For simplicity in the subsequent calculations the authors have normalised the data in the domain, [0, 1] d , and used a mesh formed by regular hyper-cubic elements with edge length h. Introducing the complexity c as the number of elements in each dimension, h = 1/c, the total number of elements is c d and the total number of nodes is (c + 1) d .
The key point in FEM-based methods is the determination of the values {u i } considering the definition of an error function and its minimization. That problem is equivalent to solve a linear system of size, (c + 1) d . The results and precision have been checked with good performance for low dimensions, but there is a practical problem when the dimension of the problem becomes greater because of the order of the linear system solution's algorithms.
The problem with this approach is the conversion of a local problem, which is the determination of the function value in a point x o , into a global question solving a system to obtain the (c + 1) d nodal values.
As a conclusion, the direct geometric interpretation of FEM regression methods makes them very interesting compared to other approaches. The techniques related with finite elements are equivalent to the determination of the tangent subspace of the manifold defined by y = f (x) at the objective point x 0 . The natural evolution of the methods previously considered in the introduction is the restriction of the calculations to the boundary of x 0 , determined by the underlying FEM methodology. The only requirement is the determination of estimations on nodal points. These estimations can be as simple as the k-nearest neighbor (k-NN). However, k-NN presents the problem of determining the optimal value of k, to include as much representative values as possible, minimizing the influence of outliers. This can be solved by introducing the distance-based nearest neighbor.
To do this, the proposed algorithm is based on radial basis functions (and radial basis function networks) and the results on random fields, given that the regression problem on experimental data is equivalent to removing the influence of this stochastic term over the samples generated from the relationship under study.
After the introduction in Section 1, the paper is organised as follows. Section 2.1 considers the main characteristics and use of radial basis functions in problems of interpolation and regression, while Section 2.2 presents an introduction and some results on random fields. In Section 3, the method proposed by the authors is developed and applied to different problems. Section 4 corresponds to the conclusions and analysis of future investigations.

Radial Basis Functions
Radial functions are functions, which depend only on the distance between their arguments: These kinds of functions can be used as a basis of functional space (radial basis function), see [9], introducing an interpolation method for any function f on a set of pairs composed by points called centres and the values of the function on that points, (c i , f i ) P i=1 given by: The values of the alpha-coefficients are calculated imposing P conditions given by: This expression can be used to approximate the solution of a differential equation, using by example a Galerking approach [10]. The determination of centres is a central part of the problem to consider, because they are not fixed points as in other methods as finite elements, finite boundaries, etc., so these techniques receive the name of meshless methods. Several works have studied these approaches for different fields (fluid dynamics in [11]) or solving problems related with the corresponding systems as in [12,13].
Another frequent use of radial functions is in the problem of functional where an unknown function f (x) must be approximated from a set of pairs (x k , y k ) P k=1 provided by discrete measurements, where x corresponds to a vector and y are values related to the value of the function in the x-points. In addition, usually there exists errors between y and the value of f (x), that is The direct use of radial basics function (RBF) as an interpolation method is not adequate here because of the need of exact interpolation on the available points. For solving this issue, some neural networks models are developed [14] with the addition of radial functions [15,16] because they are faster compared to the classical sigmoid neurons. This better behaviour is due to the improvements of the algorithm [17] or the determinations of radial centres [18,19] because it is capable of obtaining arbitrarily good approximations to the functions [20,21]. An additional advantage of these radial basis function networks (RBFN) is the more transparent interpretation of the results compared to the case of multilayer perceptron (MLP) with the non-linearities associated with the sigmoidal functions, [19,20].
Compared with other techniques there are different assets depending on the problem, from very good results [22] to other studies affirming its inferior behaviour [23]. Other authors have also cited the need of a greater number of points to obtain similar accuracies to sigmoid networks [20].
In a neural network with n neurons, the output is: Following [24], a necessary and sufficient condition for a function to act as an activation function in a RBFN is not to be an even polynomial. A typical but not the only, see [36], selection for these radial functions is a set of Gaussian functions given by: Compared with other techniques there are different assets depending on the problem, from very good results [22] to other studies affirming its inferior behaviour [23]. Other authors have also cited the need of a greater number of points to obtain similar accuracies to sigmoid networks [20].
In a neural network with n neurons, the output is: Following [24], a necessary and sufficient condition for a function to act as an activation function in a RBFN is not to be an even polynomial. A typical but not the only, see [36], selection for these radial functions is a set of Gaussian functions given by:  The use of these kinds of functions was initially applied in the study of some structures of the cerebral cortex [37].
The output is obtained from the parameters { , , } =1 and several algorithms have been developed to obtain an optimum selection of those parameters (learning process). Some of them use a preliminary algorithm to determine the centres (genetic algorithms [38,39] or classification trees [40] and the widths of each element of the network [19,31]. However, widths are not usually The use of these kinds of functions was initially applied in the study of some structures of the cerebral cortex [37]. The output is obtained from the parameters c i , σ i , f i n i=1 and several algorithms have been developed to obtain an optimum selection of those parameters (learning process). Some of them use a preliminary algorithm to determine the centres c i (genetic algorithms [38,39] or classification trees [40] and the widths σ i of each element of the network [19,31]. However, widths are not usually considered as important as the rest of the parameters, so some authors have focused on the development of algorithms to optimize them [41]. Some more recent papers have used a variety of algorithms to improve the RBFN learning capabilities using hybrid approaches [42], including techniques to reduce the dimensionality as the principal component analysis [43]. Between the modifications of the basic version of RBFN the most developed are the addition of a linear term to improve the behaviour on the regions that are out of the cloud of available points [26]. In addition, the use of a normalising operation defines the normalised radial function neural networks (NRFNN) [44,45], where the network output has the expression: NRBFN present some good properties as the partition of unity, covering the whole input space and greater stability with respect to the selection of the network parameters [46].
For a general study of the families of models related to (23), papers [52,53] can be of interest.

Random Fields
Given a domain Ω, a random field defined on it is a collection of random variables defined on each point of Ω.
Random fields are widely used in problems where local correlations exist on data perturbations as speech recognition, computer vision [54,55], statistical analysis of MEG images [56], and magnetic resonances imaging [57]. Some of these treatments are related with the problem of the determination of a threshold value, which has been generalized, as the determination of the frontier in a binary classification from a set of known points [58]. Additional applications are referenced in [59], etc., including particle physics, agriculture, economics, etc.
An interesting further step is the use of random fields and neural networks as considered in [60]. An important part of the investigations is devoted to specific types of random fields: Gaussian random fields, see [61], Markovian random fields, and conditional random fields used in structured learning problems where relationships exist between the regressed variables in different spatial-temporal points, by example, in the case of temporal data [62].
As pointed in [63], some of the considered problems can be seen as an incomplete data problem where part of the relevant data is hidden and must be estimated. Frequently, this estimation is made using maximum-likelihood related algorithms or using the mean field theory approach.
Following [64], let us consider a set of labelled points , f : V → R that will be used to assign labels to the points in U, under the condition f (x i ) = y i for x i ∈ L. The introduction of a weight function Mathematics 2020, 8, 1600 7 of 27 to preserve 'local smoothness' and the definition of an energy function gives as a result the harmonic property for the minimum energy function calculated from E(f), that is, the value of f at any unlabeled point can be written as an average of the rest of the points In [65], the problem of calculating the regression of a random field from a set of observations is considered. Under the conditions of a finite domain and distribution of the points over that domain, the authors suggest considering the problem as a continuous case.
Other studies of local regression of random fields are considered in [66], where given a realization of a random field (x i , y i ) : i ∈ Z N the problem is the determination of the spatial regression function An approach to the nonparametric regression can be seen in [67,68], where the regression problem is determining the function f defined on a lattice domain and given by the expression y i = f (x i ) + e i . The estimation expression for the function f on the domain [0, 1] d is related with the expression: where K is a kernel function and N is a parameter that determines the number of points in the lattice and the value of the parameter h N . Under some general conditions, the difference between the expected value of f N (x) and the value of the function, These results are very interesting, but they are obtained using a dataset defined on a lattice.

Radial Kernel Weighted Average
Definition 1. A parameterized radial function is a function Φ : R + × R + → R + characterized by a parameter ω that accomplishes the conditions: Following the definition, an example of parameterized radial function is the Gaussian: , is a number implicitly defined as: where Φ(., .) is a parameterized radial function.
Definition 3. The J-th moment of the radial function Φ is defined as: Using this expression, Equation (30) can be written as: Definition 4. Given a domain Ω, the J-th core of width χ (represented as K J

Definition 5.
In the same conditions of Definition 4, the axial core of width χ (represented as AK Ω (ω, χ)), is the subdomain AK Ω (ω, χ) ⊂ Ω where ∀x 0 ∈ AK Ω (ω, χ) and for every i = {1, 2, .., d }: For practical effects, Expression (33) implies considering W J (x) as the constant value W J for points belonging to K J Ω (ω, χ) within the precision given by χ. Whereas, (34) is equivalent to using a zero value for the integral inside AK Ω (ω, χ) with the same consideration with respect to the precision, properties that will be used to obtain an upper bound for the error of the estimation calculated with the algorithm on points belonging to K 0 Ω (ω, χ) ∩ AK Ω (ω, χ).

Interpolation over a Mesh of Finite Elements
Let the decomposition of the domain in N e elements Ω = ∪ N e i=1 Ω i and Q nodes {ζ r } Q r=1 with the associated shape functions N r (x) Q r=1 . Then, the following relations are true: Given a point x o Ω l ⊂ Ω with nodes {ζ r } Q r=1 and a function f (x), the interpolated value of the function in the pointf (x o ) is defined as: is defined as the interpolation of the weighted averages of f on the nodes of Ω l :ẑ This expression is closely related with the approaches presented in (23)-(25) and (27).

Definition 7.
Given a function f (x) defined on a domain Ω and a random field e(x) ∼ N(0, σ(x)) defined on Ω, an experimental realisation of f associated with a sample e S (x) of e(x) is a function y s : Ω → R , given by: Proof. Following Expression (30), the regression of the experimental realisation on each node is: Using (38), the last expression can be written as: Therefore, by (30) and using the definition (31) Under the conditions of Definition 7, E e S (x) = 0, and the expected value of (42) is: Therefore, Proposition 2. Let e(x) be a distribution not correlated for different points of Ω, E e S (x)·e S (y) = σ 2 (x).δ(x − y). Then, the variance of the regression corresponding to the values y s is given by: Proof. Taking (42) to the square and calculating the expected values: That is:

Proposition 3. (Upper bound of the error on inner points). Let f be a function f C
and y S a Gaussian experimental realisation of f . Then: Proof. From Equation (37): If the element containing x o accomplishes that Ω i ⊂ K 0 Ω (ω, χ) up to precision χ, W 0 (x) ≈ W 0 (ζ r ) ≈ W 0 , and using (32) and (41): Developing now f (x) around x = x o , the last expression takes the form: where α is a two-dimensional multi-index. Writing x − x o = x − ζ r + ζ r − x o , the first part of the integral takes the form: If that ζ r ∈ AK Ω (ω, χ) the first integral is approximately zero, while for the second: That is: Taking the absolute value: If M is an upper bound for the norm of the hessian matrix: For the last integral, the expected value is: where π(x, | |) is the probability of an error in the point given by x. In the case when this probability is independent of the point π(x, | |) = π(| |): In the case of a Gaussian distribution of :

Computational Algorithm
A natural selection for the parameter ω of the radial function is the value corresponding to the element width h. In addition, in real cases, complete sets of values of y S (x) are not available, so one must use just a subset of P points. Then, from this subsample, the integrals are calculated using finite sums, so: Now, the finite sample version of (40) is: Following Expression (62), the proposed algorithm can be condensed in the next schema (Algorithm 1): The calculation of radial kernels in lines 9 and 10 for a radial function as that in Equation (29), dimension d implies the sums of the d-coordinate differences r i = x i − x i k . The loop from 7 to 10 iterates over the number of points P, so given that these steps are repeated in the 2 d nodes of the finite element for each objective points, the computational time of the algorithm is t c ∼ O(P 2 ·2 d ·d). The memory complexity corresponds to saving the experimental points, that is, m ∼ O(P). An analysis of the relation between execution times and number of points will be considered in the applications section.
The analysis developed above shows clearly the improvement of the current algorithm compared with those presented in [4,5] where the dominant computational orders were O((c + 1) 3d ) and O((c + 1) d ). Remembering that the complexity is related with the precision of the estimation (without considering overfitting problems), the previous methodologies were highly limited in their application by the dimensionality of the problem. However, this limit disappears in this new approach, with the number of points being the main limiting factor.
Let us consider a normalised dataset. Then, the models are determined by the parameter h = 1 c that represents the length of the edge of the hypercubes that form the discretization of the domain Ω. Each value of c (complexity) determines a unique model, but following (37), when h → 0 , the regression estimation accomplishes that: In that limit,ẑ(x o ) corresponds to a weighted average of the K-nearest points to x o , with K depending on the parameter ω in Φ, which has been fixed as ω = h.
Let us consider the case of an estimation on a point (x 1 0 , x 2 0 , .., x d 0 ), based on the model defined by a dataset (x 1 k , x 2 k , .., x d k ; y k ) P k=0 containing x o . For commodity, also let the dataset be ordered with respect to the distance to x o , with Φ i = Φ( x i − x o , h) the parameterized radial function. The weighted average regression is then obtained as: However, if the estimation is calculated without using the self x o , the restricted weighted average regression c s r * takes the form: Therefore, there exists a relationship between both expressions: If the model dataset has a low density, the sum vanishes compared with the constant terms, obtaining: Joining (63) and (67), in the limit h → 0 , the result of the estimation tends toẑ(x o ) → y S 0 , so there will be a trend to the overfitting as the complexity grows in the full model.
The restricted model obtained from Equation (65) does not incorporate the information about the experimental value, if present, on the objective point, so the model is not using all the available information and it is obviously under fitted.

Applications
To test the algorithm behaviour, several problems have been selected from the UCI Machine Learning Datasets [69] to compare the results with those presented in other papers obtained using different techniques. UCI is a well-known repository and widely used data source for problems related to machine learning, regression, and time series prediction.
The fit of a model can be measured using different parameters, where y i is the observed and y i is the estimated value:

1.
R 2 coefficient: The strict use of R 2 is limited to the linear regression. Out of this case, the value of R 2 can vary in the interval ] −∞, 1]. However, the expression that defines R 2 is related to the other family of quality indicators as Legates-McCabe, Nash and Sutcliffe, and Willmott: Mean squared error (MSE): This indicator is associated to the second moment of the error, incorporating information about its variance and bias. Its principal problem is the overweighting of outliers, so usually, other parameters as MAE are preferred: Root mean squared error (RMSE): Squared root of MSE. Its main advantage, with respect to it, is that it is measured in the same units of the predicted variable. However, it suffers of the same problems with respect to the outlier influence: Mean absolute error (MAE): The best characteristic of MAE compared to RMSE is the equal weight of the deviations on the error. The ratio RMSE/MAE serves as an indication of the outlier presence on the sample:

5.
Mean absolute percentage error (MAPE): This parameter is frequently used by its simple interpretation as a relative error. However, it presents a bias that tends to select the model with smaller forecasts. One option could be the symmetric MAPE, but is not so widely used: Regression error characteristic (REC) curve: It is a curve that obtains plotting the error tolerance on the X-axis versus the percentage of points predicted within the tolerance on the Y-axis.
The proposed methodology is a deterministic algorithm and the obtained model (and in consequence the quality parameters) depends only on the available experimental data, so there is no difference in them between several runnings of the model.

Air foil Self Noise
The study of self-noise is an active topic in aerodynamics with a great number of research papers devoted to its prediction or improving the design of air foils and other structures. The phenomena related to this problem are complex involving vortex, turbulence, and flows in the linear-limit.
The NASA "Air foil Self Noise" dataset [70,71] was obtained from a series of aerodynamic and acoustic tests of two and three-dimensional air foil sections conducted in an anechoic wind tunnel.
The data set comprises different size NACA 0012 air foils (with sizes from 2.5 to 61 cm) at various wind tunnel speeds (up to Mach 0.21) and angles of attack (from 0 to 25.2 • ). The span of the air foil and the observer position were the same in all of the experiments. The measurements were made in a wind tunnel.
The data is composed of 1503 rows with five independent variables to determine the sound pressure level as indicated in Table 1.  Table 2 shows the parameters of quality for the models obtained in previous research using this dataset. The techniques are linear regression (LR), decision tree (DT), support vector machine regression (SVMR), random forest regression (RF), AdaBoost regression (ABR), XGBoost regression (XGBR), several types of neural networks (NN), and Stochastic Gradient Tree Boosting (SGTB). The first step is to study the behaviour of a cross validation using (80% of the data as train and 20% as test sets) and 100 iterations. The result for different values of the complexity is shown in Figure 2.
Apart from the expected result of a smaller error in the train subsets, an important behaviour can be observed in the MAE corresponding to the test subsamples with a stable value further than complexity 90 (with a mean MAE of 1.958). This value corresponds to a complexity close to 30 on the train subsamples.
The next step is the study of the models generated by the full dataset to test if this trend is verified. To do this, for each complexity two estimations have been done for the points of the dataset, corresponding to the full and the restricted estimations introduced in Equations (64) and (65). The results of MAE, MAPE, and R 2 coefficients are shown in Figure 3. [74] NN 0.894 [75] NN 0.943/0.838 (several models) The first step is to study the behaviour of a cross validation using (80% of the data as train and 20% as test sets) and 100 iterations. The result for different values of the complexity is shown in Figure  2.  Apart from the expected result of a smaller error in the train subsets, an important behaviour can be observed in the MAE corresponding to the test subsamples with a stable value further than complexity 90 (with a mean MAE of 1.958). This value corresponds to a complexity close to 30 on the train subsamples.
The next step is the study of the models generated by the full dataset to test if this trend is verified. To do this, for each complexity two estimations have been done for the points of the dataset, corresponding to the full and the restricted estimations introduced in Equations (64) and (65). The results of MAE, MAPE, and R 2 coefficients are shown in Figure 3.   The behaviour is consistent with the one observed in Figure 2, with a stable value in MAE around complexity 100 and MAPE from complexity 80. In addition, it is clear how the full model has diminishing values for MAE and MAPE coefficients when the complexity grows (making evident in that case the existence of overfitting because improvements in the full model are not parallel to improvements in the values of the restricted model).
Let us then study the characteristics of the model with complexity 90. Figure 4 shows the errors as pairs observed/estimated for the full and restricted models for complexity 90.  The behaviour is consistent with the one observed in Figure 2, with a stable value in MAE around complexity 100 and MAPE from complexity 80. In addition, it is clear how the full model has diminishing values for MAE and MAPE coefficients when the complexity grows (making evident in that case the existence of overfitting because improvements in the full model are not parallel to improvements in the values of the restricted model).
Let us then study the characteristics of the model with complexity 90. Figure 4 shows the errors as pairs observed/estimated for the full and restricted models for complexity 90.  The behaviour is consistent with the one observed in Figure 2, with a stable value in MAE around complexity 100 and MAPE from complexity 80. In addition, it is clear how the full model has diminishing values for MAE and MAPE coefficients when the complexity grows (making evident in that case the existence of overfitting because improvements in the full model are not parallel to improvements in the values of the restricted model).
Let us then study the characteristics of the model with complexity 90. Figure 4 shows the errors as pairs observed/estimated for the full and restricted models for complexity 90.   The parameters of quality for this model are shown in Table 3. The results are comparable to those presented in Table 2 corresponding to previous papers. The computation time over 10 iterations has been measured for different complexities, obtaining a result shown on Figure 5. Given the expression of the computational order ( 2 • 2 • ) no dependence on complexity is expected, a result that is ratified by the plot results.  The parameters of quality for this model are shown in Table 3. The results are comparable to those presented in Table 2 corresponding to previous papers. The computation time over 10 iterations has been measured for different complexities, obtaining a result shown on Figure 5. Given the expression of the computational order O(P 2 ·2 d ·d) no dependence on complexity is expected, a result that is ratified by the plot results. To confirm the quadratic dependence on the number of points ( 2 • 2 • ), a study of the computing time for different values of P in the dataset, for a high complexity (in this study complexity 100 has been selected) has been realised. The result for the value of the variable has been obtained, the computing time divided by the square of the number of points is shown in Figure 6. The constant behaviour at a greater number of points of the last figure certifies the hypothesis introduced on the dependence ~2.

Combined Cycle Power
The problem of effective prediction of power output for electric power plants is of great importance in order to optimize the economic profit by the generated megawatt. For a gas turbine, the relationship between the power available under full load conditions and variables as temperature, pressure, relative humidity, and the exhaust vacuum measured from steam turbine, is complex enough to be studied using machine learning methods. Specifically, given the diversity in the values of the different periods (summer and winter) regression and prediction methods based on locality To confirm the quadratic dependence on the number of points O(P 2 ·2 d ·d), a study of the computing time for different values of P in the dataset, for a high complexity (in this study complexity 100 has been selected) has been realised. The result for the value of the variable has been obtained, the computing time divided by the square of the number of points is shown in Figure 6. To confirm the quadratic dependence on the number of points ( 2 • 2 • ), a study of the computing time for different values of P in the dataset, for a high complexity (in this study complexity 100 has been selected) has been realised. The result for the value of the variable has been obtained, the computing time divided by the square of the number of points is shown in Figure 6. The constant behaviour at a greater number of points of the last figure certifies the hypothesis introduced on the dependence ~2.

Combined Cycle Power
The problem of effective prediction of power output for electric power plants is of great importance in order to optimize the economic profit by the generated megawatt. For a gas turbine, the relationship between the power available under full load conditions and variables as temperature, pressure, relative humidity, and the exhaust vacuum measured from steam turbine, is complex enough to be studied using machine learning methods. Specifically, given the diversity in the values of the different periods (summer and winter) regression and prediction methods based on locality The constant behaviour at a greater number of points of the last figure certifies the hypothesis introduced on the dependence t c ∼ P 2 .

Combined Cycle Power
The problem of effective prediction of power output for electric power plants is of great importance in order to optimize the economic profit by the generated megawatt. For a gas turbine, the relationship between the power available under full load conditions and variables as temperature, pressure, relative humidity, and the exhaust vacuum measured from steam turbine, is complex enough to be studied using machine learning methods. Specifically, given the diversity in the values of the different periods (summer and winter) regression and prediction methods based on locality will have a better performance. This makes this dataset a good benchmark for the current methodology.
The dataset contains 9568 data points collected from a combined Cycle Power Plant over six years (2006)(2007)(2008)(2009)(2010)(2011), when the power plant was set to work with a full load. The variables that correspond to the hourly average values are presented in Table 4. For additional details, consider the original papers [76,77]. The models considered in [76] are: (i) Least median square (LMS); (ii) support vector poly kernel regression (SMOReg); (iii) KStar (K *); (iv) bagging REP tree (BREP); (v) model trees rules (M5R); (vi) model trees regression (M5P) and; (vii) reduced error pruning (REP) trees. The results presented in Table 5 for [77] are: (i) k-nearest neighbour (k-NN) smoother + feedforward error backpropagating ANN and; (ii) the best of several combinations of K-means clustering + feedforward error backpropagating ANN. Following the steps done in the first example, the cross validation results can be seen in Figure 7. will have a better performance. This makes this dataset a good benchmark for the current methodology.
The dataset contains 9568 data points collected from a combined Cycle Power Plant over six years (2006)(2007)(2008)(2009)(2010)(2011), when the power plant was set to work with a full load. The variables that correspond to the hourly average values are presented in Table 4.  [76,77]. The models considered in [76] are: (i) Least median square (LMS); (ii) support vector poly kernel regression (SMOReg); (iii) KStar (K *); (iv) bagging REP tree (BREP); (v) model trees rules (M5R); (vi) model trees regression (M5P) and; (vii) reduced error pruning (REP) trees. The results presented in Table 5 for [77] are: (i) k-nearest neighbour (k-NN) smoother + feedforward error backpropagating ANN and; (ii) the best of several combinations of K-means clustering + feedforward error backpropagating ANN. Following the steps done in the first example, the cross validation results can be seen in Figure  7.  Now, the value of the stabilised MAE for the test subsamples (calculated from complexities above 70) is 2.479, corresponding to a complexity of 50 in the train subsample. Running the full and restricted models for complexities between 10 and 200, it can be seen that the stable behaviour for MAE and MAPE starts before that value, and can be fixed around complexity 70. Figure 8 shows the comparative of the quality parameters (MAE, MAPE, and R 2 ) for the full and restricted models on the dataset, for different complexities from 10 to 200. restricted models for complexities between 10 and 200, it can be seen that the stable behaviour for MAE and MAPE starts before that value, and can be fixed around complexity 70. Figure 8 shows the comparative of the quality parameters (MAE, MAPE, and R 2 ) for the full and restricted models on the dataset, for different complexities from 10 to 200. To be conservative with respect to the possibility of overfitting, complexity 60 is selected to run the final model. The corresponding plot of estimated versus experimental values can be seen in Figure 9.  To be conservative with respect to the possibility of overfitting, complexity 60 is selected to run the final model. The corresponding plot of estimated versus experimental values can be seen in Figure  9. The parameters of quality for this model are shown in Table 6. The computation time has been also obtained from 10 iterations for complexities between 10 and 200, showing its independence in Figure 10. The parameters of quality for this model are shown in Table 6. The computation time has been also obtained from 10 iterations for complexities between 10 and 200, showing its independence in Figure 10. for complexity 100 is shown in Figure 11. Again, a representation of t c /P 2 for complexity 100 is shown in Figure 11. The parameters of quality for this model are shown in Table 6. The computation time has been also obtained from 10 iterations for complexities between 10 and 200, showing its independence in Figure 10. for complexity 100 is shown in Figure 11.

Discussion
Several questions must be considered in this step. Firstly, the presented algorithm continues the line of work developed by the authors in previous papers, see [3,4,78], solving the common bottleneck caused by the dependence of the time of computation on the complexity of the model at orders O((c + 1) 3d ) and O((c + 1) d ). Now, the dependence on the dimension has an expression of d·2 d . This represents an important advance, allowing the treatment of a greater number of problems.
Secondly, given that for complexities less than 100, (for the most part of the studied problems this value is usually around 70-80), there exists a direct relationship between the complexity of the model and the precision of the estimation, the independence of the computing time of current algorithm from this parameter is an important result. However, the problem has been now displaced to the number of points, which contribution to the total computing time was hidden in the previous methodologies. Therefore, the two new challenges for future investigations will be the use of more efficient algorithms and data structures to improve the dependence of the number of points and the dimensionality of the problem that is present as d·2 d .
On the other side, the characteristics of the methodology allow a cheap parallel calculation of the estimated values for points belonging to the sample that determines the model in two different ways: The full and the restricted models. The study of their behaviour for different complexities allows the inclusion of a benchmark to capture the overfitting in the model. Overfitting of models is a very common problem that appears where the model tends to treat in-model and out-of-model points in a different way causing the estimations to be useless. Following in this direction, considering the development of algorithms for automatic detection of overfitting, and the selection of optimum complexity will be another pending task for future investigations. Considering that the process of selecting the most adequate complexity for the model is equivalent to optimising the error of a weighted combination of both models, its automation seems to be a feasible problem.
However, at the moment, obtaining an adequate combination of the full and restricted models introduced in 4.3 is a complex problem. This makes the comparisons with other available methodologies diffuse because the quality parameters of the optimal model would be an intermediate case between those corresponding to the full and restricted cases.
Another characteristic that could be considered as a distinctive aspect of the methodology is its 'liquidity', in the sense of the absence of a model as such, that is, the methodology does not generate any additional structure beyond the estimations and the optimum value for the h parameter. Thus, it can also be viewed as a positive aspect.
Finally, the proposed methodology has been used on well-known problems in order to compare the results with those obtained in other papers and methods.
For the Air Foil Self Noise dataset, the quality parameters of the restricted model, considered as the worst prediction are outperformed by decision trees, XGBoost regression, and some models of NN. However, the full model, considered as the most optimistic model (in occasions presenting overfitting) obtain results similar to the best available models.
In the case of Combined Cycle Power dataset, both models (full and restricted) obtain better quality parameters than the methodologies previously considered. This very interesting result should be studied more in depth to understand it, especially in the case of the restricted model.
In addition, a wider use of the regression method must be faced by increasing the number of studied datasets, and comparing the results with other available methods, especially those based on radial basis functions, to compare their performance.