Unbiased Least-Squares Modelling

In this paper we analyze the bias in a general linear least-squares parameter estimation problem when it is caused by deterministic variables that have not been included in the model. We propose a method to substantially reduce this bias, under the hypothesis that some a-priori information on the magnitude of the modelled and unmodelled components of the system response is known. We call this method Unbiased Least-Squares (ULS) parameter estimation and present here its essential properties and some numerical results on an applied example.


Introduction
The well-known least-squares problem [1], very often used to estimate the parameters of a mathematical model, assumes an approximate equivalence between a matrix-vector product Ax on the left and a vector b on the right-hand side: the matrix A is produced by the true model equations evaluated at some operating conditions, the vector x contains the unknown parameters, and the vector b contains measurements corrupted by white, Gaussian noise. This equivalence cannot be satisfied exactly, but the least-squares solution yields a minimum-variance, maximum-likelihood estimate of the parameters x, with a nice geometric interpretation: the resulting predictions Ax are at the minimum Euclidean distance from the measurements b, and the vector of residuals is orthogonal to the subspace of all possible predictions.
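This geometric property can be illustrated with a minimal NumPy sketch (all names and data here are ours, purely for illustration): the least-squares residual is orthogonal to every column of A, i.e., to the whole subspace of possible predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))                 # model matrix, full column rank
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 0.01 * rng.normal(size=20)  # measurements with white noise

# Least-squares estimate: the predictions A @ x_ls are the orthogonal
# projection of b onto the column space of A
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
residual = b - A @ x_ls

# The residual is orthogonal to every column of A
print(np.allclose(A.T @ residual, 0.0))  # True
```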
Unfortunately, each violation of these assumptions in general produces a bias in the estimates. Various modifications have been introduced in the literature to cope with some of them, mainly colored noise on b and/or A due to model error and/or colored measurement noise. The model error is often treated as an additive stochastic term in the model, e.g., in errors-in-variables formulations [2,3], with consequent solution methods such as Total Least-Squares [4] and Extended Least-Squares [5], to cite a few. All these techniques allow the model to be modified to describe, in some sense, the model error.
Here, instead, we assume that the model error depends on deterministic variables in a way that has not been included in the model, i.e., we suppose we use a reduced model of the real system, as is often the case in applications. In this paper we propose a method to cope with the bias in the parameter estimates of the approximate model by exploiting the geometric properties of least-squares and using a small amount of additional a-priori information about the norms of the modelled and un-modelled components of the system response, available with some approximation in most applications. To eliminate the bias in the parameter estimates we perturb the right-hand side without modifying the reduced model, since we assume it describes one part of the true model accurately.

Model Problem
In applied mathematics, physical models are often available: usually rather precise at describing the main phenomena quantitatively, but not satisfactory at the level of detail required by the application at hand. Here we refer to models described by differential equations, with ordinary and/or partial derivatives, commonly used in engineering and the applied sciences. We assume, therefore, that there are two models at hand: a true, unknown model M and an approximate, known model M_a. These models are usually parametric, and they must be tuned to describe a specific physical system, using a-priori knowledge about the application and experimental measurements. Model tuning, and in particular parameter estimation, is usually done with a prediction-error minimization criterion that makes the model response a good approximation of the dynamics shown by the measured variables used in the estimation process. Assuming that the true model M is linear in the parameters to be estimated, the application of this criterion leads to a linear least-squares problem

    min_x ‖A x − f̂‖,    (1)

where, from here on, ‖·‖ is the Euclidean norm, A ∈ R^{m×n} is supposed full rank, rank(A) = n with m ≥ n, x ∈ R^{n×1} contains the parameters, Ax are the model response values, and f̂ is the vector of experimental measurements. Usually the measured data contain noise, i.e., we measure f = f̂ + ε, with ε a certain kind of additive noise (e.g., white Gaussian). Since we are interested here in the algebraic and geometric aspects of the problem, we suppose ε = 0 and set f = f̂. Moreover, we assume ideally that f̂ = A x̂ holds exactly for the true parameter vector x̂. Let us consider also the estimation problem for the approximate model M_a:

    min_x ‖A_a x − f̂‖,    (2)

where A_a ∈ R^{m×n_a}, x ∈ R^{n_a×1}, with n_a < n. The least-squares solution x̄ of (2) satisfies A_a x̄ = P_{A_a}(f) =: f̄, where f̄ is the orthogonal projection of f on the subspace generated by A_a, and the residual A_a x̄ − f is orthogonal to this subspace.
Let us suppose that A_a corresponds to the first n_a columns of A, which means that the approximate model M_a is exactly one part of the true model M, i.e., A = [A_a, A_u], so that the solution x̂ of (1) can be decomposed in two parts,

    x̂ = [x̂_a, x̂_u],    (3)

with f = A_a x̂_a + A_u x̂_u. This means that the model error corresponds to an additive term A_u x̂_u in the estimation problem. Note that the columns of A_a are linearly independent since A is supposed to be of full rank. We do not consider the case in which A_a is rank-deficient, because it would mean that the model is not well parametrized; moreover, some noise in the data is sufficient to determine a full-rank matrix.
For brevity, we will call A the subspace generated by the columns of A, and A_a, A_u the subspaces generated by the columns of A_a, A_u respectively. Note that if A_a and A_u were orthogonal, decomposition (3) would be orthogonal. However, in the following we consider the case in which the two subspaces are not orthogonal, as commonly happens in practice. Oblique projections, even if not as common as orthogonal ones, have a large literature, e.g., [6,7]. Now, it is well known and easy to demonstrate that, when we solve problem (2) and A_u is not orthogonal to A_a, we get a biased solution, i.e., x̄ ≠ x̂_a:

Lemma 1. Given A ∈ R^{m×n} with n ≥ 2 and A = [A_a, A_u], call x̄ the least-squares solution of (2) and x̂ = [x̂_a, x̂_u] the solution of (1) decomposed as in (3). Then x̄ = x̂_a if and only if A_u ⊥ A_a.

Proof. The least-squares problem (2) boils down to finding x̄ such that A_a x̄ = P_{A_a}(f). Let us consider the unique decomposition of f on A_a and A_a⊥ as f = f̄ + f⊥, with f̄ = P_{A_a}(f) and f⊥ = P_{A_a⊥}(f). Call f = f_a + f_u the decomposition on A_a and A_u; hence there exist two vectors x̂_a ∈ R^{n_a}, x̂_u ∈ R^{n−n_a} such that f_a = A_a x̂_a and f_u = A_u x̂_u. If A_u ⊥ A_a, then the two decompositions coincide, hence f̄ = f_a and so x̄ = x̂_a. Otherwise, by the definition of orthogonal projection ([6], third point of the Definition at page 429), it must hold that x̄ ≠ x̂_a.
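The mechanism behind Lemma 1 can be checked numerically. The sketch below (data and names ours, for illustration) fits the reduced model to a response containing an un-modelled term A_u x̂_u: the estimate is biased in general, while it becomes exact if the un-modelled column is first orthogonalized against A_a.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_a, n_u = 50, 2, 1
A_a = rng.normal(size=(m, n_a))          # modelled part of the true model
A_u = rng.normal(size=(m, n_u))          # un-modelled part (not orthogonal to A_a)
x_hat_a = np.array([1.0, 2.0])
x_hat_u = np.array([3.0])
f = A_a @ x_hat_a + A_u @ x_hat_u        # exact true-model response

# Least-squares with the reduced model only: biased unless A_u is orthogonal to A_a
x_bar, *_ = np.linalg.lstsq(A_a, f, rcond=None)
print(x_bar - x_hat_a)                   # generally a nonzero bias

# If A_u is orthogonalized against A_a, the bias disappears
A_u_perp = A_u - A_a @ np.linalg.lstsq(A_a, A_u, rcond=None)[0]
f_perp = A_a @ x_hat_a + A_u_perp @ x_hat_u
x_unb, *_ = np.linalg.lstsq(A_a, f_perp, rcond=None)
print(np.allclose(x_unb, x_hat_a))       # True
```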

Analysis of the Parameter Estimation Error
The aim of this paper is to propose a method that substantially decreases the bias of the solution of the approximate problem (2), using the smallest amount of additional information about the norms of the model error and of the modelled part of the response.
In this section we introduce sufficient conditions to remove the bias and retrieve the true solution in a unique way, as summarized in Lemma 4. Let us start with a definition.

Definition 1 (Intensity Ratio). The intensity ratio I_f between modelled and un-modelled dynamics is defined as

    I_f = ‖A_a x̂_a‖ / ‖A_u x̂_u‖.

In the following we assume that a good approximation of this intensity ratio is available and that its magnitude is sufficiently large, i.e., that we have a rather accurate approximate model. This information about the model error will be used to reduce the bias, as shown in the following sections. Moreover, we will also consider the norm N_f = ‖A_a x̂_a‖ (or, equivalently, the norm ‖A_u x̂_u‖, which can be recovered as N_f / I_f).
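The two a-priori quantities can be computed directly when the true decomposition is known, as in the small synthetic sketch below (data and names ours): an accurate approximate model corresponds to a large intensity ratio.

```python
import numpy as np

rng = np.random.default_rng(2)
A_a = rng.normal(size=(30, 2))     # modelled part
A_u = rng.normal(size=(30, 1))     # un-modelled part
x_a = np.array([1.0, -1.0])
x_u = np.array([0.1])              # small un-modelled contribution

N_f = np.linalg.norm(A_a @ x_a)    # norm of the modelled response
T_f = np.linalg.norm(A_u @ x_u)    # norm of the un-modelled response
I_f = N_f / T_f                    # intensity ratio of Definition 1

# An accurate approximate model gives a large I_f; knowing I_f and N_f
# also fixes the un-modelled norm, since T_f = N_f / I_f
print(I_f > 1.0, np.isclose(N_f / I_f, T_f))
```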

The Case of Exact Knowledge about I f and N f
Here we assume, initially, that the exact values of I_f and N_f are known, i.e.,

    ‖A_a x̂_a‖ = N_f,    ‖A_a x̂_a‖ / ‖A_u x̂_u‖ = I_f.    (4)

This ideal setting is important to understand the problem before moving to more practical assumptions. First of all, let us show a nice geometric property that relates x_a and f_a under a condition like (4).

Lemma 2.
The problem of finding the set of x_a ∈ R^{n_a} that gives constant, prescribed values of I_f and N_f is equivalent to that of finding the set of f_a = A_a x_a ∈ A_a of the decomposition f = f_a + f_u (see the proof of Lemma 1) lying on the intersection of A_a and the boundaries of two balls:

    ‖f_a‖ = N_f,    ‖f − f_a‖ = T_f := N_f / I_f.    (5)

Proof. For every x_a ∈ R^{n_a} it holds ‖A_a x_a‖ = ‖f_a‖ and ‖f − A_a x_a‖ = ‖f − f_a‖, where we used the fact that f = f_a + f_u; imposing N_f = ‖f_a‖ and I_f = ‖f_a‖/‖f_u‖ gives exactly (5).

Given I_f and N_f, we call the feasible set of accurate model responses all the f_a that satisfy relations (5). Now we will see that Lemma 2 allows us to reformulate problem (2) as the problem of finding a feasible f_a that, substituted for f̂ in (2), gives as solution an unbiased estimate of x̂_a. Indeed, it is easy to note that A_a x̂_a belongs to this feasible set. Moreover, since f_a ∈ A_a, we can reduce the dimensionality of the problem and work on the subspace A_a, which has dimension n_a, instead of the global space A of dimension n. To this aim, let us consider the matrix U_a of the SVD decomposition of A_a, A_a = U_a S_a V_a^T. Since f = f̄ + f⊥ with f⊥ orthogonal to A_a, for every f_a ∈ A_a it holds ‖f − f_a‖² = ‖f̄ − f_a‖² + ‖f⊥‖², so the second condition in (5) is equivalent to ‖f̄ − f_a‖ = T̄_f, with T̄_f² := T_f² − ‖f⊥‖² and ‖f⊥‖ computable as the residual norm of problem (2). The vectors f_a, f̄ belong to the subspace A_a, so their coordinate vectors f̃_a = U_a^T f_a and f̃ = U_a^T f̄ in R^{n_a} satisfy f_a = U_a f̃_a and f̄ = U_a f̃. Since U_a has orthonormal columns, norms are preserved: ‖f̃_a‖ = ‖f_a‖ and ‖f̃‖ = ‖f̄‖. Hence we have

    ‖f̃_a‖ = N_f,    ‖f̃ − f̃_a‖ = T̄_f.    (8)

In this way the problem depends only on the dimension n_a of the known subspace, and not on the dimensions m > n_a and n > n_a. From (8) we can deduce the equation of the (n_a − 2)-dimensional boundary of an (n_a − 1)-ball to which the vector f̃_a must belong. In the following we discuss the various cases.
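The dimensionality reduction through the orthonormal factor of the SVD can be sketched as follows (data ours, for illustration): a vector lying in the subspace generated by A_a is fully described by its n_a coordinates in the basis U_a, with the same norm.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n_a = 40, 3
A_a = rng.normal(size=(m, n_a))
x_a = rng.normal(size=n_a)
f_a = A_a @ x_a                          # lies in the subspace spanned by A_a

# Orthonormal basis U_a of that subspace from the thin SVD of A_a
U_a, S_a, VT_a = np.linalg.svd(A_a, full_matrices=False)

# Coordinates of f_a in the basis: only n_a numbers, norm preserved
f_a_tilde = U_a.T @ f_a
print(np.isclose(np.linalg.norm(f_a_tilde), np.linalg.norm(f_a)))  # True
print(np.allclose(U_a @ f_a_tilde, f_a))                           # True: nothing lost
```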
Case n_a = 1.
In this case, we have a unique solution when both conditions on I_f and N_f are imposed. When only one of the two is imposed, two solutions are found, as shown in Figure 1a,c. Figure 1b shows the intensity ratio I_f.

Case n a = 2.
Consider the vectors f̃_a, f̃ ∈ R^{n_a}, n_a = 2, as defined previously; in particular we look for f̃_a = [ξ_1, ξ_2] ∈ R^2. Conditions (8) can be written as

    ξ_1² + ξ_2² = N_f²,    (ξ_1 − f̃(1))² + (ξ_2 − f̃(2))² = T̄_f²,

where T̄_f is the reduced radius (T̄_f² = T_f² − ‖f⊥‖²); subtracting the first equation from the second gives the (n_a − 1) = 1-dimensional affine subspace (line) F. This line has to be intersected with one of the two circumferences above to obtain the feasible vectors f̃_a, as can be seen in Figure 2a and its projection on A_a in Figure 2b. The intersection of the two circumferences can have a different number of solutions depending on the sign of (N_f − ‖f̃‖) − T̄_f. When this value is strictly positive there are zero solutions: this means that the estimates of I_f and N_f are not correct, and we are not interested in this case, because we suppose the two values to be sufficiently well estimated. When the value is strictly negative there are two solutions, which coincide when the value is zero. When there are two solutions, we do not have enough information to determine which of the two is the true one, i.e., the one that gives f_a = A_a x̂_a: we can choose neither the solution with minimum residual nor the vector f_a with minimum angle with f̄, because both solutions share the same values of these two quantities. However, since we suppose the linear system to originate from an input/output system, where the matrix A_a is a function of the input and f contains the measurements of the output, we can perform two tests with different inputs. Since all the solution sets contain the true parameter vector, we can determine the true solution from their intersection, unless the solutions of the two tests coincide. The condition for coincidence is expressed in Lemma 3.
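The two-circle intersection just described can be computed in closed form. The sketch below (function name and data ours) subtracts the two circle equations to obtain the chord line and returns the zero, one, or two feasible points.

```python
import numpy as np

def circle_intersections(c, N_f, T_f):
    """Intersect |p| = N_f with |p - c| = T_f in R^2.

    Subtracting the two circle equations gives the chord line F of the
    text; the feasible points lie on F at distance N_f from the origin.
    """
    d = np.linalg.norm(c)
    a = (N_f**2 - T_f**2 + d**2) / (2 * d)   # distance of F along c
    h2 = N_f**2 - a**2
    if h2 < 0:
        return []                            # inconsistent N_f, T_f estimates
    mid = a * c / d                          # foot of the chord on direction c
    if np.isclose(h2, 0.0):
        return [mid]                         # tangent circles: unique point
    perp = np.array([-c[1], c[0]]) / d       # unit vector along the chord
    h = np.sqrt(h2)
    return [mid + h * perp, mid - h * perp]

# Example: reduced measurement (4, 0), modelled norm 3, un-modelled norm sqrt(5)
pts = circle_intersections(np.array([4.0, 0.0]), 3.0, np.sqrt(5.0))
for p in pts:
    print(p)   # both points satisfy |p| = 3 and |p - (4,0)| = sqrt(5)
```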
Let us call A_a,i ∈ R^{m×n_a} the matrix of test i = 1, 2, to which corresponds a vector f_i. The line in A_a on which the two feasible vectors f_a of test i lie is F_i, and S_i = A†_a,i F_i is the line through the two solution points. To have two tests with non-coincident solutions, we need these two lines S_1, S_2 to have at most one common point, which in the case n_a = 2 is equivalent to S_1 ≠ S_2, i.e., A†_a,1 F_1 ≠ A†_a,2 F_2, i.e., F_1 ≠ A_a,1 A†_a,2 F_2 =: F_12. We represent the lines F_i by means of their orthogonal vector from the origin, f_ort,i = l_ort,i f̄_i / ‖f̄_i‖. We introduce the matrices C_a, C_f, C_fp such that A_a,2 = C_a A_a,1, f_2 = C_f f_1 and f_ort,2 = C_fp f_ort,1.

Lemma 3.
Consider two tests i = 1, 2 from the same system with n_a = 2, with the above notation. Then F_1 = F_12 if and only if C_a = C_fp.

Proof. It holds F_1 = F_12 ⇐⇒ f_ort,1 = f_ort,12 := A_a,1 A†_a,2 f_ort,2, hence we show this second equivalence. We note that l_ort,2 = k_f l_ort,1 and calculate f_ort,12 = A_a,1 A†_a,2 f_ort,2. Calling s_ort,1 the vector such that f_ort,1 = A_a,1 s_ort,1 and using the fact that C_a = C_fp, we obtain f_ort,12 = f_ort,1, hence F_1 = F_12; the converse follows by reversing the same steps.

More generally, for the case n_a ≥ 3, consider the vectors f̃_a, f̃ ∈ R^{n_a} as defined previously; in particular we look for f̃_a = [ξ_1, . . . , ξ_{n_a}] ∈ R^{n_a}. Conditions (8) can be written as

    ‖f̃_a‖² = N_f²,    ‖f̃ − f̃_a‖² = T̄_f²,    (13)

where the two equations describe two (n_a − 1)-spheres, i.e., the boundaries of two n_a-dimensional balls, and T̄_f is the reduced radius (T̄_f² = T_f² − ‖f⊥‖²). Analogously to the case n_a = 2, the intersection of these equations can be empty, one point, or the boundary of an (n_a − 1)-dimensional ball (with the same conditions on the sign of (N_f − ‖f̃‖) − T̄_f). Subtracting the first equation of (13) from the second yields the (n_a − 1)-dimensional affine subspace F on which the boundary of the (n_a − 1)-dimensional ball of the feasible vectors f̃_a lies. In Figure 3a the graphical representation of the decomposition f = f_a + f_u for the case n_a = 3 is shown, and in Figure 3b the solution ellipsoids of 3 tests whose intersection is one point. Figure 4a shows the solution hyperellipsoids of 4 tests whose intersection is one point, in the case n_a = 4. We note that, to obtain one unique solution x_a, we must intersect the solutions of at least two tests. Let us give a more precise idea of what happens in general. Given i = 1, . . . , n_a tests we call, as in the previous case, f_ort,i the vector orthogonal to the (n_a − 1)-dimensional subspace F_i that contains the feasible f_a, and S_i = A†_a,i F_i. We project this subspace on A_a,1 and obtain F_1i = A_a,1 A†_a,i F_i, which we describe through its orthogonal vector f_ort,1i = A_a,1 A†_a,i f_ort,i. If the vectors f_ort,1, f_ort,12, . . . , f_ort,1n_a are linearly independent, then the (n_a − 1)-dimensional subspaces F_1, F_12, . . . , F_1n_a intersect in one point. In Figure 4b an example is shown in which, for n_a = 3, the vectors f_ort,1, f_ort,12, f_ort,13 are not linearly independent. The three solution sets of this example intersect in two points; hence, for n_a = 3, three tests are not always sufficient to determine a unique solution.

Lemma 4. For all n_a > 1, given i = 1, . . . , n_a tests, the condition that the n_a hyperplanes S_i = A†_a,i F_i previously defined have linearly independent normal vectors is sufficient to determine one unique intersection, i.e., one unique solution vector x̂_a that satisfies the system of conditions (4) for each test.
Proof. The intersection of n_a independent hyperplanes in R^{n_a} is a point. Given a test i, the affine subspace S_i = A†_a,i F_i can be written as S_i = {x : n_i^T (x − v_i) = 0}, where n_i is the normal vector of the subspace and v_i its translation with respect to the origin. The conditions on the S_i relative to the n_a tests correspond to a linear system whose i-th row is n_i^T and whose right-hand side has components b_i = n_i^T v_i. This system matrix has full rank because of the linear independence of the vectors n_i, hence the solution of the linear system is unique.
The unique intersection is due to the hypothesis of full column rank of the matrices A_a,i: this condition implies that the pseudoinverses A†_a,i map the subspaces F_i to hyperplanes S_i = A†_a,i F_i without degeneracy. For example, with n_a = 2 (Lemma 3) this condition amounts to considering two tests with non-coincident lines S_1, S_2, i.e., non-coincident F_1, F_12.

The Case of Approximate Knowledge of I f and N f Values
Let us consider N tests and call I_f,i, N_f,i and T_f,i the values defined in Lemma 2, relative to test i. Since the two systems of conditions are equivalent, as shown in Lemma 2, we take into account the one involving N_f,i and T_f,i for its simplicity: the equation on T_f,i represents a hyperellipsoid translated with respect to the origin. In a real application, we can assume to know only an interval containing the true values of I_f and, analogously, an interval for the N_f values. Supposing we know the bounds on I_f and N_f, the bounds on T_f can be easily computed. Let us call these extreme values N_min, N_max, T_min, T_max; we assume that

    N_min ≤ N_f,i ≤ N_max,    T_min ≤ T_f,i ≤ T_max    (15)

always holds for each test i = 1, . . . , N of the considered set.
Condition (4) is now relaxed as follows: the true solution x̂_a satisfies

    N_min ≤ ‖A_a x̂_a‖ ≤ N_max,    T_min ≤ ‖f − A_a x̂_a‖ ≤ T_max.    (16)

In this case, the conditions no longer determine a finite number of points (as condition (4) of Section 3.1 does), but an entire closed region of the space, which may even be disconnected and contains infinitely many possible solutions x different from x̂_a.
In Figure 5 two examples, with n a = 2, of the conditions for a single test are shown: on the left in the case of exact knowledge of the N f ,i and T f ,i values, and on the right with the knowledge of two intervals containing the right values.
Given a single test, the conditions (16) on a point x_a can be easily characterized. We write x_a = ∑_i χ_i v_i, with v_i the columns of the orthogonal matrix V of the SVD decomposition A_a = U S V^T. Since the norm condition ‖f_a‖² = ∑_i (s_i χ_i)² = N_f² holds, we obtain the equation of a hyperellipsoid for x_a:

    ∑_i (s_i χ_i)² = N_f².    (17)

The bounded conditions hence give the region between two hyperellipsoids centered in the origin:

    N_min² ≤ ∑_i (s_i χ_i)² ≤ N_max².    (18)

Analogously, the T_f condition gives the region between two translated hyperellipsoids:

    T_min² ≤ ‖f − A_a x_a‖² ≤ T_max².    (19)

Given a test i, each of the conditions (18) and (19) constrains x̂_a to lie inside a thick hyperellipsoid, i.e., the region between two concentric hyperellipsoids. The intersection of these two conditions for test i is a zero-residual region that we call

    Z_r,i = { x ∈ R^{n_a} : x satisfies (18) and (19) for test i }.    (20)

It is easy to verify that if N_f,i is equal to the assumed bound N_min or N_max (and analogously for T_f,i), the true solution lies on the boundary of Z_r,i. In Figure 5 the two black ellipsoids (left) are the two exact constraints of (14), while the two pairs of concentric ellipsoids (right) are the borders of the thick ellipsoids defined by (16), and the region Z_r,i is the intersection of (18) and (19).

When more tests i = 1, . . . , N are put together, we have to consider the points that belong to the intersection of all these regions Z_r,i, i.e.,

    I_zr = ∩_{i=1}^{N} Z_r,i.    (21)

These points minimize, with zero residual, the following optimization problem:

    min_x ∑_{i=1}^{N} [ max(0, N_min − ‖A_a,i x‖)² + max(0, ‖A_a,i x‖ − N_max)² + max(0, T_min − ‖f_i − A_a,i x‖)² + max(0, ‖f_i − A_a,i x‖ − T_max)² ].    (22)

It is also easy to verify that, if the true solution lies on an edge/vertex of one of the regions Z_r,i, it lies on an edge/vertex of their intersection.
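Membership of a point in the zero-residual region of a single test reduces to two norm checks. Below is a minimal sketch (function name, data and interval widths ours): with valid interval bounds, the true solution is inside the region, while a far-away point is not.

```python
import numpy as np

def in_zero_residual_region(x, A_a, f, N_min, N_max, T_min, T_max):
    """Membership test for the thick-hyperellipsoid region of one test:
    N_min <= ||A_a x|| <= N_max and T_min <= ||f - A_a x|| <= T_max."""
    n = np.linalg.norm(A_a @ x)
    t = np.linalg.norm(f - A_a @ x)
    return N_min <= n <= N_max and T_min <= t <= T_max

rng = np.random.default_rng(4)
A_a = rng.normal(size=(30, 2))
x_true = np.array([1.0, -0.5])
f_u = 0.05 * rng.normal(size=30)          # small un-modelled component
f = A_a @ x_true + f_u

N_f = np.linalg.norm(A_a @ x_true)
T_f = np.linalg.norm(f - A_a @ x_true)    # = ||f_u||

# With 10%-wide interval bounds the true solution lies in the region
print(in_zero_residual_region(x_true, A_a, f,
                              0.9 * N_f, 1.1 * N_f, 0.9 * T_f, 1.1 * T_f))  # True
```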
The intersected region I_zr tends to shrink monotonically in a way that depends on the properties of the added tests. We are interested in studying the conditions that make it reduce to a point, or at least to a small region. A sufficient condition to obtain a point is given in Theorem 1.
Let us first consider the function that, given a point in the space R^{n_a}, returns the squared norm of its image through the matrix A_a:

    g(x) = ‖A_a x‖² = ∑_i s_i² (v_i^T x)²,

where v_i are the columns of V and x = [x(1), x(2), . . . , x(n_a)]. For a thick hyperellipsoid defined through g we then define:
• the Downward Outgoing Gradients as the set of gradients of g at points on the minimum hyperellipsoid: they point inward to the region of the thick hyperellipsoid;
• the Upward Outgoing Gradients as the set of negative gradients of g at points on the maximum hyperellipsoid: they too point inward to the region.

Problem Solution
The theory previously presented allows us to build a solution algorithm that can deal with different levels of a-priori information. We start in Section 4.1 with the ideal case, i.e., exact knowledge of I_f and N_f. Then we generalize to a more practical setting, where we suppose to know an interval that contains the T_f values of all the experiments considered and an interval for the N_f values; hence the estimated solution will satisfy Equations (18) and (19). In this case we describe an algorithm for computing an estimate of the solution, which we test in Section 5 on a toy model.

Exact Knowledge of I f and N f
When the information about I_f and N_f is exact, with the minimum number of experiments indicated in Section 3 we can find the unbiased parameter estimate as the intersection I_zr of the zero-residual sets Z_r,i corresponding to each experiment. In principle this could also be done following the proof of Lemma 4, but the computation of the v_i vectors is quite cumbersome. Since this is an ideal case, we solve it simply by imposing the satisfaction of the various N_f and T_f conditions (Equation (14)) as an optimization problem:

    min_x ∑_{i=1}^{N} [ (‖A_a,i x‖ − N_f,i)² + (‖f_i − A_a,i x‖ − T_f,i)² ].    (30)

The solution of this problem is unique when the tests are in sufficient number and satisfy the conditions of Lemma 4.
This nonlinear least-squares problem can be solved with a general nonlinear optimization algorithm, such as the Gauss-Newton or Levenberg-Marquardt methods [8].
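A minimal sketch of this step with SciPy follows (synthetic data and all names ours; the residual layout is our reading of the N_f/T_f conditions, not the authors' exact code): each test contributes the two residuals ‖A_a x‖ − N_f and ‖f − A_a x‖ − T_f, and Levenberg-Marquardt drives them to zero.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(5)
n_a, m = 2, 30
x_true = np.array([1.5, -0.7])

# Build n_a synthetic tests with different inputs (different matrices)
# and exact a-priori values N_f, T_f for each test
tests = []
for _ in range(n_a):
    A_a = rng.normal(size=(m, n_a))
    f_u = 0.1 * rng.normal(size=m)         # un-modelled component
    f = A_a @ x_true + f_u
    tests.append((A_a, f, np.linalg.norm(A_a @ x_true), np.linalg.norm(f_u)))

def residuals(x):
    # Two conditions per test: ||A_a x|| = N_f and ||f - A_a x|| = T_f
    return np.array([r for A_a, f, N_f, T_f in tests
                     for r in (np.linalg.norm(A_a @ x) - N_f,
                               np.linalg.norm(f - A_a @ x) - T_f)])

# Levenberg-Marquardt from a rough initial guess near the true values
sol = least_squares(residuals, x0=x_true + 0.3, method="lm")
print(sol.x)   # recovers x_true, unlike plain least-squares on (2)
```

Note that plain least-squares on the reduced model alone would be biased here; the norm conditions are what pin down the unbiased point.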

Approximate Knowledge of I f and N f
In practice, as already pointed out in Section 3.2, it is more realistic to know the two intervals that contain all the N_f,i and I_f,i values for each test i. Then we know that the region I_zr also contains the exact unbiased parameter solution x̂_a, which we want at least to approximate. We introduce here the Unbiased Least-Squares (ULS) Algorithm 1 for the computation of this estimate.

Algorithm 1 Unbiased Least-Squares (ULS) estimation
1: while the estimated region diameter is too large do
2:    1) compute a solution with zero residual of problem (22) with a nonlinear least-squares optimization algorithm,
3:    2) estimate the size of the zero-residual region as described below in (31),
4:    3) increment by one the number i_t of tests.
5: end while
6: Accept the final solution if the estimated region diameter is sufficiently small.
In general, the zero-residual region Z r i of each test contains the true point of the parameters vector, while the estimated iterates with the local optimization usually start from a point outside this region and converge to a point on the boundary of the region.
The ULS estimate can converge to the true solution in two cases:
1. the true solution lies on the border of the region I_zr and the estimate reaches the border at that point;
2. the region I_zr shrinks below the required accuracy, or reduces to a point.
The size of the intersection set I zr , of the zero-residual regions Z r i , is estimated in the following way.
Let us define an index, which we call the region shrinkage estimate, as

    S(x_a) = 1 − (1/|P|) ∑_{δ ∈ P} Δ_I_zr(x_a + μ δ),    (31)

where we used μ = 1.5 in the experiments below, P = {δ ∈ R^{n_a} | δ(i) ∈ {−1, 0, 1} ∀ i = 1, . . . , n_a} is the set of probe directions, and Δ_I_zr is the indicator function of the set I_zr.
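A small sketch of this probing index follows (function name and the toy region are ours), under the assumption that the shrinkage estimate is the fraction of perturbed probe points that fall outside the region: a high value means the region is locally small around the probed point.

```python
import numpy as np
from itertools import product

def region_shrinkage(x, inside, mu=1.5):
    """Probe the neighbourhood of x with the perturbation set
    P = {-1, 0, 1}^n_a scaled by mu; the fraction of probes falling
    OUTSIDE the zero-residual region estimates its local shrinkage."""
    n_a = len(x)
    deltas = [np.array(d, dtype=float) for d in product((-1, 0, 1), repeat=n_a)]
    outside = sum(not inside(x + mu * d) for d in deltas)
    return outside / len(deltas)

# Toy region: a disc of radius 2 around the origin (n_a = 2)
inside = lambda p: np.linalg.norm(p) <= 2.0
print(region_shrinkage(np.zeros(2), inside, mu=1.5))  # 4/9: diagonal probes fall outside
```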

Numerical Examples
Let us consider a classical application example, the equations of a DC motor with a mechanical load, where the electrical variables are governed by the following ordinary differential equation:

    L İ(t) = V(t) − R I(t) − K ω(t) + f_u(t),

where I is the motor current, ω the motor angular speed, V the applied voltage, and f_u(t) a possible unmodelled component

    f_u(t) = −m_err cos(n_poles θ(t)),

where n_poles is the number of poles of the motor, i.e., the number of windings or magnets [9], m_err the magnitude of the model error, and θ the angle, given by the system θ̇(t) = ω(t), with ω governed by the mechanical load dynamics. Note that the unknown component f_u of this example can be seen as a difference in potential that is not described by the approximate model. We are interested in the estimation of the parameters [L, K, R]. In our tests the true values were the constants L = 0.0035, K = 0.14, R = 0.53.
We suppose to know the measurements of I and ω at equally spaced times t_0, . . . , t_N with step h, such that t_k = t_0 + kh and t_k+1 = t_k + h. In Figure 7 we see the plots of the motor speed ω and of the unknown component f_u for this experiment. We compute the approximation İ̂(t_k) of the derivative of the current signal with the first-order finite difference formula

    İ̂(t_k) = (I(t_k) − I(t_k−1)) / h,

for t_k = t_1, . . . , t_N, with a step h = 4 × 10^−4. The applied voltage is held constant at the value V(t) = 30.0.
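The differencing step can be sketched as follows (the signal here is a stand-in sinusoid, not the actual motor current): the first-order formula has an O(h) truncation error, visible against the exact derivative.

```python
import numpy as np

h = 4e-4
t = np.arange(0.0, 0.1, h)
I = np.sin(50 * t)                 # stand-in for a measured current signal

# First-order difference of the text: (I(t_k) - I(t_{k-1})) / h
I_dot = (I[1:] - I[:-1]) / h

# Compare with the exact derivative 50 cos(50 t) at t_1, ..., t_N
exact = 50 * np.cos(50 * t[1:])
print(np.max(np.abs(I_dot - exact)))   # O(h) truncation error
```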
To obtain a more accurate estimate, or to allow higher step sizes h, finite differences of higher order can be used, for example the fourth-order backward difference formula

    İ̂(t_k) = (25 I(t_k) − 48 I(t_k−1) + 36 I(t_k−2) − 16 I(t_k−3) + 3 I(t_k−4)) / (12 h).

With the chosen finite difference formula we obtain the discretized equations

    İ̂(t_k) L + I(t_k) R + ω(t_k) K = V(t_k).    (35)

We will show a possible implementation of the method explained in the previous sections, and the results obtained on this toy-model example. The comparison is made against standard least-squares. In particular, we will show that when the information about I_f and N_f is exact, we obtain an exact removal of the bias. In case this information is only approximate, which is common in a real application, we will show how the bias asymptotically disappears as the number of experiments increases.
We build each test by taking Equation (35) for n samples in the range t_1, . . . , t_N, obtaining the linear system

    [ İ̂(t_k+1)  I(t_k+1)  ω(t_k+1) ]                [ V(t_k+1) ]
    [    ⋮          ⋮         ⋮     ]  [L R K]^T  =  [    ⋮     ]
    [ İ̂(t_k+n)  I(t_k+n)  ω(t_k+n) ]                [ V(t_k+n) ]

so that the first matrix in the equation is A_a ∈ R^{n×n_a}, with n_a = 3 the number of parameters to be estimated.
To measure the estimation relative error ê_rel we use the following formula, where x̄_a is the parameter estimate:

    ê_rel = ‖x̄_a − x̂_a‖ / ‖x̂_a‖.

Note that the tests built in the numerical experiments below are simply small chunks of consecutive data, taken from one single simulation for each experiment.
The results have been obtained with a Python code developed by the authors, using NumPy for linear algebra computations and scipy.optimize for the nonlinear least-squares optimization.

Exact Knowledge of I f and N f
As analyzed in Section 4.1, the solution of the minimization problem (30) is computed with a local optimization algorithm.
Here the obtained results show an error ê_rel with an order of magnitude of 10^−7 in every test we made. Note that it is also possible to construct the solution geometrically, with exact results.

Approximate Knowledge of I f and N f
When I_f and N_f are known only approximately, i.e., we know only an interval that contains all the I_f values and an interval that contains all the N_f values, we lose the unique intersection of Lemma 4, which would require only n_a tests. Moreover, with a finite number of tests we cannot in general guarantee that the exact hypotheses of Theorem 1 are satisfied. As a consequence, various issues open up. Let us start by showing, in Figure 8, that when all four conditions of (15) hold with equality, the true solution lies on the boundary of the region I_zr, as already mentioned in Section 3.2. If this happens, then under the conditions of Theorem 1 on the upward/downward outgoing gradients, the region I_zr is a point. When all four conditions of (15) hold with strict inequalities, the true solution lies inside the region I_zr (Figure 8b). From a theoretical point of view this distinction has great importance, since it determines whether the zero-residual region can be reduced to a single point. From a practical point of view it is, for the moment, less important, since we cannot guarantee that the available tests will reduce I_zr exactly to a single point, and most of the time we arrive at an approximate estimate. How accurate this estimate is depends on the specific application, which is out of the scope of the present work. To be more precise, when the conditions of Theorem 1 are not satisfied, there is an entire region of the parameter space which satisfies problem (30) exactly, but only one point of this region is the true solution x̂_a. As more tests are added and intersected, the zero-residual region I_zr tends to shrink, simply because it must satisfy an increasing number of inequalities. In Figure 9 we can see four iterations taken from an example, precisely with 3, 5, 9 and 20 tests intersected and m_err = 19.
With only three tests (Figure 9a) there is a large region I_zr (described by the mesh of small dots), and here the true solution (thick point) and the current estimate (star) happen to lie on opposite sides of the region. With five tests (Figure 9b) the region has shrunk considerably and the estimate is approaching the boundary (in the plot it is still half-way), and even more so with nine tests (Figure 9c). Convergence is reached here before the region collapses to a single point, because the estimate happens to approach the region boundary at the same point where the true solution is located.
In general, the zero-residual region Z_r,i (20) of each test contains the true solution, while the estimate arrives from outside the region and stops when it hits the border of the intersection region I_zr (21). For this reason we have convergence when the region that contains the true solution is reduced to a single point, and the current estimate x̄_a does not lie in a disconnected sub-region of I_zr different from the one in which the true solution lies. Figure 10 shows an example of an intersection region I_zr which is the union of two closed disconnected regions: this case creates a local minimum in problem (30).

Figure 12 synthesizes the main results obtained with this new approach. Globally it shows a great reduction of the bias contained in the standard least-squares estimates; indeed, we had to use a logarithmic scale to highlight the differences in the behaviour of the proposed method while varying m_err. In particular:
• with considerable levels of modelling error, say m_err between 2 and 12, the parameter estimation error ê_rel is at least one order of magnitude smaller than that of least-squares; this is accompanied by high levels of shrinkage of the zero-residual region (Figure 12b);
• with higher levels of m_err, we see a low shrinkage of the zero-residual region and consequently an estimate whose error oscillates strongly, depending on where the optimization algorithm has brought it into contact with the zero-residual region;
• at m_err = 18 we see the presence of a local minimum, due to the zero-residual region breaking into disconnected pieces as in Figure 10: the shrinkage at the true solution is estimated to be very high, while at the estimated solution it is quite low, since it is attached to a disconnected, wider sub-region;
• the shrinking of the zero-residual region is related to the distribution of the outgoing gradients, as stated by Theorem 1: in Figure 12d we see that in the experiment with m_err = 18 they occupy only three of the eight orthants, while in the best results of the other experiments the gradients distribute themselves over almost all orthants (not shown).
It is evident from these results that for lower values of modelling error m err , it is much easier to produce tests that reduce the zero-residual region to a quite small interval of R n a , while for high values of m err it is much more difficult and the region I zr can even fall to pieces, thus creating local minima. It is also evident that a simple estimate of the I zr region size, like (31), can reliably assess the quality of the estimate produced by the approach here proposed, as summarized in Figure 12c.

Conclusions
In this paper we have analyzed the bias commonly arising in parameter estimation problems where the model lacks some deterministic part of the system, and proposed a method to substantially reduce it. This result is useful in applications where an accurate estimation of parameters is important, e.g., in the physical (grey-box) modelling typically arising in the model-based design of multi-physical systems; see, e.g., the motivations the authors experienced in the design of digital twins of controlled systems [10-12] for virtual prototyping, within a very large literature.
At this point, the method should be tested in a variety of applications, since the ULS approach proposed here cannot be applied as a black box, as standard least-squares can: it requires some additional a-priori information. Moreover, since the computational cost of the method presented here is significant, efficient computational methods must be considered and will be a major issue in future investigations.
Another aspect worth deepening is the possibility of designing tests that contribute optimally to the reduction of the zero-residual region.
Author Contributions: Conceptualization, methodology, validation, formal analysis, investigation, software, resources, data curation, writing-original draft preparation, writing-review and editing, visualization: M.G. and F.M.; supervision, project administration, funding acquisition: F.M. All authors have read and agreed to the published version of the manuscript.