
Unbiased Least-Squares Modelling

Marta Gatto and Fabio Marcuzzi

Department of Mathematics, “Tullio Levi Civita”, University of Padova, Via Trieste 63, 35131 Padova, Italy
* Author to whom correspondence should be addressed.
Mathematics 2020, 8(6), 982; https://doi.org/10.3390/math8060982
Submission received: 25 May 2020 / Revised: 9 June 2020 / Accepted: 11 June 2020 / Published: 16 June 2020
(This article belongs to the Special Issue Multivariate Approximation for solving ODE and PDE)

Abstract
In this paper we analyze the bias that arises in a general linear least-squares parameter estimation problem when deterministic variables have been left out of the model. We propose a method that substantially reduces this bias, under the hypothesis that some a-priori information on the magnitude of the modelled and unmodelled components of the system response is known. We call this method Unbiased Least-Squares (ULS) parameter estimation and present here its essential properties and some numerical results on an applied example.

1. Introduction

The well-known least-squares problem [1], very often used to estimate the parameters of a mathematical model, assumes an equivalence between a matrix-vector product $Ax$ on the left and a vector $b$ on the right-hand side: the matrix $A$ is produced by the true model equations evaluated at some operating conditions, the vector $x$ contains the unknown parameters, and the vector $b$ contains measurements corrupted by white Gaussian noise. This equivalence cannot be satisfied exactly, but the least-squares solution yields a minimum-variance, maximum-likelihood estimate of the parameters $x$, with a nice geometric interpretation: the resulting predictions $Ax$ are at the minimum Euclidean distance from the true measurements $b$, and the vector of residuals is orthogonal to the subspace of all possible predictions.
Unfortunately, each violation of these assumptions produces, in general, a bias in the estimates. Various modifications have been introduced in the literature to cope with some of them, mainly colored noise on $b$ and/or $A$ due to model error and/or colored measurement noise. The model error is often treated as an additive stochastic term in the model, e.g., in errors-in-variables formulations [2,3], with consequent solution methods such as Total Least-Squares [4] and Extended Least-Squares [5], to cite a few. All these techniques let the model be modified to describe, in some sense, the model error.
Here, instead, we assume that the model error depends on deterministic variables in a way that has not been included in the model, i.e., we suppose that we use a reduced model of the real system, as is often the case in applications. In this paper we propose a method to cope with the bias in the parameter estimates of the approximate model by exploiting the geometric properties of least-squares and using a small amount of additional a-priori information about the norms of the modelled and un-modelled components of the system response, available with some approximation in most applications. To remove the bias in the parameter estimates we perturb the right-hand side without modifying the reduced model, since we assume it describes accurately one part of the true model.

2. Model Problem

In applied mathematics, physical models are often available, usually rather precise at describing quantitatively the main phenomena, but not satisfactory at the level of detail required by the application at hand. Here we refer to models described by differential equations, with ordinary and/or partial derivatives, commonly used in engineering and applied sciences. We assume, therefore, that there are two models at hand: a true, unknown model $\mathcal{M}$ and an approximate, known model $\mathcal{M}_a$. These models are usually parametric and they must be tuned to describe a specific physical system, using a-priori knowledge about the application and experimental measurements. Model tuning, and in particular parameter estimation, is usually done with a prediction-error minimization criterion that makes the model response a good approximation of the dynamics shown by the measured variables used in the estimation process. Assuming that the true model $\mathcal{M}$ is linear in the parameters that must be estimated, the application of this criterion leads to a linear least-squares problem:
$$\bar{x} = \operatorname*{argmin}_{x \in \mathbb{R}^n} \| A x - \bar{f} \|^2, \qquad (1)$$
where, from here on, $\|\cdot\|$ is the Euclidean norm, $A \in \mathbb{R}^{m \times n}$ is supposed full rank ($\operatorname{rank}(A) = n$, $m \geq n$), $\bar{x} \in \mathbb{R}^{n \times 1}$, $Ax$ are the model response values and $\bar{f}$ is the vector of experimental measurements. Usually the measured data contain noise, i.e., we measure $f = \bar{f} + \epsilon$, with $\epsilon$ a certain kind of additive noise (e.g., white Gaussian). Since we are interested here in the algebraic and geometric aspects of the problem, we suppose $\epsilon = 0$ and set $f = \bar{f}$. Moreover, we assume ideally that $\bar{f} = A\bar{x}$ holds exactly. Let us consider also the estimation problem for the approximate model $\mathcal{M}_a$:
$$x^\parallel = \operatorname*{argmin}_{x \in \mathbb{R}^{n_a}} \| A_a x - \bar{f} \|^2, \qquad (2)$$
where $A_a \in \mathbb{R}^{m \times n_a}$, $x^\parallel \in \mathbb{R}^{n_a \times 1}$, with $n_a < n$. The notation $x^\parallel$ recalls that the least-squares solution satisfies $A_a x^\parallel = P_{\mathcal{A}_a}(f) =: f^\parallel$, where $f^\parallel$ is the orthogonal projection of $f$ on the subspace generated by $A_a$, and the residual $A_a x^\parallel - \bar{f}$ is orthogonal to this subspace. Let us suppose that $A_a$ corresponds to the first $n_a$ columns of $A$, which means that the approximate model $\mathcal{M}_a$ is exactly one part of the true model $\mathcal{M}$, i.e., $A = [A_a, A_u]$, and so the solution $\bar{x}$ of (1) can be decomposed in two parts such that
$$A \bar{x} = [A_a, A_u] \begin{bmatrix} \bar{x}_a \\ \bar{x}_u \end{bmatrix} = A_a \bar{x}_a + A_u \bar{x}_u = \bar{f}. \qquad (3)$$
This means that the model error corresponds to an additive term $A_u \bar{x}_u$ in the estimation problem.
Note that the columns of $A_a$ are linearly independent, since $A$ is supposed to be of full rank. We do not consider the case in which $A_a$ is rank-deficient, because it would mean that the model is not well parametrized. Moreover, some noise in the data is sufficient to make the matrix full rank.
For brevity, we will call $\mathcal{A}$ the subspace generated by the columns of $A$, and $\mathcal{A}_a$, $\mathcal{A}_u$ the subspaces generated by the columns of $A_a$ and $A_u$, respectively. Note that if $\mathcal{A}_a$ and $\mathcal{A}_u$ were orthogonal, decomposition (3) would be orthogonal. However, in the following we will consider the case in which the two subspaces are not orthogonal, as commonly happens in practice. Oblique projections, even if not as common as orthogonal ones, have a large literature, e.g., [6,7].
Now, it is well known and easy to demonstrate that, when we solve problem (2) and $\mathcal{A}_u$ is not orthogonal to $\mathcal{A}_a$, we get a biased solution, i.e., $x^\parallel \neq \bar{x}_a$:
Lemma 1.
Given $A \in \mathbb{R}^{m \times n}$ with $n \geq 2$ and $A = [A_a, A_u]$, and given $b \in \mathbb{R}^{m \times 1} \setminus \mathrm{Im}(A_a)$, call $x^\parallel$ the least-squares solution of (2) and $\bar{x} = [\bar{x}_a, \bar{x}_u]$ the solution of (1), decomposed as in (3). Then
(i) if $\mathcal{A}_u \perp \mathcal{A}_a$ then $x^\parallel = \bar{x}_a$;
(ii) if $\mathcal{A}_u \not\perp \mathcal{A}_a$ then $x^\parallel \neq \bar{x}_a$.
Proof. 
The least-squares problem $A_a x = f$ boils down to finding $x^\parallel$ such that $A_a x^\parallel = P_{\mathcal{A}_a}(f)$. Let us consider the unique decomposition of $f$ on $\mathcal{A}_a$ and $\mathcal{A}_a^\perp$ as $f = f^\parallel + f^\perp$, with $f^\parallel = P_{\mathcal{A}_a}(f)$ and $f^\perp = P_{\mathcal{A}_a^\perp}(f)$. Call $f = f_a + f_u$ the decomposition on $\mathcal{A}_a$ and $\mathcal{A}_u$; hence there exist two vectors $x_a \in \mathbb{R}^{n_a}$, $x_u \in \mathbb{R}^{n - n_a}$ such that $f_a = A_a x_a$ and $f_u = A_u x_u$. If $\mathcal{A}_u \perp \mathcal{A}_a$ then the two decompositions coincide, hence $f^\parallel = f_a$ and so $x^\parallel = \bar{x}_a$. Otherwise, by the definition of orthogonal projection ([6], third point of the definition at page 429), it must hold that $x^\parallel \neq \bar{x}_a$. □
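A minimal NumPy sketch of Lemma 1 on synthetic data (all matrices and values below are hypothetical, chosen only for illustration): fitting the reduced model recovers $\bar{x}_a$ only when $\mathcal{A}_u \perp \mathcal{A}_a$.

import numpy as np

rng = np.random.default_rng(0)
m, n_a = 50, 2
A_a = rng.standard_normal((m, n_a))

# Unmodelled column NOT orthogonal to span(A_a): a biased estimate is expected.
A_u = rng.standard_normal((m, 1))
x_bar_a, x_bar_u = np.array([1.0, -2.0]), np.array([0.5])
f = A_a @ x_bar_a + A_u @ x_bar_u                     # exact data from the true model
x_par, *_ = np.linalg.lstsq(A_a, f, rcond=None)
print("bias, A_u not orthogonal:", x_par - x_bar_a)   # nonzero, case (ii)

# Orthogonal case: remove from A_u its component in span(A_a).
Q, _ = np.linalg.qr(A_a)
A_u_perp = A_u - Q @ (Q.T @ A_u)
f2 = A_a @ x_bar_a + A_u_perp @ x_bar_u
x_par2, *_ = np.linalg.lstsq(A_a, f2, rcond=None)
print("bias, A_u orthogonal:", x_par2 - x_bar_a)      # ~ 0, case (i)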

3. Analysis of the Parameter Estimation Error

The aim of this paper is to propose a method that substantially decreases the bias of the solution of the approximated problem (2), using the smallest amount of additional information about the norms of the model error and of the modelled part of the response.
In this section we will introduce sufficient conditions to remove the bias and retrieve the true solution in a unique way, as summarized in Lemma 4. Let us start with a definition.
Definition 1
(Intensity Ratio). The intensity ratio $I_f$ between modelled and un-modelled dynamics is defined as
$$I_f = \frac{\| A_a x_a \|}{\| A_u x_u \|}.$$
In the following we assume that a good approximation of this intensity ratio is available and that its magnitude is sufficiently big, i.e., we have an approximate model that is quite accurate. This information about the model error will be used to reduce the bias, as shown in the following sections. Moreover, we will also consider the norm $N_f = \| A_a x_a \|$ (or, equivalently, the norm $\| A_u x_u \|$).

3.1. The Case of Exact Knowledge about $I_f$ and $N_f$

Here we assume, initially, to know the exact values of $I_f$ and $N_f$, i.e.,
$$N_f = \bar{N}_f = \| A_a \bar{x}_a \|, \qquad I_f = \bar{I}_f = \frac{\| A_a \bar{x}_a \|}{\| A_u \bar{x}_u \|}. \qquad (4)$$
This ideal setting is important to understand the problem before moving to more practical assumptions. First of all, let us show a nice geometric property that relates $x_a$ and $f_a$ under a condition like (4).
Lemma 2.
The problem of finding the set of $x_a \in \mathbb{R}^{n_a}$ that give constant, prescribed values of $I_f$ and $N_f$ is equivalent to that of finding the set of $f_a = A_a x_a \in \mathcal{A}_a$ of the decomposition $f = f_a + f_u$ (see the proof of Lemma 1) lying on the intersection of $\mathcal{A}_a$ and the boundaries of two $n$-dimensional balls in $\mathbb{R}^n$. In fact, it holds:
$$\begin{cases} N_f = \| A_a x_a \| \\[4pt] I_f = \dfrac{\| A_a x_a \|}{\| A_u x_u \|} \end{cases} \iff \begin{cases} f_a \in \partial B_n(0, N_f) \\[4pt] f_a \in \partial B_n(f^\parallel, T_f) \end{cases} \quad \text{with } T_f := \sqrt{\frac{N_f^2}{I_f^2} - \| f^\perp \|^2}. \qquad (5)$$
Proof. 
For every $x_a \in \mathbb{R}^{n_a}$ it holds
$$N_f = \| f_a \| = \| A_a x_a \|, \qquad I_f = \frac{\| f_a \|}{\| f_u \|} = \frac{N_f}{\sqrt{\| f_u^\perp \|^2 + \| f_u^\parallel \|^2}} = \frac{N_f}{\sqrt{\| f^\perp \|^2 + \| f^\parallel - A_a x_a \|^2}} = \frac{N_f}{\sqrt{\| f^\perp \|^2 + \| f^\parallel - f_a \|^2}}, \qquad (6)$$
which is equivalent to
$$\| f_a \| = N_f, \qquad \| f_a - f^\parallel \| = \sqrt{\frac{N_f^2}{I_f^2} - \| f^\perp \|^2} =: T_f, \qquad (7)$$
where we used the fact that $f_u = f_u^\perp + f_u^\parallel$ with $f_u^\perp := P_{\mathcal{A}_a^\perp}(f_u) = f^\perp$, $f_u^\parallel := P_{\mathcal{A}_a}(f_u) = A_a \delta x_a = f^\parallel - A_a x_a$, and $\delta x_a = (x^\parallel - x_a)$. Hence the equivalence (5) is proved. □
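The identity defining $T_f$ can be checked numerically. A small sketch on synthetic, hypothetical data (NumPy only): the value $\sqrt{N_f^2 / I_f^2 - \|f^\perp\|^2}$ must coincide with $\|f_a - f^\parallel\|$.

import numpy as np

rng = np.random.default_rng(1)
m, n_a, n_u = 40, 3, 2
A_a = rng.standard_normal((m, n_a))
A_u = rng.standard_normal((m, n_u))
x_a, x_u = rng.standard_normal(n_a), 0.1 * rng.standard_normal(n_u)

f_a, f_u = A_a @ x_a, A_u @ x_u
f = f_a + f_u
Q, _ = np.linalg.qr(A_a)              # orthonormal basis of the subspace A_a
f_par = Q @ (Q.T @ f)                 # the projection P_{A_a}(f)
f_perp = f - f_par                    # the orthogonal component

N_f = np.linalg.norm(f_a)
I_f = N_f / np.linalg.norm(f_u)
T_f = np.sqrt(N_f**2 / I_f**2 - np.linalg.norm(f_perp)**2)
print(T_f, np.linalg.norm(f_a - f_par))   # the two numbers coincide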
Given $I_f$ and $N_f$, we call feasible set of accurate model responses the set of all the $f_a$ that satisfy relations (5). Now we will see that Lemma 2 allows us to reformulate problem (2) as the problem of finding a feasible $f_a$ that, substituted for $\bar{f}$ in (2), gives as solution an unbiased estimate of $\bar{x}_a$. Indeed, it is easy to note that $A_a \bar{x}_a$ belongs to this feasible set. Moreover, since $f_a \in \mathcal{A}_a$, we can reduce the dimensionality of the problem and work on the subspace $\mathcal{A}_a$, which has dimension $n_a$, instead of the global space $\mathcal{A}$ of dimension $n$. To this aim, let us consider the matrix $U_a$ of the SVD decomposition of $A_a$, $A_a = U_a S_a V_a^T$, and complete its columns to an orthonormal basis of $\mathbb{R}^m$ to obtain a matrix $U$. Since the vectors $f_a, f^\parallel$ belong to the subspace $\mathcal{A}_a$, the vectors $\tilde{f}_a, \tilde{f}^\parallel$ defined such that $f_a = U \tilde{f}_a$ and $f^\parallel = U \tilde{f}^\parallel$ must have zeros in the last $m - n_a$ components. Since $U$ has orthonormal columns, it preserves norms, so $\| f^\parallel \| = \| \tilde{f}^\parallel \|$ and $\| f_a \| = \| \tilde{f}_a \|$. If we call $\hat{f}_a, \hat{f}^\parallel \in \mathbb{R}^{n_a}$ the first $n_a$ components of the vectors $\tilde{f}_a, \tilde{f}^\parallel$ respectively (which again have the same norms as the full vectors), we have
$$\hat{f}_a \in \partial B_{n_a}(0, N_f), \qquad \hat{f}_a \in \partial B_{n_a}(\hat{f}^\parallel, T_f). \qquad (8)$$
In this way the problem depends only on the dimension $n_a$ of the known subspace and does not depend on the dimensions $m \geq n_a$ and $n > n_a$. From (8) we can deduce the equation of the $(n_a - 2)$-dimensional boundary of an $(n_a - 1)$-ball to which the vector $f_a = A_a x_a$ must belong. In the following we discuss the various cases.

3.1.1. Case $n_a = 1$

In this case, we have one unique solution when both conditions on $I_f$ and $N_f$ are imposed. When only one of the two is imposed, two solutions are found, as shown in Figure 1a,c. Figure 1b shows the intensity ratio $I_f$.

3.1.2. Case $n_a = 2$

Consider the vectors $\hat{f}_a, \hat{f}^\parallel \in \mathbb{R}^{n_a = 2}$ as defined previously; in particular we are looking for $\hat{f}_a = [\xi_1, \xi_2] \in \mathbb{R}^2$. Hence, conditions (8) can be written as
$$\begin{cases} \xi_1^2 + \xi_2^2 = N_f^2 \\[2pt] (\xi_1 - \hat{f}^\parallel_{\xi_1})^2 + (\xi_2 - \hat{f}^\parallel_{\xi_2})^2 = T_f^2 \end{cases} \implies F : \; (\hat{f}^\parallel_{\xi_1})^2 - 2 \hat{f}^\parallel_{\xi_1} \xi_1 + (\hat{f}^\parallel_{\xi_2})^2 - 2 \hat{f}^\parallel_{\xi_2} \xi_2 = T_f^2 - N_f^2, \qquad (9)$$
where the equation on the right defines the $(n_a - 1) = 1$-dimensional subspace (line) $F$, obtained by subtracting the first equation from the second. This subspace has to be intersected with one of the initial circumferences to obtain the feasible vectors $\hat{f}_a$, as can be seen in Figure 2a and its projection on $\mathcal{A}_a$ in Figure 2b. The intersection of the two circumferences (5) can have a different number of solutions depending on the value of $(N_f - \| f^\parallel \|) - T_f$: when this value is strictly positive there are no solutions, which means that the estimates of $I_f$ and $N_f$ are not correct (we are not interested in this case because we suppose the two values to be sufficiently well estimated); when the value is strictly negative there are two solutions, which coincide when the value is zero.
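In the case $n_a = 2$ the feasible vectors can be computed explicitly with elementary geometry. A sketch (function and variable names are ours, with assumed input values) that intersects the circle $\|\hat{f}_a\| = N_f$ with the circle $\|\hat{f}_a - \hat{f}^\parallel\| = T_f$:

import numpy as np

def circle_intersections(f_hat_par, N_f, T_f):
    # Feasible f_hat_a of conditions (8) for n_a = 2 (0, 1 or 2 points).
    d = np.linalg.norm(f_hat_par)            # distance between the two centres
    a = (N_f**2 - T_f**2 + d**2) / (2 * d)   # abscissa of the chord along f_hat_par
    h2 = N_f**2 - a**2                       # squared half-length of the chord
    if h2 < 0:
        return []                            # inconsistent estimates of N_f, T_f
    e = f_hat_par / d                        # unit vector towards f_hat_par
    e_perp = np.array([-e[1], e[0]])         # its rotation by 90 degrees
    return [e * a + np.sqrt(h2) * e_perp, e * a - np.sqrt(h2) * e_perp]

print(circle_intersections(np.array([2.0, 0.0]), 1.5, 1.0))   # two solutions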
When there are two solutions, we do not have sufficient information to determine which of the two is the true one, i.e., the one that gives $f_a = A_a \bar{x}_a$: we cannot choose the one with minimum residual, nor the vector $f_a$ that forms the minimum angle with $f$, because both solutions share the same values of these two quantities. However, since we are supposing the linear system to originate from an input/output system, where the matrix $A_a$ is a function also of the input and $f$ collects the measurements of the output, we can perform two tests with different inputs. Since all the solution sets contain the true parameter vector, we can determine the true solution from their intersection, unless the solutions of the two tests are coincident. The condition for coincidence is expressed in Lemma 3.
Let us call $A_{a,i} \in \mathbb{R}^{m \times n_a}$ the matrix of test $i = 1, 2$, to which corresponds a vector $f_i$. The line on which the two feasible vectors $f_a$ of test $i$ lie is $F_i$, and $S_i = A_{a,i}^\dagger F_i$ is the line through the two solution points. To have two tests with non-coincident solutions, we need these two lines $S_1, S_2$ to have no more than one common point, which in the case $n_a = 2$ is equivalent to $S_1 \neq S_2$, i.e., $A_{a,1}^\dagger F_1 \neq A_{a,2}^\dagger F_2$, i.e., $F_1 \neq A_{a,1} A_{a,2}^\dagger F_2 =: F_{12}$. We represent the lines $F_i$ by means of their orthogonal vectors from the origin, $f_{ort,i} = l_{ort,i} \frac{f_i^\parallel}{\| f_i^\parallel \|}$. We introduce the matrices $C_a, C_f, C_{fp}$ such that $A_{a,2} = C_a A_{a,1}$, $f_2 = C_f f_1$, $f_2^\parallel = C_{fp} f_1^\parallel$, and the scalar $k_f$ such that $\| f_2^\parallel \| = k_f \| f_1^\parallel \|$.
Lemma 3.
Consider two tests $i = 1, 2$ from the same system with $n_a = 2$, with the above notation. Then it holds $F_1 = F_{12}$ if and only if $C_a = C_{fp}$.
Proof. 
From the relation $f_i^\parallel = P_{\mathcal{A}_{a,i}}(f_i) = A_{a,i}(A_{a,i}^T A_{a,i})^{-1} A_{a,i}^T f_i$, we have
$$f_2^\parallel = A_{a,2}(A_{a,2}^T A_{a,2})^{-1} A_{a,2}^T f_2 = C_a A_{a,1}(A_{a,1}^T C_a^T C_a A_{a,1})^{-1} A_{a,1}^T C_a^T C_f f_1. \qquad (10)$$
It holds $F_1 = F_{12} \iff f_{ort,1} = f_{ort,12} := A_{a,1} A_{a,2}^\dagger f_{ort,2}$; hence we will show this second equivalence. We note that $l_{ort,2} = k_f l_{ort,1}$ and calculate
$$f_{ort,12} = A_{a,1} A_{a,2}^\dagger f_{ort,2} = A_{a,1} A_{a,1}^\dagger C_a^\dagger \, l_{ort,2} \frac{f_2^\parallel}{\| f_2^\parallel \|} = A_{a,1} A_{a,1}^\dagger C_a^\dagger \, \frac{k_f l_{ort,1} C_{fp} f_1^\parallel}{k_f \| f_1^\parallel \|} = A_{a,1} A_{a,1}^\dagger C_a^\dagger C_{fp} f_{ort,1}. \qquad (11)$$
Now let us call $s_{ort,1}$ the vector such that $f_{ort,1} = A_{a,1} s_{ort,1}$; then, using the fact that $C_a = C_{fp}$, we obtain
$$f_{ort,12} = A_{a,1} A_{a,1}^\dagger C_a^\dagger C_{fp} A_{a,1} s_{ort,1} = A_{a,1} (A_{a,1}^\dagger A_{a,1}) s_{ort,1} = A_{a,1} s_{ort,1} \quad (\text{since } A_{a,1}^\dagger A_{a,1} = I_{n_a}). \qquad (12)$$
Hence we have $F_{12} = F_1 \iff A_{a,1} A_{a,1}^\dagger C_a^\dagger C_{fp} f_{ort,1} = f_{ort,1} \iff C_a^\dagger C_{fp} = I$, i.e., $C_a = C_{fp}$. □

3.1.3. Case $n_a \geq 3$

More generally, for the case $n_a \geq 3$, consider the vectors $\hat{f}_a, \hat{f}^\parallel \in \mathbb{R}^{n_a}$ as defined previously; in particular we are looking for $\hat{f}_a = [\xi_1, \ldots, \xi_{n_a}] \in \mathbb{R}^{n_a}$. Conditions (8) can be written as
$$\begin{cases} \sum_{i=1}^{n_a} \xi_i^2 = N_f^2 \\[2pt] \sum_{i=1}^{n_a} (\xi_i - \hat{f}^\parallel_{\xi_i})^2 = T_f^2 \end{cases} \implies F : \; \sum_{i=1}^{n_a} \left( (\hat{f}^\parallel_{\xi_i})^2 - 2 \hat{f}^\parallel_{\xi_i} \xi_i \right) = T_f^2 - N_f^2, \qquad (13)$$
where the two equations on the left are two $(n_a - 1)$-spheres, i.e., the boundaries of two $n_a$-dimensional balls. Analogously to the case $n_a = 2$, their intersection can be empty, one point, or the boundary of an $(n_a - 1)$-dimensional ball (with the same conditions on $(N_f - \| f^\parallel \|) - T_f$). The equation on the right of (13) is the $(n_a - 1)$-dimensional subspace $F$ on which the boundary of the $(n_a - 1)$-dimensional ball of the feasible vectors $f_a$ lies, and it is obtained by subtracting the first equation from the second. Figure 3a shows the graphical representation of the decomposition $f = f_a + f_u$ for the case $n_a = 3$, and Figure 3b the solution ellipsoids of 3 tests whose intersection is one point. Figure 4a shows the solution hyperellipsoids of 4 tests whose intersection is one point, in the case $n_a = 4$.
We note that, to obtain one unique solution $x_a$, we must intersect the solutions of at least two tests. Let us give a more precise idea of what happens in general. Given $i = 1, \ldots, n_a$ tests we call, as in the previous case, $f_{ort,i}$ the vector orthogonal to the $(n_a - 1)$-dimensional subspace $F_i$ that contains the feasible $f_a$, and $S_i = A_{a,i}^\dagger F_i$. We project this subspace on $\mathcal{A}_{a,1}$ and obtain $F_{1i} = A_{a,1} A_{a,i}^\dagger F_i$, which we describe through its orthogonal vector $f_{ort,1i} = A_{a,1} A_{a,i}^\dagger f_{ort,i}$. If the vectors $f_{ort,1}, f_{ort,12}, \ldots, f_{ort,1 n_a}$ are linearly independent, the $(n_a - 1)$-dimensional subspaces $F_1, F_{12}, \ldots, F_{1 n_a}$ intersect in one point. Figure 4b shows an example in which, in the case $n_a = 3$, the vectors $f_{ort,1}, f_{ort,12}, f_{ort,13}$ are not linearly independent. The three solution sets of this example intersect in two points; hence, for $n_a = 3$, three tests are not always sufficient to determine a unique solution.
Lemma 4.
For all $n_a > 1$, given $i = 1, \ldots, n_a$ tests, the condition that the $n_a$ hyperplanes $S_i = A_{a,i}^\dagger F_i$ previously defined have linearly independent normal vectors is sufficient to determine one unique intersection, i.e., one unique solution vector $\bar{x}_a$ that satisfies the system of conditions (4) for each test.
Proof. 
The intersection of $n_a$ independent hyperplanes in $\mathbb{R}^{n_a}$ is a point. Given a test $i$, let $S_i = A_{a,i}^\dagger F_i$ be the affine subspace of that test,
$$S_i = v_i + W_i = \{ v_i + w \in \mathbb{R}^{n_a} : w \cdot n_i = 0 \} = \{ x \in \mathbb{R}^{n_a} : n_i^T (x - v_i) = 0 \},$$
where $n_i$ is the normal vector of the linear subspace $W_i$ and $v_i$ the translation with respect to the origin.
The conditions on the $S_i$ relative to the $n_a$ tests correspond to a linear system $A x = b$, where $n_i$ is the $i$-th row of $A$ and each component of the vector $b$ is given by $b_i = n_i^T v_i$. The matrix $A$ has full rank because of the linear-independence condition on the vectors $n_i$; hence the solution of the linear system is unique.
The unique intersection is due to the hypothesis of full column rank of the matrices $A_{a,i}$: this condition implies that the matrices $A_{a,i}^\dagger$ map the surfaces $F_i$ to hyperplanes $S_i = A_{a,i}^\dagger F_i$. □
For example, with $n_a = 2$ (Lemma 3) this condition amounts to considering two tests with non-coincident lines $S_1, S_2$, i.e., two non-coincident $F_1, F_{12}$.

3.2. The Case of Approximate Knowledge of $I_f$ and $N_f$ Values

Let us consider $N$ tests and call $I_{f,i}$, $N_{f,i}$ and $T_{f,i}$ the values defined in Lemma 2, relative to test $i$. Since the two systems of conditions
$$\begin{cases} N_{f,i} = \| A_{a,i} x_a \| \\[2pt] I_{f,i} = \dfrac{\| A_{a,i} x_a \|}{\| f_i - A_{a,i} x_a \|} \end{cases} \qquad \text{and} \qquad \begin{cases} N_{f,i} = \| A_{a,i} x_a \| \\[2pt] T_{f,i} = \| f_i^\parallel - A_{a,i} x_a \| \end{cases} \qquad (14)$$
are equivalent, as shown in Lemma 2, we will work with the system on the right for its simplicity: the equation on $T_{f,i}$ represents a hyperellipsoid translated with respect to the origin.
In a real application, we can assume to know only an interval that contains the true values of $I_f$ and, analogously, an interval for the $N_f$ values. Supposing we know the bounds on $I_f$ and $N_f$, the bounds on $T_f$ can be easily computed. Calling these extreme values $N_f^{max}, N_f^{min}, T_f^{max}, T_f^{min}$, we will assume it always holds
$$N_f^{max} \geq \max_i (N_{f,i}), \quad N_f^{min} \leq \min_i (N_{f,i}), \quad T_f^{max} \geq \max_i (T_{f,i}), \quad T_f^{min} \leq \min_i (T_{f,i}), \qquad (15)$$
for each test $i = 1, \ldots, N$ of the considered set.
Condition (4) is now relaxed as follows: the true solution $\bar{x}_a$ satisfies
$$\| A_{a,i} \bar{x}_a \| \leq N_f^{max}, \quad \| A_{a,i} \bar{x}_a \| \geq N_f^{min}, \quad \| A_{a,i} \bar{x}_a - f_i^\parallel \| \leq T_f^{max}, \quad \| A_{a,i} \bar{x}_a - f_i^\parallel \| \geq T_f^{min}, \qquad (16)$$
for each test $i = 1, \ldots, N$ of the considered set.
Assuming the extremes to be non-coincident ($N_f^{min} \neq N_f^{max}$ and $T_f^{min} \neq T_f^{max}$), these conditions do not define a single point, i.e., the unique solution $\bar{x}_a$ (as in (4) of Section 3.1), but an entire closed region of the space, possibly not even connected, which contains infinitely many possible solutions $x \neq \bar{x}_a$.
Figure 5 shows two examples, with $n_a = 2$, of the conditions for a single test: on the left in the case of exact knowledge of the $N_{f,i}$ and $T_{f,i}$ values, and on the right with the knowledge of two intervals containing the true values.
Given a single test, the conditions (16) on a point $x$ can be easily characterized. Given the condition
$$\| f_a \| = \| A_a x_a \| = N_f, \qquad (17)$$
we write $x_a = \sum_i \chi_i v_i$, with $v_i$ the vectors of the orthogonal basis given by the columns of $V$ in the SVD decomposition $A_a = U S V^T$. Then
$$f_a = A_a x_a = U S V^T \Big( \sum_i \chi_i v_i \Big) = U S \Big( \sum_i \chi_i e_i \Big) = U \Big( \sum_i s_i \chi_i e_i \Big) = \sum_i s_i \chi_i u_i.$$
Since the norm condition $\| f_a \|^2 = \sum_i (s_i \chi_i)^2 = N_f^2$ holds, we obtain the equation of the hyperellipsoid for $x_a$ as
$$\sum_i (s_i \chi_i)^2 = \sum_i \frac{\chi_i^2}{(1/s_i)^2} = N_f^2.$$
The bounded conditions hence give the region between the two hyperellipsoids centered at the origin:
$$\left( N_f^{min} \right)^2 \leq \sum_i \frac{\chi_i^2}{(1/s_i)^2} \leq \left( N_f^{max} \right)^2. \qquad (18)$$
Analogously, the $I_f$ condition gives the region between the two translated hyperellipsoids:
$$\left( T_f^{min} \right)^2 \leq \sum_i \left( s_i \chi_i - \hat{f}^\parallel_i \right)^2 \leq \left( T_f^{max} \right)^2, \qquad (19)$$
where the $\hat{f}^\parallel_i$ are the components of the vector $\hat{f}^\parallel$ defined in Section 3.1.
Given a test $i$, each of the conditions (18) and (19) constrains $\bar{x}_a$ to lie inside a thick hyperellipsoid, i.e., the region between two concentric hyperellipsoids. The intersection of these two conditions for test $i$ is a zero-residual region that we call $Z_{r_i}$:
$$Z_{r_i} = \{ x \in \mathbb{R}^{n_a} \mid (18) \text{ and } (19) \text{ hold} \}. \qquad (20)$$
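Since $\|A_a x\|$ and $\|A_a x - f^\parallel\|$ can be evaluated directly, membership in $Z_{r_i}$ needs no explicit SVD. A sketch of the test (names are ours; the interval bounds are assumed given):

import numpy as np

def in_Zr(x, A_a, f_par, N_min, N_max, T_min, T_max):
    # True if x satisfies conditions (18) and (19) for this test,
    # i.e., lies in both thick hyperellipsoids.
    N = np.linalg.norm(A_a @ x)
    T = np.linalg.norm(A_a @ x - f_par)
    return (N_min <= N <= N_max) and (T_min <= T <= T_max)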
It is easy to verify that if $N_{f,i}$ equals the assumed $N_f^{min}$ or $N_f^{max}$, or $T_{f,i}$ equals the assumed $T_f^{min}$ or $T_f^{max}$, the true solution lies on a border of the region $Z_{r_i}$; if this holds for both $N_{f,i}$ and $T_{f,i}$, it lies on a vertex.
When more tests $i = 1, \ldots, N$ are put together, we have to consider the points that belong to the intersection of all these regions $Z_{r_i}$, i.e.,
$$I_{zr} = \bigcap_{i = 1, \ldots, N} Z_{r_i}. \qquad (21)$$
These points minimize, with zero residual, the following optimization problem:
$$\min_x \sum_{i=1}^N \min\left( 0, \| A_{a,i} x \| - N_f^{min} \right)^2 + \sum_{i=1}^N \max\left( 0, \| A_{a,i} x \| - N_f^{max} \right)^2 + \sum_{i=1}^N \min\left( 0, \| A_{a,i} x - f_i^\parallel \| - T_f^{min} \right)^2 + \sum_{i=1}^N \max\left( 0, \| A_{a,i} x - f_i^\parallel \| - T_f^{max} \right)^2. \qquad (22)$$
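A sketch of the corresponding penalty residuals, written for a generic nonlinear least-squares solver (the data layout of tests is an assumption of ours):

import numpy as np

def uls_residuals(x, tests, N_min, N_max, T_min, T_max):
    # tests: list of pairs (A_a_i, f_par_i); four penalties per test,
    # all zero exactly when x belongs to I_zr, as in (22).
    res = []
    for A_a, f_par in tests:
        N = np.linalg.norm(A_a @ x)
        T = np.linalg.norm(A_a @ x - f_par)
        res += [min(0.0, N - N_min),    # below the inner N-hyperellipsoid
                max(0.0, N - N_max),    # outside the outer N-hyperellipsoid
                min(0.0, T - T_min),    # below the inner T-hyperellipsoid
                max(0.0, T - T_max)]    # outside the outer T-hyperellipsoid
    return np.asarray(res)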
It is also easy to verify that, if the true solution lies on an edge/vertex of one of the regions $Z_{r_i}$, it lies on an edge/vertex of their intersection.
The intersected region $I_{zr}$ tends to shrink monotonically in a way that depends on the properties of the added tests. We are interested in studying the conditions that make it reduce to a point, or at least to a small region. A sufficient condition to obtain a point is given in Theorem 1.
Let us first consider the function that, given a point in the space $\mathbb{R}^{n_a}$, returns the squared norm of its image through the matrix $A_a$:
$$N_f^2(x) = \| A_a x \|_2^2 = \| U \Sigma V^T x \|_2^2 = \| \Sigma V^T x \|_2^2 = (\Sigma V^T x)^T (\Sigma V^T x) = x^T (V \Sigma^T \Sigma V^T) x = \left\| \begin{bmatrix} \sigma_1 v_1^T x \\ \sigma_2 v_2^T x \\ \vdots \end{bmatrix} \right\|_2^2 = \sigma_1^2 (v_1^T x)^2 + \sigma_2^2 (v_2^T x)^2 + \cdots, \qquad (23)$$
where the $v_i$ are the columns of $V$ and $x = [x^{(1)}, x^{(2)}, \ldots, x^{(n_a)}]$.
The direction of maximum increase of this function is given by its gradient
$$\nabla N_f^2(x) = 2 (V \Sigma^2 V^T) x = \begin{bmatrix} 2 \sigma_1^2 v_1^T x \, v_1^{(1)} + 2 \sigma_2^2 v_2^T x \, v_2^{(1)} + \cdots + 2 \sigma_{n_a}^2 v_{n_a}^T x \, v_{n_a}^{(1)} \\ 2 \sigma_1^2 v_1^T x \, v_1^{(2)} + 2 \sigma_2^2 v_2^T x \, v_2^{(2)} + \cdots + 2 \sigma_{n_a}^2 v_{n_a}^T x \, v_{n_a}^{(2)} \\ \vdots \end{bmatrix}. \qquad (24)$$
Analogously, define the function $T_f^2(x)$ as
$$T_f^2(x) = \| A_a x - f^\parallel \|_2^2 = \| U \Sigma V^T x - f^\parallel \|_2^2 = \| \Sigma V^T x - \hat{f}^\parallel \|_2^2 = (\Sigma V^T x - \hat{f}^\parallel)^T (\Sigma V^T x - \hat{f}^\parallel) = x^T (V \Sigma^2 V^T) x - 2 x^T V \Sigma \hat{f}^\parallel + (\hat{f}^\parallel)^T (\hat{f}^\parallel) = \left\| \begin{bmatrix} \sigma_1 v_1^T x \\ \sigma_2 v_2^T x \\ \vdots \end{bmatrix} - \hat{f}^\parallel \right\|_2^2, \qquad (25)$$
with gradient
$$\nabla T_f^2(x) = 2 (V \Sigma^2 V^T) x - 2 V \Sigma \hat{f}^\parallel = \begin{bmatrix} 2 \sigma_1^2 v_1^T x \, v_1^{(1)} + \cdots + 2 \sigma_{n_a}^2 v_{n_a}^T x \, v_{n_a}^{(1)} - 2 \sum_i \sigma_i \hat{f}^{\parallel (i)} v_i^{(1)} \\ \vdots \\ 2 \sigma_1^2 v_1^T x \, v_1^{(j)} + \cdots + 2 \sigma_{n_a}^2 v_{n_a}^T x \, v_{n_a}^{(j)} - 2 \sum_i \sigma_i \hat{f}^{\parallel (i)} v_i^{(j)} \end{bmatrix}. \qquad (26)$$
Definition 2 (Upward/Downward Outgoing Gradients). Take a test $i$ and the functions $N_f^2(x)$ and $T_f^2(x)$ as in (23) and (25), with the gradient vectors $\nabla N_{f,i}(x)$, $\nabla T_{f,i}(x)$ of these two functions as in (24) and (26). Given the two extreme values $N_f^{min/max}$ and $T_f^{min/max}$ for each test, let us define:
  • the downward outgoing gradients as the set of negated gradients calculated at points on the minimum hyperellipsoid,
$$\{ -\nabla N_{f,i}(x) \mid N_{f,i}(x) = N_f^{min} \} \quad \text{and} \quad \{ -\nabla T_{f,i}(x) \mid T_{f,i}(x) = T_f^{min} \};$$
    they point inward, leaving the thick-hyperellipsoid region through its inner boundary;
  • the upward outgoing gradients as the set of gradients calculated at points on the maximum hyperellipsoid,
$$\{ \nabla N_{f,i}(x) \mid N_{f,i}(x) = N_f^{max} \} \quad \text{and} \quad \{ \nabla T_{f,i}(x) \mid T_{f,i}(x) = T_f^{max} \};$$
    they point outward, leaving the region through its outer boundary.
Note that the upward/downward outgoing gradient of the function $N_f^2(x)$ (or $T_f^2(x)$) at a point $x$ is the normal vector to the tangent plane of the hyperellipsoid on which the point lies. Moreover, these vectors point outward of the region defined by Equation (18) (respectively, (19)). Figure 6 shows an example of some upward/downward outgoing gradients of the function $N_f^2(x)$.
Theorem 1.
Given $N$ tests with values $I_{f,i}$ and $N_{f,i}$ in the closed intervals $[I_f^{min}, I_f^{max}]$ and $[N_f^{min}, N_f^{max}]$, take the set of all the upward/downward outgoing gradients of the functions $N_{f,i}^2(x)$ and $T_{f,i}^2(x)$ calculated at the true solution $\bar{x}_a$, i.e.,
$$\bigcup_{i=1}^{N} \{ \nabla N_{f,i}(\bar{x}_a) \mid N_{f,i}(\bar{x}_a) = N_f^{max} \} \cup \{ -\nabla N_{f,i}(\bar{x}_a) \mid N_{f,i}(\bar{x}_a) = N_f^{min} \} \cup \{ \nabla T_{f,i}(\bar{x}_a) \mid T_{f,i}(\bar{x}_a) = T_f^{max} \} \cup \{ -\nabla T_{f,i}(\bar{x}_a) \mid T_{f,i}(\bar{x}_a) = T_f^{min} \}. \qquad (27)$$
If there is at least one outgoing gradient of this set in each orthant of $\mathbb{R}^{n_a}$, then the intersection region $I_{zr}$ of Equation (21) reduces to a point.
Proof. 
We want to show that, given any perturbation $\delta x$ of the true solution $\bar{x}_a$, there exists at least one condition among (18) and (19) that is not satisfied by the perturbed point $\bar{x}_a + \delta x$.
Any sufficiently small perturbation $\delta x$ in an orthant in which an upward/downward outgoing gradient lies (from now on, "Gradient") determines an increase/decrease in the value of the hyperellipsoid function relative to that Gradient, which makes the corresponding condition unsatisfied.
Hence, if the Gradient in the considered orthant is upward, it satisfies $N_{f,i}(\bar{x}_a) = N_f^{max}$ (or analogously with $T_{f,i}$), and for each perturbation $\delta x$ in the same orthant we obtain
$$N_{f,i}(\bar{x}_a + \delta x) > N_{f,i}(\bar{x}_a) = N_f^{max} \qquad (28)$$
(or analogously with $T_{f,i}$). In the same way, if the Gradient is downward we obtain
$$N_{f,i}(\bar{x}_a + \delta x) < N_{f,i}(\bar{x}_a) = N_f^{min} \qquad (29)$$
(or analogously with $T_{f,i}$).
When one orthant contains more than one Gradient, more than one condition will be unsatisfied by the perturbed point $\bar{x}_a + \delta x$ for a sufficiently small $\delta x$ in that orthant. □
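The orthant condition of Theorem 1 can be checked numerically from the gradients (24) and (26). A sketch under our reconstruction of those formulas (using $\nabla \|A_a x\|^2 = 2 A_a^T A_a x$, which avoids the explicit SVD; all names are ours):

import itertools
import numpy as np

def outgoing_gradients(x, tests, N_min, N_max, T_min, T_max, tol=1e-8):
    # Collect the upward/downward outgoing gradients active at x.
    grads = []
    for A_a, f_par in tests:
        r = A_a @ x
        gN, gT = 2 * A_a.T @ r, 2 * A_a.T @ (r - f_par)
        N, T = np.linalg.norm(r), np.linalg.norm(r - f_par)
        if abs(N - N_max) < tol: grads.append(gN)     # upward, outer boundary
        if abs(N - N_min) < tol: grads.append(-gN)    # downward, inner boundary
        if abs(T - T_max) < tol: grads.append(gT)
        if abs(T - T_min) < tol: grads.append(-gT)
    return grads

def covers_all_orthants(grads, n_a):
    # True if every orthant of R^{n_a} contains at least one gradient
    # (a zero component is compatible with both half-spaces).
    signs = [np.sign(g) for g in grads]
    return all(any(np.all(s * np.array(orth) >= 0) for s in signs)
               for orth in itertools.product((-1, 1), repeat=n_a))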

4. Problem Solution

The theory previously presented allows us to build a solution algorithm that can deal with different levels of a-priori information. We will start in Section 4.1 with the ideal case, i.e., with exact knowledge of $I_f$ and $N_f$. Then we generalize to a more practical setting, where we suppose to know an interval that contains the $T_f$ values of all the experiments considered and an interval for the $N_f$ values; hence, the estimated solution will satisfy Equations (18) and (19). In this case we describe an algorithm for computing an estimate of the solution, which we will test in Section 5 against a toy model.

4.1. Exact Knowledge of $I_f$ and $N_f$

When the information about $I_f$ and $N_f$ is exact, with the minimum number of experiments indicated in Section 3 we can find the unbiased parameter estimate as the intersection $I_{zr}$ of the zero-residual sets $Z_{r_i}$ corresponding to each experiment. In principle this could be done also following the proof of Lemma 4, but the computation of the $v_i$ vectors is quite cumbersome. Since this is an ideal case, we solve it by simply imposing the satisfaction of the various $N_f$ and $T_f$ conditions (Equation (14)) as an optimization problem:
$$\min_x F(x) \quad \text{with} \quad F(x) = \sum_{i=1}^N \left( \| A_{a,i} x \| - N_{f,i} \right)^2 + \sum_{i=1}^N \left( \| A_{a,i} x - f_i^\parallel \| - T_{f,i} \right)^2. \qquad (30)$$
The solution of this problem is unique when the tests are sufficiently many and satisfy the conditions of Lemma 4.
This nonlinear least-squares problem can be solved using a general nonlinear optimization algorithm, such as the Gauss–Newton or Levenberg–Marquardt methods [8].
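A sketch of this exact-knowledge estimator with scipy.optimize.least_squares, the library used for the experiments in Section 5 (the tests container and the starting point are assumptions of ours):

import numpy as np
from scipy.optimize import least_squares

def exact_residuals(x, tests):
    # tests: list of tuples (A_a_i, f_par_i, N_f_i, T_f_i) with exact values;
    # the residuals of problem (30), two per test.
    return np.array([r for A_a, f_par, N_f, T_f in tests
                       for r in (np.linalg.norm(A_a @ x) - N_f,
                                 np.linalg.norm(A_a @ x - f_par) - T_f)])

# x0 can be, e.g., the (biased) ordinary least-squares estimate:
# sol = least_squares(exact_residuals, x0, args=(tests,))
# x_hat = sol.x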

4.2. Approximate Knowledge of $I_f$ and $N_f$

In practice, as already pointed out in Section 3.2, it is more realistic to know the two intervals that contain all the $N_{f,i}$ and $I_{f,i}$ values for each test $i$. Then we know that the region $I_{zr}$ also contains the exact unbiased parameter solution $\bar{x}_a$, which we want at least to approximate. We introduce here an Unbiased Least-Squares (ULS) algorithm (Algorithm 1) for the computation of this estimate.
Algorithm 1 An Unbiased Least-Squares (ULS) algorithm.
1: Given a number $n_{tests}$ of available tests, indexed from 1 to $n_{tests}$, and two intervals, $[I_f^{min}, I_f^{max}]$ and $[N_f^{min}, N_f^{max}]$, containing the $I_f$ and $N_f$ values of all tests.
2: At each iteration consider the tests indexed by the interval $[1, i_t]$; initially set $i_t = n_a$.
3: while $i_t \leq n_{tests}$ do
4:     (1) compute a solution with zero residual of problem (22) with a nonlinear least-squares optimization algorithm;
5:     (2) estimate the size of the zero-residual region as described below in (31);
6:     (3) increment by one the number $i_t$ of tests.
7: end while
8: Accept the final solution if the estimated region diameter is sufficiently small.
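A compact sketch of Algorithm 1, built on the penalty residuals uls_residuals of Section 3.2 (the acceptance test on the region diameter is left to the caller; names are ours):

import numpy as np
from scipy.optimize import least_squares

def uls_estimate(tests, N_min, N_max, T_min, T_max, x0, n_a):
    x = np.asarray(x0, dtype=float)
    for i_t in range(n_a, len(tests) + 1):       # steps 3-7: grow the test set
        sol = least_squares(uls_residuals, x,
                            args=(tests[:i_t], N_min, N_max, T_min, T_max))
        x = sol.x                                # a zero-residual point of (22)
        # here the size of the zero-residual region would be estimated via (31)
    return x                                     # step 8: accept if the region is small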
In general, the zero-residual region $Z_{r_i}$ of each test contains the true parameter vector, while the iterates of the local optimization usually start from a point outside this region and converge to a point on its boundary.
The ULS estimate can converge to the true solution in two cases:
  • the true solution lies on the border of the region $I_{zr}$ and the estimate reaches the border at that point;
  • the region $I_{zr}$ reduces to a size smaller than the required accuracy, or reduces to a point.
The size of the intersection $I_{zr}$ of the zero-residual regions $Z_{r_i}$ is estimated in the following way. Let us define an index, which we call the region shrinkage estimate, as
$$\hat{s}(x) = \min\left\{ n \;\middle|\; \forall\, \delta \in P : \; \Delta_{I_{zr}}(x + \mu^{-n} \delta) > 0 \right\}, \qquad (31)$$
where we used $\mu = 1.5$ in the experiments below, $P = \{ \delta \in \mathbb{R}^{n_a} \mid \delta^{(i)} \in \{-1, 0, 1\},\ i = 1, \ldots, n_a \}$, and $\Delta_{I_{zr}}$ is the indicator function of the set $I_{zr}$: the larger $\hat{s}(x)$, the smaller the region around $x$.
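A possible coding of (31); since the quantifier and the sign of the exponent are hard to recover exactly, this sketch follows the reading above (all probes of size $\mu^{-n}$ around $x$ must stay inside $I_{zr}$, so a large $\hat{s}$ means a small region):

import itertools
import numpy as np

def shrinkage_estimate(x, in_Izr, mu=1.5, n_max=60):
    # in_Izr(y) -> bool is the indicator of I_zr, e.g., built by requiring
    # the in_Zr test of Section 3.2 for every available test.
    P = [np.array(d) for d in itertools.product((-1, 0, 1), repeat=len(x))]
    for n in range(n_max + 1):
        if all(in_Izr(x + mu**(-n) * d) for d in P):
            return n              # region radius around x is roughly mu**(-n)
    return n_max                  # the region around x is extremely small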

5. Numerical Examples

Let us consider a classical application example, the equations of a DC motor with a mechanical load, where the electrical variables are governed by the ordinary differential equation
$$L \dot{I}(t) = -K \omega(t) - R I(t) + V(t) - f_u(t), \qquad I(t_0) = I_0, \qquad (32)$$
where $I$ is the motor current, $\omega$ the motor angular speed, $V$ the applied voltage, and $f_u(t)$ a possible unmodelled component
$$f_u(t) = m_{err} \cos\left( n_{poles} \, \theta(t) \right), \qquad (33)$$
where $n_{poles}$ is the number of poles of the motor, i.e., the number of windings or magnets [9], $m_{err}$ the magnitude of the model error, and $\theta$ the angle, given by the system
$$\dot{\theta}(t) = \omega(t), \qquad \theta(t_0) = \theta_0. \qquad (34)$$
Note that the unknown component $f_u$ of this example can be seen as a difference of potential that is not described by the approximated model. We are interested in the estimation of the parameters $[L, K, R]$. In our tests the true values were the constants $[L = 0.0035, K = 0.14, R = 0.53]$.
We suppose to know the measurements of $I$ and $\omega$ at equally spaced times $t_0, \ldots, t_{\bar{N}}$ with step $h$, such that $t_k = t_0 + k h$ and $t_{k+1} = t_k + h$. Figure 7 shows the plots of the motor speed $\omega$ and of the unknown component $f_u$ for this experiment.
We compute the approximation $\hat{\dot{I}}(t_k)$ of the derivative of the current signal with the first-order finite difference formula
$$\hat{\dot{I}}(t_k) = \frac{I(t_k) - I(t_{k-1})}{h}, \quad \text{for } t_k = t_1, \ldots, t_{\bar{N}},$$
with step $h = 4 \times 10^{-4}$. The applied voltage is held constant at the value $V(t) = 30.0$.
To obtain a more accurate estimate, or to allow higher step sizes $h$, finite differences of higher order can be used, for example the fourth-order difference formula
$$\hat{\dot{I}}(t_k) = \frac{I(t_k - 2h) - 8 I(t_k - h) + 8 I(t_k + h) - I(t_k + 2h)}{12 h}, \quad \text{for } t_k = t_2, \ldots, t_{\bar{N} - 2}.$$
With the choice of the finite difference formula, we obtain the discretized equations
$$L \hat{\dot{I}}(t_k) = -K \omega(t_k) - R I(t_k) + V(t_k) - f_u(t_k), \quad \text{for } t_k = t_1, \ldots, t_{\bar{N}}. \qquad (35)$$
We will show a possible implementation of the method explained in the previous sections and the results we get with this toy-model example. The comparison is made against standard least-squares. In particular, we will show that when the information about $I_f$ and $N_f$ is exact we have an exact removal of the bias; when this information is only approximate, which is common in a real application, we will show how the bias asymptotically disappears as the number of experiments increases.
We build each test taking Equation (35) for $n$ samples in the range $t_1, \ldots, t_{\bar{N}}$, obtaining the linear system
$$\begin{bmatrix} \hat{\dot{I}}(t_k) & \omega(t_k) & I(t_k) \\ \hat{\dot{I}}(t_{k+1}) & \omega(t_{k+1}) & I(t_{k+1}) \\ \vdots & \vdots & \vdots \\ \hat{\dot{I}}(t_{k+n}) & \omega(t_{k+n}) & I(t_{k+n}) \end{bmatrix} \begin{bmatrix} L \\ K \\ R \end{bmatrix} + \begin{bmatrix} f_u(t_k) \\ f_u(t_{k+1}) \\ \vdots \\ f_u(t_{k+n}) \end{bmatrix} = \begin{bmatrix} V(t_k) \\ V(t_{k+1}) \\ \vdots \\ V(t_{k+n}) \end{bmatrix}, \qquad (36)$$
so that the first matrix in the equation is $A_a \in \mathbb{R}^{n \times n_a}$, with $n_a = 3$ the number of parameters to be estimated.
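A sketch of how one such test can be assembled from sampled signals (I, omega, V are assumed NumPy arrays sampled with step h; the first-order difference of this section is used, so k must be at least 1):

import numpy as np

def build_test(I, omega, V, h, k, n):
    # Rows t_k .. t_{k+n} of the discretized Equation (35); x = [L, K, R].
    idx = np.arange(k, k + n + 1)
    I_dot = (I[idx] - I[idx - 1]) / h                   # first-order difference
    A_a = np.column_stack((I_dot, omega[idx], I[idx]))  # A_a, here (n+1) x 3
    b = V[idx]                                          # right-hand side of (36)
    return A_a, b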
To measure the relative estimation error $\hat{e}_{rel}$ we will use the following formula, where $\hat{x}_a$ is the parameter estimate:
$$\hat{e}_{rel} = \frac{1}{n_a} \sum_{i=1}^{n_a} \frac{\| \hat{x}_a^{(i)} - \bar{x}_a^{(i)} \|_2}{\| \bar{x}_a^{(i)} \|_2}. \qquad (37)$$
Note that the tests that we built in the numerical experiments below are simply small chunks of consecutive data, taken from one single simulation for each experiment.
The results have been obtained with a Python code developed by the authors, using NumPy for linear algebra computations and scipy.optimize for the nonlinear least-squares optimization.

5.1. Exact Knowledge of $I_f$ and $N_f$

As analyzed in Section 4.1, the solution of the minimization problem (30) is computed with a local optimization algorithm. Here the results show an error $\hat{e}_{rel}$ with an order of magnitude of $10^{-7}$ in every test we made. Note that it is also possible to construct the solution geometrically, with exact results.

5.2. Approximate Knowledge of $I_f$ and $N_f$

When $I_f$ and $N_f$ are known only approximately, i.e., we know only an interval containing all the $I_f$ values and an interval containing all the $N_f$ values, we lose the unique intersection of Lemma 4, which would require only $n_a$ tests. Moreover, with a finite number of tests we cannot guarantee in general that the exact hypotheses of Theorem 1 are satisfied. As a consequence, various issues open up. Let us start by showing in Figure 8 that when all four conditions of (15) hold with equality, the true solution lies on the boundary of the region $I_{zr}$, as already mentioned in Section 3.2. If this happens, then under the conditions of Theorem 1 on the upward/downward outgoing gradients the region $I_{zr}$ is a point. When all four conditions of (15) hold with strict inequalities, the true solution lies inside the region $I_{zr}$ (Figure 8b). From a theoretical point of view this distinction is very important, since it determines whether the zero-residual region can or cannot be reduced to a single point. From a practical point of view it becomes less important, since we cannot guarantee that the available tests will reduce $I_{zr}$ exactly to a single point, and most of the time we will arrive at an approximate estimate. This can be more or less accurate, depending on the specific application, which is out of the scope of the present work.
To be more precise, when the conditions of Theorem 1 are not satisfied there is an entire region of the parameter space that satisfies problem (30) exactly, but only one point of this region is the true solution $\bar{x}_a$. As more tests are added and intersected, the zero-residual region $I_{zr}$ tends to shrink, simply because it must satisfy an increasing number of inequalities. In Figure 9 we can see four iterations taken from an example, precisely with 3, 5, 9 and 20 tests intersected and $m_{err} = 19$. With only three tests (Figure 9a) there is a big region $I_{zr}$ (described by the mesh of small dots), and here the true solution (thick point) and the current estimate (star) happen to stay on opposite sides of the region. With five tests (Figure 9b) the region has shrunk considerably and the estimate is reaching the boundary (in the plot it is still half-way), and even more so with nine tests (Figure 9c). Convergence arrives here before the region collapses to a single point because, accidentally, the estimate has approached the region boundary at the same point where the true solution is located.
In general, the zero-residual region Z r i (20) of each test contains the true solution, while the estimate arrives from outside the region and stops when it bumps the border of the intersection region I z r (21). For this reason we have convergence when the region that contains the true solution is reduced to a single point, and the current estimate x ^ a does not lie in a disconnected sub-region of I z r different from the one in which the true solution lies. Figure 10 shows an example of an intersection region I z r which is the union of two closed disconnected regions: this case creates a local minimum in problem (30).
In Figure 11 we see the differences $N_f^{max} - N_f^{min}$ and $T_f^{max} - T_f^{min}$ vs. $m_{err}$. The differences are bigger for higher values of the model error; this seems to be the cause of a more frequent creation of local minima.
Figure 12 synthesizes the main results that we have obtained with this new approach. Globally it shows a great reduction of the bias contained in the standard least-squares estimates; indeed, we had to use a logarithmic scale to highlight the differences in the behaviour of the proposed method while varying $m_{err}$. In particular,
  • with considerable levels of modelling error, say $m_{err}$ between 2 and 12, the parameter estimation error $\hat{e}_{rel}$ is at least one order of magnitude smaller than that of least-squares; this is accompanied by high levels of shrinkage of the zero-residual region (Figure 12b);
  • with higher levels of $m_{err}$, we see a low shrinkage of the zero-residual region and consequently an estimate whose error oscillates strongly, depending on where the optimization algorithm has brought it into contact with the zero-residual region;
  • at $m_{err} = 18$ we see the presence of a local minimum, due to the splitting of the zero-residual region as in Figure 10: the shrinkage at the true solution is estimated to be very high, while at the estimated solution it is quite low, since the latter is attached to a disconnected, wider sub-region;
  • the shrinking of the zero-residual region is related to the distribution of the outgoing gradients, as stated by Theorem 1: in Figure 12d we see that in the experiment with $m_{err} = 18$ they occupy only three of the eight orthants, while in the best results of the other experiments the gradients distribute themselves over almost all orthants (not shown).
It is evident from these results that for lower values of the modelling error $m_{err}$ it is much easier to produce tests that reduce the zero-residual region to a quite small subset of $\mathbb{R}^{n_a}$, while for high values of $m_{err}$ it is much more difficult, and the region $I_{zr}$ can even split into pieces, thus creating local minima. It is also evident that a simple estimate of the $I_{zr}$ region size, like (31), can reliably assess the quality of the estimate produced by the approach proposed here, as summarized in Figure 12c.

6. Conclusions

In this paper we have analyzed the bias that commonly arises in parameter estimation problems when the model lacks some deterministic part of the system. This result is useful in applications where an accurate estimation of parameters is important, e.g., in physical (grey-box) modelling, typically arising in the model-based design of multi-physical systems; see, e.g., the motivations that the authors experienced in the design of digital twins of controlled systems [10,11,12] for virtual prototyping, among a huge literature.
At this point, the method should be tested in a variety of applications, since the ULS approach proposed here is not applicable as a black box the way least-squares is: indeed, it requires some additional a-priori information. Moreover, since the computational complexity of the method presented here is significant, efficient computational methods must be considered and will be a major issue in future investigations.
Another aspect worth deepening is the possibility to design tests that contribute optimally to the reduction of the zero-residual region.

Author Contributions

Conceptualization, methodology, validation, formal analysis, investigation, software, resources, data curation, writing—original draft preparation, writing—review and editing, visualization: M.G. and F.M.; supervision, project administration, funding acquisition: F.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the project DOR1983079/19 from the University of Padova and by the doctoral grant “Calcolo ad alte prestazioni per il Model Based Design” from Electrolux Italia s.p.a.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Björck, A. Numerical Methods for Least Squares Problems; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1996.
  2. Van Huffel, S.; Markovsky, I.; Vaccaro, R.J.; Söderström, T. Total least squares and errors-in-variables modeling. Signal Process. 2007, 87, 2281–2282.
  3. Söderström, T.; Soverini, U.; Mahata, K. Perspectives on errors-in-variables estimation for dynamic systems. Signal Process. 2002, 82, 1139–1154.
  4. Van Huffel, S.; Vandewalle, J. The Total Least Squares Problem: Computational Aspects and Analysis; Frontiers in Applied Mathematics 9; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1991.
  5. Peck, C.C.; Beal, S.L.; Sheiner, L.B.; Nichols, A.I. Extended least squares nonlinear regression: A possible solution to the “choice of weights” problem in analysis of individual pharmacokinetic data. J. Pharmacokinet. Biopharm. 1984, 12, 545–558.
  6. Meyer, C.D. Matrix Analysis and Applied Linear Algebra; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2000.
  7. Hansen, P.C. Oblique projections and standard-form transformations for discrete inverse problems. Numer. Linear Algebra Appl. 2013, 20, 250–258.
  8. Nocedal, J.; Wright, S. Numerical Optimization; Springer: Berlin, Germany, 1999.
  9. Krause, P.C. Analysis of Electric Machinery; McGraw Hill: New York, NY, USA, 1986.
  10. Beghi, A.; Marcuzzi, F.; Rampazzo, M.; Virgulin, M. Enhancing the Simulation-Centric Design of Cyber-Physical and Multi-physics Systems through Co-simulation. In Proceedings of the 2014 17th Euromicro Conference on Digital System Design, Verona, Italy, 27–29 August 2014; pp. 687–690.
  11. Beghi, A.; Marcuzzi, F.; Rampazzo, M. A Virtual Laboratory for the Prototyping of Cyber-Physical Systems. IFAC-PapersOnLine 2016, 49, 63–68.
  12. Beghi, A.; Marcuzzi, F.; Martin, P.; Tinazzi, F.; Zigliotto, M. Virtual prototyping of embedded control software in mechatronic systems: A case study. Mechatronics 2017, 43, 99–111.
Figure 1. Case $n_a = 1$. (a): case $n_a = 1$, $m = n = 2$, solutions with the condition on $N_f$; in the figure: the true decomposition obtained by imposing both conditions (blue), the orthogonal decomposition (red), and another possible decomposition (green) that satisfies the same norm condition $N_f$ but a different $I_f$. (b): case $n_a = 1$, intensity ratio value w.r.t. the norm of the vector $A_a x_a$; given a fixed value of the intensity ratio there can be two solutions, i.e., two possible decompositions of $f$ as a sum of two vectors with the same intensity ratio. (c): case $n_a = 1$, $m = n = 2$, solutions with the condition on $I_f$; in the figure: the true decomposition obtained by imposing both conditions (blue), the orthogonal decomposition (red), and another possible decomposition (green) with the same intensity ratio $I_f$ but a different $N_f$.
Figure 2. Case $n_a = 2$. (a): case $n_a = 2$, $m = n = 3$, with $A_a x_a = [A_a^{(1)} \; A_a^{(2)}][x_a^{(1)} \; x_a^{(2)}]^T$; in the figure: the true decomposition (blue), the orthogonal decomposition (red), and another possible decomposition among the infinitely many (green). (b): case $n_a = 2$, $m = n = 3$, projection of the two circumferences on the subspace $\mathcal{A}_a$, and projections of the possible decompositions of $f$ (red, blue and green).
Figure 3. Case $n_a = 3$. (a): case $n_a = 3$, $m = n = 4$, $n - n_a = 1$; in the picture, $f^\parallel$ is the projection of $f$ on $\mathcal{A}_a$. The decompositions that satisfy the conditions on $I_f$ and $N_f$ are those with $f_a$ lying on the red circumference on the left. The spheres determined by the conditions are shown in yellow for the vector $f_a$ and in blue for the vector $f_u$. Two feasible decompositions are shown in blue and green. (b): case $n_a = 3$, intersection of three hyperellipsoids, the solution sets $x_a$ of three different tests, in the space $\mathbb{R}^{n_a = 3}$.
Figure 4. Case $n_a \geq 3$. (a): case $n_a = 4$, intersection of four hyperellipsoids, the solution sets $x_a$ of four different tests, in the space $\mathbb{R}^{n_a = 4}$. (b): case $n_a = 3$, example of three tests for which the solution sets have an intersection bigger than one single point; the three $(n_a - 1)$-dimensional subspaces $F_1, F_{12}, F_{13}$ in the space generated by $A_{a,1}$ intersect in a line and their three orthogonal vectors are not linearly independent.
Figure 5. Examples of the exact and approximated conditions on a test with $n_a = 2$. In the left figure the two black ellipsoids are the two constraints of the right system of (14), while in the right figure the two pairs of concentric ellipsoids are the borders of the thick ellipsoids defined by (16), and the blue region $Z_{r_i}$ is the intersection of (18) and (19). The black dot in both figures is the true solution. (a): exact conditions on $N_f$ and $T_f$; (b): approximated conditions on $N_f$ and $T_f$.
Figure 6. Some upward/downward outgoing gradients: the internal blue ones are downward outgoing gradients calculated at points $x$ on the internal ellipsoid with $N_{f,i}(x) = N_f^{min}$, while the external red ones are upward outgoing gradients calculated at points $x$ on the external ellipsoid with $N_{f,i}(x) = N_f^{max}$.
Figure 7. The plots of (a) $\omega(t)$ and (b) $f_u(t)$ in the experiment.
Figure 8. Two examples of (zero-residual) intersection regions $I_{zr} \subset \mathbb{R}^3$ with different locations of the true solution: inside the region or on its border. For graphical reasons the region has been discretized and the dots are the grid nodes; the bigger ball (thick point) is the true solution. (a): the true solution (ball) is on the border of $I_{zr}$; (b): the true solution (ball) is internal to $I_{zr}$.
Figure 9. The intersection region $I_{zr} \subset \mathbb{R}^3$ for different numbers of tests involved. For graphical reasons the region has been discretized and the dots are the grid nodes; the bigger ball is the true solution and the star is the current estimate in the experiment. (a) 3 tests; (b) 5 tests; (c) 9 tests; (d) 20 tests.
Figure 10. The intersection region $I_{zr} \subset \mathbb{R}^3$ for different numbers of tests involved. On the left, a few tests have created a single connected region while, on the right, adding more tests has split it into two subregions. For graphical reasons the region has been discretized and the dots are the grid nodes; the bigger ball is the true solution and the star is the current estimate in the experiment. (a) A (portion of a) connected region $I_{zr}$; (b) a region $I_{zr}$ split into two disconnected subregions.
Figure 11. The three plots show the values assumed by the extreme values (15) as a function of $m_{err}$. (a): $\{ I_f^{min}, I_f^{max} \}$ vs. $m_{err}$; (b): $\{ N_f^{min}, N_f^{max} \}$ vs. $m_{err}$; (c): $\{ T_f^{min}, T_f^{max} \}$ vs. $m_{err}$.
Figure 12. The plots summarize the results obtained by the ULS approach to parameter estimation on the model problem explained at the beginning of this section. (a): the relative estimation error (37) vs. $m_{err}$; (b): the $I_{zr}$ region shrinkage estimate (31) vs. $m_{err}$; (c): the relative estimation error (37) vs. the estimate of the $I_{zr}$ region shrinkage, considering the experiments with $m_{err} \in [2, 20]$; (d): a three-dimensional view of the outgoing gradients at the last iteration of the experiment with $m_{err} = 18$.
