Article

The Cramer–Rao Inequality to Improve the Resolution of the Least-Squares Method in Track Fitting

by
Gregorio Landi
1,* and
Giovanni E. Landi
2,*
1
Department of Physics and Astronomy, University of Firenze, and INFN, Largo E. Fermi 2 (Arcetri), 50125 Firenze, Italy
2
ArchonVR S.a.g.l., Via Cisieri 3, 6900 Lugano, Switzerland
*
Authors to whom correspondence should be addressed.
Instruments 2020, 4(1), 2; https://doi.org/10.3390/instruments4010002
Submission received: 25 November 2019 / Revised: 27 December 2019 / Accepted: 2 January 2020 / Published: 4 January 2020

Abstract

The Cramer–Rao–Frechet inequality is reviewed and extended to track fitting. A diffused opinion attributes to this inequality a limitation of the resolution of track fits with the number N of observations. It will be shown that this opinion is incorrect: the weighted least-squares method is not subject to that N-limitation, and the resolution can be improved beyond those limits. In previous publications, simulations with realistic models and simple Gaussian models produced interesting results: linear growths of the peaks of the distributions of the fitted parameters with the number N of observations, much faster than the √N growth of the standard least-squares. These results could be considered a violation of a well-known 1/N rule for the variance of an unbiased estimator, frequently reported as the Cramer–Rao–Frechet bound. To clarify this point beyond any doubt, a direct proof of the consistency of those results with this inequality is essential. Unfortunately, such a proof is lacking. Hence, the Cramer–Rao–Frechet developments are applied to prove the efficiency (optimality) of the simple Gaussian model and the consistency of the linear growth. The inequality remains valid even for irregular models, supporting a similar improvement of resolution for the realistic models.

1. Introduction

Track reconstruction is a fundamental tool in experimental particle physics, which justifies the large efforts in hardware and software dedicated to its improvement. As part of these efforts, we started to develop new fitting tools based on studies of the signal patterns released by minimum ionizing particles (MIPs) in silicon detectors [1,2]. Any information acquired by a detector is precious for improving the track reconstruction. To properly handle these pieces of information, we developed a set of mathematical methods to calculate a probability density function (PDF) for each hit. The non-Gaussian forms of the PDFs oblige the use of the maximum-likelihood search for the fits. This search, on products of the PDFs of the hits of each track, gives very large improvements in the resolution of all the track parameters and shows insensitivity to outliers. Among the new fitting results, one looks particularly attractive [2]: it shows a linear growth of the momentum distributions with the number N of detecting layers. This linear growth is much faster than the growth of the standard least-squares method, which cannot increase faster than √N. This effect could not be discussed in due detail in ref. [2], which already contained too many different arguments to add others. Therefore, ref. [3] was entirely dedicated to the description of the linear growth. A section of ref. [3] deals with a simplified illustration of the mechanism of its production. For this, we developed a Gaussian toy-model able to easily reproduce some of our results. The simplicity of the toy-model allows the development of analytical expressions (some pages of mathematics) that completely describe the model. Instead of equations, we preferred to prepare an easy simulation contained in a few lines of MATLAB code. The results of this toy-model have a very effective visual impact, given the unusual slenderness of the parameter distributions (empirical PDFs) and their fast growth with N, which has nothing to do with the slow growth of the corresponding PDFs of the standard fit.
Such evident differences could activate reminiscences of mathematical statistics, where √N is a frequent result. In particular, the widely reported form of the Cramer–Rao–Frechet (CRF) bound was used to raise doubts [4] about the generality of these fast growths. To dissipate forever any possible doubt, and to defend our growth (linear or faster than linear) against any criticism, we are forced to recall the necessary details of the CRF method and demonstrate the consistency of our results with the CRF-inequality. To better illustrate what spurred us to this exploration of the CRF technology, we report some results of the Gaussian toy-model, with a few details of its construction. In principle, there is no need to know these details for the following mathematical developments, which are completely self-contained. Instead, the results of the toy-model are relevant to give a dimension to the practical effects of the versions of the CRF-inequality that will be developed. Those practical effects are invisible in the very simple aspect of the inequalities. At the same time, the mathematical results give an extended generality to the improvements of these fitting methods.
Figure 1 illustrates (Monte Carlo) simulations of a simple tracker with N detecting layers of identical technology (silicon microstrips), crossed at orthogonal incidence by a set of parallel straight tracks of minimum ionizing particles (MIPs). The reported estimator is the track direction (the tangent of a small angle). In this toy-model, the hit uncertainties are Gaussian distributions with two types of hit quality: one of very good quality, when the MIP crosses the detector near the strip borders (20% of the strip width), and the other of bad quality (80% of the strip width) around the strip center. The standard deviations of the hit Gaussian distributions are σ₁ = 0.18 and σ₂ = 0.018 strip widths (the strip width is 63 μm). (Models with non-identical variances are called heteroscedastic in statistics.) The simulated orthogonal straight tracks (produced and organized as described in Appendix B) are fitted with two different least-squares methods: one that uses only the hit positions, and a second that also uses the hit quality (weighted least-squares).
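A simulation of this kind fits in a few lines. The following Python sketch (not the original MATLAB code of ref. [3]) uses the σ values and the 20%/80% quality split quoted above; the number of tracks, the number of layers, and the plane positions are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_hits(n_tracks, n_layers, sigma_bad=0.18, sigma_good=0.018, p_good=0.2):
    # Plane positions centered so that sum(y) == 0, as assumed later in the text.
    y = np.linspace(-1.0, 1.0, n_layers)
    # Each hit is "good" (near a strip border) with probability p_good.
    good = rng.random((n_tracks, n_layers)) < p_good
    sigma = np.where(good, sigma_good, sigma_bad)
    # True track: x = 0 everywhere (beta = gamma = 0); hits are Gaussian smeared.
    x = rng.normal(0.0, sigma)
    return y, x, sigma

def fitted_slopes(y, x, sigma, weighted):
    # Least-squares slope estimate per track; weights w = 1/sigma^2 if weighted.
    w = 1.0 / sigma**2 if weighted else np.ones_like(sigma)
    sw = w.sum(axis=1)
    swy = (w * y).sum(axis=1)
    swyy = (w * y**2).sum(axis=1)
    swx = (w * x).sum(axis=1)
    swxy = (w * x * y).sum(axis=1)
    return (sw * swxy - swy * swx) / (sw * swyy - swy**2)

y, x, sigma = simulate_hits(20_000, 12)
print(fitted_slopes(y, x, sigma, weighted=False).std())  # standard least-squares
print(fitted_slopes(y, x, sigma, weighted=True).std())   # weighted fit: much narrower
```

The weighted fit exploits the rare good hits and produces a far narrower slope distribution, which is the effect Figure 1 displays.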
By standard least-squares we mean the usual least-squares in which all the inserted observations have identical variances (homoscedastic models). The standard least-squares, built for homoscedastic models, gives results almost insensitive to the presence of the good hits when applied to heteroscedastic models: the maximums of the empirical PDFs show a growth very near √N (the blue lines of Figure 1). Instead, the weighted least-squares, in this heteroscedastic model, shows an almost linear growth of the maximums of the empirical PDFs. Translated into mean variances, the standard fit shows a 1/N trend, while the weighted fit shows a 1/N² trend. It is just this 1/N² trend that could be considered a violation of the CRF-bound as reported in many books and lecture notes. However, the main part of ref. [3] deals with physical models with Cauchy-like tails and (in theory) infinite variance; hence, any finite lower bound on the variance, given by the CRF-inequality, is automatically satisfied by them. In any case, the evident similarity of the PDFs of the physical models with those of the Gaussian toy-model could extend to their linear growth doubts similar to those raised for the toy-model. Therefore, we must explore the CRF-inequality to expose the inappropriate extension [4] of the 1/N rule outside its domain of validity. For this, we will extend the CRF developments to heteroscedastic models and to the least-squares method for track fitting (for straight and curved tracks), with special attention to our Gaussian model. We did not find the last two points in the literature. The interest in the CRF-inequality extends to fields very far from track fitting, as illustrated by ref. [5] and the references therein. Even if the essential equations are very similar, and our tracks are stochastic processes, it is better to develop our approach in direct connection to our problem, with strict reference to the equations used in refs. [1,2,3]. If other authors did similar developments, we apologize for not citing them. In any case, the enormous statistical samples of tracks (of the order of 10¹² tracks per year) make careful studies of every detail of their fitting methods, and the selection of the best one, very important. The use of the appropriate fitting tools in heteroscedastic systems allows drastic improvements of the quality of the track reconstruction. The lucky model of ref. [3] is a first-order tool to account for the heteroscedasticity of silicon trackers.
Section 2 recalls the essential definitions and the demonstration of the simplest form of the CRF-inequality in the case of independent and identically distributed (i.i.d.) random variables. Section 3 contains a first model of heteroscedasticity, the weighted mean, and its efficiency (optimality) compared to the standard (arithmetic) mean. In Section 4 we apply the CRF-inequality to the least-squares for straight tracks with the corresponding estimator definitions; it is proved that in a heteroscedastic system the estimators of the standard least-squares have greater variances than the heteroscedastic ones. Section 5 deals with the momentum estimators in a simplified model of high-momentum tracks. The conclusions are in Section 6. Appendix A collects a few essential properties of the positive semi-definite matrices used in Section 4 and Section 5. Appendix B contains details about the Gaussian toy-model.

2. The Standard CRF-Inequality

The developments of this section follow, as far as possible, refs. [6,7]. As usual, an N-fold repetition of random observations { x₁, x₂, …, x_N } is considered, described by an identical probability density function (PDF) f(x,θ), where θ is a (scalar) parameter. The variables x = { x₁, x₂, …, x_N } are i.i.d. random variables. We will not comment on the absolute physical inconsistency of this assumption. The probability of the set x of i.i.d. random variables (the likelihood function) is:
$$ L(x,\theta) = f(x_1,\theta)\, f(x_2,\theta) \cdots f(x_N,\theta). \tag{1} $$
As usual f ( x , θ ) is positive, normalized, and supposed differentiable with respect to the θ -parameter. Let us define the random variable U ( x , θ ) :
$$ U(x,\theta) = \frac{\partial \ln L(x,\theta)}{\partial\theta} = \sum_{i=1}^{N} \frac{\partial \ln f(x_i,\theta)}{\partial\theta}. \tag{2} $$
The following developments require differentiating the integrals of f(x,θ) and inverting the order of integration and differentiation. The models that allow these operations are called regular models, and they imply strict analytical conditions. A necessary condition for a regular model is the independence of the range (support) of f(x,θ) from the θ-parameter. The normalization of f(x,θ) gives:
$$ \int L(x,\theta)\,dx = \int f(x_1,\theta)\, f(x_2,\theta) \cdots f(x_N,\theta)\,dx_1\, dx_2 \cdots dx_N = 1. \tag{3} $$
Differentiating Equation (3) with respect to θ and inverting the integral with the differentiation gives:
$$ 0 = \int \frac{\partial L(x,\theta)}{\partial\theta}\,dx = \sum_{i=1}^{N}\int \frac{\partial f(x_i,\theta)}{\partial\theta}\,dx_i = \int \frac{\partial \ln L(x,\theta)}{\partial\theta}\, L(x,\theta)\,dx = \sum_{i=1}^{N}\int \frac{\partial \ln f(x_i,\theta)}{\partial\theta}\, f(x_i,\theta)\,dx_i. \tag{4} $$
The last two members of Equation (4) follow from a property of the logarithmic derivative, and they imply that the mean value of U(x,θ) is zero. Its mean square, instead, is different from zero:
$$ \int U(x,\theta)^2\, L(x,\theta)\,dx = \sum_{i=1}^{N}\int \Big(\frac{\partial \ln f(x_i,\theta)}{\partial\theta}\Big)^2 f(x_i,\theta)\,dx_i = N \int \Big(\frac{\partial \ln f(x,\theta)}{\partial\theta}\Big)^2 f(x,\theta)\,dx. \tag{5} $$
The first term of Equation (5) is called Fisher information of the sample, the last integral is the Fisher information contained in a single observation.
For i.i.d. random variables, the Fisher information is N-times the Fisher information of a single observation.
This originates the 1/N factor in the CRF-bound. It will be evident that for non-i.i.d. (heteroscedastic) random variables this factorization disappears. We underline the i.i.d. assumption in Equation (5) because this condition is often a silent standard assumption (as in ref. [6]); it can only be recognized by an accurate re-reading of the entire demonstration, and removed only by redoing the demonstration without the i.i.d. assumption.
If f ( x , θ ) is twice differentiable with respect to θ , the last term of Equation (4) gives the following identity:
$$ 0 = \int \Big(\frac{\partial \ln f(x,\theta)}{\partial\theta}\Big)^2 f(x,\theta)\,dx + \int \frac{\partial^2 \ln f(x,\theta)}{\partial\theta^2}\, f(x,\theta)\,dx. \tag{6} $$
The Fisher information i ( θ ) for a single observation becomes:
$$ i(\theta) = \int \Big(\frac{\partial \ln f(x,\theta)}{\partial\theta}\Big)^2 f(x,\theta)\,dx = -\int \frac{\partial^2 \ln f(x,\theta)}{\partial\theta^2}\, f(x,\theta)\,dx. \tag{7} $$
The last form of i ( θ ) is particularly effective for Gaussian PDF.

2.1. The CRF-Inequality

At this point it is easy to obtain the CRF-inequality for these i.i.d. random variables. Defining an estimator T(x) = T(x₁, x₂, …, x_N) and its mean value:
$$ \int T(x)\, L(x,\theta)\,dx = \tau(\theta), \tag{8} $$
differentiating Equation (8) with respect to θ we obtain:
$$ \tau'(\theta) = \int T(x)\,\frac{\partial \ln L(x,\theta)}{\partial\theta}\, L(x,\theta)\,dx = \int \big[T(x)-\tau(\theta)\big]\,\frac{\partial \ln L(x,\theta)}{\partial\theta}\, L(x,\theta)\,dx = \int \Big[\hat{T}(x)\sqrt{L(x,\theta)}\Big]\Big[\frac{\partial \ln L(x,\theta)}{\partial\theta}\sqrt{L(x,\theta)}\Big]\,dx, \qquad \hat{T}(x) = T(x)-\tau(\theta). \tag{9} $$
The form of the last integral is allowed by the positivity of L ( x , θ ) . Due to Equation (4), the subtraction of τ ( θ ) from T ( x ) does not modify the integral in Equation (9). The Cauchy-Schwarz inequality gives:
$$ [\tau'(\theta)]^2 \;\le\; \int \hat{T}(x)^2\, L(x,\theta)\,dx \int \Big(\frac{\partial \ln L(x,\theta)}{\partial\theta}\Big)^2 L(x,\theta)\,dx \;=\; N\, i(\theta) \int \hat{T}(x)^2\, L(x,\theta)\,dx, \tag{10} $$
or in its final form:
$$ \int \hat{T}(x)^2\, L(x,\theta)\,dx \;\ge\; \frac{[\tau'(\theta)]^2}{N\, i(\theta)}. \tag{11} $$
The variance of T(x) cannot be lower than the right-hand side of Equation (11). This is the well-known CRF-inequality that raised doubts in some of our readers. When the CRF-inequality becomes an identity, the estimator is called efficient.

2.2. Efficient Estimators

The Cauchy–Schwarz inequality becomes an equality if and only if T̂(x) and U(x,θ) are linearly dependent:
$$ T(x) - \tau(\theta) = a(\theta)\, U(x,\theta). \tag{12} $$
Hence, if an efficient estimator exists, it is a function of the model. The logarithmic derivatives contained in U(x,θ) allow the discovery of efficient estimators only in exponential models.

2.3. The Standard Mean with an i.i.d. Gaussian Model

In a Gaussian model, the PDF f ( x , θ ) , the mean value τ ( θ ) and the Fisher information i ( θ ) are:
$$ f(x,\theta) = \frac{\exp\big[-(x-\theta)^2/2\sigma^2\big]}{\sqrt{2\pi}\,\sigma}, \qquad \tau(\theta) = \int T(x)\, L(x,\theta)\,dx = \theta, \qquad i(\theta) = -\int \frac{\partial^2 \ln f(x,\theta)}{\partial\theta^2}\, f(x,\theta)\,dx = \frac{1}{\sigma^2}. \tag{13} $$
The efficient estimator of this model (the arithmetic mean), as defined by Equation (12), is given by:
$$ U(x,\theta) = \frac{1}{\sigma^2}\sum_{j=1}^{N} x_j - \frac{N}{\sigma^2}\,\theta, \qquad \frac{1}{N\, i(\theta)}\, U(x,\theta) = T_e(x) - \theta, \qquad T_e(x) = \frac{1}{N}\sum_{j=1}^{N} x_j. $$
Its unbiased form T̂_e(x) = T_e(x) − θ, inserted in Equation (11), gives the identity σ²/N = σ²/N.
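These statements can be checked with a minimal Monte Carlo sketch; the values of σ, θ, N and the number of trials below are arbitrary illustrative choices, not values from the paper:

```python
import numpy as np

sigma, theta, N, trials = 2.0, 0.5, 10, 200_000

rng = np.random.default_rng(1)
x = rng.normal(theta, sigma, size=(trials, N))

# Score U(x, theta) = sum_i (x_i - theta) / sigma^2 for the i.i.d. Gaussian model.
U = (x - theta).sum(axis=1) / sigma**2
print(U.mean())               # ~0: the score has zero mean, Equation (4)
print(U.var(), N / sigma**2)  # Fisher information of the sample, Equation (5)

# The arithmetic mean attains the CRF bound sigma^2 / (N i) = sigma^2 / N.
T = x.mean(axis=1)
print(T.var(), sigma**2 / N)  # the two numbers agree within Monte Carlo error
```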

3. CRF-Inequality for Heteroscedastic Models

It has long been known that different observations never have truly identical PDFs. The universe would be very simple if observations were i.i.d.: in that case the precision of a measurement could be improved to any level by simple repetition and averaging of the results. Gauss knew this problem very well, and in his paper of 1823 he extended his least-squares method to handle observations with different PDFs. Unfortunately, the standard books of mathematical statistics do not cover this argument. However, to complete our confutation of the criticism directed at ref. [3], we must venture into this field. As in the previous section, we suppose a set of observations obtained by an array of different detectors, as in a tracker. Even if all the detectors are built with identical technology, differences are always present. Each detector is anisotropic by construction, optimized for the measurement of positions. The signal spreads in a different, recognizable way for each hit point. Moreover, each observation contains quantum mechanical interactions that add further complications. Nevertheless, a PDF for each observation can be defined (ref. [1]), even if very different from the form required for the CRF-inequality. Here, the likelihood of a set of heteroscedastic observations becomes:
$$ L_{1,2,\dots,N}(x,\theta) = f_1(x_1,\theta)\, f_2(x_2,\theta) \cdots f_N(x_N,\theta). \tag{14} $$
The evident difference is an index attached to each PDF. Instead of the long PDFs of refs. [1,2], we will use an easy model. The simple model of ref. [3] is a heteroscedastic Gaussian model (independent, non-identically distributed Gaussian random variables); hence, it has all the analytical properties required for the CRF-inequality. Again, U_{1,2,…,N}(x,θ) is defined as:
$$ U_{1,2,\dots,N}(x,\theta) = \frac{\partial \ln L_{1,2,\dots,N}(x,\theta)}{\partial\theta} = \sum_{i=1}^{N} \frac{\partial \ln f_i(x_i,\theta)}{\partial\theta}. \tag{15} $$
Even the Fisher information acquires a set of indices,
$$ \int U_{1,2,\dots,N}(x,\theta)^2\, L_{1,2,\dots,N}(x,\theta)\,dx = \sum_{j=1}^{N}\int \Big(\frac{\partial \ln f_j(x_j,\theta)}{\partial\theta}\Big)^2 f_j(x_j,\theta)\,dx_j = \sum_{j=1}^{N} i_j(\theta), \tag{16} $$
now the PDFs of the observations are all different, as are the Fisher informations of the single observations. The factorization of Equation (5) disappears, together with its N factor and its supposed generality. All the other equations are formally identical up to the first inequality of Equation (10) (apart from an additional index on the PDFs). With the new Fisher information, the CRF-inequality for an unbiased estimator T̂_{1,2,…,N}(x) becomes:
$$ \int \hat{T}_{1,2,\dots,N}(x)^2\, L_{1,2,\dots,N}(x,\theta)\,dx \;\ge\; \frac{\big[\tau'_{1,2,\dots,N}(\theta)\big]^2}{\sum_{j=1}^{N} i_j(\theta)}. \tag{17} $$
Supposing a large set of possible PDFs (such as those of ref. [1]), the estimator T_{1,2,…,N}(x) must keep track of the types and the order of the PDFs it contains. It must be substituted with T_{j₁,j₂,…,j_N}(x), and similarly for the likelihood function L_{j₁,j₂,…,j_N}(x,θ) and the Fisher information. In this extended form the CRF-inequality becomes:
$$ \int \hat{T}_{j_1,j_2,\dots,j_N}(x)^2\, L_{j_1,j_2,\dots,j_N}(x,\theta)\,dx \;\ge\; \frac{\big[\tau'_{j_1,j_2,\dots,j_N}(\theta)\big]^2}{i_{j_1,j_2,\dots,j_N}(\theta)}. \tag{18} $$
The efficient estimators have variance identical to the CRF-bound. Again, the logarithmic derivatives of Equation (15) limit the definition of efficient estimators to exponential PDFs.

3.1. The Weighted Mean in Heteroscedastic Gaussian Models

Let us consider the standard mean T^a_{1,2,…,N}(x) = Σ_{i=1}^{N} x_i / N as an estimator in heteroscedastic Gaussian models. The PDFs f_j(x,θ), the mean value τ_{1,2,…,N}(θ) and the Fisher information i_j(θ) due to the j-th PDF are:
$$ f_j(x,\theta) = \frac{\exp\big[-(x-\theta)^2/2\sigma_j^2\big]}{\sqrt{2\pi}\,\sigma_j}, \qquad \tau_{1,2,\dots,N}(\theta) = \int T^{a}_{1,2,\dots,N}(x)\, L_{1,2,\dots,N}(x,\theta)\,dx = \theta, $$
$$ i_j(\theta) = -\int \frac{\partial^2 \ln f_j(x,\theta)}{\partial\theta^2}\, f_j(x,\theta)\,dx = \frac{1}{\sigma_j^2}, \qquad \int \Big(\sum_{i=1}^{N}\frac{x_i}{N} - \theta\Big)^2 L_{1,2,\dots,N}(x,\theta)\,dx = \frac{\sum_{j=1}^{N}\sigma_j^2}{N^2}. \tag{19} $$
With the variance of T^a, the CRF-inequality is:
$$ \frac{\sum_{j=1}^{N}\sigma_j^2}{N^2} \;>\; \frac{1}{\sum_{j=1}^{N} 1/\sigma_j^2}. \tag{20} $$
The equality is impossible by the heteroscedasticity assumption (identical σ_j are excluded). Let us extract from Equation (15) an efficient estimator, defined (if it exists) as usual:
$$ T_{1,2,\dots,N}(x) - \tau_{1,2,\dots,N}(\theta) = a(\theta)\, U_{1,2,\dots,N}(x,\theta). \tag{21} $$
With a ( θ ) = 1 / [ j = 1 N 1 / σ j 2 ] and τ 1 , 2 , N = θ it is:
$$ a(\theta)\sum_{i=1}^{N}\frac{\partial \ln f_i(x_i,\theta)}{\partial\theta} = \frac{1}{\sum_{j=1}^{N} 1/\sigma_j^2}\,\sum_{i=1}^{N}\frac{x_i}{\sigma_i^2} - \theta, \qquad T^{e}_{1,2,\dots,N}(x) = \frac{\sum_{i=1}^{N} x_i/\sigma_i^2}{\sum_{j=1}^{N} 1/\sigma_j^2}. \tag{22} $$
This estimator is proportional to ∂ln L_{1,2,…,N}(x,θ)/∂θ and transforms the Cauchy–Schwarz inequality of Equation (18) into an identity. It is easy to verify, even without the CRF-inequality, that the variance of the standard mean is always greater than the variance of the weighted mean:
$$ \frac{\sum_{j=1}^{N}\sigma_j^2}{N^2} \;>\; \frac{1}{\sum_{j=1}^{N} 1/\sigma_j^2} \qquad\Longleftrightarrow\qquad \Big(\sum_{j=1}^{N}\sigma_j^2\Big)\Big(\sum_{j=1}^{N}\frac{1}{\sigma_j^2}\Big) \;>\; N^2. \tag{23} $$
The second relation is the Cauchy–Schwarz inequality for the two vectors { σ_j } and { 1/σ_j }. The equality is impossible because { σ_j } and { 1/σ_j } can only be proportional in the trivial case of identical σ_j, which heteroscedasticity excludes. Thus, in heteroscedastic models, the weighted mean always has a smaller variance than the standard mean, for regular and irregular models.
Hence, each combination of observations from our Gaussian model satisfies the CRF-inequality. No 1/N factor is present in the equation, because each single observation has a different Fisher information; the case of identical observations is excluded a priori. Moreover, the mean of the variances of the standard means over a set of random sequences { j₁, j₂, …, j_N }, not all identical, is higher than the mean of the variances of the corresponding weighted means. The absence of 1/N factors in the variance of the weighted mean allows fast decreases with N that are impossible for the variance of the standard mean.
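A numerical sketch of this comparison, with a hypothetical heteroscedastic chain built from the two σ values of the toy model (the chain composition and sample size are illustrative choices):

```python
import numpy as np

# A heteroscedastic chain with the two sigma values of the toy model.
sigmas = np.array([0.18, 0.018, 0.18, 0.18, 0.018, 0.18])
N = len(sigmas)

var_standard = (sigmas**2).sum() / N**2    # variance of the arithmetic mean
crf_bound = 1.0 / (1.0 / sigmas**2).sum()  # CRF bound for this chain

# Cauchy-Schwarz form of the inequality: (sum sigma^2)(sum 1/sigma^2) > N^2.
assert (sigmas**2).sum() * (1.0 / sigmas**2).sum() > N**2

# Monte Carlo check: the weighted mean attains the bound, the standard mean does not.
rng = np.random.default_rng(2)
x = rng.normal(0.0, sigmas, size=(200_000, N))
w = 1.0 / sigmas**2
t_std = x.mean(axis=1)
t_wls = (x * w).sum(axis=1) / w.sum()
print(t_std.var(), var_standard)  # agree; far above the bound
print(t_wls.var(), crf_bound)     # agree with the CRF bound
```

The two good hits dominate the weighted mean, so its variance sits orders of magnitude below that of the arithmetic mean.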

3.2. The Weighted Mean in a Homoscedastic Model

To prove another aspect of the CRF-inequality, we use the weighted mean outside the model where it is defined (heteroscedasticity) and apply it to a homoscedastic model. In this case the standard mean is the efficient estimator, and the CRF-inequality states that the variance of the weighted mean is greater than the variance of the standard mean. It is easy to illustrate this fact (the reverse of the previous equations):
$$ \int \big(T^{e}_{1,2,\dots,N}(x) - \theta\big)^2 L(x,\theta)\,dx = \sigma^2\,\frac{\sum_{j} 1/\sigma_j^4}{\big(\sum_{l} 1/\sigma_l^2\big)^2} \;>\; \frac{\sigma^2}{N} \qquad\Longleftrightarrow\qquad N \sum_{j}\frac{1}{\sigma_j^4} \;>\; \Big(\sum_{l}\frac{1}{\sigma_l^2}\Big)^2. \tag{24} $$
The last inequality is again a Cauchy–Schwarz inequality, and the equality is excluded. Thus, an estimator that is efficient in one model always has, outside that model, a greater variance than the efficient estimator of the new model.
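A quick numerical illustration: homoscedastic data averaged with a mismatched, heteroscedastic set of weights (the weight values below are hypothetical, chosen for illustration only):

```python
import numpy as np

sigma = 0.1  # the data are truly homoscedastic with this sigma
# Mismatched weights built from a hypothetical heteroscedastic chain.
fake_sigmas = np.array([0.18, 0.018, 0.18, 0.018, 0.18])
w = 1.0 / fake_sigmas**2

rng = np.random.default_rng(4)
x = rng.normal(0.0, sigma, size=(200_000, fake_sigmas.size))

t_std = x.mean(axis=1)                 # the efficient estimator here
t_wls = (x * w).sum(axis=1) / w.sum()  # weighted mean, outside its model

# Analytic variances: sigma^2 / N versus sigma^2 * sum(w^2) / (sum w)^2.
print(t_std.var(), sigma**2 / x.shape[1])
print(t_wls.var(), sigma**2 * (w**2).sum() / w.sum()**2)  # the larger one
```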

4. CRF-Bound for Heteroscedastic Least-Squares of Straight Tracks

The application of the CRF-inequality to the heteroscedastic least-squares fit of straight tracks adds further complications: two parameters must be handled. The first parameter is the constant β, the impact point of the track in the reference plane. The second is γ y_i, the transverse shift of the track in the plane at distance y_i from the reference plane. Here too, heteroscedasticity eliminates any easy rule for the increase with the number N of observations, for two different reasons. One is again the differences among the σ_i; the new one is due to the constraint of the fixed length of the tracker: each insertion of a new detecting plane requires a repositioning of all the others. We will study our Gaussian model with the condition Σ_{j=1}^{N} y_j = 0, which introduces some simplifications in the standard least-squares equations. The likelihood function is similar to that of Equation (14), but now the order of the PDFs must be conserved. The parameter y_j orders the measuring planes:
$$ L_{1,2,\dots,N}(x,\beta,\gamma) = f_1(x_1,\beta,y_1\gamma)\, f_2(x_2,\beta,y_2\gamma) \cdots f_N(x_N,\beta,y_N\gamma), \tag{25} $$
$$ f_j(x,\beta,y_j\gamma) = \frac{\exp\big[-(x-\beta-y_j\gamma)^2/2\sigma_j^2\big]}{\sqrt{2\pi}\,\sigma_j}, \qquad \tau^{\gamma}_{1,2,\dots,N}(\beta,\gamma) = \int T^{\gamma}_{1,2,\dots,N}(x)\, L_{1,2,\dots,N}(x,\beta,\gamma)\,dx = \gamma, $$
$$ \tau^{\beta}_{1,2,\dots,N}(\beta,\gamma) = \int T^{\beta}_{1,2,\dots,N}(x)\, L_{1,2,\dots,N}(x,\beta,\gamma)\,dx = \beta. \tag{26} $$
The function U_{1,2,…,N}(x,θ) becomes the 2×1 matrix U_{1,2,…,N}(x,β,γ):
$$ U_{1,2,\dots,N}(x,\beta,\gamma) = \begin{pmatrix} \dfrac{\partial \ln L_{1,2,\dots,N}(x,\beta,\gamma)}{\partial\beta} \\[2ex] \dfrac{\partial \ln L_{1,2,\dots,N}(x,\beta,\gamma)}{\partial\gamma} \end{pmatrix}. \tag{27} $$
The Fisher information I_{1,2,…,N} becomes the 2×2 matrix:
$$ I_{1,2,\dots,N} = -\int dx\, L_{1,2,\dots,N}(x,\beta,\gamma) \begin{pmatrix} \dfrac{\partial^2 \ln L}{\partial\beta^2} & \dfrac{\partial^2 \ln L}{\partial\beta\,\partial\gamma} \\[2ex] \dfrac{\partial^2 \ln L}{\partial\gamma\,\partial\beta} & \dfrac{\partial^2 \ln L}{\partial\gamma^2} \end{pmatrix}, \tag{28} $$
$$ I_{1,2,\dots,N} = \begin{pmatrix} \displaystyle\sum_{j=1}^{N}\frac{1}{\sigma_j^2} & \displaystyle\sum_{j=1}^{N}\frac{y_j}{\sigma_j^2} \\[2ex] \displaystyle\sum_{j=1}^{N}\frac{y_j}{\sigma_j^2} & \displaystyle\sum_{j=1}^{N}\frac{y_j^2}{\sigma_j^2} \end{pmatrix}. \tag{29} $$
Its inverse is:
$$ I^{-1}_{1,2,\dots,N} = \frac{1}{\mathrm{Det}\,\{I_{1,2,\dots,N}\}} \begin{pmatrix} \displaystyle\sum_{j=1}^{N}\frac{y_j^2}{\sigma_j^2} & -\displaystyle\sum_{j=1}^{N}\frac{y_j}{\sigma_j^2} \\[2ex] -\displaystyle\sum_{j=1}^{N}\frac{y_j}{\sigma_j^2} & \displaystyle\sum_{j=1}^{N}\frac{1}{\sigma_j^2} \end{pmatrix}. \tag{30} $$
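For a concrete geometry, the Fisher matrix and its inverse can be evaluated directly; the plane positions and the σ pattern below are arbitrary illustrative choices:

```python
import numpy as np

# Illustrative geometry: 8 planes with sum(y) == 0, alternating hit quality.
y = np.linspace(-1.0, 1.0, 8)
sigma = np.where(np.arange(8) % 2 == 0, 0.18, 0.018)
w = 1.0 / sigma**2

# Fisher matrix of the two-parameter Gaussian track model.
I = np.array([[w.sum(), (w * y).sum()],
              [(w * y).sum(), (w * y**2).sum()]])

# Its inverse, written out with the determinant as in the text.
det = w.sum() * (w * y**2).sum() - (w * y).sum()**2
I_inv = np.array([[(w * y**2).sum(), -(w * y).sum()],
                  [-(w * y).sum(), w.sum()]]) / det

print(I_inv)                                 # diagonal = CRF bounds for beta, gamma
print(np.allclose(I_inv, np.linalg.inv(I)))  # True: matches the numerical inverse
```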
The CRF-inequality assumes the form of a matrix inequality; we limit ourselves to the relations among the variances:
$$ \mathrm{Diag}\begin{pmatrix} \mathrm{Var}(\beta,\beta) & \mathrm{Cov}(\beta,\gamma) \\ \mathrm{Cov}(\gamma,\beta) & \mathrm{Var}(\gamma,\gamma) \end{pmatrix} \;\ge\; \mathrm{Diag}\; I^{-1}_{1,2,\dots,N}, \tag{31} $$
where V a r ( β , β ) and V a r ( γ , γ ) are the variances of any estimators for β and γ :
$$ \mathrm{Var}(\beta,\beta) = \int \big(T^{\beta}_{1,2,\dots,N}(x) - \beta\big)^2 L_{1,2,\dots,N}(x,\beta,\gamma)\,dx \;\ge\; \frac{\sum_{j=1}^{N} y_j^2/\sigma_j^2}{\sum_{j=1}^{N} y_j^2/\sigma_j^2 \,\sum_{k=1}^{N} 1/\sigma_k^2 - \big(\sum_{j=1}^{N} y_j/\sigma_j^2\big)^2}, \tag{32} $$
and:
$$ \mathrm{Var}(\gamma,\gamma) = \int \big(T^{\gamma}_{1,2,\dots,N}(x) - \gamma\big)^2 L_{1,2,\dots,N}(x,\beta,\gamma)\,dx \;\ge\; \frac{\sum_{j=1}^{N} 1/\sigma_j^2}{\sum_{j=1}^{N} y_j^2/\sigma_j^2 \,\sum_{k=1}^{N} 1/\sigma_k^2 - \big(\sum_{j=1}^{N} y_j/\sigma_j^2\big)^2}. \tag{33} $$
The Gaussian model is efficient because its variances are equal to the diagonal of I⁻¹ (here and in the following, the indices 1,2,…,N of the used PDFs will be understood). The unbiased efficient estimators of β and γ for the model of Equation (25) are given by I⁻¹U(x,β,γ), in analogy with the single-parameter case:
$$ U(x,\beta,\gamma) = \begin{pmatrix} T_1(x) \\ T_2(x) \end{pmatrix} - I \begin{pmatrix} \beta \\ \gamma \end{pmatrix}, \qquad I^{-1}\, U(x,\beta,\gamma) = I^{-1} \begin{pmatrix} T_1(x) \\ T_2(x) \end{pmatrix} - \begin{pmatrix} \beta \\ \gamma \end{pmatrix}. \tag{34} $$
The last equation allows the extraction of the formal expressions of the unbiased efficient estimators for this model (identical to those reported everywhere, for example in ref. [8]). The mean values of the products of logarithmic derivatives are obtained from the mean values of the double derivatives of a single logarithmic function (as in Equation (6)); due to Equation (28), this gives:
$$ \int U(x,\beta,\gamma)\, U(x,\beta,\gamma)^{\top}\, L(x,\beta,\gamma)\,dx = I, \qquad \int I^{-1} U(x,\beta,\gamma)\, U(x,\beta,\gamma)^{\top} I^{-1}\, L(x,\beta,\gamma)\,dx = I^{-1}. \tag{35} $$
The second equation follows from the first, which gives the Fisher information, and from the symmetry (I⁻¹)ᵀ = I⁻¹. The diagonal terms are the variances of the efficient estimators of β and γ. The privileged position of the Gaussian PDFs in the CRF-inequality is evident.

4.1. The Standard Least-Squares Equations with Heteroscedastic Likelihood

We call standard least-squares the estimators (of β and γ) obtained with a homoscedastic model. It is easy to prove that the standard least-squares estimators are efficient estimators for a Gaussian model with identical σ's: the extension of Equation (13) to the two-parameter case shows this property immediately. The condition Σ_{i=1}^{N} y_i = 0 simplifies the forms of the estimators. In the case of identical σ's, the CRF-inequality states that any other estimator has a greater variance than the efficient estimators. It is evident that the efficiency of an estimator is strictly bound to the likelihood model of its definition (in a given model, the efficient estimator of a parameter is unique). We could derive the forms of the standard least-squares estimators as the efficient estimators of the Gaussian model with identical σ_j, but the usual derivation is faster. The estimators of β and γ of the standard least-squares are:
$$ \sum_{i=1}^{N} (x_i - \beta - \gamma y_i) = 0 \;\Longrightarrow\; \hat{T}^{\beta}_{s}(x,\beta,\gamma) = \sum_{i=1}^{N}\frac{x_i}{N} - \beta, \qquad \sum_{i=1}^{N} (x_i - \beta - \gamma y_i)\, y_i = 0 \;\Longrightarrow\; \hat{T}^{\gamma}_{s}(x,\beta,\gamma) = \frac{\sum_{i=1}^{N} x_i y_i}{\sum_{i=1}^{N} y_i^2} - \gamma. \tag{36} $$
For an easy check of the forms of the estimators, we report the least-squares equations that give their expressions (after inserting Σ_i y_i = 0). The heteroscedastic likelihood L(x,β,γ) destroys their efficiency (optimality), and the new variances differ from those of the homoscedastic model (σ²/N and σ²/Σ_j y_j²):
$$ \int \hat{T}^{\beta}_{s}(x,\beta,\gamma)^2\, L(x,\beta,\gamma)\,dx = \frac{\sum_{i=1}^{N}\sigma_i^2}{N^2}, \tag{37} $$
$$ \int \hat{T}^{\gamma}_{s}(x,\beta,\gamma)^2\, L(x,\beta,\gamma)\,dx = \frac{\sum_{i=1}^{N}\sigma_i^2\, y_i^2}{\big(\sum_{i=1}^{N} y_i^2\big)^2}, \tag{38} $$
and, due to the differences from the variances of the efficient estimators, the CRF inequalities of Equations (32) and (33) force them to be greater. We will prove this property extensively, as a prototype of the demonstration [9] that will also be used for the momentum variances.
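Before the formal proof, the inequality between these variances and the CRF bounds of Equations (32) and (33) can be checked numerically on an illustrative chain (the geometry and σ pattern below are arbitrary choices):

```python
import numpy as np

# Illustrative chain: 10 planes with sum(y) == 0, a few good planes among bad ones.
y = np.linspace(-1.0, 1.0, 10)
sigma = np.where(np.arange(10) % 3 == 0, 0.018, 0.18)
w = 1.0 / sigma**2
det = w.sum() * (w * y**2).sum() - (w * y).sum()**2

# Variances of the standard least-squares estimators in the heteroscedastic model.
var_beta_std = (sigma**2).sum() / y.size**2
var_gamma_std = (sigma**2 * y**2).sum() / (y**2).sum()**2

# CRF bounds: the diagonal of the inverse Fisher matrix, Equations (32) and (33).
bound_beta = (w * y**2).sum() / det
bound_gamma = w.sum() / det

print(var_beta_std > bound_beta)    # True
print(var_gamma_std > bound_gamma)  # True
```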
First, we need the joint variance-covariance matrix of T̂_s^β, T̂_s^γ with the efficient estimators I⁻¹U(x,β,γ). This matrix is positive semi-definite (symmetric, with the variances on the diagonal). Let C be this matrix, T the column vector T = { T̂_s^β, T̂_s^γ }, and ⟨ ⟩ the average over the likelihood of Equation (25). The matrix I⁻¹ is the variance-covariance matrix of the efficient estimators (Equation (35)):
$$ C = \begin{pmatrix} \langle T\, T^{\top} \rangle & \langle T\, U^{\top} I^{-1} \rangle \\[1ex] \langle I^{-1} U\, T^{\top} \rangle & I^{-1} \end{pmatrix} \;\ge\; 0. \tag{39} $$
It is easy to show the following general property of the covariance of the unbiased estimator T̂_s^β with the efficient estimators. By definition, the mean value of an unbiased (centered) estimator is always zero, as is any of its derivatives:
$$ \frac{\partial}{\partial\beta}\int \big(T^{\beta}_{s}(x) - \beta\big)\, L(x,\beta,\gamma)\,dx = 0 = -1 + \int \big(T^{\beta}_{s}(x) - \beta\big)\,\frac{\partial L(x,\beta,\gamma)}{\partial\beta}\,dx. \tag{40} $$
The derivative of the likelihood contains the first component of U(x,β,γ) of Equation (27), so this correlation (the integral term) is equal to one. The derivative in γ of the mean value of T̂_s^β produces no such unit term, and the corresponding correlation is zero. The same holds for the other estimator T̂_s^γ. Hence the matrix ⟨T U^⊤⟩ is:
$$ \langle T\, U^{\top} \rangle = I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \langle U\, T^{\top} \rangle, \qquad C = \begin{pmatrix} \langle T\, T^{\top} \rangle & I^{-1} \\[1ex] I^{-1} & I^{-1} \end{pmatrix} \;\ge\; 0. $$
The matrix I⁻¹ is symmetric. The positive semi-definiteness of the C-matrix is conserved under any transformation AᵀC A, with A a real matrix of four rows and two columns (a few details about positive semi-definite matrices are reported in Appendix A):
$$ \begin{pmatrix} I_2 & -I_2 \end{pmatrix} C \begin{pmatrix} I_2 \\ -I_2 \end{pmatrix} = \langle T\, T^{\top} \rangle - I^{-1} \;\ge\; 0 \qquad\Longrightarrow\qquad \langle T\, T^{\top} \rangle \;\ge\; I^{-1}. \tag{41} $$
The last relation of Equation (41) is Equation (31) as a matrix relation; the equality is eliminated, being evidently impossible. From the definition of positive semi-definite matrices, it is straightforward to extract the relations among the variances. Equations (31)–(41) will be extended to the estimators of the momentum reconstruction.
Equation (41) was demonstrated with a heteroscedastic Gaussian model; however, the variances of T also remain greater for heteroscedastic irregular models with an equivalent set of { σ_j }. The efficient estimators of heteroscedastic models continue to be efficient even for homoscedastic models, converging to the homoscedastic estimators when all the σ_j become identical.
This set of demonstrations for heteroscedastic models is addressed to each given chain of PDFs with a defined order of the σ_j. The variances of Equations (32) and (33) are strongly modified by the differences among the { σ_j }, without any evident N limitation. Instead, the variances of the standard least-squares are weakly modified, and they retain explicit N limitations. For example, the variance of the β parameter has average σ_m²/N; Figure 1 illustrates these limits for the γ estimator (the blue distributions), and Σ_j y_j² cannot grow faster than N for the fixed length of the tracker. The large modifications of the variances of the chains of PDFs render the track fitting strongly dependent on other PDFs, for example those that select the chains among the set of possible chains. These PDFs are introduced by the properties of the measuring devices. As we said above, a binomial PDF suffices in the disordered tracker detector for the models of refs. [2,3]; just a mean disorder of half a strip (≈30 μm) is enough for this. The use of the standard least-squares method in heteroscedastic models brings enormous simplifications, because it does not need the knowledge of the σ_j for each observation. Unfortunately, it amounts to a large loss of resolution; our models of ref. [3] clearly illustrate the amplitude of this loss. In any case, the lucky model of ref. [3] can be an economic way to introduce effective σ_j.
For our pleasure, in Figure 2 we invented a faster growth than that of the Gaussian model of ref. [3]. The quality of the detectors increases with the number N of detecting layers: each inserted layer is better than the previous one. The blue straight line shows an evident deviation from a linear growth, with an almost parabolic growth (the variances going down as 1/N⁴). The blue distributions are practically insensitive to these changes. With an appropriate selection of the detector quality, many types of growth can be implemented.

4.2. Deviations from Optimality

Equations (41) prove the non-optimality of the standard least squares when it is used with a different (heteroscedastic) likelihood. The standard least squares is the efficient estimator for a homoscedastic Gaussian model; its use outside this condition of efficiency (optimality) destroys the optimality and increases the variances, with a degradation of the resolution, as illustrated by the simulations of refs. [1,2,3]. Even our heteroscedastic model suffers from a similar loss of optimality outside its range of validity. We underlined that the order and the values (whatever they are) of the set { σ j } in the likelihood define the model. Each deviation from the values or the order (or both) of the σ j implies an increase of the variance of the new estimator compared to the variance of the optimal (efficient) estimator. We used this property to prove the crucial correlation of the appropriate σ j with the hit j. If the ordered set { σ j } is randomized, destroying the correlation with the set of hits, the distributions of the estimators change drastically, becoming worse than those of the standard least squares. Instead, the differences between the heteroscedastic efficient estimator and suboptimal estimators, obtained with the elimination of some PDFs, rapidly converge toward zero when the number of suppressed PDFs is reduced. This rapid convergence is not observed for the standard least squares.
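The penalty for misplacing the weights can be made quantitative with the variance formula of a generic linear slope estimator. The layer positions and σ pattern below are our own illustrative assumptions, not the exact configuration of refs. [1,2,3]; the sketch compares matched weights, uniform weights (standard least squares), and the same weights attached to the wrong hits:

```python
# Variance of the weighted-LS slope estimator when the true hit errors are sigma_j
def slope_variance(y, sigma, w):
    sw = sum(w)
    ybar = sum(wj * yj for wj, yj in zip(w, y)) / sw
    den = sum(wj * (yj - ybar) ** 2 for wj, yj in zip(w, y))
    num = sum((wj * (yj - ybar)) ** 2 * sj ** 2
              for wj, yj, sj in zip(w, y, sigma))
    return num / den ** 2

n = 8
y = [i - (n - 1) / 2 for i in range(n)]                 # illustrative layer positions
sigma = [0.018 if i % 2 == 0 else 0.18 for i in range(n)]
w_right = [1.0 / s ** 2 for s in sigma]                 # weights matched to the hits
w_wrong = list(reversed(w_right))                       # same weights, wrong hits

v_right = slope_variance(y, sigma, w_right)
v_flat = slope_variance(y, sigma, [1.0] * n)            # standard least squares
v_wrong = slope_variance(y, sigma, w_wrong)
assert v_right < v_flat < v_wrong                       # misplaced weights are worst
```

Destroying the correlation between σ j and hit j concentrates the fit on the bad hits, so the randomized-weight estimator falls below even the standard least squares, as stated above.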

5. Inequality for Momentum Estimators

As anticipated above, the CRF-inequality will now be extended to the case of momentum reconstruction. We study a high-momentum charged particle moving in a tracker with a homogeneous magnetic field, with the path of the particle orthogonal to the field direction. Here, the segment of the circular path can be approximated with a parabolic path (algorithms for circular paths are described in ref. [10]). As for the straight tracks, we will compare the estimators of the standard least squares with those of a heteroscedastic model. The estimators of the standard least squares are efficient (optimal) in the homoscedastic Gaussian model.

5.1. The Momentum Estimators in Homoscedastic Gaussian Model

We use this approach to define our expressions for the estimators; these forms will then be used in heteroscedastic models. Our equations will be expressed compactly with matrices. The function L ( x , β , γ , η ) is the likelihood for this case: β and γ are as defined above, and η is the curvature of the track, proportional to 1 / p with p the track momentum:
$$
L(x,\beta,\gamma,\eta)=f_1(x_1,\beta,\gamma,\eta)\,f_2(x_2,\beta,\gamma,\eta)\cdots f_N(x_N,\beta,\gamma,\eta)
$$
$$
f_j(x_j,\beta,\gamma,\eta)=\frac{\exp\!\big[-\big(x_j-\beta-\gamma\,y_j-\eta\,y_j^2\big)^2/2\sigma^2\big]}{\sqrt{2\pi}\,\sigma}
$$
The index j orders the PDFs f j as the tracker layers. The equivalent of Equation (27) is:
$$
V(x,\beta,\gamma,\eta)=
\begin{pmatrix}
\dfrac{\partial \ln L(x,\beta,\gamma,\eta)}{\partial\beta}\\[1.2ex]
\dfrac{\partial \ln L(x,\beta,\gamma,\eta)}{\partial\gamma}\\[1.2ex]
\dfrac{\partial \ln L(x,\beta,\gamma,\eta)}{\partial\eta}
\end{pmatrix}
$$
The unbiased estimators for β , γ , η are obtained from V ( x , β , γ , η ) :
$$
V(x,\beta,\gamma,\eta)=\begin{pmatrix}T_a(x)\\T_b(x)\\T_c(x)\end{pmatrix}-R\begin{pmatrix}\beta\\\gamma\\\eta\end{pmatrix},
\qquad
R^{-1}V(x,\beta,\gamma,\eta)=R^{-1}\begin{pmatrix}T_a(x)\\T_b(x)\\T_c(x)\end{pmatrix}-\begin{pmatrix}\beta\\\gamma\\\eta\end{pmatrix}
$$
The vector R − 1 V ( x , β , γ , η ) contains the efficient estimators of this model; the matrix R is its Fisher information (with the condition Σ j y j = 0 and j = { 1 , 2 , … , N } ):
$$
R=\begin{pmatrix}
N & 0 & \sum_j y_j^2\\
0 & \sum_j y_j^2 & \sum_j y_j^3\\
\sum_j y_j^2 & \sum_j y_j^3 & \sum_j y_j^4
\end{pmatrix}\frac{1}{\sigma^2}\,.
$$
Equation (35), extended to this case, becomes:
$$
\int V(x,\beta,\gamma,\eta)\,V(x,\beta,\gamma,\eta)^{\top}\,L(x,\beta,\gamma,\eta)\,\mathrm{d}x=R
$$
$$
\int R^{-1}V(x,\beta,\gamma,\eta)\,V(x,\beta,\gamma,\eta)^{\top}R^{-1}\,L(x,\beta,\gamma,\eta)\,\mathrm{d}x=R^{-1}\,.
$$
The matrices R and R − 1 are symmetric and positive semi-definite. The second line of the last equation is the variance-covariance matrix of the efficient estimators. The joint variance-covariance matrix of a generic vector T of unbiased estimators of β , γ , η together with the efficient estimators is:
$$
K=\begin{pmatrix}
\langle T\,T^{\top}\rangle & \langle T\,V^{\top}R^{-1}\rangle\\
\langle R^{-1}V\,T^{\top}\rangle & R^{-1}
\end{pmatrix}\geq 0
$$
As defined, the symbol ⟨ ⟩ indicates an integral with the likelihood L . The cross correlations ⟨ V T ⊤ ⟩ follow the rules of Equation (39), giving now the 3 × 3 identity matrix I 3 and reducing the off-diagonal blocks of K to R − 1 . With a transformation of the type A ⊤ K A , with A a 6 × 3 real matrix, the fundamental inequality is obtained:
$$
\begin{pmatrix}I_3 & -I_3\end{pmatrix}
\begin{pmatrix}\langle T\,T^{\top}\rangle & R^{-1}\\ R^{-1} & R^{-1}\end{pmatrix}
\begin{pmatrix}I_3\\ -I_3\end{pmatrix}
=\langle T\,T^{\top}\rangle-R^{-1}\geq 0
$$
The equality is obtained only for the efficient estimators of this model. Therefore, any other set of estimators has a greater variance in this model, and any heteroscedastic model has a greater variance than the efficient model. The reason is connected to the modifications that the likelihood L introduces in the variances: these modifications always increase the variances. In this model, the parameter σ disappears from the expressions for the estimators. The form R of the Fisher information depends on N; this limits the growth with the number of detecting layers, as illustrated in ref. [2].
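A minimal numerical sketch of the efficient estimator R − 1 ( T a , T b , T c ) ⊤ follows; the layer positions and track parameters are our own illustrative choices. Since the common factor 1 / σ 2 cancels between R − 1 and the T components, the code builds both without σ; on noiseless parabolic hits the parameters are recovered up to rounding:

```python
def solve3(m, b):
    # solve a 3x3 linear system m * theta = b by Gauss-Jordan elimination with pivoting
    a = [row[:] + [bi] for row, bi in zip(m, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(3):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][3] / a[i][i] for i in range(3)]

# symmetric layer positions, so that sum_j y_j = 0 as required for R
y = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
beta, gamma, eta = 0.1, 0.02, 0.005                    # illustrative track parameters
x = [beta + gamma * yj + eta * yj ** 2 for yj in y]    # noiseless parabolic hits

# R and (T_a, T_b, T_c); the common factor 1/sigma^2 cancels in R^-1 T
s = lambda k: sum(yj ** k for yj in y)
R = [[len(y), 0.0, s(2)], [0.0, s(2), s(3)], [s(2), s(3), s(4)]]
T = [sum(x),
     sum(xj * yj for xj, yj in zip(x, y)),
     sum(xj * yj ** 2 for xj, yj in zip(x, y))]
est = solve3(R, T)   # recovers (beta, gamma, eta) up to rounding
```

The same structure reappears unchanged in the heteroscedastic case, with the sums reweighted by 1 / σ j 2 .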

5.2. The Momentum Estimators in Heteroscedastic Gaussian Model

In essence, this model can be obtained from the corresponding homoscedastic model with a set of formal substitutions:
$$
L(x,\beta,\gamma,\eta)=f_1(x_1,\beta,\gamma,\eta)\,f_2(x_2,\beta,\gamma,\eta)\cdots f_N(x_N,\beta,\gamma,\eta)
$$
$$
f_j(x_j,\beta,\gamma,\eta)=\frac{\exp\!\big[-\big(x_j-\beta-\gamma\,y_j-\eta\,y_j^2\big)^2/2\sigma_j^2\big]}{\sqrt{2\pi}\,\sigma_j}
$$
The index j orders the PDFs as the detecting layers and selects a different σ j for each observation. The other differences are in the definitions of the vector components T 1 , T 2 , T 3 , which differ from the previous T a , T b , T c , and in the Fisher information I .
$$
U(x,\beta,\gamma,\eta)=\begin{pmatrix}T_1(x)\\T_2(x)\\T_3(x)\end{pmatrix}-I\begin{pmatrix}\beta\\\gamma\\\eta\end{pmatrix},
\qquad
\begin{pmatrix}T_1(x)\\T_2(x)\\T_3(x)\end{pmatrix}=
\begin{pmatrix}\sum_j x_j/\sigma_j^2\\[0.5ex]\sum_j x_j\,y_j/\sigma_j^2\\[0.5ex]\sum_j x_j\,y_j^2/\sigma_j^2\end{pmatrix}
$$
The Fisher information becomes:
$$
I=\begin{pmatrix}
\sum_j 1/\sigma_j^2 & \sum_j y_j/\sigma_j^2 & \sum_j y_j^2/\sigma_j^2\\
\sum_j y_j/\sigma_j^2 & \sum_j y_j^2/\sigma_j^2 & \sum_j y_j^3/\sigma_j^2\\
\sum_j y_j^2/\sigma_j^2 & \sum_j y_j^3/\sigma_j^2 & \sum_j y_j^4/\sigma_j^2
\end{pmatrix}
$$
The efficient unbiased estimators for the likelihood L ( x , β , γ , η ) are contained in the vector I − 1 U ( x , β , γ , η ) . Even in this case, the Fisher information has no obvious N-dependence. The variance-covariance matrix for any vector T of unbiased estimators (such as T = ( T a , T b , T c ) ) can be transformed as described, giving the inequality:
$$
\begin{pmatrix}I_3 & -I_3\end{pmatrix}
\begin{pmatrix}\langle T\,T^{\top}\rangle & I^{-1}\\ I^{-1} & I^{-1}\end{pmatrix}
\begin{pmatrix}I_3\\ -I_3\end{pmatrix}
=\langle T\,T^{\top}\rangle-I^{-1}\geq 0
$$
The symbol <   > indicates an integral on x with the likelihood of this model L ( x , β , γ , η ) . The equality to zero is obtained only when the vector T is I 1 U ( x , β , γ , η ) . For any other unbiased estimator, that difference is greater than zero (in the sense of the positive semi-definite matrices).
These last forms of demonstration easily extend the CRF-inequality to any number M of estimators.
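The inequality ⟨ T T ⊤ ⟩ − I − 1 ≥ 0 can also be checked numerically for the standard least-squares parabola estimator under heteroscedastic noise. With the design matrix A (rows ( 1 , y j , y j 2 ) ), the covariance of the standard fit is ( A ⊤ A ) − 1 A ⊤ Σ A ( A ⊤ A ) − 1 with Σ = diag ( σ j 2 ) , while the CRF bound is ( A ⊤ Σ − 1 A ) − 1 . The layers and σ values below are illustrative assumptions of ours:

```python
def mat_mul(a, b):
    # plain matrix product for lists of lists
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_inv3(m):
    # invert a 3x3 matrix by solving against the identity columns
    def solve(mm, b):
        a = [row[:] + [bi] for row, bi in zip(mm, b)]
        for c in range(3):
            p = max(range(c, 3), key=lambda r: abs(a[r][c]))
            a[c], a[p] = a[p], a[c]
            for r in range(3):
                if r != c:
                    f = a[r][c] / a[c][c]
                    a[r] = [u - f * v for u, v in zip(a[r], a[c])]
        return [a[i][3] / a[i][i] for i in range(3)]
    cols = [solve(m, [1.0 if i == j else 0.0 for i in range(3)]) for j in range(3)]
    return [[cols[j][i] for j in range(3)] for i in range(3)]

y = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]              # illustrative layer positions
sigma = [0.018, 0.18, 0.018, 0.18, 0.018, 0.18]    # heteroscedastic hit errors
A = [[1.0, yj, yj ** 2] for yj in y]               # design matrix of the parabola
At = [[A[r][c] for r in range(len(y))] for c in range(3)]
Sg = [[sigma[i] ** 2 if i == j else 0.0 for j in range(len(y))] for i in range(len(y))]
W = [[1.0 / sigma[i] ** 2 if i == j else 0.0 for j in range(len(y))] for i in range(len(y))]

AtA_inv = mat_inv3(mat_mul(At, A))
cov_std = mat_mul(mat_mul(AtA_inv, mat_mul(At, mat_mul(Sg, A))), AtA_inv)
fisher_inv = mat_inv3(mat_mul(At, mat_mul(W, A)))  # CRF bound I^-1

for k in range(3):                                 # <T T'> - I^-1 >= 0 on the diagonal
    assert cov_std[k][k] >= fisher_inv[k][k]
```

The diagonal of the difference is non-negative for all three parameters, including the curvature η driving the momentum resolution.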

6. Conclusions

The Cramer–Rao–Frechet inequality is applied to a set of heteroscedastic Gaussian models to prove the absence of any limitation introduced by this fundamental inequality. Instead, the inequality shows that, in heteroscedastic models, the estimators of the standard least squares always have greater variances than those given by the weighted least squares. The variances of the standard least squares conserve an approximate 1 / N form that implies limitations in the growth of the distributions with the number N of detecting layers. No similar limitations are present in the weighted least squares, which is dominated by the differences of the weights 1 / σ j 2 . For identical σ j , the estimators converge to those of the standard least squares. The inequality is calculated also for the momentum estimators. Again, the appropriate heteroscedastic estimators have lower variances than those of the standard least squares. Given the loss of resolution implied by greater variances, the use of the standard least squares with heteroscedastic data must be avoided. The lucky model may be helpful in silicon detectors to attach approximate σ j to each observation.

Author Contributions

Formal analysis, G.L. and G.E.L.; Software, G.E.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MIP: Minimum Ionizing Particle
PDF: Probability Density Function
CRF: Cramer–Rao–Frechet
i.i.d.: independent and identically distributed

Appendix A

Let us add a few details about the variance-covariance matrix and the cross-covariance matrix. By their definitions, these matrices are positive semi-definite. They are symmetric, with the positive variances on the principal diagonal; the off-diagonal elements are limited by Cauchy-Schwarz inequalities. If the symmetric matrix C is positive semi-definite, it must satisfy z ⊤ C z ≥ 0 for any real vector z. From these definitions it follows that, if A is a real m × n matrix with n < m , the matrix K given by A ⊤ C A is again an n × n positive semi-definite matrix. The diagonal elements of K are given by z ⊤ C z , where each z is a column of A , and the off-diagonal terms are symmetric. An m × m positive semi-definite matrix implies an infinite number of inequalities; our interest is to find the minimal one with useful dimensions. Equations (41), (48) and (52) address this: the identity matrices I 2 and I 3 select the useful blocks of other matrices. As an example, we can operate on the C matrix ( 4 × 4 ). In general, one can use a 4 × 2 matrix of the form ( I 2 , a I 2 ) ⊤ , with a any real constant, and search for the minimum in a:
$$
\begin{pmatrix}I_2 & a I_2\end{pmatrix}\,C\,\begin{pmatrix}I_2\\ a I_2\end{pmatrix}
=\langle T\,T^{\top}\rangle+(2a+a^2)\,I^{-1}\geq 0
$$
$$
\frac{\partial}{\partial a}\Big[\langle T\,T^{\top}\rangle+(2a+a^2)\,I^{-1}\Big]=0
\;\Rightarrow\;(2+2a)=0
\;\Rightarrow\;B=\langle T\,T^{\top}\rangle-I^{-1}\geq 0
$$
The solution a = − 1 gives the last inequality. The inequality for each couple of variances is obtained with z ⊤ B z , where z is ( 1 , 0 ) ⊤ for the first two and ( 0 , 1 ) ⊤ for the second two. The form of the matrix C allows the extraction of the inequalities among the variances without using the positive semi-definiteness of C , but directly from the Cauchy-Schwarz inequalities applied to the cross-covariances. The cross-covariances are contained in the off-diagonal blocks of the C matrix. In any case, the Cauchy-Schwarz inequalities for the off-diagonal elements are the key properties that make C a positive semi-definite matrix.
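The minimization over a can be checked in a scalar caricature of the block matrix, replacing ⟨ T T ⊤ ⟩ and I − 1 by toy positive numbers (our own illustrative values):

```python
# Scalar caricature of the block inequality: C = [[tt, i1], [i1, i1]], z = (1, a);
# then z C z' = tt + (2a + a^2) * i1, minimized at a = -1, giving B = tt - i1 >= 0.
tt, i1 = 5.0, 2.0                      # toy values standing in for <T T'> and I^-1

def quad(a):
    return tt + (2 * a + a ** 2) * i1

# scan a on a grid that contains -1 exactly
a_min = min((quad(a), a) for a in [k / 100 - 2 for k in range(401)])[1]
assert a_min == -1.0
assert quad(-1.0) == tt - i1           # the bound B = <T T'> - I^-1
```

The quadratic 2 a + a 2 = ( a + 1 ) 2 − 1 makes the minimum at a = − 1 evident without the scan; the scan is only a numerical confirmation.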

Appendix B

In this toy model, the hit uncertainties are Gaussian distributions with two types of hit quality: a very good one when the MIP crosses the detector near the borders (20% of a strip width of 63 μm) and a bad one (80% of the strip width) around the strip center. The standard deviations of the hit Gaussian distributions are 0.018 and 0.18 strip widths; the tracker length is 7063.5 strip widths. A slight random mechanical misalignment was supposed, with a mean value around half a strip (30 μm). Random misalignments much larger than this are always present in real trackers and are recovered by the alignment algorithms. The misalignment (rotations and translations) is simulated, as usual, with a randomization of the hit position on the strips, producing a binomial distribution of the hit quality. A large number (150000 × N) of Gaussian random numbers, with zero mean and unit standard deviation, are generated (with the MATLAB [11] randn function), each one multiplied by one of the two standard deviations (0.018, 0.18) with the given probability (20%, 80%); another file saves this sequence of hit standard deviations. The file of the data and that of the hit standard deviations are randomized with an identical random permutation and assembled as two matrices with N lines. The columns simulate a random sample of tracks with equation x i = β + γ y i with β = 0 and γ = 0 ; the translation invariance allows this representation of the set of parallel tracks. The collection of N hits (a track) is fitted with two different least-squares methods: the standard least squares (using only the matrix of the data) and the weighted least squares (using the matrix of the data and that of the hit quality). The MATLAB code of this simulation is available on request.
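The procedure above can be sketched in Python (a scaled-down stand-in for the MATLAB simulation: far fewer tracks than the 150000 × N sample, our own layer spacing, and Python's random module in place of randn):

```python
import random

random.seed(7)
n_layers, n_tracks = 8, 4000
y = [i - (n_layers - 1) / 2 for i in range(n_layers)]   # layer positions, sum = 0

def fit_slope(x, w):
    # weighted least-squares slope for x_j = beta + gamma * y_j
    sw = sum(w)
    ybar = sum(wj * yj for wj, yj in zip(w, y)) / sw
    den = sum(wj * (yj - ybar) ** 2 for wj, yj in zip(w, y))
    return sum(wj * (yj - ybar) * xj for wj, yj, xj in zip(w, y, x)) / den

std_fits, wls_fits = [], []
for _ in range(n_tracks):
    # 20% good hits (0.018 strip widths), 80% bad hits (0.18); beta = gamma = 0
    s = [0.018 if random.random() < 0.2 else 0.18 for _ in range(n_layers)]
    x = [random.gauss(0.0, sj) for sj in s]
    std_fits.append(fit_slope(x, [1.0] * n_layers))       # standard least squares
    wls_fits.append(fit_slope(x, [1.0 / sj ** 2 for sj in s]))  # weighted

rms = lambda v: (sum(t * t for t in v) / len(v)) ** 0.5
print(rms(std_fits), rms(wls_fits))   # the weighted fit is markedly sharper
```

The empirical RMS of the weighted fit falls well below that of the standard fit, reproducing qualitatively the narrow red distributions of Figure 1.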

References

  1. Landi, G.; Landi, G.E. Improvement of track reconstruction with well tuned probability distributions. J. Instrum. 2014, 9, P10006.
  2. Landi, G.; Landi, G.E. Optimizing momentum resolution with a new fitting method for silicon-strip detectors. Instruments 2018, 2, 22.
  3. Landi, G.; Landi, G.E. Beyond the N-limit of the least squares resolution and the lucky-model. arXiv 2018, arXiv:1808.06708.
  4. Anonymous editor and referee of a journal of instrumentation. Private written communications, 2019.
  5. Ito, S.; Dechant, A. Stochastic time evolution, information geometry and Cramer-Rao bound. arXiv 2018, arXiv:1810.06832.
  6. Ivchenko, G.; Medvedev, Yu. Mathematical Statistics; URSS: Moscow, Russia, 1990.
  7. Cowan, G. Statistical Data Analysis; Oxford University Press: Oxford, UK, 1998.
  8. Olive, K.A.; et al. (Particle Data Group). Review of Particle Physics. Chin. Phys. C 2014, 38, 090001.
  9. Amemiya, T. Advanced Econometrics; Harvard University Press: Cambridge, MA, USA, 1985.
  10. Al-Sharadqah, A.; Chernov, N. Error analysis for circle fitting algorithms. Electron. J. Stat. 2009, 3, 886–911.
  11. MATLAB 8; The MathWorks Inc.: Natick, MA, USA, 2012.
Figure 1. Empirical PDFs of the fitted (straight) track direction for tracker models with N = 2 to N = 13 detecting layers. The PDFs for N = 2 are centered on zero, the others are shifted by N-2 steps of fixed amplitude. The blue PDFs are the results of the standard least-squares. The red PDFs are obtained with the weighted least-squares. (The reported dimensions are pure numbers).
Figure 2. Another Gaussian model. A faster increase in the maxima compared to that of Figure 1. The blue straight lines connect the maximum of the first distribution to that of the last red distribution. The color code is that of Figure 1.
