3.2. The Effects of Morphisms and Bivariate Sampling
Let us start with the effect of the morphisms transforming the original variables $(X, Y)$ into their transformed counterparts $(\tilde{X}, \tilde{Y})$. That effect depends on the ranks of the variables within the available sample. Without loss of generality, let us sort the sample by ascending order of $X$, i.e., with the $l$-th value of $X$ equaling the $l$-th order statistic $X_{(l)}$, $l = 1, \dots, N$. The bivariate $l$-th realization is then $(X_{(l)}, Y_{(l'(l))})$, where $l'(l)$ is the random bivariate rank permutation, depending upon the particular sample (e.g., if the first value of $X$ comes with the third value of $Y$, then $l'(l=1) = 3$, and so on). In particular, $l'(l) = l$ when the correlation equals one. The inverse of the function $l'(\cdot)$ is written $l(\cdot)$. The probability p-values of $X_{(l)}$ and $Y_{(l')}$, i.e., their marginal cumulative distribution functions (CDFs) evaluated at those values, are respectively $p_l$ and $q_{l'}$, growing as functions of $l$ and $l'$. Those p-values can only be inferred from the sample or prescribed from a priori hypotheses. The sorted transformed RVs given by the ME-morphisms are $\tilde{X}_{(l)} = F_{\tilde{X}}^{-1}(p_l)$ and $\tilde{Y}_{(l')} = F_{\tilde{Y}}^{-1}(q_{l'})$, where $F_{\tilde{X}}$ and $F_{\tilde{Y}}$ are the ME-prescribed CDFs (e.g., CDFs of Gaussians) for $X$ and $Y$, respectively. The morphisms thus rely upon invertible transformations. The bivariate transformed realizations $(\tilde{X}_{(l)}, \tilde{Y}_{(l'(l))})$ are then used to compute the expectations (Equation (15)). Since the exact marginal distributions are not known, their cumulative probabilities must be prescribed, for example with regular steps such as $p_l = q_l = l/(N+1)$, $l = 1, \dots, N$.
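To make this construction concrete, here is a minimal Python sketch (ours, not from the paper) of a Gaussian ME-morphism; it assumes the regular steps $p_l = l/(N+1)$ and standard Gaussian target marginals, and it preserves the bivariate rank permutation $l'(l)$ of the sample:

```python
import numpy as np
from scipy.stats import norm

def gaussian_morphism(x, y):
    """Morphism of a bivariate sample onto standard Gaussian marginals.

    The sorted transformed values are the Gaussian quantiles of the
    prescribed p-values p_l = l/(N+1); each realization receives the
    quantile of its own rank, so the rank permutation l'(l) is kept.
    """
    N = len(x)
    p = np.arange(1, N + 1) / (N + 1.0)  # prescribed regular p-values
    q = norm.ppf(p)                      # sorted transformed values
    rx = x.argsort().argsort()           # 0-based ranks of X in the sample
    ry = y.argsort().argsort()           # 0-based ranks of Y in the sample
    return q[rx], q[ry]
```

If $x$ and $y$ are comonotone (correlation one), the two outputs coincide, reflecting $l'(l) = l$.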
In order to obtain the moments of the estimator (15), we need to rewrite it in a convenient form:
$$\hat{T} \;=\; \frac{1}{N} \sum_{l=1}^{N} T\bigl(\tilde{X}_{(l)}, \tilde{Y}_{(l'(l))}\bigr) \;=\; \frac{1}{N} \sum_{l=1}^{N} \sum_{m=1}^{N} \delta_{l'(l),m} \, T\bigl(\tilde{X}_{(l)}, \tilde{Y}_{(m)}\bigr), \qquad (19)$$
where $\delta$ is the Kronecker delta, $p_l$ and $q_m$ are the marginal cumulative probabilities corresponding respectively to the $l$- and $m$-indexed values in the sum (19), and $c(p, q)$ is the copula function [23] (the ratio between the joint PDF and the product of the marginal PDFs). By looking at (19), one sees that $N \delta_{l'(l),m}$ is an estimator of the copula $c(p_l, q_m)$. In particular, if $X, Y$ are independent, then $l$ and $l'(l)$ are independent and $E(\delta_{l'(l),m}) = 1/N$, i.e., there is an average equipartition of the bivariate ranks.
Equation (19) shows that the moments of $\hat{T}$ depend on the statistics of the error of the copula estimator, which can be very tricky due to the imposition of the marginal PDFs by the morphisms, presenting unusual effects with respect to classical results from samples of iid realizations [32].
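The equipartition of bivariate ranks under independence is easy to verify numerically; the following sketch (a hypothetical illustration, not from the paper) estimates $E(\delta_{l'(l),m})$ as the frequency with which rank $l$ of $X$ pairs with rank $m$ of $Y$:

```python
import numpy as np

def delta_mean(N=8, trials=50_000, seed=0):
    """Monte-Carlo estimate of E[delta_{l'(l),m}] for independent X, Y."""
    rng = np.random.default_rng(seed)
    counts = np.zeros((N, N))
    rows = np.arange(N)
    for _ in range(trials):
        counts[rows, rng.permutation(N)] += 1  # one random permutation l'(l)
    return counts / trials                     # ~ 1/N in every entry
```

Every entry of the returned matrix approaches $1/N$, as stated above; dependence between $X$ and $Y$ would instead concentrate mass along rank patterns dictated by the copula.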
To that end, let us denote the random perturbation $\varepsilon_{l,m} \equiv \delta_{l'(l),m} - E(\delta_{l'(l),m})$; then $E(\varepsilon_{l,m}) = 0$, and the perturbations also satisfy the constraints $\sum_{l=1}^{N} \varepsilon_{l,m} = 0$ or $\sum_{m=1}^{N} \varepsilon_{l,m} = 0$, as a consequence of the fact that $l'(l)$ and $l(l')$ are index permutations of $N$ values. Therefore, taking into account those constraints, the estimation error can be written in different equivalent forms in terms of perturbations:
$$\hat{T} - E(\hat{T}) \;=\; \frac{1}{N} \sum_{l,m} \varepsilon_{l,m}\, T_{l,m} \;=\; \frac{1}{N} \sum_{l,m} \varepsilon_{l,m}\, T'_{l,m} \;=\; \frac{1}{N} \sum_{l,m} \varepsilon_{l,m}\, T'^{X}_{l,m} \;=\; \frac{1}{N} \sum_{l,m} \varepsilon_{l,m}\, T'^{Y}_{l,m}, \qquad (20)$$
where $T_{l,m} \equiv T(\tilde{X}_{(l)}, \tilde{Y}_{(m)})$ and its perturbation with respect to the global mean is $T'_{l,m} = T_{l,m} - E(T)$. The perturbation with respect to the $X$-conditional mean is $T'^{X}_{l,m} = T_{l,m} - E(T \mid \tilde{X}_{(l)})$, where $E(T \mid \tilde{X}_{(l)})$ is the mean of $T$ conditioned on $\tilde{X} = \tilde{X}_{(l)}$. A similar definition is written for the $Y$-perturbation $T'^{Y}_{l,m} = T_{l,m} - E(T \mid \tilde{Y}_{(m)})$.
The estimators (15) of the independent constraints (components of $T$ depending uniquely on $X$ or on $Y$) have a bias but vanishing variances (null components of the error covariance), since the perturbations $T'^{X}$ or $T'^{Y}$ vanish, because the local values of $T$ coincide with one of the ($X$- or $Y$-) conditional means. That bias reduces to a numerical integration error. For example, for the expectations of $X$-dependent functions, the error reduces to a bias of order $N^{-2}$, as given by the trapezoidal integration rule for bounded functions. The estimators of cross expectations, in contrast, have both a bias and non-vanishing variances.
Now, our goal is to obtain an estimation of the covariance matrix (17). As a consequence of the non-replacement of quantiles or rankings, the deviations $\varepsilon_{l,m}$ and $\varepsilon_{j,k}$ in (20) are not necessarily independent for $(l, m) \neq (j, k)$, which would not occur if different realizations were independent. Statistics without replacement generally lead to a deflation of the estimator variances as compared to those satisfying the hypothesis of independence of realizations [33]. Therefore, in order to get an $N^{-1}$-scaled expression for the covariance matrix, we will consider another type of deviations of $T$, consistent with (20).
We propose new deviations, denoted by $T'^{lms}$, given by the linear combination of the global deviation $T'$ and of the marginal deviations $T'^{X}$ and $T'^{Y}$, with the respective coefficients summing to 1 and having the least mean square (lms). Those deviations are consistently given by
$$T'^{lms} \;=\; a\,T' + b\,T'^{X} + c\,T'^{Y}, \qquad a + b + c = 1, \qquad (21)$$
which are the residuals of the best linear fit of $T$ using the conditional means $E(T \mid \tilde{X})$ and $E(T \mid \tilde{Y})$ as predictors, and where the coefficients are those of the least-squares linear regression (22). Those deviations take into account the maximum implicit knowledge of the marginal PDFs through their conditional means. Now we will use them for expressing the error moments.
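Numerically, the lms deviations (21) are ordinary least-squares residuals. A minimal sketch (ours; it assumes the conditional means $E(T \mid \tilde{X})$ and $E(T \mid \tilde{Y})$ are available at the sample points, e.g., analytically for monomial constraints under known marginals):

```python
import numpy as np

def lms_residuals(t, t_cond_x, t_cond_y):
    """Residuals and coefficients of the best linear fit of T on the
    conditional means E(T|X) and E(T|Y), as in Equation (21)."""
    A = np.column_stack([np.ones_like(t), t_cond_x, t_cond_y])
    coef, *_ = np.linalg.lstsq(A, t, rcond=None)  # the regression (22)
    return t - A @ coef, coef
```

By construction, these residuals are uncorrelated with both predictors, the property exploited below.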
The expression of the error covariances in the matrix (17) relies upon the expansion (20), with the perturbations written as functions of the mean values of products of deltas, $E(\delta_{l'(l),m}\, \delta_{l'(j),k})$. These means depend on the true copula and are written in (23), where we have considered the fact that $l'(l)$ and its inverse $l(l')$ are permutations of ranks (no duplication allowed). The values indicated with an asterisk in (23) correspond to $X, Y$ independent ($l'(l)$ independent of $l$). Those moments are difficult to obtain in practice unless the variables are independent or the bivariate PDF is known a priori. Otherwise, a large ensemble of $N$-sized surrogate samples is generated, from which the empirical estimator covariances are computed.
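A sketch of that surrogate-ensemble route (our illustration, assuming bivariate Gaussian data of correlation `rho` and Gaussian morphisms; `funcs` is a list of cross functions whose estimator covariances are sought):

```python
import numpy as np
from scipy.stats import norm

def estimator_cov_mc(N, funcs, rho=0.0, ensembles=5000, seed=1):
    """Empirical covariance matrix of the morphism-based estimators of
    E[g(X~, Y~)] over an ensemble of N-sized surrogate samples."""
    rng = np.random.default_rng(seed)
    q = norm.ppf(np.arange(1, N + 1) / (N + 1.0))  # fixed Gaussian quantiles
    est = np.empty((ensembles, len(funcs)))
    for i in range(ensembles):
        x = rng.standard_normal(N)
        y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(N)
        xt = q[x.argsort().argsort()]  # morphism: each value -> quantile of its rank
        yt = q[y.argsort().argsort()]
        est[i] = [f(xt, yt).mean() for f in funcs]
    return np.cov(est, rowvar=False)
```

Note that only the rank permutation varies across the ensemble; the marginal values are the fixed quantiles, which is precisely the "without replacement" feature discussed below.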
Then, by plugging (23) into the generic ($\alpha$-th row, $\beta$-th column) entry of the covariance matrix (17), and denoting the $\alpha$-th and $\beta$-th components of $T$ by $T_\alpha$ and $T_\beta$, with estimation errors $e_\alpha$ and $e_\beta$, we get Equation (24). The first term of the rhs of (24) is $1/N$ times the expectation of the covariance among the $N$ realizations. That term converges asymptotically to the estimator's covariance under the hypothesis of $N$ iid realizations. However, when the marginals are imposed, or the morphism of variables is performed, that hypothesis no longer holds, because the covariance estimator is a statistic without replacement [33], since the quantiles of $X$ and $Y$ are not repeated in the sample. Therefore, the additional term of (24) reduces the estimator's variances with respect to the case of iid trials.
Looking for a correct representation of the cross estimators' variances when the marginals are imposed, we represent the perturbations by the lms residuals (21) (residuals of the best linear regression). There, we benefit from a generic property of lse (least-squares error) regression residuals: they are uncorrelated with the predictors (here, the conditional means of $T$). This means that $T'^{lms}$ is represented in terms of noises which are uncorrelated with both $X$ and $Y$. Consequently, different realizations of $T'^{lms}$ are uncorrelated, which simplifies the expression of the covariance matrix. Therefore, using those lms perturbations, the generic matrix entry (24) is rewritten as Equation (25). The $N^{-1}$-scaled term of (25) converges asymptotically (as $N \to \infty$) to $1/N$ times the covariance between the residuals of the linear regression on the conditional means. This allows us to formulate the following theorem:
Theorem 2: Let the $X$ and $Y$ marginal PDFs be imposed by variable morphisms. Then, the covariance between the $N$-sized-sample estimators $\hat{T}_\alpha$ and $\hat{T}_\beta$ of the means of cross functions of $\tilde{X}$ and $\tilde{Y}$ is asymptotically given by
$$\operatorname{cov}\bigl(\hat{T}_\alpha, \hat{T}_\beta\bigr) \;\underset{N \to \infty}{\sim}\; \frac{1}{N}\, E\bigl(T'^{lms}_{\alpha}\, T'^{lms}_{\beta}\bigr), \qquad (26)$$
where $T'^{lms}_{\alpha}$ is the residual of the best linear fit of $T_\alpha$ taking the conditional means as predictors, with the corresponding coefficients given by the regression (22) (idem for $T'^{lms}_{\beta}$). The expectation is computed with the true PDF of the population. The proof was given above in the text.
An immediate corollary of this theorem applies in the case where the data are governed by a certain MinMI-PDF issued from the constraining set. Under those conditions, $T_\alpha$ and $T_\beta$ are themselves cross functions from the constraining set, and their estimator covariances are entries of the matrix (17). Then, if the true joint PDF is the MinMI-PDF, we get Equation (27), where we use the covariance matrix introduced in (4). Under those conditions, one has an identity for the matrix product of the two covariance matrices, which will be crucial for the evaluation of the asymptotic MinMI estimation bias.
3.3. Errors of the Estimators of Polynomial Moments under Gaussian Distributions
In this section we assess the bias and the covariance of the estimators, together with its expression (25), when the constraints are bivariate monomials (13) and Gaussian morphisms are performed as described in Section 2.3. For the purpose of discussing the statistical tests of non-Gaussianity presented in the next section, we restrict our study to the case of $N$-sized samples of iid realizations of independent variables (taken, without loss of generality, as standard Gaussians). There, an empirical Monte-Carlo strategy is used by taking the standard Gaussian morphisms of the $N$ outcomes, from which one estimates the expectation of a vector of generic functions (13). The bias is determined by the fixed Gaussian centered moments evaluated at the prescribed quantiles. The sample is centered and standardized, i.e., it has zero mean and unit variance. The variance of the estimator can be rigorously computed from the quadruple sum (25), using the $N$ quantiles of the standard Gaussian and the delta expectations (23) for the case of $X, Y$ independent from each other. However, the computation of that sum is very time-consuming for high $N$ values. For that reason, we approximate it by a Monte-Carlo mean obtained from a large number of independent realizations of the $N$-sized samples. The finite and asymptotic values of the estimator variance, valid for the case of $N$ iid trials, are given by (28), whereas those (smaller than those of (28)) obtained from the least-mean-squares expression (25) are given by (29).
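As a hypothetical illustration of that gap (our example, assuming Theorem 2 and reusing `estimator_cov_mc`): for independent standard Gaussians, the iid variances (28) of the estimators of $E[\tilde{X}\tilde{Y}]$ and $E[\tilde{X}^2\tilde{Y}^2]$ are $1/N$ and $8/N$ (since $E[X^4] = 3$), whereas the lms variance (29) of the second estimator works out to $4/N$, half the iid value:

```python
funcs = [lambda x, y: x * y, lambda x, y: x**2 * y**2]
C = estimator_cov_mc(200, funcs, rho=0.0)
print(C[0, 0], 1 / 200)           # XY: lms and iid variances coincide here
print(C[1, 1], 4 / 200, 8 / 200)  # X^2 Y^2: morphism (~4/N) vs iid (8/N)
```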
Figure 1 compares the variance with the squared bias of the estimator, both relevant to the bias of the MinMI estimation. In the same figure, one compares the empirical variance with its approximation and with the variance for the case of iid trials. Panels (a), (b) and (c) correspond to different bivariate monomials, sorted by growing total variance, which is especially concentrated at the distribution tails. In all figures, $N = 25 \times 2^k$, $k = 0, \dots, 11$. We have verified that the empirical variance agrees very well with the theoretical value for all $N$s (not shown).
At this point, some generic conclusions can be drawn. The estimator's variance dominates over the squared bias, except for small $N$ values and for the higher-order monomials. This will lead us to neglect the bias of the covariance estimator in the MinMI asymptotic statistics.
Figure 1. Squared empirical bias (black lines) of the $N$-based expectation estimators as a function of $N$; empirical variances (red lines); approximated variances (blue lines); and variances for the case of $N$ iid trials (green lines). Panels (a), (b) and (c) correspond to different bivariate monomials.
From Figure 1, we also note that the variance reduction coming from the morphisms of variables tends to decrease for higher $N$ values, where the effect of sampling prevails, with an $N^{-1}$ scaling of the estimator variance, which is then closely approximated by the asymptotic lms variance of (29). That can lead to a slight increase of the scaled estimator variance for small $N$s, followed by a decrease, due to the fact that the morphism-induced variance reduction is small for lower values of $N$.
Moreover, thanks to the Central Limit Theorem (CLT), the distribution of the estimator errors tends towards Gaussianity with increasing $N$, with a slower convergence rate for higher variances. However, the Gaussian PDF limit has an infinite support, which must be truncated, since the estimated moments must lie within a kind of polytope whose edges are determined by Schwarz-like inequalities working as bounds for nonlinear correlations, as shown by PP12 [12]. Since the estimators have bounds, the estimation errors do so as well. This can be solved by using the Fisher Z-transform $\operatorname{arctanh}(c)$ of a generic linear or nonlinear correlation $c$ and projecting it over the whole real support (not done here).
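For reference (this step is not performed in the paper), the Fisher Z-transform is a one-liner; it maps the bounded interval $(-1, 1)$ onto the whole real line:

```python
import numpy as np

c = np.array([-0.999, -0.5, 0.0, 0.5, 0.999])  # bounded (nonlinear) correlations
z = np.arctanh(c)                              # Fisher Z: unbounded support
```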
Now we illustrate, in Figure 2, Theorem 2 under different values of the correlation $\rho$. We consider variables with a joint Gaussian PDF of correlation $\rho$ and standard Gaussian marginals. In Figure 2 we compare the empirical Monte-Carlo value of the estimator variance (MC in the figure), computed within an ensemble of 5000 $N$-sized samples, with the theoretical value for the case where the morphism is performed (AN in the figure) and with that for the case of iid realizations (ANiid in the figure). We have used samples of $N = 200$, which is supposed to be near the beginning of the asymptotic regime, and two cross functions (bivariate monomials). The theoretical lms variance is $1/N$ times the mean squared residual of the best linear fit using the conditional means as predictors. For both functions, a very good agreement is verified between the Monte-Carlo values and the theoretical ones, within 1–5% relative error. A generic result of Figure 2 is the fact that, under the fixation (presetting) of the marginals, the sampling variability of the cross estimators falls to zero as the absolute value of the correlation tends to one.
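That vanishing variability is easy to reproduce with the surrogate-ensemble sketch of Section 3.2 (a hypothetical check, not the paper's exact experiment): as $\rho \to 1$ the rank permutation becomes deterministic and, since the morphism fixes the marginal values to the quantiles, the cross estimators become non-random:

```python
funcs = [lambda x, y: x * y, lambda x, y: x**2 * y**2]
for rho in (0.0, 0.5, 0.9, 0.99):
    C = estimator_cov_mc(200, funcs, rho=rho, ensembles=2000)
    print(rho, C[0, 0], C[1, 1])   # both variances shrink as |rho| -> 1
```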
Figure 2. $N$ times the Monte-Carlo variances (thick solid lines) and their theoretical analytical values (thick dashed lines), both under imposed marginals (morphisms), together with the analytical values for iid data (thin solid lines), for two different bivariate monomials (black and red curves). $N = 200$.