Article

Permutation Variation and Alternative Hyper-Sphere Decomposition

1 School of Mathematics and Statistics, Yunnan University, Kunming 650091, China
2 Research Center for Mathematics, Beijing Normal University at Zhuhai, Zhuhai 519087, China
3 Division of Science and Technology, United International College (BNU-HKBU), Zhuhai 519087, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2022, 10(4), 562; https://doi.org/10.3390/math10040562
Submission received: 18 December 2021 / Revised: 30 January 2022 / Accepted: 8 February 2022 / Published: 11 February 2022
(This article belongs to the Special Issue Recent Advances in Computational Statistics)

Abstract

Current covariance modeling methods work well in longitudinal data analysis. In the analysis of data with no natural order, however, a common covariance modeling method can be inadequate. In this paper, a study is carried out to investigate the effect of permutations of the data on the estimation of the covariance matrix $\Sigma$. Based on the hyper-sphere decomposition method (HPC), this study shows that changing the permutation of the data breaks the consistency of the covariance estimation. An alternative hyper-sphere decomposition method that is permutation invariant is then introduced. The consistency and asymptotic normality of the alternative method are studied when the observations follow a normal distribution. These results are examined in several numerical example studies, and a real data analysis is conducted for illustration purposes.

1. Introduction

The covariance matrix is a simple and popular tool for describing the variation and correlation between random variables in multivariate statistics. A valid covariance matrix must be symmetric and positive semi-definite. In some areas, such as the social sciences, finance, economics and geology, people are more interested in studying covariance matrices than mean models. Meanwhile, an appropriate working covariance matrix can increase the efficiency of the mean estimator. Moreover, Pourahmadi (2013) [1] suggests that a good estimate of the covariance matrix leads to accurate statistical inference and test results. Little and Rubin (2019) [2] show that an accurate estimate of the covariance is essential in dealing with missing data problems. The sample covariance matrix is usually used to estimate the population covariance matrix for convenience. In some special cases, for example, high-dimensional situations where the number of repeated measurements $n$ is less than the number of unknown parameters in the covariance matrix, using the sample covariance matrix would end up with a biased estimator of the mean model [3].
Pourahmadi (1999) [4] introduced the modified Cholesky decomposition method (MCD) into the estimation of the covariance matrix by applying regression-based ideas to matrix decomposition. However, the relationship between the model coefficients and the correlation/variance of the population $Y$ is only indirect. Pan and Pan (2017) [5] proposed the alternative Cholesky decomposition method (ACD), which improves the correlation-interpretation problem of MCD by applying the Cholesky decomposition to the correlation matrix $R$,
$$
\Sigma = DRD = DTT^{\top}D,
$$
where matrix $D$ is a diagonal matrix containing the standard deviations $\sigma$ of the population $Y$, and $T$ is a lower-triangular matrix with unit row vectors $T_{(i)}$, $i = 1,\dots,p$, whose Euclidean norm is 1. The model regression procedure of MCD cannot be applied directly in ACD because of the extra unit-norm restriction on the row vectors of $T$. Rebonato and Jäckel (2011) [6] projected $T$ in ACD into a unit hyper-sphere coordinate system and proposed the hyper-sphere decomposition method (HPC), in which the elements $t_{ij}$ of $T$ are
$$
t_{ij} =
\begin{cases}
1, & i = j = 1,\\[3pt]
\cos\phi_{ij}\prod_{k=1}^{j-1}\sin\phi_{ik}, & j = 1,\dots,i-1,\ i = 2,\dots,p,\\[3pt]
\prod_{k=1}^{j-1}\sin\phi_{ik}, & j = i,\ i = 2,\dots,p,
\end{cases}
$$
with the convention that an empty product equals 1,
where the $\phi_{ik}$ are the new angular coordinates of the vector $T_{(i)}$ in a unit hyper-sphere coordinate system. As a result we have
$$
T = \begin{pmatrix}
1 & 0 & \cdots & 0 & 0\\
c_{21} & s_{21} & \cdots & 0 & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
c_{p,1} & c_{p,2}\,s_{p,1} & c_{p,3}\,s_{p,2}\,s_{p,1} & \cdots & \prod_{k=1}^{p-1}s_{p,k}
\end{pmatrix},
$$
where $c_{ij}$ and $s_{ij}$ denote $\cos(\phi_{ij})$ and $\sin(\phi_{ij})$, respectively. Compared with ACD, there is an additional matrix $\Phi$, which is also lower triangular, with diagonal elements 0 and lower off-diagonal entries being angles $\phi_{ij}\in[0,\pi)$,
$$
\Phi = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & 0\\
\phi_{21} & 0 & 0 & \cdots & 0 & 0\\
\phi_{31} & \phi_{32} & 0 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
\phi_{p1} & \phi_{p2} & \phi_{p3} & \cdots & \phi_{p,p-1} & 0
\end{pmatrix}.
$$
This construction guarantees unit row vectors in $T$ for any given matrix $\Phi$; meanwhile, the decomposition gives $\Phi$ a geometric meaning [7]. In that sense, the regular model regression procedure can be applied in HPC just as in Pourahmadi (2000) [8]. To ensure $\phi_{jk}\in[0,\pi)$, we can apply a further trigonometric transformation within the linear model:
$$
\tan\!\left(\phi_{jk} - \frac{\pi}{2}\right) = \omega_{jk}^{\top}\gamma,\quad \phi_{jk}\in[0,\pi), \qquad \log\sigma_j^2 = h_j^{\top}\lambda, \tag{2}
$$
where $\omega_{jk}$ and $h_j$ are design vectors for the angles $\phi_{jk}$ and the variances $\sigma_j^2$, respectively. Meanwhile, $\gamma$ and $\lambda$ in Equation (2) are the unknown correlation and variance components.
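To make the parameterization concrete, here is a minimal numerical sketch (ours, not part of the paper; all names are illustrative) that builds $T$ from an arbitrary admissible angle matrix $\Phi$ and checks that $R = TT^{\top}$ has unit diagonal and is positive semi-definite, i.e., is a valid correlation matrix for any choice of angles in $[0,\pi)$.

```python
# Sketch: HPC parameterization, T built from angles Phi (assumed conventions as above).
import numpy as np

def hpc_T_from_Phi(Phi):
    """Lower-triangular T with unit-norm rows from lower-triangular angles Phi."""
    p = Phi.shape[0]
    T = np.zeros((p, p))
    T[0, 0] = 1.0
    for i in range(1, p):
        sin_prod = 1.0                      # running product of sines in row i
        for j in range(i):
            T[i, j] = np.cos(Phi[i, j]) * sin_prod
            sin_prod *= np.sin(Phi[i, j])
        T[i, i] = sin_prod                  # diagonal entry: product of all sines
    return T

rng = np.random.default_rng(1)
p = 5
Phi = np.tril(rng.uniform(0.0, np.pi, (p, p)), k=-1)   # arbitrary angles in [0, pi)
T = hpc_T_from_Phi(Phi)
R = T @ T.T
print(np.allclose(np.diag(R), 1.0))                    # unit diagonal
print(np.all(np.linalg.eigvalsh(R) >= -1e-10))         # positive semi-definite
```

Any choice of angles therefore yields a valid correlation matrix, which is what allows the unconstrained regression model in Equation (2).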
Currently popular methods such as HPC work well in longitudinal data analysis, in which the sample data have a natural order. In analyses of data with no natural permutation, such as geometric and causal data, these methods generate estimates that depend on the permutation of the sample data. The purpose of this article is to illustrate the permutation variance of the HPC covariance estimator. After that, we redefine the translation between $T$ and $\Phi$ and propose an alternative hyper-sphere decomposition (AHPC) method that inherits most of the advantages of HPC while removing the permutation variance. Meanwhile, AHPC has a more straightforward geometrical interpretation.

2. Interpretations of Hyper-Sphere Decomposition Method

Rapisarda, Brigo and Mercurio (2007) [7] explained HPC from the viewpoint of Jacobi rotations. In a $p$-dimensional right-hand system, the axes $e_1, e_2,\dots,e_p$ are selected as follows:
1. Set the direction of $T_{(1)}$ as the first axis $e_1$.
2. In the plane containing the vectors $T_{(1)}$ and $T_{(2)}$, choose $e_2$ perpendicular to $e_1$ in a right-hand system.
3. $e_3$ is perpendicular to the previous axes, $e_1$ and $e_2$, and on the same side as their cross product $e_1\times e_2$.
4. After the $j$th axis has been assigned, the next axis $e_{j+1}$ ($j+1\le p$) is defined to be perpendicular to all previous axes and in the same direction as $e_1\times e_2\times\cdots\times e_j$.
The cross product $e_j\times e_k = \|e_j\|\,\|e_k\|\sin\theta\,\mathbf{n}$ determines a direction $\mathbf{n}$ perpendicular to the plane containing $e_j$ and $e_k$, given by the right-hand rule.
With all the axes set up, the row vector $T_{(j)} = (t_{j1},\dots,t_{jj},0,\dots,0)$ gives the coordinates of $T_{(j)}$ in this Cartesian system. According to [7], the relation between the angles $\phi_{ij}$ and the coordinates of the $T_{(k)}$s is (Figure 1a):
1. We begin with $T_{(1)}$. Next we turn $T_{(1)}$ counter-clockwise in the hyper-plane $PL_{12}$ by the angle $\phi_{2,1}$. Then we have $T_{(2)} = G(1,2;\phi_{2,1})T_{(1)}$. The hyper-plane $PL_{12}$ is spanned by $e_1$ and $e_2$.
2. Turn $T_{(1)}$ in $PL_{12}$ by the angle $\phi_{3,1}$; then do another rotation in $PL_{23}$ by the angle $\phi_{3,2}$. We then have $T_{(3)} = G(2,3;\phi_{3,2})G(1,2;\phi_{3,1})T_{(1)}$.
3. $T_{(k)}$ is built up with $k-1$ rotations as $T_{(k)} = G(k-1,k;\phi_{k,k-1})\cdots G(1,2;\phi_{k,1})T_{(1)}$.
In a $p$-dimensional space, $G(j,k;\theta)$ is the Jacobi rotation matrix in the hyper-plane $PL_{j,k}$ with counter-clockwise angle $\theta$. Besides the geometric explanation from [7], we also propose a new interpretation of the relation between the matrices $T$ and $\Phi$ in the HPC method, in terms of angles between hyper-planes in the row space of $T$ (Figure 1b):
1. The lower off-diagonal entries in the first column of $\Phi$, the $\phi_{i1}$s, are the angles between the first and the $i$th row vectors of $T$.
2. The lower off-diagonal entries in the second column of $\Phi$ are the angles $\phi_{i2}$ between the plane $HP_{12}$, defined by the first and second row vectors of $T$, and $HP_{1i}$ ($i > 2$), defined by the first and the $i$th row vectors of $T$.
3. After that, the elements $\phi_{jk}$ in the $k$th column of $\Phi$, $j > k \ge 3$, are the angles between the two hyper-planes $HP_{12\cdots(k-1)k}$ and $HP_{12\cdots(k-1)j}$.
The $HP_{12\cdots k}$s are spanned by the first, second and up to the $k$th row vectors. The angle between hyper-planes is well known in the mathematics literature; see, e.g., Chapter 3 of Murty [9]. The angles $\theta_{jk}$ between the two hyper-planes $HP_{12\cdots(k-1)k}$ and $HP_{12\cdots(k-1)j}$ can be calculated recursively via
$$
\theta_{jk} =
\begin{cases}
\arccos(t_{jk}), & j \ge 2,\ k = 1,\\[6pt]
\arccos\!\left(\dfrac{t_{jk}}{t_{j,k-1}\tan(\theta_{j,k-1})}\right), & j > 2,\ k \ge 2.
\end{cases}
\tag{3}
$$
According to [7], when the $\theta_{jk}$s and $\phi_{jk}$s are in $[0,\pi]$, the translation in Equation (3) is unique, $\theta_{jk} = \phi_{jk}$.
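For illustration, the recursion in Equation (3) can be applied directly to the Cholesky factor of any positive definite correlation matrix, whose rows automatically have unit norm. The following short sketch is ours (not code from the paper) and assumes no angle equals exactly $\pi/2$, so the tangent in the denominator is well defined.

```python
# Sketch: recover the HPC angles from T via the recursion of Equation (3).
import numpy as np

def hpc_Phi_from_T(T):
    """theta_{j1} = arccos(t_{j1}); theta_{jk} = arccos(t_{jk} / (t_{j,k-1} tan(theta_{j,k-1})))."""
    p = T.shape[0]
    Phi = np.zeros((p, p))
    for j in range(1, p):
        Phi[j, 0] = np.arccos(np.clip(T[j, 0], -1.0, 1.0))
        for k in range(1, j):
            c = T[j, k] / (T[j, k - 1] * np.tan(Phi[j, k - 1]))
            Phi[j, k] = np.arccos(np.clip(c, -1.0, 1.0))
    return Phi

rng = np.random.default_rng(2)
A = rng.standard_normal((200, 4))
R = np.corrcoef(A, rowvar=False)          # a 4 x 4 sample correlation matrix
T = np.linalg.cholesky(R)                 # lower triangular with unit-norm rows
Phi = hpc_Phi_from_T(T)
print(np.allclose(np.cos(Phi[1:, 0]), R[1:, 0]))   # first column: rho_{j1} = cos(phi_{j1})
```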

3. Order-Dependence of Hyper-Sphere Decomposition Method

Based on both explanations of the HPC method above, we show that the HPC is order-dependent from two aspects in this section.
Let $R$ be the correlation matrix of a random sample $Y = (Y_1,\dots,Y_p)^{\top}$, and apply a standard Cholesky decomposition $R = TT^{\top}$, where $T = (T_{(1)}^{\top},\dots,T_{(p)}^{\top})^{\top}$. Based on the definition of the HPC decomposition, the correlation $\rho_{jk}$ in $R$ is the cosine of the angle between the two unit vectors $T_{(j)}$ and $T_{(k)}$, $\rho_{jk} = T_{(j)}T_{(k)}^{\top} = \cos\langle T_{(j)}, T_{(k)}\rangle$. So the unit vectors $T_{(j)}$ in the lower-triangular matrix $T$ carry all the correlation information of the corresponding random variables $Y_j$. In that sense, the order of the row vectors in $T$ mirrors the order of the $Y_j$ in the correlation matrix $R$. For example, when $p = 6$, under the order $Y_1, Y_2, Y_3, Y_4, Y_5, Y_6$, we have $R$ and the lower triangular matrix $T$ in (4), where, for example, $t_{42}$ is the coordinate on axis $e_2$ of the vector $T_{(4)}$, which corresponds to $Y_4$. Then we have a $\Phi$ matrix as the left one in (5), where, from left to right, $\phi_{jk}$ is the $k$th Jacobi rotation angle in the hyper-plane $PL_{k,k+1}$, which is spanned by $e_k$ and $e_{k+1}$. For example, $\phi_{42}$ is the second Jacobi rotation angle, in $PL_{23}$.
Mathematics 10 00562 i001
From the aspect of angles between two hyper-planes, we have the table for matrix Φ as in the right one in (5) where j stands for row number. In this table, for example, ϕ 53 is the angle between H P 1 , 2 , 3 and H P 1 , 2 , 5 . Hyper-plane H P 1 , 2 , 5 is built with the first, second and fifth row vectors in T.
Mathematics 10 00562 i002
Eventually we have the same equation between ρ and ϕ as in  [10]:
$$
\rho_{jk} = c_{jk}\prod_{l=1}^{k-1}s_{jl}s_{kl} + \sum_{l=1}^{k-1}c_{jl}c_{kl}\prod_{t=1}^{l-1}s_{jt}s_{kt}, \tag{6}
$$
where $s_{jk}$ and $c_{jk}$ stand for $\sin(\phi_{jk})$ and $\cos(\phi_{jk})$, respectively.
If we change the order of the $Y_j$s in the correlation matrix $R$, we simultaneously change the corresponding row vectors $T_{(j)}$ in the lower-triangular matrix $T$. While the elements of the new matrix $T^{*}$ may change after the re-ordering, the relationship between $T_{(j)}$ and $Y_j$ remains the same. As illustrated before, the elements of $T$ are coordinates. When the order of the $Y_j$s is different, we set up a new set of axes, which leads to a new set of coordinates for the $T_{(j)}$s. If we change the order into $Y_1, Y_2, Y_5, Y_4, Y_3, Y_6$, then the new correlation matrix $R^{*}$ is obtained by swapping elements of $R$, and the Cholesky decomposition of $R^{*}$ gives $T^{*}$ as in (7).
Mathematics 10 00562 i003
Since we changed the order from position $Y_3$ onwards, a new set of axes $e_3^{*}, e_4^{*}, e_5^{*}$ and $e_6^{*}$ is selected simultaneously. As a result, we have new values for the coordinates on $e_3^{*}$, $e_4^{*}$ and $e_5^{*}$, represented by $*$s. Meanwhile, the corresponding $\Phi^{*}$ matrix is as in (8).
Mathematics 10 00562 i004
From the viewpoint of Jacobi rotations, the element $\phi_{jk}$ in $\Phi$ is a rotation angle in the plane $PL_{k,k+1}$ spanned by the axes $e_k$ and $e_{k+1}$. Because the definition of the axes $e_j$ depends on the order of the $T_{(j)}$, which is again the order of the $Y_j$, $\phi_{jk}$ also depends on the order of the $Y_j$. We can observe that $\cos\langle T_{(1)}, T_{(k)}\rangle = T_{(1)}T_{(k)}^{\top} = (1,0,\dots,0)(c_{k1},\dots,0)^{\top} = c_{k1} = f(\phi_{k1})$, and $\cos\langle T_{(2)}, T_{(k)}\rangle = T_{(2)}T_{(k)}^{\top} = (c_{21},s_{21},0,\dots,0)(c_{k1},c_{k2}s_{k1},\dots,0)^{\top} = c_{21}c_{k1} + s_{21}c_{k2}s_{k1} = f(\phi_{21},\phi_{k1},\phi_{k2})$, which shows that $\rho_{1k}$ and $\rho_{2k}$ rely only on $\phi_{k1}$, $\phi_{k2}$ and $\phi_{21}$. This explains why $\phi_{k1}$ and $\phi_{k2}$ remain the same. Since the third vector becomes $T_{(5)}$, the angles in column 3 all take new values, except $\phi_{53}$: for obvious reasons, the angle between $HP^{*}_{1,2,3}$ and $HP^{*}_{1,2,5}$ equals that between $HP_{1,2,5}$ and $HP_{1,2,3}$.
We summarize this dependency in Table 1, which indicates how the values of the entries in $\Phi$ depend on the order of the row vectors in the matrix $T$.
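The dependence summarized in Table 1 is easy to verify numerically. The sketch below (ours; it uses the equivalent product-of-sines form of the translation rather than the tangent recursion) permutes a correlation matrix and shows that the resulting HPC angles are not merely a rearrangement of the original ones.

```python
# Sketch: HPC angles before and after permuting the variables.
import numpy as np

def hpc_angles(R):
    """HPC angle matrix of a positive definite correlation matrix R."""
    T = np.linalg.cholesky(R)
    p = R.shape[0]
    Phi = np.zeros((p, p))
    for j in range(1, p):
        sin_prod = 1.0
        for k in range(j):
            Phi[j, k] = np.arccos(np.clip(T[j, k] / sin_prod, -1.0, 1.0))
            sin_prod *= np.sin(Phi[j, k])
    return Phi

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 6))
R = np.corrcoef(A, rowvar=False)

perm = [0, 1, 4, 3, 2, 5]                      # swap positions 3 and 5, as in the text
Phi = hpc_angles(R)
Phi_perm = hpc_angles(R[np.ix_(perm, perm)])
low = np.tril_indices(6, -1)
# the permuted angle matrix is NOT a rearrangement of the original one
print(np.allclose(np.sort(Phi[low]), np.sort(Phi_perm[low])))   # typically False
```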
Having established the permutation variation of the values in the $\Phi$ matrix, we now consider how this affects the modeling process. If the HPC method were permutation invariant, the model assumption would hold no matter how we change the order of the data $Y$. However, a different order gives different values of $\Phi$, as shown above. Then, under model (2), with the same covariates $\omega^{*}$ and $h^{*}$ whose values have been rearranged according to the new order, HPC yields different estimators $\hat\gamma$ and $\hat\lambda$. In some cases, even the whole model assumption may be wrong under the new permutation of the sample data.
Due to the definition of the translation between the matrices $T$ and $\Phi$, the data order affects $\Phi$ element-wise. Furthermore, there is no way to transform an angle matrix obtained under a new data order back into the previous one merely by rearranging its elements. In other words, a change of order gives a different set of values for $\Phi$. A regression model on the elements of $\Phi$ therefore has different parameter estimates under different data orders, and even the model assumption itself may vary with the data order. In that sense, the current HPC method is order dependent. In conclusion, under both interpretations, the HPC method is permutation variant both in terms of the $\Phi$ matrix and in terms of the models for the variance components.

4. Alternative Hyper-Sphere Decomposition Method

We notice that the order dependency of the HPC method is caused by the definition of the translation between two coordinate systems, the Cartesian coordinate system for the matrix $T$ and a unique angular coordinate system for $\Phi$. We now propose a new definition of the matrix $\Phi$, inspired by spherical parameterization [11], which improves the relationship between the matrices $R$ and $\Phi$; the newly proposed alternative hyper-sphere decomposition (AHPC) is then permutation invariant.

4.1. The AHPC Model

The AHPC method differs from HPC through a new definition of $\Phi$, the lower triangular matrix with diagonal elements equal to 0. Instead of angles between hyper-planes as in HPC, here we define the elements of $\Phi$ as the angles between the row vectors $T_{(j)}$, as illustrated in Figure 2,
$$
\phi_{jk} = \langle T_{(j)}, T_{(k)}\rangle, \qquad 1 \le k < j \le p,
$$
where $\phi_{jk}\in[0,\pi)$ to ensure uniqueness.
For dimension $p \ge 3$, there is a three-dimensional restriction. Any pair of vectors $T_{(i)}$ and $T_{(j)}$ lies in a two-dimensional plane, and the angle $\phi_{ij}\in[0,\pi)$ between $T_{(i)}$ and $T_{(j)}$ guarantees uniqueness. Furthermore, for the angles among three vectors, there are only two proper situations. The first is $\phi_{jk} = \phi_{ji} + \phi_{ik}$, when the vectors $T_{(i)}$, $T_{(j)}$ and $T_{(k)}$ lie in the same plane and $T_{(i)}$ sits between $T_{(j)}$ and $T_{(k)}$. The second is $\phi_{jk} < \phi_{ji} + \phi_{ik}$, where the three vectors are not in the same plane. Consequently, for any three vectors $T_{(i)}$, $T_{(j)}$ and $T_{(k)}$, the angles between each pair of vectors must satisfy that the sum of any two angles is not less than the third and the difference between any two angles is not greater than the third,
$$
\phi_{jk} \le \phi_{ji} + \phi_{ik} \quad\text{and}\quad \phi_{jk} \ge |\phi_{ji} - \phi_{ik}|, \qquad k < i < j,
$$
where $\phi_{ji} = \phi_{ij}$. Thus, because of this restriction on the angles between vectors in three-dimensional subspaces, the model assumption must satisfy the constraint $\phi_{jk} \le \phi_{ji} + \phi_{ik}$. This is not only a geometric fact but is also sufficient for the positive semi-definiteness of the correlation matrix $R$. When $\phi_{jk} = \phi_{ji} + \phi_{ik}$, $T_{(i)}$, $T_{(j)}$ and $T_{(k)}$ must lie in the same plane, which implies that there exist constants $a$ and $b$ such that $T_{(i)} = aT_{(j)} + bT_{(k)}$. As a result, in the $i$th row of the correlation matrix $R$,
$$
\rho_{ih} = T_{(i)}T_{(h)}^{\top} = (aT_{(j)} + bT_{(k)})T_{(h)}^{\top} = a\,T_{(j)}T_{(h)}^{\top} + b\,T_{(k)}T_{(h)}^{\top}, \qquad 1 \le h \le p,
$$
which indicates that the $i$th, $j$th and $k$th rows are linearly dependent; the corresponding correlation matrix $R$ is then only positive semi-definite, $\det(R) = 0$. On the other hand, if $\phi_{jk} < \phi_{ji} + \phi_{ik}$, $R$ is positive definite.
We model the variance components by:
$$
\log\sigma_j^2 = h_j^{\top}\lambda, \qquad \phi_{jk} = g(\omega_{jk}),
$$
where $g(\omega_{jk})$ should be a monotonically decreasing function for stationary data whose correlation depends only on the absolute distance between covariates, $\omega_{jk} = \|Location_j - Location_k\|$. For the angles among three vectors $T_{(i)}$, $T_{(j)}$ and $T_{(k)}$, the distance $\omega_{jk}$ is not greater than $\omega_{ji} + \omega_{ik}$; then $\phi_{jk} = g(\omega_{jk}) \le g(\omega_{ji}) + g(\omega_{ik}) = \phi_{ji} + \phi_{ik}$. Another option is optimization with nonlinear programming (NLP). We can use NLP methods [12], with the nonlinear inequality constraints
$$
g(\omega_{jk}) - g(\omega_{ji}) - g(\omega_{ik}) + s_{jk}^2 = 0, \quad j < i < k, \qquad
|g(\omega_{ji}) - g(\omega_{ik})| - g(\omega_{jk}) + s_{jk}^2 = 0, \quad j < i < k,
$$
to calculate the maximum likelihood estimator. The $s_{jk}$ are slack parameters in the Karush–Kuhn–Tucker (KKT) conditions for NLP, used to translate the inequality constraints into equality ones. Solvers such as fmincon in MATLAB and the program LINGO can deal with this NLP problem as well. However, this is more a mathematical issue than a statistical one, so no further details on NLP are offered or studied in this paper.
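To make the constrained-fitting idea tangible, the toy sketch below is our illustration, not the authors' implementation (the paper itself defers the NLP details): it fits a hypothetical linear angle model $\phi_{jk} = \gamma_1 + \gamma_2\,\omega_{jk}$ to a target correlation matrix by least squares, while imposing the triangle-type angle constraints above through SciPy's SLSQP solver.

```python
# Toy sketch: fit a hypothetical linear angle model under the three-vector angle
# constraints, using SciPy's SLSQP solver (illustration only, not the paper's code).
import numpy as np
from itertools import combinations
from scipy.optimize import minimize

rng = np.random.default_rng(0)
p = 4
omega = np.abs(np.subtract.outer(np.arange(p), np.arange(p))).astype(float)  # distances
R_s = np.corrcoef(rng.standard_normal((50, p)), rowvar=False)                # target correlations
pairs = list(combinations(range(p), 2))

def phi(gamma):
    # hypothetical linear angle model, clipped to stay inside (0, pi)
    return {pr: float(np.clip(gamma[0] + gamma[1] * omega[pr], 1e-6, np.pi - 1e-6)) for pr in pairs}

def objective(gamma):
    ph = phi(gamma)
    return sum((np.cos(ph[pr]) - R_s[pr]) ** 2 for pr in pairs)

cons = []
for i, j, k in combinations(range(p), 3):
    def tri(gamma, i=i, j=j, k=k):
        ph = phi(gamma)
        a, b, c = ph[(i, j)], ph[(i, k)], ph[(j, k)]
        # angle "triangle" constraints: c <= a + b and c >= |a - b|
        return np.array([a + b - c, c - abs(a - b)])
    cons.append({"type": "ineq", "fun": tri})

res = minimize(objective, x0=np.array([1.5, 0.0]), method="SLSQP", constraints=cons)
print(res.x)    # fitted (gamma_1, gamma_2) under the constraints
```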
Under the new definition of Φ , we have a direct relation between R and Φ :
$$
\rho_{jk} = \rho_{kj} = \cos(\phi_{jk}), \tag{13}
$$
where $\phi_{jk}$ is in $[0,\pi]$. This is the key to guaranteeing permutation invariance. Compared with Equation (6), $\rho_{jk}$ in AHPC depends only on the single corresponding component $\phi_{jk}$, instead of on a series of $\phi$ values spread over a triangular area of $j+1$ rows as in HPC. The translations between $T$ and $\Phi$ are
$$
\cos(\phi_{jk}) = c_{jk} = \cos\langle T_{(j)}, T_{(k)}\rangle = T_{(j)}T_{(k)}^{\top} = \sum_{l=1}^{k} t_{jl}t_{kl}, \qquad j > k, \tag{14}
$$
and
$$
t_{jk} =
\begin{cases}
c_{j1}, & j > k = 1,\\[3pt]
\Bigl(c_{jk} - \sum_{l=1}^{k-1} t_{jl}t_{kl}\Bigr)\big/\, t_{kk}, & j > k > 1,\\[3pt]
1, & j = k = 1,\\[3pt]
\Bigl(1 - \sum_{l=1}^{k-1} t_{jl}^{2}\Bigr)^{1/2}, & j = k > 1,
\end{cases}
$$
where the $t_{jk}$ are the elements of $T$.
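The translations above are simple enough to verify in a few lines. The following sketch (ours, with a hypothetical $3\times 3$ angle matrix that satisfies the triangle-type constraints) goes from $\Phi$ to $R$ by Equation (13), back again by $\phi_{jk} = \arccos(\rho_{jk})$, and recovers $T$ through the Cholesky factorization, which coincides with the recursive translation above.

```python
# Sketch: AHPC translations between Phi, R and T (hypothetical 3 x 3 example).
import numpy as np

def ahpc_R_from_Phi(Phi_lower):
    """rho_jk = cos(phi_jk); the diagonal of Phi is 0, so diag(R) = 1 automatically."""
    return np.cos(Phi_lower + Phi_lower.T)

def ahpc_Phi_from_R(R):
    """phi_jk = arccos(rho_jk) in [0, pi], stored in the lower triangle."""
    return np.tril(np.arccos(np.clip(R, -1.0, 1.0)), k=-1)

# a hypothetical admissible angle matrix: every triple satisfies the triangle constraints
Phi = np.array([[0.0, 0.0, 0.0],
                [0.6, 0.0, 0.0],
                [1.0, 0.7, 0.0]])
R = ahpc_R_from_Phi(Phi)
T = np.linalg.cholesky(R)                 # same T as the recursive translation above
print(np.allclose(T @ T.T, R))
print(np.allclose(ahpc_Phi_from_R(R), Phi))
```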
When changing $R$ into $R^{*}$ by re-ordering $Y$ in matrix form, we actually apply a transformation matrix $A$ to $R$ such that
$$
ARA^{\top} = R^{*},
$$
in which $R$ and $R^{*}$ are non-negative definite symmetric matrices and $A$ is a square matrix. Under this setting, finding $A$ amounts to solving a special form of the algebraic Riccati equation
$$
A\,C_1^{\top}C_1\,A = R^{*} + F^{\top}A + AF,
$$
where $C_1^{\top}C_1 = R$, which means $C_1$ is a Cholesky factor of $R$, and we set $F = 0$. Consequently, there must exist a transformation matrix $A$ such that $ARA^{\top} = R^{*}$.
Meanwhile, based on the definition of the permutation matrix $P$, when changing the order of the rows in $R$ from $1,2,3,4$ to the new order $4,1,3,2$ to obtain $R^{*}$,
$$
PRP^{\top} = R^{*},
$$
where $P$ is a permutation matrix. Thus, the permutation matrix $P$ is a solution for the transformation matrix $A$. In that sense, after the Cholesky decompositions $R = LL^{\top}$ and $R^{*} = L^{*}L^{*\top}$, the following holds:
$$
PRP^{\top} = PLL^{\top}P^{\top} = L^{*}L^{*\top} = R^{*}.
$$
Eventually we obtain $PL = L^{*}$, up to the step of reducing $PL$ to a lower-triangular matrix. Turning $PL$ into a lower-triangular matrix is equivalent to selecting a new set of coordinate axes, and that is what makes $L^{*}$ differ from $L$.
In the alternative hyper-sphere decomposition method, there is a direct link between the correlation $\rho_{jk}$ and its corresponding coefficient $\phi_{jk}$ in Equation (13). Only in the proof below do we use the symmetric matrix $\Phi_f = \Phi + \Phi^{\top}$ instead of the lower triangular one. Thus we have
$$
PRP^{\top} = P\cos(\Phi_f)P^{\top} = \cos(P\Phi_f P^{\top}) = \cos(\Phi_f^{*}) = R^{*},
$$
which also means $P\Phi_f P^{\top} = \Phi_f^{*}$, where $\Phi_f^{*}$ is the angle matrix under the new order. For example, changing $1,2,3,4$ into $4,1,3,2$:
$$
P\Phi_f P^{\top} = \begin{pmatrix}
0 & \phi_{41} & \phi_{43} & \phi_{42}\\
\phi_{14} & 0 & \phi_{13} & \phi_{12}\\
\phi_{34} & \phi_{31} & 0 & \phi_{32}\\
\phi_{24} & \phi_{21} & \phi_{23} & 0
\end{pmatrix} = \Phi_f^{*}.
$$
We can observe that the entries of $\Phi_f$ and $\Phi_f^{*}$ are identical, with only their locations rearranged. This guarantees that the $\Phi$ matrix in AHPC is permutation invariant.
Make the model assumption on the variance components as in (2), with covariate matrices $\Omega$ and $H$. After changing to the new order, we correspondingly rearrange the covariates as $\Omega_f^{*} = P\Omega_f P^{\top}$. As proven before, the value of each $\phi_{jk}$ remains the same, and its corresponding $\omega_{jk}$ and $h_j$ under the model assumption also remain the same. Hence the model assumption and the estimators $\hat\lambda$ and $\hat\gamma$ remain the same after any change of the order.
As shown above, from both the aspect of entries in Φ and the model for variance components, the AHPC method is permutation invariant.
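A small numerical check of the argument (ours, with hypothetical angles; positive definiteness is not needed for the identity itself) confirms that permuting the data merely relocates the entries of $\Phi_f$:

```python
# Sketch: P R P' = P cos(Phi_f) P' = cos(P Phi_f P'), so AHPC angles keep their values.
import numpy as np

rng = np.random.default_rng(4)
p = 4
Phi_f = np.zeros((p, p))
low = np.tril_indices(p, -1)
Phi_f[low] = rng.uniform(0.8, 1.6, size=len(low[0]))   # hypothetical angles
Phi_f = Phi_f + Phi_f.T                                # symmetric, zero diagonal

order = [3, 0, 2, 1]                                   # new order 4, 1, 3, 2 as in the text
P = np.eye(p)[order]                                   # permutation matrix
R = np.cos(Phi_f)                                      # diag(Phi_f) = 0 gives diag(R) = 1

print(np.allclose(np.cos(P @ Phi_f @ P.T), P @ R @ P.T))                 # the identity above
print(np.allclose(np.sort((P @ Phi_f @ P.T)[low]), np.sort(Phi_f[low]))) # same angle values
```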

4.2. Estimation Method for AHPC

We use two algorithms to obtain the maximum likelihood estimators of the mean parameter $\beta$ and the covariance components $\gamma$ and $\lambda$ jointly. Given the response $Y_{p\times n} = (Y_1, Y_2,\dots,Y_n)$, with $p$ measurements on each of $n$ observations, under the linear model assumption we have
$$
Y - X\beta \sim MVN(0, \Sigma),
$$
where $X$ is a design matrix and $\beta$ is an unknown parameter vector. In AHPC, without loss of generality, we can set up a joint regression model for the mean and covariance components as
$$
\mu_j = x_j^{\top}\beta, \qquad \log\sigma_j^2 = h_j^{\top}\lambda, \qquad \phi_{jk} = \omega_{jk}^{\top}\gamma, \tag{20}
$$
where $\mu_j$ and $\sigma_j$ ($j = 1,2,\dots,p$) are, respectively, the mean and standard deviation of the $j$th measurement $Y_j$, and the $\phi_{jk}\in[0,\pi]$ ($1 \le k < j \le p$) are the angles in the matrix $\Phi$. The dimensions of $\beta$, $\lambda$ and $\gamma$ are $q_\beta$, $q_\lambda$ and $q_\gamma$, respectively. Ignoring the constant $(np/2)\log(2\pi)$, minus twice the log-likelihood function has the representation
$$
-2l = n\log|DRD| + \sum_{i=1}^{n} r_i^{\top}D^{-1}R^{-1}D^{-1}r_i = n\log|DRD| + \sum_{i=1}^{n}\xi_i^{\top}\xi_i,
$$
where the $r_i$ are the residuals $r_i = Y_i - \mu$ and $\xi_i = T^{-1}D^{-1}r_i$. In AHPC we have
$$
\log|DRD| = \log|DTT^{\top}D| = 2\log|D| + 2\log|T| = \sum_{j=1}^{p}\bigl(\log\sigma_j^2 + 2\log t_{jj}\bigr),
$$
where the $t_{jj}$ are the diagonal elements of $T$. We define $\Delta$ as a $p\times p$ matrix, and $\Delta(\partial\rho_{jk}/\partial\gamma_m)$ denotes the matrix whose elements are the $\partial\rho_{jk}/\partial\gamma_m$ ($1 \le j,k \le p$; $m = 1,2,\dots,q_\gamma$). Since $\Delta(\partial\rho_{jk}/\partial\gamma_m)$ is symmetric, like $R$, $\Delta(\partial\rho_{jk}/\partial\gamma_m) = \Delta(\partial\rho_{jk}/\partial\gamma_m)^{\top}$. Taking the first derivative of the log-likelihood function above with respect to $\beta$, $\lambda$ and $\gamma$ separately, we obtain the following score functions:
$$
\begin{aligned}
\frac{\partial(-2l)}{\partial\beta} &= -2\sum_{i=1}^{n} X_i^{\top}\Sigma^{-1}r_i,\\
\frac{\partial(-2l)}{\partial\lambda} &= n\sum_{l=1}^{p} h_l - \sum_{i=1}^{n}\sum_{l=1}^{p}\xi_{il}\sum_{k=1}^{l} a_{lk}\,\frac{r_{ik}h_k}{\sigma_k},\\
\frac{\partial(-2l)}{\partial\gamma_m} &= n\,\mathrm{tr}\!\left(R^{-1}\Delta\!\Bigl(\frac{\partial\rho_{jk}}{\partial\gamma_m}\Bigr)\right) - \sum_{i=1}^{n} r_i^{\top}D^{-1}R^{-1}\Delta\!\Bigl(\frac{\partial\rho_{jk}}{\partial\gamma_m}\Bigr)R^{-1}D^{-1}r_i,
\end{aligned}
$$
where $\xi_{jk}$ ($1 \le j \le n$; $1 \le k \le p$) is the $k$th element of $\xi_j$, and the $a_{lk}$ are the $(l,k)$th elements of $T^{-1}$. Based on model assumption (20), $\partial\rho_{jk}/\partial\gamma_m = -\sin(\phi_{jk})\,\omega_{jkm}$, where $\omega_{jkm}$ is the $m$th element of the vector $\omega_{jk}$, and $\partial l/\partial\gamma = (\partial l/\partial\gamma_1, \partial l/\partial\gamma_2,\dots,\partial l/\partial\gamma_{q_\gamma})^{\top}$. Setting the score functions above equal to zero gives the estimators of the parameters. In general, the score equations have no explicit solutions, and numerical optimization procedures such as the Newton–Raphson algorithm can be used.
By calculating the second derivatives of the log-likelihood function with respect to β , λ and γ we can have the Fisher-information matrix I,
$$
I = \begin{pmatrix}
I_{\beta\beta} & I_{\beta\lambda} & I_{\beta\gamma}\\
I_{\beta\lambda}^{\top} & I_{\lambda\lambda} & I_{\lambda\gamma}\\
I_{\beta\gamma}^{\top} & I_{\lambda\gamma}^{\top} & I_{\gamma\gamma}
\end{pmatrix},
$$
in which,
$$
I_{\beta\beta} = -E\!\left(\frac{\partial^{2} l}{\partial\beta\,\partial\beta^{\top}}\right) = n X^{\top}\Sigma^{-1}X.
$$
The expected Fisher-information matrix of λ is as follows:
$$
I_{\lambda\lambda} = -E\!\left(\frac{\partial^{2} l}{\partial\lambda\,\partial\lambda^{\top}}\right) = \frac{n}{4}\sum_{l=1}^{p}\sum_{k_1=1}^{l}\sum_{k_2=1}^{l} a_{lk_1}a_{lk_2}\,\rho_{k_1k_2}\bigl(h_{k_1}h_{k_1}^{\top} + h_{k_2}h_{k_1}^{\top}\bigr).
$$
Similarly to how we deal with the score function for γ ,
$$
I_{\gamma\gamma} = \begin{pmatrix}
\frac{\partial^{2}(2l)}{\partial\gamma_1\partial\gamma_1} & \frac{\partial^{2}(2l)}{\partial\gamma_1\partial\gamma_2} & \cdots & \frac{\partial^{2}(2l)}{\partial\gamma_1\partial\gamma_{q_\gamma}}\\
\frac{\partial^{2}(2l)}{\partial\gamma_2\partial\gamma_1} & \frac{\partial^{2}(2l)}{\partial\gamma_2\partial\gamma_2} & \cdots & \frac{\partial^{2}(2l)}{\partial\gamma_2\partial\gamma_{q_\gamma}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial^{2}(2l)}{\partial\gamma_{q_\gamma}\partial\gamma_1} & \frac{\partial^{2}(2l)}{\partial\gamma_{q_\gamma}\partial\gamma_2} & \cdots & \frac{\partial^{2}(2l)}{\partial\gamma_{q_\gamma}\partial\gamma_{q_\gamma}}
\end{pmatrix}.
$$
We have:
$$
\frac{\partial^{2}(2l)}{\partial\gamma_{m_1}\partial\gamma_{m_2}} = n\,\mathrm{tr}\!\left(R^{-1}\Delta\!\Bigl(\frac{\partial\rho_{jk}}{\partial\gamma_{m_2}}\Bigr)R^{-1}\Delta\!\Bigl(\frac{\partial\rho_{jk}}{\partial\gamma_{m_1}}\Bigr)\right)
+ \sum_{i=1}^{n}\mathrm{tr}\!\left(r_i^{\top}D^{-1}R^{-1}\Delta\!\Bigl(\frac{\partial\rho_{jk}}{\partial\gamma_{m_1}} + \frac{\partial\rho_{jk}}{\partial\gamma_{m_2}}\Bigr)R^{-1}\Delta\!\Bigl(\frac{\partial\rho_{jk}}{\partial\gamma_{m_1}} + \frac{\partial\rho_{jk}}{\partial\gamma_{m_2}}\Bigr)R^{-1}D^{-1}r_i\right).
$$
We use the same method to find the cross section parts with γ :
$$
I_{\gamma_m\beta} = I_{\beta\gamma_m} = \sum_{i=1}^{n}\mathrm{tr}\!\left(\frac{\partial r_i^{\top}}{\partial\beta}\,D^{-1}R^{-1}\Delta\!\Bigl(\frac{\partial\rho_{jk}}{\partial\gamma_m}\Bigr)R^{-1}D^{-1}r_i\right)
= \sum_{i=1}^{n}\mathrm{tr}\!\left(X^{\top}D^{-1}R^{-1}\Delta\!\Bigl(\frac{\partial\rho_{jk}}{\partial\gamma_m}\Bigr)R^{-1}D^{-1}r_i\right),
$$
and
$$
I_{\gamma_m\lambda} = I_{\lambda\gamma_m} = \sum_{i=1}^{n}\mathrm{tr}\!\left(r_i^{\top}\frac{\partial D^{-1}}{\partial\lambda}\,R^{-1}\Delta\!\Bigl(\frac{\partial\rho_{jk}}{\partial\gamma_m}\Bigr)R^{-1}D^{-1}r_i\right)
= 2\sum_{i=1}^{n}\mathrm{tr}\!\left(r_i^{\top}D^{-1}H\,R^{-1}\Delta\!\Bigl(\frac{\partial\rho_{jk}}{\partial\gamma_m}\Bigr)R^{-1}D^{-1}r_i\right),
$$
where $H = (h_1, h_2,\dots,h_p)^{\top}$. We can show that the expectations of the off-block-diagonal matrices $I_{\beta\lambda}$ and $I_{\beta\gamma}$ are equal to $0$. Details of the derivation of the score functions and the expected Fisher-information matrix can be found in the Supplementary Materials.
With the expected Fisher-information matrix in the form
$$
I = \begin{pmatrix}
I_{\beta\beta} & 0 & 0\\
0 & I_{\lambda\lambda} & I_{\lambda\gamma}\\
0 & I_{\lambda\gamma}^{\top} & I_{\gamma\gamma}
\end{pmatrix},
$$
the Fisher-scoring algorithm can be applied. This algorithm conveniently yields the asymptotic covariance matrix of the estimators $\hat\beta$, $\hat\lambda$ and $\hat\gamma$, namely the inverse Fisher-information matrix evaluated at the MLEs.
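For concreteness, one scoring iteration takes the familiar form below (a standard statement of Fisher scoring, not quoted from the paper; $U$ denotes the score vector). Because $I_{\beta\lambda} = I_{\beta\gamma} = 0$, the update for $\beta$ decouples from those for $(\lambda,\gamma)$ and, given the current covariance matrix $\Sigma_{(t)}$ built from $(\lambda^{(t)},\gamma^{(t)})$, reduces to a generalized least squares step:
$$
\begin{pmatrix}\beta^{(t+1)}\\ \lambda^{(t+1)}\\ \gamma^{(t+1)}\end{pmatrix}
= \begin{pmatrix}\beta^{(t)}\\ \lambda^{(t)}\\ \gamma^{(t)}\end{pmatrix}
+ I\bigl(\beta^{(t)},\lambda^{(t)},\gamma^{(t)}\bigr)^{-1} U\bigl(\beta^{(t)},\lambda^{(t)},\gamma^{(t)}\bigr),
\qquad
\beta^{(t+1)} = \Bigl(\sum_{i=1}^{n} X_i^{\top}\Sigma_{(t)}^{-1}X_i\Bigr)^{-1}\sum_{i=1}^{n} X_i^{\top}\Sigma_{(t)}^{-1}Y_i .
$$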
We also propose a simple regression algorithm based on the direct relationship between $\rho_{jk}$ and $\phi_{jk}$ in Equation (13). The sample variances and the sample correlation matrix can be used to estimate the covariance coefficients $\lambda$ and $\gamma$, even if the sample covariance matrix is singular. This approach cannot be applied to the HPC method, which requires the correlation matrix to be positive definite so that the Cholesky decomposition exists; otherwise there is no direct way to translate $R$ into $\Phi$.
This estimation method is computationally more efficient than the Newton–Raphson algorithm. Since we use LSE and GWLSE, there is no iterative search problem and global optimality is guaranteed. As proven before, the estimation of the parameter $\beta$ in the mean model is orthogonal to that of the covariance coefficients $\lambda$ and $\gamma$, so there is no problem in separating the estimation procedures as above.
Algorithm 1:Newton–Raphson Algorithm
Mathematics 10 00562 i007
Algorithm 2:Simple Regression Algorithm
Mathematics 10 00562 i008
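The published Algorithm 2 above appears only as an image, so the following is our assumed reconstruction of the simple regression idea described in the text, with illustrative names and designs (not a transcription of the original): $\beta$ by ordinary least squares, $\lambda$ by regressing the log sample variances on $h_j$, and $\gamma$ by regressing the $\arccos$ of the sample correlations on $\omega_{jk}$.

```python
# Sketch of the simple regression idea (assumed reconstruction, not the published algorithm).
import numpy as np

def ahpc_simple_regression(Y, X, H, Omega):
    """Y: p x n responses; X: p x q_beta mean design; H: p x q_lambda variance design;
    Omega: dict {(j, k): covariate vector} for 0 <= k < j <= p-1 (0-based indices)."""
    p, n = Y.shape
    beta, *_ = np.linalg.lstsq(X, Y.mean(axis=1), rcond=None)        # mean model by LSE
    resid = Y - (X @ beta)[:, None]
    s2 = resid.var(axis=1, ddof=1)                                    # sample variances
    lam, *_ = np.linalg.lstsq(H, np.log(s2), rcond=None)              # log-variance model
    R_s = np.corrcoef(resid)                                          # sample correlations
    rows = [(j, k) for j in range(1, p) for k in range(j)]
    W = np.vstack([Omega[(j, k)] for j, k in rows])
    phi = np.arccos(np.clip([R_s[j, k] for j, k in rows], -1.0, 1.0))
    gamma, *_ = np.linalg.lstsq(W, phi, rcond=None)                   # angle model
    return beta, lam, gamma

# toy usage with hypothetical designs
rng = np.random.default_rng(5)
p, n = 5, 200
Y = rng.standard_normal((p, n)) + np.linspace(0, 2, p)[:, None]
X = np.column_stack([np.ones(p), np.arange(p)])
H = np.column_stack([np.ones(p), np.arange(p)])
Omega = {(j, k): np.array([1.0, abs(j - k)]) for j in range(1, p) for k in range(j)}
print(ahpc_simple_regression(Y, X, H, Omega))
```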

5. Asymptotic Property

In this section, the consistency and asymptotic normality of the AHPC estimators under the standard situation $p < n$ are proven, similarly to the proofs in [10], under some regularity conditions.
Theorem 1.
Under the regularity conditions, when the sample size $n$ goes to $\infty$, the maximum likelihood estimators $\hat\beta$, $\hat\lambda$ and $\hat\gamma$ are consistent for the true values, $(\hat\beta, \hat\lambda, \hat\gamma) \xrightarrow{a.s.} (\beta_0, \lambda_0, \gamma_0)$.
We already have the score functions for $\beta$, $\lambda$ and $\gamma$. Based on the consistency of the MLEs $\hat\beta$, $\hat\lambda$ and $\hat\gamma$ in Theorem 1, we then have
$$
E\!\left(\frac{\partial l}{\partial\beta}\right)\bigg|_{\hat\beta} = 0, \qquad
E\!\left(\frac{\partial l}{\partial\lambda}\right)\bigg|_{\hat\lambda} = 0, \qquad
E\!\left(\frac{\partial l}{\partial\gamma}\right)\bigg|_{\hat\gamma} = 0.
$$
By simple calculation, we can prove that:
$$
\begin{aligned}
E\!\left(\frac{\partial l}{\partial\beta}\frac{\partial l}{\partial\beta^{\top}}\right) &= -E\!\left(\frac{\partial^{2} l}{\partial\beta\,\partial\beta^{\top}}\right), &
E\!\left(\frac{\partial l}{\partial\beta}\frac{\partial l}{\partial\lambda^{\top}}\right) &= -E\!\left(\frac{\partial^{2} l}{\partial\beta\,\partial\lambda^{\top}}\right), &
E\!\left(\frac{\partial l}{\partial\beta}\frac{\partial l}{\partial\gamma^{\top}}\right) &= -E\!\left(\frac{\partial^{2} l}{\partial\beta\,\partial\gamma^{\top}}\right),\\
E\!\left(\frac{\partial l}{\partial\lambda}\frac{\partial l}{\partial\lambda^{\top}}\right) &= -E\!\left(\frac{\partial^{2} l}{\partial\lambda\,\partial\lambda^{\top}}\right), &
E\!\left(\frac{\partial l}{\partial\lambda}\frac{\partial l}{\partial\gamma^{\top}}\right) &= -E\!\left(\frac{\partial^{2} l}{\partial\lambda\,\partial\gamma^{\top}}\right), &
E\!\left(\frac{\partial l}{\partial\gamma}\frac{\partial l}{\partial\gamma^{\top}}\right) &= -E\!\left(\frac{\partial^{2} l}{\partial\gamma\,\partial\gamma^{\top}}\right).
\end{aligned}
$$
Then, similar to the proof in  [13], we have the asymptotic normality of estimators.
Theorem 2.
Based on the regularity conditions and the Liapounov form of the multivariate central limit theorem, $(\hat\beta, \hat\lambda, \hat\gamma)$ is asymptotically normally distributed with variance $I(\beta_0,\lambda_0,\gamma_0)^{-1}$:
$$
\sqrt{n}\begin{pmatrix}\hat\beta - \beta_0\\ \hat\lambda - \lambda_0\\ \hat\gamma - \gamma_0\end{pmatrix}
\xrightarrow{d} MVN\bigl(0,\ I(\beta_0,\lambda_0,\gamma_0)^{-1}\bigr).
$$
Theorems 1 and 2 indicate the asymptotic consistency and normality of AHPC estimators ( β ^ , λ ^ , γ ^ ) , respectively. Regularity conditions and the proof of theorems above are included in Supplementary Materials.

6. Example Studies

In this section we set up several numerical examples to test the order-dependence of HPC and permutation invariance of AHPC. We directly decompose and model the correlation matrix R 0 using both methods instead of making estimations by fitting sample data. In this way we isolate the permutation variation, which is the main difference between the two methods, from randomness.

6.1. Order-Dependency of Entries in Φ of HPC

Given a set of $Y_j$, $j = 1,2,3,4,5,6$, assume there is no natural order for the $Y_j$s. Then, under the order $Y_1, Y_2, Y_3, Y_4, Y_5, Y_6$, the HPC decomposition of $R_0$ is $R_0 = T_0T_0^{\top}$ with the corresponding $\Phi_0$ matrix,
$$
T_0 = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0\\
0.980 & 0.198 & 0 & 0 & 0 & 0\\
0.921 & 0.321 & 0.219 & 0 & 0 & 0\\
0.696 & 0.387 & 0.218 & 0.562 & 0 & 0\\
0.169 & -0.028 & -0.223 & -0.399 & 0.872 & 0\\
-0.588 & -0.596 & -0.467 & -0.265 & -0.093 & 0.013
\end{pmatrix},
\qquad
\Phi_0 = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0\\
0.2 & 0 & 0 & 0 & 0 & 0\\
0.4 & 0.6 & 0 & 0 & 0 & 0\\
0.8 & 1.0 & 1.2 & 0 & 0 & 0\\
1.4 & 1.6 & 1.8 & 2.0 & 0 & 0\\
2.2 & 2.4 & 2.6 & 2.8 & 3.0 & 0
\end{pmatrix},
$$
where the signs of the entries of $T_0$ follow from the angles in $\Phi_0$ (angles larger than $\pi/2$ give negative cosines).
First, we change the order into $Y_1, Y_2, Y_5, Y_4, Y_3, Y_6$ by switching the positions of $Y_3$ and $Y_5$. That turns $R_0$ into $R_{3/5}$ by relocating the entries' positions in the dark shaded area:
Mathematics 10 00562 i005
The HPC decomposition of R 3 / 5 is as follows:
$$
T_{3/5} = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0\\
0.980 & 0.198 & 0 & 0 & 0 & 0\\
0.169 & -0.028 & 0.985 & 0 & 0 & 0\\
0.696 & 0.387 & -0.277 & 0.535 & 0 & 0\\
0.921 & 0.321 & -0.049 & 0.063 & 0.204 & 0\\
-0.588 & -0.596 & 0.131 & -0.401 & -0.345 & 0.013
\end{pmatrix},
\qquad
\Phi_{3/5} = \left(\begin{array}{cc|cccc}
0 & 0 & 0 & 0 & 0 & 0\\
0.2 & 0 & 0 & 0 & 0 & 0\\
1.4 & 1.6 & 0 & 0 & 0 & 0\\
0.8 & 1.0 & 2.048 & 0 & 0 & 0\\
0.4 & 0.6 & 1.8 & 1.268 & 0 & 0\\
2.2 & 2.4 & 1.328 & 2.429 & 3.103 & 0
\end{array}\right).
$$
The values of the entries on the left-hand side of the vertical line remain the same; only their locations change accordingly. For the reasons stated before, those on the right-hand side are new values, except for the 0s. This simple numerical study confirms that the HPC method applied to the correlation matrix is order dependent in terms of the entries of $\Phi$. Further numerical example studies are given in the Supplementary Materials.

6.2. Order Dependency of Covariance Model of HPC

Next, we study how a change of order under HPC affects the model assumption. Setting the dimension $p = 50$, we assume a cubic linear model for the correlation component $\phi_{jk}$:
$$
\phi_{jk} = \gamma_1 + \gamma_2\,\omega_{jk} + \gamma_3\,\omega_{jk}^2 + \gamma_4\,\omega_{jk}^3, \tag{32}
$$
where $\omega_{jk} = |j-k|$ ($1 \le k < j \le p$). Under the starting sequential order $1,2,\dots,50$, $\gamma_{null} = (1.671,\ 1.004\times10^{-3},\ 4.982\times10^{-4},\ 9.967\times10^{-6})$. Based on model (32) and $\gamma_{null}$, we obtain $\Phi_{null}$, $T_{null}$ and $R_{null}$. Plotting the $\phi_{jk}$s against their corresponding $\omega_{jk}$s, we can see that they form a smooth cubic curve in Figure 3a. We then switch positions 1 and 50; under the new order $50, 2, 3,\dots,49, 1$ we obtain the new correlation matrix $R_{1/50}$ by re-allocating the relevant elements. Applying the HPC method gives $T_{1/50}$ and $\Phi_{1/50}$. We then rearrange the design matrix $\Omega_{1/50}$ according to the new order. Again we plot $\Phi_{1/50}$ against the corresponding $\Omega_{1/50}$ in Figure 3b; it can be observed that all values are different because the order changed.
We fit a new cubic linear model to the $\phi_{jk}$s under the new order. As a result, a new $\hat\gamma_{1/50}$ is generated, $\hat\gamma_{1/50} = (1.614594,\ 1.503461\times10^{-3},\ 3.527650\times10^{-4},\ 5.440070\times10^{-6})$. It is obvious that $\hat\gamma_{1/50}$, plotted in Figure 3b, is quite different from the original $\gamma_{null}$, which is plotted in Figure 3a.
Even worse, when using $\hat\gamma_{1/50}$ to create an estimate $\hat R_{1/50}$, we find that $\hat R_{1/50}$ is a poor estimator of $R_{1/50}$, with a large sum of absolute differences (SAD) and Frobenius norm $\|R_{1/50} - \hat R_{1/50}\| = 60.91713$. That implies that the model assumption based on the original order may not even be appropriate under the new order. Furthermore, to test how the HPC method reacts to different kinds of order changes, we run a series of simulations as described below.
Algorithm 3: HPC Reactions to Different Permutation Changes
1. Switch positions $j$ and $k$, and obtain the true $R_{j/k}$ by re-allocating elements of $R_{null}$.
2. Apply the HPC method to $R_{j/k}$ to obtain the true values of $T_{j/k}$ and $\Phi_{j/k}$, and rearrange $\Omega$ into $\Omega_{j/k}$ under the changed order.
3. Fit a cubic linear model of $\Phi_{j/k}$ against $\Omega_{j/k}$ to obtain the estimator $\hat\gamma_{j/k}$, and use $\hat\gamma_{j/k}$ to create the estimators $\hat\Phi_{j/k}$, $\hat T_{j/k}$ and $\hat R_{j/k}$.
4. Record the Frobenius norms $\|R_{j/k} - \hat R_{j/k}\|$, $\|T_{j/k} - \hat T_{j/k}\|$ and $\|\Phi_{j/k} - \hat\Phi_{j/k}\|$ (a compact numerical sketch of these steps is given after this list).
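The following compact sketch is ours; it uses $p = 20$ rather than 50 and illustrative cubic coefficients (the signs of $\gamma_{null}$ are not fully legible here), and it runs through steps 1–4 for a single switch, reporting the Frobenius norm of $R_{j/k} - \hat R_{j/k}$.

```python
# Sketch of Algorithm 3 for one switch (hypothetical coefficients, p = 20).
import numpy as np

def T_from_Phi(Phi):
    p = Phi.shape[0]
    T = np.zeros((p, p)); T[0, 0] = 1.0
    for i in range(1, p):
        s = 1.0
        for j in range(i):
            T[i, j] = np.cos(Phi[i, j]) * s
            s *= np.sin(Phi[i, j])
        T[i, i] = s
    return T

def Phi_from_R(R):
    T = np.linalg.cholesky(R)
    p = R.shape[0]; Phi = np.zeros((p, p))
    for i in range(1, p):
        s = 1.0
        for j in range(i):
            Phi[i, j] = np.arccos(np.clip(T[i, j] / s, -1.0, 1.0))
            s *= np.sin(Phi[i, j])
    return Phi

p = 20
gamma_null = np.array([1.6, -2.0e-2, 3.0e-4, -1.0e-6])      # illustrative cubic coefficients
w = np.abs(np.subtract.outer(np.arange(p), np.arange(p))).astype(float)
Phi_null = np.tril(np.polyval(gamma_null[::-1], w), k=-1)    # phi = g1 + g2 w + g3 w^2 + g4 w^3
R_null = T_from_Phi(Phi_null) @ T_from_Phi(Phi_null).T

j, k = 0, p - 1                                   # step 1: switch positions 1 and p
perm = list(range(p)); perm[j], perm[k] = perm[k], perm[j]
R_jk = R_null[np.ix_(perm, perm)]
Phi_jk = Phi_from_R(R_jk)                         # step 2: true angles under the new order
low = np.tril_indices(p, -1)
coef = np.polyfit(w[low], Phi_jk[low], 3)         # step 3: refit the cubic model
Phi_hat = np.tril(np.polyval(coef, w), k=-1)
R_hat = T_from_Phi(Phi_hat) @ T_from_Phi(Phi_hat).T
print(np.linalg.norm(R_jk - R_hat, "fro"))        # step 4: typically far from zero
```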
First, we test how the distance between the pair of positions we switch affects the SAD between the estimator and the true value. Here we perform 49 different switches between position 1 and position $k$ ($k = 2,3,\dots,50$). The results can be seen in Figure 4, from which we observe that the SAD between the estimators and the true values grows with the distance between the switched positions. That indicates that the further apart the switched positions are, the more severely biased the estimator becomes.
Second, we test how the location of the first switched position affects the estimators. This time we use the same procedure, setting $j = 1,2,3,\dots,49$ and $k = j+1$. The results are shown in Figure 5, from which we see that the bias decreases as the switched pair moves away from the first position. Furthermore, comparing with Figure 4, the effect of the distance between the switched pair is much stronger than that of the smallest position being changed.

6.3. Permutation Invariance of Entries in Φ in AHPC

Now we apply the same simulation study to AHPC. Given the same starting correlation matrix $R_0$ as above, the Cholesky decomposition of $R_0$ generates the same $T_0$ as before. Based on Equation (14) we obtain $\tilde\Phi_0$ for AHPC. Then we swap positions 3 and 5, which gives the same $R_{3/5}$ and $T_{3/5}$ as in HPC. According to the newly proposed definition of the translation between $\Phi$ and $T$ in AHPC, we obtain $\tilde\Phi_{3/5}$ for AHPC:
Mathematics 10 00562 i006
All the gray shaded parts have their positions re-allocated, but the values remain the same as in the original $\tilde\Phi_0$ for AHPC. We can see that, after the order changes, all values in the matrix $\Phi$ under AHPC remain the same; only their positions change to fit the new order. In that sense, AHPC is order invariant in terms of the matrix $\Phi$.

6.4. Permutation Invariance of Covariance Model in AHPC

To find out how AHPC modeling reacts to the order change, we use the same $\gamma_{null}$ and the cubic linear model assumption (32) under the sequential order $1,2,\dots,50$. Applying the AHPC method under this model assumption, we obtain $\Phi^{*}_{null}$, $T^{*}_{null}$ and $R^{*}_{null}$. After switching positions 1 and 50 and applying AHPC, we obtain $R^{*}_{1/50}$, $T^{*}_{1/50}$ and $\Phi^{*}_{1/50}$. This time all values in $\Phi^{*}_{1/50}$ are the same as those in $\Phi^{*}_{null}$. Plotting the $\phi_{jk}$s against their corresponding $\omega_{jk}$s gives Figure 6.
Furthermore, we fit a cubic linear model to the $\phi_{jk}$s under the new order against their corresponding $\omega_{jk}$s taken from $\Omega_{1/50}$. We obtain an estimator $\hat\gamma_{1/50} = \gamma_{null}$. Moreover, $\hat R^{*}_{1/50}$ also has zero bias from $R^{*}_{1/50}$. In conclusion, AHPC is order invariant both in terms of the values in $\Phi^{*}$ and in terms of the covariance model.
The model assumptions under HPC and AHPC are not compatible. Continuing the simulation above under the cubic linear model assumption: when the correlation matrix $R_{HPC}$ is generated from HPC, we decompose $R_{HPC}$ by AHPC and end up with the plot of $\phi^{*}_{jk}$ against $\omega_{jk}$ in Figure 7a. The other way around, when the correlation matrix $R_{AHPC}$ is created by AHPC, we decompose $R_{AHPC}$ by HPC and obtain Figure 7b. As we can see, the model assumptions in HPC and AHPC are not compatible with each other. Based on the asymptotic results, consistency holds only when the model assumption is correct, so there is no point in a cross-method estimation comparison under one identical model assumption.

7. Real Data Analysis

In this section, we analyze a set of weather data by HPC and AHPC, focusing on comparing the estimators of the correlation matrix. There is no natural order in this weather data set. The jmcm package in R [5] is used for the estimation of the HPC method, and the function lm in R is used for the linear regressions of AHPC. The weather data are collected from the UK government's public web page, the Met Office. We select $p = 5$ different weather stations in the UK, located at Ballypartick Forest, Cambridge, Lewick, Leuchars and Sheffield. We cut off the data before 1962 to make the data set balanced, leaving a sample size of $n = 660$; there are no missing values. The average minimum temperature in each month is recorded in degrees Celsius.
For the comparison between HPC and AHPC, our interest lies in the estimators of the correlation matrices. Thus, the sample variance $\tilde\sigma_i$ is used as the estimator of the variance for station $i$. We set the initial order alphabetically: Ballypartick Forest, Cambridge, Lewick, Leuchars and Sheffield. We then shift this initial order by relocating the Sheffield station to the beginning. Meanwhile, based on a non-parametric analysis of $\Phi$ under both methods with respect to the distances $d_{ij}$ between climate stations $i$ and $j$, solid lines are plotted in Figure 8. Combined with the consideration of AIC and BIC, we assume three different polynomial models for $\phi$ in HPC and AHPC under the two orders, respectively.
For the $\phi_{ij}$s of the HPC method, we assume a linear model under the initial order and a quadratic one under the shifted order. On the other hand, to keep the model monotonically decreasing, we model the angle parameters $(1/\phi_{ij})$s in AHPC instead of the $\phi_{ij}$s, assuming a linear model under both orders.
Under both orders, applying HPC and AHPC gives four regression results. The fitted models are plotted with dotted lines in Figure 8. We cross-compare these results with the sample correlation matrices under the two orders via their relative errors, defined as Relative Error $= \|R_S - \hat R\|/\|R_S\|$, where $\|\cdot\|$ is the Euclidean norm and $R_S$ and $\hat R$ are the sample and estimated correlation matrices, respectively. Note that the norm of the sample correlation matrix is the same under both orders, $\|R_{initial}\| = \|R_{shifted}\| = 11.055$, while the norm of the difference between the sample correlation matrices under the two orders is $\|R_{initial} - R_{shifted}\| = 13.69$.
Observing the results in Table 2 and Table 3, it is obvious that the estimator of HPC depends on the order of the data, while AHPC presents a consistent estimating result. Even after redoing the modeling process from the model selection stage, we observe from the relative errors in Table 2 that the relative errors between the sample and estimated correlations generated from $\gamma_{initial}$ and $\gamma_{shifted}$ differ in HPC. In conclusion, under different permutations, the estimating results of HPC vary. Consequently, it is improper to use the HPC method to model the covariance matrices of data without a natural order, because it fails to offer a permutation-consistent estimator.
On the other hand, for AHPC, Table 3 shows the consistency of the estimators under different orders. Moreover, AHPC has an interpretation advantage compared to HPC. For example, in this real data analysis, the relationship between the correlation $\rho_{jk}$ and the distance between stations $j$ and $k$ is not obvious in the HPC model. Meanwhile, the AHPC model for $1/\phi_{ij}$ is monotonically decreasing, which implies that $\phi_{jk}$ increases with the distance $\omega_{jk}$. Moreover, $\rho_{jk} = \cos(\phi_{jk})$ with $\phi_{jk}\in(0,\pi]$ is a monotonically decreasing function of $\phi_{jk}$. Thus $\rho_{jk}$ decreases with the distance between stations.

8. Conclusions and Discussion

In this paper, we addressed the permutation variation of HPC through its geometrical interpretations. Then AHPC for covariance modeling was proposed. AHPC was proven to improve the order-dependence issue of HPC. Furthermore, the direct relation between Φ and R in AHPC provides an advantage in making model assumptions, parameter estimations and statistical interpretations. However, due to the limitation of the relation between angles, the model assumption for angles Φ in AHPC must satisfy an extra constraint.
Both HPC and AHPC can only guarantee positive semi-definiteness, and the reason behind this drawback is the same for both methods. From the geometrical interpretations above, we can see that the definitions of the angles in both methods only ensure symmetry and unit diagonal elements in the correlation matrix $R$. By making the inequality constraints on the angles in AHPC strict, we can ensure that the correlation matrix is positive definite.
The four covariance modeling methods mentioned in this paper each have their advantages. For the most accurate estimator, an appropriate model assumption is essential, so the choice of decomposition method should be guided by the data, since different methods generate different patterns of coefficients. Under an appropriate model assumption, the estimators of all four methods are consistent. AHPC, MCD and ACD allow model selection to be done visually, while in some cases model selection in HPC can only be based on statistical criteria.
There are several potential research directions for the AHPC method. As seen in the simulation examples, the pattern is sometimes too complicated to fit with a linear model; non-parametric and semi-parametric models could be applied within AHPC, similarly to the studies of [14,15].

Supplementary Materials

The following are available online at https://www.mdpi.com/article/10.3390/math10040562/s1.

Author Contributions

Conceptualization, J.P. and Q.L.; methodology, J.P. and Q.L.; software, Q.L.; validation, J.P.; writing—original draft preparation, Q.L.; writing—review and editing, Q.L.; visualization, Q.L.; supervision, J.P.; project administration, J.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 11731011 and 11871357.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Pourahmadi, M. High-Dimensional Covariance Estimation; Wiley: Hoboken, NJ, USA, 2013. [Google Scholar]
  2. Little, R.J.; Rubin, D.B. Statistical Analysis with Missing Data; John Wiley & Sons: Hoboken, NJ, USA, 2019; Volume 793. [Google Scholar]
  3. Zhang, C.H.; Huang, J. The sparsity and bias of the lasso selection in high-dimensional linear regression. Ann. Stat. 2008, 36, 1567–1594. [Google Scholar] [CrossRef]
  4. Pourahmadi, M. Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation. Biometrika 1999, 86, 677–690. [Google Scholar] [CrossRef]
  5. Pan, J.; Pan, Y. jmcm: An R package for joint mean-covariance modeling of longitudinal data. J. Stat. Softw. 2017, 82, 1–29. [Google Scholar] [CrossRef] [Green Version]
  6. Rebonato, R.; Jäckel, P. The Most General Methodology to Create a Valid Correlation Matrix for Risk Management and Option Pricing Purposes. 2011. Available online: https://ssrn.com/abstract=1969689 (accessed on 8 May 2012).
  7. Rapisarda, F.; Brigo, D.; Mercurio, F. Parameterizing correlations: A geometric interpretation. IMA J. Manag. Math. 2007, 18, 55–73. [Google Scholar] [CrossRef] [Green Version]
  8. Pourahmadi, M. Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix. Biometrika 2000, 87, 425–435. [Google Scholar] [CrossRef]
  9. Murty, K.G. Computational and Algorithmic Linear Algebra and n-Dimensional Geometry; World Scientific Publishing Company: Singapore, 2014. [Google Scholar]
  10. Zhang, W.; Leng, C.; Tang, C.Y. A joint modelling approach for longitudinal studies. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2015, 77, 219–238. [Google Scholar] [CrossRef]
  11. Bates, D.M.; Watts, D.G. Nonlinear Regression Analysis and Its Applications; Wiley: New York, NY, USA, 1988; Volume 2. [Google Scholar]
  12. Sallan, J.M.; Lordan, O.; Fernandez, V. Modeling and Solving Linear Programming with R; OmniaScience: Barcelona, Spain, 2015. [Google Scholar]
  13. Chiu, T.Y.; Leonard, T.; Tsui, K.W. The matrix-logarithmic covariance model. J. Am. Stat. Assoc. 1996, 91, 198–210. [Google Scholar] [CrossRef]
  14. Wang, N.; Carroll, R.J.; Lin, X. Efficient semiparametric marginal estimation for longitudinal/clustered data. J. Am. Stat. Assoc. 2005, 100, 147–157. [Google Scholar] [CrossRef] [Green Version]
  15. Fan, J.; Huang, T.; Li, R. Analysis of longitudinal data with semiparametric estimation of covariance function. J. Am. Stat. Assoc. 2007, 102, 632–641. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. Three-dimensional illustration of HPC. (a) Interpretation of the HPC method based on the idea of angles between hyper-planes. (b) Interpretation of the HPC method based on Jacobi rotation.
Mathematics 10 00562 g001
Figure 2. A geometric representation of AHPC when p = 3. The ϕjk s are angles between T(j) and T(k), respectively.
Mathematics 10 00562 g002
Figure 3. Plot of ϕjk s against their corresponding ωjk s. (a) Plot of ϕjk s against their corresponding ωjk s, with γnull under sequential order 1, …, 50; the solid line is the true model. (b) Plot of ϕjk against ωjk under the new order 50, 2, …, 49, 1 in dots. The solid line is the fitted line with γ̂1/50, while the dotted line stands for the γnull model.
Mathematics 10 00562 g003
Figure 4. Plot of the difference between estimators and true value when switching order 1 and k (k = 2, 3, …, 50).
Mathematics 10 00562 g004
Figure 5. Plot of differences before and after switching order j and j + 1 (j = 1, 2, 3, …, 49).
Mathematics 10 00562 g005
Figure 6. Plots of ϕ*jk s in the AHPC method against ωjk s under the original sequential order and switched order.
Mathematics 10 00562 g006
Figure 7. Cross-method results. (a) ϕjk s against ωjk s from R_HPC decomposed by the AHPC method. (b) ϕjk s against ωjk s from R_AHPC decomposed by the HPC method.
Mathematics 10 00562 g007
Figure 8. Solid and dotted lines are the non-parametric smooth and the fitted models of the covariance components Φ (circles in each plot) under the HPC and AHPC methods separately, with respect to the distances between the five stations, under the initial and shifted order.
Mathematics 10 00562 g008
Table 1. Dependence between entries in Φ and row vectors in T. For example, 1, 2, 3, p means element ϕ p 3 , at row p column 3 in Φ depends on the first, second, third and pth row vectors in T.
        | col 1 | col 2   | col 3      | ... | col p−1                | col p
row 1   | 0     | 0       | 0          | ... | 0                      | 0
row 2   | 1, 2  | 0       | 0          | ... | 0                      | 0
row 3   | 1, 3  | 1, 2, 3 | 0          | ... | 0                      | 0
row p   | 1, p  | 1, 2, p | 1, 2, 3, p | ... | 1, 2, 3, ..., p − 1, p | 0
Table 2. Resulting HPC estimates and standard error of correlation components γ , under initial and shifted order, respectively. Relative errors between sample and estimated correlation under different orders are cross compared.
Data Order      | Initial Order                        | Shifted Order
                | Estimator        | Standard Error    | Estimator        | Standard Error
γ0              | 5.847            | 2.598 × 10^{-1}   | −1.544 × 10^{-1} | 5.672 × 10^{-1}
γ1              | 1.488 × 10^{-3}  | 4.93 × 10^{-4}    | 5.186 × 10^{-3}  | 2.375 × 10^{-3}
γ2              | 0                | 0                 | −3.480 × 10^{-6} | 2.222 × 10^{-6}
                | R̂(γ̂ initial)     | R̂(γ̂ shifted)      | R̂(γ̂ initial)     | R̂(γ̂ shifted)
Relative Error  | 0.173            | 0.216             | 1.575            | 0.124
Table 3. Resulting AHPC estimates and standard error of correlation components γ , under initial and shifted order, respectively. Relative errors between sample and estimated correlation under different orders are cross compared.
Data Order      | Initial Order                        | Shifted Order
                | Estimator        | Standard Error    | Estimator        | Standard Error
γ0              | 1.548            | 2.365 × 10^{-1}   | 1.548            | 2.365 × 10^{-1}
γ1              | −1.323 × 10^{-3} | 4.488 × 10^{-4}   | −1.323 × 10^{-3} | 4.488 × 10^{-4}
                | R̂(γ̂ initial)     | R̂(γ̂ shifted)      | R̂(γ̂ initial)     | R̂(γ̂ shifted)
Relative Error  | 0.347            | 0.347             | 0.347            | 0.347
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
