Abstract
We derive an extension of the test statistic for testing the equality of means for high-dimensional k-th order array-variate data using the k-self similar compound symmetry (k-SSCS) covariance structure. k-th order data appear in many scientific fields, including agricultural, medical, environmental, and engineering applications. We discuss a key property of this k-SSCS covariance structure, namely, that it forms a Jordan algebra. We formally show that our test statistic for k-th order data generalizes the corresponding test statistics for second-order and third-order data. We also derive the test statistic for third-order data and illustrate its application using a medical dataset from a clinical trial study of the eye disease glaucoma. The new test statistic is very efficient for high-dimensional data, where the estimation of an unstructured variance-covariance matrix is not feasible owing to the small sample size.
1. Introduction
We study the hypothesis testing problem of equality of means for high-dimensional and higher-order (multi-dimensional array) data. Standard multivariate techniques such as Hotelling’s T² with one big unstructured variance-covariance matrix (assuming a large sample size) do not work for these higher-order data, as Hotelling’s T² cannot incorporate any higher-order information into the test statistic and thus draws wrong conclusions []. Higher-order data are formed by representing the additional associations that are inherent in repetition across several dimensions. To obtain a better understanding of higher-order data, we first present a simple example:
- Traditional multivariate (vector-variate) data are first-order data. For example, consider a clinical trial study of glaucoma, in which several measurements, such as intraocular pressure (IOP) and central corneal thickness (CCT), are effective in the diagnosis of glaucoma. This is an illustration of vector-variate first-order data.
- When the first-order data are measured at various locations/sites or time points, the data become two-dimensional matrix-variate data, which we call second-order data. These data are also recognized as multivariate repeated measures data or doubly multivariate data, e.g., multivariate spatial data or multivariate temporal data. In the above example of the clinical trial study, an ophthalmologist or optometrist diagnoses glaucoma by measuring IOP and CCT in both eyes. So, we see how the vector-variate first-order dataset discussed above becomes a matrix-variate second-order dataset by measuring variables repeatedly over another dimension.
- When the second-order data are measured at various sites or over various time points, the data become three-dimensional array-variate data, which we call third-order data. These are also recognized as triply multivariate data, e.g., multivariate spatio-temporal data or multivariate spatio-spatio data. In the previous example, if the IOP and CCT are measured in both eyes as well as over, say, three time points, the dataset becomes third-order data.
- When the third-order data are measured in various directions, the data become four-dimensional array-variate fourth-order data, e.g., multivariate directo-spatio-temporal data or multivariate directo-spatio-spatio data.
- When the fourth-order data are measured at various depths, the data become five-dimensional array-variate fifth-order data, and so on, e.g., multivariate deptho-directo-spatio-temporal data.
In the above glaucoma data example, the dataset has variables that are repeatedly measured over various dimensions, adding higher-order information to the dataset. Now, the question is, what is this higher-order information? Higher-order information is embedded in the higher-order covariance structures that are formed by the additional associations inherent in the repetition of the variables across several dimensions. The other question is, how can we measure and capture the higher-order information? For this, one needs to understand how to read these structured higher-order data and how to use an appropriate variance-covariance structure to incorporate the higher-order information that is integral to the higher-order data.
Higher-order data have been studied by many authors over the last 20 years using various variance-covariance structures to reduce the number of unknown parameters, which is very important for high-dimensional data. Second-order data are studied using the matrix-variate normal distribution [,]. Second-order data can also be analyzed vectorially using a two-separable (Kronecker product) variance-covariance structure [,], or a block compound symmetry (BCS) structure, also called a block exchangeable (BE) or 2-SSCS covariance structure []. The two-separable covariance structure for second-order data has two covariance matrices, one for each order of the data; in other words, one covariance matrix for within-subject information and the other for between-subject information. Combining the covariance structures of within-subject and between-subject information results in a second-order model for second-order data. Ignoring this information often distorts the test statistic, and if it is not properly accounted for, the test statistic will yield wrong conclusions []. To obtain a picture of third-order data, see []. Manceur and Dutilleul [] used the tensor normal distribution with a doubly separable covariance structure. The 2-SSCS and 3-SSCS covariance structures are useful tools for the analyses of second- and third-order datasets, respectively. Manceur and Dutilleul [] also studied fourth-order data with a four-separable covariance structure. In the same way, k-th order data can be analyzed vectorially with a structured variance-covariance matrix that integrates the higher-order information into the model, e.g., a k-separable covariance structure [,] for k-th order data. However, the k-separable covariance structure may not be appropriate for all datasets; thus, in this article we investigate another structure, namely, the k-SSCS covariance structure (defined in Section 3), for k-th order data. See [].
High-dimensionality requires exploiting the structural properties of the data to reduce the number of estimated degrees of freedom and reach more accurate conclusions for k-th order data, and the k-SSCS covariance structure is one such property. For example, for the third-order glaucoma data, the number of unknown parameters in the 12 × 12 unstructured variance-covariance matrix is 78, whereas the number of unknown parameters for the 3-SSCS covariance structure is just 9, which may help in providing the correct information about the true association of the structured third-order data. The data quickly become high-dimensional as the order of the data increases; the estimated variance-covariance matrix then becomes singular for small samples, and testing of the mean is not possible. This necessitates the development of new statistical methods with a suitable structured variance-covariance matrix, which can integrate the existing correlation information of the higher-order data into the test statistic and can also cope with the high-dimensionality of the data.
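To make the parameter savings concrete, the following small sketch (ours, not the authors') counts parameters for the glaucoma dimensions quoted above: two response variables, two eyes, and three time points.

```python
# Parameter counts for the third-order glaucoma data: m = 2 response
# variables (IOP, CCT), u = 2 sites (eyes), v = 3 time points.
m, u, v = 2, 2, 3
p = m * u * v                       # full dimension of a vectorized observation: 12

# An unstructured p x p covariance matrix has p(p+1)/2 free parameters.
n_unstructured = p * (p + 1) // 2   # 78

# A 3-SSCS structure is determined by three m x m symmetric component
# matrices, each contributing m(m+1)/2 parameters.
k = 3
n_sscs = k * m * (m + 1) // 2       # 9

print(n_unstructured, n_sscs)       # prints: 78 9
```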
Rao [,] introduced the 2-SSCS covariance structure while classifying genetically different groups. Olkin and Press [] examined a circular stationary model. The problem of estimation in balanced multilevel models with a block circular symmetric covariance structure was studied by Liang et al. []. Olkin [] studied the hypothesis testing problem of the equality of the mean vectors of multiple populations of second-order data using a 2-SSCS covariance structure, which is reminiscent of a model of Wilks []. Arnold [] studied normal testing problems in which the mean is patterned and the variance-covariance matrix has a 2-SSCS structure. Arnold [] also studied the multivariate analysis of variance problem when the variance-covariance matrix has a 2-SSCS structure. Arnold [] later developed linear models with a 2-SSCS structure for the error matrix of one matrix-variate observation. Roy et al. [] and Žežula et al. [] studied hypothesis testing problems on the mean for second-order data using a 2-SSCS covariance structure. There are a few studies on third-order data using the 3-SSCS covariance structure; see Leiva and Roy [] for classification problems and Roy and Fonseca [] for linear models with a 3-SSCS covariance structure on the error vectors. Recently, Žežula et al. [] studied the mean value test for third-order data using a 3-SSCS covariance structure.
A majority of the above-mentioned authors studied only second-order matrix-variate data and used a 2-SSCS covariance structure, where the exchangeability (invariance) property was present in one factor. However, we now obtain datasets with more than one factor, and the assumption of exchangeability on the levels of those factors is often appropriate. A k-SSCS structured matrix results from the exchangeability property of the factors of a dataset. Employing a 2-SSCS covariance structure would be wrong for datasets with more than one exchangeable factor. One may construct second-order data from k-th order data by summing the observations over dimensions; however, this results in a loss of detailed information about particular characteristics that may be of interest. One may also consider matricization of the k-th order data to second-order data and then use the 2-SSCS covariance structure, but then, once again, the higher-order correlation information will be wiped out. So, new statistical methods are in demand to handle k-th order data using the k-SSCS variance-covariance matrix.
The aim of this paper is to derive a test statistic for the mean of high-dimensional k-th order data using the k-SSCS covariance matrix by generalizing the test statistics developed in Žežula et al. []. In doing so, we exploit the distributions of the eigenblocks of the k-SSCS covariance matrix. We obtain test statistics for the mean in the one-sample case, the paired-samples case, and the two-independent-samples case. We show in Remark 2 that our generalized test statistic for k-th order data reduces to the test statistic for second-order data, and there we also derive the test statistic for third-order data, largely motivated by the work of Žežula et al. [].
This article is organized as follows. In Section 2, we set up some preliminaries about matrix notations and definitions related to block matrices. Section 3 defines the k-SSCS covariance matrix and discusses its properties, such as the Jordan algebra property. Section 4 discusses the estimation of the eigenblocks and their distributions. The test for the mean of one population is proposed in Section 5. Tests for the equality of means of two populations are proposed in Section 6, and an example of a dataset exemplifying our proposed method is presented in Section 7. Finally, Section 8 concludes with some discussion and the scope for future research.
2. Preliminaries
Let for be natural numbers greater than and be given by:
with We denote by the set , for
Definition 1.
We say that a matrix is a k-th order block matrix according to the factorization , to point out that it can be expressed in k different “natural” partitioned matrix forms, that is:
Note that for the case , the matrix is a dimensional matrix with blocks. Clearly, both and for second-order data, and , and for third-order data, and so on. Next, we define matrix operators that will be useful tools in working with these k-th order block matrices, where . Let denote the set of -matrices.
Definition 2.
Let and denote the and block operators from to for , respectively, where will always be evident from the context. These block operators applied to a matrix
give the following -matrices:
The subscript in these block matrix operators represents the -dimensional blocks in a partitioned square matrix , and thus their use results in -dimensional matrices. Many useful properties of these block operators, which we will use later in this article, are examined in Leiva and Roy []. For any natural number , we use the following additional notations:
where , with being the vector of ones, and being the ith column vector of the identity matrix . Observe that and are idempotent matrices and mutually orthogonal to each other.
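In the usual compound symmetry notation, the two matrices in question are the averaging projector built from the vector of ones and its orthogonal complement; here is a minimal numeric sketch, with names of our choosing.

```python
# Assumed notation (ours): Jbar_n = (1/n) * ones matrix (averaging
# projector) and Q_n = I_n - Jbar_n (centering projector) -- the standard
# idempotent, mutually orthogonal pair behind compound symmetry.
import numpy as np

n = 4
Jbar = np.full((n, n), 1.0 / n)   # projects onto the span of the ones vector
Q = np.eye(n) - Jbar              # projects onto its orthogonal complement

assert np.allclose(Jbar @ Jbar, Jbar)            # idempotent
assert np.allclose(Q @ Q, Q)                     # idempotent
assert np.allclose(Jbar @ Q, np.zeros((n, n)))   # mutually orthogonal
```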
For a fixed natural number let be the -matrix
where the symbol ⊗ represents the Kronecker product operator and for each
with
and . Also, let be the -matrix such that
where
3. Properties of the Self Similar Compound Symmetry Covariance Matrix
Let be an -variate vector of measurements on the rth replicate (individual) at the factor combination. Let be the -variate vector of all measurements corresponding to the rth sample unit of the population, that is, . Thus, the unstructured covariance matrix has unknown parameters, which can be large for even moderate values of the ’s. Consequently, if the data are high-dimensional, the k-SSCS covariance matrix (defined below in Definition 3), with unknown parameters, is a good choice if the exchangeability feature is present in the data.
Definition 3.
We say that has a k-SSCS covariance matrix if is of the form:
where for are -matrices called SSCS-component matrices, with the assumption that is equal to the real number 1.
The covariance matrix given in (8) is called the k-self similar compound symmetry covariance matrix because, if we consider the -dimensional vector with a k-SSCS covariance matrix and, for each fixed , we also consider the partition of into -subvectors, then its corresponding covariance matrix is partitioned into -submatrices and is a -SSCS matrix (see Leiva and Roy []) as follows:
where is the g-SSCS matrix given by:
The existence of can be proved, and its expression derived, using the principle of mathematical induction. For the expression of , we need the matrices for , which are defined as follows:
where and Note that
It can be proved that if matrices are non-singular, then exists and is given by:
(see Leiva and Roy []), where the symbol indicates the zero matrix . It is worthwhile to note that the structure of is the same as the structure of that is, it has the k-SSCS structure given in (9) with (10) and (11) and
where, in this formula , is as follows
Using similar inductive arguments, it can be proved that:
where the matrices are given by (12), and it is assumed that and . The matrices are the k eigenblocks of the k-SSCS covariance structure. See Lemma 4 of Leiva and Roy [] for proof. The matrix can be written as the following sum of k orthogonal parts:
and if exists, then it can be written as:
where is given in (5), for each and, for is given in (7).
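The decomposition above is easiest to see concretely for k = 3. The sketch below constructs a 3-SSCS matrix under the familiar doubly exchangeable representation, which is our reading of (8) and (9); the numeric component matrices are illustrative only.

```python
# Sketch: a 3-SSCS (doubly exchangeable) covariance matrix built from
# m x m component matrices U0, U1, U2 via
#   Sigma = I_v (x) I_u (x) (U0 - U1) + I_v (x) J_u (x) (U1 - U2)
#           + J_v (x) J_u (x) U2,
# where (x) is the Kronecker product and J_n is the n x n matrix of ones.
import numpy as np

m, u, v = 2, 2, 3
U0 = np.array([[4.0, 1.0], [1.0, 3.0]])   # same eye, same time point
U1 = np.array([[1.5, 0.5], [0.5, 1.0]])   # between eyes, same time point
U2 = np.array([[0.8, 0.2], [0.2, 0.5]])   # between time points

I_u, I_v = np.eye(u), np.eye(v)
J_u, J_v = np.ones((u, u)), np.ones((v, v))
Sigma = (np.kron(I_v, np.kron(I_u, U0 - U1))
         + np.kron(I_v, np.kron(J_u, U1 - U2))
         + np.kron(J_v, np.kron(J_u, U2)))

# Every m x m block of Sigma equals U0, U1, or U2, depending only on
# whether the eye/time indices coincide -- the exchangeability feature.
print(np.linalg.eigvalsh(Sigma).min() > 0)  # True: positive definite here
```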
The conventional Hotelling’s T² statistic to test the mean is based on the unbiased estimate of the unstructured variance-covariance matrix, which follows a Wishart distribution. Nevertheless, the unbiased estimate of the k-SSCS covariance matrix does not follow a Wishart distribution, and thus the test statistic to test the equality of means does not follow Hotelling’s T² distribution. We thus make a canonical transformation of the data to block diagonalize the k-SSCS covariance matrix, show that scalar multiples of the estimates of the diagonal blocks (eigenblocks) follow independent Wishart distributions, and use this property to our advantage to obtain test statistics to test the mean for k-th order data. We see from Leiva and Roy [] that the k-SSCS matrix given by (8) can be transformed into an -block diagonal matrix (the blocks in the diagonal are -matrices) by pre- and post-multiplying by appropriate orthogonal matrices.
For let denote the identity matrix, that is:
and let
where is a Helmert matrix for each , i.e., each is an orthogonal matrix whose first column is proportional to the vector of ones. Then:
is an orthogonal matrix (note that the are not functions of any of the ’s), and in particular
Lemma 4 of Leiva and Roy [] states and proves the block diagonalization result of the k-SSCS matrix by using the orthogonal matrix as defined in (16), that is:
where, for each the -diagonal matrices are given by:
where is not taken into consideration, that is:
Thus, are the k eigenblocks of the k-SSCS covariance matrix . We will obtain the estimators of the eigenblocks in Section 4. In the following section, we briefly discuss that the k-SSCS covariance structure is of the Jordan algebra type.
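The block diagonalization just described can be checked numerically. The sketch below assumes the standard doubly exchangeable eigenblock formulas for k = 3 (consistent with the multiplicities quoted at the end of Section 4) and uses SciPy's Helmert matrices for the orthogonal factors.

```python
# Sketch: block-diagonalize a 3-SSCS matrix with Helmert-based orthogonal
# matrices and verify the three eigenblocks and their positions.
import numpy as np
from scipy.linalg import helmert

m, u, v = 2, 2, 3
U0 = np.array([[4.0, 1.0], [1.0, 3.0]])
U1 = np.array([[1.5, 0.5], [0.5, 1.0]])
U2 = np.array([[0.8, 0.2], [0.2, 0.5]])
I_u, I_v = np.eye(u), np.eye(v)
J_u, J_v = np.ones((u, u)), np.ones((v, v))
Sigma = (np.kron(I_v, np.kron(I_u, U0 - U1))
         + np.kron(I_v, np.kron(J_u, U1 - U2))
         + np.kron(J_v, np.kron(J_u, U2)))

Hu = helmert(u, full=True).T   # orthogonal, first column = ones/sqrt(u)
Hv = helmert(v, full=True).T
G = np.kron(Hv, np.kron(Hu, np.eye(m)))
D = G.T @ Sigma @ G            # block diagonal, with m x m diagonal blocks

Delta1 = U0 - U1                                # multiplicity v*(u-1) = 3
Delta2 = U0 + (u - 1) * U1 - u * U2             # multiplicity v-1 = 2
Delta3 = U0 + (u - 1) * U1 + u * (v - 1) * U2   # multiplicity 1
for s in range(u * v):
    blk = D[s * m:(s + 1) * m, s * m:(s + 1) * m]
    expect = Delta3 if s == 0 else (Delta2 if s % u == 0 else Delta1)
    assert np.allclose(blk, expect)
```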
k-SSCS Covariance Structure Is of the Jordan Algebra Type
The k-SSCS covariance structure is of the Jordan algebra type (Jordan et al. []). Let be the set of all k-SSCS matrices. It is clear that, under the usual matrix addition and scalar multiplication, is a subspace of the linear vector space of symmetric matrices. For any natural number , it is easy to prove the following proposition:
Proposition 1.
Therefore, we conclude that is a Jordan algebra. See Lemma 4.1 on page 10 in Malley [], which states that is a Jordan algebra if and only if for all . See Roy et al. [] and Kozioł et al. [] for proofs that the 2-SSCS and 3-SSCS covariance structures are of Jordan algebra type.
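As a numeric illustration of the closure property behind Proposition 1, the following sketch squares a 3-SSCS matrix and verifies that the square again has the 3-SSCS block pattern; the component matrices are the illustrative ones used earlier.

```python
# Sketch: the 3-SSCS class is closed under squaring (Jordan-algebra
# property): every m x m block of Sigma @ Sigma takes one of only three
# values, according to whether the eye/time indices coincide.
import numpy as np

m, u, v = 2, 2, 3
U0 = np.array([[4.0, 1.0], [1.0, 3.0]])
U1 = np.array([[1.5, 0.5], [0.5, 1.0]])
U2 = np.array([[0.8, 0.2], [0.2, 0.5]])
I_u, I_v = np.eye(u), np.eye(v)
J_u, J_v = np.ones((u, u)), np.ones((v, v))
Sigma = (np.kron(I_v, np.kron(I_u, U0 - U1))
         + np.kron(I_v, np.kron(J_u, U1 - U2))
         + np.kron(J_v, np.kron(J_u, U2)))
S2 = Sigma @ Sigma

def block(M, i, j):   # the m x m block in position (i, j)
    return M[i * m:(i + 1) * m, j * m:(j + 1) * m]

V0, V1, V2 = block(S2, 0, 0), block(S2, 0, 1), block(S2, 0, 2)
for s in range(u * v):
    for t in range(u * v):
        same_time = (s // u) == (t // u)
        expect = V0 if s == t else (V1 if same_time else V2)
        assert np.allclose(block(S2, s, t), expect)
```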
4. Estimators of the Eigenblocks
Let be random vectors partitioned into subvectors as follows:
The vectors are a random sample from a population with distribution where is a positive definite k-SSCS structured covariance matrix as given in (8) in Definition 3. Let be the -sample data matrix as follows:
with
In this section, we prove that certain unbiased estimators (to be defined) of the matrix parameters can be written as functions of the usual sample variance-covariance matrix as follows:
where is given in (2) with (3). Now the sample mean can be expressed as:
Thus, in can be expressed as:
Since is an unbiased estimator of , we have:
Therefore, to find a better unbiased estimator of , we average all the above random matrices that are unbiased estimators of the same . The unbiased estimators of for each are derived in Lemma 5 in Leiva and Roy [], with defined in Lemma 3 in Leiva and Roy [], as:
Unbiased estimators of the eigenblocks can be obtained from (13). Then, using (14), the unbiased estimators of can be obtained as the following orthogonal sums:
and if exists, it can be obtained from (15) as follows:
where is given in (5), for each and, for is given in (7).
The computation of the unbiased estimates of the component matrices for each is easy, as all of them have explicit solutions. At this point, we want to mention that, for the k-separable covariance structure, the estimates of the component matrices are not easy to obtain, as the MLEs satisfy implicit equations and are therefore not analytically tractable. Now, from Theorem 1 of Leiva and Roy [], we see that a multiple of the unbiased estimators of the eigenblocks , for each , have Wishart distributions as follows:
where given by (4) with (5) and (6) with (7), are independent and
From Corollary 1 of Leiva and Roy [], the 2-SSCS covariance matrix for second-order data or multivariate repeated measures data has two eigenblocks, and with multiplicity , and their distributions are as follows:
The 3-SSCS covariance matrix for third-order data has three eigenblocks, , with multiplicity and with multiplicity , and their distributions are as follows:
5. Test for the Mean
5.1. One Sample Test
Using the notation and assumptions in Section 4, let be a -dimensional data matrix formed from the random samples from . Let be the sample mean, then . We are interested in testing the following hypothesis:
for known . For testing hypothesis (21), we use the test statistic defined as:
5.1.1. Distribution of Test Statistic under
Now, let be the matrix as given in (16). We use here the following canonical transformation:
Therefore, . Then, according to (17) with (18), we have:
where, for each the diagonal - matrices are given by:
where is not taken into consideration, and the component vectors with are independent. The distribution of , under is given by:
Since
for we have:
and for we have:
Therefore, using (20), the statistic in (22) can be written as:
that is,
where for
and for , we assume
Note that the subsets of vectors involved in respectively, form a partition of the set of independent vectors . Therefore, are mutually independent. Moreover, since for
where
and
with
Therefore, given by (24) reduces to
and has a Lawley–Hotelling trace (LH-trace) distribution denoted by if . Note that, using (25), the case reduces to and . Then, has the LH-trace distribution if .
Thus, the distribution of given by (23) is the convolution of k independent LH-trace distributions:
The critical values of this distribution can be obtained using simulations. However, the LH-trace distribution is usually approximated by an F distribution, and we use here the second approximation suggested in McKeon []. For the jth case, i.e., for , let us use the notations , and
Then, the distribution
of can be approximated by where and
Finally, for , the distribution
is the usual Hotelling’s T², that is, distributed as an exact F distribution as follows:
This means that the distribution of can be approximated by the convolution of the above k distributions (k − 1 approximated F distributions and one exact F distribution), where its critical values are obtained by the method suggested by Dyer [].
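Since the null distribution is a convolution, a direct simulation is often the most transparent way to obtain critical values. The sketch below is ours: it adopts one common convention for the LH-trace variable, namely nu * tr(H E^{-1}) with H ~ W_p(q, I) and E ~ W_p(nu, I) independent, and the parameter triples are illustrative, not values from the paper.

```python
# Monte Carlo critical value for a sum of independent LH-trace variables.
import numpy as np
rng = np.random.default_rng(0)

def lh_trace(p, q, nu, rng):
    """One draw of an LH-trace statistic, nu * tr(H @ inv(E))."""
    Xh = rng.standard_normal((q, p))
    Xe = rng.standard_normal((nu, p))
    H, E = Xh.T @ Xh, Xe.T @ Xe
    return nu * np.trace(H @ np.linalg.inv(E))

def convolution_quantile(terms, alpha=0.05, reps=20_000, rng=rng):
    """Upper-alpha critical value of a sum of independent LH traces."""
    draws = np.zeros(reps)
    for (p, q, nu) in terms:
        draws += np.array([lh_trace(p, q, nu, rng) for _ in range(reps)])
    return np.quantile(draws, 1.0 - alpha)

# e.g., three independent terms, as in the third-order case (illustrative):
print(convolution_quantile([(2, 3, 29), (2, 2, 29), (2, 1, 29)]))
```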
Remark 1.
The statistic has an LH-trace distribution if . We also note that has the LH-trace distribution if . Now, for k-th order data, all for and . See Definition 1. Now, . Therefore, and then . Thus, . Since for all k-th order data, , we have when . Therefore, the only constraint needed on the sample size in order for all to follow LH-trace distributions is , i.e., , regardless of any . In essence, the minimum sample size needed to compute the test statistic is , although the minimum sample size needed to compute Hotelling’s T² test statistic is , where is the full dimension of the observations. For this reason, one cannot compute Hotelling’s T² test statistic for a small-sample dataset where , whereas our test statistic remains computable.
We will now discuss some special cases of the statistic in the following remark.
Remark 2.
For second-order data or multivariate repeated measures data, . Now, is distributed as LH-trace distribution , and is distributed as LH-trace distribution as follows
Thus, is distributed as LH-trace distribution .
So, we see that this test exactly matches the test obtained by Žežula et al. [] for multivariate repeated measures data (second-order data) with the 2-SSCS or BCS covariance structure. Therefore, we can say that the mean test statistic in this article generalizes Žežula et al.’s [] mean test statistic to k-th order data with the k-SSCS covariance structure.
We will now derive the mean test statistic for third-order data with 3-SSCS covariance structure. For third-order data, . Now, is distributed as LH-trace distribution , and is distributed as LH-trace distribution as follows:
that can be approximated by , where and , with and , and is distributed as an LH-trace distribution as follows:
that can be approximated by , where and , with and .
So, one can easily derive the test statistic for j-th order data for from our generalized statistic. The distribution of under for second-order data with 2-SSCS covariance structure is discussed in detail in Žežula et al. []. We will discuss the distribution of under for third-order data in detail in the following section.
5.1.2. Distribution of Statistic under for Third-order Data with 3-SSCS Covariance Structure
This section is adapted from the work of Žežula et al. []. However, we use a much simpler, more straightforward approach so that practitioners and analysts can appreciate and apply the method easily. Let be a matrix such that for each , is a Helmert matrix, that is, an orthogonal matrix with the first column proportional to the vector of 1’s. We use here the following canonical transformation:
Therefore, where
where, for each the diagonal -matrices are given by
and the component vectors with are independent, with distributions (under the null hypothesis) if
Therefore, particularizing , given by (23), for we have
Since the subsets of vectors involved in , respectively, form a partition of the set of independent vectors , these statistics are mutually independent. Moreover, since, for
has an LH-trace distribution denoted by if . Similarly, for
has an LH-trace distribution denoted by if . Note that the case reduces to , and then has the LH-trace distribution if .
Therefore, the distribution of is the following convolution of three independent LH-trace distributions:
if , for . The critical values of this distribution can be obtained using simulations. The LH-trace distribution is usually approximated by an F distribution as mentioned before; however, we use here the second approximation suggested in McKeon [].
For denoting by by and by
the distribution
of can be approximated by where and
For denoting by by and by
the distribution
of can be approximated by where and
Finally, for our last case corresponding to , the distribution is the usual Hotelling’s T², that is, an exact F distribution as follows:
This means that the distribution of can be approximated by the convolution of the above three distributions (two approximated F distributions and one exact F distribution), where its critical values are obtained by the method suggested by Dyer [].
Now, we need to perform the convolution of three distribution functions. Since convolution is associative, the order in which the three distribution functions are convolved does not matter, so we can dispense with the parentheses. A small Monte Carlo sketch of this convolution step is given below; in the following section, we present the unbiased estimates of the eigenblocks for a 3-SSCS covariance matrix.
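In code, the convolution step can be carried out by Monte Carlo: each component is drawn as a scaled F variable (the two McKeon-type approximations plus the exact Hotelling T² term, itself a scaled F), and the tail probability is estimated empirically. All constants below are placeholders to be computed from the formulas above.

```python
# Sketch: p-value of an observed statistic whose null distribution is a
# convolution of (approximately) scaled F variables.
import numpy as np
rng = np.random.default_rng(1)

def convolved_pvalue(components, observed, reps=200_000, rng=rng):
    """components: list of (scale, df1, df2); returns P(sum > observed)."""
    total = np.zeros(reps)
    for scale, df1, df2 in components:
        total += scale * rng.f(df1, df2, size=reps)
    return float(np.mean(total > observed))

# Illustrative placeholder constants only:
components = [(3.1, 5.2, 40.0), (2.4, 4.1, 35.0), (2.6, 2.0, 28.0)]
print(convolved_pvalue(components, observed=25.0))
```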
5.2. The Expressions of the ’s Estimators for the Case
- 1.
- From Lemma 5 in Leiva and Roy [], the unbiased estimators of for each are written as follows: and where are given in (19). Therefore, an unbiased estimator of is given by: Since the k-SSCS matrix in (9) is of Jordan algebra type, following Kozioł et al. [] one can show that the above estimate is the best unbiased, consistent, and complete estimator for .
- 2.
- For each , an unbiased estimator of is given by: where and , or equivalently: The above unbiased estimators admit the following expressions as functions of : for and , where and , and an unbiased estimator of is given by:
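The two items above translate directly into code. The sketch below (function names ours) implements the block-averaging estimators and then the one-sample statistic as we read (22) and (26) for k = 3, assuming an eye-within-time vectorization and the standard doubly exchangeable eigenblock formulas.

```python
import numpy as np
from scipy.linalg import helmert

def sscs3_component_estimates(S, m, u, v):
    """Unbiased estimates of U0, U1, U2 by averaging the m x m blocks of S."""
    U0 = np.zeros((m, m)); U1 = np.zeros((m, m)); U2 = np.zeros((m, m))
    c0 = c1 = c2 = 0
    for s in range(u * v):
        for t in range(u * v):
            B = S[s * m:(s + 1) * m, t * m:(t + 1) * m]
            if s == t:
                U0 += B; c0 += 1
            elif s // u == t // u:   # same time point, different eyes
                U1 += B; c1 += 1
            else:                    # different time points
                U2 += B; c2 += 1
    return U0 / c0, U1 / c1, U2 / c2

def sscs3_T2(X, mu0, m, u, v):
    """One-sample statistic n*(xbar-mu0)' Sigma_hat^{-1} (xbar-mu0),
    with Sigma_hat the 3-SSCS fit, evaluated eigenblock by eigenblock."""
    n = X.shape[0]
    U0h, U1h, U2h = sscs3_component_estimates(np.cov(X, rowvar=False), m, u, v)
    D1 = U0h - U1h                                # eigenblocks, cf. Section 4
    D2 = U0h + (u - 1) * U1h - u * U2h
    D3 = U0h + (u - 1) * U1h + u * (v - 1) * U2h
    Hu = helmert(u, full=True).T                  # first column prop. to ones
    Hv = helmert(v, full=True).T
    z = np.kron(Hv, np.kron(Hu, np.eye(m))).T @ (X.mean(axis=0) - mu0)
    T2 = 0.0
    for s in range(u * v):
        w = z[s * m:(s + 1) * m]
        D = D3 if s == 0 else (D2 if s % u == 0 else D1)
        T2 += n * w @ np.linalg.solve(D, w)
    return T2
```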
6. Test for the Equality of Two Means
6.1. Paired Observation Model
In this section, we consider that for each one of the n individuals, a -variate vector is measured at two different times (e.g., before and after a treatment). These measurements are k-th order (array-variate) measurements from each individual. To be more precise, for each , let and be the paired -dimensional vectors measured at the site of the rth individual, for . Let be the partitioned -variate vectors, where and , where and , with and , respectively, where and are the paired measurements taken from the rth individual, for . We assume that , where i.i.d. stands for independent and identically distributed, and , and is the partitioned -matrix
where
The matrices and account for the linear dependence among the considered paired measurements. Particular cases of could be of interest, e.g., , that is, (see, for example, Definition 2 on page 388 in Leiva []). Under this setup, we are interested in testing the following hypothesis:
If we define , the above hypothesis is equivalent to
as Moreover, are i.i.d. where and
Assuming is a positive definite matrix and that , one may consider the likelihood ratio test for the above hypothesis testing problem for k-level multivariate data, assuming the mean vectors and are unstructured. Note that this testing problem reduces to the one-sample mean case of the previous section, where . Therefore, all the results obtained in the previous section are valid for this case. Following the same logic as in Remark 1, the needed sample size for the test is , regardless of any .
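In code, the reduction is a one-line differencing of the paired data matrices; the commented call shows how the k = 3 one-sample sketch from Section 5.2 would be reused (placeholder data only).

```python
# Paired k-th order samples reduce to a one-sample zero-mean test on the
# differences d_r = x_r - y_r.
import numpy as np
rng = np.random.default_rng(2)

n, p = 30, 12                              # p = m*u*v for m=2, u=2, v=3
X = rng.standard_normal((n, p))            # placeholder paired data
Y = X + 0.5 * rng.standard_normal((n, p))
D = X - Y                                  # one sample with a k-SSCS covariance
# T2 = sscs3_T2(D, np.zeros(p), m=2, u=2, v=3)   # reuse the Section 5.2 sketch
```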
6.2. Independent Observation Model
In this section, we consider the case where we have two independent samples: one random sample of size of vectors with where , …, and where with being the measurements taken from the individual, for and another random sample of size of vectors , with where , where , …, and where with being the measurements taken from the individual, for for .
Our interest is in testing the following hypothesis:
under the assumption that is an unknown k-SSCS covariance matrix of the form (8). Let and denote the corresponding two sample data matrices. We know that the sample means and are independent of the covariance matrix estimators and , respectively. Therefore, they are also independent of the pooled unbiased estimator (a convex linear combination of unbiased estimators of ), which is given by:
where
Now
We know that under :
Due to the exchangeable form of it is clear that we again have:
Note that each of the following expressions is the arithmetic mean of all submatrices of , which, according to (31), have the same expectation. It is easy to prove that for each an unbiased estimator of for is given by:
and for after some algebraic simplification is given by:
where is given in (19). Therefore, we can use the following unbiased estimator of variance and covariance matrices
where, for each the diagonal -matrices are given by:
where is not taken into consideration and where
with , or equivalently:
The usual likelihood ratio test of (30) is to reject if:
Since ,
Nevertheless, we cannot use the above result, as in our case, is an estimator of . However, by Theorem 1 of Leiva and Roy [], we know that the random vectors and where are given by (4) with (5) and (6) with (7) are independent and
Since the estimators are functions of , they are independent of . Therefore, using a similar procedure as in the one-sample case, where we used the transformation , we now use the following transformation:
According to the previous result, where
where, for each the diagonal - matrices are given by:
Using a similar result as the one used in the one sample case, we obtain the statistic as follows:
Then, the distribution of is the convolution of k independent LH-trace distributions as follows:
The only condition needed on the sample size in order to have the above convolution of k independent LH-trace distributions is . However, LH-trace distributions are usually approximated by F distributions (we use here the second approximation suggested in McKeon []). This means that the distribution of can be approximated by the convolution of k F distributions, where its critical values are obtained by the method suggested by Dyer [].
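A self-contained sketch of the two-sample ingredients described above: the pooled covariance estimator (the convex combination of the two sample estimators) and the n1*n2/(n1+n2) scaling of the mean difference, with placeholder data; the eigenblock machinery of the Section 5.2 sketch is then reused unchanged.

```python
import numpy as np
rng = np.random.default_rng(3)

n1, n2, p = 30, 25, 12
X1 = rng.standard_normal((n1, p))   # placeholder sample 1
X2 = rng.standard_normal((n2, p))   # placeholder sample 2

S1 = np.cov(X1, rowvar=False)
S2 = np.cov(X2, rowvar=False)
S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

diff = X1.mean(axis=0) - X2.mean(axis=0)
scale = n1 * n2 / (n1 + n2)         # replaces n from the one-sample case
# The statistic is the same eigenblock-wise quadratic form applied to
# sqrt(scale) * diff, with eigenblocks estimated from S_pooled.
```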
7. An Example
We apply our proposed extended test statistic to a third-order medical dataset as described in the Introduction, where the interest is in testing the equality of the mean of a population of glaucoma patients to a target mean from another population of glaucoma patients []. Several studies have shown that central corneal thickness (CCT) plays a major role in the diagnosis of glaucoma. Intraocular pressure (IOP) is positively correlated with CCT and may therefore affect diagnosis. Therefore, CCT should be measured along with IOP in all patients for verification of glaucoma. CCT and IOP vary from individual to individual, from right eye to left eye, and from time to time. We have a sample of 30 glaucoma patients. Measurements on IOP and CCT were taken from both eyes (sites) and were observed over three time points at an interval of three months. Clearly, then, this dataset is a third-order dataset with two response variables, two sites, and three time points. This dataset was studied by Leiva and Roy [] assuming a 3-SSCS covariance structure. Here, we also assume that this dataset has a 3-SSCS covariance structure. The 12-dimensional sample partitioned mean vector for our sample of 30 glaucoma patients is presented in Table 1.
Table 1.
The (2 × 1) dimensional sample partitioned mean vector in our sample of 30 glaucoma patients.
Additionally, using the Formulas (27)–(29) presented in Section 5.2, the unbiased estimates , , and are:
respectively. Using the above estimates, the unbiased estimate of is:
The diagonal blocks represent the estimate of the variance-covariance matrix of the two response variables IOP and CCT at any given eye and any given time point, whereas the off-diagonal blocks within a time point represent the estimate of the covariance matrix of IOP and CCT between the two eyes at that time point. The off-diagonal blocks between time points represent the covariance matrix of IOP and CCT between any two time points.
Iester et al. [] reported the mean and standard deviation (SD) of the IOP and CCT measurements for both eyes from 794 Italian Caucasian glaucoma patients (see Table 2). We treat these means as the means of IOP and CCT at the first time point and then randomly generate four samples within three standard errors (SDs of the mean) of these reported means of IOP and CCT to represent the means of IOP and CCT for the left and right eyes in the third and sixth months, respectively. These randomly generated means of IOP and CCT for the left and right eyes at three time points, in vector form, are reported in Table 3, and we take this mean vector as the targeted mean in (21). The sample mean vector in Table 1 appears to be very different from the targeted population mean vector in Table 3.
Table 2.
IOP and CCT measurements from 794 Italian Caucasian glaucoma patients.
Table 3.
The (2 × 1) dimensional targeted partitioned mean vector in the Italian Caucasian glaucoma patients.
The aim of our study is to see whether our sample of 30 glaucoma patients has the same mean vector as the Italian Caucasian glaucoma patients. Our main intention in analyzing our glaucoma dataset is to illustrate the use of our new hypothesis testing procedures rather than to give any insight into the dataset itself.
The calculated statistic (26) equals 317.2971. Its null distribution is a convolution of three independent LH-trace distributions, approximated in turn by two F distributions and one exact F distribution, and the corresponding p-value is 0. So, we reject the null hypothesis that the population mean of our dataset is equal to that of the Italian Caucasian glaucoma patients; this conclusion was expected from the data.
8. Conclusions and Discussion
We study tests of hypotheses of the equality of means for one population as well as for two populations for high-dimensional and higher-order data with the k-SSCS covariance structure. Such a structure is a natural and credible assumption in many research studies. The MLEs and the unbiased estimates of the matrix parameters of the k-SSCS covariance structure have closed-form solutions. On the other hand, the MLEs and the unbiased estimates of the matrix parameters of the separable covariance structure are not tractable and are computationally intensive. So, the k-SSCS covariance structure is a desirable covariance structure for k-th order data. Aghaian et al. [] examined differences in the CCT of 801 subjects, establishing that the CCT of Japanese participants was significantly lower than that of Caucasians, Chinese, Filipinos, and Hispanics, and greater than that of African Americans. African American individuals have thinner corneas compared to white individuals []. So, CCT and IOP in glaucoma patients vary with race, and our result confirms this fact. Our proposed new hypothesis testing procedures are well suited for high-dimensional array-variate data, which are ubiquitous in this century. In discriminant analysis [], the first step is to test the equality of means for the two populations. Therefore, our new method developed in this article will have important applications in the analysis of modern multivariate datasets with higher-order structure. Our new method can be extended to non-normal datasets. In addition, it can be extended to testing the equality of means for more than two populations and to simultaneous hypothesis testing in models with the k-SSCS covariance structure.
Author Contributions
All authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
The authors gratefully acknowledge the funding support from the Department of Management Science and Statistics, Carlos Alvarez College of Business, The University of Texas at San Antonio, San Antonio, Texas for paying the APC. The authors are also thankful to the editor and three anonymous referees for their careful reading, valuable comments, and suggestions that led to a quite improved version of the manuscript.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Žežula, I.; Klein, D.; Roy, A. Testing of multivariate repeated measures data with block exchangeable covariance structure. Test 2018, 27, 360–378. [Google Scholar] [CrossRef]
- Dutilleul, P. The mle algorithm for the matrix normal distribution. J. Stat. Comput. Simul. 1999, 64, 105–123. [Google Scholar] [CrossRef]
- Gupta, A.K. On a multivariate statistical classification model. In Multivariate Statistical Analysis; Gupta, R.P., Ed.; North-Holland: Amsterdam, The Netherlands, 1980; pp. 83–93. [Google Scholar]
- Lu, N.; Zimmerman, D. The likelihood ratio test for a separable covariance matrix. Stat. Probab. Lett. 2005, 73, 449–457. [Google Scholar] [CrossRef]
- Kollo, T.; von Rosen, D. Advanced Multivariate Statistics with Matrices; Springer: Dordrecht, The Netherlands, 2005. [Google Scholar]
- Leiva, R. Linear discrimination with equicorrelated training vectors. J. Multivar. Anal. 2007, 98, 384–409. [Google Scholar] [CrossRef] [Green Version]
- Manceur, A.M.; Dutilleul, P. Maximum likelihood estimation for the tensor normal distribution: Algorithm, minimum sample size, and empirical bias and dispersion. J. Computat. Appl. Math. 2013, 239, 37–49. [Google Scholar] [CrossRef]
- Leiva, R.; Roy, A. Classification of higher-order data with separable covariance and structured multiplicative or additive mean models. Commun. Stat. Theory Methods 2014, 43, 989–1012. [Google Scholar] [CrossRef]
- Ohlson, M.; Ahmad, M.R.; von Rosen, D. The multilinear normal distribution: Introduction and some basic properties. J. Multivar. Anal. 2013, 113, 37–47. [Google Scholar] [CrossRef] [Green Version]
- Leiva, R.; Roy, A. Self Similar Compound Symmetry Covariance Structure. J. Stat. Theory Pract. 2021, 15, 70. [Google Scholar] [CrossRef]
- Rao, C.R. Familial correlations or the multivariate generalizations of the intraclass correlation. Curr. Sci. 1945, 14, 66–67. [Google Scholar]
- Rao, C.R. Discriminant functions for genetic differentiation and selection. Sankhya 1953, 12, 229–246. [Google Scholar]
- Olkin, I.; Press, S.J. Testing and estimation for a circular stationary model. Ann. Math. Stat. 1969, 40, 1358–1373. [Google Scholar] [CrossRef]
- Liang, Y.; von Rosen, D.; von Rosen, T. On estimation in hierarchical models with block circular covariance structures. Ann. Inst. Stat. Math. 2015, 67, 773–791. [Google Scholar] [CrossRef]
- Olkin, I. Inference for a Normal Population when the Parameters Exhibit Some Structure, Reliability and Biometry; SIAM: Philadelphia, PA, USA, 1974; pp. 759–773. [Google Scholar]
- Wilks, S.S. Sample criteria for testing equality of means, equality of variances, and equality of covariances in a normal multivariate distribution. Ann. Math. Stat. 1946, 17, 257–281. [Google Scholar] [CrossRef]
- Arnold, S.F. Application of the theory of products of problems to certain patterned covariance matrices. Ann. Stat. 1973, 1, 682–699. [Google Scholar] [CrossRef]
- Arnold, S.F. Linear models with exchangeably distributed errors. J. Am. Stat. Assoc. 1979, 74, 194–199. [Google Scholar] [CrossRef]
- Roy, A.; Leiva, R.; Žežula, I.; Klein, D. Testing of equality of mean vectors for paired doubly multivariate observations in blocked compound symmetric covariance matrix setup. J. Multivar. Anal. 2015, 137, 50–60. [Google Scholar] [CrossRef]
- Leiva, R.; Roy, A. Linear discrimination for three-level multivariate data with separable additive mean vector and doubly exchangeable covariance structure. Comput. Stat. Data Anal. 2012, 56, 1644–1661. [Google Scholar] [CrossRef]
- Roy, A.; Fonseca, M. Linear models with doubly exchangeable distributed errors. Commun. Stat. Theory Methods 2012, 41, 2545–2569. [Google Scholar] [CrossRef]
- Žežula, I.; Klein, D.; Roy, A. Mean Value Test for Three-Level Multivariate Observations with Doubly Exchangeable Covariance Structure. In Recent Developments in Multivariate and Random Matrix Analysis; Holgersson, T., Singull, M., Eds.; Springer Nature: Cham, Switzerland, 2020; pp. 335–349. [Google Scholar]
- Jordan, P.; von Neumann, J.; Wigner, E.P. On an algebraic generalization of the quantum mechanical formalism. Ann. Math. 1934, 35, 29–64. [Google Scholar] [CrossRef]
- Malley, J.D. Statistical Applications of Jordan Algebras. In Lecture Notes in Statistics; Springer: New York, NY, USA, 1994. [Google Scholar]
- Roy, A.; Zmyślony, R.; Fonseca, M.; Leiva, R. Optimal estimation for doubly multivariate data in blocked compound symmetric covariance structure. J. Multivar. Anal. 2016, 144, 81–90. [Google Scholar] [CrossRef]
- Kozioł, A.; Roy, A.; Zmyślony, R.; Leiva, R.; Fonseca, M. Best unbiased estimates for parameters of three-level multivariate data with doubly exchangeable covariance structure. Linear Algebra Appl. 2017, 535, 87–104. [Google Scholar] [CrossRef]
- McKeon, J.J. F approximations to the distribution of Hotelling’s T₀². Biometrika 1974, 61, 381–383. [Google Scholar]
- Dyer, D. The Convolution of Generalized F Distributions. J. Am. Stat. Assoc. 1982, 77, 184–189. [Google Scholar] [CrossRef]
- Iester, M.; Telani, S.; Frezzotti, P.; Manni, G.; Uva, M.; Figus, M.; Perdicchi, A. Differences in central corneal thickness between the paired eyes and the severity of the glaucomatous damage. Eye 2012, 26, 1424–1430. [Google Scholar] [CrossRef] [Green Version]
- Aghaian, E.; Choe, J.E.; Lin, S.; Stamper, R.L. Central corneal thickness of Caucasians, Chinese, Hispanics, Filipinos, African Americans, and Japanese in a glaucoma clinic. Ophthalmology 2004, 111, 2211–2219. [Google Scholar] [CrossRef] [PubMed]
- Brandt, J.D.; Beiser, J.A.; Kass, M.A.; Gordon, M.O.; Ocular Hypertension Treatment Study (OHTS) Group. Central corneal thickness in the Ocular Hypertension Treatment Study (OHTS). Ophthalmology 2001, 108, 1779–1788. [Google Scholar] [CrossRef]
- Johnson, R.A.; Wichern, D.W. Applied Multivariate Statistical Analysis, 6th ed.; Pearson Prentice Hall: Hoboken, NJ, USA, 2007. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).