Two-Sample Hypothesis Test for Functional Data

Jing Zhao; Sanying Feng; Yuping Hu

doi:10.3390/math10214060

Abstract

In this paper, we develop and study a novel testing procedure that has a more powerful ability to detect mean differences in functional data. In general, it includes two stages: first, splitting the sample into two parts and selecting principal components adaptively based on the first half-sample; then, constructing a test statistic based on the other half-sample. An extensive simulation study is presented, which shows that the proposed test works very well in comparison with several other methods in a variety of alternative settings.

Keywords:

functional data analysis; mean functions comparison; two sample testing; sample splitting

MSC:

62G10; 62G20

1. Introduction

In the recent literature, there has been an increasing interest in functional data analysis, with its extensive application in biometrics, chemometrics, econometrics, and medical research, as well as other fields. Functional data are intrinsically infinite-dimensional and thus, classical methods for multivariate observations are not applicable. Therefore, it is necessary to develop special techniques for this type of data. There has been intensive methodological and theoretical development in functional data analysis; see, e.g., [1,2,3,4,5].

In functional data analysis, a functional data set or curve can be modeled as independent realizations of an underlying stochastic process:

x_{i} (t) = μ (t) + α_{i} (t) + ϵ_{i} (t), i = 1, 2, \dots, n,

(1)

where

μ (t)

is the mean function of the stochastic process,

α_{i} (t)

is the ith individual function variation from

μ (t)

, and

ϵ_{i} (t)

is the ith measurement error process. In general, we assume

α_{i} (t)

and

ϵ_{i} (t)

are independent, and are i.i.d. samples from

α (t)

and

ϵ (t)

, respectively, where,

α (t) \sim SP (0, γ)

,

ϵ (t) \sim SP (0, γ_{ϵ})

, and SP (μ, γ) denotes a stochastic process with mean function

μ (t)

and covariance function

γ (s, t)

.

The mean function

μ (t)

reflects the underlying trend and can be used as an important index of population response, for example in drug and other biomedical studies. Besides estimation, an important statistical inference problem is testing hypotheses about the mean function. Therefore, we focus on the problem of testing the equality of mean functions in two random samples independently drawn from two functional random variables. Several approaches have been proposed to address this problem. For instance, Ref. [6] proposed an adaptive Neyman test, but only for the case where the sampling information is in a “discrete” format. Ref. [7] discussed two methods: multivariate analysis-based and bootstrap-based testing.

However, these methods have only been applied in narrow fields and do not provide a global testing result. Refs. [8,9] proposed an

L^{2}

-norm based statistic to test the equality of mean functions. Ref. [10] proposed and studied a so-called Globalized Pointwise F-test, abbreviated as GPF test. The GPF test is in general comparable with the

L^{2}

-norm-based test and the F-type test adopted for the one-way ANOVA problem. Then, Ref. [11] proposed the

F_{\max}

-test; via some simulation studies, it was found that in terms of both level accuracy and power, the

F_{\max}

-test outperforms the Globalized Pointwise F (GPF) test of [10] when the functional data are highly or moderately correlated, and its performance is comparable with the latter otherwise. Ref. [5] proposed a statistic based on the functional principal component semi-distances. Furthermore, they gave a normalized version of the functional principal components that has a chi-square limit distribution. The statistic is scale-invariant. However, this method requires pre-specifying a threshold to choose the leading principal components (PCs), where PCs are ranked by their eigenvalues. They chose the number of PCs, say d, based on the percentage of variance explained for the functional covariates. This method has two drawbacks: first, it can only detect mean differences in this d-dimensional subspace; second, different thresholds often lead to different tests, whose power depends on the particular simple alternative hypothesis.

In this paper, we develop and study a novel testing procedure that overcomes the drawback that many tests can only detect mean differences in the d-dimensional subspace. Furthermore, the novel testing procedure is very powerful when there are differences in the middle part and latter part of the two function sequences. Additionally, we derive the asymptotic distribution of the new test statistics under the alternative hypothesis, which is the key difficulty in the current approach. In general, the novel testing procedure includes two stages: first, split the sample into two parts and select PCs adaptively based on the first half-sample; then, construct the test statistic based on the other half-sample. Sample splitting is often used in high-dimensional regression problems, where most computationally efficient selection algorithms cannot guard against the inclusion of noise variables and asymptotically valid p-values are not available. Ref. [12] adopted this technique to reduce the number of variables to a manageable size using the first split sample and then applied classical variable selection techniques to the remaining variables, using the data from the second split.

In our procedure, we mainly adopt two methods to select PCs in the first stage: the adaptive Neyman test [6] and the adaptive ordered Neyman test. At the same time, we also consider selecting PCs based on pre-specifying a threshold; however, this threshold is an association–projection index that combines both the variation and the projection along each direction. The purpose of splitting the sample is two-fold: (1) to decrease the random noise effect in the first stage; (2) to derive the asymptotic distribution of the test statistic. From simulation results, we can see that our testing procedure asymptotically achieves the pre-specified significance level, and enjoys certain optimality in terms of its power, even when the population is a non-Gaussian process.

This paper is organized as follows. In Section 2, we introduce the test problem and briefly review the existing global two-sample test methods. Section 3 proposes our testing procedure. Simulation studies are given in Section 4. A real-data example is analyzed in Section 5. Section 6 concludes the present work. The derivations are given in Appendix A.

2. The Testing Problem for Functional Data

2.1. Preliminary

Let

SP

(

μ, Γ

) denote a stochastic process with mean function

μ (t), t \in T

and covariance function

Γ (s, t), s, t \in T

, where

T = [0, 1]

. Suppose we have the following two independent functional samples:

X_{1} (t), \dots, X_{n_{1}} (t) \sim SP (μ_{1}, Γ_{1} (s, t)),

(2)

Y_{1} (t), \dots, Y_{n_{2}} (t) \sim SP (μ_{2}, Γ_{2} (s, t)),

(3)

where

Γ_{1} (s, t)

denotes the covariance function of the functional data

X (t)

and

Γ_{2} (s, t)

denotes the covariance function of the functional data

Y (t)

. However, we do not know if

Γ_{1} (s, t)

and

Γ_{2} (s, t)

are equal. We want to test whether the two mean functions are equal:

H_{0} : μ_{1} (t) = μ_{2} (t) vs . H_{1} : μ_{1} (t) \neq μ_{2} (t) .

(4)

Let

\bar{X} (t), \bar{Y} (t)

denote the sample mean functions of the two samples, respectively. First, we have

{\hat{μ}}_{1} (t) = \bar{X} (t), {\hat{μ}}_{2} (t) = \bar{Y} (t),

(5)

and

\sqrt{n} ({\hat{μ}}_{1} (t) - {\hat{μ}}_{2} (t)) \sim SP (\sqrt{n} (μ_{1} (t) - μ_{2} (t)), Γ_{12} (s, t)),

(6)

where

Γ_{12} (s, t) = \frac{Γ_{1} (s, t)}{n_{1} / n} + \frac{Γ_{2} (s, t)}{n_{2} / n}, n = n_{1} + n_{2}

.

Γ_{12} (s, t)

can be written as

Γ_{12} (s, t) = \sum_{k = 1}^{\infty} λ_{k} φ_{k} (s) φ_{k} (t)

, where

λ_{1} \geq λ_{2} \dots \geq 0

are the eigenvalues and

φ_{k} (t), k = 1, \dots, \infty

are eigenfunctions satisfying

\int_{T} φ_{k} {(t)}^{2} d t = 1

and

\int_{T} φ_{k} (t) φ_{l} (t) d t = 0, k \neq l

.

It is easy to see that

{\hat{Γ}}_{1} (s, t) = \frac{1}{n_{1}} \sum_{i = 1}^{n_{1}} (X_{i} (t) - {\hat{μ}}_{1} (t)) (X_{i} (s) - {\hat{μ}}_{1} (s))

,

{\hat{Γ}}_{2} (s, t) = \frac{1}{n_{2}} \sum_{i = 1}^{n_{2}}

(Y_{i} (t) - {\hat{μ}}_{2} (t)) (Y_{i} (s) - {\hat{μ}}_{2} (s))

and

{\hat{Γ}}_{12} (s, t) = \frac{{\hat{Γ}}_{1} (s, t)}{n_{1} / n} + \frac{{\hat{Γ}}_{2} (s, t)}{n_{2} / n}

.

{\hat{Γ}}_{12} (s, t)

can also be written as an eigen-decomposition

{\hat{Γ}}_{12} (s, t) = \sum_{k = 1}^{\infty} {\hat{λ}}_{k} {\hat{φ}}_{k} (s) {\hat{φ}}_{k} (t),

(7)

where the nonincreasing sequence

({\hat{λ}}_{k} : k \geq 1)

are the sample eigenvalues and

({\hat{φ}}_{k} : k \geq 1)

are the corresponding eigenfunctions forming an orthonormal basis of

L_{2} [0, 1]

.

To simplify notation, we use the symbol

Γ

for both the kernel and the operator. Now, the functional Mahalanobis semi-distance between

{\hat{μ}}_{1} (t)

and

{\hat{μ}}_{2} (t)

is defined as:

d_{F M}^{2} ({\hat{μ}}_{1}, {\hat{μ}}_{2}) = ⟨ {\hat{Γ}}_{12, p_{n}}^{- \frac{1}{2}} \sqrt{n} ({\hat{μ}}_{1} - {\hat{μ}}_{2}), {\hat{Γ}}_{12, p_{n}}^{- \frac{1}{2}} \sqrt{n} ({\hat{μ}}_{1} - {\hat{μ}}_{2}) ⟩ .

(8)

Plugging (7) into (8), we have

d_{F M}^{2} ({\hat{μ}}_{1}, {\hat{μ}}_{2}) = \sum_{k = 1}^{p_{n}} \frac{n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, {\hat{φ}}_{k} ⟩}^{2}}{{\hat{λ}}_{k}} .

(9)
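As an illustration of (9), the following is a minimal sketch for curves observed on a common equispaced grid in [0, 1]. The function name `fm_semi_distance`, the Riemann-sum quadrature weight `w = 1/m`, and the array layout are assumptions of this sketch and are not part of the original procedure.

```python
import numpy as np

def fm_semi_distance(X, Y, p):
    """Squared functional Mahalanobis semi-distance of (9) on a uniform grid.

    X: (n1, m) array and Y: (n2, m) array of curves sampled at m equispaced
    points in [0, 1]; p is the number of leading components (p_n in the text).
    """
    n1, m = X.shape
    n2 = Y.shape[0]
    n = n1 + n2
    w = 1.0 / m  # quadrature weight (Riemann sum); an assumption of this sketch
    diff = X.mean(axis=0) - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Discretized version of Gamma_12 = Gamma_1 / (n1/n) + Gamma_2 / (n2/n)
    G12 = (Xc.T @ Xc / n1) / (n1 / n) + (Yc.T @ Yc / n2) / (n2 / n)
    evals, evecs = np.linalg.eigh(G12 * w)
    order = np.argsort(evals)[::-1][:p]      # indices of the p largest eigenvalues
    lam = evals[order]
    phi = evecs[:, order] / np.sqrt(w)       # rescale so that sum(phi_k^2) * w = 1
    scores = (diff @ phi) * w                # <mu1_hat - mu2_hat, phi_k_hat>
    return float(np.sum(n * scores**2 / lam)), lam, scores
```

The eigenvectors of the weighted covariance matrix are rescaled so that the discretized eigenfunctions have unit L_{2} norm, which keeps the scores comparable with the inner products in (9).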

2.2. Existing Global Testing Methods

Although there is significant literature discussing the equality of means for two functional data sets, they can be roughly grouped into a few broad categories, as follows.

(1): L_{2}-norm-based test

The test is based on the

L_{2}

-norm of the difference between

{\hat{μ}}_{1} (t)

and

{\hat{μ}}_{2} (t)

:

T_{L} = n {∥ {\hat{μ}}_{1} (t) - {\hat{μ}}_{2} (t) ∥}^{2} = n \int {({\hat{μ}}_{1} (t) - {\hat{μ}}_{2} (t))}^{2} d t .

(10)

Ref. [8] proved that

T_{L} = n \int {({\hat{μ}}_{1} (t) - {\hat{μ}}_{2} (t))}^{2} d t \overset{d}{=} \sum_{k = 1}^{\infty} λ_{k} A_{k} + o_{p} (1)

,

A_{k} \sim χ_{1}^{2} (\frac{n u_{k}^{2}}{λ_{k}})

, where

X \overset{d}{=} Y

denotes that X and Y have the same distribution, and

u_{k} = \int_{0}^{1} (μ_{1} (t) - μ_{2} (t)) φ_{k} (t) d t, k = 1, 2, \dots .

Furthermore, they used the two-cumulant matched

χ^{2}

approximation method and obtained an approximate distribution of

T_{L}

, which is,

α χ_{d}^{2} + β

, where

α = \frac{\sum_{k = 1}^{\infty} {\hat{λ}}_{k}^{3}}{\sum_{k = 1}^{\infty} {\hat{λ}}_{k}^{2}}, d = \frac{{\{\sum_{k = 1}^{\infty} {\hat{λ}}_{k}^{2}\}}^{3}}{{\{\sum_{k = 1}^{\infty} {\hat{λ}}_{k}^{3}\}}^{2}}, β = \sum_{k = 1}^{\infty} {\hat{λ}}_{k} - \frac{{\{\sum_{k = 1}^{\infty} {\hat{λ}}_{k}^{2}\}}^{2}}{\sum_{k = 1}^{\infty} {\hat{λ}}_{k}^{3}} .

Then, they have

P (T_{L} > K) \approx P (χ_{d}^{2} > (K - β) / α) .
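The constants α, d, and β above depend only on power sums of the estimated eigenvalues, so they are easy to compute. The sketch below does this for a finite list of eigenvalues (truncating the infinite sums is an assumption of the sketch, as is the function name `chi2_match_params`).

```python
def chi2_match_params(eigvals):
    """Return (alpha, d, beta) of the alpha * chi2_d + beta approximation,
    using the formulas stated above with the sums truncated to `eigvals`."""
    s1 = sum(eigvals)
    s2 = sum(l**2 for l in eigvals)
    s3 = sum(l**3 for l in eigvals)
    alpha = s3 / s2
    d = s2**3 / s3**2
    beta = s1 - s2**2 / s3
    return alpha, d, beta
```

With these values, the tail probability P(T_L > K) is approximated by the upper tail of a χ² distribution with d (generally non-integer) degrees of freedom evaluated at (K − β)/α; any chi-square survival function that accepts non-integer degrees of freedom can be used for that last step.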

(2): Projection-based Test

Ref. [5] considered projecting the observed mean difference onto the space spanned by

{\hat{φ}}_{1}, \dots, {\hat{φ}}_{d}

, where d is determined based on the cumulative percentage of the eigenvalues, and constructed the following test statistic:

T_{H} = \sum_{k = 1}^{d} {⟨ \sqrt{n} ({\hat{μ}}_{1} - {\hat{μ}}_{2}), {\hat{φ}}_{k} ⟩}^{2} .

(11)

Given d, the asymptotic distribution of

T_{H}

under the null hypothesis is the distribution of

\sum_{k = 1}^{d} λ_{k} Z_{k}^{2}

, where

Z_{k}, k = 1, \dots, d

are independent standard normal random variables. Alternately, they proposed a normalized version of

T_{H}

which is given by

N T_{H} = \sum_{k = 1}^{d} {⟨ \sqrt{n} ({\hat{μ}}_{1} - {\hat{μ}}_{2}), {\hat{φ}}_{k} ⟩}^{2} / {\hat{λ}}_{k},

(12)

Then, under the null hypothesis,

N T_{H}

has an asymptotic

χ_{d}^{2}

distribution.
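To make the fixed-truncation rule concrete, the following sketch picks d as the smallest number of components whose eigenvalues reach a given share of the total, and then evaluates (11) and (12) from precomputed scores. The helper names and the exact form of the cumulative-percentage rule are assumptions for illustration; Ref. [5] may define the cutoff differently.

```python
def choose_d(eigvals, threshold=0.99):
    """Smallest d whose leading eigenvalues explain at least `threshold`
    of the total (eigvals assumed sorted in decreasing order)."""
    total = sum(eigvals)
    cum = 0.0
    for k, lam in enumerate(eigvals, start=1):
        cum += lam
        if cum / total >= threshold:
            return k
    return len(eigvals)

def projection_stats(scores, eigvals, n, d):
    """T_H of (11) and NT_H of (12), given scores <mu1_hat - mu2_hat, phi_k_hat>."""
    t_h = sum(n * s**2 for s in scores[:d])
    nt_h = sum(n * s**2 / lam for s, lam in zip(scores[:d], eigvals[:d]))
    return t_h, nt_h
```

Any component beyond the first d never enters either statistic, which is the source of the power loss discussed for alternatives that live in later components.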

(3): F-test

Finally, we describe the testing procedure proposed by [13]. Although they proposed a functional F test in a functional linear regression setting, the method can be specified for our two sample test as follows.

The F-test statistic for our setting is

T_{F} = \frac{R S S_{0} - R S S_{1}}{R S S_{1} / (n - 1)},

(13)

where

\begin{matrix} R S S_{1} & = & \sum_{i = 1}^{n_{1}} \int_{0}^{1} {(X_{i} (t) - \bar{X} (t))}^{2} d t + \sum_{i = 1}^{n_{2}} \int_{0}^{1} {(Y_{i} (t) - \bar{Y} (t))}^{2} d t, \\ R S S_{0} & = & \sum_{i = 1}^{n_{1}} \int_{0}^{1} {(X_{i} (t) - \bar{Z} (t))}^{2} d t + \sum_{i = 1}^{n_{2}} \int_{0}^{1} {(Y_{i} (t) - \bar{Z} (t))}^{2} d t, \\ \bar{Z} (t) & = & \frac{n_{1} \bar{X} (t) + n_{2} \bar{Y} (t)}{n_{1} + n_{2}} . \end{matrix}

Ref. [13] also presented the distribution of the F-test as

\frac{\sum_{k = 1}^{\infty} λ_{k} χ_{1}^{2}}{\sum_{k = 1}^{\infty} λ_{k} χ_{n - 2}^{2}} .

In practice, they also applied the approximation idea of [14] to derive an approximate distribution of the F-statistic, that is,

(χ_{f_{1}}^{2} / f_{1}) / (χ_{f_{2}}^{2} / f_{2})

, which is an ordinary F distribution with degrees of freedom

f_{1}

and

f_{2}

, where

f_{1} = {(\sum_{k = 1}^{\infty} {\hat{λ}}_{k})}^{2} / \sum_{k = 1}^{\infty} {\hat{λ}}_{k}^{2}

,

f_{2} = (n - 2) {(\sum_{k = 1}^{\infty} {\hat{λ}}_{k})}^{2} / \sum_{k = 1}^{\infty} {\hat{λ}}_{k}^{2}

.
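A minimal numerical sketch of (13) and of the degrees of freedom f_{1}, f_{2} is given below for curves on a common equispaced grid. The function names and the Riemann-sum approximation of the integrals are assumptions of the sketch.

```python
import numpy as np

def functional_f_stat(X, Y):
    """T_F of (13) for X: (n1, m) and Y: (n2, m) curves on a uniform grid."""
    n1, m = X.shape
    n2 = Y.shape[0]
    n = n1 + n2
    w = 1.0 / m  # Riemann-sum weight for the integrals over [0, 1]
    xbar, ybar = X.mean(axis=0), Y.mean(axis=0)
    zbar = (n1 * xbar + n2 * ybar) / n  # pooled mean function
    rss1 = w * (np.sum((X - xbar) ** 2) + np.sum((Y - ybar) ** 2))
    rss0 = w * (np.sum((X - zbar) ** 2) + np.sum((Y - zbar) ** 2))
    t_f = (rss0 - rss1) / (rss1 / (n - 1))
    return t_f, rss0, rss1

def f_degrees_of_freedom(eigvals, n):
    """f_1 and f_2 as defined above, with the sums truncated to `eigvals`."""
    eigvals = np.asarray(eigvals, dtype=float)
    f1 = eigvals.sum() ** 2 / np.sum(eigvals ** 2)
    f2 = (n - 2) * f1
    return f1, f2
```

A useful sanity check is that RSS_{0} − RSS_{1} equals (n_{1} n_{2} / n) times the integrated squared difference of the two sample mean functions, so the numerator of T_F is a rescaled version of the L_{2}-norm statistic.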

3. Our Testing Procedure

In order to determine the number of PCs

p_{n}

adaptively and find significant parts to construct a more powerful test statistic, we propose a two-stage procedure via a data-splitting technique. With the help of this technique, we can derive the distribution of the test statistic.

First, we assume that the sample size is even for simplicity and randomly split samples into two groups:

(X^{(1)}, Y^{(1)})

and

(X^{(2)}, Y^{(2)})

. In the first stage, we choose

p_{n}

based on the adaptively truncated Hotelling

T^{2}

order statistic via the first group sample

(X^{(1)}, Y^{(1)})

. In the second stage, we construct the test statistic via the second group sample

(X^{(2)}, Y^{(2)})

and

p_{n}

.

Next, we show three methods of choosing

p_{n}

in a general case.

Denote

{\hat{V}}_{k}^{(1)} = {⟨ \sqrt{n / 2} ({\hat{μ}}_{1}^{(1)} - {\hat{μ}}_{2}^{(1)}), {\hat{φ}}_{k}^{(1)} ⟩}^{2} / {\hat{λ}}_{k}^{(1)}, k = 1, \dots, d, \dots .

In practice, many of the trailing eigenvalues are close to zero, so

{\hat{V}}_{k}

will be very large for a large k. Hence, we generally give the cutoff

d_{n}

a high threshold, for example,

d_{n} = \max \{k : {\hat{λ}}_{k} / \sum_{t = 1}^{k} {\hat{λ}}_{t} > 0.001\}

. For most inference problems, there is no optimal test, but the adaptive Neyman tests have been shown to work well against a broad range of alternatives. Therefore, we choose

p_{n}

adaptively based on the following adaptive Neyman test methods. One method is to maximize the normalized version of (8) (in Appendix A, we prove that

{\hat{V}}_{k}

has an approximate

χ_{1}^{2}

distribution), that is,

p_{1 n} = {argmax}_{1 \leq d \leq d_{n}} \frac{\sum_{k = 1}^{d} {\hat{V}}_{k}^{(1)} - d}{\sqrt{2 d}} .
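The selection rule for p_{1n} is a single pass over the partial sums of the first-stage statistics. The sketch below assumes the values V_1, ..., V_{d_n} have already been computed from the first half-sample; the function name `adaptive_neyman_p` is hypothetical.

```python
import math

def adaptive_neyman_p(V):
    """Return (p_1n, maximal normalized value) for first-stage statistics V,
    where V[k-1] plays the role of V_k^(1) for k = 1, ..., d_n."""
    best_d, best_val, csum = 1, -math.inf, 0.0
    for d, v in enumerate(V, start=1):
        csum += v
        val = (csum - d) / math.sqrt(2 * d)  # centered and scaled partial sum
        if val > best_val:
            best_d, best_val = d, val
    return best_d, best_val
```

For example, with V = (0.5, 0.8, 9.0, 7.0, 0.3) the normalized partial sums peak at d = 4, so the two large components in positions 3 and 4 are retained together with the two small ones that precede them. This is the motivation for the ordered variant introduced next.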

Considering some possible nonsignificant

{\hat{V}}_{k}^{(1)}

terms, we propose another method that maximizes the sum of the normalized order statistics

V_{(k)}

, that is,

p_{2 n} = {argmax}_{1 \leq d \leq d_{n}} \frac{\sum_{k = 1}^{d} {\hat{V}}_{(k)}^{(1)} - E (\sum_{k = 1}^{d} {\hat{V}}_{(k)}^{(1)})}{\sqrt{var (\sum_{k = 1}^{d} {\hat{V}}_{(k)}^{(1)})}},

where

{\hat{V}}_{(k)}^{(1)}

is the k-th order statistic of

{\hat{V}}_{k}^{(1)}, k = 1, \dots, d

in decreasing order. Unfortunately, the normalizing mean and variance have no closed form, since they involve order statistics. However, this maximum can be approximated empirically by very fast Monte Carlo simulation. The third method is a hard-threshold truncation method, in which we truncate at the d-th term based on the percentage of

{\hat{V}}_{k}

, which combines both the variation and the projection along each direction. In our simulation, we set the same truncation threshold as the projection-based test of [5] for comparison.
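The Monte Carlo step for p_{2n} can be sketched as follows. The sketch assumes that, under the null hypothesis, the first-stage statistics behave approximately like d_{n} independent χ²(1) variables, and it estimates the mean and variance of the sum of the d largest of them by simulation. The function names, the number of replicates, and the seed are illustrative assumptions.

```python
import numpy as np

def ordered_null_moments(d_n, n_sim=20000, seed=0):
    """Monte Carlo mean and variance of the sum of the d largest values among
    d_n independent chi-square(1) variables, for d = 1, ..., d_n."""
    rng = np.random.default_rng(seed)
    sims = rng.chisquare(df=1, size=(n_sim, d_n))
    sims = -np.sort(-sims, axis=1)       # decreasing order within each replicate
    partial = np.cumsum(sims, axis=1)    # column d-1 holds the sum of the d largest
    return partial.mean(axis=0), partial.var(axis=0)

def ordered_neyman_p(V, mean, var):
    """p_2n: the d maximizing the normalized sum of the d largest entries of V."""
    v_sorted = np.sort(np.asarray(V, dtype=float))[::-1]
    z = (np.cumsum(v_sorted) - mean) / np.sqrt(var)
    d = int(np.argmax(z)) + 1
    return d, float(z[d - 1])
```

The moments depend only on d_{n}, so they can be simulated once and reused across data sets with the same cutoff.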

Remark 1.

p_{1 n}

and

p_{2 n}

are chosen adaptively based on the first group data. For the convenience of follow-up theoretical analysis, we denote them as

p_{n}

uniformly.

After we derive

p_{n}

in the first stage, we construct the following statistics based on the second group sample:

T_{1 A} = \sum_{k = 1}^{p_{n}} {\hat{V}}_{k}^{(2)},

T_{2 A} = \sum_{k = 1}^{p_{n}} {\hat{V}}_{(k)}^{(2)},

N T_{1 A} = \frac{\sum_{k = 1}^{p_{n}} {\hat{V}}_{k}^{(2)} - p_{n}}{\sqrt{2 p_{n}}},

N T_{2 A} = \frac{\sum_{k = 1}^{p_{n}} {\hat{V}}_{(k)}^{(2)} - E (\sum_{k = 1}^{p_{n}} {\hat{V}}_{(k)}^{(2)})}{\sqrt{var (\sum_{k = 1}^{p_{n}} {\hat{V}}_{(k)}^{(2)})}},

where

{\hat{V}}_{k}^{(2)} = {⟨ \sqrt{n / 2} ({\hat{μ}}_{1}^{(2)} - {\hat{μ}}_{2}^{(2)}), {\hat{φ}}_{k}^{(2)} ⟩}^{2} / {\hat{λ}}_{k}^{(2)}
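The two stages can be put together in a short sketch for NT_{1A}. It assumes curves on a common equispaced grid, uses a Riemann-sum inner product, takes an upper-tail normal p-value, and caps the first-stage search at a user-chosen `d_max` instead of the eigenvalue-based cutoff d_{n}; all of these, and the function names, are assumptions of the sketch.

```python
import math
import numpy as np

def v_stats(X, Y, d_max):
    """Normalized squared projections V_k, k = 1..d_max, for one half-sample.
    Here n is the size of the half-sample, i.e., n/2 in the notation of the text."""
    n1, m = X.shape
    n2 = Y.shape[0]
    n = n1 + n2
    w = 1.0 / m
    diff = X.mean(axis=0) - Y.mean(axis=0)
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    G12 = (Xc.T @ Xc / n1) / (n1 / n) + (Yc.T @ Yc / n2) / (n2 / n)
    evals, evecs = np.linalg.eigh(G12 * w)
    idx = np.argsort(evals)[::-1][:d_max]
    lam, phi = evals[idx], evecs[:, idx] / np.sqrt(w)
    scores = (diff @ phi) * w
    return n * scores**2 / lam

def two_stage_test(X, Y, d_max=10, seed=0):
    """Return (p_n, NT_1A, upper-tail normal p-value) after a random split."""
    rng = np.random.default_rng(seed)
    ix, iy = rng.permutation(len(X)), rng.permutation(len(Y))
    hx, hy = len(X) // 2, len(Y) // 2
    # Stage 1: choose p_n on the first half via the adaptive Neyman criterion.
    V1 = v_stats(X[ix[:hx]], Y[iy[:hy]], d_max)
    d = np.arange(1, len(V1) + 1)
    p_n = int(np.argmax((np.cumsum(V1) - d) / np.sqrt(2 * d))) + 1
    # Stage 2: compute NT_1A on the second half with the selected p_n.
    V2 = v_stats(X[ix[hx:]], Y[iy[hy:]], p_n)
    nt = (V2.sum() - p_n) / math.sqrt(2 * p_n)
    p_value = 0.5 * math.erfc(nt / math.sqrt(2))  # P(N(0,1) > nt)
    return p_n, float(nt), p_value
```

Because p_{n} is fixed before the second half is touched, the second-stage statistic can be calibrated as if p_{n} were chosen in advance, which is the point of the split.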

To derive the asymptotic distribution of

T_{1 A}

,

T_{2 A}

,

N T_{1 A}

, and

N T_{2 A}

, we make the following assumptions:

Assumption 1.

There exist constants

a > 1

and C such that

λ_{k} - λ_{k + 1} \geq C k^{- a - 1}

for

k \geq 1

and

C^{- 1} k^{- a} \leq λ_{k} \leq C k^{- a} .

Assumption 2.

\int_{l} E X^{4} < \infty, \int_{l} E Y^{4} < \infty .

Assumption 3.

τ = lim_{n \to \infty} \frac{n_{1}}{n}, 0 < τ < 1 .

Assumption 4.

p_{n}^{2 a + 3} m^{- 1} = o_{p} (1)

, where

m = min (n_{1}, n_{2}) .

Assumptions 1 and 2 are standard in functional principal component analysis (FPCA). Assumption 1 implies that

λ_{k} \geq C k^{- a}

. Because the covariance functions are bounded, one has

a > 1

. Assumption 1 essentially assumes that all the eigenvalues are positive, but decay polynomially. Assumption 3 requires that the two sample sizes

n_{1}, n_{2}

tend to ∞ proportionally. Assumption 4 specifies the growing rate of

p_{n}

. In the related literature, e.g., [15], to guarantee estimation consistency,

p_{n}

is usually assumed to satisfy

p_{n}^{a + 1} n^{- 1} = o_{p} (1)

.

Theorem 1.

Under

H_{0}

,

\sqrt{n} (\bar{X} - \bar{Y}) \overset{d}{⟶} G

, where G is a Gaussian process with mean zero and covariance function Γ, where

Γ (s, t) = \frac{Γ_{1} (s, t)}{τ} + \frac{Γ_{2} (s, t)}{1 - τ} .

The proof of Theorem 1 follows directly from the central limit theorem for stochastic processes, and we omit it.

Remark 2.

We can write G as

G = \sum_{j = 1}^{\infty} η_{j} \sqrt{λ_{j}} φ_{j}

, where

η_{j}, j \geq 1

are i.i.d. centered real Gaussian random variables with variance 1.

Theorem 2.

Under Assumptions 1–4, there exist some increasing sequences

{(p_{n})}_{n}

such that under

H_{0}

lim_{p_{n} \to \infty} P (N T_{1 A} \leq x) = Φ (x),

(14)

where

Φ (x)

denotes the cumulative distribution function (cdf) of standard normal distribution.

Theorem 3.

Under Assumptions 1–4, the test statistic

T_{2 A}

is approximately equivalent to

\sum_{k = 1}^{p_{n}} χ_{(k)}^{2} (1)

under

H_{0}

, where

χ_{(1)}^{2} (1), \dots, χ_{(p_{n})}^{2} (1)

are order statistics (in a decreasing order) of

p_{n}

χ^{2} (1)

random variables.

Remark 3.

The asymptotic null distribution of

T_{2 A}

is affected not only by the values of

{\hat{V}}_{k}^{(2)}

, but also by the order of them. In practice, empirical approximations of quantiles and tail probability of the null distribution of

T_{2 A}

can be deduced by very fast Monte Carlo approximations.

Theorem 4.

Under Assumptions 1–4 and

H_{0}

,

lim_{(m, p_{n}) \to \infty} P (N T_{2 A} \leq x) = Φ (x),

(15)

where

Φ (x)

denotes the cdf of the standard normal distribution.

To obtain the asymptotic distribution of

N T_{1 A}

under the alternative in (4), we choose the local alternative, as defined in the following assumption:

Assumption 5.

H_{1 n} : μ_{1} (t) - μ_{2} (t) = n^{- \frac{1}{2}} u (t),

(16)

where

u (t)

is any fixed real function such that

0 < ‖ u ‖ < \infty .

Then, we have the following asymptotic power of

N T_{1 A}

:

Theorem 5.

Under Assumptions 1–5, the asymptotic power of the

N T_{1 A}

is given by

lim_{(m, p_{n}) \to \infty} P (N T_{1 A} > z_{1 - α}) = Φ (- z_{1 - α} + ∥ Γ_{12}^{- 1} u (t) ∥^{2}),

where P denotes that the distribution has been obtained under the alternative, and

z_{1 - α}

is the upper

(1 - α) 100 %

point of the standard normal distribution.
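The power expression in Theorem 5 is a shifted normal tail, which can be evaluated directly once the shift term is known. In the sketch below, `shift` stands for the squared-norm term inside Φ and must be supplied by the user; how to estimate it from data is not addressed here, and the function name is hypothetical.

```python
from statistics import NormalDist

def asymptotic_power(shift, alpha=0.05):
    """Evaluate Phi(-z_{1-alpha} + shift) for a given nonnegative shift."""
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - alpha)  # upper critical value of the standard normal
    return nd.cdf(-z + shift)
```

With `shift = 0` the expression reduces to the nominal level α, and it increases monotonically to 1 as the shift grows.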

Theorem 6.

Under Assumptions 1–5, the distribution of the

T_{2 A}

is approximately equivalent to a noncentral

χ^{2}

distribution

χ_{p_{n}}^{2 *} (ζ_{0})

, where

ζ_{0} = \sum_{k = 1}^{p_{n}} V_{(k)}^{2},

and

V_{k}^{2} = \frac{{⟨ \sqrt{n / 2} (μ_{1} - μ_{2}), φ_{k}^{(2)} ⟩}^{2}}{λ_{k}^{(2)}} = \frac{{⟨ u (t), φ_{k}^{(2)} ⟩}^{2}}{λ_{k}^{(2)}} .

4. Simulations

In this section, we report some Monte Carlo simulation results to compare the finite-sample performance of the classical and proposed methods on the two-sample mean testing problem under different settings, including a fixed simple alternative and sparse signals with varying locations.

4.1. Fixed Simple Alternative

In this subsection, we first look at a simple setting where the alternatives are fixed. We generate curves from two populations using 40 Fourier basis functions as

X (t) = \sum_{k = 1}^{40} (θ_{k}^{1 / 2} z_{1 k} + μ_{1 k}) ϕ_{k} (t), Y (t) = \sum_{k = 1}^{40} (θ_{k}^{1 / 2} z_{2 k} + μ_{2 k}) ϕ_{k} (t) .

Here,

z_{1 k}, z_{2 k}

are independent standard normal random variables. In each case, we take

ϕ_{k} (t) = \sqrt{2} \sin ((k - 0.5) π t), t \in [0, 1]

for

k = 1, 2, \dots

and generate the data on a discrete grid of 100 equispaced points in [0, 1]. We took

θ_{k} = 1 / {(π (k - 0.5))}^{2}

. We choose

μ_{1 k} s

and

μ_{2 k} s

depending on the property that we want to illustrate (see below). We compare the size and power of five methods: Ref. [8]’s

L_{2}

-norm based test (

T_{L}

); Ref. [13]’s F-test

T_{F}

; Ref. [5]’s projection-based test (

T_{H}

) with fixed truncation and our two methods. We choose the commonly used threshold

99 %

to determine the truncated term in [5]’s projection-based test (

T_{H}

). The results are based on 1000 Monte Carlo replications. In all scenarios, we set the nominal size

α = 0.05

.
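The data-generating mechanism described above can be sketched as follows. The function names, the use of `numpy.random.default_rng`, and the inclusion of both endpoints in the 100-point grid are assumptions of the sketch; the basis, the variances θ_{k}, and the grid size follow the description in the text.

```python
import numpy as np

def fourier_basis(K=40, m=100):
    """Grid t of m equispaced points in [0, 1] and the (K, m) matrix of
    phi_k(t) = sqrt(2) * sin((k - 0.5) * pi * t), k = 1, ..., K."""
    t = np.linspace(0.0, 1.0, m)
    k = np.arange(1, K + 1)
    return t, np.sqrt(2.0) * np.sin((k[:, None] - 0.5) * np.pi * t[None, :])

def simulate_sample(n, mu, seed=None):
    """Draw n curves sum_k (theta_k^{1/2} z_k + mu_k) phi_k(t) on the grid."""
    rng = np.random.default_rng(seed)
    K = len(mu)
    t, basis = fourier_basis(K)
    theta = 1.0 / (np.pi * (np.arange(1, K + 1) - 0.5)) ** 2
    z = rng.standard_normal((n, K))
    coef = np.sqrt(theta) * z + np.asarray(mu, dtype=float)
    return t, coef @ basis
```

The mean vector `mu` is then set according to the setting under study, for instance by adding δ to selected coordinates of the first population's mean vector to obtain the second.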

To cover as many different scenarios as possible, we consider five settings for the mean difference: (1) the mean differences arise early in the sequence

μ_{1 k} s

and

μ_{2 k} s

, that is,

(μ_{11}, μ_{12}, μ_{13}, μ_{14}, μ_{15}, μ_{16}) = (0.5, - 0.5, 1.5, - 0.5, 1.5, - 0.5)

and

μ_{1 k} = 0

for

k > 6

,

μ_{2 k} = μ_{1 k} + δ

for

k \leq 6

,

μ_{2 k} = 0

for

k > 6

; (2) the mean differences arise in the middle of the sequence

μ_{1 k} s

and

μ_{2 k} s

, that is,

(μ_{1, 11}, μ_{1, 12}, μ_{1, 13}, μ_{1, 14}, μ_{1, 15}, μ_{1, 16}) = (0.5, - 0.5, 1.5, - 0.5, 1.5, - 0.5)

and

μ_{1 k} = 0

for other k,

μ_{2 k} = μ_{1 k} + δ

for

11 \leq k \leq 16

,

μ_{2 k} = 0

for other k; (3) the mean differences arise in the latter part of the sequence

μ_{1 k} s

and

μ_{2 k} s

, that is,

(μ_{1, 21}, μ_{1, 22}, μ_{1, 23}, μ_{1, 24}, μ_{1, 25}, μ_{1, 26}) = (0.5, - 0.5, 1.5, - 0.5, 1.5, - 0.5)

and

μ_{1 k} = 0

for other k,

μ_{2 k} = μ_{1 k} + δ

for

21 \leq k \leq 26

,

μ_{2 k} = 0

for other k; (4) the mean differences are scattered in the first, middle and latter part, that is,

(μ_{1, 1}, μ_{1, 2}, μ_{1, 11}, μ_{1, 12}, μ_{1, 21}, μ_{1, 22}) = (0.5, - 0.5, 1.5, - 0.5, 1.5, - 0.5)

and

μ_{1 k} = 0

for other k,

μ_{2 k} = μ_{1 k} + δ

for

k \in \{1, 2, 11, 12, 21, 22\}

,

μ_{2 k} = 0

for other k; (5) the tiny differences appear in all the principal components. In this case, we set

μ_{1 k}

as independent

N (0, 1)

random variables, and

μ_{2 k} = μ_{1 k} + δ, 1 \leq k \leq 40

.

From Tables 1–5, we can see that the methods perform quite differently across the settings. From Table 1, we can see that when the mean difference lies in the early part of the sequence, Ref. [5]’s projection-based test (

T_{H}

) has the highest power. This is not surprising, because their method projects onto the space spanned by the first few eigenfunctions, which is where the mean difference lies. From Table 2, we observe that when the mean difference lies in the middle part of the sequence, our method has very high power compared to

T_{H}

,

T_{F}

and

T_{L}

. Particularly, we notice that Ref. [5]’s projection-based test (

T_{H}

) has a dramatic power loss. From Table 3, we can see that when the mean difference lies in the latter part of the sequence, our method still has the best performance. At the same time, we can find that

T_{F}

and

T_{L}

have higher power than

T_{H}

in this case. This illustrates that

T_{F}

and

T_{L}

are sensitive to divergence degree and

T_{H}

is more sensitive to location of mean difference. Furthermore, we notice that the power of

T_{L}

and

T_{F}

outperforms that of our method only for large sample sizes and large discrepancies between the null and alternative hypotheses. This is understandable, because our method also depends on the projection of the mean difference onto the space spanned by the eigenfunctions, excluding the last few eigenfunctions. Table 4 and Table 5 illustrate more general cases. Table 5 demonstrates that when there are tiny differences in all directions, our method is still the most powerful, while

T_{L}

and

T_{F}

perform poorly. Table 4 demonstrates the performance of each method when there are differences in the early part, middle part, and latter part. From the simulation results, we can see that in this general case, our method has the most satisfactory performance.

Table 1. Size and power of five methods in Setting 1 (mean difference lies in early part

μ_{2} [1 : 6] = μ_{1} [1 : 6] + δ

).

Table 2. Size and power of five methods in Setting 2 (mean difference lies in middle part

μ_{2} [11 : 16] = μ_{1} [11 : 16] + δ

).

Table 3. Size and power of five methods in Setting 3 (mean difference lies in latter part

μ_{2} [21 : 26] = μ_{1} [21 : 26] + δ

).

Table 4. Size and power of five methods in Setting 4 (mean difference lies in separable part

μ_{2} [1 : 2] = μ_{1} [1 : 2] + δ

,

μ_{2} [11 : 12] = μ_{1} [11 : 12] + δ

,

μ_{2} [21 : 22] = μ_{1} [21 : 22] + δ

).

Table 5. Size and power of five methods in Setting 5 (difference averaged in total vector

μ_{2} = μ_{1} + δ 1_{40}

).

We also conducted simulation studies under other similar scenarios. As they demonstrated similar patterns to those discussed above, we omit them here to save space.

It is worth noting that the first stage of the proposed method relies on a random split of the data, which could introduce considerable randomness. To assess the robustness of the procedure to this splitting, we perform some supporting simulation studies that consider multi-fold cross validation (CV), including two-fold CV, five-fold CV, and ten-fold CV. For convenience, we use the same data setting as in Table 5. The results are shown in Table 6. From Table 6, we can see that in most cases, the test is robust to this splitting.

Table 6. Size and power of

N T_{2 A}

in Setting 5 (difference averaged in total vector

μ_{2} = μ_{1} + δ 1_{40}

).

4.2. Sparse Signals with Varying Locations

In this section, we demonstrate the performance of different tests under signals with varying locations. We set

μ_{1 k} (1 \leq k \leq 40)

as independent

U (0, 1)

random variables, and

μ_{2 k} = μ_{1 k} + δ

, where k is six random locations out of 40. In Setting 1, the signal of difference appears randomly in six of the first twenty principal components; we denote

M (20, 6)

. In Setting 2, the signal of difference appears in six of forty principal components; we denote

M (40, 6)

. The simulation results are illustrated in Table 7 and Table 8. From the simulation results, we can see that in these cases, our method also has the most satisfactory performance.

Table 7. Size and power of five methods under randomized signal in Setting 1

M (20, 6)

.

Table 8. Size and power of five methods under randomized signal in Setting 2

M (40, 6)

.

We also compare our method with the full-sample adaptive methods, including the adaptive Neyman–Pearson test (

T_{A}

) and the ordered adaptive test

T_{O A}

. In the full sample, the distribution of adaptive Neyman–Pearson test (

T_{A}

) and ordered adaptive test

T_{O A}

is notable, that is,

T_{A} = max_{1 \leq d \leq p_{n}} \frac{\sum_{k = 1}^{d} {\hat{V}}_{k}^{(1)} - d}{\sqrt{2 d}},

T_{O A} = max_{1 \leq d \leq p_{n}} \frac{\sum_{k = 1}^{d} {\hat{V}}_{(k)}^{(1)} - E (\sum_{k = 1}^{d} {\hat{V}}_{(k)}^{(1)})}{\sqrt{var (\sum_{k = 1}^{d} {\hat{V}}_{(k)}^{(1)})}} .

We use a permutation method to calculate size and power. Table 9 illustrates the simulation results, where Time1 is the time of a single run of

T_{A}

and

T_{O A}

and Time2 is the time to run

N T_{1 A}

and

N T_{2 A}

once. The unit is seconds. From Table 9, we can see that our splitting sample methods have a slight power loss compared to the adaptive Neyman–Pearson test

T_{A}

and the ordered adaptive test

T_{O A}

. However, we can save significant time in real computing.
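The cost of the full-sample tests comes from the permutation calibration, in which the statistic is recomputed for every relabeling of the pooled curves. A generic sketch of such a calibration is given below; the function names, the add-one p-value convention, and the simple L_{2}-type statistic used for illustration are assumptions and are not the exact implementation behind Table 9.

```python
import numpy as np

def permutation_pvalue(X, Y, stat, n_perm=1000, seed=0):
    """Permutation p-value for a two-sample statistic `stat(X, Y)` that is
    large under the alternative. X: (n1, m), Y: (n2, m)."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([X, Y])
    n1 = len(X)
    observed = stat(X, Y)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))  # random relabeling of the curves
        if stat(pooled[idx[:n1]], pooled[idx[n1:]]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)       # add-one correction keeps p > 0

def l2_stat(X, Y):
    """Illustrative statistic: n times the mean squared difference of the
    sample mean functions on the grid."""
    n = len(X) + len(Y)
    diff = X.mean(axis=0) - Y.mean(axis=0)
    return n * np.mean(diff ** 2)
```

Each permutation requires a full re-evaluation of the statistic, including any eigen-decomposition it involves, whereas the split-sample statistics are compared with a known reference distribution and are evaluated only once.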

Table 9. Size and power of seven methods under randomized signal in Setting 2

M (40, 6)

.

5. Application

In this section, we apply our proposed hypothesis testing procedures to a real PM 2.5 dataset for Beijing, Tianjin, and Shijiazhuang between January 2017 and December 2019. The dataset was downloaded from the website http://www.tianqihoubao.com/aqi/, accessed on 10 June 2021. The data readings were taken every day, so the total data size is 1085 for every city. Beijing is surrounded by Tianjin and Shijiazhuang. Therefore, we want to know more about the average PM 2.5 difference in these three areas. Figure 1 and Figure 2 show the mean PM 2.5 (μg/m^{3}) in Beijing, Tianjin, and Hebei Province in different time periods. There are some missing days in some cycles. Note that Figure 1 shows negative values at the beginning, for a measure that is always greater than zero, because of the B-spline approximation.

Figure 1. The mean PM 2.5 (μg/m^{3}) of Beijing, Tianjin, and Shijiazhuang from December 2019 to January 2019. The black line stands for Beijing. The red dashed line stands for Tianjin. The green dotted line stands for Shijiazhuang.

Figure 2. The mean PM 2.5 (μg/m^{3}) of Beijing, Tianjin, and Shijiazhuang from December 2019 to January 2017. The black line stands for Beijing. The red dashed line stands for Tianjin. The green dotted line stands for Shijiazhuang.

It is obvious that PM 2.5 changes over individual periods. Here, we test whether there is a significant difference in PM 2.5 among the three cities using the method we proposed. First, the sample is divided into two data sets: the training sample is the dataset from 2017; the test sample is the dataset from 2018–2019. The principal components are selected adaptively based on the training sample. Then, the test statistic is constructed from the test sample using the principal components selected on the training sample. To test whether there is a significant difference in PM 2.5 between the three cities, we carry out permutations 1000 times within each group to calculate the rejection proportions; then, we obtain the p-value of the test. The results are shown in Table 10.

Table 10. p-value of two tests.

From Table 10, we can see that all p-values are less than 0.05. The tests are statistically significant and suggest that the average PM 2.5 levels in these three areas differ from each other at the 0.05 level of significance.

6. Conclusions and Discussions

In this paper, we consider the problem of testing the equality of mean functions in two random samples independently drawn from two functional random variables. We develop and study a novel testing procedure that has a more powerful ability to detect mean differences. In general, it includes two stages: first, splitting the sample into two parts and selecting principal components adaptively based on the first half-sample; then, constructing the test statistic based on the other half-sample. An extensive simulation study is presented, which shows that the proposed test works very well in comparison with several other methods in a variety of settings. Our future project is to detect differences in the covariance functions of independent sample curves. Some approaches have been proposed so far to address this problem, for instance, the factor-based test proposed by [4] and the regularized M-test introduced by [16].

Author Contributions

Data curation, S.F.; Funding acquisition, S.F.; Investigation, J.Z.; Methodology, J.Z.; Project administration, Y.H.; Validation, Y.H.; Writing—original draft, J.Z.; Writing—review & editing, Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by China National Institute of Standardization through the “Special funds for basic R.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs of the Main Results

Before proving the main results, we introduce the following useful lemmas. For notational convenience, we give the proofs for the full sample.

Lemma A1.

Under Assumptions 1–4, we have

\frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \frac{n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, {\hat{φ}}_{k} ⟩}^{2}}{{\hat{λ}}_{k}} = \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \frac{n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, {\hat{φ}}_{k} ⟩}^{2}}{λ_{k}} + o_{p} (1) .

(A1)

Proof.

Denote

ε_{p} (n) = \{ \hat{Δ} = | | | Γ_{12} - {\hat{Γ}}_{12} | | | \leq \frac{1}{2} λ_{p_{n}} \}

; note that provided

ε_{p} (n)

holds, we have

\begin{aligned} & \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \frac{n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, {\hat{φ}}_{k} ⟩}^{2}}{{\hat{λ}}_{k}} - \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \frac{n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, {\hat{φ}}_{k} ⟩}^{2}}{λ_{k}} \\ = & \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \frac{n (λ_{k} - {\hat{λ}}_{k})}{{\hat{λ}}_{k} λ_{k}} {⟨ \bar{X} - \bar{Y}, {\hat{φ}}_{k} ⟩}^{2} \\ \leq & \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \frac{n {⟨ \bar{X} - \bar{Y}, {\hat{φ}}_{k} ⟩}^{2}}{{\hat{λ}}_{k} λ_{k}} \sup_{j} ∣ {\hat{λ}}_{j} - λ_{j} ∣ \\ \leq & \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \frac{n {⟨ \bar{X} - \bar{Y}, {\hat{φ}}_{k} ⟩}^{2}}{{\hat{λ}}_{k}^{2}} | | | Γ_{12} - {\hat{Γ}}_{12} | | | . \end{aligned}

It can easily be shown that

E {| | | Γ_{1} - {\hat{Γ}}_{1} | | |}^{2} = O (n_{1}^{- 1})

,

E {| | | Γ_{2} - {\hat{Γ}}_{2} | | |}^{2} = O (n_{2}^{- 1})

; then,

| | | Γ_{12} - {\hat{Γ}}_{12} | | | = O_{p} (m^{- 1 / 2}) .

By the central limit theorem, we have

\frac{\sqrt{n} ⟨ \bar{X} - \bar{Y}, {\hat{φ}}_{k} ⟩}{\sqrt{{\hat{λ}}_{k}}} \overset{d}{⟶} N (0, 1),

(A2)

Then,

\frac{n {⟨ \bar{X} - \bar{Y}, {\hat{φ}}_{k} ⟩}^{2}}{{\hat{λ}}_{k}} \overset{d}{⟶} χ_{1}^{2}

, which means

\frac{n {⟨ \bar{X} - \bar{Y}, {\hat{φ}}_{k} ⟩}^{2}}{{\hat{λ}}_{k}}

is bounded in probability.

Notice that

∣ {\hat{λ}}_{k} - λ_{k} ∣ = O_{p} (m^{- 1 / 2})

; therefore

\begin{aligned} & \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \frac{n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, {\hat{φ}}_{k} ⟩}^{2}}{{\hat{λ}}_{k}} - \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \frac{n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, {\hat{φ}}_{k} ⟩}^{2}}{λ_{k}} \\ \leq & O_{p} (\frac{1}{\sqrt{2 p_{n}}} m^{- 1 / 2} p_{n}^{a + 1}) = o_{p} (1) . \end{aligned}

Ref. [15] has proven that

P (ε_{p} (n)) \to 1

as

m \to \infty

; thus, Lemma A1 holds. □
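The step that bounds the eigenvalue errors by the norm of the covariance error is an instance of the eigenvalue perturbation bound for self-adjoint operators (Weyl's inequality). The following sketch checks it numerically in a finite-dimensional setting, where symmetric matrices stand in for the covariance operators; the dimension, the decay k^{-2} of the eigenvalues, the sample size, and the function name are illustrative assumptions, not quantities from the paper.

```python
import numpy as np


def eigenvalue_gap_and_norm(gamma, gamma_hat):
    """Return (max_k |lambda_hat_k - lambda_k|, operator norm of gamma_hat - gamma)."""
    lam = np.linalg.eigvalsh(gamma)          # sorted in ascending order
    lam_hat = np.linalg.eigvalsh(gamma_hat)  # same ordering, so indices match
    gap = float(np.max(np.abs(lam_hat - lam)))
    op_norm = float(np.linalg.norm(gamma_hat - gamma, ord=2))
    return gap, op_norm


rng = np.random.default_rng(0)
dim, m = 15, 200
gamma = np.diag(1.0 / np.arange(1, dim + 1) ** 2)  # true eigenvalues k^{-2}
sample = rng.multivariate_normal(np.zeros(dim), gamma, size=m)
gamma_hat = np.cov(sample, rowvar=False)           # empirical covariance

gap, op_norm = eigenvalue_gap_and_norm(gamma, gamma_hat)
print(f"max eigenvalue error = {gap:.4f}, operator norm of error = {op_norm:.4f}")
```

The first number never exceeds the second, up to rounding, and both shrink at the rate m^{-1/2} as the sample size m grows, which is what the proof uses.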

Lemma A2.

Under Assumptions 1–4, we have

\frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \frac{n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, {\hat{φ}}_{k} ⟩}^{2}}{λ_{k}} = \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \frac{n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, φ_{k} ⟩}^{2}}{λ_{k}} + o_{p} (1) .

(A3)

Proof.

First,

\begin{aligned} & \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \frac{n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, {\hat{φ}}_{k} ⟩}^{2}}{λ_{k}} \\ = & \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \frac{n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, {\hat{φ}}_{k} - φ_{k} + φ_{k} ⟩}^{2}}{λ_{k}} \\ = & \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \{\frac{n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, {\hat{φ}}_{k} - φ_{k} ⟩}^{2}}{λ_{k}}\} + \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \{\frac{n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, φ_{k} ⟩}^{2}}{λ_{k}}\} \\ & + \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \{2 \frac{n ⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, {\hat{φ}}_{k} - φ_{k} ⟩ ⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, φ_{k} ⟩}{λ_{k}}\} . \end{aligned}

Then,

\begin{aligned} & \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \frac{n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, {\hat{φ}}_{k} ⟩}^{2}}{λ_{k}} - \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \frac{n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, φ_{k} ⟩}^{2}}{λ_{k}} \\ = & \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \{\frac{n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, {\hat{φ}}_{k} - φ_{k} ⟩}^{2}}{λ_{k}}\} + \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \{2 \frac{n ⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, {\hat{φ}}_{k} - φ_{k} ⟩ ⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, φ_{k} ⟩}{λ_{k}}\} \\ = & I_{1} + I_{2} , \end{aligned}

where

I_{1} = \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \{\frac{n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, {\hat{φ}}_{k} - φ_{k} ⟩}^{2}}{λ_{k}}\},

I_{2} = \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \{2 \frac{n ⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, {\hat{φ}}_{k} - φ_{k} ⟩ ⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, φ_{k} ⟩}{λ_{k}}\} .

By the Cauchy–Schwarz inequality,

\sum_{k = 1}^{p_{n}} {⟨ \sqrt{n} (\bar{X} - \bar{Y}), {\hat{φ}}_{k} - φ_{k} ⟩}^{2} λ_{k}^{- 1} \leq \sum_{k = 1}^{p_{n}} {∥ \sqrt{n} (\bar{X} - \bar{Y}) ∥}^{2} \cdot {∥ {\hat{φ}}_{k} - φ_{k} ∥}^{2} λ_{k}^{- 1} .

It can also be easily proven that

∥ \sqrt{n} (\bar{X} - \bar{Y}) ∥^{2} = O_{p} (1)

. According to the result of [17], we have

∥ {\hat{φ}}_{k} - φ_{k} ∥ = O_{p} (k m^{- 1 / 2})

under corresponding conditions. Then, we have

I_{1} = O_{p} (\frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} k^{2} m^{- 1} k^{a}) = O_{p} (\frac{1}{\sqrt{2 p_{n}}} p_{n}^{a + 3} m^{- 1}) = o_{p} (1) .

Similarly, we can prove

I_{2} = o_{p} (1)

. □

Lemma A3.

Under Assumptions 1–4, we have

T^{*} = \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} [\frac{n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, φ_{k} ⟩}^{2}}{λ_{k}} - 1]

which converges in distribution to a centered Gaussian random variable g with variance 1.

The proof of Lemma A3 uses techniques similar to those of [18], so we omit it here.
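The normalization in Lemma A3 can be illustrated by simulation. The sketch below assumes that the standardized projections are independent standard normal variables, so that each summand is a chi-square variable with one degree of freedom; this is a simplification of the lemma's setting, and the function name and simulation sizes are hypothetical.

```python
import numpy as np


def simulate_t_star(p_n, n_rep=20000, seed=0):
    """Draw n_rep copies of (sum_k Z_k^2 - p_n) / sqrt(2 p_n) with Z_k i.i.d. N(0, 1)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(size=(n_rep, p_n))  # standardized projections
    return (z**2 - 1.0).sum(axis=1) / np.sqrt(2.0 * p_n)


for p_n in (5, 50, 500):
    draws = simulate_t_star(p_n)
    # Mean and variance are 0 and 1 for every p_n; the upper-tail frequency
    # approaches the nominal 0.05 only as p_n grows, because the sum is skewed.
    print(p_n, draws.mean(), draws.var(), (draws > 1.645).mean())
```

Centring by 1 and scaling by the square root of 2 p_n matches the mean 1 and variance 2 of a chi-square variable with one degree of freedom, which is why the limit has variance 1.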

Proof of Theorem 2.

Combining Lemmas A1 and A2 with Lemma A3 proves Theorem 2. □

Proof of Theorem 3.

By Theorem 1 and Lemmas A1 and A2, we have

{\hat{V}}_{k} \sim N (0, 1)

and

{\hat{V}}_{k}^{2} \sim χ^{2} (1)

. Then, the conclusion is obvious. □

Proof of Theorem 5.

We note that

\begin{aligned} n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, φ_{k} ⟩}^{2} = & n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2} - μ_{1} + μ_{2}, φ_{k} ⟩}^{2} + 2 n ⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, φ_{k} ⟩ ⟨ μ_{1} - μ_{2}, φ_{k} ⟩ - n {⟨ μ_{1} - μ_{2}, φ_{k} ⟩}^{2} \\ = & J_{k 1} + 2 J_{k 2} - J_{k 3} , \end{aligned}

where

J_{k 1} = n {⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2} - μ_{1} + μ_{2}, φ_{k} ⟩}^{2},

J_{k 2} = n ⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, φ_{k} ⟩ ⟨ μ_{1} - μ_{2}, φ_{k} ⟩,

J_{k 3} = n {⟨ μ_{1} - μ_{2}, φ_{k} ⟩}^{2} .

Then

\frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} {\hat{V}}_{k} = \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} (J_{k 1} + 2 J_{k 2} - J_{k 3}) / {\hat{λ}}_{k} .

Observe that

\frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} (J_{k 2} - J_{k 3}) / {\hat{λ}}_{k} = \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} n ⟨ μ_{1} - μ_{2}, φ_{k} ⟩ ⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2} - μ_{1} + μ_{2}, φ_{k} ⟩ / {\hat{λ}}_{k} .

According to (A2), we have

\sqrt{n} ⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2} - μ_{1} + μ_{2}, φ_{k} ⟩ / \sqrt{{\hat{λ}}_{k}} = O_{p} (1) .

By Assumptions 1–5 and Lemma A1, we have that

\begin{aligned} & \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} ⟨ μ_{1} - μ_{2}, φ_{k} ⟩ / {\hat{λ}}_{k} \\ \leq & \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} n^{- \frac{1}{2}} ∥ u (t) ∥ / λ_{k} = O_{p} (\frac{n^{- \frac{1}{2} + b} \sum_{k = 1}^{p_{n}} k^{a}}{\sqrt{2 p_{n}}}) \\ = & O_{p} (n^{- \frac{1}{2}} p_{n}^{a + 1}) = o_{p} (1) . \end{aligned}

Under Assumptions 4 and 5, we have

\begin{aligned} \lim_{p_{n} \to \infty} \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} J_{k 2} / {\hat{λ}}_{k} = & \lim_{p_{n} \to \infty} \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} n ⟨ {\hat{μ}}_{1} - {\hat{μ}}_{2}, {\hat{φ}}_{k} ⟩ ⟨ μ_{1} - μ_{2}, {\hat{φ}}_{k} ⟩ / {\hat{λ}}_{k} \\ = & \lim_{p_{n} \to \infty} \frac{1}{\sqrt{2 p_{n}}} \sum_{k = 1}^{p_{n}} \frac{n {⟨ μ_{1} - μ_{2}, φ_{k} ⟩}^{2}}{λ_{k}} \\ = & \lim_{p_{n} \to \infty} \sum_{k = 1}^{p_{n}} {⟨ u (t), φ_{k} ⟩}^{2} / λ_{k} \\ = & {∥ Γ_{12}^{- 1} u (t) ∥}^{2} . \end{aligned}

(A4)

From Theorem 2 and the above results, we have

\begin{aligned} P (N T_{1 A} \geq z_{1 - α}) = & P (\frac{\sum_{k = 1}^{p_{n}} {\hat{V}}_{k} - p_{n}}{\sqrt{2 p_{n}}} \geq z_{1 - α}) \\ = & P (\frac{\sum_{k = 1}^{p_{n}} J_{k 1} / {\hat{λ}}_{k} - p_{n}}{\sqrt{2 p_{n}}} \geq z_{1 - α} - \frac{\sum_{k = 1}^{p_{n}} J_{k 2} / {\hat{λ}}_{k}}{\sqrt{2 p_{n}}}) . \end{aligned}

From (A2) we can obtain

\frac{J_{k 1}}{{\hat{λ}}_{k}} \overset{d}{⟶} χ_{1}^{2}

; then,

\lim_{(m, p_{n}) \to \infty} P (\frac{\sum_{k = 1}^{p_{n}} J_{k 1} / {\hat{λ}}_{k} - p_{n}}{\sqrt{2 p_{n}}} \leq x) = Φ (x) .

(A5)

Combined with (A4), we have

\lim_{(m, p_{n}) \to \infty} P (N T_{1 A} > z_{1 - α}) = Φ (- z_{1 - α} + {∥ Γ_{12}^{- 1} u (t) ∥}^{2}) .

□
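The limiting power derived above is easy to evaluate numerically. In the following sketch, the argument `shift` plays the role of the squared norm appearing in the limit; the function name and the example values of `shift` are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def asymptotic_power(alpha, shift):
    """Evaluate Phi(-z_{1 - alpha} + shift) for a nonnegative shift."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha)  # upper alpha-quantile z_{1 - alpha}
    return std_normal.cdf(-z + shift)


for shift in (0.0, 1.0, 2.0, 3.0):
    print(shift, round(asymptotic_power(0.05, shift), 3))
```

With `shift = 0` the function returns the nominal level `alpha`, and the power increases towards 1 as `shift` grows, in line with the limit above.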

Proof of Theorem 6.

By Lemma A2, we have

(\sum_{k = 1}^{p_{n}} {\hat{V}}_{k} - \sum_{k = 1}^{p_{n}} V_{k}) \overset{P}{⟶} 0

as

n \to \infty .

Define

(k_{1}^{*}, \dots, k_{p_{n}}^{*})

as the indices that arrange

V_{1}, \dots, V_{p_{n}}

in decreasing order, and

(k_{1}, \dots, k_{p_{n}})

as the indices that arrange

{\hat{V}}_{1}, \dots, {\hat{V}}_{p_{n}}

in decreasing order. Ref. [19] has proven that the random order in the selection procedure,

{\hat{V}}_{(1)}, \dots, {\hat{V}}_{(p_{n})}

, is asymptotically equivalent to the fixed order

V_{(1)}, \dots, V_{(p_{n})}

. □

References

1. Besse, P.; Ramsay, J. Principal components analysis of sampled functions. Psychometrika 1986, 51, 285–311.
2. Rice, J.A.; Silverman, B.W. Estimating the mean and covariance structure nonparametrically when the data are curves. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 1991, 53, 233–243.
3. Ramsay, J.O.; Silverman, B.W. Functional Data Analysis; Springer Press: Berlin/Heidelberg, Germany, 2005.
4. Ferraty, F.; Vieu, P.; Viguier-Pla, S. Factor-based comparison of groups of curves. Comput. Stat. Data Anal. 2007, 51, 4903–4910.
5. Horváth, L.; Kokoszka, P. Inference for Functional Data with Applications; Springer Press: Berlin/Heidelberg, Germany, 2012.
6. Fan, J.Q.; Lin, S.K. Test of significance when data are curves. J. Am. Stat. Assoc. 1998, 93, 1007–1021.
7. Faraway, J.J. Regression analysis for a functional response. Technometrics 1997, 39, 254–261.
8. Zhang, C.Q.; Peng, H.; Zhang, J.T. Two samples tests for functional data. Commun. Stat.-Methods 2010, 39, 559–578.
9. Zhang, J.T. Statistical inferences for linear models with functional responses. Stat. Sin. 2011, 21, 1431–1451.
10. Zhang, J.T.; Liang, X. One-way ANOVA for functional data via globalizing the pointwise F-test. Scand. J. Stat. 2014, 45, 51–71.
11. Zhang, J.T.; Cheng, M.Y.; Wu, H.T.; Zhou, B. A new test for functional one-way ANOVA with applications to ischemic heart screening. Comput. Stat. Data Anal. 2019, 132, 3–17.
12. Wasserman, L.; Roeder, K. High dimensional variable selection. Ann. Stat. 2009, 37, 2178–2201.
13. Shen, Q.; Faraway, J.J. An F test for linear models with functional responses. Stat. Sin. 2004, 14, 1239–1257.
14. Satterthwaite, F.E. Synthesis of variance. Psychometrika 1941, 6, 309–316.
15. Hall, P.; Hosseini-Nasab, M. On properties of functional principal components analysis. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2006, 68, 109–126.
16. Kraus, D.; Panaretos, V.M. Dispersion operators and resistant second-order functional data analysis. Biometrika 2012, 99, 813–832.
17. Kong, D.; Xue, K.; Yao, F.; Zhang, H.H. Partially functional linear regression in high dimensions (supplementary material). Biometrika 2016, 103, 147–159.
18. Shang, Y.L. A central limit theorem for randomly indexed m-dependent random variables. Filomat 2012, 26, 713–717.
19. Su, Y.R.; Di, C.Z.; Li, H. Hypothesis testing in functional linear models. Biometrics 2017, 73, 551–561.

Figure 1. The mean PM 2.5 (μg/m^{3}) of Beijing, Tianjin, and Shijiazhuang from January 2019 to December 2019. The black line stands for Beijing. The red dashed line stands for Tianjin. The green dotted line stands for Shijiazhuang.

Figure 2. The mean PM 2.5 (μg/m^{3}) of Beijing, Tianjin, and Shijiazhuang from January 2017 to December 2019. The black line stands for Beijing. The red dashed line stands for Tianjin. The green dotted line stands for Shijiazhuang.

Table 1. Size and power of five methods in Setting 1 (mean difference lies in early part

μ_{2} [1 : 6] = μ_{1} [1 : 6] + δ

).

$n_{2} = n_{1}$	$δ$	$T_{L}$	$T_{F}$	$T_{H}$	${NT}_{1 A}$	${NT}_{2 A}$
$n_{1} = 120$	0.00	0.067	0.065	0.068	0.045	0.055
	0.10	0.151	0.105	0.461	0.357	0.365
	0.20	0.245	0.258	0.786	0.483	0.471
	0.30	0.405	0.413	0.998	0.872	0.805
$n_{1} = 220$	0.00	0.045	0.053	0.066	0.051	0.065
	0.10	0.265	0.253	0.643	0.482	0.471
	0.20	0.385	0.393	0.881	0.564	0.585
	0.30	0.553	0.512	0.999	0.976	0.918

Table 2. Size and power of five methods in Setting 2 (mean difference lies in middle part

μ_{2} [11 : 16] = μ_{1} [11 : 16] + δ

).

$n_{2} = n_{1}$	$δ$	$T_{L}$	$T_{F}$	$T_{H}$	${NT}_{1 A}$	${NT}_{2 A}$
$n_{1} = 120$	0.00	0.052	0.063	0.055	0.067	0.066
	0.10	0.352	0.355	0.384	0.876	0.923
	0.20	0.445	0.466	0.448	0.998	0.999
	0.30	0.581	0.592	0.544	0.999	0.999
$n_{1} = 220$	0.00	0.034	0.045	0.066	0.068	0.063
	0.10	0.442	0.461	0.583	0.975	0.923
	0.20	0.565	0.543	0.654	0.999	0.999
	0.30	0.675	0.691	0.685	0.999	0.999

Table 3. Size and power of five methods in Setting 3 (mean difference lies in latter part

μ_{2} [21 : 26] = μ_{1} [21 : 26] + δ

).

$n_{2} = n_{1}$	$δ$	$T_{L}$	$T_{F}$	$T_{H}$	${NT}_{1 A}$	${NT}_{2 A}$
$n_{1} = 120$	0.00	0.035	0.052	0.054	0.055	0.042
	0.10	0.342	0.355	0.372	0.486	0.497
	0.20	0.565	0.552	0.593	0.744	0.795
	0.30	0.725	0.753	0.765	0.996	0.985
$n_{1} = 220$	0.00	0.065	0.061	0.045	0.052	0.055
	0.10	0.431	0.425	0.456	0.527	0.588
	0.20	0.628	0.635	0.681	0.823	0.885
	0.30	0.864	0.875	0.824	0.999	0.999

Table 4. Size and power of five methods in Setting 4 (mean difference lies in separable part

μ_{2} [1 : 2] = μ_{1} [1 : 2] + δ

,

μ_{2} [11 : 12] = μ_{1} [11 : 12] + δ

,

μ_{2} [21 : 22] = μ_{1} [21 : 22] + δ

).

$n_{2} = n_{1}$	$δ$	$T_{L}$	$T_{F}$	$T_{H}$	${NT}_{1 A}$	${NT}_{2 A}$
$n_{1} = 120$	0.00	0.045	0.053	0.067	0.068	0.062
	0.10	0.162	0.165	0.326	0.463	0.554
	0.20	0.205	0.272	0.466	0.963	0.955
	0.30	0.361	0.32	0.565	0.999	0.999
$n_{1} = 220$	0.00	0.045	0.038	0.067	0.045	0.068
	0.10	0.192	0.215	0.393	0.497	0.605
	0.20	0.282	0.314	0.516	0.999	0.999
	0.30	0.461	0.465	0.689	1.000	1.000

Table 5. Size and power of five methods in Setting 5 (difference averaged in total vector

μ_{2} = μ_{1} + δ 1_{40}

).

$n_{2} = n_{1}$	$δ$	$T_{L}$	$T_{F}$	$T_{H}$	${NT}_{1 A}$	${NT}_{2 A}$
$n_{1} = 120$	0.00	0.045	0.052	0.068	0.065	0.053
	0.10	0.182	0.179	0.366	0.794	0.836
	0.20	0.372	0.364	0.854	0.999	0.999
	0.30	0.577	0.564	0.896	1.000	1.000
$n_{1} = 220$	0.00	0.036	0.045	0.062	0.063	0.065
	0.10	0.212	0.249	0.905	0.826	0.887
	0.20	0.413	0.424	0.935	0.999	0.999
	0.30	0.625	0.636	0.943	1.000	1.000

Table 6. Size and power of

N T_{2 A}

in Setting 5 (difference averaged in total vector

μ_{2} = μ_{1} + δ 1_{40}

).

$n_{2} = n_{1} = 120$	$δ = 0.0$	$δ = 0.1$	$δ = 0.2$	$δ = 0.3$
two-fold CV	0.058	0.468	0.786	0.947
five-fold CV	0.056	0.459	0.772	0.943
ten-fold CV	0.053	0.465	0.765	0.978

Table 7. Size and power of five methods under randomized signal in Setting 1

M (20, 6)

.

$n_{2} = n_{1}$	$δ$	$T_{L}$	$T_{F}$	$T_{H}$	${NT}_{1 A}$	${NT}_{2 A}$
$n_{1} = 120$	0.00	0.034	0.035	0.046	0.057	0.065
	0.10	0.262	0.275	0.366	0.594	0.606
	0.20	0.375	0.384	0.589	0.926	0.935
	0.30	0.534	0.548	0.825	0.999	0.999
$n_{1} = 220$	0.00	0.067	0.056	0.063	0.066	0.068
	0.10	0.309	0.315	0.417	0.628	0.733
	0.20	0.423	0.439	0.615	0.986	0.995
	0.30	0.674	0.695	0.923	0.999	0.999

Table 8. Size and power of five methods under randomized signal in Setting 2

M (40, 6)

.

$n_{2} = n_{1}$	$δ$	$T_{L}$	$T_{F}$	$T_{H}$	${NT}_{1 A}$	${NT}_{2 A}$
$n_{1} = 120$	0.000	0.063	0.038	0.067	0.065	0.063
	0.1	0.275	0.286	0.447	0.457	0.478
	0.2	0.355	0.372	0.743	0.785	0.799
	0.3	0.426	0.465	0.878	0.943	0.966
$n_{1} = 220$	0.00	0.042	0.053	0.064	0.066	0.065
	0.1	0.323	0.376	0.524	0.539	0.567
	0.2	0.465	0.486	0.874	0.923	0.995
	0.3	0.549	0.563	0.925	0.999	0.999

Table 9. Size and power of seven methods under randomized signal in Setting 2

M (40, 6)

.

$n_{2} = n_{1}$	$δ$	$T_{L}$	$T_{F}$	$T_{H}$	$T_{A}$	$T_{OA}$	${NT}_{1 A}$	${NT}_{2 A}$	Time1	Time2
120	0.00	0.045	0.052	0.045	0.063	0.065	0.064	0.061	41.4575	1.1953
	0.10	0.221	0.236	0.252	0.315	0.403	0.224	0.367	41.6208	1.1964
	0.20	0.415	0.454	0.621	0.883	0.966	0.749	0.851	37.4070	1.0789
	0.30	0.626	0.684	0.827	0.994	0.975	0.891	0.896	36.4592	1.0681
180	0.00	0.051	0.062	0.054	0.057	0.054	0.063	0.067	34.6782	1.4470
	0.10	0.293	0.296	0.305	0.483	0.491	0.317	0.428	41.6208	1.1964
	0.20	0.495	0.521	0.715	0.936	0.995	0.836	0.922	37.4070	1.0789
	0.30	0.721	0.784	0.935	0.999	0.999	0.966	0.975	36.4592	1.0681

Table 10. p-values of the two tests.

	Beijing vs. Tianjin		Beijing vs. Shijiazhuang		Tianjin vs. Shijiazhuang
Test	$N T_{1 A}$	$N T_{2 A}$	$N T_{1 A}$	$N T_{2 A}$	$N T_{1 A}$	$N T_{2 A}$
p-value	0.035	0.034	0.025	0.032	0.039	0.045

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
