Abstract
We investigate a nonparametric, varying coefficient regression approach for modeling and estimating the regression effects induced by two functionally correlated datasets. Modern biomedical technology allows multiple patient features to be measured over a time interval, or intermittently at several discrete time points, to reveal underlying biological mechanisms; statistical models that do not properly incorporate interventions and their dynamic responses may lead to biased estimates of the intervention effects. We propose a shared parameter change point function-on-function regression model to evaluate the pre- and post-intervention time trends and develop a likelihood-based method for estimating the intervention effects and other parameters. We also propose new methods for estimating and testing hypotheses about regression parameters for functional data via the reproducing kernel Hilbert space. The estimators of the regression parameters are in closed form and do not require inverting a large matrix, and hence are less computationally demanding and more widely applicable. By establishing a representation theorem and a functional central limit theorem, the asymptotic properties of the proposed estimators are obtained, and the corresponding hypothesis tests are proposed. The application and statistical properties of our method are demonstrated through an immunotherapy clinical trial of advanced myeloma and through simulation studies.
Keywords:
functional data; hypothesis testing; regression function; reproducing kernel Hilbert space; sparsely observed data
MSC:
62G05; 62G10
1. Introduction
Modern biomedical technology has made it possible to measure multiple patient features during a time interval or intermittently at several discrete time points to reveal underlying biological mechanisms. Functional data also arise in genetic studies—a massive amount of gene expression data is recorded for each subject and can be treated as a functional curve [1]. Functional data analysis captures distinct features related to the dynamics of cellular responses, activity, and other biological processes. Existing methods, such as projection, dimension reduction, and functional linear regression analysis, are not adapted for such data. Overviews can be found in the book by Horváth and Kokoszka [2] and in some recently published papers, such as Yuan et al. [3] and Lai et al. [4].
Ramsay and Silverman [5], Clarkson et al. [6], and Ferraty and Vieu [7] introduced some basic tools and widely accepted methods for functional data analysis; Horváth and Kokoszka [2] established some fundamental methods for estimation and hypothesis testing on mean functions and covariance operators of functional data. The topics are broad and the results are treated in depth. Conventionally, each data curve is assumed to be observed over a dense set of points, often thousands of points; smoothing techniques are then used to produce continuous curves, and these curves are treated as completely observed functional data for statistical inference. In contrast with those assumptions, we consider the more practical setting in which the data curves are observed only at some (not dense) time points, and the observed data curves are in fact interpolations at those observed points. Of course, a relatively large sample size is needed when the observations are sparse. The effects of both the number of observation points and the sample size are also considered in our analysis.
For analyzing longitudinal data, Zeger and Diggle [8] considered a semiparametric regression model of the form, with longitudinal observations
where is the response variable, is the covariate vector at time t, is a constant vector of unknown regression coefficients, is an unspecified baseline function, is a zero-mean stochastic process, and represents the observation interval. Under this model, Lin and Ying [9] estimated via a weighted least squares estimator based on the theory of counting processes; Fan and Li [10] further studied this model using a weighted difference-based estimator and a weighted local linear estimator followed by statistical inference, as discussed in Xue and Zhu [11].
For functional data analysis, the data are often represented by , and the model is [12,13,14]
Some researchers considered the following model [2,5,15]
To estimate , assume there are bases and , which span the spaces of the and , respectively. The estimate of , of the form , is given by
and is estimated by minimizing the residual sum of squares . Although the resulting estimator is useful, a representation theorem for such an estimator is hard to obtain, and hence the asymptotic distribution under this approach is not clear. Yao et al. [15] investigated a functional principal component method for estimating model (3) and obtained consistency results. Müller and Yao [16] studied a variation of the above model in the conditional expectation form.
The smoothing spline method is popular for curve estimation. The function curves can be estimated at any point, followed by the computation of coefficients. However, the asymptotic properties of estimators based on the spline method are difficult to handle. For natural polynomial splines, the number of knots equals the number of untied observations, which is sometimes redundant and undesirable. B-splines require only a few (the degree of the polynomial plus two) basis functions and are easy to implement [17,18,19]. Another method is the local linear fit [20,21,22], but the difficulty lies in choosing the bandwidth, especially when the observation points are uneven. Therefore, in this paper we employ the reproducing kernel Hilbert space (RKHS), a special form of the spline method that turns curve estimation into point evaluation. Yuan and Cai [12] explored its application to the functional linear regression problem, and Lei and Zhang [23] extended it to RKHS-based partially functional linear models. In general, one needs to choose a set of (orthogonal) basis functions and the number of basis functions for functional estimation, whereas with an RKHS one only needs to determine the kernel(s) of the RKHS. Furthermore, the Riesz representation theorem shows that any bounded linear functional can be reproduced, in closed form, by a representer based on the RKHS kernel.
However, existing RKHS methods often meet obstacles in the choice of norms and the corresponding optimization procedures. Although using a carefully selected norm in the optimization criterion has the advantage of interpretability, the resulting regression estimator generally requires the inversion of a large matrix (of dimension equal to the sample size). Moreover, most of the existing methods, including the aforementioned RKHS methods, are designed for the case where the observed data are sampled at a dense rate and are limited to models in which either the response or the predictors are functions. New methods for estimation and hypothesis testing of regression parameters are needed for the more general case where both the response and the predictors are sparsely observed functions. To address these problems, we propose a new RKHS method with a unified norm that characterizes both the RKHS and the optimization criterion for function-on-function regression. Although the statistical interpretation of this optimization criterion is not fully clear, with a simple closed form for the estimated regression functions under a general function-on-function regression model, this optimization is more computationally reliable and applicable without the need to compute the inverse of a massive matrix. By establishing a representation theorem and a functional central limit theorem based on the proposed model, we obtain the asymptotic distribution of the estimators. Hypothesis tests for the underlying curves are proposed accordingly.
The remainder of this paper is organized as follows. Section 2 describes the proposed method for the estimation and hypothesis testing of regression parameters for functional data via the reproducing kernel Hilbert space and establishes some theoretical properties. Simulation studies and a real-data example demonstrating the effectiveness of the proposed method are given in Section 3 and Section 4, respectively. Section 5 gives some concluding remarks, and all technical proofs are collected in Appendix A.
2. The Proposed Method
We consider the observed data . The underlying data curves are iid copies from , where and are random curves on some region T. The observation times are generally assumed to be different for each subject i for some . We assume that the numbers of time points () are iid copies of some integer-valued random variable m, and that, given , the time points () are iid copies of a positive random variable G, with its support on . For each individual, the observed data can be interpolated as curves on T. We assume the following model for the observed data
where are the true regression coefficient functions for the covariates ’s, and the ’s are random errors. In general, and are not independent for , e.g., is a zero-mean Gaussian process with some covariance function , known or unknown. Note that model (4) is more general than (2) and is more straightforward than model (3) in describing the relationship between the responses and the covariates . Typically, we set , so that is the baseline function. Since and may differ even for the same j, there may be no observation, or only a few observations, at each time point t.
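To fix ideas, the following sketch in R illustrates the sampling scheme behind model (4): each subject is observed at its own random number of random time points, and the response is a time-varying linear combination of the covariates plus noise. The coefficient functions, covariate curves, and distributional choices below are entirely hypothetical and are not those used in Section 3.

```r
## A minimal, hypothetical sketch (not the paper's code) of the sampling scheme
## behind model (4): a random number of observation times per subject and a
## time-varying linear combination of two covariates plus noise.
set.seed(1)
n  <- 50                                     # number of subjects
b0 <- function(t) 1 + t                      # hypothetical coefficient functions
b1 <- function(t) sin(2 * pi * t)
b2 <- function(t) t^2
sim_subject <- function() {
  m  <- sample(5:10, 1)                      # random number of observation times
  tt <- sort(runif(m))                       # observation times in T = [0, 1]
  x1 <- tt + rnorm(1)                        # hypothetical covariate curves
  x2 <- cos(2 * pi * tt) + 0.2 * rnorm(1)
  y  <- b0(tt) + b1(tt) * x1 + b2(tt) * x2 + rnorm(m, sd = 0.3)
  data.frame(t = tt, x1 = x1, x2 = x2, y = y)
}
dat <- lapply(seq_len(n), function(i) sim_subject())
```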
To estimate the regression coefficient function , the simplest way is the point-wise least squares estimate or any other non-smoothing (i.e., without roughness penalty) functional estimate. However, such estimates have some undesirable properties, often with a wiggly shape and large variances in areas with sparse observations. An established performance measure for functional estimation is the mean square error (MSE),
Non-smoothed estimates often have small bias but large sampling variance, whereas smoothed estimates behave the other way around: they borrow information from neighboring data to produce a much smoother shape, but at the cost of larger bias. To better balance the trade-off between bias and sampling variance and to optimize the MSE, a regularized smooth estimate is preferred, in which a smoothing parameter controls the degree of penalty.
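For comparison, the unsmoothed point-wise least squares estimate mentioned above can be computed by interpolating each subject's curves onto a common grid and fitting an ordinary regression at every grid point. A minimal sketch is given below; it continues the hypothetical `dat` object from the previous sketch and uses linear interpolation purely for brevity.

```r
## A minimal sketch of the unsmoothed point-wise least squares estimate:
## interpolate each subject's curves onto a common grid, then fit one ordinary
## regression per grid point (continues the hypothetical `dat` object above).
grid   <- seq(0.05, 0.95, by = 0.05)
interp <- function(d, col) approx(d$t, d[[col]], xout = grid, rule = 2)$y
Y  <- sapply(dat, interp, col = "y")         # length(grid) x n matrices
X1 <- sapply(dat, interp, col = "x1")
X2 <- sapply(dat, interp, col = "x2")
beta_pointwise <- t(sapply(seq_along(grid), function(k)
  coef(lm(Y[k, ] ~ X1[k, ] + X2[k, ]))))     # one regression per time point
colnames(beta_pointwise) <- c("b0", "b1", "b2")
```

Plotting the columns of `beta_pointwise` against `grid` typically shows the wiggly shape and inflated variance in sparsely observed regions that motivate the regularized estimate.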
Existing smoothing methods each suffer from different weaknesses. Functional principal component analysis [15] is computationally intensive. General spline and kernel smoothing methods [24] do not fit the problem under study because of their constant bandwidth. It is known that for non-smoothing methods, the computational complexity is often of the order , where n is the sample size, while for smoothing methods the amount of computation may substantially exceed this and even become computationally prohibitive. Thus, for smoothing methods, it is important to find a method with computational load. To achieve this with spline methods, the basis functions should have only local support (i.e., be nonzero only locally). Recently, a popular approach in functional estimation is the reproducing kernel Hilbert space (RKHS). The RKHS is a special spline method that has this property and can achieve the computation for many functional estimation problems [5,12].
For functional estimate with RKHS, we define two norms (inner products) on the same RKHS : one, denoted by , defines the objective optimization criterion, and another one, denoted by , is for the RKHS . Different from a general Hilbert space, in an RKHS of functions on T, the point evaluation functional is a continuous linear map, so that by the Riesz representation theorem, there is a bi-variate function on T such that
Taking , we also get
The above two properties yield the name RKHS.
Note that for a given Hilbert space , a collection of functions on some domain T with a given inner product , its reproducing kernel K may not be unique. In fact, for any mapping , is a reproducing kernel for , any reproducing kernel of can be expressed in this form (Berlinet and Thomas-Agnan, 2004), and it has a one-to-one correspondence with a covariance function on . The choice of a kernel is mainly a matter of convenience. However, a reproducing kernel under one inner product may not be a reproducing kernel under another inner product on the same space . Assume , with being some RKHS with a known kernel , both to be specified later. Let be another inner product on (typically and for all ). With the observed curves , ideally an optimization procedure for estimating in (4) will be of the form
where is a penalty functional, and is the smoothing parameter. The penalty term can be significantly simplified via the RKHS as shown in the proof of Theorem 1 below. If , the above procedure gives the unsmoothed estimate with some undesirable properties such as overfitting and large variance.
For model (2) with one covariate, Yuan and Cai [12] considered a penalized estimate of . The corresponding estimator has a closed form that is linear in , but the computation involves the inverse of an matrix. For model (1) with d covariates, we first consider an estimator of that is linear in . It turns out that the estimator has a closed form but also involves the inverse of a matrix, which is computationally infeasible in general.
Consider an estimator of in the form of a linear combination of . For any , denote , and for any , denote , and similarly for . For a matrix and a matrix , let be the d rows of , be the d columns of , and define a d-column vector. Since , and has a basis , we consider an estimate of of the form , where is a matrix, is a matrix, and is . With , for fixed , an RKHS estimator of is of the form
where
For the penalty, let be a pre-specified symmetric positive definite constant matrix; we define
and
as the null space for the penalty, and is its orthogonal complement (with respect to the inner product ). Then, . That is, ; it has the decomposition , with and . Here, is also an RKHS with some reproducing kernel on . With RKHS, for all , which implies that . Further, for all , and . Thus
Typically, is chosen to be the identity matrix. The choices of , , and the inner product will be addressed later.
For a function and a vector of functions , denote ; for a matrix , denote , and similarly for the notations and . The following representation theorem shows that the estimator given in (5) is computationally feasible for many applications.
Theorem 1.
Assume , for . Then for the given penalty functional and fixed λ, there are constant matrices and such that given in (5) has the following representation
where , and in vector form of
where the matrices (), (), (), and , and the vectors and , are given in the proof.
For the ordinary regression model , with and , the least squares method yields the estimate of as . Since is of order (a.s.), can be viewed as approximately linear, of the form . Let and . Now we consider an estimate of of the linear form . Since , and , we only need to consider an estimate of the form , where is a parameter matrix, is a parameter matrix, and is a d-vector. This allows us to express the estimate via the basis of the RKHS with a greater degree of flexibility than a linear combination of . Another advantage of using estimates of this form is the convenience of hypothesis testing. As typically , testing the hypothesis of linearity of is equivalent to testing .
For any function , we set , and for fixed ,
where
Let be the vector representation of ; be that of , with , with , , and its vector form , and its vector form ; be all the eigenvalues of D, and be its normalized eigenvectors, and .
Theorem 2.
Assume , for . Then for the given penalty functional and fixed λ, there are constant matrices and such that given in (6) has the following representation
and in vector form of when the following inverse exists,
Below we study the asymptotic behavior of given in (6). Denote as the true value of , and let be the determinant of a square matrix . Lai et al. [25] proved strong consistency of the least squares estimate under general conditions, while Eicker [26] studied its asymptotic normality. The proposed estimators in this paper have some similarity to the least squares estimate, but they also have some different features and require different conditions.
- (C1).
- .
- (C2).
- .
- (C3).
- for all bounded , where .
- (C4).
- (a.s.).
- (C5).
- .
Theorem 3.
Assume conditions (C1)–(C5) hold, then as ,
To emphasize the dependence on n, we denote . Let be the space of bounded functions on T equipped with the supremum norm, and let stand for weak convergence in the space . Under the following condition (C6), we obtain the asymptotic normality of
- (C6).
- .
Theorem 4.
Assume conditions (C1)–(C4) and (C6) hold. Then as ,
where is the zero-mean Gaussian process on T with covariance function given in the proof, , and is given in the proof.
Testing the linearity of .
It is of interest to test the hypothesis that is linear in t, where J is a d-dimensional vector with entries 0 or 1, with 1 corresponding to the elements of to be tested for linearity. The hypothesis is equivalent to testing that the corresponding coefficients in are zero. Let , , , , . Let and be the vector representations of and , and . Denote , . By Theorem 4, we have
Corollary 1.
Assume the conditions of Theorem 4 hold, under , we have
where is the sub-matrix of that corresponds to the covariance of , , and Γ is given in the proof of Theorem 4.
The nonzero bias term in Theorem 4 and Corollary 1 is typical in functional estimation, and often such a bias term is zero for the corresponding Euclidean parameter estimation.
Choice of the smoothing parameter. In nonparametric penalized regression for the model , the most commonly used method for choosing the smoothing parameter is cross-validation (CV), based on the ideas of Allen (1974) and Stone (1974). This method chooses by minimizing
where is the estimated regression function obtained without using the observations of the ith individual. This method is usually computationally intensive even when the sample size is moderate. An improved version of the method is K-fold cross-validation. This method first randomly partitions the original sample into K equal subsamples, and the cross-validation process is then conducted K times. In each replicate, subsamples are used as the training data to construct the model, while the remaining one is used as the validation data. The results from the K folds are averaged to obtain a single estimate. In notation, let be the sample sizes of the K folds; then the K-fold cross-validation method chooses the that minimizes
where is the estimated regression function obtained without using the data in the Jth fold. In this paper, we set , which is also the default setting in many software packages.
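A minimal sketch of this K-fold selection of the smoothing parameter is given below. The functions fit_rkhs() and predict_rkhs() are hypothetical placeholders for the penalized estimator and its predictions; they are not part of the methodology above, and any fitting routine could be substituted.

```r
## A minimal sketch of K-fold cross-validation over a grid of smoothing
## parameters. `fit_rkhs()` and `predict_rkhs()` are hypothetical placeholders
## for the penalized estimator and its predictions.
choose_lambda <- function(dat, lambdas, K = 10) {
  folds  <- split(sample(seq_along(dat)), rep_len(seq_len(K), length(dat)))
  cv_err <- sapply(lambdas, function(lam) {
    mean(sapply(folds, function(idx) {
      fit <- fit_rkhs(dat[-idx], lambda = lam)      # train on the other K-1 folds
      mean(sapply(dat[idx], function(d)             # validate on the held-out fold
        mean((d$y - predict_rkhs(fit, d))^2)))
    }))
  })
  lambdas[which.min(cv_err)]                        # lambda with smallest CV error
}
```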
Choices of , , and . For notational simplicity, we consider without loss of generality. Recall that a function f on with continuous derivatives and has the following Taylor expansion [27]
where if and otherwise.
To construct an RKHS on , a common choice for the inner product on is , and the orthogonal complement of is , with inner product , where
The inner product on is . Kernels for the RKHS with more general for and for under these inner products can be found in [28]. More general constructions of the kernels and can be found in Ramsay and Silverman [5]. For our case,
With the above inner product, , and , let ; then , , and and are orthogonal to each other with respect to , but these properties need not hold if is replaced by a different inner product on .
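As a concrete illustration of the reproducing property under such a derivative-based inner product, the sketch below uses one standard kernel: that of the space of functions f on [0, 1] with f(0) = f'(0) = 0 and square-integrable second derivative, equipped with the inner product given by the integral of the product of second derivatives (see Wahba [27]). This particular kernel is shown only for illustration and need not coincide with the kernels used in our implementation.

```r
## A standard reproducing kernel (see Wahba [27]) for functions f on [0, 1]
## with f(0) = f'(0) = 0 and square-integrable second derivative, under
## <f, g> = integral of f''(t) g''(t) dt. Illustrative choice only.
K1 <- function(s, t) {
  a <- pmin(s, t); b <- pmax(s, t)
  a^2 * b / 2 - a^3 / 6
}
## Numerical check of the reproducing property <K1(s, .), f> = f(s)
## for the test function f(t) = t^3, which satisfies f(0) = f'(0) = 0.
s     <- 0.37
d2K   <- function(t) pmax(s - t, 0)    # second derivative of K1(s, .) in t
d2f   <- function(t) 6 * t             # second derivative of f(t) = t^3
inner <- integrate(function(t) d2K(t) * d2f(t), 0, 1)$value
c(inner_product = inner, f_at_s = s^3) # the two values agree
```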
3. Simulation Studies
In this section, we conduct two simulation studies to investigate the finite-sample performance of the proposed RKHS method. The first simulation study is designed to compare the RKHS estimator with the conventional smoothing spline and local polynomial methods in terms of curve fitting. For more details on the implementations of the smoothing spline and local polynomial methods, please refer to the book by Fang, Li, and Sudjianto [24]. The second simulation study examines the performance of Corollary 1 for testing the linearity of the regression functions. It turns out that with moderate sample sizes, the proposed RKHS estimator compares very favorably with its competitors, and the type I errors and powers of the tests are satisfactory.
Simulation 1. Assume that the underlying individual curve i at time point is generated from
where , and is a stationary Gaussian process with zero mean, unit variance, and a constant covariance between any two distinct time points. For each subject i, the number of observation time points is generated from the discrete uniform distribution on , and the observation time points are independently generated from the exponential distribution . The density function of is displayed in the left panel of Figure 1, from which it is easy to see that the density decreases as t increases.
Figure 1.
Left panel: the density function of ; right panel: the kernel density estimate of the observation times of individual MD001.
Then, we use cubic interpolation to interpolate the , , and on T to obtain , , and , respectively.
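A minimal sketch of this interpolation step for a single subject is given below. The particular cubic scheme (stats::splinefun with its default method) and the illustrative observation times and values are assumptions made only for illustration, not the exact settings of the simulation.

```r
## A minimal sketch of cubic interpolation of one subject's sparse observations
## onto a fine grid of T = [0, 1], using stats::splinefun (default method).
## The observation times and values below are hypothetical.
obs_t <- c(0.03, 0.10, 0.22, 0.41, 0.55, 0.78)       # hypothetical observation times
obs_y <- sin(2 * pi * obs_t) + rnorm(6, sd = 0.1)    # hypothetical observed values
grid  <- seq(0, 1, length.out = 201)
y_interp <- splinefun(obs_t, obs_y)(grid)            # interpolated curve on the grid
```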
Based on the functions , , and described above, we use the RKHS method introduced in Section 2 to estimate the regression functions , and , and compare its performance with the spline smoother and the local polynomial model. Typical comparisons (with the random seed set by “set.seed(1)” in R) are given in Figure 2, Figure 3 and Figure 4 for sample sizes of 50, 100, and 200, respectively. The simulation shows that the proposed RKHS method estimates the regression functions well and compares very favorably with the other two methods. Broadly speaking, the RKHS estimator has relatively stable performance and stays close to the true curve; its confidence bands are narrower in dense sampling regions and become wider in sparse sampling regions. In contrast, the spline smoother and the local polynomial model appear to fit well in dense sampling regions, but they have large bias where the data become sparse.
Figure 2.
Performance of curve estimation when the sample size is 50 and the random seed is “set.seed(1)” in R. First row: curve estimation performance of the spline smoother; Second row: curve estimation performance of the local polynomial model; Third row: curve estimation performance of the proposed RKHS method. Solid red line: true curve; Solid blue line: estimated curve; Dotted lower and upper green lines: confidence bands.

Figure 3.
Performance of curve estimation when the sample size is 100 and the random seed is “set.seed(1)” in R. First row: curve estimation performance of the spline smoother; Second row: curve estimation performance of the local polynomial model; Third row: curve estimation performance of the proposed RKHS method. Solid red line: true curve; Solid blue line: estimated curve; Dotted lower and upper green lines: confidence bands.
Figure 4.
Performance of curve estimation when the sample size is 200 and the random seed is “set.seed(1)” in R. First row: curve estimation performance of the spline smoother; Second row: curve estimation performance of the local polynomial model; Third row: curve estimation performance of the proposed RKHS method. Solid red line: true curve; Solid blue line: estimated curve; Dotted lower and upper green lines: confidence bands.
In order to make a thorough comparison for this simulation, we use the root integrated mean squared prediction error (RIMSPE) to measure the accuracy of the estimates [24]. The RIMSPE for the estimate of is given by
and the simulation is repeated 1000 times. Using R, the CPU time for this simulation is about 84.5 s on a PC with a 1.80 GHz dual-core Intel i5-8265U CPU and 8 GB of memory. The boxplots of the RIMSPE values are presented in Figure 5, from which it is clear that the RKHS method performs much better than the other two methods, as it has much smaller RIMSPE values.
Figure 5.
Boxplots of the RIMSPE values. The first row corresponds to sample size 50, the second row corresponds to sample size 100, and the third row corresponds to sample size 200. In each row, the left panel is for estimating , the middle panel is for estimating , and the right panel is for estimating .
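For reference, the sketch below shows one way such an error criterion can be evaluated numerically, by approximating the integral over T with an average on an equally spaced grid; the grid approximation and the example curves are assumptions made only for illustration.

```r
## A minimal sketch of a root integrated mean squared prediction error for an
## estimated coefficient function, approximating the integral over T by an
## average on an equally spaced grid (the grid approximation is an assumption).
rimspe <- function(beta_hat, beta_true, grid = seq(0, 1, length.out = 201)) {
  sqrt(mean((beta_hat(grid) - beta_true(grid))^2))
}
## Example with a hypothetical estimate of beta(t) = sin(2*pi*t):
rimspe(function(t) sin(2 * pi * t) + 0.05 * t, function(t) sin(2 * pi * t))
```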
Simulation 2. In this simulation study, we examine the performance of Corollary 1 for testing the hypothesis
According to the setting described in Simulation 1, is linear in t, whereas is apparently not linear in t. Therefore, we check the type I error for testing and the power for testing . Setting the significance level to and repeating the simulation 1000 times, we use Corollary 1 to derive the test statistics and list the type I errors and powers in Table 1 for various sample sizes. The results in Table 1 suggest that the type I error of the test is close to the nominal level , and the power of the test is not small even with a sample size of 50.
Table 1.
Summary of simulation results for linearity testing.
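For completeness, the sketch below shows how empirical type I error and power of the kind reported in Table 1 are typically tabulated from replicated p-values; the p-value vectors below are placeholders, not outputs of the actual simulation.

```r
## A minimal sketch of tabulating empirical type I error and power from B
## replications. p_h01 and p_h02 stand for the per-replication p-values of the
## two tests; the values below are placeholders for runnability only.
alpha <- 0.05
B     <- 1000
p_h01 <- runif(B)                        # placeholder p-values when the null holds
p_h02 <- rbeta(B, 1, 20)                 # placeholder p-values when the null fails
c(type_I_error = mean(p_h01 < alpha),    # should be close to the nominal level
  power        = mean(p_h02 < alpha))    # proportion of correct rejections
```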
4. Real Data Analysis
In this section, the proposed method is applied to characterize the relationships in patient immune response in a clinical trial of combination immunotherapy for advanced myeloma. The objective of the original trial was to study whether introducing vaccine-primed T cells early leads to cellular immune responses to the putative tumor antigen hTERT. In this study, 54 patients were recruited and assigned to two treatment arms based on their leukocyte response to human leukocyte antigen A2. Various immune cell parameters (CD3, CD4, CD8), T-cell levels, cytokines (IL-7, IL-15), and immunoglobulins (IgA, IgG, IgM) were measured repeatedly to investigate the treatment effect on immune recovery and function. The measurements were taken at nine time points: 0, 2, 7, 14, 40, 60, 90, 100, and 180 days [29]. Moreover, as a subtype of white blood cells in the human immune system, the absolute lymphocyte cell (ALC) count was recorded over time during or after each patient’s hospitalization, up to day 180. Figure 6 shows the trajectories of two individuals, “MD001” and “MD002”, in the dataset, with the observation interval scaled to . The trajectories of all 54 individuals can be found in the paper by Fang et al. [30]. Previous research has shown that a patient’s survival time is associated with the trajectory of the patient’s ALC counts.
Figure 6.
Left panel: trajectory of individual “MD001”; right panel: trajectory of individual “MD002”. The observation interval has been scaled to .
In the human immune system, the relationships among the various biological features are highly complicated and have so far been described only topologically. To illustrate the performance of the proposed method with a limited sample size, we investigate only how the levels of a patient’s immunoglobulin IgG and immune cell CD8 dynamically affect the trajectory of the patient’s ALC counts. For simplicity, the observation time points are scaled to the interval . Let and be the trajectories of the patient’s IgG, CD8, and ALC counts, respectively. Their relationship can then be described as follows
where and are the regression coefficient functions, and is the random error function. The purpose of this study is to estimate the regression coefficient functions and test whether and are linear functions in t.
In these data, the observation times generally become sparse as t increases. The right panel of Figure 1 visualizes the kernel density estimate of the observation times of individual “MD001”; the distribution of the observed time points reveals this trend. The proposed RKHS method is used to estimate the regression coefficient functions and to test linearity. Using R, the CPU time for the estimation procedure is only about 1.5 s on a PC with a 1.80 GHz dual-core Intel i5-8265U CPU and 8 GB of memory. Figure 7 visualizes the estimated curves and their confidence bands. It is observed that and are apparently nonlinear in t. This observation is also confirmed by the statistic derived from Corollary 1, which yields p-values less than for both and . It is worth noting that is monotone in t, but and are not. The results show that with the immunotherapy of tumor antigen vaccination, a patient’s immunoglobulin IgG enhances the ALC counts. When the increasing CD8 immune cells result in a high ALC count, immunoglobulin IgG inhibits the patient’s ALC counts so that the ALC level returns to the normal interval , and this immunotherapy can potentially improve patient survival time.
Figure 7.
The regression coefficient functions estimated by the proposed RKHS method. Solid blue line: estimated curve; dotted lower and upper green lines: confidence bands. The time t has been scaled to the interval .
5. Concluding Remarks
The existing work on functional data analysis has focused primarily on the case where the observed data are sampled at a dense rate and has been limited to models in which either the response or the predictors are functions. In this paper, we consider the more practical situation in which the data are observed only at some (not dense) time points, and we propose a general regression model in which both the response and the predictors are functions. This function-on-function regression model, given by Equation (4), can be viewed as a generalization of multivariate multiple linear regression in which the response, the predictors, and even the regression coefficients are all functions of t. To estimate the underlying regression curves and conduct hypothesis tests on them, we use the reproducing kernel Hilbert space (RKHS), which only requires choosing the kernel(s) of the RKHS and yields a closed-form solution for the regression coefficients in terms of the kernel. To the best of our knowledge, this is the first such representation of functional regression coefficients with sparsely observed data. Furthermore, the RKHS-based estimator provides a foundation for hypothesis testing, and the asymptotic distribution of the estimator is obtained. Simulation studies show that the RKHS estimator has relatively stable performance. The application and statistical properties of our method are further demonstrated through an immunotherapy clinical trial of advanced myeloma. Using the proposed function-on-function regression model and the theorems established in this paper, this application showed that with the immunotherapy of tumor antigen vaccination, patient immunoglobulin IgG enhances ALC counts, and hence this immunotherapy can potentially improve patient survival time. Future work may consider experimental design for the time points to be observed: if the time points can be controlled by the experimenter, their careful selection would improve the efficiency of the estimator (e.g., reduce the bias or MSE). Furthermore, we hope to study function-on-function generalized linear regression with sparse coefficient functions estimated by the penalized method of Zhang and Jia [31].
Author Contributions
Conceptualization, H.-B.F.; methodology, H.H.; validation, G.M.; formal analysis, H.L.; writing—original draft preparation, H.H.; writing—H.-B.F. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported in part by the National Cancer Institute (NCI) grant P30CA 051008 and the Key Laboratory of Mathematical and Statistical Models (Guangxi Normal University), Education Department of Guangxi Zhuang Autonomous Region.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
All data included in this study are available upon request by contacting the corresponding author.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
Proof of Theorem 1 (1-dimensional case).
thus
From this we get
where, . Note , or
Further, , or
where, by convention, , an n-dimensional row vector.
or
It is easy to check that , (), , , , , , and , . Denote , and , ), then the above system of equations can be rewritten as
or when the following inverse exists,
In this case, , , , , , , and
Below we evaluate . As
Rewrite (2) as
where . must satisfy
□
Proof of Theorem 2 (one-dimensional case).
It is easy to check that , (); , ; ; ; and . Denote , , and , the above system of equations is rewritten as
or when the following inverse exists,
In this case, , , , , , and
As in the proof of Theorem 1 (one-dimensional case), must satisfy
or
□
Proof of Theorem 1.
where . Note , or
Further, , and , or
where, by convention, is the matrix with -th entry .
For any two matrices and of the same dimension, denote . Let . It is not difficult to check that
We first simplify the penalty term . By property of RKHS, , thus , and , . Thus
Note that the inner product of the RKHS is often not the inner product used in the optimization objective, such as the one corresponding to the norm. Thus, the above expression of does not hold under the inner product .
Below we need to evaluate . For this, write for the i-th row of , and for the i-th column of . Then
and we get, since , and ,
From this we get
Rewrite (2) as
where . must satisfy
To solve the linear system (A3), we need to rewrite it in terms of vector forms and of and . For this, let be the vector representation of ; be that of . For , is a matrix with -th entry . Similarly, is a matrix with -th entry ; is a matrix with -th entry ; and is a matrix with -th entry .
Likewise, is a matrix with -th entry ; is a matrix with -th entry ; and is a matrix with -th entry .
Let the notation mean rearranging the elements of the matrix as a -vector in dictionary order, in terms of its -vector form. Thus,
where ; Similarly,
where ; and
where .
Likewise,
and
where .
Rewrite as
Let , be the matrix of 1’s, , and
Then (A1) is rewritten as
or when the following inverse exists,
□
Proof of Theorem 2.
we get, since ,
where is the i-th row of D. From this we get
or
or
Let be the solution of (A5).
In this case, is a d-vector and, similar to the proof of Theorem 1, we have . To evaluate , write , where is the j-th column of . Then , and
Now (3) is rewritten as
where , and must satisfy
To solve the linear system (A5), we need to rewrite it in terms of vector forms and of and . For this, let be the vector representation of ; let be that of .
Let the notation mean rearranging the elements of the matrix in terms of its vector form. As in the proof of Theorem 1,
where
Similarly,
where ;
and
Denote and its vector form ; let and its vector form ; since D is positive semidefinite, let be its eigenvalues and be its normalized eigenvectors, ; then . Rearranging the elements of in vector form as before,
where .
Then (A5) is rewritten as
or when the following inverse exists,
□
Proof of Theorem 3.
Note that
Note that (C3) implies and , so by Theorem 7.9 (or Corollary 7.10) in Ledoux and Talagrand [32], (a.s.), where . By (C4), (a.s.). Thus, (a.s.).
Let , , , , is the empirical distribution based on n iid samples from . Let
By (C5) and (C4) and the fact (a.s.),
In the above we used Theorem 7.9 (or Corollary 7.10) in Ledoux and Talagrand [32] again to get (a.s.).
Note that implies ; this, together with (C3), implies that has a unique (and finite) minimizer . We first prove (a.s.).
By definition of , (a.s.), and by (A7), (a.s.). Thus,
where is some bounded set of ’s, and we used the fact that is a Glivenko–Cantelli class on any bounded . Thus (a.s.).
On the other hand, since is the unique minimizer of , for every , there is , such that
Thus, by (A8) we must have that for all large n, (a.s.) for every . This gives (a.s.).
Note that , which is the minimizer of the conditional expectation , and is also the pointwise least squares “estimate” of itself under the objective functional , so by (C1), , (C2) implies is invertible, and so can be written in the form . Since also minimizes (over a larger space than that belongs to), we must have , and (a.s.) gives (a.s.). □
Proof of Theorem 4.
Recall the blockwise inversion formula
and for , .
By (C2) and (C3), for all large n, , , , and all exist (a.s.). Using the above blockwise inversion formulae, by Theorem 2, we get
In the proof of Theorem 3, we showed (a.s.), i.e., (a.s.). Further, similar to the proof of Theorem 3, we can get
Let and be the vector representations of and , then we have
Denote and , we first find the asymptotic distribution of . Denote , , and , then , and . By (C6),
It can be shown that the sequences and are Donsker classes, and so
where , is the vector form of and is the vector form of . From the above we get, as is symmetric,
Now, rewrite , with , and , where , and . Then , and by (A9) we get
where is a mean zero Gaussian process on T with covariance function . □
References
- Ullah, S.; Finch, C.F. Applications of functional data analysis: A systematic review. BMC Med. Res. Methodol. 2013, 13, 43. [Google Scholar] [CrossRef] [Green Version]
- Horváth, L.; Kokoszka, P. Inference for Functional Data with Applications; Springer: New York, NY, USA, 2012. [Google Scholar]
- Yuan, A.; Fang, H.B.; Li, H.; Wu, C.O.; Tan, M. Hypothesis Testing for Multiple Mean and Correlation Curves with Functional Data. Stat. Sin. 2020, 30, 1095–1116. [Google Scholar] [CrossRef]
- Lai, T.Y.; Zhang, Z.Z.; Wang, Y.F. Testing Independence and Goodness-of-Fit Jointly for Functional Linear Models. J. Korean Stat. Soc. 2021, 50, 380–402. [Google Scholar] [CrossRef]
- Ramsay, J.O.; Silverman, B.W. Functional Data Analysis; Springer: New York, NY, USA, 2005. [Google Scholar]
- Clarkson, D.B.; Fraley, C.; Gu, C.; Ramsay, J.O. S+ Functional Data Analysis; Springer: New York, NY, USA, 2005. [Google Scholar]
- Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis; Springer: New York, NY, USA, 2006. [Google Scholar]
- Zeger, S.L.; Diggle, P.J. Semiparametric Models for Longitudinal Data with Application to CD4 Cell Numbers in HIV Seroconverters. Biometrics 1994, 50, 689–699. [Google Scholar] [CrossRef] [PubMed]
- Lin, D.Y.; Ying, Z. Semiparametric and Nonparametric Regression Analysis of Longitudinal Data. J. Am. Stat. Assoc. 2001, 96, 103–126. [Google Scholar] [CrossRef]
- Fan, J.; Li, R. New Estimation and Model Selection Procedures for Semiparametric Modeling in Longitudinal Data Analysis. J. Am. Stat. Assoc. 2004, 99, 710–723. [Google Scholar] [CrossRef] [Green Version]
- Xue, L.; Zhu, L. Empirical Likelihood Semiparametric Regression Analysis for Longitudinal Data. Biometrika 2007, 94, 921–937. [Google Scholar] [CrossRef]
- Yuan, M.; Cai, T. A Reproducing Kernel Hilbert Space Approach to Functional Linear Regression. Ann. Stat. 2010, 38, 3412–3444. [Google Scholar] [CrossRef]
- Reiss, P.T.; Goldsmith, J.; Shang, H.L.; Ogden, R.T. Methods for Scalar-on-Function Regression. Int. Stat. Rev. 2017, 85, 228–249. [Google Scholar] [CrossRef]
- Chen, C.; Guo, S.J.; Qian, X.H. Functional Linear Regression: Dependence and Error Contamination. J. Bus. Econ. Stat. 2022, 40, 444–457. [Google Scholar] [CrossRef]
- Yao, F.; Müller, H.G.; Wang, J.L. Functional Linear Regression Analysis for Longitudinal Data. Ann. Stat. 2005, 33, 2873–2903. [Google Scholar] [CrossRef] [Green Version]
- Müller, H.G.; Yao, F. Functional Additive Models. J. Am. Stat. Assoc. 2008, 103, 1534–1544. [Google Scholar] [CrossRef]
- Krämer, N.; Boulesteix, A.L.; Tutz, G. Penalized Partial Least Squares with Applications to B-spline Transformations and Functional Data. Chemom. Intell. Lab. Syst. 2008, 94, 60–69. [Google Scholar] [CrossRef] [Green Version]
- Hayashi, K.; Hayashi, M.; Reich, B.; Lee, S.P.; Sachdeva, A.U.C.; Mizoguchi, I. Functional Data Analysis of Mandibular Movement Using Third-degree B-Spline Basis Functions and Self-modeling Regression. Orthod. Waves 2012, 71, 17–25. [Google Scholar] [CrossRef]
- Aguilera, A.M.; Aguilera-Morillo, M.C. Penalized PCA Approaches for B-spline Expansions of Smooth Functional Data. Appl. Math. Comput. 2013, 219, 7805–7819. [Google Scholar] [CrossRef]
- Berlinet, A.; Elamine, A.; Mas, A. Local Linear Regression for Functional Data. Ann. Inst. Stat. Math. 2011, 63, 1047–1075. [Google Scholar] [CrossRef] [Green Version]
- Abeidallah, M.; Mechab, B.; Merouan, T. Local Linear Estimate of the Point at High Risk: Spatial Functional Data Case. Commun. Stat. Theory Methods 2020, 49, 2561–2584. [Google Scholar] [CrossRef]
- Sara, L. Nonparametric Local Linear Regression Estimation for Censored Data and Functional Regressors. J. Korean Stat. Soc. 2020, 51, 1–22. [Google Scholar] [CrossRef]
- Lei, X.; Zhang, H. Non-asymptotic Optimal Prediction Error for RKHS-based Partially Functional Linear Models. arXiv 2020, arXiv:2009.04729. [Google Scholar]
- Fang, K.T.; Li, R.; Sudjianto, A. Design and Modeling for Computer Experiments; Chapman & Hall/CRC: New York, NY, USA, 2006. [Google Scholar]
- Lai, T.L.; Robbins, H.; Wei, C.Z. Strong Consistency of Least Squares Estimates in Multiple Regression. Proc. Natl. Acad. Sci. USA 1978, 75, 3034–3036. [Google Scholar] [CrossRef] [Green Version]
- Eicker, F. Asymptotic Normality and Consistency of the Least Squares Estimators for Families of Linear Regressions. Ann. Math. Stat. 1963, 34, 447–456. [Google Scholar] [CrossRef]
- Wahba, G. Spline Models for Observational Data; SIAM: Philadelphia, PA, USA, 1990. [Google Scholar]
- Gu, C. Smoothing Spline ANOVA Models; Springer: New York, NY, USA, 2002. [Google Scholar]
- Rapoport, A.P.; Aqui, N.A.; Stadtmauer, E.A.; Vogl, D.T.; Fang, H.B.; Cai, L.; Janofsky, S.; Chew, A.; Storek, J.; Gorgun, A.; et al. Combination immunotherapy using adoptive T-cell transfer and tumor antigen vaccination based on hTERT and survivin after ASCT for myeloma. Blood 2011, 117, 788–797. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Fang, H.B.; Wu, T.T.; Rapoport, A.P.; Tan, M. Survival Analysis with Functional Covariates Based on Partial Follow-up Studies. Stat. Methods Med. Res. 2016, 25, 2405–2419. [Google Scholar] [CrossRef]
- Zhang, H.; Jia, J. Elastic-net Regularized High-dimensional Negative Binomial Regression: Consistency and Weak Signals Detection. Stat. Sin. 2022, 32, 181–207. [Google Scholar] [CrossRef]
- Ledoux, M.; Talagrand, M. Probability in Banach Spaces; Springer: New York, NY, USA, 1991. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).