Partially Linear Generalized Single Index Models for Functional Data (PLGSIMF)

Alahiane, Mohamed; Ouassou, Idir; Rachdi, Mustapha; Vieu, Philippe

doi:10.3390/stats4040047

¹

Ecole Nationale des Sciences Appliquées, Université Cadi Ayyad, Marrakech 40 001, Morocco

²

Laboratoire AGEIS, UFR SHS, Université Grenoble Alpes, BP. 47, CEDEX 09, 38040 Grenoble, France

³

Institut de Mathématiques de Toulouse, Université Paul Sabatier, CEDEX 9, 31062 Toulouse, France

^*

Author to whom correspondence should be addressed.

Stats 2021, 4(4), 793-813; https://doi.org/10.3390/stats4040047

Submission received: 8 April 2021 / Revised: 24 July 2021 / Accepted: 27 July 2021 / Published: 27 September 2021

(This article belongs to the Special Issue Functional Data Analysis (FDA))

Abstract:

Single-index models are potentially important tools for multivariate non-parametric regression analysis. They generalize linear regression models by replacing the linear combination

α_{0}^{⊤} X

with a non-parametric component

η_{0} (α_{0}^{⊤} X)

, where

η_{0} (\cdot)

is an unknown univariate link function. In this article, we generalize these models to include a functional component, extending the generalized partially linear single-index model

η_{0} (α_{0}^{⊤} X) + β_{0}^{⊤} Z

, where

α_{0}

is a vector in

\mathbb{R}^{d}

,

η_{0} (\cdot)

and

β_{0} (\cdot)

are unknown functions that are to be estimated. We propose estimates of the unknown parameter

α_{0}

, the unknown functions

β_{0} (\cdot)

and

η_{0} (\cdot)

and establish their asymptotic distributions. Furthermore, a simulation study is carried out to evaluate the models and the effectiveness of the proposed estimation methodology.

Keywords:

asymptotic normality; functional data analysis (FDA); polynomial splines; quasi-likelihood; semi-parametric regression; single-index model

1. Introduction

Generalized linear models (GLMs) were proposed by Nelder and Wedderburn [1],

g (μ (X)) = β^{⊤} X

; for a detailed review, we refer the reader to McCullagh and Nelder [2]. A GLM consists of a random component and a systematic component. GLMs assume that the responses come from the exponential dispersion family. They extend linear models by relating the predictors to a function of the mean of a continuous or discrete response through a canonical link function. These models encounter several problems: the canonical link function is sometimes unknown, the relationship between the response and the predictors can be complex, and they suffer from the curse of dimensionality. To address these problems, several approaches have been developed. Hastie and Tibshirani [3] proposed generalized additive models (GAMs), in which the linear predictor depends linearly on smooth functions of the predictor variables; one criticism of these models is that they do not take interactions between covariates into account. The monographs of Wood [4] and Dunn and Smyth [5] are recent references dealing with these two classes of models.

Over the last few decades, the single-index model has been employed to reduce the dimensionality of the data and to avoid the “curse of dimensionality” while maintaining the advantages of non-parametric smoothing in multivariate regression; see, for example, the work of Lai et al. [6].

The single index

α^{⊤} X

aggregates the influence of the observed values

X = (X_{1}, \dots, X_{d})

of the explanatory variables into one number.

Examples of economic indices include stock indices, inflation indices, cost-of-living indices and price indices.

Furthermore, this idea was first extended to the functional setting by Ferraty, Vieu et al. [7] for functional regression problems, which led to the functional single index regression model (FSIRM). The functional index acts as a filter that extracts the part of the functional covariate explaining the scalar response Y, and it plays an important role in such a model.

In general, the predictor is not linear but complex, which prompted Carroll et al. [8] to propose the generalized partially linear single-index model (GPLSIM), estimated by local quasi-likelihood with kernel smoothing, where the function

η_{0}

is approximated by local linear methods:

g (μ (X, Z)) = η_{0} (α_{0}^{⊤} X) + β_{0}^{⊤} Z

. Li et al. [9] proposed a GPLSIM in which the unknown smooth function of the single index is approximated by a spline, expressed as a linear combination of B-spline basis functions, and fitted by a modified Fisher-scoring method:

g (μ (X, Z)) = β^{⊤} X + φ (α^{⊤} Z)

. Moreover, Wang and Cao [10] studied the GPLSIM by applying quasi-likelihood and polynomial spline smoothing:

g (μ (X, Z)) = η_{0} (α_{0}^{⊤} X) + β_{0}^{⊤} Z

.

In recent years, the analysis of functional data has made considerable progress in several areas, including image processing, biomedical studies, environmental sciences and public health. Several researchers have focused their efforts on studying this type of data. We mention the work of Aneiros-Pérez and Vieu [11], and for more details, we refer to the books of Horváth and Kokoszka [12], Ferraty and Vieu [7], Aneiros-Pérez and Vieu [13], and Ramsay and Silverman [14]. Yu, Du and Zhang [15] proposed the SIPFLM model, which combines the single-index model (SIM) with the functional linear model (FLM) and is estimated by least squares using a B-spline basis:

Y = g (α^{⊤} X) + \int_{T} β (t) Z (t) d t

. Jiang Du et al. [16] proposed the GFPLM model using the functional principal component analysis

g (μ (X, Z)) = α^{⊤} X + \int_{T} β (t) Z (t) d t

. Rachdi, Alahiane, Ouassou and Vieu presented a generalization of the GPLSIM model at the IWFOS 2020 conference, published as a book chapter; see Rachdi et al. [17]. Our objective is to combine the GPLSIM with the SIPFLM and to consider the following generalized partially functional single-index model, called PLGSIMF, estimated using a B-spline expansion and the quasi-likelihood function:

g (μ (X, Z)) = η_{0} (α^{⊤} X) + \int_{0}^{1} β (t) Z (t) d t

in order to handle interaction effects, mitigate the curse of dimensionality and take functional random variables into account.

The paper is organized as follows. In Section 1 and Section 2, we situate our model in the literature and present the Fisher-scoring update algorithm used to estimate the single-index vector, the non-parametric function and the slope function. In Section 3, we study the asymptotic properties of the estimators. Numerical simulations in the Gaussian and logistic cases are presented in Section 4, and an application to the Tecator data is given in Section 5. The proofs of the results are outlined in Section 6, and Appendix A contains the technical lemmas needed for the asymptotic study of the non-parametric function, the single-index vector and the slope function.

Let H be a separable Hilbert space, which is endowed with the scalar product

\langle \cdot, \cdot \rangle_{H}

and the norm

\| \cdot \|_{H}

. Let Y be a scalar response variable and

(X, Z) \in \mathbb{R}^{d} \times H

be the predictor vector where

X = (X_{1}, \dots, X_{d})

and Z is a functional random variable taking values in H. For a fixed

(x, z) \in \mathbb{R}^{d} \times H

, we assume that the conditional density function of the response Y given

(X, Z) = (x, z)

belongs to the following canonical exponential family:

f_{Y | X = x, Z = z} (y) = exp (y ξ (x, z) - B (ξ (x, z)) + C (y)),

(1)

where B and C are two known functions that are defined from

\mathbb{R}

into

\mathbb{R}

, and

ξ : \mathbb{R}^{d} \times H ⟶ \mathbb{R}

is the canonical parameter of the generalized linear model, which is linked to the conditional mean of the response by

μ (x, z) = E [Y | X = x, Z = z] = B^{'} (ξ (x, z)),

(2)

where

B^{'}

denotes the first derivative of the function B. In what follows, we model the transformed mean

g (μ (x, z))

by a generalized single-index partially functional linear model:

g (μ (x, z)) = η_{0} (α^{⊤} x) + \int_{0}^{1} β (t) z (t) d t,

(3)

where

α = (α_{1}, α_{2}, \dots, α_{d}) \in \mathbb{R}^{d}

is the d-dimensional single-index coefficient vector,

β

is the coefficient function in the functional component, and

η_{0}

is the unknown single-index link function which will be assumed to be sufficiently smooth.

If the conditional variance

V a r (Y | X = x, Z = z) = σ^{2} V (μ (x, z))

, where V is an unknown positive function, then the estimation of the mean function

g (μ)

may be obtained by replacing the log-likelihood

\log f_{Y | X = x, Z = z}

given by (1), by the quasi-likelihood

Q (u, v)

given by

\frac{\partial Q (u, v)}{\partial u} = \frac{v - u}{σ^{2} V (u)},

for any real numbers u and v, which may be written as

Q (u, v) = \int_{v}^{u} \frac{v - t}{σ^{2} V (t)} d t

.
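As an illustration only, the integral form of the quasi-likelihood can be evaluated numerically and checked against closed forms. The sketch below is not part of the original study: the function name `quasi_likelihood`, the use of Simpson's rule and the two variance functions (V(t) = 1 and V(t) = t) are assumptions chosen for the example.

```python
import math


def quasi_likelihood(u, v, variance, sigma2=1.0, n_steps=2000):
    """Approximate Q(u, v) = int_v^u (v - t) / (sigma2 * V(t)) dt by Simpson's rule."""
    if n_steps % 2:
        n_steps += 1  # Simpson's rule needs an even number of sub-intervals
    h = (u - v) / n_steps  # negative when u < v, which orients the integral correctly

    def integrand(t):
        return (v - t) / (sigma2 * variance(t))

    total = integrand(v) + integrand(u)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * integrand(v + k * h)
    return total * h / 3.0


# Constant variance V(t) = 1: closed form -(v - u)^2 / (2 * sigma2).
q_gauss = quasi_likelihood(0.3, 1.2, lambda t: 1.0, sigma2=0.5)

# Variance V(t) = t with sigma2 = 1: closed form v * log(u / v) - (u - v).
q_pois = quasi_likelihood(2.5, 4.0, lambda t: t)
```

With a constant variance function, the quasi-likelihood reduces to a negative squared error, which is why the identity-link case behaves like least squares.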

2. Estimation Methodology

Let

{(X_{i}, Y_{i}, Z_{i})}_{i = 1, \dots, n}

be a sample of independent and identically distributed (i.i.d.) copies of

(X, Y, Z)

and, for each

i = 1, \dots, n

,

g (μ (X_{i}, Z_{i})) = η_{0} (α^{⊤} X_{i}) + \int_{0}^{1} β (t) Z_{i} (t) d t .

(4)

We assume that the function

η_{0}

is supported within the interval

[a, b]

where

a = inf (α^{⊤} X)

and

b = sup (α^{⊤} X)

.

We introduce a sequence of knots

(k_{m})

in the interval

[a, b]

, with J interior knots, such that

k_{- r + 1} = \dots = k_{- 1} = k_{0} = a < k_{1} < \dots < k_{J} < b = k_{J + 1} = \dots = k_{J + r}

, where

J : = J_{n}

is a sequence of integers which increases with the sample size n. Now, let

N_{n} = J_{n} + r

be the number of B-spline basis functions,

{(B_{j} (u))}_{j = 1, \dots, N_{n}}

be the B-spline basis functions of order r, and

h = (b - a) / (J_{n} + 1)

be the distance between neighboring knots.

Let

S_{n}

be the space of polynomial splines on

[a, b]

of order

r \geq 1

. By De Boor [18], we can approximate

η_{0},

assumed in

H (p)

(which will be defined in Section 3) by a function

\tilde{η} \in S_{n}

. So, we can write

\tilde{η} (u) = {\tilde{γ}}^{⊤} B_{1} (u)

where

B_{1} (u)

is the vector of spline basis functions and

\tilde{γ} \in \mathbb{R}^{N_{n}}

is the spline coefficient vector.
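To make the spline approximation concrete, the following sketch evaluates a B-spline basis of order r on a clamped, equally spaced knot sequence using the Cox-de Boor recursion. It is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `clamped_knots`, `bspline_basis` and `spline_value` are hypothetical, and the interior knots are taken equally spaced with step h = (b - a) / (J + 1).

```python
def clamped_knots(a, b, n_interior, order):
    """Knot vector with `order` repeated boundary knots and equally spaced interior knots."""
    h = (b - a) / (n_interior + 1)
    interior = [a + j * h for j in range(1, n_interior + 1)]
    return [a] * order + interior + [b] * order


def bspline_basis(u, knots, order):
    """Values at u of all B-spline basis functions of the given order (Cox-de Boor)."""
    n_intervals = len(knots) - 1
    # Order-1 basis: indicator of the half-open knot interval containing u.
    values = [1.0 if knots[i] <= u < knots[i + 1] else 0.0 for i in range(n_intervals)]
    if u == knots[-1]:
        # Close the last non-empty interval so that the right endpoint is covered.
        for i in range(n_intervals - 1, -1, -1):
            if knots[i] < knots[i + 1]:
                values[i] = 1.0
                break
    for k in range(2, order + 1):
        new_values = []
        for i in range(len(knots) - k):
            left_span = knots[i + k - 1] - knots[i]
            right_span = knots[i + k] - knots[i + 1]
            left = (u - knots[i]) / left_span * values[i] if left_span > 0 else 0.0
            right = (knots[i + k] - u) / right_span * values[i + 1] if right_span > 0 else 0.0
            new_values.append(left + right)
        values = new_values
    return values  # length J + order


def spline_value(u, coefficients, knots, order):
    """Evaluate gamma^T B(u) for a coefficient vector gamma."""
    return sum(c * b for c, b in zip(coefficients, bspline_basis(u, knots, order)))


# Example: J = 3 interior knots and order r = 4 (cubic splines) on [0, 1].
knots = clamped_knots(0.0, 1.0, 3, 4)
basis_at_half = bspline_basis(0.5, knots, 4)
```

The basis has J + r elements and sums to one at every point of [a, b], so a constant coefficient vector reproduces a constant function.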

We introduce a second knot sequence

0 = t_{0} < t_{1} < \dots < t_{k + 1} = 1

of

[0, 1]

. Then, there exist

N^{'} = k + r + 1

functions in the B-splines basis which are normalized and of order r, such that

β (\cdot) \approx δ^{⊤} B_{2} (\cdot), \quad \text{where} \quad B_{2} (\cdot) = {(B_{21} (\cdot), B_{22} (\cdot), \dots, B_{2 N^{'}} (\cdot))}^{⊤} \quad \text{and} \quad δ \in \mathbb{R}^{N^{'}} .

By setting

W = (\int_{0}^{1} Z (t) B_{21} (t) d t, \dots, \int_{0}^{1} Z (t) B_{2 N^{'}} (t) d t),

(5)

with w and

W_{i}

defined analogously from z and Z_{i} through (5), the mean function estimator

\hat{μ} (x, z)

is obtained by estimating the parameter

θ = {(α^{⊤}, γ^{⊤}, δ^{⊤})}^{⊤}

and then inverting the following equation

g (\hat{μ} (x, z)) = {\hat{γ}}^{⊤} B_{1} ({\hat{α}}^{⊤} x) + {\hat{δ}}^{⊤} w .
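The projection in (5) turns each curve into a finite vector of integrals against the basis functions. A rough numerical sketch follows; it assumes the curve is available as a Python function on [0, 1], replaces the basis B_2 by piecewise-linear hat functions (order-2 B-splines on uniform knots) purely for brevity, and uses the trapezoid rule. The names `hat_basis` and `project_curve` are hypothetical.

```python
def hat_basis(n_nodes):
    """Stand-in for B_2: piecewise-linear hat functions on a uniform grid of [0, 1]."""
    h = 1.0 / (n_nodes - 1)

    def make(j):
        return lambda t: max(0.0, 1.0 - abs(t - j * h) / h)

    return [make(j) for j in range(n_nodes)]


def project_curve(z, basis, n_grid=1000):
    """Approximate W_j = int_0^1 z(t) B_2j(t) dt for every basis function (trapezoid rule)."""
    step = 1.0 / n_grid
    w = []
    for b in basis:
        total = 0.0
        for k in range(n_grid + 1):
            t = k * step
            weight = 0.5 if k in (0, n_grid) else 1.0  # end points get half weight
            total += weight * z(t) * b(t)
        w.append(total * step)
    return w


basis = hat_basis(5)
w_const = project_curve(lambda t: 1.0, basis)   # projection of the constant curve
w_linear = project_curve(lambda t: t, basis)    # projection of the curve z(t) = t

# The functional term is then approximated by the inner product delta^T W.
delta = [1.0, 1.0, 1.0, 1.0, 1.0]
linear_term = sum(d * w for d, w in zip(delta, w_linear))
```

Because the hat functions sum to one, taking all coefficients equal to one recovers the plain integral of the curve, which gives a quick sanity check of the projection.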

Notice that the parameter

θ = {(α^{⊤}, γ^{⊤}, δ^{⊤})}^{⊤}

is determined by maximizing the following quasi-likelihood criterion

\hat{θ} = {({\hat{α}}^{⊤}, {\hat{γ}}^{⊤}, {\hat{δ}}^{⊤})}^{⊤} = \underset{θ = (α, γ, δ) \in \mathbb{R}^{d} \times \mathbb{R}^{N_{n}} \times \mathbb{R}^{N^{'}}}{\arg\max} \; l (θ),

where

l (θ) : = l (α, γ, δ) = \frac{1}{n} \sum_{i = 1}^{n} Q (g^{- 1} (m_{i}), Y_{i})

, with

m (x, z) = γ^{⊤} B_{1} (α^{⊤} x) + δ^{⊤} w,

m_{i} : = γ^{⊤} B_{1} (α^{⊤} X_{i}) + δ^{⊤} W_{i} \quad \text{and} \quad m_{0 i} = γ_{0}^{⊤} B_{1} (U_{0 i}) + δ_{0}^{⊤} W_{i},

where

U_{0 i} = α_{0}^{⊤} X_{i}

with

α_{0}

,

γ_{0}

,

δ_{0}

,

η_{0}

,

β_{0}

denoting the true values, respectively, of

α

,

γ

,

δ

,

η

and

β

.

To handle the constraints

∥ α ∥ = 1

and

α_{1} > 0

of the d-dimensional index

α

, we proceed by a re-parameterization, which is similar to Yu and Ruppert [19]

α (τ) = {(\sqrt{1 - {∥ τ ∥}^{2}}, τ^{⊤})}^{⊤} \quad \text{for} \quad τ \in \mathbb{R}^{d - 1} .

The true value

τ_{0}

of

τ

, must satisfy

∥τ_{0}∥ \leq 1

. Then, we assume that

∥τ_{0}∥ < 1

. The jacobian matrix of

α : τ \to α (τ)

of dimension

d \times (d - 1)

is

J (τ)

. Notice that

τ

is unconstrained and is one dimension lower than

α

.
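The re-parameterization and its Jacobian can be written down explicitly. In the sketch below, the closed form of J(τ) (first row equal to -τ^⊤ / sqrt(1 - ∥τ∥^2), followed by the identity matrix of size d - 1) is obtained by differentiating α(τ) and is not quoted from the text; the function names `alpha_of_tau` and `jacobian` are hypothetical.

```python
import math


def alpha_of_tau(tau):
    """Map the unconstrained tau in R^(d-1), ||tau|| < 1, to the unit vector alpha(tau)."""
    s = sum(t * t for t in tau)
    if s >= 1.0:
        raise ValueError("tau must satisfy ||tau|| < 1")
    return [math.sqrt(1.0 - s)] + list(tau)


def jacobian(tau):
    """d x (d-1) Jacobian of tau -> alpha(tau), returned as a list of rows."""
    s = sum(t * t for t in tau)
    if s >= 1.0:
        raise ValueError("tau must satisfy ||tau|| < 1")
    root = math.sqrt(1.0 - s)
    size = len(tau)
    rows = [[-t / root for t in tau]]  # derivative of sqrt(1 - ||tau||^2)
    for i in range(size):
        rows.append([1.0 if i == j else 0.0 for j in range(size)])  # identity block
    return rows


tau = [0.3, -0.4]
alpha = alpha_of_tau(tau)   # unit norm with a positive first component
J = jacobian(tau)           # 3 x 2 matrix
```

By construction, α(τ) has unit norm and a positive first component, so both constraints hold automatically while τ ranges freely over the open unit ball.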

Finally, let

R (τ) = \begin{pmatrix} J (τ) & 0 \\ 0 & I_{N^{'} \times N^{'}} \end{pmatrix}

be the Jacobian matrix of

{(α {(τ)}^{⊤}, δ^{⊤})}^{⊤}

, which is of dimension

(d + N^{'}) \times (d + N^{'} - 1)

. Let

(\tilde{α}, \tilde{δ}) = \underset{(α, δ) \in \mathbb{R}^{d} \times \mathbb{R}^{N^{'}}, \, τ \in \mathbb{R}^{d - 1}}{\arg\max} \frac{1}{n} \sum_{i = 1}^{n} Q (g^{- 1} \{\tilde{η} (α^{⊤} (τ) X_{i}) + δ^{⊤} W_{i}\}, Y_{i}) .

Denote

\begin{aligned} T_{i} &= {(X_{i}^{⊤}, W_{i}^{⊤})}^{⊤}, \\ m_{i} &= γ^{⊤} B_{1} (α^{⊤} X_{i}) + δ^{⊤} W_{i}, \\ m_{0 i} &= m_{0 i} (X_{i}, W_{i}) = γ_{0}^{⊤} B_{1} (U_{0 i}) + δ_{0}^{⊤} W_{i} \quad \text{with } U_{0 i} = α_{0}^{⊤} X_{i}, \\ m_{0} (T) &= γ_{0}^{⊤} B_{1} (U_{0}) + δ_{0}^{⊤} W \quad \text{with } U_{0} = α_{0}^{⊤} X, \end{aligned}

and

(\tilde{τ}, \tilde{δ}) = \underset{τ, δ}{arg max} \tilde{l} (τ, δ)

where

\tilde{l} (τ, δ) = \frac{1}{n} \sum_{i = 1}^{n} Q (g^{- 1} \{\tilde{η} (α {(τ)}^{⊤} X_{i}) + δ^{⊤} W_{i}\}, Y_{i})

. Note that

θ_{τ} = {(τ^{⊤}, γ^{⊤}, δ^{⊤})}^{⊤}

is a

(d - 1) + N_{n} + N^{'}

-dimensional parameter, while

θ

is a

d + N_{n} + N^{'}

-dimensional one. Let

ρ_{l} (m) = \frac{1}{σ^{2} V (g^{- 1} (m))} {[\frac{d}{d m} (g^{- 1} (m))]}^{l}

and denote

q_{l} (m, y) = \frac{\partial^{l}}{\partial m^{l}} Q (g^{- 1} (m), y), for l = 1, 2 .

Then,

q_{1} (m, y) = (y - g^{- 1} (m)) ρ_{1} (m) and q_{2} (m, y) = (y - g^{- 1} (m)) ρ_{1}^{'} (m) - ρ_{2} (m) .
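For completeness, the two identities above follow from the chain rule applied to the definition of the quasi-likelihood. The short derivation below uses only the quantities already defined (Q, g^{-1}, V, ρ_1 and ρ_2) and is included as a reading aid.

```latex
\begin{aligned}
q_{1}(m, y)
  &= \frac{\partial Q(u, y)}{\partial u}\Big|_{u = g^{-1}(m)} \cdot \frac{d}{dm} g^{-1}(m)
   = \frac{y - g^{-1}(m)}{\sigma^{2} V(g^{-1}(m))} \cdot \frac{d}{dm} g^{-1}(m)
   = (y - g^{-1}(m))\, \rho_{1}(m), \\
q_{2}(m, y)
  &= \frac{\partial}{\partial m} q_{1}(m, y)
   = (y - g^{-1}(m))\, \rho_{1}^{'}(m) - \frac{d}{dm} g^{-1}(m)\, \rho_{1}(m)
   = (y - g^{-1}(m))\, \rho_{1}^{'}(m) - \rho_{2}(m),
\end{aligned}
```

The last equality uses \rho_{2}(m) = \rho_{1}(m) \, \frac{d}{dm} g^{-1}(m), which is immediate from the definition of \rho_{l}.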

So,

l (θ_{τ})

becomes

l (θ_{τ}) = \frac{1}{n} \sum_{i = 1}^{n} Q (g^{- 1} \{γ^{⊤} B_{1} (α^{⊤} (τ) X_{i}) + δ^{⊤} W_{i}\}, Y_{i}) = \frac{1}{n} \sum_{i = 1}^{n} Q (g^{- 1} \{m_{i}\}, Y_{i})

The score vector is then

S (θ_{τ}) = \frac{\partial l}{\partial θ_{τ}} (θ_{τ}) = \frac{1}{n} \sum_{i = 1}^{n} q_{1} (m_{i}, Y_{i}) ξ_{i} (τ, γ, δ),

where

ξ_{i} (τ, γ, δ) = (\begin{matrix} γ^{⊤} B_{1}^{'} (α^{⊤} (τ) X_{i}) J^{⊤} (τ) X_{i} \\ B_{1} (α^{⊤} (τ) X_{i}) \\ W_{i} \end{matrix}) .

The expectation of the Hessian matrix of l, taken conditionally on the covariates, is

H (θ_{τ}) = E [\frac{\partial^{2} l (θ_{τ})}{\partial θ_{τ} \partial θ_{τ}^{⊤}}] = - \frac{1}{n} \sum_{i = 1}^{n} ρ_{2} (m_{i}) ξ_{i} (τ, γ, δ) ξ_{i}^{⊤} (τ, γ, δ) .

The Fisher scoring update equation

θ_{τ}^{(k + 1)} = θ_{τ}^{(k)} - {[H (θ_{τ}^{(k)})]}^{- 1} S (θ_{τ}^{(k)})

then becomes

\begin{aligned} θ_{τ}^{(k + 1)} = θ_{τ}^{(k)} &+ {[\sum_{i = 1}^{n} ρ_{2} (m_{i}^{(k)}) ξ_{i} (τ^{(k)}, γ^{(k)}, δ^{(k)}) ξ_{i}^{⊤} (τ^{(k)}, γ^{(k)}, δ^{(k)})]}^{- 1} \\ &\times [\sum_{i = 1}^{n} (Y_{i} - μ_{i}^{(k)}) ρ_{1} (m_{i}^{(k)}) ξ_{i} (τ^{(k)}, γ^{(k)}, δ^{(k)})], \end{aligned}

where

m_{i}^{(k)} = γ^{(k) ⊤} B_{1} (α^{⊤} (τ^{(k)}) X_{i}) + δ^{(k) ⊤} W_{i}

, for

1 \leq i \leq n

and

μ_{i}^{(k)} = g^{- 1} (m_{i}^{(k)})

.
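The structure of the update can be illustrated on a deliberately simplified case. The sketch below assumes a two-dimensional parameter and a vector ξ_i that does not depend on the parameter (so that m_i = θ^⊤ ξ_i), with a logit link, V(μ) = μ(1 - μ) and σ^2 = 1; in that case ρ_1 = 1 and ρ_2 = μ(1 - μ). It is not the full PLGSIMF algorithm, in which ξ_i changes with (τ, γ, δ) at every iteration, and the toy data and function names are hypothetical.

```python
import math


def sigmoid(m):
    return 1.0 / (1.0 + math.exp(-m))


def score_and_information(theta, xi, y, g_inv, rho1, rho2):
    """Return sum_i (Y_i - mu_i) rho_1(m_i) xi_i and sum_i rho_2(m_i) xi_i xi_i^T."""
    s = [0.0, 0.0]
    a = [[0.0, 0.0], [0.0, 0.0]]
    for x, yi in zip(xi, y):
        m = theta[0] * x[0] + theta[1] * x[1]
        r = (yi - g_inv(m)) * rho1(m)
        w = rho2(m)
        for p in range(2):
            s[p] += r * x[p]
            for q in range(2):
                a[p][q] += w * x[p] * x[q]
    return s, a


def fisher_scoring(xi, y, g_inv, rho1, rho2, theta0=(0.0, 0.0), n_iter=25):
    """Iterate theta <- theta + A^{-1} s for a two-dimensional parameter."""
    theta = list(theta0)
    for _ in range(n_iter):
        s, a = score_and_information(theta, xi, y, g_inv, rho1, rho2)
        det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
        # Explicit 2 x 2 inverse applied to the score vector.
        theta[0] += (a[1][1] * s[0] - a[0][1] * s[1]) / det
        theta[1] += (a[0][0] * s[1] - a[1][0] * s[0]) / det
    return theta


# Toy data (hypothetical): xi_i = (1, x_i) and binary responses.
x_values = [-2, -1, 0, 1, 2, -2, -1, 0, 1, 2]
y = [0, 0, 0, 1, 1, 0, 1, 1, 1, 0]
xi = [(1.0, float(x)) for x in x_values]
rho1 = lambda m: 1.0                                # V(mu) cancels d mu / d m for the logit link
rho2 = lambda m: sigmoid(m) * (1.0 - sigmoid(m))
theta_hat = fisher_scoring(xi, y, sigmoid, rho1, rho2)
```

At convergence the score vector vanishes, which is the stopping criterion one would monitor in practice instead of a fixed number of iterations.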

It follows that, with k denoting the final iteration,

\begin{aligned} \hat{β} (t) &= {\hat{δ}}^{⊤} B_{2} (t) = δ^{(k) ⊤} B_{2} (t), \qquad \hat{η} (t) = {\hat{γ}}^{⊤} B_{1} (t) = γ^{(k) ⊤} B_{1} (t), \\ {\hat{m}}_{i} &= {\hat{γ}}^{⊤} B_{1} (α^{⊤} (\hat{τ}) X_{i}) + {\hat{δ}}^{⊤} W_{i} = γ^{(k) ⊤} B_{1} (α^{⊤} (τ^{(k)}) X_{i}) + δ^{(k) ⊤} W_{i}, \end{aligned}

where

{\hat{μ}}_{i} = g^{- 1} ({\hat{m}}_{i})

, and

\hat{α} = α (τ^{(k)})

is the estimator of the single-index coefficient vector of the PLGSIMF model.

3. Some Asymptotics

We present asymptotic properties of the estimators of the non-parametric component, the single-index coefficient vector and the slope function of the PLGSIMF model. To this end, we need some assumptions.

3.1. Some Additional Notions and Assumptions

Let

φ

,

φ_{1}

and

φ_{2}

be measurable functions on

[a, b]

. We define the empirical inner product

{〈φ_{1}, φ_{2}〉}_{n}

and its corresponding norm

{∥ φ ∥}_{n}

as follows

{〈φ_{1}, φ_{2}〉}_{n} = \frac{1}{n} \sum_{i = 1}^{n} φ_{1} (U_{i}) φ_{2} (U_{i}) \quad \text{and} \quad {∥ φ ∥}_{n}^{2} = \frac{1}{n} \sum_{i = 1}^{n} φ^{2} (U_{i}), \quad \text{where } U_{i} = α^{⊤} X_{i} .

If

φ

,

φ_{1}

and

φ_{2}

are

L^{2}

-integrable, we define the theoretical inner product and its corresponding norm as follows

〈φ_{1}, φ_{2}〉 = E [φ_{1} (U) φ_{2} (U)] \quad \text{and} \quad {∥ φ ∥}_{2}^{2} = E [φ^{2} (U)] = \int_{a}^{b} φ^{2} (u) f (u) d u .

Let

v \in \mathbb{N}^{*}

and

e \in (0, 1]

such that

p = v + e > 1.5

. We denote by

H (p)

the collection of functions g, which are defined on

[a, b]

whose v-th order derivative,

g^{(v)}

, exists and satisfies the following e-th order Lipschitz condition

|g^{(v)} (m^{'}) - g^{(v)} (m)| \leq C {|m^{'} - m|}^{e}, for all a \leq m, m^{'} \leq b .

Let

ε = Y - g^{- 1} (m_{0} (T))

where

T = {(X^{⊤}, W^{⊤})}^{⊤}

.

(C1) The single-index link function

η_{0} \in H (p)

, where

H (p)

is defined as above.

(C2) For all

m \in \mathbb{R}

and for all y in the range of the response variable Y, the function

q_{2} (m, y)

is strictly negative, and for

k = 1, 2

, there exist some positive constants

c_{q}

and

C_{q}

such that

c_{q} < |q_{2}^{k} (m, y)| < C_{q} .

(C3) The marginal density function of

α^{⊤} X

is continuous and bounded away from zero and infinity on its support

[a, b]

. The v-th order partial derivatives of the joint density function of X satisfy the Lipschitz condition of order

α

(

α \in (0, 1]

).

(C4) For any vector

τ

, there exist positive constants

c_{τ}

and

C_{τ}

, such that

c_{τ} I_{t \times t} \leq E [(\begin{matrix} 1 \\ T \end{matrix}) {(\begin{matrix} 1 \\ T \end{matrix})}^{⊤} | α^{⊤} (τ) X = α^{⊤} (τ) x] \leq C_{τ} I_{t \times t},

where

t = 1 + d + N^{'}

and

T = {(X^{⊤}, W^{⊤})}^{⊤}

.

(C5) The number of knots

N_{n}

satisfies

n^{\frac{1}{2 (p + 1)}} ≪ N_{n} ≪ n^{\frac{1}{8}}

, for

p > 3

.

(C6) The fourth order moment of the random variable Z is finite, i.e.,

E {∥ Z (\cdot) ∥}^{4} \leq C

, where C denotes a generic positive constant.

(C7) The covariance function

K (t, s) = Cov (Z (t), Z (s))

is positive definite.

(C8) The slope function

β

is an r-th order continuously differentiable function, i.e.,

β \in C^{r} ([0, 1]) .

(C9) For some finite positive constants

C_{ρ}

,

C_{ρ}^{*}

and

M_{0}

|ρ_{1} (m_{0})| \leq C_{ρ} and |ρ_{1} (m) - ρ_{1} (m_{0})| \leq C_{ρ}^{*} |m - m_{0}| for all |m - m_{0}| \leq M_{0} .

(C10) For some finite positive constants

C_{g}

,

C_{g}^{*}

and

M_{1}

, the link function g, in the model (3), satisfies:

|\frac{d}{d m} g^{- 1} (m) |_{m = m_{0}}| \leq C_{g}

and, for all

|m - m_{0}| \leq M_{1}

,

|\frac{d}{d m} g^{- 1} (m) - \frac{d}{d m} g^{- 1} (m) |_{m = m_{0}}| \leq C_{g}^{*} |m - m_{0}| .

(C11) There exists a positive constant

C_{0}

, such that

E (ϵ^{2} | U_{τ, 0}) \leq C_{0}, where ϵ = Y - g^{- 1} (m_{0} (T))

.

3.2. Consistency of the Estimators

Next we formulate several assertions on the considered estimators.

3.3. Estimation of the Nonparametric Component

The following theorem states the convergence, with rates, of the estimator

\hat{η} .

Theorem 1.

Under assumptions

(C_{1}) - (C_{8})

, we have

{∥\hat{η} - η_{0}∥}_{2} = O_{I P} \{\sqrt{N_{n}} (\frac{1}{\sqrt{n} h} + h^{p})\} and {∥\hat{η} - η_{0}∥}_{n} = O_{I P} \{\sqrt{N_{n}} (\frac{1}{\sqrt{n} h} + h^{p})\},

where

O_{I P}

denotes Landau's big-O in probability.

Proof of Theorem 1.

The proof of the previous theorem is given in the Appendix A. □

3.4. Estimation of the Slope Function

Theorem 2.

Under assumptions

(C_{1}) - (C_{8})

, and

k \sim n^{1 / (2 r + 1)}

, we have

∥ \hat{β} (\cdot) - β_{0} {(\cdot) ∥}^{2} = O_{I P} (N_{n}^{2} {(h^{p} + \frac{1}{\sqrt{n h}})}^{2}) + O_{I P} (n^{- 2 r / (2 r + 1)}) .

Proof of Theorem 2.

The proof of the previous theorem is given in the Appendix A. □

3.5. Estimation of the Parametric Components

The next theorem shows that the maximum quasi-likelihood estimator is root-n consistent and is asymptotically normal, although the convergence rate of the non-parametric component

\hat{η}

is slower than root-n. Before stating the theorem, let us denote

\begin{aligned} Υ (u_{τ, 0}) &= \frac{E [X ρ_{2} (m_{0} (T)) | U_{τ, 0} = u_{τ, 0}]}{E [ρ_{2} (m_{0} (T)) | U_{τ, 0} = u_{τ, 0}]}, \qquad Γ (u_{τ, 0}) = \frac{E [W ρ_{2} (m_{0} (T)) | U_{τ, 0} = u_{τ, 0}]}{E [ρ_{2} (m_{0} (T)) | U_{τ, 0} = u_{τ, 0}]}, \\ Φ (x) &= Φ (U_{τ, 0}, x) = x - Υ (u_{τ, 0}) \quad \text{and} \quad Ψ (w) = Ψ (U_{τ, 0}, w) = w - Γ (u_{τ, 0}) . \end{aligned}

Theorem 3.

Under assumptions

(C_{1}) - (C_{11})

, the constrained quasi-likelihood estimators

\hat{α}

and

\hat{δ}

with

∥ \hat{α} ∥_{d} = 1

are jointly asymptotically normally distributed, i.e.,

\sqrt{n} (\begin{matrix} \hat{α} - α_{0} \\ \hat{δ} - δ_{0} \end{matrix}) \overset{D}{⟶} N (0, R (τ_{0}) D^{- 1} R^{⊤} (τ_{0})),

where

\overset{D}{\to}

denotes the convergence in distribution, and

D = E [ρ_{2} (m_{0} (T)) (\begin{matrix} η_{0}^{'} (U_{τ, 0}) J^{⊤} (τ_{0}) Φ (X) \\ Ψ (W) \end{matrix}) {(\begin{matrix} η_{0}^{'} (U_{τ, 0}) J^{⊤} (τ_{0}) Φ (X) \\ Ψ (W) \end{matrix})}^{⊤}],

and

R (τ) = (\begin{matrix} J (τ) & 0 \\ 0 & I_{N^{'} \times N^{'}} \end{matrix}) .

Proof of Theorem 3.

The proof of the previous theorem is given in the Appendix A. □

Comments on the Assumptions

The smoothness condition (C1) ensures that the single-index function

η_{0} (\cdot)

can be approximated by functions in the B-spline space with a normalized basis. Condition (C2) ensures the uniqueness of the solution, whereas condition (C3) is a smoothness condition on the joint and marginal density functions of

α^{⊤} X

and X. Condition (C5) controls the rate of growth of the dimension of the spline spaces relative to the sample size. Conditions (C6) and (C7) are required for the functional covariate Z, and (C8) is a smoothness condition on the slope function. Conditions (C4) and (C9)–(C11) are technical conditions that are used in the proofs of the theorems stated in this article.

In this paper, we thus introduce a new generalized functional partially linear single-index model based on polynomial spline smoothing. The asymptotic properties of the resulting estimators are established under certain regularity assumptions, where the non-parametric component

η

and the slope function

β

are approximated by B-spline functions. Finally, we give some simulations to illustrate our results.

4. A Numerical Study

We conduct a simulation study in order to show our results’ effectiveness. We will treat two main cases of link functions: the identity and the logit link functions.

Recall that if the density of Y given

X = x

and

Z = z

is

f_{Y | X = x, Z = z} (y) = exp (y ξ (x, z) - B (ξ (x, z)) + C (y)),

then the mean and variance functions are given by

μ (x, z) = E [Y | X = x, Z = z] = B^{'} (ξ (x, z)) \quad \text{and} \quad V (μ (x, z)) = \frac{B^{''} (ξ (x, z))}{σ^{2}} .

4.1. Case 1: Identity Link Function

We consider the case where the link function is the identity and the model

Y_{i} = sin \{\frac{π (α^{⊤} X_{i} - A)}{B - A}\} + \int_{0}^{1} β (t) Z_{i} (t) d t + ε_{i} for i = 1, \dots, n .

(6)

The responses

Y_{i}

are simulated according to the Equation (6),

X_{i}

are taken uniformly over the interval

[- 0.5, 0.5]

, whereas the errors are normally distributed with mean 0 and variance

0.01

,

ε_{i} \sim N (0, 0.01)

. Moreover, we take the following coefficients

α = \frac{1}{\sqrt{3}} {(1, 1, 1)}^{⊤}, A = \frac{\sqrt{3}}{2} - \frac{1.645}{\sqrt{12}} and B = \frac{\sqrt{3}}{2} + \frac{1.645}{\sqrt{12}} .

The function

β (\cdot)

and

Z_{i} (\cdot)

are given by

β (t) = \sqrt{2} sin (\frac{π t}{2}) + 3 \sqrt{2} sin (\frac{3 π t}{2}) and Z_{i} (t) = \sum_{j = 1}^{50} ξ_{j} v_{j} (t)

where

v_{j} (t) = \sqrt{2} sin ((j - 0.5) π t)

,

ξ_{j} \sim N (0, λ_{j})

and

λ_{j} = {((j - 0.5) π)}^{- 2}

.
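The simulation design for the functional covariate can be sketched as follows. The code draws the scores ξ_j, builds one curve Z_i from the 50 basis functions v_j and computes the functional term by the trapezoid rule. Since β = v_1 + 3 v_2 and the v_j are orthonormal on [0, 1], the integral should equal ξ_1 + 3 ξ_2, which gives a simple check. The seed, the grid size and the function names are assumptions made for the illustration.

```python
import math
import random

N_TERMS = 50


def v(j, t):
    """Basis function v_j(t) = sqrt(2) sin((j - 0.5) pi t)."""
    return math.sqrt(2.0) * math.sin((j - 0.5) * math.pi * t)


def beta(t):
    """Slope function of the simulation study."""
    return (math.sqrt(2.0) * math.sin(math.pi * t / 2.0)
            + 3.0 * math.sqrt(2.0) * math.sin(3.0 * math.pi * t / 2.0))


def draw_scores(rng):
    """xi_j ~ N(0, lambda_j) with lambda_j = ((j - 0.5) pi)^(-2), so sd = 1 / ((j - 0.5) pi)."""
    return [rng.gauss(0.0, 1.0 / ((j - 0.5) * math.pi)) for j in range(1, N_TERMS + 1)]


def curve(scores, t):
    """Z(t) = sum_j xi_j v_j(t)."""
    return sum(s * v(j, t) for j, s in enumerate(scores, start=1))


def functional_term(scores, n_grid=2000):
    """Trapezoid approximation of int_0^1 beta(t) Z(t) dt."""
    step = 1.0 / n_grid
    total = 0.0
    for k in range(n_grid + 1):
        t = k * step
        weight = 0.5 if k in (0, n_grid) else 1.0
        total += weight * beta(t) * curve(scores, t)
    return total * step


rng = random.Random(0)        # fixed seed, chosen arbitrarily
scores = draw_scores(rng)
integral = functional_term(scores)
```

The closed form ξ_1 + 3 ξ_2 also means that, in this design, only the first two scores of each curve influence the response.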

The number of knots is selected according to the formula

C n^{\frac{1}{2 r}} log (n)

where

C \in [0.3, 1]

(as in Wang and Cao [10]). We chose

C = 0.6

and we made 300 replications with samples of sizes

n = 500

and

n = 1000

.
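As a quick numerical illustration of the knot-selection formula, the sketch below evaluates C n^{1/(2r)} log(n) for the two sample sizes. The spline order r = 4 (cubic splines) is an assumption made here, since the order used in the simulations is not stated, and the rule for rounding the value to an integer is likewise left open.

```python
import math


def knot_count(n, r, c=0.6):
    """Raw value of C * n^(1 / (2r)) * log(n); rounding to an integer is left to the user."""
    return c * n ** (1.0 / (2 * r)) * math.log(n)


# Hypothetical order r = 4; C = 0.6 as chosen in the text.
value_500 = knot_count(500, 4)
value_1000 = knot_count(1000, 4)
```

The value grows slowly with n, so doubling the sample size adds only a small number of knots.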

Computations of the bias, the standard deviation (SD) and the Mean Squared Error (MSE) with respect to (i) the parameter

τ

, (ii) the parameter

γ

and (iii) the parameter

δ

are summarized, for

n = 500

(respectively,

n = 1000

), in Table 1, Table 2 and Table 3 (respectively, in Table 4, Table 5 and Table 6).

4.2. Case 2: Logit Link Function

By taking a logit link function, data are generated from the model

logit {P [Y_{i} = 1 | X_{i}, Z_{i}]} = sin \{\frac{π (α^{⊤} X_{i} - A)}{B - A}\} + \int_{0}^{1} β (t) Z_{i} (t) d t + ε_{i}, i = 1, \dots, n .

for which we have kept the same parameters and the variables as for the identity link function. Then, similarly to the identity link function case, computations of the bias, SD and the MSE with respect to the parameters

τ

,

γ

and then

δ

are summarized, for

n = 500

(respectively,

n = 1000

), in Table 7, Table 8 and Table 9 (respectively, in Table 10, Table 11 and Table 12).

The simulations illustrate the quality of the estimators: the method performs quite well, and the bias, SD and MSE are reasonably small in general. The parametric and non-parametric components, the single-index vector and the slope function are all computed by the procedure given in this paper.

The tables for both sample sizes indicate the consistency of

\hat{α}

and

\hat{δ}

as the bias, SD and MSE decrease when the sample size increases. As in Wang and Cao [10], the number of knots is selected by the formula

C n^{\frac{1}{2 r}} log (n)

with

C \in [0.3, 1]

; we have chosen

C = 0.6

.

We developed our algorithm in both cases: the identity link function and the logistic link function. Simulations show that the PLGSIMF algorithm works well in both cases.

In the figure below (Figure 1), we illustrate 500 realizations of the functional random variable Z.

In the following figure (Figure 2), we observe an almost linear relationship between the single index

u = α^{⊤} (τ) X

and its estimate

\hat{u} = {\hat{α}}^{⊤} (\hat{τ}) X .

In the figure below (Figure 3), we plot the slope function

β (.)

and its estimator

\hat{β} (.)

Our model approximates well the slope function

β (.)

.

The following figure (Figure 4) shows us the comparison between the non-parametric function

η (.)

and its estimator

\hat{η} (.) .

Our model approximates well the non-parametric function

η (.) .

To study the performance of our estimation for non-parametric function

η (.)

and slope function, respectively, we will use the square root of average square errors criterion (RASE, see Peng et al. [20]):

{RASE}_{1} = {(\frac{1}{n} \sum_{i = 1}^{n} {(\hat{η} (u_{i}) - η (u_{i}))}^{2})}^{1 / 2}

{RASE}_{2} = {(\frac{1}{n} \sum_{i = 1}^{n} {(\hat{β} (t_{i}) - β (t_{i}))}^{2})}^{1 / 2}
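Both criteria share the same form, so a single helper suffices. The sketch below is a minimal illustration; the function name `rase` and the toy functions used in the example are hypothetical.

```python
import math


def rase(estimate, truth, points):
    """Square root of the average squared error between two functions over the given points."""
    squared = [(estimate(p) - truth(p)) ** 2 for p in points]
    return math.sqrt(sum(squared) / len(squared))


# Toy example: an estimate that is off by a constant 0.1 on three evaluation points.
points = [0.0, 0.5, 1.0]
error = rase(lambda u: u + 0.1, lambda u: u, points)
```

For RASE_1 the points are the index values u_i, and for RASE_2 they are the grid points t_i at which the slope function is evaluated.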

The following tables (Table 13 and Table 14) summarize the sample means, medians and variances of the RASE

_{i}

(

i = 1, 2

) with different sample sizes in the Gaussian case.

For the case

n = 500

, we get

Table 13. The RASE criterion with the non-parametric function

η (.)

and slope function

β (.)

for the case

n = 500

.

Gaussian Cases	Mean	Median	Var
RASE $_{1}$	0.039	0.038	0.003
RASE $_{2}$	0.123	0.122	0.020

For the case

n = 1000

, we get

Table 14. The RASE criterion with the non-parametric function

η (.)

and slope function

β (.)

for the case

n = 1000

.

Gaussian Cases	Mean	Median	Var
RASE $_{1}$	0.016	0.016	0.001
RASE $_{2}$	0.027	0.125	0.006

The following tables (Table 15 and Table 16) summarize the sample means, medians and variances of the RASE

_{i}

(

i = 1, 2

) with different sample sizes in the Logistic case.

For the case

n = 500

, we get

Table 15. The RASE criterion with the non-parametric function

η (.)

and slope function

β (.)

for the case

n = 500

.

Logistic Cases	Mean	Median	Var
RASE $_{1}$	0.045	0.044	0.023
RASE $_{2}$	0.133	0.103	0.0120

For the case where

n = 1000

, we get

Table 16. The RASE criterion with the nonparametric function

η (.)

and slope function

β (.)

for the case where

n = 1000

.

Logistic Cases	Mean	Median	Var
RASE $_{1}$	0.028	0.026	0.014
RASE $_{2}$	0.124	0.121	0.002

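We conclude that, as the sample size n increases from 500 to 1000, the sample mean and variance of RASE

_{i}

(i = 1, 2) decrease in both the Gaussian and logistic cases. The median also decreases for RASE $_{1}$, whereas the reported median of RASE $_{2}$ does not.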

5. Application to Tecator Data

In this section, we apply the PLGSIMF model to the Tecator data, which are well known in functional data analysis. These data can be downloaded from the following link http://lib.stat.cmu.edu/datasets/tecator (accessed on 1 August 2021). For more details, see Ferraty and Vieu [7].

For 215 finely chopped pieces of meat, the Tecator data contain the corresponding fat contents (

Y_{i}, i = 1, \dots, 215

), near-infrared absorbance spectra (

Z_{i}, i = 1, \dots, 215

) observed at 100 equally spaced wavelengths in the range 850–1050 nm, the protein content

X_{1, i}

and the moisture content

X_{2, i}

. We are trying to predict the fat content of the finely chopped meat samples.

The following figure (Figure 5) shows the absorbance curves.

We divide the sample randomly into two sub-samples: the training

I_{1}

of size 160 and the test

I_{2}

of size 55. The training sample is used to estimate the parameters, and the test sample is used to assess the quality of the predictions. To evaluate our model, we use the mean square error of prediction (MSEP), as in Aneiros-Pérez and Vieu [11], defined as follows:

MSEP = \frac{1}{55} \sum_{i \in I_{2}} {(Y_{i} - {\hat{Y}}_{i})}^{2} / \mathrm{var}_{I_{2}} (Y_{i}),

where

{\hat{Y}}_{i}

is the predicted value based on the training sample and

\mathrm{var}_{I_{2}}

is the variance of the response variable over the test sample.
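The criterion is a mean squared prediction error normalized by the variance of the test responses. A minimal sketch is given below; it divides by the size of the test sample and uses the population variance (divisor equal to the test-sample size), which is an assumption since the variance convention is not specified. The function name `msep` and the toy values are hypothetical.

```python
def msep(y_true, y_pred):
    """Mean squared prediction error over the test sample divided by the variance of y_true."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    var_y = sum((y - mean_y) ** 2 for y in y_true) / n   # population variance (assumption)
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n
    return mse / var_y


# Toy example with made-up responses: predicting the test mean gives a value of one.
y_test = [1.0, 2.0, 4.0, 7.0]
baseline = msep(y_test, [3.5] * 4)
perfect = msep(y_test, y_test)
```

Under this convention a value below one means the model predicts better than the constant predictor equal to the mean of the test responses.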

The following table (Table 17) compares the performance of our PLGSIMF model with that of other models. We can conclude that PLGSIMF is competitive for such data.

The following figure (Figure 6) shows us the estimator of the non-parametric function

\hat{η} (.)

.

The following figure (Figure 7) shows us the estimator of the slope function

\hat{β} (.)

.

6. Proofs

In what follows, when no confusion is possible, we will denote by C a generic positive constant.

The following lemmas (see [21,22,23]) will be used to prove Theorem 1. Their proofs are developed in Appendix A.

Lemma 1.

Under assumptions (C1)–(C4) and (C6)–(C8), we have

\sqrt{n} (\begin{matrix} \tilde{τ} - τ_{0} \\ \tilde{δ} - δ_{0} \end{matrix}) \overset{D}{⟶} N (0, A^{- 1} Σ_{1} A^{- 1}),

where

Σ_{1}

and A are defined below (see the appendix for more details), and

\overset{D}{\to}

denotes the convergence in distribution.

A = (\begin{matrix} A_{11} & A_{12} \\ A_{12}^{⊤} & A_{22} \end{matrix}) .

with

\begin{aligned} A_{11} &= E [ρ_{2} (m_{0} (T)) {\{η_{0}^{'} (U_{τ, 0})\}}^{2} J^{⊤} (τ_{0}) X X^{⊤} J (τ_{0})], \\ A_{22} &= E [ρ_{2} (m_{0} (T)) W W^{⊤}], \\ A_{12} &= E [ρ_{2} (m_{0} (T)) η_{0}^{'} (U_{τ, 0}) J^{⊤} (τ_{0}) X W^{⊤}], \end{aligned}

Σ_{1} = E [q_{1}^{2} (m_{0} (T), Y) \begin{pmatrix} η_{0}^{'} (U_{τ, 0}) J^{⊤} (τ_{0}) X \\ W \end{pmatrix} {\begin{pmatrix} η_{0}^{'} (U_{τ, 0}) J^{⊤} (τ_{0}) X \\ W \end{pmatrix}}^{⊤}]

By applying the

δ

-method, we get the following lemma.

Lemma 2.

Under the conditions of Lemma 1, we obtain

\sqrt{n} (\begin{matrix} α (\tilde{τ}) - α (τ_{0}) \\ \tilde{δ} - δ_{0} \end{matrix}) \overset{D}{⟶} N (0, R (τ_{0}) A^{- 1} Σ_{1} A^{- 1} R^{⊤} (τ_{0})),

where

R (τ) = (\begin{matrix} J (τ) & 0 \\ 0 & I_{N^{'} \times N^{'}} \end{matrix}) .

Furthermore

α (\tilde{τ}) - α (τ_{0}) = O_{I P} (\frac{1}{\sqrt{n}})

and

\tilde{δ} - δ_{0} = O_{I P} (\frac{1}{\sqrt{n}})

.
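The δ-method step behind Lemma 2 can be illustrated in the simplest case d = 2. The sketch below assumes the usual unit-norm parametrization α(τ) = (√(1 − τ²), τ), which is an assumption here since the parametrization is not restated in this section, and propagates a hypothetical asymptotic variance of the τ-component through the Jacobian J(τ) to obtain the block J(τ) v J(τ)^⊤ of the limiting covariance.

```python
import math


def alpha(tau):
    """Unit-norm index vector for d = 2: alpha(tau) = (sqrt(1 - tau^2), tau)."""
    return (math.sqrt(1.0 - tau * tau), tau)


def jacobian(tau):
    """J(tau) = d alpha / d tau, a 2 x 1 column stored as a tuple."""
    return (-tau / math.sqrt(1.0 - tau * tau), 1.0)


def delta_method_cov(tau, var_tau):
    """Covariance J(tau) * var_tau * J(tau)^T of sqrt(n) (alpha(tau_hat) - alpha(tau))."""
    j = jacobian(tau)
    return [[j[a] * var_tau * j[b] for b in range(2)] for a in range(2)]


# Hypothetical values: true tau and asymptotic variance of sqrt(n) (tau_hat - tau).
tau0, var_tau = 0.6, 2.0
cov = delta_method_cov(tau0, var_tau)

# Central finite difference as a sanity check on the first Jacobian entry.
h = 1e-6
fd_first = (alpha(tau0 + h)[0] - alpha(tau0 - h)[0]) / (2.0 * h)
```

The resulting 2 × 2 matrix has rank one, which reflects the constraint that α(τ) stays on the unit circle.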

Lemma 3.

Under the conditions of Lemma 1, we obtain

∥ \hat{θ} - \tilde{θ} ∥ = O_{I P} (\sqrt{N_{n}} (h^{p} + \frac{1}{\sqrt{n h}})),

where

N_{n}

is the number of B-spline basis functions of order r.

We can then state the following theorems.

Theorem 4.

Under assumptions (C1)–(C5) and (C6)–(C8), we obtain

{∥\hat{η} - η_{0}∥}_{2} = O_{I P} (\sqrt{N_{n}} (\frac{1}{\sqrt{n} h} + h^{p})),

and

{∥\hat{η} - η_{0}∥}_{n} = O_{I P} (\sqrt{N_{n}} (\frac{1}{\sqrt{n} h} + h^{p})) .

Theorem 5.

Under assumptions (C1)–(C8), and

k \sim n^{1 / (2 r + 1)}

, we obtain

{∥ \hat{β} (\cdot) - β_{0} (\cdot) ∥}^{2} = O_{I P} (N_{n}^{2} {(h^{p} + \frac{1}{\sqrt{n h}})}^{2}) + O_{I P} (n^{- 2 r / (2 r + 1)}) .

Theorem 6.

Under assumptions

(C_{1}) - (C_{11})

, the constrained quasi-likelihood estimators

\hat{α}

and

\hat{δ}

with

∥ \hat{α} ∥_{d} = 1

are asymptotically normally distributed, i.e.,

\sqrt{n} (\begin{matrix} \hat{α} - α_{0} \\ \hat{δ} - δ_{0} \end{matrix}) \overset{D}{⟶} N (0, R (τ_{0}) D^{- 1} R^{⊤} (τ_{0}))

where

D = E [ρ_{2} (m_{0} (T)) (\begin{matrix} η_{0}^{'} (U_{τ, 0}) J^{⊤} (τ_{0}) Φ (X) \\ Ψ (W) \end{matrix}) {(\begin{matrix} η_{0}^{'} (U_{τ, 0}) J^{⊤} (τ_{0}) Φ (X) \\ Ψ (W) \end{matrix})}^{⊤}] .

The proof of this theorem is very long. To save space and keep the paper readable, the necessary details are given in the Supplementary Materials.
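One practical use of the asymptotic normality in Theorem 6 is the construction of Wald-type confidence intervals for individual components of the parameter. The sketch below assumes that a plug-in estimate of the relevant diagonal entry of the asymptotic covariance matrix is already available; the function name `wald_interval` and the numerical values are hypothetical.

```python
import math
from statistics import NormalDist


def wald_interval(estimate, asym_var, n, level=0.95):
    """Two-sided Wald interval: estimate +/- z * sqrt(asym_var / n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(asym_var / n)
    return estimate - half_width, estimate + half_width


# Hypothetical inputs: one component of the estimated index vector, an
# estimated asymptotic variance for that component, and the sample size.
lower, upper = wald_interval(estimate=0.8, asym_var=2.5, n=500)
```

The interval shrinks at the rate 1/√n, in line with the normalization used in the theorem.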

Supplementary Materials

The following are available online at https://www.mdpi.com/article/10.3390/stats4040047/s1.

Author Contributions

Methodology, M.A., I.O., M.R., P.V.; Software, M.A., I.O.; Visualization, M.A., I.O.; Writing, M.A., I.O., M.R., P.V.; original draft, M.A., I.O.; Writing—review & editing, M.A., I.O., M.R., P.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In what follows, we present the results and technical lemmas used in the proofs of the previous theorems.

First of all, for all probability measures Q, we denote by

L^{2} (Q)

the space of square-integrable functions, i.e.,

L^{2} (Q) = \{f : Q f^{2} = \int f^{2} d Q < \infty\}

. Then, let

F

be a subclass of

L^{2} (Q)

. For all

f \in F

, we denote by

∥ f ∥ = {(\int f^{2} d Q)}^{\frac{1}{2}}

the norm of f with respect to Q.

The following definition is needed to follow the proofs of the results.

Definition A1.

The δ-covering number, $N (δ, F, L^{2} (Q))$, of $F$ is the smallest value N for which there exist functions $f_{1}, f_{2}, \dots, f_{N}$ such that, for each $f \in F$, there exists $j \in \{1, \dots, N\}$ with $∥ f - f_{j} ∥ < δ$; equivalently, $F \subset ⋃_{j = 1}^{N} B (f_{j}, δ)$.
Notice that the $f_{j}$ are not necessarily in $F$.
For two functions l and u, the bracket $[l, u]$ is the set of functions f such that $l \leq f \leq u$, i.e., $[l, u] = \{f : l \leq f \leq u\}$.
The δ-covering number with bracketing $N_{[]} (δ, F, L^{2} (Q))$ is defined as the smallest value of N for which there exist pairs of functions $\{[f_{j}^{L}, f_{j}^{U}]; j = 1, \dots, N\}$ with $∥f_{j}^{U} - f_{j}^{L}∥ \leq δ$ such that, for each $f \in F$, there is a $j \in \{1, \dots, N\}$ with $f_{j}^{L} \leq f \leq f_{j}^{U}$.
Notice that $f_{j}^{L}$ and $f_{j}^{U}$ are not necessarily in $F$.
The δ-entropy with bracketing is defined as $\log N_{[]} (δ, F, L^{2} (Q))$.
The uniform entropy integral $J_{[]} (δ, F, L^{2} (Q))$ is defined by

$J_{[]} (δ, F, L^{2} (Q)) = \int_{0}^{δ} {\{1 + \log N_{[]} (κ, F, L^{2} (Q))\}}^{\frac{1}{2}} d κ .$
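To make the notion of a δ-covering number concrete, the following sketch builds a δ-cover greedily for a small finite family of functions, using the empirical L² norm on a grid in place of the L²(Q) norm. The family (straight lines through the origin), the grid and the helper names are all hypothetical choices for illustration. Because the centres are taken from the family itself, the returned size is only an upper bound on the covering number.

```python
import math


def l2_dist(f, g, sample):
    """Empirical L2 distance between two functions on a sample."""
    return math.sqrt(sum((f(x) - g(x)) ** 2 for x in sample) / len(sample))


def greedy_cover_size(functions, sample, delta):
    """Size of a greedily built delta-cover (an upper bound on the covering number)."""
    centers = []
    for f in functions:
        # Keep f as a new centre only if no existing centre is within delta of it.
        if all(l2_dist(f, c, sample) >= delta for c in centers):
            centers.append(f)
    return len(centers)


# Hypothetical setting: grid on [0, 1] and the family f_c(x) = c * x, c in [0, 1].
sample = [i / 100 for i in range(101)]
family = [lambda x, c=c / 50: c * x for c in range(51)]
sizes = {d: greedy_cover_size(family, sample, d) for d in (0.05, 0.1, 0.5, 1.0)}
```

As δ increases, fewer balls are needed, and once δ exceeds the diameter of the family a single ball suffices.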

Let

Q_{n}

be the empirical measure of Q, i.e.,

Q_{n} = \frac{1}{n} \sum_{i = 1}^{n} δ_{X_{i}} (\cdot)

such that

Q_{n} f = E^{Q_{n}} [f] = \int f d Q_{n} = \frac{1}{n} \sum_{i = 1}^{n} \int f d δ_{X_{i}} = \frac{1}{n} \sum_{i = 1}^{n} f (X_{i}) .

We denote by

G_{n} = \sqrt{n} (Q_{n} - Q)

the standardized empirical process indexed by

F

, and

{∥G_{n}∥}_{F} = sup_{f \in F} |G_{n} f|

. Then, for all

f \in F

, we have

Q f = E^{Q} [f (X)]

and

G_{n} f = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} (f (X_{i}) - E [f (X)])

.
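The quantities Q_n f and G_n f can be computed directly from a sample. The sketch below uses a hypothetical example, X uniform on (0, 1) and f(x) = x², for which Q f = 1/3 is known in closed form; the function names are illustrative only.

```python
import math
import random


def empirical_mean(f, sample):
    """Q_n f = (1/n) * sum_i f(X_i)."""
    return sum(f(x) for x in sample) / len(sample)


def empirical_process(f, sample, q_f):
    """G_n f = sqrt(n) * (Q_n f - Q f), with Q f supplied by the caller."""
    return math.sqrt(len(sample)) * (empirical_mean(f, sample) - q_f)


random.seed(0)
sample = [random.random() for _ in range(2000)]   # X_i ~ Uniform(0, 1)


def f(x):
    return x * x                                  # Q f = E[X^2] = 1/3


qn_f = empirical_mean(f, sample)
gn_f = empirical_process(f, sample, 1.0 / 3.0)
```

By the central limit theorem, G_n f stays of order one as n grows, while Q_n f − Q f shrinks at the rate 1/√n.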

Lemma A1.

(Lemma 3.4.2. in Van Der Vaart and Wellner [23])

Let

M_{0} > 0

and

F

be a uniformly bounded class of measurable functions such that, for all

f \in F

,

{∥ f ∥}_{\infty} < M_{0}

and

Q f^{2} < δ^{2}

. Then

E^{Q} [{∥G_{n}∥}_{F}] \leq c_{0} J_{[]} (δ, F, L^{2} (Q)) \{1 + \frac{J_{[]} (δ, F, L^{2} (Q))}{δ^{2} \sqrt{n}} M_{0}\},

where

c_{0}

is a finite constant, which does not depend on n.

Lemma A2.

(Lemma A.1. in Huang [24])

For any

λ > 0

, let

Θ_{n} = \{η (α_{0}^{⊤} x) + δ^{⊤} w : ∥δ - δ_{0}∥ \leq λ, η \in S_{n}, {∥η - η_{0}∥}_{2} \leq λ\} .

Then, for any

ε \leq λ,

\log N_{[]} (ε, Θ_{n}, L^{2} (P)) \leq C N_{n} \log (\frac{λ}{ε}) .

Lemma A3.

(Lemma A.2. in Supplement Material of Wang and Yang [25] and Lemma A.4. in Xue and Yang [26])

Let

S_{n}

be the space of all polynomial spline functions of order r on

[a, b]

. Under conditions (C1)–(C5), we have

A_{n} = sup_{η_{1}, η_{2} \in S_{n}} |\frac{{〈η_{1}, η_{2}〉}_{n} - 〈η_{1}, η_{2}〉}{{∥η_{1}∥}_{2} {∥η_{2}∥}_{2}}| = O_{a.s.} \{\sqrt{\frac{\log n}{n h}}\},

where a.s. means “almost surely”.

Recall that

θ = {(α^{⊤}, γ^{⊤}, S^{⊤})}^{⊤}

. Let

D_{i, θ} = (\begin{matrix} γ^{⊤} B^{'} (α^{⊤} (\tilde{τ}) X_{i}) J^{⊤} (τ) & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & B (α^{⊤} (\tilde{τ}) X_{i}) \end{matrix})

and

T_{i} = {(X_{i}^{⊤}, W_{i}^{⊤})}^{⊤}

.

Denote

W_{n, θ} = \frac{1}{n} \sum_{i = 1}^{n} D_{i, θ} (\begin{matrix} T_{i} \\ 1 \end{matrix}) {(\begin{matrix} T_{i} \\ 1 \end{matrix})}^{⊤} D_{i, θ}^{⊤}

and

W_{θ} = \frac{1}{n} \sum_{i = 1}^{n} E [D_{i, θ} (\begin{matrix} T_{i} \\ 1 \end{matrix}) {(\begin{matrix} T_{i} \\ 1 \end{matrix})}^{⊤} D_{i, θ}^{⊤}] .

Lemma A4.

(Lemma A.3 in the Supplement Material of Wang and Yang [25])

Under assumptions (C1)–(C5) and (C6)–(C8), there exists a positive constant C such that

\sup_{θ} {∥W_{θ}^{- 1}∥}_{2} \leq C \sqrt{N_{n}}, \quad a.s.,

and

\sup_{θ} {∥W_{n, θ}^{- 1}∥}_{2} \leq C \sqrt{N_{n}}, \quad a.s.,

where

{∥ A ∥}_{2} = \sup_{x \neq 0} \frac{∥ A x ∥}{∥ x ∥} = \sup_{∥ x ∥ = 1} ∥ A x ∥ .
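The matrix norm used in Lemma A4 is the spectral norm, which equals the largest singular value of the matrix. As a small numerical check on a hypothetical 2 × 2 matrix, the sketch below compares the closed-form value obtained from the eigenvalues of A^⊤A with a brute-force supremum of ∥Ax∥ over unit vectors; both function names are illustrative.

```python
import math


def spectral_norm_2x2(a):
    """Largest singular value of a 2 x 2 matrix via the eigenvalues of A^T A."""
    (p, q), (r, s) = a
    # Entries of the symmetric matrix A^T A.
    m11, m12, m22 = p * p + r * r, p * q + r * s, q * q + s * s
    trace, det = m11 + m22, m11 * m22 - m12 * m12
    lam_max = 0.5 * (trace + math.sqrt(max(trace * trace - 4.0 * det, 0.0)))
    return math.sqrt(lam_max)


def sup_over_unit_circle(a, steps=100000):
    """Brute-force sup of ||A x|| over unit vectors x = (cos t, sin t)."""
    (p, q), (r, s) = a
    best = 0.0
    for k in range(steps):
        t = math.pi * k / steps  # x and -x give the same norm, so [0, pi) suffices
        c, sn = math.cos(t), math.sin(t)
        best = max(best, math.hypot(p * c + q * sn, r * c + s * sn))
    return best


A = [[3.0, 0.0], [4.0, 5.0]]  # hypothetical example matrix
closed_form = spectral_norm_2x2(A)
brute_force = sup_over_unit_circle(A)
```

The two values agree up to the resolution of the angular grid, which illustrates the equality of the two suprema in the definition.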

In what follows, we state the lemmas used to prove Theorem 2.

Lemma A5.

(Lemma 1 in Yu et al. [15])

Under assumptions (C1) and (C8), we have

sup_{u \in [a, b]} |η_{0} (u) - γ_{0}^{⊤} B_{1} (u)| \leq C J^{- r} \quad \text{and} \quad sup_{t \in [a, b]} |β_{0} (t) - δ_{0}^{⊤} B_{2} (t)| \leq C k^{- r},

where J is the number of interior knots for

B_{1},

and k is the number of interior knots for

B_{2}

.
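The rate in Lemma A5 can be observed numerically in the simplest case r = 2. The sketch below uses piecewise linear interpolation at equally spaced knots, which is a spline of order 2 but not the estimator of the paper, and a hypothetical smooth target function. Its sup-norm error is bounded by h² sup|f''| / 8, so doubling the number of cells should divide the error by roughly four.

```python
import math


def linear_spline_sup_error(f, n_interior, a=0.0, b=1.0, n_eval=2001):
    """Sup-norm error of the piecewise linear spline interpolating f at
    n_interior equally spaced interior knots plus the two endpoints."""
    n_cells = n_interior + 1
    h = (b - a) / n_cells
    knots = [a + j * h for j in range(n_cells + 1)]
    values = [f(t) for t in knots]
    worst = 0.0
    for k in range(n_eval):
        u = a + (b - a) * k / (n_eval - 1)
        j = min(int((u - a) / h), n_cells - 1)   # index of the cell containing u
        w = (u - knots[j]) / h                   # local coordinate in [0, 1]
        approx = (1.0 - w) * values[j] + w * values[j + 1]
        worst = max(worst, abs(f(u) - approx))
    return worst


def target(u):
    """Hypothetical smooth function standing in for the unknown link."""
    return math.sin(2.0 * math.pi * u)


errors = {J: linear_spline_sup_error(target, J) for J in (9, 19, 39)}
```

Higher-order splines give faster decay for smoother functions, which is the content of the bound C J^{-r}.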

In what follows, we give the lemmas used to prove Theorem 3.

Lemma A6.

Under assumptions (C1)–(C8), we have

\frac{1}{n} \sum_{i = 1}^{n} ρ_{2} (m_{0 i}) (\hat{η} (U_{τ, 0 i}) - η_{0} (U_{τ, 0 i})) η_{0}^{'} (U_{τ, 0 i}) J^{⊤} (τ_{0}) Φ (X_{i}) = o_{I P} (\frac{1}{\sqrt{n}}),

(A1)

\frac{1}{n} \sum_{i = 1}^{n} ρ_{2} (m_{0 i}) η_{0}^{'} (U_{τ, 0 i}) Φ (X_{i}) Υ^{⊤} (U_{τ, 0 i}) J (τ_{0}) (\hat{τ} - τ_{0}) = o_{I P} (\frac{1}{\sqrt{n}}),

(A2)

\frac{1}{n} \sum_{i = 1}^{n} ρ_{2} (m_{0 i}) η_{0}^{'} (U_{τ, 0 i}) Φ (X_{i}) Γ^{⊤} (U_{τ, 0 i}) (\hat{δ} - δ_{0}) = o_{I P} (\frac{1}{\sqrt{n}}),

(A3)

where

Υ (u_{τ, 0}) = \frac{E [X ρ_{2} (m_{0} (T)) | U_{τ, 0} = u_{τ, 0}]}{E [ρ_{2} (m_{0} (T)) | U_{τ, 0} = u_{τ, 0}]},

Γ (u_{τ, 0}) = \frac{E [W ρ_{2} (m_{0} (T)) | U_{τ, 0} = u_{τ, 0}]}{E [ρ_{2} (m_{0} (T)) | U_{τ, 0} = u_{τ, 0}]},

Φ (x) = Φ (U_{τ, 0}, x) = x - Υ (u_{τ, 0}),

and

Ψ (w) = Ψ (U_{τ, 0}, w) = w - Γ (u_{τ, 0}) .

Lemma A7.

Under assumptions (C1)–(C8), we have

\frac{1}{n} \sum_{i = 1}^{n} ρ_{2} (m_{0 i}) (\hat{η} (U_{τ, 0 i}) - η_{0} (U_{τ, 0 i})) Ψ (T_{i}) = o_{I P} (\frac{1}{\sqrt{n}}),

(A4)

\frac{1}{n} \sum_{i = 1}^{n} ρ_{2} (m_{0 i}) η_{0}^{'} (U_{τ, 0 i}) Ψ (T_{i}) Υ^{⊤} (U_{τ, 0 i}) J (τ_{0}) (\hat{τ} - τ_{0}) = o_{I P} (\frac{1}{\sqrt{n}}),

(A5)

\frac{1}{n} \sum_{i = 1}^{n} ρ_{2} (m_{0 i}) Ψ (U_{τ, 0 i}, W_{i}) Γ^{⊤} (U_{τ, 0 i}) (\hat{δ} - δ_{0}) = o_{I P} (\frac{1}{\sqrt{n}}) .

(A6)

Summary

In this paper, we introduce estimators for the Partially Linear Generalized Single Index Models for Functional Data (PLGSIMF). Our estimators are obtained via the Fisher scoring update equation derived from the quasi-likelihood function, using the normalized B-spline basis and its derivatives.

We prove the

\sqrt{n}

-consistency and asymptotic normality of the parametric estimators. First, we establish the convergence rate of the estimator

\hat{η}

to the true non-parametric function

η

. Second, we establish the convergence rate of the estimator

\hat{β}

to the slope function

β

. Finally, we show that the estimators

\hat{α}

and

\hat{δ}

of the single-index parameter

α

and of the coefficients

δ

of the functional component are asymptotically normal. A numerical study reveals that our estimation procedure performs well in higher dimensions. The quality of the estimators is illustrated via simulations.

References

1. Nelder, J.A.; Wedderburn, R.W.M. Generalized Linear Models. J. R. Stat. Soc. Ser. A 1972, 135, 370–384.
2. McCullagh, P.; Nelder, J.A. Generalized Linear Models, 2nd ed.; Chapman and Hall: London, UK, 1989.
3. Hastie, T.J.; Tibshirani, R.J. Generalized Additive Models; Chapman and Hall: London, UK, 1990.
4. Wood, S. Generalized Additive Models: An Introduction with R, 2nd ed.; CRC Press: Boca Raton, FL, USA, 2017.
5. Dunn, P.K.; Smyth, G.K. Generalized Linear Models with Examples in R; Springer Texts in Statistics; Springer: New York, NY, USA, 2018.
6. Lai, P.; Tian, Y.; Lian, H. Estimation and variable selection for generalised partially linear single-index models. J. Nonparametr. Stat. 2014, 26, 171–185.
7. Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis: Theory and Practice; Springer Series in Statistics; Springer: New York, NY, USA, 2006.
8. Carroll, R.J.; Fan, J.; Gijbels, I.; Wand, M.P. Generalized partially linear single-index models. J. Am. Stat. Assoc. 1997, 92, 477–489.
9. Li, C.S.; Lu, M. A lack-of-fit test for generalized linear models via single-index techniques. Comput. Stat. 2018, 33, 731–756.
10. Wang, L.; Cao, G. Efficient estimation for generalized partially linear single-index models. Bernoulli 2018, 24, 1101–1127.
11. Aneiros-Perez, G.; Vieu, P. Semi functional partial linear regression. Stat. Probab. Lett. 2006, 76, 1102–1110.
12. Horváth, L.; Kokoszka, P. Inference for Functional Data with Applications; Springer Series in Statistics; Springer: New York, NY, USA, 2012.
13. Aneiros-Perez, G.; Vieu, P. Partial linear modelling with multi-functional covariates. Comput. Stat. 2015, 30, 647–671.
14. Ramsay, J.O.; Silverman, B.W. Functional Data Analysis; Springer: New York, NY, USA, 2005.
15. Yu, P.; Du, J.; Zhang, Z. Single-index partially functional linear regression model. Stat. Pap. 2020, 61, 1107–1123.
16. Cao, R.; Du, J.; Zhou, J.; Xie, T. FPCA-based estimation for generalized functional partially linear models. Stat. Pap. 2020, 61, 2715–2735.
17. Rachdi, M.; Alahiane, M.; Ouassou, I.; Vieu, P. Generalized Functional Partially Linear Single-index Models. In IWFOS 2020: Functional and High-Dimensional Statistics and Related Fields; Springer: Cham, Switzerland, 2020; pp. 221–228.
18. De Boor, C. A Practical Guide to Splines, Revised ed.; Applied Mathematical Sciences; Springer: Berlin, Germany, 2001; Volume 27.
19. Yu, Y.; Ruppert, D. Penalized spline estimation for partially linear single-index models. J. Am. Stat. Assoc. 2002, 97, 1042–1054.
20. Peng, Q.; Zhou, J.; Tang, N. Varying coefficient partially functional linear regression models. Stat. Pap. 2015, 57, 827–841.
21. Pollard, D. Asymptotics for least absolute deviation regression estimators. Econom. Theory 1991, 7, 186–199.
22. Stone, C.J. The dimensionality reduction principle for generalized additive models. Ann. Stat. 1986, 14, 590–606.
23. Van der Vaart, A.W.; Wellner, J.A. Weak Convergence and Empirical Processes with Applications to Statistics; Springer: New York, NY, USA, 1996.
24. Huang, J. Efficient estimation of the partly linear additive Cox model. Ann. Stat. 1999, 27, 1536–1563.
25. Wang, L.; Yang, L. Spline estimation of single-index models. Stat. Sin. 2009, 19, 765–783.
26. Xue, L.; Yang, L. Additive coefficient modelling via polynomial spline. Stat. Sin. 2006, 16, 1423–1446.

Figure 1. A sample of curves

\{Z_{i} (t), t \in [0, 1]\}_{i = 1, \dots, 500}

.

Figure 2. Single index

u = α^{⊤} X

versus predicted single index

\hat{u} = {\hat{α}}^{⊤} X

.

Figure 3. Estimated slope function

\hat{β} (.)

and slope function

β (.)

.

Figure 4. Estimated non-parametric function

\hat{η} (.)

and non-parametric function

η (.)

.

Figure 5. Sample of 100 absorbance curves Z.

Figure 6. Estimated nonparametric function

\hat{η} (.)

.

Figure 7. Estimated slope function

\hat{β} (.)

.

Table 1. Bias, SD and MSE according to the parameter

τ

for PLGSIMF with the identity link function and

n = 500

.

	$τ_{1}$	$τ_{2}$
Bias	0.004	−0.0009
SD	0.0031	0.0025
MSE	$2.6565 \times 10^{-5}$	$7.1819 \times 10^{-6}$

Table 2. Bias, SD and MSE of the estimates of the parameter

γ

for PLGSIMF with the identity link function and

n = 500

.

	$γ_{1}$	$γ_{2}$	$γ_{3}$	$γ_{4}$	$γ_{5}$	$γ_{6}$	$γ_{7}$	$γ_{8}$	$γ_{9}$
Bias	−0.0256	−0.0575	0.0526	0.1230	0.0471	−0.0015	−0.0148	0.0241	0.0005
SD	0.1834	0.2430	0.3037	0.3621	0.2049	0.3756	0.3101	0.2281	0.1710
MSE	0.0343	0.0624	0.0950	0.1463	0.0442	0.1411	0.0964	0.0526	0.0292

Table 3. Bias, SD and MSE of the estimates of the parameter

δ

for PLGSIMF with the identity link function and

n = 500

.

	$δ_{1}$	$δ_{2}$	$δ_{3}$	$δ_{4}$	$δ_{5}$	$δ_{6}$	$δ_{7}$
Bias	−0.3444	−0.5969	−0.2527	0.0696	0.1093	0.0746	0.2909
SD	0.9457	0.5569	0.2565	0.0916	0.1059	0.1150	0.1669
MSE	1.0131	0.6664	0.1296	0.0132	0.0231	0.0188	0.1125

Table 4. Bias, SD and MSE according to the parameter

τ

for PLGSIMF with the identity link function and

n = 1000

.

	$τ_{1}$	$τ_{2}$
Bias	0.0054	0.0021
SD	0.0023	0.0021
MSE	$3.5152 \times 10^{-5}$	$9.5050 \times 10^{-6}$

Table 5. Bias, SD and MSE of the estimates of the parameter

γ

for PLGSIMF with the identity link function and

n = 1000

.

	$γ_{1}$	$γ_{2}$	$γ_{3}$	$γ_{4}$	$γ_{5}$	$γ_{6}$	$γ_{7}$	$γ_{8}$	$γ_{9}$
Bias	−0.0969	−0.2220	−0.1260	−0.0815	−0.0806	−0.0135	0.2509	0.2234	0.1636
SD	0.1583	0.2194	0.2478	0.3041	0.2098	0.3092	0.2509	0.2234	0.1636
MSE	0.0344	0.0974	0.0773	0.0991	0.0505	0.0958	0.1259	0.0998	0.0535

Table 6. Bias, SD and MSE of the estimates of the parameter

δ

for PLGSIMF with the identity link function and

n = 1000

.

	$δ_{1}$	$δ_{2}$	$δ_{3}$	$δ_{4}$	$δ_{5}$	$δ_{6}$	$δ_{7}$
Bias	−0.2189	0.0665	−0.0113	−0.0025	−0.0171	−0.0006	−0.0231
SD	1.1558	0.3477	0.1447	0.0644	0.0712	0.0716	0.1296
MSE	1.3839	0.1253	0.0210	0.0041	0.0053	0.0051	0.0173

Table 7. Bias, SD and MSE evolutions with respect to the parameter

τ