Partially Linear Generalized Single Index Models for Functional Data (PLGSIMF)

Abstract: Single-index models are potentially important tools for multivariate non-parametric regression analysis. They generalize linear regression models by replacing the linear combination $\alpha_0^\top X$ with a non-parametric component $\eta_0(\alpha_0^\top X)$, where $\eta_0(\cdot)$ is an unknown univariate link function. In this article, we generalize these models to have a functional component, extending the generalized partially linear single-index model $\eta_0(\alpha_0^\top X) + \beta_0^\top Z$, where $\alpha_0$ is a vector in $\mathbb{R}^d$ and $\eta_0(\cdot)$ and $\beta_0(\cdot)$ are unknown functions to be estimated. We propose estimators of the unknown parameter $\alpha_0$ and of the unknown functions $\beta_0(\cdot)$ and $\eta_0(\cdot)$, and we establish their asymptotic distributions. A simulation study is carried out to evaluate the models and the effectiveness of the proposed estimation methodology. The algorithm is developed for both the identity link function and the logistic link function. Simulations show that the PLGSIMF algorithm works well in both cases.


Introduction
Generalized linear models (GLMs) were proposed by Nelder and Wedderburn [1] in the form $g(\mu(X)) = \beta^\top X$; for a detailed review, we refer the reader to McCullagh and Nelder [2]. A GLM consists of a random component and a systematic component. GLMs assume that the responses come from the exponential dispersion model family. They extend linear models by relating the predictors to a function of the mean of a continuous or discrete response through a canonical link function. These models encounter several problems: the canonical link function is sometimes unknown, the relationship between the response and the predictors can be complex, and they suffer from the curse of dimensionality. To address these problems, several approaches have been developed. Hastie and Tibshirani [3] proposed generalized additive models (GAMs), in which the linear predictor depends linearly on smooth functions of the predictor variables; one criticism of these models is that they do not take into account the interactions between covariates. The books of Wood [4] and Dunn and Smyth [5] are recent references dealing with these two models. Over the last few decades, the single-index model has been employed to reduce the dimensionality of the data and to avoid the "curse of dimensionality" while maintaining the advantages of non-parametric smoothing in multivariate regression; see, for example, the work of Lai et al. [6].
The single index $\alpha^\top X$ aggregates the influence of the observed values $X = (X_1, \ldots, X_d)^\top$ of the explanatory variables into one number.
Examples of economic indices include a stock index, an inflation index, a cost-of-living index, and a price index. This idea was first extended to the functional setting by Ferraty, Vieu et al. [7] for functional regression problems, which led to the functional single index $\int_0^1 \beta(t) Z(t)\,dt$, introduced to account for interaction effects, to mitigate the curse of dimensionality and to accommodate functional random variables.
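As a rough illustration of how the functional single index $\int_0^1 \beta(t) Z(t)\,dt$ can be evaluated when the curve is only observed on a discrete grid, the following sketch uses the composite trapezoidal rule. The functions `trapezoid` and `functional_index`, the grid, and the choices $\beta(t) = \sin(\pi t)$ and $Z(t) = t$ are illustrative assumptions and are not taken from the paper.

```python
import math

def trapezoid(values, grid):
    """Composite trapezoidal rule for samples `values` observed on `grid`."""
    total = 0.0
    for k in range(1, len(grid)):
        total += 0.5 * (values[k] + values[k - 1]) * (grid[k] - grid[k - 1])
    return total

def functional_index(beta, z, grid):
    """Approximate the integral of beta(t) * z(t) over the range of `grid`."""
    products = [beta(t) * z(t) for t in grid]
    return trapezoid(products, grid)

# Hypothetical example: beta(t) = sin(pi * t) and Z(t) = t on [0, 1].
grid = [k / 1000 for k in range(1001)]
index_value = functional_index(lambda t: math.sin(math.pi * t), lambda t: t, grid)
```

For this particular choice the integral has the closed form $1/\pi$, which gives a simple way to check the quadrature on a fine grid.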
The paper is organized as follows. In Sections 1 and 2, we situate our model in the literature and present the Fisher scoring update algorithm used to estimate the single-index vector, the non-parametric function and the slope function. In Section 3, we study the asymptotic behavior of the estimators presented in the paper. Numerical simulations in the Gaussian and logistic cases are presented in Section 4. The proofs of the results are developed in Section 6, and Appendix A contains the technical lemmas needed for the asymptotic study of the non-parametric function, the single-index vector and the slope function.
Let H be a separable Hilbert space endowed with the scalar product $\langle \cdot, \cdot \rangle_H$ and the norm $\|\cdot\|_H$. Let Y be a scalar response variable and $(X, Z) \in \mathbb{R}^d \times H$ be the predictor vector, where $X = (X_1, \ldots, X_d)^\top$ and Z is a functional random variable valued in H. For a fixed $(x, z) \in \mathbb{R}^d \times H$, we assume that the conditional density function of the response Y given (X, Z) = (x, z) belongs to the following canonical exponential family:
$$f_{Y|X=x,Z=z}(y) = \exp\{\, y\,\xi(x,z) - B(\xi(x,z)) + C(y) \,\}, \qquad (1)$$
where B and C are two known functions defined from $\mathbb{R}$ into $\mathbb{R}$, and $\xi : \mathbb{R}^d \times H \to \mathbb{R}$ is the parameter in the generalized parametric linear model, which is linked to the dependent variable by
$$\mu(x,z) = E[Y \mid X = x, Z = z] = B'(\xi(x,z)),$$
where $B'$ denotes the first derivative of the function B. In what follows, we consider the function $g(\mu(x,z))$ as a generalized single-index partially functional linear model:
$$g(\mu(x,z)) = \eta_0(\alpha^\top x) + \int_0^1 \beta(t)\, z(t)\,dt, \qquad (3)$$
where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_d)^\top \in \mathbb{R}^d$ is the d-dimensional single-index coefficient vector, β is the coefficient function in the functional component, and $\eta_0$ is the unknown single-index link function, which is assumed to be sufficiently smooth. If the conditional variance is $\mathrm{Var}(Y \mid X = x, Z = z) = \sigma^2 V(\mu(x,z))$, where V is an unknown positive function, then the estimation of the mean function $g(\mu)$ may be obtained by replacing the log-likelihood $\log f_{Y|X=x,Z=z}$ given by (1) by the quasi-likelihood Q(u, v) given by
$$\frac{\partial}{\partial u} Q(u, v) = \frac{v - u}{V(u)}$$
for any real numbers u and v, which may be written as
$$Q(u, v) = \int_v^u \frac{v - s}{V(s)}\,ds.$$

Estimation Methodology
Let $(X_i, Y_i, Z_i)_{i=1,\ldots,n}$ be a sequence of independent and identically distributed (i.i.d.) random elements distributed as (X, Y, Z). We assume that the function $\eta_0$ is supported within the interval [a, b], where $a = \inf(\alpha^\top X)$ and $b = \sup(\alpha^\top X)$.
We introduce a sequence of knots $(k_m)$ in the interval [a, b], with J interior knots, such that $k_{-r+1} = \cdots = k_{-1} = k_0 = a < k_1 < \cdots < k_J < b = k_{J+1} = \cdots = k_{J+r}$, where $J := J_n$ is a sequence of integers that increases with the sample size n. Now, let $N_n = J_n + r$ be the number of B-spline basis functions, $(B_j(u))_{j=1,\ldots,N_n}$ be the B-spline basis functions of order r, and $h = (b - a)/(J_n + 1)$ be the distance between neighboring knots. Let $S_n$ be the space of polynomial splines on [a, b] of order r ≥ 1. By De Boor [18], we can approximate $\eta_0$, assumed to belong to H(p) (which will be defined in Section 3), by a function $\tilde\eta \in S_n$. We can therefore write $\tilde\eta(u) = \tilde\gamma^\top B(u)$, where $B(u) = (B_1(u), \ldots, B_{N_n}(u))^\top$ is the spline basis and $\tilde\gamma \in \mathbb{R}^{N_n}$ is the spline coefficient vector.
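The knot construction and the B-spline basis described above can be sketched with the Cox-de Boor recursion. This is a minimal illustration that assumes equally spaced interior knots, boundary knots repeated r times at a and b, and a right-closed last interval; the function names `make_knots` and `bspline_basis` are hypothetical and the code is not the authors' implementation.

```python
def make_knots(a, b, n_interior, order):
    """Equally spaced interior knots with `order`-fold boundary knots at a and b."""
    h = (b - a) / (n_interior + 1)  # distance between neighboring knots
    interior = [a + j * h for j in range(1, n_interior + 1)]
    return [a] * order + interior + [b] * order

def bspline_basis(u, knots, order):
    """Values at u of all B-spline basis functions of the given order (Cox-de Boor)."""
    n_intervals = len(knots) - 1
    # Order 1: indicator functions of the half-open knot intervals.
    values = [1.0 if knots[j] <= u < knots[j + 1] else 0.0 for j in range(n_intervals)]
    if u == knots[-1]:
        # Close the last non-empty interval so the basis is defined at u = b.
        last = max(j for j in range(n_intervals) if knots[j] < knots[j + 1])
        values[last] = 1.0
    # Raise the order step by step; empty knot spans contribute zero.
    for k in range(2, order + 1):
        new_values = []
        for j in range(len(knots) - k):
            left_den = knots[j + k - 1] - knots[j]
            right_den = knots[j + k] - knots[j + 1]
            left = (u - knots[j]) / left_den * values[j] if left_den > 0 else 0.0
            right = (knots[j + k] - u) / right_den * values[j + 1] if right_den > 0 else 0.0
            new_values.append(left + right)
        values = new_values
    return values  # length n_interior + order

# Hypothetical example: cubic splines (order 4) with 5 interior knots on [0, 1].
knots = make_knots(0.0, 1.0, n_interior=5, order=4)
basis_at_half = bspline_basis(0.5, knots, order=4)
```

With J interior knots and order r, the returned vector has J + r entries, and the entries sum to one at every point of [a, b], which reflects the normalization of the basis.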
We introduce a new knot sequence $0 = t_0 < t_1 < \cdots < t_{k+1} = 1$ of [0, 1]. Then there exist N = k + r + 1 normalized B-spline basis functions of order r, on which the slope function β is expanded with coefficient vector δ, and w and $W_i$ are defined according to (5). The mean function estimator $\hat\mu(x, z)$ is then obtained by evaluating the parameter $\theta = (\alpha^\top, \gamma^\top, \delta^\top)^\top$ and by inverting the link function g. The parameter θ is determined by maximizing the quasi-likelihood, where $U_{0i} = \alpha_0^\top X_i$ and $\alpha_0, \gamma_0, \delta_0, \eta_0, \beta_0$ denote the true values of α, γ, δ, η and β, respectively.
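To handle the constraints $\|\alpha\| = 1$ and $\alpha_1 > 0$ on the d-dimensional index α, we use a re-parameterization similar to that of Yu and Ruppert [19]:
$$\alpha(\tau) = \left(\sqrt{1 - \|\tau\|^2},\ \tau_1, \ldots, \tau_{d-1}\right)^\top, \qquad \tau \in \mathbb{R}^{d-1}.$$
The true value $\tau_0$ of τ must satisfy $\|\tau_0\| \le 1$; we assume that $\|\tau_0\| < 1$. The Jacobian matrix of α with respect to τ is
$$\frac{\partial \alpha}{\partial \tau^\top} = \begin{pmatrix} -\tau^\top / \sqrt{1 - \|\tau\|^2} \\ I_{d-1} \end{pmatrix}.$$
Notice that τ is unconstrained and is one dimension lower than α. Finally, denote $q_l(m, y) = \frac{\partial^l}{\partial m^l} Q(g^{-1}(m), y)$, for l = 1, 2. The score vector and the expectation of the Hessian matrix of the quasi-likelihood are expressed in terms of $q_1$ and $q_2$, and θ is estimated by iterating the Fisher scoring update equations. It follows that $\hat\alpha = \alpha(\hat\tau)$ is the estimator of the single-index coefficient vector of the PLGSIMF model.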
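The re-parameterization can be illustrated with a short sketch. It assumes the Yu and Ruppert-type map that sends an unconstrained vector τ with norm below one to α(τ) = (sqrt(1 - ||τ||²), τ_1, ..., τ_{d-1}), so that α has unit norm and a positive first component. The function names `alpha_from_tau` and `jacobian` and the numerical values are hypothetical.

```python
import math

def alpha_from_tau(tau):
    """Map tau (with ||tau|| < 1) to alpha with ||alpha|| = 1 and alpha_1 > 0."""
    squared_norm = sum(t * t for t in tau)
    if squared_norm >= 1.0:
        raise ValueError("the re-parameterization requires ||tau|| < 1")
    return [math.sqrt(1.0 - squared_norm)] + list(tau)

def jacobian(tau):
    """d x (d-1) Jacobian of alpha(tau): row -tau / sqrt(1 - ||tau||^2), then identity."""
    root = math.sqrt(1.0 - sum(t * t for t in tau))
    m = len(tau)
    first_row = [-t / root for t in tau]
    identity = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    return [first_row] + identity

# Hypothetical example with d = 3.
tau = [0.3, -0.5]
alpha = alpha_from_tau(tau)
jac = jacobian(tau)
```

By the chain rule, derivatives of the quasi-likelihood with respect to τ are obtained from those with respect to α by multiplying with this Jacobian, which is what makes an unconstrained Fisher scoring iteration in τ possible.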

Some Asymptotics
We present asymptotic properties of the estimators of the non-parametric component, the single-index coefficient vector and the slope function of the PLGSIMF model. To this end, we need some assumptions.

Some Additional Notions and Assumptions
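Let ϕ, $\phi_1$ and $\phi_2$ be measurable functions on [a, b]. We define the empirical inner product $\langle \phi_1, \phi_2 \rangle_n$ and its corresponding norm $\|\phi\|_n$ as follows:
$$\langle \phi_1, \phi_2 \rangle_n = \frac{1}{n} \sum_{i=1}^{n} \phi_1(U_i)\,\phi_2(U_i), \qquad \|\phi\|_n^2 = \langle \phi, \phi \rangle_n, \qquad U_i = \alpha^\top X_i.$$
If ϕ, $\phi_1$ and $\phi_2$ are $L_2$-integrable, we define the theoretical inner product and its corresponding norm as follows:
$$\langle \phi_1, \phi_2 \rangle = E[\phi_1(U)\,\phi_2(U)], \qquad \|\phi\|^2 = \langle \phi, \phi \rangle, \qquad U = \alpha^\top X.$$
Let $v \in \mathbb{N}^*$ and $e \in (0, 1]$ such that p = v + e > 1.5. We denote by H(p) the collection of functions g defined on [a, b] whose v-th order derivative $g^{(v)}$ exists and satisfies the following e-th order Lipschitz condition:
$$|g^{(v)}(u_1) - g^{(v)}(u_2)| \le C\,|u_1 - u_2|^{e} \quad \text{for all } u_1, u_2 \in [a, b].$$
(C2) For all $m \in \mathbb{R}$ and for all y in the range of the response variable Y, the function $q_2(m, y)$ is strictly negative, and for k = 1, 2, there exist some positive constants $c_q$ and $C_q$ such that $c_q < |q_2^k(m, y)| < C_q$.
(C3) The marginal density function of $\alpha^\top X$ is continuous and bounded away from zero and infinity on its support [a, b]. The v-th order partial derivatives of the joint density function of X satisfy the Lipschitz condition of order α (α ∈ (0, 1]).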
(C4) For any vector τ, there exist positive constants c τ and C τ , such that where t = 1 + N n + N and T = X , W .
(C5) The number of knots $N_n$ satisfies n
(C10) For some finite positive constants $C_g$, $C_g^*$ and $M_1$, the link function g in model (3) satisfies:

Estimators Consistencies
Next we formulate several assertions on the considered estimators.

Estimation of the Nonparametric Component
The following theorem states the convergence, with rates, of the estimator $\hat\eta$.
where $O_{\mathbb{P}}$ denotes the Landau big-O notation in probability.
Proof of Theorem 1. The proof of this theorem is given in Appendix A.

Estimation of the Slope Function
Theorem 2. Under assumptions (C1)-(C8), and $k \sim n^{1/(2r+1)}$, we have
Proof of Theorem 2. The proof of this theorem is given in Appendix A.

Estimation of the Parametric Components
The next theorem shows that the maximum quasi-likelihood estimator is root-n consistent and asymptotically normal, although the convergence rate of the non-parametric component $\hat\eta$ is slower than root-n. Before stating the theorem, let us denote

Theorem 3.
Under assumptions (C1)-(C11), the constrained quasi-likelihood estimators $\hat\alpha$ and $\hat\delta$ with $\|\alpha\|_d = 1$ are jointly asymptotically normally distributed, i.e.,
where $\xrightarrow{D}$ denotes convergence in distribution, and
Proof of Theorem 3. The proof of this theorem is given in Appendix A.

Comments on the Assumptions
The smoothness condition in (C1) ensures that the single-index function $\eta_0(\cdot)$ can be approximated by functions in the B-spline space with a normalized basis. Condition (C2) ensures the uniqueness of the solution, whereas condition (C3) is a smoothness condition on the marginal and joint density functions of $\alpha^\top X$ and X. Condition (C5) gives the rate of growth of the dimension of the spline spaces relative to the sample size. Conditions (C6) and (C7) are required for the functional covariate Z, and (C8) is a smoothness condition on the slope function. Conditions (C4) and (C9)-(C11) are technical conditions used to prove the theorems stated in this article.
In this paper, we thus introduce a new generalized functional partially linear single-index model based on polynomial spline smoothing. The asymptotic properties of the resulting estimators are established under certain regularity assumptions, and the non-parametric component η and the slope function β are approximated by B-spline functions. Finally, we give some simulations to illustrate our results.

A Numerical Study
We conduct a simulation study in order to show our results' effectiveness. We will treat two main cases of link functions: the identity and the logit link functions.

Case 1: Identity Link Function
We consider the case where the link function is the identity, so that the model is given by Equation (6). The responses $Y_i$ are simulated according to Equation (6); the $X_i$ are drawn uniformly over the interval [−0.5, 0.5], whereas the errors are normally distributed with mean 0 and variance 0.01, $\varepsilon_i \sim N(0, 0.01)$. Moreover, we take the following coefficients
The functions β(·) and $Z_i(\cdot)$ are given by
The number of knots is selected according to the formula $C n^{1/(2r)} \log(n)$, where C ∈ [0.3, 1] (as in Wang and Cao [10]). We chose C = 0.6 and ran 300 replications with samples of sizes n = 500 and n = 1000.
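The knot-selection rule $C n^{1/(2r)} \log(n)$ can be evaluated as in the sketch below. Rounding down to an integer and the spline order r = 4 used in the example are assumptions made for illustration, since the text does not state them here; the function name `number_of_knots` is hypothetical.

```python
import math

def number_of_knots(n, order, c=0.6):
    """Knot count from C * n**(1 / (2 r)) * log(n); rounding down is an assumption."""
    if not 0.3 <= c <= 1.0:
        raise ValueError("C is expected to lie in [0.3, 1]")
    return int(math.floor(c * n ** (1.0 / (2 * order)) * math.log(n)))

# Hypothetical example with spline order r = 4 and the sample sizes of the study.
knots_500 = number_of_knots(500, order=4)
knots_1000 = number_of_knots(1000, order=4)
```

The rule grows slowly with n, so doubling the sample size adds only a small number of knots.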
Computations of the bias, the standard deviation (SD) and the Mean Squared Error (MSE) with respect to (i) the parameter τ, (ii) the parameter γ and (iii) the parameter δ are summarized, for n = 500 (respectively, n = 1000), in the following Tables 1-3 (respectively,  in the Tables 4-6).
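The bias, SD and MSE reported for each parameter can be computed from the replicated estimates as in the following sketch. The function name `monte_carlo_summary`, the use of the sample standard deviation with divisor (replications - 1), and the toy values are assumptions made for illustration only.

```python
import math

def monte_carlo_summary(estimates, true_value):
    """Bias, standard deviation and mean squared error over Monte Carlo replications."""
    n_rep = len(estimates)
    mean = sum(estimates) / n_rep
    bias = mean - true_value
    # Sample variance with divisor (n_rep - 1); this divisor is an assumption.
    variance = sum((e - mean) ** 2 for e in estimates) / (n_rep - 1)
    mse = sum((e - true_value) ** 2 for e in estimates) / n_rep
    return {"bias": bias, "sd": math.sqrt(variance), "mse": mse}

# Toy replications of a scalar estimator whose true value is 1.0.
summary = monte_carlo_summary([0.9, 1.1, 1.0, 1.0], true_value=1.0)
```

In the study itself, each entry of τ, γ and δ would be summarized in this way over the 300 replications.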

Case 2: Logit Link Function
By taking a logit link function, data are generated from the model for which we have kept the same parameters and variables as for the identity link function. Then, as in the identity link case, the bias, SD and MSE with respect to the parameters τ, γ and δ are summarized, for n = 500 (respectively, n = 1000), in Tables 7-9 (respectively, Tables 10-12). The simulations illustrate the quality of the estimators: the method performs quite well, and the bias, SD and MSE are reasonably small in general. The parametric and non-parametric components, the single index and the slope function are computed by the procedure given in this paper.
The tables for both cases indicate the consistency of $\hat\alpha$ and $\hat\delta$, since the bias, SD and MSE decrease as the sample size increases. As before, the number of knots is selected with the formula $C n^{1/(2r)} \log(n)$ with C ∈ [0.3, 1], as in Wang and Cao [10]; we have chosen C = 0.6.
We developed our algorithm in both cases: the identity link function and the logistic link function. Simulations show that the PLGSIMF algorithm works well in both cases.
In the figure below (Figure 1), we illustrate 500 realizations of the functional random variable Z.
In the following figure (Figure 2), we observe the near-linear relationship between the single index $u = \alpha(\tau)^\top X$ and its estimate $\hat u = \hat\alpha(\tau)^\top X$. In the figure below (Figure 3), we plot the slope function β(·) and its estimator $\hat\beta(\cdot)$. We consider that our model approximates the non-parametric function η(·) well.
To study the performance of our estimators of the non-parametric function η(·) and of the slope function, we use the square root of average squared errors criterion (RASE; see Peng et al. [20]). The following tables (Tables 13 and 14) summarize the sample means, medians and variances of the $\mathrm{RASE}_i$ (i = 1, 2) for different sample sizes in the Gaussian case.
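A minimal sketch of a RASE-type criterion is given below. It assumes that the criterion is the square root of the average squared difference between an estimated function and the true function over a set of grid points; the grid, the function name `rase` and the toy functions are illustrative and may differ from the exact definition used in the paper.

```python
import math

def rase(estimate, truth, grid):
    """Square root of the average squared error of `estimate` against `truth` on `grid`."""
    squared_errors = [(estimate(u) - truth(u)) ** 2 for u in grid]
    return math.sqrt(sum(squared_errors) / len(grid))

# Toy example: an estimate that is off by a constant 0.1 everywhere.
grid = [k / 100 for k in range(101)]
rase_value = rase(lambda u: math.sin(u) + 0.1, math.sin, grid)
```

The same function can be applied to the non-parametric function η and to the slope function β by passing the corresponding estimate, truth and grid.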
For the case n = 500, we get
For the case n = 1000, we get
The following tables (Tables 15 and 16) summarize the sample means, medians and variances of the $\mathrm{RASE}_i$ (i = 1, 2) for different sample sizes in the logistic case.
For the case n = 500, we get
For the case n = 1000, we get
We conclude that, as the sample size n increases from 500 to 1000, the sample mean, median and variance of $\mathrm{RASE}_i$ (i = 1, 2) decrease.

Application to Tecator Data
In this section, we apply the PLGSIMF model to the Tecator data, which are well known in functional data analysis. These data can be downloaded from http://lib.stat.cmu.edu/datasets/tecator (accessed on 1 August 2021). For more details, see Ferraty and Vieu [7].
For 215 finely chopped pieces of meat, the Tecator data contain the corresponding fat contents ($Y_i$, i = 1, . . . , 215), the near-infrared absorbance spectra ($Z_i$, i = 1, . . . , 215) observed at 100 equally spaced wavelengths in the range 850-1050 nm, the protein content $X_{1,i}$ and the moisture content $X_{2,i}$. We aim to predict the fat content of the finely chopped meat samples.
The following figure (Figure 5) shows the absorbance curves. We divide the sample randomly into two sub-samples: the training sample $I_1$ of size 160 and the test sample $I_2$ of size 55. The training sample is used to estimate the parameters, and the test sample is used to assess the quality of the predictions. To evaluate our model, we use the mean square error of prediction (MSEP), as in Aneiros-Pérez and Vieu [11], defined as follows:
$$\mathrm{MSEP} = \frac{1}{55} \sum_{i \in I_2} \frac{(Y_i - \hat Y_i)^2}{\mathrm{var}_{I_2}},$$
where $\hat Y_i$ is the predicted value based on the training sample and $\mathrm{var}_{I_2}$ is the variance of the responses in the test sample. The following table (Table 17) shows the performance of our PLGSIMF model in comparison with other models. We conclude that PLGSIMF is competitive for such data.
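The prediction criterion can be sketched as follows, assuming that MSEP is the average squared prediction error on the test sample divided by the variance of the test responses. Computing that variance with divisor n, the function name `msep` and the toy numbers are assumptions made for illustration.

```python
def msep(y_test, y_pred):
    """Mean squared prediction error on the test sample, scaled by the response variance."""
    n = len(y_test)
    mean_y = sum(y_test) / n
    # Variance of the test responses with divisor n; this divisor is an assumption.
    var_y = sum((y - mean_y) ** 2 for y in y_test) / n
    mse = sum((y - p) ** 2 for y, p in zip(y_test, y_pred)) / n
    return mse / var_y

# Toy example: predicting every test response by the test mean.
y_test = [1.0, 2.0, 3.0, 4.0]
baseline = msep(y_test, [2.5] * 4)
```

Under this normalization, a predictor that always returns the mean of the test responses scores 1 and a perfect predictor scores 0, which makes MSEP values comparable across response scales.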

Table 17. MSEP of the compared functional models.
The following figure (Figure 6) shows the estimator of the non-parametric function η(·).

Proofs
In what follows, when no confusion is possible, we will denote by C a generic positive constant.
The following lemmas (see [21-23]) will be used to prove Theorem 1. Their proofs are developed in Appendix A.
where $\Sigma_1$ and A are defined below and, in more detail, in the appendix, and $\xrightarrow{D}$ denotes convergence in distribution.
By applying the δ-method, we get the following lemma.

Lemma 2.
Under the conditions of Lemma 1, we obtain

Lemma 3.
Under the conditions of Lemma 1, we obtain
where $N_n$ is the number of B-spline basis functions of order r.
Then, we can state the following theorem.

Theorem 6.
Under assumptions (C1)-(C11), the constrained quasi-likelihood estimators $\hat\alpha$ and $\hat\delta$ with $\|\alpha\|_d = 1$ are asymptotically normally distributed, i.e.,
The proof of this theorem is very long. To save space and keep the paper readable, the necessary details are provided in the Supplementary Materials.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
In what follows, we present results and technical lemmas used in the proofs of the previous theorems.
First, for any probability measure Q, we denote by $L_2(Q)$ the space of square-integrable functions, i.e., $L_2(Q) = \{ f : Q f^2 = \int f^2\,dQ < \infty \}$. Let $\mathcal{F}$ be a subclass of $L_2(Q)$. For all $f \in \mathcal{F}$, we denote by $\|f\| = \left(\int f^2\,dQ\right)^{1/2}$ the norm of f with respect to Q. We give the following definitions, which are needed to follow the proofs.

• The δ-covering number $N(\delta, \mathcal{F}, L_2(Q))$ of $\mathcal{F}$ is the smallest value N for which there exist functions $f_1, f_2, \ldots, f_N$ such that, for each $f \in \mathcal{F}$, there exists $j \in \{1, \ldots, N\}$ such that $\|f - f_j\| \le \delta$. Notice that the $f_j$ are not necessarily in $\mathcal{F}$.

• For two functions l and u, the bracket [l, u] is the set of functions f such that l ≤ f ≤ u, i.e., $[l, u] = \{ f : l \le f \le u \}$. The δ-covering number with bracketing $N_{[\,]}(\delta, \mathcal{F}, L_2(Q))$ is defined as the smallest value of N needed to cover the whole of $\mathcal{F}$, for which there exist pairs of functions $(f_j^L, f_j^U)$, j = 1, . . . , N, with $\|f_j^U - f_j^L\| \le \delta$, such that for each $f \in \mathcal{F}$, there is a $j \in \{1, \ldots, N\}$ such that $f_j^L \le f \le f_j^U$. Notice that $f_j^L$ and $f_j^U$ are not necessarily in $\mathcal{F}$.

• The uniform entropy integral $J_{[\,]}(\delta, \mathcal{F}, L_2(Q))$ is defined by
$$J_{[\,]}(\delta, \mathcal{F}, L_2(Q)) = \int_0^{\delta} \sqrt{1 + \log N_{[\,]}(\varepsilon, \mathcal{F}, L_2(Q))}\; d\varepsilon.$$
Let $Q_n$ be the empirical measure of Q, i.e., the measure that puts mass 1/n on each of the n observations. We denote by $G_n = \sqrt{n}\,(Q_n - Q)$ the standardized empirical process indexed by $\mathcal{F}$, and $\|G_n\|_{\mathcal{F}} = \sup_{f \in \mathcal{F}} |G_n f|$.
where $c_0$ is a finite constant, which does not depend on n.
In what follows, we state the lemmas used to prove Theorem 2.
where J is the number of interior knots for $B_1$, and k is the number of interior knots for $B_2$.
In what follows, we give the lemmas used to prove Theorem 3.

Summary
In this paper, we introduce estimators for the Partially Linear Generalized Single-Index Model for Functional Data (PLGSIMF). Our estimators are obtained via the Fisher scoring update equations derived from the quasi-likelihood function and the normalized B-spline basis with its derivatives.
We prove the root-n consistency and asymptotic normality of the parametric estimators. First, we establish the convergence, with rates, of the estimator $\hat\eta$ to the true non-parametric function η. Second, we establish the convergence, with rates, of the estimator $\hat\beta$ to the slope function β. Finally, we show that the estimators $\hat\alpha$ and $\hat\delta$ of the single-index parameter α and of the functional parameter δ converge to the true parameters and are asymptotically normal. A numerical study reveals that our estimation procedure performs well in higher dimensions. The quality of the estimators is illustrated via simulations.