Sparse Semi-Functional Partial Linear Single-Index Regression

Abstract: The variable selection problem is studied in the sparse semi-functional partial linear model with single-index-type influence of the functional covariate on the response. The penalized least-squares procedure is employed for this task. Some properties of the resulting estimators are derived: the existence (and rate of convergence) of a consistent estimator for the parameters in the linear part, and an oracle property for the variable selection method. Finally, a real data application illustrates the good performance of our procedure.


Introduction
In many real problems, observations of many variables are available to predict the value of a random variable of interest. However, it is often unknown which of them (usually very few) have a real influence on the response. In this practical framework, we need procedures able to select the relevant variables and thus avoid high-dimensionality problems. Reducing the complexity of the model becomes even more crucial when the regression also involves a functional variable (data that are functions, images, etc.). Therefore, the main goal is the simplification of the model, which makes both its estimation and its interpretation easier, without losing predictive efficiency.
These practical problems have motivated the rise of semiparametric models in functional regression, together with variable selection procedures. In [1], the penalized least-squares method for estimation and variable selection is studied for the partial linear model with functional covariate. In this model, the real variables have a linear effect on the response (involving interpretable coefficients, which are the parameters), while the infinite-dimensional covariate has a nonlinear (nonparametric) influence. However, in real data applications, it would be interesting to have parameters related to the functional variable as well, in order to derive practical interpretations. This is one of the advantages of the semi-functional partial linear single-index model (SFPLSIM): the real covariates still affect the response in a linear way, but the infinite-dimensional covariate influences it through a projection onto an unknown direction, after applying a nonlinear link function. This direction of projection behaves like a function-parameter that could have interesting interpretations. Some theoretical properties related to the nonparametric estimation of the functional single-index model are given in [2]. In this paper, we study the sparse SFPLSIM, focusing on the variable selection problem. For this purpose, we use the penalized least-squares procedure to estimate the parameters of the linear component and, simultaneously, select the relevant covariates. The properties of the estimators are analysed from a theoretical point of view: we establish their convergence rates and the consistency of the variable selection. These results are illustrated through a real data application.

The Model
The SFPLSIM is defined by the relationship
$$Y_i = \sum_{j=1}^{p_n} X_{ij}\beta_{0j} + m\left(\langle\theta_0,\mathcal{X}_i\rangle\right) + \varepsilon_i, \quad i=1,\dots,n, \qquad (1)$$
where $Y_i$ denotes a scalar response, $X_{i1},\dots,X_{ip_n}$ are random covariates taking values in $\mathbb{R}$ and $\mathcal{X}_i$ is a functional random covariate valued in a separable Hilbert space $\mathcal{H}$ with inner product $\langle\cdot,\cdot\rangle$. Here, $\boldsymbol{\beta}_0=(\beta_{01},\dots,\beta_{0p_n})^{\top}\in\mathbb{R}^{p_n}$, $\theta_0\in\mathcal{H}$ and $m(\cdot)$ are a vector of unknown real parameters, an unknown functional direction and an unknown smooth real-valued function, respectively. Finally, $\varepsilon_i$ is the random error, which verifies $\mathbb{E}\left(\varepsilon_i \mid X_{i1},\dots,X_{ip_n},\mathcal{X}_i\right)=0$.
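To fix ideas, data from model (1) can be simulated as follows. This is only an illustrative sketch: the Brownian-type trajectories, the cosine direction $\theta_0$, the sine link $m$ and all sizes are hypothetical choices of ours, not taken from the paper.

```python
import numpy as np

# Illustrative data generation from an SFPLSIM; the choices of m, theta_0 and
# the covariate distributions below are hypothetical, not taken from the paper.
rng = np.random.default_rng(0)

n, p_n, s_n = 200, 50, 3           # sample size, number of scalar covariates, sparsity
grid = np.linspace(0.0, 1.0, 100)  # discretization of [0, 1] for the functional data
dx = grid[1] - grid[0]

# Scalar covariates and a sparse parameter vector: only the first s_n entries are non-null
X = rng.normal(size=(n, p_n))
beta0 = np.zeros(p_n)
beta0[:s_n] = [2.0, -1.5, 1.0]

# Functional covariates: Brownian-like trajectories observed on the grid
Xfun = np.cumsum(rng.normal(scale=np.sqrt(dx), size=(n, grid.size)), axis=1)

# Unknown direction theta_0 (normalized in L^2) and unknown smooth link m(u) = sin(u)
theta0 = np.cos(np.pi * grid)
theta0 /= np.sqrt(np.sum(theta0**2) * dx)
proj = np.sum(Xfun * theta0, axis=1) * dx  # <theta_0, X_i>, Riemann approximation

Y = X @ beta0 + np.sin(proj) + rng.normal(scale=0.1, size=n)  # model (1)
```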

The Penalized Least-Squares Estimators
For the purpose of simultaneously estimating the $\beta$-parameters and selecting the relevant $X$-covariates in the SFPLSIM (1), we apply the penalized least-squares approach. In a first step, we transform the SFPLSIM into a linear model by extracting from $Y_i$ and $X_{ij}$ ($j=1,\dots,p_n$) the effect of the functional covariate $\mathcal{X}_i$ when it is projected on the direction $\theta_0$. Specifically, denoting $\boldsymbol{X}_i=\left(X_{i1},X_{i2},\dots,X_{ip_n}\right)^{\top}$, $\boldsymbol{X}=\left(\boldsymbol{X}_1,\dots,\boldsymbol{X}_n\right)^{\top}$ and $\boldsymbol{Y}=\left(Y_1,\dots,Y_n\right)^{\top}$, the fact that
$$Y_i - \mathbb{E}\left(Y_i \mid \langle\theta_0,\mathcal{X}_i\rangle\right) = \left(\boldsymbol{X}_i - \mathbb{E}\left(\boldsymbol{X}_i \mid \langle\theta_0,\mathcal{X}_i\rangle\right)\right)^{\top}\boldsymbol{\beta}_0 + \varepsilon_i \qquad (2)$$
allows us to consider the following approximate linear model (see Appendix A for the notation):
$$\widetilde{\boldsymbol{Y}}_{\theta_0} \approx \widetilde{\boldsymbol{X}}_{\theta_0}\boldsymbol{\beta}_0 + \boldsymbol{\varepsilon}, \qquad (3)$$
where $\boldsymbol{\varepsilon}=\left(\varepsilon_1,\dots,\varepsilon_n\right)^{\top}$. Then, in a second step, the penalized least-squares approach is applied to model (3). Specifically, $\boldsymbol{\beta}_0$ and $\theta_0$ are estimated by considering a minimizer, $(\widehat{\boldsymbol{\beta}}_0,\widehat{\theta}_0)$, of the penalized profile least-squares function
$$\mathcal{Q}\left(\boldsymbol{\beta},\theta\right) = \frac{1}{2}\left(\widetilde{\boldsymbol{Y}}_{\theta}-\widetilde{\boldsymbol{X}}_{\theta}\boldsymbol{\beta}\right)^{\top}\left(\widetilde{\boldsymbol{Y}}_{\theta}-\widetilde{\boldsymbol{X}}_{\theta}\boldsymbol{\beta}\right) + n\sum_{j=1}^{p_n}\mathcal{P}_{\lambda_{j_n}}\left(|\beta_j|\right),$$
where $\boldsymbol{\beta}=\left(\beta_1,\dots,\beta_{p_n}\right)^{\top}$, $\mathcal{P}_{\lambda_{j_n}}(\cdot)$ is a penalty function and $\lambda_{j_n}>0$ is a tuning parameter. Note that, simultaneously with the parameter estimation, the previous procedure can be considered a variable selection method: if $\widehat{\beta}_{0j}$ is a non-null component of $\widehat{\boldsymbol{\beta}}_0$, then $X_j$ is selected as an influential variable. From now on, we denote $\mathcal{J}_n=\{1,\dots,p_n\}$ and $S_n\subset\mathcal{J}_n$ such that $\beta_{0j}\neq 0$ for $j\in S_n$ and $\beta_{0j}=0$ for $j\in S_n^c=\mathcal{J}_n\setminus S_n$. In addition, $s_n$ will denote $\mathrm{card}(S_n)$, and we will assume that $S_n=\{1,\dots,s_n\}$.
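The second step above can be sketched numerically. Assume the transformed data $\widetilde{\boldsymbol{Y}}_{\theta}$ and $\widetilde{\boldsymbol{X}}_{\theta}$ of model (3) are already available for a fixed direction $\theta$, and take the L1 penalty $\mathcal{P}_{\lambda}(u)=\lambda u$ as one concrete instance of the generic penalty (the framework also covers non-convex penalties such as SCAD). The minimization then reduces to a lasso-type problem solvable by proximal gradient descent; `penalized_ls` and all constants below are our own illustrative names and values, not the authors' implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * |.| (componentwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def penalized_ls(X_tilde, Y_tilde, lam, n_iter=500):
    """Minimize 0.5*||Y - X b||^2 + n*lam*||b||_1 by proximal gradient (ISTA).

    The L1 penalty is used here as a concrete, convex instance of the
    generic penalty P_lambda in the penalized profile least-squares function.
    """
    n, p = X_tilde.shape
    L = np.linalg.norm(X_tilde, 2) ** 2  # Lipschitz constant of the LS gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X_tilde.T @ (X_tilde @ beta - Y_tilde)
        beta = soft_threshold(beta - grad / L, n * lam / L)
    return beta

# Toy usage with a sparse truth: the penalty should zero out irrelevant coefficients
rng = np.random.default_rng(1)
n, p = 200, 20
Xt = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:2] = [1.5, -2.0]
Yt = Xt @ beta_true + rng.normal(scale=0.1, size=n)

beta_hat = penalized_ls(Xt, Yt, lam=0.05)
selected = np.flatnonzero(beta_hat != 0)  # estimated set of relevant variables
```

The non-null components of `beta_hat` play the role of the selected set $\widehat{S}_n$; estimation of the coefficients and selection of the variables are carried out in the same minimization.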

Asymptotic Theory
In this paper, the existence of the penalized estimator is established, as well as the corresponding rates of convergence. In particular, under some assumptions, we prove that there exists a local minimizer $(\widehat{\boldsymbol{\beta}}_0,\widehat{\theta}_0)$ of $\mathcal{Q}\left(\boldsymbol{\beta},\theta\right)$ such that
$$\left\|\widehat{\boldsymbol{\beta}}_0-\boldsymbol{\beta}_0\right\| = O_p\left(\sqrt{s_n}\,n^{-1/2} + \delta_n\right), \quad \text{where } \delta_n=\max_{j\in S_n}\mathcal{P}_{\lambda_{j_n}}\left(|\beta_{0j}|\right).$$
Furthermore, the selected set of variables, $\widehat{S}_n=\{j\in\mathcal{J}_n \,;\, \widehat{\beta}_{0j}\neq 0\}$, works (at least asymptotically) as well as it would if the true set of relevant variables $S_n$ were known. Specifically, $P(\widehat{S}_n=S_n)\to 1$ as $n\to\infty$.