Abstract
The variable selection problem is studied in the sparse semi-functional partial linear model, where the functional covariate influences the response through a single-index structure. The penalized least-squares procedure is employed for this task. Some properties of the resulting estimators are derived: the existence (and rate of convergence) of a consistent estimator for the parameters in the linear part and an oracle property for the variable selection method. Finally, a real data application illustrates the good performance of our procedure.
1. Introduction
In many real problems, observations of many other variables are available to predict the value of a random variable. However, in many cases it is unknown which of them (very few) have a real influence on the response. In this practical framework, we need procedures able to select the relevant variables to avoid high-dimensionality problems. Reducing the complexity of the model becomes even more crucial when the regression also involves a functional variable (data are functions, images...). Therefore, the main goal is the simplification of the model, which makes both its estimation and its interpretation easier, without losing its predictive efficiency.
These practical problems have motivated the rise of semiparametric models in functional regression, together with variable selection procedures. In [1] the penalized least-squares method for estimation and variable selection is studied for the partial linear model with functional covariate. In this model, the real variables have a linear effect on the response (involving interpretable coefficients that are the parameters), while the infinite-dimensional covariate has a nonlinear (nonparametric) influence. However, in real data applications, it would be interesting to have parameters related to the functional variable in order to derive practical interpretations. This is one of the advantages of the semi-functional partial linear single-index model (SFPLSIM): the real covariates also affect the response linearly, but the infinite-dimensional covariate influences it through a projection on an unknown direction, after applying a nonlinear link function. This direction of projection behaves like a function-parameter that could have interesting interpretations. Some theoretical properties related to the nonparametric estimation of the functional single-index model are given in [2]. In this paper, we study the sparse SFPLSIM, focusing on the variable selection problem. For this purpose, we use the penalized least-squares procedure for estimating the parameters of the linear components and, simultaneously, selecting the relevant covariates. The properties of the estimators are analysed from a theoretical point of view: we establish their convergence rates and the consistency of the model selection. These results are illustrated through a real data application.
2. The Model
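The SFPLSIM is defined by the relationship
$$Y_i = X_{i1}\beta_{01} + \cdots + X_{ip_n}\beta_{0p_n} + m\left(\langle \theta_0, \mathcal{X}_i \rangle\right) + \varepsilon_i, \quad i = 1, \dots, n, \tag{1}$$
where $Y_i$ denotes a scalar response, $X_{i1}, \dots, X_{ip_n}$ are random covariates taking values in $\mathbb{R}$ and $\mathcal{X}_i$ is a functional random covariate valued in a separable Hilbert space $\mathcal{H}$ with inner product $\langle \cdot, \cdot \rangle$. Here $\beta_0 = (\beta_{01}, \dots, \beta_{0p_n})^\top$, $\theta_0 \in \mathcal{H}$ and $m(\cdot)$ are a vector of unknown real parameters, an unknown functional direction and an unknown smooth real-valued function, respectively. Finally, $\varepsilon_i$ is the random error, which verifies
$$\mathbb{E}\left(\varepsilon_i \mid X_{i1}, \dots, X_{ip_n}, \mathcal{X}_i\right) = 0. \tag{2}$$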
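To make the structure of the model concrete, the following minimal sketch simulates data in which scalar covariates act linearly and a curve acts through a link function applied to its projection on a direction. Everything specific here is an assumption made for illustration only: the curves are discretized on a grid of $[0,1]$, the inner product is the $L^2$ one approximated by the trapezoidal rule, and the names `inner_product`, `simulate_sfplsim`, the curve generator, the link `np.tanh` and the sparse coefficient vector are hypothetical choices, not the authors' setting.

```python
import numpy as np

def inner_product(f, g, grid):
    """Trapezoidal approximation of the L2 inner product on the grid."""
    prod = f * g
    return np.sum((prod[..., 1:] + prod[..., :-1]) * np.diff(grid) / 2.0, axis=-1)

def simulate_sfplsim(n, beta0, theta0, grid, link, noise_sd=0.1, seed=0):
    """Draw (Y, X, curves, index) from a partial linear single-index structure."""
    rng = np.random.default_rng(seed)
    p = len(beta0)
    X = rng.normal(size=(n, p))                 # scalar covariates (assumed Gaussian)
    # Hypothetical curves: random combinations of two smooth functions.
    a = rng.uniform(0.0, 1.0, size=(n, 1))
    b = rng.normal(size=(n, 1))
    curves = a * np.cos(2 * np.pi * grid) + b * np.sin(2 * np.pi * grid)
    index = inner_product(curves, theta0, grid)  # projection on the direction theta0
    eps = rng.normal(scale=noise_sd, size=n)     # centred errors
    Y = X @ beta0 + link(index) + eps            # linear part + link(projection) + error
    return Y, X, curves, index

grid = np.linspace(0.0, 1.0, 101)
theta0 = np.sqrt(2.0) * np.sin(np.pi * grid)     # assumed direction with unit L2 norm
beta0 = np.array([2.0, -1.0, 0.0, 0.0, 0.0])     # sparse: only two relevant covariates
Y, X, curves, index = simulate_sfplsim(200, beta0, theta0, grid, link=np.tanh)
```

In this toy setting only the first two scalar covariates are relevant, which is the kind of sparsity a variable selection procedure is expected to detect.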
3. The Penalized Least-Squares Estimators
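For the purpose of simultaneously estimating the $\beta$-parameters and selecting the relevant $X$-covariates in the SFPLSIM (1), we apply the penalized least-squares approach. In a first step, we transform the SFPLSIM into a linear model by extracting from $Y_i$ and $X_{ij}$ ($j = 1, \dots, p_n$) the effect of the functional covariate $\mathcal{X}_i$ when it is projected on the direction $\theta_0$. Specifically, denoting $X_i = (X_{i1}, \dots, X_{ip_n})^\top$, the fact that
$$Y_i - \mathbb{E}\left(Y_i \mid \langle \theta_0, \mathcal{X}_i \rangle\right) = \left(X_i - \mathbb{E}\left(X_i \mid \langle \theta_0, \mathcal{X}_i \rangle\right)\right)^\top \beta_0 + \varepsilon_i$$
allows us to consider the following approximate linear model (see Appendix A for the notation):
$$\widetilde{\mathbf{Y}}_{\theta_0} \approx \widetilde{\mathbf{X}}_{\theta_0} \beta_0 + \boldsymbol{\varepsilon}, \tag{3}$$
where $\mathbf{Y} = (Y_1, \dots, Y_n)^\top$, $\mathbf{X}$ is the $n \times p_n$ matrix with entries $X_{ij}$ and $\boldsymbol{\varepsilon} = (\varepsilon_1, \dots, \varepsilon_n)^\top$. Then, in a second step, the penalized least-squares approach is applied to model (3). Specifically, $\beta_0$ and $\theta_0$ are estimated by considering a minimizer, $(\hat{\beta}_0, \hat{\theta}_0)$, of the penalized profile least-squares function
$$Q(\beta, \theta) = \frac{1}{2} \left(\widetilde{\mathbf{Y}}_{\theta} - \widetilde{\mathbf{X}}_{\theta} \beta\right)^\top \left(\widetilde{\mathbf{Y}}_{\theta} - \widetilde{\mathbf{X}}_{\theta} \beta\right) + n \sum_{j=1}^{p_n} P_{\lambda_{j_n}}\left(|\beta_j|\right), \tag{4}$$
where $\beta = (\beta_1, \dots, \beta_{p_n})^\top$, $P_{\lambda_{j_n}}(\cdot)$ is a penalty function and $\lambda_{j_n} > 0$ is a tuning parameter. Note that, simultaneously with the parameter estimation, the previous procedure can be considered as a variable selection method: if $\hat{\beta}_{0j}$ is a non-null component of $\hat{\beta}_0$, then $X_j$ is selected as an influential variable.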
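As an illustration of the second step, the sketch below evaluates a penalized least-squares criterion (half the residual sum of squares plus $n$ times the sum of the penalties) for a fixed direction, taking the transformed response and design as given inputs, and reads off the selected variables from a coefficient vector. The text only requires "a penalty function"; the SCAD penalty with $a = 3.7$ used here is an assumed, common choice, and the function names `scad_penalty`, `penalized_objective` and `selected_variables` are hypothetical.

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty evaluated at |t| (assumed choice of penalty function)."""
    t = np.abs(t)
    linear = lam * t                                             # |t| <= lam
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |t| <= a*lam
    constant = lam**2 * (a + 1) / 2                              # |t| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))

def penalized_objective(beta, Y_tilde, X_tilde, lam):
    """Half the residual sum of squares plus n times the summed penalties.

    Y_tilde and X_tilde are the already transformed response and design;
    lam may be a scalar or one tuning parameter per coefficient.
    """
    n = len(Y_tilde)
    resid = Y_tilde - X_tilde @ beta
    return 0.5 * resid @ resid + n * np.sum(scad_penalty(beta, lam))

def selected_variables(beta_hat, tol=1e-8):
    """Indices of the coefficients regarded as non-null."""
    return np.flatnonzero(np.abs(beta_hat) > tol)
```

A full estimator would minimize this criterion jointly over the coefficients and the direction; that optimization is deliberately left out of the sketch.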
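From now on, we will denote by $S_n \subset \{1, \dots, p_n\}$ the index set such that $\beta_{0j} \neq 0$ for $j \in S_n$ and $\beta_{0j} = 0$ for $j \notin S_n$. In addition, $s_n$ will mean $\operatorname{card}(S_n)$ and we will assume that .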
4. Asymptotic Theory
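In this paper, the existence of the penalized estimator is established, together with the corresponding rates of convergence. In particular, under some assumptions, we prove that there exists a local minimizer $(\hat{\beta}_0, \hat{\theta}_0)$ of the penalized profile least-squares function for which $\hat{\beta}_0$ is consistent for $\beta_0$, and we obtain its rate of convergence.
Furthermore, the selected set of variables, $\hat{S}_n = \{j : \hat{\beta}_{0j} \neq 0\}$, works as well (at least asymptotically) as it would if the true set of relevant variables, $S_n = \{j : \beta_{0j} \neq 0\}$, were known. Specifically, $P(\hat{S}_n = S_n) \to 1$ as $n \to \infty$.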
An application to real data is included, which shows the good performance of the presented method in terms of error of prediction.
Funding
The authors acknowledge partial support by MINECO grants MTM2014-52876-R and MTM2017-82724-R (EU ERDF support included). Additionally, financial support from the Xunta de Galicia (Centro Singular de Investigación de Galicia accreditation ED431G/01 2016-2019 and Grupos de Referencia Competitiva ED431C2016-015) and the European Union (European Regional Development Fund - ERDF), is gratefully acknowledged. The first author also thanks the financial support from the Xunta de Galicia and the European Union (European Social Fund - ESF), the reference of which is ED481A-2018/191.
Conflicts of Interest
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Abbreviations
The following abbreviations are used in this manuscript:
| SFPLSIM | Semi-functional partial linear single index model |
Appendix A. Notation
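For any $(n \times q)$-matrix $\mathbf{A}$ ($q \geq 1$), if $\mathbf{I}$ is the $(n \times n)$-identity matrix, we denote
$$\widetilde{\mathbf{A}}_{\theta} = \left(\mathbf{I} - \mathbf{W}_{h,\theta}\right) \mathbf{A}, \quad \text{where} \quad \mathbf{W}_{h,\theta} = \left(w_{n,h,\theta}(\mathcal{X}_i, \mathcal{X}_j)\right)_{i,j = 1, \dots, n},$$
with $w_{n,h,\theta}(\cdot, \cdot)$ being the weight function
$$w_{n,h,\theta}(\chi, \mathcal{X}_i) = \frac{K\left(d_{\theta}(\chi, \mathcal{X}_i)/h\right)}{\sum_{j=1}^{n} K\left(d_{\theta}(\chi, \mathcal{X}_j)/h\right)},$$
where $K$ is a kernel function, $h > 0$ is a smoothing parameter and, for $\theta \in \mathcal{H}$, $d_{\theta}(\cdot, \cdot)$ is the semimetric defined as
$$d_{\theta}(\chi_1, \chi_2) = \left|\langle \theta, \chi_1 - \chi_2 \rangle\right|, \quad \chi_1, \chi_2 \in \mathcal{H}.$$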
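The notation of this appendix can be sketched numerically as follows: kernel weights are computed from the distance between projections of the curves on a direction, normalized by rows, and then used to remove the smoothed part of any matrix. This is a minimal sketch under stated assumptions, not the authors' implementation: curves are discretized on a grid of $[0,1]$, the inner product is approximated by the trapezoidal rule, the kernel is an Epanechnikov-type kernel supported on $[0,1]$, and the names `projection_semimetric`, `kernel`, `weight_matrix` and `tilde` are hypothetical.

```python
import numpy as np

def projection_semimetric(curves, theta, grid):
    """Pairwise distances |<theta, chi_i - chi_j>| between discretized curves."""
    prod = curves * theta
    proj = np.sum((prod[:, 1:] + prod[:, :-1]) * np.diff(grid) / 2.0, axis=1)
    return np.abs(proj[:, None] - proj[None, :])

def kernel(u):
    """Epanechnikov-type kernel supported on [0, 1] (assumed choice)."""
    return np.where((u >= 0.0) & (u <= 1.0), 1.5 * (1.0 - u**2), 0.0)

def weight_matrix(curves, theta, grid, h):
    """Row-normalized kernel weights; every row sums to one."""
    D = projection_semimetric(curves, theta, grid)
    K = kernel(D / h)          # diagonal entries equal kernel(0) > 0, so rows never sum to 0
    return K / K.sum(axis=1, keepdims=True)

def tilde(A, W):
    """Apply (I - W) to a vector or matrix A."""
    return A - W @ A

rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 51)
curves = (rng.normal(size=(30, 1)) * np.sin(np.pi * grid)
          + rng.normal(size=(30, 1)) * np.cos(np.pi * grid))
theta = np.ones_like(grid)     # hypothetical direction
W = weight_matrix(curves, theta, grid, h=0.5)
```

Because each row of the weight matrix sums to one, applying `tilde` to a constant column returns zeros, which reflects that the transformation removes whatever is explained by the projected functional covariate.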
References
- Aneiros, G.; Ferraty, F.; Vieu, P. Variable selection in partial linear regression with functional covariate. Statistics 2015, 49, 1322–1347.
- Novo, S.; Aneiros, G.; Vieu, P. Automatic and location-adaptive estimation in functional single-index regression. 2018; in press.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).