On the Number of Independent Pieces of Information in a Functional Linear Model with a Scalar Response

Abstract: In a functional linear model (FLM) with scalar response, the parameter curve quantifies the relationship between a functional explanatory variable and a scalar response. While these models can be ill-posed, a penalized regression spline approach may be used to obtain an estimate of the parameter curve. The penalized regression spline estimate depends on the value of a smoothing parameter. However, the ability to obtain a reasonable parameter curve estimate relies on how much information is present in the covariate functions for estimating the parameter curve. We propose to quantify the information present in the covariate functions to estimate the parameter curve. In addition, we examine the influence of this information on the stability of the parameter curve estimator and on the performance of smoothing parameter selection methods in a FLM with a scalar response.


Introduction
Functional data analysis (FDA) continues to be an active and growing area of research as measurements from continuous processes become increasingly prevalent in many fields. Such data are called functional data because they can be viewed as samples from curves. Recent key references in FDA include those of Hsing and Eubank [1] and Kokoszka and Reimherr [2]. Consider the following linear model in which the response is a scalar but the explanatory variable is a function:

$$y_i = \alpha + \int_{T} x_i(t)\,\beta(t)\,dt + \varepsilon_i \qquad (1)$$

for $i = 1, 2, \ldots, n$. The response $y_i$ is a scalar value, $x_i(t)$ is a function, α is the intercept, β(t) is a parameter curve, and $\varepsilon_i$ is uncorrelated random noise with zero mean and constant variance $\sigma^2$.
Model (1) is also referred to as a scalar-on-function regression model. Here, we assume T = [0, 1], but note that any closed continuous domain T in R can be transformed to [0, 1]. Further, we assume that the covariate functions are known. The objective for model (1) is to estimate the smooth parameter curve, β(t). The parameter curve quantifies the relationship between the scalar response and a functional explanatory variable in the presence of uncertainty. A review of some common approaches for estimating β(t) is provided in [3]. These approaches depend on some variant of a tuning parameter (smoothing parameter, number of knots, bandwidth, number of retained principal components, etc.), where the size of the tuning parameter controls the trade-off between the goodness-of-fit and the smoothness of the parameter curve estimate. Regarding inference for model (1), some recently proposed methodologies include a goodness-of-fit test [4] and a test of linearity in a FLM with a scalar response [5]. Tekbudak et al. [6] provided a comparison of these and other recent testing procedures for linearity in a FLM with a scalar response. In influence diagnostics, Cook's distance [7] and Peña's distance [8] have been extended to a FLM with a scalar response [9].
In this paper, our aim is to quantify the amount of information present in the covariate functions for estimating the parameter curve when β(t) is identifiable and to assess the influence of the amount of this information on the numerical stability of the parameter curve estimator. In addition, we review the performance of different smoothing parameter selection methods based on the amount of information present in the covariate functions for estimating β(t). To our knowledge, no study has explicitly focused on this aspect of model (1).
Here, the concept of the numerical stability (referred to simply as stability hereafter) is tied to the idea that the parameter curve estimate will not substantially change when the set of covariate functions are slightly altered. In Section 2, we define a measure, denoted ζ (x(t)), to quantify the amount of information present in the covariate functions for estimating β(t) when the parameter curve is identifiable. There are various computational methods to estimate the smooth parameter curve, β(t), in model (1). Our approach is to estimate β(t) using penalized regression spline estimation. Penalized regression spline estimation of β(t) and common smoothing parameter selection methods are discussed in Section 3. Section 4 proposes measures to assess the performance of smoothing parameter selection methods and the stability of a parameter curve estimator. A simulation study that assesses the relationship between ζ (x(t)) and the stability of the parameter curve estimator, as well as examines the performance of different smoothing parameter selection methods under varying ζ (x(t))s, is given in Section 5. Section 6 provides a real data application of ζ (x(t)). We conclude with discussion in Section 7.

Number of Independent Pieces of Information in a FLM
Model (1) can be ill-posed to varying degrees. An ill-posed problem refers to one for which no solution exists, the solution is not unique, or the solution is unstable [10]. Cardot et al. [11] provided theoretical conditions for the existence and uniqueness of a solution to (1), where the solution falls in the space spanned by the eigenfunctions of the functional covariate's covariance operator in which the model space is a separable Hilbert space of square integrable functions defined on [0, 1]. In practice, it is generally assumed that theoretical conditions for identifiability are satisfied when estimating β(t) in model (1). In scalar-on-image regression models, Happ et al. [12] studied the impact of structural assumptions of the parameter image, such as smoothness and sparsity, on the model estimates, as well as measures to assess to what degree the assumptions are satisfied.
Our focus is assessing how much information is present in the covariate functions to estimate the parameter curve in model (1) when the parameter curve is identifiable. To our knowledge, no prior studies have given consideration to this aspect of model (1) and its influence on the stability of the parameter curve estimator. Given the different areas of application of model (1), assessing this relationship is essential as the stability of the parameter curve estimate would influence the reliability of model uncertainty estimates. We aim to study this aspect of model (1) by proposing to quantify the information present in the covariate functions to estimate the parameter curve. The idea underlying this work is motivated by the work of Wahba [13].
Wahba [13] proposed the idea of the number of independent pieces of information to gauge whether one can obtain a reasonable solution in a type of general smoothing spline model (GSSM). In this context, one takes the eigendecomposition of the inner product of the representer of a bounded linear functional with itself and scales the eigenvalues by the reciprocal of the variance of the error component in the model. The number of scaled eigenvalues that are greater than one is considered to be the number of independent pieces of information. If the number of independent pieces of information in a GSSM is large, then a solution to a GSSM is recoverable. However, no explicit criterion was given to quantify how many pieces are required. Motivated by this, we define a measure for the number of independent pieces of information in the covariate functions for estimating β(t) in model (1). Let $v_1, \ldots, v_n$ be the eigenvalues from the eigendecomposition of $\int_0^1 x(t)\,x(t)^T\,dt$, where $x(t) = (x_1(t), \ldots, x_n(t))^T$. We define the number of independent pieces of information in the covariate functions for estimating β(t) as

$$\zeta(x(t)) = \sum_{j=1}^{n} I\!\left(\frac{v_j}{\sigma^2} > 1\right), \qquad (2)$$

where $I(\cdot)$ is the indicator function and $\sigma^2$ is the variance of the error component in model (1). Measure (2) may be estimated by plugging in an estimate of $\sigma^2$. An estimate of $\sigma^2$ is provided in Section 3.
As an illustration of (2), Figure 1 contains four different sets of covariate functions that vary in their number of independent pieces of information. These covariate functions are used in a simulation study in Section 5. Figures in this study were produced using the R packages ggplot2 [14] and cowplot [15]. If there is less information present in the covariate functions to estimate β(t) as quantified by (2), then one would anticipate that a reasonable or stable parameter curve estimator would be less feasible. To assess whether the size of ζ (x(t)) indicates the degree of stability of a solution to model (1), we propose a stability measure of a parameter curve estimator in Section 4. We address the relationship between ζ (x(t)) and the stability of the estimator in Section 5.
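As a hedged illustration of measure (2), the following Python sketch approximates the matrix $\int_0^1 x(t)\,x(t)^T\,dt$ by trapezoidal quadrature on a common grid and counts the eigenvalues that exceed σ². The function names, the grid-based quadrature, and the toy curves are assumptions made for illustration only; they are not the implementation used in this study, which was carried out in R.

```python
import numpy as np

def trapezoid_weights(t):
    """Trapezoidal quadrature weights for a (possibly uneven) grid t."""
    t = np.asarray(t, dtype=float)
    w = np.zeros_like(t)
    dt = np.diff(t)
    w[:-1] += dt / 2.0
    w[1:] += dt / 2.0
    return w

def zeta(X, t, sigma2):
    """Count eigenvalues v_j of G = int x(t) x(t)^T dt with v_j / sigma2 > 1.

    X is an (n, m) array whose i-th row holds x_i evaluated on the grid t.
    Returns the count and the eigenvalues sorted in decreasing order.
    """
    w = trapezoid_weights(t)
    G = (X * w) @ X.T                 # G[i, j] approximates int x_i(t) x_j(t) dt
    v = np.linalg.eigvalsh(G)[::-1]   # symmetric matrix, so eigenvalues are real
    return int(np.sum(v / sigma2 > 1.0)), v

# Toy curves (hypothetical): three curves spanning only two directions.
t = np.linspace(0.0, 1.0, 201)
e1 = np.sqrt(2.0) * np.sin(np.pi * t)
e2 = np.sqrt(2.0) * np.sin(2.0 * np.pi * t)
X = np.vstack([3.0 * e1, 0.5 * e2, 3.0 * e1 + 0.5 * e2])

count_hi_noise, v = zeta(X, t, sigma2=1.0)
count_lo_noise, _ = zeta(X, t, sigma2=0.1)
```

In this toy example the count grows as the error variance shrinks, which mirrors the dependence of (2) on σ²: the same covariate curves carry more usable pieces of information when the noise level is lower.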

Penalized Regression Spline Estimate of β(t)
Here, we review penalized regression spline estimation of the parameter curve β(t). Assuming that the parameter curve is smooth in the sense that it lies in a Sobolev space of order 4, we may seek an estimate of β(t) by minimizing

$$\sum_{i=1}^{n}\left(y_i - \alpha - \int_0^1 x_i(t)\,\beta(t)\,dt\right)^2 + \lambda \int_0^1 \left(\beta''(t)\right)^2 dt, \qquad (3)$$

where λ is the smoothing parameter and the term $\int_0^1 (\beta''(t))^2\,dt$ penalizes curvature in the estimate. To ease notation, from here forward, we assume the intercept α = 0, but a non-zero α value is easily incorporated into the computational approaches discussed.
We use the method of regularized basis functions [16], with a B-spline basis of order 4, to minimize fitting criterion (3) with respect to the parameter curve β(t). With this approach, each covariate function $x_i(t)$ and the parameter curve β(t) are represented using a linear combination of B-spline basis functions. Let $x_i(t)$ be represented as $\sum_{j=1}^{J_x} k_{i,j} N_j(t)$ for $i = 1, \ldots, n$, where $N_j(t)$ denotes the jth B-spline function of order 4. Similarly, we represent β(t) as $\sum_{l=1}^{J_\beta} c_l \tilde N_l(t)$, where $\tilde N_l(t)$ denotes the lth function of the B-spline basis used for β(t) and $c = (c_1, \ldots, c_{J_\beta})^T$. Then $\int_0^1 x_i(t)\beta(t)\,dt = k_i^T \Phi c$, where $k_i^T$ denotes the ith row of the matrix K that has elements $k_{i,j}$, and Φ is the $J_x \times J_\beta$ matrix with elements $\int_0^1 N_j(t)\,\tilde N_l(t)\,dt$. In this basis representation framework, penalized least squares criterion (3) is re-expressed as finding c to minimize

$$\lVert y - K\Phi c \rVert^2 + \lambda\, c^T R\, c, \qquad (4)$$

where $y = (y_1, \ldots, y_n)^T$ and R is the $J_\beta \times J_\beta$ matrix with elements $\int_0^1 \tilde N_j''(t)\,\tilde N_l''(t)\,dt$. For a given λ, an estimate of β(t) is obtained by minimizing (4) with respect to c via

$$\hat c_\lambda = \left(\Phi^T K^T K \Phi + \lambda R\right)^{-1} \Phi^T K^T y,$$

where the subscript λ signifies the dependence of the solution on the value of the smoothing parameter. Various data-driven approaches have been proposed to select the smoothing parameter λ in (3). These methods include Akaike's information criterion, Akaike's information criterion corrected, cross-validation, the generalized cross-validation criterion, the L-curve criterion, restricted maximum likelihood, the Schwarz information criterion, etc. Since the size of λ controls the size of the penalty in (3), most of these data-driven methods consist of two parts: one that measures the goodness of fit of the model and another that quantifies the complexity of the parameter curve estimate. Thus, these methods attempt to achieve an optimal balance between how well the model fits the data and the smoothness of the parameter curve estimate (see [4,9,17-19], and others cited therein, for examples of recent studies that have used one or more of these criteria to select the smoothing parameter in models of type (1)).
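The penalized least squares solution can be sketched numerically as follows. This is a minimal sketch under stated assumptions, not the B-spline implementation described above: β(t) is represented by its values on an equally spaced grid, the integral in model (1) is replaced by trapezoidal quadrature, and the curvature penalty is approximated with second differences. All function names are hypothetical.

```python
import numpy as np

def quad_design(X, t):
    """Matrix Z such that Z @ b approximates int x_i(t) beta(t) dt (trapezoid rule)."""
    h = t[1] - t[0]                      # assumes an equally spaced grid
    w = np.full(t.size, h)
    w[0] = w[-1] = h / 2.0
    return X * w

def curvature_penalty(t):
    """Matrix P such that b @ P @ b approximates int beta''(t)^2 dt."""
    h = t[1] - t[0]
    D = np.diff(np.eye(t.size), n=2, axis=0) / h**2   # second-difference operator
    return h * (D.T @ D)

def fit_penalized(X, y, t, lam):
    """Return the grid values of the estimate and the smoother matrix S_lambda."""
    Z = quad_design(X, t)
    A = Z.T @ Z + lam * curvature_penalty(t)
    b_hat = np.linalg.solve(A, Z.T @ y)   # penalized least squares solution
    S = Z @ np.linalg.solve(A, Z.T)       # fitted values are S @ y
    return b_hat, S

# Noise-free toy data (hypothetical) with a linear parameter curve.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
X = rng.normal(size=(30, t.size))
beta_true = 1.0 + 2.0 * t
y = quad_design(X, t) @ beta_true
b_hat, S = fit_penalized(X, y, t, lam=1e-3)
```

Because a linear curve has zero second differences, the penalty does not shrink it, so in this noise-free example the estimate reproduces the linear parameter curve for any positive λ. With noisy responses, larger values of λ pull the estimate toward a straight line.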
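In our study, we restrict our discussion to the following commonly used data-driven criteria: Akaike's information criterion corrected (AICc), cross-validation (CV), Schwarz information criterion (SIC), and the generalized cross-validation (GCV) criterion. The smoothing parameter selection methods used in our study are by no means exhaustive, nor are they meant to be. Rather, our intent is to explore whether the amount of information present in the covariate functions for estimating β(t) may affect the performance of a given smoothing parameter selection method. For each criterion, the value of the smoothing parameter λ that minimizes the criterion is assumed to be a reasonable value for λ. Each criterion discussed here depends on the residual sum of squares defined by

$$\mathrm{RSS}_\lambda = \lVert y - \hat y_\lambda \rVert^2 = \lVert (I - S_\lambda)\, y \rVert^2,$$

where $\hat y_\lambda = S_\lambda y$ is the vector of fitted values and $S_\lambda$ is the smoother matrix. Denote the minimizing solution of (3) when the ith case is deleted by $\hat\beta_\lambda^{(-i)}(t)$, where $(-i)$ symbolizes an estimate based on all observations except for the ith case. Since there are n cases that one can delete for a given λ, a cross-validation score is defined as

$$\mathrm{CV}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \int_0^1 x_i(t)\,\hat\beta_\lambda^{(-i)}(t)\,dt\right)^2.$$

A computationally friendly form of CV [16] is defined as

$$\mathrm{CV}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left(\frac{y_i - \hat y_{\lambda,i}}{1 - S_{ii}}\right)^2,$$

where $S_{ii}$ denotes the ith diagonal element of $S_\lambda$. The GCV criterion [20] replaces the diagonal elements of the smoother matrix in the CV formula by $\mathrm{tr}(S_\lambda)/n$ to obtain

$$\mathrm{GCV}(\lambda) = \frac{n\,\mathrm{RSS}_\lambda}{\left(n - \gamma\, df_\lambda\right)^2}, \qquad (5)$$

where $df_\lambda = \mathrm{tr}(S_\lambda)$. $df_\lambda$ is referred to as the effective degrees of freedom [21]. $\mathrm{RSS}_\lambda$ and $df_\lambda$ are commonly used to estimate $\sigma^2$ via

$$\hat\sigma^2 = \frac{\mathrm{RSS}_\lambda}{n - df_\lambda}.$$

The term γ in (5) represents an inflation of the effective degrees of freedom (EDF) for γ > 1. Inflation of the EDF is used to guard against GCV selecting a smoothing parameter that over-fits the data in non-parametric models [22]. Some simulation studies suggest γ = 1.4 to be reasonable in non-parametric models [22,23]. The SIC criterion [24] may be expressed as

$$\mathrm{SIC}(\lambda) = \log\!\left(\frac{\mathrm{RSS}_\lambda}{n}\right) + \frac{df_\lambda \log n}{n}.$$

The AICc criterion proposed by Hurvich et al. [25] penalizes complex estimates of β(t) more heavily than the SIC does for smaller sample sizes, and it may be defined as

$$\mathrm{AICc}(\lambda) = \log\!\left(\frac{\mathrm{RSS}_\lambda}{n}\right) + 1 + \frac{2\,(df_\lambda + 1)}{n - df_\lambda - 2}.$$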
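The selection criteria can be computed directly from the response vector and a smoother matrix. The sketch below assumes the commonly used textbook forms of CV, GCV with inflation factor γ, SIC, and AICc; the function name and the toy straight-line smoother are hypothetical and serve only to make the example self-contained.

```python
import numpy as np

def selection_criteria(y, S, gamma=1.0):
    """Selection criteria for a linear smoother with fitted values S @ y."""
    n = y.size
    resid = y - S @ y
    rss = float(resid @ resid)            # residual sum of squares
    df = float(np.trace(S))               # effective degrees of freedom
    return {
        "RSS": rss,
        "df": df,
        "CV": float(np.mean((resid / (1.0 - np.diag(S))) ** 2)),
        "GCV": n * rss / (n - gamma * df) ** 2,
        "SIC": np.log(rss / n) + df * np.log(n) / n,
        "AICc": np.log(rss / n) + 1.0 + 2.0 * (df + 1.0) / (n - df - 2.0),
        "sigma2_hat": rss / (n - df),
    }

# Toy smoother (hypothetical): the hat matrix of a straight-line fit.
rng = np.random.default_rng(1)
n = 12
D = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
H = D @ np.linalg.solve(D.T @ D, D.T)
y = 1.0 + 2.0 * D[:, 1] + rng.normal(scale=0.3, size=n)
crit = selection_criteria(y, H, gamma=1.4)
```

In practice one would evaluate these criteria over a grid of λ values, each with its own smoother matrix, and keep the λ that minimizes the chosen criterion. For an ordinary least squares hat matrix, as in the toy example, the shortcut form of CV agrees with explicit leave-one-out refitting.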

Quantifying Stability of β̂(t) and the Performance of Smoothing Parameter Selection Methods
An ideal value for the smoothing parameter, call it λ*, may be considered one that minimizes the integrated squared error,

$$\mathrm{ISE}(\lambda) = \int_0^1 \left(\beta(t) - \hat\beta_\lambda(t)\right)^2 dt. \qquad (7)$$

To assess the performance of the smoothing parameter selection methods discussed in Section 3, we use the median of the measure

$$\frac{\int_0^1 \left(\beta(t) - \hat\beta(t)\right)^2 dt}{\int_0^1 \left(\beta(t) - \hat\beta_{\lambda^*}(t)\right)^2 dt}, \qquad (8)$$

where $\hat\beta(t)$ is the estimate obtained under the smoothing parameter chosen by a given selection method. Note that any penalized regression spline estimate of the parameter curve will depend on the chosen value of the smoothing parameter, but the notation of this dependence is suppressed for better readability. Relative to $\hat\beta_{\lambda^*}(t)$, the estimator $\hat\beta(t)$ is considered better the closer the median of (8) is to 1. The further the median of (8) is from 1, the poorer the performance of the smoothing parameter selection method. Measure (8) was motivated by a measure proposed by Lee [26] to compare smoothing parameter selection methods in smoothing splines.
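The comparison of a data-driven λ with the ISE-optimal λ* can be sketched for a single simulated dataset as follows. The sketch assumes that the performance measure is the ratio of the ISE under the selected λ to the minimum ISE over the same candidate grid, uses GCV with γ = 1.4 as the selection method, and relies on a grid-based second-difference fit rather than B-splines. All names and simulation settings are hypothetical.

```python
import numpy as np

def trap(F, t):
    """Trapezoidal rule along the last axis of F on the grid t."""
    return np.sum((F[..., :-1] + F[..., 1:]) / 2.0 * np.diff(t), axis=-1)

def fit_on_grid(X, y, t, lam):
    """Grid-based penalized fit; returns beta_hat on t and the smoother matrix."""
    h = t[1] - t[0]                          # assumes an equally spaced grid
    w = np.full(t.size, h)
    w[0] = w[-1] = h / 2.0
    Z = X * w
    D = np.diff(np.eye(t.size), n=2, axis=0) / h**2
    A = Z.T @ Z + lam * h * (D.T @ D)
    return np.linalg.solve(A, Z.T @ y), Z @ np.linalg.solve(A, Z.T)

def ise_ratio(X, y, t, beta_true, lambdas, gamma=1.4):
    """ISE at the GCV-selected lambda divided by the smallest ISE on the grid."""
    n = y.size
    ises, gcvs = [], []
    for lam in lambdas:
        b_hat, S = fit_on_grid(X, y, t, lam)
        rss = float(np.sum((y - S @ y) ** 2))
        gcvs.append(n * rss / (n - gamma * np.trace(S)) ** 2)
        ises.append(float(trap((beta_true - b_hat) ** 2, t)))
    chosen = int(np.argmin(gcvs))
    return ises[chosen] / min(ises), lambdas[chosen]

# One simulated dataset (hypothetical settings).
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 50)
X = np.cumsum(rng.normal(size=(40, t.size)), axis=1) * np.sqrt(t[1] - t[0])
beta_true = np.sin(2.0 * np.pi * t)
y = trap(X * beta_true, t) + rng.normal(scale=0.1, size=40)
lambdas = np.logspace(-4, 2, 25)
ratio, lam_hat = ise_ratio(X, y, t, beta_true, lambdas)
```

By construction the ratio is at least 1, and it equals 1 only when the selection method picks the ISE-optimal value on the grid. Repeating the computation over many simulated datasets and taking the median gives a simulation summary of the kind used in Section 5.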
To assess the stability of a parameter curve estimator, we propose a leave-one-out measure motivated by the DFFITS [27] statistic. Specifically, we use the median of a leave-one-out integrated squared error measure, (9), in which $\hat\beta^{(-i)}(t)$ and $\hat\sigma^{2(-i)}$ represent estimators of β(t) and $\sigma^2$, respectively, based on all observations except for the ith case, and where λ* minimizes criterion (7). A large value of the median of (9) would reflect a less stable estimator. If ζ(x(t)) is to be considered an appropriate measure of information in the covariate functions for estimating β(t), then large values of ζ(x(t)) would be associated with small values of the median of (9) in the sense that slightly altering the set of covariate functions does not lead to large changes in the parameter curve estimate. Similarly, smaller values of ζ(x(t)) would be associated with larger values of the median of (9). This is evaluated with a simulation study in the next section.

A Simulation Study
Using a simulation study, we examine the stability of a parameter curve estimator and the performance of the smoothing parameter selection methods at varying levels of ζ(x(t)). The sampling distributions required to derive analytical formulas for the median of (8) and (9) are unattainable due to the dependence of the parameter curve estimator on the smoothing parameter. Therefore, a simulation estimate of the median of (8), denoted SMK, is obtained as the sample median of measure (8) over the simulated datasets (10), where the subscript g denotes the gth simulated dataset. Similarly, we estimate the median of (9) via the corresponding sample median over the simulated datasets, denoted SMISE (n) (11). To obtain (10) and (11), simulated datasets are generated under various settings by the model

$$y_i = \int_0^1 x_{il}(t)\,\beta_m(t)\,dt + \varepsilon_i,$$

where l indexes the set of covariate functions, m indexes the parameter curve, and the $\varepsilon_i$s are assumed to be independent and identically distributed normal random variables with mean 0 and standard deviation σ. Overall, under four different sets of covariate functions, 2500 simulated datasets were generated for each combination of n ∈ {25, 50, 100} and the three parameter curves β 1 (t), β 2 (t), and β 3 (t), and σ is chosen for each setting to ensure that κ ∈ {0.10, 0.20}, where κ refers to the signal-to-noise ratio as defined by Febrero-Bande et al. [9]. The different sets of covariate functions used in this study were first produced on a discretized scale of 50 equally spaced values. To ensure sufficient flexibility in their functional representations, the parameter curve and the sets of covariate functions are represented as functions using the approach described in Section 3, with $J_\beta = 50$ and $J_x = 50$, respectively.
The simulations are performed using R [28], along with the extension and usage of code from the R package fda [29]. In addition, the R packages dplyr [30] and tidyr [31] were used for data management. The parameter curves in this study (shown in Figure 2) are defined as … $\mathrm{Beta}_{[20,5]}(t) + \tfrac{1}{3}\mathrm{Beta}_{[12,12]}(t) + \tfrac{1}{3}\mathrm{Beta}_{[7,30]}(t)$, … $\mathrm{Beta}_{[3,10]}(t) + \tfrac{3}{10}\mathrm{Beta}_{[7,2]}(t)$, …, and they were chosen to assess the performance of (10) and (11) at varying levels of complexity of the parameter curve in terms of their approximate curvature.
The different sets of covariate functions were generated under different conditions to ensure varying levels of ζ(x(t)). The first set of covariate curves (the $x_{i1}(t)$s) is produced by generating realizations from a normal random variable with mean 0 and standard deviation η that are randomly shifted about the y-axis by η, where η ∼ Unif(1, 30). We denote this first set of covariate curves as Covariate Set 1. Covariate Set 2 (the $x_{i2}(t)$s) is created by generating realizations from a Gaussian process having an exponential covariogram with variance parameter 10 and scale parameter 0.4. Covariate Set 3 (the $x_{i3}(t)$s) is produced by generating realizations from a mixture of beta random variables randomly shifted about the y-axis. Covariate Set 4 (the $x_{i4}(t)$s) is obtained by generating realizations from a Brownian motion process with variance parameter 2.7. As an illustration, the four different sets of covariate functions used in this study are shown in Figure 1 for a sample of size n = 25 at a specified signal-to-noise ratio.
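Two of the covariate-generating mechanisms can be sketched on a 50-point grid as follows. The parameterizations are assumptions: the Brownian motion is taken to satisfy Var[B(t)] = 2.7 t with B(0) = 0, and the exponential covariogram is taken to be 10 exp(-|s - t| / 0.4). The study's R code may use different conventions, and the function names are hypothetical.

```python
import numpy as np

def brownian_paths(n, t, variance, rng):
    """n Brownian motion paths on grid t with B(t[0]) = 0.

    Assumed parameterization: Var[B(t)] = variance * (t - t[0]).
    """
    dt = np.diff(t)
    steps = rng.normal(size=(n, dt.size)) * np.sqrt(variance * dt)
    return np.hstack([np.zeros((n, 1)), np.cumsum(steps, axis=1)])

def exp_gp_paths(n, t, variance, scale, rng):
    """n zero-mean Gaussian process paths on grid t.

    Assumed covariance: variance * exp(-|s - t| / scale).
    """
    C = variance * np.exp(-np.abs(t[:, None] - t[None, :]) / scale)
    L = np.linalg.cholesky(C + 1e-10 * np.eye(t.size))   # small jitter for stability
    return rng.normal(size=(n, t.size)) @ L.T

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 50)
X_bm = brownian_paths(25, t, 2.7, rng)        # in the spirit of Covariate Set 4
X_gp = exp_gp_paths(25, t, 10.0, 0.4, rng)    # in the spirit of Covariate Set 2
```

Each row of the returned arrays is one discretized covariate curve, so the arrays can be passed directly to grid-based computations of the Gram matrix and its eigenvalues.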
Tables 1-3 provide the SMK when the parameter curves are β 1 (t), β 2 (t), and β 3 (t), respectively, under each simulation setting and smoothing parameter selection method. The results are presented on the log scale to better depict the differences. Note that a small value of the SMK does not reflect whether a reasonable estimator of β(t) was obtained but rather reflects the performance of the given smoothing parameter selection method relative to criterion (7). Some of the differences between the smallest and the next smallest values of the SMK across the smoothing parameter selection methods for a given simulation scenario may not appear large. To better distinguish between the smallest and the next smallest values of the SMK, pairwise comparisons of the SMKs in a given simulation setting were performed using Mood's median test with the Benjamini and Hochberg [32] correction as implemented in the R package RVAideMemoire [33] at significance level 0.05. Here, we consider the performance of a smoothing parameter selection method to be favorable or best if it obtains the lowest SMK or an SMK that is not significantly different from the lowest SMK. We now summarize the favorable smoothing parameter selection method(s) that were most common across different simulation settings.
Under Covariate Set 1, SIC consistently obtained the lowest SMK across all simulation settings. Under Covariate Set 2 at n = 25, GCV with γ = 1.4, AICc, and SIC consistently performed best under the different simulation settings. However, SIC tended to perform best across all simulation settings at n = 50. At n = 100 and the lower κ, AICc generally performed best across the three parameter curves. At the higher κ, SIC was the better smoothing parameter selection method. When the covariate functions assume the form of Covariate Set 3, AICc and GCV with γ = 1.4 tended to perform just as well as or better than the other smoothing parameter selection methods across all settings. Under Covariate Set 4 with n = 25, AICc was consistently among the better methods for all settings. For n = 50, GCV and AICc were favorable under parameter curves β 1 (t) and β 2 (t), whereas SIC was favorable under β 3 (t). When n = 100, AICc performed best or just as well as the other methods. Covariate Sets 1 and 2 tended to have a lower ζ(x(t)) in our study for a given sample size and κ, whereas Covariate Sets 3 and 4 had a higher ζ(x(t)). While a perfect one-to-one relationship does not appear evident between ζ(x(t)) and the performance of a smoothing parameter selection method, SIC tended to perform more favorably when ζ(x(t)) ranged between two and nine across the three different parameter curves. For the higher values of ζ(x(t)), AICc tended to perform just as well as or better than the other methods more often than not. In addition, note that GCV with γ = 1.4 obtained a lower SMK than GCV for almost all simulation settings.
Table 4 shows the SMISE (n) under each simulation setting. The results are presented on the log scale to better depict the differences. The results show similar patterns or trends under each κ. For a given parameter curve and sample size, the SMISE (n) decreases as ζ(x(t)) increases.
Thus, the more information present in the covariate functions for estimating the parameter curve, the more stable the parameter curve estimator. We also note that, for a given covariate set, an increase in the sample size corresponds to a decrease in the SMISE (n) and to a non-decreasing ζ(x(t)). This supports viewing ζ(x(t)) as a measure of the amount of information present in the covariate functions in the sense that, under a given covariate set, more observed data tends to increase ζ(x(t)) to varying degrees and to provide a smaller SMISE (n). However, a large sample size does not imply a large ζ(x(t)), as illustrated by Covariate Set 1. Similarly, a small sample size does not imply a low ζ(x(t)), such as under Covariate Set 4. The differences in results under each κ are in part due to the scaling involved in (9) by $\hat\sigma^{2(-i)}$, where $\hat\sigma^{2(-i)}$ would tend to be higher under κ = 0.20.
Recall that for smoothing parameter selection under a covariate set with a low ζ (x(t)), SIC performed best or just as well as the other smoothing parameter selection methods. For covariate sets with a higher ζ (x(t)), AICc was generally among the better smoothing parameter selection methods. To visualize the impact of ζ (x(t)) on the stability of the parameter curve estimate, Figure 3 shows the resulting approximate expected value of the parameter curve estimator (computed across all parameter curve estimates in the simulation study) plus or minus two times the approximate standard deviation of the parameter curve estimator when using the preferred smoothing parameter selection method suggested by Tables 1-3. For brevity, we only present the results for n = 25. The parameter curve estimator, under Covariate Set 1, showed much higher variability than under the other covariate sets. Further, the covariate set with a higher ζ (x(t)) (Covariate Set 4) reflected the lowest variability. Similar behavior was exhibited at n = 50 and n = 100. This behavior is consistent with the behavior of the SMISE (n) for a given covariate set and sample size. This reflects that a low ζ (x(t)) is associated with a less stable solution, which in turn may substantially increase the variability of a parameter curve estimator, where such variability would not be reflected in an observed confidence interval for the parameter curve.
In practice, ζ(x(t)) will need to be estimated due to its dependence on $\sigma^2$. An estimate of this measure, $\hat\zeta(x(t))$, is provided in (6) using the estimate of $\sigma^2$ provided in Section 3. The estimate of $\sigma^2$ will depend on the chosen value of the smoothing parameter. To better understand the impact of the chosen value of the smoothing parameter on (6), Tables 5-7 provide the average value of (6), computed as the average over all simulated datasets, when the parameter curves are β 1 (t), β 2 (t), and β 3 (t), respectively, under each simulation setting and smoothing parameter selection method. On average, (6) provides a reasonable estimate of ζ(x(t)) in the simulation settings considered.
Table 4. The SMISE (n) under each simulation setting. The size of ζ(x(t)) is shown below each respective SMISE (n) in parentheses.

A Real Data Illustration
In this section, we illustrate the use of ζ(x(t)) with two real datasets: the gasoline dataset described by Kokoszka and Reimherr [2] and used by Reiss and Ogden [18], and the streamflow and precipitation data described by Masselot et al. [34]. Model (1) was applied to both datasets to model the relationship between a functional covariate and a scalar response. The gasoline data consist of near-infrared reflectance spectra of 60 gasoline samples (measured in 2-nm intervals from 900 to 1700 nm), as well as the octane numbers for each gasoline sample. These data are available in the R package refund [35]. The aim of modeling these data using model (1) is to determine the association between the octane rating (response variable) and the near-infrared reflectance spectra curves (covariate curves). We represent the discretized near-infrared reflectance spectra measurements and the parameter curve as functions using the methods described in Section 3, with $J_x = 50$ and $J_\beta = 50$. The top left panel of Figure 4 contains the near-infrared reflectance spectra curves, $x_i(\omega)$ for i = 1, ..., 60. The estimated number of independent pieces of information in these covariate functions for estimating the parameter curve is 5. Since $\hat\zeta(x(t)) = 5$, we use the SIC for smoothing parameter selection because it performed just as well as or better than the other methods in our simulation study when ζ(x(t)) was small. The upper right panel shows the parameter curve estimate along with 95% point-wise confidence intervals for β(ω) when using SIC for smoothing parameter selection. Note that the estimated parameter curve has a positive effect in the intervals (950, 1125) and (1325, 1475), implying that higher values of near-infrared reflectance spectra are associated with higher octane levels in these intervals. Lower values of near-infrared reflectance spectra are associated with lower octane levels in the intervals (1175, 1325) and (1525, 1650).
For a given λ, approximate point-wise confidence intervals may be constructed using the variance of the parameter curve estimator [16]. Since $\hat\zeta(x(t))$ is low, our simulation study suggests that the variability of the parameter curve estimator is greater than what is reflected by the confidence interval.
We briefly summarize the streamflow and precipitation data, referring to Masselot et al. [34] for further information on the study and the corresponding data. The data consist of yearly observations of the sum of daily streamflow values from 1 July to 31 October, and yearly precipitation time series from 1 June to 31 October for years 1981-2012. These data were measured in areas of the Dartmouth River located in a region of the province of Quebec, Canada. In this study, investigators were interested in estimating and forecasting yearly total streamflow (scalar response) using the corresponding yearly precipitation profile (functional covariate). The precipitation time series and the parameter curve are represented as functions using the methods described in Section 3, with $J_x = 153$ and $J_\beta = 22$. However, since precipitation measurements are non-negative, we constrain the functional representation of the precipitation measurements to be non-negative by imposing non-negativity constraints on the B-spline coefficients. The bottom left panel of Figure 4 contains the precipitation curves, $x_i(t)$ for i = 1, . . . , 153, covering the daily time domain from June to October in a given year. For these data, $\hat\zeta(x(t)) = 29$. We use the AICc since it performed just as well as or better than the other methods in our simulation study when ζ(x(t)) was large. The lower right panel shows the parameter curve estimate along with 95% point-wise confidence intervals for β(t) when using AICc for smoothing parameter selection. The estimated parameter curve shows that the effect of precipitation on total streamflow is negative in June, as well as in October. Due to the size of $\hat\zeta(x(t))$, our simulation study suggests that this parameter curve estimate is more stable than the one estimated for the gasoline data.
Figure 4. (Top left) The near-infrared reflectance spectra curves; (top right) the parameter curve estimate displaying the association between the octane rating and the near-infrared reflectance spectra curves, together with a 95% point-wise confidence interval; and (bottom) analogous graphs for the streamflow and precipitation data.

Discussion
We present a measure, ζ (x(t)), for a FLM with a scalar response to determine how much information is present in the covariate curves for estimating the parameter curve β(t) when the parameter curve is identifiable. To estimate the parameter curve in model (1), penalized regression spline estimation is used, and we summarize several commonly used methods for selecting the smoothing parameter. To assess the stability of the parameter curve estimator under varying levels of ζ (x(t)), we examine the SMISE (n) of a parameter curve estimator. The results show that the greater is ζ (x(t)), the more stable is the parameter curve estimator in that it produces a smaller SMISE (n) than when ζ (x(t)) is smaller. Further, we assess the impact of ζ (x(t)) on smoothing parameter selection, and, while a one-to-one relationship is not clear between ζ (x(t)) and the performance of a smoothing selection method, SIC tends to perform just as well as or better than other methods when ζ (x(t)) is small, whereas AICc tends to perform just as well as or better than other methods when ζ (x(t)) is large.
Overall, our simulation study showed that the size of ζ (x(t)) impacts both the stability of a parameter curve estimator and the performance of the smoothing parameter selection methods. Future work will study if these results are consistent under alternative parameter curve estimation procedures. An interesting direction for future work is to investigate if shape constraints on the parameter curve could serve as a remedial measure for improving stability of the parameter curve estimator, particularly when ζ (x(t)) is low. Scenarios in which shape constraints are imposed on the parameter curve do arise in practice in a FLM with a scalar response (see [36,37] for some recent examples). Identifying problematic data in functional regression models remains critical and an on-going challenge. We hope this study encourages others to consider approximating ζ (x(t)) when applying model (1) so that the amount of information present in the covariate curves for estimating parameter curve can be gauged. This, in turn, may provide guidance in choosing a smoothing parameter selection method, as well as considering the stability of the parameter curve estimate.
Funding: This research was supported in part by NSF grant HRD-1547784.