Generalized Partially Functional Linear Model with Unknown Link Function

Weiwei Xiao; Songxuan Li; Haiyan Liu

doi:10.3390/axioms12121089

¹ School of Science, North China University of Technology, Beijing 100144, China

² Department of Statistics, University of Leeds, Leeds LS2 9JT, UK

* Author to whom correspondence should be addressed.

Axioms 2023, 12(12), 1089; https://doi.org/10.3390/axioms12121089

This article belongs to the Special Issue Advances in Mathematics: Theory and Applications


Abstract

In existing models with an unknown link function, the issue of predictors containing both multiple functional data and multiple scalar data has not been studied. To fill this gap, we propose a generalized partially functional linear model, which not only models the relationship between multiple scalar and functional predictors and responses, but also automatically estimates the link function. Specifically, we use the functional principal component analysis method to reduce the dimensionality of functional predictors, estimate the regression coefficients using the maximum likelihood estimation method, estimate the link function using the method of local linear regression, iteratively obtain the final estimator, and establish the asymptotic normality of the estimator. The asymptotic normality is illustrated through simulation experiments. Finally, the proposed model is applied to study the influence of environmental, economic, and medical levels on life expectancy in China. In the study, functional predictors are the daily air quality index, temperature, and humidity of 58 cities in 2020, and scalar predictors are GDP and the number of beds in hospitals. The experimental results indicate that the unknown link function model has a smaller prediction error and better performance than both the model with the known link function and the model without a link function.

Keywords:

functional data analysis; unknown link function; generalized functional linear model; average life expectancy

MSC:

00A71

1. Introduction

In 1982, Ramsay [1] first proposed the definition of functional data, laying a foundation for the development of functional data analysis. In 2005, Ramsay and Silverman provided a detailed introduction to the general methods and steps of functional data analysis, including functional principal component analysis and functional linear regression models in their book [2]. In 2012, Horváth and Kokoszka [3] focused on the inferential methods in functional data analysis.

In 2009, Shin [4] proposed a partial functional linear model (PFLM), which explores the relationship between a scalar response variable and mixed-type predictors. In 2012, Shin and Lee [5] derived the asymptotic prediction rate of PFLM and compared it with that of other functional regression models.

In 2002, James [6] proposed generalized linear models with functional predictors and applied them to standard missing data problems. In 2005, Müller and Stadtmüller [7] proposed a generalized functional linear regression model where the response variable is a scalar and the predictor is a random function. They also considered the situation where the link and variance functions were unknown. In 2015, Shang and Cheng [8] proposed a roughness regularization approach in making nonparametric inference for generalized functional linear models with known link functions. In 2019, Wong et al. [9] investigated a class of partially linear functional additive models that predict a scalar response by both the parametric effects of a multivariate predictor and the non-parametric effects of a multivariate functional predictor.

In a generalized linear model, sometimes the link function may not be known exactly, but can be assumed to be of some general ‘parametric’ form. In 1984, Scallan et al. [10] showed how generalized linear models can be extended to fit models with such link functions.

In 1994, Weisberg and Welsh [11] estimated the link function by kernel smoothing, estimated the regression coefficients given that link function, and then alternated between these two steps, which effectively solves the fitting problem when the link function is unknown. However, kernel smoothing can perform poorly at the boundary, which motivates local polynomial fitting, a method that performs better near the boundary.

In 1998, Chiou and Müller [12] considered the case where the link and variance functions are unknown but smooth. They obtained consistency results for the link and variance function estimators, as well as the sampling distribution of the regression coefficients. In 2005, Chiou and Müller [13] introduced a flexible marginal modeling approach for statistical inference for clustered and longitudinal data under minimal assumptions. The predictors in their model were longitudinal data, and the proposed semi-parametric model was fitted by quasi-likelihood regression using estimated estimating equations. They established the consistency of the estimates of the link and variance functions and the asymptotic limit distribution of the regression coefficients. Other methods for estimating unknown functions are also available. In 2009, Bai et al. [14] focused on single-index models for longitudinal data. They proposed a procedure to estimate the single-index component and the unknown link function based on a combination of penalized splines and quadratic inference functions. In 2012, Pang and Xue [15] generalized single-index models to scenarios with random effects. The link function was estimated with a local linear smoother, and a new set of estimating equations, modified for boundary effects, was proposed to estimate the index coefficients. In 2017, Yuan and Diao [16] developed a sieve maximum likelihood estimation for generalized linear models, in which the estimator of the unknown link function was assumed to lie in a sieve space. Various sieve methods, including B-spline and P-spline-based methods, were introduced.

In 2017, Kokoszka and Reimherr [17] wrote a book that introduced the basic concepts, methods, and applications of functional data analysis. The book provided a clear and systematic overview, covering key areas such as representation, smoothing, interpolation, statistical modeling, and inference for functional data. It also included detailed explanations of practical examples and computational methods. In 2023, Rao and Reimherr [18] introduced a novel neural network-based nonlinear model of functional data designed to exploit the structure of functional data and fit it with a derived function gradient optimization algorithm, demonstrating the effectiveness of these methods in dealing with complex functional models and providing new breakthroughs for deep learning applications in the field of functional data analysis.

The relationship between environmental factors and human health has been a topic of significant research interest in recent years. In 2012, Huang et al. [19] explored the relationship between temperature and years of life lost (YLL). The study found that both high and low temperatures lead to an increase in YLL, with high temperatures having a greater impact. In 2020, Yang et al. [20] applied a generalized additive model to assess the associations between daily PM2.5 exposure and YLL due to respiratory diseases in 96 Chinese cities during 2013–2016. They further estimated the avoidable YLL and potential gains in life expectancy under the assumption that daily PM2.5 level met World Health Organization standards. In 2021, Deryugina and Molitor [21] explored the factors influencing life expectancy across the United States. The study found that individuals living in areas with severe air pollution, poor water quality, and inadequate healthcare facilities generally had shorter life expectancy and poorer health conditions.

In summary, the existing models with unknown link functions have not addressed the issue of the generalized partially functional regression model, which involves regressing the response variable on multiple functional and scalar predictors. To fill this gap, this study proposes a generalized partially functional linear model with an unknown link function. The proposed model avoids the problem of decreased model accuracy caused by selecting an incorrect link function. The predictors in the proposed model include both multiple functional data and multiple scalar data. It reveals the complex relationships between variables and provides a flexible and effective modeling approach. It can achieve better prediction and explanation.

The paper is organized as follows. The published works and definitions referred to in the proofs of the theorems are introduced in Section 2. The abbreviations used in the article are introduced in Section 3.1. The generalized partially functional linear model with unknown link function is proposed in Section 3.2. The estimation of the regression coefficients and the link function is discussed in Section 3.3. In Section 4, the asymptotic normality of the estimators is derived. Simulation results are reported in Section 5. The average life expectancy study in 58 cities in China is given in Section 6. Section 7 provides a brief summary and the limitations of the research, together with possible applications and future directions.

2. Preliminaries

In this section, we provide an overview of the published works and definitions that are relevant to our research. These preliminary concepts and references lay the foundation for a better understanding of the subsequent discussion.

(1) In 1982, Mack and Silverman [22] provided a comprehensive analysis of the weak and strong uniform consistency properties of kernel regression estimates, highlighting their theoretical properties and practical significance in non-parametric regression modeling. We directly apply their Proposition 4 as Lemma 1, which is used to prove Theorem 1 in this paper.

(2) In 1995, Masry and Tjøstheim [23] discussed the estimation and identification of nonlinear time series of ARCH type. They provided an estimation method to obtain consistent estimates of the parameters and proved the asymptotic normality. They also explored model identification methods. Their studies are of significant importance for modeling and analyzing financial time series. Theorem 3.3 in their work is used to prove Theorem 1 in this paper.

(3) In 1999, Chiou and Müller [24] focused on the study of non-parametric quasi-likelihood methods. They provided the theoretical derivation process of this method, and explored its applications in statistical inference. Theorem 4.1 in their paper is used to prove Lemma 2 and Lemma 3 for Theorem 2 in this paper.

(4) In 2021, Xiao et al. [25] proposed a generalized partially functional linear regression model in which the response variable is binary (0 or 1) and the predictors are multiple functional and scalar variables, and established the asymptotic property of the estimated coefficients in the model. The proof method of Theorem 1 in [25] is used to prove Theorem 2 in this work.

3. Model and Estimation

The data we observe for the $i$-th subject are $\{Y_{i}, X_{i1}(t), X_{i2}(t), \dots, X_{id}(t), Z_{i}\}$, $i = 1, \dots, n$. We assume that these data are independent, identically distributed (i.i.d.) copies of $\{Y, X_{1}(t), \dots, X_{d}(t), Z\}$. For $j = 1, \dots, d$, the functional predictor $X_{j}(t)$ is a random curve, $X_{ij}(t)$, $i = 1, 2, \dots, n$, are samples of $X_{j}(t)$, and each $X_{ij}(t)$ is square integrable on a bounded real interval $T$, i.e., $X_{ij}(t) \in L^{2}(T)$, where $L^{2}(T)$ denotes the space of square integrable functions defined on $T$. The scalar predictor vector $Z = (Z_{1}, Z_{2}, \dots, Z_{q})^{T}$ is a $q$-dimensional random vector. The response $Y$ is a real-valued random variable that may be binary or count.

3.1. Abbreviation Introduction

Table 1 is a list of the abbreviations we use in this work along with their corresponding full forms:

Table 1. The abbreviations and their corresponding full forms.

3.2. Model

We establish a model for the relationship between the response variable $Y_{i}$ and the predictors $X_{ij}(t)$, $j = 1, 2, \dots, d$, and $Z_{i}$:

$$Y_{i} = g\Big(\sum_{j=1}^{d} \int_{T} X_{ij}(t) β_{j}(t)\, dt + Z_{i}^{T} γ\Big) + ε_{i}, \tag{1}$$

where $β_{j}(\cdot)$ is the regression coefficient function to be estimated for the functional predictor $X_{ij}(t)$, and $γ = (γ_{1}, γ_{2}, \dots, γ_{q})^{T}$ is the $q$-dimensional vector of regression coefficients to be estimated for the scalar predictors $Z_{i}$. Here the $ε_{i}$ are i.i.d. copies of the random error $ε = Y - g(η)$, which satisfies $E[ε | X_{j}(t), Z] = 0$, where

$$η = \sum_{j=1}^{d} \int_{T} X_{j}(t) β_{j}(t)\, dt + Z^{T} γ.$$

The relationship between the response variable $Y$ and $η$ is established through $g(\cdot)$, i.e., $E[Y | X_{j}(t), Z] = μ = g(η)$. The link function $g(\cdot)$ is unknown and needs to be estimated in this paper.

Let $σ^{2}(\cdot)$ be a variance function that satisfies $σ^{2}(\cdot) \geq c > 0$ for a constant $c > 0$, such that

$$\mathrm{Var}[Y | X_{j}(t), Z] = σ^{2}(μ) = σ^{2}(g(η)), \qquad \mathrm{Var}[ε] = E[ε^{2}] = σ^{2}(E[Y | X_{j}(t), Z]).$$
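To make model (1) concrete, the following minimal sketch draws synthetic data from it. Everything specific here is an assumption made for illustration and is not taken from the paper: the interval $T = [0, 1]$, the sine basis, the coefficient functions, the link $g = \tanh$, and a constant variance function. The function name `simulate_model` is hypothetical.

```python
import numpy as np

def simulate_model(n=300, d=2, q=2, n_grid=101, sigma=0.1, seed=0):
    """Draw (Y, X, Z) from model (1) with an illustrative link g = tanh."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid)                 # grid on T = [0, 1] (assumed)
    dt = t[1] - t[0]
    w = np.full(n_grid, dt)                           # trapezoidal quadrature weights
    w[0] = w[-1] = dt / 2.0
    # Orthonormal sine basis on [0, 1]; an illustrative choice only.
    basis = np.stack([np.sqrt(2.0) * np.sin((k + 1) * np.pi * t) for k in range(3)])
    score_sd = np.array([1.0, 0.5, 0.25])             # decaying score scales
    X = np.stack([(rng.normal(size=(n, 3)) * score_sd) @ basis for _ in range(d)],
                 axis=1)                              # shape (n, d, n_grid)
    Z = rng.normal(size=(n, q))
    beta = np.stack([basis[0] + 0.5 * basis[1]] * d)  # hypothetical beta_j(t)
    gamma = np.full(q, 0.5)                           # hypothetical gamma
    # eta_i = sum_j int_T X_ij(t) beta_j(t) dt + Z_i^T gamma
    eta = ((X * beta[None]) @ w).sum(axis=1) + Z @ gamma
    Y = np.tanh(eta) + sigma * rng.normal(size=n)     # Y_i = g(eta_i) + eps_i
    return t, X, Z, Y, eta

t, X, Z, Y, eta = simulate_model()
```

With a constant variance function, the errors $ε_{i} = Y_{i} - g(η_{i})$ have mean zero and standard deviation `sigma`, which matches the moment conditions stated above.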

To reduce the dimensionality of the functional predictors $X_{ij}(t)$, we adopt the method of FPCA in this paper. First, we standardize the original data by centering them, so that $E[X_{ij}(t)] = 0$, $j = 1, \dots, d$, and $E[Z_{l}] = 0$, $l = 1, \dots, q$.

By the Karhunen-Loève (KL) expansion and Mercer’s theorem, $X_{ij}(t)$ can be expanded as

$$X_{ij}(t) = \sum_{k=1}^{\infty} ξ_{ijk} ρ_{jk}(t), \tag{2}$$

where $ξ_{ijk}$ are the functional principal component scores and $ρ_{jk}(\cdot)$ are the functional principal components, i.e., the eigenfunctions of the covariance operator of $X_{ij}(t)$. Notice that $ρ_{jk}(\cdot)$, $k = 1, 2, \dots$, form an orthonormal basis for the function space $L^{2}(T)$. The regression coefficient function $β_{j}(t) \in L^{2}(T)$ can therefore be expanded as

$$β_{j}(t) = \sum_{k=1}^{\infty} χ_{jk} ρ_{jk}(t), \tag{3}$$

where $χ_{jk}$ are the coefficients of $β_{j}$ with respect to this basis.

Plugging the two expansions into (1) and using the orthonormality of the $ρ_{jk}$, which gives $\int_{T} X_{ij}(t) β_{j}(t)\, dt = \sum_{k} ξ_{ijk} χ_{jk}$, we have

$$Y_{i} = g\Big(\sum_{j=1}^{d} \sum_{k=1}^{m_{j}} ξ_{ijk} χ_{jk} + Z_{i}^{T} γ\Big) + ε_{i}. \tag{4}$$

In (4), we truncate the expansions at $m_{j}$ terms, where $m_{j}$ depends on the sample size $n$ and increases as $n \to \infty$.
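The truncated expansion (2) can be illustrated on discretized curves. The sketch below is one possible way to compute the leading eigenfunctions $ρ_{k}$ and scores $ξ_{ik}$ for a single functional predictor; it assumes curves observed without noise on a common, equally spaced grid and uses a simple Riemann-sum quadrature. The function name `fpca_scores` and the test data are hypothetical.

```python
import numpy as np

def fpca_scores(X, t, m):
    """Return the first m eigenvalues, eigenfunctions and scores of curves X.

    X : (n, n_grid) array of curves on the equally spaced grid t (assumption).
    """
    dt = t[1] - t[0]
    Xc = X - X.mean(axis=0)                 # centre so that E[X(t)] = 0
    cov = Xc.T @ Xc / X.shape[0]            # sample covariance surface on the grid
    # Discretised covariance operator: integration is approximated by sum * dt.
    evals, evecs = np.linalg.eigh(cov * dt)
    order = np.argsort(evals)[::-1][:m]     # largest eigenvalues first
    lam = evals[order]
    rho = evecs[:, order].T / np.sqrt(dt)   # rescale so that sum(rho_k^2) * dt = 1
    xi = Xc @ rho.T * dt                    # xi_ik = int X_i(t) rho_k(t) dt
    return lam, rho, xi

# Illustrative curves spanned by three sine functions (hypothetical data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
basis = np.stack([np.sqrt(2.0) * np.sin((k + 1) * np.pi * t) for k in range(3)])
X = (rng.normal(size=(150, 3)) * np.array([2.0, 1.0, 0.5])) @ basis
lam, rho, xi = fpca_scores(X, t, m=3)
X_hat = xi @ rho                            # truncated expansion (2) with m = 3
```

Because the simulated curves lie in a three-dimensional space, truncating at $m = 3$ reproduces the centred curves up to rounding error; for real data the truncation level trades approximation error against the number of parameters in (4).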

3.3. Estimation

Define the parameter vector

$$θ_{0} = (χ_{11}, χ_{12}, \dots, χ_{1 m_{1}}, \dots, χ_{d1}, χ_{d2}, \dots, χ_{d m_{d}}, γ_{1}, \dots, γ_{q})^{T}.$$

To estimate the parameter vector $θ$ and the link function $g$, we use an iterative estimation method to obtain the final estimates. For a constant $c > 0$, define the neighborhood $θ_{n} = \{θ : ∥θ - θ_{0}∥ \leq c n^{-1/2}\}$. The norm used for finite-dimensional spaces in this paper is the Euclidean norm. The overall iterative process is briefly described below:

Step 1 Assuming that the link function $g(\cdot)$ is known, obtain the initial estimate $\tilde{θ}^{(0)}$ of $θ_{0}$ by solving Equation (5). The link function $g(\cdot)$ is required to be twice continuously differentiable to ensure the existence of the Hessian matrix; moreover, the variance function $σ^{2}(\cdot)$ is defined on the range of the link function and is strictly positive.

$$U(θ) = \sum_{i=1}^{n} (Y_{i} - μ_{i}) \frac{g'(η_{i})}{σ^{2}(μ_{i})} Δ_{i} = 0, \tag{5}$$

where $η_{i} = \sum_{j=1}^{d} \sum_{k=1}^{m_{j}} ξ_{ijk} \tilde{χ}_{jk} + Z_{i}^{T} \tilde{γ}$, with $\tilde{χ}$ near $χ$ and $\tilde{γ}$ near $γ$, $μ_{i} = g(η_{i})$, and

$$Δ_{i} = (ξ_{i11}, ξ_{i12}, \dots, ξ_{i1 m_{1}}, \dots, ξ_{id1}, ξ_{id2}, \dots, ξ_{id m_{d}}, z_{i1}, \dots, z_{iq})^{T}.$$

Here, $\tilde{χ}$ and $\tilde{γ}$ denote the values estimated in Step 1, not the final estimates.

We introduce the following matrices:

$$D_{0} = D_{n, q} = (z_{il})_{1 \leq i \leq n,\ 1 \leq l \leq q},$$

$$D_{j} = D_{n, m_{j}} = (ξ_{ijk})_{1 \leq i \leq n,\ 1 \leq k \leq m_{j}}, \quad j = 1, \dots, d,$$

$$D = D_{n,\ q + \sum_{j=1}^{d} m_{j}} = (D_{0}, D_{1}, \dots, D_{d}),$$

$$V = \mathrm{diag}(σ^{2}(μ_{1}), σ^{2}(μ_{2}), \dots, σ^{2}(μ_{n})),$$

$$G = \mathrm{diag}(g'(η_{i}))_{1 \leq i \leq n},$$

$$Y = (Y_{1}, Y_{2}, \dots, Y_{n})^{T}, \qquad μ = (μ_{1}, μ_{2}, \dots, μ_{n})^{T}.$$

Then, Equation (5) can be expressed in matrix form as

$$D^{T} V^{-1} G (Y - μ) = 0,$$

which can be solved by the weighted least squares method. A first-order Taylor expansion of $g^{-1}(Y)$ around $μ$ gives

$$g^{-1}(Y) \approx g^{-1}(μ) + [g^{-1}(μ)]' (Y - μ) = η + G^{-1}(Y - μ),$$

and hence

$$D^{T} W (g^{-1}(Y) - η) = 0,$$

where $W = V^{-1} G^{2}$. Simplification yields the estimates

$$\tilde{χ}_{j}^{(0)} = (D_{j}^{T} W D_{j})^{-1} D_{j}^{T} W g^{-1}(Y), \qquad \tilde{γ}^{(0)} = (D_{0}^{T} W D_{0})^{-1} D_{0}^{T} W g^{-1}(Y),$$

where $\tilde{χ}_{j}^{(0)} = (\tilde{χ}_{j1}^{(0)}, \dots, \tilde{χ}_{j m_{j}}^{(0)})^{T}$, $j = 1, 2, \dots, d$, and $\tilde{γ}^{(0)} = (\tilde{γ}_{1}^{(0)}, \tilde{γ}_{2}^{(0)}, \dots, \tilde{γ}_{q}^{(0)})^{T}$.

Let

$$\tilde{θ}^{(0)} = (\tilde{χ}_{11}^{(0)}, \tilde{χ}_{12}^{(0)}, \dots, \tilde{χ}_{1 m_{1}}^{(0)}, \dots, \tilde{χ}_{d1}^{(0)}, \tilde{χ}_{d2}^{(0)}, \dots, \tilde{χ}_{d m_{d}}^{(0)}, \tilde{γ}_{1}^{(0)}, \tilde{γ}_{2}^{(0)}, \dots, \tilde{γ}_{q}^{(0)})^{T}.$$
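The weighted least squares idea behind Step 1 can be sketched as an iteratively reweighted update with weights $W = V^{-1} G^{2}$ and the linearized response $η + G^{-1}(Y - μ)$. This is a minimal sketch under assumptions that are not from the paper: it solves for all columns of the design matrix jointly rather than block by block, and it uses a logistic mean function with Bernoulli variance as the known link for the example. The function names are hypothetical.

```python
import numpy as np

def wls_step(D, Y, theta, g, g_prime, var_fn):
    """One weighted least squares update for the estimating equation (5)."""
    eta = D @ theta
    mu = g(eta)
    gp = g_prime(eta)
    W = gp**2 / var_fn(mu)                  # diagonal of W = V^{-1} G^2
    z = eta + (Y - mu) / gp                 # linearised g^{-1}(Y) around mu
    A = D.T @ (W[:, None] * D)
    b = D.T @ (W * z)
    return np.linalg.solve(A, b)

def fit_known_link(D, Y, g, g_prime, var_fn, n_iter=25, tol=1e-8):
    """Iterate wls_step from theta = 0 until the update is smaller than tol."""
    theta = np.zeros(D.shape[1])
    for _ in range(n_iter):
        new = wls_step(D, Y, theta, g, g_prime, var_fn)
        if np.linalg.norm(new - theta) < tol:
            return new
        theta = new
    return theta

# Illustrative binary responses with a logistic mean function (assumed link).
rng = np.random.default_rng(1)
n = 2000
D = rng.normal(size=(n, 3))                 # stands in for the rows Delta_i^T
theta_true = np.array([1.0, -0.5, 0.25])
expit = lambda e: 1.0 / (1.0 + np.exp(-e))
Y = rng.binomial(1, expit(D @ theta_true)).astype(float)
theta_hat = fit_known_link(D, Y, g=expit,
                           g_prime=lambda e: expit(e) * (1.0 - expit(e)),
                           var_fn=lambda m: m * (1.0 - m))
```

At the returned value the score $D^{T} V^{-1} G (Y - μ)$ is numerically zero, which is the matrix form of Equation (5).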

Step 2 By local linear regression, obtain the estimates $\tilde{g}^{(0)}$ and $\tilde{g}^{\prime (0)}$ of the link function $g$ and its derivative $g'$.

Let the bandwidth $b = b_{n}$ of the kernel function $k(\cdot)$ converge to zero and define $k_{b}(\cdot) = b^{-1} k(\cdot / b)$. Since the convergence rates of the estimators of $g(\cdot)$ and $g'(\cdot)$ are different, their bandwidths should in principle also be different. Let $h_{0} = h_{0n}$ denote the bandwidth for $g(\cdot)$ and $h_{1} = h_{1n}$ the bandwidth for $g'(\cdot)$; for simplicity, a common bandwidth $h = h_{0} = h_{1}$ is used in this paper. Let the distributions of both the functional predictors $X_{j}(t)$ and the scalar predictors $Z$ have a compact support set $U$, and define $Ω = \{u = η_{i} | X_{j}(t), Z \in U\}$. To simplify the notation, we write $g = g(u; θ)$ and $g' = g'(u; θ)$. For a fixed $θ$, we apply local linear regression to obtain the initial estimates $\tilde{g}^{(0)}$ and $\tilde{g}^{\prime (0)}$ of $g$ and $g'$, respectively. At any point $u$, we minimize over $(g, g')$ the weighted sum of squares

$$\sum_{i=1}^{n} [Y_{i} - g - g' (η_{i} - u)]^{2} k_{h}(η_{i} - u). \tag{6}$$

Minimizing (6) yields $\tilde{g}^{(0)}$ and $\tilde{g}^{\prime (0)}$, which can be represented as $\tilde{g}^{(0)} = \sum_{i=1}^{n} ω_{i}(u; θ) Y_{i}$ and $\tilde{g}^{\prime (0)} = \sum_{i=1}^{n} \tilde{ω}_{i}(u; θ) Y_{i}$, where

$$ω_{i}(u; θ) = \frac{1}{n} \cdot \frac{k_{h}(η_{i} - u) \big[φ_{n,2}(u; θ, h) - \frac{η_{i} - u}{h} φ_{n,1}(u; θ, h)\big]}{φ_{n,0}(u; θ, h) φ_{n,2}(u; θ, h) - φ_{n,1}^{2}(u; θ, h)},$$

$$\tilde{ω}_{i}(u; θ) = \frac{1}{n h} \cdot \frac{k_{h}(η_{i} - u) \big[\frac{η_{i} - u}{h} φ_{n,0}(u; θ, h) - φ_{n,1}(u; θ, h)\big]}{φ_{n,0}(u; θ, h) φ_{n,2}(u; θ, h) - φ_{n,1}^{2}(u; θ, h)},$$

$$φ_{n,l}(u; θ, h) = \frac{1}{n} \sum_{i=1}^{n} \Big(\frac{η_{i} - u}{h}\Big)^{l} k_{h}(η_{i} - u), \quad l = 0, 1, 2.$$
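The minimization of (6) is an ordinary weighted least squares fit of $Y_{i}$ on $(1, η_{i} - u)$, so it has a closed form. The sketch below solves the two normal equations directly at a single point $u$; the Gaussian kernel and the names `gaussian_kernel` and `local_linear` are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def gaussian_kernel(v):
    """Standard normal density, used here as the kernel k (assumed choice)."""
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)

def local_linear(u, eta, Y, h, kernel=gaussian_kernel):
    """Minimise (6) at the point u; return the estimates of g(u) and g'(u)."""
    diff = eta - u
    w = kernel(diff / h) / h                       # k_h(eta_i - u)
    # Normal equations of the weighted regression of Y on (1, eta_i - u).
    S0, S1, S2 = w.sum(), (w * diff).sum(), (w * diff**2).sum()
    T0, T1 = (w * Y).sum(), (w * diff * Y).sum()
    det = S0 * S2 - S1**2
    g_hat = (S2 * T0 - S1 * T1) / det
    g_prime_hat = (S0 * T1 - S1 * T0) / det
    return g_hat, g_prime_hat

# A linear link is reproduced exactly by a local linear fit.
rng = np.random.default_rng(2)
eta = rng.uniform(-2.0, 2.0, size=500)
g0, g1 = local_linear(0.3, eta, 2.0 + 3.0 * eta, h=0.5)
# A smooth nonlinear link is recovered up to a bias of order h^2.
s0, s1 = local_linear(0.3, eta, np.sin(eta), h=0.2)
```

Solving the normal equations directly avoids forming the weights $ω_{i}$ and $\tilde{ω}_{i}$ explicitly, but it yields the same minimizer of (6).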

Step 3 Apply the method of Step 1 with the link function and its derivative replaced by the current estimates $\tilde{g}^{(α)}$ and $\tilde{g}^{\prime (α)}$, $α = 0, 1, 2, \dots$, i.e., solve the estimating Equation (5) for $θ$. This gives the updated estimate

$$\tilde{θ}^{(α + 1)} = (\tilde{χ}_{11}^{(α + 1)}, \tilde{χ}_{12}^{(α + 1)}, \dots, \tilde{χ}_{1 m_{1}}^{(α + 1)}, \dots, \tilde{χ}_{d1}^{(α + 1)}, \tilde{χ}_{d2}^{(α + 1)}, \dots, \tilde{χ}_{d m_{d}}^{(α + 1)}, \tilde{γ}_{1}^{(α + 1)}, \tilde{γ}_{2}^{(α + 1)}, \dots, \tilde{γ}_{q}^{(α + 1)})^{T}.$$

Step 4 Apply the method of Step 2 with the parameter vector replaced by the updated estimate $\tilde{θ}^{(α + 1)}$. This gives the updated estimates $\tilde{g}^{(α + 1)}$ and $\tilde{g}^{\prime (α + 1)}$ of $g$ and $g'$.

Step 5 Repeat Steps 3 and 4 until $∥\tilde{θ}^{(α + 1)} - \tilde{θ}^{(α)}∥$ converges, and then stop the iteration.

Step 6 The final estimate of the regression coefficient vector $θ$ is denoted by $\hat{θ}$, and the final estimate of the link function $g$ by $\hat{g}$.
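The alternating scheme in Steps 1 to 6 can be sketched end to end as follows. This is a simplified illustration, not the authors' implementation, and it relies on several assumptions that are not stated in the paper: a constant variance function, a Gaussian kernel with a rule-of-thumb bandwidth, an identity-link least squares fit as the starting value, a Gauss-Newton update with step halving in place of an exact solve of (5), and a unit-norm constraint on $θ$ (a common identifiability convention when the link is unknown). All function names are hypothetical.

```python
import numpy as np

def local_linear_all(eta, Y, h):
    """Local linear estimates of g and g' at every eta_i (Gaussian kernel)."""
    diff = eta[None, :] - eta[:, None]            # diff[a, i] = eta_i - eta_a
    w = np.exp(-0.5 * (diff / h) ** 2)            # kernel weights; constants cancel
    S0, S1, S2 = w.sum(1), (w * diff).sum(1), (w * diff**2).sum(1)
    T0, T1 = w @ Y, (w * diff) @ Y
    det = S0 * S2 - S1**2
    return (S2 * T0 - S1 * T1) / det, (S0 * T1 - S1 * T0) / det

def unit(v):
    return v / np.linalg.norm(v)                  # unit-norm convention (assumed)

def fit_unknown_link(D, Y, n_iter=20, tol=1e-6):
    theta = unit(np.linalg.lstsq(D, Y, rcond=None)[0])   # start: identity link
    h = 1.06 * np.std(D @ theta) * len(Y) ** (-0.2)      # rule-of-thumb bandwidth

    def rss(th):                                  # residual sum of squares at th
        return np.sum((Y - local_linear_all(D @ th, Y, h)[0]) ** 2)

    for _ in range(n_iter):
        g_hat, gp_hat = local_linear_all(D @ theta, Y, h)        # Steps 2 and 4
        A = D.T @ (gp_hat[:, None] ** 2 * D) + 1e-8 * np.eye(D.shape[1])
        step = np.linalg.solve(A, D.T @ (gp_hat * (Y - g_hat)))  # Step 3
        current, cand = rss(theta), None
        for s in (1.0, 0.5, 0.25, 0.125):                        # step halving
            trial = unit(theta + s * step)
            if rss(trial) < current:
                cand = trial
                break
        if cand is None:                                         # no improvement
            break
        converged = np.linalg.norm(cand - theta) < tol           # Step 5
        theta = cand
        if converged:
            break
    return theta

# Illustrative data: rows of D play the role of Delta_i^T, g = tanh is unknown
# to the fitting routine.
rng = np.random.default_rng(3)
n = 400
D = rng.normal(size=(n, 3))
theta_true = unit(np.array([2.0, 1.0, -1.0]))
Y = np.tanh(D @ theta_true) + 0.1 * rng.normal(size=n)
theta_hat = fit_unknown_link(D, Y)
```

The step-halving check only accepts an update when it lowers the residual sum of squares after the link has been re-estimated, which keeps the sketch stable at the cost of extra local linear fits per iteration.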

4. Asymptotic Properties

To derive the asymptotic properties of the link function estimate $\hat{g}$ and the regression coefficient estimates $\hat{θ}$, some additional assumptions are required:

(C1): There exists $b = \max(4, c)$ for a constant $c > 0$ such that $E[\int_{T} ∥X_{j}(t)∥^{b}\, dt] < \infty$, $j = 1, \dots, d$, $E[∥Z∥^{b}] < \infty$, and $E[ε] < \infty$.
(C2): The density function $f(\cdot)$ of $η_{i}$ is strictly positive and satisfies the first-order Lipschitz condition when $θ \to θ_{0}$.
(C3): The kernel function $k(\cdot)$ is a bounded, continuous, symmetric probability density function that satisfies the first-order Lipschitz condition, $\int_{-\infty}^{\infty} u^{2} k(u)\, du \neq 0$, and $\int_{-\infty}^{\infty} |u|^{2} k(u)\, du < \infty$.
(C4): $n h^{4} / \log^{2} n \to \infty$ and $n h^{5} = O(1)$, where $h$ is the bandwidth of the kernel function.
(C5): For $j = 1, \dots, d$, $m_{j} n^{-1/4} \to 0$ as $n \to \infty$.

Remark 1.

(C1) is a necessary condition for the asymptotic normality of the estimator. (C2) ensures that $\tilde{g}^{(α)}$ and $\tilde{g}^{\prime (α)}$ are far from 0 when $\tilde{θ}^{(α)}$ is close enough to $θ$. (C3) collects the usual assumptions on the kernel function. (C4) collects the usual assumptions on the bandwidth. (C5) controls the growth of $m_{j}$ in order to make the convergence faster.
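As a worked illustration of (C4), consider the common parametrization $h = c\, n^{-a}$, which is an assumption made here and not a choice stated in the paper. Then $n h^{4} / \log^{2} n$ grows like $n^{1 - 4a} / \log^{2} n$, which diverges exactly when $a < 1/4$, and $n h^{5}$ behaves like $n^{1 - 5a}$, which stays bounded exactly when $a \geq 1/5$. The hypothetical helpers below encode this range and evaluate the two quantities numerically.

```python
import math

def c4_terms(n, a, c=1.0):
    """Return (n h^4 / log^2 n, n h^5) for the bandwidth h = c * n**(-a)."""
    h = c * n ** (-a)
    return n * h**4 / math.log(n) ** 2, n * h**5

def satisfies_c4(a):
    """For h = c n^{-a}, (C4) holds if and only if 1/5 <= a < 1/4."""
    return 0.2 <= a < 0.25

# With a = 1/5 the first term grows (slowly) while the second stays at c^5.
first_small, second_small = c4_terms(1e10, 0.2)
first_large, second_large = c4_terms(1e20, 0.2)
```

The first quantity increases only slowly because of the logarithmic factor, so the divergence required by (C4) becomes visible only for very large $n$.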

4.1. Asymptotic Convergence of $g^{(α)}$

Lemma 1.

Let $(ζ_{1}, ι_{1}), \dots, (ζ_{n}, ι_{n})$ be independent and identically distributed random vectors. Furthermore, assume that for any $s > 0$, $E|ι_{i}|^{s} < \infty$, $i = 1, \dots, n$, and $\sup_{ζ} \int |ι|^{s} f(ζ, ι)\, dι < \infty$, where $f(\cdot, \cdot)$ is the joint density function of $(ζ, ι)$. Let $k(\cdot)$ be a bounded and strictly positive kernel function that satisfies the Lipschitz condition. Then

$$\sup_{ζ} \Big| \frac{1}{n} \sum_{i=1}^{n} \big[k_{h}(ζ_{i} - ζ) ι_{i} - E[k_{h}(ζ_{i} - ζ) ι_{i}]\big] \Big| = O_{p}\Big[\Big(\frac{\log(1/h)}{n h}\Big)^{1/2}\Big].$$

Proof.

See Proposition 4 in Mack and Silverman (1982) [22]. □

Theorem 1.

If (C1)–(C5) hold and $σ^{2} > 0$, then

$$\sqrt{n h}\, [\tilde{g}^{(α)}(u; θ) - g(u) - I(u)] \overset{D}{\to} N(0, ϑ^{2}(u)),$$

where $I(u) = \frac{1}{2} h^{2} μ_{2} g''(u)$, $ϑ^{2}(u) = ν_{2} σ^{2} / \sum_{j=1}^{d} f_{j}(u)$, and, for the kernel function,

$$μ_{l} = \int u^{l} k(u)\, du, \qquad ν_{l} = \int k^{l}(u)\, du, \qquad l = 1, 2.$$

Proof.

Minimizing (6) at $θ_{0}$ gives

$$(\tilde{g}^{(α)}(u; θ_{0}),\ h \tilde{g}^{\prime (α)}(u; θ_{0}))^{T} = Γ_{n}^{-1}(u; θ_{0})\, Φ_{n}(u; θ_{0}), \tag{7}$$

where

$$Γ_{n}(u; θ_{0}) = \begin{pmatrix} φ_{n,0}(u; θ_{0}) & φ_{n,1}(u; θ_{0}) \\ φ_{n,1}(u; θ_{0}) & φ_{n,2}(u; θ_{0}) \end{pmatrix}, \qquad Φ_{n}(u; θ_{0}) = (ϕ_{n,0}(u; θ_{0}), ϕ_{n,1}(u; θ_{0}))^{T},$$

$$φ_{n,l}(u; θ_{0}) = \frac{1}{n} \sum_{i=1}^{n} \Big(\frac{η_{i0} - u}{h}\Big)^{l} k_{h}(η_{i0} - u), \quad l = 0, 1, 2,$$

$$ϕ_{n,l}(u; θ_{0}) = \frac{1}{n} \sum_{i=1}^{n} Y_{i} \Big(\frac{η_{i0} - u}{h}\Big)^{l} k_{h}(η_{i0} - u), \quad l = 0, 1,$$

$$η_{i0} = \sum_{j=1}^{d} \sum_{k=1}^{m_{j}} ξ_{ijk} χ_{jk} + Z_{i}^{T} γ.$$

Taking the expectation of $φ_{n,l}(u; θ)$, $l = 0, 1, 2, 3$, we obtain

$$E[φ_{n,l}(u; θ)] = E\Big[\frac{1}{n} \sum_{i=1}^{n} \Big(\frac{η_{i} - u}{h}\Big)^{l} k_{h}(η_{i} - u)\Big] = \sum_{j=1}^{d} f_{j}(u) μ_{l} + O(h). \tag{8}$$

From Lemma 1, for $l = 0, 1, 2, 3$,

$$φ_{n,l}(u; θ) - E[φ_{n,l}(u; θ)] = O_{p}\Big[\Big(\frac{\log(1/h)}{n h}\Big)^{1/2}\Big]. \tag{9}$$

Substituting (8) into (9), we obtain

$$φ_{n,l}(u; θ) = \sum_{j=1}^{d} f_{j}(u) μ_{l} + O_{p}\Big[\Big(\frac{\log(1/h)}{n h}\Big)^{1/2} + h\Big].$$

Then

$$Γ_{n}(u; θ) = Γ(u) + O_{p}\Big[\Big(\frac{\log(1/h)}{n h}\Big)^{1/2} + h\Big],$$

where $Γ(u) = \sum_{j=1}^{d} f_{j}(u) \otimes \mathrm{diag}(1, μ_{2})$ and $\otimes$ denotes the Kronecker product.

Inverting the matrix $Γ_{n}(u; θ)$, we get

$$Γ_{n}^{-1}(u; θ) = Γ^{-1}(u) + O_{p}\Big[\Big(\frac{\log(1/h)}{n h}\Big)^{1/2} + h\Big].$$

Let

$$ϕ_{n,l}^{*}(u; θ) = \frac{1}{n} \sum_{i=1}^{n} [Y_{i} - g(η_{i})] \Big(\frac{η_{i} - u}{h}\Big)^{l} k_{h}(η_{i} - u), \tag{10}$$

where $l = 0, 1$, and

$$Φ_{n}^{*} \equiv Φ_{n}^{*}(u; θ) = (ϕ_{n,0}^{*}(u; θ), ϕ_{n,1}^{*}(u; θ))^{T}.$$

Taking the expectation of $ϕ_{n,l}^{*}(u; θ)$, $l = 0, 1$, we obtain

$$E[ϕ_{n,l}^{*}(u; θ)] = E\Big[\frac{1}{n} \sum_{i=1}^{n} [Y_{i} - g(η_{i})] \Big(\frac{η_{i} - u}{h}\Big)^{l} k_{h}(η_{i} - u)\Big] = O(n^{-1/2}). \tag{11}$$

From Lemma 1, combined with (10) and (11), we can prove that

$$ϕ_{n,l}^{*}(u; θ) = O_{p}\Big[\Big(\frac{\log(1/h)}{n h}\Big)^{1/2} + n^{-1/2}\Big].$$

A Taylor expansion of $g(η_{i})$ at $u$ gives

$$Φ_{n} - Φ_{n}^{*} = Γ_{n} \begin{pmatrix} g(u) \\ h g'(u) \end{pmatrix} + \frac{1}{2} h^{2} \begin{pmatrix} φ_{n,2}\, g''(u) \\ φ_{n,3}\, g''(u) \end{pmatrix} + o_{p}(h^{2}), \tag{12}$$

where $Γ_{n} = Γ_{n}(u; θ)$ and $φ_{n,l} = φ_{n,l}(u; θ)$, $l = 2, 3$.

Combining (7) and (12), we obtain

$$\begin{aligned} \begin{pmatrix} \tilde{g}^{(α)} - g \\ h [\tilde{g}^{\prime (α)} - g'] \end{pmatrix} &= Γ_{n}^{-1} Φ_{n}^{*} + \frac{1}{2} Γ_{n}^{-1} h^{2} \begin{pmatrix} φ_{n,2}\, g''(u) \\ φ_{n,3}\, g''(u) \end{pmatrix} + o_{p}(h^{2}) \\ &= Γ^{-1}(u) Φ_{n}^{*} + \frac{1}{2} h^{2} \begin{pmatrix} μ_{2}\, g''(u) \\ \frac{μ_{3}}{μ_{2}}\, g''(u) \end{pmatrix} + o_{p}(h^{2} + n^{-1/2}), \end{aligned}$$

whose first component is

$$\tilde{g}^{(α)} - g = \Big[\sum_{j=1}^{d} f_{j}(u)\Big]^{-1} ϕ_{n,0}^{*}(u; θ) + \frac{1}{2} h^{2} μ_{2} g''(u) + o_{p}(h^{2} + n^{-1/2}). \tag{13}$$

Since $∥θ - θ_{0}∥ = O(n^{-1/2})$, (10) can be rewritten as

$$ϕ_{n,0}^{*}(u; θ) = \frac{1}{n} \sum_{i=1}^{n} [Y_{i} - g(η_{i0})]\, k_{h}(η_{i0} - u) + O_{p}(n^{-1/2}).$$

Substituting this into (13) and applying Theorem 3.3 in Masry and Tjøstheim (1995) [23] proves Theorem 1. □

Corollary 1.

If the condition in assumption (C4) is strengthened to $n h^{5} \to 0$, then

$$\sqrt{n h}\, [\tilde{g}^{(α)}(u; θ) - g(u)] \overset{D}{\to} N(0, ϑ^{2}(u)).$$
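The kernel constants $μ_{2}$ and $ν_{2}$ and the resulting pointwise interval can be computed numerically. The sketch below is illustrative only: it assumes a Gaussian kernel, and the inputs `g_hat`, `g_second`, `f_sum` and `sigma2` are hypothetical stand-ins for $\tilde{g}^{(α)}(u)$, $g''(u)$, $\sum_{j} f_{j}(u)$ and $σ^{2}$, all of which would have to be replaced by plug-in estimates in practice.

```python
import numpy as np

def kernel_moments(kernel, lo, hi, n_grid=200001):
    """Approximate mu_2 = int u^2 k(u) du and nu_2 = int k(u)^2 du on [lo, hi]."""
    u = np.linspace(lo, hi, n_grid)
    du = u[1] - u[0]
    k = kernel(u)
    return np.sum(u**2 * k) * du, np.sum(k**2) * du

def pointwise_band(g_hat, g_second, f_sum, sigma2, n, h, mu2, nu2, z=1.96):
    """Bias-corrected interval g_hat - I(u) -/+ z * vartheta(u) / sqrt(n h)."""
    bias = 0.5 * h**2 * mu2 * g_second             # I(u) from Theorem 1
    sd = np.sqrt(nu2 * sigma2 / (f_sum * n * h))   # vartheta(u) / sqrt(n h)
    centre = g_hat - bias
    return centre - z * sd, centre + z * sd

gauss = lambda v: np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)
mu2, nu2 = kernel_moments(gauss, -10.0, 10.0)      # exact: 1 and 1 / (2 sqrt(pi))
lower, upper = pointwise_band(g_hat=0.8, g_second=-0.5, f_sum=0.4,
                              sigma2=0.01, n=400, h=0.3, mu2=mu2, nu2=nu2)
```

Under the stronger bandwidth condition of Corollary 1 the bias term $I(u)$ is asymptotically negligible, so `g_second` can be set to zero and the interval is centred at the estimate itself.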

4.2. Asymptotic Convergence of $\hat{θ}$

First, we give some more specific details on the iterative estimation process described in Section 3.3, in preparation for Theorem 2.

(1) Solving $U(θ) = 0$ in Equation (5) under the assumption that the link function is known. Let $Q_{θ_{0}} = - \frac{\partial U(θ_{0})}{\partial μ} \frac{\partial μ}{\partial θ_{0}^{T}}$; then

$$Q_{θ_{0}} = \sum_{i=1}^{n} \frac{[g'(η_{i0})]^{2}}{σ^{2}(μ_{i0})} Δ_{i} Δ_{i}^{T} + \sum_{i=1}^{n} (Y_{i} - μ_{i0}) \Big[\frac{g'(η_{i0}) [σ^{2}(μ_{i0})]'}{[σ^{2}(μ_{i0})]^{2}} - \frac{g''(η_{i0})}{σ^{2}(μ_{i0})}\Big] Δ_{i} Δ_{i}^{T}.$$

Let $\bar{θ} \in θ_{n}$, where $\bar{θ} = (\bar{χ}_{j1}, \bar{χ}_{j2}, \dots, \bar{χ}_{j m_{j}}, \bar{γ}_{1}, \dots, \bar{γ}_{q})^{T}$, $j = 1, 2, \dots, d$, and define $\bar{η}_{i} = \sum_{j=1}^{d} \sum_{k=1}^{m_{j}} ξ_{ijk} \bar{χ}_{jk} + Z_{i}^{T} \bar{γ}$ and $\bar{μ}_{i} = g(\bar{η}_{i})$. Similarly, we obtain

$$Q_{\bar{θ}} = \sum_{i=1}^{n} \frac{[g'(\bar{η}_{i})]^{2}}{σ^{2}(\bar{μ}_{i})} Δ_{i} Δ_{i}^{T} + \sum_{i=1}^{n} (Y_{i} - \bar{μ}_{i}) \Big[\frac{g'(\bar{η}_{i}) [σ^{2}(\bar{μ}_{i})]'}{[σ^{2}(\bar{μ}_{i})]^{2}} - \frac{g''(\bar{η}_{i})}{σ^{2}(\bar{μ}_{i})}\Big] Δ_{i} Δ_{i}^{T}.$$

(2) Solving $U^{*}(θ) = 0$ when the link function is unknown, where

$$U^{*}(θ_{0}) = \sum_{i=1}^{n} (Y_{i} - \tilde{μ}_{i0}) \frac{\tilde{g}^{\prime (α)}(η_{i0})}{σ^{2}(\tilde{μ}_{i0})} Δ_{i} = 0$$

and $\tilde{μ}_{i0} = \tilde{g}^{(α)}(η_{i0})$. Similarly, we obtain

$$Q_{\bar{θ}}^{*} = \sum_{i=1}^{n} \frac{[\tilde{g}^{\prime (α)}(\bar{η}_{i})]^{2}}{σ^{2}(\bar{μ}_{i}^{*})} Δ_{i} Δ_{i}^{T} + \sum_{i=1}^{n} (Y_{i} - \bar{μ}_{i}^{*}) \Big[\frac{\tilde{g}^{\prime (α)}(\bar{η}_{i}) [σ^{2}(\bar{μ}_{i}^{*})]'}{[σ^{2}(\bar{μ}_{i}^{*})]^{2}} - \frac{\tilde{g}^{\prime\prime (α)}(\bar{η}_{i})}{σ^{2}(\bar{μ}_{i}^{*})}\Big] Δ_{i} Δ_{i}^{T},$$

where $\bar{μ}_{i}^{*} = \tilde{g}^{(α)}(\bar{η}_{i})$.

Lemma 2.

If the assumptions (C1)–(C5) hold, we have

$$\sup_{\bar{θ} \in θ_{n}} \frac{1}{n} |Q_{\bar{θ}}^{*} - Q_{\bar{θ}}| = o_{p}(1).$$

Proof.

Let $M_{i} = \frac{1}{σ^{2}(\bar{μ}_{i}^{*})} - \frac{1}{σ^{2}(\bar{μ}_{i})}$ and $N_{i} = \frac{[σ^{2}(\bar{μ}_{i}^{*})]'}{[σ^{2}(\bar{μ}_{i}^{*})]^{2}} - \frac{[σ^{2}(\bar{μ}_{i})]'}{[σ^{2}(\bar{μ}_{i})]^{2}}$. By Theorem 4.1 of Chiou and Müller [24], we know that $\max_{1 \leq i \leq n} |M_{i}| = o_{p}(1)$ and $\max_{1 \leq i \leq n} |N_{i}| = o_{p}(1)$. Then

$$\frac{1}{n} |Q_{\bar{θ}}^{*} - Q_{\bar{θ}}| = A + B + C, \tag{14}$$

where $A$, $B$, and $C$ can be expressed as

$$A = \frac{1}{n} \sum_{i=1}^{n} \Big[\frac{[\tilde{g}^{\prime (α)}(\bar{η}_{i})]^{2}}{σ^{2}(\bar{μ}_{i}^{*})} - \frac{[g'(\bar{η}_{i})]^{2}}{σ^{2}(\bar{μ}_{i})}\Big] Δ_{i} Δ_{i}^{T} \leq \frac{1}{n} \max_{1 \leq i \leq n} |M_{i}|,$$

$$\begin{aligned} B &= \frac{1}{n} \sum_{i=1}^{n} (Y_{i} - \bar{μ}_{i}) \Big[\frac{\tilde{g}^{\prime (α)}(\bar{η}_{i}) [σ^{2}(\bar{μ}_{i}^{*})]'}{[σ^{2}(\bar{μ}_{i}^{*})]^{2}} - \frac{g'(\bar{η}_{i}) [σ^{2}(\bar{μ}_{i})]'}{[σ^{2}(\bar{μ}_{i})]^{2}} - \Big(\frac{\tilde{g}^{\prime\prime (α)}(\bar{η}_{i})}{σ^{2}(\bar{μ}_{i}^{*})} - \frac{g''(\bar{η}_{i})}{σ^{2}(\bar{μ}_{i})}\Big)\Big] Δ_{i} Δ_{i}^{T} \\ &\leq \frac{1}{n} \sum_{i=1}^{n} (Y_{i} - \bar{μ}_{i}) \Big(\max_{1 \leq i \leq n} |N_{i}| + \max_{1 \leq i \leq n} |M_{i}|\Big), \end{aligned}$$

$$\begin{aligned} C &= \frac{1}{n} \sum_{i=1}^{n} (\bar{μ}_{i}^{*} - \bar{μ}_{i}) \Big[\frac{\tilde{g}^{\prime (α)}(\bar{η}_{i}) [σ^{2}(\bar{μ}_{i}^{*})]'}{[σ^{2}(\bar{μ}_{i}^{*})]^{2}} - \frac{\tilde{g}^{\prime\prime (α)}(\bar{η}_{i})}{σ^{2}(\bar{μ}_{i}^{*})}\Big] Δ_{i} Δ_{i}^{T} \\ &= o_{p}(1). \end{aligned}$$

Then, by (14), we get

$$\frac{1}{n} |Q_{\bar{θ}}^{*} - Q_{\bar{θ}}| = o_{p}(1).$$

□

Lemma 3.

If the assumptions (C1)–(C5) hold, we have

$$\frac{1}{\sqrt{n}} |U^{*}(θ_{0}) - U(θ_{0})| = o_{p}(1).$$

Proof.

Combining Theorem 1 and $\max_{1 \leq i \leq n} |M_{i}| = o_{p}(1)$ from Lemma 2, we can prove that

$$\begin{aligned} \frac{1}{\sqrt{n}} (U^{*}(θ_{0}) - U(θ_{0})) &= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Big[(Y_{i} - μ_{i0})\, g'(η_{i0}) \Big(\frac{1}{σ^{2}(\tilde{μ}_{i0})} - \frac{1}{σ^{2}(μ_{i0})}\Big) \\ &\quad + \frac{Y_{i} - μ_{i0}}{σ^{2}(\tilde{μ}_{i0})} \big(\tilde{g}^{\prime (α)}(η_{i0}) - g'(η_{i0})\big) + \frac{\tilde{g}^{\prime (α)}(η_{i0})}{σ^{2}(\tilde{μ}_{i0})} (μ_{i0} - \tilde{μ}_{i0})\Big] Δ_{i} \\ &= o_{p}(1). \end{aligned}$$

□

Theorem 2.

If (C1)–(C5) hold, we have

$$\begin{pmatrix} \frac{n\, d_{G}^{2}(\hat{β}_{1}, β_{1}) - m_{1}}{\sqrt{2 m_{1}}} \\ \vdots \\ \frac{n\, d_{G}^{2}(\hat{β}_{d}, β_{d}) - m_{d}}{\sqrt{2 m_{d}}} \\ \sqrt{n o_{1}}\, (γ_{1} - \hat{γ}_{1}) \\ \vdots \\ \sqrt{n o_{q}}\, (γ_{q} - \hat{γ}_{q}) \end{pmatrix} \overset{d}{\to} N(0, I).$$

For the model truncated at $m_{j}$, let $\hat{χ}_{j}$ be the estimator of $χ_{j} = (χ_{j1}, χ_{j2}, \dots, χ_{j m_{j}})^{T}$, and let $\tilde{Λ}_{j} = (λ_{j, k_{1} k_{2}})$, $1 \leq k_{1}, k_{2} \leq m_{j}$, where $λ_{j, k_{1} k_{2}} = E\big[\frac{[g'(η)]^{2}}{σ^{2}(μ)} ξ_{j k_{1}} ξ_{j k_{2}}\big]$. We define $\bar{χ}_{j} = (χ_{j (m_{j} + 1)}, χ_{j (m_{j} + 2)}, \dots)^{T}$. Then

$$d_{G}^{2}(\hat{β}_{j}, β_{j}) = (\hat{χ}_{j} - χ_{j})^{T} \tilde{Λ}_{j} (\hat{χ}_{j} - χ_{j}) + \sum_{k_{1}, k_{2} = m_{j}}^{\infty} λ_{j, k_{1} k_{2}} \bar{χ}_{j}^{2}, \quad j = 1, \dots, d.$$

Furthermore, let $o_{l} = E\big[\frac{[g'(η)]^{2}}{σ^{2}(μ)} z_{il}^{2}\big]$, $l = 1, \dots, q$. Here, $I$ represents a $(q + \sum_{j=1}^{d} m_{j}) \times (q + \sum_{j=1}^{d} m_{j})$ dimensional identity matrix.

Proof.

By using the Taylor expansion with a suitable mean value

\bar{θ}

, we can obtain

U^{*} (\hat{θ}) = U^{*} (θ_{0}) - Q_{\bar{θ}}^{*} (\hat{θ} - θ_{0}) = 0 .

(15)

Then, by Lemmas 2 and 3, (15) can be rewritten as

U^{*} (\hat{θ}) = U (θ_{0}) - Q_{\bar{θ}} (\hat{θ} - θ_{0}) + o_{p} (\sqrt{n}) = 0 .

Then, we can get

\hat{θ} - θ_{0} = Q_{\bar{θ}}^{- 1} U (θ_{0}) + o_{p} (\frac{1}{\sqrt{n}}) .

By combining the above equation with

U ({\tilde{θ}}^{(α)}) = U (θ_{0}) - Q_{\bar{θ}} ({\tilde{θ}}^{(α)} - θ_{0}) = 0,

we can get

\sqrt{n} (\hat{θ} - θ_{0}) = \sqrt{n} ({\tilde{θ}}^{(α)} - θ_{0}) + o_{p} (1) .

(16)
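The step from (15) to (16) can be spelled out as follows. This is only a sketch: in addition to Lemmas 2 and 3, it assumes that n^{-1} Q_{\bar{θ}} converges in probability to a positive definite matrix, so that Q_{\bar{θ}}^{-1} = O_{p} (n^{-1}). That regularity condition is standard for score equations but is stated here as an assumption, not as part of (C1)–(C5).

\hat{θ} - θ_{0} = Q_{\bar{θ}}^{- 1} U (θ_{0}) + Q_{\bar{θ}}^{- 1} o_{p} (\sqrt{n}) = Q_{\bar{θ}}^{- 1} U (θ_{0}) + o_{p} (\frac{1}{\sqrt{n}}),

{\tilde{θ}}^{(α)} - θ_{0} = Q_{\bar{θ}}^{- 1} U (θ_{0}),

\sqrt{n} (\hat{θ} - {\tilde{θ}}^{(α)}) = \sqrt{n} o_{p} (\frac{1}{\sqrt{n}}) = o_{p} (1) .

Subtracting the second line from the first and multiplying by \sqrt{n} gives the third line, which is (16) written as a difference.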

Equation (16) transforms the relationship between

\hat{θ}

and

θ_{0}

in the case of unknown link functions into the relationship between

{\tilde{θ}}^{(α)}

and

θ_{0}

in the case of known link functions. Combining this with Theorem 1 in [25] completes the proof of Theorem 2. □

4.3. Asymptotic Convergence of $\hat{g}$

Theorem 3.

If we assume that (C1)–(C5) hold, for

σ^{2} > 0

, then we have

\sqrt{n h} [\hat{g} (u; \hat{θ}) - g (u) - I (u)] \overset{D}{\to} N (0, ϑ^{2} (u)) .

Proof.

\begin{matrix} \sqrt{n h} [\hat{g} (u; \hat{θ}) - g (u) - I (u)] = & \sqrt{n h} [\hat{g} (u; \hat{θ}) - \hat{g} (u; θ) + {\tilde{g}}^{(α)} (u; θ) \\ - g (u) - I (u)] \\ \leq & \sqrt{n h} [{\hat{g}}^{'} (u; θ) |\hat{θ} - θ|] \\ + \sqrt{n h} [{\tilde{g}}^{(α)} (u; θ) - g (u) - I (u)] \\ = & \sqrt{n h} [{\tilde{g}}^{(α)} (u; θ) - g (u) - I (u)] + o_{p} (1) . \end{matrix}

The above expression transforms the relationship between

\hat{g}

and g into the relationship between

{\tilde{g}}^{(α)}

and g (i.e., Theorem 1). Therefore, by Theorem 1, we can get Theorem 3. □

Corollary 2.

If we further refine the condition in assumption (C4) such that

n h^{5} \to 0

, then it follows that

\sqrt{n h} [\hat{g} (u; \hat{θ}) - g (u)] \overset{D}{\to} N (0, ϑ^{2} (u)) .

Remark 2.

Let

(e_{1}, λ_{1}), (e_{2}, λ_{2}), \dots, (e_{m_{j}}, λ_{m_{j}})

represent the eigenvalues and eigenvectors of Ω, where

e_{k} = (e_{j 1}, \dots, e_{j m_{j}}), w_{k} (t) = \sum_{j = 1}^{d} ρ_{j k} (t) e_{j k}, k = 1, 2, \dots, m_{j},

Ω = \frac{1}{n} E (\frac{{\hat{g}}^{'} {({\hat{η}}_{i})}^{2}}{σ^{2} ({\hat{μ}}_{i})} D_{j}^{T} D_{j}), i = 1, \dots, n, j = 1, \dots, d .

Then, the 95% confidence band for the regression coefficient function

{\hat{β}}_{j} (t)

can be expressed as

{\hat{β}}_{j} (t) \pm \sqrt{r (α) \sum_{k = 1}^{m_{j}} \frac{w_{k} {(t)}^{2}}{e_{k}}},

where

r (α) = [m_{j} + \sqrt{2 m_{j}} Φ (1 - α)]

,

α = 0.05

,

Φ (1 - α) = 1.96

.

5. Simulation

We consider a binary response and two functional predictors as well as three scalar predictors. The functional predictors

X_{i 1} (t)

and

X_{i 2} (t)

(

i = 1, \dots, n

) are observed at 50 equally spaced time points on the interval

[0, 1]

.

The sample sizes are

n = 50, 100, 300

. Let the score coefficients

ξ_{i j k}

for each functional predictor satisfy the following assumptions:

ξ_{i 1 k} \sim N (0, λ_{1 k}), k = 1, 2, 3, 4,

where

λ_{11} = 1, λ_{12} = \sqrt{2} / 2, λ_{13} = 1 / 2, λ_{14} = \sqrt{2} / 4

.

ξ_{i 2 k} \sim N (0, λ_{2 k}), k = 1, 2, 3,

where

λ_{21} = 1, λ_{22} = \sqrt{2} / 2, λ_{23} = 1 / 2

.

We define the orthonormal basis functions

ρ_{1 k} (t)

and

ρ_{2 k} (t)

,

t \in [0, 1]

, which satisfy

ρ_{1 k} (t) = \sqrt{2} \sin (2 k π t), k = 1, 2, 3, 4,

ρ_{2 k} (t) = \sqrt{2} \cos (2 k π t), k = 1, 2, 3 .

Then,

X_{i j} (t)

can be represented through the Karhunen–Loeve expansion as follows:

X_{i 1} (t) = \sum_{k = 1}^{4} ξ_{i 1 k} ρ_{1 k} (t),

X_{i 2} (t) = \sum_{k = 1}^{3} ξ_{i 2 k} ρ_{2 k} (t) .
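As an illustration, the construction of the two functional predictors can be sketched in Python with NumPy. This is a minimal sketch, not the authors' code: the function name `simulate_functional_predictors`, the seed, and the reading of λ_{jk} as a variance (the second argument of N(0, λ_{jk})) are assumptions made here.

```python
import numpy as np

def simulate_functional_predictors(n, n_grid=50, seed=0):
    """Draw X_{i1}(t) and X_{i2}(t) on an equally spaced grid of [0, 1]."""
    rng = np.random.default_rng(seed)  # seed is arbitrary (assumption)
    t = np.linspace(0.0, 1.0, n_grid)

    # Score variances lambda_{1k} and lambda_{2k} (assumed to be variances).
    lam1 = np.array([1.0, np.sqrt(2) / 2, 0.5, np.sqrt(2) / 4])
    lam2 = np.array([1.0, np.sqrt(2) / 2, 0.5])

    # Basis functions evaluated on the grid, one row per index k.
    rho1 = np.sqrt(2) * np.sin(2 * np.pi * np.outer(np.arange(1, 5), t))
    rho2 = np.sqrt(2) * np.cos(2 * np.pi * np.outer(np.arange(1, 4), t))

    # rng.normal expects a standard deviation, hence the square roots.
    xi1 = rng.normal(0.0, np.sqrt(lam1), size=(n, 4))
    xi2 = rng.normal(0.0, np.sqrt(lam2), size=(n, 3))

    # Truncated Karhunen-Loeve sums: rows are curves, columns are grid points.
    X1 = xi1 @ rho1
    X2 = xi2 @ rho2
    return t, X1, X2

t, X1, X2 = simulate_functional_predictors(n=50)
print(X1.shape, X2.shape)
```

Each row of `X1` and `X2` is one simulated trajectory evaluated at the 50 grid points.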

Figure 1 shows the 50 trajectories of the two functional predictors

X_{1} (t)

and

X_{2} (t)

.

Figure 1. The predictors

X_{1} (t)

and

X_{2} (t)

.

The scalar predictor

Z = {(Z_{1}, Z_{2}, Z_{3})}^{T}

satisfies the following assumption

Z_{1} \sim N (0, 1), Z_{2} \sim N (0, \frac{\sqrt{3}}{3}), Z_{3} \sim N (0, \frac{\sqrt{5}}{5}) .

We assume that the regression coefficient functions of the functional predictors satisfy the following assumption

β_{1} (t) = \sum_{k = 1}^{4} χ_{1 k} ρ_{1 k} (t),

β_{2} (t) = \sum_{k = 1}^{3} χ_{2 k} ρ_{2 k} (t),

where

χ_{1 k} = \sqrt{\frac{1}{3 k}}, k = 1, 2, 3, 4

and

χ_{2 k} = \sqrt{\frac{1}{3 k}}, k = 1, 2, 3

. Moreover, we assume that the regression coefficients

γ = {(γ_{1}, γ_{2}, γ_{3})}^{T}

of the scalar predictors satisfy

γ_{1} = \sqrt{2} / 2

,

γ_{2} = \sqrt{3} / 3

,

γ_{3} = 1 / 2

.

Define

P (X, Z) = g (\sum_{j = 1}^{2} \int_{T} X_{j} (t) β_{j} (t) d t + Z^{T} γ) .

We select the link function as

g (x) = \frac{exp (x)}{1 + exp (x)} .

We generate the binary response

Y (X, Z) \sim \mathrm{Binomial} (1, P (X, Z))

as a pseudo-random sequence.
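The data-generating mechanism described above can be sketched as follows. Because the basis functions are orthonormal on [0, 1], each integral of X_{j}(t) β_{j}(t) reduces to the sum of products of scores and coefficients, so the sketch works with the scores directly. The function name `simulate_responses`, the seed, and the reading of the second parameter of each normal distribution as a variance are assumptions of this illustration.

```python
import numpy as np

def simulate_responses(n, seed=0):
    """Draw (Y, eta, P, Z) from the simulation design with a logit link."""
    rng = np.random.default_rng(seed)  # seed is arbitrary (assumption)

    # FPC scores; second parameter of N(0, lambda) treated as a variance.
    lam1 = np.array([1.0, np.sqrt(2) / 2, 0.5, np.sqrt(2) / 4])
    lam2 = np.array([1.0, np.sqrt(2) / 2, 0.5])
    xi1 = rng.normal(0.0, np.sqrt(lam1), size=(n, 4))
    xi2 = rng.normal(0.0, np.sqrt(lam2), size=(n, 3))

    # Basis coefficients chi_{jk} = sqrt(1 / (3k)) of beta_1 and beta_2.
    chi1 = np.sqrt(1.0 / (3.0 * np.arange(1, 5)))
    chi2 = np.sqrt(1.0 / (3.0 * np.arange(1, 4)))

    # Scalar predictors and their coefficients.
    z_var = np.array([1.0, np.sqrt(3) / 3, np.sqrt(5) / 5])
    Z = rng.normal(0.0, np.sqrt(z_var), size=(n, 3))
    gamma = np.array([np.sqrt(2) / 2, np.sqrt(3) / 3, 0.5])

    # Orthonormality turns each integral into a score-coefficient product.
    eta = xi1 @ chi1 + xi2 @ chi2 + Z @ gamma
    p = 1.0 / (1.0 + np.exp(-eta))  # equals exp(eta) / (1 + exp(eta))
    y = rng.binomial(1, p)          # one Bernoulli draw per observation
    return y, eta, p, Z

y, eta, p, Z = simulate_responses(n=300)
print(y.mean())
```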

We obtain a sample

(Y_{i}, X_{i 1} (t), X_{i 2} (t), Z_{i}), i = 1, \dots, n,

where n is the sample size. For n = 50, 100, 300, the numbers of functional principal components that explain 85% of the cumulative variation are

m_{1} = 3, 3, 4

,

m_{2} = 2, 3, 3

, respectively. We run 100 simulations.

Figure 2 shows the asymptotic behavior of the link function under different sample sizes. The black line in Figure 2 shows the relationship between

η

and

μ

, where

η = \sum_{j = 1}^{2} \int_{T} X_{j} (t) β_{j} (t) d t + Z^{T} γ, μ = g (η) = \frac{exp (η)}{1 + exp (η)} \in [0, 1] .

The additional colored lines shown in Figure 2 represent the estimated link function

\hat{g}

for different sample sizes. These lines are obtained through iterative processes, starting with an initial value of g set to

g (η) = η

. The iterative process continues until one of the following conditions is met: 100 iterations have been performed, or the error in the regression coefficients is less than 0.01. The purpose of these lines is to illustrate the relationship between

\hat{η}

and

\hat{μ}

, where

\hat{η} = \sum_{j = 1}^{2} \int_{T} X_{j} (t) {\hat{β}}_{j} (t) d t + Z^{T} \hat{γ}, \hat{μ} = \hat{g} (\hat{η}) \in [0, 1] .

Since in this case, both

\hat{η}

and

η

are in

[- 2, 2]

, we denote the argument of g and

\hat{g}

by

η

, and the x-axis in Figure 2 is denoted by

η

and is shown in the interval

[0, 1]

. Table 2 presents the estimates of

\hat{g}

evaluated through RMISE under different sample sizes. The RMISE is defined as follows:

\mathrm{RMISE} = \sqrt{\frac{1}{Q} \int_{- 2}^{2} {(\hat{g} (η) - g (η))}^{2} d η},

where

Q = 100

is the number of simulations here. In summary, Figure 2 and Table 2 demonstrate that as the sample size increases, the estimated link function

\hat{g}

becomes closer and closer to the true link function g.
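To make the comparison between the estimated and the true link concrete, the following sketch fits a local linear smoother of Y on η and computes an RMISE on [-2, 2]. Several choices are assumptions of this illustration and not taken from the article: the Gaussian kernel, the bandwidth `h=0.4`, the function names `local_linear` and `rmise`, the trapezoidal rule for the integral, and the reading of the RMISE as the square root of the integrated squared error averaged over the Q runs. The full iterative algorithm that alternates between coefficients and link is not reproduced here.

```python
import numpy as np

def local_linear(u_grid, eta, y, h):
    """Local linear estimate of E[Y | eta = u] at each u in u_grid."""
    u_grid = np.asarray(u_grid, dtype=float)
    d = eta[None, :] - u_grid[:, None]   # (m, n) distances eta_i - u
    w = np.exp(-0.5 * (d / h) ** 2)      # Gaussian kernel (assumption)
    s0, s1, s2 = w.sum(1), (w * d).sum(1), (w * d * d).sum(1)
    t0, t1 = (w * y).sum(1), (w * d * y).sum(1)
    # Intercept of the weighted least-squares line centred at u.
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 ** 2)

def rmise(g_hat_runs, g_true, grid):
    """Root of the mean (over runs) integrated squared error on the grid."""
    sq = (np.atleast_2d(g_hat_runs) - g_true) ** 2         # (Q, m)
    ise = np.sum((sq[:, 1:] + sq[:, :-1]) * np.diff(grid), axis=1) / 2.0
    return float(np.sqrt(ise.mean()))

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy run: the index eta is drawn directly (hypothetical design for the demo).
rng = np.random.default_rng(0)
eta = rng.uniform(-2.0, 2.0, size=300)
y = rng.binomial(1, logistic(eta))
grid = np.linspace(-2.0, 2.0, 81)
g_hat = local_linear(grid, eta, y, h=0.4)
print(rmise(g_hat, logistic(grid), grid))
```

A local linear fit reproduces any straight line exactly, which gives a quick sanity check on the smoother before it is applied to binary responses.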

Figure 2. Asymptotic properties of the link function g. The black line in the graph represents the true link function

g = exp (η) / (1 + exp (η))

. The purple, yellow, and red lines in the graph represent the estimated link functions

\hat{g}

under sample sizes of

n = 50

,

n = 100

, and

n = 300

, respectively.

Table 2. RMISE of g and

\hat{g}

for different sample sizes n.

In Table 3, it can be seen that both the SD and RMISE of the estimated regression coefficient functions

{\hat{β}}_{1} (t)

and

{\hat{β}}_{2} (t)

decrease as the sample size n increases.

Table 3. SD and RMISE of the estimated values of

{\hat{β}}_{1} (t)

and

{\hat{β}}_{2} (t)

for different sample sizes n.

Figure 3 displays the estimated functional regression coefficients

{\hat{β}}_{1} (t)

and

{\hat{β}}_{2} (t)

, as well as their 95% confidence intervals under different sample sizes. The red curve in the figure represents the theoretical values of

β_{1} (t)

and

β_{2} (t)

, while the blue curve represents the estimated values

{\hat{β}}_{1} (t)

and

{\hat{β}}_{2} (t)

. The gray shaded area represents the 95% confidence interval of the estimates. It can be seen that as the sample size increases, the estimated values become closer to the true values.

Figure 3. Estimated values of regression coefficient function

{\hat{β}}_{1} (t)

,

{\hat{β}}_{2} (t)

(blue curves) and their 95% confidence intervals (grey area) for different sample sizes, where the red curves are the theoretical regression coefficient functions

β_{1} (t)

,

β_{2} (t)

.

Table 4 presents the estimated scalar regression coefficient

\hat{γ}

and corresponding standard deviation under different sample sizes. It can be seen that as the sample size n increases,

\hat{γ} = {({\hat{γ}}_{1}, {\hat{γ}}_{2}, {\hat{γ}}_{3})}^{T}

becomes closer to the true values

γ = {(\sqrt{2} / 2, \sqrt{3} / 3, 1 / 2)}^{T}

. Moreover, as the sample size n increases, the SD becomes smaller, indicating that the estimated values have more certainty.

Table 4. Estimated values of scalar regression coefficients

\hat{γ}

and their SD in brackets for different sample sizes n.

Table 5 presents the M1 and M2 values for different sample sizes, where

M1 = \frac{1}{Q} \sum_{q = 1}^{Q} MAE_{q}

,

MAE_{q} = \frac{1}{n} \sum_{i = 1}^{n} | Y_{i} - {\hat{Y}}_{i} |

,

M2 = \frac{1}{Q} \sum_{q = 1}^{Q} MSE_{q}

,

MSE_{q} = \frac{1}{n} \sum_{i = 1}^{n} {(Y_{i} - {\hat{Y}}_{i})}^{2}

, and

Y_{i}

and

{\hat{Y}}_{i}

represent the observed and predicted values of the response variable in the q-th simulation run, respectively. As the sample size increases, M1 and M2 decrease, indicating that the predictive performance of the model improves.
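The two summary measures can be computed as in the sketch below. The function names `mae`, `mse`, and `m1_m2` and the toy numbers are hypothetical; they only illustrate the averaging of the per-run errors over Q simulation runs.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error within one simulation run."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(y_hat))))

def mse(y, y_hat):
    """Mean squared error within one simulation run."""
    return float(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

def m1_m2(runs):
    """Average MAE and MSE over runs; `runs` is a list of (y, y_hat) pairs."""
    m1 = float(np.mean([mae(y, y_hat) for y, y_hat in runs]))
    m2 = float(np.mean([mse(y, y_hat) for y, y_hat in runs]))
    return m1, m2

# Toy input with Q = 2 runs (made-up numbers, for illustration only).
runs = [
    (np.array([1, 0, 1, 1]), np.array([0.5, 0.5, 1.0, 0.0])),
    (np.array([0, 0, 1, 1]), np.array([0.0, 0.0, 1.0, 1.0])),
]
print(m1_m2(runs))
```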

Table 5. The M1 and M2 values for different sample sizes n.

6. Application

Research on average life expectancy is crucial for social development, health policies, and population management. Such studies can help governments, health departments, and social institutions develop policies and plans to improve people’s quality of life and health conditions. By understanding people’s life expectancy, the efficiency of healthcare systems and the effectiveness of social welfare and public health policies can be evaluated, providing a basis for resource allocation and planning. Research on average life expectancy can also help people understand population structure and trends, providing references for socioeconomic development, pension systems, and labor market planning. Therefore, in the application of our proposed model, we investigate factors that influence average life expectancy, including air quality index (AQI), temperature, GDP, and number of beds in hospitals.

6.1. Data Description

We collected average daily temperature (Temp) data for 58 cities in China in 2020 from the National Meteorological Science Data Sharing Service Platform, and average daily Air Quality Index (AQI) data from the National Environmental Monitoring Station. We also collected GDP, number of beds in hospitals, and life expectancy data for each city from local statistical bulletins and government documents. The two functional predictors are the daily AQI and temperature curves of the 58 cities, observed on each of the 366 days from 1 January to 31 December 2020. The two scalar predictors are the GDP and the number of beds in hospitals of each city in 2020. The response variable is the life expectancy of residents in each city in 2020.

Figure 4 shows the daily AQI and temperature for 58 cities in 2020.

Figure 4. Daily AQI (left plot) and daily temperatures (right plot) for 58 cities in 2020; each curve represents one city.

6.2. Data Analysis

According to a report released by the National Health Commission, the average life expectancy of Chinese residents in 2020 was 77.9 years. We therefore dichotomize the response variable: a city whose life expectancy is greater than 77.9 years is coded as 1, and a city whose life expectancy is less than 77.9 years is coded as 0. For the functional predictors, we first center the data. We then conduct FPCA and select the number of functional principal components that explain 75% of the variation. The number of components for AQI and temperature is

p_{AQI} = 10

and

p_{Temp} = 3

, respectively. We use GCV to demonstrate the predictive accuracy of the estimators. In this application,

GCV = 0.135

.
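The centering and component-selection step can be sketched as below for curves observed on a common, equally spaced grid. This is a generic discretized FPCA through the singular value decomposition, offered as an assumption about how such a step is commonly implemented and not as the authors' code; the function name `fpca` and its return values are hypothetical.

```python
import numpy as np

def fpca(X, t, threshold=0.75):
    """Discretized FPCA of curves X (n_curves x n_grid) observed on grid t.

    Returns the smallest number of components whose cumulative share of
    variance reaches `threshold`, the eigenfunctions, and the scores.
    """
    dt = t[1] - t[0]                      # equally spaced grid assumed
    Xc = X - X.mean(axis=0)               # center the curves pointwise
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    share = np.cumsum(s ** 2) / np.sum(s ** 2)
    m = min(int(np.searchsorted(share, threshold)) + 1, len(s))
    phi = Vt[:m] / np.sqrt(dt)            # rescale to unit L2 norm on the grid
    scores = Xc @ phi.T * dt              # Riemann-sum inner products
    return m, phi, scores
```

With a 58 by 366 matrix of daily values per city, `fpca(X, t, threshold=0.75)` would return the number of retained components together with the score matrix that enters the regression in place of the curves.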

6.3. Results Analysis

By inputting the data into the generalized partially functional linear model, we obtain the regression coefficient function

\hat{β} (t)

for the functional predictors and the regression coefficients

\hat{γ}

for the scalar predictors. The results are shown in Figure 5 and Table 6, respectively.

Table 6. Regression coefficients

\hat{γ}

and their significance levels.

Figure 5. Estimated values of regression coefficient function

\hat{β} (t)

and their 95% confidence intervals.

Table 6 presents the estimated values of the regression coefficient

\hat{γ}

for scalar predictor variables. We can see that both GDP and number of beds in hospitals have a positive relationship with life expectancy, and are significant at the 5% level. This means that when a region has a higher GDP and more hospital beds, the life expectancy in that region is longer. In other words, the better the economic development and medical resources of a region, the longer the life expectancy.

In Figure 5, we see the estimated values of the regression coefficient function

\hat{β} (t)

. For AQI, the relationship with life expectancy is negative in general: the higher the AQI, the more serious the air pollution and the lower the corresponding life expectancy. However, there is a noticeable positive relationship from February to April, which may be influenced by other external factors. For temperature, the effect on life expectancy varies with the seasons. In spring, summer, and fall (March to October), temperature is negatively associated with life expectancy, and in winter (November to February), it is positively associated, which is consistent with the conclusion in Huang et al. [19].

To confirm the necessity of considering the unknown link function model, we choose models without a link function and with the logit link function (i.e.,

g (η) = \frac{e^{η}}{1 + e^{η}}

) and compare them with our proposed models with unknown link functions. In order to evaluate the prediction performance of the three models, we use MAE, MSE, and

R^{2}

. Additionally, we calculate the accuracy using the confusion matrix, where we define TP as the number of samples correctly classified as positive, TN as the number of samples correctly classified as negative, FP as the number of samples incorrectly classified as positive, and FN as the number of samples incorrectly classified as negative (missed detections). We obtain the model’s accuracy using the formula

\mathrm{Accuracy} = \frac{T P + T N}{T P + T N + F P + F N} .

When the values of MAE and MSE are smaller, it indicates that the model has a smaller prediction error and better performance. When

R^{2}

is closer to 1, it indicates that the model has a stronger ability to explain the response variable. The experimental results are shown in Table 7. It can be seen that the model we proposed has the best performance.
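The accuracy formula can be illustrated with a short sketch. The function names `confusion_counts` and `accuracy` and the toy labels are hypothetical, and the predicted classes are taken as given; how predicted probabilities are turned into classes (for example, a 0.5 cutoff) is not stated in the text and is left outside the sketch.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 0/1."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Share of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)

# Toy labels (made up for illustration).
counts = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(counts, accuracy(*counts))
```

As a plausibility check on the reported figures, 47 correct classifications out of 58 cities give 47/58 ≈ 81.03%, which agrees with the accuracy listed for the unknown link model in Table 7; the split into TP and TN is not reported in the text.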

Table 7. Comparison between Unknown Link Function Model, Logit Link Function Model, and Model without a Link Function.

7. Conclusions

This article proposes a generalized partially functional linear model for scalar response and predictor variables that include both functional and scalar components, without specifying a link function. We use functional principal component analysis to reduce the dimensionality of functional data, estimate the regression coefficients using the maximum likelihood estimation method, estimate the link function using the method of local linear regression, iteratively obtain the final estimator, and establish the asymptotic normality of the estimator. The accuracy of the proposed model is validated through simulation studies.

The article applies the proposed model to the study of average life expectancy. Using daily AQI, temperature, GDP, and number of beds in hospitals for 58 cities in China in 2020, the study explores the impact of environmental, economic, and medical factors on life expectancy. The results indicate that GDP and number of beds in hospitals have a positive correlation with the life expectancy, while the AQI has an overall negative correlation. Temperature has a negative correlation with the average life expectancy in spring, summer, and autumn, and a positive correlation in winter. Overall, the study concludes that the average life expectancy is higher in areas with better environmental, economic, and medical development.

This model can be used in various fields, including economics, bio-medicine, engineering, etc. However, this model still has certain limitations. For example, the relationship between air quality and temperature needs to be further considered. There is a certain correlation between temperature and air quality. Generally, an increase in temperature can lead to the intensified volatilization and diffusion of pollutants in the air, thereby causing a decline in air quality. In the next phase of research, we will consider the interactions between functional predictors to make results more accurate. In addition, the algorithms and optimization methods of the model can be further improved to enhance computational efficiency. Combining this model with other machine learning methods can further improve predictive performance.

Author Contributions

W.X.: methodology, software, validation, writing—review, supervision, funding acquisition. S.L.: methodology, software, data curation, writing—original draft. H.L.: writing—review, supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Yujie Talent Project of North China University of Technology (Grant No. 107051360023XN075-04).

Data Availability Statement

The original data supporting the results of this study can be obtained from the National Meteorological Science Data Sharing Service Platform, the National Environmental Monitoring Station, and local statistical bulletins.

Acknowledgments

The authors would like to thank the referees and the editor for their useful suggestions, which helped us improve the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ramsay, J.O. When the data are functions. Psychometrika 1982, 47, 379–396.
2. Ramsay, J.O.; Silverman, B.W. Functional Data Analysis, 2nd ed.; Springer: New York, NY, USA, 2005.
3. Horváth, L.; Kokoszka, P. Inference for Functional Data with Application; Springer: New York, NY, USA, 2012.
4. Shin, H. Partial functional linear regression. J. Stat. Plan. Inference 2009, 139, 3405–3418.
5. Shin, H.; Lee, M.H. On prediction rate in partial functional linear regression. J. Multivar. Anal. 2012, 103, 93–106.
6. James, G.M. Generalized linear models with functional predictors. J. R. Stat. Soc. Ser. B 2002, 64, 411–432.
7. Müller, H.G.; Stadtmüller, U. Generalized functional linear models. Ann. Stat. 2005, 33, 774–805.
8. Shang, Z.F.; Cheng, G. Nonparametric inference in generalized functional linear models. Ann. Stat. 2015, 43, 1742–1773.
9. Wong, R.K.W.; Li, Y.; Zhu, Z.Y. Partially Linear Functional Additive Models for Multivariate Functional Data. J. Am. Stat. Assoc. 2019, 114, 406–418.
10. Scallan, A.; Gilchrist, R.; Green, M. Fitting Parametric Link Functions in Generalized Linear Models. Comput. Stat. Data Anal. 1984, 2, 37–49.
11. Weisberg, S.; Welsh, A.H. Adapting for the missing link. Ann. Stat. 1994, 22, 1674–1700.
12. Chiou, J.M.; Müller, H.G. Quasi-likelihood regression with unknown link and variance functions. J. Am. Stat. Assoc. 1998, 93, 1376–1387.
13. Chiou, J.M.; Müller, H.G. Estimated estimating equations: Semiparametric inference for clustered and longitudinal data. J. R. Stat. Soc. Ser. B 2005, 67, 531–553.
14. Bai, Y.; Fung, W.K.; Zhu, Z.Y. Penalized quadratic inference functions for single-index models with longitudinal data. J. Multivar. Anal. 2009, 100, 152–161.
15. Pang, Z.; Xue, L. Estimation for the single-index models with random effects. Comput. Stat. Data Anal. 2012, 56, 1837–1853.
16. Yuan, M.; Diao, G. Sieve maximum likelihood estimation in generalized linear models with an unknown link function. Wiley Interdiscip. Rev. Comput. Stat. 2017, 10, e1425.
17. Kokoszka, P.; Reimherr, M. Introduction to Functional Data Analysis; CRC Press: Boca Raton, FL, USA, 2017.
18. Rao, A.R.; Reimherr, M. Nonlinear Functional Modeling Using Neural Networks. J. Comput. Graph. Stat. 2023, 32, 1248–1257.
19. Huang, C.; Barnett, A.G.; Wang, X.; Tong, S. The impact of temperature on years of life lost in Brisbane, Australia. Nat. Clim. Chang. 2012, 2, 265–270.
20. Yang, Y.; Qi, J.L.; Ruan, Z.L.; Yin, P.; Zhang, S.Y.; Liu, J.M.; Liu, Y.N.; Li, R.; Wang, L.J.; Lin, H.L. Changes in Life Expectancy of Respiratory Diseases from Attaining Daily PM2.5 Standard in China: A Nationwide Observational Study. Innovation 2020, 1, 100064.
21. Deryugina, T.; Molitor, D. The Causal Effects of Place on Health and Longevity. J. Econ. Perspect. 2021, 35, 147–170.
22. Mack, Y.P.; Silverman, B.W. Weak and strong uniform consistency of kernel regression estimates. Probab. Theory Relat. Fields 1982, 63, 405–415.
23. Masry, E.; Tjøstheim, D. Estimation and Identification of Nonlinear ARCH Time Series: Strong Convergence and Asymptotic Normality. Econom. Theory 1995, 11, 258–289.
24. Chiou, J.M.; Müller, H.G. Nonparametric quasi-likelihood. Ann. Stat. 1999, 27, 36–64.
25. Xiao, W.W.; Wang, Y.X.; Liu, H.Y. Generalized partially functional linear model. Sci. Rep. 2021, 11, 23428.

Table 1. The abbreviations and their corresponding full forms.

Abbreviation	Full Form
FPCA	Functional principal component analysis
KL expansion	Karhunen–Loeve expansion
RMISE	Root Mean Integrated Square Error
SD	Standard Deviation
GCV	Generalized Cross Validation
MAE	Mean Absolute Error
MSE	Mean Squared Error
TP	True Positive
TN	True Negative
FP	False Positive
FN	False Negative

Table 2. RMISE of g and

\hat{g}

for different sample sizes n.

n	RMISE
50	0.3540
100	0.2734
300	0.1449

Table 3. SD and RMISE of the estimated values of

{\hat{β}}_{1} (t)

and

{\hat{β}}_{2} (t)

for different sample sizes n.

Estimator	n	SD	RMISE
${\hat{β}}_{1} (t)$	50	0.2475	0.3405
${\hat{β}}_{1} (t)$	100	0.1344	0.2517
${\hat{β}}_{1} (t)$	300	0.0552	0.1204
${\hat{β}}_{2} (t)$	50	0.2536	0.3232
${\hat{β}}_{2} (t)$	100	0.1261	0.2863
${\hat{β}}_{2} (t)$	300	0.0239	0.1033

Table 4. Estimated values of scalar regression coefficients

\hat{γ}

and their SD in brackets for different sample sizes n.

n	${\hat{γ}}_{1}$	${\hat{γ}}_{2}$	${\hat{γ}}_{3}$
50	0.7298 (0.191)	0.5928 (0.177)	0.5307 (0.232)
100	0.6892 (0.092)	0.5832 (0.071)	0.4894 (0.096)
300	0.7105 (0.019)	0.5732 (0.018)	0.4988 (0.016)

Table 5. The M1 and M2 values for different sample sizes n.

n	M1	M2
50	0.3182	0.1579
100	0.3028	0.1498
300	0.2921	0.1406

Table 6. Regression coefficients

\hat{γ}

and their significance levels.

	Estimate	Std. Error	t Value	Pr(>|t|)
${\hat{γ}}_{GDP}$	0.6776	0.339	1.9988	0.04639
${\hat{γ}}_{Beds}$	0.7354	0.367	2.0038	0.04585

Table 7. Comparison between Unknown Link Function Model, Logit Link Function Model, and Model without a Link Function.

Link Function	MAE	MSE	$R^{2}$	Accuracy
Unknown	0.2584	0.1399	0.8916	81.03%
Logit	0.2872	0.2511	0.6673	75.86%
Without	0.4777	0.3146	0.4118	74.14%

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
