# Generalized Partially Functional Linear Model with Unknown Link Function

## Abstract

## 1. Introduction

## 2. Preliminaries

## 3. Model and Estimation

#### 3.1. Abbreviation Introduction

#### 3.2. Model

#### 3.3. Estimation

**Step 1.** Assume first that the link function $g(\cdot)$ is known, and obtain the initial estimate ${\theta}^{\left(0\right)}$ of ${\theta}_{0}$ by solving Equation (5). The link function $g(\cdot)$ is required to be twice continuously differentiable so that the Hessian matrix exists; moreover, the variance function ${\sigma}^{2}(\cdot)$ is defined on the range of the link function and is strictly positive.

**Step 2.** Using local linear regression, obtain the estimates ${g}^{\left(0\right)}$ and ${g}^{\prime \left(0\right)}$ of the link function $g$ and its derivative ${g}^{\prime}$.

**Step 3.** Using the method of **Step 1**, replace the link function and its derivative by their estimates ${\tilde{g}}^{\left(\alpha \right)}$ and ${\tilde{g}}^{\prime \left(\alpha \right)}$, where $\alpha =0,1,2,\dots $, and solve the estimating Equation (5) for $\theta $. This yields the updated estimate ${\tilde{\theta}}^{\left(\alpha \right)}$.

**Step 4.** Using the method of **Step 2**, replace the parameter vector by its estimate ${\tilde{\theta}}^{\left(\alpha \right)}={({\tilde{\chi}}_{j1}^{\left(\alpha \right)},{\tilde{\chi}}_{j2}^{\left(\alpha \right)},\cdots ,{\tilde{\chi}}_{jm}^{\left(\alpha \right)},{\tilde{\gamma}}_{1}^{\left(\alpha \right)},{\tilde{\gamma}}_{2}^{\left(\alpha \right)},\cdots ,{\tilde{\gamma}}_{q}^{\left(\alpha \right)})}^{T}$, where $\alpha =1,2,3,\dots $. This yields the estimates ${\tilde{g}}^{\left(\alpha \right)}$ and ${\tilde{g}}^{\prime \left(\alpha \right)}$ of $g$ and ${g}^{\prime}$.

**Step 5.** Repeat Steps 3 and 4 until $\left|{\tilde{\theta}}^{(\alpha +1)}-{\tilde{\theta}}^{\left(\alpha \right)}\right|$ converges, then stop the iteration.

**Step 6.** Take the final iterate as the estimate $\widehat{\theta}$ of the regression coefficient vector $\theta $, and the corresponding link estimate as the estimate $\widehat{g}$ of the link function $g$.
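The alternating scheme in Steps 1 to 6 can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `gaussian_kernel`, `local_linear`, and `alternate` are hypothetical, the Gaussian kernel is one choice of symmetric density, and the update of $\theta$ from Equation (5) is left as a user-supplied callable because that equation is not reproduced here.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel k (an assumption)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def local_linear(eta, y, grid, h):
    """Local linear estimates of g and g' at each point of `grid`.

    eta  : index values eta_i (scores computed with the current theta)
    y    : responses
    grid : points at which g and g' are evaluated
    h    : bandwidth
    """
    eta = np.asarray(eta, dtype=float)
    y = np.asarray(y, dtype=float)
    grid = np.asarray(grid, dtype=float)
    g_hat = np.empty(grid.shape[0])
    dg_hat = np.empty(grid.shape[0])
    for i, v in enumerate(grid):
        d = eta - v
        w = gaussian_kernel(d / h)
        # Weighted least squares of y on (1, d): solve the 2x2 normal equations.
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
        t0, t1 = (w * y).sum(), (w * d * y).sum()
        det = s0 * s2 - s1 ** 2
        g_hat[i] = (s2 * t0 - s1 * t1) / det   # intercept: estimate of g(v)
        dg_hat[i] = (s0 * t1 - s1 * t0) / det  # slope: estimate of g'(v)
    return g_hat, dg_hat


def alternate(theta0, fit_link, update_theta, tol=1e-6, max_iter=100):
    """Alternate between link estimation and coefficient updates.

    fit_link(theta)           -> link estimate (Steps 2 and 4)
    update_theta(theta, link) -> new theta     (Step 3; stands in for Equation (5))
    Stops when the change in theta falls below `tol` (Step 5).
    """
    theta = np.asarray(theta0, dtype=float)
    link = None
    for _ in range(max_iter):
        link = fit_link(theta)
        theta_new = np.asarray(update_theta(theta, link), dtype=float)
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new, link
        theta = theta_new
    return theta, link
```

In use, `fit_link` would compute the indices $\eta_i$ from the current coefficients and call `local_linear`, while `update_theta` would solve the estimating equation with the link held fixed. The tolerance `tol` and the cap `max_iter` are illustrative defaults only.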

## 4. Asymptotic Properties

- (C1) There exists $b=\max(4,c)$ for a constant $c>0$ such that $E\left[{\int}_{T}{\|{X}_{j}\left(t\right)\|}^{b}dt\right]<\infty $ for $j=1,\cdots ,d$, $E\left[{\|Z\|}^{b}\right]<\infty $, and $E\left[\epsilon \right]<\infty $.
- (C2) The density function $f(\cdot)$ of ${\eta}_{i}$ is strictly positive and satisfies a first-order Lipschitz condition as $\theta \to {\theta}_{0}$.
- (C3) The kernel function $k(\cdot)$ is a bounded, continuous, symmetric probability density function that satisfies a first-order Lipschitz condition, with ${\int}_{-\infty}^{\infty}{u}^{2}k\left(u\right)du\ne 0$ and ${\int}_{-\infty}^{\infty}{\left|u\right|}^{2}k\left(u\right)du<\infty $.
- (C4) $n{h}^{4}/{\log}^{2}n\to \infty $ and $n{h}^{5}=O\left(1\right)$, where $h$ is the bandwidth of the kernel function.
- (C5) For $j=1,\cdots ,d$, ${m}_{j}{n}^{-1/4}\to 0$ as $n\to \infty $.
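
As a worked check of the bandwidth condition (C4), consider the rate $h=c\,{n}^{-1/5}$ with a constant $c>0$. This particular rate is an illustrative assumption, not a choice stated in the conditions themselves. Substituting gives

$$n{h}^{5}={c}^{5}=O\left(1\right),\qquad \frac{n{h}^{4}}{{\log}^{2}n}=\frac{{c}^{4}{n}^{1/5}}{{\log}^{2}n}\to \infty \quad \text{as } n\to \infty ,$$

since any positive power of $n$ grows faster than any power of $\log n$. Both parts of (C4) therefore hold for this rate.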

**Remark 1.**

#### 4.1. Asymptotic Convergence of ${g}^{\left(\alpha \right)}$

**Lemma 1.**

**Proof.**

**Theorem 1.**

**Proof.**

**Corollary 1.**

#### 4.2. Asymptotic Convergence of $\widehat{\theta}$

**Lemma 2.**

**Proof.**

**Lemma 3.**

**Proof.**

**Theorem 2.**

**Proof.**

#### 4.3. Asymptotic Convergence of $\widehat{g}$

**Theorem 3.**

**Proof.**

**Corollary 2.**

**Remark 2.**

## 5. Simulation

## 6. Application

#### 6.1. Data Description

#### 6.2. Data Analysis

#### 6.3. Results Analysis

## 7. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

**Figure 2.** Asymptotic properties of the link function $g$. The black line represents the true link function $g\left(\eta \right)=\exp\left(\eta \right)/\left(1+\exp\left(\eta \right)\right)$. The purple, yellow, and red lines represent the estimated link functions $\widehat{g}$ for sample sizes $n=50$, $n=100$, and $n=300$, respectively.

**Figure 3.** Estimated regression coefficient functions ${\widehat{\beta}}_{1}\left(t\right)$ and ${\widehat{\beta}}_{2}\left(t\right)$ (blue curves) and their 95% confidence intervals (grey areas) for different sample sizes; the red curves are the theoretical regression coefficient functions ${\beta}_{1}\left(t\right)$ and ${\beta}_{2}\left(t\right)$.

**Figure 4.** Daily AQI (left plot) and daily temperatures (right plot) for 58 cities in 2020; each curve represents one city.

**Figure 5.** Estimated regression coefficient function $\widehat{\beta}\left(t\right)$ and its 95% confidence interval.

Abbreviation | Full Form |
---|---|
FPCA | Functional principal component analysis |
KL expansion | Karhunen–Loève expansion |
RMISE | Root Mean Integrated Square Error |
SD | Standard Deviation |
GCV | Generalized Cross Validation |
MAE | Mean Absolute Error |
MSE | Mean Squared Error |
TP | True Positive |
TN | True Negative |
FP | False Positive |
FN | False Negative |

n | RMISE |
---|---|
50 | 0.3540 |
100 | 0.2734 |
300 | 0.1449 |

**Table 3.** SD and RMISE of the estimated values of ${\widehat{\beta}}_{1}\left(t\right)$ and ${\widehat{\beta}}_{2}\left(t\right)$ for different sample sizes n.

Function | n | SD | RMISE |
---|---|---|---|
${\widehat{\beta}}_{1}\left(t\right)$ | 50 | 0.2475 | 0.3405 |
${\widehat{\beta}}_{1}\left(t\right)$ | 100 | 0.1344 | 0.2517 |
${\widehat{\beta}}_{1}\left(t\right)$ | 300 | 0.0552 | 0.1204 |
${\widehat{\beta}}_{2}\left(t\right)$ | 50 | 0.2536 | 0.3232 |
${\widehat{\beta}}_{2}\left(t\right)$ | 100 | 0.1261 | 0.2863 |
${\widehat{\beta}}_{2}\left(t\right)$ | 300 | 0.0239 | 0.1033 |

**Table 4.** Estimated values of the scalar regression coefficients $\widehat{\gamma}$, with their SD in brackets, for different sample sizes n.

n | ${\widehat{\mathit{\gamma}}}_{1}$ | ${\widehat{\mathit{\gamma}}}_{2}$ | ${\widehat{\mathit{\gamma}}}_{3}$ |
---|---|---|---|
50 | 0.7298 (0.191) | 0.5928 (0.177) | 0.5307 (0.232) |
100 | 0.6892 (0.092) | 0.5832 (0.071) | 0.4894 (0.096) |
300 | 0.7105 (0.019) | 0.5732 (0.018) | 0.4988 (0.016) |

n | M1 | M2 |
---|---|---|
50 | 0.3182 | 0.1579 |
100 | 0.3028 | 0.1498 |
300 | 0.2921 | 0.1406 |

Coefficient | Estimate | Std. Error | t Value | Pr(>$\lvert t\rvert$) |
---|---|---|---|---|
${\widehat{\gamma}}_{GDP}$ | 0.6776 | 0.339 | 1.9988 | 0.04639 |
${\widehat{\gamma}}_{Beds}$ | 0.7354 | 0.367 | 2.0038 | 0.04585 |
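
As a quick arithmetic check, the reported t values in the table above agree with the usual ratio of estimate to standard error:

$${t}_{GDP}=\frac{0.6776}{0.339}\approx 1.9988,\qquad {t}_{Beds}=\frac{0.7354}{0.367}\approx 2.0038.$$

Both reported p-values (0.04639 and 0.04585) lie below 0.05, so each coefficient is significant at the 5% level.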

**Table 7.** Comparison between the unknown link function model, the logit link function model, and the model without a link function.

Link Function | MAE | MSE | ${\mathit{R}}^{2}$ | Accuracy |
---|---|---|---|---|
Unknown | 0.2584 | 0.1399 | 0.8916 | 81.03% |
Logit | 0.2872 | 0.2511 | 0.6673 | 75.86% |
Without | 0.4777 | 0.3146 | 0.4118 | 74.14% |
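
The four criteria in the comparison table above can be computed along the following lines. This is a hedged sketch using the standard textbook definitions of MAE, MSE, $R^2$, and accuracy; it assumes a binary response and fitted values thresholded at 0.5, neither of which is confirmed by the text here, and the function name `evaluate` is hypothetical.

```python
import numpy as np


def evaluate(y_true, fitted, threshold=0.5):
    """Return MAE, MSE, R^2 and classification accuracy.

    y_true    : observed 0/1 responses (binary response is an assumption)
    fitted    : fitted mean values, e.g. g_hat(eta_i)
    threshold : cut-off turning fitted values into class labels (assumed 0.5)
    """
    y_true = np.asarray(y_true, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    resid = y_true - fitted

    mae = np.mean(np.abs(resid))
    mse = np.mean(resid ** 2)
    # R^2 = 1 - residual sum of squares / total sum of squares
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)

    y_pred = (fitted >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = (tp + tn) / (tp + tn + fp + fn)

    return {"MAE": mae, "MSE": mse, "R2": r2, "Accuracy": accuracy}
```

With 58 cities, an accuracy of 81.03% is consistent with 47 correct classifications (47/58 ≈ 0.8103). Likewise, 75.86% and 74.14% are consistent with 44 and 43 correct classifications.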

