Estimation of the Conditional Hazard Function with a Recursive Kernel from Censored Functional Ergodic Data

Hadjer Kebir; Boubaker Mechab

doi:10.3390/IOCMA2023-14412


Laboratory of Statistic and Processus Stochastic, University of Djillali Liabes, L.P 89, Sidi Bel Abbes 22000, Algeria

* Author to whom correspondence should be addressed.

† Presented at the 1st International Online Conference on Mathematics and Applications, 1–15 May 2023; available online: https://iocma2023.sciforum.net/.

Comput. Sci. Math. Forum 2023, 7(1), 16; https://doi.org/10.3390/IOCMA2023-14412

This article belongs to the Proceedings The 1st International Online Conference on Mathematics and Applications


Abstract

In this paper, we propose a nonparametric estimator of the conditional hazard function, based on the recursive kernel method, for a censored scalar response given an explanatory variable taking values in a semi-metric space. Under an ergodicity condition, we establish the almost sure convergence rate of this estimator.

Keywords:

conditional hazard function; censored data; functional ergodic data; recursive kernel estimate

1. Introduction

Functional data estimation has been a topic of great interest in the statistical literature; for an overview of the current state of nonparametric functional data analysis, we refer to [1,2]. The hazard function, also known as the risk function, is a concept commonly used in survival analysis and reliability theory. It plays an important role in statistics and arises in a variety of fields, including econometrics, epidemiology, and environmental science. The work in [3] is an important contribution to the estimation of the conditional hazard rate for functional covariates taking values in an infinite-dimensional space. Censored data are data whose values are only partially known. We consider right-censored data, where an observation is known to exceed a certain threshold but its exact value is unknown. For example, when studying the time until a light bulb fails, we might know that the bulb lasted at least 500 h without knowing how long it lasted beyond that. Ergodic data have attracted growing interest in this domain over the past few years. Ergodicity is an essential postulate in statistical physics for analyzing the thermodynamic properties of gases, atoms, electrons, or plasmas, and ergodic theory allows us to circumvent the intricate probabilistic computations related to mixing conditions. In our setting, we study the almost sure convergence of a recursive kernel estimator of the conditional hazard function built from strictly stationary, censored, and ergodic observations. A benefit of the recursive estimate is that each smoothing parameter is tied to its own observation $(X_{i}, Y_{i})$, which enables us to update the estimator continuously as new observations arrive. For work combining censored data and ergodic theory, we refer to [4], where the conditional quantile is estimated from censored ergodic data.

2. Materials and Methods

In practice, one often encounters censored variables: instead of observing the lifetimes, we observe censored lifetimes. This problem is usually modeled by considering $T_{1}, \ldots, T_{n}$, a sequence of lifetimes satisfying some kind of dependence, and $C_{1}, \ldots, C_{n}$, a sequence of i.i.d. censoring random variables with a common unknown continuous distribution function $G$. We observe only the $n$ pairs $(Y_{i}, δ_{i})$, where $Y_{i} = \min\{T_{i}, C_{i}\}$ and $δ_{i} = I_{\{T_{i} \leq C_{i}\}}$, $1 \leq i \leq n$, and $I_{A}$ denotes the indicator function of the set $A$. To ensure the identifiability of the model, we assume that $C_{i}$ and $(T_{i}, X_{i})$ $(1 \leq i \leq n)$ are independent. Let $\{(X_{i}, T_{i})\}_{i = 1, \ldots, n}$ be a strictly stationary ergodic sequence of pairs with the same distribution, where the $X_{i}$ take values in a semi-metric space $(F, d)$ and the $T_{i}$ are real-valued random variables. In addition, to ensure good mathematical properties of the functional nonparametric methods, our asymptotic results rely on the concentration properties of the probability measure of the functional variable on small balls.
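The censoring mechanism can be sketched in a few lines of Python. This is a minimal illustration, not part of the method: the helper `censor` is a hypothetical name, and the exponential lifetimes and censoring times are assumptions chosen only for the example (the model itself allows dependent lifetimes).

```python
import random


def censor(lifetimes, censoring_times):
    """Return the observed pairs (Y_i, delta_i), with Y_i = min(T_i, C_i)
    and delta_i = 1 if T_i <= C_i (lifetime observed), 0 otherwise."""
    if len(lifetimes) != len(censoring_times):
        raise ValueError("T and C must have the same length")
    return [(min(t, c), 1 if t <= c else 0)
            for t, c in zip(lifetimes, censoring_times)]


# Hypothetical example: exponential lifetimes (rate 1) and an independent
# exponential censoring sequence (rate 0.5); the rates are illustrative only.
rng = random.Random(0)
T = [rng.expovariate(1.0) for _ in range(1000)]
C = [rng.expovariate(0.5) for _ in range(1000)]
pairs = censor(T, C)
censoring_rate = 1 - sum(d for _, d in pairs) / len(pairs)
print(f"observed censoring proportion: {censoring_rate:.2f}")
```

With these assumed rates, the probability that a lifetime is censored is $P(C < T) = 0.5/(1 + 0.5) = 1/3$, so the printed proportion should be close to 0.33.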

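We define the conditional hazard function $h^{x}$, for $y \in \mathbb{R}$ such that $F^{x}(y) < 1$, by

h^{x}(y) = \frac{f^{x}(y)}{1 - F^{x}(y)},

where $F^{x}$ and $f^{x}$ denote, respectively, the conditional distribution function and the conditional density of $T$ given $X = x$.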
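For a fixed $x$, $h^{x}$ is simply the hazard of the conditional law of $T$ given $X = x$. The sketch below evaluates the ratio for a hypothetical exponential conditional law with rate $λ = 2$, for which the hazard is known to be constant and equal to $λ$; the function name `hazard` is ours.

```python
import math


def hazard(density, cdf, y):
    """Hazard h(y) = f(y) / (1 - F(y)), defined only where F(y) < 1."""
    survival = 1.0 - cdf(y)
    if survival <= 0.0:
        raise ValueError("hazard undefined: F(y) >= 1")
    return density(y) / survival


# Hypothetical conditional law: exponential with rate lam, so that
# f(y) = lam * exp(-lam * y), F(y) = 1 - exp(-lam * y) and h(y) = lam.
lam = 2.0


def f(y):
    return lam * math.exp(-lam * y)


def F(y):
    return 1.0 - math.exp(-lam * y)


print([round(hazard(f, F, y), 12) for y in (0.1, 1.0, 3.0)])  # → [2.0, 2.0, 2.0]
```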

To estimate $h^{x}$, we first introduce the recursive double-kernel-type pseudo-estimator ${\tilde{F}}^{x}$ of the conditional distribution function $F^{x}$, defined by

{\tilde{F}}^{x}(t) = \frac{\sum_{i = 1}^{n} \frac{δ_{i}}{\bar{G}(Y_{i})} K(a_{i}^{-1} d(x, X_{i})) H(b_{i}^{-1}(t - Y_{i}))}{\sum_{i = 1}^{n} K(a_{i}^{-1} d(x, X_{i}))}, \quad \forall t \in \mathbb{R},

where $K$ is a kernel, $H$ is a strictly increasing distribution function, $(a_{i})$ and $(b_{i})$ are sequences of positive real numbers such that $\lim_{n \to +\infty} a_{n} = \lim_{n \to +\infty} b_{n} = 0$, and $\bar{G}(\cdot) = 1 - G(\cdot)$.
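The recursive structure of ${\tilde{F}}^{x}$ can be illustrated as follows: because each term uses its own bandwidths $a_{i}$ and $b_{i}$, a new observation only adds one term to the numerator and one to the denominator, and nothing already computed has to be revisited. This is a sketch under stated assumptions: $K$ is taken as the indicator of $[0, 1]$, $H$ as the standard normal distribution function, $\bar{G}$ as known, the estimator is evaluated on a fixed grid of $t$ values, and the class and method names are hypothetical.

```python
import math


def uniform_kernel(u):
    # Assumed kernel: K = indicator of [0, 1], one choice compatible with (H4).
    return 1.0 if 0.0 <= u <= 1.0 else 0.0


def normal_cdf(u):
    # Assumed H: the standard normal distribution function (strictly increasing).
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


class RecursiveCDFPseudoEstimator:
    """Pseudo-estimator F~^x(t) on a fixed grid of t values, with the censoring
    survival function G_bar = 1 - G assumed known.

    Each new observation adds one term to the numerator and denominator sums,
    using its own bandwidths (a_i, b_i); terms already accumulated are untouched.
    """

    def __init__(self, t_grid, g_bar):
        self.t_grid = list(t_grid)
        self.g_bar = g_bar
        self.numerators = [0.0] * len(self.t_grid)
        self.denominator = 0.0

    def update(self, dist, y, delta, a_i, b_i):
        """Add observation i, given dist = d(x, X_i), Y_i, delta_i, a_i and b_i."""
        k = uniform_kernel(dist / a_i)
        self.denominator += k
        if k == 0.0 or delta == 0:
            return  # this observation contributes nothing to the numerator
        weight = delta / self.g_bar(y) * k
        for j, t in enumerate(self.t_grid):
            self.numerators[j] += weight * normal_cdf((t - y) / b_i)

    def values(self):
        if self.denominator == 0.0:
            raise ValueError("no observation falls in the balls B(x, a_i)")
        return [num / self.denominator for num in self.numerators]


# Hypothetical stream of (d(x, X_i), Y_i, delta_i), bandwidths a_i = b_i = i**(-0.2),
# and G_bar(y) = exp(-0.5 * y), i.e. exponential censoring with rate 0.5 (assumed known).
est = RecursiveCDFPseudoEstimator([0.5, 1.0, 2.0], lambda y: math.exp(-0.5 * y))
stream = [(0.1, 0.4, 1), (0.3, 1.2, 1), (2.0, 0.7, 1), (0.2, 0.9, 0)]
for i, (dist, y, delta) in enumerate(stream, start=1):
    est.update(dist, y, delta, a_i=i ** -0.2, b_i=i ** -0.2)
print(est.values())
```

Keeping one running numerator per grid point makes each update cost proportional to the grid size, independently of how many observations have already been processed.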

From this pseudo-estimator, we deduce a pseudo-estimator ${\tilde{f}}^{x}$ of the conditional density $f^{x}$ by

{\tilde{f}}^{x}(t) = \frac{\sum_{i = 1}^{n} \frac{δ_{i}}{\bar{G}(Y_{i})} b_{i}^{-1} K(a_{i}^{-1} d(x, X_{i})) H^{'}(b_{i}^{-1}(t - Y_{i}))}{\sum_{i = 1}^{n} K(a_{i}^{-1} d(x, X_{i}))}, \quad \forall t \in \mathbb{R},

where $H^{'}$ is the derivative of $H$.
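Since the denominator does not depend on $t$ and $\frac{\partial}{\partial t} H(b_{i}^{-1}(t - Y_{i})) = b_{i}^{-1} H^{'}(b_{i}^{-1}(t - Y_{i}))$, ${\tilde{f}}^{x}$ is the derivative of ${\tilde{F}}^{x}$ with respect to $t$. The sketch below checks this numerically with a central finite difference; as before, $K = I_{[0, 1]}$, $H$ is the standard normal distribution function, $\bar{G}$ is assumed known (here exponential with rate 0.5), and the data and function names are hypothetical.

```python
import math


def uniform_kernel(u):
    # Assumed kernel K: indicator of [0, 1].
    return 1.0 if 0.0 <= u <= 1.0 else 0.0


def normal_cdf(u):
    # Assumed H: standard normal distribution function.
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def normal_pdf(u):
    # H' for the assumed H: the standard normal density.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def pseudo_cdf_and_density(t, obs, g_bar):
    """Return (F~^x(t), f~^x(t)) for obs = [(d(x, X_i), Y_i, delta_i, a_i, b_i), ...]."""
    num_cdf = num_density = den = 0.0
    for dist, y, delta, a_i, b_i in obs:
        k = uniform_kernel(dist / a_i)
        den += k
        if k == 0.0 or delta == 0:
            continue  # zero contribution to both numerators
        weight = delta / g_bar(y) * k
        u = (t - y) / b_i
        num_cdf += weight * normal_cdf(u)
        num_density += weight * normal_pdf(u) / b_i  # b_i^{-1} H'(b_i^{-1}(t - Y_i))
    if den == 0.0:
        raise ValueError("no observation falls in the balls B(x, a_i)")
    return num_cdf / den, num_density / den


def g_bar(y):
    # Hypothetical known censoring survival function: exponential with rate 0.5.
    return math.exp(-0.5 * y)


obs = [(0.1, 0.4, 1, 1.0, 1.0), (0.3, 1.2, 1, 0.87, 0.87), (0.2, 0.9, 0, 0.76, 0.76)]
_, f_t = pseudo_cdf_and_density(1.0, obs, g_bar)
step = 1e-5
F_plus, _ = pseudo_cdf_and_density(1.0 + step, obs, g_bar)
F_minus, _ = pseudo_cdf_and_density(1.0 - step, obs, g_bar)
finite_diff = (F_plus - F_minus) / (2 * step)
print(f_t, finite_diff)
```

The two printed numbers should agree up to the (tiny) finite-difference error.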

In practice, $G$ is unknown; it can be estimated by the Kaplan–Meier (1958) estimator ${\bar{G}}_{n}(\cdot)$, defined as

{\bar{G}}_{n}(t) = \begin{cases} \prod_{i = 1}^{n} \left(1 - \frac{1 - δ_{(i)}}{n - i + 1}\right)^{I_{\{Y_{(i)} \leq t\}}} & \text{if } t < Y_{(n)}, \\ 0 & \text{otherwise}, \end{cases}

where $Y_{(1)} < Y_{(2)} < \cdots < Y_{(n)}$ are the order statistics of $(Y_{i})_{1 \leq i \leq n}$ and $δ_{(i)}$ is the concomitant of $Y_{(i)}$.
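A direct transcription of the product formula for ${\bar{G}}_{n}$ might look as follows. It is a sketch assuming distinct $Y_{i}$ (ties would need an explicit ordering rule), and the function name is hypothetical. Only censored observations ($δ_{(i)} = 0$) produce a factor different from 1, since here the censoring times play the role of the events.

```python
def km_censoring_survival(ys, deltas):
    """Return t -> G_bar_n(t), the Kaplan-Meier estimate of 1 - G given by the
    product formula above. Assumes distinct Y_i, as in Y_(1) < ... < Y_(n)."""
    if not ys:
        raise ValueError("need at least one observation")
    n = len(ys)
    ordered = sorted(zip(ys, deltas))  # order statistics with concomitant deltas
    y_max = ordered[-1][0]

    def g_bar_n(t):
        if t >= y_max:
            return 0.0  # the "otherwise" branch of the definition
        prod = 1.0
        for rank, (y, d) in enumerate(ordered, start=1):
            if y > t:
                break  # indicator I{Y_(i) <= t} is 0 from here on
            prod *= 1.0 - (1 - d) / (n - rank + 1)
        return prod

    return g_bar_n


# Small example: Y = (2, 4, 1, 3), delta = (1, 1, 0, 0); the censored
# observations are Y = 1 and Y = 3.
g = km_censoring_survival([2.0, 4.0, 1.0, 3.0], [1, 1, 0, 0])
print(g(0.5), g(3.5), g(4.0))  # → 1.0 0.375 0.0
```

For $t = 3.5$, the two censored order statistics contribute the factors $1 - 1/4$ and $1 - 1/2$, giving $0.75 \times 0.5 = 0.375$.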

Thus, the feasible estimator of the conditional distribution function $F^{x}(t)$ is given by

{\hat{F}}^{x}(t) = \frac{\sum_{i = 1}^{n} \frac{δ_{i}}{{\bar{G}}_{n}(Y_{i})} K(a_{i}^{-1} d(x, X_{i})) H(b_{i}^{-1}(t - Y_{i}))}{\sum_{i = 1}^{n} K(a_{i}^{-1} d(x, X_{i}))}, \quad \forall t \in \mathbb{R},

and we deduce an estimator of the conditional density $f^{x}(t)$, defined as

{\hat{f}}^{x}(t) = \frac{\sum_{i = 1}^{n} \frac{δ_{i}}{{\bar{G}}_{n}(Y_{i})} b_{i}^{-1} K(a_{i}^{-1} d(x, X_{i})) H^{'}(b_{i}^{-1}(t - Y_{i}))}{\sum_{i = 1}^{n} K(a_{i}^{-1} d(x, X_{i}))}, \quad \forall t \in \mathbb{R}.

Finally, we estimate the conditional hazard function $h^{x}$ by

{\hat{h}}^{x}(t) = \frac{{\hat{f}}^{x}(t)}{1 - {\hat{F}}^{x}(t)}, \quad \forall t \in \mathbb{R}.
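Putting the pieces together, a batch (non-recursive) computation of ${\hat{h}}^{x}(t)$ could be sketched as below. Assumptions of this sketch: $K = I_{[0, 1]}$, $H$ is the standard normal distribution function, the distances $d(x, X_{i})$ are precomputed, and a term whose weight would require dividing by ${\bar{G}}_{n}(Y_{i}) = 0$ (which happens for the largest observation) is dropped; this last convention is ours and is not specified in the text. When $1 - {\hat{F}}^{x}(t) \leq 0$ the sketch returns NaN. Function names are hypothetical.

```python
import math


def km_censoring_survival(ys, deltas):
    # Kaplan-Meier estimate of G_bar = 1 - G (product formula; distinct Y_i assumed).
    n = len(ys)
    ordered = sorted(zip(ys, deltas))

    def g_bar_n(t):
        if t >= ordered[-1][0]:
            return 0.0
        prod = 1.0
        for rank, (y, d) in enumerate(ordered, start=1):
            if y > t:
                break
            prod *= 1.0 - (1 - d) / (n - rank + 1)
        return prod

    return g_bar_n


def hazard_estimate(t, dists, ys, deltas, a, b):
    """Feasible estimate h^x(t) = f^x(t) / (1 - F^x(t)), with the assumed choices
    K = indicator of [0, 1] and H = standard normal distribution function.
    dists[i] = d(x, X_i); a[i], b[i] are the bandwidths a_i, b_i."""
    g_bar_n = km_censoring_survival(ys, deltas)
    num_cdf = num_density = den = 0.0
    for dist, y, delta, a_i, b_i in zip(dists, ys, deltas, a, b):
        k = 1.0 if 0.0 <= dist / a_i <= 1.0 else 0.0
        den += k
        g = g_bar_n(y)
        if delta == 0 or k == 0.0 or g == 0.0:
            # Censored or out-of-ball terms vanish; terms with G_bar_n(Y_i) = 0
            # are dropped, a convention assumed by this sketch.
            continue
        u = (t - y) / b_i
        weight = k / g  # delta_i / G_bar_n(Y_i) * K(...), with delta_i = 1
        num_cdf += weight * 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
        num_density += weight * math.exp(-0.5 * u * u) / (math.sqrt(2.0 * math.pi) * b_i)
    if den == 0.0:
        raise ValueError("no observation falls in the balls B(x, a_i)")
    survival = 1.0 - num_cdf / den
    if survival <= 0.0:
        return math.nan  # 1 - F^x(t) is not positive: the ratio is undefined
    return (num_density / den) / survival


h = hazard_estimate(1.0, dists=[0.1, 0.2], ys=[1.0, 2.0], deltas=[1, 1],
                    a=[1.0, 1.0], b=[1.0, 1.0])
print(h)
```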

Remark 1.

The Kaplan–Meier estimator is not recursive: when a new observation arrives, ${\bar{G}}_{n}$ must be recomputed, so its use can slightly penalize the efficiency of our estimator in terms of computational time.
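To see concretely why the Kaplan–Meier step is not recursive, note that $n$ enters every factor of the product defining ${\bar{G}}_{n}$, so the weights $δ_{i}/{\bar{G}}_{n}(Y_{i})$ of observations already processed change when a new one arrives. The sketch below (hypothetical names, distinct $Y_{i}$ assumed, zero weight when ${\bar{G}}_{n}(Y_{i}) = 0$ by our convention) compares the weights before and after adding one uncensored observation.

```python
def km_censoring_survival(ys, deltas):
    # Kaplan-Meier estimate of G_bar = 1 - G (product formula; distinct Y_i assumed).
    n = len(ys)
    ordered = sorted(zip(ys, deltas))

    def g_bar_n(t):
        if t >= ordered[-1][0]:
            return 0.0
        prod = 1.0
        for rank, (y, d) in enumerate(ordered, start=1):
            if y > t:
                break
            prod *= 1.0 - (1 - d) / (n - rank + 1)
        return prod

    return g_bar_n


def ipcw_weights(ys, deltas):
    """Weights delta_i / G_bar_n(Y_i) entering F^x and f^x
    (set to 0 when delta_i = 0 or G_bar_n(Y_i) = 0)."""
    g_bar_n = km_censoring_survival(ys, deltas)
    weights = []
    for y, d in zip(ys, deltas):
        g = g_bar_n(y)
        weights.append(d / g if d and g > 0.0 else 0.0)
    return weights


before = ipcw_weights([1.0, 2.0, 3.0], [0, 1, 1])
after = ipcw_weights([1.0, 2.0, 3.0, 4.0], [0, 1, 1, 1])
print(before, after)
```

Here the weight of the observation $Y = 2$ drops from $1/(2/3) = 1.5$ to $1/(3/4) = 4/3$, so the numerators of ${\hat{F}}^{x}$ and ${\hat{f}}^{x}$ have to be recomputed rather than updated term by term.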

3. Results

To establish the almost sure convergence of ${\hat{h}}^{x}$, we need the following assumptions:

Assumptions 1.

(H1): (i) The function $ϕ(x, h) := P(X \in B(x, h))$ satisfies $ϕ(x, h) > 0$ for all $h > 0$.
(ii) For all $i = 1, \ldots, n$, there exists a deterministic function $ϕ_{i}(x, \cdot)$ such that, almost surely, $0 < P(X_{i} \in B(x, h) \mid F_{i - 1}) \leq ϕ_{i}(x, h)$ for all $h > 0$, and $ϕ_{i}(x, h) \to 0$ as $h \to 0$.
(iii) For all sequences $(h_{i})_{i = 1, \ldots, n} > 0$, $\frac{\sum_{i = 1}^{n} P(X_{i} \in B(x, h_{i}) \mid F_{i - 1})}{\sum_{i = 1}^{n} ϕ(x, h_{i})} \to 1$,

where $B(x, h) := \{x^{'} \in F : d(x^{'}, x) < h\}$.
(H2): (i) The conditional distribution function $F^{x}$ is such that there exists $β > 0$ with $\inf_{t \in S}(1 - F^{x}(t)) > β$ and, for all $(t_{1}, t_{2}) \in S \times S$ and all $(x_{1}, x_{2}) \in N_{x} \times N_{x}$,
$| F^{x_{1}}(t_{1}) - F^{x_{2}}(t_{2}) | \leq C_{1} (d(x_{1}, x_{2})^{β_{1}} + | t_{1} - t_{2} |^{β_{2}})$, $β_{1} > 0$, $β_{2} > 0$.
(ii) The density $f^{x}$ is such that there exists $α > 0$ with $f^{x}(t) < α$ for all $t \in S$ and, for all $(t_{1}, t_{2}) \in S \times S$ and all $(x_{1}, x_{2}) \in N_{x} \times N_{x}$,
$| f^{x_{1}}(t_{1}) - f^{x_{2}}(t_{2}) | \leq C_{1} (d(x_{1}, x_{2})^{β_{1}} + | t_{1} - t_{2} |^{β_{2}})$, $β_{1} > 0$, $β_{2} > 0$.
(H3): For all $(t_{1}, t_{2}) \in \mathbb{R}^{2}$, $| H^{(j)}(t_{1}) - H^{(j)}(t_{2}) | \leq C | t_{1} - t_{2} |$ for $j = 0, 1$; moreover, $\int | w |^{β_{2}} H^{(1)}(w)\, dw < \infty$ and $\int (H^{(1)}(w))^{2}\, dw < \infty$.
(H4): $K$ is a function with support $[0, 1]$ such that $C_{1} I_{[0, 1]}(t) \leq K(t) \leq C_{2} I_{[0, 1]}(t)$ for all $t$, with $0 < C_{1} \leq C_{2} < \infty$, where $I_{A}$ is the indicator function of $A$.
(H5): (i) $lim_{n \to + \infty} n^{- 1} \sum_{i = 1}^{n} \frac{a_{i}^{β_{1}} ϕ_{i} (x, a_{i})}{ϕ (x, a_{i})} = 0,$ (ii) $lim_{n \to + \infty} n^{- 1} \sum_{i = 1}^{n} \frac{b_{i}^{β_{2}} ϕ_{i} (x, a_{i})}{ϕ (x, a_{i})} = 0 .$
(H6): $lim_{n \to + \infty} \frac{φ_{n, j} (x) log n}{n^{2}} = 0$ , where $φ_{n, j} (x) = \sum_{i = 1}^{n} \frac{b_{i}^{- j} ϕ_{i} (x, a_{i})}{ϕ^{2} (x, a_{i})}$ for $j = 0, 1 .$
(H7): ${(C_{n})}_{n \geq 1}$ and ${(X_{n}, T_{n})}_{n \geq 1}$ are independent.
(H8): G has a bounded first derivative $G^{(1)}$ .
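Conditions (H5) and (H6) can be explored numerically in a simple hypothetical setting, chosen only for illustration: $ϕ_{i}(x, h) = ϕ(x, h) = h^{τ}$ (so that $ϕ_{i}/ϕ = 1$), $a_{i} = i^{-α}$ and $b_{i} = i^{-γ}$. Then $φ_{n,j}(x) = \sum_{i=1}^{n} i^{γ j + α τ}$, which grows like $n^{1 + γ j + α τ}$, so (H6) holds when $γ j + α τ < 1$. The function below computes the three quantities for a given $n$; its name and parameters are ours.

```python
import math


def bandwidth_conditions(n, alpha, gamma, tau, beta1, beta2, j=1):
    """Evaluate the (H5)(i), (H5)(ii) averages and the (H6) ratio in a hypothetical
    setting: phi(x, h) = phi_i(x, h) = h**tau, a_i = i**(-alpha), b_i = i**(-gamma)."""
    h5_i = h5_ii = varphi = 0.0
    for i in range(1, n + 1):
        a_i, b_i = i ** -alpha, i ** -gamma
        phi = a_i ** tau  # phi(x, a_i) = phi_i(x, a_i), so phi_i / phi = 1
        h5_i += a_i ** beta1
        h5_ii += b_i ** beta2
        varphi += b_i ** -j * phi / phi ** 2  # term of varphi_{n,j}(x)
    return h5_i / n, h5_ii / n, varphi * math.log(n) / n ** 2


for n in (10 ** 2, 10 ** 3, 10 ** 4):
    print(n, bandwidth_conditions(n, alpha=0.2, gamma=0.2, tau=2.0, beta1=1.0, beta2=1.0))
```

With $j = 1$, the first two returned values and the square root of the third are the three terms of the rate (1) in Theorem 1 for this setting; with $α = γ = 0.2$ and $τ = 2$, we have $γ + ατ = 0.6 < 1$, and all three decrease as $n$ grows.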

Theorem 1.

Under hypotheses (H1)–(H7), we have:

\sup_{t \in S} | {\hat{h}}^{x}(t) - h^{x}(t) | = O\Big(n^{-1} \sum_{i = 1}^{n} \frac{a_{i}^{β_{1}} ϕ_{i}(x, a_{i})}{ϕ(x, a_{i})}\Big) + O\Big(n^{-1} \sum_{i = 1}^{n} \frac{b_{i}^{β_{2}} ϕ_{i}(x, a_{i})}{ϕ(x, a_{i})}\Big) + O\Big(\sqrt{\frac{φ_{n, 1}(x) \log n}{n^{2}}}\Big) \quad \text{a.s.} \tag{1}

Proof of Theorem 1.

The proof of this theorem is based on the following decomposition and on Lemmas 1–3 below:

{\hat{h}}^{x}(t) - h^{x}(t) = \frac{1}{1 - {\hat{F}}^{x}(t)} \left[{\hat{f}}^{x}(t) - f^{x}(t)\right] + \frac{h^{x}(t)}{1 - {\hat{F}}^{x}(t)} \left[{\hat{F}}^{x}(t) - F^{x}(t)\right]. \tag{2}

□
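For completeness, the decomposition (2) can be checked directly, using only $h^{x} = f^{x}/(1 - F^{x})$; the computation below assumes $F^{x}(t) < 1$ and $1 - {\hat{F}}^{x}(t) \neq 0$.

```latex
\begin{aligned}
&\frac{{\hat{f}}^{x}(t) - f^{x}(t)}{1 - {\hat{F}}^{x}(t)}
 + \frac{h^{x}(t)\,[{\hat{F}}^{x}(t) - F^{x}(t)]}{1 - {\hat{F}}^{x}(t)} \\
&\quad = \frac{1}{1 - {\hat{F}}^{x}(t)} \Big[{\hat{f}}^{x}(t) - f^{x}(t)
 + \frac{f^{x}(t)\,({\hat{F}}^{x}(t) - F^{x}(t))}{1 - F^{x}(t)}\Big] \\
&\quad = \frac{1}{1 - {\hat{F}}^{x}(t)} \Big[{\hat{f}}^{x}(t)
 - \frac{f^{x}(t)\,(1 - {\hat{F}}^{x}(t))}{1 - F^{x}(t)}\Big] \\
&\quad = \frac{{\hat{f}}^{x}(t)}{1 - {\hat{F}}^{x}(t)} - \frac{f^{x}(t)}{1 - F^{x}(t)}
 = {\hat{h}}^{x}(t) - h^{x}(t).
\end{aligned}
```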

Lemma 1.

Under hypotheses (H1), (H2)(i) and (H3)–(H7), we have:

\sup_{t \in S} | {\hat{F}}^{x}(t) - F^{x}(t) | = O\Big(n^{-1} \sum_{i = 1}^{n} \frac{a_{i}^{β_{1}} ϕ_{i}(x, a_{i})}{ϕ(x, a_{i})}\Big) + O\Big(n^{-1} \sum_{i = 1}^{n} \frac{b_{i}^{β_{2}} ϕ_{i}(x, a_{i})}{ϕ(x, a_{i})}\Big) + O\Big(\sqrt{\frac{φ_{n, 0}(x) \log n}{n^{2}}}\Big) \quad \text{a.s.} \tag{3}

Lemma 2.

Under hypotheses (H1), (H2)(ii) and (H3)–(H7), we have:

\sup_{t \in S} | {\hat{f}}^{x}(t) - f^{x}(t) | = O\Big(n^{-1} \sum_{i = 1}^{n} \frac{a_{i}^{β_{1}} ϕ_{i}(x, a_{i})}{ϕ(x, a_{i})}\Big) + O\Big(n^{-1} \sum_{i = 1}^{n} \frac{b_{i}^{β_{2}} ϕ_{i}(x, a_{i})}{ϕ(x, a_{i})}\Big) + O\Big(\sqrt{\frac{φ_{n, 1}(x) \log n}{n^{2}}}\Big) \quad \text{a.s.} \tag{4}

Lemma 3.

Under the hypotheses of Lemma 1, we have:

\exists δ > 0 \text{ such that } \sum_{n = 1}^{\infty} P\Big\{\inf_{t \in S} | 1 - {\hat{F}}^{x}(t) | \leq δ\Big\} < \infty. \tag{5}
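One way to see how (2) and Lemmas 1–3 combine into (1) is sketched below. It uses (H2), which gives $h^{x}(t) = f^{x}(t)/(1 - F^{x}(t)) < α/β$ on $S$, and reads Lemma 3 through the Borel–Cantelli lemma: almost surely, $\inf_{t \in S} |1 - {\hat{F}}^{x}(t)| > δ$ for all $n$ large enough. For simplicity we also assume $b_{i} \leq 1$ for all $i$, so that $φ_{n,0}(x) \leq φ_{n,1}(x)$ and the last term of (3) is dominated by that of (4); this simplification is ours.

```latex
\begin{aligned}
\sup_{t \in S} |{\hat{h}}^{x}(t) - h^{x}(t)|
&\leq \frac{1}{\inf_{t \in S} |1 - {\hat{F}}^{x}(t)|}
 \Big[\sup_{t \in S} |{\hat{f}}^{x}(t) - f^{x}(t)|
 + \sup_{t \in S} h^{x}(t)\, \sup_{t \in S} |{\hat{F}}^{x}(t) - F^{x}(t)|\Big] \\
&\leq \frac{1}{δ} \Big[\sup_{t \in S} |{\hat{f}}^{x}(t) - f^{x}(t)|
 + \frac{α}{β} \sup_{t \in S} |{\hat{F}}^{x}(t) - F^{x}(t)|\Big]
 \quad \text{a.s., for all } n \text{ large enough}.
\end{aligned}
```

Inserting (3) and (4) into the last bound then gives (1).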

4. Discussion

This contribution concerns the recursive nonparametric estimation of the conditional hazard function in the presence of a functional explanatory variable when the scalar response is right-censored, in the ergodic case. As an asymptotic result, we have established the almost sure convergence of the estimator, together with its rate (Theorem 1). The assumptions can be divided into three categories: structural assumptions, assumptions on the explanatory variable, and technical assumptions.

Author Contributions

Both authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

The work received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis; Springer Science+Business Media, Inc.: Berlin/Heidelberg, Germany, 2006.
2. Ramsay, J.O.; Silverman, B.W. Applied Functional Data Analysis; Springer Series in Statistics; Springer: New York, NY, USA, 2002; Volume 1.
3. Ferraty, F.; Rabhi, A.; Vieu, P. Estimation non-paramétrique de la fonction de hasard avec variable explicative fonctionnelle. Rev. Math. Pures Appl. 2008, 53, 1–18.
4. Chaouch, M.; Khardani, S. Randomly censored quantile regression estimation using functional stationary ergodic data. J. Nonparametr. Stat. 2015, 27, 65–87.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
