Article

Scalar-on-Function Mode Estimation Using Entropy and Ergodic Properties of Functional Time Series Data

by Mohammed B. Alamari 1, Fatimah A. Almulhim 2, Ibrahim M. Almanjahie 1, Salim Bouzebda 3,* and Ali Laksaci 1
1 Department of Mathematics, College of Science, King Khalid University, Abha 62223, Saudi Arabia
2 Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
3 Université de Technologie de Compiègne, LMAC (Laboratory of Applied Mathematics of Compiègne), 60203 Compiègne, France
* Author to whom correspondence should be addressed.
Entropy 2025, 27(6), 552; https://doi.org/10.3390/e27060552
Submission received: 18 April 2025 / Revised: 14 May 2025 / Accepted: 22 May 2025 / Published: 24 May 2025
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract: In this paper, we investigate the recursive $L_1$ estimator of the conditional mode when the input variable takes values in a pseudo-metric space. The newly proposed estimator is constructed under an ergodicity assumption, which provides a robust alternative to standard mixing processes in various practical settings. The particular interest of this contribution arises from the difficulty of incorporating the mathematical properties of a functional mixing process. In contrast, ergodicity is characterized by the Kolmogorov–Sinai entropy, which measures the dynamics, the sparsity, and the microscopic fluctuations of the functional process. Using observations sampled from an ergodic functional time series (fts), we establish the asymptotic properties of this estimator. In particular, we derive its convergence rate and show Borel–Cantelli (BC) consistency. The general expression for the convergence rate is then specialized to several notable scenarios, including the independence case, the classical kernel method, and the vector-valued case. Finally, numerical experiments on both simulated and real-world datasets demonstrate the superiority of the $L_1$-recursive estimator compared to existing competitors.

1. Introduction

Investigating the joint behavior of two random variables in a functional setting is an active area of applied statistics, as it facilitates quantifying the influence of a functional covariate on a scalar response. Numerous functional approaches have been proposed to capture this relationship, including conditional expectation, relative regression, and median regression. However, modeling the relationship via the conditional distribution function is often regarded as more informative because it sheds light on both central and extreme parts of the response. Consequently, the principal goal of this work is to introduce a new estimator for modal regression using the cumulative distribution function.
Despite substantial literature on conditional mode prediction, the predominant estimator remains the Nadaraya–Watson (NW) method. The earliest investigation of conditional mode estimation can be traced back to [1], which demonstrated that the mode can yield superior predictive performance compared with the conditional mean. In [2], the authors proposed a mode-based predictor using the derivative of the conditional density, applicable to vector-valued input variables. Subsequently, ref. [3] derived the asymptotic distribution of the modal regression estimator under independence, and this result was generalized to dependent data by [4]. For more recent works, we refer the readers to [5].
A functional version of the NW-estimator for the conditional mode (CM) was first introduced in [6], where the authors established almost complete consistency of the estimator by identifying it as the maximizer of the conditional density. This result was extended to dependent processes in [7]. The asymptotic distribution of the NW-based functional CM estimator was studied under the i.i.d. assumption in [8], whereas [9] considered strong mixing functional time series under fractal conditions. The monograph of [10] represents a key contribution to nonparametric functional prediction, and further theoretical results on functional mode estimation can be found in [11], which addressed the $L_p$-convergence of NW-based functional mode estimators. In the context of ergodic functional time series, ref. [12] focused on conditional mode estimation and derived BC consistency under a missing-at-random framework for the functional covariate. Several alternative estimators to the NW approach have also emerged in functional data analysis (FDA). For example, ref. [13] established the asymptotic normality of a local linear CM estimator in the functional setting, ref. [14] investigated a kNN-based functional CM approach, and [15] developed a local linear functional-kNN version of the estimator. More extensive studies on functional CM estimation can be found in [16,17,18] and related references. Additionally, in the ergodic functional time series case, ref. [19] obtained BC consistency for the conditional mode estimator.
A distinctive contribution of the present paper is its focus on a recursive estimation algorithm, a direction that remains underexplored in FDA. One of the earliest examinations of recursive methods in this domain is [20], which addressed the recursive estimation of conditional mean functions. Later, ref. [21] investigated recursive procedures for functional time series under mixing conditions. More recent developments in functional nonparametric smoothing by means of recursive algorithms, along with relevant references, are presented in [22]. For additional perspectives on FDA and its applications, including dedicated survey articles and specialized journal issues, see [23,24,25,26], among others, as well as the recent papers [27,28,29].

1.1. Contributions of This Paper

The primary objective of this work is to propose a novel modal regression estimator and establish its asymptotic properties under a general framework of ergodic functional time series. Specifically, our estimator combines an $L_1$-based approach with a recursive procedure. In contrast to estimators built upon the NW or local linear methods, the newly constructed estimator offers multiple advantages. First, incorporating an $L_1$-technique promotes robustness, which mitigates the impact of outliers through a percentile-based approach. Moreover, harnessing the conditional distribution function to identify the conditional mode leverages comprehensive information about the functional covariate–response relationship, potentially enhancing the estimator's precision. A further strength lies in the recursive structure, which seamlessly updates the estimator upon the arrival of each new data point, a feature especially valuable for real-time forecasting in ergodic functional time series. This adaptability is highly relevant in fields such as artificial intelligence, where continuous data processing is critical. From a theoretical standpoint, we derive the Borel–Cantelli convergence rate of the proposed estimator under mild ergodic conditions frequently satisfied by common processes (e.g., moving average (MA), generalized autoregressive conditional heteroskedasticity (GARCH), Volterra). Finally, we illustrate the practical value of our algorithm through empirical investigations on both synthetic and real-world datasets.

1.2. Paper Organization

We introduce the $L_1$-based conditional mode and its recursive estimator in Section 2. The main theoretical results, including consistency and convergence rates, are presented in Section 3. Section 4 is devoted to a discussion of the practical implications of the main aspects of the studied topic. In Section 5, we investigate the finite-sample performance of the proposed estimator through simulation studies and applications to real data. The proofs of the auxiliary results are provided in Section 6.

2. The $L_1$-Recursive Estimation of the Mode

Consider a strictly stationary sequence of dependent input–output random variables, denoted by $(I_i, O_i)_{i=1,\dots,n}$, which takes values in $\mathcal{F} \times \mathbb{R}$. Here, $\mathcal{F}$ is a semi-metric space endowed with a semi-metric $d(\cdot,\cdot)$. Let $\mathcal{N}_\theta$ be a neighborhood of a fixed curve $\theta \in \mathcal{F}$. We assume that the conditional distribution function $G(\cdot \mid \theta)$ is strictly increasing and admits a continuous density $g(y \mid \theta)$ with respect to the Lebesgue measure on $\mathbb{R}$. Recall that for a given $p \in (0,1)$, the conditional quantile of $O$ given $I = \theta$, denoted by $Qu_p(\theta)$, is obtained by inverting the conditional distribution function, namely
$$Qu_p(\theta) = G^{-1}(p \mid \theta).$$
Meanwhile, the conditional mode of $O$ given $I = \theta$, denoted by $CM(\theta)$, is defined as the maximizer of the conditional density on a given compact set $K \subset \mathbb{R}$:
$$CM(\theta) = \arg\max_{y \in K} g(y \mid \theta).$$
By combining these two notions, one can re-express modal regression as
$$CM(\theta) = Qu_{p_\theta}(\theta) \quad \text{with} \quad p_\theta = \arg\min_{p \in G^{-1}(K \mid \theta)} \frac{\partial}{\partial p} Qu_p(\theta). \qquad (1)$$
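To make relation (1) concrete, here is a minimal numerical sketch (not from the paper; the normal conditional density, grid sizes, and tolerances are illustrative assumptions): since $\partial_p Qu_p(\theta) = 1/g(Qu_p(\theta) \mid \theta)$, minimizing the quantile derivative over $p$ recovers the maximizer of the conditional density.

```python
import numpy as np

# Illustrative conditional density: N(2, 1), whose mode is 2 (an assumption
# made for this sketch only; the paper works with a generic density g).
y = np.linspace(-4.0, 8.0, 6001)
pdf = np.exp(-0.5 * (y - 2.0) ** 2) / np.sqrt(2 * np.pi)

# Conditional CDF by numerical integration (cumulative Riemann sum).
dy = y[1] - y[0]
cdf = np.cumsum(pdf) * dy
cdf /= cdf[-1]                      # normalize away the truncated tail mass

# Quantile function Qu_p on a grid of p, via inversion of the CDF.
p = np.linspace(0.05, 0.95, 181)
Qu = np.interp(p, cdf, y)

# d/dp Qu_p = 1 / g(Qu_p); its minimizer over p locates the mode.
dQu = np.gradient(Qu, p)
p_theta = p[np.argmin(dQu)]
mode_est = np.interp(p_theta, cdf, y)

print(p_theta, mode_est)            # p_theta near 0.5, mode_est near 2
```

The derivative is smallest exactly where the density is largest, which is the geometric content of (1).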
In the rest of this paper, we assume that the compact subset $K$ contains a single conditional mode $CM(\theta)$ and that (1) holds.
The $L_1$-estimator of the conditional mode naturally connects to the $L_1$-quantile regression, which is determined by
$$Qu_p(\theta) = \arg\min_{t \in \mathbb{R}} \Psi_p(\theta, t),$$
where
$$\Psi_p(\theta, t) = \mathbb{E}\big[ L_p(O - t) \mid I = \theta \big] \quad \text{and} \quad L_p(s) = (2p - 1)s + |s|.$$
An $L_1$-recursive estimator of the function $Qu_p(\cdot)$ can thus be defined by
$$\widehat{Qu}_p(\theta) = \arg\min_{t \in \mathbb{R}} \widehat{\Psi}_p(\theta, t),$$
where
$$\widehat{\Psi}_p(\theta, t) = \frac{\sum_{i=1}^n \Gamma\big(a_i^{-1} d(\theta, I_i)\big) \big[(2p - 1)(O_i - t) + |O_i - t|\big]}{\sum_{i=1}^n \Gamma\big(a_i^{-1} d(\theta, I_i)\big)}, \quad t \in \mathbb{R},$$
where $\Gamma$ is a kernel function and $\{a_i\}$ is a sequence of positive real numbers satisfying $\lim_{n\to\infty} a_n = 0$. Note that, in contrast to the Nadaraya–Watson-type (NW) approach, the recursive setting assigns each input observation $I_i$ its own bandwidth $a_i$, thereby allowing the estimator to be updated whenever a new observation is obtained.
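The loss $L_p(s) = (2p-1)s + |s|$ is twice the usual quantile check loss, so the minimizer of $t \mapsto \mathbb{E}[L_p(O - t)]$ is the $p$-quantile of $O$. A small sanity check (sample size, distribution, and grid are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def L(s, p):
    # The paper's L1 loss: L_p(s) = (2p - 1) s + |s| (twice the check loss).
    return (2 * p - 1) * s + np.abs(s)

p = 0.7
O = rng.exponential(scale=1.0, size=20000)

# Minimize the empirical risk t -> mean(L_p(O - t)) over a grid of t.
t_grid = np.linspace(0.0, 6.0, 1201)
risk = np.array([L(O - t, p).mean() for t in t_grid])
t_star = t_grid[np.argmin(risk)]

print(t_star, np.quantile(O, p))   # both near -log(0.3), about 1.204
```

For $s > 0$, $L_p(s) = 2ps$; for $s < 0$, $L_p(s) = 2(p-1)s$; this is the standard asymmetric quantile loss up to the factor 2, which does not change the minimizer.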
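A minimal sketch of $\widehat{\Psi}_p$ and the resulting quantile estimate $\widehat{Qu}_p(\theta)$ on synthetic curves (the data-generating process, the discretized $L_2$ semi-metric, the quadratic kernel, and the bandwidth constants are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def gamma(u):
    # Quadratic kernel supported on (0, 1), consistent with (Co3).
    return np.where((u > 0) & (u < 1), 1.0 - u ** 2, 0.0)

# Synthetic functional sample: I_i(t) = U_i sin(pi t), O_i = 3 U_i + noise,
# so the conditional median of O given the curve with U = 1 is close to 3.
n = 400
grid = np.linspace(0.0, 1.0, 50)
U = rng.uniform(0.0, 2.0, size=n)
curves = U[:, None] * np.sin(np.pi * grid)[None, :]
O = 3.0 * U + 0.1 * rng.normal(size=n)

def d(f, g_):
    # Discretized L2 semi-metric between two curves.
    return np.sqrt(np.mean((f - g_) ** 2))

# Observation-specific bandwidths a_i = C i^(-upsilon): the recursive feature.
a_i = 0.5 * np.arange(1, n + 1) ** (-0.2)

def qu_hat(theta, p):
    # argmin_t of Psi_hat_p(theta, t), over a grid of candidate t values.
    w = gamma(np.array([d(theta, c) for c in curves]) / a_i)
    t_grid = np.linspace(O.min(), O.max(), 801)
    s = O[None, :] - t_grid[:, None]
    # Psi_hat_p up to the positive normalizing constant sum_i w_i.
    psi = ((2 * p - 1) * s + np.abs(s)) @ w
    return t_grid[np.argmin(psi)]

theta = np.sin(np.pi * grid)       # the curve with U = 1
q_med = qu_hat(theta, 0.5)
print(q_med)                       # close to 3
```

Because the normalizing denominator of $\widehat{\Psi}_p$ is positive and does not depend on $t$, it can be dropped when taking the argmin.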
Before constructing an $L_1$-recursive estimator of the modal regression, it is necessary to define estimators for both $p_\theta$ and $\partial_p Qu_p(\theta)$. Recall that
$$\partial_p Qu_p(\theta) = \frac{\partial}{\partial p} Qu_p(\theta) = \lim_{b \to 0} \frac{Qu_{p+b}(\theta) - Qu_p(\theta)}{b}.$$
A natural estimator for $\partial_p Qu_p(\theta)$ is then given by
$$\widehat{\partial_p Qu_p}(\theta) = \frac{\widehat{Qu}_{p+h_n}(\theta) - \widehat{Qu}_{p-h_n}(\theta)}{2 h_n},$$
where $\{h_n\}$ is a sequence of positive real numbers converging to 0. The conditional mode $CM(\theta)$ is accordingly estimated by
$$\widehat{CM}(\theta) = \widehat{Qu}_{\widehat{p}_\theta}(\theta),$$
where
$$\widehat{p}_\theta = \arg\min_{p \in G^{-1}(K \mid \theta)} \widehat{\partial_p Qu_p}(\theta). \qquad (3)$$
Of course, $\widehat{CM}(\theta)$ is not necessarily unique; when it is not, $\widehat{CM}(\theta)$ may be taken as any value satisfying (3).
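Putting the pieces together, a sketch of the full $L_1$-recursive mode estimate on a right-skewed synthetic sample (all constants and the data-generating process are illustrative assumptions): the derivative of the estimated quantile function is approximated by the symmetric difference with step $h_n$, and $\widehat{p}_\theta$ is its minimizer.

```python
import numpy as np

rng = np.random.default_rng(1)

def gamma(u):
    # Quadratic kernel on (0, 1), as allowed by (Co3).
    return np.where((u > 0) & (u < 1), 1.0 - u ** 2, 0.0)

# Right-skewed conditional law: O_i = 3 U_i + Exp(0.3), so the conditional
# mode given the curve with U = 1 is close to 3, while the mean is 3.3.
n = 600
grid = np.linspace(0.0, 1.0, 50)
U = rng.uniform(0.0, 2.0, size=n)
curves = U[:, None] * np.sin(np.pi * grid)[None, :]
O = 3.0 * U + rng.exponential(0.3, size=n)

a_i = 0.5 * np.arange(1, n + 1) ** (-0.2)      # recursive bandwidths
theta = np.sin(np.pi * grid)                   # evaluation curve (U = 1)
w = gamma(np.sqrt(np.mean((curves - theta) ** 2, axis=1)) / a_i)

def qu_hat(p):
    # Weighted L1 quantile at level p: argmin_t Psi_hat_p(theta, t).
    t_grid = np.linspace(O.min(), O.max(), 1001)
    s = O[None, :] - t_grid[:, None]
    return t_grid[np.argmin(((2 * p - 1) * s + np.abs(s)) @ w)]

# Symmetric difference (step h_n) estimates d/dp Qu_p; its argmin gives
# p_hat, and the mode estimate is Qu_hat at p_hat, as in (3).
h_n = 0.1
p_grid = np.linspace(0.15, 0.85, 29)
qu_mid = np.array([qu_hat(p) for p in p_grid])
dQu = np.array([(qu_hat(p + h_n) - qu_hat(p - h_n)) / (2 * h_n)
                for p in p_grid])
p_hat = p_grid[np.argmin(dQu)]
cm_hat = qu_mid[np.argmin(dQu)]
print(p_hat, cm_hat)               # cm_hat should land near 3
```

Kernel smoothing over neighboring curves induces some upward bias here, so the estimate lands near, not exactly at, the true mode 3.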
In the theoretical development, our principal goal is to establish a Borel–Cantelli convergence result for C M ^ ( θ ) . To achieve this, we adopt the classical ergodicity assumption, which is more general than ordinary mixing conditions. A process is typically considered ergodic if, over sufficient time, the entropy measured along a single evolving trajectory converges to the entropy of the system’s full ensemble of possible states. In the functional case, we employ the definition of ergodicity for functional statistics proposed by [7], as it provides a suitable framework for analyzing dependent functional data.

3. Main Results

We begin by letting $C$ or $C'$ denote generic strictly positive constants. We also assume that
$$G^{-1}(K \mid \theta) = [a_\theta, b_\theta].$$
Moreover, for each $k = 1, \dots, n$, define $\mathcal{G}_k$ as the $\sigma$-field generated by $((I_1, O_1), \dots, (I_k, O_k), I_{k+1})$, and let $\mathcal{F}_k$ be the $\sigma$-field generated by $((I_1, O_1), \dots, (I_k, O_k))$.
Our principal assumptions are as follows:
(Co1)
(i) The function $\xi(\theta, a) := \mathbb{P}(I \in B(\theta, a))$ satisfies $\xi(\theta, a) > 0$ for every $a > 0$, where $B(\theta, r) = \{\theta' \in \mathcal{F} : d(\theta, \theta') < r\}$. (ii) For each $i = 1, \dots, n$, there exists a deterministic function $\xi_i(\theta, \cdot)$ such that, almost surely, $0 < \mathbb{P}(I_i \in B(\theta, a) \mid \mathcal{F}_{i-1}) \le \xi_i(\theta, a)$ for all $a > 0$, and $\xi_i(\theta, a) \to 0$ as $a \to 0$. (iii) For any positive sequence $(a_i)_{i=1,\dots,n}$, we have
$$\frac{\sum_{i=1}^n \mathbb{P}(I_i \in B(\theta, a_i) \mid \mathcal{F}_{i-1})}{\sum_{i=1}^n \xi(\theta, a_i)} \longrightarrow 1, \quad \text{a.co.}$$
(Co2)
The function $Qu_\cdot(\theta)$ is three times continuously differentiable on $[a_\theta, b_\theta]$. In addition, suppose that $G(\cdot \mid \cdot)$ satisfies the Lipschitz condition
$$\forall \theta_1, \theta_2 \in \mathcal{N}_\theta,\ \forall t_1, t_2 \in [a_\theta, b_\theta], \quad \big| G(t_1 \mid \theta_1) - G(t_2 \mid \theta_2) \big| \le C \big( d^b(\theta_1, \theta_2) + |t_1 - t_2| \big),$$
for some $b > 0$, where $\mathcal{N}_\theta$ is a neighborhood of $\theta$.
(Co3)
The function $\Gamma$ is supported on $(0,1)$ and fulfills
$$0 < C\, \mathbb{1}_{(0,1)}(t) < \Gamma(t) < C'\, \mathbb{1}_{(0,1)}(t) < \infty.$$
(Co4)
$\lim_{n\to\infty} \dfrac{\zeta_n \ln n}{n^2 \bar{\xi}_n^2} = 0$, where $\zeta_n = \sum_{i=1}^n \xi_i(\theta, a_i)$ and $\bar{\xi}_n = \dfrac{1}{n} \sum_{i=1}^n \xi(\theta, a_i)$.
Clearly, conditions (Co1)–(Co4) are often encountered in nonparametric functional time series analysis. In particular, (Co1) describes the probabilistic concentration behavior of the input variable, including its conditional concentration with respect to the filtration, which underscores the impact of ergodicity on the asymptotic properties of the estimator. Assumption (Co2) is pivotal for the nonparametric structure of the model. Conditions (Co3) and (Co4) govern the behavior of the kernel $\Gamma$ and the smoothing parameters $a_n$ and $h_n$, ensuring the proper handling of the technical aspects of the estimator $\widehat{CM}(\theta)$. These requirements also allow us to express the convergence rate in a form analogous to the Nadaraya–Watson estimator, which can be viewed as arising from maximizing a double-kernel estimator of a conditional density. Thus, the assumptions under consideration effectively encompass the main components of the subject, namely, the model, the data structure, the correlation framework, and the convergence rate. These assumptions are not overly restrictive, especially given the complexity of the proposed functional time series model and the strength of the Borel–Cantelli (BC) type consistency. In fact, it is possible to establish a weaker form of consistency for the estimator under less stringent conditions. By employing techniques similar to those presented in [30], one can demonstrate weak consistency. Ultimately, the relationship between the assumptions and the resulting theoretical guarantees reflects a trade-off: stronger and more general results necessitate stronger assumptions. The next theorem establishes the almost-complete (a.co.) convergence of $\widehat{CM}(\theta)$.
Theorem 1. 
Suppose (Co1)–(Co4) hold. If
$$\inf_{p \in (0,1)} \frac{\partial^3}{\partial p^3} Qu_p(\theta) > 0,$$
then
$$\big| \widehat{CM}(\theta) - CM(\theta) \big| = O\!\left( \frac{1}{\zeta_n} \sum_{i=1}^n a_i^b\, \xi_i(\theta, a_i) \right) + O(h_n) + O\!\left( \sqrt{\frac{\zeta_n \ln n}{n^2 \bar{\xi}_n^2}} \right), \quad \text{a.co.}$$
Proof of the Main Result. 
From the definitions of $\widehat{CM}(\theta)$ and $CM(\theta)$, it follows that
$$\big| \widehat{CM}(\theta) - CM(\theta) \big| = \big| \widehat{Qu}_{\widehat{p}_\theta}(\theta) - Qu_{p_\theta}(\theta) \big| \le \big| \widehat{Qu}_{\widehat{p}_\theta}(\theta) - Qu_{\widehat{p}_\theta}(\theta) \big| + \big| Qu_{\widehat{p}_\theta}(\theta) - Qu_{p_\theta}(\theta) \big|$$
$$\le \sup_{p \in [a_\theta, b_\theta]} \big| \widehat{Qu}_p(\theta) - Qu_p(\theta) \big| + \big| Qu_{\widehat{p}_\theta}(\theta) - Qu_{p_\theta}(\theta) \big|. \qquad (4)$$
Using a Taylor expansion, we have
$$Qu_{\widehat{p}_\theta}(\theta) - Qu_{p_\theta}(\theta) = (\widehat{p}_\theta - p_\theta)\, \partial_p Qu_{p^*_\theta}(\theta), \quad \text{for some } p^*_\theta \in (\widehat{p}_\theta, p_\theta). \qquad (5)$$
Since $p_\theta$ is the minimizer of $\partial_p Qu_\cdot(\theta)$, we also obtain
$$\partial_p Qu_{\widehat{p}_\theta}(\theta) - \partial_p Qu_{p_\theta}(\theta) = (\widehat{p}_\theta - p_\theta)\, \partial_p^2 Qu_{p^{**}_\theta}(\theta), \quad \text{for some } p^{**}_\theta \in (\widehat{p}_\theta, p_\theta).$$
Analogously to (4), it follows that
$$\big| \partial_p Qu_{\widehat{p}_\theta}(\theta) - \partial_p Qu_{p_\theta}(\theta) \big| \le 2 \sup_{p \in [a_\theta, b_\theta]} \big| \widehat{\partial_p Qu_p}(\theta) - \partial_p Qu_p(\theta) \big|.$$
Because $\inf_{p \in (0,1)} \partial_p^3 Qu_p(\theta) > 0$, we obtain
$$\big| \widehat{p}_\theta - p_\theta \big| \le C \sup_{p \in [a_\theta, b_\theta]} \big| \widehat{\partial_p Qu_p}(\theta) - \partial_p Qu_p(\theta) \big|. \qquad (6)$$
Combine (4) and (6) to obtain
$$\big| \widehat{CM}(\theta) - CM(\theta) \big| \le C \left( \sup_{p \in [a_\theta, b_\theta]} \big| \widehat{Qu}_p(\theta) - Qu_p(\theta) \big| + \sup_{p \in [a_\theta, b_\theta]} \big| \widehat{\partial_p Qu_p}(\theta) - \partial_p Qu_p(\theta) \big| \right).$$
Hence, determining the convergence rate reduces to studying
$$\sup_{p \in [a_\theta, b_\theta]} \big| \widehat{Qu}_p(\theta) - Qu_p(\theta) \big| \quad \text{and} \quad \sup_{p \in [a_\theta, b_\theta]} \big| \widehat{\partial_p Qu_p}(\theta) - \partial_p Qu_p(\theta) \big|.$$
Furthermore, asymptotically,
$$\big| \widehat{\partial_p Qu_p}(\theta) - \partial_p Qu_p(\theta) \big| \le \frac{\big| \widehat{Qu}_{p+h_n}(\theta) - Qu_{p+h_n}(\theta) \big| + \big| \widehat{Qu}_{p-h_n}(\theta) - Qu_{p-h_n}(\theta) \big|}{2 h_n} + \left| \frac{Qu_{p+h_n}(\theta) - Qu_{p-h_n}(\theta)}{2 h_n} - \partial_p Qu_p(\theta) \right|$$
$$\le C h_n^{-1} \sup_{q \in (a_\theta - h_n,\, b_\theta + h_n]} \big| \widehat{Qu}_q(\theta) - Qu_q(\theta) \big| + O(h_n).$$
Finally, Theorem 1 follows from the following lemmas. □
Lemma 1
([16]). Consider a family of real-valued random functions $\{B_n\}$, each of which is decreasing in γ. Let $\{A_n\}$ be a real-valued random sequence. Suppose there exist positive constants $\lambda, M > 0$ such that
$$A_n = o_{a.co.}(1) \quad \text{and} \quad \sup_{|\gamma| \le M} \big| B_n(\gamma) + \lambda \gamma - A_n \big| = o_{a.co.}(1).$$
Then, for any real random sequence $\{\gamma_n\}$ satisfying $B_n(\gamma_n) = o_{a.co.}(1)$, it follows that
$$\sum_{n=1}^{\infty} \mathbb{P}\big( |\gamma_n| > M \big) < \infty.$$
Lemma 2. 
Suppose (Co1) and (Co3)–(Co4) hold. Then, we have
$$\widehat{Q}_D(\theta) - \bar{Q}_D(\theta) = O\!\left( \sqrt{\frac{\zeta_n \ln n}{n^2 \bar{\xi}_n^2}} \right), \quad \text{a.co.}$$
Moreover, there exists a constant $C > 0$ such that
$$\sum_{n} \mathbb{P}\big( \bar{Q}_D(\theta) < C \big) < \infty.$$
Here,
$$\widehat{Q}_D(\theta) := \frac{1}{n \bar{\xi}_n} \sum_{i=1}^n \Gamma\big( a_i^{-1} d(\theta, I_i) \big)$$
and
$$\bar{Q}_D(\theta) := \frac{1}{n \bar{\xi}_n} \sum_{i=1}^n \mathbb{E}\big[ \Gamma\big( a_i^{-1} d(\theta, I_i) \big) \mid \mathcal{F}_{i-1} \big].$$
Proposition 1. 
Assume (Co1)–(Co4) hold. Then, there is a positive constant λ such that
$$\widehat{Qu}_p(\theta) - Qu_p(\theta) = \frac{1}{g\big( Qu_p(\theta) \mid \theta \big)} A_n + O\!\left( \sup_{|\gamma| \le M} \big| B_n(\gamma) + \lambda \gamma - A_n \big| \right),$$
where
$$B_n(\gamma) = \frac{1}{n \bar{\xi}_n} \sum_{i=1}^n \big( p - \mathbb{1}_{\{O_i \le \gamma + Qu_p(\theta)\}} \big)\, \Gamma_i, \quad \text{and} \quad A_n = B_n(0),$$
with $\Gamma_i = \Gamma\big( a_i^{-1} d(\theta, I_i) \big)$.
Proposition 2. 
Under the same assumptions (Co1)–(Co4), we also have
$$\sup_{p \in (0,1)} \big| \widehat{Qu}_p(\theta) - Qu_p(\theta) \big| = O\!\left( \frac{1}{\zeta_n} \sum_{i=1}^n a_i^b\, \xi_i(\theta, a_i) \right) + O\!\left( \sqrt{\frac{\zeta_n \ln n}{n^2 \bar{\xi}_n^2}} \right), \quad \text{a.co.}$$

4. Discussion and Comments

4.1. On the Ergodic Functional Time Series

Similarly to multivariate statistics, ergodicity plays a crucial role in functional statistics. In particular, ergodicity ensures that temporal averages converge to their corresponding stochastic means. This property is especially important, as it justifies the use of sample mean and covariance functions as consistent estimators of the true mean function and the true covariance operator. These estimators, in turn, allow for efficient estimation of eigenfunctions in functional principal component analysis (FPCA) and for accurate curve smoothing using a chosen basis, such as splines or Fourier functions. All these methodologies fundamentally rely on the sample mean and the empirical covariance operator. The ergodic behavior and functional characteristics of the time series under consideration are governed by assumption (Co1), which quantifies the concentration properties of the functional variables. This assumption is thoroughly discussed in [6], where it is shown that (Co1)(i) holds for a wide class of continuous processes whose probability measures are absolutely continuous with respect to the Wiener measure. Examples include the Poisson process, the Ornstein–Uhlenbeck process, fractional Brownian motion, and general diffusion processes. In this work, we also focus on the conditional version of this assumption, namely (Co1)(ii–iii). This extension enables us to account for the dependence structure of the process by analyzing its long-memory behavior, a standard approach in dynamic systems modeling and time series analysis. In such contexts, conditional distributions with respect to the past filtration $\mathcal{F}_{i-1}$ are frequently employed to control process evolution, verify the martingale property, and assess predictability.
Using arguments similar to those used in the unconditional case (Co1)(i), one can show that a trivial example of a functional ergodic process satisfying (Co1)(ii–iii) is when its conditional distribution, given the past, is absolutely continuous with respect to the Wiener measure. Additionally, the Karhunen–Loève decomposition can be used to represent such processes explicitly (see [6] for examples of functional processes admitting such a decomposition). It is worth emphasizing that while both mixing and ergodicity describe forms of dependence among observations, they are fundamentally different. Specifically, the mixing property implies that any two subsets of the state space become asymptotically independent over time, whereas ergodicity implies that the system’s trajectory visits all regions of the space in proportion to their probability measure. Importantly, ergodicity is generally easier to verify than mixing. It is well known that ergodicity does not imply mixing, and there exist numerous ergodic time series that fail to satisfy any form of mixing assumption. Prominent examples include the following:
First-order autoregressive processes with Bernoulli innovations (see [31]);
Gaussian processes with Hurst exponent H > 0.5 (see [32]);
Gaussian processes with non-decaying covariance structures (see [33]).
Additional examples are discussed in [34], and these models can be naturally extended to the functional setting.

4.2. The Conditional Mode Versus the Conditional Mean

In predictive modeling, particularly when the conditional distribution is asymmetric or multimodal, the conditional mode often yields more accurate and meaningful predictions than the conditional mean. While the conditional mean represents the average outcome given certain inputs, it can be significantly affected by outliers or skewness in the distribution. In contrast, the conditional mode reflects the most probable outcome, making it more robust, reliable, and informative. A similar conclusion applies to the conditional median, which is also less sensitive to extreme values than the mean. As a result, combining the conditional mode and median can significantly outperform the conditional mean in predictive tasks. This advantage becomes even more crucial in the context of ergodic functional time series, where providing robust predictors is essential. Ergodicity ensures that time averages converge to ensemble averages, offering a sound statistical basis for long-term forecasting. Moreover, the environmental data under study often exhibits seasonality, which can distort conditional mean predictions. In particular, repeating seasonal patterns can oversmooth the conditional expectation, reducing forecasting accuracy. The conditional mode, however, better captures the most likely outcomes within each seasonal segment, making it especially suitable for forecasting applications.

4.3. The Recursive Estimation in Action

As with all smoothing approaches, the choice of the bandwidth parameter $a_i$ is critical to the quality of the estimation. Typically, the mean squared error serves as a fundamental criterion for selecting this parameter. In the recursive framework considered here, we adopt the selection algorithm proposed by [20]. The smoothing parameter $a_i$ is set to $a_i = C i^{-\upsilon}$, where
$$C = \max_i d(\theta, I_i),$$
and $\upsilon$ is selected by the following cross-validation rule:
$$\upsilon_{opt} = \arg\min_{\upsilon \in (0,1)} \sum_{j=1}^n \big( O_j - \widehat{CM}_{-j}(I_j) \big)^2, \qquad (8)$$
where $\widehat{CM}_{-j}$ is the leave-one-out version of the estimator $\widehat{CM}$. The rule (8) is similar to the cross-validation criterion considered by [10]. In our empirical analysis, the rule (8) is optimized over $m$ equidistant points in the interval $(0,1)$. Finally, it is worth noting that, although this selection approach has demonstrated good empirical performance, establishing its asymptotic optimality remains an important direction for future research.
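The cross-validation rule above can be sketched as follows, with a leave-one-out kernel weighted median standing in for the mode estimator (the synthetic data, the discretized $L_2$ semi-metric, and the 10-point grid are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

def gamma(u):
    # Quadratic kernel on (0, 1).
    return np.where((u > 0) & (u < 1), 1.0 - u ** 2, 0.0)

# Synthetic curves and responses (placeholder data for the sketch).
n = 150
grid = np.linspace(0.0, 1.0, 30)
U = rng.uniform(0.0, 2.0, size=n)
curves = U[:, None] * np.sin(np.pi * grid)[None, :]
O = 3.0 * U + 0.2 * rng.normal(size=n)

# Pairwise L2 distances between curves (computed once).
D = np.sqrt(np.mean((curves[:, None, :] - curves[None, :, :]) ** 2, axis=2))
C = D.max()                                # C = max_i d(theta, I_i)

def loo_pred(j, ups):
    # Leave-one-out weighted-median prediction of O_j (mode stand-in).
    a = C * np.arange(1, n + 1) ** (-ups)  # a_i = C i^(-upsilon)
    w = gamma(D[j] / a)
    w[j] = 0.0                             # leave observation j out
    if w.sum() == 0.0:
        return O.mean()                    # fallback when no neighbors remain
    order = np.argsort(O)
    cw = np.cumsum(w[order])
    return O[order][np.searchsorted(cw, 0.5 * cw[-1])]

ups_grid = np.linspace(0.05, 0.95, 10)     # m = 10 equidistant points
scores = [sum((O[j] - loo_pred(j, u)) ** 2 for j in range(n))
          for u in ups_grid]
ups_opt = ups_grid[int(np.argmin(scores))]
print(ups_opt)
```

Very large $\upsilon$ shrinks the later bandwidths so much that the kernel window empties, which the cross-validation score penalizes automatically.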

4.4. The Computational Cost

Recall that, unlike traditional kernel estimators, which compute the estimate independently at each point, the recursive version updates the estimate sequentially with each new observation, potentially reducing execution time. Consequently, computational efficiency is a significant advantage of the recursive estimator. Quantifying this efficiency is particularly important in the context of large datasets or real-time applications. Specifically, if each update involves a constant number of operations, i.e., of order O ( 1 ) , then the total computational cost becomes O ( n ) , which is considerably more efficient than the O ( n 2 ) complexity of standard kernel estimators. However, in practical scenarios where the bandwidth is selected via adaptive tuning, the computational cost may increase. Despite its advantages, the recursive approach has a notable drawback: it requires storing past data, which negatively impacts memory usage. This limitation becomes especially critical when dealing with large sample sizes or high-dimensional data.
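For a fixed evaluation curve θ, the recursive structure admits O(1) updates of the numerator and denominator sums, so a stream of n observations costs O(n) at that point. A sketch (the conditional-mean form is used here because it updates in closed form; data and constants are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

def gamma(u):
    return np.where((u > 0) & (u < 1), 1.0 - u ** 2, 0.0)

n = 500
grid = np.linspace(0.0, 1.0, 30)
U = rng.uniform(0.0, 2.0, size=n)
curves = U[:, None] * np.sin(np.pi * grid)[None, :]
O = 3.0 * U + 0.1 * rng.normal(size=n)
theta = np.sin(np.pi * grid)

num, den = 0.0, 0.0
estimates = []
for i in range(n):                          # one O(1) update per arrival
    a_i = 0.5 * (i + 1) ** (-0.2)           # bandwidth of the i-th arrival
    w = float(gamma(np.sqrt(np.mean((curves[i] - theta) ** 2)) / a_i))
    num += w * O[i]                         # running numerator
    den += w                                # running denominator
    estimates.append(num / den if den > 0 else np.nan)

# Batch recomputation from scratch (O(n) per evaluation point).
a_all = 0.5 * np.arange(1, n + 1) ** (-0.2)
w_all = gamma(np.sqrt(np.mean((curves - theta) ** 2, axis=1)) / a_all)
batch = float(w_all @ O / w_all.sum())

print(estimates[-1], batch)                 # identical up to rounding
```

The streaming and batch versions agree because each weight $\Gamma(a_i^{-1} d(\theta, I_i))$ depends only on observation $i$, never on later arrivals.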

5. Simulation Study

In this simulation study, our objective is to investigate the feasibility and effectiveness of the proposed method. In particular, we seek to assess how dependence impacts the convergence rate by comparing the algorithm's performance under different scenarios, such as varying dependence levels, signal-to-noise ratios, and outlier contamination. To achieve this, we first generate an artificial dataset following a nonparametric form:
$$\text{heteroscedastic (Het.) Model:} \quad O_i = 4 \int_0^1 \sin\big( 3 + I_i^3(t) \big)\, dt + \cos\big( 3 + I_i^3(t) \big)\, \epsilon_i,$$
$$\text{homoscedastic (Hom.) Model:} \quad O_i = 5 \int_0^1 \log\!\left( \frac{2 + I_i^2(t)}{3 + I_i^3(t)} \right) dt + \epsilon_i,$$
and the heteroscedastic model with signal-to-noise ratio (SNRHet.) Model:
$$O_i = 4 \int_0^1 \sin\big( 3 + I_i^3(t) \big)\, dt + \sigma_{SNR}\, \epsilon_i,$$
where $\epsilon_i$ and $I_i$ are independent, and $\sigma_{SNR}$ is controlled by considering various values of the signal-to-noise ratio, $SNR_k = 5\%$ and $40\%$, where
$$SNR_k = \frac{\sigma_{SNR}^2}{\frac{1}{n} \sum_{i=1}^n (R_i - \bar{R})^2}, \qquad R_i = 4 \int_0^1 \sin\big( 3 + I_i^3(t) \big)\, dt.$$
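The SNRHet design can be sketched as follows; the curve generation below is a simple smooth-random-curve stand-in rather than the FTSgof FAR(2)/FARCH(1) samplers, while the $\sigma_{SNR}$ calibration follows the displayed formula:

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in smooth random curves I_i(t) on a grid (NOT the FTSgof samplers).
n, m = 150, 100
t = np.linspace(0.0, 1.0, m)
coef = rng.normal(size=(n, 3))
curves = (coef[:, [0]] + coef[:, [1]] * np.sin(2 * np.pi * t)
          + coef[:, [2]] * np.cos(2 * np.pi * t))

# Regression signal R_i = 4 * integral_0^1 sin(3 + I_i(t)^3) dt (Riemann sum).
R = 4.0 * np.mean(np.sin(3.0 + curves ** 3), axis=1)

def make_snrhet(snr):
    # O_i = R_i + sigma_SNR * eps_i with sigma_SNR^2 = SNR * Var_n(R).
    var_R = np.mean((R - R.mean()) ** 2)
    sigma = np.sqrt(snr * var_R)
    return R + sigma * rng.normal(size=n), sigma

O_05, sigma_05 = make_snrhet(0.05)   # SNR_k = 5%
O_40, sigma_40 = make_snrhet(0.40)   # SNR_k = 40%
ratio = sigma_05 ** 2 / np.mean((R - R.mean()) ** 2)
print(ratio)                          # 0.05 by construction
```

Calibrating $\sigma_{SNR}$ from the empirical variance of the signal makes the noise level comparable across replications regardless of the curve sampler used.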
The functional input variable is generated from dependent functional processes using the R-package FTSgof (https://www.r-project.org/). We have generated n = 150 observations. The resulting functional variables are presented in Figure 1, Figure 2 and Figure 3.
Clearly, this sampling process encompasses three types of FTS-dependence, specifically functional autoregressive processes of order 2 (FAR(2)) involving two distinct kernels
$$\text{Gaussian kernel:} \quad \psi(t, s) = \exp\!\left( -\frac{t^2 + s^2}{2} \right), \quad t, s \in [0, 1],$$
$$\text{Wiener kernel:} \quad \psi(t, s) = t(1 - t)\, s(1 - s), \quad t, s \in [0, 1].$$
The third illustration of a functional covariate setting is the functional ARCH(1) model, whose conditional volatility depends on the following kernel:
$$\text{Default kernel:} \quad \alpha(t, s) = 12\, t(1 - t)\, s(1 - s), \quad t, s \in [0, 1].$$
This kernel is the default choice in the FTSgof R-package. In our experimental design, the correlation level is adjusted via the function fACF, which highlights the effect of the dependency level on the accuracy of the estimates. The impact of outliers is managed by multiplying the observed responses $(O_i)_i$ by a factor $MF$, which is either $MF = 1$ or $MF = 10$. Meanwhile, the true values of the conditional mode, denoted by $CM(\theta)$, are obtained from the distribution of the underlying white noise $\epsilon_i$. This step is crucial because it allows us to evaluate how the nonparametric component influences the prediction task.
To investigate robustness, we consider three distinct distributions: Weibull, Laplace, and log-normal. These distributions are selected for their invariance under translation and their varying degrees of heavy-tailed behavior, which in turn help to gauge the estimator's resilience to outliers. We then compare the $L_1$-robustness of our estimator against multiple existing predictors.
To examine how recursion affects estimation, we contrast our proposed recursive estimator with a non-recursive version, in which the bandwidth parameter $a_i$ remains fixed at $a_n$. We further assess robustness by comparing $\widehat{CM}$ against the double-kernel (DK) estimators given by
$$\text{The NW-estimator:} \quad \widetilde{CM}(\theta) = \arg\max_{y} \frac{\sum_{i=1}^n \Gamma\big( a_n^{-1} d(\theta, I_i) \big)\, \Gamma\big( b_n^{-1} (y - O_i) \big)}{\sum_{i=1}^n b_n\, \Gamma\big( a_n^{-1} d(\theta, I_i) \big)},$$
and
$$\text{The DK-recursive estimator:} \quad \overline{CM}(\theta) = \arg\max_{y} \frac{\sum_{i=1}^n \Gamma\big( a_i^{-1} d(\theta, I_i) \big)\, \Gamma\big( b_n^{-1} (y - O_i) \big)}{\sum_{i=1}^n b_n\, \Gamma\big( a_i^{-1} d(\theta, I_i) \big)}.$$
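The two double-kernel competitors can be sketched on the same kind of synthetic sample (the symmetric use of Γ on $|y - O_i|/b_n$ for the density part, plus all constants and data, are illustrative assumptions made for this sketch):

```python
import numpy as np

rng = np.random.default_rng(5)

def gamma(u):
    # Quadratic kernel on (0, 1).
    return np.where((u > 0) & (u < 1), 1.0 - u ** 2, 0.0)

n = 500
grid = np.linspace(0.0, 1.0, 30)
U = rng.uniform(0.0, 2.0, size=n)
curves = U[:, None] * np.sin(np.pi * grid)[None, :]
O = 3.0 * U + 0.2 * rng.normal(size=n)
theta = np.sin(np.pi * grid)          # mode(O | theta) is close to 3

dists = np.sqrt(np.mean((curves - theta) ** 2, axis=1))
y_grid = np.linspace(O.min(), O.max(), 801)
b_n = 0.3

def dk_mode(a):
    # argmax_y of sum_i Gamma(d_i / a_i) * Gamma(|y - O_i| / b_n):
    # a double-kernel conditional-density maximizer (symmetrized kernel
    # in the response direction is an assumption of this sketch).
    w = gamma(dists / a)
    dens = gamma(np.abs(y_grid[:, None] - O[None, :]) / b_n) @ w
    return y_grid[np.argmax(dens)]

cm_nw = dk_mode(0.5 * n ** (-0.2))                      # single bandwidth a_n
cm_rec = dk_mode(0.5 * np.arange(1, n + 1) ** (-0.2))   # recursive a_i
print(cm_nw, cm_rec)                                    # both near 3
```

The only difference between the two estimators is whether one bandwidth $a_n$ or the observation-indexed sequence $a_i$ weights the curves, which is exactly the recursive feature under study.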
The performance of these three estimators, $\widehat{CM}$, $\widetilde{CM}$, and $\overline{CM}$, depends on the choice of the parameters $(a_n, b_n)$. The selection of the semi-metric $d$ and the kernel $\Gamma$ also influences efficiency. In particular, $\Gamma$ is chosen to satisfy (Co3), while $d$ controls the smoothing level of the functional predictors $I_i$. To examine the feasibility of the selection algorithm discussed in Section 4, we simulate with $a_i = C i^{-\upsilon}$, where $\upsilon$ is chosen from a grid of 10 equidistant points in $(0,1)$ and $C = \max_i d(\theta, I_i)$. Regarding the kernel $\Gamma$, we employ a quadratic kernel on $(0,1)$, which is consistent with (Co3) and frequently used in nonparametric functional statistics. Moreover, the PCA metric proves especially suitable for cases in which the explanatory curves $I_i$ exhibit discontinuities. In this empirical analysis, we use the PCA metric associated with the third eigenfunction. Finally, we point out that we took $b_n = a_n = C n^{-\upsilon}$ for the estimators $\widetilde{CM}$ and $\overline{CM}$, with $\upsilon$ selected in the same manner as for the estimator $\widehat{CM}$.
To compare the effectiveness of the estimators, we compute their mean square error (MSE) across all simulated scenarios,
$$\text{MSE}(\ddot{CM}) = \frac{1}{n} \sum_{i=1}^n \big( O_i - \ddot{CM}(I_i) \big)^2,$$
where $\ddot{CM}$ can represent $\widehat{CM}$, $\overline{CM}$, or $\widetilde{CM}$. This metric enables us to contrast their performance under varying distributional assumptions, dependence levels (GFAR(2) (Gaussian-kernel-based FAR(2)), WFAR(2) (Wiener-kernel-based FAR(2)), and FARCH(1)), signal-to-noise ratios, and outlier contamination. The results are reported in Table 1, Table 2 and Table 3.
The effectiveness of these estimators is substantially influenced by both the structure of the functional time series and the level of correlation. In addition, the predictor’s accuracy depends on the choice of nonparametric modeling. Nevertheless, empirical findings indicate that the recursive approach generally surpasses the NW method in terms of precision and that the L 1 method exhibits more stable mean squared error (MSE) variability compared to double-kernel techniques. Consequently, the estimator C M ^ emerges as notably precise and robust since it combines the advantages of recursive algorithms with those of L 1 -based techniques. Finally, we observe that all considered functional estimators are straightforward to implement and maintain acceptable accuracy across a variety of scenarios.

6. A Real Data Analysis

The purpose of this section is to evaluate how the $L_1$-recursive estimator of the conditional mode performs in comparison with other recursive approaches. Specifically, we juxtapose it with recursive estimators of both the conditional mean and the conditional median. To carry out this forecasting exercise, we rely on an environmental functional time series dataset. In particular, we focus on predicting air quality at a predetermined lead time by exploiting historical observations. The recursive predictors considered in this study are
$$\text{(cond. mean)} \quad \widehat{ME}(\theta) = \frac{\sum_{i=1}^n \Gamma\big( a_i^{-1} d(\theta, I_i) \big)\, O_i}{\sum_{i=1}^n \Gamma\big( a_i^{-1} d(\theta, I_i) \big)},$$
and
$$\text{(cond. median)} \quad \widehat{CM}(\theta) = \arg\min_{y} \frac{\sum_{i=1}^n \Gamma\big( a_i^{-1} d(\theta, I_i) \big)\, |O_i - y|}{\sum_{j=1}^n \Gamma\big( a_j^{-1} d(\theta, I_j) \big)}.$$
The dataset utilized for this comparative analysis is available online at https://gaftp.epa.gov/castnet (accessed on 8 January 2025). It contains information pertaining to Bondville, in Champaign County, IL, USA. The monitoring station in question has the following geographical attributes, as shown in the following table.

Country | State | County | Code of Station | Geographical Coordinates
USA | Illinois | Champaign | BVL130 | 40.051981, −88.372495
The dataset was recorded at hourly intervals from January through December 2024. The original set of observations is illustrated in Figure 4.
Recall that predicting ozone levels based on CO2 concentrations is crucial for environmental sustainability. In Champaign, a city with a mix of urban traffic and surrounding agricultural activity, ozone levels can spike during hot, stagnant summer days, worsening air quality. Therefore, air quality in this region is significantly influenced by seasonal variations. Naturally, this effect can be managed by applying suitable seasonal data preprocessing techniques. Initially, we replaced missing values with the average of the nearest four values and employed correlation and causality analysis to pinpoint the right covariate variables. Following this preprocessing, we concluded that predicting CO2 emissions three hours ahead using the past 24 h of historical data is more advantageous. To do that, we segment the 8736-h dataset into N + 1 = 364 intervals, each denoted by I i , and each interval I i spans 24 h (i.e., one full day). Following this procedure, we define the output variable as
O i = I i + 1 ( 3 ) .
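The preprocessing and segmentation just described can be sketched as follows (an illustrative Python reading; in particular, interpreting "the nearest four values" as the four closest non-missing observations is an assumption, and the exact handling in the paper may differ):

```python
import numpy as np

def impute_nearest4(x):
    """Replace each missing value (NaN) by the average of the four nearest
    non-missing observations (assumed reading of 'nearest four values')."""
    x = np.asarray(x, dtype=float).copy()
    good = np.flatnonzero(~np.isnan(x))
    for i in np.flatnonzero(np.isnan(x)):
        nearest = good[np.argsort(np.abs(good - i))[:4]]
        x[i] = x[nearest].mean()
    return x

def build_pairs(hourly, hours_per_day=24, lead_hour=3):
    """Cut the hourly record into daily curves I_i and form the 3-h-ahead
    responses O_i = I_{i+1}(lead_hour)."""
    hourly = np.asarray(hourly, dtype=float)
    n_days = len(hourly) // hours_per_day          # 8736 h -> 364 daily curves
    days = hourly[: n_days * hours_per_day].reshape(n_days, hours_per_day)
    curves = days[:-1]                             # predictors I_i
    responses = days[1:, lead_hour]                # O_i = I_{i+1}(3)
    return curves, responses
```

Note that this construction yields one fewer predictor–response pair than the number of daily intervals, since the last day has no following response.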
The curve data are given in the following graph (Figure 5).
To construct the estimators M E ^ , C M ^ , and M O ^ , we maintain the same smoothing approach, the same kernel function, and the same distance metric (the PCA metric associated with the third eigenfunction). We subsequently evaluate and compare these estimators using the following procedure:
  • Step 1. Randomly partition the dataset into two parts:
    A training set, { ( I j , O j ) } j J , consisting of 300 observations;
    A test set, { ( I i , O i ) } i I , consisting of 64 observations.
  • Step 2. For each I i in the training set, predict the corresponding response O i by applying:
    Method 1 (Conditional mean):
    O i M E ^ = M E ^ ( I i ) ;
    Method 2 (Conditional mode):
    O i M O ^ = M O ^ ( I i ) ;
    Method 3 (Conditional median):
    O i C M ^ = C M ^ ( I i ) .
  • Step 3. For each I new in the test set, identify
    $i^{*} = \arg\min_{i \in J} d\big(I_{\text{new}}, I_i\big),$
    where d ( · , · ) denotes the chosen distance function.
  • Step 4. Use the identified index i * to predict O new :
    Method 1 (Conditional mean):
    O new M E ^ = M E ^ ( I i * ) ;
    Method 2 (Conditional mode):
    O new M O ^ = M O ^ ( I i * ) ;
    Method 3 (Conditional median):
    O new C M ^ = C M ^ ( I i * ) .
  • Step 5. To assess the prediction accuracy among the methods, compute the square root of the mean squared error (SMSE):
    $SMSE = \sqrt{\dfrac{1}{64} \sum_{i \in \text{Test set}} \big( O_i - \widehat{T}(I_i) \big)^{2}},$
    where T ^ can be M E ^ , C M ^ , or M O ^ .
  • Step 6. Plot the actual response values versus the predicted values for each method.
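Steps 1–5 can be summarized in a short sketch (Python; the distance d and the fitted training predictions are placeholders that work for any of the three methods):

```python
import numpy as np

def train_test_split(n, n_train, seed=0):
    """Step 1: random partition of the n pairs into training and test indices."""
    idx = np.random.default_rng(seed).permutation(n)
    return idx[:n_train], idx[n_train:]

def smse(y_true, y_pred):
    """Step 5: square root of the mean squared error."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def nearest_curve_predict(new_curves, train_curves, train_fitted, d):
    """Steps 3-4: for each new curve, locate the closest training curve i*
    and reuse its fitted value as the prediction."""
    preds = []
    for c in new_curves:
        i_star = int(np.argmin([d(c, t) for t in train_curves]))
        preds.append(train_fitted[i_star])
    return np.array(preds)
```

With 364 pairs, `train_test_split(364, 300)` reproduces the 300/64 partition of Step 1.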
Consistent with our expectations, the L 1 -based recursive mode predictor M O ^ demonstrates superior performance compared with the alternative models M E ^ and C M ^ ; see Figure 6. This improvement in predictive accuracy is substantial. To support this assertion, we computed the square root of the mean squared error (SMSE): the SMSE for M O ^ was 3.26, while M E ^ and C M ^ yielded SMSE values of 5.42 and 4.87, respectively. These predictive error measures are broadly consistent with the findings reported in [35], bearing in mind the considerable differences in the climatic conditions of the regions studied. Moreover, to evaluate the sensitivity of the proposed predictors to the parameter settings, we re-ran the algorithm using the L 2 metric associated with the B-spline basis, as well as the β -kernel with shape parameters ( 2 , 3 ) , and computed the SMSE for each scenario.
Once again, M O ^ outperforms the other models, M E ^ and C M ^ . Table 4 reports the SMSE for the different settings. It is evident that the prediction is significantly influenced by the chosen parameters of the estimators; however, the choice of the metric has a greater impact than the choice of the kernel. The PCA metric is clearly more suitable for these data. This conclusion confirms the connection between the metric and the smoothing level of the curves: using the spline metric for discontinuous curves can over-smooth the functional covariate, leading to less accurate outcomes.
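For completeness, one common construction of a PCA-type (projection) semimetric, of the kind compared in Table 4, can be sketched as follows (an illustrative reading based on the empirical covariance of the discretized curves; the paper's exact eigenfunction choice may differ):

```python
import numpy as np

def pca_semimetric(curves, k=3):
    """Return d(u, v) = Euclidean norm of the projections of u - v onto the
    first k eigenvectors of the empirical covariance of the discretized
    curves (one common construction of a projection semimetric)."""
    X = np.asarray(curves, dtype=float)
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(X)
    _, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    E = eigvecs[:, ::-1][:, :k]            # keep the top-k eigenvectors
    def d(u, v):
        return float(np.linalg.norm((np.asarray(u) - np.asarray(v)) @ E))
    return d
```

Because the projection is onto orthonormal directions, this semimetric is always bounded above by the full L 2 distance between the discretized curves.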

7. Conclusions

This work introduces a new predictor based on the estimation of the L 1 -modal regression, constructed by means of a recursive procedure. The theoretical discussion provides the essential mathematical underpinnings that enable the straightforward, practical application of the proposed estimator. More specifically, we establish its asymptotic behavior under the fts-ergodic assumption, an alternative condition to the conventional correlation-based criteria.
Empirical evidence from artificial and real datasets confirms that the implementation behaves in line with the theoretical assumptions. In particular, the accuracy of the estimator depends on the degree of correlation in the data, the smoothness of the underlying nonparametric model, and the careful selection of tuning parameters such as the kernel and the bandwidth. Notably, combining the L 1 framework with a recursive approach improves both robustness and predictive precision.
In addition to these findings, the present study highlights several potential avenues for future investigation. One promising direction involves identifying the asymptotic distribution of the normalized estimator under various forms of fts, such as association or Markovian sequences. Another important extension concerns spatial modeling, which considers the geographical arrangement of the observations and supports more intricate prediction tasks. Although these extensions primarily focus on dependencies in the data, further generalizations to other smoothing methods, including the kNN approach, local linear estimators, and semi-partial linear techniques, remain equally compelling.

8. Proofs of the Propositions

Proof of Lemma 2. 
First, we define
$\Gamma_i = \Gamma\big(a_i^{-1} d(\theta, I_i)\big) \qquad \text{and} \qquad L_i = \Gamma_i - \mathbb{E}\big[\Gamma_i \mid \mathcal{F}_{i-1}\big].$
Thus,
Q ^ D ( θ ) Q ¯ D ( θ ) = 1 n ξ n i = 1 n L i .
As $(L_i)_i$ is a martingale difference sequence, for $q \ge 2$,
$\mathbb{E}\big[|L_i|^q \mid \mathcal{F}_{i-1}\big] \le C\, \mathbb{E}\big[L_i^2 \mid \mathcal{F}_{i-1}\big] \le C\, \mathbb{E}\big[\Gamma_i^2 \mid \mathcal{F}_{i-1}\big] \le C\, \mathbb{P}\big(I_i \in B(\theta, a_i) \mid \mathcal{F}_{i-1}\big) \le C\, \xi_i(\theta, a_i).$
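The passage to the exponential bound below relies on a Bernstein-type inequality for bounded martingale difference sequences (a standard result, stated here in the form used, with $\zeta_n$ bounding the cumulated conditional variances $\sum_i \mathbb{E}[L_i^2 \mid \mathcal{F}_{i-1}]$):

```latex
% Bernstein-type inequality for a martingale difference sequence (L_i)
% with |L_i| <= C and \sum_{i=1}^n E[L_i^2 | F_{i-1}] <= \zeta_n:
\mathbb{P}\!\left( \left| \sum_{i=1}^{n} L_i \right| > t \right)
  \;\le\; 2 \exp\!\left( - \frac{t^{2}}{2\left( \zeta_n + C\,t \right)} \right),
\qquad \text{applied here with } t = \varepsilon\, n\, \xi_n .
```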
Now, applying the exponential inequality, we obtain
$\mathbb{P}\Big(\big|\widehat{Q}_D(\theta) - \overline{Q}_D(\theta)\big| > \varepsilon\Big) = \mathbb{P}\Big(\tfrac{1}{n\xi_n}\big|\textstyle\sum_{i=1}^{n} L_i\big| > \varepsilon\Big) \le 2\exp\Big(-\tfrac{\varepsilon^2 n^2 \xi_n^2}{2(\zeta_n + C\varepsilon n \xi_n)}\Big) = 2\exp\Big(-\tfrac{\varepsilon^2 n^2 \xi_n^2}{2\zeta_n}\cdot\tfrac{1}{1 + C\varepsilon n \xi_n/\zeta_n}\Big).$
Putting $\varepsilon = \epsilon_0 \sqrt{\dfrac{\zeta_n \ln n}{n^2 \xi_n^2}}$, then,
$\mathbb{P}\Big(\big|\widehat{Q}_D(\theta) - \overline{Q}_D(\theta)\big| > \epsilon_0 \sqrt{\tfrac{\zeta_n \ln n}{n^2\xi_n^2}}\Big) \le 2\exp\Big(-\tfrac{\epsilon_0^2 \ln n}{2}\cdot\tfrac{1}{1 + C\epsilon_0\sqrt{\ln n/\zeta_n}}\Big).$
Since
$\sqrt{\dfrac{\ln n}{\zeta_n}} \le \sqrt{\dfrac{\ln n}{C\, n\, \xi_n}} \le C\,\sqrt{\dfrac{\ln n}{n\,\xi_n}},$
we have
$\lim_{n\to\infty} \sqrt{\dfrac{\ln n}{\zeta_n}} = 0.$
This implies that
$\mathbb{P}\Big(\big|\widehat{Q}_D(\theta) - \overline{Q}_D(\theta)\big| > \epsilon_0 \sqrt{\tfrac{\zeta_n \ln n}{n^2 \xi_n^2}}\Big) \le 2\exp\Big(-\tfrac{\epsilon_0^2 \ln n}{2C}\Big) \le 2\, n^{-C\epsilon_0^2}.$
Hence,
$\widehat{Q}_D(\theta) - \overline{Q}_D(\theta) = O_{a.co.}\Big(\sqrt{\tfrac{\zeta_n \ln n}{n^2 \xi_n^2}}\Big).$
For the second result, we have, for some $C > 0$,
$0 < C\,\dfrac{\sum_{i=1}^{n}\mathbb{P}\big(I_i \in B(\theta,a_i)\mid\mathcal{F}_{i-1}\big)}{\sum_{i=1}^{n}\mathbb{P}\big(I_i \in B(\theta,a_i)\big)} \le \overline{Q}_D(\theta) \le \big|\overline{Q}_D(\theta) - \widehat{Q}_D(\theta)\big| + \widehat{Q}_D(\theta).$
Thus,
$C\,\dfrac{\sum_{i=1}^{n}\mathbb{P}\big(I_i \in B(\theta,a_i)\mid\mathcal{F}_{i-1}\big)}{\sum_{i=1}^{n}\mathbb{P}\big(I_i \in B(\theta,a_i)\big)} - \big|\widehat{Q}_D(\theta) - \overline{Q}_D(\theta)\big| < \widehat{Q}_D(\theta).$
So,
$\mathbb{P}\Big(\widehat{Q}_D(\theta) \le \tfrac{C}{2}\Big) \le \mathbb{P}\Big(C\,\tfrac{\sum_{i}\mathbb{P}(I_i\in B(\theta,a_i)\mid\mathcal{F}_{i-1})}{\sum_{i}\mathbb{P}(I_i\in B(\theta,a_i))} - \big|\widehat{Q}_D(\theta)-\overline{Q}_D(\theta)\big| < \tfrac{C}{2}\Big) \le \mathbb{P}\Big(\Big|C\,\tfrac{\sum_{i}\mathbb{P}(I_i\in B(\theta,a_i)\mid\mathcal{F}_{i-1})}{\sum_{i}\mathbb{P}(I_i\in B(\theta,a_i))} - \big|\widehat{Q}_D(\theta)-\overline{Q}_D(\theta)\big| - C\Big| > \tfrac{C}{2}\Big).$
It follows that
$\sum_{n}\mathbb{P}\Big(\widehat{Q}_D(\theta) \le \tfrac{C}{2}\Big) \le \sum_{n}\mathbb{P}\Big(\Big|C\,\tfrac{\sum_{i}\mathbb{P}(I_i\in B(\theta,a_i)\mid\mathcal{F}_{i-1})}{\sum_{i}\mathbb{P}(I_i\in B(\theta,a_i))} - \big|\widehat{Q}_D(\theta)-\overline{Q}_D(\theta)\big| - C\Big| > \tfrac{C}{2}\Big) < \infty.$
Finally, from the first result, we obtain
$\sum_{n}\mathbb{P}\Big(\overline{Q}_D(\theta) \le \tfrac{C}{2}\Big) < \infty.$
Proof of Proposition 1. 
Let
γ n = Q u ^ p ( θ ) Q u p ( θ ) .
Clearly, B n ( γ n ) = 0 . Now, we check
A n = o a . c o . ( 1 ) ,
and there exist M , λ > 0 such that
sup | γ | M | B n ( γ ) + λ γ A n | = o a . c o . ( 1 ) .
For (14), we evaluate
A n A n ¯ = O a . c o . ζ n ln n n 2 ξ n 2 and A n ¯ = O 1 ζ n i = 1 n a i b ξ i ( θ , a i ) ,
where
A n ¯ = 1 n ξ n i = 1 n I E p 1 I [ O i Q u p ( θ ) ] Γ i | F i 1 .
Firstly, we have
ϵ > 0 I P A n A n ¯ > ε = I P 1 n ξ n i = 1 n Ψ i > ε   I P i = 1 n Ψ i > ε n ξ n ,
with
Ψ i = 1 n ξ n p 1 I [ O i Q u p ( θ ) ] Γ i I E p 1 I [ O i Q u p ( θ ) ] Γ i | F i 1 .
Clearly, $\Psi_i$ is a martingale difference with respect to the $\sigma$-algebras $(\mathcal{F}_{i-1})_i$ and satisfies, for $q \ge 2$,
$\mathbb{E}\big[|\Psi_i|^q \mid \mathcal{F}_{i-1}\big] \le C\,\mathbb{E}\big[\Psi_i^2 \mid \mathcal{F}_{i-1}\big] \le C\,\mathbb{E}\big[\Gamma_i^2 \mid \mathcal{F}_{i-1}\big] \le C\,\mathbb{P}\big(I_i \in B(\theta,a_i)\mid\mathcal{F}_{i-1}\big) \le C\,\xi_i(\theta,a_i).$
Thus,
I P A n A n ¯ > ε = I P 1 n ξ n i = 1 n Ψ i > ε   2 exp ε 2 n 2 ξ n 2 2 ( ζ n + C ε n ξ n )   = 2 exp ε 2 n 2 ξ n 2 2 ζ n 1 1 + C ε n ξ n ζ n .
So, for ε = ϵ 0 ζ n ln n n ξ n , we have,
I P A n A n ¯ > ϵ 0 ζ n ln n n ξ n 2 exp ϵ 0 2 ln n 2 1 1 + C ϵ 0 ln n ζ n .
Since
$\sqrt{\dfrac{\ln n}{\zeta_n}} \le \sqrt{\dfrac{\ln n}{C\, n\, \xi_n}} \le C\,\sqrt{\dfrac{\ln n}{n\,\xi_n}},$
then,
$\lim_{n\to\infty} \sqrt{\dfrac{\ln n}{\zeta_n}} = 0.$
Therefore,
I P A n A n ¯ > ϵ 0 ζ n ln n n ξ n 2 exp ϵ 0 2 ln n 2 C 2 n C ϵ 0 2 .
For the second one, we have
$\mathbb{1}_{B(\theta, a_i)}(I_i)\,\big|G(t \mid I_i) - G(t \mid \theta)\big| \le C\, a_i^{b}.$
Then,
A n ¯ = 1 n ξ n i = 1 n I E p 1 I [ O i Q u p ( θ ) ] Γ i | F i 1   = 1 n ξ n i = 1 n I E Γ i G ( Q u p ( θ ) | θ ) I E 1 I [ O i Q u p ( θ ) ] | I i | F i 1   1 n ξ n i = 1 n I E Γ i 1 I B ( θ , a i ) ( I i ) G ( Q u p ( θ ) | θ ) G ( Q u p ( θ ) | I i ) | F i 1 .   1 n ξ n i = 1 n a i b I E Γ i | F i 1 .
Consequently,
| A n A n ¯ | = O a . c o . ζ n ln n n 2 ξ n 2
and
A n ¯ = O 1 ζ n i = 1 n a i b ξ i ( θ , a i ) .
For (15), similar to (14), we split the required result into parts
sup | γ | M B n ( γ ) A n 1 n ξ n i = 1 n I E ( B n ( γ ) A n ) | F i 1 = O a . c o . ζ n ln n n 2 ξ n 2 ,
and the bias term
sup | γ | M 1 n ξ n i = 1 n I E B n ( γ ) A n | F i 1 + g ( Q u p ( θ ) ) γ = O 1 ζ n i = 1 n a i b ξ i ( θ , a i ) .
Let us start with the dispersion term in (17). We employ the compactness of the interval $[-M, M]$ and write
$[-M, M] \subset \bigcup_{j=1}^{d_n} [\gamma_j - l_n, \gamma_j + l_n], \quad \text{for } \gamma_j \in [-M, M] \text{ and } l_n = d_n^{-1} = 1/n.$
So, for all $\gamma \in [-M, M]$, we put $j(\gamma) = \arg\min_j |\gamma - \gamma_j|$ and use the monotonicity of $B_n(\cdot)$ and $\mathbb{E}\big[B_n(\gamma) \mid \mathcal{F}_{i-1}\big]$, which leads, for all $1 \le j \le d_n$, to
$B_n(\gamma_j + l_n) \ge \sup_{\gamma \in (\gamma_j - l_n,\, \gamma_j + l_n)} B_n(\gamma) \ge B_n(\gamma_j - l_n)$
and
$\mathbb{E}\big[B_n(\gamma_j + l_n) \mid \mathcal{F}_{i-1}\big] \ge \sup_{\gamma \in (\gamma_j - l_n,\, \gamma_j + l_n)} \mathbb{E}\big[B_n(\gamma) \mid \mathcal{F}_{i-1}\big] \ge \mathbb{E}\big[B_n(\gamma_j - l_n) \mid \mathcal{F}_{i-1}\big].$
We deduce, for any γ 1 , γ 2 [ M , M ] ,
$\Big|\tfrac{1}{n\xi_n}\textstyle\sum_{i=1}^{n} \mathbb{E}\big[B_n(\gamma_1)\mid\mathcal{F}_{i-1}\big] - \tfrac{1}{n\xi_n}\textstyle\sum_{i=1}^{n} \mathbb{E}\big[B_n(\gamma_2)\mid\mathcal{F}_{i-1}\big]\Big| \le C\,|\gamma_1 - \gamma_2|^{b}\, \overline{Q}_D(\theta).$
It follows that
$\sup_{|\gamma| \le M}\Big|B_n(\gamma) - A_n - \tfrac{1}{n\xi_n}\textstyle\sum_{i=1}^{n}\mathbb{E}\big[B_n(\gamma) - A_n \mid \mathcal{F}_{i-1}\big]\Big| \le \max_{1\le j\le d_n}\; \max_{z \in \{\gamma_j - l_n,\, \gamma_j + l_n\}} \Big|B_n(z) - A_n - \tfrac{1}{n\xi_n}\textstyle\sum_{i=1}^{n}\mathbb{E}\big[B_n(z) - A_n \mid \mathcal{F}_{i-1}\big]\Big| + 2bC\, l_n^{b}\, \overline{Q}_D(\theta).$
Concerning l n b , we write
l n b ζ n log n n 2 ξ n 2 = l n b n ξ n ζ n log n     = n ξ n n ζ n log n     = i = 1 n ξ ( θ , a i ) n i = 1 n ξ ( θ , a i ) i = 1 n ξ i ( θ , a i ) 1 log n .
Furthermore, as $\xi(\theta, a_i) \le 1$, we have, for all $n$,
$\frac{1}{n}\sum_{i=1}^{n} \xi(\theta, a_i) \le 1,$
and by (C1)(iii),
$\lim_{n\to\infty} \frac{\sum_{i=1}^{n}\xi(\theta,a_i)}{\sum_{i=1}^{n}\xi_i(\theta,a_i)} = \lim_{n\to\infty} \frac{\sum_{i=1}^{n}\xi(\theta,a_i)}{\sum_{i=1}^{n}\mathbb{P}\big(I_i \in B(\theta, a_i) \mid \mathcal{F}_{i-1}\big)} = 1.$
Finally, we can write
l n b = O ζ n log n n 2 ξ n 2 .
Dealing with
sup | γ | M B n ( γ j ) A n 1 n ξ n i = 1 n I E B n ( γ j ) A n | F i 1 ,
we set
$B_n(\gamma_j) - A_n - \frac{1}{n\xi_n}\sum_{i=1}^{n} \mathbb{E}\big[B_n(\gamma_j) - A_n \mid \mathcal{F}_{i-1}\big] = \frac{1}{n\xi_n}\sum_{i=1}^{n} \Upsilon_i,$
with
Υ i = 1 I O i Q u p ( θ ) 1 I O i γ j + Q u p ( θ ) Γ i     I E 1 I O i Q u p ( θ ) 1 I O i γ j + Q u p ( θ ) Γ i | F i 1 .
As in A n ,
$\mathbb{E}\big[|\Upsilon_i|^q \mid \mathcal{F}_{i-1}\big] \le C\,\mathbb{E}\big[\Upsilon_i^2 \mid \mathcal{F}_{i-1}\big] \le C\,\mathbb{E}\big[\Gamma_i^2 \mid \mathcal{F}_{i-1}\big] \le C\,\mathbb{P}\big(I_i \in B(\theta,a_i)\mid\mathcal{F}_{i-1}\big) \le C\,\xi_i(\theta,a_i).$
Therefore,
I P B n ( γ j ) A n 1 n ξ n i = 1 n I E B n ( γ j ) A n | F i 1 > ϵ 0 ζ n ln n n ξ n 2 exp ϵ 0 2 ln n 2 C 2 n C ϵ 0 2 .
Consequently,
n I P sup | γ | M B n ( γ j ( γ ) ) A n 1 n ξ n i = 1 n I E B n ( γ j ( γ ) ) A n ϵ 0 ζ n ln n n ξ n n d n max j I P B n ( γ j ) A n 1 n ξ n i = 1 n I E B n ( γ j ) A n ϵ 0 ζ n ln n n ξ n < ,
implying (17). Concerning (18), we write
1 n ξ n i = 1 n I E B n ( γ ) A n | F i 1 = 1 n ξ n i = 1 n I E 1 I O 1 γ + Q u p ( θ ) 1 I O 1 Q u p ( θ ) Γ i | F i 1 = 1 n ξ n i = 1 n I E G ( γ + Q u p ( θ ) | I 1 ) G ( Q u p ( θ ) | I 1 ) Γ i | F i 1 = 1 n ξ n i = 1 n I E G ( γ + Q u p ( θ ) θ ) g ( Q u p ( θ ) θ ) Γ i | F i 1 + O ( a i b ) = γ g ( Q u p ( θ ) θ ) ξ n 1 n i = 1 n I E Γ i | F i 1 + O 1 ζ n i = 1 n a i b ξ i ( θ , a i ) + o γ .
It follows that
I E B n ( γ ) A n | F i 1 = g ( Q u p θ ) Q ¯ D ( x ) γ + O 1 ζ n i = 1 n a i b ξ i ( θ , a i ) + o γ .
Therefore, the Bahadur representation of Q u ^ p ( θ ) Q u p ( θ ) is
Q u ^ p ( θ ) Q u p ( θ ) = 1 g ( Q u p θ ) A n + O sup | γ | M | B n ( γ ) + λ γ A n | .
Proof of Proposition 2. 
The uniform consistency of Q u ^ p ( θ ) Q u p ( θ ) is based on
sup p [ 0 , 1 ] | A n I E A n | = O a . c o . ζ n ln n n ξ n ,
and
sup p [ 0 , 1 ] sup | γ | M | B n ( γ ) + g ( Q u p θ ) γ A n | = O a . c o . ζ n ln n n ξ n .
Since the inequalities in the bias terms are uniform in $p \in [0,1]$, we focus only on the dispersion terms of
sup p [ 0 , 1 ] | A n ( p ) | and sup p [ 0 , 1 ] sup | γ | M | F n ( γ , p ) | ,
where
A n ( p ) = A n 1 n ξ n i = 1 n I E A n | F i 1 ,
and
F n ( γ , p ) = B n ( γ ) A n 1 n ξ n i = 1 n I E ( B n ( γ ) A n ) | F i 1 .
We focus on the first term; the second one is treated similarly. Indeed,
$[0,1] \subset \bigcup_{k=1}^{d_n} [p_k - l_n, p_k + l_n], \quad \text{for } p_k \in [0,1].$
Next, for all $p \in [0,1]$, we put $\eta_p = \arg\min_k |p - p_k|$. Then,
$\sup_{p \in [0,1]} |A_n(p)| \le \max_{1 \le k \le d_n}\; \max_{z_p \in \{p_k - l_n,\, p_k + l_n\}} |A_n(z_p)| + 2bC\, l_n^{b}\, \overline{Q}_D(\theta).$
It is shown in the proof of Lemma 2 that, for all $p \in (0,1)$,
$\mathbb{P}\Big(|A_n(z_p)| \ge \epsilon_0\,\tfrac{\sqrt{\zeta_n \ln n}}{n\,\xi_n}\Big) \le 2\, n^{-C\epsilon_0^2}.$
Therefore,
$\sum_{n}\mathbb{P}\Big(\sup_{p\in[0,1]}|A_n(p)| \ge \epsilon_0\,\tfrac{\sqrt{\zeta_n \ln n}}{n\,\xi_n}\Big) \le \sum_{n} n\, d_n \max_{k} \mathbb{P}\Big(|A_n(p_k)| \ge \epsilon_0\,\tfrac{\sqrt{\zeta_n \ln n}}{n\,\xi_n}\Big) < \infty.$
The uniform consistency of ( sup p [ 0 , 1 ] | I E A n | ) is obtained by taking the uniform version of (16), which allows us to conclude that
sup p [ 0 , 1 ] | Q u ^ p ( θ ) Q u p ( θ ) | = O 1 ζ n i = 1 n a i b ξ i ( θ , a i ) + O ζ n ln n n 2 ξ n 2 .

Author Contributions

Conceptualization, M.B.A., F.A.A., I.M.A., S.B. and A.L.; Methodology, M.B.A., F.A.A., I.M.A., S.B. and A.L.; Software, M.B.A., F.A.A., I.M.A., S.B. and A.L.; Validation, F.A.A., I.M.A., S.B. and A.L.; Formal analysis, M.B.A., F.A.A., I.M.A., S.B. and A.L.; Investigation, M.B.A., I.M.A., S.B. and A.L. All authors have read and agreed to the final version of the manuscript.

Funding

This research was funded by the Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R515), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia, and by the Deanship of Scientific Research and Graduate Studies at King Khalid University through the Small Research Groups Program under grant number RGP1/41/46.

Data Availability Statement

The data used in this study are available through the link https://kilthub.cmu.edu (accessed on 2 February 2025).

Acknowledgments

The authors thank and extend their appreciation to the funders of this work: This work was supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R515), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia, and the Deanship of Scientific Research and Graduate Studies at King Khalid University through the Small Research Groups Program under grant number RGP1/41/46. The authors extend their sincere gratitude to the Editor-in-Chief, the Associate Editor, and the reviewers for their invaluable feedback and for pointing out a number of oversights in the version initially submitted. Their insightful comments have greatly refined and focused the original work, resulting in markedly improved presentation.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Collomb, G.; Härdle, W.; Hassani, S. A note on prediction via estimation of the conditional mode function. J. Stat. Plan. Inference 1986, 15, 227–236. [Google Scholar] [CrossRef]
  2. Quintela-Del-Rio, A.; Vieu, P. A nonparametric conditional mode estimate. J. Nonparametr. Stat. 1997, 8, 253–266. [Google Scholar] [CrossRef]
  3. Ioannides, D.; Matzner-Løber, E. A note on asymptotic normality of convergent estimates of the conditional mode with errors-in-variables. J. Nonparametr. Stat. 2004, 16, 515–524. [Google Scholar] [CrossRef]
  4. Louani, D.; Ould-Saïd, E. Asymptotic normality of kernel estimators of the conditional mode under strong mixing hypothesis. J. Nonparametr. Stat. 1999, 11, 413–442. [Google Scholar] [CrossRef]
  5. Allaoui, S.; Bouzebda, S.; Chesneau, C.; Liu, J. Uniform almost sure convergence and asymptotic distribution of the wavelet-based estimators of partial derivatives of multivariate density function under weak dependence. J. Nonparametr. Stat. 2021, 33, 170–196. [Google Scholar] [CrossRef]
  6. Ferraty, F.; Laksaci, A.; Vieu, P. Estimating some characteristics of the conditional distribution in nonparametric functional models. Stat. Inference Stoch. Process. 2006, 9, 47–76. [Google Scholar] [CrossRef]
  7. Dabo-Niang, S.; Kaid, Z.; Laksaci, A. On spatial conditional mode estimation for a functional regressor. Stat. Probab. Lett. 2012, 82, 1413–1421. [Google Scholar] [CrossRef]
  8. Ezzahrioui, M.H.; Ould-Saïd, E. Asymptotic normality of a nonparametric estimator of the conditional mode function for functional data. J. Nonparametr. Stat. 2008, 20, 3–18. [Google Scholar] [CrossRef]
  9. Ezzahrioui, M.H.; Saïd, E.O. Some asymptotic results of a non-parametric conditional mode estimator for functional time-series data. Stat. Neerl. 2010, 64, 171–201. [Google Scholar] [CrossRef]
  10. Ferraty, F.; Vieu, P. Functional Nonparametric Prediction Methodologies. In Nonparametric Functional Data Analysis: Theory and Practice; Springer: New York, NY, USA, 2006; pp. 49–59. [Google Scholar]
  11. Dabo-Niang, S.; Kaid, Z.; Laksaci, A. Asymptotic properties of the kernel estimate of spatial conditional mode when the regressor is functional. AStA Adv. Stat. Anal. 2015, 99, 131–160. [Google Scholar] [CrossRef]
  12. Bouanani, O.; Laksaci, A.; Rachdi, M.; Rahmani, S. Asymptotic normality of some conditional nonparametric functional parameters in high-dimensional statistics. Behaviormetrika 2019, 46, 199–233. [Google Scholar] [CrossRef]
  13. Ling, N.; Liu, Y.; Vieu, P. Conditional mode estimation for functional stationary ergodic data with responses missing at random. Statistics 2016, 50, 991–1013. [Google Scholar] [CrossRef]
  14. Almanjahie, I.M.; Kaid, Z.; Laksaci, A.; Rachdi, M. Estimating the conditional density in scalar-on-function regression structure: K-NN local linear approach. Mathematics 2022, 10, 902. [Google Scholar] [CrossRef]
  15. Attouch, M.; Bouabsa, W. The k-nearest neighbors estimation of the conditional mode for functional data. Rev. Roum. Math. Pures Appl. 2013, 58, 393–415. [Google Scholar]
  16. Azzi, A.; Belguerna, A.; Laksaci, A.; Rachdi, M. The scalar-on-function modal regression for functional time series data. J. Nonparametr. Stat. 2024, 36, 503–526. [Google Scholar] [CrossRef]
  17. Wang, T. Non-parametric Estimator for Conditional Mode with Parametric Features. Oxf. Bull. Econ. Stat. 2024, 86, 44–73. [Google Scholar] [CrossRef]
  18. Schouten, B.; Klausch, T.; Buelens, B.; Van Den Brakel, J. A Cost–Benefit Analysis of Reinterview Designs for Estimating and Adjusting Mode Measurement Effects: A Case Study for the Dutch Health Survey and Labour Force Survey. J. Surv. Stat. Methodol. 2024, 12, 790–813. [Google Scholar] [CrossRef]
  19. Guenani, S.; Bouabsa, W.; Omar, F.; Kadi Attouch, M.; Khardani, S. Some asymptotic results of a kNN conditional mode estimator for functional stationary ergodic data. Commun. Stat.-Theory Methods 2024, 54, 3094–3113. [Google Scholar] [CrossRef]
  20. Thiam, A.; Thiam, B.; Crambes, C. Recursive estimation of nonparametric regression with functional covariate. Qual. Control Appl. Stat. 2014, 59, 527–528. [Google Scholar]
  21. Slaoui, Y. Recursive nonparametric regression estimation for dependent strong mixing functional data. Stat. Inference Stoch. Process. 2020, 23, 665–697. [Google Scholar] [CrossRef]
  22. Alamari, M.B.; Almulhim, F.A.; Litimein, O.; Mechab, B. Strong Consistency of Incomplete Functional Percentile Regression. Axioms 2024, 13, 444. [Google Scholar] [CrossRef]
  23. Shang, H.L.; Yang, Y. Nonstationary functional time series forecasting. J. Forecast. 2024. early view. [Google Scholar] [CrossRef]
  24. Aneiros, G.; Horová, I.; Hušková, M.; Vieu, P. Special Issue on Functional Data Analysis and Related Fields. J. Multivar. Anal. 2022, 189, 104908. [Google Scholar] [CrossRef]
  25. Moindjié, I.A.; Preda, C.; Dabo-Niang, S. Fusion regression methods with repeated functional data. Comput. Stat. Data Anal. 2025, 203, 108069. [Google Scholar] [CrossRef]
  26. Gertheiss, J.; Rügamer, D.; Liew, B.X.; Greven, S. Functional data analysis: An introduction and recent developments. Biom. J. 2024, 66, e202300363. [Google Scholar] [CrossRef]
  27. Agua, B.M.; Bouzebda, S. Single index regression for locally stationary functional time series. AIMS Math. 2024, 9, 36202–36258. [Google Scholar] [CrossRef]
  28. Bouanani, O.; Bouzebda, S. Limit theorems for local polynomial estimation of regression for functional dependent data. AIMS Math. 2024, 9, 23651–23691. [Google Scholar] [CrossRef]
  29. Bouzebda, S. Weak convergence of the conditional single index U-statistics for locally stationary functional time series. AIMS Math. 2024, 9, 14807–14898. [Google Scholar] [CrossRef]
  30. Xu, R.; Wang, J. L 1-estimation for spatial nonparametric regression. J. Nonparametr. Stat. 2008, 20, 523–537. [Google Scholar] [CrossRef]
  31. Andrews, D. First Order Autoregressive Processes and Strong Mixing; Cowles Foundation Discussion Papers 664. Cowles Foundation for Research in Economics; Yale University: New Haven, CT, USA, 1983. [Google Scholar]
  32. Beran, J. Statistics for Long-Memory Processes; Routledge: London, UK, 2017. [Google Scholar]
  33. Ibragimov, I.A.; Rozanov, Y.A.E. Gaussian Random Processes; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012; Volume 9. [Google Scholar]
  34. Magdziarz, M.; Weron, A. Ergodic properties of anomalous diffusion processes. Ann. Phys. 2011, 326, 2431–2443. [Google Scholar] [CrossRef]
  35. Aneiros-Pérez, G.; Cardot, H.; Estévez-Pérez, G.; Vieu, P. Maximum ozone concentration forecasting by functional non-parametric approaches. Environmetrics 2004, 15, 675–685. [Google Scholar] [CrossRef]
Figure 1. Functional autoregressive order 2: Wiener kernel.
Figure 2. Functional autoregressive order 2: Gaussian kernel.
Figure 3. Functional ARCH of order 1 (FARCH(1)).
Figure 4. CO2 emissions during 2024.
Figure 5. A sample of 50 curves of daily CO2 emission.
Figure 6. Comparison between the three predictors.
Table 1. MSE errors of the estimator C M ^ .
MF | FTS | Dist. | Het. | Hom. | SNRHet (5%) | SNRHet (40%)
MF = 1 | GFAR(2) | Laplace | 0.176 | 0.154 | 1.174 | 2.197
MF = 1 | GFAR(2) | Log normal | 0.183 | 0.166 | 1.187 | 2.198
MF = 1 | GFAR(2) | Weibull | 0.196 | 0.172 | 1.189 | 2.208
MF = 1 | WFAR(2) | Laplace | 0.165 | 0.143 | 0.158 | 2.171
MF = 1 | WFAR(2) | Log normal | 0.173 | 0.141 | 1.153 | 2.185
MF = 1 | WFAR(2) | Weibull | 0.170 | 0.153 | 1.170 | 2.192
MF = 1 | FARCH(1) | Laplace | 0.201 | 0.182 | 1.204 | 2.209
MF = 1 | FARCH(1) | Log normal | 0.223 | 0.209 | 1.311 | 2.534
MF = 1 | FARCH(1) | Weibull | 0.240 | 0.223 | 1.412 | 2.626
MF = 10 | GFAR(2) | Laplace | 0.256 | 0.261 | 1.353 | 2.413
MF = 10 | GFAR(2) | Log normal | 0.317 | 0.312 | 1.487 | 2.507
MF = 10 | GFAR(2) | Weibull | 0.296 | 0.542 | 1.618 | 2.698
MF = 10 | WFAR(2) | Laplace | 0.356 | 0.334 | 1.385 | 0.497
MF = 10 | WFAR(2) | Log normal | 0.432 | 0.513 | 1.635 | 2.758
MF = 10 | WFAR(2) | Weibull | 0.408 | 0.443 | 1.489 | 2.595
MF = 10 | FARCH(1) | Laplace | 0.513 | 0.523 | 1.641 | 2.691
MF = 10 | FARCH(1) | Log normal | 0.434 | 0.419 | 1.453 | 2.554
MF = 10 | FARCH(1) | Weibull | 0.504 | 0.514 | 1.562 | 1.516
Table 2. MSE errors of the estimator C M ˜ .
MF | FTS | Dist. | Het. | Hom. | SNRHet (5%) | SNRHet (40%)
MF = 1 | GFAR(2) | Laplace | 1.161 | 1.109 | 2.008 | 4.378
MF = 1 | GFAR(2) | Log normal | 1.304 | 0.963 | 2.789 | 4.678
MF = 1 | GFAR(2) | Weibull | 3.239 | 2.107 | 3.896 | 4.894
MF = 1 | WFAR(2) | Laplace | 0.876 | 0.403 | 1.097 | 1.856
MF = 1 | WFAR(2) | Log normal | 0.606 | 0.236 | 1.765 | 2.785
MF = 1 | WFAR(2) | Weibull | 1.690 | 1.327 | 2.045 | 2.976
MF = 1 | FARCH(1) | Laplace | 2.332 | 1.763 | 2.435 | 4.554
MF = 1 | FARCH(1) | Log normal | 2.204 | 1.398 | 3.971 | 5.861
MF = 1 | FARCH(1) | Weibull | 5.109 | 3.712 | 5.023 | 6.432
MF = 10 | GFAR(2) | Laplace | 4.201 | 4.216 | 7.312 | 8.417
MF = 10 | GFAR(2) | Log normal | 4.230 | 4.736 | 6.789 | 7.678
MF = 10 | GFAR(2) | Weibull | 5.117 | 6.107 | 7.186 | 8.243
MF = 10 | WFAR(2) | Laplace | 3.654 | 3.212 | 4.178 | 5.164
MF = 10 | WFAR(2) | Log normal | 3.902 | 4.561 | 5.605 | 6.194
MF = 10 | WFAR(2) | Weibull | 4.310 | 4.127 | 6.205 | 6.817
MF = 10 | FARCH(1) | Laplace | 6.231 | 7.862 | 8.333 | 9.352
MF = 10 | FARCH(1) | Log normal | 4.315 | 5.493 | 6.771 | 8.662
MF = 10 | FARCH(1) | Weibull | 7.101 | 7.513 | 8.224 | 9.533
Table 3. MSE errors of the estimator C M ¯ .
MF | FTS | Dist. | Het. | Hom. | SNRHet (5%) | SNRHet (40%)
MF = 1 | GFAR(2) | Laplace | 1.535 | 1.331 | 2.103 | 4.414
MF = 1 | GFAR(2) | Log normal | 1.202 | 1.106 | 2.452 | 4.786
MF = 1 | GFAR(2) | Weibull | 2.119 | 1.811 | 3.02 | 4.949
MF = 1 | WFAR(2) | Laplace | 0.167 | 0.156 | 1.861 | 3.843
MF = 1 | WFAR(2) | Log normal | 0.134 | 0.136 | 1.451 | 3.073
MF = 1 | WFAR(2) | Weibull | 0.109 | 0.117 | 0.698 | 1.785
MF = 1 | FARCH(1) | Laplace | 3.101 | 2.512 | 3.972 | 4.952
MF = 1 | FARCH(1) | Log normal | 2.603 | 2.363 | 3.861 | 5.045
MF = 1 | FARCH(1) | Weibull | 4.009 | 3.227 | 4.961 | 5.895
MF = 10 | GFAR(2) | Laplace | 2.552 | 2.312 | 4.132 | 6.447
MF = 10 | GFAR(2) | Log normal | 2.221 | 3.166 | 5.421 | 8.761
MF = 10 | GFAR(2) | Weibull | 4.191 | 5.823 | 6.211 | 8.991
MF = 10 | WFAR(2) | Laplace | 2.171 | 2.164 | 4.812 | 6.832
MF = 10 | WFAR(2) | Log normal | 2.142 | 0.162 | 1.412 | 3.264
MF = 10 | WFAR(2) | Weibull | 2.192 | 2.173 | 3.682 | 5.751
MF = 10 | FARCH(1) | Laplace | 8.112 | 8.521 | 9.128 | 9.921
MF = 10 | FARCH(1) | Log normal | 4.611 | 4.169 | 6.161 | 10.012
MF = 10 | FARCH(1) | Weibull | 9.018 | 10.271 | 11.031 | 12.185
Table 4. SMSE errors of the three predictors with respect to different metrics and different kernels.
Model | Metric | Kernel | SMSE
M O ^ | PCA (3rd eigenfunction) | Quadratic kernel | 3.26
M O ^ | PCA (3rd eigenfunction) | β-kernel | 3.37
M O ^ | PCA (8th eigenfunction) | Quadratic kernel | 4.03
M O ^ | PCA (8th eigenfunction) | β-kernel | 4.11
M O ^ | L2 spline metric | Quadratic kernel | 4.39
M O ^ | L2 spline metric | β-kernel | 4.52
M E ^ | PCA (3rd eigenfunction) | Quadratic kernel | 5.42
M E ^ | PCA (3rd eigenfunction) | β-kernel | 5.61
M E ^ | PCA (8th eigenfunction) | Quadratic kernel | 7.56
M E ^ | PCA (8th eigenfunction) | β-kernel | 8.22
M E ^ | L2 spline metric | Quadratic kernel | 6.45
M E ^ | L2 spline metric | β-kernel | 6.82
C M ^ | PCA (3rd eigenfunction) | Quadratic kernel | 4.87
C M ^ | PCA (3rd eigenfunction) | β-kernel | 5.11
C M ^ | PCA (8th eigenfunction) | Quadratic kernel | 8.62
C M ^ | PCA (8th eigenfunction) | β-kernel | 8.34
C M ^ | L2 spline metric | Quadratic kernel | 6.12
C M ^ | L2 spline metric | β-kernel | 6.73