Article

Scalar-on-Function Mode Estimation Using Entropy and Ergodic Properties of Functional Time Series Data

by Mohammed B. Alamari 1, Fatimah A. Almulhim 2, Ibrahim M. Almanjahie 1, Salim Bouzebda 3,* and Ali Laksaci 1
1 Department of Mathematics, College of Science, King Khalid University, Abha 62223, Saudi Arabia
2 Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
3 Université de Technologie de Compiègne, LMAC (Laboratory of Applied Mathematics of Compiègne), 60203 Compiègne, France
* Author to whom correspondence should be addressed.
Entropy 2025, 27(6), 552; https://doi.org/10.3390/e27060552
Submission received: 18 April 2025 / Revised: 14 May 2025 / Accepted: 22 May 2025 / Published: 24 May 2025
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract: In this paper, we investigate the recursive $L_1$ estimator of the conditional mode when the input variable takes values in a pseudo-metric space. The newly proposed estimator is constructed under an ergodicity assumption, which provides a robust alternative to standard mixing processes in various practical settings. The particular interest of this contribution arises from the difficulty of incorporating the mathematical properties of a functional mixing process. In contrast, ergodicity is characterized by the Kolmogorov–Sinai entropy, which measures the dynamics, the sparsity, and the microscopic fluctuations of the functional process. Using observations sampled from an ergodic functional time series (fts), we establish the asymptotic properties of this estimator. In particular, we derive its convergence rate and show Borel–Cantelli (BC) consistency. The general expression for the convergence rate is then specialized to several notable scenarios, including the independence case, the classical kernel method, and the vector-valued case. Finally, numerical experiments on both simulated and real-world datasets demonstrate the superiority of the $L_1$-recursive estimator compared to existing competitors.

1. Introduction

Investigating the joint behavior of two random variables in a functional setting is an active area of applied statistics, as it facilitates quantifying the influence of a functional covariate on a scalar response. Numerous functional approaches have been proposed to capture this relationship, including conditional expectation, relative regression, and median regression. However, modeling the relationship via the conditional distribution function is often regarded as more informative because it sheds light on both central and extreme parts of the response. Consequently, the principal goal of this work is to introduce a new estimator for modal regression using the cumulative distribution function.
Despite substantial literature on conditional mode prediction, the predominant estimator remains the Nadaraya–Watson (NW) method. The earliest investigation of conditional mode estimation can be traced back to [1], which demonstrated that the mode can yield superior predictive performance compared with the conditional mean. In [2], the authors proposed a mode-based predictor using the derivative of the conditional density, applicable to vector-valued input variables. Subsequently, ref. [3] derived the asymptotic distribution of the modal regression estimator under independence, and this result was generalized to dependent data by [4]. For more recent works, we refer the readers to [5].
A functional version of the NW-estimator for the conditional mode (CM) was first introduced in [6], where the authors established almost complete consistency of the estimator by identifying it as the maximizer of the conditional density. This result was extended to dependent processes in [7]. The asymptotic distribution of the NW-based functional CM estimator was studied under the i.i.d. assumption in [8], whereas [9] considered strong mixing functional time series under fractal conditions. The monograph of [10] represents a key contribution to nonparametric functional prediction, and further theoretical results on functional mode estimation can be found in [11], which addressed the $L_p$-convergence of NW-based functional mode estimators. In the context of ergodic functional time series, ref. [12] focused on conditional mode estimation and derived BC consistency under a missing-at-random framework for the functional covariate. Several alternative estimators to the NW approach have also emerged in functional data analysis (FDA). For example, ref. [13] established the asymptotic normality of a local linear CM estimator in the functional setting, ref. [14] investigated a kNN-based functional CM approach, and [15] developed a local linear functional-kNN version of the estimator. More extensive studies on functional CM estimation can be found in [16,17,18] and related references. Additionally, in the ergodic functional time series case, ref. [19] obtained BC consistency for the conditional mode estimator.
A distinctive contribution of the present paper is its focus on a recursive estimation algorithm, a direction that remains underexplored in FDA. One of the earliest examinations of recursive methods in this domain is [20], which addressed the recursive estimation of conditional mean functions. Later, ref. [21] investigated recursive procedures for functional time series under mixing conditions. More recent developments in functional nonparametric smoothing by means of recursive algorithms, along with relevant references, are presented in [22]. For additional perspectives on FDA and its applications, including dedicated survey articles and specialized journal issues, see [23,24,25,26], among others, as well as the recent papers [27,28,29].

1.1. Contributions of This Paper

The primary objective of this work is to propose a novel modal regression estimator and establish its asymptotic properties under a general framework of ergodic functional time series. Specifically, our estimator combines an $L_1$-based approach with a recursive procedure. In contrast to estimators built upon the NW or local linear methods, the newly constructed estimator offers multiple advantages. First, incorporating an $L_1$-technique promotes robustness, which mitigates the impact of outliers through a percentile-based approach. Moreover, harnessing the conditional distribution function to identify the conditional mode leverages comprehensive information about the functional covariate–response relationship, potentially enhancing the estimator's precision. A further strength lies in the recursive structure, which seamlessly updates the estimator upon the arrival of each new data point, a feature especially valuable for real-time forecasting in ergodic functional time series. This adaptability is highly relevant in fields such as artificial intelligence, where continuous data processing is critical. From a theoretical standpoint, we derive the Borel–Cantelli convergence rate of the proposed estimator under mild ergodic conditions frequently satisfied by common processes (e.g., moving average (MA), generalized autoregressive conditional heteroskedasticity (GARCH), Volterra). Finally, we illustrate the practical value of our algorithm through empirical investigations on both synthetic and real-world datasets.

1.2. Paper Organization

We introduce the $L_1$-based conditional mode and its recursive estimator in Section 2. The main theoretical results, including consistency and convergence rates, are presented in Section 3. Section 4 is devoted to a discussion of the practical implications of the main aspects of the studied topic. In Section 5, we investigate the finite-sample performance of the proposed estimator through simulation studies and applications to real data. The proofs of the auxiliary results are provided in Section 6.

2. The $L_1$-Recursive Estimation of the Mode

Consider a strictly stationary sequence of dependent input–output random variables, denoted by $(I_i, O_i)_{i=1,\dots,n}$, which takes values in $\mathcal{F} \times \mathbb{R}$. Here, $\mathcal{F}$ is a semi-metric space endowed with a semi-metric $d(\cdot,\cdot)$. Let $\mathcal{N}_\theta$ be a neighborhood of a fixed curve $\theta \in \mathcal{F}$. We assume that the conditional distribution function $G(\cdot \mid \theta)$ is strictly increasing and admits a continuous density $g(y \mid \theta)$ with respect to the Lebesgue measure on $\mathbb{R}$. Recall that for a given $p \in (0,1)$, the conditional quantile of $O$ given $I = \theta$, denoted by $Qu_p(\theta)$, is obtained by inverting the conditional distribution function, namely
$$Qu_p(\theta) = G^{-1}(p \mid \theta).$$
Meanwhile, the conditional mode of $O$ given $I = \theta$, denoted by $CM(\theta)$, is defined as the maximizer of the conditional density on a given compact set $K \subset \mathbb{R}$:
$$CM(\theta) = \arg\max_{y \in K} g(y \mid \theta).$$
By combining these two notions, one can re-express modal regression as
$$CM(\theta) = Qu_{p_\theta}(\theta) \quad \text{with} \quad p_\theta = \arg\min_{p \in G^{-1}(K \mid \theta)} \frac{\partial}{\partial p} Qu_p(\theta). \qquad (1)$$
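To make relation (1) concrete, here is a minimal numerical sketch (not from the paper; the normal conditional density, grid sizes, and tolerances are illustrative assumptions): since $\partial_p Qu_p(\theta) = 1/g(Qu_p(\theta) \mid \theta)$, minimizing the quantile derivative over $p$ recovers the maximizer of the conditional density.

```python
import numpy as np

# Illustrative conditional density: N(2, 1), whose mode is 2 (an assumption
# made for this sketch only; the paper works with a generic density g).
y = np.linspace(-4.0, 8.0, 6001)
pdf = np.exp(-0.5 * (y - 2.0) ** 2) / np.sqrt(2 * np.pi)

# Conditional CDF by numerical integration (cumulative Riemann sum).
dy = y[1] - y[0]
cdf = np.cumsum(pdf) * dy
cdf /= cdf[-1]                      # normalize away the truncated tail mass

# Quantile function Qu_p on a grid of p, via inversion of the CDF.
p = np.linspace(0.05, 0.95, 181)
Qu = np.interp(p, cdf, y)

# d/dp Qu_p = 1 / g(Qu_p); its minimizer over p locates the mode.
dQu = np.gradient(Qu, p)
p_theta = p[np.argmin(dQu)]
mode_est = np.interp(p_theta, cdf, y)

print(p_theta, mode_est)            # p_theta near 0.5, mode_est near 2
```

The derivative is smallest exactly where the density is largest, which is the geometric content of (1).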
In the rest of this paper, we assume that the compact subset $K$ contains a single conditional mode $CM(\theta)$ and that (1) holds.
The $L_1$-estimator of the conditional mode naturally connects to the $L_1$-quantile regression, which is determined by
$$Qu_p(\theta) = \arg\min_{t \in \mathbb{R}} \Psi_p(\theta, t),$$
where
$$\Psi_p(\theta, t) = \mathbb{E}\big[ L_p(O - t) \mid I = \theta \big] \quad \text{and} \quad L_p(s) = (2p - 1)s + |s|.$$
An $L_1$-recursive estimator of the function $Qu_p(\cdot)$ can thus be defined by
$$\widehat{Qu}_p(\theta) = \arg\min_{t \in \mathbb{R}} \widehat{\Psi}_p(\theta, t),$$
where
$$\widehat{\Psi}_p(\theta, t) = \frac{\sum_{i=1}^n \Gamma\big(a_i^{-1} d(\theta, I_i)\big) \big[(2p - 1)(O_i - t) + |O_i - t|\big]}{\sum_{i=1}^n \Gamma\big(a_i^{-1} d(\theta, I_i)\big)}, \quad t \in \mathbb{R},$$
where $\Gamma$ is a kernel function and $\{a_i\}$ is a sequence of positive real numbers satisfying $\lim_{n\to\infty} a_n = 0$. Note that, in contrast to the Nadaraya–Watson-type (NW) approach, the recursive setting assigns each input observation $I_i$ its own bandwidth $a_i$, thereby allowing the estimator to be updated whenever a new observation is obtained.
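The loss $L_p(s) = (2p-1)s + |s|$ is twice the usual quantile check loss, so the minimizer of $t \mapsto \mathbb{E}[L_p(O - t)]$ is the $p$-quantile of $O$. A small sanity check (sample size, distribution, and grid are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def L(s, p):
    # The paper's L1 loss: L_p(s) = (2p - 1) s + |s| (twice the check loss).
    return (2 * p - 1) * s + np.abs(s)

p = 0.7
O = rng.exponential(scale=1.0, size=20000)

# Minimize the empirical risk t -> mean(L_p(O - t)) over a grid of t.
t_grid = np.linspace(0.0, 6.0, 1201)
risk = np.array([L(O - t, p).mean() for t in t_grid])
t_star = t_grid[np.argmin(risk)]

print(t_star, np.quantile(O, p))   # both near -log(0.3), about 1.204
```

For $s > 0$, $L_p(s) = 2ps$; for $s < 0$, $L_p(s) = 2(p-1)s$; this is the standard asymmetric quantile loss up to the factor 2, which does not change the minimizer.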
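A minimal sketch of $\widehat{\Psi}_p$ and the resulting quantile estimate $\widehat{Qu}_p(\theta)$ on synthetic curves (the data-generating process, the discretized $L_2$ semi-metric, the quadratic kernel, and the bandwidth constants are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def gamma(u):
    # Quadratic kernel supported on (0, 1), consistent with (Co3).
    return np.where((u > 0) & (u < 1), 1.0 - u ** 2, 0.0)

# Synthetic functional sample: I_i(t) = U_i sin(pi t), O_i = 3 U_i + noise,
# so the conditional median of O given the curve with U = 1 is close to 3.
n = 400
grid = np.linspace(0.0, 1.0, 50)
U = rng.uniform(0.0, 2.0, size=n)
curves = U[:, None] * np.sin(np.pi * grid)[None, :]
O = 3.0 * U + 0.1 * rng.normal(size=n)

def d(f, g_):
    # Discretized L2 semi-metric between two curves.
    return np.sqrt(np.mean((f - g_) ** 2))

# Observation-specific bandwidths a_i = C i^(-upsilon): the recursive feature.
a_i = 0.5 * np.arange(1, n + 1) ** (-0.2)

def qu_hat(theta, p):
    # argmin_t of Psi_hat_p(theta, t), over a grid of candidate t values.
    w = gamma(np.array([d(theta, c) for c in curves]) / a_i)
    t_grid = np.linspace(O.min(), O.max(), 801)
    s = O[None, :] - t_grid[:, None]
    # Psi_hat_p up to the positive normalizing constant sum_i w_i.
    psi = ((2 * p - 1) * s + np.abs(s)) @ w
    return t_grid[np.argmin(psi)]

theta = np.sin(np.pi * grid)       # the curve with U = 1
q_med = qu_hat(theta, 0.5)
print(q_med)                       # close to 3
```

Because the normalizing denominator of $\widehat{\Psi}_p$ is positive and does not depend on $t$, it can be dropped when taking the argmin.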
Before constructing an $L_1$-recursive estimator of the modal regression, it is necessary to define estimators for both $p_\theta$ and $\partial_p Qu_p(\theta)$. Recall that
$$\partial_p Qu_p(\theta) = \frac{\partial}{\partial p} Qu_p(\theta) = \lim_{b \to 0} \frac{Qu_{p+b}(\theta) - Qu_p(\theta)}{b}.$$
A natural estimator for $\partial_p Qu_p(\theta)$ is then given by
$$\widehat{\partial_p Qu_p}(\theta) = \frac{\widehat{Qu}_{p+h_n}(\theta) - \widehat{Qu}_{p-h_n}(\theta)}{2 h_n},$$
where $\{h_n\}$ is a sequence of positive real numbers converging to 0. The conditional mode $CM(\theta)$ is accordingly estimated by
$$\widehat{CM}(\theta) = \widehat{Qu}_{\widehat{p}_\theta}(\theta),$$
where
$$\widehat{p}_\theta = \arg\min_{p \in G^{-1}(K \mid \theta)} \widehat{\partial_p Qu_p}(\theta). \qquad (3)$$
Of course, $\widehat{CM}(\theta)$ is not necessarily unique; when it is not, $\widehat{CM}(\theta)$ may be taken as any value satisfying (3).
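Putting the pieces together, a sketch of the full $L_1$-recursive mode estimate on a right-skewed synthetic sample (all constants and the data-generating process are illustrative assumptions): the derivative of the estimated quantile function is approximated by the symmetric difference with step $h_n$, and $\widehat{p}_\theta$ is its minimizer.

```python
import numpy as np

rng = np.random.default_rng(1)

def gamma(u):
    # Quadratic kernel on (0, 1), as allowed by (Co3).
    return np.where((u > 0) & (u < 1), 1.0 - u ** 2, 0.0)

# Right-skewed conditional law: O_i = 3 U_i + Exp(0.3), so the conditional
# mode given the curve with U = 1 is close to 3, while the mean is 3.3.
n = 600
grid = np.linspace(0.0, 1.0, 50)
U = rng.uniform(0.0, 2.0, size=n)
curves = U[:, None] * np.sin(np.pi * grid)[None, :]
O = 3.0 * U + rng.exponential(0.3, size=n)

a_i = 0.5 * np.arange(1, n + 1) ** (-0.2)      # recursive bandwidths
theta = np.sin(np.pi * grid)                   # evaluation curve (U = 1)
w = gamma(np.sqrt(np.mean((curves - theta) ** 2, axis=1)) / a_i)

def qu_hat(p):
    # Weighted L1 quantile at level p: argmin_t Psi_hat_p(theta, t).
    t_grid = np.linspace(O.min(), O.max(), 1001)
    s = O[None, :] - t_grid[:, None]
    return t_grid[np.argmin(((2 * p - 1) * s + np.abs(s)) @ w)]

# Symmetric difference (step h_n) estimates d/dp Qu_p; its argmin gives
# p_hat, and the mode estimate is Qu_hat at p_hat, as in (3).
h_n = 0.1
p_grid = np.linspace(0.15, 0.85, 29)
qu_mid = np.array([qu_hat(p) for p in p_grid])
dQu = np.array([(qu_hat(p + h_n) - qu_hat(p - h_n)) / (2 * h_n)
                for p in p_grid])
p_hat = p_grid[np.argmin(dQu)]
cm_hat = qu_mid[np.argmin(dQu)]
print(p_hat, cm_hat)               # cm_hat should land near 3
```

Kernel smoothing over neighboring curves induces some upward bias here, so the estimate lands near, not exactly at, the true mode 3.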
In the theoretical development, our principal goal is to establish a Borel–Cantelli convergence result for C M ^ ( θ ) . To achieve this, we adopt the classical ergodicity assumption, which is more general than ordinary mixing conditions. A process is typically considered ergodic if, over sufficient time, the entropy measured along a single evolving trajectory converges to the entropy of the system’s full ensemble of possible states. In the functional case, we employ the definition of ergodicity for functional statistics proposed by [7], as it provides a suitable framework for analyzing dependent functional data.

3. Main Results

We begin by letting $C$ or $C'$ denote generic strictly positive constants. We also assume that
$$G^{-1}(K \mid \theta) = [a_\theta, b_\theta].$$
Moreover, for each $k = 1, \dots, n$, define $\mathcal{G}_k$ as the $\sigma$-field generated by $((I_1, O_1), \dots, (I_k, O_k), I_{k+1})$, and let $\mathcal{F}_k$ be the $\sigma$-field generated by $((I_1, O_1), \dots, (I_k, O_k))$.
Our principal assumptions are as follows:
(Co1)
(i) The function $\xi(\theta, a) := \mathbb{P}(I \in B(\theta, a))$ satisfies $\xi(\theta, a) > 0$ for every $a > 0$, where $B(\theta, r) = \{\theta' \in \mathcal{F} : d(\theta, \theta') < r\}$. (ii) For each $i = 1, \dots, n$, there exists a deterministic function $\xi_i(\theta, \cdot)$ such that, almost surely, $0 < \mathbb{P}(I_i \in B(\theta, a) \mid \mathcal{F}_{i-1}) \le \xi_i(\theta, a)$ for all $a > 0$, and $\xi_i(\theta, a) \to 0$ as $a \to 0$. (iii) For any positive sequence $(a_i)_{i=1,\dots,n}$, we have
$$\frac{\sum_{i=1}^n \mathbb{P}(I_i \in B(\theta, a_i) \mid \mathcal{F}_{i-1})}{\sum_{i=1}^n \xi(\theta, a_i)} \longrightarrow 1, \quad \text{a.co.}$$
(Co2)
The function $Qu_\cdot(\theta)$ is three times continuously differentiable on $[a_\theta, b_\theta]$. In addition, suppose that $G(\cdot \mid \cdot)$ satisfies the Lipschitz condition
$$\forall \theta_1, \theta_2 \in \mathcal{N}_\theta,\ \forall t_1, t_2 \in [a_\theta, b_\theta], \quad \big| G(t_1 \mid \theta_1) - G(t_2 \mid \theta_2) \big| \le C \big( d^b(\theta_1, \theta_2) + |t_1 - t_2| \big),$$
for some $b > 0$, where $\mathcal{N}_\theta$ is a neighborhood of $\theta$.
(Co3)
The function $\Gamma$ is supported on $(0,1)$ and fulfills
$$0 < C\, \mathbb{1}_{(0,1)}(t) < \Gamma(t) < C'\, \mathbb{1}_{(0,1)}(t) < \infty.$$
(Co4)
$\lim_{n\to\infty} \dfrac{\zeta_n \ln n}{n^2 \bar{\xi}_n^2} = 0$, where $\zeta_n = \sum_{i=1}^n \xi_i(\theta, a_i)$ and $\bar{\xi}_n = \dfrac{1}{n} \sum_{i=1}^n \xi(\theta, a_i)$.
Clearly, conditions (Co1)–(Co4) are often encountered in nonparametric functional time series analysis. In particular, (Co1) describes the probabilistic concentration behavior of the input variable, including its conditional concentration with respect to the filtration, which underscores the impact of ergodicity on the asymptotic properties of the estimator. Assumption (Co2) is pivotal for the nonparametric structure of the model. Conditions (Co3) and (Co4) govern the behavior of the kernel $\Gamma$ and the smoothing parameters $a_n$ and $h_n$, ensuring the proper handling of the technical aspects of the estimator $\widehat{CM}(\theta)$. These requirements also allow us to express the convergence rate in a form analogous to the Nadaraya–Watson estimator, which can be viewed as arising from maximizing a double-kernel estimator of a conditional density. Thus, the assumptions under consideration effectively encompass the main components of the subject, namely, the model, the data structure, the correlation framework, and the convergence rate. These assumptions are not overly restrictive, especially given the complexity of the proposed functional time series model and the strength of the Borel–Cantelli (BC) type consistency. In fact, it is possible to establish a weaker form of consistency for the estimator under less stringent conditions. By employing techniques similar to those presented in [30], one can demonstrate weak consistency. Ultimately, the relationship between the assumptions and the resulting theoretical guarantees reflects a trade-off: stronger and more general results necessitate stronger assumptions. The next theorem establishes the almost-complete (a.co.) convergence of $\widehat{CM}(\theta)$.
Theorem 1. 
Suppose (Co1)–(Co4) hold. If
$$\inf_{p \in (0,1)} \frac{\partial^3}{\partial p^3} Qu_p(\theta) > 0,$$
then
$$\big| \widehat{CM}(\theta) - CM(\theta) \big| = O\!\left( \frac{1}{\zeta_n} \sum_{i=1}^n a_i^b\, \xi_i(\theta, a_i) \right) + O(h_n) + O\!\left( \sqrt{\frac{\zeta_n \ln n}{n^2 \bar{\xi}_n^2}} \right), \quad \text{a.co.}$$
Proof of the Main Result. 
From the definitions of $\widehat{CM}(\theta)$ and $CM(\theta)$, it follows that
$$\big| \widehat{CM}(\theta) - CM(\theta) \big| = \big| \widehat{Qu}_{\widehat{p}_\theta}(\theta) - Qu_{p_\theta}(\theta) \big| \le \big| \widehat{Qu}_{\widehat{p}_\theta}(\theta) - Qu_{\widehat{p}_\theta}(\theta) \big| + \big| Qu_{\widehat{p}_\theta}(\theta) - Qu_{p_\theta}(\theta) \big|$$
$$\le \sup_{p \in [a_\theta, b_\theta]} \big| \widehat{Qu}_p(\theta) - Qu_p(\theta) \big| + \big| Qu_{\widehat{p}_\theta}(\theta) - Qu_{p_\theta}(\theta) \big|. \qquad (4)$$
Using a Taylor expansion, we have
$$Qu_{\widehat{p}_\theta}(\theta) - Qu_{p_\theta}(\theta) = (\widehat{p}_\theta - p_\theta)\, \partial_p Qu_{p^*_\theta}(\theta), \quad \text{for some } p^*_\theta \in (\widehat{p}_\theta, p_\theta). \qquad (5)$$
Since $p_\theta$ is the minimizer of $\partial_p Qu_\cdot(\theta)$, we also obtain
$$\partial_p Qu_{\widehat{p}_\theta}(\theta) - \partial_p Qu_{p_\theta}(\theta) = (\widehat{p}_\theta - p_\theta)\, \partial_p^2 Qu_{p^{**}_\theta}(\theta), \quad \text{for some } p^{**}_\theta \in (\widehat{p}_\theta, p_\theta).$$
Analogously to (4), it follows that
$$\big| \partial_p Qu_{\widehat{p}_\theta}(\theta) - \partial_p Qu_{p_\theta}(\theta) \big| \le 2 \sup_{p \in [a_\theta, b_\theta]} \big| \widehat{\partial_p Qu_p}(\theta) - \partial_p Qu_p(\theta) \big|.$$
Because $\inf_{p \in (0,1)} \partial_p^3 Qu_p(\theta) > 0$, we obtain
$$\big| \widehat{p}_\theta - p_\theta \big| \le C \sup_{p \in [a_\theta, b_\theta]} \big| \widehat{\partial_p Qu_p}(\theta) - \partial_p Qu_p(\theta) \big|. \qquad (6)$$
Combine (4) and (6) to obtain
$$\big| \widehat{CM}(\theta) - CM(\theta) \big| \le C \left( \sup_{p \in [a_\theta, b_\theta]} \big| \widehat{Qu}_p(\theta) - Qu_p(\theta) \big| + \sup_{p \in [a_\theta, b_\theta]} \big| \widehat{\partial_p Qu_p}(\theta) - \partial_p Qu_p(\theta) \big| \right).$$
Hence, determining the convergence rate reduces to studying
$$\sup_{p \in [a_\theta, b_\theta]} \big| \widehat{Qu}_p(\theta) - Qu_p(\theta) \big| \quad \text{and} \quad \sup_{p \in [a_\theta, b_\theta]} \big| \widehat{\partial_p Qu_p}(\theta) - \partial_p Qu_p(\theta) \big|.$$
Furthermore, asymptotically,
$$\big| \widehat{\partial_p Qu_p}(\theta) - \partial_p Qu_p(\theta) \big| \le \frac{\big| \widehat{Qu}_{p+h_n}(\theta) - Qu_{p+h_n}(\theta) \big| + \big| \widehat{Qu}_{p-h_n}(\theta) - Qu_{p-h_n}(\theta) \big|}{2 h_n} + \left| \frac{Qu_{p+h_n}(\theta) - Qu_{p-h_n}(\theta)}{2 h_n} - \partial_p Qu_p(\theta) \right|$$
$$\le C h_n^{-1} \sup_{q \in (a_\theta - h_n,\, b_\theta + h_n]} \big| \widehat{Qu}_q(\theta) - Qu_q(\theta) \big| + O(h_n).$$
Finally, Theorem 1 follows from the following lemmas. □
Lemma 1
([16]). Consider a family of real-valued random functions $\{B_n\}$, each of which is decreasing in γ. Let $\{A_n\}$ be a real-valued random sequence. Suppose there exist positive constants $\lambda, M > 0$ such that
$$A_n = o_{a.co.}(1) \quad \text{and} \quad \sup_{|\gamma| \le M} \big| B_n(\gamma) + \lambda \gamma - A_n \big| = o_{a.co.}(1).$$
Then, for any real random sequence $\{\gamma_n\}$ satisfying $B_n(\gamma_n) = o_{a.co.}(1)$, it follows that
$$\sum_{n=1}^{\infty} \mathbb{P}\big( |\gamma_n| > M \big) < \infty.$$
Lemma 2. 
Suppose (Co1) and (Co3)–(Co4) hold. Then, we have
$$\widehat{Q}_D(\theta) - \bar{Q}_D(\theta) = O\!\left( \sqrt{\frac{\zeta_n \ln n}{n^2 \bar{\xi}_n^2}} \right), \quad \text{a.co.}$$
Moreover, there exists a constant $C > 0$ such that
$$\sum_{n} \mathbb{P}\big( \bar{Q}_D(\theta) < C \big) < \infty.$$
Here,
$$\widehat{Q}_D(\theta) := \frac{1}{n \bar{\xi}_n} \sum_{i=1}^n \Gamma\big( a_i^{-1} d(\theta, I_i) \big)$$
and
$$\bar{Q}_D(\theta) := \frac{1}{n \bar{\xi}_n} \sum_{i=1}^n \mathbb{E}\big[ \Gamma\big( a_i^{-1} d(\theta, I_i) \big) \mid \mathcal{F}_{i-1} \big].$$
Proposition 1. 
Assume (Co1)–(Co4) hold. Then, there is a positive constant λ such that
$$\widehat{Qu}_p(\theta) - Qu_p(\theta) = \frac{1}{g\big( Qu_p(\theta) \mid \theta \big)} A_n + O\!\left( \sup_{|\gamma| \le M} \big| B_n(\gamma) + \lambda \gamma - A_n \big| \right),$$
where
$$B_n(\gamma) = \frac{1}{n \bar{\xi}_n} \sum_{i=1}^n \big( p - \mathbb{1}_{\{O_i \le \gamma + Qu_p(\theta)\}} \big)\, \Gamma_i, \quad \text{and} \quad A_n = B_n(0),$$
with $\Gamma_i = \Gamma\big( a_i^{-1} d(\theta, I_i) \big)$.
Proposition 2. 
Under the same assumptions (Co1)–(Co4), we also have
$$\sup_{p \in (0,1)} \big| \widehat{Qu}_p(\theta) - Qu_p(\theta) \big| = O\!\left( \frac{1}{\zeta_n} \sum_{i=1}^n a_i^b\, \xi_i(\theta, a_i) \right) + O\!\left( \sqrt{\frac{\zeta_n \ln n}{n^2 \bar{\xi}_n^2}} \right), \quad \text{a.co.}$$

4. Discussion and Comments

4.1. On the Ergodic Functional Time Series

Similarly to multivariate statistics, ergodicity plays a crucial role in functional statistics. In particular, ergodicity ensures that temporal averages converge to their corresponding stochastic means. This property is especially important, as it justifies the use of sample mean and covariance functions as consistent estimators of the true mean function and the true covariance operator. These estimators, in turn, allow for efficient estimation of eigenfunctions in functional principal component analysis (FPCA) and for accurate curve smoothing using a chosen basis, such as splines or Fourier functions. All these methodologies fundamentally rely on the sample mean and the empirical covariance operator. The ergodic behavior and functional characteristics of the time series under consideration are governed by assumption (Co1), which quantifies the concentration properties of the functional variables. This assumption is thoroughly discussed in [6], where it is shown that (Co1)(i) holds for a wide class of continuous processes whose probability measures are absolutely continuous with respect to the Wiener measure. Examples include the Poisson process, the Ornstein–Uhlenbeck process, fractional Brownian motion, and general diffusion processes. In this work, we also focus on the conditional version of this assumption, namely (Co1)(ii–iii). This extension enables us to account for the dependence structure of the process by analyzing its long-memory behavior, a standard approach in dynamic systems modeling and time series analysis. In such contexts, conditional distributions with respect to the past filtration $\mathcal{F}_{i-1}$ are frequently employed to control process evolution, verify the martingale property, and assess predictability.
Using arguments similar to those used in the unconditional case (Co1)(i), one can show that a trivial example of a functional ergodic process satisfying (Co1)(ii–iii) is when its conditional distribution, given the past, is absolutely continuous with respect to the Wiener measure. Additionally, the Karhunen–Loève decomposition can be used to represent such processes explicitly (see [6] for examples of functional processes admitting such a decomposition). It is worth emphasizing that while both mixing and ergodicity describe forms of dependence among observations, they are fundamentally different. Specifically, the mixing property implies that any two subsets of the state space become asymptotically independent over time, whereas ergodicity implies that the system’s trajectory visits all regions of the space in proportion to their probability measure. Importantly, ergodicity is generally easier to verify than mixing. It is well known that ergodicity does not imply mixing, and there exist numerous ergodic time series that fail to satisfy any form of mixing assumption. Prominent examples include the following:
First-order autoregressive processes with Bernoulli innovations (see [31]);
Gaussian processes with Hurst exponent H > 0.5 (see [32]);
Gaussian processes with non-decaying covariance structures (see [33]).
Additional examples are discussed in [34], and these models can be naturally extended to the functional setting.

4.2. The Conditional Mode Versus the Conditional Mean

In predictive modeling, particularly when the conditional distribution is asymmetric or multimodal, the conditional mode often yields more accurate and meaningful predictions than the conditional mean. While the conditional mean represents the average outcome given certain inputs, it can be significantly affected by outliers or skewness in the distribution. In contrast, the conditional mode reflects the most probable outcome, making it more robust, reliable, and informative. A similar conclusion applies to the conditional median, which is also less sensitive to extreme values than the mean. As a result, combining the conditional mode and median can significantly outperform the conditional mean in predictive tasks. This advantage becomes even more crucial in the context of ergodic functional time series, where providing robust predictors is essential. Ergodicity ensures that time averages converge to ensemble averages, offering a sound statistical basis for long-term forecasting. Moreover, the environmental data under study often exhibits seasonality, which can distort conditional mean predictions. In particular, repeating seasonal patterns can oversmooth the conditional expectation, reducing forecasting accuracy. The conditional mode, however, better captures the most likely outcomes within each seasonal segment, making it especially suitable for forecasting applications.

4.3. The Recursive Estimation in Action

As with all smoothing approaches, the choice of the bandwidth parameter $a_i$ is critical to the quality of the estimation. Typically, the mean squared error serves as a fundamental criterion for selecting this parameter. In the recursive framework considered here, we adopt the selection algorithm proposed by [20]. The smoothing parameter $a_i$ is set to $a_i = C i^{-\upsilon}$, where
$$C = \max_i d(\theta, I_i),$$
and $\upsilon$ is selected by the following cross-validation rule:
$$\upsilon_{opt} = \arg\min_{\upsilon \in (0,1)} \sum_{j=1}^n \big( O_j - \widehat{CM}_{-j}(I_j) \big)^2, \qquad (8)$$
where $\widehat{CM}_{-j}$ is the leave-one-out version of the estimator $\widehat{CM}$. The rule (8) is similar to the cross-validation criterion considered by [10]. In our empirical analysis, the rule (8) is optimized over $m$ equidistant points in the interval $(0,1)$. Finally, it is worth noting that, although this selection approach has demonstrated good empirical performance, establishing its asymptotic optimality remains an important direction for future research.
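The cross-validation rule above can be sketched as follows, with a leave-one-out kernel weighted median standing in for the mode estimator (the synthetic data, the discretized $L_2$ semi-metric, and the 10-point grid are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

def gamma(u):
    # Quadratic kernel on (0, 1).
    return np.where((u > 0) & (u < 1), 1.0 - u ** 2, 0.0)

# Synthetic curves and responses (placeholder data for the sketch).
n = 150
grid = np.linspace(0.0, 1.0, 30)
U = rng.uniform(0.0, 2.0, size=n)
curves = U[:, None] * np.sin(np.pi * grid)[None, :]
O = 3.0 * U + 0.2 * rng.normal(size=n)

# Pairwise L2 distances between curves (computed once).
D = np.sqrt(np.mean((curves[:, None, :] - curves[None, :, :]) ** 2, axis=2))
C = D.max()                                # C = max_i d(theta, I_i)

def loo_pred(j, ups):
    # Leave-one-out weighted-median prediction of O_j (mode stand-in).
    a = C * np.arange(1, n + 1) ** (-ups)  # a_i = C i^(-upsilon)
    w = gamma(D[j] / a)
    w[j] = 0.0                             # leave observation j out
    if w.sum() == 0.0:
        return O.mean()                    # fallback when no neighbors remain
    order = np.argsort(O)
    cw = np.cumsum(w[order])
    return O[order][np.searchsorted(cw, 0.5 * cw[-1])]

ups_grid = np.linspace(0.05, 0.95, 10)     # m = 10 equidistant points
scores = [sum((O[j] - loo_pred(j, u)) ** 2 for j in range(n))
          for u in ups_grid]
ups_opt = ups_grid[int(np.argmin(scores))]
print(ups_opt)
```

Very large $\upsilon$ shrinks the later bandwidths so much that the kernel window empties, which the cross-validation score penalizes automatically.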

4.4. The Computational Cost

Recall that, unlike traditional kernel estimators, which compute the estimate independently at each point, the recursive version updates the estimate sequentially with each new observation, potentially reducing execution time. Consequently, computational efficiency is a significant advantage of the recursive estimator. Quantifying this efficiency is particularly important in the context of large datasets or real-time applications. Specifically, if each update involves a constant number of operations, i.e., of order O ( 1 ) , then the total computational cost becomes O ( n ) , which is considerably more efficient than the O ( n 2 ) complexity of standard kernel estimators. However, in practical scenarios where the bandwidth is selected via adaptive tuning, the computational cost may increase. Despite its advantages, the recursive approach has a notable drawback: it requires storing past data, which negatively impacts memory usage. This limitation becomes especially critical when dealing with large sample sizes or high-dimensional data.
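For a fixed evaluation curve θ, the recursive structure admits O(1) updates of the numerator and denominator sums, so a stream of n observations costs O(n) at that point. A sketch (the conditional-mean form is used here because it updates in closed form; data and constants are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

def gamma(u):
    return np.where((u > 0) & (u < 1), 1.0 - u ** 2, 0.0)

n = 500
grid = np.linspace(0.0, 1.0, 30)
U = rng.uniform(0.0, 2.0, size=n)
curves = U[:, None] * np.sin(np.pi * grid)[None, :]
O = 3.0 * U + 0.1 * rng.normal(size=n)
theta = np.sin(np.pi * grid)

num, den = 0.0, 0.0
estimates = []
for i in range(n):                          # one O(1) update per arrival
    a_i = 0.5 * (i + 1) ** (-0.2)           # bandwidth of the i-th arrival
    w = float(gamma(np.sqrt(np.mean((curves[i] - theta) ** 2)) / a_i))
    num += w * O[i]                         # running numerator
    den += w                                # running denominator
    estimates.append(num / den if den > 0 else np.nan)

# Batch recomputation from scratch (O(n) per evaluation point).
a_all = 0.5 * np.arange(1, n + 1) ** (-0.2)
w_all = gamma(np.sqrt(np.mean((curves - theta) ** 2, axis=1)) / a_all)
batch = float(w_all @ O / w_all.sum())

print(estimates[-1], batch)                 # identical up to rounding
```

The streaming and batch versions agree because each weight $\Gamma(a_i^{-1} d(\theta, I_i))$ depends only on observation $i$, never on later arrivals.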

5. Simulation Study

In this simulation study, our objective is to investigate the feasibility and effectiveness of the proposed method. In particular, we seek to assess how dependence impacts the convergence rate by comparing the algorithm's performance under different scenarios, such as varying dependence levels, signal-to-noise ratios, and outlier contamination. To achieve this, we first generate an artificial dataset following a nonparametric form:
$$\text{heteroscedastic (Het.) Model:} \quad O_i = 4 \int_0^1 \sin\big( 3 + I_i^3(t) \big)\, dt + \cos\big( 3 + I_i^3(t) \big)\, \epsilon_i,$$
$$\text{homoscedastic (Hom.) Model:} \quad O_i = 5 \int_0^1 \log\!\left( \frac{2 + I_i^2(t)}{3 + I_i^3(t)} \right) dt + \epsilon_i,$$
and the heteroscedastic model with signal-to-noise ratio (SNRHet.) Model:
$$O_i = 4 \int_0^1 \sin\big( 3 + I_i^3(t) \big)\, dt + \sigma_{SNR}\, \epsilon_i,$$
where $\epsilon_i$ and $I_i$ are independent, and $\sigma_{SNR}$ is controlled by considering various values of the signal-to-noise ratio, $SNR_k = 5\%$ and $40\%$, where
$$SNR_k = \frac{\sigma_{SNR}^2}{\frac{1}{n} \sum_{i=1}^n (R_i - \bar{R})^2}, \qquad R_i = 4 \int_0^1 \sin\big( 3 + I_i^3(t) \big)\, dt.$$
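The SNRHet design can be sketched as follows; the curve generation below is a simple smooth-random-curve stand-in rather than the FTSgof FAR(2)/FARCH(1) samplers, while the $\sigma_{SNR}$ calibration follows the displayed formula:

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in smooth random curves I_i(t) on a grid (NOT the FTSgof samplers).
n, m = 150, 100
t = np.linspace(0.0, 1.0, m)
coef = rng.normal(size=(n, 3))
curves = (coef[:, [0]] + coef[:, [1]] * np.sin(2 * np.pi * t)
          + coef[:, [2]] * np.cos(2 * np.pi * t))

# Regression signal R_i = 4 * integral_0^1 sin(3 + I_i(t)^3) dt (Riemann sum).
R = 4.0 * np.mean(np.sin(3.0 + curves ** 3), axis=1)

def make_snrhet(snr):
    # O_i = R_i + sigma_SNR * eps_i with sigma_SNR^2 = SNR * Var_n(R).
    var_R = np.mean((R - R.mean()) ** 2)
    sigma = np.sqrt(snr * var_R)
    return R + sigma * rng.normal(size=n), sigma

O_05, sigma_05 = make_snrhet(0.05)   # SNR_k = 5%
O_40, sigma_40 = make_snrhet(0.40)   # SNR_k = 40%
ratio = sigma_05 ** 2 / np.mean((R - R.mean()) ** 2)
print(ratio)                          # 0.05 by construction
```

Calibrating $\sigma_{SNR}$ from the empirical variance of the signal makes the noise level comparable across replications regardless of the curve sampler used.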
The functional input variable is generated from dependent functional processes using the R-package FTSgof (https://www.r-project.org/). We have generated n = 150 observations. The resulting functional variables are presented in Figure 1, Figure 2 and Figure 3.
Clearly, this sampling process encompasses three types of FTS-dependence, specifically functional autoregressive processes of order 2 (FAR(2)) involving two distinct kernels
$$\text{Gaussian kernel:} \quad \psi(t, s) = \exp\!\left( -\frac{t^2 + s^2}{2} \right), \quad t, s \in [0, 1],$$
$$\text{Wiener kernel:} \quad \psi(t, s) = t(1 - t)\, s(1 - s), \quad t, s \in [0, 1].$$
The third illustration of a functional covariate setting is the functional ARCH(1) model, whose conditional volatility depends on the following kernel:
$$\text{Default kernel:} \quad \alpha(t, s) = 12\, t(1 - t)\, s(1 - s), \quad t, s \in [0, 1].$$
This kernel is the default choice in the FTSgof R-package. In our experimental design, the correlation level is adjusted via the function fACF, which highlights the effect of the dependency level on the accuracy of the estimates. The impact of outliers is managed by multiplying the observed responses $(O_i)_i$ by a factor $MF$, which is either $MF = 1$ or $MF = 10$. Meanwhile, the true values of the conditional mode, denoted by $CM(\theta)$, are obtained from the distribution of the underlying white noise $\epsilon_i$. This step is crucial because it allows us to evaluate how the nonparametric component influences the prediction task.
To investigate robustness, we consider three distinct distributions: Weibull, Laplace, and log-normal. These distributions are selected for their invariance under translation and their varying degrees of heavy-tailed behavior, which in turn help to gauge the estimator's resilience to outliers. We then compare the $L_1$-robustness of our estimator against multiple existing predictors.
To examine how recursion affects estimation, we contrast our proposed recursive estimator with a non-recursive version, in which the bandwidth parameter $a_i$ remains fixed at $a_n$. We further assess robustness by comparing $\widehat{CM}$ against the double-kernel (DK) estimators given by
$$\text{The NW-estimator:} \quad \widetilde{CM}(\theta) = \arg\max_{y} \frac{\sum_{i=1}^n \Gamma\big( a_n^{-1} d(\theta, I_i) \big)\, \Gamma\big( b_n^{-1} (y - O_i) \big)}{\sum_{i=1}^n b_n\, \Gamma\big( a_n^{-1} d(\theta, I_i) \big)},$$
and
$$\text{The DK-recursive estimator:} \quad \overline{CM}(\theta) = \arg\max_{y} \frac{\sum_{i=1}^n \Gamma\big( a_i^{-1} d(\theta, I_i) \big)\, \Gamma\big( b_n^{-1} (y - O_i) \big)}{\sum_{i=1}^n b_n\, \Gamma\big( a_i^{-1} d(\theta, I_i) \big)}.$$
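The two double-kernel competitors can be sketched on the same kind of synthetic sample (the symmetric use of Γ on $|y - O_i|/b_n$ for the density part, plus all constants and data, are illustrative assumptions made for this sketch):

```python
import numpy as np

rng = np.random.default_rng(5)

def gamma(u):
    # Quadratic kernel on (0, 1).
    return np.where((u > 0) & (u < 1), 1.0 - u ** 2, 0.0)

n = 500
grid = np.linspace(0.0, 1.0, 30)
U = rng.uniform(0.0, 2.0, size=n)
curves = U[:, None] * np.sin(np.pi * grid)[None, :]
O = 3.0 * U + 0.2 * rng.normal(size=n)
theta = np.sin(np.pi * grid)          # mode(O | theta) is close to 3

dists = np.sqrt(np.mean((curves - theta) ** 2, axis=1))
y_grid = np.linspace(O.min(), O.max(), 801)
b_n = 0.3

def dk_mode(a):
    # argmax_y of sum_i Gamma(d_i / a_i) * Gamma(|y - O_i| / b_n):
    # a double-kernel conditional-density maximizer (symmetrized kernel
    # in the response direction is an assumption of this sketch).
    w = gamma(dists / a)
    dens = gamma(np.abs(y_grid[:, None] - O[None, :]) / b_n) @ w
    return y_grid[np.argmax(dens)]

cm_nw = dk_mode(0.5 * n ** (-0.2))                      # single bandwidth a_n
cm_rec = dk_mode(0.5 * np.arange(1, n + 1) ** (-0.2))   # recursive a_i
print(cm_nw, cm_rec)                                    # both near 3
```

The only difference between the two estimators is whether one bandwidth $a_n$ or the observation-indexed sequence $a_i$ weights the curves, which is exactly the recursive feature under study.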
The performance of these three estimators, $\widehat{CM}$, $\widetilde{CM}$, and $\overline{CM}$, depends on the choice of the parameters $(a_n, b_n)$. The selection of the semi-metric $d$ and the kernel $\Gamma$ also influences efficiency. In particular, $\Gamma$ is chosen to satisfy (Co3), while $d$ controls the smoothing level of the functional predictors $I_i$. To examine the feasibility of the selection algorithm discussed in Section 4, we simulate with $a_i = C i^{-\upsilon}$, where $\upsilon$ is chosen from a grid of 10 equidistant points in $(0,1)$ and $C = \max_i d(\theta, I_i)$. Regarding the kernel $\Gamma$, we employ a quadratic kernel on $(0,1)$, which is consistent with (Co3) and frequently used in nonparametric functional statistics. Moreover, the PCA metric proves especially suitable for cases in which the explanatory curves $I_i$ exhibit discontinuities. In this empirical analysis, we use the PCA metric associated with the third eigenfunction. Finally, we point out that we took $b_n = a_n = C n^{-\upsilon}$ for the estimators $\widetilde{CM}$ and $\overline{CM}$, with $\upsilon$ selected in the same manner as for the estimator $\widehat{CM}$.
To compare the effectiveness of the estimators, we compute their mean square error (MSE) across all simulated scenarios,
$$\text{MSE}(\ddot{CM}) = \frac{1}{n} \sum_{i=1}^n \big( O_i - \ddot{CM}(I_i) \big)^2,$$
where $\ddot{CM}$ can represent $\widehat{CM}$, $\overline{CM}$, or $\widetilde{CM}$. This metric enables us to contrast their performance under varying distributional assumptions, dependence levels (GFAR(2) (Gaussian-kernel-based FAR(2)), WFAR(2) (Wiener-kernel-based FAR(2)), and FARCH(1)), signal-to-noise ratios, and outlier contamination. The results are reported in Table 1, Table 2 and Table 3.
The effectiveness of these estimators is substantially influenced by both the structure of the functional time series and the level of correlation. In addition, the predictor’s accuracy depends on the choice of nonparametric modeling. Nevertheless, empirical findings indicate that the recursive approach generally surpasses the NW method in terms of precision and that the L 1 method exhibits more stable mean squared error (MSE) variability compared to double-kernel techniques. Consequently, the estimator C M ^ emerges as notably precise and robust since it combines the advantages of recursive algorithms with those of L 1 -based techniques. Finally, we observe that all considered functional estimators are straightforward to implement and maintain acceptable accuracy across a variety of scenarios.

6. A Real Data Analysis

The purpose of this section is to evaluate how the $L_1$-recursive estimator of the conditional mode performs in comparison with other recursive approaches. Specifically, we juxtapose it with recursive estimators of both the conditional mean and the conditional median. To carry out this forecasting exercise, we rely on an environmental functional time series dataset. In particular, we focus on predicting air quality at a predetermined lead time by exploiting historical observations. The recursive predictors considered in this study are
$$\text{(cond. mean)} \quad \widehat{ME}(\theta) = \frac{\sum_{i=1}^n \Gamma\big( a_i^{-1} d(\theta, I_i) \big)\, O_i}{\sum_{i=1}^n \Gamma\big( a_i^{-1} d(\theta, I_i) \big)},$$
and
$$\text{(cond. median)} \quad \widehat{CM}(\theta) = \arg\min_{y} \frac{\sum_{i=1}^n \Gamma\big( a_i^{-1} d(\theta, I_i) \big)\, |O_i - y|}{\sum_{j=1}^n \Gamma\big( a_j^{-1} d(\theta, I_j) \big)}.$$
The dataset utilized for this comparative analysis is available online at https://gaftp.epa.gov/castnet (accessed on 8 January 2025). It contains information pertaining to Bondville, in Champaign County, IL, USA. The monitoring station in question has the following geographical attributes, as shown in the following table.

Country | State | County | Code of Station | Geographical Coordinates
USA | Illinois | Champaign | BVL130 | 40.051981, −88.372495
The dataset was recorded at hourly intervals from January through December 2024. The original set of observations is illustrated in Figure 4.
Recall that predicting ozone levels based on CO2 concentrations is crucial for environmental sustainability. In Champaign, a city with a mix of urban traffic and surrounding agricultural activity, ozone levels can spike during hot, stagnant summer days, worsening air quality. Therefore, air quality in this region is significantly influenced by seasonal variations. Naturally, this effect can be managed by applying suitable seasonal data preprocessing techniques. Initially, we replaced missing values with the average of the nearest four values and employed correlation and causality analysis to pinpoint the right covariate variables. Following this preprocessing, we concluded that predicting CO2 emissions three hours ahead using the past 24 h of historical data is more advantageous. To do that, we segment the 8736-h dataset into N + 1 = 364 intervals, each denoted by I i , and each interval I i spans 24 h (i.e., one full day). Following this procedure, we define the output variable as
O i = I i + 1 ( 3 ) .
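The preprocessing and segmentation just described can be sketched as follows (an illustrative Python reading; in particular, interpreting "the nearest four values" as the four closest non-missing observations is an assumption, and the exact handling in the paper may differ):

```python
import numpy as np

def impute_nearest4(x):
    """Replace each missing value (NaN) by the average of the four nearest
    non-missing observations (assumed reading of 'nearest four values')."""
    x = np.asarray(x, dtype=float).copy()
    good = np.flatnonzero(~np.isnan(x))
    for i in np.flatnonzero(np.isnan(x)):
        nearest = good[np.argsort(np.abs(good - i))[:4]]
        x[i] = x[nearest].mean()
    return x

def build_pairs(hourly, hours_per_day=24, lead_hour=3):
    """Cut the hourly record into daily curves I_i and form the 3-h-ahead
    responses O_i = I_{i+1}(lead_hour)."""
    hourly = np.asarray(hourly, dtype=float)
    n_days = len(hourly) // hours_per_day          # 8736 h -> 364 daily curves
    days = hourly[: n_days * hours_per_day].reshape(n_days, hours_per_day)
    curves = days[:-1]                             # predictors I_i
    responses = days[1:, lead_hour]                # O_i = I_{i+1}(3)
    return curves, responses
```

Note that this construction yields one fewer predictor–response pair than the number of daily intervals, since the last day has no following response.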
The curve data are given in the following graph (Figure 5).
To construct the estimators M E ^ , C M ^ , and M O ^ , we maintain the same smoothing approach, the same kernel function, and the same distance metric (the PCA metric associated with the third eigenfunction). We subsequently evaluate and compare these estimators using the following procedure:
  • Step 1. Randomly partition the dataset into two parts:
    A training set, { ( I j , O j ) } j J , consisting of 300 observations;
    A test set, { ( I i , O i ) } i I , consisting of 64 observations.
  • Step 2. For each I i in the training set, predict the corresponding response O i by applying:
    Method 1 (Conditional mean):
    O i M E ^ = M E ^ ( I i ) ;
    Method 2 (Conditional mode):
    O i M O ^ = M O ^ ( I i ) ;
    Method 3 (Conditional median):
    O i C M ^ = C M ^ ( I i ) .
  • Step 3. For each I new in the test set, identify
    $i^{*} = \arg\min_{i \in J} d\big(I_{\text{new}}, I_i\big),$
    where d ( · , · ) denotes the chosen distance function.
  • Step 4. Use the identified index i * to predict O new :
    Method 1 (Conditional mean):
    O new M E ^ = M E ^ ( I i * ) ;
    Method 2 (Conditional mode):
    O new M O ^ = M O ^ ( I i * ) ;
    Method 3 (Conditional median):
    O new C M ^ = C M ^ ( I i * ) .
  • Step 5. To assess the prediction accuracy among the methods, compute the square root of the mean squared error (SMSE):
    $SMSE = \sqrt{\dfrac{1}{64} \sum_{i \in \text{Test set}} \big( O_i - \widehat{T}(I_i) \big)^{2}},$
    where T ^ can be M E ^ , C M ^ , or M O ^ .
  • Step 6. Plot the actual response values versus the predicted values for each method.
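Steps 1–5 can be summarized in a short sketch (Python; the distance d and the fitted training predictions are placeholders that work for any of the three methods):

```python
import numpy as np

def train_test_split(n, n_train, seed=0):
    """Step 1: random partition of the n pairs into training and test indices."""
    idx = np.random.default_rng(seed).permutation(n)
    return idx[:n_train], idx[n_train:]

def smse(y_true, y_pred):
    """Step 5: square root of the mean squared error."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def nearest_curve_predict(new_curves, train_curves, train_fitted, d):
    """Steps 3-4: for each new curve, locate the closest training curve i*
    and reuse its fitted value as the prediction."""
    preds = []
    for c in new_curves:
        i_star = int(np.argmin([d(c, t) for t in train_curves]))
        preds.append(train_fitted[i_star])
    return np.array(preds)
```

With 364 pairs, `train_test_split(364, 300)` reproduces the 300/64 partition of Step 1.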
Consistent with our expectations, the L 1 -based recursive mode predictor M O ^ demonstrates superior performance compared with the alternative models M E ^ and C M ^ ; see Figure 6. This improvement in predictive accuracy is substantial. To support this assertion, we computed the square root of the mean squared error (SMSE): the SMSE for M O ^ was 3.26, while M E ^ and C M ^ yielded SMSE values of 5.42 and 4.87, respectively. These predictive error measures are broadly consistent with the findings reported in [35], bearing in mind the considerable differences in the climatic conditions of the regions studied. Moreover, to evaluate the sensitivity of the proposed predictors to the parameter settings, we re-ran the algorithm using the L 2 metric associated with the B-spline basis, as well as the β -kernel with shape parameters ( 2 , 3 ) , and computed the SMSE for each scenario.
Once again, M O ^ outperforms the other models, M E ^ and C M ^ . Table 4 reports the SMSE for the different settings. It is evident that the prediction is significantly influenced by the chosen parameters of the estimators; however, the choice of the metric has a greater impact than the choice of the kernel. The PCA metric is clearly more suitable for these data. This conclusion confirms the connection between the metric and the smoothing level of the curves: using the spline metric for discontinuous curves can over-smooth the functional covariate, leading to less accurate outcomes.
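For completeness, one common construction of a PCA-type (projection) semimetric, of the kind compared in Table 4, can be sketched as follows (an illustrative reading based on the empirical covariance of the discretized curves; the paper's exact eigenfunction choice may differ):

```python
import numpy as np

def pca_semimetric(curves, k=3):
    """Return d(u, v) = Euclidean norm of the projections of u - v onto the
    first k eigenvectors of the empirical covariance of the discretized
    curves (one common construction of a projection semimetric)."""
    X = np.asarray(curves, dtype=float)
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(X)
    _, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    E = eigvecs[:, ::-1][:, :k]            # keep the top-k eigenvectors
    def d(u, v):
        return float(np.linalg.norm((np.asarray(u) - np.asarray(v)) @ E))
    return d
```

Because the projection is onto orthonormal directions, this semimetric is always bounded above by the full L 2 distance between the discretized curves.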

7. Conclusions

This work introduces a new predictor based on the estimation of the L 1 -modal regression, constructed by means of a recursive procedure. The theoretical discussion provides the essential mathematical underpinnings that enable the straightforward, practical application of the proposed estimator. More specifically, we establish its asymptotic behavior under the fts-ergodic assumption, an alternative condition to the conventional correlation-based criteria.
Empirical evidence from artificial and real datasets confirms that the implementation behaves in line with the theoretical assumptions. In particular, the accuracy of the estimator depends on the degree of correlation in the data, the smoothness of the underlying nonparametric model, and the careful selection of tuning parameters such as the kernel and the bandwidth. Notably, combining the L 1 framework with a recursive approach improves both robustness and predictive precision.
In addition to these findings, the present study highlights several potential avenues for future investigation. One promising direction involves identifying the asymptotic distribution of the normalized estimator under various forms of fts, such as association or Markovian sequences. Another important extension concerns spatial modeling, which considers the geographical arrangement of the observations and supports more intricate prediction tasks. Although these extensions primarily focus on dependencies in the data, further generalizations to other smoothing methods, including the kNN approach, local linear estimators, and semi-partial linear techniques, remain equally compelling.

8. Proofs of the Propositions

Proof of Lemma 2. 
First, we define
$\Gamma_i = \Gamma\big(a_i^{-1} d(\theta, I_i)\big) \qquad \text{and} \qquad L_i = \Gamma_i - \mathbb{E}\big[\Gamma_i \mid \mathcal{F}_{i-1}\big].$
Thus,
Q ^ D ( θ ) Q ¯ D ( θ ) = 1 n ξ n i = 1 n L i .
As $(L_i)_i$ is a martingale difference sequence, for $q \ge 2$,
$\mathbb{E}\big[|L_i|^q \mid \mathcal{F}_{i-1}\big] \le C\, \mathbb{E}\big[L_i^2 \mid \mathcal{F}_{i-1}\big] \le C\, \mathbb{E}\big[\Gamma_i^2 \mid \mathcal{F}_{i-1}\big] \le C\, \mathbb{P}\big(I_i \in B(\theta, a_i) \mid \mathcal{F}_{i-1}\big) \le C\, \xi_i(\theta, a_i).$
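The passage to the exponential bound below relies on a Bernstein-type inequality for bounded martingale difference sequences (a standard result, stated here in the form used, with $\zeta_n$ bounding the cumulated conditional variances $\sum_i \mathbb{E}[L_i^2 \mid \mathcal{F}_{i-1}]$):

```latex
% Bernstein-type inequality for a martingale difference sequence (L_i)
% with |L_i| <= C and \sum_{i=1}^n E[L_i^2 | F_{i-1}] <= \zeta_n:
\mathbb{P}\!\left( \left| \sum_{i=1}^{n} L_i \right| > t \right)
  \;\le\; 2 \exp\!\left( - \frac{t^{2}}{2\left( \zeta_n + C\,t \right)} \right),
\qquad \text{applied here with } t = \varepsilon\, n\, \xi_n .
```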
Now, applying the exponential inequality, we obtain
$\mathbb{P}\Big(\big|\widehat{Q}_D(\theta) - \overline{Q}_D(\theta)\big| > \varepsilon\Big) = \mathbb{P}\Big(\tfrac{1}{n\xi_n}\big|\textstyle\sum_{i=1}^{n} L_i\big| > \varepsilon\Big) \le 2\exp\Big(-\tfrac{\varepsilon^2 n^2 \xi_n^2}{2(\zeta_n + C\varepsilon n \xi_n)}\Big) = 2\exp\Big(-\tfrac{\varepsilon^2 n^2 \xi_n^2}{2\zeta_n}\cdot\tfrac{1}{1 + C\varepsilon n \xi_n/\zeta_n}\Big).$
Putting $\varepsilon = \epsilon_0 \sqrt{\dfrac{\zeta_n \ln n}{n^2 \xi_n^2}}$, then,
$\mathbb{P}\Big(\big|\widehat{Q}_D(\theta) - \overline{Q}_D(\theta)\big| > \epsilon_0 \sqrt{\tfrac{\zeta_n \ln n}{n^2\xi_n^2}}\Big) \le 2\exp\Big(-\tfrac{\epsilon_0^2 \ln n}{2}\cdot\tfrac{1}{1 + C\epsilon_0\sqrt{\ln n/\zeta_n}}\Big).$
Since
$\sqrt{\dfrac{\ln n}{\zeta_n}} \le \sqrt{\dfrac{\ln n}{C\, n\, \xi_n}} \le C\,\sqrt{\dfrac{\ln n}{n\,\xi_n}},$
we have
$\lim_{n\to\infty} \sqrt{\dfrac{\ln n}{\zeta_n}} = 0.$
This implies that
$\mathbb{P}\Big(\big|\widehat{Q}_D(\theta) - \overline{Q}_D(\theta)\big| > \epsilon_0 \sqrt{\tfrac{\zeta_n \ln n}{n^2 \xi_n^2}}\Big) \le 2\exp\Big(-\tfrac{\epsilon_0^2 \ln n}{2C}\Big) \le 2\, n^{-C\epsilon_0^2}.$
Hence,
$\widehat{Q}_D(\theta) - \overline{Q}_D(\theta) = O_{a.co.}\Big(\sqrt{\tfrac{\zeta_n \ln n}{n^2 \xi_n^2}}\Big).$
For the second result, we have, for some $C > 0$,
$0 < C\,\dfrac{\sum_{i=1}^{n}\mathbb{P}\big(I_i \in B(\theta,a_i)\mid\mathcal{F}_{i-1}\big)}{\sum_{i=1}^{n}\mathbb{P}\big(I_i \in B(\theta,a_i)\big)} \le \overline{Q}_D(\theta) \le \big|\overline{Q}_D(\theta) - \widehat{Q}_D(\theta)\big| + \widehat{Q}_D(\theta).$
Thus,
$C\,\dfrac{\sum_{i=1}^{n}\mathbb{P}\big(I_i \in B(\theta,a_i)\mid\mathcal{F}_{i-1}\big)}{\sum_{i=1}^{n}\mathbb{P}\big(I_i \in B(\theta,a_i)\big)} - \big|\widehat{Q}_D(\theta) - \overline{Q}_D(\theta)\big| < \widehat{Q}_D(\theta).$
So,
$\mathbb{P}\Big(\widehat{Q}_D(\theta) \le \tfrac{C}{2}\Big) \le \mathbb{P}\Big(C\,\tfrac{\sum_{i}\mathbb{P}(I_i\in B(\theta,a_i)\mid\mathcal{F}_{i-1})}{\sum_{i}\mathbb{P}(I_i\in B(\theta,a_i))} - \big|\widehat{Q}_D(\theta)-\overline{Q}_D(\theta)\big| < \tfrac{C}{2}\Big) \le \mathbb{P}\Big(\Big|C\,\tfrac{\sum_{i}\mathbb{P}(I_i\in B(\theta,a_i)\mid\mathcal{F}_{i-1})}{\sum_{i}\mathbb{P}(I_i\in B(\theta,a_i))} - \big|\widehat{Q}_D(\theta)-\overline{Q}_D(\theta)\big| - C\Big| > \tfrac{C}{2}\Big).$
It follows that
$\sum_{n}\mathbb{P}\Big(\widehat{Q}_D(\theta) \le \tfrac{C}{2}\Big) \le \sum_{n}\mathbb{P}\Big(\Big|C\,\tfrac{\sum_{i}\mathbb{P}(I_i\in B(\theta,a_i)\mid\mathcal{F}_{i-1})}{\sum_{i}\mathbb{P}(I_i\in B(\theta,a_i))} - \big|\widehat{Q}_D(\theta)-\overline{Q}_D(\theta)\big| - C\Big| > \tfrac{C}{2}\Big) < \infty.$
Finally, from the first result, we obtain
$\sum_{n}\mathbb{P}\Big(\overline{Q}_D(\theta) \le \tfrac{C}{2}\Big) < \infty.$
Proof of Proposition 1. 
Let
γ n = Q u ^ p ( θ ) Q u p ( θ ) .
Clearly, B n ( γ n ) = 0 . Now, we check
A n = o a . c o . ( 1 ) ,
and there exist M , λ > 0 such that
sup | γ | M | B n ( γ ) + λ γ A n | = o a . c o . ( 1 ) .
For (14), we evaluate
A n A n ¯ = O a . c o . ζ n ln n n 2 ξ n 2 and A n ¯ = O 1 ζ n i = 1 n a i b ξ i ( θ , a i ) ,
where
A n ¯ = 1 n ξ n i = 1 n I E p 1 I [ O i Q u p ( θ ) ] Γ i | F i 1 .
Firstly, we have
ϵ > 0 I P A n A n ¯ > ε = I P 1 n ξ n i = 1 n Ψ i > ε   I P i = 1 n Ψ i > ε n ξ n ,
with
Ψ i = 1 n ξ n p 1 I [ O i Q u p ( θ ) ] Γ i I E p 1 I [ O i Q u p ( θ ) ] Γ i | F i 1 .
Clearly, $\Psi_i$ is a martingale difference with respect to the $\sigma$-algebras $(\mathcal{F}_{i-1})_i$ and satisfies, for $q \ge 2$,
$\mathbb{E}\big[|\Psi_i|^q \mid \mathcal{F}_{i-1}\big] \le C\,\mathbb{E}\big[\Psi_i^2 \mid \mathcal{F}_{i-1}\big] \le C\,\mathbb{E}\big[\Gamma_i^2 \mid \mathcal{F}_{i-1}\big] \le C\,\mathbb{P}\big(I_i \in B(\theta,a_i)\mid\mathcal{F}_{i-1}\big) \le C\,\xi_i(\theta,a_i).$
Thus,
I P A n A n ¯ > ε = I P 1 n ξ n i = 1 n Ψ i > ε   2 exp ε 2 n 2 ξ n 2 2 ( ζ n + C ε n ξ n )   = 2 exp ε 2 n 2 ξ n 2 2 ζ n 1 1 + C ε n ξ n ζ n .
So, for ε = ϵ 0 ζ n ln n n ξ n , we have,
I P A n A n ¯ > ϵ 0 ζ n ln n n ξ n 2 exp ϵ 0 2 ln n 2 1 1 + C ϵ 0 ln n ζ n .
Since
$\sqrt{\dfrac{\ln n}{\zeta_n}} \le \sqrt{\dfrac{\ln n}{C\, n\, \xi_n}} \le C\,\sqrt{\dfrac{\ln n}{n\,\xi_n}},$
then,
$\lim_{n\to\infty} \sqrt{\dfrac{\ln n}{\zeta_n}} = 0.$
Therefore,
I P A n A n ¯ > ϵ 0 ζ n ln n n ξ n 2 exp ϵ 0 2 ln n 2 C 2 n C ϵ 0 2 .
For the second one, we have
$\mathbb{1}_{B(\theta, a_i)}(I_i)\,\big|G(t \mid I_i) - G(t \mid \theta)\big| \le C\, a_i^{b}.$
Then,
A n ¯ = 1 n ξ n i = 1 n I E p 1 I [ O i Q u p ( θ ) ] Γ i | F i 1   = 1 n ξ n i = 1 n I E Γ i G ( Q u p ( θ ) | θ ) I E 1 I [ O i Q u p ( θ ) ] | I i | F i 1   1 n ξ n i = 1 n I E Γ i 1 I B ( θ , a i ) ( I i ) G ( Q u p ( θ ) | θ ) G ( Q u p ( θ ) | I i ) | F i 1 .   1 n ξ n i = 1 n a i b I E Γ i | F i 1 .
Consequently,
| A n A n ¯ | = O a . c o . ζ n ln n n 2 ξ n 2
and
A n ¯ = O 1 ζ n i = 1 n a i b ξ i ( θ , a i ) .
For (15), similar to (14), we split the required result into parts
sup | γ | M B n ( γ ) A n 1 n ξ n i = 1 n I E ( B n ( γ ) A n ) | F i 1 = O a . c o . ζ n ln n n 2 ξ n 2 ,
and the bias term
sup | γ | M 1 n ξ n i = 1 n I E B n ( γ ) A n | F i 1 + g ( Q u p ( θ ) ) γ = O 1 ζ n i = 1 n a i b ξ i ( θ , a i ) .
Let us start with the dispersion term in (17). We employ the compactness of the interval $[-M, M]$ and write
$[-M, M] \subset \bigcup_{j=1}^{d_n} [\gamma_j - l_n, \gamma_j + l_n], \quad \text{for } \gamma_j \in [-M, M] \text{ and } l_n = d_n^{-1} = 1/n.$
So, for all $\gamma \in [-M, M]$, we put $j(\gamma) = \arg\min_j |\gamma - \gamma_j|$ and use the monotonicity of $B_n(\cdot)$ and $\mathbb{E}\big[B_n(\gamma) \mid \mathcal{F}_{i-1}\big]$, which leads, for all $1 \le j \le d_n$, to
$B_n(\gamma_j + l_n) \ge \sup_{\gamma \in (\gamma_j - l_n,\, \gamma_j + l_n)} B_n(\gamma) \ge B_n(\gamma_j - l_n)$
and
$\mathbb{E}\big[B_n(\gamma_j + l_n) \mid \mathcal{F}_{i-1}\big] \ge \sup_{\gamma \in (\gamma_j - l_n,\, \gamma_j + l_n)} \mathbb{E}\big[B_n(\gamma) \mid \mathcal{F}_{i-1}\big] \ge \mathbb{E}\big[B_n(\gamma_j - l_n) \mid \mathcal{F}_{i-1}\big].$
We deduce, for any γ 1 , γ 2 [ M , M ] ,
$\Big|\tfrac{1}{n\xi_n}\textstyle\sum_{i=1}^{n} \mathbb{E}\big[B_n(\gamma_1)\mid\mathcal{F}_{i-1}\big] - \tfrac{1}{n\xi_n}\textstyle\sum_{i=1}^{n} \mathbb{E}\big[B_n(\gamma_2)\mid\mathcal{F}_{i-1}\big]\Big| \le C\,|\gamma_1 - \gamma_2|^{b}\, \overline{Q}_D(\theta).$
It follows that
$\sup_{|\gamma| \le M}\Big|B_n(\gamma) - A_n - \tfrac{1}{n\xi_n}\textstyle\sum_{i=1}^{n}\mathbb{E}\big[B_n(\gamma) - A_n \mid \mathcal{F}_{i-1}\big]\Big| \le \max_{1\le j\le d_n}\; \max_{z \in \{\gamma_j - l_n,\, \gamma_j + l_n\}} \Big|B_n(z) - A_n - \tfrac{1}{n\xi_n}\textstyle\sum_{i=1}^{n}\mathbb{E}\big[B_n(z) - A_n \mid \mathcal{F}_{i-1}\big]\Big| + 2bC\, l_n^{b}\, \overline{Q}_D(\theta).$
Concerning l n b , we write
l n b ζ n log n n 2 ξ n 2 = l n b n ξ n ζ n log n     = n ξ n n ζ n log n     = i = 1 n ξ ( θ , a i ) n i = 1 n ξ ( θ , a i ) i = 1 n ξ i ( θ , a i ) 1 log n .
Furthermore, as $\xi(\theta, a_i) \le 1$, we have, for all $n$,
$\frac{1}{n}\sum_{i=1}^{n} \xi(\theta, a_i) \le 1,$
and by (C1)(iii),
$\lim_{n\to\infty} \frac{\sum_{i=1}^{n}\xi(\theta,a_i)}{\sum_{i=1}^{n}\xi_i(\theta,a_i)} = \lim_{n\to\infty} \frac{\sum_{i=1}^{n}\xi(\theta,a_i)}{\sum_{i=1}^{n}\mathbb{P}\big(I_i \in B(\theta, a_i) \mid \mathcal{F}_{i-1}\big)} = 1.$
Finally, we can write
l n b = O ζ n log n n 2 ξ n 2 .
Dealing with
sup | γ | M B n ( γ j ) A n 1 n ξ n i = 1 n I E B n ( γ j ) A n | F i 1 ,
we set
$B_n(\gamma_j) - A_n - \frac{1}{n\xi_n}\sum_{i=1}^{n} \mathbb{E}\big[B_n(\gamma_j) - A_n \mid \mathcal{F}_{i-1}\big] = \frac{1}{n\xi_n}\sum_{i=1}^{n} \Upsilon_i,$
with
Υ i = 1 I O i Q u p ( θ ) 1 I O i γ j + Q u p ( θ ) Γ i     I E 1 I O i Q u p ( θ ) 1 I O i γ j + Q u p ( θ ) Γ i | F i 1 .
As in A n ,
$\mathbb{E}\big[|\Upsilon_i|^q \mid \mathcal{F}_{i-1}\big] \le C\,\mathbb{E}\big[\Upsilon_i^2 \mid \mathcal{F}_{i-1}\big] \le C\,\mathbb{E}\big[\Gamma_i^2 \mid \mathcal{F}_{i-1}\big] \le C\,\mathbb{P}\big(I_i \in B(\theta,a_i)\mid\mathcal{F}_{i-1}\big) \le C\,\xi_i(\theta,a_i).$
Therefore,
I P B n ( γ j ) A n 1 n ξ n i = 1 n I E B n ( γ j ) A n | F i 1 > ϵ 0 ζ n ln n n ξ n 2 exp ϵ 0 2 ln n 2 C 2 n C ϵ 0 2 .
Consequently,
n I P sup | γ | M B n ( γ j ( γ ) ) A n 1 n ξ n i = 1 n I E B n ( γ j ( γ ) ) A n ϵ 0 ζ n ln n n ξ n n d n max j I P B n ( γ j ) A n 1 n ξ n i = 1 n I E B n ( γ j ) A n ϵ 0 ζ n ln n n ξ n < ,
implying (17). Concerning (18), we write
1 n ξ n i = 1 n I E B n ( γ ) A n | F i 1 = 1 n ξ n i = 1 n I E 1 I O 1 γ + Q u p ( θ ) 1 I O 1 Q u p ( θ ) Γ i | F i 1 = 1 n ξ n i = 1 n I E G ( γ + Q u p ( θ ) | I 1 ) G ( Q u p ( θ ) | I 1 ) Γ i | F i 1 = 1 n ξ n i = 1 n I E G ( γ + Q u p ( θ ) θ ) g ( Q u p ( θ ) θ ) Γ i | F i 1 + O ( a i b ) = γ g ( Q u p ( θ ) θ ) ξ n 1 n i = 1 n I E Γ i | F i 1 + O 1 ζ n i = 1 n a i b ξ i ( θ , a i ) + o γ .
It follows that
I E B n ( γ ) A n | F i 1 = g ( Q u p θ ) Q ¯ D ( x ) γ + O 1 ζ n i = 1 n a i b ξ i ( θ , a i ) + o γ .
Therefore, the Bahadur representation of Q u ^ p ( θ ) Q u p ( θ ) is
Q u ^ p ( θ ) Q u p ( θ ) = 1 g ( Q u p θ ) A n + O sup | γ | M | B n ( γ ) + λ γ A n | .
Proof of Proposition 2. 
The uniform consistency of Q u ^ p ( θ ) Q u p ( θ ) is based on
sup p [ 0 , 1 ] | A n I E A n | = O a . c o . ζ n ln n n ξ n ,
and
sup p [ 0 , 1 ] sup | γ | M | B n ( γ ) + g ( Q u p θ ) γ A n | = O a . c o . ζ n ln n n ξ n .
Since the inequalities in the bias terms are uniform in $p \in [0,1]$, we focus only on the dispersion terms of
sup p [ 0 , 1 ] | A n ( p ) | and sup p [ 0 , 1 ] sup | γ | M | F n ( γ , p ) | ,
where
A n ( p ) = A n 1 n ξ n i = 1 n I E A n | F i 1 ,
and
F n ( γ , p ) = B n ( γ ) A n 1 n ξ n i = 1 n I E ( B n ( γ ) A n ) | F i 1 .
We focus on the first term; the second one is treated similarly. Indeed,
$[0,1] \subset \bigcup_{k=1}^{d_n} [p_k - l_n, p_k + l_n], \quad \text{for } p_k \in [0,1].$
Next, for all $p \in [0,1]$, we put $\eta_p = \arg\min_k |p - p_k|$. Then,
$\sup_{p \in [0,1]} |A_n(p)| \le \max_{1 \le k \le d_n}\; \max_{z_p \in \{p_k - l_n,\, p_k + l_n\}} |A_n(z_p)| + 2bC\, l_n^{b}\, \overline{Q}_D(\theta).$
It is shown in the proof of Lemma 2 that, for all $p \in (0,1)$,
$\mathbb{P}\Big(|A_n(z_p)| \ge \epsilon_0\,\tfrac{\sqrt{\zeta_n \ln n}}{n\,\xi_n}\Big) \le 2\, n^{-C\epsilon_0^2}.$
Therefore,
$\sum_{n}\mathbb{P}\Big(\sup_{p\in[0,1]}|A_n(p)| \ge \epsilon_0\,\tfrac{\sqrt{\zeta_n \ln n}}{n\,\xi_n}\Big) \le \sum_{n} n\, d_n \max_{k} \mathbb{P}\Big(|A_n(p_k)| \ge \epsilon_0\,\tfrac{\sqrt{\zeta_n \ln n}}{n\,\xi_n}\Big) < \infty.$
The uniform consistency of ( sup p [ 0 , 1 ] | I E A n | ) is obtained by taking the uniform version of (16), which allows us to conclude that
sup p [ 0 , 1 ] | Q u ^ p ( θ ) Q u p ( θ ) | = O 1 ζ n i = 1 n a i b ξ i ( θ , a i ) + O ζ n ln n n 2 ξ n 2 .

Author Contributions

Conceptualization, M.B.A., F.A.A., I.M.A., S.B. and A.L.; Methodology, M.B.A., F.A.A., I.M.A., S.B. and A.L.; Software, M.B.A., F.A.A., I.M.A., S.B. and A.L.; Validation, F.A.A., I.M.A., S.B. and A.L.; Formal analysis, M.B.A., F.A.A., I.M.A., S.B. and A.L.; Investigation, M.B.A., I.M.A., S.B. and A.L. All authors have read and agreed to the final version of the manuscript.

Funding

This research was funded by the Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R515), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia, and by the Deanship of Scientific Research and Graduate Studies at King Khalid University through the Small Research Groups Program under grant number RGP1/41/46.

Data Availability Statement

The data used in this study are available through the link https://kilthub.cmu.edu (accessed on 2 February 2025).

Acknowledgments

The authors thank and extend their appreciation to the funders of this work: This work was supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R515), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia, and the Deanship of Scientific Research and Graduate Studies at King Khalid University through the Small Research Groups Program under grant number RGP1/41/46. The authors extend their sincere gratitude to the Editor-in-Chief, the Associate Editor, and the reviewers for their invaluable feedback and for pointing out a number of oversights in the version initially submitted. Their insightful comments have greatly refined and focused the original work, resulting in markedly improved presentation.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Collomb, G.; Härdle, W.; Hassani, S. A note on prediction via estimation of the conditional mode function. J. Stat. Plan. Inference 1986, 15, 227–236. [Google Scholar] [CrossRef]
  2. Quintela-Del-Rio, A.; Vieu, P. A nonparametric conditional mode estimate. J. Nonparametr. Stat. 1997, 8, 253–266. [Google Scholar] [CrossRef]
  3. Ioannides, D.; Matzner-Løber, E. A note on asymptotic normality of convergent estimates of the conditional mode with errors-in-variables. J. Nonparametr. Stat. 2004, 16, 515–524. [Google Scholar] [CrossRef]
  4. Louani, D.; Ould-Saïd, E. Asymptotic normality of kernel estimators of the conditional mode under strong mixing hypothesis. J. Nonparametr. Stat. 1999, 11, 413–442. [Google Scholar] [CrossRef]
  5. Allaoui, S.; Bouzebda, S.; Chesneau, C.; Liu, J. Uniform almost sure convergence and asymptotic distribution of the wavelet-based estimators of partial derivatives of multivariate density function under weak dependence. J. Nonparametr. Stat. 2021, 33, 170–196. [Google Scholar] [CrossRef]
  6. Ferraty, F.; Laksaci, A.; Vieu, P. Estimating some characteristics of the conditional distribution in nonparametric functional models. Stat. Inference Stoch. Process. 2006, 9, 47–76. [Google Scholar] [CrossRef]
  7. Dabo-Niang, S.; Kaid, Z.; Laksaci, A. On spatial conditional mode estimation for a functional regressor. Stat. Probab. Lett. 2012, 82, 1413–1421. [Google Scholar] [CrossRef]
  8. Ezzahrioui, M.H.; Ould-Saïd, E. Asymptotic normality of a nonparametric estimator of the conditional mode function for functional data. J. Nonparametr. Stat. 2008, 20, 3–18. [Google Scholar] [CrossRef]
  9. Ezzahrioui, M.H.; Saïd, E.O. Some asymptotic results of a non-parametric conditional mode estimator for functional time-series data. Stat. Neerl. 2010, 64, 171–201. [Google Scholar] [CrossRef]
  10. Ferraty, F.; Vieu, P. Functional Nonparametric Prediction Methodologies. In Nonparametric Functional Data Analysis: Theory and Practice; Springer: New York, NY, USA, 2006; pp. 49–59. [Google Scholar]
  11. Dabo-Niang, S.; Kaid, Z.; Laksaci, A. Asymptotic properties of the kernel estimate of spatial conditional mode when the regressor is functional. AStA Adv. Stat. Anal. 2015, 99, 131–160. [Google Scholar] [CrossRef]
  12. Bouanani, O.; Laksaci, A.; Rachdi, M.; Rahmani, S. Asymptotic normality of some conditional nonparametric functional parameters in high-dimensional statistics. Behaviormetrika 2019, 46, 199–233. [Google Scholar] [CrossRef]
  13. Ling, N.; Liu, Y.; Vieu, P. Conditional mode estimation for functional stationary ergodic data with responses missing at random. Statistics 2016, 50, 991–1013. [Google Scholar] [CrossRef]
  14. Almanjahie, I.M.; Kaid, Z.; Laksaci, A.; Rachdi, M. Estimating the conditional density in scalar-on-function regression structure: K-NN local linear approach. Mathematics 2022, 10, 902. [Google Scholar] [CrossRef]
  15. Attouch, M.; Bouabsa, W. The k-nearest neighbors estimation of the conditional mode for functional data. Rev. Roum. Math. Pures Appl. 2013, 58, 393–415. [Google Scholar]
  16. Azzi, A.; Belguerna, A.; Laksaci, A.; Rachdi, M. The scalar-on-function modal regression for functional time series data. J. Nonparametr. Stat. 2024, 36, 503–526. [Google Scholar] [CrossRef]
  17. Wang, T. Non-parametric Estimator for Conditional Mode with Parametric Features. Oxf. Bull. Econ. Stat. 2024, 86, 44–73. [Google Scholar] [CrossRef]
  18. Schouten, B.; Klausch, T.; Buelens, B.; Van Den Brakel, J. A Cost–Benefit Analysis of Reinterview Designs for Estimating and Adjusting Mode Measurement Effects: A Case Study for the Dutch Health Survey and Labour Force Survey. J. Surv. Stat. Methodol. 2024, 12, 790–813. [Google Scholar] [CrossRef]
  19. Guenani, S.; Bouabsa, W.; Omar, F.; Kadi Attouch, M.; Khardani, S. Some asymptotic results of a kNN conditional mode estimator for functional stationary ergodic data. Commun. Stat.-Theory Methods 2024, 54, 3094–3113. [Google Scholar] [CrossRef]
  20. Thiam, A.; Thiam, B.; Crambes, C. Recursive estimation of nonparametric regression with functional covariate. Qual. Control Appl. Stat. 2014, 59, 527–528. [Google Scholar]
  21. Slaoui, Y. Recursive nonparametric regression estimation for dependent strong mixing functional data. Stat. Inference Stoch. Process. 2020, 23, 665–697. [Google Scholar] [CrossRef]
  22. Alamari, M.B.; Almulhim, F.A.; Litimein, O.; Mechab, B. Strong Consistency of Incomplete Functional Percentile Regression. Axioms 2024, 13, 444. [Google Scholar] [CrossRef]
  23. Shang, H.L.; Yang, Y. Nonstationary functional time series forecasting. J. Forecast. 2024. early view. [Google Scholar] [CrossRef]
  24. Aneiros, G.; Horová, I.; Hušková, M.; Vieu, P. Special Issue on Functional Data Analysis and Related Fields. J. Multivar. Anal. 2022, 189, 104908. [Google Scholar] [CrossRef]
  25. Moindjié, I.A.; Preda, C.; Dabo-Niang, S. Fusion regression methods with repeated functional data. Comput. Stat. Data Anal. 2025, 203, 108069. [Google Scholar] [CrossRef]
  26. Gertheiss, J.; Rügamer, D.; Liew, B.X.; Greven, S. Functional data analysis: An introduction and recent developments. Biom. J. 2024, 66, e202300363. [Google Scholar] [CrossRef]
  27. Agua, B.M.; Bouzebda, S. Single index regression for locally stationary functional time series. AIMS Math. 2024, 9, 36202–36258. [Google Scholar] [CrossRef]
  28. Bouanani, O.; Bouzebda, S. Limit theorems for local polynomial estimation of regression for functional dependent data. AIMS Math. 2024, 9, 23651–23691. [Google Scholar] [CrossRef]
  29. Bouzebda, S. Weak convergence of the conditional single index U-statistics for locally stationary functional time series. AIMS Math. 2024, 9, 14807–14898. [Google Scholar] [CrossRef]
  30. Xu, R.; Wang, J. L 1-estimation for spatial nonparametric regression. J. Nonparametr. Stat. 2008, 20, 523–537. [Google Scholar] [CrossRef]
  31. Andrews, D. First Order Autoregressive Processes and Strong Mixing; Cowles Foundation Discussion Papers 664. Cowles Foundation for Research in Economics; Yale University: New Haven, CT, USA, 1983. [Google Scholar]
  32. Beran, J. Statistics for Long-Memory Processes; Routledge: London, UK, 2017. [Google Scholar]
  33. Ibragimov, I.A.; Rozanov, Y.A.E. Gaussian Random Processes; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012; Volume 9. [Google Scholar]
  34. Magdziarz, M.; Weron, A. Ergodic properties of anomalous diffusion processes. Ann. Phys. 2011, 326, 2431–2443. [Google Scholar] [CrossRef]
  35. Aneiros-Pérez, G.; Cardot, H.; Estévez-Pérez, G.; Vieu, P. Maximum ozone concentration forecasting by functional non-parametric approaches. Environmetrics 2004, 15, 675–685. [Google Scholar] [CrossRef]
Figure 1. Functional autoregressive order 2: Wiener kernel.
Figure 2. Functional autoregressive order 2: Gaussian kernel.
Figure 3. Functional ARCH of order 1 (FARCH(1)).
Figure 4. CO2 emissions during 2024.
Figure 5. A sample of 50 curves of daily CO2 emission.
Figure 6. Comparison between the three predictors.
Table 1. MSE errors of the estimator C M ^ .
MF | FTS | Dist. | Het. | Hom. | SNRHet (5%) | SNRHet (40%)
MF = 1 | GFAR(2) | Laplace | 0.176 | 0.154 | 1.174 | 2.197
MF = 1 | GFAR(2) | Log normal | 0.183 | 0.166 | 1.187 | 2.198
MF = 1 | GFAR(2) | Weibull | 0.196 | 0.172 | 1.189 | 2.208
MF = 1 | WFAR(2) | Laplace | 0.165 | 0.143 | 0.158 | 2.171
MF = 1 | WFAR(2) | Log normal | 0.173 | 0.141 | 1.153 | 2.185
MF = 1 | WFAR(2) | Weibull | 0.170 | 0.153 | 1.170 | 2.192
MF = 1 | FARCH(1) | Laplace | 0.201 | 0.182 | 1.204 | 2.209
MF = 1 | FARCH(1) | Log normal | 0.223 | 0.209 | 1.311 | 2.534
MF = 1 | FARCH(1) | Weibull | 0.240 | 0.223 | 1.412 | 2.626
MF = 10 | GFAR(2) | Laplace | 0.256 | 0.261 | 1.353 | 2.413
MF = 10 | GFAR(2) | Log normal | 0.317 | 0.312 | 1.487 | 2.507
MF = 10 | GFAR(2) | Weibull | 0.296 | 0.542 | 1.618 | 2.698
MF = 10 | WFAR(2) | Laplace | 0.356 | 0.334 | 1.385 | 0.497
MF = 10 | WFAR(2) | Log normal | 0.432 | 0.513 | 1.635 | 2.758
MF = 10 | WFAR(2) | Weibull | 0.408 | 0.443 | 1.489 | 2.595
MF = 10 | FARCH(1) | Laplace | 0.513 | 0.523 | 1.641 | 2.691
MF = 10 | FARCH(1) | Log normal | 0.434 | 0.419 | 1.453 | 2.554
MF = 10 | FARCH(1) | Weibull | 0.504 | 0.514 | 1.562 | 1.516
Table 2. MSE errors of the estimator C M ˜ .
MF | FTS | Dist. | Het. | Hom. | SNRHet (5%) | SNRHet (40%)
MF = 1 | GFAR(2) | Laplace | 1.161 | 1.109 | 2.008 | 4.378
MF = 1 | GFAR(2) | Log normal | 1.304 | 0.963 | 2.789 | 4.678
MF = 1 | GFAR(2) | Weibull | 3.239 | 2.107 | 3.896 | 4.894
MF = 1 | WFAR(2) | Laplace | 0.876 | 0.403 | 1.097 | 1.856
MF = 1 | WFAR(2) | Log normal | 0.606 | 0.236 | 1.765 | 2.785
MF = 1 | WFAR(2) | Weibull | 1.690 | 1.327 | 2.045 | 2.976
MF = 1 | FARCH(1) | Laplace | 2.332 | 1.763 | 2.435 | 4.554
MF = 1 | FARCH(1) | Log normal | 2.204 | 1.398 | 3.971 | 5.861
MF = 1 | FARCH(1) | Weibull | 5.109 | 3.712 | 5.023 | 6.432
MF = 10 | GFAR(2) | Laplace | 4.201 | 4.216 | 7.312 | 8.417
MF = 10 | GFAR(2) | Log normal | 4.230 | 4.736 | 6.789 | 7.678
MF = 10 | GFAR(2) | Weibull | 5.117 | 6.107 | 7.186 | 8.243
MF = 10 | WFAR(2) | Laplace | 3.654 | 3.212 | 4.178 | 5.164
MF = 10 | WFAR(2) | Log normal | 3.902 | 4.561 | 5.605 | 6.194
MF = 10 | WFAR(2) | Weibull | 4.310 | 4.127 | 6.205 | 6.817
MF = 10 | FARCH(1) | Laplace | 6.231 | 7.862 | 8.333 | 9.352
MF = 10 | FARCH(1) | Log normal | 4.315 | 5.493 | 6.771 | 8.662
MF = 10 | FARCH(1) | Weibull | 7.101 | 7.513 | 8.224 | 9.533
Table 3. MSE errors of the estimator C M ¯ .
MF | FTS | Dist. | Het. | Hom. | SNRHet (5%) | SNRHet (40%)
MF = 1 | GFAR(2) | Laplace | 1.535 | 1.331 | 2.103 | 4.414
MF = 1 | GFAR(2) | Log normal | 1.202 | 1.106 | 2.452 | 4.786
MF = 1 | GFAR(2) | Weibull | 2.119 | 1.811 | 3.02 | 4.949
MF = 1 | WFAR(2) | Laplace | 0.167 | 0.156 | 1.861 | 3.843
MF = 1 | WFAR(2) | Log normal | 0.134 | 0.136 | 1.451 | 3.073
MF = 1 | WFAR(2) | Weibull | 0.109 | 0.117 | 0.698 | 1.785
MF = 1 | FARCH(1) | Laplace | 3.101 | 2.512 | 3.972 | 4.952
MF = 1 | FARCH(1) | Log normal | 2.603 | 2.363 | 3.861 | 5.045
MF = 1 | FARCH(1) | Weibull | 4.009 | 3.227 | 4.961 | 5.895
MF = 10 | GFAR(2) | Laplace | 2.552 | 2.312 | 4.132 | 6.447
MF = 10 | GFAR(2) | Log normal | 2.221 | 3.166 | 5.421 | 8.761
MF = 10 | GFAR(2) | Weibull | 4.191 | 5.823 | 6.211 | 8.991
MF = 10 | WFAR(2) | Laplace | 2.171 | 2.164 | 4.812 | 6.832
MF = 10 | WFAR(2) | Log normal | 2.142 | 0.162 | 1.412 | 3.264
MF = 10 | WFAR(2) | Weibull | 2.192 | 2.173 | 3.682 | 5.751
MF = 10 | FARCH(1) | Laplace | 8.112 | 8.521 | 9.128 | 9.921
MF = 10 | FARCH(1) | Log normal | 4.611 | 4.169 | 6.161 | 10.012
MF = 10 | FARCH(1) | Weibull | 9.018 | 10.271 | 11.031 | 12.185
Table 4. SMSE errors of the three predictors with respect to different metrics and different kernels.
Model | Metric | Kernel | SMSE
M O ^ | PCA (3rd eigenfunction) | Quadratic kernel | 3.26
M O ^ | PCA (3rd eigenfunction) | β-kernel | 3.37
M O ^ | PCA (8th eigenfunction) | Quadratic kernel | 4.03
M O ^ | PCA (8th eigenfunction) | β-kernel | 4.11
M O ^ | L2 spline metric | Quadratic kernel | 4.39
M O ^ | L2 spline metric | β-kernel | 4.52
M E ^ | PCA (3rd eigenfunction) | Quadratic kernel | 5.42
M E ^ | PCA (3rd eigenfunction) | β-kernel | 5.61
M E ^ | PCA (8th eigenfunction) | Quadratic kernel | 7.56
M E ^ | PCA (8th eigenfunction) | β-kernel | 8.22
M E ^ | L2 spline metric | Quadratic kernel | 6.45
M E ^ | L2 spline metric | β-kernel | 6.82
C M ^ | PCA (3rd eigenfunction) | Quadratic kernel | 4.87
C M ^ | PCA (3rd eigenfunction) | β-kernel | 5.11
C M ^ | PCA (8th eigenfunction) | Quadratic kernel | 8.62
C M ^ | PCA (8th eigenfunction) | β-kernel | 8.34
C M ^ | L2 spline metric | Quadratic kernel | 6.12
C M ^ | L2 spline metric | β-kernel | 6.73