Data-Driven Modeling of Web Traffic Flow Using Functional Modal Regression

Kaid, Zoulikha; Alamari, Mohammed B.

doi:10.3390/axioms14110815

by Zoulikha Kaid and Mohammed B. Alamari ^*

Department of Mathematics, College of Science, King Khalid University, Abha 62223, Saudi Arabia

^* Author to whom correspondence should be addressed.

Axioms 2025, 14(11), 815; https://doi.org/10.3390/axioms14110815

Submission received: 16 September 2025 / Revised: 23 October 2025 / Accepted: 29 October 2025 / Published: 31 October 2025

(This article belongs to the Special Issue Functional Data Analysis and Its Application)

Abstract

Real-time control of web traffic is a critical issue for network operators and service providers. It helps ensure robust service and avoid service interruptions, which have an important financial impact. However, due to the high speed and volume of actual internet traffic, standard multivariate time series models are inadequate for ensuring efficient real-time traffic management. In this paper, we introduce a new model for functional time series analysis, developed by combining a local linear smoothing approach with an $L^{1}$-robust estimator of the quantile’s derivative. It constitutes an alternative, robust estimator for functional modal regression that is adequate to handle the stochastic volatility of high-frequency web traffic data. The mathematical support of the new model is established in the dependent functional case. The asymptotic analysis emphasizes the functional structure of the data, the functional feature of the model, and the stochastic characteristics of the underlying time-varying process. We evaluate the effectiveness of our proposed model using comprehensive simulations and a real-data application. The computational results illustrate the superiority of the nonparametric functional model over the existing conventional methods in web traffic modeling.

Keywords:

functional time series; FARCH model; local linear smoothing approach; modal regression; M-quantile regression; web traffic

MSC:

62G08; 62G10; 62G35; 62G07; 62G32; 62G30; 62H12

1. Introduction

The internet has become an indispensable tool of modern digital life. It is important for global commerce, communication, and everyday services. In this setting, the real-time management of web traffic is vital. Traditional statistical models are unable to fit its underlying structure in real time. Functional data analysis (FDA) offers a promising alternative: it treats the temporal dynamics of flows as a continuous process rather than as a discrete random variable. In this context, the robust modal regression studied in this contribution provides a flexible and accurate data-driven approach to monitor and forecast the fluctuations in web traffic.

Various network traffic prediction models have been proposed in the statistics literature. However, the efficiency of a prediction method depends on the prediction horizon, computational costs, prediction error rate, and response times. Using multivariate data analysis, the authors of [1] developed a predictive system that integrates decision trees, neural networks, and support vector machines to analyze Apple (AAPL) stock. Their findings indicate that the system achieved an accuracy rate of 85% in forecasting daily stock movements. Ref. [2] highlighted the nonlinear and time-varying characteristics of network traffic by treating the flow as a temporal process. The technical merits and limitations of various time series forecasting models for web traffic are compared in [3].

On the other hand, Functional Time Series Data Analysis (FTSDA) is an emerging branch of functional data analysis (FDA). The approach was first popularized by Bosq’s monograph [4], which focused on the linear model in functional spaces. The importance of FTSDA is justified by its wide range of applications in diverse fields. The framework of this contribution is the nonparametric analysis of functional time series. Early results in this direction were provided by [5], who established the almost complete consistency of the kernel estimator for the regression operator under mixing assumptions. In addition, ref. [6] proved the asymptotic normality of the kernel estimator of the regression operator. Readers interested in FTSDA may refer to ref. [7] for the ergodic case, ref. [8] for the quasi-associated case, and ref. [9] for the long-memory dependence case.

Usually, the prediction problem in FTSDA is solved using standard regression based on the conditional expectation. In contrast to these conventional regression methodologies, modal and quantile regression are more closely linked to the volatility of the data and to its conditional distribution. They are particularly efficient when conditional distributions exhibit asymmetry or multimodality, or are influenced by a heavy-tailed distribution of the white noise. Other scenarios in which these models outperform the conditional mean are listed in [10].

The first results on nonparametric mode estimation were developed by [5]. They constructed a prediction model from the derivative of the conditional density and stated the almost complete consistency (a.co.) of the kernel estimator for the modal regression. The $L_{p}$ consistency of the functional Nadaraya–Watson (NW) estimator was derived by [11]. Ref. [12] established stochastic consistency under ergodicity. Conditional mode estimation in the functional random field case was developed by [13], who proved the complete convergence and stated the convergence rate of the constructed estimator, while the asymptotic distribution of the spatial mode regression was established by [14]. Additional theoretical and applied ideas in FDA, including extensions to real-world areas, are presented in [15].

In the same direction, quantile regression has attracted growing attention in the FTSDA branch. The first studies concern functional linear modeling, developed in [16,17], which also provide comprehensive overviews of recent advances. The moment integrability of the functional Nadaraya–Watson estimator was developed by [18]. For more recent developments, we refer to [19]. The estimation of the conditional mode through quantile regression was first introduced in functional data analysis (FDA) by [20]. In their work, the authors established the pointwise consistency of the robust estimator under the independence structure. For extensions of this result, we refer the reader to [21,22], where the generalizations to functional time series data and to ergodic functional time series data, respectively, are developed.

While the previously cited studies have focused on the Local Constant Approach (LCA), the main aim of this work is to adopt the Local Linear Estimation Method (LLEM). This method has advantages over the LCA estimator, notably its ability to improve estimation efficiency by reducing bias. This improvement was first observed by [23] in the multivariate setting and extended to functional statistics by [24].

From a bibliographical point of view, the LLEM has been extensively studied in the context of functional statistics. For instance, ref. [25] established the quadratic consistency of the LLEM estimator for regression operators in Hilbert spaces. Next, ref. [26] proposed a more general estimator applicable to functional covariates in Banach spaces, while [27] introduced an alternative approach based on the inverse local covariance operator. Ref. [28] developed functional LLEM fitting for nonparametric conditional models, proving both pointwise and uniform almost complete (a.co.) consistency for the conditional density and its derivatives. The local linear estimator of the quantile regression was proposed by [29]. It is constructed using the $L^{1}$-method. More recently, ref. [30] developed the LLEM of the conditional mode under the assumption of independence.

The present work aims to highlight and evaluate the practical feasibility of the robust local linear mode estimator for predicting the total web traffic at a given future time horizon. Web traffic data often exhibit strong seasonality, asymmetric features, and high volatility, reflecting daily user cycles or special events. For all these reasons, nonparametric modal regression is well suited to this kind of data. Traditional time series approaches, which typically treat traffic as discrete observations, are ill-suited to these complex features, leading to suboptimal modeling. By contrast, the FTSDA approach treats traffic as continuous curves over time, which makes it possible to take the mentioned characteristics of web traffic into account. To enhance the reliability of the functional forecasting algorithm, we employ the robust LLEM of the modal regression, which makes it possible to explore the effect of the smooth variations and the asymmetric behavior of the flow data. Indeed, our predictor combines the robustness of $L_{1}$ regression with the efficiency of local linear estimation, making it more resistant to outliers while improving its predictive accuracy in functional statistics.

The accuracy of the predictor is confirmed by establishing the almost sure convergence of the estimator under standard assumptions, with precise rates of convergence. A second advantage of the proposed predictor is its ability to improve the robustness of the conditional mode estimation without reducing accuracy. This gain in robustness is particularly valuable for predictive tasks, as it provides a stable predictor that retains its efficiency even when ideal assumptions are not satisfied.

Finally, we point out that the prediction of traffic flows is particularly valuable for resource optimization and for the detection of anomalies related to cyberattacks or system failures. For large-scale platforms such as e-commerce sites, cloud services, and streaming networks, accurately forecasting web traffic flows is vital. Thus, integrating the temporal dynamics through the functional approach is crucial: it allows reliable predictions and helps improve the performance of data-driven network management.

The structure of this paper is as follows: Section 2 introduces the FTSDA framework and the local linear modal regression. Section 3 presents the principal assumptions and establishes the main theoretical results. Section 4 investigates the finite-sample performance of our estimator through both simulation experiments and applications to real data. Section 5 is devoted to highlighting the potential impact of this contribution in the field of functional data analysis. Finally, Appendix A contains the detailed proofs of the intermediate results, and Appendix B is devoted to the notation box and the list of acronyms.

2. FTSDA Framework and the Local Linear Modal Regression

As previously emphasized, the main purpose of this paper is to develop a robust predictor to forecast the total traffic flows over a time frame. From a practical point of view, the prediction of total flows gives a better understanding of network demands. It is crucial for planning bandwidth, avoiding slowdowns, and optimizing digital resources. In particular, good traffic forecasts are essential to maintain reliable services, reduce costs, and improve user satisfaction. We formulate this problem by assuming that the web traffic flow observed at a specific IP address over the time interval $[0, T]$ is represented by a process $W(t)$, $t \in [0, T]$. The objective is then to predict the total traffic over a future horizon of length h, expressed as

\int_{T}^{T + h} W(t) \, dt .

Under this consideration, the FTSDA framework is constructed by cutting the observed process $(W(t))_{t \in [0, T]}$ into small intervals, thereby generating n dependent functional random variables $(D_{i}, F_{i})_{i = 1, \dots, n}$ defined by the following:

\forall t \in [0, T], \quad F_{i}(t) = W(((i - 1) T + t) / n) \quad \text{and} \quad D_{i} = \int_{T}^{T + h} F_{i}(t) \, dt .
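The construction above can be sketched on a discretized series. This is a minimal illustration rather than the authors' code: the function name `make_functional_pairs`, the unit-spaced sampling grid, and the choice of letting each response integrate the samples that immediately follow its curve are all assumptions made for the example.

```python
import numpy as np

def make_functional_pairs(w, points_per_curve, horizon_points):
    """Cut a discretized process w into curves F_i and responses D_i.

    Assumptions (not from the source): w is sampled on a regular grid with
    unit spacing, F_i is the i-th block of `points_per_curve` samples, and
    D_i integrates the `horizon_points` intervals that follow that block.
    """
    w = np.asarray(w, dtype=float)
    n = (len(w) - horizon_points) // points_per_curve
    curves = np.empty((n, points_per_curve))
    totals = np.empty(n)
    for i in range(n):
        start = i * points_per_curve
        stop = start + points_per_curve
        curves[i] = w[start:stop]
        # Trapezoidal rule from the last point of the curve over the horizon
        future = w[stop - 1: stop + horizon_points]
        totals[i] = np.sum((future[1:] + future[:-1]) / 2.0)
    return curves, totals
```

For example, `make_functional_pairs(np.arange(10.0), 4, 2)` yields two curves of four points each, together with the integrated traffic over the two grid steps that follow each curve.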

In the context of predicting total web traffic, the dependence of the FTSDA framework is characterized through the strong mixing property, which provides a theoretical foundation for the validity of statistical inference when modeling dependent functional data. Since web traffic flows naturally exhibit temporal dependence, seasonality, and volatility, assuming a strong mixing framework permits us to establish consistency and convergence results for the proposed functional time series estimator of total traffic. Formally, the strong mixing property is defined by

\mathrm{Mix}(n) = \sup_{A \in \mathcal{F}_{1}^{k}} \sup_{B \in \mathcal{F}_{k + n}^{\infty}} | \mathbb{P}(A \cap B) - \mathbb{P}(A) \mathbb{P}(B) | \to 0 \quad \text{as } n \to \infty,

(1)

where $\mathcal{F}_{1}^{k}$ is the σ-algebra generated by $(F_{1}, D_{1}), \dots, (F_{k}, D_{k})$, and $\mathcal{F}_{k + n}^{\infty}$ is the σ-algebra generated by $(F_{k + n}, D_{k + n}), (F_{k + n + 1}, D_{k + n + 1}), \dots$, and where the sequence $(F_{i}, D_{i})_{i}$ is assumed to be stationary.

It is worth noting that the choice of this FTSDA framework is motivated by the fact that the strong mixing property covers many usual cases. For instance, it is well documented that ARMA processes are geometrically strongly mixing [31], and the strong mixing property also holds for EXPAR models [32] and ARCH processes [33].

A second important advantage of this contribution is the use of the conditional distribution of $D_{i}$ given the function $F_{i}$ to explore the relationship between the past trajectory and the future total traffic. In this framework, we define the conditional mode (CM) as

RM(F) = \arg\max_{d \in S} CD(d \mid F_{i} = F),

(2)

where $CD$ (respectively, $CC$) denotes the conditional density (respectively, the conditional cumulative distribution function) of $D_{i}$ given $F_{i}$, and S is a given compact set on which the conditional mode exists and is uniquely defined. The conditional mode provides a robust and informative summary of the conditional distribution, offering an effective tool for prediction, particularly in the presence of asymmetry or a heavy-tailed distribution of the data. Moreover, to reinforce this robustness, we express the CM function through the conditional quantile $CQ$ as follows:

RM(F) = CQ(p_{RM}, F), \quad \text{where } p_{RM} = \arg\min_{p \in [a_{F}, b_{F}]} CQ'(p, F),

(3)

where $CQ'(p, F) = \frac{\partial CQ(p, F)}{\partial p}$ is the derivative of the conditional quantile and $[a_{F}, b_{F}] = CD^{-1}(S | F)$. This can be obtained by a simple analytical argument (see [22]). Therefore, the LLEM estimator of the function $RM$ is related to the estimator of $CQ$. The latter was introduced in FDA by [34] and is based on approximating $CQ(p, \cdot)$ in a neighborhood $N_{F}$ of $F$ by the following:

CQ(p, Z) \approx A + B \int_{0}^{T} |F(t) - Z(t)|^{2} \, dt \quad \forall Z \in N_{F},

(4)

where $A = CQ(p, F)$. The coefficients $(A, B)$ are estimated by

\min_{(A, B) \in \mathbb{R}^{2}} \sum_{i = 1}^{n} L_{p}\left(D_{i} - A - B \int_{0}^{T} |F(t) - F_{i}(t)|^{2} \, dt\right) Ψ\left(\frac{M(F, F_{i})}{f_{n}}\right),

(5)

where $L_{p}(y) = y (p - \mathbb{1}_{\{y < 0\}})$ is the quantile loss function, Ψ is a kernel function, $f_{n}$ is a bandwidth sequence, and M is a given locating function on the functional space of $(F_{i})_{i}$. The choice of M has been discussed in the monograph [5], which notes that working in a semi-metric space is often more suitable in practice. It allows a large class of functional spaces to be considered and offers the possibility of improving the performance of the predictor by selecting an appropriate semi-metric from a specific family. Moreover, the monograph points out that the smoothness of the functional data strongly affects the choice of metric. In particular, the FPCA-based metric is appropriate when the data are discontinuous. In our contribution, we adopt the more general case studied by [26]. It follows that

\hat{RM}(F) = \hat{CQ}(\hat{p}_{RM}, F), \quad \text{where } \hat{p}_{RM} = \arg\min_{p \in [a_{F}, b_{F}]} \hat{CQ}'(p, F),

(6)

with $\hat{CQ}'$ being the estimator of the derivative of the conditional quantile defined by

\hat{CQ}'(p, F) = \frac{\hat{CQ}(p + r_{n}, F) - \hat{CQ}(p - r_{n}, F)}{2 r_{n}},

(7)

where $\hat{CQ}(p, F) = \hat{A}$ is the first component of the minimizer of (5), and $r_{n}$ is a positive bandwidth-like sequence.
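The link between (2) and (3) can be made explicit with a short derivation. The following is only a sketch, under the added assumptions that $CC(\cdot | F)$ is continuously differentiable and strictly increasing on S with $CD(\cdot | F) > 0$ there; the source itself refers to [22] for the argument.

```latex
% By definition of the conditional quantile, for p in [a_F, b_F]:
CC\bigl(CQ(p, F) \mid F\bigr) = p .
% Differentiating both sides with respect to p (chain rule):
CD\bigl(CQ(p, F) \mid F\bigr)\, CQ'(p, F) = 1
\quad\Longrightarrow\quad
CQ'(p, F) = \frac{1}{CD\bigl(CQ(p, F) \mid F\bigr)} .
% Hence minimizing CQ' over p amounts to maximizing CD over d = CQ(p, F):
p_{RM} = \arg\min_{p \in [a_F, b_F]} CQ'(p, F)
\quad\Longleftrightarrow\quad
CQ(p_{RM}, F) = \arg\max_{d \in S} CD(d \mid F) = RM(F) .
```

In words, the quantile function is flattest where the conditional density is highest, which is why the estimator searches for the level p at which the estimated quantile derivative is smallest.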

Algorithm

Finally, we summarize the step-by-step flow of the predictor in practice.

Compute $M (F, F_{i});$
Fit local linear quantile regressions over a coarse grid in p;
Estimate $\frac{\partial C Q}{\partial p} (p, F)$ by ${\hat{C Q}}^{'} (p, F);$
Select ${\hat{p}}_{R M}$ where ${\hat{C Q}}^{'} (p, F)$ is smallest;
Output $\hat{C Q} ({\hat{p}}_{R M}, F) .$
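The five steps above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the helper names, the specific quadratic kernel, the use of the squared $L^{2}$ distance for both the locating function and the local regressor, and the brute-force solver for (5) are all choices made for the example.

```python
import numpy as np

def pinball(u, p):
    """Quantile (pinball) loss L_p(u) = u * (p - 1{u < 0})."""
    return u * (p - (u < 0))

def quadratic_kernel(u):
    """Illustrative quadratic kernel supported on [0, 1) (assumed form)."""
    return np.where((u >= 0) & (u < 1), 1.5 * (1.0 - u ** 2), 0.0)

def local_linear_quantile(p, d, delta, weights):
    """Intercept A-hat of the weighted fit in (5), by enumerating pairs.

    The objective is piecewise linear in (A, B), so for p in (0, 1) a
    minimizer exists among lines interpolating two observations. This brute
    force is only meant for small local samples.
    """
    keep = weights > 0
    d, delta, weights = d[keep], delta[keep], weights[keep]
    if len(d) < 2:
        raise ValueError("bandwidth too small: fewer than two local points")
    # Fallback value, used only if all local regressors coincide
    best_loss, best_a = np.inf, float(np.median(d))
    for i in range(len(d)):
        for j in range(i + 1, len(d)):
            if delta[i] == delta[j]:
                continue
            b = (d[i] - d[j]) / (delta[i] - delta[j])
            a = d[i] - b * delta[i]
            loss = np.sum(weights * pinball(d - a - b * delta, p))
            if loss < best_loss:
                best_loss, best_a = loss, a
    return best_a

def robust_ll_mode(curve, curves, d, f_n, r_n, p_grid):
    """Steps 1-5 for one target curve (hypothetical helper)."""
    # Step 1: M(F, F_i), here the squared L2 distance on a grid rescaled to [0, 1]
    delta = np.mean((curves - curve) ** 2, axis=1)
    weights = quadratic_kernel(delta / f_n)
    q = lambda p: local_linear_quantile(p, d, delta, weights)
    # Steps 2-3: central finite difference (7) of the fitted quantile on the p-grid
    deriv = np.array([(q(p + r_n) - q(p - r_n)) / (2.0 * r_n) for p in p_grid])
    # Step 4: level where the estimated quantile derivative is smallest
    p_hat = float(p_grid[np.argmin(deriv)])
    # Step 5: the mode estimate is the fitted quantile at that level
    return q(p_hat), p_hat
```

The grid `p_grid` should stay inside $(r_{n}, 1 - r_{n})$ so that both shifted levels remain valid quantile levels. The pair enumeration costs on the order of the cube of the local sample size per fit, so a linear programming solver would be preferable for larger neighborhoods.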

Clearly, this estimator is directly implementable using the quantile pinball loss function, the regressors $F_{i}$, and the weights $Ψ(\frac{M(F, F_{i})}{f_{n}})$. However, mathematical support is essential to establish the estimator’s feasibility. The theoretical foundation of our approach is established by proving the almost complete consistency of $\hat{RM}(F)$ and by expressing its convergence rate under the dependence condition. To the best of our knowledge, this is the first contribution that develops the local linear estimation of the conditional mode within the framework of functional time series data.

3. Mathematical Support

Throughout this paper, we use C and $C'$ to denote strictly positive generic constants. For $f' < f$, we introduce the set

B(F, f', f) = \{F' \in \mathcal{F} : f' < M(F', F) < f\},

and define

\mathbb{P}(X \in B(F, f', f)) = ξ_{F}(f', f) > 0 .

In what follows, we outline the conditions that are essential for demonstrating the almost complete convergence of $\hat{RM}(F)$ toward its theoretical counterpart $RM(F)$.

(T1): For any $f > 0$, $ξ_{F}(f) = ξ_{F}(-f, f) > 0$, and there exists a function $γ_{F}(\cdot)$ such that

$\text{for all } t \in (-1, 1), \quad \lim_{f \to 0} \frac{ξ_{F}(t f)}{ξ_{F}(f)} = γ_{F}(t).$
(T2): The function $CQ(\cdot, F)$ is of class $C^{3}([a_{F}, b_{F}])$, and $CC(\cdot | F)$ satisfies the following Lipschitz condition:

$\text{for all } (F_{1}, F_{2}) \in N_{F} \times N_{F}, \quad |CC(t | F_{1}) - CC(t | F_{2})| \leq C |M^{b}(F_{1}, F_{2})| \quad \text{for some } b > 0,$

where $N_{F}$ denotes a neighborhood of $F$, and $CC$ is the conditional cumulative distribution function of $D_{i}$ given $F_{i}$.
(T3): The sequence $((F_{i}, D_{i}))_{i \in \mathbb{N}}$ satisfies $\exists a > 5, \exists c > 0 : \forall n \in \mathbb{N}, \mathrm{Mix}(n) \leq c n^{-a}$ and

$\mathbb{P}((F_{i}, F_{j}) \in B(F, -f, f) \times B(F, -f, f)) = O((ξ_{F}(f))^{(a + 1) / a}) > 0 .$
(T4): The kernel $Ψ$ is a positive and differentiable function, supported within $(-1, 1)$, such that

$\begin{pmatrix} Ψ(1) - \int_{-1}^{1} Ψ'(t) γ_{F}(t) \, dt & Ψ(1) - \int_{-1}^{1} (t Ψ(t))' γ_{F}(t) \, dt \\ Ψ(1) - \int_{-1}^{1} (t Ψ(t))' γ_{F}(t) \, dt & Ψ(1) - \int_{-1}^{1} (t^{2} Ψ(t))' γ_{F}(t) \, dt \end{pmatrix}$

is a positive definite matrix.
(T5): The bandwidth $f_{n}$ satisfies the following: $\exists \, 0 < η < \frac{a - 5}{a + 1}$ such that

$C n^{\frac{5 - a}{a + 1} + η} \leq ξ_{F}(f_{n}) \quad \text{and} \quad \frac{\log n}{n r_{n}^{2} ξ_{F}(f_{n})} \to 0 .$

The mathematical support is carried out under standard assumptions. The imposed assumptions provide a flexible framework that highlights the principal elements of the proposed approach, including the model structure, the data correlation, and the convergence rates. Given the complexity of the proposed local linear algorithm and the strength of the Borel–Cantelli (BC) consistency, these assumptions represent a reasonable balance between ease of implementation and the strong consistency of the predictor. Typically, condition (T1) is pivotal in nonparametric functional data analysis (NFDA). We apply this condition with the bandwidth $f_{n} \to 0$. Condition (T2) has a direct influence on the bias term in the convergence rate of $\hat{RM}(F)$, while condition (T3) controls the mixing correlation of the functional time series as well as its local dependence structure. Finally, conditions (T4) and (T5) concern the kernel Ψ and the bandwidths $f_{n}$ and $r_{n}$, which regulate the technical implementation of the estimator $\hat{RM}(F)$. These assumptions are crucial for establishing the convergence rate of the kernel estimator under the BC framework.

The following theorem establishes the almost complete (a.co.) convergence of $\hat{RM}(F)$ (cf. [5] for details). This kind of convergence implies both almost sure convergence and convergence in probability.

Theorem 1.

Under assumptions (T1)–(T5), and if

\inf_{p \in (0, 1)} \frac{\partial^{3} CQ(p, F)}{\partial p^{3}} > 0,

we have the following:

|\hat{RM}(F) - RM(F)| = O(f_{n}^{b / 2}) + O(r_{n}^{1 / 2}) + O\left(\left(\frac{\log n}{n r_{n}^{2} ξ_{F}(f_{n})}\right)^{\frac{1}{4}}\right) \quad a.co.
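To see how the three terms of Theorem 1 interact, one can balance them under a simplifying assumption that is not part of the source: suppose the small-ball probability behaves like $ξ_{F}(f) = f^{τ}$ (up to constants) for some hypothetical $τ > 0$. A sketch of the resulting bandwidth choice is:

```latex
% Equate the two bias-type terms: f_n^{b/2} = r_n^{1/2}  =>  r_n = f_n^{b}.
% Equate the stochastic term with r_n^{1/2}:
\left(\frac{\log n}{n\, r_n^{2}\, f_n^{τ}}\right)^{1/4} = r_n^{1/2}
\;\Longrightarrow\;
r_n^{4} f_n^{τ} = \frac{\log n}{n}
\;\Longrightarrow\;
f_n^{4b + τ} = \frac{\log n}{n} .
% Resulting choices and overall rate:
f_n = \left(\frac{\log n}{n}\right)^{\frac{1}{4b + τ}}, \qquad
r_n = f_n^{b}, \qquad
|\hat{RM}(F) - RM(F)| = O\!\left(\left(\frac{\log n}{n}\right)^{\frac{b}{2(4b + τ)}}\right) \quad a.co.
```

With this choice, the second requirement of (T5) holds because the ratio equals $r_{n}^{2} \to 0$; the lower bound on $ξ_{F}(f_{n})$ in (T5) still has to be checked against the mixing rate a.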

4. Web Traffic Flow Modeling

4.1. Test of the Data-Driven Approach over Artificial Data

This section is devoted to simulation experiments designed to assess the finite-sample performance of the proposed robust modal regression model under various scenarios. The empirical analysis has two main objectives. First, it aims to demonstrate the practical feasibility and straightforward implementation of the proposed data-driven approach. Second, it seeks to evaluate the impact of the principal parameters on the prediction quality. The pivotal parameters involved in the proposed prediction approach are the level of dependence, the functional space of the data, the smoothing parameter, and the kernel of the estimator. To achieve these objectives, we generate various functional time series that incorporate the dynamics of web traffic flow. Typically, the generated functional data simulate the characteristics of web activity, including random fluctuations and asymmetry of the data distribution. To this end, we construct three levels of functional time series, generated as a linear process of the form

F_{i} = \sum_{j = i}^{i + 10} a_{j} Υ_{j},

where $(Υ_{j})_{j}$ denotes a sequence of independent functional random variables, and $a_{j} = m^{j}$ are calibrated constants. These functions are obtained by the code routine dgp.fiid. By construction, the resulting process is strong mixing and satisfies Assumption (T3). The functional variable $F_{i}$ can be interpreted as the aggregated traffic load over a continuous time interval. With this sampling design, we can illustrate varying dependency structures as well as short- and long-range correlations of the functional time series data. The shapes of the functional variables at the different dependency levels are shown in Figure 1.

Recall that the main goal of this simulation experiment is to predict the total flow $D_{i} = \int_{T}^{T + h} F_{i}(t) \, dt$ at future time horizon h, with particular emphasis on the impact of three components: the mixing assumption, the choice of the bandwidth parameters $(f_{n}, r_{n})$, and the locating function M. In this framework, the correlation structure is controlled by evaluating the accuracy over three values of the parameter, $m = 0.2, 0.5, 0.8$. Concerning the bandwidth parameters $(f_{n}, r_{n})$, we compare the following three selectors:

\begin{cases} \text{Rule 1:} & (f_{n}^{1}, r_{n}^{1}) = \arg\min_{f \in H_{n}, r \in B_{n}} \sum_{i = 1}^{n} (D_{i} - \hat{RM}(F_{i}))^{2}, \\ \text{Rule 2:} & (f_{n}^{2}, r_{n}^{2}) = \arg\min_{f \in H_{n}, r \in B_{n}} \sum_{i = 1}^{n} |D_{i} - \hat{RM}(F_{i})|, \\ \text{Rule 3:} & (f_{n}^{3}, r_{n}^{3}) = \arg\min_{f \in H_{n}, r \in B_{n}} \sum_{i = 1}^{n} L_{q_{RM}}(D_{i} - \hat{RM}(F_{i})), \end{cases}

where

H_{n} = \{a \geq 0 \text{ such that } \sum_{i = 1}^{n} \mathbb{1}_{B(F, a)}(F_{i}) = k\}

and

B_{n} = \{r \geq 0 \text{ such that } \sum_{i = 1}^{n} \mathbb{1}_{[d - r, d + r]}(D_{i}) = k\},

where $k \in \{5, 15, 25, \dots, 0.5 n\}$. Meanwhile, the third component concerns the locating function M. Here, we test the following two locating functions:

\begin{cases} \text{Locating function 1:} & M_{1}(F, F_{i}) = \int_{0}^{T} θ(t) (F(t) - F_{i}(t)) \, dt, \text{ where } θ \text{ is the first eigenvector of the variance-covariance matrix,} \\ \text{Locating function 2:} & M_{2}(F, F_{i}) = \int_{0}^{T} (F(t) - F_{i}(t))^{2} \, dt . \end{cases}
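The two locating functions and the nearest-neighbor bandwidth candidates in $H_{n}$ can be sketched on discretized curves. This is an illustration under assumptions: the function names are hypothetical, the integrals are approximated by Riemann sums with grid step `dt`, and θ is taken as the leading eigenvector of the empirical covariance matrix, normalized to unit $L^{2}$ norm.

```python
import numpy as np

def first_eigenfunction(curves, dt):
    """Leading eigenvector of the empirical covariance, scaled to unit L2 norm."""
    centered = curves - curves.mean(axis=0)
    cov = centered.T @ centered / len(curves)
    _, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    theta = eigvecs[:, -1]
    return theta / np.sqrt(np.sum(theta ** 2) * dt)

def locating_m1(curve, curves, theta, dt):
    """M_1(F, F_i): signed projection of F - F_i on theta (Riemann sum)."""
    return (curve - curves) @ theta * dt

def locating_m2(curve, curves, dt):
    """M_2(F, F_i): integrated squared difference (Riemann sum)."""
    return np.sum((curve - curves) ** 2, axis=1) * dt

def knn_bandwidth(distances, k):
    """Smallest radius whose ball around F contains k observed curves."""
    return np.sort(np.abs(distances))[k - 1]
```

Note that $M_{1}$ is signed, which matches the two-sided sets $B(F, -f, f)$ used in the assumptions, whereas $M_{2}$ is non-negative. The sign of θ returned by the eigendecomposition is arbitrary, so only the absolute value of $M_{1}$ is comparable across runs.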

Now, for this comparison study, we generate n observations of $(F_{i}, D_{i})$, which are randomly divided into two subsets: 60% of the observations form the in-sample and 40% form the testing sample. We then examine the accuracy by computing the mean squared error over the testing sample,

MSE = \frac{1}{n_{test}} \sum_{j = 1}^{n_{test}} (\hat{RM}(F_{j}) - RM(F_{j}))^{2},

using different scenarios of h, m, sample size n, selectors, and locating functions. In the first illustration, we fix some parameters, namely, $h = 0.1 T$, Rule 1, and the locating function $M_{1}$. We evaluate the robustness of the estimator by multiplying 10 in-sample observations by a fixed multiplicative factor $MF$. The results of this illustration are presented in Table 1, which reports the MSE for different values of $MF$ and sample sizes n.
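The evaluation protocol described above (random 60/40 split, contamination of 10 in-sample responses, MSE on the testing sample) can be sketched as below. The helper names and the seeding are hypothetical, and the sketch assumes that the contamination is applied to the responses $D_{i}$, which the text does not state explicitly.

```python
import numpy as np

def split_and_contaminate(d, mf, n_outliers=10, train_frac=0.6, seed=0):
    """Random in-sample/testing split with multiplicative contamination.

    Assumption: the factor `mf` multiplies `n_outliers` randomly chosen
    in-sample responses; the testing sample is left untouched.
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(d, dtype=float)
    perm = rng.permutation(len(d))
    n_train = int(round(train_frac * len(d)))
    train_idx, test_idx = perm[:n_train], perm[n_train:]
    d_train = d[train_idx].copy()
    hit = rng.choice(n_train, size=n_outliers, replace=False)
    d_train[hit] *= mf
    return train_idx, test_idx, d_train

def mse(pred, truth):
    """Mean squared error between predictions and reference values."""
    return float(np.mean((np.asarray(pred) - np.asarray(truth)) ** 2))
```

With `mf=1` the in-sample responses are unchanged, so comparing the testing MSE across `mf = 1, 5, 10` isolates the effect of the outliers on the fitted predictor.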

The MSE results show the strong performance of the local linear estimation of the robust modal regression. Specifically, the variability of the MSE values across $MF = 1, 5, 10$ is small. This small difference highlights the robustness of the estimator $\hat{RM}$. In the second experiment, we further analyze the sensitivity of the estimator $\hat{RM}$ with respect to the remaining parameters h and m, as well as the choice of selectors and locating functions.

The results presented in Table 2 show that the performance of the predictor is highly influenced by the different parameters used in the algorithm. In particular, smaller prediction errors are obtained when the data exhibit weak dependence and the horizon parameter h takes small values, compared to cases with strong dependence. The simulation results also reveal that the accuracy of the prediction depends on the choice of the locating function and the smoothing parameter. Although the runtime is influenced by the selected locating function M and by the local fits over the small p-grid used in the experiments, the computational cost does not vary significantly across the different scenarios. Overall, the proposed estimator demonstrates practical usefulness, computational efficiency, and robust performance even in the presence of strong dependence.

4.2. Real Data Application

In this section, we demonstrate the practical relevance of our approach using real data. More precisely, we compare the predictor $\hat{RM}$ with three competing nonparametric functional mode estimators. The first is the robust local constant (Robust LC) mode (introduced by [20]), defined by

\hat{RM}_{1}(F) = \hat{CQ}(\hat{p}_{RM}, F), \quad \text{where } \hat{p}_{RM} = \arg\min_{p \in [a_{F}, b_{F}]} \hat{CQ}'(p, F),

with

\hat{CQ}(p, F) = \arg\min_{A} \sum_{i = 1}^{n} L_{p}(D_{i} - A) Ψ\left(\frac{M(F, F_{i})}{f_{n}}\right)

and $\hat{CQ}'$ as its derivative. The standard local constant mode (Standard LC) (considered by [5]) is defined by

\hat{RM}_{2}(F) = \arg\max_{d} \frac{\sum_{i = 1}^{n} Ψ\left(\frac{M(F, F_{i})}{f_{n}}\right) Ψ\left(\frac{d - D_{i}}{f_{n}}\right)}{\sum_{i = 1}^{n} Ψ\left(\frac{M(F, F_{i})}{f_{n}}\right)} .

The standard local linear (Standard LL) mode (studied by [24]) is defined by

\hat{RM}_{3}(F) = \arg\max_{d} \frac{\sum_{i, j = 1}^{n} W_{i j} Ψ\left(\frac{d - D_{i}}{f_{n}}\right)}{\sum_{i, j = 1}^{n} W_{i j}},

where $W_{i j} = δ_{i} (δ_{i} - δ_{j}) Ψ\left(\frac{M(F, F_{j})}{f_{n}}\right) Ψ\left(\frac{M(F, F_{i})}{f_{n}}\right)$ and $δ_{i} = \int_{0}^{T} |F(t) - F_{i}(t)|^{2} \, dt$.
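As a point of comparison, the Standard LC mode can be sketched directly from its definition. This is an illustration only: the function names, the specific quadratic kernel, and the grid search over candidate values d are assumptions, and the same bandwidth is used for both kernels as in the displayed formula.

```python
import numpy as np

def sym_quadratic_kernel(u):
    """Illustrative quadratic kernel supported on (-1, 1) (assumed form)."""
    return np.where(np.abs(u) < 1, 0.75 * (1.0 - u ** 2), 0.0)

def standard_lc_mode(dist, d, f_n, d_grid):
    """Arg max over d_grid of the kernel estimate of the conditional density.

    dist   : values M(F, F_i) for the target curve F
    d      : observed responses D_i
    f_n    : bandwidth, shared by both kernels as in the formula
    d_grid : candidate values of d over which the maximum is searched
    """
    w = sym_quadratic_kernel(dist / f_n)
    dens = np.array(
        [np.sum(w * sym_quadratic_kernel((y - d) / f_n)) for y in d_grid]
    ) / np.sum(w)
    return float(d_grid[np.argmax(dens)])
```

Unlike the quantile-based estimators, this version maximizes an estimated density, so a finer `d_grid` gives a more precise location of the maximum at the cost of more kernel evaluations.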

For practical illustration, we consider the CESNET-TimeSeries24 dataset, which is available on the Zenodo platform https://zenodo.org/record/13382427 (accessed on 8 September 2025). This dataset was developed and used by [35]. It contains various web traffic metrics, including flows, packets, and bytes, among others. It covers data at both institutional and IP subnetwork levels, providing comprehensive coverage for anomaly detection tasks. The data are derived from approximately 66 billion IP flows and are offered at three distinct temporal resolutions (10 min, 1 h, and 1 day). In this work, we concentrate on the high-resolution case of 10-min intervals. The data process is visualized in Figure 2.

According to the algorithm described in Section 2, we construct a collection of 279 functional random variables, denoted by $(F_{i}(\cdot))_{i = 1}^{280}$, from the process $W(t) = \log(\text{bytes at } t)$, where each function $F_{i}(\cdot)$ represents the curve of the logarithmic difference flow-traffic corresponding to the i-th day (time interval of 24 h). The trajectories of these 280 functional regressors are displayed in Figure 3.

Given the objective of approximating the total flow-traffic 2 h in advance, the estimators are computed using as response variable $D_{i} =$ total flows of the first 2 h of day $i + 1$. We use the same quadratic kernel as in the simulation, and each estimator is evaluated with its appropriate selector, metric, and/or locating function. It appears that Rule 1 (from the previous section) performs well for the standard estimators $\hat{RM}_{2}$ and $\hat{RM}_{3}$, whereas the estimators $\hat{RM}_{1}$ and $\hat{RM}$ give better results when Rule 3 is used. Concerning the metric and locating function for the LLEM estimation, we observed no significant difference between the locating functions $M_{1}$ and $M_{2}$ (of the previous section). Thus, we have opted to employ $M_{2}$ with the PCA metric (see [5]) for the different estimators. Finally, we examine the accuracy of the different predictors by dividing the dataset randomly into an in-sample comprising 150 observations and an out-of-sample set comprising 100 observations. The predictive results are assessed by plotting the observed values of $D_{i}$ versus the corresponding predicted values (see Figure 4).

As illustrated in Figure 4, the $\hat{RM}$ (Robust LL) forecasting procedure exhibits superior performance compared to the alternative predictors $\hat{RM}_{1}$, $\hat{RM}_{2}$, and $\hat{RM}_{3}$. This statement is validated by the square root of the mean squared error (MSE), as defined in the previous section. Specifically, the value for $\hat{RM}$ is 6.78, whereas the corresponding values for the other predictors $\hat{RM}_{1}$, $\hat{RM}_{2}$, and $\hat{RM}_{3}$ are 8.36, 9.48, and 9.47, respectively.

5. Discussion and Conclusions

Contribution and Positioning:
The novelty of this contribution is that it is the first paper to combine three important tools of statistical modeling: robust estimation, the local linear smoothing approach, and the functional time series structure. Compared to existing works in functional data analysis, the gain is substantial in both theoretical and practical aspects. From the theoretical point of view, the proposed conditional mode estimator substantially improves the kernel estimator used by [5]. Indeed, the kernel estimator of [5] is obtained by maximizing the conditional density function, while the proposed estimator is related to the conditional quantile. This consideration improves the robustness and the accuracy of the estimator. It is also well documented that the local linear approach reduces the bias term of the kernel method.
Additionally, from the practical point of view, the functional time series structure studied in the present contribution is more general than the linear process proposed by [4], in the sense that the strong mixing assumption is also fulfilled by many nonlinear processes. Considering the mixing structure therefore makes it possible to cover a large class of functional time series data (linear or nonlinear processes).
In practice, web traffic flow is usually influenced by many factors such as time of day, special events, promotions, and user interest. The linearity of this kind of data is often not guaranteed. Therefore, linear models may provide inaccurate forecasts, especially when the data show strong fluctuations or contain outliers. Consequently, more flexible nonlinear models are often more appropriate to represent the dynamic nature of web traffic flow.
Connection Between the Estimator and Web Traffic Data:
We recall that the main feature of the proposed algorithm is its ability to combine three fundamental components of mathematical statistics: functional data modeling, the local linear approach, and modal regression as a robust predictor. Combining these tools yields an effective and comprehensive algorithm for modeling web traffic flow. Functional data modeling treats the entire traffic curve as a single observation, allowing the model to exploit the temporal dependence and smooth variations in web activity over time. The local linear approach improves estimation accuracy by controlling the local behavior of both the model and the data through a linear approximation in the neighborhood of the location point. This local adaptation is especially useful during peak hours or special events, when traffic changes rapidly. The robust estimation component increases the model’s resistance to outliers caused by unusual numbers of visits, viral content, or server errors, ensuring more stable predictions. Conditional mode prediction focuses on estimating the most probable future value rather than the average, which is particularly important when the web traffic distribution is skewed or contains extreme values. Overall, the local linear estimation of the robust mode is well suited to high-frequency web traffic data. It improves prediction accuracy compared with standard linear models, which often fail to capture the nonlinearity and high variability of web traffic data.
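The way the pieces fit together can be illustrated with a small sketch. It follows our reading of Appendix A, where the mode is reached at the quantile level that minimizes the derivative of the conditional quantile, and that derivative is approximated by a finite difference with step $r_{n}$. It is not the authors' implementation: the local linear $L^{1}$ fit is replaced here by a simpler kernel-weighted empirical quantile (a local constant variant), and the function names, the quadratic kernel, the $L^{2}$ distance between curves, and the toy data are all assumptions made for illustration.

```python
import math


def l2_distance(f, g, dt):
    """Approximate L2 distance between two curves sampled on a regular grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)) * dt)


def kernel(u):
    """Quadratic kernel supported on [0, 1] (an illustrative choice)."""
    return 1.5 * (1.0 - u * u) if 0.0 <= u <= 1.0 else 0.0


def weighted_quantile(values, weights, p):
    """Smallest value whose cumulative normalized weight reaches p."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    acc = 0.0
    for value, weight in pairs:
        acc += weight
        if acc >= p * total:
            return value
    return pairs[-1][0]


def conditional_mode(curves, responses, new_curve, dt, f_n, r_n, grid):
    """Mode estimate: quantile at the level where its derivative is smallest."""
    if any(p - r_n <= 0.0 or p + r_n >= 1.0 for p in grid):
        raise ValueError("every grid level p must satisfy 0 < p - r_n and p + r_n < 1")
    weights = [kernel(l2_distance(c, new_curve, dt) / f_n) for c in curves]
    if sum(weights) <= 0.0:
        raise ValueError("no curve lies within the bandwidth f_n")

    def cq(p):
        # Local constant stand-in for the local linear L1 quantile estimator.
        return weighted_quantile(responses, weights, p)

    def cq_derivative(p):
        # Central finite difference with step r_n.
        return (cq(p + r_n) - cq(p - r_n)) / (2.0 * r_n)

    best_level = min(grid, key=cq_derivative)
    return cq(best_level)


# Toy data: nine curves close to the new curve, two distant curves with outlying responses.
near = [0.0, 1.0, 0.0]
far = [5.0, 5.0, 5.0]
curves = [near] * 9 + [far] * 2
responses = [3.1, 0.0, 5.0, 2.9, 1.0, 6.0, 3.0, 2.0, 4.0, 100.0, 100.0]
estimate = conditional_mode(curves, responses, near, dt=0.5, f_n=1.0, r_n=0.15,
                            grid=[0.3, 0.4, 0.5, 0.6, 0.7])
print(estimate)  # 3.0
```

In this toy example the two distant curves receive zero kernel weight, so their outlying responses do not affect the estimate, and the selected value sits where the responses are most concentrated (around 3). A faithful implementation would replace `weighted_quantile` by the local linear check-loss minimization studied in the paper.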
Conclusions:
In this work we have introduced a new predictor based on the estimation of the $L^{1}$-modal regression using a local linear approach. The theoretical part provides mathematical support for implementing the estimator in practice. Specifically, we have established almost complete consistency under a strong mixing dependence assumption, which serves as an alternative to the conventional correlation-based criteria. Empirical results from both simulated and real datasets, including web traffic data, confirm that the feasibility of the proposed estimator is closely linked to the parameters involved in its construction. Although combining the $L^{1}$ framework with a local linear approach improves both robustness and predictive accuracy, especially for complex functional data such as web traffic curves, the accuracy of the estimator depends on the strength of the dependence in the data, the smoothness of the underlying nonparametric model, and the careful choice of the bandwidth or semi-metric. Selecting these parameters can be challenging, as inappropriate choices may substantially affect the estimator’s robustness and predictive performance.
In addition to these findings, this study highlights several potential directions for future research. The first is the asymptotic distribution of the normalized estimator under various forms of functional strong mixing, such as association or Markovian sequences. Another important extension concerns spatio-functional modeling, which takes into account the geographic coordinates of the data. Although these extensions focus on dependencies in the data, further generalizations to other smoothing techniques, including kNN methods and semi-partial linear approaches, are also interesting topics for future work.

Author Contributions

The authors contributed approximately equally to this work. Formal analysis, M.B.A.; Writing—review and editing, Z.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Deanship of Scientific Research and Graduate Studies at King Khalid University through a Small Research Project under grant number RGP1/41/46.

Data Availability Statement

The data used in this study are available through the link https://zenodo.org/record/13382427 (accessed on 8 September 2025).

Acknowledgments

The authors would like to thank the Editor, the Associate Editor and the anonymous reviewers for their valuable comments and suggestions, which substantially improved the quality of an earlier version of this paper. The authors thank and extend their appreciation to the funder of this work: Deanship of Scientific Research and Graduate Studies at King Khalid University through a Small Research Project under grant number RGP1/41/46.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proofs of the Asymptotic Result

Proof of Theorem 1.

We first show that

| \hat{RM}(F) - RM(F) | \leq C \left( \sup_{p \in [a_{F}, b_{F}]} | \hat{CQ}(p, F) - CQ(p, F) | + \sqrt{\sup_{p \in [a_{F}, b_{F}]} | {\hat{CQ}}^{'}(p, F) - CQ^{'}(p, F) |} \right)

and that, for n large enough,

| {\hat{CQ}}^{'}(p, F) - CQ^{'}(p, F) | \leq C r_{n}^{-1} \sup_{p \in (a_{F} - r_{n}, b_{F} + r_{n})} | \hat{CQ}(p, F) - CQ(p, F) | + O(r_{n}) .

It therefore suffices to evaluate

\sup_{p \in [0, 1]} | \hat{CQ}(p, F) - CQ(p, F) | .

Theorem 1 then follows from the next proposition. □

Proposition A1.

Under the conditions of Theorem 1, we have

\sup_{p \in (0, 1)} | \hat{CQ}(p, F) - CQ(p, F) | = O(f_{n}^{b}) + O_{a.co.}\left( \left( \frac{\ln n}{n ξ_{F}(f_{n})} \right)^{1/2} \right) .

The proof relies on the Bahadur representation of the conditional quantile, which is obtained from the following lemma.

Lemma A1 ([34]).

Let $Ξ_{n}$ be a sequence of decreasing real random functions and $Σ_{n}$ be a random real sequence such that

Σ_{n} = o_{a.co.}(1) \quad \text{and} \quad \sup_{|χ| \leq M} | Ξ_{n}(χ) + λ χ - Σ_{n} | = o_{a.co.}(1) \quad \text{for certain constants } λ, M > 0 .

Then, for any real sequence $χ_{n}$ such that $Ξ_{n}(χ_{n}) = o_{a.co.}(1)$, we have

\sum_{n = 1}^{\infty} \mathbb{P}\{ |χ_{n}| \geq M \} < \infty .

Proof of Proposition A1.

We apply Lemma A1 to

χ_{n} = \begin{pmatrix} \hat{A} - A \\ f_{n} (\hat{B} - B) \end{pmatrix}, \quad X_{n}(χ) = \frac{1}{n ξ_{F}(f_{n})} \sum_{i = 1}^{n} Υ(p, χ) \begin{pmatrix} 1 \\ f_{n}^{-1} I_{i} \end{pmatrix} Ψ_{i} \quad \text{for } χ = \begin{pmatrix} c \\ d \end{pmatrix},

and

Σ_{n} = X_{n}(χ_{0}) \quad \text{with } χ_{0} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},

where

Υ(p, χ) = p - \mathbb{1}_{[D_{i} \leq (c + A) + (f_{n}^{-1} d + B) I_{i}]}, \quad Ψ_{i} = Ψ(f_{n}^{-1} m(F, F_{i})), \quad \text{and} \quad I_{i} = \int_{0}^{T} (F_{i}(t) - F(t))^{2} dt .

The main theorem is then a consequence of the following lemmas. □

Lemma A2.

Under assumptions (T1)–(T5), we have

\| Σ_{n} \| = O(f_{n}^{\min(k_{1}, k_{2})}) + O_{a.co.}\left( \left( \frac{\ln n}{n ξ_{F}(f_{n})} \right)^{1/2} \right) .

Proof of Lemma A2.

We write

Σ_{n} - \mathbb{E}[Σ_{n}] = \begin{pmatrix} Σ_{n}^{1} \\ Σ_{n}^{2} \end{pmatrix}

where

\begin{cases} Σ_{n}^{1} = \frac{1}{n ξ_{F}(f_{n})} \sum_{i = 1}^{n} Z_{i}^{1} \\ Σ_{n}^{2} = \frac{1}{n f_{n} ξ_{F}(f_{n})} \sum_{i = 1}^{n} Z_{i}^{2} \end{cases}

with

Z_{i}^{1} = (p - \mathbb{1}_{[D_{i} \leq A + B I_{i}]}) Ψ_{i} - \mathbb{E}[(p - \mathbb{1}_{[D_{i} \leq A + B I_{i}]}) Ψ_{i}]

and

Z_{i}^{2} = (p - \mathbb{1}_{[D_{i} \leq A + B I_{i}]}) I_{i} Ψ_{i} - \mathbb{E}[(p - \mathbb{1}_{[D_{i} \leq A + B I_{i}]}) I_{i} Ψ_{i}] .

The convergence is derived by applying the Fuk–Nagaev inequality to $Z_{i}^{1}$ and $Z_{i}^{2}$. This requires the asymptotic behavior of the variance term defined by

S_{n}^{2} = \sum_{i = 1}^{n} \sum_{j = 1}^{n} \operatorname{Cov}(Z_{i}^{1}, Z_{j}^{1}) = \sum_{i = 1}^{n} \sum_{j \neq i} \operatorname{Cov}(Z_{i}^{1}, Z_{j}^{1}) + n \operatorname{Var}[Z_{1}^{1}] .

We split the covariance sum into two parts, $J_{1, n}$ and $J_{2, n}$, taken over the index sets

S_{1} = \{(i, j) \text{ such that } 1 \leq i - j \leq u_{n}\}

and

S_{2} = \{(i, j) \text{ such that } u_{n} + 1 \leq i - j \leq n - 1\} .

Thus,

\begin{aligned} J_{1, n} &= \sum_{S_{1}} |\operatorname{Cov}(Z_{i}^{1}, Z_{j}^{1})| \\ &\leq C \sum_{S_{1}} \left( |\mathbb{E}[Ψ_{i} Ψ_{j}]| + |\mathbb{E}[Ψ_{i}] \mathbb{E}[Ψ_{j}]| \right) . \end{aligned}

From (T1), (T3) and (T5) we have

J_{1, n} \leq C n u_{n} ξ_{F}(f_{n}) .

Next, for the quantity $J_{2, n}$, we use the Davydov–Rio inequality, in the bounded case, to show that

| \operatorname{Cov}(Z_{i}^{1}, Z_{j}^{1}) | \leq C \operatorname{Mix}(| i - j |) .

We deduce

J_{2, n} = \sum_{S_{2}} | \operatorname{Cov}(Z_{i}^{1}, Z_{j}^{1}) | \leq \frac{n u_{n}^{-a + 1}}{a - 1} .

Taking

u_{n} = \left( \frac{1}{ξ_{F}(f_{n})} \right)^{1/a},

we obtain

\sum_{i = 1}^{n} \sum_{j \neq i} \operatorname{Cov}(Z_{i}^{1}, Z_{j}^{1}) = O(n ξ_{F}(f_{n})^{(a - 1)/a}) .
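The role of this particular $u_{n}$ can be checked by a short computation. The sketch below uses only the two bounds on $J_{1, n}$ and $J_{2, n}$ stated in this proof and shows that the choice balances both contributions at the same order:

```latex
% Substituting u_n = \xi_F(f_n)^{-1/a} into the two bounds stated above:
\begin{aligned}
J_{1,n} &\leq C\, n\, u_n\, \xi_F(f_n)
         = C\, n\, \xi_F(f_n)^{1 - 1/a}
         = C\, n\, \xi_F(f_n)^{(a-1)/a}, \\
J_{2,n} &\leq \frac{n\, u_n^{-(a-1)}}{a-1}
         = \frac{n\, \xi_F(f_n)^{(a-1)/a}}{a-1}.
\end{aligned}
% Both terms are therefore of order n \xi_F(f_n)^{(a-1)/a}.
```

A larger $u_{n}$ would inflate the bound on $J_{1, n}$, while a smaller one would inflate the bound on $J_{2, n}$.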

The variance term satisfies

\operatorname{Var}[Z_{1}^{1}] \leq \mathbb{E}[Ψ_{1}^{2}] = O(ξ_{F}(f_{n})) .

We conclude

S_{n}^{2} = O(n ξ_{F}(f_{n})) .

(A1)

Finally, for all $ζ > 0$ and for all $ε > 0$,

\begin{aligned} \mathbb{P}\{Σ_{n}^{1} - \mathbb{E}[Σ_{n}^{1}] > ε\} &\leq \mathbb{P}\left\{ \left| \sum_{i = 1}^{n} Z_{i}^{1} \right| > ε n \mathbb{E}[Ψ_{1}] \right\} \\ &\leq C (B_{1} + B_{2}), \end{aligned}

where

B_{1} = \left( 1 + \frac{ε^{2} n^{2} (\mathbb{E}[Ψ_{1}])^{2}}{S_{n}^{2} ζ} \right)^{-ζ/2} \quad \text{and} \quad B_{2} = n ζ^{-1} \left( \frac{ζ}{ε n \mathbb{E}[Ψ_{1}]} \right)^{a + 1} .

It suffices to take

ε = ϵ_{0} \frac{\sqrt{n \ln n}}{n \mathbb{E}[Ψ_{1}]} \quad \text{and} \quad ζ = C (\ln n)^{2},

to obtain, under (T5),

\begin{aligned} B_{2} &\leq C n^{1 - (a + 1)/2} ξ_{F}(f_{n})^{-(a + 1)/4} (\ln n)^{(3a - 1)/2} \\ &\leq C n^{-1 - χ_{1}}, \end{aligned}

(A2)

and

B_{1} \leq C \left( 1 + \frac{λ^{2} \ln n}{ζ} \right)^{-ζ/2} \leq C n^{-1 - χ_{2}},

(A3)

for some $χ_{1} > 0$ and $χ_{2} > 0$. Combining relations (A2) and (A3) allows us to conclude that the dispersion term satisfies

Σ_{n}^{1} - \mathbb{E}[Σ_{n}^{1}] = O_{a.co.}\left( \sqrt{\frac{\ln n}{n ξ_{F}(f_{n})}} \right) .

(A4)

Similarly,

Σ_{n}^{2} - \mathbb{E}[Σ_{n}^{2}] = O_{a.co.}\left( \sqrt{\frac{\ln n}{n ξ_{F}(f_{n})}} \right) .

Next, by the definition of $Σ_{n}$,

\begin{aligned} \mathbb{E}[Σ_{n}^{1}] &= \frac{1}{ξ_{F}(f_{n})} \mathbb{E}[(p - \mathbb{1}_{[D_{1} \leq A + B I_{1}]}) Ψ_{1}] \\ &\leq \frac{1}{ξ_{F}(f_{n})} \mathbb{E}| D(CQ(p, F) \mid F) - D((A + B I_{1}) Ψ_{1} \mid F) | \\ &= O(f_{n}^{\min(k_{1}, k_{2})}) \end{aligned}

and

\begin{aligned} \mathbb{E}[Σ_{n}^{2}] &= \frac{1}{f_{n} ξ_{F}(f_{n})} \mathbb{E}[(p - \mathbb{1}_{[D_{1} \leq A + B I_{1}]}) I_{1} Ψ_{1}] \\ &\leq \frac{1}{f_{n} ξ_{F}(f_{n})} \mathbb{E}| D(CQ(p, F) \mid F) - D((A + B I_{1}) I_{1} Ψ_{1} \mid F) | \\ &= O(f_{n}^{\min(k_{1}, k_{2})}) . \end{aligned}

Therefore,

\| Σ_{n} \| = O(f_{n}^{\min(k_{1}, k_{2})}) + O_{a.co.}\left( \left( \frac{\ln n}{n ξ_{F}(f_{n})} \right)^{1/2} \right) .

□

Lemma A3.

Under assumptions (T1)–(T5), we have

\sup_{\| χ \| \leq M} \| X_{n}(χ) + λ_{0} D χ - Σ_{n} \| = o_{a.co.}(1)

with

D = \begin{pmatrix} Ψ(1) - \int_{-1}^{1} Ψ^{'}(t) γ_{F}(t) dt & Ψ(1) - \int_{-1}^{1} (t Ψ(t))^{'} γ_{F}(t) dt \\ Ψ(1) - \int_{-1}^{1} (t Ψ(t))^{'} γ_{F}(t) dt & Ψ(1) - \int_{-1}^{1} (t^{2} Ψ(t))^{'} γ_{F}(t) dt \end{pmatrix}

and

λ_{0} = C D(CQ(p, F) \mid F) .

Proof of Lemma A3.

We prove

\sup_{\| χ \| \leq M} \| X_{n}(χ) - Σ_{n} - \mathbb{E}[X_{n}(χ) - Σ_{n}] \| = O_{a.co.}\left( \sqrt{\frac{\ln n}{n ξ_{F}(f_{n})}} \right)

(A5)

and

\sup_{\| χ \| \leq M} \| \mathbb{E}[X_{n}(χ) - Σ_{n}] + C D(CQ(p, F) \mid F) D χ \| = O(f_{n}^{\min(k_{1}, k_{2})}) .

(A6)

We use the compactness of the ball $B(0, M)$ in $\mathbb{R}^{2}$, and we write

B(0, M) \subset \bigcup_{j = 1}^{d_{n}} B(χ_{j}, l_{n}), \quad χ_{j} = \begin{pmatrix} c_{j} \\ d_{j} \end{pmatrix} \quad \text{and} \quad l_{n} = d_{n}^{-1} = 1/\sqrt{n} .

Taking $j(χ) = \arg\min_{j} | χ - χ_{j} |$, we use the fact that

\begin{aligned} & \sup_{\| χ \| \leq M} \| X_{n}(χ) - Σ_{n} - \mathbb{E}[X_{n}(χ) - Σ_{n}] \| \\ & \quad \leq \sup_{\| χ \| \leq M} \| X_{n}(χ) - X_{n}(χ_{j}) \| \\ & \qquad + \sup_{\| χ \| \leq M} \| X_{n}(χ_{j}) - Σ_{n} - \mathbb{E}[X_{n}(χ_{j}) - Σ_{n}] \| \\ & \qquad + \sup_{\| χ \| \leq M} \| \mathbb{E}[X_{n}(χ) - X_{n}(χ_{j})] \| . \end{aligned}

Since $| \mathbb{1}_{[Y < a]} - \mathbb{1}_{[Y < b]} | \leq \mathbb{1}_{[| Y - b | \leq | a - b |]}$, we have

\sup_{\| χ \| \leq M} \| X_{n}(χ) - X_{n}(χ_{j}) \| \leq \frac{1}{n ξ_{F}(f_{n})} \sum_{i} Ω_{i}

where

Ω_{i} = \sup_{\| χ \| \leq M} \mathbb{1}_{[| D_{i} - (c_{j} + A) - (f_{n}^{-1} d_{j} + B) I_{i} | \leq C l_{n}]} \left\| \begin{pmatrix} 1 \\ f_{n}^{-1} I_{i} \end{pmatrix} \right\| Ψ_{i} .

Once again we use the Fuk–Nagaev inequality to derive its convergence rate. Here

S_{n}^{2} = \operatorname{Var}\left[ \sum_{i = 1}^{n} Ω_{i} \right] = O(n l_{n} ξ_{F}(f_{n})) .

Therefore, the condition

l_{n} = o\left( \sqrt{\frac{\ln n}{n ξ_{F}(f_{n})}} \right)

(A7)

shows that

\sup_{\| χ \| \leq M} \| X_{n}(χ) - X_{n}(χ_{j}) \| = O_{a.co.}\left( \left( \frac{\ln n}{n ξ_{F}(f_{n})} \right)^{1/2} \right) .

Concerning the expectation term, we have

\sup_{\| χ \| \leq M} \| \mathbb{E}[X_{n}(χ) - X_{n}(χ_{j})] \| \leq \frac{1}{ξ_{F}(f_{n})} \mathbb{E}[Ω_{1}] \leq C l_{n} .

It follows that

\sup_{\| χ \| \leq M} \| \mathbb{E}[X_{n}(χ) - X_{n}(χ_{j})] \| = o_{a.co.}\left( \left( \frac{\ln n}{n ξ_{F}(f_{n})} \right)^{1/2} \right) .

Secondly, we treat the term

\sup_{\| χ \| \leq M} \| X_{n}(χ_{j}) - Σ_{n} - \mathbb{E}[X_{n}(χ_{j}) - Σ_{n}] \| .

It can be expressed as

X_{n}(χ_{j}) - Σ_{n} - \mathbb{E}[X_{n}(χ_{j}) - Σ_{n}] = \begin{pmatrix} Θ_{n}^{1}(χ_{j}) \\ Θ_{n}^{2}(χ_{j}) \end{pmatrix}

where

\begin{cases} Θ_{n}^{1}(χ_{j}) = \frac{1}{n ξ_{F}(f_{n})} \sum_{i = 1}^{n} Λ_{i}^{1} \\ Θ_{n}^{2}(χ_{j}) = \frac{1}{n f_{n} ξ_{F}(f_{n})} \sum_{i = 1}^{n} Λ_{i}^{2} \end{cases}

with

Λ_{i}^{1} = (Υ(p, χ_{j}) - Υ(p, χ_{0})) Ψ_{i} - \mathbb{E}[(Υ(p, χ_{j}) - Υ(p, χ_{0})) Ψ_{i}]

and

Λ_{i}^{2} = (Υ(p, χ_{j}) - Υ(p, χ_{0})) I_{i} Ψ_{i} - \mathbb{E}[(Υ(p, χ_{j}) - Υ(p, χ_{0})) I_{i} Ψ_{i}] .

We get

\operatorname{Var}\left[ \sum_{i = 1}^{n} Λ_{i}^{1} \right] = O(n ξ_{F}(f_{n}))

and

\operatorname{Var}\left[ \sum_{i = 1}^{n} Λ_{i}^{2} \right] = O(n ξ_{F}(f_{n}))

which implies

\sup_{\| χ \| \leq M} \| X_{n}(χ_{j}) - Σ_{n} - \mathbb{E}[X_{n}(χ_{j}) - Σ_{n}] \| = O_{a.co.}\left( \sqrt{\frac{\ln n}{n ξ_{F}(f_{n})}} \right) .

(A8)

Finally, we have

\sup_{\| χ \| \leq M} \| \mathbb{E}[X_{n}(χ) - Σ_{n}] + C D(CQ(p, F) \mid F) D χ + o(\| χ \|) \| = O(f_{n}^{\min(k_{1}, k_{2})})

which leads to the result (A6). □

Lemma A4.

Under assumptions (T1)–(T5), we have

\sup_{| χ | \leq M} \sup_{p \in [0, 1]} | Ξ_{n}(χ, p) - \mathbb{E}[Ξ_{n}(χ, p)] | = O_{a.co.}\left( \left( \frac{\ln n}{n ξ_{F}(f_{n})} \right)^{1/2} \right),

where

Ξ_{n}(χ, p) = \frac{1}{n ξ_{F}(f_{n})} \sum_{i = 1}^{n} Υ(p, χ) \begin{pmatrix} 1 \\ f_{n}^{-1} I_{i} \end{pmatrix} Ψ_{i} \quad \text{for } χ = \begin{pmatrix} c \\ d \end{pmatrix} .

Proof of Lemma A4.

We only outline the proof of this lemma. It relies on the compactness of $[0, 1]$, which implies

[0, 1] \subset \bigcup_{k = 1}^{d_{n}} [p_{k} - l_{n}, p_{k} + l_{n}] \quad \text{for } p_{k} \in [0, 1] .

Next, for all $p \in [0, 1]$ we put $F_{p} = \arg\min_{k} | p - p_{k} |$, and we evaluate the term as a function of $χ$ and $p$. We have

\begin{aligned} & \sup_{| χ | \leq M} \sup_{p \in [0, 1]} | Ξ_{n}(χ, p) - \mathbb{E}[Ξ_{n}(χ, p)] | \\ & \quad \leq \sup_{| χ | \leq M} \sup_{p \in [0, 1]} | Ξ_{n}(χ, p) - Ξ_{n}(χ_{j(χ)}, p) | \\ & \qquad + \sup_{| χ | \leq M} \sup_{p \in [0, 1]} | Ξ_{n}(χ_{j(χ)}, p) - Ξ_{n}(χ_{j(χ)}, F_{p}) | \\ & \qquad + \sup_{| χ | \leq M} \sup_{p \in [0, 1]} | Ξ_{n}(χ_{j(χ)}, F_{p}) - \mathbb{E}[Ξ_{n}(χ_{j(χ)}, F_{p})] | \\ & \qquad + \sup_{| χ | \leq M} \sup_{p \in [0, 1]} | \mathbb{E}[Ξ_{n}(χ_{j(χ)}, F_{p})] - \mathbb{E}[Ξ_{n}(χ, F_{p})] | \\ & \qquad + \sup_{| χ | \leq M} \sup_{p \in [0, 1]} | \mathbb{E}[Ξ_{n}(χ, F_{p})] - \mathbb{E}[Ξ_{n}(χ, p)] | . \end{aligned}

Using the Fuk–Nagaev inequality for each term, we obtain

\sup_{| χ | \leq M} \sup_{p \in [0, 1]} | Ξ_{n}(χ, p) - Ξ_{n}(χ_{j(χ)}, p) | = O_{a.co.}\left( \sqrt{\frac{\ln n}{n ξ_{F}(f_{n})}} \right)

\sup_{| χ | \leq M} \sup_{p \in [0, 1]} | Ξ_{n}(χ_{j(χ)}, p) - Ξ_{n}(χ_{j(χ)}, F_{p}) | = O_{a.co.}\left( \sqrt{\frac{\ln n}{n ξ_{F}(f_{n})}} \right)

\sup_{| χ | \leq M} \sup_{p \in [0, 1]} | Ξ_{n}(χ_{j(χ)}, F_{p}) - \mathbb{E}[Ξ_{n}(χ_{j(χ)}, F_{p})] | = O_{a.co.}\left( \sqrt{\frac{\ln n}{n ξ_{F}(f_{n})}} \right)

\sup_{| χ | \leq M} \sup_{p \in [0, 1]} | \mathbb{E}[Ξ_{n}(χ_{j(χ)}, F_{p})] - \mathbb{E}[Ξ_{n}(χ, F_{p})] | = O\left( \sqrt{\frac{\ln n}{n ξ_{F}(f_{n})}} \right)

\sup_{| χ | \leq M} \sup_{p \in [0, 1]} | \mathbb{E}[Ξ_{n}(χ, F_{p})] - \mathbb{E}[Ξ_{n}(χ, p)] | = O\left( \sqrt{\frac{\ln n}{n ξ_{F}(f_{n})}} \right)

□

Appendix B. Notation Box and Acronyms List

Functional Data Analysis	FDA
Functional Time Series Data Analysis	FTSDA
Nadaraya–Watson Estimator	NW
Local Linear Estimation Method	LLEM
Local Constant Approach	LCA
Conditional Mode	CM
Conditional Cumulative function	CC
Conditional Density Function	CD
The Kernel Function	$Ψ$
The Functional Bandwidth Sequence	$f_{n}$
The Locating Function	M
The Conditional Mode Estimator	$\hat{R M}$
The Conditional Quantile Estimator	$\hat{C Q}$
The Estimator of the Derivative of the Conditional Quantile	${\hat{C Q}}^{'}$
The Bandwidth Sequence of the Derivative	$r_{n}$

References

Li, J.; Li, J.; Jia, N.; Li, X.; Ma, W.; Shi, S. GeoTraPredict: A machine learning system of web spatio-temporal traffic flow. Neurocomputing 2021, 428, 317–324.
Park, D.-C. Structure optimization of BiLinear Recurrent Neural Networks and its application to Ethernet network traffic prediction. Inf. Sci. 2013, 237, 18–28.
Shelatkar, T.; Tondale, S.; Yadav, S.; Ahir, S. Web Traffic Time Series Forecasting using ARIMA and LSTM RNN. In Proceedings of the 2020 International Conference on Data Science and Engineering, Mumbai, India, 12–14 August 2020.
Bosq, D. Linear Processes in Function Spaces; Lecture Notes in Statistics; Springer: New York, NY, USA, 2000; Volume 149.
Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis. Theory and Practice; Springer Series in Statistics; Springer: New York, NY, USA, 2006.
Masry, E. Nonparametric regression estimation for dependent functional data: Asymptotic normality. Stochastic Process. Appl. 2005, 115, 155–177.
Ling, N.; Vieu, P. Nonparametric modelling for functional data: Selected survey and tracks for future. Statistics 2018, 52, 934–949.
Bouzebda, S.; Laksaci, A.; Mohammedi, M. Single index regression model for functional quasi-associated time series data. REVSTAT 2022, 20, 605–631.
Wang, L. Nearest neighbors estimation for long memory functional data. Stat. Methods Appl. 2020, 29, 709–725.
Collomb, G.; Härdle, W.; Hassani, S. A note on prediction via estimation of the conditional mode function. J. Stat. Plan. Inference 1986, 15, 227–236.
Dabo-Niang, S.; Kaid, Z.; Laksaci, A. Spatial conditional quantile regression: Weak consistency of a kernel estimate. Rev. Roum. Math. Pures Appl. 2012, 57, 311–339.
Bouzebda, S.; Didi, S. Some results about kernel estimators for function derivatives based on stationary and ergodic continuous time processes with applications. Commun. Stat. Theory Methods 2022, 51, 3886–3933.
Dabo-Niang, S.; Kaid, Z.; Laksaci, A. Asymptotic properties of the kernel estimate of spatial conditional mode when the regressor is functional. AStA Adv. Stat. Anal. 2015, 99, 131–160.
Ezzahrioui, M.H.; Ould-Saïd, E. Asymptotic normality of a nonparametric estimator of the conditional mode function for functional data. J. Nonparametr. Stat. 2008, 20, 3–18.
Xu, Y. Functional Data Analysis. In Springer Handbook of Engineering Statistics; Pham, H., Ed.; Springer: London, UK, 2023; pp. 67–85.
Cardot, H.; Crambes, C.; Sarda, P. Quantile regression when the covariates are functions. J. Nonparametr. Stat. 2005, 17, 841–856.
Wang, H.; Ma, Y. Optimal subsampling for quantile regression in big data. Biometrika 2021, 108, 99–112.
Dabo-Niang, S.; Kaid, Z.; Laksaci, A. On spatial conditional mode estimation for a functional regressor. Stat. Probab. Lett. 2012, 82, 1413–1421.
Dabana, H.; Agbokou, K.; Gneyou, K. Local linear estimation of conditional probability density and mode under right censoring and left truncation: Dependent data case. Gulf J. Math. 2025, 20, 338–359.
Azzi, A.; Laksaci, A.; Ould-Saïd, E. On the robustification of the kernel estimator of the functional modal regression. Stat. Probab. Lett. 2021, 181, 109256.
Azzi, A.; Belguerna, A.; Laksaci, A.; Rachdi, M. The scalar-on-function modal regression for functional time series data. J. Nonparametr. Stat. 2024, 36, 503–526.
Alamari, M.B.; Almulhim, F.A.; Almanjahie, I.M.; Bouzebda, S.; Laksaci, A. Scalar-on-Function Mode Estimation Using Entropy and Ergodic Properties of Functional Time Series Data. Entropy 2025, 27, 552.
Fan, J. Local Polynomial Modelling and Its Applications: Monographs on Statistics and Applied Probability 66; Routledge: Abingdon-on-Thames, UK, 2018.
Rachdi, M.; Laksaci, A.; Demongeot, J.; Abdali, A.; Madani, F. Theoretical and practical aspects of the quadratic error in the local linear estimation of the conditional density for functional data. Comput. Stat. Data Anal. 2014, 73, 53–68.
Baíllo, A.; Grané, A. Local linear regression for functional predictor and scalar response. J. Multivar. Anal. 2009, 100, 102–111.
Barrientos-Marin, J.; Ferraty, F.; Vieu, P. Locally modelled regression and functional data. J. Nonparametr. Stat. 2010, 22, 617–632.
Berlinet, A.; Elamine, A.; Mas, A. Local linear regression for functional data. Ann. Inst. Stat. Math. 2011, 63, 1047–1075.
Demongeot, J.; Laksaci, A.; Madani, F.; Rachdi, M. Functional data: Local linear estimation of the conditional density and its application. Statistics 2013, 47, 26–44.
Laksaci, A.; Ould Saïd, E.; Rachdi, M. Uniform consistency in number of neighbors of the k NN estimator of the conditional quantile model. Metrika 2021, 84, 895–911.
Almulhim, F.A.; Alamari, N.B.; Laksaci, A.; Kaid, Z. Modal Regression Estimation by Local Linear Approach in High-Dimensional Data Case. Axioms 2025, 14, 537.
Jones, D.A. Nonlinear autoregressive processes. Proc. R. Soc. Lond. A 1978, 360, 71–95.
Ozaki, T. Nonlinear Time Series Models for Nonlinear Random Vibrations; Technical Report; University of Manchester: Manchester, UK, 1979.
Engle, R.F. Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation. Econometrica 1982, 50, 987–1007.
Al-Awadhi, F.A.; Kaid, Z.; Laksaci, A.; Ouassou, I.; Rachdi, M. Functional data analysis: Local linear estimation of the L₁-conditional quantiles. Stat. Methods Appl. 2019, 28, 217–240.
Koumar, J.; Hynek, K.; Čejka, T.; Šiška, P. CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting. Sci. Data 2025, 12, 338.

Figure 1. A small sample of functional regressor presented in different colors.

Figure 2. Flow during 40 weeks.

Figure 3. The daily curve of flow.

Figure 4. Prediction results.

Table 1. MSE results for different sample sizes n and various values of $MF$.

Multiplicative Factor MF	m	Sample Size n	MSE
MF = 1	$m = 0.2$	50	0.94
	$m = 0.5$	50	0.72
	$m = 0.8$	50	0.64
	$m = 0.2$	100	0.71
	$m = 0.5$	100	0.56
	$m = 0.8$	100	0.47
	$m = 0.2$	250	0.12
	$m = 0.5$	250	0.32
	$m = 0.8$	250	0.58
MF = 5	$m = 0.2$	50	1.08
	$m = 0.5$	50	0.92
	$m = 0.8$	50	0.80
	$m = 0.2$	100	0.92
	$m = 0.5$	100	0.81
	$m = 0.8$	100	0.63
	$m = 0.2$	250	0.44
	$m = 0.5$	250	0.51
	$m = 0.8$	250	0.53
MF = 10	$m = 0.2$	50	1.13
	$m = 0.5$	50	1.04
	$m = 0.8$	50	0.93
	$m = 0.2$	100	0.92
	$m = 0.5$	100	0.88
	$m = 0.8$	100	0.75
	$m = 0.2$	250	0.63
	$m = 0.5$	250	0.70
	$m = 0.8$	250	0.74

Table 2. MSE results for different scenarios.

Future Time Horizon h	m	Selector Rule	Locating Function	MSE
h = 0.5 T	$m = 0.2$	Rule 1	$M_{1}$	0.21
			$M_{2}$	0.24
		Rule 2	$M_{1}$	0.28
			$M_{2}$	0.32
		Rule 3	$M_{1}$	0.15
			$M_{2}$	0.22
	$m = 0.5$	Rule 1	$M_{1}$	0.43
			$M_{2}$	0.52
		Rule 2	$M_{1}$	0.43
			$M_{2}$	0.63
		Rule 3	$M_{1}$	0.35
			$M_{2}$	0.41
	$m = 0.8$	Rule 1	$M_{1}$	0.66
			$M_{2}$	0.71
		Rule 2	$M_{1}$	0.79
			$M_{2}$	0.68
		Rule 3	$M_{1}$	0.57
			$M_{2}$	0.69
h = 2 T	$m = 0.2$	Rule 1	$M_{1}$	0.93
			$M_{2}$	0.96
		Rule 2	$M_{1}$	1.05
			$M_{2}$	0.32
		Rule 3	$M_{1}$	0.87
			$M_{2}$	0.84
	$m = 0.5$	Rule 1	$M_{1}$	1.11
			$M_{2}$	1.23
		Rule 2	$M_{1}$	1.22
			$M_{2}$	1.45
		Rule 3	$M_{1}$	1.35
			$M_{2}$	1.21
	$m = 0.8$	Rule 1	$M_{1}$	1.33
			$M_{2}$	1.41
		Rule 2	$M_{1}$	1.39
			$M_{2}$	1.36
		Rule 3	$M_{1}$	1.35
			$M_{2}$	1.42

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Kaid, Z.; Alamari, M.B. Data-Driven Modeling of Web Traffic Flow Using Functional Modal Regression. Axioms 2025, 14, 815. https://doi.org/10.3390/axioms14110815

