Data-Driven Estimation of End-to-End Delay Probability Density Function for Time-Sensitive WiFi Networks

Cao, Jianyu; Dai, Yujun; Huang, Shuping; Zhang, Minghe

doi:10.3390/electronics14122324


by Jianyu Cao^{1,2,*}, Yujun Dai^{1,2}, Shuping Huang^{1,†} and Minghe Zhang^{1,*,†}

¹ School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China

² State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

Electronics 2025, 14(12), 2324; https://doi.org/10.3390/electronics14122324

Submission received: 13 May 2025 / Revised: 3 June 2025 / Accepted: 4 June 2025 / Published: 6 June 2025


Abstract

Time-sensitive applications require the End-to-End (E2E) delay of wireless networks to be deterministic. For example, control signals in industrial automation, intelligent transportation, and telemedicine must reach their destinations within milliseconds, with delay jitter kept within microseconds. To formulate effective policies for keeping the E2E delay within a small deterministic range, it is essential to estimate the probability density function (PDF) of the E2E delay. Data-driven methods based on mixture density networks have been used to estimate the E2E delay PDF in wireless networks. However, in WiFi networks, the estimates produced by existing methods show significant discrepancies from, and fluctuations around, actual measurements. Motivated by this, an improved estimation method is proposed in which the delay PDF is divided into three coupled segments, each with its own functional form. The parameters are estimated in two stages. First, the two thresholds dividing the PDF into three segments are calculated from the variation trend of E2E delay measurements. Second, the remaining parameters are obtained by training an improved mixture density network. Experimental results indicate that the E2E delay PDF obtained by the proposed method deviates less from actual measurements than those obtained by existing methods. Specifically, the mean absolute errors and average fluctuation amplitudes of tail probabilities at certain delay values decrease by at least one order of magnitude. Moreover, the multiple-segmentation feature of the proposed method makes it more robust when measurement data are affected by low levels of Gaussian noise.

Keywords:

deterministic wireless network; end-to-end delay; mixture density network; time-sensitive network

1. Introduction

Advances in communication, computation, and artificial intelligence continue to drive the development of time-sensitive applications [1], such as industrial automation, intelligent transportation, telemedicine, data center networking, and mobile edge computing (MEC) [2]. Time-sensitive applications require that the End-to-End (E2E) transmission delay remain within the millisecond range and that the delay jitter stay within the microsecond range, both with high probability [3,4,5,6]. Time-Sensitive Networking (TSN) is a set of standards developed to meet these stringent requirements in Ethernet networks [7]. As depicted in Figure 1, TSN employs traffic shaping mechanisms such as IEEE 802.1Qav and IEEE 802.1Qbv to allocate deterministic transmission slots to high-priority traffic. This guarantees transmission bandwidth and precise timing for critical operations, thereby enhancing the reliability and delay determinism of key services. For example, Wang et al. [8] conducted two experiments on two network test facilities: the China Environment for Network Innovations (CENI) and the Yangtze River Delta Comprehensive Test Environment (YZNET). In CENI, the communication distance spans over 2000 km with 11 hops, resulting in an average delay of approximately 27.440 ms. In YZNET, the communication distance exceeds 500 km with 6 hops, yielding an average delay of about 6.625 ms. Notably, a maximum jitter of less than 30 µs was achieved, regardless of congestion level or communication distance. In wireless networks, however, dynamic node behavior, energy limitations, and constantly changing environments pose significant challenges to implementing TSN technologies. 5G ultra-reliable low-latency communication (URLLC) has made remarkable progress in reducing E2E delay. However, it does not adequately account for queueing delay, which may still pose challenges in certain real-time applications. As a result, a gap remains in meeting the stringent delay requirements of time-sensitive applications, and ensuring a deterministic E2E delay remains a critical objective in wireless networking. Achieving deterministic control of E2E delay in wireless networks requires accurate delay estimation, and the average delay alone is insufficient. The delay probability density function (PDF) must therefore be estimated so that resource and queue management strategies can be adjusted accordingly, ensuring the timely transmission of critical data even as network conditions change.

WiFi 7 is expected to incorporate TSN capabilities in the future, enabling ultra-reliable, low-latency communication in unlicensed frequency bands [3]. Unlike mobile networks, WiFi networks are predominantly deployed indoors. Consequently, delay fluctuations in WiFi networks are less pronounced than those in mobile networks, and estimating the E2E delay PDF in WiFi networks is less complex, although the two problems share similarities. This paper focuses on estimating the E2E delay PDF in WiFi networks, and we expect the findings to be extendable to mobile networks. Existing methods for estimating network delay PDFs fall into two categories: theoretical analysis approaches and data-driven methods. Theoretical analysis methods generally rely on a set of assumptions and use mathematical tools to solve and analyze system behavior; however, these assumptions may not always hold in practice. In recent years, applying machine learning to network delay estimation has attracted significant attention [9]. Progress in this field is driven primarily by data-driven approaches, in which neural network-based machine learning techniques learn the relationships between delay and other variables from real-world data.

As a representative data-driven approach, Mixture Density Networks (MDNs) have been widely applied and validated for delay PDF estimation in complex systems. An MDN combines a neural network with a probabilistic model such as the Gaussian Mixture Model (GMM): a fully connected neural network estimates the parameters of the mixture model, thereby producing a conditional PDF. Recently, Mostafavi et al. [10] combined an MDN with the Generalized Pareto Distribution (GPD) to estimate the E2E delay PDF of several real networks, including Commercial Off-The-Shelf (COTS) 5G, Open Air Interface (OAI) 5G, and Mango COMM IEEE 802.11 (WiFi). The MDN maps transmission conditions or traffic characteristics to the parameters of the PDF and thereby generates the PDF. However, for WiFi networks, there is a noticeable gap between the estimated and measured results, particularly in the tail of the PDF. To address this issue, this paper proposes a two-stage and three-divided estimation method, referred to as TSTD, for estimating the E2E delay PDF in WiFi networks. The main contributions are summarized as follows.

A three-divided PDF is constructed, consisting of a main part and a two-segment tail part. The main part is modeled by a GMM, while the two segments of the tail part are modeled by distinct GPDs. In addition, the coupling between the segments of the main part and the tail part is taken into account.
The thresholds of the GPDs are determined from the variation trend of the tail probability in E2E delay measurements, and the remaining parameters of the PDF are obtained by training an MDN.
Compared with existing methods, the E2E delay PDF obtained using the proposed method exhibits a smaller discrepancy from actual measurements. In particular, the mean absolute errors and average fluctuation amplitudes of tail probabilities at certain delay values are reduced by at least one order of magnitude. Moreover, the multiple-segmentation feature of the proposed method enhances its robustness in situations where measurement data are affected by low levels of Gaussian noise.

The remainder of this paper is structured as follows. Section 2 reviews related work. Section 3 explains the research motivation and states the problem. Section 4 details the proposed two-stage and three-divided estimation method. Section 5 describes the experimental procedures and performance analysis. Section 6 discusses the limitations of the proposed method and outlines future work. Finally, Section 7 concludes the paper.

2. Related Work

Methods for estimating the E2E delay probability distribution in various application scenarios have attracted significant attention. Cao et al. [11] modeled the Mobile Edge Computing (MEC) network as a two-stage tandem queueing system and then proposed a method for estimating the probability distribution of the E2E delay using the matrix-geometric method. Mei et al. [12] modeled and analyzed the delay bound of a multi-cluster MEC network using the stochastic network calculus (SNC) approach. Cui et al. [13], also using SNC, proposed a method for calculating the E2E delay violation probability of target traffic in the industrial Internet of Things (IoT). Coll-Perales et al. [14] proposed a 5G E2E delay model for Vehicle-to-Network (V2N) and Vehicle-to-Network-to-Vehicle (V2NV2N) communications and used it to quantify and analyze the 5G E2E delay performance of V2N and V2NV2N communications under various 5G network deployments and configurations.

All of the aforementioned methods rely on theoretical analysis, which generally rests on a series of assumptions and employs mathematical tools to solve and analyze system behavior. However, these assumptions may not always hold in practice. For this reason, data-driven methods have been extensively studied for estimating network delay probability distributions.

Fadhil et al. [15] proposed modeling the E2E delay of 5G networks with a GMM, whose parameters were estimated from E2E delay data using the Expectation-Maximization (EM) algorithm. Their results indicate that accuracy improves as the numbers of data samples and GMM components increase. However, the computation time also increases rapidly with the numbers of samples and components; in particular, it grows approximately exponentially with the number of GMM components [15]. In [16], Chen et al. employed a deep learning approach to generate the probability density function (PDF) of practical data. The cumulative distribution functions (CDFs) of common probability distributions were used as activation functions in the hidden layers of the proposed deep learning model to learn actual cumulative probabilities, and the derivative of the trained model can then be used to estimate the PDF. Experimental results, assessed by the mean absolute percentage error, demonstrated that this method estimates both the CDF and the PDF accurately. In [17], an MDN combined with a GMM was used to estimate the delay probability distribution of single-stage queueing systems; Raeis et al. [18] subsequently extended this work to more complex systems, namely service function chains. The methods in [17,18] have proven effective in fitting the main part of the delay probability distribution. However, their fit to the tail part of the distribution is less satisfactory, because the tail probabilities of GMM-based models decay at least exponentially fast (a Gaussian tail decays faster than any exponential), which can lead to significant errors for low-probability events, especially when the underlying distribution is heavy-tailed.
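To make the EM-based GMM fitting mentioned above concrete, the following is a minimal NumPy sketch of EM for a one-dimensional GMM. It is an illustrative assumption, not the implementation used in [15]; the function name `fit_gmm_em` and the quantile-based initialization are our own choices.

```python
import numpy as np


def fit_gmm_em(y, n_components=3, n_iter=200):
    """Fit a 1-D Gaussian mixture to samples y with the EM algorithm.

    Returns (weights, means, standard deviations), each of length n_components.
    """
    y = np.asarray(y, dtype=float)
    # Deterministic initialisation: equal weights, means at evenly spaced quantiles.
    pi = np.full(n_components, 1.0 / n_components)
    mu = np.quantile(y, (np.arange(n_components) + 0.5) / n_components)
    sigma = np.full(n_components, y.std())
    for _ in range(n_iter):
        # E-step: log of pi_i * N(y_n | mu_i, sigma_i) for every sample/component pair.
        z = (y[:, None] - mu) / sigma
        log_p = np.log(pi) - np.log(sigma) - 0.5 * np.log(2 * np.pi) - 0.5 * z**2
        log_p -= log_p.max(axis=1, keepdims=True)  # avoid underflow before exp
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)  # responsibilities, rows sum to 1
        # M-step: closed-form updates of weights, means and standard deviations.
        nk = np.maximum(r.sum(axis=0), 1e-12)  # guard against empty components
        pi = nk / nk.sum()
        mu = (r * y[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (y[:, None] - mu) ** 2).sum(axis=0) / nk)
        sigma = np.maximum(sigma, 1e-6)  # keep every component non-degenerate
    return pi, mu, sigma
```

Each iteration touches all N × I sample–component pairs; the total run time also depends on how many iterations EM needs to converge.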

Several studies have shown that better fits can be obtained when the main and tail parts of the delay probability distribution are fitted with different functions. Yasuda et al. [19] used a bistate model to simulate wireless networks with connection and disconnection processes. For the E2E delay probability distribution, the main part, which arises during the connection process, was fitted with a shifted Gamma distribution, whereas the tail part, which results from the accumulation of probe packets during the disconnection phase, was approximated by an exponential distribution. This approach outperformed traditional methods in terms of both the negative log predictive density (NLPD) [20] and the continuous ranked probability score (CRPS) [21]. In addition, the GPD is a classical, asymptotically motivated model for the unknown excess distribution above high thresholds [22,23]; consequently, it has been widely applied to fit the tail of the delay PDF [10,24]. In [24], Mostafavi et al. proposed an extreme value mixture model based on the MDN by integrating a GPD tail model with the GMM: the GMM captures the main part of the delay PDF, while the GPD characterizes its tail. Numerical experiments on a three-stage tandem queueing system demonstrated that this method outperforms existing state-of-the-art GMM-based estimation techniques. Furthermore, Mostafavi et al. [10] integrated the MDN with the GPD to estimate the PDF of E2E delay in three wireless network scenarios, namely COTS 5G, OAI 5G, and Mango COMM IEEE 802.11 (WiFi), again using the GMM for the main part of the delay probability distribution and the GPD for the tail part. For the 5G scenarios, this approach achieved robust performance through noise regularization when the tail profile was nonsmooth. However, in the WiFi network scenario, the estimation results showed significant deviations and fluctuations relative to actual measurements.

In [10], the threshold parameter of the GPD is obtained through training. It can also be determined by other means, such as empirical approaches and graphical techniques [25]. Empirical methods rely on an analyst's statistical knowledge and expertise to select an appropriate threshold, which may introduce bias. Graphical diagnostic methods [26], which inspect specific plots of the measured data to observe trends and guide threshold selection, have been widely used. Cyrille et al. [27] used graphical diagnostic tools to determine a threshold range, selected specific thresholds within it, and optimized the GPD parameters using the Kolmogorov–Smirnov (KS) goodness-of-fit test. Using actual hydrological data, they further showed that the accuracy of the estimated parameters improves with larger sample sizes. Zhao et al. [28] pointed out that GPD parameter estimation is independent of threshold selection. They compared three threshold selection procedures for choosing an appropriate threshold above which the GPD is fitted asymptotically. The results demonstrated that the resulting estimator performed satisfactorily in environmental data analysis, particularly in fitting the tail probability distribution.
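One widely used graphical diagnostic of the kind mentioned above is the mean residual life (mean excess) plot. Above a threshold $u_0$ where a GPD with shape $ξ < 1$ fits well, the mean excess $e(u) = E[Y - u \mid Y > u]$ equals $(β + ξ(u - u_0))/(1 - ξ)$, so the empirical curve should be roughly linear in $u$. The minimal NumPy sketch below computes the empirical $e(u)$; the function name `mean_excess` is ours, and the sketch is not taken from [25,26,27,28].

```python
import numpy as np


def mean_excess(y, thresholds):
    """Empirical mean excess e(u) = mean(y - u | y > u) for each candidate threshold u.

    Returns NaN for thresholds with no exceedances.
    """
    y = np.asarray(y, dtype=float)
    values = []
    for u in thresholds:
        excess = y[y > u] - u  # exceedances above the candidate threshold
        values.append(excess.mean() if excess.size else np.nan)
    return np.array(values)
```

For exponential data ($ξ = 0$), the curve is flat at the scale $β$. In practice, one looks for the lowest $u$ beyond which the plot becomes approximately linear.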

Building on the strengths of the works above, this paper proposes an improved method for estimating the E2E delay PDF in WiFi networks more accurately. First, the threshold parameters of the GPDs are determined from the variation trend of E2E delay measurements. Then, the remaining parameters are estimated via the MDN. Moreover, the tail part of the delay PDF is divided into two segments, each fitted by a distinct GPD.

3. Research Motivation and Problem Statement

3.1. Motivation for E2E Delay PDF Estimation

In wireless communication networks, delay management is one of the critical factors for ensuring system reliability. The delay PDF can reveal the behavioral characteristics of delays within the network, particularly the tail section extending to the far right, which represents the occurrence of rare delay events. By analyzing these tail events, it becomes possible to predict and mitigate long delays that may affect system performance. For instance, by accurately modeling the delay PDF, the system can anticipate potential congestion or queueing issues and accordingly adjust transmission scheduling, buffer sizes, or retransmission strategies. Moreover, in priority-based scheduling systems, analyzing the delay PDF enables network controllers to differentiate service levels more effectively based on expected delay characteristics. This facilitates the allocation of sufficient bandwidth or transmission opportunities to critical data packets, especially during periods of high traffic load. Consequently, understanding and leveraging the delay PDF is crucial for designing adaptive resource management policies.

3.2. Problem Statement

The conditional PDF of the E2E delay $Y$ of packets with length $X$ in WiFi networks is estimated from measured delay data, as illustrated in Figure 2.

Let $\{(x_{n}, y_{n}) : n = 0, 1, \dots, N\}$ denote the delay dataset measured over a period of time, where $(x_{n}, y_{n})$ is the record of the $n$-th packet, $x_{n} \in S = \{s_{k} : k = 0, 1, \dots, K\}$ with $s_{k} > 0$ denotes the packet length, and $y_{n} > 0$ denotes the E2E (uplink or downlink) delay. For packets with length $X = x \in S$, the conditional PDF $h(y \mid \hat{θ}(x))$ is estimated from samples of the delay dataset, where $\hat{θ}(x)$ is the estimated parameter vector. In addition, the CDF is denoted by $H(y \mid \hat{θ}(x)) = P\{Y \leq y \mid \hat{θ}(x)\}$, and the tail probability by $\bar{H}(y \mid \hat{θ}(x)) = 1 - H(y \mid \hat{θ}(x))$.

4. Two-Stage and Three-Divided Estimation Method

The proposed method, TSTD, consists of two steps, PDF construction and parameter estimation, which are described below.

h(y \mid \hat{θ}(x)) = \begin{cases} f(y \mid \hat{θ}_{0}(x)), & y \leq u_{1}(x), \\ α_{1}(x)\,[1 - F(u_{1}(x) \mid \hat{θ}_{0}(x))]\,g_{1}(y \mid \hat{θ}_{1}(x)) + (1 - α_{1}(x))\,f(y \mid \hat{θ}_{0}(x)), & u_{1}(x) < y \leq u_{2}(x), \\ α_{1}(x)\,α_{2}(x)\,[1 - F(u_{1}(x) \mid \hat{θ}_{0}(x))]\,[1 - G_{1}(u_{2}(x) \mid \hat{θ}_{1}(x))]\,g_{2}(y \mid \hat{θ}_{2}(x)) \\ \quad + α_{1}(x)\,(1 - α_{2}(x))\,[1 - F(u_{1}(x) \mid \hat{θ}_{0}(x))]\,g_{1}(y \mid \hat{θ}_{1}(x)) \\ \quad + (1 - α_{1}(x))\,f(y \mid \hat{θ}_{0}(x)), & y > u_{2}(x). \end{cases}

(1)

4.1. PDF Construction

The PDF $h(y \mid \hat{θ}(x))$ is expressed in the form of (1), where

\hat{θ}(x) = (α_{1}(x), α_{2}(x), \hat{θ}_{0}(x), \hat{θ}_{1}(x), \hat{θ}_{2}(x)).

(2)

As illustrated in Figure 3, $h(y \mid \hat{θ}(x))$ comprises a main part and a tail part, the latter being further divided into two segments.

4.1.1. Main Part

The segment $y \leq u_{1}(x)$ is modeled by the GMM, whose PDF $f(y \mid \hat{θ}_{0}(x))$ is

f(y \mid \hat{θ}_{0}(x)) = \sum_{i = 1}^{I} π_{i}(x)\,N(y \mid μ_{i}(x), σ_{i}(x)),

(3)

where

\hat{θ}_{0}(x) = (π_{i}(x), μ_{i}(x), σ_{i}(x) : i = 1, 2, \dots, I).

(4)

$N(y \mid μ_{i}(x), σ_{i}(x))$ denotes the Gaussian PDF with mean $μ_{i}(x)$ and standard deviation $σ_{i}(x)$, and $π_{i}(x)$ ($0 < π_{i}(x) < 1$, $\sum_{i=1}^{I} π_{i}(x) = 1$) denotes the weight of the $i$-th component. The corresponding CDF is $F(y \mid \hat{θ}_{0}(x)) = P\{Y \leq y \mid \hat{θ}_{0}(x)\}$, which appears in (1) evaluated at $y = u_{1}(x)$.

4.1.2. Tail Part

For the segment $u_{1}(x) < y \leq u_{2}(x)$, the PDF is a mixture of $f(y \mid \hat{θ}_{0}(x))$ and $g_{1}(y \mid \hat{θ}_{1}(x))$, as given by the second case of (1). For the segment $y > u_{2}(x)$, the PDF is a mixture of $f(y \mid \hat{θ}_{0}(x))$ and $g_{m}(y \mid \hat{θ}_{m}(x))$, $m = 1, 2$, as given by the third case of (1). Here, $α_{m}(x)$ is a mixture weight, and $g_{m}(y \mid \hat{θ}_{m}(x))$ is the PDF of the GPD, defined as

g_{m}(y \mid \hat{θ}_{m}(x)) = \begin{cases} \frac{1}{β_{m}(x)} \left(1 + \frac{ξ_{m}(x)\,(y - u_{m}(x))}{β_{m}(x)}\right)^{-\frac{1}{ξ_{m}(x)} - 1}, & ξ_{m}(x) \neq 0, \\ \frac{1}{β_{m}(x)} \exp\left(-\frac{y - u_{m}(x)}{β_{m}(x)}\right), & ξ_{m}(x) = 0, \end{cases}

(5)

where

\hat{θ}_{m}(x) = (β_{m}(x), ξ_{m}(x), u_{m}(x)),

(6)

with scale $β_{m}(x) > 0$, shape $ξ_{m}(x)$, and threshold $u_{m}(x)$. The support is $y \geq u_{m}(x)$ when $ξ_{m}(x) \geq 0$, and $u_{m}(x) \leq y \leq u_{m}(x) - β_{m}(x)/ξ_{m}(x)$ when $ξ_{m}(x) < 0$. The corresponding CDF is $G_{m}(y \mid \hat{θ}_{m}(x)) = P\{Y \leq y \mid \hat{θ}_{m}(x)\}$, $m = 1, 2$; in (1), $G_{1}$ is evaluated at $y = u_{2}(x)$.

The motivations for dividing the tail part into two segments and for coupling the segments are as follows. The tail probability of the measured delay initially decreases slowly; once the delay exceeds a certain threshold, it begins to fluctuate and then decreases more rapidly. Although the GPD can model heavy tails, a single GPD cannot capture this variation in the tail probability. Therefore, the tail part is divided into two segments, each fitted by a distinct GPD. Additionally, according to [16], fitting the probability distribution of actual data with a mixture of multiple probability distributions yields good results. Hence, the coupling between the segments fitted by the GMM and the GPDs is taken into account.
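Once its parameters are known, the three-segment PDF in (1) can be evaluated directly. The following pure-Python sketch implements the GMM of (3), the standard GPD density (5) with scale β, shape ξ and threshold u, and the piecewise combination (1) for a single packet length. The dictionary layout of `theta` and all function names are hypothetical choices for illustration.

```python
import math


def gmm_pdf(y, pi, mu, sigma):
    """Gaussian mixture density, Eq. (3)."""
    return sum(p * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
               for p, m, s in zip(pi, mu, sigma))


def gmm_cdf(y, pi, mu, sigma):
    """Gaussian mixture CDF F(y)."""
    return sum(p * 0.5 * (1.0 + math.erf((y - m) / (s * math.sqrt(2.0))))
               for p, m, s in zip(pi, mu, sigma))


def gpd_pdf(y, beta, xi, u):
    """Generalized Pareto density with scale beta > 0, shape xi, threshold u."""
    z = (y - u) / beta
    if z < 0:
        return 0.0                      # below the threshold
    if xi == 0:
        return math.exp(-z) / beta      # exponential limit
    t = 1.0 + xi * z
    if t <= 0:
        return 0.0                      # beyond the upper end point when xi < 0
    return t ** (-1.0 / xi - 1.0) / beta


def gpd_cdf(y, beta, xi, u):
    """Generalized Pareto CDF G(y)."""
    z = (y - u) / beta
    if z <= 0:
        return 0.0
    if xi == 0:
        return 1.0 - math.exp(-z)
    t = 1.0 + xi * z
    if t <= 0:
        return 1.0
    return 1.0 - t ** (-1.0 / xi)


def h_pdf(y, theta):
    """Three-segment delay PDF of Eq. (1) for one packet length x."""
    a1, a2 = theta["alpha1"], theta["alpha2"]
    pi, mu, sigma = theta["pi"], theta["mu"], theta["sigma"]
    b1, xi1, u1 = theta["gpd1"]
    b2, xi2, u2 = theta["gpd2"]
    f = gmm_pdf(y, pi, mu, sigma)
    if y <= u1:                                   # main part
        return f
    tail1 = 1.0 - gmm_cdf(u1, pi, mu, sigma)      # 1 - F(u1)
    g1 = gpd_pdf(y, b1, xi1, u1)
    if y <= u2:                                   # first tail segment
        return a1 * tail1 * g1 + (1.0 - a1) * f
    tail2 = 1.0 - gpd_cdf(u2, b1, xi1, u1)        # 1 - G1(u2)
    g2 = gpd_pdf(y, b2, xi2, u2)
    return (a1 * a2 * tail1 * tail2 * g2          # second tail segment
            + a1 * (1.0 - a2) * tail1 * g1
            + (1.0 - a1) * f)
```

Integrating the three cases shows that the α-weighted terms only redistribute the mass $1 - F(u_1(x))$ that the GMM leaves above $u_1(x)$, so $h$ integrates to one for any $α_1, α_2 \in [0, 1]$. A trapezoidal sum of `h_pdf` over a wide grid gives a quick numerical check.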

4.2. Parameter Estimation for PDF

For the PDF $h(y \mid \hat{θ}(x))$, the parameter vector $\hat{θ}(x) = (α_{1}(x), α_{2}(x), \hat{θ}_{0}(x), \hat{θ}_{1}(x), \hat{θ}_{2}(x))$ is estimated in two stages, as illustrated in Figure 4. In the first stage, the thresholds $u_{1}(x)$ and $u_{2}(x)$ in $\hat{θ}_{1}(x)$ and $\hat{θ}_{2}(x)$ are determined. In the second stage, the remaining parameters, which form the parameter vector $\hat{θ}'(x)$, are estimated via the MDN:

\hat{θ}'(x) = (α_{1}(x), α_{2}(x), \hat{θ}_{0}(x), β_{1}(x), ξ_{1}(x), β_{2}(x), ξ_{2}(x)).

(7)

4.2.1. First Stage: Determine the Thresholds $u_{1} (x)$ and $u_{2} (x)$

The threshold $u_{1}(x)$ separates the main and tail parts of the delay PDF for packet length $x$. For heavy-tailed distributions, significant fluctuations typically occur around the inflection point between the main and tail parts. Consequently, $u_{1}(x)$ is defined as the delay value at which the absolute value of the differential tail probability attains its maximum, and it is computed by Algorithm 1. The inputs are the delay dataset for packets with length $x$, $\{(x, y_{1}), (x, y_{2}), \dots, (x, y_{N_{x}})\}$; the size $T + 1$ of the tail probability sequence; and the interval $δ$ between adjacent delays in the tail probability sequence, e.g., $δ = 0.5$ ms. The outputs are the threshold $u_{1}(x)$ and the tail probability sequence $\{\bar{H}_{i} : i = 0, 1, \dots, T\}$.

Algorithm 1: Calculate threshold $u_{1}(x)$
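The body of Algorithm 1 is not reproduced here, so the following NumPy sketch is only one possible reading of the description above. It assumes that the tail probability sequence is evaluated on the grid $y_0 + iδ$, $i = 0, \dots, T$, starting at the smallest measured delay $y_0$ (the text does not specify the starting point), and that the forward difference $\bar{H}_{i+1} - \bar{H}_{i}$ is attributed to the left grid point $y_0 + iδ$. The function name `threshold_u1` is hypothetical.

```python
import numpy as np


def threshold_u1(delays_ms, T=200, delta=0.5, y0=None):
    """Sketch of Algorithm 1: return (u1, tail, y0).

    tail[i] is the empirical tail probability P{Y > y0 + i * delta}, i = 0..T.
    Assumption: y0 defaults to the smallest measured delay.
    """
    y = np.sort(np.asarray(delays_ms, dtype=float))
    if y0 is None:
        y0 = float(y[0])
    grid = y0 + delta * np.arange(T + 1)
    # Fraction of samples strictly greater than each grid delay.
    tail = 1.0 - np.searchsorted(y, grid, side="right") / y.size
    # |H_bar_{i+1} - H_bar_i|: absolute differential tail probability, i = 0..T-1.
    diff = np.abs(np.diff(tail))
    i_star = int(np.argmax(diff))
    u1 = float(grid[i_star])  # assumption: left end point of the steepest step
    return u1, tail, y0
```

With $δ = 0.5$ ms and $T = 200$, the settings used in Section 5.2, the grid spans 100 ms above $y_0$.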

The threshold $u_{2}(x)$ divides the tail part into two segments. Analysis of the measured delay dataset reveals that the tail initially decreases slowly and then decreases more rapidly, with the transition occurring at a convex bend. Consequently, $u_{2}(x)$ is defined as the delay value corresponding to the starting point of the convex bend with the maximum vertical drop, as illustrated in Figure 5. It is determined by Algorithm 2, whose inputs are $T$, $δ$, $u_{1}(x)$, and the tail probability sequence $\{\bar{H}_{i} : i = 0, 1, \dots, T\}$.

Algorithm 2: Calculate threshold $u_{2}(x)$

Algorithm 2 comprises three steps. In the first step (lines 1–11 of Algorithm 2), the starting points of the convex bends in the tail probability sequence $\{\bar{H}_{i} : i = 0, 1, \dots, T\}$ are identified; these points are also referred to as the starting points of convex point clusters, where a convex point cluster is a group of consecutive convex points. The point $(i + 1, \bar{H}_{i + 1})$, $i = 0, 1, \dots, T - 2$, is termed a convex point if

\frac{\bar{H}_{i} - \bar{H}_{i + 2}}{2} + \bar{H}_{i + 2} < \bar{H}_{i + 1},

(8)

that is, if $\bar{H}_{i + 1}$ lies above the midpoint of the chord joining $(i, \bar{H}_{i})$ and $(i + 2, \bar{H}_{i + 2})$.

In the second step (lines 12–26 of Algorithm 2), the convex bends in the tail probability sequence are identified and their vertical drops are calculated. A set of consecutive convex points $\{(i, \bar{H}_{i}), (i + 1, \bar{H}_{i + 1}), \dots, (j - 1, \bar{H}_{j - 1})\}$ forms a convex bend if

\frac{\bar{H}_{i} - \bar{H}_{j}}{i - j} \cdot (k - j) + \bar{H}_{j} < \bar{H}_{k}

(9)

holds for all $k \in \{i + 1, i + 2, \dots, j - 1\}$, i.e., if every interior point lies above the chord joining $(i, \bar{H}_{i})$ and $(j, \bar{H}_{j})$. The vertical drop of such a convex bend is $\bar{H}_{i} - \bar{H}_{j - 1}$.

In the third step (lines 27–33 of Algorithm 2), $u_{2}(x)$ is set to the delay value corresponding to the starting point of the convex bend with the maximum vertical drop located to the right of $u_{1}(x)$.
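Algorithm 2 is likewise described only in prose here, so the NumPy sketch below is a hypothetical reading of its three steps, not the authors' code. It marks convex points with (8) and groups each maximal run of consecutive convex points into a candidate bend. As an assumption not stated in the text, it then shortens the bend from the right until (9) holds for every interior point. Finally, it returns the start of the bend with the largest vertical drop $\bar{H}_{i} - \bar{H}_{j-1}$ lying to the right of $u_{1}(x)$. The grid convention ($\bar{H}_{i}$ at $y_0 + iδ$) and the name `threshold_u2` are also assumptions.

```python
import numpy as np


def threshold_u2(tail, u1, delta=0.5, y0=0.0):
    """Sketch of Algorithm 2; tail[i] is H_bar_i at delay y0 + i * delta.

    Returns u2, or None if no convex bend starts to the right of u1.
    """
    H = np.asarray(tail, dtype=float)
    T = H.size - 1
    # Step 1: (i + 1) is a convex point if H_{i+1} lies above the chord midpoint, Eq. (8).
    convex = np.zeros(T + 1, dtype=bool)
    for i in range(T - 1):
        if (H[i] - H[i + 2]) / 2 + H[i + 2] < H[i + 1]:
            convex[i + 1] = True

    def above_chord(i, j):
        # Eq. (9): every interior point k lies above the chord from (i, H_i) to (j, H_j).
        return all((H[i] - H[j]) / (i - j) * (k - j) + H[j] < H[k]
                   for k in range(i + 1, j))

    # Step 2: convex bends and their vertical drops.
    bends = []  # (start index i, vertical drop H_i - H_{j-1})
    i = 0
    while i <= T:
        if not convex[i]:
            i += 1
            continue
        end = i
        while end + 1 <= T and convex[end + 1]:
            end += 1                      # extend the run of consecutive convex points
        j = end + 1                       # chord end point just after the run
        while j > i + 1 and not above_chord(i, j):
            j -= 1                        # assumption: shrink until Eq. (9) holds
        bends.append((i, H[i] - H[j - 1]))
        i = end + 1
    # Step 3: largest drop among bends starting to the right of u1.
    candidates = [(i, drop) for i, drop in bends if y0 + i * delta > u1]
    if not candidates:
        return None
    i_best, _ = max(candidates, key=lambda c: c[1])
    return y0 + i_best * delta
```

A geometrically decaying sequence such as $\bar{H}_{i} = 0.5^{i}$ has no convex points under (8), so the sketch returns `None` for it.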

4.2.2. Second Stage: Train the Remaining Parameters

The parameter vector $\hat{θ}'(x)$ is estimated using an MDN comprising one input layer, four hidden layers, one output layer, and one custom layer, as illustrated in Figure 4. The custom layer sets the thresholds obtained in the first stage. The output layer consists of $6 + 3I$ neurons that output the parameters in $\hat{θ}'(x)$: $3I$ for the GMM weights, means, and standard deviations, plus $α_{1}, α_{2}, β_{1}, ξ_{1}, β_{2}, ξ_{2}$. The input layer contains a single neuron, into which the packet lengths $x_{n}$ from the dataset $\{(x_{n}, y_{n}) : n = 0, 1, \dots, N\}$ are fed in batches of size $N_{b}$ during each epoch. The delays $y_{n}$ are used in the loss function, the NLPD [20], defined as

Loss = - \frac{1}{N_{b}} \sum_{n = k N_{b}}^{(k + 1) N_{b} - 1} \log h(y_{n} \mid \hat{θ}(x_{n})),

(10)

where $k = 0, 1, \dots$ indexes the batches and $(k + 1) N_{b} - 1 \leq N$. The training process is carried out in multiple rounds with different learning rates, and each round is divided into several epochs.
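Equation (10) averages the negative log density over one batch of $N_{b}$ consecutive samples. The short NumPy sketch below evaluates it for every full batch, given the densities $h(y_{n} \mid \hat{θ}(x_{n}))$ already computed by a model. The function name and the small floor that avoids $\log 0$ are our assumptions and are not part of (10).

```python
import numpy as np


def nlpd_per_batch(density, batch_size):
    """Loss of Eq. (10) for every full batch k = 0, 1, ...

    density[n] holds h(y_n | theta_hat(x_n)) for n = 0..N. Samples that do not fill a
    complete batch are dropped, matching the constraint (k + 1) * N_b - 1 <= N.
    """
    d = np.asarray(density, dtype=float)
    n_batches = d.size // batch_size
    d = d[: n_batches * batch_size].reshape(n_batches, batch_size)
    # Floor the densities (assumption) so that log(0) cannot produce inf.
    return -np.log(np.maximum(d, 1e-300)).mean(axis=1)
```

For example, densities `[0.5, 0.5, 0.25, 0.25, 0.1]` with $N_{b} = 2$ give two batch losses, $\ln 2$ and $\ln 4$; the fifth sample does not complete a batch and is dropped.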

The training process above is adapted from [10]. Unlike [10], the PDF in this paper comprises a main part and a two-segment tail part, the two tail segments are modeled by distinct GPDs, and the coupling between segments is taken into account. The thresholds dividing the three segments are determined from the variation trend of tail probabilities in historical measured delay data rather than through training, while the remaining parameters are estimated via MDN-based training.

Remark 1.

The NLPD loss is essentially a negative log-likelihood, so minimizing it is equivalent to maximum likelihood estimation (MLE). However, the classical approach to maximizing the likelihood scales poorly when applied to estimating the parameters of the delay PDF, because it requires inverting a large number of covariance matrices proportional to the number of data points [29]. A deep learning framework that uses the NLPD as its loss function is an effective alternative.

5. Experiments on WiFi Networks

In this section, we compare the proposed method TSTD with existing approaches in terms of their accuracy in estimating the delay PDF. The subsequent subsections provide a detailed description of the experimental procedures, including data preprocessing, threshold determination, model training, and performance evaluation.

5.1. Data Preprocessing

Four datasets provided by Mostafavi et al. [10] are used, corresponding to four packet lengths (172, 3440, 6880, and 10,320 bytes). Each dataset comprises over one million samples recording the packet length and downlink delay in software-defined radio (SDR) WiFi networks. The data were collected in a 50 m² conference room equipped with metallic chairs, whiteboards, and screens. In a coordinate system with the access point at the origin (units in meters), the end node location was varied between (1, 0) and (8, 5) when collecting data for 172-byte packets, to simulate small-scale movement of the end node; for the other packet lengths, the end node was fixed at (1, 0). The packet generation interval was set to 10 ms. Basic information on the datasets is presented in Table 1.

The datasets are preprocessed as follows. First, the delay values are scaled to milliseconds. Then, the mean is subtracted from the scaled delay values, yielding zero-mean delays, to avoid training errors caused by dataset bias. Finally, to speed up convergence during training, the four packet lengths are normalized to the range $[0, 1]$.
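The three preprocessing steps can be sketched in NumPy as follows. Several details are assumptions: the raw delay unit (`raw_unit_s`, the number of seconds per raw unit), the use of a single global mean over the merged data, and min-max scaling of packet lengths. The text states only that delays are scaled to milliseconds and centered and that packet lengths are mapped to $[0, 1]$.

```python
import numpy as np


def preprocess(lengths_bytes, delays_raw, raw_unit_s=1.0):
    """Return (lengths in [0, 1], zero-mean delays in ms, delays in ms)."""
    delays_ms = np.asarray(delays_raw, dtype=float) * raw_unit_s * 1e3  # to milliseconds
    delays_centred = delays_ms - delays_ms.mean()                     # zero-mean delays
    L = np.asarray(lengths_bytes, dtype=float)
    lengths_01 = (L - L.min()) / (L.max() - L.min())                  # min-max to [0, 1]
    return lengths_01, delays_centred, delays_ms
```

The millisecond-scaled but uncentered delays correspond to the $\dot{y}_{n}$ values used as input to Algorithms 1 and 2 in Section 5.2.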

5.2. Threshold Determination

The first stage illustrated in Figure 4 is carried out: Algorithms 1 and 2 are used to compute the thresholds $u_{1}(x)$ and $u_{2}(x)$ for packet lengths $x =$ 172, 3440, 6880, and 10,320 bytes. The input dataset for the algorithms takes the form $\{(x, \dot{y}_{1}), (x, \dot{y}_{2}), \dots, (x, \dot{y}_{N_{x}})\}$, where $\dot{y}_{n}$ ($n = 1, 2, \dots, N_{x}$) denotes the delay values scaled to milliseconds. The other two inputs are set to $T = 200$ and $δ = 0.5$ ms for each packet length. The results are presented in Figure 6.

5.3. Model Training

The second stage illustrated in Figure 4 is carried out, where the parameter vector $\hat{\theta}'(x)$ is obtained by training the MDN model. In this model, the number of Gaussian distributions in (3) is set to $I = 10$. The first to fourth hidden layers contain 10, 50, 50, and 40 neurons, respectively, and every hidden neuron uses the ‘tanh’ activation function. In the output layer, the neurons corresponding to $\mu_{i}(x)$ $(i = 1, 2, \dots, 10)$ have no activation function, the neurons corresponding to $\pi_{i}(x)$ $(i = 1, 2, \dots, 10)$ use the ‘softmax’ function, and the remaining neurons use the ‘softplus’ function. In the custom layer, the parameters $u_{1}(x)$ and $u_{2}(x)$ are fixed at the values calculated in Section 5.2.
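The output-layer design can be illustrated with a NumPy forward pass of the described network: four tanh hidden layers of 10, 50, 50, and 40 neurons, ten linear means, softmax mixture weights, and softplus for the remaining outputs. This is a hypothetical sketch with random weights, not the trained TensorFlow model. The number of tail outputs, `N_TAIL = 4` (read here as one scale and one shape parameter per GPD segment), is an assumption, and the fixed thresholds $u_{1}(x)$ and $u_{2}(x)$ of the custom layer are left out because they are not learned.

```python
import numpy as np

N_GAUSS = 10                # Gaussian components I, as stated in the text
HIDDEN = [10, 50, 50, 40]   # hidden-layer widths, as stated in the text
N_TAIL = 4                  # ASSUMPTION: softplus outputs for the two GPD segments


def softplus(z):
    return np.logaddexp(0.0, z)          # log(1 + exp(z)), numerically stable


def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def init_params(rng, in_dim=1):
    """Random (W, b) pairs; the single input is the normalized packet length."""
    sizes = [in_dim] + HIDDEN + [3 * N_GAUSS + N_TAIL]
    return [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]


def mdn_forward(params, x):
    h = np.asarray(x, dtype=float).reshape(-1, 1)
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)                        # 'tanh' on every hidden neuron
    W, b = params[-1]
    out = h @ W + b
    mu = out[:, :N_GAUSS]                             # means: no activation
    pi = softmax(out[:, N_GAUSS:2 * N_GAUSS])         # mixture weights: softmax
    sigma = softplus(out[:, 2 * N_GAUSS:3 * N_GAUSS])  # remaining outputs: softplus
    tail = softplus(out[:, 3 * N_GAUSS:])             # ASSUMPTION: GPD parameters
    return mu, sigma, pi, tail


params = init_params(np.random.default_rng(0))
mu, sigma, pi, tail = mdn_forward(params, [0.0, 0.5, 1.0])
```

The softmax guarantees that the mixture weights for each packet length sum to 1, and softplus keeps the standard deviations strictly positive.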

The training dataset takes the form $\{(\dot{x}_{i}, \ddot{y}_{i}) : i = 1, 2, \dots, N\}$, where $\dot{x}_{i}$ and $\ddot{y}_{i}$ represent the normalized packet length and delay value, respectively. This training dataset is constructed in three steps. First, the four datasets corresponding to different packet lengths are merged into a single dataset. Then, the samples in the merged dataset are randomly shuffled. Finally, $N$ samples are randomly drawn from the shuffled dataset according to a specified sampling ratio, forming the training dataset.
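The three-step construction of the training dataset might look as follows. The function name `build_training_set` and the choice to draw samples without replacement are assumptions of this sketch; the source states only that $N$ samples are drawn according to a sampling ratio.

```python
import numpy as np


def build_training_set(datasets, sampling_ratio, rng):
    """datasets: list of (x_norm, y_norm) array pairs, one per packet length."""
    # Step 1: merge the per-packet-length datasets into one.
    x = np.concatenate([np.asarray(dx, dtype=float) for dx, _ in datasets])
    y = np.concatenate([np.asarray(dy, dtype=float) for _, dy in datasets])
    # Step 2: shuffle the merged samples (x and y keep their pairing).
    order = rng.permutation(x.size)
    x, y = x[order], y[order]
    # Step 3: draw N samples (ASSUMPTION: without replacement).
    n = int(round(sampling_ratio * x.size))
    pick = rng.choice(x.size, size=n, replace=False)
    return x[pick], y[pick]


rng = np.random.default_rng(1)
toy = [(np.zeros(100), np.arange(100.0)), (np.ones(300), np.arange(300.0))]
x_train, y_train = build_training_set(toy, 0.21, rng)
```

With 400 toy samples and a sampling ratio of 21%, the sketch returns $N = 84$ pairs.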

The MDN model was trained on a server equipped with an Intel® Core™ i9-14900K CPU (3.2 GHz) and 128 GB of memory, using only CPU resources and the TensorFlow framework. Training is carried out in four rounds with learning rates of $10^{-2}$, $10^{-3}$, $10^{-4}$, and $10^{-5}$, respectively, and each round consists of 200 epochs. The batch size $N_{b}$ is set to $1/8$ of the number of samples in the training dataset, i.e., $N_{b} = N/8$.
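The learning-rate schedule and batch size translate into a fixed number of gradient updates. The hypothetical helper below, which does not call TensorFlow, makes the arithmetic explicit: with $N_{b} = N/8$ each epoch has 8 updates, so each 200-epoch round has 1600 updates and the four rounds have 6400 in total (assuming $N$ is divisible by 8; otherwise the last, smaller batch adds one update per epoch).

```python
def training_schedule(n_train, learning_rates=(1e-2, 1e-3, 1e-4, 1e-5),
                      epochs_per_round=200, batch_divisor=8):
    """Return (learning rate, epochs, batch size, gradient updates) per round."""
    batch_size = max(1, n_train // batch_divisor)      # N_b = N / 8
    updates_per_epoch = -(-n_train // batch_size)      # ceiling division
    return [(lr, epochs_per_round, batch_size, epochs_per_round * updates_per_epoch)
            for lr in learning_rates]


schedule = training_schedule(80_000)
total_updates = sum(updates for *_, updates in schedule)
```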

5.4. Performance Evaluation

The performance of TSTD is evaluated in terms of the mean and fluctuation amplitude of the delay tail probabilities and compared with two existing methods: GMM [18], one of the state-of-the-art approaches for probability density estimation, and GMEVM [10], which combines a GPD with a GMM.

First, $Q$ training datasets are independently sampled with a sampling ratio of 0.8%, and each is used to train a model, yielding $Q$ PDFs; in the experiment, $Q = 9$. The mean and fluctuation amplitude of the $Q$ tail probability distributions are plotted in Figure 7, where the abscissa is the delay value and the ordinate is the common logarithm of the tail probability. The solid line represents the mean, and the shaded area represents the fluctuation amplitude. Both GMM and GMEVM deviate considerably from the actual measurements (denoted as MEAS), whereas TSTD fluctuates around them.

Then, following the same procedure, the experimental results for sampling ratios of 6.25% and 21% are presented in Figure 8 and Figure 9, respectively. The fluctuation amplitudes of both GMEVM and TSTD decrease as the sampling ratio of the training dataset increases. TSTD follows the actual measurements as the delay increases, whereas GMEVM initially exhibits heavy-tail characteristics and then gradually deviates from the measurements, because a single GPD cannot accurately capture the tail when it is heavy-tailed and also fluctuates. Even when the sampling ratio increases to 21%, GMM still fluctuates significantly relative to the actual measurements. This is because the influence of tail events becomes more pronounced as the sampling ratio increases, and fitting heavy-tailed distributions with a GMM introduces substantial errors: Gaussian components have tails that decay faster than those of heavy-tailed distributions. This issue persists even when the number of GMM components is increased to 20, as shown in Figure 10.

Next, the model is trained on the complete dataset. As shown in Figure 11, this does not significantly improve estimation performance. In dynamic wireless environments, the model must be updated periodically, and sampling the training data can effectively reduce the computational cost of each update. Therefore, achieving an appropriate balance between the sampling ratio (or training time) and model performance is essential for efficient model updates. Table 2 summarizes the average training time of the $Q$ PDFs under different sampling ratios. Because the additional segment introduces more parameters to train, TSTD requires somewhat longer training than GMM and GMEVM. However, its training time is close to that of GMEVM, and the relative difference shrinks as the training dataset grows (about 6% at sampling ratios of 0.8% and 6.25%, versus about 3% at 21%, according to Table 2).

Furthermore, the mean absolute error ($\widehat{\mathrm{MAE}}$) and average fluctuation amplitude ($\widehat{\mathrm{AFA}}$) of tail probabilities across all considered delays are used to evaluate the performance of the estimation methods:

\widehat{\mathrm{MAE}}(x) = \frac{\sum_{j = 0}^{T} \mathrm{MAE}(y = j \cdot \delta, x)}{T + 1}, \qquad (11)

where $\mathrm{MAE}(y = j \cdot \delta, x)$ is the absolute error, at delay $y$ and conditioned on packet length $x$, between the average of the tail probabilities given by the $Q$ PDFs estimated from $Q$ training datasets and the measured tail probability:

\mathrm{MAE}(y = j \cdot \delta, x) = \left| \frac{\sum_{i = 1}^{Q} \bar{H}^{i}(y \mid \hat{\theta}(x))}{Q} - \mathrm{MEAS}(y \mid x) \right|, \qquad (12)

where $Q = 9$, $\bar{H}^{i}(y \mid \hat{\theta}(x))$ represents the tail probability at delay $y$ given packet length $x$, estimated independently using the $i$-th training dataset ($i = 1, 2, \dots, Q$), and $\mathrm{MEAS}(y \mid x)$ denotes the measured tail probability at delay $y$ given packet length $x$.
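The measured tail probability $\mathrm{MEAS}(y \mid x)$ can be computed from the raw delay samples of one packet length on the grid $y = j \cdot \delta$, $j = 0, 1, \dots, T$. This sketch assumes that the tail probability means $P(\text{delay} > y)$ (strictly greater) and that the grid uses $T = 200$ and $\delta = 0.5$ from Section 5.2, which covers 0 to 100 ms in 0.5 ms steps; the function names are hypothetical.

```python
import numpy as np


def delay_grid(T=200, delta=0.5):
    """Delays y_j = j * delta for j = 0, 1, ..., T (T + 1 points)."""
    return np.arange(T + 1) * delta


def empirical_tail(delays_ms, grid):
    """MEAS(y | x): fraction of measured delays strictly greater than each grid value."""
    d = np.sort(np.asarray(delays_ms, dtype=float))
    # searchsorted(side="right") counts the samples <= y; the rest exceed y.
    n_le = np.searchsorted(d, grid, side="right")
    return (d.size - n_le) / d.size


grid = delay_grid()
meas = empirical_tail([1.0, 2.0, 3.0, 4.0], np.array([0.0, 2.0, 4.0]))
```

Sorting once and using a binary search keeps the cost at $O(n \log n)$ for all grid points, which matters for datasets of about one million samples.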

\widehat{\mathrm{AFA}}(x) = \frac{\sum_{j = 0}^{T} \mathrm{AFA}(y = j \cdot \delta, x)}{T + 1}, \qquad (13)

where $\mathrm{AFA}(y = j \cdot \delta, x)$ represents the fluctuation amplitude of tail probabilities at delay $y$ given packet length $x$, i.e., the spread between the largest and smallest of the $Q$ tail probabilities estimated from $Q$ independent training datasets:

\mathrm{AFA}(y = j \cdot \delta, x) = \left| \mathrm{MAX}\{\bar{H}^{i}(y \mid \hat{\theta}(x)) : i = 1, 2, \dots, Q\} - \mathrm{MIN}\{\bar{H}^{i}(y \mid \hat{\theta}(x)) : i = 1, 2, \dots, Q\} \right|, \qquad (14)

where $\mathrm{MAX}\{\cdot\}$ and $\mathrm{MIN}\{\cdot\}$ select the maximum and minimum values of a given set, respectively.
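Given $Q$ estimated tail-probability curves evaluated on a common delay grid, Eqs. (11) to (14) reduce to column-wise operations on a $Q \times (T + 1)$ array. The function name `mae_afa` is hypothetical. The per-delay mean and max/min spread it computes are also natural quantities for drawing a solid mean line and a shaded band like those in Figures 7 to 9, although the source does not state how the plots were produced.

```python
import numpy as np


def mae_afa(H, meas):
    """Per-delay MAE and AFA plus their averages over all delays.

    H: shape (Q, T + 1), row i = tail probabilities from the i-th trained PDF.
    meas: shape (T + 1,), measured tail probabilities on the same delay grid.
    """
    H = np.asarray(H, dtype=float)
    meas = np.asarray(meas, dtype=float)
    mae = np.abs(H.mean(axis=0) - meas)            # Eq. (12)
    afa = np.abs(H.max(axis=0) - H.min(axis=0))    # Eq. (14)
    return mae, afa, mae.mean(), afa.mean()        # Eqs. (11) and (13)


H = [[0.5, 0.2], [0.3, 0.1], [0.4, 0.3]]   # toy example: Q = 3 runs, T + 1 = 2 delays
mae, afa, mae_hat, afa_hat = mae_afa(H, [0.4, 0.1])
```

Note that Eq. (12) takes the absolute error of the averaged estimate, so runs that err in opposite directions can cancel; Eq. (14) captures that spread separately.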

As shown in Table 3, with a sampling ratio of 21%, TSTD achieves lower $\widehat{\mathrm{MAE}}$ and $\widehat{\mathrm{AFA}}$ than both GMM and GMEVM for all four packet lengths.

At most of the delay values listed in Table 4 and Table 5, the MAE and AFA of TSTD are either of the same order of magnitude as those of GMM and GMEVM or at least one order of magnitude lower. At a few delay values, however, the MAE of TSTD is one order of magnitude higher than that of GMM, while its AFA is one order of magnitude lower and its fluctuation interval still lies within that of GMM (e.g., 10 ms for 3440-byte packets and 40 ms for 10,320-byte packets). This follows from how both models are trained: GMM and TSTD each construct a PDF to approximate the E2E delay PDF, using a loss function defined as the mean negative log probability density over many delays, which measures the average closeness to the target rather than the fit at any individual delay. Therefore, TSTD may fit a small number of points worse than GMM while its overall performance remains superior.
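The training loss described above, the mean negative log probability density, can be sketched for the Gaussian-mixture main part. This covers only the GMM term; the full TSTD density also contains the two GPD tail segments, which are not reproduced here. The function name and the argument layout (per-sample arrays of shape $(N, I)$ matching the MDN outputs $\mu_{i}(x)$, $\sigma_{i}(x)$, and $\pi_{i}(x)$) are assumptions.

```python
import numpy as np


def gmm_mean_nll(y, mu, sigma, weights):
    """Mean negative log-density of samples y under per-sample Gaussian mixtures.

    y has shape (N,); mu, sigma and weights have shape (N, I).
    """
    y = np.asarray(y, dtype=float)[:, None]
    # log N(y; mu_i, sigma_i) for every sample and component
    log_normal = (-0.5 * np.log(2.0 * np.pi) - np.log(sigma)
                  - 0.5 * ((y - mu) / sigma) ** 2)
    # log sum_i pi_i N(y; mu_i, sigma_i), evaluated stably in log space
    log_density = np.logaddexp.reduce(np.log(weights) + log_normal, axis=1)
    return -log_density.mean()


nll = gmm_mean_nll([0.0], np.zeros((1, 1)), np.ones((1, 1)), np.ones((1, 1)))
```

Because the loss is an average over all samples, a model can lower it by fitting the bulk of the data well even if a few individual delays are fitted poorly, which is the behavior discussed above.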

Finally, potential inaccuracies or noise in the delay measurements are considered, since they could influence the estimation results. To evaluate the estimation methods under such conditions, random Gaussian noise with a variance of 1 ms² or 3 ms² is added to the delay samples. As shown in Figure 12 and Figure 13, in the low-delay domain, the mean of each method gradually deviates from the actual measurements as the noise variance increases, while the fluctuation amplitudes remain largely unaffected. In the high-delay domain, the mean and fluctuation amplitude of TSTD remain almost unchanged as the noise variance increases. For GMEVM, the fluctuation amplitude first increases and then decreases, while the mean remains nearly constant. For GMM, both the mean and the fluctuation amplitude are relatively stable, with the latter decreasing slightly. These trends arise because most of the data lie in the low-delay domain, so changes to these data strongly affect the mean of the fitted results; in addition, the added noise tends to smooth out sharp fluctuations, which ultimately reduces the fluctuation amplitudes. In summary, compared with GMM and GMEVM, the multiple-segmentation feature of TSTD makes it more robust when measurement data are slightly inaccurate or contain low levels of noise.
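The noise-injection step can be sketched as follows, assuming zero-mean noise added independently to each delay sample. Because NumPy's `Generator.normal` takes a standard deviation, a variance of 1 ms² corresponds to a standard deviation of 1 ms and a variance of 3 ms² to about 1.73 ms. The sketch does not clip negative delays, which the source does not discuss, and the function name is hypothetical.

```python
import numpy as np


def add_gaussian_noise(delays_ms, variance_ms2, rng):
    """Add Gaussian noise (ASSUMPTION: zero mean) with the given variance in ms^2."""
    delays = np.asarray(delays_ms, dtype=float)
    std_ms = np.sqrt(variance_ms2)   # Generator.normal expects a standard deviation
    return delays + rng.normal(0.0, std_ms, size=delays.shape)


rng = np.random.default_rng(0)
clean = np.full(200_000, 20.0)
noisy_1 = add_gaussian_noise(clean, 1.0, rng)
noisy_3 = add_gaussian_noise(clean, 3.0, rng)
```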

5.5. Experiments on the Expansion of WiFi Networks

To further evaluate the proposed method, additional experiments were conducted under real WiFi network conditions. Specifically, delay samples were collected from the network to construct a new dataset for training and evaluating the MDN models. Thereafter, the accuracy of different models in estimating the tail probability distributions of delay was compared.

5.5.1. Measurement Setup

The experiment involves an end node and an access point connected via the WiFi topology depicted in Figure 2: the end node is a Raspberry Pi 4B, and the access point is a FriendlyElec NanoPi R5S router. To ensure precise time synchronization, the clocks of both nodes are synchronized using the Network Time Protocol (NTP) over a dedicated, isolated interface. Delay samples are collected at the application layer using the Isochronous Round-Trip Tester (IRTT) software (version 0.9.1), which transmits timestamped packets from the end node to the access point (uplink) and back (downlink). For each packet, IRTT records three key values: the time when the packet is sent from the end node (“send”), the time when it is received at the access point (“receive”), and the round-trip time measured when it returns to the end node (“rtt”). With synchronized clocks, the one-way uplink delay is the difference between “receive” and “send”, and the downlink delay is obtained by subtracting the uplink delay from the round-trip time. This yields per-packet measurements of uplink, downlink, and round-trip delays. During data collection, packets of three sizes (172, 3440, and 6880 bytes) are transmitted periodically at 10 ms intervals.
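With synchronized clocks, the per-packet delay decomposition is simple arithmetic. The function and argument names below are hypothetical and do not correspond to IRTT's output field names. Because the downlink delay is obtained as the round-trip time minus the uplink delay, any turnaround time at the access point is included in the downlink value of this decomposition.

```python
def one_way_delays(send, receive, rtt):
    """Split one packet's timing into uplink, downlink and round-trip delays.

    All values must share one time unit. send and receive come from clocks
    synchronized via NTP; any residual clock offset ends up in the uplink value
    (and, with opposite sign, in the downlink value). Also works elementwise
    on NumPy arrays.
    """
    uplink = receive - send      # one-way, end node -> access point
    downlink = rtt - uplink      # remainder of the round trip (includes AP turnaround)
    return uplink, downlink, rtt


up, down, rtt = one_way_delays(send=100.0, receive=103.0, rtt=10.0)
```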

The measurement experiment was conducted over one hour in a university laboratory, during which approximately 0.9 million delay measurements were collected. The laboratory spans approximately 100 m² and contains multiple individual cubicles, desks, desktop computers, monitors, and various other objects. The end node is located at (5, 0) in a coordinate system (with scale units in meters) whose origin is the access point. Partition walls and several obstacles lie between the access point and the end node, which is situated in one corner of the rectangular laboratory. Basic information on the measured dataset is summarized in Table 6.

5.5.2. Data Preprocessing and Threshold Determination

The dataset is preprocessed according to the procedure outlined in Section 5.1. As described in Section 5.2, Algorithms 1 and 2 are applied to calculate the thresholds $u_{1}(x)$ and $u_{2}(x)$, respectively, where $x$ denotes the packet length. The resulting thresholds are $u_{1}(172) = 1$ and $u_{2}(172) = 24$ for $x = 172$ bytes; $u_{1}(3440) = 6$ and $u_{2}(3440) = 20$ for $x = 3440$ bytes; and $u_{1}(6880) = 5$ and $u_{2}(6880) = 14$ for $x = 6880$ bytes.
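To see how the thresholds split the samples, the hypothetical helper below counts how many delays fall into each of the three segments: the GMM main part ($y \le u_{1}$), the first GPD segment ($u_{1} < y \le u_{2}$), and the second GPD segment ($y > u_{2}$). Which segment the boundary points belong to, and that the delays are on the same millisecond scale as the thresholds, are assumptions of this sketch.

```python
import numpy as np

# (u1, u2) thresholds reported for the measured WiFi dataset
THRESHOLDS = {172: (1, 24), 3440: (6, 20), 6880: (5, 14)}


def segment_counts(delays, u1, u2):
    """Count delays in the main part and in the two GPD tail segments.

    ASSUMPTION: y <= u1 belongs to the main part and u1 < y <= u2 to the first tail.
    """
    d = np.asarray(delays, dtype=float)
    main = int(np.count_nonzero(d <= u1))
    tail_1 = int(np.count_nonzero((d > u1) & (d <= u2)))
    tail_2 = int(np.count_nonzero(d > u2))
    return main, tail_1, tail_2


counts = segment_counts([1, 6, 7, 20, 21, 40], *THRESHOLDS[3440])
```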

5.5.3. Model Training and Performance Evaluation

The training process adopts the same parameter configuration as specified in Section 5.3. Nine ($Q = 9$) training datasets were independently sampled at a sampling ratio of 27% and used to train the models, yielding nine PDFs. The mean and fluctuation amplitude of the nine tail probability distributions are depicted in Figure 14, where the horizontal axis denotes delay values and the vertical axis represents the common logarithm of the tail probability. The results indicate that, for all three packet lengths, both GMM and GMEVM estimate the tail probabilities less accurately, whereas TSTD performs significantly better.

6. Discussions

6.1. Limitations of TSTD Method

Despite the significant advantages of TSTD in estimating the E2E delay PDF in WiFi networks, the method has certain limitations. First, the current model is designed primarily for WiFi networks, and its performance depends on the specific characteristics of the measurement data in a given environment. In dynamically changing or complex network environments, factors such as signal attenuation and interference may produce more intricate delay distributions, such as heavy-tailed or multimodal patterns, which can challenge the fitting capability of the three-segment model. Second, compared with existing methods, TSTD introduces two GPD segments to fit the tail, and its MDN output layer uses a fixed neuron structure. This increases the number of parameters, and hence the model complexity and training time. In delay-sensitive scenarios with strict real-time requirements, further optimization may be needed to improve computational efficiency.

6.2. Future Work

Future research could investigate adaptively determining the number of GMM components or segments from input data characteristics, such as packet length and network conditions. This would not only improve the computational efficiency of the model but also strengthen its adaptability across diverse network environments. Further refinements will focus on integrating additional network parameters as conditioning variables to improve both the accuracy and the adaptability of the model. Candidate parameters include the modulation and coding scheme (MCS), packet arrival intervals, signal-to-noise ratio (SNR), and packet loss rate. Moreover, assessing the model’s generalizability beyond WiFi networks, such as in 5G and IoT networks, is a critical research direction. Ultimately, the estimation results will provide a robust foundation for devising deterministic delay control strategies.

7. Conclusions

In this paper, a PDF estimation method for the E2E delay in WiFi networks is proposed. The delay PDF is divided into three segments; the first segment, fitted by GMM, constitutes the main part of the PDF, while the last two segments, fitted by two different GPDs, constitute the tail part of the PDF. The thresholds for dividing the three segments are calculated first, and the other parameters of the PDF are subsequently obtained through training using MDN. Experimental validation demonstrates that this approach achieves results closer to measurement data compared to existing methods. Specifically, the mean absolute errors and average fluctuation amplitudes of tail probabilities at certain delay values decrease by at least one order of magnitude. Moreover, the multiple-segmentation feature of the proposed method enhances its robustness in situations where measurement data are affected by low levels of Gaussian noise.

Author Contributions

Conceptualization, J.C. and Y.D.; methodology, J.C.; software, Y.D.; validation, J.C. and Y.D.; formal analysis, S.H.; investigation, S.H.; resources, M.Z.; data curation, M.Z.; writing—original draft preparation, Y.D.; writing—review and editing, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62361017, in part by the Natural Science Foundation of Guangxi under Grant 2023GXNSFBA026212, in part by the Research Foundation Ability Enhancement Project for Young and Middle-aged Teachers in Guangxi Universities under Grant 2023KY0227, in part by the Open Project of State Key Laboratory of Public Big Data under Grant PBD2022-09, and in part by the Innovation Project of GUET Graduate Education 2025YCXS068.

Data Availability Statement

The original contributions presented in the study are included in the article, and further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Qiao, Y.; Niu, Y.; Chen, S.; Zhong, Z.; Zhang, C.; Wang, N.; Ai, B. Energy Efficiency Optimization of Ultra-Reliable Low-Latency Communication for High-Speed Rail. IEEE Trans. Veh. Technol. 2024, 73, 16638–16653.
2. Yan, M.; AunChan, C.; Gygax, A.F.; Li, C.; Nirmalathas, A.; Chih-Lin, I. Efficient Generation of Optimal UAV Trajectories with Uncertain Obstacle Avoidance in MEC Networks. IEEE Internet Things J. 2024, 11, 38380–38392.
3. Adame, T.; Carrascosa-Zamacois, M.; Bellalta, B. Time-Sensitive Networking in IEEE 802.11be: On the Way to Low-Latency WiFi 7. Sensors 2021, 21, 4954.
4. Lee, H.; Choi, Y.; Han, T.; Kim, K. Probabilistically Guaranteeing End-to-End Latencies in Autonomous Vehicle Computing Systems. IEEE Trans. Comput. 2022, 71, 3361–3374.
5. Han, F.; Wang, M.; Cui, Y.; Li, Q.; Liang, R.; Liu, Y.; Jiang, Y. Future Data Center Networking: From Low Latency to Deterministic Latency. IEEE Netw. 2022, 36, 52–58.
6. Yan, M.; Zhang, Y.; Chan, C.A.; Gygax, A.F.; Li, C. Secure Task Offloading Strategy Optimization of UAV-Aided Outdoor Mobile High-Definition Live Streaming. Chin. J. Aeronaut. 2025, 103454.
7. Huang, Y.; Wang, S.; Huang, T.; Liu, Y. Cycle-Based Time-Sensitive and Deterministic Networks: Architecture, Challenges, and Open Issues. IEEE Commun. Mag. 2022, 60, 81–87.
8. Wang, S.; Wu, B.; Zhang, C.; Huang, Y.; Huang, T.; Liu, Y. Large-Scale Deterministic IP Networks on CENI. In Proceedings of the IEEE INFOCOM 2021—IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Vancouver, BC, Canada, 10–13 May 2021; pp. 1–6.
9. Flinta, C.; Yan, W.; Johnsson, A. Predicting Round-Trip Time Distributions in IoT Systems Using Histogram Estimators. In Proceedings of the NOMS 2020—2020 IEEE/IFIP Network Operations and Management Symposium, Budapest, Hungary, 20–24 April 2020; pp. 1–9.
10. Mostafavi, S.; Sharma, G.P.; Gross, J. Data-Driven Latency Probability Prediction for Wireless Networks: Focusing on Tail Probabilities. In Proceedings of the GLOBECOM 2023—2023 IEEE Global Communications Conference, Kuala Lumpur, Malaysia, 4–8 December 2023; pp. 4338–4344.
11. Cao, J.; Feng, W.; Ge, N.; Lu, J. Delay Characterization of Mobile-Edge Computing for 6G Time-Sensitive Services. IEEE Internet Things J. 2021, 8, 3758–3773.
12. Mei, M.; Yao, M.; Yang, Q.; Qin, M.; Kwak, K.S.; Rao, R.R. Delay Analysis of Mobile Edge Computing Using Poisson Cluster Process Modeling: A Stochastic Network Calculus Perspective. IEEE Trans. Commun. 2022, 70, 2532–2546.
13. Cui, P.; Han, S.; Xu, X.; Zhang, J.; Zhang, P.; Ren, S. End-to-End Delay Performance Analysis of Industrial Internet of Things: A Stochastic Network Calculus Perspective. IEEE Internet Things J. 2024, 11, 5374–5387.
14. Coll-Perales, B.; Lucas-Estañ, M.C.; Shimizu, T.; Gozalvez, J.; Higuchi, T.; Avedisov, S.; Altintas, O.; Sepulcre, M. End-to-End V2X Latency Modeling and Analysis in 5G Networks. IEEE Trans. Veh. Technol. 2023, 72, 5094–5109.
15. Fadhil, D.; Oliveira, R. Estimation of 5G Core and RAN End-to-End Delay through Gaussian Mixture Models. Computers 2022, 11, 184.
16. Chen, C.H.; Song, F.; Hwang, F.J.; Wu, L. A Probability Density Function Generator Based on Neural Networks. Phys. A Stat. Mech. Appl. 2020, 541, 123344.
17. Raeis, M.; Tizghadam, A.; Leon-Garcia, A. Predicting Distributions of Waiting Times in Customer Service Systems using Mixture Density Networks. In Proceedings of the 2019 15th International Conference on Network and Service Management (CNSM), Halifax, NS, Canada, 21–25 October 2019; pp. 1–6.
18. Raeis, M.; Tizghadam, A.; Leon-Garcia, A. Probabilistic Bounds on the End-to-End Delay of Service Function Chains using Deep MDN. In Proceedings of the 2020 IEEE 31st Annual International Symposium on Personal, Indoor and Mobile Radio Communications, London, UK, 31 August–3 September 2020; pp. 1–6.
19. Yasuda, S.; Yoshida, H. Prediction of Round Trip Delay for Wireless Networks by a Two-state Model. In Proceedings of the 2018 IEEE Wireless Communications and Networking Conference (WCNC), Barcelona, Spain, 15–18 April 2018; pp. 1–6.
20. Chi, J.; Mao, Z.; Jia, M. Robust Gaussian Process Regression Based on Bias Trimming. Knowl.-Based Syst. 2024, 291, 111605.
21. Taillardat, M.; Fougères, A.L.; Naveau, P.; De Fondeville, R. Evaluating Probabilistic Forecasts of Extremes Using Continuous Ranked Probability Score Distributions. Int. J. Forecast. 2023, 39, 1448–1459.
22. Martín, J.; Parra, M.I.; Pizarro, M.M.; Sanjuán, E.L. Baseline Methods for the Parameter Estimation of the Generalized Pareto Distribution. Entropy 2022, 24, 178.
23. Li, J.; Li, D.; Li, P.; Samorodnitsky, G. Generalized Pareto GAN: Generating Extremes of Distributions. In Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 30 June–5 July 2024; pp. 1–8.
24. Mostafavi, S.S.; Dán, G.; Gross, J. Data-Driven End-to-End Delay Violation Probability Prediction with Extreme Value Mixture Models. In Proceedings of the 2021 IEEE/ACM Symposium on Edge Computing (SEC), San Jose, CA, USA, 14–17 December 2021; pp. 416–422.
25. Curceac, S.; Atkinson, P.M.; Milne, A.; Wu, L.; Harris, P. An Evaluation of Automated GPD Threshold Selection Methods for Hydrological Extremes Across Different Scales. J. Hydrol. 2020, 585, 124845.
26. Alaswed, H. Graphical Diagnostics for Threshold Selection in Fitting the Generalized Pareto Distribution. J. Pure Appl. Sci. 2024, 23, 90–95.
27. Cyrille, O.G.; Keita, K. Machine Learning Method to Estimate Parameters of the GPD Distribution: Applied to Lobo Flows and Yzeron Water Levels. Research Square. 2024; Preprint.
28. Zhao, X.; Zhang, Z.; Cheng, W.; Zhang, P. A New Parameter Estimator for the Generalized Pareto Distribution under the Peaks over Threshold Framework. Mathematics 2019, 7, 406.
29. Lin, A.; Tolooshams, B.; Atchadé, Y.; Ba, D. Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models. In Proceedings of the Fortieth International Conference on Machine Learning (ICML), Honolulu, HI, USA, 27 July 2023; pp. 1–29.
30. Mostafavi, S. SDR WiFi Measurement Commands. Available online: https://github.com/samiemostafavi/wireless-pr3d/blob/main/measurements/campaign1/IEEE80211g.md (accessed on 6 August 2024).

Figure 1. Deterministic delay network for time-sensitive applications.

Figure 2. Diagram of the research problem.

Figure 3. Diagram of the delay PDF.

Figure 4. Parameter estimation process.

Figure 5. Graphical characteristics of $u_{2}(x)$.


Figure 6. Segmentation thresholds for packets of different lengths.

Figure 7. Mean and fluctuation amplitude of tail probability distributions trained using the SDR WiFi downlink delay dataset sampled at a sampling ratio of 0.8%.

Figure 8. Mean and fluctuation amplitude of tail probability distributions trained using the SDR WiFi downlink delay dataset sampled at a sampling ratio of 6.25%.

Figure 9. Mean and fluctuation amplitude of tail probability distributions trained using the SDR WiFi downlink delay dataset sampled at a sampling ratio of 21%.

Figure 10. Mean and fluctuation amplitude of tail probability distributions trained with I = 20 and a sampling ratio of 21% for GMM.

Figure 11. Mean and fluctuation amplitude of tail probability distributions trained using the SDR WiFi downlink delay dataset sampled at a sampling ratio of 100%.

Figure 12. Mean and fluctuation amplitude of tail probability distributions trained using the SDR WiFi downlink delay dataset, sampled at a sampling ratio of 21% and with Gaussian noise of variance 1 ms² introduced.

Figure 13. Mean and fluctuation amplitude of tail probability distributions trained using the SDR WiFi downlink delay dataset, sampled at a sampling ratio of 21% and with Gaussian noise of variance 3 ms² introduced.

Figure 14. Mean and fluctuation amplitude of tail probability distributions trained using the measured WiFi downlink delay dataset sampled at a sampling ratio of 27%.

Table 1. Basic information on SDR WiFi datasets [30].

End Node Location	RSSI (dBm)	Downlink Capacity (Mbps)	Packet Length (Bytes)	Number of Samples
(1, 0)/(8, 5)	−61/−87	26.22/9.26	172	1,256,295
(1, 0)	−61	26.22	3440	1,075,910
(1, 0)	−61	26.22	6880	1,068,077
(1, 0)	−61	26.22	10,320	1,036,423

RSSI refers to the received signal strength indication.

Table 2. Average training time under different sampling ratios ($Q = 9$).


Sampling Ratio	GMM	GMEVM	TSTD
0.8%	26 s	42 s	44.7 s
6.25%	2.07 min	3.93 min	4.19 min
21%	10.56 min	15.15 min	15.54 min

Table 3. $\widehat{\mathrm{MAE}}$ and $\widehat{\mathrm{AFA}}$ of tail probabilities.


	GMM	GMEVM	TSTD
172 bytes
$\widehat{\mathrm{MAE}}$	0.0004848	0.0005824	0.0004531
$\widehat{\mathrm{AFA}}$	0.0013418	0.0009624	0.0006790
3440 bytes
$\widehat{\mathrm{MAE}}$	0.0002742	0.0002565	0.0000752
$\widehat{\mathrm{AFA}}$	0.0025520	0.0003158	0.0002115
6880 bytes
$\widehat{\mathrm{MAE}}$	0.0002252	0.0005638	0.0001192
$\widehat{\mathrm{AFA}}$	0.0014704	0.0006953	0.0005775
10,320 bytes
$\widehat{\mathrm{MAE}}$	0.0005072	0.0018821	0.0001714
$\widehat{\mathrm{AFA}}$	0.0032255	0.0027527	0.0003937

Table 4. MAEs of tail probabilities at some delay values.

	GMM	GMEVM	TSTD	TSTD vs. GMM	TSTD vs. GMEVM
172 bytes
10 ms	0.0001161	0.0009614	0.0002117	—	—
20 ms	0.0003119	0.0002654	0.0000280	↓	↓
40 ms	0.0002604	0.0001805	0.0000621	—	↓
60 ms	0.0000448	0.0001994	0.0000344	—	↓
80 ms	0.0001462	0.0001483	0.0000140	↓	↓
3440 bytes
10 ms	0.0000116	0.0012004	0.0003190	↑	↓
20 ms	0.0010484	0.0006407	0.0003960	↓	—
40 ms	0.0002370	0.0004430	0.0000422	↓	↓
60 ms	0.0000520	0.0002714	0.0000150	—	↓
80 ms	0.0000114	0.0001795	0.0000035	↓	2↓
6880 bytes
10 ms	0.0019739	0.0015411	0.0009512	↓	↓
20 ms	0.0013008	0.0013864	0.0002981	↓	↓
40 ms	0.0003517	0.0006850	0.0000295	↓	↓
60 ms	0.0001067	0.0004947	0.0000053	2↓	2↓
80 ms	0.0000360	0.0003199	0.0000091	↓	2↓
10,320 bytes
10 ms	0.0025418	0.0172135	0.0009030	↓	2↓
20 ms	0.0013047	0.0076196	0.0013923	—	—
40 ms	0.0000938	0.0006340	0.0003321	↑	—
60 ms	0.0009785	0.0016258	0.0000657	↓	2↓
80 ms	0.0003885	0.0016749	0.0000819	↓	2↓

The values in columns 2–4 denote MAEs. Columns 5 and 6 compare TSTD with GMM and GMEVM, respectively: ‘—’ indicates that the MAE remains of the same order of magnitude, and ‘↓’, ‘2↓’, and ‘↑’ indicate that the MAE decreases by one order of magnitude, decreases by two orders of magnitude, and increases by one order of magnitude, respectively.

Table 5. AFAs of tail probabilities at some delay values.

	GMM	GMEVM	TSTD	TSTD vs. GMM	TSTD vs. GMEVM
172 bytes
10 ms	0.0039507	0.0019459	0.0004887	↓	↓
20 ms	0.0021704	0.0009179	0.0004761	↓	—
40 ms	0.0012367	0.0005630	0.0001728	↓	—
60 ms	0.0009629	0.0004137	0.0001308	—	—
80 ms	0.0005695	0.0003291	0.0001165	—	—
3440 bytes
10 ms	0.0073440	0.0010459	0.0008199	↓	↓
20 ms	0.0054414	0.0005337	0.0006164	↓	—
40 ms	0.0012082	0.0002552	0.0000718	↓	↓
60 ms	0.0001327	0.0001636	0.0000345	↓	↓
80 ms	0.0000365	0.0001188	0.0000171	—	↓
6880 bytes
10 ms	0.0044745	0.0023440	0.0035819	—	—
20 ms	0.0101731	0.0015240	0.0020221	↓	—
40 ms	0.0017925	0.0009852	0.0001703	↓	—
60 ms	0.0005073	0.0006320	0.0000407	↓	↓
80 ms	0.0001442	0.0004451	0.0000130	↓	↓
10,320 bytes
10 ms	0.0173197	0.0219340	0.0022661	↓	↓
20 ms	0.0147038	0.0021904	0.0013999	↓	—
40 ms	0.0066856	0.0039259	0.0006537	↓	↓
60 ms	0.0021740	0.0029630	0.0002614	↓	↓
80 ms	0.0001569	0.0022165	0.0001964	—	↓

The values in columns 2–4 denote AFAs. The meanings of the symbols in columns 5 and 6 are as follows. For TSTD, compared with GMM or GMEVM, the symbol ‘—’ represents that AFA remains the same in the order of magnitude; and the symbol ‘↓’ represents that AFA decreases by one order of magnitude.

Table 6. Basic information on measured WiFi datasets.

End Node Location	RSSI (dBm)	Downlink Capacity (Mbps)	Packet Length (Bytes)	Number of Samples
(5, 0)	−61	0.14	172	254,691
(5, 0)	−61	2.61	3440	328,935
(5, 0)	−61	5.10	6880	304,949

The dataset is available at: https://www.kaggle.com/datasets/daiyujun/wifi-network-delay-dataset/data (accessed on 23 May 2025).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Cao, J.; Dai, Y.; Huang, S.; Zhang, M. Data-Driven Estimation of End-to-End Delay Probability Density Function for Time-Sensitive WiFi Networks. Electronics 2025, 14, 2324. https://doi.org/10.3390/electronics14122324
