In this section, we compare the proposed TSTD method with existing approaches in terms of accuracy in estimating the delay PDF. The subsequent subsections describe the experimental procedures in detail, including data preprocessing, threshold determination, model training, and performance evaluation.
5.1. Data Preprocessing
Four datasets provided by Mostafavi et al. [10] are utilized, corresponding to four types of packets with varying lengths (172 bytes, 3440 bytes, 6880 bytes, and 10,320 bytes). Each dataset comprises over one million samples that record the packet length and downlink delay in software-defined radio (SDR) WiFi networks. The data collection environment is a conference room equipped with metallic chairs, whiteboards, and screens. For collecting the data for 172-byte packets, the end node is placed alternately at two nearby coordinates within a coordinate system (with scale units in meters) whose origin is the access point, which simulates small-scale movement of the end node. For the other packet types, the end node location is kept fixed. The packet generation interval is set to 10 ms. Basic information regarding the utilized datasets is presented in Table 1.
The datasets undergo preprocessing according to the following steps. First, the delay values are rescaled to the millisecond level. Then, standardization is performed by subtracting the mean from the rescaled delay values, yielding zero-mean delays and avoiding training errors caused by dataset biases. Finally, to enhance the convergence speed during training, the four packet lengths are normalized by scaling them to a common range.
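For concreteness, a minimal NumPy sketch of these three steps follows. The raw delay unit (nanoseconds here, hence the divisor of 1e6) and the min-max scaling of packet lengths are assumptions, since the text does not state them explicitly.

```python
import numpy as np

def preprocess(raw_delays, packet_lengths, delay_scale=1e6):
    """Sketch of the preprocessing pipeline described above."""
    # Step 1: rescale delay values to the millisecond level
    # (assumes raw delays are in nanoseconds).
    delays_ms = raw_delays / delay_scale
    # Step 2: subtract the mean to obtain zero-mean delay values.
    delays_centered = delays_ms - delays_ms.mean()
    # Step 3: scale the packet lengths to a common range
    # (min-max scaling is an assumption).
    lengths = packet_lengths.astype(np.float64)
    lengths_scaled = (lengths - lengths.min()) / (lengths.max() - lengths.min())
    return delays_centered, lengths_scaled
```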
5.3. Model Training
The second stage illustrated in Figure 4 is carried out, where the parameter vector $\boldsymbol{\theta}$ is obtained by training the MDN model. In this model, the number of Gaussian distributions in (3) is set to a fixed value. The first to fourth hidden layers contain 10, 50, 50, and 40 neurons, respectively, and the activation function of each hidden neuron is the ‘tanh’ function. In the output layer, the neurons corresponding to the component means have no activation function, the neurons corresponding to the mixture weights use ‘softmax’ functions, and the remaining neurons, corresponding to the standard deviations, employ ‘softplus’ functions. For the custom layer, the threshold parameters are fixed at the values calculated in Section 5.2.
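A minimal Keras sketch of this configuration is given below, assuming the standard MDN output heads (means, mixture weights, standard deviations). TSTD's custom layer is omitted, and the number of components `K` is a placeholder rather than the value used in the experiments.

```python
import tensorflow as tf

K = 5  # placeholder for the number of Gaussian components in (3)

def build_mdn_trunk():
    """Sketch of the described MDN, without TSTD's custom layer."""
    x = tf.keras.Input(shape=(1,))  # normalized packet length
    h = tf.keras.layers.Dense(10, activation="tanh")(x)
    h = tf.keras.layers.Dense(50, activation="tanh")(h)
    h = tf.keras.layers.Dense(50, activation="tanh")(h)
    h = tf.keras.layers.Dense(40, activation="tanh")(h)
    mu = tf.keras.layers.Dense(K, activation=None, name="means")(h)
    alpha = tf.keras.layers.Dense(K, activation="softmax", name="weights")(h)
    sigma = tf.keras.layers.Dense(K, activation="softplus", name="stddevs")(h)
    return tf.keras.Model(inputs=x, outputs=[mu, alpha, sigma])
```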
The training dataset, in the form of $\{(x_i, y_i)\}_{i=1}^{N}$, is utilized, where $x_i$ and $y_i$ represent the normalized packet length and delay value, respectively. This training dataset is constructed in three steps, sketched in the code below. First, the four datasets corresponding to different packet lengths are merged into a single dataset. Then, the samples in the merged dataset are randomly shuffled. Finally, $N$ samples are randomly drawn from the shuffled dataset according to a specified sampling ratio, forming the training dataset.
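A minimal sketch of this construction, assuming each per-packet-length dataset is an array of (normalized length, delay) pairs; function and parameter names are illustrative.

```python
import numpy as np

def build_training_set(datasets, sampling_ratio, seed=0):
    """Merge, shuffle, and subsample, mirroring the three steps above."""
    rng = np.random.default_rng(seed)
    merged = np.concatenate(datasets, axis=0)  # Step 1: merge the four datasets
    rng.shuffle(merged)                        # Step 2: shuffle the samples
    n = int(sampling_ratio * len(merged))      # Step 3: draw N samples
    idx = rng.choice(len(merged), size=n, replace=False)
    return merged[idx]
```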
The MDN model training was conducted on a server equipped with an Intel® Core™ i9-14900K CPU (3.2 GHz) and 128 GB of memory, using only CPU resources and based on the TensorFlow framework. The training process is carried out in four rounds, each with its own learning rate. Each round consists of 200 epochs, and in each epoch the batch size is set to a fixed fraction of the number of samples in the training dataset.
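The schedule can be sketched as follows; the learning-rate values and the batch-size fraction are placeholders, since the exact values are not reproduced here, and `loss_fn` is assumed to follow the Keras `loss(y_true, y_pred)` signature.

```python
import tensorflow as tf

def train_in_rounds(model, x_train, y_train, loss_fn, batch_fraction=0.01):
    """Four training rounds of 200 epochs each, one learning rate per round.

    The rates below and `batch_fraction` are illustrative placeholders.
    """
    batch_size = max(1, int(batch_fraction * len(x_train)))
    for lr in (1e-2, 1e-3, 1e-4, 1e-5):
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                      loss=loss_fn)
        model.fit(x_train, y_train, epochs=200, batch_size=batch_size,
                  verbose=0)
    return model
```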
5.4. Performance Evaluation
The performance of TSTD is evaluated in terms of the means and fluctuation amplitudes of the delay tail probabilities and is compared with two existing methods: GMM [18], a state-of-the-art approach for probability density estimation, and GMEVM [10], which integrates the GPD with GMM.
First, $Q$ training datasets are independently sampled at a given sampling ratio, and these datasets are used to train and obtain $Q$ PDFs; in the experiment, $Q$ is set to 9. The means and fluctuation amplitudes of the $Q$ tail probability distributions are plotted in Figure 7, where the delay value serves as the abscissa and the common logarithm of the tail probability serves as the ordinate. The solid line represents the mean, while the shaded area represents the fluctuation amplitude. It can be observed that both GMM and GMEVM exhibit large deviations from the actual measurements (denoted as MEAS), whereas TSTD fluctuates around the actual measurements.
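The plotted curves can be computed from the $Q$ estimated tail curves as in the following sketch, where the shaded band spans the per-delay minimum and maximum; the array shape is an assumption.

```python
import numpy as np

def tail_band(tail_probs):
    """Mean curve and fluctuation band over Q tail-probability estimates.

    `tail_probs` has shape (Q, num_delays); the common logarithm is
    taken to match the ordinate of Figure 7.
    """
    log_tails = np.log10(tail_probs)
    return log_tails.mean(axis=0), log_tails.min(axis=0), log_tails.max(axis=0)
```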
Then, similar to the above case, the experimental results for two larger sampling ratios are presented in Figure 8 and Figure 9, respectively. It can be observed that the fluctuation amplitudes of both GMEVM and TSTD decrease as the sampling ratio of the training dataset increases. TSTD follows the actual measurements throughout as the delay increases, while GMEVM initially exhibits heavy-tail characteristics and then gradually deviates from the actual measurements, because a single GPD is insufficient for accurately capturing the probabilistic characteristics when the tail exhibits heavy-tailed behavior accompanied by fluctuations. Even as the sampling ratio increases, GMM continues to exhibit significant fluctuations relative to the actual measurements. This is because, as the sampling ratio increases, the influence of tail events becomes more pronounced, and fitting heavy-tailed distributions with GMM introduces substantial errors owing to its inherent limitations. This issue persists even when the number of GMM components is increased, as shown in Figure 10, where the number of components is set to 20.
Next, the model is trained using the complete dataset. As shown in Figure 11, the resulting improvement in estimation performance is not significant. In dynamic wireless environments, the model needs to be updated periodically, and introducing a data sampling strategy can effectively reduce the computational cost. Thus, achieving an appropriate balance between the sampling ratio (and hence the training time) and model performance is essential for efficient model updates.
Table 2 provides a detailed summary of the average training time for the $Q$ PDFs under different sampling ratios. Each additional segment introduces extra parameters to train, which increases the training time to some extent. TSTD therefore requires a longer training time than GMM and GMEVM; however, the difference from GMEVM is not significant, especially when the training dataset is large.
Furthermore, the mean absolute error (MAE) and average fluctuation amplitude (AFA) of the tail probabilities across all considered delays are utilized to evaluate the performance of the estimation methods:

$$\mathrm{MAE}(y \mid x) = \frac{1}{Q} \sum_{i=1}^{Q} \left| \bar{F}_i(y \mid x) - \bar{F}_{\mathrm{meas}}(y \mid x) \right|,$$

where $\mathrm{MAE}(y \mid x)$ represents the mean absolute error of the tail probabilities at delay $y$, conditioned on packet length $x$, considering the $Q$ PDFs estimated from the $Q$ training datasets; $\bar{F}_i(y \mid x)$ represents the tail probability at delay $y$ given packet length $x$, estimated independently using the $i$-th training dataset, with $i \in \{1, 2, \ldots, Q\}$; and $\bar{F}_{\mathrm{meas}}(y \mid x)$ denotes the measured tail probability at delay $y$ under the condition of packet length $x$.
$$\mathrm{AFA}(y \mid x) = \max_{1 \le i \le Q} \bar{F}_i(y \mid x) - \min_{1 \le i \le Q} \bar{F}_i(y \mid x),$$

where $\mathrm{AFA}(y \mid x)$ represents the average fluctuation amplitude of the tail probabilities at delay $y$ given packet length $x$, considering the $Q$ PDFs estimated from the $Q$ independent training datasets, and $\max$ and $\min$ denote the functions used to select the maximum and minimum values, respectively, from a given set.
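Both metrics follow directly from these definitions; a NumPy sketch with assumed array shapes:

```python
import numpy as np

def mae_afa(est_tails, meas_tail):
    """MAE(y|x) and AFA(y|x) per the definitions above.

    est_tails: shape (Q, num_delays), one row of tail probabilities per
    training dataset; meas_tail: shape (num_delays,), measured values.
    """
    mae = np.mean(np.abs(est_tails - meas_tail), axis=0)
    afa = est_tails.max(axis=0) - est_tails.min(axis=0)
    return mae, afa
```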
As shown in Table 3, for TSTD under the considered sampling ratio, both the MAE and AFA decrease compared with those of GMM and GMEVM.
For some delay values, the MAE and AFA of TSTD either remain at the same order of magnitude as those of GMM and GMEVM or decrease by at least one order of magnitude; for example, the MAEs and AFAs for certain delay values are listed in Table 4 and Table 5. However, for specific delay values, the MAE of TSTD increases by one order of magnitude compared with GMM, while the AFA decreases by one order of magnitude, with TSTD's fluctuation interval still lying within that of GMM. The PDFs constructed by GMM and TSTD are used to approximate the PDF of the E2E delays, and in this approximation the loss function is defined as the mean of the negative log probability densities over multiple delays, which measures the average closeness between the estimated result and the target. Consequently, TSTD may fit poorly at a small number of points while its overall performance remains superior to that of GMM.
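For reference, this loss can be sketched for a plain Gaussian mixture as follows; the function illustrates only the stated loss and does not reproduce TSTD's custom tail segments.

```python
import math
import tensorflow as tf

def mixture_nll(y, mu, alpha, sigma):
    """Mean negative log probability density of a Gaussian mixture.

    Shapes: y is (batch, 1); mu, alpha, sigma are (batch, K).
    """
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    comp = norm * tf.exp(-0.5 * tf.square((y - mu) / sigma))
    pdf = tf.reduce_sum(alpha * comp, axis=1)          # mixture density p(y|x)
    return -tf.reduce_mean(tf.math.log(pdf + 1e-12))   # mean negative log density
```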
Finally, potential inaccuracies or noise in the delay measurements are considered as factors that could influence the estimation results. To evaluate the estimation methods under such conditions, zero-mean Gaussian noise with variances of 1 ms² and 3 ms², respectively, is introduced into the delay samples. As shown in Figure 12 and Figure 13, in the low-delay domain, the mean of each method gradually deviates from the actual measurements as the noise variance increases, while the fluctuation amplitudes remain largely unaffected. In the high-delay domain, as the noise variance increases, the mean and fluctuation amplitude of TSTD remain almost unchanged; for GMEVM, the fluctuation amplitude initially increases and then decreases, while the mean remains nearly constant; and for GMM, the mean is relatively stable, while the fluctuation amplitude continues to decrease slightly. This is because a higher proportion of the data lies in the low-delay domain, where changes in the data can significantly affect the mean of the fitting results; additionally, the added noise tends to smooth out sharp fluctuations, ultimately reducing the fluctuation amplitudes. In summary, compared with GMM and GMEVM, the multiple-segmentation feature of TSTD makes it more robust when the measurement data are slightly inaccurate or contain low levels of noise.
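The noise-injection step can be sketched as follows; the function and parameter names are illustrative.

```python
import numpy as np

def add_measurement_noise(delays_ms, variance, seed=0):
    """Add zero-mean Gaussian noise with the given variance (in ms^2)
    to the delay samples, as in the robustness experiment above."""
    rng = np.random.default_rng(seed)
    return delays_ms + rng.normal(0.0, np.sqrt(variance), size=delays_ms.shape)
```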