Asymptotic Analysis of a Thresholding Method for Sparse Models with Application to Network Delay Detection

Evgeniy Melezhnikov; Oleg Shestakov; Evgeniy Stepanov

doi:10.3390/math14010148

¹ Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, Moscow 119991, Russia

² Moscow Center for Fundamental and Applied Mathematics, Moscow 119991, Russia

³ Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, Moscow 119333, Russia

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(1), 148; https://doi.org/10.3390/math14010148

This article belongs to the Special Issue Parametric and Nonparametric Statistics: From Theory to Applications, 2nd Edition

Abstract

This paper explores a stochastic model of noisy observations with a sparse true signal structure. Such models arise in a wide range of applications, including signal processing, anomaly detection, and performance monitoring in telecommunication networks. As a motivating example, we consider round-trip time (RTT) data, which characterize the transit time of network packets, where rare, anomalously large values correspond to localized network congestion or failures. The focus is on the asymptotic properties of the mean-square risk associated with thresholding procedures. Upper bounds are obtained for the mean-square risk when using the theoretically optimal threshold. In addition, a central limit theorem and a strong law of large numbers are established for the empirical risk estimate. The results provide a theoretical basis for assessing the effectiveness of thresholding methods in localizing rare anomalous components in noisy data.

Keywords:

thresholding; mean square risk; Gaussian noise; sparse models; central limit theorem; asymptotic analysis; signal processing; network delays (RTT)

MSC:

62G20; 68Q87

1. Introduction

In recent decades, the rapid development of computing technologies and data acquisition systems has led to a substantial increase in the volume and dimensionality of observed data across a wide range of applied fields, including astronomy, physics, biomedicine, and telecommunications. Modern measurement systems operate at high sampling rates, providing fine temporal and spatial resolution. At the same time, the growth of observation density inevitably amplifies the impact of noise caused by external interference, hardware instability, and random environmental fluctuations. As a result, observed data typically represent a superposition of a weak informative signal and a dominant noise component, while truly informative observations constitute only a small fraction of the entire sample.

In many practical situations, the underlying signal exhibits a pronounced sparsity property: only a limited number of components carry significant information, whereas the majority of observations remain close to zero and can be treated as noise. Such sparse structures naturally arise in high-dimensional statistical problems, where informative events manifest themselves as rare and localized deviations. Sparse signal models have proven effective in a variety of applications, including:

Astronomical observations, where rare astrophysical events such as flares, gamma-ray bursts, or gravitational wave signatures appear against a dominant background noise;
Biomedical diagnostics, where abnormal physiological activity is reflected by infrequent pulses embedded in quasi-stationary biosignals such as ECG or EEG recordings;
Telecommunications, where occasional packet losses or excessive delays indicate network congestion or failures amid predominantly stable transmission conditions;
Industrial monitoring systems, where most sensor readings correspond to normal operation, while rare deviations signal faults or emergency situations.

Round-Trip Time Analysis as a Motivating Example. In modern distributed systems and telecommunication networks, monitoring packet delivery latency, commonly measured via round-trip time (RTT), plays a central role in performance assessment and anomaly detection. For each transmitted packet, RTT is defined as the time elapsed between sending the packet and receiving the corresponding acknowledgment. Under normal operating conditions, RTT values fluctuate within a narrow range around a baseline level, whereas rare, abnormally large delays often indicate transient congestion, queueing effects, or network malfunctions; see, for example, ref. [1].

Accordingly, the sequence

(RTT_{1}, \dots, RTT_{N})

can be viewed as a noisy observation vector in which most entries are close to a baseline value, while a small number of observations correspond to anomalous events. After suitable centering, for instance by subtracting a robust location estimate such as the median, the data naturally conform to the sparse observation model studied in this paper. In this context, thresholding procedures serve as a simple and computationally efficient mechanism for suppressing noise while retaining rare, informative deviations associated with abnormal network delays.

RTT-based measurements and congestion detection mechanisms have been actively studied in the networking literature. Existing approaches focus primarily on protocol design, congestion control, and empirical performance evaluation; see, e.g., refs. [2,3,4]. In contrast, the present paper addresses a complementary theoretical problem: we study the asymptotic risk properties of thresholding procedures in sparse stochastic observation models, with RTT data serving as a motivating example rather than the only application domain.

Thresholding and Risk Analysis. Among various approaches to sparse signal recovery, thresholding methods remain particularly attractive due to their conceptual simplicity and strong theoretical guarantees. The central problem in thresholding consists in selecting an appropriate threshold value that balances noise suppression and signal preservation. A natural quantitative criterion for evaluating this trade-off is the mean-square risk of the thresholded estimator.

The asymptotic behavior of risk functionals and their empirical estimates has been extensively studied for a wide range of thresholding and shrinkage procedures; see, for instance, refs. [5,6,7,8,9,10,11,12,13,14,15]. These works establish consistency, asymptotic normality, and minimax optimality properties under various sparsity and dependence assumptions.

The present paper contributes to this line of research by providing (i) an upper bound on the mean-square risk at the theoretically optimal threshold under an asymptotic sparsity regime, and (ii) asymptotic distributional results for the empirical risk estimate, including a central limit theorem and a strong law of large numbers.

The remainder of the paper is organized as follows. Section 2 introduces the sparse stochastic observation model and the thresholding framework. Section 3 studies the theoretical properties of the mean-square risk and its empirical estimate. In particular, Section 3.1 derives an upper bound for the risk at the theoretically optimal threshold, while Section 3.2 establishes asymptotic results for the empirical risk estimate, including a central limit theorem and a strong law of large numbers. Section 4 presents a numerical illustration based on real RTT data. Concluding remarks are given in Section 5.

2. Data Model

Consider the observed data vector

X_{i} = μ_{i} + ϵ_{i}, \quad i = 1, \dots, N,

(1)

where

μ_{i}

are the true (unknown) signal values, and

ϵ_{i} \sim N (0, σ^{2})

are independent Gaussian random variables modeling additive noise. Thus, the random variables

X_{i}

represent noisy measurements of the underlying signal. Throughout the paper, we adopt the standard additive white Gaussian noise (AWGN) model. In applied interpretations, external interference may be viewed as one of the physical sources contributing to this effective Gaussian perturbation, but no separate interference model is assumed.

To describe the signal structure, we assume that

μ_{i} = a_{i} z_{i},

(2)

where

a_{i} \in R

are fixed (deterministic) amplitude coefficients, and

z_{i} \sim Bernoulli (p_{N})

are random indicators independent of each other and of

ϵ_{i}

, taking the value 1 with probability

p_{N}

and 0 with probability

1 - p_{N}

. The parameter

p_{N} \in (0, 1)

may depend on the sample size N and characterizes the proportion of nonzero signal components. Here, the random indicators

z_{i}

model the presence or absence of an informative signal component at position i. The Bernoulli distribution reflects the assumption that such informative events are rare and occur sporadically, without a predetermined structure or fixed location within the observation vector. This modeling choice is natural in applications where anomalous events appear unpredictably in time or space.

The coefficients

a_{i}

represent the amplitudes of these rare signal components. They are treated as fixed but unknown quantities, allowing for heterogeneous magnitudes of anomalies. In the context of network delay analysis,

a_{i}

correspond to excess delays caused by transient congestion or queueing effects. The quantity

N_{0} = \sum_{i = 1}^{N} z_{i}

(3)

describes the number of active (non-zero) signal elements and has a binomial distribution

N_{0} \sim Bin (N, p_{N})

. We assume that the proportion

p_{N}

tends to zero as N increases. This situation describes a case of asymptotic sparsity, when the number of informative components constitutes a very small fraction of the entire data vector. For the random variable

N_{0}

, we introduce the corresponding notation

P (N_{0} = k) = p_{N k}

.

Thus, this model describes a noisy observation vector in which significant (non-zero) components are extremely rare and randomly distributed across positions. In other words, informative signal elements do not form a stable structure and can appear in any coordinate of the vector with a low probability

p_{N}

. This assumption reflects the typical behavior of real-world sparse data encountered in signal recovery problems.
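The observation model (1)–(3) can be simulated directly. The following is a minimal sketch, not the authors' code: the constant amplitudes, the value of γ, and the specific choice of p_N (taken so that N p_N = N^{1-γ} / ln N, which satisfies the sparsity regime considered in this paper) are illustrative assumptions.

```python
import math
import random

def simulate_sparse_observations(n, p_n, amplitudes, sigma, rng):
    """Draw X_i = a_i * z_i + eps_i, z_i ~ Bernoulli(p_n), eps_i ~ N(0, sigma^2)."""
    z = [1 if rng.random() < p_n else 0 for _ in range(n)]   # activity indicators
    mu = [a * zi for a, zi in zip(amplitudes, z)]            # mu_i = a_i * z_i
    x = [m + rng.gauss(0.0, sigma) for m in mu]              # additive Gaussian noise
    return x, mu, z

rng = random.Random(0)
n, gamma, sigma = 10_000, 0.6, 1.0
p_n = n ** (-gamma) / math.log(n)   # assumption: N * p_N = N^(1-gamma) / ln N
amplitudes = [8.0] * n              # assumption: constant amplitudes a_i
x, mu, z = simulate_sparse_observations(n, p_n, amplitudes, sigma, rng)
n0 = sum(z)                         # N_0 ~ Bin(N, p_N), number of active components
print(n0, n * p_n, n ** (1 - gamma))
```

With these settings the expected number of active components is about 4, well below N^{1-γ} ≈ 40.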

For further analysis, we introduce a quantitative condition for sparsity. Let there exist a parameter

γ \in (0, 1)

such that

N p_{N} = o (N^{1 - γ}), N \to \infty .

(4)

This condition formalizes an asymptotic sparsity regime in which the expected number of informative components

E N_{0} = N p_{N}

grows sublinearly with respect to the sample size. The parameter

γ

quantifies the degree of sparsity: larger values of

γ

correspond to rarer signal occurrences. Such regimes naturally arise in high-frequency monitoring systems, where the observation horizon increases while anomalous events remain infrequent.

In other words, the mathematical expectation of the number of nonzero components

E N_{0} = N p_{N}

grows slower than

N^{1 - γ}

. This condition defines the degree of signal sparsity and will serve as a basic assumption in subsequent theoretical results.

Note that this stochastic structure directly corresponds to data on packet delivery delays in network systems. Set

X_{i} = RTT_{i} - median (RTT_{1}, \dots, RTT_{N})

, where

RTT_{i}

is the observed transit time of the i-th packet, so that the centering uses a robust location estimate. Then typical delay values after centering are close to zero and are described by the noise component

ϵ_{i}

, while rare anomalous bursts correspond to nonzero values

μ_{i} = a_{i} z_{i}

. Thus, the model can be interpreted as a scheme for detecting rare, anomalously high network delays against a background of Gaussian noise.

To reconstruct the true components

μ_{i}

from the observed

X_{i}

, we use the threshold function

p (T, X_{i}) = \begin{cases} X_{i} - T, & X_{i} > T, \\ X_{i} + T, & X_{i} < - T, \\ 0, & | X_{i} | \leq T, \end{cases}

(5)

depending on the threshold parameter

T > 0

. The corresponding estimate of the true signal has the form

{\hat{μ}}_{i} = p (T, X_{i}) .

(6)

This type of threshold function provides continuous truncation of small coefficients and is widely used, for example, in wavelet analysis of signals.

The function

p (T, X)

corresponds to the classical soft-thresholding rule (Figure 1). It continuously shrinks small-magnitude observations toward zero while preserving larger coefficients. Such thresholding functions naturally arise as solutions of quadratic risk minimization problems and are widely used in sparse signal recovery and wavelet shrinkage due to their stability and favorable theoretical properties [16].

Figure 1. Soft-thresholding function

p (T, x)

. Observations with magnitude below the threshold T are mapped to zero, while larger values are shrunk toward zero by an amount T.
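The rule (5) translates into a few lines of code. The sketch below is illustrative; the function name and the sample values are not taken from the paper.

```python
def soft_threshold(t, x):
    """Soft-thresholding rule (5): zero out |x| <= t, shrink larger values by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

values = [-3.0, -0.5, 0.0, 0.8, 2.5]
estimates = [soft_threshold(1.0, v) for v in values]
print(estimates)  # [-2.0, 0.0, 0.0, 0.0, 1.5]
```

The output is continuous in x at ±T, which is what distinguishes soft thresholding from hard thresholding.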

One of the popular methods for choosing a threshold in a thresholding problem is the universal threshold

T_{U} = σ \sqrt{2 ln N}

. Intuitively, the universal threshold is chosen such that, with a high probability, all noise coefficients are less than this value in absolute value. Indeed, if

Y_{i}

,

i = 1, \dots, N,

have a normal distribution

N (0, σ^{2})

, then

P (max_{1 \leq i \leq N} | Y_{i} | > T_{U}) \to 0 .

(7)

Thus, the universal threshold ensures complete suppression of noise coefficients with an asymptotic probability of 1 and is the largest among the so-called useful thresholds (i.e., those that preserve significant signal coefficients but remove noise ones). In particular, any higher threshold will lead to excessive zeroing of informative coefficients.

Therefore, the universal threshold can be considered the upper limit of the range of rational (useful) thresholds. More details can be found in the monograph [16].
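For independent Gaussian noise, the probability in (7) has a closed form, 1 - (1 - P(|Y_i| > T_U))^N, so its decay can be checked numerically. The sketch below assumes σ = 1 and a few illustrative sample sizes; the function names are hypothetical. The standard Gaussian tail approximation suggests a decay of order 1 / \sqrt{π ln N}, so the convergence in (7) is slow.

```python
import math

def universal_threshold(sigma, n):
    """T_U = sigma * sqrt(2 ln N)."""
    return sigma * math.sqrt(2.0 * math.log(n))

def prob_max_exceeds(t, sigma, n):
    """Exact P(max_i |Y_i| > t) for n independent N(0, sigma^2) variables."""
    tail = math.erfc(t / (sigma * math.sqrt(2.0)))   # P(|Y_i| > t)
    return -math.expm1(n * math.log1p(-tail))        # 1 - (1 - tail)^n

probs = {n: prob_max_exceeds(universal_threshold(1.0, n), 1.0, n)
         for n in (10**3, 10**6, 10**9)}
for n, p in probs.items():
    print(n, round(p, 4), round(1.0 / math.sqrt(math.pi * math.log(n)), 4))
```

The second printed column is the approximation 1 / \sqrt{π ln N}, shown only for comparison.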

To quantitatively describe the reconstruction quality, we introduce the mean-square risk

R (T) = \sum_{i = 1}^{N} E {({\hat{μ}}_{i} - μ_{i})}^{2},

(8)

which serves as a natural measure of the thresholding procedure’s efficiency. Minimizing

R (T)

with respect to the parameter T leads to the definition of the optimal threshold

T_{min} = \arg min_{T \in (0, T_{U}]} R (T),

(9)

ensuring the smallest value of the mean-square deviation between the reconstructed and true signals. The value of

T_{min}

depends on the unknown components

μ_{i}

and, therefore, cannot be calculated directly in practice.

To circumvent this difficulty, a risk estimate expressed in terms of observed data is used:

\hat{R} (T) = \sum_{i = 1}^{N} F (T, X_{i}),

(10)

where the function

F (T, X_{i})

is defined by the rule

F (T, X_{i}) = \begin{cases} X_{i}^{2} - σ^{2}, & | X_{i} | \leq T, \\ T^{2} + σ^{2}, & | X_{i} | > T . \end{cases}

(11)

It is easy to verify that the estimate constructed in this way is unbiased, that is,

E \hat{R} (T) = R (T)

.
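The unbiasedness of (10) can also be checked by simulation. The following Monte Carlo sketch compares the average squared loss of the soft-thresholding estimate with the average of the risk estimate. It assumes a fixed signal vector (five active components of amplitude 4 among 100 coordinates), σ = 1 and T = 1.5; all of these values are illustrative.

```python
import random

def soft_threshold(t, x):
    return x - t if x > t else x + t if x < -t else 0.0

def sure_term(t, x, sigma):
    """F(T, x) from (11)."""
    return x * x - sigma**2 if abs(x) <= t else t * t + sigma**2

rng = random.Random(1)
sigma, t = 1.0, 1.5
mu = [0.0] * 95 + [4.0] * 5      # assumption: fixed illustrative signal
reps = 4000
loss_sum = sure_sum = 0.0
for _ in range(reps):
    x = [m + rng.gauss(0.0, sigma) for m in mu]
    loss_sum += sum((soft_threshold(t, xi) - m) ** 2 for xi, m in zip(x, mu))
    sure_sum += sum(sure_term(t, xi, sigma) for xi in x)
mean_loss, mean_sure = loss_sum / reps, sure_sum / reps
print(mean_loss, mean_sure)      # the two averages should be close
```

The first average approximates R(T) and the second approximates E \hat{R}(T); they differ only by Monte Carlo error.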

Similarly, we define an adaptive threshold that minimizes the empirical risk estimate:

T_{S} = \arg min_{T \in (0, T_{U}]} \hat{R} (T),

(12)

which is known in the literature as the SURE threshold (Stein’s Unbiased Risk Estimate threshold). The threshold

T_{S}

is a random variable dependent on the observations

X_{i}

and serves as a practical replacement for the theoretically optimal

T_{min}

.
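One simple way to compute the threshold (12) uses the fact that, between consecutive values of |X_i|, the estimate (10) is nondecreasing in T, so it suffices to evaluate it at the observed magnitudes not exceeding T_U and at T_U itself. The sketch below is a direct quadratic-time implementation under that observation; it ignores the limit T → 0+, and the synthetic data are illustrative.

```python
import math
import random

def sure_risk(t, xs, sigma):
    """Empirical risk estimate (10) with the terms F(T, X_i) from (11)."""
    return sum(x * x - sigma**2 if abs(x) <= t else t * t + sigma**2 for x in xs)

def sure_threshold(xs, sigma):
    """Minimize the risk estimate over the candidates |X_i| <= T_U and T_U."""
    t_u = sigma * math.sqrt(2.0 * math.log(len(xs)))
    candidates = {abs(x) for x in xs if 0.0 < abs(x) <= t_u} | {t_u}
    return min(candidates, key=lambda t: sure_risk(t, xs, sigma)), t_u

rng = random.Random(3)
n, sigma = 1000, 1.0
# Assumption: about 1% of coordinates carry a signal of amplitude 6.
xs = [(6.0 if rng.random() < 0.01 else 0.0) + rng.gauss(0.0, sigma) for _ in range(n)]
t_s, t_u = sure_threshold(xs, sigma)
print(t_s, t_u)
```

For large N, sorting the magnitudes and using cumulative sums would reduce the cost to O(N log N); the direct version is kept here for readability.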

In the following discussion, the classical Bernstein inequality [17] will be used to obtain probability estimates and upper bounds on risk:

P (\sum_{i = 1}^{n} ξ_{i} \geq t) \leq exp (- \frac{t^{2} / 2}{\sum_{i = 1}^{n} E ξ_{i}^{2} + b t / 3}), t > 0,

(13)

where the random variables ξ_{1}, \dots, ξ_{n} are independent, have zero mean, and satisfy

| ξ_{i} | \leq b

a.s. This inequality allows one to obtain exponential estimates for the tails of distributions and is widely used in the analysis of sums of independent random variables.
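To see how (13) applies to the number of active components, take ξ_i = z_i - p_N, so that b = 1 and the sum of variances equals N p_N (1 - p_N). The sketch below compares the resulting bound on P(N_0 > N^{1-γ}) with the exact binomial tail. The values of N, γ and p_N are illustrative assumptions.

```python
import math

def bernstein_bound(t, variance_sum, b):
    """Right-hand side of (13)."""
    return math.exp(-(t * t / 2.0) / (variance_sum + b * t / 3.0))

def binomial_upper_tail(n, p, level):
    """Exact P(N_0 > level) for N_0 ~ Bin(n, p), via the pmf recurrence."""
    pmf = (1.0 - p) ** n                             # P(N_0 = 0)
    tail = 0.0
    for j in range(n):
        pmf *= (n - j) / (j + 1) * p / (1.0 - p)     # now P(N_0 = j + 1)
        if j + 1 > level:
            tail += pmf
    return tail

n, gamma = 5000, 0.5
level = n ** (1 - gamma)                 # N^(1-gamma)
p_n = level / (n * math.log(n))          # assumption: N * p_N = N^(1-gamma) / ln N
t = level - n * p_n                      # deviation from the mean E N_0
bound = bernstein_bound(t, n * p_n * (1.0 - p_n), 1.0)
exact = binomial_upper_tail(n, p_n, level)
print(exact, bound)
```

The exact tail is far smaller than the bound, as expected from a general-purpose inequality, but both are negligible.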

3. Theoretical Properties of the Risk and Its Estimation

3.1. Upper Bound for the Risk at the Optimal Threshold

In this section, we examine the asymptotic behavior of the risk in a sparse model defined by the condition

N p_{N} = o (N^{1 - γ})

. Under these assumptions, we show that the mean-square risk at the optimal threshold

T_{min}

is of order

N^{1 - γ} ln N

at most. The main result is formulated in the following theorem.

Theorem 1.

Let the sparsity condition

N p_{N} = o (N^{1 - γ})

be satisfied for some

γ \in (0, 1)

and let

T_{min}

be the threshold that minimizes the theoretical risk

R (T)

. Then there exists a constant

C > 0

such that

R (T_{min}) \leq C N^{1 - γ} ln N .

(14)

Proof.

We split the risk into two parts:

\begin{aligned} R (T_{min}) &= E \| \hat{μ} (T_{min}) - μ \|^{2} = \\ &= E \| \hat{μ} (T_{min}) - μ \|^{2} (\mathbf{1} (N_{0} > N^{1 - γ}) + \mathbf{1} (N_{0} \leq N^{1 - γ})) = \\ &= E [\| \hat{μ} (T_{min}) - μ \|^{2} \mathbf{1} (N_{0} > N^{1 - γ})] + E [\| \hat{μ} (T_{min}) - μ \|^{2} \mathbf{1} (N_{0} \leq N^{1 - γ})] . \end{aligned}

Consider the first part. We introduce the following notation:

Y = \| \hat{μ} (T_{min}) - μ \|^{2},

(15)

A = \{N_{0} > N^{1 - γ}\} .

(16)

In this notation, taking into account the upper bound for the terms in (8) from [18], we have:

\begin{aligned} E ([\| \hat{μ} (T_{min}) - μ \|^{2} \mathbf{1} (N_{0} > N^{1 - γ})] \mid N_{0}) &= P (A) \cdot E (Y \mid A) = P (A) \cdot E (Y \mid N_{0}) \leq \\ &\leq c N T_{U}^{2} \cdot P (N_{0} > N^{1 - γ}) . \end{aligned}

(17)

Using Bernstein’s inequality [17] and taking into account that

N p_{N} = o (N^{1 - γ})

, for

P (N_{0} > N^{1 - γ})

we obtain

\begin{aligned} P (N_{0} > N^{1 - γ}) &= P (N_{0} - E N_{0} > N^{1 - γ} - E N_{0}) \leq \\ &\leq \exp \left( - c \cdot \frac{N^{2 (1 - γ)}}{N p_{N} \left( 2 + \frac{N^{1 - γ}}{3 N p_{N}} \right)} \right) = \exp \left( - c \cdot \frac{N^{2 (1 - γ)}}{2 N p_{N} + N^{1 - γ} / 3} \right) \leq \\ &\leq \exp (- c' N^{1 - γ}) . \end{aligned}

(18)

Now consider the second part.

E [\| \hat{μ} (T_{min}) - μ \|^{2} \mathbf{1} (N_{0} \leq N^{1 - γ})] = E (E_{z} [\| \hat{μ} (T_{min}) - μ \|^{2} \mathbf{1} (N_{0} \leq N^{1 - γ}) \mid N_{0}]) .

(19)

Since the bound (18) decays exponentially, the first term does not affect the order of

T_{min}

and

R (T_{min})

. Therefore, using the method of proof of Lemma 2 from [15], we can verify that

T_{min} \geq σ \sqrt{2 γ \ln N} - α_{n}

, where

| α_{n} | < c' \frac{\ln \ln N}{\sqrt{\ln N}}

. Next, using the reasoning of Theorem 2 from [14], we obtain the estimate

E (E_{z} [\| \hat{μ} (T_{min}) - μ \|^{2} \mathbf{1} (N_{0} \leq N^{1 - γ}) \mid N_{0}]) \leq c N^{1 - γ} \ln N,

(20)

where c is some positive constant.

Combining the given estimates, we obtain the statement of the theorem. □

The resulting estimate defines the order of theoretical risk at the optimal threshold. Let us now turn to an analysis of the probabilistic properties of its empirical estimate

\hat{R} (T)

.

3.2. Central Limit Theorem and Strong Law of Large Numbers for the Risk Estimate

Theorem 2.

Let the conditions of Theorem 1 be satisfied and

1 / 2 < γ < 1

. Then

P \left( \frac{\hat{R} (T_{S}) - R (T_{min})}{σ^{2} \sqrt{2 N}} < x \right) \underset{N \to \infty}{\to} Φ (x) .

(21)

Proof.

\begin{aligned} \frac{\hat{R} (T_{S}) - R (T_{min})}{σ^{2} \sqrt{2 N}} &= \frac{(\hat{R} (T_{S}) - R (T_{min})) (\mathbf{1} (N_{0} > N^{1 - γ}) + \mathbf{1} (N_{0} \leq N^{1 - γ}))}{σ^{2} \sqrt{2 N}} \equiv \\ &\equiv \frac{(\hat{R} (T_{min}) - R (T_{min})) \mathbf{1} (N_{0} \leq N^{1 - γ})}{σ^{2} \sqrt{2 N}} + U (T_{S}, T_{min}) . \end{aligned}

(22)

Let us denote the first term by

z_{N}

and consider the difference between the distribution function of

z_{N}

and the distribution function of the standard normal law:

\begin{aligned} | P (z_{N} < x) - Φ (x) | &= \left| \sum_{k = 0}^{[N^{1 - γ}]} p_{N k} \cdot P (z_{N} < x \mid N_{0} = k) - Φ (x) \right| \leq \\ &\leq \sum_{k = 0}^{[N^{1 - γ}]} p_{N k} \cdot | P (z_{N} < x \mid N_{0} = k) - Φ (x) | + Φ (x) \cdot P (N_{0} > N^{1 - γ}) \leq \\ &\leq \sup_{0 \leq k \leq [N^{1 - γ}]} | P (z_{N} < x \mid N_{0} = k) - Φ (x) | \sum_{k = 0}^{[N^{1 - γ}]} p_{N k} + \exp (- c' N^{1 - γ}) . \end{aligned}

(23)

Because of the sparsity conditions [19]

\lim_{N \to \infty} \frac{D (\hat{R} (T_{min}) - R (T_{min}))}{2 N σ^{4}} = 1,

(24)

and since

\sum_{k = 0}^{[N^{1 - γ}]} p_{N k} \leq 1

, using the Berry-Esseen inequality for sums of differently distributed independent random variables, we obtain that as

N \to \infty

\sup_{x \in R} \sup_{0 \leq k \leq [N^{1 - γ}]} | P (z_{N} < x \mid N_{0} = k) - Φ (x) | \to 0 .

(25)

It remains to show that

U (T_{S}, T_{min})

tends to zero in probability as

N \to \infty

. By (22),

U (T_{S}, T_{min}) = \frac{\hat{R} (T_{S}) - \hat{R} (T_{min})}{σ^{2} \sqrt{2 N}} + \frac{(\hat{R} (T_{min}) - R (T_{min})) \mathbf{1} (N_{0} > N^{1 - γ})}{σ^{2} \sqrt{2 N}} .

(26)

For the second term, arguing as in Theorem 1, we obtain the estimate

P \left( \left| \frac{(\hat{R} (T_{min}) - R (T_{min})) \mathbf{1} (N_{0} > N^{1 - γ})}{σ^{2} \sqrt{2 N}} \right| > ϵ \right) \leq P (N_{0} > N^{1 - γ}) \leq \exp (- c' N^{1 - γ}) .

(27)

Consider now the first term:

\begin{aligned} & P \left( \left| \frac{\hat{R} (T_{S}) - \hat{R} (T_{min})}{σ^{2} \sqrt{2 N}} \cdot \mathbf{1} (N_{0} \leq N^{1 - γ}) \right| > ϵ \right) = \\ &= \sum_{k = 0}^{[N^{1 - γ}]} p_{N k} P \left( \left| \frac{\hat{R} (T_{S}) - \hat{R} (T_{min})}{σ^{2} \sqrt{2 N}} \right| > ϵ \mid N_{0} = k \right) \leq \\ &\leq \sup_{0 \leq k \leq [N^{1 - γ}]} P \left( \left| \frac{\hat{R} (T_{S}) - \hat{R} (T_{min})}{σ^{2} \sqrt{2 N}} \right| > ϵ \mid N_{0} = k \right) . \end{aligned}

(28)

Repeating the reasoning of Lemma 3 and Theorem 1 from [15], we verify that for any

ϵ > 0

\begin{aligned} & \sup_{0 \leq k \leq [N^{1 - γ}]} P \left( \left| \frac{\hat{R} (T_{S}) - \hat{R} (T_{min})}{σ^{2} \sqrt{2 N}} \right| > ϵ \mid N_{0} = k \right) \leq \\ &\leq \sup_{0 \leq k \leq [N^{1 - γ}]} \left( P (T_{S} \leq T^{*} \mid N_{0} = k) + P \left( \sup_{T \in [T^{*}, T_{U}]} \left| \frac{\hat{R} (T) - \hat{R} (T_{min})}{σ^{2} \sqrt{2 N}} \right| > ϵ \mid N_{0} = k \right) \right), \end{aligned}

(29)

where

T^{*} = σ \sqrt{2 β ln N}

with an arbitrary

β \in (1, 2 γ)

, and each term tends to zero as

N \to \infty

. Thus,

U (T_{S}, T_{min}) \overset{P}{\longrightarrow} 0 \quad \text{as } N \to \infty .

(30)

The theorem is proved. □
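The normalization σ^{2} \sqrt{2 N} in (21) can be motivated by a short heuristic computation, given here only as an informal sketch and not as part of the proof. For a pure-noise coordinate, X_i ~ N(0, σ^2), that falls below the threshold, the corresponding term of the risk estimate is X_i^2 - σ^2, which has zero mean and variance

```latex
\mathsf{D}\,(X_i^2 - \sigma^2)
  = \mathsf{E}\,(X_i^2 - \sigma^2)^2
  = \mathsf{E} X_i^4 - 2\sigma^2\, \mathsf{E} X_i^2 + \sigma^4
  = 3\sigma^4 - 2\sigma^4 + \sigma^4
  = 2\sigma^4 .
```

Under the sparsity condition almost all of the N coordinates are of this type, so the variance of the sum is approximately 2 N σ^4, in agreement with (24).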

Theorem 3.

Let the conditions of Theorem 1 be satisfied. For any

0 < γ < 1

, as

N \to \infty

\frac{\hat{R} (T_{S}) - R (T_{min})}{N} \overset{a.s.}{\longrightarrow} 0 .

(31)

Proof.

For any

ϵ > 0

\begin{aligned} P \left( \left| \frac{\hat{R} (T_{S}) - R (T_{min})}{N} \right| > ϵ \right) &= P \left( \left| \frac{\hat{R} (T_{S}) - R (T_{min})}{N} \right| > ϵ, N_{0} \leq N^{1 - γ} \right) + \\ &+ P \left( \left| \frac{\hat{R} (T_{S}) - R (T_{min})}{N} \right| > ϵ, N_{0} > N^{1 - γ} \right) \equiv \\ &\equiv p_{N} + q_{N} . \end{aligned}

(32)

\begin{matrix} q_{N} \leq P (N_{0} > N^{1 - γ}) \leq exp (- c^{'} N^{1 - γ}) \end{matrix}

(33)

and hence

\sum_{N = 1}^{\infty} q_{N} < \infty .

For

p_{N}

we have

\begin{aligned} p_{N} &= \sum_{k = 0}^{[N^{1 - γ}]} p_{N k} P \left( \left| \frac{\hat{R} (T_{S}) - R (T_{min})}{N} \right| > ϵ \mid N_{0} = k \right) \leq \\ &\leq \sup_{0 \leq k \leq [N^{1 - γ}]} P \left( \left| \frac{\hat{R} (T_{S}) - R (T_{min})}{N} \right| > ϵ \mid N_{0} = k \right) \leq \\ &\leq \sup_{0 \leq k \leq [N^{1 - γ}]} \left( P (T_{S} \leq T^{*} \mid N_{0} = k) + P \left( \sup_{T \in [T^{*}, T_{U}]} \left| \frac{\hat{R} (T) - \hat{R} (T_{min})}{N} \right| > ϵ \mid N_{0} = k \right) \right), \end{aligned}

(34)

where

T^{*} = σ \sqrt{2 β ln N}

with an arbitrary

β \in (0, min {1, 2 γ})

. Then arguing as in Lemma 3 and Theorem 2 of [15], we obtain that for some positive constants

c_{1}

,

c_{2}

,

C_{1}

and

C_{2}

\begin{matrix} p_{N} \leq C_{1} N exp (- c_{1} \frac{N^{1 - β}}{{(ln N)}^{3}}) + C_{2} exp (- c_{2} \frac{N}{{(ln N)}^{2}}) \end{matrix}

(35)

and

\sum_{N = 1}^{\infty} p_{N} < \infty

. Thus,

\begin{matrix} \sum_{N = 1}^{\infty} (p_{N} + q_{N}) < \infty \end{matrix}

(36)

and by virtue of the Borel-Cantelli lemma the convergence (31) holds. □

4. Numerical Illustration: RTT Data Analysis

To illustrate the practical relevance of the proposed sparse observation model and the thresholding methodology, we present a numerical experiment based on real round-trip time (RTT) measurements collected in a controlled network probing setup. To clarify the overall processing pipeline used in the numerical experiment, Figure 2 presents a schematic overview of the main steps, from raw RTT measurements to sparse anomaly localization.

Figure 2. Schematic illustration of the data processing pipeline. Raw RTT measurements are centered using a robust location estimate, the noise level is estimated via a MAD-based procedure, an adaptive threshold is selected using the SURE criterion, and soft-thresholding is applied to obtain a sparse representation highlighting anomalous RTT components.

4.1. Data Acquisition and Preprocessing

RTT measurements were obtained by periodically sending batches of probe packets to a fixed network destination. Sending packets in batches allows reliable detection of packet losses and anomalous delays, while repeated probing with a fixed periodicity ensures temporal consistency of measurements.

All probes targeted the same network endpoint; therefore, the routing path can be assumed to be stable during the observation period. This allows abrupt RTT deviations to be interpreted primarily as manifestations of transient congestion or queueing effects, rather than route changes.

Each observation consists of a timestamp (in nanoseconds) and the corresponding RTT value (in milliseconds). Let

\{RTT_{i}\}_{i = 1}^{N}

denote the recorded sequence, where N = 54,000. To remove the constant delay component and obtain data compatible with the assumed noise model, the observations were centered with respect to the median value:

X_{i} = RTT_{i} - median (RTT_{1}, \dots, RTT_{N}) .

(37)

The median was chosen as a robust baseline, since it is insensitive to rare but extremely large RTT values and preserves the approximate symmetry of the noise component.

The noise level

σ

was estimated using the median absolute deviation (MAD) [16],

\hat{σ} = 1.4826 \cdot median (| X_{i} |),

(38)

which yielded

\hat{σ} \approx 0.45

ms for the considered dataset.
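The centering step (37) and the MAD-based estimate (38) can be sketched as follows. The RTT values below are synthetic: a 20 ms baseline, Gaussian jitter with standard deviation 0.45 ms, and 1% spikes of 15 ms are assumptions made for illustration, not the measured dataset.

```python
import random
import statistics

def center_and_scale(rtt):
    """Median-center RTT values as in (37) and estimate sigma by the MAD rule (38)."""
    baseline = statistics.median(rtt)
    x = [r - baseline for r in rtt]
    sigma_hat = 1.4826 * statistics.median(abs(v) for v in x)
    return x, sigma_hat

rng = random.Random(2)
# Synthetic stand-in for measured RTT values, in milliseconds.
rtt = [20.0 + rng.gauss(0.0, 0.45) + (15.0 if rng.random() < 0.01 else 0.0)
       for _ in range(5000)]
x, sigma_hat = center_and_scale(rtt)
print(round(sigma_hat, 3))
```

Because both the baseline and the scale are medians, the rare spikes have almost no influence on the estimate, which stays close to the jitter level used to generate the data.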

4.2. Sparse Structure After Thresholding

Figure 3 illustrates the effect of thresholding on the centered RTT data. The original centered observations are shown together with the output of the continuous soft-thresholding function

p (T, X_{i})

.

Figure 3. Sparse structure of RTT deviations after thresholding. The blue curve corresponds to centered RTT observations, while the orange curve shows the thresholded signal. Thresholding suppresses small-amplitude fluctuations attributed to noise, while preserving rare, high-magnitude deviations corresponding to anomalous network delays.

As predicted by the sparse observation model, the vast majority of coefficients are set to zero after thresholding, while only a small number of significant components remain nonzero. In the present experiment, only about

2.4 %

of the observations retain nonzero values after thresholding, which is consistent with the Bernoulli-type sparsity assumption introduced in Section 2.

4.3. Empirical Risk Minimization

To demonstrate the adaptive threshold selection mechanism, we computed Stein’s unbiased risk estimate

\hat{R} (T)

over a range of threshold values. The search was restricted to the interval

(0, T_{U}]

, where

T_{U} = \hat{σ} \sqrt{2 ln N}

(39)

is the universal threshold. For the considered dataset,

T_{U} \approx 2.08

ms.
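As a quick arithmetic check of (39), the sketch below evaluates the universal threshold for N = 54,000 using the rounded value \hat{σ} ≈ 0.45 ms, assuming the natural logarithm. The rounded input gives about 2.10 ms. The reported 2.08 ms would correspond to an unrounded estimate of roughly 0.446 ms, which is an inference from the rounding and not a figure stated in the paper.

```python
import math

def universal_threshold(sigma, n):
    """T_U = sigma * sqrt(2 ln N)."""
    return sigma * math.sqrt(2.0 * math.log(n))

n = 54_000
t_u_rounded = universal_threshold(0.45, n)               # from the rounded sigma estimate
implied_sigma = 2.08 / math.sqrt(2.0 * math.log(n))      # sigma implied by T_U = 2.08 ms
print(round(t_u_rounded, 2), round(implied_sigma, 3))    # 2.1 0.446
```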

Figure 4 shows the resulting empirical risk curve together with the selected SURE threshold

T_{S}

.

Figure 4. Empirical SURE risk

\hat{R} (T)

as a function of the threshold value T. The dashed vertical line indicates the data-driven threshold

T_{S}

minimizing the empirical risk. The dotted vertical line corresponds to the universal threshold

T_{U}

.

The empirical risk curve exhibits a clear and well-defined minimum, illustrating the theoretical results on the existence and stability of an optimal threshold. In particular, overly small thresholds retain excessive noise, while overly large thresholds lead to the loss of informative signal components. The SURE-based procedure yields

T_{S} \approx 0.95

ms in this experiment.

4.4. Summary Statistics

Table 1 reports summary statistics of the RTT data and the resulting sparse representation.

Table 1. Summary statistics of RTT data and thresholding results.

These results quantitatively confirm that RTT anomalies form a highly sparse structure, thereby supporting the modeling assumptions and validating the applicability of the proposed asymptotic analysis.

5. Conclusions

This paper presents an asymptotic analysis of the mean-square risk of thresholding noisy signals in a sparse stochastic observation model. For the theoretically optimal threshold, that is, the minimizer of the theoretical risk, an upper bound is obtained showing that the risk is at most of order

N^{1 - γ} ln N

.

Furthermore, a central limit theorem and a strong law of large numbers are established for the empirical risk estimate. The latter guarantees that the difference between the empirical risk and its theoretical counterpart, normalized by N, converges to zero almost surely. These results provide a rigorous theoretical basis for analyzing the asymptotic properties of thresholding procedures in problems of sparse signal recovery from observations with Gaussian noise.

The obtained estimates refine the classical results of Donoho and Johnstone (see [20]) and confirm the effectiveness of thresholding methods under sparsity conditions.

At the same time, the proposed analysis has several limitations. In particular, the theoretical results are derived under the assumption of independent Gaussian noise and rely on an asymptotic framework with a large number of observations. Moreover, the sparsity structure is characterized by a specific rate of decay of the proportion of nonzero components, which may not fully capture more complex or heterogeneous sparse patterns encountered in practice.

A promising direction for future research is to extend the proposed approach to more general models incorporating dependent noise, inhomogeneous variances, and multidimensional data structures. Another important direction is to study the asymptotic behavior of the optimal threshold under different sparsity regimes and to analyze the minimax properties of the resulting procedures. Finally, it would be of interest to investigate models with a random sample size, where the number of observations N is itself a random variable. Such settings naturally arise in practical applications, including network monitoring problems with random traffic intensity or missing observations, and may require new asymptotic tools for analyzing thresholding procedures and their associated risk.

Author Contributions

Conceptualization, E.M., E.S. and O.S.; methodology, E.M. and O.S.; formal analysis, E.M. and O.S.; data curation, E.M. and E.S.; investigation, E.M. and O.S.; writing—original draft preparation, E.M. and O.S.; writing—review and editing, E.M. and O.S.; visualization, E.M.; supervision, E.M. and O.S.; funding acquisition, E.S. and O.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Economic Development of the Russian Federation in accordance with the subsidy agreement (agreement identifier 000000C313925P4H0002; grant No. 139-15-2025-012).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Soft-thresholding function $p(T, x)$. Observations with magnitude below the threshold T are mapped to zero, while larger values are shrunk toward zero by an amount T.

Figure 2. Schematic illustration of the data processing pipeline. Raw RTT measurements are centered using a robust location estimate, the noise level is estimated via a MAD-based procedure, an adaptive threshold is selected using the SURE criterion, and soft-thresholding is applied to obtain a sparse representation highlighting anomalous RTT components.
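The pipeline in Figure 2 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the median is assumed as the robust location estimate, the MAD is rescaled by the Gaussian consistency constant 0.6745, and the SURE minimizer is searched on a grid restricted to the interval from 0 to the universal threshold. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def soft_threshold(x, t):
    """Map |x| <= t to zero and shrink larger values toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def mad_sigma(x):
    """Noise scale from the median absolute deviation (assumed Gaussian noise)."""
    return np.median(np.abs(x - np.median(x))) / 0.6745


def sure_risk(x, t, sigma):
    """Stein's unbiased risk estimate for soft thresholding at threshold t."""
    a = np.abs(x)
    return np.sum(np.minimum(a, t) ** 2) + sigma**2 * (2 * np.sum(a > t) - x.size)


def select_threshold(x, sigma, grid_size=200):
    """Minimize SURE over a grid on [0, T_U]; the grid search is an assumption."""
    t_univ = sigma * np.sqrt(2.0 * np.log(x.size))
    grid = np.linspace(0.0, t_univ, grid_size)
    risks = np.array([sure_risk(x, t, sigma) for t in grid])
    return grid[np.argmin(risks)], t_univ


def detect_anomalies(rtt):
    """Center, estimate the noise level, pick a threshold, and soft-threshold."""
    centered = rtt - np.median(rtt)
    sigma = mad_sigma(centered)
    t_sure, t_univ = select_threshold(centered, sigma)
    return soft_threshold(centered, t_sure), t_sure, t_univ, sigma


if __name__ == "__main__":
    # Synthetic stand-in for RTT data (ms): Gaussian jitter plus rare large delays.
    rng = np.random.default_rng(0)
    rtt = 20.0 + 0.45 * rng.standard_normal(5000)
    spikes = rng.choice(rtt.size, size=100, replace=False)
    rtt[spikes] += rng.uniform(5.0, 15.0, size=100)

    sparse, t_sure, t_univ, sigma = detect_anomalies(rtt)
    print(f"sigma={sigma:.3f}  T_S={t_sure:.3f}  T_U={t_univ:.3f}")
    print(f"nonzero components: {np.count_nonzero(sparse)} of {sparse.size}")
```

The nonzero entries of the returned array mark candidate anomalous delays. A finer grid, or an exact search over the sorted absolute values, trades run time for a more precise minimizer.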

Figure 3. Sparse structure of RTT deviations after thresholding. The blue curve corresponds to centered RTT observations, while the orange curve shows the thresholded signal. Thresholding suppresses small-amplitude fluctuations attributed to noise, while preserving rare, high-magnitude deviations corresponding to anomalous network delays.

Figure 4. Empirical SURE risk $\hat{R}(T)$ as a function of the threshold value T. The dashed vertical line indicates the data-driven threshold $T_{S}$ minimizing the empirical risk. The dotted vertical line corresponds to the universal threshold $T_{U}$.

Table 1. Summary statistics of RTT data and thresholding results.

Quantity	Value	Units
Total number of observations N	54,000	–
Nonzero components after thresholding	1299	–
Sparsity level	2.4	%
Universal threshold $T_{U}$	2.08	ms
Selected threshold $T_{S}$	0.95	ms
Estimated noise standard deviation $\hat{\sigma}$	0.45	ms
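The entries of Table 1 can be cross-checked with a short calculation. The sketch below assumes the universal threshold has the standard form $T_{U} = \hat{\sigma}\sqrt{2 \ln N}$ and that the sparsity level is the share of nonzero components. The variable names are illustrative, and the inputs are the rounded values from the table.

```python
import math

n_total = 54_000    # total number of observations N (Table 1)
n_nonzero = 1299    # nonzero components after thresholding (Table 1)
sigma_hat = 0.45    # estimated noise standard deviation in ms, rounded as tabulated

# Share of components that survive thresholding, in percent.
sparsity_pct = 100.0 * n_nonzero / n_total

# Universal threshold under the assumed form sigma * sqrt(2 ln N).
t_universal = sigma_hat * math.sqrt(2.0 * math.log(n_total))

print(f"sparsity level: {sparsity_pct:.2f} %")
print(f"universal threshold: {t_universal:.2f} ms")
```

With these rounded inputs the sparsity level matches the tabulated 2.4%, while the universal threshold comes out at about 2.10 ms rather than 2.08 ms. This small gap would be consistent with the table reporting $\hat{\sigma}$ rounded to two decimals, though the source does not state the unrounded value.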
