Wavelet Thresholding Risk Estimate for the Model with Random Samples and Correlated Noise

Abstract: Signal de-noising methods based on threshold processing of wavelet decomposition coefficients have become popular due to their simplicity, speed, and ability to adapt to signal functions with spatially inhomogeneous smoothness. The analysis of the errors of these methods is an important practical task, since it makes it possible to evaluate the quality of both the methods and the equipment used for processing. Sometimes the nature of the signal is such that its samples are recorded at random times. If the sample points form a variational series based on a sample from the uniform distribution on the data registration interval, then the use of the standard threshold processing procedure is adequate. This paper considers a model of a signal that is registered at random times and contains noise with long-term dependence. The asymptotic normality and strong consistency of the mean-square thresholding risk estimator are proved. The obtained results make it possible to construct asymptotic confidence intervals for threshold processing errors using only the observed data.


Introduction
In digital signal processing tasks, it is often assumed that the recorded signal samples are independent. However, many physical processes exhibit long-term dependence, in which correlations between observations decay rather slowly. For example, long-term dependence is often observed in geophysical processes, where it takes the form of long periods of large or small values of observations. Interference in communication channels exhibits similar behaviour. Wavelet methods are widely used in the analysis and processing of signals recorded during the study of such processes.
The wavelet decomposition of a function f(x) is the series

f(x) = Σ_{j,k ∈ Z} ⟨f, ψ_{j,k}⟩ ψ_{j,k}(x),

where ψ_{j,k}(x) = 2^{j/2} ψ(2^j x − k) and ψ(x) is a wavelet function. The indices j and k are called the scale and the shift, respectively. This decomposition provides a time/scale representation of the signal function, which allows one to localise its features. There exist many wavelet functions with various properties. In practice, a discrete wavelet transform is used: the multiplication of a sampled signal vector by an orthogonal matrix defined by the wavelet function ψ(x) (in practice implemented with a fast cascade algorithm [1]). This transform is applied to the data, and threshold processing of the resulting wavelet coefficients is performed [1]. For a model of signal samples on an equispaced grid, these methods were thoroughly studied by D. Donoho, I. Johnstone, B. Silverman, and others [2–10]. Statistical properties of the mean-square risk estimator have also been studied: it has been shown that under certain conditions it is strongly consistent and asymptotically normal [11–13].
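To illustrate the transform-threshold-invert scheme described above, the following minimal NumPy sketch implements the orthogonal Haar cascade together with hard thresholding of the detail coefficients. The Haar wavelet is chosen only to keep the example self-contained; the paper itself uses Meyer wavelets, and the function names here are illustrative.

```python
import numpy as np

def haar_dwt(x):
    """Full Haar wavelet decomposition of a length-2^J signal.

    Returns (approx, details), where details[j] holds the detail
    coefficients of level j, ordered coarse to fine.
    """
    x = np.asarray(x, dtype=float)
    details = []
    while len(x) > 1:
        s = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # smooth (scaling) part
        d = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # detail (wavelet) part
        details.append(d)
        x = s
    return x, details[::-1]  # coarse-to-fine order

def haar_idwt(approx, details):
    """Inverse of haar_dwt."""
    x = np.asarray(approx, dtype=float)
    for d in details:
        up = np.empty(2 * len(x))
        up[0::2] = (x + d) / np.sqrt(2.0)
        up[1::2] = (x - d) / np.sqrt(2.0)
        x = up
    return x

def denoise_hard(y, T):
    """Hard-threshold every detail coefficient at level T, then invert."""
    a, ds = haar_dwt(y)
    ds = [d * (np.abs(d) > T) for d in ds]
    return haar_idwt(a, ds)
```

Because the transform is orthogonal, the energy of the signal equals the energy of its coefficients, which is what makes the coefficient-wise risk accounting in the later sections possible.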
In some experiments it is not possible to record signal samples at regular intervals [14]; sometimes the samples are registered at random times. It was shown by T. Cai and L. Brown [15] that, if the sample points form a variational series based on a sample from the uniform distribution on the data registration interval, then the rate of the mean-square thresholding risk remains, up to a logarithmic factor, equal to the optimal rate in the class of Lipschitz-regular functions. The uniform distribution appears, for example, when considering a Poisson process, since the conditional distribution of its points on a given time interval, given the number of points, is uniform. Such models arise, for example, in astronomy when studying stellar intensity. In this paper, it is proven that, under some regularity conditions, the statistical properties of the risk estimator remain the same for both equispaced and random sample grids.

Long-Term Dependence
Let the signal function f(x) be defined on the segment [0, 1] and be uniformly Lipschitz regular with some exponent γ > 0 and Lipschitz constant L > 0: f ∈ Lip(γ, L). Assume that the samples of f(x) contain additive correlated noise and are recorded at random times that are independent and uniformly distributed on [0, 1]. Namely, consider the following data model:

Y_i = f(x_(i)) + e_i,  i = 1, …, N,  (1)

where 0 ≤ x_(1) < … < x_(N) ≤ 1 is the variational series based on a sample from the uniform distribution on the segment [0, 1], and {e_i, i ∈ Z} is a stationary Gaussian process with the covariance sequence r_k = cov(e_i, e_{i+k}). We assume that the e_i have zero mean and unit variance. We also assume that the noise autocovariance function decreases at the rate r_k ∼ k^{−α}, where 0 < α < 1. This corresponds to long-term dependence between the observations [7]. The observations consist of the pairs (x_(1), Y_1), …, (x_(N), Y_N), where the distances between the sample points are, in general, not equal. It is known that E x_(i) = i/(N + 1) (see Lemma 2 in [15]).

Along with (1), consider a sample with equal distances between the sample points:

Y_i^e = f(i/N) + e_i,  i = 1, …, N,  (2)

where N = 2^J for some J ∈ N. For the sample (2), threshold processing methods have been developed that effectively suppress the noise and provide an "almost" optimal rate of the mean-square risk [7,8]. The discrete wavelet transform with Meyer wavelets is applied to the sample (2) to obtain a set of empirical wavelet coefficients [1]

W_{j,k} = µ_{j,k} + σ_j ξ_{j,k},

where µ_{j,k} are the discrete wavelet coefficients of the sample and the noise coefficients ξ_{j,k} have the standard normal distribution but are not independent. The variances of W_{j,k} have the form σ_j^2 = 2^{(1−α)(J−j)} [12]. To suppress the noise, the coefficients W_{j,k} are processed with the hard thresholding function ρ_H(y, T) = y · 1(|y| > T) or the soft thresholding function ρ_S(y, T) = sgn(y)(|y| − T)_+ with some threshold T, and the estimates Ŵ_{j,k} are obtained. After that, the inverse wavelet transform is performed.
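The model (1) and the two thresholding functions can be simulated as follows. This is a sketch under the illustrative assumption that the autocovariance sequence is r_k = (1 + k)^{−α}; the paper only fixes the rate r_k ∼ k^{−α} and unit variance, and this particular convex decreasing sequence is a valid covariance chosen for concreteness.

```python
import numpy as np

def simulate_model(f, N, alpha, rng):
    """Draw one sample from model (1): Y_i = f(x_(i)) + e_i.

    x_(1) <= ... <= x_(N) is the variational series of N uniform points
    on [0, 1]; e is stationary Gaussian noise with the illustrative
    long-memory autocovariance r_k = (1 + k)**(-alpha), so r_0 = 1
    (unit variance) and r_k ~ k**(-alpha).
    """
    x = np.sort(rng.uniform(0.0, 1.0, size=N))
    k = np.arange(N)
    r = (1.0 + k) ** (-alpha)
    cov = r[np.abs(k[:, None] - k[None, :])]  # Toeplitz covariance matrix
    # small jitter keeps the numerical Cholesky factorization stable
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(N))
    e = L @ rng.standard_normal(N)
    return x, f(x) + e

def rho_hard(y, T):
    """Hard thresholding: keep y where |y| > T, zero elsewhere."""
    return y * (np.abs(y) > T)

def rho_soft(y, T):
    """Soft thresholding: shrink |y| by T, zero below the threshold."""
    return np.sign(y) * np.maximum(np.abs(y) - T, 0.0)
```

The Cholesky factor of the Toeplitz covariance matrix turns i.i.d. standard normal draws into a stationary sequence with exactly the prescribed correlations.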
The idea of threshold processing is that the wavelet transform provides a "sparse" representation of the useful signal: the signal is represented by a relatively small number of coefficients that are large in absolute value. To provide a sparse representation of a function that is uniformly Lipschitz regular with an exponent γ, the wavelet function participating in the discrete wavelet transform must have M continuous derivatives (M ≥ γ) and M vanishing moments, and it must decrease fast enough at infinity. It is further assumed that the wavelet transform is performed with the Meyer wavelets [1], which satisfy all the necessary conditions.
If we apply the discrete wavelet transform to the sample (1), we obtain the set of empirical wavelet coefficients

V_{j,k} = ν_{j,k} + σ_j ξ_{j,k}.

Here ν_{j,k} are the coefficients of the discrete wavelet transform of the sample f(x_(1)), …, f(x_(N)). In general, V_{j,k} are not equal to W_{j,k}, and ν_{j,k} are not equal to µ_{j,k}. However, one can apply the same thresholding procedure to the coefficients V_{j,k} as to the coefficients W_{j,k} and obtain the estimates V̂_{j,k}. The following sections discuss the properties of the resulting estimators.

Mean-Square Thresholding Risk
The mean-square thresholding risk for a sample with a random grid is defined as

R_ν(f, T) = Σ_{j,k} E( ρ(V_{j,k}, T) − ν_{j,k} )^2.  (3)

We also define the mean-square risk for the equispaced sample as

R_µ(f, T) = Σ_{j,k} E( ρ(W_{j,k}, T) − µ_{j,k} )^2.

Threshold selection is one of the main problems in threshold processing. For the class Lip(γ, L), the threshold T_γ = σ_j √( (4αγ)/(2γ + α) · ln 2^j ), calculated for each level j, is close to optimal [16]. Using the results of [7] (Theorem 3), we can estimate the rate of R_µ(f, T_γ).
Theorem 1. Let f ∈ Lip(γ, L) with γ > 0, and let the wavelet function satisfy the conditions listed above. Then

R_µ(f, T_γ) ≤ C J 2^{J(1 − 2αγ/(2γ+α))},

where C is a positive constant.
Additionally, repeating the arguments of Theorem 1 in [15], it can be shown that a similar statement is valid for R_ν(f, T_γ) when γ > max{(4α − 2)^{−1}, 1/2}. Thus, replacing equispaced samples with random ones does not affect the upper estimate for the rate of the mean-square risk.
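Under the threshold formula as written above, the level-dependent quantities σ_j and T_γ can be computed directly; the exact constant under the square root follows that formula and should be treated as an assumption of this sketch.

```python
import numpy as np

def level_threshold(j, J, alpha, gamma):
    """Level-dependent near-optimal threshold for the class Lip(gamma, L):

        T_j = sigma_j * sqrt( (4*alpha*gamma)/(2*gamma + alpha) * ln 2^j ),

    with sigma_j^2 = 2**((1 - alpha) * (J - j)) as in the text.  The
    constant under the square root is taken from the formula above and
    is an assumption of this illustration.
    """
    sigma_j = 2.0 ** ((1.0 - alpha) * (J - j) / 2.0)
    return sigma_j * np.sqrt(
        4.0 * alpha * gamma / (2.0 * gamma + alpha) * j * np.log(2.0)
    )
```

Note that σ_j decreases with the level j (finer levels carry less noise variance under long-term dependence), while the logarithmic factor grows, so the threshold balances the two effects across levels.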

Properties of the Mean-Square Risk Estimate
Since expression (3) explicitly depends on the unknown values of ν_{j,k}, it cannot be calculated in practice. However, it is possible to construct its estimate based only on the observed data. This estimate is determined by the expression

R̂_ν(f, T) = Σ_{j,k} F[V_{j,k}, T],  (4)

where F[V_{j,k}, T] = (V_{j,k}^2 − σ_j^2) 1(|V_{j,k}| ≤ T) + σ_j^2 1(|V_{j,k}| > T) for hard threshold processing and F[V_{j,k}, T] = (V_{j,k}^2 − σ_j^2) 1(|V_{j,k}| ≤ T) + (σ_j^2 + T^2) 1(|V_{j,k}| > T) for soft threshold processing [1,3]. Estimator (4) makes it possible to assess the estimation error for the function f, since it can be calculated using only the observed values V_{j,k}. The following statement establishes its asymptotic normality, which, in particular, allows constructing asymptotic confidence intervals for the mean-square risk (3).
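The estimator (4) is computable directly from the observed coefficients. The sketch below implements F for both threshold types; σ² and T may be scalars or per-level arrays broadcast against the coefficients.

```python
import numpy as np

def risk_estimate(V, sigma2, T, mode="hard"):
    """Empirical mean-square risk estimate (4): the sum of F[V_jk, T]
    over all coefficients, with F as defined in the text.

    V, sigma2 and T may be arrays of the same (broadcastable) shape,
    which accommodates level-dependent variances and thresholds.
    """
    V = np.asarray(V, dtype=float)
    small = np.abs(V) <= T                 # indicator 1(|V| <= T)
    if mode == "hard":
        big_term = sigma2                  # contribution when |V| > T
    elif mode == "soft":
        big_term = sigma2 + np.asarray(T, dtype=float) ** 2
    else:
        raise ValueError("mode must be 'hard' or 'soft'")
    return np.sum(np.where(small, V ** 2 - sigma2, big_term))
```

For example, with V = (0.5, 3.0), σ² = 1 and T = 1, the hard-threshold estimate is (0.25 − 1) + 1 = 0.25, and the soft-threshold estimate is (0.25 − 1) + (1 + 1) = 1.25.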
Theorem 2. Let f ∈ Lip(γ, L) on the segment [0, 1] with γ > max{(4α − 2)^{−1}, 1/2} and α > 1/2, and let the Meyer wavelet satisfy the conditions listed above. Then for hard and soft threshold processing we have

P( (R̂_ν(f, T_γ) − R_ν(f, T_γ)) / D_J < x ) → Φ(x) as J → ∞,

where Φ(x) is the distribution function of the standard normal law, D_J^2 = C_α 2^J, and the constant C_α depends only on α and the wavelet type.

Remark 1.
In practice, one needs to know the constant C α . Unlike the case of independent observations, this constant depends on the chosen wavelet. The method of calculation of C α is discussed in [12].
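Given a value of C_α (see Remark 1), Theorem 2 yields an asymptotic confidence interval of the form R̂ ± z·D_J. A minimal sketch, assuming C_α has already been computed by the method of [12]:

```python
import numpy as np
from statistics import NormalDist

def risk_confidence_interval(risk_est, J, C_alpha, level=0.95):
    """Asymptotic confidence interval for the mean-square risk.

    By Theorem 2, (risk_est - R) / D_J is approximately standard normal
    with D_J**2 = C_alpha * 2**J, so an asymptotic interval at the given
    confidence level is risk_est +/- z * D_J.  C_alpha must be supplied
    externally (its computation is discussed in [12]).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided normal quantile
    D_J = np.sqrt(C_alpha * 2.0 ** J)
    return risk_est - z * D_J, risk_est + z * D_J
```

The interval width grows like 2^{J/2}, while (by Theorem 1) the risk itself is of a larger order when γ > (4α − 2)^{−1}, so the relative width of the interval shrinks as J grows.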
Proof. Let us prove the theorem for the hard threshold processing method. In the case of soft threshold processing, the proof is similar.
In [12], with the use of the results of [17–19], it is shown that

P( (R̂_W(f, T_γ) − R_µ(f, T_γ)) / D_J < x ) → Φ(x) as J → ∞,  (5)

where R̂_W(f, T_γ) = Σ_{j,k} F[W_{j,k}, T_γ] is the risk estimate for the equispaced sample. Therefore, to prove the theorem, it suffices to show that

(R̂_ν(f, T_γ) − R̂_W(f, T_γ)) / D_J → 0 in probability as J → ∞.

Under the conditions γ > max{(4α − 2)^{−1}, 1/2} and α > 1/2, by virtue of Theorem 1 and the similar statement for R_ν(f, T_γ), we obtain that (R_ν(f, T_γ) − R_µ(f, T_γ)) / D_J → 0 as J → ∞. Since F[y, T_γ] = (y^2 − σ_j^2) 1(|y| ≤ T_γ) + σ_j^2 1(|y| > T_γ) for hard threshold processing, the difference of the risk estimates splits according to the indicators:

R̂_ν(f, T_γ) − R̂_W(f, T_γ) = Σ_{j,k} (V_{j,k}^2 − W_{j,k}^2) 1(|V_{j,k}| ≤ T_γ) 1(|W_{j,k}| ≤ T_γ)
 + Σ_{j,k} (V_{j,k}^2 − 2σ_j^2) 1(|V_{j,k}| ≤ T_γ) 1(|W_{j,k}| > T_γ)
 + Σ_{j,k} (2σ_j^2 − W_{j,k}^2) 1(|V_{j,k}| > T_γ) 1(|W_{j,k}| ≤ T_γ).  (6)

Consider the first sum in (6). Using the results of [12,15,20], it can be shown that the conditional distribution of this sum for fixed x_i is normal with a mean determined by the differences ν_{j,k} − µ_{j,k} and a variance that is less than C̃_α Σ_{j,k} σ_j^2 (ν_{j,k} − µ_{j,k})^2, where C̃_α is a positive constant. Since f ∈ Lip(γ, L), repeating the arguments of [20], it can be shown that

Σ_{j,k} E(ν_{j,k} − µ_{j,k})^2 = o(2^{J/2}) as J → ∞.  (7)

Hence, applying the Markov inequality, we obtain that the first sum in (6), divided by D_J, converges to zero in probability. The remaining sums in (6) contain indicators where either |V_{j,k}| > T_γ or |W_{j,k}| > T_γ. Repeating the reasoning of [12] and using (7), it can be shown that, when divided by 2^{J/2}, they also converge to zero in probability. The theorem is proven.
Theorem 2 provides the possibility to construct asymptotic confidence intervals for the mean-square thresholding risk on the basis of its estimate.
In addition to asymptotic normality, the estimator (4) also possesses the property of strong consistency.

Theorem 3. Suppose that the conditions of Theorem 2 are satisfied. Then for hard or soft threshold processing, for any λ > 1/2 we have

(R̂_ν(f, T_γ) − R_ν(f, T_γ)) / 2^{λJ} → 0 almost surely as J → ∞.

The proof of this statement almost completely repeats the proof of the corresponding property of the risk estimator in [13].

Discussion
As already mentioned, Theorem 2 makes it possible to construct asymptotic confidence intervals for the mean-square thresholding risk. For practical purposes, it is desirable to have guaranteed confidence intervals. Such intervals could be constructed from estimates of the convergence rate in Theorem 2, which would depend on the Lipschitz parameters and on the parameter α. Guaranteed confidence intervals would also clarify how the results of Theorems 2 and 3 affect the error estimation for signals of finite size. We therefore leave the estimation of the convergence rate and explicit numerical simulation for future work.
The obtained results are applicable to Meyer wavelets. Their advantage is that they possess infinitely many vanishing moments, which simplifies the proof of asymptotic normality in [12]. In view of the results of [8], it is clear that similar conclusions could be obtained for other wavelets with a large enough number of vanishing moments (e.g., various Daubechies families).
It follows from Theorems 2 and 3 that the statistical properties of the mean-square risk estimator in the model with a uniform random design remain the same as in the model with equispaced samples. Note that such invariance cannot be taken for granted. Random times of sample registration can also result in a random sample size; this situation was considered in [22]. In that case, the properties of the model can differ significantly from those of the fixed-sample-size model. For example, the limit distribution of the mean-square risk estimator can be a scale mixture of normal laws, which can have significantly heavier tails than the normal distribution. In particular, this distribution may belong to the class of stable laws, and it is well known that the variances of all stable laws, except the normal one, are infinite (the properties of stable distributions are discussed in detail in the monograph of V. M. Zolotarev [23]; see also [24]).
Funding: This research was funded by Russian Science Foundation, project number 18-11-00155.