Mathematics
  • Article
  • Open Access

9 December 2025

Wavelet Estimation for Density and Copula Functions

Economics, Management and Quantitative Finance Research Laboratory (LaREMFiQ), Economics and Quantitative Methods Department, Institute of High Commercial Studies of Sousse, University of Sousse, Sousse 4054, Tunisia
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Probability Statistics and Quantitative Finance

Abstract

This article investigates the problem of univariate and bivariate density estimation using wavelet decomposition techniques. Special attention is given to the estimation of copula functions, which capture the dependence structure between random variables independently of their marginals. We consider two distinct frameworks: the case of independent and identically distributed (i.i.d.) variables and the case where variables are dependent, allowing us to highlight the impact of the dependence structure on the performance of wavelet-based estimators. Building on this framework, we propose a novel iterative thresholding method applied to the detail coefficients of the wavelet transform. This iterative scheme aims to enhance noise reduction while preserving significant structural features of the underlying density or copula function. Numerical experiments illustrate the effectiveness of the proposed method in both univariate and bivariate settings, particularly in capturing localized features and discontinuities in the presence of varying dependence patterns.

1. Introduction

The theoretical foundation of wavelets was laid in the early work of Grossman and Morlet in 1984 [1], who introduced the concept of wavelet transforms as a tool for time-frequency analysis to approximate potentially highly irregular functions or surfaces. This pioneering work was later formalized by Meyer [2] and Daubechies [3], who constructed orthonormal, compactly supported wavelet bases, enabling efficient multiresolution analysis (MRA), a functional framework that represents a function as a limit of its approximations at different levels of resolution. Mallat [4] further revolutionized the field by developing the fast wavelet transform, providing a practical algorithmic framework for wavelet decomposition and reconstruction. For the theory of wavelet systems, we refer to the works of Härdle et al. [5], Tsybakov [6], and Vidakovic [7]. These breakthroughs enabled the application of wavelets in a wide range of fields, including signal processing, image analysis, and numerical solutions of differential equations.
Linear wavelet estimators have been studied by several authors, including Kerkyacharian and Picard [8,9], Antoniadis and Carmona [10], and Walter [11,12], who established mean squared error convergence results for wavelet estimators in the case of continuous densities. A nonlinear estimation method based on thresholding techniques was proposed by Donoho and Johnstone [13,14]. It consists of defining a threshold value to identify the coefficients with a high contribution; the choice of this threshold is crucial and governs the asymptotic properties of the estimator. Several approaches have been used to determine the optimal threshold: Donoho [14] presented a universal threshold obtained by analyzing a Gaussian white noise model; a threshold chooser based on Stein's unbiased risk estimation was proposed by Donoho and Johnstone [15]; methods based on cross-validation were developed by Nason [16]; and Vidakovic [17] adopted a Bayesian approach. An iterative thresholding approach, initially proposed by Coifman and Wickerhauser [31] and applied to physiological sound analysis by Hadjileontiadis et al. [18], was later evaluated through a fixed-point formulation by Ranta et al. [19]. An adaptation of this method to the context of density estimation is one of the objectives of this article. On the other hand, for bivariate data, copula functions have become increasingly popular tools in data analysis. They allow modeling the dependence structure between random variables, regardless of their marginal distributions, based on Sklar's theorem of 1959 [20]. This separation between margins and dependence makes copulas particularly effective in capturing complex relationships between assets, accounting for phenomena such as asymmetric dependence or extreme events, which are often overlooked by traditional measures. The books by Joe [21] and Nelsen [22] describe the mathematical and statistical foundations of copulas.
Nonparametric estimators of copula densities based on wavelet decomposition were suggested by Genest, Masiello, and Tribouley [23], who proposed a wavelet-based estimator of the copula function and established convergence results for the linear estimator. Autin et al. [24] generalized these results to the multivariate framework for the nonlinear estimator. Recent studies have advanced wavelet-based copula estimation: Chatrabgoun et al. [25] proposed a Legendre multiwavelet approach; more recently, Provost [26] reviewed nonparametric copula density methods (see also Falhi and Hmood [27]), while Pensky and De Canditiis [28] refined minimax thresholding theory for wavelet applications.
In the context of α-mixing dependence, various nonparametric problems have been examined. Recent developments can be found in, for example, Bouezmarni et al. [29] and Chesneau [30].
Our contributions in this article are as follows. Firstly, we present the fixed-point iterative thresholding algorithm and its convergence conditions for estimating the univariate density function. Secondly, we establish the wavelet copula density estimator both in the case where the variables are independent and identically distributed (i.i.d.) and in the case where they are weakly dependent, forming a stationary mixing sequence. We provide a detailed mathematical analysis of the estimator's statistical properties, including its bias, variance, and mean integrated squared error, thereby establishing theoretical guarantees for its performance under each dependence structure.
The rest of the paper is organized as follows: Section 2 presents the fixed-point iterative thresholding approach. Section 3 gives an overview of the wavelet copula estimators in two cases. Section 4 shows the results of the simulation experiments and the empirical application. Some conclusions are provided in Section 5. Section 6 provides a discussion of the limitations of our study and outlines potential directions for future research.

2. Wavelet Thresholding Estimator in the Univariate Case

This section aims to present the theoretical and methodological foundations of the proposed approach. We first introduce the construction and properties of the wavelet basis used for the estimation procedure. Then, we describe the nonparametric estimation of the density function in the wavelet domain. Finally, we detail the iterative fixed-point thresholding method developed for adaptive noise reduction and optimal estimation.

2.1. Threshold Determination Based on Iterative Fixed-Point Approach

From a mother wavelet ψ and a father wavelet ϕ, generating bases of $L^2(\mathbb{R})$, the wavelet collection is obtained by translations and dilations as follows:
$$\phi_{jk}(x) = 2^{j/2}\,\phi(2^j x - k) \quad \text{and} \quad \psi_{jk}(x) = 2^{j/2}\,\psi(2^j x - k).$$
Any $f \in L^2([0,1])$ can be represented as follows:
$$f(x) = \sum_{k \in \mathbb{Z}} \alpha_{j_0 k}\,\phi_{j_0 k}(x) + \sum_{j=j_0}^{J_n} \sum_{k \in \mathbb{Z}} \beta_{jk}\,\psi_{jk}(x), \quad x \in [0,1],$$
where $\alpha_{j_0 k} = \int_{\mathbb{R}} f(u)\,\phi_{j_0 k}(u)\,du = E\{\phi_{j_0 k}(X_i)\}$ and $\beta_{jk} = \int_{\mathbb{R}} f(u)\,\psi_{jk}(u)\,du = E\{\psi_{jk}(X_i)\}$.
A wavelet estimator of f can be written simply by
$$\hat f(x) = \sum_{k=0}^{2^{j_0}-1} \hat\alpha_{j_0 k}\,\phi_{j_0 k}(x) + \sum_{j=j_0}^{J_n} \sum_{k=0}^{2^{j}-1} \hat\beta_{jk}\,\psi_{jk}(x).$$
Remark 1. 
In the case of an i.i.d. sample $X_1, \ldots, X_n$, empirical estimates of the coefficients $\alpha_{j_0 k}$ and $\beta_{jk}$ are given by
$$\hat\alpha_{j_0 k} = \frac{1}{n} \sum_{i=1}^{n} \phi_{j_0 k}(X_i) \quad \text{and} \quad \hat\beta_{jk} = \frac{1}{n} \sum_{i=1}^{n} \psi_{jk}(X_i).$$
The estimated detail coefficients $\hat\beta_{jk}$ are numerous and must be regularized by a thresholding method, which consists of conserving only the important estimated coefficients and eliminating the small ones that do not provide any information. This method can be applied locally to each coefficient or globally to the set of coefficients of each level.
There are two well-known thresholding functions: the hard thresholding function and the soft thresholding function. The first, hard thresholding, annuls all coefficients under the threshold value λ and keeps the other coefficients unchanged. More precisely, the coefficients $\tilde\beta_{jk}$ are substituted as follows:
$$\tilde\beta_{jk} = \hat\beta_{jk}\,\mathbb{1}\left(|\hat\beta_{jk}| > \lambda\right).$$
The second type is soft thresholding, which additionally shrinks the larger coefficients towards zero:
$$\tilde\beta_{jk} = \mathrm{sign}(\hat\beta_{jk})\left(|\hat\beta_{jk}| - \lambda\right)_{+}.$$
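Both rules are one-liners in practice. The following minimal NumPy sketch makes the positive part $(\cdot)_+$ of the soft rule explicit; the coefficients and the threshold value are synthetic placeholders:

```python
import numpy as np

def hard_threshold(beta, lam):
    # Keep coefficients with |beta| > lam, annul the rest.
    return beta * (np.abs(beta) > lam)

def soft_threshold(beta, lam):
    # Annul coefficients below lam and shrink the others toward zero.
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)

beta = np.array([0.3, -1.7, 0.9, 2.4, -0.2])
print(hard_threshold(beta, 1.0))   # [ 0.  -1.7  0.   2.4  0. ]
print(soft_threshold(beta, 1.0))   # [ 0.  -0.7  0.   1.4 -0. ]
```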
The universal value of λ is given by Donoho and Johnstone [13,15], under the assumption of white Gaussian noise, by
$$\lambda = \hat\sigma \sqrt{2 \ln N},$$
where N is the number of data points and $\hat\sigma$ is an estimate of the noise level σ (typically a scaled median absolute deviation of the empirical wavelet coefficients). The iterative wavelet denoising method was initially proposed by Coifman and Wickerhauser [31] and applied to physiological sound analysis by Hadjileontiadis et al. [18]. The goal of this method is to separate the stationary part from the non-stationary part of a signal. An improvement of this iterative algorithm, based on a fixed-point-type interpretation, was proposed by Ranta et al. [19]. Given wavelet coefficients estimated from the data, under the additive model and an orthonormal basis, we can write:
$$\beta_{jk} = \beta_{jk}^{s} + \beta_{jk}^{r},$$
where $\beta_{jk}^{s}$ are the informative coefficients and $\beta_{jk}^{r}$ are the noisy coefficients.
The principle of iterative thresholding consists of repeating the threshold operation (hard or soft) several times; after two iterations, we can write the model as
$$\beta_{jk}^{t} = \beta_{jk}^{s,t} + \beta_{jk}^{r,t} = \beta_{jk}^{s,t} + \left(\beta_{jk}^{s,t+1} + \beta_{jk}^{r,t+1}\right).$$
The threshold value is calculated at each iteration t from the standard deviation $\sigma_j^{(t-1)}$ of the coefficients $\{\beta_{jk}^{r,t-1} : |\beta_{jk}^{r,t-1}| < \lambda_j^{(t-1)}\}$, as follows:
$$\lambda_j^{(t)} = F_a \cdot \sigma_j^{(t-1)} = F_a \sqrt{\frac{1}{N} \sum_{k} \left( \beta_{jk}^{r,t-1}\, \mathbb{1}_{\{|\beta_{jk}^{r,t-1}| < \lambda_j^{(t-1)}\}} \right)^2},$$
where $F_a$ is a user-defined constant; a classical choice is $F_a = 3$. Consider the function g defined for $x \in \mathbb{R}_{+}$ by
$$g(x) = F_a \sqrt{\frac{1}{N} \sum_{k} \left( \beta_{jk}^{r}\, \mathbb{1}_{\{|\beta_{jk}^{r}| < x\}} \right)^2},$$
taking values in a finite set of real numbers in $[0, \lambda_j^{(0)}]$ (with $g(x) \to \lambda_j^{(0)}$ as $x \to +\infty$); g is continuous, monotonic (increasing), and positive.
The update threshold value at each iteration is expressed as follows:
$$\lambda_j^{(t)} = g\left(\lambda_j^{(t-1)}\right).$$
The iterations are stopped once a stopping criterion $STC_t$ is met, namely when no element of the set $\{\beta_{jk}^{r,t-1}\}$ is reclassified as a significant coefficient $\beta_{jk}^{s,t}$:
$$STC_t := \left| E\{(\beta_{jk}^{r,t})^2\} - E\{(\beta_{jk}^{r,t-1})^2\} \right| \approx 0,$$
which is equivalent to saying that between two successive iterations, the threshold value stays constant ( λ j ( t ) = λ j ( t 1 ) ) . Therefore, Equation (7) can be written in another way as
$$STC_t : \quad \lambda_j^{(t)} = g\left(\lambda_j^{(t)}\right).$$
In conclusion, the stopping criterion is met at a fixed point of the function g defined previously in (5), and this point is the final threshold value. At the end of this algorithm, the reconstruction phase is applied by the inverse discrete wavelet transform, and we obtain the thresholded density estimate
$$\tilde f(x) = \sum_{j} \sum_{k} \beta_{jk}^{s,t}\,\psi_{jk}(x).$$
To illustrate the main steps of the proposed method, Figure 1 presents the algorithmic flow of the fixed-point iterative wavelet thresholding procedure for density estimation. The process begins with the wavelet decomposition of the empirical density, followed by iterative threshold updates until convergence is achieved, resulting in a smooth and adaptive estimate of the density function.
Figure 1. Algorithmic flowchart of the fixed-point iterative wavelet thresholding method.
Algorithm 1 below includes the steps of the fixed-point wavelet thresholding.
Algorithm 1 Iterative Wavelet Thresholding algorithm with fixed point (FPWT)
  • Step 1: (Input data:) $X = (X_1, \ldots, X_n)$.
  • Step 2: (Decomposition:) Estimate the initial wavelet coefficients $\hat\beta_{jk}$ from the data for all scales j and positions k.
  • Step 3: (Initialization:) Choose the parameter $F_a$; initialize the threshold value $\lambda_j^{(0)}$; define the function $g(\lambda) = F_a \cdot \sigma_j^{(t)}$ as in (5).
  • Step 4: (Iterative thresholding loop:) Repeat until convergence:
    ● Apply thresholding: $\beta_{jk}^{r,t+1} = \beta_{jk}^{r,t}\,\mathbb{1}_{\{|\beta_{jk}^{r,t}| < \lambda_j^{(t)}\}}$;
    ● Update the threshold using (6);
    ● Evaluate the stopping criterion by comparing $\lambda_j^{(t)}$ and $g(\lambda_j^{(t)})$.
  • Step 5: (Reconstruction:) Reconstruct the density estimator $\tilde f(x)$ from the thresholded coefficients using the inverse wavelet transform.
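A minimal sketch of Algorithm 1 is given below, assuming the PyWavelets package (pywt) for the decomposition and reconstruction steps; the histogram pre-estimate of the density, the initialization $\lambda_j^{(0)} = g(+\infty)$, and the tolerance `tol` are implementation choices rather than prescriptions of the paper:

```python
import numpy as np
import pywt

def fixed_point_threshold(detail, Fa=3.0, tol=1e-8, max_iter=100):
    # Iterate lambda_t = g(lambda_{t-1}) = Fa * (std of sub-threshold coefficients).
    lam = Fa * np.sqrt(np.mean(detail ** 2))        # lambda^(0) = g(+infinity)
    for _ in range(max_iter):
        below = detail[np.abs(detail) < lam]
        new_lam = Fa * np.sqrt(np.mean(below ** 2)) if below.size else 0.0
        if abs(new_lam - lam) < tol:                # stopping criterion: lambda = g(lambda)
            break
        lam = new_lam
    return lam

def fpwt_density(sample, wavelet="db4", level=4, bins=256, Fa=3.0):
    # Histogram pre-estimate of f, per-level fixed-point hard thresholding
    # of the detail coefficients, then inverse wavelet transform.
    f_hat, edges = np.histogram(sample, bins=bins, density=True)
    coeffs = pywt.wavedec(f_hat, wavelet, level=level)
    approx, details = coeffs[0], coeffs[1:]
    thresholded = [d * (np.abs(d) > fixed_point_threshold(d, Fa=Fa)) for d in details]
    f_tilde = pywt.waverec([approx] + thresholded, wavelet)[:bins]
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, f_tilde
```

Since the iteration starts from $\lambda_j^{(0)} = g(+\infty)$ and g is increasing, the generated thresholds decrease, so the procedure stops at the largest fixed point, in line with Proposition 2 below.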
Proposition 1. 
Assume that the probability density $p(\beta_{jk})$ of the wavelet coefficients has zero mean, finite variance, and a mode at 0. A sufficient condition under which the function g defined in (5) admits at least one non-null fixed point $\lambda \in [a,b]$, with $a, b > 0$, is that
$$F_a \ge \sqrt{\frac{3}{2\,a\,p(a)}}.$$
Proof. 
By the intermediate value theorem, since the function $g(x)$ defined in (5) is continuous on an interval $[a,b]$, g has a fixed point if $g(x) \in [a,b]$ for all $x \in [a,b]$. Since g is monotone increasing, it suffices to prove that there exists an interval $[a,b]$ with $g(b) \le b$ and $g(a) \ge a$.
Since $\sigma_j^{(t)}$ is finite, let $M = F_a \sigma_j^{(t)}$; for $b \ge M$, $g(b) \le F_a \sigma_j^{(t)} = M \le b$.
Now consider $a > 0$. Integration by parts gives
$$[g(a)]^2 = 2 F_a^2 \left( p(\beta_{jk}) \frac{\beta_{jk}^3}{3} \Big|_0^a - \int_0^a p'(\beta_{jk}) \frac{\beta_{jk}^3}{3}\, d\beta_{jk} \right).$$
Under the initial hypothesis, $p(\beta_{jk})$ has a mode at 0, so one can find a such that p is decreasing on $(0,a]$. Then the derivative satisfies $p'(\beta_{jk}) < 0$ on $(0,a]$, so $\int_0^a p'(\beta_{jk}) \frac{\beta_{jk}^3}{3}\, d\beta_{jk} < 0$ and $[g(a)]^2 > 2F_a^2\, p(a)\, \frac{a^3}{3}$; hence $[g(a)]^2 > a^2$ whenever $2F_a^2\, p(a)\, \frac{a^3}{3} \ge a^2$, i.e., when $F_a \ge \sqrt{\frac{3}{2\,a\,p(a)}}$. □
Although this condition is sufficient but not necessary (smaller values of $F_a$ may still ensure convergence), the function g can admit multiple fixed points; the iterative procedure produces decreasing threshold values and converges to the largest fixed point, which satisfies the stopping criterion and defines the final threshold.
Proposition 2. 
The largest fixed point of the function g defined in (5) is not missed by the iterative algorithm.
Proof. 
Suppose that the largest fixed point of the function g is missed by the iterative algorithm. This means that there exists $\gamma = g(\gamma)$ with $\gamma \neq \lambda_j^{(t)}$ for every iteration t, and in particular that there exists t such that $\lambda_j^{(t)} < \gamma < \lambda_j^{(t+1)}$.
But $g(\lambda_j^{(t)}) = \lambda_j^{(t+1)}$ and g is monotonically increasing, so the first inequality implies $\lambda_j^{(t+1)} = g(\lambda_j^{(t)}) \le g(\gamma) = \gamma$, which contradicts the second inequality. So the hypothesis made at the beginning is false: the largest fixed point is not missed by the iterative algorithm. □

2.2. Case of the Generalized Gaussian Distribution

This paragraph aims to determine the convergence conditions of the iterative fixed-point algorithm under the assumption that the coefficients obey a generalized Gaussian distribution.
Definition 1. 
The Generalized Gaussian Distribution (GGD) is defined as follows:
$$p(w) = \alpha\, e^{-|\beta w|^u},$$
with $\beta = \frac{1}{\sigma}\sqrt{\frac{\Gamma(3/u)}{\Gamma(1/u)}}$, $\alpha = \frac{\beta u}{2\,\Gamma(1/u)}$, and $\Gamma(u) = \int_0^{\infty} e^{-x} x^{u-1}\, dx$, where σ is the standard deviation and $u > 0$ is the shape parameter of the probability law ($u = 2$ gives the Gaussian and $u = 1$ the Laplace probability density function (pdf)).
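As a quick check of Definition 1, the sketch below (assuming SciPy's gamma function) evaluates the GGD density with the normalizations above; u = 2 reproduces the N(0, σ²) density and u = 1 the Laplace density:

```python
import numpy as np
from scipy.special import gamma

def ggd_pdf(w, sigma=1.0, u=2.0):
    # p(w) = alpha * exp(-|beta * w|**u), with alpha, beta as in Definition 1.
    beta = (1.0 / sigma) * np.sqrt(gamma(3.0 / u) / gamma(1.0 / u))
    alpha = beta * u / (2.0 * gamma(1.0 / u))
    return alpha * np.exp(-np.abs(beta * w) ** u)

w = np.linspace(-2.0, 2.0, 5)
print(ggd_pdf(w, u=2.0))   # matches exp(-w**2 / 2) / sqrt(2 * pi)
print(ggd_pdf(w, u=1.0))   # Laplace density with unit variance
```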
The conditions of the preceding Proposition 1 (which are close to real conditions) ensure that the function g is monotonically increasing and that there exists an interval $[a,b]$ with $g(a) > a$ and $g(b) < b$ (i.e., $g(x) \in [a,b]$ for all $x \in [a,b]$). This implies that the coefficient $F_a$ must be bounded below by a minimum value $F_a^m$. Indeed, if $F_a$ is chosen to be less than $F_a^m$, the algorithm converges to 0, which means that the estimated noise is zero at the scale considered. This minimum value depends on the probability distribution of the wavelet coefficients, $p(\beta_{jk})$, and its expression was proposed by Ranta et al. [19] in the case of a generalized Gaussian distribution; it is given in the next proposition.
Proposition 3. 
Using a generalized Gaussian model for the pdf of the wavelet coefficients $p(\beta_{jk})$, the lower bound $F_a^m$ is independent of σ and depends only on the shape parameter u. $F_a^m$ is given as follows:
$$F_a^m = \sqrt{\frac{3\beta}{2\alpha}\,(ue)^{1/u}} = \sqrt{\frac{3\,\Gamma(1/u)}{u}\,(ue)^{1/u}}.$$
Proof. 
The inequality (9) can be written
$$F_a \ge \sqrt{\frac{3}{2\,a\,\alpha\, e^{-(\beta a)^u}}},$$
that is,
$$a \ge \frac{3}{2\,\alpha\, F_a^2}\, e^{(\beta a)^u}.$$
The objective is to determine the lowest value of $F_a$ (denoted $F_a^m$) such that there exists an $a > 0$ verifying (12).
The function $q(a) = \frac{3}{2\alpha F_a^2}\, e^{(\beta a)^u}$ is differentiable and strictly convex, due to the regularity of the exponential term $e^{(\beta a)^u}$; it can therefore have 0, 1, or 2 intersection points with the identity line $y(a) = a$. Thus, $F_a^m$ is obtained when $q(a)$ is tangent (at a point of abscissa $a_0$) to the line $y = a$. The bound value $F_a^m$ corresponds to the tangency point where $q(a)$ and $y = a$ meet with identical slope, i.e.,
$$q(a_0) = a_0, \qquad q'(a_0) = 1.$$
Solving this system yields the explicit lower bound: the slope condition reads
$$\frac{3}{2\,\alpha\, (F_a^m)^2}\, u\,\beta^u a_0^{u-1}\, e^{(\beta a_0)^u} = 1,$$
which, combined with $q(a_0) = a_0$, gives $(\beta a_0)^u = 1/u$. A simple calculation then gives the bound $F_a^m$:
$$F_a^m = \sqrt{\frac{3\beta}{2\alpha}\,(ue)^{1/u}} = \sqrt{\frac{3\,\Gamma(1/u)}{u}\,(ue)^{1/u}}.$$
The lower bound F a m is independent of σ and depends only on the shape parameter u. □
The following proposition gives the value F a m in the cases of Gaussian and Laplace distributions.
Proposition 4. 
In the cases of the Gaussian and Laplace distributions, we have the following:
  • Gaussian distribution ($u = 2$): $F_a^m = \sqrt{3e/2} \approx 2.02$.
  • Laplace distribution ($u = 1$): $F_a^m = \sqrt{3e} \approx 2.86$.
Theorem 1. 
Assume that f belongs to the Besov ball $B^s_{p,q}(R)$ of radius $R > 0$ on $[0,1]$, with smoothness $s > 0$ and integrability indices $1 \le p, q \le \infty$. Using the GGD model for the wavelet coefficients, the fixed-point iteration converges to a unique fixed value $\lambda^{\ast}$ at each scale j. The resulting estimator satisfies
$$E\left(\|\tilde f - f\|_2^2\right) \le C\, n^{-\frac{2s}{2s+1}},$$
where s is the smoothness of f and C is a constant depending on $F_a$.
Proof. 
The error splits into two terms: the stochastic error, due to the random nature of the observations, and the bias error, due to the estimation approach used:
$$E\left(\|\tilde f - f\|_2^2\right) \le 2\left( \underbrace{E\|\tilde f - E[\tilde f]\|_2^2}_{\mathrm{stochastic\ error}} + \underbrace{\|E(\tilde f) - f\|_2^2}_{\mathrm{bias\ error}} \right).$$
On the one hand, thresholding preserves the coefficients $\beta_{jk}$ with $|\beta_{jk}| > \lambda$. For densities f in the Besov space $B^s_{p,q}$, the wavelet coefficients decay as $|\beta_{jk}| \lesssim 2^{-j(s+1/2)}$. The bias is dominated by the discarded coefficients below λ:
$$\sum_{j > j^{\ast}} \sum_k |\beta_{jk}|^2 \lesssim 2^{-2j^{\ast}s},$$
where $j^{\ast}$ is the coarsest scale with $2^{-j^{\ast}(s+1/2)} \approx \lambda$.
On the other hand, the variance arises from the retained noisy coefficients, which satisfy $\mathrm{Var}(\hat\beta_{jk}) \lesssim n^{-1}$; thresholding retains $O(2^{j^{\ast}})$ coefficients, so
$$\mathrm{Variance} \lesssim \sum_{j \le j^{\ast}} \sum_k \mathrm{Var}(\hat\beta_{jk}) \lesssim \frac{2^{j^{\ast}}}{n}.$$
Balancing bias and variance ($2^{-2j^{\ast}s} \approx 2^{j^{\ast}}/n$) yields $2^{j^{\ast}} \approx n^{\frac{1}{2s+1}}$, which implies
$$\mathrm{Error} \le C\, n^{-\frac{2s}{2s+1}},$$
where the constant C depends on $F_a$ through the selected thresholds (which control the bias-variance trade-off). □
This completes the proof of the finite-sample guarantee; see Donoho and Johnstone [13] on wavelet shrinkage minimax rates.
The next section addresses the estimation of copula densities using wavelet-based methods under both independent and dependent settings. We analyze the estimator's statistical properties (bias, variance, and mean integrated squared error) for the i.i.d. and α-mixing dependent cases, providing theoretical performance guarantees. Finally, the copula density is estimated using the fixed-point thresholding approach introduced in Section 2.

3. Wavelet Estimator of Copula Density

In many applied fields, such as finance, insurance, and environmental science, understanding nonlinear and extreme dependence between variables is essential. Since classical parametric copulas often fail to capture complex local dependence, this motivates the use of wavelet-based methods.
In this paper, we propose a wavelet-based copula density estimator and study its performance under independent and weakly dependent α -mixing settings. We analyze its bias and variance and introduce a fixed-point wavelet thresholding procedure that adapts to the unknown dependence structure, making it particularly suitable for financial risk management and extreme event modeling.
We begin this section by introducing the basic concept of copula, which provides a powerful framework for modeling dependence between random variables.

3.1. Basic Concept of Copula

Given a bivariate random vector ( X , Y ) t , denote its joint cumulative distribution function (cdf) by H and corresponding marginal distributions by F and G. According to Sklar (1959) [20], we can rewrite the joint distribution:
H ( x , y ) = C ( F ( x ) , G ( y ) )
where C is the bivariate copula function. If F and G are continuous, C is unique. The copula approach facilitates multivariate analyses by allowing separate modeling of the marginal distributions and copula, which completely characterizes the dependence between X and Y. We may write
$$C(u,v) = H\left(F^{-1}(u),\, G^{-1}(v)\right),$$
where $F^{-1}$ and $G^{-1}$ denote the generalized left-continuous inverses of F and G.
An important property of a copula is that it can capture tail dependence: the upper tail dependence $\lambda_U$ exists when there is a positive probability of positive outliers occurring jointly, while the lower tail dependence $\lambda_L$ corresponds to a positive probability of negative outliers occurring jointly. Formally, $\lambda_U$ and $\lambda_L$ are defined, respectively, as follows:
$$\lambda_U = \lim_{u \to 1^-} P\left( X_1 > F_1^{-1}(u) \mid X_2 > F_2^{-1}(u) \right) = \lim_{u \to 1^-} \frac{1 - 2u + C(u,u)}{1-u},$$
$$\lambda_L = \lim_{u \to 0^+} P\left( X_1 \le F_1^{-1}(u) \mid X_2 \le F_2^{-1}(u) \right) = \lim_{u \to 0^+} \frac{C(u,u)}{u}.$$
If the copula $C(u,v)$ is twice differentiable on $]0,1[ \times ]0,1[$, its density exists and is given by
$$c(u,v) = \frac{\partial^2 C(u,v)}{\partial u\, \partial v}.$$
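At finite quantile levels, these limits are commonly approximated from the empirical copula $\hat C(u,v) = \frac1n \sum_i \mathbb{1}\{U_i \le u,\, V_i \le v\}$; the sketch below follows this standard convention (the level q is a tuning choice, not a quantity fixed by the paper):

```python
import numpy as np

def empirical_tail_dependence(u, v, q=0.05):
    # u, v: pseudo-observations in [0, 1].
    # lambda_L ~ C(q, q) / q ;  lambda_U ~ (1 - 2(1-q) + C(1-q, 1-q)) / q.
    u, v = np.asarray(u), np.asarray(v)
    C_low = np.mean((u <= q) & (v <= q))           # empirical copula at (q, q)
    C_high = np.mean((u <= 1 - q) & (v <= 1 - q))  # empirical copula at (1-q, 1-q)
    return C_low / q, (1.0 - 2.0 * (1.0 - q) + C_high) / q
```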
In the rest of this section, we consider the problem of copula density estimation in the case where the variables are i.i.d. and in the case of weak dependence, where the observations form a stationary mixing sequence.

3.2. Case of Independent Variables

Let Φ be a given scaling function, and let Ψ be the associated wavelet. It is assumed henceforth that both functions are real-valued and have compact support $[0, L]$ for some $L > 0$. For every $j \in \{j_0, \ldots, J_n\}$ and $k = (k_1, k_2) \in \{0, \ldots, 2^j - 1\}^2$, let $\{\Phi_{jk}, \Psi_{jk}^{(1)}, \Psi_{jk}^{(2)}, \Psi_{jk}^{(3)}\}$ be an orthonormal basis of $L^2([0,1]^2)$, where
$$\Phi_{jk}(x,y) = \Phi_{jk_1}(x)\,\Phi_{jk_2}(y), \quad \Psi_{jk}^{(1)}(x,y) = \Phi_{jk_1}(x)\,\Psi_{jk_2}(y), \quad \Psi_{jk}^{(2)}(x,y) = \Psi_{jk_1}(x)\,\Phi_{jk_2}(y), \quad \Psi_{jk}^{(3)}(x,y) = \Psi_{jk_1}(x)\,\Psi_{jk_2}(y).$$
The wavelet representation of the density c is as follows:
$$c(x,y) = c_{j_0}(x,y) + D_{j_0}c(x,y), \quad x, y \in [0,1],$$
where
$$c_{j_0}(x,y) = \sum_{k_1,k_2=0}^{2^{j_0}-1} \alpha_{j_0,k_1,k_2}\, \Phi_{j_0,k_1,k_2}(x,y)$$
is a trend (or approximation) and
$$D_{j_0}c(x,y) = \sum_{\epsilon=1}^{3} \sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} \beta_{j,k}^{\epsilon}\, \Psi_{j,k}^{\epsilon}(x,y)$$
is a sum of details of three types: vertical ($\epsilon = 1$), horizontal ($\epsilon = 2$), and oblique ($\epsilon = 3$). In this representation, the coefficients are as follows:
$$\alpha_{j_0 k} = \int_{(0,1)^2} c(u,v)\, \Phi_{j_0 k}(u,v)\, du\, dv = \int_{(0,1)^2} \Phi_{j_0 k}(F(x), G(y))\, h(x,y)\, dx\, dy = E_h\left\{ \Phi_{j_0 k}(F(X), G(Y)) \right\},$$
and for $\epsilon \in \{1,2,3\}$,
$$\beta_{jk}^{\epsilon} = \int_{(0,1)^2} c(u,v)\, \Psi_{jk}^{\epsilon}(u,v)\, du\, dv = \int_{(0,1)^2} \Psi_{jk}^{\epsilon}(F(x), G(y))\, h(x,y)\, dx\, dy = E_h\left\{ \Psi_{jk}^{\epsilon}(F(X), G(Y)) \right\}.$$
Then, the wavelet-based estimator of c is given by
$$\hat c(x,y) = \hat c_{j_0}(x,y) + D_{j_0}\hat c(x,y), \quad x, y \in [0,1],$$
where
$$\hat c_{j_0}(x,y) = \sum_{k_1,k_2=0}^{2^{j_0}-1} \hat\alpha_{j_0,k_1,k_2}\, \Phi_{j_0,k_1,k_2}(x,y), \quad \text{with} \quad \hat\alpha_{j_0,k_1,k_2} = \frac{1}{n} \sum_{i=1}^{n} \Phi_{j_0,k_1,k_2}(F(X_i), G(Y_i)),$$
and
$$D_{j_0}\hat c(x,y) = \sum_{\epsilon=1}^{3} \sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} \hat\beta_{j,k_1,k_2}^{\epsilon}\, \Psi_{j,k_1,k_2}^{\epsilon}(x,y), \quad \text{with} \quad \hat\beta_{j,k_1,k_2}^{\epsilon} = \frac{1}{n} \sum_{i=1}^{n} \Psi_{j,k_1,k_2}^{\epsilon}\left(F(X_i), G(Y_i)\right).$$
When F and G are unknown, they are replaced by their empirical counterparts F n and G n . The pseudo-observations ( R i / n , S i / n ) , based on the ranks of X i and Y i , serve as empirical approximations of the unobservable pairs ( F ( X i ) , G ( Y i ) ) , forming a sample from the true copula C. The copula density estimator c ˜ is then constructed from these pseudo-observations and it is given by
$$\tilde c(x,y) = \tilde c_{j_0}(x,y) + D_{j_0}\tilde c(x,y), \quad x, y \in [0,1],$$
where
$$\tilde c_{j_0}(x,y) = \sum_{k_1,k_2=0}^{2^{j_0}-1} \tilde\alpha_{j_0,k_1,k_2}\, \Phi_{j_0,k_1,k_2}(x,y), \quad \text{with} \quad \tilde\alpha_{j_0,k_1,k_2} = \frac{1}{n} \sum_{i=1}^{n} \Phi_{j_0,k_1,k_2}(F_n(X_i), G_n(Y_i)) = \frac{1}{n} \sum_{i=1}^{n} \Phi_{j_0,k_1,k_2}\left(\frac{R_i}{n}, \frac{S_i}{n}\right),$$
and
$$D_{j_0}\tilde c(x,y) = \sum_{\epsilon=1}^{3} \sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} \tilde\beta_{j,k_1,k_2}^{\epsilon}\, \Psi_{j,k_1,k_2}^{\epsilon}(x,y),$$
where, for $\epsilon \in \{1,2,3\}$,
$$\tilde\beta_{j,k_1,k_2}^{\epsilon} = \frac{1}{n} \sum_{i=1}^{n} \Psi_{j,k_1,k_2}^{\epsilon}\left(F_n(X_i), G_n(Y_i)\right) = \frac{1}{n} \sum_{i=1}^{n} \Psi_{j,k_1,k_2}^{\epsilon}\left(\frac{R_i}{n}, \frac{S_i}{n}\right).$$
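Computationally, the pseudo-observations and the coefficients $\tilde\alpha_{j_0,k_1,k_2}$ reduce to rank transforms and sample means. The sketch below uses the Haar scaling function purely so that Φ has a closed form; any compactly supported scaling function could replace it:

```python
import numpy as np
from scipy.stats import rankdata

def pseudo_observations(x, y):
    # Rank-based pseudo-observations (R_i / n, S_i / n).
    n = len(x)
    return rankdata(x) / n, rankdata(y) / n

def haar_phi(j, k, t):
    # Haar scaling function: phi_{jk}(t) = 2^(j/2) * 1{0 <= 2^j t - k < 1}.
    s = 2.0 ** j * np.asarray(t) - k
    return 2.0 ** (j / 2.0) * ((s >= 0) & (s < 1))

def alpha_tilde(u, v, j0, k1, k2):
    # alpha~_{j0,k1,k2} = (1/n) * sum_i Phi_{j0,k1,k2}(R_i/n, S_i/n).
    return np.mean(haar_phi(j0, k1, u) * haar_phi(j0, k2, v))
```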
Definition 2. 
Consider Besov spaces as functional spaces; they can be characterized in terms of wavelet coefficients as follows. Besov spaces depend on three parameters, $s > 0$, $1 < p < \infty$, and $1 < q < \infty$, and are denoted by $B^s_{p,q}$. Let $c \in L^2([0,1]^2)$. Define the sequence norm of the wavelet coefficients of a function $c \in B^s_{p,q}$ as follows:
$$|c|_{B^s_{p,q}} = \left( \sum_{k \in \mathbb{Z}^2} |\alpha_{j_0,k}|^p \right)^{1/p} + \left( \sum_{j \ge j_0} 2^{jq\left(s + d\left(\frac12 - \frac1p\right)\right)} \left( \sum_{k \in \mathbb{Z}^2} |\beta_{j,k}|^p \right)^{q/p} \right)^{1/q},$$
where
$$\left( \sum_{k \in \mathbb{Z}^2} |\beta_{j,k}|^p \right)^{1/p} = \left( \sum_{k \in \mathbb{Z}^2} \sum_{\epsilon=1}^{3} |\beta_{j,k}^{\epsilon}|^p \right)^{1/p}.$$
Proposition 5. 
The difference between the estimators built on the empirical and theoretical coefficients satisfies
$$\hat c(x,y) - \tilde c(x,y) = o\left( \sqrt{\frac{\ln n}{n}} \right).$$
Proof of Proposition 5. 
$$\tilde c(x,y) - \hat c(x,y) = \sum_{k_1,k_2=0}^{2^{j_0}-1} \left( \tilde\alpha_{j_0,k_1,k_2} - \hat\alpha_{j_0,k_1,k_2} \right) \Phi_{j_0,k_1,k_2}(x,y) + \sum_{\epsilon=1}^{3} \sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} \left( \tilde\beta_{j,k_1,k_2}^{\epsilon} - \hat\beta_{j,k_1,k_2}^{\epsilon} \right) \Psi_{j,k_1,k_2}^{\epsilon}(x,y) =: A + \sum_{\epsilon=1}^{3} A_{\epsilon}.$$
Starting with A: we have $\|F_n - F\|_{\infty} = O\left(\sqrt{\frac{\ln\ln n}{n}}\right)$ a.s. and $\|G_n - G\|_{\infty} = O\left(\sqrt{\frac{\ln\ln n}{n}}\right)$ a.s., so we can apply a Taylor expansion to $F_n(X_i)$ and $G_n(Y_i)$: as $n \to +\infty$, there exist $\xi_1$ and $\xi_2 \in \mathbb{R}$ such that
$$\begin{aligned} \Phi_{j_0,k_1,k_2}(F(X_i), G(Y_i)) - \Phi_{j_0,k_1,k_2}(F_n(X_i), G_n(Y_i)) &= \Phi_{j_0 k_1}(F(X_i))\,\Phi_{j_0 k_2}(G(Y_i)) - \Phi_{j_0 k_1}(F_n(X_i))\,\Phi_{j_0 k_2}(G_n(Y_i)) \\ &= \Phi_{j_0 k_1}(F(X_i)) \left[ \Phi_{j_0 k_2}(G(Y_i)) - \Phi_{j_0 k_2}(G_n(Y_i)) \right] + \left[ \Phi_{j_0 k_1}(F(X_i)) - \Phi_{j_0 k_1}(F_n(X_i)) \right] \Phi_{j_0 k_2}(G_n(Y_i)) \\ &= \Phi_{j_0 k_1}(F(X_i)) \left( G(Y_i) - G_n(Y_i) \right) \Phi'_{j_0 k_2}(\xi_1) + \left( F(X_i) - F_n(X_i) \right) \Phi'_{j_0 k_1}(\xi_2)\, \Phi_{j_0 k_2}(G_n(Y_i)). \end{aligned}$$
So
$$\left| \tilde\alpha_{j_0,k_1,k_2} - \hat\alpha_{j_0,k_1,k_2} \right| = \frac{1}{n} \left| \sum_{i=1}^{n} \Phi_{j_0,k_1,k_2}(F(X_i), G(Y_i)) - \Phi_{j_0,k_1,k_2}(F_n(X_i), G_n(Y_i)) \right| \le C\, 2^{2j_0} \sqrt{\frac{\ln\ln n}{n}}.$$
Then, using $\mathrm{card}\{k_1, k_2 \in \{0, \ldots, 2^{j_0}-1\}\} \le 2^{2j_0}$,
$$|A| \le \sum_{k_1,k_2=0}^{2^{j_0}-1} \left| \tilde\alpha_{j_0,k_1,k_2} - \hat\alpha_{j_0,k_1,k_2} \right| \left| \Phi_{j_0,k_1,k_2}(x,y) \right| \le C\, 2^{5j_0} \sqrt{\frac{\ln\ln n}{n}} = o\left( \sqrt{\frac{\ln n}{n}} \right).$$
Without loss of generality, starting with $\epsilon = 1$ and using the same method as for A, we have
$$\left| \tilde\beta_{j,k_1,k_2}^{1} - \hat\beta_{j,k_1,k_2}^{1} \right| = \frac{1}{n} \left| \sum_{i=1}^{n} \Psi_{j,k_1,k_2}^{1}(F(X_i), G(Y_i)) - \Psi_{j,k_1,k_2}^{1}(F_n(X_i), G_n(Y_i)) \right| \le C\, 2^{2j} \sqrt{\frac{\ln\ln n}{n}},$$
and the same calculation as in (31) gives $A_1 = o\left( \sqrt{\frac{\ln n}{n}} \right)$.
Theorem 2. 
The bias of the estimator $\tilde c$ satisfies:
$$\mathrm{Bias}\{\tilde c(x,y)\} = o\left( \sqrt{\frac{\ln n}{n}} \right).$$
Proof. 
Using Proposition 5, we have
$$E\{\tilde c(x,y)\} = E\{(\tilde c(x,y) - \hat c(x,y)) + \hat c(x,y)\} = o\left( \sqrt{\frac{\ln n}{n}} \right) + E\{\hat c(x,y)\},$$
and
$$E\{\hat c(x,y)\} = \sum_{k_1,k_2=0}^{2^{j_0}-1} E\{\hat\alpha_{j_0,k_1,k_2}\}\, \Phi_{j_0,k_1,k_2}(x,y) + \sum_{\epsilon=1}^{3} \sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} E\{\hat\beta_{j,k_1,k_2}^{\epsilon}\}\, \Psi_{j,k_1,k_2}^{\epsilon}(x,y) = c(x,y). \;\square$$
Theorem 3. 
The variance of the estimator $\tilde c$ satisfies:
$$\mathrm{Var}\{\tilde c(x,y)\} = O\left( \frac{\ln n}{n} \right).$$
Proposition 6. 
(a) There exists a constant $C > 0$ such that, for any $j \in \{j_0, \ldots, J_n\}$ and $k_1, k_2 \in \{0, \ldots, 2^{j}-1\}$,
$$E\left| \hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2} \right|^2 \le \frac{C}{n}, \qquad E\left| \hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon} \right|^2 \le \frac{C}{n}, \quad \epsilon \in \{1,2,3\}.$$
(b) There exists a constant $C > 0$ such that, for any $j \in \{j_0, \ldots, J_n\}$ and $k_1, k_2 \in \{0, \ldots, 2^{j}-1\}$,
$$E\left| \hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2} \right|^4 \le C\, \frac{2^{2j_0}}{n^2}, \qquad E\left| \hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon} \right|^4 \le C\, \frac{2^{2j}}{n^2}, \quad \epsilon \in \{1,2,3\}.$$
(c) Let $\lambda_j = \sqrt{\frac{\ln n}{n}}$. There exists a constant $C > 0$ such that, for any κ large enough, $j \in \{j_0, \ldots, J_n\}$, and $k_1, k_2 \in \{0, \ldots, 2^{j}-1\}$, we have
$$P\left( \left| \hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2} \right| > \kappa \frac{\lambda_j}{2} \right) \le \frac{C}{n^4}, \qquad P\left( \left| \hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon} \right| > \kappa \frac{\lambda_j}{2} \right) \le \frac{C}{n^4}, \quad \epsilon \in \{1,2,3\}.$$
Proof of Proposition 6. 
(a)
$$\begin{aligned} E\left| \hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2} \right|^2 &= \mathrm{Var}\{\hat\alpha_{j_0,k_1,k_2}\} = \frac{1}{n} \mathrm{Var}\{\Phi_{j_0,k_1,k_2}(F(X_1), G(Y_1))\} \le \frac{1}{n} E\{\Phi^2_{j_0,k_1,k_2}(F(X_1), G(Y_1))\} \\ &= \frac{1}{n} \int_{(0,1)^2} \Phi^2_{j_0,k_1,k_2}(F(x), G(y))\, h(x,y)\, dx\, dy = \frac{1}{n} \int_{(0,1)^2} \Phi^2_{j_0,k_1}(u)\, \Phi^2_{j_0,k_2}(v)\, c(u,v)\, du\, dv \qquad (u = F(x),\ v = G(y)) \\ &= \frac{1}{n} \int_{(0,1)^2} 2^{2j_0}\, \Phi^2(2^{j_0}u - k_1)\, \Phi^2(2^{j_0}v - k_2)\, c(u,v)\, du\, dv \\ &\le \frac{\|c\|_{\infty}}{n} \int_{(0,1)} 2^{j_0}\, \Phi^2(2^{j_0}u - k_1)\, du \int_{(0,1)} 2^{j_0}\, \Phi^2(2^{j_0}v - k_2)\, dv = \frac{C}{n} \qquad (x' = 2^{j_0}u - k_1,\ y' = 2^{j_0}v - k_2). \end{aligned}$$
(b) Without loss of generality, take $\epsilon = 2$:
$$\begin{aligned} E\left| \hat\beta_{j,k_1,k_2}^{2} - \beta_{j,k_1,k_2}^{2} \right|^4 &\le \frac{1}{n^2} E\left\{ \left( \Psi_{j,k_1,k_2}^{2}(F(X_1), G(Y_1)) \right)^4 \right\} = \frac{1}{n^2} \int_{(0,1)^2} \Psi^4_{j,k_1}(F(x))\, \Phi^4_{j,k_2}(G(y))\, h(x,y)\, dx\, dy \\ &= \frac{1}{n^2} \int_{(0,1)^2} 2^{4j}\, \Psi^4(2^{j}u - k_1)\, \Phi^4(2^{j}v - k_2)\, c(u,v)\, du\, dv \le \frac{2^{2j}\, \|c\|_{\infty}}{n^2}\, \|\Psi\|_4^4\, \|\Phi\|_4^4, \end{aligned}$$
using the substitutions $u = F(x)$, $v = G(y)$ and then $x' = 2^j u - k_1$, $y' = 2^j v - k_2$.
(c)
 
Lemma 1 (Bernstein’s inequality). 
Let $\xi_1, \ldots, \xi_n$ be i.i.d. bounded random variables such that $E\{\xi_i\} = 0$, $E\{\xi_i^2\} \le \sigma^2$, and $|\xi_i| \le \|\xi\|_{\infty} < \infty$. Then,
$$P\left( \left| \frac{1}{n} \sum_{i=1}^{n} \xi_i \right| > \lambda \right) \le 2 \exp\left( - \frac{n \lambda^2}{2\left( \sigma^2 + \frac{\|\xi\|_{\infty}\, \lambda}{3} \right)} \right), \quad \forall \lambda > 0.$$
Applying Lemma 1 to $\xi_i = \Phi_{j,k_1,k_2}(F(X_i), G(Y_i)) - E\{\Phi_{j,k_1,k_2}(F(X_i), G(Y_i))\}$, noting that $E\{\xi_i\} = 0$ and $E\{\xi_i^2\} \le \sigma^2$, and taking $\lambda = \kappa \lambda_j / 2$ with $\lambda_j = \sqrt{\frac{\ln n}{n}}$, we conclude that, for κ large enough, there is a $C > 0$ with
$$P\left( \left| \hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2} \right| > \kappa \frac{\lambda_j}{2} \right) \le \frac{C}{n^4},$$
and likewise, for $\epsilon \in \{1,2,3\}$,
$$P\left( \left| \hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon} \right| > \kappa \frac{\lambda_j}{2} \right) \le \frac{C}{n^4}. \;\square$$
Proof. 
Using Proposition 5, we have
$$\mathrm{Var}\{\tilde c(x,y)\} = \mathrm{Var}\{\hat c(x,y)\} + o\left( \frac{\ln n}{n} \right).$$
Starting now with
$$\mathrm{Var}\{\hat c(x,y)\} = \sum_{k_1,k_2=0}^{2^{j_0}-1} E\{|\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2\}\, \Phi^2_{j_0,k_1,k_2}(x,y) + \sum_{\epsilon=1}^{3} \sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} E\{|\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2\}\, \left(\Psi_{j,k_1,k_2}^{\epsilon}\right)^2(x,y) =: A + \sum_{\epsilon=1}^{3} A_{\epsilon}.$$
Starting now with A, we apply Hölder's inequality (writing $E^2\{\cdot\}$ for $(E\{\cdot\})^2$):
$$\sum_{k_1,k_2=0}^{2^{j_0}-1} E\{|\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2\}\, \Phi^2_{j_0,k_1,k_2}(x,y) \le \left( \sum_{k_1,k_2=0}^{2^{j_0}-1} E^2\{|\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2\} \right)^{\frac12} \left( \sum_{k_1,k_2=0}^{2^{j_0}-1} \Phi^4_{j_0,k_1,k_2}(x,y) \right)^{\frac12},$$
and we split
$$|\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2 = |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2\, \mathbb{1}_{\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}| \le \sqrt{\frac{\ln n}{n}} \right\}} + |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2\, \mathbb{1}_{\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}| > \sqrt{\frac{\ln n}{n}} \right\}}.$$
Using Jensen's inequality and point (a) of Proposition 6,
$$E^2\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2\, \mathbb{1}_{\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}| \le \sqrt{\frac{\ln n}{n}} \right\}} \right\} \le E\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^4\, \mathbb{1}_{\{\cdots\}} \right\} \le \frac{\ln n}{n}\, E\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2 \right\} \le C \left( \frac{\ln n}{n} \right)^2.$$
Using (37) and (39) together with $\mathrm{card}\{k_1, k_2 \in \{0, \ldots, 2^{j_0}-1\}\} \le C\, 2^{2j_0}$, we obtain:
$$\sum_{k_1,k_2=0}^{2^{j_0}-1} E\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2\, \mathbb{1}_{\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}| \le \sqrt{\frac{\ln n}{n}} \right\}} \right\} \Phi^2_{j_0,k_1,k_2}(x,y) \le c \left( \sum_{k_1,k_2=0}^{2^{j_0}-1} \left( \frac{\ln n}{n} \right)^2 2^{4j_0} \right)^{\frac12} \le c \left( \left( \frac{\ln n}{n} \right)^2 2^{6j_0} \right)^{\frac12}.$$
Turning now to
$$\sum_{k_1,k_2=0}^{2^{j_0}-1} E\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2\, \mathbb{1}_{\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}| > \sqrt{\frac{\ln n}{n}} \right\}} \right\} \Phi^2_{j_0,k_1,k_2}(x,y),$$
Hölder's inequality and points (b) and (c) of Proposition 6 give
$$E^2\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2\, \mathbb{1}_{\{\cdots\}} \right\} \le E\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^4 \right\} P\left( |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}| > \sqrt{\frac{\ln n}{n}} \right) \le C\, \frac{2^{2j_0}}{n^2}\, n^{-4}.$$
Then, using Hölder's inequality, (37), (41), and $\mathrm{card}\{k_1, k_2 \in \{0, \ldots, 2^{j_0}-1\}\} \le C\, 2^{2j_0}$,
$$\sum_{k_1,k_2=0}^{2^{j_0}-1} E\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2\, \mathbb{1}_{\{\cdots\}} \right\} \Phi^2_{j_0,k_1,k_2}(x,y) \le c\, 2^{2j_0} \left( \frac{2^{2j_0}\, 2^{2j_0}}{n^2}\, n^{-4} \right)^{\frac12} \le C\, \frac{\ln n}{n},$$
which gives the required bound for A.
Now, starting with
$$A_{\epsilon} := \sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} E\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2 \right\} \left(\Psi_{j,k_1,k_2}^{\epsilon}\right)^2(x,y), \quad \epsilon \in \{1,2,3\},$$
we apply Hölder's inequality twice in succession:
$$\sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} E\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2 \right\} \left(\Psi_{j,k_1,k_2}^{\epsilon}\right)^2(x,y) \le \left( \sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} E^2\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2 \right\} \right)^{\frac12} \left( \sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} \left(\Psi_{j,k_1,k_2}^{\epsilon}\right)^4(x,y) \right)^{\frac12}.$$
Lemma 2. 
Using point (a) of Proposition 6 and $\mathrm{card}\{k_1, k_2 \in \{0, \ldots, 2^{j}-1\}\} \le C\, 2^{2j}$, we have
$$\sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} E\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2\, \mathbb{1}_{\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}| \le \sqrt{\frac{\ln n}{n}} \right\}} \right\} \left(\Psi_{j,k_1,k_2}^{\epsilon}\right)^2(x,y) = O\left( \frac{\ln n}{n} \right).$$
Proof of Lemma 2. Using Jensen's inequality and point (a) of Proposition 6,
$$E^2\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2\, \mathbb{1}_{\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}| \le \sqrt{\frac{\ln n}{n}} \right\}} \right\} \le E\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^4\, \mathbb{1}_{\{\cdots\}} \right\} \le \frac{\ln n}{n}\, E\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2 \right\} \le c \left( \frac{\ln n}{n} \right)^2,$$
so, using (65) and (66) together with $\mathrm{card}\{k_1, k_2 \in \{0, \ldots, 2^{j}-1\}\} \le C\, 2^{2j}$, we have
$$\sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} E\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2\, \mathbb{1}_{\{\cdots\}} \right\} \left(\Psi_{j,k_1,k_2}^{\epsilon}\right)^2(x,y) \le C \left( \sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} \left( \frac{\ln n}{n} \right)^2 \right)^{\frac12} \le C \left( \sum_{j=j_0}^{J_n} \left( \frac{\ln n}{n} \right)^2 2^{2j} \right)^{\frac12} \le C\, \frac{\ln n}{n}. \;\square$$
Lemma 3. 
Using points (b) and (c) of Proposition 6, $\mathrm{card}\{k_1, k_2 \in \{0, \ldots, 2^{j}-1\}\} \le C\, 2^{2j}$, and (65), we have
$$\sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} E\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2\, \mathbb{1}_{\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}| > \sqrt{\frac{\ln n}{n}} \right\}} \right\} \left(\Psi_{j,k_1,k_2}^{\epsilon}\right)^2(x,y) = O\left( \frac{\ln n}{n} \right).$$
Proof of Lemma 3. Using Hölder's inequality and points (b) and (c) of Proposition 6,
$$E^2\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2\, \mathbb{1}_{\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}| > \sqrt{\frac{\ln n}{n}} \right\}} \right\} \le E\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^4 \right\} P\left( |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}| > \sqrt{\frac{\ln n}{n}} \right) \le \frac{2^{2j}}{n^2}\, n^{-4}.$$
Then, using (65) and (67) together with $\mathrm{card}\{k_1, k_2 \in \{0, \ldots, 2^{j}-1\}\} \le C\, 2^{2j}$, we obtain:
$$\sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} E\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2\, \mathbb{1}_{\{\cdots\}} \right\} \left(\Psi_{j,k_1,k_2}^{\epsilon}\right)^2(x,y) \le C \left( \sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} \frac{2^{2j}}{n^2}\, n^{-4} \right)^{\frac12} \le C \left( \sum_{j=j_0}^{J_n} \frac{2^{4j}}{n^6} \right)^{\frac12} \le C\, \frac{\ln n}{n}. \;\square$$

3.3. Case of Dependent Variables

Definition 3. 
We aim to estimate an unknown function $f \in L^2([0,1])$ via n random variables $X_1, X_2, \ldots, X_n$ drawn from a strictly stationary stochastic process $(X_t)_{t \in \mathbb{Z}}$ defined on a probability space $(\Omega, \mathcal{F}, P)$. We suppose that $(X_t)_{t \in \mathbb{Z}}$ has an α-mixing dependence structure with exponential decay rate; that is, there exist $\gamma > 0$ and $\theta > 0$ such that
$$\sup_{m \ge 1} \exp(\theta m)\, \alpha_m \le \gamma,$$
where
$$\alpha_m = \sup_{(A,B) \in \mathcal{F}^{0}_{-\infty} \times \mathcal{F}^{\infty}_{m}} \left| P(A \cap B) - P(A)\,P(B) \right|,$$
$\mathcal{F}^{0}_{-\infty}$ is the σ-algebra generated by the random variables $\ldots, X_{-1}, X_0$ and $\mathcal{F}^{\infty}_{m}$ is the σ-algebra generated by the random variables $X_m, X_{m+1}, \ldots$.
We make the following assumptions on the model in Definition 3. Assume that the copula belongs to the Besov space defined in Definition 2, and consider the following hypotheses.
Hypothesis 1. 
The mother wavelet Ψ is bounded and compactly supported.
Hypothesis 2. 
Φ is rapidly decreasing; i.e., for every integer $m \ge 0$ there exists a constant $A_m$ such that
$$|\Phi(u)| \le \frac{A_m}{(1 + |u|)^m}.$$
Hypothesis 3. 
It is supposed that there exists a function $q : (X_1(\Omega), Y_1(\Omega)) \to \mathbb{R}$ such that, for $\gamma \in \{\Phi, \Psi\}$, any integer $j \ge j_0$, and $k_1, k_2 \in \{0, \ldots, 2^{j}-1\}$,
$$E\left\{ q\left( \gamma_{j,k_1}(F(X_1))\, \gamma_{j,k_2}(G(Y_1)) \right) \right\} = \int_0^1 \int_0^1 c(x,y)\, \gamma_{j,k_1}(x)\, \gamma_{j,k_2}(y)\, dx\, dy,$$
where E denotes the expectation.
Hypothesis 4. 
There exist two constants, $C > 0$ and $\rho \ge 0$, satisfying, for any integer $j \ge j_0$ and $k_1, k_2 \in \{0, \ldots, 2^{j}-1\}$,
$$\sup_{(x,y) \in X_1(\Omega) \times Y_1(\Omega)} \left| q\left( \gamma_{j,k_1}(F(x))\, \gamma_{j,k_2}(G(y)) \right) \right| \le C\, 2^{2\rho j}\, 2^{j}$$
and
$$\int_{X_1(\Omega) \times Y_1(\Omega)} \left| q\left( \gamma_{j,k_1}(x)\, \gamma_{j,k_2}(y) \right) \right| dx\, dy \le C\, 2^{2\rho j}\, 2^{-j};$$
then,
$$E\left\{ \left| q\left( \gamma_{j,k_1}(F(X_1))\, \gamma_{j,k_2}(G(Y_1)) \right) \right|^2 \right\} \le C\, 2^{4\rho j},$$
where $j_0$ is the integer satisfying
$$2^{2j_0} = [\tau \ln n],$$
with $[a]$ denoting the integer part of a; $J_n$ satisfies
$$2^{2J_n} = \left( \frac{n}{(\ln n)^3} \right)^{\frac{1}{2\rho + 1}};$$
and
$$\lambda_j = 2^{2\rho j} \sqrt{\frac{\ln n}{n}}.$$
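These level and threshold choices are easy to tabulate; in the sketch below, the rounding of $j_0$ and $J_n$ to integers is an implementation choice, since the defining equalities rarely hold exactly for integer levels:

```python
import numpy as np

def resolution_levels(n, tau=1.0, rho=0.5):
    # 2^(2 j0) = [tau ln n],  2^(2 Jn) = (n / (ln n)^3)^(1 / (2 rho + 1)),
    # lambda_j = 2^(2 rho j) * sqrt(ln n / n).
    j0 = int(np.ceil(0.5 * np.log2(np.floor(tau * np.log(n)))))
    Jn = int(np.floor(0.5 * np.log2(n / np.log(n) ** 3) / (2.0 * rho + 1.0)))
    lam = {j: 2.0 ** (2.0 * rho * j) * np.sqrt(np.log(n) / n)
           for j in range(j0, Jn + 1)}
    return j0, Jn, lam
```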
These boundedness assumptions are standard for models under α-mixing dependence; they concern the wavelet basis and the α-mixing process. They are inspired by Chesneau [30] and are useful for establishing the properties of our adaptive wavelet estimator.
Theorem 4. 
$$\mathrm{Bias}\{\tilde c(x,y)\} = o\left( \sqrt{\frac{\ln n}{n}} \right).$$
Proof. 
Using (30), we have
$$\left| \tilde\alpha_{j_0,k_1,k_2} - \hat\alpha_{j_0,k_1,k_2} \right| = \frac{1}{n} \left| \sum_{i=1}^{n} \Phi_{j_0,k_1,k_2}(F(X_i), G(Y_i)) - \Phi_{j_0,k_1,k_2}(F_n(X_i), G_n(Y_i)) \right| \le C\, 2^{2\rho j_0}\, 2^{2j_0} \sqrt{\frac{\ln\ln n}{n}}.$$
So, using $\mathrm{card}\{k_1, k_2 \in \{0, \ldots, 2^{j_0}-1\}\} \le 2^{2j_0}$,
$$|A| \le \sum_{k_1,k_2=0}^{2^{j_0}-1} \left| \tilde\alpha_{j_0,k_1,k_2} - \hat\alpha_{j_0,k_1,k_2} \right| \left| \Phi_{j_0,k_1,k_2}(x,y) \right| \le C\, 2^{5j_0}\, 2^{4\rho j_0} \sqrt{\frac{\ln\ln n}{n}} = o\left( \sqrt{\frac{\ln n}{n}} \right).$$
Without loss of generality, starting with $\epsilon = 1$ and $A_1$, the same method as for A gives
$$\left| \tilde\beta_{j,k_1,k_2}^{1} - \hat\beta_{j,k_1,k_2}^{1} \right| = \frac{1}{n} \left| \sum_{i=1}^{n} \Psi_{j,k_1,k_2}^{1}(F(X_i), G(Y_i)) - \Psi_{j,k_1,k_2}^{1}(F_n(X_i), G_n(Y_i)) \right| \le C\, 2^{4\rho j}\, 2^{2j} \sqrt{\frac{\ln\ln n}{n}},$$
so $A_1 = o\left( \sqrt{\frac{\ln n}{n}} \right)$, using (33) and (34). □
Lemma 4. 
Let $c_{X_1,Y_1}$ be the density of $(X_1, Y_1)$ and let $c_{(X_1,Y_1,X_{m+1},Y_{m+1})}$ be the density of $(X_1, Y_1, X_{m+1}, Y_{m+1})$ for any $m \in \mathbb{Z}$. We suppose that there exists a constant $C > 0$ such that
$$\sup_{m \in \{1,\ldots,n-1\}}\ \sup_{(x,y,x',y') \in (X_1(\Omega),\, Y_1(\Omega),\, X_{m+1}(\Omega),\, Y_{m+1}(\Omega))} \left| c_{X_1,Y_1,X_{m+1},Y_{m+1}}(x,y,x',y') - c(x,y)\, c(x',y') \right| \le C.$$
Then there exist two constants $C > 0$ and $\rho \ge 0$ satisfying, for $\gamma \in \{\Phi, \Psi^{\epsilon}\}$ and any $m \in \{1, \ldots, n-1\}$,
$$\left| \mathrm{Cov}\left\{ q\left( \gamma_{j_0,k_1}(F(X_{m+1}))\, \gamma_{j_0,k_2}(G(Y_{m+1})) \right),\ q\left( \gamma_{j,k_1}(F(X_1))\, \gamma_{j,k_2}(G(Y_1)) \right) \right\} \right| \le C\, 2^{2\rho j_0}\, 2^{2\rho j}\, 2^{-j_0}\, 2^{-j},$$
where Cov denotes the covariance, that is, $\mathrm{Cov}\{X,Y\} = E\{XY\} - E\{X\}E\{Y\}$.
Proof of Lemma 4. 
(Proof of (50) and (56).)
Proof of (50). Using Sklar's theorem together with the substitutions $u = 2^{j_0}x - k_1$ and $v = 2^{j}y - k_2$, and the bounds of Hypothesis 4,
$$E\left\{ \left| q\left( \gamma_{j_0,k_1}(F(X_1))\, \gamma_{j,k_2}(G(Y_1)) \right) \right|^2 \right\} = \int\!\!\int 2^{j_0}\, \gamma(2^{j_0}x - k_1)^2\; 2^{j}\, \gamma(2^{j}y - k_2)^2\, c(x,y)\, dx\, dy \le 2^{2\rho j_0}\, 2^{2\rho j}\, \|c\|_{\infty}\, \|\gamma\|_{L^2}^2\, \|\gamma\|_{L^2}^2 = C\, 2^{2\rho j_0}\, 2^{2\rho j}.$$
Proof of (56).
$$\begin{aligned} &\left| \mathrm{Cov}\left\{ q\left( \gamma_{j_0,k_1}(F(X_{m+1}))\, \gamma_{j_0,k_2}(G(Y_{m+1})) \right),\ q\left( \gamma_{j,k_1}(F(X_1))\, \gamma_{j,k_2}(G(Y_1)) \right) \right\} \right| \\ &\quad = \left| \int \left( \gamma_{j_0,k_1}(x)\, \gamma_{j_0,k_2}(y)\, \gamma_{j,k_1}(x')\, \gamma_{j,k_2}(y')\, c(x,y,x',y') - \gamma_{j_0,k_1}(x)\, \gamma_{j_0,k_2}(y)\, c(x,y)\; \gamma_{j,k_1}(x')\, \gamma_{j,k_2}(y')\, c(x',y') \right) dx\, dy\, dx'\, dy' \right| \\ &\quad \le \int \left| \gamma_{j_0,k_1}(x)\, \gamma_{j_0,k_2}(y) \right| \left| \gamma_{j,k_1}(x')\, \gamma_{j,k_2}(y') \right| \left| c(x,y,x',y') - c(x,y)\, c(x',y') \right| dx\, dy\, dx'\, dy' \\ &\quad \le C \int \left| \gamma_{j_0,k_1}(x)\, \gamma_{j_0,k_2}(y) \right| dx\, dy \cdot \int \left| \gamma_{j,k_1}(x')\, \gamma_{j,k_2}(y') \right| dx'\, dy' \le C\, 2^{2\rho j_0}\, 2^{2\rho j}\, 2^{-j_0}\, 2^{-j}. \;\square \end{aligned}$$
Proposition 7. 
Suppose that Hypotheses 1, 2, 3, and 4 hold. Then:
(a)
$$\sum_{m=1}^{n} \left| \mathrm{Cov}\left\{ q\left( \gamma_{j_0,k_1}(F(X_{m+1}))\, \gamma_{j_0,k_2}(G(Y_{m+1})) \right),\ q\left( \gamma_{j,k_1}(F(X_1))\, \gamma_{j,k_2}(G(Y_1)) \right) \right\} \right| \le C\, 2^{2\rho j}\, 2^{2\rho j_0};$$
(b) there exists a constant $C > 0$ such that, for any $j \in \{j_0, \ldots, J_n\}$ and $k_1, k_2 \in \{0, \ldots, 2^{j}-1\}$,
$$E\left| \hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2} \right|^2 \le C\, \frac{2^{4\rho j_0}}{n}, \qquad E\left| \hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon} \right|^2 \le C\, \frac{2^{4\rho j}}{n}, \quad \epsilon \in \{1,2,3\};$$
(c) there exists a constant $C > 0$ such that, for any $j \in \{j_0, \ldots, J_n\}$ and $k_1, k_2 \in \{0, \ldots, 2^{j}-1\}$,
$$E\left| \hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2} \right|^4 \le C\, \frac{2^{6\rho j_0}\, 2^{j_0}}{n}, \qquad E\left| \hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon} \right|^4 \le C\, \frac{2^{6\rho j}\, 2^{j}}{n}, \quad \epsilon \in \{1,2,3\};$$
(d) with $\lambda_j$ defined as above, there exists a constant $C > 0$ such that, for any κ large enough, $j \in \{j_0, \ldots, J_n\}$, and $k_1, k_2 \in \{0, \ldots, 2^{j}-1\}$, we have
$$P\left( \left| \hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2} \right| > \kappa \frac{\lambda_j}{2} \right) \le C\, \frac{1}{n^4}, \qquad P\left( \left| \hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon} \right| > \kappa \frac{\lambda_j}{2} \right) \le C\, \frac{1}{n^4}.$$
Proof of Proposition 7. 
(a) Write
$$\sum_{m=1}^{n-1} \left| \mathrm{Cov}\left\{ q\left( \gamma_{j_0,k_1}(F(X_{m+1}))\, \gamma_{j_0,k_2}(G(Y_{m+1})) \right),\ q\left( \gamma_{j,k_1}(F(X_1))\, \gamma_{j,k_2}(G(Y_1)) \right) \right\} \right| =: A + B,$$
where A collects the terms with $m \le [\ln n / \theta] - 1$ and B the terms with $m \ge [\ln n / \theta]$.
Since $2^{-j}\, 2^{-j_0} \le 2^{-2j_0} \le C (\ln n)^{-1}$, Lemma 4 gives
$$A \le C\, 2^{2\rho j_0}\, 2^{-j_0}\, 2^{2\rho j}\, 2^{-j}\, \frac{\ln n}{\theta} \le C\, 2^{2\rho j}\, 2^{2\rho j_0}.$$
Lemma 5 
(Davydov's inequality [32]). Let $(W_t)_{t \in \mathbb{Z}}$ be a strictly stationary α-mixing process with mixing coefficients $\alpha_m$, $m \ge 0$, and let h and k be two measurable functions. Let $p > 0$ and $q > 0$ satisfy $\frac1p + \frac1q < 1$, and suppose that $E|h(W_1)|^p$ and $E|k(W_{m+1})|^q$ exist. Then, there exists a constant $C > 0$ such that
$$\left| \mathrm{Cov}\{h(W_1),\, k(W_{m+1})\} \right| \le C\, \alpha_m^{1 - \frac1p - \frac1q} \left( E|h(W_1)|^p \right)^{\frac1p} \left( E|k(W_{m+1})|^q \right)^{\frac1q}.$$
Applying Davydov's inequality (Lemma 5) with $p = q = 4$, and using $2^{(j+j_0)/2} \le 2^{J_n} \le \sqrt{n}$,
$$B \le \sum_{m=[\ln n/\theta]}^{n-1} \alpha_m^{\frac12} \left( E\left\{ \left| q\left( \gamma_{j_0,k_1}(F(X_{m+1}))\, \gamma_{j_0,k_2}(G(Y_{m+1})) \right) \right|^4 \right\} \right)^{\frac14} \left( E\left\{ \left| q\left( \gamma_{j,k_1}(F(X_1))\, \gamma_{j,k_2}(G(Y_1)) \right) \right|^4 \right\} \right)^{\frac14} \le C\, 2^{2\rho(j+j_0)}\, 2^{\frac{j+j_0}{2}} \sum_{m=[\ln n/\theta]}^{\infty} \exp\{-\theta m / 2\} \le C\, 2^{2\rho(j+j_0)}\, \sqrt{n}\, \exp\left\{ -\frac{\ln n}{2} \right\} \le C\, 2^{2\rho(j+j_0)}.$$
Putting (59) and (60) together in (57), we have
$$\sum_{m=1}^{n} \left| \mathrm{Cov}\left\{ q\left( \gamma_{j_0,k_1}(F(X_{m+1}))\, \gamma_{j_0,k_2}(G(Y_{m+1})) \right),\ q\left( \gamma_{j,k_1}(F(X_1))\, \gamma_{j,k_2}(G(Y_1)) \right) \right\} \right| \le C\, 2^{2\rho j_0}\, 2^{2\rho j}.$$
(b)
Using (49), point (a) of the proposition, and (50) of Lemma 4, we have
$$\begin{aligned} E\left| \hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2} \right|^2 &= \mathrm{Var}\{\hat\alpha_{j_0,k_1,k_2}\} = \mathrm{Var}\left\{ \frac1n \sum_{i=1}^{n} \Phi_{j_0,k_1,k_2}(F(X_i), G(Y_i)) \right\} \\ &= \frac1n \mathrm{Var}\{\Phi_{j_0,k_1,k_2}(F(X_1), G(Y_1))\} + \sum_{m=1}^{n} \frac{2}{n^2}(n - m)\, \mathrm{Cov}\left\{ \Phi_{j_0,k_1,k_2}(F(X_1), G(Y_1)),\, \Phi_{j_0,k_1,k_2}(F(X_{m+1}), G(Y_{m+1})) \right\} \\ &\le \frac1n \left( E\{\Phi^2_{j_0,k_1,k_2}(F(X_1), G(Y_1))\} + 2 \sum_{m=1}^{n} \left| \mathrm{Cov}\left\{ \Phi_{j_0,k_1,k_2}(F(X_1), G(Y_1)),\, \Phi_{j_0,k_1,k_2}(F(X_{m+1}), G(Y_{m+1})) \right\} \right| \right) \le C\, \frac{2^{4\rho j_0}}{n}. \end{aligned}$$
(c)
For $\epsilon \in \{1,2,3\}$,
$$\left| \hat\beta_{j,k_1,k_2}^{\epsilon} \right| \le \sup_{x \in X_1(\Omega),\, y \in Y_1(\Omega)} \left| q\left( \Psi_{j,k_1,k_2}^{\epsilon} \right) \right| \le C\, 2^{2\rho j}\, 2^{j}.$$
It follows from the triangle inequality and $|\beta_{j,k_1,k_2}^{\epsilon}| \le \|c\|_2 \le C$ that
$$\left| \hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon} \right| \le \left| \hat\beta_{j,k_1,k_2}^{\epsilon} \right| + \left| \beta_{j,k_1,k_2}^{\epsilon} \right| \le C\, 2^{j}\, 2^{2\rho j}.$$
This inequality and point (b) of the proposition give
$$E\left| \hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon} \right|^4 \le C\, 2^{j}\, 2^{2\rho j}\, E\left| \hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon} \right|^2 \le c\, 2^{6\rho j}\, 2^{j}\, \frac1n.$$
(d)
 
Lemma 6 
(Liebscher's inequality [33]). Let $(W_t)_{t \in \mathbb{Z}}$ be a strictly stationary process with mth strong mixing coefficient $\alpha_m$, $m \ge 0$, let n be a positive integer, let $h : \mathbb{R} \to \mathbb{R}$ be a measurable function, and for any $t \in \mathbb{Z}$ set $U_t = h(W_t)$.
We assume that $E\{U_1\} = 0$ and that there exists a constant $M > 0$ satisfying $|U_1| \le M$. Then, for any $m \in \{1, \ldots, [n/2]\}$ and $\lambda > 0$, we have
$$P\left( \frac1n \left| \sum_{i=1}^{n} U_i \right| \ge \lambda \right) \le 4 \exp\left( - \frac{\lambda^2\, n}{16 \left( \frac{D_m}{m} + \frac{\lambda M m}{3} \right)} \right) + 32\, \frac{M}{\lambda}\, n\, \alpha_m,$$
where $D_m = \max_{l \in \{1, \ldots, 2m\}} \mathrm{Var}\left( \sum_{i=1}^{l} U_i \right)$.
We will use Liebscher's inequality (Lemma 6). Let us set
$$U_i = \Phi_{j_0,k_1,k_2}(F(X_i), G(Y_i)) - E\{\Phi_{j_0,k_1,k_2}(F(X_i), G(Y_i))\}.$$
We have $E\{U_1\} = 0$, and since $2^{j_0} \le 2^{J_n} \le \sqrt{n / (\ln n)^3}$,
$$|U_i| \le 2 \sup \left| \Phi_{j_0,k_1,k_2} \right| = 2 \sup \left| 2^{j_0}\, \Phi(2^{j_0}F(X_i) - k_1)\, \Phi(2^{j_0}G(Y_i) - k_2) \right| \le c\, 2^{j_0}\, 2^{2\rho j_0} \le c\, 2^{2\rho j_0} \sqrt{\frac{n}{(\ln n)^3}}$$
(so $M = c\, 2^{2\rho j_0} \sqrt{\frac{n}{(\ln n)^3}}$).
  • For any integer $l \le c \ln n$, since $2^{-2j_0} \le c (\ln n)^{-1}$, we have
$$\mathrm{Var}\left( \sum_{i=1}^{l} U_i \right) \le c\, 2^{4\rho j_0} \left( l + l^2\, 2^{-2j_0} \right) \le c\, 2^{4\rho j_0}\, l.$$
Therefore,
$$D_m = \max_{l \in \{1, \ldots, 2m\}} \mathrm{Var}\left( \sum_{i=1}^{l} U_i \right) \le c\, 2^{4\rho j_0}\, m.$$
Owing to Lemma 6 applied with $U_1, \ldots, U_n$ and
$$\lambda = \frac{\kappa}{2}\, 2^{2\rho j_0} \sqrt{\frac{\ln n}{n}}, \qquad m = [\kappa \ln n], \qquad M = c\, 2^{2\rho j_0} \sqrt{\frac{n}{(\ln n)^3}},$$
we obtain
$$P\left( \left| \hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2} \right| \ge \kappa \frac{\lambda_j}{2} \right) \le c \left( \exp\left\{ - \frac{c\, \kappa^2 \lambda_j^2\, n}{\frac{D_m}{m} + \kappa \lambda_j\, m\, M} \right\} + \frac{M}{\lambda_j}\, n\, e^{-\theta m} \right) \le c\, n^{-c\kappa^2 / (1 + \kappa^{3/2})} + n^{2 - \theta\kappa};$$
taking κ large enough, the right-hand side is bounded by $c\, n^{-4}$. □
Proof. 
$$\begin{aligned} \mathrm{Var}\{\hat c(x,y)\} &= \mathrm{Var}\left\{ \sum_{k_1,k_2=0}^{2^{j_0}-1} \hat\alpha_{j_0,k_1,k_2}\, \Phi_{j_0,k_1,k_2}(x,y) + \sum_{\epsilon=1}^{3} \sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} \hat\beta_{j,k_1,k_2}^{\epsilon}\, \Psi_{j,k_1,k_2}^{\epsilon}(x,y) \right\} \\ &= \sum_{k_1,k_2=0}^{2^{j_0}-1} E\{|\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2\}\, \Phi^2_{j_0,k_1,k_2}(x,y) + \sum_{\epsilon=1}^{3} \sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} E\{|\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2\}\, \left(\Psi_{j,k_1,k_2}^{\epsilon}\right)^2(x,y) \\ &\quad + \frac{2}{n^2} \sum_{\epsilon=1}^{3} \sum_{i<r=1}^{n} \sum_{k_1,k_2=0}^{2^{j_0}-1} \sum_{j=j_0}^{J_n} \sum_{k_1',k_2'=0}^{2^{j}-1} \mathrm{Cov}\left\{ \Phi_{j_0,k_1,k_2}(F(X_i), G(Y_i)),\, \Psi_{j,k_1',k_2'}^{\epsilon}(F(X_r), G(Y_r)) \right\} \Phi_{j_0,k_1,k_2}(x,y)\, \Psi_{j,k_1',k_2'}^{\epsilon}(x,y) \\ &=: A + \sum_{\epsilon=1}^{3} A_{\epsilon} + \sum_{\epsilon=1}^{3} B_{\epsilon}. \end{aligned}$$
Lemma 7. 
$$B_{\epsilon} := \frac{2}{n^2} \sum_{i<r=1}^{n} \sum_{k_1,k_2=0}^{2^{j_0}-1} \sum_{j=j_0}^{J_n} \sum_{k_1',k_2'=0}^{2^{j}-1} \mathrm{Cov}\left\{ \Phi_{j_0,k_1,k_2}(F(X_i), G(Y_i)),\, \Psi_{j,k_1',k_2'}^{\epsilon}(F(X_r), G(Y_r)) \right\} \Phi_{j_0,k_1,k_2}(x,y)\, \Psi_{j,k_1',k_2'}^{\epsilon}(x,y) = O\left( \frac{\ln n}{n} \right), \quad \epsilon \in \{1,2,3\}.$$
Proof. 
By stationarity,
$$|B_{\epsilon}| = \left| \frac{2}{n^2} \sum_{m=1}^{n-1} (n - m) \sum_{k_1,k_2=0}^{2^{j_0}-1} \sum_{j=j_0}^{J_n} \sum_{k_1',k_2'=0}^{2^{j}-1} \mathrm{Cov}\left\{ \Phi_{j_0,k_1,k_2}(F(X_1), G(Y_1)),\, \Psi_{j,k_1',k_2'}^{\epsilon}(F(X_{m+1}), G(Y_{m+1})) \right\} \Phi_{j_0,k_1,k_2}(x,y)\, \Psi_{j,k_1',k_2'}^{\epsilon}(x,y) \right| \le \frac{2}{n} \sum_{m=1}^{n-1} \sum_{k_1,k_2=0}^{2^{j_0}-1} \sum_{j=j_0}^{J_n} \sum_{k_1',k_2'=0}^{2^{j}-1} \left| \mathrm{Cov}\left\{ \cdots \right\} \right| \left| \Phi_{j_0,k_1,k_2}(x,y) \right| \left| \Psi_{j,k_1',k_2'}^{\epsilon}(x,y) \right|.$$
Without loss of generality, take $\epsilon = 1$:
$$|B_1| \le \frac{2}{n} \sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j_0}-1} \sum_{k_1',k_2'=0}^{2^{j}-1} 2^{2\rho j}\, 2^{2\rho j_0} \left| \Phi_{j_0,k_1,k_2}(x,y) \right| \left| \Psi_{j,k_1',k_2'}^{1}(x,y) \right| \le \frac{c}{n} \sum_{k_1,k_2=0}^{2^{j_0}-1} \left| \Phi_{j_0,k_1,k_2}(x,y) \right| \sum_{j=j_0}^{J_n} \sum_{k_1',k_2'=0}^{2^{j}-1} \left| \Psi_{j,k_1',k_2'}^{1}(x,y) \right| \le c\, \frac{2^{j_0}\, 2^{2\rho(j+j_0)}}{n} \le c\, \frac{\ln n}{n},$$
using that $2^{2j_0} = [\tau \ln n]$. □
Lemma 8. 
Using points (a), (b), and (c) of Proposition 7, we have
$$A := \sum_{k_1,k_2=0}^{2^{j_0}-1} E\{|\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2\}\, \Phi^2_{j_0,k_1,k_2}(x,y) = O\left( \frac{\ln n}{n} \right).$$
Proof. 
Using Proposition 7(a),
$$E^2\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2\, \mathbb{1}_{\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}| \le 2^{2\rho j_0} \sqrt{\frac{\ln n}{n}} \right\}} \right\} \le E\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^4\, \mathbb{1}_{\{\cdots\}} \right\} \le 2^{4\rho j_0}\, \frac{\ln n}{n}\, E\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2 \right\} \le C\, 2^{8\rho j_0} \left( \frac{\ln n}{n} \right)^2,$$
and
$$\sum_{k_1,k_2=0}^{2^{j_0}-1} \Phi^4_{j_0,k_1,k_2}(x,y) = O\left( 2^{8\rho j_0}\, 2^{4j_0} \right).$$
Using (37), (62), and (63) together with $\mathrm{card}\{k_1, k_2 \in \{0, \ldots, 2^{j_0}-1\}\} \le C\, 2^{2j_0}$, we obtain:
$$\sum_{k_1,k_2=0}^{2^{j_0}-1} E\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2\, \mathbb{1}_{\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}| \le 2^{2\rho j_0} \sqrt{\frac{\ln n}{n}} \right\}} \right\} \Phi^2_{j_0,k_1,k_2}(x,y) \le c \left( \sum_{k_1,k_2=0}^{2^{j_0}-1} 2^{16\rho j_0}\, 2^{4j_0} \left( \frac{\ln n}{n} \right)^2 \right)^{\frac12} \le c \left( 2^{16\rho j_0}\, 2^{6j_0} \left( \frac{\ln n}{n} \right)^2 \right)^{\frac12}.$$
Turning to
$$\sum_{k_1,k_2=0}^{2^{j_0}-1} E\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2\, \mathbb{1}_{\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}| > 2^{2\rho j_0} \sqrt{\frac{\ln n}{n}} \right\}} \right\} \Phi^2_{j_0,k_1,k_2}(x,y),$$
Hölder's inequality and points (b) and (c) of Proposition 7 give
$$E^2\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2\, \mathbb{1}_{\{\cdots\}} \right\} \le E\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^4 \right\} P\left( |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}| > 2^{2\rho j_0} \sqrt{\frac{\ln n}{n}} \right) \le C\, \frac{2^{j_0}\, 2^{6\rho j_0}}{n}\, \frac{1}{n^4}.$$
Using Hölder's inequality, (37), (64), and $\mathrm{card}\{k_1, k_2 \in \{0, \ldots, 2^{j_0}-1\}\} \le C\, 2^{2j_0}$,
$$\sum_{k_1,k_2=0}^{2^{j_0}-1} E\left\{ |\hat\alpha_{j_0,k_1,k_2} - \alpha_{j_0,k_1,k_2}|^2\, \mathbb{1}_{\{\cdots\}} \right\} \Phi^2_{j_0,k_1,k_2}(x,y) \le c \left( \sum_{k_1,k_2=0}^{2^{j_0}-1} \frac{2^{j_0}\, 2^{6\rho j_0}}{n^5}\, 2^{4j_0}\, 2^{8\rho j_0} \right)^{\frac12} \le c \left( \frac{2^{7j_0}\, 2^{14\rho j_0}}{n^5} \right)^{\frac12} \le C\, \frac{\ln n}{n},$$
which gives the bound for A. □
Lemma 9. 
$$A_{\epsilon} := \sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} E\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2 \right\} \left(\Psi_{j,k_1,k_2}^{\epsilon}\right)^2(x,y) = O\left( \frac{\ln n}{n} \right), \quad \epsilon \in \{1,2,3\}.$$
We apply Hölder's inequality twice in succession:
$$\sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} E\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2 \right\} \left(\Psi_{j,k_1,k_2}^{\epsilon}\right)^2(x,y) \le \left( \sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} E^2\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2 \right\} \right)^{\frac12} \left( \sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} \left(\Psi_{j,k_1,k_2}^{\epsilon}\right)^4(x,y) \right)^{\frac12}.$$
Lemma 10. 
Using point (a) of Proposition 7 and $\mathrm{card}\{k_1, k_2 \in \{0, \ldots, 2^{j}-1\}\} \le C\, 2^{2j}$, we have
$$\sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} E\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2\, \mathbb{1}_{\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}| \le 2^{2\rho j} \sqrt{\frac{\ln n}{n}} \right\}} \right\} \left(\Psi_{j,k_1,k_2}^{\epsilon}\right)^2(x,y) = O\left( \frac{\ln n}{n} \right).$$
Proof of Lemma 10. 
Using Jensen's inequality and point (a) of Proposition 7,
$$E^2\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2\, \mathbb{1}_{\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}| \le 2^{2\rho j} \sqrt{\frac{\ln n}{n}} \right\}} \right\} \le E\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^4\, \mathbb{1}_{\{\cdots\}} \right\} \le 2^{4\rho j}\, \frac{\ln n}{n}\, E\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2 \right\} \le c\, 2^{8\rho j} \left( \frac{\ln n}{n} \right)^2,$$
so, using (65) and (66) together with $\mathrm{card}\{k_1, k_2 \in \{0, \ldots, 2^{j}-1\}\} \le C\, 2^{2j}$, we have
$$\sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} E\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2\, \mathbb{1}_{\{\cdots\}} \right\} \left(\Psi_{j,k_1,k_2}^{\epsilon}\right)^2(x,y) \le C \left( \sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} 2^{8\rho j} \left( \frac{\ln n}{n} \right)^2 \right)^{\frac12} \le C \left( \sum_{j=j_0}^{J_n} \left( \frac{\ln n}{n} \right)^2 2^{8\rho j}\, 2^{2j} \right)^{\frac12} \le C\, \frac{\ln n}{n}. \;\square$$
Lemma 11. 
Using points (b) and (c) of Proposition 7, $\mathrm{card}\{k_1, k_2 \in \{0, \ldots, 2^{j}-1\}\} \le C\, 2^{2j}$, and (65), we have
$$\sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} E\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}|^2\, \mathbb{1}_{\left\{ |\hat\beta_{j,k_1,k_2}^{\epsilon} - \beta_{j,k_1,k_2}^{\epsilon}| > 2^{2\rho j} \sqrt{\frac{\ln n}{n}} \right\}} \right\} \left(\Psi_{j,k_1,k_2}^{\epsilon}\right)^2(x,y) = O\left( \frac{\ln n}{n} \right).$$
Proof of Lemma 11. 
Using Holder s’ inequality and using points (b) and (c) of Proposition 7.
E 2 | β ^ j k 1 , k 2 ϵ β j k 1 , k 2 ϵ | 2 1 | β ^ j k 1 , k 2 ϵ β j k 1 , k 2 ϵ | > 2 2 ρ j ln n n E | β ^ j k 1 , k 2 ϵ β j k 1 , k 2 ϵ | 4 . E 1 | β ^ j k 1 , k 2 ϵ β j k 1 , k 2 ϵ | > 2 2 ρ j ln n n = E | β ^ j k 1 , k 2 ϵ β j k 1 , k 2 ϵ | 4 . P | β ^ j k 1 , k 2 ϵ β j k 1 , k 2 ϵ | > 2 2 ρ j ln n n C 2 j 2 6 ρ j n n 4
using (65), (67) and using c a r d k 1 and k 2 { 0 , , 2 j 1 } C 2 2 j we obtain the following:
j = j 0 J n k 1 , k 2 = 0 2 j 1 E | β ^ j k 1 , k 2 ϵ β j k 1 , k 2 ϵ | 2 1 | β ^ j k 1 , k 2 ϵ β j k 1 , k 2 ϵ | > 2 2 ρ j ln n n Ψ j , k 1 , k 2 ϵ 2 ( x ) c j = j 0 J n k 1 , k 2 = 0 2 j 1 2 j 2 6 ρ j n 5 1 2 C j = j 0 J n 2 3 j 2 6 ρ j n 5 1 2 C ln n n

3.4. Copula Estimation with Iterative Fixed Point Thresholding

In this paragraph, we extend the iterative wavelet thresholding framework to the estimation of copula densities. Unlike classical wavelet-based copula estimators, the proposed fixed-point iterative procedure adaptively updates the threshold to balance noise suppression and structural preservation in the joint dependence function. This adaptive behavior is particularly advantageous in financial contexts, where copulas often exhibit localized tail dependencies and nonlinear patterns that require scale-sensitive denoising.
Let ( X i , Y i ) 1 i n be a bivariate sample representing two financial assets. We are interested in estimating the copula density c ( u , v ) , which describes the dependence structure between X and Y, regardless of marginals. We define the pseudo-observations:
$$U_i = \frac{1}{n+1} \sum_{j=1}^{n} \mathbb{1}_{\{X_j \le X_i\}} \qquad \text{and} \qquad V_i = \frac{1}{n+1} \sum_{j=1}^{n} \mathbb{1}_{\{Y_j \le Y_i\}}.$$
The wavelet estimator is given by
$$\tilde c(u,v) = \sum_{k_1,k_2=0}^{2^{j_0}-1} \tilde\alpha_{j_0,k_1,k_2}\, \Phi_{j_0,k_1,k_2}(u,v) + \sum_{\epsilon=1}^{3} \sum_{j=j_0}^{J_n} \sum_{k_1,k_2=0}^{2^{j}-1} \tilde\beta_{j,k_1,k_2}^{\epsilon}\, \Psi_{j,k_1,k_2}^{\epsilon}(u,v).$$
We wish to apply the fixed-point iterative thresholding method described in Section 2 to obtain an estimate of the copula density. Three cases can be considered (a code sketch of the three scenarios follows the list):
  • Raw copula estimation: estimate $c(u,v)$ from the original data $(X, Y)$ using the original pseudo-observations $(U_i, V_i)$.
  • Both marginals denoised: after denoising both marginals $(X, Y)$ by FPWT iterative thresholding, yielding $(\hat X, \hat Y)$, estimate $\hat c_{\hat U, \hat V}(u,v)$.
  • Partial denoising: apply the denoising method to only one of the marginals, for example to $X_i$ to obtain $\hat X_i$, then estimate $\hat c_{\hat U, V}(u,v)$ from $(\hat X, Y)$.
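The sketch below wires the three scenarios together; `fpwt_denoise` reuses the hypothetical `fixed_point_threshold` routine sketched after Algorithm 1, and the crude 2-D histogram stands in for any of the wavelet copula estimators of this section:

```python
import numpy as np
import pywt
from scipy.stats import rankdata

def fpwt_denoise(series, wavelet="db4", level=4, Fa=3.0):
    # 1-D FPWT denoising: fixed-point hard threshold on each detail level.
    # Assumes fixed_point_threshold from the earlier sketch is in scope.
    coeffs = pywt.wavedec(series, wavelet, level=level)
    out = [coeffs[0]] + [d * (np.abs(d) > fixed_point_threshold(d, Fa=Fa))
                         for d in coeffs[1:]]
    return pywt.waverec(out, wavelet)[: len(series)]

def copula_density_hist(x, y, bins=16):
    # Crude copula density: 2-D histogram of the pseudo-observations.
    u, v = rankdata(x) / len(x), rankdata(y) / len(y)
    dens, _, _ = np.histogram2d(u, v, bins=bins,
                                range=[[0, 1], [0, 1]], density=True)
    return dens

def three_scenarios(x, y):
    x_hat, y_hat = fpwt_denoise(x), fpwt_denoise(y)
    return (copula_density_hist(x, y),          # raw: c(U, V)
            copula_density_hist(x_hat, y_hat),  # both denoised: c(U^, V^)
            copula_density_hist(x_hat, y))      # partial: c(U^, V)
```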
In the next section, we consider an application of this approach for the estimation of univariate density and bivariate copula density in real financial data.

4. Empirical Results

This section applies the fixed-point iterative thresholding (FPWT) algorithm to univariate density estimation and compares it with classical thresholding methods. We then estimate the copula density between Bitcoin and the S&P 500, compare it with the empirical copula, and analyze its performance in both noisy and noise-free settings.
  • Application 1: Density estimation with fixed point iterative thresholding
We consider the estimation of the density function under iterative thresholding (FPWT), compared with the kernel density estimation (KDE) method, for Bitcoin's closing prices covering the period from 4 January 2017 to 15 December 2023.
Figure 2 compares several density estimation methods for daily log returns of Bitcoin.
Figure 2. FPWT vs. Gaussian vs. KDE for Bitcoin density estimation.
The histogram provides the empirical distribution, while the red dashed curve represents a Gaussian fit, assuming normality. The green dashed line (KDE) offers a smoothed non-parametric estimate, which better captures the distribution’s asymmetry and heavier tails. Notably, the blue line (FPWT) highlights finer structures and a sharper peak, indicating that the wavelet-based estimator detects local features and possibly multimodality missed by classical approaches. This suggests that FPWT provides a more adaptive and detailed estimation of Bitcoin return dynamics. To assess the performance of the proposed fixed-point iterative wavelet thresholding method, we compare it with several existing estimators commonly used in the literature. Quantitative performance measures such as the Mean Integrated Squared Error (MISE), bias, and variance are computed to highlight the advantages of the proposed approach in terms of adaptivity, accuracy, and robustness to dependence structures. The results are shown in Table 1.
Table 1. Comparison of univariate density estimation methods.
Comparing (FPWT) to other standard wavelet thresholding methods is a solid way to validate its performance. The results of the comparison are presented in Table 2.
Table 2. Comparing denoising methods.
Iterative thresholding achieves the best MISE and SNR by adapting to the distribution of wavelet coefficients. Universal thresholding is fastest but tends to oversmooth due to high sparsity. SureShrink and BayesShrink offer a compromise, balancing adaptivity and computational efficiency.
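For reference, MISE and SNR figures of the kind reported in the tables can be computed as follows once an estimate and a reference share a common grid; this is a generic sketch of the metrics (the dB convention is 10 log10 of the power ratio), not the exact evaluation protocol behind Tables 1 and 2:

```python
import numpy as np

def mise(estimate, reference, dx=1.0):
    # Integrated squared error approximated on a regular grid of step dx.
    return np.sum((np.asarray(estimate) - np.asarray(reference)) ** 2) * dx

def snr_db(estimate, reference):
    # Signal-to-noise ratio (dB) of an estimate against a reference.
    err = np.asarray(estimate) - np.asarray(reference)
    return 10.0 * np.log10(np.sum(np.asarray(reference) ** 2) / np.sum(err ** 2))
```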
  • Application 2: Empirical vs. FPWT iterative copula estimation
We consider daily log-returns of Bitcoin and the S&P 500 index over the period January 2020–December 2021 ($n = 512$). Both series are transformed into pseudo-observations on $[0,1]$, which serve as inputs for nonparametric copula estimation. To assess the effect of wavelet-based denoising, we apply the fixed-point iterative thresholding procedure to the raw data before computing copula densities.
Figure 3 compares the empirical copula density ĉ_emp(u, v) with the FPWT iterative copula ĉ_FP(u, v). The empirical estimate shows oscillations and irregularities, particularly in the tail regions, while the FPWT iterative copula displays smoother contours and a clearer separation between regions of high and low dependence.
Figure 3. Empirical vs. FPWT iterative copula estimation.
Table 3 reports global metrics comparing the two surfaces. The FPWT iterative method achieves an MISE of 0.0055 relative to the empirical copula and a high signal-to-noise ratio (SNR) of 22.1 dB, confirming that thresholding removes spurious oscillations while preserving the variance of the dependence structure.
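For discretized copula surfaces evaluated on a common grid, the MISE reported in Table 3 can be approximated by a Riemann sum over the unit square, as in the following sketch; the grid resolution and names are illustrative.

```python
import numpy as np

def mise_surface(c_hat, c_ref):
    # Riemann-sum approximation of the MISE between two copula-density
    # surfaces evaluated on the same regular grid over [0, 1]^2.
    du = 1.0 / c_hat.shape[0]
    dv = 1.0 / c_hat.shape[1]
    return np.sum((c_hat - c_ref) ** 2) * du * dv
```

With the surfaces of the previous sketch, mise_surface(c_full, c_raw) would give a value comparable in spirit to the 0.0055 reported here, though the exact figure depends on the grid and the data.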
Table 3. Comparison metrics for thresholding methods.
To further examine the impact on extremes, we compute the lower- and upper-tail dependence coefficients, λ_L and λ_U, at quantile levels q = 0.10 and q = 0.05. The results are summarized in Table 4.
Table 4. Tail dependence coefficients.
The results reveal that lower-tail dependence remains stable at moderate levels (q = 0.10) but increases slightly in the extreme case (q = 0.05), indicating that denoising clarifies joint negative extremes. Conversely, upper-tail dependence decreases after denoising, suggesting that part of the strong positive tail dependence in the raw estimate was noise-driven.
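A standard nonparametric estimator of these coefficients conditions directly on the marginal quantile events; we assume a construction of this kind underlies Table 4.

```python
import numpy as np

def tail_dependence(u, v, q=0.05):
    # Empirical tail-dependence coefficients from pseudo-observations (u, v).
    # Pseudo-observations are near-uniform, so P(U <= q) ~ q and the
    # conditional probabilities reduce to joint frequencies divided by q.
    lam_L = np.mean((u <= q) & (v <= q)) / q         # lower tail: P(V <= q | U <= q)
    lam_U = np.mean((u > 1 - q) & (v > 1 - q)) / q   # upper tail: P(V > 1-q | U > 1-q)
    return lam_L, lam_U
```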
Overall, the FPWT iterative approach improves the interpretability of copula estimation for financial data. The method reduces noise, stabilizes the copula surface, and refines the measurement of tail co-movements. Importantly, it reveals stronger downside dependence while tempering spurious positive co-movements, which is consistent with the heavy-tailed and asymmetric nature of financial return distributions.
  • Application 3: Estimation of the bivariate copula density with denoising by FPWT algorithm
The dataset covers the period from 1 January 2020 to 31 December 2021, comprising n = 512 daily log-return observations for Bitcoin (BTC) and the S&P 500 index. Bitcoin returns are modeled using a Student-t distribution with 5 degrees of freedom to capture the heavy-tailed nature and volatility patterns observed in cryptocurrency markets. The S&P 500 returns are generated as a linear combination of the Bitcoin returns and an independent heavy-tailed noise component, thereby introducing realistic positive dependence while retaining idiosyncratic variations.
Both series are transformed into pseudo-observations on the unit interval [ 0 , 1 ] , enabling the nonparametric estimation of the copula density before and after wavelet-based fixed-point iterative denoising. This framework captures the interplay between marginal distributional features and the dependence structure under different noise-reduction scenarios.
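A minimal sketch of this data-generating design is given below; the mixing weight, scale factors, and seed are illustrative choices rather than values taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512
btc = 0.04 * rng.standard_t(df=5, size=n)              # heavy-tailed BTC log-returns
spx = 0.3 * btc + 0.01 * rng.standard_t(df=5, size=n)  # positive dependence + idiosyncratic noise
u = (np.argsort(np.argsort(btc)) + 1) / (n + 1)        # pseudo-observations on (0, 1)
v = (np.argsort(np.argsort(spx)) + 1) / (n + 1)
```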
We compare the estimation of the bivariate copula density in three distinct scenarios involving wavelet-based denoising of financial return data. In the first case, the copula ĉ(U, V) is estimated directly from the raw pseudo-observations without any thresholding, resulting in a noisy and irregular surface, particularly in the tail and boundary regions. In the second case, both marginals are denoised using fixed-point iterative wavelet thresholding before computing the copula ĉ(Û, V̂), yielding a significantly smoother and more interpretable structure. This version reveals clearer tail dependencies and more concentrated density contours. In the third case, only one marginal (e.g., X) is denoised, leading to the copula ĉ(Û, V), which displays partial improvement, highlighting the asymmetric impact of noise on dependence estimation.
The results presented in Figure 4 confirm that applying wavelet thresholding, even partially, improves the accuracy and interpretability of copula estimates, especially when data are contaminated with noise or exhibit nonlinear dependencies.
Figure 4. The estimation of the copula density in the three cases.
A comparison of the metrics across the three denoising cases is given in Table 5.
Table 5. Comparison of thresholding methods across the three denoising cases.
Effects on tail dependence across the three cases of denoising are given in Table 6.
Table 6. Tail dependence coefficients in the three denoising cases.
The tail dependence analysis highlights the impact of wavelet-based fixed-point iterative denoising on the joint extreme behavior of Bitcoin and the S&P 500. In the lower tail, the dependence coefficient λ_L remains broadly stable at q = 0.10 but shows a slight increase for q = 0.05 after denoising, indicating a clearer detection of joint negative extremes. In contrast, the upper-tail coefficient λ_U decreases after denoising, most notably in the full-denoising case, suggesting that part of the strong positive tail dependence observed in the raw data is attributable to noise. Partial denoising produces intermediate results, improving tail clarity without fully removing spurious dependence patterns. Overall, denoising appears to sharpen lower-tail dependence signals while tempering exaggerated upper-tail associations.

5. Conclusions

This study demonstrates the effectiveness of the fixed-point iterative wavelet thresholding approach for nonparametric estimation in financial applications. At the univariate level, the method produces smoother and more accurate density estimates of return distributions compared to the raw empirical estimates. By suppressing spurious oscillations while retaining the heavy-tailed nature of financial data, it achieves a favorable trade-off between bias reduction and variance control.
At the bivariate level, when applied to copula density estimation, the FPWT iterative approach improves the readability of the dependence structure by eliminating high-frequency noise. This refinement leads to more stable metrics of joint dependence and sharper insights into extreme co-movements. In particular, the method enhances the detection of downside tail dependence, a crucial feature in risk management, while attenuating spurious upper-tail dependence often driven by noisy fluctuations.
Overall, the FPWT approach provides a robust and flexible tool for both marginal and dependence modeling in finance. Its ability to denoise, while preserving essential structural information, makes it especially valuable for studying heavy-tailed distributions, nonlinear dependencies, and tail risks in financial markets. Future work could extend this framework to higher-dimensional copulas and to real-time risk monitoring in dynamic market environments.

6. Discussion

Although wavelet-based density and copula estimators have been widely studied, many methods remain limited when handling noise, heavy tails, or dependence. Standard thresholding may cause over-smoothing or noise retention, and few approaches provide a unified adaptive framework for marginal and bivariate estimation. To address these issues, we propose a fixed-point iterative wavelet method for copula density estimation, tailored to weak dependence. Our approach fills a gap in the literature and is supported by rigorous theoretical results guaranteeing convergence, consistency, and statistical reliability.
The method has broad practical relevance. In finance, it models nonlinear and tail dependence between assets, improving stress testing and risk analysis. In insurance, it captures dependence among risk sources for solvency and capital allocation. In environmental and climate sciences, it is suitable for modeling complex dependence in non-stationary and extreme situations.
Future work will extend the method to higher-dimensional copulas, develop dynamic wavelet copula models for time-varying dependence, and integrate it into real-time risk monitoring and anomaly detection systems.

Author Contributions

Conceptualization, H.B. (Heni Boubaker) and H.B. (Houcem Belgacem); methodology, H.B. (Heni Boubaker); software, H.B. (Houcem Belgacem); validation, H.B. (Houcem Belgacem); formal analysis, H.B. (Heni Boubaker); investigation, H.B. (Heni Boubaker) and H.B. (Houcem Belgacem). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to legal and ethical reasons.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
PDF  Probability Density Function
CDF  Cumulative Distribution Function
MISE  Mean Integrated Squared Error
SNR  Signal-to-Noise Ratio
Var  Variance
Cov  Covariance
FPWT  Fixed-Point Wavelet Thresholding
WT  Wavelet Transform
DWT  Discrete Wavelet Transform
IDWT  Inverse Discrete Wavelet Transform
MRA  Multiresolution Analysis
i.i.d.  Independent and Identically Distributed
α-mixing  α-mixing dependence condition
GGD  Generalized Gaussian Distribution
STC  Stopping Criterion
BTC  Bitcoin
Nomenclature
X, Y  Random variables
f(x)  Probability density function
f̂(x), f̃(x)  Estimated density functions
ϕ_{jk}, ψ_{jk}  Scaling and wavelet functions at scale j and position k
β_{j,k}, β^ε_{j,k}  Detail wavelet coefficients of the function f
λ_j  Threshold value at resolution level j
λ_j(t)  Threshold value at level j and iteration t
σ  Noise standard deviation
F_a  User-defined constant
F_a^m  Minimal value of F_a
ĉ(u, v), c̃(u, v)  Estimated copula densities
J  Maximum resolution level
λ_L  Lower-tail dependence coefficient
λ_U  Upper-tail dependence coefficient
β_{jk}^{s,t}  Informative wavelet coefficients at iteration t
β_{jk}^{r,t}  Noisy wavelet coefficients at iteration t

References

  1. Grossmann, A.; Morlet, J. Decomposition of Hardy functions into square integrable wavelets of constant shape. SIAM J. Math. Anal. 1984, 15, 723–736.
  2. Meyer, Y. Ondelettes et fonctions splines. In Séminaire Équations aux Dérivées Partielles (Polytechnique), dit aussi «Séminaire Goulaouic–Schwartz»; École Polytechnique: Palaiseau, France, 1986; pp. 1–18.
  3. Daubechies, I. Orthonormal bases of compactly supported wavelets. Commun. Pure Appl. Math. 1988, 41, 909–996.
  4. Mallat, S.G. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11, 674–693.
  5. Härdle, W.; Kerkyacharian, G.; Picard, D.; Tsybakov, A. Wavelets, Approximation and Statistical Applications; Springer: New York, NY, USA, 1998.
  6. Tsybakov, A.B. Introduction à l'estimation non paramétrique. In Mathématiques et Applications; Springer: Berlin/Heidelberg, Germany, 2003.
  7. Vidakovic, B. Statistical Modeling by Wavelets; Wiley: New York, NY, USA, 1999.
  8. Kerkyacharian, G.; Picard, D. Density estimation in Besov spaces. Stat. Probab. Lett. 1992, 13, 15–24.
  9. Kerkyacharian, G.; Picard, D. Density estimation by kernel and wavelet methods: Optimality of Besov spaces. Stat. Probab. Lett. 1993, 18, 327–336.
  10. Antoniadis, A.; Carmona, R. Multiresolution Analyses and Wavelets for Density Estimation; Technical Report; University of California at Irvine: Irvine, CA, USA, 1991.
  11. Walter, G.G. Approximation of the delta function by wavelets. J. Approx. Theory 1992, 71, 329–343.
  12. Walter, G.G. Wavelets and Other Orthogonal Systems with Applications; CRC Press: Boca Raton, FL, USA, 1994.
  13. Donoho, D.L.; Johnstone, I.M. Ideal spatial adaptation by wavelet shrinkage. Biometrika 1994, 81, 425–455.
  14. Donoho, D.L.; Johnstone, I.M.; Kerkyacharian, G.; Picard, D. Density estimation by wavelet thresholding. Ann. Stat. 1996, 24, 508–539.
  15. Donoho, D.L.; Johnstone, I.M. Minimax risk over ℓp-balls for ℓq-error. Probab. Theory Relat. Fields 1994, 99, 277–303.
  16. Nason, G.P. Wavelet shrinkage using cross-validation. J. R. Stat. Soc. Ser. B 1996, 58, 463–479.
  17. Vidakovic, B. Nonlinear wavelet shrinkage with Bayes rules and Bayes factors. J. Am. Stat. Assoc. 1998, 93, 173–179.
  18. Hadjileontiadis, L.J.; Panas, S.M. Separation of discontinuous adventitious sounds from vesicular sounds using a wavelet-based filter. IEEE Trans. Biomed. Eng. 1997, 44, 1269–1281.
  19. Ranta, R.; Heinrich, C.; Louis-Dorr, V.; Wolf, D. Interpretation and improvement of an iterative wavelet-based denoising method. IEEE Signal Process. Lett. 2003, 10, 239–241.
  20. Sklar, M. Fonctions de répartition à n dimensions et leurs marges. Annales de l'ISUP 1959, 8, 229–231.
  21. Joe, H. Multivariate Models and Dependence Concepts; CRC Press: Boca Raton, FL, USA, 1997.
  22. Nelsen, R.B. An Introduction to Copulas; Springer: New York, NY, USA, 2006.
  23. Genest, C.; Masiello, E.; Tribouley, K. Estimating copula densities through wavelets. Insur. Math. Econ. 2009, 44, 170–181.
  24. Autin, F.; Le Pennec, E.; Tribouley, K. Thresholding methods to estimate copula density. J. Multivar. Anal. 2010, 101, 200–222.
  25. Chatrabgoun, O.; Parham, G.; Chinipardaz, R. A Legendre multiwavelets approach to copula density estimation. Stat. Pap. 2017, 58, 673–690.
  26. Provost, S.B. Nonparametric copula density estimation methodologies. Mathematics 2024, 12, 398.
  27. Falhi, F.H.; Hmood, M.Y. Estimation of copula density using the wavelet transform. Baghdad Sci. J. 2024, 21, 18.
  28. Pensky, M.; Canditiis, D.D. Minimax estimation with thresholding and its application to wavelet analysis. Stat. Med. 2025, 44, 1234–1250.
  29. Bouezmarni, T.; Rombouts, J.V.; Taamouti, A. Asymptotic properties of the Bernstein density copula estimator for α-mixing data. J. Multivar. Anal. 2010, 101, 1–10.
  30. Chesneau, C. On the adaptive wavelet estimation of a multidimensional regression function under α-mixing dependence: Beyond the standard assumptions on the noise. Comment. Math. Univ. Carol. 2013, 54, 527–556.
  31. Coifman, R.; Wickerhauser, M.V. Adapted waveform "de-noising" for medical signals and images. IEEE Eng. Med. Biol. Mag. 1995, 14, 578–586.
  32. Davydov, Y.A. The invariance principle for stationary processes. Theory Probab. Its Appl. 1970, 15, 487–498.
  33. Liebscher, E. Strong convergence of sums of α-mixing random variables with applications to density estimation. Stoch. Process. Their Appl. 1996, 65, 69–80.
