
Nonlocal Prior Mixture-Based Bayesian Wavelet Regression with Application to Noisy Imaging and Audio Data

by
Nilotpal Sanyal
Department of Mathematical Sciences, University of Texas at El Paso, El Paso, TX 79968, USA
Mathematics 2025, 13(16), 2642; https://doi.org/10.3390/math13162642
Submission received: 8 July 2025 / Revised: 8 August 2025 / Accepted: 14 August 2025 / Published: 17 August 2025
(This article belongs to the Special Issue Bayesian Statistics and Applications)

Abstract

We propose a novel Bayesian wavelet regression approach using a three-component spike-and-slab prior for wavelet coefficients, combining a point mass at zero, a moment (MOM) prior, and an inverse moment (IMOM) prior. This flexible prior supports small and large coefficients differently, offering advantages for highly dispersed data where wavelet coefficients span multiple scales. The IMOM prior’s heavy tails capture large coefficients, while the MOM prior is better suited for smaller non-zero coefficients. Further, our method introduces innovative hyperparameter specifications for mixture probabilities and scale parameters, including generalized logit, hyperbolic secant, and generalized normal decay for probabilities, and double exponential decay for scaling. Hyperparameters are estimated via an empirical Bayes approach, enabling posterior inference tailored to the data. Extensive simulations demonstrate significant performance gains over two-component wavelet methods. Applications to electroencephalography and noisy audio data illustrate the method’s utility in capturing complex signal characteristics. We implement our method in an R package, NLPwavelet (≥1.1).

1. Introduction

We focus on spike-and-slab mixture models for wavelet-based Bayesian nonparametric regression. Several existing approaches model the wavelet coefficients with various spike-and-slab mixtures, such as mixtures of two Gaussian distributions with different standard deviations [1], mixtures of a Gaussian and a point mass at zero [2,3,4], mixtures of a heavy-tailed distribution and a point mass at zero [5,6], mixtures of a logistic distribution and a point mass at zero [7], and mixtures of a nonlocal prior and a point mass at zero [8]. Nonlocal priors [9] are a class of priors that assign zero probability density in a neighborhood of the null value (often zero) of the parameter, unlike local priors, which are positive everywhere. In contrast to the other works mentioned above, which used local priors for the wavelet coefficients, ref. [8] pioneered the use of nonlocal priors for wavelet regression. Specifically, ref. [8] used two different nonlocal priors, namely the moment (MOM) prior and the inverse moment (IMOM) prior, in their mixture model. In this work, we flexibly extend all the previous approaches by proposing a three-component spike-and-slab mixture model for the wavelet coefficients where, along with a point mass for the spike part, a mixture of nonlocal priors is used to model the slab component. In addition, we introduce novel hyperparameter specifications that are shown to provide improved estimates in extensive simulation experiments with highly dispersed data.
In the Bayesian paradigm, nonlocal priors have been shown to encourage model selection parsimony and selective shrinkage (unlike local priors) for spurious coefficients [10,11]. They create a gap around zero, yielding harder exclusion and lower bias for sizable effects. In contrast, local shrinkage priors provide continuous shrinkage that is particularly suitable for estimation but less effective for variable selection unless combined with explicit selection rules. Previous work [12] has observed, in the context of high-dimensional genomics data, that different nonlocal priors provide better support for large and small regression coefficients. In wavelet regression, the wavelet coefficients at multiple resolution levels capture both location and scale characteristics of the underlying function [13]. If the underlying function is highly dispersed, its energy is spread across a wide range of location or scale components. For such a function, while most of the wavelet coefficients will likely be small or near zero, non-zero wavelet coefficients will span across multiple scales. In other words, no single scale will dominate the wavelet coefficients, as different scales will capture different portions of the signal’s energy. This will lead to significant coefficients at both coarse and fine scales. We conjecture that, if the underlying function is highly dispersed, a mixture of nonlocal priors will provide better support to the distribution of the wavelet coefficients compared to individual nonlocal priors. Specifically, in this work, we consider for the wavelet coefficients a prior that is a mixture of a point mass at zero, a MOM prior, and an IMOM prior.
Our motivation for combining MOM and IMOM priors stems from the fact that they offer complementary strengths in sparse high-dimensional regression. The MOM prior places more mass around moderate non-zero values, thereby offering greater sensitivity to small but meaningful signals. On the other hand, the IMOM prior has heavier tails, allowing it to support large coefficients better. In the context of wavelet regression of highly dispersed data, a single prior, MOM or IMOM, may over-shrink large coefficients or over-allow small noise fluctuations. A mixture of MOM and IMOM provides adaptive flexibility, simultaneously supporting both ends of the coefficient spectrum. This idea echoes adaptive shrinkage principles in sparse Bayesian learning. From a signal processing perspective, the heterogeneous scaling behavior of wavelet coefficients for highly dispersed signals motivates a prior that can adaptively vary shrinkage across scales. This hybrid slab also enables the prior predictive distribution to cover a broader range of plausible signals, reducing prior-data conflict. While our empirical results support this mixture’s performance, the theoretical justification lies in its capacity to model heterogeneity in signal strengths, a feature not adequately captured by any one nonlocal prior alone.
In Bayesian wavelet regression, the probability weights associated with different component distributions of a spike-and-slab mixture prior, henceforth called mixture probabilities, and the scaling parameters of the component distributions are often governed by multiple hyperparameters [2,3,8]. For the mixture probabilities at different resolution levels, previous work considered exponential decay specification [3], Bernoulli distribution [2], and logit specification [8]. Further, for the scaling parameters, exponential decay [3] and polynomial decay specifications [8] have been considered. In this work, we propose several novel specifications that flexibly model the variations in the mixture probabilities and scaling components. For the mixture probabilities, we consider a generalized logit decay, a hyperbolic secant decay, and a generalized normal decay, whereas for the scaling parameter, we consider a double exponential decay. Each of these specifications is controlled by a few hyperparameters. Following an empirical Bayes approach, we estimate the hyperparameters from the data and develop a posterior inference conditional on the hyperparameter estimates.
Through extensive simulation studies, we assess the performance gains of our approach under the different hyperparameter specifications, comparing it to two-component spike-and-slab prior-based wavelet methods. Finally, applications to real-world data, including electroencephalography (EEG) data from a meditation study and audio data from a noisy musical recording, illustrate the practical utility of the proposed method.
Although this work focuses on Bayesian spike-and-slab priors for wavelet analysis, it is still relevant to acknowledge other types of priors explored in the wavelet literature. These include scale mixtures of Gaussians [14,15,16,17], hidden Markov model-based priors [18], the generalized Gaussian distribution prior [19], Jeffreys’ prior [20], the Bessel K Form prior [21], the double Weibull prior [22], Bayes factor thresholding based on mixtures of conjugate priors [23], the beta prior [24], and the logistic prior [25]. For comparative reviews and summaries of wavelet-based nonparametric regression methods, see [26,27].
In what follows, Section 2 describes the proposed Bayesian hierarchical model along with the hyperparameter specifications, and Section 3 describes the empirical Bayes inference procedure. Subsequently, Section 4 describes the simulation experiment and results. The real data applications appear in Section 5 and Section 6. Finally, Section 7 concludes with an overall discussion of our work and relevant remarks. Theoretical proofs are included in Appendix A.

2. Bayesian Hierarchical Model

2.1. Observation Model

Suppose $y_1, \ldots, y_n$ represent $n$ noisy observations from an unknown function $f(t)$, which we aim to estimate. We consider the observation model $y_i = f(t_i) + \epsilon_i$, $i = 1, \ldots, n$, where $t_i = i/n$ represents the equispaced sampling points, and the errors $\epsilon_1, \ldots, \epsilon_n$ are assumed to be independent and identically distributed (i.i.d.) normal random variables, $\epsilon_i \sim N(0, \sigma^2)$, with unknown variance $\sigma^2$. Let $y = (y_1, \ldots, y_n)$ denote the vector of observations, $f = (f(t_1), \ldots, f(t_n))$ the vector of true functional values, and $\epsilon = (\epsilon_1, \ldots, \epsilon_n)$ the vector of errors. The observation model can be expressed in matrix form as
\[
y = f + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 I_n), \tag{1}
\]
where I n is the n-dimensional identity matrix. In wavelet regression, the goal is to estimate the function f ( t ) from the observations y by decomposing f ( t ) into wavelet basis functions. This decomposition allows us to exploit the multiscale nature of wavelets to capture both local and global features of the function. A chief feature of wavelet regression is its ability to adapt to different levels of smoothness and to handle noisy data efficiently, making it especially useful in nonparametric function estimation problems.
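To fix ideas, the following minimal R sketch generates data from this observation model, with the (modified) doppler test function of Section 4 standing in for f; defining the SNR as sd(f)/sigma is an illustrative assumption.

```r
# Simulate y_i = f(t_i) + eps_i on an equispaced grid of size n = 2^J.
set.seed(1)
n <- 1024
t <- (1:n) / n
f <- sqrt(t * (1 - t)) * sin(2 * pi * 1.01 / (t + 0.01))  # doppler with eps = 0.01
sigma <- sd(f) / 5                 # noise level giving SNR = 5 (assumed definition)
y <- f + rnorm(n, mean = 0, sd = sigma)
```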

2.2. Wavelet Coefficient Model

We represent $f$ using an orthogonal wavelet basis matrix as $f = Wd$ [28], where $W$ is the orthogonal basis matrix and $d$ is a vector whose elements include the scaling coefficient at the coarsest resolution level along with the wavelet coefficients at all resolution levels. Let $\hat d = W^T y$ denote the vector of empirical wavelet coefficients. Since $W$ is orthogonal, we can express $\hat d$ as $\hat d = d + \epsilon^*$, where $\epsilon^* = W^T \epsilon$ represents the transformed error vector with $\epsilon^* \sim N(0, \sigma^2 I_n)$.
Suppose $d_{lj}$ denotes the wavelet coefficient at position $j$ and resolution level $l$, with $\hat d_{lj}$ defined similarly for the empirical wavelet coefficients. Then, the model in terms of individual coefficients is given by
\[
\hat d_{lj} = d_{lj} + \epsilon^*_{lj}, \qquad \epsilon^*_{lj} \sim N(0, \sigma^2). \tag{2}
\]
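As a sketch of this step, the empirical coefficients $\hat d = W^T y$ can be computed with the wavethresh package used later in Section 4 (Daubechies least asymmetric wavelet, six vanishing moments, periodic boundaries); y is the simulated series from the Section 2.1 sketch.

```r
library(wavethresh)

# Discrete wavelet transform of y; wd() returns the full coefficient set.
dwt <- wd(y, filter.number = 6, family = "DaubLeAsymm", bc = "periodic")

# Empirical wavelet coefficients at, e.g., resolution level 5.
d_hat_5 <- accessD(dwt, level = 5)
```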

2.3. Mixture Prior for the Wavelet Coefficients

For the wavelet coefficient $d_{lj}$, we consider a spike-and-slab mixture prior that is a mixture of three components—a MOM prior, an IMOM prior, and a point mass at zero—given by
\[
d_{lj} \mid \gamma_l^{(1)}, \gamma_l^{(2)}, \tau_l^{(1)}, \tau_l^{(2)}, \sigma^2, r, \nu \;\sim\; \gamma_l^{(1)}\, \mathrm{MOM}\big(\tau_l^{(1)}, r, \sigma^2\big) + \big(1-\gamma_l^{(1)}\big)\gamma_l^{(2)}\, \mathrm{IMOM}\big(\tau_l^{(2)}, \nu, \sigma^2\big) + \big(1-\gamma_l^{(1)}\big)\big(1-\gamma_l^{(2)}\big)\, \delta_0(\cdot), \qquad 0 < \gamma_l^{(1)}, \gamma_l^{(2)} < 1, \tag{3}
\]
where $\gamma_l^{(1)}$ and $\gamma_l^{(2)}$ are mixture probabilities. The MOM prior with order $r$ and variance component $\tau_l^{(1)}\sigma^2$, involving the scale parameter $\tau_l^{(1)}$, has the following density function:
\[
\mathrm{mom}\big(d_{lj} \mid \tau_l^{(1)}, r, \sigma^2\big) = \tilde M_r\, \big(\tau_l^{(1)}\sigma^2\big)^{-r-\frac12}\, d_{lj}^{2r}\, \exp\!\left\{-\frac{d_{lj}^2}{2\tau_l^{(1)}\sigma^2}\right\}, \qquad r \ge 1,\; \tau_l^{(1)} > 0,
\]
where $\tilde M_r = (2\pi)^{-1/2}/(2r-1)!!$ and $(2r-1)!! = 1 \times 3 \times \cdots \times (2r-1)$. The IMOM prior with shape parameter $\nu$ and variance component $\tau_l^{(2)}\sigma^2$, involving the scale parameter $\tau_l^{(2)}$, has the following density function:
\[
\mathrm{imom}\big(d_{lj} \mid \tau_l^{(2)}, \nu, \sigma^2\big) = \frac{\big(\tau_l^{(2)}\sigma^2\big)^{\nu/2}}{\Gamma(\nu/2)}\, |d_{lj}|^{-(\nu+1)}\, \exp\!\left\{-\frac{\tau_l^{(2)}\sigma^2}{d_{lj}^2}\right\}, \qquad \nu \ge 1,\; \tau_l^{(2)} > 0.
\]
The conditional density of the wavelet coefficient $d_{lj}$ given that $d_{lj} \neq 0$ can be expressed as
\[
\pi\big(d_{lj} \mid d_{lj} \neq 0\big) = \frac{\gamma_l^{(1)}}{\gamma_l^{(1)} + \big(1-\gamma_l^{(1)}\big)\gamma_l^{(2)}}\, \tilde M_r\, \big(\tau_l^{(1)}\sigma^2\big)^{-r-\frac12}\, d_{lj}^{2r}\, \exp\!\left\{-\frac{d_{lj}^2}{2\tau_l^{(1)}\sigma^2}\right\} + \frac{\big(1-\gamma_l^{(1)}\big)\gamma_l^{(2)}}{\gamma_l^{(1)} + \big(1-\gamma_l^{(1)}\big)\gamma_l^{(2)}}\, \frac{\big(\tau_l^{(2)}\sigma^2\big)^{\nu/2}}{\Gamma(\nu/2)}\, |d_{lj}|^{-(\nu+1)}\, \exp\!\left\{-\frac{\tau_l^{(2)}\sigma^2}{d_{lj}^2}\right\}.
\]
Figure 1 shows plots of our proposed three-component spike-and-slab mixture model (solid line) for $d_{lj}$ with $\gamma_l^{(1)} = \gamma_l^{(2)} = 0.25$, $\tau_l^{(1)} = \tau_l^{(2)} = 0.2$, $r = \nu = 1$, and $\sigma = 1$ along with two-component spike-and-slab mixture models based on MOM (dashed line) and IMOM (dotted line) priors with $\gamma_l = 0.5$, $\tau_l = 1$, and $\sigma = 1$.
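A minimal R sketch of these densities, with r = nu = 1 as in the later analyses, is given below; the function names are ours, and the slab density is renormalized as in the conditional density above.

```r
# MOM prior density (r = 1): tilde_M_r = (2*pi)^(-1/2) / (2r-1)!!
dmom <- function(d, tau, r = 1, sigma2 = 1) {
  Mr <- (2 * pi)^(-1 / 2) / prod(seq(1, 2 * r - 1, by = 2))
  Mr * (tau * sigma2)^(-r - 1 / 2) * d^(2 * r) * exp(-d^2 / (2 * tau * sigma2))
}

# IMOM prior density (nu = 1); the density vanishes at the origin.
dimom <- function(d, tau, nu = 1, sigma2 = 1) {
  out <- (tau * sigma2)^(nu / 2) / gamma(nu / 2) * abs(d)^(-(nu + 1)) *
    exp(-tau * sigma2 / d^2)
  out[d == 0] <- 0
  out
}

# Slab density: MOM/IMOM mixture renormalized given d != 0.
dslab <- function(d, g1, g2, tau1, tau2) {
  (g1 * dmom(d, tau1) + (1 - g1) * g2 * dimom(d, tau2)) / (g1 + (1 - g1) * g2)
}

curve(dslab(x, g1 = 0.25, g2 = 0.25, tau1 = 0.2, tau2 = 0.2), -4, 4, n = 1001)
```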

2.4. Hyperparameter Specifications

The mixture prior given in (3) depends on the mixture probabilities $\gamma_l^{(1)}$ and $\gamma_l^{(2)}$, scale parameters $\tau_l^{(1)}$ and $\tau_l^{(2)}$, order $r$, and shape parameter $\nu$. This section considers different specifications for $\gamma_l^{(1)}$, $\gamma_l^{(2)}$, $\tau_l^{(1)}$, and $\tau_l^{(2)}$. Previous work in wavelet regression considered two-component spike-and-slab mixture priors, where the mixture probabilities were specified using an exponential decay specification [3], a Bernoulli distribution [2], and a logit specification [8]. In this work, we examine three specifications for the mixture probabilities that flexibly model the variations in the mixture probabilities and, to the best of our knowledge, are hitherto unused in wavelet regression. These novel specifications were motivated by an exploratory data analysis where, for multiple highly dispersed datasets, we observed how the wavelet coefficients changed with resolution level. We compare these novel specifications with the logit specification for $\gamma_l^{(1)}$ and $\gamma_l^{(2)}$, given by $\gamma_l^{(1)} = \exp(\theta_{1\gamma} - \theta_{2\gamma} l)/\{1 + \exp(\theta_{1\gamma} - \theta_{2\gamma} l)\}$ and $\gamma_l^{(2)} = \exp(\theta_{3\gamma} - \theta_{4\gamma} l)/\{1 + \exp(\theta_{3\gamma} - \theta_{4\gamma} l)\}$, where $\theta_{1\gamma}, \theta_{3\gamma} \in \mathbb{R}$ and $\theta_{2\gamma}, \theta_{4\gamma} > 0$ [8]. Figure 2 shows, in the left panel, the plots of the different specifications for the mixture probabilities against resolution level, with specific values of the hyperparameters. The novel specifications are described as follows:
(a)
Generalized logit or Richards decay specifications, given by
\[
\gamma_l^{(1)} = \frac{1}{\big[1 + \exp\{-(\theta_{1\gamma} - \theta_{2\gamma} l)\}\big]^{\theta_{3\gamma}}}, \quad \theta_{1\gamma} \in \mathbb{R},\ \theta_{2\gamma}, \theta_{3\gamma} > 0; \qquad
\gamma_l^{(2)} = \frac{1}{\big[1 + \exp\{-(\theta_{4\gamma} - \theta_{5\gamma} l)\}\big]^{\theta_{6\gamma}}}, \quad \theta_{4\gamma} \in \mathbb{R},\ \theta_{5\gamma}, \theta_{6\gamma} > 0.
\]
This form corresponds to a flexible S-shaped decay curve that reduces to the standard logistic decay when $\theta_{3\gamma}$ (or $\theta_{6\gamma}$), which controls the steepness of the curve, equals one.
(b)
Hyperbolic secant decay specifications, given by
\[
\gamma_l^{(1)} = \frac{2}{\pi} \arctan\!\left[\exp\!\left\{\frac{\pi}{2}\big(\theta_{1\gamma} - \theta_{2\gamma} l\big)\right\}\right], \quad \theta_{1\gamma} \in \mathbb{R},\ \theta_{2\gamma} > 0; \qquad
\gamma_l^{(2)} = \frac{2}{\pi} \arctan\!\left[\exp\!\left\{\frac{\pi}{2}\big(\theta_{3\gamma} - \theta_{4\gamma} l\big)\right\}\right], \quad \theta_{3\gamma} \in \mathbb{R},\ \theta_{4\gamma} > 0.
\]
This form, although less intuitive, is also sigmoid-like but has heavier tails and slower decay than the logistic form, indicating that the mixture probabilities approach their limits more slowly, with a smoother transition. $\theta_{2\gamma}$ (or $\theta_{4\gamma}$) controls the steepness and $\theta_{1\gamma}$ (or $\theta_{3\gamma}$) controls the shift.
(c)
Generalized normal decay specifications, given by
\[
\gamma_l^{(1)} = \frac12 + \mathrm{sign}\big(\theta_{1\gamma} - l\big)\, \frac{1}{2\,\Gamma(1/\theta_{2\gamma})}\, \gamma\!\left(\frac{1}{\theta_{2\gamma}},\ \left(\frac{|\theta_{1\gamma} - l|}{\theta_{3\gamma}}\right)^{\!\theta_{2\gamma}}\right), \quad \theta_{1\gamma} \in \mathbb{R},\ \theta_{2\gamma}, \theta_{3\gamma} > 0;
\]
\[
\gamma_l^{(2)} = \frac12 + \mathrm{sign}\big(\theta_{4\gamma} - l\big)\, \frac{1}{2\,\Gamma(1/\theta_{5\gamma})}\, \gamma\!\left(\frac{1}{\theta_{5\gamma}},\ \left(\frac{|\theta_{4\gamma} - l|}{\theta_{6\gamma}}\right)^{\!\theta_{5\gamma}}\right), \quad \theta_{4\gamma} \in \mathbb{R},\ \theta_{5\gamma}, \theta_{6\gamma} > 0,
\]
where $\gamma(\cdot,\cdot)$ denotes the lower incomplete gamma function. This form is the most flexible and can mimic the normal CDF, the Laplace CDF, and many other distributions, but it is also the most complex. Whereas $\theta_{2\gamma}$ (or $\theta_{5\gamma}$) controls the tail behavior, with larger values indicating lighter tails, the other hyperparameters control scale and shift. A small R sketch of the three decay forms follows.
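The three mixture-probability decay forms translate directly into R; in the sketch below, the function names and hyperparameter values are illustrative, and pgamma() supplies the regularized lower incomplete gamma function for the generalized normal form.

```r
# Generalized logit (Richards) decay.
gen_logit <- function(l, t1, t2, t3) 1 / (1 + exp(-(t1 - t2 * l)))^t3

# Hyperbolic secant decay.
hyp_sec <- function(l, t1, t2) (2 / pi) * atan(exp((pi / 2) * (t1 - t2 * l)))

# Generalized normal decay: 1/2 + sign(.) * gamma_inc / (2 * Gamma(1/t2)),
# written via the regularized incomplete gamma function pgamma().
gen_norm <- function(l, t1, t2, t3) {
  0.5 + sign(t1 - l) * 0.5 * pgamma((abs(t1 - l) / t3)^t2, shape = 1 / t2)
}

l <- 1:10
plot(l, gen_logit(l, 5, 1, 2), type = "b", ylim = c(0, 1), ylab = "mixture probability")
lines(l, hyp_sec(l, 5, 1), type = "b", col = 2)
lines(l, gen_norm(l, 5, 2, 2), type = "b", col = 4)
```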
For the scale parameters of spike-and-slab mixture priors, previously considered specifications include exponential decay [3] and polynomial decay [8]. Here, for $\tau_l^{(1)}$ and $\tau_l^{(2)}$, we consider the polynomial decay specification given by $\tau_l^{(1)} = \theta_{1\tau}\, l^{-\theta_{2\tau}}$ and $\tau_l^{(2)} = \theta_{3\tau}\, l^{-\theta_{4\tau}}$, with $\theta_{1\tau}, \theta_{2\tau}, \theta_{3\tau}, \theta_{4\tau} > 0$. In addition, to model the variations in the scale parameters more flexibly, we propose the following novel specification:
(d)
Double exponential decay specifications, given by
\[
\tau_l^{(1)} = \theta_{1\tau}\, e^{-\theta_{2\tau} l} + \theta_{3\tau}\, e^{-\theta_{4\tau} l}, \quad \theta_{1\tau}, \theta_{2\tau}, \theta_{3\tau}, \theta_{4\tau} > 0; \qquad
\tau_l^{(2)} = \theta_{5\tau}\, e^{-\theta_{6\tau} l} + \theta_{7\tau}\, e^{-\theta_{8\tau} l}, \quad \theta_{5\tau}, \theta_{6\tau}, \theta_{7\tau}, \theta_{8\tau} > 0.
\]
Figure 2 shows, in the right panel, the plots of the different specifications for the scale parameters against resolution level, with specific values of the hyperparameters.
Each of the proposed specifications admits interpretable controls over rate and shape of decay across scales, crucial for adaptivity in multi-resolution modeling, and is governed by a small number of hyperparameters. Note that the hyperparameters for the mixture probabilities are superscripted with γ and those for the scale parameters are superscripted with τ . In our simulation studies described in Section 4, we analyze each simulated dataset using 4 × 2 = 8 configurations arising out of the combinations of the above specifications for the mixture probabilities and the scale parameters. While no formal optimality results are available for these specific forms, their empirical performance, as shown in Section 4, supports their practical relevance. Let θ generically denote the set of all hyperparameters for any given configuration.
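For completeness, here is an analogous R sketch of the two scale-parameter decay specifications (illustrative hyperparameter values).

```r
# Polynomial decay and double exponential decay for tau_l.
poly_decay <- function(l, t1, t2) t1 * l^(-t2)
dexp_decay <- function(l, t1, t2, t3, t4) t1 * exp(-t2 * l) + t3 * exp(-t4 * l)

l <- 1:10
plot(l, poly_decay(l, 2, 1), type = "b", ylab = "scale parameter")
lines(l, dexp_decay(l, 1.5, 0.2, 0.5, 2), type = "b", col = 2)
```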

3. Inference

For inference in our proposed Bayesian hierarchical wavelet regression model based on three-component spike-and-slab mixture priors, we adopt the empirical Bayes approach [5,8]. This methodology estimates the hyperparameters from the data and performs posterior inference conditioned on these estimated hyperparameters. For simplicity, in our simulation analysis, we set $r = 1$ and $\nu = 1$. Further, we estimate the error variance $\sigma^2$ using the median absolute deviation estimator [29], $\hat\sigma = 0.6745^{-1}\, \mathrm{median}_j\big(|\hat d_{Lj} - \mathrm{median}_j(\hat d_{Lj})|\big)$, where $L$ denotes the finest resolution level, which is a well-established practice in wavelet regression [2,3,5,6].
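In R, this estimator takes one line given the finest-level empirical coefficients; the sketch below reuses the wavethresh object from the Section 2.2 sketch.

```r
# MAD-based noise estimate from the finest-resolution detail coefficients.
finest <- nlevelsWT(dwt) - 1
d_fine <- accessD(dwt, level = finest)
sigma_hat <- median(abs(d_fine - median(d_fine))) / 0.6745
```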

3.1. Hyperparameter Estimation

Let $\delta(x)$ denote the value of the point mass function $\delta_0(\cdot)$ at $x$. The following result is used to obtain the hyperparameter estimates.
Result 1.
Integrating out $d_{lj}$ from the wavelet coefficient model (2) using the mixture prior of the wavelet coefficients in (3), and using the Laplace approximation for the IMOM prior component, we get the marginal distribution of the empirical wavelet coefficients, $\hat d_{lj}$, as
\[
\begin{aligned}
\pi\big(\hat d_{lj} \mid \sigma^2, \theta, r, \nu\big)
&= \gamma_l^{(1)}\, \big(1+\tau_l^{(1)}\big)^{-r}\, M_r^*\big(\hat d_{lj}, \tau_l^{(1)}, \sigma^2\big)\, \phi\big(\hat d_{lj};\, 0,\, \sigma^2(1+\tau_l^{(1)})\big) \\
&\quad + \big(1-\gamma_l^{(1)}\big)\gamma_l^{(2)}\, \frac{\big(\tau_l^{(2)}\sigma^2\big)^{\nu/2}}{\Gamma(\nu/2)}\, \phi\big(\hat d_{lj};\, 0,\, \sigma^2\big)\, \sqrt{2\pi}\,\sigma_*\, h\big(d_{lj}^*(\hat d_{lj})\big) \\
&\quad + \big(1-\gamma_l^{(1)}\big)\big(1-\gamma_l^{(2)}\big)\, \phi\big(\hat d_{lj};\, 0,\, \sigma^2\big),
\end{aligned}
\tag{4}
\]
where, in the offshoot of the MOM component,
\[
M_r^*\big(\hat d_{lj}, \tau_l^{(1)}, \sigma^2\big) = \frac{1}{(2r-1)!!} \sum_{i=0}^{r} \frac{(2r)!}{(2i)!\,(r-i)!\,2^{r-i}} \left(\frac{\tau_l^{(1)}}{1+\tau_l^{(1)}}\, \frac{\hat d_{lj}^2}{\sigma^2}\right)^{\!i},
\]
and, in the offshoot of the IMOM component,
\[
h(d_{lj}) = |d_{lj}|^{-(\nu+1)}\, \exp\!\left\{-\frac{1}{2\sigma^2}\big(d_{lj}^2 - 2\, d_{lj}\hat d_{lj}\big) - \frac{\tau_l^{(2)}\sigma^2}{d_{lj}^2}\right\},
\]
$d_{lj}^*(\hat d_{lj})$ is the global maximum of $h(d_{lj})$, and $\sigma_*^2 = -1/L_h''\big(d_{lj}^*(\hat d_{lj})\big)$, with $L_h(d_{lj}) = \log\big(h(d_{lj})\big)$.
The proof of Result 1 is given in Appendix A. Using (4), the marginal likelihood function is approximately given by
\[
L(\theta) = \prod_{l} \prod_{j} \Bigg[ \gamma_l^{(1)}\, \big(1+\tau_l^{(1)}\big)^{-r}\, M_r^*\big(\hat d_{lj}, \tau_l^{(1)}, \sigma^2\big)\, \phi\big(\hat d_{lj};\, 0,\, \sigma^2(1+\tau_l^{(1)})\big) + \big(1-\gamma_l^{(1)}\big)\gamma_l^{(2)}\, \frac{\big(\tau_l^{(2)}\sigma^2\big)^{\nu/2}}{\Gamma(\nu/2)}\, \phi\big(\hat d_{lj};\, 0,\, \sigma^2\big)\, \sqrt{2\pi}\,\sigma_*\, h\big(d_{lj}^*(\hat d_{lj})\big) + \big(1-\gamma_l^{(1)}\big)\big(1-\gamma_l^{(2)}\big)\, \phi\big(\hat d_{lj};\, 0,\, \sigma^2\big) \Bigg].
\]
This is a function only of the data and the hyperparameters. Hence, we maximize this function with respect to the hyperparameters to obtain their estimates, θ ^ . With θ = θ ^ , the prior distribution in (3) is fully known.
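The sketch below illustrates this empirical Bayes step for a single resolution level with r = nu = 1, using free level-specific (gamma1, gamma2, tau1, tau2) instead of the decay specifications, and numerical quadrature for the IMOM component in place of the Laplace approximation; all function names are ours.

```r
# Approximate marginal density of one empirical coefficient dh (Result 1, r = nu = 1).
marg_one <- function(dh, g1, g2, tau1, tau2, s2) {
  # MOM component in closed form: M_1^* = 1 + (tau1/(1+tau1)) dh^2 / s2.
  Mstar <- 1 + (tau1 / (1 + tau1)) * dh^2 / s2
  m_mom <- (1 + tau1)^(-1) * Mstar * dnorm(dh, 0, sqrt(s2 * (1 + tau1)))
  # IMOM component by quadrature over d (nu = 1, so Gamma(1/2) = sqrt(pi)).
  f <- function(d) {
    val <- dnorm(dh, d, sqrt(s2)) * sqrt(tau2 * s2) / sqrt(pi) *
      d^(-2) * exp(-tau2 * s2 / d^2)
    ifelse(d == 0, 0, val)
  }
  m_imom <- integrate(f, -Inf, Inf)$value
  g1 * m_mom + (1 - g1) * g2 * m_imom +
    (1 - g1) * (1 - g2) * dnorm(dh, 0, sqrt(s2))
}

# Negative log marginal likelihood over one level's coefficients dh,
# with an unconstrained parameterization of the hyperparameters.
neg_loglik <- function(par, dh, s2) {
  p <- plogis(par[1:2]); tau <- exp(par[3:4])
  -sum(log(sapply(dh, marg_one, g1 = p[1], g2 = p[2],
                  tau1 = tau[1], tau2 = tau[2], s2 = s2)))
}

# est <- optim(c(0, 0, 0, 0), neg_loglik, dh = d_hat_5, s2 = sigma_hat^2)
```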

3.2. Posterior Distribution

The following two results provide the posterior estimates of the wavelet coefficients.
Result 2.
The conditional posterior density of the wavelet coefficient $d_{lj}$, given $d_{lj} \neq 0$ and the hyperparameter estimates $\hat\theta$, obtained using the Laplace approximation for the IMOM prior component, can be expressed as
\[
\begin{aligned}
\pi\big(d_{lj} \mid d_{lj} \neq 0, \sigma^2, \hat\theta, r, \nu, y\big)
&= \frac{p_{lj}^{(1)}}{p_{lj}^{(1)} + p_{lj}^{(2)}}\, \frac{\tilde M_r}{M_r^*\big(\hat d_{lj}, \tau_l^{(1)}, \sigma^2\big)} \left(\frac{\tau_l^{(1)}\sigma^2}{1+\tau_l^{(1)}}\right)^{\!-r-\frac12} d_{lj}^{2r}\, \exp\!\left\{-\frac{1+\tau_l^{(1)}}{2\sigma^2\,\tau_l^{(1)}}\left(d_{lj} - \frac{\tau_l^{(1)}}{1+\tau_l^{(1)}}\, \hat d_{lj}\right)^{\!2}\right\} \\
&\quad + \frac{p_{lj}^{(2)}}{p_{lj}^{(1)} + p_{lj}^{(2)}}\, \phi\big(d_{lj};\, d_{lj}^*(\hat d_{lj}),\, \sigma_*^2\big),
\end{aligned}
\]
where
\[
p_{lj}^{(1)} = \frac{O_{lj}^{(1)}}{1 + O_{lj}^{(1)} + O_{lj}^{(2)}}, \qquad p_{lj}^{(2)} = \frac{O_{lj}^{(2)}}{1 + O_{lj}^{(1)} + O_{lj}^{(2)}},
\]
\[
O_{lj}^{(1)} = \frac{\gamma_l^{(1)}}{\big(1-\gamma_l^{(1)}\big)\big(1-\gamma_l^{(2)}\big)}\, \big(1+\tau_l^{(1)}\big)^{-r-\frac12}\, M_r^*\big(\hat d_{lj}, \tau_l^{(1)}, \sigma^2\big)\, \exp\!\left\{\frac{1}{2\sigma^2}\, \frac{\tau_l^{(1)}}{1+\tau_l^{(1)}}\, \hat d_{lj}^2\right\},
\]
and
\[
O_{lj}^{(2)} = \frac{\gamma_l^{(2)}}{1-\gamma_l^{(2)}}\, \frac{\big(\tau_l^{(2)}\sigma^2\big)^{\nu/2}}{\Gamma(\nu/2)}\, \sqrt{2\pi}\,\sigma_*\, h\big(d_{lj}^*(\hat d_{lj})\big).
\]
The proof is given in Appendix A.
Result 3.
The posterior expectation of the wavelet coefficient $d_{lj}$ is
\[
\bar d_{lj} = E(d_{lj} \mid y) = p_{lj}^{(1)}\, \frac{M_r^{**}\big(\hat d_{lj}, \tau_l^{(1)}, \sigma^2\big)}{M_r^{*}\big(\hat d_{lj}, \tau_l^{(1)}, \sigma^2\big)}\, \sqrt{\frac{\tau_l^{(1)}}{1+\tau_l^{(1)}}}\; \sigma \;+\; p_{lj}^{(2)}\, d_{lj}^*(\hat d_{lj}),
\]
where
\[
M_r^{**}\big(\hat d_{lj}, \tau_l^{(1)}, \sigma^2\big) = \frac{1}{(2r-1)!!} \sum_{i=1}^{r+1} \frac{(2r+1)!}{(2i-1)!\,(r+1-i)!\,2^{r+1-i}} \left(\sqrt{\frac{\tau_l^{(1)}}{1+\tau_l^{(1)}}}\; \frac{\hat d_{lj}}{\sigma}\right)^{\!2i-1}.
\]
The proof of Result 3 is immediate from the proof of Result 2 and hence is omitted. Let $\bar d = \{\bar d_{lj} : l = 1, \ldots, L,\ j = 1, \ldots, J\}$ denote the vector of posterior means of the wavelet coefficients. We use $\bar d$ in the inverse discrete wavelet transform to obtain the posterior mean of the unknown function $f$, given by
\[
\bar f = E(f \mid y) = W\, E(d \mid y) = W \bar d.
\]
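The final reconstruction step then amounts to replacing each empirical coefficient by its posterior mean and applying the inverse transform; in the sketch below, post_mean() is a hypothetical stand-in for the coefficient-wise rule of Result 3, and dwt is the wavethresh object from the Section 2.2 sketch.

```r
library(wavethresh)

shrunk <- dwt
for (lev in 0:(nlevelsWT(dwt) - 1)) {
  d_hat <- accessD(dwt, level = lev)
  # post_mean() is a placeholder returning the posterior means of Result 3.
  shrunk <- putD(shrunk, level = lev, v = post_mean(d_hat, lev))
}
f_bar <- wr(shrunk)  # inverse DWT: f_bar = W d_bar
```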

4. Simulation Study

In this section, we conduct an extensive simulation analysis to comparatively evaluate the performance of the proposed method with different hyperparameter configurations. We consider three well-known test functions proposed by [30]—blocks, bumps, and doppler—that are used as standard test functions in the wavelet literature. However, we modify the coefficients used by [30] to obtain more highly dispersed signals. We define our modified test functions as
\[
f_{\mathrm{blocks}}(t) = \sum_j h_j\, K(t - t_j), \quad \text{where } K(t) = \{1 + \mathrm{sign}(t)\}/2,
\]
  • $(t_j) = (0.1, 0.13, 0.15, 0.23, 0.25, 0.40, 0.44, 0.65, 0.76, 0.78, 0.81)$,
  • $(h_j) = (4, 8, 3, 4, 8, 4.2, 2.1, 4.3, 6.1, 2.1, 4.7)$;
\[
f_{\mathrm{bumps}}(t) = \sum_j h_j\, K\big((t - t_j)/w_j\big), \quad \text{where } K(t) = (1 + |t|)^{-4},
\]
  • $(t_j) = (0.1, 0.13, 0.15, 0.23, 0.25, 0.40, 0.44, 0.65, 0.76, 0.78, 0.81)$,
  • $(h_j) = (2, 10, 1, 4, 8, 4.2, 2.1, 4.3, 1.1, 3.1, 8.2)$,
  • $(w_j) = (0.005, 0.005, 0.006, 0.01, 0.01, 0.03, 0.01, 0.01, 0.005, 0.008, 0.005)$;
\[
f_{\mathrm{doppler}}(t) = \{t(1-t)\}^{1/2}\, \sin\{2\pi(1+\epsilon)/(t+\epsilon)\}, \quad \epsilon = 0.01.
\]
As these modifications involved adjustments to specific numerical values only, the overall functional forms remained consistent with the original Donoho–Johnstone test functions. In addition, we consider three different linear combinations of these functions—lcomb1, lcomb2, and lcomb3—that represent curves that combine various features—such as blockiness, bumpiness, and changing frequency—in different proportions, given by
\[
f_{\mathrm{lcomb1}}(t) = 0.4\, f_{\mathrm{blocks}}(t) + 0.4\, f_{\mathrm{bumps}}(t) + 0.2\, f_{\mathrm{doppler}}(t),
\]
\[
f_{\mathrm{lcomb2}}(t) = 0.4\, f_{\mathrm{blocks}}(t) + 0.2\, f_{\mathrm{bumps}}(t) + 0.4\, f_{\mathrm{doppler}}(t),
\]
\[
f_{\mathrm{lcomb3}}(t) = 0.2\, f_{\mathrm{blocks}}(t) + 0.4\, f_{\mathrm{bumps}}(t) + 0.4\, f_{\mathrm{doppler}}(t).
\]
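Transcribed into R, the modified test functions read as follows (values exactly as listed above).

```r
tj <- c(0.1, 0.13, 0.15, 0.23, 0.25, 0.40, 0.44, 0.65, 0.76, 0.78, 0.81)
h_blocks <- c(4, 8, 3, 4, 8, 4.2, 2.1, 4.3, 6.1, 2.1, 4.7)
h_bumps  <- c(2, 10, 1, 4, 8, 4.2, 2.1, 4.3, 1.1, 3.1, 8.2)
w_bumps  <- c(0.005, 0.005, 0.006, 0.01, 0.01, 0.03, 0.01, 0.01, 0.005, 0.008, 0.005)

f_blocks <- function(t) rowSums(sapply(seq_along(tj), function(j)
  h_blocks[j] * (1 + sign(t - tj[j])) / 2))
f_bumps <- function(t) rowSums(sapply(seq_along(tj), function(j)
  h_bumps[j] * (1 + abs((t - tj[j]) / w_bumps[j]))^(-4)))
f_doppler <- function(t) sqrt(t * (1 - t)) * sin(2 * pi * 1.01 / (t + 0.01))
f_lcomb1 <- function(t) 0.4 * f_blocks(t) + 0.4 * f_bumps(t) + 0.2 * f_doppler(t)

t <- (1:1024) / 1024
plot(t, f_lcomb1(t), type = "l")
```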
Figure 3 shows the plots of the test functions blocks, bumps, and doppler using the Donoho–Johnstone specifications (dashed line) and our specifications (solid line), and lcomb1, lcomb2, and lcomb3 based on our specifications, all evaluated at 1024 equally spaced points in (0,1). We evaluate each considered test function at n = 512, 1024, 2048, and 4096 equidistant points in the interval (0,1) and add random Gaussian noise with mean 0 to generate data with signal-to-noise ratios SNR = 3, 5, and 7. For each combination of n and SNR, we consider 100 replications. The simulated datasets are analyzed using the proposed methodology with the eight hyperparameter configurations described in Section 2.4. For comparison, we also analyzed the datasets using the individual MOM and IMOM prior-based two-component mixture models [8]. Thus, a total of 24 analysis methods were applied to each dataset. Note that the prior literature [8] has already shown that nonlocal prior-based wavelet analysis generally performs better than other existing wavelet-based methods such as sure [28], BayesThresh [3], cv [31], fdr [32], and Ebayesthresh [6]. So, for brevity, we do not compare with these methods in this work. For wavelet transformation, we consider the Daubechies least asymmetric wavelet with six vanishing moments and periodic boundary conditions. Wavelet computations are implemented using the R package wavethresh [33].
Table 1 presents, for each test function and analysis method, the number of (n, SNR) combinations where the method achieved the lowest mean squared error (MSE). For each test function, the method with the highest frequency of best performance (i.e., the lowest MSE) is highlighted in bold. We observe the following:
(a)
For every function, the highest frequency of best performance was shown by a three-component mixture method (with one tie with a two-component mixture method for the blocks function), which is proposed in the current work.
(b)
Out of all eight three-component mixture methods, the one with the generalized normal specification for the mixture probabilities and the double exponential decay specification for the scale parameters (mixture-gennormal-doubleexp) showed the maximum number of best performances in total (12 times) across all the test functions. This was followed by the method using the logit specification for the mixture probabilities and the polynomial decay specification for the scale parameters (mixture-logit-polynom) (11 times) and the method using the generalized normal specification for the mixture probabilities and the polynomial decay specification for the scale parameters (mixture-gennormal-polynom) (9 times).
(c)
Considering only the test functions lcomb1, lcomb2, and lcomb3 that represent signals with mixed characteristics in various proportions, the mixture-gennormal-doubleexp method showed the maximum number of best performances (10 times).
For real data, the SNR and the original function are not known. So, next, in Figure 4, we show, for each of the eight three-component mixture methods, the MSE for the four different sample sizes (n), averaged over the three SNRs and six test functions considered in our study. Overall, methods with a double exponential decay specification for the scale parameters showed better performance. Specifically, for the two largest sample sizes (2048 and 4096), the method with hyperbolic secant and double exponential decay specifications (mixture-hypsec-doubleexp) and the method with generalized logit and double exponential decay specifications (mixture-genlogit-doubleexp) provided the top performances. Notably, the method with generalized normal and polynomial decay specifications showed worse performance in Figure 4, even though the methods with generalized normal specifications showed the best performance in Table 1. This implies that the methods with generalized normal specifications, although they most often provide the best denoising, may sometimes produce a large bias that negatively affects their overall average performance. So, results obtained with them should be validated by field knowledge or external means (such as auditory assessment for sound signals).

5. EEG Meditation Study

We demonstrate the utility and flexibility of the proposed methodology through the analysis of EEG data from a meditation study, investigating the relation between mind wandering and meditation practice. Detailed information about the study is provided in [34], and the dataset is available on the OpenNeuro platform (see the Supplementary Materials Section S1 for the link). The meditation experiment involved 24 subjects—12 experienced meditators (10 males, 2 females) and 12 novices (2 males, 10 females). Participants meditated while being interrupted approximately every two minutes to report their level of concentration and mind wandering via three probing questions. Each participant completed two to three sessions lasting 45 to 90 min, with a minimum of 30 probes per participant. EEG data were collected using a 64-channel Biosemi system (channels A1–A32 and B1–B32) with a Biosemi 10–20 head cap montage at a sampling rate of 2048 Hz, providing spatial information about brain activity across different scalp regions. The dataset available on OpenNeuro has already been downsampled to 256 Hz.
For our analysis, we focused on the data from session 1 of four participants: two expert meditators (one male and one female, identified as subjects 10 and 5 in the original dataset, respectively) and two novices (one male and one female, identified as subjects 23 and 14, respectively). The data were average-referenced to mitigate common noise and artifacts by calculating the mean signal across all electrodes at each time point and subtracting it from the signal at each electrode. Following this, a 2 Hz high-pass filter was applied using an infinite impulse response (IIR) filter with a transition bandwidth of 0.7 Hz and an order of six. Denoting the time of probe 1 by $t$, we analyzed the EEG signal within the 16 s interval $(t-8, t+8)$. With a sampling rate of 256 Hz, this interval comprised 256 × 16 = 4096 data points, representing noisy observations of the underlying signal. We analyzed the EEG data using the mixture-hypsec-doubleexp, mixture-genlogit-doubleexp, and mixture-gennormal-polynom methods (see Section 4), employing the Daubechies least asymmetric wavelet transform with six vanishing moments and periodic boundary conditions.
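A hedged sketch of this preprocessing pipeline is given below; eeg is assumed to be a channels-by-time numeric matrix sampled at 256 Hz, and a Butterworth high-pass filter from the signal package stands in for the IIR filter described above.

```r
library(signal)

fs <- 256
# Average referencing: subtract the mean across channels at each time point.
# 'eeg' is an assumed channels-by-time matrix.
eeg_ref <- sweep(eeg, 2, colMeans(eeg))

# Order-6 high-pass at 2 Hz (zero-phase via forward-backward filtering).
hp <- butter(6, 2 / (fs / 2), type = "high")
eeg_filt <- t(apply(eeg_ref, 1, function(ch) filtfilt(hp, ch)))
```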
Figure 5 presents the plots of the posterior means of the EEG signal (in slate gray) based on the mixture-genlogit-doubleexp (left column), mixture-hypsec-doubleexp (middle column), and mixture-gennormal-doubleexp (right column) methods, superimposed on the observed data (in black), obtained during the 16 s interval from the A10 channel of the four considered participants. Similar plots for some other channels are presented in the Supplementary Materials Section S3. The plots clearly indicate that our method, with all the considered hyperparameter configurations, yielded significantly denoised estimates.

6. Musical Sound Study

To further demonstrate the utility of the proposed method, we consider analyzing musical sounds, which often exhibit sudden frequency changes over short periods, resulting in highly dispersed signals. For this study, we focus on a vocal music recording from 1934, originally published on a 78 RPM record of Hindustani classical music (one of India's two classical music traditions) and performed by the eminent Ustad Amir Khan. A noisy copy of this recording was sourced from YouTube (link provided in the Supplementary Materials Section S1). From the recording, we extracted a 15 s segment representing a wide frequency range and saved it as a 16-bit WAV audio file, which represents the noisy data.
For our analysis, we divided each of the two audio file channels (left and right) into sections of 4096 data points to enhance computational efficiency. Each section was independently analyzed using the mixture-genlogit-doubleexp and mixture-hypsec-doubleexp methods (see Section 4) employing the Daubechies coiflets wavelet transform with five vanishing moments and periodic boundary conditions. The posterior estimates of the individual sections were then combined to reconstruct the posterior estimate of the entire audio segment.
In Figure 6, we present the posterior mean of the right-channel signal of the selected audio segment (in slate gray), superimposed on the noisy data (in black). The audio files corresponding to these posterior estimates, along with the original data and posterior estimates using the hard thresholding rule, are provided in the Supplementary Materials for auditory comparison and assessment. The posterior estimates show significant denoising and precise recovery of the signal. There is a slight systematic noise present in the posterior audio files, which is due to analyzing the whole segment in disjoint sections for computational convenience. That can be mitigated by analyzing larger sections or analyzing partially overlapping sections with weighted averaging (e.g., cross-fading or Hann windowing) to smoothly combine overlapping regions and reduce boundary artifacts.
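A sketch of that overlapping-section alternative follows: denoise 50%-overlapping windows and recombine them with a Hann taper so that section boundaries cross-fade; denoise() is a placeholder for one run of the proposed method on a 4096-sample section.

```r
hann <- function(n) 0.5 - 0.5 * cos(2 * pi * (0:(n - 1)) / (n - 1))

overlap_denoise <- function(x, denoise, n = 4096) {
  hop <- n / 2
  out <- numeric(length(x))
  wsum <- numeric(length(x))
  w <- hann(n)
  for (s in seq(1, length(x) - n + 1, by = hop)) {
    idx <- s:(s + n - 1)
    out[idx]  <- out[idx]  + w * denoise(x[idx])   # weighted, denoised section
    wsum[idx] <- wsum[idx] + w                     # accumulated window weight
  }
  out / pmax(wsum, 1e-12)                          # normalize overlapped regions
}
```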

7. Discussion

In this work, we proposed the use of nonlocal prior mixtures for wavelet-based nonparametric function estimation. The main innovations of our methodology are as follows:
(a)
We introduce a three-component spike-and-slab prior for the wavelet coefficients. This structure is particularly suited for modeling highly dispersed signals. The slab component is a mixture of two nonlocal priors—the MOM and IMOM priors, which offer enhanced adaptability to signal characteristics.
(b)
We propose flexible and previously unexplored hyperparameter specifications. These include generalized logit (or Richards), hyperbolic secant, and generalized normal decay specifications for the mixture probabilities, as well as a double exponential decay structure for the scale parameter. These enhancements provide improved flexibility and accuracy in modeling complex signal patterns, as demonstrated in our simulation study.
(c)
We implement our methodology within the R programming language [35] as a package named NLPwavelet [36], which performs nonlocal prior (NLP)-based wavelet analysis.
In the simulation study, using more dispersed versions of the Donoho–Johnstone test functions and various linear combinations of them, we compared the performance of the proposed approach with the existing two-component spike-and-slab mixture prior and demonstrated the superior flexibility of the proposed approach. Further, analysis using several novel hyperparameter configurations provided valuable insights into the relative advantages. Although no formal optimality results are available for these specific forms, their flexibility is supported by established theoretical principles for scale-dependent shrinkage, and their empirical performance, demonstrated in the simulation study of Section 4, provides strong evidence of their practical relevance.
Note that, in our empirical Bayes implementation, we did not encounter convergence failures. However, in smaller samples or high-noise scenarios, certain decay specifications—particularly those involving generalized normal parameters—showed greater variability in estimated hyperparameters. To reduce potential overfitting and identifiability issues, we recommend sensitivity checks across multiple decay specifications and inspection of hyperparameter estimates for plausibility. Such practices can help ensure the robustness of conclusions in applied settings.
A necessary limitation of the proposed approach is its higher computational cost relative to two-component mixture priors. Supplementary Materials Section S2 reports the average runtime (in seconds) for 24 analysis methods across the sample sizes n considered in our simulation study. Within each method, the specification of the mixture probability has only a modest effect on runtime, while for the scaling parameter, doubleexp options are generally slightly slower than polynom options. The runtime growth of the mixture methods suggests approximately linear scaling with sample size. For instance, for mixture-logit-polynom, the runtime for n = 1024 is about 1.96 times that for n = 512; for n = 2048, it is about 1.96 times that for n = 1024; and for n = 4096, about 1.97 times that for n = 2048. This near doubling of runtime with each doubling of n is consistent across the other mixture methods as well, indicating a computational complexity of roughly O(n), which is generally considered efficient for large-scale problems. Nonetheless, the larger constant factor of the proposed three-component methods, relative to two-component methods, results in longer absolute runtimes, which is an anticipated trade-off for their enhanced modeling flexibility. While our simulations focus on one-dimensional signals, the linear scaling behavior suggests the approach can extend to higher-dimensional settings (e.g., 2D/3D images), subject to the corresponding increase in the number of coefficients.
Further, while the EEG and audio denoising examples illustrate the practical use of the proposed method, their evaluation is primarily qualitative due to the lack of ground truth reference signals. For EEG data, objective measures such as SNR improvement or reconstruction error cannot be computed reliably without a known clean signal. Similarly, in the audio example, perceptual quality metrics such as PESQ [37] or STOI [38] require access to a clean reference, which was not available for the real-world recording used here. Future work, incorporating controlled experiments in which synthetic noise is added to high-quality EEG or audio recordings, would allow the computation of such quantitative metrics and hence direct, reproducible comparisons with existing denoising techniques, complementing the qualitative assessments presented in this study.
An obvious extension of the proposed approach is to adapt it to multidimensional wavelet regression, such as 2D or 3D image processing tasks. Another possible avenue is to develop fully Bayesian hierarchical models by specifying prior distributions for the hyperparameters and employing Markov chain Monte Carlo tools or variational methods for posterior inference. In addition, one can explore combining wavelet decompositions with local polynomial regression to mitigate the boundary bias issues. Further, adapting our methodology to non-Gaussian or skewed data can be an interesting enterprise. We leave all these to future research.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math13162642/s1, Figure S1: Plots of the posterior means of the EEG signal (in slate gray) based on the imom-logit-polynom (left column), mixture-genlogit-doubleexp (middle column), and mixture-gennorm-polynom (right column) methods, superimposed on the observed data (in black), obtained during the 16 s interval $(t-8, t+8)$, $t$ being the probe 1 onset time, from the A20 channel of the 4 considered participants; Figure S2: Plots of the posterior means of the EEG signal (in slate gray) based on the imom-logit-polynom (left column), mixture-genlogit-doubleexp (middle column), and mixture-gennorm-polynom (right column) methods, superimposed on the observed data (in black), obtained during the 16 s interval $(t-8, t+8)$, $t$ being the probe 1 onset time, from the B10 channel of the 4 considered participants; Figure S3: Plots of the posterior means of the EEG signal (in slate gray) based on the imom-logit-polynom (left column), mixture-genlogit-doubleexp (middle column), and mixture-gennorm-polynom (right column) methods, superimposed on the observed data (in black), obtained during the 16 s interval $(t-8, t+8)$, $t$ being the probe 1 onset time, from the B20 channel of the 4 considered participants; Table S1: Average runtime (seconds) for 24 analysis methods applied to simulated datasets from 12 combinations of sample size (n) and SNR, with 100 replications each. Methods include eight MOM-based, eight IMOM-based, and eight proposed three-component mixture priors.

Funding

This research received no external funding.

Acknowledgments

The author thanks the JAKAR High-Performance Cluster at the University of Texas at El Paso for providing computational resources free of charge.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MOM prior: Moment prior
IMOM prior: Inverse moment prior
NLP: Nonlocal prior
SNR: Signal-to-noise ratio
MSE: Mean squared error
EEG: Electroencephalogram

Appendix A

Appendix A.1. Proof of Result 1

From the wavelet coefficient model (2) and the mixture prior of the wavelet coefficients in (3), using the Laplace approximation for the IMOM prior component, we get the marginal distribution of the empirical wavelet coefficients, $\hat d_{lj}$, as
\[
\begin{aligned}
\pi\big(\hat d_{lj} \mid \sigma^2, \theta, r, \nu\big)
&= \int \pi\big(\hat d_{lj} \mid d_{lj}, \sigma^2, \theta, r, \nu\big)\, \pi\big(d_{lj} \mid \sigma^2, \theta, r, \nu\big)\, \mathrm{d}d_{lj} \\
&= \int (2\pi\sigma^2)^{-\frac12} \exp\!\left\{-\frac{(\hat d_{lj}-d_{lj})^2}{2\sigma^2}\right\} \Bigg[ \gamma_l^{(1)}\, \tilde M_r\, \big(\tau_l^{(1)}\sigma^2\big)^{-r-\frac12}\, d_{lj}^{2r}\, \exp\!\left\{-\frac{d_{lj}^2}{2\tau_l^{(1)}\sigma^2}\right\} \\
&\hspace{4em} + \big(1-\gamma_l^{(1)}\big)\gamma_l^{(2)}\, \frac{\big(\tau_l^{(2)}\sigma^2\big)^{\nu/2}}{\Gamma(\nu/2)}\, |d_{lj}|^{-(\nu+1)}\, \exp\!\left\{-\frac{\tau_l^{(2)}\sigma^2}{d_{lj}^2}\right\} + \big(1-\gamma_l^{(1)}\big)\big(1-\gamma_l^{(2)}\big)\, \delta_0(d_{lj}) \Bigg] \mathrm{d}d_{lj} \\
&= (2\pi\sigma^2)^{-\frac12} \Bigg[ \gamma_l^{(1)}\, \tilde M_r\, \big(\tau_l^{(1)}\sigma^2\big)^{-r-\frac12} \int d_{lj}^{2r} \exp\!\left\{-\frac{1}{2\sigma^2}\left((\hat d_{lj}-d_{lj})^2 + \frac{d_{lj}^2}{\tau_l^{(1)}}\right)\right\} \mathrm{d}d_{lj} \\
&\hspace{4em} + \big(1-\gamma_l^{(1)}\big)\gamma_l^{(2)}\, \frac{\big(\tau_l^{(2)}\sigma^2\big)^{\nu/2}}{\Gamma(\nu/2)} \int |d_{lj}|^{-(\nu+1)} \exp\!\left\{-\frac{(\hat d_{lj}-d_{lj})^2}{2\sigma^2} - \frac{\tau_l^{(2)}\sigma^2}{d_{lj}^2}\right\} \mathrm{d}d_{lj} + \big(1-\gamma_l^{(1)}\big)\big(1-\gamma_l^{(2)}\big)\, \exp\!\left\{-\frac{\hat d_{lj}^2}{2\sigma^2}\right\} \Bigg] \\
&= (2\pi\sigma^2)^{-\frac12} \Bigg[ \gamma_l^{(1)}\, \tilde M_r\, \big(\tau_l^{(1)}\sigma^2\big)^{-r-\frac12}\, \exp\!\left\{-\frac{\hat d_{lj}^2}{2\sigma^2\big(1+\tau_l^{(1)}\big)}\right\} \left(\frac{2\pi\sigma^2\tau_l^{(1)}}{1+\tau_l^{(1)}}\right)^{\!\frac12} \left(\frac{\sigma^2\tau_l^{(1)}}{1+\tau_l^{(1)}}\right)^{\!r} \sum_{i=0}^{r} \frac{(2r)!}{(2i)!\,(r-i)!\,2^{r-i}} \left(\frac{\frac{\tau_l^{(1)}}{1+\tau_l^{(1)}}\,\hat d_{lj}}{\sqrt{\frac{\tau_l^{(1)}}{1+\tau_l^{(1)}}}\,\sigma}\right)^{\!2i} \\
&\hspace{4em} + \big(1-\gamma_l^{(1)}\big)\gamma_l^{(2)}\, \frac{\big(\tau_l^{(2)}\sigma^2\big)^{\nu/2}}{\Gamma(\nu/2)}\, \exp\!\left\{-\frac{\hat d_{lj}^2}{2\sigma^2}\right\} \sqrt{2\pi}\,\sigma_*\, h\big(d_{lj}^*(\hat d_{lj})\big) + \big(1-\gamma_l^{(1)}\big)\big(1-\gamma_l^{(2)}\big)\, \exp\!\left\{-\frac{\hat d_{lj}^2}{2\sigma^2}\right\} \Bigg] \quad (\text{using the Laplace approximation}) \\
&= \gamma_l^{(1)}\, \big(1+\tau_l^{(1)}\big)^{-r}\, M_r^*\big(\hat d_{lj}, \tau_l^{(1)}, \sigma^2\big)\, \phi\big(\hat d_{lj};\, 0,\, \sigma^2(1+\tau_l^{(1)})\big) + \big(1-\gamma_l^{(1)}\big)\gamma_l^{(2)}\, \frac{\big(\tau_l^{(2)}\sigma^2\big)^{\nu/2}}{\Gamma(\nu/2)}\, \phi\big(\hat d_{lj};\, 0,\, \sigma^2\big)\, \sqrt{2\pi}\,\sigma_*\, h\big(d_{lj}^*(\hat d_{lj})\big) + \big(1-\gamma_l^{(1)}\big)\big(1-\gamma_l^{(2)}\big)\, \phi\big(\hat d_{lj};\, 0,\, \sigma^2\big),
\end{aligned}
\]
where
\[
M_r^*\big(\hat d_{lj}, \tau_l^{(1)}, \sigma^2\big) = \frac{1}{(2r-1)!!} \sum_{i=0}^{r} \frac{(2r)!}{(2i)!\,(r-i)!\,2^{r-i}} \left(\frac{\tau_l^{(1)}}{1+\tau_l^{(1)}}\, \frac{\hat d_{lj}^2}{\sigma^2}\right)^{\!i},
\]
\[
h(d_{lj}) = |d_{lj}|^{-(\nu+1)}\, \exp\!\left\{-\frac{1}{2\sigma^2}\big(d_{lj}^2 - 2\, d_{lj}\hat d_{lj}\big) - \frac{\tau_l^{(2)}\sigma^2}{d_{lj}^2}\right\},
\]
$d_{lj}^*(\hat d_{lj})$ is the global maximum of $h(d_{lj})$, and $\sigma_*^2 = -1/L_h''\big(d_{lj}^*(\hat d_{lj})\big)$, with $L_h(d_{lj}) = \log\big(h(d_{lj})\big)$.

Appendix A.2. Proof of Result 2

\[
\pi\big(d_{lj} \mid \sigma^2, \theta, r, \nu, y\big) \;\propto\; \pi\big(y \mid d_{lj}, \sigma^2, \theta, r, \nu\big)\, \pi\big(d_{lj} \mid \sigma^2, \theta, r, \nu\big).
\]
The proportionality constant is
\[
\begin{aligned}
C &= \int \pi\big(y \mid d_{lj}, \sigma^2, \theta, r, \nu\big)\, \pi\big(d_{lj} \mid \sigma^2, \theta, r, \nu\big)\, \mathrm{d}d_{lj} \\
&= (2\pi\sigma^2)^{-\frac12} \Bigg[ \gamma_l^{(1)}\, \big(1+\tau_l^{(1)}\big)^{-r-\frac12}\, M_r^*\big(\hat d_{lj}, \tau_l^{(1)}, \sigma^2\big)\, \exp\!\left\{-\frac{\hat d_{lj}^2}{2\sigma^2\big(1+\tau_l^{(1)}\big)}\right\} \\
&\qquad + \big(1-\gamma_l^{(1)}\big)\gamma_l^{(2)}\, \frac{\big(\tau_l^{(2)}\sigma^2\big)^{\nu/2}}{\Gamma(\nu/2)}\, \exp\!\left\{-\frac{\hat d_{lj}^2}{2\sigma^2}\right\} \sqrt{2\pi}\,\sigma_*\, h\big(d_{lj}^*(\hat d_{lj})\big) + \big(1-\gamma_l^{(1)}\big)\big(1-\gamma_l^{(2)}\big)\, \exp\!\left\{-\frac{\hat d_{lj}^2}{2\sigma^2}\right\} \Bigg] \\
&= (2\pi\sigma^2)^{-\frac12}\, \big(1-\gamma_l^{(1)}\big)\big(1-\gamma_l^{(2)}\big)\, \exp\!\left\{-\frac{\hat d_{lj}^2}{2\sigma^2}\right\} \Big\{O_{lj}^{(1)} + O_{lj}^{(2)} + 1\Big\}.
\end{aligned}
\]
Thus,
\[
\begin{aligned}
\pi\big(d_{lj} \mid \sigma^2, \theta, r, \nu, y\big)
&= \frac{1}{C}\, \pi\big(y \mid d_{lj}, \sigma^2, \theta, r, \nu\big)\, \pi\big(d_{lj} \mid \sigma^2, \theta, r, \nu\big) \\
&= \frac{1}{\big(1-\gamma_l^{(1)}\big)\big(1-\gamma_l^{(2)}\big)\, \exp\!\big\{-\hat d_{lj}^2/(2\sigma^2)\big\}\, \big\{O_{lj}^{(1)} + O_{lj}^{(2)} + 1\big\}} \\
&\quad \times \Bigg[ \gamma_l^{(1)}\, \tilde M_r\, \big(\tau_l^{(1)}\sigma^2\big)^{-r-\frac12}\, \exp\!\left\{-\frac{\hat d_{lj}^2}{2\sigma^2\big(1+\tau_l^{(1)}\big)}\right\} d_{lj}^{2r}\, \exp\!\left\{-\frac{1+\tau_l^{(1)}}{2\sigma^2\,\tau_l^{(1)}}\left(d_{lj} - \frac{\tau_l^{(1)}}{1+\tau_l^{(1)}}\, \hat d_{lj}\right)^{\!2}\right\} \\
&\qquad + \big(1-\gamma_l^{(1)}\big)\gamma_l^{(2)}\, \frac{\big(\tau_l^{(2)}\sigma^2\big)^{\nu/2}}{\Gamma(\nu/2)}\, \exp\!\left\{-\frac{\hat d_{lj}^2}{2\sigma^2}\right\} \sqrt{2\pi}\,\sigma_*\, h\big(d_{lj}^*(\hat d_{lj})\big)\, \phi\big(d_{lj};\, d_{lj}^*(\hat d_{lj}),\, \sigma_*^2\big) \\
&\qquad + \big(1-\gamma_l^{(1)}\big)\big(1-\gamma_l^{(2)}\big)\, \exp\!\left\{-\frac{\hat d_{lj}^2}{2\sigma^2}\right\} \exp\!\left\{\frac{\hat d_{lj}\, d_{lj}}{\sigma^2} - \frac{d_{lj}^2}{2\sigma^2}\right\} \delta_0(d_{lj}) \Bigg] \\
&= p_{lj}^{(1)}\, \frac{\tilde M_r}{M_r^*\big(\hat d_{lj}, \tau_l^{(1)}, \sigma^2\big)} \left(\frac{\tau_l^{(1)}\sigma^2}{1+\tau_l^{(1)}}\right)^{\!-r-\frac12} d_{lj}^{2r}\, \exp\!\left\{-\frac{1+\tau_l^{(1)}}{2\sigma^2\,\tau_l^{(1)}}\left(d_{lj} - \frac{\tau_l^{(1)}}{1+\tau_l^{(1)}}\, \hat d_{lj}\right)^{\!2}\right\} \\
&\quad + p_{lj}^{(2)}\, \phi\big(d_{lj};\, d_{lj}^*(\hat d_{lj}),\, \sigma_*^2\big) + \big(1 - p_{lj}^{(1)} - p_{lj}^{(2)}\big)\, \exp\!\left\{\frac{\hat d_{lj}\, d_{lj}}{\sigma^2} - \frac{d_{lj}^2}{2\sigma^2}\right\} \delta_0(d_{lj}),
\end{aligned}
\]
from which Result 2 follows.

References

1. Chipman, H.A.; Kolaczyk, E.D.; McCulloch, R.E. Adaptive Bayesian Wavelet Shrinkage. J. Am. Stat. Assoc. 1997, 92, 1413–1421.
2. Clyde, M.; Parmigiani, G.; Vidakovic, B. Multiple shrinkage and subset selection in wavelets. Biometrika 1998, 85, 391–401.
3. Abramovich, F.; Sapatinas, T.; Silverman, B. Wavelet thresholding via a Bayesian approach. J. R. Stat. Soc. Ser. B 1998, 60, 725–749.
4. Sanyal, N.; Ferreira, M.A. Bayesian hierarchical multi-subject multiscale analysis of functional MRI data. NeuroImage 2012, 63, 1519–1531.
5. Clyde, M.; George, E.I. Flexible empirical Bayes estimation for wavelets. J. R. Stat. Soc. Ser. B 2000, 62, 681–698.
6. Johnstone, I.M.; Silverman, B.W. Empirical Bayes Selection of Wavelet Thresholds. Ann. Stat. 2005, 33, 1700–1752.
7. dos Santos Sousa, A.R. A Bayesian wavelet shrinkage rule under LINEX loss function. Res. Stat. 2024, 2, 2362926.
8. Sanyal, N.; Ferreira, M.A. Bayesian wavelet analysis using nonlocal priors with an application to FMRI analysis. Sankhya B 2017, 79, 361–388.
9. Johnson, V.; Rossell, D. On the use of non-local prior densities in Bayesian hypothesis tests. J. R. Stat. Soc. Ser. B 2010, 72, 143–170.
10. Johnson, V.; Rossell, D. Bayesian model selection in high-dimensional settings. J. Am. Stat. Assoc. 2012, 107, 649–660.
11. Rossell, D.; Telesca, D. Nonlocal priors for high-dimensional estimation. J. Am. Stat. Assoc. 2017, 112, 254–265.
12. Sanyal, N.; Lo, M.T.; Kauppi, K.; Djurovic, S.; Andreassen, O.A.; Johnson, V.E.; Chen, C.H. GWASinlps: Non-local prior based iterative SNP selection tool for genome-wide association studies. Bioinformatics 2019, 35, 1–11.
13. Mallat, S. A Wavelet Tour of Signal Processing, Third Edition: The Sparse Way, 3rd ed.; Academic Press, Inc.: Cambridge, MA, USA, 2008.
14. Vidakovic, B. Nonlinear Wavelet Shrinkage with Bayes Rules and Bayes Factors. J. Am. Stat. Assoc. 1998, 93, 173–179.
15. Vidakovic, B.; Ruggeri, F. BAMS Method: Theory and Simulations. Sankhyā Indian J. Stat. Ser. B 2001, 63, 234–249.
16. Portilla, J.; Strela, V.; Wainwright, M.; Simoncelli, E. Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans. Image Process. 2003, 12, 1338–1351.
17. Cutillo, L.; Jung, Y.Y.; Ruggeri, F.; Vidakovic, B. Larger posterior mode wavelet thresholding and applications. J. Stat. Plan. Inference 2008, 138, 3758–3773.
18. Crouse, M.; Nowak, R.; Baraniuk, R. Wavelet-based statistical signal processing using hidden Markov models. IEEE Trans. Signal Process. 1998, 46, 886–902.
19. Chang, S.; Yu, B.; Vetterli, M. Adaptive wavelet thresholding for image denoising and compression. IEEE Trans. Image Process. 2000, 9, 1532–1546.
20. Figueiredo, M.; Nowak, R. Wavelet-based image estimation: An empirical Bayes approach using Jeffrey's noninformative prior. IEEE Trans. Image Process. 2001, 10, 1322–1331.
21. Boubchir, L.; Boashash, B. Wavelet denoising based on the MAP estimation using the BKF prior with application to images and EEG signals. IEEE Trans. Signal Process. 2013, 61, 1880–1894.
22. Reményi, N.; Vidakovic, B. Wavelet shrinkage with double Weibull prior. Commun. Stat. Simul. Comput. 2015, 44, 88–104.
23. Afshari, M.; Lak, F.; Gholizadeh, B. A new Bayesian wavelet thresholding estimator of nonparametric regression. J. Appl. Stat. 2017, 44, 649–666.
24. Sousa, A.R.d.S.; Garcia, N.L.; Vidakovic, B. Bayesian wavelet shrinkage with beta priors. Comput. Stat. 2021, 36, 1341–1363.
25. dos Santos Sousa, A.R. Bayesian wavelet shrinkage with logistic prior. Commun. Stat. Simul. Comput. 2022, 51, 4700–4714.
26. Vidakovic, B. Statistical Modeling by Wavelets; Wiley Series in Probability and Statistics; Wiley: Hoboken, NJ, USA, 1999.
27. Antoniadis, A.; Bigot, J.; Sapatinas, T. Wavelet Estimators in Nonparametric Regression: A Comparative Simulation Study. J. Stat. Softw. 2001, 6, 1–83.
28. Donoho, D.L.; Johnstone, I.M. Adapting to Unknown Smoothness via Wavelet Shrinkage. J. Am. Stat. Assoc. 1995, 90, 1200–1224.
29. Donoho, D.L.; Johnstone, I.M.; Kerkyacharian, G.; Picard, D. Wavelet Shrinkage: Asymptopia? J. R. Stat. Soc. Ser. B 1995, 57, 301–369.
30. Donoho, D.L.; Johnstone, I.M. Ideal spatial adaptation by wavelet shrinkage. Biometrika 1994, 81, 425–455.
31. Nason, G.P. Wavelet Shrinkage Using Cross-Validation. J. R. Stat. Soc. Ser. B 1996, 58, 463–479.
32. Abramovich, F.; Benjamini, Y. Adaptive thresholding of wavelet coefficients. Comput. Stat. Data Anal. 1996, 22, 351–361.
33. Nason, G. wavethresh: Wavelets Statistics and Transforms, version 4.7.3; R Foundation for Statistical Computing: Vienna, Austria, 2024.
34. Brandmeyer, T.; Delorme, A. Reduced mind wandering in experienced meditators and associated EEG correlates. Exp. Brain Res. 2018, 236, 2519–2528.
35. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2024.
36. Sanyal, N. NLPwavelet: Bayesian Wavelet Analysis Using Non-Local Priors, version 1.1; R Foundation for Statistical Computing: Vienna, Austria, 2025.
37. International Telecommunication Union. Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-End Speech Quality Assessment of Narrow-Band Telephone Networks and Speech Codecs; Rec. P.862; International Telecommunication Union: Geneva, Switzerland, 2001.
38. Taal, C.H.; Hendriks, R.C.; Heusdens, R.; Jensen, J. A short-time objective intelligibility measure for time-frequency weighted noisy speech. In Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA, 14–19 March 2010; pp. 4214–4217.
Figure 1. Plots of our proposed three-component spike-and-slab mixture model (solid line) with $\gamma_l^{(1)} = \gamma_l^{(2)} = 0.25$, $\tau_l^{(1)} = \tau_l^{(2)} = 0.2$, $r = \nu = 1$, and $\sigma = 1$ along with two-component spike-and-slab mixture models based on MOM (dashed line) and IMOM (dotted line) priors with $\gamma_l = 0.5$, $\tau_l = 1$, and $\sigma = 1$.
Figure 2. Plots of the different specifications considered for the mixture probabilities $\gamma_l^{(1)}$ and $\gamma_l^{(2)}$ (logit, generalized logit, hyperbolic secant, and generalized normal) and scale parameters $\tau_l^{(1)}$ and $\tau_l^{(2)}$ (polynomial decay and double exponential decay) against resolution level, with specified values of the hyperparameters.
Figure 3. Plots of the test functions blocks, bumps, and doppler using the Donoho–Johnstone (DJ) specifications (dashed line) and our specifications (solid line), and three linear combinations of them—lcomb1, lcomb2, and lcomb3—based on our specifications, all evaluated at 1024 equally spaced points in (0,1).
Figure 4. For the proposed three-component mixture-based methods, MSE for different sample sizes (n), averaged over 3 SNRs and 6 test functions.
Figure 5. Plots of the posterior means of the EEG signal (in slate gray) based on the imom-logit-polynom (left), mixture-genlogit-doubleexp (middle), and mixture-gennorm-polynom (right) methods, superimposed on the observed data (in black), obtained during the 16 s interval $(t-8, t+8)$, $t$ being the probe 1 onset time, from the A10 channel of the 4 considered participants.
Figure 6. Plots of the posterior means (in slate gray) of the right-channel signal of the chosen audio segment of the vocal music recording, superimposed on the noisy data (in red).
Table 1. Method comparison: Data were simulated for each test function across 12 combinations of sample size (n) and SNR, with 100 replications per combination. A total of 24 analysis methods were applied to each dataset—8 methods with MOM-based two-component mixture prior, 8 methods with IMOM-based two-component mixture prior, and 8 methods with the proposed three-component mixture prior. The average MSE was computed for each method across the replications. The table presents, for each test function and analysis method, the number of (n, SNR) combinations where the method achieved the lowest MSE. For each test function, the method with the highest frequency of best performance is highlighted in bold.
Method                       | blocks | bumps | doppler | lcomb1 | lcomb2 | lcomb3 | Total
mom-logit-polynom            |   1    |   0   |    0    |   0    |   0    |   0    |   1
mom-logit-doubleexp          |   1    |   0   |    0    |   0    |   0    |   0    |   1
mom-genlogit-polynom         |   0    |   0   |    0    |   0    |   0    |   0    |   0
mom-genlogit-doubleexp       |   0    |   0   |    0    |   0    |   0    |   0    |   0
mom-hypsec-polynom           |   0    |   0   |    0    |   0    |   1    |   0    |   1
mom-hypsec-doubleexp         |   3    |   0   |    0    |   0    |   0    |   0    |   3
mom-gennormal-polynom        |   0    |   0   |    0    |   0    |   0    |   0    |   0
mom-gennormal-doubleexp      |   0    |   0   |    0    |   0    |   0    |   0    |   0
imom-logit-polynom           |   0    |   0   |    0    |   0    |   0    |   0    |   0
imom-logit-doubleexp         |   0    |   0   |    0    |   0    |   0    |   0    |   0
imom-genlogit-polynom        |   0    |   0   |    0    |   0    |   0    |   0    |   0
imom-genlogit-doubleexp      |   0    |   0   |    0    |   0    |   0    |   0    |   0
imom-hypsec-polynom          |   0    |   0   |    0    |   0    |   0    |   0    |   0
imom-hypsec-doubleexp        |   0    |   0   |    0    |   0    |   0    |   0    |   0
imom-gennormal-polynom       |   0    |   0   |    0    |   0    |   0    |   0    |   0
imom-gennormal-doubleexp     |   0    |   0   |    0    |   0    |   0    |   0    |   0
mixture-logit-polynom        |   1    |   6   |    0    |   1    |   3    |   0    |  11
mixture-logit-doubleexp      |   0    |   1   |    2    |   1    |   2    |   1    |   7
mixture-genlogit-polynom     |   2    |   1   |    2    |   0    |   1    |   2    |   8
mixture-genlogit-doubleexp   |   3    |   0   |    0    |   0    |   1    |   0    |   4
mixture-hypsec-polynom       |   1    |   0   |    4    |   1    |   1    |   0    |   7
mixture-hypsec-doubleexp     |   0    |   1   |    0    |   4    |   1    |   2    |   8
mixture-gennormal-polynom    |   0    |   3   |    2    |   1    |   1    |   2    |   9
mixture-gennormal-doubleexp  |   0    |   0   |    2    |   4    |   1    |   5    |  12
Total                        |  12    |  12   |   12    |  12    |  12    |  12    |  72
