Bandwidth Detection of Graph Signals with a Small Sample Size

Xie, Xuan; Feng, Hui; Hu, Bo

doi:10.3390/s21010146

Open Access Letter


by Xuan Xie, Hui Feng * and Bo Hu

Research Center of Smart Networks and Systems, Fudan University, Shanghai 200433, China

* Author to whom correspondence should be addressed.

Sensors 2021, 21(1), 146; https://doi.org/10.3390/s21010146

Submission received: 29 October 2020 / Revised: 23 December 2020 / Accepted: 23 December 2020 / Published: 28 December 2020

(This article belongs to the Special Issue Graph Signal Processing for Sensing Applications)


Abstract

Bandwidth is crucial prior knowledge for the sampling, reconstruction, or estimation of a graph signal (GS). However, it is typically unknown in practice. In this paper, we focus on detecting the bandwidth of a bandlimited GS with a small sample size, where the number of spectral components to be tested may greatly exceed the sample size. To control the significance of the result, the detection procedure is implemented by multi-stage testing. In each stage, a Bayesian score test, which introduces a prior on the spectral components, is adopted to cope with the high-dimensional challenge. By setting a different prior in each stage, we make the test more powerful against alternatives whose bandwidth is close to that of the null hypothesis. We prove that the Bayesian score test is locally most powerful in expectation against alternatives following the given prior. Finally, numerical analysis shows that our method performs well in bandwidth detection and is robust to noise.

Keywords:

graph signals; bandwidth detection; Bayesian score test

1. Introduction

A graph signal (GS) is a versatile model for describing information in irregular domains and has been widely used in sensor networks [1,2], biological networks [3,4], and image and 3-D point cloud processing [5,6]. Graph signal processing (GSP) theory generalizes classical discrete signal processing theory to irregular domains by introducing the graph Fourier transform (GFT) [7,8]. In most real-world networks, adjacent vertices usually have similar signals, which leads to bandlimitedness or approximate bandlimitedness in the graph spectral domain. For example, temperatures measured by sensors a short distance apart tend to differ little, and people in a social network tend to share interests or views with their friends.

Bandwidth is a widely used prior for a GS when selecting the best sensor placements to monitor spatial phenomena [9,10,11], designing low-pass filters for denoising [7,8,12], and estimating the signals on all sensors from partial observations [13,14]. However, the bandwidth of a GS is typically unknown in practice, and only a few works consider the problem of bandwidth detection. The bandwidth estimation in Reference [9] relies on noise-free samples. Noisy observations are considered in Reference [15], but the method only applies to the spectrally sparse case. Moreover, the estimation accuracy in Reference [15] depends heavily on the parameter controlling the sparsity of the GS, and finding an optimal parameter is time-consuming.

Detecting the bandwidth of a GS amounts to detecting the number of spectral components in the GS, which can be seen as a model selection problem in linear regression. Well-known model selection procedures such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) [16] pick the most likely model under some criterion, but they do not consider the significance of the chosen model. In contrast, hypothesis tests such as the likelihood ratio (LR) test and the F-test can provide the significance of a model. If the bandwidth of a GS is k, the largest index of the non-zero frequency coefficients is k. Let the number of vertices of a GS be N. The regression coefficients of the last

N - k

frequency components are denoted by

{\hat{f}}_{- k}

. To test whether the bandwidth of the GS is k, the hypothesis can be expressed as

\begin{matrix} H_{0} : {\hat{f}}_{- k} = 0, H_{A} : {\hat{f}}_{- k} \neq 0 . \end{matrix}

(1)

The null hypothesis will be rejected if the model with the assumed bandwidth k is not significant enough. Therefore, the bandwidth of GS can be detected by applying tests over all the assumed bandwidths to see whether there is a model achieving the given significance.

For a large-scale GS, such as one defined on a social network, it is impractical to sample the signal on every vertex because of the huge data collection cost, so the sample size may satisfy

M < N - k

. In this high-dimensional situation, conventional tests of regression coefficients such as the LR test and the F-test are infeasible, as explained in Section 3.1. There is plenty of literature on testing the coefficients of high-dimensional linear regression. For example, References [17,18] propose new test statistics by modifying the F-statistic, and Reference [19] introduces Bayesian priors on the parameters being tested. When

k ≪ N

, the spectrum of the GS can be seen as sparse. Several compressive sensing approaches reconstruct the signal and estimate the bandwidth as a by-product [20,21]. The frequency components detected by compressive sensing methods can help estimate a spectrally sparse signal well. However, the bandwidth obtained by these methods may be inaccurate, since they choose frequency components in the whole space without considering bandlimitedness. A more accurate bandwidth is needed in applications such as filter design and sampling set design, whose performance depends heavily on the bandwidth prior.

In this letter, we detect the bandwidth of a bandlimited GS with a small sample size, which is not much larger than the bandwidth, and we place no sparsity constraint on the spectrum. The bandwidth is detected by a forward multi-stage test: the bandwidth of the model being tested increases over the stages, and the bandwidth is obtained when the null hypothesis is first accepted. Since the samples are not adequate, in each stage we use the limited samples to better distinguish the assumed bandwidth from those close to it, which keeps the detection bias small. Therefore, we customize priors for

{\hat{f}}_{- k}

in each stage to describe our attention on testing whether some elements in

{\hat{f}}_{- k}

are non-zero. By doing so, we use the limited samples to focus on testing only a few alternatives in each stage. The Bayesian score test [19] is adopted in this paper, but unlike Reference [19] we do not give a uniform prior to the parameters being tested. Since the bandwidth being tested increases over the stages, our attention on the coefficients of different frequencies is also updated in each stage, which enables our method to locate the true bandwidth with a small bias even though the samples are insufficient.

Stepwise regression ([16] Chapter 10), the widely used multi-stage test for model selection, does poorly with a small sample size since its estimates of the regression coefficients are biased [22]. In contrast, our method is proved to be locally most powerful (LMP) in expectation against alternatives following the given prior, which means it performs well in distinguishing among models with neighbouring bandwidths even with a small number of samples in each stage.

2. Problem Formulation

Consider an N-vertex undirected connected graph

\mathcal{G} = (\mathcal{V}, \mathcal{E}, W)

, where

\mathcal{V}

is the vertex set,

\mathcal{E}

is the edge set and

W

is the

N \times N

weighted adjacency matrix with

W (i, j) = w_{i j}

. A GS defined on

\mathcal{V}

can be represented as a vector

f \in R^{N}

, and its element

f_{i}

represents the signal value at the i-th vertex in

\mathcal{V}

. The Laplacian matrix of the undirected graph is given by

L = D - W

, where

D

is the diagonal degree matrix

diag {d_{1}, \dots, d_{N}}

with

d_{i} = \sum_{j} w_{i j}

. Since

w_{i j} = w_{j i}

for undirected graphs, the Laplacian matrix is symmetric. Therefore, with non-negative edge weights, it has real non-negative eigenvalues

0 = λ_{1} \leq λ_{2} \leq \dots \leq λ_{N}

and an orthogonal set of eigenvectors

V

. As a result, the spectral decomposition of graph Laplacian is given as

L = V Λ V^{T}

, where

Λ

is a diagonal matrix of eigenvalues. The columns of

V

denoted by

{v_{i}}_{1 \leq i \leq N}

are regarded as the graph Fourier bases and the eigenvalues are regarded as frequencies [7]. The expansion coefficients of

f

corresponding to eigenvectors are defined as

\hat{f}

.

A GS is called bandlimited when there exists a

K \in {0, 1, \dots, N - 1}

such that its GFT

\hat{f}

satisfies

{\hat{f}}_{i} = 0

for all

i > K

[10]. The smallest such K is called the bandwidth of

f

. If all the frequency coefficients are non-zero, the GS is not bandlimited. If

f

is a signal with bandwidth K, then it satisfies

f = V_{K} {\hat{f}}_{K}

, where

V_{K}

denotes the first K columns of

V

and

{\hat{f}}_{K}

denotes the first K coefficients of

\hat{f}

.
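The definitions above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the 6-vertex path graph, the coefficient values, and the helper names `graph_fourier_basis` and `bandwidth` are illustrative choices, not part of the original letter. It builds the Laplacian, takes its eigenvectors as the graph Fourier basis, synthesizes a signal from the first K basis vectors, and confirms that the remaining GFT coefficients vanish.

```python
import numpy as np

def graph_fourier_basis(W):
    """Eigenvalues and eigenvectors of L = D - W, sorted by ascending eigenvalue."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    lam, V = np.linalg.eigh(L)  # eigh returns ascending eigenvalues for symmetric L
    return lam, V

def bandwidth(f_hat, tol=1e-9):
    """Largest 1-based index of a non-zero GFT coefficient (0 for the zero signal)."""
    nz = np.flatnonzero(np.abs(f_hat) > tol)
    return int(nz[-1]) + 1 if nz.size else 0

# Toy graph (an assumption for illustration): unweighted path on 6 vertices.
N, K = 6, 3
W = np.zeros((N, N))
for i in range(N - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

lam, V = graph_fourier_basis(W)
f_hat_K = np.array([1.0, -0.5, 0.25])   # first K coefficients (arbitrary values)
f = V[:, :K] @ f_hat_K                  # f = V_K f_hat_K
f_hat = V.T @ f                         # GFT of f: expansion coefficients
```

The tolerance in `bandwidth` is needed because the coefficients beyond K are only zero up to floating-point error.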

Suppose that

M (M < N)

noisy observations

y \in R^{M}

are sampled from

f \in R^{N}

to detect the bandwidth. The observation model can be summarized as

y = Ψ (f + w),

(2)

where

w

is an

N \times 1

vector with ith element representing the observation noise on the ith vertex and

Ψ : R^{N} \to R^{M}

is the sampling matrix, of which the element in the ith row and jth column is defined as

\begin{matrix} {(Ψ)}_{i, j} = \{\begin{matrix} 1, & if the j th vertex is the i th sample; \\ 0, & otherwise . \end{matrix} \end{matrix}

(3)

We use a simple example to illustrate the role of the sampling matrix. For a graph with 5 vertices and signal

f = {[f_{1}, \dots, f_{5}]}^{T}

, if the 1st, 2nd and 4th vertices are sampled, then the sampling matrix has 3 rows and 5 columns, as follows:

\begin{matrix} Ψ = [\begin{matrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{matrix}] . \end{matrix}

(4)

The sampled graph signal is

Ψ f = {[f_{1}, f_{2}, f_{4}]}^{T}

.
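The 5-vertex example can be reproduced with a few lines of plain Python. This is only an illustrative sketch: the function names `sampling_matrix` and `apply_sampling` and the signal values are hypothetical, and vertex indices are 1-based to match the text.

```python
def sampling_matrix(sampled, n_vertices):
    """Build the M x N 0/1 matrix of Eq. (3); `sampled` holds 1-based vertex indices."""
    psi = [[0] * n_vertices for _ in sampled]
    for row, vertex in enumerate(sampled):
        psi[row][vertex - 1] = 1
    return psi

def apply_sampling(psi, f):
    """Compute the matrix-vector product Psi f with plain lists."""
    return [sum(p * x for p, x in zip(row, f)) for row in psi]

psi = sampling_matrix([1, 2, 4], 5)
f = [10.0, 20.0, 30.0, 40.0, 50.0]   # arbitrary signal values for illustration
samples = apply_sampling(psi, f)     # picks f_1, f_2 and f_4
```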

In this paper, we assume that the observation noise on each vertex follows the Gaussian distribution

\mathcal{N} (0, σ^{2})

and that the noise terms are independent and identically distributed (i.i.d.), so the noise vector follows

w \sim \mathcal{N} (0, σ^{2} I_{N \times N})

, where

I_{N \times N}

is an

N \times N

identity matrix.

By dividing the columns of

V

into two parts

V_{k}

and

V_{- k}

for any assumed bandwidth

k \in {1, 2, \dots, N - 1}

, the observation model (2) can be rewritten as

\begin{matrix} y = U_{k} {\hat{f}}_{k} + U_{- k} {\hat{f}}_{- k} + Ψ w, \end{matrix}

(5)

where

{\hat{f}}_{k}

and

{\hat{f}}_{- k}

denote the first k and the last

N - k

elements of

\hat{f}

, respectively, and

U_{k} = Ψ V_{k}

,

U_{- k} = Ψ V_{- k}

. If the actual bandwidth

K = k

,

H_{0}

in (1) will be accepted. Otherwise,

H_{0}

will be rejected.
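For completeness, the step from (2) to (5) can be written out as a short derivation. It uses only the inverse GFT, f = V \hat{f}, and the column split V = [V_{k}, V_{- k}]; nothing beyond the definitions above is assumed.

```latex
\begin{aligned}
y &= \Psi (f + w) = \Psi V \hat{f} + \Psi w \\
  &= \Psi \begin{bmatrix} V_{k} & V_{-k} \end{bmatrix}
     \begin{bmatrix} \hat{f}_{k} \\ \hat{f}_{-k} \end{bmatrix} + \Psi w \\
  &= \underbrace{\Psi V_{k}}_{U_{k}} \hat{f}_{k}
   + \underbrace{\Psi V_{-k}}_{U_{-k}} \hat{f}_{-k} + \Psi w .
\end{aligned}
```

Read as a linear regression, U_{k} holds the regressors kept under the null hypothesis and U_{- k} holds the regressors whose coefficients are being tested.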

For a GS with actual bandwidth K, the hypothesis with assumed bandwidth

k < K

should all be rejected. Therefore, we detect the bandwidth of GS in a forward multi-stage way with assumed bandwidth increasing over stages. The bandwidth is obtained when the null hypothesis is accepted. By doing so, we reduce the number of models to be tested from N to K.

3. Bandwidth Detection

3.1. High Dimensional Challenge

F-test and LR test are common approaches to test the hypothesis (1) in linear regression model. Let the maximum likelihood estimation (MLE) of

\hat{f}

under

H_{0}

and

H_{A}

be

{\hat{f}}_{R}

and

{\hat{f}}_{U}

, respectively. The corresponding sum of squared residuals (SSR) are denoted by

SSR ({\hat{f}}_{R})

and

SSR ({\hat{f}}_{U})

. Then, LR test statistic ([23] Chapter 10) equals

M log (SSR ({\hat{f}}_{R}) / SSR ({\hat{f}}_{U}))

, which is asymptotically distributed as

χ^{2} (N - k)

as the sample size approaches infinity. When the sample size is small, the LR test statistic has no known exact distribution, which makes the test hard to implement. The F-test statistic ([23] Chapter 10) equals

\begin{matrix} \frac{(SSR ({\hat{f}}_{R}) - SSR ({\hat{f}}_{U})) / (N - k)}{SSR ({\hat{f}}_{U}) / (M - k)} & = \frac{y^{T} M_{k} U_{- k} {(U_{- k}^{T} M_{k} U_{- k})}^{- 1} U_{- k}^{T} M_{k} y / (N - k)}{y^{T} M_{k} (I_{M \times M} - U_{- k} {(U_{- k}^{T} M_{k} U_{- k})}^{- 1} U_{- k}^{T}) M_{k} y / (M - k)}, \end{matrix}

(6)

where

M_{k} = I_{M \times M} - U_{k} {(U_{k}^{T} U_{k})}^{- 1} U_{k}^{T}

is a projection matrix that projects signals to the complementary subspace of the column space of

U_{k}

. The F-test is exact in finite samples. However, to ensure that

U_{k}^{T} U_{k}

in

M_{k}

is invertible, we have to let

M \geq k

. Similarly, to ensure that

U_{- k}^{T} M_{k} U_{- k}

is invertible, there must be

M \geq N - k

. Therefore, (6) requires that

M \geq max (k, N - k)

, which means the sample size must be no smaller than the assumed bandwidth, as well as the dimension of

{\hat{f}}_{- k}

. For large-scale GS, the sampling cost for bandwidth detection using F-test or LR test is too high. Therefore, a test with small sample size is preferred. The sample size only needs to be larger than the assumed bandwidth, that is,

M > k

, in score test ([23] Chapter 10). However, when

N - k > M

, there may exist some alternatives which have

{\hat{f}}_{- k} \neq 0

but

U_{- k} {\hat{f}}_{- k} = 0

. We can never hope to have any power against these alternatives. Since the GS is known to be bandlimited, we should pay more attention to testing whether the low frequency coefficients in

{\hat{f}}_{- k}

are non-zero than the high frequency coefficients. Thus, we adopt Bayesian score test [19] to allocate our attention among the alternatives by designing a prior for

{\hat{f}}_{- k}

. Different from Reference [19], which pays equal attention to all the alternatives, we customize a prior for

{\hat{f}}_{- k}

to fit our bandwidth detection purpose.
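The existence of undetectable alternatives when N - k > M is a plain linear-algebra fact and can be illustrated numerically. The sketch below assumes NumPy; the sizes and the random matrix standing in for U_{- k} are hypothetical. It extracts a non-zero coefficient vector that the samples cannot see.

```python
import numpy as np

rng = np.random.default_rng(0)
M, n_tested = 4, 7   # hypothetical sizes: M samples, N - k = 7 tested coefficients

# Stand-in for U_{-k}: any M x (N - k) matrix with N - k > M has a non-trivial null space.
U_minus = rng.standard_normal((M, n_tested))

# Rows of Vt beyond the numerical rank span the null space of U_minus.
_, s, Vt = np.linalg.svd(U_minus)
rank = int(np.sum(s > 1e-10))
invisible = Vt[rank]   # unit-norm coefficient vector with U_minus @ invisible ~ 0
```

Any test based on y alone has no power against `invisible`, which is why a prior is used to decide where the limited power should be spent.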

3.2. Design of the Prior

In the test for each bandwidth, the alternatives

{\hat{f}}_{- k} = α

and

{\hat{f}}_{- k} = - α

for every

α \neq 0

contribute equally to rejecting

H_{0}

, so we assign

E ({\hat{f}}_{- k}) = 0

to make the prior unbiased, where

E (\cdot)

denotes the expectation of the given random variable. The covariance matrix of

{\hat{f}}_{- k}

can be assigned as

\begin{matrix} E ({\hat{f}}_{- k} {\hat{f}}_{- k}^{T}) = τ^{2} Σ_{- k}, \end{matrix}

(7)

where

Σ_{- k}

is a positive semidefinite

(N - k) \times (N - k)

matrix to be designed and

τ^{2}

is an unknown parameter. In this paper, we let

Σ_{- k}

be a diagonal matrix, so that each of its diagonal elements is the variance of the corresponding element in

{\hat{f}}_{- k}

. Since

E ({\hat{f}}_{- k}) = 0

, a larger variance indicates the corresponding element in

{\hat{f}}_{- k}

is more likely to have a larger amplitude. To decide whether

H_{0}

should be accepted, more attention should be paid to the element in

{\hat{f}}_{- k}

with larger variance. Thus, the attention among elements in

{\hat{f}}_{- k}

in bandwidth detection is linked to their variances in the prior. The attention is allocated by designing a

Σ_{- k}

.

In our forward multi-stage test, the bandwidth being tested updates over stages, thus

Σ_{- k}

also needs to be updated. To prevent the multi-stage test from accepting the null hypothesis too early, the test in each stage should discriminate well against models whose bandwidth is close to that of the null hypothesis. Therefore, we use the limited samples to make the test concentrate on distinguishing bandwidths close to k. The design of

Σ_{- k}

follows the guideline that elements of

{\hat{f}}_{- k}

at higher frequencies receive less attention. Since we cannot determine a bandwidth larger than

M - 1

with sample size M, we set the attention on

{\hat{f}}_{- k}

with frequency index larger than M to a small constant

δ

. For example, the ith diagonal element of

Σ_{- k}

can be designed to be

\begin{matrix} {(Σ_{- k})}_{i, i} = \frac{1}{Z} exp (- \frac{i^{2}}{2 σ^{2}}) + δ, \end{matrix}

(8)

where

Z = exp (- 1 / (2 σ^{2}))

is the normalization factor. According to the three-standard-deviations property of normal distribution,

σ

is set to be

(M - k) / 3

so that the Gaussian term is negligible for coefficients with frequency index larger than M, leaving their attention approximately equal to

δ

. Equation (8) is not the only choice of

Σ_{- k}

, but it is a workable one. How it affects the power of the test is analyzed later in Section 3.3.
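To see what the weights of Equation (8) look like in practice, here is a minimal sketch in plain Python. The function name `gaussian_prior_diag` is hypothetical; the values N = 250, M = 80 and δ = 0.01 are taken from the simulation setting of Section 4.1, and k = 20 is an arbitrary stage chosen for illustration.

```python
import math

def gaussian_prior_diag(N, M, k, delta=0.01):
    """Diagonal of Sigma_{-k} following Eq. (8), with sigma = (M - k) / 3."""
    if not 0 < k < M < N:
        raise ValueError("expected 0 < k < M < N")
    sigma = (M - k) / 3.0
    Z = math.exp(-1.0 / (2.0 * sigma ** 2))   # makes the first weight equal 1 + delta
    return [math.exp(-(i ** 2) / (2.0 * sigma ** 2)) / Z + delta
            for i in range(1, N - k + 1)]

weights = gaussian_prior_diag(N=250, M=80, k=20)
```

The first weight is 1 + δ, the weights decay monotonically, and beyond index M - k (three standard deviations) they are close to the floor δ.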

As the multi-stage test moves forward, the attention among the frequency coefficients is updated, and in each stage the test discriminates among bandwidths close to k, which makes small-bias bandwidth detection with a small sample size possible. An illustration of the attention update is shown in Figure 1.

We can complete the specification of the distribution of

{\hat{f}}_{- k}

by choosing a value for

τ^{2}

and a distribution shape. Let

L_{k} ({\hat{f}}_{- k}; y)

be the likelihood of

{\hat{f}}_{- k}

. For a specific prior distribution of

{\hat{f}}_{- k}

, the likelihood of

τ^{2}

is

\begin{matrix} {\bar{L}}_{k} (τ^{2}; y) = \int L_{k} ({\hat{f}}_{- k}; y) π ({\hat{f}}_{- k} | τ^{2}) d {\hat{f}}_{- k}, \end{matrix}

(9)

where

π ({\hat{f}}_{- k} | τ^{2})

is the distribution of

{\hat{f}}_{- k}

for a given

τ^{2}

.

Given

{\bar{L}}_{k} (τ^{2}; y)

, we convert hypothesis (1) to

\begin{matrix} {\bar{H}}_{0} : τ^{2} = 0, {\bar{H}}_{A} : τ^{2} \neq 0 . \end{matrix}

(10)

τ^{2} = 0

implies

{\hat{f}}_{- k} = 0

, since they lead to the same distribution of

y

. Thus

H_{0} : {\hat{f}}_{- k} = 0

will be rejected if

{\bar{H}}_{0}

is rejected. The score test of

{\bar{H}}_{0}

is named the Bayesian score test of

H_{0}

with the given prior of

{\hat{f}}_{- k}

.

The score test of

{\bar{H}}_{0}

in stage k is a one-sided test for one parameter. The test statistic is

\begin{matrix} S_{Σ_{- k}} & = \frac{d}{d τ^{2}} log {\bar{L}}_{k} (0; y), \end{matrix}

(11)

\begin{matrix} = \frac{1}{2} g_{k}^{T} Σ_{- k} g_{k} - \frac{1}{2} tr (Σ_{- k} F_{k}), \end{matrix}

(12)

where

tr (Σ_{- k} F_{k})

denotes the trace of

Σ_{- k} F_{k}

.

{\bar{H}}_{0}

will be rejected when

S_{Σ_{- k}} \geq t

for some threshold t. From (11), we can find that the test statistic of

τ^{2}

can be seen as the slope of log-likelihood function of

τ^{2}

under

{\bar{H}}_{0}

. The slope equals 0 when

τ^{2}

equals its MLE

{\hat{τ}}^{2}

. The closer

{\hat{τ}}^{2}

is to 0, the closer

S_{Σ_{- k}}

is to 0. If

S_{Σ_{- k}}

is larger than a given t, which means

{\hat{τ}}^{2}

is much larger than 0, then

{\bar{H}}_{0}

will be rejected. Equation (12) is given by Reference [19], where

g_{k} = σ^{- 2} U_{- k}^{T} M_{k} y

and

F_{k} = σ^{- 2} U^{T} U

are the gradient and Fisher information matrix of

log L_{k} (0; y)

, respectively.

3.3. Method

Considering that the second part of (12) is not related to the observations, it is more convenient to work with the equivalent test statistic

σ^{- 2} y^{T} M_{k} U_{- k} Σ_{- k} U_{- k}^{T} M_{k} y

. Because

σ^{2}

is not known, we plug in the MLE result

\hat{σ^{2}} \propto y^{T} M_{k} y

under the null hypothesis. The resulting test statistic is

\begin{matrix} S_{Σ_{- k}} = \frac{y^{T} M_{k} U_{- k} Σ_{- k} U_{- k}^{T} M_{k} y}{y^{T} M_{k} y} . \end{matrix}

(13)

Interpretation of the test statistic: $M_{k} y$ is the part of $y$ outside the range of $U_{k}$. The numerator of (13) can be rewritten as $Q = \sum_{i = 1}^{N - k} {(Σ_{- k})}_{i, i} Q_{i}$ with $Q_{i} = {(y^{T} M_{k} u_{k + i})}^{2}$, where $u_{k + i} = Ψ v_{k + i}$ is the ith column of $U_{- k}$. A larger $Q_{i}$ indicates that more of the energy of $M_{k} y$ lies along $u_{k + i}$, which is linked to a larger amplitude of the ith element of ${\hat{f}}_{- k}$. $S_{Σ_{- k}}$ equals Q, a weighted sum of the $Q_{i}$, normalized by the energy of $M_{k} y$. When the weight ${(Σ_{- k})}_{i, i}$ decreases with i, $S_{Σ_{- k}}$ is larger if $Q_{i}$ is large for small i. Therefore, the test is more powerful against alternatives $H_{A}$ with non-zero coefficients at low frequencies than against those with non-zero coefficients at high frequencies. This is in accordance with the purpose of our design, namely that the test should discriminate well against models with bandwidth close to k.
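The statistic (13) and its reading as a weighted sum of the Q_i can be sketched as follows, assuming NumPy. The helper names, the random matrices standing in for U_{k} and U_{- k}, and the linearly decaying weights are hypothetical; the projector is computed with a pseudo-inverse, which coincides with U_{k} (U_{k}^{T} U_{k})^{-1} U_{k}^{T} when U_{k} has full column rank.

```python
import numpy as np

def projection_complement(U_k):
    """M_k = I - U_k (U_k^T U_k)^{-1} U_k^T, computed via the pseudo-inverse."""
    return np.eye(U_k.shape[0]) - U_k @ np.linalg.pinv(U_k)

def score_statistic(y, U_k, U_minus, sigma_diag):
    """Eq. (13): weighted energy of M_k y along the columns of U_{-k}, over ||M_k y||^2."""
    r = projection_complement(U_k) @ y   # residual M_k y
    q = (U_minus.T @ r) ** 2             # Q_i = (y^T M_k u_{k+i})^2
    return float(sigma_diag @ q) / float(r @ r)

# Hypothetical toy data, only to exercise the function.
rng = np.random.default_rng(0)
M, k, n_tested = 8, 3, 9
U_k = rng.standard_normal((M, k))
U_minus = rng.standard_normal((M, n_tested))
sigma_diag = np.linspace(1.0, 0.1, n_tested)   # weights decreasing with frequency
y = rng.standard_normal(M)
s_obs = score_statistic(y, U_k, U_minus, sigma_diag)
```

Two properties follow directly from the formula: the statistic does not change when y is rescaled, and it does not change when any vector in the range of U_{k} is added to y, so neither the noise level nor the nuisance coefficients need to be known.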

Finally, we decide whether the null hypothesis is accepted by a p-value threshold. For a given significance level

α

, if

p \leq α

, the null hypothesis is rejected. Let

S_{- k}

be the observation value of

S_{Σ_{- k}}

, then the p-value of the test is

\begin{matrix} p & = P_{H_{0}} (S_{Σ_{- k}} > S_{- k}), \\ = P_{H_{0}} (y^{T} M_{k} (U_{- k} Σ_{- k} U_{- k}^{T} - S_{- k}) M_{k} y > 0) . \end{matrix}

(14)

Since

M_{k} y = M_{k} (y - U_{k} {\hat{f}}_{k}) = M_{k} w

under null hypothesis, it follows a multivariate normal distribution. Thus, p is the probability that the quadratic form of normal variables is non-negative. The distribution of the quadratic forms in normal variables is approximated by a

χ^{2}

distribution in Reference [24] and p-value can be calculated approximately as,

\begin{matrix} p = \{\begin{matrix} P (χ_{d_{k}}^{2} > d_{k} - h_{k}), & tr (X_{k}^{3}) > 0; \\ P (χ_{d_{k}}^{2} < d_{k} - h_{k}), & tr (X_{k}^{3}) < 0; \\ Φ (\frac{tr (X_{k})}{\sqrt{2 tr (X_{k}^{2})}}), & tr (X_{k}^{3}) = 0, \end{matrix} \end{matrix}

(15)

where

X_{k} = M_{k} U_{- k} Σ_{- k} U_{- k}^{T} M_{k} - S_{- k} M_{k}

,

d_{k} = {[tr (X_{k}^{2})]}^{3} / {[tr (X_{k}^{3})]}^{2}

, and

h_{k} = tr (X_{k}^{2}) tr (X_{k}) / tr (X_{k}^{3})

.
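The quantities entering (15) are simple traces of X_{k}, and the probability in (14) can also be estimated by brute force. The sketch below assumes NumPy and deliberately swaps in a plain Monte Carlo estimate of P(z^{T} X_{k} z > 0) with z standard normal in place of the χ^{2} approximation of Reference [24]; it is a sanity check under hypothetical toy data, not the method used in the letter. The function names are illustrative.

```python
import numpy as np

def x_matrix(M_k, U_minus, sigma_diag, s_obs):
    """X_k = M_k U_{-k} Sigma_{-k} U_{-k}^T M_k - s_obs * M_k."""
    A = M_k @ U_minus
    return (A * sigma_diag) @ A.T - s_obs * M_k   # broadcasting scales the columns of A

def moment_parameters(X):
    """d_k and h_k from the traces of X, X^2 and X^3 (requires tr(X^3) != 0)."""
    t1 = np.trace(X)
    t2 = np.trace(X @ X)
    t3 = np.trace(X @ X @ X)
    return t2 ** 3 / t3 ** 2, t2 * t1 / t3

def monte_carlo_p_value(X, n_draws=20000, seed=0):
    """Estimate P(z^T X z > 0) for z ~ N(0, I); the noise scale cancels in the sign."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_draws, X.shape[0]))
    return float(np.mean(np.einsum("ij,jk,ik->i", z, X, z) > 0))

# Hypothetical toy data.
rng = np.random.default_rng(1)
M, k, n_tested = 8, 2, 10
U_k = rng.standard_normal((M, k))
U_minus = rng.standard_normal((M, n_tested))
M_k = np.eye(M) - U_k @ np.linalg.pinv(U_k)
sigma_diag = np.linspace(1.0, 0.1, n_tested)

p_values = [monte_carlo_p_value(x_matrix(M_k, U_minus, sigma_diag, s))
            for s in (0.0, 1.0, 5.0, 1e6)]
d_k, h_k = moment_parameters(x_matrix(M_k, U_minus, sigma_diag, 1.0))
```

As expected, the estimated p-value decreases as the observed statistic grows: it is 1 when the observed value is 0 and 0 when the observed value is far above anything the null hypothesis can produce.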

Our algorithm for the multi-stage bandwidth detection is shown in Algorithm 1. If

p \leq α

for all the stages, the output of Algorithm 1 will be ‘None’, which means the bandwidth cannot be determined with the given sample size. Then we can say that the bandwidth is larger than

M - 1

.

Algorithm 1: Multi-stage Bandwidth Detection.

Input: samples y and significance level α;

Output: detected bandwidth K, or None;

1: for k = 1, 2, \dots, M - 1 do

2: Calculate the test statistic according to (13);

3: Calculate the p-value of the test according to (15);

4: if p > α then

5: Let bandwidth K = k and stop;

6: end if

7: end for

8: Output None.
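The control flow of Algorithm 1 can be sketched in a few lines of plain Python. The p-value computation is abstracted behind a caller-supplied function `p_value_at`, which is a hypothetical interface introduced here for illustration; in the letter it corresponds to evaluating (13) and then (15) at stage k.

```python
def detect_bandwidth(p_value_at, M, alpha=0.05):
    """Forward multi-stage test: return the first k in 1..M-1 whose null is accepted.

    `p_value_at(k)` is a caller-supplied (hypothetical) function returning the
    stage-k p-value, for example from Eq. (13) followed by Eq. (15).
    """
    for k in range(1, M):
        if p_value_at(k) > alpha:
            return k      # H0 accepted: report bandwidth K = k
    return None           # every stage rejected: bandwidth exceeds M - 1

# Toy p-value profile (made up): stages below 7 reject, stage 7 accepts.
detected = detect_bandwidth(lambda k: 0.001 if k < 7 else 0.4, M=80)
undetermined = detect_bandwidth(lambda k: 0.001, M=10)
```

Stopping at the first accepted stage is what keeps the number of tests at K rather than N.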

3.4. Power Analysis

In the situation

N - k > M

, there may exist some alternatives which have

{\hat{f}}_{- k} \neq 0

but

U_{- k} {\hat{f}}_{- k} = 0

. It is impossible to find a test which is optimal against all the alternatives, that is, uniformly most powerful. If the true

{\hat{f}}_{- k}

has large deviations from

H_{0}

, it is very easy to detect. However, if the deviations are small, the detection becomes harder. In the Bayesian score test of

H_{0}

, the deviation of the true

{\hat{f}}_{- k}

from

H_{0}

is denoted by

τ ξ

, where

ξ

is assumed to follow a prior distribution with

E (ξ) = 0

and

E (ξ ξ^{T}) \propto Σ_{- k}

and

τ^{2}

indicates how large the deviation is. Next, we will analyze the power of the Bayesian score test of

H_{0}

when

τ^{2}

is small.

Definition 1 (Locally most powerful (LMP) [25]). Consider the problem of testing the simple null hypothesis

H : θ = θ_{0}

against the one-sided alternative

K : θ > θ_{0}

. A significance level α test

Φ_{0}

with power function

β_{Φ_{0}} (θ)

is said to be LMP if given any other level α test Φ with power function

β_{Φ} (θ)

, there exists

Δ > 0

such that

β_{Φ_{0}} (θ) \geq β_{Φ} (θ)

for all

θ \in K

with

0 \leq θ - θ_{0} \leq Δ

.

An LMP test is among the best tests for detecting small deviations from the null hypothesis, though it is not necessarily good at detecting all kinds of alternatives. In each stage of Algorithm 1, we aim to decide whether the bandwidth is k or not, so small deviations in the coefficients of frequencies around k should be detected as reliably as possible. Therefore, an LMP test is preferred in each stage. In Theorem 1, we show that the Bayesian score test in each stage of Algorithm 1 is LMP in expectation.

Theorem 1.

Let

\bar{w} ({\hat{f}}_{- k})

be the power function of the level α Bayesian score test of

H_{0}

and let

w ({\hat{f}}_{- k})

be the power function of any level α test of

H_{0}

. The Bayesian score test of

H_{0}

is LMP in expectation against all

{\hat{f}}_{- k} \in H_{A}

with

{\hat{f}}_{- k} = τ ξ

, where

0 \leq τ^{2} \leq Δ

, that is

E_{ξ} [w (τ ξ)] \leq E_{ξ} [\bar{w} (τ ξ)]

.

Proof of Theorem 1.

Let

\bar{β} (τ^{2})

be the power function of the level

α

score test of

{\bar{H}}_{0}

. Since

{\hat{f}}_{- k} = 0

and

τ^{2} = 0

lead to the same distribution of

y

, the level

α

tests of

H_{0}

and

{\bar{H}}_{0}

lead to the same critical region. The same conclusion holds for any other level

α

tests of

H_{0}

and

{\bar{H}}_{0}

with power function

w ({\hat{f}}_{- k})

and

β (τ^{2})

. It has been proved in Reference [25] that the one-sided score test for a one-dimensional parameter is LMP. Therefore, for the given level

α

, there exists a

Δ > 0

such that for all

τ^{2} \in {\bar{H}}_{A}

with

0 \leq τ^{2} \leq Δ

, the power function of the score test for (10) is no smaller than that of any other test of the same level for (10), that is,

\begin{matrix} β (τ^{2}) \leq \bar{β} (τ^{2}) . \end{matrix}

(16)

Let the critical region of the level

α

Bayesian score test be A, we have

\begin{matrix} \bar{β} (τ^{2}) & = \int_{A} {\bar{L}}_{k} (τ^{2}; y) d μ, \end{matrix}

(17)

where

μ

is a dominating measure. According to (9),

{\bar{L}}_{k} (τ^{2}; y)

is the marginalized distribution of

L_{k} ({\hat{f}}_{- k}; y)

. According to (12), every distribution shape of

{\hat{f}}_{- k} = τ ξ

with

E (ξ) = 0

and

E (ξ ξ^{T}) \propto Σ_{- k}

leads to the same

S_{Σ_{- k}}

, and therefore the same power function. So (17) can be written as

\begin{matrix} \bar{β} (τ^{2}) = \int_{A} (\int L_{k} (τ ξ; y) π (ξ) d ξ) d μ = \int (\int_{A} L_{k} (τ ξ; y) d μ) π (ξ) d ξ = E_{ξ} [\bar{w} (τ ξ)] . \end{matrix}

(18)

The same argument applied to the critical region of any other level α test expresses its power β (τ^{2}) as E_{ξ} [w (τ ξ)]. Combining this with (16) and (18), we have

E_{ξ} [w (τ ξ)] \leq E_{ξ} [\bar{w} (τ ξ)]

. □

To make Theorem 1 more intuitive, we give the following example.

Example: Suppose there are two alternatives to be tested in stage k,

H_{A}^{l} : {\hat{f}}_{- k} = [0.1, 0, \dots, 0]

, which has a small deviation

0.1

at a low frequency, and

H_{A}^{h} : {\hat{f}}_{- k} = [0, \dots, 0.1, \dots, 0]

, which has the same deviation at a high frequency. The prior of

{\hat{f}}_{- k}

is given as (8), which means the test pays more attention to the low frequency components. Then, the power of testing

H_{0}

against

H_{A}^{l}

will be larger than the power of testing

H_{0}

against

H_{A}^{h}

at the same significance level. As a result, the average power of the Bayesian score test is increased since there are more alternatives having small deviations at the low frequencies in a bandlimited GS.

Furthermore, the proof of Theorem 1 does not rely on the specific form of

Σ_{- k}

, which means that it will hold for different designs of

Σ_{- k}

. We can design different

Σ_{- k}

to allocate attention among frequency components and achieve various testing purposes with a small sample size, for example, spectrum anomaly detection.

4. Numerical Analysis

4.1. Bandwidth Detection

An accurate bandwidth is helpful in various applications. For example, in low-pass filter design for graph signals it helps remove the noise while preserving the original signal, and in sampling set design it helps choose the minimal sample size. In this section, we first validate the performance of bandwidth detection on an Erdős-Rényi random graph with

N = 250

and the probability of edge presence being 0.25. The frequency coefficients of bandlimited GS are independently generated from a uniform distribution over the interval

[0, 1]

. Gaussian noise is added to the GS to produce the observations, and the signal-to-noise ratio (SNR) in dB is calculated by

SNR = 10 \log_{10} ({∥ f ∥}_{2}^{2} / (N σ^{2}))

. Detection performance is reported as the bias over 1000 simulations; the SNR is set to 20 dB and the vertices are sampled randomly in each simulation. We design two priors for the alternatives following the guideline in Section 3.2, and the significance level of the test is set to

α = 0.05

. Prior 1 is given by (8) with

δ = 0.01

. Prior 2 is given by

\begin{matrix} {(Σ_{- k})}_{i, i} = \{\begin{matrix} 1 - \frac{i - 1}{M - k} + δ & if i \leq M - k + 1; \\ δ & if i > M - k + 1, \end{matrix} \end{matrix}

(19)

with

δ = 0.01

. To show the effectiveness of our prior designed for bandwidth detection, we compare the mean bandwidth obtained by giving

{\hat{f}}_{- k}

our designed priors and uniform prior

Σ_{- k} = I_{(N - k) \times (N - k)}

, as shown in Figure 2. The test with uniform prior fails to detect the bandwidth when

M = 80

, while the tests with designed prior 1 and prior 2 perform well under the same sample size. When the sample size increases to

2 N

, the test with uniform prior turns out to have an acceptable performance, but still worse than the test with designed priors with

M = 80

. This implies that the prior designed for bandwidth detection substantially reduces the required sample size. The detection performance with prior 1 and prior 2 is comparable, which indicates that the improvement comes from the design guideline rather than from a specific form of the prior.
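Two ingredients of this simulation setup are easy to sketch in plain Python: the linear prior of Equation (19) and the noise variance that produces a target SNR. The function names are hypothetical, and the SNR helper assumes a base-10 logarithm, as is usual for decibels.

```python
def linear_prior_diag(N, M, k, delta=0.01):
    """Diagonal of Sigma_{-k} following Eq. (19): linear decay down to the floor delta."""
    return [1.0 - (i - 1) / (M - k) + delta if i <= M - k + 1 else delta
            for i in range(1, N - k + 1)]

def noise_variance_for_snr(f, snr_db):
    """sigma^2 such that 10 log10(||f||^2 / (N sigma^2)) equals snr_db (base 10 assumed)."""
    power = sum(x * x for x in f) / len(f)
    return power / (10.0 ** (snr_db / 10.0))

prior2 = linear_prior_diag(N=250, M=80, k=20)
sigma2 = noise_variance_for_snr([1.0, 1.0, 1.0, 1.0], snr_db=20.0)
```

Like the Gaussian-shaped prior 1, this prior starts at 1 + δ and reaches the floor δ at index M - k + 1, which is consistent with the observation that the two priors perform comparably.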

The performance of our method is also compared with the bandwidth estimation method in Reference [15], in which a dictionary of 80 bandlimited kernels is constructed with

β = 10^{3}

and the regularization parameter

μ

is set to 0.01 by cross validation. We first simulate bandwidth detection at different noise levels for bandwidths

K \in {10, 30, 50}

, with the sample size set to

M = 80

. The results are shown in Figure 3. We can find that the bias and standard deviation (SD) of Algorithm 1 and sparsity based method [15] are similar for small bandwidth at low noise level. For large bandwidth, the bias and SD of our method are significantly smaller than those of Reference [15]. This is because the GS is no longer sparse for large bandwidth, which is out of the scope of sparsity based method [15]. At high noise level, Algorithm 1 is much more robust than Reference [15] especially when the bandwidth is close to the sample size.

In Figure 4, we show that increasing the sample size helps decrease the bias and SD of Algorithm 1. However, once the sample size is abundant, the improvement from further increases is negligible.

4.2. Signal Estimation

Bandwidth is a useful prior when estimating signals from partial observations. In this section, we use the bandwidth detected by Algorithm 1 as a prior in signal estimation, and a least squares (LS) estimator is used to estimate the signal with the form

f^{'} = V_{K} {(U_{K}^{T} U_{K})}^{- 1} U_{K}^{T} y, \quad U_{K} = Ψ V_{K} .

(20)

The estimation performance is evaluated by the normalized mean square error (NMSE), which is

NMSE = \frac{∥ (f^{'} - f) ∥_{2}^{2}}{{∥ f ∥}_{2}^{2}} .

(21)
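The estimator and the error metric can be sketched as follows, assuming NumPy. Because y has only M entries, this sketch assumes the least squares fit is carried out on the sampled basis U_K = Ψ V_K and the fitted coefficients are then expanded on V_K; the function names, the random orthonormal basis standing in for the Laplacian eigenvectors, and the 0-based sample indices are all illustrative.

```python
import numpy as np

def ls_estimate(y, V, sample_idx, K):
    """Least-squares reconstruction on the first K graph Fourier modes."""
    V_K = V[:, :K]
    U_K = V_K[sample_idx, :]                          # Psi V_K: rows at the sampled vertices
    coef, *_ = np.linalg.lstsq(U_K, y, rcond=None)    # LS fit of the K coefficients
    return V_K @ coef

def nmse(f_est, f):
    """Normalized mean square error of Eq. (21)."""
    return float(np.sum((f_est - f) ** 2) / np.sum(f ** 2))

# Hypothetical noise-free toy problem.
rng = np.random.default_rng(0)
N, K, M = 20, 4, 10
V, _ = np.linalg.qr(rng.standard_normal((N, N)))      # stand-in orthonormal basis
f = V[:, :K] @ rng.standard_normal(K)                 # bandlimited signal
sample_idx = rng.choice(N, size=M, replace=False)     # 0-based sampled vertices
f_est = ls_estimate(f[sample_idx], V, sample_idx, K)
err = nmse(f_est, f)
```

With noise-free samples and the correct bandwidth, the reconstruction is exact up to rounding, so any error observed in practice comes from noise or from a mismatched K.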

We first show the estimation performance on the same graph signal as in Section 4.1. The bandwidth of the signal varies from 10 to 60, the sample size is set to

M = 80

and the SNR is set to 20 dB. Our method is compared with two sparsity-based methods, namely BP [26] and BCS [21], which also select frequency components to estimate the signal. The result is shown in Figure 5: as the bandwidth increases, the performance of the sparsity-based methods degrades rapidly.

This is because the sparsity-based methods select frequency components in the whole space, and the sparsity constraint makes the number of selected frequency components as small as possible. Therefore, they are not suited to applications in which the frequency spectrum is not sparse.

We also compare the signal estimation performance of our method with that of BP and BCS on a real data set. The data set comprises 22 signals corresponding to the average temperature on 1 January of each year from 1998 to 2019, measured by 119 stations in China [27]. Each station is identified with a vertex, and the distance between two vertices is calculated by the haversine formula. A 3-NN unweighted graph is constructed. The GS is shown in Figure 6a and the corresponding spectrum is shown in Figure 6b.
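The graph construction described above can be sketched as follows. This is only an illustration under stated assumptions: the Earth radius of 6371 km, the rule that an edge is kept when either endpoint lists the other among its 3 nearest neighbors, and the six coordinate pairs (rough city locations, not the 119 stations of the data set) are all choices made here, not details given in the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def knn_unweighted_graph(coords, k=3):
    """Symmetric 0/1 adjacency matrix of a k-NN graph.

    Assumption: an edge (i, j) is kept if j is among the k nearest
    neighbors of i or vice versa.
    """
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (haversine_km(*coords[i], *coords[j]), j)
            for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            adj[i][j] = adj[j][i] = 1
    return adj

# Hypothetical station coordinates (latitude, longitude), for illustration only.
stations = [(39.9, 116.4), (31.2, 121.5), (23.1, 113.3),
            (30.6, 104.1), (34.3, 108.9), (45.8, 126.5)]
adj = knn_unweighted_graph(stations, k=3)
for row in adj:
    print(row)
```

Because of the symmetrization, a vertex can end up with more than 3 neighbors; requiring mutual nearest neighbors instead would give a sparser graph.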

I.i.d. Gaussian noise with SNR = 20 dB is added to the signal to generate observations. For each signal, 200 simulations are run, and the vertices are sampled randomly in each simulation. The temperatures vary smoothly over the graph, so the signal has only a few low-frequency components and its spectrum is sparse. Since the GS in the real data is only approximately bandlimited, the approximate bandwidth obtained by Algorithm 1 is used in (20). The average NMSE of all the simulations under different sample sizes is shown in Figure 7.
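One way to generate such observations is sketched below. The SNR is taken here as the ratio of the average signal power to the noise variance, which is an assumed definition; the placeholder signal and the sample size of 40 are also hypothetical, not values from the experiment.

```python
import numpy as np

def add_noise_at_snr(x, snr_db, rng):
    """Add i.i.d. zero-mean Gaussian noise so that the SNR equals snr_db.

    Assumption: SNR = mean(x**2) / noise_variance, expressed in dB.
    """
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

rng = np.random.default_rng(1)
f = rng.standard_normal(119)                 # placeholder for one 119-vertex signal
noisy = add_noise_at_snr(f, 20.0, rng)       # noise on every vertex
sampled = rng.choice(f.size, size=40, replace=False)  # random vertex sampling
y = noisy[sampled]                           # observations used for estimation
print(y.shape)
```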

The estimation performance of our method is also better than that of the sparsity-based methods on the real data. Since BP enforces the sparsity of the signal through the l_{1}-norm, it ends up assigning the same signal value to every vertex in this experiment. BCS selects 14 nonadjacent frequency components from the whole spectrum, including both high and low frequencies. The approximate bandwidth obtained by Algorithm 1 is 12, which means that we estimate the GS using the first 12 frequency components. If the GS is known to be bandlimited, the signal estimation performance can be improved by first detecting the bandwidth accurately and then using it as a prior in estimation. If the knowledge that the GS is bandlimited is not exploited, more samples are needed to achieve the same estimation performance.

5. Conclusions

In this letter, we proposed a multi-stage Bayesian score test approach for detecting the bandwidth of a bandlimited GS with a small sample size. By customizing a prior for the frequency coefficients being tested, we made the test better at distinguishing the assumed model from models with bandwidths close to the assumed one. In practice, our method may return a bandwidth smaller than the true bandwidth K if the coefficients within some frequency band inside the passband are zero, since in the small sample size situation we focus on testing the frequency coefficients close to the assumed bandwidth and ignore the others. We will try to improve the performance in this situation in the future. In addition, we would like to analyze how different sampling sets affect the performance of bandwidth detection and to design a sampling set that makes the test more powerful.

Author Contributions

Conceptualization, X.X. and H.F.; methodology, X.X.; validation, X.X.; formal analysis, X.X.; investigation, X.X.; resources, X.X.; writing—original draft preparation, X.X.; writing—review and editing, X.X., H.F. and B.H.; visualization, X.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shanghai Municipal Natural Science Foundation, grant number 19ZR1404700, and the 2020 Okawa Foundation Research Grant.

Acknowledgments

We thank the Fudan-Zhuhai Innovation Institute for supporting this research.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Egilmez, H.E.; Ortega, A. Spectral anomaly detection using graph-based filtering for wireless sensor networks. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1085–1089.
2. Sakiyama, A.; Tanaka, Y.; Tanaka, T.; Ortega, A. Efficient sensor position selection using graph signal sampling theory. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 6225–6229.
3. Goldsberry, L.; Huang, W.; Wymbs, N.F.; Grafton, S.T.; Bassett, D.S.; Ribeiro, A. Brain signal analytics from graph signal processing perspective. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 851–855.
4. Hu, C.; Cheng, L.; Sepulcre, J.; Johnson, K.A.; Fakhri, G.E.; Lu, Y.M.; Li, Q. A spectral graph regression model for learning brain connectivity of Alzheimer’s disease. PLoS ONE 2015, 10, e0128136.
5. Hu, W.; Cheung, G.; Ortega, A.; Au, O.C. Multiresolution graph fourier transform for compression of piecewise smooth images. IEEE Trans. Image Process. 2014, 24, 419–433.
6. Thanou, D.; Chou, P.A.; Frossard, P. Graph-based compression of dynamic 3D point cloud sequences. IEEE Trans. Image Process. 2016, 25, 1765–1778.
7. Shuman, D.I.; Narang, S.K.; Frossard, P.; Ortega, A.; Vandergheynst, P. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag. 2013, 30, 83–98.
8. Ortega, A.; Frossard, P.; Kovačević, J.; Moura, J.M.F.; Vandergheynst, P. Graph Signal Processing: Overview, Challenges, and Applications. Proc. IEEE 2018, 106, 808–828.
9. Anis, A.; Gadde, A.; Ortega, A. Efficient sampling set selection for bandlimited graph signals using graph spectral proxies. IEEE Trans. Signal Process. 2016, 64, 3775–3789.
10. Chen, S.; Varma, R.; Sandryhaila, A.; Kovačević, J. Discrete Signal Processing on Graphs: Sampling Theory. IEEE Trans. Signal Process. 2015, 63, 6510–6523.
11. Wei, Z.; Li, B.; Guo, W. Optimal sampling for dynamic complex networks with graph-bandlimited initialization. IEEE Access 2019, 7, 150294–150305.
12. Onuki, M.; Ono, S.; Yamagishi, M.; Tanaka, Y. Graph signal denoising via trilateral filter on graph spectral domain. IEEE Trans. Signal Inf. Process. Over Netw. 2016, 2, 137–148.
13. Wang, X.; Chen, J.; Gu, Y. Local measurement and reconstruction for noisy bandlimited graph signals. Signal Process. 2016, 129, 119–129.
14. Huang, C.; Zhang, Q.; Huang, J.; Yang, L. Reconstruction of bandlimited graph signals from measurements. Digit. Signal Process. 2020, 101, 102728.
15. Romero, D.; Ma, M.; Giannakis, G.B. Kernel-based reconstruction of graph signals. IEEE Trans. Signal Process. 2016, 65, 764–778.
16. Montgomery, D.C.; Peck, E.A.; Vining, G.G. Introduction to Linear Regression Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2012; Volume 821.
17. Lan, W.; Wang, H.; Tsai, C.L. Testing covariates in high-dimensional regression. Ann. Inst. Stat. Math. 2013, 66, 279–301.
18. Zhong, P.S.; Chen, S.X. Tests for High-Dimensional Regression Coefficients With Factorial Designs. J. Am. Stat. Assoc. 2011, 106, 260–274.
19. Goeman, J.J.; Van De Geer, S.A.; Van Houwelingen, H.C. Testing against a high dimensional alternative. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2006, 68, 477–493.
20. Tropp, J.; Gilbert, A.C. Signal recovery from partial information via orthogonal matching pursuit. IEEE Trans. Inform. Theory 2007, 53, 4655–4666.
21. Ji, S.; Xue, Y.; Carin, L. Bayesian compressive sensing. IEEE Trans. Signal Process. 2008, 56, 2346–2356.
22. Steyerberg, E. Stepwise Selection in Small Data Sets A Simulation Study of Bias in Logistic Regression Analysis. J. Clin. Epidemiol. 1999, 52, 935–942.
23. Davidson, R.; MacKinnon, J.G. Econometric Theory and Methods; Oxford University Press: New York, NY, USA, 2004; Volume 5.
24. Imhof, J.P. Computing the distribution of quadratic forms in normal variables. Biometrika 1961, 48, 419–426.
25. Omelka, M. The behavior of locally most powerful tests. Kybernetika 2005, 41, 699–712.
26. Chen, S.S.; Donoho, D.L.; Saunders, M.A. Atomic decomposition by basis pursuit. SIAM Rev. 2001, 43, 129–159.
27. Federal Climate Complex Global Surface Summary of Day Data. Available online: http://www.ncdc.noaa.gov/cgi-bin/res40.pl?page=gsod.html (accessed on 29 August 2019).

Figure 1. Illustration of the attention placed on the coefficients of different frequencies in different stages. Each line illustrates the attention distribution of one stage. An attention of 0 indicates that the corresponding frequency coefficient does not need to be tested.

Figure 2. Bias of bandwidth detection under different priors.

Figure 3. Bias and SD of bandwidth detection at different noise levels.

Figure 4. Bias and SD of bandwidth detection with different sample sizes when K = 30.

Figure 5. NMSE of graph signal (GS) estimation of different algorithms under different bandwidths.

Figure 6. GS constructed from the temperature measured by 119 stations on 1st January 1998 and its spectrum. (a) GS on the 3-NN graph. (b) Spectrum of the GS in (a).

Figure 7. NMSE of GS estimation of different algorithms under different sample sizes.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
