Theoretical Bounds on Performance in Threshold Group Testing Schemes

Seong, Jin-Taek

doi:10.3390/math8040637


Theoretical Bounds on Performance in Threshold Group Testing Schemes^†

by

Jin-Taek Seong

Department of Convergence Software, Mokpo National University, Muan 58554, Korea

^†

This paper is an extended version of our paper published in ICEIC 2020, Barcelona, Spain, 19–22 January 2020.

Mathematics 2020, 8(4), 637; https://doi.org/10.3390/math8040637

Submission received: 26 March 2020 / Revised: 17 April 2020 / Accepted: 20 April 2020 / Published: 21 April 2020

(This article belongs to the Special Issue Group Theory and its Applications in Engineering, Computer Science, and Structural Biology)


Abstract

A threshold group testing (TGT) scheme with lower and upper thresholds is a general model of group testing (GT), which identifies a small set of defective samples. In this paper, we consider the TGT scheme that requires the minimum number of tests. We aim to find lower and upper bounds for finding a set of defective samples in a large population. Decoding for the TGT scheme is performed by minimization of the Hamming weight, as in channel coding theory, and the probability of error is defined accordingly. We then derive a new upper bound on the probability of error and extend a lower bound from the conventional GT scheme to the TGT scheme. We show that the upper and lower bounds match each other well at the optimal density ratio of the group matrix. In addition, we conclude that when the gap between the two thresholds in the TGT framework increases, a group matrix with a high density should be used to achieve optimal performance.

Keywords:

defective samples; lower bound; probability of error; threshold group testing; upper bound

1. Introduction

Group testing (GT), introduced by Dorfman, has been used in a wide range of fields from computer science to biology. A more recent application area is compressed sensing [1], which can be viewed as a variant of GT and has attracted attention from research communities. In general, GT addresses underdetermined problems in which a subset of defective samples must be identified within a large population. The fundamental idea is as follows: the defective samples are assumed to be very sparse, so even if several samples are selected at once, there may be no defective sample among them. As a result, the defective samples can be found without inspecting every sample individually. In other words, GT is an inverse problem of recovering the original input states from a subset of parameters.

To date, GT has found a wide range of applications in biology [2,3], communication theory [4,5,6,7], signal processing [8], computer science [9,10,11], and mathematics [12]. The fundamental ideas of GT extend to error-correcting codes [13], identifying available multiple-access channels [14,15], recovering sparse signals [1,8], detecting malicious attacks in security networks [16], testing the quality of products [17], and many others. Recently, the performance of GT has been studied more precisely, and nearly optimal performance has been presented for noiseless and noisy frameworks in [18,19,20].

Dorfman first developed GT during World War II [21]. At that time, syphilis was spreading in the army, and the U.S. government needed to quickly find the infected soldiers, which led it to actively support the development of an early GT model. However, the syphilis test was expensive and slow, so testing every soldier individually was impractical. GT therefore emerged as a new method of syphilis testing. Suppose the number of infected soldiers is very small compared to the total number of soldiers, which is a plausible hypothesis. In this case, the result may be negative even when the blood of several soldiers is mixed and tested at once. Because several blood samples are pooled instead of being tested individually, the number of tests can be reduced and testing becomes faster. This is the background against which GT models emerged. Since then, the core idea of GT has been applied in various research fields.

The initial GT is performed in the following way. First, blood samples from several soldiers are mixed and tested for syphilis. When the result is positive, at least one soldier in the group is infected. Conversely, if the result is negative, all the soldiers whose blood samples were pooled in the group are confirmed not to be infected. Such pooled tests are effective because most soldiers are not infected and only a few are. The GT problem mainly focuses on two issues. The first is how to choose the subset of samples to be included in one group. The second is which decoding method should be used to identify the set of defective samples among the many samples. One way to classify GT schemes is by how the tests are performed. In nonadaptive GT, all tests are conducted simultaneously in a predetermined manner, and the design of one test does not depend on the result of another. Adaptive GT, in contrast, uses the result of one test to design subsequent test pools [2]. In general, adaptive GT allows for fewer tests, but for most practical applications nonadaptive GT is preferred because all tests can be performed simultaneously, which reduces the time required for testing.

Various GT models have been introduced so far. In the conventional GT [21], the output is positive when at least one defective sample is included in the group. Therefore, if all samples included in the test are normal, the result is negative; otherwise, it is positive. Quantitative GT [2] is a variant of GT in which the output shows the number of defective samples included in one test, unlike the conventional GT. Threshold group testing (TGT) [22] is a variant of GT with two thresholds that determine whether the output is negative or positive. In this model, the test result is positive if the number of defective samples in the tested group is at least a predetermined upper threshold. Conversely, if the number of defective samples in the group is at most a lower threshold, the result is negative. The main feature of this model is that it allows an intermediate regime in which the result is neither definitely negative nor definitely positive: if the number of defective samples lies strictly between the two thresholds, the result is output at random, positive or negative with equal probability. Both thresholds are preset when designing the TGT model, and the difference between them, called the gap, affects performance. A summary of the number of tests required and the decoding complexity of TGT schemes with two thresholds can be found in [23,24]. A comparison of the number of tests and the decoding complexity is outside the scope of this paper.

So far, most performance bounds presented for TGT problems have shown meaningful results, such as improved encoding and decoding [23,24], a near-optimal number of tests [25], and robust and efficient designs [26,27]. However, there is a lack of research on how to design the group matrix and how the defective rate of the samples affects successful decoding. In this paper, we address the question of how dense the group matrix should be in order to identify the defective samples with good performance. In addition, we examine how the gap affects the design of the group matrix in TGT schemes. This was the motivation for this paper. The main goal of this paper is to clarify the relationship between the density of the group matrix and the defective rate of the signal.

In this paper, we consider a TGT framework with lower and upper thresholds, which are the boundaries between positive and negative results. We derive lower and upper bounds for finding a set of defective samples out of a large number of samples in TGT. To this end, we exploit minimization of the Hamming weight, as in channel coding theory, and define the probability of error for our TGT decoding scheme. We obtain new upper bounds on the probability of error that match the lower bounds from the information-theoretic approach well. We show that the upper and lower bounds coincide at the optimal density ratio of the group matrix. In addition, when the gap becomes large, a group matrix with a high density is needed to obtain optimal TGT performance. Through our results, we show how the design parameters of TGT frameworks affect performance, which is the main contribution of this paper. Next, we define the TGT scheme.

2. Threshold Group Testing Framework

2.1. Problem Description

This section describes the TGT problem in more detail. First, let x be a binary vector of size N that contains defective and normal samples, namely x = (x_1, x_2, \dots, x_N)^T, where x_i denotes the ith entry of x. Each entry of x indicates the binary state of a sample, that is, whether it is defective or normal. If the ith sample is defective, x_i = 1; otherwise, x_i = 0. All the entries are independent and identically distributed (i.i.d.) according to the following Bernoulli probability distribution,

Pr(x_i = θ) = \begin{cases} 1 - δ & \text{if } θ = 0, \\ δ & \text{if } θ = 1, \end{cases}

(1)

where δ := K/N is the defective rate and K is the number of defective samples in x. In practice, the defective rate is assumed to be very small, e.g., a syphilis infection rate. Since the input signal x is generated from the probability distribution defined in (1), the number of ones in an instance of x can range from 0 to N. However, when the length of x is large, the number of ones is close to K. Let L_{k_1} be the set of signals x with exactly k_1 ones, so that ∥x∥_0 = k_1 and |L_{k_1}| = \binom{N}{k_1}, where ∥·∥_0 is the Hamming weight and |·| is the cardinality of a set. We define the set L of all signals as L = ⋃_{k_1 = 0}^{N} L_{k_1}; its size is |L| = \sum_{k_1 = 0}^{N} \binom{N}{k_1} = 2^N, which corresponds to the total number of binary input signals of size N.

Next, we define a group matrix A with M rows and N columns. The role of the group matrix in the TGT model is to specify which elements of x are tested in each group. Each entry A_{ji} indicates whether the ith sample x_i is included in the jth group test: if so, A_{ji} = 1; otherwise, A_{ji} = 0. Each row thus represents the set of input samples participating in the corresponding test. All the entries of A are i.i.d. according to the following Bernoulli probability distribution,

Pr(A_{ji} = θ) = \begin{cases} 1 - γ & \text{if } θ = 0, \\ γ & \text{if } θ = 1, \end{cases}

(2)

where γ is the density ratio of the group matrix. A high density ratio means that many samples participate in each test, which makes the TGT design expensive and complicated. In this sense, it is necessary to clarify the relation between the defective rate δ and the density ratio γ in order to design GT schemes effectively.
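As a minimal sketch of the models in (1) and (2), the snippet below draws a signal x and a group matrix A using only Python's standard library. The function names sample_signal and sample_group_matrix, the seed, and the parameter values in the usage lines are illustrative assumptions and do not come from the paper.

```python
import random


def sample_signal(n, delta, rng):
    """Draw x in {0,1}^n with i.i.d. Bernoulli(delta) entries, as in (1)."""
    return [1 if rng.random() < delta else 0 for _ in range(n)]


def sample_group_matrix(m, n, gamma, rng):
    """Draw an m-by-n matrix A with i.i.d. Bernoulli(gamma) entries, as in (2)."""
    return [[1 if rng.random() < gamma else 0 for _ in range(n)] for _ in range(m)]


# Assumed toy parameters: N = 20 samples, M = 8 tests.
rng = random.Random(0)  # fixed seed only so the run is reproducible
x = sample_signal(20, 0.1, rng)
A = sample_group_matrix(8, 20, 0.3, rng)
print("number of defective samples:", sum(x))
print("samples pooled in the first test:", sum(A[0]))
```

Row j of A lists the samples pooled in the jth test, so sum(A[j]) is the pool size, which grows on average with the density ratio γ.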

We now describe the output of the TGT scheme, which is obtained in two steps. First, the input signal defined in (1) is linearly projected by the group matrix generated from (2). Let s be the testing vector obtained from the linear combination of x and A, expressed as the product of the group matrix and the input signal:

s = A x.

(3)

Since all the entries of x and A are binary, each entry of the testing vector s of size M is a nonnegative integer from 0 to N; thus, s \in \{0, 1, 2, \dots, N\}^M. Let s_j be the jth entry of s. From (3), s_j is obtained as s_j = \sum_{i = 1}^{N} A_{ji} x_i. The next step is to map the nonnegative integer s_j to the binary output of the TGT model using two thresholds. Let L and U denote the lower and upper thresholds, respectively; both are nonnegative integers.

Suppose f is the decision function that transforms the testing vector s into the binary output vector y, whose entries are positive or negative. The inputs of f are the vector s and the two thresholds L and U. Therefore, we write

y = f(s, L, U).

(4)

As defined in the TGT model [22], if the entry s_j is greater than or equal to the upper threshold U, the corresponding output y_j is positive, y_j = 1. If s_j is less than or equal to L, the output is negative, y_j = 0. Due to the nature of the TGT scheme, s_j may lie strictly between the two thresholds L and U. In this case, the output y_j is random: the function f outputs 0 or 1 with equal probability. Therefore, f is defined entrywise as follows:

f(s_j, L, U) = \begin{cases} 0 & \text{if } s_j \leq L, \\ 0 \text{ or } 1 & \text{if } L < s_j < U, \\ 1 & \text{if } s_j \geq U. \end{cases}

(5)

Suppose both thresholds are preset and constant. We define the gap between the two thresholds as G = U - L - 1. In general, the gap is positive for TGT schemes, G > 0. In the conventional GT, G = 0 since L = 0 and U = 1, so that each test yields exactly one of the two output results, positive or negative.

The aim of the TGT scheme is to find an unknown signal x from a group matrix A and the corresponding output vector y. So far, the main research direction in GT problems has been to determine the number of tests M needed to successfully find the defective samples of the input signal x. Next, we define the probability of error of the decoder in order to derive lower and upper bounds on the performance.

2.2. Definition on Probability of Error

In this section, we define the probability of error in finding the defective samples of x in a TGT framework for given parameters N, K, and M. Prior to defining the probability of error, we classify the input signals x according to the number of ones in x.

We assume that the decoder in our TGT framework finds a feasible solution \hat{z} by minimizing the Hamming weight as follows,

\hat{z} = \arg\min_{z} ∥z∥_0 \quad \text{subject to} \quad f(Az) = f(Ax),

(6)

where z \in L is a feasible signal. Let k_2 = ∥z∥_0 be the number of ones in z. Since x itself is feasible, only feasible signals with k_2 \leq k_1 can be selected by the minimization rule. We define an error as occurring when the feasible solution \hat{z} selected by the minimization rule (6) is not equal to the desired instance signal x, that is, f(A\hat{z}) = f(Ax) but x \neq \hat{z}. Let E_0(x, \hat{z}) := \{A : x \neq \hat{z}\} be the exact error event of this decoder as a function of the group matrix A. This error event E_0 is a subset of the following feasible error event E, since any feasible signal z is a potential candidate for the estimated signal. We define the feasible error event E as follows,

E(x, z) := \{A : x \neq z, f(Az) = f(Ax)\}.

(7)

Note that E_0(x, \hat{z}) \subseteq E(x, z). Let Pr(E_0) and Pr(E) be the probabilities of the events E_0(x, \hat{z}) and E(x, z), respectively. Then Pr(E_0) \leq Pr(E), and the probability of error P_e := Pr(E_0) is upper bounded by

\begin{aligned} P_e & \leq Pr(E) \\ & = \frac{1}{|L|} \sum_{x \in L} \sum_{z \in L, z \neq x} Pr(A \in E(x, z) | (x, z)) \\ & = \frac{1}{|L|} \sum_{x \in L} \sum_{z \in L, z \neq x} Pr(f(Az) = f(Ax) | (x, z)). \end{aligned}

(8)

In a brute-force approach, enumerating the individual probabilities of the feasible error events in (8) is intractable because |L| = 2^N is typically very large. This enumeration can be avoided with the approach described in the following section.
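For very small N, the minimization rule (6) can be illustrated by exhaustive search in order of increasing Hamming weight. The sketch below is an assumption-laden illustration rather than the decoder analyzed in the paper: the names is_consistent and min_weight_decode are hypothetical, and the constraint f(Az) = f(Ax) is interpreted as "z could have produced the observed output y", so a test whose count falls strictly between the thresholds is treated as compatible with either output.

```python
from itertools import combinations


def is_consistent(A, z, y, lower, upper):
    """Check whether the candidate z could have produced the outputs y."""
    for row, y_j in zip(A, y):
        s_j = sum(a * z_i for a, z_i in zip(row, z))
        if s_j <= lower and y_j != 0:
            return False  # z forces a negative result, but y_j is positive
        if s_j >= upper and y_j != 1:
            return False  # z forces a positive result, but y_j is negative
        # lower < s_j < upper: the output is random, so any y_j is possible
    return True


def min_weight_decode(A, y, lower, upper):
    """Return a feasible signal of minimum Hamming weight, or None."""
    n = len(A[0])
    for weight in range(n + 1):  # try the sparsest candidates first
        for support in combinations(range(n), weight):
            z = [0] * n
            for i in support:
                z[i] = 1
            if is_consistent(A, z, y, lower, upper):
                return z
    return None


# Conventional GT (L = 0, U = 1) with two overlapping pools.
A = [[1, 1, 0],
     [0, 1, 1]]
y = [1, 1]
z_hat = min_weight_decode(A, y, 0, 1)
print(z_hat)
```

The search visits up to 2^N candidates, which is why this approach is only practical as an illustration for tiny N.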

3. Theoretic Bounds on Performance

3.1. Lower Bound

We aim to obtain a theoretical lower bound in this section. First, we review related work on lower bounds. In [2], a lower bound was presented by using an information-theoretic approach. That is, for a sequential algorithm in a conventional GT framework, the minimum number of tests for finding K defective samples satisfies

M \geq \log_2 |X|,

(9)

where X denotes the sample space. This bound follows from the fact that each group test divides X into two disjoint subsets, one for each of the two possible outcomes in the conventional GT scheme. The lower bound itself is generally unachievable in small problems. Note that, since the splitting must be realizable by some group test, the splitting of the set X cannot be arbitrary.

We use Fano’s inequality [28] to find the lower bound on the probability of error for the TGT framework. Fano’s inequality relates the conditional entropy of the input signals given the output results to the probability of error of any decoder, and in coding theory it has been used to derive converse proofs. In our earlier work [29], we presented the following lower bound on performance in the TGT scheme, which is an extension of the previous work [30]. This lower bound on the probability of error for the TGT scheme is stated as Theorem 1; the full proof is detailed in [29].

Theorem 1 ([29]). For any group matrix from (2), any decoding scheme defined in (6), and 0 < δ \leq 1/2, the probability of error P_e for the TGT scheme is lower bounded by

P_e \geq H_b(δ) - \frac{M}{N} H_b(p) - \frac{1}{N}

(10)

where H_b(·) denotes the binary entropy function and p is the probability that each entry y_j equals 0.

The right side of (10) is the minimum probability of error; that is, it is a lower bound on the probability of error no matter which decoder is used. Hence, the right side of (10) must be negative for the probability of error to vanish. Thus, in the TGT schemes, the following necessary condition must be satisfied for perfect decoding,

M > \frac{N H_b(δ) - 1}{H_b(p)} \geq N H_b(δ) - 1

(11)

where the second inequality comes from 0 \leq H_b(p) \leq 1. The result in (11) has the following meaning. First, for large N, where the constant term is negligible, the number of tests required to successfully reconstruct all defective samples in a TGT problem must meet the necessary condition M > N H_b(δ). This condition is the same as that of [2] obtained from the information-theoretic approach, and it also matches the necessary condition for successful decoding in the conventional GT scheme. Second, the minimum number of tests M is achieved when H_b(p) = 1. The binary entropy is maximized when the probability p is 0.5. Therefore, to attain the maximum binary entropy, the following condition (12) must be satisfied.

\frac{1}{2} = \sum_{s_1 = 0}^{L} \binom{N}{s_1} (δγ)^{s_1} (1 - δγ)^{N - s_1} + \frac{1}{2} \sum_{s_1 = L + 1}^{U - 1} \binom{N}{s_1} (δγ)^{s_1} (1 - δγ)^{N - s_1}

(12)

Next, we use (12) to clarify how the design parameters of TGT frameworks, namely the defective rate δ, the density ratio γ, and the lower and upper thresholds L and U, affect the number of tests. Figure 1 shows the number of tests for different density ratios of the group matrix and different gaps between the lower and upper thresholds in TGT. As shown in Figure 1, the larger the gap, the wider the range of density ratios over which the group matrix attains the minimum number of tests. This can be understood intuitively. When the gap between the lower and upper thresholds is large, positive and negative results can be separated only by including more samples in each test. In other words, if not enough samples are involved in a test, the result is more likely to be negative or random. Figure 2 shows the number of tests in TGT frameworks when the gap is zero, G = 0, but the two thresholds vary. Overall, to keep the minimum number of tests, a larger density ratio of the group matrix is needed as the two thresholds increase. Larger thresholds also allow a group matrix with a wider range of density ratios.
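Conditions (11) and (12) can be evaluated numerically. The following sketch, which is only an illustration under stated assumptions, computes the necessary number of tests from (11) and solves (12) for the density ratio γ by bisection. The function names, the log-space binomial evaluation via math.lgamma, and the bisection solver are choices made for this example and are not taken from the paper. The bisection relies on the right side of (12) being decreasing in γ, and it returns a value near 1 if no γ in (0, 1) satisfies (12).

```python
import math


def binary_entropy(q):
    """Binary entropy H_b(q) in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def necessary_tests(n, delta, p):
    """Right side of the first inequality in (11): (N H_b(delta) - 1) / H_b(p)."""
    return (n * binary_entropy(delta) - 1) / binary_entropy(p)


def binom_pmf(n, k, q):
    """Binomial(n, q) probability of k, evaluated in log space for large n."""
    if q <= 0.0:
        return 1.0 if k == 0 else 0.0
    if q >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(q) + (n - k) * math.log(1 - q))
    return math.exp(log_pmf)


def prob_negative(n, delta, gamma, lower, upper):
    """Right side of (12): probability p that a single output y_j equals 0."""
    q = delta * gamma
    below = sum(binom_pmf(n, s, q) for s in range(0, lower + 1))
    between = sum(binom_pmf(n, s, q) for s in range(lower + 1, upper))
    return below + 0.5 * between


def balanced_density(n, delta, lower, upper, iterations=100):
    """Bisection for the gamma in [0, 1] at which prob_negative equals 1/2."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if prob_negative(n, delta, mid, lower, upper) > 0.5:
            lo = mid  # too many negative outputs: increase the density
        else:
            hi = mid
    return (lo + hi) / 2


n, delta = 1000, 0.05
print("necessary tests at p = 0.5:", necessary_tests(n, delta, 0.5))
for lower, upper in [(0, 1), (1, 2), (0, 3)]:
    gamma = balanced_density(n, delta, lower, upper)
    print(f"L={lower}, U={upper}: gamma = {gamma:.4f}")
```

For the conventional GT case L = 0 and U = 1, (12) reduces to (1 - δγ)^N = 1/2, which gives a closed form against which the bisection can be checked.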

3.2. Upper Bound

In this section, we derive an upper bound on the performance of TGT. In [19], an upper bound on the probability of error was obtained as a function of the number of tests, and it coincides with the theoretical lower bound. Here, we aim to obtain an upper bound by using the minimization of the Hamming weight defined in (6). The minimization of the Hamming weight is NP-hard in general, but it allows us to analyze the performance of TGT frameworks, in particular how the group matrix should be constructed and how much a decoding algorithm can improve compared to other algorithms. To the best of our knowledge, the proposed upper bound is a new approach that is simple and clear for evaluating the performance of TGT.

Let us recall the upper bound on the probability of error defined in (8). We now aim to derive this bound. The basic idea is as follows. We first group the error events that share the same error pattern. The next step is to find the probability for that error pattern. Finally, we obtain the total probability by collecting all the individual probabilities with the same error pattern. To identify the same error pattern, consider two probabilities that are equal, i.e., Pr(f(A_j x) = f(A_j z_1)) = Pr(f(A_j x) = f(A_j z_2)) for z_1, z_2 \in L with z_1 \neq z_2 and ∥z_1∥_0 = ∥z_2∥_0 = k_2, where A_j is the jth row of the group matrix A. In other words, the probabilities for z_1 and z_2 are the same whenever they have the same Hamming weight (this is shown in the proof below). We then add the individual probabilities that share the same Hamming weights of the two vectors x and z, and count the number of vectors with the same probability.

Theorem 2. Given any unknown signal from (1) and any group matrix from (2), using the decoder defined in (6), the probability of error P_e is bounded by

P_e \leq \frac{1}{|L|} \sum_{k_1 = 0}^{N} \sum_{k_2 = 0, x \neq z}^{k_1} \binom{N}{k_2} \binom{N}{k_1} δ^{k_1} (1 - δ)^{N - k_1} Pr(f(Ax) = f(Az))

(13)

Proof of Theorem 2. Given the Hamming weights k_1 and k_2 of x and z, the conditional probability in (8) can be rewritten, using the independence of the rows of A, as follows,

Pr(f(Ax) = f(Az)) = \prod_{j = 1}^{M} Pr(f(A_j x) = f(A_j z))

(14)

Let P := Pr(f(A_j x) = f(A_j z)). We consider the jth entry of y and examine this probability in more detail,

P = Pr(f(A_j x) = 0) Pr(f(A_j z) = 0) + Pr(f(A_j x) = 1) Pr(f(A_j z) = 1)

(15)

where the two terms on the right side of (15) correspond to the two cases in which f(A_j x) = f(A_j z), i.e., both outputs are 0 or both outputs are 1. We now evaluate the two terms as follows,

\begin{aligned} Pr(f(A_j x) = 0) Pr(f(A_j z) = 0) = {} & [Pr(\sum_{i = 1}^{N} A_{ji} x_i \leq L) + \frac{1}{2} Pr(L < \sum_{i = 1}^{N} A_{ji} x_i < U)] \\ & \times [Pr(\sum_{i = 1}^{N} A_{ji} z_i \leq L) + \frac{1}{2} Pr(L < \sum_{i = 1}^{N} A_{ji} z_i < U)] \end{aligned}

(16)

and

\begin{aligned} Pr(f(A_j x) = 1) Pr(f(A_j z) = 1) = {} & [Pr(\sum_{i = 1}^{N} A_{ji} x_i \geq U) + \frac{1}{2} Pr(L < \sum_{i = 1}^{N} A_{ji} x_i < U)] \\ & \times [Pr(\sum_{i = 1}^{N} A_{ji} z_i \geq U) + \frac{1}{2} Pr(L < \sum_{i = 1}^{N} A_{ji} z_i < U)] \end{aligned}

(17)

where x and z have Hamming weights k_1 and k_2, respectively.

Given the Hamming weight and the two thresholds, we obtain the following conditional probability for x,

\begin{aligned} Pr(L < \sum_{i = 1}^{N} A_{ji} x_i < U | ∥x∥_0 = k_1) & = Pr(L < \sum_{i = 1}^{k_1} A_{ji} < U) \\ & = \sum_{d = L + 1}^{U - 1} \binom{k_1}{d} γ^d (1 - γ)^{k_1 - d} \end{aligned}

(18)

Using (18), we obtain all the probabilities for the given x and z in (16) and (17). Next, the probability that x has Hamming weight k_1 is

Pr(∥x∥_0 = k_1) = \binom{N}{k_1} δ^{k_1} (1 - δ)^{N - k_1}

(19)

This completes the proof of Theorem 2. □
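The per-test probability P in (15) can be computed directly from the binomial sums in (16) to (18). The sketch below is a hedged illustration: the function names prob_output_zero and agreement_probability are hypothetical, and the code uses Pr(f = 1) = 1 - Pr(f = 0), which follows from (16) and (17) because the three threshold regions partition all possible counts.

```python
from math import comb


def prob_output_zero(k, gamma, lower, upper):
    """Pr(f(A_j x) = 0) for a signal of Hamming weight k, as in (16) and (18)."""
    def pmf(d):
        # Probability that exactly d of the k defectives are pooled in the test.
        return comb(k, d) * gamma ** d * (1 - gamma) ** (k - d)

    below = sum(pmf(d) for d in range(0, min(lower, k) + 1))
    between = sum(pmf(d) for d in range(lower + 1, min(upper - 1, k) + 1))
    return below + 0.5 * between


def agreement_probability(k1, k2, gamma, lower, upper):
    """P in (15): both outputs equal 0 or both equal 1."""
    p0_x = prob_output_zero(k1, gamma, lower, upper)
    p0_z = prob_output_zero(k2, gamma, lower, upper)
    return p0_x * p0_z + (1 - p0_x) * (1 - p0_z)


# Assumed example values: K = 50 defectives, conventional GT thresholds.
for gamma in (0.005, 0.0138, 0.05):
    print(gamma, agreement_probability(50, 50, gamma, 0, 1))
```

When Pr(f = 0) is 1/2 for both signals, the two terms of (15) are each 1/4 and P attains its minimum value of 1/2.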

For the special case k_1 = k_2 = K, in which we know in advance that there are exactly K defective samples,

\begin{aligned} P_e & \leq \binom{N}{K} Pr(f(Ax) = f(Az)) = \binom{N}{K} P^M \\ & \leq 2^{N H_b(K/N) + M \log_2 P} \end{aligned}

(20)

where, as before, P := Pr(f(A_j x) = f(A_j z)) with 0.5 \leq P \leq 1. For the probability of error P_e bounded by the right side of (20) to vanish as N \to \infty, the following condition must hold:

\begin{aligned} M & > \frac{N H_b(K/N)}{\log_2 P^{-1}} \\ & \geq N H_b(K/N) \end{aligned}

(21)

The minimum number of tests M required for finding the defective samples is obtained when the probability P is 0.5. This result agrees with the lower bound in (11) up to the constant term.

Figure 3 shows the plot of comparisons of the upper and the lower bounds with different thresholds L and U for

N = 1000

,

δ = 0.05

, and

G = 0

. This figure is drawn from the expression given in such that for

K = 50

, the number of tests M is obtained from the probability of error being lower than

10^{- 5}

. One interesting point of this result is that there is an optimal density ratio of the group matrix that attains the minimum number of tests. In addition, our proposed upper bounds match the lower bounds from information theory well. Figure 3 also shows that, as the two thresholds L and U increase while the gap is held constant, the group matrix should be denser in order to find the defective samples with a small number of tests. This is an important result. For example, if the two thresholds are small, a sparser group matrix must be generated; otherwise, the performance of the TGT framework deteriorates. Another meaningful finding in Figure 3 is that the acceptable range of the density ratio depends mainly on the lower and upper thresholds: when the two thresholds are small, only a narrow range of density ratios is suitable. This characteristic is one of the factors to consider when designing TGT frameworks.

Figure 4 shows how the gap affects the number of tests for N = 1000 and δ = 0.05, evaluated at a probability of error of 10^{-5}. As shown in Figure 4, the proposed upper bounds take the same values as the lower bounds around the optimal density ratio, but differ slightly outside the optimal range. From Figure 4, we observe that the larger the gap G, the higher the density ratio of the group matrix needed for the best performance. A higher density ratio should be adopted with caution, however, because it increases the computational burden. Figure 5 compares the upper and lower bounds for N = 1000 at the optimal density ratios of the group matrices. There is no difference between the two bounds over the whole range of defective rates; namely, our lower and upper bounds match each other well.

4. Conclusions

In this paper, we considered a TGT framework with lower and upper thresholds, which are the boundaries between positive and negative results. We derived lower and upper bounds for finding defective samples out of a large number of samples in TGT. To this end, we exploited the minimization of the Hamming weight, as in channel coding theory, and defined the probability of error for our decoding scheme. We found new upper bounds on the probability of error that match the lower bounds obtained from the information-theoretic approach well. We showed that the upper and lower bounds coincide at the optimal density ratio of the group matrix. In addition, we observed that when the gap between the two thresholds in the TGT scheme increases, a group matrix with a high density should be used to achieve optimal performance. Through our results, we showed how the design parameters of TGT frameworks affect the performance.

Author Contributions

Funding acquisition, J.-T.S.; methodology, J.-T.S.; supervision, J.-T.S.; writing–original draft, J.-T.S. The author has read and agreed to the published version of the manuscript.

Funding

This paper was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (NRF-2017R1C1B5075823).

Conflicts of Interest

The author declares no conflict of interest.

References

Donoho, D.L. Compressed Sensing. IEEE Trans. Inf. Theory 2006, 52, 1289–1306. [Google Scholar] [CrossRef]
Du, D.-Z.; Hwang, F.-K. Pooling Designs and Nonadaptive Group Testing: Important Tools for DNA Sequencing; World Scientific: Singapore, 2006. [Google Scholar]
Bar-Lev, S.K.; Kleiner, I.; Perry, D.; Stadje, W. Recycled incomplete identification procedures for blood screening. Eur. J. Oper. Res. 2017, 259, 330–343. [Google Scholar] [CrossRef]
Tsybakov, A.; Likhanov, P. Packet communication on a channel without feedback. Probl. Inf. Transm. 1983, 19, 69–84. [Google Scholar]
Wolf, J.K. Born again group testing: multi-access communications. IEEE Trans. Inf. Theory 1984, 31, 185–191. [Google Scholar] [CrossRef]
Anderson, P.-O. Superimposed Codes for the Euclidean Channel; Linkoping University: Linkoping, Sweden, 1994. [Google Scholar]
Fan, P.Z.; Darnell, M.; Honary, B. Superimposed codes for the multiaccess binary adder channel. IEEE Trans. Inf. Theory 1995, 41, 1178–1182. [Google Scholar] [CrossRef][Green Version]
Candes, E.; Romberg, J.; Tao, T. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 2006, 52, 489–509. [Google Scholar] [CrossRef]
Amiri, E.; Tardos, G. High rate fingerprinting codes and fingerprinting capacity. In Proceedings of the 20th ACM-SIAM Sympos, Discrete Algorithms, New York, NY, USA, 4–6 January 2009. [Google Scholar]
Barg, A.; Blakley, G.R.; Kabatiansky, G.A. Digital fingerprinting codes: Problem statements, constructions, identification of traitors. IEEE Trans. Inf. Theory 2003, 49, 852–865. [Google Scholar] [CrossRef]
Desmedt, Y.; Duif, N.; Tilborg, V.H.; Wang, H. Bounds and constructions for key distribution schemes. Adv. Math. Commun. 2009, 3, 273–293. [Google Scholar] [CrossRef]
Colbourn, C.J.; Keri, G.; Rivas Soriano, R.P.; Schlage-Puchta, J.-C. Covering and radius-covering arrays: constructions and classification. Discret. Appl. Math. 2010, 158, 1158–1180. [Google Scholar] [CrossRef][Green Version]
Assmus, E.F., Jr.; Key, J.D. Designs and Their Codes; Cambridge University Press: Cambridge, England, 1992.
Dyachkov, A.G.; Rykov, V.V. A coding model for a multiple-access adder channel. Probl. Inf. Transm. 1981, 17, 94–104.
Bar-David, I.; Plotnik, E.; Rom, R. Forward collision resolution—A technique for random multiple-access to the adder channel. IEEE Trans. Inf. Theory 1993, 39, 1671–1675.
Laarhoven, T. Efficient probabilistic group testing based on traitor tracing. In Proceedings of the 51st Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 2–4 October 2013.
Bar-Lev, S.K.; Boneh, A.; Perry, D. Incomplete identification models for group-testable items. Nav. Res. Logist. 1990, 37, 647–659.
Gandikota, V.; Grigorescu, E.; Jaggi, S.; Zhou, S. Nearly Optimal Sparse Group Testing. IEEE Trans. Inf. Theory 2019, 65, 2760–2773.
Chan, C.L.; Che, P.H.; Jaggi, S.; Saligrama, V. Non-adaptive probabilistic group testing with noisy measurements: near-optimal bounds with efficient algorithms. In Proceedings of the 49th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 28–30 September 2011.
Scarlett, J. Noisy Adaptive Group Testing: Bounds and Algorithms. IEEE Trans. Inf. Theory 2019, 65, 3646–3661.
Dorfman, R. The Detection of Defective Members of Large Populations. Ann. Math. Stat. 1943, 14, 436–440.
Damaschke, P. Threshold group testing. In General Theory of Information Transfer and Combinatorics; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4123, pp. 707–718.
Bui, T.V.; Kuribayashi, M.; Cheraghchi, M.; Echizen, I. Efficiently Decodable Non-Adaptive Threshold Group Testing. IEEE Trans. Inf. Theory 2019, 65, 5519–5528.
Bui, T.V.; Kuribayashi, M.; Cheraghchi, M.; Echizen, I. Improved encoding and decoding for non-adaptive threshold group testing. arXiv 2019, arXiv:1901.02283.
Chan, C.L.; Cai, S.; Bakshi, M.; Jaggi, S.; Saligrama, V. Near-Optimal Stochastic Threshold Group Testing. In Proceedings of the 2013 IEEE Information Theory Workshop, Sevilla, Spain, 9–13 September 2013.
Chen, H.; Bonis, A.D. An almost optimal algorithm for generalized threshold group testing with inhibitors. J. Comput. Biol. 2011, 18, 851–864.
De Marco, G.; Jurdzinski, T.; Rozanski, M.; Stachowiak, G. Subquadratic non-adaptive threshold group testing. Fundam. Comput. Theory 2017, 177–189.
Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: Hoboken, NJ, USA, 2009.
Seong, J.-T. A Bound for Finding Defective Samples in Threshold Group Testing. In Proceedings of the 2020 International Conference on Electronics, Information, and Communication (ICEIC), Barcelona, Spain, 19–22 January 2020.
Seong, J.-T. Density of Pooling Matrices vs. Sparsity of Signal of Group Testing Frameworks. IEICE Trans. Inf. Syst. 2019, E102, 1081–1084.

Figure 1. The number of tests M with respect to different gaps G = 0, 2, 4, 6, 8, and 10, and the density ratio of the group matrix, at N = 1000 and δ = 0.05 (evaluated at a probability of error of 10^{-5}).

Figure 2. The number of tests M with respect to the two thresholds and the density ratio of the group matrix, at N = 1000 and δ = 0.05 (evaluated at a probability of error of 10^{-5}).

Figure 3. Comparisons of the upper and lower bounds (evaluated at 10^{-5}) with different thresholds L and U for N = 1000, δ = 0.05, and G = 0: solid lines indicate the lower bounds and marked dashed lines indicate the upper bounds.

Figure 4. Comparisons of the upper and lower bounds (evaluated at 10^{-5}) with different gaps G for N = 1000 and δ = 0.05: solid lines indicate the lower bounds and marked dashed lines indicate the upper bounds.

Figure 5. Comparisons of the upper and lower bounds (evaluated at 10^{-5}) for N = 1000: solid lines indicate the lower bounds and dashed lines indicate the upper bounds.

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Seong, J.-T. Theoretical Bounds on Performance in Threshold Group Testing Schemes. Mathematics 2020, 8, 637. https://doi.org/10.3390/math8040637
