Article

Group Testing with Blocks of Positives and Inhibitors

1
Department of Computer Science, National University of Singapore, Singapore 117417, Singapore
2
National Institute of Informatics, Tokyo 101-8430, Japan
3
Department of Information and Communication Engineering, University of Tokyo, Tokyo 113-8654, Japan
4
Graduate School of Natural Science and Technology, Okayama University, Okayama 700-8530, Japan
5
National Institute of Technology, Tokyo College, Hachioji, Tokyo 193-0997, Japan
6
Faculty of Information Technology, University of Science, VNU-HCMC, Ho Chi Minh City 72711, Vietnam
7
Faculty of Information Technology, Vietnam National University, Ho Chi Minh City 720300, Vietnam
*
Author to whom correspondence should be addressed.
Entropy 2022, 24(11), 1562; https://doi.org/10.3390/e24111562
Submission received: 8 September 2022 / Revised: 15 October 2022 / Accepted: 27 October 2022 / Published: 30 October 2022
(This article belongs to the Special Issue Theory and Applications of Information Processing Algorithms)

Abstract

The main goal of group testing is to identify a small number of specific items among a large population of items. In this paper, we consider specific items as positives and inhibitors and non-specific items as negatives. In particular, we consider a novel model called group testing with blocks of positives and inhibitors. A test on a subset of items is positive if the subset contains at least one positive and does not contain any inhibitors, and it is negative otherwise. In this model, the input items are linearly ordered, and the positives and inhibitors are subsets of small blocks (at unknown locations) of consecutive items over that order. We also consider two specific instantiations of this model. The first instantiation is the model that contains a single block of consecutive items consisting of exactly known numbers of positives and inhibitors. The second instantiation is the model that contains a single block of consecutive items containing known numbers of positives and inhibitors. Our contribution is to propose efficient encoding and decoding schemes such that the numbers of tests used to identify only positives or both positives and inhibitors are smaller than those in the state-of-the-art schemes. Moreover, the decoding times mostly scale with the numbers of tests, and they are significantly smaller than the state-of-the-art ones, which scale with both the number of tests and the number of items.

1. Introduction

Group testing [1] was first introduced to reduce the time and cost of testing draftees who were possibly positive for syphilis. In this problem, syphilitic draftees are outnumbered by non-syphilitic draftees. The main idea of group testing is that, instead of testing draftees individually, sets of draftees are pooled and tested. If the test outcome of a pool is positive, then at least one draftee in that pool is syphilitic; otherwise, none of the draftees in the pool are syphilitic. Since this seminal work, group testing has usually been treated as the problem of identifying a small number of specific items in a large population of items. The specific items depend on the context and determine whether a test on a subset of items is positive or negative.
There are two general strategies for designing tests [2]. The first is adaptive group testing in which the design of a test depends on the designs of the previous tests. This approach usually attains an information-theoretic bound for the number of tests but consumes a substantial amount of time for implementation because of several design stages. To remedy its time-consuming nature while achieving a relatively low number of tests, non-adaptive group testing (NAGT) is used. In this strategy, all tests are designed independently and can be performed in parallel. Because of its advantage, NAGT has been used in a wide range of applications, such as computational and molecular biology [2,3], networking [4], COVID-19 [5,6], and neuroscience [7]. In this work, our focus is on the non-adaptive testing strategy.
NAGT can be represented by a $t \times n$ binary matrix $T = (t_{ij})$, where $n$ is the number of items and $t$ is the number of tests. An entry $t_{ij} = 1$ means that item (column) $j$ belongs to test (row) $i$, and $t_{ij} = 0$ means otherwise. The $j$th item is represented by the $j$th column of the matrix. The procedure to produce the measurement matrix is called construction, the procedure to obtain the outcomes of tests using the measurement matrix is called encoding, and the procedure to recover specific items from the outcomes is called decoding. A measurement matrix is random if some tests are generated by a probabilistic scheme, whereas it is deterministic if every test is deterministic. A measurement matrix is strongly explicit (respectively, explicit) if generating a column of it takes time and space polynomial in the number of rows (respectively, in the number of rows and the number of columns).
Distributional assumptions may also be placed on the specific items. There are two common settings: (i) the probabilistic setting, in which some probability distribution is placed on the specific items and some identification error probability is allowed; and (ii) the combinatorial setting, which is our focus here, in which no probability distribution is placed on the specific items.
Consider standard group testing, in which the specific items are only positives. A test on a subset of items is positive if the subset contains at least one positive and is negative otherwise. Throughout the paper, $\log$ refers to the base-2 logarithm. Given a population of $n$ items with up to $d$ positives, a number of works attain a low number of tests, say $t = O(d^2 \log^{1+o(1)} n)$, and/or a fast decoding time, say $\mathrm{poly}(d, \ln n)$, in the combinatorial setting [8,9,10,11,12,13,14]. In the probabilistic setting, Bondorf et al. [15] show that the number of tests can be reduced to $O(d \log n)$ with a decoding time of $O(d^2 \log d \cdot \log n)$. Price and Scarlett [16] later improved the decoding time to $O(d \log n)$.

1.1. New Model and Problem Definition

Motivated by a natural phenomenon in biology, a new type of item called an inhibitor was introduced in group testing [3] and subsequently studied [17,18,19,20]. An inhibitor item causes a negative outcome for any test it is involved in. On the other hand, a test on a subset of items is positive if the subset does not contain any inhibitor and contains at least one positive.
Group testing with blocks of positives, recently presented by Bui et al. [21], is a generalization of group testing with consecutive positives [22,23,24,25,26,27]. In this model, the $n$ input items are linearly ordered, all positives belong to at most $k$ blocks of consecutive items, and each block has up to $d$ consecutive items.
Combining the two models above, we consider a novel model called group testing with blocks of positives and inhibitors. The $n$ input items are linearly ordered. We sub-categorize the model into three models and illustrate them in Figure 1. The first model contains one block of $d + h$ consecutive items, and that block contains exactly $d$ positives and $h$ inhibitors. The second model, which generalizes the first one, contains one block of $D \ge d + h$ consecutive items, and that block contains up to $d$ positives and $h$ inhibitors. The third model, which is the most general one, contains multiple blocks, say $k$, of consecutive items in which each block of size up to $D \ge d + h$ contains up to $d$ positives and $h$ inhibitors. Note that the upper bounds on $k$, $h$, and $d$ are assumed to be known, e.g., obtained from previous statistics.
We formulate the three models above as follows. Sets of the form $C = \{c_1, \ldots, c_k\}$ used in this work are equipped with the linear order $c_i < c_{i+1}$ for $1 \le i < k$. We index the population of $n$ items from 1 to $n$, namely $N = \{1, 2, \ldots, n\}$. Let $\mathbf{x} = (x_1, \ldots, x_n)^T \in \{-1, 0, 1\}^n$ be the representation vector of the $n$ items, where $x_j = 1$ indicates that item $j$ is positive, $x_j = 0$ indicates that item $j$ is negative, and $x_j = -1$ indicates that item $j$ is an inhibitor. A test on a subset of items is positive if the subset contains at least one positive and does not contain any inhibitors. Otherwise, the test outcome is negative.
The test operation is denoted as $\odot$. Let $\mathbf{p} = (p_1, \ldots, p_n) \in \{0, 1\}^n$ be the representation vector of a test. Then, the outcome of the test $\mathbf{p}$ on the input vector $\mathbf{x}$, namely $\mathbf{p} \odot \mathbf{x}$, is positive (1) if there does not exist a $j$ such that $p_j = 1$ and $x_j = -1$, and there exists a $j$ such that $p_j = 1$ and $x_j = 1$. The test outcome is negative (0) otherwise. Given a measurement matrix $M$ of size $t \times n$ and an input vector $\mathbf{x}$, the corresponding outcome vector is $M \odot \mathbf{x} = [y_1, \ldots, y_t]^T$, where $y_i = M(i,:) \odot \mathbf{x}$.
There are two common decoding types based on the classification strategy. The first is to identify only the positives, while the second is to identify both positives and inhibitors. Our objective is to find efficient encoding and decoding schemes for both decoding types, i.e., schemes that minimize the number of tests and the decoding time.

1.2. Contributions

Overview: We study group testing with blocks of positives and inhibitors and provide efficient encoding and decoding schemes to tackle it. By leveraging the knowledge that the positives and inhibitors belong to a small interval of size $D$, our objective is to identify the position of some positive, say $j^*$; then, the indices of all positives and inhibitors must belong to the range from $\max\{1, j^* - D + 1\}$ to $\min\{j^* + D - 1, n\}$. Appropriate tests are then designed to precisely identify the positives and inhibitors within this range.
Our proposed scheme includes two procedures, namely the filtering and scrutinizing procedures. The tests in the filtering procedure remove most negative items and leave one or more subsets of size up to $2D$ that contain all positives, all inhibitors, and possibly some negatives. The tests in the scrutinizing procedure remove all remaining negatives and then identify the positives and inhibitors. The details of the two procedures are specified according to each specific problem.
The contributions for a single block of (consecutive) positives and inhibitors are summarized in Theorem 1. The proofs for the results of the first and second models are described in Section 3 and Section 4, respectively.
Theorem 1. 
Let $1 \le d, h$ and $d + h \le n$ be integers. Suppose that a population of $n$ linearly ordered items includes exactly $d$ positives and $h$ inhibitors in a block of $D \ge d + h$ items that are consecutive in that order. When $D = d + h$ (respectively, $D \ge d + h$), there exists a deterministic and strongly explicit measurement matrix of size $O\left(h \log \frac{hn}{d+h} + d\right) \times n$ (respectively, $O(D \log n) \times n$) that can be used to identify all positives in $O\left(h \log \frac{hn}{d+h} + d\right)$ (respectively, $O(D \log n)$) time. Moreover, it requires $O\left((d+h)^3 \log \frac{n}{d+h}\right)$ (respectively, $O\left(D \log n + D^3 \log (n/D)\right)$) tests to identify all positives and inhibitors in $O\left((d+h)^4 \log \frac{n}{d+h}\right)$ (respectively, $O\left(D \log n + D^4 \log (n/D)\right)$) time.
The contribution for blocks of positives and inhibitors is summarized in the following theorem, which is proved later in Section 5.
Theorem 2. 
Let $1 \le d, h$ and $d + h \le D \le n$ be integers. Suppose that a population of $n$ items is linearly ordered and the positives and inhibitors belong to at most $k$ blocks of consecutive items in which each block has a size of up to $D$ and contains up to $d$ positives and $h$ inhibitors. Then, there exists a deterministic and strongly explicit measurement matrix of size $O\left(D k^2 \left(D + \log \frac{n}{D}\right) \log \frac{n}{D}\right) \times n$ that can be used to identify all positives in $O\left(D k^2 \left(D + \log \frac{n}{D}\right) \log \frac{n}{D}\right)$ time. Moreover, it requires $O\left(D^2 k^2 \left(D + \log \frac{n}{D}\right) \log \frac{n}{D}\right)$ tests to identify all positives and inhibitors in time
$$O\left(D k^2 \log \frac{n}{kD} \left(D + \log \frac{n}{D}\right) + k^4 D^4 \log \frac{n}{kD}\right).$$

2. Preliminaries

Disjunct matrices were first introduced by Kautz and Singleton [28] as superimposed codes and then generalized by Stinson and Wei [29] and D’yachkov et al. [30]. We later use them for identifying both positives and inhibitors. Let the support set of a vector $\mathbf{v} = (v_1, \ldots, v_w)$ be $\mathrm{supp}(\mathbf{v}) = \{j \mid v_j \neq 0\}$ and let $|\mathbf{v}| = |\{j \mid v_j \neq 0\}|$. We denote by $M(i,:)$ and $M(:,j)$ the $i$th row and the $j$th column of matrix $M$, respectively. The formal definition of a disjunct matrix is as follows.
Definition 1. 
An $m \times n$ binary matrix $M$ is called an $(n, v, u)$-disjunct matrix if, for any two disjoint subsets $S_1, S_2 \subset [n]$ such that $|S_1| = v$ and $|S_2| = u$, there exists at least one row in which all the columns in $S_2$ have 1s while all the columns in $S_1$ have 0s, i.e., $\left| \bigcap_{j \in S_2} \mathrm{supp}(M(:,j)) \setminus \bigcup_{j \in S_1} \mathrm{supp}(M(:,j)) \right| \ge 1$.
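Definition 1 can be checked by brute force on very small matrices. The sketch below is an illustration only: the function name `is_disjunct` is hypothetical, the matrix is a list of 0/1 rows with 0-based column indices, and the exhaustive search over all pairs $(S_1, S_2)$ is practical only for tiny $n$.

```python
from itertools import combinations
from typing import Sequence


def is_disjunct(M: Sequence[Sequence[int]], v: int, u: int) -> bool:
    """Brute-force check of the (n, v, u)-disjunct property.

    For every disjoint pair (S1, S2) with |S1| = v and |S2| = u, some row
    must have 1s in all columns of S2 and 0s in all columns of S1.
    """
    n = len(M[0])
    for S2 in combinations(range(n), u):
        remaining = [j for j in range(n) if j not in S2]
        for S1 in combinations(remaining, v):
            separated = any(
                all(row[j] == 1 for j in S2) and all(row[j] == 0 for j in S1)
                for row in M
            )
            if not separated:
                return False
    return True


# The 3 x 3 identity matrix isolates single columns but never covers two.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
single_ok = is_disjunct(identity, 2, 1)
pair_ok = is_disjunct(identity, 1, 2)
```

The identity matrix passes for $u = 1$ because each row isolates one column, and fails for $u = 2$ because no row contains two columns at once.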
Chen et al. [31] gave an upper bound on the number of rows for ( n , v , u ) -disjunct matrices as follows.
Theorem 3 
([31] Theorem 3.2). For any positive integers $v$, $u$, and $n$ with $x = v + u \le n$, there exists a $t \times n$ $(n, v, u)$-disjunct matrix with the following number of rows.
$$t(n, v, u) = O\left(\left(\frac{x}{u}\right)^{u} \left(\frac{x}{v}\right)^{v} x \log \frac{n}{x}\right).$$
Once $u = 1$, $(n, v, 1)$-disjunct matrices become $v$-disjunct matrices. The following theorem states the construction and decoding time for a $d$-disjunct matrix.
Theorem 4 
([11] Theorem 16). Let $1 \le d \le n$. Then, there exists a deterministic and explicit $t \times n$ $d$-disjunct matrix with $t = O(d^2 \log n)$ that can be decoded in time polynomial in $t$.

3. Single Block of Consecutive Positives and Inhibitors

In this section, we consider the case when the positives and inhibitors are consecutive and the numbers of positives and inhibitors are known in advance. Set D = d + h .

3.1. Encoding Procedure

Set $a = \lfloor D/(h+1) \rfloor$ and $\kappa = \lceil n/a \rceil$. A super item, denoted as $\bar{\cdot}$, is a set of consecutive items. The $n$ items are distributed into $\kappa$ super items indexed from 1 to $\kappa$, and each super item contains exactly $a$ items, except that the last one may contain fewer than $a$ items. Let $\bar{E} = \{\bar{1}, \bar{2}, \ldots, \bar{\kappa}\}$ be the set of super items generated from $N$, where $\mathrm{set}(\bar{j}) = \{(j-1)a + 1, \ldots, ja\}$ for $j = 1, \ldots, \kappa - 1$ and $\mathrm{set}(\bar{\kappa}) = \{(\kappa - 1)a + 1, \ldots, n\}$. Let $\chi_{\bar{E}} = (\chi_1, \ldots, \chi_\kappa)$ be the characteristic vector of $\bar{E}$, where $\chi_j = 1$ if the test on $\bar{j}$ is positive and $\chi_j = 0$ otherwise.
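The partition into super items can be sketched as follows. This is a minimal illustration, assuming that the rounding in $a$ is a floor (the brackets are not recoverable from the text, and the floor is what keeps $h \cdot a < D$ in the correctness argument of Section 3.2); the function name `super_items` is hypothetical.

```python
from typing import List


def super_items(n: int, d: int, h: int) -> List[List[int]]:
    """Split items 1..n into super items of a consecutive items each."""
    D = d + h
    a = D // (h + 1)      # assumption: a is rounded down
    kappa = -(-n // a)    # ceil(n / a) using integer arithmetic
    return [
        list(range((j - 1) * a + 1, min(j * a, n) + 1))
        for j in range(1, kappa + 1)
    ]


# Running example of this section: n = 12, d = 4, h = 2 gives a = 2, kappa = 6.
groups = super_items(12, 4, 2)
# With n = 11 the last super item is shorter than a.
short_tail = super_items(11, 4, 2)
```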

3.1.1. Filtering Matrices

We create $h + 2$ filtering matrices in the filtering procedure as follows. Let $F = [\mathbf{f}_1, \ldots, \mathbf{f}_\kappa]$ be an $f \times \kappa$ (indexing) binary matrix whose $j$th column is the $f$-bit binary representation of the integer $j$, where $f = \lceil \log(\kappa + 1) \rceil$. Clearly, the index $j$ is uniquely identified by $\mathbf{f}_j$. We then generate $h + 2$ binary matrices $F^{(u)} = [F_1^{(u)}, \ldots, F_\kappa^{(u)}]$ for $u = 1, \ldots, h + 2$, such that column $F^{(u)}(:, j)$ is a zero vector if $j \not\equiv u \bmod (h + 2)$ while $F^{(u)}(:, j) = \mathbf{f}_j$ if $j \equiv u \bmod (h + 2)$. For example, let $n = 12$, $d = 4$, and $h = 2$. We obtain $a = (d + h)/(h + 1) = 2$ and $\kappa = n/a = 6$. Since $F = [\mathbf{f}_1, \mathbf{f}_2, \mathbf{f}_3, \mathbf{f}_4, \mathbf{f}_5, \mathbf{f}_6]$, we obtain $F^{(1)} = [\mathbf{f}_1, \mathbf{0}, \mathbf{0}, \mathbf{0}, \mathbf{f}_5, \mathbf{0}]$, $F^{(2)} = [\mathbf{0}, \mathbf{f}_2, \mathbf{0}, \mathbf{0}, \mathbf{0}, \mathbf{f}_6]$, $F^{(3)} = [\mathbf{0}, \mathbf{0}, \mathbf{f}_3, \mathbf{0}, \mathbf{0}, \mathbf{0}]$, and $F^{(4)} = [\mathbf{0}, \mathbf{0}, \mathbf{0}, \mathbf{f}_4, \mathbf{0}, \mathbf{0}]$. Among every $h + 2$ consecutive columns in $F^{(u)}$, there exists only one non-zero column. Therefore, $F^{(u)}$ is used to “isolate” each super item in the $h + 1$ super items generated from the set of $D$ positives and inhibitors.
Let $\mathbf{y}_{F^{(u)}} = [y_{F^{(u)}}(1), \ldots, y_{F^{(u)}}(f)]^T$ be the outcome vector obtained by using the testing matrix $F^{(u)}$ with the set of super items $\bar{E}$. In particular, if $F^{(u)}(i, j) = 1$ (respectively, $F^{(u)}(i, j) = 0$), then all items in the super item $\bar{j}$ belong (respectively, do not belong) to test $i$. Therefore, we obtain the following.
$$\mathbf{y}_{F^{(u)}} = F^{(u)} \odot \chi_{\bar{E}}.$$
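The construction of the filtering matrices in the example can be reproduced with a short sketch. The names `index_columns`, `filtering_matrices`, and `column_to_index` are hypothetical helpers introduced for illustration, and each matrix is stored as a list of columns (most significant bit first) rather than as a list of rows.

```python
from typing import Dict, List


def index_columns(kappa: int) -> List[List[int]]:
    """Column j (1-based) is the f-bit binary representation of j."""
    f = kappa.bit_length()  # equals ceil(log2(kappa + 1))
    return [[(j >> (f - 1 - i)) & 1 for i in range(f)]
            for j in range(1, kappa + 1)]


def filtering_matrices(kappa: int, h: int) -> Dict[int, List[List[int]]]:
    """F^(u) keeps column j of F when j = u mod (h + 2), else a zero column."""
    cols = index_columns(kappa)
    zero = [0] * len(cols[0])
    return {
        u: [cols[j - 1] if (j - u) % (h + 2) == 0 else zero
            for j in range(1, kappa + 1)]
        for u in range(1, h + 3)
    }


def column_to_index(col: List[int]) -> int:
    """Convert a binary column back into the integer it encodes."""
    return int("".join(map(str, col)), 2)


# Example from the text: kappa = 6 and h = 2 give four filtering matrices.
mats = filtering_matrices(6, 2)
nonzero = {u: [j for j, c in enumerate(cols, start=1) if any(c)]
           for u, cols in mats.items()}
```

Here `nonzero` lists, for each $u$, the super-item indices whose columns survive in $F^{(u)}$, which matches the pattern written out in the example.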

3.1.2. Sanitizing Matrices

In the sanitizing procedure, the measurement matrix depends on whether the objective is to identify positives only or to identify both positives and inhibitors. For the first objective, we design an $s \times n$ matrix $S$ such that $S(i, j) = 1$ if $i \equiv j \bmod s$ and $S(i, j) = 0$ otherwise, where $s = 2D - 1$. In other words, each test contains items spaced $2D - 1$ apart in the linear order. For example, when $n = 12$, $d = 4$, and $h = 2$, we obtain $s = 11$ and the following.
$$S = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}.$$
It is straightforward that $S$ (respectively, $F^{(u)}$) is deterministic and strongly explicit because each column in it can be generated in time and space $O(2D) = O(d + h)$ (respectively, $O(\log \kappa)$).
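The matrix $S$ and the property used later in decoding (any $2D - 1$ consecutive columns form a permutation of an identity matrix) can be checked with the following sketch. The function names are hypothetical, and the matrix is stored as a list of rows with 0-based Python indices standing in for the 1-based indices of the text.

```python
from typing import List


def scrutinizing_matrix(n: int, D: int) -> List[List[int]]:
    """Build the s x n matrix S with S(i, j) = 1 iff i = j mod s, s = 2D - 1."""
    s = 2 * D - 1
    return [[1 if (j - i) % s == 0 else 0 for j in range(1, n + 1)]
            for i in range(1, s + 1)]


def window_is_permutation(S: List[List[int]], start: int, width: int) -> bool:
    """Check that columns start..start+width-1 hold exactly one 1 per row and column."""
    cols_ok = all(sum(row[j] for row in S) == 1
                  for j in range(start, start + width))
    rows_ok = all(sum(row[start:start + width]) == 1 for row in S)
    return cols_ok and rows_ok


# Example from the text: n = 12, d = 4, h = 2, hence D = 6 and s = 11.
S = scrutinizing_matrix(12, 6)
windows_ok = all(window_is_permutation(S, start, 11) for start in (0, 1))
```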
For the second objective, i.e., the objective of identifying both positives and inhibitors, we design an additional matrix $R$ along with matrix $S$. Let $R$ be an $r \times n$ $(n, 2D - 3, 2)$-disjunct matrix as defined in Definition 1. Therefore, we have $r = O(D^3 \log (n/D)) = O((d + h)^3 \log (n/(d + h)))$ as in Theorem 3.
Let $\mathbf{y}_S = [y_S(1), \ldots, y_S(s)]^T$ (respectively, $\mathbf{y}_R = [y_R(1), \ldots, y_R(r)]^T$) be the outcome vector obtained by using the testing matrix $S$ (respectively, $R$) with the input set $N$. In particular, we have the following.
$$\mathbf{y}_S = S \odot \mathbf{x} \quad \text{and} \quad \mathbf{y}_R = R \odot \mathbf{x}.$$
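The bound on $r$ follows by instantiating Theorem 3 with $v = 2D - 3$ and $u = 2$, so that $x = v + u = 2D - 1$. A short worked derivation, assuming $D \ge 2$ (so that $v \ge 1$) and $n \ge 2D - 1$, is:

$$\left(\frac{x}{u}\right)^{u} = \left(\frac{2D-1}{2}\right)^{2} < D^2, \qquad \left(\frac{x}{v}\right)^{v} = \left(1 + \frac{2}{2D-3}\right)^{2D-3} < e^2, \qquad x \log \frac{n}{x} = (2D-1) \log \frac{n}{2D-1} \le 2D \log \frac{n}{D},$$

$$r = O\left(D^2 \cdot e^2 \cdot 2D \log \frac{n}{D}\right) = O\left(D^3 \log \frac{n}{D}\right).$$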

3.2. Decoding Procedure and Correctness

We first approximately locate some positive items by using the outcome vectors $\mathbf{y}_{F^{(1)}}, \ldots, \mathbf{y}_{F^{(h+2)}}$. Then, we can locate a set of up to $2D - 1$ items that contains all positives and inhibitors. We call this set the set of interest. By using $\mathbf{y}_S$, we can exactly identify all positives in that set. If $\mathbf{y}_R$ is also used, all inhibitors are identified as well. The details of the decoding procedure are as follows.
Let $\lambda$ be an index such that $\mathbf{y}_{F^{(\lambda)}}$ is not a zero vector. Such a $\lambda$ always exists. Indeed, because there are $h$ inhibitors among the $D$ consecutive positives and inhibitors, and each super item contains up to $\lfloor D/(h+1) \rfloor$ items, the total number of items contained in super items having inhibitors is up to $h \lfloor D/(h+1) \rfloor < D$. Therefore, there must exist a super item $\bar{\alpha}$, for some $1 \le \alpha \le \kappa$, that contains at least one positive and does not contain any inhibitor. Let $\lambda$ be the index such that $F^{(\lambda)}(:, \alpha) \neq \mathbf{0}$. Since two consecutive non-zero columns in $F^{(u)}$ are spaced by $h + 2$, we have $\mathbf{y}_{F^{(\lambda)}} = F^{(\lambda)} \odot \chi_{\bar{E}} = F^{(\lambda)}(:, \alpha)$. Therefore, to identify $\alpha$, we convert the non-zero vector $\mathbf{y}_{F^{(\lambda)}}$ into a decimal number. The indices of all positives and inhibitors, thus, must belong to the range from $\max\{1, j^* - D + 1\}$ to $\min\{j^* + D - 1, n\}$. The decoding complexity of identifying $\lambda$ is therefore $O(hf)$.
Because of the construction of $S$, any matrix composed of $2D - 1$ consecutive columns in it is a permutation of the $(2D - 1) \times (2D - 1)$ identity matrix. Therefore, given the indices from $\max\{1, \alpha - D + 1\}$ to $\min\{\alpha + D - 1, n\}$, one can identify which items are positive based on the corresponding outcome vector $\mathbf{y}_S$. The decoding complexity of $\mathbf{y}_S$ is, therefore, $O(D) = O(d + h)$.
After identifying the $d$ positives, the set of interest contains up to $2D - 1 - d = d + 2h - 1$ potential inhibitors. Because of the construction of $R$, for any two items and any other $2D - 3$ items, there exists a test that contains the two items and does not contain the other $2D - 3$ items. Therefore, one can decide whether a potential inhibitor is truly an inhibitor by checking a row that contains it together with one positive and does not contain the remaining items in the set of interest: the outcome of that row is negative if and only if the candidate is an inhibitor. Since there are up to $2D - 1 - d$ potential inhibitors and the number of rows in $R$ is $r$, this procedure to identify inhibitors takes $O(r(2D - 1 - d)) = O(r(d + h))$ time.
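The inhibitor check can be sketched on the same toy instance. As a loudly stated assumption, the sketch replaces the $(n, 2D - 3, 2)$-disjunct matrix $R$ of Theorem 3 by the trivial design containing every pair of items, which has the required separating property but uses $\binom{n}{2}$ tests instead of $O(D^3 \log (n/D))$. All function names are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Sequence, Set


def pool_outcome(pool: Set[int], x: Sequence[int]) -> int:
    """Outcome of a test given as a set of 1-based item indices."""
    pos = any(x[j - 1] == 1 for j in pool)
    inh = any(x[j - 1] == -1 for j in pool)
    return int(pos and not inh)


def find_inhibitors(pools: Sequence[Set[int]], outcomes: Sequence[int],
                    interest: Iterable[int], positives: Iterable[int]) -> List[int]:
    """Classify each non-positive item of the set of interest.

    A candidate c is tested through a pool that contains c, exactly one known
    positive, and no other item of the set of interest; that pool is negative
    if and only if c is an inhibitor.
    """
    interest, positives = set(interest), set(positives)
    inhibitors = []
    for c in sorted(interest - positives):
        for pool, y in zip(pools, outcomes):
            inside = pool & interest
            if c in pool and len(inside) == 2 and (inside - {c}) <= positives:
                if y == 0:
                    inhibitors.append(c)
                break
    return inhibitors


# Toy instance (illustrative): positives 4, 6, 7, 9 and inhibitors 5, 8.
n = 12
x = [0, 0, 0, 1, -1, 1, 1, -1, 1, 0, 0, 0]
# Stand-in for R (assumption): one test per pair of items.
pools = [set(pair) for pair in combinations(range(1, n + 1), 2)]
outcomes = [pool_outcome(p, x) for p in pools]
inhibitors = find_inhibitors(pools, outcomes, range(1, 12), [4, 6, 7, 9])
```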

3.3. Decoding Complexity and Number of Tests

As analyzed in the previous section, to identify the positives only, the number of required tests and the decoding complexity are as follows.
$$O((h + 2)f + s) = O\left(h \log \frac{hn}{d+h}\right) + O(d + h) = O\left(h \log \frac{hn}{d+h} + d\right).$$
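To give a feeling for the size of this bound, the exact count $(h + 2)f + s$ can be evaluated for concrete parameters. The sketch below assumes the floor rounding $a = \lfloor D/(h+1) \rfloor$ and the hypothetical function name `positives_only_tests`; the parameter values are illustrative and not taken from the paper.

```python
def positives_only_tests(n: int, d: int, h: int) -> int:
    """Number of tests (h + 2) * f + s for the first model (D = d + h)."""
    D = d + h
    a = D // (h + 1)        # assumption: super-item size is rounded down
    kappa = -(-n // a)      # ceil(n / a)
    f = kappa.bit_length()  # ceil(log2(kappa + 1)) rows per filtering matrix
    s = 2 * D - 1           # rows of the matrix S
    return (h + 2) * f + s


small = positives_only_tests(12, 4, 2)       # running example of Section 3
large = positives_only_tests(10**6, 4, 2)    # one million items
```

For the running example the count is dominated by $s$, while for a large population the logarithmic term takes over and the count stays far below $n$.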
To identify both the positives and inhibitors, i.e., classify all items, the required number of tests is as follows.
$$(h + 2)f + s + r = O\left(h \log \frac{hn}{d+h}\right) + O(d + h) + O\left((d + h)^3 \log \frac{n}{d+h}\right) = O\left((d + h)^3 \log \frac{n}{d+h}\right).$$
The corresponding decoding complexity is as follows.
$$O(hf) + O(d + h) + O(r(d + h)) = O\left((d + h)^4 \log \frac{n}{d+h}\right).$$

4. Single Block of Positives and Inhibitors

In this section, we consider the case when the positives and inhibitors are not necessarily consecutive but belong to a small block (set) of consecutive items of size up to $D \ge d + h$, where $d$ and $h$ are the maximum numbers of positives and inhibitors in the population of $n$ items.
In the encoding procedure, we use the same techniques as in Section 3.1 but adjust some parameters. In the filtering procedure, we set $a = 1$, i.e., every super item reduces to an item. Therefore, $\kappa$ is equal to $n$. Moreover, we create $D$ filtering matrices, i.e., $h + 2$ is replaced by $D$. In the sanitizing procedure, the parameter $s$ in the $s \times n$ matrix $S$ is set to be $2D - 1$. Matrix $R$ is an $r \times n$ $(n, 2D - 3, 2)$-disjunct matrix as defined in Definition 1. Therefore, we have $r = O(D^3 \log (n/D))$ as in Theorem 3.
Since the decoding procedure and the proofs of correctness are the same as in Section 3.2, we only pay attention to the required numbers of tests and the decoding complexities. Each matrix $F^{(u)}$ has a size of $f \times n$, where $f = \lceil \log (n + 1) \rceil = O(\log n)$, for $u = 1, \ldots, D$. The numbers of tests in matrices $S$ and $R$ are $s = 2D - 1$ and $r = O(D^3 \log (n/D))$, respectively.
To identify the positives only, the number of required tests and the decoding complexity are as follows.
$$Df + s = O(D \log n) + O(D) = O(D \log n).$$
To identify both the positives and inhibitors, the required number of tests is described as follows.
$$Df + s + r = O(D \log n) + O(D) + O(D^3 \log (n/D)) = O\left(D \log n + D^3 \log (n/D)\right).$$
The corresponding decoding complexity is as follows.
$$O(Df) + O(D) + O(rD) = O\left(D \log n + D^4 \log (n/D)\right).$$

5. Blocks of Positives and Inhibitors

In this section, we consider a model consisting of multiple blocks of positives and inhibitors, in which all positives and inhibitors belong to at most $k$ special blocks of consecutive items and each block has up to $D$ consecutive items. Moreover, each special block contains up to $d$ positives and up to $h$ inhibitors.

5.1. Encoding Procedure

We generate $D$ sets from the set of $n$ items $N$ as follows. For $u = 1, \ldots, D$, set $N^{(u)} = \{u, D + u, 2D + u, \ldots\}$, whose last element is the largest number not exceeding $n$ that is congruent to $u$ modulo $D$, and let $\mathbf{x}^{(u)} = (x_u, x_{D+u}, x_{2D+u}, \ldots)^T$ be the restriction of $\mathbf{x}$ to $N^{(u)}$. Let $n_u = |N^{(u)}|$; clearly, $n_u \le \lceil n/D \rceil$. Since each special block has up to $D$ items and there are up to $k$ special blocks, each set $N^{(u)}$ contains up to $k$ positives and inhibitors in total. Moreover, for each special block $\tau$, there exists an index $u_\tau$ such that some positive item in that block belongs to $N^{(u_\tau)}$ because two consecutive items in $N^{(u)}$ are spaced apart by $D$ and each special block has up to $D$ items.
Let $M^{(u)}$ be an $m_u \times n_u$ $k$-disjunct matrix. We then obtain $m_u = O(k^2 \log n_u) = O(k^2 \log (n/D))$ as in Theorem 4. Let $B^{(u)}$ be a $b \times n_u$ index matrix:
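The sets $N^{(u)}$ are simply the residue classes of the item indices modulo $D$. A minimal sketch, with the hypothetical function name `residue_classes` and illustrative values $n = 10$, $D = 3$, shows why a block of at most $D$ consecutive items contributes at most one item to each class.

```python
from typing import Dict, List


def residue_classes(n: int, D: int) -> Dict[int, List[int]]:
    """N^(u) = {u, D + u, 2D + u, ...} restricted to 1..n, for u = 1..D."""
    return {u: list(range(u, n + 1, D)) for u in range(1, D + 1)}


classes = residue_classes(10, 3)

# Every block of D consecutive items meets each class in at most one item.
blocks_ok = all(
    len(set(range(start, start + 3)) & set(members)) <= 1
    for start in range(1, 9)
    for members in classes.values()
)
```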
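$$B^{(u)} := \begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_{n_u} \\ \bar{\mathbf{b}}_1 & \bar{\mathbf{b}}_2 & \cdots & \bar{\mathbf{b}}_{n_u} \end{bmatrix} = \begin{bmatrix} B_1^{(u)} & \cdots & B_{n_u}^{(u)} \end{bmatrix},$$
where $b = 2\lceil \log n_u \rceil$, $\mathbf{b}_j$ is the $\lceil \log n_u \rceil$-bit binary representation of the integer $j - 1$, $\bar{\mathbf{b}}_j$ is the complement of $\mathbf{b}_j$, and $B_j^{(u)} := \begin{bmatrix} \mathbf{b}_j \\ \bar{\mathbf{b}}_j \end{bmatrix}$ for $j = 1, 2, \ldots, n_u$. Item $j$ is characterized by column $B_j^{(u)}$, and the weight of every column in $B^{(u)}$ is $b/2 = \lceil \log n_u \rceil$. Furthermore, the index $j$ is uniquely identified by $\mathbf{b}_j$. For example, if we set $n_u = 8$, then $b = 2 \log n_u = 6$, and the matrix above becomes the following.
$$B^{(u)} = \begin{bmatrix}
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 1 & 0
\end{bmatrix}.$$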
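The role of the complemented half of $B^{(u)}$ can be seen in a small sketch: a single column always has weight $\lceil \log n_u \rceil$, whereas the union of two distinct columns has a strictly larger weight, so the weight test tells one positive from several. The helper names are hypothetical, columns are stored as Python lists, and the sketch assumes a test without inhibitors so that the outcome is the bitwise OR of the positives' columns.

```python
from typing import List, Optional


def index_matrix_columns(n_u: int) -> List[List[int]]:
    """Column j is [b_j; complement of b_j], with b_j the binary form of j - 1."""
    half = max(1, (n_u - 1).bit_length())  # ceil(log2(n_u)), at least one bit
    cols = []
    for j in range(1, n_u + 1):
        bits = [((j - 1) >> (half - 1 - i)) & 1 for i in range(half)]
        cols.append(bits + [1 - bit for bit in bits])
    return cols


def union(c1: List[int], c2: List[int]) -> List[int]:
    """Outcome of a test holding two positives: bitwise OR of their columns."""
    return [a | b for a, b in zip(c1, c2)]


def decode_single(y: List[int]) -> Optional[int]:
    """Return the 1-based index if y is a single column, otherwise None."""
    half = len(y) // 2
    if sum(y) != half:
        return None
    return int("".join(map(str, y[:half])), 2) + 1


cols = index_matrix_columns(8)
rows = [list(r) for r in zip(*cols)]       # the 6 x 8 matrix of the example
single = decode_single(cols[4])            # column of item 5
mixed = decode_single(union(cols[2], cols[5]))
```

The added 1 in `decode_single` accounts for $\mathbf{b}_j$ encoding $j - 1$ rather than $j$.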
Finally, matrices S and R are defined as in Section 3.1.2. Note that D is not set to be d + h here.
We are now ready to generate a filtering matrix and a scrutinizing matrix. The filtering matrix corresponding to matrix $M^{(u)}$ is as follows:
$$F^{(u)} = \begin{bmatrix} M^{(u)}(1,:) \\ B^{(u)} \times \mathrm{diag}(M^{(u)}(1,:)) \\ \vdots \\ M^{(u)}(m_u,:) \\ B^{(u)} \times \mathrm{diag}(M^{(u)}(m_u,:)) \end{bmatrix},$$
where $\mathrm{diag}(\cdot)$ is the diagonal matrix generated by the input vector.
The vector observed after performing the tests given by the measurement matrix $F^{(u)}$ is described as follows:
$$\mathbf{y}^{(u)} = F^{(u)} \odot \mathbf{x}^{(u)} = \begin{bmatrix} M^{(u)}(1,:) \odot \mathbf{x}^{(u)} \\ B^{(u)} \odot \mathbf{x}_1^{(u)} \\ \vdots \\ M^{(u)}(m_u,:) \odot \mathbf{x}^{(u)} \\ B^{(u)} \odot \mathbf{x}_{m_u}^{(u)} \end{bmatrix} = \begin{bmatrix} y_1^{(u)} \\ \mathbf{y}_1^{(u)} \\ \vdots \\ y_{m_u}^{(u)} \\ \mathbf{y}_{m_u}^{(u)} \end{bmatrix},$$
where $\mathbf{x}_i^{(u)} = \mathrm{diag}(M^{(u)}(i,:)) \times \mathbf{x}^{(u)}$, $y_i^{(u)} = M^{(u)}(i,:) \odot \mathbf{x}^{(u)}$, and $\mathbf{y}_i^{(u)} = B^{(u)} \odot \mathbf{x}_i^{(u)}$, for $i = 1, 2, \ldots, m_u$. Entry $y_i^{(u)}$ indicates whether test $i$ contains at least one positive and no inhibitor. If so, the vector $\mathbf{y}_i^{(u)}$ tells us whether the test contains exactly one positive or more than one positive.
Let $\mathrm{expand}(M^{(u)}(i,:))$ be the $n$-dimensional binary vector derived from $M^{(u)}(i,:)$ as follows: for any $j \in N^{(u)}$ with $M^{(u)}(i, j) = 1$, every entry of $\mathrm{expand}(M^{(u)}(i,:))$ indexed from $\max\{j - D + 1, 1\}$ to $\min\{j + D - 1, n\}$ is set to 1, and all remaining entries are 0. This vector is used to identify a block of $2D - 1$ consecutive items that contains at least one positive item. In particular, to identify positives only, the scrutinizing matrix corresponding to matrix $M^{(u)}$ is defined as follows:
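The expansion step can be sketched as follows. This is an illustration under assumptions: the function name and argument order are hypothetical, a row of $M^{(u)}$ is given as a 0/1 list over the elements of $N^{(u)}$ in increasing order, and the $c$-th element of $N^{(u)}$ (counting from 0) is item $u + cD$.

```python
from typing import List, Sequence


def expand(row: Sequence[int], u: int, n: int, D: int) -> List[int]:
    """Spread each selected item j of N^(u) over the window j-D+1 .. j+D-1."""
    out = [0] * n
    for c, bit in enumerate(row):
        if bit:
            j = u + c * D  # the c-th item of N^(u), 1-based item index
            for i in range(max(j - D + 1, 1), min(j + D - 1, n) + 1):
                out[i - 1] = 1
    return out


# Illustrative values: n = 8, D = 3, u = 2, so N^(2) = {2, 5, 8}.
# The row selects item 5, whose window is items 3..7.
window = expand([0, 1, 0], u=2, n=8, D=3)
```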
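$$S^{(u)} = \begin{bmatrix} S \times \mathrm{diag}(\mathrm{expand}(M^{(u)}(1,:))) \\ \vdots \\ S \times \mathrm{diag}(\mathrm{expand}(M^{(u)}(m_u,:))) \end{bmatrix},$$
where $S$ is defined in Section 3.1.2, and the outcome vector obtained by using this matrix is as follows:
$$\mathbf{s}^{(u)} = S^{(u)} \odot \mathbf{x} = \begin{bmatrix} S \odot (\mathrm{diag}(\mathrm{expand}(M^{(u)}(1,:))) \times \mathbf{x}) \\ \vdots \\ S \odot (\mathrm{diag}(\mathrm{expand}(M^{(u)}(m_u,:))) \times \mathbf{x}) \end{bmatrix} = \begin{bmatrix} \mathbf{s}_1^{(u)} \\ \vdots \\ \mathbf{s}_{m_u}^{(u)} \end{bmatrix},$$
where $\mathbf{s}_i^{(u)} = S \odot (\mathrm{diag}(\mathrm{expand}(M^{(u)}(i,:))) \times \mathbf{x})$, for $i = 1, 2, \ldots, m_u$.
To identify both positives and inhibitors, an additional scrutinizing $(n, kD - 2, 2)$-disjunct matrix $R$ is used. Let $\mathbf{r}$ be the outcome vector obtained by using this matrix.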

5.2. Decoding Procedure and Correctness

For each $u \in \{1, \ldots, D\}$, we first scan $\mathbf{y}^{(u)}$ to locate some positive item in some block. The decoding procedure is as follows. First, find $1 \le i \le m_u$ such that $y_i^{(u)} = 1$ and $|\mathbf{y}_i^{(u)}| = \lceil \log n_u \rceil$. Second, let $\alpha$ be the decimal number corresponding to the first half of $\mathbf{y}_i^{(u)}$. Then, similarly to the arguments in Section 3.2, since any matrix composed of $2D - 1$ consecutive columns in $S$ is a permutation of the $(2D - 1) \times (2D - 1)$ identity matrix, one can identify which items are positive based on the corresponding outcome vector $\mathbf{s}_i^{(u)}$. Finally, all inhibitors in a block can be identified by using $\mathbf{r}_i^{(u)}$.
Such an $i$ always exists in the first step. Indeed, as shown in Section 5.1, for each special block $\tau$, there exists an index $u_\tau$ such that some positive item in that block belongs to $N^{(u_\tau)}$. Since each set $N^{(u_\tau)}$ contains up to $k$ positives and inhibitors and $M^{(u_\tau)}$ is a $k$-disjunct matrix, there must exist a row $i$ such that $M^{(u_\tau)}(i,:)$ contains only that positive. Therefore, $y_i^{(u_\tau)} = 1$ and $|\mathbf{y}_i^{(u_\tau)}| = \lceil \log n_{u_\tau} \rceil$. Conversely, if $y_i^{(u)} = 1$, there must exist at least one positive item of $N^{(u)}$ in test $M^{(u)}(i,:)$. Moreover, since $\mathbf{y}_i^{(u)} = B^{(u)} \odot (\mathrm{diag}(M^{(u)}(i,:)) \times \mathbf{x}^{(u)})$ and every column in $B^{(u)}$ has a weight of $\lceil \log n_u \rceil$, the condition $|\mathbf{y}_i^{(u)}| = \lceil \log n_u \rceil$ implies that there exists only one positive item of $N^{(u)}$ in that test. Otherwise, $|\mathbf{y}_i^{(u)}| > \lceil \log n_u \rceil$.
In the second step, the indices of the positives and inhibitors range from $\max\{1, \alpha - D + 1\}$ to $\min\{\alpha + D - 1, n\}$. Because of the construction of the vector $\mathrm{expand}(M^{(u)}(i,:))$, every item indexed from $\max\{1, \alpha - D + 1\}$ to $\min\{\alpha + D - 1, n\}$ is present in the vector $\mathrm{diag}(\mathrm{expand}(M^{(u)}(i,:))) \times \mathbf{x}$. Therefore, $\mathbf{s}_i^{(u)}$ is the union of up to $D$ columns in $S$, which correspond to all positives and inhibitors in a specific block. The positives are thus identified. The decoding complexity of $\mathbf{s}_i^{(u)}$ is therefore $O(D)$.
In the last step, since $R$ is an $(n, kD - 2, 2)$-disjunct matrix, for any block of positives and inhibitors, there exists a row that contains only a positive and an inhibitor. That inhibitor is thus identified. This procedure takes $O(k \times r(2D - 1)) = O(k^4 D^4 \log (n/(kD)))$ time.

5.3. Decoding Complexity and Number of Tests

There are $D$ deterministic and strongly explicit matrices $F^{(u)}$ in the filtering procedure. Since each has $m_u(1 + b) = O(k^2 \log n_u \times \log n_u) = O(k^2 \log^2 (n/D))$ tests, the total number of tests in the filtering procedure is $O(D k^2 \log^2 (n/D))$. The decoding complexity of using these tests is also $O(D k^2 \log^2 (n/D))$.
There are also $D$ matrices $S^{(u)}$ and $D$ matrices $R^{(u)}$. The total numbers of tests for the $D$ matrices $S^{(u)}$ and the $D$ matrices $R^{(u)}$ are $m_u D(2D - 1) = O(m_u D^2)$ and $m_u D r = O(m_u r D)$, respectively. Therefore, the number of tests for identifying positives only (respectively, both positives and inhibitors) is $O(D m_u (1 + b + s)) = O(D k^2 \log (n/D)(D + \log (n/D)))$ (respectively, $O(D m_u (1 + b + s) + r) = O(k^2 D^2 \log (n/D)(\log (n/D) + kD))$).
For each $u$, the running time to decode all vectors $\mathbf{s}_i^{(u)}$ is $O(m_u s) = O(D k^2 \log (n/D))$. Since $u$ ranges from 1 to $D$, the running time to find all positives is $O(D k^2 \log^2 (n/D)) + D \times O(m_u s) = O(D k^2 \log (n/D)(D + \log (n/D)))$. On the other hand, the running time to find all positives and inhibitors is as follows.
$$O\left(D k^2 \log \frac{n}{kD} \left(D + \log \frac{n}{D}\right)\right) + O\left(k^4 D^4 \log \frac{n}{kD}\right) = O\left(D k^2 \log \frac{n}{kD} \left(D + \log \frac{n}{D}\right) + k^4 D^4 \log \frac{n}{kD}\right).$$

6. Discussion

6.1. Comparison

We compare our proposed schemes with existing schemes, namely Ganesan et al. [32], Chang et al. [33], and Bui et al. [20] in Table 1. There are eight criteria to consider here. The first four criteria are about the structure of the population of n items. They are the number of blocks, the number of items in a block, the number of positives (in a block if applicable), and the number of inhibitors (in a block if applicable). The fifth criterion is the decoding type. The sixth is the construction type, which describes how measurement matrices can be achieved. The seventh and the last are the number of tests and the decoding time.
Consider the decoding type “positives only.” The construction type in our proposed schemes for the first and the second models, i.e., when the number of blocks is one, is deterministic and strongly explicit. This is better than the schemes proposed by Chang et al. and Ganesan et al., which are random and explicit. The numbers of tests in our proposed schemes are smaller by a factor of almost $d + h$ than the ones in Chang et al.’s and Bui et al.’s schemes and smaller than the one in Ganesan et al.’s scheme. More importantly, our decoding times scale with the number of tests, while the ones in Chang et al.’s and Ganesan et al.’s schemes scale with both the number of tests and the number of items. For the third model, the same arguments apply after replacing $d$ by $dk$ and $h$ by $hk$.
We now consider the decoding type “positives and inhibitors.” For the first and second models, the number of tests in our proposed schemes is roughly the same as the one in Chang et al.’s scheme, smaller than the one in Bui et al.’s scheme, and smaller than the one in Ganesan et al.’s scheme. Meanwhile, the decoding times are smaller than the ones in the three existing schemes. For the third model, the number of tests in our proposed schemes is roughly the same as the one in Chang et al.’s scheme, smaller than the one in Bui et al.’s scheme, and larger than the one in Ganesan et al.’s scheme. However, our decoding time is smaller than the ones in the three existing works.

6.2. Potential Applications

Bruno et al. [34] addressed a group testing-based solution for genetic mapping and sequencing. In this application, the authors consider linear DNA, which consists of consecutive segments. Each segment, called a clone, is placed in a pool in an order consistent with the order of its appearance in the linear DNA. A collection of such clones is called a linear DNA library. Given such a library, we can ask where the segments (clones) of interest are located in it [35]. The segments of interest can be considered as positives and the other segments as negatives. A pool that contains at least one segment of interest returns a positive outcome when tested and a negative outcome otherwise.
We extend the application above to a potential application as follows. Given a linear DNA library, we would like to find the segments of DNA that express a certain biological property and the segments of DNA that inhibit the segments expressing that property. The first and second types of segments of interest are considered as positives and inhibitors, respectively, while the remaining segments are considered as negatives. Because of the nature of DNA, an inhibitor is usually close to positives. Therefore, the blocks of positives and inhibitors model can be used to identify both positives and inhibitors.
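The test rule underlying this application can be sketched in a few lines. The following is a minimal simulation, assuming a toy library of 12 ordered segments in which the positives and the inhibitor lie inside one short block of consecutive segments; the function name `pool_outcome`, the indices, and the pools are all hypothetical and chosen only for illustration. A pool is positive exactly when it contains at least one positive and no inhibitor.

```python
from typing import Iterable, Set


def pool_outcome(pool: Iterable[int], positives: Set[int], inhibitors: Set[int]) -> bool:
    """Return True iff the pool has at least one positive and no inhibitor."""
    members = set(pool)
    return bool(members & positives) and not (members & inhibitors)


# Toy linear library: segments 0..11; the block of interest is segments 4..6.
positives = {4, 6}   # segments expressing the property (assumed)
inhibitors = {5}     # segment inhibiting that expression (assumed)

pools = [
    range(0, 5),     # contains positive 4, no inhibitor
    range(4, 7),     # contains positives 4 and 6, but also inhibitor 5
    [6, 7, 8],       # contains positive 6, no inhibitor
    range(7, 12),    # contains neither positives nor inhibitors
]

outcomes = [pool_outcome(p, positives, inhibitors) for p in pools]
print(outcomes)
```

The second pool shows the masking effect of an inhibitor: it contains two positives yet still tests negative, which is why pools must be designed so that some of them separate the positives from the nearby inhibitors.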

7. Conclusions

In this paper, we presented efficient encoding and decoding procedures to identify positives and/or inhibitors in a single block of (consecutive) positives and inhibitors or in multiple blocks of positives and inhibitors. The numbers of tests and the decoding times of our proposed schemes are usually smaller than those of existing works. Extending this work to other settings in group testing, such as threshold group testing or complex group testing, remains an open problem.

Author Contributions

Conceptualization, T.V.B.; methodology, T.V.B.; writing—original draft preparation, T.V.B.; writing—review and editing, T.V.B., I.E., M.K., T.K. and T.D.N.; funding acquisition, T.D.N. All authors have read and agreed to the published version of the manuscript.

Funding

Thuc D. Nguyen was funded in part by University of Science, VNU-HCM under Grant No. CNTT 2021-27.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dorfman, R. The detection of defective members of large populations. Ann. Math. Stat. 1943, 14, 436–440.
  2. Du, D.; Hwang, F.K.; Hwang, F. Combinatorial Group Testing and Its Applications; World Scientific: Singapore, 2000; Volume 12.
  3. Farach, M.; Kannan, S.; Knill, E.; Muthukrishnan, S. Group testing problems with sequences in experimental molecular biology. In Proceedings of the Compression and Complexity of Sequences 1997, Salerno, Italy, 11–13 June 1997; pp. 357–367.
  4. D’yachkov, A.G.; Polyanskii, N.; Shchukin, V.Y.; Vorobyev, I. Separable Codes for the Symmetric Multiple-Access Channel. IEEE Trans. Inf. Theory 2019, 65, 3738–3750.
  5. Shental, N.; Levy, S.; Wuvshet, V.; Skorniakov, S.; Shalem, B.; Ottolenghi, A.; Greenshpan, Y.; Steinberg, R.; Edri, A.; Gillis, R.; et al. Efficient high-throughput SARS-CoV-2 testing to detect asymptomatic carriers. Sci. Adv. 2020, 6, eabc5961.
  6. Gabrys, R.; Pattabiraman, S.; Rana, V.; Ribeiro, J.; Cheraghchi, M.; Guruswami, V.; Milenkovic, O. AC-DC: Amplification Curve Diagnostics for COVID-19 Group Testing. arXiv 2020, arXiv:2011.05223.
  7. Bui, T.V. A simple self-decoding model for neural coding. bioRxiv 2022.
  8. D’yachkov, A.G.; Rykov, V.V. Bounds on the length of disjunctive codes. Probl. Peredachi Informatsii 1982, 18, 7–13.
  9. Indyk, P.; Ngo, H.Q.; Rudra, A. Efficiently decodable non-adaptive group testing. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, Austin, TX, USA, 17–19 January 2010; pp. 1126–1142.
  10. Porat, E.; Rothschild, A. Explicit nonadaptive combinatorial group testing schemes. IEEE Trans. Inf. Theory 2011, 57, 7982–7989.
  11. Ngo, H.Q.; Porat, E.; Rudra, A. Efficiently decodable error-correcting list disjunct matrices and applications. In Proceedings of the International Colloquium on Automata, Languages, and Programming, Zurich, Switzerland, 4–8 July 2011; pp. 557–568.
  12. Cheraghchi, M. Noise-resilient group testing: Limitations and constructions. Discret. Appl. Math. 2013, 161, 81–95.
  13. Cheraghchi, M.; Nakos, V. Combinatorial group testing and sparse recovery schemes with near-optimal decoding time. In Proceedings of the 2020 IEEE Symposium on Foundations of Computer Science (FOCS), Virtual, 16–19 November 2020; pp. 1203–1213.
  14. Cai, S.; Jahangoshahi, M.; Bakshi, M.; Jaggi, S. Efficient algorithms for noisy group testing. IEEE Trans. Inf. Theory 2017, 63, 2113–2136.
  15. Bondorf, S.; Chen, B.; Scarlett, J.; Yu, H.; Zhao, Y. Sublinear-time non-adaptive group testing with O(k log n) tests via bit-mixing coding. IEEE Trans. Inf. Theory 2020, 67, 1559–1570.
  16. Price, E.; Scarlett, J. A Fast Binary Splitting Approach to Non-Adaptive Group Testing. In Proceedings of the Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM), Virtual, 17–19 August 2020.
  17. De Bonis, A.; Vaccaro, U. Improved algorithms for group testing with inhibitors. Inf. Process. Lett. 1998, 67, 57–64.
  18. De Bonis, A.; Gasieniec, L.; Vaccaro, U. Optimal two-stage algorithms for group testing problems. SIAM J. Comput. 2005, 34, 1253–1270.
  19. Hwang, F.K.; Liu, Y. Error-tolerant pooling designs with inhibitors. J. Comput. Biol. 2003, 10, 231–236.
  20. Bui, T.V.; Kuribayashi, M.; Kojima, T.; Echizen, I. Sublinear decoding schemes for non-adaptive group testing with inhibitors. In Proceedings of the International Conference on Theory and Applications of Models of Computation, Kitakyushu, Japan, 13–16 April 2019; pp. 93–113.
  21. Bui, T.V.; Chee, Y.M.; Scarlett, J.; Vu, V.K. Group Testing with Blocks of Positives. In Proceedings of the IEEE International Symposium on Information Theory, ISIT 2022, Espoo, Finland, 26 June–1 July 2022; pp. 1082–1087.
  22. Colbourn, C.J. Group testing for consecutive positives. Ann. Comb. 1999, 3, 37–41.
  23. Müller, M.; Jimbo, M. Consecutive positive detectable matrices and group testing for consecutive positives. Discret. Math. 2004, 279, 369–381.
  24. Juan, J.S.T.; Chang, G.J. Adaptive group testing for consecutive positives. Discret. Math. 2008, 308, 1124–1129.
  25. Bui, T.V.; Cheraghchi, M.; Nguyen, T.D. Improved algorithms for non-adaptive group testing with consecutive positives. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT), Melbourne, Australia, 12–20 July 2021; pp. 1961–1966.
  26. Chang, H.; Tsai, Y.L. Threshold group testing with consecutive positives. Discret. Appl. Math. 2014, 169, 68–72.
  27. Chang, H.; Lan, W.C. Interval group testing for consecutive positives. Discret. Math. 2017, 340, 1488–1496.
  28. Kautz, W.; Singleton, R. Nonrandom binary superimposed codes. IEEE Trans. Inf. Theory 1964, 10, 363–377.
  29. Stinson, D.R.; Wei, R. Generalized cover-free families. Discret. Math. 2004, 279, 463–477.
  30. D’yachkov, A.; Vilenkin, P.; Torney, D.; Macula, A. Families of finite sets in which no intersection of sets is covered by the union of s others. J. Comb. Theory Ser. A 2002, 99, 195–218.
  31. Chen, H.B.; Fu, H.L.; Hwang, F.K. An upper bound of the number of tests in pooling designs for the error-tolerant complex model. Optim. Lett. 2008, 2, 425–431.
  32. Ganesan, A.; Jaggi, S.; Saligrama, V. Non-adaptive group testing with inhibitors. In Proceedings of the ITW, Jerusalem, Israel, 26 April–1 May 2015; pp. 1–5.
  33. Chang, H.; Chen, H.B.; Fu, H.L. Identification and classification problems on pooling designs for inhibitor models. J. Comput. Biol. 2010, 17, 927–941.
  34. Bruno, W.J.; Knill, E.; Balding, D.J.; Bruce, D.C.; Doggett, N.A.; Sawhill, W.W.; Stallings, R.L.; Whittaker, C.C.; Torney, D.C. Efficient pooling designs for library screening. Genomics 1995, 26, 21–30.
  35. Balding, D.J.; Torney, D.C. The design of pooling experiments for screening a clone map. Fungal Genet. Biol. 1997, 21, 302–307.
Figure 1. Three models for blocks of positives and inhibitors. Red, purple, and black dots represent positives, inhibitors, and negative items, respectively. A double arrow line stands for a block of D consecutive items. The first, second, and third models are a single block of consecutive positives and inhibitors, single block of positives and inhibitors, and blocks of positives and inhibitors. The second model is a generalization of the first model, and the third model is a generalization of the first two models.
Table 1. Comparison with previous work. “Det.” and “Rnd.” stand for “Deterministic” and “Random.” We set $\lambda = \frac{(d+h)\ln n}{W((d+h)\log n)}$, $\alpha = \max\left\{\frac{\lambda}{(d+h)^2}, 1\right\}$, and $\beta = O\left(Dk^2\left(D + \log\frac{n}{D}\right)\log\frac{n}{D} + k^4 D^4 \log\frac{n}{kD}\right)$, where $W(x)e^{W(x)} = x$ and $W(x) = \Theta(\log x - \log\log x)$.
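| No. of blocks | No. of items in a block | No. of positives (in a block) | No. of inhibitors (in a block) | Decoding type | Scheme | Construction type | No. of tests $t$ | Decoding complexity |
| Not applicable | Not applicable | $d$ | $h$ | Positives only | Ganesan et al. [32] | Rnd., Explicit | $O((d+h)\log n)$ | $O(tn)$ |
| Not applicable | Not applicable | $\le d$ | $\le h$ | Positives only | Chang et al. [33] | Rnd., Explicit | $O((d+h)^2\log n)$ | $O(tn)$ |
| Not applicable | Not applicable | $\le d$ | $\le h$ | Positives only | Bui et al. [20] | Det., Strongly explicit | $O(\lambda^2\log n)$ | $O\left(\frac{\lambda^5}{(d+h)^2}\right)$ |
| 1 | $d+h$ | $d$ | $h$ | Positives only | Theorem 1 | Det., Strongly explicit | $O\left(h\log\frac{hn}{d+h} + d\right)$ | $O(t)$ |
| 1 | $D$ | $\le d$ | $\le h$ | Positives only | Theorem 1 | Det., Strongly explicit | $O(D\log n)$ | $O(t)$ |
| $k$ | $D$ | $\le d$ | $\le h$ | Positives only | Theorem 2 | Rnd., Explicit | $O\left(Dk^2\left(D+\log\frac{n}{D}\right)\log\frac{n}{D}\right)$ | $O(t)$ |
| Not applicable | Not applicable | $d$ | $h$ | Positives and inhibitors | Ganesan et al. [32] | Rnd., Explicit | $O((d+h^2)\log n)$ | $O(tn)$ |
| Not applicable | Not applicable | $\le d$ | $\le h$ | Positives and inhibitors | Chang et al. [33] | Rnd., Explicit | $O((d+h)^3\log n)$ | $O(tn)$ |
| Not applicable | Not applicable | $\le d$ | $\le h$ | Positives and inhibitors | Bui et al. [20] | Det., Strongly explicit | $O(\lambda^3\log n)$ | $O(d\lambda^6\alpha)$ |
| 1 | $d+h$ | $d$ | $h$ | Positives and inhibitors | Theorem 1 | Rnd., Explicit | $O\left((d+h)^3\log\frac{n}{d+h}\right)$ | $O\left((d+h)^4\log\frac{n}{d+h}\right)$ |
| 1 | $D$ | $\le d$ | $\le h$ | Positives and inhibitors | Theorem 1 | Rnd., Explicit | $O\left(D\log n + D^3\log\frac{n}{D}\right)$ | $O\left(D\log n + D^4\log\frac{n}{D}\right)$ |
| $k$ | $D$ | $\le d$ | $\le h$ | Positives and inhibitors | Theorem 2 | Rnd., Explicit | $O\left(D^2k^2\left(D+\log\frac{n}{D}\right)\log\frac{n}{D}\right)$ | $\beta$ |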
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Bui, T.V.; Echizen, I.; Kuribayashi, M.; Kojima, T.; Nguyen, T.D. Group Testing with Blocks of Positives and Inhibitors. Entropy 2022, 24, 1562. https://doi.org/10.3390/e24111562
