Group Testing with Consideration of the Dilution Effect

We propose a method of group testing by taking dilution effects into consideration. We estimate the dilution effect based on massively collected RT-PCR threshold cycle data and incorporate them into optimizing group tests. The new constraint helps find a robust solution of a nonlinear equation. The proposed framework has the flexibility to incorporate geographic and demographic information. We conduct a Monte Carlo simulation to compare different group testing approaches under the estimated dilution effect. This study suggests that increased group size adversely impacts the false negative rate significantly when the infection rate is relatively low. Group tests with optimal pool sizes improve the sensitivity over group tests with a fixed pool size. Based on our simulation study, we recommend single group testing with optimal group sizes.


Introduction
Group testing, also known as pooled testing or batch testing, works by amalgamating specimens from individuals into pools and performing tests on these pools. If a group tests negative, all of its members are declared negative. If a group tests positive, each member has the remainder of his/her original specimen tested separately to determine the positive/negative outcome. Its implementation has the potential to greatly accelerate the rate of testing and increase the test capacity, especially when the prevalence rate is relatively low. The concept of group testing was first introduced for detecting syphilis in US soldiers during World War II [1]. Group testing was studied as an efficient method to detect community transmission [2]. During the COVID-19 outbreak in 2020, Stanford Medical Center, the University of Nebraska, and the Clinical Reference Laboratory applied group testing as a screening strategy for the general population [2,3]. Meanwhile, several universities, including Duke University, Michigan State University, the State University of New York, and Syracuse University, implemented group testing as their campus screening strategy.
Group testing under test errors was discussed in detail in [4], which confirmed that Dorfman's method has lower sensitivity than individual testing. This drawback was mitigated in [5] by a multi-step group testing procedure followed by possible sequential individual tests.
There are two important considerations for applying group testing: group size and dilution effects. Pooling an optimal number of specimens did not adversely affect the detection of positive specimens and required 57% fewer tests on average than individual testing [6].
The optimal group size [7] was determined by incorporating the dilution effect and the expected cost calculated under Dorfman's procedure. The concentration determines the group testing sensitivity [8]. Ordered pooling is the most efficient way to group patients if the function of the dilution effect is concave [9]. This conclusion generalized the ordered pooling algorithm [10] from no testing errors to testing errors with dilution effects.
Viral load, also known as viral burden, is a numerical expression of the quantity of a virus in a given volume of fluid. Viral load (viral RNA concentration) in patient samples and the rate of successful isolation of virus from clinical specimens in cell culture are the clinical parameters most directly relevant to infectiousness and hence to transmission. The RT-PCR (reverse transcription PCR) threshold cycle data were collected from 3303 patients who tested positive for SARS-CoV-2, and viral load was estimated [11]. A Gaussian mixture model was proposed for the threshold cycle value $C_t$ of a specimen collected from an infected person based on those 3303 positive viral load data [12]. A logistic regression model was used to fit the relationship between $C_t$ and the false negative rate (FNR) [13]. Unlike [8,10], the studies [12,13] used molecular-level models of false negatives in RT-PCR, which is more realistic.
In this study, we introduce a Monte Carlo method to estimate the expected FNR for a given group size and infection rate based on the data from [11]. Dilution effects were considered for COVID-19 group testing in [14]; however, that method did not account for the $C_t$ value distribution among COVID-19 infections. In addition to a more realistic dilution effect simulation, we add this expected FNR as a constraint to the group size optimization of single-step and multi-step group testing. The new constraint provides an upper bound on the expected FNR of group testing, and the nonlinear group size optimization is more robust than that in [1,5]. Detailed discussions are given in Sections 2.2.1-2.3.3.
In this study, we found that increasing group size adversely impacts FNR significantly when the infection rate is low. Group testing with optimal pool sizes improves the sensitivity over group tests with a fixed pool size. Under the consideration of the dilution effect in this study, multi-step group testing could not improve the sensitivity over single group testing with an optimal group size. The dilution effects became heavier when false negatives in the previous testing were pooled into larger groups. Dilution effect modeling and simulation are useful to configure an optimal group test setting. Our framework can be applied to effectively combat new diseases in the future.

RT-PCR
RT-PCR is the standard laboratory technique to measure a specific RNA concentration in samples. The targeted RNA sequence in the sample is first reverse transcribed into complementary DNA sequences (cDNAs). Then, those cDNAs are amplified via PCR. The number of cDNAs approximately doubles at each cycle. The $C_t$ value is the cycle at which the cDNA concentration reaches a fluorescence-detectable level. Therefore, $C_t = -\log_2 V$, up to an additive constant and measurement error, where V represents the viral load.
Spurious onset of fluorescence can happen when the number of cycles is too large. To control the Type I error, each PCR test has a cutoff point (the number of cycles it runs). A censored model that takes dilution effects into account was proposed for measuring the prevalence in a population [12]. The limit of detection (LoD) reflects the lowest viral load in the sample that can be detected in a PCR test with a specified probability. The LoD was determined by studies of the limiting distribution using characterized samples. The $C_t$ value corresponding to the LoD given in [11] was estimated by [12] to be $d_{cens} = 35.6$.

$C_t$ Distribution among Infections
The Charité Institute of Virology and Labor in Berlin provided 3303 positive samples and the associated viral loads. Positivity and viral loads were determined by PCR tests [11]. The $C_t$ values of the 3303 positive COVID-19 cases were fitted with the Gaussian mixture model (1) of [12]. Figure 1 shows the estimated Gaussian mixture density function and the LoD ($d_{cens} = 35.6$). The shaded area represents the probability that the $C_t$ value of an infected person is beyond the LoD; numerical integration gives a value of 0.046. Samples with $C_t$ values beyond the LoD are hard to detect, and the FNR for those hard-to-detect samples was assumed to be β = 0.8 [12].

Estimation of the False Negative Rate
A censored model was proposed for the FNR [12]. If the viral load of a sample is larger than the LoD, the FNR is negligible. However, when the viral load is less than the LoD, the FNR of this difficult sample must be estimated. A logistic curve was proposed to fit the relationship between the FNR and $C_t$ values [13]. The logistic regression model recognizes that the FNR strictly increases as the $C_t$ value becomes larger. The FNR model (2) is a logistic function of $C_t$.
The location parameter, 35.8, and the scale parameter, 12.5, were estimated so that FNR($C_t$) meets the following two properties:

Dilution Effect Functions
Optimal pool sizes were derived for Dorfman's procedure when pooled testing is subject to dilution effects [7]. Given the pool size n and the number of positive cases d, the author of [7] proposed a group sensitivity function $S_{eG}(n, d)$ for d ≥ 1 with a dilution parameter k such that 0 ≤ k ≤ 1. No dilution effect corresponds to k = 0. When k = 1, the group-testing sensitivity can be interpreted as the probability that a sample randomly selected from a group of size n is positive. The concentration d/n determines the sensitivity of group testing [8]. If n is fixed, $S_{eG}(n, d)$ increases in d. The faster $S_{eG}(n, d)$ converges to the sensitivity of individual testing as d approaches n, the lower the dilution effect. Ordered pooling is shown to be the most efficient way to group patients if $S_{eG}(\cdot, d)$ is concave [9]. This conclusion generalizes the ordered pooling algorithm of [10] from no testing error to testing error with dilution effects based on (3) and (4).
Dilution effects were modeled at the molecular level in [12,13]. Based on the large-scale COVID-19 clinical data set of [11], these studies proposed more realistic dilution models. For a pool of size n containing d positive specimens with viral loads $V_1, \dots, V_d$, the average viral load of the pooled sample is $n^{-1} \sum_{i=1}^{d} V_i$. By the relationship $C_t = -\log_2 V$, the $C_{t,G}$ value of the pooled sample is:
$$C_{t,G} = -\log_2\left(\frac{1}{n}\sum_{i=1}^{d} 2^{-C_{t,i}}\right) = \log_2 n - \log_2 \sum_{i=1}^{d} 2^{-C_{t,i}}, \tag{5}$$
where $C_{t,i}$ is the $C_t$ value of the i-th positive specimen. The group testing FNRs were determined by (5) and (2). Monte Carlo simulations were conducted [13] to estimate the expected FNR for a pooled sample given n and d. We follow the notation of [13] and let γ(n, d) denote the expected FNR.
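The pooled $C_t$ computation and the Monte Carlo estimate of γ(n, d) described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `sample_ct` is a hypothetical single-Gaussian stand-in for the fitted Gaussian mixture, and the FNR is taken from the censored model of [12] (negligible below the LoD of 35.6, and 0.8 beyond it) rather than from the logistic model.

```python
import math
import random

LOD_CT = 35.6  # d_cens, the C_t value at the limit of detection quoted in the text


def pooled_ct(ct_values, n):
    """C_t of a pool of n specimens whose positive members have the given C_t values."""
    total_load = sum(2.0 ** (-ct) for ct in ct_values)  # viral loads V_i = 2^(-C_t,i)
    return math.log2(n) - math.log2(total_load)          # -log2(total_load / n)


def censored_fnr(ct, beta=0.8):
    """Censored FNR model: negligible below the LoD, beta beyond it."""
    return beta if ct > LOD_CT else 0.0


def sample_ct(rng):
    """HYPOTHETICAL C_t sampler; a placeholder for the fitted Gaussian mixture."""
    return rng.gauss(27.0, 5.0)


def gamma_mc(n, d, trials=20000, seed=0):
    """Monte Carlo estimate of the expected FNR of a pool of size n with d positives."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        cts = [sample_ct(rng) for _ in range(d)]
        total += censored_fnr(pooled_ct(cts, n))
    return total / trials


if __name__ == "__main__":
    # One positive specimen with C_t = 30 diluted 8-fold shifts by log2(8) = 3 cycles.
    print(pooled_ct([30.0], 8))
    for n in (1, 8, 32):
        print(n, round(gamma_mc(n, 1), 3))
```

With a single positive specimen, pooling shifts the $C_t$ value upward by $\log_2 n$ cycles, so the estimated γ(n, 1) cannot decrease as n grows under any FNR model that is non-decreasing in $C_t$.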
In practice, we do not know d before testing. By contrast, the infection rate of a population is usually roughly estimated from a specific round of group testing. Inspired by γ(n, d), we propose the expected FNR β(n, p) in (6) for a pool size n and an infection rate p of the population. Figure 2 shows the expected FNR versus the pool size under different infection rates. For low infection rates, such as 0.001 and 0.01, the associated FNR becomes higher when the pool size increases. In contrast, for higher infection rates, the associated FNR becomes lower when the pool size becomes larger. The reason behind this phenomenon is concentration. When the infection rate is low, the viral concentration in a pool becomes lower if we increase the group size. However, when the infection rate is high, the viral concentration becomes higher if we increase the group size.
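One way to pass from γ(n, d) to an expected FNR that depends only on n and p is to average γ(n, d) over a binomial number of positives, conditional on the pool containing at least one positive. The sketch below assumes this reading; the exact form of the expected FNR is not reproduced in this text, so the weighting is an assumption, and `toy_gamma` is a hypothetical placeholder rather than an estimate from data.

```python
import math


def expected_fnr(n, p, gamma):
    """Average gamma(n, d) over d ~ Binomial(n, p), conditional on d >= 1.

    Assumes 0 < p <= 1 so that the conditioning event has positive probability.
    """
    p_any = 1.0 - (1.0 - p) ** n  # probability that the pool holds a positive
    weighted = sum(
        math.comb(n, d) * p ** d * (1.0 - p) ** (n - d) * gamma(n, d)
        for d in range(1, n + 1)
    )
    return weighted / p_any


def toy_gamma(n, d):
    """HYPOTHETICAL gamma(n, d): halves with every additional positive specimen."""
    return 0.5 ** d


if __name__ == "__main__":
    for p in (0.001, 0.01, 0.1):
        print(p, expected_fnr(10, p, toy_gamma))
```

For n = 1 the function returns γ(1, 1), the expected FNR of an individual test on an infected person.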

Multi-Step Group Testing
Multi-step group testing followed by sequential tests achieved high efficiency and efficacy when the dilution effect is not included in the model [5]. Group tests are repeated until the process results in three batch negatives or three batch positives. Each round of the multi-step group testing depends on the previous group testing results. Negative sub-populations can be retested with a larger group size since the probability of positive incidents is substantially reduced. Meanwhile, a positive sub-population tends to use groups with smaller sizes for retesting due to the increased probability of positives. The group size of the next iteration is determined by optimizing the expected number of tests. After several rounds, the majority of the population will not need further investigation, while others with three positive group test results will need individual tests. We noticed that some of the very large optimal group sizes are impractical. Some frequent sequences of group results, such as − − − or − − + −, can yield an optimal group size over 1000 in a later stage when the infection rate is as low as 0.1%. Figure 3 compares the test consumption of multi-step group testing with different group size upper limits. Group size upper limits affect the number of group tests when the population infection rate is low. The multi-step group testing procedure is more robust in sensitivity when a group size upper limit is implemented.
Figure 3. Number of tests (batch and individual test consumption) for multi-step group testing for population size 100,000 with three different group size upper limits: 32, 64, and 10,000.

Optimal Group Size
The size of a group determines the efficiency of group testing. Laboratories are interested in minimizing the required number of tests. Traditionally, research on group testing has focused solely on the expected number of tests per individual [1,4]. Given the probability of a type I error, α, and the probability of a type II error, β, the expected number of tests per person without the dilution setting, T(n), is:
$$T(n) = \frac{1}{n} + \alpha (1-p)^n + (1-\beta)\left(1 - (1-p)^n\right), \tag{7}$$
where p is the infection rate. Accuracy, sensitivity, and other measures were considered in [5] in addition to the number of tests, and [15] included the expected number of tests and accuracy. Both the costs of collecting the samples and those of running the assays were considered by [16]. An extension of the objective function to array testing was discussed over a number of realistic situations [17]. That study showed that the debate over different objective functions may be moot, since the corresponding results are largely the same for standard testing algorithms in a wide variety of situations.
Before we derive the expected number of tests per person based on (7) for Dorfman's method, we first show the probability in each cell of the confusion matrix for group testing results for the infection rate p. Table 1 shows the probabilities for the confusion matrix, where γ(n, 0) denotes the probability that a group of size n with no positive cases tests negative.

Table 1. Probabilities in the confusion matrix of a group test with group size n and infection rate p.
Test result | No samples are infected | At least one sample is infected
+ | $(1 - \gamma(n, 0))(1-p)^n$ | $(1 - \beta(n, p))(1 - (1-p)^n)$
− | $\gamma(n, 0)(1-p)^n$ | $\beta(n, p)(1 - (1-p)^n)$

A test result + needs n + 1 tests, whereas a test result − needs only one test. The expected number of tests per person under the dilution setting, T(n), is:
$$T(n) = \frac{1}{n} + (1 - \gamma(n, 0))(1-p)^n + (1 - \beta(n, p))\left(1 - (1-p)^n\right). \tag{8}$$
The selection of the group size is an optimization problem. We seek to minimize T(n) subject to the expected FNR β(n, p) given in (6) not exceeding a given level C. The optimization problem can be written as:
$$\min_{1 \le n \le n_{\max}} T(n) \tag{9}$$
such that β(n, p) ≤ C, where $n_{\max}$ is the upper limit of the group size. Figure 4 shows the optimal group size for different infection rates with different thresholds for the expected FNR β(n, p). Figure 5 illustrates the expected FNR given the optimal group size under the different thresholds. A group of size 1 is returned for a threshold of 0.01 because even individual testing cannot achieve that FNR threshold. Except for the low thresholds of 0.01 and 0.1, the optimal group size gradually decreases as the infection rate increases. As a result, the viral concentration in a pool of optimal size increases. The expected FNR given the optimal group size also tends to decrease gradually as the infection rate increases.
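The constrained group size selection can be sketched as a brute-force search over n that falls back to individual testing (n = 1) when no pooled size satisfies the FNR constraint. This is an illustration only: `toy_expected_fnr` is a hypothetical stand-in for β(n, p), and `dorfman_cost` uses error-free Dorfman accounting instead of the dilution-aware T(n).

```python
def dorfman_cost(n, p):
    """Tests per person for error-free Dorfman pooling (simplifying assumption)."""
    return 1.0 / n + 1.0 - (1.0 - p) ** n


def toy_expected_fnr(n, p):
    """HYPOTHETICAL beta(n, p): grows linearly with pool size and ignores p."""
    return min(1.0, 0.05 + 0.01 * (n - 1))


def optimal_group_size(p, n_max, fnr_cap,
                       expected_fnr=toy_expected_fnr, cost=dorfman_cost):
    """Minimize cost(n, p) over 2..n_max subject to expected_fnr(n, p) <= fnr_cap."""
    best_n, best_cost = 1, 1.0  # individual testing: one test per person
    for n in range(2, n_max + 1):
        if expected_fnr(n, p) > fnr_cap:
            continue  # pool size violates the FNR constraint
        c = cost(n, p)
        if c < best_cost:
            best_n, best_cost = n, c
    return best_n, best_cost


if __name__ == "__main__":
    for cap in (0.01, 0.105, 1.0):
        print(cap, optimal_group_size(0.01, 64, cap))
```

With the toy FNR curve, tightening the cap shrinks the feasible pool sizes, and a cap that no pool can meet returns a group of size 1, which mirrors the behavior reported for the 0.01 threshold.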

Sensitivity
The sensitivity of Dorfman's method is $(1 - \beta)^2$ under the assumption of no dilution effect [4,5,18]. Note that it does not depend on n or d. We can extend this result by taking the dilution effect into account. To identify an infection as test positive, we must make no error on either the pooled test or the individual test. Given the infection rate p, the probability of making no error on pooled testing is 1 − β(n, p). It is intractable to estimate the probability that an individual receives a correct test result, because the pooling result depends on the other samples in the same group. If an infected sample has a very low viral load and happens to be in the same group as other infected samples with high viral loads, the pool may test positive while individual testing still fails to detect the sample with the low viral load. If we ignore those complicated hierarchical relations, we can assume that the individual test result is independent of the pooled test result. Then, the approximate sensitivity, given p and n, is:
$$(1 - \beta(n, p))(1 - \gamma(1, 1)). \tag{10}$$
Table 2 shows a comparison between the approximate sensitivity based on (10) and the simulated sensitivity for a fixed group size of 10 and different infection rates. The population size is 100,000 with 100 repetitions.
Multi-step group testing methods [5] mitigate the false negative issue without considering dilution effects. Every sample goes through a few rounds of pooled testing and possible individual tests. At the end of each round, samples in a negative sub-population enter a larger group in the next round. There is one caveat for multi-step group testing under the assumption of dilution effects. If there are infected samples in a negative sub-population, those false negative cases will be pooled in a group of larger size, and the infection rate in the larger pool will be lower. We use Figure 2 as an example. If the infection rate of the whole population is 0.03 and the group size is 25, then the expected FNR is around 0.22. After the first round of group testing, we obtain negative sub-populations and positive sub-populations. Among the negative sub-populations, there can be false negative cases, and the infection rate is lower than 0.03. Multi-step group testing moves the samples in a negative sub-population into a larger group. For example, if the infection rate of the negative sub-population is 0.01 and the group size is 50, the expected FNR becomes higher than 0.3.
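As a rough worked example of the approximate sensitivity, take the Figure 2 reading β(25, 0.03) ≈ 0.22 quoted above and assume γ(1, 1) ≈ 0.05, which corresponds to an individual-test sensitivity of about 0.95. Both numbers are approximate readings used only for illustration.

```latex
\[
(1 - \beta(25, 0.03))\,(1 - \gamma(1, 1)) \approx (1 - 0.22)(1 - 0.05) = 0.78 \times 0.95 = 0.741 .
\]
```

Under these assumptions, roughly a quarter of infected individuals would be missed by a single round of pooled testing with groups of 25, most of them at the pooled stage.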
Dilution effects make further rounds of group testing on negative sub-populations even harder. Hence, we propose in this study that all the groups in the negative sub-populations are tested twice in each round of the multi-step group testing to reduce the FNR caused by dilution effects. Samples in the groups that test negative in the second test will be advanced to the negative sub-population for the next round. Samples in the groups that test positive in the second test will be advanced to the positive sub-population in the next round. Figure 6 shows the negative repetition. For a negative repetition in round k, the expected reduction of the number of false negatives is $np(\beta(n, p) - \beta(n, p)^{k+1})$, and it is $k^{-1} np(\beta(n, p) - \beta(n, p)^{k+1})$ per test kit. This number decreases as k increases, and therefore, it is not efficient to conduct further testing for the cases from the negative repetition.
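The diminishing return of negative repetitions can be checked numerically. The sketch below evaluates the per-kit reduction stated above for a few values of k, treating β(n, p) as a fixed number; the value 0.22 for n = 25 and p = 0.03 is the approximate Figure 2 reading and is used only for illustration.

```python
def reduction_per_kit(n, p, beta, k):
    """Expected false negatives removed per test kit after k negative repetitions."""
    total_reduction = n * p * (beta - beta ** (k + 1))
    return total_reduction / k


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, round(reduction_per_kit(25, 0.03, 0.22, k), 4))
```

Because $(1 - \beta^k)/k$ decreases in k for 0 < β < 1, each additional repetition buys a smaller reduction in false negatives per kit.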

Results
We conducted a Monte Carlo simulation study to evaluate the efficiency and efficacy of group testing procedures in the dilution effects setting. The simulation results are given in Table 3. A population of 100,000 people is randomly generated 100 times. The infection rates of 0.1%, 1%, 3%, 5%, and 10% are chosen. For the infected people, $C_t$ follows (1). The FNR function given $C_t$ is given in (2). We compared the classification metrics for (A) individual tests; (B) single group testing with a fixed size of 10; (C) single group testing with optimal group sizes; (D) multi-step group tests ending with two batch negatives or two batch positives, with an individual test given to two batch positives; and (E) three-stage hierarchical group testing, which divides positive groups into smaller, non-overlapping subgroups twice until all positive specimens are confirmed by individual testing. For (C), (D), and (E), the group size is determined by (9) for the sub-populations in each step. For each of the infection rates, the overall accuracy, sensitivity, specificity, PPV (positive predictive value: proportion of true positives among test positives), and NPV (negative predictive value: proportion of true negatives among test negatives) are obtained, and the required number of tests to cover the whole population is calculated. The values are averaged over the 100 repetitions.
Conventional individual testing performs better in accuracy and sensitivity than any of the group testing methods. The sensitivity of individual testing is around 0.95 because the probability that an infected person has a $C_t$ value beyond the detection limit is around 5%. For group testing, the $C_t$ value of a pool can be high and beyond the LoD even when we use the optimal group size and the infection rate is low.
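A single repetition of the simulation for single group testing (methods (B) and (C)) can be sketched as follows. This is a simplified illustration, not the authors' simulation code: the $C_t$ sampler is a hypothetical single Gaussian instead of the fitted mixture, the FNR follows the censored model of [12] instead of the logistic model, and specificity is assumed to be perfect, so only sensitivity and test consumption are tracked.

```python
import math
import random

LOD_CT = 35.6  # d_cens quoted in the text


def censored_fnr(ct, beta=0.8):
    """Censored FNR model: negligible below the LoD, beta beyond it."""
    return beta if ct > LOD_CT else 0.0


def pooled_ct(ct_values, n):
    """C_t of a pool of n specimens whose positive members have the given C_t values."""
    return math.log2(n) - math.log2(sum(2.0 ** (-c) for c in ct_values))


def simulate_dorfman(pop_size, p, n, sample_ct, fnr, rng):
    """One repetition of single group testing; returns (sensitivity, tests used)."""
    # None marks an uninfected person; a float is the C_t of an infected person.
    people = [sample_ct(rng) if rng.random() < p else None for _ in range(pop_size)]
    tests = true_pos = infected = 0
    for start in range(0, pop_size, n):
        group = people[start:start + n]
        cts = [c for c in group if c is not None]
        infected += len(cts)
        tests += 1  # the pooled test
        if not cts:
            continue  # ASSUMPTION: perfect specificity, a clean pool tests negative
        if rng.random() < fnr(pooled_ct(cts, len(group))):
            continue  # pooled false negative: every infected member is missed
        tests += len(group)  # positive pool: retest every member individually
        true_pos += sum(rng.random() >= fnr(c) for c in cts)
    sensitivity = true_pos / infected if infected else float("nan")
    return sensitivity, tests


if __name__ == "__main__":
    rng = random.Random(2021)
    hypothetical_ct = lambda r: r.gauss(27.0, 5.0)  # placeholder C_t distribution
    for p in (0.001, 0.01, 0.03, 0.05, 0.1):
        print(p, simulate_dorfman(100_000, p, 10, hypothetical_ct, censored_fnr, rng))
```

Averaging the returned values over repeated calls with different seeds gives the kind of Monte Carlo summary reported per infection rate; the remaining metrics (specificity, PPV, NPV) would require a false positive model, which this sketch omits.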
The sensitivity of group testing method is found to be equal in different infection rates when there are no dilution effects [5]. However, with dilution effects and FNR functions of C t , the sensitivity of the group testing methods increases if the infection rate goes up. It is because a higher infection rate raises the viral load concentration in a pooling sample and therefore reduces FNR. Figure 7 shows that the simulated sensitivity of (B), (C), (D), and (E) increased as the infection rate increased.
Method (C) improved the sensitivity by using optimal group sizes instead of the fixed group size in method (B). Figure 8 shows the $C_t$ value distribution in the negative sub-population for a fixed group size (method (B)) and the optimal group size (method (C)). The $C_t$ distribution for $C_t$ > 35.6 is almost identical for methods (B) and (C). However, the distribution of $C_t$ is lower in (B) than in (C) for $C_t$ < 35.6. Method (C) made fewer false negatives for the samples with low $C_t$. For the infection rates of 0.001 and 0.01, the number of tests for method (C) was larger than that of method (B). However, when the infection rate became larger than 1%, the number of tests in method (C) became smaller than that of method (B).
Table 3. Simulation results: 100 repetitions, population size 100,000; mean with standard deviation in parentheses.
Method (D) improved the sensitivity over single-layer group testing in method (C) by using multi-step group testing for infection rates of 0.001 and 0.01. Figure 9 shows the distribution of $C_t$ in the negative sub-populations for method (C) and method (D). The comparison of the distributions of $C_t$ between method (C) and method (D) was similar to the comparison between method (B) and method (C). Method (D) yielded fewer false negatives for samples with low $C_t$ values. The violin plot shapes of methods (C) and (D) for the low $C_t$ values are slightly different for an infection rate of 0.001. For the infection rates of 0.05 and 0.1, the $C_t$ distributions for methods (C) and (D) were almost identical, but the difference in sensitivity between multi-step group testing and single-layer group testing was 0.006 and 0.007 for infection rates of 0.05 and 0.1, respectively.
Method (D) improved over method (C) for certain infection rates. Table 4 shows the changes in false negatives between step I and step II of multi-step group testing. The multi-step architecture could not reduce false negatives by increasing the number of steps when the dilution effect was assumed in this study. This is because the previous false negatives fell into groups with larger sizes, yielding even higher dilution effects.

Table 4. False negatives in step I and step II of multi-step group testing, by infection rate p.

Method (E) can be considered a variation of method (D) that does not advance to further tests on negative sub-populations. Three-stage hierarchical group tests could not improve the sensitivity compared to Dorfman group testing but improved the test efficiency. Our simulations showed that the difference in sensitivity between method (E) and method (C) was less than 1%. Method (E) saved 7% of test consumption compared to method (C) for the infection rate of 5%.

Discussion
Some infectious diseases spread silently by asymptomatic carriers, and we need a rapid testing of the virus for all the residents of each community. It requires more efficient testing methods than individual testing. Group testing is a natural candidate.
Compared to individual testing, group testing increases the FNR. In this study, we give a comprehensive discussion of dilution effects, one major concern for the implementation of group testing. The pooling result depends on the other samples in the same group. This is a limitation of pooled tests. If we ignore those complicated hierarchical relations, we can assume that the individual test result is independent of the pooled test result. Using over 3000 samples, we modeled the dilution effects. Furthermore, under this specific dilution effect, we can estimate the optimal group size via Monte Carlo simulation. The optimal group size is determined by the infection rate and the dilution effect. Single group testing with an optimal size performs better on both sensitivity and test efficiency than single group testing with a fixed size of 10. The group size of 10 is widely used for COVID-19 group testing.
Based on our simulation study, we recommend single group testing with optimal group sizes. Multi-step group testing cannot improve the sensitivity over single group testing with an optimal size due to the dilution effect. This is because the false negative samples from the previous group are pooled into a larger group, causing a larger dilution effect. More people in the community can be covered by improving the efficiency of tests. Multi-stage hierarchical group testing, a variation of multi-step group testing, can improve the efficiency of testing by reducing test consumption while differing by less than 1% in sensitivity from single-layer group testing with an optimal group size.
Our dilution effect modeling and simulation tool will be useful to determine an optimal group test. It can be easily applied to various infectious diseases. For COVID-19, for example, the presence of the viral load in the patients can vary over more than nine orders of magnitude [19]. Therefore, any lab using group testing needs a simulation tool to monitor/optimize its group testing regularly. In future studies, we will continue investigating the dilution effect to improve test efficiency and efficacy. Other group testing methods such as overlapping group tests will be investigated as well.