# A Discriminant Function Approach to Adjust for Processing and Measurement Error When a Biomarker is Assayed in Pooled Samples

*Int. J. Environ. Res. Public Health* **2015**, *12*(11), 14723-14740; https://doi.org/10.3390/ijerph121114723

## Abstract


## 1. Introduction

**C**). As in prior related work, we assume each member of a given pool contributes the same sample aliquot volume and the lab assay is expected to return the arithmetic mean (equivalently, the sum) of biomarker concentrations across members of each pool [8,13]. The complication is that we seek to formally account for the fact that measurement error, processing error, or both may be incurred when applying the lab assay to measure X. Our approach relies upon discriminant function analysis (e.g., [16,17]), together with a prior paradigm for modeling sources of error [7]. We note that there is precedent for adopting the discriminant function approach to covariate measurement error problems [18,19]; however, this is to our knowledge the first attempt to apply it to analyses involving bioassay data obtained on pools. Our specific strategy utilizes a variant on classical discriminant function analysis, in which one assumes normal errors in a multiple linear regression model as opposed to multivariate normality of the exposure variable and any covariates [20].

## 2. Methods

#### 2.1. Models for Individual-Level Data without Measurement or Processing Error

**C**). The parameter of primary interest is the adjusted exposure log odds ratio (OR), commonly captured by the coefficient β in the following standard logistic regression model (Equation (1)):

_{i}; t = 1,…, T). Here $p_{ij} = \Pr(Y_{ij} = 1)$, where $Y_{ij}$ is the binary outcome for the $j$th member of the $i$th of $k$ eventual pools, and $g_i$ is the number of specimens included in the $i$th pool (we discuss pooling further in the next section).

**C | Y**), Lyles, Guo and Hill [20] recently revisited this approach and demonstrated that the adjusted exposure log OR of interest can be efficiently estimated. In addition, their adaptation of the discriminant function approach required only a univariate distributional assumption for the errors in the following standard multiple linear regression (MLR) model (Equation (2)):

^{2} under Equation (2). When normality of the errors in Equation (2) holds, a uniformly minimum variance unbiased (UMVU) estimator for the log OR is available [20]. Further, simulation results [20] demonstrated that such a discriminant function-based estimator can be more accurate and precise in small samples than the standard maximum likelihood estimator (MLE) based on Equation (1).
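The link between the MLR coefficient and the log OR can be sketched with Bayes' rule. The derivation below is illustrative only: the symbols $\alpha^{*}$ and $\gamma_t^{*}$ for the intercept and covariate coefficients of the MLR model are assumed names, while $\beta^{*}$ and $\sigma^{2}$ follow the notation used later in the paper.

$$
\begin{aligned}
X \mid Y = y,\ \mathbf{C} = \mathbf{c} &\sim N\!\left(\mu_y,\ \sigma^{2}\right), \qquad \mu_y = \alpha^{*} + \beta^{*} y + \sum_{t=1}^{T} \gamma_t^{*} c_t, \\
\ln\frac{\Pr(Y = 1 \mid x, \mathbf{c})}{\Pr(Y = 0 \mid x, \mathbf{c})}
&= \ln\frac{\Pr(Y = 1 \mid \mathbf{c})}{\Pr(Y = 0 \mid \mathbf{c})} + \frac{(x - \mu_0)^{2} - (x - \mu_1)^{2}}{2\sigma^{2}} \\
&= (\text{terms free of } x) + \frac{\mu_1 - \mu_0}{\sigma^{2}}\, x
 = (\text{terms free of } x) + \frac{\beta^{*}}{\sigma^{2}}\, x .
\end{aligned}
$$

Under these assumptions the coefficient of $x$ on the logit scale is $\beta^{*}/\sigma^{2}$, which is why estimates of $\beta^{*}$ and $\sigma^{2}$ from the linear model suffice to estimate the adjusted log OR.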

#### 2.2. Models for Pooled Data without Measurement or Processing Error

_{ij} = 0 or $y_{ij} = 1$), and if pooling is random within $y$ strata, then model (1) implies the following poolwise logistic model (Equation (3)):

_{i} = 1 or 0 for a “case” or “control” pool (all members positive or negative, respectively), $p_i = \Pr(Y_i = 1 \mid X_i = x_i, C_{i1} = c_{i1}, \ldots, C_{iT} = c_{iT})$, $g_i$ is the size of (i.e., number of specimens in) the $i$-th pool, $x_i = \sum_{j=1}^{g_i} x_{ij}$ is $g_i$ times the average exposure across pool members assumed returned by the assay, $c_{it} = \sum_{j=1}^{g_i} c_{ijt}$ is the sum of the values of the $t$-th covariate across the members of pool $i$, and $\ln(r_{g_i})$ is an offset with $r_{g_i}$ being the ratio of the number of case pools of size $g_i$ to the number of control pools of size $g_i$. Model (3) is fit with $g_i$ as a covariate and with no intercept, applying the offset. This can be done using standard software, e.g., the LOGISTIC procedure in SAS [23], to obtain an MLE for β and its corresponding standard error.
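As a small illustration of the offset described above, the following Python sketch computes $\ln(r_{g})$ for each pool size from a list of pools. The function name `pool_offsets` and the example design are hypothetical and are not taken from the paper, whose own analyses use SAS.

```python
import math
from collections import Counter

def pool_offsets(pools):
    """Return {g: ln(r_g)} for a list of (g_i, y_i) pools.

    y_i is 1 for a case pool and 0 for a control pool;
    r_g = (# case pools of size g) / (# control pools of size g).
    """
    cases = Counter(g for g, y in pools if y == 1)
    controls = Counter(g for g, y in pools if y == 0)
    offsets = {}
    for g in sorted(set(cases) | set(controls)):
        if cases[g] == 0 or controls[g] == 0:
            raise ValueError(f"pool size {g} needs both case and control pools")
        offsets[g] = math.log(cases[g] / controls[g])
    return offsets

# Hypothetical design: 30 case and 60 control pools of size 2,
# plus 20 case and 20 control individual specimens (size 1).
pools = [(2, 1)] * 30 + [(2, 0)] * 60 + [(1, 1)] * 20 + [(1, 0)] * 20
offsets = pool_offsets(pools)
```

Each pool would then carry the offset for its own size when the no-intercept poolwise logistic model is fit.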

_{i}) stemming from model (2) is:

_{i} is a random variable corresponding to the observed sum ($x_i$) of exposure levels across pool members, $x_i$ and $c_{it}$ are the same exposure and covariate sums that appear in Equation (3), $y_i^{*} = \sum_{j=1}^{g_i} y_{ij}$, and $\epsilon_i = \sum_{j=1}^{g_i} \epsilon_{ij} \stackrel{\text{iid}}{\sim} N(0, g_i\sigma^{2})$. Note that model (4), like model (3), is fit without an intercept and with the pool size ($g_i$) as a covariate. If pool sizes are not equal, model (4) must be fit using weighted least squares (WLS) with weights $w_i = 1/g_i$. This yields the WLS estimate $\hat{\beta}^{*}$, along with the residual variance estimate $\hat{\sigma}^{2} = \text{MSE} = (k - T - 2)^{-1} \sum_{i=1}^{k} g_i^{-1} (Y_i - \hat{Y}_i)^{2}$. Lyles, Guo and Hill [20] considered only standard MLR models (with an intercept) fit via ordinary least squares, but the distributional properties of $\hat{\beta}^{*}$ and MSE assuming i.i.d. normal errors in the individual-level model (2) yield an immediate extension of their results to derive two estimators for the adjusted log OR of interest based on the WLS fit of model (4). To remain consistent with their notation, we refer to these as $\ln(\widehat{\text{OR}})_{\text{samp}}$ and $\ln(\widehat{\text{OR}})_{\text{umvu}}$, respectively, where “samp” denotes an unadjusted sample-based estimate. These new estimators for the adjusted log OR based on pooled exposure assays are given in (Equation (5)):
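The WLS fit and the two estimators can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the sample-based estimator is $\hat{\beta}^{*}/\text{MSE}$ and that the UMVU version rescales it by $(\text{df} - 2)/\text{df}$ with $\text{df} = k - T - 2$, which is what the chi-square distribution of MSE under normal errors implies. The function names and the simulated inputs are hypothetical.

```python
import random

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def discriminant_log_or(g, ystar, covs, xsum):
    """No-intercept WLS of pooled exposure sums on (g_i, y_i*, covariate sums)."""
    rows = [[gi, yi] + list(ci) for gi, yi, ci in zip(g, ystar, covs)]
    p, k = len(rows[0]), len(rows)          # p = T + 2 coefficients, k pools
    w = [1.0 / gi for gi in g]              # weights w_i = 1 / g_i
    xtwx = [[sum(wi * r[a] * r[b] for wi, r in zip(w, rows)) for b in range(p)]
            for a in range(p)]
    xtwy = [sum(wi * r[a] * xi for wi, r, xi in zip(w, rows, xsum)) for a in range(p)]
    coef = solve(xtwx, xtwy)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    df = k - p                              # k - T - 2
    mse = sum(wi * (xi - fi) ** 2 for wi, xi, fi in zip(w, xsum, fitted)) / df
    beta_star = coef[1]                     # coefficient on y_i*
    samp = beta_star / mse                  # assumed form of the "samp" estimator
    umvu = (df - 2) / df * samp             # assumed form of the UMVU estimator
    return beta_star, mse, samp, umvu

# Hypothetical pooled data: 60 pools of sizes 1, 2 and 3, one binary covariate (T = 1).
random.seed(2015)
g, ystar, covs, xsum = [], [], [], []
for i in range(60):
    gi = (1, 2, 3)[i % 3]
    yi = gi if i % 2 == 0 else 0            # case pools contain only cases
    ci = sum(random.random() < 0.4 for _ in range(gi))
    eps = random.gauss(0.0, (0.08 * gi) ** 0.5)
    g.append(gi); ystar.append(yi); covs.append([ci])
    xsum.append(0.5 * gi + 0.035 * yi + 0.1 * ci + eps)

beta_star, mse, samp, umvu = discriminant_log_or(g, ystar, covs, xsum)
```

The correction factor approaches 1 as the number of pools grows, so the two estimates differ noticeably only in small samples.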

#### 2.3. Models for Pooled Data with Measurement and/or Processing Error

_{i}’s). Note that model (7) implies the assumptions that each laboratory assay result is subject to measurement error with a constant variance regardless of whether it is performed on a pooled or individual specimen, and the indicator function makes clear that each pooled assay (i.e., where $g_i > 1$) is assumed subject to processing error with a constant variance regardless of the pool size.
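The error structure just described can be written out as a short Python sketch. It mirrors the mean and variance expressions coded in the SAS/IML appendix (two covariates, variance $g_i\sigma^{2} + \sigma_p^{2} I(g_i > 1) + \sigma_m^{2}$), but the function names and the tuple layout of `pools` are assumptions made for illustration.

```python
import math

def pool_variance(g, sigma2, sigma2_p, sigma2_m):
    """Variance of an observed assay on a pool of size g.

    Measurement error applies to every assay; processing error only when g > 1.
    """
    return g * sigma2 + (sigma2_p if g > 1 else 0.0) + sigma2_m

def neg2_loglik(params, pools):
    """-2 log-likelihood under normal errors.

    params = (b0, b1, gam1, gam2, sigma2, sigma2_p, sigma2_m)
    pools  = iterable of (g, ystar, c1_sum, c2_sum, x_observed)
    """
    b0, b1, gam1, gam2, s2, s2p, s2m = params
    total = 0.0
    for g, ystar, c1, c2, x in pools:
        mu = g * b0 + b1 * ystar + gam1 * c1 + gam2 * c2
        var = pool_variance(g, s2, s2p, s2m)
        total += math.log(2 * math.pi * var) + (x - mu) ** 2 / var
    return total

# With all three components equal to 0.08, an individual assay has variance 0.16
# and a pool of size 2 has variance 0.32.
v1 = pool_variance(1, 0.08, 0.08, 0.08)
v2 = pool_variance(2, 0.08, 0.08, 0.08)
```

Minimizing `neg2_loglik` over the seven parameters, with lower bounds on the variance components, is the role played by the quasi-Newton call in the appendix.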

#### 2.4. Design Considerations and Bias Adjustment

^{2}) can be estimated uniquely. For this purpose, we recommend a “hybrid” design [7], in which individual exposure assay measurements ($g_i = 1$) are combined with pools ($g_i > 1$) of at least two different sizes. The pools of two or more sizes should permit estimation of $\sigma^{2}$, while the individual assays provide observations devoid of processing error and should permit identification of the other two components. This requirement can be relaxed if one expects only measurement or processing error (not both). In that case one variance component ($\sigma_p^{2}$ or $\sigma_m^{2}$) is eliminated when specifying $\sigma_i^{2}$ in Equation (8), and a design featuring pools of any two or more sizes (including ‘pools’ of size 1) would theoretically be adequate. We return to such considerations in Section 3 when introducing the real data example, and we include “measurement error only” and “processing error only” models in simulation studies described in Section 4.
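The need for at least two pooled sizes in addition to individual assays can be checked numerically. The sketch below assumes the variance form $g_i\sigma^{2} + \sigma_p^{2} I(g_i > 1) + \sigma_m^{2}$ used in the appendix code; the two parameter sets are hypothetical values chosen for illustration.

```python
import math

def pool_variance(g, sigma2, sigma2_p, sigma2_m):
    """Assay variance for a pool of size g under the assumed error model."""
    return g * sigma2 + (sigma2_p if g > 1 else 0.0) + sigma2_m

# Two different (sigma2, sigma2_p, sigma2_m) triples.
a = (0.08, 0.08, 0.08)
b = (0.10, 0.06, 0.06)

# With only sizes 1 and 2 the two triples imply identical assay variances,
# so the data cannot tell them apart...
same_for_sizes_1_2 = all(
    math.isclose(pool_variance(g, *a), pool_variance(g, *b)) for g in (1, 2)
)

# ...whereas adding pools of size 3 separates them.
differ_for_size_3 = not math.isclose(pool_variance(3, *a), pool_variance(3, *b))
```

Because the two triples carry different values of $\sigma^{2}$, they also imply different log ORs for the same $\beta^{*}$, which is why the log OR is not identified from sizes 1 and 2 alone when both error types are modeled.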

^{2}, the stability of the discriminant function-based estimator in Equation (9) may remain an issue. We note that in the absence of measurement and processing error, the UMVU estimator in Section 2.2 improves stability by eliminating small-sample bias entirely. While we have not developed a UMVU estimator in the presence of measurement and/or processing error, a second-order Taylor series expansion leads to the following bias-adjusted alternative to the MLE in Equation (9):
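A generic second-order Taylor adjustment for a ratio of estimates can be sketched as below. This illustrates the technique named in the text and is not a transcription of Equations (9) to (11): the bias term is the standard second-order expansion of $E[\hat{\beta}^{*}/\hat{\sigma}^{2}]$, the standard error is an assumed first-order delta-method formula, and the function name and example inputs are hypothetical.

```python
import math

def ratio_estimates(beta, sigma2, var_beta, var_sigma2, cov):
    """ML ratio, Taylor bias-adjusted ratio, and delta-method SE.

    beta, sigma2 : estimates of beta* and sigma^2
    var_beta, var_sigma2, cov : their estimated variances and covariance
    """
    ml = beta / sigma2
    # Second-order approximation to E[beta_hat / sigma2_hat] - beta / sigma2.
    bias = beta * var_sigma2 / sigma2 ** 3 - cov / sigma2 ** 2
    adj = ml - bias
    # First-order (delta method) variance of the ratio.
    var_ml = (var_beta / sigma2 ** 2
              - 2 * beta * cov / sigma2 ** 3
              + beta ** 2 * var_sigma2 / sigma2 ** 4)
    return ml, adj, math.sqrt(var_ml)

# Hypothetical inputs: true-value-like estimates with uncorrelated errors.
ml, adj, se = ratio_estimates(0.035, 0.08, 0.02 ** 2, 0.02 ** 2, 0.0)
```

The adjustment shrinks a positive estimate toward zero when $\hat{\sigma}^{2}$ is imprecise, and both the bias term and the variance grow rapidly as $\hat{\sigma}^{2}$ approaches 0, consistent with the instability discussed in this section.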

^{2} close to 0 can correspond to exceedingly large log OR estimates. This instability makes the theoretical bias associated with both Equation (9) and Equation (11) infinite whenever there is a positive probability that $\hat{\sigma}^{2}$ equals 0, and can also produce occasional “blow ups” in estimated standard errors based on Equation (10). For this reason, our empirical studies in Section 4 include a discussion of practical strategies to reduce such problems. This includes consideration of Akaike’s information criterion (AIC; [26]) to select a model accounting solely for measurement or processing error if the model accounting for both is subject to instability in the estimated log OR and/or its accompanying standard error.

## 3. Example

#### 3.1. Collaborative Perinatal Project Data

_{1}; black vs. white), and smoking status ($C_2$; yes vs. no) were measured individually. The cytokine of interest is monocyte chemotactic protein 1 (MCP1), whose concentration is the exposure X. We use MCP1 assay results from 251 pools of size 2 (involving 502 women), along with individual MCP1 assays from the other 164 women who were not included in pools. Women paired in pools were matched on SA (Y) status.

_{1}) and smoking status ($C_2$).

#### 3.2. Results

^{2}) and hence the log OR of interest is not. This stems from the fact that only a single pool size ($g_i = 2$) was utilized in the study (see Section 2.4). The other three models (ME only, PE only, and neither ME nor PE) all agree with regard to a positive but non-significant estimated log OR characterizing the adjusted association between SA status and MCP1 levels. For the ME only model, the estimated measurement error variance ($\hat{\sigma}_m^{2}$) attained a lower bound (0.001) that was set for each variance component in the numerical ML optimization process. As such, results for the ME only and the “neither ME nor PE” models are extremely similar to each other. Those results are also qualitatively similar to an analysis based on the Weinberg-Umbach [8] model in Equation (3), results of which we provide in the last row for comparison.

Model | $\hat{\beta}^{*}$ | $\hat{\sigma}^{2}$ | $\hat{\sigma}_{m}^{2}$ | $\hat{\sigma}_{p}^{2}$ | $\ln(\widehat{\text{OR}})_{\text{ml}}$ | $\ln(\widehat{\text{OR}})_{\text{adj}}$ ^{e} | AIC |
---|---|---|---|---|---|---|---|
ME and PE ^{b} | 0.031 (0.026) | -- | -- | -- | -- | -- | -- |
ME only | 0.032 (0.025) | 0.102 | 0.001 ^{c} | -- | 0.311 (0.25) [−0.17, 0.80] | 0.310 (0.25) [−0.17, 0.79] | 420.64 |
PE only | 0.031 (0.026) | 0.079 | -- | 0.078 | 0.388 (0.32) [−0.25, 1.02] | 0.383 (0.32) [−0.25, 1.01] | 412.82 |
Neither ME nor PE | 0.032 (0.025) | 0.103 | -- | -- | 0.309 (0.25) [−0.17, 0.79] | 0.308 (0.24) [−0.17, 0.79] | 418.46 |
Logistic regression ^{d} | -- | -- | -- | -- | 0.270 (0.24) [−0.20, 0.74] | -- | -- |

^{a} Numbers in parentheses () are estimated standard errors; 95% CIs are in brackets [];

^{b} Model fails to identify $\sigma^{2}$ due to design limitations ($g_i$ = 1, 2 only);

^{c} $\hat{\sigma}_{m}^{2}$ hits boundary constraint of 0.001;

^{d} Based on Weinberg-Umbach poolwise model (Section 2.2), not accounting for ME or PE;

^{e} Estimates and standard errors adjusted as proposed in Section 2.4.

## 4. Simulations

_{2} was first generated to match the observed prevalence of smoking status. A second Bernoulli variable $C_1$ was then drawn with prevalence matching that observed within the corresponding smoking group. Then, SA status (Y) was generated conditional on $C_1$ and $C_2$, according to a logistic regression with parameters matching estimates obtained from the observed CPP data. Finally, MCP1 concentration (X) was generated according to model (2), again with parameters closely mimicking estimates obtained from the corresponding model fit to individual-level CPP data. We note that this means of generating the data can also be shown to imply the validity of model (1). The true values of β* and $\sigma^{2}$ used in the simulations were 0.035 and 0.08, respectively, for a true adjusted log OR of 0.4375.

_{i} = 1,2), we generated data with three pool sizes ($g_i$ = 1, 2, 3) so that all variance components in Equation (8) would theoretically be identifiable. Specifically, 50% of the total number (N) of individual “subjects” were allocated to pools of size 3, 50% of the remainder to pools of size 2, and the rest were treated individually. True poolwise exposures ($X_i$) were calculated as the sum of individual exposures for those in the pool. For simulations involving ME and/or PE, normal errors were randomly generated with mean 0 and variance $\sigma_m^{2} = 0.08$ and/or $\sigma_p^{2} = 0.08$, respectively.
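The final two steps of this data-generating process (summing individual exposures within pools, then adding errors) can be sketched in Python. The helper names, the tiny design, and the mean of 0.5 used for the individual exposures are hypothetical; only the error variances of 0.08 come from the text, and the sketch assumes errors are added on the scale of the pooled sum, as in the appendix code.

```python
import random

def pool_sums(individual_x, sizes):
    """Sum consecutive individual exposures into pools of the given sizes."""
    sums, start = [], 0
    for g in sizes:
        sums.append(sum(individual_x[start:start + g]))
        start += g
    return sums

def observed_assay(true_sum, g, rng, var_m=0.08, var_p=0.08):
    """Add measurement error to every assay and processing error only if g > 1."""
    value = true_sum + rng.gauss(0.0, var_m ** 0.5)
    if g > 1:
        value += rng.gauss(0.0, var_p ** 0.5)
    return value

rng = random.Random(12)
sizes = [2, 2, 1, 1]                      # hypothetical mini-design
x = [rng.gauss(0.5, 0.08 ** 0.5) for _ in range(sum(sizes))]
true_sums = pool_sums(x, sizes)
measured = [observed_assay(s, g, rng) for s, g in zip(true_sums, sizes)]
```

Setting `var_m` or `var_p` to 0 gives the "PE only" or "ME only" scenarios, respectively.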

#### 4.1. Results of Simulations with Neither ME nor PE

N | $\hat{\beta}^{*}$ | MSE | $\ln(\widehat{\text{OR}})_{\text{samp}}$ | $\ln(\widehat{\text{OR}})_{\text{umvu}}$ | Logistic Regression ^{c} |
---|---|---|---|---|---|
2000 | 0.035 (0.013) | 0.080 | 0.439 (0.166) [94.6%] | 0.438 (0.166) [94.6%] | 0.441 (0.168) [94.8%] |
200 | 0.035 (0.042) | 0.080 | 0.447 (0.545) [95.8%] | 0.438 (0.534) [95.8%] | 0.474 (0.586) [95.1%] |

^{a} Table shows mean estimates across 5000 simulations, with empirical standard deviations in parentheses () and 95% CI coverages in brackets [];

^{b} True values: β* = 0.035, $\sigma^{2}$ = 0.080, ln(OR) = 0.438;

^{c} Based on Weinberg-Umbach poolwise model.

#### 4.2. Results of Simulations with ME and/or PE

_{i} = 1,2,3) permits identifying all variance components in the general model that requires estimating both $\sigma_m^{2}$ and $\sigma_p^{2}$ in addition to the residual variance $\sigma^{2}$, some numerical instabilities were still observed. Specifically, the MLE for $\sigma^{2}$ occasionally met the lower boundary of 0.001, and/or the estimated standard error accompanying $\ln(\widehat{\text{OR}})_{\text{ml}}$ in Equation (9) was implausibly large. In such boundary cases, or if the estimated standard error under the general model was more than 10 times the standard error under a model ignoring ME and PE, we used AIC to select the best fitting alternative model (ME only or PE only) to estimate the log OR. Such model selection adjustments were fairly common with N = 500 when generating data under the most general model, but were almost never necessary under the ME only or PE only models and were much less frequent for larger sample sizes. If a different model than the one that generated the data was selected under the specified criteria, standard errors as well as the point estimate of the log OR were based on the selected model.
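The selection rule just described can be summarized in a short Python sketch. The dictionary layout, the function name, and the instability flags are hypothetical; the 0.001 boundary and the factor of 10 come from the text, and the example reuses the ME only and PE only rows of the example-data table purely as illustrative inputs alongside a made-up unstable general fit.

```python
def choose_model(full, me_only, pe_only, se_no_error, bound=0.001, se_ratio=10.0):
    """Pick the model used for the final log OR estimate.

    Each fit is a dict with keys 'sigma2', 'log_or', 'se' and 'aic'.
    The general (ME and PE) fit is kept unless sigma2 sits on the lower
    boundary or its SE exceeds se_ratio times the SE of the no-error model.
    """
    unstable = full["sigma2"] <= bound or full["se"] > se_ratio * se_no_error
    if not unstable:
        return "ME and PE", full
    if me_only["aic"] <= pe_only["aic"]:
        return "ME only", me_only
    return "PE only", pe_only

# Hypothetical unstable general fit; fallback candidates use the tabled values.
full = {"sigma2": 0.001, "log_or": 31.0, "se": 250.0, "aic": 415.0}
me_only = {"sigma2": 0.102, "log_or": 0.311, "se": 0.25, "aic": 420.64}
pe_only = {"sigma2": 0.079, "log_or": 0.388, "se": 0.32, "aic": 412.82}

name, fit = choose_model(full, me_only, pe_only, se_no_error=0.25)
```

Both the point estimate and the standard error are taken from whichever model is returned, matching the convention stated above.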

^{2} or in the standard error accompanying $\ln(\widehat{\text{OR}})_{\text{ml}}$ led to an AIC-based decision to base estimation on the ME only or PE only model in 19.1% of simulations with N = 500. This percentage fell to 8.5% and 3.4% with N = 1000 and 2000, respectively. Ultimately, the estimator is characterized by acceptable mean and median bias and accompanied by adequate (if a bit conservative) CI coverages. As expected, the Weinberg-Umbach model (Section 2.2) produces a markedly attenuated log OR estimate with sub-nominal coverage, since it does not account for measurement or processing errors. We do not summarize the bias-corrected estimator $\ln(\widehat{\text{OR}})_{\text{adj}}$ in Table 3, since numerical issues affecting $\ln(\widehat{\text{OR}})_{\text{ml}}$ also tended to impact stability of the estimated correction factor under the most general model.

N | $\hat{\beta}^{*}$ | $\hat{\sigma}^{2}$ | $\hat{\sigma}_{m}^{2}$ | $\hat{\sigma}_{p}^{2}$ | $\ln(\widehat{\text{OR}})_{\text{ml}}$ ^{d} | Logistic Regression ^{e} |
---|---|---|---|---|---|---|
2000 | 0.035 (0.017) | 0.079 | 0.081 | 0.082 | 0.474 \|0.438\| (0.28) [95.4%] | 0.254 \|0.254\| (0.13) [66.7%] |
1000 | 0.035 (0.024) | 0.077 | 0.082 | 0.081 | 0.463 \|0.417\| (0.37) [96.2%] | 0.252 \|0.251\| (0.18) [79.4%] |
500 | 0.035 (0.034) | 0.077 | 0.081 | 0.080 | 0.448 \|0.402\| (0.49) [97.2%] | 0.259 \|0.254\| (0.26) [88.9%] |

^{a} Table shows mean estimates across 2500 simulations, with median estimates in bars \|\|, empirical standard deviations in parentheses () and 95% CI coverages in brackets [];

^{b} True values: β* = 0.035, $\sigma^{2} = \sigma_{m}^{2} = \sigma_{p}^{2} = 0.08$, ln(OR) = 0.438;

^{c} Mean estimates of β* and variance components exclude simulation runs in which the $\sigma^{2}$ estimate hit the 0.001 boundary. This occurred in 7.6%, 1.2%, and 0.08% of runs with N = 500, 1000, 2000, respectively;

^{d} Final log OR estimate incorporates AIC-based model selection (see Section 4.2) with ME only or PE only model selected in 19.1%, 8.5%, and 3.4% of runs with N = 500, 1000, 2000, respectively;

^{e} Based on Weinberg-Umbach poolwise model (Section 2.2), not accounting for ME or PE.

N | $\hat{\beta}^{*}$ | $\hat{\sigma}^{2}$ | $\hat{\sigma}_{m}^{2}$ | $\ln(\widehat{\text{OR}})_{\text{ml}}$ | $\ln(\widehat{\text{OR}})_{\text{adj}}$ | Logistic regression ^{d} |
---|---|---|---|---|---|---|
2000 | 0.035 (0.016) | 0.079 | 0.080 | 0.448 (0.22) [0.21] {95.4%} | 0.438 (0.21) [0.21] {95.6%} | 0.291 (0.13) [0.13] {79.9%} |
1000 | 0.035 (0.021) | 0.079 | 0.080 | 0.474 (0.32) [0.31] {96.5%} | 0.450 (0.30) [0.30] {96.4%} | 0.298 (0.19) [0.19] {89.0%} |
500 | 0.036 (0.031) | 0.076 | 0.083 | 0.522 ^{c} (0.51) [0.50] {97.5%} | 0.454 ^{c} (0.42) [0.43] {96.8%} | 0.307 (0.28) [0.27] {92.0%} |

^{a} Table shows mean estimates across 2500 simulations, with empirical standard deviations in parentheses (), mean estimated standard errors in brackets [] and 95% CI coverages in braces {};

^{b} True values: β* = 0.035, $\sigma^{2} = \sigma_{m}^{2} = 0.08$, ln(OR) = 0.438;

^{c} Final log OR estimate incorporates AIC-based model selection (see Section 4.2) with PE only model selected in 0.5% of runs with N = 500. ME only model used in 100% of runs with N = 1000 and 2000;

^{d} Based on Weinberg-Umbach poolwise model (Section 2.2), not accounting for ME or PE.

N | $\hat{\beta}^{*}$ | $\hat{\sigma}^{2}$ | $\hat{\sigma}_{p}^{2}$ | $\ln(\widehat{\text{OR}})_{\text{ml}}$ | $\ln(\widehat{\text{OR}})_{\text{adj}}$ | Logistic Regression ^{d} |
---|---|---|---|---|---|---|
2000 | 0.035 (0.014) | 0.080 | 0.080 | 0.444 (0.18) [0.18] {95.6%} | 0.442 (0.18) [0.18] {95.5%} | 0.356 (0.15) [0.15] {91.3%} |
1000 | 0.035 (0.020) | 0.079 | 0.078 | 0.441 (0.26) [0.26] {95.0%} | 0.438 (0.26) [0.26] {95.0%} | 0.356 (0.21) [0.21] {92.4%} |
500 | 0.035 (0.029) | 0.079 | 0.078 | 0.447 (0.37) [0.37] {96.0%} | 0.440 ^{c} (0.37) [0.36] {96.0%} | 0.361 (0.30) [0.30] {94.3%} |

^{a} Table shows mean estimates across 2500 simulations, with empirical standard deviations in parentheses (), mean estimated standard errors in brackets [] and 95% CI coverages in braces {};

^{b} True values: β* = 0.035, $\sigma^{2} = \sigma_{p}^{2} = 0.08$, ln(OR) = 0.438;

^{c} PE only model used in 100% of all runs for each sample size;

^{d} Based on Weinberg-Umbach (1999) poolwise model (Section 2.2), not accounting for ME or PE.

## 5. Discussion

## 6. Conclusions

## Acknowledgements

## Author Contributions

## Conflicts of Interest

## Appendix: SAS/IML Code Used to Fit Model (7) to Example Data

_{1}) for each pool), racesum (sum of indicators for ethnicity group ($C_2$) for each pool), mcp1_sum (pooled (or individual, for ki = 1) MCP1 assay value (X) for each pool).

    proc iml worksize = 70 symsize = 250;

    use fordiscrim;
    read all var{ki} into kj;
    read all var{SAsum} into ystar;
    read all var{smokesum} into smokestar;
    read all var{racesum} into racestar;
    read all var{mcp1_sum} into xstrtilde;
    close fordiscrim;

    npools = 415; ** 251 pools of size 2, and 164 individual samples **;

    ** Specifying likelihood for FULL ML method **;
    START LIKELIC(parms) global (npools,kj,pi,ystar,smokestar,racestar,xstrtilde);
      bet0prm = parms [1];
      bet1prm = parms [2];
      gamm1prm = parms [3]; *** Parameters in model (7) to be estimated ***;
      gamm2prm = parms [4];
      sigsqx = parms [5];
      sigsqp = parms [6];
      sigsqm = parms [7];
      pi = 2 * arsin (1);

      *** NOTE: LOWER BOUND CONSTRAINT ON VARIANCE COMPONENTS FOR STABILITY ***;
      sigsqx = max (sigsqx,.001);
      sigsqp = max (sigsqp,.001);
      sigsqm = max (sigsqm,.001);

      * contributions to likelihood ;
      func_lkC = j (npools,1,.);
      do u = 1 to npools;
        ystr = ystar [u,1];
        smkstr = smokestar [u,1];
        racestr = racestar [u,1];
        ki = kj [u,1];
        xstrt = xstrtilde [u,1];
        kigt1 = 0;
        if ki > 1 then kigt1 = 1;
        * poolwise mean and variance of the observed assay value ;
        muxtstrgyc = ki#bet0prm + bet1prm#ystr + gamm1prm#smkstr + gamm2prm#racestr;
        sigsqxtstrgyc = ki#sigsqx + sigsqp#kigt1 + sigsqm;
        func_lkC[u,1] = (1/sqrt(2#pi#max(sigsqxtstrgyc,1E-4)))#exp(-(xstrt-muxtstrgyc)##2/(2#max(sigsqxtstrgyc,1E-4)));
        ** Next 2 lines to prevent instability during iterations **;
        if func_lkC [u,1] < 1E-100 then func_lkC [u,1] = 1E-100;
        if func_lkC [u,1] > 1E20 then func_lkC [u,1] = 1E20;
        func_lkchk = func_lkC [u,1];
        * print func_lkchk;
      end;
      m2loglikC = -2 # sum (log(func_lkC));
      return (m2loglikC);
    FINISH LIKELIC;

    **********************************************************************
    The following calls the minimization function, computes the Hessian, etc.
    **********************************************************************;
    START COMPC;
      ** Maximum likelihood method **;
      * create vector of initial parameter estimates for function;
      parms =.2||.2||.2||.2||.5||.5||.5;
      * options vector for minimization function;
      option = {0 3};
      ** matrix of lower (row 1) and upper (row 2) bound constraints on parameters **;
      con={. . . . .001 .001 .001, . . . . . . .};
      * call function minimizer in IML;
      call nlpqn(rc,xres,"likelic",parms,option,con);
      * create vector of mles computed using function minimizer;
      Parms = xres`;
      * compute numerical value of Hessian (and covariance matrix) using mles calculated above ;
      print parms;
      * call function to approximate 2nd derivatives for Hessian;
      call NLPFDD (crit,grad,hess,"likelic",parms);
      * objective is -2 log-likelihood, so the covariance matrix is 2 * inverse Hessian ;
      cov_mat = 2 * inv (hess);
      se_vec1 = sqrt (vecdiag (cov_mat));
      print se_vec1;
      print cov_mat;
      print rc;
      bet0prm = parms [1];
      bet1prm = parms [2];
      gamm1prm = parms [3];
      gamm2prm = parms [4];
      sigsqx = parms [5];
      sigsqp = parms [6];
      sigsqm = parms [7];
      sebet1prm = sqrt (cov_mat [2,2]);
      segamm1prm = sqrt (cov_mat [3,3]);
      * discriminant function estimate of the adjusted log OR ;
      bet1discrim = bet1prm/sigsqx;
      print bet1discrim;
    FINISH COMPC;

    run compc;
    QUIT;

## References

1. Dorfman, R. The detection of defective members of a large population. Ann. Math. Stat. **1943**, 14, 436–440.
2. Emmanuel, J.C.; Bassett, M.T.; Smith, H.J.; Jacobs, J.A. Pooling of sera for human immunodeficiency virus (HIV) testing: An economical method for use in developing countries. J. Clin. Pathol. **1988**, 41, 582–585.
3. Kline, R.L.; Brothers, T.A.; Brookmeyer, R.; Zeger, S.; Quinn, T.C. Evaluation of human immunodeficiency virus seroprevalence in population surveys using pooled sera. J. Clin. Microbiol. **1989**, 27, 1449–1452.
4. Lan, S.; Hsieh, C.; Yen, Y. Pooling strategies for screening blood in areas with low prevalence of HIV. Biomed. J. **1993**, 35, 553–565.
5. Brookmeyer, R. Analysis of multistage pooling studies of biological specimens for estimating disease incidence and prevalence. Biometrics **1999**, 55, 608–612.
6. Schisterman, E.F.; Vexler, A. To pool or not to pool, from whether to when: Applications of pooling to biospecimens subject to a limit of detection. Pediatr. Perinat. Epidemiol. **2008**, 22, 486–496.
7. Schisterman, E.F.; Vexler, A.; Mumford, S.F.; Perkins, N.J. Hybrid pooled-unpooled design for cost-efficient measurement of biomarkers. Stat. Med. **2010**, 29, 597–613.
8. Weinberg, C.R.; Umbach, D.M. Using pooled exposure assessment to improve efficiency in case-control studies. Biometrics **1999**, 55, 718–726.
9. Ma, C.-X.; Vexler, A.; Schisterman, E.F.; Tian, L. Cost-efficient designs based on linearly associated biomarkers. J. Appl. Stat. **2011**, 38, 2739–2750.
10. Zhang, Z.; Albert, P.S. Binary regression analysis with pooled exposure measurements: A regression calibration approach. Biometrics **2011**, 67, 636–645.
11. Delaigle, A.; Hall, P. Nonparametric regression with homogeneous group testing data. Ann. Stat. **2012**, 40, 131–158.
12. Saha-Chaudhuri, P.; Weinberg, C.R. Specimen pooling for efficient use of biospecimens in studies of time to a common event. Am. J. Epidemiol. **2013**, 178, 126–135.
13. Lyles, R.H.; Mitchell, E.M. On Efficient Use of Logistic Regression to Analyze Exposure Assay Data on Pooled Biospecimens; Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University: Atlanta, Georgia, USA, 2013.
14. Mitchell, E.M.; Lyles, R.H.; Manatunga, A.K.; Danaher, M.; Perkins, N.J.; Schisterman, E.F. Regression for skewed biomarker outcomes subject to pooling. Biometrics **2014**, 70, 202–211.
15. Mitchell, E.M.; Lyles, R.H.; Manatunga, A.K.; Perkins, N.J.; Schisterman, E.F. A highly efficient design strategy for regression with outcome pooling. Stat. Med. **2014**, 33, 5028–5040.
16. Cornfield, J. Joint dependence of risk of coronary heart disease on serum cholesterol and systolic blood pressure: A discriminant function analysis. Fed. Proc. **1962**, 21, 58–61.
17. Halperin, M.; Blackwelder, W.C.; Verter, J.I. Estimation of the multivariate logistic risk function: A comparison of the discriminant function and maximum likelihood approaches. J. Chronic Dis. **1971**, 24, 125–158.
18. Armstrong, B.G.; Whittemore, A.S.; Howe, G.R. Analysis of case-control data with covariate measurement error: Application to diet and colon cancer. Stat. Med. **1989**, 8, 1151–1163.
19. Buonaccorsi, J.P. Double sampling for exact values in the normal discriminant model with application to binary regression. Commun. Stat. Theory Methods **1990**, 19, 4569–4586.
20. Lyles, R.H.; Guo, Y.; Hill, A.N. A fresh look at the discriminant function approach for estimating crude or adjusted odds ratios. Am. Stat. **2009**, 63, 320–327.
21. Hardy, J.B. The Collaborative Perinatal Project: Lessons and legacy. Ann. Epidemiol. **2003**, 13, 303–311.
22. Whitcomb, B.W.; Schisterman, E.F.; Klebanoff, M.A.; Baumgarten, M.; Rhoten-Vlasak, A.; Luo, X.; Chegini, N. Circulating chemokine levels and miscarriage. Am. J. Epidemiol. **2007**, 166, 323–331.
23. SAS/STAT 9.2 User’s Guide; SAS Institute, Inc.: Cary, NC, USA, 2008.
24. SAS/IML 9.2 User’s Guide; SAS Institute, Inc.: Cary, NC, USA, 2008.
25. Firth, D. Bias reduction of maximum likelihood estimates. Biometrika **1993**, 80, 27–38.
26. Akaike, H. A new look at the statistical model identification. IEEE Trans. Automat. Contr. **1974**, 19, 716–723.
27. Weinberg, C.R.; Umbach, D.M. Correction to “Using pooled exposure assessment to improve efficiency in case-control studies”. Biometrics **2014**.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Lyles, R.H.; Van Domelen, D.; Mitchell, E.M.; Schisterman, E.F.
A Discriminant Function Approach to Adjust for Processing and Measurement Error When a Biomarker is Assayed in Pooled Samples. *Int. J. Environ. Res. Public Health* **2015**, *12*, 14723-14740.
https://doi.org/10.3390/ijerph121114723
