1. Introduction
In human and veterinary science, serological assays are important tools for establishing the seroprevalence of various pathogens in a population. Such tests are typically low-cost, high-throughput, and can be executed rapidly. Serological assays can be used to assess antibody levels in serum (or a similar host fluid). When an enzyme-linked immunosorbent assay (ELISA) is applied for this purpose, the antibody concentration is usually reflected by chromogenic changes, measured as the optical density (OD). There is a clear intuition that ‘high’ OD levels should correspond with (previous) infection, and ‘low’ OD levels with the absence of antibodies specific to the pathogen of interest. However, both seropositive and seronegative individuals show heterogeneity in their response. Seronegative individuals vary due to differences in cross-reacting antibodies and in general antibody response. Seropositive individuals vary due to different infection doses, individual variation in antibody response, and waning of the response over time [1].
A typical seroprevalence study would include sera from a sample of the target population and classify individuals according to their individual measures of antibody level. For this classification, often a ‘cut-off value’ is chosen, below which an individual is assumed to be seronegative. The determination of a suitable cut-off is a sensitive issue. Roughly in order of increasing complexity, the following approaches are found in practice [1]:
Using a known negative population. With access to a known negative population, the sample mean $\bar{x}$ plus a multiple of the standard deviation $S$ may be used as a cut-off. This assumes that the distribution of OD values in the negative population is Gaussian, implying that $\bar{x} + 3S$ approximates the 99.7th quantile, which would keep the false positive rate below 0.3%. However, this assumption rarely holds. Rather, OD values are usually normally distributed on the log-scale [1,2,3]. Moreover, the sensitivity remains unknown, and in the absence of an independent estimate of sensitivity, the prevalence cannot be reliably estimated.
Using a ‘gold standard’. The cut-off can be chosen so as to maximize concordance between the serological assay and another test. This is somewhat unsatisfactory, since it presumes that one test (i.e., the ‘gold standard’) is absolutely superior to the other.
Using both a known negative and a known positive population. Having access to an additional known positive population allows for estimating both sensitivity and specificity at any cut-off. An ROC/AUC (receiver operating characteristic/area under the curve) analysis can then aid the researcher in selecting a cut-off with favorable properties [4], and also enables a correction of the prevalence using the Rogan–Gladen estimator [5]. However, it is unsatisfactory to obtain a corrected prevalence that no longer matches the prevalence based on the individual classifications.
Moreover, for all of these methods, individuals with a predefined infection status may show less variability in OD value than individuals from the target population, e.g., when experimentally infected animals are used for the known positive population, or animals raised under SPF (specific-pathogen-free) conditions are used as a known negative population.
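To make the Rogan–Gladen correction mentioned above concrete, the following minimal sketch applies the estimator to an apparent prevalence. The function name, the counts, and the sensitivity and specificity values are hypothetical and chosen for illustration only; truncating the result to [0, 1] is a common convention and an assumption here.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Correct an apparent (test-positive) prevalence for imperfect accuracy."""
    youden = sensitivity + specificity - 1.0
    if youden <= 0.0:
        raise ValueError("requires sensitivity + specificity > 1")
    corrected = (apparent_prevalence + specificity - 1.0) / youden
    # The raw estimator can fall outside [0, 1]; truncate (assumed convention).
    return min(1.0, max(0.0, corrected))


# Hypothetical numbers, for illustration only.
n_tested = 200
n_above_cutoff = 50
apparent = n_above_cutoff / n_tested
corrected = rogan_gladen(apparent, sensitivity=0.90, specificity=0.95)

print(f"apparent prevalence:  {apparent:.3f}")
print(f"corrected prevalence: {corrected:.3f}")
```

In this example the corrected prevalence differs from the fraction of individuals classified as positive, which illustrates the mismatch between the corrected estimate and the individual classifications noted above.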
In practice, often a limited number of negative control sera, sometimes only those added to each plate to correct for plate-to-plate variation, are used to estimate a cut-off value. However, large numbers of reference sera are needed to reliably estimate a mean and standard deviation. Calculations in Jacobson [1] suggest that hundreds of infected and uninfected reference animals are needed to obtain reasonable accuracy. Furthermore, cut-off values based on controls repeated over plates should be avoided, since they represent the uncertainty of the OD values of the controls, not the variation within a population.
The techniques outlined above also do not offer (1) measures of uncertainty of the results; (2) extensions to more complicated settings (e.g., spatial variation, or risk-factors such as age); and (3) extensions to more than two infectious states (e.g., positive, negative, or infected with a related strain). Clearly, the often used cut-off value for classifying ELISA results has some limitations.
The need for the proper inclusion of test accuracy characteristics, and the other pitfalls involved, has been pointed out before [2,6]. In the latter publication, latent class models set in a frequentist or Bayesian framework were advocated. The mixture models that we propose in the current work can also be viewed as such.
In a binary mixture model, the population is described by a mixture of two components, in which there is not a sharp cut-off, but rather a twilight zone of uncertain classification [7]. The statistical challenge is to capture those components, and also to characterize the uncertainty. The goal of this paper is to introduce a more appropriate technique to analyze ELISA results, built around the concept of ‘binary mixture modeling’. This class of models has properties that a cut-off model lacks: direct estimation of prevalence, quantification of uncertainty in the estimates, and extension to complicated settings (in this paper, the sample matrix: serum or heart fluid). A further advantage is the proper treatment of censored values (measurements outside of the domain where the assay operates). Mixture models and the Bayesian treatment of the subject have been reported before; however, to our knowledge, a user-friendly model including plate-to-plate variation and covariates has not been presented before. To show how binary mixture models can be used in serological studies, we applied the model to the example of Seoul orthohantavirus (SEOV) in rats. This rat-borne virus has a worldwide distribution and has gained renewed interest in the last decade, when human cases of hemorrhagic fever with renal syndrome (HFRS) were diagnosed in Europe and the USA [8]. We compared the model with classical cut-off methods and independent real-time quantitative polymerase chain reaction (RT-qPCR) results and show that the binary mixture model offers various properties that make it a better alternative to classical cut-off methods.
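The ‘twilight zone’ of a binary mixture can be illustrated with a short sketch that computes, for a given OD value, the probability of belonging to the positive component. The component parameters and the prevalence below are hypothetical and are not estimates from this study; the components are assumed to be Gaussian on the log10 scale.

```python
import math
from statistics import NormalDist


def prob_positive(log10_od, prevalence, neg, pos):
    """Probability that a sample belongs to the positive component."""
    weight_pos = prevalence * pos.pdf(log10_od)
    weight_neg = (1.0 - prevalence) * neg.pdf(log10_od)
    return weight_pos / (weight_pos + weight_neg)


# Hypothetical components on the log10(OD) scale, for illustration only.
negative = NormalDist(mu=-1.0, sigma=0.25)
positive = NormalDist(mu=0.0, sigma=0.30)
prevalence = 0.2

for od in (0.05, 0.2, 0.3, 0.5, 1.5):
    p = prob_positive(math.log10(od), prevalence, negative, positive)
    print(f"OD = {od:4.2f}  P(positive) = {p:.3f}")
```

Low and high OD values receive probabilities close to 0 and 1, whereas intermediate values receive probabilities in between, rather than being forced to one side of a threshold.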
4. Discussion
In this paper, we developed a binary mixture model in a Bayesian setting and applied it to data on SEOV infection in rats. This methodology has a number of distinct advantages compared to the classical cut-off method. First, mixture models for serology are based on biologically plausible principles: lognormal distributions of groups of positive and negative individuals. This is in contrast to most cut-off methods, where one hard threshold is postulated which is supposed to neatly divide the population in two. In some cut-off methods, this is accounted for by introducing a ‘doubtful’ category; however, this does not really solve the problem, since a decision still needs to be made for the ‘doubtful’ category when prevalence is estimated. In binary mixture modeling, the prevalence is estimated directly from the data. Second, the outcome of a mixture model for individual samples is more informative than that of a hard classification based on a cut-off, since we have probabilities of positivity. These probabilities allow us to assess our confidence in the infection status of each sample. Third, when performing a Bayesian model fit, the uncertainty in the parameter estimates may be assessed. For the cut-off method, no such concept exists; only uncertainty due to the finite number of samples is taken into account, for example by bootstrapping the data. Finally, in the Bayesian mixture model, covariates can be included in a straightforward way, which we demonstrated by differentiating between the matrices ‘serum’ and ‘heart fluid’. Other extensions are possible, such as the inclusion of variables such as age, sex, country, species, etc.
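As an illustration of how the prevalence can be estimated directly from the data, the sketch below fits a two-component Gaussian mixture to simulated log10 OD values. Note that it uses a maximum-likelihood expectation-maximization (EM) fit as a simple stand-in, not the Bayesian fit used in this paper, so it yields point estimates without credible intervals. All names, starting values, and simulation settings are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_binary_mixture(x, n_iter=200):
    """EM fit of a two-component Gaussian mixture on log10(OD) values."""
    xs = sorted(x)
    n = len(xs)
    # Crude starting values: quartiles as means, a wide common spread.
    mu_neg, mu_pos = xs[n // 4], xs[(3 * n) // 4]
    sd_neg = sd_pos = max((xs[-1] - xs[0]) / 4.0, 1e-6)
    prev = 0.5
    for _ in range(n_iter):
        # E-step: probability of positivity for each sample.
        w = []
        for v in x:
            a = prev * normal_pdf(v, mu_pos, sd_pos)
            b = (1.0 - prev) * normal_pdf(v, mu_neg, sd_neg)
            w.append(a / (a + b))
        # M-step: update prevalence, means, and standard deviations.
        sw = sum(w)
        prev = sw / n
        mu_pos = sum(wi * v for wi, v in zip(w, x)) / sw
        mu_neg = sum((1.0 - wi) * v for wi, v in zip(w, x)) / (n - sw)
        var_pos = sum(wi * (v - mu_pos) ** 2 for wi, v in zip(w, x)) / sw
        var_neg = sum((1.0 - wi) * (v - mu_neg) ** 2
                      for wi, v in zip(w, x)) / (n - sw)
        sd_pos = max(math.sqrt(var_pos), 1e-6)
        sd_neg = max(math.sqrt(var_neg), 1e-6)
    return prev, (mu_neg, sd_neg), (mu_pos, sd_pos)


# Simulated data with hypothetical parameters, for illustration only.
rng = random.Random(1)
labels = [rng.random() < 0.3 for _ in range(1000)]
log_od = [rng.gauss(0.0, 0.3) if is_pos else rng.gauss(-1.0, 0.2)
          for is_pos in labels]

prev_hat, (mu_neg_hat, sd_neg_hat), (mu_pos_hat, sd_pos_hat) = \
    fit_binary_mixture(log_od)
print(f"simulated fraction positive: {sum(labels) / len(labels):.3f}")
print(f"estimated prevalence:        {prev_hat:.3f}")
```

No cut-off is chosen at any point: the prevalence is one of the fitted parameters, and the per-sample weights from the E-step are the probabilities of positivity.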
Another appealing property of Bayesian modeling is that many data sources can be integrated in one model. We highlighted this by including three studies. Each study added information which enabled more precise estimates of the location and spread of the components. Data of future studies may also be incrementally added to the database, thereby continuously refining our estimates.
In this paper, we also performed a plate-to-plate correction following [12], but on the log10-transformed OD values. Another adaptation to the procedure was removing the constant term from the regression equation. This was based on the recognition that adding a constant to a log-normal distribution skews the shape, which hinders fitting a model based on detection of these shapes. Some commercial kits recommend using the sample-positive ratio, scaling each plate by the value of the positive control. This method does not take into account that the positive control also has some random fluctuation on top of the systematic bias. With a regression approach using several control sera, such fluctuations are averaged and less influential. In future work, it would be interesting to include the plate-to-plate variation directly in the Bayesian model.
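One possible reading of a regression-based plate correction without a constant term is sketched below. The details of the procedure in [12] are not reproduced here, so the following is an assumption: for each plate, the log10 OD values of the control sera are regressed through the origin on reference values for the same controls, and the samples on that plate are divided by the fitted slope. All names and OD values are hypothetical.

```python
import math


def slope_through_origin(x, y):
    """Least-squares slope of y = b * x (no constant term)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Hypothetical reference OD values of three control sera.
reference_od = [0.08, 0.35, 2.0]
reference_log = [math.log10(od) for od in reference_od]

# Hypothetical plate with a systematic deviation on the log10 scale.
plate_control_od = [od ** 1.1 for od in reference_od]
plate_log = [math.log10(od) for od in plate_control_od]

slope = slope_through_origin(reference_log, plate_log)

# Apply the correction to controls and to (hypothetical) samples.
corrected_controls = [v / slope for v in plate_log]
sample_od = [0.12, 0.9]
corrected_samples = [math.log10(od) / slope for od in sample_od]

print(f"fitted slope: {slope:.3f}")
print("corrected samples (log10 OD):",
      [round(v, 3) for v in corrected_samples])
```

Because the slope is estimated from several control sera at once, random fluctuation in any single control has less influence than in a ratio to one positive control.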
As a negative population, we used a wild rat population which was considered to be negative, based on all-negative RT-qPCR results. Cut-off values were obtained in log10-scale and linear scale using the mean plus two or three times the standard deviation. The log-scale is a prerequisite for obtaining the specificities that are aimed for with a mean plus 2 or 3 standard deviations: 97.5% and 99.7%.
Table 3 shows that indeed, the cut-off values based on log-transformed values come much closer to the desired specificities, albeit not perfectly. For the feeder rat study, this may in part be explained by the different matrix employed, which shifted the distributions to the right.
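The effect of the scale on which the cut-off is computed can be checked analytically for a lognormal negative population. The sketch below uses hypothetical parameters (not estimates from this study) and compares the specificity of a mean plus k standard deviations cut-off on the linear scale with the nominal one-sided Gaussian value obtained on the log scale. The nominal values are computed rather than hard-coded, so they may differ slightly from rounded textbook figures.

```python
import math
from statistics import NormalDist

LN10 = math.log(10.0)


def specificity_log_cutoff(k):
    """Specificity of mean + k*SD computed on the log scale."""
    return NormalDist().cdf(k)


def specificity_linear_cutoff(mean_log10, sd_log10, k):
    """Specificity of mean + k*SD computed on the linear OD scale,
    when log10(OD) of the negatives is Gaussian."""
    mu, sigma = mean_log10 * LN10, sd_log10 * LN10
    mean_od = math.exp(mu + 0.5 * sigma ** 2)
    sd_od = mean_od * math.sqrt(math.exp(sigma ** 2) - 1.0)
    cutoff = mean_od + k * sd_od
    return NormalDist(mu, sigma).cdf(math.log(cutoff))


# Hypothetical negative population: log10(OD) ~ N(-1.0, 0.25).
for k in (2, 3):
    nominal = specificity_log_cutoff(k)
    linear = specificity_linear_cutoff(-1.0, 0.25, k)
    print(f"k = {k}: log-scale {nominal:.4f}, linear-scale {linear:.4f}")
```

With these assumed parameters the linear-scale cut-off falls short of the nominal specificity, in line with the observation that the log-scale cut-offs come closer to the values aimed for.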
Furthermore, the binary mixture model did not always perform as well as theoretically predicted. Looking at the captive rats study, we find only a 71% sensitivity where 93% was expected (Table 3). In the study on feeder rats, we obtained 84% instead of the 97% expected. The ‘outlier points’ in Figure 4 give us a possible explanation: they seem to be truly RT-qPCR positive but serologically negative, or vice versa. For example, in the study on feeder rats, there are 16 PCR negative rats, but three of those were scored positive by the mixture model, as well as by most cut-off methods. However, looking at the top three points, we can observe that those are clearly serologically positive. Taking this into account would raise the specificity from 13/16 (81%) to perfect specificity. Hence, the theoretical mixture model performance could well be accurate while the supposed ‘gold standard’ is lacking. Similar considerations also hold for the cut-off-based methods. The few rats for which serology and RT-qPCR results diverge could be rats that were recently infected before capture, or rats that did clear their SEOV infection. The finding of RT-qPCR negative and seropositive rats indicates that clearance of the virus may be possible in rare cases. However, these are exceptions, as most SEOV infections result in life-long infection in rats [16,17].
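The arithmetic behind the specificity argument for the feeder rat study can be written out as a short sketch. The counts are those quoted above (16 PCR-negative rats, three of which were scored positive); the function name is hypothetical, and treating the three discordant rats as truly positive is the assumption under discussion, not an established fact.

```python
def specificity(true_negatives, false_positives):
    """Fraction of reference-negative animals scored negative."""
    return true_negatives / (true_negatives + false_positives)


pcr_negative = 16
scored_positive = 3

# RT-qPCR taken at face value as the reference.
spec_observed = specificity(pcr_negative - scored_positive, scored_positive)

# Assumption: the three discordant rats are truly (serologically) positive,
# so they no longer count as reference negatives.
spec_reassigned = specificity(pcr_negative - scored_positive, 0)

print(f"specificity with RT-qPCR as reference: {spec_observed:.4f}")
print(f"specificity after reassignment:        {spec_reassigned:.4f}")
```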
We found that the results for the different cut-off methods are very sensitive to the exact cut-off chosen. Naturally, this depends on the characteristics of the ELISA and the overlap between the two distributions. When distinct negative and positive distributions are present, the influence of the chosen cut-off is smaller than in a situation with overlapping distributions, which is more common in adapted or in-house ELISAs for non-standard species or pathogens. In our SEOV ELISA, a cut-off of the mean plus three standard deviations in linear scale works best (in terms of accuracy, sensitivity, and specificity), and it even slightly outperforms the binary mixture model. However, this was a lucky coincidence, as the cut-off choice is arbitrary, and without a ‘gold standard’ to compare with, one can never know which test is best in practice. Nonetheless, the added advantages of the binary mixture model outweigh this slight outperformance.
The sensitivities of the cut-offs based on three standard deviations are poor. Figure 3 shows why this occurs: the cut-off of the mean plus three standard deviations is ‘cutting off’ a large chunk of the positive component, mainly because of the shift to the right in the feeder rat study, which was based on heart fluid, whose OD values are systematically higher than those of serum. Hence, a cut-off that works nicely in one situation may fail in another. In contrast, the binary mixture model has a stable performance.
There are some cases in which binary mixture models will not converge to stable parameter estimates. For both mixture models and cut-off models, the performance is very much dependent on the characteristics of the data. When the positive and negative populations overlap substantially, or when the prevalence is extremely small or large, both methods will have trouble giving meaningful results. However, crucially, when using a cut-off method, one can never know how inaccurate the obtained result is. Mixture models in contrast will either fail to converge, or give large credible intervals for parameter estimates, thereby signaling that the results may be unreliable.
Another problematic feature of the data could be that components are skewed and no longer resemble a proper log-normal distribution. Using mixture models, this may be solved by using non-Gaussian distributions [18,19]. A possible cause for this phenomenon is that at the extremes of the OD value spectrum, the optical densities no longer linearly depend on the antibody levels. A calibration curve could be included in the model to correct for this effect, or alternatively, the skewed parts of the model could be treated as censored data (the same technique as used in the current study for negative OD values). It could also be the case that the data is actually not log-normally distributed because it is a mixture of differing populations, age-groups, etc. Such a situation would require detailed study to find the root cause.
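The treatment of censored values can be sketched as follows: an observed value contributes the mixture density to the likelihood, whereas a left-censored value contributes the mixture probability of falling below the censoring limit. This is a minimal illustration under assumed parameters and a hypothetical censoring limit, not the model code accompanying this paper; censored observations are marked with `None`.

```python
import math
from statistics import NormalDist


def mixture_loglik(log10_od, limit, prevalence, neg, pos):
    """Log-likelihood of a binary mixture with left-censored values.

    Entries equal to None are treated as censored below `limit`.
    """
    total = 0.0
    for value in log10_od:
        if value is None:
            # Censored: probability mass below the limit.
            term = (prevalence * pos.cdf(limit)
                    + (1.0 - prevalence) * neg.cdf(limit))
        else:
            # Observed: density at the measured value.
            term = (prevalence * pos.pdf(value)
                    + (1.0 - prevalence) * neg.pdf(value))
        total += math.log(term)
    return total


# Hypothetical components, prevalence, and censoring limit.
negative = NormalDist(mu=-1.0, sigma=0.25)
positive = NormalDist(mu=0.0, sigma=0.30)
data = [-1.1, -0.9, None, 0.1, None, -0.2]

loglik = mixture_loglik(data, limit=-1.5, prevalence=0.2,
                        neg=negative, pos=positive)
print(f"log-likelihood: {loglik:.3f}")
```

The same construction could, in principle, be applied at the upper end of the OD range by using the probability of exceeding an upper limit.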
The use of binary mixture models for serological assays has been advocated since the 1990s [1,20]. Since then, several extensions to the basic Bayesian model have been proposed, such as the inclusion of dependence between multiple tests on single subjects [21], or integrated plate-to-plate variation using a hierarchical Bayesian setup [3]. It would be an interesting avenue of research to further develop Bayesian models to include all those aspects in one model.
Despite its advantages, the binary mixture modeling method has not gained widespread use in veterinary and clinical practice. Surely the complexity of the method as compared to a simple cut-off value plays a role in this. However, nowadays, researchers in life sciences are increasingly supported by (in-house) statistical or modeling expertise, to assist with data analyses. Furthermore, traditionally the focus has mostly been on the proper data collection and laboratory analyses, instead of the methods used for data-analysis.
It is our hope that this manuscript presents the subject matter in such a way as to make it of practical use for those who are not expert statisticians. We aimed to make our model code accessible to a broad audience. Having the source code available with this paper can perhaps facilitate the use of binary mixture models, and provide a basis to build further on. Knowledge of the existence of more flexible and informative data-analysis methods will hopefully lead to their increased use.