Decision Strategies for Absorbance Readings from an Enzyme-Linked Immunosorbent Assay—A Case Study about Testing Genotypes of Sugar Beet ( Beta vulgaris L.) for Resistance against Beet Necrotic Yellow Vein Virus (BNYVV)

: The Beet necrotic yellow vein virus (BNYVV) causes rhizomania in sugar beet ( Beta vulgaris L.), which is one of the most destructive diseases in sugar beet worldwide. In breeding projects towards resistance against BNYVV, the enzyme-linked immunosorbent assay (ELISA) is used to determine the virus concentration in plant roots and, thus, the resistance levels of genotypes. Here, we present a simulation study to generate 10,000 small samples from the estimated density functions of ELISA values from susceptible and resistant sugar beet genotypes. We apply receiver operating characteristic (ROC) analysis to these samples to optimise the cutoff values for sample sizes from two to eight and determine the false positive rates (FPR), true positive rates (TPR), and area under the curve (AUC). We present, furthermore, an alternative approach based upon Bayes factors to improve the decision procedure. The Bayesian approach has proven to be superior to the simple cutoff approach. The presented results could help evaluate or improve existing breeding programs and help design future selection procedures based upon ELISA. An R-script for the classiﬁcation of sample data based upon Bayes factors is provided.


Introduction
Rhizomania was first recorded in 1952 in the North of Italy [1]. In 1973 the Beet necrotic yellow vein virus (BNYVV), a member of the genus Benyvirus, was identified as the causal agent of rhizomania [2]. By then, BNYVV, which is transmitted by the soil-borne fungus Polymyxa betae Keskin [3][4][5], had already spread across wide parts of Europe and Japan [6]. Today, rhizomania is spread in sugar beet growing areas worldwide and counts as one of the most destructive soil-borne diseases of sugar beet (Beta vulgaris L.) with loss of root yield of up to 90 % and reduced sugar yield by up to 80% [7].
The enzyme-linked immunosorbent assay (ELISA) counts as the standard tool in microbiology to estimate the concentration of a specific protein in a sample [8]. The coat protein of BNYVV enables the usage of ELISA for virus detection. Mainly double antibody sandwich-ELISA (DAS-ELISA) [7,[9][10][11] and triple antibody sandwich-ELISA (TAS-ELISA) [12][13][14] have been adapted as tools to detect BNYVV in a sample either for qualitative detection where only the presence or absence of the organism is analysed [7], semi-quantitative detection where absorbance levels of the ELISA machine are analysed [11] or quantitative detection where virus concentrations are analysed [10,15,16].
In DAS-ELISA, an enzyme-linked antibody binds specifically to the coat protein of BNYVV, which is bound to the surface of a microtitre plate by another specific antibody. The enzyme catalyses a reaction that decomposes a substrate added to each sample leading to lower transmittance of the sample [17,18]. Thus, the level of transmittance is correlated to both the time elapsed since the substrate was added and the amount of BNYVV in the sample. If the transmittance of each sample is analysed after the same amount of time has passed since the substrate has been added, the transmittance of each sample correlates solely to the amount of BNYVV in the sample. The level of transmittance can be analysed by measuring the absorbance of the sample at a certain wavelength, leading to an optical density (OD) value for each sample [19].
In 1970, first breeding efforts on rhizomania resistance were carried out [20]. Seventeen years later, in 1987, a first partial resistance to rhizomania was discovered [21] followed by the discovery of monogenic resistance genes over the years. Nevertheless, rhizomania resistance appears to be quantitative [20]. Using DAS-ELISA, a wide spread of virus titres in different genotypes of sugar beet grown in BNYVV-infested soils could be observed and used to differentiate between resistant and susceptible sugar beet genotypes [22]. To differentiate between resistant and susceptible genotypes, the usage of a cutoff threshold is the straightforward approach. In this way, a threshold is chosen and if the mean OD value of a genotype is below this threshold, it is assumed to be resistant [10,11].
An alternative procedure would be the usage of Bayes factors where the probability of the hypothesis that the sample originates from one distribution is compared with the probability of the hypothesis that the same sample originates from the other distribution [23]. Bayes factors are frequently applied in the life and medical sciences to assess the evidence of fitted models or distributions [24,25]. They are, for example, used in the assessment of different ELISA tests [26] or in the usage of ELISA data to assess the infection state of dairy herds [27]. Moreover, Bayes factors have been used in the analysis of genetic data [28] and, hence, in the selection of genetic markers associated with an individual's phenotype [29]. Bayes factors were also used in resistance breeding of dairy against Mycobacterium avium subspecies paratuberculosis [30].
Here, we compare the procedure based upon a simple cutoff value with the procedure based upon Bayes factors to differentiate between resistant and susceptible sugar beet genotypes grown in BNYVV infested soil. To this end, we estimated probability density functions from the OD values collected with DAS-ELISA from a relatively large number of plants (in the order of hundreds) of genotypes that are known to be either resistant or susceptible. Next, we simulated the drawing of a large number (e.g., 10,000) of small samples from these estimated density functions. For each of these small samples the average OD value and the Bayes factor were determined. Thanks to the labelling of each of these small samples as either "resistant" or "susceptible", the optimal cutoff value for the sample mean as well as for the Bayes factor could be determined in an ROC analysis. For the optimal cutoff values, false positive rate (FPR), true positive rate (TPR), and area under the curve (AUC) were determined as measures of classification quality. We carried out this analysis for samples of size n = 2 to 8. The calculated optimal cutoff values were used to classify completely new measurements. If a sample mean is below the optimal cutoff for the mean OD for a given sample size, the sample is classified as resistant. If one chooses the Bayes factor as predictor, the sample is classified as resistant if the sample's Bayes factor is greater than the optimal cutoff for the Bayes factor for a given sample size. We show that the classification based upon Bayes factors is superior to that based upon mean OD values in terms of the above mentioned quality measures for all sample sizes under investigation.

Plant Material and Plant Growth
Six genotypes were used, of which four were resistant towards rhizomania (KWS A, KWS B, KWS C, KWS D) and two were susceptible (KWS E, KWS F). Each genotype was classified as either "resistant" or "susceptible" based on the distribution of all OD measurements of that genotype. Five genotypes were doubled haploid lines and one genotype (KWS B) was an inbred line to ensure that each plant of the same genotype is genetically identical. Of each genotype, one hundred plants were used leading to 600 plants for the whole trial.
The trial was carried out following the protocol presented in [31] with some modifications. Seeds were germinated in sterile soil. After one week, seedlings were transplanted into pots filled with a soil sand mixture (40:60) with soil from Pithivier (France). The soil was collected from a trial field where symptoms of rhizomania on plants were visible. The plants were grown for 10 weeks in the greenhouse at around 25 • C during the day and 16 • C at night.
After ten weeks, plants were taken out of the soil, soil was removed from the root body with tap water and lateral roots were removed from the root body. Plant sap was extracted from the lateral roots and 50 µg of plant sap was placed into 500 µg of buffer solution containing 1.59 µg Na 2 CO 3 , 2.93 g NaHCO 3 , and 0.2 g NaN 3 solved in 1 l distilled water.

Determination of Optical Density Values
All wells of the 96-well plates except for blanks and buffer controls were coated with 200 µg BNYVV antibody solution (LOEWE ® Biochemica GmbH, Sauerlach, Germany). After storage of the plates for 4 h at 4 • C, wells were washed using 405TS (BioTek ® Instruments Inc., Winooski, VT, USA), filled with 200 µL of the samples, stored for 12 h at 4 • C and washed again. Afterwards, all 96 wells of each plate were filled with 200 µL of enzyme-linked antibodies (LOEWE ® Biochemica GmbH, Sauerlach, Germany), stored for 4 h at 37 • C and washed again. In a last step, each well of one plate was filled at the same time with the substrate solution containing 4-Nitrophenyl phosphate Na2-salt. After storage of the plates for 90 min at 37 • C, OD values were measured using Infinite F50 ® (Tecan Group AG, Männedorf, Switzerland) at a wavelength of 405 nm.

Statistical Analysis
The measurements from the resistant and the susceptible samples were each pooled to yield the resistant and the susceptible groups, respectively. These pools of measurements, X res and X sus , take on the role of the resistant and susceptible populations in the context of this analysis.
Statistical analysis was performed with the software environment R [32], version 3.6.3., graphical displays were generated using ggplot2 [33] and lattice [34]. Density estimation for the OD values from the resistant and susceptible pools,f res andf sus , resp., and drawing samples from these densities were achieved with the package logspline [35,36].
In a simulation, each 10,000 samples x = x 1 , . . . , x n of size n (n = 2, . . . , 8) were drawn from the resistant and the susceptible pool. For each sample, the arithmetic mean was calculated. For each sample size, all OD values between 0 and 4 in steps of 0.01 were used as cutoff values to classify samples as resistant or susceptible. If the mean of a sample was smaller than the cutoff value, it was classified as resistant, otherwise it was classified as susceptible. For each of the analysed cutoff values, the FPR and TPR were calculated, which is the rate of samples being classified as resistant when truly susceptible or being classified as resistant when truly resistant, respectively. Afterwards, for each sample size, a ROC analysis was performed with the package ROCit [37] where for each cutoff value a graph was produced which displays the TPR on the y-axis and the FPR on the x-axis. Subsequently, the AUC of the ROC curve was calculated. We selected the cutoff value per sample size which maximised the difference between TPR and FPR, called the Youden Index, as the optimal cutoff value.
As an alternative approach, we used Bayes factors to classify the samples as either being resistant or susceptible. The Bayes factor B is defined as where p(x|θ 1 ) and p(x|θ 2 ) are the conditional densities for the sampled data x given the resistant and the susceptible pool, respectively [28]. Since the priors are assumed to be equal, the prior odds are assumed to be one. Therefore, the Bayes factor equals the posterior odds and, in this way, it is taken as a relative measure of evidence of the resistant pool over the susceptible pool. We estimated B from our empirical densitiesf res andf sus , respectively, aŝ where x i , i = 1, . . . , n, are the OD measurements of a simulated sample of size n. In theory, values ofB > 1 are indicative that a sample was taken from the resistant pool, otherwise it was taken from the susceptible pool. For each of the 10,000 samples average OD, Bayes factor, true class label (resistant or susceptible), and sample size were stored in a file for subsequent analysis. Finally, we determined the FPR, TPR and AUC values for the optimal cutoff values for each sample size for both classification approaches, i.e., mean OD value and Bayes factor. The 95% confidence intervals for FPR and TPR were determined using the package fbroc [38]. The 95% confidence intervals for AUC were determined with the package pROC [39].

Application in a Realistic Setting
We were interested to see if reliable classification of small samples is possible if the learning data are derived from one resistant and one susceptible genotype instead of a pool as described before and if the data to be classified are not contained in the learning data. To this end, we defined OD values from KWS C (74 measurements) to be the resistant training set and the OD values from KWS F (86 measurements) to be the susceptible training set.
As test sets to be classified we chose the OD values from KWS D (resistant, 50 measurements) and from KWD E (susceptible, 99 measurements). Choosing n = 5 to be the size of the small samples, we obtained 10 samples for KWS D and 19 samples for KWS E. The small samples were drawn randomly from the measurements without replacement, such that four measurements for KWS E were not considered.

Results
We determined the OD values from a total of 600 individual seedlings from six different sugar beet genotypes. After removing 111 seedlings where the measuring had failed, the remaining 489 samples produced OD values between 0.005 and 4.000. A total of 304 measurements resulted from four resistant genotypes (KWS A, KWS B, KWS C, KWS D with n = 83, 97, 74, and 50, respectively) and 185 measurements from susceptible genotypes (KWS E, KWS F with with n = 99 and 86, respectively). Figure 1 displays the data distribution for every genotype as histograms. Due to the technical limits of the ELISA machine, OD values are limited to the interval between 0 and 4. None of the data distributions seem to follow a normal distribution. All resistant genotypes show a right-skewed data distribution. The genotype KWS A shows the smallest central tendency of ODs and the highest degree of skewness. Despite their classification as resistant genotypes, KWS C and KWS D show a substantial fraction of high OD values. Similarly, KWS E and KWS F show OD values that are almost uniformly distributed across the whole interval.
In Figure 2 the data distribution of ODs is displayed but genotypes are pooled by their resistance level as either resistant or susceptible. The resistant pool shows a right-skewed data distribution, whereas the susceptible pool shows an almost uniform data distribution with a peak at the right border of the interval.  For instructional purposes, we display a ROC curve in Figure 3 which was created using the average OD as predictor with a sample size of n = 2. The Youden index which is chosen as optimal cutoff value for the separation of "resistant" and "susceptible" is the average OD value that maximises the difference between TPR and FPR. It is also the point of the curve that is most distant from the chance line. Its value in this example is 0.77 which, however, cannot be deduced from the graph. At the Youden index, FPR = 0.052, TPR = 0.844 and AUC = 0.96.  Figure 4 displays the TPR, FPR and AUC for each sample size with the average OD and Bayes factor as predictors. One can see that for each of the analysed sample sizes the TPR value based upon the Bayes factor is greater than the TPR value based upon the average OD as predictor. Analogously, the FPR based upon the Bayes factor is smaller for each sample size. Moreover, the AUC values based upon the Bayes factor is higher than the AUC based upon the average OD for each sample size. At a sample size of six, the AUC value based upon the Bayes factor as predictor has practically converged to one. pathogens (Escherichia coli, Klebsiella pneumoniae, Pseudomonas aeruginosa and Staphylococcus aureus). For example, Escherichia coli strains were resistant to both co-trimoxazole and ESBL in three samples, and to fluoroquinolones and ESBL in five samples. Klebsiella pneumoniae strains were resistant to carbapenems in two cases, with double resistance to fluoroquinolones but not to co-trimoxazole. Pseudomonas aeruginosa strains had a different pattern of double resistance: resistance to carbapenems were always (16/16) associated with resistance to fluoroquinolones but less to piperacillin (16/5), to ceftazidime/cefepime (16/4), to aminoglicosydes (16/2) and colistin (16/2). Lastly, Staphylococcus aureus strains, resistant to oxacillin (MRSA), were often refractory, even against fluoroquinolones (52/45)  Figure 5 shows the optimal cutoff values for the mean OD and the Bayes factor for the sample sizes under investigation. The optimal cutoff values for the mean OD are relatively stable at around one as one also might have estimated from the distributions for the resistant and the susceptible pool shown in Figure 2. Values between 0.6 and 1.0 appear to be a good choice since an overwhelming fraction of all measurements from the resistant pool are below this range. The optimal cutoff for the Bayes factor is also in the range of one for samples sizes two to five, as one might expect given the theoretical value is 1.0. The divergent values for samples sizes six to eight ranging from 2 to 2.5 are less obvious. Plotting TPR against possible cutoff values between 1 and 10 revealed that they were above 0.99 in this range, such that the decision of the algorithm might depend on tiny numerical effects. The choice of the theoretical value of one as the optimal cutoff for Bayes factors, thus, seems to be justified also for practical purposes such that an ROC analysis does not seem to be absolutely necessary if one chooses Bayes factors as predictors. In order to facilitate the calculations for interested researchers and breeders, we make an R-script named classifyELISA.r freely available as a Supplementary file. The script reads in a file (TrainingSet.txt) containing the OD measurements that are labelled with either "resistant" or "susceptible". Empirical density functions are calculated from which 10,000 small samples are drawn. The size of the small samples is the same as in the file of data to be classified (DataSet.txt). Both input files are also made available. After the calculation of sample means and Bayes factors for each small sample, optimal cutoff values are determined in a ROC analysis for both mean OD values and Bayes factors. These cutoff values are applied to classify the new data set provided in the file DataSet.txt. The results are written in the file PredictionSet.txt.

Discussion and Conclusions
We have shown in a simulation study that if a representative training set containing absorbance readings from ELISA together with a reliably assigned class label is available, small samples can be classified quite accurately. In more detail, our endeavours were two-fold. First, we wanted to improve a very simple decision procedure in which the average of few values is taken and compared to some conventional threshold, e.g., an average OD of 0.5. If the sample average is below this cutoff, the corresponding genotype is considered as resistant and is selected for further breeding towards rhizomania resistance. Otherwise, it is rejected. We have shown that in a ROC analysis the choice of the cutoff can be optimised in the sense that for it the distance between false positive and true positive rate is maximised. This simple decision strategy yielded surprisingly good results even for samples of size n = 2 (FPR = 0.07, TPR = 0.85).
Secondly, we calculated the Bayes factor for each sample based upon the empirical density functions that were derived from the training set and used it as predictor in a further ROC analysis. We found that the Bayes factor is a predictor that is superior to the average OD value for all sample sizes under investigation. Using the Bayes factor as a predictor, a TPR over 0.99 and a FPR below 0.01 can be expected for n ≥ 5. This performance would be achieved with the average OD strategy only for n ≥ 8.
Given that the calculation of the Bayes factor is relatively straightforward, applying the decision strategy based upon the Bayes factor could contribute to improving the selection procedure in corporate or academic settings by 3 to 4 percentage points in TPR and FPR for n = 3 with the advantage becoming less pronounced for larger samples.
We anticipate that our deliberations can be transferred to similar settings where binary decisions have to be made based upon small samples given approximate distributions of measurements in the two classes. Possible knowledge about the occurrence of the two classes could be considered by specifying prior probabilities in the Bayes factors.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/agriculture11100956/s1: Script S1: classifyELISA.r, R script for classification of small samples of OD values into resistant or susceptible. Table S2: TrainingSet.txt, table of three columns to estimate the optimal cutoff values. Column 1 contains identifiers, column 2 contains measurements, column 3 contains true labels. Table S3: DataSet.txt, data set of small samples as an example of measurements to be classified. Column 1 contains identifiers, column 2 contains measurement 1, column 3 contains measurement 2, column 4 contains measurement 3.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: