Interlaboratory Validation of a DNA Metabarcoding Assay for Mammalian and Poultry Species to Detect Food Adulteration

Meat species authentication in food is most commonly based on the detection of genetic variations. Official food control laboratories frequently apply single and multiplex real-time polymerase chain reaction (PCR) assays and/or DNA arrays. However, in the near future, DNA metabarcoding, the generation of PCR products for DNA barcodes, followed by massively parallel sequencing by next generation sequencing (NGS) technologies, could be an attractive alternative. DNA metabarcoding is superior to well-established methodologies since it allows simultaneous identification of a wide variety of species not only in individual foodstuffs but even in complex mixtures. We have recently published a DNA metabarcoding assay for the identification and differentiation of 15 mammalian species and six poultry species. With the aim to harmonize analytical methods for food authentication across EU Member States, the DNA metabarcoding assay has been tested in an interlaboratory ring trial including 15 laboratories. Each laboratory analyzed 16 anonymously labelled samples (eight samples, two subsamples each), comprising six DNA extract mixtures, one DNA extract from a model sausage, and one DNA extract from maize (negative control). Evaluation of data on repeatability, reproducibility, robustness, and measurement uncertainty indicated that the DNA metabarcoding method is applicable for meat species authentication in routine analysis.


Introduction
Food authentication is known to be a challenging task. The methodology applied depends on several factors, including sample type, type of adulteration, and the information required. Meat products are most commonly adulterated by the replacement of highpriced animal species by lower-quality or cheaper ones. Since DNA-based methodologies are highly suitable to detect genetic variations such as single nucleotide polymorphisms (SNPs), insertions, and deletions, they play a crucial role in the identification and differentiation of animal species in meat products [1][2][3]. DNA-based methodologies target either species-specific sequences in nuclear DNA or conserved regions in mitochondrial DNA [4]. Currently, authentication of meat products in official food laboratories is mainly based on real-time polymerase chain reaction (PCR) assays and/or DNA arrays. Multiplex real-time PCR assays, allowing the identification and quantification of multiple species in one and the same well, are particularly applicable for routine analysis because they allow saving time and costs. Multiplex real-time PCR assays have not only been developed for domesticated species, e.g., beef, pig, chicken, and turkey [5], and beef, pig, horse, and sheep [6], but also In total, seven animal species, including five mammalian species (Sus scrofa domesticus (pig), Bos taurus (cattle), Equus caballus (horse), Ovis gmelini aries (sheep), and Capra aegagrus hircus (goat)), and two poultry species (Gallus gallus domesticus (chicken) and Meleagris gallopavo (turkey)) were covered by the samples. All samples originated from muscle meat and were purchased from local meat suppliers.
Sample 1 contained DNA from seven animal species: DNA from pig as major component (94%, v/v), and DNA from six animal species (cattle, horse, sheep, goat, chicken, turkey; 1% (v/v) each). Samples 2-6 consisted of DNA from five animal species in varying proportions, ranging from 0.1% (v/v) to 67.5% (v/v). All DNA extract mixtures were prepared at the AGES.
The model sausage was produced according to the Codex Alimentarius Austriacus by the Higher Technical College for Food Technology Hollabrunn (Hollabrunn, Austria). The model sausage consisted of 50% (w/w) beef, 40% (w/w) pig, 5% (w/w) chicken, and 5% (w/w) turkey.

Genomic DNA Extraction
Extraction of genomic DNA from muscle meat of the seven animal species (pig, beef, horse, sheep, goat, chicken, turkey) was carried out at the CVUA-FR by applying the official method L 00.00-119 [13]. Identity of the animal species was verified by subjecting DNA extracts to Sanger sequencing and matching a~464 base pair (bp) fragment of the mitochondrial cytochrome b gene against public databases provided by the National Center for Biotechnology Information (NCBI, Bethesda, MD, USA) [14,15]. After verification by Sanger sequencing and isolating genomic DNA fourfold, individual DNA extracts were combined. Total DNA of the (combined) DNA extracts was quantified by spectroscopy employing a UV/VIS spectrophotometer, adjusted to a DNA concentration of 20 ng/µL and sent to AGES. Isolation of DNA from the homogenized model sausage was performed at AGES [13].
The copy number of the mitochondrial 16S ribosomal DNA gene in the extracts from the respective animal species was determined by droplet digital PCR (ddPCR, QX200 Droplet Generator, QX200 Droplet Reader (Bio-Rad, Hercules, CA, USA)) using the Eva-Green Supermix. DNA extract mixtures were prepared at the AGES, by taking the copy numbers (pig (870 copies/µL), beef (1069 copies/µL), horse (1795 copies/µL), sheep (520 copies/µL), goat (790 copies/µL), chicken (620 copies/µL), turkey (673 copies/µL)), into account. The percentages of samples 1 to 6 given in Table 1 were calculated by relating the DNA copy number of the respective animal species to the total number of copies of animal species in the sample.

Study Design
The interlaboratory ring trial for validation of the DNA metabarcoding method for mammalian and poultry species [11] was conducted in the framework of the §64 LFGB working group "NGS Species Identification" under the coordination of AGES. Statistical data analysis was performed by QuoData GmbH (Dresden, Germany).
For sequencing, three benchtop NGS instruments from two companies were used. Benchtop instruments from Illumina (Illumina, San Diego, CA, USA) were employed by eleven laboratories, whereof eight used the MiSeq instrument, three the iSeq 100 instrument, and one participant used both the MiSeq and the iSeq 100 instrument. Four laboratories applied the Ion GeneStudio S5 instrument from Thermo Fisher Scientific (Thermo Fisher Scientific, Waltham, MA, USA).
Each participant obtained 16 anonymously labelled samples, comprising two subsamples of each of the eight samples (Table 1). Sixteen samples were chosen to allow the iSeq 100 platform to be included in the ring trial. This also enabled the use of the most cost-effective MiSeq Reagent Micro Kit v2 for a small number of samples on the Illumina platforms. Participants directly used all individual DNA extracts for DNA library preparation and subsequently for amplicon sequencing on a next-generation sequencing instrument. Together with the "ready-to-use" DNA extracts, the participants obtained reagents for creating DNA libraries, a sequencing kit, and a step-by-step instruction.
In order to be able to perform sequencing on the Ion GeneStudio S5 instrument (Thermo Fisher Scientific, Waltham, MA, USA), the protocol for preparation of DNA libraries and sequencing published previously by Dobrovolny et al. (2019) [11] had to be adapted as follows. Each of the two forward primers were elongated by a 3 bp barcode adapter, one of 16 different 10 bp barcodes (index sequence) and the overhang adapter sequence (A adapter). Each of the two reverse primers was linked to an overhang adapter sequence (trP1 adapter). The PCR setup did not include additional magnesium chloride solution. In the magnetic bead cleaning step, a total of 37.5 µL Agencourt ® AMPure ® XP beads (Beckman Coulter, Brea, CA, USA) was used and the DNA was eluted with 50 µL Tris-EDTA (TE) buffer. The average library size was 190 bp, and all DNA libraries were adjusted to 100 pM and were mixed together in a single 1.5 mL tube. A 25 pM DNA pool was used for sequencing. In general, any deviations from the protocols had to be reported by the participants.
Paired-end sequencing on an Illumina instrument was performed using either the MiSeq Reagent Micro Kit v2 (300-cycles) or the iSeq 100 i1 Reagent v2 (300-cycles), which included a 5% PhiX spike-in. The Ion Chef instrument was used with the Ion 510 TM and Ion 520 TM and Ion 530 TM Kit-Chef and the Ion 520 TM Chip Kit to perform template preparation, enrichment and chip loading. Finally, the sequencing reaction was started on the Ion GeneStudio S5 instrument.
To obtain information about the presence of the animal species in the samples, the sequencing output in FastQ format was processed with a multi-step analysis pipeline by using Galaxy (version 19.01) as described previously [12]. Before the resulting FastQ files were used as input for the data analysis, the raw binary base call (bcl) files generated by Illumina devices were converted to text files using the conversion software bcl2fastq2-v2. 19.0.316 (Illumina, San Diego, CA, USA). The default demultiplexing option of one allowed mismatch in the barcode recognition of the Illumina software (--barcode mismatches) was thereby set to zero (value: 0) and the step was also integrated into the pipeline. Preliminary tests had shown that this can increase the quality of index recognition. The Thermo Fisher instrument software uses this setting by default. The analysis pipeline for sequencing data of the Thermo Fisher Scientific (Waltham, MA, USA) platforms was modified because paired-end FastQ files do not exist in this case. Consequently, the primer sequences were adapted according to the requirements of the analysis tool Cutadapt (Galaxy version 1.16.6 [16]) and the tool fastq-join (Galaxy version 1.1.2-806.1) [17] was removed. Dereplicated reads were directly matched against a customized database (AGES database) including 51 mitochondrial genomes from animals (Supplementary Table S1) and the public databases provided by NCBI using BLASTn [18]. The AGES database contains verified sequences from the NCBI database exclusively from food-relevant animal species. This reduces the time needed for alignment and is intended to avoid nonsense results. For each of the samples, results were listed automatically in a table according to taxonomy and abundance and a formula calculated the proportions of animal species by relating the number of reads for the respective species to the number of total reads (after pipeline) across all animal species obtained for the subsample. For further statistical analysis, all Excel spreadsheets were sent to QuoData GmbH.

Statistical Data Analysis
Statistical analyses were performed by QuoData GmbH. Even though the DNA metabarcoding method used in this interlaboratory comparison is commonly regarded as a qualitative method, the underlying decision process is based on the comparison of a quantitative value, namely the proportion of a single species, with a specific threshold. The performance of such a method can be assessed both on the basis of the qualitative result (yes/no) and on the basis of the underlying quantitative data. Because the information content of the quantitative data can be far greater than the corresponding qualitative data, the quantitative data were used in addition to the qualitative data to describe the performance of the DNA metabarcoding method.
In addition, the study of quantitative data also aimed to verify the extent to which this method can also be used for quantitative determinations.

Quantitative Statistical Analyses
Proportions of animal species ranged between 0.1% and 94%. To avoid asymmetric distributions for proportions near 0% and 100%, and to ensure equality of variances for the individual combinations of samples/animal species, the proportions were subjected to a logit transformation: The logit-transformed proportions can be retransformed as follows: The logit-transformed proportions were then subjected to several outlier tests. Data were checked for systematic errors across samples and/or animal species affecting the mean values (Mandel h statistics) and/or variances (Mandels k statistics). In addition, the occurrence of sample-and animal species-specific outliers regarding the laboratory mean values and variances was tested for by means of the Grubbs and the Cochran tests (significance level 1%), respectively. Proportions identified as outliers were excluded from further statistical analyses.
Logit-transformed and outlier-cleaned data were checked for normal distribution using the Shapiro-Wilk test. Then, repeatability, reproducibility, and accuracy of the proportions of animal species were determined according to the criteria of QuoData certified with the aid of the software solution for method comparison studies and interlaboratory comparison studies PROLab Plus, version 2021.7.22.0 [19] (QuoData, Dresden, Germany), using the statistical methods according to DIN ISO 5725-2 and according to the specifications in the Official Collection of Test Methods ASU §64 LFGB for the statistical evaluation of ring trials for method validation [20]. Taking into account the obtained repeatability  and reproducibility standard deviations for samples 1-7, variance functions describing the  functional relationship between standard deviations and overall mean for the individual  combinations of samples/animal species were modelled. For each of the combinations, the bias (difference) between this overall mean and the proportion of the animal species added to the sample was also determined. Furthermore, the standard deviation of this bias was calculated.
Prediction profiles and measurement uncertainty profiles were constructed, both not considering the bias (based on reproducibility standard deviations), and considering the bias (based on reproducibility standard deviations as well as on the standard deviation of the bias).
In addition, the z scores for each combination of lab/sample/animal species were determined, providing a measure for the standardized deviations of laboratory mean values from the respective overall mean value.

Qualitative Statistical Analyses
A sample was classified as false positive if for at least one animal species that had not been added, a proportion above a defined threshold was obtained. By contrast, a sample was considered false negative for a specific animal species if the proportion was below a defined threshold for this species.
The probability of detection for an arbitrary animal at a defined threshold for (1) a laboratory with average performance, (2) a laboratory with positive bias, and (3) a laboratory with negative bias was determined based on the variance functions by the quantitative statistical analysis.
Each of the laboratories submitted 16 sequencing results in total (eight samples, two subsamples each), with the exception of laboratories 07, 12, and 13. Laboratory 07 did not provide the result for subsample 8B (negative control), whereas datasets submitted by laboratories 12 and 13 were lacking results for both subsamples of sample 8. With the exception of laboratory 03, sequencing was done by using the test kit provided by the AGES. The suitability of the MiSeq Reagent Kit v2 applied by laboratory 03 had been demonstrated in preliminary experiments.
FastQ data provided by the participating laboratories was evaluated by the AGES by using the analysis pipeline in Galaxy. For identification of animal species, the DNA sequences (reads) were aligned, once with the customized AGES database and once with the NCBI database. Supplementary Table S2 summarizes the total number of reads for each laboratory, taking into account the results obtained for each of the fourteen subsamples containing animal species (samples 1-7).
Total numbers of raw reads that passed the analysis pipeline were quite different between laboratories. Very low total numbers of raw reads can, for example, be caused by errors during wet-lab activities, e.g., pipetting errors or error rate of DNA polymerase. In addition, problems with adapter-and index-recognition are known to have an impact. Loss of reads or their elimination by pipeline tools can also be caused by errors in PCR amplification of the library (e.g., index hopping), sequencing errors (e.g., inserts, substitutions, or deletions) or insufficient cluster resolution [21]. All these errors may affect the quality of raw data (FastQ file) and thus the number of DNA sequences (reads) after analysis pipeline.
Total numbers of reads obtained with the Ion GeneStudio S5 were significantly higher than those obtained with the MiSeq (p < 0.001) and the iSeq 100 (p < 0.001). Differences ob-served between the Illumina and the Thermo Fisher technology are caused by differences in data filtering. The instrument-specific software of the Ion GeneStudio S5 removes datasets of lower quality by filtering before starting the analysis pipeline. Thus, considerably more sequences remain after primary data analysis compared to the instruments from Illumina. Differences in recoveries, by relating the total number of reads to the number of raw reads before analysis pipeline, between laboratories using the same instrument type (MiSeq, iSeq 100, or Ion GeneStudio S5, Table S3) hint at differences in the quality of the sequencing run and unintended loss of reads. Significant differences (p < 0.001) in recoveries between laboratories using Illumina instruments and those applying the Ion GeneStudio S5 were, however, expected. These differences are caused by the fact that the pipeline of the Ion GeneStudio S5 neither included paired-end sequencing nor a "joining step" as was the case with the Illumina platforms.

Quantitative Evaluation of Ring Trial Data
The aim of quantitative evaluation of ring trial data was to determine average proportions of the animal species that had been added to samples 1-7, and to identify resulting error components within and in between laboratories.

Proportions of Animal Species in Samples 1-7
Proportions of animal species were calculated by relating the number of reads for the respective species to the number of total reads (after pipeline) across all animal species obtained for the sample. Table 2 gives the proportions of animal species determined for sample 1 containing seven animal species (Table 2a), sample 2 (as a representative of samples consisting of five animal species; Table 2b) and sample 7, a model food sample (Table 2c). Results for samples 3, 4, 5, and 6 are shown as stacked bar plots ( Figure 1).
Preliminary evaluation of the results indicated that the proportions of animal species determined considerably depended on the sequencing platform/technology applied. Due to the low number of laboratories using the sequencing technology from Thermo Fisher Scientific, only results obtained by laboratories applying Illumina platforms were included into statistical evaluation. Results obtained by laboratories 09-12 using the Ion GeneStudio S5 are only shown for comparison.

Logit Transformation
Proportions of animal species in samples 1-7 were quite different, ranging from 0.1% to 94%. However, a prerequisite for the evaluation of ring trial data according to ASU §64 LFGB is that the proportions of animal species follow normal or at least symmetric distribution. In order to allow assumption of normal distribution and ensure equality of variances of the individual combinations of samples/animal species (after elimination of outliers), proportions of animal species were subjected to a logit transformation. The logit is the logarithm of the proportion of the animal species divided by 1 minus the proportion of the animal species. Proportions of, e.g., 0.1%, 1%, 10%, 30%, 50%, and 70%, resulted in logit values of −6.91, −4.60, −2.20, −0.85, 0, and 0.85, respectively. Since the logit for 0% and 100% is not defined, it was set to surrogate values of −15 and +15, respectively.

Outlier Tests
In the course of evaluating ring trial data according to ASU §64 LFGB, logit-transformed proportions of animal species were subjected to several outlier tests (see also Section 2.5.1). Table 3 summarizes the outliers and reasons for their elimination.  Preliminary evaluation of the results indicated that the proportions of animal species determined considerably depended on the sequencing platform/technology applied. Due to the low number of laboratories using the sequencing technology from Thermo Fisher Scientific, only results obtained by laboratories applying Illumina platforms were included into statistical evaluation. Results obtained by laboratories 09-12 using the Ion GeneStudio S5 are only shown for comparison.

Logit Transformation
Proportions of animal species in samples 1-7 were quite different, ranging from 0.1% to 94%. However, a prerequisite for the evaluation of ring trial data according to ASU §64 LFGB is that the proportions of animal species follow normal or at least symmetric distribution. In order to allow assumption of normal distribution and ensure equality of variances of the individual combinations of samples/animal species (after elimination of outliers), proportions of animal species were subjected to a logit transformation. The logit is the logarithm of the proportion of the animal species divided by 1 minus the proportion of the animal species. Proportions of, e.g., 0.1%, 1%, 10%, 30%, 50%, and 70%, resulted in logit values of −6.91, −4.60, −2.20, −0.85, 0, and 0.85, respectively. Since the logit for 0% and 100% is not defined, it was set to surrogate values of −15 and +15, respectively.  In case an outlier was only identified for one database (either AGES or NCBI database), it was, however, eliminated for both databases to ensure data comparability. In total, 15 of 396 (3.8%) combinations of laboratory/sample/animal species were identified as outliers and excluded from further evaluation for each of the databases.

Distribution of Sample-Specific Proportions of Animal Species
After outlier elimination, sample-specific logit-transformed proportions of animal species were tested for normal distribution by Kernel density estimation and the Shapiro-Wilk test.
A small number of cases were found to have a bimodal distribution. However, the Shapiro-Wilk test did not show evidence for non-normality for any of the combinations of samples/animal species. Thus, logit-transformed proportions of animal species could be subjected to further statistical evaluation.
3.1.5. Statistical Parameters According to ASU §64 LFGB Logit-transformed and outlier-cleaned data were normally distributed and thus could be subjected to statistical evaluation according to ASU §64 LFGB. Table 4 gives-for each animal species and both databases (AGES and NCBI)-main statistical parameters, including re-transformed mean value, reproducibility standard deviation (s R ), and repeatability standard deviation (s r ). For the sake of completeness, the logit-transformed parameters are shown as well. The reproducibility standard deviation characterizes the variability of results between laboratories, and the repeatability standard deviation the variability within a laboratory under constant conditions, i.e., the variability of results obtained for the two subsamples of the same sample.
3.1.6. Dependence of Bias, Reproducibility Standard Deviation, and Repeatability Standard Deviation on the Mean Proportion of Animal Species Next, it was evaluated whether the bias between the proportion of the animal species added to the sample and the overall mean determined in the ring trial, as well as whether reproducibility standard deviation and repeatability standard deviation depended on the proportion of the respective animal species (Figure 2A) and/or the predominant animal species in the sample ( Figure 2B). Evaluation was based on the statistical parameters for logit-transformed proportions of animal species. Next, it was evaluated whether the bias between the proportion of the animal species added to the sample and the overall mean determined in the ring trial, as well as whether reproducibility standard deviation and repeatability standard deviation depended on the proportion of the respective animal species (Figure 2A) and/or the predominant animal species in the sample ( Figure 2B). Evaluation was based on the statistical parameters for logit-transformed proportions of animal species. As expected, the reproducibility standard deviation and repeatability standard deviation were found to be higher for lower proportions of animal species than for proportions of about 50% (logit = 0), independent of the database selected for alignment. It was found that pig tends to have higher standard deviations in reproducibility, but not in repeatability, compared to other animal species (Figure 2A). A tendency towards higher reproduc-    As expected, the reproducibility standard deviation and repeatability standard deviation were found to be higher for lower proportions of animal species than for proportions of about 50% (logit = 0), independent of the database selected for alignment. It was found that pig tends to have higher standard deviations in reproducibility, but not in repeatability, compared to other animal species (Figure 2A). A tendency towards higher reproducibility standard deviations was also observed in case beef was the predominant animal species in the sample. This also held true for the repeatability standard deviation, although to a lower extent.
The bias between the proportion of the animal species added to the sample and the overall mean determined in the ring trial was in the range from −0.6 to 0.6 logits, with just two exceptions. Neither the animal species nor the proportion of the animal species was found to have a systematic effect on the bias. The animal species and the proportion of the animal species did not have a systematic impact on the reproducibility standard deviation either.
Thus, the standard deviation induced by the bias ("bias standard deviation"), absolute reproducibility standard deviation, and absolute repeatability standard deviation could be modeled across animal species and samples, for both the AGES and NCBI databases. The modeled variance function was similar for both databases. The lowest bias standard deviation, reproducibility standard deviation, and repeatability standard deviation were found for a proportion of 50% (logit = 0). The closer the proportion to 0% or 100%, the higher the standard deviations. Table 5 summarizes the modeled and re-transformed standard deviations, which were found to be independent of the database used for alignment. Table 5. Bias standard deviation, reproducibility standard deviation, and repeatability standard deviation (absolute, i.e., retransformed to proportions of animal species) depending on the proportion of animal species.

Variability across Animal Species and Measuring Uncertainty
To allow predictions for further analyses, the measuring uncertainty was evaluated. Since the database (AGES or NCBI) was not found to have an impact on bias standard deviation, reproducibility standard deviation, or repeatability standard deviation, the measuring uncertainty was only evaluated for the AGES database, representative for both databases.
Based on the reproducibility standard deviation, a prediction profile was established in terms of the 95% confidence interval of the results of all laboratories across all animal species. In addition, the 95% confidence interval was established by considering both the reproducibility standard deviation and the bias standard deviation.
The upper part of Figure 3 shows the 95% confidence interval of the (outlier-cleaned) results depending on the respective overall mean value of the proportion of an animal species (without considering the bias). The left side of the figure shows the entire range, and the right side an enlarged view of proportions from 0 to 10%. The figure indicates that for a "true proportion" (assuming that the overall mean across laboratories applying Illumina platforms reflects the "true proportion") of, e.g., 5%, the 95% confidence interval is 4.1-6.2%. In total, 4.7% of the individual values are outside the 95% confidence interval. From the prediction profiles, a measurement uncertainty profile was established, indicating how far the proportion determined may deviate from the "true" proportion of the animal species. Figure 4 shows the 95% measurement uncertainty intervals depending on the proportion of the animal species determined, on the left side without consideration of the bias (assuming that the "true" proportion equals the overall mean across laboratories applying Illumina platforms), on the right side under consideration of the bias (assuming that the "true" proportion equals the proportion of animal species added to the sample).  The 95% confidence interval of the (outlier-cleaned) results of all laboratories, depending on the respective proportion added (by considering the bias) is shown in the lower part of Figure 3. The left and right sides show the entire range and an enlarged view of proportions from 0 to 10%, respectively. For example, for an added proportion of 5%, the 95% confidence interval is 2.5-9.9%. In total, 3.7% of the individual values are outside the 95% confidence interval.
From the prediction profiles, a measurement uncertainty profile was established, indicating how far the proportion determined may deviate from the "true" proportion of the animal species. Figure 4 shows the 95% measurement uncertainty intervals depending on the proportion of the animal species determined, on the left side without consideration of the bias (assuming that the "true" proportion equals the overall mean across laboratories applying Illumina platforms), on the right side under consideration of the bias (assuming that the "true" proportion equals the proportion of animal species added to the sample). For a 50% proportion of the animal species, the "true value" can be 3.6% (percentage points) lower or higher, if the bias is not taken into account, or even 15.2%, if the bias is taken into consideration.
3.1.8. z Scores z scores were calculated for interlaboratory evaluation across samples and animal species ( Figure 5). z scores measure standardized deviations of laboratory mean values from the overall mean value. An absolute z score > 2 hints at a statistically significant deviation of the respective laboratory.
(A) (B)  For a 50% proportion of the animal species, the "true value" can be 3.6% (percentage points) lower or higher, if the bias is not taken into account, or even 15.2%, if the bias is taken into consideration.
3.1.8. z Scores z scores were calculated for interlaboratory evaluation across samples and animal species ( Figure 5). z scores measure standardized deviations of laboratory mean values from the overall mean value. An absolute z score > 2 hints at a statistically significant deviation of the respective laboratory. For a 50% proportion of the animal species, the "true value" can be 3.6% (percentage points) lower or higher, if the bias is not taken into account, or even 15.2%, if the bias is taken into consideration.
3.1.8. z Scores z scores were calculated for interlaboratory evaluation across samples and animal species ( Figure 5). z scores measure standardized deviations of laboratory mean values from the overall mean value. An absolute z score > 2 hints at a statistically significant deviation of the respective laboratory.
(A) (B)  LASSO regression was applied to check whether absolute z scores depended on the instrument (MiSeq, iSeq 100), NGS experience of the respective laboratory, and/or activation of the function "adapter trimming" before starting the sequencing run. Analyses were performed excluding data previously identified as outliers. NGS experience of the laboratory was found to significantly affect the z score. For laboratories more experienced in NGS (laboratories 01, 04, 08, 15,20), lower absolute z scores were determined compared to those with lower NGS experience. By contrast, neither the instrument (MiSeq, iSeq 100) nor activation of the function "adapter trimming" before starting the sequencing run was found to significantly affect the z score.

False Positive Rate
Next, it was investigated whether animal species were identified that had not been added to the samples, and whether the false positive rate depended on the database selected for alignment. A sample was classified as positive if for at least one animal species that had not been added, a proportion above a defined threshold was obtained. The threshold was set to 0.1%, 0.5%, 1.0%, or 2.0% proportion of the animal species. Table 6 indicates that alignment against the AGES database (Table 6A) resulted in less false positive reads compared to alignment against the NCBI database (Table 6B). Alignment against the AGES database only yielded false positive samples at threshold values of 0.05% or 0.1% ( Figure 6). Alignment against the AGES database only yielded false positive samples at threshold values of 0.05% or 0.1% ( Figure 6). Higher false positive rates for the NCBI database were inevitably caused by the higher number of entries in the NCBI database compared to the AGES database. Most species resulting in false positive reads when using the NCBI database were not contained in the AGES database and thus could not be identified with the latter.
From the ring trial data it can be concluded that by using an appropriate customized database and by setting the threshold to 0.5%, false positive rates < 1% will be obtained.

False Negative Rate
Next, the false negative rate was evaluated at threshold values of 0.05%, 0.1%, 0.5%, and 1% for both the AGES and the NCBI databases (Table 7).  Higher false positive rates for the NCBI database were inevitably caused by the higher number of entries in the NCBI database compared to the AGES database. Most species resulting in false positive reads when using the NCBI database were not contained in the AGES database and thus could not be identified with the latter.
From the ring trial data it can be concluded that by using an appropriate customized database and by setting the threshold to 0.5%, false positive rates < 1% will be obtained.

False Negative Rate
Next, the false negative rate was evaluated at threshold values of 0.05%, 0.1%, 0.5%, and 1% for both the AGES and the NCBI databases (Table 7).  Chicken  Beef There was no considerable difference between the AGES and the NCBI databases regarding the proportion of false negative results obtained for samples 1-7. At a threshold of 0.05%, false negative results were only obtained for pig in sample 4. At a threshold of 0.1%, none of the combinations of sample-animal species led to false negative results, with the exception of proportions being close to the threshold (0.1%). The data indicate that at a threshold of ≥0.5%, the probability of obtaining false negative results is very low. Figure 7 shows the probability of detection for three thresholds (0.1%, 0.5%, and 1%) and three scenarios, namely a laboratory with average performance, a laboratory with positive bias, and a laboratory with negative bias.    Figure 7. Probability of detection for a laboratory with average performance, a laboratory with positive bias, and a laboratory with negative bias, depending on the threshold and the database. Figure 7 indicates that a threshold value of 1% seems to be a good compromise, provided that the variance function determined in the ring trial is equal to the actual variance function. A threshold value of 1% guarantees that a laboratory with a positive bias (overestimating the actual proportion) does not complain if the proportion of a certain animal species is 0.5%, whereas even a laboratory with a negative bias (underestimating the actual proportion) will be able to identify proportions >1.5% reliably.

Negative Control
Sample 8 was a DNA extract from maize, serving as a negative control. Since the marker system designed for mammals and poultry species does not detect maize, all reads that were obtained for sample 8 had to be regarded as false positive. Table 8 lists the number of total reads (after pipeline) per laboratory and species. Laboratories 12 and 13 did not submit results for sample 8, laboratory 07 only provided results for one of the two subsamples.
In total, eight animal species were identified in the negative control by alignment against the AGES database. Fourteen further animal species, including the species Homo sapiens were identified, when the NCBI database was used. Per laboratory, up to five animal species were only identified with the NCBI database, with the exception of laboratory 06, which even detected eight additional animal species in subsample A.
In most cases, within a laboratory, the number of reads per animal species was similar for both subsamples. When the AGES database was used for alignment, most reads were assigned to beef and pig. Sample multiplexing in general, together with an inappropriate index layout, carries the risk of index misassignment. This is obviously the reason for the over-represented number of reads for pig and beef in the negative controls, as these animal species represent the main quantities in the samples. Although the index kit was used according to the manufacturer's instructions, the number of reads of these animal species could be reduced to the expected level in a supplementary experiment with an alternative index layout. Alignment against the NCBI database also resulted in considerably high numbers of reads (laboratories 06, 07, and 14 > 300, laboratory 07 even > 1000) for Homo sapiens, which was not contained in the AGES database.
As mentioned above, the marker system applied does not detect maize. Thus, the high number of reads for maize obtained by laboratory 06 for one subsample seems to be caused by a random error.

Conclusions
In summary, evaluation of data from the interlaboratory ring trial indicates that the DNA metabarcoding method performed on an Illumina platform is applicable for determining the proportion of the seven animal species with the given precision. Furthermore, the applicability of the method for testing foodstuff was demonstrated by the correct identification of the ingredients of a model sausage, which also supports the results in our study published recently.
Based on the data of the ring trial, a threshold of 0.5% is suitable to reliably assess whether a certain animal species is contained in a sample. The DNA metabarcoding method turned out to be rather robust and is therefore suitable to be implemented in routine analysis in official food control laboratories. Even laboratories that did not have much experience in NGS were able to provide reliable results. We suggest strictly following the given protocol. The results of the interlaboratory ring trial indicate that even alternative test kits or various sequencing platforms might be applied. However, the impact of any deviations from the experimental conditions has obviously to be tested before implementation in routine analysis.
Correct index recognition is of particular importance for pooled DNA libraries. We recommend frequently changing the index kits or the use of longer index sequences to avoid false positive and/or false negative results.
For taxonomic assignment, we suggest applying a customized database, as the pipeline is completed significantly faster and no nonsense results from erroneous database entries occur. However, if unexpected read losses and non-identifiable reads occur, the additional use of the entire NCBI database or any other appropriate sequence database is recommended.
In order to increase interlaboratory comparability of results obtained by DNA metabarcoding methods, it would be necessary to establish a reference database with verified sequence entries of relevant species. Access to adequate reference material would also facilitate harmonization of the methods used.
In general, determination of the meat content (w/w) from the number of NGS reads or the determined target DNA concentration is a well-known difficulty, especially in the quantification of meat species in processed foods. The result is also influenced by the degree of processing of the sample present and by the type of animal ingredients used. Data from testing reference samples out of proficiency testing schemes confirm the limitations known for DNA quantification in meat products [12]. Quantitative results should therefore serve only as rough estimates for weight ratios of different species in food.

Supplementary Materials:
The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/foods11081108/s1, Supplementary Table S1: Sequences included into the reference database, Supplementary Table S2: Total number of reads after pipeline (n = 14, samples 1-7, two subsamples each). min: minimal value, max: maximal value, mean: arithmetic mean, RSD: relative standard deviation, Supplementary Table S3: Recovery (%) (total number of reads after pipeline related to the number of raw reads before analysis pipeline). (n = 14, samples 1-7, two subsamples each). min: minimal value, max: maximal value, mean: arithmetic mean, RSD: relative standard deviation.  Institutional Review Board Statement: Ethical review and approval was waived for this study as no live animals were used or slaughtered to achieve the aims of the study.