DNA Accounting: Tallying Genomes to Detect Adulterated Saffron

The EU General Food Law not only aims at ensuring food safety but also to ‘prevent fraudulent or deceptive practices; the adulteration of food; and any other practices which may mislead the consumer’. Especially the partial or complete, deliberate, and intentional substitution of valuable ingredients (e.g., Saffron) for less valuable ones is of concern. Due to the variety of products on the market an approach to detect food adulteration that works well for one species may not be easily applicable to another. Here we present a broadly applicable approach for the detection of substitution of biological materials based on digital PCR. By simultaneously measuring and forecasting the number of genome copies in a sample, fraud is detectable as a discrepancy between these two values. Apart from the choice of target gene, the procedure is identical across all species. It is scalable, rapid, and has a high dynamic range. We provide proof of concept by presenting the analysis of 141 samples of Saffron (Crocus sativus) from across the European market by DNA accounting and the verification of these results by NGS analysis.


Introduction
Adulteration and mislabeling of food has been known since biblical times and probably goes back as far as when food started to be traded [1]. Unsurprisingly, laws and regulations to combat these practices can also be found throughout history: ancient food regulations are referred to in Egyptian, Chinese, Hindu, Greek, and Roman texts. Today, the EU General Food Law-next to protecting public health-aims to 'prevent fraudulent or deceptive practices; the adulteration of food; and any other practices which may mislead the consumer' (Article 8 of Regulation (EC) No 178/2002 ). A commonly accepted definition is that 'Food fraud includes adulteration, deliberate and intentional substitution, dilution, simulation, tampering, counterfeiting, or misrepresentation of food, food ingredients, or food packaging; or false or misleading statements made about a product for economic gain' [2]. Although food fraud is not explicitly defined in EU legislation, the Food Information to Consumers Regulation (Regulation (EU) No 1169/2011) concretises the relevant aspects of the General Food Law in Article 7 (Fair information practices) by stipulating that food information shall not be misleading, particularly as to the characteristics of the food and, in particular, as to its nature, identity, properties, composition, quantity, durability, country of origin or place of provenance, and method of manufacture or production.
Herbs and spices are among the group of ingredients most vulnerable to fraudulent manipulation, mostly due to their high economic value, which makes them an attractive target. Furthermore, international trade and highly complex supply chains, involving various actors and processing steps, contribute to the presence of adulterated herbs and spices on the market [3,4]. Macro-and microscopic examination of morphological features, physico-chemical methods (e.g., ash, volatile oil content), and determination of aromatic adulteration [22]. The proposed screening approach could help to systematically survey the authenticity of the product placed on the market. It is not suitable for grading its quality as other parts of Crocus sativus such as petals and stamens, which may be used as bulking agents, cannot be recognised.  [23] and the Global Biodiversity Information Facility (GBIF). Plant material and DNA were provided under the agreement of the Convention on Biological Diversity 1992. Reference samples used in this study include Arnica montana, Beta vulgaris, Bixa orellana, Buddleja officinalis, Calendula officinalis, Crocus sativus, Crocus speciosus, Crocus vernus, Daucus carota, and Hemerocallis fulva.

Samples for Method Development
Samples used for method development and validation were prepared from fresh and dry materials commercially available through shops and garden centers. Where possible, single plants/fruits were used.

Commercial Saffron Samples
Samples (n = 141) were collected in 20 countries of the European market space at various stages of the supply chain. Samples that were sold in powdered form were extracted as is, while samples consisting of whole stigmata were milled.

Sample Preparation
Fresh materials were thinly sliced (<1 mm) and dried (Memmert model UM500 at 75 • C) for 2 h or until dry. Dry and dried materials were mixed/milled using an MM301 Ball mill (Retsch) for 2 min at 30 Hz using either 10 mL grinding jars and 10 mm beads (for softer and pre-ground materials) or 20 mL grinding jars and 20 mm beads (harder materials such as seeds). Sample blends were prepared from single species; their mixing ratio (weight/weight percentage) was determined by a microbalance (PG503-S, Mettler Toledo, OH, USA).

Automation
Automated DNA extraction of approximately 300 mg of each sample was performed using a Tecan Freedom EVO liquid handler with Promega chemicals (CTAB extraction buffer, CLD lysis buffer, Reliaprep Resin, BWA wash buffer), the Promega Purefood protocol, a sample load volume of 350 µL, and an elution volume of 150 µL. Large volume DNA extraction (>350 mg sample) was performed manually using a CTAB based method adopted from [24].

Quantification
Fluorometric DNA quantification was done on a Qubit 4 fluorometer (Invitrogen™, Thermo Fisher Scientific, Waltham, MA, USA) with High Sensitivity chemistry (Invitrogen) according to manufacturer instructions using 5 µL of sample. For each sample, two independent sample dilutions were quantified twice (two independent standard curves), thus yielding four measurements per sample that were averaged to obtain the DNA concentration estimate.
Crocus sativus has a haploid genome weight of 5.9 pg and a ploidy of 3 resulting in a monoploid genome weight of 7.87 pg [34] or approximately 127 genome copies per ng of template DNA. However, the results from the digital PCR indicated that the target sequence is only present on two of the three genome copies (2/3 copies per monoploid genome). Saffron is known to be a probable progeny of C. cartwrightianus, which contributes to two of the three genomes, while the other parental lineages remain unclear [35]. As far as we understand it, the targeted sequence is only located on those two C. cartwrightianus derived genomes and not on the third one.
The CroSat assay has an estimated resolution of 1.6 under standard conditions (λ = 0.7, approximately 100 ng DNA per reactions) but improves markedly when less than 50 ng DNA per reaction is used; the assay had approximately 0.5% rain (see [36] for details on the performance parameters).

. Safflower
Carthamus tinctorius: The CarthaJQ primers target a sequence-characterized amplified region (SCAR). SCAR-primers are based on sequenced random amplified polymorphic DNA (RAPD) fragments that show differential amplification between the species of interest. Primers have been adopted from [27]; their sequence is listed in Table 1.

Real Time PCR
Reactions were performed in 25 µL using primers from Table 1 ordered from Invitrogen (standard desalted primers). Reactions were run using PowerUp™ SYBR™ Green Master Mix (Thermo Fisher Scientific, Waltham, MA, USA) and Nuclease free water (Thermo Fisher Scientific). Primer concentration was 200 nM final. DNA template input was 18 ng per reaction, unless otherwise mentioned. All reactions were amplified in ABI microamp 96-well 0.1 mL Fast plates using an Applied Biosystems Quantstudio S7 (Thermo Fisher Scientific). A single thermal cycling protocol was used for all real time PCR reactions: 10 min 95 • C, 60× (15 s 95 • C, 1 min 60 • C). Results were analyzed and exported using the SDS software.

Droplet Digital PCR
Reactions were performed using the Biorad QX200 platform using Twin.Tec 96 well PCR plates (Eppendorf, Hamburg, Germany) . Initial volume of the reaction mixture was 20 µL which, together with the droplet generating oil, resulted in a final PCR volume of approximately 45 µL. Reactions were set up using Evagreen Supermix (Biorad, Hercules, CA, USA), primers ordered from Invitrogen, and Nuclease free water (Ambion). Primer concentration was 200 nM final for CarthaJQ and 300 nM for CroSat. DNA template input varied from 5 to 15 ng per reaction depending on the concentration of the DNA extract. Thermal cycling was performed on a ABI Veriti using the following thermal cycling protocol: 10 min 95 • C, 45× (15 s 95 • C, 1 min 60 • C), 10 min 98 • C. Results were analyzed and exported using the Quantasoft 1.6.6.320 software (Biorad, Hercules, CA, USA).

Barcode PCR Amplification
The five barcodes recommended by the Consortium for the Barcode of Life (CBOL) Plant Working group [40] were used for metabarcoding. Since the five barcodes have different annealing temperatures, five separated PCR reactions were performed. Usually, 40 ng of DNA were used in each reaction. The barcodes, the primers, and the annealing temperatures are reported in Table 2.
PCR reactions were performed in a volume of 50 µL with primers obtained from Invitrogen and using Gold 360 Mastermix (Thermo Fisher Scientific), DMSO (Merck, Kenilworth, NJ, USA), and Nuclease free water (Ambion). Thermal cycling was performed on a GeneAmp PCR system 9700 (Applied Biosystems) using the following thermal cycling protocol: 10 min 95 • C, 35× (30 s 95 • C, 30 s (temperature see Table 2), 40 s 72 • C), 7 min 72 • C. PCR products were separated by agarose gel electrophoresis and then purified using a column based PCR purification kit (PureLink PCR Purification Kit, Invitrogen) and quantified by fluorescence measurements (Qubit) . These products were used as starting material to prepare the DNA barcode libraries for NGS. Table 2. List of barcodes with primer sequences, annealing temperatures, and the mean expected amplicon sizes.

Library Preparation and Sequencing
The libraries were prepared using the Ion Plus Fragment Library Kit (Thermo Fisher Scientific), following the manufacturer's recommendations [41]. All the libraries were evaluated for their quality on the Agilent 2100 Bioanalyzer instrument. Subsequently, the libraries were pooled in an equimolar amount into the template reaction for attachment of the fragments to Ion Sphere Particles (ISP) and clonal amplification in emulsion-PCR. The template reaction was conducted on the Ion OneTouch 2 instrument (Thermo Fisher). Next, recovery and enrichment were performed. Enriched samples were subsequently sequenced on the Ion GeneStudio S5 System (Thermo Fisher), using the Ion 520 chip, which produced 3-5 million reads (1-2 Gb).
2.6. Data Processing 2.6.1. DNA Accounting Data Analysis All calculations and curve fitting were done using R [42] version 3.5.2 (20 December 2018) 'Eggshell Igloo'. The data were exported from the droplet reader as 'csv files' and imported into R. Droplet calling was done using the approach presented in [36] using the 'cloudy' algorithm version 3.03 as retrieved from Github (https://github.com/Gromgorgel/ ddPCR, accessed on 12 February 2021). Non-NGS related sequence analysis (e.g., for primer design and local alignments) was performed in R using functions available through Bioconductor [43] and the 'DNR' package available through Github (http://www.github. com/Gromgorgel/R_Scripts, accessed on 12 February 2021).

NGS Data Analysis
The sequencing data obtained were analyzed on the Torrent Suite Software and then with a custom-tailored software for species identification, provided by Thermo Fisher. The software clustered all the reads and then BLASTed against the NCBI nt database (downloaded locally), providing as results the number of reads attributed to a species with a certain degree of identity (by default higher than 99%). In this way, a list of the species detected in each sample was obtained. The results were then analyzed to evaluate how many reads were attributed to the species of interest and how many reads to possible contaminants or adulterants.

Method Development
The principle of the proposed DNA accounting is based on linear regression analysis to establish a relationship between the measured number of target copies and the expected number of copies derived from the DNA amount and the haploid genome weight of Crocus sativus. This relationship was initially established using DNA extracted from Crocus sativus bulbs as reference samples of known identity and purity (n = 72, analysed in duplicate). In a later stage, this was repeated with DNA extracted from stigmata of Crocus sativus market samples (four samples, analyzed in duplicate) with purity confirmed by NGS (Crocus sativus read percentage above 95%). On both occasions, a large number of Crocus sativus samples spanning a wide range of copy numbers (independent extracts, independent dilutions) was evaluated using the CroSat assay. These 'measured copies' (see Equation (1)) were regressed against the 'expected copies' (c e , as calculated from the measured DNA concentration, see Equation (2)) as shown in Figure 1.
Here, λ is the average number of target copies per partition as measured by digital PCR, V p is the partition volume (0.00085 µL for the QX200 platform), and V s is the sample volume (20 µL).
Here, DN A is the amount (in ng) of template in the reaction, n is the number of target copies per haploid genome, x is the ploidy of the species, and 1C its monoploid genome weight in pg. Figure 1 shows a clear correlation between the expected and measured copies with a slope close to one (1.138). However, the residuals plot (not shown) shows a non-constant variance for the residuals (heteroskedasticity). As a consequence the estimated standard errors are inconsistent. This can be overcome by using weighted least squares (WLS) to compute a prediction interval (i.e., the range of measurements in which 95% of all future observations are expected to lie) as shown in Figure 1, provided the samples have comparable purity of the reference sample. However, it should be noted that without explicitly formulating a model for the variance, the results obtained are not mathematically exact but approximate.
Samples whose measurements fall within the prediction interval are thus comparable to the reference samples and can be considered 'pure'. For samples outside the prediction interval, their purity can be expressed as a percentage (referred to as 'digital PCR purity' or 'percent ddPCR purity') by the ratio between a sample's measured result (c m ) and relevant prediction bound. This way, the percentage reflects how far outside the prediction interval a measurement lies. Or more formally: let s be a sample that has c m,s copies estimated by ddPCR and c e,s expected copies then its percentage dPCR purity (% s ) is given by where l Ce,s and u Ce,s are the respective lower and upper bounds of the prediction interval at c e,s .

Method Validation
Automated DNA extraction worked well for all samples with an average yield of 42.03 ng per µL. The A 260 /A 280 ratio, as checked by UV, was on average 1.864 (>1.8) indicating extracts of sufficient purity for downstream analysis. For a random subset of samples, DNA integrity was checked using automated electrophoresis (Agilent Tapestation, Genomic DNA screentape) and was found to be 4.2 (DIN value) on average. DNA was extracted from Crocus sativus bulbs during method development and for construction of the calibration curve for economical reasons. To ensure the results obtained with DNA from bulbs are compatible with results obtained from stigmas, the ddPCR results of the commercial samples with the highest percentage of NGS reads attributed to Crocus sativus (i.e., most certainly pure saffron) were plotted together with DNA extracted from bulbs and the resulting calibration curve is shown in Figure 1. DNA extracted from saffron stigmata and from bulbs behaved in a similar manner, which confirmed that DNA from pure bulbs and stigma can be used interchangeably.
An initial evaluation of the DNA accounting strategy was performed using laboratory prepared samples which consisted of Crocus sativus material admixed with various known adulterants (Safflower, wheat flour, and Capsicum annuum) in a range of w/w percentages (5, 10, 20, and 30%). Figure 1 shows where these samples fall with respect to the calibration curve and acceptance range around it. These results show that the sensitivity of the approach differs depending on how well DNA is extracted from the adulterant relative to saffron. For materials that have equal or better extractability (e.g., safflower), the ddPCR-% drops fast when increasing the admixture level (61, 57, 17, and 8% ddPCR purity for 5, 10, 20, and 30% admixture respectively). Whereas materials that yield less DNA per mg have a slower decline of ddPCR-% (75, 68, 55, and 53% ddPCR purity for 5, 10, 20, and 30% admixture of Capsicum respectively). In general, the samples with 5% admixture tended to fall close to, but outside, the acceptance range. We therefore estimate that 5% w/w admixture is about the limit of detection for non-declared biomaterial by DNA accounting, provided the DNA of the admixed material is not difficult to extract.

DNA Accounting of Saffron Samples
For each sample the number of copies of the genetic marker was counted using ddPCR and compared to the expected number of copies, as calculated from the amount of template DNA input and interpreted with respect to a calibration curve constructed from DNA obtained from Crocus sativus bulbs. Of the 141 commercial saffron samples, 108 samples were categorized as 'non suspicious' (between 80% and 120% ddPCR purity) and 33 were categorized as 'suspicious' (less than 80% or more than 120% ddPCR purity). Figure 2 shows a visual overview of these results. To provide a thorough evaluation of the DNA accounting approach, a total of 103 samples were analysed by NGS. A final overview of all samples that were deemed adulterated can be found in Table 3, whereas a complete overview of results is presented in appendix.
All samples categorized as 'non suspicious' were found to be botanically pure by NGS metabarcoding (see below). For the 'suspicious' samples that had a relatively high percent ddPCR purity (80-70%, >120%), reads were mostly attributed to Crocus sativus with a few samples having up to 15% reads attributed to other Crocus species but without indication of the presence of other botanicals. Five of the fourteen 'suspicious' samples with low ddPCR purity (<70%) also had suspiciously low C. sativus read counts (see Table 3), while the C. sativus read counts of the remaining samples ranged from 87.5 to 95.5%. This proves that the false-negative rate of DNA accounting is very low: none of the samples containing relevant amounts of non-declared botanicals as found by NGS were classified as 'non suspicous' by ddPCR. This highlights the advantage of the described screening approach, which justifies excluding samples categorized as 'non suspicious' based on their ddPCR purity from NGS analysis, resulting in a significant reduction of NGS workload (here: 33 instead instead of 141 samples). Overview of the market sample results: measured copies per sample plotted in function of their 'expected copies', the shaded area represents the 95% prediction area based on the regression of the reference samples. The boundaries of the acceptable range (80% and 120% ddPCR purity) are shown as dashed lines. Samples have been colored depending on their categorization: green for 100% ddPCR purity, orange for 80-120% ddPCR purity, red for <80% or >120%. To further simplify the data evaluation step, a decision rule was built by combining Equations (1) and (2) and omitting the constant terms. This produced a ratio of the measured copy number of a saffron sample and the corresponding amount (in ng) of template DNA in the reaction ( c m DN A ). The distribution of these ratios represents the variation present within the market (see Figure 3). The ratios were approximately normally distributed with an arithmetic mean value of 143 and standard deviation of 40; thus 95% of the population lies within the boundaries of 65-220 copies/ng (mean ±1.96× standard deviation). New samples can be evaluated by simply checking if their ratio falls within this range; if so, they can be considered as being uncontaminated with non-declared botanicals.
However, the use of the aforementioned strategy should not be done without at least an in-house verification using several authentic samples, as it is strongly dependent on the results obtained from DNA quantification.

NGS Metabarcoding
In the present study an average of 8261 reads per sample was obtained. The NGS software grouped reads into clusters if at least three identical reads were present; clusters with less than three reads were disregarded. Overall, none of the samples analyzed resulted in 100% of the reads attributed solely to Saffron (i.e., Crocus sativus), in samples that fell within the prediction interval, the average read % for Crocus sativus was 91.5%. The bulk of the remaining reads was attributed to 'Crocus spp.' without exact species returned (on average 8.7%). Samples within the acceptance range (but outside the prediction interval) had a an average read % for Crocus sativus of 89.0%. The majority of samples outside the acceptance range had similar read % (i.e., 90.3%, outliers removed) with the exception of the samples listed in Table 3 whose average is 7.7%. Those five samples had particularly low saffron read counts (see Table 3) and are considered adulterated. In four of these samples Carthamus tinctorius (safflower), a known saffron adulterant [26,27,39] was detected with significant read counts (57%, 90%, 88%, 80%) and its presence was confirmed using a safflower specific qPCR [27].
A full breakdown of the results can be found in supplemental material.

qPCR Confirmation of Results
For the samples in which NGS analysis indicated the presence of safflower, qPCR confirmation of its presence was performed by amplifying a specific target for Carthamus tinctorius (Table 1). All four reactions showed amplification of the target and the amplicon had the same melting temperature as a known reference sample (DNA extracted from Carthamus tinctorius seed material). These results confirmed the adulteration of these samples as already indicated by both the sequencing and DNA accounting analysis.

Discussion
DNA based methods have become a widely used tool for detecting fraud in the agri-food chain, particularly for species identification and quantification of certain food ingredient [6][7][8][9]. Species-specific PCR is the method of choice to target a particular adulterant, e.g., horse meat in beef patties, which, by multiplexing, can be extended to target several, known adulterants. Another popular approach, Sanger sequencing of barcoding regions, while efficient for identifying species, does not allow targeting several non-declared species (adulterants) at the same time.
Authentication of culinary herbs and spices goes beyond the question whether the named species, e.g., Crocus sativus, is present but tries to answer how much of the named species makes up the sample, in other words how 'pure' the sample is. The challenge in using quantitative PCR is the transformation of DNA amount (mass or number of copies) into mass (fraction) of the biological material. In its simplest format, binary mixtures (weight/weight) of the adulterant/named species are prepared, DNA is extracted, and after real-time PCR the cycle threshold (Cq) values are plotted against the log-transformed mixture percentage. Including a ubiquitous reference gene makes the assay more robust and compensates to a certain degree the effect of processing on DNA extractability and integrity [44]. However, most published PCR assays aim at quantifying an adulterant using primers specific for the non-declared species, whereas for assessing 'purity' the named species itself has to be quantified.
By assuming that the DNA content is proportional to the mass of biomaterials, Rong Chen et al. [45] developed a real-time PCR assay for estimating directly the mass of saffron in saffron containing herbal products; furthermore, they showed that the amount of DNA was fairly constant for several different batches of saffron.
We have taken this idea one step further and suggest a broadly applicable technique for estimating the 'purity' of biological matter using digital PCR. This technique is less prone to interference by amplification inhibitors and allows the estimation of the number of DNA copies present in a sample without external references. By measuring the number of genome copies in a sample and comparing it to the number of copies calculated from the known amount of DNA used in the reaction, 'purity' of a biomaterial can be estimated, provided the species of interest is well characterized (genome size, ploidy, target copy number). The number of target copies measured should be identical to the 'expected' number of target copies as calculated from its DNA concentration (measured flourometrically) within the bounds of measurements uncertainty.
When regressing the expected against the measured copy numbers over a wide range of saffron samples, the slope of the regression line was 1.138 (Figure 1), proving the assumption that 'purity' of a biomaterial can be assessed by dPCR. However, the estimated slope was not fully in line with the ideal value of 1.00. The reason for this could be inherent to the instrument used, as Low et al. [46] also found deviations from the ideal value when comparing measured copy numbers to dilutions of a certified reference material (ERM-AD623), which were attributed to the particular brand of ddPCR instrument. Furthermore, inaccuracies in the analytical chain are inevitable: fluorometric quantification of the DNA in the extract, the 1C values used, DNA damage during extraction, PCR efficiency, etc. which may all contribute to the deviation of the slope from unity.
We have applied the technique to the analysis of 141 commercial samples of saffron taken in 20 EU Member States. It has to be stressed that the 'purity' estimate obtained by ddPCR is intended for screening purposes and not to draw firm conclusion whether nondeclared botanicals are indeed present. For confirming this, samples flagged as suspicious were analysed using NGS metabarcoding as a means of compositional analysis. If NGS indicated species known to be potential adulterants, their presence was confirmed by qPCR of a species-specific target.
ddPCR screening indicated more samples as suspicious than could be confirmed by NGS. This was partially expected as the DNA used for the construction of the reference curve for DNA accounting was obtained from Crocus sativus bulbs, and although precautions have been taken to minimize difference in DNA-quality between reference and samples (i.e., the bulbs were cut, dried, and milled), it stands to reason that the variability of DNA quality, and therefore of testing results, is higher amongst samples. Differences in processing, drying, milling, storage, microbial decontamination (e.g., gamma irradiation [47]), etc. may adversely affect both DNA quantification and its specific target amplifiability.
DNA accounting has proven to be a valuable analytical approach in the food fraud detection workflow. The ability to rapidly screen large numbers of samples for their purity is a much needed capacity in today's food chain with its rapid turn-over. The suggested approach has significant value as a screening tool and has also a potential for a confirmatory assay to estimate the purity of botanicals. For the latter approach, the inclusion of a suitable and ubiquitously present reference gene in the assay would be necessary to compensate for variations in DNA extractability and amplifiability. Data Availability Statement: Data is available in the Appendix. For raw data disclosure please contact the authors.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: