Development of a DNA Metabarcoding Method for the Identification of Bivalve Species in Seafood Products

The production of bivalve species has been increasing in the last decades. In spite of strict requirements for species declaration, incorrect labelling of bivalve products has repeatedly been detected. We present a DNA metabarcoding method allowing the identification of bivalve species belonging to the bivalve families Mytilidae (mussels), Pectinidae (scallops), and Ostreidae (oysters) in foodstuffs. The method, developed on Illumina instruments, targets a 150 bp fragment of mitochondrial 16S rDNA. We designed seven primers (three primers for mussel species, two primers for scallop species and a primer pair for oyster species) and combined them in a triplex PCR assay. In each of eleven reference samples, the bivalve species was identified correctly. In ten DNA extract mixtures, not only the main component (97.0–98.0%) but also the minor components (0.5–1.5%) were detected correctly, with only a few exceptions. The DNA metabarcoding method was found to be applicable to complex and processed foodstuffs, allowing the identification of bivalves in, e.g., marinated form, in sauces, in seafood mixes and even in instant noodle seafood. The method is highly suitable for food authentication in routine analysis, in particular in combination with a DNA metabarcoding method for mammalian and poultry species published recently.


Introduction
Bivalves, a class of molluscs, are distributed worldwide. Due to their high content of essential nutrients, their production has steadily been increased over the last three decades [1][2][3][4][5]. Mytilidae (mussels), Pectinidae (scallops), and Ostreidae (oysters) are the most important bivalve families for human consumption. Each of these bivalve families is divided into several genera comprising a high number of species [6]. In 2019, 1.03 million tons of mussels, scallops, and oysters were caught in nature and 10.25 million tons were cultivated in aquaculture, earning a profit of millions of US dollars [7].
In the EU, international and national regulations exist to ensure legal trade in seafood and seafood products. The EU directive 1379/2013 regulates market organization of fishery and aquaculture products, including correct declaration of seafood [8]. To comply with legal regulations, labels must include both the local trade name in the official language(s) and the correct scientific Latin name [8,9]. Correct labelling of seafood products is important A total of 86 commercial food products were collected from regional supermarkets, fish markets, and delicacy shops in Austria from summer 2018 until winter 2020 (Supplementary  Table S1). Samples were either fresh, deep-frozen, or in processed condition. Each sample was given a specific ID number, with the letter "O" referring to oysters, "S" to scallops, "M" to mussels, and "Mi" to mixed-species seafood. Samples were stored at −20 • C until DNA extraction.
Eleven out of the 86 samples ("reference samples"), comprising three mussel, six scallop, and two oyster species (see Table 1), were used for method development. Identity of bivalve species in these reference samples (samples M12, M13 and M27 for mussels; samples S42, S46, S47, S49, S50, and S55 for scallops; samples O2 and O3 for oysters; Supplementary Table S1) was verified by subjecting DNA extracts to Sanger sequencing (Microsynth, Balgach, Switzerland) and matching the sequences against the public databases provided by the National Center for Biotechnology Information (NCBI, Bethesda, MD, USA). For Sanger sequencing, the forward and reverse primers listed in Table 2 were used.

DNA Extraction and Quantification
Raw material was cut into smaller pieces or homogenized. To 2.0 g of each sample, 10 mL of a hexadecyltrimethylammonium bromide (CTAB) buffer was added. After addition of 80 µL proteinase K, the mixture was incubated on an Intelli-Mixer TM RM2 (LTF Labortechnik, Wasserburg, Germany) overnight at 50 • C.
For DNA isolation, a commercial kit (Maxwell ® 16 FFS Nucleic Acid Extraction System Custom-Kit, Promega, Madison, WI, USA) was used according to the manufacturer's instructions. DNA concentration was determined fluorometrically (Qubit ® 2.0 fluorometer, Thermo Fisher Scientific, Waltham, MA, USA). For higher concentrations, the Qubit ® dsDNA broad range assay kit (2 to 1000 ng) was used, and for lower concentrations, the Qubit ® dsDNA high-sensitivity assay kit (0.2 to 100 ng) was used. DNA purity was assessed from the ratio of the absorbance at 260 and 280 nm (QIAxpert spectrophotometer, software version 2.2.0.21, Qiagen, Hilden, Germany). DNA extracts were stored at −20 • C until further use.

DNA Extract Mixtures
Ternary DNA extract mixtures were prepared by mixing DNA extracts (DNA concentration 5 ng/µL) from Pecten spp., Magallana gigas and Mytilus galloprovincialis, representing the three bivalve families Pectinidae, Ostreidae, and Mytilidae, respectively. Individual DNA extracts were mixed in a ratio of 98.0:1.5: In addition, DNA extract mixtures consisting of DNA from species belonging to one bivalve family were prepared. In these mixtures, DNA from one species was present as the main component, DNA from the other species as minor components (1.0% each). Since only two oyster species were available, the DNA extract mixture representing the bivalve family Ostreidae contained the closely related scallop (Placopecten magellanicus) as a major component (98.0%) and DNA from the two oyster species as minor components (1.0% each).
In addition to mixtures consisting of DNA from bivalve species only, a DNA extract mixture containing another mollusc species was prepared. DNA extract from a squid species (Sepiella inermis) was chosen as the main component (97.0%) and DNA from the bivalve species Placopecten magellanicus, Ostrea edulis and Perna canaliculus was present as minor components (1.0% each).

Reference Sequences
A 150 bp fragment of the mitochondrial 16S rDNA gene was used as a DNA barcode. Reference sequences for commonly consumed bivalve species and some exotic seafood species, that are permitted for consumption in Austria ("Codex Alimentarius Austriacus" chapter B35, [56]), were downloaded from the NCBI databases (Supplementary Table S2) by using CLC Genomics Workbench software (version 10.1.1, Qiagen, Hilden, Germany). If available, complete reference sequences from the RefSeq database were preferentially downloaded due to their reliability. In case complete reference sequences were not available, all DNA sequences of the mitochondrial 16S rDNA available for one and the same species, submitted by individual scientists, were aligned and checked for similarity and unidentified nucleotides. Subsequently, the DNA sequence with the highest quality (e.g., without unknown nucleotides, full-length of the DNA barcode) was chosen as a reference sequence.

Primer Systems
Primers were designed manually on a multiple DNA sequence alignment of the mitochondrial 16S rDNA of approximately 90 bivalve species using the CLC Genomics Workbench software (version 10.1.1, Qiagen, Hilden, Germany). The designed primers were checked for their physical and structural properties (e.g., formation of dimers, secondary structure, annealing temperature) using Oligo Calc, the OligoAnalyzer Tool provided by Integrated DNA Technologies (IDT, Coralville, IA, USA) and the online product descriptions from TIB Molbiol (Berlin, Germany). The primers, listed in Table 2, were synthesized by TIB Molbiol. Table 2 also shows the Illumina overhang adapter sequences which were linked to the target-specific primers.
All in-house-designed primers were tested in real-time PCR with DNA extracted from the eleven reference samples. During optimization, the following PCR conditions/parameters were kept constant and applied as published previously: DNA input amount of 12.5 ng, 'ready-to-use' HotStarTaq Master Mix Kit, annealing temperature (62 • C), 25 cycles [47].
Only one variable, the addition of magnesium chloride solution, was modified (addition of 1.5 or 3 mM MgCl 2 ). Real-time PCR reactions were carried out using a fluorescent intercalating dye (EvaGreen ® (20x in water)) in strip tubes or in 96-well plates, depending on the thermocycler used, the Rotor-Gene Q (Qiagen, Hilden, Germany) or the LightCycler ® 480 System (Roche, Penzberg, Germany), respectively. The total volume of the PCR reactions was 25 µL, consisting of 22.5 µL reaction mix and 2.5 µL of template DNA (diluted DNA samples (5 ng/µL)) or water as negative control. In the reaction mix, the HotStarTaq Master Mix Kit (Qiagen, Hilden, Germany) was used at a final concentration of 1x and the final concentration of primers was 0.2 µM, except the forward primer for mussels (0.4 µM). PCR cycling conditions were 15 min initial denaturation at 95 • C, 25 cycles at 95 • C, 62 • C and 72 • C for 30 s each, and a final elongation for 10 min at 72 • C. The primer pairs for mussels, scallops, and oysters with and without Illumina overhang adapter sequences were first used in singleplex PCR assays. Then, the seven primers (three forward and four reverse primers) listed in Table 2 were combined in a triplex assay. The identity of the PCR products was confirmed by melting curve analysis and/or agarose gel electrophoresis.

Library Preparation and NGS
In general, samples were sequenced by using either the MiSeq ® or the iSeq ® platform (Illumina, San Diego, CA, USA). DNA extracts were diluted to a DNA concentration of 5 ng/µL. Extracts with a DNA concentration < 5 ng/µL were used undiluted.
DNA library preparation was performed according to Dobrovolny et al. [47] with minor modifications (excess of MgCl 2 , final concentration 3 mM; average library size: 278 bp; diluted libraries of the iSeq ® system were denatured automatically on the instrument).
All sequencing runs were performed using either the MiSeq ® Reagent Kit v2 (300-cycles) or the iSeq ® 100 i1 Reagent v2 (300-cycles) with a final loading concentration of 8 pM. The pooled DNA libraries contained a 5% PhiX spike-in.
Reference samples were sequenced in six replicates (three sequencing runs, two replicates per run), while DNA extract mixtures were sequenced in nine replicates (three sequencing runs, three replicates per run). Commercial food products were sequenced in triplicates (three sequencing runs, one replicate per run) and food products were sequenced at least once by using either the MiSeq ® or the iSeq ® platform.

NGS Data Analysis Using Galaxy
After paired-end sequencing, the resulting FastQ files, generated by the instrument control software, were used as input for data analysis. The sequencing output in FastQ format was then processed with an analysis pipeline as described previously by using Galaxy (version 19.01) [47]. The published amplicon analysis workflow was modified as follows: the target-specific primers were trimmed from both ends using the tool Cutadapt and reads were not clustered into Operational Taxonomic Units (OTUs) [57]. Completely identical sequences were collapsed into a single representative sequence with the tool Dereplicate to minimize the number of reads, and then compared against a customized database for bivalves (Supplementary Table S2) using BLASTn [58].

Barcode Region and Primer Systems
We aimed to develop a DNA metabarcoding method allowing the differentiation between species belonging to the bivalve families Pectinidae, Ostreidae, and Mytilidae. To be applicable in routine analysis, the method should allow identifying the economically most important bivalve species in raw and highly processed food products.
We started with searching for appropriate DNA barcode regions of about 150 bp in length, containing conserved parts at the ends and a variable part in between. Potential DNA barcode regions were found in the mitochondrial DNA, especially the mitochondrial 16S rDNA. Several metabarcoding studies have shown that the sequences of the 16S rDNA gene are suitable as barcodes for species identification. Since we have already used a barcode region of the mitochondrial 16S rDNA to identify mammals and poultry [47], this marker gene was chosen as the DNA barcode for our assay.
Since the DNA metabarcoding method for bivalves should be compatible with the DNA metabarcoding method for mammalian and poultry species published recently [47], the primers should anneal at the same temperature (62 • C). In addition, the PCR cycle number should be limited to 25 and DNA libraries should be sequenced with Illumina reagent kits in the 300-cycle format. Due to high sequence variability between closely related bivalve species, none of the primer sets designed enabled obtaining a PCR product for each of the bivalve species of interest. Thus, we continued by designing three primer sets, one for each of the three bivalve families, Pectinidae, Ostreidae, and Mytilidae. Primer pairs consisting of one forward and one reverse primer allowed amplifying the DNA barcode region in scallop and oyster species (Table 2). However, in the case of mussels, a primer set consisting of one forward primer and two reverse primers ( Table 2) was necessary to obtain a PCR product for the mussel species listed in Table 1. Figure 1 shows an alignment of selected DNA barcode sequences for the commercially most relevant bivalve species. The alignment of the 90 bivalve species is shown in Supplementary Figure S1. Blue, green, and red bars indicate the binding sites of the primers for Pectinidae, Ostreidae and Mytilidae, respectively. With the three primer sets, PCR products differing in at least one base should be obtained for all bivalve species of interest. Further sequence alignments indicated that the DNA barcode region selected does not allow distinguishing between all species of the following genera: Chlamys spp., Euvola spp., Pecten spp., Crassostrea spp., Magallana spp., Ostrea spp. and Saccostrea spp. These species cannot be distinguished: Chlamys rubida and Chlamys behringiana; Pecten albicans, Pecten fumatus, Pecten jacobaeus, Pecten keppelianus, Pecten novaezelandiae, Pecten sulcicostatus, Crassostrea hongkongensis, and Crassostrea rivularis; Ostrea angelica and Ostrea lurida; as well as Ostrea permollis and Ostrea puelchana; and Saccostrea echinata, Saccostrea glomerata, and Saccostrea mytiloides. In addition, two mussel species, Mytilus platensis and Mytilus chilensis, can also not be distinguished (for Mytilus platensis only one DNA sequence entry was in the public databases provided by NCBI). However, differentiation at the genus level (Chlamys spp., Pecten spp., Crassostrea spp., Ostrea spp., Mytilus spp.) is sufficient according to the "Codex Alimentarius Austriacus" chapter B35 [56].
When we tested the primers in singleplex PCR assays, for each of the reference samples a PCR product of about 150 bp in length was obtained by increasing the concentration of the forward primer for mussels to 0.4 µM and keeping the concentration of the other six primers at 0.2 µM. In addition, we tested whether the seven primers could be combined to a triplex system. PCR products for the bivalve species of interest were obtained in one and the same vial by increasing the MgCl 2 concentration to a final concentration of 3 mM. Thus, we achieved our objective to perform the triplex PCR assay in combination with the previously published DNA metabarcoding assay for mammalian and poultry species [47].

Library Preparation, Pooling of Libraries, and Sequencing
Library preparation, pooling of 5 or 7 µL per normalized DNA library, and the sequencing process were performed as described previously [47]. However, in case of the pooling process, all DNA libraries were mixed in equal volumes as recommended by the manufacturer's instruction. In our previous study, different volumes from individual DNA libraries were taken to achieve sufficient sequencing depth for minor components. For sample pooling to the maximum of 96 libraries, more than 100,000 NGS reads per sample were expected to be obtained using the 300-cycle MiSeq ® Reagent Kit v2.
Sequencing runs were performed in triplicate and the average run metrics were as follows: cluster density (969 K/mm 2 ) on the flow cell, cluster passing filter (70.22%) as well as the Q-scores (Q30) for read 1 and read 2 were 92.6% and 89.28%, respectively. A total of 5.02% of the total reads were identified as PhiX control sequences with an error rate of 1.49%.

Analysis of DNA Extracts from Reference Samples
PCR products were obtained for each of the reference samples and sequencing results for those samples are summarized in Table 3. The table shows mean values of the total number of raw reads, the total number of reads that passed the analysis pipeline in Galaxy as well as the total number and percentage of reads that were assigned correctly to the eleven species (based on six replicates).
No significant differences were observed in the total number of reads (before data analysis process) between these species, except Mytilus galloprovincialis (162843), Perna canaliculus (169631), and Mytilus edulis (134500). With the exception of Perna canaliculus, >70% of the reads passed the amplicon analysis workflow. All three mussel species, six scallop species and two oyster species could be identified with this workflow at a high rate (>97.5%), except Mytilus edulis.

Analysis of DNA Extract Mixtures
Six ternary DNA extract mixtures were analyzed containing the DNA of the three bivalve families Pectinidae, Ostreidae, and Mytilidae in ratios of 98.0:1.5:0.5 (v/v/v). The composition of the DNA extract mixtures and the results obtained by DNA metabarcoding are summarized in Table 4. The total number of raw reads ranged from 80856 to 159,737 and the reads that passed the workflow were in the range from 65961 to 147196. For the main components (98.0%), the number of reads assigned correctly ranged from 62434 to 140147. In addition, both minor components (1.5% and 0.5%) could be identified. The number of reads assigned correctly was in the range from 1710 to 4356 and 555 to 1478, respectively.  In addition, we analyzed three DNA extract mixtures consisting of DNA from species belonging to one bivalve family ( Table 5). The mixtures contained DNA from a scallop or mussel species, respectively. DNA from other bivalve species was present in a proportion of 1.0% each. Both species being present as main components, Placopecten magellanicus and Perna canaliculus, could be identified, with the number of reads assigned correctly ranging from 58156 to 77483. However, quite different numbers of reads were correctly assigned to the minor components, ranging from 626 (Mizuhopecten yessoensis) to 50,391 (Mytilus galloprovincialis). Aequipecten opercularis was the only minor component that could not be detected. We analyzed a further DNA extract mixture containing DNA from the squid species Sepiella inermis as main component (97.0%) and DNA from the bivalve species Placopecten magellanicus, Ostrea edulis, and Perna canaliculus as minor components (1.0% each). As expected, in this mixture, the main component could not be detected because the primers are not suitable for amplification of the target region for Sepiella inermis. 31424, 28162, and 806 reads, respectively, were assigned correctly to the three bivalve species.
In our previous metabarcoding study [47], individual DNA libraries were pooled in different ratios to achieve sufficient sequencing depth for minor components. The present study demonstrates, that minor components down to a proportion of 0.5% could be identified and differentiated although DNA libraries were pooled by mixing them in equal volumes. DNA extracts from reference samples and DNA extract mixtures most frequently resulted in less than 100,000 reads. However, for all samples on average > 75000 raw reads were obtained, which turned out to be sufficient for reliable species identification.

Analysis of Commercial Seafood Samples
In order to investigate the applicability of the DNA metabarcoding method to foodstuffs, DNA extracts from 75 commercial food products were analyzed. According to declaration, eight samples (O1 and O4-O10) contained oyster species, 27 samples (M11, M14-M26, and M28-M40) mussel species, 15 samples (S41, S43-45, S48, S51-S55, and S56-S61) scallop species and 25 samples (Mi62-Mi86) were mixed-species seafood products ( Table 6). The ingredient list of 30 out of 75 food products did not give any information on the bivalve species. A total of 39 samples were declared to contain "Crassostrea gigas", "Mytilus galloprovincialis", "Mytilus chilensis", "Mytilus edulis", "Zygochlamys patagonica", "Chlamys opercularis", "Placopecten magellanicus", "Pecten maximus", or "Patinopecten yessoensis". The remaining samples (n = 6) were labelled with "Mytilus spp." or "Pecten spp.".   Our results indicate that DNA metabarcoding by targeting the 16S rDNA barcode region of about 150 bp in length is applicable to complex and highly processed foodstuffs. The barcode region could be amplified and sequenced even in products such as Bouillabaisse, Paella, and instant noodle seafood. Oyster sauce was the only sample matrix for which PCR amplification and consequently sequencing failed. Failure of obtaining PCR products for oyster sauce has already been reported by Chin Chin et al. [50], most probably caused by excessive DNA fragmentation due to industrial processing.
In each of the six oyster products that could be subjected to sequencing (O1, O4-O8), Magallana gigas was identified. Magallana gigas is by far the predominant oyster species farmed in the EU [59].
Placopecten magellanicus and Patinopecten yessoensis were listed as ingredients in samples S41, S45, S54, and S57 and samples S48, S52, and S61, respectively. Our results confirmed the presence of these two species, except for sample S57. In sample S43, declared to contain Pecten maximus, the species Mizuhopecten yessoensis was detected. In sample S44 and S53, declared as Pecten spp., the species Mizuhopecten yessoensis was also identified. In line with previous studies, most products declared to contain "Jakobsmuschel" did not contain a species of the genus Pecten [15,18,19]. Instead, we identified Placopecten magellanicus or Mizuhopecten yessoensis.

Conclusions
The DNA metabarcoding method developed in this study allows the detection of species of Mytilidae (mussels), Pectinidae (scallops), and Ostreidae (oysters), the most important bivalve families for human consumption. By combining three forward and four reverse primers in a triplex PCR assay, the barcode region, a fragment of mitochondrial 16S rDNA, could be amplified in the species of interest.
The applicability of the novel DNA metabarcoding method was investigated by analyzing individual DNA extracts from eleven reference samples, ten DNA extract mixtures and DNA extracts from 75 commercial food products. In each of the eleven reference samples, the bivalve species was identified correctly. In DNA extract mixtures, not only the main component but also the minor components were detected correctly, with just a few exceptions. The analysis of commercial seafood products showed that the DNA metabarcoding method is applicable to complex and processed foodstuffs, allowing the identification of bivalves in, e.g., marinated form, in sauces, in seafood mixes and even in instant noodle seafood.
The DNA metabarcoding method runs on both the MiSeq ® and iSeq ® instrument of Illumina. Due to the compatibility of PCR and sequencing parameters, the DNA metabarcoding method can be combined with a DNA metabarcoding method for mammalian and poultry species published recently.

Patent
This manuscript has been submitted for grant of a European patent (application number: EP21204456.4).

Data Availability Statement:
The datasets generated for this study are available on request to the corresponding author.