Developing Single Nucleotide Polymorphisms for Identification of Cod Products by RAD-Seq

Simple Summary As high-value fishery products, cod species are frequently faked by non-cod ones. Genetic fingerprinting is important for both certifying authenticity and traceability of fish species. In this study, we developed a method that combines DNA barcoding and restriction-site associated DNA sequencing (RAD-Seq) approach for the identification of cod products. Two sequences that contain single nucleotide polymorphism (SNP)s were identified and these SNPs can be used to distinguish three different cod species, which are Atlantic cod (Gadus morhua), Greenland turbot (Reinhardtius hippoglossoides), and Patagonian toothfish (Dissostichus eleginoides). This SNP-based method will help us to identify the products, which are sold under the name of “Xue Yu” (Cod) in China, and works in parallel with existing fish identification techniques to establish an efficient framework to detect and prevent fraud at all points of the cod commercialization. Abstract The increase in the rate of seafood fraud, particularly in the expensive fishes, forces us to verify the identity of marine products. Meanwhile, the definition of cod lacks consistency at the international level, as few standards and effective application methods are capable of accurately detecting cod species. Genetic fingerprinting is important for both certifying authenticity and traceability of fish species. In this study, we developed a method that combines DNA barcoding and the restriction-site associated DNA sequencing (RAD-Seq) approach for the identification of cod products. We first obtained 6941 high-quality single nucleotide polymorphism (SNP)s from 65.6 gigabases (Gb) of RAD-Seq raw data, and two sequences that contain SNPs were finally used to successfully identify three different cod product species, which are Atlantic cod (Gadus morhua), Greenland turbot (Reinhardtius hippoglossoides), and Patagonian toothfish (Dissostichus eleginoides). This SNP-based method will help us to identify the products, which are sold under the name of “Xue Yu” (Cod) in China, and works in parallel with existing fish identification techniques to establish an efficient framework to detect and prevent fraud at all points of the seafood supply chain.


Introduction
Fish consumption has been increasing mainly due to an increase in the attention of consumers towards nutrition and health. However, this trend also leads to more fraudulent practices, which chiefly include mislabeling, the substitution of fish with inexpensive or inedible varieties, and fish products of unknown origin [1,2]. To ensure the authenticity of food, most countries have established Animals 2020, 10, 423 3 of 12

Cod Samples and DNA Isolation
The cod samples were purchased from local supermarkets, Shenzhen, China. All samples were sent to the laboratory by ice-chilled. Approximately 50 mg tissues of each sample were collected and stored at −20 • C before the use. In order to retrieve and check the data easily, each sample was marked with a unique number and product details, such as sample ID, species on the label, and product status (in Table 1). DNA was extracted using the DNeasy Tissue and Blood Kit (Qiagen Inc., Valencia, CA, USA), followed by the manufacturer' s recommended guidelines. The quantification of the extracted DNA was also ascertained by weighing the absorbance at 260 and 280 nm in a UV-Vis spectrophotometer [32].

For
the species identification, all tested samples used for RAD-Seq were firstly amplified by PCR with COI universal primers (F: CACGACGTTGTAAAACGACTCAACYAATCAYAAAGATATYGGCAC; R: GGATAACAATTTCACACAGGACTTCYGGGTGRCCRAARAATCA), sequenced on an automatic sequencer ABI3730 (Thermo Fisher Scientific, Waltham, MA, USA), and then the sequences were aligned using BLASTn tool (https://blast.ncbi.nlm.nih.gov/). PCR for COI amplification consisted of 5 µL 10× PCR Buffer, 0.5 µL of 1-20 ng/µL genomic DNA, 4 µL of dNTPs, 1 µL of a 0 µM solution of each primer1, 0.5 µL of Taq polymerase and 38 µL of ddH 2 O, to a final reaction volume of 50 µL. PCR amplification included the following profile: 95 • C denaturation for 3 min, followed by 35 cycles of 95 • C for 15 s, 55 • C for 15 s, 72 • C for 1 min, and final extension step at 72 • C for 5 min. PCR products were checked by 1.5% agarose gel (Gelly PhorLE, Euro Clone, UK) (120 V, 200 Ma, 25 min). Then, the PCR products were sent to BGI-Guangzhou for purification and sequencing. The cod COI sequences were analyzed using the basic local alignment search tool through the GenBank database (https://www.ncbi.nlm.nih.gov/genbank/) considering sequence similarity of at least 98%. Four mitogenomes of cod species (DQ385443.1 for Theragra chalcogramma; JF952739.1 for G. macrocephalus; LC146707.1 for G. ogac; and JQ353979.1 for Anoplopoma fimbria) were downloaded from NCBI (National Center for Biotechnology Information), and then, the COI regions were aligned with 28 COI sequences produced here for the tested samples. We used MEGA7 [33] for the tree construction, using the Bayesian inference analysis. All experiments were carried out according to the guidelines of the Animal Ethics Committee and were approved by the Institutional Review Board on Bioethics and Biosafety of BGI (No. FT14015).

RAD-Seq and Data Analysis
We prepared standard RAD-Seq libraries with 28 samples following the protocol [34]. The 1 µg of genomic DNA from per sample was digested with the restriction endonuclease Taql (5 TCGA3 ) incubated for 20 min in Fast Digest buffer (total volume of 20 µL) at 64 • C. Then, a 5 µL RAD adapter mixture (64 barcodes) was added to the interruption product and the ligation homogenization purification. Quantitative analysis was then carried out by the operating instructions of the Qubit detection kit (Invitrogen, Carlsbad, MA, USA). An Agilent 2100 Bioanalyzer (Palo Alto, CA, USA) was used to detect the length distribution of the purified PCR products. Subsequently, the products were cyclized, digested by enzyme digestion, and purified. The purified product was quantified by the Qubit dsDNA HS Assay kit (Thermo Fisher Scientific, Aalst, Belgium). Finally, sequencing was carried out at the BGISEQ-500RS (BGI, Shenzhen, China) using a 100-bp paired-end strategy.
By using the SOAPnuke software (https://github.com/BGI-flexlab/SOAPnuke) [35], all RAD-seq data of each sample were quality constrained. The remaining high-quality reads were used for subsequent analyses. Then, the genotyping analyses were conducted by the non-reference genomes software Stacks v2.4 (http://catchenlab.life.illinois.edu/stacks/) [25]. The Ustacks program (-m 2 -M 2 -p 15) was run to cluster RAD tags until they exactly matched. The Cstacks program (-b 1 -n 3 -p 15) was set to collect variations, and retained reads were sorted into loci by the Sstacks program (-p 15 -b 1). The parameters of retaining SNPs were as follows: (1) base quality ≥ 25; (2) depth ≥ 5. A total of 6941 SNPs were available for the downstream identification analysis. Then, the following criterion was used to obtain the specific SNPs: (1) in the same species, the genotype of each sample was homozygous and more than 80% of samples had the same genotype; (2) the same locus of three cod had different genotypes.

Primer Design and SNPs Validation
To validate the SNPs identified from RAD-Seq, seven of them were selected for PCR amplification. Flanking sequences of selected SNPs were extracted from the G. morhua reference genome [36] by the BLAST tool (-p blast -m 8 -e 1e-5) (https://nihlibrary.ors.nih.gov/bioinfo/blast/blast.htm) and PCR primers were designed using the software Primer3 (http://primer3.ut.ee/) [37]. Finally, once primer pairs were selected, we checked their specificity in the Primer-BLAST tool available on the NCBI website (https://www.ncbi.nlm.nih.gov/tools/primer-blast/). SNP validation was performed using randomly selected fish products, including Epinephelus coioides (orange-spotted grouper), Boleophthalmus pectinirostris (blue-spotted mudskipper), G. morhua, R. hippoglossoides, and Merlangius from Shenzhen's supermarkets, and then we used the samples for the subsequent analyses. Genomic DNA was extracted using DNeasy Blood & Tissue Kit (QIAGEN, Dusseldorf, Germany) following the manufacturer's recommended guidelines [38]. DNA extraction negative control was processed in parallel to ensure sample integrity throughout the DNA extraction procedure. Standard PCRs (conditions: 95 • C denaturation for 3 min, followed by 35 cycles of 95 • C for 15 s, 60 • C for 15 s, 72 • C for 1 min, and final extension step at 72 • C for 5 min) were performed to amplify the selected loci. PCR control reactions were run using ultrapure water. Negative control was performed on E. coioides, B. pectinirostris, respectively. Validation required SNPs firstly to amplify predominantly and secondly to be polymorphic. Then, the PCR products were sent to BGI-Guangzhou for sequencing and the sequences were aligned using MEGA7 for checking the genotypes.

DNA Barcoding of Cod Samples
The primary analyses for species identification using 28 COI sequences and the four available mitogenomes showed which samples were suitable to be used for subsequent analyses as standard cod samples ( Table 1). The Bayesian tree ( Figure 1) clustered 12 samples as Blan xue, ten as Yin xue, and Animals 2020, 10, 423 6 of 12 eight samples as Da xi yang xue. Simultaneously, the COI identification results were also validated by the phylogenetic tree. Based on these results, the 28 samples were used for the RAD-seq essays.

SNPs Identification Based on RAD Sequencing Data
A total of 65.6 Gb raw data was generated from 28 samples. After trimming the low-quality raw reads, 41.9 Gb of clean data were retained. Clean reads of separated individuals were deposited in the CNSA (CNGB Nucleotide Sequence Archive) public database (CNP0000707). A total of 6941 high-quality SNPs were used for searching identification markers. After filtering out the SNPs, we found only homozygous genotypes for each sample, and more than 80% of samples had the same genotype. A total of 126 SNPs were selected, and then we chose five tag regions containing different SNP genotypes at the same loci among the three cod species.

Primer Design and SNPs Verification
Flanking sequences of those five tags were extracted from the reference genome of G. morhua and the existing NCBI data by sequence alignment. Five pairs of primers were obtained under the default stringency conditions of the Primer3 software. Finally, three pairs of primers were failed to Animals 2020, 10, 423 7 of 12 yield an amplification product, and two pairs of primers (Table 2) provided effective amplification, which generated amplicons of the expected sizes, without any unspecific amplification bands in the agarose gels ( Figure 2). The primer sequences, annealing temperature (TM), and expected sizes for the two tag regions (12,014 and 42,229) are shown in Table 2. We observed no amplification for E. coioides (Ec), B. pectinirostris (Bp), and blank control (H 2 O), and positive bands for all cod species tested (see Figure 2). The amplicon sequencing and their posterior analyses in the G. morhua, R. hippoglossoides, and D. eleginoides showed differences among the three species. For the 12,014 sequences, we found three SNPs at sites 82, 112, and 220; for the 42,229 sequences, we found two SNPs at 127 and 163 positions ( Table 3, Supplementary Material Table S1). In addition, we searched for voucher sequences that matched the sequence F12014 and F42229 from the public databases (Supplementary Material Table  S1). We obtained 8 sequences for G. morhua, 10 for D. eleginoides, and 10 R. hippoglossoides, of which the SNPs were consistent with our findings (see Table 3). For the sequence F42229, we obtained eight matches for G. morhua, seven for D. eleginoides, and four for R. hippoglossoides. All SNPs were also consistent with those shown in Table 3. genotypes at the same loci among the three cod species.

Primer Design and SNPs Verification
Flanking sequences of those five tags were extracted from the reference genome of G. morhua and the existing NCBI data by sequence alignment. Five pairs of primers were obtained under the default stringency conditions of the Primer3 software. Finally, three pairs of primers were failed to yield an amplification product, and two pairs of primers (Table 2) provided effective amplification, which generated amplicons of the expected sizes, without any unspecific amplification bands in the agarose gels ( Figure 2). The primer sequences, annealing temperature (TM), and expected sizes for the two tag regions (12,014 and 42,229) are shown in Table 2. We observed no amplification for E. coioides (Ec), B. pectinirostris (Bp), and blank control (H20), and positive bands for all cod species tested (see Figure 2). The amplicon sequencing and their posterior analyses in the G. morhua, R. hippoglossoides, and D. eleginoides showed differences among the three species. For the 12,014 sequences, we found three SNPs at sites 82, 112, and 220; for the 42,229 sequences, we found two SNPs at 127 and 163 positions ( Table 3, Supplementary Material Table S1). In addition, we searched for voucher sequences that matched the sequence F12014 and F42229 from the public databases (Supplementary Material Table S1). We obtained 8 sequences for G. morhua, 10 for D. eleginoides, and 10 R. hippoglossoides, of which the SNPs were consistent with our findings (see Table 3). For the sequence F42229, we obtained eight matches for G. morhua, seven for D. eleginoides, and four for R. hippoglossoides. All SNPs were also consistent with those shown in Table 3.

The Confusion of Commercial Cod Species
Increasing demand for seafood has led to more and more commercial fraud. Fish are often renamed with titles that are perceived to be more attractive, or given the same names as higher-priced species in order to gain economic advantages [39]. An example of this is the English term "Silver Animals 2020, 10, 423 8 of 12 cod", which does not refer to any cod species; the corresponding Chinese Pinyin (Yin Xue) refers to D. eleginoides. However, when correctly identified it can be found that the correct English name for this species is Patagonian toothfish, and the corresponding Chinese Pinyin is called "Nan Ji xiao lin quan ya yu". Similarly, it can be found that the fish known as flat cod does not actually refer to any cod species either, but the corresponding Chinese Pinyin "Ge ling lan bian xue" is referred to R. hippoglossoides. In fact, when correctly identified, the English name for the fish known as "flat cod" is Greenland turbot, and the corresponding Chinese Pinyin is "Ma she die" [40]. The Chinese denominations of common codfish samples in the Chinese market with the corresponding pinyin, English and scientific names were provided in Supplementary Material Table S2.

COI Gene and Phylogenetic Analysis
In this study, after careful selection of 28 samples to ensure they met all the requirements of this experiment, it was found that the majority of the sample fragments of the COI gene in the expected size were amplified. A popular method of identifying many fish species is to search for certain similarities in the sequence of databases found in the COI regions [41]. Nevertheless, it is also known that a valuable contribution to clarifying the identity of species in doubtful cases can also be made using the phylogenetic analysis to evaluate the COI gene power of discrimination. Thus, it was used for this study, with a phylogenetic analysis based on the Bayesian inference analysis of 32 sequenced samples for the COI gene and the resulting tree is presented in Figure 1. It can be seen that the tree is divided into three main groups that are a perfect match for the three species. Even though the species G. chalcogrammus demonstrates a higher similarity to G. morhu, the groups are still well separated. This is in accordance with the discriminant power attributed to COI as the optimal barcode to the identification of animal species [42]. In addition, it has been found that nearly all of these cod species tend to be very similar in appearance, making their morphological identification extremely difficult, and almost to the point of being impossible [22]. Therefore, sequencing all the samples in this study will not just assess the current level of species substitution in this particular type of product, but also more accurately determine species types that can then be used as standard samples in developing subsequent SNP-based methods.

RAD-seq Identification of Cod Species
When it comes to identifying fish species and even fishery products, molecular methods have been widely used [2], and they have mainly employed mtDNA sequences as taxon barcodes [28,43,44]. Especially, the COI sequence has been serving as a DNA barcode of a global bioidentification system for animals [45]. There are many advantages of the COI barcode: it is less expensive, faster, and only a small segment of mtDNA is needed. Plus, it does not require taxonomic experts to administer. However, the shortcomings of COI barcode are also notable, such as, it cannot distinguish between closely related species from highly similar species; and it cannot help solve the "classification barrier" for the vast majority of unidentified organisms because database resources are lacking [46]. With the rapidly expanding and application of sequencing technology, numerous genomic resources have been provided to identify major marine species [23]. Alternatively, RAD-Seq and subsequent variations have also been extensively applied to generate SNPs [29]. These RAD techniques are popular and accurate, since they are not dependent on prior genomic information [47,48]. Furthermore, they can develop a number of SNPs as candidate markers for species-level assignments [29].
As a final remark, and in what concerns the ecological impacts of our work, our study revealed the use of threatened species as food items. The species G. morhua in this study is listed as "near threatened" and its population is susceptible to overfishing [49]. R. hippoglossoides is a slow-growing and deep-water fish of high commercial value, which belongs to the family Pleuronectidae [50], yet is now included in the red seafood list of Greenpeace International due to overexploitation [51]. D. eleginoides is a large notothenioid species that is considered to have a high migratory capacity [52]. Because of its large size and nutritional content, it is considered to be a kind of fish with high economic value. Most of these Animals 2020, 10, 423 9 of 12 species are caught and collected in the distant oceans of Antarctica, which tends to make tracing it very difficult. The above reason will cause a serious existent crisis for the three cod species. Over and above, while there are many detrimental effects for the commercial market of mislabeling seafood, consumers can also be put at risk if they purchase the potentially harmful products. These tools are hence available for the China scientific and governmental authorities to enhance market surveillance and authentication of fresh cod and processed cod products commonly consumed in the country.

Conclusions
China has the potential to greatly impact the world's seafood trade and consumption. The current research on cod, chosen as the most commercialized species globally, shows that the lack of national technical support for fishery products could seriously affect the fishing industry, consumers' protection, as well as the preservation of fish stocks. In this study, the target species have been first detected in identifying cod species with SNPs genotyped by RAD-Seq, which predominantly demonstrates the applicability of this method to the identification of various fish products types. There are several advantages to the use of the RAD-Seq-based method compared to COI analyses. For example, the species-specific SNPs derived from RAD-Seq could be used for designing specific PCR primers that just amplify the corresponding DNA fragment, then it is possible to distinguish the SNPs by agarose gel electrophoresis.
Author Contributions: Conceptualization, C.Z., X.Y., and X.M.; methodology, S.J. and X.Y.; formal analysis, S.J.; resources, T.L. and X.M.; writing-original draft preparation, S.J. and X.Y.; writing-review and editing, S.J. and X.Y.; funding acquisition, X.M. and C.Z. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.