Rice is a daily staple for more than one-third of the world’s population, which makes it a target for dishonest stakeholders and traders who mix in adulterants of lower grade, cost, or nutritional value to make a profit with the least effort. Although thousands of varieties with different commercial names are available, particular varieties have become popular because they suit the tastes of consumers in a particular region [1]. Attempts to develop superior varieties and products have always yielded some of poorer quality as well. This has led to different quality standards with obvious price differences in the market [3]. This situation has tempted dishonest traders to adulterate genuine products for surplus profit [5].
Adulteration of rice is possible at any point from crop harvest until the product reaches consumers, and it leads to nutrition and health risks [6]. The forms of rice most prone to adulteration are brown rice, polished rice, rice flour, rice cakes, and rice bran oil. Authentication of rice cultivars is therefore important to protect the interests of farmers, dealers, millers, and food processors, as well as to provide healthy food to consumers [7]. To this end, each country establishes quality standards for agricultural products, with labels that give complete details including country of origin and chemical composition. Many methods have been developed to verify the genuineness of agricultural and animal food products, based on criteria such as morphological parameters (height, grain shape, size), physicochemical properties (starch composition, lipids, seed storage proteins, and alloenzymes), DNA (single-nucleotide polymorphisms, insertions and deletions), proteins, and metabolites [1].
In South Korea, rice consumption per person peaked in 1970 at 136.4 kg per year. Since the mid-1980s, it has declined almost every year, reaching 62.9 kg per year in 2015 [13]. In addition, rice accounts for about 50 percent of total cultivated land. Since 2006, the rice cultivation area has dropped slightly faster than the total cultivated area, and was 47.5 percent of the total in 2015 (Korean Statistical Information System (KOSIS) database). Furthermore, South Korea is an important market for U.S. and Chinese rice, which is subsequently used to adulterate Korean varieties to meet demand. Next-generation sequencing (NGS) technology has been widely used in rice genomics and molecular breeding studies [14]. This has enabled researchers to perform accurate genetic polymorphism analyses in order to identify unique molecular markers for particular rice varieties. However, the traceability of imported rice varieties still needs in-depth analysis.
Hence, this study aimed to screen and select compatible polymorphic InDel markers through whole-genome re-sequencing of 17 commercial rice cultivars from China. Our final goal is to identify accurate, sensitive, and effective SNP and InDel markers to distinguish domestic Korean rice from those Chinese cultivars. Our results therefore lay the groundwork for long-term efforts to assess the purity of other Korean rice varieties.
High accuracy is important when detecting nucleotide polymorphisms, such as SNPs and InDels, using an NGS re-sequencing strategy. In the International Rice Genome Sequencing Project (2005), the entire ‘Nipponbare’ genome was sequenced by the Sanger method, which established a precise reference sequence. For re-sequencing with short reads, the sequencing depth affects the accuracy of polymorphism detection. Here, the sequencing depth of the mapped sequences averaged 28.9-fold (Table 1), about twice the average depth (~14-fold) achieved in the recent 3000 Rice Genomes Project [16]. Our results suggest that this sequencing depth is sufficient to detect nucleotide polymorphisms accurately. However, there appears to be no relationship between genome coverage and sequencing depth across the rice samples (Table 1). We found that the number of polymorphic sites (SNPs and InDels) in each sample's genome showed the opposite trend to the genome coverage (Table 1).
Furthermore, re-sequencing with short reads cannot physically detect InDels that are longer than the read length (100 bp), and hence we only found insertions and deletions of less than 20 bp (data not shown). Larger InDels (>100 bp) and copy number variations (>1 kb) are undoubtedly present throughout the rice genome [17]. InDels of various sizes, with different PCR amplicons, have been separated on both agarose and denaturing polyacrylamide gels [18]. Here, we found that InDels could be easily detected by separation on 1.8% agarose gels, and we designed the markers accordingly. Indeed, we observed a high PCR success rate for the five markers (Figure 2; Supplementary Table S1), and these markers could be used to discriminate polymorphisms by electrophoresis.
New technologies and innovative approaches are changing the way we consider and apply genetic authentication of crop cultivars. DNA markers such as InDels play a crucial role in breeding and cultivar identification. The five InDel markers we identified might deserve further study for distinguishing Korean elite cultivars from adulterants.
4. Materials and Methods
4.1. Plant Materials
We performed whole-genome re-sequencing (WGRS) on grains of 17 commercial Japonica rice cultivars obtained from the National Institute of Crop Science (NICS), Rural Development Administration, South Korea.
4.2. DNA Isolation and Genome Sequencing
Genomic DNA was prepared from rice grains using a ChargeSwitch® gDNA plant kit (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s protocol. Whole-genome re-sequencing of the 17 samples was performed on an Illumina HiSeq2500™ by TheragenETEX Bio Institute (Suwon, South Korea). The procedure followed the standard Illumina protocol for sample preparation and sequencing: high molecular weight genomic DNA was excised from the gel and sheared using a Covaris S2 ultrasonicator system to obtain fragments of appropriate size, and agarose gel electrophoresis was used to select fragments. Libraries with short inserts of 350–450 bp for paired-end reads were prepared using a TruSeq DNA sample prep kit following the manufacturer’s protocol for Illumina. Products were quantified using the Bioanalyzer 2100 (Agilent, Santa Clara, CA, USA), and the resulting library was sequenced on the Illumina HiSeq2500™. To ensure high quality, low-quality reads (quality score <20), reads with adaptor sequence, and duplicated reads were filtered out, and the remaining high-quality data were used in the mapping.
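The three read filters named above (low quality, adaptor sequence, duplication) can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names, the use of the mean Phred score per read as the quality criterion, Phred+33 encoding, exact-sequence matching for duplicates, and the example adaptor string are all assumptions made for illustration.

```python
from statistics import mean


def mean_phred(quality_string, offset=33):
    """Return the mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return mean(ord(char) - offset for char in quality_string)


def filter_reads(reads, adapter, min_quality=20):
    """Apply three filters in order: low quality, adaptor, duplicate.

    reads   -- iterable of (name, sequence, quality_string) tuples
    adapter -- adaptor sequence to look for (illustrative input)
    """
    seen_sequences = set()
    kept = []
    for name, sequence, quality in reads:
        if mean_phred(quality) < min_quality:
            continue  # low-quality read (mean score below the threshold)
        if adapter in sequence:
            continue  # read contains adaptor sequence
        if sequence in seen_sequences:
            continue  # duplicate of a read that was already kept
        seen_sequences.add(sequence)
        kept.append((name, sequence, quality))
    return kept


# Hypothetical toy reads, not data from this study.
example_reads = [
    ("r1", "ACGTACGT", "IIIIIIII"),  # 'I' encodes Q40
    ("r2", "ACGTACGT", "IIIIIIII"),  # same sequence as r1
    ("r3", "TTTTGGGG", "########"),  # '#' encodes Q2
    ("r4", "CCAGATCGGAAGAGC", "I" * 15),  # carries the example adaptor
]
kept_names = [name for name, _, _ in filter_reads(example_reads, adapter="AGATCGGAAGAGC")]
print(kept_names)
```

In this toy input only `r1` survives: `r2` is a duplicate, `r3` falls below Q20, and `r4` contains the adaptor string.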
4.3. Mapping of Reads to the Reference
The raw reads were subjected to quality trimming (Phred quality score <Q20) using FastQC [21], and adapter trimming was carried out with version 1.0 of the cutadapt software [22] using the parameters -O 5 and -m 32. The clean reads were then mapped to the temperate japonica Nipponbare reference genome, Os-Nipponbare-Reference-IRGSP-1.0 [23], using the Burrows–Wheeler Aligner (BWA) software [15] with default parameters. The alignment results were merged and indexed as BAM files [24]. Average sequencing depth and coverage were calculated from the alignment results. The mapped reads were then used to detect SNP and InDel polymorphisms.
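As an illustration of how average depth and genome coverage can be derived from alignment results, the sketch below works on a per-position depth table with three tab-separated columns (chromosome, position, depth), the layout written by tools such as `samtools depth`. The function name and the definitions used here (average depth as total mapped bases divided by genome length, coverage as the percentage of positions with depth above zero) are assumptions; the text does not state which definitions or tools were used.

```python
def depth_and_coverage(depth_lines, genome_length):
    """Compute (average depth, coverage %) from 'chrom<TAB>pos<TAB>depth' lines.

    Positions missing from the input are treated as having zero depth,
    which is why the genome length is passed in explicitly.
    """
    total_bases = 0
    covered_positions = 0
    for line in depth_lines:
        _chrom, _pos, depth_text = line.rstrip("\n").split("\t")
        depth = int(depth_text)
        total_bases += depth
        if depth > 0:
            covered_positions += 1
    average_depth = total_bases / genome_length
    coverage_percent = 100.0 * covered_positions / genome_length
    return average_depth, coverage_percent


# Hypothetical toy input: a 10-bp "genome" with four reported positions.
example_lines = ["chr01\t1\t30", "chr01\t2\t30", "chr01\t3\t0", "chr01\t4\t20"]
print(depth_and_coverage(example_lines, genome_length=10))
```

Because depth and coverage are computed from different quantities (summed depth versus number of covered positions), a sample can have high depth and modest coverage, or the reverse.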
4.4. Detection of SNPs and InDels
We used the GATK tools software [25] with its default parameters to detect SNPs and InDels. In our analysis, InDels were defined as insertions or deletions between 1 and 10 bp in length. InDels present in more than 10 cultivars were selected for further analysis.
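The two selection rules in this section (length between 1 and 10 bp, presence in more than 10 cultivars) can be sketched as follows. This is not GATK and not the script used in the study: the function names, the `(chrom, pos, ref, alt)` tuple layout, and the choice to treat two calls as the same InDel only when all four fields match are assumptions for illustration.

```python
from collections import defaultdict


def indel_length(ref, alt):
    """Length of an insertion or deletion; 0 for a same-length substitution."""
    return abs(len(ref) - len(alt))


def common_indels(calls_by_cultivar, min_len=1, max_len=10, min_cultivars=10):
    """Return InDels of allowed length found in MORE than min_cultivars cultivars.

    calls_by_cultivar -- dict mapping cultivar name to (chrom, pos, ref, alt) tuples
    """
    carriers = defaultdict(set)
    for cultivar, calls in calls_by_cultivar.items():
        for chrom, pos, ref, alt in calls:
            if min_len <= indel_length(ref, alt) <= max_len:
                carriers[(chrom, pos, ref, alt)].add(cultivar)
    return {site: names for site, names in carriers.items() if len(names) > min_cultivars}


# Hypothetical toy calls for 11 cultivars, not data from this study.
toy_calls = {}
for index in range(11):
    calls = [
        ("chr01", 100, "A", "ATG"),           # 2-bp insertion, all 11 cultivars
        ("chr03", 7, "A", "G"),               # SNP, ignored
        ("chr04", 9, "A", "A" + "T" * 12),    # 12-bp insertion, too long
    ]
    if index < 10:
        calls.append(("chr02", 50, "GCC", "G"))  # 2-bp deletion, only 10 cultivars
    toy_calls[f"cultivar_{index}"] = calls

print(sorted(common_indels(toy_calls)))
```

Only the `chr01` insertion passes here: the `chr02` deletion occurs in exactly 10 cultivars, which does not satisfy "more than 10".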
4.5. Primer Design for Common InDel Markers
To design InDel markers for the detection of InDel polymorphisms by electrophoresis, we extracted only InDel regions with a large size (≥2 bp) and a high sequencing depth (DP ≥5-fold) from the InDel list of each of the 17 cultivars. Primer pairs for the selected InDel regions were designed automatically by using a Perl script to control the Primer3 program [26]. In addition, we screened the primer pairs for duplicated sequences to maintain specificity. When the sequence of a primer pair matched that of another primer pair, the corresponding pairs were eliminated because they were considered redundant. PCR product size ranged from 100 to 150 bp.
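The selection and screening steps around primer design can be sketched as below. The sketch does not call Primer3 and is not the Perl script used in the study; it assumes primer pairs have already been designed. The function names, dictionary keys, and the reading of the redundancy rule (every pair whose forward and reverse sequences both match another pair is dropped) are assumptions.

```python
from collections import Counter


def select_indel_regions(indels, min_length=2, min_depth=5):
    """Keep InDel regions with length >= min_length bp and depth >= min_depth."""
    return [
        indel for indel in indels
        if indel["length"] >= min_length and indel["depth"] >= min_depth
    ]


def screen_primer_pairs(pairs, min_product=100, max_product=150):
    """Drop redundant primer pairs and pairs outside the product size range."""
    counts = Counter((pair["forward"], pair["reverse"]) for pair in pairs)
    return [
        pair for pair in pairs
        if counts[(pair["forward"], pair["reverse"])] == 1  # sequence is unique
        and min_product <= pair["product_size"] <= max_product
    ]


# Hypothetical toy records, not markers from this study.
toy_indels = [
    {"id": "indel_a", "length": 4, "depth": 12},
    {"id": "indel_b", "length": 1, "depth": 30},  # too short
    {"id": "indel_c", "length": 6, "depth": 3},   # depth too low
]
toy_pairs = [
    {"id": "p1", "forward": "ACGTACGTAC", "reverse": "TTGCAAGGCT", "product_size": 120},
    {"id": "p2", "forward": "GGCATTCAGA", "reverse": "CCATGGTTAA", "product_size": 140},
    {"id": "p3", "forward": "GGCATTCAGA", "reverse": "CCATGGTTAA", "product_size": 140},
    {"id": "p4", "forward": "TTTTCCCCAA", "reverse": "AAGGGGTTTT", "product_size": 210},
]
print([indel["id"] for indel in select_indel_regions(toy_indels)])
print([pair["id"] for pair in screen_primer_pairs(toy_pairs)])
```

With these toy inputs, only `indel_a` and `p1` remain: `p2` and `p3` are removed together as redundant, and `p4` falls outside the 100 to 150 bp range.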
4.6. PCR Validation
To validate the marker sets, we tested the primer pairs by means of gel electrophoresis. Around 10 markers were chosen so as to cover as much of the genome as possible, but were chosen randomly within each portion of the genome. The InDel primer sets were validated on the same 17 Chinese cultivars that were used to detect the InDel regions. The primer sets were used for PCR amplification in approximately 10 µL of reaction mixture consisting of EmeraldAmp® GT PCR Master Mix (Takara, Japan), 5.0 pmol of each primer, and about 120 ng of genomic DNA template. The conditions were: an initial denaturation step for 3 min at 95 °C, then 35 cycles of 20 s at 95 °C, 30 s at 55 °C, and 30 s at 72 °C, followed by a final extension for 1 min at 72 °C. PCR products were analyzed by gel electrophoresis on 1.5–1.7% agarose gels in Tris/borate/EDTA buffer. After staining with ethidium bromide, the band patterns on the agarose gels were photographed using a Gel Doc™ 2000 Gel Documentation System (Bio-Rad, Seoul, Korea).
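As a quick check on the cycling program described above, the total programmed hold time can be worked out from the stated step durations. The function name is hypothetical, and the calculation ignores temperature ramping between steps, which depends on the thermocycler.

```python
def total_hold_seconds(initial, cycle_steps, cycles, final):
    """Sum of programmed hold times in seconds (ramp times not included)."""
    return initial + cycles * sum(cycle_steps) + final


# Step durations taken from the conditions stated in the text.
seconds = total_hold_seconds(
    initial=180,               # 3 min at 95 °C
    cycle_steps=(20, 30, 30),  # 95 °C, 55 °C, 72 °C
    cycles=35,
    final=60,                  # 1 min at 72 °C
)
print(seconds, "s =", round(seconds / 60, 1), "min")
```

The program therefore holds for 3040 s, a little under 51 min, before ramping time is added.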