Article

Single Nucleotide Polymorphisms and Insertion/Deletion Variation Analysis of Octoploid and Decaploid Tropical Oil Tea Camellia Populations Based on Whole-Genome Resequencing

1 Forestry Research Institute, Hainan Academy of Forestry (Hainan Academy of Mangrove), Haikou 571100, China
2 School of Breeding and Multiplication (Sanya Institute of Breeding and Multiplication), Hainan Engineering Research Center for Tropical Oil Tea Camellia, Hainan University, Sanya 572000, China
3 School of Tropical Agriculture and Forestry, Hainan University, Danzhou 571700, China
4 Forest Seed and Seedling General Station of Hainan Province, Haikou 570203, China
5 School of Life and Health Sciences, Hainan University, Haikou 570228, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Plants 2024, 13(21), 2955; https://doi.org/10.3390/plants13212955
Submission received: 18 September 2024 / Revised: 19 October 2024 / Accepted: 21 October 2024 / Published: 22 October 2024
(This article belongs to the Section Plant Genetics, Genomics and Biotechnology)

Abstract

Oil tea camellia (Camellia spp.) is an important woody oil crop with high nutritional and economic value. Whole-genome resequencing (WGR) technology can provide an in-depth understanding of the genetic background of this plant as well as a reference for breeding research, germplasm resource conservation, and genetic modification. In this study, we analyzed SNP and InDel variations in 49 individual oil tea camellia germplasm samples collected from five populations located in three provinces of China: Hainan, Guangdong, and Guangxi. The samples were analyzed through WGR after their ploidy was determined through flow cytometry. A total of 239,441,603 high-quality single nucleotide polymorphisms (SNPs) and 23,510,374 high-quality insertion/deletion variation sites (InDels) were obtained. The distribution of SNPs and InDels differed significantly among functional regions, with a high density of variations in non-coding regions, such as intergenic regions and introns, and a relatively low density of variations in coding regions. Transition was the main type of SNP variation. A population genetic diversity analysis revealed that the sampled oil tea camellia populations exhibited high genetic diversity and extensive genetic variation. The genetic diversity of the oil tea camellia populations in the Hainan region was higher than that of the populations in inland regions. This study also characterized the genetic diversity of and variations between octoploid and decaploid oil tea camellia in the tropics and between Hainan-based and inland oil tea camellia. Such findings provide a reference for the conservation of germplasm resources and the genetic modification of oil tea camellia.

1. Introduction

Oil tea camellia (Camellia spp.), also known as “Shanyou”, is a perennial evergreen shrub or small tree that belongs to the genus Camellia L. of the family Theaceae. It is a woody oil crop known for the high oil content of its seeds [1,2]. This plant has a long history of cultivation in Eastern and Southern Asia, particularly in China, where it is an important woody oil cash crop. Oil tea camellia is recognized globally as one of the four major woody oilseed crops, together with oil palm (Elaeis guineensis Jacq.), olive (Olea europaea L.), and coconut (Cocos nucifera Linn.). Its advantages include strong adaptability, drought resistance, and tolerance of infertile soil [3]. Because of its high economic value, broad scope for comprehensive development, diverse range of applications, and significant utilization potential, farmers consider it a “guaranteed high-yield crop” and a “green oil bank” [4]. However, research on the specific differences in genetic diversity between octoploid and decaploid populations in tropical environments remains limited. This study aims to fill this gap by providing insights into how ploidy levels and geographic factors contribute to genetic variation, particularly in the underexplored tropical regions of Hainan.
Oil tea camellia is primarily distributed in the Yangtze River Basin and southern China but is also grown in small quantities in countries such as Myanmar, Thailand, and Vietnam. In China, oil tea camellia is mainly cultivated in Hunan, Jiangxi, and Guangxi, which together account for 75.8% of China’s total oil tea camellia cultivation area; Hunan alone has the widest cultivation area, approximately 40% of the total [5]. Camellia vietnamensis is particularly suitable for cultivation at low altitudes in the tropical regions of Southern Asia because it grows quickly, reaches a tall height, and is adapted to the tropical climate [6]. In the Leizhou Peninsula of Guangdong Province and from the southeastern part of the Guangxi Zhuang Autonomous Region to Hainan Province, Camellia vietnamensis has adapted to the local climatic conditions and exhibits good flowering and fruiting characteristics. Camellia vietnamensis cultivated in Hainan is thought to have originated in Gaozhou City, Guangdong Province; therefore, it is also known as “Camellia gauchowensis Chang” in the Hainan region. The history of oil tea camellia cultivation in Hainan may be traced back more than 500 years [7]. A systematic study of the resource distribution of oil tea camellia in Hainan and a field survey revealed that the oil tea camellia germplasm resources in Hainan include wild and artificial cultivars, which are primarily concentrated in nine county-level cities and thirty-eight townships, encompassing approximately 1167.3 hectares [8].
Hainan Island is rich in oil tea camellia resources. The book Flora of China indicates that common native oil tea camellia species in Hainan grow in old-growth forests at altitudes above 800 m. According to the Flora Hainanica, A Checklist of Flowering Plants of Islands and Reefs of Hainan and Guangdong Province, Hainan Island Crop (Plant) Germplasm Resources Investigation Anthology, and other sources, Hainan Island contains a wealth of wild common oil tea camellia germplasm resources. Considerable evidence indicates that Hainan Island is the origin of the tropical oil tea camellia and has a long history of oil tea camellia cultivation. Numerous cities and counties in Hainan Province have many oil tea camellia trees that are more than 100 years old, or even several hundred years old, particularly Ding’an, Chengmai, Tunchang, and Qionghai, where older oil tea camellia forests are preserved. Chengmai County has the highest concentration of oil tea camellia forests, with the largest trees having basal diameters up to 150 cm [8].
Whole-genome sequencing (WGS) is a precise technology that comprehensively reveals an organism’s genetic information. This enables researchers to gain insight into its genetic background and biological functions. Whole genome resequencing (WGR), meanwhile, refers to the sequencing of the genome of a specific individual or population based on an existing reference genome to discover genome-wide variations, such as single nucleotide polymorphisms (SNPs), structural variants, insertion/deletion variation sites (InDels), and copy number variations. These can be used to analyze the molecular genetic characteristics of an individual or population, screen and predict the genes for key economic traits, and study genetic evolution [9].
In January 2022, a research team from the Research Institute of Subtropical Forestry, Chinese Academy of Forestry, used diploid oil tea camellia to successfully map the entire genome of oil tea camellia, the results of which clearly illustrated the origin and evolution of oil tea camellia. After more than 4 years of continuous study, this research team obtained a diploid oil tea camellia genome map with a size of approximately 2.95 Gb and a contig N50 of 1.002 Mb by applying PacBio third-generation sequencing technology. They anchored the genome sequences to 15 chromosomes, achieving a high anchoring rate of 91.33% [10]. This represents the first high-quality oil tea camellia genome map with chromosome-level precision, setting an example for the assembly of a complex genome of a woody oilseed crop. As of 2023, the whole genomes of several species in the family Theaceae, mostly within the genus Camellia, had been sequenced. Besides oil tea camellia, Camellia sinensis, Camellia japonica, Camellia nitidissima, Camellia sinensis var. pubilimba, and Stewartia sinensis have been sequenced. This is a relatively small number compared with the families and genera of other crops.
At present, some progress has been made in the research of oil tea camellia WGR, which has provided a foundation for evaluating and utilizing oil tea camellia germplasm resources, molecular selective breeding, and the selection of high-quality varieties. Oil tea camellia is widely distributed in China, and its genetic diversity is rich across these different geographic environments. Hainan oil tea camellia and that of the inland exhibit different genetic backgrounds and adaptive characteristics due to differences in their geographic location and climate. These differences are likely to be closely related to variations in ploidy level and genome structure. In this study, we selected 49 oil tea camellia germplasm resources from five tropical oil tea camellia populations in Hainan, Guangdong, and Guangxi provinces. After determining the ploidy level of the samples through flow cytometry, we performed WGR analysis to obtain the SNP and InDel variation data from the oil tea camellia samples. Additionally, we used bioinformatics to explore the differences and genetic diversity between octoploid and decaploid oil tea camellia in the tropics and between oil tea camellia in Hainan and the inland. Our findings provide a reference for future studies of oil tea camellia selective breeding and the conservation, rational use, and genetic modification of germplasm resources.

2. Results

2.1. Sample Collection Information

The sample populations collected in this experiment were from five different regions. A total of 49 individual plants from these five populations were collected. Populations a, b, c, d, and e were collected from Zhongjiu Village, Qionghai City, Hainan Province; Zharong Village, Wuzhishan City, Hainan Province; Longguangzai Village, Haikou City, Hainan Province; Yuetang Village, Zhanjiang City, Guangdong Province; and Shanzhugen Village, Yulin City, Guangxi Zhuang Autonomous Region, respectively. The names of the sample populations by location are denoted by the population letter codes above in the following text. The information collected regarding the oil tea camellia samples used for statistical analysis is shown in Table 1, and a geographic location map of oil tea camellia sample collection sites is shown in Figure 1.

2.2. Results of Oil Tea Camellia Ploidy Determination

The ploidy levels of the oil tea camellia samples were determined through flow cytometry (Table 2), and the resulting histograms are shown in Figure 2. The results indicated that oil tea camellia populations a and b were decaploid, whereas populations c, d, and e were octoploid. The coefficient of variation of the peak value of each tested sample was within 5%, indicating that the test results are accurate and valid.

2.3. Analysis of the WGR Results

2.3.1. Overview of the WGR Data

According to the statistical results (Table 3), WGR of the 49 oil tea camellia samples yielded approximately 3770 G of raw data. After filtering and processing with the Fastp software, approximately 3722 G of clean data were obtained. Across the paired reads, on average 97.53% of bases reached a quality score of Q20 and 92.49% reached Q30, and the mean GC content was 38.76%. These data indicate that the sequencing quality of the samples is good and meets the quality standards of this study. The data can, therefore, be used in subsequent analyses.
The reference genome size used in this study was 2.95 Gb [10]. After aligning the sample genomes with the reference genome, the output (Table 4) indicated that the read alignment rate of all samples ranged between 71.97% and 99.44%, with a mean value of 96.60%. The correct alignment rate of the reads ranged between 61.87% and 85.61%, with a mean value of 76.10%. The mapping quality values of all samples achieved mapQ ≥ 5, indicating that the alignment results were reliable. The above data indicate that the sequencing data were of high quality and aligned well with the reference genome. However, the comparatively low correct alignment rate of the reads may result from differences between the diploid reference genome and the sample genomes, suggesting the presence of a large number of structural variations in the sample genomes.
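The two rates reported above can be illustrated with a minimal Python sketch. The text does not define how the "correct alignment rate" was computed; the sketch assumes it corresponds to reads flagged as properly paired in the SAM FLAG field, and it counts primary records only. The function name `alignment_rates` is hypothetical.

```python
def alignment_rates(flags):
    """Return (mapped %, properly paired %) from an iterable of SAM FLAG integers.

    Assumption: secondary (0x100) and supplementary (0x800) records are
    ignored so that each read is counted once.
    """
    primary = [f for f in flags if not f & 0x900]
    if not primary:
        raise ValueError("no primary alignment records")
    # 0x4 set means the read itself is unmapped.
    mapped = sum(1 for f in primary if not f & 0x4)
    # 0x2 set means the aligner considered the pair properly aligned.
    proper = sum(1 for f in primary if f & 0x2 and not f & 0x4)
    total = len(primary)
    return 100 * mapped / total, 100 * proper / total


# Hypothetical flags: two properly paired reads, two unmapped reads,
# one mapped but not properly paired read, one supplementary record.
example_flags = [99, 147, 77, 141, 65, 2147]
mapped_pct, proper_pct = alignment_rates(example_flags)
```

Under this reading, a gap between the two percentages means that many reads map but their mates land at an unexpected distance or orientation, which is one way structural differences from the reference can show up.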

2.3.2. SNP and InDel Variant Analysis

After alignment with the reference genome, a total of 247,867,204 SNP markers and 25,687,017 InDel markers were extracted, owing to the high alignment rate and sequencing depth. After filtering, 239,441,603 high-quality SNPs and 23,510,374 high-quality InDels were retained for the sample population and annotated using the SnpEff software. The distribution of SNPs and InDels for each chromosome is shown in Table 5 and Figure 3. The SNPs and InDels were distributed approximately evenly across the 15 chromosomes of oil tea camellia.
From the distribution of the SNP and InDel variants in each functional region of the genome (Table 6 and Figure 4), it was evident that the distribution of SNPs and InDels in different functional regions differed significantly. The variation in the sequences of the genomes of the tested samples was primarily concentrated in the non-coding regions, such as intergenic regions and introns. The density of variations in the coding regions was relatively low, which may have occurred because variations in the coding regions were prone to causing deleterious mutations, which were eliminated by strong selection in the evolutionary process.
By analyzing the types of base mutations at the SNP sites, the results (Table 7, Figure 5) indicated that the mean Ts/Tv value was 3.672. In this sample genome, transitions were the dominant type of SNP variation, a result that is consistent with the mutational pattern in most species, whereby transitions are more likely to occur and be retained compared with transversions because of complementary base pairing constraints.
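As a small illustration of the Ts/Tv statistic, the following Python sketch classifies each SNP as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion and returns the ratio. It is a generic sketch, not the authors' pipeline; the function names and the example substitutions are hypothetical.

```python
BASES = set("ACGT")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for A<->G or C<->T substitutions, False for transversions."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Compute Ts/Tv from (ref, alt) pairs, skipping anything that is not a SNP."""
    ts = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # not a single-base substitution
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio undefined")
    return ts / tv


# Hypothetical substitutions: three transitions and one transversion.
ratio = ts_tv_ratio([("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")])
```

Because each base has one possible transition and two possible transversions, a ratio well above 0.5 indicates an excess of transitions relative to uniform substitution.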
Based on their effect on gene function, the variations were categorized into the following four types: HIGH, LOW, MODERATE, and MODIFIER. The results (Table 8) indicated that the vast majority of SNP and InDel variations were MODIFIER types (i.e., those located in non-coding regions), which accounted for 98.345% and 99.091% of the total number of variations, respectively, consistent with the above results. This suggests that although the number of variations in the oil tea camellia genome is large, most of them may not directly alter gene function. Only a small number were categorized as HIGH, accounting for 0.063% of the SNPs and 0.606% of the InDels. HIGH effect variations can significantly alter gene function, for example by introducing premature stop codons or disrupting intronic splice sites. Although these variations are few in number, they may contain important adaptive loci or domestication genes. MODERATE effect variations accounted for 0.921% and 0.217% of the SNPs and InDels, respectively, and were primarily amino acid substitutions, which may also alter protein function. In contrast, although LOW effect variations changed codons, they were mostly synonymous mutations with less impact on gene function and accounted for lower proportions of the SNPs and InDels.
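The four categories above correspond to the impact field that SnpEff writes into the `ANN` entry of the VCF INFO column (third pipe-separated field of each annotation). The Python sketch below tallies one impact per variant by taking the most severe annotation; whether the percentages in Table 8 were counted per variant or per annotation is not stated in the text, so this choice is an assumption, and the function names and example INFO strings are hypothetical.

```python
from collections import Counter

# Ordered from least to most severe.
SEVERITY = ["MODIFIER", "LOW", "MODERATE", "HIGH"]


def most_severe_impact(info):
    """Return the most severe SnpEff impact in a VCF INFO string, or None."""
    for entry in info.split(";"):
        if not entry.startswith("ANN="):
            continue
        impacts = []
        for annotation in entry[4:].split(","):
            fields = annotation.split("|")
            # Field 0 is the allele, 1 the effect, 2 the impact category.
            if len(fields) > 2 and fields[2] in SEVERITY:
                impacts.append(fields[2])
        if impacts:
            return max(impacts, key=SEVERITY.index)
    return None


def impact_percentages(info_fields):
    """Percentage of annotated variants in each impact category."""
    counts = Counter()
    for info in info_fields:
        impact = most_severe_impact(info)
        if impact is not None:
            counts[impact] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100 * counts[name] / total for name in SEVERITY}


# Hypothetical INFO strings, one per variant.
example_info = [
    "DP=10;ANN=G|intergenic_region|MODIFIER|g1",
    "ANN=T|missense_variant|MODERATE|g2,T|upstream_gene_variant|MODIFIER|g3",
    "ANN=A|stop_gained|HIGH|g4",
    "ANN=C|synonymous_variant|LOW|g5",
]
percentages = impact_percentages(example_info)
```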

2.3.3. Genetic Diversity Analysis

The SNP allele frequency percentage and count graphs are shown in Figure 6. From the allele frequency distribution graph, it was evident that the frequencies of most of the alleles were low, primarily concentrated around the 10% frequency interval. The low number of alleles with a high frequency indicates that only a small number of alleles were very common in this population. The presence of a large number of low-frequency alleles also indicated that there was a high genetic diversity and extensive genetic variation in this oil tea camellia population. The number of alleles with a count of one was the highest, which may indicate that these oil tea camellia populations have recently produced alleles or experienced higher mutation rates, providing rich genetic resources for the oil tea camellia populations. The heterozygosity (Ho) of the five oil tea camellia populations was 16.88%, 14.63%, 21.38%, 10.97%, and 13.95%, respectively (Figure 7), with the largest Ho value recorded in population c. A high heterozygosity is usually associated with a good adaptive capacity and survival potential, suggesting that this population is genetically more diverse.
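The text does not state how Ho and the allele frequencies were calculated for these polyploid samples. As a minimal sketch under that caveat, the Python code below treats a genotype call as heterozygous when it carries more than one distinct allele, and computes the alternate allele frequency at a site as the share of non-reference allele copies. The function names and example genotypes are hypothetical.

```python
def split_alleles(genotype):
    """Split a VCF GT string such as '0/0/0/1' into alleles; None if missing."""
    alleles = genotype.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return alleles


def observed_heterozygosity(genotypes):
    """Percentage of called genotypes that carry more than one distinct allele."""
    called = [a for a in map(split_alleles, genotypes) if a is not None]
    if not called:
        raise ValueError("no called genotypes")
    heterozygous = sum(1 for alleles in called if len(set(alleles)) > 1)
    return 100 * heterozygous / len(called)


def alt_allele_frequency(genotypes):
    """Fraction of non-reference allele copies across called genotypes at one site."""
    alt = total = 0
    for genotype in genotypes:
        alleles = split_alleles(genotype)
        if alleles is None:
            continue
        total += len(alleles)
        alt += sum(1 for a in alleles if a != "0")
    return alt / total if total else None


# Hypothetical tetraploid-style calls, shortened for readability.
ho = observed_heterozygosity(["0/0/0/0", "0/0/0/1", "1/1/1/1", "./././.", "0/1/1/1"])
freq = alt_allele_frequency(["0/0/0/1", "0/0/0/0"])
```

In an octoploid or decaploid sample each genotype string would contain eight or ten alleles, and the same functions apply unchanged.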

3. Discussion

WGR technology may be used to obtain information on genome-wide locus variation. It has been widely used in gene localization, genetic map construction, mutation site identification, and genetic evolution research [9,11]. Genetic diversity plays a crucial role in the evolutionary capacity of species; those with a higher genetic diversity tend to adapt more efficiently to changes in the environment, whereas those with a lower genetic diversity may gradually lose their competitiveness and become less adaptable through long-term natural selection [12].
In the present study, we used genome-wide allele frequencies, the heterozygosity (Ho) of oil tea camellia, and the diversity indices of phenotypic traits to assess genetic diversity. A high heterozygosity is usually associated with a good adaptive capacity and survival potential. The results indicated the presence of a large number of low-frequency alleles in the sample genomes, which suggests high genetic diversity and extensive genetic variation in the oil tea camellia populations. Of the five oil tea camellia populations sampled, population c had the largest Ho value, indicating a higher genetic diversity in this population compared with the others. Populations a and b were more genetically diverse than d and e. The high genetic diversity observed in oil tea camellia populations is consistent with the findings of Ye et al. [13], who highlighted significant genetic diversity across various Camellia populations, partly attributing this diversity to environmental adaptation and ploidy-level variation. This rich genetic diversity provides a foundation for breeding programs targeting traits such as ecological adaptation and oil yield, as well as for genetic modification and biodiversity conservation of the species.
In addition to its role in breeding and adaptation, maintaining a high genetic diversity is crucial for preserving the beneficial bioactive compounds present in oil tea camellia, which are valuable for medicinal and commercial applications. Teixeira and Sousa [14] emphasized the diverse biological activities of Camellia species, including their antioxidant, anti-inflammatory, and anti-cancer properties, which are linked to the genetic variability of these plants. Li et al. [15] further elaborated on the therapeutic benefits of camellia oil, highlighting its positive effects on cardiovascular health due to its rich unsaturated fatty acid content. Therefore, the diversity observed in our study may help in enhancing these health-promoting compounds, and conservation efforts should focus on maintaining this genetic diversity to maximize both the medicinal and economic value of oil tea camellia.
The above findings not only enhance our understanding of the oil tea camellia genome, but also provide a basis for the conservation of its germplasm resources and genetic modification. However, it is important to consider ethical issues related to genetic modification. While our study provides a foundation for potential genetic modifications, any practical applications should undergo careful evaluation regarding their ecological impacts and ethical implications. Future studies may further explore how ploidy level and environmental factors specifically affect the genetic structure and gene expression of oil tea camellia and how these differences influence its survival and reproduction strategies and its evolutionary direction. However, some of the observed differences could not be explained in this study because of the large number of chromosome sets carried by the octoploid and decaploid oil tea camellia samples. The analysis was also affected by the presence of a large number of repetitive DNA sequences, high heterozygosity, genome complexity, and detection errors in the sequencing methods and technology. In the future, improvements in sequencing methodology and technology are needed to further reduce detection errors.
This WGR study has increased our understanding of the genetic diversity of oil tea camellia through variation detection and population genetic analysis. Our findings provide strong support for the discovery of high-quality alleles and the genetic analysis of important agronomic traits. Additionally, the resequencing data may be used to identify various molecular markers, construct high-density genetic maps, and accelerate the molecular breeding process of oil tea camellia. However, important issues remain to be solved in the WGR of oil tea camellia, including the relatively high sequencing cost, the huge amount of genome data, the numerous repeated sequences, and the extremely complex homologous and heterologous structures of the various subgenomes of oil tea camellia. Data analysis and bioinformatics processing also pose challenges, and the validation of and functional research into the various sites must be strengthened. With the increasing maturity of sequencing technology and the continuous improvement of bioinformatics analysis, more breakthroughs can be expected in future research on the WGR of oil tea camellia.

4. Materials and Methods

4.1. Sample Collection

The samples collected in this study were selected from live oil tea camellia tree clusters over 50 years of age in Hainan Province and inland China (Guangdong and Guangxi provinces). When sampling, 5–6 fresh young leaves (preferably with newly grown red young leaves at the top of the oil tea camellia plant) were selected from each plant, placed into a sealed bag, and numbered. Next, the sealed bags were placed in dry ice for storage and used to determine the ploidy level through flow cytometry and WGR analysis.

4.2. Determination of Sample Ploidy Levels and Genomic DNA Extraction

Ploidy levels were determined through flow cytometry [16], and genomic DNA was extracted using the cetyltrimethylammonium bromide (CTAB) method [17,18]. The calculation formula was as follows: Genome size of the test sample = control genome size × fluorescence intensity of the test sample ÷ control fluorescence intensity [19,20].
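The formula above can be written as a short Python function. This is only a sketch of the arithmetic; the function name and the example values (a 3.0 Gb control with fluorescence intensities of 100 and 400) are hypothetical and are not measurements from this study.

```python
def estimate_genome_size(control_genome_size, sample_fluorescence, control_fluorescence):
    """Genome size of the test sample, in the same unit as the control genome size.

    Implements: control genome size x sample fluorescence / control fluorescence.
    """
    if control_fluorescence <= 0:
        raise ValueError("control fluorescence intensity must be positive")
    return control_genome_size * sample_fluorescence / control_fluorescence


# Hypothetical example: a sample peak four times as bright as the control
# implies a genome four times the size of the control genome.
size = estimate_genome_size(3.0, 400.0, 100.0)
```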

4.3. Resequencing Data Acquisition and Processing

4.3.1. WGR Library Construction

The extracted DNA samples were sent to Wuhan Benagen Technology Co., Ltd., located in Wuhan City, Hubei Province, China, for sequencing to obtain raw WGR data. The sequencing platform used was BGI (BGI-Shenzhen, China), the sequencing library used was DNBSEQ, and the sequencing method was paired-end sequencing.

4.3.2. Raw Data Quality Control (QC)

The initial fluorescence image files acquired from the sequencing platform were converted into raw data. These data were subject to QC through filtering to remove adapter sequences, low-quality bases, and unrecognizable nucleotides (N). The Fastp (version 0.20.0, developed by OpenGene, Shenzhen, China) software [21] was used for processing to remove low-quality sequences. Through filtering, higher-quality data (i.e., clean data) was obtained and statistically analyzed in terms of the number of reads, Q20 and Q30 quality scores, and GC content.
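The summary statistics named above (Q20, Q30, and GC content) are reported by Fastp itself; the Python sketch below only illustrates what they measure. It assumes Phred+33 quality encoding and a simple four-line FASTQ layout, and the function names are hypothetical.

```python
def read_fastq(lines):
    """Yield (sequence, quality) pairs from an iterable of FASTQ lines."""
    iterator = iter(lines)
    for header in iterator:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        sequence = next(iterator).strip()
        next(iterator)  # the '+' separator line
        quality = next(iterator).strip()
        yield sequence, quality


def fastq_quality_summary(records, phred_offset=33):
    """Percentage of bases at or above Q20 and Q30, plus GC percentage."""
    total = q20 = q30 = gc = 0
    for sequence, quality in records:
        for base, symbol in zip(sequence, quality):
            score = ord(symbol) - phred_offset
            total += 1
            q20 += score >= 20
            q30 += score >= 30
            gc += base in "GCgc"
    if total == 0:
        raise ValueError("no bases found")
    return {
        "bases": total,
        "q20_pct": 100 * q20 / total,
        "q30_pct": 100 * q30 / total,
        "gc_pct": 100 * gc / total,
    }


# Hypothetical record: quality symbols I, 5, +, ! decode to Q40, Q20, Q10, Q0.
example = ["@read1\n", "GATC\n", "+\n", "I5+!\n"]
summary = fastq_quality_summary(read_fastq(example))
```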

4.3.3. Alignment of Sequencing Data to a Reference Genome

Clean reads were aligned to the reference genome sequence using BWA (version 0.7.17, developed by Heng Li, Broad Institute, Cambridge, MA, USA) [22], SAMTOOLS (version 1.6, originally developed by the Sanger Institute, Hinxton, Cambridgeshire, UK) [23], and GATK (version 4.3.3.0, developed by the Broad Institute, Cambridge, MA, USA) [24] software. The reference genome was the diploid oil tea camellia sequence published in the NCBI database (https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_022316695.1, accessed on 1 July 2023) [10]. Statistical analysis was performed after obtaining the alignment information.

4.3.4. Variation Detection and Filtering

Polymerase chain reaction (PCR) duplicates in the alignment files were marked using the MarkDuplicates tool in the GATK software. These files were indexed using the index tool in the SAMTOOLS software. Variation was detected using the GATK software [24,25].
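The alignment, duplicate marking, indexing, and variant detection steps can be sketched as a list of shell commands assembled in Python. The text names the tools but not the exact commands, so everything beyond the tool names is an assumption: the use of `bwa mem`, the read-group string, the choice of `HaplotypeCaller` with `--sample-ploidy`, and all file names are illustrative only. The function `build_pipeline` is hypothetical and merely builds the command strings without running them.

```python
def build_pipeline(sample, reference, fastq1, fastq2, threads=8, ploidy=8):
    """Return shell command strings for one sample (assumed workflow, not run here)."""
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"  # literal \t, expanded by bwa
    return [
        # Align paired-end reads and sort the alignments by coordinate.
        f"bwa mem -t {threads} -R '{read_group}' {reference} {fastq1} {fastq2} "
        f"| samtools sort -@ {threads} -o {sorted_bam} -",
        # Mark PCR duplicates.
        f"gatk MarkDuplicates -I {sorted_bam} -O {dedup_bam} -M {sample}.dup_metrics.txt",
        # Index the duplicate-marked BAM file.
        f"samtools index {dedup_bam}",
        # Call variants; the ploidy setting is an assumption for polyploid samples.
        f"gatk HaplotypeCaller -R {reference} -I {dedup_bam} "
        f"-O {sample}.g.vcf.gz -ERC GVCF --sample-ploidy {ploidy}",
    ]


# Hypothetical decaploid sample and file names.
commands = build_pipeline("a01", "reference.fa", "a01_1.fq.gz", "a01_2.fq.gz",
                          threads=4, ploidy=10)
```

Keeping the commands as data makes it easy to print them for review before passing them to a shell or a workflow manager.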

4.3.5. Variation Annotations

The SnpEff (version 5.2, developed by Pablo Cingolani, La Jolla, CA, USA) software [26,27] and the gff annotation file of the oil tea camellia reference genome (https://github.com/Hengfu-Yin/CON_genome_data, accessed on 1 April 2024) were used to annotate variations in the SNP and InDel sites in the filtered VCF file. The locations and types of variations were also obtained.

5. Conclusions

In this study, through WGR of 49 tropical oil tea camellia germplasm resources, 247,867,204 SNPs and 25,687,017 InDel variation sites were obtained with 96.60% alignment with the reference genome. After filtering out low-quality reads and excluding heterozygous and missing data, we obtained 239,441,603 high-quality SNPs and 23,510,374 high-quality InDels. A variation analysis of 49 oil tea camellia samples using these SNP and InDel loci, which cover the whole genome, revealed that there were significant differences in the distribution of SNPs and InDels in various functional regions, with a higher distribution in the intergenic and upstream regions. This suggests that there is greater evolutionary variability in these regions, and the SNP and InDel variations in the non-coding regions may have a significant impact on oil tea camellia gene expression regulation and genetic diversity. In contrast, variations in the genomic sequences of the tested samples were low in the coding regions. This may have occurred because the variation in coding regions tends to cause deleterious mutations, which are eliminated by strong selection during evolution. Based on our analysis of base mutations at SNP sites, we found that transitions were the predominant type of SNP variation in the genome of this sample. This finding aligns with the typical mutation pattern in most species, where transitions are more likely to occur and be retained than transversions due to complementary base pairing constraints.
The results of this study provide valuable insights into the genetic diversity and variation patterns of oil tea camellia, particularly in tropical populations. The high-quality SNP and InDel datasets generated here offer a valuable resource for future research, including the identification of key genetic markers for breeding programs aimed at improving ecological adaptability, oil yield, and resistance to environmental stresses. Furthermore, our findings highlight the importance of non-coding regions in maintaining genetic diversity, suggesting that future studies should focus on the functional impact of variations in these regions, especially their role in gene regulation and adaptation mechanisms. Advances in sequencing technologies and bioinformatics approaches will be essential for further reducing detection errors and enhancing the understanding of the complex genomic architecture of oil tea camellia. Such efforts will contribute significantly to the conservation and genetic improvement of this economically important species.

Author Contributions

Conceptualization, H.L., J.S., X.Z., B.L. and F.C.; methodology, J.S., X.Z., J.L. and H.H.; software, X.Z. and S.Z.; validation, J.W. and B.L.; formal analysis, J.S., X.Z. and S.Z.; investigation, J.S., X.Z., W.W. and X.H.; resources, H.L., J.S. and B.L.; data curation, X.Z. and S.Z.; writing—original draft preparation, J.S. and X.Z.; writing—review and editing, B.L. and H.L.; visualization, X.Z. and S.Z.; supervision, H.L. and F.C.; project administration, H.L. and D.H.; funding acquisition, H.L. and F.C. J.S., X.Z. and B.L. have contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (2023YFD2200702), 2023 Hainan Provincial Scientific Research Institute Project (KYYSLK2023-010), Hainan Province Key R&D Project (ZDYF2023XDNY055), and Hainan Province High-tech Industry Development Special Project (KJCGZH017).

Data Availability Statement

Data are available on request from the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhuang, R. Chinese Oil-Tea Camellia; China Forestry Press: Beijing, China, 1988; ISBN 7-5033-0159-X.
  2. Lü, S. The current situation of oil-tea camellia resource utilization and development countermeasures in Luoshan County. Henan For. Sci. Technol. 2008, 2, 92–93.
  3. Jia, X.C.; Liu, Y.J.; Xu, Y.F.; Yu, Z.Y. Metabolomic analysis of mature seeds of Hainan oil-tea (Camellia sinensis) and common oil-tea (Camellia sinensis). Mol. Plant Breed. 2022, 20, 8255–8263.
  4. Wu, F.Y.; Cai, Y.; Hao, B.Q.; Jia, Y.X.; Ye, H.; Zhang, Z.Y.; Ma, J.L. Establishment and application of flow cytometry method for genome size determination of Camellia sinensis and Camellia vietnamensis. J. Trop. Crop 2023, 44, 1542–1550.
  5. Yao, S.H.; Wang, K.L.; Luo, X.F.; Ren, H.D.; Gong, B.C. In China’s oil-tea Camellia Resources and Technology: Current Situation and Countermeasures for the Development of Industrialization. In Proceedings of the Third Annual Conference of the Chinese Society of Cereals and Oils, Yantai, China, 1 September 2004.
  6. Shu, Q.; Zhang, F. Cultivation and Pest Control of Oil-Tea Camellia in China; China Forestry Press: Beijing, China, 2009; ISBN 978-7-5038-5592-4.
  7. Cao, T.Y. Naming-Function-History: A primary exploration of hainan oil-tea camellia industry culture. World Trop. Agric. Inf. 2018, 12, 9–13.
  8. Zheng, D.; Pan, X.; Xie, L.; Zeng, J.; Wu, Y.; Zhang, Z.; Ji, Q. Survey and analysis of oil-tea camellia resources in Hainan. J. Northwest For. Coll. 2016, 31, 130–135.
  9. Li, G.Z.; Deng, W.D. Research progress regarding genome sequencing technology and its application. Anhui Agric. Sci. 2018, 46, 20–22.
  10. Lin, P.; Wang, K.; Wang, Y.; Hu, Z.; Yan, C.; Huang, H.; Ma, X.; Cao, Y.; Long, W.; Liu, W.; et al. The genome of oil-Camellia and population genomics analysis provide insights into seed oil domestication. Genome Biol. 2022, 23, 14.
  11. Lynch, M.; Conery, J.; Burger, R. Mutational meltdowns in sexual populations. Evolution 1995, 49, 1067–1080.
  12. Glemin, S.; Francois, C.M.; Galtier, N. Genome evolution in outcrossing vs. selfing vs. asexual species. Methods Mol. Biol. 2019, 1910, 331–369.
  13. Ye, C.; He, Z.; Peng, J.; Wang, R.; Wang, X.; Fu, M.; Zhang, Y.; Wang, A.; Liu, Z.; Jia, G.; et al. Genomic and genetic advances of oiltea-camellia (Camellia oleifera). Front. Plant Sci. 2023, 14, 1101766.
  14. Teixeira, A.M.; Sousa, C. A Review on the Biological Activity of Camellia Species. Molecules 2021, 26, 2178.
  15. Li, Z.; Liu, A.; Du, Q.; Zhu, W.; Liu, H.; Naeem, A.; Guan, Y.; Chen, L.; Ming, L. Bioactive substances and therapeutic potential of camellia oil: An overview. Food Biosci. 2022, 49, 101855.
  16. Leng, Q.Y.; Lu, J.P.; Huang, S.H.; Xu, S.S.; Li, H.Y.; Niu, J.H.; Yin, J.M. Ploidy level identification of the hybrid progeny of Orchidaceae based on flow cytometry. J. Trop. Crop 2023, 44, 2219–2226.
  17. Murray, M.G.; Thompson, W.F. Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res. 1980, 8, 4321–4325.
  18. Ye, T.W.; Yuan, D.Y.; Li, Y.M.; Xiao, S.X.; Gong, S.F.; Zhang, J.; Li, S.F.; Luo, J. Ploidy level identification of Hainan oil-tea camellia. For. Sci. 2021, 57, 61–69.
  19. Li, C.; Li, X.; Huang, Z.; Lu, J.; Li, Q.; Huang, C.; Bu, C. Identification of genome size and chromosome ploidy in jasmine using flow cytometry. J. Trop. Crop 2021, 42, 1231–1236.
  20. Li, J.; Zhou, P.; Zhang, Q.; Zhang, M. Determination of the genome size of North American hollyhock based on flow cytometry. Chin. Wild Plant Resour. 2023, 42, 29–34.
  21. Chen, S.; Zhou, Y.; Chen, Y.; Gu, J. Fastp: An ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 2018, 34, i884–i890.
  22. Durand, N.C.; Shamim, M.S.; Machol, I.; Rao, S.S.; Huntley, M.H.; Lander, E.S.; Aiden, E.L. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 2016, 3, 95–98.
  23. Li, H.; Handsaker, B.; Wysoker, A.; Fennell, T.; Ruan, J.; Homer, N.; Marth, G.; Abecasis, G.; Durbin, R. 1000 Genome Project Data Processing Subgroup. The sequence alignment/map format and SAMtools. Bioinformatics 2009, 25, 2078–2079. [Google Scholar] [CrossRef]
  24. McKenna, A.; Hanna, M.; Banks, E.; Sivachenko, A.; Cibulskis, K.; Kernytsky, A.; Garimella, K.; Altshuler, D.; Gabriel, S.; Daly, M.; et al. The genome analysis toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010, 20, 1297–1303. [Google Scholar] [CrossRef] [PubMed]
  25. Lefouili, M.; Nam, K. The evaluation of Bcftools mpileup and GATK HaplotypeCaller for variant calling in non-human species. Sci. Rep. 2022, 12, 11331. [Google Scholar] [CrossRef] [PubMed]
  26. Cingolani, P. Variant annotation and functional prediction: SnpEff. Methods Mol. Biol. 2022, 2493, 289–314. [Google Scholar] [PubMed]
  27. Cingolani, P.; Platts, A.; Wang, L.L.; Coon, M.; Nguyen, T.; Wang, L.; Land, S.J.; Lu, X.; Ruden, D.M. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 2012, 6, 80–92. [Google Scholar] [CrossRef]
Figure 1. Map of the geographic locations of the oil tea camellia sample collection sites.
Figure 2. Histograms for the results of the flow cytometry tests on the oil tea camellia samples. Note: (k): Camellia chekiangoleosa Hu. (control group), (a): Zhongjiu No.1, (b): Zharong No.7, (c): Longguangzai No.1, (d): Yuetang No.3, and (e): Shanzhugen No.8.
Figure 3. Histogram of the distribution of SNPs and InDels on the 15 chromosomes.
Figure 4. Histogram of the distribution of variation regions for SNPs and InDels.
Figure 5. Pie chart (top) and circle chart (bottom) of the number and proportion of transition and transversion mutations.
Figure 6. Histogram of the distribution of SNP allele frequency percentages (top) and counts (bottom).
Figure 7. Histogram of the observed heterozygosity (Ho) of the five oil tea camellia populations.
Table 1. Oil tea camellia sample collection information.
| Population ID | Sampling Location | Longitude (E) | Latitude (N) | Altitude (m) | Age of Central Plant | Soil Type | Habitat | Mean Ground Diameter/cm | Maximum Ground Diameter/cm | Number of Samples |
|---|---|---|---|---|---|---|---|---|---|---|
| a | Zhongjiu Village, Qionghai City, Hainan Province | 110°18′ | 19°05′ | 825 | 90 years | Laterite | Shrub forest | 25 | 60 | 4 |
| b | Zharong Village, Wuzhishan City, Hainan Province | 109°27′ | 18°41′ | 1090 | Over 100 years | Lateritic red earth | Tropical rainforest | 20 | 58 | 10 |
| c | Longguangzai Village, Haikou City, Hainan Province | 110°14′ | 19°43′ | 158.3 | 60 years | Laterite | Farm | 20 | 42 | 10 |
| d | Yuetang Village, Zhanjiang City, Guangdong Province | 110°18′ | 20°18′ | 38.85 | 60 years | Torrid red soil | Field | 15 | 45 | 13 |
| e | Shanzhugen Village, Yulin City, Guangxi Zhuang Autonomous Region | 110°10′ | 22°19′ | 220.7 | 50 years | Lateritic red earth | Shrub forest | 15 | 42 | 12 |
| Total | | | | | | | | | | 49 |
Table 2. Flow cytometry results of the oil tea camellia samples.
| Sample Population | Sample Name | Peak Value | Coefficient of Variation | Genome Size/Gb | Ploidy |
|---|---|---|---|---|---|
| k | Camellia chekiangoleosa Hu. (Control Group) | 6098 | 3.43 | 2.73 | 2X |
| a | Zhongjiu No.1 | 31,398 | 3.88 | 13.1586 | 10X |
| b | Zharong No.7 | 31,100 | 3.01 | 13.0221 | 10X |
| c | Longguangzai No.1 | 21,998 | 4.78 | 10.2482 | 8X |
| d | Yuetang No.3 | 23,318 | 4.13 | 10.5764 | 8X |
| e | Shanzhugen No.8 | 25,588 | 3.37 | 10.7289 | 8X |
Table 3. Sequencing data quality.
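| No. | Q20 (%) | Q30 (%) | GC Content (%) | No. | Q20 (%) | Q30 (%) | GC Content (%) |
|---|---|---|---|---|---|---|---|
| ZJ01 | 97.85 | 93.15 | 37.53 | YT02 | 96.12 | 87.93 | 37.34 |
| ZJ02 | 96.98 | 90.22 | 37.56 | YT03 | 97.45 | 92.00 | 37.34 |
| ZJ03 | 97.89 | 93.26 | 37.48 | YT04 | 97.57 | 91.75 | 36.37 |
| ZJ05 | 97.87 | 93.17 | 37.82 | YT06 | 96.54 | 89.21 | 36.89 |
| ZR01 | 97.77 | 93.90 | 42.78 | YT07 | 97.42 | 91.93 | 37.19 |
| ZR02 | 98.31 | 95.35 | 42.16 | YT08 | 97.31 | 91.61 | 37.15 |
| ZR03 | 97.76 | 93.95 | 40.70 | YT09 | 96.84 | 89.88 | 36.90 |
| ZR04 | 98.28 | 95.28 | 41.94 | YT10 | 96.81 | 90.14 | 36.86 |
| ZR05 | 98.25 | 95.17 | 41.75 | YT11 | 96.60 | 89.39 | 36.88 |
| ZR06 | 98.37 | 95.58 | 39.90 | YT12 | 96.61 | 89.45 | 36.98 |
| ZR07 | 98.02 | 94.58 | 41.59 | YT14 | 97.46 | 92.06 | 36.95 |
| ZR08 | 97.84 | 94.14 | 42.03 | YT15 | 97.07 | 90.82 | 37.15 |
| ZR09 | 98.25 | 95.27 | 45.78 | SZG01 | 96.82 | 89.90 | 37.47 |
| ZR10 | 98.19 | 95.09 | 40.61 | SZG02 | 96.58 | 89.62 | 37.86 |
| LGZ01 | 98.32 | 95.52 | 39.61 | SZG03 | 97.15 | 91.31 | 38.01 |
| LGZ02 | 97.90 | 94.44 | 39.36 | SZG04 | 96.77 | 90.19 | 37.86 |
| LGZ03 | 98.12 | 95.02 | 39.38 | SZG05 | 96.32 | 88.41 | 36.91 |
| LGZ04 | 97.99 | 95.30 | 38.71 | SZG06 | 96.15 | 87.75 | 38.62 |
| LGZ05 | 97.91 | 94.64 | 39.25 | SZG07 | 97.24 | 91.09 | 38.79 |
| LGZ06 | 98.50 | 95.42 | 39.19 | SZG08 | 98.62 | 94.81 | 37.75 |
| LGZ07 | 98.12 | 94.97 | 39.46 | SZG09 | 96.68 | 89.19 | 38.67 |
| LGZ08 | 97.99 | 95.32 | 39.01 | SZG10 | 98.37 | 93.83 | 37.70 |
| LGZ09 | 98.40 | 95.68 | 39.36 | SZG11 | 97.20 | 90.56 | 38.89 |
| LGZ10 | 98.41 | 95.10 | 38.95 | SZG12 | 97.16 | 90.35 | 37.39 |
| YT01 | 96.62 | 89.44 | 37.24 | Mean | 97.53 | 92.49 | 38.76 |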
Table 4. Alignment data statistics.
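| No. | Alignment Rate (%) | Correct Alignment Rate (%) | No. | Alignment Rate (%) | Correct Alignment Rate (%) |
|---|---|---|---|---|---|
| ZJ01 | 99.31 | 66.47 | YT02 | 98.56 | 76.47 |
| ZJ02 | 92.96 | 70.71 | YT03 | 99.22 | 73.33 |
| ZJ03 | 97.82 | 75.90 | YT04 | 99.21 | 75.62 |
| ZJ05 | 96.96 | 72.65 | YT06 | 98.89 | 76.32 |
| ZR01 | 94.02 | 75.29 | YT07 | 99.11 | 77.40 |
| ZR02 | 95.80 | 79.89 | YT08 | 99.15 | 73.94 |
| ZR03 | 96.13 | 78.90 | YT09 | 98.96 | 77.05 |
| ZR04 | 97.92 | 81.87 | YT10 | 99.26 | 72.85 |
| ZR05 | 97.16 | 81.18 | YT11 | 99.18 | 74.81 |
| ZR06 | 95.28 | 82.11 | YT12 | 98.94 | 75.04 |
| ZR07 | 94.62 | 77.91 | YT14 | 99.43 | 75.41 |
| ZR08 | 81.89 | 67.03 | YT15 | 98.78 | 74.90 |
| ZR09 | 71.97 | 61.87 | SZG01 | 96.81 | 73.83 |
| ZR10 | 91.21 | 76.66 | SZG02 | 98.32 | 73.58 |
| LGZ01 | 99.44 | 85.61 | SZG03 | 91.60 | 69.05 |
| LGZ02 | 99.15 | 82.42 | SZG04 | 97.94 | 73.66 |
| LGZ03 | 99.16 | 82.60 | SZG05 | 96.13 | 70.47 |
| LGZ04 | 99.03 | 81.35 | SZG06 | 98.81 | 74.68 |
| LGZ05 | 99.35 | 82.66 | SZG07 | 97.34 | 78.85 |
| LGZ06 | 99.34 | 81.68 | SZG08 | 97.77 | 75.49 |
| LGZ07 | 99.22 | 83.24 | SZG09 | 93.88 | 70.38 |
| LGZ08 | 99.05 | 83.10 | SZG10 | 97.24 | 74.43 |
| LGZ09 | 99.14 | 84.24 | SZG11 | 88.55 | 64.49 |
| LGZ10 | 98.82 | 80.85 | SZG12 | 97.19 | 72.64 |
| YT01 | 98.52 | 78.23 | Mean | 96.60 | 76.10 |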
Table 5. Distribution of SNPs and InDels on 15 chromosomes.
| Chromosome | SNPs | InDels | Chromosome | SNPs | InDels |
|---|---|---|---|---|---|
| 1 | 20,518,846 | 2,030,755 | 9 | 10,568,505 | 1,025,902 |
| 2 | 18,765,410 | 1,850,573 | 10 | 19,286,835 | 1,885,013 |
| 3 | 18,863,375 | 1,904,873 | 11 | 15,649,862 | 1,506,655 |
| 4 | 16,548,516 | 1,620,970 | 12 | 17,933,417 | 1,761,125 |
| 5 | 18,279,754 | 1,814,329 | 13 | 15,543,295 | 1,482,437 |
| 6 | 13,300,085 | 1,329,549 | 14 | 12,550,212 | 1,250,940 |
| 7 | 17,848,819 | 1,729,788 | 15 | 12,726,230 | 1,228,466 |
| 8 | 11,058,442 | 1,088,999 | Total | 239,441,603 | 23,510,374 |
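The Total row of Table 5 can be cross-checked against the per-chromosome counts. The sketch below is only an illustrative consistency check, not part of the authors' pipeline; the list names `snps` and `indels` are hypothetical, and the values are transcribed from the table for chromosomes 1 to 15.

```python
# Per-chromosome counts transcribed from Table 5 (chromosomes 1-15, in order).
# Variable names are illustrative and do not come from the article.
snps = [
    20_518_846, 18_765_410, 18_863_375, 16_548_516, 18_279_754,
    13_300_085, 17_848_819, 11_058_442, 10_568_505, 19_286_835,
    15_649_862, 17_933_417, 15_543_295, 12_550_212, 12_726_230,
]
indels = [
    2_030_755, 1_850_573, 1_904_873, 1_620_970, 1_814_329,
    1_329_549, 1_729_788, 1_088_999, 1_025_902, 1_885_013,
    1_506_655, 1_761_125, 1_482_437, 1_250_940, 1_228_466,
]

# Sum each column and compare with the Total row reported in the table.
snp_total = sum(snps)
indel_total = sum(indels)

# Ratio of SNP sites to InDel sites across the 15 chromosomes.
snp_to_indel = snp_total / indel_total

print(f"SNPs: {snp_total:,}  InDels: {indel_total:,}  ratio: {snp_to_indel:.2f}")
```

Both sums reproduce the reported totals, which also match the counts given in the abstract.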
Table 6. Distribution of SNPs and InDels in each genomic region.
| Type (Alphabetical Order) | SNP Count | SNP Percent | InDel Count | InDel Percent |
|---|---|---|---|---|
| DOWNSTREAM | 19,496,920 | 6.922% | 2,540,421 | 8.687% |
| EXON | 4,410,824 | 1.566% | 233,871 | 0.8% |
| INTERGENIC | 226,919,623 | 80.561% | 21,897,761 | 74.88% |
| INTRON | 10,261,566 | 3.643% | 1,686,222 | 5.766% |
| SPLICE_SITE_ACCEPTOR | 19,329 | 0.007% | 4362 | 0.015% |
| SPLICE_SITE_DONOR | 15,664 | 0.006% | 2720 | 0.009% |
| SPLICE_SITE_REGION | 176,089 | 0.063% | 25,270 | 0.086% |
| UPSTREAM | 19,697,409 | 6.993% | 2,719,777 | 9.3% |
| UTR_3_PRIME | 397,423 | 0.141% | 70,031 | 0.239% |
| UTR_5_PRIME | 280,884 | 0.1% | 59,586 | 0.204% |
| Missense Variant | 2,594,175 | 0.92% | 167,636 | 0.57% |
| Genome Total Length | 2,891,061,056 | | | |
| Genome Effective Length | 2,640,572,740 | | | |
| Variant Rate | 1 variant every 11 bases | | 1 variant every 112 bases | |
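The Variant Rate row appears to be the genome effective length divided by the number of variant sites. This is an assumption inferred from the numbers, since the table does not state the formula; the function name `bases_per_variant` is hypothetical. A minimal sketch using the effective length above and the SNP and InDel totals from Table 5:

```python
# Values transcribed from Table 6 and from the Total row of Table 5.
GENOME_TOTAL_LENGTH = 2_891_061_056
GENOME_EFFECTIVE_LENGTH = 2_640_572_740
SNP_SITES = 239_441_603
INDEL_SITES = 23_510_374


def bases_per_variant(length: int, n_variants: int) -> int:
    """Average spacing between variants, in whole bases (assumed definition)."""
    if n_variants <= 0:
        raise ValueError("n_variants must be positive")
    return length // n_variants


snp_rate = bases_per_variant(GENOME_EFFECTIVE_LENGTH, SNP_SITES)
indel_rate = bases_per_variant(GENOME_EFFECTIVE_LENGTH, INDEL_SITES)

# For comparison only: the same calculation with the total genome length.
snp_rate_total_length = bases_per_variant(GENOME_TOTAL_LENGTH, SNP_SITES)

print(f"1 SNP every {snp_rate} bases, 1 InDel every {indel_rate} bases")
```

With the effective length the result agrees with the reported 11 and 112 bases, whereas the total length would give 12 bases per SNP, which suggests the effective length is the denominator used.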
Table 7. SNP base mutations.
| Base Changes | A | G | C | T |
|---|---|---|---|---|
| A | 0 | 29,499,406 | 6,892,135 | 9,304,074 |
| G | 58,391,906 | 0 | 5,606,594 | 10,056,400 |
| C | 9,977,366 | 5,595,200 | 0 | 58,363,346 |
| T | 9,366,922 | 6,880,418 | 29,507,836 | 0 |

| | Transitions (Ts) | Transversions (Tv) |
|---|---|---|
| Total | 1,103,905,690 | 300,594,047 |
| Ts/Tv | 3.672 | |
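Transitions are purine-to-purine (A↔G) or pyrimidine-to-pyrimidine (C↔T) substitutions, and every other substitution is a transversion. The sketch below applies that rule to the base-change matrix in Table 7 and separately recomputes the reported Ts/Tv from the Total row. The names `MATRIX` and `split_ts_tv` are hypothetical, and the classification does not depend on whether rows or columns hold the reference base. The matrix cells sum to the SNP total in Table 5, while the Total row holds larger counts; the table does not say how the Total row was tallied, so the two are kept apart here.

```python
# Base-change matrix transcribed from Table 7; columns follow the order A, G, C, T.
BASES = "AGCT"
MATRIX = {
    "A": [0, 29_499_406, 6_892_135, 9_304_074],
    "G": [58_391_906, 0, 5_606_594, 10_056_400],
    "C": [9_977_366, 5_595_200, 0, 58_363_346],
    "T": [9_366_922, 6_880_418, 29_507_836, 0],
}

# A<->G and C<->T are transitions; all other substitutions are transversions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def split_ts_tv(matrix):
    """Return (transition_count, transversion_count) for a 4x4 base-change matrix."""
    ts = tv = 0
    for row_base, row in matrix.items():
        for col_base, count in zip(BASES, row):
            if row_base == col_base:
                continue  # the diagonal is not a substitution
            if (row_base, col_base) in TRANSITIONS:
                ts += count
            else:
                tv += count
    return ts, tv


matrix_ts, matrix_tv = split_ts_tv(MATRIX)

# Ts/Tv recomputed from the Total row of Table 7.
reported_ratio = round(1_103_905_690 / 300_594_047, 3)

print(matrix_ts, matrix_tv, reported_ratio)
```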
Table 8. Number and proportion of SNPs and InDels at different impact levels.
| Type (Alphabetical Order) | SNP Quantity | SNP Ratio | InDel Quantity | InDel Ratio |
|---|---|---|---|---|
| HIGH | 176,782 | 0.063% | 177,132 | 0.606% |
| LOW | 1,891,522 | 0.672% | 25,270 | 0.086% |
| MODERATE | 2,594,175 | 0.921% | 63,552 | 0.217% |
| MODIFIER | 277,013,252 | 98.345% | 28,977,751 | 99.091% |
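The ratios in Table 8 can be reproduced by dividing each impact count by the sum of the four counts in its column. This is a sketch under that assumption; the dictionary names and `impact_ratios` are hypothetical. Each column sum is larger than the number of variant sites in Table 5, presumably because a single variant can receive more than one annotation, though the table does not say so.

```python
# Annotation counts per impact level, transcribed from Table 8.
snp_impacts = {
    "HIGH": 176_782,
    "LOW": 1_891_522,
    "MODERATE": 2_594_175,
    "MODIFIER": 277_013_252,
}
indel_impacts = {
    "HIGH": 177_132,
    "LOW": 25_270,
    "MODERATE": 63_552,
    "MODIFIER": 28_977_751,
}


def impact_ratios(counts):
    """Percentage of annotations at each impact level, rounded to 3 decimals."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 3) for level, n in counts.items()}


snp_ratios = impact_ratios(snp_impacts)
indel_ratios = impact_ratios(indel_impacts)

for level in snp_impacts:
    print(f"{level:<9} SNP {snp_ratios[level]:>7.3f}%  InDel {indel_ratios[level]:>7.3f}%")
```

The SNP column sums to 281,675,731 annotations, the same total obtained by adding the ten region rows of Table 6 (DOWNSTREAM through UTR_5_PRIME).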
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Song, J., Zhao, X., Lin, B., Zhang, S., Lai, H., Chen, F., Huang, D., Liu, J., Hu, H., Wang, J., Wu, W., & Huang, X. (2024). Single Nucleotide Polymorphisms and Insertion/Deletion Variation Analysis of Octoploid and Decaploid Tropical Oil Tea Camellia Populations Based on Whole-Genome Resequencing. Plants, 13(21), 2955. https://doi.org/10.3390/plants13212955
