Article

Genetically Distinct Rice Lines for Specific Characters as Revealed by Gene-Associated Average Pairwise Dissimilarity

Plant Gene Resources of Canada, Saskatoon Research and Development Centre, Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK S7N 0X2, Canada
Crops 2024, 4(4), 636-650; https://doi.org/10.3390/crops4040044
Submission received: 12 October 2024 / Revised: 6 November 2024 / Accepted: 18 November 2024 / Published: 28 November 2024

Abstract

Broadening the genetic base of an elite breeding gene pool is one important goal in a successful long-term plant breeding program. This goal is largely achieved through the search for and introgression of exotic germplasm with adaptive traits. However, little is known about the genetic backgrounds of acquired exotic germplasm, as germplasm selection is mainly based on trait information. Here, we expanded an average pairwise dissimilarity (APD) analysis to samples with SNP genotypes associated with genes for specific characters of breeding interest. Specifically, we explored a gene-associated APD analysis in a genomic characterization of 2643 rice lines based on their published FASTQ data. Published contigs for cloned genes conditioning heat tolerance, cold tolerance, fertility, and seed size were downloaded as gene reference sequences for SNP calling; SNP calls based on the rice reference genome and published indels were used for comparison. In total, eight SNP or indel data sets were formed for each of three sample groups (All2643, Indica1789, and Japonica854), and APD was estimated for each of the 24 data sets. For each sample group, four novel sets of the 25 most genetically distinct rice lines, one for each assayed character, were identified. Further analyses of APD estimates also revealed some interesting APD properties. The four contig-based SNP data sets for the four specific characters displayed similar APD frequency distributions and high positive correlations of APD estimates. Contig-based APD estimates were negatively correlated with genome-based APD estimates and nearly uncorrelated with indel-based APD estimates. These findings are significant for plant germplasm characterization and germplasm utilization in plant breeding.

1. Introduction

A successful plant breeding program is highly dependent on the availability of genetically diverse germplasm in an elite breeding gene pool that is developed for effective genetic improvements of specific characters, such as yield, growth, disease resistance, and abiotic resistance [1,2]. Thus, broadening the genetic base of an elite breeding gene pool is one important goal in plant breeding [3]. Many approaches have been applied to widen the elite breeding gene pool [4], including the search for exotic germplasm such as landraces or crop wild relatives with adaptive traits such as disease resistance for introgression (e.g., [5]) and pre-breeding to explore adapted germplasm from genebanks (e.g., [6,7]). These breeding efforts can expand the elite breeding gene pool with the acquisition of valuable exotic germplasm [8,9], particularly through genomic selection based on the traits of agronomic interest [10]. However, the exotic germplasm selection is based more on the information of traits valued for agricultural adaptation but less on the germplasm genetic backgrounds. Thus, the genetic base of those selected exotic germplasm is uncertain [6], and little is known about the extent of improvement in genetic base from exotic germplasm addition for the elite breeding gene pool [11].
Marker-based average pairwise dissimilarity (APD) is a measure of genetic differences among plant samples that can allow for a relative assessment of genetic distinctness and genetic redundancy among assayed samples. This method was first developed in 2006 to facilitate plant germplasm characterization and utilization [12]. The APD estimation is based on a set of marker genotype data, calculates the pairwise genotypic dissimilarity following the simple matching coefficient of Sokal and Michener [13], and averages all the pairwise dissimilarity values of a sample against the remaining assayed samples. A sample with a higher APD value is meant to be more genetically distinct than the other samples with lower APD values. The method has been well cited in the scientific literature, but interestingly, it has not been employed as widely as expected to assess genetic distinctness and redundancy [14].
The genomic characterization of plant germplasm conserved in genebanks has become more feasible than before (e.g., see [15,16,17,18]) because of the technical advances in genomics. Many published genomic SNP data sets on conserved plant germplasm are available (e.g., [16,19]). Recently, a research effort was made following the APD method to assess the genetic distinctness and redundancy of the previously characterized germplasm accessions conserved in five international genebanks [20]. Based on 12 published genomic data sets on germplasm collections of size ranging from 661 to 55,879 accessions with up to 2.4 million SNPs, an APD value was generated for each accession in each data set. The effort not only helped to identify many sets of genetically distinct and redundant germplasm for the five genebanks but also revealed that the APD estimation was more sensitive to the number of SNPs, minor allele frequency, and missing data and less so to the sample size. An effective APD estimation required 5000 to 10,000 genome-wide SNPs. These findings are encouraging for the genetic characterization and categorization of plant germplasm for better germplasm management and utilization.
This study was conducted to expand the original APD analysis [12] to samples with SNP genotypes associated with genes for specific traits of breeding interest, as such APD estimates should theoretically carry useful information on genetic backgrounds at the functional regions of a genome conditioning the traits. Specifically, we explored a gene-associated APD (gaAPD) analysis in a genomic characterization of 2643 rice lines based on their published rice FASTQ data [21]. The gaAPD exploration carried three specific objectives: (1) generate gaAPD estimates for the rice samples based on cloned genes conditioning specific characters of rice heat tolerance, cold tolerance, fertility, and seed size; (2) assess gaAPD properties through comparative analyses to genome-based and indel-based APD estimations; and (3) identify a set of the most genetically distinct rice lines for each specific assayed character. We hope that this gaAPD exploration will demonstrate its usefulness as a supplemental tool for identifying genetically distinct germplasm to enhance the search for exotic germplasm for the widening of elite breeding gene pools.

2. Materials and Methods

2.1. Acquisition of Published Rice Genomic Data

We downloaded the published original FASTQ files of trimmed, filtered sequence reads for 2643 rice accessions [21] from the European Nucleotide Archive (https://www.ebi.ac.uk/ena/browser/view/PRJEB6180; accessed on 23 September 2024). These accessions included 1789 and 854 samples representing indica and japonica groups, respectively, and were extracted from the related inventory file of the 3K rice genome project [21]. To reduce the computational requirements of this study, the FASTQ files of only one randomly selected run per accession were used for genomic and APD analyses. The IRGSP-1.0 genome reference and annotation files were also downloaded from the Rice Annotation Project Database (https://rapdb.dna.affrc.go.jp/download/irgsp1.html; accessed on 23 September 2024). We also acquired contig FASTA files published for cloned genes associated with gene ontologies for rice heat tolerance, cold tolerance, rice fertility, and seed size from the China Rice Data Centre (https://www.ricedata.cn/; accessed on 23 September 2024). Specifically, there were 56 contig FASTA files selected for 60 (out of 158) genes for heat tolerance (TO:0000259) (Table S1); 79 contig FASTA files for 83 (out of 184) genes for cold tolerance (TO:0000303) (Table S2); 23 contig FASTA files for 24 (out of 56) genes for rice fertility (TO:0000420) (Table S3); and 44 contig FASTA files for 48 (out of 111) genes for seed size (TO:0000391) (Table S4). These selected contig FASTA files have sequence lengths of 202 kb or shorter and were published from many gene cloning efforts, as documented in the China Rice Data Centre. Note that these four characters were arbitrarily selected to represent traits of potential breeding interest, such as abiotic stress, growth, and yield. For a comparative analysis, the 3K RG 2.3mio biallelic indel data set in the PLINK bed, bim, and fam formats was downloaded from the Rice SNP-Seek Database (https://snp-seek.irri.org/download.zul; accessed on 26 September 2024).

2.2. Data Processing

Efforts were made to process the acquired genomic data and generate five specific SNP data sets for the whole genome and for genes associated with heat tolerance, cold tolerance, fertility, and seed size for ease of comparative APD analyses. Each SNP data set was generated in five steps. First, SNP calling from short-read sequence data requires a reference sequence. For the whole genome analysis, the IRGSP-1.0 genome reference was used. For the other four analyses, the reference sequence was generated by assembling the selected contig FASTA files and checking ambiguous base calls. For example, the reference sequence for cold tolerance consisted of 79 contig FASTA files (combined without considering the order and length of the selected contigs), followed by base call checks. Second, BAM files were created using BWA (0.7.17-r1188; [22]) and Samtools (v 1.6; [23]), followed by sorting BAM files with Samtools. Third, SNP calling from sorted.bam files was performed using Bcftools (v 1.9; [24]) with the option of excluding indels to generate a SNP genotype VCF file. Fourth, the VCF file was checked using Vcftools (v 0.1.15; [25]) for missing values and minor allele frequency (MAF). For this study, no SNPs with missing values were allowed for contig-based data sets, while up to 20% missing values were permitted for genome-based data sets to ensure comparable numbers of SNPs for comparative APD analyses. The MAF was required to be greater than 0.01 for all data sets and was further restricted to be smaller than 0.495 for contig-based SNPs, because the contig-based SNPs carried invariant homozygotes and/or heterozygotes. The SNPs in the whole-genome VCF file were further divided into those in genic and non-genic (presumably neutral) regions. Thus, two extra VCF files were generated for genome genic SNPs and neutral SNPs, respectively.
The genic SNPs were obtained with multiple steps: (1) IRGSP-1.0_representative_transcript_exon_2024-07-12.gtf was converted into a bed file using convert2bed (v 2.4.36; [26]); (2) the resulting bed file was used with Bcftools intersect function to generate the overlapped SNPs of the whole genome; and (3) further separation of genic SNPs from the overlapped whole genome SNPs was conducted in Microsoft Excel, along with the extraction of non-genic SNPs. Fifth, as APD estimation was performed using the SNPRelate Bioconductor R package [27], which is capable of handling a large genomic SNP data set, each VCF file was converted into GDS format using SNPRelate functions. Extra effort was also made to convert the downloaded indel data in PLINK formats into GDS format using SNPRelate functions. These efforts generated eight data sets for comparative APD analyses: four gene-associated SNP (gaSNP) data sets (gaSNPs for heat tolerance, gaSNPs for cold tolerance, gaSNPs for fertility, gaSNPs for seed size), three genome-wide SNP data sets (genomeSNPs, neutralSNPs, genicSNPs), and one indel data set (indels). As SNP miscalls can occur from short-read genome sequence data, particularly for gaSNPs, allelic frequencies for each SNP or indel data set were evaluated using a custom R script, and the final SNP or indel data sets were revised accordingly.
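The missing-value and MAF thresholds described above can be illustrated with a minimal Python sketch. It is not the Vcftools workflow used in the study: the genotype coding (0, 1, 2 copies of the alternate allele, `None` for a missing call), the function names, and the treatment of the upper MAF bound as inclusive are assumptions made only for illustration.

```python
from typing import List, Optional, Sequence

# Assumed coding (not from the paper): 0, 1, 2 = copies of the alternate
# allele in a diploid genotype; None = missing call.
Genotype = Optional[int]


def missing_rate(snp: Sequence[Genotype]) -> float:
    """Fraction of samples without a genotype call at this SNP."""
    return sum(g is None for g in snp) / len(snp)


def minor_allele_frequency(snp: Sequence[Genotype]) -> float:
    """MAF computed from the called genotypes of one SNP."""
    called = [g for g in snp if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def keep_snp(snp: Sequence[Genotype], max_missing: float,
             min_maf: float = 0.01, max_maf: float = 0.5) -> bool:
    """True if missing rate <= max_missing and min_maf < MAF <= max_maf."""
    if missing_rate(snp) > max_missing:
        return False
    return min_maf < minor_allele_frequency(snp) <= max_maf


def filter_snps(snps: Sequence[Sequence[Genotype]],
                **thresholds: float) -> List[Sequence[Genotype]]:
    """Keep only the SNPs (rows) that pass the thresholds."""
    return [snp for snp in snps if keep_snp(snp, **thresholds)]


# Hypothetical toy SNPs: one variable, one all-heterozygous, one all-homozygous.
toy_snps = [[0, 0, 0, 2], [1, 1, 1, 1], [0, 0, 0, 0]]
contig_like = filter_snps(toy_snps, max_missing=0.0, max_maf=0.495)
```

In this toy case only the first SNP survives the contig-style settings: the all-heterozygous SNP has an MAF of 0.5, which exceeds the 0.495 cap, and the all-homozygous SNP has an MAF of 0. Genome-style settings would instead pass `max_missing=0.2` and leave `max_maf` at its default.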

2.3. APD Analysis

For each SNP or indel data set, APD and its standard deviation were obtained for each of the 2643 samples using a custom APD.r script published in Fu [20] in an R v. 4.1.2 environment [28]. The R script was specifically written following the method of Fu [12] and incorporating the SNPRelate package to address a large genomic data set. Briefly, the R script considers a typical marker-based characterization of self-fertile plant germplasm with n samples that are assayed at many SNP loci. A given sample can form n-1 pairs with the remaining assayed samples. For each of such pairs, the genotypic similarity (S) can be calculated based on the SNP genotypes following the simple matching coefficient of Sokal and Michener [13], and the pairwise dissimilarity is 1-S. The APD for the given sample can be obtained by averaging all n-1 pairwise dissimilarity values. The higher the APD value obtained for the given sample, the more genetically distinct the sample is among the assayed samples.
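The estimation described above can be sketched in a few lines of Python. This is an illustration only, not the APD.r script: it matches whole genotypes at each locus, skips loci where either sample is missing, and uses hypothetical sample names and genotype codes (0, 1, 2, or `None` for missing). The published script relies on SNPRelate and may compute the pairwise dissimilarity differently.

```python
from typing import Dict, List, Optional

Genotype = Optional[int]  # assumed coding: 0, 1, 2, or None for a missing call


def pairwise_dissimilarity(a: List[Genotype], b: List[Genotype]) -> float:
    """Return 1 - S, where S is the share of jointly called loci
    at which the two samples carry the same genotype."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("the two samples share no called loci")
    matches = sum(x == y for x, y in shared)
    return 1.0 - matches / len(shared)


def average_pairwise_dissimilarity(
        samples: Dict[str, List[Genotype]]) -> Dict[str, float]:
    """Average, for each sample, its n-1 pairwise dissimilarity values."""
    names = list(samples)
    apd = {}
    for name in names:
        values = [pairwise_dissimilarity(samples[name], samples[other])
                  for other in names if other != name]
        apd[name] = sum(values) / len(values)
    return apd


# Hypothetical toy data: three samples genotyped at four SNP loci.
toy_samples = {
    "A": [0, 0, 2, 2],
    "B": [0, 0, 2, 0],
    "C": [2, 2, 0, 0],
}
toy_apd = average_pairwise_dissimilarity(toy_samples)
```

For the toy data, sample C differs from A at all four loci and from B at three, so it receives the highest APD (0.875) and would be ranked as the most genetically distinct of the three.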
The APD.r script was applied to each SNP or indel data set described above on Agriculture and Agri-Food Canada’s Biocluster high-performance computing platform, and the computation lasted up to several hours, particularly with the large indel data set. For each data set, two extra APD analyses were separately made with O. sativa indica and japonica groups of 1789 and 854 samples, respectively, due to their unique genetic features in rice germplasm. This was performed by running the APD.r script modified with the input of a separate sample identification set for each sample group rather than the whole set of 2643 samples. These efforts generated 24 sets of APD estimates: 8 SNP or indel data sets (gaSNPs for heat tolerance, gaSNPs for cold tolerance, gaSNPs for fertility, gaSNPs for seed size, genomeSNPs, neutralSNPs, genicSNPs, and indels) × 3 sample groups (All2643, Indica1789, and Japonica854).
The acquired APD estimates in each data set were further analyzed for their variations with basic statistics and frequency distributions using custom R scripts. An APD correlation analysis was performed using a custom R script among eight data sets in each sample group. To facilitate rice germplasm utilization and management, we identified and presented the most genetically distinct set of 25 rice samples based on the highest APD values for each gaSNP data set in each sample group. The acquired APD estimates for these 24 data sets were compiled and listed in supplemental materials to enhance rice germplasm management and utilization.
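The ranking and correlation steps above can be sketched as follows. The custom R scripts are not reproduced here; the function names and toy values are hypothetical, and the use of the Pearson coefficient is an assumption, since the type of correlation is not specified in the text.

```python
import math
from typing import Dict, List, Sequence, Tuple


def top_distinct(apd: Dict[str, float], k: int = 25) -> List[Tuple[str, float]]:
    """Return the k samples with the highest APD values, highest first."""
    return sorted(apd.items(), key=lambda item: item[1], reverse=True)[:k]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long lists of APD estimates."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical APD estimates for the same three samples in two data sets.
apd_set_1 = {"A": 0.625, "B": 0.5, "C": 0.875}
apd_set_2 = {"A": 0.30, "B": 0.35, "C": 0.20}

names = sorted(apd_set_1)
r = pearson([apd_set_1[s] for s in names], [apd_set_2[s] for s in names])
top_two = top_distinct(apd_set_1, k=2)
```

Aligning the two lists on the same sorted sample names before calling `pearson` matters: the correlation is only meaningful when each pair of values belongs to the same sample.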

3. Results

3.1. Variability in Identified SNPs and Indels

The SNP calling identified variable numbers of SNPs among eight SNP or indel data sets for all 2643 samples (Table 1). The total number of SNPs per data set was dependent on the total length of selected contigs or 12 chromosomes. For example, gaSNPs for fertility with 23 contigs had 1,200,284 SNPs, while gaSNPs for cold tolerance with 79 contigs had 3,685,200 SNPs. Based on the genome reference, there were 74,136,931 SNPs identified for 2643 samples. The published indel data set had 2,354,934 indels across 12 chromosomes. However, there were substantial amounts of missing values for those identified SNPs across 2643 samples, particularly for those from the whole genome (Table 1). Also, up to 700 SNPs were found to be invariant homozygotes and/or heterozygotes across the assayed samples in the four gaSNP data sets and were thus removed from further analyses by restricting the MAF to 0.495 or smaller. To make the analyses comparable, the four final gaSNP data sets had no missing SNP values and contained 24,453 to 29,955 SNPs, while the genome-based SNP and indel data sets were allowed up to 20% missing values (27,556 SNPs and 445,188 indels, respectively), as shown in Table 1.
Further analyses of the eight final SNP or indel data sets revealed that these selected SNPs were widely distributed across the selected contigs or 12 chromosomes (Table S5). Interestingly, there were 3 out of the 56 contigs without SNPs of non-missing values in gaSNPs for heat tolerance and 5 out of the 79 contigs without SNPs of non-missing values in gaSNPs for cold tolerance. All the selected contigs for the four gaSNP data sets were mainly located in chromosomes 1 to 9 (with one contig AP011111.1 in chromosome 11 associated with seed size), and thus, there were no SNPs identified on chromosomes 10 and 12, both of which did not carry any of the selected gene contigs (Tables S1–S4).
An allelic frequency analysis revealed a similar pattern of MAF distributions present in 2643 samples for the four gaSNP data sets and a similar pattern of MAF distributions for the three genome SNP data sets, while the indel data set displayed an extreme L-shape of MAF distribution, shown in Figure S1A. Such patterns of MAF distribution remain the same for smaller sample groups Indica1789 (Figure S1B) and Japonica854 (Figure S1C). The major differences were observed mainly in minor alleles with frequencies approaching 0.5: more of these were found in the four gaSNP data sets, and fewer in the three genome SNP and indel data sets.

3.2. Variability of APD Estimates for Three Sample Groups

APD estimates of each sample group were generated separately for each SNP or indel data set, and all are listed in Tables S6–S8 for the three sample groups (All2643, Indica1789, and Japonica854), respectively. Their statistical summaries are given in Table 2, along with the number of SNPs or indels present in each data set. The frequency analysis revealed that the APD estimates largely followed an approximately normal distribution in each SNP data set, while a distribution skewed toward the left was observed in each indel data set (Figure 1). More specifically, the four gaSNP data sets showed similar distribution patterns of APD estimates, while the three genome SNP data sets also displayed similar distribution patterns. Such APD distribution patterns and their variations across the eight SNP or indel data sets remained the same for the three sample groups (Figure 1).
The correlation analyses of APD estimates for a given sample group among eight SNP or indel data sets revealed interesting patterns of correlations (Table 3). First, the four gaSNP data sets displayed significantly high correlation coefficients of 0.99 and the three genome SNP data sets showed significantly high correlation coefficients of 0.74 to 0.98 for the three sample groups. Second, the four gaSNP data sets were significantly and negatively correlated with the three genome SNP data sets in All2643 and Indica1789. However, some variations in correlation were also observed in Japonica854, in which the four gaSNP data sets showed positive correlations of APD estimates with those in the genicSNP data set. Third, indel-based APD estimates were significantly and weakly correlated with the other seven SNP data sets in All2643 but non-significantly in Indica1789 and Japonica854. To illustrate these correlation patterns, correlation plots for APD estimates of pairwise SNP or indel data sets were made for All2643 (Figure 2), Indica1789 (Figure S2), and Japonica854 (Figure S3). Clearly, there were three marked patterns of correlations in APD estimates among the eight SNP or indel data sets for the three sample groups.

3.3. Four Sets of Most Genetically Distinct Rice Lines

To facilitate rice germplasm utilization, efforts were made to select a set of the 25 most genetically distinct rice lines for each specific character, which was based on the highest APD estimates across 2643 samples (Table 4). These selected lines had APD estimates larger than two standard deviations and represented both indica and japonica groups, but there were more indica than japonica lines. For example, there were 22 indica and 3 japonica lines for heat tolerance and 21 indica and 4 japonica lines for the other three characters. The selected lines originated from 13 to 14 countries, showing diverse origins. For example, the 25 selected lines for heat tolerance were from 13 countries, while the set for fertility was from 14 countries. Interestingly, most of the selected lines overlapped across the four sets for the four characters. For example, the japonica line B166 from North Korea and the indica line IRIS_313-9108 from Bangladesh were present in all four sets. In other words, these overlapping lines have genetically distinct backgrounds for all four assayed characters.
A similar effort was made to select a set of the 25 most genetically distinct rice lines for each of the four assayed characters from 1789 indica lines (Table 5) and from 854 japonica lines (Table 6). These selected indica and japonica rice lines had APD estimates larger than two standard deviations. The four indica sets represented lines originating from 11 to 12 countries. A majority of the selected lines overlapped across the four sets. For example, the indica lines IRIS_313-8466 from Thailand and IRIS_313-11968 from China were present in all four sets. Similarly, the four distinct japonica sets each consisted of 25 rice lines that originated from 12 to 16 countries or regions. Many selected lines were also present across the four distinct japonica sets. For example, the japonica lines B166 from North Korea and IRIS_313-11582 from China were consistently the top two lines across the four distinct japonica sets.

4. Discussion

Our gene-associated APD exploration not only identified four novel sets of the 25 most genetically distinct rice lines for each of the four specific characters of breeding interest but also revealed several interesting APD properties. First, APD estimates displayed similar frequency distributions and high correlations among the four contig-based SNP data sets and among the three genome-based SNP data sets. Second, APD estimates were negatively correlated between the contig-based and genome-based SNP data sets. Third, indel-based APD estimates were nearly uncorrelated with those in the other seven SNP data sets. These findings are significant for plant germplasm characterization and germplasm utilization in plant breeding.
The results of APD correlations are novel and interesting. For example, the correlation coefficients of 0.99 or higher among the four gaSNP data sets were much higher than the genetic correlations generally expected among these specific characters. With such high APD correlations, one character set of gaAPD estimates for a sample group is sufficient to assess the relative genetic distinctness of the samples with respect to the other characters. However, the observations of negative and/or weak correlations of contig-based APD estimates with genome-based and indel-based APD estimates are largely unexpected, as these negative correlations implied that gaAPD estimates carried different sets of genetic backgrounds from those present across the genome. More surprising was the finding of negative or weak correlations between gaAPD estimates and those based on genome genic SNPs, as genic SNPs sampled functional regions of the whole genome, and they may overlap with those gaSNPs generated from gene contigs. It is possible that the strong linkage of alleles present in the gaSNPs, compared to the whole genome genic SNPs, may have contributed to the negative correlations. Also, gaAPD estimates are known to be confounded by mis-called SNP genotypes from gene paralogs, as contig-based SNP calls from short sequence reads cannot distinguish adequately between gene orthologs and paralogs (e.g., see [29,30]). This was evident from the abundant SNP genotypes found to be invariant homozygotes and/or heterozygotes across the assayed samples in the four gaSNP data sets. However, we cannot determine the extent of bias in APD estimation from existing paralogs or evaluate the impact of such biased estimation on the APD-based ranking of samples and the APD correlations among SNP or indel data sets. How to improve gaSNP calls from short-read sequence data remains a topic of research interest.
Marker-based average pairwise dissimilarity is a function of marker genotypes of the assayed samples [12]. Thus, even with the same set of SNP genotypes, APD estimates per sample will vary if the APD estimation was made on a subset of the assayed samples versus on all the assayed samples. An APD estimation can also be affected by the type, size, and distribution of genetic markers such as various types of SNPs and indels, as studied here. It was previously found that the APD estimation was more sensitive to the number of SNPs, minor allele frequency, and missing data but less sensitive to the sample size and that 5000 to 10,000 genome-wide SNPs were generally required for an effective APD analysis [20]. As clearly demonstrated in the present APD analysis, APD estimates of the same sample group were different among different SNP or indel data sets (e.g., see Tables S6–S8). Such differences revealed the limitation in the informativeness of APD estimates to rank the assayed germplasm, as APD estimates are strictly informative only to the assayed samples with the given type, size, and distribution of genetic markers. An exception seems to exist for those gaAPD estimates with extremely high correlations among the four gaSNP data sets for the four assayed characters. Despite this exception, it is important to know such a limitation for proper interpretations of APD estimates with respect to germplasm selection and use.
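The dependence of APD on the set of assayed samples can be shown with a small Python sketch. The six samples, their genotypes, and the group labels below are hypothetical and complete (no missing calls); the point is only that the same genotypes can yield a different ranking when APD is estimated within a subset rather than across all samples.

```python
from typing import Dict, Iterable, List


def dissimilarity(a: List[int], b: List[int]) -> float:
    """1 - share of loci with identical genotypes (complete data assumed)."""
    matches = sum(x == y for x, y in zip(a, b))
    return 1.0 - matches / len(a)


def apd_within(samples: Dict[str, List[int]],
               members: Iterable[str]) -> Dict[str, float]:
    """APD of each member, averaged over the other members only."""
    members = list(members)
    return {
        m: sum(dissimilarity(samples[m], samples[o])
               for o in members if o != m) / (len(members) - 1)
        for m in members
    }


# Hypothetical genotypes at four loci for two groups of three samples.
samples = {
    "J1": [0, 0, 0, 0], "J2": [0, 0, 0, 2], "J3": [0, 2, 2, 2],
    "I1": [2, 2, 2, 2], "I2": [2, 2, 2, 2], "I3": [2, 2, 2, 0],
}
group_j = ["J1", "J2", "J3"]

apd_all = apd_within(samples, samples)    # all six samples
apd_group = apd_within(samples, group_j)  # group J only

most_distinct_in_all = max(group_j, key=apd_all.get)
most_distinct_in_group = max(group_j, key=apd_group.get)
```

Here J1 is the most distinct group-J sample when all six samples are assayed (APD of 0.75), whereas J3 is the most distinct when only group J is assayed (APD of 0.625), because J3 resembles the group-I samples but differs from its own group.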
The four sets of rice lines selected from 2643 samples (Table 4) had APD estimates larger than two standard deviations and represented the rice germplasm with the most genetically distinct backgrounds for the specific characters of heat and cold tolerance, fertility, and seed size. Similarly, the sets selected from the indica group of 1789 samples and from the japonica group of 854 samples were the rice germplasm with the most genetically distinct backgrounds for the four assayed characters. As indicated above, however, it is important to know that the selected sets of rice lines from the three sample groups can differ, as their APD estimates were based on different gaSNP data sets. For example, APD estimates for All2643 and Japonica854 for heat tolerance were made for 2643 and 854 samples based on 24,868 and 24,509 SNPs, respectively. The japonica line B166 from North Korea was present in both selected sets for heat tolerance (Table 4 and Table 6), but the second top japonica line IRIS_313-10057 from Japan in the set for All2643 (Table 4) was not present in the corresponding set for Japonica854 (Table 6). Thus, caution should be exercised when using the distinct sets generated from different APD estimations. Selecting the distinct set for a specific character should depend on research or breeding objectives. For example, if research focuses on the seed size of general rice germplasm, the distinct set from All2643 (Table 4) or APD estimates from Table S6 should be considered. Similarly, if a breeding effort is planned on the fertility of japonica rice lines, the distinct set from Japonica854 (Table 6) or APD estimates from Table S8 should be applied. Note that seeds for the distinct sets of rice lines (Table 4, Table 5 and Table 6) should be accessible upon a proper germplasm request to the International Rice Genebank (https://www.irri.org/rice-seeds; accessed on 9 October 2024) at the International Rice Research Institute, Philippines.
Each distinct set can also be expanded, if needed, to include more genetically distinct lines by ranking the corresponding APD estimates listed in Tables S6–S8 and selecting the lines with the highest APD estimates.
Our APD analysis here was fully based on SNP or indel data without the need to perform the phenotypic characterization of the assayed rice lines for the associated traits. Thus, we did not have the related phenotypic data of these assayed characters to evaluate the correlations of the resulting APD estimates with their corresponding phenotypic values (for a specific character) of a sample group. However, it is useful to study such correlations by selecting a distinct set and a random set (as control) of rice lines and evaluating them agronomically in diverse environments. Such studies will not only allow for understanding how the rice lines with distinct genetic backgrounds are adapted to diverse environments but also verify the selection of exotic germplasm with the distinct genetic backgrounds and the traits of breeding interest for breeding gene pool addition. Alternatively, when a set of germplasm accessions with acquired traits of interest was identified, presumably from a pre-breeding effort (see [7]) or other germplasm search approaches (as described by Sukumaran et al. [9]), an additional APD analysis can be performed on the selected germplasm set to verify their genetic backgrounds for the re-selection of a few truly elite exotic lines. Either way, an APD analysis can serve as a supplemental genetic tool to search for genetically distinct exotic lines for the widening of elite breeding gene pools.

5. Concluding Remarks

Exploring the gene-associated APD analysis in the genomic characterization of 2643 rice lines generated four novel sets of the 25 most genetically distinct rice lines, each for a specific character (heat tolerance, cold tolerance, fertility, or seed size). It also revealed several interesting APD properties. Four contig-based SNP data sets for four specific characters displayed similar frequency distributions and high correlations of APD estimates. Contig-based APD estimates were negatively correlated with genome-based APD estimates and nearly uncorrelated with indel-based APD estimates. These findings are significant for plant germplasm characterization and utilization in plant breeding.

Supplementary Materials

The following supporting material can be downloaded at https://doi.org/10.6084/m9.figshare.27214971 (accessed 17 November 2024): Figure S1. Distributions of minor allelic frequencies in eight SNP or indel data sets for three sample groups. Figure S2. Pairwise correlations of APD estimates among eight SNP or indel data sets for the 1789 indica samples. Figure S3. Pairwise correlations of APD estimates among eight SNP or indel data sets for the 854 japonica samples. Table S1. List of 60 genes (on 56 contigs) selected from TO:0000259 with 158 genes associated with heat tolerance and their related information. Table S2. List of 83 genes (on 79 contigs) selected from TO:0000303 with 184 genes associated with cold tolerance and their related information. Table S3. List of 24 genes (on 23 contigs) selected from TO:0000420 with 56 genes associated with fertility and their related information. Table S4. List of 48 genes (on 44 contigs) selected from TO:0000391 with 111 genes associated with seed size and their related information. Table S5. SNP or indel counts per contig or chromosome (chr) in eight SNP or indel data sets for all 2643 rice samples. Table S6. List of APD estimates for eight SNP or indel data sets for all 2643 samples (in excel file). Table S7. List of APD estimates for eight SNP or indel data sets for 1789 Indica samples (in excel file). Table S8. List of APD estimates for eight SNP or indel data sets for 854 Japonica samples (in excel file).

Funding

This study was funded by AAFC research grants J-000066, J-000185 and J-003159 to Yong-Bi Fu.

Data Availability Statement

The meta data sets generated for this paper are included as Supplementary Materials to this paper.

Acknowledgments

The author thanks Jeffrey Ross-Ibarra for his helpful discussion on the inference of allelic frequency distribution; Chenyi Liu for his assistance in gene data processing and helpful reading of the early version of the manuscript; Carolee Horbach for her assistance in generating APD distribution plots and editing the early draft of the manuscript; and Bill Biligetu for his constructive comments on the early version of the manuscript.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Bernardo, R.N. Essentials of Plant Breeding; Stemma Press: London, UK, 2014. [Google Scholar]
  2. Allier, A.; Teyssèdre, S.; Lehermeier, C.; Moreau, L.; Charcosset, A. Optimized breeding strategies to harness genetic resources with different performance levels. BMC Genom. 2020, 21, 349. [Google Scholar] [CrossRef]
  3. Allard, R.W. Principles of Plant Breeding, 2nd ed.; John Wiley, Sons, Inc.: New York, NY, USA, 1999. [Google Scholar]
  4. Bohra, A.; Kilian, B.; Sivasankar, S.; Caccamo, M.; Mba, C.; McCouch, S.R.; Varshney, R.K. Reap the crop wild relatives for breeding future crops. Trends Biotechnol. 2022, 22, 624–637. [Google Scholar] [CrossRef] [PubMed]
  5. Prohens, J.; Gramazio, P.; Plazas, M.; Dempewolf, H.; Kilian, B.; Díez, M.J.; Fita, A.; Herraiz, F.J.; Rodríguez-Burruezo, A.; Soler, S.; et al. Introgressiomics: A new approach for using crop wild relatives in breeding for adaptation to climate change. Euphytica 2017, 213, 158. [Google Scholar] [CrossRef]
  6. Wang, C.; Hu, S.; Gardner, C.; Lübberstedt, T. Emerging avenues for utilization of exotic germplasm. Trends Plant Sci. 2017, 22, 624–637. [Google Scholar] [CrossRef] [PubMed]
  7. Schulthess, A.W.; Kale, S.M.; Liu, F.; Zhao, Y.; Philipp, N.; Rembe, M.; Jiang, Y.; Beukert, U.; Serfling, A.; Himmelbach, A.; et al. Genomics-informed prebreeding unlocks the diversity in genebanks for wheat improvement. Nat. Genet. 2022, 54, 1544–1552. [Google Scholar] [CrossRef] [PubMed]
  8. Hernandez, J.; Meints, B.; Hayes, P. Introgression breeding in barley: Perspectives and case studies. Front. Plant Sci. 2020, 11, 761. [Google Scholar] [CrossRef] [PubMed]
  9. Sukumaran, S.; Rebetzke, G.; Mackay, I.; Bentley, A.R.; Reynolds, M.P. Pre-breeding strategies. In Wheat Improvement; Reynolds, M.P., Braun, H.J., Eds.; Springer: Cham, Switzerland, 2022. [Google Scholar] [CrossRef]
  10. Yu, X.; Li, X.; Guo, T.; Zhu, C.; Wu, Y.; Mitchell, S.E.; Roozeboom, K.L.; Wang, D.; Wang, M.L.; Pederson, G.A.; et al. Genomic prediction contributing to a promising global strategy to turbocharge gene banks. Nat. Plants 2016, 2, 16150. [Google Scholar] [CrossRef]
  11. Li, Y.; Shi, F.; Lin, Z.; Robinson, H.; Moody, D.; Rattey, A.; Godoy, J.; Mullan, D.; Keeble-Gagnere, G.; Hayden, M.J.; et al. Benefit of introgression depends on level of genetic trait variation in cereal breeding programmes. Front. Plant Sci. 2022, 13, 786452. [Google Scholar] [CrossRef] [PubMed]
  12. Fu, Y.B. Redundancy and distinctness in flax germplasm as revealed by RAPD dissimilarity. Plant Genet. Resour. 2006, 4, 117–124. [Google Scholar] [CrossRef]
  13. Sokal, R.R.; Michener, C.D. A statistical method for evaluating systematic relationships. Univ. Kans. Sci. Bull. 1958, 38, 1409–1438. [Google Scholar]
  14. Yang, M.H.; Fu, Y.B. AveDissR: An R function for assessing genetic distinctness and genetic redundancy. Appl. Plant Sci. 2017, 5, 1700018. [Google Scholar] [CrossRef] [PubMed]
  15. Peterson, G.W.; Dong, Y.; Horbach, C.; Fu, Y.B. Genotyping-by-sequencing for plant genetic diversity analysis: A lab guide for SNP genotyping. Diversity 2014, 6, 665–680. [Google Scholar] [CrossRef]
  16. Milner, S.G.; Jost, M.; Taketa, S.; Mazón, E.R.; Himmelbach, A.; Oppermann, M.; Weise, S.; Knüpffer, H.; Basterrechea, M.; König, P.; et al. Genebank genomics highlights the diversity of a global barley collection. Nat. Genet. 2019, 51, 319–326. [Google Scholar] [CrossRef] [PubMed]
  17. Sansaloni, C.; Franco, J.; Santos, B.; Percival-Alwyn, L.; Singh, S.; Petroli, C.; Campos, J.; Dreher, K.; Payne, T.; Marshall, D.; et al. Diversity analysis of 80,000 wheat accessions reveals consequences and opportunities of selection footprints. Nat. Commun. 2020, 11, 4572. [Google Scholar] [CrossRef] [PubMed]
  18. Varshney, R.K.; Roorkiwal, M.; Sun, S.; Bajaj, P.; Chitikineni, A.; Thudi, M.; Singh, N.P.; Du, X.; Upadhyaya, H.D.; Khan, A.W.; et al. A chickpea genetic variation map based on the sequencing of 3366 genomes. Nature 2021, 599, 622–627. [Google Scholar] [CrossRef] [PubMed]
  19. Song, Q.; Hyten, D.L.; Jia, G.; Quigley, C.V.; Fickus, E.W.; Nelson, R.L.; Cregan, P.B. Fingerprinting soybean germplasm and its utility in genomic research. G3 Genes Genomes Genet. 2015, 5, 1999–2006. [Google Scholar] [CrossRef] [PubMed]
  20. Fu, Y.B. Assessing genetic distinctness and redundancy of plant germplasm conserved ex situ based on published genomic SNP data. Plants 2023, 12, 1476. [Google Scholar] [CrossRef] [PubMed]
  21. Wang, W.; Mauleon, R.; Hu, Z.; Chebortarov, D.; Tai, S.; Wu, Z.; Li, M.; Zheng, T.; Fuentes, R.R.; Zhang, F.; et al. Genomic variation in 3010 diverse accessions of Asian cultivated rice. Nature 2018, 557, 43–49. [Google Scholar] [CrossRef]
  22. Li, H.; Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics 2009, 25, 1754–1760. [Google Scholar] [CrossRef] [PubMed]
  23. Danecek, P.; Bonfield, J.K.; Liddle, J.; Marshall, J.; Ohan, V.; Pollard, M.O.; Whitwham, A.; Keane, T.; McCarthy, S.A.; Davies, R.M.; et al. Twelve years of SAMtools and BCFtools. GigaScience 2021, 10, giab008. [Google Scholar] [CrossRef] [PubMed]
  24. Danecek, P.; McCarthy, S.A.; HipSci Consortium; Durbin, R. A method for checking genomic integrity in cultured cell lines from SNP genotyping data. PLoS ONE 2016, 11, e0155014. [Google Scholar] [CrossRef]
  25. Danecek, P.; Auton, A.; Abecasis, G.; Albers, C.A.; Banks, E.; DePristo, M.A.; Handsaker, R.E.; Lunter, G.; Marth, G.T.; Sherry, S.T.; et al. The variant call format and VCFtools. Bioinformatics 2011, 27, 2156–2158. [Google Scholar] [CrossRef]
  26. Neph, S.; Kuehn, M.S.; Reynolds, A.P.; Haugen, E.; Thurman, R.E.; Johnson, A.K.; Rynes, E.; Maurano, M.T.; Vierstra, J.; Thomas, S.; et al. BEDOPS: High-performance genomic feature operations. Bioinformatics 2012, 28, 1919–1920. [Google Scholar] [CrossRef]
  27. Zheng, X.; Levine, D.; Shen, J.; Gogarten, S.M.; Laurie, C.; Weir, B.S. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 2012, 28, 3326–3328. [Google Scholar] [CrossRef] [PubMed]
  28. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2022; Available online: https://www.R-project.org/ (accessed on 22 September 2024).
  29. Field, M.A.; Burgio, G.; Chuah, A.; Shekaili, J.A.; Hassan, B.; Sukaiti, N.A.; Foote, S.J.; Cook, M.C.; Andrews, T.D. Recurrent miscalling of missense variation from short-read genome sequence data. BMC Genom. 2019, 20, 546. [Google Scholar] [CrossRef] [PubMed]
  30. Steyaert, W.; Haer-Wigman, L.; Pfundt, R.; Hellebrekers, D.; Steehouwer, M.; Hampstead, J.; de Boer, E.; Stegmann, A.; Yntema, H.; Kamsteeg, E.-J.; et al. Systematic analysis of paralogous regions in 41,755 exomes uncovers clinically relevant variation. Nat. Commun. 2023, 14, 6845. [Google Scholar] [CrossRef]
Figure 1. Frequency distribution of APD estimates in eight SNP or indel data sets for three sample groups (All2643, Indica1789, and Japonica854). M = mean and R = range.
Figure 2. Pairwise correlations of APD estimates among eight SNP or indel data sets for all 2643 samples.
Table 1. Summary of SNP or indel identifications in eight SNP or indel data sets for all 2643 samples. The bold numbers of SNPs or indels indicate the final SNP or indel data sets used for APD analyses. Three levels of missing values for SNPs or indels were up to 20%, 10% and no missing values.
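
| Data Set | Chromosome or Contig Count | Sequence Length in Base Pair | SNP or Indel Count | SNP Count (20% Missing) | SNP Count (10% Missing) | SNP Count (No Missing) |
|---|---|---|---|---|---|---|
| gaSNPs for heat tolerance | 56 | 8,429,490 | 2,672,786 | 183,883 | 183,883 | **24,868** |
| gaSNPs for cold tolerance | 79 | 11,855,409 | 3,685,200 | 283,772 | 218,133 | **26,435** |
| gaSNPs for fertility | 23 | 3,542,347 | 1,200,284 | 183,883 | 111,422 | **24,453** |
| gaSNPs for seed size | 44 | 6,528,374 | 2,107,785 | 221,124 | 179,903 | **29,955** |
| genomeSNPs | 12 | 217,331,824 | 74,136,931 | **27,556** | 6630 | 36 |
| neutralSNPs | 12 | | | **17,873** | | |
| genicSNPs | 12 | | | **9683** | | |
| indels | 12 | | 2,354,934 | **445,188** | 438,466 | 386,361 |
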
Table 2. Statistical summary of APD estimates in eight SNP or indel data sets for three sample groups (All2643, Indica1789, and Japonica854), along with SNP counts.

| Data Set | SNP Count | APD Mean | APD Standard Deviation | APD Minimum | APD Maximum |
|---|---|---|---|---|---|
| All2643 samples | | | | | |
| gaSNPs for heat tolerance | 24,868 | 0.2146 | 0.0156 | 0.1744 | 0.2689 |
| gaSNPs for cold tolerance | 26,435 | 0.2139 | 0.0142 | 0.1745 | 0.2598 |
| gaSNPs for fertility | 24,453 | 0.2154 | 0.0168 | 0.1804 | 0.2803 |
| gaSNPs for seed size | 29,955 | 0.2171 | 0.0152 | 0.1759 | 0.2687 |
| genomeSNPs | 27,556 | 0.2790 | 0.0103 | 0.2223 | 0.3063 |
| neutralSNPs | 17,873 | 0.2716 | 0.0095 | 0.2178 | 0.2997 |
| genicSNPs | 9683 | 0.2878 | 0.0105 | 0.2072 | 0.3143 |
| indels | 445,188 | 0.3524 | 0.0260 | 0.3116 | 0.5233 |
| Indica1789 samples | | | | | |
| gaSNPs for heat tolerance | 24,556 | 0.2080 | 0.0168 | 0.1713 | 0.2635 |
| gaSNPs for cold tolerance | 26,168 | 0.2083 | 0.0153 | 0.1711 | 0.2586 |
| gaSNPs for fertility | 24,107 | 0.2083 | 0.0182 | 0.1762 | 0.2788 |
| gaSNPs for seed size | 29,578 | 0.2111 | 0.0165 | 0.1731 | 0.2675 |
| genomeSNPs | 25,286 | 0.2971 | 0.0054 | 0.2575 | 0.3160 |
| neutralSNPs | 16,231 | 0.2911 | 0.0056 | 0.2485 | 0.3116 |
| genicSNPs | 8967 | 0.3035 | 0.0059 | 0.2381 | 0.3281 |
| indels | 445,176 | 0.3299 | 0.0175 | 0.3087 | 0.5315 |
| Japonica854 samples | | | | | |
| gaSNPs for heat tolerance | 24,509 | 0.1980 | 0.0145 | 0.1700 | 0.2503 |
| gaSNPs for cold tolerance | 26,095 | 0.1953 | 0.0135 | 0.1669 | 0.2404 |
| gaSNPs for fertility | 23,939 | 0.1939 | 0.0156 | 0.1655 | 0.2558 |
| gaSNPs for seed size | 29,453 | 0.1961 | 0.0144 | 0.1678 | 0.2461 |
| genomeSNPs | 17,444 | 0.2949 | 0.0162 | 0.2584 | 0.3919 |
| neutralSNPs | 11,103 | 0.2974 | 0.0151 | 0.2627 | 0.3879 |
| genicSNPs | 6025 | 0.3079 | 0.0164 | 0.2675 | 0.4151 |
| indels | 193,700 | 0.1414 | 0.0217 | 0.1095 | 0.3152 |

Table 3. Pairwise correlations of APD estimates in eight SNP or indel data sets for three sample groups (All2643, Indica1789, and Japonica854). The lower triangle shows the estimates of the Pearson correlation coefficients, and the upper triangle shows the significance levels (p-values) of the corresponding tests (0.0001 means that p is much smaller than 0.0001).
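
| Data Set | Heat | Cold | Fertility | Seed Size | Genome | Neutral | Genic | Indel |
|---|---|---|---|---|---|---|---|---|
| All2643 samples | | | | | | | | |
| gaSNPs for heat tolerance | | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
| gaSNPs for cold tolerance | 0.991 | | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
| gaSNPs for fertility | 0.991 | 0.990 | | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
| gaSNPs for seed size | 0.992 | 0.993 | 0.991 | | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
| genomeSNPs | −0.718 | −0.687 | −0.705 | −0.689 | | 0.0001 | 0.0001 | 0.0147 |
| neutralSNPs | −0.737 | −0.707 | −0.727 | −0.710 | 0.982 | | 0.0001 | 0.0120 |
| genicSNPs | −0.583 | −0.545 | −0.573 | −0.550 | 0.934 | 0.914 | | 0.0525 |
| indels | −0.078 | −0.075 | −0.082 | −0.078 | 0.047 | 0.049 | 0.038 | |
| Indica1789 samples | | | | | | | | |
| gaSNPs for heat tolerance | | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.5087 |
| gaSNPs for cold tolerance | 0.993 | | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.4754 |
| gaSNPs for fertility | 0.993 | 0.991 | | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.5437 |
| gaSNPs for seed size | 0.995 | 0.994 | 0.992 | | 0.0001 | 0.0001 | 0.0001 | 0.4414 |
| genomeSNPs | −0.428 | −0.423 | −0.432 | −0.430 | | 0.0001 | 0.0001 | 0.1904 |
| neutralSNPs | −0.441 | −0.436 | −0.448 | −0.441 | 0.947 | | 0.0001 | 0.2675 |
| genicSNPs | −0.148 | −0.139 | −0.158 | −0.147 | 0.776 | 0.736 | | 0.6156 |
| indels | 0.016 | 0.017 | 0.014 | 0.018 | 0.031 | 0.029 | 0.014 | |
| Japonica854 samples | | | | | | | | |
| gaSNPs for heat tolerance | | 0.0001 | 0.0001 | 0.0001 | 0.1860 | 0.0584 | 0.0101 | 0.1220 |
| gaSNPs for cold tolerance | 0.992 | | 0.0001 | 0.0001 | 0.4732 | 0.1989 | 0.0015 | 0.0898 |
| gaSNPs for fertility | 0.991 | 0.989 | | 0.0001 | 0.0781 | 0.0171 | 0.0293 | 0.0722 |
| gaSNPs for seed size | 0.994 | 0.993 | 0.991 | | 0.3319 | 0.1247 | 0.0044 | 0.1021 |
| genomeSNPs | −0.045 | −0.025 | −0.060 | −0.033 | | 0.0001 | 0.0001 | 0.4486 |
| neutralSNPs | −0.065 | −0.044 | −0.082 | −0.053 | 0.984 | | 0.0001 | 0.4246 |
| genicSNPs | 0.088 | 0.108 | 0.075 | 0.097 | 0.921 | 0.905 | | 0.4255 |
| indels | −0.053 | −0.058 | −0.062 | −0.056 | −0.026 | −0.027 | −0.027 | |
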
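As a rough check on how the correlation coefficients and significance levels in Table 3 relate, the sketch below computes a Pearson correlation between two vectors of APD estimates and an approximate two-sided p-value. It assumes the usual test statistic t = r·sqrt((n − 2)/(1 − r²)) and, because the sample groups are large, replaces the t distribution with a standard normal; the exact test used for Table 3 is not stated here, and the function names are hypothetical.

```python
import math
import numpy as np

def approx_p_from_r(r, n):
    """Approximate two-sided p-value for a Pearson r from n samples.

    Assumption: n is large enough that the t statistic with n - 2 degrees
    of freedom is well approximated by a standard normal deviate.
    """
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))

def pearson_with_approx_p(x, y):
    """Return (r, p) for two equal-length vectors of APD estimates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = float(np.corrcoef(x, y)[0, 1])
    return r, approx_p_from_r(r, x.size)

# r = -0.045 with n = 854 (heat vs. genome, japonica group in Table 3)
p_japonica = approx_p_from_r(-0.045, 854)

# r = -0.718 with n = 2643 (heat vs. genome, all samples in Table 3)
p_all = approx_p_from_r(-0.718, 2643)
```

Under this approximation, the weak japonica correlation is not significant at the 0.05 level while the all-sample correlation is far below 0.0001, in line with the pattern reported in Table 3; small differences from the tabulated p-values are expected because the tabulated r values are rounded.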
Table 4. A list of the 25 most genetically distinct rice lines identified for each of the four specific characters based on the gene-associated APD estimates among all 2643 rice lines. Origin = the country or region of sample origin and SD = standard deviation.

| Sample | Group | Origin | APD | SD | Sample | Group | Origin | APD | SD |
|---|---|---|---|---|---|---|---|---|---|
| Character: Heat tolerance | | | | | Character: Cold tolerance | | | | |
| B166 | japonica | North Korea | 0.2689 | 0.0179 | B166 | japonica | North Korea | 0.2598 | 0.0178 |
| IRIS_313-9108 | indica | Bangladesh | 0.2648 | 0.0118 | B203 | indica | China | 0.2594 | 0.0107 |
| IRIS_313-10375 | indica | Philippines | 0.2645 | 0.0102 | IRIS_313-9108 | indica | Bangladesh | 0.2593 | 0.0105 |
| B203 | indica | China | 0.2644 | 0.0125 | IRIS_313-10002 | indica | Sri Lanka | 0.2591 | 0.0123 |
| IRIS_313-10002 | indica | Sri Lanka | 0.2635 | 0.0127 | B181 | indica | Australia | 0.2587 | 0.0120 |
| IRIS_313-9575 | indica | Thailand | 0.2635 | 0.0131 | IRIS_313-8859 | indica | China | 0.2577 | 0.0120 |
| B146 | indica | China | 0.2621 | 0.0138 | IRIS_313-8383 | indica | Philippines | 0.2575 | 0.0124 |
| IRIS_313-8859 | indica | China | 0.2619 | 0.0132 | IRIS_313-10057 | japonica | Japan | 0.2559 | 0.0199 |
| B202 | indica | China | 0.2607 | 0.0124 | CX51 | indica | China | 0.2553 | 0.0118 |
| B185 | indica | Lao | 0.2602 | 0.0131 | CX9 | indica | China | 0.2552 | 0.0118 |
| B181 | indica | Australia | 0.2601 | 0.0133 | IRIS_313-10375 | indica | Philippines | 0.2550 | 0.0092 |
| B030 | indica | India | 0.2598 | 0.0140 | IRIS_313-9814 | japonica | Hungary | 0.2550 | 0.0190 |
| IRIS_313-11733 | indica | China | 0.2594 | 0.0141 | B087 | indica | China | 0.2545 | 0.0120 |
| CX50 | indica | China | 0.2593 | 0.0147 | IRIS_313-8401 | indica | India | 0.2545 | 0.0121 |
| IRIS_313-8401 | indica | India | 0.2586 | 0.0137 | IRIS_313-9575 | indica | Thailand | 0.2544 | 0.0123 |
| CX9 | indica | China | 0.2585 | 0.0133 | B030 | indica | India | 0.2539 | 0.0126 |
| B087 | indica | China | 0.2581 | 0.0130 | B146 | indica | China | 0.2533 | 0.0128 |
| IRIS_313-10057 | japonica | Japan | 0.2580 | 0.0200 | B202 | indica | China | 0.2533 | 0.0118 |
| CX84 | indica | Vietnam | 0.2578 | 0.0121 | IRIS_313-8474 | indica | Thailand | 0.2529 | 0.0127 |
| IRIS_313-10341 | indica | Bangladesh | 0.2572 | 0.0139 | CX86 | indica | Vietnam | 0.2528 | 0.0133 |
| IRIS_313-11144 | indica | Myanmar | 0.2568 | 0.0151 | CX50 | indica | China | 0.2525 | 0.0132 |
| IRIS_313-8383 | indica | Philippines | 0.2566 | 0.0135 | IRIS_313-9346 | japonica | Taiwan | 0.2524 | 0.0196 |
| CX548 | indica | China | 0.2565 | 0.0140 | CX548 | indica | China | 0.2523 | 0.0125 |
| IRIS_313-8314 | japonica | Indonesia | 0.2563 | 0.0148 | B185 | indica | Lao | 0.2522 | 0.0122 |
| CX86 | indica | Vietnam | 0.2560 | 0.0136 | IRIS_313-11733 | indica | China | 0.2519 | 0.0136 |
| Character: Fertility | | | | | Character: Seed size | | | | |
| IRIS_313-10375 | indica | Philippines | 0.2803 | 0.0111 | B203 | indica | China | 0.2687 | 0.0111 |
| B166 | japonica | North Korea | 0.2768 | 0.0207 | B166 | japonica | North Korea | 0.2685 | 0.0205 |
| B203 | indica | China | 0.2745 | 0.0134 | IRIS_313-9108 | indica | Bangladesh | 0.2668 | 0.0114 |
| CX84 | indica | Vietnam | 0.2718 | 0.0136 | B185 | indica | Lao | 0.2654 | 0.0133 |
| IRIS_313-9108 | indica | Bangladesh | 0.2707 | 0.0137 | IRIS_313-10002 | indica | Sri Lanka | 0.2644 | 0.0131 |
| B087 | indica | China | 0.2699 | 0.0143 | B181 | indica | Australia | 0.2638 | 0.0130 |
| B181 | indica | Australia | 0.2696 | 0.0149 | IRIS_313-8383 | indica | Philippines | 0.2637 | 0.0138 |
| IRIS_313-9575 | indica | Thailand | 0.2694 | 0.0149 | IRIS_313-8401 | indica | India | 0.2637 | 0.0137 |
| IRIS_313-8383 | indica | Philippines | 0.2686 | 0.0150 | B087 | indica | China | 0.2632 | 0.0127 |
| CX9 | indica | China | 0.2683 | 0.0145 | IRIS_313-10057 | japonica | Japan | 0.2630 | 0.0211 |
| IRIS_313-8401 | indica | India | 0.2682 | 0.0151 | B030 | indica | India | 0.2625 | 0.0137 |
| IRIS_313-10002 | indica | Sri Lanka | 0.2674 | 0.0153 | IRIS_313-10375 | indica | Philippines | 0.2623 | 0.0101 |
| B202 | indica | China | 0.2672 | 0.0135 | CX51 | indica | China | 0.2619 | 0.0126 |
| B185 | indica | Lao | 0.2671 | 0.0150 | IRIS_313-8859 | indica | China | 0.2616 | 0.0129 |
| B146 | indica | China | 0.2668 | 0.0152 | CX9 | indica | China | 0.2611 | 0.0133 |
| IRIS_313-8859 | indica | China | 0.2662 | 0.0145 | CX50 | indica | China | 0.2608 | 0.0140 |
| B030 | indica | India | 0.2651 | 0.0154 | IRIS_313-9575 | indica | Thailand | 0.2604 | 0.0131 |
| IRIS_313-11144 | indica | Myanmar | 0.2641 | 0.0161 | IRIS_313-9814 | japonica | Hungary | 0.2604 | 0.0212 |
| IRIS_313-9814 | japonica | Hungary | 0.2635 | 0.0232 | B146 | indica | China | 0.2601 | 0.0144 |
| IRIS_313-10341 | indica | Bangladesh | 0.2625 | 0.0157 | IRIS_313-10341 | indica | Bangladesh | 0.2593 | 0.0145 |
| IRIS_313-10057 | japonica | Japan | 0.2619 | 0.0234 | B202 | indica | China | 0.2590 | 0.0126 |
| IRIS_313-8314 | japonica | Indonesia | 0.2617 | 0.0171 | CX548 | indica | China | 0.2590 | 0.0138 |
| CX50 | indica | China | 0.2615 | 0.0160 | IRIS_313-9346 | japonica | Taiwan | 0.2587 | 0.0227 |
| IRIS_313-11139 | indica | Myanmar | 0.2610 | 0.0158 | B244 | indica | China | 0.2585 | 0.0136 |
| CX51 | indica | China | 0.2609 | 0.0143 | IRIS_313-11144 | indica | Myanmar | 0.2583 | 0.0145 |

Table 5. A list of the 25 most genetically distinct rice lines identified for each of the four specific characters based on the gene-associated APD estimates among 1789 indica rice lines. Origin = the country or region of sample origin and SD = standard deviation.

| Sample | Group | Origin | APD | SD | Sample | Group | Origin | APD | SD |
|---|---|---|---|---|---|---|---|---|---|
| Character: Heat tolerance | | | | | Character: Cold tolerance | | | | |
| IRIS_313-8466 | indica | Thailand | 0.2635 | 0.0107 | IRIS_313-11968 | indica | China | 0.2586 | 0.0099 |
| IRIS_313-11968 | indica | China | 0.2632 | 0.0106 | B203 | indica | China | 0.2581 | 0.0098 |
| B203 | indica | China | 0.2618 | 0.0106 | IRIS_313-7636 | indica | Mali | 0.2563 | 0.0102 |
| IRIS_313-7636 | indica | Mali | 0.2611 | 0.0107 | B181 | indica | Australia | 0.2560 | 0.0100 |
| IRIS_313-12190 | indica | Lao | 0.2609 | 0.0111 | IRIS_313-8466 | indica | Thailand | 0.2554 | 0.0096 |
| IRIS_313-11894 | indica | Vietnam | 0.2596 | 0.0113 | IRIS_313-11894 | indica | Vietnam | 0.2553 | 0.0102 |
| B146 | indica | China | 0.2591 | 0.0111 | IRIS_313-11763 | indica | Cameroon | 0.2549 | 0.0105 |
| B202 | indica | China | 0.2590 | 0.0110 | CX561 | indica | China | 0.2532 | 0.0103 |
| B185 | indica | Lao | 0.2576 | 0.0111 | CX370 | indica | China | 0.2531 | 0.0105 |
| B181 | indica | Australia | 0.2575 | 0.0111 | IRIS_313-11779 | indica | Tanzania | 0.2523 | 0.0107 |
| B030 | indica | India | 0.2565 | 0.0113 | IRIS_313-12190 | indica | Lao | 0.2517 | 0.0107 |
| IRIS_313-11084 | indica | Cambodia | 0.2564 | 0.0116 | B087 | indica | China | 0.2516 | 0.0105 |
| CX413 | indica | Philippines | 0.2563 | 0.0129 | B202 | indica | China | 0.2514 | 0.0106 |
| B087 | indica | China | 0.2560 | 0.0114 | B030 | indica | India | 0.2510 | 0.0106 |
| CX561 | indica | China | 0.2559 | 0.0115 | B146 | indica | China | 0.2504 | 0.0108 |
| IRIS_313-11779 | indica | Tanzania | 0.2559 | 0.0115 | IRIS_313-11799 | indica | China | 0.2499 | 0.0108 |
| CX369 | indica | Philippines | 0.2555 | 0.0116 | B185 | indica | Lao | 0.2498 | 0.0106 |
| IRIS_313-8405 | indica | China | 0.2545 | 0.0117 | CX378 | indica | China | 0.2498 | 0.0109 |
| IRIS_313-11763 | indica | Cameroon | 0.2542 | 0.0115 | CX416 | indica | Philippines | 0.2496 | 0.0110 |
| CX378 | indica | China | 0.2535 | 0.0116 | CX369 | indica | Philippines | 0.2493 | 0.0108 |
| CX416 | indica | Philippines | 0.2534 | 0.0115 | B244 | indica | China | 0.2485 | 0.0112 |
| IRIS_313-8265 | indica | India | 0.2529 | 0.0119 | IRIS_313-11896 | indica | Vietnam | 0.2485 | 0.0113 |
| IRIS_313-10054 | indica | Panama | 0.2529 | 0.0120 | IRIS_313-11084 | indica | Cambodia | 0.2483 | 0.0112 |
| B207 | indica | China | 0.2522 | 0.0118 | IRIS_313-8265 | indica | India | 0.2483 | 0.0110 |
| CX370 | indica | China | 0.2522 | 0.0118 | IRIS_313-10045 | indica | Gambia | 0.2483 | 0.0112 |
| Character: Fertility | | | | | Character: Seed size | | | | |
| IRIS_313-8466 | indica | Thailand | 0.2788 | 0.0113 | B203 | indica | China | 0.2675 | 0.0097 |
| B203 | indica | China | 0.2716 | 0.0110 | IRIS_313-11968 | indica | China | 0.2656 | 0.0101 |
| CX413 | indica | Philippines | 0.2691 | 0.0145 | IRIS_313-8466 | indica | Thailand | 0.2626 | 0.0106 |
| IRIS_313-11968 | indica | China | 0.2684 | 0.0115 | B185 | indica | Lao | 0.2626 | 0.0106 |
| B087 | indica | China | 0.2667 | 0.0116 | IRIS_313-7636 | indica | Mali | 0.2618 | 0.0108 |
| B181 | indica | Australia | 0.2660 | 0.0113 | B087 | indica | China | 0.2607 | 0.0103 |
| IRIS_313-12190 | indica | Lao | 0.2659 | 0.0118 | B181 | indica | Australia | 0.2607 | 0.0103 |
| CX561 | indica | China | 0.2652 | 0.0118 | IRIS_313-11763 | indica | Cameroon | 0.2607 | 0.0110 |
| IRIS_313-11763 | indica | Cameroon | 0.2650 | 0.0116 | IRIS_313-11779 | indica | Tanzania | 0.2606 | 0.0110 |
| B202 | indica | China | 0.2648 | 0.0115 | CX370 | indica | China | 0.2602 | 0.0108 |
| IRIS_313-11779 | indica | Tanzania | 0.2646 | 0.0115 | B030 | indica | India | 0.2596 | 0.0109 |
| IRIS_313-7636 | indica | Mali | 0.2632 | 0.0118 | IRIS_313-11894 | indica | Vietnam | 0.2588 | 0.0106 |
| B185 | indica | Lao | 0.2632 | 0.0117 | CX561 | indica | China | 0.2583 | 0.0108 |
| IRIS_313-11894 | indica | Vietnam | 0.2629 | 0.0117 | IRIS_313-12190 | indica | Lao | 0.2582 | 0.0112 |
| B146 | indica | China | 0.2628 | 0.0116 | CX369 | indica | Philippines | 0.2576 | 0.0108 |
| B030 | indica | India | 0.2615 | 0.0120 | B202 | indica | China | 0.2573 | 0.0109 |
| IRIS_313-10054 | indica | Panama | 0.2595 | 0.0119 | IRIS_313-11896 | indica | Vietnam | 0.2565 | 0.0114 |
| IRIS_313-11896 | indica | Vietnam | 0.2593 | 0.0126 | B146 | indica | China | 0.2563 | 0.0112 |
| IRIS_313-8405 | indica | China | 0.2587 | 0.0125 | CX413 | indica | Philippines | 0.2562 | 0.0121 |
| CX370 | indica | China | 0.2580 | 0.0119 | CX378 | indica | China | 0.2558 | 0.0110 |
| CX369 | indica | Philippines | 0.2573 | 0.0123 | IRIS_313-8405 | indica | China | 0.2557 | 0.0111 |
| B244 | indica | China | 0.2572 | 0.0124 | B244 | indica | China | 0.2557 | 0.0112 |
| IRIS_313-10045 | indica | Gambia | 0.2570 | 0.0123 | CX416 | indica | Philippines | 0.2551 | 0.0112 |
| IRIS_313-8265 | indica | India | 0.2568 | 0.0121 | IRIS_313-8265 | indica | India | 0.2550 | 0.0113 |
| B207 | indica | China | 0.2568 | 0.0124 | IRIS_313-11084 | indica | Cambodia | 0.2550 | 0.0113 |

Table 6. A list of the 25 most genetically distinct rice lines identified for each of the four specific characters based on the gene-associated APD estimates among 854 japonica rice lines. Origin = the country or region of sample origin and SD = standard deviation.

| Sample | Group | Origin | APD | SD | Sample | Group | Origin | APD | SD |
|---|---|---|---|---|---|---|---|---|---|
| Character: Heat tolerance | | | | | Character: Cold tolerance | | | | |
| B166 | japonica | North Korea | 0.2503 | 0.0099 | B166 | japonica | North Korea | 0.2404 | 0.0098 |
| IRIS_313-11582 | japonica | China | 0.2452 | 0.0105 | IRIS_313-11582 | japonica | China | 0.2377 | 0.0103 |
| CX389 | japonica | China | 0.2425 | 0.0103 | CX389 | japonica | China | 0.2367 | 0.0101 |
| B144 | japonica | China | 0.2398 | 0.0107 | IRIS_313-12330 | japonica | Lao | 0.2350 | 0.0106 |
| IRIS_313-12226 | japonica | Lao | 0.2395 | 0.0112 | IRIS_313-7863 | japonica | Brazil | 0.2346 | 0.0112 |
| IRIS_313-7863 | japonica | Brazil | 0.2380 | 0.0115 | IRIS_313-11540 | japonica | Guinea | 0.2343 | 0.0113 |
| IRIS_313-7856 | japonica | Thailand | 0.2376 | 0.0112 | IRIS_313-8046 | japonica | Italy | 0.2327 | 0.0113 |
| IRIS_313-12006 | japonica | Malaysia | 0.2360 | 0.0110 | IRIS_313-12063 | japonica | Lao | 0.2320 | 0.0109 |
| IRIS_313-8046 | japonica | Italy | 0.2357 | 0.0122 | IRIS_313-12226 | japonica | Lao | 0.2310 | 0.0109 |
| IRIS_313-11540 | japonica | Guinea | 0.2356 | 0.0117 | B199 | japonica | China | 0.2310 | 0.0109 |
| IRIS_313-12330 | japonica | Lao | 0.2348 | 0.0113 | B144 | japonica | China | 0.2305 | 0.0107 |
| IRIS_313-11923 | japonica | Thailand | 0.2342 | 0.0119 | IRIS_313-7856 | japonica | Thailand | 0.2293 | 0.0109 |
| IRIS_313-9366 | japonica | United States of America | 0.2339 | 0.0116 | CX352 | japonica | China | 0.2290 | 0.0109 |
| IRIS_313-12063 | japonica | Lao | 0.2336 | 0.0119 | IRIS_313-12006 | japonica | Malaysia | 0.2282 | 0.0108 |
| B025 | japonica | Indonesia | 0.2334 | 0.0112 | IRIS_313-9366 | japonica | United States of America | 0.2282 | 0.0112 |
| B169 | japonica | Japan | 0.2331 | 0.0117 | IRIS_313-11652 | japonica | China | 0.2281 | 0.0115 |
| CX353 | japonica | Vietnam | 0.2331 | 0.0116 | IRIS_313-11923 | japonica | Thailand | 0.2280 | 0.0113 |
| IRIS_313-7850 | japonica | Madagascar | 0.2331 | 0.0118 | IRIS_313-11890 | japonica | Taiwan | 0.2278 | 0.0113 |
| B117 | japonica | China | 0.2330 | 0.0118 | B037 | japonica | Argentina | 0.2273 | 0.0110 |
| B199 | japonica | China | 0.2329 | 0.0116 | IRIS_313-7850 | japonica | Madagascar | 0.2272 | 0.0117 |
| IRIS_313-12266 | japonica | Myanmar | 0.2327 | 0.0120 | B025 | japonica | Indonesia | 0.2268 | 0.0109 |
| IRIS_313-11755 | japonica | Liberia | 0.2326 | 0.0116 | IRIS_313-11755 | japonica | Liberia | 0.2268 | 0.0115 |
| IRIS_313-11890 | japonica | Taiwan | 0.2323 | 0.0119 | IRIS_313-11928 | japonica | Philippines | 0.2268 | 0.0113 |
| IRIS_313-11652 | japonica | China | 0.2318 | 0.0118 | IRIS_313-12348 | japonica | Lao | 0.2266 | 0.0115 |
| CX352 | japonica | China | 0.2310 | 0.0113 | CX307 | japonica | China | 0.2260 | 0.0112 |
| Character: Fertility | | | | | Character: Seed size | | | | |
| B166 | japonica | North Korea | 0.2558 | 0.0102 | B166 | japonica | North Korea | 0.2461 | 0.0104 |
| IRIS_313-11582 | japonica | China | 0.2482 | 0.0108 | IRIS_313-11582 | japonica | China | 0.2434 | 0.0101 |
| CX389 | japonica | China | 0.2432 | 0.0103 | IRIS_313-7863 | japonica | Brazil | 0.2407 | 0.0109 |
| IRIS_313-7856 | japonica | Thailand | 0.2417 | 0.0114 | CX389 | japonica | China | 0.2406 | 0.0099 |
| IRIS_313-12006 | japonica | Malaysia | 0.2393 | 0.0110 | IRIS_313-12330 | japonica | Lao | 0.2382 | 0.0108 |
| IRIS_313-12330 | japonica | Lao | 0.2392 | 0.0112 | IRIS_313-11540 | japonica | Guinea | 0.2368 | 0.0116 |
| IRIS_313-7863 | japonica | Brazil | 0.2381 | 0.0116 | B144 | japonica | China | 0.2353 | 0.0104 |
| IRIS_313-12226 | japonica | Lao | 0.2375 | 0.0115 | IRIS_313-12063 | japonica | Lao | 0.2347 | 0.0117 |
| IRIS_313-11540 | japonica | Guinea | 0.2374 | 0.0121 | IRIS_313-12006 | japonica | Malaysia | 0.2344 | 0.0105 |
| IRIS_313-12266 | japonica | Myanmar | 0.2366 | 0.0120 | IRIS_313-7856 | japonica | Thailand | 0.2344 | 0.0110 |
| B101 | japonica | China | 0.2359 | 0.0113 | IRIS_313-12266 | japonica | Myanmar | 0.2332 | 0.0121 |
| B144 | japonica | China | 0.2342 | 0.0107 | IRIS_313-12226 | japonica | Lao | 0.2326 | 0.0111 |
| IRIS_313-11652 | japonica | China | 0.2338 | 0.0122 | IRIS_313-9366 | japonica | United States of America | 0.2326 | 0.0113 |
| CX307 | japonica | China | 0.2338 | 0.0115 | IRIS_313-8046 | japonica | Italy | 0.2324 | 0.0118 |
| IRIS_313-9366 | japonica | United States of America | 0.2336 | 0.0117 | IRIS_313-11652 | japonica | China | 0.2324 | 0.0117 |
| IRIS_313-8046 | japonica | Italy | 0.2335 | 0.0127 | IRIS_313-7850 | japonica | Madagascar | 0.2322 | 0.0117 |
| IRIS_313-12063 | japonica | Lao | 0.2335 | 0.0120 | B199 | japonica | China | 0.2322 | 0.0111 |
| B199 | japonica | China | 0.2331 | 0.0119 | IRIS_313-11923 | japonica | Thailand | 0.2320 | 0.0121 |
| B117 | japonica | China | 0.2328 | 0.0121 | CX353 | japonica | Vietnam | 0.2314 | 0.0110 |
| IRIS_313-11923 | japonica | Thailand | 0.2324 | 0.0121 | B025 | japonica | Indonesia | 0.2312 | 0.0108 |
| B025 | japonica | Indonesia | 0.2317 | 0.0115 | CX352 | japonica | China | 0.2310 | 0.0111 |
| IRIS_313-11571 | japonica | China | 0.2312 | 0.0119 | B117 | japonica | China | 0.2302 | 0.0117 |
| CX352 | japonica | China | 0.2311 | 0.0120 | IRIS_313-11755 | japonica | Liberia | 0.2300 | 0.0117 |
| IRIS_313-7850 | japonica | Madagascar | 0.2310 | 0.0123 | B101 | japonica | China | 0.2298 | 0.0110 |
| IRIS_313-11908 | japonica | China | 0.2310 | 0.0129 | IRIS_313-12348 | japonica | Lao | 0.2295 | 0.0117 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Fu, Y.-B. Genetically Distinct Rice Lines for Specific Characters as Revealed by Gene-Associated Average Pairwise Dissimilarity. Crops 2024, 4, 636-650. https://doi.org/10.3390/crops4040044
