Microsatellite Dataset for Cultivar Discrimination in Spring Orchid (Cymbidium goeringii)

Cymbidium goeringii Reichb. fil., locally known as the spring orchid in the Republic of Korea, is one of the most important and popular horticultural species in the family Orchidaceae. C. goeringii cultivars originated from plants with rare phenotypes in wild mountains where pine trees commonly grow. This study aimed to determine the cultivar-specific combined genotypes (CGs) of short sequence repeats (SSRs) by analyzing multiple samples per cultivar of C. goeringii. We collected more than 4000 samples from 67 cultivars and determined the genotypes of 12 SSRs. Based on the most frequent combined genotypes (CG1s), the average observed allele number and combined matching probability were 11.8 per marker and 3.118 × 10⁻¹¹, respectively. Frequencies of the CG1 in 50 cultivars (n ≥ 10) ranged from 40.9% to 100.0%, with an average of 70.1%. Assuming that individuals with the CG1 are genuine in the corresponding cultivars, approximately 30% of C. goeringii on the farms and markets may not be genuine. The dendrogram of the phylogenetic tree and principal coordinate analysis largely divided the cultivars into three groups according to their countries of origin; however, the genetic distances were not great among the cultivars. In conclusion, this dataset of C. goeringii cultivar-specific SSR profiles could be used for ecogenetic studies and forensic authentication. This study suggests that genetic authentication should be introduced for the sale of expensive C. goeringii cultivars. We believe that this study will help establish a genetic method for the forensic authentication of C. goeringii cultivars.


Introduction
Cymbidium goeringii Reichb. fil., belonging to the family Orchidaceae, is one of the most important and popular horticultural species in East Asia [1,2]. C. goeringii is locally known as the spring orchid in the Republic of Korea because it blooms in early spring. C. goeringii cultivars are classically divided into two types: the "flower-variant cultivar" showing characteristic phenotypes of flower color and shape and the "leaf-variant cultivar" showing characteristic phenotypes of leaf color or variegation pattern and shape (Figure 1). Plants showing characteristic phenotypes in both the leaf and flower are usually called "double-variant cultivars" [3]. In the Republic of Korea, thousands of C. goeringii cultivars have been registered by two orchid registration organizations: the Korea Orchid Registration Association (KORA; http://www.koreso.com/, accessed on 31 May 2023) and the Registration Committee of the Korea Orchid Union (RCKOU; http://www.kour.or.kr/, accessed on 31 May 2023).
C. goeringii cultivars with horticulturally rare phenotypes are actively traded commercially at high prices through direct sales between sellers and purchasers or online systems. Depending on the cultivar, prices vary widely from a few USD to hundreds of thousands of USD. Orchid cultivators and purchasers frequently worry about non-genuine cultivars, in which plants belonging to different cultivars are provided instead of the real cultivars. C. goeringii cultivars originally grew in wild mountains, where pine trees commonly grow. Each spring orchid showing an unusual unique phenotype was selected by orchid collectors and registered as a specific cultivar. Cymbidium species usually reproduce through self-pollination in wild fields [4,5]; however, the number of individuals of each cultivar was increased through asexual vegetative propagation, that is, by separating the shoots of an individual plant. Theoretically, all individuals originating from a particular cultivar strain are genetically identical when mutations that have occurred after cultivar fixation are ignored. Therefore, forensic discrimination can be used to determine whether two C. goeringii plants originate from a common cultivar, using microsatellite profiling [3,6]. This application is similar to microsatellite genotyping and the comparison of genetic profiles between suspects and the evidence collected at crime scenes.
When a new C. goeringii cultivar was registered, whole orchid photograph(s) and phenotypic features were provided to the orchid registration organizations (KORA or RCKOU); however, neither a tissue sample nor genetic information was deposited. Therefore, tracking the exact origin of a particular cultivar is difficult. In addition, determining which of the plants with similar phenotypes but different genetic profiles are from the original cultivar is challenging. In the case of the leaf-variant types, the origin could be predicted from the differences in leaf shape, color, and variegation patterns, but it is difficult to predict originality from the phenotypes of the flower-variant types. This is because blooming is observed only in spring and the subtle features of flower colors and shapes differ depending on the cultivation conditions. A recent study suggested that approximately 40% of the purchased spring orchids (with tagged names) among the 10 flower-variant cultivars in the Republic of Korea were not genuine cultivars [3].

Collection of C. goeringii Samples
In this study, 4048 Cymbidium samples were collected from 67 cultivars (Table 1). Among these, 61 cultivars (n = 3957) were originally collected from the mountains of the Republic of Korea, whereas six cultivars (n = 91) originated from Japan and China. Most samples were flower-variant cultivars; some were leaf- or double-variant cultivars. Five C. goeringii cultivars from Japan were collected for genotype comparisons and phylogenetic analyses. The Chinese cultivar Hwanguhajeong, which belongs to C. forestii, was sampled as an outgroup. In addition to the established cultivar samples, 155 wild C. goeringii samples were collected from the mountains of the Republic of Korea.

DNA Purification
Leaves or roots of the orchid plants were cut into 0.5-1 cm long fragments and disrupted using a TissueLyser II (Qiagen, Hilden, Germany). Genomic DNA was purified from the disrupted samples using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany). DNA concentration was determined using a NanoDrop 2000 (Thermo Fisher Scientific, Wilmington, NC, USA).

Phylogenetic and Sibling Analysis
Genetic distances among cultivars were determined from similarities in CGs using GenAlEx (v6.5) [21]. A dendrogram of the phylogenetic tree was constructed from the dissimilarity matrix using the unweighted pair group method with arithmetic averages (UPGMA). In addition, the dissimilarity matrix was used to perform principal coordinate analysis (PCoA), which graphically represents the genetic relationships among C. goeringii cultivars. The sibling probability using CG profiles was determined using Bayes' theorem [22].
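As a rough illustration of the clustering step, the sketch below computes a simple allele-sharing dissimilarity between combined genotypes and clusters the profiles by UPGMA. The distance definition, the function names, and the toy three-marker profiles are assumptions made for illustration only; GenAlEx computes its own genetic distance, which may differ from the one used here.

```python
from itertools import combinations


def cg_dissimilarity(cg_a, cg_b):
    """Fraction of alleles NOT shared between two combined genotypes.

    Each CG is a list of per-marker genotypes, e.g. [(11, 13), (8, 8)].
    Assumption: a plain allele-sharing distance, not the GenAlEx metric.
    """
    shared = 0
    total = 0
    for (a1, a2), (b1, b2) in zip(cg_a, cg_b):
        remaining = [b1, b2]
        for allele in (a1, a2):
            if allele in remaining:
                remaining.remove(allele)  # each allele copy matches once
                shared += 1
        total += 2
    return 1.0 - shared / total


def upgma(names, dist):
    """UPGMA clustering; `dist` maps (name_a, name_b) -> distance.

    Returns a nested tuple tree of the form (left, right, height).
    """
    clusters = {name: (name, 1) for name in names}  # id -> (tree, leaf count)
    d = {frozenset(pair): value for pair, value in dist.items()}
    next_id = 0
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        a, b = min(combinations(list(clusters), 2),
                   key=lambda pair: d[frozenset(pair)])
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        height = d[frozenset((a, b))] / 2
        new = f"node{next_id}"
        next_id += 1
        # Size-weighted mean keeps every leaf pair equally weighted.
        for c in clusters:
            d[frozenset((new, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[new] = ((tree_a, tree_b, height), size_a + size_b)
    (tree, _size), = clusters.values()
    return tree


# Hypothetical profiles (three markers each), not data from this study.
profiles = {
    "cultivar_A": [(11, 13), (8, 8), (15, 17)],
    "cultivar_B": [(11, 13), (8, 9), (15, 17)],
    "cultivar_C": [(10, 12), (7, 9), (14, 16)],
}
names = list(profiles)
dist = {(a, b): cg_dissimilarity(profiles[a], profiles[b])
        for a, b in combinations(names, 2)}
tree = upgma(names, dist)
```

In this toy input, cultivar_A and cultivar_B differ by a single allele and are joined first; cultivar_C shares almost nothing with either and joins last.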

Statistical Analysis
Allele frequencies were calculated by counting the number of observed alleles in all examined samples. Reference allele frequencies were obtained from a wild Korean C. goeringii population (n = 155). The combined matching probability (CMP) for the CGs of the 12 SSRs was calculated using the PowerStatsV12 program (Promega, Madison, WI, USA). A simple program based on MATLAB (MathWorks, Natick, MA, USA) was designed to search for similar CGs from a pool of several thousand CGs.
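The counting and search steps described above could be sketched as follows. This is a Python illustration under stated assumptions, not the MATLAB program used in the study: the function names, the representation of a failed genotype as None, the rule that missing genotypes are not counted as mismatches, and the toy profiles are all hypothetical.

```python
from collections import Counter


def allele_frequencies(samples, marker_index):
    """Relative frequency of each allele at one marker.

    `samples` is a list of CGs; each CG is a list of (allele1, allele2)
    tuples, with None standing in for a failed genotype (assumption).
    """
    counts = Counter()
    for cg in samples:
        genotype = cg[marker_index]
        if genotype is not None:
            counts.update(genotype)  # count both allele copies
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def find_similar(query, pool, max_mismatches=1):
    """Return (label, mismatches) for pooled CGs that differ from `query`
    at no more than `max_mismatches` markers, closest first."""
    hits = []
    for label, cg in pool.items():
        mismatches = 0
        for q, g in zip(query, cg):
            if q is None or g is None:
                continue  # assumption: missing genotypes are not mismatches
            if sorted(q) != sorted(g):  # allele order within a marker is ignored
                mismatches += 1
        if mismatches <= max_mismatches:
            hits.append((label, mismatches))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical three-marker profiles, not data from this study.
pool = {
    "sample_1": [(11, 13), (8, 8), (15, 17)],
    "sample_2": [(13, 11), (8, 9), (15, 17)],
    "sample_3": [(10, 12), (7, 9), None],
}
freqs = allele_frequencies(list(pool.values()), 1)
hits = find_similar(pool["sample_1"], pool, max_mismatches=1)
```

Allowing a small number of mismatching markers is one way to surface near-identical profiles, such as possible close relatives, alongside exact matches.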

Determination of Combined Genotypes
This study determined the CGs of 12 microsatellite markers for 61 Korean cultivars (3957 samples) and 6 Japanese and Chinese cultivars (91 samples). In principle, alleles were named by the repeat number of SSR core units, as described earlier [3,6,9]. Most markers were genotyped successfully, but genotyping of some samples failed for markers such as CG649, CG787, CG1023, and CG1085. The failed markers, even when retested from the same plants, were not amplified or were poorly amplified by PCR, suggesting variation in the primer binding sites.
The most and second most frequent CGs (indicated by CG1 and CG2, respectively) and their CMPs are shown in Table S1. Based on CG1, the average observed allele number was 11.8 per marker, with the highest number (16) in CG649 and CG709 and the lowest number (8) in CG1320. The average CMP was 3.118 × 10⁻¹¹, ranging from 8.890 × 10⁻¹⁰ for Daehongbo to 4.496 × 10⁻³⁶ for Cheongoksan. This powerful discrimination made it possible to determine whether a C. goeringii individual labeled as a certain cultivar was genuine.
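For orientation, the matching probability of a single combined genotype is commonly computed in forensic genetics as the product of per-marker genotype frequencies, assuming Hardy-Weinberg proportions and independence between markers. The sketch below follows that common formulation; whether PowerStats applies exactly this calculation is an assumption, and the function name and allele frequencies are made up for illustration.

```python
def profile_match_probability(cg, allele_freqs):
    """Product of Hardy-Weinberg genotype frequencies across markers.

    cg: list of (allele1, allele2) tuples, one per marker.
    allele_freqs: list of dicts (one per marker) mapping allele -> frequency.
    """
    probability = 1.0
    for (a1, a2), freqs in zip(cg, allele_freqs):
        p, q = freqs[a1], freqs[a2]
        # Homozygote: p^2; heterozygote: 2pq.
        probability *= p * p if a1 == a2 else 2 * p * q
    return probability


# Hypothetical two-marker example, not frequencies from this study.
allele_freqs = [{11: 0.2, 13: 0.1}, {8: 0.5}]
cg = [(11, 13), (8, 8)]
cmp_value = profile_match_probability(cg, allele_freqs)  # (2*0.2*0.1) * 0.5**2
```

Because the per-marker terms multiply, each additional polymorphic marker shrinks the probability, which is how 12 markers can reach values as small as those reported above.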

Determination of Cultivar-Specific Combined Genotypes
Only limited information, such as short phenotypic descriptions and photographs, is available for registered orchid cultivars in the Republic of Korea. Therefore, it is difficult to determine which genotype is genuine for two samples with different genotype profiles. If the CGs were the same in the two samples, they might have originated from an identical plant; otherwise, they might have originated from different plants. For a specific cultivar, the most frequent CG (CG1) among the samples that belonged to the specific cultivar was determined as the representative genotype profile of the corresponding cultivar [3].
In this study, 50 Korean cultivars with a sample size of 10 or more were analyzed for the frequencies of CG1s (Table 2). The frequencies of the CG1s were higher than 50% in most examined cultivars, except for Cheonhwangso (41.0%), Hallasan (47.5%), Sacheonwang (40.9%), and Geumsusan (45.5%). In particular, the second most frequent CGs (CG1B) were observed at relatively high frequencies of 29.8% in Cheonhwangso and 30.0% in Hallasan. When the predominant and second most frequent CGs were compared in the two cultivars (Cheonhwangso: CG1A: 11 ; matching alleles are underlined), they were considerably similar to each other with high sibling probabilities of 71.016% and 99.997%, suggesting that the plants carrying the two CGs in each cultivar were genetically close relatives. The horticultural phenotypes of plants with CG1A and CG1B were indistinguishable in both cultivars. Therefore, we concluded that the individuals with either CG1A or CG1B were genuine for the corresponding cultivars. The sum of the frequencies of CG1A and CG1B was 70.8% in Cheonhwangso and 77.5% in Hallasan. Among the 50 cultivars (n ≥ 10), the frequency of the CG1 ranged from 40.9% (Sacheonwang) to 100.0% (Jinjusu) with an average of 70.1% (Figure 2). Assuming that individuals with the CG1 are genuine to the corresponding cultivars, approximately 30% of C. goeringii on the farms and markets may not be genuine. The average frequency of the second most frequent combined genotype (CG2) was 7.3%. The CG1s observed in 12 cultivars with fewer than 10 samples were considered insufficient for assigning representative SSR profiles for those cultivars (Table S1). Therefore, we considered them as "probable CGs".
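The selection of CG1 and CG2 for a cultivar amounts to ranking the distinct profiles among its samples by frequency. A minimal sketch, assuming each sample is stored as a list of per-marker allele pairs; the function names and the ten toy samples are hypothetical:

```python
from collections import Counter


def normalize(cg):
    """Make a CG hashable and independent of allele order within a marker."""
    return tuple(tuple(sorted(genotype)) for genotype in cg)


def most_frequent_cgs(cgs, top=2):
    """Return [(cg, percent), ...] for the `top` most frequent CGs."""
    counts = Counter(normalize(cg) for cg in cgs)
    total = sum(counts.values())
    return [(cg, 100.0 * n / total) for cg, n in counts.most_common(top)]


# Hypothetical two-marker profiles for ten plants sold under one cultivar name.
profile_x = [(11, 13), (8, 8)]
profile_y = [(11, 12), (8, 9)]
profile_z = [(10, 10), (7, 9)]
samples = [profile_x] * 7 + [profile_y] * 2 + [profile_z]

ranking = most_frequent_cgs(samples)
cg1, cg1_percent = ranking[0]  # most frequent profile and its share
```

With only a handful of samples the top-ranked profile is easily dominated by chance, which is consistent with treating the CG1s of cultivars with fewer than 10 samples as probable rather than representative.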

Tracing Cultivar Origin for Samples Assumed to Be Non-Genuine
This study traced the actual cultivars in 29.9% of samples that were suggested to be non-genuine. Most of them did not belong to the 66 C. goeringii cultivars examined in this study; however, some were determined to be CG1 of other cultivars (Table 3).
In samples of the cultivar Hwanggeumso, the CG1s of seven other cultivars were detected (Gwaneum, Youngchoonso, Namhaeso, Cheonhwangso, Geumhaso, Chanbo, and Hobakjeon). The second most frequent CG (CG2) in Hwanggeumso was identical to the CG1 of Gwaneum (observed 50 times). In addition, the CG1s of Youngchoonso (seven times) and Namhaeso (five times) were frequently observed in the samples labeled as Hwanggeumso. In the samples collected from the cultivar Gwaneum, five types of CGs were identical to the CG1s of different cultivars (Youngchoonso, Cheongeumso, Geumhaso, Hobakjeon, and Chanbo). Among these, Youngchoonso (thirty times) and Hobakjeon (five times) were frequently observed. The CG1 of Gwaneum was identified six times in the cultivar Cheonhwangso. In the samples of cultivars Cheonsoo and Cheonsa, the CG1 of the Japanese cultivar Chanbo was observed ten times and five times, respectively. In the samples of the cultivar Agassi, the CG1 of Jinjusu was identified seven times. In the samples of the cultivar Hongdaewang, the CG1 of Jangdan was identified eight times. In the samples of the cultivar Munsubong, the CG1 of Hyangsu was identified six times.
The CG1s of the Japanese cultivars Chanbo and Hobakjeon were observed quite frequently in the samples of several different cultivars: Chanbo in Hwanggeumso (three times), Gwaneum (one time), Cheonsoo (ten times), and Cheonsa (five times); and Hobakjeon in Hwanggeumso (one time) and Gwaneum (five times). For Korean cultivars, the CG1 of Youngchoonso was observed in the samples from three cultivars (Hwanggeumso, Gwaneum, and Cheongeumso); Gwaneum, in two cultivars (Hwanggeumso and Cheongeumso); and Geumhaso, in two cultivars (Hwanggeumso and Gwaneum). Interestingly, the CG1 of the Chinese cultivar Hwanguhajeong, which belongs to C. forestii, was observed once in the samples collected as the cultivar Wonmyoung.
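The tracing described in this section can be viewed as a lookup of each sample's CG against a reference table of cultivar CG1s. The sketch below illustrates that idea only; the function names, the status labels, the exact-match rule, and the two-marker reference profiles are assumptions and do not reproduce the study's data.

```python
def normalize(cg):
    """Make a CG hashable and independent of allele order within a marker."""
    return tuple(tuple(sorted(genotype)) for genotype in cg)


def trace_origin(sample_cg, labeled_as, cg1_by_cultivar):
    """Compare one sample with the CG1 of its labeled cultivar, then with
    the CG1s of all other reference cultivars."""
    key = normalize(sample_cg)
    if cg1_by_cultivar.get(labeled_as) == key:
        return ("matches labeled cultivar", labeled_as)
    for cultivar, cg1 in cg1_by_cultivar.items():
        if cg1 == key:
            return ("matches other cultivar", cultivar)
    return ("no match in reference set", None)


# Hypothetical reference CG1 table (two markers per profile).
reference = {
    "cultivar_X": normalize([(11, 13), (8, 8)]),
    "cultivar_Y": normalize([(11, 12), (8, 9)]),
}

result_same = trace_origin([(13, 11), (8, 8)], "cultivar_X", reference)
result_other = trace_origin([(12, 11), (9, 8)], "cultivar_X", reference)
result_none = trace_origin([(10, 10), (7, 9)], "cultivar_X", reference)
```

Samples falling in the last category correspond to the majority case reported here, in which the non-genuine plant did not match any of the examined cultivars.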

Phylogenetic Analysis among Cultivars
A phylogenetic tree was prepared from the genetic distance matrix of the CG1s from the 67 cultivars (Figure 3). The dendrogram roughly divided the cultivars into three groups according to their origin. Cultivars of Korean origin were separated from Japanese cultivars (Hobakjeon, Changseongjihwa, Sumunsan, Jilbugeum, and Chanbo) and the C. forestii cultivar (Hwanguhajeong) originating from China. The C. goeringii cultivars were also separated by the principal coordinate analysis (PCoA). The first three principal coordinates accounted for cumulative variances (cum %) of 11.83%, 18.52%, and 25.05% (Figure 4). As in the phylogenetic dendrogram, the PCoA roughly divided the cultivars into three groups according to their origin. However, the Korean cultivar Hwansaeng was included in the cluster of Japanese cultivars, and some Korean cultivars, such as Cheonunso and Jinna, were located near the Japanese cultivars. In both the phylogenetic tree and PCoA, the cultivars were poorly separated by variant type.

Discussion
Each established cultivar of C. goeringii is asexually propagated by dividing the shoots of a single plant; therefore, individuals constituting a cultivar are genetically identical clones in principle. In this study, we determined the CGs of 12 SSR markers for more than 60 cultivars of C. goeringii using multiplex PCR and subsequent CG profiling. We previously reported CGs of 10 Korean flower-variant cultivars [3]. The present study determined the CGs and their phylogenetic distribution in a large-scale collection of more than four thousand samples. This study revealed a very strong power of discrimination and polymorphic information, with an average CMP of 3.118 × 10⁻¹¹; this implies that the possibility of the exact same genotype profile in two randomly chosen C. goeringii samples is less than 1 in 20 billion. Therefore, we believe that profiling the 12 SSR markers is an excellent forensically applicable method to discriminate cultivars (or individuals) and analyze phylogenetic relationships. In addition, the application of this method could be extended as a cultivar discrimination tool for other Cymbidium species such as C. sinense, C. faberi, C. ensifolium, and C. kanran.
In phylogenetic and pedigree construction or linkage analyses, the allele names of microsatellites are usually expressed as the relative repeat numbers of short sequence units in the given samples. This study named alleles based on the absolute values of repeat numbers, similar to the convention used in forensic genetics and criminal investigations [20]. Therefore, the C. goeringii SSR dataset from this study will enable comparative research by ensuring data compatibility among research groups; in addition, it will serve as a standardized reference SSR database. For all examined cultivars, the most common SSR profiles (CG1) and the second most frequent profiles (CG2) with allele names indicated by repeat numbers are presented in Table S1.
The most frequent CGs (CG1s) were observed in 2664 samples from the 50 cultivars with 10 or more samples (n = 3923), a pooled frequency of 67.9%. The mean of the per-cultivar CG1 frequencies was similar, at 70.1%. If we assume that only samples with CG1 are genuine to the corresponding cultivars, approximately 30% of the spring orchid cultivars in Korean markets and farms may not be genuine.
Non-genuine sales of spring orchids usually involve cultivars showing similar horticultural phenotypes but different prices. The cultivar Hwanggeumso is subject to frequent non-genuine sales. Hwanggeumso has a superior phenotype among the class of yellow flowers with a non-anthocyanin white lip, and its price has remained steady at approximately USD 5000. The most frequently observed non-genuine cultivar of Hwanggeumso is Gwaneum. The price of Gwaneum, which has yellow flowers with a non-anthocyanin white lip, is approximately USD 1000. In addition, Youngchoonso is frequently observed as a non-genuine cultivar of Hwanggeumso and Gwaneum. Youngchoonso also belongs to the class of yellow flowers with non-anthocyanin white lips. However, its price is approximately USD 300, which is much cheaper than that of Hwanggeumso and Gwaneum. In particular, two Japanese cultivars (Chanbo and Hobakjeon) are sold as Korean cultivars. These two Japanese cultivars are far cheaper than their Korean counterparts, such as Hwanggeumso, Gwaneum, Cheonsoo, and Cheonsa.
The frequencies of the second most frequent CGs were generally low, below 15%; however, they were high in two cultivars, Cheonhwangso (29.8%) and Hallasan (30.0%). In Cheonhwangso and Hallasan, the most common CG (CG1A) and the second most common CG (CG1B) showed similar profiles with high sibling probabilities. In addition, the horticultural phenotypes of the plants having these different CGs were so similar that they were indistinguishable. Therefore, these were pairs of genetically close sister cultivars. Orchid cultivars collected from nearby wild locations that exhibit similar phenotypes are traditionally called sister cultivars. Nam et al. determined the genetic kinship between four groups of closely related sister cultivars of C. goeringii [6]. If individual plants with CG1A or CG1B in Cheonhwangso and Hallasan were sister cultivars, it would be difficult to determine which plant was the original. This gives rise to the question of whether plants with both CG1A and CG1B should be considered genuine cultivars of Cheonhwangso or Hallasan.
The phylogenetic tree based on SSR genotypes divided the cultivars into three groups according to their country of origin. The 61 Korea-origin cultivars were largely separated from the five Japan-origin cultivars and the China-origin C. forestii cultivar in the phylogenetic tree. These results were consistent with those of a previous study [5]. However, the genetic distances among the C. goeringii cultivars seemed to be close; even C. forestii from China was not far from the cultivars of C. goeringii. In the PCoA, C. forestii was located between the Korean and Japanese cultivar groups but was closer to the Japanese group. Several Korean cultivars were located around a cluster of Japanese cultivars.
The complete nuclear genome sequence of C. goeringii is not available; however, sequences of the full chloroplast genome (cpDNA) of C. goeringii and a hybrid of C. goeringii and C. sinense are available [23,24]. In addition, recent RNA sequencing and transcriptomic analyses have suggested molecular mechanisms for phenotyping leaf color, floral patterning, and scent [25-28]. If genetic information on cpDNA variation and RNA expression is added to the SSR profile dataset in future studies, much more reliable cultivar discrimination and phylogenetic characterization can be performed in Cymbidium species.

Conclusions
We examined 61 Korean cultivars of C. goeringii by genotyping 12 SSR markers for forensic genetic discrimination. The established dataset for C. goeringii cultivar-specific SSR profiles could be used for ecogenetic studies and forensic authentication. This study revealed that almost 30% of C. goeringii in the market may not be genuine. Therefore, we suggest that genetic authentication should be introduced for the sale of expensive C. goeringii cultivars. In addition, we suggest the preparation of guidelines for the DNA deposition and profiling of SSR genotypes in newly registered C. goeringii cultivars. We believe that this study will help establish a genetic method for the forensic authentication and phylogenetic analysis of C. goeringii cultivars.

Figure 2. Frequencies of combined genotypes of 12 SSR markers in 50 C. goeringii cultivars with 10 or more samples (CG1: most frequent combined genotypes, CG2: second most frequent combined genotypes). The red dotted line represents the mean frequency of CG1.

Figure 3. Phylogenetic tree of 66 cultivars in C. goeringii based on the genetic distance measured from genotypes of 12 SSR loci using the unweighted pair group method with arithmetic average (UPGMA) as the clustering method. The five cultivars within a yellow box are Japan-origin C. goeringii, while Hwanguhajeong, within a blue box, is China-origin C. forestii.

Table 2. Combined genotypes of 12 SSRs observed in C. goeringii cultivars with 10 or more samples.