citSATdb: Genome-Wide Simple Sequence Repeat (SSR) Marker Database of Citrus Species for Germplasm Characterization and Crop Improvement

Microsatellites, or simple sequence repeats (SSRs), are popular co-dominant markers that play an important role in crop improvement. Citrus is the world’s most widely cultivated fruit crop. To enhance genomic resources in horticulture, we identified SSRs in the genomes of eight citrus species and characterized their frequency and distribution in different genomic regions. We have implemented a microsatellite database, citSATdb, which holds the largest number (~1,296,500) of putative SSR markers from the genus Citrus, represented by eight species. The database follows a three-tier design using MySQL, PHP, and Apache. The markers can be searched using multiple parameters, including chromosome/scaffold number(s), motif type, number of repeat nucleotides (1–6), SSR length, pattern of repeat motifs, and chromosome/scaffold location. The cross-species transferability of selected markers can be checked using e-PCR. Further, the markers can be visualized using the JBrowse feature. These markers can be used for distinctness, uniformity, and stability (DUS) tests for variety identification, marker-assisted selection (MAS), gene discovery, QTL mapping, and germplasm characterization. citSATdb represents a comprehensive source of markers for developing and implementing new approaches for molecular breeding, required to enhance Citrus productivity. The potentially polymorphic SSR markers identified by cross-species transferability could be used for genetic diversity and population distinction studies in other species.


Introduction
The genetic selection of plants in conventional plant breeding is decided by the parents and influenced by different environmental conditions [1]. In conventional plant breeding, alleles are mixed over the generations, resulting in the development of new combinations, which helps in achieving higher trait value through selection. Developing a variety of woody plant species through traditional breeding may take 10-12 years [2]. This time period can be reduced by performing marker-assisted selection (MAS) on seedling material [3]. In recent years, marker-assisted selection has become popular in breeding programs for many crops [4][5][6][7]. One of the pre-requisites for using MAS is the discovery of DNA-based markers, which are tightly linked to the target trait of interest. Microsatellite (SSR) markers have been the system of choice for quantitative trait loci (QTL) mapping in many crops for a

In Silico Simple Sequence Repeat Mining and Primer Designing
SSRs were identified in the genomes of eight citrus species (Table 1). A Perl script (miSATminer) was written to identify repeat motifs in a genome sequence. Microsatellites were called with a minimum of 10 repeat units for mono-nucleotide motifs and 5 repeat units for di-, tri-, tetra-, penta-, and hexa-nucleotide motifs. In-house Perl scripts were used to fetch the flanking regions of the identified SSRs for primer designing. Primer3 executables were used to design primers with the following default parameters: melting temperature, 55-65 °C; GC content, 40-60%; primer length, 18-27 bp; and product size, 150-280 bp [43].
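The repeat thresholds above can be illustrated with a short Python sketch. This is not miSATminer itself (which is a Perl script whose internals are not described here); it is a minimal regular-expression approach that assumes perfect, uninterrupted repeats over the A/C/G/T alphabet. The names `MIN_REPEATS`, `is_primitive`, and `find_ssrs` are hypothetical.

```python
import re

# Minimum repeat counts taken from the text: 10 for mono-nucleotide motifs,
# 5 for di- to hexa-nucleotide motifs.
MIN_REPEATS = {1: 10, 2: 5, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not a repeat of a shorter unit ('ATAT' is not primitive)."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, end, motif, repeat_count) tuples, 0-based half-open."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One captured motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs such as 'AA' or 'ATAT', which belong to a smaller class.
            if is_primitive(motif):
                count = (match.end() - match.start()) // size
                hits.append((match.start(), match.end(), motif, count))
    return sorted(hits)
```

For example, `find_ssrs("GGC" + "A" * 12 + "CGG" + "AT" * 6 + "GCC")` reports one mono-nucleotide and one di-nucleotide locus, whereas a run of nine A's falls below the mono-nucleotide threshold and is not reported.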

Functional Annotation of SSR Markers
The full annotation of gene functions is available for these eight citrus species and was implemented in the Jbrowse genome browser inside the citSATdb database. In this genome browser, markers can be visualized against the reference sequence, gene coordinates, and structural and functional details.

Marker and Database Development Workflow
Microsatellite repeat loci were mined by pattern identification in the genome sequences using miSATminer, our in-house Perl script. This script mines SSR loci from the genome sequences with custom repeat parameters. SSR primers for genotyping were designed using Primer3 executables after extracting flanking sequences of 300 bp upstream and 300 bp downstream of each SSR locus in the genome. Selected repeats can be viewed with their markers in the sequence. e-PCR was implemented in the database to detect polymorphism. In citSATdb, we have provided eight genome assemblies and an option for uploading user sequences to check amplification. Markers with a variable product size in two genomes were considered polymorphic. All the results can be downloaded as a CSV file. The whole workflow of the database is depicted in Figure 1.
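The flank extraction and the product-size criterion for polymorphism can be sketched as follows. This is a simplified illustration, not the e-PCR program used by the database: it assumes exact primer matches on the forward strand only, whereas real e-PCR tools typically tolerate mismatches and search both strands. All function names are hypothetical.

```python
def extract_flanks(genome_seq, ssr_start, ssr_end, flank=300):
    """Return (upstream, repeat, downstream), clipping flanks at the sequence ends."""
    left = max(0, ssr_start - flank)
    right = min(len(genome_seq), ssr_end + flank)
    return (genome_seq[left:ssr_start],
            genome_seq[ssr_start:ssr_end],
            genome_seq[ssr_end:right])


def revcomp(seq):
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def product_size(template, forward, reverse):
    """Amplicon length for an exact-match primer pair, or None if no product."""
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so search for its
    # reverse complement downstream of the forward primer.
    site = revcomp(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return end + len(site) - start


def is_polymorphic(size_a, size_b):
    """A marker is called polymorphic if it amplifies in both genomes with different sizes."""
    return size_a is not None and size_b is not None and size_a != size_b
```

Under these assumptions, a primer pair flanking an (AT)6 tract in one genome and an (AT)8 tract in another yields products that differ by 4 bp, so the marker would be flagged as polymorphic.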

Database Development and Web Interface
The Citrus microsatellite database (citSATdb) is a three-tier relational database developed with a client tier, middle tier, and database tier. Predicted SSRs and their corresponding primers were stored in MySQL data tables and accessed through the Apache server. A user-friendly interface to the database was developed with PHP, HTML5, and jQuery. In silico microsatellite mining was implemented with miSATminer and custom Perl scripts, and Primer3 was used for primer designing. JBrowse was implemented for the visualization of genomic sequences, SSRs, and primers. NCBI BLAST, with local and remote databases, was also implemented for similarity searches. e-PCR was implemented for cross-species transferability. The web server contains seven tabs, viz. Home, About, Species, Tools, JBrowse, Help, and Contact; the database will be updated regularly with newly available genome data.
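The database tier can be pictured with a small sketch of a parameterized marker search. The actual citSATdb schema and its PHP/MySQL code are not given in the text, so everything here is an assumption: the table name `ssr_marker`, its columns, and the function names are hypothetical, and Python's built-in `sqlite3` stands in for MySQL so that the example is self-contained.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with a hypothetical marker schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE ssr_marker (
               id INTEGER PRIMARY KEY,
               species TEXT, chrom TEXT, motif TEXT,
               unit_size INTEGER,   -- 1 (mono) to 6 (hexa)
               ssr_length INTEGER,  -- repeat tract length in bp
               start_pos INTEGER,
               fwd_primer TEXT, rev_primer TEXT)"""
    )
    return conn


def search_markers(conn, species, chrom=None, unit_size=None, min_length=None):
    """Filter markers by the optional search parameters, using bound placeholders."""
    sql = ("SELECT chrom, motif, ssr_length, start_pos, fwd_primer, rev_primer "
           "FROM ssr_marker WHERE species = ?")
    params = [species]
    if chrom is not None:
        sql += " AND chrom = ?"
        params.append(chrom)
    if unit_size is not None:
        sql += " AND unit_size = ?"
        params.append(unit_size)
    if min_length is not None:
        sql += " AND ssr_length >= ?"
        params.append(min_length)
    sql += " ORDER BY chrom, start_pos"
    return conn.execute(sql, params).fetchall()
```

Binding user-supplied values through placeholders, rather than concatenating them into the SQL string, is the usual way for a middle tier to keep a web search form safe from SQL injection.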

Cross-Species Comparison of Citrus Species SSRs
For the development of the Citrus web genomics resource, SSR loci were mined successfully using miSATminer. A total of 1,699,853 putative microsatellites were mined from the genomes of eight Citrus species (Table 2). Previous studies have reported a negative correlation between SSR density and genome size [44]. However, the SSRs identified in our study of eight Citrus species show no correlation between SSR density and genome size. This is in line with some recent findings, which reported no correlation between genome size and SSR density, although genome size differences may influence the degree of microsatellite repetition in the genome [45][46][47][48][49][50].
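The comparison of SSR density with genome size can be made concrete with a small sketch. It assumes that density is expressed as SSR loci per megabase and that the relationship is summarized with a Pearson correlation coefficient; the text does not state which statistic was used, and the function names are hypothetical.

```python
import math


def ssr_density(ssr_count, genome_size_bp):
    """SSR loci per megabase of assembled genome."""
    return ssr_count / (genome_size_bp / 1_000_000)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

Applied to the per-species genome sizes and SSR counts from Table 2, a coefficient near zero would correspond to the absence of correlation reported here, and a clearly negative value to the trend described in [44].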

SSR Motifs Characterized by Repeat Length
In all the species, mono-nucleotide repeats were most abundant, followed by di-, tri-, tetra-, penta-, and hexa-nucleotide repeats. Among all the citrus species, the maximum number of mono-nucleotide repeats was found in Fortunella hindsii (152,611), followed by C. ichangensis (Figure 2). From these results, a high abundance of mono-nucleotide repeats was observed in all the genomes, which may be due to the intrinsic limitation of the chemistry of next-generation sequencing (NGS) technology used for data generation [51]. Similarly, di-nucleotide repeats in higher abundance have also been reported in other crops [52,53].

Designed SSR Primers, Motif Characterization by Repeat Length
citSATdb is a comprehensive microsatellite database of Citrus represented by eight species containing 1,296,500 in silico predicted markers. Distribution-wise, mono-nucleotide repeat primers were the most abundant, followed by di-, tri-, tetra-, penta-, and hexa-nucleotide repeat primers. Among the eight species, the maximum number of mono-nucleotide repeat primers was designed in F. hindsii (128,597), followed by C. maxima (120,885) and C. ichangensis. The designed SSR primers can be used for QTL/candidate gene identification, linkage mapping, and germplasm characterization. Varieties with similar morphological characteristics are very difficult to differentiate by phenotypic study alone. To overcome these difficulties, SSR markers have been used in previous studies for variety characterization, trait improvement, linkage mapping, molecular breeding applications, variety development, and phylogenetic and taxonomic comparisons [8,53,54,55]. Similarly, 24 SSR markers were used to assess genetic diversity in 370 Citrus accessions [19]. The putative primers present in citSATdb can be used in rapid genotyping for genetic diversity studies and for differentiating varieties. Varietal differentiation using SSR markers has already been reported in many other crops, such as barley [56], sugarcane [57], eggplant [58], capsicum [59], and sesame [60]. These markers can be further explored for trait improvement against abiotic and biotic stresses. For example, in Satsuma mandarins, SSRs have been used to discover one major QTL for male sterility, and such a QTL can be used in seedless citrus breeding by using flanking-region SSR markers with allele size differences between donor and recipient varieties [61]. Such markers can be used for high-density linkage mapping and the discovery of genes needed to improve specific traits. Using SSRs, a linkage map was developed, and QTL mapping was performed to find loci related to the freezing tolerance of citrus [62].
The availability of whole-genome assemblies of different plant species in the public domain provides an opportunity for the study of cross-species transferability in closely related species. Trait-specific candidate genes may be cloned from different species [63]. In silico cross-species transferability can also be predicted with citSATdb, which can be further used for phylogenetic and diversity studies. A similar use has been reported for diversity analysis in citrus species with a small number of markers [19].

Comparison with Other Databases
Many databases for marker development in plants are publicly available. The Pan-Species Microsatellite Database (PSMD) contains eight Citrus species in its repository, although it lacks some features such as e-PCR, JBrowse, and BLAST. The Plant Microsatellite Database (PMDBase) is another online database, but it has limited options for searching markers by user choice, repeat kind, motif type, location in the genome, etc. In addition, only two species of citrus are present in this database. Similarly, SSRome has only two Citrus species and lacks features such as e-PCR, JBrowse, and BLAST. The citSATdb resource overcomes these limitations and is specifically designed with a user-friendly interface to assist researchers in the horticultural sciences. A detailed comparison of PMDBase, PSMD, SSRome, and citSATdb is presented in Table 6.

citSATdb: Citrus Microsatellite Web-Genomic Resource
The citrus web-genomic resource (citSATdb) was developed successfully using a three-tier architecture. This is a comprehensive microsatellite database of Citrus represented by eight species containing 1,296,500 in silico predicted markers. The web server contains seven tabs, viz. Home, About, Species, Tools, JBrowse, Help, and Contact. The 'Species' tab provides information about the selected species on the left and search options on the right. In silico predicted markers can be searched by selecting genic or genomic regions and chromosome/scaffold, along with motif type, repeat type, length, and location in the genome. The search results provide a visualization of the repeat and flanking primers on the sequence extracted with 500 bp upstream and downstream of the repeat. It also provides an option for e-PCR whereby users can check the in silico amplification of selected primers in the genome or cross-species transferability with user-given sequence(s). All the results can be downloaded as a CSV file. The 'Tools' page provides two tabs: SSR prediction and BLAST. miSATminer was implemented with custom scripts to predict SSRs and design their primers for user input sequences. Standalone BLAST was implemented on the BLAST search page, where users can align their SSR query sequences to genomes. All eight genome sequences can be visualized with gene and SSR coordinates on the genome using the 'JBrowse' tab. The 'Help' tab contains a detailed tutorial for using the database efficiently and a list of frequently asked questions. A detailed workflow of exploring citSATdb and its search features is illustrated in Figure 4.

Conclusions
We report here a comprehensive web genomic resource for the genus Citrus covering three of its commercially important species. citSATdb, accessed freely via the address http://bioinfo.usu.edu/citSATdb/, contains a total of 1,296,500 putative microsatellite DNA markers. Our findings on the cross-species transferability of microsatellite loci among six different species of Citrus can be used to cater to the need for molecular markers, especially for the more than 100 species of the genus Citrus for which there are no whole-genome sequence data available yet. This genomic resource can be of immense use to the global community. It can be used for chromosome-wise microsatellite locus mining and primer designing for non-genic and genic FDM-SSR for rapid genotyping. It can also be used to accelerate polymorphism discovery by e-PCR, thus being economically beneficial and needed in future re-sequencing projects. The database can be used not only for knowledge discovery research, such as QTL and gene mapping, but also for marker-assisted breeding in Citrus germplasm improvement and management.
Author Contributions: N.D. and R.K. formulated and designed the research. N.D. and C.D.L. analyzed the data. N.D. and M.M. designed and constructed the web database. Writing-original draft preparation, N.D.; writing-review and editing, R.K.; visualization, N.D. and R.K.; supervision, R.K.; project administration, R.K.; funding acquisition, R.K. All authors have read and agreed to the published version of the manuscript.

Funding:
The authors acknowledge the support to this study from the faculty start-up funds to RK from the Center for Integrated BioSystems (CIB)/Department of Plants, Soils, and Climate, USU. This research was also supported by the Utah Agricultural Experiment Station (UAES), and approved as journal paper number 9410. The funding body did not play any role in the design of this study, the collection, analysis, and interpretation of data, or in the writing of this manuscript.