Development and Application of EST-SSR Markers Related to Lead Stress Responses in Kenaf Based on Transcriptome Sequencing Data

Kenaf is an important bast fiber crop. To diversify the available kenaf simple sequence repeat (SSR) molecular markers and to generate markers potentially useful for kenaf breeding, we developed expressed sequence tag simple sequence repeat (EST-SSR) markers based on lead-stressed kenaf transcriptome sequencing data and assembled unigene sequences. Additionally, we determined the distribution of the SSRs in the transcriptome and the potential functions of the SSR-containing genes. Moreover, SSR markers in the differentially expressed genes (DEGs) of a protein–protein interaction (PPI) network were screened for polymorphism, and the polymorphic markers were used to examine the genetic diversity and population structure of kenaf germplasm resources. An analysis of 138 kenaf germplasm materials showed that 22 EST-SSR markers could distinguish the kenaf germplasms. These 22 EST-SSR markers enrich the kenaf molecular marker database and provide an important tool for the future genetic improvement of kenaf resistance to lead stress.


Introduction
Kenaf (Hibiscus cannabinus L.) is an important bast fiber plant used as a raw material for hemp spinning and for environmentally friendly natural fiber products [1]. Kenaf can accumulate heavy metals from the soil [2], making it important for the remediation of acidic soils contaminated with heavy metals [3]. Identifying lead-stress-responsive kenaf genes and developing molecular markers for these functional genes can accelerate the screening of heavy-metal-resistant kenaf materials and reduce the workload and time required to select suitable breeding parents.
Simple sequence repeat (SSR) markers are regarded as third-generation genetic markers, following morphological markers and isozyme markers. SSR and SNP markers are currently the two most widely used molecular marker technologies. Like SNP markers, SSR markers are co-dominant; SNP loci are mostly biallelic and easy to automate for high-throughput analysis, whereas SSR loci are multiallelic and therefore far more polymorphic. SSR markers are widely used for genetic diversity evaluation, population structure analysis, genetic relationship identification, and fingerprinting [4].
Molecular markers have been studied and applied in kenaf [7], mainly for genetic diversity analysis, genetic linkage map construction, and the mapping of QTLs for important agronomic traits. The markers used include SRAP, RAPD, AFLP, ISSR, and SSR [5,6]; among them, random amplified polymorphic DNA (RAPD) [8], single nucleotide polymorphism (SNP) [9], and SSR [10] markers have been applied. Nevertheless, the number of kenaf molecular markers remains small. In particular, SNP and SSR markers, the current mainstream technologies, lag far behind those available for other crops, and SSR markers are used much more extensively in flax [11,12], ramie [13], and okra [14]. There is therefore an urgent need to develop new markers, especially SSR and SNP markers, to support kenaf molecular breeding and accelerate its genetic improvement. Because SSRs in expressed sequences lie within transcribed regions, if a gene carrying such an SSR controls an important agronomic trait, sequence variation at an SSR in an exon may cause a loss-of-function mutation that alters that trait. Accordingly, expressed sequence tag simple sequence repeat (EST-SSR) markers may accelerate quantitative trait locus (QTL) mapping and the identification of candidate genes.
During transcriptome-based marker development, the sampling design of a given experiment (for example, responses to different biotic and abiotic stresses, different growth and developmental stages, or different tissues and organs) affects gene expression and therefore which differentially expressed genes (DEGs) are identified. Developing EST-SSR markers from transcriptome sequencing data remains an effective way to rapidly increase the number of kenaf molecular markers, which is still much lower than that available for other crops. To further enrich kenaf molecular markers, we used the previously reported unigene sequences derived from lead-stressed kenaf transcriptome sequencing data [3] to characterize SSRs across the transcriptome and to clarify the potential functions of the unigenes containing SSR motifs.

Materials and SSR Marker Development
Kenaf cultivar H368 was obtained from Professor Defang Li (Institute of Bast Fiber Crops, Chinese Academy of Agricultural Sciences). When the plants reached a height of 9 cm, they were treated with lead at 4000 mg/kg using a Pb(NO3)2 solution. The 4000 mg/kg dose of Pb(NO3)2, calculated relative to the substrate, was applied to each pot in a single application: the Pb(NO3)2 was dissolved in 1000 mL ddH2O and poured into the potted soil. Control plants received the same volume of ddH2O without Pb(NO3)2. The materials used for the transcriptome sequencing analysis were described in a recently published article [3]. The transcriptome data are available in the NCBI SRA (SRR9163842 to SRR9163845). The kenaf germplasm resources used in this study and their geographical origins are listed in Supplementary Table S1. Transcripts were assembled de novo with Trinity using default parameters and then clustered into unigenes with Corset [15]. SSR marker development followed a recently published method [16]. The assembled unigenes were analyzed for SSRs with MISA, and the type and frequency distribution of the SSR motifs were recorded. SSRs were identified using the following minimum repeat numbers: ≥10 for mononucleotides, ≥6 for dinucleotides, and ≥5 for trinucleotides, tetranucleotides, pentanucleotides, and hexanucleotides. Specific SSR primers were designed from the unigene sequences using Primer3. The technical route of this study is shown in Figure 1.
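The motif thresholds above can be illustrated with a minimal, MISA-like scan. This is a sketch under stated assumptions, not MISA itself: it applies only the minimum repeat numbers given in the text, skips motifs that are repeats of a shorter unit (for example `AA` or `ATAT`), and does not merge compound SSRs or reproduce MISA's other settings. The function names are hypothetical.

```python
# Hypothetical MISA-like SSR scan (illustration only, not the MISA program).
# Minimum repeat numbers per motif length, as listed in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit ('AT' yes, 'ATAT' no)."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str):
    """Return sorted (start, motif, repeats) tuples for perfect SSRs meeting MIN_REPEATS.

    Compound SSRs and interrupted repeats are not merged in this sketch.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k <= n:
            motif = seq[i:i + k]
            reps = 1
            # Count consecutive, non-overlapping copies of the motif starting at i.
            while seq[i + reps * k:i + (reps + 1) * k] == motif:
                reps += 1
            if reps >= min_rep and is_primitive(motif) and set(motif) <= set("ACGT"):
                hits.append((i, motif, reps))
                i += reps * k  # jump past the whole repeat tract
            else:
                i += 1
    return sorted(hits)


if __name__ == "__main__":
    demo = "A" * 12 + "GC" + "AG" * 6 + "TC" + "CAT" * 5
    for start, motif, reps in find_ssrs(demo):
        print(f"({motif}){reps} at position {start}")
```

For the demonstration sequence, the scan reports a mononucleotide (A)12, a dinucleotide (AG)6, and a trinucleotide (CAT)5 tract.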

PCR Reaction System, Conditions, and Data Analysis
Based on the primer sequences, fluorescently labeled SSR primers were synthesized with a FAM group added to the 5′ end. The DNA polymerase used was Takara DNA polymerase (TransGen Phi29 DNA Polymerase, catalog number LP101 01). The 20 μL PCR mixture contained 2 μL DNA template, 2 μL buffer, 0.3 μL TransTa, 1.6 μL dNTPs, 12.1 μL ddH2O, and 1 μL each of the forward and reverse primers (concentration 2 μmol/μL). The amplification program was: initial denaturation at 94 °C for 4 min; 35 cycles of denaturation at 94 °C for 30 s, annealing at 56 °C for 90 s, and extension at 72 °C for 1 min; a final extension at 72 °C for 5 min; and a hold at 4 °C. After amplification, 1 μL of PCR product was analyzed by capillary electrophoresis on an ABI 3730xl instrument, and SSR genotypes were called with GeneMapper 4 software.
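The per-reaction volumes above sum to 20 μL (2 μL template plus 18 μL of shared reagents). When many samples are genotyped, the shared reagents are usually pooled into a master mix. The sketch below scales them, assuming a hypothetical 10% pipetting overage and adding the template separately to each well; the reagent labels are descriptive placeholders, not product names.

```python
# Per-reaction volumes (uL) from the 20 uL PCR recipe in the text.
SHARED_REAGENTS_UL = {
    "buffer": 2.0,
    "polymerase": 0.3,
    "dNTP": 1.6,
    "ddH2O": 12.1,
    "forward_primer": 1.0,
    "reverse_primer": 1.0,
}
TEMPLATE_UL = 2.0  # DNA template, pipetted into each well individually


def reaction_volume_ul() -> float:
    """Total volume of one reaction: shared reagents plus template."""
    return sum(SHARED_REAGENTS_UL.values()) + TEMPLATE_UL


def master_mix_ul(n_reactions: int, overage: float = 0.10) -> dict:
    """Volume of each shared reagent for n reactions plus a fractional overage (assumed 10%)."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in SHARED_REAGENTS_UL.items()}


if __name__ == "__main__":
    print(f"one reaction: {reaction_volume_ul():.1f} uL")
    for reagent, volume in master_mix_ul(96).items():
        print(f"{reagent}: {volume} uL")
```

Keeping the template out of the master mix lets the same mix be dispensed into every well before the individual DNA samples are added.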

GO and KEGG Enrichment Analyses
The Enrichment function of TBTools [17] was used to perform GO (level 3) [18] and KEGG [19] enrichment analyses of the SSR-containing unigenes and of the SSR-containing DEGs [20] in the protein–protein interaction (PPI) network. These analyses identified the genes shared between the SSR-containing unigenes and the PPI network, as well as the main biological processes, molecular functions, cellular components, and metabolic pathways in which they are involved.
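Enrichment tools of this kind commonly score each GO term or KEGG pathway with a one-sided hypergeometric test and then correct for multiple testing. The sketch below assumes that approach (TBTools' exact settings are not stated here) and uses only the Python standard library; the function names and any counts passed to them are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background genes; K: background genes annotated to the term;
    n: genes in the study set; k: study genes annotated to the term.
    """
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper, which
    # automatically drops impossible terms.
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return total / comb(N, n)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts for three terms: (k, N, K, n)
    terms = [(12, 5000, 80, 200), (5, 5000, 150, 200), (9, 5000, 60, 200)]
    p = [hypergeom_sf(*t) for t in terms]
    print(p, benjamini_hochberg(p))
```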

Analysis of the DEGs in the PPI Network
STRING (https://string-db.org/, accessed on 1 March 2018) was used to analyze the PPI networks containing DEGs, whereas the Cytoscape software [21] was used to visualize and edit the PPI networks.
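Separate PPI networks, such as the largest network and the smaller networks described in the results, correspond to connected components of the interaction graph. The sketch below finds them with a breadth-first search over a two-column edge list (an assumed input layout; STRING and Cytoscape exports may need reshaping first) and counts the SSR-containing genes in each component. Gene identifiers are hypothetical.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return connected components (sets of genes), largest first."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:  # breadth-first search from this start node
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    components.sort(key=len, reverse=True)
    return components


def ssr_genes_per_component(components, ssr_genes):
    """Number of SSR-containing genes in each component."""
    return [len(c & set(ssr_genes)) for c in components]


if __name__ == "__main__":
    edges = [("geneA", "geneB"), ("geneB", "geneC"), ("geneD", "geneE")]
    comps = connected_components(edges)
    print([len(c) for c in comps], ssr_genes_per_component(comps, {"geneA", "geneE"}))
```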

Genetic Diversity and Structure Analysis
The SSR genotyping data were converted into the input format required by each software package. PowerMarker 3.25 [22] was used to calculate the polymorphism information content (PIC) of each locus. NTSYSPC 2.10e was used to calculate the Jaccard genetic similarity coefficient between each pair of samples and to compile the genetic similarity coefficient matrix [23].
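The two per-locus and pairwise measures above can be sketched as follows. PIC is computed with the widely used formula of Botstein et al. (1980), PIC = 1 − Σpᵢ² − Σᵢ<ⱼ 2pᵢ²pⱼ², which we assume matches the PowerMarker output; the Jaccard coefficient is computed on binary allele presence/absence vectors, a common coding for SSR data in NTSYSpc that is likewise an assumption here. Function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes such as (150, 152); None marks missing data."""
    alleles = [a for g in genotypes if g is not None for a in g]
    counts = Counter(alleles)
    total = len(alleles)
    return {allele: c / total for allele, c in counts.items()}


def pic(freqs):
    """Polymorphism information content (Botstein et al. 1980)."""
    p = list(freqs.values())
    homozygosity = sum(x * x for x in p)
    correction = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return 1.0 - homozygosity - correction


def jaccard(x, y):
    """Jaccard similarity a / (a + b + c) for two 0/1 allele-presence vectors."""
    shared = sum(1 for u, v in zip(x, y) if u and v)
    unshared = sum(1 for u, v in zip(x, y) if u != v)
    if shared + unshared == 0:
        raise ValueError("no alleles scored in either sample")
    return shared / (shared + unshared)


if __name__ == "__main__":
    freqs = allele_frequencies([(150, 152), (150, 150), None, (152, 152)])
    print(freqs, round(pic(freqs), 3), jaccard([1, 1, 0, 0], [1, 0, 1, 0]))
```

For two equally frequent alleles, PIC is 1 − 0.5 − 0.125 = 0.375, slightly below the expected heterozygosity of 0.5.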
Nei's (1983) genetic distance based on allele frequencies was calculated with PowerMarker 3.25, and a phylogenetic tree was constructed with the neighbor-joining method in MEGA 7.0. The SSR genotyping data were then analyzed with Structure 2.0. K was set from 1 to 20, the Markov chain Monte Carlo analysis was run with 10,000 iterations and a burn-in of 100,000 samples, and each K value was run three times. The Structure results were analyzed with the online tool Structure Harvester [24]. The ∆K curve was plotted against K, and the best K value was identified at the peak of ∆K. The indfiles of the three runs for the best K were downloaded and imported into Clumpp 2.0, which merged the three results into a single Q-value matrix [25], after which the Q plot was drawn with Structure 2.0. Finally, the genetic structure of the population was analyzed by comparing the neighbor-joining clustering with the Structure model.
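Structure Harvester selects the best K with the ∆K statistic of Evanno et al. (2005): ∆K = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)], where L(K) is the mean log probability of the data over the replicate runs at K. The sketch below computes it from replicate values; it is an illustration, not Structure Harvester's code, and the example log-likelihoods are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Delta K for each interior K, given {K: [ln P(D) of each replicate run]}.

    Delta K is undefined for the smallest and largest K tested and is skipped
    when the replicate standard deviation is zero.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k)[1:-1]:
        if k - 1 not in means or k + 1 not in means:
            continue
        sd = stdev(lnp_by_k[k])  # sample SD across replicate runs
        if sd == 0:
            continue
        second_difference = means[k + 1] - 2 * means[k] + means[k - 1]
        delta[k] = abs(second_difference) / sd
    return delta


if __name__ == "__main__":
    runs = {  # hypothetical ln P(D) values, three runs per K
        1: [-1000, -1001, -999],
        2: [-900, -902, -898],
        3: [-800, -801, -799],
        4: [-790, -792, -788],
        5: [-785, -787, -783],
    }
    dk = evanno_delta_k(runs)
    print(dk, "best K =", max(dk, key=dk.get))
```

With these hypothetical values, ∆K peaks sharply at K = 3 (90.0, versus 0.0 at K = 2 and 2.5 at K = 4), mirroring how a peak is read off a ∆K curve.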

Analysis of SSR Characteristics
From the transcriptome sequencing data, we obtained 136,854 assembled unigenes totaling 151,754,502 bp. The SSRs detected in the unigene sequences with MISA are summarized in Table 1. A total of 43,457 SSRs were detected, distributed in 34,180 sequences (24.98% of all unigenes), with an average of 1.23 SSRs per sequence. Additionally, 7421 sequences contained more than one SSR, and 2321 SSRs occurred in compound form. Among the SSR-containing unigenes, 749 were DEGs significantly responsive to lead stress, including 259 up-regulated and 454 down-regulated genes. EST-SSRs directly reflect the diversity of the genes that carry them and are highly transferable. In recent years, with the development of high-throughput transcriptome sequencing, EST-SSR markers have been widely studied in many crops [26][27][28][29]. We identified considerably more DEGs than were reported in okra [16]. Among the identified SSR motifs, mononucleotide motifs were the most common, followed by trinucleotide and dinucleotide motifs (Figure 2), consistent with earlier reports in okra [16]. For these SSR motifs, a total of 26,949 unigene primers (three primer pairs per SSR locus) were designed to produce a marker library for screening polymorphic EST-SSR markers (Supplementary Table S2).
The SSR-containing unigenes were functionally characterized by GO and KEGG enrichment analyses (Supplementary Figures S1 and S2). The enriched molecular function GO terms of these unigenes included zinc ion binding, transcription regulator activity, and DNA-binding transcription factor activity. The enriched cellular component GO terms were mainly whole membrane, membrane-bounded organelle, and protein-containing complex. The main biological process GO terms were nucleobase-containing compound biosynthetic process, regulation of nitrogen compound metabolic process, and regulation of gene expression. The main enriched KEGG pathways of the SSR-containing unigenes were folate biosynthesis (00790), transcription factors (03000), and photosynthesis proteins (00194). DNA-binding transcription factor activity and transcription factors (03000) were also reported in Abelmoschus esculentus [16].

Analysis of DEGs in the PPI Network
We analyzed DEGs in the PPI network because genes whose products interact generally have similar functions [30]. Among the 1697 DEGs identified in the transcriptome, 232 DEGs (96 up-regulated and 136 down-regulated) may encode proteins that physically interact (Figure 3). At least one SSR motif was detected in 95 gene sequences, and primers were designed for 57 DEGs. The largest PPI network consisted of 91 genes, which were enriched in biological process GO terms including dicarboxylic acid metabolic process, carboxylic acid metabolic process, and oxoacid metabolic process (Figure 4A). These genes might be important for lead stress responses in kenaf. Oxoacid metabolic process was also reported in Abelmoschus esculentus [16]. The main enriched molecular function GO terms were antioxidant activity, NAD binding, and catalytic activity, whereas the enriched cellular component GO terms were intracellular organelle, organelle, and intracellular. The enriched KEGG pathways among these PPI network genes included cytoskeleton proteins, starch and sucrose metabolism, and DNA replication proteins (Figure 4B).

Validation of EST-SSR Molecular Markers
To validate the developed SSR markers and screen for polymorphism, we synthesized 52 primer pairs for SSR-containing genes in the PPI network. Genomic DNA from 30 representative kenaf varieties/lines was used to screen and verify marker polymorphism. First, unlabeled primers were tested by PCR using as template the DNA of one of eight randomly selected samples, and the amplified fragments were analyzed by 1% agarose gel electrophoresis. Of the 52 primer pairs, 33 were highly specific and produced clear bands on agarose gels. The remaining 19 primer pairs either did not amplify a fragment of the expected size (100-400 bp) or amplified non-specifically. The 33 SSR primer pairs were therefore synthesized with the FAM fluorescent group for PCR amplification, and the resulting fragments were analyzed on the ABI 3730xl system; the fragment analysis result of one SSR marker in one sample is illustrated in Figure 5. We obtained 25 polymorphic SSR markers, four non-polymorphic markers, and four primer pairs that produced no specific products. The preliminary genetic diversity of the 30 samples based on the 25 markers is presented in Supplementary Table S3. The average number of alleles was 7.12 (range 3 to 14); the average number of effective alleles was 3.23 (1.07 to 7.14); the average PIC was 0.57 (0.06 to 0.85); and the average Shannon diversity index was 0.61 (0.07 to 0.86). These results indicate that the 25 SSR markers are highly polymorphic and broadly applicable for analyzing genetic diversity, population structure, and genetic relationships, as well as for DNA fingerprinting [31]. After analyzing the genotyping success rate of these 25 EST-SSR markers, we eliminated the three markers with the lowest success rates (KSSR32, KSSR42, and KSSR46) and retained the remaining 22 high-quality EST-SSR markers for investigating the genetic diversity and population structure of a large number of samples. The genes carrying these 22 EST-SSR markers and their PPI networks are shown in Figure 3. Ten of the markers were located in genes in the largest PPI network, whereas the remaining 12 were in genes in eight smaller PPI networks. To further verify the utility of the 22 new markers, we conducted SSR genotyping analyses involving 108 kenaf germplasm materials.
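The marker filtering step above can be sketched as computing each marker's genotyping success rate (the share of samples with a called genotype) and discarding the lowest-ranked markers. The data layout, marker names, and tie-breaking by name are hypothetical choices for illustration.

```python
def genotyping_success_rates(calls):
    """{marker: [genotype or None per sample]} -> {marker: fraction of samples called}."""
    return {
        marker: sum(g is not None for g in genotypes) / len(genotypes)
        for marker, genotypes in calls.items()
    }


def drop_lowest(rates, n_drop):
    """Remove the n_drop markers with the lowest success rates (ties broken by name)."""
    ranked = sorted(rates, key=lambda m: (rates[m], m))
    dropped = set(ranked[:n_drop])
    return sorted(m for m in rates if m not in dropped)


if __name__ == "__main__":
    calls = {  # hypothetical genotype calls for four samples
        "M1": [(150, 152), (150, 150), None, (152, 152)],
        "M2": [None, None, (201, 201), (201, 205)],
        "M3": [(88, 90), (90, 90), (88, 88), (88, 90)],
    }
    rates = genotyping_success_rates(calls)
    print(rates, drop_lowest(rates, 1))
```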

Genetic Diversity of a Single Locus
We evaluated the genetic diversity of the kenaf germplasm resources using the newly developed EST-SSR markers (Table 2). The SSR loci, allele sizes, melting temperatures, and SSR primer sequences are listed in Table 3. Among the 138 kenaf genotypes, the average number of alleles was 15.64 (range 6 to 28); the average number of effective alleles was 3.42 (1.43 to 5.88); the average major allele frequency was 0.50 (0.25 to 0.83); the average observed heterozygosity was 0.27 (0 to 0.57); the average expected heterozygosity was 0.66 (0.30 to 0.83); the average PIC was 0.63 (0.30 to 0.81); and the average Shannon diversity index was 1.58 (0.81 to 2.22). The average genetic similarity coefficient among the kenaf germplasms was 0.23. The genetic similarity coefficient was lowest (0) for S43, S85, S68, and S85, reflecting the most distant relationships, whereas the coefficient for S32 and S54 was 0.78. These results indicate that the kenaf germplasms are genetically diverse and mostly distantly related, and they may therefore be useful for selecting suitable parents in future kenaf breeding.
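The single-locus statistics reported above can be computed from diploid genotype calls with the standard formulas: effective number of alleles Ne = 1/Σpᵢ², expected heterozygosity He = 1 − Σpᵢ² (the uncorrected form; PowerMarker may apply a sample-size correction), observed heterozygosity Ho as the share of heterozygous individuals, and Shannon's index I = −Σpᵢ ln pᵢ. This is a sketch for illustration with hypothetical allele sizes, not the PowerMarker implementation.

```python
import math
from collections import Counter


def locus_stats(genotypes):
    """Summary statistics for one SSR locus from diploid genotypes; None marks missing data."""
    called = [g for g in genotypes if g is not None]
    alleles = [a for g in called for a in g]
    counts = Counter(alleles)
    total = len(alleles)
    freqs = [c / total for c in counts.values()]
    sum_sq = sum(p * p for p in freqs)
    return {
        "Na": len(counts),                      # number of alleles
        "Ne": 1.0 / sum_sq,                     # effective number of alleles
        "major_allele_freq": max(freqs),
        "Ho": sum(a != b for a, b in called) / len(called),
        "He": 1.0 - sum_sq,                     # uncorrected gene diversity
        "I": -sum(p * math.log(p) for p in freqs),  # Shannon's index
    }


if __name__ == "__main__":
    print(locus_stats([(150, 150), (150, 152), (152, 154), (154, 154), None]))
```

Because Ne weights alleles by frequency, a locus with many rare alleles can have a high Na but a much lower Ne, which is the pattern seen in the averages above (15.64 versus 3.42).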

Genetic Structure Analysis
Based on the genotyping data for the 22 SSR loci, we analyzed the population structure of the 138 kenaf materials. The Structure model gives the probability that each material belongs to a given subgroup, and the results were imported into Structure Harvester to analyze the ∆K curve (Figure 6A). ∆K was highest at K = 3, indicating that the optimal number of subgroups for the 138 materials was three. We used Clumpp 2.0 to merge the three replicate Q matrices for the best K value and drew the Q plot (Figure 6B). Samples with a Q value exceeding 0.6 were assigned to specific clusters, whereas those with a Q value less than 0.6 were assigned to a mixed cluster. The 138 materials were thus divided into three clusters comprising 30, 25, and 50 materials, with the remaining 33 materials in the mixed cluster. To further analyze the population structure, we used PowerMarker to calculate Nei's genetic distance from allele frequencies and constructed a neighbor-joining tree (Figure 6C). The clustering of the neighbor-joining tree was similar to that of the Structure analysis: kenaf genotypes in the same Structure cluster grouped on the same branch of the neighbor-joining tree, whereas genotypes in the mixed cluster were distributed across all branches. Therefore, these 138 kenaf germplasm materials can be divided into three clusters. This clustering result will be useful for future investigations of kenaf genetics and for identifying and optimizing the use of kenaf germplasm resources.
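The cluster assignment rule above can be sketched as follows: each material goes to the cluster with its largest Q value if that value exceeds 0.6, and otherwise to the mixed group. How a Q value of exactly 0.6 is handled is not stated in the text; this sketch assumes it counts as mixed. The Q rows are hypothetical.

```python
from collections import Counter


def assign_clusters(q_matrix, threshold=0.6):
    """Label each row of a Q matrix as 'clusterN' (1-based) or 'mixed'."""
    labels = []
    for q in q_matrix:
        best = max(range(len(q)), key=q.__getitem__)  # index of the largest membership
        labels.append(f"cluster{best + 1}" if q[best] > threshold else "mixed")
    return labels


if __name__ == "__main__":
    q = [
        [0.90, 0.05, 0.05],
        [0.40, 0.35, 0.25],
        [0.10, 0.70, 0.20],
    ]
    labels = assign_clusters(q)
    print(labels, Counter(labels))
```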
In this study, SSR molecular markers were developed from kenaf lead-stressed transcriptome sequencing data, and the SSR characteristics of lead-stress-response-related genes were analyzed across the transcriptome using the unigenes identified from these data. We identified more unigenes and a higher proportion of SSR-containing unigenes than Li et al. [7]. The distribution of SSR motifs also differed between the studies: mononucleotide motifs were the most common SSRs in our study, whereas they represented only 0.3% of the SSRs reported by Li et al. [7]. This discrepancy may have resulted from differences in gene expression among transcriptome sequencing experiments.
SSR markers developed from transcriptome data lie within expressed genes. The SSR distribution and the potential functions of the SSR-containing unigenes may be related to the lead stress tolerance of the kenaf germplasm resources included in this study. The functional enrichment analyses revealed that the SSR-containing unigenes were related to important plant physiological processes, including the scavenging of reactive oxygen species and photosynthesis. Therefore, the EST-SSR markers developed in this study may be useful both for evaluating population genetic diversity and for reflecting diversity among the lead-stress-response-related genes of kenaf germplasm resources. Accordingly, these markers can be used to elucidate population genetic diversity and to identify excellent breeding materials.
Genes whose products belong to the same PPI network and that are regulated by the same signaling pathway often show correlated expression patterns and form a functional gene group. Markers developed from these genes may therefore reflect functional variation within the gene group, both through the marker itself and through linkage or strong linkage disequilibrium with nearby sequences, as well as variation in the broader genome-wide functional network. We therefore selected DEGs in a PPI network and then screened and verified the SSR markers in these DEGs, following [16]. We ultimately obtained 22 polymorphic EST-SSR markers with a high detection rate and used them to evaluate kenaf germplasm resources.
To verify the practical utility of these 22 EST-SSR markers, we analyzed the genetic diversity and population structure of 138 kenaf germplasms and found that the 22 EST-SSR loci were highly polymorphic. The large gap between the average number of alleles (15.64) and the average number of effective alleles (3.42) implies that allele frequencies were uneven in the population, with many rare alleles that may be specific to particular subgroups. The neighbor-joining tree based on Nei's genetic distance confirmed that the 138 kenaf materials could be distinguished by these 22 EST-SSR markers, and the results of the Structure and neighbor-joining cluster analyses were consistent. The kenaf germplasm resources were divided into three subgroups and one mixed subgroup. We also examined potential correlations between the subgroups and the geographical origins of the germplasms, but no relationship was detected. These results imply that the 22 newly developed EST-SSR markers are highly polymorphic and applicable for distinguishing kenaf germplasms and evaluating population structure, although they do not reflect differences in the geographical origins of the germplasms. These findings also indicate that kenaf genes associated with lead stress tolerance are polymorphic. Additionally, the population structure identified here might influence the lead tolerance of kenaf germplasms; this possibility needs to be verified experimentally through QTL mapping, association analyses, and gene function annotation. Although we report the genetic diversity and population structure of kenaf germplasms, we cannot directly compare our results with those of previous related studies because the investigated germplasm resources differ [32][33][34].
KSSR18 and KSSR31 are located in plant-hormone-related genes. In Pb-polluted soil, foliar application of plant hormones can increase plant height, root length, and the dry weight of the aboveground and underground parts of Zea mays, and promote Pb uptake by individual plants [35]. Hadi et al. attributed this effect to root elongation and increased biomass in maize [35]. In soil polluted with 1.0 mg/kg Cd(NO3)2, foliar application of plant hormones significantly increased the biomass of nightshade; the aboveground Cd content increased by 16%, and plant Cd uptake increased by 124% [36]. The hormone-induced increase in heavy metal accumulation capacity may result from the combined effects of enhanced root development (root biomass) and additional storage sites (aboveground biomass). On the one hand, a more developed root system increases the volume of soil available for nutrient absorption, helping plants take up more nutrients or heavy metals. On the other hand, greater aboveground biomass provides more storage sites for heavy metals, avoiding excessive local accumulation and thus reducing the load on plant physiological and biochemical processes [37].

Conclusions
In conclusion, we developed a set of EST-SSR markers based on kenaf lead-stressed transcriptome sequencing data. We designed EST-SSR primers for DEGs in a PPI network to screen, validate, and apply the generated markers, and ultimately obtained 22 EST-SSR markers with a high detection rate and substantial polymorphism. These markers distinguished 138 kenaf germplasms, which were divided into three subgroups (clusters). Our findings suggest that the 22 EST-SSR primer pairs are suitable for evaluating the genetic diversity and population structure of kenaf germplasm resources. Moreover, because the EST-SSR markers are located in specific genes, they may reflect differences in the lead tolerance of kenaf germplasm resources. These markers may therefore be important for identifying lead-tolerant kenaf materials and supporting future marker-assisted breeding of new, more stress-tolerant varieties. The 22 EST-SSR markers reported here can also be used for QTL linkage mapping and association mapping of lead-stress-related traits to further verify the correlation between the markers and kenaf lead tolerance.

Figure 1.The technical route of this study.


Sustainability 2023, 15
Figure 3. Genes and PPI networks associated with the newly developed EST-SSR markers. The yellow nodes represent the 22 genes carrying EST-SSR markers. The pink nodes represent genes whose proteins interact with those encoded by the marker-carrying genes.


Figure 4. Enrichment analysis of the differentially expressed genes (DEGs) associated with PPI networks.(A) GO enrichment analysis of the PPI network-associated DEGs; (B) KEGG enrichment analysis of the PPI-network-associated DEGs.

Figure 5. ABI 3730xl capillary electrophoresis fragment analysis of marker KSSR01 in one sample. The x-axis shows fragment size (bp), and the y-axis shows signal intensity. ht, peak height; ar, peak area; sz, fragment size. In the analysis, sz was recorded as the allele.


Figure 6. Analysis of the population structure of kenaf germplasm resources. (A) ∆K as a function of K. ∆K peaks at K = 3, so the optimal number of kenaf germplasm populations was 3. (B) Q plot. The y-axis is the Q value, the probability that each material belongs to a specific cluster; the x-axis is the kenaf germplasm material number. Red, blue, and green represent the three clusters. (C) NJ clustering tree based on Nei's genetic distance. Red represents cluster 1, green cluster 2, blue cluster 3, and purple the mixed group.


Funding:
This research was funded by the National Key R&D Program and Key Special Project of International Science and Technology Innovation Cooperation between Governments (2017YFE0195300), National Natural Science Foundation of China (31801406; 32202506), Basic Public Welfare Research Program of Zhejiang Province (LGN20C150007), China Agriculture Research System of MOF and MARA, China Agriculture Research System for Bast and Leaf Fiber Crops (CARS-16-S05), and International Cooperation Fund of ZAAS (2022).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Table 2 .
Analysis of the genetic diversity of 22 new EST-SSR markers in 138 kenaf germplasm resources.

Table 3 .
SSR loci, allele sizes, melting temperatures, and primer sequences of the SSR markers.