SQUAMOSA Promoter Binding Protein-Like (SPL) Gene Family: TRANSCRIPTOME-Wide Identification, Phylogenetic Relationship, Expression Patterns and Network Interaction Analysis in Panax ginseng C. A. Meyer

SPL (SQUAMOSA promoter binding protein-like) gene family is specific transcription factor in the plant that have an important function for plant growth and development. Although the SPL gene family has been widely studied and reported in many various plant species from gymnosperm to angiosperm, there are no systematic studies and reports about the SPL gene family in Panax ginseng C. A. Meyer. In this study, we conducted transcriptome-wide identification, evolutionary analysis, structure analysis, and expression characteristics analysis of SPL gene family in Panax ginseng by bioinformatics. We annotated the PgSPL gene family and found that they might involve in multiple functions including encoding structural proteins, but the main function were still focused on the binding function. The result showed that 106 PgSPL transcripts were classified into two clades - A and B, both of which respectively consisted of three groups. Besides, we profiled PgSPL transcripts’ genotypic, temporal, and spatial expression characteristics. Furthermore, we calculated the correlation of PgSPL transcripts in the 14 tissues of a 4 years old ginseng and 42 farmers’ cultivars farmers’ cultivars of 4 years old ginsengs’ roots with both results showing that SPL transcripts formed a single network, which indicated that PgSPLs inter-coordinated when performing their functions. What’s more, we found that most PgSPL transcripts tended to express in older ginseng instead of younger ginseng, which was not only reflected in the expression of more types of SPL transcripts in older ginseng, but also in the higher expression of SPL transcripts in older ginseng. Additionally, we found that four PgSPL transcripts were only massively expressed in roots. According to PgSPL transcripts’ expression characteristics, we found that PgSPL23-35 and PgSPL24-09 were most proper two transcripts to further study as ginseng age’s molecular marker. These results provide the basis for further elucidation of the PgSPL transcripts’ biological function in ginseng and ginseng genetics improvement and gene breeding in the future.


Introduction
Transcription factors (TFs) are a group of protein that can specifically bind to the sequence upstream of a gene to regulate the spatial or temporal expression of it [1]. SQUAMOSA promoter binding protein-like (SPL) gene family is a kind of TFs that encode the SQUAMOSA promoter binding proteins (SBPs). It was first identified in Antirrhinum majus with the function of regulating MADS-box genes in the early stage of flower development [2]. In the next two decades, the SPL gene family was identified and widely reported in various species such as moss [3], Arabidopsis [4], rice [5], tomato [6], and maize [7]. Through the findings of these study, the SPL gene family has multiple biological functions in plant growth and development including flowering time [8,9], sporocytes formation [10] and leaf initiation [11,12]. In particular, the SPL genes act as the target of miR156 that controlling the phase transitions between vegetative and reproductive, which is critical to plant reproductive success [13]. Additionally, SPL transcripts are also involved in copper homeostasis, fertility and response to GA (gibberellin acids) signaling [14,15].
The SPL proteins share the SBP domain, which is consisting of 76 amino acid residues that encoded two zinc fingers and a nuclear localization sequence (NLS). It has the function of binding to the GTAC core sequence and allow it to come into play [16]. In Arabidopsis, the SPL gene family can be divided into two categories based on the gene sequence and structure. The members of the second category, except SPL8, contain miR156/157 binding sites [14]. The miR156/157 bind to SPL transcript and regulate its expression by transcript cleavage or translational repression [17]. Besides, they are important mediators linking age to plant growth and development [18]. The miR156/157 is currently the only one known as an age-based molecular marker [19]. As the target genes of miR156/157, the SPL gene family is naturally a critical factor in plant growth and development.
Panax ginseng C. A. Meyer is a kind of slow-growing renascent herbs that belongs to the genus Ginseng in family Araliaceae. It has 4000 years of application history for its therapeutic effect in China according to an ancient Chinese medical book called Shen Nong's herbal classic of materia medica. Panax is named after the ancient Greek πᾶν (pân, "all") + ἄκoς (ákos, "cure"). As the Greeks name describes, ginseng and its derived compounds have been proven to have therapeutic functions in a tremendous scope of diseases, such as cancer [20], cardiovascular disorders [21], autoimmune disease [22], rheumatic diseases [23], obesity [24], ocular diseases [25], and inflammation [25]. Besides, they have also been proven to be capable of improving strategic learning [26] and resistance to stress [27]. In spite of its excellent quality, ginseng is hard in cropping, owing to the disadvantages of its weak resistance to a lot of diseases, and especially, the long growth period. It takes at least four years for ginseng to become a medicinal material. Ginseng's medicinal value increases along with its cultivation time, which explains why an illegal company may sell short-growing ginseng pretended to be long-growing ginseng in order to make money. The current method to judge the age of ginseng mainly relies on observing the appearance of ginseng, which is often not accurate enough. Therefore, there is an urgent need to develop a new way in judging the ginseng age through biotechnology.
In summary, the age judging of ginseng that SPL genes may be involving in was essential for application but have not been report to date. Hence, in this study, we identified SPL gene family from ginseng transcriptome for the first time, and then conducted the basic bioinformatics analysis including phylogenetic relationship, network interaction, and spatial and temporal expression patterns of it. These analysis will contribute to the exploration of the correlation between SPL gene expression and ginseng's age. In addition, it provides a theoretical basis and bioinformatics database for ginseng genetics improvement and breeding in the future.

Identification of the Ginseng SPL Transcripts
Three approaches were applied to identify the ginseng SPL gene family in order to ensure its integrity. First, we performed the search for putative SBP amino acid sequences in ginseng from the Plants 2020, 9, 354 3 of 23 Korean ginseng protein data (http://ginsengdb.snu.ac.kr/genome.php) taking the amino acid sequence of SBP domain (Pfam: PF03110) as the query. Then, the result was used as the query sequences for the TBLASTN to search from the Jilin ginseng transcriptome database for putative SPL transcripts in ginseng (Jilin P. ginseng transcriptome database was developed from 14 tissues of a 4-year-old Jilin ginseng plant, including fiber root, leg root, main root epiderm, main root cortex, rhizome, arm root, stem, leaf peduncle, leaflet pedicel, leaf blade, fruit peduncle, fruit pedicel, fruit flesh, and seed) at an e-value of 1 × 10 −6 . Second, nucleotide sequences of SPL in other species were downloaded from GenBank (http://www.ncbi.nlm.nih.gov/) and were used as the queries for the BLAST to search from the Chinese ginseng transcriptome database (http://ginseng.vicp.io:23488/) and Korean ginseng transcriptome database (http://ginsengdb.snu.ac.kr/transcriptome.php), at an e-value of 1 × 10 −6 . The results were used as the query sequences searching from the Jilin ginseng transcriptome database for putative SPL transcripts in ginseng at an e-value of 1 × 10 −6 . Third, the SBP amino acid sequence was downloaded from the Plant Transcription Factor Database (http://plntfdb.bio.uni-potsdam.de/v3.0/) as the query sequence for BLAST to search from the Jilin ginseng transcriptome database at an e-value of 1 × 10 −6 . Then, the results of the three approaches were merged and duplicates were removed. Finally, PgSPL gene sequences were submitted to ITAK (http://itak.feilab.net/cgi-bin/itak/index.cgi) to verify whether they contain the SBP domain.

GO (Gene Ontology) Annotation, Functional Categorization and Analyses
Since the multiple transcripts of a gene that generated by alternatively splicing are likely to have disparate biological functions [28], we used transcripts instead of genes to perform the GO annotation by Blast2GO [29]. We counted the number of SPL transcripts that were involved in a particular function as well as the number that was involved in multiple functions to show the gene ontology annotation of ginseng SPL gene family.

Multiple Sequence Alignments and Phylogenetic Analysis
In order to classify the PgSPL transcripts, we aligned PgSPL transcripts that have the complete SBP domain with seven other species. The sequences involving complete SBP domains were drawn out for phylogenetic analysis. Species belonging to dicotyledons and monocotyledons, especially belonging to asterid (which ginseng belongs to) were selected out as the outgroup species. Ten SPL transcripts in seven species were selected for sequence alignment and phylogenetic analysis, which consisted of Arabidopsis thaliana, Antirrhinum majus, Daucus carota, Helianthus annuus, Lactuca sativa, Oryza sativa subsp. Japonica and Solanum lycopersicum. The phylogenetic tree were conducted using the maximum-likelihood (ML) method of MEGA-X [30], and the bootstrap replications were set to 2000.

Ka/Ks Ratio among SPL in Ginseng and Other Species
According to the results of the previous step, PgSPL03 and PgSPL24-10 were selected to calculate the Ka/Ks ratio since their expression pattern was representative. The genes of other seven species previously used in the phylogenetic analysis were selected to calculate the Ka/Ks ratio. MEGA-X was used for the alignment and the YN00 of PAMLX software [31] was used for calculation of the Ka/Ks ratio.

Analysis of Different PgSPL Groups for Conserved Motifs
In order to gain a deeper insight into the evolution and classification of PgSPL transcripts, we conducted multiple sequence alignment using the MAFFT software within different groups that were obtained from the phylogenetic analysis. The results are shown by WebLogo [32,33] (http://weblogo.berkeley.edu/). The amino acids of SBP domain used in analysis were transformed from transcripts by NCBI ORF Finder (https://www.ncbi.nlm.nih.gov/orffinder/).

Expression and Interaction Characteristic of PgSPL Transcripts
In order to profile the expression pattern of PgSPL transcripts, we made heat map analysis by TBtools [34]. In order to profile the interaction characteristic among PgSPL transcripts, we calculated the Spearman's rank correlation coefficients using the R language program R Core Team 2013 and the results were visualized as network by BioLayout Express 3D software 3.3 [35]. The expression of PgSPL in 14 different tissues of 4-year-old ginseng and 4 different growing-year (5-, 12-, 18-, 25-years) of ginseng roots were used to show the temporal and spatial characteristic of PgSPL, and the expression of 42 farmer's cultivars of 4-year-old ginseng roots were used to show the characteristic within different genotypes.

Identification of the Ginseng SPL Transcripts
To identify all PgSPL transcripts, we used three methods to search. Taking the SBP domain (Pfam:PF03110) as query, we obtained 478 transcripts; taking as queries the nucleotide sequence of SPL transcripts in other species, we obtained 209 transcripts; taking as query the SBP amino acid sequence downloaded from Plant Transcription Factor Database, we obtained 30,992 transcripts. Then, after consolidating the results of these three methods and removing the duplicates, we obtain 31,994 transcripts. Finally, to verify whether they had SBP domain, we submitted them to ITAK. A total of 106 sequences (Table 1) were verified as possessing the SBP domain. The sequences identified as including SBP were rechristened PgSPL01-PgSPL30 (Table S1), and transcripts of the same gene were distinguished by digital suffixes (e.g., −01).

GO Annotation of PgSPL Transcripts
To provide an overview of functions of PgSPL transcripts, we annotated and categorized PgSPL transcripts using the free basic version of Blast2GO. The original annotation results of PgSPL are shown in Table S2. We further counted the number of transcripts that involved in different GO functions (Figure 1a), and the number that participate in multiple GO functions ( Figure 1b). There were 99 of 106 PgSPL transcripts which have annotations in Blast2GO: 95 of them were annotated with Molecular Function, 70 of them were annotated with Cell Component, and only 14 were annotated with Biological Processes. Especially, there are 12 PgSPL transcripts annotated with all three functions above, which indicated the importance of them.
More concretely, the PgSPL transcripts were annotated with 16 specific GO annotations that belongs to the three functions above. The results are shown in Figure 1b. A total of 90 (91%) PgSPL transcripts were categorized to binding (GO:0005488); 56 (57%) were categorized to cell (GO:0005623), cell part (GO:0044464), and organelle (GO:0043226), respectively; and the amount of PgSPL transcripts that annotated with 12 other functions were no more than 20. This results indicated that the PgSPL gene family may involve in multiple functions, but the main function were focused on the binding function, including DNA binding (GO:0003677) and metal ion binding (GO:0046872). two groups of ratios was extremely significant. On the basis of this, we concluded that it was their over length of complete nucleotide sequence that led to the fact that some transcripts nucleotide sequences could not be annotated with "binding" function. Given this problem, we annotated all PgSPL transcripts ORF and consolidated it with the annotation result of all PgSPL transcripts whole length nucleotides. A total of 16 transcripts still could not be annotated with the "binding" function. Part of them were not blasted and the rest were the result of an incompleteness of the SBP domain.

Phylogenetic Relationship and Classification for PgSPL Transcripts
To obtain the phylogenetic relationship among PgSPL transcripts and SPL transcripts in other species, we chose 56 PgSPL transcripts with an intact SBP domain and ten transcripts from seven other species to construct a phylogenetic tree. The result is shown in Figure 2. The SPL transcripts are basically divided into two clades, A and B. Clades A and B respectively consisted of three groups, i.e., there were six groups in total, which was consistent with the present study [36]. All transcripts from PgSPL26 and PgSPL18 were classified into the A1 group. All transcripts of PgSPL22, 28 and 25 were classified into the A2 group. PgSPL11, 16 and HaSPL were classified into the A3 group. All  In this section, all PgSPL transcripts are expected to annotate with the "binding" function for their SBP domain. However, if we annotate all transcripts' nucleotide sequences, only 56 transcripts will be annotated with the "binding" function. Thus, to explore whether it is the difference in open reading frames that caused the other 50 transcripts to be not annotated with "binding" function, we used open reading frames (ORFs) to annotate transcripts that were not annotated with the "binding" function in the first step. As the result, a total of 90 transcripts were annotated with the "binding" function. To clarify the reason why to some transcripts, only ORFs could be annotated with function while the complete nucleotide sequences could not, we separately calculated the ratio of the SBP domain sequence in the total sequence for the transcripts annotated with "binding" and for those were not annotated with "binding." We found that the ratio of the former was 18% and the latter was 11%. Finally, to test if it was statistically significant, we used the SPSS software to independent-samples t-test these two set of ratio data, with the result showing that the difference between these two groups of ratios was extremely significant. On the basis of this, we concluded that it was their over length of complete nucleotide sequence that led to the fact that some transcripts nucleotide sequences could not be annotated with "binding" function. Given this problem, we annotated all PgSPL transcripts ORF and consolidated it with the annotation result of all PgSPL transcripts whole length nucleotides. A total of 16 transcripts still could not be annotated with the "binding" function. Part of them were not blasted and the rest were the result of an incompleteness of the SBP domain.

Phylogenetic Relationship and Classification for PgSPL Transcripts
To obtain the phylogenetic relationship among PgSPL transcripts and SPL transcripts in other species, we chose 56 PgSPL transcripts with an intact SBP domain and ten transcripts from seven other species to construct a phylogenetic tree. The result is shown in Figure 2. The SPL transcripts are basically divided into two clades, A and B. Clades A and B respectively consisted of three groups, i.e., there were six groups in total, which was consistent with the present study [36]. All transcripts from PgSPL26 and PgSPL18 were classified into the A1 group. All transcripts of PgSPL22, 28 and 25 were classified into the A2 group. PgSPL11, 16 and HaSPL were classified into the A3 group. All transcripts of PgSPL20, 21, 24, PgSPL08, SlSPL2, DcSPL, AtSPL, and AmSPL were classified into the B1 group. All transcripts of PgSPL27, PgSPL04, OsSPL1, 2, and SlSPL1 were classified into the B2 group. PgSPL19, SlSPL3, and LsSPL were classified into the B3 group ( Figure 2; Table S3).

Synonymous and Nonsynonymous Substitution Rates and Selective Pressure in PgSPL
To explore the evolutionary pressure, we calculated the synonymous and nonsynonymous substitution rates (Ka/Ks) of PgSPL transcripts and compared them to the Ka/Ks ratio in other species. PgSPL03 and PgSPL24-10 were selected to calculate the Ka/Ks ratio using YN00 of PAMLX software. The result showed that all of the Ka/Ks ratios are lower than 1. To be specific, the average number is 0.25 with CV = 78%. The lowest one was 0.11 between PgSPL03 and Antirrhinum majus, and the highest one was 0.88 between PgSPL24-10 and Daucus carota. In ginseng, the Ka/Ks ratio between PgSPL03 and PgSPL24-10 was 0.21. Interestingly, Daucus carota is one of the species that have the closest phylogenetic relationship with ginseng, but its Ka/Ks ratios had a big difference with ginseng. If the DcSPL were excluded from the calculation, the average would be as low as 0.18 with a CV = 29%. All of these results indicated that SPL genes in eight species above had been undergoing a strong purifying selection.

Synonymous and Nonsynonymous Substitution Rates and Selective Pressure in PgSPL
To explore the evolutionary pressure, we calculated the synonymous and nonsynonymous substitution rates (Ka/Ks) of PgSPL transcripts and compared them to the Ka/Ks ratio in other species. PgSPL03 and PgSPL24-10 were selected to calculate the Ka/Ks ratio using YN00 of PAMLX software. The result showed that all of the Ka/Ks ratios are lower than 1. To be specific, the average number is 0.25 with CV = 78%. The lowest one was 0.11 between PgSPL03 and Antirrhinum majus, and the highest one was 0.88 between PgSPL24-10 and Daucus carota. In ginseng, the Ka/Ks ratio between PgSPL03 and PgSPL24-10 was 0.21. Interestingly, Daucus carota is one of the species that have the closest phylogenetic relationship with ginseng, but its Ka/Ks ratios had a big difference with ginseng. If the DcSPL were excluded from the calculation, the average would be as low as 0.18 with a CV = 29%. All of these results indicated that SPL genes in eight species above had been undergoing a strong purifying selection.

Analysis for Conserved Domains of PgSPL Transcripts
To explore the structure difference of SBP domain among six different PgSPL transcript groups obtained from phylogenetic analysis, we conducted multiple sequence alignment using MAFFT software. In order to show the differences between groups more clearly, we used Weblogo to show the alignment result. Because of the deficiency in transcript numbers of group A3 and B3, we did not align these two. The result is shown in Figures 3 and 4. Several amino acid in SBP domain were conserved in sequences of all groups, including: in the structure of zinc finger 1, the cysteine (C), cysteine (C), cysteine (C), and histidine (H) at position 214, 219, 236, and 239 amino acid, respectively; in the structure of zinc finger 2, the cysteine (C), cysteine (C), histidine (H), and cysteine (C) at position 255, 258, 262, and 274 amino acid, respectively; in the structure of nuclear localization signal (NLS), the lysine (K), arginine (R), arginine (R), arginine (R), arginine (R), and lysine (K) at position 271, 272, 284, 285, 286, and 287 amino acid, respectively. (Figure 3a). These conserved amino acid residues in PgSPL transcripts are basically consistent with group II reported in Guo's study [37].

Analysis for Conserved Domains of PgSPL Transcripts
To explore the structure difference of SBP domain among six different PgSPL transcript groups obtained from phylogenetic analysis, we conducted multiple sequence alignment using MAFFT software. In order to show the differences between groups more clearly, we used Weblogo to show the alignment result. Because of the deficiency in transcript numbers of group A3 and B3, we did not align these two. The result is shown in Figures 3 and 4. Several amino acid in SBP domain were conserved in sequences of all groups, including: in the structure of zinc finger 1, the cysteine (C), cysteine (C), cysteine (C), and histidine (H) at position 214, 219, 236, and 239 amino acid, respectively; in the structure of zinc finger 2, the cysteine (C), cysteine (C), histidine (H), and cysteine (C) at position 255, 258, 262, and 274 amino acid, respectively; in the structure of nuclear localization signal (NLS), the lysine (K), arginine (R), arginine (R), arginine (R), arginine (R), and lysine (K) at position 271, 272, 284, 285, 286, and 287 amino acid, respectively. (Figure 3a). These conserved amino acid residues in PgSPL transcripts are basically consistent with group II reported in Guo's study [37].
Within different groups, the conserved pattern of PgSPL sequences were different. In the group A1, all of the transcripts contained the C3HC2HC type SBP box with 74 amino acids in total and had a very strong consistency in the other positions apart from the two zinc fingers and the NLS (Figure 3b). The A2 group was also consistent with the two zinc fingers and an NLS with 74 amino acids in total. Its residues in the other positions apart from the three motifs, however, were not conserved so much (Figure 3c). As for the group B1, they also had a strong consistency in two zinc fingers and the NLS whose consistency was only lower than A1's. The other positions of the group B1 apart from the three motifs were relatively conserved but differ from A1's ( Figure 3d). When it comes to the group B2, they had inferior consistency. Their SBP box was the C4C2HC type, which is quite another SBP box type from the C3HC2HC type SBP box conserved in all the rest of the PgSPL transcripts (Figure 3e). Because the conserved domains are essential for sequences to conduct functions, the difference of it in different groups may led to the functional differentiation among groups of PgSPL.

Temporal and Spatial Expression Characteristic of PgSPL Transcripts
In order to show temporal and spatial expression pattern of PgSPL transcripts, we conducted heat maps based on different growing years and different tissues. In different growing years of ginseng (5-, 12-, 18-, and 25-years), there were 81 PgSPL transcripts expressed out of total 106, and the number expressed in each growing year were 42, 47, 53, and 64 respectively. Obviously, the number of expressed PgSPL transcripts were increasing with the growing year longer of ginseng. In addition, only 29 (36%) transcripts expressed in all four growing years, most PgSPL transcripts expressed differential in different growing years. Especially, there are three (13%) transcripts expressed only in 5-year-old root, three (13%) only in 12-year-old root, three only in 18-year-old root. When it comes to 25-year-old root, however, the PgSPL transcripts specifically expressed in it went up to 14, accounting for 61% of total number of expressed transcripts (Figures 5a and 8c).
In 14 tissues of the 4-year-old ginseng, a total of 96 expressed PgSPL transcripts out of 106 were used to construct the heat map (Figure 8a). Among them, 16 (17%) transcripts were expressed in all the tissues while nine (9.4%) transcripts specifically expressed in one tissue. Of these nine transcripts, five expressed only in fruit pedicel, two in leaflet pedicel, one in fruit peduncle, and one in the main root cortex. The rest of the 71 transcripts expressed in multiple tissues (Figure 5b). Within different groups, the conserved pattern of PgSPL sequences were different. In the group A1, all of the transcripts contained the C3HC2HC type SBP box with 74 amino acids in total and had a very strong consistency in the other positions apart from the two zinc fingers and the NLS (Figure 3b). The A2 group was also consistent with the two zinc fingers and an NLS with 74 amino acids in total. Its residues in the other positions apart from the three motifs, however, were not conserved so much ( Figure 3c). As for the group B1, they also had a strong consistency in two zinc fingers and the NLS whose consistency was only lower than A1's. The other positions of the group B1 apart from the three motifs were relatively conserved but differ from A1's ( Figure 3d). When it comes to the group B2, they had inferior consistency. Their SBP box was the C4C2HC type, which is quite another SBP box type from the C3HC2HC type SBP box conserved in all the rest of the PgSPL transcripts (Figure 3e). Because the conserved domains are essential for sequences to conduct functions, the difference of it in different groups may led to the functional differentiation among groups of PgSPL.

Temporal and Spatial Expression Characteristic of PgSPL Transcripts
In order to show temporal and spatial expression pattern of PgSPL transcripts, we conducted heat maps based on different growing years and different tissues. In different growing years of ginseng (5-, 12-, 18-, and 25-years), there were 81 PgSPL transcripts expressed out of total 106, and the number expressed in each growing year were 42, 47, 53, and 64 respectively. Obviously, the number of expressed PgSPL transcripts were increasing with the growing year longer of ginseng. In addition, only 29 (36%) transcripts expressed in all four growing years, most PgSPL transcripts expressed differential in different growing years. Especially, there are three (13%) transcripts expressed only in 5-year-old root, three (13%) only in 12-year-old root, three only in 18-year-old root. When it comes to 25-year-old root, however, the PgSPL transcripts specifically expressed in it went up to 14, accounting for 61% of total number of expressed transcripts (Figure 5a and Figure 8c).
In 14 tissues of the 4-year-old ginseng, a total of 96 expressed PgSPL transcripts out of 106 were used to construct the heat map (Figure 8a). Among them, 16 (17%) transcripts were expressed in all the tissues while nine (9.4%) transcripts specifically expressed in one tissue. Of these nine transcripts, five expressed only in fruit pedicel, two in leaflet pedicel, one in fruit peduncle, and one in the main root cortex. The rest of the 71 transcripts expressed in multiple tissues (Figure 5b).
To show the spatial interaction characteristic of PgSPL transcripts, we calculated the Spearman's rank correlation coefficients among all transcripts expressed in different tissues using the R language. Then, in order to make it easier to read, we visualized the results as a network by BioLayout Express 3D software. Under the p-value of 0.05, all PgSPL transcripts expressed in 14 tissues formed a single co-expression network (Figure 7a), which fell into six clusters (Figure 7b) and contained 96 nodes and 527 edges in total (Figure 6a). The average connectivity reached up to 10.979. In order to reflect the tightness of this network, we counted the nodes and edges of it under the increasingly strict p-value, and introduced same number of sequences randomly drawn from ginseng transcriptome to construct network as the negative control. The results were showed in Figure 6c,d. Under p = 0.05, the number of nodes and edges of all SPL transcripts were 96 and 527 respectively; and for negative control were 95 and 370, respectively. However, under the p = 1 × 10 −5 , the number of nodes and edges of all SPL transcripts were 25 and 21 respectively, while that of nodes and edges of unknown transcripts both were 0. This result indicated that the PgSPL transcripts formed a tighter network than the randomly drawn transcripts. To further prove the correlation between the pairs of each SPL transcripts, we constructed the network using randomly drawn two-thirds of total SPL transcripts (which was 64) and introduced the negative control as described above (Figure 6e were 59.5 and 148.3, respectively. Under the p = 1 × 10 −8 , the number of nodes and edges of 64 SPL transcripts were 3.1 and 2.45, respectively, while that of nodes and edges of 64 unknown transcripts were both 0. Especially, the average number of nodes and edges of PgSPL network were significantly larger than that of the random unknown genes under the p = 1 × 10 −3 to p = 1 × 10 −8 . These results indicated that the PgSPL transcripts have a significant co-expression interaction with each other and might reflect the mechanism underlying the PgSPL functions.

Expression Characteristic of PgSPL Transcripts in 4-Year-Old Roots of Different Genotypes
We

Expression Characteristic of PgSPL Transcripts in 4-Year-Old Roots of Different Genotypes
We explored the expression pattern of PgSPL in different genotype through the heat map analysis of PgSPL transcripts in 4-year-old roots of 42 farmers' cultivars. A total of 90 of 106 transcripts expressed in the roots of the 42 farmer's cultivar. The CV of the number of expressed transcripts in every genotype's roots was 11%, which indicated that the expression of PgSPL transcripts has consistency in different genotypes.
To show the correlation among all PgSPL transcripts in different genotypes, we calculated Spearman's rank correlation coefficients among all transcripts expressed in different genotypes using the R language. Then, in order to make it easier to read, we visualized the results using the BioLayout Express 3D software. Under the p = 0.05, the network of expressed transcripts in 42 farmer's cultivars fell into nine clusters and consisted of 90 nodes and 409 edges in total (Figure 7b). The average connectivity of the network was 9.089. In order to reflect the tightness of this network, we counted the nodes and edges of it under the increasingly strict p-value, and introduced same number of sequences randomly drawn from ginseng transcriptome to construct network as the negative control. The results are shown in Figure 7c,d. Under p = 0.05, both the SPL transcripts' and unknown transcripts' number of nodes and edges were respectively 90 and 409. However, under the p = 1 × 10 −6 , the number of nodes and edges of all SPL transcripts are 26 and 14 respectively, while negative control's nodes and edges both were 0 (Figure 7a-d). This result indicated that the PgSPL transcripts formed a tighter network than the randomly drawn transcripts. To further prove the correlation between pairs of each SPL transcripts, we constructed the network using randomly drawn two-thirds of total SPL transcripts (which was 60) and introduced the negative control as described above (Figure 7e,f). After 20 repetitions, we calculated the average of them. The result shows that under the p = 0.05, the average number of nodes and edges of random 60 SPL transcripts were 42.7 and 109.2, respectively; the average number of nodes and ages of 60 random unknown transcripts were 53.9 and 135.85, respectively. However, under the p = 1 × 10 −8 , the number of nodes and edges of 60 SPL transcripts were 3.25 and 2, respectively, while negative control's nodes and edges were 0.4 and 0.2, respectively (Figure 7e-h). This was the same as the network of expressed SPL transcripts in 14 tissues. The network of expressed SPL transcripts in 42 farmers' cultivars pointed to the fact that the expression of some SPL transcripts in different tissues had a close correlation when performing their biological functions.
Plants 2020, 9,354 16 of 24 every genotype's roots was 11%, which indicated that the expression of PgSPL transcripts has consistency in different genotypes.
To show the correlation among all PgSPL transcripts in different genotypes, we calculated Spearman's rank correlation coefficients among all transcripts expressed in different genotypes using the R language. Then, in order to make it easier to read, we visualized the results using the BioLayout Express 3D software. Under the p = 0.05, the network of expressed transcripts in 42 farmer's cultivars fell into nine clusters and consisted of 90 nodes and 409 edges in total (Figure 7b). The average connectivity of the network was 9.089. In order to reflect the tightness of this network, we counted the nodes and edges of it under the increasingly strict p-value, and introduced same number of sequences randomly drawn from ginseng transcriptome to construct network as the negative control. The results are shown in Figure 7c,d. Under p = 0.05, both the SPL transcripts' and unknown transcripts' number of nodes and edges were respectively 90 and 409. However, under the p = 1 × 10 −6 , the number of nodes and edges of all SPL transcripts are 26 and 14 respectively, while negative control's nodes and edges both were 0 (Figure 7a-d). This result indicated that the PgSPL transcripts formed a tighter network than the randomly drawn transcripts. To further prove the correlation between pairs of each SPL transcripts, we constructed the network using randomly drawn two-thirds of total SPL transcripts (which was 60) and introduced the negative control as described above (Figure 7e,f). After 20 repetitions, we calculated the average of them. The result shows that under the p = 0.05, the average number of nodes and edges of random 60 SPL transcripts were 42.7 and 109.2, respectively; the average number of nodes and ages of 60 random unknown transcripts were 53.9 and 135.85, respectively. However, under the p = 1 × 10 −8 , the number of nodes and edges of 60 SPL transcripts were 3.25 and 2, respectively, while negative control's nodes and edges were 0.4 and 0.2, respectively (Figure 7e-h). This was the same as the network of expressed SPL transcripts in 14 tissues. The network of expressed SPL transcripts in 42 farmers' cultivars pointed to the fact that the expression of some SPL transcripts in different tissues had a close correlation when performing their biological functions.
To show the expression pattern in different genotypes, we visualized the expression quantity data of 42 farmer's cultivars by TBtools [34]. From the heat map, we could see that the expression quantity distributions of PgSPL03, 13, 14, 21-11, 22-03, 24-15, 28-35, and 30-08 were very outstanding, among which PgSPL28-35 had a high expression quantity in almost all genotype except for S1 and S12, while the rest of these had low expression quantities in most genotype except for several farmers' cultivars ( Figure 8b; Table S4).  To show the expression pattern in different genotypes, we visualized the expression quantity data of 42 farmer's cultivars by TBtools [34]. From the heat map, we could see that the expression quantity distributions of PgSPL03, 13, 14, 21-11, 22-03, 24-15, 28-35, and 30-08 were very outstanding, among which PgSPL28-35 had a high expression quantity in almost all genotype except for S1 and S12, while the rest of these had low expression quantities in most genotype except for several farmers' cultivars ( Figure 8b; Table S4).

PgSPL Transcripts Have a Relatively Centralized Distribution of Functions and Tend to Serve for Close Purposes
According to the present studies, SPL transcripts are associated with biological processes such as regulation for flowering [8,9], and molecular function such as DNA-binding [16]. Hence, most of them should have been annotated with biological processes and molecular functions. The GO result shows that, however, only 14 (14%) transcripts are annotated with a biological process, while 70 (71%) transcripts are annotated with cell components, which indicates that the function of SPL in ginseng quite differs from the SPL in other species. SPL transcripts in other species are mostly regulatory

PgSPL Transcripts Have a Relatively Centralized Distribution of Functions and Tend to Serve for Close Purposes
According to the present studies, SPL transcripts are associated with biological processes such as regulation for flowering [8,9], and molecular function such as DNA-binding [16]. Hence, most of them should have been annotated with biological processes and molecular functions. The GO result shows that, however, only 14 (14%) transcripts are annotated with a biological process, while 70 (71%) transcripts are annotated with cell components, which indicates that the function of SPL in ginseng quite differs from the SPL in other species. SPL transcripts in other species are mostly regulatory genes that control the plant development process, while in ginseng, they are not only regulatory genes but also structural genes. A total of 95 (96%) and 70 (71%) transcripts are categorized into molecular functions and cell components, respectively. To be specific, 90 (91%), 56 (57%), 56 (57%) and 56 (57%) Plants 2020, 9, 354 20 of 23 transcripts are categorized into the binding, cell, organelle, and cell part, respectively. With SBP domains, all SPL transcripts are expected to basically associate with DNA-binding. Nonetheless, the heat maps show that the difference of PgSPL transcripts' expressions in different tissues and age groups are very large and pronounced, indicating that certain PgSPL transcripts can just be expressed in a single tissue or a specific development stage (Figure 8a,c); furthermore, some of the SPL transcripts are target genes of miR156/157 and the rest of them are not [14]. All the results above point to the fact that PgSPL transcripts have similar functions and perform their functions through a single system.

Some PgSPL Transcripts with a Prominent Spatial Expression Pattern
From the angle of spatial expression pattern, PgSPL03, 13, 14, 23-01, 23-35, 24-09 and 24-15 tend to be expressed in underground parts of ginseng rather than the aboveground parts, among which the tendencies of PgSPL03, 13, and 14 are particularly obvious. On the contrary, the PgSPL16, 28-35 and 29 tend to be expressed in the aboveground parts such as leaf, rather than underground parts (Figure 8a; Table S4). It is obvious that PgSPL transcripts specifically expressed in underground parts outnumber those specifically expressed in the aboveground parts, which indicates that PgSPL transcripts play a crucial role more in the roots instead of other tissues such as leaf or seed.

Some PgSPL Transcripts with a Prominent Temporal Expression Pattern
It is reported that the expression of miR156 is regulated by age. To be specific, miR156 is highly accumulated in seedlings and gradually decreases as the plant ages in Arabidopsis thaliana [18,38]. The sense of the age of miR156 is prevalent in different species. In maize, rice, and poplar, the expression of miR156 decreases as age increases [5,39,40]. This result suggests that the upstream of miR156 may be a very conservative age signal of the age pathway and content of miR156 determines the physiological age of the plant [19]. Overexpression of miR156 prolongs the plant's infancy, with increases in branching and leaf growth as well as delays of the flowering phenotype. These phenotypes are observed in Arabidopsis [8,18,38], maize [39], rice [5], tomato [13], and poplar [40]. Conversely, inhibition of miR156 activity by "target mimicry" in Arabidopsis can promote plants into adulthood [18,38,41,42].
The miR172 is also an important molecule that regulates plant transformation from childhood to adulthood [18]. Contrary to the expression pattern of miR156, the expression level of miR172 increases as age increases. Experiments with chromatin immunoprecipitation have shown that miR172 is a direct SPL downstream gene. In Arabidopsis, SPL9 and SPL10 bind directly to the promoter of miR172B and activate its transcription [18]. In miR156 target mimicry transgenic plants, the content of miR172 increases [18]. The function of miR172 in plants is opposite to miR156. Overexpression of miR172 causes Arabidopsis to enter adulthood in advance [8].
Thus, according to the present studies, SPL gene expression should go up with the decrease of miR156 and the increase of miR157 expression. Eventually, the plant transition is activated and moves from the vegetative process to the reproductive process. In our study, PgSPL03, 23-01, 23-35, 24-09, and 25-08 have a distinct tendency of expression increasing over time, which is consistent with the present studies. However, this is not a story of a uniform increase of all SPL transcripts. In other words, not all the expression pattern of PgSPL transcripts are consistent with the present studies. The expressions of PgSPL24-07 and 28-11 even decrease over time. PgSPL19, 22-04, 25-01, 30-10 barely express in the roots of the first three age groups (5, 12, and 18 years old), instead of only expressing in 25 years old roots. PgSPL30-02 and PgSPL 28-06 express only in 12-year-old and 18-year-old roots, respectively. One can notice that PgSPL30-02 and PgSPL30-10 come from the same gene, PgSPL30. However, they have entirely different temporal expressions, so do PgSPL24 and 28. Another noticeable point is that some PgSPL transcripts such as PgSPL15, 16, and 28-35 almost have a high expression in all ages of roots except those in 18-year-old roots (Figure 8c; Table S4). A total of 42 (52%), 47 (58%), 53 (65%), and 64 (79%) of transcripts express in 5, 12, 18, and 25 year-old roots respectively. To be specific, 46 (57%) of the transcripts have a higher expression level in 25-year-old roots than those in 5-year-old roots, and 26 (32%) of the transcripts have a lower expression in 25-years-old roots than those in 5-year-old roots, which is just partly consistent with the present studies. In a nutshell, the expression pattern of SPL transcripts in ginseng is very complicated and not completely consistent with the SPL transcripts in other species.

Some PgSPL Transcripts with a Prominent Genotype Expression Pattern
From the expressions of PgSPL transcripts in different genotypes, most transcripts' expressions are very close in all farmers' cultivars, which indicates that the expression of PgSPL transcripts has a universal characteristic. However, there are also specific characteristics of PgSPL. PgSPL28-35 expresses in almost all the farmers' cultivars except S1 and S24, and PgSPL22-03 specifically has an intense expression in S22 (Figure 8b; Table S4).

PgSPL as Ginseng's Age Molecular Markers
From the temporal expression pattern perspective, we found that the number of expressed PgSPL transcripts were increasing with the growing year longer of ginseng and most PgSPL transcripts expressed differential in different growing years. The expression of PgSPL03, PgSPL23-35, PgSPL24-09, PgSPL25-08, and PgSPL23-01 had an obvious tendency to increase with age. Additionally, from the expression heat maps of 14 tissues and 42 genotypes, we found that PgSPL23-35 and PgSPL24-09 expressed steadily in all genotypic ginseng' roots. Therefore, if the relationship between these two transcripts' expression and ginseng's age are verified and a standard curve is drawn, they can act as ginseng age's molecular marker.

Conclusions
In this study, 106 PgSPL transcripts were identified. According to the annotation results, we found that PgSPL gene family might be involved in multiple functions, but the main function is focused on the binding function. We calculated the synonymous and nonsynonymous substitution rates and the results indicated that SPL genes had been undergoing a strong purifying selection. Through phylogenetic analysis, we found that SPL transcripts were basically divided into two clades. These two clades respectively consisted of three groups, which was to say, there were six groups in total. On the basis of this, we aligned the SBP domain for each groups. The results showed that both the C3HC2HC type SBP and C4C2HC type SBP existed in ginseng. We constructed the network of PgSPL transcripts expressed in 14 tissues and 42 genotypes and did the network analyses. The results indicated that the PgSPL transcripts have a significant co-expression interaction with each other and might reflect the mechanism underlying the PgSPL functions. According to PgSPL transcripts' expression characteristics, we found that PgSPL23-35 and PgSPL24-09 were the most proper two transcripts to further study as ginseng age's molecular marker.
Supplementary Materials: The following are available online at http://www.mdpi.com/2223-7747/9/3/354/s1. Table S1: The nucleic acid sequences and protein sequences of the PgSPL gene transcripts. Table S2: The classification, annotation, and GO functional categorization of the PgSPL gene transcripts. Table S3: The identified genes used as evolutionary controls for PgSPL genes phylogenetic analysis.