Microsatellites as Agents of Adaptive Change: An RNA-Seq-Based Comparative Study of Transcriptomes from Five Helianthus Species

: Mutations that provide environment-dependent selective advantages drive adaptive divergence among species. Many phenotypic differences among related species are more likely to result from gene expression divergence rather than from non-synonymous mutations. In this regard, cis-regulatory mutations play an important part in generating functionally signiﬁcant variation. Some proposed mechanisms that explore the role of cis-regulatory mutations in gene expression divergence involve microsatellites. Microsatellites exhibit high mutation rates achieved through symmetric or asymmetric mutation processes and are abundant in both coding and non-coding regions in positions that could inﬂuence gene function and products. Here we tested the hypothesis that microsatellites contribute to gene expression divergence among species with 50 individuals from ﬁve closely related Helianthus species using an RNA-seq approach. Differential expression analyses of the transcriptomes revealed that genes containing microsatellites in non-coding regions (UTRs and introns) are more likely to be differentially expressed among species when compared to genes with microsatellites in the coding regions and transcripts lacking microsatellites. We detected a greater proportion of shared microsatellites in 5 (cid:48) UTRs and coding regions compared to 3 (cid:48) UTRs and non-coding transcripts among Helianthus spp. Furthermore, allele frequency differences measured by pairwise F ST at single nucleotide polymorphisms (SNPs), indicate greater genetic divergence in transcripts containing microsatellites compared to those lacking microsatellites. A gene ontology (GO) analysis revealed that microsatellite-containing differentially expressed genes are signiﬁcantly enriched for GO terms associated with regulation of transcription and transcription factor activity. Collectively, our study provides compelling evidence to support the role of microsatellites in gene expression divergence.


Introduction
Understanding the genomic basis of adaptive divergence among species remains a central theme in evolutionary biology. A major contributor to species divergence is variation in gene expression regulation [1]. Mutations within the cis-regulatory regions appear to be especially important in generating evolutionarily significant variations among species [2]. 2 of 15 Indeed, studies analyzing the relative contribution of cis-regulatory mechanisms and protein coding changes in species divergence suggest that ontogenetic change among species is frequently cued by cis regulatory elements [3][4][5]. Some of the most frequently found cis-regulatory elements are highly mutable microsatellites or short tandem repeats [6]. A growing body of research now suggests that microsatellites in cis-regulatory regions can play a major role in gene expression variation [7,8].
Microsatellites consist of short tandem repeats, typically with 2-to 6-base-pair-long motifs, repeated several to dozens of times [9,10]. Microsatellite evolution is often explained using the Stepwise Mutation Model (SMM) [11], in which mutation is achieved through addition or loss of a single repeat unit resulting in microsatellite length variation according to a symmetric random walk [12]. Significant deviations from this symmetry (skewness or asymmetry) in the distribution of microsatellite repeat numbers have been at times interpreted as evidence for selection [13]. Long considered non-functional and evolutionarily neutral, microsatellites are frequently used in genetic studies due to their high polymorphism [14]. Despite the high mutation rates associated with microsatellites, they are an integral part of all eukaryotic genomes and transcriptomes [15]. Initially, microsatellites were viewed as sequences that could be detrimental if present in the transcribed regions of the genome. For example, uncontrolled expansion of microsatellites in genic regions is known to cause nearly 30 human neurodegenerative diseases [16], including Huntington's disease and fragile X syndrome. Yet, over the years, several studies have shown that microsatellites could be functionally beneficial and could facilitate rapid adaptation [17,18].
Microsatellites have recently been implicated in morphogenesis and reproductive phenology in plants [19][20][21], neuronal and craniofacial development in primates [22,23], limb and skull morphology in domesticated dogs [24], circadian clock cycles in fungi [25], and courtship behaviors in mammals [26], among other traits. Several mechanisms have been proposed to explain how the presence of microsatellites in different regions of the genome can affect gene function and products. Microsatellites may alter transcription rates by serving as alternative transcription start sites [27], provide additional transcription factor binding sites [28], and variation in some coding region microsatellites may affect the structure of proteins, including that of transcription factors [24]. With the advent of high-throughput sequencing, microsatellites and their functional role in gene regulation are now being studied at the genome level. Several such large-scale genome-wide and transcriptome-wide studies indicate that microsatellite-linked variations in gene expression are neither isolated nor sporadic, but widespread across genomes [29][30][31][32].
In this study, we investigate the role of microsatellites in gene expression divergence among species by comparing transcriptomes of several individuals belonging to the genus of North American annual wild sunflowers (Helianthus). Helianthus has a well-characterized ecological and evolutionary history [33], and the genomic basis of local adaptation, ecotype formation, and speciation is well-documented in several species [34][35][36]. This makes them a good system for studying adaptive processes such as those that can result from microsatellite polymorphisms in genes.
Recent transcriptome-based studies of the common sunflower (Helianthus annuus L.) revealed that a substantial number of transcribed microsatellites in sunflower can be functional. Over 400 transcribed microsatellites have been linked to gene expression variation in common sunflower populations across a latitudinal gradient in North America [32]. Furthermore, microsatellites of A and AG motif types have been linked to gene expression divergence among populations of common sunflower [37]. These previous studies on functional microsatellites in common sunflower provide impetus to explore the potential evolutionary role of transcribed microsatellites in gene expression divergence among closely related Helianthus species.
We designed the current study to answer the following questions: (1) Are microsatellite-containing genes more likely to show evidence of expression divergence among species, as compared to genes lacking microsatellites?  [38,39].

Post Sequencing Data Collection
A reference transcriptome was constructed from one H. annuus individual ("Canal2") using the de novo transcriptome assembly software Trinity [40]. The reference assembly consists of 51,468 transcripts as is available here: https://doi.org/10.5061/dryad.9q1n4 (accessed on 15 March 2021) [41]. Each fastq file containing the raw reads for an individual was aligned to the reference transcriptome using BWA [42]. The resulting aligned files in BAM format were indexed and reads mapped to each transcript were measured as counts using SAMTools v.0.1.17 [43].

Functional Annotation
A standalone BLASTX [44] search of the reference transcriptome was performed against a publicly available H. annuus protein sequence database. First, we built a local protein sequence database with common sunflower protein sequences, available at https: //www.heliagene.org/ (version HanXRQr2.0-SUNRISE. accessed on 15 March 2021). Using the reference transcriptome as the query, we performed a BLASTX search against the sunflower protein sequence database with an E-value cutoff set at 0.0001, gap open penalty score set at 11, gap extension penalty score of one, and minimum word size of three. We used BLOSUM62 as the matrix of choice for the search. To minimize the number of hits for each query sequence, we set the best hit overhang at 0.25 and the maximum target sequence value at one. The output of best hits produced by the BLASTX search was further filtered based on E-value and bit score to retain the hit with the lowest E-value and the highest bit score for each query sequence.

Mining and Genotyping SNPs and Microsatellites
To identify and genotype SNPs present in all individuals, we used the "bcftools" and "mpileup" option in SAMTools v.0.1.17 [43]. SNP genotypes with Phred-scaled genotype likelihoods below 30 were excluded from analyses, and hence, only high-quality SNPs were used in further analyses. The methodology was previously detailed in [41].
Microsatellites were identified using SciRoKo v 3.4 [45]. The parameters in SciRoKo were set to mine microsatellites of repeat sizes 1 to 6 bp, a minimum total length of 15 bp, and to find impure microsatellites, i.e., microsatellites that are interrupted by few substitutions or indels, using the "mismatch variable penalty" option, which allows for selection of impure microsatellites adjusted with respect to the total length of the tract.
Microsatellite alleles at a locus can be genotyped using RepeatSeq v 0.8.2 [46]. Re-peatSeq works in conjunction with Tandem Repeats Finder (TRF) v. 4.07b [47]. Hence, microsatellites were mined from the reference transcriptome using TRF. This list of microsatellites generated by TRF is not as exhaustive as the list provided by SciRoKo, due to the more stringent criterion used by TRF that prevents mining shorter microsatellites. TRF was run using default settings, except for the parameter, raw score adjusted from the default 50 to 35, and the maximum repeat size adjusted to 6. All individuals were genotyped at the TRF-generated list of microsatellites using RepeatSeq. RepeatSeq was run with default settings along with parameters set to output "calls" and "repeatseq" files, which contain a list of all microsatellite genotype assignments and alignments at each microsatellite region, respectively.

Differential Expression
We performed differential gene expression analysis in OmicsBox version 1.4 (BioBam Bioinformatics S.L., Valencia, Spain) with the edgeR package [48]. False discovery rate (FDR) was set to 0.05 with a log fold change of ≤2 or ≥2, and read counts were normalized for relative expression and effective library size with the TMM (Trimmed Mean of M-values) method implemented in edgeR. The minimum number of samples reaching more than zero counts per million reads (CPM filter) was set to three to match the number of samples in the smallest group (H. bolanderi) in the data set. We performed differential expression analyses for each pair of species (10 comparisons), and genes with log fold change of ≤2 or ≥2 at FDR < 0.05 for each comparison were identified as significantly differentially expressed (DE) genes for each pairwise species comparison and used in downstream analyses.

Gene Ontology (GO) Enrichment Analysis
A complete list of gene ontology (GO) terms associated with all annotated H. annuus genes available as part of the reference genome [49], was downloaded from https://www. Microsatellite-containing differentially expressed genes for each pairwise species comparison were extracted from the list of microsatellite-containing genes identified by SciRoKo that were successfully mapped to the reference genome. Helianthus annuus gene IDs (HanXRQr2.0 IDs) associated with microsatellite-containing differentially expressed genes were used as the "Test-set", and the background list of all H. annuus gene IDs associated with the reference transcriptome was used as the "Reference-set" in the GO enrichment analysis for each pairwise species comparison. Additionally, we conducted a GO enrichment analysis of all microsatellite-containing genes that were differentially expressed in at least one pairwise species comparison against the background list of all expressed genes in the reference transcriptome. All GO enrichment analyses were conducted in OmicsBox version 1.4 by performing Fisher's Exact Test with FDR set to 0.05.

Relative Importance of Microsatellites in Species Divergence
We are interested in understanding the role of microsatellites in species divergence via gene expression changes. To test this hypothesis, Chi-squared tests were performed to detect if microsatellite-containing genes were more likely to be differentially expressed than genes lacking microsatellites. We performed Chi-squared tests for each pairwise species comparison, and across all comparisons with microsatellites that were identified in DE genes in at least one pairwise species comparison. If microsatellites are important for bringing about species level divergence, then an elevated proportion of microsatellite-containing genes would show differential expression as opposed to genes lacking microsatellites.
Furthere, using the alignment start and end sites from the BLASTX output of the reference transcriptome and the microsatellite start and end sites from the SciRoKo output of the microsatellite search, we identified the location of the microsatellites (non-coding versus coding) within the transcripts. Previous large-scale studies on humans and plants have shown that in transcribed regions, more microsatellites are located within non-coding regions than in coding regions [50][51][52]. We observed similar patterns of microsatellite distribution in the reference transcriptome used in this study. Given the abundance of microsatellites within non-coding regions, which includes UTRs, and their implicated role in gene expression regulation [53][54][55], we tested with Chi-squared tests whether genes containing microsatellites within non-coding regions were more likely to be differentially Symmetry 2021, 13, 933 5 of 15 expressed compared to genes that contained microsatellites in coding regions and genes lacking microsatellites.

Population Genetic Analyses
Individual genotypes for SNPs for four species (H. annuus, H. debilis, H. petiolaris, and H. argophyllus) were used to conduct population genetic analyses. H. bolanderi and H. exilis were excluded from the population genetic analyses due to low sample size. PGDSpider v. 2.0.4 [56] was used to convert the SNP genotypes files into formats suitable for further population genetic analyses. To assess the level of genetic divergence among species, F ST values [57] were estimated in R v.2.15.3 [58] using the package Hierfstat v. 0.04-10 [59] for SNPs in both microsatellite-containing transcripts and those lacking microsatellites.
Microsatellites on average tend to have higher mutation rates when compared to the rest of the genome [60], hence, to obtain a neutral estimate of genetic divergence between genes harboring microsatellites and gene lacking microsatellites, F ST values at SNPs were estimated separately for each category. With respect to microsatellites' role in species divergence, we assume that microsatellite alleles of different lengths could aid in differential response to the environment; hence, different microsatellite alleles are likely to be positively selected in different species. As such, if variation in microsatellite lengths is contributing to differences among species, then allele frequency differences at microsatellite-containing genes are likely to be elevated among species when compared to the background rate of divergence. Hence, to ascertain if genetic divergence observed in microsatellite-containing genes is likely to be higher or lower than the background rate of divergence, we compared F ST values from SNPs at microsatellite-containing genes versus those lacking microsatellites. Wilcoxon Rank Sum tests were carried out in R v.15.2.0 [58] to evaluate significance.

Shared Microsatellites
To estimate the proportion of microsatellite tracts identified in H. annuus that are shared across other Helianthus species, a subset of 1129 microsatellite loci with sufficient genotype data across 48 out of the 50 individuals was extracted from microsatellites detected by TRF. Coverage was determined based on SAMtools idxstats gene-by-gene expression data, with >0 reads found at a gene considered to indicate coverage. This set of 1129 was then analyzed by RepeatSeq, with successful genotyping of a locus in an individual considered as evidence for that locus being shared with that individual. Motif size and gene region information obtained via ESTscan v. 3.0.2 [61,62] data were then used to test whether the location of the microsatellite within a gene determined the likelihood of that microsatellite being shared among the five Helianthus species. Values were normalized relative to the average detection rate of loci in H. annuus.

SNPs and Microsatellites Mined
Approximately 200,000 high-quality SNPs, corresponding to 99.9% genotyping accuracy, were mined from the data using parameters previously described in [41]. All individuals were genotyped at these high-quality SNPs.
A total of 11,166 putative microsatellites were identified in the reference transcriptome by SciRoKo and of those, 9479 microsatellites located in 7247 transcripts were successfully mapped onto genes in the reference genome (Supplementary Table S1). A fraction (3786/11,166) was identified using TRF and used for further genotyping with RepeatSeq to assess the proportion of microsatellites shared among species. High-quality microsatellite loci were genotyped at 1129 of the original list of 3786 microsatellite loci in 48 individuals with RepeatSeq, and that information was used to identify the proportion of microsatellites that each species shared with H. annuus.

Differential Expression Analysis
Significantly differentially expressed (DE) genes were identified for each pairwise species comparison with edgeR. The highest number of DE transcripts was identified between H. exilis and H. petiolaris (20,851), of which 14,756 were mapped to annotated genes in the H. annuus genome (Supplementary Table S2). This estimate was closely followed by 20

Microsatellite-Containing Differentially Expressed Genes
Using the list of 7247 microsatellite-containing genes in the reference transcriptome, we identified microsatellite-containing DE genes in all pairwise species comparisons (Supplementary Table S3). The highest percentage of microsatellite-containing DE genes was identified between H. bolanderi and H. debilis (26.4% of the 7671 DE genes) (Figure 1a). This was followed by 1952 (25.9% of the 7539 DE genes) microsatellite-containing DE genes identified in the H. annuus and H. bolanderi comparison (Figure 1a). The lowest percentage of microsatellite-containing DE genes was observed between H. annuus and H. petiolaris at 22% (1208 of 4282 DE genes) (Figure 1a). Across all 10 pairwise species comparisons, we identified 4978 microsatellite-containing DE genes (23.8% of 20,886 DE genes). In these microsatellite-containing genes that were identified as differentially expressed in at least one pairwise species comparison, the highest percentage of microsatellites was located in non-coding regions (68.4% of 5451 microsatellites). We observed similar patterns of microsatellite distribution in DE genes identified in pairwise species comparisons, and the greatest percentage of non-coding microsatellites were identified in the DE genes between H. annuus and H. debilis (74.5% of 106 microsatellites) ( Table 1).  Figure  1b). However, trends were consistent with an elevated rate of differential expression between most species comparisons, and the relatively low numbers of DE genes identified in some of these pairwise comparisons may have likely reduced the power to detect significant associations. Chi-squared tests were performed to test whether microsatellite-containing genes were more likely to be differentially expressed between species pairs (* p-value < 0.0001, ** p-value < 0.00001). (b) Distribution of microsatellites in different transcript regions within differentially expressed genes in pairwise Helianthus spp. comparisons. Chi-squared tests were performed to test whether genes with microsatellites in non-coding regions were more likely to be differentially expressed compared to genes with microsatellites in coding regions and genes lacking microsatellites (* p-value < 0.01, ** p-value < 0.00001).

Functional Annotation and Gene Ontology Enrichment Analysis
The BLASTX search of the reference transcriptome against the annotated H. annuus genome produced best hits for 32,425 out of 51,468 transcripts (Supplementary Table S4). Using the output of SciRoKo we identified 7247 microsatellite-containing genes from the Chi-squared tests were performed to test whether microsatellite-containing genes were more likely to be differentially expressed between species pairs (* p-value < 0.0001, ** p-value < 0.00001). (b) Distribution of microsatellites in different transcript regions within differentially expressed genes in pairwise Helianthus spp. comparisons. Chi-squared tests were performed to test whether genes with microsatellites in non-coding regions were more likely to be differentially expressed compared to genes with microsatellites in coding regions and genes lacking microsatellites (* p-value < 0.01, ** p-value < 0.00001).
In general, when genes that were identified as differentially expressed in at least one pairwise species comparison (20,886) were considered, microsatellite-containing genes were more likely to be differentially expressed than genes lacking microsatellites (Chi-squared test, p-value < 0.00001, Figure 1a). Similarly, in 6 of the 10 pairwise species comparisons, we found that microsatellite-containing genes were more likely to show expression divergence compared to genes lacking microsatellites (Figure 1a). In four pairwise species comparisons (H. petiolaris v. H. debilis, H. bolanderi v. H. exilis, H. annuus v. H. petiolaris, and H. annuus v. H. debilis), presence of microsatellites did not affect the likelihood of differential expression in genes (Chi-squared test, p-value > 0.05) (Figure 1a).
We further tested whether genes containing microsatellites within non-coding regions (UTRs and introns) are more likely to be differentially expressed compared to genes with microsatellites in the coding regions and genes lacking microsatellites. Generally, across genes that were identified as differentially expressed in at least one pairwise species comparison (20,886), a gene with a microsatellite in non-coding regions was more likely to be differentially expressed compared to genes with microsatellites in the coding regions and genes lacking microsatellites (Chi-squared test, p-value < 0.0001) (Figure 1b). We observed similar associations of non-coding microsatellites and differential expression in six of the 10 Figure  1b). However, trends were consistent with an elevated rate of differential expression between most species comparisons, and the relatively low numbers of DE genes identified in some of these pairwise comparisons may have likely reduced the power to detect significant associations.

Functional Annotation and Gene Ontology Enrichment Analysis
The BLASTX search of the reference transcriptome against the annotated H. annuus genome produced best hits for 32,425 out of 51,468 transcripts (Supplementary Table S4). Using the output of SciRoKo we identified 7247 microsatellite-containing genes from the list of best hits (Supplementary Table S1).
GO enrichment analysis of microsatellite-containing DE genes compared to genes in the reference transcriptome identified enriched GO terms in seven out of the 10 pairwise species comparisons (Supplementary Table S5). The number of GO terms enriched in pairwise species comparisons ranged from 12 to 73 with most GO terms identified in the H. bolanderi v. H. petiolaris comparison (Supplementary Table S5). These enriched GO terms were further reduced to the most specific GO terms with OmicsBox (v 1.4) (Supplementary Table S6, Figure 2). The enriched GO terms represented the three GO categories, namely, biological process, molecular function, and cellular component. Some noteworthy enriched GO terms in microsatellite-containing DE genes across multiple pairwise species comparisons included "regulation of transcription, DNA-templated" (GO:0006355, in five comparisons), "DNA-binding transcription factor activity" (GO:0003700 in four comparisons), and "hormone-mediated signaling pathway" (GO:0009755 in five comparisons) ( Figure 2).
When microsatellite-containing genes that were identified as DE in at least one pairwise species comparison was used as the 'test-set', we identified 48 enriched GO terms across all three GO categories Supplementary Table S7. These 48 GO terms were further reduced to 10 specific GO terms that included some of the GO terms associated with transcription regulation identified in pairwise species comparisons Supplementary Table S8. When microsatellite-containing genes that were identified as DE in at least one pairwise species comparison was used as the 'test-set', we identified 48 enriched GO terms across all three GO categories Supplementary Table S7. These 48 GO terms were further reduced to 10 specific GO terms that included some of the GO terms associated with transcription regulation identified in pairwise species comparisons Supplementary Table S8.

Population Genetic Estimates
To infer the relative divergence of microsatellite-containing genes to those lacking microsatellites, FST values at SNPs in these two sets of genes were compared. Mean pairwise FST at microsatellite-containing genes were significantly greater than those of genes lacking microsatellites (Table 2).

Population Genetic Estimates
To infer the relative divergence of microsatellite-containing genes to those lacking microsatellites, F ST values at SNPs in these two sets of genes were compared. Mean pairwise F ST at microsatellite-containing genes were significantly greater than those of genes lacking microsatellites (Table 2).

Shared Microsatellites
Across species, we found that H. bolanderi shared the largest portion of H. annuus microsatellite loci (94.2%); H. debilis, H. exilis, and H. petiolaris showed considerably lower percentages of shared microsatellites (79.1%, 72.7%, and 70.7%, respectively). Of repeat motifs, mono-and dinucleotide repeat tracts were less shared across all species (Figure 3). Trinucleotide tracts were much more likely to be shared than other repeat sizes in H. debilis (0.971; average 0.791) and H. petiolaris (0.858; average 0.707) (Figure 3). H. annuus hexanucleotides were preferentially shared with H. exilis (0.893; average 0.727) (Figure 3). The position of microsatellite tracts within a gene was also seen to influence the proportion of shared microsatellites. In all species, the percentage of shared microsatellites found in the 3 UTR and in transcripts that did not detectably code for proteins was lower than the percentage of shared microsatellites found in the 5 UTR and in the coding region ( Figure 4). position of microsatellite tracts within a gene was also seen to influence the proportion of shared microsatellites. In all species, the percentage of shared microsatellites found in the 3′UTR and in transcripts that did not detectably code for proteins was lower than the percentage of shared microsatellites found in the 5′UTR and in the coding region ( Figure 4).

Discussion
Microsatellites are an integral part of eukaryotic genomes and are abundant in both functional and non-functional regions. Frequently found in cis-regulatory regions of genes, microsatellites have been implicated in gene function and products linked to several traits. Here we tested the prediction that microsatellites in transcribed regions of the genome are involved in gene expression divergence among species with 50 individuals from nine closely related Helianthus species using an RNA-seq approach. Our results show

Discussion
Microsatellites are an integral part of eukaryotic genomes and are abundant in both functional and non-functional regions. Frequently found in cis-regulatory regions of genes, microsatellites have been implicated in gene function and products linked to several traits. Here we tested the prediction that microsatellites in transcribed regions of the genome are involved in gene expression divergence among species with 50 individuals from nine closely related Helianthus species using an RNA-seq approach. Our results show that microsatellite-containing genes in general are more likely to be differentially expressed among species. Furthermore, compared to genes with microsatellites in coding regions and genes lacking microsatellites, genes that harbor microsatellites in non-coding regions are more likely to be differentially expressed among species. We found that microsatellitecontaining differentially expressed genes were significantly enriched for GO terms such as "regulation of transcription, DNA-templated" (GO:0006355) and "DNA-binding transcription factor activity" (GO:0003700), among others. Population genetic analyses indicate greater levels of divergence in genes that contained microsatellites compared to those lacking microsatellites.
In most pairwise species comparisons and overall, we observe that microsatellitecontaining genes are more likely to show gene expression divergence compared to those lacking microsatellites (Figure 1a), which highlights the potential contribution of microsatellites to Helianthus species adaptation through gene expression regulation. Similar patterns linking microsatellites to gene expression divergence have been reported in studies on primates. With a large panel of microsatellites from humans and other primates, Ref. [63] reported that genes harboring microsatellites in promoters, introns, UTRs, and coding regions showed greater levels of gene expression divergence among species when compared to genes lacking microsatellites. This study further revealed that genes with microsatellites in transcribed regions consistently show elevated inter-species gene expression levels across different tissue types [63]. At the intra-species level, microsatellites of short motif types, specifically A and AG repeats, have been implicated in gene expression divergence among common sunflower populations, which suggests that specific microsatellite motif types may be more likely to influence gene expression divergence [37]. In addition, a recent study on common sunflower populations across a narrow latitudinal range detected hundreds of transcribed microsatellites linked to gene expression variation among and within populations, which provides further evidence of the regulatory role of microsatellites at an even finer scale [32].
The location of microsatellites within transcripts could be crucial in determining their regulatory role. Our results indicate that a gene containing a microsatellite in non-coding regions that may include UTRs are more likely to show gene expression divergence among sunflower species. Previous reports on UTR microsatellites have linked them to gene expression variation in several species [55,64]. Some of the proposed mechanisms by which UTR microsatellites can regulate gene expression involve altering transcription start and end sites [64,65] and influencing mRNA stability [66]. Microsatellites linked to gene expression variation (eSTRs) in the human genome have been found to significantly overlap transcription factor binding sites [29]. Several experimental studies have demonstrated that gain or loss of repeat units in microsatellites can impact binding of transcription factors causing changes in transcription rates [67][68][69]. Certainly, our results from the GO enrichment analysis of microsatellite-containing differentially expressed genes appear to suggest that microsatellite-mediated gene expression regulation could be linked to transcription factor binding in Helianthus.
Population genetic analysis indicates that microsatellite-containing genes could be under stronger selective pressures than those lacking microsatellites. These patterns are significant and consistent across all pairwise species comparisons. Previously, a study on wild emmer wheat reported asymmetric or non-random distribution of microsatellite repeat numbers among microclimatic niches as evidence for selection on microsatellites [13]. Collectively, our estimates provide compelling evidence to support the potential contribu-tion of microsatellites to species divergence in Helianthus. However, we acknowledge that our use of a single Helianthus annuus individual transcriptome as the reference could have introduced some bias when mapping and mining SNPs. Furthermore, short-read technology and RNA-seq data come with their own set of limitations and challenges. Short reads could have limited access to some longer microsatellite alleles in some species, which could have affected RepeatSeq-based estimates of shared microsatellites. RNA-seq only provides access to the transcribed regions of the genomes; therefore, microsatellites upstream of 5 UTRs that could potentially influence gene expression divergence are beyond our reach. Given these limitations, we believe that the evidence that this study provides supporting a potential causal relationship between the presence of microsatellites and gene expression divergence is consistent across multiple species comparisons and is substantial. However, the specific mechanisms by which microsatellites may influence gene expression divergence among species remain largely unknown. Therefore, in future, studies targeting specific microsatellite-containing differentially expressed genes may be necessary to further explore mechanisms underlying the potential causal relationship between microsatellites and gene expression divergence among species. Collectively, our study shows that transcribed microsatellites could potentially contribute to species divergence; therefore, we expect that integrating analysis of these highly mutable repetitive elements in gene expression studies could greatly enhance our understanding of the genomic basis of species divergence.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/sym13060933/s1, Table S1: List of microsatellite-containing transcripts in the reference transcriptome mapped to the Helianthus annuus reference genome, Table S2: Differentially expressed transcripts identified in the ten pairwise species comparisons, Table S3: List of microsatellite-containing transcripts differentially expressed in each pairwise Helianthus species comparison, Table S4: BLASTX results from mapping the reference transcriptome to the annotated Helianthus annuus reference genome, Table S5: Gene Ontology (GO) terms enriched within microsatellite-containing differentially expressed genes identified in pairwise Helianthus species comparisons compared to the reference transcriptome, Table S6: Gene Ontology (GO) terms enriched (reduced to most specific) within microsatellite-containing differentially expressed genes identified in pairwise Helianthus species comparisons compared to the, Table S7: Gene Ontology (GO) terms enriched within microsatellitecontaining genes differentially expressed in at least one pairwise species comparison compared to the reference transcriptome, Table S8: Gene Ontology (GO) terms enriched (reduced to most specific) within microsatellite-containing genes differentially expressed in at least one pairwise species comparison compared to the reference.