Population Genomics Reveals Gene Flow and Adaptive Signature in Invasive Weed Mikania micrantha

A long-standing and unresolved issue in invasion biology concerns the rapid adaptation of invaders to nonindigenous environments. Mikania micrantha is a notorious invasive weed that causes substantial economic losses and negative ecological consequences in southern China. However, the contributions of gene flow, environmental variables, and functional genes, all generally recognized as important factors driving invasive success, to its successful invasion of southern China are not fully understood. Here, we utilized a genotyping-by-sequencing approach to sequence 306 M. micrantha individuals from 21 invasive populations. Based on the obtained genome-wide single nucleotide polymorphism (SNP) data, we observed that all the populations possessed similar high levels of genetic diversity that were not constrained by longitude and latitude. Mikania micrantha was introduced multiple times and subsequently experienced rapid-range expansion with recurrent high gene flow. Using FST outliers, a latent factor mixed model, and the Bayesian method, we identified 38 outlier SNPs associated with environmental variables. The analysis of these outlier SNPs revealed that soil composition, temperature, precipitation, and ecological variables were important determinants affecting the invasive adaptation of M. micrantha. Candidate genes with outlier signatures were related to abiotic stress response. Gene family clustering analysis revealed 683 gene families unique to M. micrantha which may have significant implications for the growth, metabolism, and defense responses of M. micrantha. Forty-one genes showing significant positive selection signatures were identified. These genes mainly function in binding, DNA replication and repair, signature transduction, transcription, and cellular components. Collectively, these findings highlight the contribution of gene flow to the invasion and spread of M. micrantha and indicate the roles of adaptive loci and functional genes in invasive adaptation.


Introduction
Biological invasion has been reported at increasing rates and on expanding scales since the 20th century, gaining the attention of ecologists and evolutionary biologists [1]. A long-standing unresolved issue in invasion biology concerns the ability of invading species to rapidly adapt to novel environments in nonindigenous areas. A positive correlation between genetic diversity and invasion success has been proposed, as many invaders maintain a level of genetic diversity similar to or even higher than that of native species [2,3]. This correlation has been confirmed in many successful invasive species, especially those that have experienced severe genetic bottlenecks [4,5]. The maintenance of genetic diversity is often related to gene flow between invasive populations [6]. Gene flow can lead to genetic exchange between nearby populations and alter the distribution of genetic variation across the invaded range, preventing a decline in genetic diversity from core to edge improving their invasion potential [21]. Frequent and long-distance seed dispersal may affect genetic differentiation and gene flow among M. micrantha populations. Considering its high dispersal potential and strong adaptability, M. micrantha is an ideal candidate for investigating the contribution of gene flow to population expansion, the response of genetic loci to environmental variables, and the role of functional genes in invasive adaptation.
M. micrantha spread rapidly throughout southern China following its introduction to the Hong Kong Zoological and Botanical Gardens in 1884 [29]. The role of genetic and epigenetic variations in the population expansion of M. micrantha in southern China has been investigated in several genetic studies, all of which used a small number of anonymous genomic markers [30][31][32][33][34][35]. Nevertheless, the available population genetic data failed to provide a comprehensive understanding of the genetic status and adaptive response of M. micrantha populations. Some key issues have not been resolved in these genetic studies. First, different molecular markers detected obviously different levels of genetic differentiation across populations distributed in similar geographic ranges (values ranging from 0.044 to 0.442), hindering the understanding of how gene flow contributes to the invasive success of M. micrantha in southern China. Secondly, genetic loci that respond to environmental variables have not been detected; consequently, the effect of environmental variables on genetic variation and population adaptation remains unresolved. Third, candidate functional genes related to colonization and population adaptation have not yet been identified. Population genomic data for natural populations of M. micrantha are needed to address these issues.
In the current study, we used the GBS approach to generate a large number of SNPs to analyze 306 M. micrantha individuals from 21 populations in six regions, covering the main part of its invasive range in southern China. Based on the SNP data, we detected the effect of gene flow on the genetic variation in the introduced populations of M. micrantha. We also identified outlier SNPs associated with environmental variables. Furthermore, the gene family clustering analysis of the genomes of M. micrantha and 12 other plants revealed unique gene families in M. micrantha that are important for adaptation. We also detected candidate genes that showed a significant positive selection signature in M. micrantha and explored their functions. The current study provides new insights into the population expansion and adaptation of M. micrantha.

Plant Material and Soil Sampling
In 2016, 306 M. micrantha individuals were sampled from 21 invasive populations distributed in six regions in southern China, including Hong Kong, Macao, Shenzhen, Neilingding Island, Dongguan, and Zhuhai ( Figure 1; Table S1). From each population, 10-16 plants each growing more than 15 m apart were randomly sampled. Fresh leaves were collected from each plant sample and preserved in silica gel. Additionally, three soil samples that were evenly distributed within each population were collected. Approximately 1 kg of soil per sample was taken from a depth below 10 cm.

DNA Extraction and GBS Library Sequencing
Genomic DNA from 306 M. micrantha individuals was extracted using the cetyltrimethylammonium bromide (CTAB) method [36]. DNA quality was evaluated using 1% agarose gel electrophoresis and the DNA amount was quantified using a NanoPhotometer spectrophotometer (IMPLEN, CA, USA). The DNA concentration was measured using a Qubit 2.0 Flurometer (Invitrogen, Life Technologies, CA, USA). High-quality DNA (concentration ≥ 20 ng/µL and mass ≥ 300 ng) was used for GBS library construction.
GBS libraries were constructed for 306 M. micrantha DNA samples using MseI (New England Biolabs, NEB) and EcoRI restriction enzymes. Briefly, the DNA samples were incubated with MseI at 37 • C. The digested DNA samples were ligated to MseI Y adapter N containing individual-specific barcodes with T4 DNA ligase. To further reduce the complexity and increase genome coverage, the MseI-digested samples were redigested with EcoRI. EcoRI-digested DNA fragments were purified using Agencourt AMPure XP (Beckman) and PCR-amplified using Phusion Master Mix (NEB), universal primer, and index primer. Amplified DNA from 306 M. micrantha individuals was purified and screened for size. Size-selected fragments (insert sizes of 265-315 bp) were purified and diluted for sequencing. Samples for all M. micrantha individuals were divided into 12 pooled libraries, which were sequenced using the Illumina NovaSeq platform (Illumina, San Diego, CA, USA) to generate 150 bp paired-end reads.

Data Quality Control and SNP Calling
Sequences from each M. micrantha individual were sorted based on the individualspecific barcode within the raw reads. Raw reads for each individual were qualitycontrolled with in-house Perl scripts. Clean reads were obtained by removing reads that contained more than 10% ambiguous bases (N), more than 50% low-quality (Phered quality score < 5) bases, and bases that aligned to adapters exceeding 10 nt.
Clean reads from 306 individuals were aligned to the M. micrantha reference genome (unpublished genome data from our laboratory) using the Burrows-Wheeler Aligner (BWA) v0.7.17 [37] with the parameters mem -t 4 -k 32 -M. The alignment results were transformed into BAM files, which were then sorted in SAMtools v1.3 [38]. mpileup in SAMtools was used for SNP calling, with the parameters -q 1 -C 50 -t SP -t DP -m 2 -F 0.002. SNP quality control was conducted using BCFtools in SAMtools with the following command: vcfutils.pl varFilter -Q 20 -d 2 -D 100000. To ensure the reliability and accuracy of SNP identification, GATK v3.8 [39] was also used to identify SNPs, with the following parameters: -T VariantFiltration QD < 4.0, FS > 60.0, MQ < 40.0, and GQ < 5. Only SNPs identified by both SAMtools and GATK were retained. To obtain a reliable and high-quality SNP dataset, the SNP loci were filtered using the following criteria: genotype quality (GQ) > 20, depth of 5-100, minor allele frequency (MAF) > 0.05, call rate > 0.5, and only biallelic SNPs retained. To determine the locations and mutation types of SNPs in the M. micrantha genome, ANNOVAR [40] was used to annotate the high-quality SNPs identified in 306 individuals from 21 populations.

Genetic Variation and Population Structure
The genetic diversity indices, including the observed heterozygosity (H O ), gene diversity (H S ), allelic richness (A R ), and inbreeding coefficients (F IS ), were calculated for each population and region using the R package hierfstat [41]. The means and 95% confidence intervals of these four indices were also calculated, and the differences in these four diversity parameters among the six regions were tested using Kruskal-Wallis and post hoc Nemenyi test implemented in the R package PMCMRplus [42]. The number of private alleles for each population and region was determined using the R package poppr [43]. In addition, the correlation of population genetic diversity with the longitude and latitude was examined using linear regression and Pearson correlation analysis (p < 0.05).
To determine the level of genetic differentiation in the study areas, pairwise Weir and Cockerham's F ST values were assessed between populations/regions using the diveRsity package in R [44]. To test how geographic isolation affected the genetic differentiation of populations, the correlation between genetic and geographic distance was examined using the ade4 package in R [45]. Hierarchical analysis of molecular variance (AMOVA) was performed using the poppr package in R [43] to identify genetic variation partitioned within and among populations. The gene flow among populations/regions was also analyzed, and the parameter N m was calculated from the pairwise F ST values using the formula N m = (1-F ST )/4F ST .
To investigate the genetic structure of M. micrantha, PLINK v1.9 [46] was used to filter the obtained high-quality SNP loci (see Section 2.3) using the following parameters: -indep-pairwise window size = 50, sliding window = 5, and correlation coefficient (r 2 ) = 0.8. Filtered SNPs (5966 SNPs) were used for genetic structure analysis. ADMIXTURE v1.23 [47] was used to infer the population structure of 306 individuals. The number of genetic clusters (K) was set from 2 to 22. Cross-validation (CV) error was used to select the most likely number of clusters. The K value with the lowest CV error was determined to be the optimal number of clusters. GCTA v1.26 [48] was used to perform principal component analysis (PCA) to explore the genetic structure of M. micrantha. Based on the pairwise genetic distances among individuals, the unweight pair group method with the arithmetic mean (UPGMA) method implemented in MEGA v5.10 [49] was used to construct the phylogenetic tree. One thousand bootstrap replicates were conducted to obtain the branch support rate of the phylogenetic tree. The phylogenetic tree was visualized using FigTree v1.4.2 [50].

Environmental Variables
The environmental data comprised 19 bioclimatic variables, 25 soil factors, and 6 ecological variables. The bioclimatic variables corresponding to historic conditions (from 1960 to 1990) for each population were obtained from WorldClim v1.4 (http:// www.worldclim.org (accessed on 5 March 2019)) at a 2.5-arc-minute resolution. The soil factors, including fresh water content; air-dried water content; pH; electrical conductivity; organic matter content; total N, C, K, Ca, Na, Mg, Al, P, S, Si, Fe, Mn, Zn, Cu, Pb, Cr, As, Se, Ni, and Cd were measured following the method of Shen et al. (2021) [35]. The ecological variables included altitude, normalized difference vegetation index, enhanced vegetation index, percent of tree cover, percent of non-tree vegetation cover, and percent of non-vegetation cover. Altitude values were determined from the sampling locations of the 21 populations. The remaining five ecological variables were acquired from the Moderate Resolution Imaging Spectroradiometer (MODIS) dataset stored in the Land Process Distributed Active Archive Center (http://lpdaac.usgs.gov (accessed on 8 March 2019), recorded in 2006-2016). The yearly mean value for each ecological variable derived from the MODIS dataset was computed using the maximum value composites function [51]. To reduce multicollinearity among environmental variables, the variance inflation factors (VIFs) were calculated for the 19 bioclimatic variables, 25 soil factors, and 6 ecological variables using the vif function implemented in R package usdm [52]. Environmental variables with VIF < 10 were retained, including five bioclimatic variables, 12 soil factors, and 6 ecological variables (Table S2).

Identification of Candidate Selective Loci and Function Annotation
To determine whether there were SNP loci potentially under selection in the 21 populations, an F ST -based outlier approach was used with the BayeScan software [53]. It has been reported that the BayeScan software has an advantage over other similar programs in reducing false-positive rates. This software employs a logistic regression that dissects genetic variation into a locus-specific F ST coefficient (α) shared by all populations and a population-specific F ST coefficient (β) shared by all loci [53]. A positive α value indicates diversifying selection, while a negative α value indicates purifying or balancing selection. BayeScan was run with the default parameters. To reduce false positives, only SNP loci with log 10 (PO) > 2 were considered outlier SNPs under selection. Given that purifying or balancing selection is more prone to elevated false-positive rates when identifying outlier loci in BayeScan under the context of range expansion [54], only outlier SNPs with diversifying selection were analyzed. In the association analysis between outlier SNPs and environmental variables, only SNPs with positive α values were considered.
The genes containing outlier SNPs identified by BayeScan were functionally annotated. Annotation based on the Pfam protein database was performed using Hmmscan implemented in HMMER v3.1 [55]. Based on the annotation results in the Pfam database, gene ontology (GO) terms were acquired using Blast2GO [56] and a custom script. SwissProt database annotation was performed using DIAMOND v0.8.36 [57] with an E-value cutoff of 1 × 10 −5 .

Association of Candidate Selective Loci with Environmental Variables
The latent factor mixed model (LFMM) was used to detect the associations of outlier SNPs with environmental variables in the R package LEA [58]. LFMM has been confirmed to have a good balance between high power and a low false-positive rate, and thus to accurately detect the correlation of a single locus and a univariable [59]. The number of latent factors (K) was selected using the snmf function, which uses a Bayesian clustering algorithm to estimate the admixture coefficients of individuals. Based on the results of the snmf analysis obtained with 10 repetitions and 20,000 iterations for each K value from 1 to 22, the K value with the minimum cross-entropy was identified. This K value was used to perform LFMM analysis with the lfmm function. The lfmm function was implemented using 50,000 burn-in, 100,000 iterations, and 10 replicates. The p-values were used to test the significance of the correlation, with a p-value < 0.005 representing a significant association.
The Bayesian method implemented in the BAYENV v2.0 package [60] was also used to evaluate the association of allele frequencies with environmental variables. BAYENV uses the covariance of allele frequencies between populations as a null model to identify SNPs that are potentially under selection. To maximize the stability in estimating the null model, three covariance matrices were generated from three independent runs, and the number of iterations for each run was set to 100,000. The three covariance matrices were then averaged. To determine whether the average covariance matrix reflected the variance of allele frequencies between populations, the average covariance matrix was compared to a pairwise F ST matrix using a Mantel test in the R package ade4 [45] with 1000 permutations. Based on the average covariance matrix, BAYENV was implemented three times to infer the Bayes factor for the correlation between each SNP and each environmental variable. SNP loci with a mean Bayes factor greater than three were retained as potentially undergoing selection.

Gene Family Analysis
Orthologous gene clusters in the genomes of M. micrantha and 12 other plants were identified using the OrthoMCL program [61]. In addition to the predicted M. micrantha proteins, protein sets from 12 other plants were obtained from public sources (Table S3). Protein sequences from the 13 species were processed for a minimum protein length of 30 amino acids. To obtain the similarity relationship between the protein sequences, an all-against-all comparison was conducted using BLASTP with an E-value cutoff of 1 × 10 −5 . OrthoMCL was used to cluster the BLASTP results to define gene families.
To identify biological pathways significantly enriched among the unique gene families of M. micrantha, Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis was performed using KOBAS [62]. Adjusted p-values were used to determine the significance of the KEGG pathway enrichment, with adjusted p-values < 0.05 considered significant. In addition, ANNOVAR [40] was used to annotate the high-quality SNPs identified in 306 individuals from 21 populations to identify nonsynonymous SNPs. To further explore whether the genes unique to M. micrantha have developed nonsynonymous amino acid mutations at the population level, overlaps between genes containing nonsynonymous SNPs (Table S4) and genes unique to M. micrantha were detected. Overlapping genes were functionally annotated.

Identification of Positively Selected Genes
Based on the gene family clustering data for M. micrantha and the 12 other plants, single-copy orthologous genes of the 13 species were identified. The protein-coding sequences of single-copy orthologous genes were subjected to multiple sequence alignments using MUSCLE [63]. To identify candidate genes that underwent positive selection in M. micrantha, the branch-site model implemented in the codeml module of PAML [64] was used, with M. micrantha as the foreground branch and the 12 other plants as the background branches. The likelihood ratio test was used to evaluate the difference between the results obtained in the null and alternative models, and the p-values were tested again using the FDR method under the standard of 0.05. The positively selected genes (PSGs) were annotated and used for KEGG enrichment analysis. Furthermore, the overlaps between genes containing nonsynonymous SNPs and PSGs in M. micrantha were detected.

GBS Sequencing and SNP Calling
GBS library sequencing of 306 M. micrantha individuals from 21 invasive populations was performed. The analysis yielded a total of 3320.63 million raw reads, with a mean of 10.85 million reads per individual (Table S5). After quality filtering, 3320.39 million clean reads representing 478.14 G bases were retained, with a mean Q20 of 95.01% and a mean Q30 of 88.16%. The clean reads for each individual were mapped to the M. micrantha reference genome. The individual mapping rates ranged from 85.65% to 98.42% (Table S6), indicating good mapping. We successfully identified 117,870 and 5,066,387 SNPs using GATK and SAMtools, respectively. To ensure the reliability of the SNP identification, only SNPs identified by both GATK and SAMtools were selected. A total of 54,392 SNPs were simultaneously identified by these two methods. After filtering, 16,944 high-quality SNPs were identified across 306 individuals, with an average SNP call rate of 90.03% (range 42.94-96.88%) (Table S7). Among the 16,944 identified SNPs, 10,207 SNPs were involved in transition mutations and 6737 in transversion mutations, with the most common mutation type being a C/T transition (5156, 30.43%) and the least common being a C/G transversion (1089, 6.43%) (Table S8). Furthermore, most of the SNPs were located in intergenic regions (13,984,82.53%), with only 5.04% (854) located in exonic regions (Table S9). A total of 402 synonymous SNPs and 305 nonsynonymous SNPs were located within exons, generating a nonsynonymous/synonymous ratio of 0.76.  Table 1). The inbreeding coefficient (F IS ) values were positive in all 21 populations. The distribution of genetic diversity across 21 populations was not limited by longitude and latitude (p > 0.05), except for a weak correlation between A R and latitude ( Figures S1 and S2). In addition, no private alleles were found across populations. All six regions were consistently found to have similar genetic diversity and positive F IS values. The Kruskal-Wallis and post hoc Nemenyi test demonstrated that there were no significant differences in A R , H S , H O , and F IS between the six regions (p > 0.05) (Table S10). Regarding population differentiation, the F ST value was highest between HK3 and NLD2 (0.133) and lowest (0.004) between NLD2 and NLD6 (Table S11). We further evaluated the pairwise F ST values among the six regions, with F ST ranging from 0.008 (SZ versus DG) to 0.043 (SZ versus NLD) (Table S12). These results indicated low levels of genetic differentiation in M. micrantha, implying that gene flow between populations and between regions was common. Indeed, we detected a high rate of gene flow between populations and between regions. The level of gene flow (N m ) between populations ranged from 1.63 to 62.25 and from 5.564 to 31 between regions (Tables S11 and S12).

Genetic Variation and Population Genetic Structure
Principal component analysis (PCA) revealed an admixed distribution pattern in which individuals from different populations clustered randomly (Figure 2). To further explore the population structure of M. micrantha, genetic clustering of the 306 individuals was conducted using ADMIXTURE, which also provided evidence of an admixed structure at the optimal clustering level (K = 15) ( Figure 3 and Figure S3). Consistent with the results from the ADMIXTURE and PCA analyses, the UPGMA tree also reflected an unclear genetic relationship within M. micrantha. Individuals within a population were allocated to different genetic clades of the phylogenetic tree, while individuals from distinct populations were located on the same branch ( Figure S4). The analysis of molecular variance (AMOVA) revealed that 95.84% of the total variation occurred within populations, while only 4.16% occurred among populations (p < 0.001, Table S13). AMOVA detected a low genetic differentiation among populations (F ST = 0.042; p < 0.001). The Mantel test showed that the genetic differentiation between populations included a weak genetic relatedness caused by isolation by distance (IBD) (r = 0.311, p = 0.006).

Identification of Candidate Selective Loci and Gene Annotation
In the BayeScan analysis, 273 SNPs with diversifying selection were considered to be outlier SNPs at a threshold of log 10 (Table S14). The 11 genes had a length ranging from 5714 to 16,509 bp (Table S15). In addition, these 11 genes were annotated in the SwissProt and Pfam protein databases. GO terms were used to functionally classify the 11 genes. The genes were assigned to 29 terms: 10 terms in "biological process", 8 terms in "cellular component", and 11 terms in "molecular function" (Table S15). Under the biological process category, "metabolic process" (GO: 0008152), "response to stimulus" (GO: 0050896), and "oxidation-reduction process" (GO: 0055114) were highly representative GO terms. For the molecular function category, the most abundant functions were related to "binding" (GO: 0005488), "transporter activity" (GO: 0005215), and "transcription regulator activity" (GO: 0140110). In terms of the cellular component category, "membrane" (GO: 0016020) and "membrane part" (GO: 0044425) were the most noticeable functions.

Association of Candidate Selective Loci with Environmental Variables
We utilized two outlier test approaches-a LFMM analysis and the Bayesian method implemented in BAYENV-to detect signatures of adaptation among M. micrantha populations. These two approaches have great power for the identification of outlier loci associated with environmental variables. We identified 31 associations between 18 outlier SNPs and six environmental variables in the LFMM analysis (Table S16), and 67 associations between 24 outlier SNPs and eight environmental variables in BAYENV (Table S17). Combining the results of the two analysis approaches, we detected 96 associations between 38 outlier SNPs and 13 environmental variables. Among the associations, 14 were related to temperature, 25 to precipitation, 25 to soil factors, and 6 to ecological variables. Of the associated environment variables, soil C content was associated with the most outlier SNPs (21), followed by mean temperature of driest quarter (14), isothermality (12), precipitation of driest month (12), annual precipitation (11), precipitation of warmest quarter (7), soil Si content (6), soil Ca content (5), and percent of non-vegetation cover (4). Besides these nine environmental variables, soil K content and percent of tree cover were also noteworthy factors, and these may play a potential role in the adaptation of M. micrantha. Notably, four outlier SNPs were simultaneously identified in the BayeScan, LFMM, and BAYENV analyses and were associated with 10 environmental variables, suggesting that environmental variables are important factors affecting the population adaptation of M. micrantha.

Gene Family Analysis
To explore the special roles of unique gene families in the expansion and adaptation of M. micrantha, we performed a gene family clustering analysis on the genomes of M. micrantha and 12 other plants. Using the predicted protein set of these plants, a total of 36,638 orthologous gene families consisting of 409,166 genes were identified (Table S18). Furthermore, 6346 gene families containing 170,110 genes shared among the 13 plants and 683 gene families comprising 2256 genes were found to be unique to M. micrantha (Tables S18-S20). The length of the 2256 genes unique to M. micrantha ranged from 154 to 122,119 bp (Table S19). Among the gene families unique to M. micrantha, KEGG enrichment analysis revealed five significantly overrepresented KEGG pathways-"oxidative phosphorylation" (ko00190), "biosynthesis of unsaturated fatty acids" (ko01040), "photosynthesis" (ko00195), "other glycan degradation" (ko00511), and "ether lipid metabolism" (ko00565)-all of which are important for the growth, metabolism, and defense response of M. micrantha ( Figure 5; Table S21). Further, we found that eight genes unique to M. micrantha have developed mutations resulting in nonsynonymous amino acid substitutions across 21 populations. Among the eight candidate genes, three, four, and five genes were annotated in the SwissProt, Pfam, and GO databases, respectively (Table S22). These candidate genes encode proteins and participate in some functional categories such as glutathione S-transferase T3 (GSTT3) and "integral to membrane" (GO: 0016021), respectively.

Positive Selection Gene Analysis
The gene family clustering analysis revealed 393 single-copy orthologous genes among M. micrantha and the 12 other plants. Following the likelihood ratio test, 41 genes were considered candidate genes that had undergone positive selection in M. micrantha (Table S23). The 41 genes had a length ranging from 1723 to 16,686 bp (Table S24). Of the 41 PSGs, 30, 33, and 38 PSGs were annotated in the SwissProt, Pfam, and GO databases, respectively (Table S24). In the GO annotation, the highly represented GO terms were related to the functions of (i) binding, (ii) DNA replication and repair, (iii) signature transduction, and (iv) transcription and cellular components ( Figure S5). These functions may be important for the colonization and adaptation of M. micrantha. KEGG enrichment analysis of 41 PSGs showed that "SNARE interactions in vesicular transport" (ko04130) were significantly enriched (Table S25; Figure S6). In addition, two PSGs harbored mutations resulting in nonsynonymous amino acid substitutions in the 21 M. micrantha populations, and these were mainly related to binding, membrane component, and fatty acid biosynthesis.

Population Variation and Structure
During population invasion, factors affecting the invasiveness of alien species, such as genetic diversity, may be influenced by genetic events, geographic distance, gene flow, and human activities [65]. Alien species that colonize nonindigenous areas generally experience the founder effect or bottleneck events that lead to a reduction in genetic diversity and influence the invasive potential of the species populations [66]. Heterozygosity and allele diversity are two representative indices of genetic diversity that are sensitive to the founder effect and bottleneck events [67,68]. In this study, obvious heterozygote deficiency was detected in M. micrantha across populations and regions. This is understandable, considering that M. micrantha has experienced severe founder effect and genetic bottlenecks during rapid colonization and invasion [33,69]. Inbreeding is a genetic trait observable in some invasive species [4] that can give rise to heterozygote deficiency within populations [70]. As M. micrantha is a self-incompatible species [71] and its local spread mainly relies on clonal propagation [72], inbreeding may be an important factor contributing to heterozygote deficiency. Indeed, we observed positive F IS values in all populations and regions, suggesting the occurrence of inbreeding in the studied populations and regions.
The genetic differentiation levels in M. micrantha were quite low (0.004 ≤ F ST ≤ 0.133), and lower than the interpopulation genetic differentiation level reported by previous genetic studies based on a small number of molecular markers (8-483 markers) [30,31,34]. In general, low-resolution molecular markers may cause bias in the assessment of genetic patterns. The genomic SNP data presented herein improved the assessment of genetic differentiation among populations of M. micrantha. Low genetic differentiation levels may occur because M. micrantha plants can produce enormous numbers of lightweight seeds that are dispersed over long distances by the wind [21], promoting colonization at different locations. The local establishment of M. micrantha mainly depends on clonal propagation, which may also be an important factor leading to low genetic differentiation. Human activities can result in seeds or cloned fragments being carried to different locations, increasing genetic exchange between individuals from different populations and weakening population divergence. In addition, the homogenization effect of frequent gene flow would reduce adaptive differentiation between populations. Consistent with this, we detected a high rate of gene flow (1.63 ≤ N m ≤ 62.25) among M. micrantha populations. Similarly, a high rate of gene flow has been found among populations within the Asian invasion range [33], indicating that M. micrantha population invasion is accompanied by a high rate of gene flow.
Generally, the range limit linked to species invasions affects the genetic diversity of invasive populations [73]. The genetic diversity of invasive populations generally decreases from the abundance centers where it was first introduced to the range margin [73]. However, marginal populations often rely on central or nearby populations as genetic sources for colonization, persistence, and range expansion [7,74]. A high rate of gene flow can release invasive species from environmental and geographical constraints on genetic diversity [5,6]. In addition to gene flow, multiple introductions can also lead to increased genetic diversity through recombination between invasive genotypes [5]. In this study, all populations were found to possess high and similar levels of genetic diversity that were higher than those detected previously in invasive and native M. micrantha populations [31,34,75], higher than those reported for other invasive plants [4,76], and not limited by longitude and latitude across the entire invasive area. These results suggest that M. micrantha exhibits patterns of population genetics that tend to enhance its invasion potential and that gene flow and multiple introductions (see the following paragraph) may mitigate the negative influence of the founder effect, genetic bottleneck, or inbreeding on genetic diversity and mediate genetic diversity distribution in its invasive range.
ADMIXTURE and PCA analysis revealed multiple genetic clusters and an admixed population structure in the study area, which was consistent with the genetic structure previously detected in populations that invaded Asia [33,69]. Multiple introduction and a high rate of gene flow are two important driving factors that cause population admixture. Generally, invasive populations that have undergone multiple introductions often contain private alleles from different geographic sources [76]. No private alleles were detected among the populations assessed in our study. We speculate that frequent gene flow likely mediates genotype movement among populations that are occupied by different native genotypes. In fact, a high rate of gene flow was observed among populations herein, which may constrain adaptive divergence and lead to a weak population structure [77]. The influence of gene flow on the genetic structure of M. micrantha was also confirmed by the weak correlation between genetic differentiation and geographic distance. Geographical isolation can explain genetic differentiation during the early period of population invasion, when populations are geographically dispersed. The effect of geographical isolation on population differentiation becomes weaker with a high rate of gene flow during population expansion. The continued invasion of M. micrantha in southern China (more than 130 years) and high rate of gene flow among populations can explain the weak correlation between genetic differentiation and geographic distance. Overall, M. micrantha was introduced multiple times and subsequently experienced a rapid range expansion with recurrent high rates of gene flow.

Adaptive Response to Environmental Variables
The detection of association between SNPs and environmental variables helps us to recognize the bioclimatic, soil, and ecological variables that contribute to population adaptation. During the process of M. micrantha invasion from tropical America to southern China, the new environments encountered likely induced candidate-selected loci that respond to environmental conditions. The LFMM and BAYENV analysis revealed significant associations between 38 outlier SNPs and 13 environmental variables, including temperature, precipitation, soil, and ecological variables, all of which may play roles in the invasive adaptation of M. micrantha. Specifically, (i) rapid adaptation to climate is conducive to range expansion of invasive plants. Temperature and precipitation are important determinants of the growth, adaptation, and potential distribution of M. micrantha [23]. Consistent with this, the mean temperature of driest quarter, isothermality, precipitation of driest month, annual precipitation, and precipitation of warmest quarter were significantly associated with outlier SNPs, which reflects the importance of these environmental variables in the distribution of this species [35,78] and indicates that the M. micrantha genome responds to temperature and precipitation fluctuations. (ii) The significant effect of soil composition on the invasion of M. micrantha has been confirmed [79]. The soil C/Ca/K/Si content was found to be significantly associated with outlier SNPs. These findings are not unexpected, considering the functional importance of these soil factors. For instance, C supply has an important effect on the expression of genes involved in primary and secondary metabolism, growth, transport, signaling, and defense [80]. C depletion has been found to be related to drought-induced seed abortion in maize [81]. Ca plays a crucial role in regulating plant growth, development, and stress responses [82]. Furthermore, K is an essential nutrient in plants and its deficiency alters plant growth, photosynthetic performance, and antioxidant capacity [83]. Indeed, soil K availability regulates plant invasive success [84] and it has been reported that the rapid growth of M. micrantha depends on accelerated activation of soil K [85]. Finally, Si participates in various physiological processes-e.g., it increases the chlorophyll biosynthesis and assimilation rate and alleviates oxidative damage in stressed plants [86]. In addition, Si supply can increase K accumulation in the shoots [86]. (iii) Ecological variables are also important factors affecting the adaptation of M. micrantha. We found significant associations between outlier SNPs, the percentage of non-vegetation cover, and the percentage of tree cover. M. micrantha often forms a dense thicket above nearby plants [21] and escapes the negative effects of tree or other vegetation cover on light availability [87]. These correlations suggest that the M. micrantha genome may respond positively to tree or vegetation cover. Taken together, these results reveal that the M. micrantha genome already has selective signatures that respond to the changing environment to promote its invasive adaptation.

Genes Unique to M. micrantha May Be Important for Adaptation
Based on gene family clustering analysis, we identified gene families unique to M. micrantha. Unique gene families were also observed in the genomes of Artemisia annua [88] and Chrysanthemum nankingense [89]. Gene families unique to M. micrantha were enriched in functional pathways related to important physiological processes and stress response, such as oxidative phosphorylation, the biosynthesis of unsaturated fatty acids, and photosynthesis. Oxidative phosphorylation is an important metabolic pathway that generates ATP required for plant growth and metabolism. There is evidence that oxidative phosphorylation plays a crucial role in abiotic stress responses. For instance, oxidative phosphorylation is enriched by genes associated with the cold tolerance in Anthurium andraeanum [90]. Unsaturated fatty acids play crucial roles in multiple biological processes, enabling the plant to respond to biotic and abiotic stress [91]. High photosynthesis capacity is a representative feature of M. micrantha, providing the energy supply needed for rapid vegetative growth [92]. The enrichment of unique gene families in the photosynthesis pathway reflects the importance of photosynthesis in the invasion and adaptation of M. micrantha. Similarly, the upregulation of genes related to photosynthesis promotes the adaptation of Solidago canadensis to a shaded habitat, greatly improving the plant's ability to adapt to different light conditions [93]. The identification of unique gene families in M. micrantha with functional importance will provide a valuable resource for further research on the adaptive mechanisms of this species.
Unique genes that harbor mutations leading to nonsynonymous amino acid substitutions among populations are worthy of attention. These unique genes encode functional proteins and are involved in important function categories. For example, GSTT3 is a member of GST superfamily and is related to antioxidants that respond to drought and biotic stress [94,95]. We detected a unique gene encoding the GSTT3 protein, which may have significant implications for increasing the tolerance of M. micrantha to drought and biotic stress. We also found unique genes related to 'integral to membrane'. This functional category is related to the defense response of apple root to pathogen infection [96]. Although these genes were not identified as candidate loci for selection, they may become loci with an adaptive importance after sufficient time.

Role of Adaptive Genes in M. micrantha
Genes under evolutionary pressure (such as positive selection) may be important for the survival and spread of various species [97]. The identification of functionally important positive selection genes for M. micrantha would enable us to better understand the molecular associations underpinning its invasive adaptation. Accordingly, we identified 41 genes showing a significant positive selection signature in M. micrantha. These PSGs were found to be involved in (i) binding, (ii) DNA replication and repair, (iii) signature transduction, and (iv) transcription and cellular components ( Figure S5; Table S24). Function binding, including DNA binding, ATP binding, and GTP binding, generally participates in various physiological processes and stress responses. For example, DNA/ATP/GTP-binding proteins play important regulatory roles in the response to cold stress in mature pollen of Arabidopsis, and in the response to heat stress during the imbibition and germination of Ricinus communis seed [98,99]. DNA replication is indispensable for the growth, development, and proliferation of cells in plants. A growing body of research shows that minichromosome maintenance (MCM) complex proteins are important regulatory factors essential for DNA replication. The overexpression of a single subunit of pea MCM6 improves salt tolerance in transgenic tobacco plants [100]. Plants have also developed a variety of mechanisms to repair DNA damage caused by abiotic stresses. Casati and Walbot (2008) showed that the DNA excision repair protein ERCC-1 (ERCC1) enhances the efficiency of UV-B-induced DNA damage repair in maize [101]. Exonuclease 1 (Exo1) is known to function in a number of DNA repair pathways [102]. The positive selection of genes encoding MCM6, ERCC1, and Exo1 might be an adaptive response allowing M. micrantha to adapt to invasive environments under various abiotic stresses. Signal transduction links the sensing mechanism with genetic responses, which is important for signal transmission to the cellular machinery to initiate adaptive responses upon an environmental change [103]. For example, the G-protein-coupled receptor signaling pathway participates in cold tolerance in hullless barley (Hordeum vuglare) [104]. This pathway may have important implications for the response of M. micrantha to cold stimulus. Finally, key cellular components (e.g., nucleus) and transcriptional regulation are essential for plant growth and environmental adaptation. The positive selection of genes related to the nucleus and transcription suggests their potential roles in M. micrantha. In addition, two functionally important PSGs harbored mutations that resulted in nonsynonymous amino acid substitutions among 21 populations, which implies that they may contribute to the population adaptation of M. micrantha.
During rapid population invasion, genes that regulate important biological processes are expected to function in M. micrantha, providing insights into M. micrantha adaptation. We detected 11 outlier SNPs distributed in 11 genes as candidate targets of adaptive importance. The GO annotation analysis of the 11 genes revealed that they were related to stress response, which should be considered in future population genomic research. Under the biological process category, "metabolic process", "oxidationreduction process", and "response to stimulus" were highly representative GO terms important for the environmental adaptation of invasive and non-invasive plants. The metabolic process may contribute to the cold tolerance in the northward invasion of Alternanthera philoxeroides [105]. The oxidation-reduction process is involved in the drought stress response in Amorpha fruticosa seedlings [106] and in the adaptation to hypoxia and low temperatures in Kandelia obovata [107]. Effective responses to environmental changes or stress stimuli are beneficial for the colonization and adaptation of invasive plants in nonindigenous areas. Response to stimulus is considered an important adaptation mechanism for plants under stress, e.g., due to water deprivation [108]. In the molecular function category, GO terms such as "binding", "transporter activity", and "transcription regulator activity" are generally related to stress responses, and have been shown to participate in the K deficiency response in tobacco [109] and the salt resistance of the invasive plant Phragmites karka [110]. In terms of the cellular component category, the membrane is an important structure that indirectly or directly perceives stimulation to initiate signal trans-duction. For instance, several membrane genes are upregulated in Tamarix ramosissima in response to water deficiency [111].

Conclusions
In the present study, we used population genomics to characterize the range expansion and invasive adaptation of M. micrantha in southern China. Genome-wide SNP analysis revealed, for the first time, the effects of gene flow, environmental variables, and functional genes on the population variation and adaptation of M. micrantha. The presented findings demonstrate the importance of a high rate of gene flow coupled with genetic admixture for the invasion success of M. micrantha. We identified 38 outlier SNPs associated with environmental variables. Soil composition, temperature, precipitation, and ecological variables were important determinants affecting the invasive adaptation of M. micrantha. Candidate genes with outlier signatures were related to abiotic stress responses. Gene family clustering analysis revealed 683 gene families unique to M. micrantha that may have significant implications for the growth, metabolism, and defense response in M. micrantha. Forty-one genes with significant positive selection signatures were identified. These PSGs mainly function in binding, DNA replication and repair, signature transduction, transcription, and cellular components. In addition, eight genes unique to M. micrantha and two PSGs harbored mutations leading to nonsynonymous amino acid substitutions across 21 populations; these may play crucial roles in the population expansion and adaptation of M. micrantha. Overall, the presented population genomic analysis indicates that the sustained high rate of gene flow, response of adaptive loci, and regulation of functional genes have enabled the invasion and adaptation of M. micrantha in southern China. These findings can be used to guide prevention and monitoring efforts for this invasive species.   , Table S9: The locations and mutation types of SNPs in the M. micrantha genome, Table S10: The analysis results of the differences in genetic diversity parameters among the six regions obtained from Kruskal-Wallis and post hoc Nemenyi test, Table S11: Genetic differentiation and gene flow between pairwise populations for M. micrantha individuals. The lower triangle represents genetic differentiation values (F ST ), and the upper triangle represents gene flow between pairwise populations, Table S12: Genetic differentiation and gene flow between pairwise regions for M. micrantha. The lower triangle represents genetic differentiation values (F ST ), and the upper triangle represents gene flow between pairwise regions, Table S13: Molecular variance analysis (AMOVA) of M. micrantha, Table S14: The position of outlier SNPs identified by BayeScan on the M. micrantha genome, Table S15: The annotation of the genes containing outlier SNPs, Table S16: Outlier SNPs associated with environmental variables identified using LFMM, Table S17: Outlier SNPs associated with environmental variables identified using BAYENV, Table S18: Gene family cluster analysis results of the genomes of M. micrantha and 12 other plants, Table S19: Gene families unique to M. micrantha and gene length distribution, Table S20: Protein sequences of 683 gene family members unique to M. Micrantha, Table S21: Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of 683 gene families unique to M. micrantha, Table S22: Nonsynonymous amino acid mutation information and functional annotation of genes unique to M. micrantha, Table S23: Likelihood ratio test (LRT) of the variable w ratio under different models, Table S24: The annotation of the 41 positive selection genes in M. micrantha, Table S25