Assessment of the Genetic Diversity and Population Structure of Rhizophora apiculata Blume (Rhizophoraceae) in Thailand

Simple Summary We utilized the 10× Genomics technology to obtain a reference whole-genome sequence for assessing the genetic diversity and population structure of Rhizophora apiculata in Thailand. Using SNPs identified from the R. apiculata genome sequence, moderate genetic diversity and high genetic differentiation were observed among 82 R. apiculata accessions collected along the coasts of Thailand. Two subpopulations corresponding to the Gulf of Thailand and the Andaman Sea coasts were clustered and confirmed from three approaches: population structure, PCA, and phylogenetic analyses. The AMOVA result revealed that the percentage of variation within populations (76%) was higher than that among populations (24%). Abstract Rhizophora apiculata is one of the most widespread and economically important mangrove trees in the Indo-West Pacific region. Knowledge of the genetic variation of R. apiculata in Thailand is limited. Here, we generated a whole-genome sequence of R. apiculata using the 10× Genomics technology. R. apiculata genome assembly was 230.47 Mb. Based on its genome, 2640 loci of high-quality biallelic SNPs were identified from 82 R. apiculata accessions collected from 17 natural mangrove forests in Thailand to assess the genetic diversity and population structure among them. A moderate level of genetic diversity of R. apiculata was observed. The average observed heterozygosity (Ho = 0.48) was higher than the average expected heterozygosity (He = 0.36). Two subpopulations were observed and confirmed from three approaches: population structure, PCA, and phylogenetic analyses. They corresponded to the Gulf of Thailand and the Andaman Sea separated by the Malay Peninsula. AMOVA analyses indicated that genetic variation was attributable to 76.22% within populations and 23.78% among populations. A high level of genetic differentiation between the two subpopulations (FST = 0.24, p < 0.001) was observed. This study evaluated the genetic diversity and population structure of R. apiculata, providing useful information for sustainable mangrove management in Thailand.


Introduction
Mangroves generally grow in intertidal habitats, which are at the interface between land and sea in tropical and sub-tropical regions [1]. They are the most important component of coastal ecosystems and ecological services [2][3][4][5][6]. Mangrove trees protect against coastal erosion, reduce the effects of strong winds and heavy waves on the coasts as well as provide habitats, food, timber, and medicines [5,6]. In addition, mangrove trees can serve as carbon sinks that mitigate greenhouse effects [2][3][4]. In the past decades, coastal variants were identified from 82 R. apiculata accessions collected from 17 natural mangrove forests in Thailand. The variants were filtered to identify high-quality biallelic SNPs that were used to assess the genetic diversity and population structure of R. apiculata in Thailand.

Samples
For reference genome sequencing, one R. apiculata individual was chosen as a representative species in this study. It is in the natural mangrove forest in the Ranong province (9°52′36.1″ N 98°36′11.5″ E) under the protection of the Department of Marine and Coastal Resources. The morphology of R. apiculata as a reference genome is presented in Figure 1. In addition, the collection sites of PKT, PNA, RNG, STN, and TRG represent locations on the Andaman Sea coast. These two coastal regions were regionally separated by the Malay Peninsula. The geographic map was created to locate collection sites using the QGIS software v3.24.2 [34]. The general characters of R. apiculata accessions have large and tall trees, a narrowly elliptic leaf shape, brown to dark grey bark, and prop and stilt roots. In addition, the collection sites of PKT, PNA, RNG, STN, and TRG represent locations on the Andaman Sea coast. These two coastal regions were regionally separated by the Malay Peninsula. The geographic map was created to locate collection sites using the QGIS software v3.24.2 [34]. The general characters of R. apiculata accessions have large and tall trees, a narrowly elliptic leaf shape, brown to dark grey bark, and prop and stilt roots.

DNA Extraction and Sequencing
The fresh leaves of all R. apiculata accessions were stored in liquid nitrogen. The standard CTAB (Cetyl Trimethyl Ammonium Bromide) method was used to extract genomic DNA from the R. apiculata leaves [35]. One R. apiculata accession as a reference sequence was sequenced using the 10× Genomics technology with linked-read sequencing, which is a microfluidics-based method to generate long-range information from shortread sequencing data (10× Genomics; https://www.10xgenomics.com (accessed on 5 January 2022)). The 10× genomics library was constructed from approximately 1 ng of high quality, high molecular weight DNA using the Chromium Genome Library Kit & Gel Bead Kit v2, the Chromium Genome Chip Kit v2 and the Chromium i7 Multiplex Kit according to the manufacturer's instructions (10× Genomics) and was sequenced using the Illumina HiSeq X Ten (paired-end reads at 150 bp). Furthermore, the 82 R. apiculata accessions were sequenced using MGI technology with restriction site-associated DNA sequencing (RAD-seq), which is one of the most genomic library preparations to reduce representation sequencing [33]. For each sample, a RAD-seq library was created with ap-

DNA Extraction and Sequencing
The fresh leaves of all R. apiculata accessions were stored in liquid nitrogen. The standard CTAB (Cetyl Trimethyl Ammonium Bromide) method was used to extract genomic DNA from the R. apiculata leaves [35]. One R. apiculata accession as a reference sequence was sequenced using the 10× Genomics technology with linked-read sequencing, which is a microfluidics-based method to generate long-range information from short-read sequencing data (10× Genomics; https://www.10xgenomics.com (accessed on 5 January 2022)). The 10× genomics library was constructed from approximately 1 ng of high quality, high molecular weight DNA using the Chromium Genome Library Kit & Gel Bead Kit v2, the Chromium Genome Chip Kit v2 and the Chromium i7 Multiplex Kit according to the manufacturer's instructions (10× Genomics) and was sequenced using the Illumina HiSeq X Ten (paired-end reads at 150 bp). Furthermore, the 82 R. apiculata accessions were sequenced using MGI technology with restriction site-associated DNA sequencing (RAD-seq), which is one of the most genomic library preparations to reduce representation sequencing [33]. For each sample, a RAD-seq library was created with approximately 1 µg of DNA following the MGIEasy RAD Library Prep Kit Instruction Manual (MGI Tech). Libraries were pooled and sequenced using the MGISEQ-2000RS sequencing platform with a 150 bp paired-end cycle kit following the manufacturer's protocol. High-quality reads were obtained from the MGISEQ-2000RS sequencer.

Population Structure and Principal Component Analysis
The population structure analysis of R. apiculata was examined using the STRUCTURE program v2.3.4 with a Bayesian clustering approach via the Markov Chain Monte Carlo (MCMC) estimation [47,48]. To prepare input data for STRUCTURE, the final VCF file was converted to the STRUCTURE file format using PGDSpider v2.0.9.2 [49]. Then, the analysis was performed using twenty replicate runs in each number of clusters (K) from 1 to 10, an MCMC burn-in period length of 10,000, and a run length of 10,000 [32]. The most probable K value (the number of subpopulations) was determined by comparing delta K (∆K) based on the rate of change in the log-probability of the data [LnP(D)] between successive K values using the Structure Harvester v0.6.7 [50,51]. Based on the best K value, clustering from 1000 replicates in STRUCTURE was summarized using CLUMPP version 1.1.2 [52].
To determine the proportion of variation explained by each principal component, eigenvalues of the filtered SNPs were generated using PLINK v1.9 [53]. Principle component analysis was performed using PCA in R [54]. The first and second principal components were plotted using R software v3.3.4 with the library tidyverse and the package ggplot [55].

Phylogenetic Analysis
To assess the phylogenetic relationships among the 82 R. apiculata accessions, phylogenetic analysis was performed using a maximum likelihood (ML) method based on SNPs. The SNPs in all accessions were aligned using MUSCLE with default in MEGA X [60]. To be the best fit model for the SNP dataset, the K2 + G model was identified using the find best DNA/protein model tool in MEGA X. A ML phylogenetic tree was constructed using MEGA X with the K2 + G model. A bootstrap consensus tree with 1000 replications was carried out. The phylogenetic tree was visualized using the interactive Tree of Life (iTOL) [61].

Genome Assembly and SNP Data
For the whole-genome sequencing of R. apiculata, a total of 110.04 Gb of 150 bp pairedend reads were obtained and assembled. The de novo assembly contained 22,267 contigs with the longest scaffold, N50 length, and N90 length of 1,917,847, 144,278, and 3785 bp, respectively (Table S1). The contigs were further scaffolded based on the previous R. apiculata genome by merging homology sequences and reconciling genome assembly scaffolds [37]. The final R. apiculata genome assembly contained 133 scaffolds and 10,427 contigs with an aggregate size of 230.47 Mb (Table S1). The longest scaffold, N50, and N90 were 12,839,107; 4,834,853; and 74,632 bp in length, respectively (Table S1). The size of our R. apiculata genome (231 Mb) is similar to the size of the previously reported R. apiculata genomes (232 Mb) in China [37,62], which covered 84% of the estimated R. apiculata genome based on flow cytometry (274 Mb) [37]. The BUSCO (benchmarking universal single-copy orthologues) result revealed that all scaffolds (208 Mb, Table S1) covered approximately 97% of predicted R. apiculata genes ( Figure S1) [37].
To generate SNPs data, 82 R. apiculata accessions in natural mangrove forests in 17 provinces of Thailand ( Figure 2) were sequenced using RAD-seq. A total of 226.04 Gb of 150 bp paired-end reads were generated (Table S2). A total of 1,745,767 SNPs were identified among 82 R. apiculata accessions. Numerous SNPs with missing data > 0.05 (1,395,365 SNPs) and minimum read depth < 10 in each accession (313,981 SNPs) were removed. After filtering SNPs using the criteria mentioned in the Materials and Methods Section, 2640 high-quality biallelic SNPs were obtained.

SNP Characterization
The distributions of values for genetic diversity, polymorphic information content (PIC), and minor allele frequency (MAF) of the 2640 SNPs estimated on the 82 R. apiculata accessions are shown in Figure 3. The SNP diversity ranged from 0.16 to 0.50 with an average of 0.39 with the vast majority (96%) falling between 0.21 and 0.50 ( Figure 3A; Table S3). The PIC values ranged from 0.15 to 0.38 with a mean PIC value of 0.31 ( Figure 3B; Table S3). Approximately 60% of the SNPs had PIC values exceeding 0.30, suggesting high polymorphic data. The MAF values varied from 0.09 to 0.50, with an average value of 0.31 ( Figure 3C; Table S3). Biology 2022, 11, x FOR PEER REVIEW 7 of 14

Population Structure and Principal Component Analysis
To classify subpopulations, the population structure and principal component analysis (PCA) of R. apiculata in Thailand was performed ( Figure 4). The Bayesian clustering algorithm was used to analyze the population structure. The largest delta K was observed at K = 2 ( Figure 4A), suggesting the presence of two subpopulations. The first subpopulation (orange in Figure 4B) consists of accessions collected from TRT, CTI, CCO, SPK, SKN, PBI, PKN, CMP, SNI, NST, PTN, and NWT, and the second subpopulation (blue in Figure  4B) consists of accessions from STN, TRG, PKT, PNA, and RNG. The analysis demonstrated two distinct genetic clusters corresponding to two geographic regions, the Gulf of Thailand (TRT, CTI, CCO, SPK, SKN, PBI, PKN, CMP, SNI, NST, PTN, and NWT) and the Andaman Sea (STN, TRG, PKT, PNA, and RNG). In addition to the population structure, the PCA of R. apiculata was conducted across the 82 R. apiculata accessions ( Figure 4C) where the first two components explained 70.46% of the total genetic variation (67.60% and 2.86%, respectively). According to the first two components, the accessions were divided into two groups: the Gulf of Thailand (TRT, CTI, CCO, SPK, SKN, PBI, PKN, CMP, SNI, NST, PTN, and NWT) and the Andaman Sea (STN, TRG, PKT, PNA, and RNG).

Population Structure and Principal Component Analysis
To classify subpopulations, the population structure and principal component analysis (PCA) of R. apiculata in Thailand was performed ( Figure 4). The Bayesian clustering algorithm was used to analyze the population structure. The largest delta K was observed at K = 2 ( Figure 4A), suggesting the presence of two subpopulations. The first subpopulation (orange in Figure 4B) consists of accessions collected from TRT, CTI, CCO, SPK, SKN, PBI, PKN, CMP, SNI, NST, PTN, and NWT, and the second subpopulation (blue in Figure 4B

Genetic Diversity and Genetic Differentiation
At the population level, the average number of effective alleles (Ne), Shannon's information index (I), observed heterozygosity (Ho), expected heterozygosity (He), and the percentage of polymorphic loci (PPL) were estimated (Table 1; Table S4). All diversity parameters were similar in the two subpopulations. The PPL value in the two subpopulations was over 99%, indicating high genetic diversity within them. Remarkably, average Ho values were higher than average He values, which resulted in a negative inbreeding coefficient value (F = −0.199), indicating an excess of heterozygosity. To assess genetic differences, AMOVA (analysis of molecular variance) analyses of the accessions with the two subpopulations revealed that 23.78% of the total genetic variation was attributable to differences among subpopulations, and 76.22% of the total genetic variation was attributable to differences among accessions within populations ( Table  2). The fixation index (FST) was 0.24 (p < 0.001), suggesting significant high genetic differentiation between the subpopulations.

Genetic Diversity and Genetic Differentiation
At the population level, the average number of effective alleles (Ne), Shannon's information index (I), observed heterozygosity (Ho), expected heterozygosity (He), and the percentage of polymorphic loci (PPL) were estimated (Table 1; Table S4). All diversity parameters were similar in the two subpopulations. The PPL value in the two subpopulations was over 99%, indicating high genetic diversity within them. Remarkably, average Ho values were higher than average He values, which resulted in a negative inbreeding coefficient value (F = −0.199), indicating an excess of heterozygosity. To assess genetic differences, AMOVA (analysis of molecular variance) analyses of the accessions with the two subpopulations revealed that 23.78% of the total genetic variation was attributable to differences among subpopulations, and 76.22% of the total genetic variation was attributable to differences among accessions within populations ( Table 2). The fixation index (F ST ) was 0.24 (p < 0.001), suggesting significant high genetic differentiation between the subpopulations. Notes. degree of freedom (df), *** statistically significant (p < 0.001).

Population Phylogenetic Relationship
To understand relationships among 82 R. apiculata accessions, a maximum likelihood (ML) tree was conducted ( Figure 5). Bootstrap values at all branches are high (bootstrap value ≥ 82; mostly equal to 100), indicating that the ML tree was highly reliable. The ML tree shows that 82 R. apiculata accessions are divided into two main clusters. Forty-seven accessions from the Gulf of Thailand coast are comprised in cluster I (as shown in the yellow clades in Figure 5), and thirty-four accessions (28 accessions from the Andaman Sea coast and six accessions from the Gulf of Thailand coast) are comprised in cluster II (as shown in the blue clades in Figure 5). Nonetheless, an accession (SNI 04) collected in the Surat Thani province is not in the two clusters of the ML tree. The branching structure of the ML tree is similar to STRUCTURE analysis, in which cluster I accessions are labeled in red text (as shown in orange for group 1 in the STRUCTURE analysis; Figure 4B), whereas cluster II accessions are labeled in blue text (as shown in blue for group 2 in the STRUCTURE analysis; Figure 4B) ( Figure 5). In addition, cluster I and II of the ML tree are concordant with the PCA analysis that most of the accessions are divided into two groups: the Gulf of Thailand and Andaman Sea with 67.60% of PC1 ( Figure 4C).   Notes. degree of freedom (df), *** statistically significant (p < 0.001).

Population Phylogenetic Relationship
To understand relationships among 82 R. apiculata accessions, a maximum likelihood (ML) tree was conducted ( Figure 5). Bootstrap values at all branches are high (bootstrap value ≥ 82; mostly equal to 100), indicating that the ML tree was highly reliable. The ML tree shows that 82 R. apiculata accessions are divided into two main clusters. Forty-seven accessions from the Gulf of Thailand coast are comprised in cluster I (as shown in the yellow clades in Figure 5), and thirty-four accessions (28 accessions from the Andaman Sea coast and six accessions from the Gulf of Thailand coast) are comprised in cluster II (as shown in the blue clades in Figure 5). Nonetheless, an accession (SNI 04) collected in the Surat Thani province is not in the two clusters of the ML tree. The branching structure of the ML tree is similar to STRUCTURE analysis, in which cluster I accessions are labeled in red text (as shown in orange for group 1 in the STRUCTURE analysis; Figure 4B), whereas cluster II accessions are labeled in blue text (as shown in blue for group 2 in the STRUCTURE analysis; Figure 4B) ( Figure 5). In addition, cluster I and II of the ML tree are concordant with the PCA analysis that most of the accessions are divided into two groups: the Gulf of Thailand and Andaman Sea with 67.60% of PC1 ( Figure 4C).

Discussion
Because of the loss of mangrove forest areas, an understanding of R. apiculata genetic diversity and population structure is necessary for mangrove management strategies. In the present study, 82 R. apiculata accessions (Rhizophoraceae) collected from 17 natural mangrove forests in Thailand were evaluated using SNP markers. Gene diversity (GD) and polymorphic information content (PIC) values have been used to evaluate the level of genetic variation in populations [63]. Based on 2640 SNPs with the MAF values of ≥5%, GD values ranged from 0.16 to 0.50 and PIC values ranged from 0.15 to 0.38, representing moderate to high levels of genetic variation in R. apiculata. In addition, the PIC value is commonly used to measure the informativeness of genetic markers [63]. Following the criteria of Botstein et al. [63], 77% (2027 SNPs) of the SNPs were observed to be highly informative markers (0.5 > PIC > 0.25) and 23% (613 SNPs) of the SNPs were less informative markers (PIC < 0.25) (Figure 3; Table S3).
To examine the population structure of 82 R. apiculata accessions, the Bayesian modelbased clustering method and PCA were carried out. Using the STRUCTURE software, the result is shown for the best K = 2, revealing two subpopulations that are divided along the Malay Peninsula between the Gulf of Thailand and the Andaman Sea coasts. In addition, PC1 and PC2 separated the R. apiculata accessions into the Gulf of Thailand and the Andaman Sea coasts. These results are concordant with a previous study that reported two groups of R. apiculata between the Gulf of Thailand (Bangkok and Surat Thani) and Andaman Sea (Trang) [23]. In contrast, the population of R. mucronata in Thailand, which is a sympatric mangrove species with R. apiculata, was not clustered into two groups between the Gulf of Thailand and Andaman Sea [23]. Hence, R. apiculata in Thailand were shown to independently adapt to their own environments, probably due to the Malay Peninsula barrier that prevents the movement among R. apiculata subpopulations between the Gulf of Thailand and Andaman Sea coasts. Based on SNP markers, 63 accessions of Bruguiera parviflora (a mangrove species in the family Rhizophoraceae) were also clustered in two subpopulations, the Gulf of Thailand (Surat Thani and Trat) and the Andaman Sea (Phang-nga, Ranong, and Satun) [32]. Furthermore, the population structure of Ceriops tagal (one of the mangrove species in the family Rhizophoraceae) was evaluated in Thailand and China and it was shown that populations from the eastern coastline of Thailand were more genetically similar to populations from the South China Sea coast than to populations from the western coastline of Thailand, using inter-simple sequence repeat [64]. The population structure of R. apiculata and other mangrove species in the family Rhizophoraceae is conceptually consistent with the land barrier hypothesis of the Malay Peninsula [23].
Interestingly, genetic admixture was found in several accessions in both subpopulations in the Gulf of Thailand (CCO, CMP, SNI, NST, PTN and NWT) and Andaman Sea (STN, TRG, PKT, PNA and RNG) ( Figure 4B), which was concordant with previous studies that reported the genetic admixture of R. apiculata in Thailand (such as R. apiculata populations in Krabi, Phuket, and Ranong), in Indonesia, and in Malayia as well as the genetic admixture of relative Rhizophora species in some parts of the IWP region [26,28]. The genetic admixture of R. apiculata in Thailand reflected higher levels of genetic diversity with admixed alleles from the two subpopulations.
Genetic diversity and genetic differentiation in the R. apiculata population in Thailand were evaluated. Moderate genetic diversity (mean He = 0.39) was observed in the R. apiculata population along coastlines in Thailand. This is concordant with the study of R. apiculata populations in the Hainan Island, Gulf of Thailand, and the west coast of Thailand [27] as well as the breeding system of R. apiculata by mixed mating or predominantly outcrossing, which maintains the level of genetic diversity [26]. In general, both natural and environmental factors, such as mating systems, habitat fragmentation, climate change, and anthropogenic activities, affect the genetic diversity of mangrove species [28,29]. Low genetic diversity was reported in numerous studies on mangrove species, particularly the Rhizophora species [24][25][26]28,37,[64][65][66]. Furthermore, a high degree of genetic differentiation among two subpopulations (F ST = 0.24, p < 0.001) was similarly observed in several studies of R. apiculata populations, such as in the Strait of Malacca between the east coast (Hainan Island and Gulf of Thailand) and the west coast (Gulf of Thailand) of the Indo-Malayan region (F ST = 0.48 (SNPs)) [27], in Thailand between the Bangkok and Trang provinces (F ST = 0.875 (five nuclear genes) and F ST = 0.688 (two cpDNA regions)) [23], in the Greater Sunda Islands of Indonesia (F ST = 0.381 (five microsatellite markers)) [24], in the Indo-Malaysian region between Thailand's western coast and other regions (F ST = 0.242-0.532 (SNPs)) [25], and in Malaysia (F ST = 0.315 (three nuclear microsatellite markers)) [28].
The ML tree showed that 81 R. apiculata accessions were grouped into two genetic clusters (the Gulf of Thailand and the Andaman Sea) corresponding to the two subpopulations of the STRUCTURE and PCA clustering. Our tree topology is consistent with others [28,66]. For example, the phylogenetic tree by UPGMA (unweighted pair group method with arithmetic mean) revealed the two clusters of R. apiculata between the western and eastern regions of Peninsular Malaysia based on microsatellite markers [28]. Using nuclear and chloroplast regions, the NJ (Neighbor-joining) tree of R. apiculata populations in the Malay Peninsula regions showed two clusters, the west and south of the Malay Peninsula, and the east of the Malay Peninsula [66]. These support the existence of two genetic patterns in the R. apiculata population in Thailand that were geographically isolated on the Gulf of Thailand coast on the east and the Andaman Sea coast on the west. In addition, one accession (SNI 04) from Surat Thani province is separated between cluster I and II based on the ML tree. The STRUCTURE analysis has shown genetic admixture in this accession ( Figure 4B). SNI 04 is in the middle of the PCA plot, no cluster with other accessions ( Figure 4C). These results suggest that the accession is an admixed individual that combines genetic variation from two genetically differentiated source subpopulations (Gulf of Thailand and Andaman Sea), leading to an increase in the genetic variation within the population of R. apiculata in Thailand.

Conclusions
This study was addressed to examine the genetic diversity and population structure of R. apiculata (Rhizophoraceae) along coastlines in Thailand based on SNP markers. Moderate genetic diversity and high genetic differentiation were observed. The results showed two subpopulation differentiations, indicating genetic discontinuity between the coast of the Gulf of Thailand and the Andaman Sea. An AMOVA indicating 76% of variation found within populations and 24% of variation found among populations corroborated these results. Genetic diversity after the divergence of the R. apiculata populations might have been caused by geographic isolation.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/biology11101449/s1, Figure S1: Genome assembly evaluation of Rhizophora apiculata using BUSCO; Table S1: Statistical assembly of Rhizophora apiculata; Table S2: Statistical mapping of R. apiculata; Table S3: List of 2640 SNPs of R. apiculata and marker characteristics; Table S4: Summary of genetic parameters of two genetic clusters in 82 R. apiculata accessions based on 2640 SNPs.  Data Availability Statement: The genome sequence of R. apiculata was submitted to the National Center for Biotechnology Information (NCBI), the project accession number was PRJNA846534.