Genetic Diversity and Population Structure of Capirona (Calycophyllum spruceanum Benth.) from the Peruvian Amazon Revealed by RAPD Markers

Abstract: Capirona (Calycophyllum spruceanum Benth.) is a commercially important tree species widely distributed in South American forests and traditionally used for its medicinal properties and wood quality. Studies on this species have focused mainly on wood properties, propagation, and growth, whereas genetic studies on capirona have been very limited to date. Molecular markers now make it possible to explore genetic diversity and population structure quickly and reliably. Here we used 10 random amplified polymorphic DNA (RAPD) markers to analyze the genetic diversity and population structure of 59 samples of capirona collected from four provinces in the eastern region of the Peruvian Amazon. A total of 186 bands were manually scored, generating a 59 × 186 presence/absence matrix. A dendrogram generated with the UPGMA clustering algorithm showed, like the principal coordinate analysis (PCoA), four groups that correspond to the geographic origin of the capirona samples (LBS, Irazola, Masisea, Iñapari). Similarly, a discriminant analysis of principal components (DAPC) and STRUCTURE analysis confirmed that capirona is grouped into four clusters, although a few samples were intermingled. Genetic diversity was estimated for the four groups (populations) identified by STRUCTURE. AMOVA revealed that most of the variation occurs within populations (71.56%), with 28.44% among populations. Population divergence (F_ST) was highest between clusters 1 and 4 (0.269) and lowest between clusters 3 and 4 (0.123). RAPD markers were successful and effective; however, more studies employing other molecular tools are needed. To the best of our knowledge, this is the first investigation employing molecular markers in capirona in Peru that considers its natural distribution, and we hope it helps to pave the way towards its genetic improvement and the urgently needed sustainable management of forests in Peru.

Contributions: Conceptualization, C.L.S. and W.C.; methodology, C.L.S., J.D.C., M.Y.C. and W.C.; analysis, and resources, M.R. E.C.; data curation, C.I.A.; C.L.S. and C.I.A.; writing - review and C.L.S. C.I.A.; supervision, W.C.


Introduction
Calycophyllum spruceanum (Benth.) (Rubiaceae) is a fast-growing forest species [1] native to the Amazon basin, where it occurs in Bolivia, Peru, Colombia, and Brazil [2]. Capirona is a valuable timber species in the Peruvian Amazon that is widely used by local communities and exported around the world. Its wood is dense and durable and is therefore used to make economically valuable products [3]. Capirona has high potential for cultivation in agroforestry systems, which counter unsustainable land uses such as slash-and-burn agriculture, and the species represents an alternative for increasing rural incomes and development [4]. Capirona is therefore the object of current research in the Peruvian Amazon basin that aims to establish participatory research with local communities and to optimize genetic and silvicultural management strategies, including community conservation methods [1], since the indiscriminate use of forest species is driving deforestation in various regions of the planet [5][6][7]. Genetic diversity studies are indispensable for conservation programs and sustainable management. Moreover, studies based on molecular markers provide important information on the genetic makeup of a population because the markers are independent of environmental factors [8]. A demographic study alone would not provide this kind of information, which influences decision making. In recent years, DNA-based molecular markers such as random amplified polymorphic DNA (RAPD), microsatellites or simple sequence repeats (SSR), and inter-simple sequence repeats (ISSR) have been used to analyze phylogeny, inter-species relationships and genetic diversity in forest species such as Pinus leucodermis, Cedrela odorata, Eucalyptus globulus, Swietenia macrophylla, Bertholletia excelsa and Populus deltoides [9][10][11][12][13][14].
Previous research determined the genetic variation of capirona among nine populations from the Peruvian Amazon by using amplified fragment length polymorphisms (AFLP) [1]. That study showed that variation among individuals within populations is predominant. However, it did not include samples from the Madre de Dios department, where capirona is considered to occur naturally [15]. Tauchen et al. [3], in turn, assessed the genetic variability of capirona from DNA polymorphism detected with non-specific ITS primers in plant material from Pucallpa, department of Ucayali, in the Peruvian Amazon. Their results showed that the morphological diversity of capirona is greater than its genetic diversity, and that environmental factors had a greater impact on the phenotype in those accessions. Their results are also in line with previous studies on C. spruceanum in suggesting greater variation within provenances than among provenances. Capirona is an orphan forest species, and no further studies of its genetic diversity with molecular markers have been reported.
RAPD is an especially useful DNA-based method for initial research on genetic diversity in plant species. Such studies are crucial for initiating conservation programs, and the speed of working with RAPD markers is an advantage because they can deliver crucial information in a short time [16][17][18]. RAPD markers have been used to differentiate cultivars and in genetic mapping [19][20][21]. Although some researchers have expressed reservations about the usefulness of RAPD, mainly because of its poor reproducibility [22,23], other studies have stated that RAPD can be employed for genetic analyses [24][25][26][27][28], and others have demonstrated a high level of reproducibility [29]. RAPD markers have also shown great potential for the analysis of polymorphism and genetic diversity, as they are easy to test and require low concentrations of DNA [17].
The objective of this study was to determine the genetic diversity and population structure of capirona from primary forests in the departments of Ucayali, San Martín and Madre de Dios in the Peruvian Amazon, in order to help pave the way towards a modern breeding program and the sustainable conservation of this tree species in the near future.

Plant Material
We collected 59 samples of capirona from the Ucayali, San Martín and Madre de Dios departments in the Peruvian Amazon, considering the natural range of distribution of the species. Leaf samples were collected in paper envelopes, stored in airtight containers with silica gel, and transported to the National Institute of Agricultural Innovation (INIA) for genomic DNA extraction. Further details of the samples examined in this study are available in Table 1.

DNA Amplification
We extracted genomic DNA by using the CTAB method with minor modifications [30,31]. All procedures were performed in 1.5 and 2 mL plastic microfuge tubes. About 0.1 g of dry leaf tissue was ground in liquid nitrogen and suspended in 600 µL of 2× CTAB buffer containing 0.2% β-mercaptoethanol, followed by incubation at 65 °C for 60 min. An equal volume of chloroform:isoamyl alcohol (24:1, v/v) was then added, and the sample was shaken gently and centrifuged. The supernatant was recovered, and 10× CTAB buffer and chloroform:isoamyl alcohol (24:1, v/v) were added again. The supernatant was recovered once more and mixed with ice-cold isopropanol. DNA was recovered as a pellet by centrifugation, washed twice with ice-cold ethanol (70% and 90%), and air-dried. Finally, DNA was resuspended in nuclease-free water. RNA contamination was removed from all samples by digesting the extract with RNase A (100 µg mL⁻¹) for 30 min at 37 °C. DNA quality and quantity were checked on 1% agarose gels stained with GelRed (Biotium®, Fremont, CA, USA) and by standard spectrophotometry.
A total of 10 RAPD markers (Operon Technologies Inc., Alameda, CA, USA) were employed to assess genetic diversity among 59 samples of C. spruceanum (Table S1). Amplification was carried out in a 10 µL reaction volume containing 5 ng of DNA, KAPA HiFi HotStart ReadyMix, and 0.2 µM of primers. PCR consisted of an initial denaturation at 94 °C for 4 min, followed by 40 cycles of 1 min denaturation at 94 °C, 45 s annealing at 37 °C and 2 min extension at 72 °C, with a final extension of 10 min at 72 °C [32], in a Thermal Cycler Simplyone (Applied Biosystems™, Foster City, CA, USA). Amplified products were separated by electrophoresis on 1% (w/v) agarose gels in TBE buffer, visualized with GelRed staining and photographed using a gel documentation system. The size of the amplification products was estimated by comparing the amplicons with a 100 bp ladder (New England Biolabs, Ipswich, MA, USA).
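As a quick check on the cycling program described above, the following minimal sketch encodes the stated temperatures and hold times and sums them. The `Step` class and the function name are illustrative assumptions and are not part of the original protocol; ramping time between temperatures is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hold step of a thermal cycling program (illustrative structure)."""
    name: str
    temp_c: int
    seconds: int


# Temperatures and durations are the ones given in the text.
INITIAL = Step("initial denaturation", 94, 4 * 60)
CYCLE = (
    Step("denaturation", 94, 60),
    Step("annealing", 37, 45),
    Step("extension", 72, 2 * 60),
)
N_CYCLES = 40
FINAL = Step("final extension", 72, 10 * 60)


def total_hold_seconds() -> int:
    """Sum of programmed hold times, ignoring ramping between temperatures."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL.seconds + N_CYCLES * per_cycle + FINAL.seconds


if __name__ == "__main__":
    print(f"Programmed hold time: {total_hold_seconds() / 60:.0f} min")
```

Under these assumptions the programmed hold time is 164 min, of which the 40 cycles account for 150 min.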

Data Analysis
The RAPD band patterns were scored visually for the presence (1) or absence (0) of bands of various molecular weights. Only polymorphic, clear and consistent bands were considered for the analysis [32]. Following the procedure employed by Chia [33], loci with more than 10% missing data were excluded from the analysis. Polymorphic information content (PIC) for dominant markers was calculated using the following equation:
$$\mathrm{PIC}_i = 2 f_i (1 - f_i)$$
where $f_i$ is the frequency of the amplified band (1) and $(1 - f_i)$ is the frequency of absence of the band (0) [34]. We then used RStudio software v1.2.5033 to calculate genetic distances based on Prevosti's coefficient [35], and a dendrogram was generated using the UPGMA clustering algorithm with 1000 bootstrap replicates in the poppr package v2.9.2 [36]. We also used the ade4 v1.7-16 and adegenet v2.1.3 packages in RStudio to conduct a principal coordinate analysis (PCoA) and a discriminant analysis of principal components (DAPC), respectively, in order to infer the capirona population structure. The number of populations (K) was varied from 1 to 10 by k-means clustering with 100,000 iterations. The most likely number of clusters was selected based on the lowest Bayesian Information Criterion (BIC) value.
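The per-locus PIC and the pairwise distances described above can be sketched in a few lines of Python. This is a minimal illustration and not the poppr implementation: it assumes the usual dominant-marker form PIC = 2f(1 − f), reads Prevosti's distance for presence/absence data as the fraction of loci at which two profiles differ, and assumes a matrix with no missing values. The function names and the toy profiles are hypothetical.

```python
from itertools import combinations


def pic_dominant(column):
    """PIC of one dominant locus: 2 * f * (1 - f), where f is the band frequency."""
    f = sum(column) / len(column)
    return 2 * f * (1 - f)


def prevosti_distance(a, b):
    """Fraction of loci at which two 0/1 profiles differ (no missing data assumed)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def distance_matrix(profiles):
    """Symmetric pairwise distances, keyed by (sample, sample) name pairs."""
    names = list(profiles)
    dist = {(n, n): 0.0 for n in names}
    for p, q in combinations(names, 2):
        dist[(p, q)] = dist[(q, p)] = prevosti_distance(profiles[p], profiles[q])
    return dist


# Toy data: three hypothetical samples scored at four loci (not from the study).
toy = {
    "s1": [1, 0, 1, 1],
    "s2": [1, 1, 0, 1],
    "s3": [0, 0, 1, 1],
}

if __name__ == "__main__":
    loci = list(zip(*toy.values()))          # one tuple of scores per locus
    print([round(pic_dominant(col), 3) for col in loci])
    print(distance_matrix(toy)[("s1", "s2")])
```

A distance table of this kind is what a UPGMA routine takes as input; the clustering and bootstrap steps are left to dedicated software.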
Population structure was also estimated using the STRUCTURE program v2.3.4 [37], with ten runs for each number of populations (K value) from 1 to 10 and a burn-in of 50,000 Markov chain Monte Carlo (MCMC) iterations followed by 150,000 MCMC iterations. An admixture model with no prior population information was used; all other parameters were set to default values. The most likely number of clusters was estimated by the Evanno method [38]. Capirona samples were assigned to clusters using a membership probability ≥ 0.8 or the maximum membership probability. Population structure plots were generated with the R package pophelper v2.3.1 [39]. We used the tess3r package v1.1.0 [40] to display the STRUCTURE membership coefficient matrix of the estimated K on a geographic map of Peru.
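For readers unfamiliar with the Evanno method, the sketch below shows one common way to compute ΔK from replicate STRUCTURE runs: the absolute second-order difference of the mean log-likelihood, divided by the standard deviation across runs at that K. It is a simplified illustration under stated assumptions (mean Ln P(D) values are used, as in several summary tools); the function name and the toy values are hypothetical and are not results from this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: deltaK} for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps each K to the list of Ln P(D) values from replicate runs.
    deltaK = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), using mean L across runs.
    """
    avg = {k: mean(v) for k, v in lnp_by_k.items()}
    sd = {k: stdev(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in avg and k + 1 in avg and sd[k] > 0:
            second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            delta[k] = second_diff / sd[k]
    return delta


# Toy log-likelihoods for two replicate runs per K (hypothetical values).
toy_runs = {
    1: [-100.0, -102.0],
    2: [-80.0, -82.0],
    3: [-78.0, -80.0],
    4: [-77.0, -79.0],
}

if __name__ == "__main__":
    delta = evanno_delta_k(toy_runs)
    best_k = max(delta, key=delta.get)
    print(best_k, {k: round(v, 2) for k, v in delta.items()})
```

Because ΔK needs both neighbours of K, it is undefined at the smallest and largest K tested, which is why the method cannot return K = 1.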
We used the number of populations determined by STRUCTURE to conduct an analysis of molecular variance (AMOVA) using the R package poppr. In addition, three genetic diversity indices were calculated using the same package: (i) the Shannon-Wiener index, (ii) Simpson's index, and (iii) Nei's gene diversity (expected heterozygosity). The degree of gene differentiation among clusters in terms of allele frequencies (F_ST) was estimated using the following formula:
$$F_{ST} = \frac{H_T - H_S}{H_T}$$
where $H_S$ is the average expected heterozygosity estimated from each cluster and $H_T$ is the total gene diversity, or expected heterozygosity in the total cluster, as estimated from the pooled allele frequencies.
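The differentiation measure defined above, F_ST = (H_T − H_S) / H_T, can be illustrated with a short sketch. It rests on simplifying assumptions that the text does not specify: band frequencies are treated directly as allele frequencies, per-locus gene diversity is taken as 1 − (f² + (1 − f)²) with no small-sample correction, and H_T is computed from all samples pooled. Function names and toy matrices are hypothetical.

```python
def band_frequencies(matrix):
    """Per-locus band frequency for a list of equal-length 0/1 profiles."""
    n = len(matrix)
    return [sum(col) / n for col in zip(*matrix)]


def gene_diversity(freqs):
    """Mean over loci of 1 - (f^2 + (1 - f)^2), which equals 2 f (1 - f)."""
    return sum(2 * f * (1 - f) for f in freqs) / len(freqs)


def fst(clusters):
    """F_ST = (H_T - H_S) / H_T for a list of clusters (each a 0/1 matrix)."""
    # H_S: unweighted average of the within-cluster gene diversities.
    h_s = sum(gene_diversity(band_frequencies(c)) for c in clusters) / len(clusters)
    # H_T: gene diversity from band frequencies of all samples pooled.
    pooled = [row for c in clusters for row in c]
    h_t = gene_diversity(band_frequencies(pooled))
    return (h_t - h_s) / h_t


# Toy data: two hypothetical clusters, two samples each, scored at two loci.
cluster_a = [[1, 1], [1, 0]]
cluster_b = [[0, 0], [0, 1]]

if __name__ == "__main__":
    print(fst([cluster_a, cluster_b]))  # pairwise F_ST between the two clusters
```

Passing two clusters gives a pairwise value; passing all clusters gives a single overall value under the same assumptions.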

RAPD Analysis
The 10 RAPD primers employed in this work each revealed 16 to 23 fragments across the 59 samples of capirona, with an average of 18.6 fragments per primer. All 186 fragments (100%) were polymorphic (Figure S1). Polymorphic information content (PIC) ranged from 0.20 to 0.38 (Table 2).

Genetic Diversity and Population Structure
A total of 186 bands were manually scored, generating a 59 × 186 presence/absence data matrix. A dendrogram based on Prevosti's genetic distances placed all capirona samples in four main clusters according to their geographic locality: (1) samples from Iñapari (except cap069), (2) samples from LBS, (3) samples collected at Masisea and (4) an "admixture" cluster with samples of different origins but with a prevalence of samples from Irazola (Figure 1). Irazola and Masisea both belong to the Ucayali department. In addition, the Masisea and Iñapari clusters are the only ones supported by bootstrap values above 70% (Figure 1).
Principal coordinate analysis (PCoA) showed that the first two axes explained 22.8% of the variation and revealed that all 59 individuals are separated into four populations according to their geographic locality (Irazola, Masisea, LBS, Iñapari). Two samples from Masisea are mixed with the Irazola population, and two samples from Iñapari are intermingled within the LBS group (Figure 2a).
To explore the genetic structure of capirona from the Peruvian Amazon basin, we used the find.clusters function to determine the best K value for our samples and found that K = 4 is the most likely number of groups according to the BIC (Figure S2). In accordance with the dendrogram, discriminant analysis of principal components (DAPC) also showed that all samples of capirona are separated into four clusters (Figure 2b, Table S2). Moreover, Figure S3 shows that capirona samples also clustered mainly according to their geographic origin. However, four samples from cluster 1 (cap030, cap033, cap035, cap065) grouped within the Masisea and LBS populations. Similarly, sample cap069 from cluster 4 was placed within the LBS population, and sample cap085, which belongs to cluster 3, was intermingled within the Iñapari population.
The Evanno method [38] indicated that the best K value (number of populations) for our data set is two, with the next largest peak at K = 4. Waples [43] mentioned that the Evanno method tends to underestimate the number of genetic clusters. As in previous studies [44,45], we obtained a false highest peak at K = 2 with the Evanno method due to a strong rejection of the null hypothesis of no structure (K = 1). DAPC, on the other hand, indicated that K = 4 is the most likely number of populations (Figure S2), which is concordant with the dendrogram (Figure 1) and PCoA (Figure 2a). Therefore, further analyses used K = 4. STRUCTURE analysis showed admixture for a few samples. Most samples of capirona clustered according to their geographic origin, except for samples cap020, cap021, cap035 and cap085 (Figure 3). Moreover, Figure S4 shows the STRUCTURE membership proportions spatially interpolated onto a map with TESS3 [40]. Spatial interpolation of the inferred membership matrix assigned capirona samples from Irazola mainly to cluster 1. The second cluster included samples from Masisea, cluster 3 comprised samples from LBS, and cluster 4 included samples from Iñapari.

Genetic diversity and F_ST estimation were conducted considering the four clusters (populations) identified by STRUCTURE software, taking into account that RAPD are dominant markers. Nei's genetic diversity [46] estimates ranged from 0.12 to 0.41 among the four populations sampled across the Peruvian Amazon, showing a considerable similarity between the samples studied here. The Shannon-Wiener index ranged from 1.79 to 2.94, and Simpson's index from 0.83 to 0.95, indicating high genetic diversity for all four populations of capirona.
The percentage of polymorphic loci per cluster ranged from 51.08% (cluster 1) to 96.24% (cluster 3), with an average of 78.5% (Table 2). Population divergence (F_ST) was highest between clusters 1 and 4 (0.269) and lowest between clusters 3 and 4 (0.123) (Table 3). The analysis of molecular variance (AMOVA) allows the study of genetic variation within and between clusters. AMOVA revealed that 28.44% of the total variation was found between clusters of capirona, while 71.56% was within clusters (Table 4).

Discussion
Genetic diversity sheds light on the evolutionary pressure on alleles and on the mutation rate a locus might have undergone over a period of time [47]. To the best of our knowledge, this is the first study that employs molecular markers to estimate the population structure and genetic diversity of capirona across a wide range of its natural distribution in Peru. DNA-based molecular markers (RAPD, AFLP, SSR) are used to study genetic variation in forest species [1,8,16,48]. Advantages of these methods over others include their better representation of the variation present within species [12]. RAPD markers are applied in initial studies of genetic diversity in forest species such as pine (Pinus leucodermis), big-leaf mahogany (Swietenia macrophylla) and tornillo (Cedrelinga cateniformis) [9,12,16]. To date, only a small number of studies have used molecular markers to examine genetic diversity within and between populations of capirona, and this study represents the first population structure assessment of capirona from the Peruvian Amazon. We report that the genetic diversity of capirona based on RAPD markers is high among four populations sampled across the Peruvian Amazon. These estimates agree with the results obtained by Russell et al. [1]. In contrast, Dávila-Lara et al. [48] reported lower genetic diversity parameters across 13 populations of capirona in Nicaragua (Central America). This discrepancy may be explained by the different sampling origins. The Amazon basin is known as the center of origin of this forest species [2]; germplasm around this center might therefore harbor many beneficial genes, because it has had the opportunity to evolve adaptations to multiple environmental challenges over a longer period of time. Capirona samples from the Amazon basin are thus expected to possess higher genetic diversity.
Moreover, the low genetic diversity of samples from Nicaragua might be explained by its biogeographic history, Pleistocene range dynamics and recent anthropogenic deforestation [48]. In addition, Tauchen et al. [3] used nuclear ribosomal internal transcribed spacers (nrITS) to molecularly characterize populations of capirona and failed to separate individuals by multivariate analysis. They concluded that morphological diversity is higher than genetic diversity in this tree species. Similarly, Montes et al. [49] analyzed the morphological diversity of Calycophyllum spruceanum in 11 natural populations of the Peruvian Amazon. They grew capirona from seed, analyzed stem-growth and branch-wood traits, and found no significant population structuring.
The different values obtained for the three diversity indices (Shannon-Wiener, Simpson and Nei's indices) may be explained by the fact that, whereas Nei's index measures the level of heterozygosity, the Shannon-Wiener and Simpson indices assume that any difference between the analyzed individuals means a different species. Nei's genetic diversity index ranged from 0.12 to 0.41, indicating high genetic diversity, which is consistent with specimens from natural populations, as they are not affected by artificial selection. Maintaining high genetic diversity in natural populations is important because it reduces the risk of local extinction under natural conditions [50,51]. Similarly, the Shannon-Wiener index is relatively high (1.79 to 2.94) and reflects the richness and abundance among capirona clusters. Likewise, the Simpson index varied from 0.83 to 0.95, revealing that it is very likely that two individuals of capirona randomly selected from a cluster will belong to the same species. Although genetic relationships between species may be relatively low (low Nei's index), the diversity of the sequences, and therefore of the population, can still be relatively high (high Simpson and Shannon indices) [46,52]. Similarly, a recent study on Guazuma crinita reported low Nei's gene diversity and a high Shannon information index [50]. In addition, our expected heterozygosity is similar to that of populations of tornillo (Cedrelinga cateniformis) [31].
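To make the difference between genotype-based and locus-based indices concrete, the sketch below computes the Shannon-Wiener and Simpson indices from counts of distinct banding profiles. It assumes the convention, used by poppr for these two indices, that each distinct multilocus profile is treated as one type; the function names and toy profiles are hypothetical and are not data from this study.

```python
from collections import Counter
from math import log


def genotype_proportions(profiles):
    """Proportion of samples carrying each distinct multilocus profile."""
    counts = Counter(tuple(p) for p in profiles)
    n = len(profiles)
    return [c / n for c in counts.values()]


def shannon_wiener(profiles):
    """H = -sum(p * ln p) over distinct profiles."""
    return -sum(p * log(p) for p in genotype_proportions(profiles))


def simpson(profiles):
    """lambda = 1 - sum(p^2): chance that two random draws are different types."""
    return 1 - sum(p * p for p in genotype_proportions(profiles))


# Toy clusters (hypothetical): one with all profiles distinct, one with a repeat.
all_distinct = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
one_repeat = [[1, 0, 1], [1, 0, 1], [1, 1, 0], [0, 0, 1]]

if __name__ == "__main__":
    for name, cluster in (("distinct", all_distinct), ("repeat", one_repeat)):
        print(name, round(shannon_wiener(cluster), 3), round(simpson(cluster), 3))
```

Under this convention, a cluster in which all n profiles are distinct gives H = ln n and λ = 1 − 1/n, so both indices grow with the number of samples, whereas a per-locus gene diversity does not depend on sample size in the same way.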
Population structure analysis is informative for understanding genetic diversity and facilitates subsequent association mapping studies [51]; unaccounted population structure in those studies can lead to false positive associations between markers and traits [51]. Consequently, assessing population structure is a crucial step before testing associations between molecular markers and traits. In our study, DAPC and STRUCTURE results revealed that 59 samples of capirona from the Peruvian Amazon clustered into four well-defined groups associated with their genetic structure. Moreover, PCoA gave similar results. According to Rosyara et al. [53], DAPC is able to control for population structure in association mapping studies, and it is slightly better than STRUCTURE analysis at discriminating among populations, as shown in Prunus avium [54] and Camellia sinensis [55]. The presence of structure in these 59 samples of capirona meets our expectations since, according to the pedigree of the genotypes, all the individuals can be divided into four geographical locations, and intentional selection of traits by farmers might affect the population structure. However, a few accessions are intermingled among the four clusters. This might be due to the common practice of exchanging genetic material among farmers living in close geographic areas (Irazola, Masisea, Iñapari) or to migration, as may have occurred for sample cap069 (LBS).
Among the four populations identified in this study, cluster 1 included mainly samples from Irazola, showed the greatest difference from the other clusters, and had the lowest genetic diversity. These results may suggest that samples from Irazola possess narrow genetic diversity because they were geographically isolated and developed from limited genetic resources. AMOVA demonstrated that the greatest variation exists within populations of capirona. This may be explained by the sexual propagation of capirona. It is also expected in this species because it produces seeds that are dispersed over long distances by wind and water. This is generally the case for tropical trees, especially those that are outcrossing and pollinated and/or dispersed by wind, which tend to harbor high levels of genetic diversity at local and regional spatial scales [50]. Moreover, our results agree with similar studies conducted by Russell et al. [1] and Tauchen et al. [3].
Even though there have been some efforts to study capirona at the molecular and morphological levels, further research aimed at identifying putative genes in this tree species is needed. Next-generation sequencing (NGS) is a modern tool widely used in many plant species; however, very few NGS investigations have been conducted on forest species [56,57]. Our next step is to develop additional molecular tools for capirona and other tree species using NGS techniques such as de novo transcriptome assembly in order to identify EST-SSR markers. As in other studies on forest species [58][59][60], the development of those markers will benefit the genetic mapping and breeding of capirona in the near future. These molecular tools may also shed light on the evolutionary history of the Calycophyllum genus.

Conclusions
In this study we demonstrated that RAPD markers were successful and effective for assessing the genetic diversity and structure of C. spruceanum populations from the Peruvian Amazon. High levels of genetic diversity were recorded using different indices, probably reflecting extensive gene flow due to long-distance seed dispersal. Moreover, capirona samples were grouped into four clusters according to their geographic affiliation. However, a few samples were intermingled, probably due to the common practice of seed exchange among farmers. More work is still needed using additional collections of capirona from a wider geographic range and other molecular markers. In addition, further molecular tools should be developed for this tree species using NGS techniques in order to promote the establishment of a modern breeding program for forest species in Peru.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/f12081125/s1, Table S1: List of the primers used in the RAPD analysis, their sequence and reference. Table S2: Cluster assignment of 59 samples of capirona based on STRUCTURE and DAPC analysis. Figure S1: RAPD banding pattern using primers OPA-10: A, SC10-20: B, OPA-02: C. 100 bp ladder (NEB). 1% agarose. 1 µL of DNA + 9 µL of 1× dye buffer and 0.035 µL of GelRed. Figure S2: Number of populations. Plot of K ranging from 1 to 10. (a) All K values were obtained from adegenet analysis (DAPC). (b) All K values were obtained from STRUCTURE analysis. Four populations were considered in a data set of 10 RAPD markers and 59 samples of capirona. Figure S3: Population structure of 59 samples of capirona inferred by DAPC using 10 RAPD markers. Irazola, Masisea, LBS and Iñapari refer to the geographic origin of the samples.

Funding: This research was funded by the National Agrarian Innovation Program (PNIA)-Project No. 120-PI. C.L.S. and C.I.A. were supported by PIP 2449640 "Creación del Servicio de Agricultura de Precisión en los Departamentos de Lambayeque, Huancavelica, Ucayali, y San Martín 4 Departamentos" and PP0068 "Reducción de la vulnerabilidad y atención de emergencias por desastres".