Recent Trends in Research on the Genetic Diversity of Plants: Implications for Conservation

Abstract: Genetic diversity and its distribution, both within and between populations, may be determined by micro-evolutionary processes, such as the demographic history of populations, natural selection, and gene flow. In plants, indices of genetic diversity (e.g., k, h, and π) and structure (e.g., FST) are typically inferred from sequences of chloroplast markers. Given the recent advances in and popularization of molecular techniques for research in population genetics, phylogenetics, phylogeography, and ecology, we adopted a scientometric approach to compile evidence on recent trends in the use of cpDNA sequences as markers for the analysis of genetic diversity in botanical studies. We also used phylogenetic modeling to assess the relative contribution of relatedness and of ecological and reproductive characters to the genetic diversity of plants. We postulated that genetic diversity could be defined not only by microevolutionary factors and life history traits, but also by relatedness, so that species more closely related phylogenetically would have similar genetic diversities. We found a clear tendency for an increase in the number of studies over time, confirming the hypothesis that the advances in the area of molecular genetics have supported the accumulation of data on the genetic diversity of plants. However, we found that the largest share of these data has been produced by Chinese authors, and refers specifically to populations of Chinese plants. Most of the data on genetic diversity have been obtained for species in the International Union for Conservation of Nature (IUCN) category NE (Not Evaluated), which indicates a relative lack of attention to threatened species. In general, we observed very high FST values in the groups analyzed and, as we focused primarily on species that have not been evaluated by the IUCN, the number of plant species that are threatened with extinction may be much greater than that indicated by the listing of this organization. We also found that the number of haplotypes (k) was influenced by the type of geographic distribution of the plant, while haplotype diversity (h) was affected by the type of flower, and the fixation index (FST) by the type of habitat. The plant species most closely related phylogenetically have similar levels of genetic diversity. Overall, then, it will be important to consider phylogenetic dependence in future studies that evaluate the effects of life-history traits on plant genetic diversity.


Introduction
Genetic diversity can be defined as any quantitative measure of the variability of a population, which reflects the equilibrium between mutation and the loss of genetic variation [1,2]. The development of molecular markers for plants, initially isoenzymes (e.g., [3]), provided access to the genetic variability found in the accessions, which was useful for characterizing the germplasm and for genetic improvement, based on specific markers [4]. Given their genetic link, DNA markers can be used to detect allelic variation in the genes underlying the target characteristics [5].
Following the popularization and modernization of laboratory techniques for the analysis of genetic markers [6], primers developed for the detection of variability in accessions of plants of agricultural value have gradually been transferred to plants of ecological interest (e.g., [7][8][9][10][11]). With this, indices of genetic diversity, e.g., the number of alleles, number of haplotypes (k), haplotype diversity (h), nucleotide diversity (π), and observed and expected heterozygosity (HO and HE), and of population structuring, such as FST [12,13], RST, GST, and θST, have been applied increasingly in other areas, such as phylogenetics, phylogeography, biogeography, molecular ecology, ecological genetics, genetic geography, and landscape genetics [14]. In plants, chloroplast DNA (cpDNA) markers predominate, especially in ecological studies.
In most angiosperms, the cpDNA is inherited through the maternal lineage, has a low mutation rate, and is rarely subject to recombination [15][16][17]. Estimates of genetic diversity based on cpDNA sequences can be used to provide inferences on the evolutionary history of plant species, including possible recolonization routes, diversification events, and gene flow [18][19][20]. This approach has been used not only to analyze ecologically important species, such as those with a high degree of endemism [21], but also medicinal plants [22], and those with commercial potential [23].
Some of the technology now available, such as Next-Generation Sequencing (NGS), has permitted the accumulation of large quantities of molecular data through the analysis of marker regions [24], making estimates of genetic diversity increasingly robust. In recent years, in fact, there have been enormous advances in the speed of analyses, the length of reads, and transfer rates, together with a dramatic reduction in the cost per base analyzed [25]. The advent and evolution of these techniques have led to an increase in the number of publications, and a progressive expansion in related areas, in particular those with an ecological perspective. Some studies, such as that of Souza et al. [26], have evaluated the trends in the scientific literature on plant population genetics, although, in this specific case, the study focused only on species from the Brazilian Cerrado savanna. Given this, we decided to test the hypothesis that the popularization of molecular techniques has supported a general tendency for an increasing number of publications that use cpDNA sequences to obtain estimates of genetic diversity in plants. We used a scientometric approach to collect data and provide evidence on trends in the use of these markers in botanical studies over time.
In studies in which the genetic diversity or variation of plants is the principal focus, ecological or reproductive characteristics are rarely dealt with explicitly, and in most cases the study only highlights the fact that the principal factor determining the loss of genetic diversity is the reduction in effective population size caused by habitat fragmentation. This fragmentation reduces populations to small isolates, which are subject to increasing genetic drift and inbreeding, and reduced gene flow [27]. However, life-history traits, such as pollination and seed dispersal modes, have received little attention in terms of their potential contribution to the loss of genetic diversity. Ballesteros-Mejia et al. [28] used the phylogenetic generalized least squares (PGLS) approach to account for phylogenetic non-independence and verify the robustness of the results of generalized linear models (GLMs) that confirmed the effects of the pollination mode and breeding system on the patterns of genetic differentiation in Neotropical plant species. A number of other studies [29][30][31][32] have attempted to relate genetic diversity to the life-history traits of the plant, although, as the data are not normally distributed, they have in general either been transformed for the application of an analysis of variance (ANOVA) or analyzed with a GLM. These studies have thus overlooked the phylogenetic relationships among the species analyzed, although other research has shown that the species in a monophyletic group, that is, species that share a common ancestor, tend to be more similar to one another than species selected at random from a phylogenetic tree, and cannot be considered independent data points in statistical analyses [33,34].
Given these considerations, we investigated the relative influence of phylogenetic relationships and ecological or reproductive traits on the genetic diversity of plants. We tested the hypothesis that genetic diversity may be determined not only by micro-evolutionary factors and life history traits, but also by the phylogenetic relationships of the species, with more closely related species having more similar genetic characteristics. To this end, we (i) verified the number of scientific papers that report on studies using cpDNA markers to investigate plant genetic diversity, and the nationality of the authors that publish most in this field, (ii) identified the plant species and families that were the subject of most studies, and recorded their ecological and reproductive characteristics, and (iii) related the data on the life history of the species to the published estimates of genetic diversity and structure, and demonstrated the importance of phylogeny for studies that relate life history traits to genetic diversity in plants.

Scientometric Data
To evaluate trends in the publications on plant genetics, the "Web of Science", "Scopus", and "PubMed" databases were searched using the key words: [cpDNA* AND genetic diversity*]; [chloroplast DNA* AND genetic diversity*]. The search was restricted only by topic, and covered papers published up to the end of March 2018.
The papers identified in this search were reviewed to identify and exclude all those that (i) did not employ cpDNA markers (but only referred to the cpDNA), (ii) focused on organisms other than plants (i.e., animals, microorganisms, and algae, except green algae), (iii) focused on molecular markers linked to the cpDNA that were not cpDNA sequences per se, i.e., microsatellites or Simple Sequence Repeats (SSRs), Restriction Fragment Length Polymorphisms (RFLPs), Randomly Amplified Polymorphic DNA (RAPD), Amplified Fragment Length Polymorphisms (AFLPs), Single Nucleotide Polymorphisms (SNPs), or Polymerase Chain Reaction-Restriction Fragment Length Polymorphisms (PCR-RFLP), (iv) reported on the development of software and/or techniques, (v) reported on the development or analysis of proteins, or (vi) used the cpDNA for the analysis of phenomena other than genetic diversity. All other papers were included in the analysis.
The following information was compiled from each of the papers selected for analysis: (i) year of publication, (ii) journal, (iii) impact factor (Journal Citation Reports, JCR), (iv) number of citations (Web of Science), (v) nationality of the corresponding author, (vi) geographic study area, (vii) plant family, (viii) number of haplotypes (k), (ix) haplotype diversity (h), (x) nucleotide diversity (π), and (xi) genetic differentiation of populations (FST). For each study species, the following data on life history (ecological and reproductive) traits were also compiled, based on the IUCN database (International Union for Conservation of Nature [35]) and the scientific literature: (i) life form (tree, shrub, subshrub, epiphyte, herb, grass, vine), (ii) habitat (specialist or non-specialist), (iii) geographic distribution (ample or restricted), (iv) IUCN category (DD: Data Deficient, NE: Not Evaluated, LC: Least Concern, NT: Near Threatened, VU: Vulnerable, EN: Endangered, CR: Critically Endangered), (v) reproductive cycle (short cycle or perennial), (vi) type of flower (monoecious or dioecious), and (vii) type of pollinator (insect, wind, water).
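For readers less familiar with the three diversity indices compiled here, the following minimal Python sketch shows how k, h, and π are conventionally computed from a set of aligned sequences. It assumes the standard definitions (Nei's haplotype diversity with the n/(n − 1) correction, and π as the mean number of pairwise differences per site); the function name `haplotype_stats` and the toy sequences are hypothetical and are not taken from any of the reviewed papers, which may have used other estimators or software.

```python
from collections import Counter
from itertools import combinations


def haplotype_stats(sequences):
    """Return (k, h, pi) for a list of aligned, equal-length sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")

    # k: number of distinct haplotypes in the sample
    counts = Counter(sequences)
    k = len(counts)

    # h: Nei's haplotype diversity, with the n / (n - 1) sample correction
    h = (n / (n - 1)) * (1.0 - sum((c / n) ** 2 for c in counts.values()))

    # pi: mean number of pairwise differences, expressed per site
    pairs = list(combinations(sequences, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    pi = diffs / (len(pairs) * length)
    return k, h, pi


# Hypothetical toy alignment: four individuals, eight sites
toy = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGAACGA"]
k, h, pi = haplotype_stats(toy)
print(k, round(h, 4), round(pi, 4))
```

In this sketch, alignment gaps and ambiguous bases are treated like any other character, which is a simplification.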
To remove the effects of the general tendency for the number of papers to increase over time, the number of papers obtained in each year was divided by the total number of papers found in the Thomson-ISI database, and this value was multiplied by 10⁶, following the approach of Souza et al. [26]. The growth in the number of publications was evaluated using linear regression models, as well as the residuals of these models, to analyze the number of papers published per year as a function of time.
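The normalization and trend analysis described above can be sketched as follows. This is an illustration only: it assumes that the database total is taken per year, the yearly counts are invented for the example, and the helper names `normalize_counts` and `fit_line` are hypothetical. The original analyses were run in R, so this Python version is not the authors' code.

```python
def normalize_counts(topic_counts, database_totals, scale=1e6):
    """Papers on the topic per `scale` papers indexed, for each year."""
    return {
        year: scale * topic_counts[year] / database_totals[year]
        for year in sorted(topic_counts)
    }


def fit_line(xs, ys):
    """Ordinary least squares fit y = intercept + slope * x, with residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return slope, intercept, residuals


# Invented counts, for illustration only
topic = {2015: 10, 2016: 20, 2017: 30}
totals = {2015: 2_000_000, 2016: 2_500_000, 2017: 3_000_000}

normalized = normalize_counts(topic, totals)
years = list(normalized)
slope, intercept, residuals = fit_line(years, [normalized[y] for y in years])
print(normalized, slope)
```

Plotting `residuals` against `years` gives the kind of diagnostic used to judge whether a linear model is adequate: the residuals should show no systematic pattern.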
The diversity of the journals was estimated using the Shannon-Wiener diversity index, $H' = -\sum_i p_i \ln p_i$, with $p_i = n_i/N$, where $p_i$ = the relative abundance of each journal, $n_i$ = the number of journals registered in the respective period, and $N$ = the total number of journals. The mean impact factors of the papers published by the corresponding authors of different nationalities were compared using Tukey's test (significance level of p = 0.05). This analysis was run only for the nationalities that had more than two publications. The analyses were run in the R environment [36].
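As an illustration of the index, the short Python sketch below computes H' from a list giving the journal of each paper in a period. It assumes the conventional reading in which n_i is the number of papers published in journal i and N is the total number of papers in that period; the function name `shannon_index` and the journal labels are hypothetical.

```python
import math
from collections import Counter


def shannon_index(journal_per_paper):
    """Shannon-Wiener index H' = -sum(p_i * ln p_i) over journals."""
    counts = Counter(journal_per_paper)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical period with five papers spread over three journals
papers = ["Journal A", "Journal A", "Journal B", "Journal C", "Journal C"]
print(round(shannon_index(papers), 4))
```

H' is zero when all papers appear in a single journal and reaches ln(S) when papers are spread evenly over S journals, so an increase over time indicates that the literature is spreading across more outlets.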

Genetic Diversity versus Phylogeny, and Ecological and Reproductive Traits
Because few data were obtained on the genetic diversity and other characteristics of gymnosperms, and these data were often incomplete, only angiosperms were included in these analyses. To evaluate the phylogenetic effects of the ecological and reproductive characteristics of the species on their genetic diversity, we obtained a reference phylogeny for the species included in the analyses. These phylogenies were constructed using Phylomatic master tree R20160415, obtained from the Phylomatic platform, using the modern classification of the angiosperm phylogenetic groups (APG IV, [37,38]). As no information was available on the length of the branches, we attributed an arbitrary value of one in all cases, and this did not have any significant influence on the results [39]. This phylogeny was used as the explanatory variable to predict evolutionary traits (see [40]) in the phylogenetic modeling based on phylogenetic eigenvector maps (PEMs). The PEMs provide a robust phylogenetic signal for the analysis of the genetic diversity of the target species, and were produced in the MPSEM package [41] of the R environment, version 3.4.3.

Scientometric Data
A total of 4,046 scientific papers were identified in the three databases, although only 385 satisfied the selection criteria adopted in the initial review (Figure 1) and were included in the study. The earliest paper identified was published in 1996 and discusses the evolutionary biology and gene flow of populations of Aquilegia chrysantha and Aquilegia longissima, based on data from the trnL-trnF region of the chloroplast.
The number of publications involving the use of cpDNA sequences as markers tended to increase over time. This trend was confirmed by the significant fit of the linear regression model (Figure 2a), and was most apparent in the 1980s. In the 1960s and 1970s, the normalized number of publications (number of publications/total number of publications × 10⁶) was less than five per year, while the threshold of 20 publications per annum was reached only in the 1990s. When only the publications that used cpDNA markers to obtain estimates of genetic diversity were analyzed, there was a gradual increase in the number of papers published over time (Figure 2b). The fit of both data sets to the linear model is supported by the random distribution of the residuals (Figure 2c,d).
Overall, 21.82% of the publications that used cpDNA markers to estimate genetic diversity were indexed in all three databases searched (i.e., Pubmed, Scopus, and Web of Science). A larger percentage (34.53%) was indexed only in Pubmed, while 19.62% were found in either Scopus or the Web of Science. The remaining 24.03% were indexed simultaneously in the latter two databases.
The papers are distributed among 73 different journals, of which 35 published only a single paper (9.09% of all publications), with the other papers being divided among the remaining 38 journals. The largest number of papers (54, or 14.02% of the total) was published in Molecular Ecology (Figure S1a), with the first being published in 2001. The journal with the second-largest number of papers was PLoS ONE, with 32 (8.32% of the total), the first being published in 2011, and a further six in 2012. The diversity of journals publishing papers that used cpDNA markers for the analysis of genetic diversity has increased significantly over time (Figure S1b).
The corresponding authors of the papers belonged to 49 different nationalities, although the largest share (39.1%) were Chinese (Figure S2a), followed by Japanese (9.6%) and American (9.1%). Together, authors from these three countries contributed 57.8% of the papers published in this field of research.
The papers identified in the literature search referred to research in 171 different study areas, varying from a whole continent, such as North America, to countries and geographic features, including mountain ranges, peninsulas, and islands. Almost half (85, or 49.71%) of these areas were targeted in only a single paper. Once again, three countries predominated among the study areas, with China being the focal area in 21.1% of the papers, Japan in 6.1%, and the United States in 4.7% (Figure S2b). Chinese authors analyzed primarily the genetic diversity of plants sampled in China. In addition to China and Japan, other Asian countries, such as Taiwan and Korea, were among the principal geographic areas targeted in the different studies.
The mean impact factor of the journals varied significantly (p < 0.0001) according to the nationality of the corresponding author (Figure 3a). The highest impact factors were recorded for authors from Canada (5.94), Switzerland (5.11), Austria (5.08), and New Zealand (4.90), while the lowest mean was recorded for authors from Russia (1.11).
Only 10 (2.61%) of the papers have been cited more than 100 times (Figure 3b), whereas 20.10% have never been cited in another published paper. The most cited papers are those of Jakob and Blattner [42], with 162 citations, Wang et al. [43], with 161, Zhang et al. [44], with 157, and Anderson et al. [45], with 147. While all these papers are more than eight years old, no systematic relationship was found between the publication date and the number of citations.
Data were recorded on 639 species, representing 137 plant families and five groups (algae, bryophytes, pteridophytes, gymnosperms, and angiosperms). Of these species, 507 (in 118 families) were angiosperms (Table S1). The family with the largest number of study species (46) was the Poaceae, although these species were included in only 11 publications. The next most diverse families were the Pinaceae and the Rosaceae, both with 38 species, targeted in 18 and 20 publications, respectively. These families were followed by the Asteraceae (35 species in 25 publications) and Fagaceae (23 species in 14 publications).
Some of these more diverse families, such as the Asteraceae, Pinaceae, Rosaceae, Fabaceae, Fagaceae, Poaceae, and Polygonaceae, were significantly associated with the increase in the number of publications over time (Figure 4).
There was a tendency for the number of plant families analyzed in the publications to increase over the years. This pattern was observed in the principal components analysis (Figure 5a), with papers published in recent years including many families that were not evaluated in the early studies. In addition, a number of families recurred in 2012, and in 2014-2017.
The IUCN category was not used as a criterion for the selection of study species, even though some categories (LC, NE, NT, and VU) were significantly associated with the year of publication. There was a prevalence of species in the Not Evaluated (NE) category, followed by other, lower-risk categories, such as LC (Figure 5b).
The prevalence of studies on species in the Not Evaluated (NE) IUCN category was most accentuated in the years after 2006, peaking in 2016. Overall, the vast majority (81.66%) of the species were in the NE category, followed by LC (13.21%), VU and NT (1.58% each), DD (1.38%), and EN (0.59%). A tendency was also observed for the studies to focus on species for which data on genetic diversity were not yet available, so that only 50 (7.84%) of the species registered here were targeted in more than one publication, in particular Carpinus laxiflora (Betulaceae), Dunnia sinensis (Rubiaceae), Euonymus oxyphyllus (Celastraceae), Leucomeris decora (Asteraceae), Nouelia insignis (Asteraceae), and Pseudotsuga menziesii (Pinaceae), which were each included in three studies. Only 27 angiosperm species were included in more than one publication: 24 in Asia, two in North America, and one in South America.
Just under half (48.52%) of the species were classified as perennials, and only 7.89% as short cycle, while a relatively large percentage (43.59%) were not classified (Figure S3d). The type of flower was not classified for the vast majority (83.63%) of species, while 12.82% were monoecious and 3.55% dioecious (Figure S3e). Similarly, no data were available on the pollinator of most species (84.02%), although 11.05% are known to be pollinated by insects, 4.73% by the wind, and 0.20% by water (Figure S3f).

Genetic Diversity vs. Phylogeny, and Ecological and Reproductive Traits
The largest mean number of haplotypes (k = 20) was observed in the epiphytes, and the lowest (k = 2) in the vines (Figure 6a). Habitat specialists and non-specialists had a similar mean number of haplotypes, of approximately 13. Unexpectedly, species with a restricted geographic distribution had a higher k (k = 14.9) than those with an ample distribution (k = 11.8). Perennial and short-cycle species both had k values of approximately 13, as did dioecious and monoecious species (k ≈ 13.5). The species pollinated by insects and the wind both had mean k values of approximately 12, whereas those pollinated by water had a slightly lower mean (k = 9). In terms of the IUCN categories, the lowest k values were recorded in DD (6.00) and NE (8.5) species.
The lowest mean haplotype diversity (h = 0.32) was recorded in the grasses (Figure 7a). Once again, the h values were similar between specialists and non-specialists (h ≈ 0.59) and between species with more ample and more restricted distributions (h ≈ 0.58). The h of short-cycle species was slightly higher (h = 0.68) than that of perennial species (h = 0.57). The h was also slightly higher in monoecious species (h = 0.66) than in dioecious ones (h = 0.52). The species pollinated by the wind had the lowest mean haplotype diversity (h = 0.48), while the lowest value for an IUCN category was recorded in the endangered (EN) species (h = 0.30).
In the case of nucleotide diversity (π), many outliers were observed, and the highest mean values (π = 0.005) were recorded in shrubs and grasses (Figure 8a). The value was approximately 0.007 for specialist and non-specialist species. Once again, species with a restricted distribution presented a higher value (π = 0.009) than those with a more ample distribution (π = 0.003). Short-cycle species presented a slightly lower value (π = 0.003) than those with a perennial cycle. Species with monoecious and dioecious flowers had similar nucleotide diversity (π ≈ 0.002). In relation to the IUCN categories, the NT group had the highest mean (π = 0.007).
When phylogeny was combined with only one other variable, the life form had a significant influence (R 2 = 0.878; p > 0.001). Haplotype diversity was influenced only by phylogeny (R 2 = 0.799; p > 0.001) and the combination of this factor with all other variables. The life form also influenced h significantly (R 2 = 0.861; p > 0.001) when combined only with life form. Table 1. Values of R 2 and p calculated for the influence of phylogeny and ecological variables, and the different combinations of these variables, on the genetic diversity (k, h and π) and FST of plants. The data were obtained from studies that used cpDNA markers for the analysis of genetic diversity. Overall, phylogeny appears to have a fundamental influence on the genetic diversity data (Table 1), and was highly significant, both when analyzed separately and when combined with other variables. The number of alleles was influenced by the type of distribution (R 2 = 0.015; p = 0.025), phylogeny (R 2 = 0.829; p > 0.001), and by this factor combined with all other variables. When phylogeny was combined with only one other variable, the life form had a significant influence (R 2 = 0.878; p > 0.001). Haplotype diversity was influenced only by phylogeny (R 2 = 0.799; p > 0.001) and the combination of this factor with all other variables. The life form also influenced h significantly (R 2 = 0.861; p > 0.001) when combined only with life form. Nucleotide diversity (π) was also influenced by phylogeny (R 2 = 0.877; p > 0.001), and when combined with one other variable, the IUCN category was the most significant (R 2 = 0.918; p > 0.001). In general, F ST was the index least influenced by the phylogenetic relationships among the species (R 2 = 0.600; p > 0.001). This index was also influenced by the type of habitat (R 2 = 0.069; p = 0.035) and the combination of habitat and geographic distribution (R 2 = 0.115; p = 0.010). 
The combination of phylogeny and life form also had a significant influence on genetic structure (R 2 = 0.668; p > 0.001).
In the case of the reproductive variables (Table 2), phylogeny once again had the greatest influence on the diversity data. The type of flower (monoecious or dioecious) also had a significant influence on the h values (R² = 0.090; p = 0.046). When phylogeny was considered together with each of the other reproductive characters, the type of pollinator had the greatest influence on k (R² = 0.833; p < 0.001), h (R² = 0.769; p < 0.001), and π (R² = 0.914; p < 0.001). In the case of FST, the association between phylogeny and the life cycle was the most significant (R² = 0.899; p < 0.001).
Table 2. Values of R² and p calculated for the influence of phylogeny and reproductive variables, and the different combinations of these variables, on the genetic diversity (k, h and π) and FST of plants. The data were obtained from studies that used cpDNA markers for the analysis of genetic diversity.
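For readers less familiar with the three diversity indices summarized in this section, the following minimal sketch shows how k, h and π can be computed from a set of aligned cpDNA sequences. It assumes the standard definitions (k as the count of distinct haplotypes, h as Nei's haplotype diversity with the n/(n − 1) sample-size correction, and π as the mean number of pairwise differences per site). The function name `haplotype_stats` and the toy sequences are hypothetical illustrations, not data or code from the studies compiled here, which may have used different software and gap-handling rules.

```python
from collections import Counter
from itertools import combinations


def haplotype_stats(seqs):
    """Return (k, h, pi) for a list of equal-length aligned sequences.

    Assumption: every character (including gaps) is compared literally;
    real pipelines usually treat gaps and ambiguity codes explicitly.
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    counts = Counter(seqs)
    k = len(counts)  # number of distinct haplotypes

    # Haplotype diversity: h = n / (n - 1) * (1 - sum of squared frequencies)
    h = n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))

    # Nucleotide diversity: mean pairwise differences, divided by alignment length
    n_pairs = n * (n - 1) // 2
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    pi = total_diffs / (n_pairs * length)
    return k, h, pi


# Hypothetical toy alignment: four individuals, ten sites
sample = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT"]
k, h, pi = haplotype_stats(sample)
print(f"k = {k}, h = {h:.3f}, pi = {pi:.4f}")
```

Because h depends only on haplotype frequencies while π also depends on how divergent the haplotypes are, two groups can share a similar h and still differ markedly in π, as seen in several of the comparisons above.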

Discussion
A linear increase was observed in the number of publications that used cpDNA sequences as genetic markers, starting in 1987 (Figure 2b), which may be related directly to the expanding use of the polymerase chain reaction (PCR) technique developed by Kary Mullis in 1983 [46]. The growth in studies nevertheless became more accentuated only from 1992 onward, which may be accounted for by the development and popularization of universal cpDNA primers (e.g., [47–49]). The development of an increasing number of cpDNA markers culminated in studies that require indices of genetic diversity [50]. Accordingly, publications containing data on genetic diversity derived from these markers began to appear in 1996 (Figure 2b) and increased progressively in subsequent years. Shaw et al. [51] published no fewer than 21 non-coding cpDNA sequences that constitute potential markers for the analysis of phylogenetic relationships. In addition, the development of next-generation sequencing (NGS) techniques provided access to an increasing number of populations and species, which supported the expansion of studies based on estimates of genetic diversity, including research in population genetics, phylogeography, and phylogeny [52,53].
China was by far the country with the most publications that used cpDNA markers to analyze genetic diversity (Figure S2). There was also a clear relationship between the number of authors and the study region, which reflects the propensity of researchers to study species found in their home nation. The prominence of China in this scientific field can be accounted for by the country's history of investment in this type of research. In 1998, the Chinese Ministry for Science and Technology, in collaboration with other government bodies, established a number of private institutions for genomic research, including the Beijing Institute of Genomics (BGI) and the Chinese Academy of Sciences. Through these initiatives, China has become a major force in the field of DNA sequencing, and since the turn of the 21st century, research in genetics has advanced considerably in the country [54]. In 2010, the BGI became a global leader in DNA sequencing, collaborating with researchers around the world, and in 2016, it not only increased its sequencing speed, but also reduced costs [55]. Between 2006 and 2010, China was the country with the second largest number of papers published in a number of different fields of research [56], and the findings of the present study corroborate this general tendency in the specific case of papers reporting research on genetic diversity based on chloroplast markers. One factor that has reinforced this trend is the hotspot of endemism found in the mountainous region of western China [57]. This, together with the considerable technological advances, has stimulated local research, further reinforcing the focus of Chinese researchers on their own geographic region.
Even so, the large number of papers published by Chinese authors is not reflected in the quality of the publications, in terms of their impact factors. As in other fields of research, there is a clear trend for some papers, in particular reviews and methodological studies, to have a long-lasting influence and to be cited widely [58], while the vast majority of papers are cited infrequently or not at all. In addition, no systematic relationship was found between the publication date of a paper and the number of times it was cited, which may mean that many recent papers have yet to have an impact, while many older papers are already outdated [59].
The family Poaceae was represented by the largest number of species in the papers on genetic diversity that used cpDNA sequences as molecular markers. This prominence was predictable, given the economic importance and, to a certain extent, the ecological relevance of this family, which is one of the most widely studied angiosperm groups [60]. The Pinaceae and Rosaceae were also prominent in terms of both the number of species and publications. Many species of both families are commercially important, due to their medicinal properties and flowers, which stimulates interest in their population genetics and in their phylogeographic and phylogenetic characteristics (e.g., [61–67]). The increase in the number of papers published over time was also associated with an increase in the number of families and species analyzed, reflecting a general tendency for authors to focus on species not studied previously, even when they are aware that the genetic diversity of different populations of a species may vary considerably.
Most of the species analyzed are herbs, which confirms the influence of medicinal properties on the selection of study species. Sampling considerations also appear to be a factor in the selection of study species, given the predominance of perennial species with a wide geographic distribution over species with a short cycle and a more restricted distribution.
The predominance of species in the Not Evaluated (NE) category indicates a relative lack of studies of species that have been classified by the IUCN, in particular endangered ones. The IUCN classification is an important measure of the risk of extinction of a species [68], and indices of genetic diversity provide an excellent measure of conservation status, offering insights into population bottlenecks, structuring, and inbreeding levels. Despite this, the IUCN classification has yet to influence the research on cpDNA markers. In other words, this field of research has yet to establish a systematic link between the genetic data and species conservation.
The lowest mean numbers of haplotypes (k) were recorded in species classified as DD (6.00) and NE (8.5), which indicates that a systematic relationship may exist between genetic diversity and IUCN categories. This relationship can only be confirmed once more endangered species have been analyzed, and more of the NE species that have been analyzed are classified by the IUCN. Worldwide, the total number of vascular plant species has been estimated at between 219,204 [69] and 308,312 [70], while the IUCN Red List [35] contains only 25,452 assessments of plant species, which represents only 7.25% of the valid species known to exist worldwide [71]. This means that the vast majority of plant species are yet to be evaluated in terms of priorities for conservation [72]. In addition, the assessments of around one third of the species that have been evaluated by the IUCN are considered to be outdated, given that they were assessed more than 10 years ago or were evaluated using a previous version of the IUCN criteria [73].
The lowest haplotype diversity (h = 0.30) was observed in the EN species, which is an especially important finding, given that the IUCN tends to overlook genetic data. This matters because the risk of extinction may be underestimated when genetic factors are not included in assessments, leading to the application of inadequate management strategies [74]. Spielman et al. [75] found that heterozygosity is 35% lower, on average, in endangered taxa than in unrelated non-threatened species, and that extinction typically occurs only after deleterious genetic effects.
Habitat specialist plants had lower mean genetic structure than non-specialist plants, and a higher mean FST was recorded in more widely distributed plants than in species with a more restricted range. These findings were corroborated by the phylogenetic modeling, which showed that the FST values were influenced by habitat and by the combination of habitat with distribution. The number of haplotypes (k) was also influenced by the type of distribution of the plant. This is because species with fewer adaptive restrictions, whether for habitat or geographic distribution, tend to have a higher genetic diversity [2]. Habitat specialists and plants with a restricted distribution tend to have less available habitat and smaller populations, which will have an increased propensity for inbreeding, and more structured populations. Species with a wider distribution tend to have greater phenotypic plasticity and greater total genetic variation (allelic and haplotype diversity) than species with a more restricted distribution [76].
In general, we observed high levels of genetic structuring in all plant groups, with high levels of structuring typically correlating with low expected heterozygosity at the population level [77]. These findings indicate that many of the plants evaluated have high levels of population structuring, and that the number of endangered plant species is probably much greater than that indicated by the IUCN classification.
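The link between strong structuring and low within-population diversity can be illustrated with a small sketch. The compiled studies report FST without a single common estimator, so the example below uses Nei's G_ST = (H_T − H_S) / H_T as a stand-in, on the assumption that it behaves analogously for haplotype data. The function names and the toy populations are hypothetical, and populations are weighted equally regardless of sample size.

```python
from collections import Counter


def gene_diversity(haplotypes):
    """Uncorrected gene diversity, 1 - sum(p_i^2), for one population."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("population must not be empty")
    return 1.0 - sum((c / n) ** 2 for c in Counter(haplotypes).values())


def gst(populations):
    """Nei's G_ST = (H_T - H_S) / H_T with equally weighted populations."""
    n_pops = len(populations)
    if n_pops < 2:
        raise ValueError("need at least two populations")

    # H_S: mean within-population diversity
    h_s = sum(gene_diversity(p) for p in populations) / n_pops

    # H_T: diversity computed from the mean haplotype frequencies
    all_haps = {hap for pop in populations for hap in pop}
    mean_freq = [
        sum(pop.count(hap) / len(pop) for pop in populations) / n_pops
        for hap in all_haps
    ]
    h_t = 1.0 - sum(f ** 2 for f in mean_freq)
    if h_t == 0:
        raise ValueError("no haplotype variation in the total sample")
    return (h_t - h_s) / h_t


# Hypothetical populations: nearly fixed for different haplotypes vs. well mixed
structured = [["A"] * 9 + ["B"], ["B"] * 9 + ["A"]]
mixed = [["A"] * 5 + ["B"] * 5, ["A"] * 5 + ["B"] * 5]
print(f"structured: {gst(structured):.2f}, mixed: {gst(mixed):.2f}")
```

In the structured case each population retains little diversity of its own while the total sample remains variable, which is the pattern that drives the index toward high values.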
On its own, phylogeny had a greater influence on genetic diversity than ecological or reproductive traits. These findings are consistent with previous studies [78,79], which have shown that phylogeny may influence the ecological and morphological characteristics of a species. As closely-related species tend to have similar life-history traits, phylogeny may be an important determinant of plant genetic diversity. This reinforces the conclusion that phylogenetic relationships should not be overlooked in the analysis of plant life-history traits.
When associated with phylogeny, the plant life form influenced k, h and FST. The grasses presented low values for k and h, and a mean FST of 0.55. Hamrick and Godt [30] also found evidence of the influence of life form on genetic diversity, with woody plants having slightly greater diversity than non-woody species with similar life-history traits. Ballesteros-Mejia et al. [28] also recorded higher genetic diversity (h and π) in tree species in comparison with plants of smaller size, although grasses were not analyzed in this case. The lower genetic diversity recorded in the present study for grasses may be related to the widespread domestication of many species of this group. Prolonged periods of intensive artificial selection and improvement have resulted in bottlenecks that have reduced genetic diversity throughout the genome of the plant, and limit the germplasm available for further breeding of the species [80,81]. The genetic diversity of wheat declined considerably between 1950 and 1989 [82], for example, and other studies (e.g., [83]) have shown that the genetic diversity of some tree species has been affected in a similar way.
In addition, many grasses are apomictic, a condition that is normally associated with low levels of genetic variation at the population level (e.g., [84–86]), especially in plants that have a short life cycle. On the other hand, many grasses are invasive species, which typically have greater phenotypic plasticity than native species. Invasive species typically colonize new areas in small numbers (normally with low genetic diversity) and often encounter conditions distinct from those in which they evolved [87].
The type of flower also influenced the h values of the plant species analyzed in the present study. The mean h recorded here was slightly higher in monoecious species than in dioecious ones. This contradicts expectations, given the potential for self-fertilization in monoecious plants, which will tend to increase inbreeding levels in the population. However, mechanisms that delay the maturation of the ovaries until after that of the pollen may favor outcrossing in these plants.
While the present study has provided a number of important insights into the influence of phylogeny and life-history traits on the genetic diversity of plants, further research should be encouraged, in particular because of the relative lack of data for some groups, such as algae, cacti, mosses, and palms. We also encountered difficulties in defining the reproductive characteristics of many plants. We were unable to identify any genetic data on plant species pollinated by bats, for example. There is a clear need for more robust databases that include information on the ecological and, in particular, the reproductive characteristics of plants, comparable to that available from BirdLife International for birds.

Conclusions
Recent advances in molecular research are reflected in a progressive increase in the number of published papers that report on the genetic diversity of plants, based on cpDNA sequences. Overall, however, these data were published predominantly by Chinese authors, and focused primarily on plants that occur in China. The vast majority of the plant species for which data on genetic diversity are available have yet to be evaluated by the IUCN, even though the analysis of genetic parameters would be extremely useful for defining the conservation status of these species. This is especially relevant because we found evidence of strong structuring in all types of plants, which may reflect the presence of populations of reduced size and potentially endangered species. The number of haplotypes (k) was influenced by the type of geographic distribution, while haplotype diversity (h) reflected the type of flower, and the mean fixation index (FST) was influenced by the type of habitat. However, of all the variables analyzed, phylogeny was the factor that most influenced the genetic diversity of the plants, with phylogenetically close species tending to have similar genetic diversity. This reinforces the need to account for phylogenetic influences in studies that evaluate the effects of life-history traits on the genetic diversity of plants.
Supplementary Materials: The following are available online at http://www.mdpi.com/1424-2818/11/4/62/s1, Figure S1: (a) Journals with the largest number of papers reporting estimates of genetic diversity derived from cpDNA markers; (b) Variation in the diversity (Shannon-Wiener index) of the journals publishing studies on cpDNA markers over time, Figure S2: (a) The number of publications containing estimates of genetic diversity obtained using cpDNA markers, in relation to the nationality of the corresponding author; (b) The number of publications on genetic diversity based on cpDNA markers, according to the geographic region focused on by the study, Figure S3: Classification of the angiosperm species investigated in the papers that analyzed genetic diversity using cpDNA markers: (a) Life mode; (b) Habitat specialization; (c) Geographic distribution; (d) Reproductive cycle; (e) Type of flower, and (f) Type of pollinator, Table S1: Plant species identified in the publications containing estimates of genetic diversity obtained from the use of cpDNA sequences as molecular markers.