Molecular Data Reveal Multiple Lineages in Piranhas of the Genus Pygocentrus (Teleostei, Characiformes)

Carnivorous piranhas are distributed in four serrasalmid genera including Pygocentrus, which inhabit major river basins of South America. While P. cariba and P. piraya are endemics of the Orinoco and São Francisco basins, respectively, P. nattereri is widely distributed across the Amazonas, Essequibo, lower Paraná, Paraguay, and coastal rivers of northeastern Brazil, with recent records of introductions in Asia. Few studies have focused on the genetic diversity and systematics of Pygocentrus and the putative presence of additional species within P. nattereri has never been the subject of a detailed molecular study. Here we aimed to delimit species of Pygocentrus, test the phylogeographic structure of P. nattereri, and access the origin of introduced specimens of P. nattereri in Asia. Phylogenetic analyses based on a mitochondrial dataset involving maximum-likelihood tree reconstruction, genetic distances, Bayesian analysis, three delimitation approaches, and haplotype analysis corroborate the morphological hypothesis of the occurrence of three species of Pygocentrus. However, we provide here strong evidence that P. nattereri contains at least five phylogeographically-structured lineages in the Amazonas, Guaporé (type locality), Itapecuru, Paraná/Paraguay, and Tocantins/Araguaia river basins. We finally found that the introduced specimens in Asia consistently descend from the lineage of P. nattereri from the main Rio Amazonas. These results contribute to future research aimed to detect morphological variation that may occur in those genetic lineages of Pygocentrus.


Introduction
The Neotropical fish family Serrasalmidae contains 16 genera and 97 valid species [1] of ecomorphologically diverse freshwater fishes popularly known as pacus and piranhas. The species are divided in three main clades being two encompassed by pacus, tambaquis and silverdollars and a third containing mostly carnivorous piranhas [2]. This later clade includes six genera: the four carnivorous genera Pristobrycon Eigenmann, 1915 [2][3][4]. The monophyly of this six-genera group is supported on the basis of both morphological [3,4] and multilocus molecular data [2,5,6].
The genus Pygocentrus includes the largest species of piranhas, reaching up to 50 cm standard length [7], that are highly appreciated in the ornamental trade and have a relative economic importance in regional fisheries and aquaculture [8,9]. Species of Pygocentrus are morphologically distinguished from other serrasalmids by a substantially wider head, a dorsal profile that is moderately to strongly convex, the presence of a preanal spine that is often undetectable externally, tricuspid teeth and the lack of ectopterygoid teeth, except in small juveniles [10,11]. The main synapomorphy of the genus is the presence of crests around the lateral-sensorial system of the frontal, parietal and pterotic bones [12].

Taxon Sampling
Specimens were collected or obtained from fish collections, and morphologically identified by consulting the taxonomic literature and identification keys [10]. Specimens of the three valid species of Pygocentrus plus Serrasalmus elongatus Kner, 1858 as outgroup (root) were included in the analysis (Appendix A; Figure 2). The matrix contained 161 specimens, in which 57 were newly sequenced and 104 were obtained from GenBank (ncbi.nlm.nih.gov/genbank) or BOLD (boldsystems.org) databases (Appendix A). We attempted to obtain samples from all river basins in order to sample intraspecific genetic diversity for each species. We also included available sequences of P. nattereri introduced in the Philippines [20], the only available sequences on GenBank, to identify the original region that served as the source of those introduced specimens. Vouchers were fixed in 95% ethanol or 10% formalin and transferred to 70% ethanol for permanent storage and posteriorly deposited in the Laboratório de Biologia e Genética de Peixes, Universidade Estadual Paulista, Botucatu, Brazil (LBP), and Colección de Zoología, Universidad del Tolima, Ibagué, Colombia (CZUT-IC) (Appendix A).

Taxon Sampling
Specimens were collected or obtained from fish collections, and morphologically identified by consulting the taxonomic literature and identification keys [10]. Specimens of the three valid species of Pygocentrus plus Serrasalmus elongatus Kner, 1858 as outgroup (root) were included in the analysis (Appendix A; Figure 2). The matrix contained 161 specimens, in which 57 were newly sequenced and 104 were obtained from GenBank (ncbi.nlm.nih.gov/genbank) or BOLD (boldsystems.org) databases (Appendix A). We attempted to obtain samples from all river basins in order to sample intraspecific genetic diversity for each species. We also included available sequences of P. nattereri introduced in the Philippines [20], the only available sequences on GenBank, to identify the original region that served as the source of those introduced specimens. Vouchers were fixed in 95% ethanol or 10% formalin and transferred to 70% ethanol for permanent storage and posteriorly deposited in the Laboratório de Biologia e Genética de Peixes, Universidade Estadual Paulista, Botucatu, Brazil (LBP), and Colección de Zoología, Universidad del Tolima, Ibagué, Colombia (CZUT-IC) (Appendix A).

DNA Extraction, Amplification and Sequencing
Tissue samples were taken from livers, gills, fins or muscles. The total DNA was isolated using the Qiagen "DNeasy Blood & Tissue" (Qiagen, Hilden, Germany) kit according to manufacturer's instructions. Partial segments of the COI gene were amplified by PCR using the primers Fish F1 (5'-

DNA Extraction, Amplification and Sequencing
Tissue samples were taken from livers, gills, fins or muscles. The total DNA was isolated using the Qiagen "DNeasy Blood & Tissue" (Qiagen, Hilden, Germany) kit according to manufacturer's instructions. Partial segments of the COI gene were amplified by PCR using the primers Fish F1 (5 -TCAACCAACCACAAAGACATTGGCAC-3 ) and Fish R1 (5 -TAGACTTCTGGGTGGCCAAAG AATCA-3 ) [21]. The PCR was performed on a thermocycler with a final volume of 12 µL containing of 8.175 µL distilled water, 0.5 µL dNTP (8 mM), 1.25 µL 10× Taq buffer (500 mM KCl; 200 mM Tris-HCl), 0.375 µL of MgCl 2 , 0.25 µL of each primer (10 µM) and 0.2 µL of PHT Taq polymerase. PCR conditions consisted of an initial denaturation at 95 • C for 5 min, followed by 35 cycles including denaturation at 95 • C for 45 s, annealing at 52 • C for 45 s and extension at 68 • C for 120 s, and a final extension at 68 • C for 5 min. Amplified products were checked on 1% agarose gel.
Amplicons were then purified with ExoSAP-IT (USB Corporation, Cleveland, OH, USA) following the manufacturer's protocol. The purified product was used as template to sequence both DNA strands. The cycle sequencing reaction was carried out using a BigDye Terminator v3.1 Cycle Sequencing Ready Reaction kit (Applied Biosystems, Austin, TX, USA) in a final volume of 7 µL containing 0.35 µL primer (10 mM), 1.05 µL buffer 5×, 0.7 µL BigDye mix, and 3.9 µL distilled water. The cycle sequencing conditions were initial denaturation at 96 • C for 2 min followed by 30 cycles of denaturation at 96 • C for 45 s, annealing at 50 • C for 60 s, and extension at 60 • C for 4 min. The sequencing products were then purified following the protocol suggested in the BigDye Terminator v3.1 Cycle Sequencing kit's manual (Applied Biosystems). All samples were sequenced on an ABI 3130 Genetic Analyzer (Applied Biosystems) following the manufacturer's instructions.

Species Delimitation Analyses
Sequences were assembled and edited in Geneious 4.8 [22] to obtain a single consensus sequence for each specimen and also to check for deletions, insertions, and stop codons. Then, sequences were aligned with Muscle algorithm [23], and the aligned matrix was tested for saturation in DAMBE v7 [24]. The TN93+I (Tamura-Nei + Invariant sites) was estimated as the best-fit model of nucleotide evolution for our data by PartitionFinder [25] and was used in programs containing such a model. Sequences were binned into groups according to a neighbor-joining tree using TN93 in MEGA X [26]; for example, subgroups of Pygocentrus nattereri were split in five drainage-groups (Amazonas, Guaporé, Itapecuru, Paraná-Paraguay, and Tocantins/Araguaia) to test the prior hypothesis of multiple structured populations. The Amazonas population includes samples from the entire basin, except for Guaporé and Tocantins-Araguaia river basins as determined by the distance analysis. Three approaches of genetic distances were obtained using the TN93 model in MEGA X: the overall mean distance, intraspecific distances, and interspecific distances. The neighbor-joining tree was then generated in MEGA and tested by 1000 bootstrap pseudoreplicates.
We used three distinct species delimitation methods (Poisson Tree Process, Automatic Barcode Gap Discovery, and General Mixed Yule Coalescent Model) for our dataset using either sequence-based estimations or topology-based analyses based on the maximum likelihood (ML) or Bayesian inference. The maximum likelihood (ML) analysis was performed in RAxML v7.2 [27] using the GTR-GAMMA model, a maximum parsimony starting tree, and a posteriori analysis of bootstrap with the autoMRE function [28]. The best ML tree was used as an input tree for the Poisson Tree Process (PTP) model, that delimits species using non-ultrametric trees, since the speciation rate is modeled directly by the number of nucleotide substitutions [29]. The analysis was performed with the PTP webserver (species.h-its.org/ptp) using 100,000 MCMC generations and a 0.1 burn-in rate as the default settings.
Secondly, we performed the Automatic Barcode Gap Discovery (ABGD) analysis, an automatic procedure that sorts sequences into hypothetical species based on the barcode gap [30]. It infers a model-based confidence limit for intraspecific divergence by detecting the barcode gap as the first significant gap beyond this limit and uses it to partition the data. Inference of the limit and gap detection are then recursively applied to previously obtained groups to get finer partitions until there is no further partitioning [30]. The analysis was performed at the ABGD webserver (wwwabi.snv. jussieu.fr/public/abgd/abgdweb.html) with the Kimura (K80; 2.0) distance model with X = 1.0, Pmin = 0.001 and Pmax = 0.05. Finally, we ran the General Mixed Yule Coalescent model (GMYC), a likelihood method that delimits species by fitting within-and between species branching models to reconstructed gene trees [31]. Because GMYC requires no polytomies, DAMBE v7 [24] was used to remove duplicated haplotypes, which improves the algorithm and maximizes computational time analysis. Then, a Bayesian inference of phylogeny was estimated with a relaxed lognormal clock with a speciation birth-death model, on an arbitrary timescale, using BEAST v1.8.4 [32]. The nucleotide evolution model used to estimate the ultrametric tree was TN93+I as estimated by PartitionFinder [25]. A random tree was used as a starting tree for the MCMC searches with two independent runs of 500,000,000 generations, with trees sampled at every 50,000th generation. The distribution of log-likelihood scores was examined to determine the stationary phase for each search and to decide whether extra runs were required to achieve convergence using Tracer v1.7.1 [33]. All sampled topologies beneath the asymptote were discarded as part of a burn-in procedure (10%), and the remaining trees were used to construct a 50% majority-rule consensus tree in TreeAnnotator v1.8.4. The resulting tree was visualized in FigTree v1.4.3, and the resultant topology was implemented in the GMYC analysis. The GMYC delimitation analysis was performed at the webserver (species.h-its.org/gmyc) with a single threshold method and other parameters set as default.
We also used DnaSP v5 [34] to estimate the number of polymorphic sites, haplotype number and haplotype/nucleotide diversity (H D /Pi) and used PopART v1.7 [35] to run a median-joining analysis [36] and obtain a haplotype network. Finally, we used a PhyloMap-PTP tool [37] available in the PTP webserver that combines Principal Coordinates Analysis (PCoA), PTP, and species tree mapping. These approaches were applied to understand the spatial distribution of haplotypes and how they are related to each other.

Results
Newly generated sequences were obtained from 57 specimens in addition to 104 sequences obtained from public databases, resulting in a final matrix with 161 sequences. Sequences are deposited in BOLD PYGO001-18-048-18 and PYGO049-19-057-19. Stop codons, deletions or insertions were absent in all sequences. Following alignment and editing, the final matrix has 522 bp of which 476 bp were conserved (91.2%) and 46 were variable, with 22.6% adenine, 31.8% cytosine, 27.9% thymine and 17.8% guanine. DAMBE revealed Iss values lower than Iss.cAsym and Iss.cSym values, which mean the lack of a saturation signal in the matrix. The dataset contains a total of 12 haplotypes (Pi = 12.157; H D = 0.835): one haplotype of Serrasalmus as root and 11 haplotypes of Pygocentrus. Pygocentrus cariba presented two haplotypes, P. piraya presented four haplotypes, and P. nattereri presented six haplotypes. Within P. nattereri, each sample from Amazonas, Guaporé, Itapecuru, Paraná/Paraguay and Tocantins/Araguaia presented exclusive haplotypes.
The genetic distance analysis recognizes the three morphologically-defined species of Pygocentrus with 0.059 ± 0.010 of distance between P. cariba and P. piraya, 0.055 ± 0.010 between P. cariba and P. nattereri, and 0.026 ± 0.006 between P. piraya and P. nattereri. Subgroups of P. nattereri presented genetic distances ranging from 0.005 ± 0.003 between Guaporé and Paraná/Paraguay to 0.017 ± 0.005 between Itapecuru and Tocantins/Araguaia and Itapecuru and Guaporé (Table 1). Results also reveal low intraspecific genetic variation within each lineage (0.000-0.003) ( Table 1).   Figure S2) and the Bayesian tree ( Figure S3) recognized each of the three previously recognized species of Pygocentrus and also indicates a clear segmentation of lineages in P. nattereri ( Figure 3). The PTP method returned well-defined lineages for P. cariba and P. piraya and splitted P. nattereri in five distinct lineages from Amazonas, Guaporé, Itapecuru, Paraná/Paraguay, and Tocantins/Araguaia. The ABGD method resulted in eight partitions that ranged from 11 (p = 0.001) to two lineages (p = 0.02), with three partitions supporting the presence of seven lineages of Pygocentrus (p = 0.002-0.005), that is P. cariba, P. piraya, and P. nattereri subdivided in five subgroups: Amazonas, Guaporé, Itapecuru, Paraná/Paraguay and Tocantins/Araguaia. The GMYC oversplitted Pygocentrus in 16 lineages, two for P. cariba, five for P. piraya and eight for P. nattereri (three in the Amazonas, two in the Tocantins/Araguaia, and one for each Guaporé, Itapecuru, and Paraná/Paraguay). The threshold time obtained in the GMYC analysis was −1.14 × 10 −4 T, where T is the time from present to the time of the root.
Additionally, we included seven sequences of introduced specimens of Pygocentrus nattereri in the Philippines [20] to determine the source of parental specimens that were originally from South America. All topologies evidenced that they are genetically proximate to the Amazonas group ( Figures S1 and S2). The sequences of specimens from Philippines (FCOD numbers) do not have any nucleotide substitution when compared to those collected in the Amazonas drainages (i.e., 0.000 genetic distance). This evidence indicates that the introduced specimens were obtained from somewhere in the Amazonas basin other than in the Guaporé or Tocantins/Araguaia or any other South American drainage. Haplotype network and PhyloMap-PTP approaches allow the visualization of the distribution and relationships of each haplotype (Figure 4). Genes 2019, 10 8   sequences of specimens from Philippines (FCOD numbers) do not have any nucleotide substitution when compared to those collected in the Amazonas drainages (i.e., 0.000 genetic distance). This evidence indicates that the introduced specimens were obtained from somewhere in the Amazonas basin other than in the Guaporé or Tocantins/Araguaia or any other South American drainage. Haplotype network and PhyloMap-PTP approaches allow the visualization of the distribution and relationships of each haplotype (Figure 4).

Discussion
Species delimitation results support the recognition of the two species Pygocentrus cariba (Río Orinoco) and P. piraya (Rio São Francisco), and reveal the presence of five genetic lineages within the widely distributed P. nattereri. The three methods (PTP, ABGD and GMYC) split P. nattereri into five lineages: Amazonas, Guaporé, Itapecuru, Paraná/Paraguay, and Tocantins/Araguaia, and with

Discussion
Species delimitation results support the recognition of the two species Pygocentrus cariba (Río Orinoco) and P. piraya (Rio São Francisco), and reveal the presence of five genetic lineages within the widely distributed P. nattereri. The three methods (PTP, ABGD and GMYC) split P. nattereri into five lineages: Amazonas, Guaporé, Itapecuru, Paraná/Paraguay, and Tocantins/Araguaia, and with GMYC splitting P. cariba and P. piraya in two and five entities in the Orinoco and São Francisco basins, respectively. After the examination of voucher specimens using traditional morphometric/meristic data for Serrasalmidae [38], we did not identify morphological variation or diagnoses to formally describe these genetic lineages (or potential species). Thus, we recognize the three current species of Pygocentrus and the presence of five structured populations of P. nattereri in South America. These lineages can be potentially sibling species sensu Mayr [39], representing the herein named P. nattereri species complex. Sibling species represent a special case of cryptic species, when they are closest relatives and are not distinguished from one another, taxonomically [39,40]. Similarly, recent studies have been revealed several examples of cryptic species in Neotropical freshwater fish, mostly due to advances in molecular systematics and integrative taxonomy [41][42][43].
Pygocentrus nattereri is the most abundant and widely distributed species of Pygocentrus and, accordingly, has controversial species boundaries and carries a history of doubts about its diagnostic features, validity and taxonomic status. Fink [10] performed a revision of Pygocentrus and could not find any exclusive character supporting its species status, despite analyzing P. nattereri from all drainages. However, he delimited P. nattereri by the combination of characters such as absence of humeral blotch in adults and lack of rays in the adipose fin. Type specimens of P. nattereri were assigned to rio Guaporé of the rio Madeira basin [10] and two names currently in synonym of P. nattereri are available for Pygocentrus: P. altus from the upper Rio Amazonas that could be applied for the Amazonas lineage, and P. ternetzi from the Rio Paraguay that could be applied for the Paraná/Paraguay lineage. However, we consider prematurely revalidating those species without a taxonomic revision, taking into account our strong molecular evidence for the occurrence of additional lineages within the present concept of P. nattereri.
Our results agree with the most recent barcoding study of the family Serrasalmidae that included all species of Pygocentrus [15] and recognized both P. cariba and P. piraya as two species, with segmentation of P. nattereri in multiple lineages depending on the delimitation approach. The authors [15] found two well-defined lineages of P. nattereri (Tocantins/Araguaia lineage, and Branco/Madeira/Tapajós lineage) with GMYC recognizing a third lineage from the Rio Guaporé (Madeira basin). Our results indicate those same clusters and added two additional ones: the Itapecuru and Paraná/Paraguay lineages ( Figure 3). Present data also support the previous phylogeographic hypothesis that P. nattereri contains structured populations along the wide continental distribution [10,13] and also delimit each genetic lineage along the distribution of the species. It is noteworthy that additional samples from Guianas and other remote regions of Amazonia can be added to our dataset to further delimit P. nattereri.
Results presented herein indicate a very low genetic variation among most species of Pygocentrus, evident in P. piraya and within the P. nattereri complex, as exemplified by the low genetic distance values (Table 1) and the presence of few haplotypes even including species from a broad geographic expanse ( Figure 4). For example, we identified an exclusive haplotype that is shared between specimens of P. nattereri collected in the Rio Solimões at Brazil/Colombia boundary and from Amapá lakes at the northern Amazonas estuary. Pygocentrus cariba presents the highest genetic distance values among Pygocentrus species, even more than S. elongatus with other Pygocentrus. In fact, Hubert et al. [13] found a rapid speciation between P. cariba and the ancestor of P. nattereri and P. piraya less than one million year after the split between Pygocentrus and Serrasalmus (~8.73 Ma vs. 8.0 Ma). On the other hand, the cladogenetic events leading to P. nattereri and P. piraya were much more recent at around 2.63 ± 0.2 Ma, the split of P. nattereri from the Amazonas and that from the upper Paraguay at around 1.8 Ma and that from the Paraná at about 1.77 ± 0.3 Ma, and the differentiation of the lineages from the upper Amazonas (Ucayali and Madeira) at around 0.79 ± 0.1 Ma, which suggest a rapid and relatively recent differentiation of P. nattereri and P. piraya lineages. Accordingly, Machado et al. [15] found P. cariba to be the first species to diverge from any other species of Pygocentrus or Serrasalmus.
Species of Pygocentrus are widely introduced outside their native ranges and the environment impacts are specially related to predation of native species and damage of fishing nets and other fishes [44,45]. Herein, sequences of Pygocentrus introduced in the Philippines [20] were included in the analyses and results indicate that they belong to the Amazonas lineage. The effects of an invasion can be both observed on single or small groups of species or through an entire ecosystem; impacts such as predation, herbivory, parasitism, disease, competition and hybridization led to extirpation or reduction of the local population, or even causing global extinction of native species [46]. The recognition of the invasive species is the first step towards the investigation and management actions that may follow, such as eradication, maintenance management and control of population density [46]. Since the effects of introduction of these specimens may lead to ecological damage (e.g., competition for food, space and spawning sites), the accurate information about origin of introduced specimens of P. nattereri might contribute for future local management purposes.
Morphological characters are traditionally used to discriminate Serrasalmidae species despite allometry and body coloration being highly variable during ontogeny, thus strongly affecting accurate species identifications [10]. The combination of morphological and molecular approaches appears to be a good point to study interspecific variation and, indeed, has helped to identify, discriminate and describe species of other serrasalmid genera. For example, the study including three recognized species of Mylossoma indicated five genetic lineages instead [47], with two species resurrected and redescribed afterwards (M. albiscopum and M. unimaculatum; [48]). In a similar vein, Andrade et al. [49] recognized the seventh species of Tometes by integrating both morphological and mitochondrial data. The results presented herein integrate these two studies and expand the promising field of integrative taxonomy of Serrasalmidae. Together, these studies indicate the need for deep revisions of species and genera of Serrasalmidae, involving both genetic and morphological data to determine the presence of potential undescribed species and to reassign species among genera. In this context, further research can address additional morphological characters in order to test our molecular hypothesis of the presence of seven genetic lineages of Pygocentrus in South America that can be potentially be recognized as valid species.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/10/5/371/s1, Figure S1: Neighbor joining tree based on the cytochrome c oxidase I gene for Pygocentrus species. Figure S2: Maximum likelihood tree based on the cytochrome c oxidase I gene for Pygocentrus species Figure S3: Bayesian inference tree based on the cytochrome c oxidase I gene for Pygocentrus species.

Acknowledgments:
The authors are grateful to Francisco Villa-Navarro (CZUT-IC) for the loan of important tissues, to Alec Zeinad for live images of Pygocentrus nattereri and P. piraya, and to Rafaela P. Ota for several discussions and comments on earlier versions of this paper.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results. Table A1. Voucher, locality information and BOLD or Genbank accession numbers of the analyzed specimens of Pygocentrus.