Phylogenetic and Haplotype Network Analyses of Diaporthe eres Species in China Based on Sequences of Multiple Loci

Simple Summary
Diaporthe eres is one of the most serious plant pathogenic fungi and affects many economically important plants. It can cause rootstock death, stem canker, stem necrosis, dead branch, shoot blight, fruit rot, leaf spot, leaf necrosis, and umbel browning. In general, morphological and molecular characterization using sequences of multiple loci is used to identify Diaporthe species. However, morphology varies with culture conditions, and the taxonomy of Diaporthe species is unclear because phylogenies based on different genes give different tree topologies. In this study, we evaluate the phylogenetic relationships and population diversity among D. eres and other Diaporthe species. Our results showed that phylogenetic analyses from concatenated multi-locus DNA sequence data could resolve the D. eres species. Furthermore, haplotype network analysis showed that no correlation existed between population diversity and distribution or hosts across China. These results could improve our understanding of the epidemiology of D. eres and provide useful information for effective disease management.

Abstract
Diaporthe eres is considered one of the most important causal agents of many plant diseases, with a broad host range worldwide. In this study, sequences of the ribosomal internal transcribed spacer region (ITS), translation elongation factor 1-α gene (EF1-α), beta-tubulin gene (TUB2), calmodulin gene (CAL), and histone-3 gene (HIS) were used for multi-locus phylogenetic analysis. Maximum likelihood (ML), maximum parsimony (MP), and Bayesian inference (BI) approaches were used to investigate the relationships of D. eres with closely related species. The results strongly support that the D. eres species falls into a monophyletic lineage, with the characteristics of a species complex.
Phylogenetic informativeness (PI) analysis showed that clear boundaries could be proposed by using EF1-α, whereas ITS showed an ineffective reconstruction and, thus, was unsuitable for delimiting species boundaries in Diaporthe. A combined dataset of EF1-α, CAL, TUB2, and HIS showed strong resolution for Diaporthe species, providing insights for the D. eres complex. Accordingly, besides D. biguttusis, D. camptothecicola, D. castaneae-mollissimae, D. cotoneastri, D. ellipicola, D. longicicola, D. mahothocarpus, D. momicola, D. nobilis, and Phomopsis fukushii, which were previously considered synonyms of D. eres, another three species, D. henanensis, D. lonicerae and D. rosicola, were further revealed to be synonyms of D. eres in this study. To assess the genetic diversity of D. eres in China, 138 D. eres isolates were randomly selected from previous studies in 16 provinces. These isolates were obtained from different major plant species from 2006 to 2020. The genetic distance was estimated with phylogenetic analysis and haplotype networks, and it was revealed that two major haplotypes existed in the Chinese populations of D. eres. The haplotype networks were widely dispersed and not uniquely correlated to specific populations. Overall, our analyses evaluated the phylogenetic identification for D. eres species and demonstrated the population diversity of D. eres in China.

Genealogical Concordance Phylogenetic Species Recognition (GCPSR) [42] represents an enhanced tool for species delimitation in the Diaporthe genus compared to morphological and biological identification [14,43]. The species concept in D. eres has progressed greatly since concatenated multigene genealogies under GCPSR came into use. However, other processes, e.g., incomplete lineage sorting, recombination, and horizontal gene transfer, can cause discordance between gene trees and species trees and mask the true evolutionary relationships among closely related taxa [44]. Furthermore, the regular approach of concatenating sequence data from multiple loci under GCPSR can lead to inconsistency and poor species discrimination [45].
Therefore, the objectives of the study are to (i) employ different delimitation methods based on a genomic DNA sequence database to interpret species boundaries and to facilitate further species identification for D. eres, (ii) investigate Chinese populations of the D. eres species and characterize the relationship between the populations and their distributions based on sequences of multiple loci, and (iii) reconstruct phylogeny and explore the evolution of D. eres with the newly updated Chinese population.

D. eres and Related Species Isolates Used
Thirty-seven species, including the D. eres species complex and closely related species, were used in phylogenetic analyses. These species were originally collected in Australia, Canada, China, France, Germany, Italy, Japan, Korea, Netherlands, South Africa, Suriname, Thailand, UK, USA, and Yugoslavia, and their corresponding DNA sequences were downloaded from NCBI's GenBank nucleotide database (www.ncbi.nlm.nih.gov (accessed on 7 April 2020)) (Table 1).

Sequence Alignment and Phylogenetic Analyses
Multi-locus phylogenetic analyses were conducted to identify isolates to species level using assembled DNA sequences of five loci. Consensus sequences were assembled, with minor manual edits, in the DNASTAR Lasergene Core Suite software program (SeqMan v.7.1.0; DNASTAR Inc., Madison, WI, USA). Sequence alignments and comparisons of assembled sequences were performed using the L-INS-i algorithm on the MAFFT alignment online server v.7.467 [54]. The aligned sequences were checked and manually adjusted in BioEdit v.7.2.5 [55] and converted to suitable formats (PHYLIP and NEXUS) using the Alignment Transformation Environment (ALTER) online server [56]. The resulting DNA sequences, containing all five loci, were deposited at TreeBASE (submission number: 26697). Maximum likelihood (ML) phylogenetic trees were constructed using RAxML-HPC BlackBox v.8.2.10 [57], available in the CIPRES Science Gateway v.3.3 Web Portal [58], with 1000 bootstrap replications. The general time-reversible model of evolution, including the estimation of invariable sites (GTRGAMMA + I), was applied in the ML analysis. Maximum parsimony (MP) analysis was performed with 1000 replicates using Phylogenetic Analyses Using Parsimony (PAUP*) v.4.0b10 [59]. Goodness-of-fit and bootstrap values were calculated, including tree length (TL), the consistency index (CI), the retention index (RI), the rescaled consistency index (RC), and the homoplasy index (HI). A heuristic search was carried out with 1000 random stepwise addition replicates using the tree bisection-reconnection (TBR) branch-swapping algorithm on "best trees". Gaps were treated as missing data, and all characters were weighted equally. Bootstrap support values (BS) were used to assess the robustness of the ML and MP analyses (MLBS and MPBS); only branches with MLBS and MPBS over 70% were considered for ML/MP phylogenetic inference.
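The parsimony statistics listed above follow standard textbook definitions, which can be sketched as below. This is a minimal illustration and not the PAUP* implementation; the function name and the step counts in the usage note are hypothetical. Here m is the minimum conceivable number of steps summed over characters, s is the observed tree length (TL), and g is the maximum conceivable number of steps.

```python
from dataclasses import dataclass


@dataclass
class ParsimonyIndices:
    ci: float  # consistency index, m / s
    hi: float  # homoplasy index, 1 - CI
    ri: float  # retention index, (g - s) / (g - m)
    rc: float  # rescaled consistency index, CI * RI


def parsimony_indices(min_steps: int, tree_length: int, max_steps: int) -> ParsimonyIndices:
    """Compute CI, HI, RI, and RC from summed step counts.

    min_steps   (m): fewest steps possible on any tree
    tree_length (s): steps observed on the tree being scored (TL)
    max_steps   (g): most steps possible on any tree
    """
    if not (0 < min_steps <= tree_length <= max_steps):
        raise ValueError("expected 0 < min_steps <= tree_length <= max_steps")
    if max_steps == min_steps:
        # RI is undefined when no character can show homoplasy
        raise ValueError("max_steps must exceed min_steps")
    ci = min_steps / tree_length
    ri = (max_steps - tree_length) / (max_steps - min_steps)
    return ParsimonyIndices(ci=ci, hi=1.0 - ci, ri=ri, rc=ci * ri)
```

With made-up counts, `parsimony_indices(100, 200, 500)` gives CI = 0.5, HI = 0.5, RI = 0.75, and RC = 0.375.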
Posterior probability values (PP) were calculated by Markov Chain Monte Carlo (MCMC) sampling in MrBayes v.3.2.2 [60]. The best-fit model of nucleotide substitution was determined with the corrected Akaike information criterion (AIC) in MrModeltest v.2.3 [61] (Table S2). For BI analysis, four MCMC chains were run simultaneously, starting from random trees, for 10^5 generations, and trees were sampled every 100th generation. The BI analysis was stopped when the average standard deviation of split frequencies fell below 0.01. The first 10% of the resulting BI trees, representing the burn-in phase as judged by inspecting likelihoods and parameters in Tracer v.1.7.1 [62], were discarded, and the remaining 9000 trees were used to calculate the posterior probabilities (PP) in the majority rule consensus tree. Bayesian posterior probability values (BIPP) over 0.95 were considered for BI trees, and all trees were rooted with D. citri (CBS 135422).
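The support cutoffs stated above (MLBS and MPBS over 70%, BIPP over 0.95) can be applied to a branch as in the following sketch. The "MLBS/MPBS/BIPP" label layout with "-" for values at or below the cutoff is a common annotation convention that is assumed here, not something the source specifies, and the function name is hypothetical.

```python
from typing import Optional


def support_label(
    mlbs: Optional[float],
    mpbs: Optional[float],
    bipp: Optional[float],
    bs_cutoff: float = 70.0,
    pp_cutoff: float = 0.95,
) -> str:
    """Format branch support as 'MLBS/MPBS/BIPP', masking weak values.

    Bootstrap values are percentages; BIPP is a probability in [0, 1].
    A value is shown only if it is strictly greater than its cutoff.
    """

    def fmt(value: Optional[float], cutoff: float, digits: int) -> str:
        if value is None or value <= cutoff:
            return "-"
        return f"{value:.{digits}f}"

    return "/".join(
        [fmt(mlbs, bs_cutoff, 0), fmt(mpbs, bs_cutoff, 0), fmt(bipp, pp_cutoff, 2)]
    )
```

For example, a branch with MLBS 85, MPBS 60, and BIPP 0.99 would be labeled `85/-/0.99` under these cutoffs.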

Genealogical Concordance Phylogenetic Species Recognition (GCPSR) Analysis
In this study, species boundaries were determined using genealogical concordance phylogenetic species recognition (GCPSR), as described in previous studies, in SplitsTree4 v.4.14.6 (www.splitstree.org (accessed on 26 September 2017)) [42,63,64]. Multi-locus concatenated sequence data, with EF1-α, TUB2, CAL, and HIS, were used to determine the recombination level within phylogenetically closely related species. In addition, the results of relationships between closely related species were visualized by constructing neighbor-joining (NJ) graphs.

Phylogenetic Informativeness Analysis
Phylogenetic informativeness (PI) was analyzed from taxonomically authenticated species and type-strains based on the multi-locus combined dataset of ITS, EF1-α, TUB2, CAL, and HIS. Twenty-eight representative isolates (from 23 species, including an outgroup) with a close relationship to the D. eres species complex based on phylogenetic analysis were selected to determine the profiling of phylogenetic informativeness [65]. Ultrametric trees were generated from the concatenated alignment dataset using maximum likelihood (ML) phylogenetic analysis, as described above. To estimate phylogenetic informativeness (phylogenetic informativeness per site (PI per site) and net phylogenetic informativeness (Net PI)), the corresponding partitioned alignment was harvested from the PhyDesign Web Portal at http://phydesign.townsend.yale.edu/ (accessed on 22 April 2020) [66].
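For orientation, the two quantities reported here can be sketched from Townsend's per-site informativeness formula, ρ(T; λ) = 16 λ² T exp(-4λT), where λ is the substitution rate of a site and T is the depth in the tree being probed. The sketch below assumes site rates are already known; estimating them from the alignment and ultrametric tree, as PhyDesign does, is not attempted, and the function names are hypothetical.

```python
import math
from typing import Sequence


def site_informativeness(rate: float, t: float) -> float:
    """Informativeness of one site with substitution rate `rate` at depth `t`."""
    return 16.0 * rate ** 2 * t * math.exp(-4.0 * rate * t)


def net_informativeness(rates: Sequence[float], t: float) -> float:
    """Net PI of a locus: the sum of per-site profiles at depth `t`."""
    return sum(site_informativeness(r, t) for r in rates)


def per_site_informativeness(rates: Sequence[float], t: float) -> float:
    """PI per site: net PI divided by the number of sites in the locus."""
    if not rates:
        raise ValueError("at least one site rate is required")
    return net_informativeness(rates, t) / len(rates)
```

Each site's profile peaks at T = 1/(4λ), so fast-evolving sites are most informative near the tips and slow sites deeper in the tree. Net PI favors longer loci, while PI per site allows loci of different lengths to be compared.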

Population Aggregation and Haplotype Network Analysis
To confirm the D. eres species, 138 taxa (Table S1), along with 51 taxa (Table 1), were reconstructed using multi-locus sequences of EF1-α, TUB2, CAL, and HIS (Figure S17). To analyze the genetic diversity of D. eres populations, the isolates included in the phylogenetic analyses were used. In brief, each locus was analyzed individually, and the alignment and comparison of assembled sequences were performed using ClustalX v.2.0.11 [67]. Gaps were treated as missing data for each locus, and the partial sequences at the 5′ and 3′ ends were trimmed from the dataset. All population genetic parameters, including the number of polymorphic (segregating) sites (S), Nei's nucleotide diversity (π), haplotype numbers (Hap), haplotype diversity (Hd), nucleotide diversity from S (θw), and neutrality statistics, such as Tajima's D, Fu and Li's D, and Fu's Fs, were calculated using DnaSP v.6.11.01 [68] for each individual locus and for the combined loci. Relationships among the haplotypes were then depicted with the median-joining (MJ) method in Population Analysis with Reticulate Trees (PopART: http://popart.otago.ac.nz/index.shtml (accessed on 1 June 2020)) [69].
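The summary statistics named above have simple definitions, illustrated in the sketch below for a small alignment. It is not DnaSP's implementation: it assumes complete deletion of any column containing a gap or ambiguous base (one way of treating gaps as missing data), reports π and θw per site, and uses a hypothetical function name.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List

MISSING = set("-?N")  # characters treated as missing data


def diversity_summary(seqs: List[str]) -> Dict[str, float]:
    """Return S, Hap, Hd, pi, and theta_w for aligned sequences."""
    n = len(seqs)
    if n < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two aligned sequences of equal length")
    seqs = [s.upper() for s in seqs]
    # Complete deletion: drop every column with a missing character
    columns = [col for col in zip(*seqs) if not (set(col) & MISSING)]
    if not columns:
        raise ValueError("no complete columns left after removing missing data")
    cleaned = ["".join(chars) for chars in zip(*columns)]
    n_sites = len(columns)

    seg_sites = sum(1 for col in columns if len(set(col)) > 1)  # S
    hap_counts = Counter(cleaned)                               # haplotypes
    # Nei's haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in hap_counts.values()))
    # Mean number of pairwise differences, then scaled per site for pi
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(cleaned, 2)]
    mean_diff = sum(diffs) / len(diffs)
    # Watterson's estimator: S / a1, with a1 = sum_{i=1}^{n-1} 1/i
    a1 = sum(1.0 / i for i in range(1, n))
    return {
        "n_sites": n_sites,
        "S": seg_sites,
        "Hap": len(hap_counts),
        "Hd": hd,
        "pi": mean_diff / n_sites,
        "theta_w": seg_sites / a1 / n_sites,
    }
```

For the toy alignment `["ACGT-A", "ACGTAA", "ACTTAA", "ATTTAA"]`, the gapped column is dropped, leaving 5 sites with S = 2, Hap = 3, Hd = 5/6, and π = 7/30.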

Phylogenetic Analysis of D. eres
In this study, the concatenated DNA sequences of five loci (ITS, EF1-α, CAL, TUB2, and HIS) from 216 sequences, including 5 outgroup sequences of D. citri, were used to infer delimitation of Diaporthe species. For the reconstruction of phylogenetic trees of Diaporthe species, altogether, 51 sequences of ITS, 47 sequences of EF1-α, 36 sequences of CAL, 47 sequences of TUB2, and 35 sequences of HIS were obtained from the GenBank database. Sequences of ITS, EF1-α, CAL, TUB2, and HIS were determined as 598, 592, 542, 828, and 502 base pairs (bp), respectively. For species delimitation of the D. eres complex, 50 taxa were analyzed, with 2464 bp assembled sequences of 4 genes, including 592 bp (1-592) of EF1-α, 542 bp (593-1134) of CAL, 828 bp (1135-1962) of TUB2, and 502 bp (1963-2464) of HIS, respectively. For the 5 loci combined sequences dataset with ITS region, we filled in the end of the four-gene dataset with 598 bp (2465-3062) of ITS. ML, MP, and BI analyses were used to perform phylogenetic reconstruction for individual and combined datasets; results showed similar topology and few differences in statistical support values. A comparison of alignment properties in parsimony analyses of individual and combined loci used in phylogenetic analyses is provided in Table 2.
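The partition coordinates quoted above follow directly from the locus lengths and their order in the concatenated alignment. A minimal sketch that reproduces them is shown below; the function and constant names are illustrative only.

```python
from typing import Dict, List, Tuple


def partition_ranges(loci: List[Tuple[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Map each locus to its 1-based, inclusive (start, end) in the concatenation."""
    ranges: Dict[str, Tuple[int, int]] = {}
    start = 1
    for name, length in loci:
        end = start + length - 1
        ranges[name] = (start, end)
        start = end + 1
    return ranges


# Locus order and lengths (bp) as reported in the text
FOUR_LOCI = [("EF1-α", 592), ("CAL", 542), ("TUB2", 828), ("HIS", 502)]
FIVE_LOCI = FOUR_LOCI + [("ITS", 598)]  # ITS appended after the four-gene block
```

Calling `partition_ranges(FIVE_LOCI)` yields 1-592 for EF1-α, 593-1134 for CAL, 1135-1962 for TUB2, 1963-2464 for HIS, and 2465-3062 for ITS, matching the ranges given in the text. The same ranges are what a partition file for a partitioned analysis would list.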
Phylogenetic analyses for Diaporthe species were performed using each individual locus and combined loci of DNA sequences (Figures S1-S15). Among them, phylogenetic trees, using EF1-α, EF1-α+CAL, and EF1-α+CAL+TUB2+HIS, showed clear delimitation for D. eres species, while unclear delimitation was observed using five loci sequences of EF1-α+CAL+TUB2+HIS+ITS (Figure 1). It was found that the three-loci combined dataset of EF1-α+CAL+TUB2 or EF1-α+CAL+HIS, with ML, MP, and BI analyses, was unable to separate D. bicincta (Figures S12 and S13). Overall, the four-loci combined dataset of EF1-α+CAL+TUB2+HIS showed the highest reliability to identify and resolve species boundaries in the D. eres complex (Figures 1 and 2).
The Bayesian inference phylogenetic tree of the D. eres species complex and closely related species, constructed with combined DNA sequences, is presented in Figure 2 as an example. In the phylogenetic tree, the D. eres species complex cluster was clearly separated from another cluster that included D. collariana, D. heterophyllae, D. virgiliae, D. penetriteum, D. infertilis, D. sambucusii, and D. shennongjiaensis. Within the D. eres species complex cluster, a subcluster included D. eres and 13 species with other names, which should be considered synonyms of D. eres. The neighboring subcluster, which included D. helicis (CBS 138596), D. pulla (CBS 338.89), D. phragmitis (CBS 138897), and D. celeris (CBS 143349), was relatively distant from D. eres, indicating that they are different species (Figure 2A). The D. eres species was further analyzed by GCPSR analysis. The NJ tree shows the relationship between Cluster I and Cluster II (Figure 2B), indicating that genetic diversity is rich in this species.

Phylogenetic Informativeness Analysis
For phylogenetic informativeness analysis, only taxa with complete sequences for all 5 loci were used. As a result, the assembled DNA sequences were 3049 bp, including 544 bp of CAL. Phylogenetic informativeness (PI) profiles, both Net PI and PI per site, indicated that EF1-α and CAL were the most informative loci for resolving the phylogenetic signal at the taxonomic level. Next were HIS and TUB2, which remained fairly informative (Figure 3 and Figure S16). ITS presented the lowest PI signal among the selected loci and was unreliable for the delimitation of the D. eres species. The combined dataset of four loci (EF1-α, TUB2, CAL, and HIS) showed better delimitation for D. eres compared to the dataset of five loci (Figure 3), further confirming that the ITS locus was poorly informative.

D. eres Species Boundaries
Based on the phylogenetic analyses using multi-locus reconstruction (EF1-α, TUB2, CAL, and HIS), the species delimitation was determined among D. eres and closely related species (Figures 1 and 2). The detailed description and illustrations for these species can be found in previous reports [73-75].

Population Aggregation and Haplotype Network Analysis
The ITS of 137 taxa, EF1-α of 132 taxa, TUB2 of 137 taxa, CAL of 118 taxa, HIS of 70 taxa, and the combined sequences of 61 taxa were 450, 278, 355, 492, 421, 1429 (four loci without ITS), and 1882 (five loci) bp in length, respectively. The analysis of genetic diversity within D. eres showed high haplotype numbers (Hap) and high haplotype diversity (Hd). The summary of sequence variation and indices of sequence variation within the five loci among D. eres are shown in Table 3 and Table S3. The haplotype diversity values of ITS, EF1-α, TUB2, CAL, HIS, and the combined datasets were greater than 0.5, reflecting high genetic diversity. The neutrality statistics (Tajima's D and Fu's Fs) were negative, suggesting population expansion in D. eres isolates. The phylogenetic tree (Figure 4A) and the median-joining haplotype network (Figure 4B), both based on the combined dataset of EF1-α, TUB2, CAL, and HIS, gave similar results. The isolates were grouped into two clusters that were not correlated with specific populations or geographic distribution (Figure 5).
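To show why a negative Tajima's D is read this way, the statistic can be sketched from its standard definition. It compares the mean number of pairwise differences with S/a1, and an excess of rare variants, as expected after expansion, pushes it below zero. The function below is an illustration with hypothetical names and toy inputs, not the DnaSP code, and it does not test significance.

```python
import math


def tajimas_d(n: int, seg_sites: int, mean_pairwise_diff: float) -> float:
    """Tajima's D from sample size, segregating sites S, and mean pairwise differences."""
    if n < 4:
        raise ValueError("at least four sequences are required")
    if seg_sites < 1:
        raise ValueError("D is undefined when there are no segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two diversity estimates, scaled by its standard deviation
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (mean_pairwise_diff - seg_sites / a1) / math.sqrt(variance)
```

As a toy check with four sequences and four segregating sites: if every variant is a singleton carried by one sequence, the mean pairwise difference is 2.0 and D is negative; if the variants split the sample two against two, the mean pairwise difference is 16/6 and D is positive.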

It should be noted that median-joining haplotype network analysis was also performed for each individual locus using DnaSP v.6.11.01. The haplotype numbers from EF1-α, CAL, TUB2, HIS, and ITS were 29, 49, 25, 23, and 9, respectively. Analysis with the CAL locus showed two small distinct clusters: one consisted of hap 1, 18, and 21 from BJ, GS, HN, HUB, JL, LN, and SD, and another consisted of hap 5, 9, and 10 from CQ, HUB, JX, and YN. HUB isolates could be found in both clusters (Figure 5). Similarly, the analysis of the TUB2 locus also showed two small clusters. We found that the haplotypes connecting Cluster I (hap 11) and Cluster II (hap 13, 21, and 26) were from the central part of China, i.e., HEB, HN, HUB, and JX. Analysis of the EF1-α, HIS, and ITS loci showed a wide distribution with no consistent splitting by geographic distribution. The median-joining haplotype networks for each individual locus are shown in Figure S18.
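The observation that HUB isolates occur in both CAL clusters amounts to a set intersection over the province codes listed above. A minimal sketch follows; the cluster labels and the function name are placeholders introduced for illustration, while the province codes are those given in the text.

```python
from typing import Dict, Set

# Provinces of origin for the two small CAL clusters described in the text
cal_clusters: Dict[str, Set[str]] = {
    "cluster_1": {"BJ", "GS", "HN", "HUB", "JL", "LN", "SD"},  # hap 1, 18, 21
    "cluster_2": {"CQ", "HUB", "JX", "YN"},                    # hap 5, 9, 10
}


def provinces_in_every_cluster(clusters: Dict[str, Set[str]]) -> Set[str]:
    """Return the provinces that contribute isolates to all clusters."""
    groups = list(clusters.values())
    if not groups:
        return set()
    return set.intersection(*groups)
```

For the CAL data above, `provinces_in_every_cluster(cal_clusters)` returns only HUB, which is the overlap the text points out.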

Discussion
In this study, we used DNA sequences of five loci to understand and interpret the species boundaries of the D. eres species complex and to assess the genetic diversity of D. eres populations in China. A total of 51 taxa, covering 37 species in or close to the D. eres species complex, were used to refine the delimitation of phylogenetic species under GCPSR, while 138 D. eres isolates from various Chinese populations were examined to assess the relationship between genetic diversity and geographic distribution.
Recently, the classification of Diaporthe species has become more dependent on a molecular approach rather than traditional morphological characterization [72,84,85]. Next-generation sequencing (NGS) technology, such as DNA barcoding, is highly efficient, more accurate, and, thus, valid for fungal identification at the species level [86,87]. The ITS sequence is commonly used for preliminary fungal identification and is recommended for identifying species boundaries in the genus Diaporthe, Diaporthaceae, and Sordariomycetes [5,77,88,89]. However, there are many intraspecific variations in the ITS locus of certain Diaporthe species. Sometimes the intraspecific variation is even greater than the interspecific variation, which makes it difficult to identify Diaporthe species using the ITS sequence alone [90,91].
The identification of Diaporthe species based on morphological characterization is often contradictory, and a molecular approach using DNA sequences should be combined with it to identify species within this genus [46,47]. To redefine the boundaries of Diaporthe species, Santos et al. [47] proposed highly effective phylogenetic reconstruction using DNA barcoding sequences of multiple loci, i.e., ITS, EF1-α, TUB2, CAL, and HIS. The taxonomy of Diaporthe is complex, and many Diaporthe spp. are classified based on different criteria, according to host associations, morphological characteristics [12,92-94], or sequences of the ITS region [5,92,95]. It is suggested that only the type strains whose identification has been widely recognized should be accepted as references for the taxonomy of this genus [46,96,97]. In this study, several isolates, including type strains from previous publications, were selected as references for phylogenetic analysis. However, when a MegaBlast search was performed for each locus in NCBI, the Diaporthe species showing the highest similarity with the sequence of each locus of the isolates were generally not the type strains. Thus, the species used in the current study were not always the same as those recovered by the single-locus MegaBlast search in NCBI. The combined multi-locus phylogenetic reconstruction shows very strong species delimitation for the D. sojae complex [48]. Fan et al. [14] demonstrated the effectiveness of 3 loci, including EF1-α, TUB2, and CAL, for the identification of the D. eres complex in walnut trees. Similarly, Yang et al. [20] and Zhou and Hou [35] also used three-locus sequences to identify D. eres species associated with different hosts in China. These studies excluded a few closely related Diaporthe species with typical reference strains.
However, our study revealed that the phylogenetic analysis from the combined dataset of EF1-α, TUB2, CAL, and HIS was highly effective and gave strong support for resolving species boundaries of the D. eres species complex. This is consistent with the results obtained by Guo et al. [16]. Phylogenetic informativeness (PI) profiles using multi-locus phylogenetic analysis of five loci are commonly used to identify the D. eres species complex; both Net PI and PI per site showed similar results. Among different loci, EF1-α, APN2 (DNA lyase), and HIS loci are effective for the species delimitation of the D. eres species complex. It was reported that EF1-α showed the highest effectiveness to resolve the phylogenetic signal, which is concordant with the results obtained by Udayanga et al. [43]. Similarly, the highly variable EF1-α locus showed the highest effectiveness to discriminate species in the Diaporthe genus [43,47,91,98]. Our study revealed that EF1-α was reliable, but the ITS region impeded species delimitation and provided relatively limited phylogenetic signal when the combined DNA sequences of five loci (ITS, EF1-α, TUB2, CAL, and HIS) were used.
Using population genetic analyses, Manawasinghe et al. [19] demonstrated the genetic variation of D. eres associated with grapevine dieback in China and found isolates grouped according to geographic location. However, a comparison of Chinese and European D. eres isolates, using both individual- and multi-locus DNA sequences of ITS, EF1-α, TUB2, CAL, and HIS loci, did not show significant differences between the two geographical populations. In this study, D. eres isolates were grouped into two major populations that were not correlated with geographic distribution. Interestingly, we found that isolates from the central part of China, e.g., HEB, HN, HUB, and JX, simultaneously fell into two different clusters with a significant haplotype connection, suggesting that this region is the origin of D. eres. This is consistent with the observation that HUB isolates might be the parental population of D. eres [19].
Finally, future species identification should use a highly effective molecular approach to make it simple and easy to detect D. eres in routine plant quarantine. For genetic variation and population analyses, sample sizes should be increased and comparisons should be performed with the analyses using other molecular markers, including amplified fragment length polymorphism (AFLP), random amplified polymorphic DNA (RAPD), and inter simple sequence repeat (ISSR). Further understanding should focus on the ancestor, phylogeographic and demographic history, divergence-time estimation, and migration history of D. eres.

Conclusions
The current study provides an overview of D. eres on several plant varieties and some valuable knowledge to identify this fungus. Phylogenetic trees were reconstructed for the D. eres species complex with combined DNA sequences of EF1-α, CAL, TUB2, and HIS. Phylogenetic analyses and phylogenetic informativeness profiles reported in this study revealed that, for D. eres species identification and delimitation, the EF1-α locus represents the optimal choice; this proposition is also supported by previous studies [43,47,91,98]. Moreover, our analyses revealed that the use of the ITS region hampers proper species recognition within the D. eres species complex. Signs of population expansion were detected among the D. eres populations. One hundred and thirty-eight D. eres isolates were grouped by phylogenetic analyses, and genetic distance estimation with haplotype networks revealed two clusters, supported by population genetic parameters and neutrality statistics, with a high level of haplotype diversity. However, we found that the two clusters from both methods were not separated based on geographic distribution. Overall, our analyses determined the current pattern of phylogenetic identification for the D. eres species and population diversity within the Chinese isolates of D. eres. In the future, studies on the evolution of D. eres and plant-D. eres interaction should be conducted.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
Alignments generated during the current study are available in TreeBASE (accession http://purl.org/phylo/treebase/phylows/study/TB2:S26697 (accessed on 2 August 2020)). All sequence data are available in the NCBI GenBank, following the accession numbers in the manuscript.

Acknowledgments:
The authors sincerely thank the reviewers for their contributions during the revision process.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript or in the decision to publish the results.