Realistic Gene Transfer to Gene Duplication Ratios Identify Different Roots in the Bacterial Phylogeny Using a Tree Reconciliation Method

The rooting of phylogenetic trees permits important inferences about ancestral states and the polarity of evolutionary events. Recently, methods that reconcile discordance between gene-trees and species-trees—tree reconciliation methods—are becoming increasingly popular for rooting species trees. Rooting via reconciliation requires values for a particular parameter, the gene transfer to gene duplication ratio (T:D), which in current practice is estimated on the fly from discordances observed in the trees. To date, the accuracy of T:D estimates obtained by reconciliation analyses has not been compared to T:D estimates obtained by independent means, hence the effect of T:D upon inferences of species tree roots is altogether unexplored. Here we investigated the issue in detail by performing tree reconciliations of more than 10,000 gene trees under a variety of T:D ratios for two phylogenetic cases: a bacterial (prokaryotic) tree with 265 species and a fungal-metazoan (eukaryotic) tree with 31 species. We show that the T:D ratios automatically estimated by a current tree reconciliation method, ALE, generate virtually identical T:D ratios across bacterial genes and fungal-metazoan genes. The T:D ratios estimated by ALE differ 10- to 100-fold from robust, ALE-independent estimates from real data. More important is our finding that the root inferences using ALE in both datasets are strongly dependent upon T:D. Using more realistic T:D ratios, the number of roots inferred by ALE consistently increases and, in some cases, clearly incorrect roots are inferred. Furthermore, our analyses reveal that gene duplications have a far greater impact on ALE’s preferences for phylogenetic root placement than gene transfers or gene losses do. Overall, we show that obtaining reliable species tree roots with ALE is only possible when gene duplications are abundant in the data and the number of falsely inferred gene duplications is low. Finding a sufficient sample of true gene duplications for rooting species trees critically depends on the T:D ratios used in the analyses. T:D ratios, while being important parameters of genome evolution in their own right, affect the root inferences with tree reconciliations to an unanticipated degree.


Introduction
Species trees are crucial to understanding how biological lineages, their genomes and their traits evolve over time and are central to modern evolutionary research. There are currently several approaches for species tree reconstruction, with most approaches yielding unrooted trees in which the polarity of processes is not resolved. A rooting step (root inference) is hence often required for evolutionary interpretations of species trees. Methods commonly used for rooting include: (i) the outgroup [1], including the special case of tree rooting with ancient gene duplications [2]; (ii) the midpoint criterion [3]; (iii) the minimal ancestor deviation (MAD) approach [4]; (iv) molecular clock models [5]; and (v) time-irreversible models [6]. The accuracy and assumptions underlying these different and Terrabacteria. The only difference between the three roots found by Coleman et al. [17] concerns the relative positioning of the lineage they designated as Fusobacteria. These three roots are the main result of the paper upon which their further inferences about the nature of the last bacterial common ancestor rest. Their findings differed substantially from those of an earlier report about the position of the root in the bacterial tree and the biology of the last bacterial common ancestor [18].
The T:D ratio that ALE estimated for each gene tree is likely to affect the bacterial roots that passed the AU test, but the impact of T:D estimates upon the root results was not reported by Coleman et al. [17]. It is known that gene transfers in prokaryotes vastly outnumber gene duplications and play the dominant role in the growth of bacterial biochemical networks [19,20]. In quantitative terms, and as estimated by reconciliation-independent methods, the frequency of gene transfer in prokaryotes is about 50-100 times higher than the frequency of gene duplications [21,22], whereby the increase in genome size generated by transfers is compensated by the deletion bias inherent to prokaryote genome evolution [23][24][25][26][27][28][29]. In the analysis of Coleman et al. [17] the final T:D ratio differed across gene trees not by a factor of 20, but by 20 orders of magnitude. That is, ALE estimated that gene transfers are roughly 10 10 times more likely than gene duplications for some genes, whereas gene duplications are 10 10 more likely than gene transfers for other genes. Such a large variation in T:D ratios across bacterial genes, from 10 −10 to 10 10 , represents an unrealistically broad range in comparison to previous studies of T:D ratios in real data using methods that are independent of tree reconciliation analyses [21,22].
Here we asked: Does the freedom in T:D ratios affect ALE's choice of preferred roots? To answer this question, we reanalyzed the bacterial trees from Coleman et al. [17] using ALE under a set of predefined T:D ratios based on analyses of real genome data and repeated the AU tests. Additionally, we analyzed an independent dataset composed of metazoan-fungi species for which the root position is known as a control for our analyses. Although the accuracy (or lack thereof) of the nominal evolutionary rates T and D are likely to affect root inferences made by ALE in their own right, we investigated here the impact of T:D ratios, which are a measure for the relative frequency of gene transfers to gene duplications. We focused on T:D in particular because reliable reference values of T:D ratios could be readily obtained from the literature, whereas reliable values for the nominal rates (T and D) are more challenging to find. We used the ALE-independent T:D estimates as references to assess ALE's performance on the test datasets.

Material and Methods
For the bacterial dataset gene trees reconstructed from protein alignments, species trees and the fraction of missing genes per species were kindly provided by T. Williams [17]. The gene trees were reconciled against the rooted species tree using the program ALEml_undated [14] with the same parameters as described in Coleman et al. [17] (1:1 T:D ratio). Note that ALE analyzes each gene tree independently, hence the gene duplication and the gene transfer rates are optimized for each gene tree independently from the others. Additional tree reconciliations with ALEml_undated were conducted as to conform to different gene transfer to gene duplication (T:D) ratios by specifying, for each gene tree, the gene duplication and the gene transfer rates while maintaining the remaining parameters unchanged. Note that while ALE allows users to separately adjust T and D for each gene tree, there is not a single T:D parameter that can be given as input. Hence to obtain a desired T:D value, we proceeded as follows: for a given gene tree, the gene transfer rate was set to the value returned by the 1:1 T:D analysis (for that same gene tree), in which the gene transfer rate was automatically optimized by ALE. To obtain a desired T:D ratio, the gene duplication rate was defined as the division of the gene transfer rate by an adjusting factor as to obtain the desired T:D ratio. For instance, if the optimized gene transfer rate for a gene tree was 0.5 and we wanted to constrain the T:D to 50:1, then we set T to 0.5 and D to 0.01. The range of T:D ratios evaluated for the bacterial dataset were 50:1, 100:1 and 50:1 or more. For the analysis of 50:1 or more, the gene duplication rates were only adjusted for gene trees with an optimized T:D ratio below 50:1.
For the fungi-metazoan dataset, gene families were obtained from EggNOG version 4.5 [23], and it consists of 15,614 gene families spanning 31 species. Protein sequence alignments were generated for each gene family using MAFFT version 7.027b with the L-INS-i alignment strategy [24] Gene trees were reconstructed with IQ-TREE version 2.0.3 [25] with the best-fitting evolutionary model and recording the best tree for each of the 1000 bootstrap alignment samples. The reference species tree was reconstructed from the concatenated alignment of 117 single-copy gene families, present as single-copy in all 31 species, with IQ-TREE version 2.0.3, with the best-fitting model for each partition in the concatenated alignment [25]. Conditional clade probabilities for each gene tree were calculated with ALEobserve [14], and the gene trees were reconciled against the species trees with ALEml_undated. Tree reconciliations were performed for each possible root branch in the unrooted species tree (59 in total), with different T:D ratios. In the first tree, reconciliations round gene transfer and gene duplication rates were freely estimated by ALE (1:1 T:D analysis). To achieve specific T:D ratios, additional tree reconciliation runs were carried out by setting gene transfer and gene duplication rates in the same manner as described for the bacterial dataset. The gene transfer rates for each gene tree were selected from the 1:1 T:D analysis and the gene duplication rates were adjusted accordingly as to obtain the desired T:D ratio. The T:D ratios tested for the fungi-metazoan dataset were 1:2, 1:50 and 50:1.
The approximately unbiased (AU) tests [16] were performed with custom R scripts using the 'scaleboot' library [26]. The Spearman correlation tests were performed in MATLAB. All tests were considered significant at p-value ≤ 0.05.

Gene Transfer to Gene Duplication Ratios Constrain the Number of Inferred Species Tree Roots
To see what effect the use of a more realistic T:D ratio would have on the inference of the bacterial root, we set the T:D ratio to values that agree with previous studies [21,22], and repeated the analyses following the same protocol as in Coleman et al. [17]. As an initial control, we used exactly the same parameters as Coleman et al. [17] used, and allowed ALE to estimate T and D on its own (1:1 T:D ratio); this exactly reproduced the results of Coleman et al. [17] and found the same three bacterial roots that passed the AU tests ( Figure 1). We then repeated the analyses, but changed T and D as to conform to a T:D ratio of 50:1 across gene trees, a ratio that is considered to be a conservative (lower-bound) estimate of T:D ratios for bacterial genes [22]. All that the 50:1 T:D setting does is to penalize bacterial gene duplications relative to bacterial gene transfers, because we used the same species tree (and remaining parameters) in comparison to the 1:1 T:D analysis. In contrast to the three roots obtained with the 1:1 T:D analysis, the 50:1 T:D ratio resulted in eight bacterial roots (Figure 1). Three roots fell between Gracilicutes and Terrabacteria, two of which were also obtained with the 1:1 T:D analysis. The remaining five roots identified with a T:D of 50:1 fell within the bacterial group that Coleman et al. [17] designate as Gracilicutes ( Figure 1).  In an earlier study focusing on recent gene transfers and recent gene duplications, we found that average bacterial T:D ratios across genes can approach or exceed 100:1 in some bacterial lineages [22]. To understand what influence a higher T:D ratio would have on the root inferences, we repeated the tests by setting T and D across gene trees so as to conform to a 100:1 T:D ratio. The 100:1 analysis uncovered nine bacterial roots in total, that is, one more than the 50:1 T:D analysis. Four roots were found between Gracilicutes and Terrabacteria and five roots within Gracilicutes ( Figure 1). Together, the 50:1 and 100:1 T:D analyses show that gene duplication and gene transfer rates have a significant effect upon ALE's root choices. In terms of trying to get realistic inferences, it is indeed possible that fixed T:D ratios across genes are too strict. That is because some genes, such as those involved in translation, are less frequently transferred [27,28]. Thus, we performed one additional analysis by capping the T:D ratio so as not to fall below 50:1, thereby allowing ALE to assume any T:D ratio that prefers gene transfers over gene duplications by at least a factor of 50:1. With this setting, the AU test recovered a total of ten bacterial roots: five falling between Gracilicutes and Terrabacteria; three roots that fall within Gracilicutes; and two roots that fall within the Terrabacteria, one of which separates a clade of Actinobacteria and Firmicutes from the rest of the tree (Supplemental Table S1). Our results reveal that more realistic T:D ratios increase the number of inferred roots for the bacterial tree. Permitting unconstrained variation of T:D across gene trees, as ALE does in the default 1:1 T:D setting, unnecessarily constrains the likelihoods assigned to alternative bacterial roots.

Imbalance of Gene Gains and Gene Losses
Both gene duplications and gene transfers can contribute novel gene-copies, thereby steadily increasing genome size. However, bacterial genome sizes are not free to expand, they are themselves constrained [29] due to deletional bias [30,31] and energetic costs of gene expression [32]. Thus, in any realistic analysis with ALE, gene gains should be balanced out by gene losses in bacteria. Since in all our analyses we allowed ALE to automatically optimize gene loss rates, the distribution of gene loss rates across gene trees as a function of different T:D settings offers an independent means to assess the realism of ALE's analyses. We compared the balance of gene gains and gene losses across bacterial genes, as obtained with different T:D settings (columns in Figure 2), conditioned on the three bacterial roots reported by Coleman et al. [17] (rows in Figure 2). The three panels of Figure 2 on the left (A-C) show the results for the 1:1 T:D analysis (default), the panels in the middle (D-F) show the results for the T:D of 50:1, and the panels at the right (G-I) show the results for the 100:1 T:D ratio. In all the different T:D settings the rates of gene loss for each gene tree were freely estimated by ALE and, as such, the distributions of gene loss rates offer an opportunity to evaluate the results. Gene gains and losses are more balanced, with narrower distribution, for the more realistic 50:1 and 100:1 T:D ratios in comparison to the 1:1 T:D ratio ( Figure 2). The trend is largely independent upon the root position within the bacterial species tree (see also Supplemental Figure S1).
Dividing the mean loss rate (L) by the mean gain rate (G) across gene trees provided us with an L:G ratio across gene families, a proxy for the direction of genome size evolution. An L:G above one indicates genome reduction, whereas an L:G below one indicates genome growth. The body of literature showing that bacterial genomes lose genes more often than they acquire genes is rich [30,31,[33][34][35][36][37]. In both the 50:1 and 100:1 T:D ratio analyses, gene losses do indeed exceed gene gains in bacteria, with L:G ratios of 1.24 and 1.25, respectively ( Table 1). The L:G estimate is indicative of a general trend for an average bacterial genome relative to all genomes in the dataset. Exceptions to this obviously exist. For instance, filamentous cyanobacteria species exhibit larger genomes in comparison to their unicellular cyanobacteria relatives due to the increase in genome size at the origin of filaments [38][39][40][41]. ALE, in its default 1:1 T:D setting, indicates instead that the general tendency of an average bacterial genome is to become larger in size, yielding a low L:G value of 0.76 which, in other words, means that gene gains are 24% more frequent than gene losses, contrary to observations in real data. In other words, the 1:1 T:D analyses indicate that gene gains outnumber gene losses in bacterial genomes, in contradiction to previous observations [30,31,[33][34][35][36][37].

Testing ALE on a Simple Phylogenetic Case: The Fungal-Metazoan Root
Demonstrating the limitations of species tree rooting with ALE using the bacterial tree carries the caveat that the bacterial root is not known nor is it known whether the species tree that Coleman et al. [17] used to search for roots is correct-an issue that we do not address here. To circumvent some of the unknowns underlying bacterial evolution,

Testing ALE on a Simple Phylogenetic Case: The Fungal-Metazoan Root
Demonstrating the limitations of species tree rooting with ALE using the bacterial tree carries the caveat that the bacterial root is not known nor is it known whether the species tree that Coleman et al. [17] used to search for roots is correct-an issue that we do not address here. To circumvent some of the unknowns underlying bacterial evolution, we tested the ability of ALE to recover the root of a simpler phylogenetic case: the root of the metazoan (animals) and fungal tree. By all accounts, fungi are monophyletic relative to metazoa and vice versa [4,[42][43][44], with the root of the tree lying undisputedly on the branch splitting the two groups. We analyzed the ability of ALE to identify the fungal-metazoan root using a dataset composed of 31 species and 15,614 gene trees. We reconstructed a reference fungal-metazoan tree from the concatenated alignment of 117 single-copy genes shared among all 31 species using maximum-likelihood and tested each of the 59 branches in the unrooted tree as a possible root position, using different T:D ratios with ALE.
Assuming an initial T:D ratio of 1:1, the AU tests identified the correct root ( Figure 3). By inspecting the evolutionary rates across gene trees, we found that the average number of duplications (D) was twice as high as the average number of transfers (T) across genes, that is an average T:D ratio of 1:2 ( Table 2). This result contrasts with a previous estimate of a 1:100 T:D ratio in eukaryotes [44] based on a large sample of eukaryotic gene families harboring gene duplications. It was observed that fewer than 1% of all eukaryotic gene duplications are shared among eukaryotic supergroups [44]. On the other hand, 99% of the gene duplications are exclusively found in a single eukaryotic supergroup, most of them exclusive to plants, metazoa or fungi [44]. These observations suggest an approximate T:D ratio of 1:100 in eukaryotes, assuming that gene transfers are the sole evolutionary process responsible for generating duplicates shared among eukaryotic supergroups (see [44] for further discussion). However, assuming gene transfer as the sole process is certainly an extreme proposition, and thus the T:D value of 1:100 only serves as a very liberal reference for the analyses of the fungi-metazoan dataset. Interestingly, for the 117 single-copy genes used to infer the fungi-metazoan tree, ALE calculated an abnormally high average T:D ratio of 26,968:1. According to ALE, there are only two possible causes for these tree incongruences: gene transfers or gene duplications. Other explanations are theoretically possible, such as incomplete lineage sorting and tree reconstruction errors, but those factors are not part of the evolutionary model that ALE implements. Topological discordances among gene trees and species trees are expected to occur in real biological data as a result of the ever-present problem of tree reconstruction errors. However, ALE assumes that both gene tree and species tree are correct, a rather unrealistic assumption which is known to introduce biases in tree reconciliation analyses [45]. Although phylogenetic errors may be accounted for by collapsing branches in the trees with bootstrap values below a predefined threshold, this is neither a common practice nor guaranteed to alleviate biases arising from tree reconstruction errors, since even branches with high bootstraps may be incorrect. According to the model implemented in ALE, there are two possibilities to explain topological deviations in the fungal-metazoan single-copy gene trees: 'gene duplication' or 'gene transfers'. Calling gene duplications would imply a series of parallel gene losses which penalize the likelihood score of the 'gene duplication' scenario to a suboptimal value in comparison to the 'gene transfer' scenario, inflating gene transfer rates to an unrealistic degree.
Life 2022, 12, 995 9 of 16 not retrieve the correct root, while all gene trees combined can? One possibility is that the sample size of 117 gene trees is not large enough. We ruled out sample size as the main issue because the correct root was recovered with a random sample of 117 gene trees (Sup plemental Table S2). The most plausible explanation has to do with the relative im portance of gene duplications and gene transfers for species tree rooting, as we explain in the following.  Table S3). The species tree is shown rooted on the known root branch that separates fungi (pink) from metazoan (light green). The rooting analyse were performed under different T:D settings (rows). The reconciliations were performed for 15,614 maximum-likelihood gene trees against the all-possible rooted versions of the unrooted tree (59  Table S3). The species tree is shown rooted on the known root branch that separates fungi (pink) from metazoan (light green). The rooting analyses were performed under different T:D settings (rows). The reconciliations were performed for 15,614 maximum-likelihood gene trees against the all-possible rooted versions of the unrooted tree (59 roots in total). The results for the AU test are shown separately for all gene trees [left; (a,c,e,g)], and for 117 gene trees that contain all species without paralogs [right; (b,d,f,h)], referred in the text as single-copy gene trees. The root branches that passed the AU test (p > 0.05) are marked with black arrows.  Distinguishing between gene transfers and gene duplications is only possible if T and D are known. Whether the values of T and D estimated by ALE are accurate or not is a crucial aspect of tree reconciliation analyses that we address here. Our results suggest that the T:D ratios estimated by ALE, when left to its own devices, are abnormal in a dataset-independent manner. The frequency of gene duplications and gene transfers, and their ratios, are known to be very different in bacteria and in eukaryotes. Yet ALE estimates that the average T:D ratio for the fungal-metazoan genes is roughly 1:2 and 2:1 for bacterial genes, both average T:D very close to the initial 1:1 T:D ratio used as prior. The ranges of T:D ratios across genes are also too wide, ranging from 10 −11 to 10 9 across fungal-metazoan genes and 10 −11 to 10 10 across bacterial genes (Supplemental Figure S1).
The single-copy gene trees from the fungi-metazoan dataset offer an opportunity to further demonstrate how the T and D rates affect the root inferences. Performing the AU tests using single-copy trees alone under 1:1 T:D leads to an inference of 20 equally supported roots for the fungal-metazoan tree (Figure 3). Why can the single-copy gene trees not retrieve the correct root, while all gene trees combined can? One possibility is that the sample size of 117 gene trees is not large enough. We ruled out sample size as the main issue because the correct root was recovered with a random sample of 117 gene trees (Supplemental Table S2). The most plausible explanation has to do with the relative importance of gene duplications and gene transfers for species tree rooting, as we explain in the following.

Gene Duplications Are More Informative Than Gene Transfers for Rooting Species Trees
ALE's ability to root species trees rests upon gene duplications, gene transfers and gene losses, but the relative contribution of each of these types of events for ALE's rooting procedure is obscure. It has been speculated that gene transfers are the main phylogenetic information that enables the rooting with ALE because gene transfers constrain the relative age of donor-recipient lineage pairs [17]. Since gene transfers cannot happen forward or backward in time, gene transfer would hold the potential to polarize node pairs in the species tree that correspond to donor-recipient lineages. While this argument may hold for dated species trees, that is, trees for which the branch lengths are proportional to time, gene transfers have only a limited potential to root undated species trees, which are most commonly used in phylogenetic practice.
To investigate the issue, we performed the following experiment for gene duplication, gene transfer and gene loss independently: We ranked gene trees in decreasing order of evolutionary rates for the given variable (gene duplication rate, gene transfer rate, and gene loss rate), as freely estimated by ALE (1:1 T:D), and selected the top 100 gene trees in each ranked list to carry out the AU tests for rooting. The number of significant roots obtained were counted and the AU tests were iteratively repeated for the following 100 gene trees in the ranked list until all non-overlapping subsets 100 gene trees were analyzed. The number of significant roots obtained as a function of the gene duplication rate alone, for both the bacterial and fungal-metazoan datasets, are shown in Figure 4. This simple power analysis clearly shows that gene duplications, not gene transfers, are the most informative type of phylogenetic event for rooting species trees with ALE, a trend observed for both the fungal-metazoan tree and the bacterial tree. The number of significant roots is negatively correlated with the average gene duplication rate across the gene trees (rho = −0.32 and p < 0.01 for bacteria; rho = −0.52 and p < 0.01 for fungi-metazoan; two-tailed Spearman correlation). On the other hand, the average gene loss rate has no significant influence on the number of inferred roots (rho ≈ 0.08 and p ≈ 0.4 for bacteria; rho ≈ 0.01 and p ≈ 0.9 for fungi-metazoan), while the gene transfer rate shows a weak correlation with the number of roots which is only marginally significant (rho = 0.09 and p = 0.04 for bacteria; rho = −0.17 p ≈ 0.05, for fungi-metazoan). Note that the Spearman correlation is a non-parametric statistic, as opposed to the Pearson correlation, and is appropriate for identifying dependencies between variables even when the underlying distribution is unknown.
The fact that gene duplications are more important than gene transfer for the species tree rooting with ALE readily explains why more realistic T:D ratios increase the number of inferred roots for the bacterial phylogeny. Higher T:D ratios result in fewer gene duplications, rendering the root inference more uncertain. The T:D ratio serves the purpose of determining the proportion of tree incongruences that are attributed to gene duplications and the proportion of tree incongruences that are attributed to gene transfers. For the bacterial dataset, the optimized T:D ratios were on average 2:1, which means that approximately 33% of the tree incongruences in the data were attributed to gene duplications, whereas the remaining 67% were attributed to gene transfers. However, most of these gene duplications are in fact false gene duplications, as evidenced by the unexpectedly large average T:D of 2:1. Due to the high number of false gene duplications, incorrect roots in the bacterial tree receive inappropriately high statistical support. In other words, the evolutionary model is biased and not as powerful as it may seem at first sight (due to the high statistical support). By increasing T:D, the number of false gene duplications is reduced. As a consequence, the evolutionary model becomes less biased and the root inference appears more uncertain (as it should), because the paucity of true gene duplications in the data renders the identification of the correct root not possible. In tree the reconciliation analyses performed by ALE, the finite number of tree incongruences between gene trees and the species tree constrains a priori the total number of evolutionary events (gene transfers plus gene duplications) that can be invoked. Importantly, the number of incongruences between a gene tree and the species tree depends upon the root position in the species tree such that species tree roots that induce a larger number of tree incongruences will generally attain smaller likelihood scores in comparison to roots that induce fewer tree incongruences. The fact that gene duplications are more important than gene transfer for the species tree rooting with ALE readily explains why more realistic T:D ratios increase the number The importance of gene duplications for rooting furthermore explains why the singlecopy gene trees alone cannot retrieve the correct root for the fungal-metazoan tree. Singlecopy genes bear few detectable gene duplications and as such are not root informative. The importance of gene duplications on the root choice by ALE becomes even clearer when we carry out the analyses fixing the T:D ratio in such a way that we allow for more gene duplications to occur in comparison to gene transfers. Using the fungal-metazoan tree as a clear reference, and increasing the T:D ratio to 1:2, as opposed to the default 1:1, the AU tests identified 19 roots when using only the single-copy gene trees to root the fungal-metazoan tree, one root less than is obtained with the 1:1 T:D analyses in which the occurrences of gene duplications are more restricted. Allowing even more gene duplications by setting ALE to a T:D of 1:50, the number of roots decreases to 16 ( Figure 3). Overall, these analyses indicate that allowing more gene duplications in ALE increases the power of the AU test in rejecting incorrect roots. Yet gene duplications are so rare across the single-copy genes that a decisive root inference for the fungal-metazoan tree is not possible with the single-copy gene trees alone. Furthermore, our observation that the analysis of all fungi-metazoan gene trees with T:D set to 1:1 recovers the true root makes sense because gene duplications are abundant in eukaryotic genomes, in particular in fungi and metazoa [38]. ALE is able to find a sufficient number of the gene duplications present across all fungi-metazoan gene trees, yet ALE infers a high number of false gene transfers because the average T:D ratio of 1:2 is too high even in comparison to the very liberal reference T:D value of 1:100 [38]. Despite the high number of falsely inferred gene transfers in the fungi-metazoan data, there is no considerable bias for the root inference because gene transfers are not root informative, as we have shown. Interestingly, the analyses of all fungi-metazoan gene trees with a T:D set of 1:2 and 1:50, in which more gene duplications can be accommodated in comparison to the default 1:1, renders an uncertain root inference for the fungi-metazoan phylogeny. The reason is likely due to a systematic tendency of tree reconciliations to incorrectly place terminal gene duplications at the base of gene trees [45], a problem that becomes more acute when more gene duplications are allowed (other factors, however, may also play a role).
If gene duplications indeed play a crucial role for the root inferences, then penalizing the occurrence of gene duplications and facilitating the occurrence of gene transfers instead should have a negative effect on the inferences. To investigate this possibility, we set the T:D ratio for the fungal-metazoan gene trees to an unrealistic 50:1 T:D, whereby gene transfers exceed gene duplications by a factor of 50. With a T:D ratio of 50:1, we obtained two equally supported roots for the fungal-metazoan tree, both of which lie within the metazoan lineage, far from the true root branch (Figure 3).

Conclusions
We have demonstrated that the rooting of species trees with ALE depends on the quality of the gene transfer to gene duplication ratio used as input. The reliance on hard-coded priors of 1:1 used to estimate the relative evolutionary rates of duplications and transfers hinders root inference because evolutionary rates vary considerably across lineages. The influence of prior choice as source of bias in phylogenetic inferences has been reported before [46].
From the theoretical point of view, tree reconciliation methods are an attempt to fully model the complex evolutionary process shaping the evolution of genes and genomes. In a single gene-tree-species-tree reconciliation analysis, there are several parameters that need to be simultaneously estimated besides T:D-effective population size, fraction of non-sampled lineages, gene loss rate, and gene tree root (which is also optimized in ALE). Each of these parameters are required for the analyses and are likely to affect the accuracy of species tree rooting. Investigating all of these parameters is out of our scope. We purposefully focused on T:D ratios in particular because we have a good reference to the distribution of actual T:D values. A realistic distribution of bacterial T:D values were obtained by an independent study [22] which explicitly minimized biases by focusing on inferences of T:D using recent evolutionary events, for which the distinction among gene transfers and gene duplications is possible without committing to any particular bacterial species tree (hence a reconciliation-free approach). While we specifically focused on the effect of T:D here, our work will motivate further investigations about the accuracy of species tree rooting obtained with tree reconciliation methods.
Our work goes well beyond previous studies that assessed the performance of tree reconciliation models with simulations. Simulations, despite being useful, present limitations that are hard to bypass: (i) they do not fully capture the characteristics of biological data and are highly dependent on parameter choice; (ii) tree reconstruction errors are often not accounted for; and (iii) the evolutionary models used to carry out simulations often belong to the same family of the models being tested, making the study biased by design.
We observed that in using realistic, independently estimated T:D ratios to infer the bacterial root with ALE, the inferred root placements become more uncertain and the number of possible roots increases relative to the results of ALE left to its own devices. It is notable that gene duplications are significantly more important than gene transfers for rooting. Gene duplications are vertically inherited and, as such, are rich phylogenetic characters to trace the evolution of species [44,47], whereas frequent gene transfers, such as those observed among prokaryotes, are not root informative. Indeed, it has been suggested that the high frequency of gene transfers among prokaryotes may render the species tree framework inapplicable [48,49], a proposition that is nevertheless challenging to implement in practice. Overall, our comprehensive analysis for the bacterial tree does not strongly support the Gracilicutes-Terrabacteria root. Furthermore, we show that root inferences obtained through gene-tree-species-tree reconciliations are far more uncertain than previously recognized due to factors that are only now coming to light.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/life12070995/s1. Figure S1: The rates of gene gains and gene losses across bacterial genes for bacterial roots (rows) that were found with a T:D ratio of 50:1 and 100:1 and not reported earlier by Coleman et al. [17]. The columns represent the different T:D settings. Table S1: AU tests for all bacterial roots with different T:D settings. Table S2: AU tests for all fungal-metazoan roots with different T:D settings. Table S3: Complete species composition of the fungal-metazoan analysis.