Exploring LSU and ITS rDNA Sequences for Acanthamoeba Identification and Phylogeny

The identification and classification of strains of Acanthamoeba, a potentially pathogenic ubiquitous free-living amoeba, are largely based on the analysis of 18S rDNA sequences, currently delineating 23 genotypes, T1 to T23. In this study, the sequences of the ITS region, i.e., the 5.8S rDNA and the two internal transcribed spacers (ITS-1 and ITS-2), and those of the large subunit (LSU) rDNA of Acanthamoeba were recovered from amoeba genomes; the sequences are available in GenBank. The complete ITS–LSU sequences could be obtained for 15 strains belonging to 7 distinct lineages (T4A, T4D, T4F, T4G, T2, T5, and T18), and the site of the hidden break producing the 26Sα and 26Sβ was identified. For the other lines, either the LSU is partial (T2/T6, T7) or the ITS is fragmentary (T7, T10, T22). It is noteworthy that a number of sequences assigned to fungi turned out to actually be Acanthamoeba, only some of which could be affiliated with known genotypes. Analysis of the obtained sequences indicates that both ITS and LSU are promising for diagnostic and phylogenetic purposes.


Introduction
Acanthamoeba spp. (Amoebozoa, Discosea, Centramoebida) are ubiquitous free-living amoebae, abundant in a variety of natural and man-made environments; moreover, they are of medical interest because they can behave as opportunistic parasites for humans and other animals. Various species have been implicated in disseminated infections in multiple tissues and organs, with possible haematogenous spread to the central nervous system resulting in chronic granulomatous amoebic encephalitis (GAE); this is almost always fatal, especially in immunocompromised individuals [1,2]. Acanthamoeba is also a rare ocular pathogen causing infections normally confined to the cornea, leading to blinding amoebic keratitis (AK); nevertheless, other parts of the eye can also sometimes be invaded [3,4].
These relationships are largely consistent with those obtained using mitochondrial genes, such as SSU (16S) rDNA or the cytochrome c oxidase subunit I (Cox1) gene [6,9-11], as well as using sequences from the first internal transcribed spacer (ITS-1) of the nuclear rDNA [12]. However, the phylogeny of Acanthamoeba is not fully resolved, as some incongruences remain within and between the different trees; in addition, various genotype clusters are not well supported. This is partly because 18S rDNA is the only marker for which genetic data are available for all genotypes. Furthermore, it is likely that additional genotypes have yet to be discovered. In this study, the ITS region, i.e., the 5.8S rDNA flanked by ITS-1 and ITS-2, and the large subunit (LSU) of nuclear rDNA were analysed in order to assess their possible use in improving phylogenetic resolution.

Materials and Methods
The Acanthamoeba genomes available on the NCBI portal were analysed to extract the ITS and LSU regions. The genomes were searched by BLAST using as a query a sequence from the Neff strain (T4G genotype) spanning the complete rDNA operon (GenBank ID GU001160). Then, single or overlapping contigs containing the region of interest were identified and the final sequences assembled. The different ITS-LSU sequences obtained were used for further genome screening and also as queries to search by BLAST for closely related sequences in GenBank.
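The contig-selection step described above can be illustrated with a minimal sketch, assuming the BLAST results were saved in the standard 12-column tabular format (BLAST+ `-outfmt 6`). The identity and length thresholds, the function names, and the contig names are hypothetical and are not taken from the study; the sketch only reports how much of the query operon each contig covers, which helps to decide whether a single contig or several overlapping contigs are needed.

```python
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid sseqid pident length mismatch gapopen "
          "qstart qend sstart send evalue bitscore").split()


def parse_hits(lines, min_identity=90.0, min_length=200):
    """Yield hits as dicts, keeping only long, high-identity matches.

    The two thresholds are illustrative assumptions, not values from the paper.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        qstart, qend = int(row["qstart"]), int(row["qend"])
        hit = {
            "contig": row["sseqid"],
            "pident": float(row["pident"]),
            "length": int(row["length"]),
            # Normalise so that qstart <= qend whatever the strand.
            "qstart": min(qstart, qend),
            "qend": max(qstart, qend),
        }
        if hit["pident"] >= min_identity and hit["length"] >= min_length:
            yield hit


def query_coverage(hits):
    """Merge overlapping query intervals per contig.

    Returns {contig: number of query positions covered}.
    """
    by_contig = defaultdict(list)
    for h in hits:
        by_contig[h["contig"]].append((h["qstart"], h["qend"]))
    covered = {}
    for contig, spans in by_contig.items():
        total = 0
        cur_start, cur_end = None, None
        for start, end in sorted(spans):
            if cur_end is None:
                cur_start, cur_end = start, end
            elif start > cur_end + 1:
                # Disjoint interval: close the current one and open a new one.
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)
        total += cur_end - cur_start + 1
        covered[contig] = total
    return covered
```

Contigs with high query coverage would then be inspected manually and assembled, as described in the text.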
A first analysis was performed on LSU rDNA only, including other amoebae and fungi, to assess tree topology and confirm reliable identification of recovered sequences. Then, for sequences confirmed as Acanthamoeba, the ITS and LSU regions were aligned separately with that of Balamuthia mandrillaris (strain 2046) used as an outgroup. An 18S rDNA tree was constructed including the available sequences of the strains studied, to compare the results given by the different portions of the rDNA operon. Multiple alignments were performed using MAFFT and manually refined to exclude ambiguous sites using BIOEDIT. For alignment of the ITS region (ITS1-5.8S-ITS2), among the various programs tested (not shown), MAFFT with the L-INS-I option [13] was found to perform best. The multiple alignment thus obtained was visually checked to verify the correct positions of the 5.8S, as well as certain homologous parts identified in the ITS.
Molecular phylogenetic trees were built as previously described [14,15] with maximum likelihood (ML) (GTR G + I:4) using TREEFINDER [16], and neighbour-joining (NJ) (Kimura 2-P) and maximum parsimony (MP) using MEGA7 [17], with 1000 bootstraps. Pairwise similarity values for rDNA sequences were obtained with BIOEDIT by removing common and terminal gaps, and using all the sites and indels. Mean values within and between groups were then calculated manually.
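As an illustration of the similarity calculation described above, the following minimal sketch computes the percent identity of two sequences taken from a multiple alignment after removing common gaps (columns gapped in both sequences) and terminal gaps, and counting internal indels as differences. It also averages the values within and between groups. This is an assumed reading of the procedure, not the BIOEDIT implementation, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pairwise_similarity(a, b, gap="-"):
    """Percent identity of two aligned sequences.

    Columns gapped in both sequences and terminal gap columns are dropped;
    internal indels count as mismatches.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    cols = [(x.upper(), y.upper()) for x, y in zip(a, b)
            if not (x == gap and y == gap)]
    # Trim terminal columns in which either sequence has a gap.
    while cols and gap in cols[0]:
        cols.pop(0)
    while cols and gap in cols[-1]:
        cols.pop()
    if not cols:
        return 0.0
    matches = sum(x == y for x, y in cols)
    return 100.0 * matches / len(cols)


def group_means(seqs, groups):
    """Mean similarity within and between groups.

    seqs:   {strain name: aligned sequence}
    groups: {strain name: group label}
    """
    within, between = {}, {}
    for n1, n2 in combinations(sorted(seqs), 2):
        s = pairwise_similarity(seqs[n1], seqs[n2])
        g1, g2 = groups[n1], groups[n2]
        if g1 == g2:
            within.setdefault(g1, []).append(s)
        else:
            between.setdefault(tuple(sorted((g1, g2))), []).append(s)
    return ({g: mean(v) for g, v in within.items()},
            {g: mean(v) for g, v in between.items()})
```

Groups represented by a single sequence have no within-group value and are simply absent from the first dictionary.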

Sequence Retrieval: General Features
The rDNA sequences were obtained from the available genomes of different Acanthamoeba strains; some were deposited under erroneous names, and their true identity was previously clarified by nuclear and mitochondrial SSU rDNA analysis [6]. Sequences of approximately 5400 bp were successfully extracted from fifteen of the twenty-four available genomes. Analysis of the remaining genomes gave incomplete or poor-quality results, even for 18S sequences. The obtained sequences covered the last 30 bp of 18S rDNA, the entire ITS region (about 1200 bp), and almost always the complete LSU rDNA sequence (nearly 4300 bp), delineated by identifying the 3' end of the gene (GenBank ID L07635) as determined by Yang et al. [18]. The ITS region could not be completely recovered for Acanthamoeba sp. T22 because the corresponding contig contains various gaps, and it is very fragmented for A. culbertsoni (T10) and A. astronyxis (T7). Moreover, for A. astronyxis, only the 5' end of the LSU (up to 2600 bp) could be detected. Another sequence considered here is that of strain BCP-EM3VG21-1 (hereafter BCP for simplicity), consisting of the complete 18S and ITS region but only a short LSU (Table 1). In Acanthamoeba, as well as in its close relative Balamuthia, ITS-1 and ITS-2 are long, exceeding the 250-300 nt size usually found in other protists [20], as shown here (Table 1) for the two other analysed amoebae, the true Rhizamoeba [21] and Vermamoeba subtype 2 [19].
ITS-1 has a length between 309 and 512 nt, the shortest being found in A. lenticulata (T5) and the longest in A. terricola (T4G). Considering also the data of Köhsler et al. [12], length appears to correlate with genotype. For example, the ITS-1 is 348-438 nt in T4A and T4B strains and 440-480 nt in T4D strains, while it is 320-339 nt in the A. palestinensis group (T2/T6 line). A similar correlation between size and genotype is also found for ITS-2; this is always longer, varying between 480 nt for A. lenticulata (T5), 580-620 nt for T4A strains, around 640 nt for A. mauritaniensis, A. rhysodes (T4D) and A. terricola (T4G), and up to 700 nt for A. triangularis (T4F). The ITS-2 of A. byersi (T18) (Pb30/40 strain) is 1155 nt, and data for other MG1 species will be required to confirm whether this is a feature of this lineage. The length variation of ITS-1 is largely due to multiple short repeats, mainly di- and tri-nucleotides (microsatellites) [12]. Microsatellite variations are also present in the ITS-2 of the different groups, although portions corresponding to ITS-2 helices could be identified in all sequences (not shown). Similarity values for ITS-2 within groups are >75%, but drop to <60% even between related lines (Table 2).
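The short di- and tri-nucleotide repeats responsible for much of the ITS length variation can be located with a simple pattern search. The following is a minimal sketch using a regular expression; the minimum number of repeats (three) and the function name are arbitrary assumptions made for illustration, and the study itself does not state how repeats were identified.

```python
import re


def find_microsatellites(seq, min_repeats=3):
    """Return (start, unit, repeat count) for di- and tri-nucleotide repeats.

    start is 0-based. Homopolymer runs (e.g. AAAAAA) are skipped because they
    are not true di- or tri-nucleotide repeats. min_repeats is an illustrative
    threshold, not a value from the paper.
    """
    # A 2-3 nt unit followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{2,3}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits
```

For example, `find_microsatellites("GGCACACACATTTAGTAGTAGTAGCC")` reports a (CA)4 repeat at position 2 and a (TAG)4 repeat at position 12.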
The complete 5.8S rDNA sequence was found for all the strains analysed. Its length varies between 160 and 174 bp (Table 1), with 19 to 20 nucleotide substitutions (approx. 82% identity) between the MG1 species (T7 and T18) and all the others. The latter have much less variation, with only 0-6 nucleotide changes (96.3-100% identity). It is noteworthy that T7 and T18, although belonging to the same group, differ by 16 nt (91.9% identity). The sequence of the Neff strain used as a query (GenBank ID GU001160) has three mutations in 5.8S; these are not found in any other strain and are very likely the result of a sequencing error.
Complete LSU rDNA of 17 strains of six genotypes was obtained (Table 1). The sequences vary in length from approx. 4080 to 4350 bp, have no intron, and show similarity values close to those obtained from 18S rDNA (Table 3).
Table footnote: for the 18S analysis, the sequence of PD2S was used because that of PT14 is incomplete.
Early work demonstrated that the LSU rRNA of Acanthamoeba (strain Neff) splits into two smaller, unequal fragments, 26Sα and 26Sβ, of approx. 2400 and 2000 nt, respectively; this is evidenced by gel electrophoresis of heat-denatured RNA and the formation of R-loops in the DNA-RNA hybridization assay [22,23]. The discontinuity in the LSU rDNA was located as a 200 bp gap on a restriction map of the cloned rDNA unit, between the Bgl II and Bam HI sites [23]. This region corresponds to domain III of the LSU rRNA, extending from stem 26' to H62 (numbering after Petrov et al. [24]); in the Acanthamoeba sequences retrieved here, it exhibits two unusual expansion elements, which nevertheless form coherent structures in two-dimensional reconstructions (Figure 1).
One element is located inside the stem of H58 (58es1) and the other in the loop between stems 55' and 54' (55es1); their lengths vary among species between 50-130 and 10-80 nt, respectively. Interestingly, in all species, while the AT content of the whole LSU rDNA is 41.2-48.9%, it is very high for 55es1, at 63.2-86.3% (Table 4), which probably makes it unstable. The hidden break may therefore occur at this site, by the splicing of 55es1, producing the 26Sα and 26Sβ of about 2320 and 1880 nt, respectively. These results on several genotypes are entirely consistent with the previous ones based on the single Neff strain, and find new confirmation in the recent study by Natsidis et al. [25], who analysed the hidden break in a larger number of eukaryotic LSU rRNAs and obtained an almost identical prediction for Acanthamoeba (Neff). The hidden break was not detected in the LSU of Balamuthia and the other amoebae analysed here.
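The contrast in AT content between an expansion element and the whole gene can be computed as in the following minimal sketch. The window size, step, and threshold of the sliding-window scan are illustrative assumptions, and the function names are hypothetical; in the study, the elements were delimited from the secondary structure, not by such a scan.

```python
def at_content(seq):
    """AT content in percent, ignoring gaps and ambiguous bases.

    U is treated as T so that RNA sequences can be used as well.
    Returns 0.0 if the sequence contains no A, C, G or T.
    """
    bases = [c for c in seq.upper().replace("U", "T") if c in "ACGT"]
    if not bases:
        return 0.0
    return 100.0 * sum(c in "AT" for c in bases) / len(bases)


def at_rich_windows(seq, window=50, step=10, threshold=60.0):
    """Return (start, end, AT%) for windows at or above the threshold.

    Coordinates are 0-based with an exclusive end. The default window, step
    and threshold are arbitrary illustrative values.
    """
    out = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        pct = at_content(chunk)
        if pct >= threshold:
            out.append((start, start + len(chunk), pct))
    return out
```

Comparing `at_content` of a delimited element such as 55es1 with that of the complete LSU sequence gives the kind of values summarised in Table 4.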

Uncultured Fungi Turned Out to Be Acanthamoeba
BLAST searches using the Acanthamoeba ITS-LSU sequences retrieved from the analysed genomes yielded as close relatives 21 sequences of about 2600 bp (Kallberg et al., unpubl.), plus other shorter sequences of about 880 bp [26], all recovered from soil samples and deposited in GenBank as uncultured fungi. The longer sequences include both the complete ITS region and approx. 1300 bp of the LSU (domain I and part of domain II, up to stem 36), while the shorter sequences consist of the LSU alone (domain I up to H25a). Phylogenetic analysis based on the LSU sequences clearly indicates that these sequences are not fungi, but belong to Acanthamoeba (Figure 2). This misidentification is most likely due to the fact that ITS-LSU fungal sequences are overrepresented in GenBank, while those of Acanthamoeba are absent, except for that of the Neff strain. Five clones emerge within Acanthamoeba T4, closely related to A. quina (T4A), A. terricola (T4G), or species of the T4D genotype. Ten other clones are clearly affiliated with A. palestinensis (T2), clustering in three groups, A to C; these could correspond to the other lineages of the T2/T6 clade. The remaining clones form three distinct lines, labelled groups 1/2, 3 and 4 for convenience; these are difficult to place due to the lack of available sequences for the other genotypes.
The overall tree topology and mean within/between group similarity values for the LSU rDNA domains I/II (Table 5) are largely consistent with results typically obtained using 18S rDNA sequences. Analysis of the ITS region of these clones, which are actually uncultured Acanthamoeba, also shows an interesting distribution by size and by group (Table 6); this is in agreement with the results presented above (Table 1) obtained from the genomes of Acanthamoeba strains.

Within the T2/T6 clade, group B may correspond to T6, a genotype that is sometimes incorrectly named "A. operculata" because a strain of Acanthamoeba was misdiagnosed as Comandonia operculata (actually a synonym of Flamella; for the correct naming of strains and species, see Corsaro [6]). By contrast, group A could be either lineage OX1 or lineage Page-45. In any case, the LSU and ITS data are congruent in supporting the view that the T2/T6 clade is composed of several distinct genotypes, as previously suggested [6,14].
There is a weak indication from preliminary data that group 1/2 could belong to the A. jacobsi or A. culbertsoni groups; however, it represents neither T22 nor T10, the only members of these two groups for which LSU sequences are available. Obviously, the ITS and LSU sequences of additional recognized strains will be required to elucidate their position. On the other hand, groups 3 and 4 remain unclassifiable; they could also correspond to new lines.

ITS Phylogeny
The ITS region (ITS-1-5.8S-ITS-2) and ITS-1 alone from Acanthamoeba were used to assess the diagnostic potential and phylogenetic resolution of these portions. As only fragmentary portions for T7, T10, and T22 could be obtained, these genotypes were excluded from the ITS analysis; the analysis counts 38 complete sequences in total. Furthermore, 23 additional sequences are available for ITS-1; most were obtained from clinical or environmental samples (GenBank ID AF526424-AF526434; AY128512-AY128522) by Köhsler et al. [12]; and another, clone 20A, from a soil sample in Russia (GenBank ID MG706257; Oglodin et al., unpubl.).
The different lineages are all very well recovered by the phylogeny of the entire ITS region (Figure 3a), with a tree topology almost identical to that obtained with the LSU (Figure 2). Many subgroups are also well recovered using only ITS-1, which, however, produces inconsistent trees (Figure 3b). ITS-1 could therefore be useful for identifying certain groups, but not for inferring phylogeny. In addition, the tree based on the ITS region is also in good agreement with that expected according to the 18S genotype. This is particularly evident for relationships between and within the closely related major groups, T4 (T4A to T4G) and the T2/T6 clade, down to the single strains, since for many of them the 18S rDNA sequences are also available, allowing for an in-depth comparison of tree topologies (Figure 4).
Figure 3. Molecular phylogeny of Acanthamoeba based on (a) the entire ITS region and (b) ITS-1 alone (see Tables 1 and 6). Balamuthia mandrillaris was used as an outgroup. At the nodes, bootstrap values (1000 replicates) for ML/NJ/MP are shown; filled and open circles, bootstrap support 100 or >90% with all the methods. *, node recovered but support <50%; -, node not recovered.
Figure 4. Molecular phylogeny of Acanthamoeba based on complete 18S rDNA. The strains for which ITS-1 alone or the entire rDNA operon is also available are shown in red and blue, respectively. The exceptions are T7, T10, and T22, and the BCP strain of T4A (see Table 1). The tree is rooted on Balamuthia mandrillaris; bootstrap values (1000 replicates) for ML/NJ/MP are shown at the nodes; filled and open circles, bootstrap support 100 or >90%. *, node recovered but support <50%; -, node not recovered.

Conclusions
Phylogenetic analyses based on ITS and LSU largely support the results obtained with 18S. Separation within T4 into distinct groups, T4A to T4G, is always observed, except for some mixing between T4A and T4B, already reported for nuclear and mitochondrial SSU rDNA sequences [6,11]. The close affinity of the Linc-AP1 strain with A. lugdunensis (T4A) and not with A. polyphaga (T4E) [6] is confirmed. Moreover, the C3 strain turns out not to belong to A. castellanii, but to a distinct branch of T4A; this is evidenced by the 18S phylogeny (Figure 4). Similar results are obtained for the T2/T6 clade, for which the same lineages can be identified by the different portions of the rDNA operon. The LSU presents a variability comparable to that of 18S, or even slightly higher if the complete gene is considered (Table 3). Specific regions for the different genotypes and subgroups can already be identified in the partial LSU (domains I and II), which, because of its small size (about 1300 bp), would be easier to sequence while still providing useful data for diagnosis and phylogenetics.
Various group I introns are present in the 18S of at least four genotypes (T3, T4, T5, and T15) [28,29]. However, they were not found in the LSU sequences analysed here; nonetheless, this does not exclude the possibility that other LSU may have introns.
ITS-1 was previously found to be tenfold more variable and correlated with 18S genotypes [12]; similar rate variability (Table 2) and genotype correlation (Figure 3) were found here for ITS-2, suggesting that these rDNA portions may be useful for the molecular identification of strains. Alignment of ITS-1 and ITS-2 is, however, difficult due to the large variations in length and sequence between the strains, and the results vary depending on the program used. Including both 5.8S rDNA and additional sequences greatly improves the alignment and produces a more reliable phylogenetic result. It can be expected that obtaining the ITS sequences of the remaining genotypes will make it possible to elucidate their secondary structure and to better identify the homologous regions to be retained in the multiple alignment.
ITS and partial LSU both clearly show their utility for Acanthamoeba sequence analysis; in addition, their use, separately or in combination, appears to better discriminate closely related strains. A major objective would obviously be to obtain the complete ITS and LSU sequences of at least one strain of each genotype. It would thus be possible to verify if the hidden break in the LSU occurs in all lineages, as well as to build a robust rDNA operon tree to better resolve the phylogeny.