Article

Inteins at Eleven Distinct Insertion Sites in Archaeal Helicase Subunit MCM Exhibit Varied Architectures and Activity Levels Across Archaeal Groups

by Danielle Arsenault 1,*, Gabrielle F. Stack 1 and Johann Peter Gogarten 1,2,*

1 Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06268-3125, USA
2 Institute for Systems Genomics, University of Connecticut, Storrs, CT 06268-3125, USA
* Authors to whom correspondence should be addressed.
Submission received: 25 May 2025 / Revised: 22 July 2025 / Accepted: 31 July 2025 / Published: 14 August 2025

Abstract

Background/Objectives: Inteins are mobile genetic elements invading highly conserved genes across all domains of life and viruses. Five active intein insertion sites (MCM-a through e) had previously been identified and studied in the archaeal replicative helicase minichromosome maintenance (MCM) subunit gene mcm, making MCM an ideal system for dissecting the dynamics of multi-intein genes. However, work in this system thus far has been limited to particular archaeal groups. To better understand the dynamics and diversity of these inteins, MCM homologs spanning all archaeal groups were extracted from NCBI’s non-redundant protein sequence database, and the distribution and structural architectures of their inteins were characterized. Methods: The amino acid sequences of 4243 archaeal MCM homologs were retrieved from NCBI’s non-redundant protein sequence database. These sequences were systematically assessed for their intein content through within-group multiple sequence alignments. To characterize the inteins present at each site, extensive intein structure predictions and comparisons were performed. Phylogenetic analyses were used to investigate intein relatedness between and within sites, as well as the distribution of different MCM inteins in geographically overlapping populations of archaea. Results: In total, 11 active MCM intein insertion sites were identified, expanding on the previously known five. The insertion sites have varied invasion activity levels across archaeal groups, with Nanobdellati (DPANN) being the only group with all 11 sites active. In all but two (Methanonatronarchaeia and Hadarchaeota) of the archaeal groups studied where inteins were present, at least one MCM homolog was invaded by more than one intein. With respect to intein structure, within-intein insertions bearing semblance to DNA-binding domains were identified, with varied presence between inteins. 
Additionally, a study of archaeal MCM sequences from samples collected in the Atacama Desert in June 2013 revealed high MCM intein diversity levels. Conclusions: We identified six new active intein insertion sites in archaeal MCM, more than doubling the five previously known sites. All eleven intein insertion sites were either close to the ATP binding site or lined the channel through which the single-stranded DNA is pulled during the catalytic cycle of the helicase. Many of the analyzed inteins contained insertions bearing similarity to DNA-binding helix-turn-helix domains, suggesting potential involvement in the intein homing process. Additionally, the high levels of MCM intein diversity observed in archaea from the Atacama Desert provide novel and strong support for a co-existence model of intein persistence.

1. Introduction

Inteins are mobile genetic elements invading highly conserved genes throughout all domains of life and viruses. An intein invades its host gene at the DNA level, similar to an intron [1]. Unlike introns which are spliced out at the RNA level, inteins splice themselves out post-translationally at the protein level using an autocatalytic self-splicing reaction enabled by the intein’s self-splicing domain. During protein splicing, the intein is able to seamlessly ligate the two halves of its host protein back together, allowing the host protein to function [2]. This natural capacity of inteins to engage in seamless protein splicing has made them invaluable tools in the development of protein engineering technology [3]. Recent large-scale intein characterization studies have revealed increasingly diverse intein architectures, such as varied architectures in the inteins of phages [4]. Such novel intein architectures hold the potential for new technological applications, emphasizing the importance of continued mass intein-characterization efforts.
Along with their novel biochemical capabilities, inteins also engage in unorthodox evolutionary behaviors. In addition to the self-splicing domain, full inteins contain a central homing endonuclease domain which bestows them with the ability to be inherited at super-Mendelian frequencies through the process of homing [5]. When in the presence of an uninvaded copy of the intein’s host gene, the homing endonuclease domain will make a double-strand DNA break at the intein insertion site. Then, through homologous recombination-based DNA repair, the intein-containing copy of the gene can be used as a template leading the intein DNA sequence to be pasted into the previously uninvaded copy. As a result, homing allows inteins to rapidly proliferate through a population in spite of their fitness cost to the host [6]. Per the Goddard-Burt life cycle of a homing endonuclease-driven selfish genetic element such as an intein, once the element reaches saturation in a population and there are no more empty target sites, the homing endonuclease is no longer under selective pressure to be maintained [5]. Once the homing endonuclease of an intein has severely decayed beyond function, it is referred to as a mini intein. In the Goddard-Burt model, the mini intein is eventually lost from the population. Recent models have expanded on the Goddard-Burt model to suggest the co-existence of the three states (intein-free, full intein-containing, and mini intein-containing) as opposed to a synchronized progression through the states [7,8], but to date no evidence of such co-existence of all three states in a single population has been shown [9].
Archaea, which experience high levels of horizontal transfer [10,11,12], contain inteins in a wide range of genes [13]; however, the majority of intensive archaeal intein characterization efforts have been focused on a few select groups such as haloarchaea. In addition to being an understudied group ripe for the exploration of intein architectures and evolutionary dynamics, archaea offer a unique area of intein exploration in which both architecture and evolution can be studied: genes invaded by multiple inteins simultaneously [14,15]. The archaeal gene mcm, previously also referred to as cdc21, encodes the minichromosome maintenance (MCM) subunit of replicative DNA helicase [16,17]. It was previously known to contain five intein insertion sites named MCM-a through e respectively. Inteins at sites MCM-a through d have been the subject of intein insertion site recognition and self-splicing investigations, particularly in haloarchaea [15,18]. Insertion site MCM-e is not invaded in any haloarchaea analyzed to date, but the site is active in some groups of non-haloarchaea. The MCM-e intein published to the intein database InBase 2.0 [19] in 2012 was from Thermococcus litoralis, with the insertion site name CDC21-e [20]. An analysis of the MCM inteins at sites MCM-a through d in haloarchaeal MCM homologs from the order Haloferacales revealed a wide array of intein invasion statuses (empty, single, double, triple, and quadruple), mini and full inteins at the same insertion sites in different homologs, and sporadic distribution of the four inteins across the host protein phylogeny [15]. The diversity of MCM inteins in this archaeal order alone begs the question of whether such patterns will hold when a similar analysis is performed on other archaeal lineages, and whether such diversity can also be found in a single group of geographically overlapping populations of archaea as opposed to a mass sampling of sequences from a wide array of timepoints and geographic locations.
To address these questions of intein architectural diversity, distribution patterns, and evolutionary dynamics at the population level, we gathered 4243 complete archaeal MCM homologs from NCBI’s non-redundant protein sequence database. To obtain as accurate a description of the MCM inteins across all archaea as possible with available data, an iterative search approach was used to thoroughly sample all available groups of archaea. A combination of sequence alignment and predicted structure-based analyses was used to characterize the inteins at all sites, through which six new archaeal MCM intein insertion sites were discovered, raising the total from five to eleven documented MCM intein insertion sites. These new insertion sites all fall within the same catalytic domains of MCM as the known five (MCM-a through e). The sites are not all actively invaded in all archaeal groups, with Nanobdellati (DPANN) being the only group to have at least one intein at all 11 MCM intein insertion sites. Our structural analyses revealed three sites within the MCM inteins where insertions resembling DNA-binding domains are found. These within-intein insertions vary in presence and size, adding a second facet to the inteins’ architectural diversity beyond the status of their homing endonucleases (mini or full). These insertions further expand the repertoire of known intein architectures. Additionally, within the investigated dataset were 26 haloarchaeal sequences from the Atacama Desert, all sampled in June of 2013 as part of a metagenomic study by Finstad et al. [21]. This single group of geographically overlapping archaeal populations had greatly diverse MCM intein compositions, including no, mini, and full inteins at the same site in different individuals. Such a mixture of alleles has not yet been reported; it strongly supports the co-existence model of intein persistence and captures the varied histories of inteins found at the different sites of a multi-intein gene. 
Thus, these findings mark an important advancement in our understanding of intein evolutionary dynamics and persistence across interacting populations.

2. Materials and Methods

Retrieving and curating amino acid sequence collection of archaeal MCM homologs. Using the MCM extein (host protein only, inteins removed in silico) sequence of Haloferax mediterranei (Protein Accession: WP_004058379.1) as the query sequence, position-specific iterated basic local alignment search tool (PSI-BLAST) [22] searches were performed against NCBI’s non-redundant protein sequence database with a maximum of 500 target sequences and an e-value cutoff of 0.0001. No more than five iterations were allowed, and the matches to be used for each subsequent iteration were manually pruned to remove partial MCM sequences (less than 600 amino acids (aa)) and any non-MCM sequences. Each search was restricted to a different taxonomic group, such that effective sampling could be performed even for highly sequenced groups. After combining the smaller subsets of matches into taxonomically relevant groups (e.g., combining the four orders of Haloarchaea into a single Haloarchaea subgroup), 16 subgroups were formed: Haloarchaea (taxid 183963), Methanomicrobia (taxid 224756), Methanoliparia (taxid 2545688), Archaeoglobi (taxid 183980), Methanonatronarchaeia (taxid 171536), Thermoplasmatota (taxid 2283796), Nanohaloarchaea (taxid 1051663), Nanobdellati (DPANN) (taxid 1783276), Theionarchaea (taxid 1980645), Methanofastidiosa (taxid 1705400), Thermococci (taxid 183968), Hadarchaeota (taxid 3055124), Thermoproteati (TACK) (taxid 1783275), Promethearchaeati (Asgard) (taxid 1935183), and Hydrothermarchaeota (taxid 1935019).
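The length-based part of the pruning step can be illustrated with a short sketch. This is not the authors' code: the function name, the dictionary layout, and the toy accessions are hypothetical, and only the 600 aa threshold comes from the text. Removal of non-MCM sequences was done manually and is not modeled here.

```python
from typing import Dict

# Threshold from the Methods: hits shorter than 600 aa were treated as partial.
MIN_FULL_LENGTH_AA = 600


def drop_partial_hits(hits: Dict[str, str],
                      min_len: int = MIN_FULL_LENGTH_AA) -> Dict[str, str]:
    """Return only the hits whose sequence is at least min_len residues long."""
    return {acc: seq for acc, seq in hits.items() if len(seq) >= min_len}


# Toy input: hypothetical accessions mapped to placeholder sequences.
toy_hits = {
    "HYPO_0001": "M" * 712,  # kept
    "HYPO_0002": "M" * 598,  # dropped: shorter than 600 aa
    "HYPO_0003": "M" * 600,  # kept: exactly at the threshold
}

kept = drop_partial_hits(toy_hits)
print(sorted(kept))
```

In practice the input would be parsed from the PSI-BLAST hit sequences before each new iteration; the parsing step is omitted to keep the sketch self-contained.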
Combined sequence and structure-based approach for characterizing the architectures of all inteins at each insertion site. For each of the 16 sets of MCM homologs, the sequences were initially aligned using MUSCLE [23] in SeaView [24] with slight manual adjustments to clarify intein versus extein (host protein) boundaries. For more complex cases such as Nanobdellati (DPANN), where all 11 intein insertion sites are active to varying degrees, none of the alignment algorithms tried (MUSCLE, clustalo [25], and MAFFT [26]) aligned the sequences properly. For these cases, more extensive manual adjustments were required to establish the intein-host protein boundaries. These alignments were never used directly for phylogenetic reconstruction; they were used to establish boundaries between host protein and intein sequences, which were then extracted and re-aligned algorithmically for further analyses. For each intein insertion site within each taxonomic group sampled, the largest intein at the site was extracted, de-aligned, and used as input for AlphaFold3 [27]. Through this process, three sites within the inteins which occasionally contained insertions were identified: Insert Site 1 at the end of the N-terminal portion of the self-splicing domain, Insert Site 2 at the start of the C-terminal portion of the self-splicing domain, and Insert Site 3 ~12aa after Insert Site 2. Guided by the predicted structure of the largest intein, the homing endonuclease LAGLIDADG motif blocks and any within-intein insertions (Insertions 1, 2, and/or 3) were marked as selectable Sites in the sequence alignment in SeaView. By defining these Sites, each intein could be characterized based on its homing endonuclease and insertion architecture. The insertions were categorized as either small (20aa–60aa) or large (greater than 60aa) to capture the size variation observed between insertions at the same sites in different inteins. 
The minimum cutoff of 20aa was chosen based on the minimum length of a helix-turn-helix DNA-binding domain [28]. The sequence alignments for each group with declared Sites are provided as Nexus (.nxs) files in Supplemental Data S1. The NCBI Protein Accession numbers are provided in the annotation line of every sequence. The inteins at each site were extracted into joined files, where Sites indicating Inserts 1, 2, and 3 were established in SeaView (Nexus (.nxs) files available in Supplemental Data S2).
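The size categories described above can be written down as a small helper. This is a minimal sketch rather than the authors' procedure, which relied on Sites marked by hand in SeaView. The function names and the label format are hypothetical, and treating anything shorter than 20 aa as "none" is an assumption that follows from the stated minimum cutoff.

```python
from typing import Sequence


def classify_insertion(length_aa: int) -> str:
    """Bin a within-intein insertion by length: small is 20-60 aa, large is >60 aa."""
    if length_aa < 0:
        raise ValueError("length must be non-negative")
    if length_aa < 20:
        return "none"  # assumption: below the 20 aa cutoff is not counted
    if length_aa <= 60:
        return "small"
    return "large"


def architecture_label(he_status: str, insert_lengths: Sequence[int]) -> str:
    """Combine homing endonuclease status with the three insertion categories."""
    if he_status not in ("mini", "full"):
        raise ValueError("he_status must be 'mini' or 'full'")
    if len(insert_lengths) != 3:
        raise ValueError("expected lengths for Insert Sites 1, 2, and 3")
    parts = [
        f"Insert {i}: {classify_insertion(n)}"
        for i, n in enumerate(insert_lengths, start=1)
    ]
    return f"{he_status} intein; " + ", ".join(parts)


# Hypothetical intein: no Insert 1, a 45 aa Insert 2, and a 72 aa Insert 3.
print(architecture_label("full", (0, 45, 72)))
```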
Unrooted amino acid sequence phylogenies. All phylogenies generated for this work should be considered unrooted, and are arbitrarily rooted when presented as such. For construction, the respective alignments were used as input for IQ-TREE2 [29], allowing ModelFinder [30] to identify the best fit model, and with 1000 replicates of ultrafast bootstrapping [31]. The selected models are provided in the figure legends for each respective phylogeny. Treefiles were visualized using FigTree v.1.4.4 and Inkscape v.1.2.2.
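For readers who want to reproduce a comparable run, the sketch below assembles an IQ-TREE 2 command with ModelFinder model selection and 1000 ultrafast bootstrap replicates. The exact command line used by the authors is not given in the text, so the flags (`-s`, `-m MFP`, `-B`) are an assumption based on standard IQ-TREE 2 usage, and the alignment file name is hypothetical. The command is only built and printed, not executed.

```python
import shlex
from typing import List


def iqtree2_command(alignment_path: str,
                    bootstrap_replicates: int = 1000) -> List[str]:
    """Build an IQ-TREE 2 argument list (assumed flags, not the authors' exact call)."""
    return [
        "iqtree2",
        "-s", alignment_path,             # input alignment
        "-m", "MFP",                      # ModelFinder picks the best-fit model
        "-B", str(bootstrap_replicates),  # ultrafast bootstrap replicates
    ]


# Hypothetical alignment file name, for illustration only.
cmd = iqtree2_command("mcm_inteins_aligned.fasta")
print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would launch the analysis on a machine where `iqtree2` is installed.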

3. Results

Analysis of MCM homologs across archaea reveals six new active MCM intein insertion sites. To investigate the abundance, structural features, and distribution of archaeal MCM inteins, 4243 archaeal MCM homologs from NCBI’s non-redundant protein sequence database were systematically collected. The domain Archaea was divided into subgroups following NCBI’s Taxonomy Browser classifications [32,33,34], with more heavily sampled groups such as Stenosarchaea broken down into smaller groups for more thorough sampling. With thorough manual inspection of the sequence alignments generated for each subgroup, 11 distinct MCM intein insertion sites were identified (Figure 1). To our best knowledge, the only previously reported archaeal MCM intein insertion sites were MCM-a, b, c, d, and e. The new sites are all located in the same catalytic region as the known five (Figure 1A–C), owing to inteins’ propensity to invade highly conserved regions of genes [35]. The intein insertion sites mainly cluster around motifs especially important for ATP binding: the Walker A, Walker B, and arginine finger motifs [36]. The remaining intein insertion sites are in the channel through which single-stranded DNA is threaded during DNA replication (Figure 1C). Following the naming convention used for these intein insertion sites thus far, with slight alteration due to the very close proximity of two new sites to two pre-existing sites, we refer to these new sites as MCM-f, MCM-g, MCM-h, MCM-i, MCM-d1, and MCM-e1. MCM-f through i are named in order of discovery and not their position in the linear sequence (Figure 1D), as this has been followed for naming the previously known sites. MCM-d1 and e1 are distinctly different from but very close to MCM-d and e (one residue upstream), thus we felt it beneficial to stray slightly from the traditional naming convention to reflect this. 
In this work, we refer to the original MCM-d and e as MCM-d2 and e2 due to them being one residue downstream of MCM-d1 and e1 respectively. Due to lack of sequence variation in the three identified MCM-d1 inteins, using phylogenetic reconstruction to further cement them as inteins of a separate insertion site than the MCM-d2 (d) inteins was not possible using an alignment of the MCM-d1 and d2 (d) inteins alone. However, there was sufficient variation among the inteins found at MCM-e1, allowing all MCM-e1 and MCM-e2 (e) inteins to be extracted, re-aligned, and used for phylogenetic reconstruction (Figure S1). In the resulting phylogeny, the MCM-e1 inteins group together as opposed to grouping with the MCM-e2 inteins from their respective archaeal groups (Nanobdellati (DPANN) and Promethearchaeati (Asgard)), providing further support for the MCM-e1 inteins being distinct from MCM-e2 (e).
Varied invasion levels and distinct evolutionary histories at each MCM intein insertion site. After establishing the positions of all MCM intein insertion sites, the invasion levels for each site across each archaeal group were assessed (Figure 2A). Out of the 16 archaeal subgroups, 14 had at least one actively invaded MCM intein insertion site. The only group in which all 11 sites are active, meaning at least one homolog from the group contains an intein at that site, is Nanobdellati (DPANN). Nanobdellati (DPANN) is also the only group with an active MCM intein insertion site which is inactive in all other groups (MCM-f). In contrast to MCM-f whose activity is seemingly limited to Nanobdellati (DPANN), in all intein-containing groups except for Hadarchaeota, at least one homolog had an intein at MCM-c. All new MCM intein insertion sites are less populated with inteins than the previously known sites. Similarly, instances of multi-intein invasions more frequently involved previously known sites, with all quadruple invasions involving sites MCM-a, b, and c, with the fourth occupied site either being MCM-d2 (d) or e2 (e) (Table S1). In total, ~73.5% of homologs (3125) had no inteins, ~17% (709) had one intein, ~7% (305) had two inteins, ~2% (79) had three inteins, and ~0.5% (25) had four inteins (Table 1). While having 11 intein insertion sites and accounting for an intein status of empty, mini, or full at each site yields 3^11 (177,147) theoretically possible MCM intein combinations, only 105 were observed. Out of those 105, 37 of the arrangements were observed in only one homolog each. All observed combinations of MCM inteins and their occurrences are available in Table S1.
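The arithmetic in this paragraph can be checked with a few lines of Python. The sketch below uses only the counts quoted in the text; the variable names are illustrative.

```python
# Three possible states (empty, mini, full) at each of 11 insertion sites.
N_SITES = 11
STATES_PER_SITE = 3
theoretical_combinations = STATES_PER_SITE ** N_SITES

# Number of homologs carrying 0, 1, 2, 3, or 4 inteins, as quoted in the text.
homologs_by_intein_count = {0: 3125, 1: 709, 2: 305, 3: 79, 4: 25}
total_homologs = sum(homologs_by_intein_count.values())

print(f"theoretical combinations: {theoretical_combinations:,}")
print(f"total homologs: {total_homologs}")
for n_inteins, n_homologs in homologs_by_intein_count.items():
    share = 100 * n_homologs / total_homologs
    print(f"{n_inteins} inteins: {n_homologs} homologs ({share:.2f}%)")

# Share of the theoretical space that was actually observed.
OBSERVED_COMBINATIONS = 105
observed_share = 100 * OBSERVED_COMBINATIONS / theoretical_combinations
print(f"observed share of theoretical space: {observed_share:.3f}%")
```

The five counts sum to the 4243 homologs in the dataset, and the 105 observed combinations cover well under 0.1% of the theoretical space.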
In addition to assessing intein invasion levels at each site, phylogenetic analysis was performed to assess grouping patterns of the MCM inteins (Figure 2B). Inteins at sites MCM-g, b, e1, d1, d2 (d), and f are monophyletic. The MCM-e2 (e) inteins all group together, with the MCM-e1 intein group emerging from them, adding further support to the differentiation between MCM-e1 and e2 (e) inteins despite their insertion sites being a single residue apart. This analysis also strengthens confidence in the differentiation between MCM-d1 and d2 (d) inteins, as the MCM-d1 inteins exhibit evolutionary distance from the MCM-d2 (d) inteins. The MCM-d1 inteins emerge from a group of MCM-a inteins. An additional small group of MCM-a inteins from which the MCM-h and i intein groups emerge is observed, but the majority of MCM-a inteins group together. The MCM-c inteins all group together, with the MCM-f inteins emerging from them. Overall, the inteins group strongly by insertion site as opposed to archaeal group.
Decaying versus full homing endonucleases and within-intein insertions at three distinct sites generate architectural diversity. For each MCM intein insertion site in each of the 16 groups of homologs, all inteins were categorized as either mini (no detectable homing endonuclease domain) or full (detectable homing endonuclease domain with both LAGLIDADG motifs). Mini inteins were only identified at sites MCM-a, b, c, d2, e1, and e2. For those sites, the number of full inteins found was 1.5 to 8 times more than the number of mini inteins found (Figure S3). Using a combined sequence and predicted structure-based approach to define the domains of the inteins found at each site, inteins with additional domains beyond a homing endonuclease and self-splicing domain were identified. These within-intein insertions were identified in both full and mini inteins. Across all 1656 inteins investigated, three distinct sub-insertion sites within the inteins were identified: Insertion 1 located at the end of the N-terminal portion of the self-splicing domain; Insertion 2 at the beginning of the C-terminal portion of the self-splicing domain; Insertion 3 located ~12aa downstream of Insertion 2, placing it just after a conserved beta-strand in the C-terminal portion of the self-splicing domain [38,39] (Figure 3). Accounting for homing endonuclease and within-intein insertion status (mini or full intein; no, small, or large Insert 1; no, small, or large Insert 2; no, small, or large Insert 3) a total of 17 architectural variants were identified. The distribution of these architectural variants across the MCM intein insertion sites was assessed (Figure 4). Certain MCM intein insertion sites exhibited little variation in the architecture of their inteins, such as MCM-g which contained only full inteins with no insertions. 
This homogeneity is not due to limited distribution, as MCM-g inteins are present across several archaeal groups: Thermoplasmatota, Nanobdellati (DPANN), Hadarchaeota, and Thermoproteati (TACK) (Figure 2). In contrast, site MCM-b contained seven architectural variants. Insertion 3 was only identified in inteins located at site MCM-d2 (d).
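To put the 17 observed architectural variants in context, the categories named above (mini or full intein; no, small, or large insertion at each of the three insert sites) can be enumerated. The sketch below is illustrative only; the tuple layout and names are hypothetical, and the count of 17 is taken from the text.

```python
from itertools import product

HE_STATES = ("mini", "full")
INSERT_STATES = ("none", "small", "large")

# Every combination of homing endonuclease status and Insert 1/2/3 category.
all_architectures = list(
    product(HE_STATES, INSERT_STATES, INSERT_STATES, INSERT_STATES)
)

OBSERVED_VARIANTS = 17  # reported in the Results
print(f"theoretical architectures: {len(all_architectures)}")
print(f"observed: {OBSERVED_VARIANTS} of {len(all_architectures)}")
```

Under these categories there are 2 × 3 × 3 × 3 = 54 possible architectures, so roughly a third of the theoretical space was observed.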
Geographically overlapping populations in the Atacama Desert have a wide range of MCM intein architectures and invasion statuses including co-existing empty, mini intein, and full intein alleles. To investigate models of intein persistence which involve co-existence of intein-free, mini-intein, and full-intein alleles [9], the haloarchaeal Atacama Desert sequences generated during the halite metagenome-based project of Finstad et al. [21] were utilized. All samples were collected from three regions in the Atacama Desert in Chile during June of 2013. From their sequence data, we identified 26 complete haloarchaeal MCM homologs, all of which were part of our collection of 4243 MCM homologs. While the sequences are classified only as Halobacteriales archaea through the Finstad et al. project, we were able to provide more certainty on the genus-level identities of 24/26 archaea by comparing to sequences of haloarchaea in our dataset with known genus-level identities (Figure S4). By these classifications, these archaeal populations span the genera Salinarchaeum, Natronomonas, Halovenus, Halostella, Halosimplex, Halosegnis, Halorussus, Halorubrum, Halomicrobium, Halomarina, Halococcus, Halobaculum, and Haloarcula. Mapping the intein presence and architectures of these sequences onto a phylogeny of the MCM host proteins reveals a mixture of vertical inheritance and horizontal transfer, and varied intein architectures at a single site in closely related individuals (Figure 5). All degrees of MCM intein invasion except quadruple (empty, single, double, and triple) are observed in the population, as well as mini and full inteins at sites MCM-a and d2 (d). 
The Atacama Desert sequences provide concrete evidence for the co-existence of empty, mini-intein, and full-intein alleles in geographically overlapping populations of archaea from a single time period (June 2013), with each intein insertion site exhibiting different degrees of balance between the three alleles owing to the unique evolutionary histories of inteins at each site.

4. Discussion

Archaeal MCM is a powerful system for the continued exploration of multi-intein gene dynamics. Our work presents six previously unknown MCM intein insertion sites and provides extensive characterization of the frequencies and architectures of inteins found at each site across archaea. We find the maximum degree for invasion of the mcm gene to be four inteins, despite many archaeal groups having more than four active MCM intein insertion sites. Similar caps on intein invasion have been observed for other multi-intein genes. The archaeal gene polB, encoding B-type DNA polymerase which is involved in DNA replication [40,41], has up to three inteins simultaneously invading investigated copies of the gene [14]. Additionally, a gene encoding a bacterial ribonucleotide reductase (RIR), an important enzyme for DNA synthesis [42,43], has up to four inteins in investigated copies [44]. Interestingly, the previous work in archaeal polB [14] revealed Haloquadratum walsbyi to harbor the highest degree of intein invasion (triple), which is also the case for several strains of Haloquadratum walsbyi analyzed in this study (invaded at sites MCM-a, b, c, and d2 (d)); however, the highest degree of invasion is still ultimately the rarest configuration in both polB and mcm. Nevertheless, Haloquadratum walsbyi’s apparent proclivity towards highly invaded intein-containing genes makes this a species of interest for future intein fitness cost investigations. Previous work investigating the fitness cost associated with harboring an intein, also carried out in haloarchaea, found a 7% fitness cost associated with the intein found at position c in the aforementioned polB archaeal gene [6,14]. Future studies focusing on fitness costs associated with more than one intein will reveal whether each intein accrues a constant fitness cost, or whether there is a compounding effect. 
As mentioned, individuals with the highest degree of intein invasion are far rarer than those with fewer inteins. An open question is how the fitness cost per intein changes when parameters such as homing endonuclease status (degraded in mini inteins versus intact in full inteins) and sequence similarity between the insertion sites of the investigated inteins are accounted for. We hypothesize that mini inteins accrue a lesser fitness cost because they require fewer nucleotide and amino acid resources, and that inteins with less-similar insertion sites co-populate more effectively because of a decreased risk of non-specific intein endonuclease activity; future experiments will be needed to test these hypotheses.
Our investigation of the geographically overlapping populations of archaea in the Atacama Desert using the MCM sequences of Finstad et al. [21] revealed an intriguing composition which contradicts the overall observed trend: eight intein-free individuals, three single-intein individuals, twelve double-intein individuals, three triple-intein individuals, and no quad-intein individuals. Thus, across these populations, which were all sampled in June 2013, it is more common to have two MCM inteins than it is to have one or zero. The twelve double-intein individuals exhibit eight different intein configurations and span eight known genera (Figure 5), ruling out population homogeneity as an explanation for this deviation from the trend of fitness cost accrued with each intein. Ultimately, this is a small sample (26 sequences) compared to the main dataset analyzed (4243 sequences), and future assessments of MCM inteins in geographically overlapping populations of archaea on a larger scale will be needed to further tease apart these trends in intein fitness cost. Thus, the six additional MCM intein insertion sites presented in this work offer new avenues for exploring molecular and evolutionary dynamics between inteins co-inhabiting a gene. In addition to these newly discovered intein insertion sites, the range of MCM intein architectures discovered opens avenues for further investigation of the biochemical versatility of inteins with additional, potentially DNA-binding, domains.
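The contrast between the Atacama Desert sample and the full dataset can be made explicit by comparing the two intein-count distributions. The sketch below uses only counts quoted in the text; the helper names are illustrative.

```python
from typing import Dict

# Homologs carrying 0-4 inteins: full dataset versus Atacama Desert sample.
overall = {0: 3125, 1: 709, 2: 305, 3: 79, 4: 25}
atacama = {0: 8, 1: 3, 2: 12, 3: 3, 4: 0}


def total_inteins(counts: Dict[int, int]) -> int:
    """Total number of inteins implied by a count distribution."""
    return sum(n_inteins * n_homologs for n_inteins, n_homologs in counts.items())


def mean_inteins(counts: Dict[int, int]) -> float:
    """Average number of inteins per homolog."""
    return total_inteins(counts) / sum(counts.values())


for label, counts in (("overall", overall), ("Atacama", atacama)):
    n = sum(counts.values())
    most_common = max(counts, key=counts.get)
    print(f"{label}: n={n}, inteins={total_inteins(counts)}, "
          f"mean per homolog={mean_inteins(counts):.2f}, "
          f"most common intein count={most_common}")
```

The full-dataset total works out to 1656 inteins, matching the number of inteins investigated in the Results, and the most common state shifts from zero inteins overall to two inteins in the Atacama sample.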
Potential origin and role of the within-intein insertions. Archaea utilize several small DNA-binding proteins for transcriptional regulation, with some even interacting directly with MCM [45]. The core domain responsible for the DNA-binding abilities of these proteins is a helix-turn-helix domain [28,46,47], to which the within-intein insertions found within many of the MCM inteins presented in this work bear strong resemblance in predicted structures (Figure 6A). Thus, the pool of small DNA-binding helix-turn-helix proteins encoded in archaeal genomes could potentially be the source of some of these insertions. Within inteins, such additional inserted domains have been observed in a region analogous to Insertion 1 in this work: at the end of the N-terminal portion of the self-splicing domain. In the crystal structure of the yeast vacuolar ATPase intein PI-SceI (Protein Databank (PDB) entry 1LWS [48]), such an insertion is present (Figure 6B). Work preceding the solving of PDB 1LWS implicated this region in DNA recognition and binding [49,50], with the subsequent solving of the crystal structure confirming direct interaction between this region and the target DNA sequence [48]. Thus, it is possible that the insertions within the MCM inteins are involved in the homing process, potentially in stabilizing the binding of the intein to its target DNA. Future work can build on our findings to assess the individual roles of each of the three types of within-intein insertions of MCM inteins.
Atacama Desert archaea provide support for co-existence model of intein persistence. Several models have been proposed to explain the life cycles of inteins in populations, with more recent proposals expanding on the Goddard-Burt homing cycle to suggest the co-existence of the three alleles (intein-free, full intein-containing, and mini intein-containing) in populations as a means of intein persistence [5,9]. To obtain evidence in support of or against the co-existence models, sequences from geographically overlapping populations sampled within a short timeframe serve as an ideal dataset to analyze. The Atacama Desert samples investigated in this work provide for the first time a clear picture of the MCM intein dynamics in a group of geographically overlapping archaeal populations. In these archaea, there is a balance of empty, mini, and full alleles for sites MCM-a and d2 (d), and empty and full alleles for sites MCM-b and c (the other seven sites are inactive, which is true of all haloarchaea analyzed). Thus, the MCM inteins in these populations operate in a manner more in line with a co-existence model for intein persistence [7,8,9], rather than the synchronized Goddard-Burt life cycle [5]. Future investigations of intein dynamics within geographically overlapping populations will continue to shed light on the frequencies of co-existence versus synchronized progression modes for intein persistence in natural populations.

5. Conclusions

Through this work, we characterized six new active intein insertion sites in the MCM subunit of archaeal replicative DNA helicase. By thoroughly characterizing the distributions and architectures of inteins found at all of the archaeal MCM intein insertion sites, we observed varying degrees of site invasion activity between archaeal groups and a wide range of structural architectures. The within-intein insertions responsible for part of this architectural range bear similarity to DNA-binding proteins, suggesting a potential role in intein homing. Additionally, the MCM intein diversity observed in archaea from the Atacama Desert supports a co-existence model of intein persistence in these populations, wherein each MCM intein insertion site exhibits a different arrangement of balance between empty, mini, and full alleles. Future endeavors to characterize the MCM intein content of more geographically overlapping archaeal populations, as well as the intein content of other multi-intein genes, will continue to build our understanding of how intein distributions are balanced to maintain intein persistence.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/dna5030039/s1, Figure S1: Unrooted phylogeny of MCM-e1 and MCM-e2 (MCM-e) inteins; Figure S2: Collapsed MCM intein phylogeny with ultrafast bootstrap support values; Figure S3: Total mini and full inteins observed at each MCM intein insertion site for each archaeal group; Figure S4: Unrooted phylogeny of haloarchaeal MCM host proteins with Atacama Desert sequences indicated; Table S1: All observed MCM site combinations; Supplemental Data S1: all MCM protein sequences used in this study with protein accession numbers in each annotation line, organized by archaeal taxonomy; Supplemental Data S2: all identified MCM intein protein sequences investigated in this study, organized by MCM intein insertion site; Supplemental Data S3: alignments and tree files for all phylogenies mentioned in this work.

Author Contributions

Conceptualization, D.A. and J.P.G.; methodology, D.A., G.F.S. and J.P.G.; software, D.A., G.F.S. and J.P.G.; validation, D.A., G.F.S. and J.P.G.; formal analysis, D.A. and G.F.S.; investigation, D.A., G.F.S. and J.P.G.; resources, D.A., G.F.S. and J.P.G.; data curation, D.A. and G.F.S.; writing—original draft preparation, D.A., G.F.S. and J.P.G.; writing—review and editing, D.A., G.F.S. and J.P.G.; visualization, D.A.; supervision, D.A., G.F.S. and J.P.G.; project administration, J.P.G.; funding acquisition, J.P.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The protein sequences analyzed in this study are publicly available on NCBI’s Protein Sequence Database. The individual Protein Accession numbers for each sequence are available in the annotations for each sequence in the provided alignments (Supplemental Data S1).

Acknowledgments

We thank the Computational Biology Core at the University of Connecticut for maintaining and managing the Xanadu computing clusters, which were a critical resource for many of the analyses completed in this work.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
aa          amino acid
LAGLIDADG   one letter abbreviations of amino acids in a particular homing endonuclease motif
MCM         minichromosome maintenance protein, also previously known as cdc21
PDB         Protein Databank
pLDDT       predicted local distance difference test
PSI-BLAST   position-specific iterated basic local alignment search tool
RIR         ribonucleotide reductase
taxid       Taxonomy ID

References

  1. Hirata, R.; Ohsumi, Y.; Nakano, A.; Kawasaki, H.; Suzuki, K.; Anraku, Y. Molecular structure of a gene, VMA1, encoding the catalytic subunit of H+-translocating adenosine triphosphatase from vacuolar membranes of Saccharomyces cerevisiae. J. Biol. Chem. 1990, 265, 6726–6733.
  2. Kane, P.M.; Yamashiro, C.T.; Wolczyk, D.F.; Neff, N.; Goebl, M.; Stevens, T.H. Protein Splicing Converts the Yeast TFP1 Gene Product to the 69-kD Subunit of the Vacuolar H+-Adenosine Triphosphatase. Science 1990, 250, 651–657.
  3. Wang, H.; Wang, L.; Zhong, B.; Dai, Z. Protein Splicing of Inteins: A Powerful Tool in Synthetic Biology. Front. Bioeng. Biotechnol. 2022, 10, 810180.
  4. Gosselin, S.P.; Arsenault, D.; Gogarten, J.P. Actinobacteriophage Inteins: Host Diversity, Local Dissemination, and Non-Canonical Architecture. bioRxiv 2025.
  5. Goddard, M.R.; Burt, A. Recurrent invasion and extinction of a selfish gene. Proc. Natl. Acad. Sci. USA 1999, 96, 13880–13885.
  6. Naor, A.; Altman-Price, N.; Soucy, S.M.; Green, A.G.; Mitiagin, Y.; Turgeman-Grott, I.; Davidovich, N.; Gogarten, J.P.; Gophna, U. Impact of a homing intein on recombination frequency and organismal fitness. Proc. Natl. Acad. Sci. USA 2016, 113, E4654–E4661.
  7. Barzel, A.; Obolski, U.; Gogarten, J.P.; Kupiec, M.; Hadany, L. Home and away-the evolutionary dynamics of homing endonucleases. BMC Evol. Biol. 2011, 11, 324.
  8. Yahara, K.; Fukuyo, M.; Sasaki, A.; Kobayashi, I. Evolutionary maintenance of selfish homing endonuclease genes in the absence of horizontal transfer. Proc. Natl. Acad. Sci. USA 2009, 106, 18861–18866.
  9. Gogarten, J.P.; Hilario, E. Inteins, introns, and homing endonucleases: Recent revelations about the life cycle of parasitic genetic elements. BMC Evol. Biol. 2006, 6, 94.
  10. Nelson-Sathi, S.; Sousa, F.L.; Roettger, M.; Lozada-Chávez, N.; Thiergart, T.; Janssen, A.; Bryant, D.; Landan, G.; Schönheit, P.; Siebers, B.; et al. Origins of major archaeal clades correspond to gene acquisitions from bacteria. Nature 2015, 517, 77–80.
  11. Nelson-Sathi, S.; Dagan, T.; Landan, G.; Janssen, A.; Steel, M.; McInerney, J.O.; Deppenmeier, U.; Martin, W.F. Acquisition of 1000 eubacterial genes physiologically transformed a methanogen at the origin of Haloarchaea. Proc. Natl. Acad. Sci. USA 2012, 109, 20537–20542.
  12. Gophna, U.; Altman-Price, N. Horizontal Gene Transfer in Archaea—From Mechanisms to Genome Evolution. Annu. Rev. Microbiol. 2022, 76, 481–502.
  13. Novikova, O.; Jayachandran, P.; Kelley, D.S.; Morton, Z.; Merwin, S.; Topilina, N.I.; Belfort, M. Intein Clustering Suggests Functional Importance in Different Domains of Life. Mol. Biol. Evol. 2016, 33, 783–799.
  14. Naor, A.; Lazary, R.; Barzel, A.; Papke, R.T.; Gophna, U. In Vivo Characterization of the Homing Endonuclease within the polB Gene in the Halophilic Archaeon Haloferax volcanii. PLoS ONE 2011, 6, e15833.
  15. Turgeman-Grott, I.; Arsenault, D.; Yahav, D.; Feng, Y.; Miezner, G.; Naki, D.; Peri, O.; Papke, R.T.; Gogarten, J.P.; Gophna, U.; et al. Neighboring inteins interfere with one another’s homing capacity. PNAS Nexus 2023, 2, 354.
  16. Brewster, A.S.; Chen, X.S. Insights into the MCM functional mechanism: Lessons learned from the archaeal MCM complex. Crit. Rev. Biochem. Mol. Biol. 2010, 45, 243–256.
  17. Maine, G.T.; Sinha, P.; Tye, B.-K. Mutants of S. cerevisiae defective in the maintenance of minichromosomes. Genetics 1984, 106, 365–385.
  18. Yalala, V.R.; Lynch, A.K.; Mills, K.V. Conditional Alternative Protein Splicing Promoted by Inteins from Haloquadratum walsbyi. Biochemistry 2022, 61, 294–302.
  19. InBase 2.0. Available online: https://inbase.ligsciss.com/index.php?r=site/index (accessed on 23 May 2025).
  20. Perler, F.B. InBase: The Intein Database. Nucleic Acids Res. 2002, 30, 383–384.
  21. Finstad, K.M.; Probst, A.J.; Thomas, B.C.; Andersen, G.L.; Demergasso, C.; Echeverría, A.; Amundson, R.G.; Banfield, J.F. Microbial Community Structure and the Persistence of Cyanobacterial Populations in Salt Crusts of the Hyperarid Atacama Desert from Genome-Resolved Metagenomics. Front. Microbiol. 2017, 8, 1435.
  22. Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389–3402.
  23. Edgar, R.C. MUSCLE: A multiple sequence alignment method with reduced time and space complexity. BMC Bioinform. 2004, 5, 113.
  24. Gouy, M.; Tannier, E.; Comte, N.; Parsons, D.P. Seaview Version 5: A Multiplatform Software for Multiple Sequence Alignment, Molecular Phylogenetic Analyses, and Tree Reconciliation. Methods Mol. Biol. 2021, 2231, 241–260.
  25. Sievers, F.; Higgins, D.G. Clustal Omega for making accurate alignments of many protein sequences. Protein Sci. 2018, 27, 135–145.
  26. Katoh, K.; Misawa, K.; Kuma, K.; Miyata, T. MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002, 30, 3059–3066.
  27. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
  28. Pellegrini-Calace, M. Detecting DNA-binding helix-turn-helix structural motifs using sequence and structure information. Nucleic Acids Res. 2005, 33, 2129–2140.
  29. Minh, B.Q.; Schmidt, H.A.; Chernomor, O.; Schrempf, D.; Woodhams, M.D.; von Haeseler, A.; Lanfear, R. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol. Biol. Evol. 2020, 37, 1530–1534.
  30. Kalyaanamoorthy, S.; Minh, B.Q.; Wong, T.K.F.; von Haeseler, A.; Jermiin, L.S. ModelFinder: Fast model selection for accurate phylogenetic estimates. Nat. Methods 2017, 14, 587–589.
  31. Hoang, D.T.; Chernomor, O.; von Haeseler, A.; Minh, B.Q.; Vinh, L.S. UFBoot2: Improving the Ultrafast Bootstrap Approximation. Mol. Biol. Evol. 2018, 35, 518–522.
  32. Woese, C.R.; Kandler, O.; Wheelis, M.L. Towards a natural system of organisms: Proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. USA 1990, 87, 4576–4579.
  33. Göker, M.; Oren, A. Valid publication of names of two domains and seven kingdoms of prokaryotes. Int. J. Syst. Evol. Microbiol. 2024, 74, 006242.
  34. Schoch, C.L.; Ciufo, S.; Domrachev, M.; Hotton, C.L.; Kannan, S.; Khovanskaya, R.; Leipe, D.; McVeigh, R.; O’Neill, K.; Robbertse, B.; et al. NCBI Taxonomy: A comprehensive update on curation, resources and tools. Database 2020, 2020, baaa062.
  35. Swithers, K.S.; Senejani, A.G.; Fournier, G.P.; Gogarten, J.P. Conservation of intron and intein insertion sites: Implications for life histories of parasitic genetic elements. BMC Evol. Biol. 2009, 9, 303.
  36. Brewster, A.S.; Wang, G.; Yu, X.; Greenleaf, W.B.; Carazo, J.M.; Tjajadi, M.; Klein, M.G.; Chen, X.S. Crystal structure of a near-full-length archaeal MCM: Functional insights for an AAA+ hexameric helicase. Proc. Natl. Acad. Sci. USA 2008, 105, 20191–20196.
  37. Meagher, M.; Epling, L.B.; Enemark, E.J. DNA translocation mechanism of the MCM complex and implications for replication initiation. Nat. Commun. 2019, 10, 3117.
  38. Mills, K.V.; Johnson, M.A.; Perler, F.B. Protein Splicing: How Inteins Escape from Precursor Proteins. J. Biol. Chem. 2014, 289, 14498–14505.
  39. Tori, K.; Dassa, B.; Johnson, M.A.; Southworth, M.W.; Brace, L.E.; Ishino, Y.; Pietrokovski, S.; Perler, F.B. Splicing of the Mycobacteriophage Bethlehem DnaB Intein. J. Biol. Chem. 2010, 285, 2515–2526.
  40. Xu, F.; Zhang, B.; Gao, Z.; Xu, R.; Liu, X.; Ishino, S.; Feng, M.; Shen, Y.; Ishino, Y.; She, Q.; et al. A Well-Conserved Archaeal B-Family Polymerase Functions as an Extender in Translesion Synthesis. mBio 2022, 13, e02659-21.
  41. Makarova, K.S.; Krupovic, M.; Koonin, E.V. Evolution of replicative DNA polymerases in archaea and their contributions to the eukaryotic replication machinery. Front. Microbiol. 2014, 5, 354.
  42. Jordan, A.; Torrents, E.; Sala, I.; Hellman, U.; Gibert, I.; Reichard, P. Ribonucleotide Reduction in Pseudomonas Species: Simultaneous Presence of Active Enzymes from Different Classes. J. Bacteriol. 1999, 181, 3974–3980.
  43. Kirdis, E.; Jonsson, I.-M.; Kubica, M.; Potempa, J.; Josefsson, E.; Masalha, M.; Foster, S.J.; Tarkowski, A. Ribonucleotide reductase class III, an essential enzyme for the anaerobic growth of Staphylococcus aureus, is a virulence determinant in septic arthritis. Microb. Pathog. 2007, 43, 179–188.
  44. Liu, X.-Q.; Yang, J.; Meng, Q. Four Inteins and Three Group II Introns Encoded in a Bacterial Ribonucleotide Reductase Gene. J. Biol. Chem. 2003, 278, 46826–46831.
  45. Aravind, L.; Koonin, E.V. DNA-binding proteins and evolution of transcription regulation in the archaea. Nucleic Acids Res. 1999, 27, 4658–4670.
  46. Ohlendorf, D.H.; Anderson, W.F.; Fisher, R.G.; Takeda, Y.; Matthews, B.W. The molecular basis of DNA–protein recognition inferred from the structure of cro repressor. Nature 1982, 298, 718–723.
  47. Sauer, R.T.; Yocum, R.R.; Doolittle, R.F.; Lewis, M.; Pabo, C.O. Homology among DNA-binding proteins suggests use of a conserved super-secondary structure. Nature 1982, 298, 447–451.
  48. Moure, C.M.; Gimble, F.S.; Quiocho, F.A. Crystal structure of the intein homing endonuclease PI-SceI bound to its recognition sequence. Nat. Struct. Biol. 2002, 9, 764–770.
  49. Christ, F.; Steuer, S.; Thole, H.; Wende, W.; Pingoud, A.; Pingoud, V. A Model for the PI-SceI×DNA Complex Based on Multiple Base and Phosphate Backbone-specific Photocross-links. J. Mol. Biol. 2000, 300, 841–849.
  50. Hu, D.; Crist, M.; Duan, X.; Quiocho, F.A.; Gimble, F.S. Probing the Structure of the PI-SceI-DNA Complex by Affinity Cleavage and Affinity Photocross-linking. J. Biol. Chem. 2000, 275, 2705–2712.
Figure 1. Six newly discovered MCM intein insertion sites all fall within the same catalytic domains as the five known insertion sites. A monomer from the solved crystal structure of the MCM homohexamer from archaeon Saccharolobus solfataricus P2 bound to ATP and single-stranded DNA (PDB 6MII Chain A [37]) was used to visualize core catalytic motifs (A) and all 11 intein insertion sites (B) of archaeal MCM. (A) ATP-binding motifs Walker A (denoted with an A), Walker B (denoted with a B), and an arginine finger (denoted with an R) in tertiary structure and linear amino acid sequence contexts. Single-stranded DNA (brown) and a bound ATP molecule from the structure are visualized. The arginine side chain of the arginine finger is outlined in green. (B) The five known (MCM-a, b, c, d, and e) and six new (MCM-d1, e1, f, g, h, and i) archaeal MCM intein insertion sites in tertiary structure and linear amino acid sequence contexts. Single-stranded DNA (brown) from the structure is visualized. MCM-d and e are referred to as MCM-d2 and e2, respectively, throughout this work. (C) The MCM homohexamer of PDB 6MII [37] with all 11 intein insertion sites visualized. The sites all reside in the catalytic domains of MCM. The subunits are all identical and only depicted in two shades of gray to better visualize the interfaces between subunits. The single-stranded DNA fed through the center of the structure is depicted in black. (D) The MCM host protein amino acid sequence of a Diapherotrites archaeon (Protein Accession: MBN2067331.1) with each MCM intein insertion site indicated with a triangle. The new sites presented in this work (MCM-d1, e1, f, g, h, and i) are indicated with stars.
Figure 2. Distribution of MCM inteins across sampled archaeal groups and phylogeny of all MCM inteins analyzed. (A) An unrooted phylogeny of the amino acid sequences of MCM exteins arbitrarily chosen from each sampled archaeal group (left) with the total inteins observed at each of the 11 MCM intein insertion sites mapped to the leaves (right). The NCBI Taxonomy ID (taxid) and total number of homologs collected for each group are listed in the phylogeny taxa labels. The overall total number of inteins observed at each MCM intein insertion site is listed in bold under the letter representing each site (G for MCM-g, A for MCM-a, etc.). (B) All analyzed MCM inteins were extracted, joined into one file, and re-aligned using MAFFT [26]. The alignment was used as input for IQ-TREE2, with ModelFinder choosing the best-fit model (Q.pfam + F + I + R10, chosen according to the Bayesian Information Criterion) and 1000 replicates of ultrafast bootstrapping. The resulting tree was visualized using FigTree and Inkscape to illustrate the insertion site at which each group of inteins on the phylogeny is found. A collapsed version of the phylogeny with bootstrap support values is provided in Figure S2.
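The alignment and tree-building steps described for panel (B) can be sketched as command lines. This is a minimal sketch under stated assumptions, not the authors' actual invocation: the file names are hypothetical, `--auto` is an assumed MAFFT setting, and `-m MFP` (ModelFinder Plus) and `-B 1000` (ultrafast bootstrap replicates) are the standard IQ-TREE 2 options for the steps named in the caption.

```python
import shlex

# Hypothetical file names; the caption does not state the actual ones.
INTEINS_FASTA = "all_mcm_inteins.fasta"
ALIGNMENT = "all_mcm_inteins.aln.fasta"


def mafft_command(in_fasta):
    """Argument list for a MAFFT re-alignment (MAFFT writes the alignment to stdout)."""
    # Assumption: --auto lets MAFFT pick a strategy; the strategy used in the paper is not given.
    return ["mafft", "--auto", in_fasta]


def iqtree_command(alignment, replicates=1000):
    """Argument list for IQ-TREE 2 with ModelFinder and ultrafast bootstrapping."""
    return ["iqtree2", "-s", alignment, "-m", "MFP", "-B", str(replicates)]


if __name__ == "__main__":
    # Print the commands rather than running them, so the sketch works without the tools installed.
    print(shlex.join(mafft_command(INTEINS_FASTA)), ">", ALIGNMENT)
    print(shlex.join(iqtree_command(ALIGNMENT)))
```

The argument lists can be passed to `subprocess.run` on a system where MAFFT and IQ-TREE 2 are installed; MAFFT's standard output would need to be redirected to the alignment file.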
Figure 3. Three sites of within-intein insertions found in the archaeal MCM inteins. AlphaFold3-predicted structures (left) and linear amino acid sequences (right) of the mini intein at insertion site MCM-e2 (e) from Methanofastidiosum methylothiophilum (Protein Accession KYC49087.1) (top) and the full intein at insertion site MCM-g from a Hadarchaeales archaeon (Protein Accession MEM2874594.1) (bottom). Neither intein contains within-intein insertions. The N-terminal portion of the self-splicing domain is depicted in light gray, the homing endonuclease (only present in the full intein) in gray, and the C-terminal portion of the self-splicing domain in black. The relative positions of the within-intein insertion sites identified are depicted in red, yellow, and blue for Insertions 1, 2, and 3, respectively. Sequence alignments differentiate Insertion 1 from Insertion 2 in mini inteins (see Supplemental Data S1).
Figure 4. Distribution of observed intein architecture variants across the 11 MCM intein insertion sites. For each of the 17 observed combinations of homing endonuclease status (full or mini) and within-intein insertions (no, small, or large Inserts 1, 2, and/or 3), their distribution across the 11 MCM intein insertion sites was determined. The total number of inteins observed with a given architecture is included above the bar plot for that particular architecture.
Figure 5. Distribution and architectures of MCM inteins across haloarchaea from Atacama Desert populations sampled by Finstad et al. The host protein sequences of the 26 haloarchaeal MCM homologs from the Atacama Desert metagenomic project of Finstad et al. [21], in which all samples were collected in June 2013, were used to construct an unrooted phylogeny. The host protein alignment was used as input for IQ-TREE2, with ModelFinder choosing the best-fit model (LG + F + I + G4, chosen according to the Bayesian Information Criterion) and 1000 replicates of ultrafast bootstrapping. The genera of the archaea were not provided by the original project, so comparisons to haloarchaeal MCM sequences of known genera were used to assign genera where confident assignments could be made (Figure S4). Circle diagrams depict the intein architecture found at each MCM intein insertion site: a large central gray circle indicates a full intein with a homing endonuclease with both LAGLIDADG motifs intact; a small central gray circle indicates a mini intein with a degraded or completely lost homing endonuclease domain; large and small exterior circles indicate large (>60 aa) and small (20–60 aa) Insertions 1 (red), 2 (yellow), and 3 (blue). A clade with three particularly diverse mini inteins at MCM-d2 (d) is used to show the AlphaFold3-predicted intein structures corresponding to the circle diagrams.
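The size classes behind the circle diagrams can be written out as a small function. This is a minimal sketch: the thresholds come from the caption, while the function names, the treatment of stretches shorter than 20 aa as "none", and the example values are illustrative assumptions rather than part of the original analysis.

```python
def classify_insertion(length_aa):
    """Size class of a within-intein insertion, using the caption's thresholds."""
    if length_aa > 60:
        return "large"
    if length_aa >= 20:
        return "small"
    # Assumption: stretches shorter than 20 aa are not counted as insertions.
    return "none"


def describe_architecture(has_intact_endonuclease, insertion_lengths):
    """Summarize one intein as (status, {insertion number: size class})."""
    # "full" means both LAGLIDADG motifs are intact; anything else is a mini intein.
    status = "full" if has_intact_endonuclease else "mini"
    sizes = {n: classify_insertion(aa) for n, aa in insertion_lengths.items()}
    return status, sizes


# Hypothetical example values, not taken from the dataset.
example = describe_architecture(False, {1: 75, 2: 0, 3: 34})
print(example)
```

Each returned pair corresponds to one circle diagram: the status sets the size of the central circle, and the three size classes set the exterior circles for Insertions 1, 2, and 3.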
Figure 6. Within-intein insertions with potential for DNA-binding capacity. (A) The largest insertions found within the MCM inteins at Insertion sites 1, 2, and 3 were used as input in AlphaFold3 (Inserts 1-3, left to right). The structures are colored by an AlphaFold-provided model confidence metric (predicted local distance difference test (pLDDT)), ranging from very high confidence (dark blue) to very low confidence (orange) [27]. (B) The crystal structure of yeast vacuolar ATPase intein PI-SceI bound to its DNA target (PDB 1LWS [48]). Following the scheme used in Figure 3, the N-terminal portion of the self-splicing domain is colored light gray, the DNA-binding insertion is colored red, the homing endonuclease is colored gray, and the C-terminal portion of the self-splicing domain is colored black. The target DNA is colored light brown.
Table 1. Degree of MCM intein invasion across homologs. The total numbers of MCM homologs with each degree of intein invasion are given. The degree of invasion ranged from empty (no inteins), to quadruple (four inteins).
MCM Intein Invasion Status    Total Homologs with Invasion Status
Empty                         3125
Single                        709
Double                        305
Triple                        79
Quadruple                     25
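As a quick consistency check, the counts in Table 1 can be summed and compared with the 4243 archaeal MCM homologs reported in the Abstract. The sketch below uses only the table values. The derived quantities (number of invaded homologs, number of multi-intein homologs, and the intein total implied by the table) are simple arithmetic on those values and are not figures quoted from the article.

```python
# Counts copied from Table 1.
invasion_counts = {
    "Empty": 3125,
    "Single": 709,
    "Double": 305,
    "Triple": 79,
    "Quadruple": 25,
}
# Number of inteins per homolog for each invasion status.
inteins_per_status = {"Empty": 0, "Single": 1, "Double": 2, "Triple": 3, "Quadruple": 4}

total = sum(invasion_counts.values())
invaded = total - invasion_counts["Empty"]
multi = invaded - invasion_counts["Single"]
implied_inteins = sum(invasion_counts[s] * inteins_per_status[s] for s in invasion_counts)

print(f"total homologs: {total}")  # 4243, matching the Abstract
print(f"homologs with at least one intein: {invaded} ({invaded / total:.1%})")
print(f"homologs with more than one intein: {multi} ({multi / total:.1%})")
print(f"inteins implied by the table: {implied_inteins}")
```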
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Arsenault, D.; Stack, G.F.; Gogarten, J.P. Inteins at Eleven Distinct Insertion Sites in Archaeal Helicase Subunit MCM Exhibit Varied Architectures and Activity Levels Across Archaeal Groups. DNA 2025, 5, 39. https://doi.org/10.3390/dna5030039