Machine Learning-Based Characterization of Bacillus anthracis Phenotypes from pXO1 Plasmid Proteins
Abstract
1. Introduction
2. Materials and Methods
2.1. Extraction of pXO1 Amino Acid Sequences and Embedding Generation
2.2. Functional Annotation of Protein Sequences
2.3. Phylogenetic Tree Construction and Visualization
2.4. Comparison of pXO1 Protein Composition to Whole-Genome Phylogeny
2.5. Affinity Analysis of pXO1 Proteins
2.6. Bacillus anthracis Lineage Classifier
2.7. Protein Module Characterization
2.8. Domain Analysis on Functionally Redundant Module Proteins
2.9. DNA Processing and Replication Module
3. Results
3.1. Bacillus anthracis Whole Genome Sequencing and pXO1 Plasmid Sequencing Produce Inconsistent Phylogenies
3.2. Vector Encodings of pXO1 Plasmid Protein Composition Reflect Lineage-Specific Structure
3.3. Decision Tree of pXO1 Protein Composition Identifies Sublineage-Specific Markers
3.4. Protein Modules Reflect Whole-Genome Phylogenetic Structure
3.5. Geographic Distribution of Protein Modules Reveals Region-Specific Genomic Variation
3.6. Domain Analysis of Ambiguously Annotated Proteins Reveals Complementary Functional Roles
3.7. Characterization of a DNA Replication Module Reveals Conserved and Lineage-Specific Protein Functions on the pXO1 Plasmid
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
STRINGDB Reference Protein | DNA Processing Superfamily/Domain (InterPro) | Domain Function | Found in pXO1 Plasmid Proteins |
---|---|---|---|
Nuclease-related domain containing protein | NERD: IPR011528 | Involved in DNA processing; may have nuclease activity | Nuclease-related domain containing protein |
Leucine–tRNA Ligase | Rossmann-like alpha/beta/alpha sandwich fold: IPR014729 | Binds nucleotide cofactors | Phosphoadenosine phosphosulfate reductase family protein |
Glycosyltransferase 2-like domain-containing protein | Nucleotide-diphospho-sugar transferases: IPR029044; Glycosyl transferase family 2: IPR001173 | Catalyze the transfer of sugar moieties from an activated nucleotide sugar donor to a specific acceptor molecule | UTP–glucose-1-phosphate uridylyltransferase GalU; Hyaluronan synthase HasA |
Glycosyltransferase 2-like domain-containing protein | TPR-like: IPR011990; Tetratricopeptide repeats: IPR019734 | Act as scaffolds for protein–protein interactions, regulating diverse processes such as cell-cycle control, gene regulation, and protein transport | Rap family tetratricopeptide repeat protein |
Nucleotidyltransferase domain protein | Nucleotidyltransferase: IPR043519 | Catalyzes the transfer of a nucleotidyl group from a nucleoside triphosphate (NTP) to an acceptor molecule; involved in DNA and RNA processing, DNA repair, and signal transduction | Nucleotidyltransferase domain-containing protein; RelA/SpoT domain-containing protein |
DNA Polymerase I | Ribonuclease H-like: IPR012337; Ribonuclease H: IPR036397 | Cleaves RNA within RNA–DNA hybrids during DNA replication and repair | IS3 family transposase; IS4 family transposase; IS4-like element IS231S family transposase |
DNA Polymerase I | DNA/RNA polymerases: IPR043502 | Characterized by a “palm” subdomain that is essential for catalysis of template-directed nucleic acid synthesis | Group II intron reverse transcriptase/maturase |
DNA-Directed RNA Polymerase | Winged helix-like DNA binding domain superfamily: IPR036388 | Mediates sequence-specific DNA binding by transcription factors, acts as a strand-separating wedge in DNA recombination and repair helicases, and can also mediate protein–protein interactions | Anthrax toxin expression trans-acting transcription regulator AtxA; Metalloregulator ArsR/SmtB family transcription factor; IS3 family transposase; Transcriptional repressor PagR; DNA translocase FtsK |
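The article does not describe how the rows of this table were cross-referenced. As a reading aid only (not the authors' pipeline), the Python sketch below re-encodes the table at the level of STRINGDB reference proteins and inverts it to list pXO1 proteins that appear under more than one reference protein. Sub-rows are merged per reference protein because the table does not state which individual InterPro entry each pXO1 protein carries; the names `REF_TO_PXO1`, `invert`, and `shared_proteins` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical re-encoding of the table: STRINGDB reference protein -> pXO1
# proteins listed in its rows. Sub-rows (different InterPro entries) are merged,
# since the table does not pair each pXO1 protein with a single domain.
REF_TO_PXO1: Dict[str, List[str]] = {
    "Nuclease-related domain containing protein": [
        "Nuclease-related domain containing protein",
    ],
    "Leucine-tRNA Ligase": [
        "Phosphoadenosine phosphosulfate reductase family protein",
    ],
    "Glycosyltransferase 2-like domain-containing protein": [
        "UTP-glucose-1-phosphate uridylyltransferase GalU",
        "Hyaluronan synthase HasA",
        "Rap family tetratricopeptide repeat protein",
    ],
    "Nucleotidyltransferase domain protein": [
        "Nucleotidyltransferase domain-containing protein",
        "RelA/SpoT domain-containing protein",
    ],
    "DNA Polymerase I": [
        "IS3 family transposase",
        "IS4 family transposase",
        "IS4-like element IS231S family transposase",
        "Group II intron reverse transcriptase/maturase",
    ],
    "DNA-Directed RNA Polymerase": [
        "Anthrax toxin expression trans-acting transcription regulator AtxA",
        "Metalloregulator ArsR/SmtB family transcription factor",
        "IS3 family transposase",
        "Transcriptional repressor PagR",
        "DNA translocase FtsK",
    ],
}


def invert(mapping: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each pXO1 protein to the set of reference proteins it is listed under."""
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for ref, proteins in mapping.items():
        for protein in proteins:
            inverted[protein].add(ref)
    return dict(inverted)


def shared_proteins(mapping: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Return only the pXO1 proteins linked to more than one reference protein."""
    return {p: refs for p, refs in invert(mapping).items() if len(refs) > 1}


if __name__ == "__main__":
    for protein, refs in sorted(shared_proteins(REF_TO_PXO1).items()):
        print(protein, "->", sorted(refs))
```

Run as a script, it reports that the IS3 family transposase is listed under both DNA Polymerase I and DNA-Directed RNA Polymerase, the only pXO1 protein in the table that spans two reference proteins.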
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Harrigan, W.; La, T.H.A.; Dahal, P.; Belcaid, M.; Norris, M.H. Machine Learning-Based Characterization of Bacillus anthracis Phenotypes from pXO1 Plasmid Proteins. Pathogens 2025, 14, 1019. https://doi.org/10.3390/pathogens14101019