Open Access Article

Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease

¹ 100 K Pathogen Genome Project, Department of Population Health and Reproduction, School of Veterinary Medicine, University of California Davis, Davis, CA 95616, USA
² Department of Veterinary Paraclinical Sciences, College of Veterinary Medicine, University of the Philippines Los Baños, Los Baños 4031, Philippines
* Author to whom correspondence should be addressed.
Microorganisms 2020, 8(4), 549; https://doi.org/10.3390/microorganisms8040549
Received: 3 March 2020 / Revised: 7 April 2020 / Accepted: 8 April 2020 / Published: 10 April 2020
(This article belongs to the Special Issue Foodborne Pathogen Campylobacter)
High-dimensional data generated by bacterial whole-genome sequencing provide information on an unprecedented scale, which requires an appropriate statistical analysis framework to infer biological function from populations of genomes. Genome-wide association study (GWAS) methods are an appropriate framework for bacterial population genome analysis and yield a list of candidate genes associated with a phenotype, but that list carries no ranking of importance. Here, we validated a novel framework for defining infection mechanisms that combines GWAS, machine learning, and bacterial population genomics to rank allelic variants that accurately identified disease. This approach parsed a dataset of 1.2 million single nucleotide polymorphisms (SNPs) and indels and, using spatiotemporal analysis spanning 30 years, produced an importance-ranked list of associated porA alleles in Campylobacter jejuni. We validated the approach against alleles previously proven experimentally in an in vivo guinea pig abortion model. This framework, termed μPathML, defined intestinal and extraintestinal groups carrying differing allelic porA variants that cause abortion. Divergent variants containing indels that defeated automated annotation were rescued using biological context and knowledge, which identified rare, divergent variants maintained in the population across two continents and 30 years. This study demonstrates that machine learning coupled with GWAS and population genomics can simultaneously identify and rank alleles to define their role in infectious disease mechanisms.
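The abstract contrasts GWAS, which returns an unranked list of associated variants, with a machine-learning step that ranks them. The sketch below is a toy illustration of that two-stage idea only, under stated assumptions: each genome is encoded as hypothetical 0/1 presence/absence calls per variant; association is tested with a two-sided Fisher's exact test (one common choice for a single binary variant, not necessarily what the authors used); and ranking uses the entropy-based information gain of a one-split decision stump as a simplified stand-in for the split-gain feature importance that tree ensembles such as XGBoost report. It is not μPathML, and the variant names and data are made up.

```python
"""Toy two-stage sketch: a GWAS-style association filter, then an importance ranking.

Hypothetical illustration only; this is not the μPathML pipeline. Genotypes are
0/1 presence/absence calls per variant, and labels are 0/1 phenotypes per genome.
"""
from math import comb, log2


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: variant present / absent. Columns: case / control.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def p_of(x):
        # Hypergeometric probability that x of the cases carry the variant.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = p_of(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as extreme (no more probable) than the observed one.
    total = sum(p_of(x) for x in range(lo, hi + 1) if p_of(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def entropy(labels):
    """Binary Shannon entropy (bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def information_gain(column, labels):
    """Entropy reduction from a one-split stump on variant presence (1) vs absence (0)."""
    present = [y for x, y in zip(column, labels) if x == 1]
    absent = [y for x, y in zip(column, labels) if x == 0]
    n = len(labels)
    child = len(present) / n * entropy(present) + len(absent) / n * entropy(absent)
    return entropy(labels) - child


def rank_variants(genotypes, labels, alpha=0.05):
    """Keep variants with Fisher p < alpha, then sort them by information gain (descending).

    Returns a list of (variant_name, p_value, information_gain) tuples.
    """
    kept = []
    for name, column in genotypes.items():
        pairs = list(zip(column, labels))
        a = sum(1 for x, y in pairs if x == 1 and y == 1)
        b = sum(1 for x, y in pairs if x == 1 and y == 0)
        c = sum(1 for x, y in pairs if x == 0 and y == 1)
        d = sum(1 for x, y in pairs if x == 0 and y == 0)
        p = fisher_exact_two_sided(a, b, c, d)
        if p < alpha:
            kept.append((name, p, information_gain(column, labels)))
    return sorted(kept, key=lambda t: t[2], reverse=True)


# Hypothetical toy data: 6 case genomes followed by 6 control genomes.
LABELS = [1] * 6 + [0] * 6
GENOTYPES = {
    "variant_A": [1, 1, 1, 1, 1, 1] + [0, 0, 0, 0, 0, 0],  # carried by every case only
    "variant_B": [1, 1, 1, 1, 1, 0] + [0, 0, 0, 0, 0, 0],  # carried by 5 of 6 cases only
    "variant_C": [1, 0, 1, 0, 1, 0] + [1, 0, 1, 0, 1, 0],  # no association
}

if __name__ == "__main__":
    for name, p, gain in rank_variants(GENOTYPES, LABELS):
        print(f"{name}\tp={p:.4f}\tgain={gain:.3f}")
```

With this toy data, variant_C fails the association filter, and variant_A (p ≈ 0.0022, gain 1.0) ranks above variant_B (p ≈ 0.015, gain ≈ 0.65). A real analysis would fit a multivariate model so that importance reflects variants jointly rather than one at a time, and would typically correct for multiple testing across a dataset on the scale of 1.2 million variants.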
Keywords: porA; infectious disease; XGBoost; Campylobacter; abortion; protein modeling; artificial intelligence; allelic variation; bacterial metastasis

MDPI and ACS Style

Bandoy, D.D.R.; Weimer, B.C. Biological Machine Learning Combined with Campylobacter Population Genomics Reveals Virulence Gene Allelic Variants Cause Disease. Microorganisms 2020, 8, 549.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
