Impact of Genomics on Clarifying the Evolutionary Relationships amongst Mycobacteria: Identification of Molecular Signatures Specific for the Tuberculosis-Complex of Bacteria with Potential Applications for Novel Diagnostics and Therapeutics

An alarming increase in tuberculosis (TB) caused by drug-resistant strains of Mycobacterium tuberculosis has created an urgent need for new antituberculosis drugs acting via novel mechanisms. Phylogenomic and comparative genomic analyses reviewed here reveal that the TB-causing bacteria comprise a small group of organisms that differ from all other mycobacteria in numerous respects. Comprehensive analyses of protein sequences from mycobacterial genomes have identified 63 conserved signature inserts and deletions (indels), or CSIs, in important proteins that are distinctive characteristics of the TB-complex of bacteria. The identified CSIs provide potential means for developing novel diagnostics as well as therapeutics for the TB-complex of bacteria, based on four key observations: (i) the CSIs exhibit a high degree of exclusivity towards the TB-complex of bacteria; (ii) earlier work on CSIs provides evidence that they serve important or essential functions in the organisms for which they are specific; (iii) the CSIs are located in surface-exposed loops of the proteins and are implicated in mediating novel interactions; and (iv) homologs of the CSI-containing proteins, or the CSIs in such homologs, are generally not found in humans. Based on these characteristics, it is hypothesized that high-throughput virtual screening for compounds that bind specifically to the CSIs (or CSI-containing regions), and thereby inhibit the cellular functions of the CSIs, could lead to the discovery of a novel class of drugs specifically targeting the TB-complex of organisms.
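The exclusivity criterion in observation (i) rests on a simple alignment property. Within a conserved region, a group-specific insertion appears as alignment columns where every sequence from the target group has a residue and the other sequences have gaps. A group-specific deletion shows the reverse pattern.

The Python sketch below illustrates that idea under several assumptions:

- The alignment has already been computed. It is supplied as a dictionary of equal-length gapped strings, with "-" as the gap character.
- The target group is given as a set of sequence names.
- A small number of non-target exceptions may be tolerated, mirroring indels described as "absent from most other Mycobacteriaceae".
- A "conserved region" is approximated as a fixed number of gap-free flanking columns on each side.

The function names, parameters and toy sequences are hypothetical and do not represent the authors' actual pipeline.

```python
from typing import Dict, List, Optional, Set, Tuple

GAP = "-"  # assumed gap character in the aligned sequences


def classify_column(
    alignment: Dict[str, str],
    ingroup: Set[str],
    outgroup: Set[str],
    i: int,
    max_exceptions: int,
) -> Optional[str]:
    """Label alignment column i as 'insertion', 'deletion', or None."""
    in_gaps = sum(alignment[name][i] == GAP for name in ingroup)
    out_gaps = sum(alignment[name][i] == GAP for name in outgroup)
    out_residues = len(outgroup) - out_gaps
    # Insertion: every target sequence has a residue; at most
    # max_exceptions non-target sequences also have one.
    if in_gaps == 0 and out_residues <= max_exceptions:
        return "insertion"
    # Deletion: every target sequence has a gap; at most
    # max_exceptions non-target sequences also have one.
    if in_gaps == len(ingroup) and out_gaps <= max_exceptions:
        return "deletion"
    return None


def gap_free(alignment: Dict[str, str], i: int) -> bool:
    """True if no sequence has a gap at column i."""
    return all(seq[i] != GAP for seq in alignment.values())


def find_group_specific_indels(
    alignment: Dict[str, str],
    ingroup: Set[str],
    max_exceptions: int = 0,
    flank: int = 2,
) -> List[Tuple[str, int, int]]:
    """Return (kind, start, end) runs of group-specific indel columns.

    Coordinates are 0-based and end-exclusive. A run is kept only if
    `flank` gap-free columns exist on both sides of it.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    missing = ingroup - set(alignment)
    if missing:
        raise ValueError(f"ingroup names not in alignment: {sorted(missing)}")
    outgroup = set(alignment) - ingroup
    if not ingroup or not outgroup:
        raise ValueError("need at least one ingroup and one outgroup sequence")
    if max_exceptions >= len(outgroup):
        raise ValueError("max_exceptions must be smaller than the outgroup size")

    states = [
        classify_column(alignment, ingroup, outgroup, i, max_exceptions)
        for i in range(n_cols)
    ]
    hits: List[Tuple[str, int, int]] = []
    i = 0
    while i < n_cols:
        kind = states[i]
        if kind is None:
            i += 1
            continue
        start = i
        while i < n_cols and states[i] == kind:  # extend the run of same-kind columns
            i += 1
        end = i
        flanks = list(range(start - flank, start)) + list(range(end, end + flank))
        # Bounds are checked first so negative indices never wrap around.
        if start - flank >= 0 and end + flank <= n_cols and all(
            gap_free(alignment, j) for j in flanks
        ):
            hits.append((kind, start, end))
    return hits


# Hypothetical toy alignment: TB_a and TB_b stand in for the target clade.
ALIGNMENT = {
    "TB_a":    "MKLVAGTDEQRLKP-FY",
    "TB_b":    "MKLIAGSDEQKLKP-FY",
    "Other_1": "MKLVA---EQRLKPWFY",
    "Other_2": "MRLVA---EHRLKPWFY",
    "Other_3": "MKLVS---EQRMKPYFY",
}

if __name__ == "__main__":
    print(find_group_specific_indels(ALIGNMENT, {"TB_a", "TB_b"}))
```

On the toy alignment, the scan reports two indels, using 0-based coordinates:

- a three-residue insertion at columns 5 to 7;
- a one-residue deletion at column 14.

A real analysis would likely also need the following:

- carefully curated alignments;
- checks that the flanking regions are conserved in sequence rather than merely ungapped;
- confirmation of the indel across many more genomes.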

Partial sequence alignment of a conserved region of the folylpolyglutamate synthase protein FOLC showing a three amino acid insertion that is specific for members of the "Tuberculosis" clade.
[Figure panel label: Mycobacteriaceae (0/>100)]

Partial sequence alignment of a conserved region of the DNA topoisomerase I protein showing a three amino acid insertion that is specific for members of the "Tuberculosis" clade.

Partial sequence alignment of a conserved region of the metal cation transporting ATPase H protein showing a one amino acid deletion that is specific for members of the "Tuberculosis" clade.

Partial sequence alignment of a conserved region of an acyltransferase protein showing a four amino acid insertion that is specific for members of the "Tuberculosis" clade and is absent from most other Mycobacteriaceae.
[Figure panel label: Mycobacteriaceae (2/30)]

Partial sequence alignment of a conserved region of an alpha-amylase protein showing a one amino acid deletion that is specific for members of the "Tuberculosis" clade.

Partial sequence alignment of a conserved region of the hypothetical protein IQ48_14915 showing a three amino acid insertion that is specific for members of the "Tuberculosis" clade.

Partial sequence alignment of a conserved region of the hypothetical protein CAB90_01059 showing a three amino acid insertion that is specific for members of the "Tuberculosis" clade and is absent from most other Mycobacteriaceae.

Partial sequence alignment of a conserved region of the transcriptional regulator protein showing a one amino acid deletion that is specific for members of the "Tuberculosis" clade and is absent from most other Mycobacteriaceae.
[Figure panel label: Mycobacteriaceae (2/38)]

Partial sequence alignment of a conserved region of the hypothetical protein IU12_21070 showing a four amino acid insertion that is specific for members of the "Tuberculosis" clade.

Partial sequence alignment of … showing a two amino acid deletion that is specific for members of the "Tuberculosis" clade.

Partial sequence alignment of … showing an eight amino acid insertion that is specific for most members of the "Tuberculosis" clade and is absent from most other Mycobacteriaceae.

Figure S27. Partial sequence alignment of the hypothetical protein ERS181347_00724 showing a three amino acid insertion that is specific for members of the "Tuberculosis" clade.

Figure S28. Partial sequence alignment of a conserved membrane protein showing a three amino acid insertion that is specific for members of the "Tuberculosis" clade.

Figure S30. Partial sequence alignment of the anti-sigma K factor protein showing a one amino acid insertion that is specific for members of the "Tuberculosis" clade.

Figure S31. Partial sequence alignment of a conserved protein showing a three amino acid insertion that is specific for members of the "Tuberculosis" clade.

Figure S33. Partial sequence alignment of the multidrug resistance protein EmrB showing a three amino acid deletion that is specific for members of the "Tuberculosis" clade.

Figure S34. Partial sequence alignment of the hypothetical protein ERS024213_05484 showing a one amino acid deletion that is specific for members of the "Tuberculosis" clade.

Figure S36. Partial sequence alignment of the polyprenyl-diphosphate synthase GrcC protein showing a three amino acid deletion that is specific for members of the "Tuberculosis" clade.

Figure S37. Partial sequence alignment of the polyprenyl-diphosphate synthase GrcC protein showing a one amino acid deletion that is specific for members of the "Tuberculosis" clade and is absent from most other Mycobacteriaceae.

Figure S38. Partial sequence alignment of a cold-shock protein showing a two amino acid deletion that is specific for members of the "Tuberculosis" clade.

Figure S40. Partial sequence alignment of the hypothetical protein IQ40_04435 showing a four amino acid insertion that is specific for members of the "Tuberculosis" clade.

Figure S41. Partial sequence alignment of an esterase protein showing a one amino acid insertion that is specific for members of the "Tuberculosis" clade.

Figure S44. Partial sequence alignment of a phosphoglycerate mutase protein showing a one amino acid deletion that is specific for members of the "Tuberculosis" clade.

Figure S45. Partial sequence alignment of the hypothetical protein CAB90_02390 showing a two amino acid insertion that is specific for members of the "Tuberculosis" clade.

Figure S46. Partial sequence alignment of the glycerol-3-phosphate dehydrogenase protein showing a four amino acid insertion that is specific for members of the "Tuberculosis" clade and is absent from most other Mycobacteriaceae.
[Figure panel label: Mycobacteriaceae (2/>100)]

Figure S47. Partial sequence alignment of the GTP-binding protein LepA showing a three amino acid insertion that is specific for members of the "Tuberculosis" clade.

Figure S48. Partial sequence alignment of the type I restriction/modification system specificity determinant HsdS protein showing a four amino acid insertion that is specific for members of the "Tuberculosis" clade.

Figure S49. Partial sequence alignment of the hypothetical protein IQ38_12515 showing a two amino acid deletion that is specific for members of the "Tuberculosis" clade.

Partial sequence alignment of the polyketide synthase protein showing a three amino acid insertion that is specific for most members of the "Tuberculosis" clade.

Figure S51. Partial sequence alignment of a lipase protein showing a one amino acid insertion that is specific for members of the "Tuberculosis" clade.

Figure S53. Partial sequence alignment of the DNA polymerase IV protein showing a one amino acid deletion that is specific for most members of the "Tuberculosis" clade and is absent from most other Mycobacteriaceae.
[Figure panel label: Mycobacteriaceae (1/53)]

Figure S54. Partial sequence alignment of an ATP-dependent DNA helicase protein showing a one amino acid deletion that is specific for members of the "Tuberculosis" clade and is absent from most other Mycobacteriaceae.
[Figure panel label: Mycobacteriaceae (2/>100)]

Figure S55. Partial sequence alignment of a membrane protein showing a one amino acid insertion that is specific for members of the "Tuberculosis" clade.

Figure S56. Partial sequence alignment of an ATPase protein showing a one amino acid insertion that is specific for members of the "Tuberculosis" clade.

Figure S57. Partial sequence alignment of a DNA glycosylase protein showing a four amino acid insertion that is specific for members of the "Tuberculosis" clade and is absent from most other Mycobacteriaceae.
[Figure panel label: Mycobacteriaceae (2/>100)]

Figure S58. Partial sequence alignment of the hypothetical protein IQ47_16905 showing a three amino acid insertion that is specific for most members of the "Tuberculosis" clade and is absent from most other Mycobacteriaceae.
[Figure panel label: Mycobacteriaceae (2/>100)]

Partial sequence alignment of a hydrolase protein showing a three amino acid insertion that is specific for members of the "Tuberculosis" clade and is absent from most other Mycobacteriaceae.

Figure S60. Partial sequence alignment of the hypothetical protein RN11_1864 showing an eight amino acid insertion that is specific for most members of the "Tuberculosis" clade and is absent from most other Mycobacteriaceae.