Proteomic Analysis of Antigen 60 Complex of M. bovis Bacillus Calmette-Guérin Reveals Presence of Extracellular Vesicle Proteins and Predicted Functional Interactions

Tuberculosis (TB) is ranked among the top 10 causes of death worldwide. New biomarker-based serodiagnostics and vaccines are unmet needs stalling disease control. Antigen 60 (A60) is a thermostable mycobacterial complex typically purified from Bacillus Calmette-Guérin (BCG) vaccine. A60 was historically evaluated for TB serodiagnostic and vaccine potential with variable findings. Despite containing immunogenic proteins, A60 has yet to be proteomically characterized. Here, commercial A60 was (1) trypsin-digested in-solution, analyzed by LC-MS/MS, and searched against M. tuberculosis H37Rv and M. bovis BCG UniProt databases; (2) analyzed using STRING to predict protein–protein interactions; and (3) probed with anti-TB monoclonal antibodies and patient immunoglobulin G (IgG) on Western blot to evaluate antigenicity. We detected 778 proteins in two A60 samples (440 proteins shared), including DnaK, LprG, LpqH, and GroEL1/2, reportedly present in mycobacterial extracellular vesicles (EV). Of these, 107 were also reported in EVs of M. tuberculosis, and 27 key proteins had significant protein–protein interaction, with clustering for chaperonins, ribosomal proteins, and proteins for ligand transport (LpqH and LprG). On Western blot, 7/8 TB and 1/8 non-TB serum samples had reactivity against 37–50 kDa proteins, while LpqH, GroEL2, and PstS1 were strongly detected. In conclusion, A60 comprises numerous proteins, including EV proteins, with predicted biological interactions, which may have implications for biomarker and vaccine development.


Introduction
Tuberculosis (TB) persists as a top global cause of death, particularly in low-income areas [1]. Mycobacterium tuberculosis complex (MTB) organisms are the causative agents of tuberculosis and comprise several related pathogenic mycobacteria, including M. africanum, M. caprae and M. bovis [2]. The latter typically infects cattle, and its attenuated strain, Bacillus Calmette-Guérin (BCG), has been used as a live TB vaccine since 1921 [3]. However, BCG affords poor protection for adults and is incompatible with immunocompromised individuals [4]. Furthermore, the diversity of pathogenic strains that cause TB and heterogeneous clinical manifestations across different subpopulations [5][6][7], reliance on laboratory-confined sputum-based diagnostics [8], and cumbersome treatment regimens [9] have complicated efforts to significantly decrease global TB transmission.

[...] precipitated in ice-cold acetone overnight, then reduced using 10 mM tris(2-carboxyethyl)phosphine (Sigma-Aldrich, St. Louis, MO, USA) (45 min, dark at 37 °C) and alkylated using 55 mM iodoacetamide (Sigma-Aldrich) (30 min, dark at room temperature (RT)). Samples were digested with trypsin (Thermo Fisher Scientific, Waltham, MA, USA) at a 1:50 enzyme:protein ratio (overnight, 37 °C), acidified to pH < 2 with formic acid, and desalted using Pierce C18 Spin Columns (Thermo Fisher Scientific) with flow-through passed back over the column twice. Proteins were eluted with 80% acetonitrile containing 0.1% trifluoroacetic acid, concentrated to 20 µL by SpeedVac™ (Thermo Fisher Scientific) and stored at −80 °C prior to mass spectrometry analysis.

Mass Spectrometry (MS)
LC-MS/MS was performed using an Orbitrap Lumos mass spectrometer (Thermo Fisher Scientific) fitted with nanoflow reversed-phase HPLC (Ultimate 3000 RSLC, Dionex). The nano-LC system was equipped with an Acclaim Pepmap nano-trap column and an Acclaim Pepmap RSLC analytical column. 1 µL of the peptide mix was loaded onto the enrichment (trap) column at an isocratic flow of 5 µL/min of 3% acetonitrile containing 0.1% formic acid for 6 min before the enrichment column was switched in-line with the analytical column. The eluents used for the LC were 0.1% v/v formic acid (solvent A) and 100% acetonitrile/0.1% formic acid v/v (solvent B). The gradient used was 3% B to 20% B for 95 min, 20% B to 40% B in 10 min, 40% B to 80% B in 5 min and maintained at 80% B for the final 5 min before equilibration for 10 min at 3% B prior to the next sample. The mass spectrometer was equipped with a NanoEsi nano-electrospray ion source (Thermo Fisher Scientific) for automated MS/MS. High mass accuracy MS data were obtained in a data-dependent acquisition mode with the Orbitrap resolution set at 75,000 and the top-ten multiply charged species selected for fragmentation by higher-energy collisional dissociation (HCD) (single-charged and double-charged species were ignored). The ion threshold was set to 15,000 counts for MS/MS. The capillary electrophoresis (CE) voltage was set to 27. The resolution was set to 120,000 at MS1 with a lock mass of 445.12003, with HCD fragmentation and MS2 scan in the ion trap. A top 3 s method was used to select species for fragmentation. Singly charged species were ignored and an ion threshold triggering at 1 × 10⁴ was employed. CE voltage was set to 1.9 kV.
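The gradient programme above can be written down as a small table to check the total run time and the solvent B percentage at any time point. The following Python sketch is illustrative only: the names `GRADIENT` and `percent_b` are ours, linear ramps within each segment and an immediate return to 3% B for re-equilibration are assumptions, and the 6 min trap-loading step is not included.

```python
# Gradient segments as (duration_min, start_percent_B, end_percent_B).
# Linear ramps within each segment are an assumption for illustration.
GRADIENT = [
    (95, 3, 20),   # 3% B to 20% B
    (10, 20, 40),  # 20% B to 40% B
    (5, 40, 80),   # 40% B to 80% B
    (5, 80, 80),   # hold at 80% B
    (10, 3, 3),    # re-equilibration at 3% B
]


def percent_b(t_min: float) -> float:
    """Return the percentage of solvent B at t_min minutes into the gradient."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for duration, start, end in GRADIENT:
        if t_min <= elapsed + duration:
            # Fraction of the current segment that has elapsed.
            frac = (t_min - elapsed) / duration
            return start + frac * (end - start)
        elapsed += duration
    raise ValueError("time is beyond the end of the run")


# Total programme length in minutes (excluding trap loading).
total_run_min = sum(duration for duration, _, _ in GRADIENT)
```

Under these assumptions the programme sums to 125 min per injection, of which the first 95 min cover the shallow 3% to 20% B ramp where most peptides would be expected to elute.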

Serum Sample Population
Archived serum samples acquired from repositories managed by the Foundation for Innovative New Diagnostics (FIND) (25) were from Vietnamese HIV-negative, eligible, consenting adult (>18 years) patients with active pulmonary TB (PTB) (n = 8), confirmed using solid or liquid TB culture, and non-TB controls (n = 8) who were provisionally diagnosed with PTB based on chest X-ray and other symptoms suggestive of PTB but tested negative on smear microscopy and culture at enrolment and at two months' follow-up. Serum samples were collected between July 2009 and December 2012 in Vietnam, before initiation of treatment. Information on TB genotype and status of latent or extrapulmonary TB (LTB, ETB) was unavailable. All human samples used in this study are from repositories for which subjects gave their informed consent for inclusion and are non-identifiable. The study was conducted with approval by the Alfred Ethics Committee (Certificate No. 169/13).

Data Analysis
The MS/MS spectra were used to identify proteins using the Mascot search algorithm (Matrix Science, London, UK) queried against UniProt databases for MTB H37Rv (83332) and M. bovis BCG (410289), with trypsin set as the specific digest reagent. Significant protein matches (≥2 significant peptide sequences) were annotated based on Gene Ontology (GO) terms and analyzed using STRING 11.0 [35], a web resource of known and predicted protein-protein interactions. As the BCG database was not available in STRING, we used the MTB H37Rv database for predictive protein-interaction analysis and to classify the identified proteins into their biological process and cellular compartment based on enriched GO terms. The Venn diagram online tool [36] was used to compare significant protein matches of A60-BCG S1 and S2, and to compare proteins of A60-BCG and EVs of MTB published by Lee et al. [37]. Data were tabulated and visualized using Microsoft Excel.
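The filtering and Venn comparison described above amount to simple set operations. The sketch below shows the idea in Python; it is not the online tool used in the study, the accession labels and peptide counts are hypothetical placeholders, and the helper names `significant_proteins` and `venn_regions` are ours. The threshold of at least two significant peptide matches follows the criterion stated in the Results.

```python
def significant_proteins(peptide_counts: dict[str, int], min_peptides: int = 2) -> set[str]:
    """Keep accessions with at least min_peptides significant peptide matches."""
    return {acc for acc, n in peptide_counts.items() if n >= min_peptides}


def venn_regions(a: set[str], b: set[str]) -> dict[str, set[str]]:
    """Split two protein sets into the three regions of a two-set Venn diagram."""
    return {"only_a": a - b, "shared": a & b, "only_b": b - a}


# Hypothetical accession -> significant peptide count tables for two samples.
s1_counts = {"P_A": 5, "P_B": 2, "P_C": 1, "P_D": 3}
s2_counts = {"P_B": 4, "P_D": 2, "P_E": 6, "P_F": 1}

s1 = significant_proteins(s1_counts)
s2 = significant_proteins(s2_counts)
regions = venn_regions(s1, s2)
```

The same `venn_regions` call could be reused to compare the shared A60 proteins against a published EV protein list, provided both lists use the same accession scheme.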

Protein Identification and Annotation
Commercial A60 samples (S1 and S2) were subjected to shotgun proteomic analysis by LC-MS/MS and searched against the BCG UniProt database. Similar to previous studies of "method-antigen complexes" such as MTB and M. bovis PPD [38,39], shotgun proteomic analysis of the A60-BCG complex antigen preparation detected a high number of proteins, specifically 623 and 595 proteins with ≥2 significant matches in A60 S1 and A60 S2, respectively, of which 440 were common to both samples (71% of A60 S1; 74% of A60 S2), amounting to 778 combined protein matches (Figure 1A; Table S1). Overall, the proteins identified in A60 represent approximately 19.9% of the 3891 proteins identified in the BCG proteome [40].

Further analyses were conducted on the 440 proteins shared between A60 S1 and S2. Of these 440 proteins identified using the BCG UniProt database, 426 had matches in the MTB H37Rv database. These 426 proteins were compared with the proteomic profile of EV MTB (Figure 1A) and annotated for GO terms based on enrichments in STRING analysis. Among A60 proteins with significant functional enrichment (FDR < 7.39 × 10⁻⁵) were those associated with metabolic processes (27%), cellular processes (26%), growth (16%) and biosynthetic processes (15%). For cellular component classification, the largest share of proteins was associated with the membrane (35%), followed by cell wall (25%), cytoplasm (23%), extracellular regions (9%) and protein-containing/macromolecular complexes (8%).

Predictive Protein-Protein Interaction
Analysis of the 426 proteins possessing MTB H37Rv homology using the STRING MTB H37Rv database (set to highest confidence, with an interaction score of >0.9) showed significant predictive protein-protein interaction (PPI) enrichment (p-value < 1.0 × 10⁻¹⁶), indicating that proteins in the A60 complex as a group are at least partially biologically connected. The main clusters of functional interactions observed were ribosomal/transcription and translation-related (29 proteins), metabolic enzymes (20 proteins), stress response/protein-refolding chaperones (12 proteins), and fatty acid biosynthesis (9 proteins) (Supplementary Figure S1). Significant interaction networks were also observed in the 107 proteins shared between A60 and EV MTB (p-value < 1.0 × 10⁻¹⁶), primarily consisting of plasma membrane proteins with functional enrichment for growth and protein binding, with clusters of chaperone proteins, ribosomal proteins, and enzymatic proteins (Figure 2).

Based on the highest number of peptides with significant matches as well as experimental detection and available literature, 27 proteins of interest were also analyzed in greater detail (Table 1). The PPI enrichment analysis suggests significant functional interaction between the proteins (p-value 9.49 × 10⁻¹¹), with four main clusters of interaction: (1) chaperonins DnaJ1, DnaK, GroEL1, GrpE, ClpB and GroEL2; (2) ligand transport-related LpqH and LprG; (3) ribosomal RpoB, RpoC, RpsA, Tuf, AtpA, and AtpD; and (4) fatty acid biosynthesis proteins Fas and FabG4. The highest scores obtained for predicted interaction were for RpoC with RpoB (0.999), AtpD with AtpA (0.999), GrpE with DnaK (0.998), and DnaJ1 with DnaK (0.998). Correspondingly, the top functional enrichments for these proteins were predicted to be protein folding (FDR: 1.17 × 10⁻⁸), growth (FDR: 7.59 × 10⁻⁸) and response to heat (FDR: 4.78 × 10⁻⁷), primarily related to chaperone protein activity and stress response.
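To illustrate how a confidence cut-off turns pairwise scores into groups of linked proteins, the following Python sketch keeps only edges scoring above 0.9 and then collects connected components. This is a deliberate simplification and an assumption on our part: STRING applies its own scoring and clustering, and the function name `clusters` is hypothetical. Only the four highest-scoring pairs quoted above are used as input.

```python
from collections import defaultdict

# Combined scores quoted in the text for the four highest-scoring pairs.
EDGES = [
    ("RpoC", "RpoB", 0.999),
    ("AtpD", "AtpA", 0.999),
    ("GrpE", "DnaK", 0.998),
    ("DnaJ1", "DnaK", 0.998),
]


def clusters(edges, min_score=0.9):
    """Group proteins joined by edges scoring above min_score (connected components)."""
    adjacency = defaultdict(set)
    for a, b, score in edges:
        if score > min_score:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, result = set(), []
    for node in sorted(adjacency):
        if node in seen:
            continue
        # Depth-first traversal from an unvisited protein.
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        result.append(sorted(component))
    return result


found = clusters(EDGES)
```

With only these four edges, DnaK links GrpE and DnaJ1 into one chaperone group, while the RpoB/RpoC and AtpA/AtpD pairs remain separate because no edge between them is included in this toy input.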

Antigenicity of A60-BCG
Electrophoresed and blotted A60 probed with rabbit polyclonal anti-WCL antibodies showed several bands of varying intensity between 20 kDa and 75 kDa, with smeared bands at approximately 50 kDa and higher, while clear bands were seen for lower molecular weight (MW) proteins between 20 and 37 kDa (Figure 3a). This pattern of antigenicity with rabbit antiserum appeared similar when A60 was probed with patient serum samples. Western blot of individual serum IgG showed consistent reactivity against proteins of approximately 37–50 kDa in 7/8 patients and 1/8 non-TB controls from Vietnam, while reactivity against a 20 kDa band was observed in 4/7 patients and the same non-TB control (Figure 3c). Finally, strong bands were observed when A60 was probed using pooled mouse monoclonal anti-LpqH, anti-PstS1 and anti-GroEL2 antibodies, particularly for LpqH (Figure 3b). However, bands for PstS1 and GroEL2 appeared at lower MW while the LpqH band appeared at a significantly higher MW.

Discussion
A60 is a large antigenic complex that has been consistently purified from lysate of log-phase grown BCG using a Sepharose 6B SEC column [34], albeit with some antigenic variation between batches. Despite its extended history, the origins of A60 and the reason this complex consistently appears upon cell lysis have not been investigated, given that the micelle-forming complex was presumed to be an artefact of cell lysis.
However, in recent years, there has been emerging interest in re-evaluating classical mycobacterial antigen preparations using global approaches such as mass spectrometry, which may provide more insight into their characteristics to inform research in biomarker and vaccine development. These studies have primarily focussed on the proteomic profiling of PPD [38,39], a cocktail of antigens used for determining latent tuberculosis in the Tuberculin Skin/Mantoux Test (including M. bovis PPD, used for diagnosis of bovine TB), which has been historically purified from steaming cultures of MTB by repeated precipitation with ammonium sulfate [41]. Given the crude nature of PPD and A60, it is unsurprising that the two preparations reportedly share several similar antigens [25]. However, this is the first study known at present to proteomically characterize the A60 complex of M. bovis BCG, including predictive protein-protein interaction analysis of A60 members and their corresponding antigenicity.
Our MS results, identifying hundreds of different proteins, appear consistent with previous proteomic studies on complexes such as PPD reporting identification of 265 [39], 356 [38] and 608 [42] different proteins, and cellular component proteome studies such as M. avium and MTB cell wall identifying 309 [43] and 528 [44] different proteins. In this study, 778 unique proteins with ≥2 significant matches were identified in two experimental replicate A60 samples, among which 440 were detected in both samples while 183 and 155 proteins were unique to A60 S1 and A60 S2, respectively. Of the 440 proteins in A60, 426 proteins with MTB H37Rv homology were largely found to have significant enrichment for cell wall and membrane components, and functional enrichment for metabolic and cellular processes. This appears in line with the functional interaction networks observed on STRING analysis illustrating significant clustering among ribosomal proteins, metabolic enzymes, stress response/protein-refolding chaperones, and fatty acid biosynthesis. These same enrichments for proteins of growth and defense have also been reported in the extracellular proteins of other bacteria during the exponential/log-phase growth [45].
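The overlap figures quoted above can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the counts stated in this paragraph (440 shared, 183 and 155 unique); the variable names are ours.

```python
# Counts stated in the text.
shared = 440      # proteins detected in both A60 S1 and S2
unique_s1 = 183   # proteins detected only in A60 S1
unique_s2 = 155   # proteins detected only in A60 S2

# Per-sample totals and the combined (union) count.
total_s1 = shared + unique_s1
total_s2 = shared + unique_s2
combined = shared + unique_s1 + unique_s2

# Share of each sample's proteins that was also found in the other sample.
pct_shared_s1 = round(100 * shared / total_s1)
pct_shared_s2 = round(100 * shared / total_s2)
```

These values reproduce the 778 combined proteins and the roughly 71% and 74% overlap reported for S1 and S2, respectively.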
These findings suggest that despite assumptions that A60 complexes are artefacts of cell lysis and extraction methods, several proteins present in these heterogeneous complexes may possess physical or functional biological associations. This raises the further question of whether the micellar structures that contain A60 proteins may be associated with proteins released extracellularly in EVs, since a total of 107 proteins identified in A60 were also reported in Lee et al.'s (2015) proteomic analysis of EV MTB [37]. Several of these proteins, such as LpqH, DnaK, GroEL1, and PstS1-related PstS3 [29,46,47], were found to be clustered with other chaperone proteins such as ClpB, DnaJ1 and GrpE based on predictive protein-protein interaction analysis. The last of these, GrpE, has been implicated as a novel immune activator capable of interacting with dendritic cells to generate Th1-biased memory T cells [48] and shown to confer better protection than DnaK immunization [49]. Although the presence of several chaperones known to prevent misfolding, facilitate folding/refolding, and, more recently, unfold proteins in order to recover functional proteins from aggregates appears to support the view that A60 is a purposeful complex [50], the possibility that the protein aggregation is a product of bacterial cell lysis cannot be excluded [51]. Additionally, because DnaK and other chaperone proteins present in A60, such as HspX and GroEL, are known to be highly homologous and conserved in different mycobacteria [52], it is unsurprising that attempts at using crude preparations such as A60 and PPD for diagnostic purposes have given rise to many false-positive results [41].
Besides the presence of a chaperone protein cluster, other significant interactions found among the 27 proteins of interest highlight clustering of protein synthesis-associated RpoB, RpoC, RpsA, Tuf, AtpA, and AtpD, and interaction between ligand transport proteins LpqH and LprG, which have been observed in EVs. While less is known about the latter three proteins, RpoB is well described due to its association with resistance to the first-line TB drug rifampicin, with mutations in rpoC recently also found to influence RpoB-related resistance [53], while mutations in the RpsA-coding gene are known to confer resistance against another first-line TB drug, pyrazinamide [54]. LpqH is a well-described protein consistently found in EVs, which has been observed to be overexpressed when EV production is increased [55], while LprG translocates lipoarabinomannan to the cell surface and transports triacylglycerides across the inner cell membrane into the periplasm [56]. These co-occurring ligand transport proteins are both virulence factors recognized by toll-like receptor 2 (TLR2) and are known to enhance TLR2-associated inflammatory responses [47,57].
Despite the large number of proteins identified on LC-MS/MS, experimental confirmation remains limited by the availability of monoclonal antibodies against the proteins. Using available antibodies, the Western blot analysis confirms the presence of LpqH and GroEL2, as well as PstS1 reactivity that is likely due to molecular and functional similarity to PstS3 [31]. Although the protein bands appear at molecular weights different from their predicted sizes, particularly for LpqH, this size discrepancy has been observed previously and is hypothesized to relate to post-translational modifications and/or anomalous behaviour of the protein on SDS-PAGE [58].
When A60 was probed with TB patient serum samples, reactivity was most consistently observed against proteins of approximately 37–50 kDa and 20 kDa. These MWs appear to match the range of sizes of A60 proteins reported in earlier studies [15,21], as well as the sizes of the majority of the 27 proteins identified with the highest significant peptide matches on LC-MS/MS in this study.
However, these were also the same bands recognized by a single non-TB control serum sample. This cross-reactivity may reflect the presence of conserved proteins such as DnaK and GroEL, and population exposure to environmental mycobacteria, as is common in TB-endemic countries such as Vietnam, from which the serum samples were collected [59], although it is also possible that this patient had ETB that was not detected by the diagnostic methods used.
The major limitations of the data presented relate to batch-to-batch variation of A60 and BCG, and the influence of sample processing for MS, which may hamper exact reproducibility of data. Although the analysis focussed on consistently detected proteins, as with many native antigenic preparations (including EVs), contaminating proteins may be present, which may result in artefacts. The STRING analysis is predictive and the interactions illustrated have yet to be experimentally validated, largely due to the limited antibodies available against the proteins detected and the lack of reactivity of available antibodies against certain proteins such as DnaK (data not shown) for unknown reasons, which complicates attempts to verify predicted interactions through immunological methods such as Western blot and co-immunoprecipitation. Furthermore, due to limited databases available for M. bovis BCG, PPI analysis using STRING could only be conducted for proteins with peptide sequence homology to MTB H37Rv proteins. Finally, this analysis of A60 neglects the many glycolipid antigens that are not detectable by MS and, due to its qualitative nature, the proteins identified and their respective significant peptide matches provide limited information on protein abundance and distribution in the complex. Hence, in-depth quantitative studies are important to determine the abundance and presence of up- or down-regulated proteins in A60, especially for EV-associated proteins. Particularly striking is the signal intensity detected on Western blot for LpqH, a bona fide EV-associated protein, despite it being detected with lower numbers of significant peptide matches, underscoring the qualitative nature of the MS analysis.
This study remains a first attempt to identify and characterize proteins present in the A60 complex using a global proteomic bioinformatics approach, which has identified several proteins associated with MTB growth, survival and interaction with host immunity. Future work may focus on purification of A60-like HMW complexes from virulent MTB for comparison with HMW complexes of attenuated/non-pathogenic strains, such as MTB H37Ra and BCG, to identify upregulated proteins unique to MTB using labelled proteomic approaches such as isobaric tags for relative and absolute quantitation (iTRAQ), and for direct proteomic and electron microscopic comparison with EVs. Such characterization may bring us closer to isolation of more consistent preparations of immunogenic and antigenic proteins that are specific to pathogenic MTB and possess reduced cross-reactivity with non-pathogenic mycobacteria, to inform the burgeoning research in mycobacterial EVs and impel progress in TB biomarker and vaccine development.