Whole-Genome Sequencing of Mycobacterium tuberculosis Isolates from Diabetic and Non-Diabetic Patients with Pulmonary Tuberculosis

The association of tuberculosis and type 2 diabetes mellitus has been a recognized re-emerging challenge in management of the convergence of the two epidemics. Though much of the literature has studied this association, there is less knowledge in the field of genetic diversities that might occur in strains infecting tuberculosis patients with and without diabetes. Our study focused on determining the extent of diversity of genotypes of Mycobacterium tuberculosis in both these categories of patients. We subjected 55 M. tuberculosis isolates from patients diagnosed with pulmonary TB with and without type 2 diabetes mellitus to whole-genome sequencing on Illumina Hi Seq platform. The most common lineage identified was lineage 1, the Indo-Oceanic lineage (n = 22%), followed by lineage 4, the Euro-American lineage (n = 18, 33%); lineage 3, the East-African Indian lineage (n = 13, 24%); and lineage 2, the East-Asian lineage (n = 1, 2%). There were no significant differences in the distribution of lineages in both diabetics and non-diabetics in the South Indian population, and further studies involving computational analysis and comparative transcriptomics are needed to provide deeper insights.


Introduction
Globally, about 10 million people are estimated to have developed tuberculosis (TB) in 2020 with eight countries accounting for two-thirds of the global total, with India reporting the largest proportion at 26% [1]. Type 2 diabetes mellitus (DM) is becoming a major public health problem globally, especially in emerging-economy countries [2]. Worldwide, about 15-25% of annual incident TB cases are estimated to occur among persons with DM [3,4]. World Health Organization (WHO) has indicated that India will become the "diabetes capital of the world" by 2025 [5]. DM increases the risk of developing TB by twoto three-fold [6,7], and DM worsens the clinical course of TB, while TB worsens glycemic control in those with DM [5]. Also, DM is associated with treatment failure, relapse, and deaths [8]. The association of active TB in DM patients is significantly increased when compared to those without DM [9,10]. Diminished innate and adaptive immunity likely contributes to the reduced ability of DM patients to control M. tuberculosis infection [11]. In DM patients, subsequent episodes of infection are more likely to be from the same bacteria as the previous episode. However, previous studies have shown that the occurrence of Patients diagnosed with pulmonary TB examined between January and December 2019 at J.S.S. Hospital in Mysore, Karnataka state, India, were included in the study. We grouped the patients according to their place of residence in different administrative units of the state, i.e., districts and taluks including Mysore, Chamrajnagar, K.R. Pete, Hassan, Hunsur, Kodagu, Mandya, Srirangapatna, T. Narasipura, and H.D.Kote. Sputum samples from the recruited subjects were tested at the clinical Microbiology laboratory at J.S.S Hospital. Samples that were smear-positive by Ziehl Neelsen staining were utilized for the study. Patient details were collected from the hospital records.

Sample Processing
Sputum samples were digested and decontaminated with the N-acetyl-L-cysteine sodium hydroxide (NALC-NaOH) method and cultured on Lowenstein-Jensen medium (HiMedia Laboratories, Mumbai, India). Cultures were monitored for growth for up to 4 weeks before declaring them as negative growth. M. tuberculosis colonies were scraped from the Lowenstein-Jensen medium and genomic DNA was extracted with the QIAamp DNA mini kit (QIAGEN, Hilden, Germany) following the manufacturer's protocol. Relevant patient demographic and clinical information details were collected from their medical records.

DNA Quality Check
The DNA samples were assessed for quality with the criteria of an OD260/280 ratio of 1.8 to 2.0 and an OD260/230 ratio of 2.0 to 2.2, with the latter bearing an additional value for purity. In addition, gel electrophoresis was performed to determine if a further cleaning was required before shearing.

DNA Library Preparation Protocol
Whole-genome libraries were prepared with NEBNext ® Ultra™ DNA Library Prep Kit (Cat. No: E7370L). The workflow involved shearing of DNA (250 bp), repairing ends, and adenylation of 3 ends, followed by adapter ligation. At each step, the products were purified with AMPure beads (Beckman Coulter, Cat. No.: A63882, San Jose, CA, USA). The adapter sequences were added onto the ends of DNA fragments to generate paired-end libraries. The resulting adaptor-ligated libraries were purified and index tags were added by amplification, followed by purification. Libraries were assessed for quality and quantity with an Agilent 2200 tapestation system (Cat. No.: 3-PM 863NA).

Sequencing Protocol
Prepared libraries were quantified with Qubit high-sensitivity reagent. The obtained libraries were diluted to a final concentration of 2 nm in 10 µL and were subjected to cluster amplification. Once the cluster generation was completed, the flow cells were loaded on to the sequencer. The sequencing was carried out on Illumina Hi Seq at MedGenome Labs Ltd., Bangalore, to generate 2 × 150 base-pair sequence reads at 100× sequencing depth coverage (~0.5 GB). Sequence data were processed to generate FASTQ files, which were uploaded onto the FTP server for secondary data analysis.

Bioinformatics Analysis
The FASTQ sequence files were analyzed with Geneious [14] software version R11.02 with default settings. Raw and processed read statistics were computed and visualized with FastQC v.0.11.3 for quality control. Reads with Phred score over 30 were used for our analysis. The short reads were assembled by mapping to a reference sequence of H37Rv M. tuberculosis strain (Gen Bank accession number AL_123456) and de novo assembly. Paired-end reads were mapped to the publicly available annotated genome of M. tuberculosis reference strain H37Rv (Gen Bank accession number AL_123456). The consensus sequence was obtained with the default setting of Geneious. Consensus fasta files were submitted for rapid annotation based on the subsystems technology (RAST) [15] server and genotyping software. Mycobacterial lineage, sublineage, main spoligotype, and RD (region of difference) were predicted with TB Profiler (https://tbdr.lshtm.ac.uk/, accessed on 17 August 2019) with the default setting. Lineage prediction, Beijing typing, and in silico spoligotyping were performed with CASTB at default settings [16], and TB Profiler and PhyresSE [17] databases were consulted for the detection of drug resistance and associated mutations. Core and accessory genome phylogenetic trees were constructed with the automasublineated pipeline of Roary [18] with the default setting. The newick files of the trees were visualized with the iTOL v5 (Interactive tree of life) online-based software (https://itol.embl.de/, accessed on 25 February 2020) [19].

Statistical Analysis
Data were entered in Microsoft Excel and analyzed using SPSS Version 25 (licensed to the institution). Descriptive analysis was carried out using proportions, and graphs were used to describe the lineage distribution. Inferential statistics were analyzed using chi-square analysis to find the association between patient characteristics and diabetes status. p-values were considered significant when <0.05.

Patient Characteristics
Between January and December 2019, we obtained 55 M. tuberculosis isolates from patients diagnosed with pulmonary TB. Of these, 20 (36%) were from patients with DM and 35 (64%) were from patients without DM. Demographic and clinical characteristics of the patients are shown in Table 1. p value was calculated via chi-square test with a 2 × 2 contingency table and the association between two categorical variables was assessed. Among all the variables tested, only the p value for age group was found to be significant (0.04). The majority of patients were men (n = 40, 73%) in the age group of 50-69, with diabetes being the major co-morbidity (n = 20, 36.36%). No formal education of any grade had been received by most patients (n = 24, 43.63%) and many of them were unemployed or their occupation status was unknown (n = 21, 38%). There was a history of smoking in only a small number of patients (n = 7, 12.7%).

M. tuberculosis Genotypes
To explore the genomic diversities of the isolates, we performed whole-genome sequencing to characterize the genomic profiles of the 55 isolates. Figure 1 shows the distribution of lineages. The most common lineage among the samples was lineage 1, the Indo-Oceanic lineage (n = 22, 40%), followed by lineage 4, the Euro-American lineage (n = 18, 33%); lineage 3, the East-African Indian lineage (n = 13, 24%); and lineage 2, the East-Asian lineage (n = 1, 2%). One isolate was not assigned any lineage by the bioinformatics tools as it was identified as contaminated. Indo-Oceanic lineage was the most common lineage among both DM (n = 9, 45%) and non-DM patients (n = 13, 37%).  Table 2 and Figure 2 show distribution of the study population with regard to lineages and their geographical place of residence. Each district had almost equal number of diabetic and non-diabetic patients. The highest number of patients came from Mysore (n = 26). All four lineages were detected in the study population from Mysore, which included lineage 1 (n = 11), lineage 2 (n = 1), lineage 3 (n = 4), and lineage 4 (n = 10). Table 2. Distribution of lineages with respect to geographical locations of patients and their diabetic status (DM-diabetic, ND-non-diabetic). Lineage 1 (Indo-Oceanic) was the most common among both diabetic (n = 9) and non-diabetic populations (n = 13). Asian lineage (n = 1). One particular isolate out of the fifty-five was identified as contaminated. Table 2 and Figure 2 show distribution of the study population with regard to lineages and their geographical place of residence. Each district had almost equal number of diabetic and non-diabetic patients. The highest number of patients came from Mysore (n = 26). All four lineages were detected in the study population from Mysore, which included lineage 1 (n = 11), lineage 2 (n = 1), lineage 3 (n = 4), and lineage 4 (n = 10).
Further, the isolates were subjected to in silico spoligotyping and long-sequence polymorphism analysis (region of difference-RD) using SpolPred software of TB Profiler, and the results are as shown in Table 3. EAI (40%), followed by CAS (22%), were the main spoligotypes identified among both DM and non-DM patients.
To trace the evolutionary relationships between the clinical isolates, 48 best-quality genomes were subjected to pan-genome analysis and phylogenetic tree construction by the Roary pan-genome pipeline. Table 4 shows the core genome statistics used for tree construction, and Figure 3 shows the Roary matrix with distribution of core genes and accessory genes. Trees based on core and accessory genes were visualized with iToL, as shown in Figures 4 and 5, respectively, revealing grouping of the isolates into specific clades, which matched with the lineage analysis by TB Profiler database. In addition, the phylogenetic trees did not reveal any striking genetic clusters of isolates within lineages 1, 2, 3, and 4. Table 2. Distribution of lineages with respect to geographical locations of patients and their diabetic status (DM-diabetic, ND-non-diabetic). Lineage 1 (Indo-Oceanic) was the most common among both diabetic (n = 9) and non-diabetic populations (n = 13).

Lineage
Charmrajnagar  Further, the isolates were subjected to in silico spoligotyping and long-seque ymorphism analysis (region of difference-RD) using SpolPred software of TB and the results are as shown in Table 3. EAI (40%), followed by CAS (22%), were t spoligotypes identified among both DM and non-DM patients.    shown in Figures 4 and 5, respectively, revealing grouping of the isolates into specific clades, which matched with the lineage analysis by TB Profiler database. In addition, the phylogenetic trees did not reveal any striking genetic clusters of isolates within lineages 1, 2, 3, and 4.

M. tuberculosis Drug Resistance
Drug resistance of M. tuberculosis isolates by DM status is shown in Table 3. Drug resistance mutations were detected by PhyresSE and TB Profiler. Isoniazid resistance was seen in 5 of the 55 isolates (9%). Substitution of serine to threonine at codon 315 of the katG gene accounted for mutations in two DM and three non-DM patients. Five isolates (one DM and four non-DM) showed resistance to streptomycin (9%). Three of them accounted for mutations in gidB, with two of them showing substitution of valine to glycine at codon 65 and one having glycine-to-alanine substitution at codon 34. Two other isolates had rrs gene mutations. Of the 55 isolates, only 1 showed ethambutol resistance (2%), which had a mutation in the embB gene with substitution of glycine to aspartine at codon 406. Two

M. tuberculosis Drug Resistance
Drug resistance of M. tuberculosis isolates by DM status is shown in Table 3. Drug resistance mutations were detected by PhyresSE and TB Profiler. Isoniazid resistance was seen in 5 of the 55 isolates (9%). Substitution of serine to threonine at codon 315 of the katG gene accounted for mutations in two DM and three non-DM patients. Five isolates (one DM and four non-DM) showed resistance to streptomycin (9%). Three of them accounted for mutations in gidB, with two of them showing substitution of valine to glycine at codon 65 and one having glycine-to-alanine substitution at codon 34. Two other isolates had rrs gene mutations. Of the 55 isolates, only 1 showed ethambutol resistance (2%), which had a mutation in the embB gene with substitution of glycine to aspartine at codon 406. Two isolates (one DM and one non-DM) had resistance to ethionamide (4%) with mutation of ethA gene with substitution of glycine to aspartine at codon 413.

Discussion
India has a high burden of TB, accounting for nearly a quarter of the global burden of TB. The prevalence is also high, estimated at 188/100,000 in 2021 [20]. The increasing prevalence of DM will exacerbate the incidence of TB, despite the efforts of the National Tuberculosis Elimination Program (NTEP) launched in 2020. DM occurring in an individual with LTBI may induce the latent infection to reactivate, or a new infection occurring someone with DM may lead to rapid progression to active disease. These outcomes are attributed to diminished protective immunity in those with DM. Here, we hypothesized that diminished immunity in DM subjects may enhance their susceptibility to a greater diversity of M. tuberculosis strains than in subjects without DM. We performed a detailed analysis of the WGS of M. tuberculosis strains isolated from patients with TB with or without DM. We found no significant difference in the M. tuberculosis spoligotypes isolated from DM vs. non-DM patients. Those with DM were all infected with the same spoligotypes found in non-DM TB patients. There was no difference in distribution of these genotypes with respect to geographical place of residence of the patients. Phylogenetic analysis revealed that the distribution of the genotypes from DM and non-DM TB patients clustered together. There was also no significant difference in drug resistance profiles in terms of DM status.
The only previous WGS study from South India identified M. tuberculosis lineages 1 (Indo-Oceanic lineage) and 3 (East-African Indian lineage) as the predominant lineages, which occur at a substantially lower frequency elsewhere. The Indo-Oceanic lineage was the most common (70%), followed by the East-African Indian lineage (16%) [21]. Lineages 2 (East Asian lineage) and 4 (Euro-American lineage) are most common in Europe, Africa, and many other parts of the world [22]. Within India, lineage 3 is predominantly found in the North, while lineage 1 is common in the South [23]. We observed that the most common lineage detected among both the DM and non-DM subjects was the Indo-Oceanic lineage, which accounted for 40% of the isolates, while the Euro-American lineage accounted for 32% of the isolates.
Although not significant, CAS was more common among TB patients with DM in our study. The Beijing strain, which is associated with greater risk for drug resistance [24,25], was seen in only one isolate (1.8%) in our study. The prevalence of the Beijing strain increases in states geographically situated northwards, with the highest prevalence observed in the state of Sikkim at 62.4% [26].
Our finding is similar to a study conducted in Kenya [27] which concluded that DM did not significantly increase M. tuberculosis genotype clustering among TB patients. A meta-analysis of six cohort studies of M. tuberculosis genotypes in those with DM concluded that clustering of DM in TB transmission chains has to be further investigated, due to several limitations pertaining to study setting and factors impacting cluster frequency [28].
The major limitation of this study is the small sample size of the TB patients and their isolates. The TB patients were identified over a period of 12 months. Furthermore, the analysis of the lineages and drug resistance was confined to in silico analyses of the WGS, which may have misclassified some of the isolates [29] However, other genotypic or phenotypic analyses of the M. tuberculosis strain are not likely to have yielded different results. A more granular analysis of the WGS to identify genetic features of the pan-genomes of the isolates from DM vs. non-DM TB patients may reveal features associated with DM or non-DM patients. Table 5 shows studies determining lineages and spoligotypes distributed across the world and specifically in India . An overview of various studies conducted in different regions around the world to determine the predominant lineages and spoligotypes of Mycobacterium tuberculosis, offers valuable insights into the distribution and diversity of the strains in different populations, and also raises several points for critical discussion. Firstly, there is a predominance of the Euro-American lineage in several regions, such as Uganda, Botswana, Tanzania, and Argentina. The high prevalence of this lineage suggests a common ancestry and widespread dissemination, possibly through historical migration or colonization. However, without a comprehensive analysis of genetic variations within the Euro-American lineage, it is challenging to ascertain whether these strains originated from a single source or if multiple introductions occurred. Secondly, the predominance of specific spoligotypes within lineages raises questions about their epidemiological significance. For instance, the LAM spoligotype is prevalent in Botswana, Peru, and Kenya, while the T spoligotype is predominant in Botswana and Argentina. Understanding the transmission dynamics and potential virulence of these specific spoligotypes requires additional investigations, such as molecular typing techniques and population-based studies. Furthermore, the use of different methods for strain typing, including SNP typing, spoligotyping, MIRU-VNTR, WGS, and LSP typing, across the studies raises concerns about comparability and standardization. While each method has its advantages and limitations, it is crucial to establish consistent methodologies to ensure accurate comparisons between studies. Harmonizing the techniques used would facilitate a more comprehensive and standardized understanding of TB strain distribution worldwide. Another noteworthy observation is the significant regional variation within a country. For instance, in India, different spoligotypes dominate in various regions, such as Beijing type in Sikkim, EAI in Agra, and CAS in Delhi. These regional differences may be attributed to local transmission dynamics, genetic factors, or socioeconomic variations. Understanding the reasons behind this intranational heterogeneity would aid in developing targeted interventions and policies to control the spread of TB within specific regions. Additionally, the table highlights the wide use of spoligotyping as a typing method in several studies. While spoligotyping provides valuable information on the genetic diversity of TB strains, its limitations in discriminating closely related strains or accurately predicting transmission chains are well-documented. Therefore, integrating complementary techniques, such as MIRU-VNTR and WGS, can enhance the resolution and accuracy of strain typing, leading to a more comprehensive understanding of TB epidemiology. This summary of important studies provides a snapshot of TB strain distribution in different regions worldwide and highlights the predominance of specific lineages and spoligotypes in various populations, raising important questions about their origins, transmission dynamics, and potential implications for disease control. However, the variability in methods used across studies and the need for standardized approaches warrant further attention. Future research should aim to address these limitations and explore the underlying factors contributing to the observed strain diversity, ultimately aiding in the development of effective TB control strategies.  The study focused on understanding the impact of DM on the diversity of M. tuberculosis strains and drug resistance profiles. It was hypothesized that individuals with DM may be more susceptible to a greater diversity of M. tuberculosis strains due to diminished protective immunity. However, the analysis revealed no significant difference in the spoligotypes or drug resistance profiles between DM and non-DM TB patients. The predominant lineages detected were the Indo-Oceanic lineage and the Euro-American lineage, which aligns with previous studies conducted in South India. The findings suggest that there are no specific lineages more common among diabetics compared to non-diabetics unlike studies from Lima, Peru [13] which observed that subjects with diabetes and pre-diabetes had significant differences in mycobacterial strains causing disease. Drug resistance analysis revealed that isoniazid resistance was observed in 9% of the isolates, with specific mutations detected in the katG gene. Streptomycin resistance was found in 9% of the isolates, associated with mutations in the gidB and rrs genes. Ethambutol resistance was observed in only 2% of the isolates, with a mutation in the embB gene, while 4% isolates showed resistance to ethionamide with mutations in the ethA gene. Fortunately, there were no cases with rifampicin resistance. These findings provide valuable insights into the clinical characteristics, genomic diversity, and drug resistance profiles of M. tuberculosis isolates in the study population, contributing to our understanding of tuberculosis and its management.
However, the study's limitations, including the small sample size and reliance on in silico analyses, should be considered. Further research with a larger sample size and more comprehensive genetic analyses is required. In future, we plan to explore the potential associations between DM and M. tuberculosis strains inspired by studies that used computational models [63] and comparative transcriptomics [64] designed to study the mechanisms of such relationships. This would provide a holistic view on this subject.

Conclusions
There was no difference in the genotypes or drug resistance profiles of M. tuberculosis isolated from DM vs. non-DM TB patients. The Indo-Oceanic lineage, followed by the Euro-American lineage were both similarly represented in DM and non-DM patients. Spoligotyping analysis observed that EAI and CAS were the most common spoligotypes. The analysis of the WGS based on current M. tuberculosis lineage classification schemes may not be sufficiently sensitive. However, the hypothesis that strains and lineages of M. tuberculosis among diabetics versus non-diabetics may differ cannot be rejected based on these preliminary study results. Further, to confirm the strength of these associations and to obtain a holistic view, we need comprehensive analysis, including comparative transcriptomics and computational analysis, to be performed on a larger population.