Genomic Sequencing Profiles of Mycobacterium tuberculosis in Mandalay Region, Myanmar

This study aimed to characterize whole-genome sequencing (WGS) information of Mycobacterium tuberculosis (Mtb) in the Mandalay region of Myanmar. It was a cross-sectional study conducted with 151 Mtb isolates obtained from the fourth nationwide anti-tuberculosis (TB) drug-resistance survey. Frequency of lineages 1, 2, 3, and 4 were 55, 65, 9, and 22, respectively. The most common sublineage was L1.1.3.1 (n = 31). Respective multi-drug resistant tuberculosis (MDR-TB) frequencies were 1, 1, 0, and 0. Four clusters of 3 (L2), 2 (L4), 2 (L1), and 2 (L2) isolates defined by a 20-single-nucleotide variant (SNV) cutoff were detected. Simpson’s index for sublineages was 0.0709. Such high diversity suggests that the area probably had imported Mtb from many geographical sources. Relatively few genetic clusters and MDR-TB suggest there is a chance the future control will succeed if it is carried out properly.


Introduction
Although there are international efforts to fight tuberculosis (TB), it remains the main cause of death from a single pathogen [1]. Understanding the nature of the pathogen, Mycobacterium tuberculosis (Mtb), is crucial in the prevention and control program.
Whole-genome sequencing (WGS) is an effective way to disclose the nature of Mtb. Data from WGS can be used for the classification of Mtb into lineages and sublineages, with much better precision and accuracy than previous technologies such as restriction fragment length polymorphism (RFLP) [2]. With the evidence of certain genes associated with certain drug resistance, WGS can replace the conventional drug susceptibility test (DST) in the drug resistance survey. Moreover, the genetic distance between two or more isolates can be used to detect genetic cluster outbreaks of Mtb.
Worldwide, Mtb was classified into four major lineages (L), which are significantly linked with geography and host genetics [3]. Common areas of L1 to L4 are East Africa and Southeast Asia; Asia to Europe and Africa; South Asia, North Africa, and East Africa; and Europe and America [4][5][6][7]. Any areas with all sublineages in common would be considered as having high biodiversity, which may be due to the importing and maintenance of Mtb from various geographic sources. On the other hand, evidence of the genetic clusters would suggest the recent outbreak or the domination of transmission of a certain strain.
In Myanmar, the estimated incidence of TB was 360 per 100,000 population. The estimated proportions of TB cases with multi-drug-resistant (MDR)/rifampicin-resistant tuberculosis (RR-TB) in 2021 were 4.1% (3.8-4.3) among new TB cases and 19% (18)(19)(20) among previously treated cases [1]. The Myanmar National TB Programme (NTP) conducts the anti-TB drug resistance survey every five years. The three previous surveys revealed the proportion of MDR-TB among new and previously treated TB patients to be 4.0% and 15.5% in 2002-2003; 4.2% and 10.0% in 2007-2008; and 5.0% and 27.1% in 2012-2013 [8]. The fourth nationwide anti-TB drug resistance survey was conducted in 2019-2020 [8]. Here, we report the Mandalay regional part of this fourth survey, which was the first survey using WGS. Although the study of Phyu et al. was the first to describe the distribution of lineages and drug resistances using WGS in Upper Myanmar [9], here we provide new information about genomic clusters, biodiversity, and sublineages, in detail, that were circulating in the Mandalay region.
To broaden the understanding of Mtb genomics in Myanmar, readers are reminded that there was a similar study at Kayin State in the Myanmar Thailand Border area in 2019 [10]. The information contributed by our current report, when combined with that from the Kayin study, will provide valuable input for the management of the National TB Control Programme in Myanmar.
The objectives of this study were (1) to construct the phylogenetic tree based on the genomic sequences of Mtb isolates obtained from the survey in the Mandalay region of Myanmar; (2) to classify the genotypes of isolates and examine the possible genetic clustering among the collected isolates in Mandalay region; (3) to compare the biodiversity and sublineages of TB isolates in Mandalay region and Kayin state in Myanmar; and (4) to describe the drug-resistance mutations with sublineages in Mandalay.

Study Sites
Mandalay is the major city in the middle part of Myanmar. The population of Mandalay region is around 6.2 million [11]. Notified MDR/RR-TB cases were 180 in 2020, the third rank in the whole country [12]. The Mandalay region consists of seven districts, which are subdivided into 28 townships. From the NTP survey design, our study sites covered five townships: Chanmyathazi, Kyaukpadaung, Meiktila, Sintgaing, and Singu townships. They are shown in Supplementary Figure S1.
Kayin state is situated in Lower Myanmar and adjacent to the Mandalay region. The population of Kayin state is 1.574 million [11]. It consists of five districts, which are subdivided into 7 townships. The study sites were three townships in Kayin state, and they were Hpa-An, Kawkareik, and Myawaddy [10].

Study Design
A cross-sectional study was carried out between February and August 2020 in the above five townships involving TB patients based on positive sputum culture regardless of their microscopic (for acid fast bacilli) and/or molecular study (Gene Xpert MTB/RIF Assay) results. The selected patients must not have been incarcerated at the time of selection and must never have received anti-TB treatment for more than seven days in their current regimen.

Participants
Of 28 townships of the Mandalay region, 5 were used because they could provide sputum samples from at least 10 smear positive consenting patients [8].
Morning sputum was collected on 3 consecutive days. The first specimen was sent for Gene Xpert test. The other two were sent for TB culture at the Upper Myanmar TB Laboratory if the first produced a positive Xpert test [8].
Based on the assumption that the proportion of MDR-TB was 0.05, the design effect was 1.2, and the 95% confidence limit was 0.04 from the estimate, the required sample size was 137 [13].

TB Culture Testing
At the Upper Myanmar TB Laboratory with biosafety level 3, sputum samples were decontaminated and inoculated in Mycobacterium Growth Indicator Tube (MGIT) culture immediately on arrival, and within ≤3 days of sputum sample collection. For each sample, one Löwenstein-Jensen media (LJ) tube was inoculated in parallel to MGIT as a back-up, kept at 37 • C, and the sputum sample discarded thereafter, to ensure that re-culturing in MGIT was possible if the initial MGIT culture failed. For each specimen, and for each positive MGIT culture that was confirmed as Mtb using the Capilia TB Immunochromatographic test (ICT) kit, at least one aliquot of viable culture was cryopreserved in glycerol at −80 • C, and the remaining culture tube was sub-cultured onto 2 LJ tubes. One positive sub-cultured LJ tube was used for DNA extraction for WGS, and the other tube was stored at −80 • C as a back-up, in case further DNA extractions were required [8].
Among the 221 smear positive or Xpert MTB/RIF positive samples, 206 samples were successfully cultured.

Extraction of Genomic DNA
At the Upper Myanmar TB Laboratory, genomic DNA was extracted from sputum culture isolates. MoBio Microbial DNA Isolation Kits, (Qigen DNEasy Ultraclean Microbial DNA extraction kit, Cat No: 12224-50) was used [14][15][16][17]. Both first-and second-line line probe assay (LPA) were tested by the MTB DRplus kit and the MTB DRsl kit. The extracted DNA was kept at −80 • C and sent to University of Otago, New Zealand via the National TB Reference Laboratory every 3 months for WGS or as necessary according to the international regulation to assure the stability of extracted DNA.

Sequencing at University of Otago
At University of Otago, extracted DNA was sequenced using Illumina MiSeq (https: //www.illumina.com, accessed on 5 February 2021) as previously described [15,16]. The sequencing data (FASTQ) were stored on the University of Otago's high-capacity central file storage (HCS) server.

Genomic Characterization
At the Faculty of Science, Mahidol University, Bangkok, Thailand, we used the snpplet pipeline for processing short-read sequencing data to obtain short variants (SNPs and indels). Using trimmomatic v0.39, the short reads were trimmed to remove adapter sequences and low-quality read positions (sliding-window trimming with a window size of 4 and a read quality threshold of 30) [18]. The trimmed reads were then mapped to the H37Rv reference genome (NC_000962.3) using bwa mem [19]. Picard's MarkDuplicates was used to identify duplicate reads before a per-sample variant calling using GATK (genome analysis toolkit) HaplotypeCaller in a haploid model [20], excluding bases with a quality score below 20. To compare SNPs across samples, we performed joint genotyping of all samples using GATK GenotypeGVCFs [20], using per-sample variant calls as inputs. We used mtbtyper for genotyping Mtb isolates from WGS data (https://github.com/ythaworn/mtbtyper, accessed on 15 October 2022).

Phylogenetic Analysis
A maximum-likelihood (ML) phylogenetic tree of 151 isolates was inferred using IQ-TREE v2 [21] with ultrafast bootstrap supports from 1000 replications (Figure 1). For the quality of WGS, this study cut off the coverage and read depth of sequencing at 90 and 15, respectively. The phylogenetic tree was constructed from 151 isolates. The phylogenetic tree was visualized using the FigTree program version 1.4.4 (http://tree.bio.ed.ac.uk/software/ figtree/, accessed on 20 October 2022). The best-fit nucleotide substitution model was K3Pu + F+ASC + R4 chosen according to the Bayesian Information Criterion (BIC) as determined by ModelFinder [22]. The lineage 4 H37Rv reference strain (Gene Bank accession number NC_000962.3) was used as an outgroup for rooting the tree. We used mtbtyper for genotyping Mtb isolates from WGS data (https://github.com/y worn/mtbtyper, accessed on 15 October 2022).

Phylogenetic Analysis
A maximum-likelihood (ML) phylogenetic tree of 151 isolates was inferred using TREE v2 [21] with ultrafast bootstrap supports from 1000 replications (Figure 1). For quality of WGS, this study cut off the coverage and read depth of sequencing at 90 and respectively. The phylogenetic tree was constructed from 151 isolates. The phylogen tree was visualized using the FigTree program version 1.4.4 (http://tree.bio.ed.ac.uk/s ware/figtree/, accessed on 20 October 2022). The best-fit nucleotide substitution model K3Pu + F+ASC + R4 chosen according to the Bayesian Information Criterion (BIC) as termined by ModelFinder [22]. The lineage 4 H37Rv reference strain (Gene Bank access number NC_000962.3) was used as an outgroup for rooting the tree.   [23]. We identified genomic clusters as clades in the phylogeny containing isolates that can be linked via pairwise single nucleotide polymorphism (SNP) distances. If the pairwise distance between two isolates was <20 SNPs, they were considered closely related or genomically linked [24]. A cluster contains an aggregate of pairs of isolates in which each one differs by fewer than 20 SNPs from at least one of the other elements of the cluster.

Biodiversity
At the Department of Epidemiology, Faculty of Medicine, Prince of Songkla University, Hat Yai, Thailand, Simpson's index was calculated based on the formula: where D is Simpson's index, n is the total number of organisms of a particular species, N is the total number of organisms of all species, and Σ means "sum up". The scale ranges from 0-1, with 0 representing the highest biodiversity of a particular species and 1 representing the lowest biodiversity of a particular species [25,26].

Drug Resistance
We used TB-profiler software for the prediction of drug-resistance mutation [27].

Merging with Kayin State Data
The 109 samples of sequencing data for Kayin state were downloaded from the European Nucleotide Archive (ENA) of EMBL-EBI mirrored in the Sequence Read Archive (SRA) database [10].

Statistical Analysis
Statistical analyses were performed using R version 4.0.2 (R foundation for Statistical Computing, Vienna, Austria). The Mtb major lineages, sublineages, and their drugresistance mutations were presented as frequency and percentage. Information on drug resistance was limited to only untreated cases in order to assess only primary drug resistance [28], which better reflected the severity of the problem. The relationship between the geographical areas of Mandalay and Mtb lineages, and between anti-TB drug resistance and major lineages, was analyzed using the Chi-squared test. The significance level was set at a p value of less than 0.05.

Participant Characteristics
The median age of the participants was 42 (IQR: 31.5-54.0) years. Males accounted for 70%. There were no significant association between Mtb major lineages and the age and gender of the participants; or townships (Table 1).

Phylogenetic Analysis
A maximum-likelihood phylogenetic tree of 151 Mycobacterium tuberculosis isolates from five townships is shown in Figure 1, with the names of sublineages labeled in color at the outermost part. The majority of isolates belonged to lineage 1 (L1) and L2. Intriguingly, among L1, the majority of isolates belonged to L1.

Identification of Genetic Clusters
There were four clusters identified based on the pairwise distance. The number of isolates in the respective lineages were 3 (L2), 2 (L4), 2 (L1), and 2 (L2).

Distribution of Sublineages and Biodiversity in Mandalay Region and Kayin State
The distribution of sublineages and the biodiversity in Mandalay region and Kayin state are shown in Table 2. The overall Simpson's indices were 0.0709 in Mandalay and 0.072885 in Kayin. Sublineage L1.1.3.1 was the most common (n = 31), followed by L2.2.AA3.2 (n = 17), which was available only in Mandalay. In Kayin, the most prominent sublineages were L1.1.3.1 (n = 17) and L1.3 (n = 17). The latter was only in Kayin.

Distribution of Drug-Resistance Mutation and Sublineages in Mandalay Region
The distribution of drug-resistance mutation and sublineages is shown in Table 3. Among the isoniazid-resistant strains, 70% were S315T mutations in the katG gene. Regarding the distribution of streptomycin drug resistant mutation, there were 73% resistant isolates with an rpsL K43R mutation, which was the most common occurrence in the Mandalay region. Among 126 new TB patients, MDR-TB accounted for 1.6% (2/126) and poly-drug resistance 5.6% (7/126). MDR-TB (isoniazid and rifampicin resistance) genes were found in L1 and L2. Poly-drug resistance was also found in L1 (n = 2), L2 (n = 4), and L4 (n = 1). There was no drug resistance identified in seven isolates of L3.
Many studies have suggested that the L1.1.3 sublineage is more virulent and more resistant to anti-TB drugs. According to the research done in South Africa, active TB cases infected with L1.1.3 strains were at a higher risk of developing drug-resistant strains and treatment failure than those infected with other strains [32]. Another study conducted in Ethiopia found that the L1.1.3 sublineage was associated with a higher risk of treatment failure and death among TB patients, compared to other Mtb strains [33]. The prominence of the L2.2.AA3.2 sublineage in Mandalay is not fully understood.
Our current study revealed relatively few outbreaks of TB genetic clusters. These findings imply the simultaneous transmission of different lineages imported from different sources. Simpson's index (D) was 0.0709 in our study and 0.07 reported by Maung et al. in Kayin state [10], much lower than that of 0.68 in the Philippines [30] and 0.21 in China [31]. It has been proposed that the primary causes of genetic diversity within the Mycobacterium tuberculosis complex (Mtbc) may be attributed to the relative fitness and adaptability of imported genotypes to ecological and other unknown environmental factors [34].
The two most common drug-resistance mutations found were streptomycin (n = 11, 7%) and isoniazid (n = 10, 7%), which have been extensively used in the past. The first Mtb drug resistance to be identified was to streptomycin [35]. Of the streptomycin-resistant isolates, 73% had a rpsL K43R mutation, which confers high-level resistance to streptomycin by reducing the binding affinity of the drug to the ribosome [36]. This was the most common occurrence in the Mandalay region. A previous study from Upper Myanmar also reported resistant streptomycin 32 isolates with a rpsL K43R mutation [9]. In Asia, rpsl mutations were prevalent [37][38][39][40]. In Myanmar, streptomycin is utilized in developing specialized treatment for hepatotoxic TB patients who are not suitable for the standard anti-TB treatment regimen [9]. S315T mutations in the katG gene accounted for 70% of the isoniazid-resistant strains in our study. There were 22 resistant isoniazid isolates with the katG S315T mutation in a previous study from Upper Myanmar [9]. The katG gene encodes for the enzyme catalase-peroxidase, which is involved in the activation of isoniazid, a major component of anti-TB treatment [41]. A significant mechanism for isoniazid resistance in Mtb is via mutations in the katG gene [41].
Our study only obtained isolates from culture-positive TB patients. The findings cannot be generalized to smear negative and extra-pulmonary TB patients. We also did not have metadata on ethnicity, socio-economic status, traveling history, and treatment compliance. These limitations should be taken into account for TB control planning.

Conclusions
The current data suggest that the area probably had imported Mtb from many geographical sources. With relatively few genetic clusters and MDR-TB, there is a good chance that the future control will succeed if it is carried out properly.