Geno-Spatial Distribution of Mycobacterium Tuberculosis and Drug Resistance Profiles in Myanmar–Thai Border Area

Worldwide, studies investigating the relationship between the lineage of Mycobacterium tuberculosis (MTB) across geographic areas has empowered the “End TB” program and understand transmission across national boundaries. Genomic diversity of MTB varies with geographical locations and ethnicity. Genomic diversity can also affect the emergence of drug resistance. In Myanmar, we still have limited genetic information about geographical, ethnicity, and drug resistance linkage to MTB genetic information. This study aimed to describe the geno-spatial distribution of MTB and drug resistance profiles in Myanmar–Thailand border areas. A cross-sectional study was conducted with a total of 109 sequenced isolates. The lineages of MTB and the potential associated socio-demographic, geographic and clinical factors were analyzed using Fisher’s exact tests. p value of statistically significance was set at < 0.05. We found that 67% of the isolates were lineage 1 (L1)/East-African-Indian (EAI) (n = 73), followed by lineage 2 (L2)/Beijing (n = 26), lineage 4 (L4)/European American (n = 6) and lineage 3 (L3)/Delhi/Central Asian (n = 4). “Gender”, “type of TB patient”, “sputum smear grading” and “streptomycin resistance” were significantly different with the lineages of MTB. Sublineages of L1, which had never been reported elsewhere in Myanmar, were detected in this study area. Moreover, both ethnicity and lineage of MTB significantly differed in distribution by patient location. Diversity of the lineage of MTB and detection of new sublineages suggested that this small area had been resided by a heterogeneous population group who actively transmitted the disease. This information on distribution of lineage of MTB can be linked in the future with those on the other side of the border to evaluate cross-border transmission.


Introduction
Myanmar and Thailand, two bordering countries, are among the top 30 high burden countries for tuberculosis (TB) [1]. The recent estimated incidence of TB in Myanmar and Thailand was 338/100,000 and 153/100,000 population, respectively [1]. Drug-resistant TB (DR-TB), impact of migration and border health issues are the important barriers for ending TB in both countries [2]. Border health is still one of the challenges in TB response especially in limited resources country such as Myanmar because of its low socio-economic level, minority ethnic groups, and migrant issues [2]. Similarly, a few recent studies from Thailand showed that there were some TB/DR-TB outbreaks in border provinces of Thailand using molecular genotyping methods [3]. With the availability of deoxyribonucleic acid (DNA) technology, genotyping has become the standard method for species and strain differentiation. The generally used molecular genotyping methods for Mycobacterium tuberculosis (MTB) give limited discrimination power because only a small part of the genome was examined. Whole Genome Sequencing (WGS) can overcome this limitation by providing more information about diversity of sublineages due to its very high discrimination power [4,5]. Genomic diversity affects virulence, transmissibility, host response, and the emergence of drug resistance [6][7][8]. This information on distribution of lineage of MTB across geographic areas has empowered the global TB response.
Several studies have described the distribution of lineages of MTB and their associations with geographic and ethnic background of the TB patients in other parts of Southeast Asia [9][10][11][12]. A few genotyping studies in Myanmar were previously reported [7,[13][14][15]. However, WGS of Mycobacterium tuberculosis and DR-TB has been done only for a small number of isolates [16] and lacks geographical links to genetic information. To effectively control TB, there is a need to obtain information on drug resistance profiles of MTB in the population. Phenotypic drug susceptibility testing for the whole population is expensive. WGS information can be used to reflect the drug resistance problem, which can affect the response for ending TB. The objective of this study, therefore, was to describe the geno-spatial distribution of Mycobacterium tuberculosis and drug-resistant genotypes in a previously unstudied border area of Myanmar and Thailand.

Study Setting
In Myanmar, there are eight major national ethnic groups, such as Bamar, Kachin, Kayah, Kayin, Chin, Mon, Rakhine and Shan with another 135 minority ethnicities [17]. This study was conducted in three townships of Kayin State where many of the residents belong to the Kayin ethnicity. Kayin State is mountainous and located between latitudes 15 • 45 and 19 • 25 N and longitudes 96 • 10 and 98 • 28 E. It is bordered by Mandalay Region, Shan State and Kayah State to the north, Mon State and Bago Region to the west, and Mae Hong Son, Tak, and Kanchanaburi provinces of Thailand to the east. The three study site townships were Hpa-An, Kawkareik and Myawaddy, which all lie on the East-West Economic Corridor that links the port city of Yangon, Myanmar to South China Sea at Da Nang, Vietnam.

Study Design and Participants
This was a cross-sectional study conducted between March and August 2019. The patients under study were the pulmonary sputum smear-positive patients aged more than 8 years who were diagnosed and treated at one of the aforementioned township TB centers located in the township hospital compound. As culture is not routinely done in the study sites, smear-positive sputum increased on chance to have the viable MTB, which can be grown in the culture isolate after the specimen arrived at the National TB Reference Biosafety Level 3 Laboratory (NTRL) in Yangon. Xpert MTB/RIF tests were also done, but no phenotypic drug susceptibility tests (DST) were conducted following the National guideline of DRTB in Myanmar [18]. For sample size calculation, we used the finite population proportion sample size formula.
where p is the proportion of the most frequent lineage strain among pulmonary sputum smear-positive TB patients (0.76) based on a previous study [19], d is the precision (0.08), and α is the type I error rate (0.05). With these parameters, a minimum of 104 WGS TB isolates from TB patients were required.
A consecutive sampling technique was used in which every subject meeting the selection criteria were selected until the required sample size was achieved. We estimated that the percentage of obtaining WGS isolates from sputum smear-positive samples was about 50%. Therefore, in order to achieve 104 WGS isolates, 200 sputum smear-positive samples were needed. These 200 smear-positive sputum samples were sent to the NTRL Laboratory in Yangon. Among them, 167 were culture growth positive and DNA was extracted. After excluding low quality DNA extracted samples, WGS of a total of 109 isolates were available in this study.

Samples Collection, Genomic DNA Extraction, and Whole Genome Sequencing
Two sputum samples from each patient were collected and sent to the NTRL Laboratory in Yangon. Sputum specimens were decontaminated and were then inoculated onto Lowenstein-Jensen medium for culturing according to the standard procedure. DNA was extracted using an UltraClean Microbial DNA Isolation Kit (Mo Bio Laboratories, Carlsbad, CA, USA) following the instruction manual [20]. Sequencing was performed at Massey Genome Service, New Zealand. Samples were prepared using an Illumina Nextera XT kit (Illumina, San Diego, CA, USA) as per the manufacturer's protocol with minor modification as follows the incubation time for the tagmentation step was increased to 8 min, the PCR cycle was increased to 14 cycles and 90 µL (1.8×) of Ampure beads (Beckman Coulter, Brea, CA, USA) were used for size-selection from 400 bp to 600 bp fragments. A multiplexing step for unique identification for each MTB isolate was performed using an Illumina indices kit: Set A-D (Illumina, San Diego, CA, USA). WGS fragment sizes were quantified using a high sensitivity assay on the LabChip GX touch instrument (PerkinElmer, MA, USA). Pooling in equal molarity for sequencing was performed on a Janus robotic platform workstation (PerkinElmer, MA, USA). Prepared MTB libraries were sequenced using an Illumina MiSeq sequencer and 500 cycles kit (2 × 250 PE) (Illumina, San Diego, CA, USA). Raw sequencing reads were de-multiplexed using the Illumina RTA software (version v1.17.21.3) and SAV software (version v1.8.46) (Illumina, San Diego, CA, USA).

Genome Sequencing Data and Variant Calling
The system produced paired-end reads in FastQ file format. Raw reads were trimmed with Trimmomatic (v.0.3654) to remove adapter sequence and low-quality bases. Then, trimmed reads were mapped to the reference genome, M. tuberculosis H37Rv (Accession no. NC_000962.3) using BWA mem (v.0.7.17).
The median and average depth of sequencing were 59.72 and 47.11, respectively. 99.56% of genome was covered by paired end reads on average. Variants were called using GATK (v 3.8) by setting a minimum coverage of 10 reads, a phred score of at least 20 and a minimum allele frequency of 75% as thresholds. All single nucleotide variants (SNVs) positions were then extracted and converted to an SNV-super-matrix using an in-house Python script before being used in the phylogenetic analysis. Insertions and deletions were excluded from this study. The 109 samples of sequencing data were submitted to the European Nucleotide Archive (ENA) of EMBL-EBI mirrored in the Sequence Read Archive (SRA) database. This project was deposited in the NCBI database under "BioProject" and "BioSample" numbers, PRJNA645523 and SAMN15507652, respectively. Actual read sequences can be downloaded directly from the SRA database. (https://dataview.ncbi.nlm.nih.gov/object/PRJNA645523? reviewer=qgmhm2j6c98kdkdau8f4q54cs).

Phylogenetic Analysis
The SNVs of 109 isolates were used for phylogenetic tree construction. The SNVs that were present in any drug-resistant genes, mobile genetic element, phage, PE/PPE region, and non-homozygous SNVs were excluded. We analyzed the generated super-matrix with the maximum likelihood (ML) methods using RAxML (v 7.3.4). The core tree (starting tree) was generated using the BioNJ method. The tree was visualized using the FigTree program version 1.4.2. (http://tree.bio.ed.ac.uk/software/figtree/).

Genotyping
The genotypes were assigned based on the sublineage-specific SNPs previously listed [9].

Prediction of Drug Resistance
Prediction of drug resistance is based on the results from TB profiler software [21].

Statistical Analysis
Data from the questionnaire and record review were entered into EpiData version 3.1 (http://www. epidata.dk/) and analyzed using R version 3.6.3 (https://cran.r-project.org/). To describe the geographical distribution of major lineages of MTB and ethnicity in study townships, visualization of spatial data was done. The patients' socio-demographic characteristics, obtained from the questionnaire and clinical characteristics obtained from medical record review, were presented descriptively. The major lineages of MTB and the potential associated socio-demographic and clinical factors, including TB drug resistance were analyzed using Fisher's exact test. A p value < 0.05 was considered as statistically significant.

Ethics
The study was approved by the Institutional Ethics Committee of Faculty of Medicine, Prince of Songkla University, Hat Yai, Thailand (REC number 61-220-18-1) and the Ethics Review Committee of the Department of Medical Research, Ministry of Health and Sports, Myanmar (ERC number 2018-161). Table 1 shows the socio-demographic characteristics of the TB patients by major lineages of MTB. About 50% of the patients were from Hpa-An township, the ratio of males to females was 2:1 and half were aged between 35 and 57 years. Kayin was the dominant ethnicity (47%) followed by Bamar (35%). There was no significant association between major lineages of MTB and township, age of the patients, and ethnicity. The only significantly variable was gender, where L2 was significantly more prevalent in males.

Genotype Information
The average coverage of the genome was 99.56%. The maximum likelihood phylogenetic tree is shown in Figure 1. About 67% of the isolates were belonged to lineage_1 (L1) or East-African-Indian (EAI) (n = 73), followed by lineage_2 (L2) or Beijing (n = 26 or 23% of isolates), lineage_4 (L4) or European American (n = 6 or 6% of isolates) and lineage_3 (L3) or Delhi/Central Asian (n = 4 or 4% of isolates). In Figure 1, the numbers and names of the sublineages are labelled at the outermost parts. The first number of the sublineage indicates the lineage. Thus, starting from the blue shaded area at 9 o'clock and moving in an anticlockwise direction shows twenty six isolates of L2, then four isolates of L3, six isolates of L4 and the remaining seventy three isolates belonged to L1. The frequency distribution of these lineages and sublineages are summarized in Table 2.   [9]. Most of the Ancestral Beijing isolates belonged to L2.2.1. Ancestral 4, a recently described sublineage common in other Tibeto-Burman speaking tribes, Akha, and Lahu. L2.2.1. Ancestral 4 is the Ancestral Beijing strains that is most genetically related to the Modern Beijing strains, with the presence of G1286766C (mutT2 codon 58) but not the ogt12 mutation [9]. The former was previously considered as a specific barcoding SNP marker of Modern Beijing strains. L3 and L4 strains were found in a small portion in this study.

Geno-Spatial Analysis
Figures 2 and 3 illustrate maps of the patients' residence by ethnicity and major lineages of MTB, respectively. The range of longitudinal coordinates was equally divided into four zones. The cities are in zone 1 (Hpa-An), zone 3 (Kawkareik) and zone 4 (Myawaddy). Statistics of these distributions are summarized in Table 3. Both ethnicity and major lineages of MTB significantly differed in distribution by zone. Bamar were mainly living in zones 1 and 4 whereas Kayin were mainly in zones 1 and 3. Kayin, the majority ethnic group in this study, were scattered in zones where the cities are situated but most of them lived outside the city. Most Bamar patients lived in Hpa-An and Myawaddy city. Other ethnic groups were scattered in rural areas of zone 2 and few in all three cities. Zone 1 had all type of major lineages of MTB. Zone 3 and 4 showed L1 and L2 dominant and zone 2 had only the L1 lineage.

Clinical Characteristics
Detail information of the clinical characteristics of the study patients and the distribution of major lineages is shown in Table 4. Sputum grading was the only variable significantly different with the major lineages of MTB. Most of L1, L3, and L4 lineages showed high smear grading "2+ and 3+" and few were smear grading "1+". However, about 43% of L2 showed the smear grading "scanty and 1+". The variable type of TB patient was marginally significant with the distribution of major lineages of MTB. About a quarter of L2 patients were retreated TB patients. However, few L1 patients were retreated TB patients and there were no retreated L3 and L4 patients. The rest of the clinical characteristics had no significant difference with major lineages.

Drug Resistance of TB Drug Information
When we analyzed the drug resistance of TB drugs, we excluded the isolates that were less than 30× depth and 95% genome completion coverage and so 63 isolates were analyzed to detect the drug-resistant profiles in this study. Table 5 summarized the association between drug susceptibility and major lineages of MTB. Out of 63 isolates, 19 (30%) had at least one anti-TB drugs resistance mutations. Most common resistance was found against Isoniazid and followed by Ethionamide and Streptomycin. Among TB drugs, Streptomycin resistance was the only one associated with major lineages of MTB. Table 6 displays detailed information on identified variants associated with drug resistance in the genomes of the 19 clinical Mycobacterium tuberculosis complex isolates. Two of the Isoniazid-resistant isolates were also resistant to Rifampicin and Pyrazinamide, Ethambutol and Streptomycin. These two MDR-TB patients tested positive for Rifampincin resistance on Xpert MTB/RIF, while the Xpert MTB/RIF tests of the other isolates showed no Rifampicin resistance. Both MDR-TB isolates belonged to the L2 lineage. Among eight poly-DRTB isolates, four were L1 lineage and another four were L2 lineage. Six out of eight poly-DRTB isolates had resistance to Isoniazid. The remaining nine were mono drug-resistant tuberculosis. There were no pre-extensively drug-resistant tuberculosis (Pre-XDRTB) and extensively drug-resistant tuberculosis (XDRTB) cases in this study. Both RIF-resistant isolates showed resistance-determinant mutations in rpoB gene, either Ser450Leu or His445Asp. Among the INH-resistant strains, 10/13 (77%) had a Ser315Thr mutation in the katG gene, 2/13 (15%) showed a mutation in the fabG1 gene and the other one had a mutation in the ahpC gene. Details of drug-resistant mutation points are shown in Table 6.

Discussion
The Mycobacterium tuberculosis population in the study area was genetically diverse with the total of 4 lineages and at least 30 sublineages identified among 109 study isolates. Around two-thirds were lineage 1 and approximately a quarter was lineage 2. Among the socio-demographic characteristics, gender was the only variable significantly associated with major lineages. Both ethnicity and major lineages differed in their distribution by longitudinal zone of patients' locations.
Some clinical characteristics such as "type of TB patient", and "sputum smear grading were significant difference with the lineages of MTB. Moreover, we found an association between drug resistance and major lineages of MTB and identified 19 drug-resistant isolates in this study.
However, there was no evidence of an association between lineage and patients' ethnicity. There was also no specific geographical pattern on drug-resistant isolates.
The descriptive statistics on distribution of major lineages were significantly different between male and female patients. The proportion of males was 2.2 times higher than females overall and significantly higher among L2 lineage (88% of L2 isolates) and the finding was contrasted with other studies from China [23,24]. Similar to WHO data and another study from Myanmar, about half of the patients were from the most active and productive age group [1,2]. Another study also found that patients with lineage 1 were older than those with other lineages and our study also showed that the mean age of patients with lineage 1 was higher than those with lineage 2 and lineage 4. However, the difference was not statistically significant. [9]. A previous study in Thailand showed that some major lineages and sublineages such as the Ancestral_4 strain was associated with ethnic groups that speak Tibeto-Burman language family such as Akha and Lahu tribes, which were originally from Yunnan, China [25,26] compared to Thai ethnic group [9]. Although there was no significant association between the major lineages and patients' ethnicity, here we identify the

Discussion
The Mycobacterium tuberculosis population in the study area was genetically diverse with the total of 4 lineages and at least 30 sublineages identified among 109 study isolates. Around two-thirds were lineage 1 and approximately a quarter was lineage 2. Among the socio-demographic characteristics, gender was the only variable significantly associated with major lineages. Both ethnicity and major lineages differed in their distribution by longitudinal zone of patients' locations. Some clinical characteristics such as "type of TB patient", and "sputum smear grading were significant difference with the lineages of MTB. Moreover, we found an association between drug resistance and major lineages of MTB and identified 19 drug-resistant isolates in this study. However, there was no evidence of an association between lineage and patients' ethnicity. There was also no specific geographical pattern on drug-resistant isolates.
The descriptive statistics on distribution of major lineages were significantly different between male and female patients. The proportion of males was 2.2 times higher than females overall and significantly higher among L2 lineage (88% of L2 isolates) and the finding was contrasted with other studies from China [23,24]. Similar to WHO data and another study from Myanmar, about half of the patients were from the most active and productive age group [1,2]. Another study also found that patients with lineage 1 were older than those with other lineages and our study also showed that the mean age of patients with lineage 1 was higher than those with lineage 2 and lineage 4. However, the difference was not statistically significant [9]. A previous study in Thailand showed that some major lineages and sublineages such as the Ancestral_4 strain was associated with ethnic groups that speak Tibeto-Burman language family such as Akha and Lahu tribes, which were originally from Yunnan, China [25,26] compared to Thai ethnic group [9]. Although there was no significant association between the major lineages and patients' ethnicity, here we identify the Ancestral_4 strain in both Bamar and Kayin ethnic groups, both of which also speak the Tibeto-Burman language family. This suggests that the sublineage may be associated with several ethnic groups that speak the language family. Further studies on the ethnic association may provide information the origin of the sublineage. Table 7 compares the distribution of lineage of MTB from our study with that from other studies [9,11,19,27]. L1 lineage was the most dominant in our and the Philippines study but in detail, the sublineage pattern was different [11]. L2 lineage was very prominent in a study from Yangon Region, Myanmar [19]. This difference between lineage distribution of our study and the Yangon study suggests that transmission of TB in border areas has relatively little communication with the Yangon Region. It also explains the relatively low prevalence of multiple drug-resistant TB compared to Yangon Region. With similar findings from previous studies, the proportion of L3 and L4 lineages were quite low, compared with L1 and L2 lineages [7,19]. By using a combination of spatial analysis and geographic information system (GIS), we could visualize the distribution of major lineages of MTB and ethnicity. All of the zones showed significant L1 lineage dominant but in zone (4) where the number of L1 and L2 lineages were roughly equal. L2 lineage was more prevalent on the Myanmar-Thai border than central region, and this finding was consistent with a study from Thailand, which showed that the L2 lineage was dominant in a border province [3]. Moreover, all zones showed that the Kayin ethnicity was dominant; except for zone (4), which had a predominance of the Bamar ethnicity. This may be due to the location of main residential areas of Myawaddy city where there are many Bamar cross-border migrant workers. Furthermore, most people of Bamar ethnicity live near the main cities but most Kayin and other minority ethnicities are scattered in rural areas. Further studies are needed to better understand phylo-geographic variation of TB disease, and explore local risk factors that may be responsible for the observed pattern.
There were only two clinical features significantly associated with the lineage of MTB in our study: type of TB patient and sputum grading. We found that the L2 lineage had a high percentage of retreated cases, consistent with previous reports that showed this lineage to have a higher relapse rate and an association with drug resistance [9,28,29]. However, our finding of higher sputum grading in L1 lineage was not consistent with the finding of no association in South Africa study and higher grading of L2 in Sri Lanka study [30,31].
The anti-TB drug resistance burden predicted from WGS here was quite similar to that reported in previous studies in Myanmar [28,32]. Many of the studies showed that L2 lineages had significant association with DRTB and our studies also highlighted that they were associated with streptomycin resistance and both MDR-TB isolates were L2 lineages [16,19]. In addition, the study also highlighted that none of the strains in this study were predicted to be pre-XDRTB and XDRTB. Rifampicin and Isoniazid, the two most powerful anti-TB drugs, were resistant due to mutations in the rpoB and KatG genes respectively, and the finding was similar to previously observed study in Myanmar [16].
In Myanmar, National TB Programme stills cannot introduce first-line and second-line DST in routine diagnosis for all registered TB patients and those tests are mainly focus on the registered DRTB patients only. However, according to the National guideline of DRTB in Myanmar, all of the registered smear-positive TB patients need to be tested for Xpert MTB/RIF with free of charge [18] and we have the results of Xpert MTB/RIF in our system. The Xpert MTB/RIF results of the two RIF resistance isolates were consistent with our study. Our study can also highlight that if we use routine diagnosis flow, we can detect only two RIF resistance patients, but if we could use WGS in future, we can detect the additional drug resistance.
In this study, we could combine the genetic results from the whole genome sequencing and geographical data and patients' background and clinical characteristics. The study was the first in Kayin ethnic areas near Myanmar-Thai border. The limitation of this study was on its small number of isolates. The local health services and the transportation conditions did not allow us to recover all the potential isolates collected in the field. Generalization of the results to the whole community in this area must be made with caution.

Conclusions
Non-significant ethnic differential but significant geographic differential of lineage of MTB in this study area may suggest relatively well-mixed ethnicity. It may also suggest that an area focus is more justifiable than an ethnic focus in local ending TB. Non-clustered drug-resistant MTB genotypes and relatively few MDR-TB cases compared to the national scale indicate an opportunity to prevent MDR-TB spread in this locality.