Whole-Genome Sequencing to Predict Mycobacterium tuberculosis Drug Resistance: A Retrospective Observational Study in Eastern China

Pulmonary tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis (MTB). Whole-genome sequencing (WGS) is a promising technology for accurately predicting anti-TB drug resistance. A reliable method for detecting drug resistance is crucial for standardizing anti-TB treatment, improving patient prognosis, and reducing the risk of transmission. In this study, we evaluated the potential of WGS to predict drug resistance on the basis of the genetic variants recommended by the World Health Organization (WHO). A total of 1105 MTB strains were selected from samples collected in Zhejiang Province, China, from 2014 to 2018. Phenotypic drug sensitivity tests (DSTs) were conducted for isoniazid (INH), rifampicin (RFP), streptomycin, ethambutol, fluoroquinolones (levofloxacin and moxifloxacin), amikacin, kanamycin, and capreomycin, and the drug-resistance rates were calculated. Clean WGS data were obtained for all 1105 strains and analyzed. The predictive performance of WGS was evaluated by comparing genotypic and phenotypic DST results. For all anti-TB drugs, WGS achieved good specificity (>90%). The sensitivity values for RFP and INH were 91.78% and 82.26%, respectively, but were ≤60% for the other drugs. The positive predictive values were >80% for all drugs except ethambutol and moxifloxacin, and the negative predictive values were >90% for all drugs. We conclude that WGS is a valuable tool for identifying genome-wide variants. Using the WHO-recommended genetic variants, WGS effectively detects resistance to RFP and INH, enabling the identification of patients with multidrug-resistant TB. However, the variants recommended for predicting resistance to other anti-TB drugs require further optimization and improvement.


Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis (MTB).
According to the "Global Tuberculosis Report" published by the World Health Organization (WHO) in 2022, an estimated 10.6 million new TB cases occurred worldwide in 2021, and 1.4 million people died from the disease. In that year, 3.6% of new patients and 18% of previously treated patients had rifampicin (RFP)-resistant TB or multidrug-resistant TB (MDR-TB), and 450,000 patients developed drug-resistant TB (DR-TB) [1]. DR-TB, especially MDR-TB, poses many problems for diagnosis and treatment and is one of the main challenges to global TB prevention and control [2][3][4].
For the diagnosis of DR-TB, traditional phenotypic drug sensitivity testing (DST) is an effective tool; however, it requires a long testing time (>2 weeks), and the stability of some drugs used in testing needs improvement. The lack of rapid tools for diagnosing drug resistance hinders the formulation of standardized, DST-guided treatment regimens for patients with DR-TB, especially MDR-TB. Non-standard anti-TB treatment in turn contributes to the spread and evolution of drug-resistance patterns [5], which is an important factor in the poor prognosis of patients with DR-TB. Molecular diagnostic technologies can overcome these shortcomings of conventional phenotypic DST [6].
Molecular diagnostic tools offer advantages for MTB detection, such as speed, high sensitivity, and reasonable specificity [7]. Because most drug resistance in clinical MTB strains is attributed to chromosomal mutations or epigenetic modifications, a series of molecular tools, such as GeneXpert, have been developed to detect anti-TB drug resistance. However, most of these methods can detect resistance to only individual anti-TB drugs. Newly developed molecular tools may partly improve the clinical diagnosis of TB [8], improve treatment effectiveness for some TB patients [9], and increase the treatment success rate of DR-TB [10]. As the technology has advanced, molecular diagnostics have gradually expanded to predict resistance to common anti-TB drugs [11]. However, molecular methods remain limited in predicting susceptibility to other commonly used anti-TB drugs, especially the second-line drugs used for MDR-TB patients [12].
Whole-genome sequencing (WGS) technology has shown promising applications in TB detection and the prediction of anti-TB drug resistance. For example, Wu et al. designed a customized Ion AmpliSeq TB analysis platform that produced results consistent with phenotypic DST and effectively shortened the diagnosis time [13]. Furthermore, WGS is a useful tool for monitoring the genetic variations and mutations in MTB that arise from inappropriate anti-TB treatment and clinical drug use [14], and for the early detection and prevention of MDR-TB and extensively drug-resistant TB (XDR-TB) [15].
WHO recommends a set of mutation sites for predicting resistance to each of the common anti-TB drugs using WGS (Table S1). However, the predictive value of these mutated loci needs to be validated in further studies involving different populations and regions. In the present study, WGS was performed on strains from general patients with pulmonary TB (PTB) enrolled in a population-based study conducted in Zhejiang Province, China, from 2014 to 2018, with the aim of providing a theoretical basis and technical support for DR-TB diagnosis and treatment.

Predictive Value of WGS Genotypic DST
The results of WGS genotypic DST for the different anti-TB drugs are shown in Table 2. WGS achieved good specificity in predicting resistance to all anti-TB drugs, with all values above 95%. The sensitivity values for RFP and INH were 91.78% and 82.26%, respectively; however, the sensitivity values for the other anti-TB drugs were 60% or less.
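As a worked check on these two values, sensitivity can be recomputed from the per-drug counts given under Distributions of Drug Resistance-Associated Mutations, assuming (consistent with the Discussion) that every RFP- or INH-resistant strain with any detected mutation carried at least one WHO-recommended mutation and was therefore called resistant by WGS. Here TP is the number of phenotypically resistant strains called resistant by WGS, and FN is the number called sensitive:

```latex
\[
\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\mathrm{Sensitivity}_{\mathrm{RFP}} = \frac{73 - 6}{73} = \frac{67}{73} \approx 91.78\%, \qquad
\mathrm{Sensitivity}_{\mathrm{INH}} = \frac{124 - 22}{124} = \frac{102}{124} \approx 82.26\%
\]
```

The agreement with the reported values is consistent with this assumption; the full comparison is given in Table 2.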

Distributions of Drug Resistance-Associated Mutations
Based on the WHO recommendations and the anti-TB drug-resistance-associated gene mutations detected by WGS (Table S2), the MTB strains were classified into four categories for each drug: A, no mutations; B, only mutations other than WHO-recommended ones; C, only WHO-recommended mutations; and D, both WHO-recommended and other mutations. The distribution of these categories for the different drugs is shown in Table 3 and Figure 1.
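The four-category scheme can be expressed as a small decision rule over two sets of variants. The Python sketch below is illustrative only (the study's own analysis used TB-Profiler output and R); the function name, the one-entry catalogue, and the placeholder variant "rpoB_other_variant" are hypothetical, while "rpoB_p.Ser450Leu" is a mutation named in these results.

```python
from typing import Iterable, Set


def classify_mutation_profile(mutations: Iterable[str], who_catalogue: Set[str]) -> str:
    """Return category A-D for one strain and one drug (hypothetical helper)."""
    detected = set(mutations)
    who_hits = detected & who_catalogue    # WHO-recommended mutations present
    other_hits = detected - who_catalogue  # any other mutations in the resistance genes
    if not detected:
        return "A"  # no mutations
    if not who_hits:
        return "B"  # only mutations outside the WHO list
    if not other_hits:
        return "C"  # only WHO-recommended mutations
    return "D"      # both WHO-recommended and other mutations


# Illustrative one-entry catalogue, not the full WHO list for rifampicin.
WHO_RFP = {"rpoB_p.Ser450Leu"}

# "rpoB_other_variant" is a hypothetical placeholder for a non-WHO mutation.
print(classify_mutation_profile([], WHO_RFP))                                          # A
print(classify_mutation_profile(["rpoB_other_variant"], WHO_RFP))                      # B
print(classify_mutation_profile(["rpoB_p.Ser450Leu"], WHO_RFP))                        # C
print(classify_mutation_profile(["rpoB_p.Ser450Leu", "rpoB_other_variant"], WHO_RFP))  # D
```

Under the WHO-based calling rule described in the Methods, only categories C and D yield a genotypic resistance call, so phenotypically resistant strains in categories A and B count against sensitivity.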
As shown in Table S2, except for AK, KM, and CM, mutations associated with resistance to each anti-TB drug occurred in both the phenotypically resistant and sensitive groups. In addition, no mutations were detected in some drug-resistant strains, whereas mutations were detected in some sensitive strains. The findings for each anti-TB drug are as follows:
• RFP: Of the 73 RFP-resistant strains, six did not have any mutations. The rpoB_p.Ser450Leu mutation had the highest frequency and was detected in 42 strains.
• INH: Among the 124 INH-resistant strains, no mutations were detected in 22 strains. The katG_p.Ser315Thr mutation had the highest frequency and was detected in 85 strains.
• EMB: Of the 63 EMB-resistant strains, 37 did not contain any mutations. The embB_p.Met306Val mutation had the highest frequency and was detected in 11 strains.
• LFX: Among the 60 LFX-resistant strains, no mutations were detected in 24. The gyrA_p.Asp94Gly mutation had the highest frequency (15 strains), followed by gyrA_p.Ala90Val (11 strains) and gyrA_p.Asp94Asn (7 strains).
• MFX: gyrA_p.Ala90Val and gyrA_p.Asp94Gly were the two most frequent mutations, detected in three and eight strains of the resistant group, respectively, and in 11 and 10 strains of the sensitive group, respectively.
• SM: Of the 159 SM-resistant strains, no mutations were detected in 63; among the strains with mutations, 72 carried rpsL_p.Lys43Arg.
• AK: Of the five AK-resistant strains, no mutations were detected in two, and the rrs_n.1401A>G mutation was detected in three.
• KM: No mutations were detected in 22 of the 28 KM-resistant strains. Mutations at rrs_n.1401A>G, eis_c.-37G>T, and eis_c.-10G>A were detected in three, two, and one strains, respectively.
• CM: Among the 53 CM-resistant strains, no mutations were detected in 50, and rrs_n.1401A>G was detected in the remaining three.
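Because a strain with no detected mutation cannot be called resistant by any mutation-based rule, the counts above put a ceiling on the sensitivity achievable from the resistance genes examined. The Python sketch below computes that ceiling from the counts reported in this section (MFX is omitted because its resistant total is not stated here); the dictionary and function names are hypothetical.

```python
# Counts transcribed from the per-drug results:
# (phenotypically resistant strains, resistant strains with no detected mutation)
REPORTED_COUNTS = {
    "RFP": (73, 6),
    "INH": (124, 22),
    "EMB": (63, 37),
    "LFX": (60, 24),
    "SM": (159, 63),
    "AK": (5, 2),
    "KM": (28, 22),
    "CM": (53, 50),
}


def max_detectable_fraction(resistant: int, no_mutation: int) -> float:
    """Share of resistant strains carrying at least one detected mutation.

    Strains without any detected mutation are always called sensitive, so this
    value is an upper bound on genotypic sensitivity for the genes examined.
    """
    if resistant <= 0:
        raise ValueError("resistant count must be positive")
    return (resistant - no_mutation) / resistant


bounds = {drug: max_detectable_fraction(r, n) for drug, (r, n) in REPORTED_COUNTS.items()}

for drug, bound in sorted(bounds.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{drug}: at most {bound:.1%} of resistant strains detectable")
```

For RFP and INH the ceilings equal the reported sensitivities (91.78% and 82.26%), whereas fewer than 6% of CM-resistant strains carried any detected mutation, so with the genes and filters used here no choice of catalogue sites could substantially raise CM sensitivity.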

Discussion
DR-TB, particularly MDR-TB, is a major concern in TB prevention and control. Phenotypic anti-TB DST plays a key role in the early detection of DR-TB [16] and provides a basis for standardized anti-TB treatment, thereby improving its efficacy [17]. Standardized anti-TB treatment is also crucial for reducing the spread of DR-TB. Recently, the use of WGS to predict drug resistance has received widespread attention [18,19]. WGS covers the most common anti-TB drugs and delivers results in a timely manner. Therefore, WGS has the potential to rapidly detect drug-resistant MTB strains across a wider range of anti-TB drugs and to support TB treatment [20]. Building on the progress in WGS, WHO has selected a series of genetic mutations for predicting drug resistance.
In this study, we evaluated the WHO-recommended mutation sites in a Chinese population with TB. We found that WGS can effectively identify non-resistant strains for most anti-TB drugs, as well as RFP- and INH-resistant MTB strains, among typical PTB patients. However, the prediction of resistance to EMB, SM, AK, LFX, MFX, KM, and CM needs improvement.
In the present study, the prediction of RFP resistance based on WHO-recommended mutation sites achieved a sensitivity and specificity of over 90%, reflecting the close association between RFP resistance and mutations at key sites in the rpoB gene [21]. Current sequencing technologies can detect common genetic mutations; however, different genetic variations give rise to different RFP-resistance characteristics [22]. This study also found that in 75.3% of RFP-resistant strains, rpoB mutations occurred only at WHO-recommended loci, whereas in 16.4%, mutations occurred at both recommended and non-recommended sites. Thus, approximately 92% of the resistant strains carried mutations at WHO-recommended loci, compared with only 0.9% of the sensitive strains. In INH-resistant strains, the proportion carrying WHO-recommended mutations in inhA and katG, the genes most closely related to INH resistance, was approximately 80%, whereas the corresponding mutation frequency in sensitive strains was approximately 1.3%.
These results demonstrate that WGS is effective in predicting resistance to the key first-line drugs RFP and INH. However, the reported performance for predicting resistance to second-line anti-TB drugs differs among studies. These differences may be related to genetic differences among MTB strains from different regions or to the strong dependence of drug-resistance prediction on the set of mutation sites included in the analysis.
The studies mentioned above suggest that WGS is applicable for predicting resistance to some anti-TB drugs among patients with MDR-TB or in high-risk populations, but its effectiveness may be limited among general TB patients. This study was based on a TB drug-resistance survey in which most of the recruited patients had non-drug-resistant TB. A previous study reported that about 10% of RFP-resistant strains and nearly 20% of INH-resistant strains are associated with other mutations [29], and this phenomenon is more prominent for SM and EMB. Consistent with this, WGS based on WHO-recommended sites detected only about 60% of SM-resistant strains and 40% of EMB-resistant strains in this study, suggesting that many of the mutations leading to resistance to these drugs lie outside the WHO-recommended sites.
For second-line anti-TB drugs, predictions based on WHO-recommended mutation sites achieved good specificity, indicating that WGS predicts non-resistant strains well; however, its predictive performance for resistant strains is poor. For example, in MFX-resistant strains, 53.6% of gyrA mutations occurred at both WHO-recommended and other sites, whereas the remaining 46.4% of gyrA mutations and 7.1% of gyrB mutations occurred at sites outside the WHO recommendations. Therefore, when WHO-recommended mutation sites are used for drug-resistance prediction, WGS can detect only about 50% of the resistance-associated mutations. Similar results were observed for other second-line anti-TB drugs, such as LFX, AK, KM, and CM. For example, in LFX-resistant strains, 60% of gyrA mutations occurred at both WHO-recommended and other sites, whereas the remaining 40% occurred only at other sites. Similar mutation-distribution characteristics were observed in eis and rrs, which are associated with resistance to SM, KM, AK, and CM, indicating that the predictive value of their WHO-recommended sites for resistance is relatively low.
The detailed distribution of mutations (Table S2) revealed several distinct patterns among the variants detected by WGS. Some mutations occurred in resistant strains but not in sensitive strains; these may be genetic variants closely related to resistance. Other mutations occurred in sensitive strains but not in resistant strains; these may be phenotypically silent or unrelated to drug resistance. A third group occurred in both resistant and sensitive strains; these may be related to cross-resistance or unrelated to resistance. The actual roles and mechanisms of these mutations warrant further investigation to clarify their relationship with drug resistance.
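The three patterns described above can be assigned mechanically from the number of carriers in each phenotypic group. In this hypothetical Python sketch, the example counts for gyrA_p.Ala90Val and gyrA_p.Asp94Gly are the MFX figures reported in the Results; the share of carriers that are phenotypically resistant is added as one simple summary and is not a metric reported by the study.

```python
def distribution_pattern(n_resistant: int, n_sensitive: int) -> str:
    """Classify a mutation by where its carriers fall (hypothetical helper)."""
    if n_resistant > 0 and n_sensitive == 0:
        return "resistant only"   # candidate resistance marker
    if n_resistant == 0 and n_sensitive > 0:
        return "sensitive only"   # possibly silent or unrelated to resistance
    if n_resistant > 0 and n_sensitive > 0:
        return "both groups"      # cross-resistance or unrelated
    return "not observed"


def resistant_share(n_resistant: int, n_sensitive: int) -> float:
    """Fraction of carriers that are phenotypically resistant."""
    total = n_resistant + n_sensitive
    if total == 0:
        raise ValueError("mutation not observed in either group")
    return n_resistant / total


# MFX carrier counts from the Results: (resistant group, sensitive group)
MFX_COUNTS = {
    "gyrA_p.Ala90Val": (3, 11),
    "gyrA_p.Asp94Gly": (8, 10),
}

for mutation, (r, s) in MFX_COUNTS.items():
    print(f"{mutation}: {distribution_pattern(r, s)}, "
          f"{resistant_share(r, s):.0%} of carriers resistant")
```

Both gyrA mutations fall into the both-groups pattern, and fewer than half of their carriers were MFX-resistant, which is in line with the lower positive predictive value reported for MFX.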
Additionally, in certain instances, no mutations were detected in resistant strains. This reflects the complexity of resistance mechanisms in microorganisms, including MTB. According to current understanding, resistance to anti-TB drugs arises from genetic mutations at specific sites, and these mutations can be detected using molecular biology methods, including WGS. However, mutation detection is influenced by the methodology itself, including sequencing depth, the range of loci examined, and the number of loci considered in the prediction model. Moreover, drug resistance is influenced by various other factors and mechanisms, such as epigenetic changes, that cannot be detected by sequencing.
In summary, the present study demonstrated that WGS has good potential for predicting drug resistance in TB, especially for the commonly used first-line drugs INH and RFP. However, the mutation features and transmission patterns of strains may vary with region, treatment, and prevention measures. Therefore, the selection of genetic mutations may play a vital role in WGS-based prediction of anti-TB drug resistance, and it is necessary to establish prediction models and methods appropriate for different regions and for strains with different genetic backgrounds. Advances in WGS-based drug-resistance prediction will help standardize anti-TB treatment and further reduce the emergence and spread of drug resistance caused by non-standard treatment.
This study was based on a drug-resistance survey of general TB patients and is therefore representative of this population, which makes its conclusions applicable to routine detection of anti-TB drug resistance. However, a limitation of this study is the low resistance rate among general TB patients, which resulted in small resistant groups. This limits how well the numbers and characteristics of the resistance-associated sites reported here represent those of resistant strains more broadly.
The present study concludes that, using the WHO-recommended resistance-associated mutation sites, WGS has strong predictive capability for resistance to two essential anti-TB drugs, INH and RFP. This finding highlights the crucial role of WGS in promptly identifying MDR-TB. Nevertheless, prediction of resistance to other anti-TB drugs based on the WHO-recommended mutation sites is suboptimal. Consequently, it may be necessary to construct drug-resistance prediction models for these drugs that incorporate additional mutation sites. As understanding of the mechanisms of resistance to these drugs deepens, the predictive value of WGS is likely to be further validated and more widely recognized.

Study Design
An anti-TB drug-resistance surveillance study covering 30 of the 89 counties in Zhejiang Province was conducted from 2014 to 2018. According to the study design, all patients with positive sputum smears at designated TB hospitals after the start of the surveillance project were informed about the study, and those who agreed and signed an informed consent form were recruited. In each county, patients were recruited consecutively until at least 30 culture-positive MTB cases had been enrolled.

Ethics Statement
This study was approved by the Ethics Committee of the Zhejiang Provincial Center for Disease Control and Prevention. All eligible participants who agreed to participate in the program and signed an informed consent form were required to complete a questionnaire and provide at least one sputum specimen for subsequent studies.

Specimen Collection
Each patient was required to provide an initial sputum specimen before starting anti-TB or other relevant clinical treatment. After collection, the sputum specimens were processed and cultured for MTB on solid or liquid media in the TB laboratory of each designated TB hospital. Positive cultures were then sent to the TB reference laboratory of the Zhejiang Provincial CDC for further DST.

Phenotypic DST
Phenotypic DST for anti-TB drugs was performed in the Zhejiang Provincial CDC TB laboratory by trained staff according to standard operating procedures, using the proportion method on solid Löwenstein-Jensen medium. The drugs tested were INH, RFP, streptomycin (SM), ethambutol (EMB), the fluoroquinolones levofloxacin (LFX) and moxifloxacin (MFX), amikacin (AK), kanamycin (KM), and capreomycin (CM). In parallel with phenotypic DST, positive mycobacterial cultures were inoculated on solid media containing thiophene-2-carboxylic acid hydrazide (TCH) and p-nitrobenzoic acid (PNB). Isolates that grew on media containing TCH and PNB were regarded as nontuberculous mycobacteria (NTM); otherwise, they were classified as MTB or Mycobacterium bovis. Strains identified as NTM were excluded from this study, and the samples included for WGS were reconfirmed by a molecular method; strains identified as NTM in the reconfirmation were also excluded from further analysis. Details of drug concentrations and testing procedures have been reported in our previous study [30].

MTB cultures were inactivated, and genomic DNA was isolated using a bacterial DNA extraction kit (QIAGEN Inc., Dusseldorf, Germany) according to the manufacturer's instructions. The purified DNA was shipped to a sequencing facility under cold-chain conditions and quantified using a TBS-380 fluorometer (Turner BioSystems Inc., Sunnyvale, CA, USA) to ensure that it met the quality requirements (OD260/280 ≥1.5 and DNA quantity ≥150 ng) for library preparation, sequencing, and detection.
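The DNA acceptance criteria stated above amount to two threshold checks. A minimal Python sketch, assuming one OD260/280 ratio and one total DNA mass per sample; the record type, function name, sample IDs, and measurement values are hypothetical, and only the two thresholds come from the text.

```python
from dataclasses import dataclass

MIN_OD260_280 = 1.5   # purity threshold stated in the Methods
MIN_DNA_NG = 150.0    # minimum DNA quantity (ng) stated in the Methods


@dataclass
class DnaQc:
    sample_id: str
    od260_280: float
    dna_ng: float


def passes_dna_qc(qc: DnaQc) -> bool:
    """True if the DNA meets both the purity and the quantity requirement."""
    return qc.od260_280 >= MIN_OD260_280 and qc.dna_ng >= MIN_DNA_NG


# Hypothetical samples for illustration only
batch = [
    DnaQc("S001", od260_280=1.82, dna_ng=640.0),
    DnaQc("S002", od260_280=1.41, dna_ng=900.0),   # fails purity
    DnaQc("S003", od260_280=1.75, dna_ng=120.0),   # fails quantity
]
accepted = [qc.sample_id for qc in batch if passes_dna_qc(qc)]
print(accepted)  # ['S001']
```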

Library Construction and Genome Sequencing
At least 1 µg of genomic DNA per sample was used as input for library preparation. The DNA was fragmented to ~400 bp. Sequencing libraries were generated using the NEXTflex™ Rapid DNA-Seq Kit as follows: A and B adapters were ligated to the fragments; adapter-dimer fragments were removed; fragments carrying the A adapter at one end and the B adapter at the other were selected by gel electrophoresis; and single-stranded DNA fragments were produced by sodium hydroxide denaturation for subsequent bridge PCR amplification. The libraries were sequenced on an Illumina NovaSeq 6000 system with 150 bp paired-end reads (PE150; Illumina, San Diego, CA 92122, USA).

Quality Control and Sample Selection
Raw sequencing data were processed using fastp (v0.20.1) [31] to remove adapter sequences and filter out low-quality bases. The high-quality reads were then classified with Kraken (v1.1.1) [32] for species identification, and samples identified as other species or with an MTB proportion below 80% were rejected as contaminated. Finally, reads from the remaining samples were aligned to the H37Rv reference genome (NC_000962.2) using BWA (v0.7.17) [33]. A total of 1105 samples with an average sequencing depth >20× and average genome coverage >95% were selected for subsequent analysis.
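The three sample-level filters described above can be combined as follows. This Python sketch assumes that the MTB read fraction (from the Kraken classification), mean depth, and genome coverage have already been computed for each sample; the record type, field names, and example values are hypothetical, and the thresholds are those stated in the text (MTB proportion ≥80%, depth >20×, coverage >95%).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampleMetrics:
    sample_id: str
    mtb_fraction: float      # share of reads classified as MTB
    mean_depth: float        # average sequencing depth (x)
    genome_coverage: float   # fraction of the H37Rv genome covered


def select_samples(samples: List[SampleMetrics]) -> List[SampleMetrics]:
    """Keep samples that pass the contamination, depth, and coverage filters."""
    kept = []
    for s in samples:
        if s.mtb_fraction < 0.80:
            continue  # other species or MTB proportion below 80%: contaminated
        if s.mean_depth <= 20 or s.genome_coverage <= 0.95:
            continue  # insufficient depth or coverage
        kept.append(s)
    return kept


# Hypothetical metrics for illustration only
samples = [
    SampleMetrics("A1", mtb_fraction=0.97, mean_depth=85.0, genome_coverage=0.99),
    SampleMetrics("A2", mtb_fraction=0.62, mean_depth=110.0, genome_coverage=0.99),
    SampleMetrics("A3", mtb_fraction=0.95, mean_depth=18.0, genome_coverage=0.98),
    SampleMetrics("A4", mtb_fraction=0.99, mean_depth=40.0, genome_coverage=0.93),
]
print([s.sample_id for s in select_samples(samples)])  # ['A1']
```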

Identification of Drug Resistance-Associated Mutations
Clean sequencing data were analyzed with a local installation of TB-Profiler (v4.4.2) [34], which aligned the reads to the H37Rv reference genome (NC_000962.2), identified resistance-associated mutations, and generated resistance profiles for 14 anti-TB drugs. Mutations with an allele frequency below 90% were excluded. WGS genotypic DST results were determined by the presence or absence of mutations listed in the WHO database of drug-resistance-associated mutations at the recommended evidence levels (including Tier 1 and Tier 2 mutations) [35].
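The genotypic call described above reduces to a frequency filter followed by a catalogue lookup. The Python sketch below does not parse TB-Profiler's actual output format; the Variant record, the function, and the two-entry catalogue are hypothetical simplifications that use mutation names appearing in the Results. A strain is called resistant to a drug if at least one variant with allele frequency ≥90% matches a WHO-listed mutation for that drug.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set

MIN_ALLELE_FREQ = 0.90  # variants below 90% frequency are excluded


@dataclass(frozen=True)
class Variant:
    mutation: str       # e.g. "katG_p.Ser315Thr"
    allele_freq: float  # fraction of reads supporting the variant


def genotypic_call(variants: Iterable[Variant], drug: str,
                   who_catalogue: Dict[str, Set[str]]) -> str:
    """Return 'R' if any retained variant is WHO-listed for the drug, else 'S'."""
    listed = who_catalogue.get(drug, set())
    for v in variants:
        if v.allele_freq >= MIN_ALLELE_FREQ and v.mutation in listed:
            return "R"
    return "S"


# Illustrative subset only; the real WHO catalogue is far larger.
WHO_CATALOGUE = {
    "RFP": {"rpoB_p.Ser450Leu"},
    "INH": {"katG_p.Ser315Thr"},
}

strain = [
    Variant("katG_p.Ser315Thr", 1.00),
    Variant("rpoB_p.Ser450Leu", 0.35),  # minority variant, removed by the 90% filter
]
print(genotypic_call(strain, "INH", WHO_CATALOGUE))  # R
print(genotypic_call(strain, "RFP", WHO_CATALOGUE))  # S
```

As the second call shows, a WHO-listed mutation present only in a minority subpopulation (heteroresistance) is not reported as resistance under the 90% threshold.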

Statistical Analysis
R (v4.0.5) [36] was used to calculate the drug-resistance rate for each phenotypic DST. The predictive performance of WGS genotypic DST was evaluated against the phenotypic DST results using sensitivity (the proportion of phenotypically resistant MTB strains classified as genotypically resistant by WGS), specificity (the proportion of phenotypically sensitive MTB strains classified as genotypically sensitive by WGS), positive predictive value (the proportion of strains classified as genotypically resistant by WGS that were phenotypically resistant), and negative predictive value (the proportion of strains classified as genotypically sensitive by WGS that were phenotypically sensitive). Based on the mutations detected in the drug-resistance genes of each sample, the mutation profiles were classified into four categories: A, no mutation; B, only mutations other than WHO-recommended ones; C, only WHO-recommended mutations; and D, both WHO-recommended and other mutations. The distribution of drug-resistance-associated mutations was analyzed for each drug.
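The four performance measures defined above follow from a 2×2 table of phenotypic versus genotypic calls. The study performed these calculations in R; the Python sketch below only illustrates the same definitions, with hypothetical names and hypothetical example calls, and returns NaN when a denominator is zero.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Confusion:
    tp: int  # phenotypically resistant, WGS resistant
    fn: int  # phenotypically resistant, WGS sensitive
    fp: int  # phenotypically sensitive, WGS resistant
    tn: int  # phenotypically sensitive, WGS sensitive


def confusion_from_calls(pairs: Iterable[Tuple[str, str]]) -> Confusion:
    """Tally (phenotype, genotype) pairs, each coded 'R' or 'S'."""
    c = Confusion(0, 0, 0, 0)
    for pheno, geno in pairs:
        if pheno == "R":
            if geno == "R":
                c.tp += 1
            else:
                c.fn += 1
        elif geno == "R":
            c.fp += 1
        else:
            c.tn += 1
    return c


def _ratio(num: int, den: int) -> float:
    return num / den if den else math.nan


def performance(c: Confusion) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV as defined in the Methods."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


# Hypothetical calls for five strains: (phenotypic DST, WGS genotypic DST)
calls = [("R", "R"), ("R", "S"), ("S", "S"), ("S", "S"), ("S", "R")]
print(performance(confusion_from_calls(calls)))
```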

Conclusions
In light of the findings from our study, we conclude that WGS is a valuable tool for identifying genome-wide variants. Using the genetic variants recommended by WHO, WGS effectively detects resistance to RFP and INH, enabling the identification of patients with multidrug-resistant TB. However, the genetic variants recommended for predicting resistance to other anti-TB drugs require further optimization and improvement.

Figure 1. Distribution characteristics of genetic mutation sites related to anti-TB drug resistance.

Author Contributions:
Methodology, X.L.; software, Y.L.; formal analysis, B.C., Y.L. and K.W.; investigation, Y.P., F.W. and L.Z.; data curation, Y.Z., S.C. and X.W.; writing-original draft preparation, M.Z. and Z.L.; writing-review and editing, J.P. All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by the Zhejiang Provincial Basic Public Welfare Research Program Project of China (Grant No. LTGY23H190002) and the Medical and Health Research Project of Zhejiang Province (2020keyan512 and 2015KYA056).

Institutional Review Board Statement: The study was approved by the Ethics Committee of Zhejiang Provincial Center for Disease Control and Prevention.

Informed Consent Statement: Informed consent was obtained from all participants involved in the study.

Table 2. Comparison between WGS genotypic and phenotypic DST.

Table 3. Mutation distribution characteristics of anti-tuberculosis drug-resistance-associated genes.