Transcriptomic Biomarkers for Tuberculosis: Validation of NPC2 as a Single mRNA Biomarker to Diagnose TB, Predict Disease Progression, and Monitor Treatment Response

External validation in different cohorts is a key step in the translational development of new biomarkers. We previously described three host mRNA whose expression in peripheral blood is significantly higher (NPC2) or lower (DOCK9 and EPHA4) in individuals with TB compared to latent TB infection (LTBI) and controls. We have now conducted an independent validation of these genes by re-analyzing publicly available transcriptomic datasets from Brazil, China, Haiti, India, South Africa, and the United Kingdom. Comparisons between TB and control/LTBI showed significant differential expression of all three genes (NPC2high p < 0.01, DOCK9low p < 0.01, and EPHA4low p < 0.05). NPC2high had the highest mean area under the ROC curve (AUROC) for the differentiation of TB vs. controls (0.95) and LTBI (0.94). In addition, NPC2 accurately distinguished TB from the clinically similar conditions pneumonia (AUROC, 0.88), non-active sarcoidosis (0.87), and lung cancer (0.86), but not from active sarcoidosis (0.66). Interestingly, individuals progressing from LTBI to TB showed a constant increase in NPC2 expression with time when compared to non-progressors (p < 0.05), with a significant change closer to manifestation of active disease (≤3 months, p = 0.003). Moreover, NPC2 expression normalized with completion of anti-TB treatment. Taken together, these results validate NPC2 mRNA as a diagnostic host biomarker for active TB independent of host genetic background. Moreover, they reveal its potential to predict progression from latent to active infection and to indicate a response to anti-TB treatment.


Introduction
Tuberculosis (TB) is a curable infectious disease that remains a serious health problem worldwide, mainly due to inadequate diagnosis and treatment of infected individuals. [...] infection of phagocytes by M. tuberculosis, but the same was not the case with the NPC1 protein [18]. Indeed, NPC1 mRNA was later shown not to be differentially expressed in the blood of TB patients [10].
In the present work, we performed an extensive evaluation of NPC2, EPHA4, and DOCK9 mRNA levels (i) as diagnostic biomarkers according to WHO TPP criteria for a community-based triage or referral test to identify people suspected of having TB, (ii) as potential biomarkers for predicting progression from latent TB infection to active disease [19], and (iii) as correlates of a clinical response to anti-TB treatment. For this purpose, we analyzed previously published and unpublished datasets from cross-sectional tuberculosis cohorts from Brazil, Haiti, India, South Africa, and the United Kingdom, as well as from two prospective studies from China and South Africa.

Ethics Statements
The Brazilian study was approved by the Ethics Committee of the Oswaldo Cruz Foundation under registration code 560-10 [10]. Details about sample collection and ethical procedures of the Haitian cohort were previously published [20,21]. The data from the other cohorts were publicly available at GEO [22].

Terminology
We used the following case definitions and abbreviations. Control = healthy uninfected individuals, with or without recent exposure to a TB index case (the control subjects recruited by de Araujo et al. [10,11] and Wipperman et al. [20,21] were known to have been exposed to a TB index case, but TB infection was subsequently ruled out). Symptomatic non-TB (S-NTB) = symptomatic adults self-presenting for investigation of pulmonary tuberculosis and showing no laboratory evidence of active TB disease, regardless of the history of known exposure to a TB index case (for more details, see [6]). LTBI = positive Mantoux TST and/or IGRA in the absence of a diagnosis of active TB [5]. TB = active tuberculosis diagnosed by sputum smear and/or culture and/or GeneXpert MTB/RIF [6,7,10,11,23–25]. TBtt = drug treatment for TB [7,24]. OD = other non-TB pulmonary diseases, such as active (aSARC) and non-active sarcoidosis (naSARC), lung cancer (LC), and pneumonia (PN) [23].

Inclusion Criteria for Eligible Published Datasets
The following search keywords were used to identify eligible datasets on GEO and ArrayExpress: human, transcriptomic, tuberculosis, and blood. Datasets that were deposited until March 2020 and were not present in our previous study [10] were included in this reanalysis. Studies had to meet the following inclusion criteria. Biological specimens: whole blood or peripheral blood mononuclear cells (PBMC); subjects: adults (≥18 years old) with active pulmonary TB, LTBI, and controls with or without other diseases; method: transcriptomic profiling by RNAseq or microarray analysis. Exclusion criteria were samples from subjects <18 years of age or with positive HIV status, and non-human samples. Table 1 summarizes the included cohorts. The Brazilian cohort is a joint analysis of sub-cohorts from our two previous studies. In the first, we originally discovered NPC2, DOCK9, and EPHA4 mRNA as potential biomarkers for TB [10]. The second focused on small noncoding RNA (sncRNA) expression in a larger group of samples [11], from some of which we also extracted normalized mRNA expression values for the present study. One unpublished dataset containing a cohort of Haitians and publicly available datasets from India were also included, following the criteria detailed above. Using all five cross-sectional cohorts, we evaluated expression of the three mRNA in TB patients from different geographic areas and their ability to differentiate TB from other pulmonary diseases. In the longitudinal cohorts (two public datasets from China and South Africa), we evaluated their expression (i) in the progression from LTBI to active TB and (ii) during follow-up of anti-TB treatment. For the Brazilian and Haitian cohorts, peripheral whole blood was collected in PAXgene RNA tubes (PreAnalytiX, SWZ) and processed and analyzed as described previously [10].

Acquisition and Normalization of Datasets
Transcriptomic data (microarray or RNAseq) of whole blood or PBMC were obtained as follows. The GEO2R web tool [27] was used to gather normalized expression values of microarray studies. For RNAseq data, the FASTQ files were exported to the GREIN tool [28] and submitted to standard normalization. Processed expression data of E-MTAB-8290 [6] were downloaded from the ArrayExpress platform (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-8290/?page=1&pagesize=250, accessed on 31 March 2020) and included in our analysis. The normalized microarray or RNAseq expression values were exported to Prism 6 (GraphPad Software, 6.07, San Diego, CA, USA) for statistical analysis.

Statistical Analysis
Significance of differences between two groups was assessed with the Mann-Whitney (cross-sectional) or Wilcoxon (longitudinal) test. For comparisons of >2 groups, the Kruskal-Wallis (cross-sectional) or Friedman (longitudinal) test was used. Means, medians, standard deviations (SD), dispersion plots, area under the receiver operating characteristic curve (AUROC) values, 95% confidence intervals (CI), and coefficients of variation (CV) were computed using Prism 6 (GraphPad Software).
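As an illustration of how an AUROC relates to the rank-based tests listed above, the following minimal Python sketch computes the AUROC as the normalized Mann-Whitney U statistic, i.e., the probability that a randomly chosen case has a higher expression value than a randomly chosen comparator (ties count one half). It is not the Prism 6 procedure used in this study; the function name `auroc` and the expression values are hypothetical and serve only to show the calculation.

```python
from itertools import product


def auroc(cases, comparators):
    """AUROC as P(case > comparator) + 0.5 * P(case == comparator)."""
    if not cases or not comparators:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for x, y in product(cases, comparators):
        if x > y:
            wins += 1.0   # case ranks above the comparator
        elif x == y:
            wins += 0.5   # ties contribute one half
    return wins / (len(cases) * len(comparators))


# Hypothetical normalized expression values, for illustration only.
tb = [0.9, 0.6, 0.7, 0.4]
ltbi = [0.1, 0.5, -0.2, 0.3]

print(auroc(tb, ltbi))  # 15 of 16 case/comparator pairs are ordered correctly
```

For a marker that is lower in TB (such as DOCK9 or EPHA4), the same function can be called with the two groups swapped, which is equivalent to reporting 1 − AUROC.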

TB Detection Studies Comparing with Control and LTBI
The re-analysis included published cross-sectional datasets comprising subjects enrolled between 2009 and 2013 in India and the United Kingdom (London). Together with the Brazilian cohort enrolled from 2010 to 2013 and the Haitian cohort collected from 2016 to 2020, four cohorts from four different countries were thus available for analysis. As a first step, we assessed the diagnostic biomarker potential of DOCK9, EPHA4, and NPC2 mRNA to distinguish among active TB, LTBI, and control groups.
In accordance with our previous findings [10], active TB induced significantly higher NPC2 mRNA levels and lower expression of DOCK9 and EPHA4 mRNA in the cohorts from Haiti and India (Figure 1). An NPC2high expression pattern (similar to the one observed among TB cases) was more frequent among LTBI than controls in the cohorts from Haiti (Figure 1H) and Brazil (Figure 1G), whilst LTBI from the Indian cohort showed less dispersed expression (Figure 1I). The control group showed a more heterogeneous expression profile of DOCK9 and EPHA4, i.e., greater dispersion along the y-axis, which is more noticeable among the Brazilian (compare Figure 1A,D with Figure 1G) and Indian (compare Figure 1C,F with Figure 1I) cohorts. Overall, this analysis showed (i) that expression of all three mRNA changed in a similar fashion in blood of TB cases from the different geographic areas and (ii) that LTBI cases were more likely than non-infected samples to exhibit the "TB-like" NPC2high pattern.
Figure 1. The Mann-Whitney test was used to assess significance between 2 groups. The Kruskal-Wallis test was used to assess significance of differences across more than 2 groups, followed by Dunn's multiple comparison test. * p-value < 0.05; ** p-value < 0.01; *** p-value < 0.005 with respect to TB.
In line with our previous findings [10], in this new reanalysis, NPC2 showed the highest AUROC values for TB vs. non-TB discrimination among Brazilians (AUROC, TB vs. control/LTBI: 0.94), which was validated in the Indian (TB vs. LTBI: 0.98) and British (TB vs. control: 0.99) cohorts. Intriguingly, in the Haitian cohort, NPC2 differentiated TB from LTBI slightly less well (AUROC = 0.89) than DOCK9 (0.95) and EPHA4 (0.96). Because the chances of infection and progression to disease are higher in high TB-burden settings, some subjects classified as LTBI could actually be in the initial stages of progression, which could interfere with the performance of NPC2high as a biomarker for the binary distinction TB vs. LTBI (Figure 1).
Overall, the biomarker potential of NPC2high was successfully validated in these new analyses, in which it showed the highest mean AUROC values in the comparisons between TB and control (mean AUROC = 0.95) or LTBI (mean AUROC = 0.94, Table 2). The sensitivity and specificity of NPC2high for TB detection across the different cohorts are analyzed in a later section. DOCK9 and EPHA4 also showed biomarker value in this context.

Identification of TB in Individuals Presenting with Respiratory Symptoms
Data from whole blood of individuals with respiratory symptoms, self-presenting for investigation of pulmonary TB, were available from the South African cohort [6]. Even though all subjects had respiratory symptoms, those diagnosed with TB by positive sputum smear and/or sputum liquid culture and/or GeneXpert MTB/RIF test showed significant differences in DOCK9 (p = 0.0002), EPHA4 (p < 0.0001), and NPC2 (p < 0.0001) mRNA expression when compared with non-TB subjects, as defined by negative results in the aforementioned diagnostic tests (Figure 2). However, all three mRNA were only moderately accurate in classifying the subjects into TB and non-TB cases: EPHA4 showed a slightly higher AUROC (0.71, 95% CI: 0.63 to 0.79) than NPC2 (0.68, 95% CI: 0.60 to 0.77) or DOCK9 (0.675, 95% CI: 0.59 to 0.76).
Figure 2. The diagnostic groups comprise adults with respiratory symptoms self-presenting for investigation of pulmonary TB who were ultimately diagnosed as having TB or not. S-NTB = symptomatic adults showing no laboratory evidence of active TB disease, regardless of the history of known exposure to a TB index case. TB = active tuberculosis. The Mann-Whitney test was used to assess significance between two groups: *** p-value < 0.005, **** p-value < 0.001.

Differentiation from Other Pulmonary Diseases
Using the available dataset from the United Kingdom (GSE42826), we evaluated the potential of the three mRNAs to discriminate between TB and other lung diseases that are likely to constitute clinically important confounders. This study generated whole blood microarray transcriptional data from patients with TB or OD and from controls (Figure 3).
Similarly to the observations in the section "TB Detection Studies Comparing with Control and LTBI", DOCK9 and EPHA4 expression in the British cohort was also significantly lower (p < 0.0001) in TB patients than in controls (Figure 3A,B). However, only NPC2 expression levels were significantly higher in TB than in the majority of the other lung diseases, namely non-active sarcoidosis (p = 0.021), lung cancer (p = 0.018), and pneumonia (p = 0.006) (Figure 3C).
Similarly to TB, but to a lesser extent, patients with active sarcoidosis had significantly higher blood levels of NPC2 than controls (p < 0.0001), which was not observed for any of the other disease groups (Figure 3C; control vs. non-active sarcoidosis: p = 0.07, vs. lung cancer: p = 0.052, vs. pneumonia: p > 0.99). In fact, TB induced the highest median NPC2 blood levels (0.58, 95% CI 0.37 to 0.80), followed by active sarcoidosis (0.46, 95% CI 0.23 to 0.71). Lower median values were observed in non-active sarcoidosis (0.19, 95% CI 0.091 to 0.54), lung cancer (0.063, 95% CI 0.0018 to 0.84), and pneumonia (0.00109, 95% CI −0.34 to 0.675), and the lowest in the control group (−0.22, 95% CI −0.296 to −0.15). Thus, higher NPC2 levels in peripheral blood might reflect immunopathological processes that are similar in active sarcoidosis and in TB, but are less common during the cessation of sarcoidosis symptoms, and even less so in lung cancer and pneumonia.
In general, DOCK9 and EPHA4 (Figure 3; Table 2, mean AUROC < 0.65) showed lower potential to differentiate between TB and these clinical confounders than NPC2 (Table 2, mean AUROC = 0.73). All mRNAs had low AUROC values for the distinction TB vs. active sarcoidosis (Table 2, AUROC between 0.51 and 0.69). In contrast, NPC2 showed a moderate to high potential to discriminate TB from non-active sarcoidosis (AUROC = 0.87), lung cancer (0.86), and pneumonia (0.88). In fact, only one patient diagnosed with lung cancer and one with pneumonia showed an NPC2high profile (black dots, Figure 3C).
Figure 3. Diagnostic groups comprise: controls; TB = active tuberculosis; aSARC = active sarcoidosis; naSARC = non-active sarcoidosis; LC = lung cancer; PN = pneumonia. The Kruskal-Wallis test was used to assess significance of differences across more than 2 groups, followed by Dunn's multiple comparison test. * p-value < 0.05; ** p-value < 0.01; *** p-value < 0.005 with respect to TB. • = single LC and PN patients presenting an NPC2high profile. UK = United Kingdom.

Disease Progression
Our previous observation of a subgroup of LTBI subjects with NPC2 TB-like expression profile [10] raised the question whether this would indicate progression of sub-clinical cases.
The Pan African cohort comprises a series of samples collected from household contacts after a person who was recently diagnosed with TB returned to the household (GSE94438). Based on samples collected three times at intervals of 6 months, subjects who developed TB are here classified as "TB progressors" (n = 64), and those who did not develop TB in the 18-month window are classified as "non-progressors" (n = 208) [24]. At the time of Mtb exposure (time 0), the TB progressors already showed differential expression of DOCK9 and NPC2 in comparison to the non-progressors (Figure 4A-C, small clades under the graph; p < 0.026). During follow-up, non-progressors and TB progressors initially showed similar expression changes until month 6, i.e., down-regulation of DOCK9 (compare white and grey bars, Figure 4A) and EPHA4 (Figure 4B) and up-regulation of NPC2 (Figure 4C). Interestingly, after month 6 of exposure, expression of all three mRNA showed a trend toward normalization in non-progressors, although a significant difference was only observed for NPC2 (as indicated by the significant down-regulation between the dashed and dotted white bars in Figure 4C). In contrast, NPC2 mRNA levels in TB progressors continued to increase throughout the follow-up period (Figure 4C, compare white and grey bars).
To understand the dynamics of transcriptional changes from the time of infection to disease onset, we performed additional analyses on the Pan African GSE94438 dataset. Samples from TB progressors were grouped according to the time elapsed between blood collection and the diagnosis of TB (T<3m = <3 months, T4-6m = 4-6 months, T7-12m = 7-12 months, T13-18m = 13-18 months) and compared with non-progressors at enrollment (time 0). This analysis showed that TB progressors already had significantly lower expression of DOCK9 at T13-18m before disease development (Figure 4D). Significant up-regulation of NPC2 among TB progressors was observed from T7-12m onward (Figure 4F), while EPHA4 showed later expression changes at T4-6m (Figure 4E).
Figure 4. Expression values were obtained from the dataset GSE94438. The Mann-Whitney test was used to assess significance between two groups. The Kruskal-Wallis test was used to assess significance of global differences across more than two groups, followed by Dunn's multiple comparison test. Small bar: p-value comparing groups. # p-value < 0.1; * p-value < 0.05; ** p-value < 0.01; *** p-value < 0.005.

We expected that these expression changes would be more pronounced with decreasing time to disease onset. Indeed, a median decrease in DOCK9 (Figure 4D) and EPHA4 (Figure 4E) was observed in individuals with a time-to-disease of 3 and 6 months, but only NPC2 was significantly up-regulated at times-to-disease of up to one year, with the highest expression observed in individuals with the shortest time-to-disease (3 months) (p = 0.003; Figure 4F, dark grey bars). Altogether, these findings suggest that monitoring NPC2 expression in blood might serve as a biomarker for progression to TB among individuals recently exposed to Mtb or among household contacts of recently diagnosed TB cases.
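The time-to-disease grouping used in this analysis can be sketched as a simple binning step. The snippet below is an illustrative Python sketch, not the authors' code: the function name `time_to_disease_group`, the sample tuples, and the choice to treat each upper boundary as inclusive (so that exactly 3 months falls into the T<3m group, in line with the "≤3 months" wording of the abstract) are assumptions.

```python
from collections import defaultdict

# Upper bounds (months before TB diagnosis) and group labels. Treating the
# upper bound as inclusive is an assumption made for this sketch.
BINS = [(3, "T<3m"), (6, "T4-6m"), (12, "T7-12m"), (18, "T13-18m")]


def time_to_disease_group(months):
    """Return the group label for a sample drawn `months` before diagnosis."""
    if months < 0:
        raise ValueError("blood collection must precede diagnosis")
    for upper, label in BINS:
        if months <= upper:
            return label
    return None  # outside the 18-month follow-up window


# Hypothetical progressor samples: (sample id, months to diagnosis, NPC2 value).
samples = [("s1", 2.0, 0.81), ("s2", 5.5, 0.40), ("s3", 11.0, 0.35), ("s4", 16.0, 0.10)]

# Collect expression values per time-to-disease group.
grouped = defaultdict(list)
for sample_id, months, npc2 in samples:
    grouped[time_to_disease_group(months)].append(npc2)

for _, label in BINS:
    print(label, grouped[label])
```

Each group can then be compared with non-progressors at enrollment using the tests named in the Statistical Analysis section.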

Correlation with Completion of Anti-TB Treatment
Besides their potential to detect disease cases, expression of optimal biomarkers for TB should reflect successful completion of anti-TB treatment. To evaluate this aspect, we used two datasets, China GSE54992 [25] and South Africa GSE89403 [7], which featured prospective sampling during anti-TB treatment. The dataset GSE54992 comprises microarray expression data from PBMC, and GSE89403 is an RNAseq-based study of whole blood.
In accordance with the cross-sectional evaluation shown in Figures 1 and 2, significant differences in DOCK9, EPHA4, and NPC2 expression were also evident in all pairwise comparisons between untreated active TB patients (TB) and LTBI and controls (see Figure 5, p-values in black font). However, although DOCK9 was down-regulated in the previous analyses, it was up-regulated in PBMC from TB cases in the Chinese dataset (Figure 5A). Apart from this discrepancy, significant DOCK9low (GSE89403 only; p < 0.0001), EPHA4low (GSE54992 and GSE89403; p < 0.001), and NPC2high (GSE54992 and GSE89403; p ≤ 0.03) expression differences in the comparison TB vs. control/LTBI were also observed in these datasets from China and South Africa.
Figure 5. [...]; GSE89403: 7 days, and 1 month), and at the end of therapy (GSE54992: and GSE89403: 6 months). The Friedman test was used to assess significance of the longitudinal analysis among TB cases during treatment. The Mann-Whitney test was used to assess significance between two groups vs. LTBI or vs. control. The Kruskal-Wallis test was used to assess significance of global differences across more than two groups, followed by Dunn's multiple comparison test. p-values obtained comparing TBtt time intervals against non-TB cases (LTBI and control) are shown in black (•). p-values obtained comparing expression changes in TBtt patients during treatment are shown in blue (•). * p-value < 0.05; ** p-value < 0.01; *** p-value < 0.005.

During anti-TB chemotherapy, expression of these three genes changed significantly over time. Even though expression of all three tended to normalize, it is notable that only NPC2 expression levels did not differ between control/LTBI groups and TBtt by the end of therapy in both cohorts (Figure 5C [control/LTBI vs. TBtt (6 m), p = 0.6] and Figure 5F [control vs. TBtt (24 weeks), p = 0.096]). In the South African cohort, DOCK9 also showed gradual normalization, with no statistically significant difference in expression between control and TBtt at the end of therapy.
Moreover, expression data available from the Haitian cohort after two weeks of anti-TB treatment (Figure S1) corroborated the significant reduction in NPC2 mRNA blood levels even at this early stage of anti-TB treatment.

NPC2 Accuracy: Sensitivity and Specificity Analysis
Overall, in the previous sections, NPC2 showed better discriminatory potential for TB across the different group comparison analyses. Therefore, we proceeded with a more detailed ROC curve analysis exclusively for NPC2.
As the selection of cut-off values can be adjusted in order to improve either the sensitivity or the specificity of a given test, we decided to assess NPC2 accuracy in various possible diagnostic scenarios. For this purpose, we calculated its sensitivity and specificity for the detection of TB according to: (i) maximum Youden index; (ii) the TPP for a community-based triage or referral test to identify people suspected of having TB (9); (iii) the TPP for a test for predicting progression from TB infection to active disease (12).
As shown in Table 3, analyses performed at the maximum Youden index (sensitivity + specificity − 1) showed high mean sensitivities for TB detection, varying between 87.5% and 100% vs. controls, and moderate values when compared to LTBI (72.7-75%), while exhibiting high mean specificity (90.2-100%). The analysis comparing TB against OD showed the lowest mean specificity in the case of TB vs. active sarcoidosis (56.3%), but for the other TB clinical confounders, specificity was ≥79% while sensitivity remained ≥72.7%. However, discriminatory power decreased when comparing TB vs. S-NTB, as sensitivity and specificity were ≤67.8%.
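To make the cut-off selection concrete, the following minimal Python sketch scans candidate cut-offs and keeps the one that maximizes the Youden index J = sensitivity + specificity − 1; the same cut-off also maximizes the average of sensitivity and specificity. It is an illustration under stated assumptions, not the Prism 6 procedure used here: the function names `sens_spec` and `youden_cutoff`, the rule that a sample is called positive when its value is at or above the cut-off, and the expression values are all hypothetical.

```python
def sens_spec(cases, comparators, cutoff):
    """Sensitivity and specificity when values >= cutoff are called positive."""
    sensitivity = sum(x >= cutoff for x in cases) / len(cases)
    specificity = sum(y < cutoff for y in comparators) / len(comparators)
    return sensitivity, specificity


def youden_cutoff(cases, comparators):
    """Return (cutoff, J, sensitivity, specificity) for the maximum Youden index.

    Every observed value is tried as a cut-off; on ties the lowest cut-off wins.
    """
    best = None
    for cutoff in sorted(set(cases) | set(comparators)):
        sens, spec = sens_spec(cases, comparators, cutoff)
        j = sens + spec - 1.0
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Hypothetical normalized NPC2 expression values, for illustration only.
tb = [0.9, 0.6, 0.7, 0.4]
ltbi = [0.1, 0.5, -0.2, 0.3]

cutoff, j, sens, spec = youden_cutoff(tb, ltbi)
print(cutoff, j, sens, spec)
```

With these toy values, the selected cut-off classifies all four TB samples and three of the four LTBI samples correctly.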
By adjusting the accuracy analysis to meet the TPP for a community-based triage or referral test to identify people suspected of having TB, we observed that most of the analyses met the minimum sensitivity (>90%) and specificity (>70%) requirements (Table 3, bold font). Only the cohorts from Brazil and South Africa (in which the control and S-NTB groups were composed of recent close contacts or symptomatic respiratory patients, respectively) and the comparisons between TB and naSARC and aSARC did not meet these TPP minimum criteria.
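A cut-off can also be chosen to satisfy a minimum sensitivity first and then be judged against a minimum specificity, which is the logic of the TPP-oriented analysis described above. The following Python sketch shows one way to do this; it is an illustration rather than the procedure used to generate Table 3. The function name `tpp_cutoff`, the strict ">" comparisons (mirroring the ">90%" and ">70%" wording above), and the expression values are hypothetical.

```python
def tpp_cutoff(cases, comparators, min_sens=0.90, min_spec=0.70):
    """Among cut-offs with sensitivity > min_sens, keep the most specific one.

    Values >= cutoff are called positive. Returns a tuple
    (cutoff, sensitivity, specificity, meets_tpp), or None when no cut-off
    exceeds the sensitivity floor.
    """
    best = None
    for cutoff in sorted(set(cases) | set(comparators)):
        sens = sum(x >= cutoff for x in cases) / len(cases)
        spec = sum(y < cutoff for y in comparators) / len(comparators)
        if sens > min_sens and (best is None or spec > best[2]):
            best = (cutoff, sens, spec)
    if best is None:
        return None
    return best + (best[2] > min_spec,)


# Hypothetical expression values, for illustration only.
well_separated = tpp_cutoff([0.9, 0.6, 0.7, 0.4], [0.1, 0.5, -0.2, 0.3])
overlapping = tpp_cutoff([0.2, 0.8, 0.9], [0.1, 0.3, 0.5, 0.6])

print(well_separated)  # specificity stays above the floor
print(overlapping)     # sensitivity floor forces a low, unspecific cut-off
```

In the second toy example a single low-expressing case forces the cut-off down, so the sensitivity requirement is met only at the cost of specificity, which is the kind of trade-off that arises when comparator groups overlap with TB cases.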
In contrast, it is noteworthy that in the longitudinal analysis, NPC2high fulfilled the minimum TPP sensitivity and specificity (>75%) criteria for a monitoring test for prediction of progression from latent to active TB [19]. Here, NPC2high could detect subjects who will progress to active TB within <3 months before disease onset (Pan African cohort GSE94438, see bold numbers in Table 3), but not at longer intervals before onset. If we aim for maximum detection of TB cases, i.e., higher sensitivity, NPC2high demonstrated 92.3% mean sensitivity and 75% mean specificity to detect TB <3 months before disease onset.
Table 3 footnotes. Diagnostic groups are composed of exposed controls, latent tuberculosis infection (LTBI), and active tuberculosis (TB). S-NTB = symptomatic non-TB; aSARC = active sarcoidosis; naSARC = non-active sarcoidosis; LC = lung cancer; PN = pneumonia. BR = Brazil; H = Haiti; I = India; UK = United Kingdom; PA = Pan African (SA, the Gambia, Ethiopia, and Uganda). a Results that fulfilled the TPP minimum sensitivity and specificity recommendations are printed in bold. b Healthy individuals recently exposed to a TB index case. c Healthy individuals with no known recent contact with a TB index case. NA = non-applicable.

Discussion
We have performed an external validation/re-evaluation of the mRNA triplet DOCK9, EPHA4, and NPC2, which we had previously identified as potential biomarkers in whole blood of Brazilian TB patients [10]. We observed similar changes in expression among subjects from different countries, regardless of differences in genetic background and local TB incidence (Figures 1 and 2, Table 2). Our results suggest that the gene set was differentially expressed among patients with active TB and that NPC2 high should perform well even in high-burden areas such as South Africa (520/100,000), India (199/100,000), and Haiti (176/100,000). Notably, the single microarray dataset included in this study (GSE42826) confirmed the differential expression of the three genes originally uncovered by RNAseq, corroborating a previous study showing the equivalence of microarray and RNAseq for assessing differential gene expression [29].
Our study, conducted in different cohorts, shows mean AUROC values > 0.90 for NPC2 in the detection of active TB. This result was consistent across all cohorts for the discrimination between TB cases and controls, and NPC2 had the highest mean AUROC values compared to EPHA4 and DOCK9. In addition, NPC2 expression was significantly lower in other lung diseases than in TB, except for active sarcoidosis (Figure 3). Studies using PCR amplification of microbial DNA have implicated mycobacteria and propionibacteria as the most common etiologic agents of sarcoidosis. So far, however, Propionibacterium acnes is the only microorganism successfully isolated from sarcoid lesions by bacterial culture, which may help in the diagnostic differentiation from TB [30][31][32][33].
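For readers less familiar with the AUROC values discussed above, the statistic can be read as the probability that a randomly chosen case has a higher marker value than a randomly chosen comparator, with ties counted as one half (the Mann-Whitney formulation). The minimal Python sketch below computes it by direct pair counting; the function name `auroc` and the expression values are assumptions made for illustration and do not reproduce the analysis pipeline used in this study.

```python
def auroc(cases, controls):
    """AUROC as the fraction of (case, control) pairs ranked correctly.

    A pair counts 1 when the case value exceeds the control value and 0.5
    when the two values are tied (Mann-Whitney formulation).
    """
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical normalized expression values (illustrative only).
tb = [7.0, 7.6, 7.9, 8.1]
ctrl = [6.5, 6.9, 7.2, 7.0]

print(auroc(tb, ctrl))
```

Swapping the two groups yields one minus the original value, which is how a down-regulated marker would be scored with the same function.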
The reproducibility of a signature with a small number of genes may be a concern, as some studies have reported that increasing the number of genes may improve sensitivity and specificity [26]. Yet, other studies have reported that a signature comprising 16 genes, which initially predicted progression from LTBI to TB, lost accuracy during external validation in cohorts from other countries [24]. More recently, a systematic review evaluated the accuracy of several proposed transcriptional signatures in a setting with a high burden of TB and HIV in South Africa [6]. Note that none of the 27 selected signatures, including NPC2 [10], met the WHO optimum or minimum criteria for a triage test (95% sensitivity and 80% specificity) or a confirmatory test (65% sensitivity and 98% specificity). In contrast, in the present study, NPC2 in particular demonstrated high mean sensitivity (>87.5%) to distinguish between TB and LTBI, even though a "TB-like" expression profile was also observed among some subjects classified as LTBI and, more frequently, among S-NTB. It is also important to mention that the study by Turner et al. [6] did not feature a confirmatory analysis of these genes, whereas differential expression of NPC2 in TB was already confirmed by (RT) qPCR in a different Brazilian cohort not included in the present study [10]. Additionally, a diagnostic algorithm tree combining NPC2 expression cut-off values with the results of the low-cost chest X-ray examination enabled accurate discrimination between TB and LTBI individuals [10]. This type of holistic approach, combining imaging findings with transcriptional signatures, should be considered in future studies of populations prospectively evaluated for TB.
Furthermore, we also found that NPC2 high met the minimum sensitivity and specificity TPP criteria for predicting progression from latent tuberculosis to active TB in most of the cohorts. The lower sensitivity/specificity observed for the comparisons with the South African respiratory symptomatic cohort (E-MTAB-8290, Table 3) might seem to be a drawback at first. However, we have to consider the limitations of the current diagnostic tests for TB detection [2][3][4][5] and that an NPC2 high pattern may already be observed at earlier stages of the TB progression spectrum (as seen in Figure 4F). Thus, S-NTB individuals showing an NPC2 high profile in blood could be harboring a sub-clinical/paucibacillary TB infection.
We identified NPC2 as an accurate marker for identifying individuals at high risk of progressing from LTBI to TB. If the sensitivity of 92.3% and specificity of 75% (Table 3) for predicting progression from latent to active TB are corroborated in additional studies, monitoring NPC2 expression in blood could contribute to the detection and early treatment of those LTBI cases at risk of progression to active TB by using a simpler method, (RT) qPCR, which was already validated for this marker [10]. In addition, chest X-ray would be an easy tool for distinguishing high-risk LTBI from active TB cases among subjects with an NPC2 high profile. On the other hand, considering that all-trans retinoic acid (ATRA) triggers an NPC2-dependent antimicrobial response against Mtb [18], it is important to investigate whether vitamin A deficiency could contribute to false-negative results in individuals otherwise expected to have an NPC2 high pattern. Clearly, there is a need for additional prospective studies to validate our current findings.
Our data from non-progressor household TB contacts showed significant changes in NPC2 gene expression only 6 months after exposure to the initial index case, with a return of expression to the initial level after 18 months (Figure 4). However, a different dynamic was observed for the TB progressors, who showed a continual up-regulation of this gene toward the TB profile with increasing proximity to disease onset (Figure 4F). Currently, TST and IGRA are the tests used to identify M. tuberculosis exposure, although they cannot distinguish TB from LTBI or identify individuals who will progress from latent to active TB within the next two years [5]. These immune response-based tests do not reflect the presence of live bacilli in the host and remain positive after completion of treatment of the infection. As exemplified by NPC2 mRNA, this drawback can be overcome by measuring the expression of genes that play a functional role in host defenses against the pathogen. Measuring gene activity at the mRNA level may be a particularly attractive option due to the highly dynamic nature of transcriptional responses in the host's biological processes [34]. The sensitivity and high dynamic range of many transcriptomic responses likely also explain the ability of NPC2 to predict progression to TB and its correlation with treatment completion.
TB control critically relies on the identification of individuals with active disease and the administration of complete drug treatment. However, the tools available to follow the response to treatment, such as in vitro culture of clinical specimens, are slow and laborious. Rapid molecular tests, such as GeneXpert®, as well as the less expensive sputum stain for acid-fast bacilli, may produce false-positive results due to the detection of residual nucleic acids and structures of dead bacilli, respectively [35]. For the South African cohort, we observed a significant decrease (p = 0.0037, Figure 5F) in NPC2 levels during treatment, with borderline significance (p = 0.057, Figure 5C) for the Chinese cohort. The presence of drug-resistant strains can affect the outcome of anti-TB treatments and could explain the borderline significance observed in Figure 5C (blue line). Unfortunately, we cannot perform any further analysis in this regard, since information on the presence of drug resistance is not available for these cohorts.

Conclusions
In summary, this analysis of publicly available datasets from different geographic areas validates our previous findings that NPC2 is a promising host biomarker for diagnosing TB. Potential use in the differential diagnosis between TB and other lung diseases was also observed, although the diagnostic performance was slightly lower among subjects from South Africa with respiratory symptoms. Notably, we obtained additional evidence indicating that up-regulation of this gene in blood might also be used for predicting progression from latent to active infection (also meeting the minimum TPP criteria from WHO) and for monitoring response to anti-TB treatment. The relatively low number of subjects in the independent validation cohorts is an important limitation of this study. Further studies are required to corroborate our findings, including heterogeneous cohorts with larger sample sizes, different TB clinical confounders, and prospective evaluations during disease progression and anti-TB treatment.