Evaluating Latent Tuberculosis Infection Test Performance Using Latent Class Analysis in a TB and HIV Endemic Setting

Background: Given the lack of a gold standard for latent tuberculosis infection (LTBI) and paucity of performance data from endemic settings, we compared test performance of the tuberculin skin test (TST) and two interferon-gamma-release assays (IGRAs) among health-care workers (HCWs) using latent class analysis. The study was conducted in Cape Town, South Africa, a tuberculosis and human immunodeficiency virus (HIV) endemic setting. Methods: 505 HCWs were screened for LTBI using TST, QuantiFERON-gold-in-tube (QFT-GIT) and T-SPOT.TB. A latent class model utilizing prior information on test characteristics was used to estimate test performance. Results: LTBI prevalence (95% credible interval) was 81% (71–88%). TST (10 mm cut-point) had highest sensitivity (93% (90–96%)) but lowest specificity (57% (43–71%)). QFT-GIT sensitivity was 80% (74–91%) and specificity 96% (94–98%), and for T-SPOT.TB, 74% (67–84%) and 96% (89–99%) respectively. Positive predictive values were high for IGRAs (98–99%) and TST (90%). All tests displayed low negative predictive values (range 47–66%). A composite rule using both TST and QFT-GIT greatly improved negative predictive value to 90% (95% credible interval 80–97%). Conclusion: In an endemic setting a positive TST or IGRA was highly predictive of LTBI, while a combination of TST and IGRA had high rule-out value. These data inform the utility of LTBI-related immunodiagnostic tests in TB and HIV endemic settings.


Introduction
The testing for and treatment of latent tuberculosis (TB) infection (LTBI) in targeted populations is an important strategy in TB elimination [1,2]. In South Africa targeted LTBI testing is aimed at children under the age of five years as infection is one of the criteria used in the diagnosis of TB. In adults, it is used to diagnose latent infection in immunosuppressed patients, specifically human immunodeficiency virus (HIV) positive individuals and silicotic gold miners who would benefit from

Population and Test Outcomes
HCWs were drawn from seven healthcare facilities providing TB diagnostic and treatment services. Five facilities were located in the Cape Town suburb of Khayelitsha, a very high TB incidence area with a TB case notification rate of over 1,600/100,000, 70% of whom are co-infected with HIV [17]. The study was approved by the Human Research Ethics Committee of the Faculty of Health Sciences at the University of Cape Town, South Africa (Reference Number: 417/2008; date of approval: 3 November 2008).
From an eligible study population of 764 HCWs, 505 were recruited to the study. All participants underwent administration of TST and venesection for QFT-GIT and T-SPOT.TB. TST was performed using 1 tuberculin unit (TU) dose of purified protein derivative (PPD) RT23 (Statens Serum Institut, Copenhagen, Denmark). Skin induration was read after 48-72 h using a ruler and ballpoint. An induration of at least 10 mm was considered positive. In the case of known HIV infection 5 mm or more was considered positive.
Blood samples for the IGRA assays were drawn concurrently or within three days of administering the TST to eliminate potential boosting, i.e., the generation of a false positive response to the recently administered tuberculin [18,19]. Administration and interpretation of IGRA test results were done in accordance with manufacturer's instructions [9].
In view of the high TB incidence in this province, participants were also screened for current TB disease by way of a symptom screening questionnaire and chest radiograph. Those with a suspect radiograph or positive symptom screen were then referred for sputum microscopy and culture to confirm TB diagnosis. Statistical analyses were performed using Stata version 11 (Stata Corp, College Station, TX, USA). Outcomes included agreement between the tests using the kappa statistic (κ).
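The kappa statistic compares the observed agreement between two tests with the agreement expected by chance alone. A minimal sketch of the calculation for two binary tests is given below. The function name and the 2x2 counts are hypothetical placeholders rather than study data, and the study's own analysis was run in Stata.

```python
def cohens_kappa(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa for two binary tests from the four cells of a 2x2 table."""
    n = both_pos + a_only + b_only + both_neg
    # Proportion of participants on whom the two tests agree.
    observed = (both_pos + both_neg) / n
    # Marginal positivity of each test.
    a_pos = (both_pos + a_only) / n
    b_pos = (both_pos + b_only) / n
    # Agreement expected if the two tests were independent.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (observed - expected) / (1 - expected)


# Hypothetical counts, for illustration only (not the study's Table 3).
print(round(cohens_kappa(both_pos=60, a_only=25, b_only=5, both_neg=10), 2))
```

Kappa equals 1 under perfect agreement and 0 when agreement is no better than chance, so low values such as those reported for TST versus the IGRAs indicate that the tests often classify the same person differently.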

Latent Class Analysis
LCA is based on the premise that the results of various imperfect tests for a condition are influenced by a common underlying latent variable, which represents the true status [12,19]. We also used the model to calculate the positive and negative predictive values of each individual test and of a combination of two tests, specifically a composite rule under which an HCW who is positive on either TST or QFT-GIT is classified as LTBI positive, and one who is negative on both tests is classified as LTBI negative.
In applying LCA, subjects with both determinate and indeterminate results were included, as participants with an indeterminate result in one test may still contribute to the analysis through determinate results in the remaining two tests.
Similarities in technological properties and immunological mechanism underlying the two IGRA assays could potentially result in correlation of the errors in both tests, referred to as conditional dependence. Therefore, a latent class model that allowed for conditional dependence between QFT and TSPOT.TB both among LTBI positive and LTBI negative subjects was considered [20,21]. A fixed effects model allowing for conditional dependence between QFT-GIT and TSPOT.TB was used to fit the data [22].
When the number of diagnostic tests used in the study sample does not provide at least as many degrees of freedom as the number of unknown parameters to be estimated, the model is not identifiable. This was the case for our analysis as the number of degrees of freedom available was seven, but the number of unknown parameters was nine [LTBI prevalence, sensitivity and specificity of the three individual tests, and two covariance terms between the two IGRA tests (among LTBI positive and LTBI negative individuals)]. To obtain a meaningful solution to this analytic problem it is necessary to employ a Bayesian approach for inference and to provide prior information (i.e., information external to the observed data) on some of the unknown parameters [23].
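The counting argument above can be checked with a few lines of arithmetic. The sketch below assumes a two-class model of binary tests; the function name is hypothetical. Note that meeting the counting condition is necessary but not on its own sufficient for identifiability.

```python
def count_check(n_tests, n_covariance_terms):
    """Compare available degrees of freedom with unknown parameters
    for a two-class latent class model of binary tests."""
    # 2**n_tests result patterns whose probabilities sum to 1.
    degrees_of_freedom = 2 ** n_tests - 1
    # Prevalence, plus sensitivity and specificity per test,
    # plus any conditional-dependence (covariance) terms.
    n_parameters = 1 + 2 * n_tests + n_covariance_terms
    return {
        "degrees_of_freedom": degrees_of_freedom,
        "n_parameters": n_parameters,
        "meets_counting_condition": degrees_of_freedom >= n_parameters,
    }


# Three tests (TST, QFT-GIT, T-SPOT.TB) and two covariance terms between the IGRAs.
print(count_check(n_tests=3, n_covariance_terms=2))
```

With three tests and two covariance terms this gives seven degrees of freedom against nine parameters, which is why external prior information is needed.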
In this instance, prior information is available on the sensitivity and specificity of the tests from the literature (Table 1) [24]. The latent class analysis updates this prior information using the newly observed data to provide posterior distributions for each unknown test characteristic. Results of the Bayesian analysis are reported as the median and 95% credible intervals (the Bayesian equivalent of a confidence interval) for each test performance characteristic. Another advantage of using the Bayesian approach is that one can use the posterior distributions to calculate the probability that the performance characteristic of one test is greater than that of another test. Prior information on the accuracy of the three tests was elicited from Pai, Zwerling and Menzies using ranges of sensitivity and specificity of TST and QFT-GIT based on systematic reviews and meta-analyses of studies carried out in high TB burden settings and BCG-vaccinated populations (Table 1) [24]. Each of these ranges was expressed as a Beta probability distribution by matching the end points of the range to the 2.5% and 97.5% quantiles of the distribution. A non-informative Beta (1,1) prior distribution was used for the prevalence of LTBI, which is unknown in this study. In other words, this prior information places equal weight on all values between 0% and 100%. The prior 95% credible interval (CrI) of each parameter was set to be equivalent to the confidence interval resulting from the meta-analysis.
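One way to turn a reported range into a Beta prior by quantile matching is sketched below. This is an illustration under stated assumptions, not the authors' procedure: the helper names are hypothetical, the 0.70 to 0.90 range is a made-up example (the Table 1 values are not reproduced here), and the Beta CDF is computed with a standard continued-fraction expansion so that only the Python standard library is needed.

```python
import math


def _beta_cf(a, b, x, max_iter=500, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even and odd steps of the recurrence for this m.
        for aa in (
            m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """CDF of a Beta(a, b) distribution at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def _bisect(f, lo, hi, iterations=60):
    """Root of f on [lo, hi]; f(lo) and f(hi) are assumed to differ in sign."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def beta_from_interval(lower, upper):
    """Beta(a, b) whose 2.5% and 97.5% quantiles match (lower, upper)."""
    def mean_given_size(size):
        # For fixed a + b, pick the mean that puts 2.5% of the mass below `lower`.
        return _bisect(lambda m: beta_cdf(lower, m * size, (1 - m) * size) - 0.025,
                       1e-6, 1 - 1e-6)

    def upper_gap(log_size):
        size = math.exp(log_size)
        m = mean_given_size(size)
        return beta_cdf(upper, m * size, (1 - m) * size) - 0.975

    # Search over a + b on a log scale (the bounds are an assumption of this sketch).
    size = math.exp(_bisect(upper_gap, math.log(2.0), math.log(1e4)))
    m = mean_given_size(size)
    return m * size, (1 - m) * size


# Hypothetical range for illustration; not taken from Table 1.
a, b = beta_from_interval(0.70, 0.90)
print(a, b, beta_cdf(0.70, a, b), beta_cdf(0.90, a, b))
```

The nested bisection first fixes the lower quantile for a given concentration a + b and then adjusts the concentration until the upper quantile also matches; a general-purpose two-dimensional root finder would serve equally well.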
Non-informative prior distributions were used for the covariance terms over their possible ranges. In an attempt to use as little prior information as possible, we first fitted a model that incorporated prior information on specificities only. We then fitted a model that incorporated prior information on both specificities and sensitivities for comparison.
WinBUGS software was used to analyze the data [25]. Twenty thousand samples were drawn from the posterior distribution after discarding a burn-in of 1000 iterations. Convergence of the Markov chain Monte Carlo sampler was assessed using the Gelman-Rubin statistic.
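For readers unfamiliar with the Gelman-Rubin statistic, the sketch below computes its textbook form (the potential scale reduction factor) from several chains of equal length. The function name and the toy chains are hypothetical, and software such as WinBUGS may implement a modified version, so this is an illustration of the idea rather than a description of what was run.

```python
import math
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one parameter."""
    n = len(chains[0])
    # Average within-chain variance.
    within = mean([variance(chain) for chain in chains])
    # Between-chain variance, scaled by chain length.
    between = n * variance([mean(chain) for chain in chains])
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Toy chains for illustration only: the second pair has not mixed.
mixed = [[0, 1, 0, 1], [0, 1, 0, 1]]
unmixed = [[0, 1, 0, 1], [10, 11, 10, 11]]
print(gelman_rubin(mixed), gelman_rubin(unmixed))
```

Values close to 1 suggest that the chains are sampling the same distribution, while values well above 1 indicate that longer runs or a longer burn-in are needed.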
As a secondary objective, the prevalence of active TB in this population was also evaluated. Screening for current active TB demonstrated a high prevalence of chest radiograph abnormality (23%) (X-ray compatible with active or inactive TB) and of a positive TB symptom screen (26%) (defined as yes to the presence of any TB symptoms). Twenty-two participants (11%) were HIV positive (on testing or as reported).
Two participants indicated that they were currently receiving TB treatment, one of whom was HIV positive. Of the 103 participants referred for sputum investigation (those who were either symptom screen positive or had a chest radiograph in keeping with active TB), a further five tested positive for active disease. This translates into a prevalence of 5/503, or approximately 1/100 (1%), for active TB in this population. Of the five new cases detected, none tested positive on sputum microscopy and all five were culture positive. Three tested positive on symptom screen, three on chest radiograph, and three were HIV positive. One case did not test positive on either chest radiograph or symptom screen but was referred for sputum at her own request as she had felt unwell and had a colleague recently diagnosed with TB as a result of the study. This represented a deviation from the study protocol. All cases were TST positive, three were QFT and four were T-SPOT.TB positive.
Agreement analysis revealed poor agreement for test positivity when comparing TST to QFT-GIT (kappa = 0.28) and to TSPOT.TB (kappa = 0.25) (Table 3).
Using prior information on specificities only, LTBI prevalence was estimated at 81% (95% CrI 71-88%), with the sensitivity highest for TST (Table 4). The probability that the sensitivity of TST was higher than that of QFT was 0.99, and higher than that of TSPOT.TB, 1.0. Specificity, on the other hand, was significantly lower for TST than for the IGRAs (57% vs. 95-96%). A model using prior information on the sensitivities as well as specificities of all three tests estimated the prevalence of LTBI to be 76% (95% CrI 70-83%). TST sensitivity was again higher than that of IGRAs (Table 5). In this model the probability that the sensitivity of TST was higher than that of QFT was 0.95 and of TSPOT.TB 0.88.
In the model that incorporated prior information on specificity only, the positive predictive value (i.e., the proportion of positive tests correctly identifying the presence of LTBI) of all tests was high: QFT-GIT 98%, TSPOT.TB 99% and TST 90%. However, the negative predictive value (i.e., the proportion of negative tests correctly identifying the absence of LTBI) of all the tests was relatively low (range 47-65%) (Table 4).
On a composite rule classifying a participant as LTBI positive if either TST or QFT-GIT was positive, the positive predictive value remained high at 90% (95% CrI 81-95%) when compared to individual tests. By contrast, the negative predictive value (i.e., of both tests negative) was substantially greater than those of the individual tests, at 90% (95% CrI 80-97%).
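The relationship between these predictive values and the estimated prevalence, sensitivity and specificity can be illustrated with Bayes' theorem. The sketch below plugs in the reported point estimates for TST and QFT-GIT from the specificity-prior model and, as an assumption of this sketch, treats the two tests as conditionally independent given LTBI status. The function names are hypothetical, and the study's own figures are posterior medians from the full model, so agreement is only approximate.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    true_neg = (1 - prevalence) * specificity
    false_neg = prevalence * (1 - sensitivity)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


def either_positive_rule(sens_a, spec_a, sens_b, spec_b):
    """Sensitivity and specificity of 'positive if either test is positive',
    assuming the two tests are conditionally independent given true status."""
    sensitivity = 1 - (1 - sens_a) * (1 - sens_b)  # missed only if both miss
    specificity = spec_a * spec_b                  # negative only if both negative
    return sensitivity, specificity


prevalence = 0.81                 # estimated LTBI prevalence
tst = (0.93, 0.57)                # TST sensitivity, specificity
qft = (0.80, 0.96)                # QFT-GIT sensitivity, specificity

tst_ppv, tst_npv = predictive_values(prevalence, *tst)
combo_ppv, combo_npv = predictive_values(prevalence, *either_positive_rule(*tst, *qft))
print(f"TST alone:  PPV {tst_ppv:.2f}, NPV {tst_npv:.2f}")
print(f"Either pos: PPV {combo_ppv:.2f}, NPV {combo_npv:.2f}")
```

Under these assumptions the plug-in calculation gives a PPV of about 0.90 and an NPV of about 0.66 for TST alone, and about 0.90 for both PPV and NPV under the composite rule. The gain in NPV comes from the composite rule's much higher sensitivity: at 81% prevalence, few infected individuals test negative on both tests.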

Discussion
The main findings of this study in a TB/HIV endemic setting are that (i) TST is highly sensitive for LTBI diagnosis and strongly predictive of the presence of LTBI; (ii) IGRAs have superior specificity for LTBI diagnosis, and (iii) a combination of TST and IGRA has high rule-out value for LTBI (strong negative predictive value for the absence of LTBI). Participants had a high prevalence of LTBI, active TB and history of previous TB treatment. These findings reflect the endemic nature of TB in the underlying population and by implication in the occupational setting.
The LTBI prevalence as measured by TST, QFT-GIT and TSPOT.TB was 84%, 65% and 60% respectively for the group. The prevalence of LTBI as measured by TST is markedly higher than the 57% prevalence reported in a recent study of South African HCWs in Johannesburg, conducted among nurses and medical students [7]. It is, however, similar to community based general population studies in the Western Cape, which reported an LTBI prevalence of 81-88% [26-28]. Wood et al. (2010) showed an increasing prevalence of latent TB infection with increasing age in a population of HIV negative individuals aged 5-40 years drawn from high TB prevalence areas, with 88% of adults 31-35 years of age testing positive on TST.
The prevalence of LTBI among HCWs in this study is therefore not indicative of greater TB infection risk than is found in community participants in the Western Cape. A more recent review of LTBI in HCWs in low- and middle-income countries by Apriani et al. reported the prevalence of LTBI in HCWs as ranging from 8-98% (mean 49%) based on TST, although studies were characterized by a high degree of heterogeneity [16]. Whilst sensitization to environmental mycobacteria may play a role in LTBI prevalence as measured by TST, this is not considered a clinically important cause of false-positive TST results, except in populations with a high prevalence of non-tuberculous mycobacteria (NTM) sensitization and a very low prevalence of TB infection [29]. It has also been shown that TST reactions greater than 10 mm are unlikely to be caused by NTM. It is noted that the median size of the TST reaction in this group was 18 mm (interquartile range (IQR) 13-22 mm) [30,31]. Data on sensitization to environmental mycobacteria are not available for South Africa.
The high prevalence of LTBI as reflected in TST positivity is unlikely to be confounded by high rates of active TB in this population, as only 1% of participants were found to have active TB. Furthermore, TST has previously been found to be highly sensitive for LTBI diagnosis and has been shown to have equivalent predictive value for incident active TB as IGRAs [32].
The lower LTBI prevalence estimated by using IGRAs is in keeping with studies among HCWs from low and intermediate incidence TB burden settings which have generally reported lower LTBI prevalence measured by IGRA, primarily using QFT-GIT, than for TST [33]. This has been ascribed to the IGRAs' greater specificity and less confounding by BCG vaccination. Studies using IGRAs among HCWs from high TB incidence settings have produced varied results. Apriani et al. reported the prevalence of positive IGRA as ranging from 9% to 86% (p-value for heterogeneity = 0.01) in HCWs [16].
LTBI estimates using IGRAs in our study approximate those found in a community-based South African study involving healthy adults, which performed a head-to-head comparison between TST and IGRAs [26]. As with TST, LTBI prevalence as measured by IGRAs was similar in HCWs and the community, reaffirming the high background prevalence of TB infection in this population.
In the absence of a gold standard for LTBI, the use of latent class analysis allowed a more direct comparison of TST and IGRA test performance in this population. Both models showed a higher sensitivity for TST than IGRAs for LTBI diagnosis, although the probability was somewhat attenuated in the model that included prior information on both sensitivity and specificity. Furthermore, the sensitivity of TST (93%) in this study is higher than that shown in BCG-vaccinated populations (84%) of immunocompetent adults [13]. Whilst our study included immune-compromised individuals, the impact, if any, would be to decrease TST sensitivity rather than enhance it, as anergy in immunosuppressed individuals would result in false negative responses. As a first line screening test TST thus has a high probability of being more sensitive than either IGRA at detecting LTBI in this population.
The positive predictive values for IGRAs (each 99%) and TST (90%) in this population are slightly higher than those previously shown in a high prevalence (>50%) setting, i.e., QFT-GIT = 88% and TST = 73% [13]. This is most likely influenced by the exceptionally high prevalence of LTBI and near universal BCG vaccination in this population. However, the routine use of IGRAs in this setting is currently not advocated. Given resource constraints and the test performance of TST (higher sensitivity and slightly lower positive predictive value (PPV) than IGRA), TST is likely to remain the first line test for LTBI.
Our analysis relied on prior information that IGRAs had significantly better specificity at ruling out LTBI than TST [24]. This relationship persisted after updating the prior information with the observed data. This is similar to findings from a recent review for BCG vaccinated populations where specificity for TST ranged from 76% to 82% and BCG vaccination was shown to reduce TST specificity by as much as 21% [13]. In our study, an even lower estimate of specificity for TST, 57% (95% CrI 43-71%), was found, suggesting that there may be a greater impact of BCG than elsewhere or an effect of exposure to other mycobacteria. IGRAs thus clearly show superior specificity to TST, making them superior as rule-in tests in this setting. Specificity of IGRAs appears to be unaffected by immune status (immune competent vs. immune compromised) [34].
The definition of optimal test performance in any given setting depends on the exact objective of testing, LTBI prevalence, and the costs, acceptability and sustainability of the proposed regimen. Therefore, there is no general rule for test optimality in practice. In the setting of an LTBI treatment programme based on test conversion offered to HCWs in a very high LTBI prevalence setting, high test accuracy at baseline is a priority to identify those in need of surveillance given the limited capacity to support such programmes. IGRAs have reasonably high sensitivity, high specificity, high PPV but low negative predictive value (NPV) in this setting. IGRA test results have also shown great variability when their use has been assessed in serial screening programmes, as compared to TST. This complicates baseline test interpretation [35][36][37][38]. There is therefore an argument for using a combined test, such as TST and QFT-GIT, at baseline. This takes advantage of TST's high sensitivity and the greatly increased NPV from using a "both TST and IGRA negative" definition to produce a more accurate estimation of likely LTBI diagnosis. This will inform a more targeted approach to surveillance of those truly at risk and may result in cost-savings and less chance of unnecessary treatment for LTBI.
A limitation of the study is that of possible selection bias due to voluntary participation. Clinical staff (nurses and physicians) were underrepresented, probably as a result of high workloads and limited time to participate in the study during work hours, resulting in a greater proportion of participants being support and administrative staff. The LTBI prevalence may, therefore, be more reflective of high background community rates of infection in addition to any occupational effect than would have been the case with a higher proportion of physicians. However, many nurses come from high TB burden communities.
The study setting was one of the highest TB incidence areas in the country and findings may not be generalizable to all HCWs in South Africa or to occupational settings where the community incidence of TB may be lower and occupational exposure less than is the case in this study. Furthermore, the use of cut-points for test positivity was based on the manufacturer's instructions and software provided for assay analysis at the time that the study was conducted (≥6 spots for TSPOT.TB and ≥0.35 IU/mL for QFT-GIT). Current evidence suggests that cut-points may require further revision and that revised values may differ considerably from those currently in use [37,38].
Another limitation of this study is the lack of sufficient sample size to evaluate test performance separately in immune competent and immune compromised individuals. Doan et al., in an extensive review evaluating LTBI test performance utilizing latent class analysis and Bayesian modelling, have shown that both TST and QFT-GIT have suboptimal sensitivity in immune compromised individuals [13]. Given the high HIV prevalence in South African HCWs (at least 10%) and the fact that such HIV positive HCWs face a six-fold higher risk of contracting TB than uninfected individuals, test performance among HIV positive health workers needs further investigation [39]. Recent work suggests that T-SPOT may be less susceptible to false negative status among anergic health workers with LTBI, but this needs to be confirmed [40].

Conclusions
In conclusion, our findings indicate that in populations with high TB incidence, high LTBI prevalence and near universal BCG vaccination, TST has a high probability of being more sensitive than IGRAs at detecting LTBI but lacks specificity. The high NPV of a combined negative TST and IGRA result may suggest a role for such combined testing to limit the unnecessary administration of LTBI treatment in resource limited settings where TB is endemic. The many factors other than test performance which determine the effectiveness of a programme of LTBI testing in HCWs in a high TB incidence setting need further research. These factors include, inter alia, prevalence of LTBI, conversion rates, risk of re-infection, objectives of the screening programme, opportunity costs of testing and treatment, acceptability of testing and LTBI treatment to HCWs, and the availability of occupational health service resources [41,42]. Cost-benefit analysis is needed to define an optimal testing strategy under different assumptions.