Review of Methods Used for Diagnosing Tuberculosis in Captive and Free-Ranging Non-Bovid Species (2012–2020)

The Mycobacterium tuberculosis complex (MTBC) is a group of bacteria that cause tuberculosis (TB) in diverse hosts, including captive and free-ranging wildlife species. There is significant research interest in developing immunodiagnostic tests for TB that are both rapid and reliable, to underpin disease surveillance and control. The aim of this study was to carry out an updated review of diagnostics for TB in non-bovid species with a focus predominantly on those based on measurement of immunity. A search was carried out to identify relevant papers meeting a pre-defined set of inclusion criteria. Forty-one papers were identified from this search, from which only twenty papers contained data to measure and compare diagnostic performance using diagnostic odds ratio. The diagnostic tests from each study were ranked based on sensitivity, specificity, and diagnostic odds ratio to define high performing tests. High sensitivity and specificity values across a range of species were reported for a new antigenic target, P22 complex, demonstrating it to be a reliable and accurate antigenic target. Since the last review of this kind was undertaken, the immunodiagnosis of TB in meerkats and African wild dogs was reported for the first time. Suid species showed the most consistent immunological responses and highlight a potential dichotomy between humoral and cellular immune responses.


Introduction
The Mycobacterium tuberculosis complex (MTBC) is a group of genetically similar bacteria that cause the disease tuberculosis (TB) in a range of hosts [1]. The MTBC comprises the major pathogenic mycobacteria species M. tuberculosis, M. bovis, M. africanum, M. canettii, M. microti, M. caprae, M. pinnipedii, M. mungi, M. suricattae, and M. orygis [2]. Cattle are considered the primary host of M. bovis; however, infection is not limited to livestock but also affects humans and many other free-ranging and captive wildlife species [3]. Notably, the European badger (Meles meles) in the United Kingdom, Brushtail possum (Trichosurus vulpecula) in New Zealand, and White-tailed deer (Odocoileus virginianus) in the United States are all species implicated in transmission of M. bovis to livestock [3]. As reviewed by Miller and Olea-Popelka, different control strategies for TB are implemented in different countries based on the level of disease transmission and prevalence within that country, and considering which species are infected or at risk of infection [4]. Common control strategies include surveillance, culling of reservoirs and infected animals, increased biosecurity and vaccination underpinned by diagnostic testing [5].
Zoonotic transmission of TB may be more likely in zoos due to the close contact of staff with animals, as well as the potential for transmission of infection from human to animal, although this is very rare [6]. In addition, zoos may find it problematic to maintain biodiversity and conserve valuable or endangered species and to exchange genetic resources and animals with one another where TB outbreaks occur, as reviewed by Lécu and Ball [7]. Accurate diagnosis of TB in captive wildlife is therefore important but challenging, given the diversity of species susceptible to the MTBC.
Conventional diagnostic tests for TB are considered the gold-standard and comprise bacterial culture, histopathology, and post-mortem examination [8]. Often, these conventional tests are used in combination with one another or as confirmatory tests for newer immunological diagnostics, as discussed by Salfinger and Pfyffer [9]. Culture and post-mortem examinations are relatively expensive tests that require laboratory facilities for isolation and identification of mycobacteria. Culture is the primary gold-standard test for TB. However, as mycobacteria are slow growing, culture can be protracted and is liable to cross-contamination with environmental bacteria [10]. Culture also varies in sensitivity depending upon the type of sample used [11].
Immunological diagnostics based on the humoral immune response rely on the detection of antibodies specific to MTBC antigens. Whilst easy to perform, they may be a poor indicator of TB infection because antibody titers tend to increase as disease progresses [12]. Hence, the humoral response is less reliable for the detection of asymptomatic cases or cases early on in infection but can be used to monitor the progress of infection, as reviewed in Pollock et al., 2001 [13]. Efforts to increase the sensitivity of antibody-based tests have included the use of multiple antigens, most recently, the P22 complex, made up of 118 different antigenic targets, including MPB83, MPB70 and ESAT-6 [14].
In contrast, the cell-mediated immune (CMI) response is characterized by the production of cytokines, such as IFN-γ, released by stimulated lymphocytes. As discussed in the Pollock et al. review, relative to antibody production, the CMI response generally occurs earlier after infection and is considered to play a major role in controlling TB [13]. The intradermal delayed-type hypersensitivity tuberculin skin test (TST) involves intradermal injection of tuberculin, a purified protein derivative (PPD) comprising a complex mix of antigens derived from M. bovis, and measurement of swelling at the injection site, usually 72 h later [15]. The TST is generally unreliable in most non-bovine species, such as European badgers, producing a weak response, which can be altered by the stress of capture [15,16]. In addition, the TST is often considered impractical for free-ranging wildlife because of the need to capture and retain the animal to read the test, as discussed and reviewed by De Lisle et al. [15].
Indicators often used to measure diagnostic test performance include, but are not limited to, sensitivity and specificity, predictive values, likelihood ratios, and receiver operating characteristic (ROC) curve analysis. Another, less commonly used, measure is the diagnostic odds ratio (DOR). The DOR is a single indicator of test performance, being the ratio of the odds of a positive result in a diseased individual relative to the odds of a positive result in a non-diseased individual [17]. The DOR can range from 0 to infinity; a value of 1 indicates that the test cannot discriminate between individuals with and without disease. The higher the DOR, the better a test discriminates infected from non-infected individuals [17].
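As an illustration of the definition above, the DOR can be written in terms of sensitivity and specificity: the odds of a positive result in a diseased individual are sensitivity/(1 − sensitivity), and the odds of a positive result in a non-diseased individual are (1 − specificity)/specificity. The following is a minimal sketch only; the function name and the example values are assumptions for illustration and are not taken from any of the reviewed studies.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Return the DOR for a test, given sensitivity and specificity as proportions.

    DOR = [sens / (1 - sens)] / [(1 - spec) / spec]
    Values of exactly 0 or 1 are rejected because the odds are then undefined.
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    odds_positive_diseased = sensitivity / (1.0 - sensitivity)
    odds_positive_non_diseased = (1.0 - specificity) / specificity
    return odds_positive_diseased / odds_positive_non_diseased


# Illustrative (hypothetical) values only.
for sens, spec in [(0.5, 0.5), (0.9, 0.9), (0.99, 0.2)]:
    print(sens, spec, round(diagnostic_odds_ratio(sens, spec), 2))
```

A test with 50% sensitivity and 50% specificity gives a DOR of 1, i.e., no discrimination, whereas a test with very high sensitivity but poor specificity gives only a modest DOR.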
The aim of the current project was to perform a review of diagnostics used for the detection of TB in free-ranging and captive wildlife species with a focus predominantly on those based on measurement of immunity, updating previous reviews published in 2009 and 2013 [18,19]. Studies identifying new or modified immunological targets and techniques in either known or novel wildlife reservoirs were identified and explored to evaluate the performance of the diagnostic techniques and approaches used. The DOR was used to determine the performance of diagnostic tests for TB, not having previously been used for this purpose in animal studies.

Summary of Reported Techniques and Species since 2012
A total of 41 papers were identified and considered as relevant. Table 1 shows the test employed, test target, MTBC species, the nature of infection (natural or experimental), and the species of animal being observed for these 41 papers.
Using the data from Table 1, the most frequently used diagnostic tests, target antigens, and studied species were identified. First, it was noted that the most common species studied were wild boar (Sus scrofa), deer, and badgers (Meles meles), with each species appearing in seven individual studies (16.7% each of the total), closely followed by warthogs (Phacochoerus africanus), which appeared in six studies (14.3%) (Table 1). Animals of these species included both wild and captive individuals, with the deer being a mix of red deer (Cervus elaphus) and white-tailed deer (Odocoileus virginianus). Species which appeared in fewer studies included elephants (Elephas maximus and Loxodonta africana), lions (Panthera leo) and rhinoceros (Ceratotherium simum). Meerkats (Suricata suricatta) and African wild dogs (Lycaon pictus) were reported for the first time in this context, each being the focus of one study (Table 1).
The most common techniques used within the 41 studies were ELISAs and Lateral Flow devices (LFD), consisting of INgezim TB-CROM (Eurofins Technologies Ingenasa, Madrid, Spain), STAT-PAK (Chembio Diagnostic Systems, Inc., Hauppauge, NY, USA), and Dual Path Platform (DPP) VetTB assay (Chembio Diagnostic Systems, Inc.). Other tests included tests of CMI, such as the Interferon-Gamma Release Assay (IGRA) and TST (both the Comparative Intradermal Tuberculin Test (CITT) and the Single Intradermal Tuberculin Test (SITT)) (Table 1). However, it was evident that serological tests were more frequently selected approaches than those based on CMI.
In parallel with the most common techniques used, the most recurrent antigenic targets were revealed. Most tests used the same or similar antigens, or a mixture of recombinant proteins (Table 1). For instance, bPPD, MPB70, MPB83, ESAT-6, and CFP10 were commonly used as either individual targets or mixed as a cocktail of either ESAT-6 and CFP10 or MPB70 and MPB83. The P22 complex protein [14] was used in a range of lateral flow assays or a 'P22 ELISA'.

Statistical Analysis
From the 41 papers, 20 contained data with which to carry out statistical analysis, calculating, if not stated, sensitivity, specificity, negative and positive predictive values (NPV, PPV), DOR, and where suitable the corresponding 95% confidence intervals (95% CI). The false negative and false positive (FN, FP), true negative and true positive (TN, TP) rates were calculated and used in a statistical test to create another set of data (Table 2) that could be used to compare diagnostic performance. The DOR was used as a measure of diagnostic performance, being the ratio of the odds of a positive result in a diseased individual relative to the odds of a positive result in a non-diseased individual [17]. The remaining 21 papers were evaluated, but no statistical analysis was conducted because the appropriate information was missing from the paper, such as true infection status.
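The indicators listed above can all be derived from the four cells of a 2 × 2 contingency table (TP, FP, FN, TN). The sketch below uses the standard definitions; the class name, function names, and example counts are hypothetical and are not data from any reviewed study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Cells of a 2 x 2 table of test result against true infection status."""
    tp: int  # infected, test positive
    fp: int  # not infected, test positive
    fn: int  # infected, test negative
    tn: int  # not infected, test negative


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a margin of the table is empty.
    return numerator / denominator if denominator else float("nan")


def sensitivity(c: Counts) -> float:
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: Counts) -> float:
    return _ratio(c.tn, c.tn + c.fp)


def ppv(c: Counts) -> float:
    return _ratio(c.tp, c.tp + c.fp)


def npv(c: Counts) -> float:
    return _ratio(c.tn, c.tn + c.fn)


# Hypothetical counts for illustration only.
example = Counts(tp=45, fp=10, fn=5, tn=90)
for metric in (sensitivity, specificity, ppv, npv):
    print(metric.__name__, round(100 * metric(example), 1))
```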

Analyzing Diagnostic Performance
Diagnostic performance was compared across all tests from the twenty studies in Table 2. Data were ordered to find the ten best performing and ten lowest performing tests based on sensitivity, specificity, and DOR, individually (Tables 3 and 4). The 95% CI overlapped for nearly all diagnostic tests as many DOR calculations had a large 95% CI; therefore, the tests were simply ranked based on the calculated DOR. Fifty-three percent of top-ranking tests were carried out in wild boar, with 8/10 (80%) of the top-ranking tests by DOR used in suid species. The INgezim TB Porcine test used for wild boar had the highest sensitivity (100%), specificity (100%), and DOR (3111) of any test (Table 3). This was followed by the DPP VetTB assay and an in-house ELISA [49] based on antibody recognition of the P22 protein complex, also tested in suid species. The most frequently used antigenic targets in the top-ranking tests were MPB83, MPB70, P22 complex, ESAT-6, and CFP10. In contrast, the lower ranking tests consisted of more CMI diagnostics, such as the TST. Based on the DOR, the worst performing tests were an LFD using an IgG cocktail of commercial anti-mouse IgG and anti-rabbit IgG [42] (DOR, 1.1) and both CITT and SITT. Antigenic targets among the more poorly performing tests included bPPD and MPB83, although MPB83 featured infrequently in comparison to bPPD. Additionally, the lowest ranking tests were carried out on deer, badgers, and lions, with a few studies on warthogs and wild boar. Notably, the TB ELISA-VK using bPPD as a target [33] was ranked within the top 10 for sensitivity but within the lowest 10 for specificity and did not appear in either ranking according to DOR.
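The ranking procedure described above (ordering the tests separately by sensitivity, specificity, and DOR, then taking the highest and lowest ten) could be sketched as follows. The record type, function name, and all values are hypothetical placeholders and do not reproduce Tables 2 to 4; a list size of two is used here only to keep the example short.

```python
from typing import List, NamedTuple, Tuple


class AssayResult(NamedTuple):
    name: str
    sensitivity: float  # percent
    specificity: float  # percent
    dor: float


def top_and_bottom(
    results: List[AssayResult], metric: str, n: int = 10
) -> Tuple[List[AssayResult], List[AssayResult]]:
    """Return the n highest and n lowest results for one metric.

    Both lists are in descending order, so the lowest list ends with the worst test.
    """
    ordered = sorted(results, key=lambda r: getattr(r, metric), reverse=True)
    return ordered[:n], ordered[-n:]


# Hypothetical placeholder values, not data from the reviewed studies.
results = [
    AssayResult("assay_A", 95.0, 98.0, 931.0),
    AssayResult("assay_B", 99.0, 40.0, 66.0),
    AssayResult("assay_C", 60.0, 99.0, 148.5),
    AssayResult("assay_D", 55.0, 60.0, 1.8),
]

for metric in ("sensitivity", "specificity", "dor"):
    best, worst = top_and_bottom(results, metric, n=2)
    print(metric, [r.name for r in best], [r.name for r in worst])
```

In this toy data set, assay_B ranks first for sensitivity and among the lowest for specificity, yet appears in neither the top nor the bottom ranking by DOR, the same pattern noted above for the TB ELISA-VK [33].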

Importance of Gold-Standard Testing and Knowledge of Infection Status
Through critical analysis of the papers examined, it was apparent that the estimated test performance was dependent on whether the diagnostic samples were derived from naturally or experimentally infected animals, and on the definition of infection status (i.e., the gold-standard applied in the study). An example of the importance of the former was the study by Fresco-Taboada et al. [49], in which they tested a series of techniques separately using experimental samples and field samples. As shown in Tables 2 and 3, tests on the field samples showed greater accuracy in comparison to the experimental samples, ranking higher in the data analysis. With respect to the gold-standard of infection, the study by King et al. [27], which was excluded from analysis due to not using culture as the gold-standard to define true infection status, is illustrative. In that study, the diagnostic performance of three tests (IGRA, STAT-PAK, and qPCR) was assessed. The study did not use a true gold-standard test but instead trialled the STAT-PAK and IGRA interchangeably, both combined as a single gold standard and as individual gold standards. When calculating DOR, because there was no measure to identify TN, FN, TP, and FP, a DOR of 1.0 was generated for each test, no matter which 'gold standard' method was used, rendering it impossible to determine the true diagnostic value of any of the tests.

Discussion
This study was intended as a review of the tests available for diagnosing TB in non-bovid species, focusing on immunological tests and highlighting any advances from previous reviews undertaken in 2009 and 2012 [18,19]. Common indicators of diagnostic performance include sensitivity, specificity, PPV, and NPV; however, these factors are insufficient to demonstrate diagnostic performance alone [17]. Sensitivity and specificity indicators are based on a proportion of results showing positive or negative results among diseased or healthy individuals and do not consider cut off values [17]. NPV and PPV are generally not good indicators of diagnostic performance per se as they are dependent on the prevalence of infection and therefore assess diagnostic performance in a context-dependent situation [17,62]. For this study, DOR was chosen as the primary method of evaluating diagnostic performance because it serves as a single measure of test performance independent of disease prevalence [17], making comparisons across studies more straightforward. This is the first study to use DOR to assess diagnostic test performance in animals, although it has been used to assess the performance of TB tests in humans, e.g., [63].
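The prevalence dependence of PPV and NPV, in contrast to the DOR, can be illustrated numerically. The sketch below holds sensitivity and specificity fixed and varies prevalence, using the standard Bayes' theorem expressions for the predictive values; the function names and all numbers are illustrative assumptions, not values from the reviewed studies.

```python
def ppv(sens: float, spec: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sens: float, spec: float, prevalence: float) -> float:
    """Negative predictive value from sensitivity, specificity, and prevalence."""
    true_neg = spec * (1.0 - prevalence)
    false_neg = (1.0 - sens) * prevalence
    return true_neg / (true_neg + false_neg)


def dor(sens: float, spec: float) -> float:
    """Diagnostic odds ratio; prevalence does not enter the calculation."""
    return (sens / (1.0 - sens)) / ((1.0 - spec) / spec)


sens, spec = 0.9, 0.9  # illustrative values
for prevalence in (0.01, 0.10, 0.50):
    print(
        f"prevalence={prevalence:.2f}",
        f"PPV={ppv(sens, spec, prevalence):.3f}",
        f"NPV={npv(sens, spec, prevalence):.3f}",
        f"DOR={dor(sens, spec):.1f}",
    )
```

With these values the PPV changes substantially as prevalence changes, while the DOR is identical in every row because it depends only on sensitivity and specificity.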
Wild boar, badger, and deer were the most common species used in studies, with 27.5% of studies carried out in suid species (pigs, wild boar, and warthogs). Wild boar, badger and white-tailed deer are all significant maintenance hosts of TB in different countries [3]. Wild boar numbers have increased markedly across Europe [64]. Throughout Europe, wild boar are showing higher levels of transmission of TB, without the requirement of livestock to maintain infection in the ecosystem, as reviewed in Gortázar et al., 2012 [65]. This has an impact on the population of wild boar itself but also increases the chance of transmitting the disease to other wildlife [66]. The increased awareness of wild boar as an important vector of animal TB is reflected in the increase in the number of papers reporting the use of immunodiagnostics for suids: 14 papers in this report in comparison to only three papers covering a similar span of time in the last review [19]. The performance of diagnostic tests was reported in two new species since the previous reviews, meerkat and African wild dog, each being the focus of one study. TB in meerkats is similar to that in other mammalian species [67], and their study has shed light on the behaviors and social interactions that may affect transmission of TB within social mammal species [68]. African wild dogs are classed as a threatened species that are currently under high pressure of infection, which may impact their long-term survival and conservation [50]. A study of 21 packs of wild dogs in Kruger National Park, where TB is endemic in African buffaloes, found using an IGRA that 20/21 of the packs studied had been sensitized to M. bovis, showing an 83% prevalence of infection [50]. Despite these results, the species is currently considered stable, but the findings highlight the potential threat that could arise with changes in biological and environmental factors such as habitat availability and reproductive rates [50].
Antigenic targets identified frequently in this study were ESAT-6, CFP10, MPB83 and MPB70. Recombinant proteins like CFP10/ESAT-6 have demonstrated high sensitivity and specificity for TB detection in people in comparison to conventional CMI diagnostics like the TST [69]. CFP10 and ESAT-6 may also be the target of strong antibody-positive responses when included in serology tests for both elephants and wild boar [52,54] but show poor diagnostic performance in badgers, with no significant increase in antibody response associated with disease progression [70]. Therefore, the diagnostic performance of CFP10 and ESAT-6 antigens cannot be generalized across species, as is the case with many antigenic targets, but does demonstrate potential for accurate detection of TB in certain species. Individually, MPB83 induces high antibody responses across a range of species including cattle, badger, deer, wild boar, and primates [49,71–73]. P22 was described in 2017 [14] and therefore was not reviewed previously. P22 complex is a mix of 118 different proteins, some of the most abundant being MPB70, MPB83, and ESAT-6 [14]. P22 complex was reported to have reduced cross-reactivity with Mycobacterium avium and greater sensitivity than other antigenic targets, such as bPPD [14], in different species, including llamas, cattle, goats, pigs, and sheep [74,75]. In our review, although MPB83 and P22 appeared most frequently as antigenic targets in the top-ranking tests according to sensitivity, specificity, or DOR, they did not appear any more frequently than would be expected by chance, their appearance among the best performing tests more likely indicating how commonly these antigens are used. Nonetheless, both antigens gave good performance in a variety of test platforms against a range of non-bovid species. P22 as an antigenic target gave sensitivity and specificity values of 70.1–96.7% and 75.0–100.0%, respectively, across studies in wild boar, pig, deer, and badgers.
Interestingly, the inclusion of multiple antigens usually increases the likelihood of FP occurring, but this was not seen with P22, despite it being a complex of 118 different antigens. When a P22-based ELISA was compared to the diagnostic performance of MPB83 as a target, it produced similar diagnostic results; however, when used in parallel, sensitivity was increased [76]; some infected animals were only detectable using MPB83 antigen, whilst others were only detectable using the P22 complex [76]. This was surprising since MPB83 is an abundant component of P22. Consequently, when used in parallel, a greater range of animal species were detected. More research is required using field samples to compare and validate the potential of P22 across a wider array of species to confirm the findings above.
Serological diagnostics were more common in the present study than CMI tests, with more serological tests appearing in the top ten. Generally, CMI tests are considered to give high sensitivity; however, this was not seen in this review as CMI tests did not appear among the tests with the highest DOR values. In general, the CMI tests were not carried out in suid species but instead in lions and deer, and this could explain their lower apparent performance, particularly as the high performing tests were carried out in suid species. Suid species are noted to have a detectable humoral response soon after M. bovis exposure which is maintained with disease progression, allowing for rapid detection [54,77]. Moreover, as reviewed by Berger, in most species, the humoral antibody response is dependent upon the cell-mediated response initiating a T helper cell response to activate macrophages and other essential cytokines for antibody activation [78]. However, it has been suggested that suid species have a dichotomy between the humoral and CMI response, meaning that a strong humoral response can occur independently of a cell-mediated response [79–81].
High accuracy in a test did not always correlate with high diagnostic performance based on DOR. For example, TB ELISA-VK [37], t-bPPD [37], and bPPD2 [37] were all ranked among the top ten performing tests according to DOR but did not appear in the top ten for either sensitivity or specificity. Conversely, the INgezim TB-CROM [49], Indirect PPD ELISA [33], and TB ELISA-VK [33] appeared in either or both top ten for specificity and sensitivity but not DOR. We reason that DOR is a better metric for assessing the performance of a diagnostic test because sensitivity and specificity (as pooled or individual indicators) do not represent discriminatory performance: a high sensitivity can be accompanied by a low specificity, as shown particularly for the TB ELISA-VK [33]. In contrast, DOR is a combination of both sensitivity and specificity, increasing when they become near perfect.

Literature Search and Exclusion Criteria
Using NCBI PubMed, we identified appropriate papers written in English from 2012, when the last review [19] was carried out. A total of 162 papers were found using the search criteria: ((((wild*) AND (mycobacteri*)) AND (diagnos*)) AND (("2012/09/01"[Date-Publication]: "3000"[Date-Publication]))) AND (immun*). For each of the 162 papers, the abstracts were reviewed, looking for details of the use of (immuno)diagnostics for MTBC infection in non-bovid species. Papers were excluded if they were exclusively based on bovid species, mycobacteria that do not cause TB infection such as Mycobacterium avium subspecies paratuberculosis or used exclusively non-immunological based diagnostic tests, with the exception of Stewart et al. [42] as it reported a novel immunochromatographic lateral flow assay specific for Mycobacterium bovis cells. Additionally, previous review articles were excluded from data collection and statistical analysis but were recorded and reviewed for completeness.
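For readers wishing to repeat the search programmatically, the query above could be submitted to the NCBI E-utilities ESearch endpoint. This is only a sketch: the review does not state that E-utilities was used, the function name and the `retmax` default are assumptions, and the query string is copied verbatim from the text, including its field tags. The number of records returned today would be expected to differ from the 162 papers found at the time of the original search.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint for querying Entrez databases.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Search string copied verbatim from the review text.
QUERY = (
    '((((wild*) AND (mycobacteri*)) AND (diagnos*)) AND '
    '(("2012/09/01"[Date-Publication]: "3000"[Date-Publication]))) AND (immun*)'
)


def build_esearch_url(term: str, retmax: int = 200) -> str:
    """Build an ESearch request URL for PubMed that returns PMIDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Printing the URL avoids any network access; fetch it with a browser or
    # an HTTP client of your choice to retrieve the matching PubMed IDs.
    print(build_esearch_url(QUERY))
```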
From this, forty-one papers were recorded as relevant from which data were collected, including the species under study, whether TB was experimentally or naturally induced, the mycobacterium species, the test used and how it was employed, the target of the test (i.e., antigen(s)), sample size and type of sample, relative sensitivity and specificity of the diagnostic technique, the NPV and PPV, and the associated cut-off values. If any of the information was not present or had not been mentioned, this was noted.

Statistical Analysis
Using the sample size and infection status, TP and TN, and FP and FN values were calculated from the 41 papers where possible, if not already stated. Missing values, e.g., for sensitivity, specificity, PPV, and NPV, were calculated where possible from the reported data using the following formulae:
Sensitivity = TP/(TP + FN)
Specificity = TN/(TN + FP)
PPV = TP/(TP + FP)
NPV = TN/(TN + FN)
Following this, the 95% CI surrounding the sensitivity and specificity were noted, if available, or calculated if not. The DOR for each individual test was calculated using the formula:
DOR = (TP/FN)/(FP/TN)
The DOR was then adjusted, by adding 0.5 to each of the cells in the contingency table, to account for the tests that had a value of 0 in any of the TN, TP, FP, or FN cells. The DOR adjustment was applied across all studies to prevent introducing bias to the data. The adjusted DOR was then used to calculate the 95% CI using the formulae below. All calculations were rounded to 1 decimal place. All formulae for the calculations outlined were sourced from [17].
Standard Error, SE(lnDOR) = √(1/TP + 1/FN + 1/FP + 1/TN)
95% CI of lnDOR = lnDOR ± 1.96 × SE(lnDOR)
95% CI of DOR = exp(lnDOR ± 1.96 × SE(lnDOR))
The most common species, techniques, and antigenic targets were noted, and the data used to rank the tests in order of sensitivity, specificity, and DOR. All study data included in statistical analysis involving test sensitivity and specificity were established using culture as a gold-standard to confirm infection status.
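The adjusted DOR and its 95% CI described above can be sketched as follows. The sketch assumes that the 0.5 continuity correction is applied to all four cells before both the DOR and the standard error are computed, which is the usual reading of the method in [17]; the function name and example counts are hypothetical.

```python
import math
from typing import Tuple


def adjusted_dor_with_ci(
    tp: int, fp: int, fn: int, tn: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (DOR, lower 95% CI, upper 95% CI), each rounded to 1 decimal place.

    0.5 is added to every cell so that tables containing a zero remain defined.
    """
    a, b, c, d = (cell + 0.5 for cell in (tp, fp, fn, tn))
    dor = (a / c) / (b / d)                       # (TP/FN) / (FP/TN)
    ln_dor = math.log(dor)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(DOR)
    lower = math.exp(ln_dor - z * se)
    upper = math.exp(ln_dor + z * se)
    return round(dor, 1), round(lower, 1), round(upper, 1)


# Hypothetical counts for illustration only.
print(adjusted_dor_with_ci(tp=45, fp=10, fn=5, tn=90))
# A table with zero cells still yields a finite DOR after the correction.
print(adjusted_dor_with_ci(tp=10, fp=0, fn=0, tn=10))
```

Small tables, or tables with zero cells, produce very wide intervals, which is consistent with the large, overlapping 95% CI noted for many of the DOR values in this review.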

Conclusions
In conclusion, a variety of diagnostic tests are now available for an array of wildlife species, with an increasing variety of species being studied. The focus of this review was on diagnostic tests that detect or measure the host immune response to infection. From the current review, it was evident that serological tests are surpassing tests like the TST, and even other CMI-based tests such as IGRA, for diagnostic performance. Obtaining proof of high accuracy in tests is still an issue, restricting validation of many tests. The current review used DOR to evaluate diagnostic performance, which to the best of our knowledge has not been used previously for assessing TB diagnostic tests in animals. P22 complex was identified as a promising new antigenic target, which alongside MPB83 demonstrated potential for use as an accurate seroantigenic target. We believe these conclusions to be consistent with the evidence and arguments presented.

Data Availability Statement:
Data are either contained within the article or relate to 3rd Party Data. Full references are provided for these 3rd Party Data but restrictions may apply to their availability.

Conflicts of Interest:
The authors declare no conflict of interest.