Evaluation of an AI-Based TB AFB Smear Screening System for Laboratory Diagnosis on Routine Practice

The most robust and economical method for laboratory diagnosis of tuberculosis (TB) is to identify mycobacteria acid-fast bacilli (AFB) under acid-fast staining, despite its disadvantages of low sensitivity and labor intensity. In recent years, artificial intelligence (AI) has been used in TB-smear microscopy to assist medical technologists with routine AFB smear microscopy. In this study, we evaluated the performance of a TB automated system consisting of a microscopic scanner and recognition program powered by artificial intelligence and machine learning. This AI-based system can detect AFB and classify the level from 0 to 4+. A total of 5930 smears were evaluated on the performance of this automatic system in identifying AFB in daily lab practice. At the first stage, 120 images were analyzed per smear, and the accuracy, sensitivity, and specificity were 91.3%, 60.0%, and 95.7%, respectively. In the second stage, 200 images were analyzed per smear, and the accuracy, sensitivity, and specificity were increased to 93.7%, 77.4%, and 96.6%. After removing disqualifying smears caused by poor staining quality and smear preparation, the accuracy, sensitivity, and specificity were improved to 95.2%, 85.7%, and 96.9%, respectively. Furthermore, the automated system recovered 85 positive smears initially identified as negative by manual screening. Our results suggested that the automated TB system could achieve higher sensitivity and laboratory efficiency than manual microscopy under the quality control of smear preparation. Automated TB smear screening systems can serve as a screening tool at the first screen before manual microcopy.


Introduction
Tuberculosis is an emerging infectious disease worldwide that is treatable, preventable, and curable. However, due to its slow decline, tuberculosis remains a global public health threat. In 2019, about 10 million people fell ill with tuberculosis, and 1.4 million died, more than any other infectious disease [1,2]. Due to the COVID-19 pandemic, there has been a large global drop in the number of people newly diagnosed with TB and an 18% decline, from 7.1 million in 2019 to 5.8 million in 2020, was reported [3]. Sixteen countries accounted for 93% of this reduction, with India, Indonesia, and the Philippines the worst affected. According to the report in 2019, one year before the COVID-19 pandemic, the 30 high TB burden countries accounted for nearly 90% of new TB cases. Eight countries accounted for two-thirds of the total, with India leading the count with 26%, followed by Indonesia (8.5%), China (8.4%), the Philippines (6.0%), Pakistan (5.7%), Nigeria (4.4%), Bangladesh Sensors 2022, 22, 8497 2 of 10 (3.6%), and South Africa (3.6%) [1]. Heads of state and government representatives from all UN members are committed to taking major steps toward building a tuberculosis-free world, including ambitious goals to treat more than 40 million people with tuberculosis and to prevent at least 30 million from becoming ill between 2018 and 2022, through the provision of tuberculosis preventive treatment. According to the data from Taiwan National Infectious Disease Statistics in 2019, there were 3.7 new TB cases and 0.23 deaths per 100,000 people in Taiwan. The tuberculosis treatment success rate of new cases and culture-positive cases were 72.1% and 68.9%, respectively [4].
The most robust and economical method recommended by the World Health Organization (WHO) for the first line of laboratory diagnosis of pulmonary tuberculosis is the acid-fast stain method of sputum smears which relies on manual microscopic examination for acid-fast mycobacteria bacilli (AFB). Although smear microscopy is a low-cost and widely used method, its sensitivity is only 50-60% in pulmonary TB [5]. The fluorescence microscopy is more sensitive than Ziehl-Neelsen stain smear microscopy, but the specificity is similar [6]. The WHO suggested that 1.1 microscopy laboratories should be available for every 100,000 people, and be able to perform quality-assured AFB microscopy [2]. To increase the positive rate, it requires three sputum samples in a row for each potential TB patient (72% at the first samples to 76% with three samples' examination) [7]. AFB smear microscopy is labor-intensive work, and sensitivity may vary with the experience of examiners, smear quality, eye fatigue, etc. Although some molecular-based methods, such as GeneXpert MTB/RIF can assist in the identification of TB, its high cost means it is unlikely to be affordable in many high TB-burden countries [8]. The gold standard for the diagnosis of tuberculosis is still AFB and the fluorescent microscopic examination of smear or bacteriological confirmation by the culture method. Recently, some automated TB smear microscopy systems have been developed based on artificial intelligence (AI) and big data analysis, which may significantly increase the sensitivity of TB smear microscopy [9][10][11][12][13]. Although all these studies reported better performance than human microscopic examination, most are still in development. Furthermore, this system adopted the idea from digital pathology and used a whole slide scanner to randomly capture the images from the smears. However, the image results are not satisfactory, because TB sputum smears are prepared by hand and the specimens are not smoothly placed onto the glass slide for scanning, unlike the liquid-based cytological smear. An automatic AFB detection system, equipped with a 20× objective lens and an eight-slide loading tray, was applied in daily routine practice for TB sputum smear in our lab. In this study, we evaluated the accuracy of this system compared to traditional manual smear screening in actual practice on routine screening TB smears. We would also like to figure out the key factors that affect the performance when applying the automated TB smear screening system to replace human manual screening. We found the key parameters that effect the accuracy were mostly smear quality and amount of image capture in a constant area. The outcome of an automatic analysis mainly replies on the input images and setting. For achieving the best performance in automation AI-based systems or devices, we think it is necessary to combine the automation of sample preparation and the image recognition algorithm in a whole system.

Study Design
This was a single-center, retrospective, and double-blind study. The study was approved by Kaohsiung Veterans General Hospital Institutional Review Board (KSVGH22-CT4-04). A total of 5930 TB sputum smears were evaluated in this study, collected from the microbiology lab at Kaohsiung General Hospital. The data was collected from December 2018 to January 2020. The evaluation periods were divided into two stages. In the first stage (from December 2018 to March 2019), 1014 smears were analyzed based on the original setting of the machine, which randomly captured 120 images per smear. In the second stage (from April 2019 to January 2020), the image captured field was increased to 200 images per smear, and 3902 smears were analyzed.

Procedures
The sample collection, handling, preparation, and AFB smear microscopy were based on the lab guideline authority by the biosafety committee. The sputum samples were spread on a fixed area of 1 cm × 2 cm on the slide and treated with Ziehl-Neelsen stain [14]. All of the slides were loaded into the automated system and scanned by the system. After the system finished the reading, the slides were reviewed by a microbiological medical technologist without knowing the results of the system. Both sides of the results were checked by an independent microbiology medical technologist. The general workflow diagram was illustrated in Figure 1.
stage (from December 2018 to March 2019), 1014 smears were analyzed based on the original setting of the machine, which randomly captured 120 images per smear. In the second stage (from April 2019 to January 2020), the image captured field was increased to 200 images per smear, and 3902 smears were analyzed.

Procedures
The sample collection, handling, preparation, and AFB smear microscopy were based on the lab guideline authority by the biosafety committee. The sputum samples were spread on a fixed area of 1 cm × 2 cm on the slide and treated with Ziehl-Neelsen stain [14]. All of the slides were loaded into the automated system and scanned by the system. After the system finished the reading, the slides were reviewed by a microbiological medical technologist without knowing the results of the system. Both sides of the results were checked by an independent microbiology medical technologist. The general workflow diagram was illustrated in Figure 1.

AI-Based Automatic TB Detection Device
An automated microscope system (µ-Scan 1.1, Wellgen Medical, Kaohsiung, Taiwan) was used for TB detection [15]. The system consists of two components: hardware and software. The hardware included an eight-slide tray and a microscopic digital camera with an auto-focusing and slide-scanning mechanism to cover the specimen based on WHO recommendations (300 fields @1000× oil lens) (Figure 2a). The system is designed to only scan the 1 cm × 2 cm fixed area and capture the images (Figure 2b). The fixed 1 cm × 2 cm area was divided into ten parts and was scanned one by one ( Figure 2c). The software uses an image recognition algorithm to detect and classify positive AFB from level 0 to 4+ ( Figure 3). This device can be loaded with eight slides on a tray per run, and each slide takes about 4 min to scan and obtain results. The microscopic images were digitally and randomly captured and stored. In the detection phase, candidate AFBs were marked and differentiated from other substances and tissues in the smear based on color and morphological features. In the classification phase, the feature parameters were extracted from

AI-Based Automatic TB Detection Device
An automated microscope system (µ-Scan 1.1, Wellgen Medical, Kaohsiung, Taiwan) was used for TB detection [15]. The system consists of two components: hardware and software. The hardware included an eight-slide tray and a microscopic digital camera with an auto-focusing and slide-scanning mechanism to cover the specimen based on WHO recommendations (300 fields @1000× oil lens) (Figure 2a). The system is designed to only scan the 1 cm × 2 cm fixed area and capture the images (Figure 2b). The fixed 1 cm × 2 cm area was divided into ten parts and was scanned one by one ( Figure 2c). The software uses an image recognition algorithm to detect and classify positive AFB from level 0 to 4+ ( Figure 3). This device can be loaded with eight slides on a tray per run, and each slide takes about 4 min to scan and obtain results. The microscopic images were digitally and randomly captured and stored. In the detection phase, candidate AFBs were marked and differentiated from other substances and tissues in the smear based on color and morphological features. In the classification phase, the feature parameters were extracted from AFB candidates as the input parameters to a proprietary classifier. The results were recorded as positive if any AFB was identified in the image of the slide. Senior medical technologists reviewed all the images and slides to confirm the consistency and evaluate the system's performance. The training system is a hybrid system including supervised training (targeting bacilli-like objects) and unsupervised training (putting the bacilli-like objective into the convolutional neural network (CNN) for deep learning). The algorithm setting was developed by the manufacturer (µ-Scan 1.1, Wellgen Medica, Kaohsiung, Taiwan). The system applied supervised training to identify the targets of interest based on the morphology (e.g., color, size, length-to-width ratio, etc.) of a TB bacillus. All the targets of interest were processed through a deep learning algorithm for acid-fast bacillus recognition based on the convolutional network work (CNN) framework and other modified methods to fine-tune the algorithm by adjusting the thresholds. Then the certified medical technicians reviewed the results from the algorithm validation as the gold standard, and identified false positive and false negative targets of interest to be re-enrolled into the dataset by the machine learning framework to further improve the algorithm.
AFB candidates as the input parameters to a proprietary classifier. The results were recorded as positive if any AFB was identified in the image of the slide. Senior medical technologists reviewed all the images and slides to confirm the consistency and evaluate the system's performance. The training system is a hybrid system including supervised training (targeting bacilli-like objects) and unsupervised training (putting the bacilli-like objective into the convolutional neural network (CNN) for deep learning). The algorithm setting was developed by the manufacturer (µ-Scan 1.1, Wellgen Medica, Kaohsiung, Taiwan). The system applied supervised training to identify the targets of interest based on the morphology (e.g., color, size, length-to-width ratio, etc.) of a TB bacillus. All the targets of interest were processed through a deep learning algorithm for acid-fast bacillus recognition based on the convolutional network work (CNN) framework and other modified methods to fine-tune the algorithm by adjusting the thresholds. Then the certified medical technicians reviewed the results from the algorithm validation as the gold standard, and identified false positive and false negative targets of interest to be re-enrolled into the dataset by the machine learning framework to further improve the algorithm.

Data Interpretation
The evaluation of test performance is based on overall accuracy, sensitivity, and specificity. Statistical analysis was performed using IBM SPSS Statistics, Version 25.0. Armonk, NY, USA.

The Performance of the Automation System
When the study started in December 2018, the first test results (from December 2018 to March 2019, n = 1014) were as follows: the sensitivity and specificity were 60% (75/125) and 95.7% (851/889). After a series of imaging training and testing and an increase in the scanning area by 40% (from 120 digital images to 200), the second test results (from April 2019 to January 2020, n = 3902) were improved: the sensitivity and specificity were 77.4% (460/594) and 96.6% (3197/3308), respectively (Table 1). During the study, some smears were not as consistent during the sampling process. A total of 709 smears were later excluded from this study due to incomplete stain removal (n = 325), smear location shift (n = 89), smear being too thick (n = 49), smear being too thin (n = 110), smear dropping off (n = 96), and other reasons due to atypical mycobacterial morphology (n = 40) ( Table 2, Figures  2b and 4). Thus, the overall results of accuracy, sensitivity, and specificity were 95.2 (3040/3193), 85.6% (406/474), and 95.2% (2634/2719), respectively (Table 1).   Table 5. The precision and error rate of the automation system and manual microscopy. The "True" represents the cases diagnosed correctly; The "False" represents the cases of wrong diagnosis.  Table 6. The precision and error rate of the automation system and manual microscopy. The "True" represents the cases diagnosed correctly; the "False" represents the cases of wrong diagnosis, after removal of inadequate cases.

Data Interpretation
The evaluation of test performance is based on overall accuracy, sensitivity, and specificity. Statistical analysis was performed using IBM SPSS Statistics, Version 25.0. Armonk, NY, USA.

The Performance of the Automation System
When the study started in December 2018, the first test results (from December 2018 to March 2019, n = 1014) were as follows: the sensitivity and specificity were 60% (75/125) Sensors 2022, 22, 8497 5 of 10 and 95.7% (851/889). After a series of imaging training and testing and an increase in the scanning area by 40% (from 120 digital images to 200), the second test results (from April 2019 to January 2020, n = 3902) were improved: the sensitivity and specificity were 77.4% (460/594) and 96.6% (3197/3308), respectively (Table 1). During the study, some smears were not as consistent during the sampling process. A total of 709 smears were later excluded from this study due to incomplete stain removal (n = 325), smear location shift (n = 89), smear being too thick (n = 49), smear being too thin (n = 110), smear dropping off (n = 96), and other reasons due to atypical mycobacterial morphology (n = 40) ( Table 2, Figures 2b and 4). Thus, the overall results of accuracy, sensitivity, and specificity were 95.2 (3040/3193), 85.6% (406/474), and 95.2% (2634/2719), respectively (Table 1).   We also compared the performance of the automated TB smear microscopy system with manual microscopy performed by medical technologists. Manual microscopy was superior to the automated TB smear microscopy system. The sensitivity, specificity, and accuracy of manual TB smear microscopy were 84.5%, 100%, and 97.6%, respectively, while those of the automated system were 77.4%, 96.6%, and 93.7%, respectively (Table 3, Figure 5). However, after excluding inadequate cases, the sensitivity of the automation system increased to 85.7%, which is higher than manual microscopy with a sensitivity of 82.1% (Table 4 and Figure 6). Furthermore, about 91.4% of cases were reported as positive by both the automation system and technologists, and only <0.1% of cases were missed and reported as negative by both. Overall, the error rate of the automation system (6.32%) was higher than our technologists (2.31%) ( Table 5). After excluding the inadequate cases, the error rate of the automation system decreased to 4.73%, while that of technologists slightly increased to 2.6% (Table 6). There were two positive cases missed out by both of them. False positive AFB 3+ False positive AFB 4+         Table 5. The precision and error rate of the automation system and manual microscopy. The "True" represents the cases diagnosed correctly; The "False" represents the cases of wrong diagnosis.  Table 6. The precision and error rate of the automation system and manual microscopy. The "True" represents the cases diagnosed correctly; the "False" represents the cases of wrong diagnosis, after removal of inadequate cases.

The Consistency between the Automated System and Manual Microscopy
In our result, the consistency between the automated system and manual microscopic examination was 100% on AFB results on 2+ to 4+ smears (within ±1 to 2 levels were acceptable). The consistency discrepancy mainly came from smears in trace and 1+ and trace, 39.1%, and 89.1%, respectively (Table 7). This system recovered 85 smears initially found negative from manual microscopic examination (one smear from 2+, 19 smears from 1+, and 65 smears from scanty). However, the automatic system missed 66 smears reported as positive from manual microscopic examination (5 smears from 1+ and 61 smears from scanty) ( Table 8).

Discussion
In the microbiology laboratory, the AFB staining and fluorescence microscopy of sputum smear is still considered standard procedures to confirm Mycobacteria. They are also the most economical, rapid, and readily available methods. However, AFB smear microscopy is labor-intensive work, and the sensitivity may vary with the experience of examiners, smear quality, eye fatigue, etc. Thus, the application of an automatic AI-based system in TB smear examination may provide a constancy of performance and assist medical technologists in carrying out the first screen. We hope such automatic AFB screening system can perform high-throughput screening to increase the positive detection rate.
We applied an automatic TB screening system on AFB smear microscopy in our hospital. In our on-site test, the automated TB scan system achieved an accuracy of 95.2%, sensitivity of 85.7%, and specificity of 96.9%. When comparing the smear microscopy results with culture, the manual microscopy's overall performance was superior to the automation system. However, after excluding inadequate cases, the sensitivity of the automation system increased to 85.7 from 77.4%, which was higher than manual microscopy with a sensitivity of 82.1%. The error rate of the automation system was 3.7% higher than manual microscopy (2.6%). The design of an image recognition algorithm is to maximize sensitivity, therefore false positives may occur, and those cases need to be ruled out by technologists. All the false positive cases were trace or 1+ levels and accounted for 1.7% in total cases and 11.1% in positive cases detected by the automated system. The automation system showed acceptable consistency with manual microscopy at the higher level, such as 2+, 3+, and 4+. It may save 24.5% of screening time on positive cases for manual microscopy.
In 2012, Lewis et al. reported a smear microscopy system with the sensitivity and specificity of 75.8% and 43.8% [9]. In 2019, Lopez-Garnier et al. reported a CNN automatic TB diagnostic system with high accuracy (96.6%, and a sensitivity and specificity ranging from 91% to 99%. From 2012 to now, numerous AI-based images diagnosis systems have been reported, and the overall performance showed dramatic improvement. Such an improvement in AI development may be due to the progress of algorithms and computing capability. However, we found training images had very different qualities, such as color pixels; this may lead to overfitting outcomes. That may imply that most systems might not be universal. Compared to those studies, our proof-of concept study showed acceptable results. For seeking the best performance, we think the standardized sample preparation and customized algorithm fine-tuning are needed. Furthermore, we found several issues worth mentioning when applying automated microscope systems in clinical laboratories: (a) the smear location: the manual smear technique needs to be standardized so that the smear location is systematically the same place. If the smear location is not standardized, it may affect the accuracy of the automated system. (b) Stain quality: the manual stain technique also affects the automated system's performance, since the recognition software uses color as an important parameter for detecting AFB. An automatic smear and stain system, in addition to our automated microscope, may improve the quality of the smears. The main purpose of developing an AI-based TB automatic screening system is to save the technician's workload and focus on the suspected positive slide of AFB. We consider such AI-based TB automation system as an assistance tool, instead of a replacement for the diagnosis from medical technologists. This system can facilitate the junior technicians to minimize error rate, due to the lack of experience.
Through advances in artificial intelligence (AI), machine learning (ML), and deep learning technology (DL), medicine is shifting into the digital medicine era. Computer technology not only can be used to make medical records, transport data, or in analysis, but also be expected to function with AI to diagnose diseases, make medical decisions, give health suggestions, and even treat patients [16]. AI has been massively applied in the development of medical devices and software. In the medical laboratory, most works still rely on pathologists, cytopathologists, and medical technologists, and manual operation remains. With a large number of samples needing to be handled in medical laboratories every day, the role of AI in laboratory medicine is expected to improve the accuracy of detection and laboratory workflow, help avoid diagnosis errors, and increase efficiency. Based on a survey, about 15.6% of organizations currently use AI in lab diagnosis. However, most still remain uncertain about adopting AI in medical diagnosis. One concern is the lack of proven clinical benefits. Therefore, our lab-side routine practice data showed promising results and support and ensured the reliability of AI-based medical devices' performance [17]. The proof-of-concept of this AI-based medical devices at our lab can provide reassurance to other laboratories when they were considering the installation of such expensive automation systems or devices.
This TB smear automatic screening system is rapid and easy to operate for lab staff. The light and neat device design can easily settle on the table in any laboratory space. A good medical device design should consider the applicability and convenience of the potential users. The standard AFB staining for TB diagnosis only requires simple sputum smear preparation. With the help of automatic TB bacteria detection systems, the capacity of AFB tests per year can be increased. Automatic TB bacteria screening systems equipped with AI should be installed in high tuberculosis burden countries with limited medical resources, especially in developing countries or remote areas.
Besides bacteria morphology, gene identification, chest X-ray images, and biosensor medical devices, such as electronic noses, can detect tuberculosis via patient's breath and even culture samples of Mycobacterium tuberculosis. The odor of bacteria itself or patients with TB infection can be analyzed via e-nose [18]. These need machine learning and deep learning to set up algorithms to work on the device. In the future, AI can help develop needed tools for TB diagnosis and increase the accuracy of current TB diagnosis methods.

Conclusions
In conclusion, the automated TB microscopy system can achieve the same high performance as manual microscopy. We found that the accuracy of this system was mainly influenced by the scanning area and the smear staining quality. By increasing the number of captured images on each slide, the positive rate of TB screening will also be increased. To achieve better staining quality, the establishment of standardized staining procedures is essential. It could assist medical technologists in screening positive cases, which may be missed by manual microscopy screening. The standardized smear preparation protocol is needed for the adoption of the automation system. A regularly updating algorithm from clinical data for training, validation, and testing may improve the accuracy.