Decision-Aware Vision Mamba with Context-Guided Slot Mixing for Chest X-Ray Screening and Culture-Based Hierarchical Tuberculosis Classification
Abstract
1. Introduction
2. Related Works
2.1. Deep Learning for Medical Imaging and Datasets
2.2. Active TB and Inactive TB Classification
2.3. Feature Localization and Preprocessing Techniques
3. Proposed Method
3.1. Vision Mamba
3.2. Context-Guided Slot Mixer (CGSM)
3.2.1. Context-Aware Initialization
3.2.2. Iterative Slot Attention with Gated MLP
3.3. Multi-Stage Feature Refinement
3.4. Hierarchical Classification Head
4. Experimental Setup and Evaluation Roadmap
4.1. Data Preprocessing and Augmentation Strategy
4.2. Dataset and Ethics Statement
4.2.1. SCH-CXR Dataset
4.2.2. Generalization Analysis for Open Dataset
4.3. Implementation Details
4.4. Evaluation Metrics
5. Experimental Results
5.1. Performance Measurement on SCH-CXR Dataset
5.2. Performance Measurement on Kaggle-CXR Dataset
5.3. Performance Measurement on DA and DB Dataset
5.4. Qualitative Evaluation via Grad-CAM
5.5. Ablation Study: Impact of Hierarchical Head Architecture
6. Discussion
6.1. Architectural Effectiveness of SSM and CGSM
6.2. Clinical Utility and Trustworthiness
6.3. Limitations
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- World Health Organization. Global Tuberculosis Report 2024; World Health Organization: Geneva, Switzerland, 2024; Available online: https://www.who.int/teams/global-tuberculosis-programme/tb-reports/global-tuberculosis-report-2024 (accessed on 6 February 2026).
- World Health Organization. Global Tuberculosis Report 2023; World Health Organization: Geneva, Switzerland, 2023; Available online: https://www.who.int/teams/global-tuberculosis-programme/tb-reports/global-tuberculosis-report-2023 (accessed on 6 February 2026).
- Rim, B.; Jang, H.; Lee, H.; Jeon, W. Active and Inactive Tuberculosis Classification Using Convolutional Neural Networks with MLP-Mixer. Bioengineering 2025, 12, 630.
- Choi, Y.R.; Yoon, S.H.; Kim, J.; Yoo, J.Y.; Kim, H.; Jin, K.N. Chest Radiography of Tuberculosis: Determination of Activity Using Deep Learning Algorithm. Tuberc. Respir. Dis. 2023, 86, 226–233.
- Mirugwe, A.; Tamale, L.; Nyirenda, J. Improving Tuberculosis Detection in Chest X-Ray Images Through Transfer Learning and Deep Learning: Comparative Study of Convolutional Neural Network Architectures. JMIRx Med. 2025, 6, e66029.
- Kotei, E.; Thirunavukarasu, R. A Comprehensive Review on Advancement in Deep Learning Techniques for Automatic Detection of Tuberculosis from Chest X-ray Images. Arch. Comput. Methods Eng. 2024, 31, 455–474.
- Shekar, B.H.; Mannan, S. Tuberculosis Detection with Customized CNN and Oversampling Techniques: A Deep Learning Approach. Discov. Artif. Intell. 2025, 5, 116.
- Abraham, B.; Mohan, J.; John, S.M.; Varghese, B.M. Computer-aided Detection of Tuberculosis from X-ray Images Using CNN and PatternNet Classifier. J. X-Ray Sci. Technol. 2023, 31, 699–711.
- García Seco de Herrera, A.; Yagis, E.; Pinpo, N.; Abolghasemi, V.; Andritsch, J.; Chaichulee, S.; Dicente Cid, Y.; Ingviya, T. Ensemble Deep Learning Architectures for Detecting Pulmonary Tuberculosis in Chest X-rays. Sci. Rep. 2026, 16, 1242.
- Sharma, V.; Nillmani; Gupta, S.K.; Shukla, K.K. Deep Learning Models for Tuberculosis Detection and Infected Region Visualization in Chest X-ray Images. Intell. Med. 2024, 4, 104–113.
- Chen, C.F.; Hsu, C.H.; Jiang, Y.C.; Wu, W.J.; Wang, C.C.; Chen, Y.H. A Deep Learning-based Algorithm for Pulmonary Tuberculosis Detection in Chest Radiography. Sci. Rep. 2024, 14, 14917.
- Visu, P.; Sathiya, V.; Ajitha, P.; Surendran, R. Enhanced Swin Transformer Based Tuberculosis Classification with Segmentation Using Chest X-ray. J. X-Ray Sci. Technol. 2025, 33, 167–186.
- Nafisah, S.I.; Muhammad, G. Tuberculosis Detection in Chest Radiograph Using Convolutional Neural Network Architecture and Explainable Artificial Intelligence. Neural Comput. Appl. 2024, 36, 111–131.
- Devasia, J.; Goswami, H.; Lakshminarayanan, S.; Rajaram, M.; Adithan, S. Deep Learning Classification of Active Tuberculosis Lung Zones Wise Manifestations Using Chest X-rays: A Multi Label Approach. Sci. Rep. 2023, 13, 887.
- Lakhani, P.; Sundaram, B. Deep Learning at Chest Radiography: Automated Classification of Pulmonary Tuberculosis by Using Convolutional Neural Networks. Radiology 2017, 284, 574–582.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Hwang, E.J.; Park, S.; Jin, K.N.; Kim, J.I.; Choi, S.Y.; Lee, J.H.; Goo, J.M.; Park, C.M. Development and Validation of a Deep Learning-based Automatic Detection Algorithm for Active Pulmonary Tuberculosis on Chest Radiographs. Clin. Infect. Dis. 2019, 69, 739–747.
- Rahman, T.; Khandakar, A.; Kadir, M.A.; Islam, K.R.; Islam, K.F.; Mazhar, R.; Tahir, T.; Islam, M.S.; Kashem, S.; Mahbub, Z.B.; et al. Reliable Tuberculosis Detection Using Chest X-ray with Deep Learning, Segmentation and Visualization. IEEE Access 2020, 8, 191586–191601.
- Ait Nasser, A.; Akhloufi, M.A. A Review of Recent Advances in Deep Learning Models for Chest Disease Detection Using Radiography. Diagnostics 2023, 13, 159.
- Maheswari, B.U.; Sam, D.; Mittal, N.; Gupta, S.K. Explainable Deep-neural-network Supported Scheme for Tuberculosis Detection from Chest Radiographs. BMC Med. Imaging 2024, 24, 32.
- Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning Deep Features for Discriminative Localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929.
- Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626.
- Sarawagi, R.; Bajpai, A.; Chaubey, A.; Gupta, S. Self-Trained Convolutional Neural Network (CNN) for Tuberculosis Diagnosis in Medical Imaging. Cureus 2024, 16, e63356.
- World Health Organization. WHO Consolidated Guidelines on Tuberculosis. Module 3: Diagnosis—Rapid Diagnostics for Tuberculosis Detection; World Health Organization: Geneva, Switzerland, 2021.
- Boehme, C.C.; Nabeta, P.; Hillemann, D.; Nicol, M.P.; Shenai, S.; Krapp, F.; Allen, J.; Tahirli, R.; Blakemore, R.; Rustomjee, R.; et al. Rapid Molecular Detection of Tuberculosis and Rifampin Resistance. N. Engl. J. Med. 2010, 363, 300–311.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
- Pham, H.H.; Le, T.T.; Tran, D.Q.; Ngo, D.T.; Nguyen, H.Q. Interpreting Chest X-rays via CNNs that Exploit Hierarchical Disease Dependencies and Uncertainty Labels. Neurocomputing 2021, 437, 186–194.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI); Springer: Munich, Germany, 2015; pp. 234–241.
- Tolstikhin, I.; Houlsby, N.; Kolesnikov, A.; Beyer, L.; Zhai, X.; Unterthiner, T.; Yung, J.; Steiner, A.; Keysers, D.; Uszkoreit, J.; et al. MLP-Mixer: An All-MLP Architecture for Vision. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Online, 6–14 December 2021; pp. 24261–24272.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 3–7 May 2021.
- Jaeger, S.; Candemir, S.; Antani, S.; Wáng, Y.X.; Lu, P.X.; Thoma, G. Two Public Chest X-ray Datasets for Computer-aided Screening of Pulmonary Diseases. Quant. Imaging Med. Surg. 2014, 4, 475–477.
- Lee, S.; Yim, J.J.; Kwak, N.; Lee, J.K.; Kim, H.J. Deep Learning to Determine the Activity of Pulmonary Tuberculosis on Chest Radiographs. Radiology 2021, 301, 435–442.
- Rahman, T.; Khandakar, A.; Rahman, A.; Mahbub, Z.B.; Islam, K.R.; Chowdhury, M.E.H. TB-CXRNet: Tuberculosis and Drug-resistant Tuberculosis Detection Technique Using Chest X-ray Images. Cogn. Comput. 2024, 16, 1393–1412.
- Kazemzadeh, S.; Yu, J.; Jamshy, S.; Pilgrim, R.; Nabulsi, Z.; Chen, C.; Beladia, N.; Lau, C.; McKinney, S.M.; Hughes, T.; et al. Deep Learning Detection of Active Pulmonary Tuberculosis at Chest Radiography Matched the Clinical Performance of Radiologists. Radiology 2023, 306, 124–137.
- Owda, M.; Al-Zubi, S.; Owda, A. A Lightweight Hybrid Deep Learning Model for Tuberculosis Detection from Chest X-Rays. Diagnostics 2025, 15, 3216.
- Alshmrani, G.M.M.; Ni, Q.; Jiang, R.; Piao, S.; Xie, Y. A Deep Learning Architecture for Multi-class Lung Diseases Classification Using Chest X-ray (CXR) Images. Alex. Eng. J. 2023, 64, 923–935.
- Ejiyi, C.J.; Qin, Z.; Nnani, A.O.; Agwu, P.K.; Anderson, C. ResfEANet: ResNet-fused External Attention Network for Tuberculosis Diagnosis Using Chest X-ray Images. Comput. Methods Programs Biomed. Update 2024, 5, 100133.
- Ou, C.-Y.; Chen, I.-Y.; Chang, H.-T.; Wei, C.-Y.; Li, D.-Y.; Chen, Y.-K.; Chang, C.-Y. Deep Learning-Based Classification and Semantic Segmentation of Lung Tuberculosis Lesions in Chest X-ray Images. Diagnostics 2024, 14, 952.
- Ali, A.; Zimerman, I.; Wolf, L. The Hidden Attention of Mamba Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Seattle, WA, USA, 17–21 June 2024.
- Zhu, L.; Liao, B.; Zhang, Q.; Wang, X.; Liu, W.; Wang, X. Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model. In Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, 21–27 July 2024.
- Gu, A.; Dao, T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv 2023, arXiv:2312.00752.
- Locatello, F.; Weissenborn, D.; Unterthiner, T.; Mahendran, A.; Heigold, G.; Uszkoreit, J.; Dosovitskiy, A.; Kipf, T. Object-Centric Learning with Slot Attention. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Online, 6–12 December 2020; pp. 11525–11538.
- Jeon, W.; Jang, H.; Lee, H.; Choi, S. COPD Multi-Task Diagnosis on Chest X-Ray Using CNN-Based Slot Attention. Appl. Sci. 2026, 16, 14.
- Ridley, M.; Luyt, D.K.; Scholvinck, E.; Broadbent, J. Anatomical Localization of Lung Disease in Children: Lobes, Segments and Bronchopulmonary Segments. Paediatr. Respir. Rev. 2020, 35, 66–73.
- Shazeer, N. GLU Variants Improve Transformer. arXiv 2020, arXiv:2002.05202.
- Ramachandran, P.; Zoph, B.; Le, Q.V. Searching for Activation Functions. arXiv 2017, arXiv:1710.05941.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
- Giunchiglia, E.; Lukasiewicz, T. Coherent Hierarchical Multi-label Classification Networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Online, 6–12 December 2020; pp. 9662–9673.
- Kendall, A.; Gal, Y.; Cipolla, R. Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7482–7491.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Buslaev, A.; Iglovikov, V.I.; Khvedchenya, E.; Parinov, A.; Druzhinin, M.; Kalinin, A.A. Albumentations: Fast and Flexible Image Augmentations. Information 2020, 11, 125.
- Cubuk, E.D.; Zoph, B.; Shlens, J.; Le, Q.V. RandAugment: Practical Automated Data Augmentation with a Reduced Search Space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 702–703.
- Chest X-Ray (Pneumonia, COVID-19, Tuberculosis). Kaggle Dataset. Available online: https://www.kaggle.com/datasets/jtiptj/chest-xray-pneumoniacovid19tuberculosis (accessed on 26 February 2025).
- DA and DB—TB Chest X-Ray Datasets. Kaggle Dataset. Available online: https://www.kaggle.com/datasets/vbookshelf/da-and-db-tb-chest-x-ray-datasets (accessed on 17 March 2026).
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019; pp. 8024–8035.
- Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019.
- Youden, W.J. Index for Rating Diagnostic Tests. Cancer 1950, 3, 32–35.
- Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
- Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986.
- Ghiasi, A.; Kazemi, H.; Borji, A.; Goldstein, T. What Do Vision Transformers Learn? A Visual Exploration. arXiv 2022, arXiv:2212.06727.








| Primary Class | Sub-Category | Training Set | Validation Set | Test Set | Total |
|---|---|---|---|---|---|
| Normal | - | 16,225 | 2028 | 2028 | 20,281 |
| Pneumonia | - | 6658 | 832 | 833 | 8323 |
| Tuberculosis | Inactive TB | 7282 | 910 | 911 | 9103 |
| Tuberculosis | Active TB | 3868 | 484 | 483 | 4835 |
| Total | - | 34,033 | 4254 | 4255 | 42,542 |
(a) Kaggle-CXR dataset
| Primary Class | Sub-Class | Train | Test | Total |
|---|---|---|---|---|
| Normal | - | 1341 | 242 | 1583 |
| Pneumonia | Pneumonia | 3875 | 398 | 4273 |
| Pneumonia | COVID-19 | 460 | 116 | 576 |
| Tuberculosis | - | 650 | 53 | 703 |
| Total | - | 6326 | 809 | 7135 |
(b) DA and DB Tuberculosis Chest X-ray dataset
| Class | Train | Test | Total |
|---|---|---|---|
| Normal | 102 | 51 | 153 |
| Tuberculosis | 74 | 51 | 125 |
| Total | 176 | 102 | 278 |
| Parameter | Value |
|---|---|
| Backbone Network | Vision Mamba Base (Pretrained on ImageNet) |
| Image Resolution | 512 × 512 |
| Batch Size | 104 (Physical)/1040 (Effective with Gradient Accumulation) |
| Total Epochs | 100 |
| Optimizer | AdamW (Decoupled Weight Decay 0.01) |
| Learning Rate | 1 × 10⁻⁴ |
| Learning Rate Scheduler | Linear Warmup (10 Epochs) + Cosine Annealing (90 Epochs) |
| Loss Function (Screening) | Cross-Entropy Loss (Label Smoothing 0.1) |
| Loss Function (Activity) | Binary Cross-Entropy Loss (Target Smoothing 0.1) |
| Loss Function (Classification) | Categorical Cross-Entropy (Label Smoothing 0.1) |
| Multi-Task Strategy | Uncertainty-Weighted Loss + Conditional Masking |
| Sampling Strategy | Weighted Random Sampler (Balanced 1:1:1:1), Drop Last |
| Precision | Automatic Mixed Precision (FP16) |
| Normalization | ImageNet Mean & Std |
| Data Augmentation | Albumentations (Geometric & Intensity Transforms) |
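The learning-rate schedule listed above (linear warmup for 10 epochs, then cosine annealing over the remaining 90, peaking at 1 × 10⁻⁴) can be written down compactly. The sketch below is an illustrative pure-Python reimplementation, not the authors' code; it assumes the warmup ramps from zero and the cosine decays toward zero.

```python
import math

BASE_LR = 1e-4       # peak learning rate from the table
WARMUP_EPOCHS = 10   # linear warmup phase
TOTAL_EPOCHS = 100   # warmup + cosine annealing (90 epochs)

def lr_at_epoch(epoch: int) -> float:
    """Linear warmup to BASE_LR, then cosine annealing toward zero."""
    if epoch < WARMUP_EPOCHS:
        # ramp linearly from ~0 up to BASE_LR over the warmup phase
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    # cosine decay over the remaining epochs
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at_epoch(e) for e in range(TOTAL_EPOCHS)]
```

In a PyTorch setup such a rule would typically be handed to `torch.optim.lr_scheduler.LambdaLR` wrapped around the AdamW optimizer named in the table.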
| Model | Params | Overall J | Overall Acc | Overall Sen | Overall Spe | Screening J | Screening Acc | Screening Sen | Screening Spe | Activity J | Activity Acc | Activity Sen | Activity Spe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ResNet152 | 58.18 M | 77.36% | 92.15% | 83.28% | 94.08% | 74.90% | 90.18% | 82.71% | 92.20% | 90.56% | 95.48% | 94.62% | 95.94% |
| Xception | 20.82 M | 78.15% | 91.99% | 84.00% | 94.15% | 75.70% | 90.04% | 83.42% | 92.28% | 90.14% | 95.34% | 94.20% | 95.94% |
| EfficientNet B7 | 63.81 M | 78.43% | 92.56% | 84.13% | 94.30% | 76.03% | 90.79% | 83.47% | 92.56% | 90.51% | 95.77% | 93.58% | 96.93% |
| ViT-B | 86.44 M | 78.11% | 92.80% | 83.69% | 94.42% | 76.16% | 91.01% | 83.47% | 92.69% | 89.17% | 95.34% | 92.13% | 97.04% |
| ConvNeXt | 87.57 M | 78.55% | 92.43% | 84.21% | 94.35% | 76.52% | 90.54% | 83.95% | 92.56% | 88.66% | 95.19% | 91.51% | 97.15% |
| Vision Mamba | 97.69 M | 79.40% | 92.60% | 84.87% | 94.53% | 77.59% | 90.77% | 84.74% | 92.85% | 90.58% | 95.62% | 94.20% | 96.38% |
| Proposed Model | 144.27 M | 79.55% | 92.96% | 84.94% | 94.61% | 77.67% | 91.27% | 84.70% | 92.97% | 90.83% | 95.91% | 93.79% | 97.04% |
| Fold | Overall J | Overall Acc | Overall Sen | Overall Spe | Screening J | Screening Acc | Screening Sen | Screening Spe | Activity J | Activity Acc | Activity Sen | Activity Spe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 79.67% | 93.20% | 84.88% | 94.79% | 78.11% | 91.58% | 84.89% | 93.21% | 86.21% | 94.51% | 88.52% | 97.69% |
| 2 | 79.09% | 92.84% | 84.50% | 94.59% | 77.46% | 91.15% | 84.50% | 92.96% | 85.84% | 94.33% | 88.31% | 97.53% |
| 3 | 79.68% | 92.75% | 85.05% | 94.64% | 77.95% | 91.06% | 84.94% | 93.01% | 87.17% | 94.69% | 89.97% | 97.20% |
| 4 | 80.30% | 93.27% | 85.44% | 94.86% | 78.77% | 91.66% | 85.43% | 93.34% | 87.41% | 94.98% | 89.56% | 97.86% |
| 5 | 79.50% | 92.99% | 84.89% | 94.61% | 77.59% | 91.32% | 84.59% | 93.00% | 87.67% | 95.05% | 89.87% | 97.80% |
| Mean ± Std | 79.65% ± 0.44% | 93.01% ± 0.22% | 84.95% ± 0.34% | 94.70% ± 0.12% | 77.97% ± 0.52% | 91.35% ± 0.26% | 84.87% ± 0.37% | 93.11% ± 0.17% | 86.86% ± 0.79% | 94.71% ± 0.30% | 89.25% ± 0.77% | 97.62% ± 0.26% |
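The Mean ± Std row above appears to summarize the five folds with the sample (n-1) standard deviation; e.g. for the Overall J column, the fold values reproduce the reported 79.65% ± 0.44%. A minimal check in pure Python:

```python
from statistics import mean, stdev

# Overall Youden's index (J) of the five cross-validation folds, as tabulated
fold_j = [79.67, 79.09, 79.68, 80.30, 79.50]

mean_j = round(mean(fold_j), 2)   # 79.65, matching the Mean entry
std_j = round(stdev(fold_j), 2)   # 0.44, matching the ± Std entry (sample std)
```

The same computation with `statistics.pstdev` (population std) would give a smaller spread, so the sample variant is the one consistent with the table.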
| Fold | Class | J | Acc | Sen | Spe |
|---|---|---|---|---|---|
| 1 | Normal | 79.77% | 89.82% | 91.20% | 88.57% |
| 1 | Pneumonia | 68.36% | 90.82% | 73.27% | 95.09% |
| 1 | Inactive TB | 83.34% | 94.61% | 86.54% | 96.80% |
| 1 | Active TB | 87.21% | 97.53% | 88.52% | 98.69% |
| 2 | Normal | 78.47% | 89.21% | 89.69% | 88.77% |
| 2 | Pneumonia | 67.90% | 90.34% | 73.45% | 94.45% |
| 2 | Inactive TB | 83.06% | 94.38% | 86.55% | 96.52% |
| 2 | Active TB | 86.92% | 97.44% | 88.31% | 98.61% |
| 3 | Normal | 77.96% | 89.02% | 88.04% | 89.91% |
| 3 | Pneumonia | 69.80% | 90.34% | 75.96% | 93.83% |
| 3 | Inactive TB | 82.63% | 94.23% | 86.22% | 96.41% |
| 3 | Active TB | 88.35% | 97.43% | 89.97% | 98.38% |
| 4 | Normal | 79.36% | 89.65% | 90.46% | 88.90% |
| 4 | Pneumonia | 69.93% | 90.74% | 75.48% | 94.45% |
| 4 | Inactive TB | 83.64% | 94.99% | 86.27% | 97.37% |
| 4 | Active TB | 88.28% | 97.68% | 89.56% | 98.73% |
| 5 | Normal | 78.14% | 89.02% | 90.09% | 88.05% |
| 5 | Pneumonia | 67.41% | 90.42% | 72.67% | 94.74% |
| 5 | Inactive TB | 83.86% | 94.79% | 86.92% | 96.93% |
| 5 | Active TB | 88.59% | 97.72% | 89.87% | 98.73% |
| Mean ± Std | Normal | 78.74% ± 0.79% | 89.34% ± 0.37% | 89.90% ± 1.18% | 88.84% ± 0.68% |
| Mean ± Std | Pneumonia | 68.68% ± 1.13% | 90.53% ± 0.23% | 74.17% ± 1.46% | 94.51% ± 0.46% |
| Mean ± Std | Inactive TB | 83.31% ± 0.48% | 94.60% ± 0.31% | 86.50% ± 0.28% | 96.81% ± 0.38% |
| Mean ± Std | Active TB | 87.87% ± 0.75% | 97.56% ± 0.14% | 89.25% ± 0.77% | 98.63% ± 0.14% |
| Model | J | Acc | Sen | Spe |
|---|---|---|---|---|
| ResNet152 | 91.45% | 96.66% | 94.59% | 96.86% |
| Xception | 91.91% | 96.79% | 94.74% | 97.17% |
| EfficientNet B7 | 92.00% | 96.85% | 94.94% | 97.06% |
| ViT-B | 91.93% | 96.91% | 94.78% | 97.15% |
| ConvNeXt | 93.23% | 97.34% | 95.66% | 97.57% |
| Vision Mamba | 93.47% | 97.03% | 95.94% | 97.53% |
| Proposed Model | 94.04% | 97.34% | 96.33% | 97.71% |
| Model | J | Acc | Sen | Spe |
|---|---|---|---|---|
| ResNet152 | 72.55% | 86.27% | 88.24% | 84.31% |
| Xception | 74.50% | 87.25% | 82.35% | 92.16% |
| EfficientNet B7 | 76.47% | 88.24% | 78.43% | 98.04% |
| ViT-B | 76.47% | 88.24% | 84.31% | 92.16% |
| ConvNeXt | 76.47% | 88.24% | 90.20% | 86.27% |
| Vision Mamba | 78.43% | 89.22% | 82.35% | 96.08% |
| Proposed Model | 80.39% | 90.20% | 80.39% | 100.00% |
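The J column throughout these tables is Youden's index, J = Sensitivity + Specificity - 1 (Youden, 1950). A minimal helper computing the four reported metrics from binary confusion counts is sketched below; the counts in the example are inferred from the DA/DB test split (51 tuberculosis and 51 normal images) and reproduce the Proposed Model row above, but the function itself is illustrative, not the authors' evaluation code.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int):
    """Return (J, accuracy, sensitivity, specificity) as percentages."""
    sensitivity = tp / (tp + fn)               # true-positive rate
    specificity = tn / (tn + fp)               # true-negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    youden_j = sensitivity + specificity - 1   # Youden's index
    return tuple(round(100 * v, 2)
                 for v in (youden_j, accuracy, sensitivity, specificity))

# 41/51 TB cases detected, all 51 normals correctly rejected
result = binary_metrics(tp=41, fn=10, tn=51, fp=0)
# result == (80.39, 90.2, 80.39, 100.0)
```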
| Dataset | Head Configuration | J | Acc | Sen | Spe |
|---|---|---|---|---|---|
| SCH-CXR | Unified Single-Head | 79.32% | 92.91% | 84.70% | 94.62% |
| SCH-CXR | Hierarchical Multi-Head (Ours) | 79.55% | 92.96% | 84.94% | 94.61% |
| Kaggle-CXR | Unified Single-Head | 92.85% | 97.22% | 95.41% | 97.44% |
| Kaggle-CXR | Hierarchical Multi-Head (Ours) | 94.04% | 97.34% | 96.33% | 97.71% |
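The hierarchical multi-head configuration compared above can be illustrated with a toy probability factorization: a screening head scores Normal/Pneumonia/Tuberculosis, and an activity head, relevant only when tuberculosis is present, splits the TB mass into active and inactive. The sketch below is a schematic illustration under assumed softmax/sigmoid outputs, not the authors' implementation; during training the activity loss would additionally be masked out for non-TB samples (the "conditional masking" listed in the implementation details).

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hierarchical_probs(screening_logits, activity_logit):
    """Combine a 3-way screening head with a conditional TB-activity head.

    screening_logits: logits for (Normal, Pneumonia, Tuberculosis)
    activity_logit:   logit for P(Active | Tuberculosis)
    Returns probabilities over (Normal, Pneumonia, Inactive TB, Active TB).
    """
    p_normal, p_pneumonia, p_tb = softmax(screening_logits)
    p_active_given_tb = sigmoid(activity_logit)
    return {
        "Normal": p_normal,
        "Pneumonia": p_pneumonia,
        "Inactive TB": p_tb * (1.0 - p_active_given_tb),
        "Active TB": p_tb * p_active_given_tb,
    }
```

Because the activity head only redistributes P(Tuberculosis), the four leaf probabilities always sum to one and can never contradict the screening decision, which is the coherence argument for a hierarchical head over a flat 4-way classifier.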
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Jeon, W.; Jang, H.; Lee, H.; Park, C.; Lyu, J.; Choi, S. Decision-Aware Vision Mamba with Context-Guided Slot Mixing for Chest X-Ray Screening and Culture-Based Hierarchical Tuberculosis Classification. Sensors 2026, 26, 2100. https://doi.org/10.3390/s26072100

