EyeXNet: Enhancing Abnormality Detection and Diagnosis via Eye-Tracking and X-ray Fusion
Abstract
1. Introduction
- EyeXNet, a DL framework that combines chest X-ray images with fixation maps recorded during radiologists' reporting moments, aiming to improve the performance of automatic abnormality detection in chest X-ray images (a minimal fusion sketch follows this list).
- To the best of our knowledge, our approach is the first in the literature to filter out gaze recorded during non-reporting moments, yielding more meaningful eye-tracking data as a proxy for radiologists' attention in the abnormality detection network's training process.
- A comprehensive evaluation of EyeXNet using various DL architectures, including ConvNext, DenseNet, VGG, MobileNet, RegNet, and EfficientNet, and a comparison of their performance with baseline models that rely solely on image-level information.
- An analysis involving a think-aloud experiment with two experienced radiologists, shedding light on the challenges radiologists face during their assessments and how these challenges may influence the performance of DL models for chest X-ray abnormality detection.
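To make the fusion idea in the first contribution concrete, the following is a minimal sketch (in PyTorch) of a two-stream backbone that encodes the CXR image and the fixation map separately and fuses the features before a detection head. It is an illustrative assumption about the fusion style, not the published EyeXNet implementation: the backbone choice, channel widths, and additive fusion are placeholders.

```python
import torch
import torch.nn as nn
from torchvision import models


class TwoStreamFusionBackbone(nn.Module):
    """Illustrative two-stream encoder: one branch for the CXR image,
    one for the radiologist fixation map, fused at the feature level.
    This is a hypothetical sketch, not the published EyeXNet code."""

    def __init__(self, out_channels: int = 256):
        super().__init__()
        # Image branch: a standard ImageNet backbone (MobileNetV2 features as a placeholder).
        self.image_branch = models.mobilenet_v2(weights=None).features
        # Fixation branch: a lightweight CNN over the single-channel heatmap.
        self.fixation_branch = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # 1x1 convolutions project both streams to a common channel width.
        self.image_proj = nn.Conv2d(1280, out_channels, kernel_size=1)
        self.fixation_proj = nn.Conv2d(128, out_channels, kernel_size=1)

    def forward(self, image: torch.Tensor, fixation_map: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_proj(self.image_branch(image))
        fix_feat = self.fixation_proj(self.fixation_branch(fixation_map))
        # Resize the fixation features to the image-feature grid and fuse by addition.
        fix_feat = nn.functional.interpolate(
            fix_feat, size=img_feat.shape[-2:], mode="bilinear", align_corners=False
        )
        return img_feat + fix_feat  # fused feature map, fed to a detection head


# Example: a 3-channel CXR image and a 1-channel fixation heatmap.
model = TwoStreamFusionBackbone()
fused = model(torch.randn(1, 3, 512, 512), torch.randn(1, 1, 512, 512))
print(fused.shape)  # torch.Size([1, 256, 16, 16])
```

In the evaluated configurations, the image branch would be swapped for each backbone listed in the results tables (ConvNext, DenseNet, VGG, MobileNet, RegNet, EfficientNet).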
2. Key Eye-Tracking Concepts in Radiology
3. Related Work
4. EyeXNet Architecture
- A set of CXR images;
- A set of eye-tracking fixation heatmaps aligned with the CXR images (a sketch of how such a heatmap can be built from raw fixations follows this list).
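Since the fixation heatmaps are derived from raw eye-tracking fixations, one common way to build them is to accumulate duration-weighted Gaussians around each fixation point. The sketch below illustrates this under assumed parameters (the kernel width and normalization are placeholders, not the paper's exact preprocessing).

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def fixations_to_heatmap(fixations, image_shape, sigma=32.0):
    """Build a duration-weighted fixation heatmap for one CXR image.

    fixations:   iterable of (x, y, duration_seconds) in pixel coordinates.
    image_shape: (height, width) of the X-ray image.
    sigma:       Gaussian spread in pixels (an assumed value, roughly the
                 useful field of view; not taken from the paper).
    """
    h, w = image_shape
    heatmap = np.zeros((h, w), dtype=np.float32)
    for x, y, duration in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < h and 0 <= xi < w:
            heatmap[yi, xi] += duration  # weight each fixation by dwell time
    heatmap = gaussian_filter(heatmap, sigma=sigma)  # spread fixations spatially
    if heatmap.max() > 0:
        heatmap /= heatmap.max()  # normalize to [0, 1] for use as a model input
    return heatmap


# Example: three fixations on a 1024x1024 chest X-ray.
example = fixations_to_heatmap(
    [(400, 300, 0.45), (410, 320, 0.30), (700, 650, 0.80)], (1024, 1024)
)
print(example.shape, example.max())
```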
5. EyeXNet Complexity Analysis
6. Experimental Setup
6.1. Models
6.2. Dataset
6.3. Data Preprocessing
6.4. Evaluation
6.5. Model Complexity Analysis
7. Results
Human Grounded Results
8. Discussion
8.1. Redundancy in Fixation Masks
8.2. Model Architecture
8.3. Data Quality Challenges
9. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Parker, M.S.; Chasen, M.H.; Paul, N. Radiologic Signs in Thoracic Imaging: Case-Based Review and Self-Assessment Module. Am. J. Roentgenol. 2009, 192, S34–S48. [Google Scholar] [CrossRef] [PubMed]
- Moses, D.A. Deep learning applied to automatic disease detection using chest X-rays. J. Med. Imaging Radiat. Oncol. 2021, 65, 498–517. [Google Scholar] [CrossRef] [PubMed]
- Bigolin Lanfredi, R.; Zhang, M.; Auffermann, W.F.; Chan, J.; Duong, P.A.T.; Srikumar, V.; Drew, T.; Schroeder, J.D.; Tasdizen, T. REFLACX, a dataset of reports and eye-tracking data for localization of abnormalities in chest x-rays. Sci. Data 2022, 9, 350. [Google Scholar] [CrossRef] [PubMed]
- Karargyris, A.; Kashyap, S.; Lourentzou, I.; Wu, J.T.; Sharma, A.; Tong, M.; Abedin, S.; Beymer, D.; Mukherjee, V.; Krupinski, E.A.; et al. Creation and validation of a chest X-ray dataset with eye-tracking and report dictation for AI development. Sci. Data 2021, 8, 92. [Google Scholar] [CrossRef]
- Luís, A.; Hsieh, C.; Nobre, I.B.; Sousa, S.C.; Maciel, A.; Moreira, C.; Jorge, J. Integrating Eye-Gaze Data into CXR DL Approaches: A Preliminary study. In Proceedings of the 2023 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), Shanghai, China, 25–29 March 2023. [Google Scholar]
- Pershin, I.; Mustafaev, T.; Ibragimova, D.; Ibragimov, B. Changes in Radiologists’ Gaze Patterns Against Lung X-rays with Different Abnormalities: A Randomized Experiment. J. Digit. Imaging 2023, 36, 767–775. [Google Scholar] [CrossRef] [PubMed]
- Castner, N.; Kuebler, T.C.; Scheiter, K.; Richter, J.; Eder, T.; Hüttig, F.; Keutel, C.; Kasneci, E. Deep semantic gaze embedding and scanpath comparison for expertise classification during OPT viewing. In Proceedings of the ACM Symposium on Eye Tracking Research and Applications, Stuttgart, Germany, 2–5 June 2020; pp. 1–10. [Google Scholar]
- Saporta, A.; Gui, X.; Agrawal, A.; Pareek, A.; Truong, S.Q.; Nguyen, C.D.; Ngo, V.D.; Seekins, J.; Blankenberg, F.G.; Ng, A.Y.; et al. Benchmarking saliency methods for chest X-ray interpretation. Nat. Mach. Intell. 2022, 4, 867–878. [Google Scholar] [CrossRef]
- Neves, J.; Hsieh, C.; Nobre, I.B.; Sousa, S.C.; Ouyang, C.; Maciel, A.; Duchowski, A.; Jorge, J.; Moreira, C. Shedding light on ai in radiology: A systematic review and taxonomy of eye gaze-driven interpretability in deep learning. Eur. J. Radiol. 2024, 172, 111341. [Google Scholar] [CrossRef] [PubMed]
- Wang, S.; Ouyang, X.; Liu, T.; Wang, Q.; Shen, D. Follow My Eye: Using Gaze to Supervise Computer-Aided Diagnosis. IEEE Trans. Med. Imaging 2022, 41, 1688–1698. [Google Scholar] [CrossRef] [PubMed]
- Saab, K.; Hooper, S.M.; Sohoni, N.S.; Parmar, J.; Pogatchnik, B.; Wu, S.; Dunnmon, J.A.; Zhang, H.R.; Rubin, D.; Ré, C. Observational Supervision for Medical Image Classification Using Gaze Data. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2021, Strasbourg, France, 27 September–1 October 2021; de Bruijne, M., Cattin, P.C., Cotin, S., Padoy, N., Speidel, S., Zheng, Y., Essert, C., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 603–614. [Google Scholar]
- Watanabe, A.; Ketabi, S.; Namdar, K.; Khalvati, F. Improving disease classification performance and explainability of deep learning models in radiology with heatmap generators. Front. Radiol. 2022, 2, 991683. [Google Scholar] [CrossRef]
- Nie, W.; Zhang, Y.; Patel, A. A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations. arXiv 2018, arXiv:1805.07039. [Google Scholar]
- Agnihotri, P.; Ketabi, S.; Khalvati, F. Using Multi-modal Data for Improving Generalizability and Explainability of Disease Classification in Radiology. arXiv 2022, arXiv:2207.14781. [Google Scholar]
- Qi, Z.; Khorram, S.; Li, F. Visualizing Deep Networks by Optimizing with Integrated Gradients. arXiv 2019, arXiv:1905.00954. [Google Scholar] [CrossRef]
- Lanfredi, R.B.; Arora, A.; Drew, T.; Schroeder, J.D.; Tasdizen, T. Comparing radiologists’ gaze and saliency maps generated by interpretability methods for chest X-rays. arXiv 2021, arXiv:2112.11716. [Google Scholar]
- Moreira, C.; Alvito, D.; Sousa, S.C.; Nobre, I.B.; Ouyang, C.; Kopper, R.; Duchowski, A.; Jorge, J. Comparing Visual Search Patterns in Chest X-ray Diagnostics. In Proceedings of the ACM on Computer Graphics and Interactive Techniques (ETRA), Tübingen, Germany, 29 May–3 June 2023. [Google Scholar]
- Shneiderman, B. Human-Centered AI; Oxford University Press: Oxford, UK, 2022. [Google Scholar]
- El Kafhali, S.; Alzubaidi, L.; Al-Sabaawi, A.; Bai, J.; Dukhan, A.; Alkenani, A.H.; Al-Asadi, A.; Alwzwazy, H.A.; Manoufali, M.; Fadhel, M.A.; et al. Towards Risk-Free Trustworthy Artificial Intelligence: Significance and Requirements. Int. J. Intell. Syst. 2023, 2023, 4459198. [Google Scholar] [CrossRef]
- Stöger, K.; Schneeberger, D.; Holzinger, A. Medical artificial intelligence: The European legal perspective. Commun. ACM 2021, 64, 34–36. [Google Scholar] [CrossRef]
- Holmqvist, K.; Andersson, R. Eye-Tracking: A Comprehensive Guide to Methods, Paradigms and Measures; Oxford University Press: Oxford, UK, 2017. [Google Scholar]
- Nodine, C.F.; Kundel, H.L. Using eye movements to study visual search and to improve tumor detection. RadioGraphics 1987, 7, 1241–1250. [Google Scholar] [CrossRef] [PubMed]
- Duchowski, A.T. Eye Tracking Methodology: Theory and Practice; Springer: Berlin/Heidelberg, Germany, 2003. [Google Scholar]
- Rong, Y.; Xu, W.; Akata, Z.; Kasneci, E. Human Attention in Fine-grained Classification. In Proceedings of the 32nd British Machine Vision Conference (BMVC), Online, 22–25 November 2021. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long Short-term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
- Woo, S.; Park, J.; Lee, J.; Kweon, I.S. CBAM: Convolutional Block Attention Module. arXiv 2018, arXiv:1807.06521. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Carmody, D.; Nodine, C.; Kundel, H. Finding lung nodules with and without comparative visual scanning. Percept. Psychophys. 1981, 29, 594–598. [Google Scholar] [CrossRef]
- Krupinski, E.A. Visual scanning patterns of radiologists searching mammograms. Acad. Radiol. 1996, 3, 137–144. [Google Scholar] [CrossRef]
- Hu, C.H.; Kundel, H.L.; Nodine, C.F.; Krupinski, E.A.; Toto, L.C. Searching for bone fractures: A comparison with pulmonary nodule search. Acad. Radiol. 1994, 1, 25–32. [Google Scholar] [CrossRef] [PubMed]
- Kendall, A.; Gal, Y.; Cipolla, R. Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics. arXiv 2017, arXiv:1705.07115. [Google Scholar]
- Zeiler, M.D.; Fergus, R. Visualizing and Understanding Convolutional Networks. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014. [Google Scholar]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
- Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning. PMLR, Long Beach, CA, USA, 10–15 June 2019; pp. 6105–6114. [Google Scholar]
- Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986. [Google Scholar]
- Simonyan, K.; Zisserman, A. Two-stream convolutional networks for action recognition in videos. Adv. Neural Inf. Process. Syst. 2014, 27, 568–576. [Google Scholar]
- Radosavovic, I.; Kosaraju, R.P.; Girshick, R.; He, K.; Dollár, P. Designing network design spaces. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10428–10436. [Google Scholar]
- Hsieh, C.; Ouyang, C.; Nascimento, J.C.; Pereira, J.; Jorge, J.; Moreira, C. MIMIC-Eye: Integrating MIMIC Datasets with REFLACX and Eye Gaze for Multimodal Deep Learning Applications (version 1.0.0). PhysioNet 2023. [Google Scholar] [CrossRef]
- Johnson, A.E.; Pollard, T.J.; Berkowitz, S.J.; Greenbaum, N.R.; Lungren, M.P.; Deng, C.Y.; Mark, R.G.; Horng, S. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci. Data 2019, 6, 317. [Google Scholar] [CrossRef] [PubMed]
- Johnson, A.E.; Pollard, T.J.; Berkowitz, S.J.; Greenbaum, N.R.; Lungren, M.P.; Deng, C.Y.; Mark, R.G.; Horng, S. MIMIC-CXR Database (version 2.0.0). PhysioNet 2019. [Google Scholar] [CrossRef]
- Johnson, A.E.; Pollard, T.J.; Greenbaum, N.R.; Lungren, M.P.; Deng, C.Y.; Peng, Y.; Lu, Z.; Mark, R.G.; Berkowitz, S.J.; Horng, S. MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs. arXiv 2019, arXiv:1901.07042. [Google Scholar]
- Johnson, A.; Bulgarelli, L.; Pollard, T.; Celi, L.A.; Mark, R.; Horng, S. MIMIC-IV-ED (version 2.2). PhysioNet 2023. [Google Scholar] [CrossRef]
- Johnson, A.; Bulgarelli, L.; Pollard, T.; Horng, S.; Celi, L.A.; Mark, R. MIMIC-IV (version 2.2). PhysioNet 2023. [Google Scholar] [CrossRef]
- Goldberger, A.; Amaral, L.; Glass, L.; Hausdorff, J.; Ivanov, P.; Mark, R.; Mietus, J.; Moody, G.; Peng, C.; Stanley, H. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 2000, 101, e215–e220. [Google Scholar] [CrossRef] [PubMed]
- Chakraborty, D.P. A brief history of free-response receiver operating characteristic paradigm data analysis. Acad. Radiol. 2013, 20, 915–919. [Google Scholar] [CrossRef]
- Ganesan, A.; Alakhras, M.; Brennan, P.C.; Mello-Thoms, C. A review of factors influencing radiologists’ visual search behaviour. J. Med. Imaging Radiat. Oncol. 2018, 62, 747–757. [Google Scholar] [CrossRef] [PubMed]
- Hsieh, C.; Nobre, I.B.; Sousa, S.C.; Ouyang, C.; Brereton, M.; Nascimento, J.C.; Jorge, J.; Moreira, C. MDF-Net: Multimodal Dual-Fusion Network for Abnormality Detection using CXR Images and Clinical Data. arXiv 2023, arXiv:2302.13390. [Google Scholar]
- Borys, M.; Plechawska-Wójcik, M. Eye-tracking metrics in perception and visual attention research. EJMT 2017, 3, 11–23. [Google Scholar]
- Harezlak, K.; Kasprowski, P. Application of eye tracking in medicine: A survey, research issues and challenges. Comput. Med. Imaging Graph. 2018, 65, 176–190. [Google Scholar] [CrossRef] [PubMed]
- Mall, S.; Brennan, P.C.; Mello-Thoms, C.R. Modeling visual search behavior of breast radiologists using a deep convolution neural network. J. Med. Imaging 2018, 5, 035502. [Google Scholar] [CrossRef]
- Mall, S.; Brennan, P.C.; Mello-Thoms, C. Can a Machine Learn from Radiologists’ Visual Search Behaviour and Their Interpretation of Mammograms—A Deep-Learning Study. J. Digit. Imaging 2019, 32, 746–760. [Google Scholar] [CrossRef] [PubMed]
- Mall, S.; Brennan, P.; Mello-Thoms, C. Fixated and Not Fixated Regions of Mammograms, A Higher-Order Statistical Analysis of Visual Search Behavior. Acad. Radiol. 2017, 24, 442–455. [Google Scholar] [CrossRef]
- Khosravan, N.; Celik, H.; Turkbey, B.; Jones, E.C.; Wood, B.; Bagci, U. A collaborative computer aided diagnosis (C-CAD) system with eye-tracking, sparse attentional model, and deep learning. Med. Image Anal. 2019, 51, 101–115. [Google Scholar] [CrossRef] [PubMed]
Backbone | Setting | Sensitivity @[0.5] | Sensitivity @[1.0] | Sensitivity @[2.0] | Sensitivity @[4.0] | mFROC @[0.5, 1, 2, 4] | mRecall
---|---|---|---|---|---|---|---
MobileNet [33] | image only | 0.489 | 0.665 | 0.813 | 0.916 | 0.720 | 0.345
MobileNet [33] | fixation maps | 0.520 | 0.686 | 0.830 | 0.946 | 0.745 | 0.448
ResNet18 [34] | image only | 0.547 | 0.700 | 0.844 | 0.927 | 0.755 | 0.348
ResNet18 [34] | fixation maps | 0.571 | 0.735 | 0.881 | 0.969 | 0.789 | 0.448
DenseNet161 [35] | image only | 0.597 | 0.796 | 0.922 | 0.967 | 0.820 | 0.396
DenseNet161 [35] | fixation maps | 0.564 | 0.721 | 0.860 | 0.957 | 0.775 | 0.414
EfficientNetB5 [36] | image only | 0.565 | 0.744 | 0.886 | 0.951 | 0.787 | 0.491
EfficientNetB5 [36] | fixation maps | 0.566 | 0.752 | 0.883 | 0.961 | 0.791 | 0.391
EfficientNetB0 [36] | image only | 0.575 | 0.750 | 0.877 | 0.947 | 0.786 | 0.325
EfficientNetB0 [36] | fixation maps | 0.581 | 0.764 | 0.895 | 0.971 | 0.803 | 0.414
ConvNextNet [37] | image only | 0.574 | 0.748 | 0.902 | 0.965 | 0.797 | 0.457
ConvNextNet [37] | fixation maps | 0.597 | 0.759 | 0.907 | 0.975 | 0.808 | 0.491
VGG16 [38] | image only | 0.570 | 0.753 | 0.897 | 0.970 | 0.798 | 0.443
VGG16 [38] | fixation maps | 0.582 | 0.784 | 0.914 | 0.987 | 0.817 | 0.430
RegNet [39] | image only | 0.432 | 0.626 | 0.786 | 0.910 | 0.688 | 0.470
RegNet [39] | fixation maps | 0.565 | 0.734 | 0.869 | 0.959 | 0.782 | 0.430
Overall Best Model | | ConvNextNet [w/fixations] | DenseNet161 [image only] | DenseNet161 [image only] | VGG16 [w/fixations] | DenseNet161 [image only] | ConvNextNet [w/fixations]
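The sensitivities above are FROC operating points: the fraction of ground-truth lesions detected when the detector is allowed, on average, 0.5, 1, 2, or 4 false positives per image, with mFROC being their mean. Below is a hedged sketch of how such numbers can be computed from scored detections; the box-to-lesion matching criterion and the handling of duplicate detections are simplifying assumptions, not the paper's exact protocol.

```python
import numpy as np


def froc_sensitivities(per_image, num_images, fp_rates=(0.5, 1.0, 2.0, 4.0)):
    """Compute FROC sensitivity at chosen false-positive rates per image.

    per_image: list of dicts, one per image, each with
        'scores': confidence of every predicted box (np.ndarray),
        'is_tp':  bool array, True if the prediction matched a ground-truth
                  lesion (the matching rule, e.g. IoU >= 0.5, is assumed, and
                  duplicate detections of the same lesion are assumed removed),
        'num_gt': number of ground-truth lesions in the image.
    """
    scores = np.concatenate([img["scores"] for img in per_image])
    is_tp = np.concatenate([img["is_tp"] for img in per_image])
    total_gt = sum(img["num_gt"] for img in per_image)

    order = np.argsort(-scores)              # sweep thresholds from high to low score
    tp_cum = np.cumsum(is_tp[order])
    fp_cum = np.cumsum(~is_tp[order])
    sensitivity = tp_cum / max(total_gt, 1)
    fps_per_image = fp_cum / num_images

    out = {}
    for rate in fp_rates:
        idx = np.searchsorted(fps_per_image, rate, side="right") - 1
        out[rate] = float(sensitivity[idx]) if idx >= 0 else 0.0
    out["mFROC"] = float(np.mean([out[r] for r in fp_rates]))
    return out


# Toy example with two images.
print(froc_sensitivities(
    [
        {"scores": np.array([0.9, 0.4]), "is_tp": np.array([True, False]), "num_gt": 2},
        {"scores": np.array([0.8, 0.3]), "is_tp": np.array([True, False]), "num_gt": 1},
    ],
    num_images=2,
))
```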
Backbone | Setting | #TP | #FP | #FN | mRecall | mPrecision
---|---|---|---|---|---|---
MobileNet | image only | 690 | 3387 | 844 | 0.345 | 0.096
MobileNet | fixation maps | 931 | 6485 | 603 | 0.448 | 0.111
ResNet18 | image only | 685 | 3014 | 849 | 0.348 | 0.110
ResNet18 | fixation maps | 878 | 5197 | 656 | 0.448 | 0.124
DenseNet161 | image only | 849 | 4920 | 685 | 0.396 | 0.105
DenseNet161 | fixation maps | 821 | 3983 | 713 | 0.414 | 0.123
EfficientNetB5 | image only | 915 | 8657 | 619 | 0.491 | 0.126
EfficientNetB5 | fixation maps | 819 | 3710 | 715 | 0.391 | 0.117
EfficientNetB0 | image only | 691 | 2707 | 843 | 0.325 | 0.103
EfficientNetB0 | fixation maps | 842 | 3755 | 692 | 0.414 | 0.123
ConvNextNet | image only | 905 | 5145 | 629 | 0.457 | 0.138
ConvNextNet | fixation maps | 945 | 4758 | 589 | 0.491 | 0.150
VGG16 | image only | 918 | 6122 | 616 | 0.443 | 0.122
VGG16 | fixation maps | 888 | 4228 | 646 | 0.430 | 0.120
RegNet | image only | 945 | 8008 | 590 | 0.470 | 0.095
RegNet | fixation maps | 848 | 4260 | 686 | 0.430 | 0.118
Overall Best Model | | ConvNextNet [w/fixations] | EfficientNetB0 [image only] | ConvNextNet [w/fixations] | ConvNextNet [w/fixations] | ConvNextNet [w/fixations]
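The TP/FP/FN counts above depend on how predicted boxes are matched to ground-truth lesion annotations. Below is a minimal sketch of one plausible greedy matching scheme at an assumed IoU threshold of 0.5; the paper's actual matching rule and the way mRecall/mPrecision are averaged across images are not reproduced here.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0


def count_tp_fp_fn(pred_boxes, gt_boxes, iou_threshold=0.5):
    """Greedy matching: each ground-truth lesion can be claimed by at most one
    prediction; unmatched predictions count as FPs, unmatched lesions as FNs.
    The 0.5 IoU threshold is an assumption, not necessarily the paper's setting."""
    matched_gt = set()
    tp = 0
    for pb in pred_boxes:  # assumes predictions are pre-sorted by confidence
        best_j, best_iou = None, iou_threshold
        for j, gb in enumerate(gt_boxes):
            if j in matched_gt:
                continue
            overlap = iou(pb, gb)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched_gt.add(best_j)
            tp += 1
    fp = len(pred_boxes) - tp
    fn = len(gt_boxes) - tp
    return tp, fp, fn


# Example: two predictions, one ground-truth lesion.
print(count_tp_fp_fn([(10, 10, 50, 50), (200, 200, 240, 240)], [(12, 12, 48, 48)]))
# -> (1, 1, 0)
```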
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation: Hsieh, C.; Luís, A.; Neves, J.; Nobre, I.B.; Sousa, S.C.; Ouyang, C.; Jorge, J.; Moreira, C. EyeXNet: Enhancing Abnormality Detection and Diagnosis via Eye-Tracking and X-ray Fusion. Mach. Learn. Knowl. Extr. 2024, 6, 1055–1071. https://doi.org/10.3390/make6020048