Deep Learning for Laryngopharyngeal Reflux Diagnosis
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Collection
2.2. Backbone Network
2.3. Attention Modules
2.4. Dealing with Data Deficiency
3. Results
3.1. Backbone Network with Attention Modules
3.2. Transfer Learning
3.3. Few-Shot Learning
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Feature Size | ResNet18 | ResNet18_SE | ResNet18_CBAM | ResNet50 | ResNet101 |
---|---|---|---|---|---|
112 × 112 | 7 × 7, 64, stride 2 | ||||
56 × 56 | max pooling, 3 × 3, stride 2 | ||||
28 × 28 | |||||
14 × 14 | |||||
7 × 7 | |||||
1 × 1 | average pooling, 1D fc, softmax | ||||
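The Feature Size column follows from standard convolution arithmetic. The sketch below reproduces it under assumptions not stated in the table: a 224 × 224 input, padding of 3 for the 7 × 7 convolution, padding of 1 for the 3 × 3 pooling, and one stride-2 downsampling step (modelled here as a 3 × 3 convolution with padding 1) at the start of each later stage. These are the usual ResNet settings, but the function names and defaults are illustrative only.

```python
def conv_out(size, kernel, stride, padding):
    """Output width of a convolution or pooling layer (floor convention)."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet_feature_sizes(input_size=224):
    """Spatial size after each stage of a ResNet-style backbone.

    input_size=224 and all padding values are assumptions, not taken
    from the table itself.
    """
    sizes = []
    s = conv_out(input_size, kernel=7, stride=2, padding=3)  # 7x7 conv, stride 2
    sizes.append(s)
    s = conv_out(s, kernel=3, stride=2, padding=1)  # 3x3 max pooling, stride 2
    sizes.append(s)
    for _ in range(3):
        # Assumed: each later stage halves the resolution once.
        s = conv_out(s, kernel=3, stride=2, padding=1)
        sizes.append(s)
    sizes.append(1)  # global average pooling collapses the map to 1x1
    return sizes


print(resnet_feature_sizes())
```

With these assumptions the returned sizes match the six rows of the table, from 112 down to 1.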
Model | Accuracy | AUC Value |
---|---|---|
ResNet18 | 0.688 ± 0.089 | 0.704 ± 0.095 |
ResNet18_SE | 0.697 ± 0.086 | 0.715 ± 0.123 |
ResNet18_CBAM | 0.729 ± 0.050 | 0.739 ± 0.068 |
ResNet50 | 0.707 ± 0.126 | 0.719 ± 0.117 |
ResNet50_SE | 0.659 ± 0.088 | 0.680 ± 0.116 |
ResNet50_CBAM | 0.734 ± 0.101 | 0.719 ± 0.132 |
ResNet101 | 0.672 ± 0.108 | 0.678 ± 0.115 |
ResNet101_SE | 0.656 ± 0.074 | 0.682 ± 0.103 |
ResNet101_CBAM | 0.679 ± 0.110 | 0.691 ± 0.096 |
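Each entry above is a mean ± spread over repeated evaluations. The following is a minimal sketch of how such entries can be computed, assuming binary labels, one score per image, and a sample standard deviation taken over several runs or folds. The table does not state the evaluation protocol, so the 0.5 threshold, the helper names, and the choice of `stdev` are assumptions.

```python
from statistics import mean, stdev


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score equals the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a
    random negative; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize(values):
    """Format per-run values as 'mean ± sample standard deviation'."""
    return f"{mean(values):.3f} ± {stdev(values):.3f}"


# Hypothetical toy data, not taken from the study.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(accuracy(labels, scores), auc(labels, scores))
print(summarize([0.70, 0.80]))
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for small test sets; a rank-based formulation would be preferable for large ones.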
Model | Accuracy | AUC Value |
---|---|---|
ResNet18 | 0.688 ± 0.089 | 0.704 ± 0.095 |
ResNet18 + freeze | 0.702 ± 0.112 | 0.762 ± 0.106 |
ResNet18 + finetune | 0.676 ± 0.078 | 0.694 ± 0.078 |
ResNet18_CBAM | 0.729 ± 0.050 | 0.739 ± 0.068 |
ResNet18_CBAM + freeze | 0.593 ± 0.158 | 0.659 ± 0.122 |
ResNet18_CBAM + finetune | 0.626 ± 0.096 | 0.724 ± 0.103 |
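The "freeze" and "finetune" rows differ in which pretrained parameters are allowed to change during training. The toy sketch below illustrates that distinction only. It assumes that "freeze" keeps the pretrained backbone fixed and trains just the classifier, while "finetune" updates everything; the table does not spell this out. The parameter groups, gradients, and learning rate are hypothetical, and a real experiment would use a deep learning framework rather than plain lists.

```python
def sgd_step(params, grads, trainable, lr=0.1):
    """One SGD update that skips parameter groups marked as frozen."""
    updated = {}
    for name, weights in params.items():
        if trainable[name]:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
        else:
            updated[name] = list(weights)  # frozen: copied unchanged
    return updated


# Hypothetical pretrained weights and gradients for two parameter groups.
pretrained = {"backbone": [0.5, -0.2], "classifier": [0.0, 0.0]}
grads = {"backbone": [1.0, 1.0], "classifier": [2.0, -2.0]}

FREEZE = {"backbone": False, "classifier": True}   # train the classifier only
FINETUNE = {"backbone": True, "classifier": True}  # train every group

frozen = sgd_step(pretrained, grads, FREEZE)
finetuned = sgd_step(pretrained, grads, FINETUNE)
print(frozen["backbone"], finetuned["backbone"])
```

Under the freeze setting the backbone weights come back identical to the pretrained ones, whereas under finetune they move along the gradient; the classifier is updated identically in both cases.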
Model | Accuracy | AUC Value |
---|---|---|
ProtoNet | 0.578 | 0.610 |
ProtoNet + finetune | 0.632 | 0.623 |
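ProtoNet refers to prototypical networks, which classify a query by its distance to per-class mean embeddings. The sketch below shows that classification rule in isolation, assuming embeddings have already been produced by some encoder. The two-dimensional vectors and the class names "LPR" and "normal" are hypothetical placeholders, not data from the study.

```python
import math


def prototype(embeddings):
    """Class prototype: element-wise mean of the support embeddings."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, support):
    """Return (predicted label, class probabilities) for one query.

    support maps each label to a list of embedding vectors. The
    probabilities are a softmax over negative squared distances.
    """
    protos = {label: prototype(vecs) for label, vecs in support.items()}
    logits = {label: -sq_dist(query, p) for label, p in protos.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    return max(probs, key=probs.get), probs


# Hypothetical support set with two embeddings per class.
support = {
    "LPR": [[1.0, 1.0], [1.0, 3.0]],
    "normal": [[-1.0, -1.0], [-3.0, -1.0]],
}
label, probs = classify([0.5, 1.5], support)
print(label, probs)
```

In a full few-shot pipeline the encoder is trained episodically so that this nearest-prototype rule works on unseen classes; only the inference step is sketched here.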
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ye, G.; Du, C.; Lin, T.; Yan, Y.; Jiang, J. Deep Learning for Laryngopharyngeal Reflux Diagnosis. Appl. Sci. 2021, 11, 4753. https://doi.org/10.3390/app11114753