Risk-Aware Machine Learning Classifier for Skin Lesion Diagnosis
Abstract
1. Introduction
- We propose a deep neural network (DNN)-based computer-aided diagnosis (CAD) model that uses approximate Bayesian inference to output an uncertainty estimate alongside each prediction in skin lesion classification. The framework is general enough to support a wide range of medical machine learning tasks and applications. Our results show that these confidence ratings improve the diagnostic performance of the CAD–physician team and reduce physician workload.
- We formulate metrics for evaluating the uncertainty estimates produced by Bayesian models. These metrics let us compare the quality of uncertainty estimates across models, and they guide the choice of an appropriate uncertainty threshold for rejecting samples and referring them to the physician for further inspection.
- We provide an in-depth analysis showing that the uncertainty-aware referral system built on Bayesian deep networks improves team diagnosis accuracy on the NV, BCC, AKIEC, BKL, and VASC lesion types.
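To make the idea behind these contributions concrete, the following is a minimal sketch of MC-dropout inference with uncertainty-based referral. It is not the implementation used in this work: the toy linear classifier, the function names (`stochastic_forward`, `mc_dropout_predict`, `predict_or_refer`), the use of predictive entropy as the uncertainty score, and the default values are all illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_forward(features, weights, drop_rate, rng):
    """One pass of a toy linear classifier with dropout kept ON at test time.

    Each input feature is dropped with probability `drop_rate`; survivors
    are rescaled by 1 / (1 - drop_rate) (inverted dropout).
    Assumes 0 <= drop_rate < 1.
    """
    keep = 1.0 - drop_rate
    masked = [x / keep if rng.random() < keep else 0.0 for x in features]
    logits = [sum(w * x for w, x in zip(row, masked)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(features, weights, T=10, drop_rate=0.5, seed=0):
    """Average T stochastic passes; return (mean_probs, predictive_entropy)."""
    rng = random.Random(seed)
    mean = [0.0] * len(weights)
    for _ in range(T):
        probs = stochastic_forward(features, weights, drop_rate, rng)
        mean = [m + p / T for m, p in zip(mean, probs)]
    # Entropy of the averaged distribution (hypothetical choice of score).
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    return mean, entropy


def predict_or_refer(features, weights, threshold, **kwargs):
    """Return the predicted class index, or None to refer to a physician."""
    mean, entropy = mc_dropout_predict(features, weights, **kwargs)
    if entropy > threshold:
        return None  # too uncertain: hand the case over
    return max(range(len(mean)), key=mean.__getitem__)


# Toy usage: 3 classes, 2 input features (values are made up).
toy_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
decision = predict_or_refer([2.0, 0.5], toy_weights, threshold=0.9, T=20)
print("referred to physician" if decision is None else f"class {decision}")
```

In this sketch, `T` plays the role of the number of Monte Carlo samples, and `threshold` controls the trade-off between how many cases the model decides on its own and how many are passed to the physician.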
2. Related Work
2.1. Physician–CAD Interaction
2.2. Skin Lesion Diagnosis
2.3. Uncertainty Estimation
3. Materials and Methods
3.1. Uncertainty Estimation via Bayesian Neural Networks
3.2. MC-Dropout for Bayesian Neural Network Approximation
3.3. Uncertainty Evaluation Metrics
3.4. Approximate Bayesian Network Building Strategy
3.5. Training Procedure
4. Data
4.1. Data Description
4.2. Data Preparation
5. Experimental Results
5.1. Bayesian Architecture Designs
5.2. Prediction Performance of the Bayesian Models
5.3. Uncertainty Estimation Performance of the Bayesian Models
5.4. Uncertainty-Aware Skin Lesion Classification and Referral
5.5. Lesion-Specific Performance Analysis of Bayesian DenseNet-169
6. Discussion
Software and Code Availability
7. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
Abbreviation | Meaning |
---|---|
DNN | Deep Neural Network |
BNN | Bayesian Neural Network |
MC | Monte Carlo |
iu | incorrect-uncertain |
cc | correct-certain |
ic | incorrect-certain |
cu | correct-uncertain |
UA | Uncertainty Accuracy |
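The four groups abbreviated above (cc, cu, ic, iu) can be counted by crossing prediction correctness with a thresholded uncertainty score. The sketch below shows one plausible way to do this. The formula used for UA, the fraction of samples that are either correct-certain or incorrect-uncertain, is an assumption made for illustration, as are the function names and the toy data.

```python
def uncertainty_confusion(correct, uncertainty, threshold):
    """Count cc / cu / ic / iu samples.

    correct:     iterable of bools, True if the prediction was right
    uncertainty: iterable of floats, one uncertainty score per sample
    threshold:   scores strictly above this value count as "uncertain"
    """
    counts = {"cc": 0, "cu": 0, "ic": 0, "iu": 0}
    for is_correct, u in zip(correct, uncertainty):
        key = ("c" if is_correct else "i") + ("u" if u > threshold else "c")
        counts[key] += 1
    return counts


def uncertainty_accuracy(counts):
    """Assumed definition: UA = (n_cc + n_iu) / (n_cc + n_cu + n_ic + n_iu)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no samples to evaluate")
    return (counts["cc"] + counts["iu"]) / total


# Toy data (made up): five predictions with their uncertainty scores.
correct = [True, True, False, False, True]
uncertainty = [0.1, 0.8, 0.9, 0.2, 0.3]

counts = uncertainty_confusion(correct, uncertainty, threshold=0.5)
print(counts)                        # {'cc': 2, 'cu': 1, 'ic': 1, 'iu': 1}
print(uncertainty_accuracy(counts))  # 0.6
```

Sweeping `threshold` over a grid and recomputing the score at each value is one simple way to look for a referral threshold, under the same assumptions.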
Appendix A
Appendix A.1. Dropout as Approximate Variational Inference in Bayesian Neural Networks
Appendix A.2. Bayesian Model Architectures
References
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436.
- Liao, F.; Liang, M.; Li, Z.; Hu, X.; Song, S. Evaluate the Malignancy of Pulmonary Nodules Using the 3-D Deep Leaky Noisy-or Network. IEEE Trans. Neural Netw. Learn. Syst. 2019, 1–12.
- Mobiny, A.; Van Nguyen, H. Fast capsnet for lung cancer screening. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Granada, Spain, 16–20 September 2018; Springer: Cham, Switzerland, 2018; pp. 741–749.
- Berisha, S.; Lotfollahi, M.; Jahanipour, J.; Gurcan, I.; Walsh, M.; Bhargava, R.; Van Nguyen, H.; Mayerich, D. Deep learning for FTIR histology: Leveraging spatial and spectral features with convolutional neural networks. Analyst 2019, 144, 1642–1653.
- Dou, Q.; Chen, H.; Yu, L.; Zhao, L.; Qin, J.; Wang, D.; Mok, V.C.T.; Shi, L.; Heng, P. Automatic detection of cerebral microbleeds from MR images via 3D convolutional neural networks. IEEE Trans. Med. Imaging 2016, 35, 1182–1195.
- Gulshan, V.; Peng, L.; Coram, M.; Stumpe, M.C.; Wu, D.; Narayanaswamy, A.; Venugopalan, S.; Widner, K.; Madams, T.; Cuadros, J.; et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 2016, 316, 2402–2410.
- Esteva, A.; Kuprel, B.; Novoa, R.A.; Ko, J.; Swetter, S.M.; Blau, H.M.; Thrun, S. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017, 542, 115.
- Doi, K. Computer-aided diagnosis in medical imaging: Historical review, current status and future potential. Comput. Med. Imaging Graph. 2007, 31, 198–211.
- Jorritsma, W.; Cnossen, F.; van Ooijen, P.M.A. Improving the radiologist—CAD interaction: Designing for appropriate trust. Clin. Radiol. 2015, 70, 115–122.
- Gal, Y. Uncertainty in Deep Learning. Ph.D. Thesis, University of Cambridge, Cambridge, UK, 2016.
- Ahmadian, S.; Vahidi, B.; Jahanipour, J.; Hoseinian, S.H.; Rastegar, H. Price restricted optimal bidding model using derated sensitivity factors by considering risk concept. IET Gener. Transm. Distrib. 2016, 10, 310–324.
- Leibig, C.; Allken, V.; Ayhan, M.S.; Berens, P.; Wahl, S. Leveraging uncertainty information from deep neural networks for disease detection. Sci. Rep. 2017, 7, 17816.
- Ahmadian, S.; Tang, X.; Malki, H.A.; Han, Z. Modelling Cyber Attacks on Electricity Market Using Mathematical Programming With Equilibrium Constraints. IEEE Access 2019, 7, 27376–27388.
- Chan, H.-P.; Charles, E.; Metz, P.; Lam, K.L.; Wu, Y.; Macmahon, H. Improvement in radiologists’ detection of clustered microcalcifications on mammograms. Arbor 1990, 1001, 48109-0326.
- Kasai, S.; Li, F.; Shiraishi, J.; Doi, K. Usefulness of computer-aided diagnosis schemes for vertebral fractures and lung nodules on chest radiographs. Am. J. Roentgenol. 2008, 191, 260–265.
- Mobiny, A.; Moulik, S.; Van Nguyen, H. Lung cancer screening using adaptive memory-augmented recurrent networks. arXiv 2017, arXiv:1710.05719.
- Brem, R.F.; Schoonjans, J.M. Radiologist detection of microcalcifications with and without computer-aided detection: A comparative study. Clin. Radiol. 2001, 56, 150–154.
- Petrick, N.; Haider, M.; Summers, R.M.; Yeshwant, S.C.; Brown, L.; Edward Iuliano, M.; Louie, A.; Choi, J.R.; Pickhardt, P.J. CT colonography with computer-aided detection as a second reader: Observer performance study. Radiology 2008, 246, 148–156.
- Skitka, L.J.; Mosier, K.L.; Burdick, M. Does automation bias decision-making? Int. J. Hum.-Comput. Stud. 1999, 51, 991–1006.
- Awai, K.; Murao, K.; Ozawa, A.; Nakayama, Y.; Nakaura, T.; Liu, D.; Kawanaka, K.; Funama, Y.; Morishita, S.; Yamashita, Y. Pulmonary nodules: Estimation of malignancy at thin-section helical CT—Effect of computer-aided diagnosis on performance of radiologists. Radiology 2006, 239, 276–284.
- Li, F.; Aoyama, M.; Shiraishi, J.; Abe, H.; Li, Q.; Suzuki, K.; Engelmann, R.; Sone, S.; MacMahon, H.; Doi, K. Radiologists’ performance for differentiating benign from malignant lung nodules on high-resolution CT using computer-estimated likelihood of malignancy. Am. J. Roentgenol. 2004, 183, 1209–1215.
- Kashikura, Y.; Nakayama, R.; Hizukuri, A.; Noro, A.; Nohara, Y.; Nakamura, T.; Ito, M.; Kimura, H.; Yamashita, M.; Hanamura, N.; et al. Improved differential diagnosis of breast masses on ultrasonographic images with a computer-aided diagnosis scheme for determining histological classifications. Acad. Radiol. 2013, 20, 471–477.
- Horsch, K.; Giger, M.L.; Vyborny, C.J.; Lan, L.; Mendelson, E.B.; Hendrick, R.E. Classification of breast lesions with multimodality computer-aided diagnosis: Observer study results on an independent clinical data set. Radiology 2006, 240, 357–368.
- Apalla, Z.; Nashan, D.; Weller, R.B.; Castellsague, X. Skin cancer: Epidemiology, disease burden, pathophysiology, diagnosis, and therapeutic approaches. Dermatol. Ther. 2017, 7, 5–19.
- Bray, F.; Ferlay, J.; Soerjomataram, I.; Siegel, R.L.; Torre, L.A.; Jemal, A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer J. Clin. 2018, 68, 394–424.
- Kimball, A.B.; Resneck, J.S., Jr. The US dermatology workforce: A specialty remains in shortage. J. Am. Acad. Dermatol. 2008, 59, 741–745.
- Maragoudakis, M.; Maglogiannis, I. Skin lesion diagnosis from images using novel ensemble classification techniques. In Proceedings of the 10th IEEE International Conference on Information Technology and Applications in Biomedicine, Corfu, Greece, 3–5 November 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1–5.
- Madooei, A.; Drew, M.S.; Sadeghi, M.; Atkins, M.S. Intrinsic melanin and hemoglobin colour components for skin lesion malignancy detection. In Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2012; pp. 315–322.
- Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; Laak, J.A.V.D.; Ginneken, B.V.; Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88.
- Mobiny, A.; Lu, H.; Nguyen, H.V.; Roysam, B.; Varadarajan, N. Automated Classification of Apoptosis in Phase Contrast Microscopy Using Capsule Network. IEEE Trans. Med. Imaging 2019.
- Ghesu, F.C.; Krubasik, E.; Georgescu, B.; Singh, V.; Zheng, Y.; Hornegger, J.; Comaniciu, D. Marginal space deep learning: Efficient architecture for volumetric image parsing. IEEE Trans. Med. Imaging 2016, 35, 1217–1228.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
- Kawahara, J.; Hamarneh, G. Multi-resolution-tract CNN with hybrid pretrained and skin-lesion trained layers. In International Workshop on Machine Learning in Medical Imaging; Springer: Cham, Switzerland, 2016; pp. 164–171.
- Yang, J.; Sun, X.; Liang, J.; Rosin, P.L. Clinical skin lesion diagnosis using representations inspired by dermatologist criteria. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1258–1266.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
- Gessert, N.; Sentker, T.; Madesta, F.; Schmitz, R.; Kniep, H.; Baltruschat, I.; Werner, R.; Schlaefer, A. Skin Lesion Diagnosis using Ensembles, Unscaled Multi-Crop Evaluation and Loss Weighting. arXiv 2018, arXiv:1808.01694.
- Tschandl, P.; Rosendahl, C.; Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 2018, 5, 180161.
- Der Kiureghian, A.; Ditlevsen, O. Aleatory or epistemic? Does it matter? Struct. Saf. 2009, 31, 105–112.
- Gal, Y.; Ghahramani, Z. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 1050–1059.
- Kendall, A.; Gal, Y. What uncertainties do we need in bayesian deep learning for computer vision? In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5574–5584.
- Ayhan, M.S.; Berens, P. Test-time data augmentation for estimation of heteroscedastic aleatoric uncertainty in deep neural networks. In Proceedings of the MIDL 2018 Conference, Amsterdam, The Netherlands, 4–6 July 2018.
- Neal, R.M. Bayesian Learning for Neural Networks; Springer Science and Business Media: Berlin, Germany, 2012; Volume 118.
- MacKay, D.J.C. A practical Bayesian framework for backpropagation networks. Neural Comput. 1992, 4, 448–472.
- Neal, R.M. Bayesian learning via stochastic dynamics. In Proceedings of the Advances in Neural Information Processing Systems, Santa Cruz, CA, USA, 26–28 July 1993; pp. 475–482.
- Mobiny, A.; Nguyen, H.V.; Moulik, S.; Garg, N.; Wu, C.C. DropConnect Is Effective in Modeling Uncertainty of Bayesian Deep Networks. arXiv 2019, arXiv:1906.04569.
- Graves, A. Practical variational inference for neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Granada, Spain, 12–14 December 2011; pp. 2348–2356.
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
- Damianou, A.; Lawrence, N. Deep gaussian processes. In Proceedings of the Artificial Intelligence and Statistics, Scottsdale, AZ, USA, 29 April–1 May 2013; pp. 207–215.
- Cortes Ciriano, I.; Bender, A. Reliable Prediction Errors for Deep Neural Networks Using Test-Time Dropout. J. Chem. Inf. Model. 2019.
- DeVries, T.; Taylor, G.W. Leveraging uncertainty estimates for predicting segmentation quality. arXiv 2018, arXiv:1807.00502.
- Gal, Y.; Ghahramani, Z. Bayesian convolutional neural networks with Bernoulli approximate variational inference. arXiv 2015, arXiv:1506.02158.
- Louizos, C.; Welling, M. Multiplicative normalizing flows for variational bayesian neural networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; Volume 70, pp. 2218–2227.
- Lakshminarayanan, B.; Pritzel, A.; Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6402–6413.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
- MacKay, D.J.C. Probable networks and plausible predictions—A review of practical Bayesian methods for supervised neural networks. Netw. Comput. Neural Syst. 1995, 6, 469–505.
- Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; Wierstra, D. Weight uncertainty in neural networks. arXiv 2015, arXiv:1505.05424.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Huang, G.; Liu, Z.; Maaten, L.V.D.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Kendall, A.; Badrinarayanan, V.; Cipolla, R. Bayesian segnet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding. arXiv 2015, arXiv:1511.02680.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–18 December 2015; pp. 1026–1034.
- Milton, M.A.A. Automated Skin Lesion Classification Using Ensemble of Deep Neural Networks in ISIC 2018: Skin Lesion Analysis Towards Melanoma Detection Challenge. arXiv 2019, arXiv:1901.10802.
- Ray, S. Disease Classification within Dermascopic Images Using features extracted by ResNet50 and classification through Deep Forest. arXiv 2018, arXiv:1807.05711.
- Perez, F.; Avila, S.; Valle, E. Solo or Ensemble? Choosing a CNN Architecture for Melanoma Classification. arXiv 2019, arXiv:1904.12724.
- Scott, D.W. On optimal and data-based histograms. Biometrika 1979, 66, 605–610.
- Brinker, T.J.; Hekler, A.; Enk, A.H.; Klode, J.; Hauschild, A.; Berking, C.; Schilling, B.; Haferkamp, S.; Schadendorf, D.; Fröhling, S.; et al. A convolutional neural network trained with dermoscopic images performed on par with 145 dermatologists in a clinical melanoma image classification task. Eur. J. Cancer 2019, 111, 148–154.
- Brinker, T.J.; Hekler, A.; Enk, A.H.; Klode, J.; Hauschild, A.; Berking, C.; Schilling, B.; Haferkamp, S.; Schadendorf, D.; Holland-Letz, T.; et al. Deep learning outperformed 136 of 157 dermatologists in a head-to-head dermoscopic melanoma image classification task. Eur. J. Cancer 2019, 113, 47–54.
- Haenssle, H.A.; Fink, C.; Schneiderbauer, R.; Toberer, F.; Buhl, T.; Blum, A.; Kalloo, A.; Hassen, A.B.; Thomas, L.; Enk, A.; et al. Man against machine: Diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann. Oncol. 2018, 29, 1836–1842.
- Kruskal, W.H.; Wallis, W.A. Use of ranks in one-criterion variance analysis. J. Am. Stat. Assoc. 1952, 47, 583–621.
- Gal, Y.; Islam, R.; Ghahramani, Z. Deep bayesian active learning with image data. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Volume 70, pp. 1183–1192.
- Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. Tensorflow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, USA, 2–4 November 2016; pp. 265–283.
Lesion Type | MEL | NV | BCC | AKIEC | BKL | DF | VASC | Total |
---|---|---|---|---|---|---|---|---|
Number of samples | 1113 | 6705 | 514 | 327 | 1099 | 115 | 142 | 10,015 |
Method | % Prediction Accuracy (±std) |
---|---|
PNASNet [64] | 76.00 |
ResNet-50 + gcForest [65] | 80.04 |
VGG-16 + GoogLeNet Ensemble [66] | 81.50 |
DenseNet-121 with SVM [36] | 82.70 |
DenseNet-169 [36] | 85.20 |
VGG-16 | 79.63 (±0.25) |
ResNet-50 | 80.45 (±0.21) |
DenseNet-169 | 81.35 (±0.14) |
Bayesian VGG-16 (T = 27) | 81.02 (±0.22) |
Bayesian ResNet-50 (T = 18) | 82.37 (±0.14) |
Bayesian DenseNet-169 (T = 10) | 83.59 (±0.17) |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Mobiny, A.; Singh, A.; Van Nguyen, H. Risk-Aware Machine Learning Classifier for Skin Lesion Diagnosis. J. Clin. Med. 2019, 8, 1241. https://doi.org/10.3390/jcm8081241