# Unsupervised Learning for Concept Detection in Medical Images: A Comparative Analysis


## Abstract


## 1. Introduction

#### Related Work

## 2. Methods

- We built image descriptors from bags of visual words (BoWs), using two different visual keypoint extraction algorithms; and
- We designed and trained several deep neural network architectures: a sparse denoising autoencoder (SDAE), a variational autoencoder (VAE), a bidirectional generative adversarial network (BiGAN), and an adversarial autoencoder (AAE).
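
The bag-of-visual-words idea in the first item can be illustrated with a minimal sketch. It assumes that keypoint descriptors have already been extracted and that a codebook of visual words has already been learned (for instance by clustering); the toy 2-D vectors, the Euclidean distance, and the helper names `nearest_word` and `bow_histogram` are illustrative assumptions, not the configuration used in this work.

```python
import math
from collections import Counter

def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(descriptor, codebook[i]))

def bow_histogram(descriptors, codebook):
    """L1-normalized histogram of visual-word occurrences for one image."""
    if not descriptors:
        return [0.0] * len(codebook)
    counts = Counter(nearest_word(d, codebook) for d in descriptors)
    return [counts[i] / len(descriptors) for i in range(len(codebook))]

# Toy data (assumption): three visual words and four descriptors in 2-D.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.7, 5.2)]

histogram = bow_histogram(image_descriptors, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

The resulting fixed-length histogram serves as the image descriptor regardless of how many keypoints the image has. Binary descriptors would typically be compared with a Hamming distance rather than the Euclidean distance used here.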

#### 2.1. Bags of Visual Words

#### 2.2. Deep Representation Learning

#### 2.2.1. Sparse Denoising Autoencoder

#### 2.2.2. Variational Autoencoder

#### 2.2.3. Bidirectional GAN

1. As in the encoder, the image was processed by the convolutional neural network described in Table 1 with $nb=128$;
2. The prior code, $z$, was fed to two fully connected layers with an output shape of B × 64 (where B is the batch size);
3. The two outcomes, (1) and (2), were concatenated to form a tensor of shape B × 192, followed by two fully connected layers of shape B × 512;
4. Finally, a fully connected layer with a single neuron (B × 1) produced the output $D(x,z)$.
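
The shape bookkeeping of the steps above can be traced with a small, framework-free sketch. Several details are assumptions made only for illustration: the latent size `LATENT_DIM`, the leaky ReLU activations, the random initialization, and the use of random vectors in place of the 128-dimensional output of the convolutional network. The final score is left as a raw value, since the list does not state which output activation follows.

```python
import random

LATENT_DIM = 32  # hypothetical size of the prior code z; not given in the list above

def init_layer(n_in, n_out, rng):
    """Random weights (one row per output unit) and zero biases for a dense layer."""
    weights = [[rng.gauss(0.0, 0.02) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def leaky_relu(value, slope=0.2):
    """Assumed hidden activation; the list above does not name one."""
    return value if value > 0 else slope * value

def dense(batch, layer, activation=None):
    """Apply a fully connected layer to a batch given as a list of row vectors."""
    weights, biases = layer
    outputs = []
    for row in batch:
        out = [sum(x * w for x, w in zip(row, unit)) + b for unit, b in zip(weights, biases)]
        outputs.append([activation(v) for v in out] if activation else out)
    return outputs

def discriminator(image_features, z, params):
    h_z = dense(dense(z, params["z1"], leaky_relu), params["z2"], leaky_relu)    # B x 64
    joint = [fx + fz for fx, fz in zip(image_features, h_z)]                     # B x 192
    h = dense(dense(joint, params["j1"], leaky_relu), params["j2"], leaky_relu)  # B x 512
    return dense(h, params["out"])                                               # B x 1

rng = random.Random(0)
params = {
    "z1": init_layer(LATENT_DIM, 64, rng),
    "z2": init_layer(64, 64, rng),
    "j1": init_layer(128 + 64, 512, rng),
    "j2": init_layer(512, 512, rng),
    "out": init_layer(512, 1, rng),
}

batch_size = 2
# Stand-in for the CNN output with nb = 128 (random values, for shape checking only).
image_features = [[rng.gauss(0.0, 1.0) for _ in range(128)] for _ in range(batch_size)]
z = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)] for _ in range(batch_size)]

scores = discriminator(image_features, z, params)
print(len(scores), len(scores[0]))  # 2 1
```

In practice such a network would be written with a deep learning framework; the plain-list version is only meant to make the 128 + 64 = 192 concatenation and the subsequent layer widths explicit.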

#### 2.2.4. Adversarial Autoencoder

#### 2.2.5. Network Training Details

#### 2.3. Evaluation

#### 2.3.1. Logistic Regression

#### 2.3.2. k-Nearest Neighbors

## 3. Results

#### 3.1. Qualitative Results

#### 3.2. Linear Classifiers

#### 3.3. k-Nearest Neighbors

## 4. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Abbreviations

Abbreviation | Meaning |
---|---|
CBIR | Content-based Image Retrieval |
RBM | Restricted Boltzmann Machine |
GAN | Generative Adversarial Network |
BoW | Bag of Words |
SDAE | Sparse Denoising Autoencoder |
VAE | Variational Autoencoder |
BiGAN | Bidirectional Generative Adversarial Network |
AAE | Adversarial Autoencoder |
SIFT | Scale Invariant Feature Transform |
ORB | Oriented FAST and Rotated BRIEF |
ReLU | Rectified Linear Unit |
CUI | Concept Unique Identifier |
UMLS | Unified Medical Language System |
PCA | Principal Component Analysis |
FTRL | Follow-the-Regularized-Leader |


**Figure 5.** A few samples from the ImageCLEF 2017 concept detection data set with their respective file IDs and trimmed list of concept identifiers.

**Figure 6.** The 2D projections of the latent codes in the validation set, for each learned feature space. Best seen in color.

Layer | Kernels | Size/Stride | Details |
---|---|---|---|
conv1 | 64 | 5 × 5/2 | Normalization + non-linearity |
conv2 | 128 | 5 × 5/2 | Normalization + non-linearity |
conv3 | 256 | 5 × 5/2 | Normalization + non-linearity |
conv4 | 512 | 5 × 5/2 | Normalization + non-linearity |
conv5 | 512 | 5 × 5/2 | Normalization + non-linearity |
avgpool | N/A | N/A | |
fc | nb | N/A | Linear activation |

Layer | Kernels | Size/Stride | Details |
---|---|---|---|
fc | 4096 | N/A | Reshaped to 1024 × 2 × 2 |
dconv5 | 512 | 5 × 5/2 | Normalization + ReLU |
dconv4 | 256 | 5 × 5/2 | Normalization + ReLU |
dconv3 | 128 | 5 × 5/2 | Normalization + ReLU |
dconv2 | 64 | 5 × 5/2 | Normalization + ReLU |
dconv1 | 3 | 5 × 5/2 | Linear activation |
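
The spatial sizes implied by the two layer tables above can be checked with a short calculation. The sketch assumes "same"-style padding, so that each stride-2 transposed convolution doubles the spatial size and each stride-2 convolution halves it (rounding up); under that assumption the decoder would produce 64 × 64 images with 3 channels, and the encoder input size of 64 is inferred from this rather than stated in the tables.

```python
def conv_out_size(size, stride):
    """Spatial size after a strided convolution with 'same' padding: ceil(size / stride)."""
    return -(-size // stride)

def deconv_out_size(size, stride):
    """Spatial size after a strided transposed convolution with 'same' padding."""
    return size * stride

# Decoder: the fc output of 4096 units is reshaped to 1024 x 2 x 2.
assert 1024 * 2 * 2 == 4096
decoder_sizes = [2]
for _ in range(5):  # dconv5 ... dconv1, all with stride 2
    decoder_sizes.append(deconv_out_size(decoder_sizes[-1], 2))
print(decoder_sizes)  # [2, 4, 8, 16, 32, 64]

# Encoder: five stride-2 convolutions applied to the assumed 64 x 64 input.
encoder_sizes = [decoder_sizes[-1]]
for _ in range(5):  # conv1 ... conv5, all with stride 2
    encoder_sizes.append(conv_out_size(encoder_sizes[-1], 2))
print(encoder_sizes)  # [64, 32, 16, 8, 4, 2]
```

With a different padding scheme the intermediate sizes would differ, so the numbers should be read as a consistency check on the tables rather than as a specification.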

**Table 3.** The ten most frequently occurring concepts in the ImageCLEF 2017 training set for concept detection.

CUI | Occurrences in Training Set | Textual Description |
---|---|---|
C1696103 | 17998 | Image-dosage form |
C0040405 | 16217 | X-ray computed tomography |
C0221198 | 14219 | Lesion |
C1306645 | 10926 | Plain X-ray |
C0577559 | 9769 | Mass (lump, localized mass) |
C0027651 | 9570 | Tumor |
C0441633 | 9289 | Diagnostic scanning |
C0817096 | 5602 | Thorax |
C1317574 | 5039 | Note |
C0087111 | 4983 | Therapy |

**Table 4.** The best metrics obtained from logistic regression for each representation learned, where Mix is the feature combination of sparse denoising autoencoder (SDAE) and adversarial autoencoder (AAE). The highest scores are shown in bold.

Type | $F_1$ Score | Precision | Recall | AUC | $F_1$ Score (Test) |
---|---|---|---|---|---|
ORB | 0.138 | 0.138 | 0.143 | 0.699 | 0.0967 |
SIFT | 0.133 | 0.119 | 0.151 | 0.753 | 0.0952 |
SDAE | 0.151 | 0.141 | 0.162 | 0.781 | 0.1029 |
VAE | 0.140 | 0.137 | 0.142 | 0.760 | 0.0924 |
BiGAN | 0.141 | 0.142 | 0.139 | 0.781 | 0.781 |
AAE | 0.159 | 0.159 | 0.174 | 0.787 | 0.1080 |
Mix | 0.161 | 0.147 | 0.179 | 0.789 | 0.1105 |
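
The $F_1$ scores in the table above are not the harmonic mean of the reported precision and recall (for AAE, $2 \cdot 0.159 \cdot 0.174 / (0.159 + 0.174) \approx 0.166$ rather than 0.159). One plausible reading, stated here as an assumption, is that each metric is computed per image and then averaged over the data set. A minimal sketch of such a sample-wise mean $F_1$ follows; the function names and the toy label sets (built from identifiers in Table 3) are illustrative only.

```python
def sample_f1(true_concepts, predicted_concepts):
    """F1 score between the true and predicted concept sets of a single image."""
    true_set, pred_set = set(true_concepts), set(predicted_concepts)
    true_positives = len(true_set & pred_set)
    if true_positives == 0:
        return 0.0  # also covers empty sets (a convention chosen for this sketch)
    precision = true_positives / len(pred_set)
    recall = true_positives / len(true_set)
    return 2 * precision * recall / (precision + recall)

def mean_f1(y_true, y_pred):
    """Average of the per-image F1 scores."""
    scores = [sample_f1(t, p) for t, p in zip(y_true, y_pred)]
    return sum(scores) / len(scores)

# Toy annotations and predictions for three images (assumption, not real results).
y_true = [{"C0040405", "C0817096"}, {"C1306645"}, {"C0221198", "C0027651"}]
y_pred = [{"C0040405"}, {"C1306645", "C0221198"}, {"C0087111"}]

score = mean_f1(y_true, y_pred)
print(round(score, 4))  # 0.4444
```

Because the average of per-image $F_1$ values generally differs from the $F_1$ of averaged precision and recall, the three columns need not satisfy the harmonic-mean identity.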

**Table 5.** The best $F_1$ scores obtained from the vector similarity search for each representation learned with the highest scores shown in bold.

Type | $F_1$ Score | Precision | Recall | AUC | k | $F_1$ Score (Test) |
---|---|---|---|---|---|---|
ORB | 0.043 | 0.030 | 0.106 | 0.552 | 4 | 0.0418 |
SIFT | 0.060 | 0.043 | 0.134 | 0.567 | 3 | 0.0567 |
SDAE | 0.080 | 0.070 | 0.120 | 0.560 | 2 | 0.0751 |
VAE | 0.036 | 0.025 | 0.087 | 0.543 | 4 | 0.0345 |
BiGAN | 0.047 | 0.035 | 0.099 | 0.549 | 3 | 0.0473 |
AAE | 0.072 | 0.063 | 0.109 | 0.554 | 2 | 0.0691 |
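
Concept prediction by vector similarity search, as summarized in the table above, can be sketched as follows. The aggregation rule is an assumption: here the predicted concepts are the union of the concepts of the k nearest training images, found by brute-force Euclidean search. The toy feature vectors and the helper names `k_nearest` and `predict_concepts` are hypothetical.

```python
import math

def k_nearest(query, index_vectors, k):
    """Indices of the k indexed vectors closest to the query (Euclidean distance)."""
    order = sorted(range(len(index_vectors)), key=lambda i: math.dist(query, index_vectors[i]))
    return order[:k]

def predict_concepts(query, index_vectors, index_concepts, k):
    """Union of the concept sets of the k nearest neighbors (assumed aggregation rule)."""
    predicted = set()
    for i in k_nearest(query, index_vectors, k):
        predicted |= index_concepts[i]
    return predicted

# Toy index (assumption): 2-D features of four training images and their concepts.
index_vectors = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0), (3.1, 2.9)]
index_concepts = [
    {"C0040405"},
    {"C0040405", "C0817096"},
    {"C1306645"},
    {"C1306645", "C0221198"},
]

prediction = predict_concepts((0.1, 0.0), index_vectors, index_concepts, k=2)
print(sorted(prediction))  # ['C0040405', 'C0817096']
```

Under a union rule, increasing k tends to raise recall at the expense of precision, which would be consistent with the small best values of k and the recall exceeding precision in the table; for large collections an approximate or GPU-based index would replace the exhaustive search.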

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Pinho, E.; Costa, C.
Unsupervised Learning for Concept Detection in Medical Images: A Comparative Analysis. *Appl. Sci.* **2018**, *8*, 1213.
https://doi.org/10.3390/app8081213
