# Unsupervised Learning for Concept Detection in Medical Images: A Comparative Analysis

## Abstract

## 1. Introduction

#### Related Work

## 2. Methods

- We built image descriptors using bags of visual words (BoWs) for two different visual keypoint extraction algorithms; and
- Using modern deep learning approaches, we designed and trained several deep neural network architectures: a sparse denoising autoencoder (SDAE), a variational autoencoder (VAE), a bidirectional generative adversarial network (BiGAN), and an adversarial autoencoder (AAE).

#### 2.1. Bags of Visual Words

#### 2.2. Deep Representation Learning

#### 2.2.1. Sparse Denoising Autoencoder

#### 2.2.2. Variational Autoencoder

#### 2.2.3. Bidirectional GAN

- As in the encoder, the image was processed by the convolutional neural network described in Table 1 with $nb=128$, yielding a tensor of shape B × 128 (where B is the batch size);
- The prior code, $z$, was fed to two fully connected layers, each with an output shape of B × 64;
- The two outputs above (the image features and the code features) were concatenated to form a tensor of shape B × 192, followed by two fully connected layers, each with an output shape of B × 512;
- Finally, a fully connected layer with a single neuron (B × 1) produced the output $D(x,z)$.
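
The tensor shapes in the discriminator steps above can be checked with a minimal NumPy sketch. It is not the authors' implementation: the weights are random and untrained, the Table 1 convolutional network is replaced by a random B × 128 feature matrix, and the leaky ReLU non-linearity, the linear final output, the batch size, and the code size (`code_size = 256`) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, slope=0.2):
    # Assumed non-linearity; the text does not name the activation used.
    return np.where(x > 0, x, slope * x)

def dense(x, units, activation=None):
    """Fully connected layer with freshly drawn (untrained) weights."""
    w = rng.normal(0.0, 0.02, size=(x.shape[1], units))
    b = np.zeros(units)
    y = x @ w + b
    return activation(y) if activation else y

def discriminator_head(image_features, z):
    """image_features: (B, 128) stand-in for the CNN output; z: (B, code_size)."""
    h_z = dense(z, 64, leaky_relu)
    h_z = dense(h_z, 64, leaky_relu)                   # (B, 64)
    h = np.concatenate([image_features, h_z], axis=1)  # (B, 128 + 64) = (B, 192)
    h = dense(h, 512, leaky_relu)
    h = dense(h, 512, leaky_relu)                      # (B, 512)
    return dense(h, 1)                                 # (B, 1), D(x, z)

B, code_size = 8, 256  # hypothetical batch size and code size
image_features = rng.normal(size=(B, 128))
z = rng.normal(size=(B, code_size))
d_out = discriminator_head(image_features, z)
print(d_out.shape)  # (8, 1)
```

The concatenation width follows directly from the text: 128 image features plus 64 code features give the B × 192 tensor.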

#### 2.2.4. Adversarial Autoencoder

#### 2.2.5. Network Training Details

#### 2.3. Evaluation

#### 2.3.1. Logistic Regression

#### 2.3.2. k-Nearest Neighbors

## 3. Results

#### 3.1. Qualitative Results

#### 3.2. Linear Classifiers

#### 3.3. k-Nearest Neighbors

## 4. Conclusions

## Abbreviations

Abbreviation | Definition |
---|---|
CBIR | Content-based Image Retrieval |
RBM | Restricted Boltzmann Machine |
GAN | Generative Adversarial Network |
BoW | Bag of Words |
SDAE | Sparse Denoising Autoencoder |
VAE | Variational Autoencoder |
BiGAN | Bidirectional Generative Adversarial Network |
AAE | Adversarial Autoencoder |
SIFT | Scale Invariant Feature Transform |
ORB | Oriented FAST and Rotated BRIEF |
ReLU | Rectified Linear Unit |
CUI | Concept Unique Identifier |
UMLS | Unified Medical Language System |
PCA | Principal Component Analysis |
FTRL | Follow-the-Regularized-Leader |

**Figure 5.** A few samples from the ImageCLEF 2017 concept detection data set with their respective file IDs and trimmed list of concept identifiers.

**Figure 6.** The 2D projections of the latent codes in the validation set, for each learned feature space. Best seen in color.

Layer | Kernels | Size/Stride | Details |
---|---|---|---|
conv1 | 64 | 5 × 5/2 | Normalization + non-linearity |
conv2 | 128 | 5 × 5/2 | Normalization + non-linearity |
conv3 | 256 | 5 × 5/2 | Normalization + non-linearity |
conv4 | 512 | 5 × 5/2 | Normalization + non-linearity |
conv5 | 512 | 5 × 5/2 | Normalization + non-linearity |
avgpool | N/A | N/A | |
fc | nb | N/A | Linear activation |

Layer | Kernels | Size/Stride | Details |
---|---|---|---|
fc | 4096 | N/A | Reshaped to 1024 × 2 × 2 |
dconv5 | 512 | 5 × 5/2 | Normalization + ReLU |
dconv4 | 256 | 5 × 5/2 | Normalization + ReLU |
dconv3 | 128 | 5 × 5/2 | Normalization + ReLU |
dconv2 | 64 | 5 × 5/2 | Normalization + ReLU |
dconv1 | 3 | 5 × 5/2 | Linear activation |
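
The spatial sizes implied by the two layer tables above can be traced with a short sketch. It assumes "same"-style padding, so that each stride-2 convolution halves the spatial size and each stride-2 transposed convolution doubles it. The 64 × 64 input size is also an assumption, inferred from the decoder starting at 2 × 2 and applying five stride-2 layers. The function names are hypothetical.

```python
def trace_encoder(size, channels=(64, 128, 256, 512, 512), stride=2):
    """Return (channels, height, width) after each stride-2 convolution."""
    shapes = []
    for c in channels:
        size = -(-size // stride)  # ceiling division, as with "same" padding
        shapes.append((c, size, size))
    return shapes

def trace_decoder(start=(1024, 2, 2), channels=(512, 256, 128, 64, 3), stride=2):
    """Return shapes from the reshaped fc output through dconv5..dconv1."""
    _, h, w = start
    shapes = [start]
    for c in channels:
        h, w = h * stride, w * stride  # each transposed convolution doubles the size
        shapes.append((c, h, w))
    return shapes

encoder_shapes = trace_encoder(64)  # assumed 64 x 64 input
decoder_shapes = trace_decoder()

print(encoder_shapes[-1])  # (512, 2, 2)
print(decoder_shapes[-1])  # (3, 64, 64)
```

Under these assumptions the fc layer's 4096 outputs match the reshape exactly (1024 × 2 × 2 = 4096), and the decoder output has three channels at the same resolution as the assumed encoder input.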

**Table 3.** The ten most frequently occurring concepts in the ImageCLEF 2017 training set for concept detection.

CUI | Occurrences in Training Set | Textual Description |
---|---|---|
C1696103 | 17998 | Image-dosage form |
C0040405 | 16217 | X-ray computed tomography |
C0221198 | 14219 | Lesion |
C1306645 | 10926 | Plain X-ray |
C0577559 | 9769 | Mass (lump, localized mass) |
C0027651 | 9570 | Tumor |
C0441633 | 9289 | Diagnostic scanning |
C0817096 | 5602 | Thorax |
C1317574 | 5039 | Note |
C0087111 | 4983 | Therapy |

**Table 4.** The best metrics obtained from logistic regression for each representation learned, where Mix is the feature combination of sparse denoising autoencoder (SDAE) and adversarial autoencoder (AAE). The highest scores are shown in bold.

Type | $F_1$ Score | Precision | Recall | AUC | $F_1$ Score (Test) |
---|---|---|---|---|---|
ORB | 0.138 | 0.138 | 0.143 | 0.699 | 0.0967 |
SIFT | 0.133 | 0.119 | 0.151 | 0.753 | 0.0952 |
SDAE | 0.151 | 0.141 | 0.162 | 0.781 | 0.1029 |
VAE | 0.140 | 0.137 | 0.142 | 0.760 | 0.0924 |
BiGAN | 0.141 | 0.142 | 0.139 | 0.781 | 0.781 |
AAE | 0.159 | 0.159 | 0.174 | 0.787 | 0.1080 |
Mix | 0.161 | 0.147 | 0.179 | 0.789 | 0.1105 |

**Table 5.** The best $F_1$ scores obtained from the vector similarity search for each representation learned, with the highest scores shown in bold.

Type | $F_1$ Score | Precision | Recall | AUC | k | $F_1$ Score (Test) |
---|---|---|---|---|---|---|
ORB | 0.043 | 0.030 | 0.106 | 0.552 | 4 | 0.0418 |
SIFT | 0.060 | 0.043 | 0.134 | 0.567 | 3 | 0.0567 |
SDAE | 0.080 | 0.070 | 0.120 | 0.560 | 2 | 0.0751 |
VAE | 0.036 | 0.025 | 0.087 | 0.543 | 4 | 0.0345 |
BiGAN | 0.047 | 0.035 | 0.099 | 0.549 | 3 | 0.0473 |
AAE | 0.072 | 0.063 | 0.109 | 0.554 | 2 | 0.0691 |

