# A Hybrid Framework for Lung Cancer Classification


## Abstract


## 1. Introduction

- A novel hybrid framework, LCGANT, is proposed to classify lung cancer images and to address the overfitting problem in lung cancer classification tasks.
- The proposed LCGANT framework achieves higher classification performance than other state-of-the-art approaches.
- A lung cancer deep convolutional GAN (LCGAN) generates synthetic lung cancer images to address the limited labelled data available for lung cancer classification, while a regularization-enhanced model called VGG-DF, combined with pre-trained model auto-selection, prevents overfitting.

## 2. Related Work

#### 2.1. Cancer Classification

#### 2.2. Data Augmentation

#### 2.3. Pre-Trained CNN Architecture for Feature Extraction

#### 2.3.1. VGG-16

#### 2.3.2. ResNet50

#### 2.3.3. DenseNet121

#### 2.3.4. EfficientNet

## 3. Materials and Methods

#### 3.1. Dataset

#### 3.2. Proposed LCGANT Framework for Lung Cancer Classification

#### 3.2.1. Image Synthesis Using LCGAN

- Replace the fully connected layers of the generator, which takes a uniform noise distribution as input;
- For the discriminator, flatten the last convolutional layer and apply a sigmoid function to its output;
- Add a Batch Normalization [26] layer to both the generator and the discriminator to avoid poor initialization during training. The algorithm of Batch Normalization is shown in Equation (2). Here, $\mathcal{B}$ denotes a mini-batch of $m$ examples drawn from the training set. We first compute the mean and variance of the mini-batch, then normalize each example ${x}_{i}$; $\epsilon$ is an arbitrarily small constant added for numerical stability. Finally, a transformation step scales and shifts the normalized output.
- All layers in the discriminator use the LeakyReLU activation function.
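The Batch Normalization steps described above correspond to the standard transform from [26], written out to match the description of Equation (2):

$$
\mu_{\mathcal{B}} = \frac{1}{m}\sum_{i=1}^{m} x_i, \quad
\sigma_{\mathcal{B}}^{2} = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_{\mathcal{B}}\right)^{2}, \quad
\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon}}, \quad
y_i = \gamma\hat{x}_i + \beta,
$$

where $\gamma$ and $\beta$ are the learnable scale and shift parameters of the final transformation step.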

- Add more filters to the front layers of the generator. More filters in the early layers give the generator more activation maps, so essential features of the original image are not lost; without sufficient filters, the generator produces blurry images.
- Use the LeakyReLU activation function for all layers in the generator except the output layer, which uses the Tanh function.
- Add several dropout layers to the discriminator to avoid overfitting.
- Avoid checkerboard artefacts:
- During image pre-processing, we use the bilinear interpolation algorithm to resize the original images to $64\times 64$. Bilinear interpolation performs two-dimensional interpolation over a rectangle. First, we take the four corner points $({x}_{1},{y}_{1})$, $({x}_{1},{y}_{2})$, $({x}_{2},{y}_{1})$, and $({x}_{2},{y}_{2})$. Second, we denote their values as ${Q}_{11}$ at $({x}_{1},{y}_{1})$, ${Q}_{12}$ at $({x}_{1},{y}_{2})$, ${Q}_{21}$ at $({x}_{2},{y}_{1})$, and ${Q}_{22}$ at $({x}_{2},{y}_{2})$. Finally, we can estimate the value at any point $(x,y)$ inside the rectangle, as given in Equation (3).
- Inspired by [27], in which sub-pixel convolution achieves better performance in image super-resolution, choose a kernel size that is divisible by the stride, and make the kernel size as large as possible.
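The bilinear estimate of Equation (3) can be sketched as a short function (the name `bilinear_interp` and the argument order are ours, for illustration only):

```python
def bilinear_interp(x, y, x1, y1, x2, y2, q11, q12, q21, q22):
    """Bilinear interpolation inside the rectangle [x1, x2] x [y1, y2].

    Corner values: q11 = f(x1, y1), q12 = f(x1, y2),
                   q21 = f(x2, y1), q22 = f(x2, y2).
    """
    area = (x2 - x1) * (y2 - y1)
    # Each corner value is weighted by the area of the sub-rectangle
    # opposite to it, then the sum is normalized by the total area.
    return (
        q11 * (x2 - x) * (y2 - y)
        + q21 * (x - x1) * (y2 - y)
        + q12 * (x2 - x) * (y - y1)
        + q22 * (x - x1) * (y - y1)
    ) / area
```

Resizing to $64\times 64$ applies this estimate at every target pixel after mapping its position back into the source image's coordinate system.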

#### 3.2.2. Regularization Enhanced Transfer Learning Model

#### 3.2.3. Pre-Trained Model Auto-Selection

## 4. Results

#### 4.1. Set-Up of Experiments

#### 4.2. Results and Analysis

#### 4.2.1. Lung Images Generated by DCGAN Generator

#### 4.2.2. The Results of Different Transfer Learning Models with Different Training Datasets

#### 4.3. Comparison with State-of-the-Art Methods on the Same Dataset

## 5. Discussion

- The synthetic images generated by the LCGAN still differ slightly from real images.
- The generator produces images of size $64\times 64$, which is not sufficient for the biomedical domain, where high-resolution images are essential.

## 6. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. World Health Organization. Cancer. Available online: https://www.who.int/news-room/fact-sheets/detail/cancer (accessed on 9 May 2022).
2. Siegel, R.L.; Miller, K.D.; Fuchs, H.E.; Jemal, A. Cancer statistics, 2021. CA Cancer J. Clin. **2021**, 71, 7–33.
3. Wernick, M.N.; Yang, Y.; Brankov, J.G.; Yourganov, G.; Strother, S.C. Machine learning in medical imaging. IEEE Signal Process. Mag. **2010**, 27, 25–38.
4. Alyafeai, Z.; Ghouti, L. A fully-automated deep learning pipeline for cervical cancer classification. Expert Syst. Appl. **2020**, 141, 112951.
5. Kriegsmann, M.; Haag, C.; Weis, C.A.; Steinbuss, G.; Warth, A.; Zgorzelski, C.; Muley, T.; Winter, H.; Eichhorn, M.E.; Eichhorn, F.; et al. Deep learning for the classification of small-cell and non-small-cell lung cancer. Cancers **2020**, 12, 1604.
6. Li, J.; Wang, P.; Zhou, Y.; Liang, H.; Luan, K. Different machine learning and deep learning methods for the classification of colorectal cancer lymph node metastasis images. Front. Bioeng. Biotechnol. **2021**, 8, 1521.
7. Lakshmanaprabu, S.; Mohanty, S.N.; Shankar, K.; Arunkumar, N.; Ramirez, G. Optimal deep learning model for classification of lung cancer on CT images. Future Gener. Comput. Syst. **2019**, 92, 374–382.
8. Perez, L.; Wang, J. The effectiveness of data augmentation in image classification using deep learning. arXiv **2017**, arXiv:1712.04621.
9. DeVries, T.; Taylor, G.W. Dataset augmentation in feature space. arXiv **2017**, arXiv:1702.05538.
10. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. **2014**, 27, 1–9.
11. Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv **2015**, arXiv:1511.06434.
12. Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv **2014**, arXiv:1411.1784.
13. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
14. Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4401–4410.
15. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data **2019**, 6, 1–48.
16. Gatys, L.A.; Ecker, A.S.; Bethge, M. A neural algorithm of artistic style. arXiv **2015**, arXiv:1508.06576.
17. Zoph, B.; Le, Q.V. Neural architecture search with reinforcement learning. arXiv **2016**, arXiv:1611.01578.
18. Gao, W.; Baskonus, H.M. Deeper investigation of modified epidemiological computer virus model containing the Caputo operator. Chaos Solitons Fractals **2022**, 158, 112050.
19. Zhong, Y.; Ruan, G.; Abozinadah, E.; Jiang, J. Least-squares method and deep learning in the identification and analysis of name-plates of power equipment. Appl. Math. Nonlinear Sci. **2021**, ahead of print.
20. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv **2014**, arXiv:1409.1556.
21. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
22. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
23. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
24. Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 5–9 June 2019; pp. 6105–6114.
25. Borkowski, A.A.; Bui, M.M.; Thomas, L.B.; Wilson, C.P.; DeLand, L.A.; Mastorides, S.M. Lung and colon cancer histopathological image dataset (LC25000). arXiv **2019**, arXiv:1912.12142.
26. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 7–9 July 2015; pp. 448–456.
27. Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883.
28. Bukhari, S.U.K.; Asmara, S.; Bokhari, S.K.A.; Hussain, S.S.; Armaghan, S.U.; Shah, S.S.H. The histological diagnosis of colonic adenocarcinoma by applying partial self supervised learning. medRxiv **2020**.
29. Phankokkruad, M. Ensemble transfer learning for lung cancer detection. In Proceedings of the 4th International Conference on Data Science and Information Technology, Shanghai, China, 23–25 July 2021; pp. 438–442.
30. Hlavcheva, D.; Yaloveha, V.; Podorozhniak, A.; Kuchuk, H. Comparison of CNNs for lung biopsy images classification. In Proceedings of the IEEE 3rd Ukraine Conference on Electrical and Computer Engineering (UKRCON), Lviv, Ukraine, 6–8 July 2021; pp. 1–5.
31. Masud, M.; Sikder, N.; Nahid, A.A.; Bairagi, A.K.; AlZain, M.A. A machine learning approach to diagnosing lung and colon cancer using a deep learning-based classification framework. Sensors **2021**, 21, 748.
32. Hatuwal, B.K.; Thapa, H.C. Lung cancer detection using convolutional neural network on histopathological images. Int. J. Comput. Trends Technol. **2020**, 68, 21–24.
33. Chehade, A.H.; Abdallah, N.; Marion, J.M.; Oueidat, M.; Chauvet, P. Lung and Colon Cancer Classification Using Medical Imaging: A Feature Engineering Approach. Preprint, 2022. Available online: https://assets.researchsquare.com/files/rs-1211832/v1_covered.pdf?c=1641239335 (accessed on 6 April 2022).
34. Raza, K.; Singh, N.K. A tour of unsupervised deep learning for medical image analysis. Curr. Med. Imaging **2021**, 17, 1059–1077.

**Figure 1.** Three examples of different lung cancer classes from the original dataset with size $768\times 768$: (**a**) lung adenocarcinoma, (**b**) lung benign, (**c**) lung squamous cell carcinoma.

**Figure 3.** Synthetic lung benign images generated by different versions of LCGAN: (**a**) image from the initial version of LCGAN, (**b**) image from LCGAN with the checkerboard-avoidance methods applied, (**c**) image from LCGAN with all the proposed settings applied, (**d**) image from the original dataset.

**Figure 5.** Comparison between real and LCGAN images: (**a**) lung adenocarcinoma, (**b**) lung benign, (**c**) lung squamous cell carcinoma.

| Hyperparameter | Value |
|---|---|
| Optimizer | Adam |
| Loss (LCGAN) | Binary Crossentropy |
| Loss (VGG-DF) | Categorical Crossentropy |
| Dropout Probability | 0.2 |
| Batch Size | 256 |
| Number of Epochs (LCGAN) | 15,000 |
| Number of Epochs (VGG-DF) | 5 |
| Kernel Size (G) | 4 |
| Kernel Size (D) | 3 |

| Layer (Type) | Output Shape | Number of Parameters |
|---|---|---|
| Input | 100 | 0 |
| Dense | 8192 | 827,392 |
| LeakyReLU | 8192 | 0 |
| Reshape | $4\times 4\times 512$ | 0 |
| TransConv | $8\times 8\times 512$ | 4,194,816 |
| BatchNorm | $8\times 8\times 512$ | 2048 |
| LeakyReLU | $8\times 8\times 512$ | 0 |
| TransConv | $16\times 16\times 256$ | 2,097,408 |
| BatchNorm | $16\times 16\times 256$ | 1024 |
| LeakyReLU | $16\times 16\times 256$ | 0 |
| TransConv | $32\times 32\times 128$ | 524,416 |
| BatchNorm | $32\times 32\times 128$ | 512 |
| LeakyReLU | $32\times 32\times 128$ | 0 |
| TransConv | $64\times 64\times 64$ | 131,136 |
| BatchNorm | $64\times 64\times 64$ | 256 |
| LeakyReLU | $64\times 64\times 64$ | 0 |
| TransConv | $64\times 64\times 3$ | 3075 |
| Activation | $64\times 64\times 3$ | 0 |
| Total Parameters: 7,782,083 | | |
| Trainable Parameters: 7,780,163 | | |
| Non-trainable Parameters: 1920 | | |
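The generator's parameter counts can be reproduced from the layer shapes alone (Dense: $n_{in} n_{out} + n_{out}$; transposed convolution with kernel $k$: $k^{2} c_{in} c_{out} + c_{out}$; BatchNorm: 4 parameters per channel, half of them non-trainable moving statistics). A quick sketch, assuming the $4\times 4$ generator kernel from the hyperparameter table; note that the listed counts are only consistent if the second transposed convolution has 256 output channels, which matches its BatchNorm layer's 1024 parameters:

```python
def dense(n_in, n_out):
    # weights plus biases of a fully connected layer
    return n_in * n_out + n_out

def tconv(k, c_in, c_out):
    # transposed convolution: one k*k kernel per (input, output) channel pair, plus biases
    return k * k * c_in * c_out + c_out

def bnorm(c):
    # gamma, beta (trainable) + moving mean, moving variance (non-trainable)
    return 4 * c

layers = [
    dense(100, 8192),                # 827,392
    tconv(4, 512, 512), bnorm(512),  # 4,194,816 + 2,048
    tconv(4, 512, 256), bnorm(256),  # 2,097,408 + 1,024
    tconv(4, 256, 128), bnorm(128),  # 524,416 + 512
    tconv(4, 128, 64),  bnorm(64),   # 131,136 + 256
    tconv(4, 64, 3),                 # 3,075
]
total = sum(layers)
non_trainable = 2 * (512 + 256 + 128 + 64)  # moving statistics of the four BatchNorm layers
print(total, total - non_trainable, non_trainable)
```

Running the sketch reproduces the totals reported at the bottom of the table.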

| Layer (Type) | Output Shape | Number of Parameters |
|---|---|---|
| Input | $64\times 64\times 3$ | 0 |
| Conv2D | $64\times 64\times 128$ | 3584 |
| LeakyReLU | $64\times 64\times 128$ | 0 |
| BatchNorm | $64\times 64\times 128$ | 512 |
| Conv2D | $64\times 64\times 128$ | 147,584 |
| LeakyReLU | $64\times 64\times 128$ | 0 |
| BatchNorm | $64\times 64\times 128$ | 512 |
| MaxPooling | $21\times 21\times 128$ | 0 |
| Dropout | $21\times 21\times 128$ | 0 |
| Conv2D | $21\times 21\times 128$ | 147,584 |
| LeakyReLU | $21\times 21\times 128$ | 0 |
| BatchNorm | $21\times 21\times 128$ | 512 |
| Conv2D | $21\times 21\times 128$ | 147,584 |
| LeakyReLU | $21\times 21\times 128$ | 0 |
| BatchNorm | $21\times 21\times 128$ | 512 |
| MaxPooling | $7\times 7\times 128$ | 0 |
| Dropout | $7\times 7\times 128$ | 0 |
| Flatten | 6272 | 0 |
| Dense | 128 | 802,944 |
| LeakyReLU | 128 | 0 |
| Dense | 128 | 16,512 |
| LeakyReLU | 128 | 0 |
| Dense | 1 | 129 |
| Total Parameters: 1,267,969 | | |
| Trainable Parameters: 1,266,945 | | |
| Non-trainable Parameters: 1024 | | |
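Two details of the discriminator table can be checked arithmetically: the spatial sizes after max pooling, and the first convolution's parameter count, which confirms a 3-channel (RGB) input. The pool size/stride of 3 is our inference from the $64\to 21\to 7$ progression:

```python
def maxpool(side, pool=3):
    # output side length of "valid" max pooling with stride equal to the pool size
    return side // pool

side1 = maxpool(64)                 # 64 -> 21 after the first MaxPooling
side2 = maxpool(side1)              # 21 -> 7 after the second MaxPooling
flat = side2 * side2 * 128          # Flatten feeds 7*7*128 units into Dense(128)
first_conv = 3 * 3 * 3 * 128 + 128  # 3x3 kernels, 3 input channels, 128 filters (+ biases)
print(side1, side2, flat, first_conv)
```

The flattened size of 6272 also explains the 802,944 parameters of the first Dense layer ($6272\times 128 + 128$).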

| | VGG-DF (Ours) | ResNet50 | DenseNet121 | EfficientNetB4 |
|---|---|---|---|---|
| Accuracy | 95.80% | 95.56% | 53.87% | 45.96% |
| Precision | 95.81% | 95.56% | 76.86% | 75.93% |
| Sensitivity | 95.80% | 95.56% | 53.87% | 45.96% |
| F1-score | 95.80% | 95.56% | 49.62% | 38.44% |

**Table 5.** Results of different transfer learning models trained on the dataset that combines the original and synthetic datasets.

| | VGG-DF (Ours) | ResNet50 | DenseNet121 | EfficientNetB4 |
|---|---|---|---|---|
| Accuracy | 99.84% | 99.46% | 79.64% | 51.74% |
| Precision | 99.84% | 99.49% | 66.81% | 68.70% |
| Sensitivity | 99.84% | 99.46% | 59.19% | 51.74% |
| F1-score | 99.84% | 99.46% | 52.06% | 43.03% |

| Reference | Model (Method) | Accuracy | Precision | Sensitivity | F1-Score |
|---|---|---|---|---|---|
| [28] | ResNet50 | 93.91% | 95.74% | 81.82% | 96.26% |
| | ResNet18 | 93.04% | 96.81% | 84.21% | 95.79% |
| | ResNet34 | 93.04% | 95.74% | 80.95% | 95.74% |
| [29] | Ensemble | 91% | 92% | 91% | 91% |
| | ResNet50V2 | 90% | 91% | 90% | 90% |
| [30] | CNN-D | 94.6% | - | - | - |
| [31] | DL-based CNN | 96.33% | 96.39% | 96.37% | 96.38% |
| [32] | CNN | 97.2% | 97.33% | 97.33% | 97.33% |
| [33] | XGBoost | 99.53% | 99.33% | 99.33% | 99.33% |
| Our method | LCGANT | 99.84% | 99.84% | 99.84% | 99.84% |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ren, Z.; Zhang, Y.; Wang, S.
A Hybrid Framework for Lung Cancer Classification. *Electronics* **2022**, *11*, 1614.
https://doi.org/10.3390/electronics11101614
