# A Hybrid Framework for Lung Cancer Classification


## Abstract


## 1. Introduction

- A novel hybrid framework, LCGANT, is proposed to classify lung cancer images and to address the overfitting problem in lung cancer classification tasks.
- The proposed LCGANT framework achieves higher classification performance than other state-of-the-art approaches.
- A lung cancer deep convolutional GAN (LCGAN) generates synthetic lung cancer images to address the limited labelled data available for lung cancer classification, while a regularization-enhanced model called VGG-DF, combined with pre-trained model auto-selection, prevents overfitting.

## 2. Related Work

#### 2.1. Cancer Classification

#### 2.2. Data Augmentation

#### 2.3. Pre-Trained CNN Architecture for Feature Extraction

#### 2.3.1. VGG-16

#### 2.3.2. ResNet50

#### 2.3.3. DenseNet121

#### 2.3.4. EfficientNet

## 3. Materials and Methods

#### 3.1. Dataset

#### 3.2. Proposed LCGANT Framework for Lung Cancer Classification

#### 3.2.1. Image Synthesis Using LCGAN

- Replace the fully connected layers of the generator, which takes a uniform noise distribution as input;
- For the discriminator, flatten the last convolutional layer and apply a sigmoid function to its output;
- Add a Batch Normalization [26] layer to both the generator and the discriminator to avoid poor initialization during training. The algorithm of Batch Normalization is shown in Equation (2). Here, $\mathcal{B}$ denotes a mini-batch of $m$ examples drawn from the training set. We first compute the mean and variance of the mini-batch, then normalize each example ${x}_{i}$; $\epsilon$ is an arbitrarily small constant added for numerical stability. Finally, a transformation step scales and shifts the normalized output.
- All layers in the discriminator use the LeakyReLU activation function.
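The Batch Normalization steps described above correspond to the standard transform from [26], written out to match the description of Equation (2):

$$
\mu_{\mathcal{B}} = \frac{1}{m}\sum_{i=1}^{m} x_i, \quad
\sigma_{\mathcal{B}}^{2} = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_{\mathcal{B}}\right)^{2}, \quad
\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon}}, \quad
y_i = \gamma\hat{x}_i + \beta,
$$

where $\gamma$ and $\beta$ are the learnable scale and shift parameters of the final transformation step.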

- Add more filters to the front layers of the generator. More filters in the early layers give the generator more activation maps, so essential features of the original image are not lost; without sufficient filters, the generator produces blurry images.
- Use the LeakyReLU activation function for all layers in the generator except the output layer, which uses the Tanh function.
- Add several dropout layers to the discriminator to avoid overfitting.
- Avoid checkerboard artefacts:
- During image pre-processing, we use the bilinear interpolation algorithm to resize the original images to $64\times 64$. Bilinear interpolation performs two-dimensional interpolation over a rectangle. First, we take the four corner points $({x}_{1},{y}_{1})$, $({x}_{1},{y}_{2})$, $({x}_{2},{y}_{1})$, and $({x}_{2},{y}_{2})$. Second, we denote their values as ${Q}_{11}$ at $({x}_{1},{y}_{1})$, ${Q}_{12}$ at $({x}_{1},{y}_{2})$, ${Q}_{21}$ at $({x}_{2},{y}_{1})$, and ${Q}_{22}$ at $({x}_{2},{y}_{2})$. Finally, we can estimate the value at any point $(x,y)$ inside the rectangle, as given in Equation (3).
- Inspired by [27], in which sub-pixel convolution achieves better performance in image super-resolution, choose a kernel size that is divisible by the stride, and make the kernel size as large as possible.
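The bilinear estimate of Equation (3) can be sketched as a short function (the name `bilinear_interp` and the argument order are ours, for illustration only):

```python
def bilinear_interp(x, y, x1, y1, x2, y2, q11, q12, q21, q22):
    """Bilinear interpolation inside the rectangle [x1, x2] x [y1, y2].

    Corner values: q11 = f(x1, y1), q12 = f(x1, y2),
                   q21 = f(x2, y1), q22 = f(x2, y2).
    """
    area = (x2 - x1) * (y2 - y1)
    # Each corner value is weighted by the area of the sub-rectangle
    # opposite to it, then the sum is normalized by the total area.
    return (
        q11 * (x2 - x) * (y2 - y)
        + q21 * (x - x1) * (y2 - y)
        + q12 * (x2 - x) * (y - y1)
        + q22 * (x - x1) * (y - y1)
    ) / area
```

Resizing to $64\times 64$ applies this estimate at every target pixel after mapping its position back into the source image's coordinate system.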

#### 3.2.2. Regularization Enhanced Transfer Learning Model

#### 3.2.3. Pre-Trained Model Auto-Selection

## 4. Results

#### 4.1. Set-Up of Experiments

#### 4.2. Results and Analysis

#### 4.2.1. Lung Images Generated by DCGAN Generator

#### 4.2.2. The Results of Different Transfer Learning Models with Different Training Datasets

#### 4.3. Comparison with State-of-the-Art Methods on the Same Dataset

## 5. Discussion

- The synthetic images generated by the LCGAN still differ slightly from real images.
- The generator produces images of size $64\times 64$, which is not sufficient for the biomedical domain, where high-resolution images are essential.

## 6. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. World Health Organization. Cancer. Available online: https://www.who.int/news-room/fact-sheets/detail/cancer (accessed on 9 May 2022).
2. Siegel, R.L.; Miller, K.D.; Fuchs, H.E.; Jemal, A. Cancer statistics, 2021. CA Cancer J. Clin. **2021**, 71, 7–33.
3. Wernick, M.N.; Yang, Y.; Brankov, J.G.; Yourganov, G.; Strother, S.C. Machine learning in medical imaging. IEEE Signal Process. Mag. **2010**, 27, 25–38.
4. Alyafeai, Z.; Ghouti, L. A fully-automated deep learning pipeline for cervical cancer classification. Expert Syst. Appl. **2020**, 141, 112951.
5. Kriegsmann, M.; Haag, C.; Weis, C.A.; Steinbuss, G.; Warth, A.; Zgorzelski, C.; Muley, T.; Winter, H.; Eichhorn, M.E.; Eichhorn, F.; et al. Deep learning for the classification of small-cell and non-small-cell lung cancer. Cancers **2020**, 12, 1604.
6. Li, J.; Wang, P.; Zhou, Y.; Liang, H.; Luan, K. Different machine learning and deep learning methods for the classification of colorectal cancer lymph node metastasis images. Front. Bioeng. Biotechnol. **2021**, 8, 1521.
7. Lakshmanaprabu, S.; Mohanty, S.N.; Shankar, K.; Arunkumar, N.; Ramirez, G. Optimal deep learning model for classification of lung cancer on CT images. Future Gener. Comput. Syst. **2019**, 92, 374–382.
8. Perez, L.; Wang, J. The effectiveness of data augmentation in image classification using deep learning. arXiv **2017**, arXiv:1712.04621.
9. DeVries, T.; Taylor, G.W. Dataset augmentation in feature space. arXiv **2017**, arXiv:1702.05538.
10. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. **2014**, 27, 1–9.
11. Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv **2015**, arXiv:1511.06434.
12. Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv **2014**, arXiv:1411.1784.
13. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
14. Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4401–4410.
15. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data **2019**, 6, 1–48.
16. Gatys, L.A.; Ecker, A.S.; Bethge, M. A neural algorithm of artistic style. arXiv **2015**, arXiv:1508.06576.
17. Zoph, B.; Le, Q.V. Neural architecture search with reinforcement learning. arXiv **2016**, arXiv:1611.01578.
18. Gao, W.; Baskonus, H.M. Deeper investigation of modified epidemiological computer virus model containing the Caputo operator. Chaos Solitons Fractals **2022**, 158, 112050.
19. Zhong, Y.; Ruan, G.; Abozinadah, E.; Jiang, J. Least-squares method and deep learning in the identification and analysis of name-plates of power equipment. Appl. Math. Nonlinear Sci. **2021**, ahead of print.
20. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv **2014**, arXiv:1409.1556.
21. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
22. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
23. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
24. Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 5–9 June 2019; pp. 6105–6114.
25. Borkowski, A.A.; Bui, M.M.; Thomas, L.B.; Wilson, C.P.; DeLand, L.A.; Mastorides, S.M. Lung and colon cancer histopathological image dataset (LC25000). arXiv **2019**, arXiv:1912.12142.
26. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 7–9 July 2015; pp. 448–456.
27. Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883.
28. Bukhari, S.U.K.; Asmara, S.; Bokhari, S.K.A.; Hussain, S.S.; Armaghan, S.U.; Shah, S.S.H. The histological diagnosis of colonic adenocarcinoma by applying partial self supervised learning. medRxiv **2020**.
29. Phankokkruad, M. Ensemble transfer learning for lung cancer detection. In Proceedings of the 4th International Conference on Data Science and Information Technology, Shanghai, China, 23–25 July 2021; pp. 438–442.
30. Hlavcheva, D.; Yaloveha, V.; Podorozhniak, A.; Kuchuk, H. Comparison of CNNs for lung biopsy images classification. In Proceedings of the IEEE 3rd Ukraine Conference on Electrical and Computer Engineering (UKRCON), Lviv, Ukraine, 6–8 July 2021; pp. 1–5.
31. Masud, M.; Sikder, N.; Nahid, A.A.; Bairagi, A.K.; AlZain, M.A. A machine learning approach to diagnosing lung and colon cancer using a deep learning-based classification framework. Sensors **2021**, 21, 748.
32. Hatuwal, B.K.; Thapa, H.C. Lung cancer detection using convolutional neural network on histopathological images. Int. J. Comput. Trends Technol. **2020**, 68, 21–24.
33. Chehade, A.H.; Abdallah, N.; Marion, J.M.; Oueidat, M.; Chauvet, P. Lung and Colon Cancer Classification Using Medical Imaging: A Feature Engineering Approach. Preprint, 2022. Available online: https://assets.researchsquare.com/files/rs-1211832/v1_covered.pdf?c=1641239335 (accessed on 6 April 2022).
34. Raza, K.; Singh, N.K. A tour of unsupervised deep learning for medical image analysis. Curr. Med. Imaging **2021**, 17, 1059–1077.

**Figure 1.** Three examples of different lung cancer classes from the original dataset with size $768\times 768$: (**a**) lung adenocarcinoma, (**b**) lung benign, (**c**) lung squamous cell carcinoma.

**Figure 3.** Synthetic lung benign images generated by different versions of LCGAN: (**a**) image from the initial version of LCGAN, (**b**) image from LCGAN with the checkerboard-avoidance methods applied, (**c**) image from LCGAN with all the proposed settings applied, (**d**) image from the original dataset.

**Figure 5.** Comparison between real and LCGAN images: (**a**) lung adenocarcinoma, (**b**) lung benign, (**c**) lung squamous cell carcinoma.

| Hyperparameter | Value |
|---|---|
| Optimizer | Adam |
| Loss (LCGAN) | Binary Crossentropy |
| Loss (VGG-DF) | Categorical Crossentropy |
| Dropout Probability | 0.2 |
| Batch Size | 256 |
| Number of Epochs (LCGAN) | 15,000 |
| Number of Epochs (VGG-DF) | 5 |
| Kernel Size (G) | 4 |
| Kernel Size (D) | 3 |

| Layer (Type) | Output Shape | Number of Parameters |
|---|---|---|
| Input | 100 | 0 |
| Dense | 8192 | 827,392 |
| LeakyReLU | 8192 | 0 |
| Reshape | $4\times 4\times 512$ | 0 |
| TransConv | $8\times 8\times 512$ | 4,194,816 |
| BatchNorm | $8\times 8\times 512$ | 2048 |
| LeakyReLU | $8\times 8\times 512$ | 0 |
| TransConv | $16\times 16\times 256$ | 2,097,408 |
| BatchNorm | $16\times 16\times 256$ | 1024 |
| LeakyReLU | $16\times 16\times 256$ | 0 |
| TransConv | $32\times 32\times 128$ | 524,416 |
| BatchNorm | $32\times 32\times 128$ | 512 |
| LeakyReLU | $32\times 32\times 128$ | 0 |
| TransConv | $64\times 64\times 64$ | 131,136 |
| BatchNorm | $64\times 64\times 64$ | 256 |
| LeakyReLU | $64\times 64\times 64$ | 0 |
| TransConv | $64\times 64\times 3$ | 3075 |
| Activation | $64\times 64\times 3$ | 0 |
| Total Parameters: 7,782,083 | | |
| Trainable Parameters: 7,780,163 | | |
| Non-trainable Parameters: 1920 | | |
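The generator's parameter counts can be reproduced from the layer shapes alone (Dense: $n_{in} n_{out} + n_{out}$; transposed convolution with kernel $k$: $k^{2} c_{in} c_{out} + c_{out}$; BatchNorm: 4 parameters per channel, half of them non-trainable moving statistics). A quick sketch, assuming the $4\times 4$ generator kernel from the hyperparameter table; note that the listed counts are only consistent if the second transposed convolution has 256 output channels, which matches its BatchNorm layer's 1024 parameters:

```python
def dense(n_in, n_out):
    # weights plus biases of a fully connected layer
    return n_in * n_out + n_out

def tconv(k, c_in, c_out):
    # transposed convolution: one k*k kernel per (input, output) channel pair, plus biases
    return k * k * c_in * c_out + c_out

def bnorm(c):
    # gamma, beta (trainable) + moving mean, moving variance (non-trainable)
    return 4 * c

layers = [
    dense(100, 8192),                # 827,392
    tconv(4, 512, 512), bnorm(512),  # 4,194,816 + 2,048
    tconv(4, 512, 256), bnorm(256),  # 2,097,408 + 1,024
    tconv(4, 256, 128), bnorm(128),  # 524,416 + 512
    tconv(4, 128, 64),  bnorm(64),   # 131,136 + 256
    tconv(4, 64, 3),                 # 3,075
]
total = sum(layers)
non_trainable = 2 * (512 + 256 + 128 + 64)  # moving statistics of the four BatchNorm layers
print(total, total - non_trainable, non_trainable)
```

Running the sketch reproduces the totals reported at the bottom of the table.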

| Layer (Type) | Output Shape | Number of Parameters |
|---|---|---|
| Input | $64\times 64\times 3$ | 0 |
| Conv2D | $64\times 64\times 128$ | 3584 |
| LeakyReLU | $64\times 64\times 128$ | 0 |
| BatchNorm | $64\times 64\times 128$ | 512 |
| Conv2D | $64\times 64\times 128$ | 147,584 |
| LeakyReLU | $64\times 64\times 128$ | 0 |
| BatchNorm | $64\times 64\times 128$ | 512 |
| MaxPooling | $21\times 21\times 128$ | 0 |
| Dropout | $21\times 21\times 128$ | 0 |
| Conv2D | $21\times 21\times 128$ | 147,584 |
| LeakyReLU | $21\times 21\times 128$ | 0 |
| BatchNorm | $21\times 21\times 128$ | 512 |
| Conv2D | $21\times 21\times 128$ | 147,584 |
| LeakyReLU | $21\times 21\times 128$ | 0 |
| BatchNorm | $21\times 21\times 128$ | 512 |
| MaxPooling | $7\times 7\times 128$ | 0 |
| Dropout | $7\times 7\times 128$ | 0 |
| Flatten | 6272 | 0 |
| Dense | 128 | 802,944 |
| LeakyReLU | 128 | 0 |
| Dense | 128 | 16,512 |
| LeakyReLU | 128 | 0 |
| Dense | 1 | 129 |
| Total Parameters: 1,267,969 | | |
| Trainable Parameters: 1,266,945 | | |
| Non-trainable Parameters: 1024 | | |
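Two details of the discriminator table can be checked arithmetically: the spatial sizes after max pooling, and the first convolution's parameter count, which confirms a 3-channel (RGB) input. The pool size/stride of 3 is our inference from the $64\to 21\to 7$ progression:

```python
def maxpool(side, pool=3):
    # output side length of "valid" max pooling with stride equal to the pool size
    return side // pool

side1 = maxpool(64)                 # 64 -> 21 after the first MaxPooling
side2 = maxpool(side1)              # 21 -> 7 after the second MaxPooling
flat = side2 * side2 * 128          # Flatten feeds 7*7*128 units into Dense(128)
first_conv = 3 * 3 * 3 * 128 + 128  # 3x3 kernels, 3 input channels, 128 filters (+ biases)
print(side1, side2, flat, first_conv)
```

The flattened size of 6272 also explains the 802,944 parameters of the first Dense layer ($6272\times 128 + 128$).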

| | VGG-DF (Ours) | ResNet50 | DenseNet121 | EfficientNetB4 |
|---|---|---|---|---|
| Accuracy | 95.80% | 95.56% | 53.87% | 45.96% |
| Precision | 95.81% | 95.56% | 76.86% | 75.93% |
| Sensitivity | 95.80% | 95.56% | 53.87% | 45.96% |
| F1-score | 95.80% | 95.56% | 49.62% | 38.44% |

**Table 5.** Results of different transfer learning models trained on the dataset that combines the original and synthetic datasets.

| | VGG-DF (Ours) | ResNet50 | DenseNet121 | EfficientNetB4 |
|---|---|---|---|---|
| Accuracy | 99.84% | 99.46% | 79.64% | 51.74% |
| Precision | 99.84% | 99.49% | 66.81% | 68.70% |
| Sensitivity | 99.84% | 99.46% | 59.19% | 51.74% |
| F1-score | 99.84% | 99.46% | 52.06% | 43.03% |

| Reference | Model (Method) | Accuracy | Precision | Sensitivity | F1-Score |
|---|---|---|---|---|---|
| [28] | ResNet50 | 93.91% | 95.74% | 81.82% | 96.26% |
| | ResNet18 | 93.04% | 96.81% | 84.21% | 95.79% |
| | ResNet34 | 93.04% | 95.74% | 80.95% | 95.74% |
| [29] | Ensemble | 91% | 92% | 91% | 91% |
| | ResNet50V2 | 90% | 91% | 90% | 90% |
| [30] | CNN-D | 94.6% | - | - | - |
| [31] | DL-based CNN | 96.33% | 96.39% | 96.37% | 96.38% |
| [32] | CNN | 97.2% | 97.33% | 97.33% | 97.33% |
| [33] | XGBoost | 99.53% | 99.33% | 99.33% | 99.33% |
| Our method | LCGANT | 99.84% | 99.84% | 99.84% | 99.84% |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ren, Z.; Zhang, Y.; Wang, S.
A Hybrid Framework for Lung Cancer Classification. *Electronics* **2022**, *11*, 1614.
https://doi.org/10.3390/electronics11101614
