Using Generative Adversarial Networks and Parameter Optimization of Convolutional Neural Networks for Lung Tumor Classiﬁcation

Abstract: Cancer is the leading cause of death worldwide. Lung cancer in particular caused the most deaths in 2018, according to the World Health Organization. Early diagnosis and treatment can considerably reduce mortality. To provide efficient diagnosis, deep learning is overtaking conventional machine learning techniques and is increasingly being used in computer-aided diagnosis systems. However, sparse medical data sets and the network parameter tuning process make network training difficult and lengthen experimental time. In the present study, a generative adversarial network was applied to generate computed tomography images of lung tumors in order to alleviate the problem of sparse data. Furthermore, a parameter optimization method was proposed not only to improve the accuracy of lung tumor classification, but also to reduce the experimental time. The experimental results revealed that the average accuracy can reach 99.86% after image augmentation and parameter optimization.


Introduction
According to a report from the World Health Organization in 2018, there were about 9.6 million deaths from cancer globally, of which 1.76 million were attributed to lung cancer [1]. Studies have identified environmental factors and smoking as major causes of lung cancer [2]. Generally, chest X-ray, computed tomography (CT), and magnetic resonance imaging are the modalities used to evaluate lung cancer [3,4]. The chest X-ray is the first test in diagnosing lung cancer and indicates abnormal formations in the lungs. Compared with a chest X-ray, a CT scan provides a more detailed view of the lungs and can also show the exact shape, size, and location of formations. A CT scan is therefore a major diagnostic tool for the assessment of lung cancer. To reduce the workload of analyzing CT images manually and to avoid subjective interpretation, machine learning techniques are applied in computer-aided diagnosis systems to provide objective auxiliary diagnosis. Lately, owing to the rapid growth of deep learning, convolutional neural networks (CNNs) not only show good performance in image classification and object detection tasks [5][6][7], but are also widely used in several applications, including smart homes, driverless cars, manufacturing robots, drones, and chatbots. Research on CNNs continues to innovate and improve.
In 1998, LeCun et al. proposed LeNet-5 [8], a simple CNN for handwritten digit classification. LeNet-5 comprises a feature extraction part (convolutional and pooling layers) and a classification part (fully connected layers). Subsequently, in 2012, Krizhevsky et al. proposed AlexNet [9] and won the ImageNet Large Scale Visual Recognition Competition. AlexNet replaces the Sigmoid and Tanh activation functions with the rectified linear unit (ReLU). It also introduces Dropout and max pooling, which differ from LeNet-5. In 2014, Szegedy et al. proposed GoogLeNet [10] and introduced the Inception module, which uses three different sizes of convolutional kernels simultaneously to extract more features in one layer. In the same year, Simonyan et al. proposed the VGGNet model [11]. VGGNet stacks 3 × 3 convolutional layers and increases both the depth of the network and the number of input and output channels per layer. In 2015, He et al. proposed ResNet [12] and introduced residual blocks to alleviate the degradation problem of deep networks. More architectures continue to be proposed. However, unbalanced or sparse data sets and network parameter settings for training are two major problems faced by deep learning.
Unbalanced or sparse data, especially in medical imaging, is one of the most challenging problems in deep learning [13][14][15][16][17]. Typical data augmentation methods include translation, rotation, flipping, and zooming [18,19]. However, such geometric transformations might not provide sufficient data diversity. In 2014, generative adversarial networks (GANs) [20] were proposed to tackle the problem of sparse data. This model consists of two networks: a generator network and a discriminator network. The generator network aims to generate plausible fake images, while the discriminator network acts as a classifier that distinguishes real data from data created by the generator. In 2015, deep convolutional GANs (DCGANs), a direct extension of GANs, were proposed [21], in which the generator uses transposed convolutional layers. Later, many studies discussed complementary data processing techniques in medical applications. Perez et al. [22] investigated the impact of 13 data augmentation scenarios, such as traditional color and geometric transforms, elastic transforms, random erasing, and lesion mixing, for melanoma classification. The results confirmed that data augmentation can lead to greater performance gains than obtaining new images. Madani et al. [23] implemented GANs to produce chest X-ray images for data set augmentation and showed higher accuracy for normal versus abnormal classification of chest X-rays.
On the other hand, selecting a better network parameter combination is another time-consuming task. Several experiments are required to determine the optimum parameter combination. To reduce the time cost, many network parameter optimization methods have been proposed. Real et al. [24] introduced a genetic algorithm into CNN architecture search and achieved high accuracy on both the CIFAR-10 and CIFAR-100 data sets. An autonomous and continuous learning algorithm proposed by Ma et al. [25] could automatically generate deep convolutional neural network (DCNN) architectures by partitioning the DCNN into multiple stacked meta convolutional blocks and fully connected blocks and then using genetic evolutionary operations to evolve a population of DCNN architectures. Although those methods showed high accuracy, they are still time consuming. The Taguchi method, proposed by Dr. Genichi Taguchi, has been widely applied as a design method [26][27][28]. It is not only straightforward and easy to implement in many engineering situations, but also able to narrow down the scope of a research project quickly.
In the present study, the main contributions are to alleviate the problem of sparse medical images and to use a parameter optimizer that selects an optimal network parameter combination in fewer experiments, based on state-of-the-art CNNs, in order to provide accurate and generally applicable lung tumor classification. First, a GAN was introduced to augment CT images to increase data diversity and improve the accuracy of CNNs; the AlexNet architecture was chosen as the backbone classification network, combined with a parameter optimizer capable of selecting a better parameter combination in fewer experiments. The rest of this paper is organized as follows. Section 2 describes a data augmentation method to increase the number of lung tumor CT images. Section 3 reviews the CNN architecture and introduces the network parameter optimizer. The experimental results and discussions are detailed in Section 4. Section 5 draws the conclusion.

Data Augmentation Using GANs
Training multilayer CNNs using a limited number of data results in overfitting. To avoid overfitting, data augmentation is used to increase the amount of data available for training. Typical data augmentation techniques include cropping, flipping, rotation, and translation, yet those methods lack data diversity. The GAN, an innovative network proposed in 2014, was applied to generate new data automatically. It trains the generator and discriminator networks simultaneously: the former generates new images, and the latter learns to distinguish the fake images from the mixture of real and generated data. The two networks continually perform their generation and discrimination tasks and constantly update their parameters. Finally, the training process terminates when the generator network deceives the discriminator network. Figure 1 shows the flowchart of GANs. In the present study, a DCGAN was applied for data augmentation, in which the discriminator network uses strided convolutions for downsampling and the generator network uses transposed convolutions for upsampling. The details of the DCGAN are described as follows.
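To make the adversarial training scheme concrete, the minimal sketch below alternates a discriminator update on real and generated images with a generator update that tries to fool the discriminator. It is an illustrative sketch only: the study's implementation was in MATLAB, and the binary cross-entropy loss, optimizer handles, and function name used here are assumptions rather than the authors' settings.

```python
import torch
import torch.nn as nn

def gan_training_step(generator, discriminator, real_images,
                      opt_g, opt_d, latent_dim=100, device="cpu"):
    """One adversarial update: train D on real/fake images, then train G to fool D."""
    bce = nn.BCEWithLogitsLoss()
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1, device=device)
    fake_labels = torch.zeros(batch, 1, device=device)

    # Discriminator step: real images should score as real, generated images as fake.
    noise = torch.randn(batch, latent_dim, device=device)
    fake_images = generator(noise)
    d_loss = bce(discriminator(real_images), real_labels) + \
             bce(discriminator(fake_images.detach()), fake_labels)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: generated images should be scored as real by the discriminator.
    g_loss = bce(discriminator(fake_images), real_labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

Training stops, as described above, once the generated images reliably deceive the discriminator (in practice, when the two losses reach a rough equilibrium).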

Generator Network
The generator network learns characteristics from real images. First, a 1 × 1 × 100 noise array is converted into a 7 × 7 × 128 array by projection and reshape layers. Deconvolution (DC) layers, batch normalization, and the ReLU activation function are then applied to obtain a 64 × 64 × 3 image. Figure 2 displays the flowchart of the generator network, and Table 1 lists the details of the generator network parameters.
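A minimal PyTorch-style sketch of such a generator is shown below. The 100-dimensional noise input, the 7 × 7 × 128 projection, the use of transposed convolutions with batch normalization and ReLU, and the 64 × 64 × 3 output follow the description above; the specific channel counts, kernel sizes, strides, and the Tanh output activation are illustrative assumptions, since Table 1 (not reproduced here) gives the actual values.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """DCGAN-style generator: 100-dim noise -> 7x7x128 projection -> 64x64x3 image."""
    def __init__(self, latent_dim=100):
        super().__init__()
        self.project = nn.Linear(latent_dim, 7 * 7 * 128)  # project-and-reshape stage
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # 7 -> 14
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),   # 14 -> 28
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1),   # 28 -> 56
            nn.BatchNorm2d(16), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(16, 3, kernel_size=9, stride=1),               # 56 -> 64
            nn.Tanh(),  # common DCGAN choice; an assumption, not stated in the paper
        )

    def forward(self, z):
        x = self.project(z).view(-1, 128, 7, 7)
        return self.deconv(x)
```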


Discriminator Network
The discriminator network determines whether the input image is a generated image or a real image. The network takes a 64 × 64 × 3 image as input, and the output is a scalar prediction score obtained using a series of convolutional layers with batch normalization and leaky ReLU activation functions. The dropout value is set to 0.5. Leaky ReLU, shown in Figure 3, maps negative inputs to a small nonzero slope instead of zero. Figure 4 displays the flowchart of the discriminator network. Table 2 provides the details of the discriminator network parameters.
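A corresponding PyTorch-style sketch of the discriminator is given below. The 64 × 64 × 3 input, the scalar output score, batch normalization, leaky ReLU, and the dropout value of 0.5 follow the description above; the channel counts, kernel sizes, strides, the leaky-ReLU slope of 0.2, and the placement of the dropout layer are illustrative assumptions, since Table 2 gives the actual parameters.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """DCGAN-style discriminator: 64x64x3 image -> scalar real/fake logit."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),     # 64 -> 32
            nn.LeakyReLU(0.2, inplace=True),   # nonzero slope for negative inputs
            nn.Dropout(0.5),                   # dropout value from the paper; placement assumed
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),   # 32 -> 16
            nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1),  # 16 -> 8
            nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1),  # 8 -> 4
            nn.BatchNorm2d(512), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 1, kernel_size=4, stride=1),               # 4 -> 1: scalar score
        )

    def forward(self, x):
        return self.features(x).view(-1, 1)
```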


CNN Architecture and Parameter Optimizer
This section reviews the CNN architecture and describes how CNN parameters can be adjusted using the parameter optimizer. Figure 5 illustrates the flowchart of the parameter optimization process.


CNNs
CNNs are the models most commonly used for image recognition and usually consist of three types of layers: convolutional, pooling, and fully connected (FC) layers. The convolutional and pooling layers are the most crucial parts for extracting global and local features.

Convolutional Layer
The convolutional layer (C) contains several kernels that are used to extract features from images. Each convolutional layer holds kernels with different weight combinations. Each kernel slides over the input and, at each spatial position, the inner product between the kernel and the corresponding input patch is calculated to generate a feature map. Finally, the output of the convolutional layer is obtained by stacking the feature maps of all kernels along the depth dimension.
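The spatial size of each feature map is determined by the input size together with the kernel size (KS), stride (S), and padding (P), the three parameters optimized later in this study. The standard relation, not stated explicitly above but implied throughout, is

$$W_{\mathrm{out}} = \left\lfloor \frac{W_{\mathrm{in}} - KS + 2P}{S} \right\rfloor + 1 .$$

For example, a 227 × 227 input convolved with an 11 × 11 kernel at stride 4 and no padding yields (227 − 11)/4 + 1 = 55, i.e., a 55 × 55 feature map, as in the first layer of AlexNet.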

Pooling Layer
The objective of a pooling layer (Pool) is to reduce the size of the feature maps, and hence the subsequent computation, without losing important feature information. Pooling can be performed using several methods, including average and max pooling. Average pooling takes the average value within the selected patch of the feature map, whereas max pooling takes the maximum value within the patch. Padding (P) is seldom applied in the pooling layer, and the pooling layer has no trainable parameters.
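The difference between the two operations can be seen on a tiny feature map; the snippet below is a small illustrative example, not part of the study's MATLAB implementation.

```python
import torch
import torch.nn.functional as F

# A single 4x4 feature map with batch and channel dimensions added.
fmap = torch.tensor([[1., 2., 5., 6.],
                     [3., 4., 7., 8.],
                     [9., 8., 1., 0.],
                     [7., 6., 3., 2.]]).reshape(1, 1, 4, 4)

# 2x2 pooling with stride 2 halves each spatial dimension; no weights are learned.
print(F.max_pool2d(fmap, kernel_size=2))  # [[4., 8.], [9., 3.]] - largest value per patch
print(F.avg_pool2d(fmap, kernel_size=2))  # [[2.5, 6.5], [7.5, 1.5]] - mean value per patch
```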

Activation Function
In neural networks, each neuron is connected to other neurons in order to pass the signal from the input layer to the output layer in one direction. The activation layer relates to the forward propagation of the signal through the network. The purpose of the activation function is to introduce nonlinearity into the output of the neuron so that complex nonlinear problems can be solved. Sigmoid, tanh, and ReLU are common activation functions, with ReLU being among the most widely used. ReLU, as expressed in Equation (1), is used as an activation function to address the vanishing gradient problem, and it can reduce the degree of overfitting, as displayed in Figure 6.

f(x) = max(0, x)  (1)



Fully Connected Layer
The fully connected (FC) layer functions as a classifier. The FC layer converts the two-dimensional feature map output by the convolutional layers into a one-dimensional vector. The final probability of each label is obtained using Softmax.
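For completeness, the Softmax function referred to here is the standard definition (not reproduced from the paper), which converts the FC layer's K-dimensional output vector z into class probabilities:

$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K .$$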
LeNet-5 and AlexNet contain fewer layers and a simpler architecture compared with other, deeper CNNs. Between them, AlexNet has not only shown good performance in many applications, but also accepts color images as input, such as computed tomography images. Therefore, with data augmentation and the parameter optimizer implemented, AlexNet is a suitable network architecture for this study. AlexNet consists of five convolutional layers, three pooling layers, three FC layers, and a Softmax layer with 1000 outputs. The aim of this study was to classify lung CT images into benign or malignant tumors. Thus, the transfer learning technique was applied to change the last FC layer to two outputs. The AlexNet architecture is illustrated in Figure 7, and Table 3 lists the details of AlexNet.
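A minimal sketch of this transfer-learning step is shown below, using torchvision's pretrained AlexNet rather than the authors' MATLAB implementation; the weights identifier and the optional freezing of the feature extractor are assumptions for illustration.

```python
import torch.nn as nn
from torchvision import models

# Load AlexNet pretrained on ImageNet (original 1000-way classifier).
model = models.alexnet(weights="IMAGENET1K_V1")

# Replace the last fully connected layer so it outputs two classes: benign vs. malignant.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 2)

# Optionally freeze the convolutional feature extractor and fine-tune only the classifier.
for param in model.features.parameters():
    param.requires_grad = False
```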

Parameter Optimization
Selecting an optimal network parameter combination is a time-consuming task. In this study, the objective is to investigate the performance of CNNs using parameter optimization. The Taguchi method is a low-cost, high-efficiency quality engineering method that emphasizes improving product quality through designed experiments. Therefore, the Taguchi method was applied for the parameter optimization of CNNs.
First, the objective function is defined. Then, the factors and levels that affect the objective function are selected. The orthogonal array and the signal-to-noise ratio (S/N ratio) are the two main tools of the Taguchi method. The orthogonal array is used to determine the number of experiments needed and to allocate the experimental factors into the array. The S/N ratio is used to verify whether the CNN parameters form the optimal parameter combination. Finally, the optimal key factors and levels are decided according to the experimental results. In this way, the optimal combination of factors and levels can be found while keeping the experimental cost reasonable. Figure 8 displays the flowchart of the Taguchi method.

• First step: Understand the task to be completed. Here, the CNN parameters, including kernel size (KS), stride (S), and padding (P), needed to be optimized in order to achieve higher accuracy in fewer experiments.

• Second step: Select factors and levels. In AlexNet, the first convolutional layer extracts global features and the fifth convolutional layer extracts local features of the input image. Therefore, the KS, S, and P of the first and fifth convolutional layers were adjusted by the Taguchi method. The factors are the kernel size (C1-KS), stride (C1-S), and padding (C1-P) of the first convolutional layer, and the kernel size (C5-KS), stride (C5-S), and padding (C5-P) of the fifth convolutional layer. The levels were assigned according to the parameters commonly used in state-of-the-art CNNs, as shown in Table 4.
• Third step: Choose an appropriate orthogonal array. The orthogonal array provides statistical information with fewer experiments. After the factors and levels are selected, an appropriate orthogonal array is chosen based on them. In this study, C1-P had two levels, and C1-KS, C1-S, C5-KS, C5-S, and C5-P had three levels. The total degree of freedom in the experiment is 11; therefore, the L18 orthogonal array was selected. The selected factors and levels would initially require 486 (3 × 3 × 2 × 3 × 3 × 3) experiments, while using the orthogonal array reduces the scope to only 18 experiments.

• Fifth step: Perform the 18 experiments based on the L18 orthogonal array. In this study, each experiment was tested five times to obtain an overall accuracy.

• Sixth step: Calculate the S/N ratio (the larger-the-better form given after this list) and analyze the experimental data.

• Seventh step: Accurate classification of lung tumor images is the purpose of this study. Hence, a higher S/N ratio indicates that the parameter combination is optimal and able to provide superior performance.

• Eighth step: Finally, use the acquired optimal parameter combination to train AlexNet again to verify that it improves the accuracy of the network.

Experimental Results
All experiments were implemented in MATLAB on a personal computer (Intel Xeon E3-1225 v5 processor, 3.30 GHz; NVIDIA GTX 1080 graphics processing unit).

SPIE-AAPM Lung CT Challenge Data Set
The SPIE-AAPM Lung CT Challenge data set [29] was first presented at the Medical Imaging conference in 2015 and was supported by the American Association of Physicists in Medicine (AAPM) and the National Cancer Institute. It contains 22,489 lung CT images, with 11,407 images of malignant tumors and 11,082 images of benign tumors. The size of each image is 512 × 512 pixels. Figure 9 displays the CT images of malignant and benign tumors.

Experiment 1: Data Augmentation
All lung tumor images were classified using AlexNet, and the training parameters of AlexNet and the GAN listed in Table 6 were chosen based on user experience and the official MATLAB default settings. In order to avoid producing confusing images, malignant and benign tumor images were generated separately. Figure 10 displays the generated images.

The generated images were mixed into the original image data set for lung tumor identification. Thereby, 70% of the mixed images were used as training data and 30% as testing data, as presented in Figure 11. The accuracy improves as the number of images increases. Table 7 lists the number of mixed images, and Table 8 presents the accuracy, specificity, and sensitivity of lung tumor classification. With data augmentation, accuracy improved from 97.48% to 98.42% and sensitivity from 95.10% to 99.40%.
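A schematic version of this mixing and splitting step is sketched below, under the assumption that the real and GAN-generated images are already loaded as lists of (image, label) pairs; the function name, fixed seed, and ratio parameter are illustrative, and the study itself performed this step in MATLAB.

```python
import random

def build_mixed_split(real_samples, generated_samples, train_ratio=0.7, seed=0):
    """Mix real and GAN-generated (image, label) pairs, then split 70% / 30%."""
    mixed = list(real_samples) + list(generated_samples)
    random.Random(seed).shuffle(mixed)          # shuffle so both classes and sources mix
    cut = int(train_ratio * len(mixed))
    return mixed[:cut], mixed[cut:]             # training set, testing set
```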

Experiment 2: Verification of Generated Image
To verify the plausibility of the generated images, 30% of the original images were reserved as validation data at the beginning. The remaining 70% of the original images were mixed with generated images as training data. The flowchart of the verification process is presented in Figure 13. Tables 9 and 10 display the accuracy, specificity, and sensitivity of the verification. The experimental results reveal that the accuracy and sensitivity improved after the original data set was augmented. The highest accuracy reached 99.60% and the highest sensitivity was 99.80%; in other words, the generated images can be trusted to alleviate the problem of sparse medical images.

From Figures 12 and 14, it can be noticed that quadrupling the data set does not bring a significant accuracy improvement. The reason might be that the image diversity is already sufficient for the network to learn the features of lung tumors when the data set is tripled. Moreover, the accuracy in Experiment 2 reached 99.6%, higher than in Experiment 1; the likely reason is that the goal of Experiment 2 was to verify the generated images, which contain noise and thus make the training data more diverse. Therefore, the generated images are more appropriate for helping the network extract different features than for testing. Sensitivity is another important evaluation index for the network, especially in medical applications; from these experiments, the highest sensitivity achieved after data augmentation is 99.8%.

Using Parameter Optimization in Experiment 1
In parameter optimization, 18 parameter combinations were evaluated through the orthogonal array, and each experiment was repeated five times. The training parameters of AlexNet are presented in Table 11. Table 12 lists the five observations and the S/N ratio of each run.
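For each of the 18 runs, the five observed accuracies are condensed into a single larger-the-better S/N value. A minimal sketch of that calculation is given below; the observation values are placeholders for illustration, not values taken from Table 12.

```python
import math

def sn_larger_the_better(observations):
    """Larger-the-better S/N ratio: -10 * log10(mean(1 / y^2)) over repeated runs."""
    return -10.0 * math.log10(sum(1.0 / y**2 for y in observations) / len(observations))

# Example: five repeated accuracy observations (in %) for one orthogonal-array run.
print(round(sn_larger_the_better([99.2, 99.4, 99.1, 99.5, 99.3]), 3))
```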

According to the S/N ratios in Table 12, the optimal level of each factor was analyzed and the significant factors were ranked. The results of the 18 experiments are displayed in Table 13, and the best factors correspond to the levels listed in Table 4. The best parameter combination for SPIE-AAPM data set classification is C1-KS3, C1-S1, C1-P1, C5-KS2, C5-S3, C5-P1. The highest accuracy achieved using this parameter combination is 99.99%. This accuracy is considerably higher than that of other networks, and the training time is also shorter; the results are shown in Table 14. Table 15 displays the best factors for different data set sizes, and Table 16 compares AlexNet with the Taguchi method against the original AlexNet for different data quantities. Table 16 reveals that the average accuracy improves from 97.48% to 99.49% after data augmentation and parameter optimization. The experimental results are graphically presented in Figure 15, which compares the best parameter combination with the original AlexNet.

Using Parameter Optimization in Experiment 2
Table 17 presents the best factors for each size of data set in Experiment 2, and Table 18 compares AlexNet with the Taguchi method against the original AlexNet. The accuracy increases from 97.10% to 99.86% when the data are augmented and parameter optimization is implemented.

Table 17. Best factors of each quantity of images in Experiment 2.
Data size    C1-KS  C1-S  C1-P  C5-KS  C5-S  C5-P
Original       9      4     2     3      1     1
Double         9      4     2     3      3     1
Treble        13      4     2     3      1     1
Quadruple     11      2     2     3      3     1

Overall, from Experiments 1 and 2, considering the size of the data, the better augmentation size might be double or triple the original, since the accuracy shows significant improvement at those sizes. In addition, AlexNet with the optimal parameter combination shows better accuracy and a lower standard deviation, making it more stable than the original AlexNet. Although the Taguchi method reduces the number of experiments, it still needs to be executed multiple times; however, for medical applications it is vital to have an accurate classification network.

Conclusions
Accurate lung tumor classification plays a crucial role in early diagnosis, and computer-aided diagnosis can considerably reduce clinicians' workload. However, obtaining open-access medical images is difficult. Therefore, a GAN was used to augment the data set and alleviate the data shortage problem. With data augmentation, the overall accuracy of the CNN improved by 2.73%. Moreover, tuning CNN parameters has become another pressing issue. In this study, the Taguchi method was implemented to select optimal parameters through fewer experiments. The experimental results revealed that the accuracy achieved with the optimal parameter combination can reach 99.86%. The present study only discussed lung tumor classification, and the optimizer for CNNs only considered three parameters in the first and fifth convolutional layers. Further research will entail clinical application and optimizer improvement, such as adjusting the parameters of every layer to obtain the best parameter combination or implementing the optimizer in different network architectures. The method can also be applied to other medical applications, such as breast, brain, and liver cancer classification.