Using Double Convolution Neural Network for Lung Cancer Stage Detection

Abstract: Recently, deep learning has been used with convolutional neural networks for image classification and figure recognition. In our research, we used Computed Tomography (CT) scans to train a double convolutional Deep Neural Network (CDNN) and a regular CDNN. These topologies were tested against lung cancer images to determine the Tx cancer stage at which they can detect the possibility of lung cancer. The first step was to pre-classify the CT images from the initial dataset so that the training of the CDNN could be focused. Next, we built the double convolutional Deep Neural Network with max pooling to perform a more thorough search. Finally, we used CT scans of different Tx stages of lung cancer to determine the Tx stage at which the CDNN would detect the possibility of lung cancer. We tested the regular CDNN against our double CDNN. Using this algorithm, doctors will have additional help in early lung cancer detection and early treatment. After extensive training with 100 epochs, the double CDNN obtained its highest accuracy of 0.9962, whereas the regular CDNN obtained an accuracy of only 0.876.


Introduction
Medical treatment has traditionally been based on symptom analysis. This means that patients first have their symptoms analyzed and, if necessary, are sent for a more precise analysis (specialists and scans). Nowadays, the concept of "precise medicine" tries to solve the problem of the vast but fractured state of biomedical data. This is done using patient-centric appointments and storing the digital data of the patients in shareable online databases [1]. Furthermore, the European Medical Association, the World Health Organization and the United States Association have found that there is an enormous increase in lung cancer in the United States and Europe, making lung cancer the number one cause of death in Europe and the US [1]. The latest developments in deep learning and Deep Neural Networks (DNN) have improved the process of image recognition. Using Deep Neural Networks, we can search for patterns in an image and determine whether we recognize the pattern. Furthermore, when analyzing an image, we can search for multiple patterns. Training the Neural Network often requires a predetermined dataset, which the network can use to learn, recognize and classify an image.
Deep Neural Networks are becoming more and more popular as they can be easily applied to image pattern recognition and image classification. A few other derivative methods have emerged, such as Template Matching, Support Vector Machine, Deep Restricted Boltzmann, Stacked Autoencoders and Deep Convolutional Networks [2,3]. Convolutional Deep Neural Networks have [...] number of medical images is difficult to obtain, other methods can be used to train the DNN. In [23], the authors used active learning to help with the dataset, that is, to help with selecting and classifying the images before training. They used a multistage training scheme to overcome the overfitting problem, which means that they started with a smaller dataset and reduced it to the point where there is no overfitting. For each step, they predicted the amount of data they needed to send to the DNN and measured if and when overfitting happens.
To train the network with "heavy" multimedia, one needs a large set of input nodes to pass the information through the network. In [24,25], the authors used extremely large Computer-Aided Detection (CAD) 3D images of lung cancer to provide the classification. To achieve this, they used U-Net LUNA 16 labeled data nodules to pass throughout the network. They had smaller pieces of the images already divided into nodules (pieces) that were pre-labeled as malignant or not. This way, the entire image is not taken into consideration, only the small nodules that are directly mapped to the nodes of the network.
Images can be classified (using a CDNN) in more ways than just into piles of cancerous or non-cancerous. In [26,27], the authors used fluorodeoxyglucose positron emission tomography (FDG-PET) images to determine the Tx stage of the cancer. They defined four piles for the T1-T4 stages of cancer and determined the outcome of the classification. As an extension of the research in [26], the authors of [27] additionally used CAD images and compared the results with FDG-PET.
We propose a lung cancer medical image classifier that is based on a Convolutional Deep Neural Network. To train and test our system, we used CT images of lungs that were previously classified by medical specialists and put into piles of yes/no (yes, the patient is diagnosed with lung cancer; no, the patient is cancer-free). Similar to Stanitsas and Cherian in [23], we pre-classified the images, but, in our case, we pre-classified them into groups of slice images taken from the same angle of the lung from different patients in our training dataset. Our system was trained using these images to classify a new (previously unknown) image into one of the two piles (cancer or cancer-free), and we then tested the network to determine the success rate. Similar to the authors of [24,25], we divided the image into smaller pieces (using the convolution layer). Unlike the work in [24,25], our algorithm uses the entire image (combined with the pieces) for each following layer, reduced with a max-pooling algorithm. When the initial success rate of training the network was satisfactory (fit value), the topology was saved and further asynchronously tested against an additional dataset. This additional dataset was composed of images outside of the initial dataset of CT lung images and contains medical images of predetermined lung cancer in stages 2, 3 and 4. Our algorithm, unlike the algorithm in [26,27], uses these three stages of lung cancer and determines at which of these Tx stages it can detect the possibility of cancer.

Medical Image Classification Using Double DNN
Image recognition in Deep Neural Networks is based on image classification, where the Neural Network is trained to classify an image into a list of predetermined piles or types [28]. In its simplest form, it is used to determine if something is recognized or not. In our case, we tried to classify medical images and determine if there is cancer or not; thus, we can simplify the outcome of the recognition as YES/NO (YES, there is cancer, or NO, there is not). The Neural Network has to be trained so that it can be used for image classification. This process takes a list of input data, which are fed to the network, and the outcome is compared to the expected outcome. The input data in our case were a pile of CT images that were fed to the network, so the input layer could have as many input nodes as the size of the array. In that case, the input layer would have many nodes, the training would be slower and the network might overfit. Thus, we added additional layers (i.e., the max-pooling algorithm) that downsized the input data. The output of the network can be a single 0/1 node or an array. The exit layer, in our case, outputs a single decimal value between 0.0 (not cancer) and 1.0 (cancer).

Data Preparation
Data preparation is a crucial part of DNN training and testing. In our case, data preparation was done in several stages. The CT images were obtained from the Image & Data Archive of the Laboratory of Neuro Imaging (LONI) at the University of Southern California (ida.loni.usc.edu). These images were analyzed and classified by medical personnel (as cancerous or not) by performing a biopsy of the lung cancer tissue to ensure a high level of certainty about the labeling. The initial dataset contains CT scans of patients. When a patient undergoes a CT scan, the scanner takes many images of the lung of the patient; each of these images is from a different part of the lung. These images are called slices (different angle images), which capture different parts or angles of the lung. Thus, one CT scan of one patient can produce many slices, and each of these slices is saved as an image. First, the initial dataset was divided into two piles; the first pile contained images from patients that were diagnosed with cancer, and the second pile contained images from patients without cancer. Thus, the two piles divided the images into cancerous or cancer-free. Next, the images of the two piles were further divided into groups, where each group contained images (slices) from the same part of the lung but from different patients.
The initial dataset had 95 patients, where each patient had gone through one CT scanning process. One such process produced 64 CT images (slices) of the chest of the patient. This means that the initial dataset had 6080 images that the DNN used for training and testing. One of the 64 slices (images) is shown in Figure 1. Next, these images were labeled by medical personnel as cancerous or cancer-free and the initial dataset was divided into the two piles. In our case, we had 73 patients who were diagnosed with the possibility of cancer (Pile 1) and 22 who were without cancer (Pile 2).
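The image counts quoted in this section follow directly from the patient counts and the 64 slices per scan. The short check below is only an illustration of that arithmetic; the variable names are ours, not the authors'.

```python
# Counts stated in the text.
patients_cancer = 73        # Pile 1
patients_cancer_free = 22   # Pile 2
slices_per_scan = 64        # one CT scan per patient

patients_total = patients_cancer + patients_cancer_free
images_total = patients_total * slices_per_scan
images_cancer = patients_cancer * slices_per_scan
images_cancer_free = patients_cancer_free * slices_per_scan

print(patients_total, images_total, images_cancer, images_cancer_free)
```

The per-pile totals (4672 and 1408) are the same numbers later used as the positive and negative image counts.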
Appl. Sci. 2019, 9, x FOR PEER REVIEW 4 of 12
Next, we needed to further group the images in the two piles, where each group represented images from the same slice (angle of the chest of the patient). This way, we created 64 groups within each pile, where each group in Pile 1 (cancer pile) had 73 CT images.
Each of the 64 groups of Pile 2 (cancer-free pile) had 22 CT images. A sample of different angle (slice) CT images is shown in Figure 2. The middle image is cancer-free (belongs to Pile 2) and the other two have the location of the cancer marked (from Pile 1). The piles were created so that we had positives and negatives to train and test the network, whereas the groups in each pile were created so that the Deep Neural Network would focus on recognizing same-slice (angle) images. The groups were created using the K-means algorithm to place each image into the appropriate slice group. We used K-means clustering because we had CT images that were taken from 16-, 32-, 64-, 128-, 256- and 320-slice CT scanners. We used 64 groups because most of the images we had were 64-sliced. For images obtained from a scan that produced more or fewer than 64 slices, we used the K-means algorithm to put them into the correct group.
The number of slices determines the distance between one scanned image of the body and the next. With more slices, the distance between slices is smaller and we get more information about the patient. Furthermore, a scanner from one manufacturer can output images from first to last (or vice versa) or save them under its own file-naming format; thus, even if patients were scanned by a 64-slice scanner, images from a different manufacturer or software version might not be in the same order. We could have discarded images that were not compliant with our scanner, but since CT images are hard to obtain, we did not discard any. The K-means algorithm we used is given in Equation (1).
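Equation (1) itself is missing from this copy of the text. As a hedged stand-in, and not necessarily the authors' exact formulation, the standard K-means objective written with the symbols this section uses (k groups, n cases, image X_i, reference C_j) would read:

```latex
J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\lVert X_i^{(j)} - C_j \right\rVert^2
```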
where "k" is the number of groups (in our case 64), and we made about n = 100 test cases of where the image should be placed. More tests (the n parameter) lead to a more precise estimation (classification) of the image. In our case, using a basic trial-and-error method, we found that 100 cycles were the smallest sufficient number of tests to classify a CT image into the appropriate slice group.
The image that we tried to group is X_i, and we compared that image to a pre-grouped reference image from each group, C_j. We picked a reference group of pre-classified images C_j and used them to determine the distance function (Euclidean distance) |X_i - C_j| and cluster the new image X_i. Since the Euclidean distance is the shortest distance between two points, we calculated the smallest difference between the image we tried to group and the pre-grouped reference image. The distance between two images is the difference between them, i.e., how close they are to each other. We calculated the structural similarity index of two grayscale images (Python function compare_ssim from the skimage Python package). This function returns the score of the comparison and the difference between the two images. The group with the smallest difference is where the image X_i was placed.
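The assignment step described above can be sketched as a nearest-reference search. This is a minimal illustration, assuming plain Euclidean distance on flattened pixel lists rather than the structural similarity index the authors computed with skimage; the function names and the tiny four-pixel "images" are hypothetical.

```python
import math

def euclidean_distance(image_a, image_b):
    """Euclidean distance between two equally sized grayscale images,
    each given as a flat list of pixel intensities."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(image_a, image_b)))

def assign_slice_group(image, references):
    """Return the index j of the reference image C_j closest to `image`."""
    distances = [euclidean_distance(image, ref) for ref in references]
    return distances.index(min(distances))

# Hypothetical 2x2 images flattened to 4 pixels; real slices are far larger.
references = [
    [0, 0, 0, 0],          # reference image for group 0
    [128, 128, 128, 128],  # reference image for group 1
    [255, 255, 255, 255],  # reference image for group 2
]
new_image = [120, 130, 125, 140]
print(assign_slice_group(new_image, references))
```

With a similarity score such as SSIM, where higher means more alike, the same search would pick the maximum score instead of the minimum distance.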
Additionally, the algorithm further divided the groups into training and testing sets. In our case, we used 10% of the images for testing and the other 90% for training. The data were loaded from a batch of images and the images were divided into training and testing sets (subsets [X, Y]). Each of these subsets was accompanied by a set of 0s and 1s indicating whether the image was cancer-free or cancerous (this output was evaluated by medical personnel, in this case an oncologist). Finally, the data were shuffled, converted into a binary class matrix and fed to the Neural Network.
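A minimal sketch of the 90/10 split and label encoding follows. It assumes that "binary class matrix" means one-hot rows, and it shuffles before splitting for simplicity; the helper names and the fixed seed are illustrative choices, not taken from the authors' code.

```python
import random

def one_hot(label, num_classes=2):
    """Encode an integer label as one row of a binary class matrix."""
    row = [0] * num_classes
    row[label] = 1
    return row

def split_dataset(images, labels, test_fraction=0.1, seed=0):
    """Shuffle (image, label) pairs and split them into train and test sets."""
    pairs = list(zip(images, labels))
    random.Random(seed).shuffle(pairs)
    n_test = round(len(pairs) * test_fraction)
    test_pairs, train_pairs = pairs[:n_test], pairs[n_test:]
    x_train = [img for img, _ in train_pairs]
    y_train = [one_hot(lbl) for _, lbl in train_pairs]
    x_test = [img for img, _ in test_pairs]
    y_test = [one_hot(lbl) for _, lbl in test_pairs]
    return (x_train, y_train), (x_test, y_test)

# Placeholder ids stand in for pixel arrays; 1 = cancerous, 0 = cancer-free.
images = list(range(6080))
labels = [1] * 4672 + [0] * 1408
(x_train, y_train), (x_test, y_test) = split_dataset(images, labels)
print(len(x_train), len(x_test))
```

For 6080 images this yields 5472 training and 608 testing images, matching the sizes reported for the training process.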

Defining, Training and Testing the DNN
Once the images were ready and in binary matrix form, they were fed to the DNN for training and testing. First, however, the network model had to be created, which means defining the parameters and layers of the DNN. After the images were ready to be used for training (Section 3.1), we prepared the Neural Network with additional layers to create the Deep Neural Network. The inner layers are composed of one convolution layer and a max-pooling layer, followed by a double convolution layer (two convolutions) and an additional max-pooling layer. The first convolution layer does the initial segmentation of the images and the interconnection of the nodes. Next, we needed to reduce the size of the data with the max-pooling layer (to avoid overfitting). The second and third convolutions were done so that we could make a more thorough search of the problem (the cancer) and obtain more precise information about where the cancer might be. The connection between the convolution layer and the DNN is provided by Equation (2). Equation (2) shows how the state of one isolated neuron is calculated using convolution when there are q input connections.
We can see from Equation (2) that the state of the neuron with one convolution layer is F_H, where we have H kernel-filtered images, which use filters W and a bias factor b. The bias factor b can have value 0 or 1, telling the network whether to include that neuron. x_i is the value of the input nodes of the previous layer to the i-th node of the current layer. Equation (2) is illustrated in Figure 3.
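To make the layer order described in this section concrete (convolution, max pooling, two convolutions, max pooling), the sketch below traces how the spatial size of a 128-pixel-wide input would shrink. The 3 x 3 kernels, 2 x 2 pooling windows and unpadded convolutions are assumptions for illustration only; the text does not state these sizes.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial size after an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1

def pool_output_size(size, window):
    """Spatial size after non-overlapping max pooling."""
    return size // window

# Layer order from the text; kernel and window sizes are assumed.
LAYERS = [("conv", 3), ("pool", 2), ("conv", 3), ("conv", 3), ("pool", 2)]

def trace_sizes(input_size, layers):
    """Return the spatial size before the first layer and after each layer."""
    sizes = [input_size]
    for kind, k in layers:
        previous = sizes[-1]
        if kind == "conv":
            sizes.append(conv_output_size(previous, k))
        else:
            sizes.append(pool_output_size(previous, k))
    return sizes

print(trace_sizes(128, LAYERS))
```

Under these assumptions, each pooling step roughly halves the width and height, which is what keeps the number of nodes in the later layers manageable.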
The calculation of the correlation between the convolution and the DNN presented in Equation (2) is further explained in Equation (3). Equation (3) is Equation (2) with the addition that the neuron is now part of a hidden convolution layer, and the output energy of the neuron, E_{j,k}, is calculated for neuron k in layer j of the DNN, where we use σ (the sigmoid function) to calculate the fire value. Again, we used the bias factor b and the input connection weights of each neuron in layer j-1, denoted as "w", and we convolved that value with the input from the nodes in the previous layer, denoted as "x". "q" and "l" represent the size of the input matrix of shared weights W, and "i" and "z" are the indexes of the input activation at position (j + i, k + z). The calculation of the energy output E_{j,k} is illustrated in Figure 4. When the DNN is trained, the bias factor can be calculated more precisely and the correlations between the nodes adjusted. What the convolution does to an image is shown in Figure 5.
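The equation itself is not present in this copy, so the sketch below uses a common textbook form that matches the symbols described above: a sigmoid applied to the bias plus the sum of shared weights w[i][z] multiplied by the inputs x[j+i][k+z]. Treat it as an assumed illustration of the idea, not as the authors' Equation (3); the input values are made up.

```python
import math

def sigmoid(value):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-value))

def neuron_output(x, w, b, j, k):
    """Output of the hidden neuron at position (j, k):
    sigmoid(b + sum over i, z of w[i][z] * x[j + i][k + z])."""
    total = b
    for i in range(len(w)):
        for z in range(len(w[0])):
            total += w[i][z] * x[j + i][k + z]
    return sigmoid(total)

# Hypothetical 3x3 input patch and 2x2 shared-weight kernel.
x = [
    [1, 2, 0],
    [0, 1, 3],
    [2, 1, 0],
]
w = [
    [1, 0],
    [0, -1],
]
print(neuron_output(x, w, b=0.0, j=0, k=0))
print(neuron_output(x, w, b=0.0, j=1, k=1))
```

As in most neural network libraries, the kernel is applied without flipping, so strictly speaking this is a cross-correlation.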
As shown in Figure 5, we further divided the image into smaller parts, where the parts overlap. This way, we could focus on (isolate) a certain part of the image and use that (smaller) image to search for a pattern. In our network, we defined the convolution parameters and the sliding window. Our convolution layer takes an input of 128 × 128 × 1 (width × height × color). We used 1 for color (depth) since the image was grayscale. Furthermore, we used the Rectified Linear Unit as the activation function, which means that all negative activation values are replaced with 0. There is a tradeoff here as to how much overlapping there should be. If we increase the overlapping, we make more window images and thus a more detailed search. However, by doing so, we slow down the process of learning and classifying, as the window-cutting requires more resources. Since the convolution produces many smaller images from the original one, by using max-pooling, we reduce the size of these images into chunks of data, where we keep the most (maximum) of every image. This means that we searched for the cancer by upsizing the image in one layer and downsizing the results in the next by maximizing the bias (similarity) between adjacent kernels of the convolution. In the convolution function, we mainly used sharpening and edge detection filters. The filter in the convolution is a simple matrix that is convolved with the image matrix, and the result is another image whose edges are sharpened. The resulting image of the convolution filters is the third image in Figure 5. Before we trained and tested the network, we had to define the learning rate and the dropout factor. In our case, the algorithm drops out elements that have below a 50% success rate (by testing different models, we found that a 50% dropout rate is optimal for image classification). The half square-error cost function is calculated using Equation (4).
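The two operations mentioned above, the Rectified Linear Unit and max pooling, can be illustrated on a tiny feature map. This is a minimal sketch assuming a 2 x 2 non-overlapping pooling window, which the text does not specify; the input values are made up.

```python
def relu(feature_map):
    """Replace every negative activation with 0."""
    return [[max(0, value) for value in row] for row in feature_map]

def max_pool(feature_map, window=2):
    """Keep the maximum of each non-overlapping window x window block."""
    rows = len(feature_map) // window
    cols = len(feature_map[0]) // window
    pooled = []
    for r in range(rows):
        pooled_row = []
        for c in range(cols):
            block = [
                feature_map[r * window + dr][c * window + dc]
                for dr in range(window)
                for dc in range(window)
            ]
            pooled_row.append(max(block))
        pooled.append(pooled_row)
    return pooled

# Hypothetical 4x4 output of a convolution filter.
feature_map = [
    [-1, 2, 0, -3],
    [4, -5, 1, 2],
    [0, 0, -2, -1],
    [3, 1, -4, 0],
]
print(max_pool(relu(feature_map)))
```

The 4 x 4 map shrinks to 2 x 2 while the strongest response in each block survives, which is the downsizing effect described in the text.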
As we can see from Equation (4), we used a half square-error cost function; that is, we back-propagated the error to correct the previous layers and the convolution bias factor.
In Equation (4), as input, we take the connection weights of the network as W, the bias b, the input weights of the nodes x and the expected outcome y. The output of the network h_{W,b}(x) is calculated against the expected output y and one half of the error is propagated throughout the network.
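As an illustration of a half square-error cost, the sketch below assumes the usual form J = 1/2 * ||h_{W,b}(x) - y||^2, which is consistent with the symbols described here but is not copied from the authors' Equation (4). The function names and sample values are hypothetical.

```python
def half_squared_error(predicted, expected):
    """J = 0.5 * sum((h - y)^2) over the output nodes."""
    return 0.5 * sum((h - y) ** 2 for h, y in zip(predicted, expected))

def output_gradient(predicted, expected):
    """Derivative of J with respect to each output: (h - y).
    The factor 1/2 cancels the 2 produced by differentiating the square."""
    return [h - y for h, y in zip(predicted, expected)]

# Hypothetical single-node output: network says 0.8, the label is 1.0 (cancer).
print(half_squared_error([0.8], [1.0]))
print(output_gradient([0.8], [1.0]))
```

The gradient returned by `output_gradient` is the quantity that backpropagation would push back through the earlier layers.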
Once the network was defined, the algorithm was executed using the network parameters, and the result of the training and testing is the DNN topology. The training process took the pre-prepared dataset presented in Section 3.1 and defined the training set (X, Y) as 90% of the dataset (5472 images) and the testing set (X_Test, Y_Test) as the remaining 10% (608 images). In each epoch (training cycle), the algorithm passed all images once through the training process. Our training process had 100 epochs to train on the 95 CT images in each of the 64 slice groups. After the training was finished, the network's topology was saved and could be used to classify CT images and determine the possibility of cancer. In the process of training and testing, the network calculated the fit value of the topology by evaluating the number of properly classified images against the error.
In the training and testing process, we evaluated the system and its accuracy by averaging over the epochs how the algorithm classified the images. For defining, training and testing the DNN, we used TensorFlow-GPU version 1.8 (Google, Mountain View, CA, USA) compiled for CUDA GPU version 7.1 (NVIDIA, Santa Clara, CA, USA) and Keras libraries version 2.1 in Python, combined with native Python libraries to prepare the data. The algorithm was executed on a GPU-NVIDIA Corporation GM200 machine (NVIDIA, Santa Clara, CA, USA) equipped with about 1000 GPU GeForce cores. After the network was defined, trained, tested and cross-validated (this process took several hours), the topology was saved and used to classify new images outside the initial dataset. The classification of new images took a few seconds, which means that medical personnel and patients would have an initial diagnosis just seconds after the CT scanning is finished.
We defined the cancerous images as positives (4672 images) and the cancer-free images as negatives (1408 images), and we calculated the true positives (accurately classified positives), true negatives (accurately classified negatives), false positives and false negatives (inaccurately classified positives and negatives). The averaged results of all 100 epochs are shown in Table 1. Using these parameters, we calculated the accuracy, sensitivity, specificity and positive predictive value of the two algorithms. The accuracy, shown in Equation (5), is the proportion of all images that were classified correctly, that is, how accurate the system is. The sensitivity, shown in Equation (6), is the proportion of cancerous images that were correctly classified as cancerous.
The specificity in Equation (7), on the other hand, is the proportion of cancer-free images that were correctly classified as cancer-free. We used an additional parameter called positive predictive value, shown in Equation (8), which is the proportion of images classified as cancerous that actually were cancerous (used to determine the probability that the patient has cancer).
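The four measures can be computed directly from the confusion counts using their standard definitions. The sketch below uses made-up counts purely for illustration; they are not the values from Table 1, and the function name is ours.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard measures derived from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
    }

# Hypothetical counts, NOT the values reported in the paper.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because sensitivity and specificity each use only one class of images, they are less affected than accuracy by the imbalance between cancerous and cancer-free images in the dataset.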
We tested different thresholds for classification of the images and plotted the results on a Receiver Operating Characteristic (ROC) curve to determine the best classification threshold. The ROC curve is presented in Figure 6. From the analysis of the ROC curve, we found that a threshold of 0.76 gives the best classification accuracy. The results are given in Table 2. Table 2 shows that our double CDNN algorithm reaches almost 99.6% accuracy at a threshold of 0.76, whereas the regular CDNN reached its highest accuracy of 87% at a threshold of 0.70. The sensitivity was similar in both cases, which was expected since both networks used the same dataset. In addition, Table 2 shows that our double CDNN obtained higher results for prediction of cancer than the regular CDNN. We discuss these topologies in Section 4, together with the determination of the Tx stage at which the regular and double CDNN can detect the possibility of cancer.
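The threshold search described above can be sketched as a simple sweep: for each candidate threshold, the network outputs are binarized, the ROC point (false positive rate, true positive rate) is recorded, and the threshold with the highest accuracy is kept. This is only an illustrative sketch; the function names, the candidate thresholds and the scores below are hypothetical and do not reproduce the data behind Figure 6 or Table 2.

```python
import numpy as np


def roc_points(scores, labels, thresholds):
    """Return a list of (threshold, false positive rate, true positive rate)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    positives = np.sum(labels == 1)
    negatives = np.sum(labels == 0)
    points = []
    for t in thresholds:
        predicted = scores >= t  # classified as cancerous
        tpr = np.sum(predicted & (labels == 1)) / positives
        fpr = np.sum(predicted & (labels == 0)) / negatives
        points.append((float(t), float(fpr), float(tpr)))
    return points


def best_threshold(scores, labels, thresholds):
    """Return (threshold, accuracy) with the highest accuracy.
    Ties are resolved in favour of the first threshold tried."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_t, best_acc = None, -1.0
    for t in thresholds:
        predicted = (scores >= t).astype(int)
        acc = float(np.mean(predicted == labels))
        if acc > best_acc:
            best_t, best_acc = float(t), acc
    return best_t, best_acc


# Hypothetical network outputs and ground-truth labels (1 = cancer).
scores = [0.95, 0.80, 0.78, 0.72, 0.60, 0.30]
labels = [1, 1, 1, 0, 0, 0]
candidates = [0.50, 0.60, 0.70, 0.76, 0.90]

points = roc_points(scores, labels, candidates)
threshold, accuracy = best_threshold(scores, labels, candidates)
print(threshold, accuracy)
```

Selecting the threshold by accuracy follows the text above; other criteria could be substituted in `best_threshold` without changing the sweep.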

Tx Stages of Lung Cancer of Our Double CDNN
After defining, training and testing our network (Section 3.2), we used the topologies of the regular and double CDNN in experiments with an additional dataset from 35 patients diagnosed with lung cancer in stages 2, 3 and 4 (images obtained from the medical hospital in Tetovo, Macedonia). Since it is difficult to obtain images in stages 0 and 1, we selected 35 patients in whom the possibility of cancer was diagnosed in stage 2 and whose scans were recorded up until late stage 4. Figure 7 shows example CT images of stages 2, 3 and 4 of lung cancer.
Appl. Sci. 2019, 9, x FOR PEER REVIEW 9 of 12
From CT scan images, doctors can diagnose the stage only by using the size of the tumor (and in some cases the position of the tumor). Stage 2 (first image in Figure 7) shows the tumor in a red circle on the left side, which in real size is around 4 cm. Stage 3 (second image in Figure 7) shows the cancer in a red circle. In this stage, the tumor is larger than 4 cm and is in the middle of the lung and/or extending towards the outer parts of the body. We can see in the second image in Figure 7 that the tumor is in late stage 3, since it leans towards the outer parts of the lungs. Stage 4 is shown in the third image in Figure 7, where the tumor covers large portions of the lung and almost reaches the outer parts of the body and lung. This outer part of the lung is called area 1, and if the tumor is in this area (shown with a red circle in the third image in Figure 7), it means that the cancer is terminal.

Comparison of the Regular Against Double CDNN
We tested these images with the standard Convolution Deep Neural Network used by the authors of [28] against our double Convolution pre-clustered Deep Neural Network with edge sharpening filters. We used this test set of images to determine the threshold, or Tx stage, at which both networks can detect the possibility of cancer. The networks output a decimal value from 0.0 to 1.0, where 1.0 is cancer and 0.0 is cancer-free. We converted this value to a percentage of certainty by multiplying it by 100. The results of the two networks are shown in Figure 8. The drawback here is that we had to decide the minimal value of certainty we would accept as being satisfactory. To fairly compare both networks (regular and double CDNN), we took the mean of the threshold values at which each reached its best accuracy. In Table 2, we can see that the regular CDNN reached its best accuracy at a threshold of 70% and the double CDNN at 76%; thus, we used 73% as the minimal threshold value of certainty for cancer detection. Taking 73% as the threshold for cancerous for both topologies, we can see in Figure 8 that our double CDNN detected cancer in stage 3, whereas the regular CDNN from [28] did not detect cancer even in stage 4 (late stage). Taking the lower threshold value of 70% (Table 2), the regular CDNN detected the possibility of cancer in late stage 4.
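The comparison procedure can be illustrated with a short sketch: the common threshold is the mean of the two best thresholds (70% and 76%, giving 73%), each network output is converted to a percentage, and the earliest stage whose certainty reaches the threshold is reported. The function names and the per-stage outputs below are hypothetical values chosen for illustration; they are not the measurements plotted in Figure 8.

```python
def common_threshold(best_thresholds):
    """Mean of the per-network best thresholds, in percent."""
    return sum(best_thresholds) / len(best_thresholds)


def first_detected_stage(outputs_by_stage, threshold_percent):
    """Return the earliest stage whose certainty reaches the threshold.

    outputs_by_stage: list of (stage name, network output in [0.0, 1.0]),
    ordered from the earliest stage to the latest. Returns None if no
    stage reaches the threshold.
    """
    for stage, output in outputs_by_stage:
        certainty = output * 100.0  # convert to percentage of certainty
        if certainty >= threshold_percent:
            return stage
    return None


threshold = common_threshold([70.0, 76.0])  # thresholds from Table 2

# Hypothetical network outputs per stage (illustration only).
double_cdnn = [("stage 2", 0.55), ("stage 3", 0.81), ("stage 4", 0.93)]
regular_cdnn = [("stage 2", 0.40), ("stage 3", 0.62), ("stage 4", 0.71)]

print(threshold)
print(first_detected_stage(double_cdnn, threshold))
print(first_detected_stage(regular_cdnn, threshold))
print(first_detected_stage(regular_cdnn, 70.0))
```

With these illustrative outputs, lowering the threshold from 73% to 70% changes the result for the second network only, which mirrors the sensitivity of the comparison to the chosen minimal certainty.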
Our results were discussed and analyzed with medical personnel from the oncology department of the hospital in Tetovo, Macedonia. The results were marked as satisfactory, since expert oncologists cannot determine the possibility of cancer from a CT scan until stage 2 or 3. Experts can suspect a possibility of cancer from stage 0, but will not schedule a biopsy of the tissue until late stage 2 or 3. Detection at this stage is expected, since most of the cancerous images used (Section 3.1) for training and testing (Section 3.2) of the algorithms (both standard and double convolution DNN) were from phase T3 or above.

Conclusions
The first novelty in our paper is using the K-means algorithm to pre-classify the images into piles of same-slice images, so that the DNN can focus on classifying images of the same slice. The second novelty is the additional convolution layer with edge sharpening filters, which searches for cancer more thoroughly. Finally, the main novelty is testing our Deep Neural Network with lung cancer images from Tx stages 2, 3 and 4 and determining at which Tx stage the two algorithms can detect the possibility of cancer. The results were analyzed with medical personnel from the oncology department and were marked as satisfactory for detecting cancer in the T3 phase.
For future work, we plan to carry out a further analysis, in which we will change the DNN to output two values (for classes 0 and 1) and determine which one has the higher certainty of classification. This way, we can classify the image not just with a single decimal value between 0.0 and 1.0, but also compare how much it is 0 (not cancer) and how much it is 1 (cancer). For additional future work, similar to Cruz-Roa and Arevalo Ovalle in [29], who used RGB (color) images to highlight the area of malignant cells, we plan to modify the DNN to show where on the CT image it has detected cancer.

Author Contributions: G.J. and D.D. defined the problem to be the detection of cancer in patients. G.J. obtained access to the image database of lung cancer, and D.D. prepared the images to be fed to the algorithm. G.J. used the K-means algorithm to divide them into slice piles. D.D. adjusted the layers of the Deep Neural Network to reflect the algorithm. The additional tests with stages II, III and IV were done by G.J. and D.D. The results from the tests were analyzed by G.J. and medical personnel from the oncology department.

Conflicts of Interest:
The authors declare no conflict of interest.