Detection of Cancer Cells Using Deep Transfer Learning and Histogram-Based Image Focus Quality Assessment

In recent years, the number of studies using whole-slide images (WSIs) of histopathology slides has expanded significantly. For the development and validation of artificial intelligence (AI) systems, glass slides from retrospective cohorts including patient follow-up data have been digitized. It has become crucial to determine whether the quality of such resources meets the minimum requirements for the development of AI in the future. The need for automated quality control is one of the obstacles preventing the clinical implementation of digital pathology work processes. As a consequence of the inaccuracy of scanners in determining the focus of the image, the resulting visual blur can render the scanned slide useless. Moreover, when scanned at a resolution of 20× or higher, the resulting image size of a scanned slide is often enormous. Therefore, for digital pathology to be clinically relevant, computational algorithms must be used to rapidly and reliably measure an image's focus quality and decide whether it requires re-scanning. We propose a metric for evaluating the quality of digital pathology images that uses a sum of even-derivative filter bases to generate a human visual-system-like kernel, which is described as the inverse of the lens' point spread function. This kernel is then applied to a digital pathology image to measure the high-frequency image content degraded by the scanner's optics and assess patch-level focus quality. Through several studies, we demonstrate that our technique correlates with ground-truth z-level data better than previous methods and is computationally efficient. Using deep learning techniques, our proposed system is able to identify positive and negative cancer cells in images.
We further expand our technique to create a local slide-level focus quality heatmap, which can be utilized for automated slide quality control, and we illustrate our method's value in clinical scan quality control by comparing it to subjective slide quality ratings. The proposed method, GoogleNet, VGGNet, and ResNet had accuracy values of 98.50%, 94.50%, 94.00%, and 95.00%, respectively.


Introduction
Cancer is a leading cause of mortality worldwide [1], and according to a study conducted by the American Cancer Society (ACS), roughly 600,920 Americans were predicted to die from cancer in 2017 [2]. Therefore, combating cancer is a significant problem for both researchers and clinicians [3]. Cancer diagnosis relies heavily on early detection, which may enhance the chances of long-term survival. Medical imaging is a crucial technology for the early identification and diagnosis of cancer. As is well known, medical imaging has been extensively used for early cancer diagnosis, monitoring, and follow-up after therapy [4]. Therefore, beginning in the early 1980s, computer-aided diagnosis (CAD) systems were developed to help physicians evaluate medical images more efficiently [5]. For the identification and diagnosis of cancer, machine learning methods are commonly used in CAD systems, integrating medical imaging. The extraction of features is often a crucial step in the adoption of machine learning methods. Different approaches to feature extraction have been studied for various imaging modalities and cancer types [6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21]. For detecting mass areas in mammograms, bilateral image subtraction, difference of Gaussians, and Laplacian of Gaussian filters have been chosen as feature extractors [6][7][8][9][10] for breast cancer diagnosis. However, past works have mostly focused on generating appropriate feature descriptors in conjunction with machine learning algorithms for medical image context learning. These feature-extraction-based approaches have several shortcomings, which prevent the performance of CAD systems from being enhanced further. In recent years, rather than feature engineering, the relevance of representation learning has been stressed [22,23] in order to overcome these shortcomings and enhance the performance of CAD systems.
Deep learning is a representation learning approach in which image data are used to create hierarchical feature representations. Deep learning can be used to generate high-level representations of image features directly from raw image data. In addition, with the support of massively parallel architectures and graphics processing units (GPUs), deep learning strategies have achieved significant success in a variety of fields in recent years, including image recognition, object detection, and speech recognition. Recent studies have indicated, for example, that CNNs [24] produce promising outcomes in cancer detection and diagnosis.
In this article, we present several prominent deep learning algorithms and a proposed metric for evaluating digital pathology images that uses a sum of even-derivative filter bases to generate a human visual-system-like kernel.
The remainder of this paper is laid out as follows. Related work is discussed in Section 2. Then, in Section 3, we present the proposed method. In Section 4, we present the dataset preparation. In Section 5, we present the experimental results and a discussion. In Section 6, we present our conclusions.

Related Work
Cancer is the unregulated and abnormal division and development of cells inside the body. A brain tumor is a mass of abnormally dividing and proliferating brain tissue cells. Despite their rarity, brain tumors are among the most fatal types of cancer [25].
Brain tumors are classified as either primary or metastatic, depending on where they started. Although metastatic brain tumors originate in another organ and then spread to the brain, primary brain cancers arise directly from brain tissue. Gliomas are brain tumors that arise from glial cells. According to recent segmentation studies, they are the most common kind of brain tumor studied. Astrocytomas and oligodendrogliomas are examples of low-grade gliomas, but glioblastoma multiforme (GBM) is the most aggressive and prevalent kind of primary malignant brain tumor [26]. Gliomas are often treated with surgery, chemotherapy, and radiation therapy [27].
For better treatment options, early glioma detection is crucial. Imaging methods such as computed tomography (CT), single-photon emission computed tomography (SPECT), positron emission tomography (PET), magnetic resonance spectroscopy (MRS), and magnetic resonance imaging (MRI) may all help assess the size, location, and metabolism of brain tumors. Although all of these imaging techniques are used to provide the most thorough analysis of brain tumors, MRI is preferred due to its superior soft tissue contrast and general availability. Using radio frequency waves to stimulate target tissues and produce interior images under the influence of a strong magnetic field, MRI is a noninvasive in vivo imaging technology. During image capture, different MRI sequence images may be obtained by varying the excitation and repetition periods. The various tissue contrast images produced by these MRI modalities allow for the identification and segmentation of tumors and their subregions, as well as the provision of structural data. There are four primary methods for diagnosing a glioma using MRI: T1-weighted imaging, T2-weighted imaging, T1-Gd imaging, and fluid attenuated inversion recovery (FLAIR) imaging. Approximately 150 slices of 2D images, depending on the equipment, are produced during MRI acquisition to represent the 3D brain volume. Additionally, the data become quite complex and perplexing when the slices of the necessary standard modalities are joined for diagnosis [28].
Although T2 images are used to characterize the edema region on the image, which produces a strong signal, T1 images are used to distinguish healthy tissues. The tumor border may be easily seen in T1-Gd images because of the high signal of the accumulated contrast agent (gadolinium ions) in the active cell zone of the tumor tissue. Necrotic cells do not interact with the contrast agent; therefore, they may be easily distinguished from the area of active cells in the tumor core based on the hypo-intense part of the tumor core in the same images. The reduced water molecule signals of the FLAIR images help to distinguish the edematous region from the cerebrospinal fluid (CSF) [29].
Segmenting the tumor is necessary before beginning any therapy in order to protect healthy tissues while removing cancerous ones. Brain tumor segmentation comprises the diagnosis, delineation, and separation of tumor tissues from healthy brain tissues, including gray matter (GM), white matter (WM), and CSF. Tumor tissues include active cells, necrotic cores, and edema. In practical situations, this method still requires the manual annotation and segmentation of a number of multimodal MRI images. Since human segmentation requires a lot of time, the creation of reliable automated segmentation algorithms to provide effective and objective segmentation has been a fascinating and popular research topic in recent years [29]. Deep learning algorithms are now useful for segmentation, making them good candidates for this area of research [30]. In this article, we provide the foundations of the field of cancer diagnosis, including the procedures of cancer diagnosis, followed by the conventional classification strategies used by doctors, thus providing the reader with a historical perspective on cancer classification methodologies. The seven-point detection method, the Menzies method, and pattern analysis are provided as examples. The purpose of this bibliographic research was to provide academics interested in adopting deep learning and artificial neural networks for cancer diagnosis with an overview of the most current developments.

Proposed Method
The FCNN-based architecture recommended for locating and categorizing malignant cells in cancer images is described in this section. Three well-known FCNN architectures (GoogLeNet, VGGNet, and ResNet) were used to independently extract different low-level features in the suggested system. For the classification task, the aggregated features were fed into a fully connected layer, as shown in Figure 1. In the following subsections, we detail each phase of the suggested architecture.
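The fusion step described above can be sketched as follows. This is a minimal NumPy stand-in, not the trained networks: the three extractors are represented by random ReLU projections with the well-known output widths of the three backbones, and the fully connected head is an untrained softmax layer; all shapes and weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_extractor(out_dim, in_dim, rng):
    """Stand-in for a frozen pre-trained backbone: a random ReLU projection."""
    W = rng.normal(size=(out_dim, in_dim)) * 0.1
    return lambda x: np.maximum(W @ x, 0.0)

in_dim = 256  # flattened input patch (illustrative size)
f_googlenet = make_extractor(1024, in_dim, rng)  # GoogLeNet-like feature width
f_vggnet    = make_extractor(4096, in_dim, rng)  # VGGNet-like feature width
f_resnet    = make_extractor(2048, in_dim, rng)  # ResNet-like feature width

def combined_features(x):
    # Concatenate the three backbones' features, as in Figure 1.
    return np.concatenate([f_googlenet(x), f_vggnet(x), f_resnet(x)])

# Fully connected classification head: benign (B) vs. malignant (M).
W_fc = rng.normal(size=(2, 1024 + 4096 + 2048)) * 0.01

def classify(x):
    logits = W_fc @ combined_features(x)
    e = np.exp(logits - logits.max())   # softmax over the two classes
    return e / e.sum()

probs = classify(rng.normal(size=in_dim))
```

The point of the sketch is the data flow: each backbone produces its own feature vector, and only the concatenated vector reaches the shared classification layer.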

Processes for Data Preprocessing and Augmentation
To eliminate the many forms of noise in tissue images, the preprocessing phase is crucial. The H&E-stained tissue micrographs are normalized in the proposed method using the technique described in [31]. A CNN requires vast data sets to improve its precision. The CNN's performance is further hindered by the overfitting of small data sets: the network performs well on training data but poorly on test data [32,33]. The data augmentation approach expands the sample size by applying core image processing techniques and geometric alterations to image data sets. In order to expand the image data set, several techniques such as color processing, transformation (including translation, scaling, and rotation), inversion, and noise perturbation have been used.
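The geometric and noise augmentations listed above can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the paper's pipeline; the 64 × 64 patch size and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
patch = rng.random((64, 64, 3))  # one normalized tissue patch (hypothetical size)

def augment(img, rng):
    """Return geometric and noise-perturbed variants of one patch."""
    out = [img]
    out += [np.rot90(img, k) for k in (1, 2, 3)]   # 90/180/270 degree rotations
    out += [img[:, ::-1], img[::-1, :]]            # horizontal and vertical flips
    noisy = np.clip(img + rng.normal(0, 0.01, img.shape), 0.0, 1.0)
    out.append(noisy)                              # small Gaussian noise perturbation
    return out

augmented = augment(patch, rng)  # one patch becomes seven training samples
```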

Feature Extraction Using a Pre-Trained FCNN Model
Separate CNN architectures were originally implemented for feature extraction in classification tasks prior to their incorporation into a fully connected layer. Circularity, roundness, and compactness, as well as other shape descriptors, may be associated with these features. The feature extractors GoogLeNet [34], the Visual Geometry Group Network (VGGNet) [35], and Residual Networks (ResNet) [36] are used in the framework that is given for the classification of cancer in cytology images. These structures are pre-trained for a range of general image descriptors using the transfer learning theory [37], and then the important properties are recovered from microscopic images using that training. The following sections cover the key components of each known FCNN architecture.

GoogleNet
This condensed network is made up of three layers of convolution, two levels of adaptive pooling, two fully connected layers, and one layer that adjusts its linear operations. We proposed a model that, using the architecture of GoogleNet, integrates numerous convolution filters of varied widths into a single new filter. As a result, both the number of parameters and the computational cost are reduced. The essential organizational structure of GoogleNet is shown in Figure 2.
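The idea of running convolution filters of varied widths in parallel and merging their outputs (the Inception principle behind GoogleNet) can be sketched as follows. This is a single-channel toy, not the real module: the naive convolution helper, the 16 × 16 input, and the 1/3/5 kernel sizes are illustrative.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 2-D convolution with 'same' padding (odd kernels)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.random((16, 16))

# Parallel branches with different filter widths, as in an Inception block...
branches = [conv2d_same(x, rng.normal(size=(s, s))) for s in (1, 3, 5)]

# ...merged by concatenation along the channel axis.
merged = np.stack(branches, axis=0)
```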

Resnet
In classification challenges related to ImageNet, ResNet, an extremely deep residual network, performs well [38]. Because of its deep structure, ResNet is able to minimize training time by combining convolution filters of various sizes. ResNet's basic design is illustrated in Figure 3.
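The core ResNet idea, a residual block that adds its input back to a learned transform so that very deep stacks remain trainable, can be sketched in NumPy. The two-layer transform and the 8-dimensional input are illustrative stand-ins, not the published architecture.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """y = ReLU(x + F(x)): the skip connection carries x past the transform F."""
    return relu(x + W2 @ relu(W1 @ x))

rng = np.random.default_rng(0)
d = 8
W1 = rng.normal(size=(d, d)) * 0.1
W2 = rng.normal(size=(d, d)) * 0.1
x = rng.normal(size=d)
y = residual_block(x, W1, W2)
```

Because the identity path is always present, gradients can flow through many stacked blocks, which is what allows ResNet's depth.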

VGGNet
There are more convolution layers in VGGNet, which makes it comparable to AlexNet. VGGNet has 13 layers of convolution, rectification, and pooling, as well as three layers that are fully connected [35]. A 3 × 3 window filter and a 2 × 2 adaptive pooling network are used in the convolution network. VGGNet outperforms AlexNet because of its simpler design [30]. Figure 4 illustrates VGGNet's underlying structure.
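The 3 × 3 convolution followed by rectification and 2 × 2 pooling that VGGNet stacks repeatedly can be sketched as one block. This is a single-channel toy with max pooling as a stand-in for the pooling stage; the 8 × 8 input and random kernel are illustrative.

```python
import numpy as np

def conv3x3_same(x, k):
    """Naive single-channel 3x3 convolution with 'same' padding."""
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

def maxpool2x2(x):
    """2x2 max pooling, halving each spatial dimension."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.random((8, 8))

# One VGG-style block: conv -> ReLU -> pool.
y = maxpool2x2(np.maximum(conv3x3_same(x, rng.normal(size=(3, 3))), 0.0))
```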

Deep Transfer Learning
To train a CNN from scratch requires a massive amount of data, but it may be challenging to maintain a large data collection that covers all needed subjects. In the great majority of real-world applications, obtaining similar training and testing data is either impossible or very difficult. As a result, the notion of transfer learning has emerged. Transfer learning is one of the most well-known techniques in machine learning since it facilitates the transfer of previously acquired knowledge to new situations with comparable features. First, the base network is trained using an appropriate data set, and then it is applied to the target task and trained using the target data [37]. Transfer learning can be divided into two main components: the selection of the pre-trained model, and the difficulty level and similarity of the problems. If the related problem is relevant to the target problem, one can use a trained model. Target data sets that are smaller (fewer than 1000 images) and similar to the training data sets are more likely to suffer from overfitting (e.g., medical data sets, data sets of handwritten characters, data sets linked to cars, or data sets of biometrics). Similarly, if the amount of target data is equal to or greater than the amount of source data, the pre-trained model requires only modest alterations. The proposed system employs three CNN architectures (GoogleNet, VGGNet, and ResNet) to evaluate their transfer learning and fine-tuning capabilities. These three CNN architectures were pre-trained on ImageNet, which allowed them to discover the underlying properties of several data sets without further training. Using average pooling, the fully connected layer classifies benign and cancerous cells by combining the features collected independently from each CNN design.
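The transfer-learning recipe, keep the pre-trained layers frozen and train only a new classification head on the target data, can be illustrated with a toy in NumPy. The random projection stands in for ImageNet-trained layers, and the synthetic two-class task is an assumption; only the logistic head is updated.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pre-trained" feature extractor whose weights stay frozen
# (a random stand-in for ImageNet-trained convolutional layers).
W_frozen = rng.normal(size=(64, 32))

# Toy target task: a small 2-class data set in 32 dimensions.
X = rng.normal(size=(200, 32))
F = np.maximum(X @ W_frozen.T, 0.0)      # frozen features, computed once

w_true = rng.normal(size=64)             # synthetic ground-truth separator
y = (F @ w_true > 0).astype(float)

# Fine-tuning: only the new head w is trained; W_frozen is never updated.
w = np.zeros(64)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-F @ w))     # sigmoid predictions
    w -= 0.001 * F.T @ (p - y) / len(y)  # logistic-regression gradient step

train_acc = np.mean(((1.0 / (1.0 + np.exp(-F @ w))) > 0.5) == y)
```

Because only the small head is optimized, far less target data is needed than for training the full network from scratch, which is the practical payoff of transfer learning.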

Obtaining Pathology Images
For additional analysis, more than 138 high-resolution images of varying sizes (1600 × 1200, 1807 × 835, and 1807 × 896) were gathered and pre-classified. For each image, tissue samples were stained with May Grünwald-Giemsa (MGG) and hematoxylin and eosin (H&E) stains. The number of images in each class varied, with 58 in the advanced class and only 20 in the normal class. Because of the reduced resolution of the input, the images were pre-processed into small, non-overlapping patches in order to better capture the cell properties necessary for calculating their grade.
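Slicing a slide image into non-overlapping patches can be done with a single reshape. This is an illustrative sketch, assuming a 128 × 128 patch size (the paper does not state one) and cropping any edge pixels that do not fill a full patch.

```python
import numpy as np

def to_patches(img, p):
    """Split an image into non-overlapping p x p patches (edges cropped)."""
    h, w = img.shape[:2]
    img = img[:h - h % p, :w - w % p]          # drop partial border patches
    H, W = img.shape[0] // p, img.shape[1] // p
    return img.reshape(H, p, W, p, -1).swapaxes(1, 2).reshape(H * W, p, p, -1)

rng = np.random.default_rng(0)
slide = rng.random((1200, 1600, 3))   # stand-in for one 1600 x 1200 RGB image
patches = to_patches(slide, 128)      # hypothetical patch size
```

For a 1600 × 1200 image and 128-pixel patches this yields 9 × 12 = 108 patches, which is how slicing multiplies the effective number of training images.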

Image Dataset
The slicing of the 138 original images yielded 6468 patches, representing a 357% increase in the number of images. In all, 55% (4358) of the patches containing only background and non-tissue information were eliminated. The merged pathology dataset combined all of the May Grünwald-Giemsa (MGG)- and hematoxylin and eosin (H&E)-stained images. H&E is the most commonly used stain for light microscopy in histopathology laboratories due to its relative simplicity of use and its ability to disclose a wide range of normal and diseased cell and tissue components. Peripheral blood (PB) smears from five patients diagnosed with malaria infections were stained with MGG daily at the Core Laboratory at the Hospital Clinic de Barcelona. Digital images were obtained using an Olympus BX43 microscope at 1000× magnification with a digital camera (Olympus DP73). The images included in the collection were JPG files (RGB, 2400 × 1800 pixels).

Cross-Validation and Training-validation
To evaluate the DL model, 80% and 20% of the images from each dataset were sorted into the training and validation sets, respectively. Using a K-fold cross-validation approach with K = 5, we divided the MGG, H&E, and mixed datasets into five equal sections and used them to generate five cross-validation sets comprising five new copies of each dataset (e.g., MGG Set 1 to MGG Set 5 for MGG). In each fold, 80% of the images were used for training and the remaining 20% for validation, with distinct validation images in each fold. The average over the five training cycles was used to measure improvements.
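The K = 5 splitting scheme above can be sketched as follows; the 100-image dataset size is illustrative. Each fold holds out a distinct 20% for validation and trains on the remaining 80%.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for K equal folds over n items."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]                                         # held-out 20%
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

splits = list(kfold_indices(100, 5))
# each fold: 80 training images (80%) and 20 validation images (20%);
# every image appears in exactly one validation fold
```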

Cancer Cell Detection Result
In this stage of the study, we attempted to detect cells in the cancer images. For this experiment, we used 100 cancer images and achieved an average detection rate of 70% of cells. We also attempted positive and negative cell detection on the cancer images, obtaining accuracy values of 85% and 80%, respectively. Figure 5 shows the positive and negative cell detection results, and Figure 6 shows the TMA analysis.

Proposed Method Results
The proposed system uses transfer learning to take the knowledge gained from training on three distinct FCNN architectures (GoogLeNet, VGGNet, and ResNet) and apply it to the task of extracting features jointly. The output of each single CNN is evaluated against an array of modern methods using the set of integrated features.
Furthermore, during the testing of the proposed technique, the data were separated into training and testing data sets. According to the 80%/20% split, 80% of the data were used to train the CNN models, whereas the remaining 20% were used to test them. The results of the proposed approach based on this data splitting were compared to those of the various CNN architectures described in Table 1. Under "Class Name," Table 1 displays the class of cancer (B-benign or M-malignant) and its accompanying precision, recall, F1 score, and accuracy. It also provides the average accuracy of each design based on the splitting processes. When compared to individual structures, the suggested framework delivered more accurate classification of cancer cell images. Figure 7 shows a graphical representation of the results.
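The precision, recall, F1 score, and accuracy reported in Table 1 follow directly from confusion-matrix counts. The counts below are hypothetical, chosen only to show the arithmetic, and are not the paper's results.

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Hypothetical counts for the malignant (M) class on a 200-patch test set:
p, r, f1, acc = metrics(tp=90, fp=5, fn=10, tn=95)
# accuracy = (90 + 95) / 200 = 0.925; recall = 90 / 100 = 0.9
```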

Accuracy Comparison with Different Approaches
As shown in Table 2, we compared the proposed framework's outcomes to those of four other well-known methodologies to observe the performance of the suggested design. Table 2 shows that the approaches in [39][40][41][42] achieved accuracies of 92.63%, 90.0%, 97.0%, and 97.5%, respectively, whereas the proposed framework achieved an accuracy of 98.50%, which was greater than the values of all four other methods. These findings demonstrate the approach's superior accuracy when compared to other approaches of a similar kind.
Table 2. Accuracy comparisons with other models.

Conclusions
To better identify and classify cancerous tumors, we developed a new deep learning system based on transfer learning. This method uses three CNN architectures (GoogleNet, VGGNet, and ResNet) to extract features from tumor cytology images and then combines them to enhance classification accuracy. We also applied data augmentation to expand the data set size and enhance the CNN structure's performance. Finally, the proposed framework's performance was compared to that of several CNN designs, as well as existing approaches. Without having to start from scratch, the suggested framework produced outstanding accuracy results, which could boost classification efficiency. Handcrafted and CNN features will be employed together in the future to boost classification accuracy even more. The experimental findings revealed that the performance measures of accuracy, precision, recall, and F1 score were 98.5%, 96.0%, 97.0%, and 93.0%, respectively. As the major focus of this work was deep learning for cancer diagnosis, we aimed to describe to our readers all relevant deep learning diagnostic procedures.
Author Contributions: M.R.B. developed the experimental model, the structure of the manuscript, and the performance evaluation and wrote the preliminary draft. J.A. helped to fix the error codes, checked the labeled data and results, and reviewed the full paper. All authors have read and agreed to the published version of the manuscript.