Deep Learning Fast Screening Approach on Cytological Whole Slides for Thyroid Cancer Diagnosis

Simple Summary
Papillary thyroid carcinoma is the most common type of thyroid cancer and can be cured if diagnosed and treated early. In clinical practice, the primary method for diagnosing papillary thyroid carcinoma is manual visual inspection of cytopathology slides, which is difficult, time-consuming, and subjective, with high inter-observer variability, and sometimes leads to suboptimal patient management due to false-positive and false-negative results. This study presents a fast, fully automatic, and efficient deep learning framework for screening cytological slides for thyroid cancer diagnosis. We confirmed the robustness and effectiveness of the proposed method based on evaluation results from two different types of slides: thyroid fine needle aspiration smears and ThinPrep slides.

Abstract
Thyroid cancer is the most common cancer in the endocrine system, and papillary thyroid carcinoma (PTC) is the most prevalent type of thyroid cancer, accounting for 70 to 80% of all thyroid cancer cases. In clinical practice, visual inspection of cytopathological slides is an essential initial method used by the pathologist to diagnose PTC. Manual visual assessment of whole slide images (WSIs) is difficult, time-consuming, and subjective, with high inter-observer variability, which can sometimes lead to suboptimal patient management due to false-positive and false-negative results. In this study, we present a fully automatic, efficient, and fast deep learning framework for screening Papanicolaou-stained thyroid fine needle aspiration (FNA) and ThinPrep (TP) cytological slides. To the best of the authors' knowledge, this work is the first study to build an automated deep learning framework for identification of PTC from both FNA and TP slides.
The proposed deep learning framework is evaluated on a dataset of 131 WSIs, and the results show that the proposed method achieves an accuracy of 99%, precision of 85%, recall of 94% and F1-score of 87% in segmentation of PTC in FNA slides and an accuracy of 99%, precision of 97%, recall of 98%, F1-score of 98%, and Jaccard-Index of 96% in TP slides. In addition, the proposed method significantly outperforms the two state-of-the-art deep learning methods, i.e., U-Net and SegNet, in terms of accuracy, recall, F1-score, and Jaccard-Index (p<0.001). Furthermore, for run-time analysis, the proposed fast screening method takes 0.4 min to process a WSI and is 7.8 times faster than U-Net and 9.1 times faster than SegNet, respectively.


Introduction
Thyroid cancer is the most prevalent cancer in the endocrine system and accounts for the majority of head and neck cancer cases [1]. The incidence of thyroid cancer has been rising globally for the past two decades, including in the United States, despite a decline in the incidence of certain other cancers [1]. Thyroid cancer is three times more prevalent in women than in men. The types of thyroid cancer are papillary carcinoma, follicular carcinoma, Hürthle (oncocytic) cell carcinoma, poorly differentiated carcinoma, medullary carcinoma, and anaplastic (undifferentiated) carcinoma [2]. Papillary thyroid carcinoma (PTC) is the most common type of thyroid carcinoma, accounting for 70% to 80% of all thyroid malignancies. The prognosis of PTC is better than that of other types of thyroid carcinoma [3]. Thyroid fine needle aspiration (FNA) is an important and safe tool for diagnosing PTC, with an accuracy of approximately 94% and a high degree of sensitivity and specificity [4,5]. Thyroid FNA is applied to distinguish benign from neoplastic or malignant thyroid nodules [6]. The wide use of thyroid FNA has greatly reduced unnecessary thyroid surgical intervention and thus increased the percentage of malignant nodules among all nodules surgically removed. The Bethesda System for Reporting Thyroid Cytopathology (TBSRTC) [7] is the universally accepted reporting system for thyroid FNA diagnosis. The cytologic features of PTC are well documented, including enlarged overlapping nuclei, irregular nuclear contours, intranuclear pseudoinclusions, nuclear grooving, and fine, pale chromatin [8,9]. Nevertheless, cytologic analysis is traditionally a time-consuming task performed by an experienced pathologist who manually examines the glass slides under a light microscope. The most common stain for cytological preparations is the Papanicolaou stain. The May-Grünwald Giemsa stain is one of the common Romanowsky stains used in cytology.
Digital pathology has emerged as a potential new standard of care, in which glass slides are digitized into whole slide images (WSIs) using digital slide scanners. With over 100 million pixels in a typical WSI, pathologists find it difficult to manually identify all the information in histopathological images [10]. Thus, automated diagnosis methods based on artificial intelligence have been developed to overcome the constraints of the manual and complex diagnostic process [11,12]. In recent years, deep learning has emerged as a potential approach for the automated analysis of medical images. Automating the diagnostic process helps pathologists make a correct diagnosis in a short period of time. Deep learning has been commonly used to identify diseases such as retinal disease [13], skin cancer [14], colorectal polyp [15], cardiac arrhythmia [16], neurological problems [17], psychiatric problems [18], acute intracranial hemorrhage [19], and autism [20]. Deep learning has also shown the ability to help pathologists diagnose, classify, and segment cancer. For example, Courtiol et al. [21] trained a deep convolutional neural network (MesoNet) to automatically and accurately estimate the overall survival of mesothelioma patients from diagnostic unannotated histopathology images. Yamamoto et al. [22] trained a deep learning-based framework that can derive explainable features from diagnostic unannotated histopathology images and make predictions more accurately than humans. Zhang et al. [23] proposed a deep learning-based framework for automating the human-like diagnostic reasoning process, which would include second opinions and thereby encourage clinical consensus. Sanyal et al. [24] trained a convolutional neural network to classify PTC and non-PTC on microphotographs from thyroid fine needle aspiration cytology (FNAC).
In comparison, Sanyal et al.'s method [24] obtains a diagnostic accuracy of 85.06% on microphotographs of size 512 × 512 pixels from thyroid FNAC, while the proposed method achieves an accuracy of 99% on gigapixel WSIs of Papanicolaou-stained thyroid FNA and ThinPrep (TP) cytological slides for detection and segmentation of PTC. Furthermore, Sanyal et al.'s method [24] can only operate on FNA slides, while the proposed method performs consistently well on both thyroid FNA and TP slides. Ke et al. [25] trained a deep convolutional neural network (Faster R-CNN [26]) for detection of PTC from ultrasonic images. To the best of the authors' knowledge, this is the first study to build an automated deep learning framework for detection and segmentation of PTC from Papanicolaou-stained thyroid FNA and TP cytological slides. Figure 1 presents the proposed framework structure and the dataset information. Figure 1a presents the workflow of the system from collection of data to analysis of outcome. Figure 1(ai) shows that thyroid smears are obtained through FNA and TP; in Figure 1(aii), slides of thyroid FNA and TP are prepared with Papanicolaou staining; in Figure 1(aiii), stained slides are digitized at 20× objective magnification using a Leica AT Turbo scanner; in Figure 1(aiv), digitized gigapixel WSIs are divided into a separate training set (21%) and a testing set (79%); in Figure 1(av), WSIs are processed with the fast background filtering of the proposed system; in Figure 1(avi), cytological samples of PTC in individual WSIs are rapidly identified by the proposed deep learning model in seconds. Figure 1b presents the distribution of thyroid FNA and TP cytological slides for training and testing. Figure 1c presents the distribution of the number of tiles per WSI. Figure 1d presents the size distribution of the WSIs with respect to width and height.
In the evaluation, as this is the first study on automatic segmentation of PTC in Papanicolaou-stained thyroid FNA and TP cytological slides, we compare the proposed method with two state-of-the-art deep learning models, U-Net [27] and SegNet [28].

The Dataset
A total of 131 de-identified, digitized WSIs, including 120 PTC cytologic slides (smear, Papanicolaou-stained, n = 120) and 11 PTC cytologic slides (TP, Papanicolaou-stained, n = 11), were obtained from the Department of Pathology, Tri-Service General Hospital, Taipei, Taiwan. All papillary thyroid carcinoma smears were cytologically diagnosed and histologically confirmed by two expert pathologists. Well-preserved thyroid FNAs performed within the last two years were selected. Ethical approvals were obtained from the research ethics committee of the Tri-Service General Hospital (TSGHIRB No.1-107-05-171 and No.B202005070), and the data were de-identified and used for a retrospective study without impacting patient care. All the stained slides were scanned using a Leica AT Turbo scanner (Leica, Germany) at 20× objective magnification. The average slide dimensions are 77,338 × 37,285 pixels, with a physical size of 51.13 × 23.21 mm². The ground truth annotations were produced by two expert pathologists. The training set comprises a total of 28 Papanicolaou-stained WSIs (21%), including 25 thyroid FNA and 3 TP cytologic slides. The remaining 103 Papanicolaou-stained WSIs (79%), including 95 thyroid FNA and 8 TP cytologic slides, are used as a separate testing set for evaluation. Detailed information on the distribution of the data can be found in Figure 1b.
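As a quick consistency check on the slide counts quoted above, the following minimal Python sketch recomputes the totals and the rounded training and testing percentages. The function name `split_summary` and the dictionary layout are illustrative choices, not part of the original study.

```python
from typing import Dict


def split_summary(train: Dict[str, int], test: Dict[str, int]) -> Dict[str, int]:
    """Summarize a train/test split given per-slide-type counts.

    Returns the set sizes, the overall total, and the share of each set
    as a percentage rounded to the nearest integer.
    """
    n_train = sum(train.values())
    n_test = sum(test.values())
    total = n_train + n_test
    return {
        "train": n_train,
        "test": n_test,
        "total": total,
        "train_pct": round(100 * n_train / total),
        "test_pct": round(100 * n_test / total),
    }


# Counts as reported in the dataset description (FNA smears and ThinPrep slides).
summary = split_summary(train={"FNA": 25, "TP": 3}, test={"FNA": 95, "TP": 8})
print(summary)
```

With the reported counts, the per-type totals are 120 FNA and 11 TP slides, which sum to the 131 WSIs stated in the text.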

Methods
In this work, we propose a fast and efficient deep learning-based framework for segmentation of PTC in Papanicolaou-stained thyroid FNA and TP cytological slides. Figure 2a presents the workflow of the proposed framework. Initially, each WSI is formatted into a hierarchical tile-based data structure and processed by fast background filtering, which efficiently discards the background and reduces the amount of computation per slide. The proposed deep learning model is then used to produce the segmentation results of PTC in Papanicolaou-stained thyroid FNA and TP WSIs. Figure 2b shows the detailed architecture of the proposed deep learning model. The details of the proposed WSI processing framework are described in Section S1.1 of the Supplementary Methods and Evaluation Metrics.
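The tiling and background filtering steps can be illustrated with a minimal Python sketch. The actual procedure is specified in the Supplementary Methods, which are not reproduced here, so everything below is an assumption made for illustration: the image is a small grayscale array given as nested lists, background is taken to be near-white, and the names `iter_tiles`, `is_background`, `foreground_tiles`, `white_level`, and `max_tissue_frac` are hypothetical.

```python
from typing import Iterator, List, Tuple

Tile = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def iter_tiles(width: int, height: int, tile: int) -> Iterator[Tile]:
    """Yield non-overlapping tiles covering a width x height image, row by row.

    Tiles on the right and bottom edges are clipped to the image bounds.
    """
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield (x, y, min(tile, width - x), min(tile, height - y))


def is_background(gray: List[List[int]], t: Tile,
                  white_level: int = 220,
                  max_tissue_frac: float = 0.05) -> bool:
    """Assumed rule: a tile is background when at most max_tissue_frac
    of its pixels are darker than white_level."""
    x, y, w, h = t
    tissue = sum(1 for row in gray[y:y + h] for v in row[x:x + w]
                 if v < white_level)
    return tissue / (w * h) <= max_tissue_frac


def foreground_tiles(gray: List[List[int]], tile: int = 4) -> List[Tile]:
    """Return only the tiles that would be passed on to the segmentation model."""
    height, width = len(gray), len(gray[0])
    return [t for t in iter_tiles(width, height, tile)
            if not is_background(gray, t)]


# Toy 8 x 8 "slide": white everywhere except a dark 4 x 4 patch at the top left.
slide = [[255] * 8 for _ in range(8)]
for yy in range(4):
    for xx in range(4):
        slide[yy][xx] = 100

print(foreground_tiles(slide, tile=4))
```

In this toy example only one of the four tiles contains tissue-like pixels, so three quarters of the image would never reach the network. This is the kind of saving that background filtering is meant to provide on a gigapixel WSI.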

Proposed Convolution Network Architecture
The proposed deep learning network is built using the VGG16 model as a backbone and is adapted from a fully convolutional network framework [29], which has been widely employed in fields of pathology such as neuropathology [30], histopathology [31], and microscopy [32]. The proposed deep learning network consists of a padding layer, six convolutional layers, five max-pooling layers, two dropout layers, one deconvolutional layer, and a cropping layer (see Section S1.2 in Supplementary Methods and Evaluation Metrics for details). The detailed architecture of the proposed deep learning network is shown in Table 1 and Figure 2c.

Implementation details
The proposed method uses the VGG16 model as the backbone for training, with the network optimized using stochastic gradient descent (SGD) and the cross-entropy function as the loss function. The network training parameters of the proposed method, including the learning rate, dropout ratio, and weight decay, are set to 1 × 10⁻¹⁰, 0.5, and 0.0005, respectively. The benchmark methods (U-Net and SegNet) are implemented using the Keras implementation [33]. For training, the benchmark methods are initialized using a pre-trained VGG16 model, and the networks are optimized using Adadelta with the cross-entropy function as the loss function. The network training parameters of the benchmark methods, including the learning rate, dropout ratio, and weight decay, are set to 0.0001, 0.2, and 0.0002, respectively. The proposed method and the benchmark methods (U-Net and SegNet) use the same framework to process a WSI, which is described in Section 2.2.

Evaluation Metrics
The quantitative evaluation is produced using five measurements, i.e., accuracy, precision, recall, F1-score, and Jaccard-Index. The evaluation metrics are described in Section S2 of Supplementary Methods and Evaluation Metrics.
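For readers without access to the supplementary material, the five measurements can be sketched from pixel-wise confusion counts. The code below uses the standard textbook definitions and assumes that these match Section S2. The class `Confusion` and the function `segmentation_metrics` are illustrative names, and returning 0.0 for an empty denominator is a convention chosen here, not one taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    """Pixel-wise confusion counts for a binary segmentation mask."""
    tp: int  # PTC pixels predicted as PTC
    fp: int  # non-PTC pixels predicted as PTC
    fn: int  # PTC pixels predicted as non-PTC
    tn: int  # non-PTC pixels predicted as non-PTC


def _ratio(num: float, den: float) -> float:
    """Safe division: return 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def segmentation_metrics(c: Confusion) -> Dict[str, float]:
    """Compute accuracy, precision, recall, F1-score, and Jaccard index."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "recall": _ratio(c.tp, c.tp + c.fn),
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
        "jaccard": _ratio(c.tp, c.tp + c.fp + c.fn),
    }


# Toy example with made-up counts (not data from the study).
print(segmentation_metrics(Confusion(tp=8, fp=2, fn=2, tn=88)))
```

Under these definitions the Jaccard index J and the F1-score F of a single mask are related by J = F / (2 − F), so the Jaccard index is never larger than the F1-score. Averages over many slides need not satisfy this identity exactly.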

Quantitative Evaluation with Statistical Analysis
The aim of this study is to develop a deep learning framework that can automatically detect PTC from both Papanicolaou-stained thyroid FNA and TP cytological slides. For quantitative evaluation, we compared the performance of the proposed method with two state-of-the-art deep learning models, U-Net and SegNet. Table 2 shows the quantitative evaluation results for segmentation of PTC from Papanicolaou-stained thyroid FNA and TP cytological slides. The experimental results show that, overall, the proposed method achieves the highest accuracy (99%), precision (86%), recall (94%), F1-score (88%), and Jaccard-Index (82%), and outperforms the two benchmark approaches. For TP slides, the proposed method obtains even better results, with an accuracy of 99%, precision of 97%, recall of 98%, F1-score of 98%, and Jaccard-Index of 96%. Figure 3 presents the box plots of the quantitative evaluation results, showing that (a) the proposed method works consistently well overall and significantly outperforms the benchmark methods in terms of accuracy, recall, F1-score, and Jaccard-Index (p < 0.001) and (b) the type of cytological slide, i.e., FNA or TP, does not affect the performance of the proposed model, which consistently performs well for both kinds of data, while the benchmark approaches tend to perform better on TP than on FNA with respect to accuracy and precision (p < 0.05). For statistical analysis, the quantitative scores were analyzed with Fisher's Least Significant Difference (LSD) test using SPSS software (see Table 3). In comparison, the proposed method significantly outperforms the benchmark methods (U-Net and SegNet) in terms of accuracy, recall, F1-score, and Jaccard-Index, based on the LSD test (p < 0.001). The experimental results demonstrate the high accuracy, efficiency, and reliability of the proposed method on Papanicolaou-stained thyroid FNA and TP cytological slides.
Figure 4 compares the qualitative segmentation results of the proposed method and the two benchmark methods (U-Net and SegNet) on FNA and TP WSIs, showing that the proposed method is able to segment PTC from Papanicolaou-stained thyroid FNA and TP slides consistently with the reference standard, while the benchmark methods are unable to detect the PTC in some cases. Figure 5 further shows the annotations produced by the expert pathologists with typical PTC features, including papillary-like structure, elongated nucleus, pale nucleus, pseudoinclusions in nuclear cytoplasm, nucleoli, and longitudinal grooves in FNA and TP slides, alongside the results of the proposed method, which also show typical PTC features.
* The proposed method is significantly better than the benchmark methods (U-Net [27] and SegNet [28]) (p < 0.001).

Run Time Analysis
Due to the enormous size of WSIs, the computing time of WSI analysis is crucial for practical clinical use. We examined the computing time using various hardware configurations (see Table 4). Table 4 compares the computational efficiency of the proposed method with the benchmark methods (U-Net and SegNet), showing that the proposed method takes 0.4 min to process a WSI using four GeForce GTX 1080 Ti GPUs and 1.7 min using a single GeForce GTX 1080 Ti GPU, whereas the U-Net model takes 13.2 min and the SegNet model takes 15.4 min. Even with a single low-cost GPU, the proposed method requires less computing time than the benchmark approaches and is 7.8 times faster than U-Net and 9.1 times faster than SegNet. Overall, the proposed method is shown to be capable of detecting PTC reliably in both FNA and TP WSIs and rapidly, in seconds, making it suitable for practical clinical use.
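The reported speed-up factors follow directly from the per-slide times quoted above. The short Python sketch below reproduces the arithmetic, under the assumption that the ratios are taken against the single-GPU time of 1.7 min. The function names are illustrative.

```python
def speedup(baseline_min: float, proposed_min: float) -> float:
    """Return how many times faster the proposed method is, rounded to one decimal."""
    return round(baseline_min / proposed_min, 1)


def minutes_to_seconds(minutes: float) -> int:
    """Convert a per-slide time in minutes to whole seconds."""
    return round(minutes * 60)


PROPOSED_SINGLE_GPU_MIN = 1.7   # one GeForce GTX 1080 Ti
PROPOSED_FOUR_GPU_MIN = 0.4     # four GeForce GTX 1080 Ti
UNET_MIN = 13.2
SEGNET_MIN = 15.4

print("U-Net speed-up:", speedup(UNET_MIN, PROPOSED_SINGLE_GPU_MIN))
print("SegNet speed-up:", speedup(SEGNET_MIN, PROPOSED_SINGLE_GPU_MIN))
print("Four-GPU time in seconds:", minutes_to_seconds(PROPOSED_FOUR_GPU_MIN))
```

The ratios 13.2 / 1.7 and 15.4 / 1.7 round to 7.8 and 9.1, matching the figures in the text, and 0.4 min corresponds to 24 s per slide.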

Discussion
In this study, we present a fully automatic and efficient deep learning framework for segmentation of PTC from both Papanicolaou-stained thyroid FNA and TP cytological slides. PTC is the most common form of thyroid cancer and has the best prognosis; most patients can be cured if treated appropriately and early enough. Thyroid FNA, in addition to pathological examination, is considered the most effective approach for the clinical diagnosis of PTC due to its diagnostic safety, minimal invasiveness, and high accuracy. Manual pathological diagnosis is sometimes a difficult, time-consuming, and laborious task. In cytopathological diagnosis, pathologists have to conduct a thorough inspection of all information on the glass slides under a light microscope. In recent years, digital pathology, in which glass slides are converted into WSIs, has emerged as a potential new standard of care, allowing pathological images to be examined using computer-based algorithms. A typical WSI comprises more than 100 million pixels, which makes it difficult for pathologists to manually conduct a thorough inspection of all information on cytopathological and histopathological slides. In the review of lymph nodes for metastatic breast cancer on WSIs, algorithm-assisted pathologists achieved higher accuracy than either the algorithm or the pathologist alone, with notably improved sensitivity for detecting micrometastases (91% vs. 83%, p = 0.02) [34]. Pathologists tend to overestimate the percentage of tumor cells, and the use of AI-based analysis increases the accuracy of the tumor cell counts applied to genetic analysis [35]. Although artificial intelligence has the potential to provide advantages in accuracy, precision, and efficiency through the automation of digital pathology, artificial intelligence-related applications also face challenges, including regulatory roadblocks, quality of data, interpretability, algorithm validation, reimbursement, and clinical adoption [36].
There are compelling reasons to believe that digital pathology combined with artificial intelligence is a viable answer to this problem for cytological PTC diagnosis, because it aids in the production of more accurate diagnoses, shortens examination times, reduces the pathologists' workload, and lowers examination cost. For diagnosis of WSIs, many current algorithms are not well adapted to clinical applications due to their high computational cost. For practical clinical usage, we develop a fast, efficient, and fully automatic deep learning framework for screening of both FNA and TP slides. The experimental results show that the proposed deep learning framework is effective, achieving accuracy and recall of over 90%. Furthermore, the proposed method performs significantly better than the state-of-the-art deep learning models U-Net and SegNet according to Fisher's LSD test (p < 0.001). The results demonstrate that the proposed method is able to segment PTC in seconds with high accuracy, precision, and sensitivity, comparable to the reference standard produced by pathologists, laying the groundwork for the use of computational decision support systems in clinical practice. The 131 included cytologic slides are all definitive papillary thyroid carcinoma under TBSRTC criteria. A limitation of our study is that, in practice, cytopathologists also have to deal with specimens that are quantitatively insufficient for a definitive diagnosis, for instance, those diagnosed as "suspicious for PTC". In future work, the proposed framework could be extended to the detection and segmentation of "suspicious for PTC" slides and of different types of carcinoma, such as follicular carcinoma, Hürthle (oncocytic) cell carcinoma, medullary carcinoma, poorly differentiated carcinoma, and anaplastic carcinoma.

Conclusions
In this work, we introduce a deep learning-based framework for automatic detection and segmentation of PTC in Papanicolaou-stained thyroid FNA and TP cytological slides. We evaluated the proposed framework on a dataset of 131 WSIs, including 120 PTC cytologic slides (smear, Papanicolaou-stained, n = 120) and 11 PTC cytologic slides (TP, Papanicolaou-stained, n = 11), and the experimental results show that the proposed method achieves high accuracy, precision, recall, F1-score, and Jaccard-Index. In addition, we compared the proposed method with two state-of-the-art deep learning models, U-Net and SegNet. Based on Fisher's LSD test, the proposed method significantly outperforms the two benchmark methods in terms of accuracy, recall, F1-score, and Jaccard-Index (p < 0.001).

Institutional Review Board Statement: In this study, to enable the development of deep learning algorithms for recognizing PTC, 131 de-identified, digitized WSIs, including 120 PTC cytologic slides (smear, Papanicolaou-stained, n = 120) and 11 PTC cytologic slides (TP, Papanicolaou-stained, n = 11), were obtained from the Department of Pathology, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan. Ethical approvals have been obtained from the research ethics committee of the Tri-Service General Hospital (TSGHIRB No.1-107-05-171 and No.B202005070).
Informed Consent Statement: Patient consent was formally waived by the approving review board, and the data were de-identified and used for a retrospective study without impacting patient care.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest:
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.