A Soft Label Deep Learning Framework to Assist Breast Cancer Target Therapy and Thyroid Cancer Diagnosis

Simple Summary Early diagnosis and treatment of cancer are crucial for the survival of cancer patients. Pathologists can use computational pathology techniques to make the diagnosis process more efficient and accurate. With the emergence of deep learning, there is considerable hope that this technology will be able to address issues that were previously impossible to tackle. In this study, we present an automatic soft label deep learning framework to select patients for human epidermal growth factor receptor 2 target therapy and to diagnose papillary thyroid carcinoma. This approach will assist in breast cancer target therapy and thyroid cancer diagnosis, enabling rapid examination and decreasing human judgment mistakes. Abstract According to the World Health Organization Report 2022, cancer is a leading cause of death, contributing to nearly one in six deaths worldwide. Early cancer diagnosis and prognosis have become essential in reducing the mortality rate. On the other hand, cancer detection is a challenging task in cancer pathology. Trained pathologists can detect cancer, but their decisions are subject to high intra- and inter-observer variability, which can lead to poor patient care owing to false-positive and false-negative results. In this study, we present a soft label fully convolutional network (SL-FCN) to assist in breast cancer target therapy and thyroid cancer diagnosis, using four datasets. To aid in breast cancer target therapy, the proposed method automatically segments human epidermal growth factor receptor 2 (HER2) amplification in fluorescence in situ hybridization (FISH) and dual in situ hybridization (DISH) images. To help in thyroid cancer diagnosis, the proposed method automatically segments papillary thyroid carcinoma (PTC) on Papanicolaou-stained fine needle aspiration and thin prep whole slide images (WSIs).
In the evaluation of segmentation of HER2 amplification in FISH and DISH images, we compare the proposed method with thirteen deep learning approaches, including U-Net, U-Net with Inception-v4, ensembles of U-Net with Inception-v4, Inception-ResNet-v2, and ResNet-34 encoders, SegNet, FCN, modified FCN, YOLOv5, CPN, SOLOv2, BCNet, and DeepLabv3+ with three different backbones, including MobileNet, ResNet, and Xception, on three clinical datasets: two DISH datasets at two different magnification levels and a FISH dataset. On DISH breast dataset 1, the proposed method achieves a high accuracy of 87.77 ± 14.97%, recall of 91.20 ± 7.72%, and F1-score of 81.67 ± 17.76%. On DISH breast dataset 2, the proposed method achieves a high accuracy of 94.64 ± 2.23%, recall of 83.78 ± 6.42%, and F1-score of 85.14 ± 6.61%. On the FISH breast dataset, the proposed method achieves a high accuracy of 93.54 ± 5.24%, recall of 83.52 ± 13.15%, and F1-score of 86.98 ± 9.85%. Furthermore, the proposed method outperforms most of the benchmark approaches by a significant margin (p < 0.001). In the evaluation of segmentation of PTC on Papanicolaou-stained WSIs, the proposed method is compared with three deep learning methods, including Modified FCN, U-Net, and SegNet. The experimental results demonstrate that the proposed method achieves a high accuracy of 99.99 ± 0.01%, precision of 92.02 ± 16.6%, recall of 90.90 ± 14.25%, and F1-score of 89.82 ± 14.92% and significantly outperforms the baseline methods, including U-Net and FCN (p < 0.001). With the high degree of accuracy, precision, and recall, the results show that the proposed method could be used to assist breast cancer target therapy and thyroid cancer diagnosis with faster evaluation and fewer human judgment errors.


Introduction
Cancer is a leading cause of mortality in the world, accounting for nearly 10 million deaths in 2020. Early detection and treatment of cancer reduce deaths. However, the detection of cancer is one of the most difficult tasks in cancer pathology. Trained pathologists can analyze complicated tissue structures and detect tumors, but their judgments are subjective, qualitative, and time-consuming, resulting in significant intra- and inter-observer variability. Pathologists' exhaustion and fatigue may contribute to diagnostic mistakes as workload increases, lowering the overall quality of pathology service. To deal with this problem, modern processing techniques such as artificial intelligence (AI) have been developed. Deep learning (DL), a subset of AI capable of autonomously extracting valuable properties from images to achieve specified tasks, has repeatedly been shown to outperform standard image-processing algorithms, as demonstrated for image classification [1] and segmentation [2]. DL has recently been widely employed for high-performance image-analysis tasks such as object recognition [3][4][5], image segmentation [2,[6][7][8][9], and image classification [1,[10][11][12]. The ability to distinguish objects and properties in images (for example, cancer cells in biopsy samples) is changing the way clinical samples are evaluated. In this study, we present a soft label fully convolutional network (SL-FCN) for automatic segmentation of human epidermal growth factor receptor 2 (HER2) amplification in fluorescence in situ hybridization (FISH) and dual in situ hybridization (DISH) images of invasive breast cancer and of papillary thyroid carcinoma (PTC) on Papanicolaou-stained fine needle aspiration (FNA) and thin prep (TP) whole slide images (WSIs).
Breast cancer remains the most frequently diagnosed cancer and the leading cause of cancer death among females worldwide [13]. The human epidermal growth factor receptor 2 (HER2; ERBB2) gene amplification test is well established to determine whether a breast cancer patient is eligible for anti-HER2 target therapy [14,15]. Anti-HER2 target therapies, such as trastuzumab, pertuzumab, and the tyrosine kinase inhibitors lapatinib and neratinib, have been shown to significantly improve survival, but without appropriate anti-HER2 therapy, HER2-amplified tumors are associated with poor prognosis [16][17][18][19][20][21][22]. Although immunohistochemistry (IHC) is a good screening method for negative (0 or 1+) and strong positive (3+) results, any patient with an equivocal positive IHC result (2+) should be confirmed by fluorescence in situ hybridization (FISH) analysis before anti-HER2 target therapy [23]. Dual in situ hybridization (DISH) can be used for signal visualization with the benefit of simultaneous morphologic correlation using light microscopy, and there is no need for specialized fluorescence equipment [24,25]. FISH and DISH both use dual probes to highlight the HER2 gene and the chromosome 17 centromere (CEN17) in different colors. The main distinction between positive and negative amplification status is based on the HER2/CEN17 ratio and the average HER2 copy number per nucleus in at least 20 nuclei. The American Society of Clinical Oncology (ASCO)/College of American Pathologists (CAP) initially issued a detailed guideline for clinical testing and interpretation of HER2 results in 2007, which was first revised in 2013 and updated in 2018.
Based on the 2018 ASCO-CAP guidelines, the FISH result is classified into five categories. Group 1: when the HER2/CEN17 ratio is ≥2.0 and the average HER2 gene copy number is ≥4, the result is reported as positive. Group 2: when the HER2/CEN17 ratio is ≥2.0 and the HER2 gene copy number is <4, the result is reported as negative, unless concurrent IHC is 3+. Group 3: when the HER2/CEN17 ratio is <2.0 and the HER2 gene copy number is ≥6, the result is reported as negative, unless concurrent IHC is 2+ or 3+. Group 4: when the HER2/CEN17 ratio is <2.0 and the HER2 gene copy number is ≥4 and <6, the result is reported as negative, unless concurrent IHC is 3+. Group 5: when the HER2/CEN17 ratio is <2.0 and the HER2 gene copy number is <4, the result is reported as negative [24,26]. Accurate assessment of HER2 status is an essential step in identifying the subset of breast cancer patients who may benefit from anti-HER2 targeted therapy [17,[26][27][28]. Manual assessment of the HER2 amplification status is very time-consuming, laborious, and error-prone. Automated medical image diagnostic methods are arguably the most successful field of medical applications; they can dramatically increase the time efficiency of the pathologist's analysis and improve the accuracy of counting [29][30][31]. The development of image analysis based on new artificial intelligence (AI) approaches in pathology, led by computer engineers and data scientists, can also be used to improve diagnostic accuracy for clinical precision decision-making in cancer treatment [31]. However, analysis of HER2 expression is challenging due to unclear and blurry cell boundaries with large variations in cell shapes and signals, as illustrated in Figure 1. Our research is the first attempt to use soft label FCN technology for automatic segmentation of HER2 amplification in FISH and DISH images of invasive breast cancer.
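The five-group decision rule above can be sketched as a small helper function. This is an illustrative sketch only: the function name and boundary handling are hypothetical, and the concurrent IHC results, which can change the final report for groups 2-4, are omitted.

```python
def her2_fish_group(ratio, copies):
    """Map a HER2/CEN17 ratio and average HER2 copy number per nucleus
    to the five ASCO-CAP 2018 FISH groups described above.
    Concurrent IHC results (which can modify groups 2-4) are ignored."""
    if ratio >= 2.0:
        return 1 if copies >= 4.0 else 2   # group 1: positive; group 2: negative
    if copies >= 6.0:
        return 3                           # ratio < 2.0, copies >= 6
    if copies >= 4.0:
        return 4                           # ratio < 2.0, 4 <= copies < 6
    return 5                               # ratio < 2.0, copies < 4
```

Only group 1 is reported as positive without further qualification; all other groups are reported as negative unless concurrent IHC results indicate otherwise.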
In the evaluation, to test model robustness and generalizability, three clinical datasets were collected at different magnifications from the Tri-Service General Hospital in Taipei, Taiwan. The pathologists produced a reference standard by manually annotating the HER2 (ERBB2) and CEN17 signals in the FISH and DISH images. We compare the proposed algorithm with thirteen popular or recently published deep learning methods, including U-Net [2] with Inception-v4 [32], ensembles of U-Net with Inception-v4 [32], Inception-ResNet-v2 [32], and ResNet-34 [33] encoders, SegNet [34], Modified FCN [6][7][8][9][10][11], YOLOv5 [35], FCN [36], CPN [37], SOLOv2 [38], BCNet [39], and DeepLabv3+ [40] with three different backbones, including MobileNet [41], ResNet [33], and Xception [42] (see Section 4). The algorithms we developed are more objective, precise, and unbiased than the current standard of manual interpretation for anti-HER2 target therapy.
Thyroid cancer has one of the highest incidences among the numerous forms of cancer [43]. The most frequent kind of thyroid cancer is papillary thyroid carcinoma (PTC). Examination of a fine needle aspiration biopsy (FNAB), which is spread onto a glass slide and stained, is the most essential test in the preliminary detection of thyroid cancer [44]. A cytopathologist examines the FNAB sample under an optical microscope to estimate the risk of malignancy based on numerous aspects of the thyroid cells, such as size, color, and cell group architecture. Digital pathology has recently emerged as a potential new standard of care in which glass slides are transformed into whole slide images (WSIs) using digital slide scanners. Due to the very large size of a typical WSI (on the order of gigapixels), pathologists find it challenging to manually examine all the information in a WSI. Thus, artificial intelligence-based automated diagnosis approaches are being explored to overcome the restrictions of manual and complicated diagnosis processes. In this study, we develop a soft label FCN-based deep learning framework for the automatic segmentation of PTC in WSIs. To evaluate the robustness and generalizability of the proposed method, a clinical dataset containing 131 Papanicolaou-stained WSIs was collected from the Tri-Service General Hospital in Taipei, Taiwan. The reference standard was manually generated by annotating tumor cells in the Papanicolaou-stained WSIs. In the evaluation, the proposed method is compared with three state-of-the-art deep learning methods, including Modified FCN [6][7][8][9][10][11], U-Net [2], and SegNet [34].

Related Works in Soft Label, Label Smoothing, and Segmentation Approaches
In this section, we discuss three categories of works, which are most related to our proposed method, including soft label techniques, label smoothing methods, and segmentation approaches.

Soft Label Techniques
In traditional segmentation methods, the network usually receives binary ground truth labels, or hard labels (label values are 0 and 1 only), which may cause information loss, especially for pixels at the boundary between two different types [45]. To overcome this limitation, researchers [45][46][47] propose to use soft labels (label values are continuous values between 0 and 1) instead of hard labels, which can preserve more image information throughout the training process [47]. Soft label approaches have improved generalization, accelerated learning, and reduced network over-confidence [45][46][47]. When computing segmentation-based morphometric measurements, SoftSeg, a method based on the U-Net [48] architecture proposed by Gros et al. [45], achieves better precision than traditional binary segmentations (a 6.5% increase in Dice score on the 2019 BraTS dataset) and has increased sensitivity, which is desired by radiologists. Zhang et al. [49] compared the segmentation results obtained using hard labels and soft labels and demonstrated that using soft labels can increase segmentation performance. Engelen et al. [50] proposed blurring the ground truth mask with a Gaussian filter for label softening and demonstrated the improvement on in vivo MRI and CT angiography (CTA) [51] image datasets. Qi et al. [52] developed a novel Progressive Cross-camera Soft-label Learning (PCSL) framework for the semi-supervised person re-identification task that enhanced feature representations through a different learning method. Kats et al. [47] proposed a modified simultaneous truth and performance level estimation (STAPLE) [53] algorithm for soft annotations of experts and demonstrated that training a fully convolutional neural network with the soft labels improves generalization and yields a performance gain.

Label Smoothing Methods
It is widely known that neural network training is sensitive to the loss that is minimized [46]. Instead of using hard labels for model training, label smoothing methods utilize soft labels that are generated by exploiting a uniform distribution to smooth the distribution of the hard labels, aiming to provide regularization for a learnable classification model [49]. Label smoothing is a method commonly used in training deep learning models to keep the neural network from becoming over-confident and to enhance model calibration and segmentation performance [46]. The label smoothing approach has been utilized in the fields of medical image analysis [54,55], style transfer [56], speech recognition [57], and language translation [58] to improve the performance of deep learning models. For example, Müller et al. [46] demonstrated that label smoothing implicitly calibrates learned models so that the confidences of their predictions are more aligned with the accuracies of their predictions. Li et al. [54] developed a ground truth softening methodology using an over-segmentation algorithm and smoothing based on the distance to an annotated boundary; their experimental results demonstrate that using soft labels improves model performance on both 2D and 3D medical images (a 0.7% increase in Dice score on the MRBrainS18 dataset [59]). Zhao et al. [56] proposed an approach that automatically segments items and extracts their soft semantic masks from the style and content images to preserve the structure of the content image while having the style transferred. Pham et al. [55] developed a label smoothing method to better handle uncertain samples, which constitute a significant portion of chest X-ray datasets. Zhang et al. [49] presented an Online Label Smoothing (OLS) strategy, which generates soft labels based on the statistics of the model prediction for the target category, and demonstrated that the OLS method outperforms other regularization approaches on the Canadian Institute for Advanced Research-100 (CIFAR-100) dataset [60].

Segmentation Approaches
Segmentation models are widely used in automated medical image analysis and have shown good performance [6,36,38,40]. The fully convolutional network (FCN) was introduced by Shelhamer et al. [36] for semantic image segmentation. To produce accurate and detailed segmentations, they defined a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer. In recent years, researchers developed a modified FCN-32s approach and demonstrated that it is beneficial for tumor segmentation in the diagnosis of cervical cancer [7], thyroid cancer [6], breast cancer [8], ovarian cancer [10,11], and EBUS [9]. Shen et al. [61] developed a modified mini-U-net to accurately segment touching cells in FISH images and demonstrated that its performance is better than that of the original mini-U-net [62]. Upschulte et al. [37] built the Contour Proposal Network (CPN), a framework for object instance segmentation that proposes contours encoded as fixed-size representations based on Fourier descriptors, and evaluated its performance on three datasets (NCB, BBBC039 [63], and SYNTH), which contain large variations in cell shape. Ke et al. [39] proposed the Bilayer Convolutional Network (BCNet), a bilayer mask prediction network addressing heavy occlusion and overlapping objects in two-stage instance segmentation, and evaluated its performance on the COCO dataset [64]. Wang et al. [38] designed a dynamic instance segmentation framework called Segmenting Objects by Locations v2 (SOLOv2) and showed its robustness on the MS COCO dataset [64], which includes per-pixel segmentation masks for 91 object categories. Chen et al. [40] proposed DeepLabv3+, a deep learning model with an encoder-decoder structure, and proved its efficacy on the Cityscapes dataset [65], which includes polygonal instance segmentation annotations for vehicles and people.
In our experiments, we compare the proposed method with state-of-the-art deep learning models, including FCN [36], Modified FCN [6][7][8][9][10][11], U-Net [2] with Inception-v4 [32], ensembles of U-Net with Inception-v4 [32], Inception-ResNet-v2 [32], and ResNet-34 [33] encoders, U-Net [2], SegNet [34], YOLOv5 [35], BCNet [39], CPN [37], SOLOv2 [38], and DeepLabv3+ [40] with three different backbones, including MobileNet [41], ResNet [33], and Xception [42].

Materials
The performance of the proposed deep learning model is evaluated using four datasets, including two DISH breast datasets obtained at two different magnification levels, a FISH breast dataset, and a Papanicolaou-stained FNA and TP thyroid dataset. Ethical approvals were obtained from the research ethics committee of the Tri-Service General Hospital (TSGHIRB No.1-107-05-171 and No.B202005070), and the data were de-identified and used for a retrospective study without impacting patient care. For FISH and DISH images of invasive breast cancer, we selected patients who came to our medical center for breast cancer treatment and had a pathology diagnosis of infiltrating ductal carcinoma. De-identified, digitized dual-color FISH and DISH images of HER2 IHC score 2+ (equivocal) cases from January 2014 to December 2021 were obtained from the tissue bank of the Department of Pathology, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan (n = 470, including 200 FISH images and 270 DISH images at two different device magnifications). For DISH breast dataset 1, the slides were imaged at 1200× overall magnification using a 20× eyepiece lens (Forever Plus Corp., Taiwan) and a 60× objective lens (Olympus, Japan). For DISH breast dataset 2 and the FISH breast dataset, the slides were imaged at 600× overall magnification using a 10× eyepiece lens (Olympus, Japan) and a 60× objective lens (Olympus, Japan). DISH and FISH results were evaluated independently by two pathologists, who generated annotations of the invasive breast cancer areas of each slide to highlight individual tumor cells with associated labels for HER2 and CEN17 signals.
For Papanicolaou-stained FNA and TP cytological slides for thyroid cancer diagnosis, 131 de-identified, digitized WSIs were received from the Department of Pathology, Tri-Service General Hospital, Taipei, Taiwan, comprising 120 PTC cytologic slides (smear, Papanicolaou-stained, n = 120) and 11 PTC cytologic slides (TP, Papanicolaou-stained, n = 11). Table 1 presents the detailed information of the experimental datasets. The PathVysion HER2 DNA probe kit II (Vysis Inc., Downers Grove, IL, USA), which is designed to detect amplification of the HER2 gene via FISH in formalin-fixed paraffin-embedded (FFPE) human breast cancer tissue specimens, was used following the manufacturer's instructions. FISH is performed using dual probes highlighting the HER2 gene and the CEN17 in different colors. The FFPE tissue blocks containing breast cancer were selected, and regions of interest were marked on hematoxylin and eosin (H and E) slides. The selected area in the subsequent section was taken for FISH analysis. Tissues were subjected to a series of deparaffinization, dehydration, and prehybridization treatments. Probes were then added, and the sections were left to incubate overnight. After post-hybridization washes, sections were mounted and checked for signal. The entire slide was screened, and every single discrete nucleus was examined for red and green signals.

DISH Breast Datasets
This study uses the INFORM HER2 Dual ISH DNA Probe Cocktail Assay from Ventana Medical Systems, which is a dual-color DISH assay. The HER2 gene is detected by a dinitrophenyl (DNP)-labeled probe and visualized using the ultraView silver in situ hybridization (SISH) DNP detection kit. The CEN17 is targeted using a digoxigenin (DIG)-labeled probe and detected using the ultraView Red ISH DIG detection kit. Under light microscopy, HER2 appears as discrete black signals, and chromosome 17 appears as red signals. The sections were loaded into the Ventana BenchMark XT machine, and a fully automated procedure was carried out with the following basic steps: deparaffinization, followed by cell conditioning and protease digestion. The probe was then applied, followed by hybridization and application of the SISH multimer. Next, the silver chromogen was applied, followed by the Red ISH multimer and the red chromogen. Finally, the sections were counterstained with hematoxylin, cleared in xylene, and mounted with dibutyl phthalate polystyrene xylene.

FNA and TP Thyroid Dataset
The screening of the cytology slides was first performed by cytologists, and two experienced pathologists confirmed the papillary carcinoma tumor groups labeled by the cytologists. Cytology was reported according to the 2017 Bethesda System for Reporting Thyroid Cytopathology. Well-preserved thyroid FNAs performed during the previous two years were chosen. All stained slides were scanned at 20× objective magnification with a Leica AT Turbo scanner (Leica, Germany), and the average slide size is 77,338 × 37,285 pixels. Two experienced pathologists created the reference standard. A total of 28 Papanicolaou-stained WSIs (21%), including 25 thyroid FNA and three TP cytologic slides, are used for model training. The remaining 103 Papanicolaou-stained WSIs (79%), including 95 thyroid FNA and eight TP cytologic slides, are used as a separate testing set for evaluation.

Proposed Method: Soft Label FCN
The fully convolutional network (FCN) was introduced by Shelhamer et al. [36] for semantic image segmentation. The proposed method extends and improves our previous effort, a modified FCN, which has been demonstrated to be highly effective for tumor segmentation in the diagnosis of thyroid cancer [6], cervical cancer [7], breast cancer [8], ovarian cancer [10,11], and EBUS [9], and which showed better segmentation performance than the original FCN [36] and a number of popular deep learning approaches. However, when dealing with objects of interest with blurry or unclear boundaries, the performance of existing deep learning models declines, as shown in our experiments. To deal with this issue, we propose an improved soft-labeled FCN architecture to achieve better results for semantic image segmentation, especially on data with blurry or unclear cell borders. By utilizing soft labels instead of hard labels, the image information loss during the training process can be reduced [47]. Recent studies show that label smoothing can improve segmentation performance at the boundaries of different regions [54][55][56]. In this study, we propose a new loss function, namely the soft weight softmax loss function, which utilizes soft labels and integrates the concept of label smoothing [45,54] into the softmax loss function (see Sections 3.2.1 and 3.2.2) to improve image segmentation results on data with blurry or unclear cell boundaries.
The major modification of the proposed soft-labeled FCN is the replacement of the original softmax loss function with a new soft weight softmax loss function, which assigns lower weights to the blurry and unclear cell bordering regions and higher weights to the center regions of annotations in computing the model loss. This helps the model focus on the confidently annotated center regions of interest (by assigning them higher weights) while paying less attention to the blurry or unclear cell borders during training. Figure 2 presents the workflow of the proposed framework.

Soft Label Modeling
The efficacy of using soft labels instead of hard labels has been demonstrated in many studies [45][46][47]. To improve the performance of boundary segmentation, we devise a soft label modeling scheme for training better models. We convert the annotations A = {r^a_k}_{k=1,2,...,K} into bounding boxes B = {b_k}_{k=1,2,...,K}, formulated as follows:

b_k = (i_{r^a_k}, j_{r^a_k}, w_{b_k}, h_{b_k}),

where i_{r^a_k} represents the x-axis coordinate of the k-th annotation, j_{r^a_k} represents the y-axis coordinate of the k-th annotation, w_{b_k} denotes the width of the k-th bounding box, and h_{b_k} denotes the height of the k-th bounding box.
We define ψ = {ψ_k}_{k=1,2,...,K} as the set of diagonal lines of the bounding boxes in the training dataset, where the diagonal line ψ_k is formulated as follows:

ψ_k = √(w_{b_k}² + h_{b_k}²).

After ψ has been generated, we arrange the elements of ψ in ascending order, and let ψ̂ = {ψ̂_k}_{k=1,2,...,K} denote the sorted set of diagonal lines. The median diagonal line ψ* is calculated as follows:

ψ* = ψ̂_{(K+1)/2}, if K % 2 = 1,
ψ* = (ψ̂_{K/2} + ψ̂_{K/2+1}) / 2, if K % 2 = 0,

where % represents the remainder operator.
Given ψ*, the erosion kernel size κ_e and the dilation kernel size κ_d are obtained by scaling ψ* with the empirically determined parameters φ, υ, and τ; in this study, φ = 0.01, υ = 2, and τ = 6. Let F[κ_e] and F[κ_d] denote two binary structuring elements with morphological kernel sizes κ_e and κ_d for the erosion and dilation operations, respectively. Let R_c = {r^c_k}_{k=1,2,...,K} denote the regions obtained by eroding the annotation regions r^a_k, and R_o = {r^o_k}_{k=1,2,...,K} denote the regions obtained by dilating r^c_k:

r^c_k = r^a_k ⊖ F[κ_e],
r^o_k = r^c_k ⊕ F[κ_d],

where ⊕ and ⊖ denote the binary morphological dilation and erosion operations, respectively. Given R_c and R_o, the erosion region R_e = {r^e_k}_{k=1,2,...,K} and the dilation region R_d = {r^d_k}_{k=1,2,...,K} are formulated as follows:

r^e_k = r^a_k \ r^c_k,
r^d_k = r^o_k \ r^a_k.

The soft label regions R_s = {r^s_k}_{k=1,2,...,K} are the union of the erosion regions and the dilation regions:

r^s_k = r^e_k ∪ r^d_k.

After generating the soft label map, we model the loss weight ω(m) of each pixel m as follows:

ω(m) = Ψ, if m ∈ R_c;  Π, if m ∈ R_s;  ℵ, otherwise,

where Ψ, Π, and ℵ are empirically determined; in this study, Ψ = 2, Π = 1.5, and ℵ = 1.
As formulated above, the highest weights are assigned to the centers of the annotations R_c so that the model can focus on these regions during the training process, lower weights are assigned to the boundary regions R_s, which include blurry or unclear cell boundaries, and the lowest weights are assigned to the background to reduce its influence on the gradients during training.
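The soft label modeling steps above can be sketched with NumPy and SciPy. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, and the exact form in which φ, υ, and τ scale the median diagonal into kernel sizes is an assumption.

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def soft_label_weights(mask, boxes, phi=0.01, upsilon=2, tau=6,
                       psi_w=2.0, pi_w=1.5, aleph_w=1.0):
    """Build a per-pixel loss-weight map from a binary annotation mask.

    mask  : 2D boolean array of annotated regions.
    boxes : list of (w, h) bounding-box sizes of the annotations.
    """
    # Median diagonal psi* of the annotation bounding boxes.
    diag = np.sort([np.hypot(w, h) for w, h in boxes])
    k = len(diag)
    psi_star = diag[k // 2] if k % 2 == 1 else (diag[k // 2 - 1] + diag[k // 2]) / 2

    # Erosion/dilation kernel sizes scaled from psi* (assumed scaling form).
    k_e = max(1, int(round(phi * psi_star * upsilon)))
    k_d = max(1, int(round(phi * psi_star * tau)))

    center = binary_erosion(mask, structure=np.ones((k_e, k_e)))   # R_c
    outer = binary_dilation(center, structure=np.ones((k_d, k_d))) # R_o
    soft = outer & ~center                                         # R_s = R_e ∪ R_d

    weights = np.full(mask.shape, aleph_w)  # background: lowest weight (aleph)
    weights[soft] = pi_w                    # blurry boundary band (Pi)
    weights[center] = psi_w                 # confident annotation core (Psi)
    return weights
```

With the default parameters, the core of each annotation receives weight 2, the surrounding boundary band 1.5, and the background 1, mirroring the piecewise weighting described above.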

Soft Weight Softmax Loss Function
The softmax loss function is popular in image segmentation models [6,7,34,36,56]. Based on the original softmax loss function, we propose a new loss function that preserves more image information and reduces the influence of confusing regions during the training process. In this paper, we build a soft weight softmax loss function L_sws to help the model focus on the central regions of interest with high confidence while reducing attention on blurry or unclear cell borders.
As shown in Figure 2(c1), the original softmax loss function L_s in the modified FCN architecture [6][7][8][9][10][11] is formulated as follows:

L_s = −(1/M) Σ_{m=1}^{M} log(p_{mn}),

where M is the number of pixels of the training data, and p_{mn} is formulated as follows:

p_{mn} = exp(z_{mn}) / Σ_{t=1}^{N} exp(z_{mt}),

where N denotes the number of classes, z_{mn} is the predicted score z for pixel m belonging to the target class n, and z_{mt} denotes the predicted score z for pixel m belonging to the t-th class (t ∈ [1, N]). Figure 2(c2) shows the soft weight softmax loss function L_sws in our proposed soft label FCN, which is formulated by adding the soft weight:

L_sws = −(1/M) Σ_{m=1}^{M} ω_m log(p_{mn}),

where ω_m is the weight value ω of pixel m. The centers of the annotations R_c are assigned the highest weights in computing the model loss so that the model can focus on training the central regions with high confidence. In contrast, the boundary regions, which include the erosion regions R_e and the dilation regions R_d, and the background regions are assigned lower weights in computing the model loss to reduce the confusion caused by these regions during training. By assigning these regions different weights in the loss function, the model can focus on the target regions and reduce the confusion introduced by the other regions.
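The soft weight softmax loss can be sketched in NumPy as follows. This is an illustrative re-implementation under the assumption that the loss is averaged over all M pixels; the function name is hypothetical.

```python
import numpy as np

def soft_weight_softmax_loss(scores, labels, weights):
    """Soft weight softmax loss: per-pixel cross-entropy scaled by the
    soft label weight omega_m of each pixel.

    scores  : (M, N) array of predicted class scores z_mn for M pixels.
    labels  : (M,) array of target class indices n.
    weights : (M,) array of per-pixel soft weights omega_m.
    """
    # Numerically stable softmax over the class axis (p_mn).
    z = scores - scores.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # Negative log-likelihood of each pixel's target class.
    nll = -np.log(p[np.arange(len(labels)), labels])
    # Weighted mean over all M pixels.
    return float(np.mean(weights * nll))
```

Setting all weights to 1 recovers the ordinary softmax (cross-entropy) loss, which makes the role of the weight map explicit: pixels in the annotation core contribute more to the gradient than boundary or background pixels.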

Proposed Soft-Labeled FCN Architecture
Based on the modified FCN [6][7][8][9][10][11], we propose a soft-labeled FCN improved from the FCN-32 architecture, shown in Figure 2a. The network takes 512 × 512 tiles as input images. The first two stages each consist of two convolutional layers with a 3 × 3 filter size, a stride of 1, and ReLU activation, followed by a max-pooling layer with a 2 × 2 filter size and a stride of 2. The next three stages each consist of three convolutional layers with a 3 × 3 filter size, a stride of 1, and ReLU activation, followed by a max-pooling layer with a 2 × 2 filter size and a stride of 2. After these stages, the next two stages each consist of a fully connected (FC) layer with a 3 × 3 filter size, a stride of 1, ReLU activation, and a dropout layer. Next, a convolutional layer with a 1 × 1 kernel size is applied, and then a deconvolutional layer with a 64 × 64 kernel size and a stride of 32 is utilized to upsample the feature maps, followed by a cropping layer. The last layer of the model is the loss function. Figure 2b demonstrates the process of obtaining the weight values in the proposed loss function using the soft label modeling (see Section 3.2.1). Detailed information about the proposed soft weight softmax loss function and its comparison with the softmax loss function is given in Section 3.2.2. Figure 2d presents the output segmentation results from the traditional softmax loss function (Figure 2(d1)) and the proposed loss function (Figure 2(d2)), showing that the proposed loss function improves the performance of the model. The detailed framework of the proposed soft label FCN is presented in Figure 2, and the detailed architecture of the proposed deep learning network is shown in Table 2.
N represents the number of classes to predict; in this study, N = 3, and there are three classes to predict, including the background class, tissue types other than the target type, and the target tissue type.
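The spatial-size arithmetic of the pipeline above can be traced with a short helper. This is a sketch assuming the standard VGG16 pooling stride of 2 at every stage; it also shows why the cropping layer is needed after the stride-32 deconvolution.

```python
def fcn32_spatial_trace(size=512):
    """Trace feature-map sizes through the FCN-32 path described above."""
    s = size
    for _ in range(5):                # five stages, each ending in 2x2 max-pooling
        s //= 2                       # 512 -> 256 -> 128 -> 64 -> 32 -> 16
    # Deconvolution with a 64x64 kernel and stride 32: out = (in - 1) * 32 + 64.
    upsampled = (s - 1) * 32 + 64     # slightly larger than the input tile
    cropped = size                    # the cropping layer trims back to the input size
    return s, upsampled, cropped
```

For a 512 × 512 input tile this gives a 16 × 16 coarse score map, a 544 × 544 deconvolution output, and a 512 × 512 result after cropping, which is why the cropping layer directly follows the deconvolutional layer.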

Implementation Details
To train the proposed method, the model is initialized with the VGG16 model, optimized with the SGD optimizer, and trained with the soft weight loss as the loss function. The base learning rate is 1 × 10⁻¹⁰, the weight decay is 5 × 10⁻⁴, and the momentum is 0.99. Data augmentation is also utilized as a regularizer, minimizing overfitting and improving performance when dealing with unbalanced classes. For data augmentation, we rotate the input images by 5° five times and in increments of 90°, and flip them along the horizontal and vertical axes during training.

Evaluation Metrics
For quantitative evaluation, we utilize the accuracy, precision, recall, F1-score, and Jaccard index to compare and measure the performance of the benchmark approaches and the proposed method. The Jaccard index is calculated as

Jaccard index = TP / (TP + FP + FN), (24)

where TP represents the true positives, TN the true negatives, FP the false positives, and FN the false negatives.
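As a small worked example, all five metrics can be computed directly from the confusion-matrix counts. These are the standard textbook definitions, with Equation (24) for the Jaccard index:

```python
def segmentation_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Pixel-level evaluation metrics from a binary confusion matrix.
    Standard definitions; the Jaccard index matches Equation (24)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "jaccard": tp / (tp + fp + fn),  # Equation (24)
    }
```

For instance, with TP = 50, TN = 30, FP = 10, and FN = 10, the accuracy is 0.80 and the Jaccard index is 50/70 ≈ 0.714.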

Quantitative Evaluation with Statistical Analysis in DISH Breast Dataset 1
The quantitative evaluation results for the segmentation of HER2 amplification in DISH dataset 1 are presented in Table 3a. The proposed soft label FCN achieves an accuracy of 87.77 ± 14.97%, precision of 77.19 ± 23.41%, recall of 91.20 ± 7.72%, F1-score of 81.67 ± 17.76%, and Jaccard index of 72.40 ± 23.05%. In addition, the box plots of the quantitative assessment results for breast cancer segmentation are shown in Figure 3a, demonstrating that the proposed method consistently outperforms the baseline approaches. To further demonstrate the efficacy of the proposed method, the quantitative scores were evaluated with Fisher's Least Significant Difference (LSD) test using SPSS software (Table 4). Based on the LSD test, the proposed approach significantly exceeds most of the baseline approaches in terms of precision, recall, F1-score, and Jaccard index (p < 0.001). Figure 4 presents a visual comparison of the segmentation results of the proposed method and the baseline approaches for the segmentation of HER2 amplification; the typical segmentation results generated by the proposed method are consistent with the reference standard produced by an expert pathologist. The quantitative and qualitative evaluations show that the proposed soft label FCN outperforms the baseline models, including U-Net [2] with Inception-v4 [32], ensembles of U-Net with Inception-v4 [32], Inception-ResNet-v2 [32], and ResNet-34 [33] encoders, SegNet [34], modified FCN [6][7][8][9][10][11], U-Net [2], YOLOv5 [35], FCN [36], CPN [37], SOLOv2 [38], BCNet [39], and DeepLabv3+ [40] with three different backbones, including MobileNet [41], ResNet [33], and Xception [42].
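For illustration, Fisher's LSD procedure (a one-way ANOVA error term followed by pairwise comparisons using the pooled within-group MSE) can be reimplemented as below. The paper used SPSS; this sketch, operating on hypothetical per-image score groups, is our own and only mirrors the procedure:

```python
import numpy as np
from scipy import stats

def fisher_lsd(groups):
    """Fisher's LSD: pairwise mean comparisons using the pooled
    within-group mean squared error from a one-way ANOVA.
    `groups` is a list of per-method score samples (e.g., per-image F1)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # within-group (error) sum of squares and degrees of freedom
    sse = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)
    df_err = n_total - k
    mse = sse / df_err
    results = {}
    for i in range(k):
        for j in range(i + 1, k):
            diff = np.mean(groups[i]) - np.mean(groups[j])
            se = np.sqrt(mse * (1 / len(groups[i]) + 1 / len(groups[j])))
            t = diff / se
            p = 2 * stats.t.sf(abs(t), df_err)  # two-sided p-value
            results[(i, j)] = (diff, p)
    return results
```

Because LSD performs the pairwise t-tests against the pooled error term without multiplicity correction, it is more sensitive than corrected procedures; the significance levels reported in Tables 4 and 6 follow this convention.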

Quantitative Evaluation with Statistical Analysis in the FISH Breast Dataset
The quantitative evaluation results for the segmentation of HER2 amplification in the FISH dataset are presented in Table 3c. The proposed soft label FCN achieves an accuracy of 93.54 ± 5.24%, precision of 91.75 ± 8.27%, recall of 83.52 ± 13.15%, F1-score of 86.98 ± 9.85%, and Jaccard index of 78.22 ± 14.73%. In addition, the box plots of the quantitative assessment results for breast cancer segmentation are shown in Figure 3c, demonstrating that the proposed method consistently outperforms the baseline approaches. To further demonstrate the efficacy of the proposed method, the quantitative scores were evaluated with Fisher's Least Significant Difference (LSD) test using SPSS software (Table 6). Based on the LSD test, the proposed approach significantly exceeds the baseline approaches in terms of precision, recall, F1-score, and Jaccard index (p < 0.001). Figure 6 presents a visual comparison of the segmentation results of the proposed method and the baseline approaches for the segmentation of HER2 amplification; the typical segmentation results generated by the proposed method are consistent with the reference standard produced by an expert pathologist. The quantitative and qualitative evaluations show that the proposed soft label FCN outperforms the baseline models, including modified FCN [6][7][8][9][10][11], YOLOv5 [35], CPN [37], SOLOv2 [38], BCNet [39], and DeepLabv3+ [40] with three different backbones, including MobileNet [41], ResNet [33], and Xception [42].

Table 6. Statistical analysis to compare the proposed method with benchmark approaches using the LSD test on the FISH dataset. The mean difference is significant at the level of ** 0.01 and *** 0.001.

Quantitative Evaluation with Statistical Analysis in the Thyroid Dataset
The quantitative evaluation results for the segmentation of PTC in Papanicolaou-stained FNA and TP WSIs are presented in Table 7a. The experimental results demonstrate that the proposed SL-FCN achieves superior performance compared to the baseline approaches, including modified FCN [6][7][8][9][10][11], U-Net [2], and SegNet [34], with an accuracy of 99.99 ± 0.01%, precision of 92.02 ± 16.6%, recall of 90.90 ± 14.25%, F1-score of 89.82 ± 14.92%, and Jaccard index of 84.16 ± 19.91% for the segmentation of PTC in histopathological WSIs. The statistical analysis based on Fisher's LSD test is presented in Table 7b. The LSD test results demonstrate that the proposed SL-FCN significantly exceeds the baseline approaches, including U-Net [2] and SegNet [34], in terms of precision, recall, F1-score, and Jaccard index (p < 0.001). Furthermore, the qualitative segmentation results of the proposed SL-FCN and the baseline approaches for the segmentation of PTC in Papanicolaou-stained WSIs are presented in Figure 8; the results predicted by the proposed method are consistent with the reference standard produced by the expert pathologist.

The mean difference is significant at the level of * 0.05 and *** 0.001. ν The evaluation results are taken from [6] on the thyroid dataset.

Ablation Study
In this section, we conduct four experiments to validate each component of the proposed soft label FCN: changing the ratio of weight values assigned to the different regions, changing the soft label regions, utilizing different initialization methods, and utilizing different optimizers with Kaiming initialization. First, we investigate the soft label regions and analyze how they affect the segmentation performance of the proposed method (see Table 8a). Second, we compare the performance of the proposed soft label FCN with different initialization methods and without initialization (see Table 8b); the results show that the version without initialization outperforms the versions with Kaiming and Xavier initialization. Third, we compare the performance with different ratios of weights assigned to the different regions (see Table 8c). Fourth, we compare the performance with Kaiming initialization and different optimizers, including stochastic gradient descent (SGD) with momentum, Adam, Adaptive Gradient, AdaDelta, Nesterov's Accelerated Gradient (NAG), and RMSprop (see Table 8d). All experiments are conducted on DISH dataset 1. The experimental results demonstrate that the proposed method with soft label region R S , without initialization, with weight values (Ψ = 2, Π = 1.5, ℵ = 1), and with the SGD with momentum optimizer provides the best performance.
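A minimal sketch of a soft-weighted cross-entropy of the kind ablated above is given below, assuming each pixel carries a region index that selects one of the weight values (Ψ = 2, Π = 1.5, ℵ = 1, the best setting in Table 8c). The exact construction of the soft label regions follows Section 3.2.1 and is not reproduced here; the function below is our interpretation, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def soft_weight_ce(logits, target, region, weights=(2.0, 1.5, 1.0)):
    """Soft-weighted cross-entropy sketch.

    logits: (N, C, H, W) class scores; target: (N, H, W) class labels;
    region: (N, H, W) integer map assigning each pixel to a soft label
    region, used to look up its weight (Psi, Pi, Aleph)."""
    # per-pixel cross-entropy, shape (N, H, W)
    ce = F.cross_entropy(logits, target, reduction="none")
    # scale each pixel's loss term by its region weight
    w = torch.tensor(weights, dtype=ce.dtype)[region]
    return (w * ce).mean()
```

With uniform logits over three classes, the unweighted per-pixel loss is log 3 ≈ 1.0986; placing every pixel in the Ψ = 2 region doubles it, which is the mechanism by which the loss emphasizes the target-adjacent regions.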

Discussion and Conclusions
Cancer research has grown steadily over the last few decades. Scientists have used several approaches, such as early-stage screening, to detect cancer types before symptoms develop, and have created novel ways to predict cancer therapy outcomes early on. However, reliable cancer prediction remains one of the most difficult tasks for clinicians. To deal with this challenge, deep learning methods have grown in popularity among medical researchers; these methods can find and detect patterns as well as accurately determine the potential outcomes of a form of cancer. In this study, we develop an SL-FCN method for the automated segmentation of HER2 amplification in FISH and DISH images of invasive breast cancer to assist breast cancer target therapy, and of PTC on Papanicolaou-stained FNA and TP WSIs to help in thyroid cancer diagnosis.
Breast cancer is classified into five subtypes, including luminal A, luminal B, HER2-positive luminal B, non-luminal HER2-positive, and triple negative, for treating early breast cancer in the adjuvant setting using the levels of ER, PR, Ki67, and HER2 expression [66]. The amplified HER2 gene, a poor prognostic factor, can be observed in approximately 15-20% of patients with invasive breast cancer [21,66,67]. HER2 amplification with adverse prognostic effects is not limited to breast and gastric cancer but is also found in a variety of tumor types, such as colon cancer, urinary bladder cancer, and biliary cancer [67][68][69][70][71]. Clinical outcomes for HER2-positive breast cancer have dramatically changed with HER2-targeted therapy [21,22]; however, in addition to being expensive, HER2-targeted therapy has some serious side effects associated with its use, such as cardiomyopathy, pulmonary toxicity, and febrile neutropenia [72,73]. For these reasons, determining the HER2 status is very important for the selection of treatment options, and maximizing efficacy while minimizing toxicity and cost is imperative. To date, no biomarkers other than HER2 overexpression itself have been discovered that predict response to anti-HER2 therapy [74]. This requires a reliable method for identifying HER2-positive cases: a key first step in appropriately deciding on the use of HER2-targeted therapy is the accurate determination of HER2 overexpression. IHC detects HER2 protein expression on the cell membrane and is scored on a scale of 0-3 based on the Hercept Test Score [75]. Scores of 0 and 1+ are considered negative, and a score of 3+ is considered positive. An equivocal result, represented by a score of 2+, requires further testing to confirm the presence or absence of HER2 gene amplification, which can be achieved using a second method, most commonly ISH [76]. HER2 ISH was traditionally performed by FISH.
DISH provides faster turnaround times and the ability to store slides for long periods without loss of signal [77]. In addition, DISH may also be superior to FISH in assessing heterogeneity, especially when discrete areas of amplification are present within the tumor [78].
The HER2/CEN17 ratio and the average HER2 copy number determine whether the FISH and DISH results are positive or negative. Pathologists rely on their experience to analyze the HER2 gene amplification status of a selected region by visual evaluation, which can easily produce bias and inter-observer variability. Therefore, an automated diagnostic method based on AI can potentially overcome the limitations of the manual assessment procedure [79][80][81][82]. Automated diagnostic tools have been developed for the segmentation of chromosomes in multicolor FISH images to make pathological examinations more accurate and reliable [30,83,84]. In this study, we developed a soft label FCN technology for analyzing FISH and DISH images. We compared IHC equivocal cases (2+) combined with FISH or DISH testing assessed by visual counting or deep learning methods to confirm the HER2 gene status. Using the current standard visual evaluation of FISH or DISH as a reference, the diagnostic indices for soft label deep learning are as follows: (1) on the FISH dataset, a sensitivity of 83.52%, specificity of 98.65%, and accuracy of 93.54%; (2) on DISH dataset 1, a sensitivity of 91.2%, specificity of 86.45%, and accuracy of 87.77%; and (3) on DISH dataset 2, a sensitivity of 83.78%, specificity of 97.16%, and accuracy of 94.64%. Moreover, in statistical analysis, the proposed soft label FCN outperforms the baseline approaches by a significant margin (p < 0.001). Even for challenging FISH images with blurry cell borders, as shown in Figure 6, the proposed soft label FCN consistently performs well and outperforms the benchmark approaches. The approach enables the automated counting of more nuclei with high precision, sensitivity, and accuracy, comparable to the usual clinical manual counting method. Adjuvant trastuzumab with chemotherapy is the standard treatment for HER2-positive breast cancer, defined as IHC 2+ and FISH amplified.
Although there is no complete documentation in our experimental data to determine whether FISH-amplified cases are positively associated with treatment outcome, some cases with high HER2 copy number do have a good clinical response that provides oncologists with valuable information on the possibilities of response or not after anti-HER2 target therapy.
PTC is the most common malignant tumor of the thyroid. In the evaluation of thyroid FNA, pathologists must assess all information on glass slides under a light microscope. Digital pathology has emerged as a possible new standard of care in recent years, enabling pathology images to be analyzed using computer-based algorithms. However, due to the large size of a typical WSI, pathologists find it difficult to manually review all of the information in a WSI. As a result, artificial intelligence-based automated diagnosis systems are being investigated to overcome the limitations of the manual and difficult diagnosis procedure. In this study, we developed a soft label FCN technology for analyzing Papanicolaou-stained WSIs for PTC diagnosis. The quantitative evaluation results demonstrate that the proposed method achieves superior performance for the segmentation of PTC on Papanicolaou-stained WSIs compared to the baseline methods, including modified FCN, U-Net, and SegNet, with accuracy, precision, and recall of over 90%. Moreover, in statistical analysis based on Fisher's LSD test, the proposed soft label FCN approach outperforms the baseline approaches, including U-Net and SegNet, by a significant margin (p < 0.001).
The DL-based soft label approach in our study achieves a high degree of accuracy, precision, recall, and F1-score. The experimental results on FISH and DISH images of invasive breast cancer for the assessment of HER2 amplification, and on Papanicolaou-stained FNA and TP WSIs for PTC diagnosis, demonstrate that the proposed deep learning-based system may not only eliminate misclassification owing to human error but also decrease decision-making time, enhancing accuracy and reproducibility while being more objective, precise, and unbiased than current standard visual interpretation. People will have more confidence in AI algorithms once they are validated using multi-center data and offer increased interpretability. The collaboration between pathologists and AI will promote tumor diagnosis and precision treatment. For live demonstration, an online web-based system is provided.

Informed Consent Statement:
Patient consent was formally waived by the approving review board, and the data were de-identified and used for a retrospective study without impacting patient care.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest:
The authors declare that they have no known competing financial interest or personal relationship that could have appeared to influence the work reported in this paper.