Recent Advances of Deep Learning for Computational Histopathology: Principles and Applications

Simple Summary
The histopathological image is widely considered the gold standard for the diagnosis and prognosis of human cancers. Recently, deep learning technology has been extremely successful in the field of computer vision, which has also sparked considerable interest in digital pathology analysis. The aim of our paper is to provide a comprehensive and up-to-date review of deep learning methods for digital H&E-stained pathology image analysis, including color normalization, nuclei/tissue segmentation, and cancer diagnosis and prognosis. The experimental results of existing studies demonstrate that deep learning is a promising tool to assist clinicians in the clinical management of human cancers.

Abstract
With the remarkable success of digital histopathology, we have witnessed a rapid expansion of the use of computational methods for the analysis of digital pathology and biopsy image patches. However, the unprecedented scale and heterogeneous patterns of histopathological images have created critical computational bottlenecks that require new computational histopathology tools. Recently, deep learning technology has been extremely successful in the field of computer vision, which has also sparked considerable interest in digital pathology applications. Deep learning and its extensions have opened several avenues for tackling many challenging histopathological image analysis problems, including color normalization, image segmentation, and the diagnosis/prognosis of human cancers. In this paper, we provide a comprehensive, up-to-date review of deep learning methods for digital H&E-stained pathology image analysis. Specifically, we first describe recent literature that uses deep learning for color normalization, which is an essential research direction for H&E-stained histopathological image analysis. Following the discussion of color normalization, we review applications of deep learning methods to various H&E-stained image analysis tasks such as nuclei and tissue segmentation. We also summarize several key clinical studies that use deep learning for the diagnosis and prognosis of human cancers from H&E-stained histopathological images. Finally, online resources and open research problems in pathological image analysis are provided for the convenience of researchers interested in this exciting field.


Introduction
Cancer is the second leading cause of mortality worldwide, and the global cancer burden is expected to reach 28.4 million cases in 2040 [1]. Thus, effective and efficient diagnosis of human cancer, especially at an early stage, is essential for global cancer control. Recently, a wide variety of biomarkers have been used for the diagnosis and prognosis of cancers, including radiomics data [2], histopathological images, and genetic signatures such as genetic mutations, gene expression, and protein markers [3]. Among these, the histopathology image is widely recognized as the "gold standard" for analyzing human cancers, since it visually reflects the aggressiveness of a cancer at the cellular level [4]. With the remarkable success of digital histopathology, whole-slide imaging (WSI) has matured and is now frequently used for the diagnosis and prognosis of human cancers, since it excels at characterizing tissue morphology at high resolution [5]. Hematoxylin and eosin (H&E) staining is the most commonly used tissue staining method in the world. Generally, research on the analysis of H&E-stained WSIs can be grouped into three components: color normalization, segmentation, and cancer diagnosis/prognosis (shown in Figure 1). Specifically, color normalization preprocesses the images to correct staining variations across different images, WSI segmentation delineates nuclei or tissues within the WSI, and prediction models are then built for the diagnosis and prognosis of human cancers. However, because manual inspection of WSIs is time-consuming and subject to large inter-observer variation among pathologists, there is an urgent need to develop machine learning models that analyze H&E-stained histopathological images automatically and more reliably [6].
Cancers 2022, 14, x 3 of 20
The machine learning-based methods for the analysis of H&E-stained histopathological images can be divided into two categories: traditional machine learning methods and deep learning methods. Traditional computational methods objectively evaluate disease-related tissue changes by extracting handcrafted features, such as textural [7] and morphological features [8], and then training classifiers such as support vector machines (SVM) [9], random forests (RF) [10], and K-nearest neighbors (K-NN) [11] for the downstream analysis tasks. For instance, Kruk et al. [12] first extracted morphometric, textural, and statistical features from the WSI and then used these features for nuclei classification with a combination of SVM and RF classifiers. Fuchs et al. [13] proposed a computational pipeline that extracts local binary patterns and color features from images and then uses these features to segment nuclei with an RF classification model. Zeralla et al. [14] first extracted spatial features from the WSI and then applied an SVM classifier to accomplish the color normalization task. It has been shown that traditional machine learning algorithms can achieve significantly better classification performance than their deep learning competitors when the training sample size is small [15], which makes them suitable for analyzing rare cancer subtypes with limited sample sizes [16]. Moreover, traditional machine learning models are more interpretable and can help clinicians understand how the models make decisions.
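The traditional two-step pipeline described above (handcrafted features, then a conventional classifier) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the three-value descriptor in `handcrafted_features` (mean intensity, standard deviation, and mean absolute gradient as a crude texture measure) and the synthetic "smooth" and "textured" patches are hypothetical stand-ins, not the textural or morphological features used in the cited studies, and K-NN is chosen only because it is one of the classifiers named in the text.

```python
import numpy as np

def handcrafted_features(patch):
    """Hypothetical descriptor for a grayscale patch: mean intensity,
    intensity standard deviation, and mean absolute gradient (texture)."""
    patch = patch.astype(float)
    texture = np.abs(np.diff(patch, axis=0)).mean() + np.abs(np.diff(patch, axis=1)).mean()
    return np.array([patch.mean(), patch.std(), texture])

def knn_predict(train_x, train_y, query, k=3):
    """Label each query vector by majority vote among its k nearest
    training vectors (Euclidean distance in feature space)."""
    dists = np.linalg.norm(train_x[None, :, :] - query[:, None, :], axis=2)  # (M, N)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_y[nearest]  # (M, k) integer labels
    return np.array([np.bincount(v).argmax() for v in votes])

# Hypothetical training data: class 0 = smooth bright patches, class 1 = dark textured patches.
rng = np.random.default_rng(0)
smooth = [200.0 + rng.normal(0, 2, (16, 16)) for _ in range(10)]
textured = [rng.uniform(50, 150, (16, 16)) for _ in range(10)]
X = np.stack([handcrafted_features(p) for p in smooth + textured])
y = np.array([0] * 10 + [1] * 10)

# Step 1: extract features from unseen patches; step 2: classify them.
queries = np.stack([
    handcrafted_features(200.0 + rng.normal(0, 2, (16, 16))),
    handcrafted_features(rng.uniform(50, 150, (16, 16))),
])
preds = knn_predict(X, y, queries, k=3)
print(preds)
```

Note that the features are fixed before the classifier ever sees a label, which is exactly the first limitation discussed next: nothing in the feature step is optimized for the downstream task.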
Although much progress has been made, traditional machine learning methods for H&E-stained histopathological image analysis share three common limitations. First, handcrafted features are extracted without supervision from, and are therefore not tailored to, the downstream WSI analysis task [17]. Second, handcrafted features capture only shallow representations of the input image; given the heterogeneous patterns of WSIs, such shallow feature extraction methods may be insufficient to characterize complex WSIs [18]. Third, most traditional machine learning algorithms assume that the data can be loaded entirely into memory, which makes it difficult to analyze large numbers of WSIs [19]. Recently, deep learning technology has been extremely successful in the field of computer vision, which has also sparked considerable interest in digital H&E-stained pathology analysis [20][21][22]. In comparison with traditional machine learning approaches, deep learning algorithms learn a direct mapping from the input to the desired output, extracting features tailored to the specific WSI analysis task and thereby avoiding a complex, separate feature extraction step. In addition, the heterogeneous patterns of WSIs cause variance across samples, which limits the generalization ability of handcrafted features [23]. Deep learning algorithms are capable of characterizing such complex patterns when given sufficient WSI data for model training. Moreover, given recent advances in high-throughput tissue banks and the archiving of digitized WSIs, deep learning algorithms are much more scalable owing to their ability to process massive amounts of data and perform large-scale computation in a cost- and time-effective manner [24].
In this paper, we systematically review the research directions and challenges of deep learning methods for H&E-stained histopathological image analysis (shown in Figure 1). Our paper is organized as follows. In Section 2, we briefly introduce the concepts and structure of deep neural networks. In Section 3, we introduce the research direction of color normalization for H&E-stained histopathological image analysis. In Section 4, we summarize the literature that applies deep learning methods to various H&E image segmentation tasks such as nuclei and tissue segmentation. In Section 5, we review clinical studies that use H&E-stained histopathological images for the diagnosis and prognosis of cancer. Finally, online resources and open research problems in H&E-stained histopathology image analysis are provided in Section 6.

Deep Neural Network
Deep learning is a relatively new research direction in machine learning based on deep neural networks, and it has greatly boosted the performance of natural image analysis tasks such as image classification [24], object detection [25], and semantic segmentation [26].
A deep neural network is composed of multiple nonlinear modules and can be regarded as a feature learning process from low to high levels. The convolutional neural network (CNN) is the most widely used deep neural network architecture [27] (shown in Figure 2). Specifically, the convolutional layers learn local features (e.g., corners and edges in the images). Convolutional layers are interleaved with pooling layers, which reduce the spatial size of the feature maps produced by the convolutional layers. The final fully connected layers combine the features learned by the convolutional layers, yielding a complex, high-level representation for the final prediction task. In Table 1, we compare and summarize typical CNNs (i.e., AlexNet [28], ZFNet [29], VGGNet [30], GoogLeNet [31], ResNet [32], and SENet [33]) in terms of network structure, computation speed, and classification performance, where the additional dropout layer is used to reduce the risk of overfitting [28], while the batch normalization strategy [32] can help diminish the dependence of gradients on the scale of the parameters or their initial values.
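The convolution, pooling, and fully connected stages described above can be traced in a small forward pass. The sketch below is a minimal NumPy illustration, not any of the networks in Table 1: the layer sizes, the 32 × 32 input, and the two output classes are arbitrary assumptions, the weights are random (untrained), and dropout and batch normalization are omitted for brevity.

```python
import numpy as np

def conv2d(x, kernels, bias):
    """Valid 2-D convolution (stride 1, no padding), written as cross-correlation
    as in most deep learning libraries.
    x: (C_in, H, W); kernels: (C_out, C_in, k, k); bias: (C_out,)."""
    c_out, _, k, _ = kernels.shape
    _, h, w = x.shape
    out = np.zeros((c_out, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            window = x[:, i:i + k, j:j + k]  # local receptive field, shape (C_in, k, k)
            out[:, i, j] = np.tensordot(kernels, window, axes=([1, 2, 3], [0, 1, 2])) + bias
    return out

def relu(x):
    """Element-wise nonlinearity applied after each convolution."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    c, h, w = x.shape
    h2, w2 = h // size, w // size
    x = x[:, :h2 * size, :w2 * size]
    return x.reshape(c, h2, size, w2, size).max(axis=(2, 4))

def fully_connected(x, weights, bias):
    """Combine all flattened features into class scores."""
    return weights @ x + bias

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

# Untrained, randomly initialised weights; all layer sizes are arbitrary choices.
rng = np.random.default_rng(0)
patch = rng.random((3, 32, 32))  # stand-in for one RGB image patch, channels first
f1 = max_pool(relu(conv2d(patch, rng.normal(0, 0.1, (8, 3, 3, 3)), np.zeros(8))))   # low-level features
f2 = max_pool(relu(conv2d(f1, rng.normal(0, 0.1, (16, 8, 3, 3)), np.zeros(16))))    # higher-level features
logits = fully_connected(f2.reshape(-1), rng.normal(0, 0.01, (2, f2.size)), np.zeros(2))
probs = softmax(logits)
print(f1.shape, f2.shape, probs)
```

Each conv/pool stage shrinks the spatial grid (32 → 30 → 15 → 13 → 6) while increasing the number of channels, which is the low-to-high-level progression the text describes; a real model would learn the kernels by backpropagation.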
A CNN takes raw images (or large patches) as input, avoiding a complex, separate feature extraction step, and the features it learns are largely robust to translation, scaling, tilting, and other forms of deformation. Because histopathology images are highly complex, deep learning algorithms are well suited to each step of pathological image analysis, including color normalization, histopathological image segmentation, and the diagnosis and prognosis of human cancers. We review them in the following sections.

Color Normalization
Color variations are common in WSIs owing to differences in raw materials and staining protocols across pathology labs, inter-patient variability, and slide scanner variations. Such color variation can degrade the generalization performance of deep learning models. Color normalization of WSIs is thus an important preprocessing task for digital pathology analysis [34]. Herein, we discuss literature on the use of deep learning-based methods for color normalization in histopathological images (Figure 3) [35][36][37][38][39][40][41].
In general, traditional color normalization methods (e.g., color matching and stain separation [42][43][44]) mainly rely on a predefined template image and cannot perform style transformation between different image datasets. In principle, this style transformation can be achieved by deep learning-based methods owing to their more expressive network structures [39,40,45]. For instance, Patli et al. [40] proposed a lightweight, self-supervised neural network that estimates the color shift from a source stain appearance to a predetermined target stain appearance. Bug et al. [45] used a pre-trained deep neural network as a feature extractor steering a pixel-wise normalization pipeline, which achieved excellent normalization results while ensuring a consistent representation of color and texture. Janowczyk et al. [41] presented a stain normalization algorithm based on a sparse autoencoder (StaNoSa) to standardize the color distribution of input images, and StaNoSa performed comparably to or better than competing methods.
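To make the template dependence of traditional color matching concrete, the sketch below matches the per-channel mean and standard deviation of a source tile to those of a template tile. It is a simplified variant of Reinhard-style statistics matching, offered as an assumption-laden illustration rather than the method of [42][43][44]: Reinhard's transfer works in a perceptual color space, whereas this sketch operates directly on RGB, and the "template" and "source" tiles are hypothetical random data.

```python
import numpy as np

def match_color_stats(source, template, eps=1e-8):
    """Template-based color matching: shift and scale every channel of `source`
    so that its mean and standard deviation equal those of `template`.
    Both inputs are float arrays of shape (H, W, C)."""
    src_mean, src_std = source.mean(axis=(0, 1)), source.std(axis=(0, 1))
    tpl_mean, tpl_std = template.mean(axis=(0, 1)), template.std(axis=(0, 1))
    # Standardize the source per channel, then re-express it with template statistics.
    return (source - src_mean) / (src_std + eps) * tpl_std + tpl_mean

# Hypothetical data: a "template" tile and a differently stained "source" tile (RGB, 0-255 scale).
rng = np.random.default_rng(0)
template = rng.normal([180.0, 120.0, 170.0], [20.0, 25.0, 15.0], size=(64, 64, 3))
source = rng.normal([150.0, 90.0, 200.0], [35.0, 15.0, 30.0], size=(64, 64, 3))
normalized = match_color_stats(source, template)
print(normalized.mean(axis=(0, 1)), template.mean(axis=(0, 1)))
```

The output is fully determined by whichever template is chosen, so normalizing a dataset toward a different template changes every result; this is the limitation that the learning-based methods above aim to remove.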
More recently, with the rapid development of deep learning, generative adversarial networks (GANs) [36] have also been widely used to normalize patches without the guidance of template images while still preserving the structure of the tissues. For example, BenTaieb et al. [37] designed a discriminative image analysis model equipped with a GAN component that transfers stains across datasets. However, its performance was largely determined by auxiliary tasks requiring extra labeling effort. To reduce the labeling effort for experts, Zanjani et al. [46] proposed an unsupervised generative model that was trained end-to-end and could be applied directly to unseen images. Inspired by CycleGAN [47], which has been successfully applied to image style transfer, Shaban et al. [35] proposed a framework named StainGAN, which achieved better qualitative performance in normalizing different images (Figure 4). In addition, other works [38,39] also considered the structural integrity of histopathological images and integrated semantic information at different layers between a pre-trained semantic network and the stain color normalization network to further improve normalization performance.
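The cycle-consistency idea that StainGAN inherits from CycleGAN can be illustrated without training any network: translating a patch from stain style A to style B and back should reproduce the original. The sketch below computes only the L1 cycle-consistency term; a full CycleGAN also trains two discriminators with adversarial losses, which are omitted. The per-channel affine "generators" (`g_a_to_b`, `f_b_to_a_exact`, `f_b_to_a_wrong`) and the random patches are hypothetical stand-ins for trained networks and real tiles.

```python
import numpy as np

def cycle_consistency_loss(G, F, x_batch, y_batch):
    """L1 cycle loss: |F(G(x)) - x| + |G(F(y)) - y|, each averaged over all values."""
    forward_cycle = np.abs(F(G(x_batch)) - x_batch).mean()   # A -> B -> A
    backward_cycle = np.abs(G(F(y_batch)) - y_batch).mean()  # B -> A -> B
    return forward_cycle + backward_cycle

# Hypothetical toy "generators": fixed per-channel affine color transforms
# standing in for trained networks.
SCALE = np.array([0.9, 0.8, 1.1])
SHIFT = np.array([0.05, -0.02, 0.03])

def g_a_to_b(x):
    return x * SCALE + SHIFT

def f_b_to_a_exact(y):
    return (y - SHIFT) / SCALE  # exact inverse of g_a_to_b

def f_b_to_a_wrong(y):
    return y - SHIFT  # forgets to undo the scaling

rng = np.random.default_rng(0)
x = rng.random((4, 8, 8, 3))  # batch of stain-style-A patches (N, H, W, C)
y = rng.random((4, 8, 8, 3))  # batch of stain-style-B patches
loss_consistent = cycle_consistency_loss(g_a_to_b, f_b_to_a_exact, x, y)
loss_inconsistent = cycle_consistency_loss(g_a_to_b, f_b_to_a_wrong, x, y)
print(loss_consistent, loss_inconsistent)
```

Because the loss only compares each patch with its own round trip, it needs no paired images from the two labs, which is what allows this family of models to learn stain transfer without template images.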

Pathology Image Segmentation
The segmentation task, which aims to assign a class label to each pixel of an image, is a common task in pathology image analysis [48]. Segmentation of histopathological images can be divided into two categories: nuclei segmentation and tissue segmentation. Nuclei segmentation enables the extraction of nuclear features, such as morphological appearance, which are among the most frequently used biomarkers for histological cancer diagnosis. Tissue segmentation, on the other hand, takes the histopathology image as input and segments tissues, that is, groups of cells with certain characteristics and structures (e.g., glands and tumor-infiltrating lymphocytes). Quantitative measurements of these tissues are also crucial indicators for the diagnosis and prognosis of human cancers [49,50].
Owing to the heterogeneous patterns in WSIs, accurate segmentation of nuclei and tissues in histopathological images is highly challenging. First, nuclei and tissues vary in size and shape, which requires a segmentation model with strong generalization ability. Second, nuclei/cells are often clustered into clumps in which they partially overlap or touch one another, which can lead to under-segmentation. Third, in some malignant cases, such as moderately and poorly differentiated adenocarcinomas, the structure of the tissues (such as the glands) is heavily degenerated, making them difficult to discriminate [51,52].
In view of these challenges, numerous deep learning-based approaches have been proposed to extract high-level features from WSIs and improve segmentation performance. Here, we first review deep learning-based nuclei segmentation algorithms. Then, we summarize the development of deep learning algorithms for tissue-level segmentation tasks. An overview of papers using deep learning for nuclei/tissue segmentation is shown in Figure 5.


Nuclei-Level Segmentation
Cellular object segmentation is a prerequisite step for the assessment of human cancers [65]. For example, mitosis counting, one of the most important prognostic factors in breast cancer, requires nuclei segmentation [66]. In the diagnosis of cervical cytology, nuclei segmentation is necessary to detect all types of cytological abnormalities [67]. Traditional nuclei segmentation algorithms are based on morphological processing methods [8], clustering algorithms [68], level set methods [69], and their variants [70][71][72], whose performance is largely determined by hand-designed features that require expert domain knowledge. Recently, deep learning approaches, which require no hand-crafted feature design, have been widely applied [73].
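As an illustration of the clustering family of traditional methods, the sketch below runs two-cluster k-means on pixel intensities and keeps the darker cluster as nuclei, relying on hematoxylin-stained nuclei appearing darker than their surroundings. This is a deliberately simplified, hypothetical example: it is not the algorithm of [68], it uses intensity alone with no spatial or color features, and the tile with three circular "nuclei" is synthetic.

```python
import numpy as np

def kmeans_1d(values, k=2, iters=20):
    """Lloyd's k-means on scalar intensities; centers start at evenly spaced quantiles."""
    centers = np.quantile(values, np.linspace(0, 1, k))
    for _ in range(iters):
        # Assign each pixel to its nearest center, then move centers to cluster means.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = values[labels == c].mean()
    return labels, centers

def segment_nuclei(gray):
    """Cluster pixels into two groups and return the darker one as the nuclei mask."""
    labels, centers = kmeans_1d(gray.ravel().astype(float), k=2)
    return (labels == np.argmin(centers)).reshape(gray.shape)

# Hypothetical synthetic tile: bright background with three dark circular "nuclei".
rng = np.random.default_rng(0)
yy, xx = np.mgrid[:64, :64]
truth = np.zeros((64, 64), dtype=bool)
for cy, cx, r in [(16, 16, 6), (40, 20, 8), (30, 48, 7)]:
    truth |= (yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2
gray = np.where(truth, 60.0, 200.0) + rng.normal(0, 10, (64, 64))
mask = segment_nuclei(gray)
accuracy = (mask == truth).mean()
print(f"pixel accuracy: {accuracy:.3f}")
```

On real slides, uneven staining and touching nuclei break the simple "dark equals nucleus" assumption, which is why such methods depend heavily on expert-designed features and post-processing.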
Generally, deep learning-based nuclei segmentation algorithms can be divided into two categories: pixel-wise classification methods [64,[74][75][76] and fully convolutional network (FCN)-based methods [60,61,77]. Pixel-wise classification methods convert the segmentation task into a classification task, in which the label of each pixel is predicted from the raw pixel values in a square window centered on it [74]. For example, Cireşan et al. [64] first densely sampled square windows from the WSI and then classified each center pixel using the rich contextual information within its window. Moreover, Zhou et al. [63] learned a bank of convolutional filters and a sparse linear regressor to produce, for each pixel, the likelihood of belonging to a nuclear or background region. Since windows of different sizes provide complementary information for nuclei segmentation, a method combining a multiscale convolutional network with graph partitioning [62] was proposed for the task. In addition, Xing et al. [78] first trained a CNN to generate a probability map for each image, in which each pixel is assigned a probability of belonging to a nucleus, and then used an iterative region merging algorithm to accomplish the segmentation. Nesma et al. [79] also presented an optimized pixel-based classification model combined with a region growing strategy, which successfully segmented both nuclei and cytoplasm. Additionally, Liu et al. [75] proposed a panoptic segmentation model that combines an auxiliary semantic segmentation branch with the instance branch to integrate global and local features for nuclei segmentation.
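The pixel-wise classification scheme can be sketched by extracting one square window centered on every pixel and passing each window to a classifier. In the sketch below, `toy_classifier` is a hypothetical stand-in for a trained CNN (it simply labels a window as nucleus when its mean intensity is dark), and the 7 × 7 window size and reflect padding at the border are assumptions; the point is the structure of the method, one classifier call per pixel.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def extract_center_windows(image, window=7):
    """Return one (window x window) patch centered on every pixel of a 2-D image.
    The border is reflect-padded so that edge pixels also get full windows."""
    half = window // 2
    padded = np.pad(image, half, mode="reflect")
    return sliding_window_view(padded, (window, window))  # shape (H, W, window, window)

def pixelwise_segment(image, classify_window, window=7):
    """Label every pixel by classifying the window centered on it."""
    h, w = image.shape
    windows = extract_center_windows(image, window).reshape(h * w, window, window)
    labels = np.array([classify_window(p) for p in windows])
    return labels.reshape(h, w)

calls = []

def toy_classifier(patch):
    """Hypothetical stand-in for a trained CNN: nucleus (1) if the window is mostly dark."""
    calls.append(1)
    return int(patch.mean() < 128)

image = np.full((32, 32), 200.0)
image[10:20, 10:20] = 50.0  # one dark square "nucleus"
labels = pixelwise_segment(image, toy_classifier)
print(labels.sum(), "nucleus pixels;", len(calls), "classifier calls")
```

Even this 32 × 32 tile needs 1,024 classifier calls over heavily overlapping windows, which illustrates why the approach becomes slow on gigapixel WSIs.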
Although the above pixel-wise classification methods have shown more promising performance than traditional segmentation algorithms, they have obvious limitations. First, they are slow, because densely sampled, overlapping patches greatly increase the computational burden of neural network training [80]. Second, the extracted patches cannot fully capture the rich contextual information within the whole input image. Accordingly, a more elegant architecture called the "fully convolutional network" was proposed [81]. An FCN takes the full image, rather than densely extracted patches, as input, which yields more accurate and efficient nuclei segmentation. In addition to FCN, U-Net is another powerful nuclei segmentation tool [82]. In comparison with FCN, U-Net uses skip connections between its downsampling and upsampling paths, which can stabilize gradient updates during deep model training. Based on the U-Net structure, Zhao et al. [61] proposed a Triple U-Net architecture for nuclei segmentation that requires no color normalization and achieved state-of-the-art nuclei segmentation performance (Figure 6). To split touching nuclei that are hard to segment, Yang et al. [60] used a hybrid network consisting of U-Net and region proposal networks, followed by a watershed step to separate the touching nuclei into individual ones. Amirreza et al. [59] proposed a two-stage U-Net-based model for touching cell segmentation, in which the first stage uses a U-Net to separate nuclei from the background and the second stage applies a U-Net to regress the distance map of each nucleus for the final touching cell segmentation. To explicitly mimic how human pathologists combine multi-scale information, Schmitz et al. [77] introduced a family of multi-encoder FCNs with deep fusion for nuclei segmentation.
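Why regressing a distance map helps with touching nuclei can be seen on a toy mask. The sketch below computes, by brute force, the Euclidean distance from every foreground pixel to the nearest background pixel for two overlapping disks that form a single connected blob. It illustrates the general kind of regression target described for the second stage, under the assumption of a plain Euclidean distance transform; the exact target definition in [59] may differ, and the brute-force loop is only suitable for small illustrative masks.

```python
import numpy as np

def distance_map(mask):
    """Euclidean distance from each foreground pixel to the nearest background
    pixel (0 on background). Brute force: fine for small illustrative masks only."""
    fg = np.argwhere(mask)   # (N_fg, 2) row/col coordinates
    bg = np.argwhere(~mask)  # (N_bg, 2)
    d = np.sqrt(((fg[:, None, :] - bg[None, :, :]) ** 2).sum(axis=2)).min(axis=1)
    dist = np.zeros(mask.shape)
    dist[fg[:, 0], fg[:, 1]] = d
    return dist

# Two hypothetical nuclei of radius 6 whose centers are 10 pixels apart, so they touch.
yy, xx = np.mgrid[:30, :34]
nucleus_a = (yy - 15) ** 2 + (xx - 12) ** 2 <= 36
nucleus_b = (yy - 15) ** 2 + (xx - 22) ** 2 <= 36
touching = nucleus_a | nucleus_b  # a binary mask alone sees one merged blob
dmap = distance_map(touching)
print("centers:", dmap[15, 12], dmap[15, 22], "neck:", dmap[15, 17])
```

Although the binary mask is a single blob, the distance map has two peaks (one near each center) separated by a lower "neck"; seeding a watershed or similar splitting step from those peaks is the usual way to separate the nuclei.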
Other U-Net-based studies [51,58] proposed deep contour-aware networks that integrate multilevel contextual features to accurately detect and segment nuclei in histopathological images, further improving segmentation performance.
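The core idea of contour-aware training, supervising object regions and their boundaries as two separate outputs, can be sketched as follows. This is a hypothetical, simplified formulation: boundary labels are derived as foreground pixels with at least one 4-connected background neighbor, and the loss is region binary cross-entropy plus a weighted contour binary cross-entropy. The networks in [51,58] may define contours, weights, and class balancing differently.

```python
import numpy as np

def contour_target(mask):
    """Foreground pixels with at least one 4-connected background neighbor
    (pixels outside the image count as background)."""
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &   # up and down neighbors
                padded[1:-1, :-2] & padded[1:-1, 2:])    # left and right neighbors
    return mask & ~interior

def bce(prob, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    prob = np.clip(prob, eps, 1 - eps)
    return -np.mean(target * np.log(prob) + (1 - target) * np.log(1 - prob))

def contour_aware_loss(mask_prob, contour_prob, mask, weight=1.0):
    """Hypothetical multi-task loss: region BCE + weight * contour BCE."""
    contour = contour_target(mask)
    return bce(mask_prob, mask.astype(float)) + weight * bce(contour_prob, contour.astype(float))

mask = np.zeros((10, 10), dtype=bool)
mask[2:8, 2:8] = True  # one 6 x 6 object
contour = contour_target(mask)
loss_perfect = contour_aware_loss(mask.astype(float), contour.astype(float), mask)
loss_uninformed = contour_aware_loss(np.full((10, 10), 0.5), np.full((10, 10), 0.5), mask)
print(contour.sum(), loss_perfect, loss_uninformed)
```

At inference time, subtracting the predicted contours from the predicted regions is one way such a two-output design can separate adjacent objects that a single region map would merge.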

Tissue-Level Segmentation
Besides nuclei segmentation, computerized segmentation of specific tissues in histopathological images is another core operation for studying tumor biology. For instance, segmenting tumor-infiltrating lymphocytes and characterizing their spatial correlation in WSIs have become crucial for diagnosis, prognosis, and treatment response prediction in different cancers [83]. Moreover, gland segmentation is a prerequisite for quantitatively measuring glandular formation, which is an important indicator of the degree of differentiation [84,85].
The automatic segmentation of tissues in histology images has been explored by many studies [86,87]. Traditional tissue segmentation methods usually relied on handcrafted feature extraction and conventional classifiers [88]. Recently, deep learning has become popular in computer vision and image-processing tasks owing to its outstanding performance, and several studies have applied deep learning to segment different types of tissue from WSIs [56,89,90]. Among existing deep learning segmentation algorithms, U-Net-based networks remain the most widely used. For example, Saltz et al. [57] applied a U-Net to map tumor-infiltrating lymphocytes on H&E images from 13 TCGA (The Cancer Genome Atlas) tumor types. Based on U-Net, Raza et al. [56] presented a minimal information loss dilated network for gland instance segmentation in colon histology images. Chen et al. [89] presented a deep contour-aware network that adds an explicit contour loss to the training objective and achieved the best performance in the 2015 MICCAI Gland Segmentation (GlaS) on-site challenge. Lu et al. [55] proposed BrcaSeg, a WSI processing pipeline that uses deep learning to automatically segment and quantify epithelial and stromal tissues in breast cancer WSIs from TCGA. Beyond the U-Net structure, Zhao [91] proposed SCAU-Net, a deep neural network with spatial and channel attention for gland segmentation; SCAU-Net effectively captures the nonlinear relationship between spatial-wise and channel-wise features and achieved state-of-the-art gland segmentation performance. Moreover, building on the DeepLabV3 model, Musulin [90] developed an enhanced histopathology analysis tool that accurately segments epithelial and stromal tissue in oral squamous cell carcinoma. Because gland boundaries are difficult to discriminate, Yan et al. [92] proposed a shape-aware adversarial deep learning framework that better tolerates boundary uncertainty and detects boundaries more effectively. In addition, because the fixed encoder-decoder structure of U-Net is not well suited to texture-rich WSIs, Wen et al. [93] used a Gabor-based module to extract texture information at different scales and orientations for tissue segmentation. Van Rijthoven et al. [94] proposed HookNet, a semantic segmentation model that combines context information in WSIs via multiple encoder-decoder CNN branches.
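The Gabor-based module of [93] is not reproduced here. As a hedged illustration of the underlying idea, the NumPy sketch below builds a fixed bank of Gabor filters at several scales and orientations and applies it to a grayscale image; the kernel size, scales, wavelength rule, and function names are all assumptions.

```python
import numpy as np


def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor kernel (size must be odd)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the sinusoid oscillates along orientation theta.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * x_rot / wavelength + psi)


def gabor_bank(size=15, scales=(2.0, 4.0), n_orientations=4):
    """Kernels for every (scale, orientation) pair, shape (n, size, size)."""
    kernels = [
        # Tying the wavelength to the scale is an illustrative assumption.
        gabor_kernel(size, sigma, k * np.pi / n_orientations, wavelength=2 * sigma)
        for sigma in scales
        for k in range(n_orientations)
    ]
    return np.stack(kernels)


def gabor_responses(gray, bank):
    """Valid-mode filter responses of a 2-D image, shape (n, H-k+1, W-k+1).

    Computes correlation; with psi = 0 the kernels are point-symmetric,
    so this equals convolution."""
    k = bank.shape[-1]
    windows = np.lib.stride_tricks.sliding_window_view(gray, (k, k))
    return np.einsum("hwij,nij->nhw", windows, bank)


bank = gabor_bank()
img = np.random.default_rng(0).random((32, 32))
responses = gabor_responses(img, bank)
```

Each response channel highlights texture at one scale and orientation; in a learned module, such responses would be fed to, or fused with, the segmentation network's features.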
Although much progress has been achieved, the strong performance of these deep neural network-based methods depends mainly on a large number of training images with pixel-wise annotations, which are difficult to obtain because labeling requires tremendous effort from experts. To reduce the overall labeling cost, several weakly supervised tissue segmentation algorithms have also been proposed [53,95,96]. For instance, Mahapatra [95] proposed a deep active learning framework that actively selects valuable samples from unlabeled data for annotation, which significantly reduced the annotation effort while still achieving comparable gland segmentation performance. Lai et al. [96] proposed a semi-supervised active learning framework with a region-based selection criterion, which iteratively selects regions for annotation queries to quickly expand the diversity and volume of the labeled set. Xie et al. [54] proposed a pairwise relation-based semi-supervised model for gland segmentation in histology images, which considerably improved learning accuracy using limited labeled images and large amounts of unlabeled images. In addition, [53] proposed a multiscale conditional GAN for epithelial region segmentation that can compensate for the lack of labeled data in the training dataset. Moreover, Gupta et al. [97] introduced the idea of 'image enrichment', in which a GAN is used to increase the information content of images in order to enhance segmentation accuracy.
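Predictive uncertainty is one common way to decide which unlabeled samples an active learning loop should send for annotation. The sketch below ranks candidate regions by the mean per-pixel entropy of a model's softmax output; it is a generic uncertainty-sampling illustration with hypothetical function names, not the specific selection criteria of [95] or [96].

```python
import numpy as np


def region_uncertainty(probs: np.ndarray) -> float:
    """Mean per-pixel predictive entropy of one region.

    probs: softmax outputs of shape (n_classes, H, W).
    """
    eps = 1e-12  # avoid log(0)
    entropy = -(probs * np.log(probs + eps)).sum(axis=0)
    return float(entropy.mean())


def select_regions_for_annotation(region_probs, budget: int):
    """Indices of the `budget` most uncertain unlabeled regions, most uncertain first."""
    scores = np.array([region_uncertainty(p) for p in region_probs])
    return [int(i) for i in np.argsort(scores)[::-1][:budget]]


# Three candidate regions with two-class predictions of varying confidence.
confident = np.stack([np.full((4, 4), 0.99), np.full((4, 4), 0.01)])
unsure = np.full((2, 4, 4), 0.5)
medium = np.stack([np.full((4, 4), 0.8), np.full((4, 4), 0.2)])
picked = select_regions_for_annotation([confident, unsure, medium], budget=2)
```

After the selected regions are labeled, the model is retrained and the ranking repeated, so annotation effort concentrates where the model is least sure.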

Cancer Diagnosis and Prognosis
Cancer is an aggressive disease with a low median survival rate. Moreover, because of its high recurrence and mortality rates, treatment is long and very costly. Accurate early diagnosis and prognosis prediction are therefore essential to improve patient survival [98,99]. Histopathological images are widely regarded as the gold standard for the diagnosis and prognosis of human cancers [100]. Previous studies on histopathology image classification and prediction mainly focused on manual feature design. For instance, Cheng et al. [101] extracted a 150-dimensional handcrafted feature vector to describe each WSI and then used traditional classifiers to distinguish different types of renal cell carcinoma. Yu et al. [102] extracted 9879 quantitative features from each image tile and used regularized machine learning methods to select the top features and to distinguish shorter-term from longer-term survivors with adenocarcinoma or squamous cell carcinoma. Recently, with the success of deep learning in various computer vision tasks, training end-to-end deep learning models for histopathological image analysis without manual feature extraction has drawn much attention [103][104][105].
Generally, the main challenge in applying deep learning to WSI classification and prediction is the large size of WSIs (e.g., 100,000 × 100,000 pixels), which makes it infeasible to feed them directly into a deep neural network for model training [106,107]. To address this challenge, two main lines of approaches have been developed: patch-based and WSI-based methods (summarized in Figure 7).
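The scale problem is easy to quantify. The short Python sketch below, assuming an uncompressed 8-bit RGB image at full resolution and non-overlapping 256 × 256 tiles (a common but arbitrary choice), computes the memory footprint of a 100,000 × 100,000 pixel WSI and the tile grid a patch-based pipeline would iterate over. The helper names are hypothetical.

```python
from typing import List, Optional, Tuple


def wsi_memory_bytes(width: int, height: int, channels: int = 3) -> int:
    """Uncompressed size of an 8-bit image: one byte per channel per pixel."""
    return width * height * channels


def n_tiles(width: int, height: int, tile: int, stride: Optional[int] = None) -> int:
    """Number of full tile x tile windows that fit with the given stride."""
    stride = stride or tile
    return len(range(0, width - tile + 1, stride)) * len(range(0, height - tile + 1, stride))


def tile_grid(width: int, height: int, tile: int,
              stride: Optional[int] = None) -> List[Tuple[int, int]]:
    """Top-left (x, y) coordinates of those windows, row by row."""
    stride = stride or tile
    return [(x, y)
            for y in range(0, height - tile + 1, stride)
            for x in range(0, width - tile + 1, stride)]


size = wsi_memory_bytes(100_000, 100_000)   # 30,000,000,000 bytes
count = n_tiles(100_000, 100_000, 256)      # 390 * 390 = 152,100 tiles
```

Roughly 30 GB (about 27.9 GiB) per slide, before any network activations, versus about 196 KB for a single 256 × 256 RGB tile, is why WSIs are processed tile by tile.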

Patch-Level Methods
Because of the large size of WSIs, patch-based methods require a pathologist to select representative regions of interest (ROIs) from the WSI; the selected regions are then split into much smaller patches for deep model training [108,109,117]. For instance, Zhu et al. [108] developed a deep CNN for survival analysis (DeepConvSurv) trained on pathological patches derived from WSIs and demonstrated that this end-to-end learning algorithm outperformed the standard Cox proportional hazards model. Cheng et al. [109] applied a deep autoencoder to cluster the extracted patches into groups and then learned topological features from the clusters to characterize the distributions of different cell types for survival prediction.
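Deep survival models in the spirit of DeepConvSurv replace the linear predictor of the Cox model with a network output and are typically trained by minimizing the negative Cox partial log-likelihood. The NumPy sketch below computes that loss for a batch of predicted risk scores. It is a forward computation only (a deep learning framework would supply the gradients), and the function name is hypothetical.

```python
import numpy as np


def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk:  predicted log-risk score per patient (e.g. a network output)
    time:  follow-up time per patient
    event: 1 if death/event observed, 0 if censored
    Tied times use the Breslow approximation (all tied patients share a risk set).
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    if not event.any():
        return 0.0  # no observed events, no partial-likelihood terms
    # at_risk[i, j] is True when patient j is still at risk at time[i].
    at_risk = time[None, :] >= time[:, None]
    masked = np.where(at_risk, risk[None, :], -np.inf)
    # Stable log-sum-exp over each risk set (row i always contains j = i).
    row_max = masked.max(axis=1)
    log_denominator = row_max + np.log(np.exp(masked - row_max[:, None]).sum(axis=1))
    return float(-(risk - log_denominator)[event].mean())


times, events = [1.0, 2.0, 3.0], [1, 1, 1]
loss_good = cox_neg_log_partial_likelihood([2, 1, 0], times, events)
loss_bad = cox_neg_log_partial_likelihood([0, 1, 2], times, events)
```

Ranking patients who die earlier as higher risk (loss_good) yields a lower loss than the reverse ranking (loss_bad), which is exactly the ordering signal the network learns from.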
Because training a model from scratch requires a very large dataset and a long training time, some patch-based methods adopted transfer learning (TL) to speed up training and improve classification performance. TL offers an effective way to build accurate customized models quickly by transferring and fine-tuning knowledge learned by models pre-trained on large datasets. For instance, Xu et al. [117] exploited CNN activation features to obtain region-level classification results. Specifically, they first over-segmented each preselected region into a set of overlapping patches, then extracted features with a CNN pre-trained on ImageNet, and finally adopted an SVM classifier for classification. Similarly, Källén et al. [110] extracted features from the divided patches with the pre-trained OverFeat network and applied a random forest (RF) classifier to discriminate the subtypes of prostatic adenocarcinoma. Moreover, in [111], a pre-trained VGG-16 network was first applied to extract descriptors from the preselected patches; the feature representation of each WSI was then computed by average pooling of the feature representations of its patches.
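The WSI-level aggregation used in [111] can be sketched in a few lines of NumPy. In practice the patch features would come from a frozen, pre-trained CNN such as VGG-16; here they are random stand-ins, and the nearest-centroid classifier is a deliberately simple substitute for the SVM or RF classifiers used in [117] and [110]. All names are hypothetical.

```python
import numpy as np


def slide_descriptor(patch_features: np.ndarray) -> np.ndarray:
    """Average-pool (n_patches, dim) patch features into one (dim,) slide vector."""
    return patch_features.mean(axis=0)


class NearestCentroid:
    """Deliberately simple slide classifier, standing in for an SVM or RF."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Squared Euclidean distance from every slide to every class centroid.
        dist = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=-1)
        return self.classes_[dist.argmin(axis=1)]


# Random stand-ins for CNN patch features: 10 slides, 50 patches each, 8-dim.
rng = np.random.default_rng(0)
slides = [rng.normal(loc=3.0 * (i % 2), size=(50, 8)) for i in range(10)]
labels = np.array([i % 2 for i in range(10)])
X = np.stack([slide_descriptor(s) for s in slides])
model = NearestCentroid().fit(X, labels)
```

Average pooling discards which patches were informative, which is one motivation for the learned aggregation schemes discussed under WSI-level methods.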

WSI-Level Methods
Although much progress has been achieved, the abovementioned patch-level prediction methods still have several inherent drawbacks. First, patch-based methods require labor-intensive patch-level annotation, which increases the workload of pathologists [118]. Second, most existing patch-based methods assume that each randomly selected patch carries the same diagnosis or survival information as its parent WSI, neglecting the fact that WSIs usually contain highly heterogeneous patterns, so patch-level labels do not always match WSI-level labels [119].
In view of these challenges, building diagnosis/prognosis models that rely only on WSI-level annotation has been widely investigated [112,119,120]. Among WSI-based methods, the multiple instance learning (MIL) framework is a simple yet highly effective tool. For example, Shao et al. [112] accounted for the ordinal characteristic of the survival process by adding a ranking-based regularization term to the Cox model, and used average pooling to aggregate instance-level results into WSI-level predictions (Figure 8). Similarly, Iizuka et al. [120] first trained a CNN using millions of tiles extracted from WSIs, and then adopted a max-pooling strategy combined with a recurrent neural network to fuse the patch-level results into WSI-level predictions. However, because simple decision-fusion approaches (e.g., average pooling and max pooling) are not robust enough to make correct WSI-level predictions, Yao et al. [113] proposed an attention-guided deep multiple instance learning network (DeepAttnMISL) for survival prediction from WSIs. Compared with traditional pooling strategies, attention-based aggregation is more flexible and adaptive for survival prediction. In addition, Chikontwe et al. [114] presented a MIL framework for histopathology slide classification that supports both instance- and bag-level learning, with a center loss that minimizes intra-class distances in the embedding space; their experiments suggested that it achieved overall improved performance over recent state-of-the-art methods. Moreover, Wang et al. [119] first extracted spatial contextual features from each patch and then computed a globally holistic region descriptor by aggregating the features of multiple representative instances for WSI-level classification.
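The difference between fixed pooling and attention-based aggregation can be made concrete. The NumPy sketch below implements a common attention-based MIL pooling (a tanh-projected score per patch, normalized with a softmax) next to average and max pooling. It is a forward pass with random parameters, not the full DeepAttnMISL model [113], and the names are hypothetical.

```python
import numpy as np


def attention_mil_pool(instances, V, w):
    """Attention pooling of patch embeddings into a single slide embedding.

    instances: (n, d) patch embeddings from one WSI (the "bag")
    V: (h, d) projection and w: (h,) scoring vector, learned in practice
    Returns the (d,) slide embedding and the (n,) attention weights.
    """
    scores = np.tanh(instances @ V.T) @ w   # one relevance score per patch
    scores = scores - scores.max()          # stabilize the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ instances, weights


rng = np.random.default_rng(0)
patches = rng.normal(size=(6, 4))        # 6 patches, 4-dim embeddings
V, w = rng.normal(size=(3, 4)), rng.normal(size=3)
slide_vec, attn = attention_mil_pool(patches, V, w)
mean_vec = patches.mean(axis=0)          # average pooling
max_vec = patches.max(axis=0)            # max pooling
```

With the scoring vector set to zero, every patch receives weight 1/n and the result equals average pooling; training V and w lets the model up-weight the most informative patches, and the weights themselves indicate which patches drove the slide-level prediction.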
Although CNN-based MIL frameworks have shown impressive performance in histopathology analysis, they cannot fully capture complex neighborhood information, because they extract interactions between objects only within the local areas determined by the convolutional kernel. Recently, some researchers have also applied graph convolutional networks (GCNs) to histopathological images for the diagnosis and prognosis of human cancers [115,121]. For instance, Chen et al. [115] presented a context-aware graph convolutional network that hierarchically aggregates instance-level histology features to model local- and global-level topological structures in the tumor microenvironment. Li et al. [121] modeled each WSI as a graph and developed a graph convolutional neural network with attention learning that improves survival prediction by learning optimal graph representations of WSIs. Moreover, the study in [122] presented a patch relevance-enhanced graph convolutional network (RGCN) that explicitly models the correlations between different patches in a WSI and can approximately localize the diagnosis-related regions. Extensive experiments on real lung and brain carcinoma WSIs demonstrated the effectiveness of these methods, as GCNs can better exploit and preserve neighboring relations than CNN-based models. Other researchers have explored the relationship between genomic data and images. Chen et al. [116] presented a multimodal co-attention transformer (MCAT) framework that learns an interpretable, dense co-attention mapping between WSI and genomic features in an embedding space.
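The graph view of a WSI can be illustrated with the standard symmetric-normalized graph convolution. In the NumPy sketch below, assuming patches are graph nodes connected to their k nearest neighbours in slide coordinates, each layer mixes a patch's features with those of its neighbours. It is a generic GCN forward pass with random weights, not the specific architectures of [115], [121], or [122], and the names are hypothetical.

```python
import numpy as np


def knn_adjacency(coords, k):
    """Undirected k-nearest-neighbour graph over patch center coordinates."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                  # no self-matches
    nearest = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros(dist.shape)
    adj[np.repeat(np.arange(len(coords)), k), nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)                   # symmetrize


def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(len(adj))                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)


# Four patches: two neighbouring pairs far apart on the slide.
coords = [(0, 0), (0, 1), (5, 5), (5, 6)]
rng = np.random.default_rng(0)
adj = knn_adjacency(coords, k=1)
node_features = gcn_layer(adj, rng.normal(size=(4, 3)), rng.normal(size=(3, 2)))
slide_embedding = node_features.mean(axis=0)        # simple graph readout
```

Stacking several such layers lets information propagate beyond immediate neighbours, which is how GCNs capture tissue-level spatial context that a single convolutional kernel cannot see.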

Open Resources
A collection of high-quality labeled datasets is an important prerequisite for deep model training. We list the benchmark datasets for different tasks in Table 2. Specifically, the NIA Lymphoma 2009, UCSB, CAMELYON16, and CAMELYON17 datasets are the most widely used for color normalization. For nuclei/tissue segmentation, the MoNuSeg 2018, TNBC 2018, GLAS 2015, and CRAG 2019 datasets provide the annotations needed for deep model training. The ACDC-LungHP 2019, CRCHisto 2016, and CoNSeP 2019 datasets collect WSIs and their corresponding diagnosis/prognosis information for patients with numerous cancers. As shown in Table 3, QuPath [123], PMA.start, Orbit [124], and CellProfiler [125] are open, powerful, flexible, and extensible software platforms for bioimage analysis that can carry out each step of pathological image analysis. OpenSlide [126] is a C library with Python bindings that provides a simple interface for reading WSIs, and ASAP, built on OpenSlide, is an open-source WSI viewer that focuses on fast and fluid image viewing and offers an easy-to-use interface for making annotations. In addition, ImageJ [127] is a widely used open-source image analysis program that can be extended with plug-ins implementing many image analysis algorithms. One such plug-in, SlideJ, seamlessly extends image analysis algorithms implemented in ImageJ for single microscopic field images to WSI analysis. Finally, the Cytomine software [128] is an open-source web platform that fosters collaborative analysis of very large images and allows semi-automatic processing of large image collections via machine learning algorithms.
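As an example of how these tools are used, the sketch below reads tissue-containing tiles from a slide through OpenSlide's Python bindings (the openslide-python package, which provides OpenSlide, level_dimensions, level_downsamples, and read_region). The chroma-based tissue test and its thresholds are hypothetical heuristics chosen for illustration, as are the function names; openslide is imported only inside the reading function, so the tissue test runs without it.

```python
import numpy as np


def tissue_fraction(rgb: np.ndarray, min_chroma: int = 20) -> float:
    """Fraction of pixels that look like stained tissue rather than bare glass.

    Hypothetical heuristic: H&E-stained tissue is coloured, so a pixel counts
    as tissue when max(R, G, B) - min(R, G, B) exceeds min_chroma.
    """
    rgb = rgb[..., :3].astype(np.int16)
    chroma = rgb.max(axis=-1) - rgb.min(axis=-1)
    return float((chroma > min_chroma).mean())


def iter_tissue_tiles(path, tile=256, level=0, min_fraction=0.5):
    """Yield (x0, y0, rgb_array) for tiles that are mostly tissue.

    Requires openslide-python; x0 and y0 are level-0 coordinates,
    which is the reference frame read_region expects for its location.
    """
    import openslide  # lazy import: only needed when reading real slides

    slide = openslide.OpenSlide(path)
    try:
        width, height = slide.level_dimensions[level]
        scale = slide.level_downsamples[level]
        for y in range(0, height - tile + 1, tile):
            for x in range(0, width - tile + 1, tile):
                x0, y0 = int(x * scale), int(y * scale)
                region = slide.read_region((x0, y0), level, (tile, tile))
                rgb = np.asarray(region.convert("RGB"))  # drop alpha channel
                if tissue_fraction(rgb) >= min_fraction:
                    yield x0, y0, rgb
    finally:
        slide.close()
```

Filtering background tiles before feature extraction typically removes a large share of a slide's area, since much of a WSI is empty glass.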

Future Work
We primarily reviewed recently developed deep learning algorithms for the analysis of histopathological images. Although tremendous efforts have been made, several issues should be addressed in future studies. First, most color normalization algorithms are designed to match H&E-stained images derived from different sources. However, transforming H&E-stained images into immunohistochemistry (IHC)-stained images remains challenging because of the large variance between them. Normalization methods that match images across different stains, which can facilitate chromatic distinction among different tissue constituents, need further study [129]. Second, although deep learning algorithms have shown advantages for segmenting nuclei and specific tissues in histopathological images, generating an adequate volume of high-quality labels still requires tremendous annotation effort from pathologists. While existing weakly supervised learning algorithms, such as active learning and semi-supervised learning methods, can reduce the annotation workload of pathologists to some extent, a scalable crowdsourcing approach [130] is still needed that benefits from the participation of non-pathologists to reduce pathologist effort and enables minimal-effort collection of segmentation boundaries. Third, most WSI-level diagnosis and prognosis models operate as black boxes, so it is difficult to understand which parts of a WSI most affect the final prediction. To make these models more explainable, it is desirable to design deep learning models that can identify the discriminative patches in a WSI that drive the clinical prediction. Finally, imaging genomics [131], an emerging research field, has also created new opportunities for the diagnosis and prognosis of human cancers. How to effectively combine imaging and genomic data [132] to better understand prognostic and, hopefully, therapeutic aspects of various human cancers is another promising direction for future research.

Conclusions
We have reviewed advanced deep learning algorithms for the computational analysis of H&E-stained histopathological images. We presented recent findings on state-of-the-art deep learning techniques for different H&E-stained pathological image analysis tasks, such as color normalization, nuclei/tissue segmentation, and the diagnosis and prognosis of human cancers. We also provided online resources for digital H&E-stained pathology image analysis. Deep learning is a powerful tool that can provide reliable support for diagnostic assessment and treatment decisions. Last but not least, we outlined open research problems for future studies, including removing the stain variation between H&E- and IHC-stained images, reducing the human annotation effort for tissue/nuclei segmentation, designing explainable deep neural networks that identify discriminative and meaningful patches in the image, and integrating histopathological images with genomic data for clinical outcome prediction.

Conflicts of Interest:
The authors declare no conflict of interest.