However, only a few of these works focus on computer-assisted estimation, although such methods hold the promise of being more objective and less error-prone, especially when machine-learning models are used.
Early computer-assisted approaches for tumor and stroma assessment were based on conventional machine learning, using so-called handcrafted texture features. For example, Linder et al. [
24] investigated various texture features such as local binary patterns and Gabor filters together with support vector machines (SVM) for the classification of image patches as either tumor epithelium or stroma. Image patches were extracted from digitized colorectal cancer tissue microarrays (TMAs) which had been immunostained with an epidermal growth factor receptor antibody. TSR values were not calculated but rather mentioned as a possible application. Similarly, Bianconi et al. [
25] applied handcrafted features in conjunction with an SVM as well as nearest-neighbor and naive Bayes classifiers to address the same two-class problem (tumor epithelium vs. stroma) on colorectal cancer TMAs, but they did not calculate TSR values either. Geessink et al. (2015) [
26] investigated the pixel-wise classification of tumor and stroma tissue in hematoxylin and eosin (H&E)-stained digitized colon sections. They also used explicit features, for example the local density of nucleus pixels, and trained a normal-density-based quadratic classifier, which they evaluated against manual pixel-wise annotations. Since they only distinguished between tumor and stroma, smaller necrotic areas were counted as part of the tumor. In subsequent work, Geessink et al. (2019) [
8] applied a convolutional neural network (CNN) for tissue classification and distinguished nine different tissue classes. The computer-derived TSR values were compared to TSR values estimated by two pathologists for 129 patients, which was followed by a survival analysis. Zhao et al. [
9] also applied a CNN that was trained on image patches of nine different tissue classes. In contrast to many other approaches, which calculate the TSR only within a region of interest (ROI), they determined the TSR based on the complete WSI as a ratio between the classified stroma and tumor areas. In addition to evaluating the prognostic significance of the TSR regarding the survival rate, they also performed a TSR consistency analysis on 126 images with manual annotations of stroma and tumor tissues. However, for this consistency analysis, they hand-selected ROIs that only comprised tumor and stroma tissues. Millar et al. [
22] used the QuPath PixelClassifier v0.2.1, but only segmented the images into three classes: tumor epithelium, stroma and background (including fatty tissue). They calculated the TSR for breast cancer TMAs stained with H&E and investigated its prognostic significance. The segmentation results were not quantitatively evaluated, but the authors indicated that segmentation required supervision by a pathologist and reported this as a limitation of their study. They speculated that a deep learning approach might improve segmentation. Hacking et al. [
27] used QuPath superpixel image segmentation (SIS) together with an artificial neural network classifier to segment tumor regions into tumor epithelium, collagenous stroma and myxoid stroma. They also did not quantitatively evaluate their segmentation results but focused on the prognostic value of the myxoid stroma ratio. Hong et al. [
28] only considered three classes: background (non-tissue), stroma and tumor. They generated a binary tissue mask with a fixed threshold after transforming the H&E image into grayscale. The distinctive aspect of their work is the conversion of the grayscale H&E image into a virtual cytokeratin-stained image using a conditional generative adversarial network (GAN). Afterwards, they binarized the cytokeratin image by thresholding its chromogen (diaminobenzidine, DAB) channel. Based on these two binary masks, they calculated the tumor and stroma areas. Abbet et al. [
12] reported a fully automated TSR estimation on WSIs and performed survival analysis on 221 WSIs of colorectal cancer patients. Their TSR scoring followed the recommendation of Pelt et al. [
7]. In a first step, a tissue classification is performed on the full WSI by applying a model trained using self-supervision and unsupervised domain adaptation [
29] to detect tumor and tumor-adjacent stroma tissue. As in the other CNN-based classification methods mentioned above, a class label is assigned to every image patch. The resulting checkerboard-like segmentation is then smoothed by applying conditional random fields. In a second step, the ROI in which the TSR is to be determined is identified automatically. Finally, the TSR is calculated within this ROI. In addition, the TSR for the complete WSI is calculated; both TSR values were shown to be of prognostic relevance in the survival analysis. Smit et al. (2023) [
14] investigated the feasibility of semi-automated and fully automated TSR scoring. They used the same procedure as Geessink et al. (2019) for tissue segmentation. For the semi-automated approach, the ROIs from the visual scoring were used. For the fully automated approach, the WSIs were segmented and post-processed with a concave hull algorithm to obtain the tumor region. Circular ROIs were then selected based on several rules, e.g., concerning size or the absence of background.
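The ROI-restricted scoring step shared by these pipelines can be sketched as follows. This is a minimal illustration, not the implementation of any cited work: the integer class labels are placeholder assumptions, and the TSR is taken here as the stroma fraction of the combined tumor and stroma area (exact definitions vary between the cited studies).

```python
import numpy as np

# Hypothetical integer labels for a patch-wise classification map.
TUMOR, STROMA, OTHER = 0, 1, 2

def tsr_in_roi(label_map: np.ndarray, roi_mask: np.ndarray) -> float:
    """TSR restricted to an ROI, computed from a patch-level label map.

    label_map: (H, W) array with one class label per image patch.
    roi_mask:  (H, W) boolean mask marking the region of interest,
               assumed to contain at least one tumor or stroma patch.
    Classes other than tumor and stroma (e.g. necrosis) are ignored.
    """
    labels = label_map[roi_mask]
    tumor = np.count_nonzero(labels == TUMOR)
    stroma = np.count_nonzero(labels == STROMA)
    return stroma / (stroma + tumor)

# Toy map: 3 tumor patches, 4 stroma patches, 2 other patches.
labels = np.array([[0, 0, 1],
                   [0, 1, 2],
                   [1, 1, 2]])
roi = np.ones_like(labels, dtype=bool)
print(tsr_in_roi(labels, roi))  # 4 / (4 + 3) ≈ 0.571
```

Restricting the same function to a smaller `roi_mask` reproduces the ROI-based score, while a mask covering all tissue yields the whole-WSI variant discussed by Abbet et al.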
Most of the works cited above focus on survival analysis based on TSR scoring [
8,
9,
12,
22,
28]. An evaluation of the TSR values compared to those of human observers was performed by Geessink et al., Hong et al. and Smit et al. [
8,
14,
28]. Geessink et al. performed both a comparison of the TSR values and a pixel-by-pixel comparison of the segmentation results against manual annotations. Zhao et al. [
9] performed a thorough evaluation of the segmentation and TSR determination of their method, but they only considered the tumor and stroma classes.
Our work combines the most important aspects of previous works. We segment a broad range of relevant tissue types in the tumor microenvironment (TME), including tumor, stroma, necrosis, mucus and background. With our approach, the TSR can therefore be determined even in regions where necrosis or mucus is present. Almost all of the methods mentioned above are based on patch-wise classification, whereas we directly employ a more fine-grained segmentation method. The main difference is that segmentation produces a mask that assigns every pixel of an image patch to one of the classes, while classification assigns the whole patch to a single class. A classification approach can only approximate a detailed segmentation map by analyzing overlapping image patches, which increases the computational cost, and by post-processing, e.g., with conditional random fields.
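The cost trade-off of the patch-wise route can be sketched as follows: a dense map only emerges when overlapping patches are classified and their votes are accumulated per pixel, with the stride controlling resolution versus runtime. The stand-in threshold "classifier" and all names are illustrative assumptions, not any cited model.

```python
import numpy as np

def classify_patch(patch: np.ndarray) -> int:
    """Stand-in binary patch classifier (hypothetical): mean-intensity threshold."""
    return int(patch.mean() > 0.5)

def patchwise_label_map(image: np.ndarray, patch: int, stride: int) -> np.ndarray:
    """Approximate a segmentation map by classifying overlapping patches
    and letting each patch vote for all pixels it covers. A smaller
    stride yields a finer map but many more classifier evaluations."""
    h, w = image.shape  # grayscale image assumed
    votes = np.zeros((h, w, 2), dtype=np.int32)  # two classes here
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            c = classify_patch(image[y:y + patch, x:x + patch])
            votes[y:y + patch, x:x + patch, c] += 1
    return votes.argmax(axis=-1)

# Toy image: left half dark ("stroma-like"), right half bright ("tumor-like").
img = np.zeros((4, 4))
img[:, 2:] = 1.0
label_map = patchwise_label_map(img, patch=2, stride=1)
```

A true segmentation network produces the per-pixel map in a single forward pass, which is the efficiency argument for the approach taken in our work.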
The supervised training of a deep learning-based segmentation approach requires pixel-precise annotations of a set of example images. Creating such annotations is a very time-consuming and tedious task. Therefore, we have chosen a so-called few-shot method [
30] instead, which can be adapted to new segmentation tasks given only a few annotated examples (“a few shots”). In the case of prototype-based few-shot models, this strategy can even be applied without retraining the weights of the underlying neural network: only the prototypes representing the classes to be segmented need to be adjusted, which is one of the most attractive features of this few-shot approach. As a baseline, we train and evaluate a U-Net model [
31], which is one of the most widely applied algorithms for the segmentation of biological and medical image data [
32]. We present both an evaluation on a pixel-wise annotated test set and a comparison of human observers’ TSR values with the TSR values derived from the predicted segmentation results. Moreover, we examine in detail the causes of discrepancies between the calculated and estimated TSR values as well as the inter-rater agreement.
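The prototype mechanism at the core of such few-shot models can be sketched as follows. This is a toy illustration under simplifying assumptions (masked average pooling for the prototypes, cosine similarity as the matching score, every class present in the annotated support example), not the specific architecture of [30].

```python
import numpy as np

def class_prototypes(features: np.ndarray, labels: np.ndarray,
                     n_classes: int) -> np.ndarray:
    """Masked average pooling: one prototype per class, averaged over
    the feature vectors of all annotated pixels of that class.

    features: (H, W, D) per-pixel features from a frozen extractor.
    labels:   (H, W) integer class annotation of the support image.
    """
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(n_classes)])

def segment(features: np.ndarray, protos: np.ndarray) -> np.ndarray:
    """Assign each pixel to the class of its most similar prototype
    (cosine similarity); the feature extractor itself stays frozen,
    so adapting to new classes only means recomputing the prototypes."""
    f = features / np.linalg.norm(features, axis=-1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=-1, keepdims=True)
    return (f @ p.T).argmax(axis=-1)

# Toy 2x2 "image" with 2-D features: top row class 0, bottom row class 1.
feats = np.array([[[1.0, 0.0], [0.9, 0.1]],
                  [[0.0, 1.0], [0.1, 0.9]]])
labs = np.array([[0, 0], [1, 1]])
protos = class_prototypes(feats, labs, n_classes=2)
```

Segmenting a new image then amounts to extracting its features and calling `segment` with the stored prototypes; no gradient-based retraining is involved.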