Article

Segmentation of Non-Small Cell Lung Carcinomas: Introducing DRU-Net and Multi-Lens Distortion

1 Department of Circulation and Medical Imaging, Norwegian University of Science and Technology (NTNU), NO-7491 Trondheim, Norway
2 Clinic of Medicine, Levanger Hospital, Nord-Trøndelag Health Trust, NO-7600 Levanger, Norway
3 Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), NO-7491 Trondheim, Norway
4 Clinic of Laboratory Medicine, St. Olavs Hospital, Trondheim University Hospital, NO-7030 Trondheim, Norway
5 Clinic of Surgery, St. Olavs Hospital, Trondheim University Hospital, NO-7030 Trondheim, Norway
6 Application Solutions, Sopra Steria, NO-7010 Trondheim, Norway
7 Department of Health Research, SINTEF Digital, NO-7465 Trondheim, Norway
8 Department of Pathology, St. Olavs Hospital, Trondheim University Hospital, NO-7030 Trondheim, Norway
9 Center for Innovation, Medical Devices and Technology, Research Department, St. Olavs Hospital, Trondheim University Hospital, NO-7491 Trondheim, Norway
10 Centre for Cancer Biomarkers CCBIO, Department of Clinical Medicine, University of Bergen, NO-5007 Bergen, Norway
11 Department of Pathology, Haukeland University Hospital, NO-5020 Bergen, Norway
12 Department of Computer Science, Norwegian University of Science and Technology (NTNU), NO-7491 Trondheim, Norway
* Author to whom correspondence should be addressed.
J. Imaging 2025, 11(5), 166; https://doi.org/10.3390/jimaging11050166
Submission received: 27 March 2025 / Revised: 3 May 2025 / Accepted: 13 May 2025 / Published: 20 May 2025

Abstract

The increased workload in pathology laboratories today means that automated tools such as artificial intelligence models can be useful, helping pathologists with their tasks. In this paper, we propose a segmentation model (DRU-Net) that can delineate human non-small cell lung carcinomas and an augmentation method that can improve classification results. The proposed model is a fused combination of truncated pre-trained DenseNet201 and ResNet101V2 as a patch-wise classifier, followed by a lightweight U-Net as a refinement model. Two datasets (the Norwegian Lung Cancer Biobank and the Haukeland University Lung Cancer cohort) were used to develop the model. The DRU-Net model achieved an average Dice similarity coefficient of 0.91. The proposed spatial augmentation method (multi-lens distortion) improved the Dice similarity coefficient from 0.88 to 0.91. Our findings show that selecting image patches that specifically include regions of interest leads to better results for the patch-wise classifier than other sampling methods. A qualitative analysis by pathology experts showed that the DRU-Net model was generally successful in tumor detection. Results on the test set showed some areas of false-positive and false-negative segmentation in the tumor periphery, particularly in tumors with inflammatory and reactive changes. In summary, the presented DRU-Net model demonstrated the best performance on the segmentation task, and the proposed augmentation technique proved to improve the results.

1. Introduction

Early diagnosis of lung cancer is crucial for patient survival [1]. Although physical examinations and medical imaging are included in the diagnostic work-up, tissue samples are needed to establish a cancer diagnosis. The histopathological diagnosis, including the analysis of tumor biomarkers, influences therapeutic decisions and should, therefore, be assessed as early and accurately as possible [2,3].
Digitizing tissue slides allows evaluation via computer screens, which can improve efficiency over traditional microscopy [4]. It also supports AI-driven tissue classification and segmentation, potentially increasing the speed of image interpretation and refining clinical decision-making [5,6,7,8]. Correct segmentation of the tumor is a necessary step towards computer-assisted tumor analysis and lung cancer diagnosis [9,10,11,12,13,14].
When working with whole slide images (WSIs), the application of AI models is complicated due to the large size of the images. Down-sampling the WSIs to a manageable size would compromise resolution and potentially result in the loss of critical diagnostic details. A common approach in digital pathology is, therefore, to divide the images into several small squares, called patches. This is a more effective approach, but the use of patch-based analysis alone can lead to a loss of broader spatial relationships. Alternatively, the image can be down-sampled, or a hybrid strategy that combines both methods can be used to optimize the analytical balance between detailed resolution and global context.
Some of the best-performing AI methods in the analysis of WSIs are deep neural networks [14,15]. The state-of-the-art in image segmentation tasks is the use of complex neural network architectures such as vision transformers and InternImage [16,17]. However, these methods require a relatively large amount of data [18]. Transfer learning techniques may also be used to train or fine-tune pre-trained models on new data [19]. Patch-wise classification (PWC) or segmentation approaches may outperform direct segmentation of the tumor in a down-sampled image without dividing it into patches [20].
Several models have been proposed for tumor segmentation in WSIs [11,21,22,23,24,25,26,27,28,29]. Zhao et al. proposed a novel hybrid deep learning framework for colorectal cancer that uses a U-Net architecture. This model features innovative residual ghost blocks, which include switchable normalization and bottleneck transformers for extracting features [11].
The MAMC-Net model introduced a multi-resolution attention module that utilizes pyramid inputs for broader feature information and detail capture [21]. An attention mechanism refines features for segmentation, while a multi-scale convolution module integrates semantic and high-resolution details. Finally, a connected conditional random field ensures accurate segmentation by addressing discontinuities [21]. The authors showcased the superior performance of their model on breast cancer metastases and gastric cancer [21].
DHU-Net combines Swin Transformer and ConvNeXt within a dual-branch hierarchical U-shaped architecture [22,30,31]. This method effectively fuses global and local features by processing WSI patches through parallel encoders, utilizing global-local fusion modules and skip connections for detailed feature integration [22]. The Cross-scale Expand Layer aids in resolution recovery across different scales. The network was evaluated on datasets covering different tumor features and cancer types, and achieved higher segmentation results than other tested methods [22].
Krikid et al. showed that deep-learning applications in microscopic image segmentation have evolved from predominantly cell- and nucleus-centric tasks, often on small, homogeneous datasets, to more complex, tissue-level analyses, reflecting a shift toward multi-scale, clinically relevant segmentation across diverse microscopy modalities [32]. Greeley et al. introduced pyramid tiling for efficient gigapixel histology analysis [33], and promptable models like SAM and MedSAM enable zero-shot, universal segmentation across modalities [34,35].
Pedersen et al. introduced H2G-Net, a cascaded convolutional neural network (CNN) architecture for segmenting breast cancer regions from gigapixel histopathological images [23]. It employs a patch-wise detection stage and a convolutional autoencoder for refinement, demonstrating significant improvements in tumor segmentation. The approach outperformed single-resolution methods, achieving a Dice similarity coefficient (DSC) of 0.933 ± 0.069 [23]. Its efficiency is underscored by fast processing times and the ability to train deep neural networks without having to store patches on disk.
One of the most significant challenges in using WSIs for tumor segmentation is still the scarcity of labeled data. The marking of tumor tissue in WSIs by pathology experts is time-consuming and may be a bottleneck in research. Alternative computational strategies, such as unsupervised or semi-supervised learning methods should, therefore, be explored. Clustering allows the segmentation of tumor regions with little or no need for predefined labels, and can be a useful tool in this context [24,25].
Yan et al. presented a self-supervised learning method using contrastive learning to process WSIs for tissue clustering [26]. This approach generates discriminative embeddings for initial clustering, refined by a silhouette-based scheme, and extracts features using a multi-scale encoder [26]. It achieved high accuracy in identifying tissues without annotations. Their results show an area under the curve (AUC) of 0.99 and accuracy of approximately 0.93 for distinguishing benign from malignant polyps in a cohort of 20 patients [26].
Few-shot learning is also a promising method for handling limited labeled data [27,28]. By design, few-shot learning algorithms can learn from a very limited number of labeled examples. This can be particularly relevant for the classification of small patches, where a small set of labeled examples can guide the learning process. Few-shot learning techniques can generalize from these examples to classify new, unseen patches, facilitating the identification and segmentation of tumor regions [27,28]. Titoriya et al. explored few-shot learning to enhance dataset generalization and manageability by utilizing prototypical networks and model agnostic meta-learning across four datasets [29]. The design achieved 85% accuracy in a 2-way 2-shot 2-query mode [29].
In this paper, we propose a new CNN-based model which is a combination of DenseNet [36], ResNet [37], and U-Net architecture (DRU-Net) for segmenting non-small cell lung carcinomas (NSCLCs). It is an end-to-end approach consisting of a dual head for feature extraction and patch classification, followed by a U-Net for refining the segmentation result. The proposed model is tested on a novel in-house dataset of 97 annotated NSCLC WSIs. To increase model performance, we adopted a many-shot learning approach during training and added a multi-lens distortion augmentation technique to both patches and down-sampled WSIs.

2. Materials and Methods

2.1. Cohorts

In this study, two different collections of NSCLCs were used: the Norwegian lung cancer biobank (NLCB) cohort and Haukeland University lung cancer (HULC) cohort [38,39]. The NLCB cohort includes histopathological, cytological, biomarker, and clinical follow-up data from patients with suspected lung cancer diagnosed in Central Norway after 2006 [40]. Both diagnostic tumor biopsies and sections from surgical lung cancer specimens are available. The distribution of histological subtypes in each dataset is listed in Table 1 [41,42].
The HULC cohort comprises 438 surgically treated NSCLC patients diagnosed at Haukeland University Hospital, Bergen, Norway from 1993 to 2010. In this study, 97 NSCLC cases from the HULC cohort were included. From both cohorts, 4 µm tissue sections were made, deparaffinized, rehydrated in ethanol, and immersed in tap water. Hematoxylin staining was applied and sections were rinsed in water and then in ethanol. Sections were then stained with alcoholic eosin. Post-staining, slides were dehydrated in ethanol, placed in TissueClear, air-dried, and scanned using an Olympus VS120-S5 scanner (Olympus Soft Imaging Solutions GmbH, Munster, Germany) at ×40 magnification [43]. WSIs were quality-controlled by a pathologist to ensure that only high-quality scans were included in the study. They were reviewed for sectioning, staining, and scanning artifacts.
To conduct a broader study of the proposed augmentation’s effect, we utilized the following open datasets in addition to HULC: MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 [44,45,46].

2.2. Ethical Aspects

All methods were carried out in accordance with relevant guidelines and regulations, and the experimental protocols were approved by the Regional Committee for Medical and Health Sciences Research Ethics (REK) Norway (2013/529, 2016/1156, and 257624). Informed consent was obtained from all subjects and/or their legal guardian(s) for NLCB in accordance with REK 2016/1156. For subjects in the HULC cohort, exemption from consent was ethically approved by REK (2013/529).

2.3. Annotations and Dataset Preparation

We used two annotation approaches on WSIs: whole tumor annotation (WTA) and partial selective annotation (PSA). In the WTA approach, pathologists marked the tumor outline in 97 WSIs from the HULC cohort. Of these WSIs, 51 were used for training, 26 for validation, and 20 for testing. WSIs with tissue microarray (TMA) holes (n = 3) were manually assigned to the test set to prevent potentially biased training; the remaining WSIs were randomly distributed among the training set, the validation set, and the rest of the test set.
To reduce the time spent by pathologists in making the WTAs, initial annotations were first made in 72 cases using two different AI-based segmentation models, (i) the H2G-Net model developed for breast cancer segmentation (n = 25) and (ii) a customized early-stage clustering model based on the corrected annotations from the H2G-Net model (n = 47) [23]. Pathologists then manually refined the tumor region annotations using the QuPath software (version 0.3.2) [47]. The remaining 25 cases were manually annotated without any prior AI-based segmentation models. A third pathologist reviewed the annotations, and in case of discrepancy, consensus was reached after discussion. The final annotations were exported as binary masks, serving as ground truth.
In the PSA approach, pathologists marked small regions of interest in 42 WSIs from the NLCB cohort. These WSIs were used for training and validation of the patch-wise classifier model. Marked areas included parts of the invasive tumor, normal alveolar tissue, stromal tissue, immune cells, and areas of necrosis. Other non-tumor tissues marked included respiratory epithelium, reactive alveolar tissue, cartilage, blood vessels, glands, lymph nodes, and macrophages. The purpose of marking these regions was to reduce the time required for manually annotating whole tumor regions, and to guide a particular selective generation of patches intended for use in the patch-wise model’s training.

2.4. Proposed Method

The pipeline of the proposed model (DRU-Net) has two distinct stages, a PWC stage and a refinement stage. The PWC model was trained on the NLCB cohort using a many-shot learning method, and the refinement U-Net was trained on a set of down-sampled WSIs from the HULC cohort. In the PWC stage, the model assigns probabilities to each patch of the WSIs (excluding the glass), indicating whether the patch contains tumor tissue or non-tumor tissue. The classifier outputs a preliminary assessment of each patch’s nature, based on local features within the patch. The patches are then stitched together to produce a heatmap matching the original size of the down-sampled WSIs.

2.4.1. Patch-Wise Classifier

The PWC was constructed by fusing truncated backbones of two architectures, DenseNet201 [36] and ResNet101V2 [37], pre-trained on ImageNet [48]. We conducted a preliminary search on a dataset subset to determine the most effective truncation points for both DenseNet and ResNet backbones. This empirical exploration guided our layer selection based on performance. These networks are used for parallel processing of the input and feature generation (we refer to this PWC model as DR-Fused). In our proposed architecture, both DenseNet201 and ResNet101V2 receive the same input, which is the image patch. Each network processes this input concurrently, and after feature extraction, the outputs from both DenseNet201 and ResNet101V2 pass through their respective global average pooling layers. This step compresses the feature representation to help prevent overfitting. The compressed features from both networks are then concatenated and fed through the classifier head (Figure 1).
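A minimal Keras sketch of this parallel, fused design is shown below. The exact truncation points used in the paper are not reproduced here; the head width and dropout rate are also illustrative assumptions. The sketch only shows how the two pooled feature vectors are concatenated before the classifier head.

```python
# Minimal sketch of the fused DenseNet201/ResNet101V2 patch classifier (DR-Fused).
# Truncation points, head width, and dropout rate are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import DenseNet201, ResNet101V2

def build_dr_fused(input_shape=(224, 224, 3), num_classes=2):
    inputs = layers.Input(shape=input_shape)

    # Two ImageNet-pretrained backbones process the same patch in parallel.
    densenet = DenseNet201(include_top=False, weights="imagenet", input_shape=input_shape)
    resnet = ResNet101V2(include_top=False, weights="imagenet", input_shape=input_shape)
    d_feat = densenet(inputs)
    r_feat = resnet(inputs)

    # Global average pooling compresses each feature map before fusion.
    d_vec = layers.GlobalAveragePooling2D()(d_feat)
    r_vec = layers.GlobalAveragePooling2D()(r_feat)

    # Concatenated features feed a small classifier head (tumor vs. non-tumor).
    fused = layers.Concatenate()([d_vec, r_vec])
    x = layers.Dense(256, activation="relu")(fused)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return Model(inputs, outputs, name="dr_fused")
```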

2.4.2. Refinement Network

The heatmap is generated by applying the PWC across the WSI. The resulting heatmap is then resized and concatenated with a down-sampled version of the WSI (1120 × 1120 pixels). The fused inputs are then fed to a refinement network, similar to H2G-Net [23]. Using a refinement network allows the initial patch-wise predictions to be adjusted based on global WSI-level information.
The proposed refinement network is a simple, lightweight U-Net architecture, specifically tailored to process two image inputs (Figure 1). In this model, the two inputs (down-sampled RGB WSI and the heatmap) are concatenated into a 4-channel image and then processed through multiple convolutional layers with ReLU activation functions, batch normalization, spatial dropout, skip connections, max pooling, and up-sampling layers (with nearest-neighbor interpolation). The network ends with a softmax activation function.
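A lightweight sketch of this two-input refinement network is given below. The filter widths, dropout rate, and encoder depth are assumptions; only the fusion of the down-sampled WSI and the PWC heatmap into a 4-channel input and the overall encoder-decoder layout follow the description above. (Section 2.6 also evaluates a variant without the top-most skip connection.)

```python
# Sketch of a lightweight two-input refinement U-Net (assumed layer widths).
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_refinement_unet(size=1120, filters=(8, 16, 32)):
    wsi = layers.Input(shape=(size, size, 3), name="downsampled_wsi")
    heatmap = layers.Input(shape=(size, size, 1), name="pwc_heatmap")
    x = layers.Concatenate()([wsi, heatmap])  # 4-channel fused input

    skips = []
    for f in filters:  # encoder
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.SpatialDropout2D(0.1)(x)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)

    x = layers.Conv2D(filters[-1] * 2, 3, padding="same", activation="relu")(x)

    for f, skip in zip(reversed(filters), reversed(skips)):  # decoder
        x = layers.UpSampling2D(2, interpolation="nearest")(x)
        x = layers.Concatenate()([x, skip])
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)

    outputs = layers.Conv2D(2, 1, activation="softmax")(x)  # tumor / non-tumor
    return Model([wsi, heatmap], outputs, name="refinement_unet")
```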

2.4.3. Data Augmentation

To improve model robustness, data augmentation is commonly performed. Data augmentation generates artificial copies of the training data through a predefined algorithm. This allows the training data to better cover the expected data variation. Data augmentation was integrated into the training data generation process, with the following methods applied randomly: vertical and horizontal flipping, rotations (multiples of 90°), multiplicative contrast adjustment, hue and brightness variations, and the proposed multi-lens distortion augmentation. During the many-shot learning using PSA, we extracted patches by cropping a random 224 × 224-pixel section from each image. Each image appeared only once per epoch, where an epoch is defined as one iteration of all the training data.
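A sketch of such an augmentation pipeline, using TensorFlow image ops, is shown below. The jitter ranges are assumptions, and the multi-lens distortion of Section 2.4.4 would be applied in addition to these standard transforms.

```python
# Hedged sketch of the random augmentation pipeline applied to training patches
# (random crop, flips, 90-degree rotations, contrast/hue/brightness jitter).
# The jitter ranges are illustrative assumptions.
import tensorflow as tf

def augment_patch(image):
    image = tf.image.convert_image_dtype(image, tf.float32)      # scale to [0, 1]
    image = tf.image.random_crop(image, size=(224, 224, 3))      # random 224x224 crop
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    image = tf.image.rot90(image, k=tf.random.uniform([], 0, 4, dtype=tf.int32))
    image = tf.image.random_contrast(image, 0.9, 1.1)
    image = tf.image.random_hue(image, 0.02)
    image = tf.image.random_brightness(image, 0.1)
    # The multi-lens distortion augmentation (Section 2.4.4) is applied separately.
    return image
```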

2.4.4. Multi-Lens Distortion Augmentation

A novel data augmentation method, multi-lens distortion, was developed to simulate several local random lens distortions. This technique aims to allow the model to recognize the important features of the images under a wider range of cell/tissue shapes.
The algorithm uses a predefined number of lenses. For each lens, a random position within the image is selected. Then, a random distortion radius and strength value are used to apply a barrel and/or pincushion distortion effect at the selected position (Algorithm 1). An example of this augmentation is shown in Figure 2. The optimal radius range and lens count were established empirically through an iterative series of experiments, with each configuration assessed qualitatively to identify the most convincing results. From a histopathological point of view, excessively strong augmentations produce morphologically invalid images, which degrades performance. It is therefore necessary to tune these parameters specifically for the targeted application, especially in healthcare.
Algorithm 1 Multi-Lens Distortion (implementation-level pseudocode)
Require: img ∈ ℝ^(H×W×C); N (number of lenses); (r_min, r_max); (s_min, s_max)
Ensure: out ∈ ℝ^(H×W×C)
 1: out ← img                                  ▹ deep copy
 2: (y_idx, x_idx) ← meshgrid(0:H−1, 0:W−1)
 3: for i ← 1 to N do
 4:     c_x ← randInt(0, W−1)
 5:     c_y ← randInt(0, H−1)
 6:     R ← randInt(r_min, r_max)
 7:     S ← randFloat(s_min, s_max)
 8:     for all (y, x) in {0:H−1} × {0:W−1} do
 9:         d_x ← x − c_x;  d_y ← y − c_y
10:         r ← sqrt(d_x² + d_y²)
11:         if r < R then
12:             r̂ ← r / R                      ▹ normalised distance
13:             sf ← 1 − r̂                      ▹ scaling factor
14:             scale ← 1 − S · sf
15:             x_new ← c_x + d_x · scale
16:             y_new ← c_y + d_y · scale
17:             x_new ← clamp(x_new, 0, W−1)
18:             y_new ← clamp(y_new, 0, H−1)
19:             out[y, x] ← img[y_new, x_new]
20:         end if
21:     end for
22: end for
23: return out
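For clarity, a vectorized NumPy sketch of Algorithm 1 is given below. The function signature, default parameter ranges, and random-number handling are illustrative assumptions; the reference implementation is available in the linked repository.

```python
# Vectorized NumPy sketch of Algorithm 1 (multi-lens distortion).
# Default parameter ranges are illustrative only and should be tuned per dataset.
import numpy as np

def multi_lens_distortion(img, n_lenses=4, radius_range=(40, 120),
                          strength_range=(-0.4, 0.4), rng=None):
    """Apply several local barrel/pincushion distortions to an H x W x C image."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    out = img.copy()
    y_idx, x_idx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    for _ in range(n_lenses):
        cx = rng.integers(0, w)                      # random lens centre
        cy = rng.integers(0, h)
        radius = rng.integers(radius_range[0], radius_range[1] + 1)
        strength = rng.uniform(*strength_range)      # sign selects barrel/pincushion

        dx = x_idx - cx
        dy = y_idx - cy
        r = np.sqrt(dx ** 2 + dy ** 2)
        inside = r < radius                          # pixels affected by this lens

        # Scale fades from (1 - strength) at the centre to 1 at the lens edge.
        scale = 1.0 - strength * (1.0 - r / radius)
        x_new = np.clip(cx + dx * scale, 0, w - 1).astype(np.int64)
        y_new = np.clip(cy + dy * scale, 0, h - 1).astype(np.int64)

        # Resample the original image at the displaced coordinates.
        out[inside] = img[y_new[inside], x_new[inside]]
    return out
```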

2.4.5. Model Training

The PWC network was fine-tuned to adapt to the specific task by freezing the initial layers. The following training parameters were included: optimizer: Adamax with a learning rate of 1 × 10⁻⁴; loss function: categorical crossentropy; metrics: F1-score; batch size: dynamically determined based on the training generator configuration; epochs: up to 200 with early stopping based on validation loss to prevent overfitting.
The refinement network training involved the following: optimizer: Adam with a learning rate of 1 × 10⁻⁴; loss function: Dice loss, optimized for segmentation tasks; metrics: thresholded Dice score; batch size: 2; epochs: up to 300 with early stopping based on validation loss to prevent overfitting; training environment: utilization of GPU and memory growth settings to optimize hardware usage.
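The corresponding compile settings can be sketched as follows, reusing build_dr_fused and build_refinement_unet from the sketches above. The Dice-loss implementation, the early-stopping configuration, and the omission of metrics are assumptions.

```python
# Illustrative compile settings following the reported hyperparameters
# (Adamax / categorical cross-entropy for the PWC, Adam / Dice loss for the U-Net).
# The Dice implementation and early-stopping settings are assumptions.
import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1e-6):
    # Soft Dice loss over flattened class probabilities.
    y_true = tf.reshape(tf.cast(y_true, tf.float32), [-1])
    y_pred = tf.reshape(tf.cast(y_pred, tf.float32), [-1])
    intersection = tf.reduce_sum(y_true * y_pred)
    return 1.0 - (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)

pwc = build_dr_fused()
pwc.compile(optimizer=tf.keras.optimizers.Adamax(learning_rate=1e-4),
            loss="categorical_crossentropy")

refiner = build_refinement_unet()
refiner.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                loss=dice_loss)

early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                                  restore_best_weights=True)
```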
In the WTA method, the same set of slides was used for both PWC and segmentation models’ training. From the 97 slides, 77 slides were randomly chosen and divided into training and validation sets in a 2:1 ratio, with 51 and 26 slides, respectively, while 20 slides (including those with TMA holes) were used for testing.
WSIs in the dataset from the HULC cohort were divided into tiles (patches), and each tile was fed into the neural network along with a non-tumor/tumor label based on the provided annotation. To create the annotation labels for patches, non-tumor and tumor tiles were assigned the values 0 and 1, respectively. We first used a threshold on color gradients to separate the tissue from the background glass. Any tile that did not include more than 25% tissue was disregarded, meaning that all input tiles contained less than 75% background glass. A tile was classified as tumor only if at least 5% of its area was tumor, whereas only tiles containing no tumor at all were assigned to the non-tumor class; tiles containing some tumor but less than 5% tumor area were excluded.
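The tile-labeling rules above can be summarized in a small helper. How the tissue and tumor masks are derived per tile (e.g., the color-gradient thresholding) is left out and assumed to be available.

```python
# Hedged sketch of the tile-labelling rules: tiles with <=25% tissue are discarded,
# tiles with >=5% tumor are labelled 1, tumor-free tiles are labelled 0, and tiles
# with 0-5% tumor are excluded. Mask extraction itself is assumed.
import numpy as np

def label_tile(tissue_mask, tumor_mask):
    """Return 1 (tumor), 0 (non-tumor), or None (excluded) for one tile.

    tissue_mask, tumor_mask: boolean arrays over the tile's pixels.
    """
    if tissue_mask.mean() <= 0.25:   # too much background glass
        return None
    tumor_fraction = tumor_mask.mean()
    if tumor_fraction >= 0.05:       # enough tumor area -> tumor class
        return 1
    if tumor_fraction == 0.0:        # completely tumor-free -> non-tumor class
        return 0
    return None                      # ambiguous (0-5% tumor) -> excluded
```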
Using the annotated WSI regions with PSA in the NLCB dataset, 40 areas were assigned to the tumor class (labeled as 1) and 50 areas to the non-tumor class (labeled as 0). The selected areas led to the generation of patches in subsequent steps. Specifically, out of 50 areas categorized as non-tumor, 40 clearly lacked tumor characteristics, and 10 showed features slightly above the initial threshold, as shown in Supplementary Figure S2. This threshold was established through model training before intentionally creating an imbalance in the dataset. The imbalance was introduced after unsuccessful attempts to enhance model generalizability through various methods, including weighted loss functions, focal loss, threshold adjustment, and sampling strategies.

2.4.6. Post-Processing

After segmentation, two post-processing steps were performed. First, small fragments were removed by converting the images to grayscale and then to binary format to identify and eliminate fragments smaller than a fixed threshold. The threshold was set to the smallest annotated segmentation area in the ground truth. In the second step, an edge-smoothing algorithm was applied to enhance image quality. This was achieved through morphological operations, which are commonly used in digital image processing to modify the geometrical structure of images. Specifically, we used morphological opening, an erosion operation followed by a dilation, which reduces jagged edges and smooths the boundaries of objects within the image. The operations were performed using a kernel size of 7 × 7. Additionally, a median blur with a kernel size of 11 × 11 was applied to further smooth the edges. It is important to note that these morphological operations are purely computational image processing techniques and should not be confused with the morphological study of biological tissues.
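A sketch of these two post-processing steps with OpenCV is given below; the minimum-area value is a placeholder for the smallest annotated region in the ground truth.

```python
# Sketch of the post-processing: small-fragment removal, 7x7 morphological opening,
# and 11x11 median blur. The minimum-area threshold is a placeholder.
import cv2
import numpy as np

def postprocess(segmentation, min_area):
    # segmentation: uint8 mask (0 background, 255 tumor); min_area in pixels.
    binary = (segmentation > 127).astype(np.uint8)

    # Remove connected components smaller than the area threshold.
    n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    cleaned = np.zeros_like(binary)
    for i in range(1, n_labels):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            cleaned[labels == i] = 1

    # Morphological opening (erosion then dilation) with a 7x7 kernel.
    kernel = np.ones((7, 7), np.uint8)
    opened = cv2.morphologyEx((cleaned * 255).astype(np.uint8), cv2.MORPH_OPEN, kernel)

    # Median blur with an 11x11 kernel to further smooth the edges.
    return cv2.medianBlur(opened, 11)
```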

2.5. Implementation

Implementation was conducted in Python 3.8.10. TensorFlow (v2.13.1) was used for model architecture implementation and training [49]. The following additional libraries were used for the experiments: pyFAST, OpenCV, NumPy, Pillow, SciPy, scikit-learn, and Matplotlib [50,51,52,53,54,55,56,57]. Trained models were converted to the ONNX format using the tf2onnx library [58]. Converted models were then integrated into FastPathology for deployment [59]. FastPathology is an open-source, user-friendly software developed for deep learning-based digital pathology that offers tools for processing and visualizing WSIs. The source code used to conduct the experiments is openly available at https://github.com/AICAN-Research/DRU-Net (accessed on 29 April 2024).
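The ONNX export step can be sketched as follows; the input signature and file name are illustrative, and pwc refers to the trained classifier from the sketches above.

```python
# Hedged sketch of exporting a trained Keras model to ONNX with tf2onnx
# before deployment in FastPathology; signature and path are assumptions.
import tensorflow as tf
import tf2onnx

# pwc: trained patch-wise classifier (see the sketch in Section 2.4.1).
spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)
onnx_model, _ = tf2onnx.convert.from_keras(pwc, input_signature=spec,
                                           output_path="dru_net_pwc.onnx")
```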

2.6. Experiments

To compare the proposed model (DRU-Net) with other models, the following experiments were carried out: modifications of the previously introduced H2G-Net model on both datasets, DRU-Net with the backbone trained on the HULC cohort and NLCB, and applying the few-shot and many-shot learning techniques along with clustering (Table 2) [23].
H2G-Net was tested as is and fine-tuned with five different modifications [23]. First, H2G-Net was tested without any modification, fine-tuning, or additional training, to see whether a model trained for breast cancer tumor delineation can also work for lung cancer. Second, the PWC of H2G-Net was fine-tuned on annotated WSIs from the HULC cohort, and the original U-Net of H2G-Net was applied on top of the PWC results. Third, the whole model (PWC and U-Net) was fine-tuned on the training data. The same three methods were then tested, but with the PWC trained on NLCB instead of the HULC cohort.
An ablation study was performed to evaluate the effect of the proposed multi-lens distortion augmentation. A pre-trained DenseNet121 was tested on four open datasets: MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 [44,45,46]. Experiments were repeated with and without this augmentation on these open datasets by randomly selecting 10% of the training data, and the results were compared using the Wilcoxon test. Both the control and test groups included other augmentation techniques, such as color adjustments, flipping, rotation, brightness, and contrast augmentations. The effect of this augmentation on training time was measured using the integrated TensorFlow functions by comparing the training time with and without the augmentation, averaged over WSIs [49].
We also investigated the effect of removing the top-most skip connection of the U-Net refinement model and calculated the average Hausdorff distances (HDs) for two sets of final segmentation predictions in comparison to a ground truth set. This was conducted to quantify the effect of removing that skip connection, which was implemented to reduce the small fragments around the segmentation perimeter.

2.7. Model Evaluation

2.7.1. Quantitative Model Assessment

To quantitatively validate the patch-wise classification performance, precision, recall, and F1-score were used [61]. The validation of the final segmentation on WSI-level was performed using DSC and HD [62].
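For reference, both WSI-level metrics can be computed from binary masks as sketched below; using all mask points (rather than extracted boundaries) for the Hausdorff distance is a simplifying assumption.

```python
# Sketch of the WSI-level evaluation metrics: Dice similarity coefficient and a
# symmetric Hausdorff distance via SciPy. Computing HD over all mask points rather
# than boundary points is a simplifying assumption.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(gt, pred):
    gt, pred = gt.astype(bool), pred.astype(bool)
    denom = gt.sum() + pred.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(gt, pred).sum() / denom

def hausdorff(gt, pred):
    gt_pts = np.argwhere(gt)        # (row, col) coordinates of foreground pixels
    pred_pts = np.argwhere(pred)
    return max(directed_hausdorff(gt_pts, pred_pts)[0],
               directed_hausdorff(pred_pts, gt_pts)[0])
```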

2.7.2. Qualitative Model Assessment

The qualitative assessment of the segmentation results was conducted by two pathologists using the scoring system described in Table 3. Qualitative assessment was conducted on the same 20 WSIs of the test set from the HULC cohort.

2.7.3. Saliency Maps

To survey the model’s decision-making process and the areas of patches that were most relevant for predicting the tumor class, we employed a method known as gradient-based saliency maps [63,64,65,66]. This approach operates by computing the gradient of the output class (the class for which we want to understand model sensitivity) with respect to the input image. These gradients indicate the sensitivity of the output to each pixel in the input image. By highlighting the pixels with the highest gradients, we can visualize the areas that most strongly influenced the model’s classification decision. We used six different patches selected from six different WSIs from the HULC cohort to analyze the saliency maps. Patches were chosen to represent true positive, false positive, and false negative predictions. Patches with true positive predictions were selected to include various histological features and cell types in each patch to better assess the model’s decision process.
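A minimal gradient-based saliency computation with TensorFlow's GradientTape is sketched below; the channel reduction and normalization steps are assumptions.

```python
# Minimal gradient-based saliency sketch; channel reduction and scaling are assumptions.
import numpy as np
import tensorflow as tf

def saliency_map(model, patch, class_index):
    """Return a normalised |d output / d input| map for one image patch."""
    x = tf.convert_to_tensor(patch[np.newaxis], dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        predictions = model(x, training=False)
        class_score = predictions[:, class_index]
    grads = tape.gradient(class_score, x)                # gradient w.r.t. each pixel
    saliency = tf.reduce_max(tf.abs(grads), axis=-1)[0]  # max over colour channels
    saliency /= (tf.reduce_max(saliency) + 1e-8)         # scale to [0, 1]
    return saliency.numpy()
```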

2.7.4. Computation of FLOPs and Parameters

To quantitatively assess the computational complexity and model size, we calculated the number of floating-point operations (FLOPs) and the total number of trainable parameters for all evaluated models, including DR-Fused and several standard architectures. For each model, FLOPs were estimated by converting the model into a frozen computational graph using TensorFlow’s convert_variables_to_constants_v2 function, followed by profiling with tf.compat.v1.profiler. The FLOPs represent the total number of arithmetic operations required for a single forward pass of an input image sized 224 × 224 × 3 . Parameter counts were obtained directly via the count_params method provided by TensorFlow. All FLOPs and parameter values were reported in millions (M) for clarity. MobileNetV2 was designated as the baseline model. Relative changes in FLOPs and parameters (ΔFLOPs and ΔParams) were computed for each model compared to MobileNetV2, using the following formulas:
ΔFLOPs (%) = (FLOPs_model − FLOPs_baseline) / FLOPs_baseline × 100
ΔParams (%) = (Params_model − Params_baseline) / Params_baseline × 100
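The FLOPs estimation described above can be sketched as follows; the exact profiler options are assumptions, and parameter counts are obtained directly with model.count_params().

```python
# Sketch of FLOPs estimation via a frozen graph and the TF1 profiler;
# profiler options are assumptions.
import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import (
    convert_variables_to_constants_v2)

def count_flops(model, input_shape=(1, 224, 224, 3)):
    # Freeze the model into a constant graph for a fixed input size.
    concrete = tf.function(lambda x: model(x)).get_concrete_function(
        tf.TensorSpec(input_shape, tf.float32))
    frozen_func = convert_variables_to_constants_v2(concrete)

    # Profile the frozen graph to count floating-point operations.
    run_meta = tf.compat.v1.RunMetadata()
    opts = tf.compat.v1.profiler.ProfileOptionBuilder.float_operation()
    info = tf.compat.v1.profiler.profile(graph=frozen_func.graph,
                                         run_meta=run_meta, cmd="op",
                                         options=opts)
    return info.total_float_ops  # FLOPs for one forward pass

# Trainable parameters are obtained directly with model.count_params().
```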

3. Results

The highest DSC on average on the 20 WSIs of the test set from HULC cohort was achieved by DRU-Net, followed by the H2G-Net with fine-tuned PWC on the HULC cohort (Figure 3). Similar differences in DSC were observed for the models without the refinement networks (Figure 4).
The proposed multi-lens distortion augmentation increased the F1-score overall across the various datasets; the change was statistically significant when applied to our dataset from the NLCB (Table 4). Applying this augmentation technique increased training time by an average of 8%. DSC and patch-wise accuracy increased when the multi-lens distortion augmentation was used with a strength in the range [−0.4, 0.4], but higher magnitudes caused a decrease in performance (Figure 5).
The original H2G-Net resulted in an average of 0.76 DSC (Figure 3) and 0.66 intersection over union (IOU) scores. On average, 25% of the non-tumor regions around the true tumor outlines were falsely labeled as tumor. When the PWC component of the model was used without refinement, the predictions resulted in 0.64 DSC and 0.61 IOU, showing that the refinement improved the predictions significantly.
A fine-tuned PWC trained and validated on 77 WSIs from the HULC cohort, with the direct implementation of the pre-trained U-Net from H2G-Net, was tested on 20 WSIs from the HULC cohort and resulted in an average of 0.83 DSC (median 0.91) (Figure 3) and an average of 0.74 IOU scores. Scores were reduced to an average of 0.77 DSC (median of 0.87) and an average of 0.69 IOU when both the U-Net and the PWC were fine-tuned.
The proposed model (DRU-Net) tested on the same 20 WSIs resulted in an average of 0.91 DSC (median 0.93) and 0.81 IOU. Also, removing the top skip connection in our U-Net model (DRU-Net) resulted in an average reduction in HD by 4.8%. Figure 6 shows a comparison of the results from various models. Table 5 summarizes various backbones’ performance in the patch-wise classifier component of the model.
In addition to the classification performance, we evaluated the computational complexity of each backbone in terms of FLOPs (floating point operations) and number of parameters, as summarized in Table 6. While the proposed DR-Fused backbone exhibits higher computational cost compared to lightweight models such as MobileNetV2 [60], it remains significantly more efficient than very large networks like VGG19 [67] and ResNet101V2 [37]. Importantly, the DR-Fused model achieves substantial improvements in classification performance (Table 5), with an F1-score of 0.94 compared to 0.86 for MobileNetV2 and 0.91 for DenseNet201 [36].
We compared the performance of several models on processing a set of 20 WSIs, with the average dimensions being approximately 108,640 pixels in width and 129,835 pixels in height. H2G-Net and its fine-tuned versions were the fastest models during inference (62 s). Although the many-shot and few-shot models had faster training, they exhibited slower runtimes, with MSC taking the longest at 167 s and DRU-Net at 152 s.
The results of the saliency map analysis in six patches are shown in Figure 7. False-positive areas in the saliency maps were partly explained by areas with reactive pneumocytes, macrophages, and reactive pneumocyte hyperplasia.
The qualitative assessment resulted in an average score of 3.95 out of 5. In nine of the cases assessed, there were sparse areas in the periphery of the tumor that the model misclassified.

4. Discussion

In this paper, we introduce a novel deep learning-based model to segment the outline of NSCLCs. We have incorporated a patch-wise classifier, synergistically integrating truncated DenseNet201 [36] and ResNet101V2 [37] architectures, enhanced by a segmentation refinement U-Net model. The proposed composite PWC model demonstrated superior performance over other tested backbones. Due to our relatively small dataset and considering the desired memory and speed efficiency, CNNs were preferred in this study. Using transformer-based models would have required more extensive datasets and computational resources [70,71].
This study also resulted in a novel dataset comprising annotated NSCLCs and marked regions of interest in WSIs from NSCLCs, covering various tissue types. Our results indicate that the PSA approach yielded more effective training outcomes for the patch-wise classifier than the WTA techniques, both with and without class balancing via tissue clustering. With the WTA approach, annotation (including review and correction) was extremely time-consuming for expert pathologists, whereas the PSA method reduced this time by roughly an order of magnitude.
Our study demonstrated that the implementation of the multi-lens distortion augmentation technique enhanced classification outcomes across diverse datasets with limited volume of training data. However, the effect of this augmentation could vary depending on the data themselves. We investigated the effect of the augmentation’s strength range on the patch-wise classification accuracy and refinement network’s DSC on WSI-level, concluding that the degree of augmentation is pivotal for its impact on the training process. Excessively strong distortion of images could obstruct the model’s ability to learn relevant patterns, as shown in the impact of the multi-lens distortion augmentation with various strength ranges (Figure 5). It is important to note that the effective range is dependent on the dataset, and the same values may not necessarily yield similar improvements across different datasets.
The non-linear warping introduced by the multi-lens distortion mimics the subtle spatial deformations, slight micro-stretches of the tissue, and local distortions. By applying controlled, spatially varying warps at different scales, our augmentation reproduces these effects. This generates realistic variations in cell and tissue morphology. This not only strengthens model robustness to scanner-induced artifacts, but also promotes generalization across varying magnification levels, shapes, stretches, and similar sample-preparation conditions.
Instead of stain normalization techniques, we used an augmentation-based approach to produce more robustness, maintain important staining details, reduce computational complexity, and safeguard essential characteristics from unintended modification. The dataset already had consistent staining, which eliminated the need for traditional stain normalization [72,73]. To mimic the wide range of HE staining protocols seen across laboratories, we applied the mentioned randomized adjustments in brightness, contrast, and hue during training. By exposing the model to these controlled, biologically plausible variations in color balance and intensity, we effectively simulate batch-to-batch and site-to-site staining differences.
The RGSB-UNet model features a unique hybrid design that combines residual ghost blocks with switchable normalization and a bottleneck transformer [11]. This design focuses on extracting refined features through its complex structure. However, our study found that simpler and more synergistic architectures can also effectively extract reliable features.
The MAMC-Net model improves tumor boundary detection by using a conditional random field layer [21], whereas the DRU-Net model enhances segmentation by fine-tuning a U-Net on a down-sampled image. While both methods achieved good results, our approach—using a U-Net on down-sampled images—proved faster and highly efficient. Notably, our model using the PSA approach achieved comparable results despite using a much smaller dataset.
Transformer-based models like Swin-UNet and InternImage have demonstrated impressive performance in medical image segmentation tasks due to their ability to capture global contextual information through self-attention mechanisms [22,74]. However, transformer architectures typically have higher model complexity due to extensive self-attention operations and large parameter counts, which can result in increased computational demands compared to traditional CNNs [75]. In contrast, our proposed CNN-based DRU-Net maintains competitive segmentation performance with relatively lower computational requirements, potentially making it more suitable for deployment in resource-constrained clinical environments.
Similar to H2G-Net, our proposed model, DRU-Net, also utilizes a cascaded design with two stages of PWC and refinement, and has achieved comparable results [23]. Although H2G-Net uses a lightweight PWC and a relatively heavier U-Net for refinement, our architecture—DRU-Net—demonstrated better performance when using a heavier feature extractor (PWC) combined with a lightweight U-Net. This architectural choice is particularly beneficial in scenarios with limited training data. In such cases, placing the model’s capacity earlier in the pipeline allows it to capture more discriminative and generalizable features during the initial extraction stage, while a simpler refinement network, like a lightweight U-Net, helps to avoid overfitting during the later stages. This balance ensures that the network focuses on learning robust features without excessive parameter overhead in the refinement phase. Pedersen et al. introduced a balancing technique to ensure equitable representation of available categories [23]. This helps minimize bias toward specific tissue types or tumor characteristics.
In this study, we also encountered some challenges due to the significant class imbalance between the patches derived from the WTA approach. Addressing the resultant low precision, a comprehensive strategy was implemented to improve model accuracy. Key interventions included resampling techniques, both under- and over-sampling, as well as the incorporation of focal loss, which specifically helps to address class imbalance by modulating the loss function to focus on harder-to-classify examples [76]. Furthermore, we explored the clustering of similar tissue types before sampling, the use of a weighted loss function, and adjustments to the decision threshold.
In the training phase of the many-shot model using PSA-derived samples, we deliberately introduced a controlled imbalance to optimize threshold settings and enhance performance. Experiments suggested that the deliberately-induced imbalance may offer improved performance compared to methods such as resampling, under-/over-sampling, focal loss, clustered tissue sampling, weighted loss functions, and threshold tuning [76]. However, this approach poses a risk of bias, requiring careful calibration and ongoing monitoring to prevent skewed results. The DRU-Net model’s performance was validated externally, trained on the NLCB dataset and tested on 20 slides from the HULC cohort.
The decrease in performance after fine-tuning the U-Net layers of the H2G-Net may be due to the relatively small number of annotated WSIs available in our study. Conversely, the DRU-Net network’s superior performance under similar conditions suggests the efficacy of the DR-Fused network, accompanied by a relatively lightweight U-Net architecture in data-scarce scenarios.
The relatively low performance of the original H2G-Net on NSCLCs with no fine-tuning can be explained by different tissue morphology, growth pattern, and stromal invasions, which can mislead the model during inference [42,77,78,79,80,81,82,83].
To analyze the effect of the proposed U-Net refinement network, we compared Figure 3 and Figure 4. Our results indicate that refining the PWC heatmap with the suggested refinement network improved the performance of the evaluated models. However, the main strengths and weaknesses of the models relative to each other stem directly from the PWC models and the training methods used. Additionally, combining the two stages appears to both improve the segmentation DSC values and reduce their variance, indicating that the refinement models have learned overall patterns and connections, leading to better segmentation.
The difference observed in the average DSCs between the PWC models indicates that the models trained using PSA outperformed the WTA approach under limited data conditions. This was likely due to the inadequate separability of the feature distributions between tumor and non-tumor. In the WTA approach, the method involved annotating entire tumor regions, which often included patches where the feature distributions of tumor and non-tumor tissues overlapped significantly. This overlap reduced the separability and weakened the discriminatory power of the classification models trained using this approach. Consequently, the distinction between tumor and non-tumor features in these patches became less pronounced, leading to potential misclassifications.
The PSA method adopted a more selective approach by targeting patches for annotation based on their discriminative morphology. By focusing on patches where tumor and non-tumor features were clearly distinguishable, PSA enhanced the model’s ability to accurately classify these features. This selective annotation process effectively increased the inter-class variance while reducing the intra-class variance, thus significantly improving the overall performance of the classification models in distinguishing between tumor and non-tumor tissues under conditions of limited data. In the WTA approach, the mentioned inseparable feature distribution affected the loss function negatively, resulting in lower accuracy. This was most likely rooted in the fact that the tumor regions also include other cell types than the invasive epithelial cells. By using histopathological knowledge for selecting areas with the most relevant features in PSA, the variation in the features between the two classes could be increased.
It should, nonetheless, be noted that in our case, the PSA and WTA methods were applied to different datasets. Therefore, the observed performance differences do not constitute a statistical comparison, and no definitive claims can be made about the superiority of one approach over the other.
Our study indicates that employing few-shot learning in conjunction with a clustering approach can achieve accuracy levels comparable to methods reliant on extensive datasets, potentially mitigating the need for large-scale data collection. The few-shot learning approach can be beneficial when there is a high degree of similarity within each class of tissue types and a clear distinction between the classes in the feature space [84].
One of the novel techniques presented here was utilizing an evolutionary optimization technique to determine the optimal number of clusters (classes) to minimize intra-cluster variance and maximize inter-cluster variance prior to few-shot training. This method optimally configures clusters to reflect the most coherent and meaningful class structures, which is crucial when the available training data are scarce. By focusing on minimizing intra-cluster variance and minimizing inter-cluster similarity, the approach enhances the model’s ability to generalize from limited examples, a critical aspect in few-shot scenarios where the risk of overfitting is high. Evolutionary algorithms also offer adaptability and flexibility. This enables the model to effectively handle varying data types and distributions. This pre-training optimization led to more efficient training and improved model performance by grouping patches into different classes.
The qualitative assessment of our results suggests that the DRU-Net model shows limitations in accurately delineating the tumor periphery. This challenge was particularly evident in regions with fibrosis, reactive tissue, or inflammation, where the model tends to produce false-positive and false-negative segmentations. This limitation is most likely due to the limited size of the training data; with a larger dataset containing more examples of these complex regions correctly annotated, the model’s performance in these areas might be significantly improved.
A key limitation of our study is the modest size of our dataset of 97 WSIs from the HULC cohort. Generating pixel-perfect tumor outlines on WSIs is an extremely labor-intensive and time-consuming process for an expert pathologist (including review and correction), even when using semi-automated contouring tools. Under these resource constraints, expanding beyond 97 slides with expert whole tumor annotations was simply not feasible within the project timeline. We chose to create this new dataset rather than relying on existing publicly available annotated datasets because most of them focus exclusively on neoplastic cells at the pixel level, often excluding the surrounding stroma and other intermixed cell types present within the tumor region. Additionally, comparable datasets that adopt a whole-tumor region approach typically lack the resolution and accuracy required to precisely capture tumor borders and small, scattered tumor cell clusters.
Despite the limited dataset size, we observed a consistent alignment between training and validation loss curves along with a stable performance on the external test set. This suggests that the model’s performance is not merely a result of overfitting but a genuine generalization to the tested unseen data.
In the future, we suggest reducing the model size using advanced attention-focusing mechanisms and a multi-scale patch-wise classifier to better incorporate information at different scales. Employing anomaly detection algorithms might help identify reactive tissue outliers that contribute to false-positive classifications.
Although HE is the standard staining method for the assessment of histopathology slides, the stain can vary from laboratory to laboratory. Hence, testing on non-Norwegian cohorts and on slides from laboratories with different staining protocols would be beneficial. We searched extensively for open-access lung tumor-segmentation datasets that include tumor outlines demarcated according to the same protocol we employ, but did not identify any that match our annotation style or resolution. As a result, quantitative evaluation of segmentation generalizability beyond the NLCB and HULC cohorts remains challenging. For future work, we suggest addressing this gap with multi-institutional WSI cohorts capturing a range of scanners, staining protocols, and patient demographics. After establishing generalizability, the model should be set up for clinical validation.
Additionally, Mask R-CNN architectures are highly effective in distinguishing complex patterns that can be used for better tumor border delineation. Implementing Bayesian neural networks can potentially improve the prediction of tumor boundaries while quantifying the uncertainty of predictions. To more effectively incorporate global WSI context, methods such as Markov or conditional random fields could be integrated along with PWC or transformer architectures. Using this approach will ensure that segmented areas are not only based on local pixel values. To further improve the differentiation between the two classes, we suggest Neuro-Fuzzy Systems, maintaining the learning capabilities of neural networks while applying the reasoning capabilities of fuzzy logic. To overcome the challenge of limited data, we suggest using unsupervised domain adaptation algorithms to leverage annotated data from other histopathology source domains.

5. Conclusions

In conclusion, we have introduced DRU-Net for non-small cell lung cancer tumor delineation in WSIs. Our new model, which synergistically integrates truncated DenseNet201 and ResNet101V2 with a U-Net-based refinement stage, demonstrated high performance in NSCLCs over various tested methods. Our patch-wise classifier achieves superior performance through an advanced multi-lens distortion augmentation technique and an optimized PSA strategy.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/jimaging11050166/s1. Reference [85] is cited in the Supplementary Materials. PDF S1: Supplementary Information for Segmentation of Non-Small Cell Lung Carcinomas: Introducing DRU-Net and Multi-Lens Distortion. This file contains the following: methods that were tested to address the class imbalance in the data; details on the alternative methods used for comparison against the proposed method, including H2G-Net, few-shot learning, and clustering techniques; an explanation of the feature distribution challenges and the deliberately induced data imbalance. The source code is openly available at https://github.com/AICAN-Research/DRU-Net (accessed on 29 April 2024).

Author Contributions

Conceptualization, S.O. and H.S.; Data curation, M.V., V.G.D., S.G.F.W., M.D.H., M.P.R., L.A.A. and H.S.; Formal analysis, S.O., A.P. and E.S.; Funding acquisition, L.A.A. and H.S.; Investigation, S.O., M.V., V.G.D., S.G.F.W., M.D.H. and G.K.; Methodology, S.O., A.P., E.S., T.L. and G.K.; Project administration, H.S.; Resources, H.S.; Software, S.O., A.P. and E.S.; Supervision, H.S.; Validation, S.O. and M.V.; Visualization, S.O., A.P., E.S. and M.D.H.; Writing—original draft, S.O.; Writing—review & editing, S.O., M.V., A.P., V.G.D., M.H., S.G.F.W., M.D.H., T.L., M.P.R., L.A.A., G.K. and H.S. All authors have read and agreed to the published version of the manuscript.

Funding

The research leading to these results received funding from The Liaison Committee for Education, Research, and Innovation in Central Norway (identifiers 2021/928 and 2022/787). The work was also supported by grants from the Research Council of Norway through its Centres of Excellence funding scheme, project number 223250 (to L.A.A.).

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and approved by the Regional Committee for Medical and Health Sciences Research Ethics (REK) Norway (identifier 257624, date of approval 21 June 2021), the institutional Personal Protection Officer and local Data Access Committee at the Norwegian University of Science and Technology and St. Olavs hospital, Trondheim University Hospital (identifier 2021/1374, date of approval 27 May 2022).

Informed Consent Statement

Written informed consent was obtained from all subjects involved in this study who were recruited from the Norwegian Lung Cancer Biobank (NLCB cohort). For subjects recruited from the University of Bergen (HULC cohort), the Regional Committee for Medical and Health Sciences Research Ethics (REK) Norway granted ethical approval to waive patient consent, since many patients were already deceased and obtaining consent only from surviving subjects would lead to study bias.

Data Availability Statement

The datasets generated and/or analysed during the current study are not publicly available due to the sensitive nature of personal medical data from patients who may still be alive, but might be available from Associate Professor Hanne Sorger upon request, on a mutual collaborative basis.

Acknowledgments

We extend our gratitude to Borgny Ytterhus for her contributions to this project.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Rami-Porta, R. Future perspectives on the TNM staging for lung cancer. Cancers 2021, 13, 1940. [Google Scholar] [CrossRef] [PubMed]
  2. Lim, C.; Tsao, M.; Le, L.; Shepherd, F.; Feld, R.; Burkes, R.; Liu, G.; Kamel-Reid, S.; Hwang, D.; Tanguay, J.; et al. Biomarker testing and time to treatment decision in patients with advanced nonsmall-cell lung cancer. Ann. Oncol. 2015, 26, 1415–1421. [Google Scholar] [CrossRef] [PubMed]
  3. Woodard, G.A.; Jones, K.D.; Jablons, D.M. Lung cancer staging and prognosis. In Lung Cancer Treatment and Research; Springer: Cham, Switzerland, 2016; pp. 47–75. [Google Scholar]
  4. Hanna, M.G.; Reuter, V.E.; Samboy, J.; England, C.; Corsale, L.; Fine, S.W.; Agaram, N.P.; Stamelos, E.; Yagi, Y.; Hameed, M.; et al. Implementation of digital pathology offers clinical and operational increase in efficiency and cost savings. Arch. Pathol. Lab. Med. 2019, 143, 1545–1555. [Google Scholar] [CrossRef] [PubMed]
  5. Bera, K.; Schalper, K.A.; Rimm, D.L.; Velcheti, V.; Madabhushi, A. Artificial intelligence in digital pathology—New tools for diagnosis and precision oncology. Nat. Rev. Clin. Oncol. 2019, 16, 703–715. [Google Scholar] [CrossRef]
  6. Sakamoto, T.; Furukawa, T.; Lami, K.; Pham, H.H.N.; Uegami, W.; Kuroda, K.; Kawai, M.; Sakanashi, H.; Cooper, L.A.D.; Bychkov, A.; et al. A narrative review of digital pathology and artificial intelligence: Focusing on lung cancer. Transl. Lung Cancer Res. 2020, 9, 2255–2276. [Google Scholar] [CrossRef]
  7. Niazi, M.K.K.; Parwani, A.V.; Gurcan, M.N. Digital pathology and artificial intelligence. Lancet Oncol. 2019, 20, e253–e261. [Google Scholar] [CrossRef]
  8. Kurc, T.; Bakas, S.; Ren, X.; Bagari, A.; Momeni, A.; Huang, Y.; Zhang, L.; Kumar, A.; Thibault, M.; Qi, Q.; et al. Segmentation and classification in digital pathology for glioma research: Challenges and deep learning approaches. Front. Neurosci. 2020, 14, 27. [Google Scholar] [CrossRef]
  9. Ho, D.J.; Yarlagadda, D.V.; D’Alfonso, T.M.; Hanna, M.G.; Grabenstetter, A.; Ntiamoah, P.; Brogi, E.; Tan, L.K.; Fuchs, T.J. Deep multi-magnification networks for multi-class breast cancer image segmentation. Comput. Med. Imaging Graph. 2021, 88, 101866. [Google Scholar] [CrossRef]
  10. Qaiser, T.; Tsang, Y.W.; Taniyama, D.; Sakamoto, N.; Nakane, K.; Epstein, D.; Rajpoot, N. Fast and accurate tumor segmentation of histology images using persistent homology and deep convolutional features. Med. Image Anal. 2019, 55, 1–14. [Google Scholar] [CrossRef]
  11. Zhao, T.; Fu, C.; Tie, M.; Sham, C.W.; Ma, H. RGSB-UNet: Hybrid Deep Learning Framework for Tumour Segmentation in Digital Pathology Images. Bioengineering 2023, 10, 957. [Google Scholar] [CrossRef]
  12. Viswanathan, V.S.; Toro, P.; Corredor, G.; Mukhopadhyay, S.; Madabhushi, A. The state of the art for artificial intelligence in lung digital pathology. J. Pathol. 2022, 257, 413–429. [Google Scholar] [CrossRef] [PubMed]
  13. Wang, S.; Yang, D.M.; Rong, R.; Zhan, X.; Fujimoto, J.; Liu, H.; Minna, J.; Wistuba, I.I.; Xie, Y.; Xiao, G. Artificial intelligence in lung cancer pathology image analysis. Cancers 2019, 11, 1673. [Google Scholar] [CrossRef] [PubMed]
  14. Davri, A.; Birbas, E.; Kanavos, T.; Ntritsos, G.; Giannakeas, N.; Tzallas, A.T.; Batistatou, A. Deep Learning for Lung Cancer Diagnosis, Prognosis and Prediction Using Histological and Cytological Images: A Systematic Review. Cancers 2023, 15, 3981. [Google Scholar] [CrossRef]
  15. Cheng, J.; Huang, K.; Xu, J. Computational pathology for precision diagnosis, treatment, and prognosis of cancer. Front. Med. 2023, 10, 1209666. [Google Scholar] [CrossRef]
  16. Ranftl, R.; Bochkovskiy, A.; Koltun, V. Vision transformers for dense prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 10–17 October 2021; pp. 12179–12188. [Google Scholar]
  17. Wang, W.; Dai, J.; Chen, Z.; Huang, Z.; Li, Z.; Zhu, X.; Hu, X.; Lu, T.; Lu, L.; Li, H.; et al. Internimage: Exploring large-scale vision foundation models with deformable convolutions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 11–15 June 2023; pp. 14408–14419. [Google Scholar]
  18. Park, N.; Kim, S. How do vision transformers work? arXiv 2022, arXiv:2202.06709. [Google Scholar]
  19. Kassani, S.H.; Kassani, P.H.; Wesolowski, M.J.; Schneider, K.A.; Deters, R. Deep transfer learning based model for colorectal cancer histopathology segmentation: A comparative study of deep pre-trained models. Int. J. Med. Inform. 2022, 159, 104669. [Google Scholar] [CrossRef]
  20. Lin, H.; Chen, H.; Dou, Q.; Wang, L.; Qin, J.; Heng, P.A. ScanNet: A Fast and Dense Scanning Framework for Metastatic Breast Cancer Detection from Whole-Slide Image. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 539–546. [Google Scholar] [CrossRef]
  21. Zeng, L.; Tang, H.; Wang, W.; Xie, M.; Ai, Z.; Chen, L.; Wu, Y. MAMC-Net: An effective deep learning framework for whole-slide image tumor segmentation. Multimed. Tools Appl. 2023, 82, 39349–39369. [Google Scholar] [CrossRef]
  22. Wang, L.; Pan, L.; Wang, H.; Liu, M.; Feng, Z.; Rong, P.; Chen, Z.; Peng, S. DHUnet: Dual-branch hierarchical global–local fusion network for whole slide image segmentation. Biomed. Signal Process. Control 2023, 85, 104976. [Google Scholar] [CrossRef]
  23. Pedersen, A.; Smistad, E.; Rise, T.V.; Dale, V.G.; Pettersen, H.S.; Nordmo, T.A.S.; Bouget, D.; Reinertsen, I.; Valla, M. H2G-Net: A multi-resolution refinement approach for segmentation of breast cancer region in gigapixel histopathological images. Front. Med. 2022, 9, 971873. [Google Scholar] [CrossRef]
  24. Albusayli, R.; Graham, D.; Pathmanathan, N.; Shaban, M.; Minhas, F.; Armes, J.E.; Rajpoot, N.M. Simple non-iterative clustering and CNNs for coarse segmentation of breast cancer whole-slide images. In Proceedings of the Medical Imaging 2021: Digital Pathology, Online, 15–20 February 2021; Volume 11603, pp. 100–108. [Google Scholar]
  25. Chelebian, E.; Avenel, C.; Ciompi, F.; Wählby, C. DEPICTER: Deep representation clustering for histology annotation. Comput. Biol. Med. 2024, 170, 108026. [Google Scholar] [CrossRef]
  26. Yan, J.; Chen, H.; Li, X.; Yao, J. Deep contrastive learning based tissue clustering for annotation-free histopathology image analysis. Comput. Med. Imaging Graph. 2022, 97, 102053. [Google Scholar] [CrossRef] [PubMed]
  27. Deuschel, J.; Firmbach, D.; Geppert, C.I.; Eckstein, M.; Hartmann, A.; Bruns, V.; Kuritcyn, P.; Dexl, J.; Hartmann, D.; Perrin, D.; et al. Multi-prototype few-shot learning in histopathology. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 620–628. [Google Scholar]
  28. Shakeri, F.; Boudiaf, M.; Mohammadi, S.; Sheth, I.; Havaei, M.; Ayed, I.B.; Kahou, S.E. FHIST: A benchmark for few-shot classification of histological images. arXiv 2022, arXiv:2206.00092. [Google Scholar]
  29. Titoriya, A.K.; Singh, M.P. Few-Shot Learning on Histopathology Image Classification. In Proceedings of the 2022 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 14–16 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 251–256. [Google Scholar]
  30. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
  31. Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986. [Google Scholar]
  32. Krikid, F.; Rositi, H.; Vacavant, A. State-of-the-Art Deep Learning Methods for Microscopic Image Segmentation: Applications to Cells, Nuclei, and Tissues. J. Imaging 2024, 10, 311. [Google Scholar] [CrossRef] [PubMed]
  33. Greeley, C.; Holder, L.; Nilsson, E.E.; Skinner, M.K. Scalable deep learning artificial intelligence histopathology slide analysis and validation. Sci. Rep. 2024, 14, 26748. [Google Scholar] [CrossRef]
  34. Deng, R.; Cui, C.; Liu, Q.; Yao, T.; Remedios, L.W.; Bao, S.; Landman, B.A.; Wheless, L.E.; Coburn, L.A.; Wilson, K.T.; et al. Segment anything model (sam) for digital pathology: Assess zero-shot segmentation on whole slide imaging. In Proceedings of the IS&T International Symposium on Electronic Imaging, San Francisco, CA, USA, 2–6 February 2025; Volume 37, p. COIMG–132. [Google Scholar]
  35. Ma, J.; He, Y.; Li, F.; Han, L.; You, C.; Wang, B. Segment anything in medical images. Nat. Commun. 2024, 15, 654. [Google Scholar] [CrossRef]
  36. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
  37. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  38. Hatlen, P. Lung Cancer—Influence of Comorbidity on Incidence and Survival: The Nord-Trøndelag Health Study. Ph.D. Thesis, Norges Teknisk-Naturvitenskapelige Universitet, Det Medisinske Fakultet, Institutt for Sirkulasjon og Bildediagnostikk, Trondheim, Norway, 2014. [Google Scholar]
  39. Ramnefjell, M.; Aamelfot, C.; Helgeland, L.; Akslen, L.A. Vascular invasion is an adverse prognostic factor in resected non–small-cell lung cancer. Apmis 2017, 125, 197–206. [Google Scholar] [CrossRef]
  40. Hatlen, P.; Grønberg, B.H.; Langhammer, A.; Carlsen, S.M.; Amundsen, T. Prolonged survival in patients with lung cancer with diabetes mellitus. J. Thorac. Oncol. 2011, 6, 1810–1817. [Google Scholar] [CrossRef]
  41. Watanabe, Y. TNM classification for lung cancer. Ann. Thorac. Cardiovasc. Surg. 2003, 9, 343–350. [Google Scholar]
  42. Travis, W. The 2015 WHO classification of lung tumors. Der Pathol. 2014, 35, 188. [Google Scholar] [CrossRef]
  43. Valla, M.; Vatten, L.J.; Engstrøm, M.J.; Haugen, O.A.; Akslen, L.A.; Bjørngaard, J.H.; Hagen, A.I.; Ytterhus, B.; Bofin, A.M.; Opdahl, S. Molecular subtypes of breast cancer: Long-term incidence trends and prognostic differences. Cancer Epidemiol. Biomark. Prev. 2016, 25, 1625–1634. [Google Scholar] [CrossRef]
  44. Deng, L. The MNIST Database of Handwritten Digit Images for Machine Learning Research. IEEE Signal Process. Mag. 2012, 29, 141–142. [Google Scholar] [CrossRef]
  45. Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv 2017, arXiv:1708.07747. [Google Scholar]
  46. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. Master’s Thesis, University of Toronto, Toronto, ON, Canada, 2009. [Google Scholar]
  47. Bankhead, P.; Loughrey, M.B.; Fernández, J.A.; Dombrowski, Y.; McArt, D.G.; Dunne, P.D.; McQuaid, S.; Gray, R.T.; Murray, L.J.; Coleman, H.G.; et al. QuPath: Open source software for digital pathology image analysis. Sci. Rep. 2017, 7, 16878. [Google Scholar] [CrossRef]
  48. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 248–255. [Google Scholar]
  49. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. Available online: https://www.tensorflow.org (accessed on 10 November 2023).
  50. Smistad, E.; Bozorgi, M.; Lindseth, F. FAST: Framework for heterogeneous medical image computing and visualization. Int. J. Comput. Assist. Radiol. Surg. 2015, 10, 1811–1822. [Google Scholar] [CrossRef]
  51. Smistad, E.; Østvik, A.; Pedersen, A. High performance neural network inference, streaming, and visualization of medical images using FAST. IEEE Access 2019, 7, 136310–136321. [Google Scholar] [CrossRef]
  52. Bradski, G. The OpenCV Library. Dr. Dobb’s J. Softw. Tools 2000, 120, 122–125. [Google Scholar]
  53. Clark, A. Pillow (PIL Fork) Documentation. 2015. Available online: https://buildmedia.readthedocs.org/media/pdf/pillow/latest/pillow.pdf (accessed on 10 November 2023).
  54. Harris, C.R.; Millman, K.J.; van der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J.; et al. Array programming with NumPy. Nature 2020, 585, 357–362. [Google Scholar] [CrossRef]
  55. Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat. Methods 2020, 17, 261–272. [Google Scholar] [CrossRef]
  56. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  57. Hunter, J.D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 2007, 9, 90–95. [Google Scholar] [CrossRef]
  58. ONNX. Convert TensorFlow, Keras, Tensorflow.js and Tflite Models to ONNX. 2024. Available online: https://github.com/onnx/tensorflow-onnx (accessed on 10 November 2023).
  59. Pedersen, A.; Valla, M.; Bofin, A.M.; De Frutos, J.P.; Reinertsen, I.; Smistad, E. FastPathology: An open-source platform for deep learning-based research and decision support in digital pathology. IEEE Access 2021, 9, 58216–58229. [Google Scholar] [CrossRef]
  60. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
  61. Goutte, C.; Gaussier, E. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In Advances in Information Retrieval, Proceedings of the 27th European Conference on Information Retrieval, Santiago de Compostela, Spain, 21–23 March 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 345–359. [Google Scholar]
  62. Kim, H.; Monroe, J.I.; Lo, S.; Yao, M.; Harari, P.M.; Machtay, M.; Sohn, J.W. Quantitative evaluation of image segmentation incorporating medical consideration functions. Med. Phys. 2015, 42, 3013–3023. [Google Scholar] [CrossRef] [PubMed]
  63. Patro, B.N.; Lunayach, M.; Patel, S.; Namboodiri, V.P. U-CAM: Visual Explanation using Uncertainty based Class Activation Maps. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7444–7453. [Google Scholar]
  64. Simonyan, K.; Vedaldi, A.; Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv 2013, arXiv:1312.6034. [Google Scholar]
  65. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014, Proceedings of the 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part I 13; Springer: Berlin/Heidelberg, Germany, 2014; pp. 818–833. [Google Scholar]
  66. Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic attribution for deep networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 3319–3328. [Google Scholar]
  67. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  68. Tan, M.; Le, Q. EfficientNetV2: Smaller Models and Faster Training. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 10096–10106. [Google Scholar]
  69. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
  70. Gai, L.; Xing, M.; Chen, W.; Zhang, Y.; Qiao, X. Comparing CNN-based and transformer-based models for identifying lung cancer: Which is more effective? Multimed. Tools Appl. 2024, 83, 59253–59269. [Google Scholar] [CrossRef]
  71. Sangeetha, S.; Mathivanan, S.K.; Muthukumaran, V.; Cho, J.; Easwaramoorthy, S.V. An Empirical Analysis of Transformer-Based and Convolutional Neural Network Approaches for Early Detection and Diagnosis of Cancer Using Multimodal Imaging and Genomic Data. IEEE Access 2025, 13, 6120–6145. [Google Scholar] [CrossRef]
  72. Lakshmanan, B.; Anand, S.; Jenitha, T. Stain removal through color normalization of haematoxylin and eosin images: A review. Proc. J. Phys. Conf. Ser. 2019, 1362, 012108. [Google Scholar] [CrossRef]
  73. Tellez, D.; Litjens, G.; Bándi, P.; Bulten, W.; Bokhorst, J.M.; Ciompi, F.; Van Der Laak, J. Quantifying the effects of data augmentation and stain color normalization in convolutional neural networks for computational pathology. Med. Image Anal. 2019, 58, 101544. [Google Scholar] [CrossRef]
  74. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-unet: Unet-like pure transformer for medical image segmentation. In Computer Vision—ECCV 2022 Workshops, Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 205–218. [Google Scholar]
  75. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  76. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  77. Menon, A.; Singh, P.; Vinod, P.; Jawahar, C. Exploring Histological Similarities Across Cancers from a Deep Learning Perspective. Front. Oncol. 2022, 12, 842759. [Google Scholar] [CrossRef]
  78. Kashima, J.; Kitadai, R.; Okuma, Y. Molecular and Morphological Profiling of Lung Cancer: A Foundation for “Next-Generation” Pathologists and Oncologists. Cancers 2019, 11, 599. [Google Scholar] [CrossRef] [PubMed]
  79. Petersen, I. The morphological and molecular diagnosis of lung cancer. Dtsch. Ärztebl. Int. 2011, 108, 525–531. [Google Scholar] [CrossRef] [PubMed]
  80. Inamura, K. Lung cancer: Understanding its molecular pathology and the 2015 WHO classification. Front. Oncol. 2017, 7, 193. [Google Scholar] [CrossRef] [PubMed]
  81. Zhao, S.; Chen, D.P.; Fu, T.; Yang, J.C.; Ma, D.; Zhu, X.Z.; Wang, X.X.; Jiao, Y.P.; Jin, X.; Xiao, Y.; et al. Single-cell morphological and topological atlas reveals the ecosystem diversity of human breast cancer. Nat. Commun. 2023, 14, 6796. [Google Scholar] [CrossRef]
  82. Binder, A.; Bockmayr, M.; Hägele, M.; Wienert, S.; Heim, D.; Hellweg, K.; Ishii, M.; Stenzinger, A.; Hocke, A.; Denkert, C.; et al. Morphological and molecular breast cancer profiling through explainable machine learning. Nat. Mach. Intell. 2021, 3, 355–366. [Google Scholar] [CrossRef]
  83. Tan, P.H.; Ellis, I.; Allison, K.; Brogi, E.; Fox, S.B.; Lakhani, S.; Lazar, A.J.; Morris, E.A.; Sahin, A.; Salgado, R.; et al. The 2019 WHO classification of tumours of the breast. Histopathology 2020, 77, 181–185. [Google Scholar] [CrossRef]
  84. Qi, Y.; Sun, H.; Liu, N.; Zhou, H. A Task-Aware Dual Similarity Network for Fine-Grained Few-Shot Learning. In PRICAI 2022: Trends in Artificial Intelligence, Proceedings of the Pacific Rim International Conference on Artificial Intelligence, Shanghai, China, 10–13 November 2022; Springer: Cham, Switzerland, 2022; pp. 606–618. [Google Scholar]
  85. Achanta, R.; Shaji, A.; Smith, K.; Lucchi, A.; Fua, P.; Süsstrunk, S. SLIC Superpixels; École Polytechnique Fédérale de Lausanne (EPFL): Lausanne, Switzerland, 2010. [Google Scholar]
Figure 1. Illustration of the proposed DRU-Net model. Image patches are fed into the classifier part; the classifier output is combined with a down-sampled WSI as input to the refinement head.
Figure 2. Example of the proposed augmentation applied to a patch, with a grid overlaid to illustrate the effect. (a) Original image showing epithelial cells. (b) Augmented image with parameters set too high; variation in cell size and deformation are visible. (c) Augmented image with medium parameter settings.
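The multi-lens distortion shown in Figure 2 places several local “lenses” at random positions and radially displaces pixels around them. The snippet below is only a minimal sketch of that idea built on OpenCV’s remap; the function name and the parameters num_lenses, radius, and strength are illustrative assumptions, not the exact parametrization used for the published experiments.

```python
import numpy as np
import cv2  # OpenCV


def multi_lens_distortion(image, num_lenses=4, radius=80, strength=0.3, rng=None):
    """Sketch of a multi-lens distortion: several local radial distortions
    (bulge or pinch, chosen at random) applied at random positions.
    This is an illustrative re-implementation, not the authors' exact code."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = image.shape[:2]
    # Identity mapping from destination to source coordinates.
    map_x, map_y = np.meshgrid(np.arange(w, dtype=np.float32),
                               np.arange(h, dtype=np.float32))
    for _ in range(num_lenses):
        cx, cy = rng.integers(0, w), rng.integers(0, h)
        k = rng.uniform(-strength, strength)            # sign controls bulge vs. pinch
        dx, dy = map_x - cx, map_y - cy
        r = np.sqrt(dx ** 2 + dy ** 2)
        falloff = np.clip(1.0 - r / radius, 0.0, 1.0)   # distort only inside the lens
        scale = 1.0 + k * falloff
        map_x = cx + dx * scale
        map_y = cy + dy * scale
    return cv2.remap(image, map_x.astype(np.float32), map_y.astype(np.float32),
                     interpolation=cv2.INTER_LINEAR, borderMode=cv2.BORDER_REFLECT)
```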
Figure 3. Boxplots of the Dice similarity coefficients (DSCs) of the experiments shown in Table 2 on the 20 WSIs of the test set. (I) original H2G-Net, (II) H2G-Net with fine-tuned PWC on HULC cohort, (III) H2G-Net with fine-tuned U-Net on HULC cohort, (IV) H2G-Net with fine-tuned PWC and U-Net on HULC cohort, (V) DRU-Net trained on HULC Cohort, (VI) H2G-Net with fine-tuned PWC on NLCB, (VII) H2G-Net with fine-tuned PWC on NLCB and fine-tuned U-Net on HULC Cohort, (VIII) FSC, (IX) MSC, (X) DRU-Net with PWC trained on NLCB and U-Net trained on HULC Cohort.
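The Dice similarity coefficients summarized in Figures 3–5 can be computed for binary masks in a few lines; the following is a minimal NumPy sketch assuming equally sized ground-truth and prediction masks.

```python
import numpy as np


def dice_similarity_coefficient(pred_mask, gt_mask, eps=1e-8):
    """DSC = 2|A ∩ B| / (|A| + |B|) for two binary segmentation masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return (2.0 * intersection) / (pred.sum() + gt.sum() + eps)
```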
Figure 4. Boxplot of the Dice similarity coefficients (DSCs) of the PWC models in the experiments listed in Table 2 without the refinement network; only the patch-wise classifier was used to produce these results. (I) original H2G-Net, (II) H2G-Net with fine-tuned PWC on HULC cohort, (V) DR-Fused trained on HULC Cohort, (VI) H2G-Net with fine-tuned PWC on NLCB, (X) DR-Fused trained on NLCB and U-Net trained on HULC Cohort.
Figure 5. The impact of the multi-lens distortion augmentation technique using the DRU-Net model. DSC: Dice similarity coefficient. The highlighted regions indicate the variance, and the mean values are shown on the curve.
Figure 6. Sample results of three tested networks. First row: original whole slide images (WSIs); second row: DRU-Net; third row: FSC (few-shot learning + clustering); fourth row: H2G-Net with fine-tuned patch-wise classifier and original U-Net. Green pixels indicate true positives, white pixels indicate false positives, and red pixels indicate false negatives. * Indicates that this is not the original H2G-Net, but a modified version.
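The color coding in Figure 6 is a per-pixel comparison of the prediction against the ground truth. A minimal NumPy sketch of such an overlay is given below; the exact RGB values are assumptions that simply match the colors named in the caption.

```python
import numpy as np


def overlay_segmentation_errors(pred_mask, gt_mask):
    """Color-code a binary prediction against ground truth:
    green = true positive, white = false positive, red = false negative."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    overlay = np.zeros(pred.shape + (3,), dtype=np.uint8)
    overlay[np.logical_and(pred, gt)] = (0, 255, 0)        # true positives
    overlay[np.logical_and(pred, ~gt)] = (255, 255, 255)   # false positives
    overlay[np.logical_and(~pred, gt)] = (255, 0, 0)       # false negatives
    return overlay
```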
Figure 7. Sample patches (top row) and their overlaid saliency maps (bottom row); only the patches were given to the PWC model. Note that the saliency map does not indicate malignancy; instead, it shows how different regions of the image influence the classification decision. The colors on the map range from blue, indicating the least influence, to red, indicating the most influence. (a) shows a false negative in which the model misses the tumor, (b,c) show false-positive tumor detections, and (d–f) show true-positive tumor detections. (a) shows three small sheets of atypical epithelial tumor cells, of which only one is highlighted in red. The remaining tissue comprises widened alveolar septae with inflammatory cells, pigmented macrophages, reactive pneumocytes and red blood cells. (b) includes reactive pneumocytes and macrophages. (c) shows reactive pneumocyte hyperplasia. (d) presents a solid tumor with enlarged nuclei, where the majority of the model’s focus lies; the peripheral parts of the patch contain alveolar tissue. (e) highlights a solid tumor (mostly in yellow) alongside inflammatory cells (primarily in blue). (f) shows a solid tumor with areas of necrosis (mainly highlighted in yellow and red) as well as fibrous tissue with inflammatory cells, predominantly marked in blue and some green.
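The saliency maps in Figure 7 visualize how strongly each input pixel influences the patch-wise classifier’s decision, in the spirit of gradient-based saliency [64]. The following TensorFlow sketch shows one common way to compute such a map; the normalization and the color mapping used for the published figure are assumptions.

```python
import tensorflow as tf


def saliency_map(model, patch):
    """Gradient of the predicted-class score with respect to the input patch.
    `patch` is assumed to be an H x W x C array already preprocessed for `model`."""
    x = tf.convert_to_tensor(patch[None, ...], dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        preds = model(x, training=False)            # shape (1, n_classes)
        score = tf.reduce_max(preds, axis=-1)       # score of the predicted class
    grads = tape.gradient(score, x)                 # same shape as x
    sal = tf.reduce_max(tf.abs(grads), axis=-1)[0]  # collapse the channel axis
    sal -= tf.reduce_min(sal)
    return (sal / (tf.reduce_max(sal) + 1e-8)).numpy()  # rescaled to [0, 1]
```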
Table 1. Histological subtypes of non-small cell lung carcinoma cases in the NLCB and the HULC cohorts. Counts are shown with corresponding percentages. AC: Adenocarcinoma, SCC: Squamous Cell Carcinoma, NSCC: Non-small Cell Carcinomas, WSIs: Whole Slide Images.
Histological Subtype | NLCB (n, %) | HULC—Train (n, %) | HULC—Test (n, %)
AC | 16 (38.1%) | 38 (49.4%) | 7 (35.0%)
SCC | 15 (35.7%) | 32 (41.6%) | 10 (50.0%)
Other NSCC | 11 (26.2%) | 7 (9.1%) | 3 (15.0%)
Total number of WSIs | 42 | 77 | 20
Table 2. Methods and experiments carried out with various models on the same 20 WSIs of the test set from the HULC cohort. Abbreviations: PWC: patch-wise classifier; HULC: Haukeland University Lung Cancer; NLCB: Norwegian Lung Cancer Biobank; FSC: few-shot (with a pre-trained MobileNetV2 [60] model) + clustering; MSC: many-shot (with a pre-trained MobileNetV2 [60] model) + clustering.
Models | Modifications | Training Dataset(s)
(I) H2G-Net | — | —
(II) H2G-Net | Fine-tuned PWC | HULC Cohort
(III) H2G-Net | Fine-tuned U-Net | HULC Cohort
(IV) H2G-Net | Fine-tuned PWC and original U-Net | HULC Cohort
(V) DRU-Net | — | HULC Cohort
(VI) H2G-Net | Fine-tuned PWC | NLCB
(VII) H2G-Net | Fine-tuned PWC and U-Net | PWC trained on NLCB, U-Net trained on HULC Cohort
(VIII) FSC | — | NLCB
(IX) MSC | — | NLCB
(X) DRU-Net | — | PWC trained on NLCB, U-Net trained on HULC Cohort
Table 3. Qualitative evaluation scoring system.
Score | Description
0 | No tumor tissue in image or segmentation, or image not suitable for analysis
1 | Completely wrong segmentation of tumor, tumor tissue not segmented
2 | A large part of the tumor is not segmented
3 | Most of the tumor is correctly segmented, but some false positive or false negative areas
4 | Most of the tumor is correctly segmented, only sparse false positive or false negative areas
5 | The whole or almost the whole tumor correctly segmented
Table 4. The impact of the multi-lens distortion augmentation technique using different architectures on different datasets, randomly selecting 10% of the training data. Pairwise tests were performed using Wilcoxon signed-rank tests. The setting with the highest F1-score in each row is highlighted in bold.
Model | Dataset | F1-Score w/o Aug | F1-Score w/ Aug | p-Value
DenseNet121 | MNIST | 0.9893 | 0.9894 | 0.2311
DenseNet121 | Fashion-MNIST | 0.9043 | 0.9208 | <0.001
DenseNet121 | CIFAR-10 | 0.8086 | 0.8235 | <0.001
DenseNet121 | CIFAR-100 | 0.5199 | 0.5581 | 0.0502
H2G-Net | NLCB | 0.8299 | 0.8341 | 0.0701
DRU-Net | NLCB | 0.8868 | 0.9025 | 0.0241
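The p-values in Table 4 come from Wilcoxon signed-rank tests on paired scores obtained with and without the augmentation. A minimal SciPy sketch follows, using placeholder values rather than the study’s data.

```python
from scipy.stats import wilcoxon

# Paired per-sample F1-scores (same samples, same order) with and without the
# augmentation. These values are placeholders for illustration only.
f1_without = [0.81, 0.84, 0.79, 0.88, 0.83]
f1_with = [0.83, 0.85, 0.82, 0.89, 0.86]

statistic, p_value = wilcoxon(f1_without, f1_with)
print(f"Wilcoxon signed-rank test: W={statistic:.3f}, p={p_value:.4f}")
```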
Table 5. Comparison of different backbone architectures for patch-wise classification of lung cancer tissue using the many-shot method. The best-performing architecture per metric is highlighted in bold. Abbreviations: DR: fusion of DenseNet201 (D) and ResNet101V2 (R).
Architecture | F1-Score | Precision | Recall
VGG19 [67] | 0.87 | 0.86 | 0.87
ResNet101V2 [37] | 0.89 | 0.89 | 0.89
MobileNetV2 [60] | 0.86 | 0.86 | 0.86
EfficientNetV2 [68] | 0.89 | 0.89 | 0.89
InceptionV3 [69] | 0.90 | 0.89 | 0.91
DenseNet201 [36] | 0.91 | 0.91 | 0.91
Proposed DR-Fused | 0.94 | 0.94 | 0.93
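Table 5 compares single backbones against the proposed DR-Fused classifier, which fuses DenseNet201 [36] and ResNet101V2 [37] features. The Keras sketch below shows one plausible way to set up such a fusion by concatenating globally pooled features from the two backbones; the truncation points, pooling, dropout, head size, and input resolution are illustrative assumptions rather than the exact configuration used in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model


def build_dr_fused(input_shape=(224, 224, 3), num_classes=2):
    """Sketch of a DenseNet201 + ResNet101V2 feature-fusion patch classifier."""
    inputs = layers.Input(shape=input_shape)
    densenet = tf.keras.applications.DenseNet201(
        include_top=False, weights="imagenet", input_shape=input_shape)
    resnet = tf.keras.applications.ResNet101V2(
        include_top=False, weights="imagenet", input_shape=input_shape)
    # Each backbone acts as a feature extractor on the same input patch.
    d = layers.GlobalAveragePooling2D()(densenet(inputs))
    r = layers.GlobalAveragePooling2D()(resnet(inputs))
    fused = layers.Concatenate()([d, r])             # fused feature vector
    x = layers.Dropout(0.3)(fused)                   # illustrative regularization
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return Model(inputs, outputs, name="dr_fused_pwc")
```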
Table 6. Computational complexity comparison between different backbone architectures. Metrics are reported as total FLOPs and number of parameters. The percentage increase relative to MobileNetV2 is also reported.
Architecture | FLOPs (M) | Params (M) | ΔFLOPs (%) | ΔParams (%)
DR-Fused | 11,105.27 | 13.18 | 1712.42 | 483.02
VGG19 [67] | 39,276.93 | 139.58 | 6310.14 | 6074.55
ResNet101V2 [37] | 14,430.04 | 42.63 | 2255.04 | 1785.86
MobileNetV2 [60] | 612.73 | 2.26 | 0.00 | 0.00
EfficientNetV2 [68] | 1455.32 | 5.92 | 137.51 | 161.97
InceptionV3 [69] | 5693.36 | 21.81 | 829.18 | 864.67
DenseNet201 [36] | 8631.68 | 18.33 | 1308.72 | 710.68
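The parameter counts in Table 6 can be approximated directly from the Keras backbones; FLOPs would additionally require a profiler. The sketch below counts backbone parameters and reports the increase relative to MobileNetV2. The numbers will deviate somewhat from Table 6 because the table also includes the classification heads, and the EfficientNetV2 variant used here (B0) is an assumption.

```python
import tensorflow as tf

# Backbones compared in Table 6 (the EfficientNetV2 variant is assumed to be B0).
backbones = {
    "MobileNetV2": tf.keras.applications.MobileNetV2,
    "VGG19": tf.keras.applications.VGG19,
    "ResNet101V2": tf.keras.applications.ResNet101V2,
    "EfficientNetV2B0": tf.keras.applications.EfficientNetV2B0,
    "InceptionV3": tf.keras.applications.InceptionV3,
    "DenseNet201": tf.keras.applications.DenseNet201,
}

# Build each backbone without its classification head and count its parameters.
params = {name: ctor(include_top=False, weights=None,
                     input_shape=(224, 224, 3)).count_params()
          for name, ctor in backbones.items()}

baseline = params["MobileNetV2"]
for name, p in params.items():
    print(f"{name:18s} {p / 1e6:7.2f} M params "
          f"({100 * (p - baseline) / baseline:+7.1f}% vs. MobileNetV2)")
```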